CSCI 150: Lab 5

Strings and Lists
Due: 11:59 PM on Tuesday March 23 (extended from Friday March 19)

The purpose of this lab is to:

  • Practice using strings, lists, and while loops
  • Play a game!
  • Explore the use of programming for data science in biology

Optional Prelab

We have put together a set of optional prelab questions (with an answer key) that will help you work through some of the ideas related to this lab before you start programming. You can find these questions in Prelab 5. We highly recommend reading through the prelab before working on the lab.

Part 1 - Primes

primes.py: 10 points, individual.

As you may know, a number x is said to be prime if x is at least 2 and its only positive divisors are 1 and x itself (meaning that x is divisible only by 1 and itself). So the first few primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, and so on. 4 isn't prime, since it is divisible by 2. The same goes for 6 and 8, and 9 is out because it is divisible by 3. There are a lot of primes; more precisely, there are infinitely many.

Describe the Problem

Write a program called primes.py that creates a list of prime numbers, and then prints out the contents of that list.

Input: A number n.
Output: The first n primes

Understand the Problem

If the user enters 13 then the output should be

  The first 13 primes are:
  2 3 5 7 11 13 17 19 23 29 31 37 41
                          

Design an Algorithm

Write pseudocode to solve this problem. You should decompose your main algorithm into small manageable chunks. For example, you should:

  • Design an algorithm that takes in an integer x and determines whether x is prime. For example, isPrime(10) would return False, while isPrime(31) would return True.
  • Make liberal use of this isPrime function to generate the first n primes.
  • Each time you find a prime number, add it to a list that contains all of the prime numbers found so far.
  • After you have found the first n prime numbers, print the contents of your list of prime numbers to the user (in a format similar to the example output above; to earn full credit, your output should not contain any brackets or commas).
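One possible way to implement the isPrime step is trial division. The sketch below rests on one assumption: you can stop checking divisors once d * d exceeds x, because any factor larger than the square root of x pairs with a factor smaller than it. The name isPrime comes from the lab. The loop structure is only one approach, so treat this as an illustration rather than the required design.

```python
def isPrime(x):
    """Return True if x is prime, False otherwise (trial division)."""
    if x < 2:
        return False          # 0, 1, and negative numbers are not prime
    d = 2
    while d * d <= x:         # only need to test divisors up to sqrt(x)
        if x % d == 0:
            return False      # found a divisor other than 1 and x
        d = d + 1
    return True

print(isPrime(10))  # False
print(isPrime(31))  # True
```

Looping d all the way up to x - 1 would also be correct, just slower for large x.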

Implement a Design

We will want at least three functions for this program: isPrime(x) that returns True if x is prime and False otherwise, findPrimes(n) that returns a list containing the first n prime numbers, and our usual main(). As a hint, you will want to use a while loop as you search for primes inside the findPrimes function, since you won't know ahead of time just how far you need to go.
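Here is a sketch of findPrimes that uses a while loop, as suggested. Because the block is meant to be self-contained, it repeats a compact isPrime. The main shown here is only one assumption about how the input and output could look. It prints the primes separated by spaces, so no brackets or commas appear.

```python
def isPrime(x):
    """Return True if x is prime (trial division up to sqrt(x))."""
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True

def findPrimes(n):
    """Return a list containing the first n primes."""
    primes = []
    candidate = 2
    # We don't know in advance how far to search, so keep going until we have n primes.
    while len(primes) < n:
        if isPrime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def main():
    n = int(input("How many primes? "))
    primes = findPrimes(n)
    print("The first", n, "primes are:")
    # Print each prime followed by a space: no brackets or commas.
    for p in primes:
        print(p, end=" ")
    print()
```

For example, findPrimes(13) returns [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]. To actually run the program, add a call to main() at the bottom of primes.py.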

Test the Program

Try your program with a variety of inputs n. You should certainly try n = 0, 1, and 13, but you may also want to try larger values of n to make sure your program works (although it might take a little longer to find them all).

Part 2 - Mind Mastery

game.py: 14 points, partner encouraged.

Mastermind is a neat (although often frustrating) puzzle game. It works something like this: There are two players. One player is the codemaker (your program), the other is the codebreaker (the user). The codemaker chooses a sequence of four colored pegs, out of a possible six colors (red, blue, green, yellow, orange, and purple). They may repeat colors and place them in any order they wish. This sequence is hidden from the codebreaker. The codebreaker has 10 chances to guess the sequence. The codebreaker places colored pegs down to indicate each of their guesses. After each guess, the codemaker is required to reveal certain information about how close the guess was to the actual hidden sequence.

Describe the Problem:

In this part of the lab, you will create a program to play Mastermind, where the computer is playing the codemaker, and the human user is the codebreaker. Thus your program needs to generate a secret code, and repeatedly prompt the user for guesses. For each guess, your program needs to give appropriate feedback (more detail below). The game ends when either the user guesses correctly (wins) or uses up 10 guesses (loses).

Understand the Problem:

The trickiest part of this game is determining how to provide feedback on the codebreaker's guesses. Next to each guess that the codebreaker makes, the codemaker places up to four clue pegs. Each clue peg is either black or white: each black peg indicates a correct color in the correct position, and each white peg indicates a correct color in the wrong position. No indication is given as to which clue peg corresponds to which peg in the guess.

For example, suppose that the code is RYGY (red yellow green yellow). Then the guess GRGY (green red green yellow) would cause the codemaker to put down 2 black pegs (since the third and fourth pegs of the guess are correct) and 1 white peg (since the red peg is a correct color, but out of place). Note that no peg is given for the first peg of the guess (green), even though there is a green in the code. This is because that green has already been "counted" (a black peg was given for it).

As another example, again using RYGY as our code, the guess YBBB would generate 1 white peg and 0 black pegs. Yellow appears twice in the code, but the guess contains only one yellow peg. Likewise, for the guess BRRR, only 1 white peg is given: the guess has three R's, but the code contains only one. Below is a table of guesses and the corresponding number of black and white pegs for each (still assuming the code is RYGY).

  Guess   Black pegs   White pegs
  YYYY    2            0
  YRYR    0            3
  BBPO    0            0
  PGYR    0            3
  YYYG    1            2
  RYGY    4            0
You can also try an online graphical version of the game (its red pegs correspond to our black pegs).

A sample run of our text-based program may look like this:

Sample output

  % python game.py

  I have a 4 letter code, made from 6 colours.
  The colours are R, G, B, Y, P, or O.

  Your guess: GGGG
  Not quite. You get 0 black pegs, 0 white pegs.

  Your guess: YYYY
  Not quite. You get 1 black pegs, 0 white pegs.

  Your guess: YOYO
  Not quite. You get 0 black pegs, 2 white pegs.

  Your guess: PPYO
  Not quite. You get 1 black pegs, 2 white pegs.

  Your guess: POYB
  Not quite. You get 1 black pegs, 3 white pegs.

  Your guess: PBOY
  You win! So clever.

Design an Algorithm

Once you understand how the game works, you should write pseudocode to help you plan your program. The general steps are:

  • Randomly choose the codemaker's code.
  • Repeatedly prompt the user for their guess
    • If their guess is correct, end the game with a congratulatory message.
    • Otherwise, give clues (i.e., pegs) that correspond to their guess.
Some of these steps are straightforward, but it is certainly worth your while to write down an approach to randomly generating the code and to giving the clues. The prelab discusses some of this process. Our recommended algorithm for the clues is as follows:

  • First assign the black pegs. Do this by iterating through both strings one character at a time, assigning a black peg whenever the characters in the same position match. When they do match, change that character in your guess to an 'x' and in the code to a 'z'. That way you know both pegs have been used for a clue, and you won't reuse them when assigning white pegs and accidentally double-count.
  • Next assign the white pegs. Do this by considering the first peg in the guess string and trying to find a matching character in the code string (note that if that character is an 'x', there definitely won't be a match in the code string). If you find a match, again change the guess character to 'x' and the matching code character to 'z'. Then continue with the second peg in the guess string, then the third, and so on. Because of your changes to 'x' and 'z', you will never assign both a black and a white peg for the same guess peg or code peg.
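The two passes above can be sketched as follows. The helper name countPegs and its (black, white) return value are hypothetical. Your required clue(code, guess) function could call something like it, print the counts, and return whether black == 4. The sketch follows the recommended string approach, marking matched guess pegs with 'x' and matched code pegs with 'z'.

```python
def countPegs(code, guess):
    """Return (black, white) peg counts for guess compared against code.

    Hypothetical helper: matched guess pegs become 'x' and matched code
    pegs become 'z', so no peg is counted twice. Reassigning the strings
    here does not affect the caller's strings.
    """
    black = 0
    white = 0

    # Pass 1: black pegs (same color in the same position).
    for i in range(len(guess)):
        if guess[i] == code[i]:
            black += 1
            guess = guess[:i] + "x" + guess[i+1:]
            code = code[:i] + "z" + code[i+1:]

    # Pass 2: white pegs (right color, wrong position).
    for i in range(len(guess)):
        if guess[i] != "x":
            j = code.find(guess[i])   # index of an unused matching peg, or -1
            if j != -1:
                white += 1
                guess = guess[:i] + "x" + guess[i+1:]
                code = code[:j] + "z" + code[j+1:]

    return black, white

print(countPegs("RYGY", "GRGY"))  # (2, 1)
```

Checking this helper against every row of the table above is a quick way to gain confidence before wiring it into the game loop.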

Helpful Tips

  • Your design should use strings to represent both the code and the user's guess. Lists might seem like a natural fit for this problem, since we need to change some pegs to an 'x' or 'z' to mark them as matched so that we do not count too many pegs. However, lists introduce unexpected behavior because of another difference between strings and lists. A list is mutable, so changing its contents inside a function changes it everywhere, and you will lose track of the original code generated at the start of the program. A string cannot be modified in place, so "changing" a character actually creates a new string and leaves the original untouched.
  • Example code explaining how to replace a character in a string is given below.
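To see the difference between strings and lists described in the first tip, here is a small demonstration. The function names markFirstList and markFirstString are hypothetical and exist only for this illustration. Mutating a list inside a function changes the caller's list. Building a new string inside a function leaves the caller's string alone.

```python
def markFirstList(pegs):
    pegs[0] = "x"              # mutates the caller's list in place

def markFirstString(pegs):
    pegs = "x" + pegs[1:]      # builds a new string; only the local name changes
    return pegs

codeList = ["R", "Y", "G", "Y"]
markFirstList(codeList)
print(codeList)                # ['x', 'Y', 'G', 'Y']  (original code is lost)

codeStr = "RYGY"
marked = markFirstString(codeStr)
print(codeStr, marked)         # RYGY xYGY  (original code is untouched)
```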

Implement a Design

Now that you have some of the ideas worked out in theory, it is time to write your program game.py.

You may assume the user always provides a guess with the available colors, and always in uppercase.

Make and use an integer variable NUM_TURNS that represents the number of allowable turns (say, 10).

generateCode()

To generate the code, write a function generateCode() that generates the codemaker's code and returns it as a string to the caller. That is, this function should randomly generate 4 colored pegs, selected from R, B, G, Y, O, and P, and return them as a 4-letter string. You'll want to use the random functions discussed in Lab 3 to randomly generate a color for each peg. One way to randomly choose each peg is to generate an integer between 0 and 5 inclusive and use if-statements to map each result to one of the 6 colors. For example, if the random number is 0, add an "R" to the code; if it is 1, add a "B"; and so on. Test your generateCode() function thoroughly before continuing. No, seriously, test it before continuing.
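A minimal sketch of the if-statement mapping described above, assuming the standard library's random.randint (which includes both endpoints). The particular number-to-color assignment is arbitrary; any one-to-one mapping of 0 through 5 onto the six colors works.

```python
import random

def generateCode():
    """Return a random 4-letter code built from R, B, G, Y, O, and P."""
    code = ""
    for i in range(4):
        r = random.randint(0, 5)   # a random integer from 0 to 5, inclusive
        if r == 0:
            code = code + "R"
        elif r == 1:
            code = code + "B"
        elif r == 2:
            code = code + "G"
        elif r == 3:
            code = code + "Y"
        elif r == 4:
            code = code + "O"
        else:
            code = code + "P"
    return code
```

To test it, call generateCode() many times and check that every result has exactly 4 letters, each drawn from the six colors, and that all six colors eventually show up.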

clue(code, guess)

Next, write a function clue(code, guess) that prints out the white and black clue pegs according to the given guess and code, and returns True if code equals guess, and False otherwise. Translate the pseudocode above to help you out.

Note that you can "change" the i-th character in a string s to an 'x' as follows:


  s = s[0:i] + "x" + s[i+1:len(s)]
                        
Also note you can omit the len(s) from the above expression. That is, if you write s[i:], Python interprets that as the substring of s from position i to the end. Similarly, s[:i] denotes the substring of s from the beginning up to (but not including) i.
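The slicing trick above can be wrapped in a small helper. The name replaceChar is hypothetical. The helper returns a new string, so remember to assign the result back to a variable.

```python
def replaceChar(s, i, ch):
    """Return a copy of s with the character at index i replaced by ch."""
    return s[:i] + ch + s[i+1:]

code = "RYGY"
code = replaceChar(code, 2, "z")   # mark the third peg as used
print(code)                        # RYzY
```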

Test the Program

It is hard to test your program when you are given a random code that you don't know. Therefore, before you turn in the assignment, you should print out a hint message at the beginning of the program with the actual code, so that the graders know what the correct answer is when evaluating the number of black and white pegs your program provides.

Part 3 - Penguin Data Science!

Gentoo Penguin, credit to Andrew Shiva at Wikipedia, CC-BY-SA 4.0 (no changes)


penguins.py: 12 points, partner encouraged.

One of the benefits of using computers to solve problems is that they can process data very quickly, helping us discover important facts about the real world. In this part of the lab, we will use Python to perform some data science on observations that biologists recorded about three different species of penguins on different islands around Antarctica. In this way, we are using data science as an interdisciplinary approach to answering questions about biodiversity in biology.

Special thanks and credit to Professor Allison Horst at the University of California Santa Barbara for making this data set freely available! More information is available in the accompanying Twitter post and thread, and in the GitHub repository.

Describe the Problem:

Complete the partially written program penguins.py to discover how Adelie, Chinstrap, and Gentoo penguins differ from one another.

Understand the Problem:

Penguin Data

We can represent each penguin as a list of two strings and four measurements: species, home island, bill length (in mm), bill depth (in mm), flipper length (in mm), and body mass (in g).

Thus, one penguin might be represented as a list:


  penguin = ["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0]
                        

and a list of 5 penguins might be represented as a two-dimensional list:


  fivePenguinsList = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0], 
                      ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                      ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0], 
                      ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],  
                      ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
                        

Note: we will read in all the penguin observations from a file, so you do not need to make any assignments like the ones above (they merely illustrate what the data looks like).

The program penguins.py already provides a read_data function for you that reads in all of the penguins from the included penguins_data.csv file and returns a list of individual penguin lists (structured similar to fivePenguinsList above).

Program Goal

Your goal in this program is to use that list of penguins to discover possible differences between the three species of penguins (adelie, chinstrap, and gentoo) based on their data. In particular, your program should contain:

  1. A function findSpecies that takes in a list of penguins (a list of lists) and a species name (a string) as inputs and outputs a smaller list of lists that contains only the penguins that belong to the given species.
  2. A function findMeasurements that takes as input a list of penguins (a list of lists) and an index (an integer), then outputs a new list that contains the value of penguin[index] for each penguin in the input list.
  3. Three functions findAverage, findMin, and findMax that each take in a list of numbers and output the average, minimum, and maximum values of that list, respectively.

We can then call these functions in the main function of your program to do the following:

  1. Ask the user for a species name (either "Adelie", "Chinstrap", or "Gentoo"). Your code should make sure that the user entered a valid option; if not, print a message explaining their mistake and then close the program. (Tip: you might want to provide the user with a menu like we did with filters in Lab 4.)
  2. Ask the user for which type of measurement they want to see (either bill length or body mass). Again, make sure they provide a valid option; if the user did not choose a valid option, you should again print a message explaining their mistake and then close the program.
  3. Based on the species chosen by the user in Step 1, create a list of the penguins that are this species.
  4. Calculate and then print the average, minimum, and maximum of the measurements selected by the user in Step 2 (bill length or body mass) for the list of penguins created in Step 3.

Design an Algorithm:

findSpecies(penguins, species)

For the findSpecies function, we will want to create a new empty list, then loop over all the penguins in the input list (i.e., the entire collection of penguin data). Inside our loop, we will append a penguin to the new (originally empty) list only if that penguin's species matches the one that was passed to the function. Since the species is the first item in each penguin list, we can access it using penguin[0], where penguin is one of the inner lists of the data collection passed into the function. After the loop has considered all of the penguins in the input, we can return the new list, since it will contain exactly the penguins we need.
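The loop just described might look like the following sketch. The variable name matches is only illustrative. The comparison is case-sensitive, so "adelie" would not match "Adelie".

```python
def findSpecies(penguins, species):
    """Return a new list containing only the penguins of the given species."""
    matches = []
    for penguin in penguins:
        if penguin[0] == species:      # index 0 holds the species name
            matches.append(penguin)    # keep the whole penguin list
    return matches
```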

For example, say we have the same five penguins as we did above:


  fivePenguinsList = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                      ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                      ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                      ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                      ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
                        

Then, if I want to create a list of only the adelie penguins from those five, I can call findSpecies(fivePenguinsList, "Adelie"), which should return a list with the two adelie penguins:


  [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
   ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
                        

findMeasurements(penguins, index)

For the findMeasurements function, we will once again want to create an empty list, then loop over all the penguins in the inputted list (i.e., the collection of penguins belonging to a particular species). Inside the loop, we will get the measurement at the given index from the current penguin and append that measurement to the new (originally empty) list. We do not append the entire penguin to the new list. Then, when the loop is finished, we return the new list (which is now a list of number measurements, instead of a list of penguins).
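A sketch of that loop follows. It appends only the single value at the given index, so the result is a flat list of numbers rather than a list of penguins.

```python
def findMeasurements(penguins, index):
    """Return a list holding penguin[index] for every penguin in the input."""
    values = []
    for penguin in penguins:
        values.append(penguin[index])   # append the number, not the whole penguin
    return values
```

With the data layout from earlier, index 2 gives bill lengths and index 5 gives body masses.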

For example, say we have the same five penguins as we did above:


  fivePenguinsList = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0], 
                      ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                      ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0], 
                      ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],  
                      ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
                        

Then, if I want to create a list of all of their bill lengths, I can call findMeasurements(fivePenguinsList, 2), which should return a list:


  [39.1, 37.8, 46.1, 50.0, 46.5]
			

Here, I passed in 2 for the index because in each penguin, 2 is the index of the bill length measurement.

findAverage(nums)

For the findAverage function, we will want to add together all the numbers in the inputted list, then divide that total by the count of numbers in the list and return the result.
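A minimal sketch of that computation, assuming the list is never empty (dividing by len(nums) would fail for an empty list). It accumulates the total with a loop rather than calling a built-in.

```python
def findAverage(nums):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for num in nums:
        total = total + num
    return total / len(nums)   # assumes nums is not empty
```

For the five example bill lengths [39.1, 37.8, 46.1, 50.0, 46.5], the total is 219.5, so the average is about 43.9.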

findMax(nums) and findMin(nums)

For the findMax and findMin functions, the class lecture notes should be helpful. You should not use Python's built-in max and min functions here.
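One common pattern (which may differ from the lecture notes' version) is to treat the first value as the best so far and update it while scanning the rest of the list. The sketch below assumes the list is non-empty, since nums[0] would fail otherwise.

```python
def findMax(nums):
    """Return the largest value in a non-empty list (no built-in max)."""
    largest = nums[0]              # best so far: the first value
    for num in nums[1:]:
        if num > largest:
            largest = num
    return largest

def findMin(nums):
    """Return the smallest value in a non-empty list (no built-in min)."""
    smallest = nums[0]             # best so far: the first value
    for num in nums[1:]:
        if num < smallest:
            smallest = num
    return smallest
```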

Implement a Design

You will implement your solution by adding code to the penguins.py program.

Looping over Penguins

As a hint, here is how to loop over all the individual penguins contained in a list of penguins (i.e., a list of lists):


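for penguin in penguins:
    # penguin is one inner list: [species, island, bill length, bill depth, flipper length, body mass]
    print(penguin[0])  # for example, print this penguin's species
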

Closing a Program

To close a program at any point in the code (e.g., if the user gives us an invalid option), we can use the following line:

sys.exit(-1)

which requires us to add

import sys

at the top of our program.
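Combining sys.exit with the validation steps from the main function, a sketch might look like this. The helper name checkChoice and its message wording are hypothetical. In main you could call something like checkChoice(input("Species: "), ["Adelie", "Chinstrap", "Gentoo"]).

```python
import sys

def checkChoice(choice, options):
    """Return choice if it is one of options; otherwise explain and exit."""
    if choice not in options:
        print("Sorry,", choice, "is not a valid option. Please choose one of:",
              ", ".join(options))
        sys.exit(-1)               # stop the whole program
    return choice
```

The same helper works for the measurement menu (for example, with the options ["bill length", "body mass"]), which avoids writing the check twice.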

Test the Program:

If correctly implemented, you should see the following results:

  Species     Measurement        Min      Average     Max
  Adelie      Bill Length (mm)   32.1     38.7914     46.0
  Adelie      Body Mass (g)      2850.0   3700.6623   4775.0
  Chinstrap   Bill Length (mm)   40.9     48.8338     58.0
  Chinstrap   Body Mass (g)      2700.0   3733.0882   4800.0
  Gentoo      Bill Length (mm)   40.9     47.5049     59.6
  Gentoo      Body Mass (g)      3950.0   5076.0163   6300.0

from which we can observe that adelie and chinstrap penguins have similar body masses, but differ in their bill length. Also, chinstrap and gentoo penguins have similar bill lengths, but different body masses.

Maintain:

Make sure your code is "readable": use short but meaningful variable names and make sure to add your own comments explaining what you are doing.

Part 4 - Wrap Up

Congratulations! You have finished the fifth lab. As with every lab, your last job prior to submission is to complete a brief write-up by filling out a Google Form, which is also how you submit your Honor Code statement (so please do not forget to do this).

Handin

Finally, all you need to do is submit your solution to the assignment. To do so, please click on the "Submit" button in the top right of your programming environment. This will save all of your programs for myself and the graders to look at. You are welcome and encouraged to submit partial solutions to your assignment; you can always resubmit by pushing the "Resubmit" button in the top right, which will save the latest version of your programs. We will only grade the latest version that you submitted. By submitting multiple times, you avoid accidentally forgetting to turn in any completed part of your assignment, especially if you become busy around the lab due date.


A. Eck, T. Wexler, A. Sharp.