CSCI 150: Lab 5
Strings
Due: 10PM on Tuesday, October 15th
The purpose of this lab is to:
- Practice using strings and files
- Play a game!
- Explore a connection between computer science and biology
Before you begin, please create a folder called lab05 inside your cs150 folder. This is where you should put all files made for this lab.
Part 1 - Mind Mastery
game.py: 18 points
Mastermind is a neat (although oftentimes frustrating) puzzle game. It works something like this: There are two players. One player is the codemaker (your program), the other is the codebreaker (the user). The codemaker chooses a sequence of four colored pegs, out of a possible six colors (red, blue, green, yellow, orange, and purple). They may repeat colors and place them in any order they wish. This sequence is hidden from the codebreaker. The codebreaker has 10 chances to guess the sequence. The codebreaker places colored pegs down to indicate each of their guesses. After each guess, the codemaker is required to reveal certain information about how close the guess was to the actual hidden sequence.
Describe the Problem:
In this part of the lab, you will create a program to play Mastermind, where the computer plays the codemaker and the human user is the codebreaker. Thus your program needs to generate a secret code and repeatedly prompt the user for guesses. For each guess, your program needs to give appropriate feedback (more detail below). The game ends when either the user guesses correctly (wins) or uses up 10 guesses (loses).
Understand the Problem:
The trickiest part of this game is determining how to provide feedback on the codebreaker's guesses. In particular, next to each guess that the codebreaker makes, the codemaker places up to four clue pegs. Each clue peg is either black or white. Each black peg indicates a correct color in a correct spot. Each white peg indicates a correct color in an incorrect spot. No indication is given as to which clue peg corresponds to which peg of the guess.
For example, suppose that the code is RYGY (red yellow green yellow). Then the guess GRGY (green red green yellow) would cause the codemaker to put down 2 black pegs (since guesses 3 and 4 were correct) and 1 white peg (since the red guess was correct, but out of place). Note that no peg was given for guess 1 even though there was a green in the code; this is because that green had already been "counted" (a black peg had been given for that one).
As another example, again using RYGY as our code, the guess YBBB would generate 1 white peg and 0 black; yellow appears twice in the code, but the guess only contains one yellow peg. Likewise, for the guess BRRR, only 1 white peg is given; there is an R in the code, but only one.
Check here for an online graphical version of the game (where their red pegs are our black pegs). A sample run of our text-based program may look like this:
Sample output
    python3 game.py
    I have a 4 letter code, made from 6 colours. The colours are R, G, B, Y, P, or O.
    Your guess: GGGG
    Not quite. You get 0 black pegs, 0 white pegs.
    Your guess: YYYY
    Not quite. You get 1 black pegs, 0 white pegs.
    Your guess: YOYO
    Not quite. You get 0 black pegs, 2 white pegs.
    Your guess: PPYO
    Not quite. You get 1 black pegs, 2 white pegs.
    Your guess: POYB
    Not quite. You get 1 black pegs, 3 white pegs.
    Your guess: PBOY
    You win! So clever.
Design an Algorithm:
Once you understand how the game works, you should design a pseudocode plan of attack. The general steps are:
Implement a Design:
Now that you have some of the kinks worked out in theory, it is time to write your program game.py.
You may assume the user always provides a guess with the available colors, and always in uppercase. Make and use an integer constant NUM_TURNS that represents the number of allowable turns (say, 10).
generateCode()
To generate the code, write a function generateCode() that generates the codemaker's code and returns it as a string to the caller. That is, this function should randomly generate 4 colored pegs, selected from R, B, G, Y, O, and P, and return them as a 4-letter string. You'll want to use the random functions discussed in lab03 to randomly generate a color for each peg. In particular, you'll generate an integer between 0 and 5 inclusive, and use if-statements to map each result to one of the 6 colors (if the random number is a 0, add an "R" to the code; if the random number is a 1, add a "B" to the code; etc.). Test your generateCode() function thoroughly before continuing. No, seriously, test it before continuing.
clue(code, guess)
Next, write a function clue(code, guess) that prints out the white and black clue pegs according to the given guess and code, and returns True if code equals guess, and False otherwise. Translate the pseudocode above to help you out.
Note that you can "change" the i-th character in a string s to an 'x' as follows:
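One way to write this, assuming the slice-based form that the following note about len(s) refers to, is sketched here. The names s and i come from the text; the sample values are made up for illustration.

```python
# Strings cannot be modified in place, so build a new string
# from the pieces on either side of position i.
s = "RYGY"   # sample value, chosen only for illustration
i = 2        # position of the character to replace

# Everything before i, then "x", then everything after i.
s = s[0:i] + "x" + s[i+1:len(s)]
print(s)     # RYxY
```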
Also note you can omit the len(s) from the above expression. That is, if you write s[i:], Python interprets that as the substring of s from position i to the end. Similarly, s[:i] denotes the substring of s from the beginning up to (but not including) i.
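To see how this marking trick relates to the clue rules, here is a rough sketch of counting pegs in two passes. The helper name count_pegs and the marker characters "x" and "y" are assumptions made for this illustration; the lab's clue(code, guess) must additionally print the result and return True or False.

```python
def count_pegs(code, guess):
    """Return (black, white) for a guess against a code of the same length."""
    black = 0
    white = 0

    # Pass 1: right color, right spot. Mark both characters as used,
    # with different markers so they can never match each other later.
    for i in range(len(code)):
        if guess[i] == code[i]:
            black += 1
            code = code[:i] + "x" + code[i+1:]
            guess = guess[:i] + "y" + guess[i+1:]

    # Pass 2: right color, wrong spot. Each code peg is counted at most once.
    for i in range(len(guess)):
        if guess[i] != "y":
            j = code.find(guess[i])   # -1 if that color is not left in the code
            if j != -1:
                white += 1
                code = code[:j] + "x" + code[j+1:]

    return black, white


print(count_pegs("RYGY", "GRGY"))   # (2, 1)
print(count_pegs("RYGY", "YBBB"))   # (0, 1)
print(count_pegs("RYGY", "BRRR"))   # (0, 1)
```

Handling the black pegs first matters: it keeps a peg that already earned a black clue from also earning a white one.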
Test the Program:
It is hard to test your program when you are given a random code that you don't know. Therefore, you should print out a hint message at the beginning of the program with the actual code, so that the graders know what the correct answer is when evaluating the number of black and white pegs your program provides.
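A minimal sketch of generating a code and printing such a hint follows. It indexes into a string of colors as a compact stand-in for the chain of if-statements described earlier; the constant names COLORS and CODE_LENGTH and the wording of the hint are assumptions, not requirements of the lab.

```python
import random

COLORS = "RBGYOP"   # assumed mapping: 0 -> R, 1 -> B, 2 -> G, 3 -> Y, 4 -> O, 5 -> P
CODE_LENGTH = 4

def generateCode():
    """Return a random 4-letter code drawn from the six colors."""
    code = ""
    for _ in range(CODE_LENGTH):
        # randint includes both endpoints, so this yields 0 through 5.
        code = code + COLORS[random.randint(0, 5)]
    return code

secret = generateCode()
print("Hint: the code is", secret)   # hypothetical hint message for the graders
```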
Part 2 - Looking for a Match
match.py: 20 points
Files Needed: (download each to your lab05 folder)
As you may know, proteins are chains of molecules called amino acids. There are 20 amino acids, each of which is typically represented by a single letter, and any protein can be specified by its sequence of amino acids. This sequence determines the properties of the protein, including its 3D structure.
Right: 3D structure of a protein. Image source: wikipedia.org.
When a new protein is found, one way in which we might attempt to guess the functionality of that protein would be to see if it contains certain markers common to a known class of proteins. For example (and an entirely bogus example at that), suppose we discover a new protein, that we've named Duane, with the following amino acid sequence:
STTECQLKDNRAWTSLFIHTGHTECA
We may also suspect that Duane might belong to one of two possible classes of proteins: Spiffs and Blorts. As you well know, most Spiffs contain the pattern TECQRKMN or at least something close to it. That is, most of the sequences in the class of Spiff proteins have the subsequence TECQRKMN with only a few of the letters changed. Blorts, meanwhile, have the pattern ALFHHTTGT, or something very similar.
In this case, we can deduce that Duane is most likely a Spiff: Duane contains the pattern TECQLKDN which only has 2 mismatches from TECQRKMN (the errors are marked with a ^ below).
    TECQLKDN
    TECQRKMN
        ^ ^
The closest pattern to the Blort sequence is
    SLFIHTGHT
    ALFHHTTGT
    ^  ^  ^^
which has 4 mismatches.
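The mismatch counts above can be checked with a short loop that compares two equal-length strings position by position. This is only a sketch; the function name count_mismatches is made up for the example.

```python
def count_mismatches(a, b):
    """Count the positions at which two equal-length strings differ."""
    errors = 0
    for i in range(len(a)):
        if a[i] != b[i]:
            errors += 1
    return errors

print(count_mismatches("TECQLKDN", "TECQRKMN"))    # 2
print(count_mismatches("SLFIHTGHT", "ALFHHTTGT"))  # 4
```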
Describe the Problem:
Input: A file that contains a string s representing a protein sequence, along with some number of strings, each representing a marker sequence.
Goal: For each marker sequence, find its best match in the protein sequence and report its location and the number of errors in the match.
Understand the Problem:
The file test.txt is in the format you should expect for your input (and is the file you should use to test your program). In particular, the first line will always contain the protein sequence. Following the protein sequence will be some number of pattern sequences. For each of these sequences, you should report the location of the best match, and the number of errors at that location.
For example, the contents of test.txt are as follows:
    STTECQLKDNRAWTSLFIHTGHTECA
    TECQRKMN
    ALFHHTTGT
    TTECQ
    HT
    ZZZ
    TTZZZRAWT
For this file your program should have something like the following output:
Example Output
    Sequence 1 has 2 errors at position 2.
    Sequence 2 has 4 errors at position 14.
    Sequence 3 has 0 errors at position 1.
    Sequence 4 has 0 errors at position 18.
    Sequence 5 has 3 errors at position 0.
    Sequence 6 has 5 errors at position 5.
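One possible way to arrive at these numbers is to slide each pattern along the protein and keep the first position with the fewest mismatches. The sketch below hard-codes the sequences from test.txt rather than reading the file, and the name best_match is an assumption made for the example.

```python
def best_match(protein, pattern):
    """Return (position, errors) for the first alignment with the fewest errors."""
    best_pos = 0
    best_errors = len(pattern)          # worst case: every letter differs
    for start in range(len(protein) - len(pattern) + 1):
        errors = 0
        for i in range(len(pattern)):
            if protein[start + i] != pattern[i]:
                errors += 1
        if errors < best_errors:        # strict <, so ties keep the earliest position
            best_pos = start
            best_errors = errors
    return best_pos, best_errors

protein = "STTECQLKDNRAWTSLFIHTGHTECA"
patterns = ["TECQRKMN", "ALFHHTTGT", "TTECQ", "HT", "ZZZ", "TTZZZRAWT"]

report = []
for n in range(len(patterns)):
    pos, errors = best_match(protein, patterns[n])
    report.append("Sequence " + str(n + 1) + " has " + str(errors) +
                  " errors at position " + str(pos) + ".")

for line in report:
    print(line)
```

If a pattern is longer than the protein, the outer loop never runs and this sketch reports position 0 with len(pattern) errors; how your own program should treat that case is a design decision left to you.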
Design an Algorithm:
Make sure you come up with a plan of attack (on paper) before you begin coding.
Implement a Design:
Unlike previous assignments in which data was entered by the user or hard-coded into the program, here your data will come from a file. As such, you'll need a few tools for handling files.
Reading from a File
To work with an external file, you'll use the open function:
This function opens the file with the given name and loads it into the specified variable. The <mode> can be either "r" or "w", depending on whether you intend to read from the file or write to the file (you can only do one type of operation at a time). In this case we're just reading from the file, so you'll want something like this:
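A minimal sketch of both modes follows. So that it runs on its own, it first writes a small sample file with mode "w" and then opens that file with mode "r". The variable names path, outFile, and myFile and the sample file name are assumptions; in the lab you would open test.txt from your lab05 folder instead.

```python
import os
import tempfile

# A throwaway sample file, so this sketch does not touch your real test.txt.
path = os.path.join(tempfile.gettempdir(), "lab05_open_sample.txt")

# Mode "w": open for writing (creates the file, or overwrites an existing one).
outFile = open(path, "w")
outFile.write("STTECQLKDN\nTECQ\n")
outFile.close()

# Mode "r": open for reading.
myFile = open(path, "r")
contents = myFile.read()   # the whole file as one string
myFile.close()             # close the file when you are done with it

print(contents)
```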
You can now use functions associated with a file object, including:
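Three such functions, assuming Python's standard file-object methods are the ones meant here, are read(), readline(), and readlines(). The sketch below tries each on a small sample file; the file name and variable names are made up for illustration.

```python
import os
import tempfile

# Build a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.gettempdir(), "lab05_methods_sample.txt")
outFile = open(path, "w")
outFile.write("ABC\nDE\nF\n")
outFile.close()

myFile = open(path, "r")
first = myFile.readline()    # the next line as a string: "ABC\n"
rest = myFile.readlines()    # a list of the remaining lines: ["DE\n", "F\n"]
myFile.close()

myFile = open(path, "r")
everything = myFile.read()   # the entire file as a single string
myFile.close()

print(first[:-1])            # ABC (the slice drops the trailing newline)
```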
Keep in mind that the lines returned by all these functions include the newline character at the end of each line. So if you call the print function on one of these lines, you'll print two newlines (creating a blank line). If you want to avoid this, you can either tell print not to add a newline at the end (end='') or you can just work with all but the last character in the line (myLine[:-1]). You can also use a for loop to iterate through all the remaining lines in the file. For example,
will print the first character and the length of each line in the file. Note that if some lines had already been read before this for-loop, those lines would not be iterated through.
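A sketch of such a loop, assuming a file object named myFile, is given here. It reads one line with readline() first to show that the loop only visits the lines that remain; the sample file and its contents are made up for illustration.

```python
import os
import tempfile

# Build a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.gettempdir(), "lab05_loop_sample.txt")
outFile = open(path, "w")
outFile.write("STTECQ\nTEC\nHT\n")
outFile.close()

myFile = open(path, "r")
protein = myFile.readline()      # consumes "STTECQ\n", so the loop skips it

seen = []
for line in myFile:              # visits only the lines not yet read
    print(line[0], len(line))    # len counts the trailing newline too
    seen.append((line[0], len(line)))
myFile.close()
```

With this sample file the loop prints "T 4" and then "H 3": each length is one more than the number of letters because of the newline character.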
Test the Program:
Try your program on a variety of sequences and inputs. Make sure the program still works when the protein sequence or some pattern sequence is empty.
Maintain:
Make sure your code is "readable": use short but meaningful variable names, use constants where appropriate, use functions where you can, and comment any code that does anything substantial (for example, it would be a good idea to put a comment before your for loops explaining the purpose of the for loops, before each function explaining the function's parameters and purpose, etc.) It is not necessary to comment assignment statements and simple things like that.
You should also handle exceptions so as to make your program robust to runtime errors (e.g. if the user enters a file name that doesn't exist).
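As a rough illustration of the missing-file case, the sketch below wraps open in a try/except block. The helper name openForReading and the error message are assumptions; your program may prefer to re-prompt the user or exit instead.

```python
def openForReading(filename):
    """Return an open file object, or None if the file does not exist."""
    try:
        return open(filename, "r")
    except FileNotFoundError:
        # Raised by open() in Python 3 when no file has the given name.
        print("Could not find a file named", filename)
        return None

myFile = openForReading("no_such_file_lab05.txt")
if myFile is None:
    print("Please check the file name and try again.")
else:
    myFile.close()
```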
Handin
Be sure to hand in what you have finished so far.
Part 3 - Wrap Up
Google Form: 2 points
As with every lab, your last job prior to submission is to complete a brief write-up by filling out a Google Form.
Handin
You now just need to electronically hand in all your files. As a reminder:
    cd             # changes to your home directory
    cd cs150       # goes to your cs150 folder
    handin         # starts the handin program
                   # class is 150
                   # assignment is 5
                   # file/directory is lab05
    lshand         # should show that you've handed in something
You can also specify the options to handin from the command line:
    cd ~/cs150                  # goes to your cs150 folder
    handin -c 150 -a 5 lab05
File Checklist
You should have submitted the following files:
- game.py
- match.py
- test.txt (for ease of grading)
C. Taylor, A. Eck, T. Wexler, A. Sharp.