The purpose of this lab is to:
As we've seen in class, a dictionary, like a list, is one of the built-in data structures supported by Python for representing collections of data. The entries in a dictionary are key-value pairs. A dictionary key can be any immutable type (typically a number or a string), while a dictionary value can be anything at all. We think of a dictionary as something that maps keys to values. In a traditional dictionary, the keys are words, and the values are their definitions. In a Python dictionary, your key-value pairs could be words and definitions, or usernames and passwords, or phone numbers and names.
Unlike a list, dictionaries are not ordered -- they are just collections of key-value pairs. So there is no element 0, element 1, etc. Instead, values in a dictionary are indexed by their keys. If you print a dictionary, the elements will be listed in a seemingly random order. One advantage of using a dictionary is that testing membership or looking up a value in a dictionary is very fast. To evaluate the boolean expression a in myList, Python needs to look through the whole list, so this gets slower and slower as the list grows. Evaluating a in myDictionary, however, is very fast, even for very large dictionaries. Plus, sometimes it's just easier to have values indexed by arbitrary keys.
Here are some examples of syntax involving dictionaries.
score = {} # set score to be an empty dictionary score = {"kirk": 17, "spock": 4} # set score to be a dictionary with two key-value pairs score["spock"] # 4 score["bones"] # error (key not found) "bones" in score # False score["bones"] = 42 # adds key "bones" with value 42 score["spock"] = 18 # updates value for key "spock" to 18 "bones" in score # True del score["kirk"] # remove key "kirk" and its value list(score.keys()) # returns a list of the keys len(score) # 2 (number of keys in score) for k in score : # iterates over the keys in score
20 points, individual
Consider the following text:
Question: Whether nobler mind suffer Slings arrows outrageous fortune Take arms against sea troubles By opposing them.I suspect many of you recognize this as a condensed version of the first few lines of Hamlet's famous "To be, or not to be" soliloquy. The original unedited text is:
To be, or not to be -- that is the question: Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortune Or to take arms against a sea of troubles And by opposing end them.
The former was produced by finding the 30 most commonly used words in the speech (only some of which is shown above) and removing them. Your first challenge is to write a program called distill.py that prompts the user for the name of a text file and a number n, and prints the contents of that text file with the n most common words removed.
The details of implementing a solution are up to you, but here is a suggested outline of how to approach the problem. As usual, think about the 6 steps of program development, and test each piece as you go.
textfile = open('hamlet.txt','r') textstring = textfile.read()
A set is another built-in data structures supported by Python for the mathematical notion of a set, i.e. a collection of elements. Unlike a dictionary, the elements in a set don't have values associated with them. You could simulate a set using a dictionary, but adding a key for each element, and setting that key's value to something arbitrary, like 0, or an empty string, or none. That said, if you don't have data associated with each element, and simply whant to keep track of a set of items, using a set is the way to go.
Like dictionarys (and unlike lists), sets are not ordered, but testing membership and addinging or removing elements is very fast. Sets do not store duplicate elements: adding an element to a set that already contains that element has no effect.
Here are some examples of syntax involving sets.
team = set() # makes a set with 0 elements team = {"kirk", "spock"} # makes a set with 2 elements len(team) # 2 team.add("bones") # adds "bones" to team team.remove("kirk") # removes "kirk" from team for p in team : # iterates through elements of team "bones" in team # True "malcolm" in team # False "river" not in team # True
20 points, partner
An anagram is just a rearrangement of the letters in the word to form another word or words. For example, if you rearrange the letters in
oberlin studentyou can get
let none disturbor
intends troubleand many many more. But I'm fairly confident that those are probably the best ones.
For this part of the assignment, you'll be writing a program called anagrams.py to generate your own anagrams. To decide which anagrams are at least plausibly interesting, your program will have to decide which strings are legitimate words. Your program should prompt the user for a file containing a word list, and a word for anagramming.
Once we get the basic functionality down, we'll add a few other features. For example, we'd like to allow the user to specify that they are only interested in using words of length 4 or greater, or anagrams containing at most 3 words.
contains("zombiepig", "bozo") # returns False, "" contains("zombiepig", "biz") # returns True, "omepig"
You might be wondering why we're passing around the variable sofar. Indeed, when we want to find the anagrams of a string given by the user, we'll pass in an empty list. However, that list will be critical for making use of recursion. Let's look at an example to see why. Suppose we want to find anagrams of
robopirateWe'll look through our wordlist for words that are contained in this string. The string "cat" doesn't appear in "robopirate", but "air" does. So one thing our function call will do is begin looking through the remainder of "robopirate" with "air" removed, looking for further anagrams. That is, it'll continue to look for strings contained in
robopteOur list includes "bro", which is contained in "robopte", so another recursive call will be made on the remains, namely "opte". Our wordlist contains "poet", leaving us with an empty string. At this point we've used up all the letters in the string, so we have an anagram, namely
air bro poetUnfortunately, if we want to print our anagram, we're in trouble, since we haven't kept a record of the previous words we found. (Could we have just printed words as we found them?) That's where sofar comes in. This list will track the words we've found so far in this particular branch of the recursion. That is,
grams("robopirate", words, [])will call (among other things)
grams("robopte", words, ["air"])which in turn calls
grams("opte", words, ["air", "bro"])which in turn calls
grams("", words, ["air", "bro", "poet"])which can now print the complete anagram.
With this in mind, we're ready to describe the overall structure of this function. We loop over every word w in our wordlist. For each word w that's found in our string s, we make a recursive call on the remainder of s, and with a new list, equal to the current list with w added on. Make sure you are not changing sofar. (Why does it need to be a new list? I.e., why would sofar.append(w) be the wrong thing to do?) If we make a call where s is the empty string, we can just print the contents of sofar.
s.replace(ch,'',1)creates a new string that is identical to s except the first occurence (parameter 3) of ch (parameter 1) is replaced by the empty string (parameter 2).
" ".join(L)returns a new string containing every element of the list L, glued together with a space. Of course, you could join the elements with any string, but a space makes the most sense here.
or bro rat bat air ape poet poor ripe taboo orbitNote that this doesn't contain "rabbit" ("robotpirate" only has one "b"). If you run your program using words1.txt for your word list on the string "robopirate", you should get
or ape orbit or orbit ape bro air poet bro poet air air bro poet air poet bro ape or orbit ape orbit or poet bro air poet air bro orbit or ape orbit ape orIt doesn't matter if your output is in another order.
Add a "preprocessing" step: Instead of adding every string found in the word file to the set words, only add those that are contained in the user's input string. Fortunately you already have created a function that can help you out here.
If you followed the Honor Code in this assignment, insert a paragraph attesting to the fact within one of your .py files.
I affirm that I have adhered to the Honor Code in this assignment.
You now just need to electronically handin all your files. As a reminder
% cd # changes to your home directory % cd cs150 # goes to your cs150 folder % handin # starts the handin program # class is 150 # assignment is 10 # file/directory is lab10 % lshand # should show that you've handed in something
You can also specify the options to handin from the command line
% cd ~/cs150 # goes to your cs150 folder % handin -c 150 -a 10 lab10
distill.py anagrams.py