concordance.py: 34 points

In your GitHub repository for this lab, you will find the following test files for your program:

  • Jabberwocky.txt [file] - Lewis Carroll’s Jabberwocky
  • LoveAndTheButterfly.txt [file] - Alice Moore Dunbar-Nelson’s Love and the Butterfly
  • Prufrock.txt [file] - T.S. Eliot’s The Love Song of J. Alfred Prufrock
  • Test.txt [file] - a file for testing your line numbering

Notably, Test.txt should give you the following output:

eight 8 8 8 8 8 8 8 8
five 3 5 5 5 5 5
one 1
three 3 3 3 5
I found 8 lines containing 4 unique words.

If your output differs, the problem is either in your line numbering or in the way you strip punctuation. The other files are mainly useful for checking punctuation handling: they use many different punctuation characters, and you should remove all leading and trailing punctuation from each word. Look carefully at your output.
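As a rough illustration of stripping leading and trailing punctuation, here is a minimal sketch. The helper name strip_punctuation and the extra curly-quote and dash characters are assumptions for illustration, not part of the lab's starter code; check the test files to see which characters actually occur.

```python
import string


def strip_punctuation(word):
    """Remove punctuation from both ends of word, leaving the interior alone."""
    # string.punctuation covers ASCII punctuation only. The curly quotes and
    # em dash below are an assumption about what the text files may contain.
    extra = "\u2018\u2019\u201c\u201d\u2014"
    return word.strip(string.punctuation + extra)


print(strip_punctuation('"Beware!"'))  # Beware
print(strip_punctuation("don't,"))     # don't
```

Because str.strip() only removes characters from the two ends, an apostrophe inside a word such as don't is left in place.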


If you see what appears to be a blank word followed by line numbers, here is the likely cause. The split() method separates a string into words using whitespace as the delimiter, so some “words” may be nothing but punctuation characters, such as “!!!”. Stripping the punctuation from such a word leaves the empty string. Before you add a word and its line number to the concordance, check whether the word is the empty string; if it is, don’t add it.
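The empty-string check described above might look something like the following sketch. The function name build_concordance, the use of a dictionary of lists, and the lowercasing of words are all assumptions made for illustration; your own program may be organized differently.

```python
import string


def build_concordance(lines):
    """Map each word to the list of line numbers on which it appears."""
    concordance = {}
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            # Lowercasing is an assumption; the lab may or may not require it.
            word = token.strip(string.punctuation).lower()
            if word == "":
                # The token was nothing but punctuation, so skip it.
                continue
            concordance.setdefault(word, []).append(line_number)
    return concordance


sample = ["One!", "", "!!! three -- three"]
for word, numbers in sorted(build_concordance(sample).items()):
    print(word, *numbers)
# one 1
# three 3 3
```

Here the tokens “!!!” and “--” both become the empty string after stripping, so neither is recorded.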


If you want to play with your concordance, here are a few additional files you might work with (you can download these, then upload them to your Codespace by dragging and dropping them into the Explorer pane on the left side of the screen):

  • Beowulf.txt [file] - translated to modern English by Hall
  • Frankenstein.txt [file] - the entire text of Mary Wollstonecraft Shelley’s novel
  • DavidCopperfield.txt [file] - all 626 pages of the Dickens novel
  • Inferno.txt [file] - the first third of Dante’s Divine Comedy, translated by Norton
  • KingLear.txt [file] - the Shakespeare play
  • Republic.txt [file] - Plato’s Republic