concordance.py: 34 points

In your Replit project for this lab, you will find the following test files for your program:

  • Jabberwocky.txt - Lewis Carroll’s Jabberwocky
  • LoveAndTheButterfly.txt - Alice Moore Dunbar-Nelson’s Love and the Butterfly
  • Prufrock.txt - T.S. Eliot’s The Love Song of J. Alfred Prufrock
  • Test.txt - a file for testing your line numbering

Notably, Test.txt should give you the following output:

eight 8 8 8 8 8 8 8 8
five 3 5 5 5 5 5
one 1
three 3 3 3 5
I found 8 lines containing 4 unique words.

If you get different output, the problem is either in your line numbering or in the way you strip punctuation. The other files are mainly useful for checking punctuation handling: they use many different punctuation characters, and you should remove all of them. Look carefully at your output.
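One possible way to strip punctuation is sketched below. The helper name clean_word and the EXTRA_PUNCTUATION string are assumptions made for illustration, not part of the assignment. Python's string.punctuation covers only ASCII characters, so a few typographic characters (curly quotes, dashes, the ellipsis) are added by hand; the test files may contain others. The sketch also assumes that punctuation is removed only from the ends of a word and that words are lowercased, as the sample output for Test.txt suggests.

```python
import string

# Assumption: ASCII punctuation plus a few typographic characters that
# often appear in literary texts (curly quotes, en/em dashes, ellipsis).
EXTRA_PUNCTUATION = "\u2018\u2019\u201c\u201d\u2013\u2014\u2026"
ALL_PUNCTUATION = string.punctuation + EXTRA_PUNCTUATION


def clean_word(raw):
    """Return raw in lowercase with leading and trailing punctuation removed."""
    return raw.strip(ALL_PUNCTUATION).lower()


print(clean_word("\u201cBeware"))   # beware
print(clean_word("Jabberwock!"))    # jabberwock
print(repr(clean_word("!!!")))      # ''
```

Because str.strip only trims the ends, an apostrophe inside a word such as "don't" is kept. If your output still shows stray characters attached to words, the character is probably missing from the set being stripped.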


If you see what appears to be a blank word followed by line numbers, here is the likely cause. The split() method separates a string into words using whitespace as the delimiter, so some “words” may be nothing but punctuation characters, such as “!!!”. Stripping the punctuation from such a word leaves the empty string. Before you add a word and its line number to the concordance, check whether the word is the empty string; if it is, do not add it.
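The empty-string check described above might look like the following minimal sketch. The function name build_concordance, the dictionary-of-lists layout, and the sample lines are all hypothetical, and only ASCII punctuation is stripped here to keep the example short.

```python
import string


def build_concordance(lines):
    """Map each cleaned word to the list of line numbers where it occurs."""
    concordance = {}
    # Line numbers start at 1, not 0.
    for line_number, line in enumerate(lines, start=1):
        for raw in line.split():
            word = raw.strip(string.punctuation).lower()
            if word == "":
                # The "word" was only punctuation, such as "!!!"; skip it.
                continue
            concordance.setdefault(word, []).append(line_number)
    return concordance


# Hypothetical input; the second line contains no real words.
sample = ["One fish, two fish!", "!!! ...", "Red fish."]
result = build_concordance(sample)
for word in sorted(result):
    print(word, *result[word])
# fish 1 1 3
# one 1
# red 3
# two 1
```

Without the check, the second sample line would add an entry for the empty string, which prints as a blank word followed by the line number 2 twice.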


If you want to experiment further with your concordance, here are a few additional files to try. Download them, then upload them to your Replit project using the Files pane on the left side of the screen:

  • Beowulf.txt - translated to modern English by Hall
  • Frankenstein.txt - the entire text of Mary Wollstonecraft Shelley’s novel
  • DavidCopperfield.txt - all 626 pages of the Dickens novel
  • Inferno.txt - the first third of Dante’s Divine Comedy, translated by Norton
  • KingLear.txt - the Shakespeare play
  • Republic.txt - Plato’s Republic