In this final part of the lab, we will use our MyHashMap to implement an n-order Markov chain model that learns how to generate new text after training on existing text from real authors. In particular, we will extend our idea of a first-order Markov chain, developed in the DNASequence.java file in the Warmup, to work for any n-order Markov chain model. Recall that an n-order Markov chain learns the probability of each possible next state (e.g., a character of text) based on the history of the n most recent states (e.g., the most recent String of n characters). After learning these probabilities, we can then use them to randomly generate new text that will hopefully look like it was written by a person!
We have provided you with example text files in the samples folder of your GitHub repository. Example runs of the program could generate the following text:
THE COMEDY OF ERRORS
ACT I. Scene I.
Elsinore. hall in the palace.
Flourish. Exeunt.
SCENE IV.
A Hall in the Castle. Another part of the world, I give to negligence,
And hear at large discourse to your friends, I'll leave you till bed time.
My present business and desires shall pay the sum for him,
He shall kill two of vs, and men are
onelie turned into tongue, will speak to you like of Paris loue?
Iul. No Madam, we haue cul'd such necessaries are embark'd. Farewell. Exit [Polonius].
Exit.
Ham. Good.
King. But where's the commission; read it at more leisure,
And now no soil nor cautel doth besmirch
The very firstlings of my wife withal;
And so of these. Which is the natural touch; for the wealth that he hath lam'd me,
I shall be cal'd assurance double sure,
Still am I call'd. Unhand me, gentlemen, he hath bid me to interpret between you and your behests, and am enioyn'd
By holy Lawrence, to fall prostrate he
We have provided you with some starter code in the MarkovChain.java file. Your objective in this part of the lab is to fill in the missing pieces so that you can generate text similar to the above.

The first step is to add two instance variables to the MarkovChain class:

- transitions, a MyHashMap<String, Transitions<Character>> that will store the learned transition counts
- order, an integer storing the order n of the model

The class has a single constructor whose instructions need to be filled in: you should save the order parameter into the this.order instance variable, and you should create a new MyHashMap for the this.transitions instance variable.
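As a minimal sketch, assuming the starter code's constructor takes the order as its only parameter:

```java
public MarkovChain(int order) {
    this.order = order;                    // save the order n of the model
    this.transitions = new MyHashMap<>();  // empty map from history to transition counts
}
```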
ReadMe

To earn full credit, you should use your MyHashMap to complete this part of the lab. However, if your MyHashMap is not quite working, you can still earn partial credit by using Java's built-in HashMap class, since one of the goals of this part is to gain practice using maps to solve interesting real-world problems.
Similar to the Warmup's trainCounts method, you should implement the logic for training the Markov chain transition probabilities in the train and updateCount methods.
Your train method should loop through the provided String of input text to find all history and nextCharacter pairs. Start by using the first n = this.order characters as history and the following character as nextCharacter, and pass these two arguments into the updateCount method. A sketch of this loop appears after the example below.
Hint

The String class's substring and charAt methods should be really useful here.
To illustrate, imagine that input = "The quick brown fox jumps over the lazy dog" and this.order = 5. To start:

history = "The q" = input.substring(0, 5)
nextCharacter = 'u' = input.charAt(5)

In the second iteration, shift everything over to the right by one position in the input String. In our illustrative example:

history = "he qu" = input.substring(1, 6)
nextCharacter = 'i' = input.charAt(6)

In the third iteration, shift over to the right by one more position in the input String. In our illustrative example:

history = "e qui" = input.substring(2, 7)
nextCharacter = 'c' = input.charAt(7)
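Putting the sliding window together, here is a minimal sketch of what train might look like, assuming updateCount has the two-argument form described below:

```java
public void train(String input) {
    // Slide a window of n = this.order characters across the input,
    // stopping while there is still one more character after the window.
    for (int i = 0; i + this.order < input.length(); i++) {
        String history = input.substring(i, i + this.order);
        char nextCharacter = input.charAt(i + this.order);
        updateCount(history, nextCharacter);
    }
}
```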
Inside the updateCount method, you should increase the transition count for observing nextCharacter within the Transitions<Character> stored for history in the this.transitions MyHashMap.
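One possible sketch, assuming your MyHashMap supports get and put like Java's built-in HashMap, and that Transitions<Character> has a no-argument constructor and an increment method (hypothetically named addCount here; use whatever your Warmup's Transitions class actually provides):

```java
private void updateCount(String history, char nextCharacter) {
    Transitions<Character> counts = this.transitions.get(history);
    if (counts == null) {
        // First time observing this history: start a fresh set of counts.
        counts = new Transitions<>();
        this.transitions.put(history, counts);
    }
    counts.addCount(nextCharacter);  // hypothetical name for the increment method
}
```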
Finally, after we have learned the transition probabilities, we want to be able to generate new text, similar to the Warmup's generate method. The process here is very similar, except we start the new String with all of the characters in the start parameter. Also, each time through the loop used to generate the next character, we update current to be the last n = this.order characters of the generated String (again, the substring method from the String class could be helpful here).
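As a rough sketch, assuming a generate(String start, int length) signature and that Transitions<Character> offers some way to randomly pick a next character in proportion to its learned counts (hypothetically named sample here):

```java
public String generate(String start, int length) {
    StringBuilder text = new StringBuilder(start);  // begin with the start characters
    String current = start;  // assumes start has at least n characters
    while (text.length() < length) {
        Transitions<Character> counts = this.transitions.get(current);
        if (counts == null) {
            break;  // no transitions learned for this history; stop early
        }
        text.append(counts.sample());  // hypothetical weighted-random pick
        // The new history is the last n characters generated so far.
        current = text.substring(text.length() - this.order);
    }
    return text.toString();
}
```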
Once you have all of the functionality implemented, try experimenting with different parameters for the three command-line arguments: (1) the order of the model, (2) the length of the String to generate, and (3) the path to the file you want to use for training.
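For example, an invocation might look like the following (the file name is illustrative; use any text file from the samples folder):

```
java MarkovChain 5 1000 samples/hamlet.txt
```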
Especially try experimenting with different orders of the model. What trends do you observe as the order becomes larger and smaller?
Don't forget to frequently save your progress to GitHub using the add/commit/push commands with git.