The final part of this lab explains how to find paths between any two actors in your Graph
created from IMDB data.
Since we have an unweighted graph, breadth first search (BFS) is a great choice for finding the shortest paths between any two vertices (i.e., actors). We will call one actor the center
(e.g., ”Kevin Bacon”
) and the other actor the goal
(e.g., ”Christina Ricci”
). You should implement BFS in your BaconNumber
class in a method:
public List<String> findPath(Vertex center, Vertex goal)
Recall from lecture that the pseudocode for BFS is:
frontier
Queue
explored
Set
center
in the frontier
vertex
from the frontier
vertex
to explored
neighbor
of the current vertex
neighbor
is goal
, return a path from center
to neighbor
neighbor
is not in explored
and neighbor
is not in frontier
, then enqueue neighbor
in frontier
Hint
In Java, the LinkedList<T>
class implements the Queue<T>
interface (see Lab 5 for a reminder of important methods in the Queue
interface).
Also, the HashSet<T>
class implements the Set<T>
interface, which has methods add(T item)
to add an item
and remove(T item)
to remove an item
from the set.
If you use a HashSet
, it would be helpful for your Vertex
class to implement the
public int hashCode()
and
public boolean equals(Object obj)
methods that are used by HashSet
collections to achieve constant time add and remove operations.
Visual Studio Code can create these methods for you if you right click anywhere in the Vertex.java
file and select “Source Action” from the right click menu. Then, select “Generate hashCode() and equals()”. Check only the box next to “name: String” and VSC will calculate hashcodes and equality between two Vertex
objects based on their name
instance variables (i.e., actor names).
Hint
Checking whether a Vertex
is already in the frontier
Queue
is a linear time operation, which will slow down your program for large input files.
To speed this up, we can create a second HashSet
called frontierSet
in Step 2 that will only store the Vertex
objects that are currently on the frontier
. Each time you enqueue a Vertex
into frontier
, also add the same Vertex
to frontierSet
. Then, each time you dequeue a Vertex
from frontier
, you can also remove the same Vertex
from frontierSet
.
You can create the path from the Vertex center
to the Vertex goal
however you want in your findPath
method that implements the BFS algorithm. How you do so will depend on how you implemented your Vertex
class and how you represent edges in the Graph
.
Hint
If you have an Edge
class to store the label of an edge between two Vertex
objects, then one approach is to create a HashMap<Vertex, Edge>
at the start of your findPath
method, then when you loop over each neighbor
of vertex
, you can save the edge between vertex
and neighbor
into this HashMap<Vertex, Edge>
with neighbor
as the key.
To create the path from goal
back to center
, you can then follow the edges from goal
until you reach the center
.
The path itself should be a sequence of String
s starting with goal.name
and ending with center.name
. The String
s should alternate between actor name
s and a TV show or movie title
that connects that actor with the next. Each transition between name
and title
or title
and name
should print an arrow ->
.
For example, using the input file imdb_small.txt
, one shortest path between goal = “Christina Ricci”
and center = “Kevin Bacon”
) is:
Christina Ricci -> Man Who Cried, The (2000) -> Cate Blanchett -> Life Aquatic with Steve Zissou, The (2004) -> Bill Murray -> Wild Things (1998) -> Kevin Bacon
ReadMe
Because two actors might have acted in multiple TV shows or movies together, there might be multiple equivalent shortest paths between two actors, meaning that the specific actor name
s and specific TV show or movie title
s in the middle of your path could differ from someone else’s, depending on how you each implemented your Graph
representations. However, the length of the shortest path between any two actors should be consistent across all representations for a given input file of IMDB data.
Finally, to run your program to be able to find different paths between different actors, your BaconNumber
class should have a method:
public static void main(String[] args)
where the first command line argument is the name of the text file to use as input to construct your Graph
, and the second command line argument is the name of the goal
actor for your search (using ”Kevin Bacon”
as your center
actor). Your program should also take an optional third command line argument that allows the user to specify a different center
actor.
The program should then:
Graph
object (following the process described in Part 2)center
and goal
”No path found"
.We have provided you with five input files with different sizes of graphs. They are:
imdb_small.txt
: a connected graph with 161 vertices and 3,396 edgesimdb_movies_medium.txt
: a graph with 17,017 vertices and 71,383 edges, created using all movies in IMDB with at least 10,000 ratingsimdb_withtv_medium.txt
: a graph with 25,300 vertices and 143,757 edges, created using all movies and TV shows in IMDB with at least 10,000 ratingsimdb_movies_large.txt
: a graph with 199,406 vertices and 1,064,902 edges, created using all movies in IMDB with at least 100 ratingsimdb_withtv_large.txt
: a graph with 292,411 vertices and 2,511,899 edges, created using all movies and TV shows in IMDB with at least 100 ratingsReadMe
Because actor names are often multiple words, we need to enclose them in quotation marks so that Java interprets an entire name as a single command line argument. For example, to produce the path example above, I ran the program as:
java BaconNumber imdb_small.txt "Christina Ricci" "Kevin Bacon"
Once you have all of the functionality implemented, try experimenting with different parameters for the three command line arguments
Some questions to ponder (but you do not have to submit answers):
Don’t forget to frequently save your progress to GitHub using the add
/commit
/push
commands with git.