Finding Paths
The final part of this lab explains how to find paths between any two actors in your Graph
created from IMDB data.
Breadth First Search
Since we have an unweighted graph, breadth first search (BFS) is a great choice for finding the shortest paths between any two vertices (i.e., actors). We will call one actor the center
(e.g., ”Kevin Bacon”
) and the other actor the goal
(e.g., ”Christina Ricci”
). You should implement BFS in your BaconNumber
class in a method:
public List<String> findPath(Vertex center, Vertex goal)
Recall from lecture that the pseudocode for BFS is:
- Create an empty
frontier
Queue
- Create an empty
explored
Set
- Enqueue the
center
in thefrontier
- While the frontier is not empty:
- Dequeue
vertex
from thefrontier
- Add
vertex
toexplored
- Loop over each
neighbor
of the currentvertex
- If
neighbor
isgoal
, return a path fromcenter
toneighbor
- Else if
neighbor
is not inexplored
andneighbor
is not infrontier
, then enqueueneighbor
infrontier
- If
- Return no path exists (e.g., an empty path)
Hint
In Java, the LinkedList<T>
class implements the Queue<T>
interface (see Lab 5 for a reminder of important methods in the Queue
interface).
Also, the HashSet<T>
class implements the Set<T>
interface, which has methods add(T item)
to add an item
and remove(T item)
to remove an item
from the set.
If you use a HashSet
, it would be helpful for your Vertex
class to implement the
public int hashCode()
and
public boolean equals(Object obj)
methods that are used by HashSet
collections to achieve constant time add and remove operations.
Visual Studio Code can create these methods for you if you right click anywhere in the Vertex.java
file and select “Source Action” from the right click menu. Then, select “Generate hashCode() and equals()”. Check only the box next to “name: String” and VSC will calculate hashcodes and equality between two Vertex
objects based on their name
instance variables (i.e., actor names).
Hint
Checking whether a Vertex
is already in the frontier
Queue
is a linear time operation, which will slow down your program for large input files.
To speed this up, we can create a second HashSet
called frontierSet
in Step 2 that will only store the Vertex
objects that are currently on the frontier
. Each time you enqueue a Vertex
into frontier
, also add the same Vertex
to frontierSet
. Then, each time you dequeue a Vertex
from frontier
, you can also remove the same Vertex
from frontierSet
.
Creating Paths
You can create the path from the Vertex center
to the Vertex goal
however you want in your findPath
method that implements the BFS algorithm. How you do so will depend on how you implemented your Vertex
class and how you represent edges in the Graph
.
Hint
If you have an Edge
class to store the label of an edge between two Vertex
objects, then one approach is to create a HashMap<Vertex, Edge>
at the start of your findPath
method, then when you loop over each neighbor
of vertex
, you can save the edge between vertex
and neighbor
into this HashMap<Vertex, Edge>
with neighbor
as the key.
To create the path from goal
back to center
, you can then follow the edges from goal
until you reach the center
.
The path itself should be a sequence of String
s starting with goal.name
and ending with center.name
. The String
s should alternate between actor name
s and a TV show or movie title
that connects that actor with the next. Each transition between name
and title
or title
and name
should print an arrow ->
.
For example, using the input file imdb_small.txt
, one shortest path between goal = “Christina Ricci”
and center = “Kevin Bacon”
) is:
Christina Ricci -> Man Who Cried, The (2000) -> Cate Blanchett -> Life Aquatic with Steve Zissou, The (2004) -> Bill Murray -> Wild Things (1998) -> Kevin Bacon
README
Because two actors might have acted in multiple TV shows or movies together, there might be multiple equivalent shortest paths between two actors, meaning that the specific actor name
s and specific TV show or movie title
s in the middle of your path could differ from someone else’s, depending on how you each implemented your Graph
representations. However, the length of the shortest path between any two actors should be consistent across all representations for a given input file of IMDB data.
Interacting with the Program
Finally, to run your program to be able to find different paths between different actors, your BaconNumber
class should have a method:
public static void main(String[] args)
where the first command line argument is the name of the text file to use as input to construct your Graph
, and the second command line argument is the name of the goal
actor for your search (using ”Kevin Bacon”
as your center
actor). Your program should also take an optional third command line argument that allows the user to specify a different center
actor.
The program should then:
- Read in the IMDB data from the input file and create a
Graph
object (following the process described in Part 2) - Find a shortest path between the desired
center
andgoal
- Print the shortest path if one was found, otherwise print
”No path found"
.
We have provided you with five input files with different sizes of graphs. They are:
flights.txt
: a connected graph with 6 vertices and 10 edges, representing the flights from the warmupimdb_small.txt
: a connected graph with 161 vertices and 3,396 edgesimdb_movies_medium.txt
: a graph with 17,017 vertices and 71,383 edges, created using all movies in IMDB with at least 10,000 ratingsimdb_withtv_medium.txt
: a graph with 25,300 vertices and 143,757 edges, created using all movies and TV shows in IMDB with at least 10,000 ratingsimdb_movies_large.txt
: a graph with 199,406 vertices and 1,064,902 edges, created using all movies in IMDB with at least 100 ratingsimdb_withtv_large.txt
: a graph with 292,411 vertices and 2,511,899 edges, created using all movies and TV shows in IMDB with at least 100 ratings
These are the edge counts of one implementation of a graph. Your edge counts may differ a bit depending on your implementation.
README
Because actor names are often multiple words, we need to enclose them in quotation marks so that Java interprets an entire name as a single command line argument. For example, to produce the path example above, I ran the program as:
java BaconNumber imdb_small.txt "Christina Ricci" "Kevin Bacon"
Experimenting!
Once you have all of the functionality implemented, try experimenting with different parameters for the three command line arguments
Some questions to ponder (but you do not have to submit answers):
- Can you find any actors who are not connected so that there is no path between them?
- What two actors have the longest length of shortest path between them? What is the length of that path?
- How does the stopwatch runtime of your program change as you use smaller or larger input files?
Committing Your Progress
Don’t forget to frequently save your progress to GitHub using the add
/commit
/push
commands with git.