IMDB Data
Application Overview
In this application portion of this lab, we will be implementing a program that can find the shortest paths between two actors (used as a gender-neutral term) based on data from IMDB, where each vertex represents an actor and each edge represents a movie or TV show in which two actors acted.
In class, we have been discussing how graph structures can be used to represent relationships between groups of objects. For this assignment, our program will allow us to play the Kevin Bacon Game. A person’s Bacon Number is computed based on the number of movies of separation between that person and the actor Kevin Bacon. For example, if an actor is Kevin Bacon, then their Bacon Number is 0. If an actor was in a movie with Kevin Bacon, then their Bacon Number is 1. If an actor was not in a movie with Kevin Bacon, but they were in a movie with someone who was, then their Bacon Number would be 2. In short, an actor’s Bacon Number is one greater than the smallest Bacon Number of any of their co-stars.
For fun and some additional background, you can try out the Oracle of Bacon at the University of Virginia.
Data Format
We have provided you with several text files containing data about actors and the movies and TV shows in which they have acted from IMDB. These files are in the format:
<Actor Name> <Title>
where the actor’s name
and the title
of a TV show or movie that they acted in is separated by a tab character \t
. For example, the line:
Kevin Bacon Apollo 13 (1995)
indicates that actor ”Kevin Bacon”
was in the movie ”Apollo 13 (1995)”
.
Hint
To parse this line, refer back to how we parsed the book information in Lab 7. To split a line by the tab character \t
, we can use the following code:
String[] split = line.split("\t");
String name = split[0];
String title = split[1];
Application Instructions
In Visual Studio Code, create a new class BaconNumber
in a file BaconNumber.java
. Once again, you are free to implement this class however you like, but a good suggestion would be to have a private Graph
instance variable.
Creating Vertices and Edges
To create the contents a Graph
object using the data from an input file, the following process might be useful to implement in a private helper method of your BaconNumber
class:
- Create a
Set<String>
to store the unique actorname
s - Create a
Set<String>
to store unique TV show and movietitle
s - Create a
Map<String, List<String>
that will take an actor’sname
as the key and stores a list of TV show and movietitle
s for that actor - Create a
Map<String, List<String>
that will take a TV show or movietitle
as the key and stores a list of actors’name
s that were in that TV show or movie - Read the input file line by line, parsing the
name
andtitle
on each line
- add the
name
andtitle
to theSet<String>
s from Steps 1 and 2 - add the
title
to the list of TV shows and movies for thename
key in theMap<String, List<String>>
from Step 3 - add the
name
to the list of actors for thetitle
key in theMap<String, List<String>>
from Step 4
- After completing the loop in Step 5, create a
Vertex
object for each unique actor and add it to theGraph
using theaddVertex
method from Part 1 - Create an edge for every two actors that acted in the same TV show or movie.
- Loop over each actor
Vertex
and get the list of TV show and movietitle
s from theMap<String, List<String>>
created in Step 3 - Use an nested loop to loop over the TV show and movie
title
s in the list from the previous bullet point. Get the list of actor names for the currenttitle
from theMap<String, List<String>>
created in Step 4. - Use a doubly nested loop to loop over the actor
name
s in the list from the previous bullet point. Add an edge between the currentvertex.name
(from the outer loop) and the currentname
(from the doubly nested loop), using the thetitle
of the nested loop as the edge label. You can add such an edge to theGraph
using theaddEdge
method from Part 1
Testing Our Progress
If your tests in Part 1 were a success, then your addVertex
and addEdge
methods should be working well here.
You might want to add a public static void main(String[] args)
method to your BaconNumber
class and try reading in some test data, such as the imdb_small.txt
text file provided in your GitHub repository. This file contains 3,396 edges between 161 vertices/actors to create a connected graph. To verify your implementation so far, you can try reading in this input data, then count whether the number of vertices and edges match the expected counts. You could also use the Debugger to verify that some of your edge representations were created correctly.
Committing Your Progress
Don’t forget to frequently save your progress to GitHub using the add
/commit
/push
commands with git.