CS-151 Labs > Lab 2. Object-Oriented Programming


Part 1. Testing Collections

In this part of the lab, we will explore polymorphism. One of the great advantages of having an abstract superclass or super-abstract interface is that we can pass the subclasses around as if they were instances of that abstract superclass, as long as we only use methods that are defined in the superclass.

In this part, we will be using two Java classes ArrayList and TreeSet. Both of these classes implement interface Collection.

Because both ArrayList and TreeSet are subclasses of the Collection interface, they are guaranteed to have all the methods that are defined by Collection. However they have completely different internals and that leads to different performances for different operations. We are interested in comparing the performance of add() and contains() methods. Look at the documentation for the Collection interface to learn how to call these two methods and what they return.

The internal implementation details of ArrayList and TreeSet are not important at the moment, but we will cover them in-depth later in the semester. In a nutshell, the main difference between the two collections is that in the ArrayList the underlying data structure is an Array. In TreeSet it is a sophysticated datastructure that maintains elements in sorted order.

The add() method adds a new element to each collection. As you already know, it is relatively easy to add an element to the end of the ArrayList - in most cases this requires just a single operation. InTreeSet the order of elements must be preserved and the insertion of a new element takes O(log n) steps. So we expect the add() to be slower in the TreeSet than in ArrayList.

On the other hand, because the elements in ArrayList do not guarantee any specific order, the search for a given element may need to loop through all the array elements, which is an O(n) operation. In the TreeSet the elements are sorted and the search takes O(log n) steps. Hence we expect the method contains() to run faster for the TreeSet than for the ArrayList.

Our goal here is to time how long these two operations actually take in these two collections, and to compare them.

There is a starter code in CollectionsTest.java that you already downloaded as part of this archive. Let’s import starter files into our project. First of all, add a new package lab2 to your project. Then right-click on it and use the Import... command to import all three files from the starter2 folder. if you forgot how to do this, refer to Lab 0 Part 1.

For this part we use only CollectionsTest.java. The other two files are for Part 2 of the lab.

Your task is to fill in the missing code marked for you to do.

We will start with the bottom of CollectionsTest.java: the main() method. This method has been filled out for you already, but you should understand what it is doing. We are defining a Collection<Integer> reference variable, c, that will be used to hold first an ArrayList<Integer> and later a TreeSet<Integer> object.

The <Integer> part of the variable declaration is an example of use of Java generics. We can create an ArrayList which holds objects of any given type. In this case, it holds objects of type Integer.

    public static void main(String[] args) {
        Collection<Integer> c;
        
        System.out.println("Testing ArrayList:");
        c = new ArrayList<Integer>();
        testCollection(c);
        
        System.out.println();
        
        System.out.println("Testing TreeSet:");
        c = new TreeSet<Integer>();
        testCollection(c);
    }

After we initialize our variable c with the actual new class, we call a static method testCollection(c) which you need to implement. The method is declared in the starter code as static and void and it takes a Collection<Integer> parameter.

    public static void testCollection(Collection<Integer> c) {
        // TODO: Implement this method.

        // Here are arrays to store results of NUM_TESTS tests
        long[] testsAdd = new long[NUM_TESTS];
        long[] testsContains = new long[NUM_TESTS];
        
        
        // Terminate by printing your results: average, min and max time
    }

In this method, we want to call the timeAdd and timeContains methods, which we will be implementing soon. We want to call these methods multiple times, which can be done with a for loop. We will do this NUM_TESTS times.

The tests arrays are initialized to hold NUM_TESTS elements. Now implement a for loop that calls both
timeAdd and timeContains methods NUM_TESTS times, and stores the results returned by each method in the corresponding position of the corresponding array. After each call to timeAdd and timeContains in one loop iteration call c.clear() to clear the corresponding collection and start the new test.

You should then print out the average time for each operation, the minimum time, and the maximum time, calculated from the values of the corresponding tests array. Running performance tests multiple times is a good habit to get into, as you never know what may slow down a computer at a specific time. It is good to run several and then calculate an average and range.

Note that the implementations of timeAdd and timeContains have not been written, which is fine. Our function will not do anything yet until we write those methods.

Our final two methods are timeAdd and timeContains. These two methods will look similar. Start by storing the current time in nanoseconds in the long variable using System.nanoTime(). Then, for timeAdd, call the add() method on the collection to insert all numbers from 0 to MAX_NUM-1. For timeContains, we will query the Collection with the contains() method to see if it contains all numbers from 0 to MAX_NUM - 1. These operations can be done with for loops again. Each method will then get the current time, and return the difference between the start time and the end time.

When you finish, your class should contain 4 static methods, 3 of which you implemented and one (main) that was implemented for you.

Analyse your results. Which collection is better for adding, and which one is better for searching? Or is there a collection that is superior on both tasks? Do your actual results correspond to what you expected? Add a comment about this to your readme file.