# Penguin Data Science

penguin.py: 15 points

One of the benefits of using computers to solve problems is that they can process data very quickly to help us discover important facts about the real world. In this part of the lab, we will use Python to perform some data science on observations that biologists recorded about three species of penguins on different islands around Antarctica. In other words, we will use data science as an interdisciplinary approach to answer questions related to biodiversity.

Special thanks and credit to Professor Allison Horst at the University of California Santa Barbara for making this data set public: Twitter post and thread with more information and GitHub repository.

#### Data About Penguins

You have been provided with a `read_data()` function in `penguin.py` that reads in all of the data from the `penguins_data.csv` file. This file contains data for 342 real-life penguins. Calling `read_data()` returns a list of all of the `penguins` we will be working with.

```python
penguins = read_data()
```

Each penguin is a list containing the six values described in the table below.

| List Index | Information | Type |
|---|---|---|
| 0 | species | `str` |
| 1 | home island | `str` |
| 2 | bill length | `float` |
| 3 | bill depth | `float` |
| 4 | flipper length | `float` |
| 5 | body mass | `float` |

For example, one penguin might be represented as the following list:

```python
penguin = ["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0]
```
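
To make the index table concrete, here is a minimal sketch that reads individual values from one penguin. The constant names (`SPECIES`, `BILL_LENGTH`, `BODY_MASS`) are assumptions introduced here for readability; they are not defined in `penguin.py`.

```python
# Hypothetical index constants (not part of penguin.py); they mirror the table.
SPECIES = 0
BILL_LENGTH = 2
BODY_MASS = 5

penguin = ["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0]

# Each value is looked up by its position in the list.
species = penguin[SPECIES]
bill_length = penguin[BILL_LENGTH]
body_mass = penguin[BODY_MASS]
print(species, bill_length, body_mass)
```

Writing `penguin[0]`, `penguin[2]`, and `penguin[5]` directly works just as well; the named constants only make the intent easier to read.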

Since each penguin is itself a list of values, a list of multiple penguins is a two-dimensional list, such as the following list that contains 5 penguins:

```python
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
```

The `penguins` list returned by the `read_data()` function is similar in structure to the `five_penguin_list` above, except that it contains 342 penguins instead of only 5.

We will read in all the penguin observations from a file, so you do not need to make any assignments like the above (they merely illustrate what the data looks like).
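
To see how indexing works on a two-dimensional list like this, the sketch below uses a three-penguin subset of the example as a stand-in for the real data. The name `sample_penguins` is an assumption made for illustration; in your program the list comes from `read_data()`.

```python
# A three-penguin subset of the example list, standing in for read_data().
sample_penguins = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                   ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                   ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]

# The first index picks a penguin; the second index picks a value in it.
second_penguin = sample_penguins[1]        # one whole penguin (a list)
second_species = sample_penguins[1][0]     # that penguin's species
second_body_mass = sample_penguins[1][5]   # that penguin's body mass
print(len(sample_penguins), second_species, second_body_mass)
```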

#### Program Goal

Your goal in this program is to use that list of penguins to discover possible differences between the three species of penguins (Adelie, Chinstrap, and Gentoo) based on their data. In particular, your program should do the following:

1. Ask the user for a species name (either “Adelie”, “Chinstrap”, or “Gentoo”).
2. Ask the user for which type of measurement they want to see (either bill length or body mass; you do not have to handle bill depth or flipper length).
3. Based on the species chosen by the user in Step 1, create a new list containing only the penguins that belong to this species.
4. Calculate and then print the average, minimum, and maximum of the measurements selected by the user in Step 2 (bill length or body mass) for the list of penguins created in Step 3.

During Steps 1 and 2, you should make sure the user enters a valid option. If the user does not, you should print a message telling them what mistake they made and then exit the program. Tip: you might want to provide the user with a menu like we did with image filters in Lab 4.
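
As one possible way to structure the validation for Step 1, here is a rough sketch. The names `VALID_SPECIES`, `is_valid_choice()`, and `ask_for_species()` are hypothetical and not part of the provided `penguin.py`, and the exact prompt and error wording are up to you. The sketch assumes the user must type the species name exactly as capitalized.

```python
import sys

# Hypothetical list of accepted answers for Step 1.
VALID_SPECIES = ["Adelie", "Chinstrap", "Gentoo"]


def is_valid_choice(choice, valid_options):
    """Return True if choice exactly matches one of valid_options."""
    return choice in valid_options


def ask_for_species():
    """Prompt for a species; print an error and exit if it is not valid."""
    species = input("Which species (Adelie, Chinstrap, or Gentoo)? ")
    if not is_valid_choice(species, VALID_SPECIES):
        print("Sorry,", species, "is not one of the three species.")
        sys.exit()
    return species
```

The same pattern could be reused for Step 2 with a different list of valid options.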

Reminder

As you make progress on your program, don’t forget to commit and push your changes regularly!

#### Useful Functions

To complete this assignment, the following functions will help us.

`find_species(penguins, species)`:

The `find_species()` function will perform Step 3 above by taking in a list of all of the penguins in the data and returning a smaller list that contains only the penguins of a particular species. This can be done by following these steps:

1. Create a new empty list called `filtered`.
2. Loop over each `penguin` in the `penguins` list.
    1. Check if the current `penguin`’s species (in index 0, i.e., `penguin[0]`) is equal to `species`.
    2. If so, append the current `penguin` to the `filtered` list.
3. Return the `filtered` list.

For example, say we have the same five penguins as we did above.

```python
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
```

Then, if you want to create a list of only the Adelie penguins from those five, you can call `find_species(five_penguin_list, "Adelie")`, which should return a list with the two Adelie penguins.

```python
filtered = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
            ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
```
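
The numbered steps for `find_species()` could be sketched roughly as follows. This is one possible reading of those steps rather than the required solution. The four-penguin `sample_penguins` list is an assumption made for illustration, built from rows of the example data, and the call filters for Gentoo penguins instead of Adelie.

```python
def find_species(penguins, species):
    """Return only the penguins whose species (index 0) equals species."""
    filtered = []
    for penguin in penguins:
        if penguin[0] == species:
            filtered.append(penguin)
    return filtered


# Hypothetical sample taken from the example rows.
sample_penguins = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                   ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                   ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                   ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]

gentoos = find_species(sample_penguins, "Gentoo")
print(len(gentoos))
```

Note that the comparison is case-sensitive, so `"gentoo"` would match nothing in this sketch.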

Reminder

Once you’ve implemented the `find_species` function, remember to commit and push your changes!

`find_measurements(filtered, index)`:

In order to perform Step 4, we need to work with either the bill length or body mass of all of the penguins of a given species (returned as `filtered` from our `find_species()` function). To get those measurements, we will use the `find_measurements()` function.

The `find_measurements()` function is very similar to `find_species()`, except we are only saving a particular measurement from each `penguin`, instead of the entire `penguin`. This function should:

1. Create a new empty list called `measurements`.
2. Loop over each `penguin` in the `filtered` list.
    1. Grab the `measurement` from `penguin[index]` (`index` = 2 if the user chose bill length and `index` = 5 if they chose body mass).
    2. Save the `measurement` in the `measurements` list.
3. Return the `measurements` list.

For example, say we have the same two Adelie penguins as we did above:

```python
filtered = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
            ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
```

Then, if you want to create a list of all of their bill lengths, you can call `find_measurements(filtered, 2)`, which should return a list:

```python
measurements = [39.1, 37.8]
```
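
A rough sketch of `find_measurements()` that follows the numbered steps is shown below. It is one possible implementation, not the required one. The `gentoo_penguins` list is an assumption made for illustration, using the two Gentoo rows from the earlier example data.

```python
def find_measurements(filtered, index):
    """Return the value stored at position index for every penguin."""
    measurements = []
    for penguin in filtered:
        measurement = penguin[index]
        measurements.append(measurement)
    return measurements


# Hypothetical input: the two Gentoo penguins from the example data.
gentoo_penguins = [["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                   ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0]]

bill_lengths = find_measurements(gentoo_penguins, 2)
body_masses = find_measurements(gentoo_penguins, 5)
print(bill_lengths, body_masses)
```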

Reminder

Remember to commit and push your changes before moving on!

`find_average(measurements)`:

For the `find_average()` function, we will want to add together all the numbers in the input `measurements` list, then divide that total by the count of numbers in the list and return the result.
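
A minimal sketch of this calculation is shown below. It assumes `measurements` contains at least one number; an empty list would cause a division by zero, which this sketch does not handle.

```python
def find_average(measurements):
    """Return the sum of measurements divided by how many there are."""
    total = 0.0
    for measurement in measurements:
        total = total + measurement
    return total / len(measurements)


# Body masses of the two Gentoo penguins from the example data.
average_mass = find_average([4500.0, 5700.0])
print(average_mass)
```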

`find_max(measurements)` and `find_min(measurements)`:

For the `find_max()` and `find_min()` functions, we will need to loop through the values in `measurements`. Within the loop, we will keep track of which value is currently the largest (for `find_max()`) or smallest (for `find_min()`). You should not use Python’s built-in `max()` or `min()` functions here. In addition, you should not use the variable names `min` or `max` as they will collide with the built-in function names.
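
One way to keep track of the current largest or smallest value is to start with the first item in the list and compare it against every value in turn. The sketch below takes that approach; it assumes `measurements` is not empty, and the variable names `largest` and `smallest` are simply one choice that avoids the built-in names.

```python
def find_max(measurements):
    """Return the largest value without using the built-in max()."""
    largest = measurements[0]
    for measurement in measurements:
        if measurement > largest:
            largest = measurement
    return largest


def find_min(measurements):
    """Return the smallest value without using the built-in min()."""
    smallest = measurements[0]
    for measurement in measurements:
        if measurement < smallest:
            smallest = measurement
    return smallest


# Bill lengths taken from three rows of the example data.
bill_lengths = [46.1, 50.0, 46.5]
print(find_max(bill_lengths), find_min(bill_lengths))
```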

As a hint of how to loop over each `penguin` contained in a list of `penguins` (i.e., a list of lists), we can use the following code:

```python
for penguin in penguins:
    # do something with penguin, which is a list of measurements
    pass  # placeholder so the loop is valid; replace with your own code
```

Reminder

Commit and push your changes as you complete each function!