penguin.py: 15 points
One of the benefits of using computers to solve problems is that they can process data very quickly, helping us discover important facts about the real world. In this part of the lab, we will use Python to perform some data science on observations biologists recorded about three species of penguins on different islands around Antarctica. In other words, we will use data science as an interdisciplinary approach to answer questions related to biodiversity.
Special thanks and credit to Professor Allison Horst at the University of California Santa Barbara for making this data set public: Twitter post and thread with more information and GitHub repository.
You have been provided with a read_data() function in penguin.py that reads in all of the data from the penguins_data.csv file. This file contains data for about 342 real-life penguins. Calling read_data() returns a list of all of the penguins we will be working with.
penguins = read_data()
Each penguin is a list containing the six values described in the table below.
List Index | Information | Type |
---|---|---|
0 | species | str |
1 | home island | str |
2 | bill length | float |
3 | bill depth | float |
4 | flipper length | float |
5 | body mass | float |
For example, one penguin might be represented as the following list:
penguin = ["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0]
Since each penguin is itself a list of values, a list of multiple penguins is a two-dimensional list, such as the following list that contains 5 penguins:
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
The penguins list returned by the read_data() function is similar in structure to the five_penguin_list above, except it contains 342 penguins instead of only 5.
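As a small illustration of how a two-dimensional list is indexed, the sketch below pulls individual values out of the five-penguin example. The variable names third_penguin, species, and body_mass are only illustrative and are not part of the provided starter code.

```python
# The same five example penguins shown above.
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]

# The first index picks a penguin (a row); the second picks one of its values.
third_penguin = five_penguin_list[2]   # the whole list for one penguin
species = third_penguin[0]             # index 0 holds the species
body_mass = five_penguin_list[2][5]    # index 5 holds the body mass

print(species, body_mass)              # prints: Gentoo 4500.0
print(len(five_penguin_list))          # prints: 5
```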
ReadMe
We will read in all the penguin observations from a file, so you do not need to make any assignments like the above (they merely illustrate what the data looks like).
Your goal in this program is to use that list of penguins to discover possible differences between the three species of penguins (Adelie, Chinstrap, and Gentoo) based on their data. In particular, your program should do the following:
During Steps 1 and 2, you should make sure the user enters a valid option. If they do not, print a message telling them what mistake they made, then close the program. Tip: you might want to provide the user with a menu like we did with image filters in Lab 4.
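One possible way to check an option is sketched below. The helper names check_choice and require_valid and the VALID_SPECIES list are assumptions made for illustration; they are not part of the provided starter code, and your menu may accept different inputs (for example, numbers instead of names).

```python
import sys

# Hypothetical list of accepted answers; adjust to match your own menu.
VALID_SPECIES = ["Adelie", "Chinstrap", "Gentoo"]


def check_choice(choice, valid_options):
    """Return True if choice is one of valid_options, otherwise False."""
    return choice in valid_options


def require_valid(choice, valid_options):
    """Return choice if it is valid; otherwise explain the mistake and exit."""
    if not check_choice(choice, valid_options):
        print("Invalid option:", choice, "- expected one of", valid_options)
        sys.exit()  # closes the program
    return choice
```

In a full program, the value passed in as choice would typically come from a call to input().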
Reminder
As you make progress on your program, don’t forget to commit and push your changes regularly!
ReadMe
Note: We do not care how many decimal places your answers go to. You do not need to worry about rounding in this assignment.
To complete this assignment, the following functions will help us.
find_species(penguins, species):
The find_species() function will perform Step 3 above by taking in a list of all of the penguins in the data and returning a smaller list that contains only the penguins of a particular species. This can be done by following these steps:
1. Create an empty list called filtered.
2. Loop over each penguin in the penguins list.
3. Check whether the penguin’s species (in index 0, i.e., penguin[0]) is equal to species.
4. If it is, add the penguin to the filtered list.
5. Return the filtered list.
For example, say we have the same five penguins as we did above.
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
Then, if you want to create a list of only the Adelie penguins from those five, you can call find_species(five_penguin_list, "Adelie"), which should return a list with the two Adelie penguins.
filtered = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
            ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
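The filtering steps described above could be sketched roughly as follows. This is one possible shape for the function under the description given here, not necessarily the expected solution.

```python
def find_species(penguins, species):
    """Return only the penguins whose species (index 0) matches species."""
    filtered = []                    # start with an empty list
    for penguin in penguins:         # visit every penguin in the data
        if penguin[0] == species:    # index 0 holds the species name
            filtered.append(penguin)
    return filtered


# Small usage example with the five penguins from above.
five_penguin_list = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
                     ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0],
                     ["Gentoo", "Biscoe", 46.1, 13.2, 211.0, 4500.0],
                     ["Gentoo", "Biscoe", 50.0, 16.3, 230.0, 5700.0],
                     ["Chinstrap", "Dream", 46.5, 17.9, 192.0, 3500.0]]
adelie_penguins = find_species(five_penguin_list, "Adelie")
print(len(adelie_penguins))  # prints: 2
```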
Reminder
Once you’ve implemented the find_species() function, remember to commit and push your changes!
find_measurements(filtered, index):
In order to perform Step 4, we need to work with either the bill length or body mass of all of the penguins of a given species (returned as filtered from our find_species() function). To get those measurements, we will use the find_measurements() function.
The find_measurements() function is very similar to find_species(), except we are only saving a particular measurement from each penguin, instead of the entire penguin. This function should:
1. Create an empty list called measurements.
2. Loop over each penguin in the filtered list.
3. Get the measurement from penguin[index] (index = 2 if the user chose bill length and index = 5 if they chose body mass).
4. Store the measurement in the measurements list.
For example, say we have the same two Adelie penguins as we did above:
filtered = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
            ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
Then, if you want to create a list of all of their bill lengths, you can call find_measurements(filtered, 2), which should return a list:
measurements = [39.1, 37.8]
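A rough sketch of the measurement-collecting steps is shown below, assuming index is already a valid column number (2 for bill length or 5 for body mass). Treat it as an illustration of the pattern rather than the required implementation.

```python
def find_measurements(filtered, index):
    """Return a list holding the value at position index from each penguin."""
    measurements = []                    # start with an empty list
    for penguin in filtered:             # visit every penguin of the species
        measurement = penguin[index]     # e.g., index 2 is the bill length
        measurements.append(measurement)
    return measurements


# Usage example with the two Adelie penguins from above.
filtered = [["Adelie", "Torgersen", 39.1, 18.7, 181.0, 3750.0],
            ["Adelie", "Biscoe", 37.8, 18.3, 174.0, 3400.0]]
print(find_measurements(filtered, 2))  # prints: [39.1, 37.8]
print(find_measurements(filtered, 5))  # prints: [3750.0, 3400.0]
```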
Reminder
Remember to commit and push your changes before moving on!
ReadMe
Note: We do not care how many decimal places your answers go to. As long as you are still using the float data type for penguin measurements, you do not need to worry about rounding in this assignment.
find_average(measurements):
For the find_average() function, we will want to add together all the numbers in the input measurements list, then divide that total by the count of numbers in the list and return the result.
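The averaging described above might look like the sketch below. It assumes measurements contains at least one number; an empty list would cause a division by zero, and how to handle that case is not specified here. The variable name total is only illustrative.

```python
def find_average(measurements):
    """Return the mean of the numbers in measurements (assumed non-empty)."""
    total = 0.0
    for value in measurements:   # add up every number in the list
        total = total + value
    return total / len(measurements)


print(find_average([2.0, 4.0, 9.0]))  # prints: 5.0
```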
find_max(measurements) and find_min(measurements):
For the find_max() and find_min() functions, we will need to loop through the values in measurements. Within the loop, we will keep track of which value is currently the largest (for find_max()) or smallest (for find_min()). You should not use Python’s built-in max() or min() functions here. In addition, you should not use the variable names min or max, as they will collide with the built-in function names.
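One way to track the running largest and smallest values is sketched below. It assumes measurements is non-empty, and the variable names largest and smallest are illustrative choices that avoid the built-in names max and min.

```python
def find_max(measurements):
    """Return the largest value in measurements (assumed non-empty)."""
    largest = measurements[0]        # best candidate seen so far
    for value in measurements:
        if value > largest:          # found a bigger value, so remember it
            largest = value
    return largest


def find_min(measurements):
    """Return the smallest value in measurements (assumed non-empty)."""
    smallest = measurements[0]       # best candidate seen so far
    for value in measurements:
        if value < smallest:         # found a smaller value, so remember it
            smallest = value
    return smallest


print(find_max([39.1, 37.8, 46.0]))  # prints: 46.0
print(find_min([39.1, 37.8, 46.0]))  # prints: 37.8
```

Starting from the first element, rather than from 0, keeps the result correct even if every value in the list happens to be negative or very large.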
ReadMe
As a hint for how to loop over each penguin contained in a list of penguins (i.e., a list of lists), we can use the following code:
for penguin in penguins:
    # do something with penguin, which is a list of measurements
    pass  # placeholder so the loop body is valid Python
Reminder
Commit and push your changes as you complete each function!
Species | Measurement | Min | Average | Max |
---|---|---|---|---|
Adelie | Bill Length | 32.1 | 38.7914 | 46.0 |
Adelie | Body Mass | 2850.0 | 3700.6623 | 4775.0 |
Chinstrap | Bill Length | 40.9 | 48.8338 | 58.0 |
Chinstrap | Body Mass | 2700.0 | 3733.0882 | 4800.0 |
Gentoo | Bill Length | 40.9 | 47.5049 | 59.6 |
Gentoo | Body Mass | 3950.0 | 5076.0163 | 6300.0 |