3.1. Exercises

Finding and Dropping Null Values Questions

You run .info() on the fruit_salad dataframe and get the following output.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           10 non-null     object 
 1   colour         10 non-null     object 
 2   location       10 non-null     object 
 3   seed           10 non-null     bool   
 4   shape          9 non-null      object 
 5   sweetness      10 non-null     bool   
 6   water_content  8 non-null      float64
 7   weight         10 non-null     int64  
dtypes: bool(2), float64(1), int64(1), object(4)
memory usage: 628.0+ bytes

     name  height  diameter   age flowering
0  Cherry    15.0         2  12.0      True
1     Fir    20.0         4   4.0     False
2  Willow    25.0         3   2.0      True
3     Oak     NaN         2   NaN     False
4     Oak    10.0         5   6.0       NaN
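
As a rough illustration of finding and dropping null values, the sketch below rebuilds the five-row table shown above and inspects it. The name `trees` is a placeholder (the table is not named at this point in the text), and the calls shown are only one way to approach the task.

```python
import numpy as np
import pandas as pd

# Placeholder name `trees`: values copied from the five-row printout above.
trees = pd.DataFrame({
    "name": ["Cherry", "Fir", "Willow", "Oak", "Oak"],
    "height": [15.0, 20.0, 25.0, np.nan, 10.0],
    "diameter": [2, 4, 3, 2, 5],
    "age": [12.0, 4.0, 2.0, np.nan, 6.0],
    "flowering": [True, False, True, False, np.nan],
})

# Count the missing values in each column.
null_counts = trees.isnull().sum()

# Drop every row that contains at least one null value.
rows_kept = trees.dropna()

# Drop every column that contains at least one null value.
cols_kept = trees.dropna(axis=1)

print(null_counts)
print(rows_kept)
print(cols_kept)
```

Dropping rows keeps only the fully populated observations, while dropping columns keeps only the fully populated variables. Which one is appropriate depends on how much data each choice discards.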

Filling Methods

Use the forest dataframe below to answer the next 2 questions:

     name  height  diameter   age flowering
0  Cherry    15.0         2  12.0      True
1     Fir    20.0         4   4.0     False
2  Willow    25.0         3   2.0      True
3     Oak     NaN         2   3.0     False
4     Oak    10.0         5   6.0     False
# Question 1
     name  height  diameter  age  flowering
0  Cherry    15.0         2   12       True
1     Fir    20.0         4    4      False
2  Willow    25.0         3    2       True
3     Oak    17.5         2    3      False
4     Oak    10.0         5    6      False
# Question 2
     name  height  diameter  age  flowering
0  Cherry    15.0         2   12       True
1     Fir    20.0         4    4      False
2  Willow    25.0         3    2       True
3     Oak    10.0         2    3      False
4     Oak    10.0         5    6      False
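
For comparison with the tables above, here is a minimal sketch of three common filling approaches applied to a rebuilt copy of the forest dataframe. The integer `age` values follow the output tables. This shows general pandas usage under those assumptions and is not necessarily the course's intended answer.

```python
import numpy as np
import pandas as pd

# Rebuilt from the forest printout above; only `height` has a missing value.
forest = pd.DataFrame({
    "name": ["Cherry", "Fir", "Willow", "Oak", "Oak"],
    "height": [15.0, 20.0, 25.0, np.nan, 10.0],
    "diameter": [2, 4, 3, 2, 5],
    "age": [12, 4, 2, 3, 6],
    "flowering": [True, False, True, False, False],
})

# Fill with a single value: here, the mean of the non-null heights.
mean_filled = forest.fillna(value=forest["height"].mean())

# Backward fill: copy the next valid value up into the gap.
back_filled = forest.bfill()

# Forward fill: copy the previous valid value down into the gap.
forward_filled = forest.ffill()

print(mean_filled)
print(back_filled)
print(forward_filled)
```

The three results differ only in row 3 of the `height` column, which makes it easy to tell which method produced a given table.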

Coding questions

Instructions:
The first time you run a coding exercise, it may take a few minutes for everything to load, so please be patient.

When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.

Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented these lines out so that they won’t execute, which lets you test your code after each step.

Practice Filling Null Values

Let’s replace the missing values in the canucks dataframe with the mean salary.

Tasks:

  • Replace the NaN values in the dataframe with the mean salary value.
  • Save this as a new dataframe named canucks_altered.
  • Display the canucks_altered dataframe.
Hint 1
  • Are you using .fillna()?
  • Are you using the argument value=canucks['Salary'].mean()?
Fully worked solution:
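
One possible approach is sketched below. The course's canucks data is not reproduced here, so the small dataframe (including the `Player` column and all salary figures) is a hypothetical stand-in. Only the `Salary` column name and the `canucks_altered` name come from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's canucks data (values are made up).
canucks = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Salary": [2.0, np.nan, 4.0, np.nan],
})

# Replace every NaN with the mean of the non-null salaries,
# and save the result as a new dataframe.
canucks_altered = canucks.fillna(value=canucks["Salary"].mean())

# Display the result.
print(canucks_altered)
```

Because `.fillna()` is called on the whole dataframe, it would also fill NaN values in any other column with the mean salary. If that is not wanted, the fill can be restricted to the `Salary` column.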


Practice Identifying Null Values

Let’s practice using .isnull() in our data processing using the canucks dataset from earlier in this course.

Tasks:

  • Identify any columns with null values in the canucks dataframe with .info() and save this as canucks_info.
  • Create a new column in the dataframe named Wealth where all the values equal "comfortable".
  • Name the new dataframe canucks_comf.
  • Use conditional value replacement: where the value in the Salary column is null, replace "comfortable" with "unknown".
  • Display the new canucks_comf dataframe.
Hint 1
  • Are you using canucks.info()?
  • Are you creating canucks_comf with canucks.assign(Wealth = "comfortable")?
  • Are you using .loc[] to replace the values in the Wealth column?
  • Are you using canucks_comf['Salary'].isnull() as your condition in .loc[]?
Fully worked solution:
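
A minimal sketch of one way to complete the tasks follows. As before, the dataframe contents (the `Player` column and the salary figures) are hypothetical stand-ins for the course's canucks data. The names `canucks_info`, `canucks_comf`, `Wealth` and `Salary` come from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's canucks data (values are made up).
canucks = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Salary": [2.0, np.nan, 4.0, np.nan],
})

# .info() prints the non-null count per column; note that it returns None,
# so canucks_info holds None rather than the printed summary.
canucks_info = canucks.info()

# Add a Wealth column where every value is "comfortable".
canucks_comf = canucks.assign(Wealth="comfortable")

# Where Salary is null, replace "comfortable" with "unknown".
canucks_comf.loc[canucks_comf["Salary"].isnull(), "Wealth"] = "unknown"

# Display the result.
print(canucks_comf)
```

Using `.assign()` returns a new dataframe, so the original `canucks` dataframe is left without a `Wealth` column.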