2.1. Exercises
Imputation
Imputation True or False
Imputing in Action
Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.
When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.
Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented these lines out so that they won't execute and you can test your code after each step.
Let's take a look at a modified version of our basketball player dataset.
First, let's check whether, and where, we are missing any values.
Tasks:
- Use .describe() or .info() to find out whether any values are missing from the dataset.
- Using some of the skills we learned in the previous course, find the number of rows that contain missing values and save the total number of examples with missing values in an object named num_nan.
Hint: .any(axis=1) may come in handy here.
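To illustrate the idea behind the hint, here is a minimal sketch of counting rows that have at least one missing value. The toy_df dataframe and its column names are made up for this example and are not the basketball dataset, so treat it as one possible approach rather than the expected answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the basketball dataset) with a few missing values
toy_df = pd.DataFrame(
    {
        "height": [2.01, np.nan, 1.98, 2.11],
        "weight": [98.0, 102.5, np.nan, np.nan],
        "salary": [1.2, 3.4, 5.6, 7.8],
    }
)

# .info() prints the non-null count per column; a count lower than
# the number of rows means that column has missing values
toy_df.info()

# isnull() flags each missing cell as True;
# any(axis=1) reduces those flags to one True/False per row
rows_with_nan = toy_df.isnull().any(axis=1)

# Summing the boolean flags counts the rows with at least one missing value
num_nan = rows_with_nan.sum()
print(num_nan)
```

In this toy example three of the four rows contain a missing value, so num_nan is 3.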
Now that we've identified the columns with missing values, let's use SimpleImputer to replace the missing values.
Tasks:
- Import the necessary library.
- Using SimpleImputer, replace the null values in the training and testing dataset with the median value in each column.
- Save your transformed data in objects named train_X_imp and test_X_imp respectively.
- Transform X_train_imp into a dataframe using the column and index labels from X_train and save it as X_train_imp_df.
- Check if X_train_imp_df still has missing values.
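The steps above can be sketched on a small example. The toy_train and toy_test dataframes, their columns, and every toy_ name below are assumptions made for illustration and are not the basketball dataset. The sketch fits the imputer on the training data only and reuses the learned medians for the test data, which is a common convention rather than something this exercise states.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy training and test features (not the basketball dataset)
toy_train = pd.DataFrame(
    {"height": [2.00, np.nan, 1.90, 2.10], "weight": [100.0, 90.0, np.nan, 110.0]},
    index=["a", "b", "c", "d"],
)
toy_test = pd.DataFrame(
    {"height": [np.nan, 2.05], "weight": [95.0, np.nan]},
    index=["e", "f"],
)

# Learn each column's median from the training data only
imputer = SimpleImputer(strategy="median")
imputer.fit(toy_train)

# Fill missing values in both sets with the training medians;
# transform() returns a NumPy array by default, so labels are lost
toy_train_imp = imputer.transform(toy_train)
toy_test_imp = imputer.transform(toy_test)

# Rebuild a dataframe with the original column and index labels
toy_train_imp_df = pd.DataFrame(
    toy_train_imp, columns=toy_train.columns, index=toy_train.index
)

# True if any cell is still missing after imputation
has_missing = toy_train_imp_df.isnull().any().any()
print(has_missing)
```

Because transform() returns a plain array, wrapping the result in a dataframe is what restores the column and index labels for later inspection.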