2.1. Exercises

Imputation

Imputation True or False

Imputing in Action

Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.

When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.

Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented these lines out so that they won't execute and you can test your code after each step.

Let's take a look at a modified version of our basketball player dataset.

First, let's check whether, and where, any values are missing.

Tasks:

  • Use .describe() or .info() to find if there are any values missing from the dataset.
  • Using some of the skills we learned in the previous course, find the number of rows that contain missing values and save that total in an object named num_nan.
    Hint: .any(axis=1) may come in handy here.
Hint 1
  • Are you using X_train.info()?
  • Are you using X_train.isnull().any(axis=1).sum()?
Fully worked solution:


Now that we've identified the columns with missing values, let's use SimpleImputer to replace the missing values.

Tasks:

  • Import the necessary library.
  • Using SimpleImputer, replace the null values in the training and testing dataset with the median value in each column.
  • Save your transformed data in objects named X_train_imp and X_test_imp respectively.
  • Transform X_train_imp into a dataframe using the column and index labels from X_train and save it as X_train_imp_df.
  • Check if X_train_imp_df still has missing values.
Hint 1
  • Are you using SimpleImputer(strategy="median")?
  • Are you fitting your imputer on the training set?
  • Are you using transform() on both your train and test sets?
  • Are you putting it into a dataframe with pd.DataFrame(X_train_imp, columns = X_train.columns, index = X_train.index)?
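
The steps in these hints can be sketched end to end as follows. This is a hedged illustration, not the exercise's own solution: X_train and X_test are hypothetical stand-ins with invented columns and values, and the sketch assumes the imputer is fitted on the training set only and then applied to both sets.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-ins for the exercise's X_train and X_test (invented data).
X_train = pd.DataFrame(
    {
        "height": [1.98, 2.03, np.nan, 1.91],
        "weight": [95.0, np.nan, 102.0, 88.0],
    },
    index=["p1", "p2", "p3", "p4"],
)
X_test = pd.DataFrame(
    {"height": [np.nan, 2.05], "weight": [99.0, np.nan]},
    index=["p5", "p6"],
)

# Learn each column's median from the training data only.
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)

# Apply the learned medians to both sets; transform() returns NumPy arrays.
X_train_imp = imputer.transform(X_train)
X_test_imp = imputer.transform(X_test)

# Restore the column and index labels that the NumPy array dropped.
X_train_imp_df = pd.DataFrame(
    X_train_imp, columns=X_train.columns, index=X_train.index
)

# Every per-column count should now be zero.
print(X_train_imp_df.isnull().sum())
```

Because the medians come from the training set, missing test values are filled with training medians, so no information from the test set is used when fitting.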
Fully worked solution: