3.1. Exercises
Transforming Columns with ColumnTransformer
Refer to the dataframe to answer the following question.
   colour   location  shape  water_content  weight
0  red      canada    NaN    84             100
1  yellow   mexico    long   75             120
2  orange   spain     NaN    90             NaN
3  magenta  china     round  NaN            600
4  purple   austria   NaN    80             115
5  purple   turkey    oval   78             340
6  green    mexico    oval   83             NaN
7  blue     canada    round  73             535
8  brown    china     NaN    NaN            1743
9  yellow   mexico    oval   83             265
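To reason about which columns need imputation, scaling, or encoding, it can help to rebuild the dataframe and inspect it. The following is a minimal sketch using the values shown above; the variable names df and missing are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe shown above; NaN marks the missing entries.
df = pd.DataFrame({
    "colour": ["red", "yellow", "orange", "magenta", "purple",
               "purple", "green", "blue", "brown", "yellow"],
    "location": ["canada", "mexico", "spain", "china", "austria",
                 "turkey", "mexico", "canada", "china", "mexico"],
    "shape": [np.nan, "long", np.nan, "round", np.nan,
              "oval", "oval", "round", np.nan, "oval"],
    "water_content": [84, 75, 90, np.nan, 80, 78, 83, 73, np.nan, 83],
    "weight": [100, 120, np.nan, 600, 115, 340, np.nan, 535, 1743, 265],
})

# Count missing values per column: columns with NaN would need imputation.
missing = df.isna().sum()
print(missing)

# Split column names by type: numeric columns are candidates for scaling,
# the remaining columns are candidates for one-hot encoding.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, categorical_cols)
```

Here shape, water_content, and weight contain missing values, while colour and location are complete.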
Transforming True or False
Your Turn with Column Transforming
Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.
When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.
Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented them so that the line won't execute and you can test your code after each step.
Let's now start applying transformations to our basketball dataset.
We've provided you with the numerical and categorical features; it's your turn to make a pipeline for each and then use ColumnTransformer to transform them.
This time we have a regression problem: we are attempting to predict a player's salary.
Tasks:
- Create a pipeline for the numeric features. It should have the first step as simple imputation using strategy="median" and the second step should be using StandardScaler. Name this pipeline numeric_transformer.
- Create a pipeline for the categorical features. It should also have 2 steps. The first is imputation using strategy="most_frequent". The second step should be one-hot encoding with handle_unknown="ignore". Name this pipeline categorical_transformer.
- Make your column transformer named col_transformer and specify the transformations on numeric_features and categorical_features using the appropriate pipelines you built above.
- Create a main pipeline named main_pipe which preprocesses with col_transformer followed by building a KNeighborsRegressor model.
- The last step is performing cross-validation using our pipeline.
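The tasks above can be sketched roughly as follows. This is one possible arrangement, not the exercise's reference solution. The basketball dataset is not shown here, so the sketch uses a small invented stand-in: the column names (height, weight, position), the random values, the step names inside each pipeline, and the choice of cv=5 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in data (NOT the exercise's basketball dataset).
rng = np.random.default_rng(0)
n = 40
X_train = pd.DataFrame({
    "height": rng.normal(200, 8, n),
    "weight": rng.normal(100, 10, n),
    "position": rng.choice(["G", "F", "C"], n).astype(object),
})
X_train.iloc[::7, 0] = np.nan   # inject missing numeric values
X_train.iloc[::9, 2] = np.nan   # inject missing categorical values
y_train = rng.normal(5_000_000, 1_000_000, n)  # invented salaries

numeric_features = ["height", "weight"]
categorical_features = ["position"]

# Numeric pipeline: median imputation, then standardization.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Categorical pipeline: most-frequent imputation, then one-hot encoding.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own pipeline.
col_transformer = ColumnTransformer(transformers=[
    ("numeric", numeric_transformer, numeric_features),
    ("categorical", categorical_transformer, categorical_features),
])

# Main pipeline: preprocessing followed by the regressor.
main_pipe = Pipeline(steps=[
    ("preprocessor", col_transformer),
    ("reg", KNeighborsRegressor()),
])

# Cross-validate the whole pipeline so preprocessing is refit on each fold.
scores = cross_validate(main_pipe, X_train, y_train, cv=5, return_train_score=True)
print(pd.DataFrame(scores).mean())
```

Passing the full pipeline to cross_validate, rather than preprocessing the data first, keeps the imputer and scaler from seeing the validation fold during fitting.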