4.1. Exercises
Making pipelines
Use the diagram below to answer the following questions.
Pipeline(
    steps=[('columntransformer',
            ColumnTransformer(
                transformers=[('pipeline-1',
                               Pipeline(
                                   steps=[('simpleimputer',
                                           SimpleImputer(strategy='median')),
                                          ('standardscaler',
                                           StandardScaler())]),
                               ['water_content', 'weight', 'carbs']),
                              ('pipeline-2',
                               Pipeline(
                                   steps=[('simpleimputer',
                                           SimpleImputer(fill_value='missing',
                                                         strategy='constant')),
                                          ('onehotencoder',
                                           OneHotEncoder(handle_unknown='ignore'))]),
                               ['colour', 'location', 'seed', 'shape', 'sweetness',
                                'tropical'])])),
           ('decisiontreeclassifier', DecisionTreeClassifier())])
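The step names in the diagram ('columntransformer', 'pipeline-1', 'pipeline-2', and so on) follow the naming scheme that make_pipeline() and make_column_transformer() generate automatically. The sketch below is one way such a pipeline could be built; it assumes those helpers were used, which the diagram itself does not confirm. The variable names numeric_features, categorical_features, col_transformer and pipe are chosen here for illustration.

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Column lists copied from the diagram.
numeric_features = ["water_content", "weight", "carbs"]
categorical_features = ["colour", "location", "seed", "shape", "sweetness", "tropical"]

# Each tuple pairs a sub-pipeline with the columns it is applied to.
col_transformer = make_column_transformer(
    (
        make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
        numeric_features,
    ),
    (
        make_pipeline(
            SimpleImputer(strategy="constant", fill_value="missing"),
            OneHotEncoder(handle_unknown="ignore"),
        ),
        categorical_features,
    ),
)

# Preprocessing first, then the model, as in the diagram.
pipe = make_pipeline(col_transformer, DecisionTreeClassifier())

# Collect the automatically generated names for comparison with the diagram.
step_names = [name for name, _ in pipe.steps]
transformer_names = [name for name, _, _ in col_transformer.transformers]
print(step_names)
print(transformer_names)
```

Comparing the printed names with the diagram is a quick way to check which transformer is applied to which columns.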
Transforming True or False
Making Pipelines with make_pipeline()
Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.
When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.
Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented them so that the line won’t execute and you can test your code after each step.
Let’s try to redo exercise 13, but this time let’s use make_pipeline() and make_column_transformer().
Tasks:
- For all pipelines, make sure to use make_pipeline() where possible.
- Create a pipeline for the numeric features. Its first step should be simple imputation using strategy="median" and its second step should use StandardScaler. Name this pipeline numeric_transformer.
- Create a pipeline for the categorical features. It should also have 2 steps. The first is imputation using strategy="most_frequent". The second step should be one-hot encoding with handle_unknown="ignore". Name this pipeline categorical_transformer.
- Make your column transformer named col_transformer by using make_column_transformer() and specify the transformations on numeric_features and categorical_features using the appropriate pipelines you built above.
- Create a main pipeline named main_pipe which preprocesses with col_transformer followed by building a KNeighborsRegressor model.
- The last step is performing cross-validation using our pipeline.
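The tasks above can be sketched end to end as follows. This is a minimal illustration rather than the exercise's reference solution: the toy DataFrame, its column names, the random target, and the choices of cv=5 and return_train_score=True are all assumptions made here so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the exercise's dataset.
rng = np.random.default_rng(0)
n = 30
X_train = pd.DataFrame(
    {
        "weight": rng.normal(150, 20, n),
        "height": rng.normal(10, 2, n),
        "colour": pd.Series(rng.choice(["red", "green", "yellow"], n), dtype=object),
        "shape": pd.Series(rng.choice(["round", "long"], n), dtype=object),
    }
)
X_train.loc[0, "weight"] = np.nan  # missing numeric value for the imputer
X_train.loc[1, "colour"] = np.nan  # missing categorical value for the imputer
y_train = rng.normal(50, 5, n)     # made-up numeric target

numeric_features = ["weight", "height"]
categorical_features = ["colour", "shape"]

# Numeric pipeline: median imputation, then scaling.
numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# Categorical pipeline: most-frequent imputation, then one-hot encoding.
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

# Route each group of columns to its pipeline.
col_transformer = make_column_transformer(
    (numeric_transformer, numeric_features),
    (categorical_transformer, categorical_features),
)

# Main pipeline: preprocessing followed by the regressor.
main_pipe = make_pipeline(col_transformer, KNeighborsRegressor())

# Cross-validate the whole pipeline so preprocessing is refit on each training fold.
scores = cross_validate(main_pipe, X_train, y_train, cv=5, return_train_score=True)
print(pd.DataFrame(scores))
```

Passing the whole pipeline to cross_validate(), rather than pre-transformed data, keeps the imputation and scaling statistics from being computed on the validation folds.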