4.1. Exercises

Making pipelines

Use the diagram below to answer the following questions.

Pipeline(
    steps=[('columntransformer',
            ColumnTransformer(
                transformers=[('pipeline-1',
                               Pipeline(
                                   steps=[('simpleimputer',
                                           SimpleImputer(strategy='median')),
                                          ('standardscaler',
                                           StandardScaler())]),
                               ['water_content', 'weight', 'carbs']),
                              ('pipeline-2',
                               Pipeline(
                                   steps=[('simpleimputer',
                                           SimpleImputer(fill_value='missing',
                                                         strategy='constant')),
                                          ('onehotencoder',
                                           OneHotEncoder(handle_unknown='ignore'))]),
                               ['colour', 'location', 'seed', 'shape', 'sweetness',
                                'tropical'])])),
           ('decisiontreeclassifier', DecisionTreeClassifier())])
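
For reference, a pipeline that displays like this could be assembled with make_pipeline() and make_column_transformer(). The sketch below is one assumed construction, reusing the column names shown in the diagram:

# A sketch of how a pipeline with this structure could be built; the column
# lists come from the diagram above, the construction itself is assumed.
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

numeric_features = ['water_content', 'weight', 'carbs']
categorical_features = ['colour', 'location', 'seed', 'shape', 'sweetness', 'tropical']

# Numeric columns: median imputation followed by standardization
numeric_pipe = make_pipeline(SimpleImputer(strategy='median'), StandardScaler())

# Categorical columns: constant imputation followed by one-hot encoding
categorical_pipe = make_pipeline(
    SimpleImputer(strategy='constant', fill_value='missing'),
    OneHotEncoder(handle_unknown='ignore'))

# Apply each pipeline to its columns, then add the classifier as the final step
preprocessor = make_column_transformer(
    (numeric_pipe, numeric_features),
    (categorical_pipe, categorical_features))

pipe = make_pipeline(preprocessor, DecisionTreeClassifier())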
                

Transforming True or False

Making Pipelines with make_pipeline()

Instructions:
Running a coding exercise for the first time can take a while, since everything needs to load. Be patient; it may take a few minutes.

When you see ____ in a coding exercise, replace it with what you think is the correct code. Run it and see if you obtain the desired output, then submit your code to check whether you were correct.

Make sure you remove the hash (#) symbols in the coding portions of this question. We have commented out those lines so they won’t execute, letting you test your code after each step.

Let’s redo exercise 13, but this time using make_pipeline() and make_column_transformer().

Tasks:

  • For all pipelines, make sure to use make_pipeline() where possible.
  • Create a pipeline for the numeric features. The first step should be imputation with SimpleImputer(strategy="median") and the second step should be StandardScaler. Name this pipeline numeric_transformer.
  • Create a pipeline for the categorical features. It should also have two steps: first, imputation with SimpleImputer(strategy="most_frequent"), and second, one-hot encoding with handle_unknown="ignore". Name this pipeline categorical_transformer.
  • Make your column transformer named col_transformer by using make_column_transformer() and specifying the transformations on numeric_features and categorical_features with the appropriate pipelines you built above.
  • Create a main pipeline named main_pipe which preprocesses with col_transformer and then builds a KNeighborsRegressor model.
  • The last step is performing cross-validation using our pipeline.
Hint 1
  • Are you using SimpleImputer(strategy="median") for numerical imputation?
  • Are you naming your steps?
  • Are you using SimpleImputer(strategy="most_frequent") for categorical imputation?
  • Are you using one-hot encoding?
  • Are you specifying numeric_transformer with numeric_features and categorical_transformer with categorical_features in make_column_transformer?
  • Is the first step in your main pipeline calling col_transformer?
  • Are you calling main_pipe in cross_validate()?
Fully worked solution:
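
One way to put these pieces together is sketched below. Since the exercise’s dataset isn’t reproduced here, X_train, y_train, numeric_features, and categorical_features are placeholder assumptions; the pipeline structure follows the tasks above.

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder data; the real exercise loads its own dataframe and feature lists.
X_train = pd.DataFrame({
    'water_content': [90, 85, np.nan, 70, 95, 80, 60, 75, 88, 92],
    'weight': [120, 300, 150, np.nan, 100, 250, 400, 180, 130, 110],
    'colour': ['red', 'green', np.nan, 'yellow', 'red',
               'green', 'brown', 'red', 'green', 'red']})
y_train = pd.Series([3.2, 4.1, 2.8, 3.9, 3.0, 4.5, 5.0, 3.6, 3.1, 2.9])
numeric_features = ['water_content', 'weight']
categorical_features = ['colour']

# Pipeline for the numeric features: median imputation, then scaling
numeric_transformer = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler())

# Pipeline for the categorical features: most-frequent imputation, then one-hot encoding
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"))

# Column transformer that applies each pipeline to its columns
col_transformer = make_column_transformer(
    (numeric_transformer, numeric_features),
    (categorical_transformer, categorical_features))

# Main pipeline: preprocessing followed by a KNN regressor
main_pipe = make_pipeline(col_transformer, KNeighborsRegressor())

# Cross-validate using the full pipeline
scores = cross_validate(main_pipe, X_train, y_train, return_train_score=True)
pd.DataFrame(scores)

In a notebook, the final line displays the fit times, score times, and train/validation scores for each fold.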