Lecture 6: Class demo#

Imports#

import os
import sys

sys.path.append(os.path.join(os.path.abspath(".."), "..", "code"))

import matplotlib.pyplot as plt
import mglearn
import numpy as np
import pandas as pd
from plotting_functions import *
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score, cross_validate, train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from utils import *

%matplotlib inline
pd.set_option("display.max_colwidth", 200)
DATA_DIR = os.path.join(os.path.abspath(".."), "..", "data/")
from sklearn import set_config

set_config(display="diagram")



Let’s look at an example of tuning the hyperparameters of an SVC pipeline on the Spotify dataset.

spotify_df = pd.read_csv(DATA_DIR + "spotify.csv", index_col=0)
X_spotify = spotify_df.drop(columns=["target", "artist"])
y_spotify = spotify_df["target"]
X_spotify.head()
acousticness danceability duration_ms energy instrumentalness key liveness loudness mode speechiness tempo time_signature valence song_title
0 0.0102 0.833 204600 0.434 0.021900 2 0.1650 -8.795 1 0.4310 150.062 4.0 0.286 Mask Off
1 0.1990 0.743 326933 0.359 0.006110 1 0.1370 -10.401 1 0.0794 160.083 4.0 0.588 Redbone
2 0.0344 0.838 185707 0.412 0.000234 2 0.1590 -7.148 1 0.2890 75.044 4.0 0.173 Xanny Family
3 0.6040 0.494 199413 0.338 0.510000 5 0.0922 -15.236 1 0.0261 86.468 4.0 0.230 Master Of None
4 0.1800 0.678 392893 0.561 0.512000 5 0.4390 -11.648 0 0.0694 174.004 4.0 0.904 Parallel Lines
X_train, X_test, y_train, y_test = train_test_split(
    X_spotify, y_spotify, test_size=0.2, random_state=123
)
numeric_feats = ['acousticness', 'danceability', 'energy',
                 'instrumentalness', 'liveness', 'loudness',
                 'speechiness', 'tempo', 'valence']
categorical_feats = ['time_signature', 'key']
passthrough_feats = ['mode']
text_feat = "song_title"
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer

preprocessor = make_column_transformer(
    (StandardScaler(), numeric_feats), 
    (OneHotEncoder(handle_unknown="ignore"), categorical_feats), 
    ("passthrough", passthrough_feats), 
    (CountVectorizer(max_features=100, stop_words="english"), text_feat)
)

svc_pipe = make_pipeline(preprocessor, SVC())
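
Before tuning anything, it helps to have a baseline. Here is a minimal sketch (not part of the original demo; the score will vary) that cross-validates the pipeline with default SVC hyperparameters:

# Baseline: mean cross-validation score with default hyperparameters
cross_val_score(svc_pipe, X_train, y_train).mean()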

What’s the general approach for model selection?

mglearn.plots.plot_grid_search_overview()
[Figure: overview of the grid search / model selection workflow]

Hyperparameter optimization is so common that sklearn provides two classes to automate these steps: GridSearchCV and RandomizedSearchCV (a sketch of the latter follows below).

The “CV” suffix stands for cross-validation; both classes have cross-validation built in.
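
For reference, here is a minimal sketch of RandomizedSearchCV, which samples a fixed number of configurations from distributions instead of trying every combination (the distributions below are illustrative, assuming scipy's loguniform is available):

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

# Sample 20 random configurations instead of an exhaustive grid
param_dist = {
    "svc__C": loguniform(1e-3, 1e2),
    "svc__gamma": loguniform(1e-3, 1e2),
}
random_search = RandomizedSearchCV(
    svc_pipe, param_dist, n_iter=20, n_jobs=-1, random_state=123
)
# random_search.fit(X_train, y_train)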





Exhaustive grid search: sklearn.model_selection.GridSearchCV#

  • For GridSearchCV we need

    • an instantiated model or a pipeline

    • a parameter grid: A user specifies a set of values for each hyperparameter.

    • other optional arguments

The method considers the Cartesian product of these sets and evaluates each combination one by one.

preprocessor.fit(X_train)
ColumnTransformer(transformers=[('standardscaler', StandardScaler(),
                                 ['acousticness', 'danceability', 'energy',
                                  'instrumentalness', 'liveness', 'loudness',
                                  'speechiness', 'tempo', 'valence']),
                                ('onehotencoder',
                                 OneHotEncoder(handle_unknown='ignore'),
                                 ['time_signature', 'key']),
                                ('passthrough', 'passthrough', ['mode']),
                                ('countvectorizer',
                                 CountVectorizer(max_features=100,
                                                 stop_words='english'),
                                 'song_title')])
from sklearn.model_selection import GridSearchCV

pipe_svm = make_pipeline(preprocessor, SVC())

param_grid = {
    "columntransformer__countvectorizer__max_features": [100, 200, 400, 800, 1000, 2000],
    "svc__gamma": [0.001, 0.01, 0.1, 1.0, 10, 100],
    "svc__C": [0.001, 0.01, 0.1, 1.0, 10, 100],
}
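
With six values for each of the three hyperparameters, this grid has 6 × 6 × 6 = 216 combinations, each evaluated with (default) 5-fold cross-validation. A quick way to count them:

from sklearn.model_selection import ParameterGrid

len(ParameterGrid(param_grid))  # 216 combinations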

# Create a grid search object 
gs = GridSearchCV(pipe_svm, 
                  param_grid = param_grid, 
                  n_jobs=-1, 
                  return_train_score=True
                 )

The GridSearchCV object above behaves like a classifier: we can call fit, predict, or score on it.

# Carry out the search 
gs.fit(X_train, y_train)
GridSearchCV(estimator=Pipeline(steps=[('columntransformer',
                                        ColumnTransformer(transformers=[('standardscaler',
                                                                         StandardScaler(),
                                                                         ['acousticness',
                                                                          'danceability',
                                                                          'energy',
                                                                          'instrumentalness',
                                                                          'liveness',
                                                                          'loudness',
                                                                          'speechiness',
                                                                          'tempo',
                                                                          'valence']),
                                                                        ('onehotencoder',
                                                                         OneHotEncoder(handle_unknown='ignore'),
                                                                         ['time_signature',
                                                                          'key']),
                                                                        ('passthrough',
                                                                         'passthrough',
                                                                         ['mode']),
                                                                        ('countvectorizer',
                                                                         CountVectorizer(max_features=100,
                                                                                         stop_words='english'),
                                                                         'song_title')])),
                                       ('svc', SVC())]),
             n_jobs=-1,
             param_grid={'columntransformer__countvectorizer__max_features': [100,
                                                                              200,
                                                                              400,
                                                                              800,
                                                                              1000,
                                                                              2000],
                         'svc__C': [0.001, 0.01, 0.1, 1.0, 10, 100],
                         'svc__gamma': [0.001, 0.01, 0.1, 1.0, 10, 100]},
             return_train_score=True)

Fitting the GridSearchCV object

  • Searches for the best hyperparameter values

  • You can access the best score and the best hyperparameters using best_score_ and best_params_ attributes, respectively.

# Get the best score
gs.best_score_
np.float64(0.7395977155164125)
# Get the best hyperparameter values
gs.best_params_
{'columntransformer__countvectorizer__max_features': 1000,
 'svc__C': 1.0,
 'svc__gamma': 0.1}
  • It is often helpful to visualize results of all cross-validation experiments.

  • You can access this information via the cv_results_ attribute of a fitted GridSearchCV object.

results = pd.DataFrame(gs.cv_results_)
results.T
0 1 2 3 4 5 6 7 8 9 ... 206 207 208 209 210 211 212 213 214 215
mean_fit_time 0.081458 0.067597 0.079209 0.068872 0.062513 0.062845 0.073076 0.064405 0.068187 0.068321 ... 0.10833 0.108598 0.107987 0.098317 0.085998 0.121705 0.107132 0.104748 0.108369 0.097153
std_fit_time 0.007708 0.006254 0.010356 0.007887 0.005892 0.001883 0.007362 0.003557 0.008313 0.009532 ... 0.007422 0.013991 0.009854 0.008934 0.006501 0.009733 0.012548 0.008275 0.009486 0.009063
mean_score_time 0.017978 0.019882 0.016656 0.017829 0.016837 0.016814 0.019368 0.015401 0.016989 0.017853 ... 0.018264 0.020877 0.021099 0.01755 0.016192 0.018154 0.017385 0.019507 0.019063 0.016565
std_score_time 0.002431 0.001182 0.001885 0.003617 0.003907 0.001666 0.003992 0.002815 0.001382 0.003161 ... 0.004611 0.001654 0.00359 0.002995 0.002228 0.004893 0.004633 0.001255 0.00204 0.001966
param_columntransformer__countvectorizer__max_features 100 100 100 100 100 100 100 100 100 100 ... 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000
param_svc__C 0.001 0.001 0.001 0.001 0.001 0.001 0.01 0.01 0.01 0.01 ... 10.0 10.0 10.0 10.0 100.0 100.0 100.0 100.0 100.0 100.0
param_svc__gamma 0.001 0.01 0.1 1.0 10.0 100.0 0.001 0.01 0.1 1.0 ... 0.1 1.0 10.0 100.0 0.001 0.01 0.1 1.0 10.0 100.0
params {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 0.001} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 0.01} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 1.0} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 10} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 100} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.01, 'svc__gamma': 0.001} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.01, 'svc__gamma': 0.01} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.01, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.01, 'svc__gamma': 1.0} ... {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 10, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 10, 'svc__gamma': 1.0} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 10, 'svc__gamma': 10} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 10, 'svc__gamma': 100} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 0.001} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 0.01} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 1.0} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 10} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 100, 'svc__gamma': 100}
split0_test_score 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 ... 0.733746 0.616099 0.50774 0.504644 0.718266 0.718266 0.724458 0.616099 0.50774 0.504644
split1_test_score 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 ... 0.77709 0.625387 0.510836 0.510836 0.724458 0.739938 0.764706 0.625387 0.510836 0.510836
split2_test_score 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 ... 0.690402 0.606811 0.50774 0.50774 0.693498 0.705882 0.687307 0.606811 0.50774 0.50774
split3_test_score 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 ... 0.708075 0.618012 0.509317 0.509317 0.68323 0.704969 0.708075 0.618012 0.509317 0.509317
split4_test_score 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 ... 0.723602 0.645963 0.509317 0.509317 0.720497 0.717391 0.720497 0.645963 0.509317 0.509317
mean_test_score 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 ... 0.726583 0.622454 0.50899 0.508371 0.70799 0.717289 0.721008 0.622454 0.50899 0.508371
std_test_score 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 ... 0.029198 0.013161 0.001162 0.002105 0.01647 0.012616 0.025396 0.013161 0.001162 0.002105
rank_test_score 121 121 121 121 121 121 121 121 121 121 ... 9 81 91 97 28 22 18 81 91 97
split0_train_score 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 ... 1.0 1.0 1.0 1.0 0.828682 0.989147 1.0 1.0 1.0 1.0
split1_train_score 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 ... 0.999225 1.0 1.0 1.0 0.834109 0.989922 1.0 1.0 1.0 1.0
split2_train_score 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 ... 0.99845 0.999225 0.999225 0.999225 0.827907 0.987597 0.999225 0.999225 0.999225 0.999225
split3_train_score 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 ... 0.998451 0.999225 0.999225 0.999225 0.841208 0.989156 0.999225 0.999225 0.999225 0.999225
split4_train_score 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 ... 0.999225 0.999225 0.999225 0.999225 0.82804 0.988381 0.999225 0.999225 0.999225 0.999225
mean_train_score 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 ... 0.99907 0.999535 0.999535 0.999535 0.831989 0.988841 0.999535 0.999535 0.999535 0.999535
std_train_score 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 ... 0.00058 0.00038 0.00038 0.00038 0.005151 0.00079 0.00038 0.00038 0.00038 0.00038

23 rows × 216 columns

results = (
    pd.DataFrame(gs.cv_results_).set_index("rank_test_score").sort_index()
)
display(results.T)
rank_test_score 1 2 3 4 5 5 7 8 9 10 ... 121 121 121 121 121 121 121 121 121 121
mean_fit_time 0.074224 0.083989 0.065975 0.067329 0.065096 0.061373 0.101038 0.097765 0.10833 0.103845 ... 0.088863 0.078514 0.078426 0.072423 0.102554 0.125866 0.085874 0.080043 0.084001 0.081458
std_fit_time 0.001822 0.013607 0.002373 0.003355 0.004307 0.007488 0.004101 0.01032 0.007422 0.014893 ... 0.007773 0.008 0.003515 0.00311 0.033829 0.019363 0.00554 0.005354 0.009203 0.007708
mean_score_time 0.015548 0.016412 0.013869 0.016925 0.01491 0.012543 0.01594 0.012393 0.018264 0.013806 ... 0.019305 0.018519 0.019897 0.017696 0.030472 0.020777 0.019871 0.018711 0.020357 0.017978
std_score_time 0.001206 0.004361 0.002881 0.003267 0.002366 0.002564 0.002809 0.000207 0.004611 0.003018 ... 0.00291 0.002901 0.000652 0.002117 0.013481 0.004611 0.004083 0.003717 0.004935 0.002431
param_columntransformer__countvectorizer__max_features 1000 2000 400 800 200 100 800 1000 2000 400 ... 1000 1000 1000 400 400 400 400 400 1000 100
param_svc__C 1.0 1.0 1.0 1.0 1.0 1.0 10.0 10.0 10.0 10.0 ... 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
param_svc__gamma 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ... 0.1 0.01 0.001 0.001 0.01 0.1 1.0 10.0 100.0 0.001
params {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 800, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 200, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 1.0, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 800, 'svc__C': 10, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 10, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 2000, 'svc__C': 10, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 10, 'svc__gamma': 0.1} ... {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 0.001, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 0.001, 'svc__gamma': 0.01} {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 0.001, 'svc__gamma': 0.001} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 0.001, 'svc__gamma': 0.001} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 0.001, 'svc__gamma': 0.01} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 0.001, 'svc__gamma': 0.1} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 0.001, 'svc__gamma': 1.0} {'columntransformer__countvectorizer__max_features': 400, 'svc__C': 0.001, 'svc__gamma': 10} {'columntransformer__countvectorizer__max_features': 1000, 'svc__C': 0.001, 'svc__gamma': 100} {'columntransformer__countvectorizer__max_features': 100, 'svc__C': 0.001, 'svc__gamma': 0.001}
split0_test_score 0.764706 0.767802 0.764706 0.76161 0.758514 0.76161 0.727554 0.718266 0.733746 0.739938 ... 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774
split1_test_score 0.767802 0.770898 0.764706 0.76161 0.758514 0.755418 0.77709 0.783282 0.77709 0.783282 ... 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774
split2_test_score 0.71517 0.708978 0.708978 0.712074 0.712074 0.712074 0.690402 0.708978 0.690402 0.693498 ... 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774 0.50774
split3_test_score 0.717391 0.717391 0.714286 0.720497 0.717391 0.714286 0.729814 0.717391 0.708075 0.714286 ... 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211 0.506211
split4_test_score 0.732919 0.729814 0.729814 0.723602 0.729814 0.732919 0.714286 0.708075 0.723602 0.701863 ... 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317 0.509317
mean_test_score 0.739598 0.738977 0.736498 0.735879 0.735261 0.735261 0.727829 0.727198 0.726583 0.726573 ... 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775
std_test_score 0.022629 0.025689 0.024028 0.021345 0.019839 0.020414 0.028337 0.028351 0.029198 0.032404 ... 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982 0.000982
split0_train_score 0.889147 0.903101 0.881395 0.886047 0.872093 0.856589 0.993023 0.996899 1.0 0.986047 ... 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752
split1_train_score 0.877519 0.895349 0.858915 0.873643 0.847287 0.83876 0.993023 0.994574 0.999225 0.987597 ... 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752
split2_train_score 0.888372 0.897674 0.87907 0.887597 0.85969 0.849612 0.994574 0.994574 0.99845 0.989922 ... 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752 0.507752
split3_train_score 0.884586 0.902401 0.869094 0.879938 0.859799 0.852053 0.989156 0.992254 0.998451 0.982184 ... 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133 0.508133
split4_train_score 0.876065 0.891557 0.861348 0.874516 0.850503 0.841983 0.992254 0.993029 0.999225 0.985283 ... 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359 0.507359
mean_train_score 0.883138 0.898016 0.869964 0.880348 0.857874 0.847799 0.992406 0.994266 0.99907 0.986207 ... 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775 0.50775
std_train_score 0.005426 0.004337 0.009063 0.00573 0.008667 0.006545 0.001792 0.001594 0.00058 0.002561 ... 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245 0.000245

22 rows × 216 columns

Let’s only look at the most relevant rows.

pd.DataFrame(gs.cv_results_)[
    [
        "mean_test_score",
        "param_columntransformer__countvectorizer__max_features", 
        "param_svc__gamma",
        "param_svc__C",
        "mean_fit_time",
        "rank_test_score",
    ]
].set_index("rank_test_score").sort_index().T
rank_test_score 1 2 3 4 5 5 7 8 9 10 ... 121 121 121 121 121 121 121 121 121 121
mean_test_score 0.739598 0.738977 0.736498 0.735879 0.735261 0.735261 0.727829 0.727198 0.726583 0.726573 ... 0.507750 0.507750 0.507750 0.507750 0.507750 0.507750 0.507750 0.507750 0.507750 0.507750
param_columntransformer__countvectorizer__max_features 1000.000000 2000.000000 400.000000 800.000000 200.000000 100.000000 800.000000 1000.000000 2000.000000 400.000000 ... 1000.000000 1000.000000 1000.000000 400.000000 400.000000 400.000000 400.000000 400.000000 1000.000000 100.000000
param_svc__gamma 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 ... 0.100000 0.010000 0.001000 0.001000 0.010000 0.100000 1.000000 10.000000 100.000000 0.001000
param_svc__C 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10.000000 10.000000 10.000000 10.000000 ... 0.001000 0.001000 0.001000 0.001000 0.001000 0.001000 0.001000 0.001000 0.001000 0.001000
mean_fit_time 0.074224 0.083989 0.065975 0.067329 0.065096 0.061373 0.101038 0.097765 0.108330 0.103845 ... 0.088863 0.078514 0.078426 0.072423 0.102554 0.125866 0.085874 0.080043 0.084001 0.081458

5 rows × 216 columns

  • Besides searching for the best hyperparameter values, GridSearchCV also fits a new model on the whole training set using the parameters that yielded the best results (shown below).

  • So we can conveniently call score on the test set with a fitted GridSearchCV object.
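
The refit model itself is available through the best_estimator_ attribute:

# The pipeline refit on the full training set with the best hyperparameters
gs.best_estimator_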

# Get the test scores 

gs.score(X_test, y_test)
0.7574257425742574

Why are best_score_ and the score above different? best_score_ is the mean cross-validation score of the best hyperparameter combination, estimated on validation folds within the training set, whereas score above evaluates the refit model on the held-out test set.

n_jobs=-1#

  • Note the n_jobs=-1 above.

  • Hyperparameter optimization is embarrassingly parallel: each configuration can be evaluated independently of the others.

  • This also makes it easy to scale the search up to many machines in the cloud, although n_jobs itself only parallelizes across the cores of a single machine.

  • Setting n_jobs=-1 tells scikit-learn to use all available CPU cores for the task.

The __ syntax#

  • Above: we have a nesting of transformers.

  • We can access the parameters of the “inner” objects by using __ to go “deeper”:

  • svc__gamma: the gamma of the svc of the pipeline

  • svc__C: the C of the svc of the pipeline

  • columntransformer__countvectorizer__max_features: the max_features hyperparameter of CountVectorizer in the column transformer preprocessor.
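
If you are unsure of the exact parameter names, get_params() on the pipeline lists every name reachable with this __ syntax. For example, to find all names involving gamma:

# List all tunable parameter names that mention "gamma"
[name for name in pipe_svm.get_params() if "gamma" in name]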

pipe_svm
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('standardscaler',
                                                  StandardScaler(),
                                                  ['acousticness',
                                                   'danceability', 'energy',
                                                   'instrumentalness',
                                                   'liveness', 'loudness',
                                                   'speechiness', 'tempo',
                                                   'valence']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['time_signature', 'key']),
                                                 ('passthrough', 'passthrough',
                                                  ['mode']),
                                                 ('countvectorizer',
                                                  CountVectorizer(max_features=100,
                                                                  stop_words='english'),
                                                  'song_title')])),
                ('svc', SVC())])

Range of C#

  • Note the exponential range for C. This is quite common: it lets you explore a wide range of magnitudes efficiently.

  • There is no point trying \(C=\{1,2,3\ldots,100\}\) because \(C=1,2,3\) are too similar to each other.

  • Often we’re trying to find an order of magnitude, e.g. \(C=\{0.01,0.1,1,10,100\}\).

  • We can also write that as \(C=\{10^{-2},10^{-1},10^0,10^1,10^2\}\).

  • Or, in other words, the \(C\) values to try are \(10^n\) for \(n=-2,-1,0,1,2\), which is basically what we have above.
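
NumPy's logspace generates exactly this kind of exponential range:

np.logspace(-2, 2, 5)  # array([1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])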



Visualizing the parameter grid as a heatmap#

def display_heatmap(param_grid, pipe, X_train, y_train):
    grid_search = GridSearchCV(
        pipe, param_grid, cv=5, n_jobs=-1, return_train_score=True
    )
    grid_search.fit(X_train, y_train)
    results = pd.DataFrame(grid_search.cv_results_)
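    # Assumes a 6x6 grid (six values per hyperparameter). ParameterGrid
    # varies the alphabetically last parameter (svc__gamma) fastest, so
    # after the reshape, rows correspond to C and columns to gamma.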
    scores = np.array(results.mean_test_score).reshape(6, 6)

    # plot the mean cross-validation scores
    my_heatmap(
        scores,
        xlabel="gamma",
        xticklabels=param_grid["svc__gamma"],
        ylabel="C",
        yticklabels=param_grid["svc__C"],
        cmap="viridis",
    )
  • Note that the range we pick for the parameters plays an important role in hyperparameter optimization.

  • For example, consider the following grid and the corresponding results.

param_grid1 = {
    "svc__gamma": 10.0**np.arange(-3, 3, 1), 
    "svc__C": 10.0**np.arange(-3, 3, 1)
}
display_heatmap(param_grid1, pipe_svm, X_train, y_train)
[Figure: heatmap of mean cross-validation accuracy over the C–gamma grid (param_grid1)]
  • Each point in the heat map corresponds to one run of cross-validation with one particular combination of hyperparameter values.

  • Colour encodes cross-validation accuracy.

    • Lighter colours mean higher accuracy

    • Darker colours mean lower accuracy

  • SVC is quite sensitive to hyperparameter settings.

  • Adjusting hyperparameters can change the accuracy from 0.51 to 0.74!

Bad range for hyperparameters#

np.logspace(1, 2, 6)
array([ 10.        ,  15.84893192,  25.11886432,  39.81071706,
        63.09573445, 100.        ])
np.linspace(1, 2, 6)
array([1. , 1.2, 1.4, 1.6, 1.8, 2. ])
param_grid2 = {"svc__gamma": np.round(np.logspace(1, 2, 6), 1), "svc__C": np.linspace(1, 2, 6)}
display_heatmap(param_grid2, pipe_svm, X_train, y_train)
[Figure: heatmap for param_grid2 — a poorly chosen range; accuracy barely varies across the grid]

A different range for the hyperparameters yields better results!#

np.logspace(-3, 2, 6)
array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])
np.linspace(1, 2, 6)
array([1. , 1.2, 1.4, 1.6, 1.8, 2. ])
param_grid3 = {"svc__gamma": np.logspace(-3, 2, 6), "svc__C": np.linspace(1, 2, 6)}

display_heatmap(param_grid3, pipe_svm, X_train, y_train)
[Figure: heatmap for param_grid3 — a better range, with a clear high-accuracy region]

It seems like we are getting even better cross-validation results with C = 2.0 and gamma = 0.1.

How about exploring different values of C close to 2.0?

param_grid4 = {"svc__gamma": np.logspace(-3, 2, 6), "svc__C": np.linspace(2, 3, 6)}

display_heatmap(param_grid4, pipe_svm, X_train, y_train)
[Figure: heatmap for param_grid4 — exploring C values near 2.0]

That’s good! We are finding more options for C where the accuracy is around 0.75. The tricky part is that we do not know in advance which range of hyperparameters will work best for a given problem, model, and dataset.

Note

GridSearchCV allows the param_grid to be a list of dictionaries. Sometimes some hyperparameters are applicable only for certain models. For example, in the context of SVC, C and gamma are applicable when the kernel is rbf whereas only C is applicable for kernel="linear".
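
A sketch of such a list-of-dicts grid (the particular values here are illustrative):

param_grid_kernels = [
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},  # gamma not applicable
]
gs_kernels = GridSearchCV(pipe_svm, param_grid_kernels, n_jobs=-1)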

(Optional) Fancier methods#

  • Both GridSearchCV and RandomizedSearchCV do each trial independently.

  • What if you could learn from your experience, e.g. learn that max_depth=3 is bad?

    • That could save time because you wouldn’t try combinations involving max_depth=3 in the future.

  • We can do this with scikit-optimize, which is a completely separate package from scikit-learn.

  • It uses a technique called “model-based optimization”; we’ll specifically use “Bayesian optimization”.

    • In short, it uses machine learning to predict what hyperparameters will be good.

    • Machine learning on machine learning!

  • This is an active research area and there are sophisticated packages for this.

Here are some examples:
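
One such example: a minimal sketch using scikit-optimize's BayesSearchCV (assuming scikit-optimize is installed; the search space below is illustrative):

from skopt import BayesSearchCV
from skopt.space import Real

# Model-based search: each trial is informed by the results of previous ones
bayes_search = BayesSearchCV(
    pipe_svm,
    {
        "svc__C": Real(1e-3, 1e2, prior="log-uniform"),
        "svc__gamma": Real(1e-3, 1e2, prior="log-uniform"),
    },
    n_iter=30,
    n_jobs=-1,
    random_state=123,
)
# bayes_search.fit(X_train, y_train)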