Support Vector Machines (SVMs) with RBF Kernel

import pandas as pd
from sklearn.model_selection import train_test_split

cities_df = pd.read_csv("data/canada_usa_cities.csv")
train_df, test_df = train_test_split(cities_df, test_size=0.2, random_state=123)
train_df.head()
     longitude  latitude country
160   -76.4813   44.2307  Canada
127   -81.2496   42.9837  Canada
169   -66.0580   45.2788  Canada
188   -73.2533   45.3057  Canada
187   -67.9245   47.1652  Canada


X_train, y_train = train_df.drop(columns=['country']), train_df['country']
X_test, y_test = test_df.drop(columns=['country']), test_df['country']
X_train.head()
     longitude  latitude
160   -76.4813   44.2307
127   -81.2496   42.9837
169   -66.0580   45.2788
188   -73.2533   45.3057
187   -67.9245   47.1652

import altair as alt

cities_plot = alt.Chart(train_df).mark_circle(size=20, opacity=0.6).encode(
    alt.X('longitude:Q', scale=alt.Scale(domain=[-140, -40])),
    alt.Y('latitude:Q', scale=alt.Scale(domain=[20, 60])),
    alt.Color('country:N', scale=alt.Scale(domain=['Canada', 'USA'],
                                           range=['red', 'blue'])))
cities_plot
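[Figure: scatter plot of the training cities, longitude vs. latitude, coloured by country (Canada in red, USA in blue)]
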
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

svm = SVC(gamma=0.01)
scores = cross_validate(svm, X_train, y_train, return_train_score=True)
pd.DataFrame(scores)
   fit_time  score_time  test_score  train_score
0  0.002137    0.001423    0.823529     0.842105
1  0.001644    0.001235    0.823529     0.842105
2  0.001591    0.001213    0.727273     0.858209
3  0.001578    0.001210    0.787879     0.843284
4  0.001583    0.001210    0.939394     0.805970


svm_cv_score = scores['test_score'].mean()
svm_cv_score
np.float64(0.8203208556149733)

SVMs

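from sklearn.neighbors import KNeighborsClassifier

# k-NN baseline on the same training data, for comparison with the SVM
knn = KNeighborsClassifier(n_neighbors=5)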
scores = cross_validate(knn, X_train, y_train, return_train_score=True)
pd.DataFrame(scores)
   fit_time  score_time  test_score  train_score
0  0.001740    0.002326    0.852941     0.849624
1  0.001257    0.002106    0.764706     0.834586
2  0.001236    0.001877    0.727273     0.850746
3  0.001218    0.001899    0.787879     0.858209
4  0.001212    0.001886    0.878788     0.813433


knn_cv_score = scores['test_score'].mean().round(3)
knn_cv_score
np.float64(0.802)


svm_cv_score.round(3)
np.float64(0.82)

SVM Regressor

from sklearn.svm import SVR
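
SVR follows the same fit/score pattern as SVC. Here is a minimal sketch on synthetic data. The noisy sine curve, the variable names and the values gamma=0.5 and C=10 are all assumptions chosen for illustration; none of them come from the cities example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic 1-D regression data (an assumption; not the cities data)
rng = np.random.default_rng(123)
X_reg = rng.uniform(0, 5, size=(200, 1))
y_reg = np.sin(X_reg).ravel() + rng.normal(0, 0.1, size=200)

X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=123
)

# gamma and C are set the same way as for SVC; these values are illustrative
svr = SVR(gamma=0.5, C=10)
svr.fit(X_reg_train, y_reg_train)

# For regressors, .score() returns R^2 rather than accuracy
r2_test = svr.score(X_reg_test, y_reg_test)
print(f"Test R^2: {r2_test:.3f}")
```

Because the target is numeric, the score reported here is R^2, so it is not directly comparable to the classification accuracies above.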

The main hyperparameters of an SVM with an RBF kernel are:

  • gamma
  • C

Relation of gamma and the fundamental trade-off

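gamma controls the complexity of the model, just like other hyperparameters we’ve seen. Larger values make the decision boundary more flexible, so the model fits the training data more closely.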

  • As gamma ↑, complexity ↑
  • As gamma ↓, complexity ↓
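
One way to see the effect of gamma is to sweep it and compare train and cross-validation scores. This is a rough sketch under stated assumptions: it uses a synthetic two-moons dataset from `make_moons` as a stand-in for the cities data, and the gamma values and the name `gamma_results` are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; stands in for the cities data)
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=123)

gamma_results = {}
for gamma in [0.001, 0.1, 10.0, 1000.0]:
    cv = cross_validate(SVC(gamma=gamma), X_demo, y_demo, return_train_score=True)
    # Store the mean train and mean validation accuracy for this gamma
    gamma_results[gamma] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(f"gamma={gamma:<8} train={gamma_results[gamma][0]:.3f} "
          f"valid={gamma_results[gamma][1]:.3f}")
```

The train score should climb as gamma grows. A widening gap between the train and validation scores is the usual sign that the model has become too complex.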


Relation of C and the fundamental trade-off

C also affects the fundamental trade-off.

  • As C ↑, complexity ↑
  • As C ↓, complexity ↓
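
The same kind of sweep can be run for C with gamma held fixed. Again, this is a sketch under assumptions: the two-moons data is synthetic, and gamma=1.0, the C values and the name `c_results` are arbitrary illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic two-class data (an assumption; stands in for the cities data)
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=123)

c_results = {}
for C in [0.01, 1.0, 100.0, 1000.0]:
    # gamma is held fixed so that only C changes between runs
    cv = cross_validate(SVC(gamma=1.0, C=C), X_demo, y_demo, return_train_score=True)
    c_results[C] = (cv["train_score"].mean(), cv["test_score"].mean())
    print(f"C={C:<8} train={c_results[C][0]:.3f} valid={c_results[C][1]:.3f}")
```

Since gamma and C both affect complexity, in practice they are usually tuned together rather than one at a time.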


Let’s apply what we learned!