5.1. Exercises

Quick Questions on Tradeoff and Golden Rule

Question 1


Training and Testing Questions

Picking your Hyperparameter Part 1

Instructions:
Running a coding exercise for the first time can take a while as everything loads. Be patient; it could take a few minutes.

When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.

Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented them so that the line won't execute and you can test your code after each step.

Let's take another look at the basketball dataset we saw in exercise 16. We will again use the features height, weight and salary and the target column position. This time, however, let's cross-validate over different values of max_depth so we can set this hyperparameter and build a final model that generalizes best to our test set.

First, let's see which value of the hyperparameter performs best.

Tasks:

  • Fill in the code below.

  • We are first loading in our bball.csv dataset and assigning our features to X and our target position to an object named y.

  • Fill in the code so that it splits the dataset into X_train, X_test, y_train, y_test. Make sure to use a 20% test set and random_state=33 so we can verify your solution.

  • Next, fill in the code so that a for loop does the following:

    • Iterates over the values 1-20.
    • Builds a decision tree classifier with max_depth equal to the value of the current iteration.
    • Uses cross_validate on the model with cv=10 and return_train_score=True.
    • Appends the depth value to the depth list in the dictionary results_dict.
    • Appends the mean of test_score to the mean_cv_score list in the dictionary.
    • Appends the mean of train_score to the mean_train_score list in the dictionary.
  • We have given you code that wrangles this dictionary and transforms it into a state ready for plotting.

  • Finish off by filling in the blank to create a line graph that plots the train and validation scores for each depth value. (Note: we have edited the limits of the y-axis so it's easier to read.)
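
The split and the loop described in the tasks above could be sketched roughly as follows. This is an illustration only, not the course's worked solution. Because bball.csv is not reproduced here, the sketch assumes a synthetic stand-in built with make_classification; the column names are borrowed from the exercise, but the values (and therefore the scores) say nothing about the real data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for bball.csv (3 features, binary target).
features, target = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(features, columns=["height", "weight", "salary"])
y = pd.Series(target, name="position")

# 20% test set, with the random_state requested in the exercise.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

results_dict = {"depth": [], "mean_train_score": [], "mean_cv_score": []}

for depth in range(1, 21):  # the values 1-20
    model = DecisionTreeClassifier(max_depth=depth)
    # 10-fold cross-validation on the training portion only.
    scores = cross_validate(
        model, X_train, y_train, cv=10, return_train_score=True
    )
    results_dict["depth"].append(depth)
    results_dict["mean_cv_score"].append(scores["test_score"].mean())
    results_dict["mean_train_score"].append(scores["train_score"].mean())
```

Note that cross_validate only ever sees X_train and y_train, so the test set stays untouched while max_depth is being chosen.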

Hint 1
  • Are you using train_test_split() to split the data?
  • Are you splitting with either test_size=0.2 or train_size=0.8?
  • Are you setting your random_state=33 inside train_test_split()?
  • Are you using DecisionTreeClassifier(max_depth=depth) to build the model?
  • Are you using cross_validate(model, X_train, y_train, cv=10, return_train_score=True)?
  • Are you using alt.Chart(results_df).mark_line() to create your plot?
Fully worked solution:


Picking your Hyperparameter Part 2

Now that we have found a suitable value for max_depth, let's build a new model and set this hyperparameter value. How well does your model do on the test data?

Tasks:

  • Build a model using DecisionTreeClassifier() using the optimal max_depth.
  • Save this in an object named model.
  • Fit your model on the objects X_train and y_train.
  • Evaluate the test score of the model using .score() on X_test and y_test, round it to 4 decimal places, and save the value in an object named test_score.
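
A minimal sketch of these steps is shown below. It again assumes a synthetic stand-in for the basketball data (generated with make_classification), and it takes max_depth=4 from the hint rather than from any result computed here, so the printed score is not the answer to the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the height/weight/salary features.
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

# Fix the hyperparameter chosen during cross-validation, then fit on
# the full training set.
model = DecisionTreeClassifier(max_depth=4)
model.fit(X_train, y_train)

# The test set is used exactly once, for the final evaluation.
test_score = round(model.score(X_test, y_test), 4)
print(test_score)
```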
Hint 1
  • Are you using DecisionTreeClassifier(max_depth=4)?
  • Are you using the model named model?
  • Are you calling .fit(X_train, y_train) on your model?
  • Are you scoring your model using model.score(X_test, y_test)?
  • Are you rounding to 4 decimal places?
  • Are you calculating test_score as round(model.score(X_test, y_test), 4)?
Fully worked solution: