1.1. Exercises

Splitting our data

Decision Tree Outcome

Splitting Data in Action

Instructions:
The first time you run a coding exercise, it may take a few minutes for everything to load, so please be patient.

When you see ____ in a coding exercise, replace it with the code you think is correct. Run it and check whether you obtain the desired output. Submit your code to validate your answer.

Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented those lines out so that they won’t execute, which lets you test your code after each step.

Let’s split our data using train_test_split() on our candy bars dataset.

Tasks:

  • Split the X and y dataframes into 4 objects: X_train, X_test, y_train, y_test.
  • Set test_size to 0.2 (or, equivalently, train_size to 0.8) and make sure to use random_state=7.
  • Build a model using DecisionTreeClassifier().
  • Save this in an object named model.
  • Fit your model on the objects X_train and y_train.
  • Evaluate the accuracy of the model using .score() on X_train and y_train, and save the value in an object named train_score.
  • Repeat the step above, but this time evaluate the accuracy of the model using .score() on X_test and y_test (which the model has never seen before), and save the value in an object named test_score.
Hint 1
  • Are you using X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)?
  • Are you using DecisionTreeClassifier()?
  • Are you using the model named model?
  • Are you calling .fit(X_train, y_train) on your model?
  • Are you scoring your model using model.score(X_train, y_train) and model.score(X_test, y_test)?
Fully worked solution:
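
The following is a minimal sketch of the steps listed in the tasks and hints, not the exercise's official answer. It assumes a small made-up stand-in for the candy bars data: the name candy_df, the column names, and all values are hypothetical and chosen only so that the code runs on its own. In the exercise itself, X and y are already provided.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the candy bars data (20 rows).
# Column names and values are invented for illustration only.
candy_df = pd.DataFrame({
    "chocolate": [1, 1, 0, 1, 0,  1, 0, 1, 1, 0,
                  1, 0, 1, 0, 1,  0, 1, 1, 0, 0],
    "peanuts":   [0, 1, 0, 0, 1,  1, 0, 0, 1, 1,
                  0, 0, 1, 0, 1,  1, 0, 0, 1, 0],
    "caramel":   [1, 0, 0, 1, 1,  0, 1, 0, 1, 0,
                  0, 1, 1, 0, 0,  1, 1, 0, 0, 1],
    "target":    ["Both", "Both", "America", "Both", "America",
                  "Both", "America", "Both", "Both", "America",
                  "Both", "America", "Both", "Both", "Both",
                  "America", "Both", "America", "America", "America"],
})

# Separate the features (X) from the target (y)
X = candy_df.drop(columns=["target"])
y = candy_df["target"]

# Split into 80% train and 20% test; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# Build the model and fit it on the training portion only
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Accuracy on data the model was trained on
train_score = model.score(X_train, y_train)

# Accuracy on data the model has never seen before
test_score = model.score(X_test, y_test)

print("Train score:", round(train_score, 2))
print("Test score:", round(test_score, 2))
```

Comparing train_score with test_score indicates how well the tree generalizes: a training score much higher than the test score suggests the model has overfit the training data. With a stand-in this small, the actual numbers are not meaningful; only the workflow carries over.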