3.1. Exercises

Cross Validation Questions

Question 3

array([0.80952381, 0.80952381, 0.85714286, 0.85714286])

Cross Validation True or False


Instructions:
Running a coding exercise for the first time could take a bit of time for everything to load. Be patient, it could take a few minutes.

When you see ____ in a coding exercise, replace it with what you assume to be the correct code. Run it and see if you obtain the desired output. Submit your code to validate if you were correct.

Make sure you remove the hash (#) symbol in the coding portions of this question. We have commented them so that the line won't execute and you can test your code after each step.

Cross Validation in Action

Let's use cross_val_score() on a Pokémon dataset that we've used before in Programming in Python for Data Science.

Tasks:

  • Split the X and y dataframes into 4 objects: X_train, X_test, y_train, y_test.
  • Make the test set 20% of the data (test_size=0.2, or equivalently train_size=0.8) and use random_state=33 (the random state here is for testing purposes so we all get the same split).
  • Build a model using DecisionTreeClassifier().
  • Save this in an object named model.
  • Cross-validate using cross_val_score() on the objects X_train and y_train and with 6 folds (cv=6) and save these scores in an object named cv_scores.
Hint 1
  • Are you using X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=33)?
  • Are you using DecisionTreeClassifier()?
  • Are you using the model named model?
  • Are you cross-validating using cross_val_score(model, X_train, y_train, cv=6) on your model?
Fully worked solution:
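
A minimal sketch of one possible solution follows. The Pokémon data is not reproduced here, so X and y come from make_classification as a synthetic stand-in (an assumption, not the course dataset); in the exercise itself, X and y are already provided and that line is not needed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replaces the Pokémon X and y from the exercise
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hold out 20% of the rows as a test set; random_state fixes the split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

# Build the model
model = DecisionTreeClassifier()

# 6-fold cross-validation on the training portion only
cv_scores = cross_val_score(model, X_train, y_train, cv=6)
print(cv_scores)
```

cv_scores is a NumPy array with one validation score per fold, so cv=6 yields six values. The test set is left untouched during cross-validation.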


Cross Validation in Action again!

Let's use cross_validate() on our Pokémon dataset that we saw in the previous exercises.

Tasks:

  • Build a model using DecisionTreeClassifier().
  • Save this in an object named model.
  • Cross-validate using cross_validate() on the objects X_train and y_train, specifying 10 folds (cv=10) and return_train_score=True.
  • Convert the scores into a dataframe and save it in an object named scores_df.
  • Calculate the mean value of each column and save this in an object named mean_scores.
Hint 1
  • Are you using DecisionTreeClassifier()?
  • Are you using the model named model?
  • Are you cross-validating using cross_validate(model, X_train, y_train, cv=10, return_train_score=True) on your model?
  • Are you saving your dataframe using pd.DataFrame(scores)?
  • Are you using .mean() to calculate the mean of each column in scores_df?
Fully worked solution:
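
A minimal sketch of one possible solution follows. As before, the synthetic data from make_classification is a stand-in for the Pokémon X and y (an assumption, not the course dataset), and the split is repeated only so the block runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replaces the Pokémon X and y from the exercise
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=33
)

# Build the model
model = DecisionTreeClassifier()

# 10-fold cross-validation, also recording the score on each training fold
scores = cross_validate(model, X_train, y_train, cv=10, return_train_score=True)

# cross_validate returns a dict of arrays; a dataframe gives one row per fold
scores_df = pd.DataFrame(scores)

# Column-wise means: fit_time, score_time, test_score, train_score
mean_scores = scores_df.mean()
print(mean_scores)
```

Comparing the mean train_score with the mean test_score gives a quick check for overfitting: a large gap suggests the tree fits the training folds much better than the validation folds.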