Module Learning Outcomes

By the end of the module, students are expected to:

  • Explain the handle_unknown="ignore" hyperparameter of scikit-learn’s OneHotEncoder.
  • Identify when it’s appropriate to apply ordinal encoding vs one-hot encoding.
  • Explain strategies to deal with categorical variables with too many categories.
  • Explain why text data needs a different treatment than categorical variables.
  • Use scikit-learn’s CountVectorizer to encode text data.
  • Explain the different hyperparameters of CountVectorizer.
  • Use ColumnTransformer to combine all our transformations into one object and use it with scikit-learn pipelines.

Let’s start!