What Did We Learn and What to Expect in Assignment 6
Module Learning Outcomes
By the end of the module, students are expected to:
- Explain the handle_unknown="ignore" hyperparameter of scikit-learn’s OneHotEncoder.
- Identify when it’s appropriate to apply ordinal encoding vs one-hot encoding.
- Explain strategies to deal with categorical variables with too many categories.
- Explain why text data needs a different treatment than categorical variables.
- Use scikit-learn’s CountVectorizer to encode text data.
- Explain different hyperparameters of CountVectorizer.
- Use ColumnTransformer to combine all our transformations into one object and use it with scikit-learn pipelines.
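
As a quick refresher, the outcomes above can be sketched in one small example. This is a minimal illustration, not the assignment solution: the toy DataFrame, its column names (city, size, review, liked), and the choice of LogisticRegression are all assumptions made up for this sketch.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data (invented for illustration only).
train = pd.DataFrame(
    {
        "city": ["Vancouver", "Toronto", "Vancouver", "Montreal"],
        "size": ["small", "large", "medium", "small"],
        "review": ["great food", "bad service", "great service", "bad food"],
        "liked": [1, 0, 1, 0],
    }
)
test = pd.DataFrame(
    {
        "city": ["Calgary"],  # a category never seen during training
        "size": ["medium"],
        "review": ["great food and great view"],
    }
)

preprocessor = ColumnTransformer(
    transformers=[
        # Unordered categories: one-hot encode; unseen categories become all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
        # Ordered categories: ordinal encode with an explicit order.
        ("ordinal", OrdinalEncoder(categories=[["small", "medium", "large"]]), ["size"]),
        # Free text: bag of words. CountVectorizer expects a 1D column,
        # so the column is given as a string, not a list.
        ("text", CountVectorizer(binary=True, max_features=10), "review"),
    ],
    sparse_threshold=0,  # always return a dense array so it is easy to inspect
)

# One object that preprocesses all columns and then fits the model.
pipe = make_pipeline(preprocessor, LogisticRegression())
pipe.fit(train.drop(columns=["liked"]), train["liked"])

# Transform the test row with the fitted preprocessor (first pipeline step).
X_test = pipe[0].transform(test)
prediction = pipe.predict(test)
```

Because "Calgary" was not in the training data, handle_unknown="ignore" leaves all three city columns at zero for the test row instead of raising an error. The words "and" and "view" are likewise dropped by CountVectorizer, since they are not in the vocabulary learned from the training reviews.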