simplify.categorical_plot

simplify.categorical_plot(
    df,
    target_column,
    categorical_target,
    max_categories=10,
    categorical_features=None,
)

Perform exploratory data analysis (EDA) on the categorical columns of a dataset.

This function creates Altair plots for the specified columns, treating them as categorical data. It creates sorted horizontal bar charts showing the frequency and proportion of each category. It also plots each feature against the target: box plots if the target is numerical, or stacked bar charts if the target is categorical.
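As a rough illustration of the numbers behind such a bar chart, the frequency and proportion of each category can be computed with pandas. This is a minimal sketch, not the library's actual implementation; the helper name `category_summary` and the sample data are hypothetical.

```python
import pandas as pd


def category_summary(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: frequency and proportion of each category."""
    # value_counts sorts by frequency, most frequent category first.
    counts = df[column].value_counts()
    summary = counts.rename_axis(column).reset_index(name="count")
    # Proportion is each category's share of all non-missing rows.
    summary["proportion"] = summary["count"] / summary["count"].sum()
    return summary


df = pd.DataFrame({"artist": ["A", "B", "A", "C", "A", "B"]})
summary = category_summary(df, "artist")
print(summary)
```

A table like this, with one row per category, is the kind of input a sorted horizontal bar chart would typically be drawn from.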

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas.DataFrame | A pandas DataFrame containing the dataset. | required |
| target_column | str | The name of the target column. | required |
| categorical_target | bool | Whether the target column is categorical. | required |
| max_categories | int | The maximum number of categories to plot for high-cardinality features. | 10 |
| categorical_features | list | A list of strings containing the column names of the categorical features. If not passed, all features are kept. | None |
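The documentation does not say how `max_categories` selects which categories to plot. One plausible reading, shown here purely as an assumption, is that only the most frequent categories are kept. The helper `top_categories` is hypothetical and not part of the library.

```python
import pandas as pd


def top_categories(series: pd.Series, max_categories: int = 10) -> pd.Series:
    """Hypothetical helper: counts for the most frequent categories only."""
    # value_counts sorts by frequency (descending); head keeps the top N.
    return series.value_counts().head(max_categories)


genres = pd.Series(["rock", "pop", "rock", "jazz", "pop", "rock", "folk"])
top = top_categories(genres, max_categories=2)
print(top)
```

Capping the number of bars this way keeps a chart readable when a column has many distinct values.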

Returns

| Type | Description |
| --- | --- |
| list | A list of Altair plot objects, one for each plot created. |

Raises

| Type | Description |
| --- | --- |
| TypeError | If df is not a pandas DataFrame, target_column is not a string, or categorical_features is not a list. |
| ValueError | If df is empty, target_column is not in the DataFrame, or categorical_features is empty or contains columns not in the DataFrame. |
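The checks described above could look roughly like the following. This is a sketch under stated assumptions: the function name `validate_inputs`, the order of the checks, and the error messages are all hypothetical, and the library's real validation may differ.

```python
import pandas as pd


def validate_inputs(df, target_column, categorical_features=None):
    """Hypothetical validator mirroring the documented exceptions."""
    # Type checks (documented as TypeError).
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if not isinstance(target_column, str):
        raise TypeError("target_column must be a string")
    if categorical_features is not None and not isinstance(categorical_features, list):
        raise TypeError("categorical_features must be a list")

    # Value checks (documented as ValueError).
    if df.empty:
        raise ValueError("df must not be empty")
    if target_column not in df.columns:
        raise ValueError(f"{target_column!r} is not a column of df")
    if categorical_features is not None:
        if not categorical_features:
            raise ValueError("categorical_features must not be empty")
        missing = [c for c in categorical_features if c not in df.columns]
        if missing:
            raise ValueError(f"columns not in df: {missing}")


df = pd.DataFrame({"artist": ["A", "B"], "popularity": [80, 75]})
validate_inputs(df, "popularity", ["artist"])  # valid input, no exception

bad_calls = [
    ([1, 2], "popularity", None),  # df is not a DataFrame
    (df, "tempo", None),           # target column not in df
    (df, "popularity", []),        # empty feature list
]
raised = []
for args in bad_calls:
    try:
        validate_inputs(*args)
    except (TypeError, ValueError) as exc:
        raised.append(type(exc).__name__)
print(raised)
```

Running the type checks before the value checks means a wrongly typed argument is reported as a `TypeError` even when it would also fail a value check.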

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "artist": ["A", "B", "C", "D"],
...     "popularity": [80, 75, 90, 85],
...     "danceability": [0.8, 0.6, 0.9, 0.7],
...     "energy": [0.7, 0.8, 0.6, 0.9]
... })
>>> plots = categorical_plot(df, "popularity", categorical_target=False, categorical_features=["artist"])