This function copies the input DataFrame (df) and examines each string column. For each one, it counts the unique values and divides that count by the total number of rows. If the resulting ratio is below max_unique_ratio, the column is converted to category dtype. The function returns the copy with the converted columns and prints the number of columns that were converted.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | The DataFrame containing string columns. | required |
| max_unique_ratio | float | The maximum ratio of (unique_values / total_rows) for categorical conversion. 0.5 means convert if unique values are less than 50% of total rows. Lower values (e.g., 0.3) make conversion more conservative. Higher values (e.g., 0.7) make conversion more aggressive. | 0.5 |
Returns

| Type | Description |
|---|---|
| pd.DataFrame | The DataFrame with eligible string columns converted to category dtype. |
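
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `convert_strings_to_category`, the wording of the printed message, the sample data, and the choice to treat object and string dtype columns as "string columns" are all assumptions made for the example.

```python
import pandas as pd
from pandas.api import types as ptypes


def convert_strings_to_category(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Hypothetical sketch of the conversion logic described in this section."""
    result = df.copy()  # work on a copy so the caller's DataFrame is unchanged
    total_rows = len(result)
    converted = 0

    # With no rows the ratio is undefined, so nothing is converted.
    if total_rows > 0:
        for col in result.columns:
            series = result[col]
            # Assumption: "string columns" are those with object or string dtype.
            if not (ptypes.is_object_dtype(series) or ptypes.is_string_dtype(series)):
                continue
            unique_ratio = series.nunique() / total_rows
            if unique_ratio < max_unique_ratio:
                result[col] = series.astype("category")
                converted += 1

    # Assumed message format; the source only says the count is printed.
    print(f"Converted {converted} column(s) to category dtype.")
    return result


# Illustrative data: "city" repeats often, "order_id" is unique in every row.
sample = pd.DataFrame(
    {
        "city": ["Paris", "Rome", "Paris", "Rome", "Paris", "Rome"],
        "order_id": ["a1", "b2", "c3", "d4", "e5", "f6"],
        "amount": [10, 20, 30, 40, 50, 60],
    }
)
converted_df = convert_strings_to_category(sample, max_unique_ratio=0.5)
print(converted_df.dtypes)
```

Here "city" has 2 unique values across 6 rows (ratio about 0.33), so it falls below the 0.5 threshold and is converted, while "order_id" has a ratio of 1.0 and is left alone. Low-cardinality columns are the ones where category dtype typically saves memory, which is why the threshold is expressed as a ratio rather than an absolute count.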