Separates the numeric and categorical columns of a pandas DataFrame and applies user-supplied overrides for ambiguous cases. This is a hidden helper function used only by the all_distributions function.
The function automatically classifies DataFrame columns as numeric or categorical based on their data types. It supports manual overrides for cases where the automatic classification is incorrect (e.g., a numeric zip code that should be treated as categorical).
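To illustrate dtype-based classification, the minimal sketch below uses `pandas.api.types.is_numeric_dtype` to show which columns pandas itself reports as numeric. Whether the function relies on this exact call is an assumption, and the sample DataFrame and its column names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical sample data; zip_code is stored as an integer.
df = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "income": [52000.0, 61000.5, 48000.0],
        "is_member": [True, False, True],
        "zip_code": [90210, 10001, 60614],
        "city": ["Beverly Hills", "New York", "Chicago"],
        "signup": pd.to_datetime(["2021-01-05", "2020-07-19", "2022-03-30"]),
    }
)

# Assumption: classification follows pandas' own notion of a numeric dtype,
# under which boolean columns also count as numeric.
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
categorical_cols = [col for col in df.columns if col not in numeric_cols]

print(numeric_cols)
print(categorical_cols)
```

In this sketch `zip_code` lands in the numeric group purely because of its integer dtype, which is the kind of case the manual override is meant to handle.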
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pd_dataframe | pandas.DataFrame | Input DataFrame to separate into numeric and categorical columns. | required |
| target_column | str | The name of the target column. Regardless of its dtype, the target column is included in both the numeric and categorical outputs. | required |
| ambiguous_column_types | dict | Dictionary specifying column type overrides for ambiguous cases. Expected keys are “numeric” and “categorical”, each containing a list of column names to force into that category. Invalid or non-existent column names are silently ignored. Numeric is defined as int, float, and complex dtypes, including the 32- and 64-bit int/float variants and np.number; boolean columns also count as numeric (pandas behaviour). Categorical is defined as any non-numeric column, including object, string, datetime, and categorical dtypes. Example: ambiguous_column_types = {"numeric": ["year"], "categorical": ["zip_code"]} | None |
Returns
| Name | Type | Description |
| --- | --- | --- |
| | dict | A dictionary with keys “numeric” and “categorical”, each containing a filtered DataFrame with only the columns of that type. |
Raises
| Name | Type | Description |
| --- | --- | --- |
| | ValueError | If the input DataFrame is empty. |
| | ValueError | If a column is specified in both “numeric” and “categorical” lists in ambiguous_column_types. |
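The documented behaviour (overrides, target column handling, and both error conditions) can be sketched as follows. This is a hypothetical re-implementation written only from the description above, not the library's actual code: the name `split_columns_sketch`, the error messages, and the choice of `is_numeric_dtype` are all assumptions. Overrides here only decide which output a column lands in; no dtype conversion is attempted.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_columns_sketch(pd_dataframe, target_column, ambiguous_column_types=None):
    """Hypothetical sketch of the documented behaviour (not the real function)."""
    if pd_dataframe.empty:
        raise ValueError("Input DataFrame is empty.")

    overrides = ambiguous_column_types or {}
    force_numeric = set(overrides.get("numeric", []))
    force_categorical = set(overrides.get("categorical", []))

    # A column cannot be forced into both categories at once.
    clash = force_numeric & force_categorical
    if clash:
        raise ValueError(f"Columns listed as both numeric and categorical: {sorted(clash)}")

    numeric, categorical = [], []
    for col in pd_dataframe.columns:
        if col == target_column:
            # The target column appears in both outputs regardless of dtype.
            numeric.append(col)
            categorical.append(col)
        elif col in force_numeric:
            numeric.append(col)
        elif col in force_categorical:
            categorical.append(col)
        elif is_numeric_dtype(pd_dataframe[col]):
            numeric.append(col)
        else:
            categorical.append(col)
    # Override names that match no column never enter the loop, so they are ignored.

    return {"numeric": pd_dataframe[numeric], "categorical": pd_dataframe[categorical]}


# Hypothetical data: "year" is stored as text, "zip_code" as an integer.
df = pd.DataFrame(
    {"year": ["2019", "2020"], "zip_code": [90210, 10001], "price": [1.5, 2.5], "sold": [0, 1]}
)
result = split_columns_sketch(
    df, "sold", {"numeric": ["year"], "categorical": ["zip_code", "not_a_column"]}
)
print(list(result["numeric"].columns))
print(list(result["categorical"].columns))
```

Under these assumptions, `year` and `zip_code` follow their overrides, `sold` appears in both outputs as the target column, and the unknown name `not_a_column` has no effect.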