validate_categorical_schema.validate_categorical_schema
validate_categorical_schema.validate_categorical_schema(
    df,
    column,
    allowed_categories,
)
Validate that a categorical column conforms to a predefined allowed-value schema.
This function checks whether all non-missing values in df[column] are contained in allowed_categories. Missing values (NaN/None) are ignored. Values not in allowed_categories are reported in invalid_records.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | pandas.DataFrame | The DataFrame containing the categorical column. | required |
| column | str | Name of the categorical column to validate. | required |
| allowed_categories | Sequence | A collection of allowed category values (e.g., list, set, tuple). | required |
Returns
| Name | Type | Description |
|---|---|---|
| | dict | A validation summary containing: `status` ({'pass', 'fail'}): overall validation status; `invalid_records` (pandas.DataFrame): a DataFrame with columns ['index', 'column', 'raw_value']. |
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"color": ["red", "green", None]})
>>> validate_categorical_schema(
... df,
... column="color",
... allowed_categories=["red", "blue"]
... )
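
The check described above (ignore missing values, report everything else that falls outside `allowed_categories`) can be sketched roughly as follows. This is an illustrative re-implementation under stated assumptions, not the library's actual code: the name `sketch_validate_categorical` is hypothetical, and the rule that `status` is 'fail' whenever at least one invalid record exists is an assumption inferred from the return description.

```python
import pandas as pd


def sketch_validate_categorical(df, column, allowed_categories):
    """Hypothetical sketch of the documented behavior; not the real implementation."""
    values = df[column]
    # A value is invalid if it is non-missing and not among the allowed categories.
    invalid_mask = values.notna() & ~values.isin(list(allowed_categories))
    bad = values[invalid_mask]
    # One row per offending value, using the documented column layout.
    invalid_records = pd.DataFrame(
        {
            "index": bad.index.to_numpy(),
            "column": column,
            "raw_value": bad.to_numpy(),
        }
    )
    # Assumption: any invalid record makes the overall status 'fail'.
    status = "fail" if len(bad) > 0 else "pass"
    return {"status": status, "invalid_records": invalid_records}


df = pd.DataFrame({"color": ["red", "green", None]})
result = sketch_validate_categorical(df, "color", ["red", "blue"])
```

In this sketch, "green" is the only value flagged: None is treated as missing and skipped, and "red" is in the allowed list.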