standardize_schema

standardize_schema(data)

Sanitize and standardize a DataFrame's structure.

This function performs a series of cleaning steps:

1. Standardizes column headers: converts them to snake_case and replaces punctuation with underscores.
2. Removes columns whose standardized names duplicate an earlier column, keeping the first occurrence.
3. Removes constant columns, i.e. columns containing a single unique value.

Parameters

Name Type Description Default
data pandas.DataFrame The raw input DataFrame to be standardized. required

Returns

Name Type Description
pandas.DataFrame The fully sanitized DataFrame.

Raises

Name Type Description
TypeError If the input data is not a pandas DataFrame.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "First Name": ["Alice", "Bob", "Charlie"],
...     "Age": [25, 30, 35],
...     "age": [25, 30, 35],
...     "Constant_Column": [1, 1, 1],
...     "  Special@Char$ ": [100, 200, 300]
... })
>>> standardized_df = standardize_schema(df)
>>> standardized_df
  first_name  age  special_char
0      Alice   25           100
1        Bob   30           200
2    Charlie   35           300