This function performs a series of cleaning steps:
1. Standardizes column headers (snake_case, with punctuation replaced by underscores).
2. Removes columns whose standardized names duplicate an earlier column (keeping the first).
3. Removes columns containing a single unique value (constants).
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pandas.DataFrame | The raw input DataFrame to be standardized. | required |
Returns
| Name | Type | Description |
| --- | --- | --- |
| | pandas.DataFrame | The fully sanitized DataFrame. |
Raises
| Name | Type | Description |
| --- | --- | --- |
| | TypeError | If the input data is not a pandas DataFrame. |
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({
...     "First Name": ["Alice", "Bob", "Charlie"],
...     "Age": [25, 30, 35],
...     "age": [25, 30, 35],
...     "Constant_Column": [1, 1, 1],
...     " Special@Char$ ": [100, 200, 300]
... })
>>> standardized_df = standardize_schema(df)
>>> standardized_df
  first_name  age  special_char
0      Alice   25           100
1        Bob   30           200
2    Charlie   35           300
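
The three cleaning steps described at the top of this page can be illustrated with a minimal sketch. This is not the library's actual implementation: the function name `standardize_schema_sketch`, the regular expression used for snake_casing, and the choice to count NaN as a value when detecting constants are all assumptions made for illustration.

```python
import re

import pandas as pd


def standardize_schema_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the documented steps (not the real code)."""
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be a pandas DataFrame")

    df = data.copy()

    # Step 1: lowercase each header, turn every run of non-alphanumeric
    # characters into one underscore, and trim underscores at the edges.
    # (Assumed rule; the real function may normalize differently.)
    df.columns = [
        re.sub(r"[^0-9a-z]+", "_", str(col).strip().lower()).strip("_")
        for col in df.columns
    ]

    # Step 2: where several columns now share a name, keep only the first.
    df = df.loc[:, ~df.columns.duplicated(keep="first")]

    # Step 3: drop columns holding exactly one distinct value.
    # Assumption: NaN counts as a value here (dropna=False).
    keep = [col for col in df.columns if df[col].nunique(dropna=False) != 1]
    return df[keep]


df = pd.DataFrame({
    "First Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "age": [25, 30, 35],
    "Constant_Column": [1, 1, 1],
    " Special@Char$ ": [100, 200, 300],
})
print(list(standardize_schema_sketch(df).columns))
# ['first_name', 'age', 'special_char']
```

Working on a copy leaves the caller's DataFrame unmodified. Note that deduplication runs after renaming, which is why `Age` and `age` collapse into a single `age` column.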