validate_datetime_schema.validate_datetime_schema

validate_datetime_schema.validate_datetime_schema(
    df,
    columns,
    datetime_format,
    coerce_invalid=False,
)

Validate that specified columns follow a given datetime format.

This function validates each non-missing value in the specified columns by attempting to parse it using pd.to_datetime(..., format=datetime_format). Values that cannot be parsed under the provided format are recorded as invalid.

By default, the function performs validation only and does not modify the input data. When coerce_invalid=True, it returns a copy of the DataFrame where valid values are converted to pandas datetime dtype and invalid values are set to NaT.
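
The strict-format check described above can be illustrated on a single column. This is a minimal sketch, not the function's actual implementation: the sample data and the variable names are hypothetical, and the use of errors="coerce" to turn unparseable values into NaT is an assumption about how the check could be written.

```python
import pandas as pd

# Hypothetical sample column; None stands in for a missing value.
raw = pd.Series(["2023-01-01", "2023-02-30", None, "01/03/2023"], name="date")

# Strict parsing against one format: values that do not match become NaT.
# (errors="coerce" is an assumption made for this sketch.)
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# A value counts as invalid only if it was present but failed to parse,
# so missing values are skipped rather than reported.
invalid_mask = raw.notna() & parsed.isna()
invalid_values = raw[invalid_mask]
```

Here "2023-02-30" is rejected because February has no 30th day, and "01/03/2023" is rejected because it does not match the "%Y-%m-%d" layout. The missing entry is ignored.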

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | pandas.DataFrame | The DataFrame containing the datetime column(s) to validate. | required |
| columns | list of str | A list of datetime column names to validate. If any specified column is not present in df, a KeyError is raised. | required |
| datetime_format | str | Expected datetime format string used for strict validation (e.g., "%Y-%m-%d"). | required |
| coerce_invalid | bool | Whether to return a coerced copy of the DataFrame. If False, only validation is performed and the data are not modified. If True, valid values are converted to datetime and invalid values are set to NaT in the returned validated_df. | False |
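
The KeyError behaviour documented for columns could be checked up front, before any parsing happens. The helper below is a hypothetical sketch (check_columns_present is not part of the documented API, and the error message wording is invented for illustration).

```python
import pandas as pd


def check_columns_present(df: pd.DataFrame, columns: list[str]) -> None:
    """Hypothetical helper: raise KeyError if any requested column is absent."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")


df = pd.DataFrame({"date": ["2023-01-01"]})
check_columns_present(df, ["date"])  # all columns exist, so nothing is raised

try:
    check_columns_present(df, ["date", "created_at"])
except KeyError as exc:
    error_message = str(exc)  # names the missing column
```

Collecting every missing name before raising means a caller sees all problems in one error rather than fixing them one at a time.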

Returns

Type: dict

A validation summary containing:

status : {'pass', 'fail'}
    Overall validation status across all specified columns.
validated_df : pandas.DataFrame
    A copy of the input DataFrame. If coerce_invalid=True, the specified datetime columns are converted to pandas datetime dtype.
invalid_records : pandas.DataFrame
    A tidy DataFrame listing all invalid datetime values with columns:
    - index : index labels from df.index where validation failed
    - column : name of the datetime column containing the invalid value
    - raw_value : original value that failed validation
    An empty DataFrame indicates that no invalid values were detected.
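
To make the shape of invalid_records and status concrete, the sketch below assembles both for a small two-column frame. It is an illustration under assumptions, not the library's code: the sample data, the loop structure, and the use of errors="coerce" are all hypothetical.

```python
import pandas as pd

# Hypothetical input with a non-default index so the labels are easy to spot.
df = pd.DataFrame(
    {"start": ["2023-01-01", "2023-02-30"], "end": ["2023-13-01", "2023-03-15"]},
    index=["a", "b"],
)

rows = []
for col in ["start", "end"]:
    parsed = pd.to_datetime(df[col], format="%Y-%m-%d", errors="coerce")
    bad = df[col].notna() & parsed.isna()
    # One output row per failing cell: index label, column name, raw value.
    for idx, value in df.loc[bad, col].items():
        rows.append({"index": idx, "column": col, "raw_value": value})

# Passing columns= keeps the three columns even when no rows were collected.
invalid_records = pd.DataFrame(rows, columns=["index", "column", "raw_value"])
status = "pass" if invalid_records.empty else "fail"
```

With this input, invalid_records holds two rows (label "b" in start and label "a" in end), so status is "fail".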

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({"date": ["2023-01-01", "2023-02-30"]})
>>> validate_datetime_schema(
...     df,
...     columns=["date"],
...     datetime_format="%Y-%m-%d"
... )