validate_datetime_schema.validate_datetime_schema
validate_datetime_schema.validate_datetime_schema(
df,
columns,
datetime_format,
coerce_invalid=False,
)
Validate that the specified columns follow a given datetime format.
This function validates each non-missing value in the specified columns by attempting to parse it using pd.to_datetime(..., format=datetime_format). Values that cannot be parsed under the provided format are recorded as invalid.
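The per-value check described above can be sketched in a vectorized way. This is a minimal illustration, not the library's implementation: the helper name `invalid_datetime_mask` is hypothetical, and it assumes that "invalid" means "present in the input but NaT after `pd.to_datetime(..., errors="coerce")`", which matches the parse-or-fail rule stated here.

```python
import pandas as pd


def invalid_datetime_mask(series: pd.Series, datetime_format: str) -> pd.Series:
    """Return True for non-missing values that do not parse under datetime_format.

    Hypothetical helper illustrating the documented check; not the library's code.
    """
    # errors="coerce" turns every value that fails to parse into NaT
    parsed = pd.to_datetime(series, format=datetime_format, errors="coerce")
    # A value is invalid if it was present originally but became NaT after parsing
    return series.notna() & parsed.isna()


dates = pd.Series(["2023-01-01", "2023-02-30", None, "01/03/2023"])
print(invalid_datetime_mask(dates, "%Y-%m-%d").tolist())
# [False, True, False, True]
```

Missing values (`None`) are never flagged, while both an impossible date (`2023-02-30`) and a value in the wrong layout (`01/03/2023`) are.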
By default, the function performs validation only and does not modify the input data. When coerce_invalid=True, it returns a copy of the DataFrame where valid values are converted to pandas datetime dtype and invalid values are set to NaT.
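The coercion behaviour can be pictured roughly as follows. This is a sketch under the assumption that coercion is equivalent to `pd.to_datetime(..., errors="coerce")` applied to a copy; the helper `coerce_datetime_columns` is hypothetical and only shows that the caller's DataFrame stays untouched.

```python
import pandas as pd


def coerce_datetime_columns(df, columns, datetime_format):
    """Hypothetical sketch of the coerce_invalid=True behaviour."""
    out = df.copy()  # work on a copy so the caller's DataFrame is never mutated
    for col in columns:
        # Valid values become datetime64; values that fail to parse become NaT
        out[col] = pd.to_datetime(out[col], format=datetime_format, errors="coerce")
    return out


df = pd.DataFrame({"date": ["2023-01-01", "2023-02-30"]})
coerced = coerce_datetime_columns(df, ["date"], "%Y-%m-%d")
print(coerced["date"].tolist())
print(df["date"].tolist())  # the original strings are unchanged
```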
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | pandas.DataFrame | The DataFrame containing the datetime column(s) to validate. | required |
| columns | list of str | A list of datetime column names to validate. If any specified column is not present in df, a KeyError is raised. | required |
| datetime_format | str | Expected datetime format string used for strict validation (e.g., "%Y-%m-%d"). | required |
| coerce_invalid | bool | Whether to return a coerced copy of the DataFrame. If False, only validation is performed and the data are not modified. If True, valid values are converted to datetime and invalid values are set to NaT in the returned validated_df. | False |
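The KeyError contract for columns can be illustrated with a small up-front check. The helper `require_columns` is hypothetical, and the exact error message raised by the real function may differ; this only sketches the documented rule that any absent column is an error.

```python
import pandas as pd


def require_columns(df, columns):
    """Hypothetical helper: raise KeyError if any requested column is absent."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")


df = pd.DataFrame({"date": ["2023-01-01"]})
require_columns(df, ["date"])  # all columns present, so nothing is raised
try:
    require_columns(df, ["date", "created_at"])
except KeyError as exc:
    print(exc)
```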
Returns
| Type | Description |
|---|---|
| dict | A validation summary containing the keys listed below. |

| Key | Type | Description |
|---|---|---|
| status | {'pass', 'fail'} | Overall validation status across all specified columns. |
| validated_df | pandas.DataFrame | A copy of the input DataFrame. If coerce_invalid=True, the specified datetime columns are converted to pandas datetime dtype. |
| invalid_records | pandas.DataFrame | A tidy DataFrame listing all invalid datetime values, with columns index (index labels from df.index where validation failed), column (name of the datetime column containing the invalid value), and raw_value (original value that failed validation). An empty DataFrame indicates that no invalid values were detected. |
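One way to assemble a summary with the shape described in the Returns table is sketched below. It is an illustration only: `build_summary` is a hypothetical name, and the sketch assumes that status is 'fail' exactly when at least one invalid value is recorded.

```python
import pandas as pd


def build_summary(df, columns, datetime_format, coerce_invalid=False):
    """Hypothetical sketch assembling the documented return dict.

    Mirrors the Returns table; not the library's actual implementation.
    """
    validated_df = df.copy()
    records = []
    for col in columns:
        parsed = pd.to_datetime(df[col], format=datetime_format, errors="coerce")
        invalid = df[col].notna() & parsed.isna()
        # One tidy row per invalid value: index label, column name, raw value
        for idx, raw in df.loc[invalid, col].items():
            records.append({"index": idx, "column": col, "raw_value": raw})
        if coerce_invalid:
            validated_df[col] = parsed
    # Passing columns= keeps the schema even when no records were collected
    invalid_records = pd.DataFrame(records, columns=["index", "column", "raw_value"])
    return {
        "status": "pass" if invalid_records.empty else "fail",
        "validated_df": validated_df,
        "invalid_records": invalid_records,
    }


df = pd.DataFrame({"date": ["2023-01-01", "2023-02-30"]})
summary = build_summary(df, ["date"], "%Y-%m-%d")
print(summary["status"])  # fail
print(summary["invalid_records"])
```

Building invalid_records in long ("tidy") form keeps one row per failure, so results from several columns stack into a single table without extra reshaping.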
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"date": ["2023-01-01", "2023-02-30"]})
>>> validate_datetime_schema(
... df,
... columns=["date"],
... datetime_format="%Y-%m-%d"
... )