validate_contract

Functions

| Name | Description |
| --- | --- |
| validate_contract | Validate a pandas DataFrame against a predefined data contract. |

validate_contract

validate_contract.validate_contract(df, contract, strict=True)

Validate a pandas DataFrame against a predefined data contract.

This function validates an input DataFrame against a contract that defines the expected columns, data types, missingness thresholds, numeric value limits, and allowed categorical values. Every column defined in the contract is treated as required. The result is returned as a ValidationResult that combines an overall success flag with a list of structured issues describing any detected violations.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pandas.DataFrame | The DataFrame to validate. | required |
| contract | Contract | A data contract defining the expected columns and the validation rules for each column: the expected data type (as a string), the maximum allowed fraction of missing values, the minimum and maximum values for numeric columns, and the allowed categorical values. | required |
| strict | bool | If True, columns that are present in the DataFrame but not defined in the contract are reported as validation issues. If False, such extra columns are ignored. | True |
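The column-presence rules (every contract column is required; extra columns are reported only when strict is True) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: check_columns and ColumnIssue are hypothetical names, and the contract is reduced to a plain list of column names instead of a Contract object.

```python
from dataclasses import dataclass


@dataclass
class ColumnIssue:
    # Hypothetical issue record; the real Issue class may look different.
    column: str
    kind: str  # "missing_column" or "extra_column"


def check_columns(df_columns, contract_columns, strict=True):
    """Sketch of the column-presence checks (hypothetical helper).

    df_columns: column names of the DataFrame, e.g. df.columns.
    contract_columns: column names defined in the contract; all are required.
    """
    present = list(df_columns)
    expected = list(contract_columns)
    issues = []

    # Every column defined in the contract must exist in the DataFrame.
    for col in expected:
        if col not in present:
            issues.append(ColumnIssue(col, "missing_column"))

    # Columns not in the contract are only reported in strict mode.
    if strict:
        for col in present:
            if col not in expected:
                issues.append(ColumnIssue(col, "extra_column"))

    return issues


# "country" is required but absent; "note" is an extra column.
issues = check_columns(["id", "age", "note"], ["id", "age", "country"])
```

With strict=False, the same call reports only the missing country column, which mirrors how the strict parameter is described above.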

Returns

| Name | Type | Description |
| --- | --- | --- |
|  | ValidationResult | An object containing a boolean flag (ok) indicating whether validation succeeded, and a list of Issue objects describing all detected validation problems. |

Notes

The function performs the following checks:

- Missing columns defined in the contract
- Unexpected extra columns (when strict mode is enabled)
- Data type mismatches, based on dtype string comparison
- Missingness violations, based on the maximum allowed missing fraction
- Minimum and maximum value violations for numeric columns
- Invalid or unseen categorical values
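The per-column checks listed above can be sketched for a single column as shown here. This assumes a hypothetical rule format (a plain dict with dtype, max_missing_frac, min, max and allowed keys) and a hypothetical check_column helper that returns message strings; the real Contract and Issue classes may store rules and report problems differently.

```python
import pandas as pd


def check_column(series: pd.Series, rule: dict) -> list[str]:
    """Check one column against a rule dict (hypothetical format).

    Example rule:
        {"dtype": "float64", "max_missing_frac": 0.1,
         "min": 0, "max": 120, "allowed": None}
    Keys that are absent or None are skipped.
    """
    name = series.name
    problems = []

    # Data type: plain string comparison of the pandas dtype.
    expected = rule.get("dtype")
    if expected is not None and str(series.dtype) != expected:
        problems.append(f"{name}: dtype {series.dtype} does not match expected {expected}")

    # Missingness: fraction of NA values compared with the allowed maximum.
    max_missing = rule.get("max_missing_frac")
    if max_missing is not None and len(series) > 0:
        frac = float(series.isna().mean())
        if frac > max_missing:
            problems.append(f"{name}: missing fraction {frac:.2f} exceeds {max_missing}")

    non_null = series.dropna()

    # Numeric bounds apply only to numeric columns; NA values are ignored.
    if pd.api.types.is_numeric_dtype(series):
        lo, hi = rule.get("min"), rule.get("max")
        if lo is not None and (non_null < lo).any():
            problems.append(f"{name}: values below minimum {lo}")
        if hi is not None and (non_null > hi).any():
            problems.append(f"{name}: values above maximum {hi}")

    # Categorical values: anything outside the allowed set is reported.
    allowed = rule.get("allowed")
    if allowed is not None:
        unseen = sorted(str(v) for v in set(non_null) - set(allowed))
        if unseen:
            problems.append(f"{name}: unexpected values {unseen}")

    return problems


# None becomes NaN, so this column has dtype float64.
ages = pd.Series([25, 130, None], name="age")
check_column(ages, {"dtype": "float64", "max_missing_frac": 0.5, "min": 0, "max": 120})
# ['age: values above maximum 120']
```

Comparing dtype strings keeps the rule format simple, but it is strict: an int64 column will not match a rule that expects "float64", even though the values would fit.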

Examples

>>> result = validate_contract(df, contract)
>>> result.ok
True