infer_contract

Functions

| Name | Description |
| --- | --- |
| infer_contract | Derive a data contract from a pandas DataFrame. |

infer_contract

infer_contract.infer_contract(df)

Derive a data contract from a pandas DataFrame.

The function derives per-column expectations: the expected data type, allowable missingness, optional numeric bounds, and optional categorical domains. The resulting contract captures the schema and validation constraints that future datasets are expected to satisfy, based on the structure observed in the input DataFrame.
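To make the per-column derivation concrete, here is a minimal sketch of how such rules could be computed with plain pandas. It is not the library's implementation: `ColumnRuleSketch`, `infer_rules_sketch`, the `max_missing_frac` field, and the `"float"`, `"bool"`, and `"str"` dtype labels are hypothetical names chosen for illustration. Only `dtype`, `min_value`, `max_value`, and `allowed_values` mirror attributes shown in the Examples section.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ColumnRuleSketch:
    """Hypothetical stand-in for a column rule (not the library's ColumnRule)."""

    dtype: str
    max_missing_frac: float  # assumed field name for allowable missingness
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[set] = None


def infer_rules_sketch(df: pd.DataFrame) -> dict:
    """Derive one rule per column from the observed data (illustrative only)."""
    rules = {}
    for name in df.columns:
        col = df[name]
        # Fraction of missing values observed in the reference data.
        missing = float(col.isna().mean())
        if pd.api.types.is_bool_dtype(col):
            rules[name] = ColumnRuleSketch("bool", missing)
        elif pd.api.types.is_integer_dtype(col):
            # Numeric bounds come from the observed minimum and maximum.
            rules[name] = ColumnRuleSketch(
                "int", missing, int(col.min()), int(col.max())
            )
        elif pd.api.types.is_float_dtype(col):
            rules[name] = ColumnRuleSketch(
                "float", missing, float(col.min()), float(col.max())
            )
        else:
            # Treat everything else as categorical: the domain is the set
            # of distinct non-missing values.
            rules[name] = ColumnRuleSketch(
                "str", missing, allowed_values=set(col.dropna().tolist())
            )
    return rules
```

Because every rule is read off the observed values, the reference DataFrame effectively fixes the bounds and domains. This is one reason the input should be representative "good" data.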

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | A pandas DataFrame used to derive the data contract. This should be an example of “good” data that represents the expected structure and constraints of future datasets. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Contract | A Contract object mapping column names to ColumnRule definitions, describing the expected schema and constraints of the dataset. |

Examples

>>> import pandas as pd
>>> from data_validation.infer_contract import infer_contract
>>> df = pd.DataFrame({
...     "age": [20, 30, 40],
...     "height": [170.0, 180.5, 175.2],
...     "color": ["red", "blue", "red"],
... })
>>> contract = infer_contract(df)
>>> contract.name
'contract'
>>> sorted(contract.columns.keys())
['age', 'color', 'height']
>>> contract.columns["age"].dtype
'int'
>>> contract.columns["age"].min_value <= contract.columns["age"].max_value
True
>>> contract.columns["color"].allowed_values == {"red", "blue"}
True
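
The attributes inspected above (`min_value`, `max_value`, `allowed_values`) hint at how a contract might later be checked against new data. The sketch below illustrates that idea with plain dictionaries rather than the library's `Contract` and `ColumnRule` objects. `find_violations` and the layout of `rules` are hypothetical, and the library's own validation entry point, if it has one, is not shown here.

```python
import pandas as pd

# Hypothetical rules as plain dicts, keyed like the attributes used above.
rules = {
    "age": {"dtype": "int", "min_value": 20, "max_value": 40},
    "color": {"dtype": "str", "allowed_values": {"red", "blue"}},
}


def find_violations(df: pd.DataFrame, rules: dict) -> list:
    """Return human-readable descriptions of rule violations (illustrative)."""
    problems = []
    for name, rule in rules.items():
        if name not in df.columns:
            problems.append(f"{name}: missing column")
            continue
        values = df[name].dropna()
        lo, hi = rule.get("min_value"), rule.get("max_value")
        # Numeric bounds are optional, so check them only when present.
        if lo is not None and (values < lo).any():
            problems.append(f"{name}: value below {lo}")
        if hi is not None and (values > hi).any():
            problems.append(f"{name}: value above {hi}")
        allowed = rule.get("allowed_values")
        # Categorical domain: every observed value must be in the allowed set.
        if allowed is not None and not set(values.tolist()) <= allowed:
            problems.append(f"{name}: unexpected category")
    return problems


new_df = pd.DataFrame({"age": [25, 55], "color": ["red", "green"]})
print(find_violations(new_df, rules))
# ['age: value above 40', 'color: unexpected category']
```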