infer_contract
Functions
| Name | Description |
|---|---|
| infer_contract | Derive a data contract from a pandas DataFrame. |
infer_contract
infer_contract.infer_contract(df)

Derive a data contract from a pandas DataFrame.
Derives per-column expectations, including the expected data type, allowable missingness, optional numeric bounds, and optional categorical domains. The resulting contract describes the expected schema and validation constraints for future datasets, based on the structure and values observed in the input DataFrame.
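The kinds of per-column expectations described above can be illustrated with a minimal sketch. This is not the library's implementation. `RuleSketch`, `sketch_rule`, `sketch_contract`, the `max_categories` cutoff, and the `"int"`/`"float"`/`"str"` type names are all hypothetical, chosen only to mirror the `dtype`, `min_value`, `max_value`, and `allowed_values` attributes shown in the Examples section.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class RuleSketch:
    """Hypothetical stand-in for ColumnRule (not the library's class)."""
    dtype: str
    max_missing_fraction: float
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[set] = None


def sketch_rule(series: pd.Series, max_categories: int = 10) -> RuleSketch:
    """Derive one column's expectations from example data (illustrative heuristics)."""
    # Observed share of missing values becomes the allowed missingness.
    missing = float(series.isna().mean()) if len(series) else 0.0
    values = series.dropna()
    if values.empty:
        # Nothing observed: record only the dtype name and missingness.
        return RuleSketch(str(series.dtype), missing)
    if pd.api.types.is_bool_dtype(series):
        return RuleSketch("bool", missing, allowed_values=set(values))
    if pd.api.types.is_integer_dtype(series):
        # Numeric bounds are taken from the observed minimum and maximum.
        return RuleSketch("int", missing, int(values.min()), int(values.max()))
    if pd.api.types.is_float_dtype(series):
        return RuleSketch("float", missing, float(values.min()), float(values.max()))
    # Anything else is treated as text; low-cardinality columns get a categorical domain.
    distinct = set(values)
    allowed = distinct if len(distinct) <= max_categories else None
    return RuleSketch("str", missing, allowed_values=allowed)


def sketch_contract(df: pd.DataFrame) -> dict:
    """Map each column name to its hypothetical RuleSketch."""
    return {name: sketch_rule(df[name]) for name in df.columns}
```

The `max_categories` cutoff is just one possible way to decide when a text column should be treated as categorical; the real function may use a different rule.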
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | A pandas DataFrame used to derive the data contract. This should be an example of “good” data that represents the expected structure and constraints of future datasets. | required |
Returns
| Type | Description |
|---|---|
| Contract | A Contract object mapping column names to ColumnRule definitions, describing the expected schema and constraints of the dataset. |
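To show how per-column rules of this kind might be enforced on a later batch of data, here is a hedged sketch. It does not use the real Contract or ColumnRule classes, whose validation API is not documented on this page. Instead it assumes each rule is a plain dictionary with hypothetical keys (`dtype`, `min_value`, `max_value`, `allowed_values`, `max_missing_fraction`), and `check_batch` and `DTYPE_CHECKS` are illustrative names.

```python
import pandas as pd

# Hypothetical mapping from rule type names to pandas dtype predicates.
DTYPE_CHECKS = {
    "int": pd.api.types.is_integer_dtype,
    "float": pd.api.types.is_float_dtype,
}


def check_batch(df: pd.DataFrame, rules: dict) -> list[str]:
    """Return human-readable violations of hypothetical per-column rules."""
    problems = []
    for name, rule in rules.items():
        if name not in df.columns:
            problems.append(f"{name}: missing column")
            continue
        col = df[name]
        check = DTYPE_CHECKS.get(rule.get("dtype"))
        if check is not None and not check(col):
            problems.append(f"{name}: expected dtype {rule['dtype']}, got {col.dtype}")
            continue  # skip value checks that assume the expected type
        missing = float(col.isna().mean()) if len(col) else 0.0
        if missing > rule.get("max_missing_fraction", 0.0):
            problems.append(f"{name}: {missing:.0%} missing")
        present = col.dropna()
        lo, hi = rule.get("min_value"), rule.get("max_value")
        if lo is not None and (present < lo).any():
            problems.append(f"{name}: value below {lo}")
        if hi is not None and (present > hi).any():
            problems.append(f"{name}: value above {hi}")
        allowed = rule.get("allowed_values")
        if allowed is not None:
            unexpected = set(present) - set(allowed)
            if unexpected:
                problems.append(f"{name}: unexpected values {sorted(unexpected)}")
    return problems
```

For example, a batch containing `"green"` in a `color` column whose rule allows only `{"red", "blue"}` is reported as `color: unexpected values ['green']`.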
Examples
>>> import pandas as pd
>>> from data_validation.infer_contract import infer_contract
>>> df = pd.DataFrame({
... "age": [20, 30, 40],
... "height": [170.0, 180.5, 175.2],
... "color": ["red", "blue", "red"],
... })
>>> contract = infer_contract(df)
>>> contract.name
'contract'
>>> sorted(contract.columns.keys())
['age', 'color', 'height']
>>> contract.columns["age"].dtype
'int'
>>> contract.columns["age"].min_value <= contract.columns["age"].max_value
True
>>> contract.columns["color"].allowed_values == {"red", "blue"}
True