compare_contracts

Functions

Name Description
compare_contracts Compare two data contracts to detect schema and constraint drift.

compare_contracts

compare_contracts.compare_contracts(contract_a, contract_b)

Compare two data contracts to detect schema and constraint drift.

This function compares a reference (baseline) contract against an observed (latest) contract and reports differences in:

  • schema: added/removed columns and dtype changes
  • constraints: numeric bound changes, categorical domain changes, and missingness threshold changes

The comparison is directional:

  • “added” means present in contract_b but not in contract_a
  • “removed” means present in contract_a but not in contract_b
  • “old” refers to contract_a and “new” refers to contract_b
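The directional semantics come down to set operations on column names. Here is a minimal sketch. It assumes each contract's columns can be viewed as a set of names, and the column names used are hypothetical:

```python
# Hypothetical stand-ins: each contract's columns viewed as a set of names.
columns_a = {"id", "age", "income"}   # reference (baseline) contract
columns_b = {"id", "age", "region"}   # observed (latest) contract

added_columns = columns_b - columns_a    # present in contract_b only
removed_columns = columns_a - columns_b  # present in contract_a only
shared_columns = columns_a & columns_b   # eligible for dtype/constraint checks

print(sorted(added_columns))    # ['region']
print(sorted(removed_columns))  # ['income']
print(sorted(shared_columns))   # ['age', 'id']
```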

Drift definitions

  • Added columns: column in contract_b.columns but not in contract_a.columns
  • Removed columns: column in contract_a.columns but not in contract_b.columns
  • Dtype changes: for columns present in both contracts, ColumnRule.dtype differs (reported as (old_dtype, new_dtype))
  • Range changes (numeric bounds): for columns present in both contracts, min_value, max_value, or both differ (only meaningful when numeric bounds are provided; this function compares the stored contract values, not raw data)
  • Category changes: for columns present in both contracts, allowed_values differs
  • Missingness changes: for columns present in both contracts, max_missing_frac differs (reported as (old_max_missing_frac, new_max_missing_frac))
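The drift definitions above can be sketched for the columns both contracts share. This is a minimal illustration, not the library's implementation. It makes three assumptions: a hypothetical frozen ColumnRule dataclass that uses the field names from this page, contract columns represented as a dict mapping column name to rule, and a hypothetical helper named diff_shared_columns.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ColumnRule:
    # Hypothetical stand-in; the real ColumnRule fields and defaults may differ.
    dtype: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: Optional[frozenset] = None
    max_missing_frac: float = 0.0


def diff_shared_columns(
    rules_a: dict[str, ColumnRule], rules_b: dict[str, ColumnRule]
) -> dict[str, Any]:
    """Hypothetical helper: per-column drift for columns present in both contracts."""
    dtype_changes: dict[str, tuple[str, str]] = {}
    range_changes: set[str] = set()
    category_changes: set[str] = set()
    missingness_changes: dict[str, tuple[float, float]] = {}

    # Only columns present in both contracts are compared rule by rule.
    for col in rules_a.keys() & rules_b.keys():
        old, new = rules_a[col], rules_b[col]
        if old.dtype != new.dtype:
            dtype_changes[col] = (old.dtype, new.dtype)
        if (old.min_value, old.max_value) != (new.min_value, new.max_value):
            range_changes.add(col)
        if old.allowed_values != new.allowed_values:
            category_changes.add(col)
        if old.max_missing_frac != new.max_missing_frac:
            missingness_changes[col] = (old.max_missing_frac, new.max_missing_frac)

    return {
        "dtype_changes": dtype_changes,
        "range_changes": range_changes,
        "category_changes": category_changes,
        "missingness_changes": missingness_changes,
    }


rules_a = {
    "age": ColumnRule("float64", min_value=0, max_value=120, max_missing_frac=0.05),
    "region": ColumnRule("object", allowed_values=frozenset({"north", "south"})),
}
rules_b = {
    "age": ColumnRule("float64", min_value=0, max_value=120, max_missing_frac=0.20),
    "region": ColumnRule("object", allowed_values=frozenset({"north", "south", "west"})),
}
drift = diff_shared_columns(rules_a, rules_b)
print(drift["missingness_changes"])  # {'age': (0.05, 0.2)}
print(drift["category_changes"])     # {'region'}
```

Here, an unset bound or domain (None) compared against a set value counts as a change. This is one possible rule for the optional fields, which the Notes below describe as implementation-defined.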

Parameters

Name Type Description Default
contract_a Contract Reference contract representing the expected schema and constraints. required
contract_b Contract Observed contract representing the latest schema and constraints. required

Returns

Name Type Description
DriftReport A report containing only the differences detected between the two contracts:
  • added_columns, removed_columns
  • dtype_changes (col -> (old, new))
  • range_changes (set of columns whose min/max changed)
  • category_changes (set of columns whose allowed_values changed)
  • missingness_changes (col -> (old, new))
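The report shape above could be modeled as a small dataclass. This is a hedged sketch and not the library's DriftReport. It assumes sets for the column lists and treats has_drift (used in the Examples) as a property that is true when any field is non-empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DriftReport:
    """Hypothetical stand-in holding only the detected differences."""

    added_columns: set[str] = field(default_factory=set)
    removed_columns: set[str] = field(default_factory=set)
    dtype_changes: dict[str, tuple[str, str]] = field(default_factory=dict)
    range_changes: set[str] = field(default_factory=set)
    category_changes: set[str] = field(default_factory=set)
    missingness_changes: dict[str, tuple[float, float]] = field(default_factory=dict)

    @property
    def has_drift(self) -> bool:
        # Empty containers are falsy, so any non-empty field means drift was found.
        return any(
            (
                self.added_columns,
                self.removed_columns,
                self.dtype_changes,
                self.range_changes,
                self.category_changes,
                self.missingness_changes,
            )
        )


no_drift = DriftReport()
print(no_drift.has_drift)  # False

report = DriftReport(missingness_changes={"age": (0.05, 0.20)})
print(report.has_drift)  # True
```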

Notes

This function compares contract metadata only and does not inspect raw data. Dtype, range, category, and missingness drift is evaluated only for columns that exist in both contracts; added and removed columns are detected from differences in column names. Handling of the optional fields (min_value, max_value, allowed_values) is implementation-defined, for example when one contract sets a value and the other leaves it unset. Check the implementation if this distinction matters for your use case.
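One possible rule for the optional fields is sketched below. It treats None as "unset", counts a change whenever exactly one side is unset, and compares allowed_values as sets so that ordering alone is not reported as drift. The helper names range_changed and categories_changed are hypothetical, and the actual library may choose differently.

```python
from typing import Collection, Optional


def range_changed(old_min, old_max, new_min, new_max) -> bool:
    # None means "no bound": adding, removing, or moving either bound is a change.
    return (old_min, old_max) != (new_min, new_max)


def categories_changed(old: Optional[Collection], new: Optional[Collection]) -> bool:
    # None means "no domain constraint".
    if old is None or new is None:
        # Changed only when exactly one side sets a domain.
        return (old is None) != (new is None)
    # Compare as sets so that listing order does not count as drift.
    return set(old) != set(new)


print(range_changed(0, None, 0, 100))              # True
print(categories_changed(["a", "b"], ["b", "a"]))  # False
print(categories_changed(None, ["a"]))             # True
print(categories_changed(None, None))              # False
```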

Raises

Name Type Description
TypeError If contract_a or contract_b is not a Contract instance, or if a column rule is not a ColumnRule instance.
ValueError If max_missing_frac is non-numeric or outside [0, 1], or if min_value exceeds max_value.
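The ValueError conditions could be checked per column rule along these lines. This is a hedged sketch. The function validate_rule_values is hypothetical, and rejecting bool as non-numeric is an assumption made here.

```python
from numbers import Real


def validate_rule_values(max_missing_frac, min_value=None, max_value=None) -> None:
    """Hypothetical checks mirroring the ValueError conditions in the Raises table."""
    # bool is a subclass of int, so reject it explicitly as non-numeric.
    if isinstance(max_missing_frac, bool) or not isinstance(max_missing_frac, Real):
        raise ValueError(f"max_missing_frac must be numeric, got {max_missing_frac!r}")
    # The chained comparison is False for NaN, so NaN is rejected here too.
    if not 0 <= max_missing_frac <= 1:
        raise ValueError(f"max_missing_frac must be in [0, 1], got {max_missing_frac!r}")
    if min_value is not None and max_value is not None and min_value > max_value:
        raise ValueError(f"min_value ({min_value!r}) exceeds max_value ({max_value!r})")


validate_rule_values(0.05, min_value=0, max_value=120)  # passes silently

for bad in (
    {"max_missing_frac": 1.5},
    {"max_missing_frac": "5%"},
    {"max_missing_frac": 0.1, "min_value": 10, "max_value": 1},
):
    try:
        validate_rule_values(**bad)
    except ValueError as exc:
        print(exc)
```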

Examples

>>> report = compare_contracts(contract_a, contract_b)
>>> report.has_drift
True
>>> report.missingness_changes
{'age': (0.05, 0.2)}