Schema registry for data contracts: semver versioning, compatibility checks (backward/forward/full), ownership, freshness SLAs. The 'you can't promote it without an approved contract' pattern for data pipelines. Optional audit-stream-py integration via AUDIT_STREAM_URL.
Project description
data-contract-registry
Schema registry for data contracts. Semver versioning, compatibility checks (backward / forward / full), declared owners, freshness SLAs. The "you can't promote a new dataset version without an approved contract" pattern, lifted from API governance and aimed at data pipelines.
The headline endpoint is POST /contracts — register a new version, get back a deterministic compatibility report or a 422 with every breaking change called out by field name and kind.
Why
The thing that gets data teams paged at 2am isn't a missing test. It's a producer who quietly removed ltv because "we never use it anymore" while three downstream dashboards still join on it. Schema registries (Confluent, Buf, etc.) solved this for streaming and gRPC; data pipelines need the same hardness in a shape that fits the things data teams actually argue about:
- owners — who do I page when this dataset goes stale
- freshness SLA — when does "stale" become "broken"
- primary key — changing it is a
MAJOR, not aMINOR - enum drift — adding a value is fine; removing one is a backward-compatibility break
- deprecation policy — flag a version with the URI of the migration plan; don't delete it
This package is the smallest thing that does all of those.
Install
pip install data-contract-registry
# with the FastAPI surface:
pip install "data-contract-registry[api]"
Python 3.11+. Runtime deps: pydantic + PyYAML.
Library quickstart
from data_contract_registry import (
ContractRegistry,
DataContract,
DataField,
Owner,
)
registry = ContractRegistry()
v1 = DataContract(
dataset_id="users.daily_active",
version="1.0.0",
primary_key=["user_id", "active_date"],
owners=[Owner(team="growth-platform", contact="#growth-platform")],
fields=[
DataField(name="user_id", type="string"),
DataField(name="active_date", type="timestamp"),
DataField(name="plan", type="string", enum=["free", "pro", "enterprise"]),
DataField(name="ltv", type="number", required=False),
],
status="active",
)
registry.register(v1)
# Compatible promotion (added an optional field).
v1_1 = v1.model_copy(update={
"version": "1.1.0",
"fields": [*v1.fields, DataField(name="signup_source", type="string", required=False)],
})
report = registry.register(v1_1)
print(report.compatible) # True
# Incompatible promotion — removing a field breaks backward compatibility.
v2 = v1.model_copy(update={"version": "2.0.0", "fields": [f for f in v1.fields if f.name != "ltv"]})
report = registry.register(v2)
print(report.compatible) # False
print(report.errors[0].kind) # "field_removed"
print(report.errors[0].message) # "field 'ltv' was removed; old data will fail validation"
Compatibility modes
| Mode | Meaning |
|---|---|
backward |
New schema can read data produced by the previous schema. Default. Consumers upgrade first. |
forward |
Previous schema can read data produced by the new schema. Producers upgrade first. |
full |
Both. |
none |
Anything goes. First-time onboarding only. |
The checks the engine knows how to flag (each carries a structured kind so you can build CI gates around specific failures):
| Kind | Severity | Mode |
|---|---|---|
field_removed |
error | backward |
field_type_changed |
error | backward |
field_required_added |
error | backward (optional→required) or forward (new required field) |
field_enum_shrunk |
error | backward |
primary_key_changed |
error | always |
version_not_increasing |
error | always |
owner_missing |
error | always |
FastAPI surface
pip install "data-contract-registry[api]"
uvicorn data_contract_registry.app:app --port 8090
| Method | Path | What it does |
|---|---|---|
| GET | / |
Service info. |
| GET | /healthz |
Liveness probe. |
| GET | /datasets |
List registered dataset IDs. |
| POST | /contracts |
Register / promote a contract. 422 with a structured issue list when incompatible. |
| POST | /contracts/check |
Dry-run compatibility check — does not register. |
| GET | /contracts/{ds}/latest |
Latest active contract for a dataset. |
| GET | /contracts/{ds}/versions |
Full version history. |
| GET | /contracts/{ds}/versions/{v} |
One specific version. |
| POST | /contracts/{ds}/versions/{v}/deprecate |
Mark deprecated with a migration URI. |
| POST | /contracts/{ds}/versions/{v}/archive |
Archive a version (history preserved). |
| POST | /contracts/owners/from-decision-card |
Cross-ecosystem hook — pull owners out of a Decision Card. |
Bundles are held in-memory by default. For restart-safe storage, swap _BundleStore's implementation; the protocol is small.
The cross-ecosystem hook
The third hook in the portfolio (after procurement-decision-api → policy-as-code-engine and the Suite → Decision Intelligence bridge). When a buyer approves a vendor whose data product the team will consume, the Decision Card's buyer.name + decision_maker are the right answer to "who owns the contract on our side":
curl -X POST http://localhost:8090/contracts/owners/from-decision-card \
-H 'Content-Type: application/json' \
-d @decision-card.json
# -> [
# {"team": "Springfield USD", "contact": "#data-platform"},
# {"team": "Director of Data (Alex Chen)", "contact": null}
# ]
Drop that list straight into DataContract.owners and the registration carries paging info the team didn't have to re-type.
YAML authoring
# contracts/users-daily-active.yaml
dataset_id: users.daily_active
version: "1.0.0"
owners:
- team: growth-platform
contact: "#growth-platform"
freshness_sla:
max_lag_seconds: 86400
fields:
- {name: user_id, type: string}
- {name: active_date, type: timestamp}
- {name: plan, type: string, enum: [free, pro, enterprise]}
Hand-author in YAML, validate in CI, register from Python:
import yaml
from pathlib import Path
from data_contract_registry import ContractRegistry, DataContract
raw = yaml.safe_load(Path("contracts/users-daily-active.yaml").read_text())
ContractRegistry().register(DataContract.model_validate(raw))
Tests
pip install -e ".[dev]"
ruff check src tests && ruff format --check src tests
mypy src
pytest -v
CI matrix runs Python 3.11 / 3.12 / 3.13.
Related in this ecosystem
- procurement-decision-api — drafts the Decision Cards this registry pulls owners from.
- policy-as-code-engine — pair with this registry to enforce contracts at request time.
- slo-budget-tracker — wire your freshness SLA into the same monitoring story.
- More at kineticgain.com.
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file data_contract_registry-0.1.1.tar.gz.
File metadata
- Download URL: data_contract_registry-0.1.1.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c7b78a731254f15bcdc8da93d5b846cbacb6eea0b8248d83bb3390ffa07c1a3a
|
|
| MD5 |
a257621da601e548503b7aadfac59065
|
|
| BLAKE2b-256 |
d9d2f5eae97cedf61c4ff135eb4f9fda43a43f18a0c1093d5c370630403da4d6
|
Provenance
The following attestation bundles were made for data_contract_registry-0.1.1.tar.gz:
Publisher:
publish.yml on mizcausevic-dev/data-contract-registry
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
data_contract_registry-0.1.1.tar.gz -
Subject digest:
c7b78a731254f15bcdc8da93d5b846cbacb6eea0b8248d83bb3390ffa07c1a3a - Sigstore transparency entry: 1549222005
- Sigstore integration time:
-
Permalink:
mizcausevic-dev/data-contract-registry@5f0aef9a99f38b76c69f76be73813db56434e7fb -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/mizcausevic-dev
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5f0aef9a99f38b76c69f76be73813db56434e7fb -
Trigger Event:
push
-
Statement type:
File details
Details for the file data_contract_registry-0.1.1-py3-none-any.whl.
File metadata
- Download URL: data_contract_registry-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f4b2053baedb1cfdda3536b1828ee1919f77238da05068f7563c44ac61170870
|
|
| MD5 |
d6796084b59d73b27be6913ac1717242
|
|
| BLAKE2b-256 |
90e766516e5bf384bf6019081eb70741314414185a337a3450fa1479ce5f717b
|
Provenance
The following attestation bundles were made for data_contract_registry-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on mizcausevic-dev/data-contract-registry
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
data_contract_registry-0.1.1-py3-none-any.whl -
Subject digest:
f4b2053baedb1cfdda3536b1828ee1919f77238da05068f7563c44ac61170870 - Sigstore transparency entry: 1549222111
- Sigstore integration time:
-
Permalink:
mizcausevic-dev/data-contract-registry@5f0aef9a99f38b76c69f76be73813db56434e7fb -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/mizcausevic-dev
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5f0aef9a99f38b76c69f76be73813db56434e7fb -
Trigger Event:
push
-
Statement type: