datacontract-x

Data Contract eXtended — AI-native, platform-extensible data contracts: LLM enrichment (descriptions, tags, data quality), live import, and apply. Built on datacontract-cli.

Maintainers

MickaelBZH

Project description

dcx — Data Contract eXtended

Data Contract eXtended — AI-native, platform-extensible data contracts

Author data contracts with an LLM, sync them with your live platforms.
A lean, no-fork extension of datacontract-cli, built on the Open Data Contract Standard (ODCS).

What is dcx?

dcx (Data Contract eXtended) adds three things to the Open Data Contract Standard workflow that plain datacontract-cli doesn't do:

AI authoring — use an LLM to enrich a contract with column descriptions, validation constraints, governance tags from your own catalog, and an executable data-quality suite.
Live import — build a contract from a running system (its real columns, keys, comments, tags).
Apply — push the contract's governance back to the platform (comments, tags, data-quality, and the table itself).

It's platform-extensible by design: each platform is a small importer / exporter / apply module that plugs into datacontract-cli's factories. Snowflake is the first end-to-end platform (import → enrich → apply), with Kafka import today and more platforms built to slot in the same way.

The pipeline is: import a live schema into an ODCS contract → enrich it (columns · tags · quality) → apply it back to the platform, or export it to SQL / docs / schemas. Everything is available both as a CLI and as a REST API (dcx api).

Why dcx?

🧠 AI authoring that's safe to ship. Forced tool-calling, temperature=0, and strict server-side validation against the ODCS schema — the model can only produce spec-valid output, never free-form guesses.
🏷️ A tag manager, not a tag guesser. You define a controlled tag catalog (names, allowed values, examples); the LLM classifies columns into your vocabulary, with optional defaults.
✅ Executable, portable data quality. Quality rules prefer ODCS library metrics (portable, mappable to platform-native checks) and fall back to portable sql checks — across all seven ODCS dimensions.
🔌 Any LLM provider. Powered by litellm — Anthropic, OpenAI, Azure, Bedrock, Gemini, Ollama, … behind one --model flag.
🧩 Pluggable platforms, no fork. You keep all 30+ upstream importers/exporters and lint / test / changelog, and gain the AI + platform layer on top.
🔐 Auth that makes sense per surface. Live platform operations over the API use caller-supplied OAuth; secrets are never CLI flags.

Install

pip install datacontract-x

The import package and CLI are both dcx:

dcx --help
dcx info

From source (for development):

git clone https://github.com/MickaelBZH/data-contract-x.git
cd data-contract-x
pip install -e ".[dev]"

Requires Python 3.10–3.12. Installing pulls in datacontract-cli, litellm, FastAPI, and the platform connectors automatically.

Quickstart

The full loop — import a live schema, enrich it with an LLM, sync it back. Snowflake here is the example platform.

# 1. Import an existing schema into a contract (real columns, PKs, comments, tags)
dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml

# 2. Enrich with an LLM: descriptions + constraints + tags + data-quality tests
export ANTHROPIC_API_KEY=...           # or OPENAI_API_KEY / AZURE_API_KEY / ...
dcx enrich all contract.yaml --catalog tags_catalog.yaml --output contract.enriched.yaml

# 3. Preview exactly what will run — no connection needed
dcx apply snowflake contract.enriched.yaml --include-quality --dry-run

# 4. Apply it: creates the table if missing, governs it (comments + tags + DQ) if it exists
dcx apply snowflake contract.enriched.yaml --include-quality
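
To script the same loop from Python, the four steps can be assembled as argument lists and handed to `subprocess.run`. This is a minimal sketch that assumes `dcx` is on `PATH`; `quickstart_commands` and `run_pipeline` are hypothetical helper names, not part of dcx, and only the flags shown in the quickstart are used.

```python
import subprocess


def quickstart_commands(database, schema, catalog="tags_catalog.yaml",
                        contract="contract.yaml",
                        enriched="contract.enriched.yaml"):
    """Return the four quickstart steps as argv lists (hypothetical helper)."""
    return [
        # 1. Import the live schema into a contract.
        ["dcx", "import", "snowflake", "--database", database, "--schema", schema,
         "--authenticator", "externalbrowser", "--output", contract],
        # 2. Enrich it; the provider key is read from the environment.
        ["dcx", "enrich", "all", contract, "--catalog", catalog, "--output", enriched],
        # 3. Preview the SQL without connecting.
        ["dcx", "apply", "snowflake", enriched, "--include-quality", "--dry-run"],
        # 4. Apply for real.
        ["dcx", "apply", "snowflake", enriched, "--include-quality"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each step in order; check=True stops at the first failing step."""
    for argv in commands:
        runner(argv, check=True)
```

Passing a different `runner` makes it possible to log or test the sequence without invoking the CLI.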

Commands

Every command is dcx <command>, and most are mirrored to a REST endpoint when you run dcx api. Each section below lists the sub-commands, a CLI example, and the matching API call. Run dcx <command> --help for the full option list.

`import` — build a contract from a source

Sub-command	Source
`dcx import snowflake`	A live Snowflake schema (columns, primary keys, comments, tags)
`dcx import kafka`	A Kafka topic's value schema (Confluent Schema Registry)
`dcx import <format>`	A file/document — `sql`, `avro`, `dbml`, `glue`, `bigquery`, `unity`, `jsonschema`, `json`, `odcs`, `parquet`, `csv`, `protobuf`, `spark`, `iceberg`, `excel`, `dbt`

dcx import snowflake --database MY_DB --schema LOAD --authenticator externalbrowser --output contract.yaml
dcx import kafka --schema-registry https://sr:8081 --topic orders --output contract.yaml
dcx import sql --source schema.sql --dialect snowflake --output contract.yaml

API

POST /import/snowflake — live import, authenticated by the caller's Snowflake OAuth token (Authorization: Bearer <token>).
POST /import/{format} — file-based importers; send the document inline as source_content.
(Kafka import is CLI-only.)
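
As an illustration of the file-based route, the sketch below builds (but does not send) a `POST /import/{format}` request with the standard library. It assumes a server started with `dcx api --port 4242` and a JSON body; only the `source_content` field is documented above, so any other fields a given importer needs should be looked up in the Swagger UI at `/docs`. `build_import_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: local server started with `dcx api --port 4242`.
BASE_URL = "http://127.0.0.1:4242"


def build_import_request(fmt, source_text, base_url=BASE_URL):
    """Build a POST /import/{format} request carrying the document inline."""
    body = json.dumps({"source_content": source_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/import/{fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_import_request("sql", "CREATE TABLE orders (id INT PRIMARY KEY);")
# urllib.request.urlopen(req) would send it to a running server.
```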

`enrich` — AI authoring with an LLM

Sub-command	Adds
`dcx enrich columns`	Business descriptions, `logicalTypeOptions` constraints, `required` / `unique` flags
`dcx enrich tags`	Governance tags, classified against your tag catalog
`dcx enrich quality`	An executable data-quality suite across all ODCS dimensions
`dcx enrich all`	columns → tags → quality, in that order so each stage grounds the next

Each sub-command is independent and idempotent (existing values are preserved unless you pass --overwrite). The provider key is read from the environment — there is no --api-key flag. Use --model for any litellm model and --base-url for a proxy / Azure / Ollama endpoint.

dcx enrich columns contract.yaml --output contract.enriched.yaml
dcx enrich tags    contract.yaml --catalog tags_catalog.yaml --output contract.tagged.yaml
dcx enrich quality contract.yaml --model gpt-4o --output contract.dq.yaml
dcx enrich all     contract.yaml --catalog tags_catalog.yaml --output contract.full.yaml
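
The preserve-unless-`--overwrite` rule can be pictured as a small merge over one column's fields. The sketch below is an assumption about the semantics described above, not dcx's actual code; `merge_enrichment` is a hypothetical name, and treating `None` and the empty string as "missing" is an illustrative choice.

```python
def merge_enrichment(existing, proposed, overwrite=False):
    """Merge LLM-proposed fields into a column definition.

    Existing non-empty values win unless overwrite=True.
    """
    merged = dict(existing)
    for key, value in proposed.items():
        # Only fill gaps, unless the caller explicitly asked to overwrite.
        if overwrite or merged.get(key) in (None, ""):
            merged[key] = value
    return merged


column = {"name": "email", "description": "Customer email"}
proposed = {"description": "Primary contact email address", "required": True}

first = merge_enrichment(column, proposed)
second = merge_enrichment(first, proposed)  # re-running changes nothing
```

Here `first` keeps the hand-written description and gains `required`, and `second` equals `first`, which is what makes a repeated run safe.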

API (the LLM key comes from the server's environment)

POST /enrich/columns · POST /enrich/quality
POST /enrich/tags · POST /enrich/all — take the tag catalog inline in the request body.

`export` — convert a contract to a target format

Sub-command	Output
`dcx export snowflake-full`	A Snowflake setup script: DDL + tags + Data Metric Functions, in one file
`dcx export <format>`	Any upstream format — `sql`, `jsonschema`, `html`, `markdown`, `mermaid`, `dbt-*`, `avro`, `protobuf`, `bigquery`, `spark`, `sqlalchemy`, `iceberg`, `sodacl`, `great-expectations`, `dbml`, `pydantic-model`, `odcs`, `rdf`, `go`, `excel`, …

snowflake-full shares apply's SQL-generation options, so it emits the same script that apply --dry-run would print: --ddl-mode auto|always|never (default auto: CREATE TABLE IF NOT EXISTS, then govern), --structured-types, --comments, --include-tags, --include-quality, --create-tags, and --tag-namespace DB.SCHEMA. The one exception is apply's --strict drift check, which has no export equivalent because it needs a live connection.

dcx export snowflake-full contract.yaml --include-quality --create-tags --output setup.sql
dcx export snowflake-full contract.yaml --ddl-mode never --output govern.sql   # alter-only
dcx export html contract.yaml --output contract.html

API

POST /export/{format} — including POST /export/snowflake-full. The response media type depends on the format (JSON / YAML / text / binary).

`apply` — push governance to a live platform

Sub-command	Target
`dcx apply snowflake`	A live Snowflake account

With the default --ddl-mode auto you don't need to know whether the table exists: missing tables are created (CREATE TABLE IF NOT EXISTS) and existing ones are governed with column and table comments, tags, and (with --include-quality) data-quality metrics. For existing tables, dcx also compares the live schema to the contract and reports drift as warnings. With --strict, drift becomes an error that aborts before any change is made. The check uses DESCRIBE TABLE, so it needs no active warehouse.

Option	Effect
`--ddl-mode auto|always|never`	create-if-missing-then-govern (default) · always `CREATE TABLE` · govern existing only
`--strict`	fail instead of warn on schema drift
`--structured-types`	typed nested `OBJECT(...)` / `ARRAY(...)`
`--include-quality` · `--create-tags` · `--tag-namespace`	data-metric functions · `CREATE TAG IF NOT EXISTS` · qualify tag refs
`--dry-run`	print the SQL without connecting

dcx apply snowflake contract.yaml --dry-run            # preview
dcx apply snowflake contract.yaml --include-quality    # create-or-govern
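
The warn-versus-`--strict` behavior can be sketched as a comparison of two column maps. This is an illustrative assumption: what dcx counts as drift is not specified above, so the sketch treats missing columns, extra columns, and type mismatches as drift. `check_drift` and `SchemaDriftError` are hypothetical names.

```python
class SchemaDriftError(Exception):
    """Raised in strict mode when the live table differs from the contract."""


def check_drift(contract_columns, live_columns, strict=False):
    """Compare {column_name: type} mappings and report differences.

    Returns a list of warning strings, or raises SchemaDriftError when
    strict=True and at least one difference is found.
    """
    findings = []
    for name, expected in contract_columns.items():
        if name not in live_columns:
            findings.append(f"column {name} is in the contract but not in the table")
        elif live_columns[name].upper() != expected.upper():
            findings.append(
                f"column {name}: contract says {expected}, table has {live_columns[name]}"
            )
    for name in live_columns:
        if name not in contract_columns:
            findings.append(f"column {name} is in the table but not in the contract")
    if strict and findings:
        # Abort before any change would be made.
        raise SchemaDriftError("; ".join(findings))
    return findings


contract = {"ID": "NUMBER", "EMAIL": "VARCHAR"}
live = {"ID": "NUMBER", "EMAIL": "TEXT", "LEGACY": "VARCHAR"}
warnings = check_drift(contract, live)  # two findings: EMAIL type, LEGACY extra
```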

API

POST /apply/snowflake — authenticated by the caller's Snowflake OAuth token. Supports dry_run, ddl_mode, strict, structured_types, … and returns the executed SQL plus any drift warnings.
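
A caller might prepare a dry-run request along these lines. The sketch only builds the request object. It assumes the options travel in a JSON body, which is not confirmed above, and the `contract` field name is hypothetical; the real request schema is in the OpenAPI spec served at `/docs`. `build_apply_request` is likewise a hypothetical helper.

```python
import json
import urllib.request

DDL_MODES = {"auto", "always", "never"}


def build_apply_request(token, contract_yaml, base_url="http://127.0.0.1:4242",
                        dry_run=True, ddl_mode="auto", strict=False):
    """Build a POST /apply/snowflake request that acts as the caller."""
    if ddl_mode not in DDL_MODES:
        raise ValueError(f"ddl_mode must be one of {sorted(DDL_MODES)}")
    payload = {
        "contract": contract_yaml,  # HYPOTHETICAL field name; check /docs
        "dry_run": dry_run,
        "ddl_mode": ddl_mode,
        "strict": strict,
    }
    return urllib.request.Request(
        url=f"{base_url}/apply/snowflake",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The caller's Snowflake OAuth token, never a server-side credential.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Defaulting `dry_run` to `True` in the helper keeps an accidental call from changing a live account.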

`target` — bind a contract to a platform

dcx target <type> sets the contract's server block and resolves each column's physicalType for that platform. ~30 types: snowflake, bigquery, databricks, postgres, redshift, mysql, sqlserver, oracle, s3, kafka, trino, athena, glue, duckdb, local, …

dcx target snowflake contract.yaml --output contract.snowflake.yaml

API

POST /target/{type} — one route per supported platform type.

From datacontract-cli

These commands work unchanged — dcx <command> behaves exactly like datacontract <command>.

Command	Sub-commands	Purpose	API
`dcx init`	—	Create an empty data contract	—
`dcx lint`	—	Validate a contract against the ODCS schema	`POST /lint`
`dcx test`	—	Run schema + data-quality tests against a configured server	`POST /test`
`dcx ci`	—	`test` for CI/CD — emits GitHub Actions annotations	—
`dcx changelog`	—	Semantic changelog between two contract versions	`POST /changelog`
`dcx catalog`	—	Render an HTML catalog of many contracts	—
`dcx publish`	—	Publish a contract to Entropy Data	—
`dcx dbt`	`sync`	Sync contracts into a dbt project	—

`api` / `info`

dcx api --port 4242      # start the REST server (Swagger UI at /docs)
dcx info                 # show dcx + datacontract-cli versions   (API: GET /info)

The tag catalog

dcx enrich tags does controlled-vocabulary tagging: instead of letting the model invent tags, you give it a catalog of allowed names and values, and it classifies each column into that vocabulary. The catalog is a small YAML (or JSON) file — the only extra input auto-tagging needs.

# tags_catalog.yaml
tags:
  - name: DATA_CLASSIFICATION          # the tag name (becomes the platform TAG name)
    description: >                      # tells the model what this tag is for
      Data sensitivity level. Assign exactly one — the highest level that applies.
    multiple: false                    # false = at most one value per column; true = many
    values:
      - value: PUBLIC                   # the model may only pick from these values
        description: Non-sensitive data that can be shared freely.
        examples: [country_code, currency, language, product_category]   # guide classification
      - value: INTERNAL
        description: Internal business data, not for public release. The default.
        default: true                  # assigned when the model picks nothing else
        examples: [order_id, status, created_at, loyalty_points]
      - value: CONFIDENTIAL
        description: Personal data or sensitive business data; need-to-know access.
        examples: [full_name, email, phone, home_address, date_of_birth]
      - value: RESTRICTED
        description: Highly sensitive data under legal/regulatory controls (financial, health, credentials, IDs).
        examples: [national_id, passport_number, iban, credit_card_number, health_status]

  - name: DATA_DOMAIN                   # you can define several tags
    description: The business domain that owns the column.
    multiple: false
    values:
      - value: CUSTOMER
        examples: [customer_id, email, loyalty_points]
      - value: FINANCE
        examples: [amount, currency, invoice_id, iban]

Field	Meaning
`name`	Tag name. Required. Becomes the tag key everywhere downstream.
`description`	What the tag means — given to the model as classification guidance.
`multiple`	`false` (default): at most one value per column. `true`: a column may carry several.
`values[].value`	An allowed value. The model may only assign values listed here — anything else is dropped.
`values[].description`	What the value means — strongly improves accuracy.
`values[].examples`	Example column names that fit this value — the model's strongest signal.
`values[].default`	If `true`, assigned to columns the model leaves unclassified for this tag. At most one per tag.

Assigned tags are written on each column as NAME=VALUE (e.g. DATA_CLASSIFICATION=CONFIDENTIAL) — the convention export snowflake-full and apply snowflake consume. A worked catalog and example contracts live in examples/.
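
The catalog rules in the table (values outside the vocabulary are dropped, `multiple: false` allows at most one value, a `default` fills the gap) can be sketched as a small post-processing step. This is not dcx's implementation; `resolve_tags` is a hypothetical name, the catalog is given as an already-parsed dict, and keeping the first valid value when `multiple` is false is an assumption.

```python
def resolve_tags(catalog, proposed):
    """Turn model-proposed tag values for one column into NAME=VALUE strings.

    catalog:  the parsed tag catalog (a dict with a "tags" list).
    proposed: {tag_name: [values the model suggested]} for a single column.
    """
    resolved = []
    for tag in catalog["tags"]:
        values = tag.get("values", [])
        allowed = [v["value"] for v in values]
        defaults = [v["value"] for v in values if v.get("default")]
        if len(defaults) > 1:
            raise ValueError(f"tag {tag['name']} declares more than one default")
        # Drop anything outside the controlled vocabulary.
        kept = [v for v in proposed.get(tag["name"], []) if v in allowed]
        if not tag.get("multiple", False):
            kept = kept[:1]  # assumption: keep the first valid value
        if not kept and defaults:
            kept = defaults
        resolved.extend(f"{tag['name']}={value}" for value in kept)
    return resolved


catalog = {"tags": [
    {"name": "DATA_CLASSIFICATION", "multiple": False, "values": [
        {"value": "PUBLIC"},
        {"value": "INTERNAL", "default": True},
        {"value": "CONFIDENTIAL"},
        {"value": "RESTRICTED"},
    ]},
    {"name": "DATA_DOMAIN", "multiple": False, "values": [
        {"value": "CUSTOMER"},
        {"value": "FINANCE"},
    ]},
]}

email_tags = resolve_tags(
    catalog, {"DATA_CLASSIFICATION": ["CONFIDENTIAL"], "DATA_DOMAIN": ["CUSTOMER"]}
)
# → ["DATA_CLASSIFICATION=CONFIDENTIAL", "DATA_DOMAIN=CUSTOMER"]

unknown_tags = resolve_tags(catalog, {"DATA_CLASSIFICATION": ["TOP_SECRET"]})
# → ["DATA_CLASSIFICATION=INTERNAL"]  (invalid value dropped, default applied)
```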

REST API

dcx api --port 4242      # Swagger UI at http://127.0.0.1:4242/docs

Most commands above are mirrored to an endpoint, with request and response schemas in the OpenAPI spec. A few, such as Kafka import, init, ci, catalog, publish, and dbt, are CLI-only. Auth model:

Live platform operations (/import/snowflake, /apply/snowflake) act as the caller — the OAuth bearer token comes from the Authorization header, so the server never uses ambient credentials for someone else's data.
Enrichment (/enrich/*) uses the server's LLM key (from the environment). Put service-level auth/quota in front of it before exposing it publicly.
The CLI never takes secrets as flags — platform secrets come from env vars or the platform's own config; LLM keys from the provider's standard env var.

How it fits with datacontract-cli

dcx is a separate package that depends on datacontract-cli as a library, with no fork. It registers new importers (snowflake, kafka) and the snowflake-full exporter into the upstream factories, adds target / enrich / apply sub-apps and live-import commands to the upstream Typer app, and mirrors most commands to FastAPI for dcx api. So you keep all of upstream's importers, exporters, lint, test, and changelog, and gain the AI + platform layer on top.

Development

pip install -e ".[dev]"
pytest          # 211 tests
ruff check dcx  # lint

Tests never hit live services or real LLMs — platform connections, the Schema Registry, and every LLM call are mocked, so the suite stays fast and offline. See RELEASING.md for the PyPI release process.

Contributing

Issues and PRs welcome. Please run pytest and ruff check dcx before opening a PR, and add tests for new behavior.

License

MIT © MickaelBZH.

Built on datacontract-cli · Open Data Contract Standard · litellm

Release history

Version	Released
0.1.2 (this version)	Jun 10, 2026
0.1.1	Jun 9, 2026
0.1.0	Jun 9, 2026

Download files

Source Distribution

datacontract_x-0.1.2.tar.gz (96.1 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

datacontract_x-0.1.2-py3-none-any.whl (84.0 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file datacontract_x-0.1.2.tar.gz.

File metadata

Download URL: datacontract_x-0.1.2.tar.gz
Upload date: Jun 10, 2026
Size: 96.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datacontract_x-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`bab5780343b94565420f311140f80079072d3502b1ebc707707f52ba18d47f60`
MD5	`6086d2f293756559067856ea7ad3f947`
BLAKE2b-256	`50379afb80ba49a527e2ab76c7fd99a60889dc494b2f16e5a52d7e14ee9bddc2`
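
A downloaded sdist can be checked against the SHA256 digest above with the standard library. This is a minimal sketch; `sha256_of` and `verify_download` are hypothetical helper names, and the local file path is whatever your download produced.

```python
import hashlib

# SHA256 digest published for datacontract_x-0.1.2.tar.gz (from the table above).
EXPECTED_SHA256 = "bab5780343b94565420f311140f80079072d3502b1ebc707707f52ba18d47f60"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_download("datacontract_x-0.1.2.tar.gz")` should return `True` for an intact copy of the file.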

Provenance

The following attestation bundles were made for datacontract_x-0.1.2.tar.gz:

Publisher: release.yml on MickaelBZH/data-contract-x

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datacontract_x-0.1.2.tar.gz
- Subject digest: bab5780343b94565420f311140f80079072d3502b1ebc707707f52ba18d47f60
- Sigstore transparency entry: 1784228584
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: MickaelBZH/data-contract-x@d3ddd8809e0c92708288161d19c3e35adcd9a395
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/MickaelBZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d3ddd8809e0c92708288161d19c3e35adcd9a395
- Trigger Event: release

File details

Details for the file datacontract_x-0.1.2-py3-none-any.whl.

File metadata

Download URL: datacontract_x-0.1.2-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 84.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datacontract_x-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e4c66a7df8eeea0bf2812bf9dc7599fe119064e6ed9c7781a29b45436a49e147`
MD5	`828e0363176829b164ce615902d7049b`
BLAKE2b-256	`2b2442e77dac0931f48c71366c027c68aa2b71e6a133438fcd8d292c3ecae190`

Provenance

The following attestation bundles were made for datacontract_x-0.1.2-py3-none-any.whl:

Publisher: release.yml on MickaelBZH/data-contract-x

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datacontract_x-0.1.2-py3-none-any.whl
- Subject digest: e4c66a7df8eeea0bf2812bf9dc7599fe119064e6ed9c7781a29b45436a49e147
- Sigstore transparency entry: 1784228773
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: MickaelBZH/data-contract-x@d3ddd8809e0c92708288161d19c3e35adcd9a395
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/MickaelBZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d3ddd8809e0c92708288161d19c3e35adcd9a395
- Trigger Event: release
