
A solemn vow on your data. From YAML to verdict.

DataVow

Trust Your Data. Know Why You Can't.

Open-source data contract enforcement for modern data teams.
Define contracts in YAML. Sync to dbt. Validate in CI. Block bad data before it reaches production.


The problem

89% of data teams report pain points with data modeling and ownership. Data contracts are the solution — but the tooling is fragmented:

  • dbt tests → SQL only, no formal contract, no pre-ingestion validation
  • Great Expectations → verbose Python, steep learning curve, no standard format
  • Soda → good YAML checks, but no CI/CD gate, no stakeholder reporting, no ODCS
  • Data Contract CLI → ODCS compatible, but no dbt sync, no scoring, no CI gate

DataVow covers the full lifecycle: define → sync dbt → validate → block → report. One tool. One standard.

Quick start

pip install datavow

# Initialize a project
datavow init my-project

# Define a contract
datavow define contracts/orders.yaml

# Validate data against contracts
datavow validate contracts/orders.yaml --source data/orders.csv

# Generate an HTML report
datavow report contracts/orders.yaml --source data/orders.csv --format html

# Run in CI mode (exit code 1 on critical violations)
datavow ci contracts/ --source data/

Key features

YAML-first contracts (ODCS v3.1 native)

Define schemas, quality rules, and SLAs in readable YAML. DataVow supports both its own format and native ODCS v3.1 contracts — auto-detected, no config needed.

apiVersion: datavow/v1
kind: DataContract
metadata:
  name: orders
  version: 1.0.0
  owner: data-team@company.com
  domain: sales

schema:
  type: table
  fields:
    - name: order_id
      type: integer
      required: true
      unique: true
    - name: customer_email
      type: string
      required: true
      pii: true

quality:
  rules:
    - name: no_negative_totals
      type: sql
      query: "SELECT COUNT(*) FROM {table} WHERE total_amount < 0"
      threshold: 0
      severity: CRITICAL
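
A `sql` rule like `no_negative_totals` can be read as: substitute the table name into the query, run it, and compare the resulting count to `threshold`. The sketch below illustrates that reading only. It uses Python's built-in `sqlite3` instead of DuckDB so that it runs without extra dependencies, and both the `check_rule` helper and the "pass when count <= threshold" semantics are assumptions, not DataVow's actual implementation.

```python
import sqlite3

# Rule copied from the example contract.
RULE = {
    "name": "no_negative_totals",
    "query": "SELECT COUNT(*) FROM {table} WHERE total_amount < 0",
    "threshold": 0,
    "severity": "CRITICAL",
}


def check_rule(conn, rule, table):
    """Run the rule's query against `table` and report whether it passes (hypothetical helper)."""
    # The table name is interpolated, not parameterized, so only pass trusted names.
    sql = rule["query"].format(table=table)
    (count,) = conn.execute(sql).fetchone()
    # Assumption: the rule passes when the count does not exceed the threshold.
    return {"rule": rule["name"], "count": count, "passed": count <= rule["threshold"]}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, total_amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 19.9), (2, -5.0), (3, 42.0)]
    )
    print(check_rule(conn, RULE, "orders"))
```

With the three sample rows, one order has a negative total, so the rule reports `count` 1 and `passed` False.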

datavow dbt sync — the killer feature

One command generates dbt-native tests from your contracts. Works on every dbt adapter — no connector needed.

# Generate dbt tests from contracts
datavow dbt sync contracts/ --dbt-project-dir .

# Generates generic + singular tests from your contracts
# All tagged `datavow` for easy filtering
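
To give a feel for what "generating dbt tests from a contract" could mean, here is a minimal sketch that maps the contract's `required` and `unique` flags to dbt's built-in `not_null` and `unique` generic tests. The mapping and the `to_dbt_model` helper are assumptions made for illustration. The files that `datavow dbt sync` actually writes are not shown in this README and may differ.

```python
import json

# Field definitions copied from the example contract.
FIELDS = [
    {"name": "order_id", "type": "integer", "required": True, "unique": True},
    {"name": "customer_email", "type": "string", "required": True, "pii": True},
]

# Assumed mapping from contract flags to dbt's built-in generic tests.
FLAG_TO_TEST = {"required": "not_null", "unique": "unique"}


def to_dbt_model(model_name, fields):
    """Build a dict shaped like one model entry in a dbt properties file (hypothetical helper)."""
    columns = []
    for field in fields:
        # Keep only the tests whose flag is set to a truthy value on this field.
        tests = [test for flag, test in FLAG_TO_TEST.items() if field.get(flag)]
        columns.append({"name": field["name"], "tests": tests})
    return {"name": model_name, "columns": columns}


if __name__ == "__main__":
    properties = {"version": 2, "models": [to_dbt_model("orders", FIELDS)]}
    print(json.dumps(properties, indent=2))
```

In this sketch `order_id` ends up with both tests and `customer_email` with `not_null` only, because the contract does not mark it as unique.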

Vow Score — every validation renders a verdict

Vow Score = 100 - (20 × CRITICAL + 5 × WARNING + 1 × INFO)

  95-100  ✅ Vow Kept      — fully compliant, ship it
  80-94   ⚠️ Vow Strained  — action needed
  50-79   🔧 Vow Broken    — blocking issues
   0-49   ❌ Vow Shattered  — critical violations
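
The formula and bands above can be worked through in a few lines of Python. This is a sketch of the arithmetic only, not DataVow's code: the function names are hypothetical, and flooring the score at 0 is an assumption based on the lowest band starting at 0.

```python
from collections import Counter

# Penalty weights taken from the Vow Score formula.
WEIGHTS = {"CRITICAL": 20, "WARNING": 5, "INFO": 1}

# Verdict bands as (lower bound, label), ordered from best to worst.
BANDS = [(95, "Vow Kept"), (80, "Vow Strained"), (50, "Vow Broken"), (0, "Vow Shattered")]


def vow_score(severities):
    """Return the score for an iterable of violation severities."""
    counts = Counter(severities)
    penalty = sum(WEIGHTS[level] * counts[level] for level in WEIGHTS)
    # Assumption: the score is floored at 0, since the lowest band starts at 0.
    return max(0, 100 - penalty)


def verdict(score):
    """Map a score to its verdict label."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return BANDS[-1][1]


if __name__ == "__main__":
    violations = ["CRITICAL", "WARNING", "WARNING", "INFO"]
    score = vow_score(violations)
    print(score, verdict(score))  # prints: 69 Vow Broken
```

One CRITICAL, two WARNING, and one INFO violation give 100 - (20 + 10 + 1) = 69, which falls in the 50-79 band. Note that under this formula a single CRITICAL violation on its own scores 80, which is still Vow Strained.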

CI pipeline gating

Block bad data automatically. No manual intervention.

GitHub Action (Marketplace):

- uses: ludovicschmetz-stack/datavow-action@v1
  with:
    contracts: contracts/
    source: data/
    fail-on: critical
    comment-on-pr: "true"

dbt on-run-end hook (datavow-dbt):

# dbt_project.yml
on-run-end:
  - "{{ datavow_summary() }}"

vars:
  datavow_fail_on: broken  # block pipeline on Vow Broken or worse
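
The comment on `datavow_fail_on` describes a threshold: block on the named verdict or anything worse. A minimal sketch of that comparison follows. Only `broken` appears in this README as a value, so the other lowercase names and the `should_block` helper are assumptions.

```python
# Verdicts ordered from best to worst, following the Vow Score bands.
# Assumption: the lowercase names mirror the "broken" value used in dbt_project.yml.
VERDICTS = ["kept", "strained", "broken", "shattered"]


def should_block(verdict, fail_on="broken"):
    """Return True when `verdict` is at least as bad as `fail_on` (hypothetical helper)."""
    return VERDICTS.index(verdict) >= VERDICTS.index(fail_on)


if __name__ == "__main__":
    for verdict in VERDICTS:
        exit_code = 1 if should_block(verdict, fail_on="broken") else 0
        print(f"{verdict}: exit code {exit_code}")
```

With `fail_on="broken"`, the `kept` and `strained` verdicts map to exit code 0, while `broken` and `shattered` map to exit code 1.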

ODCS v3.1 — validate against the official standard

# Validate a contract against the ODCS v3.1 JSON Schema
datavow odcs check contracts/orders.yaml

# Convert ODCS native → DataVow format
datavow odcs convert contracts/orders-odcs.yaml -o contracts/orders.yaml

DataVow bundles the official ODCS v3.1.0 JSON Schema (2928 lines, Draft 2019-09). No other CLI tool does this.

Full command reference

Command               Description
datavow init          Initialize project with config and example contract
datavow define        Create or edit a data contract interactively
datavow validate      Validate data against contracts
datavow report        Generate HTML or Markdown reports
datavow ci            CI mode: validate, then exit with code 0 or 1
datavow dbt generate  Auto-generate contracts from dbt manifest
datavow dbt validate  Validate against dbt warehouse (via profiles.yml)
datavow dbt sync      Generate dbt tests from contracts
datavow dbt ci        Full pipeline: sync → dbt test → Vow Score
datavow odcs check    Validate contract against ODCS v3.1 JSON Schema
datavow odcs convert  Convert ODCS native → DataVow format

Data sources

DataVow validates files and databases via DuckDB:

Source                   How
CSV, Parquet, JSON, TSV  Direct file validation
PostgreSQL               datavow validate --source postgresql://...
DuckDB                   datavow validate --source path/to/db.duckdb

For cloud warehouses (Snowflake, BigQuery, Redshift, Databricks), use datavow dbt sync — it generates dbt-native tests that run on your existing dbt adapter. No extra connector needed.
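
The source types listed above differ in how they are addressed: files by extension, PostgreSQL by URL scheme, DuckDB by a `.duckdb` path. The sketch below shows one way such a `--source` value could be classified. `classify_source` is a hypothetical helper written for illustration, not part of DataVow, and it does not handle directories such as `data/`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions for the formats listed as "direct file validation".
FILE_FORMATS = {".csv": "csv", ".parquet": "parquet", ".json": "json", ".tsv": "tsv"}


def classify_source(source):
    """Return a label for the kind of source a --source value points at."""
    # Connection URLs are recognized by their scheme.
    scheme = urlparse(source).scheme
    if scheme in ("postgresql", "postgres"):
        return "postgresql"
    # Everything else is treated as a path and recognized by its extension.
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".duckdb":
        return "duckdb"
    if suffix in FILE_FORMATS:
        return FILE_FORMATS[suffix]
    raise ValueError(f"unsupported source: {source}")


if __name__ == "__main__":
    for src in ("data/orders.csv", "path/to/db.duckdb", "postgresql://user@localhost/shop"):
        print(src, "->", classify_source(src))
```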

Built for your whole team

Persona                 Uses                    Gets
Data Engineer           datavow ci in pipeline  Automated quality gate
Analytics Engineer      datavow dbt sync        One source of truth, zero test duplication
Domain Data Owner       YAML contracts in git   Versioned, reviewable data agreements
Data Governance         HTML reports            Conformity view across domains
Tech Lead               CI gate + Vow Score     No pipeline in prod without a contract
Freelance / Consultant  datavow report          Quality proof attached to every delivery

Architecture

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│  YAML       │    │   DataVow    │    │   Outputs   │
│  Contracts  │───▶│   Engine     │───▶│             │
│  (ODCS/DV)  │    │   (DuckDB)   │    │  ✅ Score   │
└─────────────┘    └──────┬───────┘    │  📊 Report  │
                          │            │  🚦 Exit 1  │
              ┌───────────┼──────┐     └─────────────┘
              ▼           ▼      ▼
          CSV/Parquet  PostgreSQL  dbt

Ecosystem

Package         Description                          Version
datavow         CLI: define, validate, report, CI    v0.3.0
datavow-action  GitHub Action: CI gate               v1.0.0
datavow-dbt     dbt package: on-run-end Vow Score    v1.0.0

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/ludovicschmetz-stack/datavow.git
cd datavow
python -m venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest  # 137 tests

License

Apache 2.0 — free forever. Use it, fork it, ship it.

