
A solemn vow on your data. From YAML to verdict.

Project description

DataVow

Data contract enforcement for modern data teams. Define contracts in YAML. Validate with DuckDB. Block in CI. Report for stakeholders.


Why DataVow?

89% of data teams report pain points with data modeling and ownership. Data contracts are the answer — but the tooling is fragmented:

  • dbt tests: SQL-only, no formal contract, no pre-ingestion validation
  • Great Expectations: verbose Python, steep learning curve
  • Soda: good YAML checks, but no CI-native workflow or stakeholder reporting
  • ODCS v3.1: promising standard, but no complete implementation

DataVow fills the gap: one tool from contract definition to validation, CI blocking, and human-readable reports. Built on ODCS v3.1 and powered by DuckDB.

Install

pip install datavow

Quick start

# 1. Scaffold a project
datavow init my-project

# 2. Inspect the example contract (edit it, or write your own)
cat contracts/orders.yaml

# 3. Validate data against the contract
datavow validate contracts/orders.yaml data/orders.csv

# 4. Generate a stakeholder report
datavow report contracts/orders.yaml data/orders.csv

# 5. Run in CI — exit 1 on critical failures
datavow ci contracts/ data/

Commands

datavow init [project-name]

Scaffold a new project with a datavow.yaml config and a contracts/ directory.

datavow define <contract.yaml>

Validate a contract's YAML syntax and display its structure — fields, rules, SLA — without needing data.

✓ Contract orders is valid

Name              orders
Version           1.0.0
Domain            sales
Owner             data-team@company.com

Schema: 5 fields (5 required, 1 PII)
  • order_id integer (required, unique)
  • customer_email string (required, pii)
  • total_amount decimal (required)
  • status string (required)
  • created_at timestamp (required)

Quality rules: 3
  • no_negative_totals CRITICAL (sql)
  • email_not_null CRITICAL (not_null)
  • daily_volume WARNING (row_count)

SLA: freshness=24h, completeness=99.5%
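The summary lines above follow directly from the per-field flags in the contract. A minimal sketch of that derivation, assuming the contract has already been parsed into a Python dict (YAML loading is left out to stay stdlib-only). `schema_summary` and `field_line` are hypothetical helpers, not DataVow's API.

```python
from __future__ import annotations


def schema_summary(contract: dict) -> str:
    """Build a 'Schema: N fields (R required, P PII)' line from a parsed contract."""
    fields = contract["schema"]["fields"]
    # A flag that is absent is treated as False (assumption for this sketch).
    required = sum(1 for f in fields if f.get("required", False))
    pii = sum(1 for f in fields if f.get("pii", False))
    return f"Schema: {len(fields)} fields ({required} required, {pii} PII)"


def field_line(field: dict) -> str:
    """Format one field as 'name type (flags)', listing only flags set to true."""
    flags = [flag for flag in ("required", "unique", "pii") if field.get(flag, False)]
    suffix = f" ({', '.join(flags)})" if flags else ""
    return f"{field['name']} {field['type']}{suffix}"


# The orders contract from this README, already parsed (min/pattern/etc. omitted).
ORDERS = {
    "schema": {
        "fields": [
            {"name": "order_id", "type": "integer", "required": True, "unique": True},
            {"name": "customer_email", "type": "string", "required": True, "pii": True},
            {"name": "total_amount", "type": "decimal", "required": True},
            {"name": "status", "type": "string", "required": True},
            {"name": "created_at", "type": "timestamp", "required": True},
        ]
    }
}

print(schema_summary(ORDERS))
for field in ORDERS["schema"]["fields"]:
    print("  " + field_line(field))
```

Counting flags from the parsed structure is what lets this kind of summary be produced without touching any data.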

datavow validate <contract.yaml> <source>

Run schema, quality, and freshness checks against a data source (CSV, Parquet, JSON).

datavow validate contracts/orders.yaml data/orders.csv --verbose
datavow validate contracts/orders.yaml data/orders.csv --ci        # exit 1 on CRITICAL
datavow validate contracts/orders.yaml data/orders.csv -o json     # JSON output
datavow validate contracts/orders.yaml data/orders.csv -o summary  # one-liner
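Among its checks, validate tests freshness against the contract's SLA (`freshness: 24h` in the example contract). One plausible way to evaluate such a check is sketched below. The duration grammar (a number followed by m, h, or d) and the comparison against the newest timestamp in the data are assumptions, and `parse_duration` / `is_fresh` are hypothetical names. DataVow may define freshness differently.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Assumed duration units for SLA values such as "24h".
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse '30m', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_fresh(newest: datetime, freshness: str, now: datetime | None = None) -> bool:
    """True if the newest record is no older than the freshness window.

    Both datetimes must be timezone-aware; mixing naive and aware values raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - newest <= parse_duration(freshness)


newest = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)  # e.g. max(created_at)
checked_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh(newest, "24h", now=checked_at))  # 12 hours old -> True
```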

datavow report <contract.yaml> <source>

Generate a self-contained HTML or Markdown report. Share it with stakeholders, attach to deliveries, or publish.

datavow report contracts/orders.yaml data/orders.csv                    # HTML (default)
datavow report contracts/orders.yaml data/orders.csv -f md              # Markdown
datavow report contracts/orders.yaml data/orders.csv -o my-report.html  # custom path

datavow ci <contracts_dir> <sources_dir>

Batch-validate all contracts against their matching data sources, paired by file name: contracts/orders.yaml → <sources_dir>/orders.csv (for example, data/orders.csv).

datavow ci contracts/ data/                    # exit 1 on CRITICAL
datavow ci contracts/ data/ --fail-on warning  # stricter: fail on WARNING too
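The name-based pairing that ci performs can be sketched as follows. `match_sources` is a hypothetical helper, not DataVow's code. It assumes contracts are `*.yaml` files directly under the contracts directory and that, when several formats share a name, CSV is preferred over Parquet over JSON. Whether `datavow ci` also descends into domain subdirectories is not stated, so the sketch does not.

```python
from __future__ import annotations

from pathlib import Path

# Assumed preference order when several formats share the same stem.
SOURCE_EXTENSIONS = (".csv", ".parquet", ".json")


def match_sources(contracts_dir: str, sources_dir: str) -> dict[Path, Path | None]:
    """Pair each <name>.yaml contract with <sources_dir>/<name>.<ext>, or None if absent."""
    pairs: dict[Path, Path | None] = {}
    for contract in sorted(Path(contracts_dir).glob("*.yaml")):
        match = None
        for ext in SOURCE_EXTENSIONS:
            candidate = Path(sources_dir) / f"{contract.stem}{ext}"
            if candidate.exists():
                match = candidate
                break
        pairs[contract] = match
    return pairs


for contract, source in match_sources("contracts", "data").items():
    print(contract.name, "->", source if source else "no matching source")
```

Reporting unmatched contracts explicitly (rather than skipping them) makes a missing source file visible in CI output.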

Contract format

DataVow contracts are a superset of ODCS v3.1 — compatible but extended with severity, SLA, and PII flags.

apiVersion: datavow/v1
kind: DataContract
metadata:
  name: orders
  version: 1.0.0
  owner: data-team@company.com
  domain: sales
  description: "Customer orders from the e-commerce platform"
  tags: [pii, financial, critical]

schema:
  type: table
  fields:
    - name: order_id
      type: integer
      required: true
      unique: true
    - name: customer_email
      type: string
      required: true
      pii: true
      pattern: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
    - name: total_amount
      type: decimal
      required: true
      min: 0
    - name: status
      type: string
      required: true
      allowed_values: [pending, confirmed, shipped, delivered, cancelled]
    - name: created_at
      type: timestamp
      required: true

quality:
  rules:
    - name: no_negative_totals
      type: sql
      query: "SELECT COUNT(*) FROM {table} WHERE total_amount < 0"
      threshold: 0
      severity: CRITICAL
    - name: email_not_null
      type: not_null
      field: customer_email
      severity: CRITICAL
    - name: daily_volume
      type: row_count
      min: 1000
      max: 100000
      severity: WARNING

sla:
  freshness: 24h
  completeness: "99.5%"
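Each field constraint in the schema section (required, unique, min, allowed_values, pattern) maps to a simple per-column check. The stdlib-only sketch below runs them over CSV rows. DataVow itself validates with DuckDB, so treat `check_fields` as a hypothetical illustration of the semantics, not the tool's implementation. It assumes an empty CSV cell counts as null and a non-numeric value fails a min constraint, and it does not check declared types.

```python
from __future__ import annotations

import csv
import io
import re
from decimal import Decimal, InvalidOperation


def check_fields(rows: list[dict], fields: list[dict]) -> list[str]:
    """Return one message per violated field-level constraint (hypothetical helper)."""
    problems = []
    for spec in fields:
        name = spec["name"]
        if rows and name not in rows[0]:
            problems.append(f"{name}: missing column")
            continue
        values = [row[name] for row in rows]
        # Assumption: an empty cell ("") or a missing trailing cell (None) counts as null.
        present = [v for v in values if v not in (None, "")]
        if spec.get("required") and len(present) < len(values):
            problems.append(f"{name}: {len(values) - len(present)} null value(s)")
        if spec.get("unique") and len(set(present)) < len(present):
            problems.append(f"{name}: duplicate values")
        if "min" in spec:
            bad = 0
            for v in present:
                try:
                    if Decimal(v) < Decimal(str(spec["min"])):
                        bad += 1
                except InvalidOperation:
                    bad += 1  # non-numeric values also fail (assumption)
            if bad:
                problems.append(f"{name}: {bad} value(s) below min {spec['min']} or non-numeric")
        if "allowed_values" in spec:
            bad = sum(1 for v in present if v not in spec["allowed_values"])
            if bad:
                problems.append(f"{name}: {bad} value(s) outside allowed_values")
        if "pattern" in spec:
            # re.search honours the ^...$ anchors written in the pattern itself.
            bad = sum(1 for v in present if re.search(spec["pattern"], v) is None)
            if bad:
                problems.append(f"{name}: {bad} value(s) not matching pattern")
    return problems


# The schema from the contract above, as a parsed list of dicts.
FIELDS = [
    {"name": "order_id", "type": "integer", "required": True, "unique": True},
    {"name": "customer_email", "type": "string", "required": True, "pii": True,
     "pattern": r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"},
    {"name": "total_amount", "type": "decimal", "required": True, "min": 0},
    {"name": "status", "type": "string", "required": True,
     "allowed_values": ["pending", "confirmed", "shipped", "delivered", "cancelled"]},
    {"name": "created_at", "type": "timestamp", "required": True},
]

SAMPLE = """order_id,customer_email,total_amount,status,created_at
1,a@example.com,10.50,pending,2024-01-01T00:00:00
2,,5.00,shipped,2024-01-01T00:00:00
2,bad-email,-3,lost,2024-01-02T00:00:00
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for problem in check_fields(rows, FIELDS):
    print(problem)
```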

Supported field types

string, integer, float, decimal, boolean, date, timestamp

Supported quality rule types

| Type | Description | Required fields |
|---|---|---|
| sql | Custom SQL query returning a count | query, threshold |
| not_null | Field has no nulls | field |
| unique | Field values are unique | field |
| row_count | Row count within bounds | min, max |
| range | Field values within bounds | field, min_value, max_value |
| accepted_values | Field values in allowed set | field, values |
| regex | Field values match pattern | field, pattern |
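Most of these rule types can be reduced to a query that returns a single count, which is then compared with the rule's bounds. The sketch below shows one such mapping. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra packages. It assumes that a count at or below `threshold` (default 0) passes, and it leaves out `regex` because SQLite has no built-in regular-expression function (DuckDB provides `regexp_matches`). `violation_query` and `rule_passes` are hypothetical names. Table and column names are interpolated directly, so they must come from a trusted contract.

```python
import sqlite3


def violation_query(rule: dict, table: str) -> str:
    """Translate one quality rule into a SQL query returning a single count."""
    kind, field = rule["type"], rule.get("field")
    if kind == "sql":
        # Substitute only the {table} placeholder; other braces are left untouched.
        return rule["query"].replace("{table}", table)
    if kind == "not_null":
        return f"SELECT COUNT(*) FROM {table} WHERE {field} IS NULL"
    if kind == "unique":
        # Count rows whose value occurs more than once.
        return (f"SELECT COUNT(*) FROM {table} WHERE {field} IN "
                f"(SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)")
    if kind == "range":
        return (f"SELECT COUNT(*) FROM {table} WHERE {field} < {rule['min_value']} "
                f"OR {field} > {rule['max_value']}")
    if kind == "accepted_values":
        allowed = ", ".join("'" + str(v).replace("'", "''") + "'" for v in rule["values"])
        return f"SELECT COUNT(*) FROM {table} WHERE {field} NOT IN ({allowed})"
    if kind == "row_count":
        return f"SELECT COUNT(*) FROM {table}"
    raise ValueError(f"rule type not covered by this sketch: {kind}")


def rule_passes(conn: sqlite3.Connection, rule: dict, table: str) -> bool:
    (count,) = conn.execute(violation_query(rule, table)).fetchone()
    if rule["type"] == "row_count":
        # row_count compares the total row count against its min/max bounds.
        return rule.get("min", 0) <= count <= rule.get("max", count)
    # Assumption: a violation count at or below the threshold (default 0) passes.
    return count <= rule.get("threshold", 0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_email TEXT, "
             "total_amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "a@example.com", 10.5, "pending"),
    (2, None, -3.0, "shipped"),
    (3, "b@example.com", 7.0, "delivered"),
])

RULES = [
    {"name": "no_negative_totals", "type": "sql", "threshold": 0,
     "query": "SELECT COUNT(*) FROM {table} WHERE total_amount < 0"},
    {"name": "email_not_null", "type": "not_null", "field": "customer_email"},
    {"name": "order_id_unique", "type": "unique", "field": "order_id"},
    {"name": "daily_volume", "type": "row_count", "min": 1000, "max": 100000},
    {"name": "status_values", "type": "accepted_values", "field": "status",
     "values": ["pending", "confirmed", "shipped", "delivered", "cancelled"]},
    {"name": "amount_range", "type": "range", "field": "total_amount",
     "min_value": 0, "max_value": 1000},
]
for rule in RULES:
    print(rule["name"], "PASS" if rule_passes(conn, rule, "orders") else "FAIL")
```

Reducing every rule to "one count, one comparison" keeps the evaluator uniform: only the query text differs per rule type.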

Severity levels

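Each rule has a severity: CRITICAL, WARNING, or INFO. By default, CRITICAL failures make datavow ci and datavow validate --ci exit with code 1; with --fail-on warning, datavow ci fails on WARNING too.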
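datavow ci exits 1 on CRITICAL failures by default, and also on WARNING failures with --fail-on warning. That decision amounts to comparing each failure's severity with a threshold. A minimal sketch, assuming the ordering INFO < WARNING < CRITICAL; `Severity` and `ci_exit_code` are hypothetical names, and whether the real --fail-on flag accepts other levels is not stated.

```python
from __future__ import annotations

from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering: a larger value is more severe.
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def ci_exit_code(failed_severities: list[str], fail_on: str = "critical") -> int:
    """Return 1 if any failed check is at or above the fail-on level, otherwise 0."""
    threshold = Severity[fail_on.upper()]
    return 1 if any(Severity[s.upper()] >= threshold for s in failed_severities) else 0


print(ci_exit_code(["WARNING", "INFO"]))                     # -> 0 (only CRITICAL fails)
print(ci_exit_code(["WARNING", "INFO"], fail_on="warning"))  # -> 1 (stricter mode)
```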

Vow Score

Score = 100 - (20 × CRITICAL + 5 × WARNING + 1 × INFO)

where CRITICAL, WARNING, and INFO are the numbers of failed checks at each severity.

95-100  ✅ Vow Kept       — fully compliant
80-94   ⚠️ Vow Strained   — action needed
50-79   🔧 Vow Broken     — blocking issues
0-49    ❌ Vow Shattered   — critical violations
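Taking the inputs to be the counts of failed checks per severity, the formula and the four bands translate directly into code. A minimal sketch; flooring the score at 0 is an assumption (the lowest band starts at 0), and `vow_score` / `vow_band` are hypothetical names.

```python
def vow_score(critical: int, warning: int, info: int) -> int:
    """Apply Score = 100 - (20*CRITICAL + 5*WARNING + 1*INFO), floored at 0 (assumed)."""
    return max(0, 100 - (20 * critical + 5 * warning + 1 * info))


def vow_band(score: int) -> str:
    """Map a score onto the four documented bands."""
    if score >= 95:
        return "Vow Kept"
    if score >= 80:
        return "Vow Strained"
    if score >= 50:
        return "Vow Broken"
    return "Vow Shattered"


score = vow_score(critical=1, warning=2, info=3)
print(score, vow_band(score))  # 100 - (20 + 10 + 3) = 67 -> 67 Vow Broken
```

Under this reading, a single CRITICAL failure on its own scores 80 and lands in "Vow Strained", while datavow ci still exits 1 for it, so the band and the CI verdict are separate signals.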

Data Mesh ready

Contracts are organized by domain: each contract declares its domain in the metadata.domain field, and you can mirror those domains in your directory layout:

contracts/
├── sales/
│   ├── orders.yaml
│   └── invoices.yaml
├── logistics/
│   └── shipments.yaml
└── finance/
    └── transactions.yaml

Tech stack

| Component | Technology |
|---|---|
| Language | Python 3.12+ |
| CLI | Typer + Rich |
| Contract parsing | Pydantic v2 |
| Data validation | DuckDB |
| Reporting | Jinja2 |
| Data formats | CSV, Parquet, JSON (via DuckDB) |

Development

git clone https://github.com/ludovicschmetz-stack/datavow.git
cd datavow
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
pytest tests/ -v

Roadmap

  • Phase 1 — CLI MVP (done)
  • Phase 2 — Integrations: dbt post-hook, Airflow operator, GitHub Action, PostgreSQL/MySQL via DuckDB, Slack/Teams notifications
  • Phase 3 — SaaS: web dashboard, contract catalogue, role-based access, API

License

Apache 2.0 — free and open source forever. The CLI stays free. Monetization comes from the SaaS (Phase 3).

Author

Built by Ludovic Schmetz — Senior Data Engineer/Architect, Luxembourg. Also the author of Olympus.


Download files

Source Distribution

datavow-0.2.2.tar.gz (128.3 kB)

Uploaded Source

Built Distribution

datavow-0.2.2-py3-none-any.whl (43.3 kB)

Uploaded Python 3

File details

Details for the file datavow-0.2.2.tar.gz.

File metadata

  • Download URL: datavow-0.2.2.tar.gz
  • Upload date:
  • Size: 128.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datavow-0.2.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f6d3e16f9be43bcae3f3657b8a34de9a58cf31bb60dc930966444b061025d26 |
| MD5 | 08286c0e760721adcb679c5ae41fd0c5 |
| BLAKE2b-256 | f514da279d913ebebbcd20b67574f5b8c47c4030cefd4e804d91067fe90f1383 |

Provenance

The following attestation bundles were made for datavow-0.2.2.tar.gz:

Publisher: publish.yml on ludovicschmetz-stack/datavow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datavow-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: datavow-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 43.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for datavow-0.2.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e51c4d087159fe588dfc80262901f7303d3676d82fe346734b9efc4e2ddd855c |
| MD5 | 8cd16594db09164c618fc902ba1445ec |
| BLAKE2b-256 | 755a66710b96010d132f3213ed51902c42c72d746c77af1f89d3f186c5782e0d |

Provenance

The following attestation bundles were made for datavow-0.2.2-py3-none-any.whl:

Publisher: publish.yml on ludovicschmetz-stack/datavow
