OpenDQV Core — open-source, contract-driven data quality validation engine for data pipelines and API boundaries
| Quickstart | Rules | Contracts | MCP | API | Security | FAQ |
|---|
"Trust is easier to build than to repair." That is why OpenDQV exists. A
422 at the point of write is cheaper than a data incident three weeks later.
Beta (v2.x). Public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a one-release deprecation cycle. Security fixes backported to the latest 2.x line. See API Stability for commitments.
OpenDQV is a write-time data validation service. Source systems call it before writing data. Bad records return a 422 with per-field errors. Good records pass through. No payload is stored.
flowchart LR
subgraph Callers
direction TB
SF[Salesforce]
SAP[SAP]
DYN[Dynamics]
ORA[Oracle]
WEB[Web forms]
ETL1[ETL pipelines]
DJ[Django clean]
PY[Python scripts]
PD[Pandas / ETL]
CD[Claude Desktop]
CUR[Cursor]
LLM[LLM agents]
end
subgraph OpenDQV
direction TB
API[Validation API\nREST / batch]
SDK[LocalValidator\nin-process SDK]
MCP[MCP Server\nAI-native]
API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
end
subgraph Results
direction TB
R1[valid: true / false]
R2[per-field errors]
R3[severity levels]
R4[webhooks on events]
end
SF & SAP & DYN & ORA & WEB & ETL1 --> API
DJ & PY & PD --> SDK
CD & CUR & LLM --> MCP
API & SDK & MCP --> R1
subgraph Importers
IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
end
IMP --> CON
style API fill:#0d3b5e,stroke:#092a44,color:#fff
style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
style CON fill:#1a8aad,stroke:#14708d,color:#fff
style GEN fill:#1a8aad,stroke:#14708d,color:#fff
style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
style IMP fill:#1a8aad,stroke:#14708d,color:#fff
A 422 at the point of write closes the feedback loop — producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.
For post-landing monitoring use Great Expectations, Soda, or dbt tests — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).
AI Agents — first-class via MCP
OpenDQV ships a built-in Model Context Protocol server, so Claude Desktop, Cursor, and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent explicitly declares — no hallucinated compliance, no invented rules.
4-minute demo: Claude Desktop uses two MCP servers — OpenDQV for validation, Marmot for catalog lineage — to check a menu item against ppds_menu_item for Natasha's Law allergen compliance, stating which tool calls it makes and why. (Backup: download the MP4 from the repo)
For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see docs/mcp.md.
Install
| I have... | Command |
|---|---|
| Python 3.11+ | git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh |
| Docker | git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d |
| Just the SDK/CLI | pip install opendqv then opendqv init to bootstrap contracts |
| None of the above | Beginner setup guide → |
install.sh creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls ghcr.io/opendqv/opendqv:latest — no build step required.
⚠️ AUTH_MODE=open (the default) has no authentication. Set AUTH_MODE=token and a strong SECRET_KEY in .env before any non-local deployment. See SECURITY.md.
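The warning above can be turned into a pre-deployment check. The following is a minimal sketch, not part of OpenDQV: it assumes .env uses plain KEY=VALUE lines, and the 32-character threshold for a "strong" SECRET_KEY is an illustrative assumption rather than a documented requirement.

```python
from pathlib import Path

MIN_SECRET_LENGTH = 32  # assumption: illustrative threshold, not an OpenDQV rule


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def deployment_problems(env: dict[str, str]) -> list[str]:
    """Return the reasons this configuration is unsafe outside localhost."""
    problems = []
    # AUTH_MODE falls back to "open" when unset, as the warning describes.
    if env.get("AUTH_MODE", "open") != "token":
        problems.append("AUTH_MODE is not 'token'")
    if len(env.get("SECRET_KEY", "")) < MIN_SECRET_LENGTH:
        problems.append(f"SECRET_KEY is shorter than {MIN_SECRET_LENGTH} characters")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    for problem in deployment_problems(env):
        print(f"UNSAFE: {problem}")
```

Running it from the directory that holds .env prints one line per problem and nothing when both settings look safe.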
Your First Validation
1. Write a contract — drop a YAML file in your contracts directory (run opendqv init --all to copy the 43 bundled contracts, or opendqv init for a single starter):
contract:
name: order
version: "1.0"
owner: "Data Governance"
status: active
rules:
- name: valid_email
type: regex
field: email
pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
severity: error
error_message: "Invalid email format"
- name: amount_positive
type: min
field: amount
min: 0.01
severity: error
error_message: "Order amount must be positive"
- name: status_valid
type: allowed_values
field: status
allowed_values: [pending, confirmed, shipped, cancelled]
severity: error
error_message: "Invalid order status"
2. Reload contracts:
curl -X POST http://localhost:8000/api/v1/contracts/reload
3. Send a bad record — OpenDQV rejects it:
curl -s -X POST http://localhost:8000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'
{
"valid": false,
"errors": [
{"field": "email", "rule": "valid_email", "message": "Invalid email format", "severity": "error"},
{"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
{"field": "status", "rule": "status_valid", "message": "Invalid order status", "severity": "error"}
],
"contract": "order",
"version": "1.0"
}
4. Fix the record — it passes:
curl -s -X POST http://localhost:8000/api/v1/validate \
-H "Content-Type: application/json" \
-d '{"contract": "order", "record": {"email": "alice@example.com", "amount": 49.99, "status": "pending"}}'
{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}
The customer contract ships pre-seeded if you want to skip step 1. The quickstart guide walks through authoring, lifecycle, and batch validation.
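The curl calls in steps 3 and 4 can also be made from Python. This is a sketch using only the standard library rather than the OpenDQV SDK; it assumes the local endpoint from the examples above, AUTH_MODE=open, and that a 422 response carries the JSON body shown in step 3. The helper names (build_request, summarize, validate) are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Endpoint taken from the curl examples; adjust for your deployment.
VALIDATE_URL = "http://localhost:8000/api/v1/validate"


def build_request(contract: str, record: dict) -> urllib.request.Request:
    """Build the same POST the curl examples send."""
    body = json.dumps({"contract": contract, "record": record}).encode("utf-8")
    return urllib.request.Request(
        VALIDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(result: dict) -> list[str]:
    """Turn a validation response into one line per blocking error."""
    return [
        f"{e['field']}: {e['message']} ({e['rule']})"
        for e in result.get("errors", [])
        if e.get("severity") == "error"
    ]


def validate(contract: str, record: dict) -> dict:
    """POST the record. urllib raises on 422, so the body is read from the error."""
    try:
        with urllib.request.urlopen(build_request(contract, record), timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 422:
            return json.load(err)
        raise
```

A caller would typically run validate("order", record) just before its own write and skip the write when summarize(...) returns any lines.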
Rules
| Type | What it checks |
|---|---|
| not_empty | Field is present and non-empty |
| regex | Field matches (or does not match) a pattern. Built-ins: builtin:email, builtin:uuid, builtin:ipv4, builtin:url |
| min / max / range | Numeric bounds |
| min_length / max_length | String length |
| date_format | Parseable date/datetime. Falls back through common formats if no explicit format is set |
| allowed_values | Value must be in a fixed list |
| lookup | Value must appear in a local file or HTTP endpoint (with TTL cache) |
| compare | Cross-field: field op compare_to. Supports gt, lt, gte, lte, eq, neq, and today/now sentinels |
| required_if / forbidden_if | Conditional: required or forbidden when another field equals a value |
| checksum | Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC |
| unique | No duplicates within a batch (batch mode only) |
| cross_field_range | Value must be between two other fields in the same record |
| field_sum | Sum of named fields must equal a target (within optional tolerance) |
| geospatial_bounds | Lat/lon pair within a bounding box |
| date_diff | Difference between two date fields within a range |
| age_match | Declared age consistent with date-of-birth field |
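To make the checksum row concrete, here is a sketch of two of the listed check-digit schemes, written from the public algorithms (ISO 13616 mod-97 for IBAN, the GS1 weighted sum for GTIN). It is not OpenDQV's implementation, and the function names are hypothetical. The IBAN check covers only the mod-97 arithmetic, not country-specific lengths.

```python
def iban_is_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to 10..35, and require the resulting integer to leave remainder 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def gtin_is_valid(code: str) -> bool:
    """GS1 check digit: weight the data digits 3, 1, 3, ... from the right."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    *data, check = (int(ch) for ch in code)
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(data)))
    return (10 - total % 10) % 10 == check
```

Both functions accept the commonly published sample values "GB82 WEST 1234 5698 7654 32" and "4006381333931", and reject them once a single digit is changed.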
Rules have severity: error (blocks the record) or severity: warning (flags but allows).
Any rule can include a condition block to apply it only when another field equals a given value.
Full reference: docs/rules/
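The severity and condition semantics described above can be sketched in a few lines. This is an illustration only, not the OpenDQV engine: it handles three rule types, the shape of the condition block ({"field": ..., "value": ...}) is a hypothetical stand-in for whatever the contract schema defines, and treating a missing field as a failure is this sketch's own choice.

```python
import re


def check(rule: dict, record: dict) -> bool:
    """Return True when the record satisfies the rule (three types only)."""
    value = record.get(rule["field"])
    if rule["type"] == "regex":
        return isinstance(value, str) and re.search(rule["pattern"], value) is not None
    if rule["type"] == "min":
        return isinstance(value, (int, float)) and value >= rule["min"]
    if rule["type"] == "allowed_values":
        return value in rule["allowed_values"]
    raise ValueError(f"unsupported rule type: {rule['type']}")


def validate(rules: list[dict], record: dict) -> dict:
    errors, warnings = [], []
    for rule in rules:
        cond = rule.get("condition")  # hypothetical shape, see lead-in
        if cond and record.get(cond["field"]) != cond["value"]:
            continue  # condition not met, so the rule does not apply
        if not check(rule, record):
            bucket = errors if rule.get("severity", "error") == "error" else warnings
            bucket.append({"field": rule["field"], "rule": rule["name"]})
    # Only error-severity failures block the record; warnings are reported.
    return {"valid": not errors, "errors": errors, "warnings": warnings}
```

A record that fails only warning-severity rules comes back with valid set to true and a non-empty warnings list, which matches the "flags but allows" behaviour described above.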
How it compares
A mature data governance programme operates across three layers, each with a distinct job:
| Layer | Purpose | Tools |
|---|---|---|
| 1. Write-time enforcement | Prevent bad data from entering any system | OpenDQV |
| 2. Catalog / governance / stewardship | Ownership, glossary, lineage, policy, stewardship workflows | Alation, Atlan, Collibra, Purview, DataHub, Marmot |
| 3. Pipeline testing / observability | Detect drift, freshness issues, residual quality after ingestion | Great Expectations, Soda Core, dbt tests, Monte Carlo |
OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.
| | Great Expectations / Soda / dbt | OpenDQV |
|---|---|---|
| When | After data lands (in warehouse/lake) | Before data is written (at the door) |
| Where | Data pipelines, batch jobs | Source system integration points |
| Model | Scan data at rest | Validate data in flight |
| Latency | Minutes to hours (batch) | Milliseconds (API call) |
| Who calls it | Data engineers | Data engineers, developers, CRM admins |
They're complementary. Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.
Contracts
43 production-ready contracts ship inside the opendqv package covering GDPR, HIPAA, SOX, MiFID II,
UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA,
and 20+ other regulatory frameworks across UK, EU, and US. pip install opendqv gives you all of them
— opendqv list works with zero configuration.
See docs/compliance-contracts.md for the full list with regulatory context, or browse opendqv/contracts/ directly. 17 minimal starter templates are in examples/starter_contracts/.
Performance
EC2 c6i.large, 2 workers, 12-rule contract, mixed 50/50 workload:
~482 req/s, p99 ~182 ms. Sizing rule: WEB_CONCURRENCY = number of vCPUs.
See docs/benchmark_throughput.md for full platform comparison, methodology, and monthly volume extrapolation.
Documentation
| Document | What it covers |
|---|---|
| Quickstart | Build your first contract in 15 minutes |
| Rules Reference | All rule types with parameters and examples |
| Compliance Contracts | 43 contracts with regulatory context |
| API Reference | REST endpoints, SDK, GraphQL, webhooks |
| Security | Deployment checklist, threat model, RBAC |
| Production Deployment | Token auth, TLS, Docker Compose, hardening |
| Integrations | Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more |
| All docs → | 76 documentation files |
API Stability
OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:
- REST API endpoints: paths, request bodies, and response shapes are stable within v2.x. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
- YAML contract format: the contract schema (rules, fields, types) is stable within v2.x. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
- Python SDK: OpenDQVClient, AsyncOpenDQVClient, and LocalValidator public method signatures are stable within v2.x. Internal helpers (prefixed _) are not covered.
- MCP tools: tool names and parameters are stable within v2.x.
- Security fixes: backported to the latest 2.x line on a best-effort basis.
Known limitations in v2.2.x
- Rule null handling is inconsistent. Most format rules fail when the target field is missing; a few (max_length, allowed_values) pass silently; field_sum and ratio_check coerce missing operands to 0. Single-record and batch paths disagree in a few cases. See docs/rules/core_rules.md for the full matrix and the safe pattern to use today. v2.3.0 will make this consistent (loud-by-default with an optional: true opt-out).
- Unknown rule types pass silently at runtime. A typo in type: (e.g. min_lenght) is caught by opendqv lint but not by the engine, so a typo'd rule is a disabled rule. Always lint before deploy. v2.3.0 will reject unknown types at contract load.
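The second limitation is the kind of mistake a small pre-deploy check can catch. The sketch below is not opendqv lint and does not replace it. It takes already-parsed rule dictionaries and compares each type against the rule types named in this README, so the set is an assumption and may be incomplete; the function name is hypothetical.

```python
import difflib

# Rule types named in this README; the engine may support more.
KNOWN_TYPES = {
    "not_empty", "regex", "min", "max", "range", "min_length", "max_length",
    "date_format", "allowed_values", "lookup", "compare", "required_if",
    "forbidden_if", "checksum", "unique", "cross_field_range", "field_sum",
    "geospatial_bounds", "date_diff", "age_match", "ratio_check",
}


def unknown_rule_types(rules: list[dict]) -> list[str]:
    """Report rules whose type is not recognised, with a nearest-match hint."""
    problems = []
    for rule in rules:
        rule_type = rule.get("type")
        if rule_type in KNOWN_TYPES:
            continue
        hint = difflib.get_close_matches(str(rule_type), sorted(KNOWN_TYPES), n=1)
        suggestion = f"; did you mean '{hint[0]}'?" if hint else ""
        problems.append(f"{rule.get('name', '?')}: unknown type '{rule_type}'{suggestion}")
    return problems
```

Failing the build when this list is non-empty turns a silently disabled rule into a visible error, which is the behaviour the text says v2.3.0 will enforce at contract load.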
Contributing
See CONTRIBUTING.md for setup instructions, coding guidelines, and how to submit changes.
License
MIT — see LICENSE.
Acknowledgements
Led by Sunny Sharma, BGMS Consultants Ltd. The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.
OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: trust is easier to build than to repair.