
Runtime Data Contract Enforcer

Project description

rdce


Runtime Data Contract Enforcer: A lightweight Python 3 library for recursively validating and diffing nested JSON payloads against explicit Pydantic schemas.

Current Status

🚀 v1.0.0 Complete 🚀 The core recursive validation engine, Pydantic extractor, CSV adapters, and public API are complete and fully tested with 100% coverage.

🌟 Features

  • Pydantic Native: Define your data contracts using standard Pydantic BaseModel classes.
  • Recursive Type Validation: Deeply inspects nested dictionaries and payloads without flattening them.
  • Array Support: Natively validates items inside list[Type] arrays.
  • Optional Forgiveness: Gracefully handles missing keys or None values for Optional and Union types.
  • Path Tracking: Returns exact dot-notation breadcrumbs for schema drift (e.g., user.address.zip_code, nodes[1].ip).
  • Zero Bloat: Built to do one thing—diffing data schemas.
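Every violation rdce reports uses the same record shape — a dict with "path", "expected", and "actual" keys, as the outputs below show. Because of that, results are easy to post-process without touching the library. A minimal sketch of grouping such records by their top-level field (group_by_root is a helper defined here for illustration, not part of rdce):

```python
from collections import defaultdict

# Sample error records in the documented {"path", "expected", "actual"} shape
errors = [
    {"path": "is_active", "expected": "bool", "actual": "MISSING"},
    {"path": "address.zip_code", "expected": "int", "actual": "str"},
    {"path": "address.city", "expected": "str", "actual": "int"},
]

def group_by_root(errors):
    """Bucket error records by the first segment of their dot-notation path."""
    buckets = defaultdict(list)
    for err in errors:
        # Strip both dot segments and array indices: "nodes[1].ip" -> "nodes"
        root = err["path"].split(".")[0].split("[")[0]
        buckets[root].append(err)
    return dict(buckets)

grouped = group_by_root(errors)
print(grouped["address"])  # the two drift records under the nested Address model
```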

📦 Installation

pip install rdce
# Or using poetry
poetry add rdce

🚀 Quick Start

rdce is designed to be a transparent bridge between your Pydantic models and incoming, untrusted dictionary payloads.

1. Define your Contract

Use standard Pydantic models. Nested models are fully supported.

from pydantic import BaseModel

class Address(BaseModel):
    city: str
    zip_code: int

class UserContract(BaseModel):
    username: str
    is_active: bool
    address: Address

2. Enforce the Payload

Pass the model class and your raw dictionary payload into the enforce_contract engine.

from rdce import enforce_contract

# A payload with schema drift (wrong type for zip_code, missing is_active)
incoming_payload = {
    "username": "alice_data",
    "address": {
        "city": "London",
        "zip_code": "E1 6AN" # Expected int, got string
    }
}

errors = enforce_contract(UserContract, incoming_payload)

for error in errors:
    print(error)

Output:

[
  {"path": "is_active", "expected": "bool", "actual": "MISSING"},
  {"path": "address.zip_code", "expected": "int", "actual": "str"}
]
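Since enforce_contract returns a plain list rather than raising, a common pattern is to add your own fail-fast guard around it. A sketch operating on the documented error-record shape (ContractViolation and assert_contract are names introduced here, not part of rdce):

```python
class ContractViolation(Exception):
    """Raised when a payload drifts from its contract. Helper defined here, not part of rdce."""

def assert_contract(errors):
    """Raise if the error list returned by enforce_contract is non-empty."""
    if errors:
        detail = "; ".join(
            f"{e['path']}: expected {e['expected']}, got {e['actual']}" for e in errors
        )
        raise ContractViolation(detail)

# With the payload above, the two drift records collapse into one exception message:
try:
    assert_contract([
        {"path": "is_active", "expected": "bool", "actual": "MISSING"},
        {"path": "address.zip_code", "expected": "int", "actual": "str"},
    ])
except ContractViolation as exc:
    print(exc)
```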

3. Validating Arrays and Lists

rdce natively supports generic aliases like list[str] and lists of nested models. The engine will evaluate every item in the payload array and return the exact index of the violation.

class ServerNode(BaseModel):
    ip_address: str
    is_active: bool

class Cluster(BaseModel):
    cluster_name: str
    nodes: list[ServerNode]

# Payload with an error inside the array at index 1
payload = {
    "cluster_name": "eu-west-db",
    "nodes": [
        {"ip_address": "10.0.0.1", "is_active": True},
        {"ip_address": "10.0.0.2", "is_active": "yes"}
    ]
}

errors = enforce_contract(Cluster, payload)

Output:

[{"path": "nodes[1].is_active", "expected": "bool", "actual": "str"}]
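Because array violations embed the offending index in the breadcrumb (nodes[1].is_active), a path can be split back into keys and indices to fetch the bad value from the raw payload. A sketch assuming only the path format shown above (parse_path and resolve are illustrative helpers, not part of rdce):

```python
import re

def parse_path(path):
    """Split a breadcrumb like 'nodes[1].is_active' into keys and integer indices."""
    parts = []
    for segment in path.split("."):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", segment)
        parts.append(match.group(1))
        if match.group(2) is not None:
            parts.append(int(match.group(2)))
    return parts

def resolve(payload, path):
    """Walk a raw payload down to the value a breadcrumb points at."""
    value = payload
    for part in parse_path(path):
        value = value[part]
    return value

payload = {"cluster_name": "eu-west-db",
           "nodes": [{"ip_address": "10.0.0.1", "is_active": True},
                     {"ip_address": "10.0.0.2", "is_active": "yes"}]}

print(parse_path("nodes[1].is_active"))        # ['nodes', 1, 'is_active']
print(resolve(payload, "nodes[1].is_active"))  # yes
```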

4. Optional and Union Types

rdce gracefully handles optional fields. Missing keys or explicit None values will not trigger false positives if the contract allows them.

from typing import Optional

class UserProfile(BaseModel):
    username: str
    # Modern Python 3.10+ syntax
    age: int | None
    # Classic typing syntax
    nickname: Optional[str]

payload = {
    # 'age' is completely missing - ALLOWED!
    "username": "bob_builder",
    # Explicitly null - ALLOWED!
    "nickname": None
}

errors = enforce_contract(UserProfile, payload)
# Output: [] (Perfectly valid payload)

5. Strict Mode Validation

By default, rdce ignores extra keys in the payload. To flag injected or unexpected keys that are not defined in your schema, enable strict=True.

payload = {
    "username": "bob_builder",
    # INJECTED KEY
    "is_admin": True
}

errors = enforce_contract(UserProfile, payload, strict=True)
# Output: [{"path": "is_admin", "expected": "UNEXPECTED_KEY", "actual": "bool"}]

6. Fast Flat-File Validation (CSV Header Checking)

For data engineering pipelines (like Airflow or Dagster), rdce can validate flat-file schema drift without loading the entire file into memory. The enforce_csv_structure adapter reads only the first row of a CSV and cross-references the column names against your Pydantic contract.

from rdce.adapters import enforce_csv_structure
from pydantic import BaseModel
from typing import Optional

class PipelineContract(BaseModel):
    id: int
    username: str
    email: Optional[str]

# Instantly catches if an upstream database dropped the 'email' column
errors = enforce_csv_structure(PipelineContract, "massive_export.csv", delimiter=",")
# Output: [{"path": "email", "expected": "COLUMN_PRESENT", "actual": "MISSING"}]
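The header check is cheap precisely because only the first line is read. A stdlib-only sketch of the same idea — not rdce's actual implementation — with a plain expected-column list standing in for the Pydantic contract, emitting records in the documented shape:

```python
import csv
import io

def check_header(expected_columns, csv_text, delimiter=","):
    """Compare the first CSV row against expected column names; report missing ones."""
    header = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    return [
        {"path": col, "expected": "COLUMN_PRESENT", "actual": "MISSING"}
        for col in expected_columns
        if col not in header
    ]

# The upstream export dropped the 'email' column:
sample = "id,username\n1,alice\n2,bob\n"
print(check_header(["id", "username", "email"], sample))
# [{'path': 'email', 'expected': 'COLUMN_PRESENT', 'actual': 'MISSING'}]
```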

7. Streaming CSV Validation (Deep Scan & Dead-Letter Queues)

When you need to validate data types row by row in massive flat files, loading the entire file into memory risks an OOM (out-of-memory) crash.

The stream_csv_contract adapter acts as a Python Generator. It streams the file one row at a time, attempts basic type coercion, and yields only the rows that fail validation.

This allows you to easily route bad data to a dead-letter queue while letting your pipeline continue:

import csv
from rdce.adapters import stream_csv_contract
from pydantic import BaseModel

class Employee(BaseModel):
    id: int
    name: str
    is_active: bool

# Stream a file, handling enterprise CSV encodings and custom delimiters
bad_rows = stream_csv_contract(
    Employee, 
    "massive_export.csv", 
    delimiter=",", 
    encoding="utf-8-sig", # Automatically strips invisible BOM characters from Excel/Enterprise exports
    ignore_nulls=False
)

# Route the rejected rows to a separate file for review
with open("rejects.csv", "w", newline="") as f:
    writer = None
    
    for bad_row_payload in bad_rows:
        # The payload contains the line number, the exact raw row, and the errors
        raw_row = bad_row_payload["raw_row"]
        errors = bad_row_payload["errors"]
        
        # Initialize the CSV writer on the first bad row we find
        if writer is None:
            writer = csv.DictWriter(f, fieldnames=raw_row.keys())
            writer.writeheader()
            
        writer.writerow(raw_row)
        print(f"Row {bad_row_payload['line_num']} failed: {errors}")
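The yielded payloads (line_num, raw_row, and errors, per the shape documented above) also lend themselves to a quick drift summary before anyone opens the rejects file. A sketch over sample payloads in that shape (summarize_rejects is a helper defined here, not part of rdce):

```python
from collections import Counter

def summarize_rejects(bad_rows):
    """Count failures per column across payloads yielded by stream_csv_contract."""
    counts = Counter()
    for payload in bad_rows:
        for err in payload["errors"]:
            counts[err["path"]] += 1
    return counts

# Two rejected rows in the documented {line_num, raw_row, errors} shape
sample_rejects = [
    {"line_num": 2, "raw_row": {"id": "x", "name": "Ada", "is_active": "yes"},
     "errors": [{"path": "id", "expected": "int", "actual": "str"},
                {"path": "is_active", "expected": "bool", "actual": "str"}]},
    {"line_num": 5, "raw_row": {"id": "7", "name": "Bob", "is_active": "maybe"},
     "errors": [{"path": "is_active", "expected": "bool", "actual": "str"}]},
]

summary = summarize_rejects(sample_rejects)
print(summary)  # Counter({'is_active': 2, 'id': 1})
```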

🤝 Contributing

We welcome contributions! To set up the project locally:

1 Clone the repository.
2 Initialize the environment: poetry install
3 Run the linter (formatting and linting are strictly enforced via Ruff):
    poetry run python3 -m ruff check .
4 Run the formatter:
    poetry run python3 -m ruff format .
5 Run the test suite:
    poetry run pytest
6 Ensure 100% test coverage before submitting a Pull Request.

Project details


Download files

Download the file for your platform.

Source Distribution

rdce-1.0.0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution


rdce-1.0.0-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file rdce-1.0.0.tar.gz.

File metadata

  • Download URL: rdce-1.0.0.tar.gz
  • Upload date:
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdce-1.0.0.tar.gz

  • SHA256: 0b3d938c73fc6274461c66044e2f9396aaa694332c021324e32ec40097c8cdfc
  • MD5: 3b89a433628268de9c4b27f61f627f0d
  • BLAKE2b-256: 3fb59deeb65f182978ba7422967d9213746e95d27dae746485ddd64709789683


Provenance

The following attestation bundles were made for rdce-1.0.0.tar.gz:

Publisher: publish.yml on valdal14/rdce

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rdce-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: rdce-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdce-1.0.0-py3-none-any.whl

  • SHA256: 25d61fce46ca08aa552f892bbc11a072d48c84d68c01fa4fc674860e61ad80ed
  • MD5: 2c7735580a7b8a697079143c4f4dafa5
  • BLAKE2b-256: a78a33b6d55a2b22e1c4e9e209d0db3cb09fea1726ecfd2b17aaca455cd45242


Provenance

The following attestation bundles were made for rdce-1.0.0-py3-none-any.whl:

Publisher: publish.yml on valdal14/rdce

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
