YAML-first data contract governance for AI agents

agentic-data-contracts

Requires Python 3.12+. License: MIT.

Stop your AI agents from running wild on your data.

agentic-data-contracts lets data engineers define governance contracts in YAML — what tables an agent may query, which operations are forbidden, what resource limits apply — and enforces them automatically at query time via SQL validation powered by sqlglot.

Why? AI agents querying databases face two problems: resource runaway (unbounded compute, endless retries, cost overruns) and semantic inconsistency (wrong tables, missing filters, ad-hoc metric definitions). This library addresses both with a single YAML contract.

Works with: Claude Agent SDK (primary target), or any Python agent framework. Optionally integrates with ai-agent-contracts for formal resource governance.

How It Works

Agent: "SELECT * FROM analytics.orders"
  -> BLOCKED (no SELECT * — specify explicit columns)

Agent: "SELECT order_id, amount FROM analytics.orders"
  -> BLOCKED (missing required filter: tenant_id)

Agent: "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"
  -> PASSED + WARN (consider using semantic revenue definition)

Agent: "DELETE FROM analytics.orders WHERE id = 1"
  -> BLOCKED (forbidden operation: DELETE)

The contract defines the rules. The library enforces them — before the query ever reaches the database.

Installation

uv add agentic-data-contracts
# or
pip install agentic-data-contracts

With optional database adapters:

uv add "agentic-data-contracts[duckdb]"      # DuckDB
uv add "agentic-data-contracts[bigquery]"    # BigQuery
uv add "agentic-data-contracts[snowflake]"   # Snowflake
uv add "agentic-data-contracts[postgres]"    # PostgreSQL
uv add "agentic-data-contracts[agent-sdk]"   # Claude Agent SDK integration

Quick Start

1. Write a YAML contract

# contract.yml
version: "1.0"
name: revenue-analysis

semantic:
  allowed_tables:
    - schema: analytics
      tables: [orders, customers, subscriptions]
  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
  rules:
    - name: tenant_isolation
      description: "All queries must filter by tenant_id"
      enforcement: block
      filter_column: tenant_id
    - name: no_select_star
      description: "Must specify explicit columns"
      enforcement: block

resources:
  cost_limit_usd: 5.00
  max_retries: 3
  token_budget: 50000

temporal:
  max_duration_seconds: 300

2. Load the contract and create tools

from agentic_data_contracts import DataContract, create_tools
from agentic_data_contracts.adapters.duckdb import DuckDBAdapter
from agentic_data_contracts.semantic.yaml_source import YamlSource

dc = DataContract.from_yaml("contract.yml")
adapter = DuckDBAdapter("analytics.duckdb")
semantic = YamlSource("semantic.yml")

tools = create_tools(dc, adapter=adapter, semantic_source=semantic)

3. Use with the Claude Agent SDK

import asyncio
from claude_agent_sdk import (
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    create_sdk_mcp_server,
    query,
)

server = create_sdk_mcp_server(name="data-contracts", version="1.0.0", tools=tools)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=f"You are a revenue analytics assistant.\n\n{dc.to_system_prompt()}",
    mcp_servers={"dc": server},
    allowed_tools=[f"mcp__dc__{t.name}" for t in tools],
)

async def run(prompt: str) -> None:
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(run("What was total revenue by region in Q1 2025?"))

4. Or use the tools directly (no SDK required)

import asyncio

async def demo() -> None:
    # Validate a query without executing
    validate = next(t for t in tools if t.name == "validate_query")
    result = await validate.callable(
        {"sql": "SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'"}
    )
    print(result["content"][0]["text"])
    # VALID — Query passed all checks.

    # Blocked query
    result = await validate.callable({"sql": "SELECT * FROM analytics.orders"})
    print(result["content"][0]["text"])
    # BLOCKED — Violations:
    # - SELECT * is not allowed — specify explicit columns

asyncio.run(demo())

The 10 Tools

Tool                 Description
list_schemas         List all allowed database schemas from the contract
list_tables          List allowed tables, optionally filtered by schema
describe_table       Get full column details for an allowed table
preview_table        Preview sample rows from an allowed table
list_metrics         List all metric definitions from the semantic source
lookup_metric        Get the full definition of a specific metric
validate_query       Validate a SQL query against contract rules without executing
query_cost_estimate  Estimate cost and row count via EXPLAIN
run_query            Validate and execute a SQL query, returning results
get_contract_info    Get the full contract: rules, limits, and session status
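
To make the discovery tools more concrete, here is a minimal sketch of how schema and table listings could be derived from the allowed_tables section of a contract. The names ALLOWED_TABLES, sketch_list_schemas, and sketch_list_tables are hypothetical and exist only for this illustration; they are not the library's own functions, and the real tools may return a different shape.

```python
from typing import Optional

# Hypothetical in-memory copy of `semantic.allowed_tables` from contract.yml.
ALLOWED_TABLES = [
    {"schema": "analytics", "tables": ["orders", "customers", "subscriptions"]},
]


def sketch_list_schemas(allowed: list[dict]) -> list[str]:
    """Return the distinct schema names that appear in the allowlist."""
    return sorted({entry["schema"] for entry in allowed})


def sketch_list_tables(allowed: list[dict], schema: Optional[str] = None) -> list[str]:
    """Return fully qualified table names, optionally restricted to one schema."""
    return [
        f"{entry['schema']}.{table}"
        for entry in allowed
        if schema is None or entry["schema"] == schema
        for table in entry["tables"]
    ]


print(sketch_list_schemas(ALLOWED_TABLES))
# ['analytics']
print(sketch_list_tables(ALLOWED_TABLES, schema="analytics"))
# ['analytics.orders', 'analytics.customers', 'analytics.subscriptions']
```

Because the listings come from the contract rather than from the database catalog, an agent exploring through tools like these would only ever see tables it is allowed to query.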

Contract Rules

Each rule declares one of three enforcement levels:

  • block — query is rejected and an error is returned to the agent
  • warn — query proceeds but a warning is included in the response
  • log — violation is recorded but not surfaced to the agent
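
The three levels above can be read as a small decision procedure. The sketch below is one way to fold a list of violations into a single outcome; Violation, Outcome, and resolve are hypothetical names for this illustration, and the rule names in the usage example are made up rather than taken from the library.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    rule: str
    message: str
    enforcement: str  # "block", "warn", or "log"


@dataclass
class Outcome:
    allowed: bool
    agent_messages: list[str] = field(default_factory=list)  # surfaced to the agent
    audit_log: list[str] = field(default_factory=list)       # recorded only


def resolve(violations: list[Violation]) -> Outcome:
    """Combine violations: any 'block' rejects, 'warn' annotates, 'log' stays silent."""
    outcome = Outcome(allowed=True)
    for v in violations:
        if v.enforcement == "block":
            outcome.allowed = False
            outcome.agent_messages.append(f"BLOCKED: {v.message}")
        elif v.enforcement == "warn":
            outcome.agent_messages.append(f"WARN: {v.message}")
        elif v.enforcement == "log":
            outcome.audit_log.append(f"{v.rule}: {v.message}")
        else:
            raise ValueError(f"unknown enforcement level: {v.enforcement!r}")
    return outcome


# Hypothetical rule names, for illustration only.
outcome = resolve([
    Violation("use_semantic_revenue", "consider using semantic revenue definition", "warn"),
    Violation("wide_scan", "query touches more than one table", "log"),
])
print(outcome.allowed)         # True
print(outcome.agent_messages)  # ['WARN: consider using semantic revenue definition']
```

Under this reading, a query passes unless at least one violated rule is set to block, which matches the "PASSED + WARN" case shown in How It Works.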

Built-in checkers enforce:

  • Table allowlist — only tables listed in allowed_tables may be queried
  • Operation blocklist — forbidden_operations (DELETE, DROP, etc.) are rejected
  • Required filters — rules with filter_column require a matching WHERE clause
  • No SELECT * — queries must name explicit columns
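
As a rough illustration of what these four checks look for, here is a deliberately naive, regex-based sketch using only the standard library. It is not how the library works: the library validates parsed SQL via sqlglot, which handles subqueries, aliases, comments, and string literals that this toy version would get wrong. The constants and the naive_check function are hypothetical.

```python
import re

# Hypothetical, simplified stand-in for the contract from the Quick Start.
ALLOWED_TABLES = {"analytics.orders", "analytics.customers", "analytics.subscriptions"}
FORBIDDEN_OPERATIONS = {"DELETE", "DROP", "TRUNCATE", "UPDATE", "INSERT"}
REQUIRED_FILTER_COLUMNS = ["tenant_id"]


def naive_check(sql: str) -> list[str]:
    """Return violation messages; an empty list means the query passes."""
    violations = []
    text = sql.strip().rstrip(";")
    operation = text.split(None, 1)[0].upper() if text else ""

    # Operation blocklist: look only at the leading statement keyword.
    if operation in FORBIDDEN_OPERATIONS:
        violations.append(f"forbidden operation: {operation}")

    # Table allowlist: collect names that follow FROM or JOIN.
    for table in re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", text, flags=re.IGNORECASE):
        if table.lower() not in ALLOWED_TABLES:
            violations.append(f"table not allowed: {table}")

    if operation == "SELECT":
        # No SELECT *: a bare star directly after SELECT.
        if re.match(r"SELECT\s+\*", text, flags=re.IGNORECASE):
            violations.append("SELECT * is not allowed")

        # Required filters: the column must appear somewhere after WHERE.
        parts = re.split(r"\bWHERE\b", text, maxsplit=1, flags=re.IGNORECASE)
        where_clause = parts[1] if len(parts) == 2 else ""
        for column in REQUIRED_FILTER_COLUMNS:
            if not re.search(rf"\b{re.escape(column)}\b", where_clause, flags=re.IGNORECASE):
                violations.append(f"missing required filter: {column}")

    return violations


print(naive_check("SELECT order_id, amount FROM analytics.orders"))
# ['missing required filter: tenant_id']
print(naive_check("DELETE FROM analytics.orders WHERE id = 1"))
# ['forbidden operation: DELETE']
print(naive_check("SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"))
# []
```

The sketch reports every violation it finds, so a query such as SELECT * FROM analytics.orders yields two messages here (the star and the missing filter). Whether the library reports all violations or stops at the first is not something this example asserts.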

Semantic Sources

A semantic source provides metric and table schema metadata to the agent.

YAML (built-in):

# semantic.yml
metrics:
  - name: total_revenue
    description: "Total revenue from completed orders"
    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
    source_model: analytics.orders

tables:
  - schema: analytics
    table: orders
    columns:
      - name: id
        type: INTEGER
      - name: amount
        type: DECIMAL
      - name: tenant_id
        type: VARCHAR
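
To show how a metric definition like the one above might be used, the following sketch turns it into a contract-friendly query. It assumes the YAML has already been loaded into a plain dict (written inline here as SEMANTIC); lookup and metric_query are hypothetical helpers, and the "?" placeholder assumes a driver that accepts positional parameters.

```python
# Hypothetical dict mirroring the `metrics` section of semantic.yml.
SEMANTIC = {
    "metrics": [
        {
            "name": "total_revenue",
            "description": "Total revenue from completed orders",
            "sql_expression": "SUM(amount) FILTER (WHERE status = 'completed')",
            "source_model": "analytics.orders",
        }
    ]
}


def lookup(name: str) -> dict:
    """Find a metric definition by name, or raise KeyError if it is unknown."""
    for metric in SEMANTIC["metrics"]:
        if metric["name"] == name:
            return metric
    raise KeyError(f"unknown metric: {name}")


def metric_query(name: str, filter_column: str = "tenant_id") -> str:
    """Build a query that reuses the shared definition and keeps the required filter."""
    m = lookup(name)
    return (
        f"SELECT {m['sql_expression']} AS {m['name']} "
        f"FROM {m['source_model']} "
        f"WHERE {filter_column} = ?"
    )


print(metric_query("total_revenue"))
# SELECT SUM(amount) FILTER (WHERE status = 'completed') AS total_revenue FROM analytics.orders WHERE tenant_id = ?
```

Reusing the stored expression instead of letting the agent write its own SUM is what keeps metric definitions consistent across queries.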

dbt — point to a manifest.json:

semantic:
  source:
    type: dbt
    path: "./dbt/manifest.json"

Cube — point to a Cube schema file:

semantic:
  source:
    type: cube
    path: "./cube/schema.yml"

Resource Limits

resources:
  cost_limit_usd: 5.00          # max estimated query cost
  max_retries: 3                 # max blocked queries per session
  token_budget: 50000            # max tokens consumed
  max_query_time_seconds: 30     # max wall-clock query time
  max_rows_scanned: 1000000      # max rows scanned, as estimated via EXPLAIN
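
A minimal sketch of how per-session limits like these could be tracked is shown below. SessionBudget and its methods are hypothetical, not the library's API, and the sketch assumes a limit is breached only when a counter strictly exceeds it; the library may draw that line differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionBudget:
    # Limits, mirroring three keys of the `resources` block.
    cost_limit_usd: float = 5.00
    max_retries: int = 3
    token_budget: int = 50_000
    # Running counters for the current session.
    blocked_queries: int = 0
    tokens_used: int = 0

    def within_cost(self, estimated_cost_usd: float) -> bool:
        """True if a single query's estimated cost fits under the limit."""
        return estimated_cost_usd <= self.cost_limit_usd

    def record_blocked(self) -> None:
        self.blocked_queries += 1

    def record_tokens(self, count: int) -> None:
        self.tokens_used += count

    def stop_reason(self) -> Optional[str]:
        """Return why the session should stop, or None if it may continue."""
        # Assumption: a limit is breached only when strictly exceeded.
        if self.blocked_queries > self.max_retries:
            return "max_retries exceeded"
        if self.tokens_used > self.token_budget:
            return "token_budget exceeded"
        return None


budget = SessionBudget()
print(budget.within_cost(0.42))  # True
for _ in range(4):
    budget.record_blocked()
print(budget.stop_reason())      # max_retries exceeded
```

Keeping the counters next to the limits makes it straightforward to report remaining budget back to the agent, which is the kind of session status get_contract_info is described as returning.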

Optional Dependencies

Extra            Package                     Purpose
duckdb           duckdb                      DuckDB adapter
bigquery         google-cloud-bigquery       BigQuery adapter
snowflake        snowflake-connector-python  Snowflake adapter
postgres         psycopg2-binary             PostgreSQL adapter
agent-sdk        claude-agent-sdk            Claude Agent SDK integration
agent-contracts  ai-agent-contracts>=0.2.0   ai-agent-contracts bridge

Example

See examples/revenue_agent/ for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.

uv run python examples/revenue_agent/setup_db.py
uv run python examples/revenue_agent/agent.py "What was Q1 revenue by region?"

Architecture

See docs/architecture.md for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional ai-agent-contracts bridge.

License

MIT
