Skip to main content

YAML-first data contract governance for AI agents

Project description

agentic-data-contracts

PyPI version CI Python 3.12+ License: MIT

Stop your AI agents from running wild on your data.

agentic-data-contracts lets data engineers define governance contracts in YAML — what tables an agent may query, which operations are forbidden, what resource limits apply — and enforces them automatically at query time via SQL validation powered by sqlglot.

Why? AI agents querying databases face two problems: resource runaway (unbounded compute, endless retries, cost overruns) and semantic inconsistency (wrong tables, missing filters, ad-hoc metric definitions). This library addresses both with a single YAML contract.

Works with: Claude Agent SDK (primary target), or any Python agent framework. Optionally integrates with ai-agent-contracts for formal resource governance.

How It Works

Agent: "SELECT * FROM analytics.orders"
  -> BLOCKED (no SELECT * — specify explicit columns)

Agent: "SELECT order_id, amount FROM analytics.orders"
  -> BLOCKED (missing required filter: tenant_id)

Agent: "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"
  -> PASSED + WARN (consider using semantic revenue definition)

Agent: "DELETE FROM analytics.orders WHERE id = 1"
  -> BLOCKED (forbidden operation: DELETE)

The contract defines the rules. The library enforces them — before the query ever reaches the database.

Installation

uv add agentic-data-contracts
# or
pip install agentic-data-contracts

With optional database adapters:

uv add "agentic-data-contracts[duckdb]"      # DuckDB
uv add "agentic-data-contracts[bigquery]"    # BigQuery
uv add "agentic-data-contracts[snowflake]"   # Snowflake
uv add "agentic-data-contracts[postgres]"    # PostgreSQL
uv add "agentic-data-contracts[agent-sdk]"   # Claude Agent SDK integration

Quick Start

1. Write a YAML contract

# contract.yml
version: "1.0"
name: revenue-analysis

semantic:
  source:
    type: yaml
    path: "./semantic.yml"
  allowed_tables:
    - schema: analytics
      tables: [orders, customers, subscriptions]
  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
  rules:
    - name: tenant_isolation
      description: "All queries must filter by tenant_id"
      enforcement: block
      filter_column: tenant_id
    - name: no_select_star
      description: "Must specify explicit columns"
      enforcement: block

resources:
  cost_limit_usd: 5.00
  max_retries: 3
  token_budget: 50000

temporal:
  max_duration_seconds: 300

2. Load the contract and create tools

from agentic_data_contracts import DataContract, create_tools
from agentic_data_contracts.adapters.duckdb import DuckDBAdapter

dc = DataContract.from_yaml("contract.yml")
adapter = DuckDBAdapter("analytics.duckdb")

# Semantic source is auto-loaded from contract config (source.type + source.path)
tools = create_tools(dc, adapter=adapter)

3. Use with the Claude Agent SDK (requires claude-agent-sdk>=0.1.52)

import asyncio
from agentic_data_contracts import create_sdk_mcp_server
from claude_agent_sdk import (
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    query,
)

# One-liner: wraps all 10 tools and bundles into an SDK MCP server
server = create_sdk_mcp_server(dc, adapter=adapter)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=f"You are a revenue analytics assistant.\n\n{dc.to_system_prompt()}",
    mcp_servers={"dc": server},
    **dc.to_sdk_config(),  # token_budget → task_budget, max_retries → max_turns
)

async def run(prompt: str) -> None:
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(run("What was total revenue by region in Q1 2025?"))

4. Or use the tools directly (no SDK required)

import asyncio

async def demo() -> None:
    # Validate a query without executing
    validate = next(t for t in tools if t.name == "validate_query")
    result = await validate.callable(
        {"sql": "SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'"}
    )
    print(result["content"][0]["text"])
    # VALID — Query passed all checks.

    # Blocked query
    result = await validate.callable({"sql": "SELECT * FROM analytics.orders"})
    print(result["content"][0]["text"])
    # BLOCKED — Violations:
    # - SELECT * is not allowed — specify explicit columns

asyncio.run(demo())

The 10 Tools

Tool Description
list_schemas List all allowed database schemas from the contract
list_tables List allowed tables, optionally filtered by schema
describe_table Get full column details for an allowed table
preview_table Preview sample rows from an allowed table
list_metrics List metric definitions, optionally filtered by domain
lookup_metric Get a metric definition; fuzzy search fallback when no exact match
validate_query Validate a SQL query against contract rules without executing
query_cost_estimate Estimate cost and row count via EXPLAIN
run_query Validate and execute a SQL query, returning results
get_contract_info Get the full contract: rules, limits, and session status

Contract Rules

Rules are enforced at three levels:

  • block — query is rejected and an error is returned to the agent
  • warn — query proceeds but a warning is included in the response
  • log — violation is recorded but not surfaced to the agent

Built-in checkers enforce:

  • Table allowlist — only tables listed in allowed_tables may be queried
  • Operation blocklistforbidden_operations (DELETE, DROP, etc.) are rejected
  • Required filters — rules with filter_column require a matching WHERE clause
  • No SELECT * — queries must name explicit columns

Semantic Sources

A semantic source provides metric and table schema metadata to the agent.

YAML (built-in):

# semantic.yml
metrics:
  - name: total_revenue
    description: "Total revenue from completed orders"
    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
    source_model: analytics.orders

tables:
  - schema: analytics
    table: orders
    columns:
      - name: id
        type: INTEGER
      - name: amount
        type: DECIMAL
      - name: tenant_id
        type: VARCHAR

dbt — point to a manifest.json:

semantic:
  source:
    type: dbt
    path: "./dbt/manifest.json"

Cube — point to a Cube schema file:

semantic:
  source:
    type: cube
    path: "./cube/schema.yml"

Scalable Metric Discovery

For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:

semantic:
  domains:
    acquisition: [CAC, CPA, CPL, click_through_rate]
    retention: [churn_rate, LTV, retention_30d]
    attribution: [ROAS, first_touch_revenue]

The system prompt gets a compact index (names + descriptions grouped by domain). The agent uses lookup_metric for full SQL definitions — with fuzzy fallback when it doesn't know the exact name:

lookup_metric("CAC")                → exact match, full definition
lookup_metric("acquisition cost")   → fuzzy match, returns [CAC, CPA] as candidates
list_metrics(domain="retention")    → only retention metrics

Resource Limits

resources:
  cost_limit_usd: 5.00          # max estimated query cost
  max_retries: 3                 # max blocked queries per session
  token_budget: 50000            # max tokens consumed
  max_query_time_seconds: 30     # max wall-clock query time
  max_rows_scanned: 1000000      # max rows an EXPLAIN may estimate

Optional Dependencies

Extra Package Purpose
duckdb duckdb DuckDB adapter
bigquery google-cloud-bigquery BigQuery adapter
snowflake snowflake-connector-python Snowflake adapter
postgres psycopg2-binary PostgreSQL adapter
agent-sdk claude-agent-sdk Claude Agent SDK integration
agent-contracts ai-agent-contracts>=0.2.0 ai-agent-contracts bridge

Example

See examples/revenue_agent/ for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.

uv run python examples/revenue_agent/setup_db.py
uv run python examples/revenue_agent/agent.py "What was Q1 revenue by region?"

Architecture

See docs/architecture.md for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional ai-agent-contracts bridge.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentic_data_contracts-0.2.3.tar.gz (196.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentic_data_contracts-0.2.3-py3-none-any.whl (27.7 kB view details)

Uploaded Python 3

File details

Details for the file agentic_data_contracts-0.2.3.tar.gz.

File metadata

  • Download URL: agentic_data_contracts-0.2.3.tar.gz
  • Upload date:
  • Size: 196.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentic_data_contracts-0.2.3.tar.gz
Algorithm Hash digest
SHA256 733d9af31c3a703960692053634a6658e7f95d315ee2efed44c4c6dadc81af44
MD5 1ab0d668635041f0e860b6e9b76324df
BLAKE2b-256 d9cd2613f4e3b99afbd49541dd590aebd8c1b529d8f25229fc09b14b33c51698

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentic_data_contracts-0.2.3.tar.gz:

Publisher: ci.yml on flyersworder/agentic-data-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentic_data_contracts-0.2.3-py3-none-any.whl.

File metadata

File hashes

Hashes for agentic_data_contracts-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9c24f92e0f3b778cae4810dfac5545c77100e5df7b6a0cc1dddce2220318a7fe
MD5 d69b344f88c52804c71f7c744becf996
BLAKE2b-256 1bea7045111900f97a96d1f44d8cb6ff2685e0ef9abd544a2007cec41d2ca860

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentic_data_contracts-0.2.3-py3-none-any.whl:

Publisher: ci.yml on flyersworder/agentic-data-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page