YAML-first data contract governance for AI agents

Maintainers

flyerwolf

agentic-data-contracts

Stop your AI agents from running wild on your data.

agentic-data-contracts lets data engineers define governance contracts in YAML — what tables an agent may query, which operations are forbidden, what resource limits apply — and enforces them automatically at query time via SQL validation powered by sqlglot.

Why? AI agents querying databases face two problems: resource runaway (unbounded compute, endless retries, cost overruns) and semantic inconsistency (wrong tables, missing filters, ad-hoc metric definitions). This library addresses both with a single YAML contract.

Works with: the Claude Agent SDK (the primary target) or any other Python agent framework. Optionally integrates with ai-agent-contracts for formal resource governance.

How It Works

Agent: "SELECT * FROM analytics.orders"
  -> BLOCKED (no SELECT * — specify explicit columns)

Agent: "SELECT order_id, amount FROM analytics.orders"
  -> BLOCKED (missing required filter: tenant_id)

Agent: "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"
  -> PASSED + WARN (consider using semantic revenue definition)

Agent: "DELETE FROM analytics.orders WHERE id = 1"
  -> BLOCKED (forbidden operation: DELETE)

The contract defines the rules. The library enforces them — before the query ever reaches the database.
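
The order of checks in the trace above can be sketched in a few lines of plain Python. This is a deliberately naive, string-based stand-in and not the library's implementation, which parses SQL with sqlglot; the `check` function and the two constants are hypothetical names used only for illustration.

```python
import re

# Assumed to mirror the contract's forbidden_operations and filter_column.
FORBIDDEN = {"DELETE", "DROP", "TRUNCATE", "UPDATE", "INSERT"}
REQUIRED_FILTER = "tenant_id"


def check(sql: str) -> str:
    """Return a verdict for one SQL statement using naive string checks."""
    statement = sql.strip().rstrip(";")
    operation = statement.split(None, 1)[0].upper()
    if operation in FORBIDDEN:
        return f"BLOCKED (forbidden operation: {operation})"
    if re.search(r"\bselect\s+\*", statement, re.IGNORECASE):
        return "BLOCKED (no SELECT *)"
    # Everything after the first WHERE keyword, or "" when there is none.
    _, _, where = re.sub(r"\s+", " ", statement).lower().partition(" where ")
    if not re.search(rf"\b{REQUIRED_FILTER}\b", where):
        return f"BLOCKED (missing required filter: {REQUIRED_FILTER})"
    return "PASSED"


for sql in (
    "SELECT * FROM analytics.orders",
    "SELECT order_id, amount FROM analytics.orders",
    "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'",
    "DELETE FROM analytics.orders WHERE id = 1",
):
    print(check(sql))
```

String matching like this is easy to fool (comments, subqueries, quoted identifiers), which is the usual argument for validating against a parsed syntax tree instead.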

Installation

uv add agentic-data-contracts
# or
pip install agentic-data-contracts

With optional database adapters:

uv add "agentic-data-contracts[duckdb]"      # DuckDB
uv add "agentic-data-contracts[bigquery]"    # BigQuery
uv add "agentic-data-contracts[snowflake]"   # Snowflake
uv add "agentic-data-contracts[postgres]"    # PostgreSQL
uv add "agentic-data-contracts[agent-sdk]"   # Claude Agent SDK integration

Quick Start

1. Write a YAML contract

# contract.yml
version: "1.0"
name: revenue-analysis

semantic:
  source:
    type: yaml
    path: "./semantic.yml"
  allowed_tables:
    - schema: analytics
      tables: ["*"]          # all tables in schema (discovered from database)
    - schema: marketing
      tables: [campaigns]    # or list specific tables
  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
  rules:
    - name: tenant_isolation
      description: "All queries must filter by tenant_id"
      enforcement: block
      filter_column: tenant_id
    - name: no_select_star
      description: "Must specify explicit columns"
      enforcement: block

resources:
  cost_limit_usd: 5.00
  max_retries: 3
  token_budget: 50000

temporal:
  max_duration_seconds: 300

2. Load the contract and create tools

from agentic_data_contracts import DataContract, create_tools
from agentic_data_contracts.adapters.duckdb import DuckDBAdapter

dc = DataContract.from_yaml("contract.yml")
adapter = DuckDBAdapter("analytics.duckdb")

# Semantic source is auto-loaded from contract config (source.type + source.path)
tools = create_tools(dc, adapter=adapter)

3. Use with the Claude Agent SDK (requires `claude-agent-sdk>=0.1.52`)

import asyncio
from agentic_data_contracts import create_sdk_mcp_server
from claude_agent_sdk import (
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    query,
)

# One-liner: wraps all 10 tools and bundles into an SDK MCP server
server = create_sdk_mcp_server(dc, adapter=adapter)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=f"You are a revenue analytics assistant.\n\n{dc.to_system_prompt()}",
    mcp_servers={"dc": server},
    **dc.to_sdk_config(),  # token_budget → task_budget, max_retries → max_turns
)

async def run(prompt: str) -> None:
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(run("What was total revenue by region in Q1 2025?"))

4. Or use the tools directly (no SDK required)

import asyncio

async def demo() -> None:
    # Validate a query without executing
    validate = next(t for t in tools if t.name == "validate_query")
    result = await validate.callable(
        {"sql": "SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'"}
    )
    print(result["content"][0]["text"])
    # VALID — Query passed all checks.

    # Blocked query
    result = await validate.callable({"sql": "SELECT * FROM analytics.orders"})
    print(result["content"][0]["text"])
    # BLOCKED — Violations:
    # - SELECT * is not allowed — specify explicit columns

asyncio.run(demo())
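
When calling several tools directly, a small lookup helper can remove the repeated `next(...)` scan. The sketch below relies only on what the example above shows (each tool has a `name`, an async `callable`, and returns `result["content"][0]["text"]`); `call_tool` and the stub tool are hypothetical and exist only so the snippet runs without a contract or database.

```python
import asyncio
from types import SimpleNamespace


async def call_tool(tools, name: str, args: dict) -> str:
    """Find a tool by name, await its callable, and return the first text block."""
    by_name = {t.name: t for t in tools}
    if name not in by_name:
        raise KeyError(f"unknown tool: {name!r}; available: {sorted(by_name)}")
    result = await by_name[name].callable(args)
    return result["content"][0]["text"]


# Hypothetical stand-in for a real tool returned by create_tools().
async def _fake_validate(args: dict) -> dict:
    return {"content": [{"type": "text", "text": f"checked: {args['sql']}"}]}


stub_tools = [SimpleNamespace(name="validate_query", callable=_fake_validate)]
text = asyncio.run(call_tool(stub_tools, "validate_query", {"sql": "SELECT 1"}))
print(text)
```

With real tools, pass the list returned by `create_tools(dc, adapter=adapter)` in place of `stub_tools`.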

The 10 Tools

Tool	Description
`list_schemas`	List all allowed database schemas from the contract
`list_tables`	List allowed tables, optionally filtered by schema
`describe_table`	Get full column details for an allowed table
`preview_table`	Preview sample rows from an allowed table
`list_metrics`	List metric definitions, optionally filtered by domain
`lookup_metric`	Get a metric definition; fuzzy search fallback when no exact match
`validate_query`	Validate a SQL query against contract rules without executing
`query_cost_estimate`	Estimate cost and row count via EXPLAIN
`run_query`	Validate and execute a SQL query, returning results
`get_contract_info`	Get the full contract: rules, limits, and session status

Contract Rules

Rules are enforced at three levels:

block — query is rejected and an error is returned to the agent
warn — query proceeds but a warning is included in the response
log — violation is recorded but not surfaced to the agent
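
One way to picture the three levels is as a routing step that decides where each violation ends up. This is a minimal sketch under assumed names (`Violation`, `Outcome`, and `apply_enforcement` are hypothetical and not part of the library's documented API).

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    rule: str
    enforcement: str  # "block", "warn", or "log"
    message: str


@dataclass
class Outcome:
    allowed: bool = True
    errors: list[str] = field(default_factory=list)     # returned to the agent, query rejected
    warnings: list[str] = field(default_factory=list)   # returned to the agent, query proceeds
    audit_log: list[str] = field(default_factory=list)  # recorded only, hidden from the agent


def apply_enforcement(violations: list[Violation]) -> Outcome:
    """Route each violation according to its enforcement level."""
    outcome = Outcome()
    for v in violations:
        if v.enforcement == "block":
            outcome.allowed = False
            outcome.errors.append(v.message)
        elif v.enforcement == "warn":
            outcome.warnings.append(v.message)
        elif v.enforcement == "log":
            outcome.audit_log.append(v.message)
        else:
            raise ValueError(f"unknown enforcement level: {v.enforcement!r}")
    return outcome
```

A single `block` violation is enough to reject the query, while any number of `warn` or `log` violations leave `allowed` set to `True`.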

Built-in checkers enforce:

Table allowlist — only tables listed in allowed_tables may be queried
Operation blocklist — forbidden_operations (DELETE, DROP, etc.) are rejected
Required filters — rules with filter_column require a matching WHERE clause
No SELECT * — queries must name explicit columns
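
The table allowlist, including the `["*"]` wildcard, can be thought of as a set of `(schema, table)` pairs. The sketch below assumes the wildcard is expanded from a mapping of tables discovered in the database; `build_allowlist`, `is_allowed`, and the `discovered` data are hypothetical illustrations, not the library's internals.

```python
def build_allowlist(allowed_tables: list[dict], discovered: dict[str, list[str]]) -> set:
    """Expand contract entries into a set of (schema, table) pairs.

    `discovered` maps schema -> table names found in the database and is
    consulted only for wildcard entries.
    """
    allowlist = set()
    for entry in allowed_tables:
        schema = entry["schema"].lower()
        if entry["tables"] == ["*"]:
            names = discovered.get(schema, [])
        else:
            names = entry["tables"]
        allowlist.update((schema, name.lower()) for name in names)
    return allowlist


def is_allowed(qualified_name: str, allowlist: set) -> bool:
    """Check a `schema.table` reference with a single set lookup."""
    schema, _, table = qualified_name.partition(".")
    return (schema.lower(), table.lower()) in allowlist


allowed_tables = [
    {"schema": "analytics", "tables": ["*"]},
    {"schema": "marketing", "tables": ["campaigns"]},
]
discovered = {"analytics": ["orders", "customers"], "marketing": ["campaigns", "leads"]}
allowlist = build_allowlist(allowed_tables, discovered)
```

Here `marketing.leads` exists in the database but is not allowed, because the `marketing` entry lists tables explicitly rather than using the wildcard.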

Semantic Sources

A semantic source provides metric and table schema metadata to the agent.

YAML (built-in):

# semantic.yml
metrics:
  - name: total_revenue
    description: "Total revenue from completed orders"
    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
    source_model: analytics.orders

tables:
  - schema: analytics
    table: orders
    columns:
      - name: id
        type: INTEGER
      - name: amount
        type: DECIMAL
      - name: tenant_id
        type: VARCHAR

dbt — point to a manifest.json:

semantic:
  source:
    type: dbt
    path: "./dbt/manifest.json"

Cube — point to a Cube schema file:

semantic:
  source:
    type: cube
    path: "./cube/schema.yml"

Table Relationships

Define join paths so the agent knows how to combine tables correctly:

# semantic.yml
relationships:
  - from: analytics.orders.customer_id
    to: analytics.customers.id
    type: many_to_one
  - from: analytics.orders.product_id
    to: analytics.products.id
    type: many_to_one

The agent sees these in its system prompt and uses them to write correct JOINs instead of guessing from column names.
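
To make the mapping from a relationship entry to SQL concrete, here is a small sketch that renders one entry as a JOIN fragment. It assumes `from` and `to` always have the three-part `schema.table.column` form shown above; `join_clause` is a hypothetical helper, not a library function.

```python
def join_clause(relationship: dict) -> str:
    """Turn one relationship entry into a JOIN ... ON ... fragment."""
    from_schema, from_table, from_col = relationship["from"].split(".")
    to_schema, to_table, to_col = relationship["to"].split(".")
    return (
        f"JOIN {to_schema}.{to_table} "
        f"ON {from_schema}.{from_table}.{from_col} = {to_schema}.{to_table}.{to_col}"
    )


relationship = {
    "from": "analytics.orders.customer_id",
    "to": "analytics.customers.id",
    "type": "many_to_one",
}
print(join_clause(relationship))
# JOIN analytics.customers ON analytics.orders.customer_id = analytics.customers.id
```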

Scalable Metric Discovery

For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:

semantic:
  domains:
    acquisition: [CAC, CPA, CPL, click_through_rate]
    retention: [churn_rate, LTV, retention_30d]
    attribution: [ROAS, first_touch_revenue]

The system prompt gets a compact index (names + descriptions grouped by domain). The agent uses lookup_metric for full SQL definitions — with fuzzy fallback when it doesn't know the exact name:

lookup_metric("CAC")                → exact match, full definition
lookup_metric("acquisition cost")   → fuzzy match, returns [CAC, CPA] as candidates
list_metrics(domain="retention")    → only retention metrics
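
The exact-then-fuzzy behaviour can be sketched with the standard library. Note the swap: this example uses `difflib.SequenceMatcher` rather than `thefuzz`, so scores and candidate lists will differ from the library's. The catalog, the `find_metric` name, and the 0.6 cutoff are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical metric catalog: name -> description.
METRICS = {
    "CAC": "Customer acquisition cost",
    "CPA": "Cost per acquisition",
    "churn_rate": "Share of customers lost in a period",
    "LTV": "Customer lifetime value",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_metric(query: str, cutoff: float = 0.6) -> dict:
    """Try an exact (case-insensitive) name match, then fall back to fuzzy candidates."""
    for name in METRICS:
        if name.lower() == query.lower():
            return {"match": "exact", "metrics": [name]}
    # Score each metric by its best match against either name or description.
    scored = [
        (max(similarity(query, name), similarity(query, desc)), name)
        for name, desc in METRICS.items()
    ]
    candidates = [name for score, name in sorted(scored, reverse=True) if score >= cutoff]
    return {"match": "fuzzy" if candidates else "none", "metrics": candidates}
```

Matching against descriptions as well as names is what lets a phrase like "customer acquisition cost" resolve to a short metric name such as `CAC`.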

Scaling to Large Organizations

Tested with 200+ tables, 300+ metrics, and 50+ relationships across multiple schemas.

Concern	How it scales
System prompt size	>20 metrics: auto-switches to compact domain counts (`acquisition (45)`) instead of listing every metric
Table discovery	`list_tables` is paginated (default 50, with offset). Use `schema` filter for targeted browsing
Wildcard schemas	`tables: ["*"]` discovers tables from the database. Resolution is cached — no repeated queries
Metric lookup	Fuzzy search via `thefuzz` (C++ backed) — sub-millisecond even with 1000+ metrics
SQL validation	Set-based allowlist check — O(1) per table reference regardless of allowlist size
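
Two of the rows above, the compact metric index and paginated table listing, can be sketched as follows. The function names are hypothetical, and the sketch assumes the 20-metric threshold applies to the total metric count across all domains, which the table does not spell out.

```python
def render_metric_index(domains: dict[str, list[str]], threshold: int = 20) -> str:
    """List every metric when the catalog is small, else only per-domain counts."""
    total = sum(len(names) for names in domains.values())
    if total > threshold:
        return "\n".join(f"{domain} ({len(names)})" for domain, names in domains.items())
    return "\n".join(f"{domain}: {', '.join(names)}" for domain, names in domains.items())


def paginate(items: list[str], offset: int = 0, limit: int = 50) -> dict:
    """Return one page plus the offset of the next page (None on the last page)."""
    page = items[offset : offset + limit]
    next_offset = offset + limit if offset + limit < len(items) else None
    return {"items": page, "next_offset": next_offset}


small = {"acquisition": ["CAC", "CPA"], "retention": ["LTV"]}
large = {"acquisition": [f"metric_{i}" for i in range(45)], "retention": ["LTV", "churn_rate", "retention_30d"]}
tables = [f"table_{i}" for i in range(120)]
```

Returning a `next_offset` lets the caller keep requesting pages until it receives `None`, without needing to know the total table count up front.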

Resource Limits

resources:
  cost_limit_usd: 5.00          # max estimated query cost
  max_retries: 3                 # max blocked queries per session
  token_budget: 50000            # max tokens consumed
  max_query_time_seconds: 30     # max wall-clock query time
  max_rows_scanned: 1000000      # max rows scanned, as estimated by EXPLAIN
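
A pre-execution check against these limits might look like the sketch below. It is an assumption-laden illustration: `ResourceLimits` and `over_limit` are hypothetical names, the estimates are taken as given (for example from an EXPLAIN), and only three of the five limits are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLimits:
    cost_limit_usd: float = 5.00
    max_retries: int = 3
    max_rows_scanned: int = 1_000_000


def over_limit(
    limits: ResourceLimits,
    estimated_cost_usd: float,
    estimated_rows: int,
    blocked_so_far: int,
) -> list[str]:
    """Return the reasons a query should not run; an empty list means it may proceed."""
    reasons = []
    if estimated_cost_usd > limits.cost_limit_usd:
        reasons.append(f"estimated cost ${estimated_cost_usd:.2f} exceeds ${limits.cost_limit_usd:.2f}")
    if estimated_rows > limits.max_rows_scanned:
        reasons.append(f"estimated rows {estimated_rows} exceeds {limits.max_rows_scanned}")
    if blocked_so_far >= limits.max_retries:
        reasons.append(f"session already has {blocked_so_far} blocked queries (max {limits.max_retries})")
    return reasons


limits = ResourceLimits()
```

Collecting every reason, rather than stopping at the first, gives the agent enough information to fix the query in one retry.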

Optional Dependencies

Extra	Package	Purpose
`duckdb`	`duckdb`	DuckDB adapter
`bigquery`	`google-cloud-bigquery`	BigQuery adapter
`snowflake`	`snowflake-connector-python`	Snowflake adapter
`postgres`	`psycopg2-binary`	PostgreSQL adapter
`agent-sdk`	`claude-agent-sdk`	Claude Agent SDK integration
`agent-contracts`	`ai-agent-contracts>=0.2.0`	ai-agent-contracts bridge

Example

See examples/revenue_agent/ for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.

uv run python examples/revenue_agent/setup_db.py
uv run python examples/revenue_agent/agent.py "What was Q1 revenue by region?"

Architecture

See docs/architecture.md for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional ai-agent-contracts bridge.

License

MIT

Release history

Version	Release date
0.9.2	Apr 15, 2026
0.9.1	Apr 13, 2026
0.9.0	Apr 13, 2026
0.8.0	Apr 12, 2026
0.7.1	Apr 11, 2026
0.7.0	Apr 11, 2026
0.6.0	Apr 9, 2026
0.5.0	Apr 4, 2026
0.4.0	Mar 31, 2026
0.3.0	Mar 30, 2026
0.2.6 (this version)	Mar 29, 2026
0.2.5	Mar 29, 2026
0.2.4	Mar 29, 2026
0.2.3	Mar 29, 2026
0.2.2	Mar 28, 2026
0.2.1	Mar 28, 2026
0.2.0	Mar 28, 2026
0.1.0	Mar 27, 2026

Download files

Source Distribution

agentic_data_contracts-0.2.6.tar.gz (200.4 kB), uploaded Mar 29, 2026, Source

Built Distribution

agentic_data_contracts-0.2.6-py3-none-any.whl (29.6 kB), uploaded Mar 29, 2026, Python 3

File details

Details for the file agentic_data_contracts-0.2.6.tar.gz.

File metadata

Download URL: agentic_data_contracts-0.2.6.tar.gz
Upload date: Mar 29, 2026
Size: 200.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentic_data_contracts-0.2.6.tar.gz
Algorithm	Hash digest
SHA256	`5fd701c67914eff78525909708444aed926e64862bd554203e9530755da763c8`
MD5	`7e65fa25f7703b9e23b062f715f7e338`
BLAKE2b-256	`ce79c24bdd2fe1aa20b6579bf45609981d6b5e72085a7a71ce2d3fa03ac59f3e`

Provenance

The following attestation bundles were made for agentic_data_contracts-0.2.6.tar.gz:

Publisher: ci.yml on flyersworder/agentic-data-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_data_contracts-0.2.6.tar.gz
- Subject digest: 5fd701c67914eff78525909708444aed926e64862bd554203e9530755da763c8
- Sigstore transparency entry: 1193743184
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: flyersworder/agentic-data-contracts@2bb09d23f574166ec2d62b0acd4d48cecab5c4fc
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/flyersworder
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@2bb09d23f574166ec2d62b0acd4d48cecab5c4fc
- Trigger Event: release

File details

Details for the file agentic_data_contracts-0.2.6-py3-none-any.whl.

File metadata

Download URL: agentic_data_contracts-0.2.6-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 29.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentic_data_contracts-0.2.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6fb77c1601e152415cc93c898ef36973b8fb2e2d7ac2def0724661bc1c5daa6c`
MD5	`a069e4913c10a33302fefe9c01e3bdeb`
BLAKE2b-256	`804ce89952ef7ed8667cb43ea755e3b49441b90c476844d9c267d02821dd9818`

Provenance

The following attestation bundles were made for agentic_data_contracts-0.2.6-py3-none-any.whl:

Publisher: ci.yml on flyersworder/agentic-data-contracts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_data_contracts-0.2.6-py3-none-any.whl
- Subject digest: 6fb77c1601e152415cc93c898ef36973b8fb2e2d7ac2def0724661bc1c5daa6c
- Sigstore transparency entry: 1193743198
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: flyersworder/agentic-data-contracts@2bb09d23f574166ec2d62b0acd4d48cecab5c4fc
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/flyersworder
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@2bb09d23f574166ec2d62b0acd4d48cecab5c4fc
- Trigger Event: release
