YAML-first data contract governance for AI agents
agentic-data-contracts
Stop your AI agents from running wild on your data.
agentic-data-contracts lets data engineers define governance contracts in YAML — what tables an agent may query, which operations are forbidden, what resource limits apply — and enforces them automatically at query time via SQL validation powered by sqlglot.
Why? AI agents querying databases face two problems: resource runaway (unbounded compute, endless retries, cost overruns) and semantic inconsistency (wrong tables, missing filters, ad-hoc metric definitions). This library addresses both with a single YAML contract.
Works with: Claude Agent SDK (primary target), or any Python agent framework. Optionally integrates with ai-agent-contracts for formal resource governance.
How It Works
Agent: "SELECT * FROM analytics.orders"
-> BLOCKED (no SELECT * — specify explicit columns)
Agent: "SELECT order_id, amount FROM analytics.orders"
-> BLOCKED (missing required filter: tenant_id)
Agent: "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"
-> PASSED + WARN (consider using semantic revenue definition)
Agent: "DELETE FROM analytics.orders WHERE id = 1"
-> BLOCKED (forbidden operation: DELETE)
The contract defines the rules. The library enforces them — before the query ever reaches the database.
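The four verdicts above can be reproduced with a deliberately small stand-in. This is a sketch under stated assumptions, not the library's validator: it uses standard-library regular expressions instead of sqlglot's parser, and the names `check_query`, `FORBIDDEN`, and `REQUIRED_FILTER` are hypothetical.

```python
import re

# Hypothetical rule set mirroring the example above; not the library's data model.
FORBIDDEN = {"DELETE", "DROP", "TRUNCATE", "UPDATE", "INSERT"}
REQUIRED_FILTER = "tenant_id"


def check_query(sql: str) -> tuple[str, str]:
    """Return (verdict, reason) for a single SQL statement.

    Regex matching is a rough stand-in for real SQL parsing and will
    misjudge comments, string literals, subqueries, and CTEs.
    """
    words = sql.strip().split(None, 1)
    operation = words[0].upper() if words else ""
    if operation in FORBIDDEN:
        return "BLOCKED", f"forbidden operation: {operation}"
    if re.search(r"\bSELECT\s+\*", sql, re.IGNORECASE):
        return "BLOCKED", "no SELECT *"
    # Everything after the first WHERE keyword is treated as the filter clause.
    where = re.search(r"\bWHERE\b(.*)", sql, re.IGNORECASE | re.DOTALL)
    if not where or not re.search(rf"\b{REQUIRED_FILTER}\b", where.group(1)):
        return "BLOCKED", f"missing required filter: {REQUIRED_FILTER}"
    return "PASSED", ""


for statement in (
    "SELECT * FROM analytics.orders",
    "SELECT order_id, amount FROM analytics.orders",
    "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'",
    "DELETE FROM analytics.orders WHERE id = 1",
):
    print(check_query(statement))
```

The checks run in a fixed order, so a query that breaks several rules reports only the first one; a parser-based validator could instead collect every violation.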
Installation
uv add agentic-data-contracts
# or
pip install agentic-data-contracts
With optional database adapters:
uv add "agentic-data-contracts[duckdb]" # DuckDB
uv add "agentic-data-contracts[bigquery]" # BigQuery
uv add "agentic-data-contracts[snowflake]" # Snowflake
uv add "agentic-data-contracts[postgres]" # PostgreSQL
uv add "agentic-data-contracts[agent-sdk]" # Claude Agent SDK integration
Quick Start
1. Write a YAML contract
# contract.yml
version: "1.0"
name: revenue-analysis

semantic:
  source:
    type: yaml
    path: "./semantic.yml"
  allowed_tables:
    - schema: analytics
      tables: [orders, customers, subscriptions]
  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
  rules:
    - name: tenant_isolation
      description: "All queries must filter by tenant_id"
      enforcement: block
      filter_column: tenant_id
    - name: no_select_star
      description: "Must specify explicit columns"
      enforcement: block

resources:
  cost_limit_usd: 5.00
  max_retries: 3
  token_budget: 50000

temporal:
  max_duration_seconds: 300
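The allowed_tables section pairs a schema with a list of tables. One way to picture the check it drives is to expand those entries into schema-qualified names and test membership. The sketch below assumes the section has already been parsed into plain Python lists and dicts; `qualified_names` and `is_allowed` are hypothetical helpers, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
# A literal standing in for the parsed `allowed_tables` section of contract.yml.
allowed_tables = [
    {"schema": "analytics", "tables": ["orders", "customers", "subscriptions"]},
]


def qualified_names(entries: list[dict]) -> set[str]:
    """Expand schema/table entries into lowercase 'schema.table' names."""
    return {
        f"{entry['schema']}.{table}".lower()
        for entry in entries
        for table in entry["tables"]
    }


def is_allowed(table: str, entries: list[dict]) -> bool:
    """Check a schema-qualified table name against the allowlist."""
    return table.lower() in qualified_names(entries)


print(sorted(qualified_names(allowed_tables)))
print(is_allowed("analytics.orders", allowed_tables))
print(is_allowed("finance.payroll", allowed_tables))
```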
2. Load the contract and create tools
from agentic_data_contracts import DataContract, create_tools
from agentic_data_contracts.adapters.duckdb import DuckDBAdapter
dc = DataContract.from_yaml("contract.yml")
adapter = DuckDBAdapter("analytics.duckdb")
# Semantic source is auto-loaded from contract config (source.type + source.path)
tools = create_tools(dc, adapter=adapter)
3. Use with the Claude Agent SDK
import asyncio

from claude_agent_sdk import (
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    create_sdk_mcp_server,
    query,
)

server = create_sdk_mcp_server(name="data-contracts", version="1.0.0", tools=tools)

# Contract limits map to SDK options (token_budget → task_budget, max_retries → max_turns)
sdk_config = dc.to_sdk_config()

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=f"You are a revenue analytics assistant.\n\n{dc.to_system_prompt()}",
    mcp_servers={"dc": server},
    allowed_tools=[f"mcp__dc__{t.name}" for t in tools],
    **sdk_config,
)


async def run(prompt: str) -> None:
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)


asyncio.run(run("What was total revenue by region in Q1 2025?"))
4. Or use the tools directly (no SDK required)
import asyncio


async def demo() -> None:
    # Validate a query without executing
    validate = next(t for t in tools if t.name == "validate_query")
    result = await validate.callable(
        {"sql": "SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'"}
    )
    print(result["content"][0]["text"])
    # VALID — Query passed all checks.

    # Blocked query
    result = await validate.callable({"sql": "SELECT * FROM analytics.orders"})
    print(result["content"][0]["text"])
    # BLOCKED — Violations:
    # - SELECT * is not allowed — specify explicit columns


asyncio.run(demo())
The 10 Tools
| Tool | Description |
|---|---|
| list_schemas | List all allowed database schemas from the contract |
| list_tables | List allowed tables, optionally filtered by schema |
| describe_table | Get full column details for an allowed table |
| preview_table | Preview sample rows from an allowed table |
| list_metrics | List metric definitions, optionally filtered by domain |
| lookup_metric | Get a metric definition; fuzzy search fallback when no exact match |
| validate_query | Validate a SQL query against contract rules without executing |
| query_cost_estimate | Estimate cost and row count via EXPLAIN |
| run_query | Validate and execute a SQL query, returning results |
| get_contract_info | Get the full contract: rules, limits, and session status |
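Because each tool exposes a name and an async callable, a name-keyed lookup is a convenient way to invoke them outside the SDK. The sketch below is self-contained, so it uses stub objects in place of what create_tools returns; `fake_validate`, `by_name`, and `call_tool` are hypothetical, and the stub's verdict logic is a placeholder rather than real validation.

```python
import asyncio
from types import SimpleNamespace


async def fake_validate(args: dict) -> dict:
    # Stand-in handler that mimics the response shape used in the Quick Start.
    verdict = "BLOCKED" if "*" in args["sql"] else "VALID"
    return {"content": [{"text": verdict}]}


# Stub tool objects; real ones would come from create_tools(...).
tools = [SimpleNamespace(name="validate_query", callable=fake_validate)]
by_name = {tool.name: tool for tool in tools}


async def call_tool(name: str, args: dict) -> str:
    """Invoke a tool by name and return the text of its first content block."""
    result = await by_name[name].callable(args)
    return result["content"][0]["text"]


print(asyncio.run(call_tool("validate_query", {"sql": "SELECT id FROM analytics.orders"})))
```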
Contract Rules
Rules are enforced at three levels:
- block: query is rejected and an error is returned to the agent
- warn: query proceeds but a warning is included in the response
- log: violation is recorded but not surfaced to the agent
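The three levels differ in two respects: whether the query runs, and whether the agent hears about the violation. A minimal sketch of that decision follows, assuming violations arrive as simple records; `Violation`, `Outcome`, and `resolve` are hypothetical names, not the library's types.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    rule: str
    enforcement: str  # "block", "warn", or "log"
    message: str


@dataclass
class Outcome:
    allowed: bool
    agent_messages: list[str] = field(default_factory=list)
    audit_log: list[str] = field(default_factory=list)


def resolve(violations: list[Violation]) -> Outcome:
    """Fold violations into one outcome according to their enforcement level."""
    outcome = Outcome(allowed=True)
    for v in violations:
        # Every violation is recorded, whatever its level.
        outcome.audit_log.append(f"{v.enforcement}:{v.rule}")
        if v.enforcement == "block":
            outcome.allowed = False
            outcome.agent_messages.append(f"BLOCKED: {v.message}")
        elif v.enforcement == "warn":
            outcome.agent_messages.append(f"WARN: {v.message}")
        # "log" (and any unrecognised level) is recorded but never shown to the agent.
    return outcome


result = resolve([
    Violation("semantic_revenue", "warn", "consider using semantic revenue definition"),
    Violation("audit_only", "log", "query touched a monitored table"),
])
print(result.allowed, result.agent_messages)
```

A single block-level violation is enough to reject the query in this sketch, regardless of how many warn or log violations accompany it.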
Built-in checkers enforce:
- Table allowlist: only tables listed in allowed_tables may be queried
- Operation blocklist: forbidden_operations (DELETE, DROP, etc.) are rejected
- Required filters: rules with filter_column require a matching WHERE clause
- No SELECT *: queries must name explicit columns
Semantic Sources
A semantic source provides metric and table schema metadata to the agent.
YAML (built-in):
# semantic.yml
metrics:
  - name: total_revenue
    description: "Total revenue from completed orders"
    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
    source_model: analytics.orders

tables:
  - schema: analytics
    table: orders
    columns:
      - name: id
        type: INTEGER
      - name: amount
        type: DECIMAL
      - name: tenant_id
        type: VARCHAR
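A metric entry carries enough information to assemble a contract-compliant query: sql_expression supplies the aggregate and source_model the table. The sketch below shows one way an agent or helper might combine them, assuming the entry has been parsed into a dict. `metric_query` is a hypothetical helper, and the `?` placeholder style depends on the database driver in use.

```python
# The total_revenue entry from semantic.yml, as a parsed dict.
metric = {
    "name": "total_revenue",
    "sql_expression": "SUM(amount) FILTER (WHERE status = 'completed')",
    "source_model": "analytics.orders",
}


def metric_query(metric: dict, filter_column: str = "tenant_id") -> str:
    """Render a parameterised SELECT for one metric definition.

    The '?' placeholder leaves the filter value to the database driver
    instead of interpolating it into the SQL text.
    """
    return (
        f"SELECT {metric['sql_expression']} AS {metric['name']} "
        f"FROM {metric['source_model']} "
        f"WHERE {filter_column} = ?"
    )


print(metric_query(metric))
```

The rendered statement names its columns explicitly and filters on tenant_id, so it would satisfy both rules in the Quick Start contract.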
dbt — point to a manifest.json:
semantic:
  source:
    type: dbt
    path: "./dbt/manifest.json"
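For orientation, a dbt manifest stores each model under a nodes mapping, with its schema, name, and a columns mapping. The sketch below pulls table and column metadata out of a heavily trimmed, in-memory manifest. It illustrates the file's shape only and makes no claim about how this library reads it; `table_columns` is a hypothetical helper, and using name rather than alias as the table name is an assumption.

```python
import json

# A heavily trimmed stand-in for dbt's manifest.json; real manifests carry
# many more keys per node.
MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.orders": {
      "resource_type": "model",
      "schema": "analytics",
      "name": "orders",
      "columns": {
        "id": {"name": "id", "data_type": "INTEGER"},
        "amount": {"name": "amount", "data_type": "DECIMAL"}
      }
    }
  }
}
""")


def table_columns(manifest: dict) -> dict[str, dict[str, str]]:
    """Map 'schema.table' to {column: data_type} for every model node."""
    tables = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and other node types
        key = f"{node['schema']}.{node['name']}"
        tables[key] = {
            col_name: col.get("data_type") or "UNKNOWN"
            for col_name, col in node.get("columns", {}).items()
        }
    return tables


print(table_columns(MANIFEST))
```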
Cube — point to a Cube schema file:
semantic:
  source:
    type: cube
    path: "./cube/schema.yml"
Scalable Metric Discovery
For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:
semantic:
  domains:
    acquisition: [CAC, CPA, CPL, click_through_rate]
    retention: [churn_rate, LTV, retention_30d]
    attribution: [ROAS, first_touch_revenue]
The system prompt gets a compact index (names + descriptions grouped by domain). The agent uses lookup_metric for full SQL definitions — with fuzzy fallback when it doesn't know the exact name:
lookup_metric("CAC") → exact match, full definition
lookup_metric("acquisition cost") → fuzzy match, returns [CAC, CPA] as candidates
list_metrics(domain="retention") → only retention metrics
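The exact-then-fuzzy behavior can be approximated with token overlap between the query and each metric's name and description. The sketch below is one possible matching scheme, not the library's algorithm; `lookup_metric_sketch` is a hypothetical name, and the metric descriptions are invented for illustration because the source lists the metrics without defining them.

```python
import re

# Hypothetical descriptions, made up for this example.
METRICS = {
    "CAC": "Customer acquisition cost",
    "CPA": "Cost per acquisition",
    "churn_rate": "Share of customers lost in a period",
}


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lookup_metric_sketch(query: str, limit: int = 3) -> dict:
    """Return an exact name match if one exists, else ranked candidates."""
    for name, description in METRICS.items():
        if name.lower() == query.strip().lower():
            return {"match": name, "description": description}
    wanted = _tokens(query)
    scored = []
    for name, description in METRICS.items():
        overlap = len(wanted & (_tokens(name) | _tokens(description)))
        if overlap:
            # Negative overlap sorts best matches first; ties break by name.
            scored.append((-overlap, name))
    return {"candidates": [name for _, name in sorted(scored)[:limit]]}


print(lookup_metric_sketch("CAC"))
print(lookup_metric_sketch("acquisition cost"))
```

Returning candidates instead of guessing a single match leaves the final choice to the agent, which can then request the full definition by exact name.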
Resource Limits
resources:
  cost_limit_usd: 5.00          # max estimated query cost
  max_retries: 3                # max blocked queries per session
  token_budget: 50000           # max tokens consumed
  max_query_time_seconds: 30    # max wall-clock query time
  max_rows_scanned: 1000000     # max rows an EXPLAIN may estimate
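Two of these limits are easy to picture as session state: a per-query cost ceiling and a counter of blocked queries. The sketch below assumes that reading of the comments above; `SessionBudget` and its methods are hypothetical and do not describe the library's internals.

```python
from dataclasses import dataclass


@dataclass
class SessionBudget:
    cost_limit_usd: float = 5.00
    max_retries: int = 3
    blocked_queries: int = 0

    def check_cost(self, estimated_usd: float) -> bool:
        """A query passes only if its estimated cost is within the limit."""
        return estimated_usd <= self.cost_limit_usd

    def record_blocked(self) -> bool:
        """Count a blocked query; return False once the retry budget is spent."""
        self.blocked_queries += 1
        return self.blocked_queries <= self.max_retries


budget = SessionBudget()
print(budget.check_cost(1.25))   # within the 5.00 limit
print(budget.check_cost(7.50))   # over the limit
print([budget.record_blocked() for _ in range(4)])
```

With max_retries set to 3, the first three blocked queries are tolerated and the fourth signals that the session should stop.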
Optional Dependencies
| Extra | Package | Purpose |
|---|---|---|
| duckdb | duckdb | DuckDB adapter |
| bigquery | google-cloud-bigquery | BigQuery adapter |
| snowflake | snowflake-connector-python | Snowflake adapter |
| postgres | psycopg2-binary | PostgreSQL adapter |
| agent-sdk | claude-agent-sdk | Claude Agent SDK integration |
| agent-contracts | ai-agent-contracts>=0.2.0 | ai-agent-contracts bridge |
Example
See examples/revenue_agent/ for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.
uv run python examples/revenue_agent/setup_db.py
uv run python examples/revenue_agent/agent.py "What was Q1 revenue by region?"
Architecture
See docs/architecture.md for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional ai-agent-contracts bridge.
License
MIT