pydantic-ai-provenance
Provenance tracking and citation verification for pydantic-ai agents.
Attach ProvenanceCapability to any pydantic-ai agent and get:
- A full execution DAG: every tool call, model request, and response linked in a directed acyclic graph.
- Automatic citation keys (`d_1`, `d_2`, `a_1`, …) injected into source tool results so the LLM can cite them inline.
- Multi-agent attribution: subagent outputs propagate through a shared store via `contextvars`, enabling transitive citation resolution across agent boundaries.
- Citation verification: TF-IDF cosine overlap (Step 2) and optional LLM entailment (Step 3) to validate every `[REF|…]` tag in the final output.
- Graph visualisation: export as Mermaid, GraphViz DOT, or JSON.
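
To make the citation-key idea concrete, here is a minimal sketch of how sequential keys could be handed out and attached to tool results. It is not the library's implementation: `KeyAllocator`, the `[source key: …]` tag layout, and the reading of `d` as "data source" and `a` as "agent output" are all assumptions made for illustration.

```python
from collections import defaultdict


class KeyAllocator:
    """Hands out sequential citation keys such as d_1, d_2, a_1 (hypothetical helper)."""

    def __init__(self) -> None:
        self._counters: dict[str, int] = defaultdict(int)
        self.sources: dict[str, str] = {}  # citation key -> original source text

    def register(self, text: str, prefix: str = "d") -> str:
        """Store `text` under a fresh key and return the text tagged with that key."""
        self._counters[prefix] += 1
        key = f"{prefix}_{self._counters[prefix]}"
        self.sources[key] = text
        # The tag layout below is an assumption made for this sketch.
        return f"[source key: {key}]\n{text}"


allocator = KeyAllocator()
first = allocator.register("Revenue grew 12% YoY.")
second = allocator.register("Costs were flat.")
agent_out = allocator.register("Summary from a subagent.", prefix="a")
print(list(allocator.sources))  # ['d_1', 'd_2', 'a_1']
```

Keeping the key-to-text mapping alongside the counter is what later allows a cited key to be resolved back to the source it came from.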
Installation

    pip install pydantic-ai-provenance

Or with uv:

    uv add pydantic-ai-provenance

Install directly from GitHub (latest development version):

    pip install git+https://github.com/dugarsumit/pydantic-ai-provenance.git
    uv add git+https://github.com/dugarsumit/pydantic-ai-provenance

Requirements: Python ≥ 3.12, pydantic-ai ≥ 1.80.
Quick start

    import asyncio
    from pathlib import Path

    from pydantic_ai import Agent
    from pydantic_ai_provenance.attribution import attribute_output
    from pydantic_ai_provenance.capability import ProvenanceCapability

    provenance = ProvenanceCapability(
        agent_name="summariser",
        source_tools=["read_file"],  # tools whose results are raw data sources
    )

    agent = Agent(
        "anthropic:claude-sonnet-4-6",
        capabilities=[provenance],
        system_prompt="Summarise the content of files.",
    )

    @agent.tool_plain
    def read_file(path: str) -> str:
        """Return the text content of the file at `path`."""
        return Path(path).read_text()  # closes the file handle, unlike a bare open()

    async def main():
        result = await agent.run("Read report.txt and summarise it.")
        store = provenance.store

        # Path-level attribution
        print(attribute_output(store).summary())

        # Mermaid diagram
        print(store.to_mermaid())

        # Citation verification (Steps 1 + 2, no extra API calls)
        report = await provenance.verify(result.output)
        print(report.text_with_verified_citations)

    asyncio.run(main())

Citation format
The LLM is instructed to emit [REF|key] tags immediately after any claim derived from a source:
The report states revenue grew 12% YoY. [REF|d_1]
Multi-source claims use pipe-separated keys:
Both documents confirm the finding. [REF|d_1|d_2]
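
To illustrate the tag format, the sketch below pulls the keys out of each tag with a regular expression. It is a stand-alone illustration rather than the library's `parse_citations`; the helper name `extract_keys` and the assumption that keys consist only of word characters are mine.

```python
import re

# Matches [REF|key] and [REF|key1|key2|...]; keys are assumed to be word characters.
REF_TAG = re.compile(r"\[REF((?:\|\w+)+)\]")


def extract_keys(text: str) -> list[list[str]]:
    """Return the list of keys for each [REF|...] tag, in order of appearance."""
    return [m.group(1).strip("|").split("|") for m in REF_TAG.finditer(text)]


sample = (
    "The report states revenue grew 12% YoY. [REF|d_1] "
    "Both documents confirm the finding. [REF|d_1|d_2]"
)
print(extract_keys(sample))  # [['d_1'], ['d_1', 'd_2']]
```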
Multi-agent usage
Give the coordinator and each subagent its own ProvenanceCapability; the capabilities share a single store automatically:

    import asyncio

    from pydantic_ai import Agent, RunContext
    from pydantic_ai_provenance.capability import ProvenanceCapability

    research_cap = ProvenanceCapability(agent_name="researcher", source_tools=["fetch_url"])
    coord_cap = ProvenanceCapability(agent_name="coordinator")

    research_agent = Agent("anthropic:claude-haiku-4-5-20251001", capabilities=[research_cap])
    coord_agent = Agent("anthropic:claude-sonnet-4-6", capabilities=[coord_cap])

    @research_agent.tool_plain
    def fetch_url(url: str) -> str: ...  # placeholder: supply your own fetching logic

    @coord_agent.tool
    async def delegate(ctx: RunContext[None], topic: str) -> str:
        # Run the research subagent and hand its output back to the coordinator
        result = await research_agent.run(f"Research: {topic}", usage=ctx.usage)
        return result.output

    async def main():
        result = await coord_agent.run("Summarise pydantic-ai.")
        # Both agents share the same store automatically via contextvars
        store = coord_cap.store
        print(store.to_mermaid())

    asyncio.run(main())

API reference
Core
| Symbol | Description |
|---|---|
| `ProvenanceCapability` | pydantic-ai `AbstractCapability` that hooks into the agent lifecycle |
| `ProvenanceStore` | Central registry: graph + citation key → node mapping |
Graph primitives
| Symbol | Description |
|---|---|
| `ProvenanceGraph` | DAG container with path traversal helpers |
| `ProvenanceNode` | Single execution step (id, type, label, data, timestamp) |
| `ProvenanceEdge` | Directed edge with optional label |
| `NodeType` | Enum: `INPUT`, `DATA_READ`, `TOOL_CALL`, `TOOL_RESULT`, `MODEL_REQUEST`, `MODEL_RESPONSE`, `AGENT_RUN`, `FINAL_OUTPUT` |
Attribution
| Symbol | Description |
|---|---|
| `attribute_output(store, output_node_id=None)` | Full path attribution for one `FINAL_OUTPUT` node |
| `attribute_all_outputs(store)` | Attribution for every `FINAL_OUTPUT` |
| `AttributionResult` | `.sources`, `.paths`, `.summary()` |
| `AttributionPath` | Single source-to-output path with `.hop_count` |
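
As a rough picture of what path attribution involves, the sketch below enumerates every source-to-output path in a small DAG and counts hops. `TinyGraph` and the node ids are hypothetical and do not mirror `ProvenanceGraph`; reading a hop count as "number of edges on the path" is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TinyGraph:
    """Minimal DAG as an adjacency list keyed by node id (illustrative only)."""

    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, []).append(dst)
        self.edges.setdefault(dst, [])

    def paths(self, src: str, dst: str) -> list[list[str]]:
        """Enumerate every path from src to dst with a depth-first search."""
        if src == dst:
            return [[dst]]
        return [[src, *rest] for nxt in self.edges[src] for rest in self.paths(nxt, dst)]


g = TinyGraph()
g.add_edge("data_read:report.txt", "tool_result:read_file")
g.add_edge("tool_result:read_file", "model_request")
g.add_edge("model_request", "model_response")
g.add_edge("model_response", "final_output")
g.add_edge("input:prompt", "model_request")

for source in ("data_read:report.txt", "input:prompt"):
    for path in g.paths(source, "final_output"):
        hop_count = len(path) - 1  # edges traversed along this path
        print(source, "->", hop_count, "hops")
```

The recursion terminates because the graph is acyclic; a graph with cycles would need a visited set.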
Citations
| Symbol | Description |
|---|---|
| `parse_citations(text)` | Extract all `[REF\|…]` tags |
| `citation_tag_spans(text)` | Same but with `(start, end, CitationRef)` positions |
| `strip_inline_citation_tags(text)` | Remove all `[REF\|…]` tags |
| `strip_inline_citation_tags_preserve_leading_ref_header(text)` | Strip body tags but keep an opening block header |
Verification
| Symbol | Description |
|---|---|
| `await verify_citations(text, store)` | Steps 1 (key sanitisation) + 2 (TF-IDF overlap) |
| `strip_unresolvable_citation_keys(text, store)` | Step 1 only: remove keys not in the store |
| `claim_source_tfidf_cosine(claim, source)` | Max cosine similarity over sliding source windows |
| `entailment_agent(model)` | Build a pydantic-ai Step 3 entailment judge |
| `refine_claim_source_similarities(records)` | Narrow results by top-N and min-score filters |
| `CitationVerificationReport` | `.original_text`, `.text_with_verified_citations`, `.claim_source_similarities` |
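
For intuition about the Step 2 check, here is a dependency-free sketch of "max TF-IDF cosine over sliding source windows". It is not the library's `claim_source_tfidf_cosine`: the tokeniser, the smoothed IDF formula, and the `window` and `stride` defaults are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def max_window_similarity(claim: str, source: str, window: int = 12, stride: int = 6) -> float:
    """Max TF-IDF cosine between the claim and each sliding window of source tokens."""
    src = tokens(source)
    last = max(len(src) - window, 0)
    # Regular strides plus a final window so the tail of the source is always covered.
    starts = sorted(set(range(0, last + 1, stride)) | {last})
    docs = [tokens(claim)] + [src[i : i + window] for i in starts]
    # Smoothed inverse document frequency over the claim plus all windows.
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + len(docs)) / (1 + n)) + 1.0 for t, n in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return max((cosine(vecs[0], v) for v in vecs[1:]), default=0.0)


source = (
    "The annual report shows that revenue grew 12% year over year, "
    "driven by cloud sales. Headcount stayed flat."
)
supported = max_window_similarity("Revenue grew 12% year over year.", source)
unrelated = max_window_similarity("The office moved to Berlin.", source)
print(supported > unrelated)  # True
```

Scoring against windows rather than the whole source keeps a short claim from being diluted by a long document; the trade-off is that lexical overlap cannot detect paraphrase, which is where an entailment judge such as Step 3 comes in.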
Visualisation
| Symbol | Description |
|---|---|
| `store.to_html(title="Provenance Graph")` | Self-contained interactive HTML page (Cytoscape.js) |
| `store.open_in_browser(title="Provenance Graph")` | Write HTML to a temp file and open it in the default browser |
| `store.to_mermaid()` | Mermaid flowchart string |
| `store.to_dot(graph_name="provenance")` | GraphViz DOT string |
| `store.to_json()` | `dict` with `nodes` and `edges` lists |
| `store.to_json_str(indent=2)` | JSON string |
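
To show what a Mermaid export amounts to, the sketch below renders a node map and an edge list as flowchart text. The function and its inputs are hypothetical and make no claim about the exact output of `store.to_mermaid()`; labels containing double quotes would need escaping, which this sketch skips.

```python
def to_mermaid(nodes: dict[str, str], edges: list[tuple[str, str]]) -> str:
    """Render node labels and directed edges as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)


nodes = {"n1": "read_file result", "n2": "model response", "n3": "final output"}
edges = [("n1", "n2"), ("n2", "n3")]
diagram = to_mermaid(nodes, edges)
print(diagram)
```

The resulting string can be pasted into any Mermaid renderer, for example a fenced `mermaid` block in a Markdown file.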
Running the examples

    # Offline citation verification (no API keys required)
    uv run python examples/verify_citations.py

    # Single-agent example
    ANTHROPIC_API_KEY=... uv run python examples/single_agent.py
    # or Azure OpenAI:
    AZURE_OPENAI_ENDPOINT=https://... AZURE_OPENAI_API_KEY=... uv run python examples/single_agent.py

    # Multi-agent example (opens interactive provenance graph in browser after the run)
    ANTHROPIC_API_KEY=... uv run python examples/multi_agent.py

Development

    git clone https://github.com/dugarsumit/pydantic-ai-provenance.git
    cd pydantic-ai-provenance
    uv sync --extra dev
    uv run pytest
    uv run ruff check .

See CONTRIBUTING.md for the full contributing guide.
License