Provenance tracking and citation verification for pydantic-ai agents

pydantic-ai-provenance

Attach ProvenanceCapability to any pydantic-ai agent and get:

  • A full execution DAG: every tool call, model request, and response is linked in a directed acyclic graph.
  • Automatic citation keys (d_1, d_2, a_1, …) injected into source tool results so the LLM can cite them inline.
  • Multi-agent attribution: subagent outputs propagate through a shared store via contextvars, enabling transitive citation resolution across agent boundaries.
  • Citation verification: key sanitisation (Step 1), TF-IDF cosine overlap (Step 2), and optional LLM entailment (Step 3) validate every [REF|…] tag in the final output.
  • Graph visualisation: export as Mermaid, GraphViz DOT, JSON, or interactive HTML.

Installation

pip install pydantic-ai-provenance

Or with uv:

uv add pydantic-ai-provenance

Install directly from GitHub (latest development version):

pip install git+https://github.com/dugarsumit/pydantic-ai-provenance.git
uv add git+https://github.com/dugarsumit/pydantic-ai-provenance

Requirements: Python ≥ 3.12, pydantic-ai ≥ 1.80.


Quick start

import asyncio
from pydantic_ai import Agent
from pydantic_ai_provenance.capability import ProvenanceCapability
from pydantic_ai_provenance.attribution import attribute_output

provenance = ProvenanceCapability(
    agent_name="summariser",
    source_tools=["read_file"],   # tools whose results are raw data sources
)

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    capabilities=[provenance],
    system_prompt="Summarise the content of files.",
)

@agent.tool_plain
def read_file(path: str) -> str:
    # Use a context manager so the file handle is closed promptly
    with open(path, encoding="utf-8") as f:
        return f.read()

async def main():
    result = await agent.run("Read report.txt and summarise it.")

    store = provenance.store

    # Path-level attribution
    print(attribute_output(store).summary())

    # Mermaid diagram
    print(store.to_mermaid())

    # Citation verification (Steps 1 + 2, no extra API calls)
    report = await provenance.verify(result.output)
    print(report.text_with_verified_citations)

asyncio.run(main())

Citation format

The LLM is instructed to emit [REF|key] tags immediately after any claim derived from a source:

The report states revenue grew 12% YoY. [REF|d_1]

Multi-source claims use pipe-separated keys:

Both documents confirm the finding. [REF|d_1|d_2]
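
To make the tag grammar concrete, here is a minimal sketch of how such tags could be located in a piece of text using only the standard library. It is not the package's parser: the regular expression, the ParsedRef class, and the find_ref_tags function are hypothetical names introduced for illustration, and the assumption that keys consist of letters, digits, and underscores is inferred from the examples above.

```python
import re
from dataclasses import dataclass

# Matches [REF|key] and [REF|key1|key2|...].
# Assumption: keys contain only letters, digits, and underscores.
_REF_TAG = re.compile(r"\[REF\|([A-Za-z0-9_]+(?:\|[A-Za-z0-9_]+)*)\]")


@dataclass(frozen=True)
class ParsedRef:
    """Hypothetical container for one tag; not the library's CitationRef."""
    keys: tuple[str, ...]
    start: int
    end: int


def find_ref_tags(text: str) -> list[ParsedRef]:
    """Return every [REF|...] tag with its keys and character span."""
    return [
        ParsedRef(tuple(m.group(1).split("|")), m.start(), m.end())
        for m in _REF_TAG.finditer(text)
    ]


sample = "Revenue grew 12% YoY. [REF|d_1] Both documents agree. [REF|d_1|d_2]"
for ref in find_ref_tags(sample):
    print(ref.keys, sample[ref.start:ref.end])
```

Keeping the character span alongside the keys makes it possible to rewrite or remove a single tag later without re-scanning the text.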

Multi-agent usage

Give each agent its own ProvenanceCapability. When a subagent runs inside a coordinator's tool call, both capabilities write to the same store, which is propagated via contextvars:

import asyncio

from pydantic_ai import Agent, RunContext
from pydantic_ai_provenance.capability import ProvenanceCapability

research_cap = ProvenanceCapability(agent_name="researcher", source_tools=["fetch_url"])
coord_cap    = ProvenanceCapability(agent_name="coordinator")

research_agent = Agent("anthropic:claude-haiku-4-5-20251001", capabilities=[research_cap])
coord_agent    = Agent("anthropic:claude-sonnet-4-6",          capabilities=[coord_cap])

@research_agent.tool_plain
def fetch_url(url: str) -> str: ...  # placeholder: return the page text

@coord_agent.tool
async def delegate(ctx: RunContext[None], topic: str) -> str:
    # Run the subagent inside the coordinator's tool call; passing usage aggregates token counts
    result = await research_agent.run(f"Research: {topic}", usage=ctx.usage)
    return result.output

async def main():
    result = await coord_agent.run("Summarise pydantic-ai.")
    # Both agents share the same store automatically via contextvars
    store = coord_cap.store
    print(store.to_mermaid())

asyncio.run(main())

API reference

Core

Symbol                Description
ProvenanceCapability  pydantic-ai AbstractCapability that hooks into the agent lifecycle
ProvenanceStore       Central registry: graph + citation key → node mapping

Graph primitives

Symbol           Description
ProvenanceGraph  DAG container with path traversal helpers
ProvenanceNode   Single execution step (id, type, label, data, timestamp)
ProvenanceEdge   Directed edge with optional label
NodeType         Enum: INPUT, DATA_READ, TOOL_CALL, TOOL_RESULT, MODEL_REQUEST, MODEL_RESPONSE, AGENT_RUN, FINAL_OUTPUT

Attribution

Symbol                                        Description
attribute_output(store, output_node_id=None)  Full path attribution for one FINAL_OUTPUT node
attribute_all_outputs(store)                  Attribution for every FINAL_OUTPUT node
AttributionResult                             .sources, .paths, .summary()
AttributionPath                               Single source-to-output path with .hop_count
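
Path attribution amounts to enumerating the directed paths that lead from a source node to an output node. The sketch below shows that idea on a plain edge list. It does not use the package's graph classes: the source_to_output_paths function and the node ids are made up for illustration, and the hop count is taken to be the number of edges on a path, which is an assumption about what .hop_count means.

```python
from collections import defaultdict


def source_to_output_paths(
    edges: list[tuple[str, str]], sources: set[str], output: str
) -> list[list[str]]:
    """Enumerate every directed path from a source node to the output node."""
    children: dict[str, list[str]] = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    paths: list[list[str]] = []

    def walk(node: str, trail: list[str]) -> None:
        trail = trail + [node]
        if node == output:
            paths.append(trail)
            return
        for nxt in children.get(node, []):
            walk(nxt, trail)  # terminates because the graph is acyclic

    for src in sorted(sources):
        walk(src, [])
    return paths


# Hypothetical node ids for a run that reads two files.
edges = [
    ("read:a.txt", "model_request"),
    ("read:b.txt", "model_request"),
    ("model_request", "model_response"),
    ("model_response", "final_output"),
]
for path in source_to_output_paths(edges, {"read:a.txt", "read:b.txt"}, "final_output"):
    print(len(path) - 1, "hops:", " -> ".join(path))
```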

Citations

Symbol                                                        Description
parse_citations(text)                                         Extract all [REF|…] tags
citation_tag_spans(text)                                      Same, but with (start, end, CitationRef) positions
strip_inline_citation_tags(text)                              Remove all [REF|…] tags
strip_inline_citation_tags_preserve_leading_ref_header(text)  Strip body tags but keep an opening block header
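
As a rough illustration of what stripping tags involves, the sketch below rewrites each tag so that only keys from a known set survive, and drops a tag entirely when none of its keys are known. The keep_known_keys function and its regular expression are hypothetical and are not the package's implementation. Passing an empty set removes every tag.

```python
import re

# An optional single leading space is captured so it can be dropped with the tag.
_REF_TAG = re.compile(r"( ?)\[REF\|([A-Za-z0-9_|]+)\]")


def keep_known_keys(text: str, known: set[str]) -> str:
    """Drop keys not in `known`; remove a tag entirely if no key remains."""

    def rewrite(m: re.Match[str]) -> str:
        kept = [k for k in m.group(2).split("|") if k in known]
        if not kept:
            return ""
        return m.group(1) + "[REF|" + "|".join(kept) + "]"

    return _REF_TAG.sub(rewrite, text)


text = "Revenue grew 12%. [REF|d_1|x_9] Costs fell. [REF|x_9] Margins held."
print(keep_known_keys(text, {"d_1"}))
print(keep_known_keys(text, set()))
```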

Verification

Symbol                                         Description
await verify_citations(text, store)            Steps 1 (key sanitisation) + 2 (TF-IDF overlap)
strip_unresolvable_citation_keys(text, store)  Step 1 only: remove keys not in the store
claim_source_tfidf_cosine(claim, source)       Max cosine similarity over sliding source windows
entailment_agent(model)                        Build a pydantic-ai Step 3 entailment judge
refine_claim_source_similarities(records)      Narrow results by top-N and min-score filters
CitationVerificationReport                     .original_text, .text_with_verified_citations, .claim_source_similarities
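
For readers unfamiliar with the Step 2 idea, the following is a minimal standard-library sketch of "max TF-IDF cosine similarity over sliding source windows". It is not the package's code. The tokeniser, the window and stride sizes, the smoothed IDF formula, and the name max_window_similarity are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens are enough for a sketch.
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def max_window_similarity(claim: str, source: str, window: int = 20, stride: int = 10) -> float:
    """Max TF-IDF cosine similarity between the claim and any source window."""
    claim_toks = _tokens(claim)
    src_toks = _tokens(source)
    if not claim_toks or not src_toks:
        return 0.0
    # Overlapping windows over the source tokens; the last ones may be shorter.
    docs = [src_toks[s:s + window] for s in range(0, len(src_toks), stride)]
    docs.append(claim_toks)

    # Smoothed inverse document frequency over the windows plus the claim.
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(toks: list[str]) -> dict[str, float]:
        return {t: tf * idf[t] for t, tf in Counter(toks).items()}

    claim_vec = vec(claim_toks)
    return max(_cosine(claim_vec, vec(d)) for d in docs[:-1])


claim = "revenue grew 12 percent"
print(max_window_similarity(claim, "The annual report says revenue grew 12 percent year over year."))
print(max_window_similarity(claim, "Cats sleep most of the day."))
```

Scoring against windows rather than the whole source keeps a short claim from being diluted by a long document. A score like this measures lexical overlap only, which is presumably why an entailment judge is offered as an optional Step 3.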

Visualisation

Symbol                                           Description
store.to_html(title="Provenance Graph")          Self-contained interactive HTML page (Cytoscape.js)
store.open_in_browser(title="Provenance Graph")  Write HTML to a temp file and open it in the default browser
store.to_mermaid()                               Mermaid flowchart string
store.to_dot(graph_name="provenance")            GraphViz DOT string
store.to_json()                                  dict with nodes and edges lists
store.to_json_str(indent=2)                      JSON string
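
To show how a nodes-and-edges dictionary maps onto a Mermaid flowchart, here is a small standalone sketch. The field names id, label, source, and target are assumptions made for the example, as is the graph_to_mermaid function. The actual shape returned by store.to_json() may differ.

```python
def graph_to_mermaid(graph: dict) -> str:
    """Render {"nodes": [...], "edges": [...]} as a Mermaid flowchart."""
    lines = ["flowchart TD"]
    for node in graph["nodes"]:
        # Quote labels so spaces and punctuation survive; swap inner double quotes.
        label = node["label"].replace('"', "'")
        lines.append(f'    {node["id"]}["{label}"]')
    for edge in graph["edges"]:
        lines.append(f'    {edge["source"]} --> {edge["target"]}')
    return "\n".join(lines)


# Hypothetical graph with assumed field names.
graph = {
    "nodes": [
        {"id": "n1", "label": "read_file"},
        {"id": "n2", "label": "final output"},
    ],
    "edges": [{"source": "n1", "target": "n2"}],
}
print(graph_to_mermaid(graph))
```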

Running the examples

# Offline citation verification (no API keys required)
uv run python examples/verify_citations.py

# Single-agent example
ANTHROPIC_API_KEY=... uv run python examples/single_agent.py
# or Azure OpenAI:
AZURE_OPENAI_ENDPOINT=https://... AZURE_OPENAI_API_KEY=... uv run python examples/single_agent.py

# Multi-agent example (opens interactive provenance graph in browser after the run)
ANTHROPIC_API_KEY=... uv run python examples/multi_agent.py

Development

git clone https://github.com/dugarsumit/pydantic-ai-provenance.git
cd pydantic-ai-provenance
uv sync --extra dev
uv run pytest
uv run ruff check .

See CONTRIBUTING.md for the full contributing guide.


License

MIT
