agentsnap-py

License: MIT

Snapshot tests for AI agents. Record an agent's tool-call trace, diff it against a baseline, fail CI on regressions. Zero runtime dependencies. Drops into pytest or any test runner.

Python port of @mukundakatta/agentsnap.

Install

pip install agentsnap-py

Usage

from agentsnap import record, trace_tool, expect_snapshot

# fetch_results and llm_summarize stand in for your own tool implementations.
search = trace_tool("search", lambda q: fetch_results(q))
summarize = trace_tool("summarize", lambda docs: llm_summarize(docs))

def agent(question):
    docs = search(question)
    return summarize(docs)

def test_research_agent_stays_on_rails():
    # Record every traced tool call the agent makes, then compare with the baseline.
    trace = record(lambda: agent("What is RLHF?"))
    expect_snapshot(trace, "tests/__snapshots__/research.snap.json")

The first run writes the snapshot. Every run after that diffs the new trace against it. If the agent calls a different tool, calls its tools in a different order, or starts erroring, the test fails with a readable diff. To regenerate the snapshot, run the tests with the environment variable AGENTSNAP_UPDATE=1 set.

Async agents

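import asyncio
from agentsnap import arecord, trace_tool, expect_snapshot

# async_fetch stands in for your own async tool (an `async def` function).
asearch = trace_tool("search", async_fetch)

async def agent(q):
    return await asearch(q)

def test_async_agent():
    # arecord returns a coroutine, so drive it with asyncio.run in a sync test.
    trace = asyncio.run(arecord(lambda: agent("hello")))
    expect_snapshot(trace, "tests/__snapshots__/async.snap.json")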

Diff statuses

| Status | When | Default action |
| --- | --- | --- |
| PASSED | Bytewise match | green |
| OUTPUT_DRIFT | Tools + args identical, only output text or external result hashes differ | warn (non-failing) |
| TOOLS_REORDERED | Same tool names, different order | fail |
| TOOLS_CHANGED | Different tool names called, or different args | fail |
| REGRESSION | New error in the trace, or a tool that used to work now throws | fail |

Override per snapshot via expect_snapshot(trace, path, fail_on=[...]).
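
The status rules in the table can be sketched as a small classifier. This is an illustration only, not agentsnap's diff engine: the trace shape (a "calls" list of dicts with "tool", "args", "output", "error"), the names classify and should_fail, the order in which statuses are checked, and the use of plain status strings for fail_on are all assumptions made for the example.

```python
# Hypothetical trace shape (an assumption, not agentsnap's documented schema):
# {"calls": [{"tool": ..., "args": ..., "output": ..., "error": ...}, ...]}

# Statuses that fail by default, taken from the table above.
DEFAULT_FAIL_ON = {"TOOLS_REORDERED", "TOOLS_CHANGED", "REGRESSION"}


def classify(baseline, current):
    """Map a (baseline, current) pair of traces to a status name."""
    if baseline == current:
        # Structural equality here; the real check is described as bytewise.
        return "PASSED"

    old, new = baseline["calls"], current["calls"]

    # A tool that errors now but did not error in the baseline.
    old_failed = {c["tool"] for c in old if c.get("error")}
    new_failed = {c["tool"] for c in new if c.get("error")}
    if new_failed - old_failed:
        return "REGRESSION"

    old_names = [c["tool"] for c in old]
    new_names = [c["tool"] for c in new]
    if sorted(old_names) != sorted(new_names):
        return "TOOLS_CHANGED"
    if old_names != new_names:
        return "TOOLS_REORDERED"
    if [c["args"] for c in old] != [c["args"] for c in new]:
        return "TOOLS_CHANGED"

    # Same tools, same order, same args: only outputs differ.
    return "OUTPUT_DRIFT"


def should_fail(status, fail_on=None):
    """Apply the default failing set unless the caller overrides it."""
    active = DEFAULT_FAIL_ON if fail_on is None else set(fail_on)
    return status in active


def call(tool, args, output, error=None):
    return {"tool": tool, "args": args, "output": output, "error": error}


baseline = {"calls": [call("search", ["RLHF"], "docs"),
                      call("summarize", ["docs"], "summary")]}
drifted = {"calls": [call("search", ["RLHF"], "docs"),
                     call("summarize", ["docs"], "another summary")]}
reordered = {"calls": list(reversed(baseline["calls"]))}
broken = {"calls": [call("search", ["RLHF"], None, error="TimeoutError")]}

print(classify(baseline, drifted))    # OUTPUT_DRIFT
print(classify(baseline, reordered))  # TOOLS_REORDERED
print(classify(baseline, broken))     # REGRESSION
```

Under these assumptions, passing fail_on=["OUTPUT_DRIFT"] to should_fail turns the default warning into a failure, which is the kind of per-snapshot override the fail_on argument describes.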

API

record(fn, *, input=None, model=None, capture_results=False) -> Trace

Run fn (sync) and capture every trace_tool-wrapped call inside it. Returns a JSON-serializable dict.

arecord(fn, ...) -> Trace

Async variant for async def agents. Use with asyncio.run(arecord(lambda: agent())) or inside an async test.

trace_tool(name, fn) -> wrapped_fn

Wraps a tool. Inside record, calls go into the trace; outside, the wrapper is a transparent pass-through. Works with sync and async tools: the wrapper keeps the shape of the function it wraps, so an async tool stays awaitable.

expect_snapshot(trace, path, *, update=False, fail_on=None) -> dict

Compares trace against the on-disk JSON baseline at path. Writes the baseline if it is missing, regenerates it if AGENTSNAP_UPDATE=1 is set (or update=True), and otherwise diffs and raises AgentSnapshotMismatch on a failing status.
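
The write / regenerate / compare flow described above can be sketched as follows. This is a simplified stand-in, not the library's code: sketch_expect_snapshot, SketchMismatch, the env parameter, and the returned {"status": ...} values (including "WRITTEN") are hypothetical, and the comparison is plain equality rather than the status-aware diff.

```python
import json
import os
import tempfile
from pathlib import Path


class SketchMismatch(AssertionError):
    """Stand-in for AgentSnapshotMismatch in this sketch."""


def sketch_expect_snapshot(trace, path, *, update=False, env=None):
    """Write the baseline if missing or asked to update, else compare."""
    env = os.environ if env is None else env
    snapshot = Path(path)
    regenerate = update or env.get("AGENTSNAP_UPDATE") == "1"

    if regenerate or not snapshot.exists():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        text = json.dumps(trace, indent=2, sort_keys=True) + "\n"
        snapshot.write_text(text, encoding="utf-8")
        return {"status": "WRITTEN"}  # hypothetical status for the sketch

    baseline = json.loads(snapshot.read_text(encoding="utf-8"))
    if baseline != trace:
        raise SketchMismatch(f"snapshot mismatch: {snapshot}")
    return {"status": "PASSED"}


# Demo in a temporary directory; env={} keeps the demo independent of the shell.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "__snapshots__" / "demo.snap.json"
    first = sketch_expect_snapshot({"calls": []}, demo_path, env={})
    second = sketch_expect_snapshot({"calls": []}, demo_path, env={})
    try:
        sketch_expect_snapshot({"calls": [{"tool": "search"}]}, demo_path, env={})
        mismatch_raised = False
    except SketchMismatch:
        mismatch_raised = True
    # With the update flag set, the changed trace overwrites the baseline.
    updated = sketch_expect_snapshot(
        {"calls": [{"tool": "search"}]}, demo_path, env={"AGENTSNAP_UPDATE": "1"}
    )
```

Writing the JSON with sorted keys and fixed indentation keeps the file stable across runs, which matters if the snapshot is committed and reviewed in version control.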

diff(baseline, current) -> DiffResult

Low-level diff engine. Returns a DiffResult(status=..., changes=[Change(...)]).

format_diff(result, path=None) -> str

Render a colored terminal block for the diff (used in the failure message).

pytest plugin

Installing this package registers a pytest plugin that exposes the same API as fixtures:

def test_my_agent(agentsnap_record, trace_tool, expect_snapshot):
    fn = trace_tool("hello", lambda: "world")
    trace = agentsnap_record(lambda: fn())
    expect_snapshot(trace, "tests/__snapshots__/hello.snap.json")

API differences from the JS sibling

  • Tracing uses contextvars (Python's AsyncLocalStorage equivalent) instead of node:async_hooks.
  • Sync agents use record(); async agents use arecord() -- Python doesn't have JS's "async by default" assumption.
  • Trace is a dict (not a class) so it serializes / inspects naturally.
  • Change.to_dict() produces the JS-style {"path": ..., "from": ..., "to": ...} -- the dataclass uses from_ because from is a Python keyword.
  • Adds a pytest plugin (pyproject.toml pytest11 entry point).
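
The from_ naming point above can be illustrated with a small dataclass. The field names path, from_, and to and the to_dict output keys follow the description in the list; the class name SketchChange, the type annotations, and the example path string are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchChange:
    path: str
    from_: Any  # trailing underscore because `from` is a reserved word in Python
    to: Any

    def to_dict(self):
        # Emit the JS-style key "from" when serializing.
        return {"path": self.path, "from": self.from_, "to": self.to}


# The path format below is made up for the example.
change = SketchChange(path="calls[0].tool", from_="search", to="lookup")
print(change.to_dict())
```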

See the JS sibling's README for the full design notes.
