Skip to main content

Deterministic record-and-replay debugger for AI agent runs

Project description

agentrr

Deterministic record-and-replay debugger for AI agent runs.

When an AI agent does something wrong in production, you usually can't reproduce it. Run it again and it takes a different path — the model samples differently, the tool returns different data, and the bug you saw is gone.

agentrr records every nondeterministic boundary an agent crosses — every LLM call, tool call, clock read, and random draw — then replays the run deterministically and offline. The agent's real logic runs again; every external answer is served from the recording. No API calls, no side effects, no cost. You step through the exact failing run as many times as you need.

It's rr / time-travel debugging, for AI agents.

The 60-second demo

From a clone of this repo:

uv sync --group dev
export PYTHONPATH=examples

Record a run:

uv run python -m agents.deterministic_support
# run_id: deterministic_support-<id>
# log: .agentrr/runs/deterministic_support-<id>.jsonl

Replay it — deterministically, with no network and no live LLM calls (the demo agent uses a mock client; replay never calls it):

uv run agentrr replay deterministic_support-<id> agents.deterministic_support:main

Now edit the agent's prompt and replay again. agentrr halts at the exact boundary where behavior first diverges, with a signature mismatch and a structural diff in the divergence report:

DivergenceError: divergence at seq 5: signature mismatch

Strict mode halts on the first mismatch; use mode="observe" to continue and collect every divergence. The report includes structural diff previews (expected_preview / observed_preview), not the error string alone.

That's the core loop: turn a one-time, irreproducible failure into a fixed artifact you can re-enter and dissect.

What it guarantees

  • Faithful replay for every captured boundary — the replayed boundary sequence exactly matches the recording (verified in CI).
  • Offline and safe — replay makes zero live LLM calls and never re-executes tools. Replaying an agent that issued a refund does not issue another.
  • Crash-safe recording — an event is durably on disk (fsync) before the agent acts on it. Verified with real SIGKILL in CI. A killed run produces a truncated log, never a holed one.
  • Honest divergence — when replay can't reproduce faithfully, it halts at the exact point and tells you, with a diff. It never silently guesses or serves a mismatched response.

What it does NOT do (by design)

Single-process, synchronous agents. No marketplace, no backend, no GUI. Concurrency, streaming-chunk replay, and multi-agent pipelines are out of scope for v0.1. See docs/contract.md for the full contract and exclusions.

How it works

agentrr intercepts at the boundaries where an agent touches nondeterminism, and freezes them on replay:

Layer Recorded Served on replay
LLM calls (OpenAI, Anthropic) full request + response + metadata recorded response
Tool calls name, args, return/error recorded result (tool never runs)
Clock / RNG / IDs every read and draw recorded values, in order

Matching is sequence-primary, signature-validated — no fuzzy search, ever. A request that doesn't match the next expected event is divergence, not a thing to paper over.

Install

From source (PyPI publish pending):

git clone https://github.com/<OWNER>/agentrr.git
cd agentrr
uv sync --group dev
export PYTHONPATH=examples   # for example agents

Development

uv sync --group dev
make test          # full suite incl. credibility gates
make durability    # SIGKILL write-before-return gate
gitleaks detect    # secret scan before you push

Reference agents

Agent Purpose
examples/agents/deterministic_support.py Golden path (mock LLM, registered tools, shims)
examples/agents/unstable_loop.py Unwrapped random — diverges on replay (by design)
examples/agents/tool_caller.py LLM → tool → LLM loop
examples/agents/broken_replay_cases.py Negative scenarios (edited prompt, missing tool, truncated/corrupt log)

License

Apache-2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentrr-0.1.0a1.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentrr-0.1.0a1-py3-none-any.whl (37.6 kB view details)

Uploaded Python 3

File details

Details for the file agentrr-0.1.0a1.tar.gz.

File metadata

  • Download URL: agentrr-0.1.0a1.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentrr-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 59fd803254eb8cba8a66410777d6646f84a6a84ae182bc7e978931f92c3108a8
MD5 b5a2a513fcedcbd46699132b4306925a
BLAKE2b-256 6ed55a9db45934968ebe224dbca8b3b5f14cc66c28470121a83b80199ebf4ad4

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentrr-0.1.0a1.tar.gz:

Publisher: release.yml on ip174/agentrr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentrr-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: agentrr-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 37.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentrr-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 7038cad9c6025f3f11e902f13b96bb737766182966dda21a1baca1cf06fca289
MD5 f53e021a5f89efdf0045adb6df67c8ba
BLAKE2b-256 3c2e378b0278ebf647c6358ca574e2ef4de7543366fc571e9305dad82f28138b

See more details on using hashes here.

Provenance

The following attestation bundles were made for agentrr-0.1.0a1-py3-none-any.whl:

Publisher: release.yml on ip174/agentrr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page