Time-travel debugger for AI agents. Record any production run, replay any failure.

Rewind — Time-Travel Debugger for AI Agents

Record any production run. Bisect to the failing step. Mutation-test for the failure modes you have not hit yet.

$ rewind bisect run-good-7f3a run-bad-9b2c

First divergence at step 4
  Session A: run-good-7f3a  model=gpt-4o-2025-11
  Session B: run-bad-9b2c   model=gpt-4o-2026-05
  Cause:    model_version_changed
  Detail:   model changed: 'gpt-4o-2025-11' -> 'gpt-4o-2026-05'.
            Model upgrades are the highest-likelihood cause
            of behaviour shifts.

The point of Rewind is the Cause and Detail fields. Other tools tell you that two runs of your agent differ. Rewind tells you why.

What Makes This Different

Cassette-style HTTP replay for LLMs is not new. VCR.py has shipped this pattern since 2010; vcr-langchain since 2023; Docker cagent shipped a nearly identical implementation in 2026 and inspired several pieces of this codebase (see docs/adr/).

What Rewind adds on top of that base layer:

Cause inference. rewind bisect classifies the reason two runs diverged: was it a model version bump, a tool returning different output, prompt drift between deploys, or model non-determinism? Every other tool in the space stops at "step N differs".

Mutation testing for agents. rewind mutate is Stryker for LLM agents. It systematically perturbs a recorded cassette: it drops steps, returns 429s, truncates responses, and replaces tool outputs with errors. It then re-runs your agent against each mutation and reports which ones the agent silently fails on. That tells you where production drift will bite before it does.

Everything else (HTTPS MITM via mitmproxy, content-addressed blobs, SSE streaming preservation, pytest-rewind) is table stakes that existing tools also provide. The cause inference and the mutation harness are the parts that justify the project.

Install

pip install llm-rewind
rewind init

rewind init generates a local CA cert at ~/.rewind/ca.pem for HTTPS interception. Trusting the cert is a single command on macOS and Linux; on Windows it requires Administrator privileges. rewind init prints the exact command for your platform after generating the cert.

Three Loops

1. Reproduce a production failure

export ANTHROPIC_API_KEY=sk-...   # export so the recorded child process sees the key
rewind record python my_agent.py
# Captured 12 LLM call(s)  ~$0.034  | 8.4s

rewind list
# 7f3a2b9c  my_agent  2026-05-23 14:35:29   12  $0.034  c0e577f

rewind replay 7f3a2b9c
# Replay complete  8.4s (zero LLM cost)

2. Find the exact step a regression broke

rewind bisect run-good-7f3a run-bad-9b2c
# First divergence at step 4
#   Cause: tool_output_drift
#   Detail: previous step (3, tool_call) returned different output.
#           Likely root cause is upstream; bisect that step first.
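
To illustrate the kind of decision the cause classifier makes, here is a minimal sketch in Python. It is not Rewind's implementation: the Step type, the check order, and the labels prompt_drift and model_nondeterminism are assumptions made for this example (only model_version_changed and tool_output_drift appear in the output above). The index of the first divergent step is taken as already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical, simplified view of one recorded step."""
    kind: str    # "llm_call" or "tool_call"
    model: str   # model identifier on the request ("" for tool calls)
    prompt: str  # request content
    output: str  # recorded response or tool result


def classify(a: list[Step], b: list[Step], i: int) -> str:
    """Label why runs a and b diverge at step i, most likely cause first."""
    sa, sb = a[i], b[i]
    # A different model identifier explains the divergence on its own.
    if sa.model != sb.model:
        return "model_version_changed"
    # A tool that returned different output one step earlier is an
    # upstream cause: this step merely received different input.
    if i > 0:
        pa, pb = a[i - 1], b[i - 1]
        if pa.kind == "tool_call" and pa.output != pb.output:
            return "tool_output_drift"
    # Same model, same upstream output, different request text.
    if sa.prompt != sb.prompt:
        return "prompt_drift"  # assumed label
    # Identical request, different response.
    return "model_nondeterminism"  # assumed label
```

Checking the model identifier first mirrors the Detail text in the opening example, which treats model upgrades as the highest-likelihood cause.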

3. Pressure-test before shipping

rewind mutate 7f3a2b9c

# Mutation Report
# +-------------------+------+---------+----------------------------+
# | Mutation          | Step | Outcome | Detail                     |
# +===================+======+=========+============================+
# | empty_response    | 0    | SURVIVED| ...                        |
# | provider_500      | 0    | CRASHED | unhandled HTTPStatusError  |
# | error_response    | 4    | CHANGED | agent ignored 429 retry    |
# | truncate_response | 7    | CRASHED | JSONDecodeError            |
# +-------------------+------+---------+----------------------------+
# Survived: 9 | Changed: 3 | Crashed: 3 | Total: 15

The Crashed rows are what you fix before deploying.
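
The mutation loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rewind's code: a cassette is modelled as a list of dicts with status and body keys, the agent is any callable that consumes a cassette, and the outcome rules (SURVIVED means the same result as the unmutated run, CHANGED means a different result, CRASHED means an exception) are one plausible reading of the report above.

```python
import json
from typing import Callable, Iterator

Cassette = list[dict]  # hypothetical shape: {"status": int, "body": str}


def mutations(cassette: Cassette) -> Iterator[tuple[str, int, Cassette]]:
    """Yield (mutation name, step index, mutated copy of the cassette)."""
    for i, resp in enumerate(cassette):
        def swap(new: dict) -> Cassette:
            return cassette[:i] + [new] + cassette[i + 1:]

        yield "empty_response", i, swap(dict(resp, body=""))
        yield "provider_500", i, swap({"status": 500, "body": ""})
        half = len(resp["body"]) // 2
        yield "truncate_response", i, swap(dict(resp, body=resp["body"][:half]))


def run_mutations(agent: Callable[[Cassette], object], cassette: Cassette):
    """Re-run the agent against every mutation and record the outcome."""
    baseline = agent(cassette)
    report = []
    for name, step, mutated in mutations(cassette):
        try:
            result = agent(mutated)
        except Exception as exc:  # the agent did not handle the fault
            report.append((name, step, "CRASHED", type(exc).__name__))
        else:
            outcome = "SURVIVED" if result == baseline else "CHANGED"
            report.append((name, step, outcome, ""))
    return report


def toy_agent(cassette: Cassette) -> int:
    """Toy agent: sums a JSON field and silently skips non-200 responses."""
    total = 0
    for resp in cassette:
        if resp["status"] != 200:
            continue
        total += json.loads(resp["body"])["value"]
    return total


if __name__ == "__main__":
    demo = [{"status": 200, "body": '{"value": 1}'},
            {"status": 200, "body": '{"value": 0}'}]
    for row in run_mutations(toy_agent, demo):
        print(row)
```

The toy agent shows why the CHANGED outcome matters as much as CRASHED: skipping a 500 raises no exception, yet the final answer differs from the recorded run.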

How It Works

Rewind runs as a local HTTPS proxy (via mitmproxy) that intercepts every LLM API call your agent makes — OpenAI, Anthropic, or Gemini. Each request and response is stored in a content-addressed blob store (SHA-256, zstd-compressed) with DuckDB metadata. Because the proxy operates at the HTTP layer, Rewind works with any language and any framework: Python, Node.js, Go, LangChain, LlamaIndex, raw SDK calls.
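
As a rough illustration of content addressing, the sketch below stores each payload under the SHA-256 of its bytes, so identical responses are written once. The BlobStore class and its on-disk layout are hypothetical, and zlib from the standard library stands in for zstd so the example has no third-party dependencies; the DuckDB metadata side is not shown.

```python
import hashlib
import zlib
from pathlib import Path


class BlobStore:
    """Hypothetical content-addressed store keyed by SHA-256 of the payload."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, payload: bytes) -> str:
        """Store payload and return its hex digest, which is also its key."""
        digest = hashlib.sha256(payload).hexdigest()
        path = self.root / digest
        if not path.exists():  # identical payloads are stored only once
            path.write_bytes(zlib.compress(payload))  # zlib stands in for zstd
        return digest

    def get(self, digest: str) -> bytes:
        """Load a payload and verify it still matches its key."""
        payload = zlib.decompress((self.root / digest).read_bytes())
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"blob {digest} is corrupted")
        return payload
```

Because the key is derived from the content, a metadata row only needs to hold the digest, and integrity can be re-checked on every read.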

On replay, Rewind starts the same proxy in replay mode. Incoming requests get matched by a canonical fingerprint (match_key) that strips volatile fields like tool_call_id and credential query parameters while preserving semantic content. Matched requests get the exact recorded response bytes back. Strict mode never falls through to the live API; a cassette miss returns HTTP 599 with a structured error body and a clear X-Rewind-Cassette-Miss header. No quiet billing.
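
The matching step might look roughly like the following. This is a sketch, not Rewind's actual match_key: the set of credential query parameter names, the canonical JSON layout, the header value, and the error body shape are all assumptions, and a real replay would return the recorded status and headers rather than a fixed 200.

```python
import hashlib
import json
from urllib.parse import parse_qsl, urlsplit

VOLATILE_BODY_KEYS = {"tool_call_id"}         # named in the text above
CREDENTIAL_QUERY_PARAMS = {"key", "api_key"}  # assumed parameter names


def _strip(value):
    """Recursively drop volatile keys from a JSON-like value."""
    if isinstance(value, dict):
        return {k: _strip(v) for k, v in value.items()
                if k not in VOLATILE_BODY_KEYS}
    if isinstance(value, list):
        return [_strip(v) for v in value]
    return value


def match_key(method: str, url: str, body: bytes) -> str:
    """Fingerprint a request on its semantic content only."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in CREDENTIAL_QUERY_PARAMS)
    canonical = json.dumps(
        {
            "method": method.upper(),
            "host": parts.netloc.lower(),
            "path": parts.path,
            "query": query,
            "body": _strip(json.loads(body)) if body else None,
        },
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def replay(cassette: dict[str, bytes], method: str, url: str, body: bytes):
    """Strict replay: return recorded bytes, or 599 on a miss. Never go live."""
    key = match_key(method, url, body)
    if key in cassette:
        return 200, {}, cassette[key]
    error = json.dumps({"error": "cassette_miss", "match_key": key}).encode()
    return 599, {"X-Rewind-Cassette-Miss": "1"}, error
```

Two requests that differ only in API key, JSON key order, or tool_call_id hash to the same fingerprint here, while any change to message content produces a miss.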

docs/ARCHITECTURE.md has the full design and the ADRs.

Comparison

Feature	Rewind	LangSmith	Braintrust	Laminar	Helicone	vcr-langchain	Docker cagent
True deterministic replay	yes	no	no	no	yes	yes	yes
Cause inference on divergence	yes	no	no	no	no	no	no
Mutation testing for agents	yes	no	no	no	no	no	no
Framework-agnostic (HTTP-level)	yes	no	no	no	yes	no	no
Local-only, no cloud	yes	no	no	no	yes	yes	yes
Open source	MIT	partial	partial	Apache	MIT	MIT	Apache

LangSmith, Braintrust, and Laminar are observability platforms — they show you what happened. vcr-langchain, Helicone, and cagent are cassette/proxy tools — they let you replay. Rewind is positioned as a debugger: replay plus the two engines (bisect cause inference, mutation testing) that turn a recording into a diagnosis.

pytest Integration

import pytest

# run_customer_support_agent is your agent's own entry point.
@pytest.mark.rewind(cassette="tests/cassettes/customer_support.rw")
async def test_agent_handles_refund_request():
    result = await run_customer_support_agent("I want a refund")
    assert "refund" in result.lower()

Cassettes get committed to git. CI runs them with zero API cost and no keys configured. See docs/testing/STRATEGY.md.

SDK Decorator (Python convenience)

For pure-Python agents where you would rather not set up a proxy:

import rewind

@rewind.session(name="customer_support", mode="record")
async def run_agent(query: str) -> str:
    ...

@rewind.tool
def search_database(query: str) -> list[dict]:
    ...

The proxy approach is recommended for any non-Python or multi-language agent.

CLI Reference

rewind init                                # generate local CA cert
rewind record <command>                    # record an agent run
rewind replay <session-id>                 # replay from cassette
rewind list                                # list recorded sessions
rewind inspect <session-id>                # inspect step details
rewind diff <a> <b>                        # compare two sessions
rewind bisect <good> <bad>                 # find divergence + classify cause
rewind mutate <session-id>                 # mutation test the agent
rewind export <session-id> [--output f.rw] # export cassette file
rewind import <cassette.rw>                # import cassette to local DB
rewind stats [--days 30]                   # cost analytics

Contributing

See CONTRIBUTING.md. Setup:

git clone https://github.com/llm-rewind/rewind
cd rewind
pip install -e ".[dev]"
pytest                  # 140 tests, no API key needed
ruff check src/ tests/ pytest_rewind/
mypy src/ pytest_rewind/ --strict

The local end-to-end tests stand up an HTTPS server and a real mitmproxy instance, so they exercise the same code path a user hits. They run on CI for Python 3.11, 3.12, and 3.13.

License

MIT — see LICENSE.

Release history

0.2.1 (May 25, 2026)
0.2.0 (May 24, 2026; yanked; this version)
0.1.0 (May 23, 2026)

Download files

Source Distribution

llm_rewind-0.2.0.tar.gz (88.3 kB)

Uploaded May 24, 2026 Source

Built Distribution

llm_rewind-0.2.0-py3-none-any.whl (42.6 kB)

Uploaded May 24, 2026 Python 3

File details

Details for the file llm_rewind-0.2.0.tar.gz.

File metadata

Download URL: llm_rewind-0.2.0.tar.gz
Upload date: May 24, 2026
Size: 88.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for llm_rewind-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c164aec9ff0c29e942a13c9e3469f05c7404b0694aa12527e3a5e665c26c7697`
MD5	`73dde4b323555f80d409745cb1da4c70`
BLAKE2b-256	`b664b712ed77c79d70f931ee5b79af138ddbe83c487c54596027bf889119d334`

File details

Details for the file llm_rewind-0.2.0-py3-none-any.whl.

File metadata

Download URL: llm_rewind-0.2.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 42.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for llm_rewind-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2850d3ac0ceb5efea62a2faac64544163856c339eda0759f4f636b445b1b2865`
MD5	`2b2bb703930f3f6f787812d99e10548c`
BLAKE2b-256	`9a7015c1c4c79d3249100d2984defdb6a5a3de4e19e2637f9a4e57e6c4d37997`
