The DVR for AI Agents - Record, visualize, and time-travel through agent execution

Agent VCR

Time-travel debugging for AI agents. Now with 🛡️ OpenHands Sentinel — real-time code quality guardian.

🛡️ OpenHands Sentinel

"Code is cheap now. Good code is not cheap because you need to check that it actually works." — Graham Neubig, OpenHands Chief Scientist

OpenHands Sentinel is a local-first code quality guardian that watches AI agents write code and stops the codebase from becoming a monster — in real time. Zero API keys. Zero cloud. Zero external dependencies.

How It Works

Agent writes a file
        ↓
Sentinel intercepts via EventStream hook
        ↓
Runs instant AST analysis:
  ✗ Duplicate function detection (cross-file, trajectory-aware)
  ✗ Function length explosion (> 50 lines)
  ✗ Cyclomatic complexity spikes
  ✗ File growth rate anomalies
  ✗ Parameter bloat detection
        ↓
Warns the agent → Agent self-corrects
        ↓
Everything recorded in agent-vcr JSONL (full audit trail)
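The AST checks in the flow above can be approximated with Python's standard `ast` module. This is a minimal sketch, not Sentinel's implementation: the `analyze` function, the `seen` registry, and every threshold except the 50-line limit are assumptions made for illustration, and duplicates are matched by function name only.

```python
import ast

# Illustrative thresholds: the 50-line limit comes from the list above, the rest are assumptions
MAX_LINES = 50
MAX_PARAMS = 5
MAX_COMPLEXITY = 8

# Node types counted as decision points (a common approximation of cyclomatic complexity)
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def analyze(source: str, filename: str, seen: dict) -> list:
    """Return findings for one file. `seen` maps function name -> file that defined it first."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        length = node.end_lineno - node.lineno + 1
        args = node.args
        params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        complexity = 1 + sum(isinstance(child, BRANCH_NODES) for child in ast.walk(node))
        # Cross-file duplicate: the same name was first registered by another file
        if seen.setdefault(node.name, filename) != filename:
            findings.append(f"{node.name}: already defined in {seen[node.name]}")
        if length > MAX_LINES:
            findings.append(f"{node.name}: {length} lines (max {MAX_LINES})")
        if params > MAX_PARAMS:
            findings.append(f"{node.name}: {params} parameters (max {MAX_PARAMS})")
        if complexity > MAX_COMPLEXITY:
            findings.append(f"{node.name}: complexity {complexity} (max {MAX_COMPLEXITY})")
    return findings


seen = {}
clean = analyze("def hash_password(pw):\n    return pw[::-1]\n", "auth/utils.py", seen)
flagged = analyze("def hash_password(a, b, c, d, e, f):\n    return a\n", "handlers.py", seen)
```

Sharing one `seen` dictionary across files is what lets the second call notice that `hash_password` was already written earlier in the same run.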

Quick Start

# Scan any codebase an AI agent wrote
sentinel scan ./my-project

# Or hook into OpenHands natively (3 lines)

from openhands_sentinel import Sentinel
from agent_vcr import VCRRecorder

recorder = VCRRecorder()
sentinel = Sentinel(recorder=recorder)
# `runtime` is your existing OpenHands runtime object
sentinel.attach(runtime.event_stream)  # auto-intercepts every file write

Demo

python examples/sentinel_demo.py

STEP 1: Agent writes auth/utils.py
🛡️ SENTINEL: auth/utils.py — CLEAN ✓

STEP 2: Agent writes handlers.py (massive monolithic function)
🛡️ SENTINEL: VIOLATIONS DETECTED!
  CRITICAL  `hash_password()` already exists in auth/utils.py:8
  CRITICAL  `handle_auth_request()` is 109 lines (max 40)
  CRITICAL  Cyclomatic complexity 32 (max 8)
  WARNING   9 parameters (max 5)

STEP 3: Agent SELF-CORRECTS
🛡️ SENTINEL: handlers.py — CLEAN ✓ All issues resolved!

📼 Audit trail saved to .vcr/sentinel-demo/sentinel-demo.vcr

Why This Exists

Without Sentinel	With Sentinel
Agent writes bad code	Agent writes bad code
Human reviews PR	Sentinel catches instantly
Human rejects PR	Agent self-corrects
Agent rewrites	(already fixed)
Human reviews again	Zero human time
Cost: 2× LLM + human time	Cost: 1 extra LLM call

The Problem

Building multi-step AI agents (LangGraph, CrewAI, OpenHands) is painfully slow to debug.

When your agent fails on step 8 out of 10, observability tools like LangSmith or LangFuse only show you what went wrong. To fix it, you patch the code and re-run all 10 steps from scratch. Every typo costs minutes of wall time and dollars in wasted tokens.

The Solution

Agent VCR records your agent's complete state at every step. When something breaks, you rewind to the failing step, edit the state, and resume execution from that exact point. No re-running the whole chain.

LangSmith shows you what happened. Agent VCR lets you change it.
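The idea behind rewind, edit, and resume can be shown with a toy pipeline in plain Python. This is only a sketch of the concept, not Agent VCR's code: `run_chain` and the three step functions are hypothetical names invented for this example.

```python
import copy


def run_chain(steps, state, start=0, frames=None):
    """Run steps[start:], storing a deep copy of the state that goes into each step."""
    frames = [] if frames is None else frames[:start]
    for index in range(start, len(steps)):
        frames.append(copy.deepcopy(state))  # snapshot of the input to step `index`
        state = steps[index](state)
    return state, frames


# Hypothetical three-step agent: each step maps a state dict to a new state dict
def plan(state):
    return {**state, "plan": f"plan for {state['prompt']}"}


def code(state):
    return {**state, "code": state["plan"].upper()}


def review(state):
    return {**state, "ok": "TYPO" not in state["code"]}


steps = [plan, code, review]
final, frames = run_chain(steps, {"prompt": "typo"})

# Rewind to the input of step 1, patch it, and resume without re-running step 0
patched = copy.deepcopy(frames[1])
patched["plan"] = "plan for fixed prompt"
resumed, resumed_frames = run_chain(steps, patched, start=1, frames=frames)
```

Because every frame holds a complete copy of the state, resuming from step 1 needs nothing from step 0 except its recorded output.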

Quick Start

pip install ai-agent-vcr

from agent_vcr import VCRRecorder, VCRPlayer

# Record your agent
recorder = VCRRecorder()
recorder.start_session("bug_hunt")
# ... your agent code ...
saved_path = recorder.save()       # save() returns the path of the .vcr file

# Time-travel and fix
player = VCRPlayer.load(str(saved_path))
state = player.goto_frame(2)      # jump to step 2
state["prompt"] = "Fixed prompt"   # fix the state
player.resume(from_frame=2)        # continue from there

What It Does

Time Travel — Jump back to any step. Full state snapshot at every node.
State Injection & Resume — Edit the state at any frame — fix a prompt, patch tool output, inject context — then resume mid-chain.
ACID Transactions — Wrap agent execution in real database-style transactions backed by git. Rollback physically reverts files on disk, not just in-memory state.
Golden Run Cache — Save successful runs as replayable paths. Next time you hit the same task, skip all LLM calls. Same task, zero tokens, instant.
React Dashboard — Run vcr-server, open localhost:8000. Glassmorphism UI for inspecting state, viewing JSON diffs, live WebSocket streaming.
TUI Debugger — Run vcr-tui in your terminal. Navigate frames, press e to edit state, press r to resume.
Visual Diffs — Color-coded state mutation tracking in Dashboard and TUI.
DAG Visualization — See parallel execution branches, search/filter sessions by tags.
Framework Agnostic — 1-line integration with LangGraph, CrewAI, or raw Python.
Git-Friendly Storage — JSONL files, version controllable, append-only.
Production Safe — <5ms overhead per frame. Async-native.
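To make the state diff idea from the list above concrete, here is a minimal sketch of a top-level diff between two state dictionaries. The `diff_states` helper is hypothetical and is not how the Dashboard or TUI compute their diffs; it only compares top-level keys.

```python
def diff_states(before: dict, after: dict) -> dict:
    """Return added, removed, and changed top-level keys between two state dicts."""
    added = {key: after[key] for key in after.keys() - before.keys()}
    removed = {key: before[key] for key in before.keys() - after.keys()}
    changed = {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


delta = diff_states(
    {"prompt": "a", "step": 1},
    {"prompt": "b", "step": 1, "plan": "x"},
)
```

A UI could color the three groups differently; nested structures would need a recursive variant.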

ACID Transactions

Databases solved the partial failure problem 40 years ago with transactions. Agents have the exact same problem — when your agent fails mid-run, you don't just have bad in-memory state. You have files written to disk, commits made, half a codebase that shouldn't exist. Current tools only roll back the state object. The filesystem stays polluted.

Agent VCR wraps agent execution in real transactional semantics:

from agent_vcr import VCRRecorder
from agent_vcr.integrations.openhands import ACIDWorkspace

recorder = VCRRecorder()
acid = ACIDWorkspace("/my/workspace", recorder=recorder)

acid.begin(session_id="task-001")        # snapshot workspace, isolated branch
acid.savepoint(state, node_name="coder") # checkpoint filesystem + state together
acid.rollback(to_frame_index=3)          # git reset --hard to savepoint
acid.commit()                            # merge clean branch into main

BEGIN creates an isolated git branch. Parallel agents can't interfere with each other.
SAVEPOINT checkpoints both the VCR state and the filesystem. Every frame has a matching git commit.
ROLLBACK runs git reset --hard to a previous savepoint. Files your agent hallucinated are gone from disk — not hidden, deleted.
COMMIT merges the successful branch back into main.
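The savepoint and rollback semantics can be illustrated without git. The sketch below swaps git commits for plain directory copies, so it is a stand-in for the technique and not ACIDWorkspace itself; `SnapshotWorkspace` and its methods are hypothetical names, and branch isolation and commit are left out.

```python
import shutil
import tempfile
from pathlib import Path


class SnapshotWorkspace:
    """Toy stand-in for a git-backed workspace: each savepoint is a full directory copy."""

    def __init__(self, workspace: Path):
        self.workspace = workspace
        self.store = Path(tempfile.mkdtemp(prefix="savepoints-"))
        self.savepoints = []  # list of (snapshot directory, state copy)

    def savepoint(self, state: dict) -> int:
        """Copy the workspace, keep the state next to it, and return the savepoint index."""
        target = self.store / str(len(self.savepoints))
        shutil.copytree(self.workspace, target)
        self.savepoints.append((target, dict(state)))
        return len(self.savepoints) - 1

    def rollback(self, index: int) -> dict:
        """Restore files and state from savepoint `index` and drop later savepoints."""
        snapshot, state = self.savepoints[index]
        shutil.rmtree(self.workspace)
        shutil.copytree(snapshot, self.workspace)
        for later, _ in self.savepoints[index + 1:]:
            shutil.rmtree(later)
        del self.savepoints[index + 1:]
        return dict(state)


workspace = Path(tempfile.mkdtemp(prefix="workspace-"))
(workspace / "app.py").write_text("print('ok')\n")

ws = SnapshotWorkspace(workspace)
good = ws.savepoint({"step": 1})
(workspace / "hallucinated.py").write_text("import nonexistent\n")
restored_state = ws.rollback(good)
```

After the rollback, the stray file is gone from disk and the state matches the savepoint, which is the property the ROLLBACK step above describes.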

Golden Run Cache

When your agent succeeds, save the entire execution as a golden path. Next time you run the same task, replay the cached outputs directly — skipping every LLM call. Only re-run steps whose inputs actually changed.

from agent_vcr.golden_cache import GoldenRunCache

cache = GoldenRunCache()

# After a successful run
cache.save_golden_run("Build a REST API with JWT auth", recorder)

# Next time — instant, zero cost
outputs, ledger = cache.replay("Build a REST API with JWT auth")
print(ledger)  # CostLedger(saved=100% | $0.0123 | 4100 tokens | 2349ms)

The CostLedger tracks original vs replay tokens, dollars saved, latency saved, and percentage reduction.
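A cache of this kind can be sketched as a directory of JSON files keyed by a hash of the task text. This is an illustration under assumptions, not GoldenRunCache: `TaskCache` and `Ledger` are hypothetical names, only token counts are tracked, and the per-step "only re-run what changed" logic is omitted.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Ledger:
    original_tokens: int
    replay_tokens: int = 0

    @property
    def saved_percent(self) -> float:
        """Share of the original token cost avoided by the replay."""
        if self.original_tokens == 0:
            return 0.0
        return 100.0 * (self.original_tokens - self.replay_tokens) / self.original_tokens


class TaskCache:
    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, task: str) -> Path:
        # The file name is a hash of the task text, so identical tasks map to one file
        return self.cache_dir / (hashlib.sha256(task.encode("utf-8")).hexdigest() + ".json")

    def save(self, task: str, outputs: list, tokens: int) -> None:
        self._path(task).write_text(json.dumps({"outputs": outputs, "tokens": tokens}))

    def replay(self, task: str):
        """Return (outputs, ledger) for a cached task, or None on a cache miss."""
        path = self._path(task)
        if not path.exists():
            return None
        record = json.loads(path.read_text())
        return record["outputs"], Ledger(original_tokens=record["tokens"])


cache = TaskCache(Path(tempfile.mkdtemp(prefix="golden-")))
task = "Build a REST API with JWT auth"
cache.save(task, [{"node": "planner", "output": "spec"}], tokens=4100)
outputs, ledger = cache.replay(task)
miss = cache.replay("A task that was never recorded")
```

Exact-match hashing means any change to the task text is a cache miss; a real cache would need a policy for near-identical tasks.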

Who Is This For?

If you are...	Agent VCR helps you...
An AI engineer debugging LangGraph agents	Rewind to the exact failing step, fix state, resume
A team lead reviewing agent behavior	Compare execution paths side-by-side with full state diffs
A researcher iterating on prompts	Fork from any step, change the prompt, see how downstream behavior changes
Building production agents	Record every run in JSONL for audit trails and regression testing
Running agents at scale	Cache successful runs and replay them at zero cost

Comparison

Feature	Agent VCR	LangSmith	LangFuse	Arize Phoenix
Record execution traces	Yes	Yes	Yes	Yes
Time-travel to any step	Yes	No	No	No
Edit state and resume	Yes	No	No	No
Fork from any frame	Yes	No	No	No
ACID transactions (filesystem rollback)	Yes	No	No	No
Golden Run Cache (zero-cost replay)	Yes	No	No	No
Compare execution runs	Yes	Yes	Partial	Partial
Self-hosted, local-first	Yes	No (cloud)	Yes	Yes
Git-friendly format (JSONL)	Yes	No	No	No
Framework agnostic	Yes	Yes (LangChain optional)	Yes	Yes
Zero external deps	Yes	Cloud required	Cloud or self-hosted server	Yes
Setup lines	3	~15	~10	~10

Integrations

LangGraph

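from typing import TypedDict

from langgraph.graph import StateGraph
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import VCRLangGraph

class AgentState(TypedDict):
    query: str

# planner_node and coder_node are your own node functions
graph = StateGraph(AgentState)  # LangGraph requires a state schema
graph.add_node("planner", planner_node)
graph.add_node("coder", coder_node)
graph.add_edge("planner", "coder")
graph.set_entry_point("planner")
graph.set_finish_point("coder")

# One line to add recording
recorder = VCRRecorder()
graph = VCRLangGraph(recorder).wrap_graph(graph)

result = graph.invoke({"query": "Build a todo app"})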

Raw Python

from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import vcr_record

recorder = VCRRecorder()

@vcr_record(recorder, node_name="my_function")
def my_function(data):
    return process(data)  # process() stands in for your own logic

result = my_function({"key": "value"})
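For readers curious how a recording decorator of this shape can work, here is a minimal standard-library sketch. It is not the implementation of `vcr_record`: the `record_calls` name and the plain list used as a log are assumptions for illustration, and the frame keys borrow the field names shown under Storage Format.

```python
import copy
import functools


def record_calls(log: list, node_name: str):
    """Decorator factory: append one frame per call to `log`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(state):
            # Copy the input first so later mutation cannot alter the recorded frame
            frame = {"node_name": node_name, "input_state": copy.deepcopy(state)}
            try:
                result = func(state)
            except Exception as exc:
                frame["error"] = repr(exc)
                log.append(frame)
                raise
            frame["output_state"] = copy.deepcopy(result)
            log.append(frame)
            return result
        return wrapper
    return decorator


frames = []


@record_calls(frames, node_name="double")
def double(state):
    return {"value": state["value"] * 2}


result = double({"value": 21})
```

Deep-copying on both sides of the call costs some time per frame but keeps each recorded frame independent of later changes to the live state.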

CrewAI

Agent VCR hooks into CrewAI's step_callback and task_callback for automatic frame capture.

from crewai import Crew, Agent, Task
from agent_vcr import VCRRecorder
from agent_vcr.integrations.crewai import VCRCrewAI, vcr_task

recorder = VCRRecorder()
recorder.start_session("crew_debug_run")

# Wrap the whole crew — records every thought, tool call, and task
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
vcr_crew = VCRCrewAI(recorder)
result = vcr_crew.kickoff(crew)
recorder.save()

# Or decorate individual functions
@vcr_task(recorder, task_name="research_step")
def research(context: dict) -> str:
    return "findings..."

Install with:

pip install "ai-agent-vcr[crewai]"

See examples/crewai_integration.py for a full runnable demo.

Storage Format

Agent VCR uses JSONL (JSON Lines):

{"type": "session", "data": {"session_id": "abc123", "created_at": "2024-01-01T00:00:00Z", ...}}
{"type": "frame", "data": {"frame_id": "...", "node_name": "planner", "input_state": {...}, "output_state": {...}, ...}}
{"type": "frame", "data": {...}}

Human-readable
Git-diffable
Append-only (efficient for streaming)
Line-by-line parsing (no need to load the entire file)
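The append-only and line-by-line properties are easy to see with the standard `json` module. The sketch below writes and reads records shaped like the example above; `append_record` and `iter_frames` are hypothetical helpers, not part of Agent VCR, and the file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record_type: str, data: dict) -> None:
    """Append one JSON object as a single line; earlier lines are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"type": record_type, "data": data}) + "\n")


def iter_frames(path: Path):
    """Yield frame payloads one line at a time without loading the whole file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            if record["type"] == "frame":
                yield record["data"]


path = Path(tempfile.mkdtemp(prefix="vcr-")) / "session.vcr"
append_record(path, "session", {"session_id": "abc123"})
append_record(path, "frame", {"node_name": "planner", "input_state": {}, "output_state": {"plan": "x"}})
append_record(path, "frame", {"node_name": "coder", "input_state": {"plan": "x"}, "output_state": {"code": "y"}})

nodes = [frame["node_name"] for frame in iter_frames(path)]
```

Since each record is one line, a new frame shows up in `git diff` as a single added line.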

Performance

Recording overhead is continuously benchmarked in CI to stay under 5ms per frame.

pytest tests/benchmarks/ -v
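If you want a rough number for your own setup, per-frame overhead can be estimated with `time.perf_counter`. This sketch is not the project's benchmark suite: `mean_overhead_ms` is a hypothetical helper, and the lambda that serializes frames to JSON is a placeholder for whatever recording call you want to measure.

```python
import json
import time


def mean_overhead_ms(record, frames: int = 1000) -> float:
    """Call `record` once per synthetic frame and return the mean cost in milliseconds."""
    state = {"prompt": "hello", "history": list(range(100))}
    start = time.perf_counter()
    for index in range(frames):
        record({"node_name": f"node-{index}", "input_state": state, "output_state": state})
    return (time.perf_counter() - start) * 1000 / frames


sink = []
overhead = mean_overhead_ms(lambda frame: sink.append(json.dumps(frame)))
```

Results depend heavily on state size and hardware, so compare against your own baseline, not a fixed figure.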

API Reference

VCRRecorder

class VCRRecorder:
    def start_session(
        self,
        session_id: str = None,
        parent_session_id: str = None,
        forked_from_frame: int = None,
        metadata: dict = None,
        tags: list[str] = None,
    ) -> Session

    def record_step(
        self,
        node_name: str,
        input_state: dict,
        output_state: dict,
        metadata: FrameMetadata = None,
        frame_type: FrameType = FrameType.NODE_EXECUTION,
    ) -> Frame

    def record_llm_call(...)
    def record_tool_call(...)
    def record_error(...)
    def save(self) -> Path
    def fork(self, from_frame: int, ...) -> VCRRecorder

VCRPlayer

class VCRPlayer:
    @classmethod
    def load(cls, filepath: str) -> VCRPlayer

    def goto_frame(self, index: int) -> dict
    def get_frame(self, index: int) -> Frame
    def list_nodes(self) -> list[str]
    def get_errors(self) -> list[Frame]
    def compare_frames(self, a: int, b: int) -> dict
    def resume(self, agent_callable: Callable, config: ResumeConfig) -> str
    def export_state(self, frame_index: int) -> dict

ACIDWorkspace

class ACIDWorkspace:
    def __init__(self, workspace_path: str, recorder: VCRRecorder = None)
    def begin(self, session_id: str) -> None
    def savepoint(self, state: dict, node_name: str) -> None
    def rollback(self, to_frame_index: int) -> None
    def commit(self) -> None

GoldenRunCache

class GoldenRunCache:
    def __init__(self, cache_dir: str = ".vcr/golden")
    def save_golden_run(self, task: str, recorder: VCRRecorder) -> str
    def replay(self, task: str) -> tuple[list[dict], CostLedger]
    def invalidate(self, task: str) -> bool

ResumeConfig

class ResumeConfig:
    from_frame: int              # Frame to resume from
    new_session_id: str = None   # Optional ID for forked session
    state_overrides: dict = {}   # State changes to apply
    mode: ResumeMode = FORK      # FORK, REPLAY, or MOCK
    skip_nodes: list[str] = []   # Nodes to skip during replay
    inject_mocks: dict = {}      # Mock values for dependencies
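The field list above reads like a dataclass. As a hedged sketch of how such a config could be modeled, the example below uses `field(default_factory=...)` so the dict and list defaults are not shared between instances. `ResumeConfigSketch` is a hypothetical stand-in, not the library's class, and the string values of the enum are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ResumeMode(Enum):
    # Member names come from the listing above; the string values are assumptions
    FORK = "fork"
    REPLAY = "replay"
    MOCK = "mock"


@dataclass
class ResumeConfigSketch:
    from_frame: int                                      # Frame to resume from
    new_session_id: Optional[str] = None                 # Optional ID for forked session
    state_overrides: dict = field(default_factory=dict)  # State changes to apply
    mode: ResumeMode = ResumeMode.FORK                   # FORK, REPLAY, or MOCK
    skip_nodes: list = field(default_factory=list)       # Nodes to skip during replay
    inject_mocks: dict = field(default_factory=dict)     # Mock values for dependencies


first = ResumeConfigSketch(from_frame=2)
second = ResumeConfigSketch(from_frame=3, state_overrides={"prompt": "Fixed prompt"})
first.skip_nodes.append("planner")  # does not leak into `second`
```

With literal `{}` or `[]` defaults, a plain class attribute would be shared across every instance, which is why the factory form is used here.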

Examples

See the examples/ directory:

basic_usage.py — Recording and playback
time_travel_demo.py — Full time-travel workflow
langgraph_integration.py — LangGraph auto-instrumentation
acid_golden_run.py — ACID transactions and Golden Run Cache

python examples/acid_golden_run.py

Contributing

Contributions welcome. See CONTRIBUTING.md for guidelines.

Development Setup

git clone https://github.com/agent-vcr/agent-vcr.git
cd agent-vcr
pip install -e ".[dev]"

Running Tests

pytest tests/unit/ -v
pytest tests/integration/ -v
pytest tests/e2e/ -v
pytest tests/benchmarks/ -v
pytest --cov=agent_vcr --cov-report=html

Roadmap

Core recording and playback
Time-travel resume
FastAPI server with WebSocket
LangGraph integration
Async recorder and player
Terminal TUI debugger (vcr-tui)
CI/CD integrations
React dashboard
CrewAI integration
ACID Transactions (git-backed filesystem rollback)
Golden Run Cache (zero-cost replay of successful runs)
AutoGen integration
Cloud storage backend
Collaborative debugging

License

MIT License — see LICENSE for details.

Acknowledgments

Inspired by:

LangSmith — the observability paradigm
GDB — the time-travel debugging concept
Chrome DevTools — the UX patterns

Built by the Agent VCR community

Release history

0.6.0 (this version): Apr 6, 2026
0.5.0: Mar 25, 2026
0.4.0: Mar 21, 2026
0.3.2: Mar 8, 2026
0.3.1: Mar 8, 2026
0.3.0: Mar 7, 2026
0.2.0: Mar 7, 2026
0.1.1: Mar 6, 2026
