Litmus
Record and deterministically replay AI agent executions.
Litmus captures every LLM and tool call your agent makes, then replays them deterministically — same inputs, same outputs, no real API calls. Debug production failures, test resilience with fault injection, and gate deploys with reliability scoring.
pip install litmus-trace
Quick Start — Zero Code Changes
# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py
# Replay deterministically (no API key needed, no real calls made)
litmus run --replay ./traces/lt-abc123.trace.json python my_agent.py
# What happens when the LLM refuses? Times out? Returns an error?
litmus run --replay trace.json --fault llm_refuse:step=0 python my_agent.py
Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.
What It Does
Record — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file.
Replay — Feeds recorded responses back to your agent. The agent runs the same code path — same tool calls, same final output — without hitting any real API. No API key needed.
Fault Injection — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out? Find out without waiting for it to happen in production.
CI Gating — Score your trace corpus for reliability and block deploys that drop below a threshold.
litmus ci ./traces --threshold 85
# Exit code 1 if score < 85 — blocks the deploy
Three Ways to Use It
1. CLI Wrapper (recommended — zero code changes)
litmus run python my_agent.py
2. One-Line Python API
import litmus
litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
3. Proxy Mode (any language, advanced use)
litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
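Conceptually, record mode sits between your agent and the API: every call is forwarded upstream and the request/response pair is appended to a trace. The sketch below illustrates that idea only — it is not Litmus's implementation, and the `forward`/`trace` names are invented for illustration.

```python
# Illustrative sketch of record mode: forward each call upstream and
# capture the request/response pair. Not Litmus's actual code.

def make_recording_forwarder(forward, trace):
    """Wrap a forwarding function so every call is captured in `trace`."""
    def recorded(request):
        response = forward(request)  # real upstream call
        trace.append({"request": request, "response": response})
        return response
    return recorded

# Usage with a stand-in upstream (a real proxy would forward over HTTP):
trace = []
fake_upstream = lambda req: {"status": 200, "body": "echo:" + req["body"]}
call = make_recording_forwarder(fake_upstream, trace)
call({"body": "hello"})
assert trace[0]["response"]["status"] == 200
```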
Fault Injection
Test how your agent handles failures — before they happen in production.
# LLM refuses to help
litmus run --replay trace.json --fault llm_refuse:step=0 python agent.py
# LLM returns a 500 error
litmus replay trace.json --fault llm_error:step=0
# LLM times out
litmus replay trace.json --fault llm_timeout:step=0
# LLM hallucinates (returns plausible but wrong answer)
litmus replay trace.json --fault llm_hallucinate:step=1
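A fault like `llm_refuse` can be thought of as a mutation applied to the recorded response before it is served back to the agent. The sketch below models that idea with a simplified, hypothetical response schema — the fault names mirror the CLI, but the trace format and mutation logic here are illustrative, not Litmus's internals.

```python
# Hedged sketch: fault injection as a mutation of a recorded response.
# The response schema ("status", "content", "stop_reason") is assumed
# for illustration, not Litmus's actual trace format.
import copy

def apply_fault(step: dict, fault: str) -> dict:
    mutated = copy.deepcopy(step)  # leave the recorded step untouched
    if fault == "llm_refuse":
        mutated["response"]["content"] = "I can't help with that request."
    elif fault == "llm_error":
        mutated["response"] = {"status": 500, "error": "internal_server_error"}
    elif fault == "llm_timeout":
        mutated["response"] = {"error": "timeout"}
    else:
        raise ValueError(f"unknown fault: {fault}")
    return mutated

recorded = {"response": {"status": 200, "content": "The answer is 42."}}
faulty = apply_fault(recorded, "llm_error")
assert faulty["response"]["status"] == 500
assert recorded["response"]["status"] == 200  # original trace is preserved
```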
CI/CD Integration
# Score all traces — exit non-zero if below threshold
litmus ci ./traces --threshold 85
# Verbose output with per-trace breakdown
litmus ci ./traces --threshold 80 --verbose
# JSON output for pipeline parsing
litmus ci ./traces --threshold 85 --json-output report.json
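In a pipeline you can also gate on the JSON report yourself. The field names below (`overall_score`, `traces`) are assumed for illustration — check the actual `report.json` produced by `litmus ci --json-output` for the real schema.

```python
# Hedged sketch of consuming a JSON reliability report in CI.
# The report schema here is hypothetical.
import json

def gate(report: dict, threshold: float) -> int:
    """Return a process exit code: 0 pass, 1 fail (mirrors `litmus ci`)."""
    for t in report.get("traces", []):
        print(f"{t['file']}: {t['score']}")
    return 0 if report["overall_score"] >= threshold else 1

report = json.loads(
    '{"overall_score": 82.5, "traces": [{"file": "lt-abc123.trace.json", "score": 82.5}]}'
)
exit_code = gate(report, threshold=85)
assert exit_code == 1  # 82.5 < 85, deploy blocked
```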
Litmus scores each trace across three dimensions:
- Correctness — did the agent complete without errors?
- Resilience — how does it handle faults?
- Efficiency — reasonable call count, no infinite loops?
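One plausible way to combine the three dimensions into a single 0-100 score is a weighted average — the weights below are a hypothetical choice for illustration, not Litmus's actual formula.

```python
# Hypothetical composite score over the three dimensions.
# Weights are illustrative; Litmus's real scoring may differ.

def reliability_score(correctness: float, resilience: float,
                      efficiency: float, weights=(0.5, 0.3, 0.2)) -> float:
    """Each dimension is in [0, 100]; weights sum to 1."""
    parts = (correctness, resilience, efficiency)
    return round(sum(w * p for w, p in zip(weights, parts)), 1)

score = reliability_score(correctness=100, resilience=70, efficiency=90)
assert score == 89.0  # 50 + 21 + 18
```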
Supported Providers
Works with any LLM API out of the box:
| Provider | Status |
|---|---|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |
Custom/self-hosted models:
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
CLI Reference
| Command | Description |
|---|---|
| `litmus run` | Wrap a command to record/replay (zero code changes) |
| `litmus proxy` | Start the recording/replay proxy server |
| `litmus replay` | Replay a trace with optional fault injection |
| `litmus view` | Pretty-print a trace file |
| `litmus ci` | Score traces and gate deploys |
| `litmus providers` | List all supported providers |
How It Works
Litmus monkey-patches the httpx transport layer used by both Anthropic and OpenAI Python SDKs. When you call client.messages.create(...), Litmus intercepts the HTTP request before it leaves your machine.
Record mode: The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
Replay mode: The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.
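The replay side of this can be pictured as a cursor over the recorded responses: each call the agent makes consumes the next response in order, and a call past the end of the recording is an error. This is a minimal mechanism sketch, not Litmus's actual transport code.

```python
# Minimal sketch of deterministic, sequential replay: recorded responses
# are served back in order. Illustrative only -- Litmus's real
# interception happens at the httpx transport layer.

class ReplaySource:
    def __init__(self, recorded_responses):
        self._responses = list(recorded_responses)
        self._index = 0

    def next_response(self):
        if self._index >= len(self._responses):
            raise RuntimeError(
                "trace exhausted: agent made more calls than were recorded")
        response = self._responses[self._index]
        self._index += 1
        return response

replay = ReplaySource([{"content": "call tool A"}, {"content": "done"}])
assert replay.next_response()["content"] == "call tool A"
assert replay.next_response()["content"] == "done"
```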
Security
- API keys (`Authorization`, `x-api-key`) are automatically redacted from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
Limitations
- Python only — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- httpx-based SDKs — works with SDKs that use `httpx` under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs using `requests` or `aiohttp` are not intercepted.
- Sequential replay — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- No tool call recording — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.
Community
- Discord — chat, bugs, feature requests
- GitHub Issues — bug reports
- PyPI — package
Why Litmus?
Observability tools (LangSmith, Langfuse) tell you what happened. They log traces.
Litmus tells you what would happen. Record a production trace, replay it 100 times with different faults, and know exactly how your agent breaks — before your users find out.
LangSmith is the dashcam. Litmus is the crash test facility.
License
MIT