
Litmus

Record and deterministically replay AI agent executions.

Litmus captures every LLM and tool call your agent makes, then replays them deterministically — same inputs, same outputs, no real API calls. Debug production failures, test resilience with fault injection, and gate deploys with reliability scoring.

pip install litmus-trace

Quick Start — Zero Code Changes

# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py

# Replay deterministically (no API key needed, no real calls made)
litmus run --replay ./traces/lt-abc123.trace.json python my_agent.py

# What happens when the LLM refuses? Times out? Returns an error?
litmus run --replay trace.json --fault llm_refuse:step=0 python my_agent.py

Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.

What It Does

Record — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file.

Replay — Feeds recorded responses back to your agent. The agent runs the same code path — same tool calls, same final output — without hitting any real API. No API key needed.

Fault Injection — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out? Find out without waiting for it to happen in production.

CI Gating — Score your trace corpus for reliability and block deploys that drop below a threshold.

litmus ci ./traces --threshold 85
# Exit code 1 if score < 85 — blocks the deploy

Three Ways to Use It

1. CLI Wrapper (recommended — zero code changes)

litmus run python my_agent.py

2. One-Line Python API

import litmus

litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()

3. Proxy Mode (any language, advanced use)

litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
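For a Python process, the same redirection can be done in code before the SDK client is constructed. A minimal sketch (the `/anthropic` path prefix follows the example above; `proxied_base_url` is a hypothetical helper, not part of Litmus):

```python
import os

# Base address of a locally running `litmus proxy` instance.
PROXY = "http://localhost:8787"

def proxied_base_url(provider: str) -> str:
    """Build a per-provider base URL, e.g. http://localhost:8787/anthropic."""
    return f"{PROXY}/{provider}"

# The Anthropic Python SDK reads ANTHROPIC_BASE_URL at client construction,
# so setting it before creating the client routes traffic through the proxy.
os.environ["ANTHROPIC_BASE_URL"] = proxied_base_url("anthropic")
```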

Fault Injection

Test how your agent handles failures — before they happen in production.

# LLM refuses to help
litmus run --replay trace.json --fault llm_refuse:step=0 python agent.py

# LLM returns a 500 error
litmus replay trace.json --fault llm_error:step=0

# LLM times out
litmus replay trace.json --fault llm_timeout:step=0

# LLM hallucinates (returns plausible but wrong answer)
litmus replay trace.json --fault llm_hallucinate:step=1
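Conceptually, a fault is just a mutation applied to one recorded step before it is served back to the agent. As an illustration only (the trace shape below is simplified, not the real litmus-trace file format), an `llm_refuse` fault might rewrite a recorded response like this:

```python
import copy

def inject_refusal(trace: list, step: int) -> list:
    """Return a copy of the trace with the given step rewritten as a refusal.

    The {"response": {...}} entries here are a simplified illustration,
    not Litmus's actual trace schema.
    """
    mutated = copy.deepcopy(trace)
    resp = mutated[step]["response"]
    resp["content"] = [{"type": "text", "text": "I can't help with that request."}]
    resp["stop_reason"] = "end_turn"  # a refusal ends the turn: no tool calls follow
    return mutated

trace = [{"response": {"content": [{"type": "tool_use", "name": "search"}],
                       "stop_reason": "tool_use"}}]
faulty = inject_refusal(trace, step=0)
```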

CI/CD Integration

# Score all traces — exit non-zero if below threshold
litmus ci ./traces --threshold 85

# Verbose output with per-trace breakdown
litmus ci ./traces --threshold 80 --verbose

# JSON output for pipeline parsing
litmus ci ./traces --threshold 85 --json-output report.json

Traces are scored across three dimensions:

  • Correctness — did the agent complete without errors?
  • Resilience — how does it handle faults?
  • Efficiency — reasonable call count, no infinite loops?
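The gating contract is simple to reproduce in a pipeline script. A sketch, assuming equal weighting of the three dimensions (the real `litmus ci` scoring formula may weight them differently):

```python
def reliability_score(correctness: float, resilience: float, efficiency: float) -> float:
    """Combine per-dimension scores (0-100) into one number.

    Equal weights are an assumption for illustration only.
    """
    return (correctness + resilience + efficiency) / 3

def gate(score: float, threshold: float = 85) -> int:
    """Mirror `litmus ci`'s exit-code contract: 0 passes, 1 blocks the deploy."""
    return 0 if score >= threshold else 1

exit_code = gate(reliability_score(90, 80, 85), threshold=85)
```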

Supported Providers

Works out of the box with the major LLM APIs:

Provider              Status
--------------------  ---------
Anthropic (Claude)    Tested
OpenAI (GPT)          Tested
Google (Gemini)       Supported
Mistral               Supported
Cohere                Supported
Groq                  Supported
Together AI           Supported
Fireworks AI          Supported
DeepSeek              Supported
Perplexity            Supported
OpenRouter            Supported
Ollama (local)        Supported
vLLM (local)          Supported
LM Studio (local)     Supported

Custom/self-hosted models:

litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1

CLI Reference

litmus run          Wrap a command to record/replay (zero code changes)
litmus proxy        Start the recording/replay proxy server
litmus replay       Replay a trace with optional fault injection
litmus view         Pretty-print a trace file
litmus ci           Score traces and gate deploys
litmus providers    List all supported providers

How It Works

Litmus monkey-patches the httpx transport layer used by the Anthropic and OpenAI Python SDKs (and other httpx-based SDKs). When you call client.messages.create(...), Litmus intercepts the HTTP request before it leaves your machine.

Record mode: The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.

Replay mode: The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.
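Replay can be pictured as a transport stub that serves recorded responses in order. This is a simplified sketch of the idea, not Litmus's actual implementation; it also shows why replay is sequential, as noted under Limitations:

```python
class ReplayTransport:
    """Serve recorded responses in order instead of calling the real API.

    Illustrative only: real Litmus patches the httpx transport used by the
    SDK. Here `send` just returns the next recorded response.
    """

    def __init__(self, recorded: list):
        self._recorded = list(recorded)
        self._cursor = 0

    def send(self, request: dict) -> dict:
        if self._cursor >= len(self._recorded):
            raise RuntimeError("agent made more calls than the trace recorded")
        response = self._recorded[self._cursor]  # request is ignored: order decides
        self._cursor += 1
        return response

transport = ReplayTransport([{"content": "step one"}, {"content": "step two"}])
first = transport.send({"prompt": "anything"})
```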

Security

  • API keys (Authorization, x-api-key) are automatically redacted from trace headers
  • Use --compact to strip request bodies for smaller trace files
  • Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
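Header redaction is worth understanding if you post-process traces yourself. A sketch of the idea, assuming only the two headers named above are redacted (Litmus may cover more):

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key"}  # matched case-insensitively

def redact_headers(headers: dict) -> dict:
    """Replace sensitive header values before a trace is written to disk."""
    return {
        name: ("[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }

clean = redact_headers({"x-api-key": "sk-ant-123",
                        "content-type": "application/json"})
```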

Limitations

  • Python only — the monkey-patch approach (litmus run, litmus.record()) requires Python. Use proxy mode for other languages.
  • httpx-based SDKs — works with SDKs that use httpx under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs built on requests or aiohttp are not intercepted.
  • Sequential replay — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
  • No tool call recording — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.


Why Litmus?

Observability tools (LangSmith, Langfuse) tell you what happened. They log traces.

Litmus tells you what would happen. Record a production trace, replay it 100 times with different faults, and know exactly how your agent breaks — before your users find out.

LangSmith is the dashcam. Litmus is the crash test facility.

License

MIT

