Litmus
Record and deterministically replay AI agent executions.
Litmus captures every LLM and tool call your agent makes, then replays them deterministically — same inputs, same outputs, no real API calls. Debug production failures, test resilience with fault injection, and gate deploys with reliability scoring.
pip install litmus-trace
Quick Start — Zero Code Changes
# Record your agent (wraps the process, captures all LLM calls)
litmus run python my_agent.py
# Replay deterministically (no API key needed, no real calls made)
litmus run --replay ./traces/lt-abc123.trace.json python my_agent.py
# What happens when the LLM refuses? Times out? Returns an error?
litmus run --replay trace.json --fault llm_refuse:step=0 python my_agent.py
Your agent code stays completely unchanged. Litmus patches the SDK transport layer at runtime.
What It Does
Record — Intercepts every HTTP call to LLM APIs (Anthropic, OpenAI, Mistral, 14+ providers). Saves the full request and response as a trace file.
Replay — Feeds recorded responses back to your agent. The agent runs the same code path — same tool calls, same final output — without hitting any real API. No API key needed.
Fault Injection — Mutate recorded responses to test resilience. What happens when Claude refuses? When GPT returns a 500? When the API times out? Find out without waiting for it to happen in production.
CI Gating — Score your trace corpus for reliability and block deploys that drop below a threshold.
litmus ci ./traces --threshold 85
# Exit code 1 if score < 85 — blocks the deploy
Three Ways to Use It
1. CLI Wrapper (recommended — zero code changes)
litmus run python my_agent.py
2. One-Line Python API
import litmus
litmus.record()
# ... your existing agent code, unchanged ...
litmus.stop()
3. Proxy Mode (any language, advanced use)
litmus proxy --mode record
# Then point your SDK:
ANTHROPIC_BASE_URL=http://localhost:8787/anthropic python my_agent.py
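Conceptually, record mode sits between your agent and the API: every call is forwarded upstream and the request/response pair is appended to a trace. The sketch below illustrates that idea only — it is not Litmus's implementation, and the `forward`/`trace` names are invented for illustration.

```python
# Illustrative sketch of record mode: forward each call upstream and
# capture the request/response pair. Not Litmus's actual code.

def make_recording_forwarder(forward, trace):
    """Wrap a forwarding function so every call is captured in `trace`."""
    def recorded(request):
        response = forward(request)  # real upstream call
        trace.append({"request": request, "response": response})
        return response
    return recorded

# Usage with a stand-in upstream (a real proxy would forward over HTTP):
trace = []
fake_upstream = lambda req: {"status": 200, "body": "echo:" + req["body"]}
call = make_recording_forwarder(fake_upstream, trace)
call({"body": "hello"})
assert trace[0]["response"]["status"] == 200
```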
Fault Injection
Test how your agent handles failures — before they happen in production.
# LLM refuses to help
litmus run --replay trace.json --fault llm_refuse:step=0 python agent.py
# LLM returns a 500 error
litmus replay trace.json --fault llm_error:step=0
# LLM times out
litmus replay trace.json --fault llm_timeout:step=0
# LLM hallucinates (returns plausible but wrong answer)
litmus replay trace.json --fault llm_hallucinate:step=1
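A fault like `llm_refuse` can be thought of as a mutation applied to the recorded response before it is served back to the agent. The sketch below models that idea with a simplified, hypothetical response schema — the fault names mirror the CLI, but the trace format and mutation logic here are illustrative, not Litmus's internals.

```python
# Hedged sketch: fault injection as a mutation of a recorded response.
# The response schema ("status", "content", "stop_reason") is assumed
# for illustration, not Litmus's actual trace format.
import copy

def apply_fault(step: dict, fault: str) -> dict:
    mutated = copy.deepcopy(step)  # leave the recorded step untouched
    if fault == "llm_refuse":
        mutated["response"]["content"] = "I can't help with that request."
    elif fault == "llm_error":
        mutated["response"] = {"status": 500, "error": "internal_server_error"}
    elif fault == "llm_timeout":
        mutated["response"] = {"error": "timeout"}
    else:
        raise ValueError(f"unknown fault: {fault}")
    return mutated

recorded = {"response": {"status": 200, "content": "The answer is 42."}}
faulty = apply_fault(recorded, "llm_error")
assert faulty["response"]["status"] == 500
assert recorded["response"]["status"] == 200  # original trace is preserved
```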
CI/CD Integration
# Score all traces — exit non-zero if below threshold
litmus ci ./traces --threshold 85
# Verbose output with per-trace breakdown
litmus ci ./traces --threshold 80 --verbose
# JSON output for pipeline parsing
litmus ci ./traces --threshold 85 --json-output report.json
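In a pipeline you can also gate on the JSON report yourself. The field names below (`overall_score`, `traces`) are assumed for illustration — check the actual `report.json` produced by `litmus ci --json-output` for the real schema.

```python
# Hedged sketch of consuming a JSON reliability report in CI.
# The report schema here is hypothetical.
import json

def gate(report: dict, threshold: float) -> int:
    """Return a process exit code: 0 pass, 1 fail (mirrors `litmus ci`)."""
    for t in report.get("traces", []):
        print(f"{t['file']}: {t['score']}")
    return 0 if report["overall_score"] >= threshold else 1

report = json.loads(
    '{"overall_score": 82.5, "traces": [{"file": "lt-abc123.trace.json", "score": 82.5}]}'
)
exit_code = gate(report, threshold=85)
assert exit_code == 1  # 82.5 < 85, deploy blocked
```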
Litmus scores each trace across three dimensions:
- Correctness — did the agent complete without errors?
- Resilience — how does it handle faults?
- Efficiency — reasonable call count, no infinite loops?
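One plausible way to combine the three dimensions into a single 0-100 score is a weighted average — the weights below are a hypothetical choice for illustration, not Litmus's actual formula.

```python
# Hypothetical composite score over the three dimensions.
# Weights are illustrative; Litmus's real scoring may differ.

def reliability_score(correctness: float, resilience: float,
                      efficiency: float, weights=(0.5, 0.3, 0.2)) -> float:
    """Each dimension is in [0, 100]; weights sum to 1."""
    parts = (correctness, resilience, efficiency)
    return round(sum(w * p for w, p in zip(weights, parts)), 1)

score = reliability_score(correctness=100, resilience=70, efficiency=90)
assert score == 89.0  # 50 + 21 + 18
```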
Supported Providers
Works with any LLM API out of the box:
| Provider | Status |
|---|---|
| Anthropic (Claude) | Tested |
| OpenAI (GPT) | Tested |
| Google (Gemini) | Supported |
| Mistral | Supported |
| Cohere | Supported |
| Groq | Supported |
| Together AI | Supported |
| Fireworks AI | Supported |
| DeepSeek | Supported |
| Perplexity | Supported |
| OpenRouter | Supported |
| Ollama (local) | Supported |
| vLLM (local) | Supported |
| LM Studio (local) | Supported |
Custom/self-hosted models:
litmus proxy --provider my-model=https://my-finetuned-llama.example.com/v1
CLI Reference
| Command | Description |
|---|---|
| `litmus run` | Wrap a command to record/replay (zero code changes) |
| `litmus proxy` | Start the recording/replay proxy server |
| `litmus replay` | Replay a trace with optional fault injection |
| `litmus view` | Pretty-print a trace file |
| `litmus ci` | Score traces and gate deploys |
| `litmus providers` | List all supported providers |
How It Works
Litmus monkey-patches the httpx transport layer used by both Anthropic and OpenAI Python SDKs. When you call client.messages.create(...), Litmus intercepts the HTTP request before it leaves your machine.
Record mode: The real API call goes through. Litmus captures the request and response, then saves them to a trace file. API keys are automatically redacted.
Replay mode: The real API is never called. Litmus serves the recorded response directly from the trace file. Your agent gets the exact same response it got during recording — same tool calls, same content, same stop reason.
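The replay side of this can be pictured as a cursor over the recorded responses: each call the agent makes consumes the next response in order, and a call past the end of the recording is an error. This is a minimal mechanism sketch, not Litmus's actual transport code.

```python
# Minimal sketch of deterministic, sequential replay: recorded responses
# are served back in order. Illustrative only -- Litmus's real
# interception happens at the httpx transport layer.

class ReplaySource:
    def __init__(self, recorded_responses):
        self._responses = list(recorded_responses)
        self._index = 0

    def next_response(self):
        if self._index >= len(self._responses):
            raise RuntimeError(
                "trace exhausted: agent made more calls than were recorded")
        response = self._responses[self._index]
        self._index += 1
        return response

replay = ReplaySource([{"content": "call tool A"}, {"content": "done"}])
assert replay.next_response()["content"] == "call tool A"
assert replay.next_response()["content"] == "done"
```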
Security
- API keys (`Authorization`, `x-api-key`) are automatically redacted from trace headers
- Use `--compact` to strip request bodies for smaller trace files
- Note: message content in request/response bodies is NOT redacted — don't include secrets in your prompts
Limitations
- Python only — the monkey-patch approach (`litmus run`, `litmus.record()`) requires Python. Use proxy mode for other languages.
- httpx-based SDKs — works with SDKs that use `httpx` under the hood (Anthropic, OpenAI, Mistral, Cohere, etc.). SDKs using `requests` or `aiohttp` are not intercepted.
- Sequential replay — responses are served in recorded order. Agents that make calls in a different order on replay will get mismatched responses.
- No tool call recording — only LLM API calls are captured. External tool calls (database, HTTP APIs) are not recorded.
Community
- Discord — chat, bugs, feature requests
- GitHub Issues — bug reports
- PyPI — package
Why Litmus?
Observability tools (LangSmith, Langfuse) tell you what happened. They log traces.
Litmus tells you what would happen. Record a production trace, replay it 100 times with different faults, and know exactly how your agent breaks — before your users find out.
LangSmith is the dashcam. Litmus is the crash test facility.
License
MIT