Record and replay LLM API calls for deterministic testing
llm-mock
Record real LLM responses once, replay them in tests forever — no API key required, no cost, no non-determinism.
# Record once against the real API (run locally with your API key)
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = my_pipeline("Summarize this document...")

# Replay in tests: no API key, no cost, deterministic
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = my_pipeline("Summarize this document...")
    assert "key points" in result
Why
- API calls during tests are expensive. A CI run hitting real LLM APIs can cost dollars per run at scale.
- LLM outputs are non-deterministic. Even at temperature=0, responses can vary across model versions.
- Your production code stays untouched. llm-mock intercepts at the HTTP transport layer, so no changes to application code are required.
llm-mock records and replays at the structured request level (model + messages + temperature), stores human-readable JSON fixtures, and integrates natively with pytest.
Installation
# PyPI release coming soon — install from source for now:
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
pip install -e .
Runtime dependencies: httpx, respx, pydantic
How to use
Your production code — untouched
# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text
pipeline.py has zero knowledge of llm-mock. No imports, no changes needed.
Step 1 — Record (run once, locally)
Create a small script or a dedicated test that runs with mode="record". You need a real API key for this step.
# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)  # real response from the API

ANTHROPIC_API_KEY=sk-... python record_fixtures.py
This creates tests/fixtures/summarize.json. Commit this file to git.
Step 2 — Replay (in tests, forever)
Use the pytest decorator — no with block needed inside the test:
# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

pytest  # no API key needed, runs offline, instant
The decorator auto-discovers the fixture path relative to the test file — fixture="summarize" looks for tests/fixtures/summarize.json when the test lives in tests/.
llm-mock intercepts the httpx call the Anthropic SDK makes internally and returns the saved response — your test code calls summarize() exactly as it would in production.
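The path lookup described above can be pictured with a short sketch. `resolve_fixture_path` is a hypothetical helper written for illustration, not llm-mock's actual code; it assumes fixtures live in a `fixtures/` directory next to the test file and that `.json` is appended when the name has no extension.

```python
from pathlib import Path


def resolve_fixture_path(test_file: str, fixture: str) -> Path:
    """Hypothetical helper: map a short fixture name to a JSON file path.

    Assumption: fixtures sit in a `fixtures/` directory beside the test file,
    and `.json` is added when the name does not already end with it.
    """
    name = fixture if fixture.endswith(".json") else f"{fixture}.json"
    return Path(test_file).parent / "fixtures" / name


# A test in tests/ asking for "summarize" resolves under tests/fixtures/.
path = resolve_fixture_path("tests/test_pipeline.py", "summarize")
print(path.as_posix())
```

If a replay test cannot find its fixture, checking the resolved path against where the record step wrote the file is a sensible first step.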
Alternative: use the context manager directly if you need more control:
from llm_mock import llm_mock
from my_app.pipeline import summarize

def test_summarize():
    with llm_mock(mode="replay", fixture="tests/fixtures/summarize"):
        result = summarize("Long article about climate change...")
    assert "climate" in result
Step 3 — Re-record when things change
If you change the prompt, update the model, or want to refresh fixtures:
ANTHROPIC_API_KEY=sk-... python record_fixtures.py # overwrites old fixture
git add tests/fixtures/summarize.json
git commit -m "refresh summarize fixture"
Quick start (direct API usage)
A complete working example from scratch.
1. Install
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
2. Save your API key
echo 'export ANTHROPIC_API_KEY=sk-ant-api03-...' > .env
echo '.env' >> .gitignore
3. Create a record script
Create try_record.py:
import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic()

with llm_mock(mode="record", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )

print("Response:", message.content[0].text)
print("Fixture saved to fixtures/hello.json")
4. Run it
source .env && .venv/bin/python try_record.py
You should see the real response printed and fixtures/hello.json created.
5. Verify the fixture
llm-mock list fixtures/hello
6. Replay without an API key
Create try_replay.py:
import anthropic
from llm_mock import llm_mock

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

with llm_mock(mode="replay", fixture="fixtures/hello"):
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )

print("Replayed:", message.content[0].text)

.venv/bin/python try_replay.py
The exact same response is returned instantly — no network call made.
7. Write a test with the pytest decorator
# tests/test_hello.py
import anthropic
import pytest

client = anthropic.Anthropic(api_key="fake-key")

@pytest.mark.llm_replay(fixture="hello")
def test_hello():
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=64,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    assert message.content[0].text  # replayed from fixtures/hello.json

.venv/bin/pytest tests/test_hello.py -v
CLI
Inspect and manage fixture files from the terminal.
Note: activate your virtual environment first so llm-mock is on your PATH:
source .venv/bin/activate
Or run it directly with .venv/bin/llm-mock <command>.
llm-mock list <fixture>
Show all recorded interactions in a fixture file:
$ llm-mock list tests/fixtures/summarize
Fixture : tests/fixtures/summarize.json
Provider: anthropic
Interactions: 2
1. a3f2c1d4e5b6… claude-sonnet-4-6 2026-04-23T10:00:00
"Summarize this document about climate change..."
2. b4g3d2e5f6c7… claude-haiku-4-5-20251001 2026-04-24T11:00:00
"What is the capital of France?"
llm-mock clear <fixture>
Delete an entire fixture file:
llm-mock clear tests/fixtures/summarize
Delete a single interaction by hash:
llm-mock clear tests/fixtures/summarize --hash a3f2c1d4e5b6
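Because fixtures are plain JSON, the same overview can be produced from Python when a scripted check is more convenient than the CLI. The sketch below assumes the layout shown under "Fixture file format"; `load_fixture` and `describe_interactions` are hypothetical helpers, not part of llm-mock, and the output format is only an approximation of what `llm-mock list` prints.

```python
import json
from pathlib import Path


def load_fixture(path: str) -> dict:
    """Read a fixture file from disk (path is supplied by the caller)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def describe_interactions(fixture: dict) -> list[str]:
    """Return one summary line per recorded interaction.

    Assumption: each interaction has "hash", "request" and "recorded_at"
    keys, as in the documented fixture format.
    """
    lines = []
    for i, item in enumerate(fixture.get("interactions", []), start=1):
        request = item.get("request", {})
        messages = request.get("messages", [])
        # Preview the first message; content is assumed to be a plain string.
        preview = str(messages[0].get("content", ""))[:60] if messages else ""
        lines.append(
            f"{i}. {item.get('hash', '')[:12]} {request.get('model', '?')} "
            f"{item.get('recorded_at', '?')} {preview!r}"
        )
    return lines
```

For example, `print("\n".join(describe_interactions(load_fixture("tests/fixtures/summarize.json"))))` would print one line per interaction in that file.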
How it works
Record mode:
Your code → Anthropic/OpenAI SDK → httpx
→ llm-mock intercepts → forwards to real API
→ saves response to fixture JSON
→ returns response to your code
Replay mode:
Your code → Anthropic/OpenAI SDK → httpx
→ llm-mock intercepts → looks up fixture by SHA256(model + messages + temperature)
→ returns saved response — no network call made
Request matching uses SHA256 of (model, messages, temperature). Same request always hits the same fixture entry. Different temperature or different message content → different fixture entry.
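To make the matching rule concrete, here is a minimal sketch of deriving a lookup key from the three fields. The serialization used here (sorted-key compact JSON) is an assumption; llm-mock's actual canonical form may differ, so keys produced by this hypothetical `request_key` function are not guaranteed to match the hashes in real fixture files.

```python
import hashlib
import json


def request_key(model: str, messages: list, temperature=None) -> str:
    """Hypothetical key function: SHA256 over (model, messages, temperature).

    Assumption: the fields are serialized as compact JSON with sorted keys
    so that dict ordering does not change the digest.
    """
    payload = {"model": model, "messages": messages, "temperature": temperature}
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


msgs = [{"role": "user", "content": "Say hello."}]
same = request_key("claude-sonnet-4-6", msgs) == request_key("claude-sonnet-4-6", msgs)
changed = request_key("claude-sonnet-4-6", msgs) != request_key(
    "claude-sonnet-4-6", msgs, temperature=0.7
)
print(same, changed)
```

One practical consequence of hash-based matching: any change to the prompt text, model name, or temperature produces a new key, so the affected fixture has to be re-recorded.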
API reference
llm_mock(mode, fixture, provider="all")
Context manager that activates record or replay mode.
| Parameter | Type | Description |
|---|---|---|
| mode | "record" or "replay" | Whether to hit the real API and save, or return from fixture |
| fixture | str | Path to the fixture file. .json extension added automatically if omitted |
| provider | "anthropic", "openai", or "all" | Which provider(s) to intercept. Default: "all" |
from llm_mock import llm_mock

with llm_mock(mode="replay", fixture="tests/fixtures/my_test", provider="anthropic"):
    ...
Exceptions
| Exception | When raised |
|---|---|
| FixtureNotFoundError | Replay mode: fixture file missing, or no matching hash in file |
| FixtureParseError | Fixture file exists but contains invalid JSON |
import anthropic
from llm_mock import llm_mock, FixtureNotFoundError

client = anthropic.Anthropic(api_key="fake-key")  # key is irrelevant in replay

try:
    with llm_mock(mode="replay", fixture="tests/fixtures/missing"):
        client.messages.create(...)
except FixtureNotFoundError as e:
    print(e)  # includes hint to run in record mode first
Fixture file format
Fixture files are plain JSON — readable, diffable, committable.
{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 64
      },
      "response": {
        "id": "msg_01XYZ",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
        "model": "claude-sonnet-4-6",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 9}
      },
      "recorded_at": "2026-04-23T10:00:00+00:00"
    }
  ]
}
Multiple interactions (from different requests) are stored in the same file. Re-recording an existing hash overwrites only that entry.
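The overwrite-by-hash behaviour can be sketched as an upsert on the `interactions` list. This is an illustration under the documented file format, not llm-mock's implementation; `upsert_interaction` is a hypothetical name.

```python
from datetime import datetime, timezone


def upsert_interaction(fixture: dict, key: str, request: dict, response: dict) -> dict:
    """Insert a recorded interaction, replacing any entry with the same hash.

    Assumption: `fixture` follows the documented layout, with entries stored
    under "interactions" and identified by their "hash" field.
    """
    entry = {
        "hash": key,
        "request": request,
        "response": response,
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    interactions = fixture.setdefault("interactions", [])
    for i, existing in enumerate(interactions):
        if existing.get("hash") == key:
            interactions[i] = entry  # same hash: overwrite only this entry
            break
    else:
        interactions.append(entry)  # new hash: keep existing entries, add one
    return fixture
```

Replacing the entry in place keeps the other interactions and their order untouched, which keeps git diffs of a re-recorded fixture small.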
Supported providers
| Provider | Intercepted endpoint | Status |
|---|---|---|
| Anthropic | api.anthropic.com/v1/messages | Supported |
| OpenAI | api.openai.com/v1/chat/completions | Supported |
| Streaming (stream=True) | n/a | Planned for v1.1 |
Comparison
| Tool | Record mode | Native SDK support | In-process |
|---|---|---|---|
| llm-mock | yes | yes (Anthropic + OpenAI) | yes |
| llm_recorder | yes | no (LiteLLM only) | yes |
| AIMock | no | yes | no (HTTP server) |
| vcr-langchain | yes | no (LangChain only) | yes |
Development
git clone https://github.com/autopost/llm-mock.git
cd llm-mock
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
Roadmap
- v0.2: auto mode, disable via env var (LLM_MOCK_DISABLED)
- v1.1: streaming support
- v2: shared fixtures for teams, semantic matching, web dashboard