A mock LLM server for testing. Drop-in replacement for OpenAI and Anthropic APIs.
fakellm
Run your LLM tests offline. Free, fast, deterministic.
A mock server that speaks the OpenAI and Anthropic APIs. Point your test code at it and your tests stop being slow, expensive, and flaky.
```shell
pip install fakellm
fakellm init
fakellm serve
```
Then in your tests:
```python
import os

os.environ["OPENAI_BASE_URL"] = "http://localhost:9999/v1"
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:9999"
# your existing code runs unchanged
```
Why
Three bad options exist for testing LLM code today:
- Hit the real API — slow, expensive, flaky.
- Mock by hand — brittle, drifts, doesn't exercise streaming or tool-call code paths.
- Record-and-replay cassettes — go stale, blow up when prompts change.
fakellm is a fourth option. A local server that returns plausible responses in the right shape, controlled by a small YAML file. Same prompt → same response, every time.
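The "same prompt → same response" guarantee can be sketched in a few lines. This is an illustration of the idea, not fakellm's actual internals: derive the fallback reply from a hash of the canonicalized request, so identical inputs always map to identical outputs. The function name `fallback_response` is ours, purely for illustration.

```python
import hashlib
import json

def fallback_response(model: str, messages: list) -> str:
    # Illustrative only: hash the canonicalized request so the same
    # prompt always maps to the same reply, with no state to go stale.
    digest = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    return f"deterministic response {digest[:12]}"

a = fallback_response("gpt-4o-mini", [{"role": "user", "content": "hi"}])
b = fallback_response("gpt-4o-mini", [{"role": "user", "content": "hi"}])
assert a == b  # same prompt, same response, every time
```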
Configure
`fakellm.yaml`:

```yaml
version: 1
defaults:
  fallback: deterministic_echo
rules:
  - name: greeting
    when:
      messages_contain: "hello"
    respond:
      content: "Hi there!"
  - name: weather_tool
    when:
      tools_include: get_weather
    respond:
      tool_calls:
        - name: get_weather
          arguments: { location: "San Francisco" }
  - name: only_haiku
    when:
      model_matches: "*haiku*"
    respond:
      content: "Short response."
  - name: rate_limit_test
    when:
      header.x-test-scenario: rate_limit
    respond:
      status: 429
      error: "Rate limit exceeded"
```
Rules are walked top-to-bottom. First match wins. If nothing matches, you get a stable fingerprint response — same input gives the same output, forever.
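The matching semantics above can be sketched as a small evaluator. This is not fakellm's real implementation, just the described behavior in miniature: walk the rules in order, return the first match, and fall back otherwise (a plain string stands in for the fingerprint response here).

```python
import fnmatch

def match_rule(rule: dict, request: dict) -> bool:
    # Two of the matchers from the config above, for illustration.
    when = rule["when"]
    if "messages_contain" in when:
        return any(when["messages_contain"] in m.get("content", "")
                   for m in request["messages"])
    if "model_matches" in when:
        return fnmatch.fnmatch(request["model"], when["model_matches"])
    return False

def respond(rules: list, request: dict) -> dict:
    for rule in rules:                    # walked top-to-bottom
        if match_rule(rule, request):
            return rule["respond"]        # first match wins
    return {"content": "fallback"}        # stand-in for the fingerprint response

rules = [
    {"name": "greeting", "when": {"messages_contain": "hello"},
     "respond": {"content": "Hi there!"}},
    {"name": "only_haiku", "when": {"model_matches": "*haiku*"},
     "respond": {"content": "Short response."}},
]
req = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hello"}]}
print(respond(rules, req)["content"])  # → Hi there!
```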
Use with the OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9999/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)  # → "Hi there!"
```
Use with the Anthropic SDK
```python
from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:9999", api_key="not-needed")
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=100,
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.content[0].text)
```
Streaming works
Both APIs stream chunks the way the real ones do. Your streaming code paths get exercised.
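In OpenAI's streaming format, each chunk carries its text in `choices[0].delta.content`. A typical consumer accumulates the deltas like this (plain dicts stand in for SDK chunk objects, so the sketch runs without a server):

```python
def collect(chunks: list) -> str:
    # Accumulate streamed text deltas into the full response.
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},       # first chunk: role only
    {"choices": [{"delta": {"content": "Hi "}}]},
    {"choices": [{"delta": {"content": "there!"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]}, # final chunk: no content
]
print(collect(chunks))  # → Hi there!
```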
Simulate failures
Set a header in your test to trigger a specific scenario:
```python
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    extra_headers={"x-test-scenario": "rate_limit"},
)
# raises a 429 just like the real API
```
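The mechanics behind a `header.*` matcher can be sketched as a lookup from scenario name to error response. Names here (`SCENARIOS`, `scenario_response`) are illustrative, not fakellm's internals:

```python
# Map of test scenarios to (status code, error message), mirroring the
# rate_limit_test rule in the config above. Illustrative only.
SCENARIOS = {
    "rate_limit": (429, "Rate limit exceeded"),
}

def scenario_response(headers: dict):
    # Returns (status, message) if a known scenario header is present,
    # None otherwise (meaning: fall through to the normal rules).
    scenario = headers.get("x-test-scenario")
    return SCENARIOS.get(scenario)

status, message = scenario_response({"x-test-scenario": "rate_limit"})
assert status == 429
```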
Dashboard
Visit http://localhost:9999/_fakellm to see which rules are matching and which requests are falling through. Useful for tightening up your config.
What's in v0.1
- OpenAI `/v1/chat/completions` and Anthropic `/v1/messages`
- Streaming for both
- Matchers: `messages_contain`, `model_matches`, `tools_include`, `header.*`
- Tool call responses
- Error/status code responses
- Deterministic fallback
- Live dashboard
Roadmap
- Multi-turn response sequences for agentic tests
- Recorded fixture mode (point at real API, capture, replay)
- pytest plugin with inline rule definitions
- More matchers (semantic similarity, JSON schema)
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file fakellm-0.1.0.tar.gz.
File metadata
- Download URL: fakellm-0.1.0.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9e4b90d459f98cf8302aef84ed30a4556569153f2e71629fcdc8a0724ee04491` |
| MD5 | `0ead35c321f48c6dc8cc1a1b7dddb66a` |
| BLAKE2b-256 | `56582051de128da864bc6a66cc43eba539688413a0b308a46520d6a53736e0b8` |
File details
Details for the file fakellm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fakellm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2d32e759015593397a2d65be815a965c018315947e7205ce2b84fbaa8911fb69` |
| MD5 | `5bd1c37e1f691fc48d59ea18c6f971ac` |
| BLAKE2b-256 | `e4eeee6626a98b7e312daf155be93b8856fe1de036e5d576c15aebbc69b5b4f3` |