Pytest fixtures for the fakellm mock OpenAI/Anthropic server — spin up, reset, and assert with zero boilerplate.
pytest-fakellm
Pytest fixtures for fakellm, the mock OpenAI/Anthropic server. Spin up a server, get a clean state per test, and assert on what your code sent — with zero boilerplate.
pip install pytest-fakellm
Once installed, the fixtures are available automatically — no imports, no
conftest.py setup.
The point
Without the plugin, using fakellm in a test means starting the server, wiring a client to its URL, resetting state, and tearing it all down yourself, in every test. With the plugin, that becomes:
def test_agent_handles_search(fakellm):
    fakellm.set_config_text("""
version: 1
rules:
  - name: summarize
    when: { messages_contain: "research" }
    respond: { content: "Based on the search, I found what you were looking for." }
""")
    result = run_my_agent(fakellm.openai_client(), prompt="Please research fakellm")
    assert "found what you were looking for" in result
    fakellm.assert_request_count(1)
The server starts once per session, state is reset before each test, and everything is torn down at the end. You never touch a port number or a subprocess.
Fixtures
| Fixture | What you get |
|---|---|
| fakellm | A FakellmServer handle with fresh conversation state for the test. |
| fakellm_openai | A ready openai.OpenAI client pointed at the (reset) server. |
| fakellm_anthropic | A ready anthropic.Anthropic client pointed at the (reset) server. |
| fakellm_logs | Opt-in. Dumps the server's output into the failure report only if the test fails, which is handy for debugging without cluttering passing runs. |
FakellmServer handle
Clients and URLs:
- openai_client(**kwargs) / anthropic_client(**kwargs): clients pointed at the server.
- openai_base_url / anthropic_base_url: raw URLs if you build your own client.
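If you build your own client from the raw URL, the request has the usual OpenAI shape. The sketch below uses only the standard library and assumes openai_base_url is the value you would pass as base_url to the OpenAI SDK, so the chat endpoint sits at /chat/completions beneath it. The helper build_chat_request, the example URL, and the placeholder API key are illustrative and not part of the plugin.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, model: str = "gpt-4o") -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST against base_url."""
    # Assumption: base_url is what the OpenAI SDK would take as base_url,
    # so the endpoint path is appended directly to it.
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key: a mock server is assumed not to validate it.
            "Authorization": "Bearer test-key",
        },
        method="POST",
    )


# Hypothetical URL; in a test you would pass fakellm.openai_base_url instead.
request = build_chat_request("http://127.0.0.1:8000/v1", "hello")
```

Inside a test that uses the fakellm fixture, sending it would then be a matter of calling urllib.request.urlopen(request).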
Configuring rules:
- set_config_text(yaml): write rules inline and reload.
- load_rules(path): load rules from a file and reload.
- reset(): clear conversation state (done for you between tests).
- reload(): re-read the config from disk.
Inspecting what happened:
- stats() / conversations(): the admin JSON, for assertions.
- request_count: absolute session total of requests seen.
- requests_since_reset: requests made during the current test (per-test count).
- tool_results_seen(): total tool results the server observed across all conversations.
Assertions (raise AssertionError with a readable message on failure):
- assert_request_count(expected): exactly expected requests were seen.
- assert_rule_matched(rule_name, min_times=1): a named config rule matched at least min_times times.
- assert_tool_results_seen(min_results=1): at least min_results tool results were fed back.
Error injection:
- set_error_simulation(status, error_message="...", *, when=None, name="..."): make the server return an HTTP error for matching requests.
See Assertions and error simulation for details.
Assertions and error simulation
Asserting on traffic
After your code runs, assert on what the server saw:
def test_agent_makes_one_call(fakellm):
    fakellm.set_config_text("""
version: 1
rules:
  - name: answer
    when: { messages_contain: "weather" }
    respond: { content: "It is sunny." }
""")
    run_my_agent(fakellm.openai_client(), prompt="what is the weather?")
    fakellm.assert_request_count(1)
    fakellm.assert_rule_matched("answer")
assert_rule_matched reads the per-rule match counts the server keeps in
stats()["by_rule"]. Requests that matched no rule are counted under
"<fallthrough>", so you can assert on those too.
Both assert_request_count and assert_rule_matched count only what happened
during the current test. fakellm's stats are cumulative for the whole server
process (a reset() clears conversations but not stats), so the fakellm
fixture records a baseline at the start of each test and these helpers measure
the delta from it. If you want the raw numbers, request_count is the absolute
session total and requests_since_reset is the per-test count.
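The baseline-and-delta idea can be sketched in a few lines. This is not the plugin's code: PerTestCounter and the "total" key are assumptions made for illustration, and only the "by_rule" mapping and the "<fallthrough>" bucket come from the description above.

```python
class PerTestCounter:
    """Turns cumulative server stats into per-test numbers via a baseline."""

    def __init__(self, fetch_stats):
        # fetch_stats is any callable returning the cumulative stats dict.
        self._fetch_stats = fetch_stats
        self._baseline = {"total": 0, "by_rule": {}}

    def mark_baseline(self):
        """Snapshot the cumulative totals; call this at the start of a test."""
        stats = self._fetch_stats()
        self._baseline = {"total": stats["total"], "by_rule": dict(stats["by_rule"])}

    def requests_since_baseline(self):
        return self._fetch_stats()["total"] - self._baseline["total"]

    def rule_matches_since_baseline(self, rule_name):
        now = self._fetch_stats()["by_rule"].get(rule_name, 0)
        return now - self._baseline["by_rule"].get(rule_name, 0)


# Simulated cumulative stats: earlier tests already made 3 requests.
# The "total" key is a hypothetical name for the overall request count.
server_stats = {"total": 3, "by_rule": {"answer": 2, "<fallthrough>": 1}}
counter = PerTestCounter(lambda: server_stats)
counter.mark_baseline()

# The current test makes one request that matches the "answer" rule.
server_stats["total"] += 1
server_stats["by_rule"]["answer"] += 1
```

Because the baseline is subtracted, the three earlier requests do not leak into the current test's counts, which is the behavior the assertion helpers rely on.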
Tool results
If your agent calls a tool and feeds the result back to the model, the server counts those tool results:
def test_agent_used_a_tool(fakellm):
    run_my_tool_using_agent(fakellm.openai_client(), prompt="search for X")
    fakellm.assert_tool_results_seen(1)
A deliberate limitation worth knowing: fakellm records only a count of
tool results per conversation — it does not retain or expose tool names. So
you can confirm that a tool result came back, but not which tool produced it.
There is intentionally no assert_tool_called("search"), because the server
exposes no data to implement it against. If you need to assert on a specific
tool, match on it in a rule (when: { tools_include: "search" }) and then use
assert_rule_matched on that rule's name.
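The rule-based workaround might look like the sketch below. The rule name used_search_tool, the response text, and the helper check_search_tool_was_involved are made up for illustration; run_agent stands for your own agent entry point, and exactly which requests tools_include matches is defined by fakellm itself, so treat the matcher's semantics here as an assumption.

```python
TOOL_RULES = """
version: 1
rules:
  - name: used_search_tool
    when: { tools_include: "search" }
    respond: { content: "Here is what the search returned." }
"""


def check_search_tool_was_involved(fakellm, run_agent):
    """Body of a test: fakellm is the fixture, run_agent is your own code."""
    fakellm.set_config_text(TOOL_RULES)
    run_agent(fakellm.openai_client(), prompt="search for X")
    # Passes only if at least one request matched the rule scoped to "search".
    fakellm.assert_rule_matched("used_search_tool")
```

In a real test module this body would live directly in a test function that takes the fakellm fixture as its argument.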
Simulating errors
To exercise your retry/back-off and error-handling paths, make the server return an HTTP error:
import openai
import pytest


def test_agent_retries_on_rate_limit(fakellm):
    # Every request now gets an HTTP 429 with this message.
    fakellm.set_error_simulation(429, "slow down")
    client = fakellm.openai_client()
    with pytest.raises(openai.RateLimitError):
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "hello"}],
        )
set_error_simulation works for both the OpenAI and Anthropic endpoints,
emitting the error in each API's native shape. status must be >= 400
(fakellm only treats those as errors). Pass a when= matcher dict to scope the
error to specific requests, e.g. set_error_simulation(503, "down", when={"messages_contain": "search"});
omit it to fail every request. The error message is YAML-serialized safely, so
quotes, colons, and newlines in the message won't corrupt the config.
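To see why safe serialization matters, here is one way a message could be quoted before it is written into a YAML config. This README does not show how the plugin does it, so this is only a sketch of the general technique: YAML 1.2 is designed as a superset of JSON, so a JSON string literal can be read as a YAML double-quoted scalar for ordinary text. The helper names and the status/message keys are hypothetical and are not fakellm's real schema.

```python
import json


def yaml_quote(text: str) -> str:
    """Return text as a double-quoted scalar that is safe to embed in YAML."""
    # json.dumps escapes quotes, backslashes and newlines, and the result
    # is also readable as a YAML double-quoted scalar.
    return json.dumps(text)


def render_error_fields(status: int, message: str) -> str:
    """Render a hypothetical two-line YAML fragment for an error response."""
    if status < 400:
        raise ValueError(f"status must be >= 400, got {status}")
    # The key names here are illustrative, not fakellm's real schema.
    return f"status: {status}\nmessage: {yaml_quote(message)}\n"


# A message containing a colon, quotes, and a newline.
tricky = 'rate limit: "slow down"\nretry later'
fragment = render_error_fields(429, tricky)
```

Interpolating the raw message instead would split it across lines and leave a stray colon in the value, which is the kind of corruption the safe serialization avoids.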
Surfacing server logs on failure
Add the fakellm_logs fixture to a test and, if that test fails, the
server's output for that test is attached to the failure report. Passing tests
stay quiet:
def test_something_tricky(fakellm, fakellm_logs):
    ...
    assert result == expected  # on failure, server logs appear in the report
Configuration
Set a starting config file via the command line:
pytest --fakellm-config=tests/fixtures/rules.yaml
or in pyproject.toml / pytest.ini:
[tool.pytest.ini_options]
fakellm_config = "tests/fixtures/rules.yaml"
If you don't set one, a temporary empty config is created so set_config_text
and load_rules work immediately.
--fakellm-startup-timeout (default 10.0) controls how long the fixture waits
for the server to come up.
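A startup timeout of this kind usually bounds a readiness poll. How the plugin checks readiness is not documented here, so the following is a generic sketch that assumes a plain TCP connect is enough to tell that the server is up; wait_for_port and its interval parameter are illustrative names.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.05) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Demo: a listening socket stands in for the server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]
came_up = wait_for_port("127.0.0.1", open_port, timeout=2.0)
listener.close()
```

A larger timeout only costs time when the server fails to start, so raising it on slow CI machines is usually harmless.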
Client extras
openai_client() and anthropic_client() require the respective SDKs. Install
what you need:
pip install "pytest-fakellm[openai]" # adds openai
pip install "pytest-fakellm[anthropic]" # adds anthropic
License
MIT