A mock LLM server for testing. Drop-in replacement for OpenAI and Anthropic APIs.

These details have not been verified by PyPI

Project links

Project description

fakellm

A mock OpenAI/Anthropic server for testing LLM apps without burning API credits.

fakellm speaks the OpenAI and Anthropic HTTP APIs and returns whatever responses you tell it to. Point your code at it instead of the real APIs in tests, CI, and local development. Define behavior in a YAML file — including multi-turn agent flows where turn 1 returns a tool call, turn 2 returns a summary, and turn N returns whatever you want.

pip install fakellm
fakellm init      # creates fakellm.yaml
fakellm serve     # starts on http://127.0.0.1:9999

Then point your client at it:

from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:9999/v1", api_key="not-used")

Why fakellm

Testing code that calls an LLM is annoying. Real APIs cost money and rate-limit you. Recording-and-replay tools (VCR-style) go stale and can't cover error paths. unittest.mock.patch works for unit tests but falls apart the moment you have an agent that loops through tool calls.

fakellm fits between those:

	Real API in tests	`unittest.mock`	VCR-style replay	fakellm
Free / fast	❌	✅	✅	✅
Multi-turn agent flows	✅	painful	❌	✅
Test error paths (429, 500, malformed)	hard to trigger	✅	❌	✅
Test streaming	✅	painful	partial	✅
No code changes vs. production	✅	❌	✅	✅
Shareable across services / languages	n/a	❌	❌	✅

Multi-turn agents in 20 lines (new in 0.2)

Most mock servers can answer "what does turn N look like in isolation." fakellm can describe a whole agent flow as data:

fakellm.yaml

rules:
  # Turn 1: user asks for research → return a tool call
  - name: kickoff_research
    when:
      turn: 1
      messages_contain: "research"
    respond:
      tool_calls:
        - name: web_search
          arguments: {query: "fakellm"}

  # Turn 2: tool result came back → return a summary
  - name: summarize_results
    when:
      turn: 2
      tool_result_contains: "found"
    respond:
      content: "Based on the search, I found what you were looking for."

test_my_agent.py

import httpx
import pytest
from openai import OpenAI

@pytest.fixture(autouse=True)
def reset_fakellm():
    httpx.post("http://127.0.0.1:9999/_fakellm/reset")

def test_agent_handles_search():
    client = OpenAI(base_url="http://127.0.0.1:9999/v1", api_key="not-used")
    result = run_my_agent(client, prompt="Please research fakellm")
    assert "found what you were looking for" in result

That's it. No mocks, no recordings, no real API calls. The agent loop runs end-to-end against fakellm and you assert on the output.

Features

Speaks both APIs. Drop-in replacement for https://api.openai.com/v1 and https://api.anthropic.com/v1 — same request shapes, same response shapes, same SSE streaming formats.
Rules engine. Match requests on prompt content, model name, tools, headers, conversation turn, previous message role/content, or tool-result content. First match wins.
Multi-turn aware. Conversations are tracked across requests so rules can fire on "turn 2 after a tool result mentioned X."
Tool/function calls. Mock tool calls in either OpenAI or Anthropic shape, including streaming chunked arguments.
Streaming. Both data: ... SSE for OpenAI and the typed event sequence (message_start, content_block_delta, etc.) for Anthropic.
Error injection. Per-rule status codes for 4xx/5xx testing.
Live dashboard. Visit http://127.0.0.1:9999/_fakellm to see request history, matched rules, and active conversations.
Hot reload. POST /_fakellm/reload re-reads the YAML without restarting.

Installation

pip install fakellm

Requires Python 3.10+.

Quickstart

fakellm init       # creates fakellm.yaml in the current directory
fakellm serve      # starts the server on 127.0.0.1:9999

Edit fakellm.yaml to add rules. Either restart the server or curl -X POST http://127.0.0.1:9999/_fakellm/reload to pick up changes.

Endpoints

LLM-compatible

Method	Path	Purpose
POST	`/v1/chat/completions`	OpenAI chat completions
POST	`/v1/messages`	Anthropic messages

Both support stream: true.

Admin

Method	Path	Purpose
GET	`/_fakellm`	HTML dashboard
GET	`/_fakellm/stats`	JSON: request counts, recent requests, conversations
GET	`/_fakellm/conversations`	JSON: turn count + tool results per conversation
POST	`/_fakellm/reload`	Re-read the YAML config
POST	`/_fakellm/reset`	Clear all conversation state

Every response also includes an X-Fakellm-Conversation-Id header so clients can see which conversation they were bucketed into.

Config reference

Top-level structure

version: 1

defaults:
  fallback: deterministic_echo  # what to return when no rule matches

rules:
  - name: my_rule
    when: { ... }      # conditions (all must match)
    respond: { ... }   # what to return

Conditions (`when:`)

All conditions in a when: block must match for the rule to fire. Rules are evaluated top-to-bottom; first match wins. A rule with no when: block matches everything.

Condition	Type	Description
`messages_contain`	string	Case-insensitive substring across all message content.
`model_matches`	glob	e.g. `gpt-4`, `claude--sonnet-*`.
`tools_include`	string	Match if a tool with this name is defined in the request.
`turn`	int	Match the Nth turn of this conversation (1-indexed).
`turn_in`	`[low, high]`	Match a turn in this inclusive range.
`previous_message_role`	string	Role of the message immediately before the latest one (`user`, `assistant`, `tool`).
`previous_message_contains`	string	Substring match on the previous message's text.
`tool_result_contains`	string	Match if any tool result — in this request or earlier in this conversation — contains the substring.
`header.<name>`	string	Match a request header (e.g. `header.x-test-scenario: rate_limit`).

Responses (`respond:`)

Key	Type	Description
`content`	string	Assistant text content.
`tool_calls`	list	List of `{name, arguments}` to return as tool calls.
`status`	int	HTTP status. Default 200. Set to 4xx/5xx for error responses.
`error`	string	Error message body (used when `status >= 400`).

If neither content nor tool_calls is set, fakellm returns a deterministic echo response derived from a hash of the request — useful for "I just need some response" tests.

Conversations

A conversation is identified by a stable hash of the first user message in the request. Adding more turns doesn't change the ID, so the same conversation keeps the same ID across all its turns.

To override the ID (useful in tests where you want explicit control), send the X-Fakellm-Conversation-Id header with any value you want:

client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    extra_headers={"X-Fakellm-Conversation-Id": "test-session-42"},
)

Between tests, call POST /_fakellm/reset to clear all conversation state. Stats and request history are preserved.

CLI

fakellm init                  # create fakellm.yaml
fakellm serve                 # start the server
fakellm serve --port 8080     # custom port
fakellm serve --config x.yaml # custom config path
fakellm serve --reload        # auto-reload on code changes (dev only)

Caveats

Single-worker only. fakellm stores config and conversation state in process memory; running with multiple uvicorn workers will partition that state across workers. Stick with the default single worker.
Token counts are approximate (len(text) // 4) by default. Install the accurate extra for tiktoken-based counts: pip install fakellm[accurate]. (Coming in 0.3.)
Not for production traffic. fakellm is built for tests; it's not a production-ready proxy.

Contributing

Issues and PRs welcome. See CONTRIBUTING.md.

License

See LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.3.5

May 22, 2026

0.3.4

May 22, 2026

This version

0.3.3

May 22, 2026

0.3.2

May 22, 2026

0.3.1

May 20, 2026

0.3.0

May 19, 2026

0.2.0

May 6, 2026

0.1.1

May 2, 2026

0.1.0

Apr 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fakellm-0.3.3.tar.gz (34.0 kB view details)

Uploaded May 22, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

fakellm-0.3.3-py3-none-any.whl (26.8 kB view details)

Uploaded May 22, 2026 Python 3

File details

Details for the file fakellm-0.3.3.tar.gz.

File metadata

Download URL: fakellm-0.3.3.tar.gz
Upload date: May 22, 2026
Size: 34.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for fakellm-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`286877a60d54b0cd87f822c029d392cd90494a745584921564514026271c06d7`
MD5	`d77a91b2ee7b5119e0e0329d445317d1`
BLAKE2b-256	`9435f715fe57b4bba658e5db491ebb7e436431c1113f31e34c7ccbd126ee3219`

See more details on using hashes here.

File details

Details for the file fakellm-0.3.3-py3-none-any.whl.

File metadata

Download URL: fakellm-0.3.3-py3-none-any.whl
Upload date: May 22, 2026
Size: 26.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for fakellm-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0e527823da372eab1df520a1e4760045d4775079e45aea5e4c3dcc9c7b8f3966`
MD5	`4db25eedf3c92bef787415534e9fb920`
BLAKE2b-256	`8287a3c9fae748d93482133be8cbee3681f2127a03d49adec94340a2568c6a18`

See more details on using hashes here.

fakellm 0.3.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

fakellm

Why fakellm

Multi-turn agents in 20 lines (new in 0.2)

Features

Installation

Quickstart

Endpoints

LLM-compatible

Admin

Config reference

Top-level structure

Conditions (when:)

Responses (respond:)

Conversations

CLI

Caveats

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Conditions (`when:`)

Responses (`respond:`)