Zero-config pytest plugin for mocking LLM APIs - OpenAI, Anthropic, Gemini, LangChain & more
pytest-mockllm
Zero-config LLM mocking for pytest: test AI apps without the AI bills
Quick Start • Features • Providers • Examples • Recording • Configuration
Why pytest-mockllm?
Testing LLM applications is painful:
- Expensive: every test run burns API credits
- Slow: API calls add seconds to your test suite
- Non-deterministic: same input, different output = flaky tests
- Requires API keys: CI needs secrets, local dev needs setup
pytest-mockllm fixes all of this with zero configuration:
# Just use the fixture; it works immediately!
def test_my_chatbot(mock_openai):
    mock_openai.add_response("Hello! I'm here to help.")
    response = my_chatbot.chat("Hi there!")
    assert "help" in response.lower()
    assert mock_openai.call_count == 1
No setup. No API keys. No costs. Just fast, reliable tests.
Quick Start
Installation
pip install pytest-mockllm
That's it! The plugin is auto-discovered by pytest.
Your First Test
def test_customer_support_bot(mock_openai):
    # Configure the mock response
    mock_openai.add_response("I can help you with your order. What's your order number?")

    # Your actual code that uses OpenAI
    from openai import OpenAI
    client = OpenAI(api_key="fake-key")  # Key doesn't matter!
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "I need help with my order"}]
    )

    # Assert on the response
    assert "order number" in response.choices[0].message.content.lower()

    # Assert on what was called
    assert mock_openai.call_count == 1
    assert mock_openai.last_call["model"] == "gpt-4o"
Features
Zero Configuration
Fixtures are auto-discovered. Just use them.
Multi-Provider Support
OpenAI, Anthropic, Google Gemini: one consistent API.
Streaming Support
Full support for streaming responses, just like the real APIs.
LangChain & LlamaIndex
Native integration with popular LLM frameworks.
Response Recording
VCR-style recording for golden tests.
Cost & Token Tracking
Assert on costs before they become production surprises.
Chaos Testing
Simulate rate limits, timeouts, and API errors.
Type Safe
Full type hints and mypy compliance.
Providers
OpenAI
def test_openai(mock_openai):
    mock_openai.add_response("The answer is 42")

    # Works with the official OpenAI SDK
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the meaning of life?"}]
    )
    assert response.choices[0].message.content == "The answer is 42"
Anthropic
def test_anthropic(mock_anthropic):
    mock_anthropic.add_response("I'd be happy to help!")

    from anthropic import Anthropic
    client = Anthropic(api_key="fake")
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello Claude!"}]
    )
    assert "happy" in response.content[0].text
Google Gemini
def test_gemini(mock_gemini):
    mock_gemini.add_response("Here's what I found...")

    import google.generativeai as genai
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content("Tell me about AI")
    assert "found" in response.text
LangChain
def test_langchain(mock_langchain):
    mock_langchain.add_response("Paris is the capital of France.")

    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    llm = ChatOpenAI(model="gpt-4o", api_key="fake")
    prompt = ChatPromptTemplate.from_template("What is the capital of {country}?")
    chain = prompt | llm
    result = chain.invoke({"country": "France"})
    assert "Paris" in result.content
Examples
Multiple Responses (Conversation)
def test_conversation(mock_openai):
    mock_openai.add_responses(
        "Hi! How can I help you today?",
        "I can definitely help with that order.",
        "Your order has been updated. Anything else?",
    )

    # First call
    response1 = chatbot.send("Hello")
    assert "help" in response1

    # Second call
    response2 = chatbot.send("I need to change my order")
    assert "order" in response2

    # Third call
    response3 = chatbot.send("Change quantity to 5")
    assert "updated" in response3
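The queued behavior above can be pictured as a first-in, first-out list of canned replies. The sketch below is a stand-alone illustration of that idea, not the plugin's implementation. `QueuedMock`, `complete`, and the default reply are hypothetical names chosen for the example.

```python
from collections import deque


class QueuedMock:
    """Toy stand-in that hands out canned replies in the order they were added."""

    def __init__(self, default: str = "mock response") -> None:
        self._queue: deque[str] = deque()
        self._default = default
        self.calls: list[str] = []

    def add_responses(self, *replies: str) -> None:
        self._queue.extend(replies)

    def complete(self, prompt: str) -> str:
        # Record the prompt so tests can assert on call count and arguments.
        self.calls.append(prompt)
        # Fall back to the default once the queue is exhausted.
        return self._queue.popleft() if self._queue else self._default


mock = QueuedMock()
mock.add_responses("Hi! How can I help you today?", "Your order has been updated.")
first = mock.complete("Hello")
second = mock.complete("Change quantity to 5")
third = mock.complete("Thanks")  # queue is empty, so the default is returned
```

Whether the real fixture falls back to a default or raises when the queue runs dry depends on its configuration (see Strict Mode), so treat the fallback here as an assumption.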
Streaming Responses
def test_streaming(mock_openai):
    mock_openai.add_response("This is a streaming response that comes in chunks")

    from openai import OpenAI
    client = OpenAI(api_key="fake")
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content
    assert "streaming" in full_response
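To see why the accumulation loop above recovers the full text, it helps to model a stream as one canned string cut into pieces. This is a minimal sketch under assumptions: `stream_chunks` is a hypothetical helper, and splitting on words two at a time is an arbitrary choice, since the plugin's actual chunking is not documented here.

```python
from collections.abc import Iterator


def stream_chunks(text: str, words_per_chunk: int = 2) -> Iterator[str]:
    """Yield the text in small pieces, preserving the spaces between words."""
    words = text.split(" ")
    for start in range(0, len(words), words_per_chunk):
        piece = " ".join(words[start:start + words_per_chunk])
        # Re-attach the separator that split() removed, except after the last piece.
        if start + words_per_chunk < len(words):
            piece += " "
        yield piece


source = "This is a streaming response that comes in chunks"
chunks = list(stream_chunks(source))
full_response = "".join(chunks)
```

Because every separator is re-attached, concatenating the chunks reproduces the source string exactly, which is the property a streaming test usually relies on.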
Function/Tool Calling
def test_function_calling(mock_openai):
    from pytest_mockllm.core import MockResponse

    mock_openai._responses.append(MockResponse(
        content="",
        tool_calls=[{
            "id": "call_123",
            "function": {
                "name": "get_weather",
                "arguments": {"location": "San Francisco", "unit": "celsius"},
            },
        }],
    ))

    # Your function-calling logic here
    # ...
    assert mock_openai.last_call is not None
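The "function-calling logic" placeholder usually means looking up the named tool and calling it with the supplied arguments. The sketch below uses the same tool-call shape as the example above. `get_weather`, `TOOLS`, and `dispatch_tool_call` are hypothetical names. Responses from the real OpenAI SDK carry `arguments` as a JSON string, so the helper accepts either form.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical tool; a real one would query a weather service.
    return f"Weather in {location} reported in {unit}"


# Registry mapping tool names to the Python callables that implement them.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict[str, Any]) -> Any:
    """Look up the requested function and call it with the supplied arguments."""
    function = tool_call["function"]
    arguments = function["arguments"]
    if isinstance(arguments, str):
        # JSON-encoded arguments, as the real API returns them.
        arguments = json.loads(arguments)
    return TOOLS[function["name"]](**arguments)


result = dispatch_tool_call({
    "id": "call_123",
    "function": {
        "name": "get_weather",
        "arguments": {"location": "San Francisco", "unit": "celsius"},
    },
})
```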
Token Usage & Cost Assertions
def test_stays_within_budget(mock_openai):
    from pytest_mockllm.core import TokenUsage, estimate_cost

    mock_openai.add_response(
        "A detailed response...",
        token_usage=TokenUsage(prompt_tokens=500, completion_tokens=1000),
    )

    # Your LLM call here
    result = my_function()

    # Assert token usage
    assert mock_openai.total_tokens < 2000
    assert mock_openai.total_completion_tokens < 1500

    # Assert cost (for gpt-4o)
    cost = estimate_cost(
        "gpt-4o",
        mock_openai.total_prompt_tokens,
        mock_openai.total_completion_tokens,
    )
    assert cost < 0.05  # Less than 5 cents
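For intuition about the 5-cent threshold, the arithmetic behind a cost estimate can be worked by hand. The prices below are assumptions for illustration only, and `cost_usd` is a hypothetical helper; the price table that `estimate_cost` actually uses is not shown here.

```python
# Assumed prices in USD per one million tokens; check your provider's current rates.
PROMPT_PRICE_PER_M = 2.50
COMPLETION_PRICE_PER_M = 10.00


def cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Linear cost model: each token type is billed at its own per-token rate."""
    return (
        prompt_tokens * PROMPT_PRICE_PER_M
        + completion_tokens * COMPLETION_PRICE_PER_M
    ) / 1_000_000


cost = cost_usd(prompt_tokens=500, completion_tokens=1000)
# 500 * 2.50 + 1000 * 10.00 = 11250, and 11250 / 1,000,000 = 0.01125 USD
```

Under these assumed prices the mocked call costs a little over one cent, comfortably inside the budget asserted above.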
Error Simulation (Chaos Testing)
import pytest


def test_handles_rate_limit(mock_openai):
    mock_openai.simulate_error("rate_limit", after_calls=2)
    mock_openai.add_responses("OK", "OK")

    # First two calls succeed
    assert my_function() == "OK"
    assert my_function() == "OK"

    # Third call hits rate limit
    with pytest.raises(Exception):
        my_function()


def test_handles_timeout(mock_openai):
    mock_openai.simulate_error("timeout")

    # Should trigger your retry logic
    with pytest.raises(TimeoutError):
        my_function()
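The "retry logic" these tests exercise lives in your own code. One possible shape is sketched below, with a stand-in client that fails a fixed number of times. `call_with_retries` and `FlakyClient` are hypothetical names, and the backoff delay defaults to zero so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Call func, retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the timeout
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class FlakyClient:
    """Fails with TimeoutError a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def ask(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated timeout")
        return "OK"


client = FlakyClient(failures=2)
answer = call_with_retries(client.ask)  # two timeouts, then success
```

A test that expects `TimeoutError` to propagate, like `test_handles_timeout` above, corresponds to the case where the failures outnumber the allowed attempts.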
Strict Mode
import pytest


def test_catches_unconfigured_calls(mock_openai):
    mock_openai.set_strict_mode(True)

    # This will raise an error because no response is configured
    with pytest.raises(RuntimeError, match="No mock response configured"):
        my_function_that_calls_llm()
Recording & Replay
Record real API responses once, replay them forever, like VCR for LLMs.
Record Mode
# First run: hits real API and saves response
import pytest


@pytest.mark.llm_record
def test_with_recording(llm_recorder):
    from openai import OpenAI
    client = OpenAI()  # Uses real API key from environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Use cheap model for recording
        messages=[{"role": "user", "content": "Say hello!"}]
    )
    assert response.choices[0].message.content
Run with recording:
pytest tests/test_example.py --llm-record
Replay Mode
# Subsequent runs: uses saved response (no API key needed!)
import pytest


@pytest.mark.llm_replay
def test_with_recording(llm_recorder):
    # Same test code, but now served from the cached response
    ...
Cassette Storage
Responses are saved in tests/llm_cassettes/ as YAML:
name: test_with_recording
version: "1.0"
created: 1703270400
interactions:
  - request:
      model: gpt-4o-mini
      messages:
        - role: user
          content: Say hello!
    response:
      content: "Hello! How can I assist you today?"
      model: gpt-4o-mini
      provider: openai
      latency_ms: 523
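The record-then-replay idea can be sketched with a cassette keyed by a hash of the request. This is an illustration under assumptions, not the plugin's storage code: it writes JSON rather than YAML to stay within the standard library, and `Cassette`, `request_key`, and `fake_api` are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

Messages = list[dict[str, str]]


def request_key(model: str, messages: Messages) -> str:
    """Stable fingerprint of a request, so identical requests share one entry."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Cassette:
    def __init__(self, path: Path) -> None:
        self.path = path
        # Load earlier recordings if the file already exists.
        self.entries: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get_or_record(
        self, model: str, messages: Messages, call_api: Callable[[str, Messages], str]
    ) -> str:
        key = request_key(model, messages)
        if key not in self.entries:
            # Record mode: call the API once and persist the answer.
            self.entries[key] = call_api(model, messages)
            self.path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[key]


api_calls: list[str] = []


def fake_api(model: str, messages: Messages) -> str:
    # Stands in for the real provider so the example needs no network access.
    api_calls.append(model)
    return "Hello! How can I assist you today?"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_with_recording.json"
    request = [{"role": "user", "content": "Say hello!"}]
    recorded = Cassette(path).get_or_record("gpt-4o-mini", request, fake_api)
    # A fresh Cassette reads the file, so the API is not called again.
    replayed = Cassette(path).get_or_record("gpt-4o-mini", request, fake_api)
```

Hashing the model together with the messages means that changing either one produces a new entry instead of silently replaying a stale answer.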
Configuration
CLI Options
# Enable recording mode
pytest --llm-record
# Custom cassette directory
pytest --llm-cassette-dir=my_cassettes
# Strict mode (fail if any LLM call is unconfigured)
pytest --llm-strict
Markers
import pytest


@pytest.mark.llm_mock(provider="anthropic")
def test_with_anthropic(mock_llm):
    # mock_llm is now an AnthropicMock
    pass


@pytest.mark.llm_record
def test_records_responses(llm_recorder):
    pass


@pytest.mark.llm_replay
def test_replays_responses(llm_recorder):
    pass
pytest.ini / pyproject.toml
[tool.pytest.ini_options]
# Default cassette directory
llm_cassette_dir = "tests/fixtures/llm"
# Always run in strict mode
llm_strict = true
Comparison
| Feature | pytest-mockllm | unittest.mock | responses | vcrpy |
|---|---|---|---|---|
| Zero config | ✅ | ❌ | ❌ | ❌ |
| pytest fixtures | ✅ | ❌ | ❌ | ❌ |
| OpenAI support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Anthropic support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Gemini support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| LangChain support | ✅ Native | 🟡 Complex | ❌ | 🟡 Limited |
| Streaming | ✅ | 🟡 Manual | ❌ | 🟡 Complex |
| Token tracking | ✅ | ❌ | ❌ | ❌ |
| Cost estimation | ✅ | ❌ | ❌ | ❌ |
| Recording/Replay | ✅ | ❌ | ❌ | ✅ |
| Error simulation | ✅ | 🟡 Manual | 🟡 HTTP | ❌ |
Roadmap
- Async/await support improvements
- More providers (Cohere, Mistral, Together AI)
- pytest-xdist compatibility
- Response fuzzing for robustness testing
- Integration with LangSmith for debugging
- Automatic prompt regression detection
Contributing
We'd love your help! See CONTRIBUTING.md for guidelines.
# Clone the repo
git clone https://github.com/godhiraj-code/pytest-mockllm.git
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check src/
mypy src/
License
MIT License. See LICENSE for details.
Stop paying for tests. Start shipping faster.