🚀 Zero-config pytest plugin for mocking LLM APIs - OpenAI, Anthropic, Gemini, LangChain & more

🧪 pytest-mockllm

🚀 Zero-config LLM mocking for pytest — Test AI apps without the AI bills

Quick Start • Features • Providers • Examples • Recording • Configuration

Why pytest-mockllm?

Testing LLM applications is painful:

💸 Expensive — Every test run burns API credits
🐢 Slow — API calls add seconds to your test suite
🎲 Non-deterministic — Same input, different output = flaky tests
🔒 Requires API keys — CI needs secrets, local dev needs setup

pytest-mockllm fixes all of this with zero configuration:

# Just use the fixture — it works immediately!
def test_my_chatbot(mock_openai):
    mock_openai.add_response("Hello! I'm here to help.")
    
    response = my_chatbot.chat("Hi there!")
    
    assert "help" in response.lower()
    assert mock_openai.call_count == 1

No setup. No API keys. No costs. Just fast, reliable tests.
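
The `my_chatbot` object in the test above is not defined in this README. As a rough sketch, it could be a thin wrapper around an OpenAI-style client. The `Chatbot` class, its `chat` method, and the `make_stub_client` helper below are hypothetical names; the stub client is there only so the snippet runs without the `openai` package and is not part of pytest-mockllm.

```python
from types import SimpleNamespace


class Chatbot:
    """Hypothetical application code: wraps any OpenAI-style client."""

    def __init__(self, client, model="gpt-4o"):
        self.client = client
        self.model = model

    def chat(self, user_message):
        # Same call shape as the OpenAI SDK's chat.completions.create
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": user_message}],
        )
        return response.choices[0].message.content


def make_stub_client(reply):
    """Stand-in client for illustration only (hypothetical helper)."""

    def create(**kwargs):
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


my_chatbot = Chatbot(make_stub_client("Hello! I'm here to help."))
reply = my_chatbot.chat("Hi there!")
```

Passing the client in through the constructor keeps the wrapper easy to exercise with any object that has the same call shape.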

🚀 Quick Start

Installation

pip install pytest-mockllm

That's it! The plugin is auto-discovered by pytest.

Your First Test

def test_customer_support_bot(mock_openai):
    # Configure the mock response
    mock_openai.add_response("I can help you with your order. What's your order number?")
    
    # Your actual code that uses OpenAI
    from openai import OpenAI
    client = OpenAI(api_key="fake-key")  # Key doesn't matter!
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "I need help with my order"}]
    )
    
    # Assert on the response
    assert "order number" in response.choices[0].message.content.lower()
    
    # Assert on what was called
    assert mock_openai.call_count == 1
    assert mock_openai.last_call["model"] == "gpt-4o"

✨ Features

🎯 Zero Configuration

Fixtures are auto-discovered. Just use them.

🤖 Multi-Provider Support

OpenAI, Anthropic, Google Gemini — one consistent API.

🌊 Streaming Support

Full support for streaming responses, just like the real APIs.

🔧 LangChain & LlamaIndex

Native integration with popular LLM frameworks.

📼 Response Recording

VCR-style recording for golden tests.

💰 Cost & Token Tracking

Assert on costs before they become production surprises.

⚡ Chaos Testing

Simulate rate limits, timeouts, and API errors.

🔒 Type Safe

Full type hints and mypy compliance.

🤖 Providers

OpenAI

def test_openai(mock_openai):
    mock_openai.add_response("The answer is 42")
    
    # Works with the official OpenAI SDK
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the meaning of life?"}]
    )
    
    assert response.choices[0].message.content == "The answer is 42"

Anthropic

def test_anthropic(mock_anthropic):
    mock_anthropic.add_response("I'd be happy to help!")
    
    from anthropic import Anthropic
    client = Anthropic(api_key="fake")
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello Claude!"}]
    )
    
    assert "happy" in response.content[0].text

Google Gemini

def test_gemini(mock_gemini):
    mock_gemini.add_response("Here's what I found...")
    
    import google.generativeai as genai
    model = genai.GenerativeModel("gemini-1.5-pro")
    
    response = model.generate_content("Tell me about AI")
    
    assert "found" in response.text

LangChain

def test_langchain(mock_langchain):
    mock_langchain.add_response("Paris is the capital of France.")
    
    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    
    llm = ChatOpenAI(model="gpt-4o", api_key="fake")
    prompt = ChatPromptTemplate.from_template("What is the capital of {country}?")
    chain = prompt | llm
    
    result = chain.invoke({"country": "France"})
    
    assert "Paris" in result.content

📚 Examples

Multiple Responses (Conversation)

def test_conversation(mock_openai):
    mock_openai.add_responses(
        "Hi! How can I help you today?",
        "I can definitely help with that order.",
        "Your order has been updated. Anything else?",
    )
    
    # First call
    response1 = chatbot.send("Hello")
    assert "help" in response1
    
    # Second call
    response2 = chatbot.send("I need to change my order")  
    assert "order" in response2
    
    # Third call
    response3 = chatbot.send("Change quantity to 5")
    assert "updated" in response3
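
The example above implies that queued responses are returned in the order they were added, one per call. The sketch below shows that first-in, first-out behaviour in isolation. `ResponseQueue` and `next_response` are illustrative names, not the plugin's actual classes or methods, and raising when the queue is empty is an assumption.

```python
from collections import deque


class ResponseQueue:
    """Illustrative stand-in for a mock that replays canned replies in order."""

    def __init__(self):
        self._pending = deque()
        self.call_count = 0

    def add_responses(self, *texts):
        # Append to the right so earlier responses are served first
        self._pending.extend(texts)

    def next_response(self):
        self.call_count += 1
        if not self._pending:
            raise RuntimeError("No mock response configured")
        return self._pending.popleft()


queue = ResponseQueue()
queue.add_responses(
    "Hi! How can I help you today?",
    "I can definitely help with that order.",
    "Your order has been updated. Anything else?",
)
replies = [queue.next_response() for _ in range(3)]
```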

Streaming Responses

def test_streaming(mock_openai):
    mock_openai.add_response("This is a streaming response that comes in chunks")
    
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content
    
    assert "streaming" in full_response
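
The chunk-joining loop in the test above is worth pulling into a helper if several tests need it. Here is a minimal sketch; `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only mimic the `choices[0].delta.content` shape so the snippet runs without the `openai` package.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of an OpenAI-style chunk iterator."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # skip None and empty deltas
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a chunk-shaped object for demonstration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [fake_chunk("This is a "), fake_chunk(None), fake_chunk("streaming response")]
full_response = collect_stream(chunks)
```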

Function/Tool Calling

def test_function_calling(mock_openai):
    from pytest_mockllm.core import MockResponse
    
    mock_openai._responses.append(MockResponse(
        content="",
        tool_calls=[{
            "id": "call_123",
            "function": {
                "name": "get_weather",
                "arguments": {"location": "San Francisco", "unit": "celsius"},
            },
        }],
    ))
    
    # Your function-calling logic here
    # ...
    
    assert mock_openai.last_call is not None
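
The "function-calling logic" placeholder above usually means looking up the requested function by name and calling it with the supplied arguments. A minimal sketch of that dispatch step follows; `get_weather`, `TOOLS`, and `dispatch_tool_call` are hypothetical application code, and the tool returns canned data. The real OpenAI API delivers `arguments` as a JSON string, so the sketch accepts both a string and the dict form used in the example above.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical tool: returns canned data instead of calling a weather API."""
    return {"location": location, "unit": unit, "temperature": 18}


TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(tool_call, tools=TOOLS):
    """Look up the requested function by name and call it with its arguments."""
    function = tool_call["function"]
    arguments = function["arguments"]
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # JSON-string form
    handler = tools.get(function["name"])
    if handler is None:
        raise KeyError(f"Unknown tool: {function['name']}")
    return handler(**arguments)


tool_call = {
    "id": "call_123",
    "function": {
        "name": "get_weather",
        "arguments": {"location": "San Francisco", "unit": "celsius"},
    },
}
result = dispatch_tool_call(tool_call)
```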

Token Usage & Cost Assertions

def test_stays_within_budget(mock_openai):
    from pytest_mockllm.core import TokenUsage, estimate_cost
    
    mock_openai.add_response(
        "A detailed response...",
        token_usage=TokenUsage(prompt_tokens=500, completion_tokens=1000),
    )
    
    # Your LLM call here
    result = my_function()
    
    # Assert token usage
    assert mock_openai.total_tokens < 2000
    assert mock_openai.total_completion_tokens < 1500
    
    # Assert cost (for gpt-4o)
    cost = estimate_cost(
        "gpt-4o",
        mock_openai.total_prompt_tokens,
        mock_openai.total_completion_tokens,
    )
    assert cost < 0.05  # Less than 5 cents
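
To make the budget assertion concrete, here is the arithmetic a per-token cost estimate typically performs. This is not the implementation of `estimate_cost`; `cost_usd` is a hypothetical helper, and the prices are placeholder values chosen for the example, so check your provider's current pricing before relying on them.

```python
def cost_usd(prompt_tokens, completion_tokens,
             input_price_per_million, output_price_per_million):
    """Cost in USD given token counts and per-million-token prices."""
    total = (prompt_tokens * input_price_per_million
             + completion_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder prices (USD per one million tokens), for illustration only
cost = cost_usd(
    prompt_tokens=500,
    completion_tokens=1000,
    input_price_per_million=2.50,
    output_price_per_million=10.00,
)
```

With these placeholder prices the 500 prompt tokens contribute 0.00125 USD and the 1000 completion tokens contribute 0.01 USD, so the total stays under the 0.05 USD budget used in the test.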

Error Simulation (Chaos Testing)

import pytest

def test_handles_rate_limit(mock_openai):
    mock_openai.simulate_error("rate_limit", after_calls=2)
    mock_openai.add_responses("OK", "OK")
    
    # First two calls succeed
    assert my_function() == "OK"
    assert my_function() == "OK"
    
    # Third call hits rate limit  
    with pytest.raises(Exception):
        my_function()

def test_handles_timeout(mock_openai):
    mock_openai.simulate_error("timeout")
    
    # Should trigger your retry logic
    with pytest.raises(TimeoutError):
        my_function()
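
Simulated errors are most useful when the code under test has retry logic to exercise. As a sketch of what that logic might look like, the helper below retries on `TimeoutError` with exponential backoff. `call_with_retry` and `flaky` are hypothetical names, and which exception types to retry on is an assumption about your application.

```python
import time


def call_with_retry(func, retries=3, base_delay=0.5, retry_on=(TimeoutError,)):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))


attempts = []


def flaky():
    """Fails twice with a timeout, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "OK"


# base_delay=0 keeps the demonstration instant
result = call_with_retry(flaky, retries=3, base_delay=0)
```

A test can then assert either that the call eventually succeeds or, when every attempt fails, that the final exception reaches the caller.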

Strict Mode

import pytest

def test_catches_unconfigured_calls(mock_openai):
    mock_openai.set_strict_mode(True)
    
    # This will raise an error because no response is configured
    with pytest.raises(RuntimeError, match="No mock response configured"):
        my_function_that_calls_llm()

📼 Recording & Replay

Record real API responses once, replay them forever — like VCR for LLMs.

Record Mode

# First run: hits the real API and saves the response
import pytest

@pytest.mark.llm_record
def test_with_recording(llm_recorder):
    from openai import OpenAI
    client = OpenAI()  # Uses real API key from environment
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Use cheap model for recording
        messages=[{"role": "user", "content": "Say hello!"}]
    )
    
    assert response.choices[0].message.content

Run with recording:

pytest tests/test_example.py --llm-record

Replay Mode

# Subsequent runs: uses saved response (no API key needed!)
import pytest

@pytest.mark.llm_replay
def test_with_recording(llm_recorder):
    # Same test code as in record mode, now served from the cached response
    ...

Cassette Storage

By default, responses are saved in tests/llm_cassettes/ as YAML:

name: test_with_recording
version: "1.0"
created: 1703270400
interactions:
  - request:
      model: gpt-4o-mini
      messages:
        - role: user
          content: Say hello!
    response:
      content: "Hello! How can I assist you today?"
      model: gpt-4o-mini
    provider: openai
    latency_ms: 523
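
This README does not describe how replay matches an incoming request to a recorded interaction. One common approach, sketched below under that caveat, is to fingerprint the request fields and compare fingerprints. `request_key` and `find_recorded_response` are hypothetical helpers, and the interaction is written as a plain dict mirroring the YAML above so the snippet needs no YAML parser.

```python
import hashlib
import json


def request_key(model, messages):
    """Build a stable fingerprint for a request (key order does not matter)."""
    canonical = json.dumps(
        {"model": model, "messages": messages},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_recorded_response(interactions, model, messages):
    """Return the recorded content for a matching request, or None."""
    wanted = request_key(model, messages)
    for interaction in interactions:
        request = interaction["request"]
        if request_key(request["model"], request["messages"]) == wanted:
            return interaction["response"]["content"]
    return None


interactions = [
    {
        "request": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Say hello!"}],
        },
        "response": {
            "content": "Hello! How can I assist you today?",
            "model": "gpt-4o-mini",
        },
        "provider": "openai",
        "latency_ms": 523,
    }
]

content = find_recorded_response(
    interactions, "gpt-4o-mini", [{"content": "Say hello!", "role": "user"}]
)
```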

🔧 Configuration

CLI Options

# Enable recording mode
pytest --llm-record

# Custom cassette directory
pytest --llm-cassette-dir=my_cassettes

# Strict mode (fail if any LLM call is unconfigured)
pytest --llm-strict

Markers

@pytest.mark.llm_mock(provider="anthropic")
def test_with_anthropic(mock_llm):
    # mock_llm is now an AnthropicMock
    pass

@pytest.mark.llm_record
def test_records_responses(llm_recorder):
    pass

@pytest.mark.llm_replay  
def test_replays_responses(llm_recorder):
    pass

pytest.ini / pyproject.toml

[tool.pytest.ini_options]
# Default cassette directory
llm_cassette_dir = "tests/fixtures/llm"

# Always run in strict mode
llm_strict = true

🆚 Comparison

Feature	pytest-mockllm	unittest.mock	responses	vcrpy
Zero config	✅	❌	❌	❌
pytest fixtures	✅	❌	✅	✅
OpenAI support	✅ Native	🟡 Manual	❌	🟡 HTTP
Anthropic support	✅ Native	🟡 Manual	❌	🟡 HTTP
Gemini support	✅ Native	🟡 Manual	❌	🟡 HTTP
LangChain support	✅ Native	🟡 Complex	❌	🟡 Limited
Streaming	✅	🟡 Manual	❌	🟡 Complex
Token tracking	✅	❌	❌	❌
Cost estimation	✅	❌	❌	❌
Recording/Replay	✅	❌	❌	✅
Error simulation	✅	🟡 Manual	🟡 HTTP	❌

🛣️ Roadmap

Async/await support improvements
More providers (Cohere, Mistral, Together AI)
pytest-xdist compatibility
Response fuzzing for robustness testing
Integration with LangSmith for debugging
Automatic prompt regression detection

🤝 Contributing

We'd love your help! See CONTRIBUTING.md for guidelines.

# Clone the repo and enter it
git clone https://github.com/godhiraj-code/pytest-mockllm.git
cd pytest-mockllm

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
mypy src/

📜 License

MIT License — see LICENSE for details.

Stop paying for tests. Start shipping faster.

⭐ Star us on GitHub • Built by Dhiraj Das

Release history

0.2.2 (Dec 22, 2025)
0.2.1 (Dec 22, 2025)
0.1.1 (Dec 22, 2025), the version described here
0.1.0 (Dec 22, 2025)
