
🚀 Zero-config pytest plugin for mocking LLM APIs - OpenAI, Anthropic, Gemini, LangChain & more

pytest-mockllm

🚀 Zero-config LLM mocking for pytest: test AI apps without the AI bills


Quick Start • Features • Providers • Examples • Recording • Docs


Why pytest-mockllm?

Testing LLM applications is painful:

  • 💸 Expensive: every test run burns API credits
  • 🐢 Slow: API calls add seconds to your test suite
  • 🎲 Non-deterministic: same input, different output = flaky tests
  • 🔒 Requires API keys: CI needs secrets, local dev needs setup

pytest-mockllm fixes all of this with zero configuration:

# Just use the fixture; it works immediately!
def test_my_chatbot(mock_openai):
    mock_openai.add_response("Hello! I'm here to help.")
    
    response = my_chatbot.chat("Hi there!")
    
    assert "help" in response.lower()
    assert mock_openai.call_count == 1

No setup. No API keys. No costs. Just fast, reliable tests.


🚀 Quick Start

Installation

pip install pytest-mockllm

That's it! The plugin is auto-discovered by pytest.

Your First Test

def test_customer_support_bot(mock_openai):
    # Configure the mock response
    mock_openai.add_response("I can help you with your order. What's your order number?")
    
    # Your actual code that uses OpenAI
    from openai import OpenAI
    client = OpenAI(api_key="fake-key")  # Key doesn't matter!
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "I need help with my order"}]
    )
    
    # Assert on the response
    assert "order number" in response.choices[0].message.content.lower()
    
    # Assert on what was called
    assert mock_openai.call_count == 1
    assert mock_openai.last_call["model"] == "gpt-4o"

✨ Features

🎯 Zero Configuration

Fixtures are auto-discovered. Just use them.

🤖 Multi-Provider Support

OpenAI, Anthropic, and Google Gemini behind one consistent API.

🌊 Streaming Support

Full support for streaming responses, just like the real APIs.

🔧 LangChain & LlamaIndex

Native integration with popular LLM frameworks.

📼 Response Recording

VCR-style recording for golden tests.

💰 Cost & Token Tracking

Assert on costs before they become production surprises.

⚡ Chaos Testing

Simulate rate limits, timeouts, and API errors.

🔒 Type Safe

Full type hints and mypy compliance.


🤖 Providers

OpenAI

def test_openai(mock_openai):
    mock_openai.add_response("The answer is 42")
    
    # Works with the official OpenAI SDK
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the meaning of life?"}]
    )
    
    assert response.choices[0].message.content == "The answer is 42"

Anthropic

def test_anthropic(mock_anthropic):
    mock_anthropic.add_response("I'd be happy to help!")
    
    from anthropic import Anthropic
    client = Anthropic(api_key="fake")
    
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello Claude!"}]
    )
    
    assert "happy" in response.content[0].text

Google Gemini

def test_gemini(mock_gemini):
    mock_gemini.add_response("Here's what I found...")
    
    import google.generativeai as genai
    model = genai.GenerativeModel("gemini-1.5-pro")
    
    response = model.generate_content("Tell me about AI")
    
    assert "found" in response.text

LangChain

def test_langchain(mock_langchain):
    mock_langchain.add_response("Paris is the capital of France.")
    
    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    
    llm = ChatOpenAI(model="gpt-4o", api_key="fake")
    prompt = ChatPromptTemplate.from_template("What is the capital of {country}?")
    chain = prompt | llm
    
    result = chain.invoke({"country": "France"})
    
    assert "Paris" in result.content

📚 Examples

Multiple Responses (Conversation)

def test_conversation(mock_openai):
    mock_openai.add_responses(
        "Hi! How can I help you today?",
        "I can definitely help with that order.",
        "Your order has been updated. Anything else?",
    )
    
    # First call
    response1 = chatbot.send("Hello")
    assert "help" in response1
    
    # Second call
    response2 = chatbot.send("I need to change my order")  
    assert "order" in response2
    
    # Third call
    response3 = chatbot.send("Change quantity to 5")
    assert "updated" in response3
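
The `chatbot` object in the test above is application code that this README does not define. As a rough sketch, a hypothetical wrapper that keeps conversation history and calls the OpenAI chat completions API could look like the following; the class name `Chatbot` and its `send` method are assumptions made for illustration.

```python
class Chatbot:
    """Hypothetical wrapper that keeps conversation history between calls."""

    def __init__(self, client, model="gpt-4o"):
        self.client = client
        self.model = model
        self.history = []

    def send(self, text):
        # Record the user turn, then ask the model for a reply.
        self.history.append({"role": "user", "content": text})
        response = self.client.chat.completions.create(
            model=self.model,
            # Pass a snapshot so later turns do not mutate what was sent.
            messages=list(self.history),
        )
        reply = response.choices[0].message.content
        # Keep the assistant turn so the next call carries full context.
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Inside a test it would be built as `chatbot = Chatbot(OpenAI(api_key="fake"))`, so each `send` call consumes the next queued mock response.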

Streaming Responses

def test_streaming(mock_openai):
    mock_openai.add_response("This is a streaming response that comes in chunks")
    
    from openai import OpenAI
    client = OpenAI(api_key="fake")
    
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=True,
    )
    
    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content
    
    assert "streaming" in full_response
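
The chunk-joining loop above tends to be repeated in application code, so it can help to factor it into a helper. The sketch below assumes only the attribute path shown in the example (`chunk.choices[0].delta.content`); `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks let the helper be exercised without any SDK installed.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Join the text deltas of a chat-completion stream into one string."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in object with the same attribute path as a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [fake_chunk("This is "), fake_chunk(None), fake_chunk("streaming")]
assert collect_stream(chunks) == "This is streaming"
```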

Function/Tool Calling

def test_function_calling(mock_openai):
    from pytest_mockllm.core import MockResponse
    
    mock_openai._responses.append(MockResponse(
        content="",
        tool_calls=[{
            "id": "call_123",
            "function": {
                "name": "get_weather",
                "arguments": {"location": "San Francisco", "unit": "celsius"},
            },
        }],
    ))
    
    # Your function-calling logic here
    # ...
    
    assert mock_openai.last_call is not None
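
The "function-calling logic" placeholder above usually means looking up the requested tool and calling it with the supplied arguments. A minimal sketch of that dispatch step follows, using the dict shape from the mock example. `get_weather`, `TOOLS`, and `dispatch_tool_call` are hypothetical application code, not part of pytest-mockllm.

```python
import json


def get_weather(location, unit="celsius"):
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Weather for {location} in {unit}"


# Registry mapping tool names to the callables that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict) -> str:
    """Look up the requested function by name and call it with its arguments."""
    function = tool_call["function"]
    arguments = function["arguments"]
    # The official OpenAI API delivers arguments as a JSON string,
    # while the mock example above uses a dict, so accept both.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    handler = TOOLS.get(function["name"])
    if handler is None:
        raise KeyError(f"Unknown tool: {function['name']}")
    return handler(**arguments)
```

Accepting both forms keeps the same dispatcher usable against the mock and against responses converted from the real SDK.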

Token Usage & Cost Assertions

def test_stays_within_budget(mock_openai):
    from pytest_mockllm.core import TokenUsage, estimate_cost
    
    mock_openai.add_response(
        "A detailed response...",
        token_usage=TokenUsage(prompt_tokens=500, completion_tokens=1000),
    )
    
    # Your LLM call here
    result = my_function()
    
    # Assert token usage
    assert mock_openai.total_tokens < 2000
    assert mock_openai.total_completion_tokens < 1500
    
    # Assert cost (for gpt-4o)
    cost = estimate_cost(
        "gpt-4o",
        mock_openai.total_prompt_tokens,
        mock_openai.total_completion_tokens,
    )
    assert cost < 0.05  # Less than 5 cents
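
For intuition about what a cost assertion is checking, the arithmetic behind a token-based estimate can be sketched as below. This is not the library's `estimate_cost` implementation, and the prices are placeholders chosen for illustration, not real gpt-4o rates.

```python
def estimate_cost_sketch(prompt_tokens, completion_tokens,
                         input_price_per_million, output_price_per_million):
    """Return the dollar cost of one call given per-million-token prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices (dollars per million tokens), for illustration only.
cost = estimate_cost_sketch(
    prompt_tokens=500,
    completion_tokens=1000,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)
print(f"{cost:.4f}")  # 0.0090
```

With these placeholder rates, the 500 prompt tokens contribute $0.001 and the 1000 completion tokens $0.008, which is why completion tokens usually dominate a budget assertion.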

Error Simulation (Chaos Testing)

import pytest


def test_handles_rate_limit(mock_openai):
    mock_openai.simulate_error("rate_limit", after_calls=2)
    mock_openai.add_responses("OK", "OK")
    
    # First two calls succeed
    assert my_function() == "OK"
    assert my_function() == "OK"
    
    # Third call hits the rate limit
    with pytest.raises(Exception):
        my_function()


# Request the mock_openai fixture so the name is defined inside the test
def test_handles_timeout(mock_openai):
    mock_openai.simulate_error("timeout")
    
    # Should trigger your retry logic
    with pytest.raises(TimeoutError):
        my_function()
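
The "retry logic" that a simulated timeout exercises lives in the application, not in the plugin. As one possible shape for it, here is a minimal retry helper with exponential backoff. `call_with_retry` and its parameters are hypothetical, and the injectable `sleep` argument exists so tests can run without real delays.

```python
import time


def call_with_retry(func, attempts=3, base_delay=0.1,
                    retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** attempt)
```

A test in the style above would wrap the LLM call in `call_with_retry` and assert either that a later attempt succeeds or that `TimeoutError` surfaces once the attempts are exhausted.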

Strict Mode

import pytest


def test_catches_unconfigured_calls(mock_openai):
    mock_openai.set_strict_mode(True)
    
    # This will raise an error because no response is configured
    with pytest.raises(RuntimeError, match="No mock response configured"):
        my_function_that_calls_llm()

📼 Recording & Replay

Record real API responses once, then replay them on every later run, like VCR for LLMs.

Record Mode

import pytest


# First run: hits the real API and saves the response
@pytest.mark.llm_record
def test_with_recording(llm_recorder):
    from openai import OpenAI
    client = OpenAI()  # Uses real API key from environment
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Use cheap model for recording
        messages=[{"role": "user", "content": "Say hello!"}]
    )
    
    assert response.choices[0].message.content

Run with recording:

pytest tests/test_example.py --llm-record

Replay Mode

# Subsequent runs: uses the saved response (no API key needed!)
@pytest.mark.llm_replay
def test_with_recording(llm_recorder):
    # Same test code, but now served from the cached response
    ...

Cassette Storage

Responses are saved in tests/llm_cassettes/ as YAML:

name: test_with_recording
version: "1.0"
created: 1703270400
interactions:
  - request:
      model: gpt-4o-mini
      messages:
        - role: user
          content: Say hello!
    response:
      content: "Hello! How can I assist you today?"
      model: gpt-4o-mini
    provider: openai
    latency_ms: 523
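
To make the replay idea concrete, the sketch below shows how a recorded interaction could be matched against an incoming request. It assumes an exact match on `model` and `messages`, which this README does not specify; `find_recorded_response` is a hypothetical helper, and the dict mirrors the YAML above as it would look after parsing.

```python
def find_recorded_response(cassette: dict, model: str, messages: list):
    """Return the stored response for an identical request, or None."""
    for interaction in cassette.get("interactions", []):
        request = interaction["request"]
        # Exact match on model and messages: an assumption, not the plugin's rule.
        if request["model"] == model and request["messages"] == messages:
            return interaction["response"]
    return None


# The cassette from the YAML above, as a parsed Python structure.
cassette = {
    "name": "test_with_recording",
    "version": "1.0",
    "created": 1703270400,
    "interactions": [
        {
            "request": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "Say hello!"}],
            },
            "response": {
                "content": "Hello! How can I assist you today?",
                "model": "gpt-4o-mini",
            },
            "provider": "openai",
            "latency_ms": 523,
        }
    ],
}

hit = find_recorded_response(
    cassette, "gpt-4o-mini", [{"role": "user", "content": "Say hello!"}]
)
```

Under exact matching, any change to the prompt or model produces a miss, which is the signal to re-record the cassette.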

🔧 Configuration

CLI Options

# Enable recording mode
pytest --llm-record

# Custom cassette directory
pytest --llm-cassette-dir=my_cassettes

# Strict mode (fail if any LLM call is unconfigured)
pytest --llm-strict

Markers

@pytest.mark.llm_mock(provider="anthropic")
def test_with_anthropic(mock_llm):
    # mock_llm is now an AnthropicMock
    pass

@pytest.mark.llm_record
def test_records_responses(llm_recorder):
    pass

@pytest.mark.llm_replay  
def test_replays_responses(llm_recorder):
    pass

pytest.ini / pyproject.toml

[tool.pytest.ini_options]
# Default cassette directory
llm_cassette_dir = "tests/fixtures/llm"

# Always run in strict mode
llm_strict = true

🆚 Comparison

| Feature | pytest-mockllm | unittest.mock | responses | vcrpy |
|---|---|---|---|---|
| Zero config | ✅ | ❌ | ❌ | ❌ |
| pytest fixtures | ✅ | ❌ | ✅ | ✅ |
| OpenAI support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Anthropic support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| Gemini support | ✅ Native | 🟡 Manual | ❌ | 🟡 HTTP |
| LangChain support | ✅ Native | 🟡 Complex | ❌ | 🟡 Limited |
| Streaming | ✅ | 🟡 Manual | ❌ | 🟡 Complex |
| Token tracking | ✅ | ❌ | ❌ | ❌ |
| Cost estimation | ✅ | ❌ | ❌ | ❌ |
| Recording/Replay | ✅ | ❌ | ❌ | ✅ |
| Error simulation | ✅ | 🟡 Manual | 🟡 HTTP | ❌ |

🛣️ Roadmap

  • Async/await support improvements
  • More providers (Cohere, Mistral, Together AI)
  • pytest-xdist compatibility
  • Response fuzzing for robustness testing
  • Integration with LangSmith for debugging
  • Automatic prompt regression detection

๐Ÿค Contributing

We'd love your help! See CONTRIBUTING.md for guidelines.

# Clone the repo and enter it
git clone https://github.com/godhiraj-code/pytest-mockllm.git
cd pytest-mockllm

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
mypy src/

📜 License

MIT License. See LICENSE for details.


Stop paying for tests. Start shipping faster.

โญ Star us on GitHub โ€ข Built by Dhiraj Das

