
Pytest plugin for testing with Claude Agent SDK


pytest-claude-agent-sdk

Pytest plugin for working with Claude Agent SDK.

Three complementary use cases:

  1. Test apps that use Claude Agent SDK - your app uses Claude internally, and you need to test it
  2. Test any app with Claude Agent SDK - use Claude as a smart test oracle (LLM-as-judge), even if the app itself doesn't use Claude
  3. Both at once - test your Claude-powered app with Claude-powered assertions

Uses Claude Code authentication automatically - no API key management needed.

When to Use This Plugin

Your code accepts a client via dependency injection:

async def my_function(client: ClaudeSDKClient, data: str) -> str:
    async for msg in client.query(f"Process: {data}"):
        ...

→ Use SpyClaudeSDKClient from this plugin. It wraps real calls while recording them for assertions.

Your code calls query() directly (no DI):

from claude_agent_sdk import query

async def my_function(data: str) -> str:
    async for msg in query(prompt=f"Process: {data}"):  # hardcoded
        ...

→ Use standard mocking (unittest.mock.patch or pytest-mock), patching query where your module looks it up. This plugin won't help here, but that's a sign your code could benefit from dependency injection anyway.
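For the hardcoded case, the usual approach is to patch query in the namespace of the module that calls it. The sketch below is self-contained: ResultMessage and query are local stand-ins for the claude_agent_sdk names (the stand-in query refuses to run, so an accidental real call fails loudly), and fake_query is a hypothetical replacement. In a real project you would write patch("your_app.module.query", fake_query), where your_app.module is a placeholder for your own module path.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional
from unittest.mock import patch


@dataclass
class ResultMessage:
    """Stand-in for claude_agent_sdk.ResultMessage so this sketch runs offline."""
    result: Optional[str]


async def query(*, prompt: str) -> AsyncIterator[ResultMessage]:
    """Stand-in for claude_agent_sdk.query; the real one would call the LLM."""
    raise RuntimeError("unexpected real LLM call")
    yield  # unreachable, but makes this an async generator like the real query()


async def my_function(data: str) -> str:
    # Hardcoded dependency: the module-level name `query` is looked up at call time
    async for msg in query(prompt=f"Process: {data}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""


async def fake_query(*, prompt: str) -> AsyncIterator[ResultMessage]:
    # Canned response; no network, no LLM
    yield ResultMessage(result=f"fake reply to: {prompt}")


def test_my_function_with_fake_query() -> None:
    # Real project: with patch("your_app.module.query", fake_query):
    # Here the name is patched in my_function's own module namespace instead.
    with patch.dict(my_function.__globals__, {"query": fake_query}):
        result = asyncio.run(my_function("report"))
    assert result == "fake reply to: Process: report"
```

Note that the patch targets the name my_function actually resolves, not claude_agent_sdk.query itself; patching the SDK module would not affect a module that already did `from claude_agent_sdk import query`.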

Installation

pip install pytest-claude-agent-sdk

Or with uv:

uv add pytest-claude-agent-sdk

Fixtures

Fixture              Type                Purpose
claude_query         QueryFunc           Stateless substitute for query()
claude_client        SpyClaudeSDKClient  Client-like interface for the app under test, with call recording (spy)
claude_judge_client  QueryFunc           Separate query function for evaluation (not spied)

QueryFunc = Callable[..., AsyncIterator[Message]] - matches claude_agent_sdk.query() signature.
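Because QueryFunc is just a callable returning an async iterator of messages, the real query(), claude_judge_client, and any scripted async generator are interchangeable wherever a QueryFunc is expected. The sketch below assumes that calling convention (keyword prompt=) and uses a small stand-in dataclass in place of claude_agent_sdk.ResultMessage so it runs without the SDK. final_result and scripted_query are hypothetical helpers, not part of this plugin.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Callable, Optional


@dataclass
class ResultMessage:
    """Stand-in for claude_agent_sdk.ResultMessage (only the field used here)."""
    result: Optional[str]


# Same shape as the alias above; Any stands in for claude_agent_sdk.Message
QueryFunc = Callable[..., AsyncIterator[Any]]


async def final_result(query_func: QueryFunc, prompt: str, **kwargs: Any) -> str:
    """Drain query_func and return the text of the last ResultMessage it yields."""
    result: Optional[str] = None
    async for msg in query_func(prompt=prompt, **kwargs):
        if isinstance(msg, ResultMessage):
            result = msg.result
    if result is None:
        # Fail loudly instead of letting a test pass with no verdict
        raise AssertionError("query finished without a ResultMessage result")
    return result


async def scripted_query(*, prompt: str, **kwargs: Any) -> AsyncIterator[Any]:
    """Any async generator with this calling convention satisfies QueryFunc."""
    yield "intermediate message"  # non-result messages are skipped by final_result
    yield ResultMessage(result=f"done: {prompt}")


def test_final_result_skips_non_result_messages() -> None:
    assert asyncio.run(final_result(scripted_query, "hi")) == "done: hi"
```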

SpyClaudeSDKClient

The claude_client fixture provides a SpyClaudeSDKClient, which wraps query() and records every call for later assertions:

import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient


@pytest.mark.llm
@pytest.mark.asyncio
async def test_my_app(claude_client: SpyClaudeSDKClient) -> None:
    # Use it like query() - real LLM calls happen
    async for msg in claude_client.query("Hello"):
        pass

    # Inspect what happened
    assert claude_client.call_count == 1
    assert "Hello" in claude_client.calls[0].prompt
    assert claude_client.last_call.response is not None

    # Use assertion helpers
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("Hello")

Spy Attributes

  • calls: list[CallRecord] - All calls made
  • call_count: int - Number of calls
  • last_call: CallRecord | None - Most recent call

Spy Assertion Helpers

  • assert_called() - At least one call was made
  • assert_not_called() - No calls were made
  • assert_called_once() - Exactly one call was made
  • assert_call_count(n) - Exactly n calls were made
  • assert_any_call_contains(substring) - Some call's prompt contains substring
  • assert_last_call_contains(substring) - Last call's prompt contains substring
  • reset_calls() - Clear call history
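To make the attributes and helpers above concrete, here is a minimal sketch of how a spy of this kind can work: it wraps a query-style callable, appends a record when iteration starts, and passes each message through unchanged. SpySketch, echo_query, and this CallRecord shape are hypothetical illustrations, not the plugin's actual implementation. In particular, what the real CallRecord.response holds is an assumption (here, the list of messages received so far), and only a subset of the helpers is shown.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Callable, List, Optional


@dataclass
class CallRecord:
    """Hypothetical record shape: the prompt plus the messages received."""
    prompt: str
    response: List[Any] = field(default_factory=list)


class SpySketch:
    """Wraps a query-style callable and records every call (illustration only)."""

    def __init__(self, query_func: Callable[..., AsyncIterator[Any]]) -> None:
        self._query_func = query_func
        self.calls: List[CallRecord] = []

    @property
    def call_count(self) -> int:
        return len(self.calls)

    @property
    def last_call(self) -> Optional[CallRecord]:
        return self.calls[-1] if self.calls else None

    async def query(self, prompt: str, **kwargs: Any) -> AsyncIterator[Any]:
        # Runs on first iteration: record the call before any message arrives
        record = CallRecord(prompt=prompt)
        self.calls.append(record)
        async for msg in self._query_func(prompt=prompt, **kwargs):
            record.response.append(msg)
            yield msg  # pass each message through unchanged

    def assert_called_once(self) -> None:
        assert self.call_count == 1, f"expected 1 call, got {self.call_count}"

    def assert_any_call_contains(self, substring: str) -> None:
        assert any(substring in c.prompt for c in self.calls), (
            f"no recorded prompt contains {substring!r}"
        )

    def assert_last_call_contains(self, substring: str) -> None:
        last = self.last_call
        assert last is not None and substring in last.prompt, (
            f"last prompt does not contain {substring!r}"
        )

    def reset_calls(self) -> None:
        self.calls.clear()


async def echo_query(*, prompt: str, **kwargs: Any) -> AsyncIterator[str]:
    # Stand-in for a real QueryFunc such as claude_agent_sdk.query
    yield f"echo: {prompt}"


def demo() -> SpySketch:
    spy = SpySketch(echo_query)

    async def consume() -> None:
        async for _ in spy.query("Hello"):
            pass

    asyncio.run(consume())
    return spy
```

Attaching the response list to the record up front means it is populated even when the caller stops iterating early, as generate_greeting in Example 1 does once it sees a ResultMessage.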

Examples

1. Testing an app that uses Claude

Your app has a function that accepts something with a query() method. Test it with the spy client:

your_app/greeter.py

from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""
    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def generate_greeting(client: QueryClient, name: str) -> str:
    async for msg in client.query(f"Generate a short greeting for {name}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""

tests/test_greeter.py

import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient

from your_app.greeter import generate_greeting


@pytest.mark.llm
@pytest.mark.asyncio
async def test_greeting(claude_client: SpyClaudeSDKClient) -> None:
    greeting: str = await generate_greeting(claude_client, "Alice")

    # Verify the output
    assert len(greeting) > 0
    assert "alice" in greeting.lower()

    # Verify the call was made correctly
    claude_client.assert_called_once()
    claude_client.assert_last_call_contains("Alice")
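Because generate_greeting depends only on the QueryClient protocol, you can also cover its plumbing (prompt wording, result extraction) with a deterministic fake and no LLM call, which suits tests that run without the llm marker. FakeQueryClient is a hypothetical helper, and ResultMessage is a local stand-in so the sketch runs without claude_agent_sdk. In real tests you would import generate_greeting from your_app.greeter instead of copying it.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, List, Optional, Protocol


@dataclass
class ResultMessage:
    """Stand-in for claude_agent_sdk.ResultMessage so this runs without the SDK."""
    result: Optional[str]


class QueryClient(Protocol):
    def query(self, prompt: str) -> AsyncIterator[ResultMessage]: ...


async def generate_greeting(client: QueryClient, name: str) -> str:
    # Same logic as your_app/greeter.py above
    async for msg in client.query(f"Generate a short greeting for {name}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""


@dataclass
class FakeQueryClient:
    """Hypothetical hand-written fake: records prompts, returns a canned reply."""
    reply: str
    prompts: List[str] = field(default_factory=list)

    async def query(self, prompt: str) -> AsyncIterator[ResultMessage]:
        self.prompts.append(prompt)
        yield ResultMessage(result=self.reply)


def test_greeting_without_llm() -> None:
    fake = FakeQueryClient(reply="Hey Alice, welcome!")
    greeting = asyncio.run(generate_greeting(fake, "Alice"))
    assert greeting == "Hey Alice, welcome!"
    assert fake.prompts == ["Generate a short greeting for Alice"]
```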

2. Testing any app using Claude as judge

Your app doesn't use Claude, but you want Claude to evaluate outputs:

your_app/email.py

def format_email(subject: str, body: str) -> str:
    return f"Subject: {subject}\n\n{body}\n\nBest regards"

tests/test_email.py

from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage

from your_app.email import format_email


@pytest.mark.llm
@pytest.mark.asyncio
async def test_email_is_professional(claude_judge_client: Callable) -> None:
    email: str = format_email("Meeting", "Let's meet tomorrow at 3pm.")

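    verdict: str | None = None
    async for msg in claude_judge_client(
        prompt=f"Is this email professional? Answer only YES or NO.\n\n{email}"
    ):
        if isinstance(msg, ResultMessage):
            verdict = msg.result

    # Assert after the loop so the test fails if no ResultMessage arrives
    assert verdict is not None
    assert "YES" in verdict.upper()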
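A substring check such as `"YES" in ... .upper()` also passes for a reply like "NO. It reads like a yes-man wrote it." A stricter option is to look only at the first word of the judge's reply and fail loudly on anything that is not a clear YES or NO. parse_verdict below is a hypothetical helper, not part of this plugin.

```python
def parse_verdict(text: str) -> bool:
    """Map a judge reply to True (YES) or False (NO); reject anything else."""
    words = text.strip().split(maxsplit=1)
    # Look only at the first word, ignoring surrounding punctuation and markdown
    first = words[0].strip(".,!?:;\"'*").upper() if words else ""
    if first == "YES":
        return True
    if first == "NO":
        return False
    raise ValueError(f"judge did not answer YES or NO: {text!r}")
```

In a judge test, `assert parse_verdict(result_text)` would then replace the substring check.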

3. Both: Test Claude app with Claude judge

Your app uses Claude, and you evaluate it with Claude (separate fixture):

chess_hustler/trash_talk.py

from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""
    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def trash_talk(client: QueryClient, move: str) -> str:
    async for msg in client.query(
        f"You're a NYC chess hustler. React to opponent's move: {move}"
    ):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""

tests/test_trash_talk.py

from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage

from pytest_claude_agent_sdk import SpyClaudeSDKClient

from chess_hustler.trash_talk import trash_talk


@pytest.mark.llm
@pytest.mark.asyncio
async def test_trash_talk_is_in_character(
    claude_client: SpyClaudeSDKClient,
    claude_judge_client: Callable,
) -> None:
    # Generate with app client (spy)
    response: str = await trash_talk(claude_client, "e4")

    # Verify the call was made
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("e4")

    # Evaluate with judge (separate fixture, not spied)
    verdict: str | None = None
    async for msg in claude_judge_client(
        prompt=f"Does this sound like a sarcastic NYC chess hustler? YES or NO.\n\n{response}"
    ):
        if isinstance(msg, ResultMessage):
            verdict = msg.result

    # Assert after the loop so the test fails if no ResultMessage arrives
    assert verdict is not None
    assert "YES" in verdict.upper()

Markers

@pytest.mark.llm  # Mark tests that call LLM
pytest -m llm        # Run only LLM tests
pytest -m "not llm"  # Skip LLM tests (fast CI)
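pytest warns about unregistered markers and rejects them outright under --strict-markers. If the llm marker is not already registered for you (whether this plugin does so is not stated here), a conftest.py hook like the following registers it with pytest's standard config.addinivalue_line API. The marker description text is only an example.

```python
# conftest.py (a sketch; the plugin may already register this marker)
def pytest_configure(config) -> None:
    # Registering the marker lists it in `pytest --markers` and keeps
    # `--strict-markers` runs from rejecting @pytest.mark.llm.
    config.addinivalue_line(
        "markers", "llm: test makes real Claude calls (deselect with -m 'not llm')"
    )
```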

Download files

Source Distribution

pytest_claude_agent_sdk-0.1.0.tar.gz (70.7 kB)

Built Distribution

pytest_claude_agent_sdk-0.1.0-py3-none-any.whl (9.7 kB)

File details

Hashes for pytest_claude_agent_sdk-0.1.0.tar.gz

Algorithm    Hash digest
SHA256       7b693b915e60e3db9e76f5b4d784204b14c325a28452331d7e656692d863e753
MD5          dcccdcb92c112785b82c8956d6b826fd
BLAKE2b-256  4ddadcc516dcd447e2f19e06b9154c18b314161597bf6af9bb01442ad880a667

Hashes for pytest_claude_agent_sdk-0.1.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       6b0e3c2a8a862e14d18e8e2ca0d7d36a886eafc2267c11b49ff0a0dc034d896c
MD5          72a3c2bbb116e91cbf49688a2ed4b018
BLAKE2b-256  1c8a378112d5e30be405d0790e49abcac7edc5ee2f6e7b8f035940f45af21723
