Pytest plugin for testing with Claude Agent SDK
pytest-claude-agent-sdk
Pytest plugin for working with Claude Agent SDK.
Three complementary use cases:
- Test apps that use Claude Agent SDK - your app uses Claude internally, and you need to test it
- Test any app with Claude as judge - your app need not use Claude; use Claude as a smart test oracle (LLM-as-judge)
- Both at once - test your Claude-powered app with Claude-powered assertions
Uses Claude Code authentication automatically - no API key management needed.
When to Use This Plugin
Your code accepts a client via dependency injection:
async def my_function(client: ClaudeSDKClient, data: str) -> str:
    async for msg in client.query(f"Process: {data}"):
        ...
→ Use SpyClaudeSDKClient from this plugin. It wraps real calls while recording them for assertions.
Your code calls query() directly (no DI):
from claude_agent_sdk import query

async def my_function(data: str) -> str:
    async for msg in query(prompt=f"Process: {data}"):  # hardcoded
        ...
→ Use standard mocking (unittest.mock.patch or pytest-mock). This plugin won't help here — but that's a sign your code could benefit from dependency injection anyway.
Installation
pip install pytest-claude-agent-sdk
Or with uv:
uv add pytest-claude-agent-sdk
Fixtures
| Fixture | Type | Purpose |
|---|---|---|
| claude_query | QueryFunc | Substitute for query() - stateless |
| claude_client | SpyClaudeSDKClient | Client-like interface for app under test (with spy) |
| claude_judge_client | QueryFunc | Separate query function for evaluation |
QueryFunc = Callable[..., AsyncIterator[Message]] - matches claude_agent_sdk.query() signature.
SpyClaudeSDKClient
The claude_client fixture provides a SpyClaudeSDKClient, which wraps query() and records every call:
import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient


@pytest.mark.llm
@pytest.mark.asyncio
async def test_my_app(claude_client: SpyClaudeSDKClient) -> None:
    # Use it like query() - real LLM calls happen
    async for msg in claude_client.query("Hello"):
        pass

    # Inspect what happened
    assert claude_client.call_count == 1
    assert "Hello" in claude_client.calls[0].prompt
    assert claude_client.last_call.response is not None

    # Use assertion helpers
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("Hello")
Spy Attributes
- calls: list[CallRecord] - All calls made
- call_count: int - Number of calls
- last_call: CallRecord | None - Most recent call
Spy Assertion Helpers
- assert_called() - At least one call was made
- assert_not_called() - No calls were made
- assert_called_once() - Exactly one call was made
- assert_call_count(n) - Exactly n calls were made
- assert_any_call_contains(substring) - Some call's prompt contains substring
- assert_last_call_contains(substring) - Last call's prompt contains substring
- reset_calls() - Clear call history
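To make the spy pattern concrete, here is a minimal sketch of how a call-recording wrapper around a query function could work. It is not the plugin's implementation: `MiniSpy`, this `CallRecord` dataclass, and `echo_query` are hypothetical names, and storing the response as a list of yielded messages is an assumption made for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Callable, List, Optional


@dataclass
class CallRecord:
    # Hypothetical record shape: the prompt plus every message yielded.
    prompt: str
    response: List[Any] = field(default_factory=list)


class MiniSpy:
    """Illustrative spy; NOT the plugin's SpyClaudeSDKClient."""

    def __init__(self, query_func: Callable[..., AsyncIterator[Any]]) -> None:
        self._query_func = query_func
        self.calls: List[CallRecord] = []

    @property
    def call_count(self) -> int:
        return len(self.calls)

    @property
    def last_call(self) -> Optional[CallRecord]:
        return self.calls[-1] if self.calls else None

    async def query(self, prompt: str) -> AsyncIterator[Any]:
        # Record the call, then pass every message through unchanged.
        record = CallRecord(prompt=prompt)
        self.calls.append(record)
        async for msg in self._query_func(prompt=prompt):
            record.response.append(msg)
            yield msg

    def assert_called_once(self) -> None:
        assert self.call_count == 1, f"expected 1 call, got {self.call_count}"

    def assert_any_call_contains(self, substring: str) -> None:
        assert any(substring in c.prompt for c in self.calls), (
            f"no recorded prompt contains {substring!r}"
        )


async def echo_query(prompt: str) -> AsyncIterator[str]:
    # Offline stand-in for a real query function.
    yield prompt.upper()


async def demo() -> MiniSpy:
    spy = MiniSpy(echo_query)
    async for _ in spy.query("Hello"):
        pass
    return spy
```

Because the wrapper delegates to whatever query function it is given, the same recording logic applies whether the underlying calls are real or canned.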
Examples
1. Testing an app that uses Claude
Your app has a function that accepts something with a query() method. Test it with the spy client:
your_app/greeter.py
from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""

    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def generate_greeting(client: QueryClient, name: str) -> str:
    async for msg in client.query(f"Generate a short greeting for {name}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""
tests/test_greeter.py
import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient

from your_app.greeter import generate_greeting


@pytest.mark.llm
@pytest.mark.asyncio
async def test_greeting(claude_client: SpyClaudeSDKClient) -> None:
    greeting: str = await generate_greeting(claude_client, "Alice")

    # Verify the output
    assert len(greeting) > 0
    assert "alice" in greeting.lower()

    # Verify the call was made correctly
    claude_client.assert_called_once()
    claude_client.assert_last_call_contains("Alice")
2. Testing any app using Claude as judge
Your app doesn't use Claude, but you want Claude to evaluate outputs:
your_app/email.py
def format_email(subject: str, body: str) -> str:
    return f"Subject: {subject}\n\n{body}\n\nBest regards"
tests/test_email.py
from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage

from your_app.email import format_email


@pytest.mark.llm
@pytest.mark.asyncio
async def test_email_is_professional(claude_judge_client: Callable) -> None:
    email: str = format_email("Meeting", "Let's meet tomorrow at 3pm.")

    async for msg in claude_judge_client(
        prompt=f"Is this email professional? Answer only YES or NO.\n\n{email}"
    ):
        if isinstance(msg, ResultMessage):
            assert msg.result is not None
            assert "YES" in msg.result.upper()
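The substring check in the test above is deliberately simple, but it also passes on replies such as "NO, YES would be wrong" or any text containing a word like "yesterday". If a stricter verdict is wanted, a small parser along these lines may help. `parse_verdict` is a hypothetical helper, not part of the plugin.

```python
import re


def parse_verdict(result: str) -> bool:
    """Return True for YES and False for NO; raise on anything else.

    Hypothetical helper: accepts surrounding whitespace and punctuation
    (for example "Yes." or "**NO**") but rejects longer answers.
    """
    match = re.fullmatch(r"\W*(YES|NO)\W*", result.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unparseable judge verdict: {result!r}")
    return match.group(1).upper() == "YES"
```

Inside the judge loop this would replace the final assertion with `assert parse_verdict(msg.result)`, so an off-format reply fails loudly instead of passing by accident.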
3. Both: Test Claude app with Claude judge
Your app uses Claude, and you evaluate it with Claude (separate fixture):
chess_hustler/trash_talk.py
from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""

    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def trash_talk(client: QueryClient, move: str) -> str:
    async for msg in client.query(
        f"You're a NYC chess hustler. React to opponent's move: {move}"
    ):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""
tests/test_trash_talk.py
from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage
from pytest_claude_agent_sdk import SpyClaudeSDKClient

from chess_hustler.trash_talk import trash_talk


@pytest.mark.llm
@pytest.mark.asyncio
async def test_trash_talk_is_in_character(
    claude_client: SpyClaudeSDKClient,
    claude_judge_client: Callable,
) -> None:
    # Generate with app client (spy)
    response: str = await trash_talk(claude_client, "e4")

    # Verify the call was made
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("e4")

    # Evaluate with judge (separate fixture, not spied)
    async for msg in claude_judge_client(
        prompt=f"Does this sound like a sarcastic NYC chess hustler? YES or NO.\n\n{response}"
    ):
        if isinstance(msg, ResultMessage):
            assert msg.result is not None
            assert "YES" in msg.result.upper()
Markers
@pytest.mark.llm # Mark tests that call LLM
pytest -m llm # Run only LLM tests
pytest -m "not llm" # Skip LLM tests (fast CI)
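Whether the plugin registers the llm marker itself is not stated here. If pytest reports an unknown-marker warning, or the suite runs with `--strict-markers`, a `conftest.py` hook like the following sketch can declare it. The description text is illustrative.

```python
# conftest.py (sketch; only needed if the llm marker is not already registered)


def pytest_configure(config) -> None:
    # addinivalue_line appends an entry to the "markers" ini option,
    # which is how pytest learns about custom markers.
    config.addinivalue_line(
        "markers",
        "llm: test makes real LLM calls (slow, requires Claude Code authentication)",
    )
```

With the marker declared, `pytest --markers` lists it alongside the built-in ones.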