pytest-claude-agent-sdk

Use Claude Code in your pytests, or pytest your own Claude Code agents — or both.

Three use cases:

Test apps that use the Claude Agent SDK: your app calls Claude, and you verify that it makes the right calls
Test any app with Claude as the judge: your app need not use Claude at all, and Claude evaluates its output
Both at once: a Claude-powered app checked by Claude-powered tests

Uses Claude Code authentication by default — see Authentication for options.

When to Use This Plugin

Your code accepts a client via dependency injection:

async def my_function(client: ClaudeSDKClient, data: str) -> str:
    async for msg in client.query(f"Process: {data}"):
        ...

→ Use SpyClaudeSDKClient from this plugin. It wraps real calls while recording them for assertions.

Your code calls query() directly (no DI):

from claude_agent_sdk import query

async def my_function(data: str) -> str:
    async for msg in query(prompt=f"Process: {data}"):  # hardcoded
        ...

→ Use standard mocking (unittest.mock.patch or pytest-mock). This plugin won't help here — but that's a sign your code could benefit from dependency injection anyway.

Installation

pip install pytest-claude-agent-sdk

Or with uv:

uv add pytest-claude-agent-sdk

Authentication

The plugin uses the Claude Agent SDK, which supports several authentication methods:

Claude Code subscription (default): If you have Claude Code installed and logged in (claude CLI), authentication is automatic — no configuration needed.

API key: Set the ANTHROPIC_API_KEY environment variable. This uses pay-as-you-go API billing.

export ANTHROPIC_API_KEY=sk-ant-...

Cloud providers:

Amazon Bedrock: CLAUDE_CODE_USE_BEDROCK=1
Google Vertex AI: CLAUDE_CODE_USE_VERTEX=1

See Claude Agent SDK documentation for details.
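
When a CI job fails on authentication, it can help to log which of the methods above the environment appears to select. The helper below is a hypothetical diagnostic, not part of the plugin or the SDK. It only inspects the environment variables named above, and the precedence order it uses is an assumption rather than documented SDK behavior.

```python
import os
from typing import Mapping, Optional


def detect_auth_method(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess which authentication method the environment points at.

    Assumption: cloud-provider flags win over an API key. This does not
    validate any credentials; it only reports which variables are set.
    """
    if env is None:
        env = os.environ
    if env.get("CLAUDE_CODE_USE_BEDROCK") == "1":
        return "bedrock"
    if env.get("CLAUDE_CODE_USE_VERTEX") == "1":
        return "vertex"
    if env.get("ANTHROPIC_API_KEY"):
        return "api_key"
    # Nothing set: the Claude Code login (claude CLI) is the remaining option.
    return "claude_code_login"
```

Calling `detect_auth_method()` with no arguments reads the current process environment; passing a plain dict makes the helper easy to test.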

Fixtures

| Fixture | Type | Purpose |
| --- | --- | --- |
| `claude_query` | `QueryFunc` | Stateless substitute for `query()` |
| `claude_client` | `SpyClaudeSDKClient` | Client-like interface for the app under test (with spy) |
| `claude_judge_client` | `QueryFunc` | Separate query function for evaluation |

QueryFunc = Callable[..., AsyncIterator[Message]] - matches claude_agent_sdk.query() signature.
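
To make the `QueryFunc` shape concrete, here is a minimal offline sketch of a callable with that shape. `FakeMessage`, `canned_query`, and `collect` are hypothetical names introduced for illustration; the real fixtures yield `claude_agent_sdk` message objects, not this stand-in.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Callable, List


@dataclass
class FakeMessage:
    # Stand-in for the SDK's Message types; only carries some text.
    text: str


# Same shape as the alias described above, with the stand-in message type.
QueryFunc = Callable[..., AsyncIterator[FakeMessage]]


async def canned_query(*, prompt: str, **kwargs: Any) -> AsyncIterator[FakeMessage]:
    # An async generator function: calling it returns an async iterator,
    # so it can be consumed with `async for`, as query() is in this README.
    yield FakeMessage(text=f"echo: {prompt}")


async def collect(query: QueryFunc, prompt: str) -> List[str]:
    # Consume any QueryFunc-shaped callable and gather the message texts.
    return [msg.text async for msg in query(prompt=prompt)]
```

Code written against this shape accepts either a real fixture or a canned function like the one above.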

SpyClaudeSDKClient

The claude_client fixture provides a SpyClaudeSDKClient, which wraps query() and records every call so that you can assert on it afterwards:

import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient


@pytest.mark.llm
@pytest.mark.asyncio
async def test_my_app(claude_client: SpyClaudeSDKClient) -> None:
    # Use it like query() - real LLM calls happen
    async for msg in claude_client.query("Hello"):
        pass

    # Inspect what happened
    assert claude_client.call_count == 1
    assert "Hello" in claude_client.calls[0].prompt
    assert claude_client.last_call.response is not None

    # Use assertion helpers
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("Hello")

Spy Attributes

calls: list[CallRecord] - All calls made
call_count: int - Number of calls
last_call: CallRecord | None - Most recent call

Spy Assertion Helpers

assert_called() - At least one call was made
assert_not_called() - No calls were made
assert_called_once() - Exactly one call was made
assert_call_count(n) - Exactly n calls were made
assert_any_call_contains(substring) - Some call's prompt contains substring
assert_last_call_contains(substring) - Last call's prompt contains substring
reset_calls() - Clear call history
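
The recording behavior behind these attributes and helpers can be pictured with a toy spy. This is a sketch of the general technique under stated assumptions, not the plugin's actual implementation: `ToySpy`, `ToyCallRecord`, and `echo_backend` are hypothetical names, and only a few of the helpers are mirrored.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Callable, List, Optional


@dataclass
class ToyCallRecord:
    prompt: str
    response: List[Any] = field(default_factory=list)


class ToySpy:
    """Wraps a query function and records each call made through it."""

    def __init__(self, query_func: Callable[..., AsyncIterator[Any]]) -> None:
        self._query_func = query_func
        self.calls: List[ToyCallRecord] = []

    @property
    def call_count(self) -> int:
        return len(self.calls)

    @property
    def last_call(self) -> Optional[ToyCallRecord]:
        return self.calls[-1] if self.calls else None

    async def query(self, prompt: str) -> AsyncIterator[Any]:
        # Record the call, then pass every message through unchanged.
        record = ToyCallRecord(prompt=prompt)
        self.calls.append(record)
        async for msg in self._query_func(prompt=prompt):
            record.response.append(msg)
            yield msg

    def assert_called_once(self) -> None:
        assert self.call_count == 1, f"expected 1 call, got {self.call_count}"

    def assert_any_call_contains(self, substring: str) -> None:
        assert any(substring in c.prompt for c in self.calls), (
            f"no recorded prompt contains {substring!r}"
        )

    def reset_calls(self) -> None:
        self.calls.clear()


async def echo_backend(*, prompt: str) -> AsyncIterator[str]:
    # Offline stand-in for a real query function.
    yield f"echo: {prompt}"


async def demo() -> ToySpy:
    spy = ToySpy(echo_backend)
    async for _ in spy.query("Hello"):
        pass
    return spy
```

The key design point is that the spy delegates to a real backend instead of replacing it, so assertions about prompts and assertions about actual responses can live in the same test.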

Examples

1. Testing an app that uses Claude

Your app has a function that accepts something with a query() method. Test it with the spy client:

your_app/greeter.py

from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""
    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def generate_greeting(client: QueryClient, name: str) -> str:
    async for msg in client.query(f"Generate a short greeting for {name}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""

tests/test_greeter.py

import pytest
from pytest_claude_agent_sdk import SpyClaudeSDKClient

from your_app.greeter import generate_greeting


@pytest.mark.llm
@pytest.mark.asyncio
async def test_greeting(claude_client: SpyClaudeSDKClient) -> None:
    greeting: str = await generate_greeting(claude_client, "Alice")

    # Verify the output
    assert len(greeting) > 0
    assert "alice" in greeting.lower()

    # Verify the call was made correctly
    claude_client.assert_called_once()
    claude_client.assert_last_call_contains("Alice")
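
Because generate_greeting depends only on the QueryClient protocol, the same function can also be exercised offline with a hand-written stub, which suits a fast test lane that makes no LLM calls. The sketch below is an assumption-laden illustration: `StubClient` is hypothetical, `ResultMessage` is a local stand-in for the SDK class, and generate_greeting is restated so the block runs on its own.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


@dataclass
class ResultMessage:
    # Local stand-in for claude_agent_sdk.ResultMessage; only `result` is modelled.
    result: Optional[str]


class StubClient:
    """Hypothetical offline client that satisfies the QueryClient protocol."""

    def __init__(self, canned: str) -> None:
        self.canned = canned
        self.prompts: List[str] = []

    async def query(self, prompt: str) -> AsyncIterator[ResultMessage]:
        self.prompts.append(prompt)
        yield ResultMessage(result=self.canned)


async def generate_greeting(client: StubClient, name: str) -> str:
    # Same logic as your_app/greeter.py, restated to keep the sketch self-contained.
    async for msg in client.query(f"Generate a short greeting for {name}"):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""


def test_greeting_offline() -> None:
    client = StubClient("Hello, Alice!")
    greeting = asyncio.run(generate_greeting(client, "Alice"))
    assert greeting == "Hello, Alice!"
    assert client.prompts == ["Generate a short greeting for Alice"]
```

A stub like this checks the plumbing only; the spy client is still what tells you how the real model responds.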

2. Testing any app using Claude as judge

Your app doesn't use Claude, but you want Claude to evaluate outputs:

your_app/email.py

def format_email(subject: str, body: str) -> str:
    return f"Subject: {subject}\n\n{body}\n\nBest regards"

tests/test_email.py

from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage

from your_app.email import format_email


@pytest.mark.llm
@pytest.mark.asyncio
async def test_email_is_professional(claude_judge_client: Callable) -> None:
    email: str = format_email("Meeting", "Let's meet tomorrow at 3pm.")

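    verdict: str | None = None
    async for msg in claude_judge_client(
        prompt=f"Is this email professional? Answer only YES or NO.\n\n{email}"
    ):
        if isinstance(msg, ResultMessage):
            verdict = msg.result

    # Assert after the loop so the test fails if the judge returned no result
    assert verdict is not None
    assert "YES" in verdict.upper()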
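
The substring check for "YES" in the test above is permissive: it would also accept a reply such as "Yesterday's tone was fine". If that matters for your suite, a stricter parser is one option. `parse_verdict` below is a hypothetical helper, not something the plugin provides, and it assumes the judge leads with its verdict.

```python
import re


def parse_verdict(text: str) -> bool:
    """Return True for YES and False for NO; raise if neither leads the reply.

    Leading punctuation or markdown emphasis is skipped, and the verdict must
    be a whole word, so "Yesterday" does not count as "YES".
    """
    match = re.match(r"\W*(YES|NO)\b", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"judge gave no clear YES/NO verdict: {text!r}")
    return match.group(1).upper() == "YES"
```

In a test this would replace the substring assertion with something like `assert parse_verdict(msg.result)`, turning an ambiguous reply into an explicit failure.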

3. Both: Test Claude app with Claude judge

Your app uses Claude, and you evaluate it with Claude (separate fixture):

chess_hustler/trash_talk.py

from typing import Protocol, AsyncIterator

from claude_agent_sdk import Message, ResultMessage


class QueryClient(Protocol):
    """Protocol for anything with a query() method."""
    def query(self, prompt: str) -> AsyncIterator[Message]: ...


async def trash_talk(client: QueryClient, move: str) -> str:
    async for msg in client.query(
        f"You're a NYC chess hustler. React to opponent's move: {move}"
    ):
        if isinstance(msg, ResultMessage):
            return msg.result or ""
    return ""

tests/test_trash_talk.py

from typing import Callable

import pytest
from claude_agent_sdk import ResultMessage

from pytest_claude_agent_sdk import SpyClaudeSDKClient

from chess_hustler.trash_talk import trash_talk


@pytest.mark.llm
@pytest.mark.asyncio
async def test_trash_talk_is_in_character(
    claude_client: SpyClaudeSDKClient,
    claude_judge_client: Callable,
) -> None:
    # Generate with app client (spy)
    response: str = await trash_talk(claude_client, "e4")

    # Verify the call was made
    claude_client.assert_called_once()
    claude_client.assert_any_call_contains("e4")

    # Evaluate with judge (separate fixture, not spied)
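    verdict: str | None = None
    async for msg in claude_judge_client(
        prompt=f"Does this sound like a sarcastic NYC chess hustler? YES or NO.\n\n{response}"
    ):
        if isinstance(msg, ResultMessage):
            verdict = msg.result

    # Assert after the loop so the test fails if the judge returned no result
    assert verdict is not None
    assert "YES" in verdict.upper()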

Markers

@pytest.mark.llm  # Mark tests that call LLM

pytest -m llm        # Run only LLM tests
pytest -m "not llm"  # Skip LLM tests (fast CI)

Release history

0.1.1 - Jan 19, 2026 (this version)
0.1.0 - Jan 19, 2026
