# pytest-prompts

pytest for LLM prompts. Write tests, detect regressions, run in CI.
```python
from pytest_prompts import prompt_test

@prompt_test()
def test_qa_knows_capital_of_france(runner):
    result = runner.run(
        prompt="prompts/qa.txt",
        input="What is the capital of France?",
    )
    assert "Paris" in result.output
    assert result.latency_ms < 5000
```
```
$ pytest-prompts run

pytest-prompts results

Test                                         Model              Tokens  Latency  Status
examples/test_prompts.py::test_summary...    claude-sonnet-4-6     174     1.2s    PASS
examples/test_prompts.py::test_qa_knows...   claude-sonnet-4-6      48     0.9s    PASS
examples/test_prompts.py::test_qa_admits...  claude-sonnet-4-6      52     1.1s    PASS

3 passed, 0 failed — 274 tokens total — $0.0012
```

Change a prompt → rerun → `pytest-prompts diff` shows what regressed.
## Install

```bash
uv add pytest-prompts
export ANTHROPIC_API_KEY=sk-ant-...
```
## Write a test

Any pytest file. Decorate with `@prompt_test()`, declare a `runner` fixture, assert on the result.

```python
# tests/test_summarizer.py
from pytest_prompts import prompt_test

@prompt_test(model="claude-sonnet-4-6")
def test_summary_is_concise(runner):
    result = runner.run(
        prompt="prompts/summarizer.txt",
        input="Long text here...",
    )
    assert len(result.output.split()) < 100
    assert result.tokens_used < 500
```
`result` exposes `output`, `input_tokens`, `output_tokens`, `tokens_used`, `latency_ms`, `model`, and `cost_usd`.
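As a sketch of how token counts relate to `cost_usd`, here is a hypothetical cost estimator; the per-million-token rates below are illustrative assumptions, not the library's actual pricing table:

```python
# Hypothetical per-million-token rates (USD) — assumptions for illustration only.
RATES_PER_MTOK = {"input": 3.00, "output": 15.00}

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost estimate playing the role of result.cost_usd."""
    return (
        input_tokens * RATES_PER_MTOK["input"]
        + output_tokens * RATES_PER_MTOK["output"]
    ) / 1_000_000

# A budget assertion of the kind you might write against result.cost_usd:
assert estimate_cost_usd(120, 54) < 0.01
```

Asserting on cost rather than raw token counts keeps the budget check stable if you later swap models with different verbosity.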
## LLM-as-judge

String matching is fragile. Use `runner.judge()` to evaluate outputs semantically:

```python
from pytest_prompts import prompt_test

@prompt_test()
def test_qa_knows_capital_of_france(runner):
    result = runner.run(prompt="prompts/qa.txt", input="What is the capital of France?")
    verdict = runner.judge(result, "The answer correctly identifies Paris as the capital of France")
    assert verdict.verdict, verdict.reasoning
```
`verdict` exposes `verdict` (bool), `reasoning` (one sentence), `criterion`, `input_tokens`, `output_tokens`, and `cost_usd`. Judge calls are recorded in snapshots alongside the run result.
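To make the verdict shape concrete, here is a minimal, self-contained sketch of how a judge reply could be parsed into such an object. The `JudgeVerdict` dataclass mirrors the fields documented above; the JSON reply format and parsing logic are illustrative assumptions, not pytest-prompts internals:

```python
import json
from dataclasses import dataclass

@dataclass
class JudgeVerdict:
    verdict: bool     # did the output meet the criterion?
    reasoning: str    # one-sentence explanation from the judge model
    criterion: str    # the criterion the judge was asked to check

def parse_judge_reply(raw: str, criterion: str) -> JudgeVerdict:
    """Parse a judge model's JSON reply like {"pass": true, "reason": "..."}."""
    data = json.loads(raw)
    return JudgeVerdict(
        verdict=bool(data["pass"]),
        reasoning=data["reason"],
        criterion=criterion,
    )

reply = '{"pass": true, "reason": "The answer names Paris explicitly."}'
v = parse_judge_reply(reply, "Identifies Paris as the capital of France")
assert v.verdict, v.reasoning
```

The `assert verdict.verdict, verdict.reasoning` idiom works because pytest prints the second expression as the failure message, so a failing judge call surfaces its one-sentence reasoning directly in the test report.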
## Detect regressions

Every run writes snapshots to `.pytest-prompts/snapshots/`.

```bash
# Capture baseline on main
git checkout main
pytest-prompts run --snapshot-dir .pytest-prompts/base

# Run on your branch
git checkout feature/new-prompt
pytest-prompts run --snapshot-dir .pytest-prompts/head

# Compare
pytest-prompts diff .pytest-prompts/base .pytest-prompts/head
```
Output:

```
pytest-prompts diff

Test                             Base         Head         Status
test_summary_is_concise          ✓ 342t 1.2s  ✓ 891t 3.1s  REGRESSION
test_qa_knows_capital_of_france  ✓  48t 0.9s  ✓  48t 0.8s  ok

Regressions:
  • test_summary_is_concise — tokens 342 → 891 (+160%)
```

Exit code 1 on any regression. Wire it into CI and you're done.
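The comparison above can be sketched in a few lines. This is an illustrative model of the regression check, not the tool's actual implementation: the snapshot dicts here are assumed shapes (test name → token count), not the on-disk snapshot format, and the 5% threshold matches the default documented below:

```python
def find_regressions(base: dict, head: dict, threshold: float = 0.05) -> list:
    """Flag tests whose token count grew by more than `threshold` (a fraction)."""
    regressions = []
    for test, base_tokens in base.items():
        head_tokens = head.get(test, base_tokens)
        if head_tokens > base_tokens * (1 + threshold):
            pct = (head_tokens - base_tokens) / base_tokens * 100
            regressions.append(
                f"{test} — tokens {base_tokens} → {head_tokens} (+{int(pct)}%)"
            )
    return regressions

base = {"test_summary_is_concise": 342, "test_qa_knows_capital_of_france": 48}
head = {"test_summary_is_concise": 891, "test_qa_knows_capital_of_france": 48}
assert find_regressions(base, head) == [
    "test_summary_is_concise — tokens 342 → 891 (+160%)"
]
```

A fractional threshold rather than an absolute one keeps the check meaningful across tests with very different baseline token counts.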
## CI (GitHub Actions)

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0  # required for base-ref diff mode
- uses: chahine-tech/pytest-prompts@v0.1
  with:
    path: tests/prompts
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    base-ref: main
    github-token: ${{ secrets.GITHUB_TOKEN }}
```
On pull requests, the action runs your tests against both main and your branch,
compares them, fails the job on regressions, and posts a summary comment on the PR.
| Input | Default | Description |
|---|---|---|
| `path` | `.` | Test path (file or directory) |
| `anthropic-api-key` | — | **Required.** Anthropic API key |
| `python-version` | `3.13` | Python version |
| `base-ref` | — | Base git ref (e.g. `main`). On PRs, results are compared against this ref |
| `threshold` | `0.05` | Regression threshold as a fraction (5% by default) |
| `github-token` | — | When set on `pull_request` events, posts the results as a PR comment |
| `fail-on-regression` | `true` | Fail the job if a regression is detected |
Outputs: `passed`, `failed`, `total-tokens`, `total-cost-usd`, `regressions`.
## What's in the POC

- `@prompt_test` decorator with pytest integration
- `Runner` for the Anthropic API (Claude Sonnet 4.6 default)
- `runner.judge()` — LLM-as-judge for semantic assertions
- `pytest-prompts run` — run tests, summarize tokens/latency/cost
- `pytest-prompts diff` — compare two snapshot dirs, flag regressions

Not here yet: OpenAI/Gemini adapters, static prompt analysis, HTML reports. If you want them, open an issue — priorities come from usage, not from a roadmap.
## License

MIT.
## Project details
### File details: pytest_prompts-0.1.3.tar.gz

- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e41272e0a52def480bf9eb0d66027606683da3851c5f1ed6f8f1f1e8c7236344` |
| MD5 | `340a6457825a9a5cc8fe926b20d9c510` |
| BLAKE2b-256 | `325b0a799e4c6ff871cce1b17f30e5de48d88715addbed1d9d32f4d15b5619f8` |
#### Provenance

The following attestation bundles were made for pytest_prompts-0.1.3.tar.gz:

- Publisher: release.yml on Chahine-tech/PromptCI
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_prompts-0.1.3.tar.gz
- Subject digest: `e41272e0a52def480bf9eb0d66027606683da3851c5f1ed6f8f1f1e8c7236344`
- Sigstore transparency entry: 1342287242
- Permalink: Chahine-tech/PromptCI@5170c782aed8fea35efc660f3a82247af494a820
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/Chahine-tech
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5170c782aed8fea35efc660f3a82247af494a820
- Trigger Event: push
### File details: pytest_prompts-0.1.3-py3-none-any.whl

- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ac844847ba0b72d2fba3f8f6a2d53af826c4566481545e425b8b2afd9f0b4abe` |
| MD5 | `a8b9ab1532bf18d69b7a3f9014d2a35d` |
| BLAKE2b-256 | `f3e1b3ef18a75acb8b1a2afe0243e227027f0d0ac4724dcbc39fc2042eb77169` |
#### Provenance

The following attestation bundles were made for pytest_prompts-0.1.3-py3-none-any.whl:

- Publisher: release.yml on Chahine-tech/PromptCI
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pytest_prompts-0.1.3-py3-none-any.whl
- Subject digest: `ac844847ba0b72d2fba3f8f6a2d53af826c4566481545e425b8b2afd9f0b4abe`
- Sigstore transparency entry: 1342287249
- Permalink: Chahine-tech/PromptCI@5170c782aed8fea35efc660f3a82247af494a820
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/Chahine-tech
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5170c782aed8fea35efc660f3a82247af494a820
- Trigger Event: push