# promptry

Regression protection for LLM pipelines.

Sentry for prompts. Sentry catches when your code breaks. promptry catches when your prompts break — it versions them, runs eval suites in CI, and flags regressions or drift against a baseline. Local-first. No SaaS.
```python
from promptry import track, suite, assert_semantic

# track() content-hashes your prompt and stores a new version if it changed
prompt = track(system_prompt, "rag-qa")
response = llm.chat(system=prompt, ...)

# suites are regular Python functions. Run them via CLI or in CI.
@suite("rag-regression")
def test_quality():
    response = my_pipeline("What is photosynthesis?")
    assert_semantic(response, "Converts light into chemical energy")
```
When a suite regresses against its baseline, promptry reports what changed:

```text
Overall score: 0.910 -> 0.720  REGRESSION
Probable cause:
  -> Prompt changed (v3 -> v4)
```
## Install

```bash
pip install promptry                      # core
pip install promptry[semantic]            # + semantic assertions (sentence-transformers)
pip install promptry[dashboard]           # + web dashboard
pip install promptry[semantic,dashboard]  # everything
```
## Quick start

```bash
promptry init                           # scaffold project + starter eval
promptry run smoke-test --module evals  # run it
```

```text
PASS  test_basic_quality (142ms)
      semantic (0.891) ok
Overall: PASS  score: 0.891
```
## Features
| Feature | What it does |
|---|---|
| Prompt versioning | Content-hashed, automatic dedup |
| Eval suites | Semantic, schema, LLM-as-judge, JSON, regex, grounding assertions |
| Regression detection | Compare against baselines, get root cause hints |
| Drift detection | Catch slow quality degradation over time |
| Model comparison | Statistical comparison against historical baseline (not just snapshots) |
| Cost tracking | Token usage and cost per prompt, aggregated reports |
| Safety templates | 25 starter jailbreak / injection / PII tests — add your own |
| MCP server | Expose everything as tools for Claude, Cursor, VS Code, etc. |
| Dashboard | Web UI for eval history, prompt diffs, model comparison, cost |
| JS/TS client | Ship prompt events from frontend/Node apps |
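As a rough illustration of what the JSON and regex assertion types check, here is a plain-Python sketch. These helper implementations are assumptions for illustration only — of promptry's assertion helpers, the text above shows just `assert_semantic`:

```python
import json
import re

def assert_json(response: str) -> dict:
    """JSON-style assertion: the response must parse as a JSON object."""
    data = json.loads(response)
    assert isinstance(data, dict), "expected a JSON object"
    return data

def assert_regex(text: str, pattern: str) -> None:
    """Regex-style assertion: the text must match the given pattern."""
    assert re.search(pattern, text), f"no match for {pattern!r}"

response = '{"answer": "Chlorophyll absorbs light.", "confidence": 0.93}'
data = assert_json(response)          # passes: valid JSON object
assert_regex(data["answer"], r"(?i)light")  # passes: case-insensitive match
print("both assertions passed")
```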
## Dashboard

```bash
pip install promptry[dashboard]
promptry dashboard
```
## How it differs

| | Promptfoo | DeepEval | RAGAS | LangSmith | promptry |
|---|---|---|---|---|---|
| Language | TypeScript | Python | Python | Python + JS | Python + JS |
| Local-first | Yes | Cloud push | Yes | SaaS only | SQLite |
| Prompt versioning | Via git + YAML | No | No | Prompt Hub | Automatic |
| Drift over time | No | No | No | Dashboards | Regression window |
| Root cause hints | No | No | No | No | Yes |
| Safety / red-team | Yes | Yes | No | No | 25 starters |
| MCP server | Plugin | Partial | No | No | Native |
| Vendor | OpenAI-owned | Independent | Independent | LangChain | Independent |
| Cost | Free | Freemium | Free | Freemium | Free |
Honest caveats: Promptfoo has more assertion types and a larger red-team corpus. RAGAS has the gold-standard RAG metrics (faithfulness, context precision, answer relevancy). LangSmith has better multi-user dashboards and deeper LangChain integration. promptry's niche is the combo of local SQLite + automatic versioning + CI-native + MCP server in one Python-first package.
## GitHub Action
Run eval suites in CI with one line. On pull requests it posts (or updates) a single comment summarizing the eval: overall score, pass/fail counts, and any regressed tests vs. the previous run. View on Marketplace.
```yaml
# .github/workflows/eval.yml
name: Eval
on: [push, pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # required for PR comments
    steps:
      - uses: actions/checkout@v4
      - uses: bihanikeshav/promptry@v0.6.0
        with:
          suite: rag-regression
          module: evals
          compare: prod  # optional — compare against baseline
```
Example PR comment on a regression:

```markdown
## promptry eval: rag-regression

| | Current | Baseline | Delta |
|---|---|---|---|
| Overall score | 0.891 | 0.910 | -0.019 |
| Passed | 8/10 | 9/10 | -1 |
| Status | REGRESSED | PASS | |

**Regressions:**

- `test_photosynthesis_answer`: semantic 0.89 -> 0.72 (-0.17)
- `test_schema_validation`: passed -> **failed**

_Generated by [promptry](https://github.com/bihanikeshav/promptry)_
```

Subsequent pushes edit the same comment instead of spamming new ones.
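The comparison behind that comment can be sketched as follows. The 0.05 tolerance, dict shapes, and score values are assumptions for illustration, not promptry's actual logic:

```python
# Hypothetical sketch of baseline comparison: each test has a score in
# [0, 1]; a test "regresses" when its score drops by more than a tolerance
# (a hard failure shows up as a drop to 0.0).
TOLERANCE = 0.05

baseline = {"test_photosynthesis_answer": 0.89, "test_schema_validation": 1.0}
current  = {"test_photosynthesis_answer": 0.72, "test_schema_validation": 0.0}

regressions = {
    name: (baseline[name], score)
    for name, score in current.items()
    if baseline.get(name, 0.0) - score > TOLERANCE
}
for name, (old, new) in sorted(regressions.items()):
    print(f"{name}: {old:.2f} -> {new:.2f} ({new - old:+.2f})")
```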
| Input | Required | Default | Description |
|---|---|---|---|
| `suite` | Yes | | Eval suite name |
| `module` | Yes | | Python module containing the suite |
| `compare` | No | | Baseline tag to compare against |
| `python-version` | No | `3.12` | Python version |
| `extras` | No | `semantic` | pip extras to install |
| `pr-comment` | No | `true` | Post/update a PR comment with results |
| `github-token` | No | `${{ github.token }}` | Token used to post PR comments |
## MCP server

```bash
claude mcp add promptry -- promptry mcp  # Claude Code
```

Works with Claude Desktop, Cursor, Windsurf, and VS Code. See full setup.
## Documentation
The full guide covers all assertions, cost tracking, model comparison, safety templates, notifications, storage modes, JS client, CLI reference, MCP setup, and config options.
## Honest caveats

- Early-stage. v0.7, solo-maintained, small user base. The API is stable but the bus factor is one. Issues welcome.
- "No API keys" applies to the framework only. SQLite storage and the CLI need nothing; `assert_llm`, `assert_grounded`, and cost tracking all need your own LLM provider key.
- Drift detection is a rolling-window regression on scores. It works for steady degradation over a configurable window (default 30 runs). It is not a formal hypothesis test — see the drift detection docs for exactly what it does and does not do.
- Safety templates are starters, not comprehensive coverage: 25 curated prompts across 6 categories. For serious red-teaming, look at garak or PyRIT. Bring your own templates via `templates.toml`.
- Cost tracking uses hardcoded rate tables. Fine for rough estimates; it won't reflect batching discounts, prompt caching, or provider price changes. Reconcile against invoices for finance.
- Auto-instrumentation is opt-in. `promptry.integrations.openai` and `.litellm` wrap clients automatically; otherwise you add `track()` manually. Explicit by default.
- No hosted multi-user UI. For that, look at LangSmith or Arize.
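The drift caveat above can be made concrete with a sketch of a rolling-window trend estimate: an ordinary least-squares slope over the last N overall scores, flagged when clearly negative. This is my reading of "rolling-window regression on scores", not promptry's actual code; the function name and numbers are illustrative:

```python
# Least-squares slope of the last `window` scores (score change per run).
# A clearly negative slope suggests steady degradation; this is a trend
# estimate, not a hypothesis test.
def drift_slope(scores: list[float], window: int = 30) -> float:
    recent = scores[-window:]
    n = len(recent)
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Steady degradation: a score of 0.90 drifting down by 0.005 each run.
history = [0.90 - 0.005 * i for i in range(30)]
slope = drift_slope(history)
print(f"slope per run: {slope:+.4f}")  # about -0.0050 -> flag drift
```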
## License
MIT