Regression protection for LLM pipelines

promptry

PyPI · npm · CI · Python 3.10+ · License: MIT

Sentry for prompts. Sentry catches when your code breaks. promptry catches when your prompts break — versions them, runs eval suites in CI, and flags regressions or drift against a baseline. Local-first. No SaaS.

from promptry import track, suite, assert_semantic

# track() content-hashes your prompt and stores a new version if it changed
prompt = track(system_prompt, "rag-qa")
response = llm.chat(system=prompt, ...)

# suites are regular Python functions. run them via CLI or in CI.
@suite("rag-regression")
def test_quality():
    response = my_pipeline("What is photosynthesis?")
    assert_semantic(response, "Converts light into chemical energy")
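
Under the hood, track() is essentially content-addressed storage for prompt text: the prompt is hashed, and a new version is recorded only when the hash changes. A rough sketch of the idea (an illustration only, not promptry's actual implementation or storage schema):

import hashlib

versions: dict[str, list[str]] = {}        # name -> content hashes (stand-in for the SQLite store)

def track_sketch(prompt_text: str, name: str) -> str:
    digest = hashlib.sha256(prompt_text.encode()).hexdigest()
    history = versions.setdefault(name, [])
    if not history or history[-1] != digest:
        history.append(digest)             # record a new version only when the content changed
    return prompt_text                     # the text itself flows on to your LLM call unchanged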

When a suite regresses against its baseline, promptry reports what changed:

Overall score: 0.910 -> 0.720  REGRESSION

Probable cause:
  -> Prompt changed (v3 -> v4)

Install

pip install promptry                       # core
pip install promptry[semantic]             # + semantic assertions (sentence-transformers)
pip install promptry[dashboard]            # + web dashboard
pip install promptry[semantic,dashboard]   # everything

Quick start

promptry init                              # scaffold project + starter eval
promptry run smoke-test --module evals     # run it
PASS test_basic_quality (142ms)
  semantic (0.891) ok

Overall: PASS  score: 0.891
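
The scaffolded starter eval is an ordinary Python module, passed to the CLI via --module evals (so typically an evals.py file or evals/ package; the exact layout here is an assumption). A minimal sketch using the names from the output above, where my_pipeline stands in for your own code:

# evals.py (hypothetical scaffold layout)
from promptry import suite, assert_semantic

@suite("smoke-test")
def test_basic_quality():
    response = my_pipeline("What is photosynthesis?")   # call your own LLM pipeline here
    assert_semantic(response, "Converts light into chemical energy")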

Features

Feature                 What it does
Prompt versioning       Content-hashed, automatic dedup
Eval suites             Semantic, schema, LLM-as-judge, JSON, regex, grounding assertions
Regression detection    Compare against baselines, get root cause hints
Drift detection         Catch slow quality degradation over time
Model comparison        Statistical comparison against historical baseline (not just snapshots)
Cost tracking           Token usage and cost per prompt, aggregated reports
Safety templates        25 starter jailbreak / injection / PII tests; add your own
MCP server              Expose everything as tools for Claude, Cursor, VS Code, etc.
Dashboard               Web UI for eval history, prompt diffs, model comparison, cost
JS/TS client            Ship prompt events from frontend/Node apps
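
Because suites are plain Python functions, the built-in assertions can be mixed with ordinary Python checks in the same test. In this sketch, only assert_semantic is used as documented above; the JSON and regex checks are plain stdlib, and my_pipeline is a placeholder for your own code.

import json
import re

from promptry import suite, assert_semantic

@suite("structured-output")
def test_structured_answer():
    response = my_pipeline("Summarize photosynthesis as a JSON object")  # your own pipeline
    data = json.loads(response)                                          # plain JSON/shape check
    assert isinstance(data.get("summary"), str)
    assert re.search(r"light|chlorophyll", data["summary"], re.I)        # plain regex check
    assert_semantic(data["summary"], "Plants convert light into chemical energy")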

Dashboard

pip install promptry[dashboard]
promptry dashboard

Screenshots: Overview · Suite Detail · Prompts · Models · Cost

How it differs

                     Promptfoo         DeepEval       RAGAS          LangSmith      promptry
Language             TypeScript        Python         Python         Python + JS    Python + JS
Local-first          Yes               Cloud push     Yes            SaaS only      SQLite
Prompt versioning    Via git + YAML    No             No             Prompt Hub     Automatic
Drift over time      No                No             No             Dashboards     Regression window
Root cause hints     No                No             No             No             Yes
Safety / red-team    Yes               Yes            No             No             25 starters
MCP server           Plugin            Partial        No             No             Native
Vendor               OpenAI-owned      Independent    Independent    LangChain      Independent
Cost                 Free              Freemium       Free           Freemium       Free

Honest caveats: Promptfoo has more assertion types and a larger red-team corpus. RAGAS has the gold-standard RAG metrics (faithfulness, context precision, answer relevancy). LangSmith has better multi-user dashboards and deeper LangChain integration. promptry's niche is the combo of local SQLite + automatic versioning + CI-native + MCP server in one Python-first package.

GitHub Action

Run eval suites in CI with one line. View on Marketplace.

# .github/workflows/eval.yml
name: Eval
on: [push]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bihanikeshav/promptry@v0.5.0
        with:
          suite: rag-regression
          module: evals
          compare: prod  # optional — compare against baseline

Input            Required   Default    Description
suite            Yes                   Eval suite name
module           Yes                   Python module containing the suite
compare          No                    Baseline tag to compare against
python-version   No         3.12       Python version
extras           No         semantic   pip extras to install

MCP server

claude mcp add promptry -- promptry mcp    # Claude Code

Works with Claude Desktop, Cursor, Windsurf, VS Code. See full setup.

Documentation

The full guide covers all assertions, cost tracking, model comparison, safety templates, notifications, storage modes, JS client, CLI reference, MCP setup, and config options.

Honest caveats

  • Early-stage. v0.6, solo-maintained, small user base. API is stable but bus-factor is one. Issues welcome.
  • "No API keys" applies to the framework only. SQLite storage and the CLI need nothing. assert_llm, assert_grounded, and cost tracking all need your own LLM provider key.
  • Drift detection is a rolling-window regression on scores. It works for steady degradation over a configurable window (default 30 runs); it is not a formal hypothesis test, so see the drift detection docs for exactly what it does and does not do. A rough sketch of the idea follows this list.
  • Safety templates are starters, not comprehensive coverage. 25 curated prompts across 6 categories. For serious red-teaming look at garak or PyRIT. Bring your own templates via templates.toml.
  • Cost tracking uses hardcoded rate tables. Fine for rough estimates; won't reflect batching discounts, prompt caching, or provider price changes. Reconcile against invoices for finance.
  • Auto-instrumentation is opt-in. promptry.integrations.openai and .litellm wrap clients automatically; otherwise you add track() manually. Explicit by default.
  • No hosted multi-user UI. For that, look at LangSmith or Arize.
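
As a rough illustration of what a rolling-window regression on scores means (this is not promptry's code; the window and threshold here are arbitrary): fit a least-squares line to the most recent scores and treat a persistently negative slope as drift.

from statistics import mean

def drift_slope(scores: list[float], window: int = 30) -> float:
    recent = scores[-window:]
    xs = range(len(recent))
    x_bar, y_bar = mean(xs), mean(recent)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, recent))
    den = sum((x - x_bar) ** 2 for x in xs) or 1     # guard a single-score window
    return num / den                                 # score change per run

scores = [0.91, 0.90, 0.89, 0.88, 0.88, 0.86]
if drift_slope(scores) < -0.005:                     # arbitrary threshold
    print("scores trending downward: possible drift")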

License

MIT

