
PromptCheck

PromptCheck is a CI-first test harness for LLM prompts. It lets you write tests for LLM outputs and automatically checks them in CI so you can catch prompt regressions early — before they reach users. PromptCheck works with any LLM provider, including OpenAI, Anthropic, and open-source models via Groq, OpenRouter, or local APIs.

Write tests in YAML, gate pull requests, and see pass/fail summaries posted as comments, so your prompts don't quietly regress.



Install & Run

pip install promptcheck
promptcheck init   # creates a config and test scaffold
promptcheck run    # runs all prompt tests

Need a full example? See example/ or the Quick-Start Guide.




Why Prompt Testing Matters

LLMs can break without warning — even small prompt changes or model updates can cause major regressions. PromptCheck automates prompt evaluation like unit tests automate code quality.


What does PromptCheck do?

When you tweak a prompt, swap models, or refactor your agent code, PromptCheck runs a battery of checks in CI (ROUGE, regex, token cost, latency, etc.) and fails the pull request if quality regresses or cost spikes.

Think pytest + coverage, but for LLM output.
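As an illustration of the gating idea, the CI check amounts to comparing per-metric results against limits and failing the job on any violation. This is a sketch only: the metric names, values, and threshold semantics below are assumptions for the example, not PromptCheck's actual schema.

```python
import operator

# Illustrative only: these names and numbers are invented for the sketch.
results = {"latency_ms": 8200, "cost_usd": 0.0031, "rouge_l_f": 0.42}
thresholds = {
    "latency_ms": ("<=", 10000),  # fail if slower than 10 s
    "cost_usd": ("<=", 0.01),     # fail if a run costs more than one cent
    "rouge_l_f": (">=", 0.30),    # fail if quality drops below 0.30
}

ops = {"<=": operator.le, ">=": operator.ge}
failures = [name for name, (op, limit) in thresholds.items()
            if not ops[op](results[name], limit)]

exit_code = 1 if failures else 0  # a non-zero exit fails the CI job
print(failures, exit_code)
```

All three example metrics are within their limits, so the job passes; raising `rouge_l_f`'s floor above 0.42 would flip the exit code to 1 and fail the pull request.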


Key Features

  • Easy setup — drop a YAML test file, add the GitHub Action, done.
  • Multi‑provider — Works with OpenAI, Anthropic, Groq, OpenRouter, or any model you connect via API (more built-in providers coming soon).
  • Metrics out‑of‑the‑box — exact/regex match, ROUGE‑L, BLEU (optional), token‑count, latency, cost.
  • Readable reports — Action log output, a run.json artifact, and (coming soon) a PR comment bot.
  • Fast to extend — write your own metric in <30 lines (standard Python).
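To make the "own metric in <30 lines" claim concrete, here is a minimal sketch of a scoring function. The registration/plug-in interface is not shown and may differ; this only illustrates that a metric is just a small Python function that scores an LLM output.

```python
# Hypothetical custom metric: a plain scoring function.
# How it gets wired into PromptCheck is not shown here.

def keyword_coverage(output: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear (case-insensitively) in the output."""
    if not expected_keywords:
        return 1.0
    lowered = output.lower()
    found = sum(1 for kw in expected_keywords if kw.lower() in lowered)
    return found / len(expected_keywords)

score = keyword_coverage("Hello! I'm your assistant.", ["hello", "assistant"])
print(score)  # 1.0
```

A threshold on such a score (e.g. require coverage >= 0.8) then works exactly like the built-in metric thresholds.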
Free vs Pro feature comparison:

  • CLI & GitHub Action
  • Unlimited history & charts
  • Slack alerts

What it looks like

(Screenshot: PromptCheck PR comment)


How the YAML works (tests/*.yaml)

A test file contains a list of test cases. Here's an example structure:

- id: "openrouter_greet_test_001"
  name: "OpenRouter Basic Greeting Test"
  description: "Tests a basic greeting prompt."
  type: "llm_generation"

  input_data:
    prompt: "Briefly introduce yourself and greet the user."

  expected_output:
    # For regex_match, a pattern to find in the LLM's output
    regex_pattern: ".+" # Example: matches any non-empty string

  metric_configs:
    - metric: "regex_match" 
    - metric: "token_count"
      parameters:
        count_types: ["completion", "total"]
    - metric: "latency"
      threshold: # Optional: fail if conditions aren't met
        value: 10000 # e.g., latency_ms <= 10000
    - metric: "cost" 

  model_config:
    provider: "openrouter"
    model_name: "mistralai/mistral-7b-instruct"
    parameters:
      temperature: 0.7
      max_tokens: 50
      timeout_s: 25.0 
      retry_attempts: 2

  tags: ["openrouter", "greeting"] 

Add more cases in tests/. Thresholds (like value for latency, or f_score for rouge) are defined within the threshold object of a metric_config.
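For intuition about what an f_score threshold on the rouge metric gates: ROUGE-L is the F-measure over the longest common subsequence (LCS) of candidate and reference tokens. A minimal sketch, using naive whitespace tokenization (real implementations tokenize differently):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest-common-subsequence length over token lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f(candidate: str, reference: str) -> float:
    # Naive whitespace split; production ROUGE implementations differ.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; completely disjoint texts score 0.0, so a threshold like f_score >= 0.3 fails the test when the output drifts too far from the reference.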


Installation Options & Development Setup

# From PyPI
pip install promptcheck

# With optional BLEU metric (requires NLTK)
pip install "promptcheck[bleu]"

# For development:
poetry install                # installs base dependencies
poetry install --extras bleu  # installs with BLEU support

Releasing (maintainers)

# 1. Ensure tests pass and docs are updated
# 2. Bump version in pyproject.toml
poetry version <new_version>  # e.g., 0.1.0, 0.2.0b1

# 3. Build the package
poetry build

# 4. Publish to TestPyPI first (configured in pyproject.toml or via poetry config)
# poetry config pypi-token.testpypi <YOUR_TESTPYPI_TOKEN>
poetry publish -r testpypi

# 5. Test TestPyPI package thoroughly (e.g., in a clean venv or CI)

# 6. Publish to PyPI (prod)
# poetry config pypi-token.pypi <YOUR_PYPI_TOKEN>
poetry publish

# 7. Tag the release in Git
git tag v<new_version>        # e.g., v0.1.0
git push origin v<new_version>

Documentation

📖 Docs: Quick-Start Guide · YAML Reference (Coming Soon!)


Roadmap

  • PR comment bot (✅/❌ matrix in‑line)
  • Hosted dashboard (Supabase)
  • Async runner for large test suites
  • More metrics and LLM provider integrations

Contributing

  1. Fork & clone the repository.
  2. Set up your development environment: poetry install --extras bleu (to include all deps).
  3. Run tests locally: poetry run promptcheck run tests/ (or a specific file). Keep it green!
  4. Make your changes, add tests for new features.
  5. Open a Pull Request.

Feedback & Questions

Found an issue or have a question? We'd love to hear from you! Please open an issue or start a discussion.


License

License: Business Source License 1.1
PromptCheck is free to use for evaluation and non-production use. For commercial licenses, contact us.


