# promptlab ⚡
Automated testing for LLM prompts. Write test cases in YAML, run them against Claude or OpenAI, get pass/fail results in your terminal.
Like pytest, but for prompts.
```shell
pip install promptlab-cli
promptlab run tests/
```

```
✅ summarize_article :: returns_short_summary PASS (1.2s)
✅ summarize_article :: mentions_key_points PASS (1.1s)
❌ translate_text :: preserves_tone FAIL (0.9s)
   Expected: contains "formal"
   Got: "Here is the translated text in a casual style..."
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 2 passed, 1 failed, 3 total (3.2s)
```
## Why?
You're building an app with Claude or GPT. Your prompt works today. Tomorrow you tweak it and something breaks. You don't notice until a user complains.
promptlab catches prompt regressions before they ship. Define what good output looks like, run tests on every change, and know immediately if something broke.
## Quickstart

### Install

```shell
pip install promptlab-cli
```
### Set your API key

```shell
export ANTHROPIC_API_KEY=sk-ant-...
# or
export OPENAI_API_KEY=sk-...
```
### Write a test file

Create `tests/summarize.yaml`:
```yaml
prompt: |
  Summarize this article in 2-3 sentences:

  {{ article }}

model: claude-sonnet-4-20250514

tests:
  - name: short_summary
    vars:
      article: |
        The Federal Reserve held interest rates steady on Wednesday,
        keeping the benchmark rate in the 5.25%-5.50% range. Chair
        Jerome Powell said the committee needs more confidence that
        inflation is moving toward the 2% target before cutting rates.
    assert:
      - type: max_tokens
        value: 100
      - type: contains
        value: "Federal Reserve"
      - type: contains
        value: "interest rate"

  - name: handles_empty_input
    vars:
      article: ""
    assert:
      - type: not_contains
        value: "error"
      - type: min_length
        value: 10
```
### Run it

```shell
promptlab run tests/
```
## Test File Format

Each `.yaml` file defines a prompt and its test cases:
```yaml
# The prompt template. Use {{ variable }} for inputs.
prompt: |
  You are a helpful assistant. {{ instruction }}

# Which model to use
model: claude-sonnet-4-20250514  # or gpt-4o, claude-haiku-4-5-20251001, etc.

# Optional system prompt
system: "You are a concise technical writer."

# Optional model parameters
temperature: 0
max_tokens: 500

# Test cases
tests:
  - name: test_name
    vars:
      instruction: "Explain what a CPU does in one sentence."
    assert:
      - type: contains
        value: "processor"
      - type: max_tokens
        value: 50
```
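Rendering a template like this is plain variable substitution. A minimal sketch of how `{{ variable }}` filling could work (illustrative only, not promptlab's actual template engine):

```python
import re

def render(template: str, vars: dict) -> str:
    """Replace each {{ name }} placeholder with its value from vars."""
    def sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in vars:
            raise KeyError(f"missing template variable: {name}")
        return str(vars[name])
    # Allow optional whitespace inside the braces: {{name}} or {{ name }}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

prompt = render(
    "You are a helpful assistant. {{ instruction }}",
    {"instruction": "Explain what a CPU does in one sentence."},
)
```

Raising on a missing variable (rather than leaving the placeholder in place) means a typo in `vars` fails the test loudly instead of silently sending a malformed prompt to the model.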
## Assertion Types
| Type | Description | Example |
|---|---|---|
| `contains` | Output must contain this string (case-insensitive) | `value: "machine learning"` |
| `not_contains` | Output must NOT contain this string | `value: "I'm sorry"` |
| `starts_with` | Output must start with this string | `value: "Sure"` |
| `regex` | Output must match this regex pattern | `value: "\\d{4}"` |
| `max_tokens` | Output must be at most N tokens | `value: 100` |
| `min_length` | Output must be at least N characters | `value: 50` |
| `max_length` | Output must be at most N characters | `value: 500` |
| `equals` | Output must exactly equal this string | `value: "42"` |
| `llm_judge` | Ask another LLM to evaluate the output | `value: "Is this response helpful and accurate?"` |
## LLM-as-Judge

`llm_judge` is the most powerful assertion type: it uses a second LLM call to evaluate output quality.
```yaml
tests:
  - name: helpful_response
    vars:
      question: "How do I fix a memory leak in Python?"
    assert:
      - type: llm_judge
        value: "Does this response provide specific, actionable debugging steps? Answer YES or NO."
```
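Because the judge prompt asks for a YES/NO verdict, the runner only needs to read the leading token of the judge's reply. A hypothetical verdict parser (promptlab's actual logic may differ):

```python
def parse_verdict(judge_reply: str) -> bool:
    """Interpret a YES/NO judge reply, tolerating casing and punctuation."""
    stripped = judge_reply.strip()
    if not stripped:
        raise ValueError("judge returned an empty reply")
    first = stripped.split(None, 1)[0].strip(".,!:").upper()
    if first == "YES":
        return True
    if first == "NO":
        return False
    raise ValueError(f"judge gave no clear YES/NO verdict: {judge_reply!r}")
```

Raising on an ambiguous reply (instead of defaulting to fail) surfaces judge-prompt problems, which are otherwise easy to mistake for real regressions.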
## Supported Models

**Anthropic (Claude):**

- `claude-sonnet-4-20250514`
- `claude-haiku-4-5-20251001`
- Set the `ANTHROPIC_API_KEY` environment variable

**OpenAI:**

- `gpt-4o`
- `gpt-4o-mini`
- Set the `OPENAI_API_KEY` environment variable
## CLI Commands

```shell
# Run all test files in a directory
promptlab run tests/

# Run a single test file
promptlab run tests/summarize.yaml

# Verbose output (show full LLM responses)
promptlab run tests/ --verbose

# Output results as JSON
promptlab run tests/ --json

# Dry run (show what would be tested without calling APIs)
promptlab run tests/ --dry-run
```
## Use Cases
- Prompt regression testing — Run tests in CI/CD to catch regressions
- Prompt comparison — Test the same cases across different models
- Guard rails validation — Verify your prompt rejects harmful inputs
- Output format checking — Ensure structured output matches expectations
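For the CI/CD use case, a workflow that runs the suite on every pull request might be sketched like this (the workflow file, job name, and secret name are illustrative assumptions, not part of promptlab):

```yaml
# .github/workflows/prompt-tests.yml (hypothetical example)
name: prompt-tests
on: [pull_request]
jobs:
  promptlab:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install promptlab-cli
      - run: promptlab run tests/
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Because each test run calls paid APIs, keeping the suite on `pull_request` only (rather than every push) is one way to bound cost.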
## Development

```shell
git clone https://github.com/vigp17/promptlab.git
cd promptlab
pip install -e ".[dev]"
pytest
```
## Roadmap
- YAML test definitions
- Claude and OpenAI support
- 9 assertion types including LLM-as-judge
- CLI with colored output
- Cost tracking per test run
- HTML report generation
- Parallel test execution
- GitHub Actions integration
- Prompt versioning and diff
- Custom scoring functions
## License
MIT