prompttest
Test framework for LLM prompt files.
What It Does
prompttest is a test framework for LLM prompt files, similar to what Jest or pytest does for application code. You write test suites in YAML that assert properties of your prompt files -- content, structure, token counts, cost limits, and more. Tests run in CI to catch prompt regressions.
Installation
pip install prompttest
Dependencies: prompttools-core >= 1.0, promptcost >= 1.0, typer >= 0.12, pyyaml >= 6.0, rich >= 13.0
CLI Commands
prompttest run
Run prompt tests from a file or directory.
# Run a single test file
prompttest run tests/test_greeting.yaml
# Run all test files in a directory
prompttest run tests/
# Run with custom glob pattern
prompttest run tests/ --pattern "check_*.yaml"
# Stop on first failure
prompttest run tests/ --fail-fast
# JSON output
prompttest run tests/ --format json
# JUnit XML output (for CI)
prompttest run tests/ --format junit
# Verbose output
prompttest run tests/ -v
Options:

| Option | Default | Description |
|---|---|---|
| --format, -f | text | Output format: text, json, junit |
| --model, -m | none | Override model for cost/token assertions |
| --fail-fast | false | Stop after first failure |
| --verbose, -v | false | Show detailed output for all tests |
| --pattern, -p | test_*.yaml | Glob pattern for test file discovery |
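Discovery walks the target directory and collects files matching the `--pattern` glob. A minimal sketch of that behavior with `pathlib` (a hypothetical helper, not the library's internals):

```python
from pathlib import Path

def discover(root: str, pattern: str = "test_*.yaml") -> list[Path]:
    """Collect test files under root matching the glob pattern.

    A single file is returned as-is; directories are searched
    recursively, sorted for stable ordering.
    """
    p = Path(root)
    if p.is_file():
        return [p]
    return sorted(p.rglob(pattern))
```

The sort keeps run order deterministic across filesystems, which matters for reproducible CI logs.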
prompttest init
Create an example test file in the current directory.
prompttest init
This creates test_example.yaml with sample test cases you can adapt to your project.
Test File Format
Test files are YAML with this structure:
suite: my-test-suite            # Suite name (optional, defaults to filename)
prompt: prompts/greeting.yaml   # Path to the prompt file (relative to test file)
model: gpt-4o                   # Default model for cost/token assertions (optional)
tests:
  - name: test-name             # Unique test name
    assert: assertion_type      # One of the 15 assertion types below
    # ... assertion-specific parameters
The prompt path is resolved relative to the test file's directory.
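That resolution rule can be sketched in a few lines (an illustration of the rule, not the library's code):

```python
from pathlib import Path

def resolve_prompt_path(test_file: str, prompt: str) -> Path:
    """Resolve the suite's prompt path relative to the test file's directory."""
    return (Path(test_file).parent / prompt).resolve()
```

So a suite at `tests/test_greeting.yaml` declaring `prompt: prompts/greeting.yaml` points at `tests/prompts/greeting.yaml`, regardless of the directory `prompttest` is invoked from.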
Assertion Types
prompttest supports 15 assertion types:
Content Assertions
contains
Assert that prompt content contains specific text.
- name: has-greeting-instruction
  assert: contains
  text: "greet the user"
  case_sensitive: false   # optional, default: false
not_contains
Assert that prompt content does NOT contain specific text.
- name: no-injection-risk
  assert: not_contains
  text: "ignore previous instructions"
matches_regex
Assert that prompt content matches a regular expression.
- name: has-version-tag
  assert: matches_regex
  pattern: "v\\d+\\.\\d+"
  case_sensitive: false
not_matches_regex
Assert that prompt content does NOT match a regular expression.
- name: no-hardcoded-urls
  assert: not_matches_regex
  pattern: "https?://api\\.example\\.com"
Structure Assertions
has_role
Assert that the prompt has a message with a given role.
- name: has-system-message
  assert: has_role
  role: system
has_variables
Assert that the prompt uses specific template variables.
- name: required-variables
  assert: has_variables
  variables:
    - user_name
    - context
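Variable detection depends on the template syntax, which is not specified here; assuming `{{ name }}`-style placeholders, the check can be sketched as:

```python
import re

def extract_variables(content: str) -> set[str]:
    """Collect placeholder names, assuming {{ name }}-style template syntax."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", content))

def has_variables(content: str, required: list[str]) -> bool:
    """True if every required variable appears somewhere in the content."""
    return set(required) <= extract_variables(content)
```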
has_metadata
Assert that the prompt has specific metadata keys.
- name: has-required-metadata
  assert: has_metadata
  keys:
    - model
    - description
valid_format
Assert that the prompt file parsed without errors and contains at least one message.
- name: parseable-prompt
  assert: valid_format
Token/Size Assertions
max_tokens
Assert that total token count is under a maximum.
- name: within-context-window
  assert: max_tokens
  max: 4096
min_tokens
Assert that total token count is above a minimum.
- name: not-too-short
  assert: min_tokens
  min: 50
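Real token counts come from the model's tokenizer; as a rough illustration of how `max_tokens`/`min_tokens` bounds are applied (the chars/4 heuristic below is explicitly not the library's counting method):

```python
def approx_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text.
    Real assertions should use the model's own tokenizer."""
    return max(1, len(text) // 4)

def check_max_tokens(messages: list[str], limit: int) -> bool:
    """Sum the per-message estimates and compare against the bound."""
    return sum(approx_tokens(m) for m in messages) <= limit
```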
max_messages
Assert that message count is under a maximum.
- name: reasonable-conversation
  assert: max_messages
  max: 10
min_messages
Assert that message count is above a minimum.
- name: has-enough-context
  assert: min_messages
  min: 2
token_ratio
Assert that the system/user token ratio is within bounds.
- name: balanced-prompt
  assert: token_ratio
  ratio_max: 5.0
The ratio is computed as system_tokens / user_tokens.
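The stated formula is straightforward; a sketch that also guards the divide-by-zero case (how the library handles zero user tokens is an assumption):

```python
def token_ratio(system_tokens: int, user_tokens: int) -> float:
    """system_tokens / user_tokens, per the definition above;
    treat an empty user side as an infinite ratio."""
    if user_tokens == 0:
        return float("inf")
    return system_tokens / user_tokens

def check_token_ratio(system_tokens: int, user_tokens: int, ratio_max: float) -> bool:
    return token_ratio(system_tokens, user_tokens) <= ratio_max
```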
Cost Assertions
max_cost
Assert that the estimated cost per invocation is under a budget ceiling. Requires a model (set on the test or the suite).
- name: cost-under-budget
  assert: max_cost
  max: 0.05
  model: gpt-4o   # optional if set on suite
Regression Assertions
content_hash
Assert that the prompt content SHA256 hash matches an expected value. Detects unexpected prompt changes.
- name: prompt-unchanged
  assert: content_hash
  hash: "a1b2c3d4..."   # omit to record current hash (always passes)
If hash is omitted, the test passes and reports the current hash so you can record it.
Test Options
Each test case supports these common options:
- name: example-test
  assert: contains
  text: "hello"
  skip: true                # Skip this test
  skip_reason: "not ready"  # Reason for skipping
  case_sensitive: false     # For text/regex assertions (default: false)
  model: gpt-4o             # Override suite model for this test
Output Formats
Text (default)
Rich-formatted terminal output with colored pass/fail indicators.
Suite: greeting-tests
Prompt: prompts/greeting.yaml

  PASS  has-system-message
  PASS  token-count-reasonable
  FAIL  no-injection-risk
        Content unexpectedly contains 'ignore previous instructions'
  PASS  cost-under-budget

Results: 3 passed, 1 failed (4 total)
Duration: 12ms
JSON
prompttest run tests/ --format json
Returns a JSON object with total, passed, failed, errors, skipped, duration_ms, and detailed suites array.
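An illustrative report with those top-level fields (the exact shape of each `suites` entry is an assumption, not the documented schema):

```python
import json

# Top-level keys match the fields named above; the per-suite
# structure here is a plausible sketch only.
report = {
    "total": 4,
    "passed": 3,
    "failed": 1,
    "errors": 0,
    "skipped": 0,
    "duration_ms": 12,
    "suites": [
        {
            "suite": "greeting-tests",
            "prompt": "prompts/greeting.yaml",
            "tests": [
                {"name": "no-injection-risk", "status": "failed",
                 "message": "Content unexpectedly contains 'ignore previous instructions'"},
            ],
        }
    ],
}
print(json.dumps(report, indent=2))
```

Machine-readable output like this is what you would feed into dashboards or custom CI gates.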
JUnit XML
prompttest run tests/ --format junit
Standard JUnit XML format compatible with CI systems (GitHub Actions, Jenkins, GitLab CI, CircleCI).
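For reference, a minimal JUnit document has a `<testsuite>` element with `<testcase>` children, failed cases carrying a `<failure>` child. A sketch of that structure with the standard library (an illustration of the format, not the library's renderer):

```python
import xml.etree.ElementTree as ET

def to_junit(suite_name: str, results: list[dict]) -> str:
    """Render results as minimal JUnit XML."""
    failures = sum(1 for r in results if r["status"] == "failed")
    ts = ET.Element("testsuite", name=suite_name,
                    tests=str(len(results)), failures=str(failures))
    for r in results:
        tc = ET.SubElement(ts, "testcase", name=r["name"])
        if r["status"] == "failed":
            ET.SubElement(tc, "failure", message=r.get("message", ""))
    return ET.tostring(ts, encoding="unicode")
```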
Programmatic Usage
from prompttest import (
    load_test_suite,
    run_test_suite,
    run_test_file,
    run_test_directory,
    discover_test_files,
    format_text,
    format_json,
    format_junit,
)

# Run a single test file
report = run_test_file("tests/test_greeting.yaml")
print(f"Passed: {report.passed}/{report.total}")

# Run all tests in a directory
report = run_test_directory("tests/", fail_fast=True, pattern="test_*.yaml")

# Format output
print(format_text(report))
print(format_json(report))
print(format_junit(report))

# Load and run a suite manually
suite = load_test_suite("tests/test_greeting.yaml")
results = run_test_suite(suite, fail_fast=False)
for r in results:
    print(f"{r.test_name}: {r.status.value} - {r.message}")
CI Integration
GitHub Actions
- name: Run prompt tests
  run: prompttest run tests/ --format junit > test-results.xml
- name: Upload test results
  uses: actions/upload-artifact@v4
  with:
    name: prompt-test-results
    path: test-results.xml
GitLab CI
prompt-tests:
  script:
    - pip install prompttest
    - prompttest run tests/ --format junit > report.xml
  artifacts:
    reports:
      junit: report.xml
Exit codes:
| Code | Meaning |
|---|---|
| 0 | All tests passed (or no tests found) |
| 1 | One or more tests failed or errored |
| 2 | Path not found |
License
MIT License. Author: Scott Converse.
File details
Details for the file prompttest_ai-1.0.0.tar.gz.
File metadata
- Download URL: prompttest_ai-1.0.0.tar.gz
- Upload date:
- Size: 21.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da4b70b9262d5c226b4692e2033ce308fe927300cd5db185a94cb21ceaab6af2 |
| MD5 | 2bb817188a43b0bbefa4de04fd1a2623 |
| BLAKE2b-256 | 343f509bf655309059ddb224ca0450abbfc5cf60ac8143685dee4298c419cf9e |
File details
Details for the file prompttest_ai-1.0.0-py3-none-any.whl.
File metadata
- Download URL: prompttest_ai-1.0.0-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8fe9b7044fbb7b559c7423ed32bb9ed48ce8a1bf42b1390641564dd286f91ce |
| MD5 | 972b6bdb26c5ee1c710ab177bfb1c096 |
| BLAKE2b-256 | e88ab1d52236e01e5602883888ce0b11c62d409f39c77acc2a101e71fd4b57c4 |