pytest for LLMs - catch prompt regressions before they reach production
> evalflow
pytest for LLMs
You changed one prompt.
Summarization improved.
Classification silently broke.
Nobody noticed for 4 days.
evalflow catches this in CI before it ships.
Install
pip install evalflow
Quick Start
evalflow init
evalflow eval
What you get on day one:
- local prompt and dataset files
- SQLite-backed run history in .evalflow/
- CI-friendly exit codes
- offline cache support for repeatable checks
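To make "SQLite-backed run history" concrete, here is a minimal sketch of how per-run scores could be kept in SQLite using Python's standard `sqlite3` module. The table layout and the `open_history`, `record_run`, and `list_runs` helpers are hypothetical illustrations, not evalflow's actual storage schema under `.evalflow/`. The run ID, model, and scores in the usage lines are copied from the sample terminal output.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical schema for illustration only; evalflow's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id TEXT PRIMARY KEY,
    created_at TEXT NOT NULL,
    model TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS results (
    run_id TEXT NOT NULL REFERENCES runs(run_id),
    test_case TEXT NOT NULL,
    score REAL NOT NULL,
    PRIMARY KEY (run_id, test_case)
);
"""


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a history database; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn: sqlite3.Connection, run_id: str, model: str, scores: Dict[str, float]) -> None:
    """Store one run and its per-case scores in a single transaction."""
    created_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (run_id, created_at, model))
        conn.executemany(
            "INSERT INTO results VALUES (?, ?, ?)",
            [(run_id, name, score) for name, score in scores.items()],
        )


def list_runs(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Return (run_id, model, number_of_cases) for every stored run, oldest first."""
    return conn.execute(
        """
        SELECT r.run_id, r.model, COUNT(x.test_case)
        FROM runs AS r LEFT JOIN results AS x ON x.run_id = r.run_id
        GROUP BY r.run_id, r.model, r.created_at
        ORDER BY r.created_at
        """
    ).fetchall()


# Usage with the scores from the sample run (in-memory database).
conn = open_history()
record_run(
    conn,
    "20240315-a3f9c2d81b4e",
    "gpt-4o-mini",
    {
        "summarize_short_article": 0.91,
        "classify_sentiment": 1.00,
        "extract_entities": 0.87,
        "answer_with_context": 0.61,
        "rewrite_formal": 0.93,
    },
)
```

A single local database file needs no server or account, which is consistent with the "local SQLite storage, no account needed" point under Features.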
Terminal Screenshot
> evalflow eval
Running 5 test cases against gpt-4o-mini...
✓ summarize_short_article 0.91
✓ classify_sentiment 1.00
✓ extract_entities 0.87
✗ answer_with_context 0.61
✓ rewrite_formal 0.93
Quality Gate: PASS
Failures: 1
Run ID: 20240315-a3f9c2d81b4e
Why evalflow
Traditional unit tests do not tell you when a prompt tweak quietly degrades a task. evalflow gives you a small local quality gate for prompt, model, and dataset changes.
Use it when you need to:
- catch regressions before merge
- compare runs locally
- keep prompt versions in YAML
- run the same gate in CI and on a laptop
GitHub Actions Workflow
# .github/workflows/evalflow.yml
name: LLM Quality Gate
on:
  pull_request:
    paths:
      - "prompts/**"
      - "evals/**"
      - "**.py"
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install evalflow
      - run: evalflow eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
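GitHub Actions fails a step whenever its command exits with a non-zero status, so this workflow blocks the pull request both when the gate fails (exit code 1) and when it errors (exit code 2), using the exit codes listed under Features. If you wrap the gate in your own script, for example to retry only on errors such as a provider outage, a minimal Python sketch for classifying the exit status might look like this. `run_gate` and `describe_exit` are hypothetical helper names, not part of evalflow.

```python
import subprocess
from typing import List, Tuple

# Exit codes as documented under Features: 0 = pass, 1 = fail, 2 = error.
EXIT_MEANINGS = {0: "pass", 1: "fail", 2: "error"}


def describe_exit(code: int) -> str:
    """Map a gate exit code to a label; any other code is reported as unknown."""
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def run_gate(cmd: List[str]) -> Tuple[int, str]:
    """Run the gate command (e.g. ["evalflow", "eval"]) and classify its exit status."""
    completed = subprocess.run(cmd, check=False)
    return completed.returncode, describe_exit(completed.returncode)
```

Calling `run_gate(["evalflow", "eval"])` returns the raw code plus a label, so a wrapper can treat `"fail"` as a real regression and `"error"` as something worth retrying before giving up.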
Features
- pytest-style exit codes: 0 = pass, 1 = fail, 2 = error
- exact match, embedding, consistency, and LLM judge scoring methods
- baseline snapshots catch regressions, not just low scores
- prompt registry keeps prompts versioned in YAML
- works with OpenAI, Anthropic, Groq, Gemini, and Ollama
- local SQLite storage, no account needed
- offline cache for repeated and CI-safe checks
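The idea behind baseline snapshots can be sketched in a few lines: compare each test case's current score with the score recorded in a baseline, and flag drops larger than a tolerance, even when the new score would still clear an absolute threshold. This is a hypothetical illustration of the concept, not evalflow's implementation; the function name, the `tolerance` default of 0.05, and the baseline scores are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Regression:
    test_case: str
    baseline: float
    current: float


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> List[Regression]:
    """Return cases whose score fell more than `tolerance` below the baseline."""
    regressions = []
    for name, base_score in baseline.items():
        score = current.get(name)
        if score is None:
            # Case missing from this run; a real tool would report it separately.
            continue
        if base_score - score > tolerance:
            regressions.append(Regression(name, base_score, score))
    return regressions


# Hypothetical scores echoing the intro: summarization improved, classification broke.
BASELINE = {"summarize_short_article": 0.85, "classify_sentiment": 1.00, "answer_with_context": 0.60}
CURRENT = {"summarize_short_article": 0.91, "classify_sentiment": 0.70, "answer_with_context": 0.61}

regressions = find_regressions(BASELINE, CURRENT)
# classify_sentiment is flagged even though 0.70 might pass a fixed threshold;
# answer_with_context scores low but did not get worse, so it is not flagged.
```

Comparing against a baseline rather than a fixed cutoff is what separates "this case got worse" from "this case was always weak".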
Command Surface
evalflow init
evalflow eval
evalflow doctor
evalflow runs
evalflow compare RUN_A RUN_B
evalflow prompt list
Documentation
- Docs hub: emartai.mintlify.app
- Quickstart source: docs/quickstart.mdx
- CLI reference source: docs/cli-reference.mdx
- CI guide source: docs/ci-github-actions.mdx
- Provider docs: docs/providers
Security
- evalflow reads API keys from environment variables, never from config files
- evalflow.yaml stores environment variable names, not secret values
- keep .env and .evalflow/ out of git
- see docs/dev-doc/security.md for the full security model
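Because evalflow.yaml holds only the name of an environment variable, the secret itself is looked up at run time. A minimal Python sketch of that indirection follows; the `api_key_env` field and the `resolve_api_key` helper are hypothetical, not evalflow's actual config keys or code.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider_config: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Look up the API key named by the config; the config never holds the secret itself."""
    env = os.environ if environ is None else environ
    var_name = provider_config["api_key_env"]  # hypothetical field, e.g. "OPENAI_API_KEY"
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return value
```

With this pattern the same config file works on a laptop (with the variable exported in your shell) and in the GitHub Actions workflow, where `OPENAI_API_KEY` comes from repository secrets.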
Reporting Security Issues
Please do not open public GitHub issues for security vulnerabilities. Open a private GitHub Security Advisory.
Examples
Development
See CONTRIBUTING.md for local setup, tests, smoke checks, and performance baselines.
License
MIT
Download files
File details
Details for the file evalflow-0.1.6.tar.gz.
File metadata
- Download URL: evalflow-0.1.6.tar.gz
- Upload date:
- Size: 50.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f247378ea7327498fd0ac2e9cf250a8ba8f19892e7a5d47954138815dce85214 |
| MD5 | b43718267748b45db01f6610e147d285 |
| BLAKE2b-256 | 2f5c6bd6578541e9cac88e56fcff190d9e67c05dc8f28943d192500ff2b938bd |
Provenance
The following attestation bundles were made for evalflow-0.1.6.tar.gz:
Publisher: publish.yml on emartai/evalflow
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: evalflow-0.1.6.tar.gz
- Subject digest: f247378ea7327498fd0ac2e9cf250a8ba8f19892e7a5d47954138815dce85214
- Sigstore transparency entry: 1247492451
- Sigstore integration time:
- Permalink: emartai/evalflow@e79da0b8dc35b315aedb9a030de0c7c6e9fab809
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/emartai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e79da0b8dc35b315aedb9a030de0c7c6e9fab809
- Trigger Event: push
File details
Details for the file evalflow-0.1.6-py3-none-any.whl.
File metadata
- Download URL: evalflow-0.1.6-py3-none-any.whl
- Upload date:
- Size: 53.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 266ea550ce9b0231b4165747acecf24a6dba777ad7b69446040334506077a2d2 |
| MD5 | d4bbd6b0dfcd9f47d7f8d44473217810 |
| BLAKE2b-256 | 83ff70d6dd88fc1441ad39373b7220484e637efbd64837bfdc34a3859f71327d |
Provenance
The following attestation bundles were made for evalflow-0.1.6-py3-none-any.whl:
Publisher: publish.yml on emartai/evalflow
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: evalflow-0.1.6-py3-none-any.whl
- Subject digest: 266ea550ce9b0231b4165747acecf24a6dba777ad7b69446040334506077a2d2
- Sigstore transparency entry: 1247492458
- Sigstore integration time:
- Permalink: emartai/evalflow@e79da0b8dc35b315aedb9a030de0c7c6e9fab809
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/emartai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e79da0b8dc35b315aedb9a030de0c7c6e9fab809
- Trigger Event: push