
pytest for LLMs - catch prompt regressions before they reach production

Project description

> evalflow

pytest for LLMs


You changed one prompt.
Summarization improved.
Classification silently broke.
Nobody noticed for 4 days.

evalflow catches this in CI before it ships.

Install

pip install evalflow

Quick Start

evalflow init
evalflow eval

What you get on day one:

  • local prompt and dataset files
  • SQLite-backed run history in .evalflow/
  • CI-friendly exit codes
  • offline cache support for repeatable checks
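The scaffold produced by `evalflow init` could look something like the sketch below. The file name, keys, and values here are illustrative assumptions, not the tool's documented schema; run `evalflow init` to see the real layout.

```yaml
# evals/summarize.yaml (hypothetical layout)
cases:
  - id: summarize_short_article
    prompt: summarize_short_article   # name of a prompt in the YAML registry
    input:
      article: "Text of a short article to summarize..."
    method: llm_judge                 # one of the scoring methods listed below
    threshold: 0.8                    # minimum score for this case to pass
```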

Terminal Screenshot

> evalflow eval

Running 5 test cases against gpt-4o-mini...

✓ summarize_short_article    0.91
✓ classify_sentiment         1.00
✓ extract_entities           0.87
✗ answer_with_context        0.61
✓ rewrite_formal             0.93

Quality Gate: FAIL
Failures: 1
Run ID: 20240315-a3f9c2d81b4e

Why evalflow

Traditional unit tests do not tell you when a prompt tweak quietly degrades a task. evalflow gives you a small local quality gate for prompt, model, and dataset changes.

Use it when you need to:

  • catch regressions before merge
  • compare runs locally
  • keep prompt versions in YAML
  • run the same gate in CI and on a laptop

GitHub Actions Workflow

# .github/workflows/evalflow.yml
name: LLM Quality Gate

on:
  pull_request:
    paths:
      - "prompts/**"
      - "evals/**"
      - "**.py"

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install evalflow
      - run: evalflow eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Features

  • pytest-style exit codes: 0=pass, 1=fail, 2=error
  • exact match, embedding, consistency, and LLM judge methods
  • baseline snapshots catch regressions, not just low scores
  • prompt registry keeps prompts versioned in YAML
  • works with OpenAI, Anthropic, Groq, Gemini, and Ollama
  • local SQLite storage, no account needed
  • offline cache for repeated and CI-safe checks
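Because the exit codes mirror pytest's, a custom gate script can branch on them directly. This is a minimal illustrative wrapper, not part of evalflow itself; the function names are ours.

```python
import subprocess

# pytest-style exit codes, as documented above: 0=pass, 1=fail, 2=error
MESSAGES = {
    0: "quality gate passed",
    1: "regressions detected",
    2: "eval run errored",
}

def describe_exit(code: int) -> str:
    """Map an evalflow exit code to a human-readable gate message."""
    return MESSAGES.get(code, f"unexpected exit code {code}")

def run_gate() -> int:
    """Run `evalflow eval` and report the outcome (requires evalflow on PATH)."""
    result = subprocess.run(["evalflow", "eval"])
    print(describe_exit(result.returncode))
    return result.returncode
```

A CI job can simply let the exit code propagate; a wrapper like this is only useful when you want custom reporting around the gate.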

Command Surface

evalflow init
evalflow eval
evalflow doctor
evalflow runs
evalflow compare RUN_A RUN_B
evalflow prompt list

Documentation

Security

  • evalflow reads API keys from environment variables, never config files
  • evalflow.yaml stores env var names, not secret values
  • keep .env and .evalflow/ out of git
  • see docs/dev-doc/security.md for the full security model
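A config along these lines keeps only the environment variable's name under version control, never its value. The keys shown are an illustrative sketch; see docs/dev-doc/security.md for the real schema.

```yaml
# evalflow.yaml (illustrative): store the env var name, not the secret
provider: openai
model: gpt-4o-mini
api_key_env: OPENAI_API_KEY   # resolved from the environment at runtime
```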

Reporting Security Issues

Please do not open public GitHub issues for security vulnerabilities. Open a private GitHub Security Advisory.

Examples

Development

See CONTRIBUTING.md for local setup, tests, smoke checks, and performance baselines.

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

evalflow-0.1.7.tar.gz (50.5 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

evalflow-0.1.7-py3-none-any.whl (53.4 kB)

File details

Details for the file evalflow-0.1.7.tar.gz.

File metadata

  • Download URL: evalflow-0.1.7.tar.gz
  • Upload date:
  • Size: 50.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for evalflow-0.1.7.tar.gz:

  • SHA256: 6f703c7d7c8e8463f7c7f05c186268fb9a34d8e0fbc3b1fa90882c853261038f
  • MD5: 11e159200a1f8bdbcdfe73472d054e50
  • BLAKE2b-256: 847667da7d9087015566a1e93ee3ebe2504df94471496bbb32d16dfc462d31db

See more details on using hashes here.
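To check a downloaded artifact against the published digest, something like the helper below works with only the standard library (the function name is ours):

```python
import hashlib

def sha256_hex(path: str) -> str:
    """Stream a file through SHA256 and return the hex digest, as listed on PyPI."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# compare against the published digest, e.g.:
# sha256_hex("evalflow-0.1.7.tar.gz") == "6f703c7d7c8e8463f7c7f05c186268fb9a34d8e0fbc3b1fa90882c853261038f"
```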

Provenance

The following attestation bundles were made for evalflow-0.1.7.tar.gz:

Publisher: publish.yml on emartai/evalflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file evalflow-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: evalflow-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 53.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for evalflow-0.1.7-py3-none-any.whl:

  • SHA256: 327b516b56f711687ed9bb9774462332b7a62946739b0a872e1f6b923083bb1f
  • MD5: 0fb625d9caa6ed0514adc12f412ca527
  • BLAKE2b-256: 686d5aaea71d3fc69721eb55548107c4594e0cd48cee8845b6a95af8e0678b2d

See more details on using hashes here.

Provenance

The following attestation bundles were made for evalflow-0.1.7-py3-none-any.whl:

Publisher: publish.yml on emartai/evalflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
