reval

Evaluate eval regressions.

reval correlates your Langfuse eval sessions with your git history and uses a multi-agent LLM pipeline to pinpoint which code changes caused which metric regressions. It produces a report with explanations, evidence, and suggested fixes.

Installation

From PyPI:

pip install reval-cli

From source:

git clone https://github.com/calebevans/reval.git
cd reval
pip install .

For development (includes pytest, mypy, ruff, pre-commit):

pip install ".[dev]"

Requires Python 3.10+.

Quick Start

  1. Generate a starter config:

     reval init

  2. Set your Langfuse credentials (or add them to reval.yaml):

     export LANGFUSE_BASE_URL="https://cloud.langfuse.com"
     export LANGFUSE_PUBLIC_KEY="pk-..."
     export LANGFUSE_SECRET_KEY="sk-..."

  3. Run an analysis against a Langfuse eval session:

     reval analyze --eval-results <session-id>

  4. Compare two sessions (current vs. baseline) and correlate regressions with code changes:

     reval analyze \
       --eval-results <current-session-id> \
       --eval-baseline <baseline-session-id> \
       --base main

Configuration

reval is configured through a reval.yaml file in your project root. Every field has a sensible default, so the file is optional for simple use cases.

langfuse:
  api_url: https://cloud.langfuse.com
  public_key: pk-...
  secret_key: sk-...
  project_id: ""                  # auto-detected if omitted
  current_session_id: ""          # or use --eval-results
  baseline_session_id: ""         # or use --eval-baseline
  publish: false                  # post results back to Langfuse

metrics:
  - name: answer_relevancy
    threshold: 0.05               # flag if score drops by more than this
  - name: faithfulness
    threshold: 0.05

relevance:
  include_patterns: []            # empty = include all non-ignored files
  ignore_patterns:
    - "**/tests/**"
    - "**/__pycache__/**"
    - "*.md"
    - "*.lock"
  category_mappings:
    prompt:
      - "**/prompts/**"
      - "**/*.prompt"
    model_config:
      - "**/config/model*"
      - "**/*llm_config*"
    retrieval:
      - "**/retrieval/**"
      - "**/rag/**"
    tool_definition:
      - "**/tools/**"
      - "**/functions/**"
    output_parsing:
      - "**/parsers/**"
      - "**/schema*"
    eval_config:
      - "**/eval*"

llm:
  model: openai/gpt-4o            # any LiteLLM model identifier
  temperature: 0.2
  max_tokens: 4096
  context_window: null             # override the model's default context window
  diff_model: null                 # use a different model for diff analysis
  eval_model: null                 # use a different model for eval analysis
  synthesis_model: null            # use a different model for synthesis

git:
  base: HEAD                       # base commit ref
  head: working                    # "working" = uncommitted changes

Configuration Sections

langfuse - Connection settings for your Langfuse instance. Credentials can also be set through environment variables (see below). Set publish: true to write analysis results back to Langfuse as comments.

metrics - List of metric names and their regression thresholds. A metric is flagged as regressed when current_score - baseline_score falls below -threshold. The threshold defaults to 0.05 when not specified.
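
In code terms, the flagging rule is equivalent to the following check (an illustrative sketch, not reval's internal implementation):

def is_regressed(current_score: float, baseline_score: float, threshold: float = 0.05) -> bool:
    # A metric regresses when its score drops by more than the threshold.
    return (current_score - baseline_score) < -threshold

# Example: a drop from 0.78 to 0.70 exceeds the default 0.05 threshold.
is_regressed(0.70, 0.78)  # True (delta = -0.08)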

relevance - Controls which files from the git diff are included in analysis. Files matching ignore_patterns are excluded. If include_patterns is non-empty, only files matching at least one include pattern (and no ignore pattern) are kept. The category_mappings section maps glob patterns to semantic categories (prompt, model_config, retrieval, etc.) so the analysis agents understand the role of each changed file.
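
The described include/ignore semantics behave roughly like this sketch (hypothetical, using Python's fnmatch as a stand-in for reval's actual glob matching):

from fnmatch import fnmatch

def is_relevant(path: str, include_patterns: list[str], ignore_patterns: list[str]) -> bool:
    # Files matching any ignore pattern are always excluded.
    if any(fnmatch(path, pattern) for pattern in ignore_patterns):
        return False
    # An empty include list keeps every non-ignored file.
    if not include_patterns:
        return True
    # Otherwise, the file must match at least one include pattern.
    return any(fnmatch(path, pattern) for pattern in include_patterns)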

llm - Model configuration. The model field accepts any LiteLLM model identifier (e.g. openai/gpt-4o, anthropic/claude-sonnet-4-20250514, vertex_ai/gemini-2.0-flash). You can assign different models to each analysis agent using diff_model, eval_model, and synthesis_model.
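
Since these are plain LiteLLM identifiers, you can verify that a given model name and your credentials work before running an analysis (a minimal standalone sketch; reval's own calls may differ):

from litellm import completion

response = completion(
    model="openai/gpt-4o",  # the same identifier you would put in reval.yaml
    messages=[{"role": "user", "content": "ping"}],
    temperature=0.2,
    max_tokens=16,
)
print(response.choices[0].message.content)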

git - The commit refs to diff. Set head to working to diff uncommitted changes against base, or set both to commit SHAs/branch names.
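
For example, to compare a tagged release against a feature branch instead of uncommitted changes (the ref names here are placeholders):

git:
  base: v1.2.0
  head: feature/improve-retrieval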

Environment Variables

Langfuse credentials can be provided through environment variables instead of (or in addition to) reval.yaml. Environment variables take precedence when the corresponding config field is left empty.

Variable              Config equivalent     Description
LANGFUSE_BASE_URL     langfuse.api_url      Langfuse API URL
LANGFUSE_PUBLIC_KEY   langfuse.public_key   Langfuse public key
LANGFUSE_SECRET_KEY   langfuse.secret_key   Langfuse secret key
LANGFUSE_PROJECT_ID   langfuse.project_id   Langfuse project ID (auto-detected if omitted)
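
The precedence rule amounts to a config-first, env-fallback lookup, roughly like this (hypothetical sketch, not reval's actual code):

import os

def resolve_credential(config_value: str, env_var: str) -> str:
    # The config field wins; the environment variable fills in
    # only when the field is left empty.
    return config_value or os.environ.get(env_var, "")

# e.g. with langfuse.secret_key left empty in reval.yaml:
secret_key = resolve_credential("", "LANGFUSE_SECRET_KEY")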

CLI Reference

reval init

Generate a starter reval.yaml with interactive prompts.

reval init [--output PATH]
Option     Default      Description
--output   reval.yaml   Path for the generated config file

reval analyze

Run the analysis pipeline. This is the main command.

reval analyze [OPTIONS]
Option                      Default                     Description
--eval-results              (required)                  Langfuse session ID for the current eval run
--eval-baseline             (none)                      Langfuse session ID for the baseline run; omit for single-session mode
--base                      from config, else HEAD      Base commit ref
--head                      from config, else working   Head ref ("working" = uncommitted changes)
--config                    reval.yaml                  Path to config file
--output                    terminal                    Output format: terminal, json, or markdown
--output-file               (stdout)                    Write the report to a file instead of stdout
--threshold                 0.05                        Global regression threshold (overrides per-metric config)
--model                     from config                 LLM model to use (overrides config)
--publish / --no-publish    from config                 Publish results back to Langfuse
--verbose                   false                       Show debug information
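
A fuller invocation combining several of these flags might look like this (the session IDs are placeholders):

reval analyze \
  --eval-results sess-current \
  --eval-baseline sess-baseline \
  --base main \
  --head working \
  --output markdown \
  --output-file regression-report.md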

reval report

Re-render a previously saved JSON report in a different format.

reval report REPORT_FILE [OPTIONS]
Option          Default    Description
--output        terminal   Output format: terminal, json, or markdown
--output-file   (stdout)   Write the report to a file instead of stdout

Example: save a JSON report, then render it as markdown later:

reval analyze --eval-results sess-123 --output json --output-file report.json
reval report report.json --output markdown

Analysis Modes

Compare mode

Activated when you provide both --eval-results and --eval-baseline. reval fetches both sessions from Langfuse, diffs the git history between --base and --head, and runs three agents:

  1. Diff agent examines code changes in isolation and forms hypotheses about their potential eval impact.
  2. Eval agent investigates each regressed test case by comparing outputs, scores, and evaluator reasoning between current and baseline runs.
  3. Synthesis agent correlates the diff and eval findings into a final report with explanations and suggested fixes.

Single-session mode

Activated when you omit --eval-baseline. reval analyzes a single eval session without a baseline comparison. It loads source files matching your relevance patterns, runs the eval agent on any test cases that score below the threshold, and produces findings about what may be going wrong.

Output Formats

Format     Flag                Description
Terminal   --output terminal   Rich tables and panels with color-coded diffs (default)
JSON       --output json       Machine-readable output, can be re-rendered with reval report
Markdown   --output markdown   Tables and fenced diff blocks, suitable for PRs or documentation

All formats can be written to a file with --output-file PATH.

Publishing to Langfuse

When --publish is passed (or langfuse.publish is set to true in config), reval posts its analysis results back to Langfuse:

  • A session comment with the full markdown report is added to the current session.
  • A trace comment with relevant findings is added to each failed trace.

This makes it easy to review reval's analysis directly in the Langfuse UI alongside your eval results.
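
For example, to run a comparison and post the findings back to Langfuse in one step (the session IDs are placeholders):

reval analyze \
  --eval-results sess-current \
  --eval-baseline sess-baseline \
  --publish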
