A Python SDK facilitating error analysis via LLM-as-a-Judge

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

CLEAR (Comprehensive LLM Error Analysis and Reporting) is an interactive, open-source package for LLM-based error analysis. It helps surface meaningful, recurring issues in model outputs by combining automated evaluation with powerful visualization tools.

The workflow consists of two main phases:

Analysis
Generates textual feedback for each instance, then identifies system-level error categories from these critiques and quantifies their frequencies.
Interactive Dashboard
An intuitive dashboard provides a comprehensive view of model behavior. Users can:
- Explore aggregate visualizations of identified issues
- Apply dynamic filters to focus on specific error types or score ranges
- Drill down into individual examples that illustrate specific failure patterns

CLEAR makes it easier to diagnose model shortcomings and prioritize targeted improvements.

You can run CLEAR as a full pipeline, or reuse specific stages (generation, evaluation, or just the UI).

🚀 Quickstart

Requires Python 3.10+ and the necessary credentials for a supported provider.

1. Installation

Option 1 (Recommended for development): Clone the repo and set up a virtual environment:

git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

📦 Option 2: Install via pip (Latest Release)

pip install clear-eval

2. Set provider type and credentials

CLEAR requires a supported LLM provider and credentials to run analysis. See supported providers below.

⚠️ Using a private proxy or an OpenAI deployment? You must configure your model names explicitly (see below). Otherwise, default model names are used automatically for supported providers.

Run on sample data:

The sample dataset is a small subset of the GSM8K math problems. To run on the sample data with the default configuration, simply set your provider and run:

run-clear-eval-analysis --provider=openai # or rits, watsonx

This will:

Run the full CLEAR pipeline
Save results under: results/gsm8k/sample_output/

View results in the interactive dashboard:

run-clear-eval-dashboard

Or set the port with

run-clear-eval-dashboard --port <port>

Then:

Upload the generated ZIP file from results/gsm8k/sample_output/
Explore issues, scores, filters, and drill into examples

To explore the dashboard without running any analysis:

Run the dashboard:

run-clear-eval-dashboard

Then load the pre-generated sample output by manually uploading the sample .zip file located at:

<your-env>/site-packages/clear_eval/sample_data/gsm8k/analysis_results_gsm8k_default.zip

📁 Or just download it directly from the GitHub repo.
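If you are unsure where `<your-env>/site-packages` lives, the path above can be resolved programmatically. The following is a minimal sketch, assuming `clear-eval` is installed in the active environment and that the archive sits at the relative location quoted above; the helper name `find_sample_zip` is hypothetical and not part of CLEAR.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional

# Relative location of the sample archive inside the installed package,
# copied from the path given in this README.
SAMPLE_ZIP_RELATIVE = Path("sample_data") / "gsm8k" / "analysis_results_gsm8k_default.zip"


def find_sample_zip(package_name: str = "clear_eval") -> Optional[Path]:
    """Return the sample ZIP path, or None if the package or file is missing.

    Hypothetical helper: it only inspects the install location and does not
    import or run any CLEAR code.
    """
    try:
        spec = find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / SAMPLE_ZIP_RELATIVE
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    path = find_sample_zip()
    print(path if path else "Sample ZIP not found; is clear-eval installed?")
```

The printed path is the file to upload in the dashboard.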

📂 Analyzing your own data

📄 Input Data Format

CLEAR takes a CSV file as input, with each row representing a single instance to be evaluated.

Required Columns

Column	Used When	Description
`id`	Always	Unique identifier for the instance
`model_input`	Always	Prompt provided to the generation model
`response`	Using pre-generated responses	Pre-generated model response (ignored if generation is enabled)
`ground_truth`	Performing reference-based analysis	Ground-truth answer for evaluation (optional)
others	`--input_columns` is used	Additional input columns to show in dashboard (e.g. `question`)
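As an illustration of the format, the sketch below builds a two-row CSV with the columns from the table, plus one extra `question` column. The rows are made-up examples and the helper name `build_input_csv` is hypothetical; only the column names come from this README.

```python
import csv
import io
from typing import Dict, List

# Column names follow the table above; "question" is an example extra column
# that would be exposed through --input_columns.
FIELDNAMES = ["id", "model_input", "response", "ground_truth", "question"]

# Made-up rows, for illustration only.
ROWS: List[Dict[str, str]] = [
    {
        "id": "1",
        "model_input": "Answer the question: What is 12 + 30?",
        "response": "12 + 30 = 42",
        "ground_truth": "42",
        "question": "What is 12 + 30?",
    },
    {
        "id": "2",
        "model_input": "Answer the question: A box holds 6 eggs. How many eggs are in 4 boxes?",
        "response": "6 * 4 = 24, so there are 24 eggs.",
        "ground_truth": "24",
        "question": "A box holds 6 eggs. How many eggs are in 4 boxes?",
    },
]


def build_input_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_input_csv(ROWS), end="")
```

Saving the printed text to a file such as `my_data.csv` gives an input that can be passed as the data path. Drop the `response` column if CLEAR generates the responses, and drop `ground_truth` for reference-free analysis.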

🚀 Running the analysis

CLEAR can be run via the CLI or Python API.

Option 1: CLI commands

Each stage has its own entry point:

run-clear-eval-analysis --config_path path/to/config.yaml    # run the full pipeline
run-clear-eval-generation --config_path path/to/config.yaml  # run generation only
run-clear-eval-evaluation --config_path path/to/config.yaml  # run evaluation only (assumes responses are already provided)

If --config_path is specified, all parameters are taken from the config unless explicitly overridden
CLI flags passed directly override corresponding config values
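The precedence rule can be pictured as a simple merge. This is a sketch of the described behavior, not CLEAR's actual implementation; `resolve_settings` and the sample values are hypothetical, and `None` is assumed to mean "flag not passed".

```python
from typing import Any, Dict, Optional


def resolve_settings(
    config: Dict[str, Any],
    cli_overrides: Dict[str, Optional[Any]],
) -> Dict[str, Any]:
    """Start from the config values, then apply every CLI flag that was passed."""
    resolved = dict(config)  # copy so the loaded config stays untouched
    for name, value in cli_overrides.items():
        if value is not None:  # None marks a flag that was not given
            resolved[name] = value
    return resolved


# Values as they might be loaded from a YAML config file.
config = {"provider": "watsonx", "perform_generation": True, "run_name": "baseline"}
# Only --provider was passed on the command line.
cli = {"provider": "openai", "perform_generation": None, "run_name": None}

resolved = resolve_settings(config, cli)
print(resolved)
# → {'provider': 'openai', 'perform_generation': True, 'run_name': 'baseline'}
```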

Option 2: Python API

from clear_eval.analysis_runner import run_clear_eval_analysis, run_clear_eval_generation, run_clear_eval_evaluation

run_clear_eval_analysis(
    config_path="configs/sample_run_config.yaml"
)

You may also pass overrides instead of using a config file:

from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_data",
    provider="openai",
    data_path="my_data.csv",
    gen_model_name="gpt-3.5-turbo",
    eval_model_name="gpt-4",
    output_dir="results/gsm8k/",
    perform_generation=False,
    input_columns=["question"]
)

📊 Launching the Dashboard

run-clear-eval-dashboard

Upload the ZIP file generated in your --output-dir when prompted.

🎛 Supported CLI Arguments

Arguments can be provided via:

A YAML config file (--config_path)
CLI flags
Python function parameters (when using the API)

⚠️ Boolean arguments (perform_generation, is_reference_based, resume_enabled)
These must be set explicitly to true or false in YAML, CLI, or Python.
On the CLI, use --flag True or --flag False (case-insensitive).
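To make the case-insensitive `True`/`False` convention concrete, here is a small sketch of how such values could be parsed with `argparse`. It illustrates the described semantics only and is not CLEAR's own parser; `parse_bool` and the generic `--flag` option are hypothetical.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert 'true'/'false' in any letter case to a bool; reject anything else."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"expected True or False, got {text!r}")


parser = argparse.ArgumentParser()
# The value must be given explicitly, for example `--flag True`.
parser.add_argument("--flag", type=parse_bool, required=True)

args = parser.parse_args(["--flag", "TRUE"])
print(args.flag)  # → True
```

An explicit value avoids the ambiguity of a bare switch when the default is already `True`, as it is for `perform_generation` and `resume_enabled`.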

⚠️ Naming Convention
Parameter names use snake_case in YAML and Python, but use --kebab-case in CLI.
For example:

YAML: perform_generation: true

Python: perform_generation=True

CLI: --perform-generation True

Argument	Description	Default
`--config_path`	Path to a YAML config file (all values loaded unless overridden by CLI args)
`--run_name`	Unique run name (used in result file names)
`--data_path`	Path to input CSV file
`--output_dir`	Output directory to write results
`--provider`	Model provider: `openai`, `watsonx`, `rits`
`--eval_model_name`	Name of judge model (e.g. `gpt-4o`)
`--gen_model_name`	Name of the generator model to evaluate. If generation is not performed, this is the generator name to display.
`--perform_generation`	Whether to generate responses or use existing `response` column	True
`--is_reference_based`	Use reference-based evaluation (requires `ground_truth` column in input)	False
`--resume_enabled`	Whether to reuse intermediate outputs from previous runs stored in output_dir	True
`--evaluation_criteria`	Custom criteria dictionary for scoring individual records: `{"criteria_name1":"criteria_desc1", ...}`. Supported in YAML config and Python.	None
`--input_columns`	Comma-separated list of additional input fields (other than `model_input`) to appear in the results and dashboard (e.g. `question`)	None
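The sketch below shows how the arguments in the table might be combined in a Python call that scores pre-generated responses against custom criteria. It assumes `run_clear_eval_analysis` accepts every table entry as a keyword argument, which this README states only in general terms; the criteria names, descriptions, and paths are invented for illustration.

```python
from typing import Any, Dict


def build_run_kwargs() -> Dict[str, Any]:
    """Collect keyword arguments named after the entries in the table above."""
    return {
        "run_name": "my_data_custom_criteria",
        "provider": "openai",
        "data_path": "my_data.csv",
        "output_dir": "results/my_data/",
        "eval_model_name": "gpt-4o",
        "gen_model_name": "gpt-3.5-turbo",  # display name only, generation is off
        "perform_generation": False,        # use the existing `response` column
        "is_reference_based": True,         # requires a `ground_truth` column
        "resume_enabled": True,
        # Hypothetical criteria: {"criteria_name": "criteria_description"}
        "evaluation_criteria": {
            "correctness": "The final numeric answer matches the ground truth.",
            "reasoning": "Each calculation step is valid and clearly stated.",
        },
        "input_columns": ["question"],
    }


if __name__ == "__main__":
    try:
        from clear_eval.analysis_runner import run_clear_eval_analysis
    except ImportError:
        print("clear-eval is not installed; run `pip install clear-eval` first.")
    else:
        run_clear_eval_analysis(**build_run_kwargs())
```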

🔑 Supported providers and credentials

Depending on your selected --provider:

Provider	Required Environment Variables
`openai`	`OPENAI_API_KEY`, `OPENAI_API_BASE` (if using a proxy)
`watsonx`	`WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_SPACE_ID` or `PROJECT_ID`
`rits`	`RITS_API_KEY`
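Before launching a run, it can help to check that the variables from the table are set. The following is a minimal sketch based only on that table; `missing_credentials` is a hypothetical helper, and reading the `watsonx` row as "either `WATSONX_SPACE_ID` or `PROJECT_ID`" is an assumption.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Each provider maps to groups of variable names; at least one name per group
# must be set. The optional OPENAI_API_BASE is intentionally not required.
REQUIRED_ENV: Dict[str, List[Tuple[str, ...]]] = {
    "openai": [("OPENAI_API_KEY",)],
    "watsonx": [
        ("WATSONX_APIKEY",),
        ("WATSONX_URL",),
        ("WATSONX_SPACE_ID", "PROJECT_ID"),  # assumed: either one is enough
    ],
    "rits": [("RITS_API_KEY",)],
}


def missing_credentials(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a description of every unmet variable group for the provider."""
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unsupported provider: {provider!r}")
    missing = []
    for group in REQUIRED_ENV[provider]:
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


if __name__ == "__main__":
    for name in REQUIRED_ENV:
        gaps = missing_credentials(name)
        print(f"{name}: {'ok' if not gaps else 'missing ' + ', '.join(gaps)}")
```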


