A Python SDK and API for error analysis via LLM-as-a-Judge

CLEAR: Comprehensive LLM Error Analysis and Reporting

CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.

What is CLEAR?

CLEAR provides systematic error analysis for LLM-based systems. It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:

Identify recurring error patterns across your dataset
Quantify issue frequencies and severity
Drill down into specific failure cases
Prioritize improvements based on data-driven insights

CLEAR operates in two phases:

Analysis — Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
Interactive Dashboard — Explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.
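
To make the "quantifies their frequencies" part of the Analysis phase concrete, here is a minimal Python sketch. It is not CLEAR's implementation: it assumes each record has already been tagged with zero or more issue labels, and both the records and the label names are invented for illustration.

```python
from collections import Counter


def issue_frequencies(issues_per_record):
    """Return the fraction of records exhibiting each issue, most frequent first."""
    total = len(issues_per_record)
    if total == 0:
        return {}
    counts = Counter()
    for issues in issues_per_record:
        # Count each issue at most once per record.
        counts.update(set(issues))
    return {issue: n / total for issue, n in counts.most_common()}


# Hypothetical per-record issue labels (not produced by CLEAR).
records = [
    ["arithmetic error"],
    ["arithmetic error", "missing final answer"],
    [],
    ["missing final answer"],
]

for issue, share in issue_frequencies(records).items():
    print(f"{issue}: {share:.0%} of records")
```

Reporting shares of records rather than raw counts keeps the numbers comparable across datasets of different sizes.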

Two Analysis Modes

LLM Analysis

Evaluate standard LLM outputs — generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate textual critiques, and surface system-level issues.

Input: CSV with model inputs and responses
Output: Per-record scores, evaluation text, aggregated issue categories
Dashboard: Streamlit-based interactive explorer
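
As a rough illustration of the input, the sketch below writes a small prompts-and-responses CSV with Python's standard library. The column names `id`, `model_input`, and `response` are assumptions made for this example, as are the sample rows and the output file name; the LLM Analysis Guide is the authority on the schema CLEAR expects.

```python
import csv
import tempfile
from pathlib import Path

# Assumed column names (not confirmed by this README).
FIELDNAMES = ["id", "model_input", "response"]


def write_input_csv(rows, path):
    """Write one row per model call: an identifier, the prompt, and the response."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Invented sample rows; the second response is deliberately wrong.
rows = [
    {"id": "0", "model_input": "What is 12 * 7?", "response": "12 * 7 = 84"},
    {"id": "1", "model_input": "What is 15 + 27?", "response": "15 + 27 = 41"},
]

out_path = write_input_csv(rows, Path(tempfile.gettempdir()) / "clear_input_example.csv")
print(out_path.read_text(encoding="utf-8"))
```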

LLM Analysis Guide →

Agentic Analysis

Evaluate multi-agent system trajectories — step-by-step agent interactions and full trajectory analysis. Supports traces from LangGraph, CrewAI, and other frameworks via MLflow or Langfuse.

Input: Raw JSON traces or preprocessed trajectory CSVs (each trace captures one complete agent task execution)
Output: Per-step CLEAR analysis, trajectory-level scores, rubric evaluations
Dashboard: NiceGUI-based workflow visualization with path and temporal analysis

Agentic Workflows Guide →

Installation

Requires Python 3.10+

Option 1: pip

pip install clear-eval

Option 2: From source (for development)

git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Verify the installation:

run-clear-eval-analysis --help
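
The same check can be scripted. The sketch below is a hedged example: it assumes the distribution is named `clear-eval`, as in the pip command above, and that the four commands used in this README are installed as console scripts on `PATH`. It uses only the standard library and reports what it finds without running CLEAR.

```python
import shutil
from importlib import metadata

# Command names taken from this README.
CONSOLE_SCRIPTS = [
    "run-clear-eval-analysis",
    "run-clear-eval-dashboard",
    "run-clear-agentic-eval",
    "run-clear-agentic-dashboard",
]


def installation_report(package="clear-eval", scripts=CONSOLE_SCRIPTS):
    """Return the installed version (or None) and the resolved path of each script."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {
        "version": version,
        "scripts": {name: shutil.which(name) for name in scripts},
    }


report = installation_report()
print("clear-eval version:", report["version"])
for name, location in report["scripts"].items():
    print(f"{name}: {location or 'not found on PATH'}")
```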

Quick Start

1. Set your provider credentials

CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., OPENAI_API_KEY for OpenAI). Adjust --provider and --eval-model-name in the commands below to match your setup. See Provider Configuration for details.
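
A small pre-flight check can catch a missing key before a long run starts. This is a sketch under stated assumptions: only the `OPENAI_API_KEY` entry comes from the text above, the `anthropic` entry is an assumption to confirm in LiteLLM's documentation, and the helper `missing_credentials` is hypothetical rather than part of CLEAR.

```python
import os

# Assumed provider-to-variable mapping; extend it for your own provider.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],  # assumption, check LiteLLM's docs
}


def missing_credentials(provider, env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        raise ValueError(f"No credential mapping defined for provider {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


missing = missing_credentials("openai")
if missing:
    print("Set these variables before running CLEAR:", ", ".join(missing))
else:
    print("Credentials for 'openai' look present.")
```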

2. LLM Analysis

The following command evaluates responses to GSM8K math problems and surfaces recurring quality issues:

run-clear-eval-analysis --provider openai --eval-model-name gpt-4o

Results are saved to results/gsm8k/sample_output/. View them:

run-clear-eval-dashboard

Full LLM Analysis Guide → — input formats, CLI arguments, configuration, Python API, and external judges.

3. Agentic Analysis

LLM Analysis and Agentic Analysis are independent; this section does not require step 2.

Run CLEAR on sample agent traces (3 traces, each capturing one complete agent task execution, ~2 minutes):

run-clear-agentic-eval \
    --data-dir src/clear_eval/sample_data/agentic/research_agent_traces/mlflow \
    --results-dir my_smoke_test_results \
    --from-raw-traces true \
    --agent-framework langgraph \
    --observability-framework mlflow \
    --run-name smoke_test \
    --max-files 3 \
    --eval-model-name gpt-4o \
    --provider openai
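
If you prefer to launch the smoke test from a script, the command above can be assembled as an argument list. This is a sketch, not an official Python entry point: the helpers `build_agentic_command` and `run` are hypothetical, the flags and values are copied from the command above, and the example only prints the command unless `dry_run` is set to `False`.

```python
import shlex
import subprocess


def build_agentic_command(data_dir, results_dir, run_name, max_files=3,
                          eval_model_name="gpt-4o", provider="openai"):
    """Assemble the run-clear-agentic-eval invocation shown above as a list."""
    return [
        "run-clear-agentic-eval",
        "--data-dir", str(data_dir),
        "--results-dir", str(results_dir),
        "--from-raw-traces", "true",
        "--agent-framework", "langgraph",
        "--observability-framework", "mlflow",
        "--run-name", run_name,
        "--max-files", str(max_files),
        "--eval-model-name", eval_model_name,
        "--provider", provider,
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


cmd = build_agentic_command(
    data_dir="src/clear_eval/sample_data/agentic/research_agent_traces/mlflow",
    results_dir="my_smoke_test_results",
    run_name="smoke_test",
)
run(cmd)  # dry run: prints the command without calling any LLM
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.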

View pre-computed results (all 20 traces) without re-running:

run-clear-agentic-dashboard
# Upload: src/clear_eval/sample_data/agentic/research_agent_results/mlflow/my_experiment/unified_ui_results.zip

Full Agentic Guide → — trace generation, configuration, output structure, and dashboard features.

Provider Configuration

CLEAR uses LiteLLM as its inference backend, supporting 100+ LLM providers (OpenAI, Anthropic, WatsonX, AWS Bedrock, Google Vertex AI, and more).

Parameters:

Parameter	CLI Flag	Description
`provider`	`--provider`	LiteLLM provider name (e.g., `openai`, `anthropic`, `bedrock`, `vertex_ai`)
`eval_model_name`	`--eval-model-name`	Model identifier (e.g., `gpt-4o`, `claude-3-5-sonnet-20241022`)
`eval_model_params`	`--eval-model-params`	Additional model parameters as JSON (e.g., `{"temperature": 0}`)
`endpoint_url`	`--endpoint-url`	Custom endpoint URL for local/self-hosted models
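
Because `--eval-model-params` takes JSON, quoting it by hand in a shell is easy to get wrong. The sketch below serializes a Python dict with `json.dumps` and builds the argument list from the flags in the table; the helper name `analysis_command` is hypothetical, and nothing is executed.

```python
import json
import shlex


def analysis_command(provider, eval_model_name, eval_model_params=None, endpoint_url=None):
    """Build a run-clear-eval-analysis argument list from the documented flags."""
    cmd = [
        "run-clear-eval-analysis",
        "--provider", provider,
        "--eval-model-name", eval_model_name,
    ]
    if eval_model_params is not None:
        # Serialize the dict so the flag receives valid JSON.
        cmd += ["--eval-model-params", json.dumps(eval_model_params)]
    if endpoint_url is not None:
        cmd += ["--endpoint-url", endpoint_url]
    return cmd


cloud_cmd = analysis_command("openai", "gpt-4o", eval_model_params={"temperature": 0})
local_cmd = analysis_command("openai", "my-local-model",
                             endpoint_url="http://localhost:8000/v1")

# shlex.join adds the shell quoting the JSON argument needs.
print(shlex.join(cloud_cmd))
print(shlex.join(local_cmd))
```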

Cloud provider example:

export OPENAI_API_KEY="..."
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o

Local model example (vLLM, llama.cpp, Ollama, etc.):

run-clear-eval-analysis \
    --provider openai \
    --eval-model-name my-local-model \
    --endpoint-url http://localhost:8000/v1

No credentials are needed when using --endpoint-url with a local server.

Set the required environment variables for your provider according to LiteLLM's documentation.

Documentation

Guide	Description
Agentic Workflows Guide	Multi-agent evaluation — trace preprocessing, step-by-step and trajectory analysis, configuration reference
Agentic Dashboard Guide	Dashboard features — workflow view, node analysis, trajectory explorer, path and temporal analysis
LLM Analysis Guide	Full pipeline reference — input formats, CLI arguments, configuration, and external judges

Citation

If you use CLEAR in your research, please cite the relevant paper(s):

LLM Analysis (AAAI 2026):

@inproceedings{yehudai2026clear,
  title={CLEAR: Error analysis via llm-as-a-judge made easy},
  author={Yehudai, Asaf and Eden, Lilach and Perlitz, Yotam and Bar-Haim, Roy and Shmueli-Scheuer, Michal},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={48},
  pages={41736--41738},
  year={2026}
}

Agentic Analysis (ACL 2026, to appear — preprint):

@article{yehudai2026agentic,
  title={Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents},
  author={Yehudai, Asaf and Eden, Lilach and Shmueli-Scheuer, Michal},
  journal={arXiv preprint arXiv:2605.22608},
  year={2026}
}

License

Apache 2.0 — see LICENSE for details.

Project details

Release history

Version	Release date
2.0.2 (this version)	May 24, 2026
2.0.1	May 21, 2026
1.0.8	Oct 22, 2025
1.0.7	Sep 3, 2025
1.0.6	Aug 11, 2025
1.0.5	Jul 24, 2025
1.0.4	Jul 24, 2025
1.0.3	Jul 24, 2025
1.0.2	Jul 24, 2025
1.0.1	Jul 24, 2025
1.0.0	Jul 24, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clear_eval-2.0.2.tar.gz (2.3 MB view details)

Uploaded May 24, 2026 Source

Built Distribution

clear_eval-2.0.2-py3-none-any.whl (2.4 MB view details)

Uploaded May 24, 2026 Python 3

File details

Details for the file clear_eval-2.0.2.tar.gz.

File metadata

Download URL: clear_eval-2.0.2.tar.gz
Upload date: May 24, 2026
Size: 2.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for clear_eval-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`1fbb77565328c005f7b4a5d714a96b1e3316b849028c728023d96ba1239e4bdc`
MD5	`44e8515aaa616ca930550979a1e447a2`
BLAKE2b-256	`d1e451331c8acbb3027198d64d129ae92a47feca9c2e50a9095c84ac8df0966e`
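
To check a downloaded archive against the digests above, something like the following sketch works with the standard library alone. It assumes the file has already been downloaded to a local path of your choosing, and the helpers `file_digests` and `verify_sha256` are hypothetical names. BLAKE2b-256 is computed here as BLAKE2b with a 32-byte digest.

```python
import hashlib

# SHA256 digest listed above for clear_eval-2.0.2.tar.gz.
EXPECTED_SHA256 = "1fbb77565328c005f7b4a5d714a96b1e3316b849028c728023d96ba1239e4bdc"


def file_digests(path, chunk_size=1 << 20):
    """Stream the file once and return its SHA256 and BLAKE2b-256 hex digests."""
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {"sha256": sha256.hexdigest(), "blake2b_256": blake2b_256.hexdigest()}


def verify_sha256(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return file_digests(path)["sha256"] == expected.lower()


# Usage, once the archive is on disk:
#   verify_sha256("clear_eval-2.0.2.tar.gz")
```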

File details

Details for the file clear_eval-2.0.2-py3-none-any.whl.

File metadata

Download URL: clear_eval-2.0.2-py3-none-any.whl
Upload date: May 24, 2026
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for clear_eval-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c70d8b8e9e4015588d517de5dc8d1f69f500c76ce946a0ff01e8cb3590d2b77`
MD5	`848faa261a0175341e8d1001823f50bb`
BLAKE2b-256	`065f5489f618cbdd4af0eec2a57fca79b8ec40da9b23731fdb9bb8f0d0734da9`
