A Python SDK for error analysis via LLM-as-a-Judge

These details have not been verified by PyPI

Project description

CLEAR: Comprehensive LLM Error Analysis and Reporting

CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.

🎯 What is CLEAR?

CLEAR provides systematic error analysis for:

Single LLM Responses — Analyze quality issues in model outputs for tasks like Q&A, summarization, and generation
Agentic Workflows — Evaluate complex workflows with multiple components, tool usage, and multi-step task trajectories

It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:

Identify recurring error patterns across your dataset
Quantify issue frequencies and severity
Drill down into specific failure cases
Prioritize improvements based on data-driven insights

⚙️ How It Works

CLEAR operates in two phases:

Analysis — Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
Interactive Dashboard — Explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.
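As a rough illustration of the quantification step, suppose the judge's per-instance feedback has already been mapped to issue-category labels. The record layout and category names below are hypothetical, not CLEAR's internal format. The sketch only shows how frequencies fall out of simple counting, with each category counted at most once per record:

```python
from collections import Counter

# Hypothetical per-record output: each record lists the issue categories
# its judge feedback was mapped to (an empty list means no issue found).
records = [
    {"id": "1", "issues": ["arithmetic error", "missing final answer"]},
    {"id": "2", "issues": []},
    {"id": "3", "issues": ["arithmetic error"]},
    {"id": "4", "issues": ["arithmetic error", "arithmetic error"]},
]


def issue_frequencies(records):
    """Return, for each issue category, the fraction of records exhibiting it."""
    counts = Counter()
    for record in records:
        # Use a set so a category repeated within one record counts once.
        counts.update(set(record["issues"]))
    total = len(records)
    # most_common() orders categories from most to least frequent.
    return {issue: n / total for issue, n in counts.most_common()}


for issue, freq in issue_frequencies(records).items():
    print(f"{issue}: {freq:.0%}")
```

Ranking categories by the share of affected records, rather than by raw mention count, keeps one verbose response from inflating a category.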

🔀 Two Analysis Modes

CLEAR supports two distinct analysis modes, each with its own pipeline, dashboard, and documentation:

📝 LLM Analysis

Evaluate standard LLM outputs — generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate textual critiques, and surface system-level issues.


Input	CSV with model inputs and responses
Output	Per-record scores, evaluation text, aggregated issue categories
Dashboard	Streamlit-based interactive explorer

📖 Full LLM Analysis Guide →

🤖 Agentic Analysis

Evaluate multi-agent system trajectories — step-by-step agent interactions and full trajectory analysis. Supports traces from LangGraph, CrewAI, and other frameworks via MLflow or Langfuse.


Input	Raw JSON traces or preprocessed trajectory CSVs
Output	Per-step CLEAR analysis, trajectory-level scores, rubric evaluations
Dashboard	NiceGUI-based workflow visualization with path and temporal analysis

📖 Agentic Workflows Guide → | Agentic Dashboard Guide →

✨ Key Features


🧑‍⚖️ LLM-as-a-Judge	Automated evaluation for any text generation task
🤖 Agentic Workflows	Evaluate agent trajectories, both step by step and as a whole
🔌 Multiple Backends	LangChain, LiteLLM (100+ providers), or direct HTTP endpoints
🧩 External Judges	Plug in custom evaluation functions
📊 Interactive Dashboards	Standard and agentic-specific visualizations
🛠️ Flexible Configuration	YAML config files, CLI flags, or Python API

📦 Installation

Requires Python 3.10+

Option 1: pip (recommended)

pip install clear-eval

Option 2: From source (for development)

git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .
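To confirm the environment before running anything, a short check like the following can help. It is a sketch that relies only on the standard library (`importlib.metadata`, `sys`). The distribution name `clear-eval` comes from the pip command above, and the helper names are made up for illustration:

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def python_supported(version_info=None):
    """True if the interpreter meets CLEAR's stated minimum of Python 3.10."""
    vi = sys.version_info if version_info is None else version_info
    return tuple(vi[:2]) >= (3, 10)


def installed_version(dist_name="clear-eval"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("Python OK" if python_supported() else "Python 3.10+ required")
print("clear-eval", installed_version() or "not installed")
```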

🚀 Quick Start

1. Set your provider credentials

CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., OPENAI_API_KEY for OpenAI). See the Providers and Credentials Guide for all supported providers and backends.

2. Run on sample data

With no data path specified, CLEAR runs on a built-in GSM8K sample dataset using default settings:

run-clear-eval-analysis --provider openai --eval-model-name gpt-4o

Results are saved to results/gsm8k/sample_output/.

3. Run on your own data

run-clear-eval-analysis \
    --provider openai \
    --eval-model-name gpt-4o \
    --data-path path/to/your_data.csv \
    --output-dir results/my_run/ \
    --run-name my_run

Your CSV should contain at minimum id, model_input, and response columns. See the LLM Analysis Guide for the full input format specification.
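A minimal sketch of producing such a file with Python's `csv` module follows. It enforces only the three columns named above; the helper name and sample rows are illustrative assumptions, and any optional columns should be checked against the LLM Analysis Guide:

```python
import csv
from pathlib import Path

# The minimum columns named in the Quick Start; the guide documents others.
REQUIRED_COLUMNS = ["id", "model_input", "response"]


def write_clear_input(rows, path):
    """Write a list of dicts to a CSV that has at least the required columns."""
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row!r} is missing columns: {missing}")
    # Required columns first, then any extra keys in first-seen order.
    extra = []
    for row in rows:
        for key in row:
            if key not in REQUIRED_COLUMNS and key not in extra:
                extra.append(key)
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_COLUMNS + extra)
        writer.writeheader()
        writer.writerows(rows)  # rows lacking an extra key get an empty cell


write_clear_input(
    [
        {"id": "q1", "model_input": "What is 12 * 7?", "response": "12 * 7 = 84."},
        # Deliberately wrong answer (15 + 27 is 42) for the judge to flag.
        {"id": "q2", "model_input": "What is 15 + 27?", "response": "15 + 27 = 43."},
    ],
    "your_data.csv",
)
```

The resulting file can then be passed as `--data-path your_data.csv`.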

4. View results

run-clear-eval-dashboard

Upload the generated ZIP file from the results directory to explore issues, scores, and individual examples.
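The exact name and location of the ZIP inside the output directory are not specified here, so the sketch below simply searches the results directory recursively and picks the most recently modified `.zip`. The helper name and the `results/my_run/` path (taken from the `--output-dir` example) are assumptions:

```python
from pathlib import Path


def latest_zip(results_dir):
    """Return the most recently modified .zip under results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    zips = [p for p in root.rglob("*.zip") if p.is_file()]
    return max(zips, key=lambda p: p.stat().st_mtime, default=None)


print(latest_zip("results/my_run/"))
```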

🔍 Usage Overview

📝 LLM Analysis (CLI)

# Full pipeline
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml

# Evaluation only (using existing responses)
run-clear-eval-evaluation --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml

📝 LLM Analysis (Python)

from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_run",
    provider="openai",
    data_path="my_data.csv",
    eval_model_name="gpt-4o",
    output_dir="results/",
)

🤖 Agentic Analysis

run-clear-agentic-eval \
    --data-dir data/my_traces \
    --results-dir results \
    --from-raw-traces true \
    --eval-model-name gpt-4o \
    --provider openai

# Launch agentic dashboard
run-clear-agentic-dashboard

See the Agentic Workflows Guide for full details.
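If you prefer to drive the agentic pipeline from a Python script, one option is to wrap the CLI with `subprocess`. This sketch reuses only the flags from the command above. The helper name is hypothetical, and passing `false` to `--from-raw-traces` for preprocessed trajectory CSVs is an assumption to verify against the Agentic Workflows Guide:

```python
import shutil
import subprocess


def agentic_eval_command(data_dir, results_dir, eval_model_name, provider,
                         from_raw_traces=True):
    """Build the run-clear-agentic-eval argument list from the documented flags."""
    return [
        "run-clear-agentic-eval",
        "--data-dir", str(data_dir),
        "--results-dir", str(results_dir),
        # "false" for preprocessed CSVs is an assumption, not confirmed here.
        "--from-raw-traces", "true" if from_raw_traces else "false",
        "--eval-model-name", eval_model_name,
        "--provider", provider,
    ]


cmd = agentic_eval_command("data/my_traces", "results", "gpt-4o", "openai")
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
else:
    print("run-clear-agentic-eval not found on PATH; install clear-eval first")
```

Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.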

📚 Documentation

Guide	Description
📝 LLM Analysis Guide	Full pipeline reference — input formats, CLI arguments, configuration, and external judges
🤖 Agentic Workflows Guide	Multi-agent evaluation — trace preprocessing, step-by-step and trajectory analysis, configuration reference
📊 Agentic Dashboard Guide	Dashboard features — workflow view, node analysis, trajectory explorer, path and temporal analysis
🔑 Providers and Credentials	Inference backends (LangChain, LiteLLM, Endpoint), provider setup, and configuration examples

🔑 Supported Providers

Provider	Backend	Credentials
OpenAI	LangChain, LiteLLM, Endpoint	`OPENAI_API_KEY`
WatsonX	LangChain, LiteLLM, Endpoint	`WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_PROJECT_ID`
Anthropic	LiteLLM	`ANTHROPIC_API_KEY`
AWS Bedrock	LiteLLM	AWS credentials
Google Vertex AI	LiteLLM	GCP credentials
100+ more	LiteLLM	Provider-specific

See the Providers and Credentials Guide for backend configuration details and examples.
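A preflight check against the variables in this table can catch missing credentials before a long run starts. This is a sketch, not part of CLEAR: the lowercase provider keys (other than `openai`, which appears in the examples above) are assumptions, and providers that rely on AWS, GCP, or other provider-specific credentials are left out:

```python
import os

# Credential variables per provider, copied from the Supported Providers table.
# Provider keys other than "openai" are assumed names for this sketch.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_credentials(provider, environ=os.environ):
    """Return the credential variables for `provider` that are unset or empty."""
    names = REQUIRED_ENV.get(provider.lower())
    if names is None:
        raise ValueError(f"no credential list for provider {provider!r} in this sketch")
    return [name for name in names if not environ.get(name)]


missing = missing_credentials("openai")
if missing:
    print("Set these before running CLEAR:", ", ".join(missing))
```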

🗂️ Project Structure

CLEAR/
├── README.md                              # This file
├── docs/
│   ├── llm-analysis.md                    # LLM Analysis Guide
│   ├── providers.md                       # Providers and Credentials Guide
│   └── agentic/                           # Agentic documentation
│       ├── dashboard.md                   # Agentic Dashboard Guide
│       ├── intermediate-representation.md # CSV format reference
│       └── mlflow-tracing.md              # MLflow tracing guide
├── src/clear_eval/
│   ├── pipeline/                          # LLM analysis pipeline
│   ├── dashboard/                         # LLM dashboard (Streamlit)
│   ├── agentic/
│   │   ├── README.md                      # Agentic overview (links to docs/)
│   │   ├── pipeline/                      # Agentic pipeline
│   │   └── dashboard/                     # Dashboard code
│   └── sample_data/                       # Sample datasets
├── examples/                              # Configuration examples
└── tests/                                 # Test suite

📄 License

Apache 2.0 — see LICENSE for details.

Project details

Release history

2.0.1 (this version): May 21, 2026
1.0.8: Oct 22, 2025
1.0.7: Sep 3, 2025
1.0.6: Aug 11, 2025
1.0.5: Jul 24, 2025
1.0.4: Jul 24, 2025
1.0.3: Jul 24, 2025
1.0.2: Jul 24, 2025
1.0.1: Jul 24, 2025
1.0.0: Jul 24, 2025

Download files

Source Distribution

clear_eval-2.0.1.tar.gz (2.3 MB)

Uploaded May 21, 2026 Source

Built Distribution

clear_eval-2.0.1-py3-none-any.whl (2.4 MB)

Uploaded May 21, 2026 Python 3

File details

Details for the file clear_eval-2.0.1.tar.gz.

File metadata

Download URL: clear_eval-2.0.1.tar.gz
Upload date: May 21, 2026
Size: 2.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for clear_eval-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e79c2d472de62b55de1f888c668a13c3607644fa6237c5a9b1e67d50ddbe71ca`
MD5	`ebd0bde869143e5395cf6ccb54099d1f`
BLAKE2b-256	`3d44c7817d0cd6956d51595d41724fe288de349506f137c2697a957e3d64aae7`
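To check a downloaded archive against these digests, a streaming SHA256 comparison with `hashlib` is enough. The sketch assumes the sdist sits in the current directory under its published file name:

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the 2.0.1 source distribution.
SDIST_SHA256 = "e79c2d472de62b55de1f888c668a13c3607644fa6237c5a9b1e67d50ddbe71ca"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.lower()


archive = Path("clear_eval-2.0.1.tar.gz")
if archive.exists():
    print("OK" if verify(archive, SDIST_SHA256) else "HASH MISMATCH")
```

pip can also enforce this at install time in hash-checking mode: pin `clear-eval==2.0.1` with a `--hash=sha256:...` entry for each published file in a requirements file and install with `--require-hashes`.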

File details

Details for the file clear_eval-2.0.1-py3-none-any.whl.

File metadata

Download URL: clear_eval-2.0.1-py3-none-any.whl
Upload date: May 21, 2026
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for clear_eval-2.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b50e6d8977aa179b817226b64abc0e676a1e3c1c37fee05ce79198a8b617e605`
MD5	`9d060dc15a8365cb1f6f83c36a9a3317`
BLAKE2b-256	`e0eb55adb3684cc2f2385110e1ffeb35f393f02b7d124a8910b5099dde7f4f82`
