A Python SDK facilitating error analysis via LLM-as-a-Judge
Project description
CLEAR: Comprehensive LLM Error Analysis and Reporting
CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.
What is CLEAR?
CLEAR provides systematic error analysis for:
- Single LLM Responses: Analyze quality issues in model outputs for tasks such as Q&A, summarization, and generation
- Agentic Workflows: Evaluate complex workflows with multiple components, tool usage, and multi-step task trajectories
It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:
- Identify recurring error patterns across your dataset
- Quantify issue frequencies and severity
- Drill down into specific failure cases
- Prioritize improvements based on data-driven insights
How It Works
CLEAR operates in two phases:
- Analysis: Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
- Interactive Dashboard: Explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.
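To make the "quantifies their frequencies" step concrete, here is a minimal sketch of how per-instance issue labels can be rolled up into system-level frequencies. It assumes each record has already been mapped to a list of issue category names; the record layout, the category names, and the `issue_frequencies` helper are hypothetical illustrations, not part of CLEAR's API.

```python
from collections import Counter


def issue_frequencies(records: list[dict]) -> dict[str, float]:
    """Return, for each issue category, the fraction of records that exhibit it."""
    if not records:
        return {}
    counts = Counter()
    for record in records:
        # Count each category at most once per record, even if it was flagged twice.
        counts.update(set(record["issues"]))
    total = len(records)
    # most_common() orders categories from most to least frequent.
    return {issue: n / total for issue, n in counts.most_common()}


# Hypothetical per-instance output: an id plus the issue categories assigned to it.
records = [
    {"id": "1", "issues": ["arithmetic error", "missing final answer"]},
    {"id": "2", "issues": ["arithmetic error"]},
    {"id": "3", "issues": []},
    {"id": "4", "issues": ["arithmetic error", "arithmetic error"]},
]
print(issue_frequencies(records))
```

Counting a category once per record keeps the result interpretable as "share of instances affected", which is what drives prioritization.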
Two Analysis Modes
CLEAR supports two distinct analysis modes, each with its own pipeline, dashboard, and documentation:
LLM Analysis
Evaluate standard LLM outputs for generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate a textual critique for it, and surface system-level issues.
| Aspect | Details |
|---|---|
| Input | CSV with model inputs and responses |
| Output | Per-record scores, evaluation text, aggregated issue categories |
| Dashboard | Streamlit-based interactive explorer |
Agentic Analysis
Evaluate multi-agent system trajectories, covering both step-by-step agent interactions and full-trajectory analysis. Traces from LangGraph, CrewAI, and other frameworks are supported via MLflow or Langfuse.
| Aspect | Details |
|---|---|
| Input | Raw JSON traces or preprocessed trajectory CSVs |
| Output | Per-step CLEAR analysis, trajectory-level scores, rubric evaluations |
| Dashboard | NiceGUI-based workflow visualization with path and temporal analysis |
See the Agentic Workflows Guide and the Agentic Dashboard Guide.
Key Features
| Feature | Description |
|---|---|
| LLM-as-a-Judge | Automated evaluation for any text generation task |
| Agentic Workflows | Evaluate agent trajectories step by step and as a whole |
| Multiple Backends | LangChain, LiteLLM (100+ providers), or direct HTTP endpoints |
| External Judges | Plug in custom evaluation functions |
| Interactive Dashboards | Standard and agentic-specific visualizations |
| Flexible Configuration | YAML config files, CLI flags, or Python API |
Installation
Requires Python 3.10+
Option 1: pip (recommended)
pip install clear-eval
Option 2: From source (for development)
git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
Quick Start
1. Set your provider credentials
CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., OPENAI_API_KEY for OpenAI). See the Providers and Credentials Guide for all supported providers and backends.
2. Run on sample data
With no data path specified, CLEAR runs on a built-in GSM8K sample dataset using default settings:
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o
Results are saved to results/gsm8k/sample_output/.
3. Run on your own data
run-clear-eval-analysis \
--provider openai \
--eval-model-name gpt-4o \
--data-path path/to/your_data.csv \
--output-dir results/my_run/ \
--run-name my_run
Your CSV should contain at minimum id, model_input, and response columns. See the LLM Analysis Guide for the full input format specification.
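As a starting point, the snippet below writes a minimal input file containing only the three required columns, using the standard library. The file name `your_data.csv` and the row contents are placeholders; any additional columns a run may need are described in the LLM Analysis Guide and are not shown here.

```python
import csv

# Placeholder rows; only the required id, model_input, and response columns are included.
rows = [
    {"id": "q1", "model_input": "What is 12 * 7?", "response": "12 * 7 = 84."},
    {"id": "q2", "model_input": "Summarize: The cat sat on the mat.", "response": "A cat sat on a mat."},
]

# newline="" lets the csv module control line endings, as the csv docs recommend.
with open("your_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "model_input", "response"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file can then be passed with `--data-path your_data.csv`.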
4. View results
run-clear-eval-dashboard
Upload the generated ZIP file from the results directory to explore issues, scores, and individual examples.
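If a results directory accumulates several runs, a small helper can locate the most recently written archive to upload. This sketch assumes only that the archive ends in `.zip` and sits somewhere under the output directory; the exact file name CLEAR produces is not specified here, and `latest_zip` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def latest_zip(results_dir: str) -> Path | None:
    """Return the most recently modified .zip file under results_dir, or None if there is none."""
    # rglob searches subdirectories too, so per-run folders are covered.
    zips = [p for p in Path(results_dir).rglob("*.zip") if p.is_file()]
    if not zips:
        return None
    return max(zips, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_zip("results/my_run/"))
```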
Usage Overview
LLM Analysis (CLI)
# Full pipeline
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml
# Evaluation only (using existing responses)
run-clear-eval-evaluation --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml
LLM Analysis (Python)
from clear_eval.analysis_runner import run_clear_eval_analysis
run_clear_eval_analysis(
run_name="my_run",
provider="openai",
data_path="my_data.csv",
eval_model_name="gpt-4o",
output_dir="results/",
)
Agentic Analysis
run-clear-agentic-eval \
--data-dir data/my_traces \
--results-dir results \
--from-raw-traces true \
--eval-model-name gpt-4o \
--provider openai
# Launch agentic dashboard
run-clear-agentic-dashboard
See the Agentic Workflows Guide for full details.
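For scripted or CI runs, the agentic command can also be launched from Python. The sketch below only assembles the argument list from the flags shown in the raw-trace example and hands it to `subprocess.run`; `build_agentic_cmd` and `run_agentic_eval` are hypothetical helpers, and any flags beyond these are documented in the Agentic Workflows Guide.

```python
import shlex
import subprocess


def build_agentic_cmd(data_dir: str, results_dir: str, model: str, provider: str) -> list[str]:
    """Assemble a run-clear-agentic-eval call using the flags from the raw-trace example."""
    return [
        "run-clear-agentic-eval",
        "--data-dir", data_dir,
        "--results-dir", results_dir,
        "--from-raw-traces", "true",  # input is raw JSON traces, as in the example
        "--eval-model-name", model,
        "--provider", provider,
    ]


def run_agentic_eval(cmd: list[str]) -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


cmd = build_agentic_cmd("data/my_traces", "results", "gpt-4o", "openai")
# Print a shell-quoted version for inspection before calling run_agentic_eval(cmd).
print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.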
Documentation
| Guide | Description |
|---|---|
| LLM Analysis Guide | Full pipeline reference: input formats, CLI arguments, configuration, and external judges |
| Agentic Workflows Guide | Multi-agent evaluation: trace preprocessing, step-by-step and trajectory analysis, configuration reference |
| Agentic Dashboard Guide | Dashboard features: workflow view, node analysis, trajectory explorer, path and temporal analysis |
| Providers and Credentials | Inference backends (LangChain, LiteLLM, Endpoint), provider setup, and configuration examples |
Supported Providers
| Provider | Backend | Credentials |
|---|---|---|
| OpenAI | LangChain, LiteLLM, Endpoint | OPENAI_API_KEY |
| WatsonX | LangChain, LiteLLM, Endpoint | WATSONX_APIKEY, WATSONX_URL, WATSONX_PROJECT_ID |
| Anthropic | LiteLLM | ANTHROPIC_API_KEY |
| AWS Bedrock | LiteLLM | AWS credentials |
| Google Vertex AI | LiteLLM | GCP credentials |
| 100+ more | LiteLLM | Provider-specific |
See the Providers and Credentials Guide for backend configuration details and examples.
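Before a long run, it can help to fail fast when credentials are missing. This sketch encodes the environment variables listed in the table above for OpenAI, WatsonX, and Anthropic. The `REQUIRED_ENV` mapping and `missing_credentials` helper are hypothetical, the lowercase keys are assumed to match the `--provider` values (only `openai` appears in the examples above), and AWS Bedrock and Google Vertex AI are left out because their credentials are not a fixed set of variables.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Environment variables per provider, taken from the Supported Providers table.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_credentials(provider: str, env: Mapping[str, str] | None = None) -> list[str]:
    """Return the required variables that are unset or empty for the given provider."""
    env = os.environ if env is None else env
    required = REQUIRED_ENV.get(provider.lower())
    if required is None:
        raise ValueError(f"No credential list recorded for provider {provider!r}")
    return [name for name in required if not env.get(name)]
```

For example, `missing_credentials("openai")` returns an empty list when `OPENAI_API_KEY` is set, so a wrapper script can exit early otherwise.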
Project Structure
CLEAR/
├── README.md                                # This file
├── docs/
│   ├── llm-analysis.md                      # LLM Analysis Guide
│   ├── providers.md                         # Providers and Credentials Guide
│   └── agentic/                             # Agentic documentation
│       ├── dashboard.md                     # Agentic Dashboard Guide
│       ├── intermediate-representation.md   # CSV format reference
│       └── mlflow-tracing.md                # MLflow tracing guide
├── src/clear_eval/
│   ├── pipeline/                            # LLM analysis pipeline
│   ├── dashboard/                           # LLM dashboard (Streamlit)
│   ├── agentic/
│   │   ├── README.md                        # Agentic overview (links to docs/)
│   │   ├── pipeline/                        # Agentic pipeline
│   │   └── dashboard/                       # Dashboard code
│   └── sample_data/                         # Sample datasets
├── examples/                                # Configuration examples
└── tests/                                   # Test suite
License
Apache 2.0. See LICENSE for details.
Download files
Source distribution: clear_eval-2.0.1.tar.gz
Built distribution: clear_eval-2.0.1-py3-none-any.whl
File details
Details for the file clear_eval-2.0.1.tar.gz.
File metadata
- Download URL: clear_eval-2.0.1.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e79c2d472de62b55de1f888c668a13c3607644fa6237c5a9b1e67d50ddbe71ca |
| MD5 | ebd0bde869143e5395cf6ccb54099d1f |
| BLAKE2b-256 | 3d44c7817d0cd6956d51595d41724fe288de349506f137c2697a957e3d64aae7 |
File details
Details for the file clear_eval-2.0.1-py3-none-any.whl.
File metadata
- Download URL: clear_eval-2.0.1-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b50e6d8977aa179b817226b64abc0e676a1e3c1c37fee05ce79198a8b617e605 |
| MD5 | 9d060dc15a8365cb1f6f83c36a9a3317 |
| BLAKE2b-256 | e0eb55adb3684cc2f2385110e1ffeb35f393f02b7d124a8910b5099dde7f4f82 |