A Python SDK for error analysis via LLM-as-a-Judge

CLEAR: Comprehensive LLM Error Analysis and Reporting

CLEAR is an open-source toolkit for LLM error analysis using an LLM-as-a-Judge approach.

What is CLEAR?

CLEAR provides systematic error analysis for:

  • Single LLM Responses: Analyze quality issues in model outputs for tasks like Q&A, summarization, and generation
  • Agentic Workflows: Evaluate complex workflows with multiple components, tool usage, and multi-step task trajectories

It combines automated LLM-as-a-judge evaluation with interactive dashboards to help you:

  • Identify recurring error patterns across your dataset
  • Quantify issue frequencies and severity
  • Drill down into specific failure cases
  • Prioritize improvements based on data-driven insights

โš™๏ธ How It Works

CLEAR operates in two phases:

  1. Analysis: Generates per-instance textual feedback, identifies system-level error categories, and quantifies their frequencies.
  2. Interactive Dashboard: Explore aggregate visualizations, apply dynamic filters, and drill down into individual failure examples.
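
To make the frequency step of the Analysis phase concrete, here is a minimal sketch of counting how often each issue category appears across records. The record layout and the category names are made up for illustration. This is not CLEAR's implementation or its output schema.

```python
from collections import Counter

# Hypothetical per-record results: each record lists the issue categories
# a judge assigned to it. This layout is an assumption for illustration only.
records = [
    {"id": "1", "issues": ["arithmetic error", "missing final answer"]},
    {"id": "2", "issues": ["arithmetic error"]},
    {"id": "3", "issues": []},
]


def issue_frequencies(records):
    """Return {issue: fraction of records exhibiting it}, most frequent first."""
    counts = Counter()
    for record in records:
        # Use a set so an issue is counted at most once per record.
        counts.update(set(record["issues"]))
    total = len(records)
    return {issue: n / total for issue, n in counts.most_common()}


frequencies = issue_frequencies(records)
for issue, share in frequencies.items():
    print(f"{issue}: {share:.0%}")
```

Counting each issue once per record keeps the result interpretable as "share of records affected", which is the quantity you would sort by when prioritizing fixes.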

Two Analysis Modes

CLEAR supports two distinct analysis modes, each with its own pipeline, dashboard, and documentation:

๐Ÿ“ LLM Analysis

Evaluate standard LLM outputs โ€” generation quality, correctness, and recurring error patterns. Provide a CSV with prompts and responses, and CLEAR will score each instance, generate textual critiques, and surface system-level issues.

Input CSV with model inputs and responses
Output Per-record scores, evaluation text, aggregated issue categories
Dashboard Streamlit-based interactive explorer

๐Ÿ“– Full LLM Analysis Guide โ†’

Agentic Analysis

Evaluate multi-agent system trajectories: step-by-step agent interactions and full trajectory analysis. Supports traces from LangGraph, CrewAI, and other frameworks via MLflow or Langfuse.

Input      Raw JSON traces or preprocessed trajectory CSVs
Output     Per-step CLEAR analysis, trajectory-level scores, rubric evaluations
Dashboard  NiceGUI-based workflow visualization with path and temporal analysis

Agentic Workflows Guide → | Agentic Dashboard Guide →


Key Features

LLM-as-a-Judge          Automated evaluation for any text generation task
Agentic Workflows       Evaluate agent trajectories, step by step and as a whole
Multiple Backends       LangChain, LiteLLM (100+ providers), or direct HTTP endpoints
External Judges         Plug in custom evaluation functions
Interactive Dashboards  Standard and agentic-specific visualizations
Flexible Configuration  YAML config files, CLI flags, or Python API

Installation

Requires Python 3.10+

Option 1: pip (recommended)

pip install clear-eval

Option 2: From source (for development)

git clone https://github.com/IBM/CLEAR.git
cd CLEAR
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Quick Start

1. Set your provider credentials

CLEAR requires a supported LLM provider. Set the appropriate environment variables for your provider (e.g., OPENAI_API_KEY for OpenAI). See the Providers and Credentials Guide for all supported providers and backends.

2. Run on sample data

With no data path specified, CLEAR runs on a built-in GSM8K sample dataset using default settings:

run-clear-eval-analysis --provider openai --eval-model-name gpt-4o

Results are saved to results/gsm8k/sample_output/.

3. Run on your own data

run-clear-eval-analysis \
    --provider openai \
    --eval-model-name gpt-4o \
    --data-path path/to/your_data.csv \
    --output-dir results/my_run/ \
    --run-name my_run

Your CSV should contain at minimum id, model_input, and response columns. See the LLM Analysis Guide for the full input format specification.
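
As a minimal sketch of the input described above, the following writes a CSV with the three minimum columns and then checks that the header contains them. The example rows, the file name your_data.csv, and the helper names are made up for illustration. Any further columns the full format allows are not covered here.

```python
import csv
from pathlib import Path

# Minimum columns named in the Quick Start.
REQUIRED_COLUMNS = ["id", "model_input", "response"]

# Made-up example rows; replace with your own prompts and model outputs.
rows = [
    {"id": "1", "model_input": "What is 2 + 3?", "response": "2 + 3 = 5"},
    {"id": "2", "model_input": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat."},
]


def write_input_csv(path, rows):
    """Write rows to a CSV with the three minimum columns, quoting as needed."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def missing_columns(path):
    """Return the required columns that are absent from the CSV header."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return [column for column in REQUIRED_COLUMNS if column not in header]


csv_path = Path("your_data.csv")
write_input_csv(csv_path, rows)
print(missing_columns(csv_path))
```

Using the csv module rather than joining strings by hand matters here because prompts and responses often contain commas, quotes, and newlines.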

4. View results

run-clear-eval-dashboard

Upload the generated ZIP file from the results directory to explore issues, scores, and individual examples.
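
If a results directory holds several runs, a small helper can locate the most recent ZIP to upload. This is a sketch that assumes the ZIP sits somewhere under the output directory; the exact file name and location are not specified here, so the search is recursive. The function name newest_zip is hypothetical.

```python
from pathlib import Path


def newest_zip(results_dir):
    """Return the most recently modified .zip under results_dir, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    zips = list(root.rglob("*.zip"))
    # max() with default=None handles a directory that has no ZIP files.
    return max(zips, key=lambda p: p.stat().st_mtime, default=None)


# "results/my_run/" matches the --output-dir used in step 3.
print(newest_zip("results/my_run/"))
```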


๐Ÿ” Usage Overview

๐Ÿ“ LLM Analysis (CLI)

# Full pipeline
run-clear-eval-analysis --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml

# Evaluation only (using existing responses)
run-clear-eval-evaluation --provider openai --eval-model-name gpt-4o --config_path path/to/config.yaml
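
When the CLI is driven from a script (for example, to sweep over several datasets), the command can be assembled as an argument list. The sketch below uses only the flags shown in the Quick Start. The helper names build_analysis_command and run_analysis are hypothetical, not part of CLEAR.

```python
import shutil
import subprocess


def build_analysis_command(provider, eval_model_name, data_path=None,
                           output_dir=None, run_name=None):
    """Assemble the argument list for run-clear-eval-analysis."""
    cmd = ["run-clear-eval-analysis",
           "--provider", provider,
           "--eval-model-name", eval_model_name]
    optional = {
        "--data-path": data_path,
        "--output-dir": output_dir,
        "--run-name": run_name,
    }
    for flag, value in optional.items():
        if value is not None:  # omit flags the caller did not set
            cmd += [flag, str(value)]
    return cmd


def run_analysis(cmd):
    """Run the command, failing early if the CLI is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is clear-eval installed?")
    subprocess.run(cmd, check=True)


cmd = build_analysis_command(
    "openai", "gpt-4o",
    data_path="path/to/your_data.csv",
    output_dir="results/my_run/",
    run_name="my_run",
)
print(" ".join(cmd))
```

Passing a list to subprocess.run avoids shell quoting problems with paths that contain spaces. Call run_analysis(cmd) to execute the run once credentials are set.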

๐Ÿ“ LLM Analysis (Python)

from clear_eval.analysis_runner import run_clear_eval_analysis

run_clear_eval_analysis(
    run_name="my_run",
    provider="openai",
    data_path="my_data.csv",
    eval_model_name="gpt-4o",
    output_dir="results/",
)

Agentic Analysis

run-clear-agentic-eval \
    --data-dir data/my_traces \
    --results-dir results \
    --from-raw-traces true \
    --eval-model-name gpt-4o \
    --provider openai

# Launch agentic dashboard
run-clear-agentic-dashboard

See the Agentic Workflows Guide for full details.


Documentation

Guide                      Description
LLM Analysis Guide         Full pipeline reference: input formats, CLI arguments, configuration, and external judges
Agentic Workflows Guide    Multi-agent evaluation: trace preprocessing, step-by-step and trajectory analysis, configuration reference
Agentic Dashboard Guide    Dashboard features: workflow view, node analysis, trajectory explorer, path and temporal analysis
Providers and Credentials  Inference backends (LangChain, LiteLLM, Endpoint), provider setup, and configuration examples

Supported Providers

Provider          Backend                       Credentials
OpenAI            LangChain, LiteLLM, Endpoint  OPENAI_API_KEY
WatsonX           LangChain, LiteLLM, Endpoint  WATSONX_APIKEY, WATSONX_URL, WATSONX_PROJECT_ID
Anthropic         LiteLLM                       ANTHROPIC_API_KEY
AWS Bedrock       LiteLLM                       AWS credentials
Google Vertex AI  LiteLLM                       GCP credentials
100+ more         LiteLLM                       Provider-specific

See the Providers and Credentials Guide for backend configuration details and examples.
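
Before starting a long run, it can help to confirm that the variables for your provider are set. The sketch below covers only the providers whose variable names are spelled out in the table above. The lowercase keys "watsonx" and "anthropic" are assumptions about provider naming (only "openai" appears in the commands in this README), and missing_credentials is a hypothetical helper.

```python
import os

# Environment variables listed in the Supported Providers table.
# Providers whose credentials the table does not spell out are omitted.
REQUIRED_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_credentials(provider, environ=os.environ):
    """Return the required variables that are unset or empty for the provider."""
    required = REQUIRED_ENV_VARS.get(provider.lower())
    if required is None:
        raise ValueError(f"No variable list known for provider {provider!r}")
    return [name for name in required if not environ.get(name)]


missing = missing_credentials("openai")
if missing:
    print("Set these before running CLEAR:", ", ".join(missing))
else:
    print("Credentials found for openai")
```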


๐Ÿ—‚๏ธ Project Structure

CLEAR/
โ”œโ”€โ”€ README.md                              # This file
โ”œโ”€โ”€ docs/
โ”‚   โ”œโ”€โ”€ llm-analysis.md                    # LLM Analysis Guide
โ”‚   โ”œโ”€โ”€ providers.md                       # Providers and Credentials Guide
โ”‚   โ””โ”€โ”€ agentic/                           # Agentic documentation
โ”‚       โ”œโ”€โ”€ dashboard.md                   # Agentic Dashboard Guide
โ”‚       โ”œโ”€โ”€ intermediate-representation.md # CSV format reference
โ”‚       โ””โ”€โ”€ mlflow-tracing.md              # MLflow tracing guide
โ”œโ”€โ”€ src/clear_eval/
โ”‚   โ”œโ”€โ”€ pipeline/                          # LLM analysis pipeline
โ”‚   โ”œโ”€โ”€ dashboard/                         # LLM dashboard (Streamlit)
โ”‚   โ”œโ”€โ”€ agentic/
โ”‚   โ”‚   โ”œโ”€โ”€ README.md                      # Agentic overview (links to docs/)
โ”‚   โ”‚   โ”œโ”€โ”€ pipeline/                      # Agentic pipeline
โ”‚   โ”‚   โ””โ”€โ”€ dashboard/                     # Dashboard code
โ”‚   โ””โ”€โ”€ sample_data/                       # Sample datasets
โ”œโ”€โ”€ examples/                              # Configuration examples
โ””โ”€โ”€ tests/                                 # Test suite

License

Apache 2.0. See LICENSE for details.
