ARC-Eval: Agent Reliability & Compliance evaluation platform for LLMs and AI agents

Project description

ARC-Eval: Agent-as-a-Judge Enterprise Platform

The first Agent-as-a-Judge platform for enterprise agent evaluation

Transform your agent compliance from static audits to continuous improvement. Get AI-powered feedback, CISO-ready reports, and actionable recommendations across 345 enterprise scenarios.

Quick Start

# Install
pip install arc-eval

# Try it instantly (no setup required)
arc-eval --quick-start --domain finance --agent-judge

# Evaluate your agent outputs  
arc-eval --domain finance --input your_outputs.json --agent-judge

# Generate executive reports
arc-eval --domain security --input outputs.json --agent-judge --export pdf

Agent-as-a-Judge features require an Anthropic API key: set ANTHROPIC_API_KEY before running with --agent-judge. Without a key, you can still run traditional evaluation (no AI feedback) by leaving out --agent-judge.
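If one script has to work both with and without a key, a small wrapper can add --agent-judge only when ANTHROPIC_API_KEY is set. This is a minimal sketch under stated assumptions. run_arc_eval and the ARC_EVAL_BIN override are hypothetical names, not part of arc-eval. It also assumes that omitting --agent-judge gives the traditional evaluation without AI feedback.

```shell
# Hypothetical wrapper (not part of arc-eval): enable Agent-as-a-Judge only
# when an Anthropic key is available, otherwise run traditional evaluation.
# ARC_EVAL_BIN is a hypothetical override for the executable (defaults to arc-eval).
run_arc_eval() {
  domain="$1"
  input="$2"
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    "${ARC_EVAL_BIN:-arc-eval}" --domain "$domain" --input "$input" --agent-judge
  else
    # Assumption: leaving out --agent-judge means no AI feedback is requested.
    echo "ANTHROPIC_API_KEY not set; running without AI feedback" >&2
    "${ARC_EVAL_BIN:-arc-eval}" --domain "$domain" --input "$input"
  fi
}
```

Usage: `run_arc_eval finance your_outputs.json`. The same call works on a developer laptop with a key and in an environment without one.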

Why Agent-as-a-Judge?

Traditional compliance tools give you pass/fail results. Agent-as-a-Judge gives you a path to improvement.

🎯 Value Delivered

Continuous Feedback: AI judges provide actionable recommendations, not just scores
Enterprise Scale: 345 scenarios across Finance (110), Security (120), ML (107) domains
CISO-Ready: Executive reports with compliance framework mapping
Cost Optimized: Smart model selection and fallbacks for production use

⚡ How It Works

Your Agent Output → AI Judge → Compliance Score + Improvement Plan + Training Signals → Self-Improvement Loop

Domains: Finance (SOX, KYC, AML) • Security (OWASP, MITRE) • ML (MLOps, EU AI Act)

Common Use Cases

# 🚀 Demo & Discovery
arc-eval --quick-start --domain finance --agent-judge

# 📊 Evaluate Your Agents  
arc-eval --domain security --input outputs.json --agent-judge

# 🏢 Executive Reporting
arc-eval --domain ml --input outputs.json --agent-judge --export pdf --summary-only

# ⚙️ CI/CD Integration
arc-eval --domain finance --input logs.json --agent-judge --judge-model claude-3-5-haiku

More Examples: See examples/ for detailed workflows, input formats, and CI/CD templates.

Input Format

{"output": "Transaction approved for customer John Smith"}

ARC-Eval auto-detects formats from OpenAI, Anthropic, LangChain, and custom agents. See examples/ for comprehensive format documentation.
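For a quick test, you can write the single-record shape shown above from the shell. The helper write_arc_eval_input is hypothetical. It escapes only backslashes and double quotes, so serialize real agent logs (which may contain newlines or other control characters) with a proper JSON library instead.

```shell
# Hypothetical helper: write one agent output to FILE as {"output": "..."}.
# Only backslashes and double quotes are escaped; control characters are not handled.
write_arc_eval_input() {
  text="$1"
  file="$2"
  # Double every backslash first, then escape double quotes.
  escaped=$(printf '%s' "$text" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"output": "%s"}\n' "$escaped" > "$file"
}
```

Usage: `write_arc_eval_input "Transaction approved for customer John Smith" outputs.json`, then `arc-eval --domain finance --input outputs.json --agent-judge`.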

Key Commands

# Essential flags
--domain finance|security|ml    # Choose evaluation domain
--input file.json               # Your agent outputs
--agent-judge                   # Enable AI feedback
--export pdf                    # Generate reports

# Useful options  
--quick-start                   # Try with sample data
--judge-model auto|sonnet|haiku # Cost optimization
--summary-only                  # Executive reports only
--list-domains                  # See all scenarios

Full Reference: Run arc-eval --help or see examples/ for complete documentation.
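One way to use the --judge-model cost option is to run the cheaper Haiku judge in CI and let arc-eval choose (auto) elsewhere. pick_judge_model is a hypothetical helper, and the policy itself is an assumption for illustration. It relies on the CI environment variable, which GitHub Actions and GitLab CI set for every job.

```shell
# Hypothetical helper: print a --judge-model value for the current environment.
# Assumed policy: Haiku in CI (cheaper), automatic selection everywhere else.
pick_judge_model() {
  if [ -n "${CI:-}" ]; then
    echo haiku
  else
    echo auto
  fi
}
```

Usage: `arc-eval --domain finance --input logs.json --agent-judge --judge-model "$(pick_judge_model)"`.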

Enterprise Integration

CI/CD Pipeline

# Basic compliance gate: fail the job if arc-eval exits non-zero
arc-eval --domain finance --input "$CI_ARTIFACTS/logs.json" --agent-judge || exit 1
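The gate above checks a single domain. To gate on several domains in one job, you can loop over them and fail at the end if any run failed. This is a sketch under stated assumptions:

- arc-eval exits non-zero when an evaluation fails, which the basic gate also relies on.
- Each domain's outputs live at `<artifacts_dir>/<domain>.json`, a hypothetical layout.
- run_compliance_gate and ARC_EVAL_BIN are hypothetical names.

```shell
# Hypothetical multi-domain gate: runs every domain, returns non-zero if any failed.
# ARC_EVAL_BIN is a hypothetical override for the executable (defaults to arc-eval).
run_compliance_gate() {
  artifacts_dir="$1"
  shift
  failed=0
  for domain in "$@"; do
    if ! "${ARC_EVAL_BIN:-arc-eval}" --domain "$domain" \
        --input "$artifacts_dir/$domain.json" --agent-judge; then
      echo "compliance gate failed for domain: $domain" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `run_compliance_gate "$CI_ARTIFACTS" finance security ml || exit 1`. Running every domain before failing reports all failing domains in one CI run, instead of stopping at the first one.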

Enterprise Features

345 Enterprise Scenarios: Finance (110) • Security (120) • ML (107)
AI Judge Framework: SecurityJudge, FinanceJudge, MLJudge with continuous feedback
Self-Improvement Engine: Automatic training data generation and retraining triggers from evaluation feedback
CISO-Ready Reports: Executive dashboards with compliance framework mapping
Cost Optimization: Smart model selection (Claude Sonnet ↔ Haiku)
Production Templates: GitHub Actions, input formats, enterprise onboarding

Complete Integration Guide: See examples/ci-templates/ for production-ready CI/CD workflows.

What's Next?

Try the Demo: arc-eval --quick-start --domain finance --agent-judge
Explore Examples: examples/ for workflows and CI/CD templates
Enterprise Setup: examples/ci-templates/ for production deployment
Get Support: Run arc-eval --help or visit our documentation

ARC-Eval: Transform agent compliance from static audits to continuous improvement with AI-powered feedback.

MIT License • Documentation • GitHub

Project details

Release history

0.2.9: Jun 4, 2025
0.2.8: Jun 1, 2025
0.2.7: May 30, 2025
0.2.6: May 29, 2025
0.2.5: May 28, 2025
0.2.4: May 28, 2025
0.2.3: May 27, 2025
0.2.2: May 27, 2025
0.2.1 (this version): May 26, 2025
0.2.0: May 25, 2025
0.1.0: May 24, 2025

Download files

Source Distribution

arc_eval-0.2.1.tar.gz (137.3 kB)

Uploaded May 26, 2025 Source

Built Distribution

arc_eval-0.2.1-py3-none-any.whl (141.3 kB)

Uploaded May 26, 2025 Python 3

File details

Details for the file arc_eval-0.2.1.tar.gz.

File metadata

Download URL: arc_eval-0.2.1.tar.gz
Upload date: May 26, 2025
Size: 137.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for arc_eval-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`e5d6d57fa1c5b0a7c11cbca5d72e9d9c5464f2e6c4bd9959cc0cc08c1c8088b2`
MD5	`bad98fd86a1ced9e3377071188493f4e`
BLAKE2b-256	`62804e015944d7d695450dc8703316651b5d5f4492cce22c1eee2edb99611122`

File details

Details for the file arc_eval-0.2.1-py3-none-any.whl.

File metadata

Download URL: arc_eval-0.2.1-py3-none-any.whl
Upload date: May 26, 2025
Size: 141.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for arc_eval-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e2389e73c79979ccfdde907399976c0a99efc7bf8acae13f2cdd5d3f99aa7722`
MD5	`6f41341f5d19122024e161d6d15d40c9`
BLAKE2b-256	`c1f066800d4c7aef0fa8ceb35a7c09566274392cee3eeef1f7aeba8126258569`
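To check a downloaded 0.2.1 file against the SHA256 digests listed above, compare its hash before installing. This sketch assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` prints the same format). `verify_sha256` is a hypothetical helper name.

```shell
# SHA256 digests copied from the file hashes above (release 0.2.1).
SDIST_SHA256="e5d6d57fa1c5b0a7c11cbca5d72e9d9c5464f2e6c4bd9959cc0cc08c1c8088b2"
WHEEL_SHA256="e2389e73c79979ccfdde907399976c0a99efc7bf8acae13f2cdd5d3f99aa7722"

# Hypothetical helper: succeed only if FILE's SHA256 digest equals EXPECTED.
verify_sha256() {
  expected="$1"
  file="$2"
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}
```

Usage: run `pip download arc-eval==0.2.1 --no-deps -d dist`, then `verify_sha256 "$WHEEL_SHA256" dist/arc_eval-0.2.1-py3-none-any.whl && pip install dist/arc_eval-0.2.1-py3-none-any.whl`.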
