
Comprehensive evaluation framework for Agent Communication Protocol (ACP) agents

Project description

ACP Evals

Production-ready evaluation framework for agents in the ACP/BeeAI ecosystem


Overview

ACP Evals provides comprehensive evaluation tools for agents built with the Agent Communication Protocol (ACP). It enables developers to measure, benchmark, and improve agent performance with a focus on multi-agent systems, production metrics, and developer experience.

Key Features

  • 🚀 Zero to evaluation in < 5 lines of code
  • 🤖 Multi-agent focused - Specialized metrics for agent coordination
  • 🔌 Multiple LLM providers - OpenAI, Anthropic, Ollama, or mock mode
  • 📊 Production metrics - Token usage, costs, latency tracking
  • 🎯 Built-in evaluators - Accuracy, performance, reliability, safety
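
The production metrics above (token usage, cost, latency) can be illustrated with a minimal, standalone tracker. This is a hedged sketch only: the `UsageRecord` class, `total_cost` helper, and per-1K-token rates are illustrative assumptions, not the acp-evals API.

```python
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """Hypothetical per-call usage record: tokens in/out and latency."""
    input_tokens: int
    output_tokens: int
    latency_ms: float

def total_cost(records, in_rate=0.003, out_rate=0.006):
    """Estimate cost in USD given illustrative per-1K-token rates."""
    return sum(
        r.input_tokens / 1000 * in_rate + r.output_tokens / 1000 * out_rate
        for r in records
    )

records = [UsageRecord(1200, 300, 850.0), UsageRecord(800, 150, 620.0)]
cost = round(total_cost(records), 4)            # 0.0087 with the rates above
avg_latency = sum(r.latency_ms for r in records) / len(records)  # 735.0 ms
```

A real framework would collect these records automatically per agent call; the point here is only what "production metrics" means in practice.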

Quick Start

from acp_evals import evaluate, AccuracyEval

# Evaluate any agent in just a few lines
result = evaluate(
    AccuracyEval(agent="http://localhost:8000/agents/my-agent"),
    input="What is the capital of France?",
    expected="Paris"
)
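
Conceptually, an accuracy evaluation like the one above compares the agent's output against an expected answer. The real `AccuracyEval` presumably uses richer scoring than string matching; the standalone function below is only a sketch of the simplest case, and its name is hypothetical.

```python
def exact_match_score(output: str, expected: str) -> float:
    """Return 1.0 if the expected answer appears in the output (case-insensitive)."""
    return 1.0 if expected.strip().lower() in output.strip().lower() else 0.0

exact_match_score("The capital of France is Paris.", "Paris")  # 1.0
exact_match_score("I don't know.", "Paris")                    # 0.0
```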

Documentation

Installation

pip install acp-evals

# Or with specific provider support
pip install "acp-evals[openai]"
pip install "acp-evals[anthropic]"
pip install "acp-evals[all-providers]"

Project Structure

acp-evals/
├── python/                 # Python implementation
│   ├── src/acp_evals/     # Core package
│   ├── tests/             # Test suite
│   ├── examples/          # Example scripts
│   └── docs/              # Documentation
└── internal-docs/         # Internal planning documents

Contributing

We welcome contributions! Please see the Python contributing guide for details.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Community


Part of the BeeAI project, an initiative of the Linux Foundation AI & Data

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

acp_evals-0.1.0.tar.gz (189.7 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

acp_evals-0.1.0-py3-none-any.whl (149.0 kB)


File details

Details for the file acp_evals-0.1.0.tar.gz.

File metadata

  • Download URL: acp_evals-0.1.0.tar.gz
  • Upload date:
  • Size: 189.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for acp_evals-0.1.0.tar.gz

  • SHA256: c6fa2cba7b5e2e8a8b495dbf928bf347322076d663f403909b9edb4cae5e7e4b
  • MD5: 06b37f2f3af606e55b96e898fc3ed6df
  • BLAKE2b-256: 89f05854af846735814236591147fd488ddd5678ac7faa5d20c5327fc4634be2

See the PyPI documentation for more details on using hashes.

File details

Details for the file acp_evals-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: acp_evals-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 149.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for acp_evals-0.1.0-py3-none-any.whl

  • SHA256: 9a64da15ecea203eb3d20f38a6cba44a7dd41d0ecb2ff9a571a1eeb347542781
  • MD5: 3b008200b41066a64e3b95e01294fc7f
  • BLAKE2b-256: 5e3e4a2a824fdb6970367ca1a3725ab44818487e04083d9a1c36c5d0a622c145

See the PyPI documentation for more details on using hashes.
