# ACP Evals

Production-ready evaluation framework for agents in the ACP/BeeAI ecosystem.
## Overview
ACP Evals provides comprehensive evaluation tools for agents built with the Agent Communication Protocol (ACP). It enables developers to measure, benchmark, and improve agent performance with a focus on multi-agent systems, production metrics, and developer experience.
## Key Features
- 🚀 Zero to evaluation in < 5 lines of code
- 🤖 Multi-agent focused - Specialized metrics for agent coordination
- 🔌 Multiple LLM providers - OpenAI, Anthropic, Ollama, or mock mode
- 📊 Production metrics - Token usage, costs, latency tracking
- 🎯 Built-in evaluators - Accuracy, performance, reliability, safety
## Quick Start
```python
from acp_evals import evaluate, AccuracyEval

# Evaluate any agent in a few lines of code
result = evaluate(
    AccuracyEval(agent="http://localhost:8000/agents/my-agent"),
    input="What is the capital of France?",
    expected="Paris",
)
```
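Conceptually, an accuracy evaluator scores the agent's answer against an expected reference and reports pass/fail against a threshold. The sketch below is a simplified, self-contained stand-in for that pattern using exact matching; the `EvalResult` fields and `exact_match_eval` helper are hypothetical and not part of the acp-evals API:

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float   # 0.0-1.0 similarity score
    passed: bool   # True when score meets the threshold


def exact_match_eval(answer: str, expected: str, threshold: float = 1.0) -> EvalResult:
    """Score an answer by normalized exact match (illustrative only)."""
    # Normalize whitespace and case before comparing
    score = 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0
    return EvalResult(score=score, passed=score >= threshold)


result = exact_match_eval("Paris", "Paris")
print(result.passed)  # True
```

The real `AccuracyEval` targets a running agent over HTTP and can use richer scoring than exact match; this sketch only illustrates the score-and-threshold shape of the result.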
## Installation
```shell
pip install acp-evals

# Or with specific provider support
pip install "acp-evals[openai]"
pip install "acp-evals[anthropic]"
pip install "acp-evals[all-providers]"
```
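Hosted providers generally need an API key. The standard OpenAI and Anthropic SDKs read the environment variables below; whether acp-evals picks up these exact names is an assumption, so check the package's configuration docs:

```shell
# Assumed provider credentials (standard SDK environment variables)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="..."
```

Ollama runs locally and mock mode needs no credentials, so both work without any of these variables set.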
## Project Structure
```
acp-evals/
├── python/                # Python implementation
│   ├── src/acp_evals/     # Core package
│   ├── tests/             # Test suite
│   ├── examples/          # Example scripts
│   └── docs/              # Documentation
└── internal-docs/         # Internal planning documents
```
## Contributing
We welcome contributions! Please see the Python contributing guide for details.
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Community
Part of the BeeAI project, an initiative of the Linux Foundation AI & Data.
## Files

### acp_evals-0.1.0.tar.gz

- Download URL: acp_evals-0.1.0.tar.gz
- Upload date:
- Size: 189.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `c6fa2cba7b5e2e8a8b495dbf928bf347322076d663f403909b9edb4cae5e7e4b` |
| MD5 | `06b37f2f3af606e55b96e898fc3ed6df` |
| BLAKE2b-256 | `89f05854af846735814236591147fd488ddd5678ac7faa5d20c5327fc4634be2` |
### acp_evals-0.1.0-py3-none-any.whl

- Download URL: acp_evals-0.1.0-py3-none-any.whl
- Upload date:
- Size: 149.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9a64da15ecea203eb3d20f38a6cba44a7dd41d0ecb2ff9a571a1eeb347542781` |
| MD5 | `3b008200b41066a64e3b95e01294fc7f` |
| BLAKE2b-256 | `5e3e4a2a824fdb6970367ca1a3725ab44818487e04083d9a1c36c5d0a622c145` |