Agent Reliability Observatory
A behavioral taxonomy and annotation framework for analyzing why coding agents succeed or fail on benchmark tasks.
Install
pip install agent-diagnostics
Quick Start
from agent_diagnostics import load_taxonomy, valid_category_names
# Load the 23-category behavioral taxonomy
taxonomy = load_taxonomy()
print(f"{len(taxonomy['categories'])} categories")
# Get valid category names
names = valid_category_names()
print(names)
# Validate an annotation
from agent_diagnostics import validate_annotation_categories
annotation = {
    "categories": [
        {"name": "retrieval_failure", "confidence": 0.9},
    ]
}
validate_annotation_categories(annotation)  # raises ValueError if invalid
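To make the failure case concrete, the sketch below mimics the kind of name check that `validate_annotation_categories` is described as performing. It does not import the package: `VALID_NAMES` and `check_category_names` are hypothetical stand-ins, and only `retrieval_failure` is taken from the example above.

```python
# Hypothetical stand-in for the names returned by valid_category_names().
VALID_NAMES = {"retrieval_failure", "example_other_category"}


def check_category_names(annotation: dict, valid_names: set) -> None:
    """Raise ValueError if any category name is not in valid_names."""
    unknown = [
        entry["name"]
        for entry in annotation.get("categories", [])
        if entry["name"] not in valid_names
    ]
    if unknown:
        raise ValueError(f"unknown category names: {sorted(unknown)}")


good = {"categories": [{"name": "retrieval_failure", "confidence": 0.9}]}
bad = {"categories": [{"name": "not_a_category", "confidence": 0.5}]}

check_category_names(good, VALID_NAMES)  # passes silently

try:
    check_category_names(bad, VALID_NAMES)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Catching `ValueError` at the call site lets a batch annotation job record the bad entry and continue instead of aborting.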
Taxonomy
The taxonomy organizes agent behaviors into three polarities:
| Polarity | Count | Purpose |
|---|---|---|
| failure | 16 | Explains why the agent failed or underperformed |
| success | 5 | Explains which strategy led to success |
| neutral | 2-3 | Contextual factors that affect interpretation |
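The per-polarity counts in the table can be recomputed from a loaded taxonomy. The sketch below uses a small hand-written dictionary rather than the real data, and assumes `taxonomy["categories"]` is a list of dictionaries that each carry a `polarity` key; every category name other than `retrieval_failure` is made up for illustration.

```python
from collections import Counter

# Hypothetical miniature taxonomy; real entries would come from load_taxonomy().
taxonomy = {
    "categories": [
        {"name": "retrieval_failure", "polarity": "failure"},
        {"name": "example_success_strategy", "polarity": "success"},
        {"name": "example_context_factor", "polarity": "neutral"},
        {"name": "example_other_failure", "polarity": "failure"},
    ]
}

# Tally how many categories fall under each polarity.
polarity_counts = Counter(c["polarity"] for c in taxonomy["categories"])

for polarity in ("failure", "success", "neutral"):
    print(f"{polarity}: {polarity_counts[polarity]}")
```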
Taxonomy Versions
- v1 (flat): Categories in a flat list with name, description, polarity, detection_hints, examples
- v2 (hierarchical): Categories organized by dimension (Retrieval, Execution, etc.)
from agent_diagnostics.taxonomy import load_taxonomy, _package_data_path
# Load v2 (hierarchical dimensions)
v2 = load_taxonomy(_package_data_path("taxonomy_v2.yaml"))
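The relationship between the two layouts can be illustrated by flattening a hierarchical structure into a flat list. The nesting used here (`dimensions`, each with a `name` and a `categories` list) is an assumption made for the sketch, not the documented layout of `taxonomy_v2.yaml`; `flatten_dimensions` and `example_execution_issue` are likewise hypothetical.

```python
# Assumed v2-like shape: dimensions that each hold a list of categories.
v2_like = {
    "dimensions": [
        {
            "name": "Retrieval",
            "categories": [{"name": "retrieval_failure", "polarity": "failure"}],
        },
        {
            "name": "Execution",
            "categories": [{"name": "example_execution_issue", "polarity": "failure"}],
        },
    ]
}


def flatten_dimensions(hierarchical: dict) -> dict:
    """Return a v1-style flat list, tagging each category with its dimension."""
    flat = []
    for dimension in hierarchical["dimensions"]:
        for category in dimension["categories"]:
            flat.append({**category, "dimension": dimension["name"]})
    return {"categories": flat}


flat_taxonomy = flatten_dimensions(v2_like)
for category in flat_taxonomy["categories"]:
    print(category["dimension"], category["name"])
```

Keeping the dimension as a field on each flattened entry preserves the grouping information for code that only understands the flat form.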
Annotation Schema
The package includes a JSON Schema for machine-readable annotations:
from agent_diagnostics.taxonomy import get_schema_path
schema_path = get_schema_path()
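Once the schema file is located, it can be read like any other JSON document. The sketch below writes a hypothetical schema fragment to a temporary file, since the contents of the bundled schema are not shown here, and performs only a required-key check using the standard library. `missing_required` is an illustrative helper; a full validator such as the third-party `jsonschema` package would check far more.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema fragment; the schema bundled with the package may differ.
schema = {
    "type": "object",
    "required": ["categories"],
    "properties": {"categories": {"type": "array"}},
}

# Stand-in for the file that get_schema_path() would point to.
with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "annotation_schema.json"
    schema_path.write_text(json.dumps(schema), encoding="utf-8")
    loaded = json.loads(schema_path.read_text(encoding="utf-8"))


def missing_required(instance: dict, schema: dict) -> list:
    """List the schema's required top-level keys that the instance lacks."""
    return [key for key in schema.get("required", []) if key not in instance]


annotation = {"categories": [{"name": "retrieval_failure", "confidence": 0.9}]}
print(missing_required(annotation, loaded))
print(missing_required({}, loaded))
```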
Exemplars
25 hand-annotated examples covering all 23 taxonomy categories are bundled with the package under exemplars/.
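A coverage claim like this one can be checked with a set difference. The sketch below assumes each exemplar has the same `categories` list shape as the annotation in the Quick Start; the exemplar records, their `id` field, and all names other than `retrieval_failure` are hypothetical.

```python
# Hypothetical stand-in for the names returned by valid_category_names().
valid_names = {"retrieval_failure", "example_category_a", "example_category_b"}

# Hypothetical exemplar records; the bundled files may be structured differently.
exemplars = [
    {"id": "ex-1", "categories": [{"name": "retrieval_failure"}]},
    {
        "id": "ex-2",
        "categories": [{"name": "example_category_a"}, {"name": "retrieval_failure"}],
    },
]

# Every category name that appears in at least one exemplar.
covered = {c["name"] for exemplar in exemplars for c in exemplar["categories"]}

# Categories with no exemplar yet.
missing = sorted(valid_names - covered)
print(missing)
```

An empty `missing` list would indicate that every category has at least one exemplar.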
License
Apache-2.0
Download files
Details for the file agent_diagnostics-0.5.0.tar.gz.
File metadata
- Download URL: agent_diagnostics-0.5.0.tar.gz
- Upload date:
- Size: 91.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c10c63f38141aa74a11d16b9abc9221fc0e5fb54322780e9cfbe4695c09107a5 |
| MD5 | 1476dfe240986e48d4e86a3542b3184d |
| BLAKE2b-256 | c05b3b10acb847258008ba129fb8879887daec99d0706d960c104daa7bedb519 |
File details
Details for the file agent_diagnostics-0.5.0-py3-none-any.whl.
File metadata
- Download URL: agent_diagnostics-0.5.0-py3-none-any.whl
- Upload date:
- Size: 77.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61c1657cb39578496c822f8ac96653fd271ef8f27d52fa40855b3d44e49f80a1 |
| MD5 | b7a564846a75db5b9781d9f601b3635c |
| BLAKE2b-256 | 4e732c26aeb8b0b7783e6297608afba60cd50981fb0b060510b35bd09b07fa05 |