Agent Reliability Observatory

A behavioral taxonomy and annotation framework for analyzing why coding agents succeed or fail on benchmark tasks.

Install

pip install agent-diagnostics

Quick Start

from agent_diagnostics import load_taxonomy, valid_category_names

# Load the 23-category behavioral taxonomy
taxonomy = load_taxonomy()
print(f"{len(taxonomy['categories'])} categories")

# Get valid category names
names = valid_category_names()
print(names)

# Validate an annotation
from agent_diagnostics import validate_annotation_categories

annotation = {
    "categories": [
        {"name": "retrieval_failure", "confidence": 0.9},
    ]
}
validate_annotation_categories(annotation)  # raises ValueError if invalid
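
When many annotations are processed in one pass, it can be useful to collect validation errors rather than stop at the first one. The sketch below is a standalone stand-in and not the package's implementation: `check_annotation`, `collect_errors`, and the one-element `VALID_NAMES` set are hypothetical, and the rule that confidence must be a number in [0, 1] is an assumption (the example above only shows a value of 0.9).

```python
# Hypothetical stand-in for a category check; not the package's own code.
# Assumption: in real use the valid names would come from valid_category_names().
VALID_NAMES = {"retrieval_failure"}


def check_annotation(annotation, valid_names):
    """Raise ValueError if a category entry is unknown or has a bad confidence."""
    for entry in annotation.get("categories", []):
        name = entry.get("name")
        if name not in valid_names:
            raise ValueError(f"unknown category: {name!r}")
        confidence = entry.get("confidence")
        # Assumption: confidence is a number between 0 and 1 inclusive.
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            raise ValueError(f"invalid confidence for {name!r}: {confidence!r}")


def collect_errors(annotations, valid_names):
    """Check a batch and return (index, message) pairs instead of stopping early."""
    errors = []
    for index, annotation in enumerate(annotations):
        try:
            check_annotation(annotation, valid_names)
        except ValueError as exc:
            errors.append((index, str(exc)))
    return errors
```

The same try/except pattern should carry over to `validate_annotation_categories`, since the Quick Start states that it raises `ValueError` on invalid input.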

Taxonomy

The taxonomy organizes agent behaviors into three polarities:

Polarity  Count  Purpose
failure   16     Explains why the agent failed or underperformed
success   5      Explains which strategy led to success
neutral   2-3    Contextual factors that affect interpretation
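
The per-polarity counts can be reproduced from a loaded taxonomy. The sketch below assumes each entry in `taxonomy['categories']` is a dict with `name` and `polarity` keys. It runs on a hypothetical three-entry sample rather than the bundled data: only `retrieval_failure` appears in this README, its `failure` polarity is assumed, and the other two names are invented for illustration.

```python
from collections import Counter

# Hypothetical miniature taxonomy; the real one comes from load_taxonomy().
# Only "retrieval_failure" is taken from the README; the other names are invented.
sample_taxonomy = {
    "categories": [
        {"name": "retrieval_failure", "polarity": "failure"},
        {"name": "example_success_strategy", "polarity": "success"},
        {"name": "example_context_factor", "polarity": "neutral"},
    ]
}


def count_by_polarity(taxonomy):
    """Return a Counter mapping each polarity to its number of categories."""
    return Counter(category["polarity"] for category in taxonomy["categories"])


def names_with_polarity(taxonomy, polarity):
    """Return the names of all categories that have the given polarity."""
    return [c["name"] for c in taxonomy["categories"] if c["polarity"] == polarity]


counts = count_by_polarity(sample_taxonomy)
failures = names_with_polarity(sample_taxonomy, "failure")
```

If the assumed layout holds, passing the result of `load_taxonomy()` to `count_by_polarity` should yield the counts in the table above.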

Taxonomy Versions

  • v1 (flat): Categories in a flat list with name, description, polarity, detection_hints, examples
  • v2 (hierarchical): Categories organized by dimension (Retrieval, Execution, etc.)

from agent_diagnostics.taxonomy import load_taxonomy, _package_data_path

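# Load v2 (hierarchical dimensions).
# Note: the leading underscore marks _package_data_path as a private helper,
# so it may change between releases.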
v2 = load_taxonomy(_package_data_path("taxonomy_v2.yaml"))
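
Code written against the flat v1 list may need a flattened view of a hierarchical taxonomy. The sketch below shows one way to do that under a loudly stated assumption: the `dimensions` key, its nesting, and the `example_execution_issue` category are hypothetical and are not taken from `taxonomy_v2.yaml`. Inspect the loaded `v2` object before relying on this layout.

```python
# Hypothetical v2-style layout: the "dimensions" key and its nesting are
# assumptions, not the documented structure of taxonomy_v2.yaml.
sample_v2 = {
    "dimensions": [
        {
            "name": "Retrieval",
            "categories": [{"name": "retrieval_failure", "polarity": "failure"}],
        },
        {
            "name": "Execution",
            "categories": [{"name": "example_execution_issue", "polarity": "failure"}],
        },
    ]
}


def flatten_dimensions(taxonomy):
    """Return a flat category list, tagging each category with its dimension name."""
    flat = []
    for dimension in taxonomy["dimensions"]:
        for category in dimension["categories"]:
            # Copy the category so the input structure is left unmodified.
            flat.append({**category, "dimension": dimension["name"]})
    return flat


flat_categories = flatten_dimensions(sample_v2)
```

Tagging each category with its dimension keeps the grouping information available after flattening, so nothing is lost by working with the flat list.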

Annotation Schema

The package includes a JSON Schema for machine-readable annotations:

from agent_diagnostics.taxonomy import get_schema_path

schema_path = get_schema_path()

Exemplars

25 hand-annotated examples covering all 23 taxonomy categories are bundled with the package under exemplars/.

License

Apache-2.0
