🎯 AgentCoach

Agent quality analysis and repair SDK for OpenTelemetry traces

AgentCoach analyzes agent execution traces to detect quality issues, identify root causes, and provide actionable recommendations for improvement. It works with OpenTelemetry/OpenInference-style traces and supports runtime repair loops.

Python 3.11+ License: MIT

✨ Features

  • 🔍 Trace Analysis: Ingest and analyze OpenTelemetry/OpenInference traces
  • 🎯 7 Quality Detectors:
    • Output contract/schema validation
    • Evidence grounding verification
    • Tool-use failure detection
    • Loop/planning failure detection
    • State/constraint loss detection
    • Policy/tone compliance
    • Consistency detection (stub)
  • 📊 Rich Reporting: JSON and HTML reports with quality scores
  • 🔧 Runtime Repair: Automatic output repair with evidence grounding
  • 💡 Engineering Coach: Actionable recommendations (prompt diffs, retrieval settings, etc.)
  • 🧪 Canary Tests: Auto-generate regression test suites from failures
  • 🔗 LangGraph Integration: Drop-in quality guard node
  • 🤖 Optional LLM Judge: OpenAI, Anthropic, or SAP BTP AI Core

🚀 Quick Start

Installation

# Clone the repository
git clone <repo-url>
cd agentcoach

# Install in development mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"

Initialize Configuration

agentcoach init

This creates:

  • agentcoach.yaml - Configuration file
  • .env.example - Environment variables template

Analyze a Trace

agentcoach analyze --trace examples/sample_trace.json --out results/

This generates:

  • results/report.json - Structured findings
  • results/report.html - Interactive HTML report

View Results

Open results/report.html in your browser to see:

  • Quality score
  • Findings by severity and category
  • Engineering recommendations
  • Suggested fixes

📖 Usage

CLI Commands

1. Initialize Project

agentcoach init

2. Analyze Traces

# Basic analysis
agentcoach analyze --trace path/to/trace.json --out output_dir/

# With custom config
agentcoach analyze --trace trace.json --out results/ --config agentcoach.yaml

# With LLM judge (requires API keys in .env)
agentcoach analyze --trace trace.json --out results/ --llm-judge

3. Repair Output

# Repair with heuristics only
agentcoach repair --trace trace.json --out repaired/

# Repair with LLM provider
agentcoach repair --trace trace.json --out repaired/ --llm-provider openai

4. Generate Canary Tests

agentcoach canary --report results/report.json --suite canary_tests/

Python SDK

from agentcoach import load_trace, analyze_trace
from agentcoach.report import generate_report

# Load and analyze trace
trace = load_trace("path/to/trace.json")
findings = analyze_trace(trace)

# Generate reports
generate_report(trace, findings, "output_dir/")

LangGraph Integration

from agentcoach.langgraph import QualityGuardNode

# Create quality guard node
quality_guard = QualityGuardNode(
    contract_schema="schemas/default_contract.json",
    policy_pack="schemas/default_policy.json",
    auto_repair=True,
)

# Add to your LangGraph
from langgraph.graph import StateGraph, END

# AgentState is your application's state schema, and "draft_answer" is a
# node you have already added to the graph.
graph = StateGraph(AgentState)
graph.add_node("quality_guard", quality_guard)
graph.add_edge("draft_answer", "quality_guard")
graph.add_edge("quality_guard", END)

app = graph.compile()

See examples/langgraph_demo.py for a complete example.

🔧 Configuration

agentcoach.yaml

# Output contract schema
contract_schema: schemas/default_contract.json

# Policy pack
policy: schemas/default_policy.json

# LLM Judge
llm_judge:
  enabled: false
  provider: openai  # openai, anthropic, or sap

# Detector configuration
detectors:
  schema:
    enabled: true
  grounding:
    enabled: true
    require_citations: true
  tool_use:
    enabled: true
  loops:
    enabled: true
    max_repeats: 3
  state:
    enabled: true
  policy_tone:
    enabled: true
  consistency:
    enabled: false

Environment Variables

Create a .env file (see .env.example):

# OpenAI
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini

# Anthropic
ANTHROPIC_API_KEY=your_key_here
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

# SAP BTP AI Core
AICORE_BASE_URL=https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com
AICORE_CLIENT_ID=your_client_id
AICORE_CLIENT_SECRET=your_client_secret
AICORE_RESOURCE_GROUP=default
AICORE_MODEL=gpt-4
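
Before enabling the LLM judge, it can help to check that the variables for the chosen provider are actually set. The sketch below does this with the standard library only. Which variables are strictly required per provider is an assumption made for illustration, and `missing_env_vars` is a hypothetical helper rather than an AgentCoach function.

```python
import os

# Assumption: these are the variables each provider cannot work without.
PROVIDER_ENV_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "sap": ["AICORE_BASE_URL", "AICORE_CLIENT_ID", "AICORE_CLIENT_SECRET"],
}


def missing_env_vars(provider, environ=None):
    """Return the variable names for `provider` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in PROVIDER_ENV_VARS[provider] if not environ.get(name)]


# Passing a dict instead of os.environ makes the check easy to try out.
print(missing_env_vars("sap", {"AICORE_BASE_URL": "https://example.invalid"}))
```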

📊 Trace Format

AgentCoach supports OpenTelemetry and simplified trace formats:

Simplified Format

{
  "trace_id": "trace-001",
  "spans": [
    {
      "span_id": "span-1",
      "name": "agent_run",
      "kind": "agent",
      "attributes": {
        "input.value": "User query",
        "output.value": "Agent response"
      }
    },
    {
      "span_id": "span-2",
      "parent_span_id": "span-1",
      "name": "retrieval",
      "kind": "retrieval",
      "attributes": {
        "retrieval.query": "search query",
        "documents": [
          {"content": "Retrieved document text"}
        ]
      }
    }
  ]
}
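
As a rough illustration of how the simplified format hangs together, the sketch below groups spans by `parent_span_id` and renders the resulting tree. It relies only on the fields shown above; `build_span_tree` and `format_tree` are hypothetical helpers, not part of the AgentCoach API.

```python
from collections import defaultdict


def build_span_tree(trace):
    """Map each parent span id (None for root spans) to its child spans."""
    children = defaultdict(list)
    for span in trace.get("spans", []):
        children[span.get("parent_span_id")].append(span)
    return children


def format_tree(children, parent_id=None, depth=0):
    """Return indented 'name [kind]' lines, depth-first from the roots."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f"{span['name']} [{span.get('kind', '?')}]")
        lines.extend(format_tree(children, span["span_id"], depth + 1))
    return lines


trace = {
    "trace_id": "trace-001",
    "spans": [
        {"span_id": "span-1", "name": "agent_run", "kind": "agent"},
        {"span_id": "span-2", "parent_span_id": "span-1",
         "name": "retrieval", "kind": "retrieval"},
    ],
}
print("\n".join(format_tree(build_span_tree(trace))))
```

For the two-span trace above, the retrieval span is printed indented under `agent_run`.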

Exporting from LangGraph

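import json

from langchain_core.tracers.run_collector import RunCollectorCallbackHandler

# Collect the run tree in memory while the graph executes
collector = RunCollectorCallbackHandler()
result = graph.invoke(inputs, config={"callbacks": [collector]})

# Export the root run; default=str serializes UUIDs and datetimes
with open("trace.json", "w") as f:
    json.dump(collector.traced_runs[0].dict(), f, default=str)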

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=agentcoach --cov-report=html

# Run specific test
pytest tests/test_schema_detector.py -v

🎯 Quality Detectors

1. Schema Detector

Validates output against JSON schema contracts.

Checks:

  • Required fields present
  • Correct data types
  • Valid JSON format
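
The three checks can be sketched with the standard library alone. This is an illustration of the idea under simplifying assumptions, not AgentCoach's implementation: it handles only a flat schema with `required` and `properties`, and `check_output` and `SCHEMA` are hypothetical names.

```python
import json

# JSON Schema type names mapped to Python types (flat schemas only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_output(raw_output, schema):
    """Return a list of problems; an empty list means every check passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        expected = TYPE_MAP.get(spec.get("type"))
        if field in data and expected and not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}: expected {spec['type']}")
    return problems


SCHEMA = {
    "required": ["answer", "confidence", "citations"],
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
        "citations": {"type": "array"},
    },
}
print(check_output('{"answer": "hi", "confidence": "high"}', SCHEMA))
```

A full validator would also cover nested objects, enums, and formats; a dedicated JSON Schema library is the better choice for that.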

2. Grounding Detector

Verifies answers are grounded in evidence.

Checks:

  • Citations present
  • Evidence referenced in answer
  • Tool outputs used
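
One simple way to approximate these checks is word overlap between the answer and the retrieved documents. The sketch below is a deliberately naive heuristic for illustration, not how AgentCoach scores grounding; `grounding_report` and the `min_overlap` threshold are assumptions.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric word set for a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_report(answer, citations, documents, min_overlap=0.5):
    """Check for citations and for word overlap between answer and evidence."""
    answer_tokens = _tokens(answer)
    evidence_tokens = set()
    for doc in documents:
        evidence_tokens |= _tokens(doc.get("content", ""))
    # Fraction of the answer's words that also appear in the evidence.
    overlap = (
        len(answer_tokens & evidence_tokens) / len(answer_tokens)
        if answer_tokens else 0.0
    )
    return {
        "has_citations": bool(citations),
        "overlap": round(overlap, 2),
        "grounded": bool(citations) and overlap >= min_overlap,
    }


docs = [{"content": "Paris is the capital and largest city of France."}]
print(grounding_report("Paris is the capital of France", ["doc-1"], docs))
```

Word overlap cannot tell a faithful paraphrase from a contradiction that reuses the same words, which is one reason an LLM judge can be a useful complement.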

3. Tool-Use Detector

Detects tool execution failures.

Checks:

  • Tool errors
  • Ignored tool outputs
  • Premature final answers

4. Loop Detector

Identifies infinite loops and planning failures.

Checks:

  • Repeated tool calls
  • Repeated LLM prompts
  • Excessive iterations
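
Repeated tool calls can be found by counting identical (tool, input) pairs. The sketch below is a minimal illustration under loud assumptions: tool spans are marked with `kind == "tool"` and carry their input in `input.value`, and `max_repeats` means "flag anything seen more often than this". None of these are confirmed by AgentCoach, and `repeated_tool_calls` is a hypothetical helper.

```python
from collections import Counter


def repeated_tool_calls(spans, max_repeats=3):
    """Return {(tool name, input): count} for calls seen more than max_repeats times."""
    counts = Counter(
        (span["name"], span.get("attributes", {}).get("input.value"))
        for span in spans
        if span.get("kind") == "tool"  # assumed span kind for tool calls
    )
    return {call: n for call, n in counts.items() if n > max_repeats}


spans = [
    {"name": "search", "kind": "tool",
     "attributes": {"input.value": "weather in Paris"}}
    for _ in range(4)
]
spans.append({"name": "calculator", "kind": "tool",
              "attributes": {"input.value": "2+2"}})
print(repeated_tool_calls(spans, max_repeats=3))
```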

5. State Detector

Tracks constraint loss.

Checks:

  • User constraints maintained
  • Requirements addressed

6. Policy/Tone Detector

Enforces policy compliance.

Checks:

  • Banned phrases
  • Answer length limits
  • Tone requirements
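
The first two checks are straightforward to sketch. The function below is an illustration only: `policy_violations` and its parameters are hypothetical and do not reflect the layout of the policy pack file. Tone requirements are left out because they usually need a classifier or an LLM judge.

```python
def policy_violations(answer, banned_phrases, max_chars):
    """Return a list of violations for banned phrases and answer length."""
    violations = []
    lowered = answer.lower()
    for phrase in banned_phrases:
        # Case-insensitive substring match.
        if phrase.lower() in lowered:
            violations.append(f"banned phrase: {phrase}")
    if len(answer) > max_chars:
        violations.append(f"answer too long: {len(answer)} > {max_chars} characters")
    return violations


print(policy_violations("As an AI, I guarantee this works.", ["guarantee"], 20))
```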

7. Consistency Detector

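Intended to analyze variance across multiple runs of the same input. In this MVP it is a stub, and it is disabled in the example configuration above.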

💡 Engineering Recommendations

AgentCoach provides actionable recommendations:

Prompt Engineering

--- system_prompt
+++ system_prompt
 You are a helpful assistant.
+
+Always format your response as JSON with:
+{"answer": "...", "confidence": 0.0-1.0, "citations": [...]}
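
A diff in this style can be produced with Python's standard `difflib` module. The sketch below shows one way to do it; `prompt_diff` is a hypothetical helper and is not necessarily how AgentCoach builds its recommendations.

```python
import difflib


def prompt_diff(old_prompt, new_prompt, name="system_prompt"):
    """Return a unified diff between two prompt versions as a single string."""
    return "\n".join(
        difflib.unified_diff(
            old_prompt.splitlines(),
            new_prompt.splitlines(),
            fromfile=name,
            tofile=name,
            lineterm="",  # lines are already split, so add no extra newlines
        )
    )


old = "You are a helpful assistant."
new = (
    old
    + "\n\nAlways format your response as JSON with:\n"
    + '{"answer": "...", "confidence": 0.0-1.0, "citations": [...]}'
)
print(prompt_diff(old, new))
```

Besides the lines shown in the example above, `difflib` also emits an `@@` hunk header with line ranges.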

Retrieval Settings

  • Increase top_k from 3 to 5-10
  • Add re-ranking step
  • Implement query rewriting

Error Handling

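def call_tool_with_retry(tool_name, args, max_retries=2):
    """Call a tool, repairing its arguments between failed attempts."""
    # execute_tool and fix_tool_args stand in for your own tool runner
    # and argument-repair logic.
    for attempt in range(max_retries + 1):
        try:
            return execute_tool(tool_name, args)
        except Exception as e:
            if attempt < max_retries:
                # Use the error message to correct the arguments, then retry
                args = fix_tool_args(tool_name, args, error=str(e))
            else:
                # Out of retries: surface the error to the caller
                return {"error": str(e)}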

Architecture

  • Add loop detection
  • Implement memory trimming
  • Add policy validation node

🧪 Canary Tests

Generate regression tests from failures:

agentcoach canary --report results/report.json --suite canary_tests/

This creates:

  • canary_tests/cases.jsonl - Test cases
  • canary_tests/test_canary.py - Pytest file

Implement the run_agent() function and run:

pytest canary_tests/test_canary.py -v

📁 Project Structure

agentcoach/
├── agentcoach/
│   ├── __init__.py
│   ├── cli.py              # CLI commands
│   ├── models.py           # Data models
│   ├── trace_ingest.py     # Trace parsing
│   ├── config.py           # Configuration
│   ├── contracts.py        # Schema validation
│   ├── report.py           # Report generation
│   ├── repair.py           # Runtime repair
│   ├── judge.py            # LLM judge adapters
│   ├── canary.py           # Test generation
│   ├── langgraph.py        # LangGraph integration
│   └── detectors/          # Quality detectors
├── schemas/                # Default schemas
├── examples/               # Example code
├── tests/                  # Test suite
└── README.md

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Run pytest and ruff check
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built for analyzing agent quality with OpenTelemetry/OpenInference traces.

📞 Support

  • Issues: GitHub Issues
  • Documentation: This README
  • Examples: See examples/ directory

Made with ❤️ for better agent quality
