# AgentCoach

Agent quality analysis and repair SDK for OpenTelemetry traces.
AgentCoach analyzes agent execution traces to detect quality issues, identify root causes, and provide actionable recommendations for improvement. It works with OpenTelemetry/OpenInference-style traces and supports runtime repair loops.
## Features

- **Trace Analysis**: Ingest and analyze OpenTelemetry/OpenInference traces
- **7 Quality Detectors**:
  - Output contract/schema validation
  - Evidence grounding verification
  - Tool-use failure detection
  - Loop/planning failure detection
  - State/constraint loss detection
  - Policy/tone compliance
  - Consistency detection (stub)
- **Rich Reporting**: JSON and HTML reports with quality scores
- **Runtime Repair**: Automatic output repair with evidence grounding
- **Engineering Coach**: Actionable recommendations (prompt diffs, retrieval settings, etc.)
- **Canary Tests**: Auto-generate regression test suites from failures
- **LangGraph Integration**: Drop-in quality guard node
- **Optional LLM Judge**: OpenAI, Anthropic, or SAP BTP AI Core
## Quick Start

### Installation

```bash
# Clone the repository
git clone <repo-url>
cd agentcoach

# Install in development mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```
### Initialize Configuration

```bash
agentcoach init
```

This creates:

- `agentcoach.yaml` - Configuration file
- `.env.example` - Environment variables template
### Analyze a Trace

```bash
agentcoach analyze --trace examples/sample_trace.json --out results/
```

This generates:

- `results/report.json` - Structured findings
- `results/report.html` - Interactive HTML report
### View Results

Open `results/report.html` in your browser to see:
- Quality score
- Findings by severity and category
- Engineering recommendations
- Suggested fixes
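If you prefer to post-process the JSON report directly, a quick sketch (the `findings` and `severity` field names are assumptions for illustration; verify them against your actual `report.json`):

```python
from collections import Counter

def severity_counts(report: dict) -> Counter:
    # Tally findings by severity. The "findings" and "severity" keys
    # are assumed field names for illustration -- check them against
    # your generated report.json.
    return Counter(f.get("severity", "unknown") for f in report.get("findings", []))

# Illustrative report snippet (not the real schema):
sample = {"findings": [{"severity": "high"}, {"severity": "low"}, {"severity": "high"}]}
print(severity_counts(sample))
```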
## Usage

### CLI Commands

#### 1. Initialize Project

```bash
agentcoach init
```
#### 2. Analyze Traces

```bash
# Basic analysis
agentcoach analyze --trace path/to/trace.json --out output_dir/

# With custom config
agentcoach analyze --trace trace.json --out results/ --config agentcoach.yaml

# With LLM judge (requires API keys in .env)
agentcoach analyze --trace trace.json --out results/ --llm-judge
```
#### 3. Repair Output

```bash
# Repair with heuristics only
agentcoach repair --trace trace.json --out repaired/

# Repair with LLM provider
agentcoach repair --trace trace.json --out repaired/ --llm-provider openai
```
#### 4. Generate Canary Tests

```bash
agentcoach canary --report results/report.json --suite canary_tests/
```
### Python SDK

```python
from agentcoach import load_trace, analyze_trace
from agentcoach.report import generate_report

# Load and analyze trace
trace = load_trace("path/to/trace.json")
findings = analyze_trace(trace)

# Generate reports
generate_report(trace, findings, "output_dir/")
```
### LangGraph Integration

```python
from agentcoach.langgraph import QualityGuardNode

# Create quality guard node
quality_guard = QualityGuardNode(
    contract_schema="schemas/default_contract.json",
    policy_pack="schemas/default_policy.json",
    auto_repair=True,
)

# Add to your LangGraph
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)
graph.add_node("quality_guard", quality_guard)
graph.add_edge("draft_answer", "quality_guard")
graph.add_edge("quality_guard", END)
app = graph.compile()
```

See `examples/langgraph_demo.py` for a complete example.
## Configuration

### agentcoach.yaml

```yaml
# Output contract schema
contract_schema: schemas/default_contract.json

# Policy pack
policy: schemas/default_policy.json

# LLM Judge
llm_judge:
  enabled: false
  provider: openai  # openai, anthropic, or sap

# Detector configuration
detectors:
  schema:
    enabled: true
  grounding:
    enabled: true
    require_citations: true
  tool_use:
    enabled: true
  loops:
    enabled: true
    max_repeats: 3
  state:
    enabled: true
  policy_tone:
    enabled: true
  consistency:
    enabled: false
```
### Environment Variables

Create a `.env` file (see `.env.example`):

```bash
# OpenAI
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini

# Anthropic
ANTHROPIC_API_KEY=your_key_here
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

# SAP BTP AI Core
AICORE_BASE_URL=https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com
AICORE_CLIENT_ID=your_client_id
AICORE_CLIENT_SECRET=your_client_secret
AICORE_RESOURCE_GROUP=default
AICORE_MODEL=gpt-4
```
## Trace Format

AgentCoach supports OpenTelemetry and simplified trace formats.

### Simplified Format

```json
{
  "trace_id": "trace-001",
  "spans": [
    {
      "span_id": "span-1",
      "name": "agent_run",
      "kind": "agent",
      "attributes": {
        "input.value": "User query",
        "output.value": "Agent response"
      }
    },
    {
      "span_id": "span-2",
      "parent_span_id": "span-1",
      "name": "retrieval",
      "kind": "retrieval",
      "attributes": {
        "retrieval.query": "search query",
        "documents": [
          {"content": "Retrieved document text"}
        ]
      }
    }
  ]
}
```
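The same trace can be assembled programmatically and written to disk for the `analyze` command (a minimal sketch; the output path is arbitrary):

```python
import json

# Assemble a trace in the simplified format above and save it so it
# can be fed to `agentcoach analyze --trace my_trace.json`.
trace = {
    "trace_id": "trace-001",
    "spans": [
        {
            "span_id": "span-1",
            "name": "agent_run",
            "kind": "agent",
            "attributes": {
                "input.value": "User query",
                "output.value": "Agent response",
            },
        },
        {
            "span_id": "span-2",
            "parent_span_id": "span-1",
            "name": "retrieval",
            "kind": "retrieval",
            "attributes": {
                "retrieval.query": "search query",
                "documents": [{"content": "Retrieved document text"}],
            },
        },
    ],
}

with open("my_trace.json", "w") as f:
    json.dump(trace, f, indent=2)
```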
### Exporting from LangGraph

```python
import json

from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer()
result = graph.invoke(input, config={"callbacks": [tracer]})

# Export trace
with open("trace.json", "w") as f:
    json.dump(tracer.runs[0].dict(), f)
```
## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=agentcoach --cov-report=html

# Run specific test
pytest tests/test_schema_detector.py -v
```
## Quality Detectors

### 1. Schema Detector
Validates output against JSON schema contracts.
Checks:
- Required fields present
- Correct data types
- Valid JSON format
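In spirit, the check looks like this (an illustrative sketch in plain Python, not AgentCoach's implementation, which works from the JSON schema contract file):

```python
def check_contract(output: dict, required: dict) -> list:
    # Verify required fields exist and have the expected types.
    # `required` maps field name -> expected Python type.
    problems = []
    for field, expected_type in required.items():
        if field not in output:
            problems.append(f"missing required field: {field}")
        elif not isinstance(output[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(output[field]).__name__}")
    return problems

# Hypothetical contract for the examples in this README:
contract = {"answer": str, "confidence": float, "citations": list}
print(check_contract({"answer": "42", "confidence": "high"}, contract))
```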
### 2. Grounding Detector
Verifies answers are grounded in evidence.
Checks:
- Citations present
- Evidence referenced in answer
- Tool outputs used
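A toy version of the citation and evidence checks (illustrative only; the real detector works on full trace spans and tool outputs):

```python
import re

def has_citations(answer: str) -> bool:
    # Look for [n]-style citation markers in the answer.
    return bool(re.search(r"\[\d+\]", answer))

def evidence_overlap(answer: str, documents: list) -> float:
    # Crude grounding signal: fraction of answer words that also
    # appear somewhere in the retrieved documents.
    answer_words = set(re.findall(r"\w+", answer.lower()))
    doc_words = set()
    for doc in documents:
        doc_words |= set(re.findall(r"\w+", doc.lower()))
    return len(answer_words & doc_words) / max(len(answer_words), 1)
```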
### 3. Tool-Use Detector
Detects tool execution failures.
Checks:
- Tool errors
- Ignored tool outputs
- Premature final answers
### 4. Loop Detector
Identifies infinite loops and planning failures.
Checks:
- Repeated tool calls
- Repeated LLM prompts
- Excessive iterations
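The repeated-call heuristic can be sketched as follows (illustrative; it mirrors the `loops.max_repeats` knob from `agentcoach.yaml` but is not the actual detector code):

```python
from collections import Counter

def find_repeated_calls(spans: list, max_repeats: int = 3) -> list:
    # Flag (tool name, input) pairs invoked more than max_repeats
    # times -- a likely sign of a stuck loop.
    calls = Counter(
        (s["name"], str(s.get("attributes", {}).get("input.value")))
        for s in spans
        if s.get("kind") == "tool"
    )
    return [call for call, n in calls.items() if n > max_repeats]

spans = [{"name": "search", "kind": "tool",
          "attributes": {"input.value": "same query"}}] * 5
print(find_repeated_calls(spans))
```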
### 5. State Detector
Tracks constraint loss.
Checks:
- User constraints maintained
- Requirements addressed
### 6. Policy/Tone Detector
Enforces policy compliance.
Checks:
- Banned phrases
- Answer length limits
- Tone requirements
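A minimal illustration of the banned-phrase and length rules (tone checks would typically need the LLM judge; the real rules live in the policy pack, e.g. `schemas/default_policy.json`):

```python
def check_policy(answer: str, banned: list, max_chars: int = 2000) -> list:
    # Hypothetical policy rules for illustration: case-insensitive
    # banned-phrase matching plus a hard length limit.
    violations = [f"banned phrase: {p}" for p in banned
                  if p.lower() in answer.lower()]
    if len(answer) > max_chars:
        violations.append(f"answer exceeds {max_chars} characters")
    return violations

print(check_policy("As an AI, I cannot help.", ["as an ai"]))
```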
### 7. Consistency Detector
Multi-run variance analysis (MVP stub).
## Engineering Recommendations

AgentCoach provides actionable recommendations:

### Prompt Engineering

```diff
--- system_prompt
+++ system_prompt
 You are a helpful assistant.
+
+Always format your response as JSON with:
+{"answer": "...", "confidence": 0.0-1.0, "citations": [...]}
```
### Retrieval Settings
- Increase top_k from 3 to 5-10
- Add re-ranking step
- Implement query rewriting
### Error Handling

```python
def call_tool_with_retry(tool_name, args, max_retries=2):
    # execute_tool and fix_tool_args are placeholders for your own
    # tool-execution and argument-repair logic.
    for attempt in range(max_retries + 1):
        try:
            return execute_tool(tool_name, args)
        except Exception as e:
            if attempt < max_retries:
                args = fix_tool_args(tool_name, args, error=str(e))
            else:
                return {"error": str(e)}
```
### Architecture
- Add loop detection
- Implement memory trimming
- Add policy validation node
## Canary Tests

Generate regression tests from failures:

```bash
agentcoach canary --report results/report.json --suite canary_tests/
```

This creates:

- `canary_tests/cases.jsonl` - Test cases
- `canary_tests/test_canary.py` - Pytest file

Implement the `run_agent()` function and run:

```bash
pytest canary_tests/test_canary.py -v
```
## Project Structure

```
agentcoach/
├── agentcoach/
│   ├── __init__.py
│   ├── cli.py            # CLI commands
│   ├── models.py         # Data models
│   ├── trace_ingest.py   # Trace parsing
│   ├── config.py         # Configuration
│   ├── contracts.py      # Schema validation
│   ├── report.py         # Report generation
│   ├── repair.py         # Runtime repair
│   ├── judge.py          # LLM judge adapters
│   ├── canary.py         # Test generation
│   ├── langgraph.py      # LangGraph integration
│   └── detectors/        # Quality detectors
├── schemas/              # Default schemas
├── examples/             # Example code
├── tests/                # Test suite
└── README.md
```
## Contributing

Contributions welcome! Please:

- Fork the repository
- Create a feature branch
- Add tests for new features
- Run `pytest` and `ruff check`
- Submit a pull request
## License

MIT License - see LICENSE file for details.
## Acknowledgments

Built for analyzing agent quality with OpenTelemetry/OpenInference traces.
## Support

- Issues: GitHub Issues
- Documentation: This README
- Examples: See the `examples/` directory

Made with ❤️ for better agent quality