Agent observability, tracing, and evaluation toolkit for agentic AI systems
AgentZen
Observe how agents behave, not what they think.
Overview
AgentZen is a toolkit for observing and understanding how AI agents act.
It helps developers see what an agent did, which decisions it made, how confident it was, and how its actions change from run to run, all without recording prompts, chain-of-thought, or internal model details.
AgentZen focuses on actions and outcomes, not generated text.
Why AgentZen
Modern AI agents often fail silently.
Small changes such as:
- prompt edits
- model upgrades
- tool-routing logic changes
can subtly alter agent behavior in ways that are difficult to detect.
Traditional logging shows outputs but not intent.
Logging chain-of-thought introduces privacy, safety, and policy risks.
AgentZen solves this by recording structured decision events instead of reasoning text, making agent behavior observable and safe for production.
Design Goals
- Behavior over text
- Explainability without reasoning leakage
- Deterministic, structured traces
- Production-safe and policy-compliant
- CLI-first workflows
- Minimal dependencies
- CI compatibility
AgentZen is intentionally not:
- an agent framework
- a prompt manager
- a monitoring dashboard
Core Concepts
Traces
A trace represents a single execution of an agent.
Each trace:
- corresponds to one run
- contains multiple spans
- is written incrementally during execution
- is exported as structured JSONL
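The exact JSONL layout AgentZen writes is not documented here. As a rough sketch of the general idea only, the following appends one JSON object per line and reads the file back; the helper names and the field names (`trace`, `kind`, `name`) are hypothetical and may not match the real schema.

```python
import json
import os
import tempfile

def append_event(path, event):
    """Append one event as a single JSON line; closing the file flushes it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")

def read_events(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical events; the real AgentZen schema may differ.
path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
append_event(path, {"trace": "request", "kind": "trace_start"})
append_event(path, {"trace": "request", "kind": "decision", "name": "choose_tool"})
events = read_events(path)
```

Because each line is a complete JSON object, a partially written trace stays readable up to the last finished line.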
Spans
A span represents a unit of agent behavior.
Examples:
- planning phases
- tool calls
- retries
- sub-task execution
- decisions
Spans may be nested to reflect execution structure.
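To make the nesting idea concrete, here is a toy recorder that tracks span depth with a stack. It is an illustration under assumed names (`SpanRecorder`, `span`), not AgentZen's implementation.

```python
from contextlib import contextmanager

class SpanRecorder:
    """Toy recorder (not AgentZen's implementation) that tracks span nesting."""

    def __init__(self):
        self.spans = []    # (depth, name) pairs in the order spans were opened
        self._stack = []   # names of the spans that are currently open

    @contextmanager
    def span(self, name):
        # The depth of a new span equals the number of spans already open.
        self.spans.append((len(self._stack), name))
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()

recorder = SpanRecorder()
with recorder.span("planning"):
    with recorder.span("tool_call"):
        with recorder.span("retry"):
            pass
    with recorder.span("decision"):
        pass

# Print the spans indented by depth.
for depth, name in recorder.spans:
    print("  " * depth + name)
```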
Decision Spans
A decision span captures a single agent choice.
Each decision span records:
- decision name
- available options
- chosen option
- confidence score (0.0–1.0)
Decision spans explicitly do not capture:
- prompts
- chain-of-thought
- model internals
- hidden state
Excluding these keeps traces free of reasoning text, which supports safety, determinism, and production readiness.
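The four recorded fields can be pictured as a small validated record. The class below is a hypothetical container written for illustration; AgentZen's internal types are not shown in this document and may look different.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical container for the four fields a decision span records."""
    name: str
    options: List[str]
    chosen: str
    confidence: float

    def __post_init__(self):
        # The chosen option has to be one of the listed alternatives.
        if self.chosen not in self.options:
            raise ValueError(f"chosen {self.chosen!r} is not one of {self.options}")
        # Confidence is a score in the closed range 0.0 to 1.0.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

record = DecisionRecord("choose_tool", ["search", "calculate"], "search", 0.72)
```

Note that nothing in the record holds free-form text from the model, which is the point of the design.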
Installation
Requirements
- Python 3.9 or newer
- pip
Install from PyPI
pip install agentzen
Verify Installation
agentzen --help
Expected output:
AgentZen CLI
Usage:
agentzen trace <trace.jsonl> [--analyze] [--fail]
agentzen trace diff <old.jsonl> <new.jsonl>
Basic Usage
Initializing the Tracer
from agentzen.tracing.tracer import Tracer
from agentzen.exporters.jsonl import JSONLExporter

# Export every span to trace.jsonl.
tracer = Tracer(JSONLExporter("trace.jsonl"))
Creating a Trace
with tracer.trace("request"):
    ...
All spans created inside this block belong to the same execution.
Instrumenting Decisions
with tracer.decision(
    name="choose_tool",
    options=["search", "calculate"],
    chosen="search",
    confidence=0.72,
):
    pass
Guidelines:
- options should enumerate all meaningful alternatives
- chosen must be one of the options
- confidence should be a value between 0.0 and 1.0 that reflects how certain the agent was about the chosen option
Each run produces a structured JSONL trace describing agent behavior.
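How an agent arrives at a confidence value is up to the application. One possible approach, shown here purely as an assumption, is to take per-option scores and use the softmax probability of the winner. The function name `choose_with_confidence` and the example scores are hypothetical.

```python
import math

def choose_with_confidence(scores):
    """Pick the highest-scoring option and return (options, chosen, confidence).

    Confidence is the softmax probability of the winner, so it lies in (0, 1].
    """
    options = list(scores)
    peak = max(scores.values())
    # Subtracting the peak keeps exp() numerically stable.
    weights = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(weights.values())
    chosen = max(options, key=lambda name: scores[name])
    return options, chosen, weights[chosen] / total

# Hypothetical scores for the two tools used in the example above.
options, chosen, confidence = choose_with_confidence({"search": 2.0, "calculate": 1.0})
```

The three returned values map directly onto the `options`, `chosen`, and `confidence` arguments of `tracer.decision`.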
CLI Reference
All CLI commands operate on JSONL trace files.
Command: trace (View Execution)
agentzen trace trace.jsonl
Example output:
Trace: request
└── Decision: choose_tool
    ├── Options: [search, calculate]
    ├── Chosen: search
    └── Confidence: 0.72
Command: trace --analyze (Behavioral Analysis)
agentzen trace trace.jsonl --analyze
Healthy trace output:
Behavioral Analysis Summary
---------------------------
✔ No critical issues detected
Observations:
- All decisions exceeded confidence threshold (0.60)
- No oscillation detected
- No repeated decision loops
Problematic trace output:
Behavioral Analysis Summary
---------------------------
⚠ Issues Detected: 2
1. Decision Loop Detected
- Decision: choose_tool
- Repeated 4 times
- Recommendation: Add stopping condition
2. Low Confidence Decisions
- Decision: select_action
- Confidence below 0.40
- Recommendation: Improve context or constrain options
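The kinds of checks reported above can be approximated in a few lines. This is a sketch of the idea, not the analyzer AgentZen ships: the function name, the dict fields, and the `max_repeats` default are assumptions, and the 0.60 threshold is borrowed from the healthy-trace output.

```python
from collections import Counter

def analyze(decisions, min_confidence=0.60, max_repeats=3):
    """Return a list of human-readable issues found in a list of decision dicts."""
    issues = []
    # A decision name that recurs too often suggests a loop.
    counts = Counter(d["name"] for d in decisions)
    for name, count in counts.items():
        if count > max_repeats:
            issues.append(f"Decision loop: {name} repeated {count} times")
    # Any single decision below the threshold is flagged.
    for d in decisions:
        if d["confidence"] < min_confidence:
            issues.append(f"Low confidence: {d['name']} at {d['confidence']:.2f}")
    return issues

# Hypothetical decisions mirroring the problematic trace above.
decisions = [{"name": "choose_tool", "confidence": 0.8}] * 4
decisions.append({"name": "select_action", "confidence": 0.35})
issues = analyze(decisions)
```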
Command: trace --analyze --fail (CI Enforcement)
agentzen trace trace.jsonl --analyze --fail
Example output:
Behavioral Analysis Summary
---------------------------
✖ Blocking Issues Detected
- Decision loop detected
- Confidence regression detected
Exiting with status code 1
Exit codes:
- 0 → acceptable behavior, no blocking issues
- 1 → blocking issues detected, such as a decision loop or a confidence regression
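In a CI job the exit code is usually all that matters. The wrapper below is a minimal sketch of gating on it from Python; the `gate` helper is hypothetical, and stand-in commands are used so the example runs without AgentZen installed.

```python
import subprocess
import sys

def gate(command):
    """Run a command and return True when it exits with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0

# In CI the call would be:
#   gate(["agentzen", "trace", "trace.jsonl", "--analyze", "--fail"])
# Stand-in commands simulate the two documented exit codes.
passing = gate([sys.executable, "-c", "raise SystemExit(0)"])
failing = gate([sys.executable, "-c", "raise SystemExit(1)"])
```

Most CI systems already fail a step on a nonzero exit code, so invoking the CLI directly is often enough; a wrapper like this helps only when the result feeds further logic.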
Command: trace diff (Behavioral Diffing)
agentzen trace diff run_v1.jsonl run_v2.jsonl
Example output:
Behavioral Diff Summary
-----------------------
⚠ Behavior Changes Detected
Decision: choose_tool
- Previous: search (0.81)
- New: calculate (0.62)
Decision: plan_steps
- Confidence dropped by 0.27
Overall Assessment:
- Behavior materially changed
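A behavioral diff of this kind boils down to comparing decisions by name across two runs. The sketch below assumes each run has already been reduced to a mapping of decision name to `(chosen, confidence)`; that shape, the `drop_threshold` value, and the `expand` option are hypothetical.

```python
def diff_decisions(old, new, drop_threshold=0.10):
    """Compare two runs keyed by decision name and list behavior changes."""
    changes = []
    for name in sorted(old.keys() & new.keys()):
        old_chosen, old_conf = old[name]
        new_chosen, new_conf = new[name]
        if old_chosen != new_chosen:
            # The agent picked a different option.
            changes.append(
                f"{name}: {old_chosen} ({old_conf:.2f}) -> {new_chosen} ({new_conf:.2f})"
            )
        elif old_conf - new_conf >= drop_threshold:
            # Same option, but with noticeably lower confidence.
            changes.append(f"{name}: confidence dropped by {old_conf - new_conf:.2f}")
    return changes

run_v1 = {"choose_tool": ("search", 0.81), "plan_steps": ("expand", 0.90)}
run_v2 = {"choose_tool": ("calculate", 0.62), "plan_steps": ("expand", 0.63)}
changes = diff_decisions(run_v1, run_v2)
```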
Command Summary
agentzen trace <trace.jsonl>
agentzen trace <trace.jsonl> --analyze
agentzen trace <trace.jsonl> --analyze --fail
agentzen trace diff <old.jsonl> <new.jsonl>
Production Usage
Feature Flagging
AgentZen is typically:
- initialized once
- guarded behind a feature flag
- disabled in latency-sensitive paths
When disabled, overhead is near zero.
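One way to get a cheap disabled path is to swap in a no-op object with the same two entry points. The sketch below assumes a hypothetical `AGENTZEN_ENABLED` environment variable and a hypothetical `NoOpTracer` class; AgentZen may offer its own mechanism.

```python
import os
from contextlib import nullcontext

class NoOpTracer:
    """Stand-in with the same two entry points used above; records nothing."""

    def trace(self, name):
        return nullcontext()

    def decision(self, **fields):
        return nullcontext()

def make_tracer(factory, env=os.environ):
    """Return factory() when the hypothetical AGENTZEN_ENABLED flag is "1"."""
    if env.get("AGENTZEN_ENABLED") == "1":
        return factory()
    return NoOpTracer()

# In real use the factory would be something like:
#   lambda: Tracer(JSONLExporter("trace.jsonl"))
# With the flag unset, instrumented code runs unchanged but records nothing.
tracer = make_tracer(factory=lambda: None, env={})
with tracer.trace("request"):
    with tracer.decision(name="choose_tool", options=["search"], chosen="search", confidence=0.9):
        pass
```

Passing a factory instead of a tracer instance means the exporter file is never opened when the flag is off.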
SDK Integration
SDK authors typically:
- embed AgentZen internally
- instrument key control-flow decisions
- expose observability as an opt-in feature
Storage and Retention
Traces contain:
- no prompts
- no chain-of-thought
- no sensitive model state
They are safe to store, share, and retain.
What AgentZen Does Not Do
AgentZen does not:
- generate prompts
- orchestrate agents
- manage tools
- provide dashboards
- log reasoning or model internals
License
MIT License.
Who This Is For
AgentZen is built for:
- developers building production AI agents
- teams operating agent-based systems
- SDK and platform engineers
- organizations requiring behavioral guarantees over time