
Agent observability, tracing, and evaluation toolkit for agentic AI systems

AgentZen

Observe how agents behave, not what they think.


Overview

AgentZen is a toolkit for observing and understanding how AI agents act.

It helps developers see what an agent did, which options it weighed at each decision point, how confident it was, and how its behavior changes from run to run, all without recording prompts, chain-of-thought, or internal model details.

AgentZen focuses on behavior (actions and outcomes), not generated text.


Why AgentZen

Modern AI agents often fail silently.

Small changes such as:

  • prompt edits
  • model upgrades
  • tool-routing logic changes

can subtly alter agent behavior in ways that are difficult to detect.

Traditional logging shows outputs, but not the decisions that produced them.
Logging chain-of-thought introduces privacy, safety, and policy risks.

AgentZen solves this by recording structured decision events instead of reasoning text, making agent behavior observable and safe for production.


Design Goals

  • Behavior over text
  • Explainability without reasoning leakage
  • Deterministic, structured traces
  • Production-safe and policy-compliant
  • CLI-first workflows
  • Minimal dependencies
  • CI compatibility

AgentZen is intentionally not:

  • an agent framework
  • a prompt manager
  • a monitoring dashboard

Core Concepts

Traces

A trace represents a single execution of an agent.

Each trace:

  • corresponds to one run
  • contains multiple spans
  • is written incrementally during execution
  • is exported as structured JSONL
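For illustration, here is a minimal standard-library sketch of incremental JSONL output. Each record is written as one JSON object per line and flushed immediately, so an interrupted run still leaves only complete lines. The `SpanWriter` class and the `trace_id`/`kind`/`name` fields are hypothetical; they are not AgentZen's exporter or its on-disk schema.

```python
import io
import json
import uuid
from typing import Any, Dict, TextIO


class SpanWriter:
    """Hypothetical incremental JSONL writer (not AgentZen's API)."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream
        # One ID per run; every record written through this writer shares it.
        self.trace_id = uuid.uuid4().hex

    def write(self, record: Dict[str, Any]) -> None:
        line = json.dumps({"trace_id": self.trace_id, **record}, sort_keys=True)
        self.stream.write(line + "\n")
        # Flush after every record so an interrupted run still leaves
        # only complete, parseable lines behind.
        self.stream.flush()


# Demo with an in-memory stream; a real exporter would write to a file.
buffer = io.StringIO()
writer = SpanWriter(buffer)
writer.write({"kind": "trace", "name": "request"})
writer.write({"kind": "decision", "name": "choose_tool"})
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Because every line is a self-contained JSON object, a reader can process a trace line by line without loading the whole file.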

Spans

A span represents a unit of agent behavior.

Examples:

  • planning phases
  • tool calls
  • retries
  • sub-task execution
  • decisions

Spans may be nested to reflect execution structure.
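A common way to represent nesting in a flat, line-oriented format is to give each span an ID and record the ID of its enclosing span. The sketch below keeps a stack of open spans to do that. `SpanRecorder`, `span()`, and the `id`/`parent_id` fields are hypothetical illustrations, not AgentZen's API or schema.

```python
import itertools
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class SpanRecorder:
    """Hypothetical recorder that links nested spans through parent IDs."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []
        self._stack: List[int] = []  # IDs of the spans currently open
        self._ids = itertools.count(1)

    @contextmanager
    def span(self, name: str) -> Iterator[int]:
        span_id = next(self._ids)
        # The innermost open span (top of the stack) is this span's parent.
        parent_id = self._stack[-1] if self._stack else None
        self.records.append({"id": span_id, "parent_id": parent_id, "name": name})
        self._stack.append(span_id)
        try:
            yield span_id
        finally:
            # Pop even if the body raises, so later spans get the right parent.
            self._stack.pop()


recorder = SpanRecorder()
with recorder.span("request"):
    with recorder.span("planning"):
        pass
    with recorder.span("tool_call:search"):
        pass
```

Here both `planning` and `tool_call:search` record `parent_id` 1, the ID of the enclosing `request` span, so the tree can be rebuilt from the flat records.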


Decision Spans

A decision span captures a single agent choice.

Each decision span records:

  • decision name
  • available options
  • chosen option
  • confidence score (0.0–1.0)

Decision spans explicitly do not capture:

  • prompts
  • chain-of-thought
  • model internals
  • hidden state

This ensures safety, determinism, and production readiness.
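The four recorded fields map naturally onto a small immutable record. In the sketch below, `DecisionRecord` is a hypothetical type, not part of AgentZen. It serializes exactly the four listed fields and has no slot for prompts, reasoning text, or model state.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical decision-span shape: only the four documented fields."""

    name: str
    options: Tuple[str, ...]
    chosen: str
    confidence: float  # 0.0-1.0

    def to_json(self) -> str:
        # json.dumps writes the options tuple as a JSON array. There is
        # deliberately no field for prompts, reasoning, or hidden state.
        return json.dumps(asdict(self), sort_keys=True)


record = DecisionRecord(
    name="choose_tool",
    options=("search", "calculate"),
    chosen="search",
    confidence=0.72,
)
```

Making the record frozen means a decision cannot be edited after it is captured.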


Installation

Requirements

  • Python 3.9 or newer
  • pip

Install from PyPI

pip install agentzen

Verify Installation

agentzen --help

Expected output:

AgentZen CLI
Usage:
  agentzen trace <trace.jsonl> [--analyze] [--fail]
  agentzen trace diff <old.jsonl> <new.jsonl>

Basic Usage

Initializing the Tracer

from agentzen.tracing.tracer import Tracer
from agentzen.exporters.jsonl import JSONLExporter

tracer = Tracer(JSONLExporter("trace.jsonl"))

Creating a Trace

with tracer.trace("request"):
    ...

All spans created inside this block belong to the same execution.


Instrumenting Decisions

with tracer.decision(
    name="choose_tool",
    options=["search", "calculate"],
    chosen="search",
    confidence=0.72,
):
    pass

Guidelines:

  • options should enumerate all meaningful alternatives
  • chosen must be one of the options
  • confidence should reflect how certain the agent actually is about the choice, on a 0.0–1.0 scale

Each run produces a structured JSONL trace describing agent behavior.
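Two of the guidelines above can be checked mechanically before a decision is recorded. Whether the options are truly exhaustive cannot. The helper `decision_problems` is a hypothetical illustration, and this page does not say whether AgentZen performs any such validation itself.

```python
from typing import List, Sequence


def decision_problems(options: Sequence[str], chosen: str, confidence: float) -> List[str]:
    """Hypothetical guideline check; returns an empty list when all checks pass."""
    problems: List[str] = []
    if not options:
        problems.append("options is empty")
    if chosen not in options:
        problems.append(f"chosen={chosen!r} is not one of the options")
    # Written as a range check so NaN also fails (NaN comparisons are False).
    if not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence={confidence} is outside 0.0-1.0")
    return problems
```

Returning a list instead of raising lets a caller decide whether a violation should block the run or only be logged.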


CLI Reference

All CLI commands operate on JSONL trace files.


Command: trace (View Execution)

agentzen trace trace.jsonl

Example output:

Trace: request
└── Decision: choose_tool
    ├── Options: [search, calculate]
    ├── Chosen: search
    └── Confidence: 0.72
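To make the relationship between the JSONL file and this view concrete, the sketch below renders the same tree from flat records. It assumes a hypothetical schema: one `trace` record followed by `decision` records carrying `options`, `chosen`, and `confidence`. AgentZen's actual file format and renderer may differ.

```python
import json
from typing import Iterable, List


def render_trace(lines: Iterable[str]) -> str:
    """Hypothetical renderer for one trace with flat decision records."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Assumes exactly one trace record is present.
    trace = next(r for r in records if r["kind"] == "trace")
    decisions = [r for r in records if r["kind"] == "decision"]
    out: List[str] = [f"Trace: {trace['name']}"]
    for i, d in enumerate(decisions):
        last = i == len(decisions) - 1
        out.append(("└── " if last else "├── ") + f"Decision: {d['name']}")
        # Children of the last decision are indented with spaces, others with a bar.
        pad = "    " if last else "│   "
        out.append(pad + "├── Options: [" + ", ".join(d["options"]) + "]")
        out.append(pad + f"├── Chosen: {d['chosen']}")
        out.append(pad + f"└── Confidence: {d['confidence']:.2f}")
    return "\n".join(out)


sample = [
    '{"kind": "trace", "name": "request"}',
    '{"kind": "decision", "name": "choose_tool", "options": ["search", "calculate"], '
    '"chosen": "search", "confidence": 0.72}',
]
print(render_trace(sample))
```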

Command: trace --analyze (Behavioral Analysis)

agentzen trace trace.jsonl --analyze

Healthy trace output:

Behavioral Analysis Summary
---------------------------
✔ No critical issues detected

Observations:
- All decisions exceeded confidence threshold (0.60)
- No oscillation detected
- No repeated decision loops

Problematic trace output:

Behavioral Analysis Summary
---------------------------
⚠ Issues Detected: 2

1. Decision Loop Detected
   - Decision: choose_tool
   - Repeated 4 times
   - Recommendation: Add stopping condition

2. Low Confidence Decisions
   - Decision: select_action
   - Confidence below 0.40
   - Recommendation: Improve context or constrain options
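The report above names two kinds of issues. A minimal sketch of how such checks could work follows. A loop is flagged when the same decision name occurs at least `max_repeats` times in a row, and a decision is flagged when its confidence falls below `min_confidence`. The function name `analyze`, the "in a row" interpretation, and both default thresholds are assumptions; AgentZen's actual rules and defaults are not documented here.

```python
from itertools import groupby
from typing import Dict, List


def analyze(decisions: List[Dict], max_repeats: int = 3, min_confidence: float = 0.40) -> List[str]:
    """Hypothetical behavioral checks over decision records in execution order."""
    issues: List[str] = []
    # Loop check: runs of consecutive decisions that share the same name.
    for name, run in groupby(d["name"] for d in decisions):
        count = sum(1 for _ in run)
        if count >= max_repeats:
            issues.append(f"Decision loop: {name} repeated {count} times")
    # Confidence check: any single decision below the threshold.
    for d in decisions:
        if d["confidence"] < min_confidence:
            issues.append(f"Low confidence: {d['name']} ({d['confidence']:.2f})")
    return issues


trace = [{"name": "choose_tool", "confidence": 0.7}] * 4 + [
    {"name": "select_action", "confidence": 0.35}
]
for issue in analyze(trace):
    print(issue)
```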

Command: trace --analyze --fail (CI Enforcement)

agentzen trace trace.jsonl --analyze --fail

Example output:

Behavioral Analysis Summary
---------------------------
✖ Blocking Issues Detected

- Decision loop detected
- Confidence regression detected

Exiting with status code 1

Exit codes:

  • 0 → acceptable behavior
  • 1 → blocking issues detected (for example, a decision loop or a confidence regression)
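Because the exit status is the contract, a CI job can gate on it directly. The sketch below is a hypothetical Python wrapper that runs the documented `agentzen trace <file> --analyze --fail` command on several trace files and collects the ones that fail. `check_traces` is an illustrative name, and it treats any non-zero status as a failure. The `run` parameter exists only so the function can be tested without the CLI installed.

```python
import subprocess
from typing import Any, Callable, List, Sequence


def check_traces(paths: Sequence[str], run: Callable[..., Any] = subprocess.run) -> List[str]:
    """Hypothetical CI helper: return the trace paths whose check failed."""
    failed: List[str] = []
    for path in paths:
        # Only the flags shown in this README are used here.
        result = run(["agentzen", "trace", path, "--analyze", "--fail"])
        # A non-zero exit status means blocking issues were found.
        if result.returncode != 0:
            failed.append(path)
    return failed
```

A CI script could then call `sys.exit(1)` whenever the returned list is non-empty, matching the exit-code convention above.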

Command: trace diff (Behavioral Diffing)

agentzen trace diff run_v1.jsonl run_v2.jsonl

Example output:

Behavioral Diff Summary
-----------------------
⚠ Behavior Changes Detected

Decision: choose_tool
- Previous: search (0.81)
- New: calculate (0.62)

Decision: plan_steps
- Confidence dropped by 0.27

Overall Assessment:
- Behavior materially changed

Command Summary

agentzen trace <trace.jsonl>
agentzen trace <trace.jsonl> --analyze
agentzen trace <trace.jsonl> --analyze --fail
agentzen trace diff <old.jsonl> <new.jsonl>

Production Usage

Feature Flagging

AgentZen is typically:

  • initialized once
  • guarded behind a feature flag
  • disabled in latency-sensitive paths

When disabled, overhead is near zero.
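One way to get near-zero overhead when disabled is to swap in a tracer whose methods return no-op context managers. Instrumented code then keeps its `with tracer.trace(...)` and `with tracer.decision(...)` blocks unchanged. This is a sketch under assumptions: `NoopTracer`, `make_tracer`, and the `AGENTZEN_ENABLED` environment variable are hypothetical and not part of AgentZen. Only the two import paths come from the Basic Usage section.

```python
import os
from contextlib import nullcontext
from typing import Any


class NoopTracer:
    """Hypothetical stand-in with the same call shape as trace() / decision()."""

    def trace(self, name: str) -> nullcontext:
        return nullcontext()

    def decision(self, **fields: Any) -> nullcontext:
        # Ignores its arguments entirely, so disabled tracing does no work.
        return nullcontext()


def tracing_enabled(env_var: str = "AGENTZEN_ENABLED") -> bool:
    # The environment variable name is an assumption, not an AgentZen setting.
    return os.environ.get(env_var, "").lower() in {"1", "true", "yes"}


def make_tracer(path: str = "trace.jsonl") -> Any:
    """Create the real tracer only when the flag is on."""
    if not tracing_enabled():
        return NoopTracer()
    # Import paths as shown in Basic Usage; loaded only when tracing is enabled.
    from agentzen.exporters.jsonl import JSONLExporter
    from agentzen.tracing.tracer import Tracer

    return Tracer(JSONLExporter(path))
```

Because the AgentZen imports sit inside `make_tracer`, a disabled deployment never loads the package at all.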


SDK Integration

SDK authors typically:

  • embed AgentZen internally
  • instrument key control-flow decisions
  • expose observability as an opt-in feature
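The opt-in pattern can be sketched as an SDK component that accepts an optional tracer and records its routing decision only when one is supplied. `ToolRouter` and `RecordingTracer` are hypothetical names. Deriving confidence from the top score assumes the scores are per-tool probabilities; that choice is for illustration, not a recommendation. The recording tracer mimics the `decision(...)` call shape from Instrumenting Decisions so the SDK can be tested without AgentZen installed.

```python
from contextlib import contextmanager, nullcontext
from typing import Any, Dict, Iterator, List, Optional, Sequence


class ToolRouter:
    """Hypothetical SDK component; tracing is opt-in via the tracer argument."""

    def __init__(self, tools: Sequence[str], tracer: Optional[Any] = None) -> None:
        self.tools = list(tools)
        self.tracer = tracer  # None means the SDK user did not opt in

    def route(self, scores: Sequence[float]) -> str:
        # Assumes scores are per-tool probabilities, so the top score
        # doubles as the confidence value.
        best = max(range(len(self.tools)), key=lambda i: scores[i])
        chosen = self.tools[best]
        span = (
            self.tracer.decision(
                name="choose_tool", options=self.tools, chosen=chosen, confidence=scores[best]
            )
            if self.tracer is not None
            else nullcontext()
        )
        with span:
            return chosen


class RecordingTracer:
    """Hypothetical test double that stores decision fields in memory."""

    def __init__(self) -> None:
        self.decisions: List[Dict[str, Any]] = []

    @contextmanager
    def decision(self, **fields: Any) -> Iterator[None]:
        self.decisions.append(fields)
        yield


recorder = RecordingTracer()
choice = ToolRouter(["search", "calculate"], tracer=recorder).route([0.72, 0.28])
```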

Storage and Retention

Traces contain:

  • no prompts
  • no chain-of-thought
  • no sensitive model state

As long as decision names and option values do not themselves contain sensitive data, traces are safe to store, share, and retain.
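Before traces are archived or shared, a team may still want to confirm that every record sticks to an expected set of fields. The sketch below checks a JSONL trace against an allowlist. `ALLOWED_FIELDS` and `unexpected_fields` are hypothetical, since AgentZen's real schema is not documented on this page.

```python
import json
from typing import Iterable, List

# Hypothetical allowlist; adjust it to the fields your traces actually contain.
ALLOWED_FIELDS = {
    "trace_id", "span_id", "parent_id", "kind", "name",
    "options", "chosen", "confidence",
}


def unexpected_fields(lines: Iterable[str]) -> List[str]:
    """Return 'line N: field' entries for any key outside the allowlist."""
    problems: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        for key in sorted(json.loads(line).keys() - ALLOWED_FIELDS):
            problems.append(f"line {lineno}: {key}")
    return problems
```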


What AgentZen Does Not Do

AgentZen does not:

  • generate prompts
  • orchestrate agents
  • manage tools
  • provide dashboards
  • log reasoning or model internals

License

MIT License.


Who This Is For

AgentZen is built for:

  • developers building production AI agents
  • teams operating agent-based systems
  • SDK and platform engineers
  • organizations requiring behavioral guarantees over time


Download files

Source Distribution

agentzen-0.1.0.tar.gz (13.4 kB)

Uploaded Source

Built Distribution

agentzen-0.1.0-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file agentzen-0.1.0.tar.gz.

File metadata

  • Download URL: agentzen-0.1.0.tar.gz
  • Upload date:
  • Size: 13.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for agentzen-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8dd33ddbe83edcbc03ea2e582ba9749d1ffb06d8c884b2f573bc95bfd4ad9535
MD5 7ca7b583edeb68dd0b12277c43ba360d
BLAKE2b-256 eb48837f970bff98b9b670d055cc5ecfb882a5666d50b8520d56f7e3f85600f1

File details

Details for the file agentzen-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agentzen-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for agentzen-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7a949f7315b8791a7da40756eacd951d092d3ee5f323093264191ede272d1bf2
MD5 b4105daed3aebee2957198345d341f1d
BLAKE2b-256 15bee015f33b923ca697b1bc3c7e902fd60477f7eb5952fc6be3c6c52cf2074d
