AgentTrace — the record of what happened during an agent run. The shared trace primitive for the Aevyra stack (Reflex, Origin, Verdict).

aevyra-witness

When an AI agent gives a wrong answer, which step caused it? The LLM call that misread the context? The tool that returned stale data? The retrieval that pulled the wrong docs? Without a structured record of what actually ran, you're guessing.

Witness records every step of an agent pipeline — its inputs, outputs, timing, and how the steps relate to each other — in a single structured object called an AgentTrace. Add @span to each function you want to instrument, wrap the run in trace(), and Witness does the rest. That trace is then ready to hand to Origin for failure attribution, Verdict for scoring, or Reflex for prompt optimization.

Use cases

  • Debugging a failing agent — see exactly which step produced the bad output, what it received as input, and what it returned, without adding print statements everywhere.
  • Attributing failures across a multi-step pipeline — when a plan-act-respond loop fails, know whether the planner, a tool call, or the final response step was responsible.
  • Feeding evaluation and optimization tools — pass the same trace to a judge that scores it, a diagnoser that finds the root cause, and an optimizer that fixes the prompt — all without reformatting your data.
Witness  →  captures what happened
Verdict  →  judges it
Origin   →  finds where it went wrong
Reflex   →  fixes it

Zero runtime dependencies. Works with any LLM framework. Non-Python users can emit traces as JSON directly — see the schema spec.

Install

pip install aevyra-witness

Python 3.10+.

Quick start (manual AgentTrace)

The simplest way to build a trace is to construct it directly. Run your pipeline, collect the inputs and outputs, and wrap them in an AgentTrace:

from aevyra_witness import AgentTrace, TraceNode

def run_pipeline(prompt: str, ticket: str) -> AgentTrace:
    ticket_type = classify_ticket(ticket)
    policy      = retrieve_policy(ticket_type)
    response    = generate_response(ticket, ticket_type, policy, prompt)

    return AgentTrace(
        nodes=[
            TraceNode("classify_ticket",   input=ticket,      output=ticket_type),
            TraceNode("retrieve_policy",   input=ticket_type, output=policy),
            TraceNode("generate_response", input=ticket,      output=response,
                      optimize=True),
        ],
        ideal=expected_response,
    )

That trace is a plain Python object — no I/O, no side effects. Call to_trace_text() on it to see what a judge or critic will read:

=== AGENT TRACE ===

Node 1 — classify_ticket
  Input:  billing dispute on invoice #4821
  Output: billing

Node 2 — retrieve_policy
  Input:  billing
  Output: Refund requests must be submitted within 30 days of the
          invoice date. Disputes after that window require manager approval.

Node 3 — generate_response  [optimize]
  Input:  billing dispute on invoice #4821
  Output: I can help with your billing dispute. Our policy requires
          disputes to be submitted within 30 days of the invoice date...

[optimize] marks the span whose prompt Reflex will rewrite if the trace scores poorly. Pass the trace to Origin to find out which span caused the failure before asking Reflex to fix anything.

Quick start (live capture)

The runtime instruments your existing functions without changing their signatures. Add @span to each node you want to track, wrap the run in trace(), and call t.finish() to get the completed AgentTrace:

from aevyra_witness.runtime import span, trace

@span("classify")
def classify(text):
    return "billing"

@span("retrieve")
def retrieve(topic):
    return ["doc1", "doc2"]

@span("answer", optimize=True, prompt_id="answer_v1")
def answer(q, docs):
    return "your refund will post in 3–5 business days"

def my_agent(q):
    topic = classify(q)
    return answer(q, retrieve(topic))

with trace(ideal="your refund will post in 3–5 business days") as t:
    my_agent("how do I refund?")
at = t.finish()

# at is an AgentTrace with three spans, input/output/timing captured,
# ready to hand to Verdict or Origin.

span doubles as a context manager for places where a decorator doesn't fit (tool calls, inline LLM calls):

from aevyra_witness.runtime import span, KIND_TOOL

with span("gmail_search", kind=KIND_TOOL) as s:
    s.input = {"query": "from:billing"}
    s.output = gmail.search(s.input["query"])

Outside a trace() scope, @span is a silent no-op — your code still runs, nothing is recorded. This lets you instrument a library without forcing every caller to adopt the tracer.

@span supports async functions, propagates the current parent id across await via contextvars, and records exceptions with error = repr(exc) without swallowing them.
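The mechanism behind that propagation is ordinary contextvars. As a simplified illustration (not the library's actual implementation), a decorator can push its span name into a ContextVar so that nested calls, including across await, see the correct parent:

```python
import asyncio
import contextvars
import functools

# Hypothetical sketch of parent-id propagation; aevyra_witness's real
# internals may differ. `recorded` stands in for the tracer.
_current_parent = contextvars.ContextVar("current_parent", default=None)
recorded = []  # (span_name, parent_name) pairs, for demonstration

def span(name):
    def decorate(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            recorded.append((name, _current_parent.get()))
            token = _current_parent.set(name)  # children see this span as parent
            try:
                return await fn(*args, **kwargs)
            finally:
                _current_parent.reset(token)  # restore outer parent on exit
        return wrapper
    return decorate

@span("child")
async def child():
    await asyncio.sleep(0)  # the parent id survives the await
    return "ok"

@span("parent")
async def parent():
    return await child()

asyncio.run(parent())
print(recorded)  # [('parent', None), ('child', 'parent')]
```

Because the parent lives in a ContextVar rather than a global, concurrent tasks each see their own parent chain.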

Quick start (non-Python)

The trace format is plain JSON — any language can produce it without installing this library. Write a conforming object and save it to a file; the Origin CLI will take it from there.

// TypeScript / JavaScript
import { writeFileSync } from "fs";

const trace = {
  nodes: [
    { name: "classify", kind: "reason", input: userMessage, output: category },
    { name: "lookup",   kind: "tool",   input: { id },      output: result,
      metadata: { mcp_server: "stripe" } },
    { name: "answer",   kind: "reason", input: prompt,      output: reply },
  ],
  ideal: expectedAnswer,
  metadata: { session_id: sessionId },
};

writeFileSync("trace.json", JSON.stringify(trace, null, 2));

# Then run attribution with the Origin CLI (Python, one-time install)
pip install aevyra-origin[anthropic]
aevyra-origin diagnose trace.json --score 0.2 --rubric rubric.txt

For Go, Java, and full field reference see the schema spec.

Complex usage (N-step plan-act with M-parallel tools)

Witness is designed for real agent systems from day one — a reasoning model that dispatches several tools in parallel, reflects on the results, and iterates. The DAG is expressed through parent_id; one prompt fired at many call sites is tracked via prompt_id.

from aevyra_witness import AgentTrace, TraceNode, KIND_REASON, KIND_TOOL

trace = AgentTrace(nodes=[
    TraceNode("plan", id="p1", kind=KIND_REASON, prompt_id="planner",
              step=1, input=user_query, output=plan1, optimize=True),

    # Three parallel tool calls, all spawned by the step-1 plan.
    TraceNode("search_flights", id="t1a", kind=KIND_TOOL, parent_id="p1",
              input={"destination": "Tokyo"}, output=[...]),
    TraceNode("check_calendar", id="t1b", kind=KIND_TOOL, parent_id="p1",
              input={"dates": "next week"}, output={...}),
    TraceNode("get_weather", id="t1c", kind=KIND_TOOL, parent_id="p1",
              input={"city": "Tokyo"}, output={...}),

    TraceNode("plan", id="p2", kind=KIND_REASON, prompt_id="planner",
              step=2, input=context, output=plan2, optimize=True),
    TraceNode("book_flight", id="t2a", kind=KIND_TOOL, parent_id="p2",
              input={...}, output={"confirmation": "JL123"}),

    TraceNode("respond", id="r", kind=KIND_REASON, prompt_id="responder",
              step=3, input=final_context, output=reply),
])

Both p1 and p2 carry prompt_id="planner" and optimize=True — they're the same prompt fired at two steps. Reflex updates the planner prompt once and both call sites benefit. trace.optimize_prompt_ids returns ["planner"].
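The dedup behind that is simple first-seen-order filtering. A sketch over plain dicts (the library's property operates on TraceNode objects, but the idea is the same):

```python
# Sketch of the optimize_prompt_ids logic over plain node dicts;
# assumes each node carries "prompt_id" and "optimize" fields.
nodes = [
    {"name": "plan",    "prompt_id": "planner",   "optimize": True},
    {"name": "search",  "prompt_id": None,        "optimize": False},
    {"name": "plan",    "prompt_id": "planner",   "optimize": True},
    {"name": "respond", "prompt_id": "responder", "optimize": False},
]

def optimize_prompt_ids(nodes):
    seen = []
    for n in nodes:
        pid = n.get("prompt_id")
        if n.get("optimize") and pid and pid not in seen:
            seen.append(pid)  # keep first-seen order, drop repeats
    return seen

print(optimize_prompt_ids(nodes))  # ['planner']
```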

Integrations

Already have traces from another system? Witness ships adapters that convert external formats into AgentTrace in one call — no re-instrumentation needed.

OpenTelemetry (LangGraph, CrewAI, AutoGen, Vercel AI SDK)

Any framework that emits OpenTelemetry spans with the GenAI semantic conventions works out of the box. Pass the finished spans to from_otel_spans:

from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from aevyra_witness.adapters import from_otel_spans

exporter = InMemorySpanExporter()
# ... configure your OTel TracerProvider with this exporter, run your agent ...
spans = exporter.get_finished_spans()
trace = from_otel_spans(spans)

Plain dicts from an OTLP JSON export are also accepted.

OpenClaw

OpenClaw streams telemetry as JSONL — one event per line. Pass the lines (strings or pre-parsed dicts) directly:

from pathlib import Path
from aevyra_witness.adapters import from_openclaw_jsonl

lines = Path("run_2026_04_21.jsonl").read_text().splitlines()
trace = from_openclaw_jsonl(lines, ideal="expected output")

The adapter handles paired start/end events, auto-wires tool calls back to the reasoning turn that dispatched them via tool_call_id, and recognizes all OpenClaw event families including Task Brain (task.*, cron.*, acp.*).

To mark specific prompts as Reflex optimization targets without annotating the event stream:

trace = from_openclaw_jsonl(lines, optimize_prompt_ids=["planner", "responder"])

MCP sessions

The MCP interceptor wraps any ClientSession and records every call_tool invocation as a TraceNode automatically — no @span decorators needed in agent code:

from mcp import ClientSession
from aevyra_witness.interceptors.mcp import wrap_mcp_session

async with ClientSession(read, write) as session:
    await session.initialize()
    mcp = wrap_mcp_session(session, server_name="github")

    result = await mcp.call_tool("create_issue", {"title": "Bug"})
    result = await mcp.call_tool("list_repos", {})

    trace = mcp.to_trace()  # AgentTrace with all captured spans

To record MCP calls alongside @span-instrumented functions, pass the active tracer so the spans land in the same trace:

from aevyra_witness.runtime import trace as witness_trace

with witness_trace() as t:
    mcp = wrap_mcp_session(session, server_name="slack", tracer=t)
    await mcp.call_tool("post_message", {...})

at = t.finish()  # includes both @span spans and MCP calls

Bring your own format

If you control the producer (TypeScript, Go, Rust), the simplest path is to emit a JSON file that matches the AgentTrace schema and run the Origin CLI against it — see Quick start (non-Python) above. For structured logs from Langfuse, LangSmith, or a home-grown JSONL store, the Origin BYO trace tutorial shows a 30-line adapter pattern that works for any source format.
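As an illustration of that pattern, a minimal adapter from a home-grown JSONL log might look like the following. The source field names ("step", "in", "out") are invented for this sketch, and the target fields mirror the examples above; consult the schema spec for the authoritative field list:

```python
import json

# Hypothetical adapter: home-grown JSONL log -> schema-conforming trace dict.
def jsonl_to_trace(lines, ideal=None):
    nodes = []
    for line in lines:
        event = json.loads(line)
        nodes.append({
            "name": event["step"],               # map your names onto the schema
            "kind": event.get("kind", "other"),
            "input": event.get("in"),
            "output": event.get("out"),
        })
    trace = {"nodes": nodes}
    if ideal is not None:
        trace["ideal"] = ideal
    return trace

log = [
    '{"step": "classify", "kind": "reason", "in": "refund?", "out": "billing"}',
    '{"step": "lookup", "kind": "tool", "in": {"id": 1}, "out": "policy text"}',
]
trace = jsonl_to_trace(log, ideal="expected answer")
json.dumps(trace)  # ready to write to trace.json for the Origin CLI
```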

What's in the box

Schema:

  • AgentTrace / TraceNode — the dataclasses
  • Recommended kind constants: KIND_REASON, KIND_TOOL, KIND_RETRIEVE, KIND_AGENT, KIND_OTHER (custom kinds allowed)

Runtime (aevyra_witness.runtime):

  • @span(name, ...) — decorator that captures a function's call as a TraceNode. Forwards optimize, kind, prompt_id, tokens. Async and sync both work; exceptions are recorded and re-raised.
  • span(...) — the same object, used as a context manager for inline blocks (tool calls, LLM calls not wrapped in a function).
  • trace(*, ideal=None, metadata=None) — context manager that installs a Tracer via contextvars. t.finish() returns the completed AgentTrace and is idempotent.
  • current_tracer() — access the active tracer (for writing custom instrumentation).
  • Tracer — the underlying accumulator, exposed for advanced users who want to drive the runtime by hand.

Rendering:

  • to_trace_text() — hierarchical indented tree for LLM consumption (judges and critics read this)
  • to_dataset_record() — Verdict-compatible dataset record

Topology queries:

  • roots, children_of(id), by_id(id), depth_of(node)
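These queries fall directly out of the flat-list-plus-parent_id representation. A sketch over plain dicts (the library's methods operate on TraceNode objects, but the traversal is the same):

```python
# Sketch of the topology queries over plain node dicts with id/parent_id.
nodes = [
    {"id": "p1", "parent_id": None},
    {"id": "t1", "parent_id": "p1"},
    {"id": "t2", "parent_id": "p1"},
    {"id": "r",  "parent_id": "t1"},
]

def roots(nodes):
    return [n for n in nodes if n["parent_id"] is None]

def children_of(nodes, node_id):
    return [n for n in nodes if n["parent_id"] == node_id]

def depth_of(nodes, node_id):
    by_id = {n["id"]: n for n in nodes}
    depth, parent = 0, by_id[node_id]["parent_id"]
    while parent is not None:  # walk parent links up to a root
        depth, parent = depth + 1, by_id[parent]["parent_id"]
    return depth

print([n["id"] for n in roots(nodes)])              # ['p1']
print([n["id"] for n in children_of(nodes, "p1")])  # ['t1', 't2']
print(depth_of(nodes, "r"))                         # 2
```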

Optimization targets:

  • optimize_nodes — every span marked optimize=True
  • optimize_prompt_ids — distinct prompt ids Reflex will update
  • optimize_node — first marked span (linear-trace convenience)

Serialization:

  • to_dict() / from_dict() / to_json() / from_json()

Tool calls (including MCP):

  • TraceNode.mcp_tool(...) — factory for MCP tool-call spans that normalizes mcp_server, tool_call_id, error_code, latency_ms metadata so downstream tools render them consistently.

Adapters:

  • from_openclaw_jsonl(lines) — convert an OpenClaw JSONL event stream into an AgentTrace. Handles start/end pairing, auto-parents tool calls via tool_call_id, and covers Task Brain event families.
  • from_otel_spans(spans) — convert OpenTelemetry ReadableSpan objects or OTLP JSON dicts into an AgentTrace. Works with LangGraph, CrewAI, AutoGen, Vercel AI SDK, and any framework emitting the GenAI semantic conventions.

Interceptors:

  • wrap_mcp_session(session, server_name=...) — wrap any MCP ClientSession to record every call_tool invocation as a TraceNode, with no decorators needed in agent code.

No LLM calls, no HTTP, no optimizer. Just the schema.

Why it's its own package

A trace type is a contract. If Reflex and Origin each defined their own copy, the contract would drift — a field added here, a rename there, and suddenly a trace that works with the optimizer doesn't work with the diagnoser. Witness is the single source of truth that every Aevyra tool imports. Adding a new tool (a trace viewer, an OTel importer, a failure clusterer) is as simple as pip install aevyra-witness and reading the same type.

Design notes

See DESIGN.md for the rationale behind the schema: identity vs. execution, flat-list-plus-parent-id, render-for-LLM rules, and what's intentionally out of scope.

License

Apache-2.0.
