tracerazor
Python SDK for TraceRazor — comprehensive token efficiency auditing & optimization for AI agents.
Works with any Python agent: OpenAI, Anthropic, LangGraph, CrewAI, AutoGen, or raw code.
v0.2.0 — New metrics:
- Semantic Continuity (CSD) — detects reasoning drift across steps
- Adherence Scoring (IAR) — validates that optimization fixes improve metrics
- Multi-agent reporting — audit 2+ agents in a workflow, aggregate efficiency
Install
pip install tracerazor
Requires the tracerazor binary to be built and accessible. Either:
# Option A: build from source (one-time)
cargo build --release
export TRACERAZOR_BIN=/path/to/TraceRazor/target/release/tracerazor
# Option B: use HTTP mode against a running server (no binary on client)
# docker compose up (in the TraceRazor repo)
Quickstart
from tracerazor_sdk import Tracer
tracer = Tracer(agent_name="my-agent", framework="openai")
# After each LLM call, record the reasoning step:
tracer.reasoning(
    content=llm_response.text,
    tokens=llm_response.usage.total_tokens,
    input_context=prompt,  # optional, improves CCE detection
)
# After each tool call:
tracer.tool(
    name="get_order_details",
    params={"order_id": "ORD-9182"},
    output=tool_result,
    success=True,
    tokens=120,
)
# After the agent finishes:
report = tracer.analyse()
print(report.summary())
# → TAS 74.3/100 [Good] | 6 steps, 3200 tokens | Saved 1100 tokens (34%)
print(report.markdown()) # full formatted report
report.assert_passes() # raises AssertionError if TAS < threshold (CI use)
HTTP mode
If you'd rather not put the binary on every machine, run the server once and POST from anywhere:
from tracerazor_sdk import Tracer
tracer = Tracer(
    agent_name="my-agent",
    server="http://localhost:8080",  # tracerazor-server URL
)
# Record steps the same way, then:
report = tracer.analyse()
Install with HTTP support:
pip install tracerazor[http]
Context manager
with Tracer(agent_name="my-agent") as t:
    t.reasoning("...", tokens=500)
    t.tool("search", params={}, output="...", success=True, tokens=100)
    report = t.analyse()
API
Tracer(agent_name, framework, threshold, task_value_score, bin_path, server)
| param | default | description |
|---|---|---|
| agent_name | required | shown in reports and used for baseline tracking |
| framework | "custom" | any string: "openai", "anthropic", "crewai", etc. |
| threshold | 70.0 | minimum TAS for assert_passes() |
| task_value_score | 1.0 | answer quality (0–1), update with set_task_value() |
| bin_path | auto | path to tracerazor binary; falls back to TRACERAZOR_BIN env var |
| server | None | if set, use HTTP mode |
tracer.reasoning(content, tokens, input_context, output)
Record one LLM reasoning step. input_context is the full prompt — include it for accurate CCE bloat detection.
tracer.tool(name, params, output, success, error, tokens, input_context)
Record one tool call. success=False triggers misfire detection (TCA) and auto-fix generation.
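A common pattern is to wrap each tool invocation so that exceptions get recorded as failures instead of being lost. This is a sketch, not part of the SDK — call_and_record is a hypothetical helper built on tracer.tool():

```python
def call_and_record(tracer, name, fn, **params):
    """Run a tool function and record its outcome on the tracer.

    Hypothetical convenience wrapper around tracer.tool(); not part of the SDK.
    """
    try:
        output = fn(**params)
        tracer.tool(name=name, params=params, output=output, success=True)
        return output
    except Exception as exc:
        # A failed call still gets recorded, which feeds TCA misfire detection.
        tracer.tool(name=name, params=params, output=None, success=False, error=str(exc))
        raise
```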
tracer.set_task_value(score: float)
Update the task quality score after validating the agent's answer. Call before analyse().
tracer.analyse() → TraceRazorReport
Submit the trace and return the report.
TraceRazorReport
| attribute | type | description |
|---|---|---|
| tas_score | float | 0–100 composite score |
| grade | str | Excellent, Good, Fair, Poor |
| passes | bool | tas_score >= threshold |
| savings | dict | tokens_saved, reduction_pct, monthly_savings_usd |
| fixes | list | auto-generated fix patches |
| anomalies | list | z-score alerts vs. agent baseline (after 5+ runs) |
| metrics | dict | raw per-metric scores (SRR, LDI, TCA, RDA, ISR, TUR, CCE, DBO) |
| .summary() | method | one-line string |
| .markdown() | method | full formatted report |
| .assert_passes() | method | raises AssertionError if TAS < threshold |
Multi-Agent Workflows
TraceRazor handles multi-agent systems natively. Each agent in your workflow gets its own tracer and independent report:
from tracerazor_sdk import Tracer
# Agent 1: Triage
with Tracer(agent_name="triage") as t:
    category = classify_request(user_query)
    t.reasoning(f"Classified as: {category}", tokens=120)
    triage_result = t.analyse()

# Agent 2: Resolution
with Tracer(agent_name="resolution") as t:
    answer = resolve_by_category(category)
    t.tool("search_kb", params={"q": category}, output=answer, success=True, tokens=180)
    resolution_result = t.analyse()
# Agent 3: Escalation (if needed)
escalation_result = None
if not resolution_result.passes:
    with Tracer(agent_name="escalation") as t:
        ticket = escalate_to_human()
        t.tool("create_ticket", params={}, output=ticket, success=True, tokens=100)
        escalation_result = t.analyse()

# Aggregate results (escalation only ran if resolution failed)
results = [triage_result, resolution_result]
if escalation_result is not None:
    results.append(escalation_result)

total_tokens = sum(r.total_tokens for r in results)
avg_efficiency = sum(r.tas_score for r in results) / len(results)
print(f"Workflow: {total_tokens} tokens | Efficiency: {avg_efficiency:.1f}/100")
See examples/multi_agent_workflow.py for a complete working example with 4 agents, tool calling, and cost analysis.
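With more agents, the ad-hoc sums above get unwieldy. A reusable aggregator can be sketched as follows — summarize is a hypothetical helper, not part of the SDK; it assumes only the documented total_tokens and tas_score report attributes:

```python
def summarize(results):
    """Aggregate token totals and average TAS across per-agent reports.

    Hypothetical helper; skips agents that never ran (None entries).
    """
    done = [r for r in results if r is not None]
    total_tokens = sum(r.total_tokens for r in done)
    avg_tas = sum(r.tas_score for r in done) / len(done)
    return total_tokens, avg_tas
```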
What's Audited
Each trace is analyzed across 13 independent signals:
Structural Efficiency:
- Step Redundancy (SRR) — near-duplicate steps
- Loop Detection (LDI) — repeated tool calls
- Tool Accuracy (TCA) — failed tool calls
- Reasoning Depth (RDA) — over-complex reasoning
- Information Sufficiency (ISR) — wasted steps
- Token Utilisation (TUR) — off-task content
- Context Efficiency (CCE) — duplicate context
- Decision Optimality (DBO) — suboptimal tool sequences
- Semantic Continuity (CSD) — reasoning drift [NEW in v0.2]
Verbosity & Presentation:
- Verbosity Density (VDI) — filler words, low-substance content
- Sycophancy/Hedging (SHL) — over-polite phrasing
- Compression Ratio (CCR) — highly compressible output
Optimization Validation:
- Adherence Score (IAR) — % of fixes that improved metrics on re-audit [NEW in v0.2]
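How these per-signal scores roll up into the single 0–100 TAS value is internal to TraceRazor, but the general shape of such a composite is a weighted mean over the signals. A sketch, with made-up equal weights rather than the real ones:

```python
def composite_score(scores, weights=None):
    # Weighted mean of per-signal scores (each 0-100). Equal weights by
    # default; the actual TAS weighting is internal to TraceRazor and
    # almost certainly differs from this illustration.
    weights = weights or {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```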
Examples
- Multi-agent customer support system — examples/multi_agent_workflow.py
- LangGraph integration — ../langgraph/examples/
- CrewAI integration — ../crewai/examples/
Troubleshooting
Q: How do I optimize an agent?
Use the Rust CLI:
tracerazor optimize trace.json --output optimized_prompt.txt --target-tas 85
Q: Can I compare two runs?
Yes:
tracerazor bench --before trace_v1.json --after trace_v2.json
Q: What if I don't have the tracerazor binary?
Use HTTP mode with a running server:
pip install tracerazor[http]
docker compose up # in the TraceRazor repo
Then pass server="http://localhost:8080" to Tracer().
License
Apache 2.0. See LICENSE.