# TraceRazor

Token efficiency auditing and adaptive sampling for production AI agents.
TraceRazor does two things:

- **Audit** your agent's traces to find wasted tokens, detect tool misfires and reasoning loops, generate fix patches, and estimate cost savings.
- **Sample** more reliably by running K parallel LLM candidates per step and picking the consensus winner, improving task pass rates without changing your agent's logic.

The two features are independent: use one, the other, or both.
## Install

```shell
pip install tracerazor
```

Install with optional dependencies as needed:

```shell
pip install "tracerazor[openai]"     # OpenAI adapter
pip install "tracerazor[anthropic]"  # Anthropic adapter
pip install "tracerazor[langgraph]"  # LangGraph integration
pip install "tracerazor[http]"       # HTTP mode for remote server
pip install "tracerazor[all]"        # Everything
```
## Audit quickstart

Record steps manually with `Tracer`, then call `analyse()` to get a report:

```python
from tracerazor import Tracer

with Tracer(agent_name="support-agent", framework="openai") as t:
    response = llm.invoke(prompt)
    t.reasoning(response.text, tokens=response.usage.total_tokens)

    result = lookup_order(order_id="ORD-123")
    t.tool("lookup_order", params={"order_id": "ORD-123"},
           output=str(result), success=True, tokens=80)

report = t.analyse()
print(report.summary())
# TAS 81.4/100 [Good] | 2 steps, 900 tokens | Saved 140 tokens (16%)

report.assert_passes()  # raises AssertionError in CI if TAS < 70
```
The `Tracer` submits the trace to the local `tracerazor` binary (CLI mode) or to a running `tracerazor-server` (HTTP mode). Build the binary with:

```shell
cargo build --release
```

Or point to an existing binary:

```shell
export TRACERAZOR_BIN=/path/to/tracerazor
```
## Sampling quickstart

`AdaptiveKNode` is a drop-in replacement for a LangGraph ReAct node. It samples K parallel LLM candidates at each step and picks the consensus winner.

```python
from tracerazor import AdaptiveKNode, openai_llm
from openai import AsyncOpenAI
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph

llm = openai_llm(AsyncOpenAI(), model="gpt-4.1")
node = AdaptiveKNode(llm=llm, tools=my_tools, k_max=5, k_min=2)

graph = StateGraph(AgentState)
graph.add_node("agent", node)
# ... add edges and compile as usual ...

result = await graph.ainvoke({"messages": [HumanMessage(content="...")]})
print(result["consensus_report"].summary())
```
K adapts automatically: it shrinks toward `k_min` when all candidates agree (saving tokens), and resets to `k_max` after a divergent vote or a state-mutating tool call (e.g. booking a flight, cancelling an order).
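The shrink/reset policy can be sketched in plain Python. This is an illustration of the rule described above, not TraceRazor's internal implementation; the class name and the one-step shrink schedule are assumptions.

```python
class AdaptiveK:
    """Illustrative controller: shrink K on agreement, reset on divergence or mutation."""

    def __init__(self, k_min: int = 2, k_max: int = 5):
        self.k_min, self.k_max = k_min, k_max
        self.k = k_max  # start wide

    def update(self, votes: list[str], tool_mutates: bool) -> int:
        """Return the K to use for the next step, given this step's outcome."""
        unanimous = len(set(votes)) == 1
        if tool_mutates or not unanimous:
            # Divergent vote or state-mutating tool: go back to full width.
            self.k = self.k_max
        else:
            # All candidates agreed: shrink one step toward k_min to save tokens.
            self.k = max(self.k_min, self.k - 1)
        return self.k
```

With `k_min=2, k_max=5`, two unanimous steps bring K from 5 down to 3; a single split vote or mutating tool call snaps it back to 5.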
## Baselines

Use `NaiveKEnsemble` and `SelfConsistencyBaseline` to benchmark your setup:

```python
from tracerazor import NaiveKEnsemble, SelfConsistencyBaseline
```

`NaiveKEnsemble` runs K independent full-task agents and picks the majority result. `SelfConsistencyBaseline` uses a single deterministic tool-calling pass, then re-samples the final response K times.
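At the core of both baselines is a majority vote over K candidate outputs. A minimal exact-match version looks like this (illustrative only; the library's actual comparison and normalization rules are not shown here):

```python
from collections import Counter

def majority_vote(candidates: list[str]) -> str:
    """Return the most common candidate; ties break toward the earliest-seen one."""
    counts = Counter(candidates)
    top = max(counts.values())
    # Counter preserves first-insertion order, so this scan is a stable tie-break.
    return next(c for c in counts if counts[c] == top)
```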
On the tau-bench airline benchmark (50 tasks, gpt-4o):
| Strategy | pass^1 | mean tokens | vs baseline |
|---|---|---|---|
| K=1 baseline | 38% | 63k | 1.0x |
| NaiveKEnsemble (K=5) | 40% | 282k | 4.5x |
| AdaptiveKNode (K=5) | 46% | 246k | 3.9x |
| SelfConsistency (K=5) | 48% | 137k | 2.2x |
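The "vs baseline" column is simply mean tokens relative to the K=1 run:

```python
baseline = 63  # mean tokens (thousands) for the K=1 run
for name, toks in [("NaiveKEnsemble", 282),
                   ("AdaptiveKNode", 246),
                   ("SelfConsistency", 137)]:
    print(f"{name}: {toks / baseline:.1f}x")
# NaiveKEnsemble: 4.5x
# AdaptiveKNode: 3.9x
# SelfConsistency: 2.2x
```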
## Audit API

| Name | Description |
|---|---|
| `Tracer` | Context manager for recording steps and submitting for analysis |
| `TraceRazorClient` | Lower-level client for submitting trace dicts directly |
| `TraceRazorReport` | Parsed audit result with TAS score, metrics, fixes, and savings |
| `TraceStep` | Data class for a single recorded step |
## Sampling API

| Name | Description |
|---|---|
| `AdaptiveKNode` | LangGraph node with per-step adaptive parallel sampling |
| `ExactMatchConsensus` | Aggregates K branch proposals by exact-match comparison |
| `MutationMetadata` | Classifies tools as mutating vs read-only |
| `NaiveKEnsemble` | K independent full-task agents, majority vote |
| `SelfConsistencyBaseline` | K re-samples of the final response only |
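`MutationMetadata`'s job, deciding which tools can change external state, can be illustrated with a simple name-based heuristic. This is a hypothetical sketch, not the library's actual classification rules:

```python
# Prefixes we treat as obviously read-only; everything else is assumed mutating.
READ_ONLY_PREFIXES = ("get_", "list_", "lookup_", "search_", "read_")

def is_mutating(tool_name: str) -> bool:
    """Safe default: anything that is not an obvious read counts as state-mutating."""
    return not tool_name.startswith(READ_ONLY_PREFIXES)
```

Defaulting unknown tools to "mutating" is the conservative choice here: a false positive only resets K to `k_max`, while a false negative could let a divergent branch book a flight.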
## LLM adapters

| Name | Description |
|---|---|
| `openai_llm` | Adapter factory for `AsyncOpenAI` |
| `anthropic_llm` | Adapter factory for `AsyncAnthropic` |
| `mock_llm` | Deterministic mock for tests and offline demos |
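For offline tests, a deterministic stub in the spirit of `mock_llm` can be as simple as an async callable that replays canned responses (illustrative only; the real adapter's signature is not documented here):

```python
def make_mock_llm(responses: list[str]):
    """Return an async callable that replays canned responses in order."""
    it = iter(responses)

    async def mock_llm(prompt: str) -> str:
        # Ignore the prompt and return the next canned response.
        return next(it)

    return mock_llm
```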
## License

MIT. Copyright (c) 2024 Zulfaqar Hafez.