
TraceRazor

Token efficiency auditing and adaptive sampling for production AI agents.

TraceRazor does two things:

Audit your agent's traces to find wasted tokens, detect tool misfires and reasoning loops, generate fix patches, and estimate cost savings.

Sample more reliably by running K parallel LLM candidates per step and picking the consensus winner. This improves task pass rates without changing your agent's logic.

Both features are independent. Use one, the other, or both.


Install

pip install tracerazor

Install with optional dependencies as needed:

pip install "tracerazor[openai]"        # OpenAI adapter
pip install "tracerazor[anthropic]"     # Anthropic adapter
pip install "tracerazor[langgraph]"     # LangGraph integration
pip install "tracerazor[http]"          # HTTP mode for remote server
pip install "tracerazor[all]"           # Everything

Audit quickstart

Record steps manually with Tracer, then call analyse() to get a report:

from tracerazor import Tracer

# `llm`, `prompt`, and `lookup_order` below stand in for your own
# client, prompt, and tool function.
with Tracer(agent_name="support-agent", framework="openai") as t:
    response = llm.invoke(prompt)
    t.reasoning(response.text, tokens=response.usage.total_tokens)

    result = lookup_order(order_id="ORD-123")
    t.tool("lookup_order", params={"order_id": "ORD-123"},
           output=str(result), success=True, tokens=80)

report = t.analyse()
print(report.summary())
# TAS 81.4/100 [Good] | 2 steps, 900 tokens | Saved 140 tokens (16%)

report.assert_passes()  # raises AssertionError in CI if TAS < 70

The Tracer submits the trace to the local tracerazor binary (CLI mode) or to a running tracerazor-server (HTTP mode). Build the binary with:

cargo build --release

Or point to an existing binary:

export TRACERAZOR_BIN=/path/to/tracerazor

Sampling quickstart

AdaptiveKNode is a drop-in replacement for a LangGraph ReAct node. It samples K parallel LLM candidates at each step and picks the consensus winner.

from tracerazor import AdaptiveKNode, openai_llm
from openai import AsyncOpenAI
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph

# `AgentState` and `my_tools` are your own state schema and tool list.
llm = openai_llm(AsyncOpenAI(), model="gpt-4.1")
node = AdaptiveKNode(llm=llm, tools=my_tools, k_max=5, k_min=2)

graph = StateGraph(AgentState)
graph.add_node("agent", node)
# ... add edges and compile as usual ...

result = await graph.ainvoke({"messages": [HumanMessage(content="...")]})
print(result["consensus_report"].summary())

K adapts automatically: it shrinks toward k_min when all candidates agree (saving tokens), and resets to k_max after a divergent vote or a state-mutating tool call (e.g. booking a flight, cancelling an order).
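The rule above can be sketched in a few lines. This is an illustrative reimplementation of the described behaviour, not TraceRazor's actual internals; `next_k` and its parameters are hypothetical names:

```python
def next_k(k: int, votes: list[str], mutating: bool,
           k_min: int = 2, k_max: int = 5) -> int:
    """Sketch of the adaptive-K rule described above (illustrative only).

    Unanimous agreement shrinks K toward k_min; a divergent vote or a
    state-mutating tool call resets K to k_max.
    """
    if mutating or len(set(votes)) > 1:
        return k_max              # divergence or risky action: back to full K
    return max(k_min, k - 1)      # unanimous: spend fewer tokens next step
```

For example, five unanimous votes at K=5 would shrink K to 4, while any split vote (or a tool like `cancel_order`) would reset it to 5.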


Baselines

Use NaiveKEnsemble and SelfConsistencyBaseline to benchmark your setup:

from tracerazor import NaiveKEnsemble, SelfConsistencyBaseline

NaiveKEnsemble runs K independent full-task agents and picks the majority result. SelfConsistencyBaseline uses a single deterministic tool-calling pass, then re-samples the final response K times.

In tau-bench airline benchmarks (50 tasks, gpt-4o):

Strategy                 pass^1   mean tokens   vs baseline
K=1 baseline             38%      63k           1.0x
NaiveKEnsemble (K=5)     40%      282k          4.5x
AdaptiveKNode (K=5)      46%      246k          3.9x
SelfConsistency (K=5)    48%      137k          2.2x
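A back-of-envelope cost model explains the shape of the numbers above (illustrative only; `estimated_tokens` is not part of the library, and real traces vary with caching and early exits):

```python
def estimated_tokens(strategy: str, full_task: int, final_step: int, k: int) -> int:
    """Rough token-cost model for the two baselines (illustrative only)."""
    if strategy == "naive_k":
        # K independent full-task runs, each paying the full cost.
        return k * full_task
    if strategy == "self_consistency":
        # One deterministic tool-calling pass, then K-1 extra final re-samples.
        return full_task + (k - 1) * final_step
    raise ValueError(f"unknown strategy: {strategy}")
```

The naive ensemble scales with the whole trajectory, while self-consistency only pays extra for the final response, which is why its multiplier stays far lower at the same K.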

Audit API

Name               Description
Tracer             Context manager for recording steps and submitting for analysis
TraceRazorClient   Lower-level client for submitting trace dicts directly
TraceRazorReport   Parsed audit result with TAS score, metrics, fixes, and savings
TraceStep          Data class for a single recorded step

Sampling API

Name                     Description
AdaptiveKNode            LangGraph node with per-step adaptive parallel sampling
ExactMatchConsensus      Aggregates K branch proposals by exact-match comparison
MutationMetadata         Classifies tools as mutating vs read-only
NaiveKEnsemble           K independent full-task agents, majority vote
SelfConsistencyBaseline  K re-samples of the final response only
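The idea behind ExactMatchConsensus can be illustrated with a plain majority vote over candidate proposals; this sketch is not the library's implementation, and `exact_match_winner` is a hypothetical name:

```python
from collections import Counter

def exact_match_winner(proposals: list[str]) -> tuple[str, float]:
    """Pick the most common proposal by exact string match and report
    its vote share (illustrative, not the library's actual code)."""
    winner, count = Counter(proposals).most_common(1)[0]
    return winner, count / len(proposals)
```

A unanimous vote yields a share of 1.0; a low share is the kind of divergence signal that would push K back toward k_max.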

LLM adapters

Name           Description
openai_llm     Adapter factory for AsyncOpenAI
anthropic_llm  Adapter factory for AsyncAnthropic
mock_llm       Deterministic mock for tests and offline demos
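A deterministic mock in the spirit of mock_llm can be as simple as a closure over canned responses. This sketch assumes nothing about tracerazor's actual adapter interface; `make_mock_llm` is a hypothetical helper for offline tests:

```python
import asyncio

def make_mock_llm(responses):
    """Return an async callable that yields canned responses in order
    (illustrative; not tracerazor's mock_llm implementation)."""
    it = iter(responses)

    async def llm(messages):
        # Ignore the input and replay the next scripted response.
        return next(it)

    return llm

# Example: two scripted turns for an offline test.
llm = make_mock_llm(["Looking up the order.", "It shipped yesterday."])
first = asyncio.run(llm([{"role": "user", "content": "Where is my order?"}]))
```

Because the responses are fixed, tests that exercise consensus or tracing logic stay fully deterministic and need no network access.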

License

MIT. Copyright (c) 2024 Zulfaqar Hafez.
