TRACER: Trace-Based Adaptive Cost-Efficient Routing. Turn LLM traces into parity-gated routing policies - cut 90%+ of LLM calls with formal guarantees.
TRACER
Trace-Based Adaptive Cost-Efficient Routing
Most LLM-based classification pipelines use a large language model for every single input. In practice, the vast majority of that traffic is predictable - a lightweight traditional ML model (logistic regression, gradient-boosted trees, or a small neural net) can match the LLM's output with near-perfect agreement.
TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast, non-LLM surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM. Every deferred call produces a new trace, which feeds the next refit - coverage grows automatically over time. The result: 90%+ of classification calls routed to traditional ML, with formal parity guarantees against the teacher LLM and a self-improving routing policy.
pip install tracer-llm
See it work
tracer demo
TRACER Demo - Banking77 (77 intents · 1,500 traces)
Routing Policy
method l2d
coverage 80.7% of traffic handled by surrogate
teacher TA 0.951 surrogate matches teacher on handled traffic
Cost Projection (10k queries/day)
Without TRACER 10,000 LLM calls/day $20.00/day
With TRACER 1,926 LLM calls/day $3.85/day $5,894 saved/yr
Quickstart
Input: a JSONL file where each line contains the original text (input) and the label your LLM assigned (teacher).
import tracer

# 1. Fit - learn a routing policy from your LLM's classification traces
result = tracer.fit(
    "traces.jsonl",    # {"input": "...", "teacher": "label"} per line
    embeddings=X,      # np.ndarray (n, dim) - precomputed text embeddings
    config=tracer.FitConfig(target_teacher_agreement=0.95),
)

# 2. Route - surrogate handles easy inputs, LLM handles the rest
router = tracer.load_router(".tracer", embedder=embedder)  # same embedder used at fit time
out = router.predict("What is my balance?")
# {"label": "check_balance", "decision": "handled", "accept_score": 0.96}

# 3. Fallback - invokes the LLM only when the surrogate declines
text = "Some edge case"
out = router.predict(text, fallback=lambda: call_my_llm(text))
Want to go deeper? The concepts guide explains the full pipeline, model zoo, and parity gate. The API reference covers every parameter. The CLI reference covers tracer fit, tracer serve, and more.
How it works
User query → [Embedder] → [ML Surrogate] → [Acceptor Gate]
                                             |           |
                                       score >= t    score < t
                                             |           |
                                      Local answer   Defer to LLM
                                    (traditional ML)
The surrogate is not another LLM - it is a classical ML or shallow deep-learning model (the model zoo includes logistic regression, SGD, LightGBM, random forests, and small feed-forward nets). This is what makes the cost reduction real: inference is CPU-bound, sub-millisecond, and effectively free per query.
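To make the latency claim concrete, here is a standalone sketch (plain scikit-learn on synthetic vectors standing in for text embeddings, not TRACER code) that times a single surrogate prediction on CPU:

```python
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic "embeddings": random 64-dim vectors with 10 pretend intent labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 64))
y_train = rng.integers(0, 10, size=1000)
clf = LogisticRegression(max_iter=500).fit(X_train, y_train)

# Time one prediction - a dot product plus an argmax, no GPU, no network.
x = rng.normal(size=(1, 64))
start = time.perf_counter()
label = int(clf.predict(x)[0])
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"class {label} in {elapsed_ms:.3f} ms")
```

On any modern CPU this runs far below a millisecond per query, which is what makes per-call surrogate cost negligible next to an LLM round trip.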
- Fit - train a suite of candidate surrogates on your LLM's classification traces; select the best via cross-validated teacher agreement
- Gate - attach a learned acceptor that estimates, per-input, whether the surrogate will agree with the teacher
- Calibrate - sweep the acceptor threshold to maximise coverage at your target parity (e.g. ≥ 95% teacher agreement)
- Guard - block deployment if the best candidate cannot clear the parity bar on held-out data
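The calibrate and guard steps above can be sketched as a simple threshold sweep on held-out data (an illustration of the idea, not TRACER's actual implementation; `calibrate_threshold` is a hypothetical name): pick the threshold that maximises coverage while the accepted subset still meets the parity target, and report failure when no threshold clears the bar.

```python
import numpy as np

def calibrate_threshold(accept_scores, agrees, target=0.95):
    """Sweep acceptor thresholds on held-out data; return the (coverage,
    threshold) pair with maximum coverage subject to teacher agreement
    >= target on the accepted subset, or None if nothing clears the bar."""
    best = None
    for t in np.unique(accept_scores):
        accepted = accept_scores >= t
        if not accepted.any():
            continue
        agreement = agrees[accepted].mean()   # parity on handled traffic
        coverage = accepted.mean()            # fraction handled locally
        if agreement >= target and (best is None or coverage > best[0]):
            best = (coverage, t)
    return best  # None is the "guard" case: block deployment

# Six held-out inputs: acceptor scores and whether the surrogate matched the teacher.
scores = np.array([0.99, 0.97, 0.92, 0.80, 0.55, 0.40])
agrees = np.array([1, 1, 1, 1, 0, 0], dtype=float)
coverage, threshold = calibrate_threshold(scores, agrees, target=0.95)
# Accepting at threshold 0.80 handles 4/6 inputs with perfect agreement.
```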
Benchmark results (Banking77 - 77-class intent classification)
| Metric | Value |
|---|---|
| Coverage | 92.2% of traffic handled locally |
| Teacher agreement (handled) | 96.1% |
| End-to-end accuracy | 96.4% |
| Annual savings (10k queries/day) | $302,850 |
Continual learning flywheel
TRACER is not a one-shot fit. Every deferred input that reaches the LLM produces a new labeled trace, which feeds back into the next refit. As the surrogate sees more of the input distribution, its coverage grows - meaning fewer LLM calls, which in turn cost less, while the quality guarantee holds at every iteration.
Day 1: 2,000 traces → 84% coverage → 8,400 calls/day saved
Day 3: 6,000 traces → 90% coverage → 9,000 calls/day saved
Day 5: 10,000 traces → 92% coverage → 9,200 calls/day saved
tracer.update("new_traces.jsonl", embeddings=X_new) # refit with new production traces
The parity gate re-calibrates on each update, so coverage only increases when the surrogate actually earns it.
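The flywheel loop can be sketched as follows. This is a shape illustration only: `StubRouter` stands in for TRACER's router so the snippet is self-contained, and the `"deferred"` decision value mirrors the quickstart's `"handled"` but is an assumption here.

```python
class StubRouter:
    """Stand-in for tracer's router, just to make the loop runnable here:
    pretend short queries are 'easy' and everything else defers to the LLM."""
    def predict(self, text, fallback=None):
        if len(text) < 20:
            return {"label": "check_balance", "decision": "handled"}
        return {"label": fallback(), "decision": "deferred"}

def handle(router, query, llm_call, trace_log):
    """Route one query; every deferred (paid) LLM call yields a fresh
    (input, teacher) trace for the next refit."""
    out = router.predict(query, fallback=lambda: llm_call(query))
    if out["decision"] == "deferred":
        trace_log.append({"input": query, "teacher": out["label"]})
    return out

log = []
router = StubRouter()
handle(router, "What is my balance?", lambda q: "other", log)  # handled locally, no trace
handle(router, "A long, genuinely ambiguous edge-case query", lambda q: "other", log)  # deferred + logged
# `log` now holds one new trace; periodically flush it to JSONL and refit
# with tracer.update("new_traces.jsonl", embeddings=X_new).
```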
Embedder options
from tracer import Embedder
embedder = Embedder.from_sentence_transformers("BAAI/bge-small-en-v1.5") # local
embedder = Embedder.from_endpoint("https://api.example.com/embed", headers={...}) # API
embedder = Embedder.from_callable(my_fn) # any function
# or skip the embedder and pass raw np.ndarray embeddings directly
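For `Embedder.from_callable`, the assumed contract is a callable mapping a list of texts to an `(n, dim)` float array. Here is a dependency-free hashing embedder that satisfies that shape contract (`hashing_embed` is a hypothetical example function, not part of tracer):

```python
import numpy as np

def hashing_embed(texts, dim=64):
    """Toy bag-of-words hashing embedder: texts -> (n, dim) float array."""
    X = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            X[i, hash(token) % dim] += 1.0
    # L2-normalise rows so downstream models see comparable magnitudes
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)

X = hashing_embed(["What is my balance?", "Freeze my card"])
# embedder = Embedder.from_callable(hashing_embed)
```

A real deployment would use a sentence-transformer or API embedder as above; the point is only that any function with this signature plugs in.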
Need to compute embeddings at fit time?
pip install tracer-llm[embeddings] # adds sentence-transformers
X = tracer.embed(texts) # default: all-MiniLM-L6-v2 (384-dim)
CLI
| Command | What it does |
|---|---|
| tracer demo | Zero-setup demo on real data |
| tracer fit traces.jsonl --target 0.95 | Fit a routing policy |
| tracer update new_traces.jsonl | Refit with new traces |
| tracer report-html | Open the HTML audit report |
| tracer serve .tracer --port 8000 | HTTP prediction server |
What's in .tracer/
| File | Contents |
|---|---|
| manifest.json | Method, coverage, teacher agreement, label space |
| pipeline.joblib | Surrogate + acceptor + calibrated thresholds |
| frontier.json | All candidates at each quality target |
| qualitative_report.json | Per-label slices, boundary pairs, examples |
| report.html | Visual audit report |
Install
pip install tracer-llm # core (numpy + sklearn + joblib)
pip install tracer-llm[embeddings] # + sentence-transformers
pip install tracer-llm[all] # everything
Docs
| Doc | Covers |
|---|---|
| Concepts | Pipeline internals, model zoo, parity gate |
| API reference | Every function, parameter, and return type |
| CLI reference | tracer fit, tracer serve, tracer demo, and more |
| Artifacts | .tracer/ directory schema |
| AGENTS.md | Integration guide for AI coding assistants |
Paper
A research paper detailing the approach, formal guarantees, ablation studies, limitations, and reproducible experiment tooling is in preparation. It will be linked here upon publication.
License
MIT