Adaptive test-time-compute routing for LLM reasoning: cheap samples first, escalate to native thinking only on disagreement.
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
ZEN — Sample · Agree · Escalate
Adaptive test-time compute for LLMs: stop paying for thinking your model doesn't need.
ZEN routes every request through the cheapest path that can solve it. It draws two cheap (non-thinking) samples first and only escalates to expensive native thinking when they disagree — using disagreement as a free difficulty signal. On hard benchmarks it matches or beats native-thinking accuracy at lower token cost; on easy traffic it answers for a fraction of the price.
pip install zen-router
from zen import ZenGateway
gw = ZenGateway() # any OpenAI-compatible endpoint
resp = gw.route("What is 15% of 240?") # classifies, routes, answers
print(resp.text) # "36"
print(resp.path) # "consensus" (solved cheaply — no thinking spent)
print(resp.tokens) # ~220
No training. No GPU. No logprobs required. Provider-agnostic.
Why
Thinking/reasoning modes are powerful and expensive, and they charge you for
every hidden reasoning token. A trivial question costs roughly 10× more with
thinking enabled (for 24*17 we measured 6 completion tokens without thinking
vs 66 with it, an 11× gap). Chat apps today either burn that on every message
or make the user toggle thinking by hand. ZEN makes the decision per request,
automatically, with an auditable token account for every call.
How it works
    request ──> 2 cheap samples (parallel, thinking off)
                     │
            agree? ──┴── yes ──> answer (~4k tokens on AIME)
                     │
                     no (= this one is actually hard)
                     │
                     ▼
      3rd cheap sample + 1 native-thinking sample (parallel)
                     │
                     ▼
      weighted vote {cheap ×1 each, native ×2} ──> answer (~19k tokens)
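The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not ZEN's implementation: the `Sampler` signature, the `sample_agree_escalate` and `scripted` names, and the tie-breaking rule (ties go to the native answer) are assumptions of this sketch, and the samples run sequentially here rather than in parallel.

```python
from collections import Counter
from typing import Callable

# Hypothetical sampler signature: takes a prompt, returns a parsed short answer.
Sampler = Callable[[str], str]


def sample_agree_escalate(prompt: str, cheap: Sampler, native: Sampler,
                          native_weight: int = 2) -> tuple[str, str]:
    """Return (answer, path), where path is "consensus" or "escalated"."""
    first, second = cheap(prompt), cheap(prompt)
    if first == second:
        # Agreement is treated as the signal that the question is easy.
        return first, "consensus"

    # Disagreement: add a 3rd cheap sample and one native-thinking sample.
    votes = Counter()
    for answer in (first, second, cheap(prompt)):
        votes[answer] += 1
    native_answer = native(prompt)
    votes[native_answer] += native_weight

    # Assumption of this sketch: a tie is resolved in favor of the native answer.
    top = max(votes.values())
    winners = [a for a, weight in votes.items() if weight == top]
    answer = native_answer if native_answer in winners else winners[0]
    return answer, "escalated"


def scripted(answers):
    """Build a fake sampler that replays a fixed list of answers (for offline demos)."""
    replay = iter(answers)
    return lambda prompt: next(replay)


print(sample_agree_escalate("q", scripted(["36", "36"]), scripted([])))
print(sample_agree_escalate("q", scripted(["12", "15", "12"]), scripted(["15"])))
```

In the second call the cheap samples split 2 to 1 for "12", but the native sample's weight of 2 lifts "15" to 3 votes, so the escalated path returns "15".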
Benchmarks
DeepSeek-V4-Flash via OpenRouter, single-sample protocol, tokens counted across all calls each method makes. N=30 per benchmark — treat ±9pp as noise. Full tables and methodology: docs/RESULTS.md.
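For a sense of scale on that noise figure: the README does not say how ±9pp was derived, but it is consistent with one binomial standard error at N=30 in the worst case (accuracy near 50%). The helper name below is an assumption of this sketch.

```python
import math


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy p estimated from n independent problems."""
    return math.sqrt(p * (1.0 - p) / n)


# Worst case (p = 0.5) with N = 30 problems, in percentage points.
print(round(100 * binomial_standard_error(0.5, 30), 1))  # 9.1

# One problem out of 30 moves accuracy by this many percentage points.
print(round(100 / 30, 1))  # 3.3
```

So a gap of two or three problems between methods sits inside the noise band.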
AIME 2025 (hard for the model — native thinking spends ~15k tokens/problem):
| method | accuracy | mean tokens |
|---|---|---|
| single cheap call | 40.0% | 2.0k |
| self-consistency@10 | 43.3% | 18.2k |
| best-of-5 + LLM judge | 50.0% | 13.9k |
| native thinking (1 call) | 56.7% | 14.6k |
| ZEN (vote) | 56.7–66.7% (2 runs) | 12.8k |
AIME 2024 (easy for the model — native thinking self-regulates to ~9k):
| method | accuracy | mean tokens |
|---|---|---|
| native thinking (1 call) | 76.7% | 9.2k |
| ZEN (vote) | 73.3% | 9.8k |
Honest reading: ZEN wins when the task challenges the model (cuts waste, adds vote robustness). When the task is easy for the model, it is a statistical tie with slight overhead — modern thinking modes already self-regulate. Rule of thumb from the data: ZEN pays off when ≥ ~35–55% of your traffic is cheaply solvable.
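A rough way to reason about your own traffic is a two-path cost mix. The sketch below assumes that every request costs either the consensus price or the escalated price, and it plugs in the rounded AIME figures from the diagram (~4k and ~19k tokens). Both function names are hypothetical, and the model is not meant to reproduce the rule of thumb above, which comes from the full data.

```python
def expected_tokens(consensus_fraction: float,
                    consensus_cost: float = 4_000,
                    escalated_cost: float = 19_000) -> float:
    """Mean tokens per request when a given fraction is solved on the cheap path."""
    return (consensus_fraction * consensus_cost
            + (1 - consensus_fraction) * escalated_cost)


def implied_consensus_fraction(mean_cost: float,
                               consensus_cost: float = 4_000,
                               escalated_cost: float = 19_000) -> float:
    """Invert expected_tokens: which consensus fraction yields this mean cost?"""
    return (escalated_cost - mean_cost) / (escalated_cost - consensus_cost)


# AIME 2025 row: ZEN averaged 12.8k tokens per problem.
print(round(implied_consensus_fraction(12_800), 2))  # 0.41

for fraction in (0.0, 0.5, 1.0):
    print(fraction, expected_tokens(fraction))
```

Under these assumptions the AIME 2025 average corresponds to roughly 41% of problems ending on the consensus path.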
The three layers
| layer | what it does | use it for |
|---|---|---|
| ZenGateway | classifies each message (chat / question / task) and dispatches the right amount of compute | chat apps, AI workspaces (an automatic thinking mode) |
| ZenRouter | cheap consensus → thinking escalation, for verifiable answers | math, MCQ, facts, extraction, code-with-tests |
| ZenPlanner | plan-and-execute: decompose, run steps with threaded context, synthesize | multi-step tasks, agent pipelines |
ZenGateway — the automatic thinking mode
from zen import ZenGateway
gw = ZenGateway()
gw.route("hey, what do you think about coffee?") # chat -> 1 cheap call
gw.route("Which planet has the most moons?") # question -> consensus route
gw.route("Compare SQLite and PostgreSQL and recommend one.") # task -> planner
gw.route(msg, kind="question") # or force the kind yourself
Tool-safety rule (built in): pass tools_present=True on turns where the
model may call tools. The gateway then makes exactly one routed call — it
never samples in parallel around side-effectful tools (two samples would run
your tools twice). Your agent loop handles the tool cycle around it.
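The reason for the rule is easy to see in a toy dispatcher. This is a sketch only: `dispatch`, `call_model` and `model_with_tool` are hypothetical names, and the fake model stands in for any call that may trigger a side-effectful tool.

```python
from typing import Callable


def dispatch(message: str, call_model: Callable[[str], str],
             tools_present: bool = False, cheap_samples: int = 2) -> list[str]:
    """Return the raw model outputs produced for one turn."""
    if tools_present:
        # Exactly one call: parallel samples could each trigger the same tool.
        return [call_model(message)]
    return [call_model(message) for _ in range(cheap_samples)]


side_effects = []


def model_with_tool(message: str) -> str:
    side_effects.append("send_email")  # stands in for a side-effectful tool call
    return "done"


dispatch("email Bob", model_with_tool, tools_present=True)
print(len(side_effects))  # 1
```

Dropping `tools_present=True` in the same call would record the side effect twice, once per cheap sample.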
ZenRouter — verifiable Q→A
from zen import ZenRouter
router = ZenRouter() # math-style prompt/parser by default
result = router.solve("If 3x + 7 = 22, what is x?")
result.answer, result.tokens, result.path # 5, ~4k, "consensus"
ZenRouter(
    variant="vote",           # "vote" (validated best) | "eager" | "hybrid"
    temperature=0.7,          # sampling diversity for consensus
    native_weight=2,          # native sample's weight in the final vote
    think_budget=16000,       # completion budget of the native call
    parser=my_extractor,      # swap the answer parser for your domain
    raw_log="samples.jsonl",  # dump every sample for offline analysis
)
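As an illustration of what a domain parser might look like, here is a toy extractor for multiple-choice answers. The README does not document the parser signature, so this sketch assumes a callable that takes the raw completion text and returns a normalized answer (or None when nothing is found); check the package before relying on that shape.

```python
import re
from typing import Optional

# Matches phrases such as "answer is (b)", "Answer: C" or "answer D".
_CHOICE = re.compile(r"\banswer\s*(?:is|:)?\s*\(?([A-E])\b\)?", re.IGNORECASE)


def my_extractor(completion: str) -> Optional[str]:
    """Return the last multiple-choice letter the model commits to, or None."""
    matches = _CHOICE.findall(completion)
    # Upper-casing normalizes answers so that "b" and "B" count as agreement.
    return matches[-1].upper() if matches else None


print(my_extractor("I first thought the answer is A, but the answer is (c)."))  # C
print(my_extractor("Answer: B"))  # B
print(my_extractor("I am not sure."))  # None
```

Whatever the parser returns is what consensus compares, so normalization (case, brackets, whitespace) matters more than coverage: two samples only agree if their parsed answers are identical.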
ZenPlanner — plan-and-execute with an agent hook
from zen import ZenPlanner
def my_agent_step(step_description, context):
    # run your tool-calling agent (e.g. SIFT) on this step, return text
    return my_agent.run(step_description, context)
planner = ZenPlanner(executor=my_agent_step) # omit executor = pure reasoning
result = planner.run("Research X, compare with Y, write a recommendation.")
result.text # final deliverable
result.steps # [(step, result), ...]
The planner fixes the classic plan-and-execute inefficiency: each step receives the task + plan + clipped results of prior steps — not the full reasoning history — so cost grows linearly, not quadratically. A step that fails is retried once with native thinking (per-step escalation). Plans with fewer than two steps skip orchestration entirely.
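The loop described above could look roughly like this. It is a sketch under stated assumptions, not the ZenPlanner source: the `Executor` signature with a `thinking` flag, the `run_plan` name, the context layout, and the choice to treat a raised exception as "the step failed" are all hypothetical.

```python
from typing import Callable

# Hypothetical executor signature: (step, context, thinking) -> result text.
Executor = Callable[[str, str, bool], str]


def run_plan(task: str, plan: list[str], execute: Executor,
             clip_chars: int = 200) -> list[tuple[str, str]]:
    """Run each step with task + plan + clipped prior results as context."""
    if len(plan) < 2:
        # Nothing to orchestrate: answer the task directly in one call.
        return [(task, execute(task, "", False))]

    def clipped(text: str) -> str:
        # Keep the head and the tail of long results instead of truncating.
        if len(text) <= clip_chars:
            return text
        half = clip_chars // 2
        return text[:half] + " ... " + text[-half:]

    results = []
    for step in plan:
        prior = "\n".join(f"- {s}: {clipped(r)}" for s, r in results)
        context = f"Task: {task}\nPlan: {'; '.join(plan)}\nDone so far:\n{prior}"
        try:
            output = execute(step, context, False)
        except Exception:
            # One retry with native thinking; a second failure propagates.
            output = execute(step, context, True)
        results.append((step, output))
    return results


calls = []


def flaky(step: str, context: str, thinking: bool) -> str:
    calls.append((step, thinking))
    if step == "compare" and not thinking:
        raise RuntimeError("step failed")
    return f"result of {step}"


print(run_plan("demo", ["research", "compare", "write"], flaky))
print(calls)
```

Only the "compare" step is executed twice, and only its second attempt runs with thinking enabled; the other steps stay on the cheap path.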
Configuration
Point ZEN at any OpenAI-compatible endpoint:
export ZEN_LLM_BASE_URL=https://openrouter.ai/api/v1
export ZEN_LLM_MODEL=deepseek/deepseek-v4-flash
export ZEN_LLM_API_KEY=sk-or-...
or use a built-in profile (reads the key from OPENROUTER_API_KEY or a
git-ignored .secrets.json):
from zen import config
config.apply_profile("deepseek-v4-flash")
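If you set the environment variables by hand, a small pre-flight check in your own script can catch a missing value before any request is made. This is a convenience sketch, not how ZEN reads its configuration; `load_zen_settings` is a hypothetical helper, and only the three variable names come from this README.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("ZEN_LLM_BASE_URL", "ZEN_LLM_MODEL", "ZEN_LLM_API_KEY")


def load_zen_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the three ZEN_LLM_* variables, failing loudly if one is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["ZEN_LLM_BASE_URL"].rstrip("/"),
        "model": env["ZEN_LLM_MODEL"],
        "api_key": env["ZEN_LLM_API_KEY"],
    }


# Passing a plain dict keeps the check testable without touching os.environ.
settings = load_zen_settings({
    "ZEN_LLM_BASE_URL": "https://openrouter.ai/api/v1/",
    "ZEN_LLM_MODEL": "deepseek/deepseek-v4-flash",
    "ZEN_LLM_API_KEY": "sk-or-placeholder",
})
print(settings["base_url"])  # https://openrouter.ai/api/v1
```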
Escalation needs a model exposing a thinking toggle (DeepSeek, Qwen, GPT reasoning-effort, Claude extended thinking...). Models without one still work — ZEN then behaves as consensus routing.
Everything is injectable — client_factory=, classifier=, parser=,
executor= — so ZEN stays provider-agnostic and fully testable offline:
python tests/test_offline.py # 31 tests, zero API calls
Negative results we kept (so you don't rediscover them)
- Confidence gating (accept a high-logprob first sample): logprob coverage
  through OpenRouter is partial (~47%) and confidence did not predict
  correctness on AIME; the non-thinking model is often confidently wrong.
  Available as gate_tail_logprob=, off by default.
- Halving the native think budget (think_budget=8000): saved only ~10% of
  total tokens and cost accuracy exactly where thinking was needed.
- Truncating candidates in judge/aggregation prompts silently destroys
  them; keep the head and the tail (zen.parsing.clip).
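To make the last point concrete, here is a head-and-tail clip next to naive truncation. It is a sketch of the idea only: `clip_head_tail` is a hypothetical function written for this example and makes no claim about how zen.parsing.clip is implemented.

```python
def clip_head_tail(text: str, limit: int, marker: str = " [...] ") -> str:
    """Shorten text to at most `limit` characters, keeping both ends."""
    if len(text) <= limit:
        return text
    keep = limit - len(marker)
    if keep <= 0:
        # The limit is too small to fit the marker: fall back to the head only.
        return text[:limit]
    head = keep // 2
    tail = keep - head  # always >= 1 here, so the slice below is never empty
    return text[:head] + marker + text[-tail:]


# A long candidate whose final answer sits at the very end.
candidate = "Step 1. " * 50 + "Final answer: 204"

truncated = candidate[:60]
clipped = clip_head_tail(candidate, 60)

print("204" in truncated)  # False
print("204" in clipped)    # True
print(len(clipped))        # 60
```

Reasoning traces tend to put the committed answer last, which is why cutting from the end hands the judge a candidate with no answer in it.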
Honest caveats: results come from one model family and N=30 math benchmarks;
consensus needs comparable short answers (v0.2 targets verifiable outputs —
swap parser= for your domain, open-ended text is future work).
Repo layout
zen/ the package: gateway, router, planner, client, parsing, config
tests/ offline test suite (no API key needed)
experiments/ evaluation harnesses (AIME/MATH/GSM8K) + the RL research line
docs/ full results tables + development log
Related work
ZEN distills ideas from self-consistency (Wang et al.), adaptive-consensus and fast/slow-routing research (AdaptThink, DART), DeepConf, RSA and RL of Thoughts — the tiny-controller idea that started this project. ZEN's contribution is the disagreement-routed cheap→thinking cascade with honest, per-call token accounting.
Works well next to SIFT (tool retrieval and calling by the same author): SIFT decides what the model can do, ZEN decides how hard it should think.
License
MIT