Adaptive test-time-compute routing for LLM reasoning: cheap samples first, escalate to native thinking only on disagreement.

Maintainers

victoralves0

Project description

ZEN — Sample · Agree · Escalate

Adaptive test-time compute for LLMs: stop paying for thinking your model doesn't need.

ZEN routes every request through the cheapest path that can solve it. It draws two cheap (non-thinking) samples first and only escalates to expensive native thinking when they disagree — using disagreement as a free difficulty signal. On hard benchmarks it matches or beats native-thinking accuracy at lower token cost; on easy traffic it answers for a fraction of the price.

pip install zen-router

from zen import ZenGateway

gw = ZenGateway()                       # any OpenAI-compatible endpoint
resp = gw.route("What is 15% of 240?")  # classifies, routes, answers

print(resp.text)     # "36"
print(resp.path)     # "consensus"  (solved cheaply — no thinking spent)
print(resp.tokens)   # ~220

No training. No GPU. No logprobs required. Provider-agnostic.

Why

Thinking/reasoning modes are powerful and expensive, and they charge you for every hidden reasoning token. A trivial question costs roughly 10× more with thinking enabled (we measured 6 vs 66 completion tokens for 24*17, an 11× gap). Chat apps today either burn that on every message or make the user toggle thinking by hand. ZEN makes the decision per request, automatically, with an auditable token account for every call.

How it works

request ──> 2 cheap samples (parallel, thinking off)
                │
       agree? ──┴── yes ──> answer                     (~4k tokens on AIME)
                │
                no   (= this one is actually hard)
                │
                ▼
       3rd cheap sample + 1 native-thinking sample     (parallel)
                │
                ▼
       weighted vote {cheap ×1 each, native ×2} ──> answer   (~19k tokens)
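
The flow above can be sketched in a few lines of Python. This is an illustrative reimplementation under assumptions, not ZEN's actual code: `cheap_sample` and `native_sample` are hypothetical callables that return an already-parsed answer, the samples are drawn sequentially rather than in parallel, the `"escalated"` path label is made up, and ties go to the answer seen first.

```python
from collections import Counter
from typing import Callable, Tuple


def route(question: str,
          cheap_sample: Callable[[str], str],
          native_sample: Callable[[str], str],
          native_weight: int = 2) -> Tuple[str, str]:
    """Return (answer, path) following sample -> agree -> escalate."""
    first, second = cheap_sample(question), cheap_sample(question)
    if first == second:
        # Agreement between two cheap samples: answer without thinking.
        return first, "consensus"

    # Disagreement: add a third cheap sample and one native-thinking sample.
    votes = Counter()
    for answer in (first, second, cheap_sample(question)):
        votes[answer] += 1                      # cheap samples weigh 1 each
    votes[native_sample(question)] += native_weight

    # max() keeps the first-inserted answer on ties (assumed tie-break).
    winner = max(votes, key=votes.get)
    return winner, "escalated"


# Deterministic stand-ins for model calls (hypothetical).
cheap_answers = iter(["1", "2", "1"])
print(route("hard question", lambda q: next(cheap_answers), lambda q: "2"))
# ('2', 'escalated')  -> "2" gets 1 cheap vote + 2 native votes, "1" gets 2
```

With `native_weight=2`, a single native sample can overturn a 2-to-1 cheap majority only when it sides with the minority cheap answer; against three distinct cheap answers it wins outright.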

Benchmarks

DeepSeek-V4-Flash via OpenRouter, single-sample protocol, tokens counted across all calls each method makes. N=30 per benchmark, so treat differences within ±9pp as noise (about one standard error for an accuracy near 50% on 30 problems). Full tables and methodology: docs/RESULTS.md.

AIME 2025 (hard for the model — native thinking spends ~15k tokens/problem):

method	accuracy	mean tokens
single cheap call	40.0%	2.0k
self-consistency@10	43.3%	18.2k
best-of-5 + LLM judge	50.0%	13.9k
native thinking (1 call)	56.7%	14.6k
ZEN (vote)	56.7–66.7% (2 runs)	12.8k

AIME 2024 (easy for the model — native thinking self-regulates to ~9k):

method	accuracy	mean tokens
native thinking (1 call)	76.7%	9.2k
ZEN (vote)	73.3%	9.8k
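
The ±9pp noise band quoted for these tables is consistent with the binomial standard error of an accuracy measured on 30 problems. The check below is a quick sanity calculation, not part of the ZEN package; `binomial_se` is a helper name introduced here.

```python
import math


def binomial_se(accuracy: float, n: int) -> float:
    """Standard error of an accuracy estimated from n independent problems."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


for label, acc in [("accuracy near 50%", 0.5),
                   ("native thinking, AIME 2025", 0.567),
                   ("native thinking, AIME 2024", 0.767)]:
    print(f"{label}: +/-{100 * binomial_se(acc, 30):.1f}pp")
# accuracy near 50%: +/-9.1pp
# native thinking, AIME 2025: +/-9.0pp
# native thinking, AIME 2024: +/-7.7pp
```

One problem out of 30 is about 3.3pp, so the 76.7% vs 73.3% gap on AIME 2024 is a single problem.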

Honest reading: ZEN wins when the task challenges the model (it cuts wasted tokens and adds vote robustness). When the task is easy for the model, it is a statistical tie with slight overhead, because modern thinking modes already self-regulate. Rule of thumb from the data: ZEN pays off once the cheaply solvable share of your traffic reaches roughly 35–55%.
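
The trade-off can be made concrete with a two-path cost model. This is a back-of-the-envelope sketch, not the methodology in docs/RESULTS.md: it plugs in the rough per-path figures from the diagram (~4k tokens for consensus, ~19k for escalation), which vary by benchmark, so it illustrates the shape of the trade-off rather than reproducing the rule of thumb. Both function names are introduced here.

```python
def expected_tokens(consensus_share: float,
                    consensus_cost: float,
                    escalation_cost: float) -> float:
    """Mean tokens per request when a given share is solved by consensus."""
    return (consensus_share * consensus_cost
            + (1.0 - consensus_share) * escalation_cost)


def break_even_share(native_cost: float,
                     consensus_cost: float,
                     escalation_cost: float) -> float:
    """Consensus share at which routing costs the same as always thinking."""
    return (escalation_cost - native_cost) / (escalation_cost - consensus_cost)


# Rough figures: ~4k consensus, ~19k escalation, 14.6k native (AIME 2025).
print(f"{expected_tokens(0.41, 4_000, 19_000):.0f}")      # 12850
print(f"{break_even_share(14_600, 4_000, 19_000):.0%}")   # 29%
```

Under these assumed costs, a consensus share of about 41% lands near the 12.8k mean reported for ZEN on AIME 2025. The break-even point moves with the native-thinking cost: the cheaper native thinking already is on your traffic, the larger the cheaply solvable share needs to be.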

The three layers

layer	what it does	use it for
`ZenGateway`	classifies each message (chat / question / task) and dispatches the right amount of compute	chat apps, AI workspaces — an automatic thinking mode
`ZenRouter`	cheap consensus → thinking escalation, for verifiable answers	math, MCQ, facts, extraction, code-with-tests
`ZenPlanner`	plan-and-execute: decompose, run steps with threaded context, synthesize	multi-step tasks, agent pipelines

ZenGateway — the automatic thinking mode

from zen import ZenGateway

gw = ZenGateway()
gw.route("hey, what do you think about coffee?")   # chat  -> 1 cheap call
gw.route("Which planet has the most moons?")       # question -> consensus route
gw.route("Compare SQLite and PostgreSQL and recommend one.")  # task -> planner
gw.route("Is 97 prime?", kind="question")          # or force the kind yourself

Tool-safety rule (built in): pass tools_present=True on turns where the model may call tools. The gateway then makes exactly one routed call — it never samples in parallel around side-effectful tools (two samples would run your tools twice). Your agent loop handles the tool cycle around it.
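
To see why the rule matters, here is a small self-contained illustration. It does not use the ZEN API: `sample_turn` and `model_with_tool` are hypothetical stand-ins showing how parallel sampling would repeat a side effect, and how a `tools_present`-style switch avoids it.

```python
from typing import Callable, List


def sample_turn(message: str,
                call_model: Callable[[str], str],
                tools_present: bool = False,
                n_samples: int = 2) -> List[str]:
    """Collect model outputs for one turn, sampling once when tools may run."""
    if tools_present:
        # Exactly one call: any tool side effect happens at most once.
        return [call_model(message)]
    return [call_model(message) for _ in range(n_samples)]


sent_emails: List[str] = []


def model_with_tool(message: str) -> str:
    # Stands in for a model turn that triggers a side-effectful tool.
    sent_emails.append(message)
    return "done"


sample_turn("email the report", model_with_tool, tools_present=True)
print(len(sent_emails))  # 1
```

Calling the same function with `tools_present=False` would append twice, which is the duplicated tool execution the gateway is designed to prevent.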

ZenRouter — verifiable Q→A

from zen import ZenRouter

router = ZenRouter()                    # math-style prompt/parser by default
result = router.solve("If 3x + 7 = 22, what is x?")
result.answer, result.tokens, result.path   # 5, ~4k, "consensus"

ZenRouter(
    variant="vote",          # "vote" (validated best) | "eager" | "hybrid"
    temperature=0.7,         # sampling diversity for consensus
    native_weight=2,         # native sample's weight in the final vote
    think_budget=16000,      # completion budget of the native call
    parser=my_extractor,     # swap the answer parser for your domain
    raw_log="samples.jsonl", # dump every sample for offline analysis
)
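
Because consensus compares parsed answers, the parser decides what counts as agreement. Below is a minimal sketch of a multiple-choice extractor that could be passed as `parser=`. The callable signature (completion text in, normalized answer or `None` out) is an assumption; check the package for the actual contract. `mcq_extractor` is a name introduced here.

```python
import re
from typing import Optional

# "answer is (B)", "Answer: C" ... the letter itself must be an uppercase A-D.
_CHOICE = re.compile(r"(?i:\banswer)\s*(?i:is|:)?\s*\(?([A-D])\b")


def mcq_extractor(text: str) -> Optional[str]:
    """Return the last stated choice letter, or None if none is found."""
    matches = _CHOICE.findall(text)
    return matches[-1] if matches else None


print(mcq_extractor("At first the answer is A, but on reflection the answer is (D)."))
# D
```

Taking the last match is a design choice for completions that revise themselves mid-answer; returning `None` for unparseable text lets the caller treat it as a non-vote rather than a spurious agreement.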

ZenPlanner — plan-and-execute with an agent hook

from zen import ZenPlanner

def my_agent_step(step_description, context):
    # run your tool-calling agent (e.g. SIFT) on this step, return text
    return my_agent.run(step_description, context)

planner = ZenPlanner(executor=my_agent_step)   # omit executor = pure reasoning
result = planner.run("Research X, compare with Y, write a recommendation.")
result.text     # final deliverable
result.steps    # [(step, result), ...]

The planner fixes the classic plan-and-execute inefficiency: each step receives the task + plan + clipped results of prior steps — not the full reasoning history — so cost grows linearly, not quadratically. A step that fails is retried once with native thinking (per-step escalation). Plans with fewer than two steps skip orchestration entirely.
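
The linear-cost claim follows from bounding what each earlier step contributes to later prompts. The sketch below shows one way to assemble such a step prompt. It is an assumption about the general shape, not ZEN's prompt format; `clip_result` and `step_prompt` are hypothetical helpers.

```python
from typing import List


def clip_result(text: str, limit: int = 120) -> str:
    """Bound a prior result's size, keeping both its start and its end."""
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + " ... " + text[-half:]


def step_prompt(task: str, plan: List[str], results: List[str], index: int) -> str:
    """Build the prompt for plan[index] from the task, plan and clipped results."""
    lines = [f"Task: {task}", "Plan:"]
    lines += [f"{i + 1}. {step}" for i, step in enumerate(plan)]
    lines.append("Results so far:")
    lines += [f"{i + 1}. {clip_result(r)}" for i, r in enumerate(results)]
    lines.append(f"Now do step {index + 1}: {plan[index]}")
    return "\n".join(lines)


plan = ["Research X", "Compare with Y", "Write a recommendation"]
long_result = "x" * 5000  # a verbose step output
sizes = [len(step_prompt("Pick a database", plan, [long_result] * k, 2))
         for k in range(3)]
print(sizes[1] - sizes[0], sizes[2] - sizes[1])  # 129 129
```

Each finished step adds a fixed number of characters to later prompts regardless of how long its raw output was, so the total prompt size over a plan grows with the number of steps rather than with the accumulated history.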

Configuration

Point ZEN at any OpenAI-compatible endpoint:

export ZEN_LLM_BASE_URL=https://openrouter.ai/api/v1
export ZEN_LLM_MODEL=deepseek/deepseek-v4-flash
export ZEN_LLM_API_KEY=sk-or-...

or use a built-in profile (reads the key from OPENROUTER_API_KEY or a git-ignored .secrets.json):

from zen import config
config.apply_profile("deepseek-v4-flash")
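
If you configure ZEN through the environment variables shown earlier, it can help to fail fast when one is missing. The following is a standalone sketch using only the standard library; `LLMSettings` and `load_settings` are hypothetical names, and ZEN's own `config` module may resolve settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_REQUIRED = ("ZEN_LLM_BASE_URL", "ZEN_LLM_MODEL", "ZEN_LLM_API_KEY")


@dataclass(frozen=True)
class LLMSettings:
    base_url: str
    model: str
    api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read the three ZEN_LLM_* variables, raising if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return LLMSettings(*(env[name] for name in _REQUIRED))


settings = load_settings({
    "ZEN_LLM_BASE_URL": "https://openrouter.ai/api/v1",
    "ZEN_LLM_MODEL": "deepseek/deepseek-v4-flash",
    "ZEN_LLM_API_KEY": "sk-or-placeholder",  # placeholder, not a real key
})
print(settings.model)  # deepseek/deepseek-v4-flash
```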

Escalation needs a model that exposes a thinking toggle (DeepSeek, Qwen, GPT reasoning-effort, Claude extended thinking, and similar). Models without one still work; ZEN then behaves as consensus-only routing over cheap samples.

Everything is injectable — client_factory=, classifier=, parser=, executor= — so ZEN stays provider-agnostic and fully testable offline:

python tests/test_offline.py    # 31 tests, zero API calls

Negative results we kept (so you don't rediscover them)

Confidence gating (accept a high-logprob first sample): logprob coverage through OpenRouter is partial (~47%) and confidence did not predict correctness on AIME — the non-thinking model is often confidently wrong. Available as gate_tail_logprob=, off by default.
Halving the native think budget (think_budget=8000): saved only ~10% of total tokens and cost accuracy exactly where thinking was needed.
Truncating candidates in judge/aggregation prompts silently destroys them — keep the head and the tail (zen.parsing.clip).
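
The last point is easy to get wrong, so here is a minimal sketch of head-and-tail clipping. It is written from the description above and is not the source of `zen.parsing.clip`, whose signature is not shown here; `clip_head_tail` and its parameters are hypothetical.

```python
def clip_head_tail(text: str, max_chars: int = 2000, head_ratio: float = 0.5) -> str:
    """Shorten text to about max_chars, keeping its start and its end."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head_ratio)
    tail_len = max_chars - head_len
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} characters omitted ...]\n"
    # Guard tail_len == 0: text[-0:] would return the whole string.
    tail = text[-tail_len:] if tail_len else ""
    return text[:head_len] + marker + tail


candidate = "Setup. " + "long derivation " * 200 + "FINAL ANSWER: 42"
clipped = clip_head_tail(candidate, max_chars=100)
print(clipped.endswith("FINAL ANSWER: 42"))  # True
```

A head-only cut of the same candidate would drop the final answer line, which in many completions sits at the very end; the explicit marker also tells a judge model that text was removed.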

Honest caveats: results come from one model family and N=30 math benchmarks. Consensus needs comparable short answers: v0.2 targets verifiable outputs, so swap parser= for your domain; open-ended text is future work.

Repo layout

zen/           the package: gateway, router, planner, client, parsing, config
tests/         offline test suite (no API key needed)
experiments/   evaluation harnesses (AIME/MATH/GSM8K) + the RL research line
docs/          full results tables + development log

Related work

ZEN distills ideas from self-consistency (Wang et al.), adaptive-consensus and fast/slow-routing research (AdaptThink, DART), DeepConf, RSA and RL of Thoughts — the tiny-controller idea that started this project. ZEN's contribution is the disagreement-routed cheap→thinking cascade with honest, per-call token accounting.

Works well next to SIFT (tool retrieval and calling by the same author): SIFT decides what the model can do, ZEN decides how hard it should think.

License

MIT

Project details

This version

0.2.0

Jul 2, 2026

Download files

Source Distribution

zen_router-0.2.0.tar.gz (22.8 kB view details)

Uploaded Jul 2, 2026 Source

Built Distribution

zen_router-0.2.0-py3-none-any.whl (19.8 kB view details)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file zen_router-0.2.0.tar.gz.

File metadata

Download URL: zen_router-0.2.0.tar.gz
Upload date: Jul 2, 2026
Size: 22.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zen_router-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b9393334b869b030531c84f89eff85e08e47602800186359b35f4829612cd040`
MD5	`58b1fe40050f34a866c2bf099511a879`
BLAKE2b-256	`a8a06c1b23c074dc86cd34b775bb3fae82d1f24b0a674bc685fa923774909db3`

Provenance

The following attestation bundles were made for zen_router-0.2.0.tar.gz:

Publisher: publish.yml on Victor-Alves0/ZEN

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zen_router-0.2.0.tar.gz
- Subject digest: b9393334b869b030531c84f89eff85e08e47602800186359b35f4829612cd040
- Sigstore transparency entry: 2048133337
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: Victor-Alves0/ZEN@a6cf4cf6a56f069da3cb39dd8abe5153415062ad
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Victor-Alves0
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a6cf4cf6a56f069da3cb39dd8abe5153415062ad
- Trigger Event: release

File details

Details for the file zen_router-0.2.0-py3-none-any.whl.

File metadata

Download URL: zen_router-0.2.0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 19.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for zen_router-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`48169f3c15b234166bcb7760cab496bc241eb01a586f086371770fc8f3659bb7`
MD5	`de238e6a34d188987875ecb2d8c5c26e`
BLAKE2b-256	`cee2bd8c888eac8dd64ab6f8710c03b2d0f480182dcb1770d537b085198bf3a5`

Provenance

The following attestation bundles were made for zen_router-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Victor-Alves0/ZEN

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zen_router-0.2.0-py3-none-any.whl
- Subject digest: 48169f3c15b234166bcb7760cab496bc241eb01a586f086371770fc8f3659bb7
- Sigstore transparency entry: 2048133660
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: Victor-Alves0/ZEN@a6cf4cf6a56f069da3cb39dd8abe5153415062ad
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Victor-Alves0
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a6cf4cf6a56f069da3cb39dd8abe5153415062ad
- Trigger Event: release
