
ThinkRouter


Pre-inference query routing for LLM reasoning models.
Cut thinking-token costs by 60% with one line of code.


The problem

Reasoning models (o1, DeepSeek-R1, Claude with extended thinking) apply the same 8,000-token compute budget to every query, whether it is simple arithmetic or a complex proof.

"What is 2 + 3?"                   →  8,000 thinking tokens   ← 99% wasted
"Prove that sqrt(2) is irrational"  →  8,000 thinking tokens   ← correctly used

At 100,000 queries per day, that is $192,635/month in avoidable spend.


The solution

from thinkrouter import ThinkRouter

client   = ThinkRouter(provider="openai")
response = client.chat("What is the capital of France?")
# Routed to NO_THINK → 50 tokens used, not 8,000

client.usage.print_dashboard()
  ThinkRouter — Usage Dashboard
  ──────────────────────────────────────────────
  Total calls          : 13
  Tokens saved         : 55,650
  Compute savings      : 53.5%
  Avg classifier time  : 0.02 ms

  Routing breakdown:
    no_think        :      7  (53.8%)  — Direct answer
    short_think     :      0  ( 0.0%)  — Moderate reasoning
    full_think      :      6  (46.2%)  — Full extended reasoning
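
Those figures follow directly from the routing breakdown: 7 calls at 50 tokens plus 6 at 8,000 tokens is 48,350 tokens spent, against a 13 × 8,000 = 104,000-token baseline, i.e. 55,650 tokens saved (53.5%).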

How it works

ThinkRouter intercepts each query, runs a lightweight classifier in under 1ms, and routes to the minimum compute budget:

Tier        Budget         Use case
NO_THINK    50 tokens      Arithmetic, definitions, lookups, translations
SHORT       800 tokens     Multi-step reasoning, moderate chaining
FULL        8,000 tokens   Proofs, system design, algorithm implementation
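
The routing internals are not shown here, but as a rough sketch of the idea (not ThinkRouter's actual code), a tier could translate into provider request parameters along these lines. The Tier enum and anthropic_params_for helper below are stand-ins for illustration; the thinking field follows Anthropic's public extended-thinking API, which currently requires a budget of at least 1,024 tokens.

from enum import Enum

class Tier(Enum):
    NO_THINK = 0
    SHORT = 800
    FULL = 8000

def anthropic_params_for(tier: Tier) -> dict:
    # Illustrative mapping only, not ThinkRouter internals.
    if tier is Tier.NO_THINK:
        # Trivial queries: skip extended thinking entirely.
        return {"max_tokens": 1024}
    # Anthropic requires budget_tokens >= 1024 and max_tokens > budget_tokens,
    # so small budgets are rounded up in this sketch.
    budget = max(tier.value, 1024)
    return {
        "max_tokens": budget + 1024,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }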

Installation

# Base install — works immediately, zero ML dependencies
pip install thinkrouter

# With fine-tuned DistilBERT classifier (higher accuracy)
pip install thinkrouter[classifier]

# With OpenAI client
pip install thinkrouter[openai]

# With Anthropic client
pip install thinkrouter[anthropic]

# Everything
pip install thinkrouter[all]

Quick start

Try it now in Colab (no API key needed).

OpenAI

from thinkrouter import ThinkRouter

client = ThinkRouter(
    provider="openai",
    api_key="sk-...",      # or set OPENAI_API_KEY
    model="gpt-4o",
    verbose=True,
)

response = client.chat("Explain how merge sort works.")
print(response.content)
print(response.routing)
# ClassifierResult(tier=FULL, confidence=0.87, budget=8000 tokens, latency=1.2ms)

client.usage.print_dashboard()

Anthropic

client = ThinkRouter(
    provider="anthropic",
    api_key="sk-ant-...",  # or set ANTHROPIC_API_KEY
    model="claude-haiku-4-5-20251001",
)

response = client.chat("What is 144 divided by 12?")
# Routed to NO_THINK → 50 tokens, not 8,000

Streaming

for chunk in client.stream("Explain quantum entanglement step by step."):
    print(chunk, end="", flush=True)

Classify without an API call

results = client.classify_batch([
    "What is 7 * 8?",
    "Design a distributed caching system.",
    "How many days are in a leap year?",
])

for r in results:
    print(f"{r.tier.name:<12}  budget={r.token_budget:>6} tokens  conf={r.confidence:.2f}")
NO_THINK      budget=    50 tokens  conf=0.88
FULL          budget=  8000 tokens  conf=0.85
NO_THINK      budget=    50 tokens  conf=0.80

Cost savings at scale

Volume                   Savings/day   Savings/month
10,000 queries/day       $642          $19,263
100,000 queries/day      $6,421        $192,635
1,000,000 queries/day    $64,212       $1,926,346

Based on a 53.5% savings rate and $15 per million reasoning tokens (approximate o1 rate).
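
A back-of-the-envelope reproduction of the table, assuming every query would otherwise spend the full 8,000-token thinking budget and a 30-day month:

# Reproduce the estimate: full 8,000-token baseline per query,
# $15 per million reasoning tokens, 53.5% savings rate.
BASELINE_TOKENS = 8_000
PRICE_PER_TOKEN = 15 / 1_000_000
SAVINGS_RATE = 0.535

for queries_per_day in (10_000, 100_000, 1_000_000):
    daily = queries_per_day * BASELINE_TOKENS * PRICE_PER_TOKEN * SAVINGS_RATE
    print(f"{queries_per_day:>9,} queries/day -> ${daily:,.0f}/day, ${daily * 30:,.0f}/month")
# ~$642, ~$6,420 and ~$64,200 per day, matching the table to within rounding.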


Classifier backends

Heuristic (default)

Zero dependencies. Regex patterns and word-count heuristics. Runs in under 1ms.

client = ThinkRouter(classifier_backend="heuristic")
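
As an illustration of the approach (not the library's actual rules), a regex-and-word-count heuristic for the three tiers could look roughly like this:

import re

# Illustrative heuristic only: trivial lookups and arithmetic go to NO_THINK,
# proof- or design-like prompts go to FULL, everything else to SHORT.
TRIVIAL = re.compile(r"^\s*(what is|define|translate)\b|\b\d+\s*[-+*/]\s*\d+", re.I)
HEAVY = re.compile(r"\b(prove|design|implement|optimi[sz]e|derive)\b", re.I)

def classify(query: str) -> str:
    words = len(query.split())
    if HEAVY.search(query) or words > 60:
        return "FULL"
    if TRIVIAL.search(query) and words <= 15:
        return "NO_THINK"
    return "SHORT"

print(classify("What is 7 * 8?"))                        # NO_THINK
print(classify("Design a distributed caching system."))  # FULL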

DistilBERT

Fine-tuned on GSM8K. Achieves 93%+ quality retention at 60% compute savings.
Requires pip install thinkrouter[classifier].

client = ThinkRouter(
    classifier_backend="distilbert",
    confidence_threshold=0.75,
)
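
The backend is presumably a standard three-label sequence-classification model; a generic sketch with Hugging Face transformers follows (the checkpoint name is a placeholder, not a published ThinkRouter model):

from transformers import pipeline

# Placeholder checkpoint name, shown only to illustrate the setup.
clf = pipeline("text-classification", model="your-org/distilbert-think-tier")

result = clf("Prove that sqrt(2) is irrational")[0]
print(result["label"], round(result["score"], 2))  # e.g. FULL 0.91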

Confidence threshold

Threshold   Savings   Quality retained   Use case
0.65        ~59%      ~91%               High cost sensitivity
0.75        ~55%      ~93%               Recommended
0.85        ~44%      ~96%               Quality-sensitive

Queries classified below the threshold fall back to FULL, so low-confidence routing never degrades output quality.
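
A minimal sketch of that fallback logic (not the library's code):

def effective_tier(predicted_tier: str, confidence: float, threshold: float = 0.75) -> str:
    # Conservative fallback: an uncertain classification never gets less
    # than the full reasoning budget.
    return predicted_tier if confidence >= threshold else "FULL"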


API reference

ThinkRouter

ThinkRouter(
    provider             = "openai",      # "openai" | "anthropic" | "generic"
    api_key              = None,          # falls back to OPENAI_API_KEY / ANTHROPIC_API_KEY
    model                = None,          # default model for all calls
    classifier_backend   = "heuristic",   # "heuristic" | "distilbert"
    confidence_threshold = 0.75,
    max_records          = 10_000,
    verbose              = False,
)

RouterResponse

response.content       # str — generated text
response.routing       # ClassifierResult
response.provider      # "openai" | "anthropic"
response.model         # model identifier
response.usage_tokens  # {"prompt_tokens": N, "completion_tokens": M, ...}

ClassifierResult

result.tier          # Tier.NO_THINK | Tier.SHORT | Tier.FULL
result.confidence    # float in [0, 1]
result.token_budget  # int — thinking tokens assigned
result.latency_ms    # classifier wall-clock time in ms
result.backend       # "heuristic" | "distilbert:cuda" | "distilbert:cpu"

Running tests

git clone https://github.com/saikoushiknalubola/thinkrouter.git
cd thinkrouter
pip install -e ".[dev]"
pytest tests/ -v

Roadmap

  • Heuristic classifier
  • OpenAI and Anthropic adapters
  • Streaming support
  • Thread-safe usage dashboard
  • GitHub Actions CI (Python 3.9–3.12)
  • DistilBERT model on HuggingFace Hub
  • Multi-domain training (MMLU, HumanEval, ARC-Challenge)
  • Async support (achat(), astream())
  • Continuous budget regression
  • Hosted API proxy (api.thinkrouter.ai)

Research basis

  • Zhao et al. (2025). SelfBudgeter. arXiv:2505.11274 — 74.47% savings validated
  • Wang et al. (2025). TALE-EP. ACL Findings 2025 — 67% output token reduction
  • Sanh et al. (2019). DistilBERT. arXiv:1910.01108
  • Cobbe et al. (2021). GSM8K. arXiv:2110.14168

Contributing

See CONTRIBUTING.md. Issues and pull requests welcome.


License

MIT — see LICENSE.
