Skip to main content

File-based model router for LLM cost optimization. Zero dependencies.

Project description

Antaris Router

Adaptive model router for LLM applications. Learns from outcomes. Zero dependencies.

Routes prompts to the cheapest capable model using semantic classification (TF-IDF), not keyword matching. Tracks outcomes to learn which models actually perform well on which tasks. All state stored in plain JSON files.

PyPI Tests Python 3.9+ License

What It Does

  • Semantic classification — TF-IDF vectors + cosine similarity, not keyword matching
  • Outcome learning — tracks routing decisions and their results, builds per-model quality profiles
  • Fallback chains — automatic escalation when cheap models fail
  • A/B testing — routes a configurable % to premium models to validate cheap routing
  • Context-aware — adjusts routing based on iteration count, conversation length, user expertise
  • Multi-objective — optimize for quality, cost, speed, or balanced
  • Runs fully offline — zero network calls, zero tokens, zero API keys

Demo

Prompt                                                          Tier       Model
──────────────────────────────────────────────────────────────────────────────────
What is 2 + 2?                                                  trivial    gpt-4o-mini
Translate hello to French                                       trivial    gpt-4o-mini
Write a Python function to reverse a string                     simple     gpt-4o-mini
Implement a React component with sortable table and pagination  moderate   claude-sonnet
Write a class that manages a connection pool with retry logic   moderate   claude-sonnet
Design microservices for e-commerce with 10K users and CQRS     complex    claude-sonnet
Fix this bug (iteration 1)                                      trivial    gpt-4o-mini
Fix this bug (iteration 5)                                      simple     claude-sonnet

Note: "Implement a React component" routes to moderate, not trivial. The semantic classifier understands that implementation tasks require real capability, regardless of prompt length.

Performance

Routing latency (100 calls): median 0.04ms, p99 0.17ms
Classification: ~50 seed examples across 5 tiers, TF-IDF with cosine similarity
Memory: <5MB for typical workloads
Storage: 3 JSON files (examples, model, decisions)

Measured on Apple M4, Python 3.14. Your numbers will vary.

What It Doesn't Do

  • Not a proxy — doesn't forward requests to models. It tells you which model to use.
  • Not semantic search — no embeddings, no vector DB. Uses TF-IDF (bag-of-words with term weighting).
  • Not real-time market data — doesn't track live model pricing or availability.
  • Classification is statistical, not perfect — edge cases exist. Use teach() to correct them.
  • Quality tracking requires your feedback — call report_outcome() after using the model, or the router can't learn.

Install

pip install antaris-router

Quick Start

from antaris_router import AdaptiveRouter, ModelConfig

router = AdaptiveRouter("./routing_data")

# Register your models with their capability ranges
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"),
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route a prompt
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Use {result.model} (tier: {result.tier}, confidence: {result.confidence:.2f})")
# → Use claude-sonnet (tier: complex, confidence: 0.50)

# Report outcome so the router learns
router.report_outcome(result.prompt_hash, quality_score=0.9, success=True)

Outcome Learning

The router gets smarter over time. When a cheap model consistently fails on a task type, the router learns to skip it.

# Initial: routes to cheap model
result = router.route("Write a regex to validate emails")
# → cheap (score: 0.50)

# After reporting 5 failures on cheap:
router.report_outcome(hash, quality_score=0.15, success=False)
# ... repeat ...

# After reporting 4 successes on premium:
# → premium (cheap score: 0.28, premium score: 0.89)

Quality scores are computed per model per tier:

score = 0.4 × success_rate + 0.4 × avg_quality + 0.2 × (1 - escalation_rate)

Models below the escalation threshold (default 0.30) are automatically skipped.

Context-Aware Routing

Pass context to influence routing decisions:

# First attempt — routes normally
result = router.route("Fix this bug", context={"iteration": 1})
# → trivial → cheap model

# Fifth attempt — escalates (user is struggling)
result = router.route("Fix this bug", context={"iteration": 5})
# → simple → better model

# Long conversation — minimum moderate
result = router.route("What do you think?", context={"conversation_length": 15})

# Expert user — don't waste time with weak models
result = router.route("Optimize this", context={"user_expertise": "expert"})

A/B Testing

Validate cheap routing by occasionally sending requests to premium:

router = AdaptiveRouter("./data", ab_test_rate=0.05)  # 5% of requests

# 95% of trivial/simple/moderate prompts route normally
# 5% route to the premium model for quality comparison
# Track outcomes to confirm cheap routing is actually good enough

Fallback Chains

Every routing result includes a fallback chain:

result = router.route("Write unit tests for authentication")
print(result.model)           # → gpt-4o-mini
print(result.fallback_chain)  # → ['claude-sonnet', 'claude-opus']

# If the primary model fails:
next_model = router.escalate(result.prompt_hash)
print(next_model)  # → claude-sonnet

Teaching Corrections

When the classifier gets it wrong, teach it:

# Classifier thinks this is simple, but it's actually complex
router.teach(
    "Optimize our Kubernetes deployment for cost efficiency",
    "complex"
)
# Correction is learned permanently and improves future classifications

Semantic Classification

The classifier uses TF-IDF (term frequency-inverse document frequency) with cosine similarity, not keyword matching.

How it works:

  1. Ships with ~50 labeled examples across 5 tiers (trivial → expert)
  2. Builds TF-IDF vectors from the example corpus
  3. For each new prompt, computes similarity to all examples
  4. Scores each tier by average similarity to its top-3 closest examples
  5. Applies structural adjustments (long prompts can't be trivial, code presence boosts tier)

Why not embeddings? Embeddings would be better but require either an API call (defeats offline goal) or a model file (~100MB+). TF-IDF gets 80% of the benefit with zero dependencies.

Storage Format

routing_data/
├── routing_examples.json    # Labeled examples (seed + learned)
├── routing_model.json       # TF-IDF model (IDF weights, vocab)
├── routing_decisions.json   # Decision history for outcome learning
├── model_profiles.json      # Per-model per-tier quality scores
└── router_config.json       # Model registry and settings

All plain JSON. Inspect or edit with any text editor.

Architecture

AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer     — Term weighting + cosine similarity
├── QualityTracker
│   ├── RoutingDecision      — Decision + outcome records
│   └── ModelProfiles        — Per-model per-tier quality scores
├── ContextAdjuster          — Iteration, conversation, expertise signals
├── FallbackChain            — Ordered model escalation
└── ABTester                 — Validation routing (configurable %)

Routing flow: prompt → semantic classify → context adjust → quality filter → select model → record decision → return

Tiers

Tier Description Examples
trivial One-line answers, lookups "What is 2+2?", "Define photosynthesis"
simple Short tasks, basic code "Reverse a string", "Explain TCP vs UDP"
moderate Multi-step implementation "Build a REST API with auth", "Write a caching layer"
complex Architecture, multi-system "Design microservices for e-commerce", "Build a custom ORM"
expert Full system design "Architect a globally distributed database", "Design HFT platform"

Comparison

Antaris Router OpenRouter LiteLLM RouteLLM
Classification TF-IDF semantic API proxy API proxy Embeddings
Learning ✅ Outcome tracking
Offline
A/B testing ✅ Built-in
Context-aware
Fallback chains
Zero dependencies
Infrastructure None Cloud Cloud GPU/API

Honest assessment: OpenRouter and LiteLLM are API proxies that do more (actual request forwarding, billing, rate limiting). Antaris Router is a classification library — it tells you which model to use, not how to call it. Different tools for different needs.

Legacy API

The v1 keyword-based router is still available:

from antaris_router import Router  # v1 API
router = Router("./config")
decision = router.route("What's 2+2?")

We recommend migrating to AdaptiveRouter for better classification accuracy.

Part of the Antaris Analytics Suite

  • antaris-memory — Persistent memory for AI agents
  • antaris-router — Adaptive model routing (this package)
  • antaris-guard — Security and prompt injection detection (coming soon)
  • antaris-context — Context window optimization (coming soon)

License

Licensed under the Apache License 2.0. See LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

antaris_router-2.0.0.tar.gz (44.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

antaris_router-2.0.0-py3-none-any.whl (38.5 kB view details)

Uploaded Python 3

File details

Details for the file antaris_router-2.0.0.tar.gz.

File metadata

  • Download URL: antaris_router-2.0.0.tar.gz
  • Upload date:
  • Size: 44.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for antaris_router-2.0.0.tar.gz
Algorithm Hash digest
SHA256 28d63c2579ad318016ede79a97241e868f1adf9abbd778004e834bba31c93135
MD5 4bc96c859304717394518c664474468b
BLAKE2b-256 6e45f4d08726962300fc949768ccd4e8ef7db532e8b8844b031e3b3176e3229a

See more details on using hashes here.

File details

Details for the file antaris_router-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: antaris_router-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 38.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for antaris_router-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0db8ad97816b6d048b6ef5ec1c89d5c760f04961c865d34490807b0ee2b2ee7d
MD5 b4a9b888450a22f603d76011ac335599
BLAKE2b-256 6c4c38852a722c5c68721bc9ead3bd4dabf15c8f1a13257d58fd89c7872004a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page