File-based model router for LLM cost optimization. Zero dependencies.

Antaris Router

Deterministic model routing for 50-70% LLM cost reduction. Zero dependencies.

File-based prompt classification that routes to the cheapest capable model. Same input always produces the same routing decision. No API calls for classification, no vector databases, no infrastructure overhead.

Cost Impact

Real savings from production usage:

GPT-4o for everything:     $847.20/month
With antaris-router:       $251.15/month  
Savings:                   $596.05 (70.4%)

Most applications waste money by using expensive models for simple tasks. This tool automatically routes each prompt to the cheapest model in the tier that matches the prompt's classified complexity.
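
The arithmetic behind these figures can be reproduced in a few lines. This is a minimal sketch using the monthly totals quoted above; the function name `savings_summary` is illustrative and not part of the package.

```python
def savings_summary(baseline_cost: float, routed_cost: float) -> tuple[float, float]:
    """Return (absolute savings, savings as a percentage of the baseline)."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    saved = baseline_cost - routed_cost
    return saved, saved / baseline_cost * 100


# Monthly totals taken from the figures above.
saved, percent = savings_summary(baseline_cost=847.20, routed_cost=251.15)
print(f"Savings: ${saved:.2f} ({percent:.1f}%)")
```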

How It Works

  1. Classify prompts using deterministic keyword matching + structural analysis
  2. Route to cheapest model in each capability tier (trivial → simple → moderate → complex → expert)
  3. Track actual usage costs and compare against premium-only baseline
  4. Cut spending by sending only the prompts classified as complex or expert to premium models (output quality is assumed, not measured)

All routing decisions happen offline using plain text rules stored in JSON files.

What It Does

  • Prompt complexity classification (5 tiers: trivial → expert)
  • Cost-optimized model selection within each tier
  • Usage tracking with savings estimates vs. premium models
  • Provider preferences and capability-based routing
  • Deterministic decisions — same prompt always routes the same way
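
To make "provider preferences and capability-based routing" concrete, here is one way such a selection could work. It is a sketch under stated assumptions: the `Model` fields, the placeholder model names and prices, and the rule that preferred providers win whenever one of their models qualifies are all hypothetical and not taken from the package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    tier: str
    cost_per_mtok: float                  # USD per million tokens (placeholder values)
    capabilities: frozenset = frozenset()


# Hypothetical registry; names and prices are placeholders for illustration.
MODELS = [
    Model("small-a", "provider-x", "simple", 0.15, frozenset({"text"})),
    Model("small-b", "provider-y", "simple", 0.10, frozenset({"text"})),
    Model("small-c", "provider-y", "simple", 0.30, frozenset({"text", "vision"})),
]


def select_model(tier, required=frozenset(), preferred_providers=()) -> Optional[Model]:
    """Pick the cheapest model in `tier` that has every required capability.

    Preferred providers win when any of their models qualify; otherwise
    the search falls back to all providers.
    """
    candidates = [m for m in MODELS if m.tier == tier and required <= m.capabilities]
    preferred = [m for m in candidates if m.provider in preferred_providers]
    pool = preferred or candidates
    # Tie-break on name so the result never depends on list order.
    return min(pool, key=lambda m: (m.cost_per_mtok, m.name)) if pool else None


print(select_model("simple").name)                                        # small-b
print(select_model("simple", preferred_providers=("provider-x",)).name)   # small-a
print(select_model("simple", required=frozenset({"vision"})).name)        # small-c
```

Breaking ties on the model name keeps the choice deterministic even when two models cost the same.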

What It Doesn't Do

  • API proxy — Returns routing decisions only, you make the actual calls
  • Semantic analysis — Uses keyword matching, not embeddings or model inference
  • Learning system — Rules are static, doesn't adapt based on outcomes
  • Rate limiting — Handles routing logic only, not request management
  • Quality assessment — Assumes all models in a tier produce equivalent results
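
Because the router only returns a decision, the calling code has to map that decision onto a real client. The sketch below shows one possible dispatch table. It assumes the chosen model's name can be read from the routing decision (how to do so is not specified here), and `call_cheap_model` and `call_premium_model` are hypothetical stand-ins for real provider SDK calls.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for provider SDK calls; each takes a prompt
# and returns the completion text.
def call_cheap_model(prompt: str) -> str:
    return f"[cheap model] {prompt}"


def call_premium_model(prompt: str) -> str:
    return f"[premium model] {prompt}"


# Map model names (as a routing decision might report them) to callables.
CLIENTS: Dict[str, Callable[[str], str]] = {
    "gpt-4o-mini": call_cheap_model,
    "gpt-4o": call_premium_model,
}


def dispatch(model_name: str, prompt: str) -> str:
    """Send the prompt to whichever client the routing decision named."""
    try:
        client = CLIENTS[model_name]
    except KeyError:
        raise ValueError(f"no client configured for model {model_name!r}") from None
    return client(prompt)


print(dispatch("gpt-4o-mini", "What is Python?"))
```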

Technical Approach

Same principles as antaris-memory:

| Principle | Implementation |
|---|---|
| File-based | JSON config files. No databases, no external services. |
| Deterministic | Identical inputs produce identical routing decisions. |
| Offline-first | Classification runs locally using keyword matching. |
| Zero dependencies | Pure Python stdlib. No vendor lock-in. |
| Transparent | Inspect routing rules with any text editor. |

Install

pip install antaris-router

Usage

from antaris_router import Router

# Initialize with default config  
router = Router()

# Route prompts to appropriate models
simple_q = router.route("What is Python?")
# → gpt-4o-mini ($0.15/MTok) instead of gpt-4o ($2.50/MTok)

architecture = router.route("""
Design a microservices architecture for handling 
100k concurrent users with Redis caching...
""")  
# → claude-sonnet ($3/MTok) instead of opus ($15/MTok)

# Log actual usage for cost tracking
router.log_usage(simple_q, input_tokens=12, output_tokens=150, actual_cost=0.0024)

# View savings report
savings = router.savings_estimate()
print(f"This month: ${savings['period_cost']:.2f}")
print(f"Without router: ${savings['baseline_cost']:.2f}")  
print(f"Saved: ${savings['total_savings']:.2f} ({savings['savings_percent']:.1f}%)")
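
The `actual_cost` value passed to `log_usage` has to be computed by the caller. Providers typically bill input and output tokens at different per-million-token (MTok) rates, so a helper along these lines may be useful. The function name and the prices are placeholders, not part of the package; substitute the rates your provider publishes.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost in USD for one request, given prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder token counts and prices, for illustration only.
cost = request_cost(input_tokens=1_000, output_tokens=500,
                    input_price_per_mtok=0.15, output_price_per_mtok=0.60)
print(f"${cost:.6f}")
```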

Classification System

5 tiers from cheapest to most expensive:

| Tier | Cost Range | Use Cases |
|---|---|---|
| Trivial | $0.10-0.20/MTok | Greetings, confirmations, simple Q&A |
| Simple | $0.15-0.50/MTok | Factual lookup, basic explanations |
| Moderate | $1.00-3.00/MTok | Analysis, summarization, structured data |
| Complex | $2.50-15.00/MTok | Code generation, technical design |
| Expert | $15.00-75.00/MTok | Novel research, creative problem solving |

Classification signals:

  • Presence of technical keywords (API, algorithm, architecture)
  • Prompt length and structural complexity (code blocks, numbered lists)
  • Explicit complexity markers (explain in detail, comprehensive analysis)

This is not semantic understanding: the classifier uses pattern matching, not AI classification.
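
The signals listed above can be combined into a simple, deterministic score. The following is a rough sketch of that idea, not the package's actual classifier: the keyword sets, the 50-word length threshold, and the one-point-per-signal scoring are assumptions chosen for illustration.

```python
import re

# Hypothetical rules for illustration; the package keeps its real rules in JSON config.
TECHNICAL_KEYWORDS = {"api", "algorithm", "architecture"}
COMPLEXITY_MARKERS = ("explain in detail", "comprehensive analysis")
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def classify(prompt: str) -> str:
    """Score a prompt with fixed rules and map the score to a tier."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    score = 0
    score += sum(1 for w in set(words) if w in TECHNICAL_KEYWORDS)   # technical keywords
    score += sum(1 for m in COMPLEXITY_MARKERS if m in text)         # explicit markers
    score += 1 if len(words) > 50 else 0                             # prompt length
    score += 1 if FENCE in prompt else 0                             # code block present
    score += 1 if re.search(r"^\s*\d+\.\s", prompt, re.MULTILINE) else 0  # numbered list
    return TIERS[min(score, len(TIERS) - 1)]


print(classify("Hello there"))
print(classify("Explain in detail the architecture of this API"))
```

No randomness, clock, or network call is involved, so the same prompt always maps to the same tier.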

When This Works

Good fit:

  • High-volume applications with mixed complexity (customer support, content generation)
  • Budget-conscious teams that need predictable routing decisions
  • Workflows where 80% of prompts are routine, 20% need premium models
  • Integration into existing codebases without infrastructure changes

Not a good fit:

  • Single-model applications (no cost optimization opportunity)
  • Highly specialized domains where complexity classification fails
  • Real-time applications needing sub-10ms routing decisions
  • Teams that prefer semantic similarity over keyword matching

Limitations

  • Pattern-based only — Misclassifies prompts that don't match keyword patterns
  • No quality feedback — Doesn't learn if cheaper models produce poor results
  • Static rules — Classification logic doesn't adapt to your specific use case
  • English-optimized — Keyword matching may not work well for other languages
  • No model performance tracking — Assumes all models in a tier are equivalent

If you need semantic classification or quality-based routing, this tool isn't suitable.

Configuration

The router uses JSON files for all configuration. Defaults work for most use cases.

Customize model costs:

# Edit config/models.json to add new models or update pricing
vim config/models.json

Adjust classification rules:

# Modify config/classification.json to tune keyword matching
vim config/classification.json  

Track usage:

# Cost tracking happens automatically
report = router.cost_report()
print(f"Monthly cost: ${report['total_cost']:.2f}")
print(f"Requests routed: {report['total_requests']:,}")

All configuration files use plain JSON, with no proprietary formats or complex schemas.
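
Because the configuration is plain JSON, pricing updates can also be scripted instead of edited by hand. The sketch below assumes a hypothetical schema (a top-level `models` list with `name`, `tier`, and `cost_per_mtok` keys); the real `config/models.json` may be laid out differently, so check the file before adapting this.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema: the real config/models.json may be laid out differently.
example = {
    "models": [
        {"name": "model-a", "tier": "simple", "cost_per_mtok": 0.15},
        {"name": "model-b", "tier": "complex", "cost_per_mtok": 3.00},
    ]
}


def update_price(path: Path, model_name: str, new_price: float) -> None:
    """Rewrite one model's price in place, failing loudly if it is missing."""
    config = json.loads(path.read_text(encoding="utf-8"))
    for model in config["models"]:
        if model["name"] == model_name:
            model["cost_per_mtok"] = new_price
            break
    else:
        raise KeyError(f"model {model_name!r} not found in {path}")
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


# Demonstrate on a throwaway copy rather than a real config file.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "models.json"
    config_path.write_text(json.dumps(example, indent=2), encoding="utf-8")
    update_price(config_path, "model-a", 0.10)
    updated = json.loads(config_path.read_text(encoding="utf-8"))
    print(updated["models"][0])
```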


Storage Format

Router state and cost tracking data are stored in JSON:

```json
{
  "version": "1.0.0",
  "saved_at": "2026-02-15T14:30:00",
  "usage_history": [
    {
      "timestamp": "2026-02-15T10:00:00",
      "model_name": "gpt-4o-mini",
      "tier": "simple",
      "input_tokens": 50,
      "output_tokens": 30,
      "actual_cost": 0.0000825,
      "routing_confidence": 0.87
    }
  ]
}
```
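
Since the state is ordinary JSON, it can be analyzed with the standard library alone. As a small illustration, this sketch totals requests and cost per tier. It embeds a trimmed copy of the record shown above plus one made-up entry; in practice you would load the state file from wherever your router writes it.

```python
import json
from collections import defaultdict

# Trimmed copy of the record shown above, plus a second made-up entry.
state = json.loads("""
{
  "version": "1.0.0",
  "usage_history": [
    {"model_name": "gpt-4o-mini", "tier": "simple",
     "input_tokens": 50, "output_tokens": 30, "actual_cost": 0.0000825},
    {"model_name": "gpt-4o-mini", "tier": "simple",
     "input_tokens": 100, "output_tokens": 60, "actual_cost": 0.000165}
  ]
}
""")

# Accumulate request counts and cost for each tier.
totals = defaultdict(lambda: {"requests": 0, "cost": 0.0})
for entry in state["usage_history"]:
    bucket = totals[entry["tier"]]
    bucket["requests"] += 1
    bucket["cost"] += entry["actual_cost"]

for tier, stats in sorted(totals.items()):
    print(f"{tier}: {stats['requests']} requests, ${stats['cost']:.7f}")
```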

Architecture

Simple 4-component design:

  • TaskClassifier — Prompt → complexity tier
  • ModelRegistry — Model definitions and costs
  • CostTracker — Usage logging and savings calculation
  • Router — Combines everything, returns routing decisions

Data flow: prompt → classify → find cheapest model for tier → return decision
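
The data flow can be illustrated with a toy composition of the same roles. `MiniRouter`, `Decision`, the word-count classifier, and the registry contents below are all hypothetical; the sketch shows only how the pieces could fit together and is not the package's `Router` class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Decision:
    model: str
    tier: str


class MiniRouter:
    """Illustrative composition only; not the package's Router class."""

    def __init__(self, classify: Callable[[str], str],
                 registry: Dict[str, List[Tuple[str, float]]]):
        self.classify = classify        # plays the TaskClassifier role
        self.registry = registry        # plays the ModelRegistry role: tier -> [(model, price)]
        self.log: List[Decision] = []   # plays the CostTracker role (decisions only)

    def route(self, prompt: str) -> Decision:
        tier = self.classify(prompt)
        # Cheapest model in the tier; ties broken by name for determinism.
        model, _price = min(self.registry[tier], key=lambda pair: (pair[1], pair[0]))
        decision = Decision(model=model, tier=tier)
        self.log.append(decision)
        return decision


# Toy classifier and registry, both hypothetical.
mini = MiniRouter(
    classify=lambda p: "complex" if len(p.split()) > 20 else "simple",
    registry={"simple": [("model-a", 0.15), ("model-b", 0.50)],
              "complex": [("model-c", 3.00), ("model-d", 15.00)]},
)
print(mini.route("What is Python?"))
```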

Related Tools

  • antaris-memory — File-based persistent memory for AI agents
  • OpenRouter, LiteLLM — Full model proxies (require API keys, network calls)
  • LangChain — Agent framework (uses model inference for routing)

Development

# Install development dependencies (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Type checking
mypy antaris_router/

License

Apache License 2.0. See LICENSE for details.


Part of Antaris Analytics — File-based tools for deterministic AI applications.
