Antaris Router
Deterministic model routing for 50-70% LLM cost reduction. Zero dependencies.
File-based prompt classification that routes to the cheapest capable model. Same input always produces the same routing decision. No API calls for classification, no vector databases, no infrastructure overhead.
Cost Impact
Real savings from production usage:
GPT-4o for everything: $847.20/month
With antaris-router: $251.15/month
Savings: $596.05 (70.3%)
Most applications waste money by using expensive models for simple tasks. This tool automatically routes prompts to the cheapest model that can handle the complexity level.
How It Works
- Classify prompts using deterministic keyword matching + structural analysis
- Route to cheapest model in each capability tier (trivial → simple → moderate → complex → expert)
- Track actual usage costs and compare against premium-only baseline
- Optimize spending while maintaining output quality
All routing decisions happen offline using plain text rules stored in JSON files.
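The classify-then-route flow above can be sketched in a few lines. This is a hypothetical illustration, not the library's actual rules: the tier keywords, model names, and thresholds here are made up.

```python
# Minimal sketch of deterministic keyword-based routing.
# TIER_KEYWORDS and CHEAPEST_MODEL are illustrative, not antaris-router's real config.
TIER_KEYWORDS = {
    "complex": {"architecture", "microservices", "algorithm", "design"},
    "moderate": {"summarize", "analyze", "compare"},
}

CHEAPEST_MODEL = {  # cheapest capable model per tier (made-up assignments)
    "trivial": "gpt-4o-mini",
    "simple": "gpt-4o-mini",
    "moderate": "claude-haiku",
    "complex": "claude-sonnet",
}

def classify(prompt: str) -> str:
    """Deterministic: the same prompt always yields the same tier."""
    words = set(prompt.lower().split())
    for tier in ("complex", "moderate"):  # check pricier tiers first
        if words & TIER_KEYWORDS[tier]:
            return tier
    return "simple" if len(words) > 5 else "trivial"

def route(prompt: str) -> str:
    return CHEAPEST_MODEL[classify(prompt)]

print(route("Design a microservices architecture"))  # claude-sonnet (complex tier)
print(route("hi"))                                   # gpt-4o-mini (trivial tier)
```

Because classification is pure set intersection over static keyword lists, it needs no network calls and is trivially reproducible.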
What It Does
- Prompt complexity classification (5 tiers: trivial → expert)
- Cost-optimized model selection within each tier
- Usage tracking with savings estimates vs. premium models
- Provider preferences and capability-based routing
- Deterministic decisions — same prompt always routes the same way
What It Doesn't Do
- API proxy — Returns routing decisions only, you make the actual calls
- Semantic analysis — Uses keyword matching, not embeddings or model inference
- Learning system — Rules are static, doesn't adapt based on outcomes
- Rate limiting — Handles routing logic only, not request management
- Quality assessment — Assumes all models in a tier produce equivalent results
Technical Approach
Same principles as antaris-memory:
| Principle | Implementation |
|---|---|
| File-based | JSON config files. No databases, no external services. |
| Deterministic | Identical inputs produce identical routing decisions. |
| Offline-first | Classification runs locally using keyword matching. |
| Zero dependencies | Pure Python stdlib. No vendor lock-in. |
| Transparent | Inspect routing rules with any text editor. |
Install
```bash
pip install antaris-router
```
Usage
```python
from antaris_router import Router

# Initialize with default config
router = Router()

# Route prompts to appropriate models
simple_q = router.route("What is Python?")
# → gpt-4o-mini ($0.15/MTok) instead of gpt-4o ($2.50/MTok)

architecture = router.route("""
Design a microservices architecture for handling
100k concurrent users with Redis caching...
""")
# → claude-sonnet ($3/MTok) instead of opus ($15/MTok)

# Log actual usage for cost tracking
router.log_usage(simple_q, input_tokens=12, output_tokens=150, actual_cost=0.0024)

# View savings report
savings = router.savings_estimate()
print(f"This month: ${savings['period_cost']:.2f}")
print(f"Without router: ${savings['baseline_cost']:.2f}")
print(f"Saved: ${savings['total_savings']:.2f} ({savings['savings_percent']:.1f}%)")
```
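The savings report boils down to simple arithmetic: sum what the routed requests actually cost, then compare against what the same token counts would have cost on the premium model alone. A sketch with illustrative prices and usage records (the field names mirror the log above but are assumptions about the internals):

```python
# Savings arithmetic sketch. Prices and usage numbers are illustrative only.
PREMIUM_IN = 2.50 / 1_000_000    # $/input token on the premium-only baseline
PREMIUM_OUT = 10.00 / 1_000_000  # $/output token (assumed baseline pricing)

usage = [
    {"input_tokens": 12, "output_tokens": 150, "actual_cost": 0.0024},
    {"input_tokens": 500, "output_tokens": 800, "actual_cost": 0.0039},
]

# What routing actually cost this period
period_cost = sum(u["actual_cost"] for u in usage)

# What the premium model alone would have cost for the same tokens
baseline_cost = sum(
    u["input_tokens"] * PREMIUM_IN + u["output_tokens"] * PREMIUM_OUT
    for u in usage
)

total_savings = baseline_cost - period_cost
savings_percent = 100 * total_savings / baseline_cost
```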
Classification System
5 tiers from cheapest to most expensive:
| Tier | Cost Range | Use Cases |
|---|---|---|
| Trivial | $0.10-0.20/MTok | Greetings, confirmations, simple Q&A |
| Simple | $0.15-0.50/MTok | Factual lookup, basic explanations |
| Moderate | $1.00-3.00/MTok | Analysis, summarization, structured data |
| Complex | $2.50-15.00/MTok | Code generation, technical design |
| Expert | $15.00-75.00/MTok | Novel research, creative problem solving |
Classification signals:
- Presence of technical keywords (`API`, `algorithm`, `architecture`)
- Prompt length and structural complexity (code blocks, numbered lists)
- Explicit complexity markers (`explain in detail`, `comprehensive analysis`)
Not semantic understanding — Uses pattern matching, not AI classification.
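One way to picture how such signals combine is a small additive score; this is a toy sketch with made-up weights and thresholds, not the library's actual scoring:

```python
import re

def complexity_signals(prompt: str) -> int:
    """Toy additive score over the three signal types: keywords,
    length/structure, and explicit complexity markers."""
    score = 0
    # 1. Technical keywords (illustrative word list)
    if re.search(r"\b(API|algorithm|architecture)\b", prompt, re.IGNORECASE):
        score += 2
    # 2. Prompt length and structural complexity
    if len(prompt) > 200:
        score += 1
    if "```" in prompt or re.search(r"^\s*\d+\.", prompt, re.MULTILINE):
        score += 1
    # 3. Explicit complexity markers
    if re.search(r"explain in detail|comprehensive analysis", prompt, re.IGNORECASE):
        score += 2
    return score
```

A higher score would map to a higher tier; no embeddings or model inference are involved at any point.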
When This Works
Good fit:
- High-volume applications with mixed complexity (customer support, content generation)
- Budget-conscious teams that need predictable routing decisions
- Workflows where 80% of prompts are routine, 20% need premium models
- Integration into existing codebases without infrastructure changes
Not a good fit:
- Single-model applications (no cost optimization opportunity)
- Highly specialized domains where complexity classification fails
- Real-time applications needing sub-10ms routing decisions
- Teams that prefer semantic similarity over keyword matching
Limitations
- Pattern-based only — Misclassifies prompts that don't match keyword patterns
- No quality feedback — Doesn't learn if cheaper models produce poor results
- Static rules — Classification logic doesn't adapt to your specific use case
- English-optimized — Keyword matching may not work well for other languages
- No model performance tracking — Assumes all models in a tier are equivalent
If you need semantic classification or quality-based routing, this tool isn't suitable.
Configuration
The router uses JSON files for all configuration. Defaults work for most use cases.
Customize model costs:
```bash
# Edit config/models.json to add new models or update pricing
vim config/models.json
```
Adjust classification rules:
```bash
# Modify config/classification.json to tune keyword matching
vim config/classification.json
```
Track usage:
```python
# Cost tracking happens automatically
report = router.cost_report()
print(f"Monthly cost: ${report['total_cost']:.2f}")
print(f"Requests routed: {report['total_requests']:,}")
```
All configuration files use plain JSON — no proprietary formats or complex schemas.
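Because the config is plain JSON, updating it programmatically is a read-modify-write with the standard library. The schema below is hypothetical (field names like `input_per_mtok` are assumptions, not the library's documented format):

```python
import json
from pathlib import Path

# Hypothetical models.json layout; the real schema may differ.
models = {
    "gpt-4o-mini": {"tier": "simple", "input_per_mtok": 0.15, "output_per_mtok": 0.60},
    "claude-sonnet": {"tier": "complex", "input_per_mtok": 3.00, "output_per_mtok": 15.00},
}

path = Path("config/models.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(models, indent=2))

# Update a price: read, modify, write back
data = json.loads(path.read_text())
data["gpt-4o-mini"]["input_per_mtok"] = 0.20
path.write_text(json.dumps(data, indent=2))
```

Any text editor or `jq` works equally well, which is the point of keeping the format plain.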
Storage Format
Router state and cost tracking data are stored in JSON:
```json
{
"version": "1.0.0",
"saved_at": "2026-02-15T14:30:00",
"usage_history": [
{
"timestamp": "2026-02-15T10:00:00",
"model_name": "gpt-4o-mini",
"tier": "simple",
"input_tokens": 50,
"output_tokens": 30,
"actual_cost": 0.0000825,
"routing_confidence": 0.87
}
]
}
```
Architecture
Simple 4-component design:
- TaskClassifier — Prompt → complexity tier
- ModelRegistry — Model definitions and costs
- CostTracker — Usage logging and savings calculation
- Router — Combines everything, returns routing decisions
Data flow: prompt → classify → find cheapest model for tier → return decision
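The four components and the data flow above can be sketched as follows. Class and field shapes here are hypothetical simplifications, not the library's real interfaces:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """What Router.route() returns: a model choice plus its tier."""
    model: str
    tier: str

class TaskClassifier:
    """Prompt → complexity tier (toy keyword rule)."""
    def classify(self, prompt: str) -> str:
        return "complex" if "architecture" in prompt.lower() else "simple"

class ModelRegistry:
    """Model definitions and costs (illustrative $/MTok figures)."""
    COSTS = {"simple": [("gpt-4o-mini", 0.15)], "complex": [("claude-sonnet", 3.0)]}
    def cheapest(self, tier: str) -> str:
        return min(self.COSTS[tier], key=lambda m: m[1])[0]

class CostTracker:
    """Usage logging for later savings calculation."""
    def __init__(self) -> None:
        self.log: list[tuple[str, float]] = []
    def record(self, decision: Decision, cost: float) -> None:
        self.log.append((decision.model, cost))

class Router:
    """Combines the three components and returns routing decisions."""
    def __init__(self) -> None:
        self.classifier = TaskClassifier()
        self.registry = ModelRegistry()
        self.tracker = CostTracker()
    def route(self, prompt: str) -> Decision:
        tier = self.classifier.classify(prompt)         # prompt → classify
        model = self.registry.cheapest(tier)            # → cheapest model for tier
        return Decision(model=model, tier=tier)         # → return decision
```

Note the router only returns a `Decision`; making the API call with the chosen model stays the caller's job, matching the "not an API proxy" design above.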
Related Tools
- antaris-memory — File-based persistent memory for AI agents
- OpenRouter, LiteLLM — Full model proxies (require API keys, network calls)
- LangChain — Agent framework (uses model inference for routing)
Development
```bash
# Install development dependencies
pip install -e ".[dev]"

# Run tests
python -m pytest tests/ -v

# Type checking
mypy antaris_router/
```
License
Apache License 2.0. See LICENSE for details.
Part of Antaris Analytics — File-based tools for deterministic AI applications.
File details

Details for the file antaris_router-0.3.0.tar.gz:
- Download URL: antaris_router-0.3.0.tar.gz
- Size: 28.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes:
| Algorithm | Hash digest |
|---|---|
| SHA256 | c80c750a85792f1692f6a31fe4a492d1be36278e4c001df783ad06c6b37047cc |
| MD5 | 55cc07b043a99b75158c9779aad07c50 |
| BLAKE2b-256 | 67f59625acee35620919a94c897f3c61de6b0cf1219ad73bc536483044142494 |

Details for the file antaris_router-0.3.0-py3-none-any.whl:
- Download URL: antaris_router-0.3.0-py3-none-any.whl
- Size: 24.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes:
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a1e1ab5dbe43b83a3e3fbccd4af3d4bc6e08e44dd428e90f6e3b330f4347068 |
| MD5 | 6a934afaba599e732fcb1a83fa2e59e6 |
| BLAKE2b-256 | ae68adba9f5f2d1263974c4cefc4e5a531fe844ac17d887e61e01aab2293a705 |