antaris-router
File-based model router for LLM cost optimization: adaptive routing with semantic classification and outcome learning. Zero external dependencies.
Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.
pip install antaris-router
Version 4.9.17 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only
Benchmarks
- Routing accuracy: 100% (8/8 correct on standard test suite)
- Self-improving: accuracy increases with outcome data accumulation
- Latency: median 0.05ms, p99 0.09ms
- Memory: <5MB for typical workloads
Key Exports
from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig
Complete Workflow Example
from antaris_router import AdaptiveRouter, ModelConfig
# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)
# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"),
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))
# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")
# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response) # 0.0-1.0
# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7,
)
# Save learned state
router.save()
# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")
Semantic Classification
Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.
# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?") # tier: trivial
router.route("Implement OAuth2 flow") # tier: moderate
router.route("Design distributed consensus") # tier: expert
Classification Features:
- ~50 labeled examples across 5 complexity tiers
- TF-IDF term weighting for semantic understanding
- Cosine similarity for classification decisions
- teach() method for manual corrections
# Correct misclassification
router.teach("Optimize Kubernetes for cost", "complex")
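The TF-IDF-plus-cosine approach described above can be sketched with the stdlib alone. The toy tier examples, helper names, and smoothed IDF weighting below are illustrative assumptions, not the package's internals:

```python
import math
from collections import Counter

# Toy labeled examples per tier; the real package ships ~50 of these.
EXAMPLES = {
    "trivial":  ["what is 2 + 2", "define rest"],
    "moderate": ["implement jwt auth", "design a redis cache"],
    "expert":   ["design a distributed consensus algorithm",
                 "architect a high frequency trading platform"],
}

def _idf(docs):
    """Smoothed inverse document frequency over a corpus of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

def _vec(tokens, idf):
    """TF-IDF vector as a sparse dict; unknown tokens are dropped."""
    tf = Counter(tokens)
    return {t: (tf[t] / len(tokens)) * idf[t] for t in tf if t in idf}

def _cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One vector per labeled example, tagged with its tier.
corpus = [(tier, ex.split()) for tier, exs in EXAMPLES.items() for ex in exs]
idf = _idf([tokens for _, tokens in corpus])
vectors = [(tier, _vec(tokens, idf)) for tier, tokens in corpus]

def classify(prompt):
    """Return the tier of the nearest labeled example by cosine similarity."""
    q = _vec(prompt.lower().split(), idf)
    return max(vectors, key=lambda tv: _cosine(q, tv[1]))[0]
```

A `teach()` call would amount to appending a new labeled example and rebuilding the vectors, which is why corrections take effect immediately.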
Quality Tracking with Outcome Learning
Router builds quality profiles per model per tier based on reported outcomes.
# Quality score calculation
score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)
# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}
# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)
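The score formula above, as a standalone function. The counter names (`successes`, `escalations`, `total_quality`) are assumptions about what the tracker accumulates:

```python
def quality_score(successes, attempts, total_quality, escalations):
    """Blend success rate, mean reported quality, and escalation rate:
    0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)."""
    if attempts == 0:
        return 0.0  # no data yet; the model starts unscored
    success_rate = successes / attempts
    avg_quality = total_quality / attempts
    escalation_rate = escalations / attempts
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# Mirrors the gpt-4o-mini/moderate profile shown above (33/45 successes),
# assuming a mean reported quality of ~0.73 and 5 escalations.
score = quality_score(successes=33, attempts=45, total_quality=33.0, escalations=5)
```

A model whose score drops below the escalation threshold (0.30 by default) would be skipped for that tier.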
Learning Process:
- Router makes the initial routing decision
- You call the suggested model
- Call report_outcome() with a quality score and success flag
- Router updates its quality profiles
- Future routing considers the learned performance data
Fallback Chains
Automatic failover when primary models are unavailable or perform poorly.
# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"],
)
result = router.route("Debug this memory leak")
print(result.model) # Primary choice
print(result.fallback_chain) # Ordered alternatives
# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
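One way to drive the chain from your own client code is a simple walk over the alternatives. `call_model` here is a hypothetical wrapper around your LLM client returning a response and a quality score; it is not part of the package:

```python
def run_with_fallback(model, fallback_chain, call_model, min_quality=0.7):
    """Try the primary model, then each fallback in order, until one
    returns a response scoring at least min_quality."""
    for candidate in [model, *fallback_chain]:
        response, quality = call_model(candidate)
        if quality >= min_quality:
            return candidate, response
    return candidate, response  # last attempt, even if below threshold

# Stub client for illustration: the cheap model "fails", the next succeeds.
def fake_call(name):
    return f"reply from {name}", 0.9 if name != "gpt-4o-mini" else 0.2

used, reply = run_with_fallback(
    "gpt-4o-mini", ["claude-sonnet", "claude-opus"], fake_call)
```

Each failed attempt is exactly the kind of event worth reporting via `report_outcome()`, so the router's quality profiles reflect the escalation.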
A/B Testing Support
Randomly routes a percentage of requests to premium models for validation.
# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)
# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
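The override behavior can be sketched as a coin flip ahead of the normal routing decision. This is an assumption about how `ab_test_rate` is applied, not the library's exact selection logic:

```python
import random

def ab_route(chosen_model, premium_model, ab_test_rate=0.05, rng=random):
    """With probability ab_test_rate, override the routed choice with the
    premium model so its answers can be compared against the cheap path."""
    if rng.random() < ab_test_rate:
        return premium_model, True   # flagged as an A/B probe
    return chosen_model, False

rng = random.Random(0)  # seeded for a reproducible demo
picks = [ab_route("gpt-4o-mini", "claude-opus", 0.05, rng) for _ in range(1000)]
probe_rate = sum(is_probe for _, is_probe in picks) / len(picks)
```

Over many requests the probe rate converges on `ab_test_rate`, giving a steady trickle of premium answers to score against the cheap ones.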
Context-Aware Routing
Adjusts routing based on conversation state and user expertise.
# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1}) # Normal tier
result = router.route("Fix this bug", context={"iteration": 5}) # Escalated tier
# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})
# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})
# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})
Context Parameters:
- iteration: Attempt number (escalates on repeated failures)
- conversation_length: Message count (longer = higher minimum tier)
- user_expertise: "novice", "intermediate", "expert"
- analyze_complexity: Enable structural complexity analysis
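The adjustments above can be sketched as index bumps over the tier ladder. The specific thresholds and bump sizes here are illustrative guesses, not the package's rules:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier, context):
    """Raise the classified tier based on context hints, clamped to the ladder."""
    idx = TIERS.index(base_tier)
    if context.get("iteration", 1) >= 3:
        idx += 1  # repeated failures escalate one tier
    if context.get("conversation_length", 0) >= 15:
        idx = max(idx, TIERS.index("moderate"))  # long chats get a floor
    if context.get("user_expertise") == "expert":
        idx += 1  # expert users tend to ask harder questions than they type
    return TIERS[min(idx, len(TIERS) - 1)]

tier = adjust_tier("simple", {"iteration": 5, "user_expertise": "expert"})
```

Note the clamp at the top of the ladder: context can never push a prompt past "expert".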
Cost Tracking and Optimization
Tracks usage costs and calculates savings versus premium-only routing.
# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")
# Usage breakdown by model
for model, data in cost_report['by_model'].items():
print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
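Per-request cost and the "savings vs premium" figure follow directly from the per-1k-token prices registered above. The dicts stand in for a registered model's pricing fields:

```python
def request_cost(model_cfg, input_tokens, output_tokens):
    """Per-request cost from per-1k-token prices, as in ModelConfig."""
    return (input_tokens / 1000 * model_cfg["cost_per_1k_input"]
            + output_tokens / 1000 * model_cfg["cost_per_1k_output"])

mini = {"cost_per_1k_input": 0.00015, "cost_per_1k_output": 0.0006}
opus = {"cost_per_1k_input": 0.015, "cost_per_1k_output": 0.075}

cheap = request_cost(mini, 2000, 500)    # what the routed model cost
premium = request_cost(opus, 2000, 500)  # what premium-only would have cost
savings = premium - cheap
```

For this 2000-in/500-out request the small model costs $0.0006 versus $0.0675 for the premium one, which is where the aggregate savings figure comes from.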
Confidence Gating
Routes to cheaper models when confidence is high, escalates when uncertain.
from antaris_router import ConfidenceRouter
router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet",
)
result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}") # Likely cheap model
result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}") # Likely premium model
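The gate itself reduces to a single comparison; everything interesting lives in how the confidence is produced. A minimal sketch of the decision rule:

```python
def gate(confidence, cheap_model, premium_model, threshold=0.8):
    """Send high-confidence classifications to the cheap model and
    everything uncertain to the premium one."""
    return cheap_model if confidence > threshold else premium_model

m1 = gate(0.93, "gpt-4o-mini", "claude-sonnet")  # confident -> cheap
m2 = gate(0.41, "gpt-4o-mini", "claude-sonnet")  # uncertain -> premium
```

The asymmetry is deliberate: a wrong escalation only costs money, while a wrong downgrade costs answer quality.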
Tier System
Five complexity levels from trivial lookups to expert system design.
| Tier | Examples | Characteristics |
|---|---|---|
| trivial | "What is 2+2?", "Define REST" | Single fact lookup, <10 words |
| simple | "Reverse string in Python", "TCP vs UDP" | Basic programming, short explanations |
| moderate | "Implement JWT auth", "Design Redis cache" | Multi-step implementation, system components |
| complex | "Microservices architecture", "Database sharding" | System design, multiple technologies |
| expert | "Distributed consensus algorithm", "HFT platform" | Research-level problems, novel solutions |
# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}
File-Based State Persistence
All routing decisions and learning data persist to JSON files.
routing_data/
├── routing_examples.json # Classification training data
├── routing_model.json # TF-IDF model weights
├── routing_decisions.json # Decision history
├── model_profiles.json # Quality scores per model/tier
└── router_config.json # Model registry and settings
# Manual state management
router.save() # Save all state
router.load() # Load from disk
router.backup("backup_dir") # Create backup
router.export_data() # Export for analysis
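A common safety pattern for file-based persistence like the layout above is write-then-rename, so a crash mid-save never leaves a half-written JSON file. This is a generic sketch, not necessarily what antaris-router does internally:

```python
import json
import os
import tempfile

def save_json_atomic(path, state):
    """Write JSON to a temp file in the same directory, then rename it
    into place. The rename is atomic on POSIX filesystems."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp, path)  # swap the finished file into place
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file
        raise

# Round-trip a profile file like model_profiles.json above.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model_profiles.json")
    save_json_atomic(path, {"gpt-4o-mini": {"moderate": {"quality_score": 0.73}}})
    with open(path) as f:
        loaded = json.load(f)
```

Keeping the temp file in the same directory matters: `os.replace` is only atomic within a single filesystem.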
MCP Server Integration
Optional MCP server for external integrations.
from antaris_router.mcp import MCPServer
# Start MCP server
server = MCPServer(router, port=8000)
server.start()
# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics
Legacy Router (v1 API)
Keyword-based classification with SLA monitoring.
from antaris_router import Router, SLAConfig
sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7,
)
router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")
# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()
Integration Examples
With OpenAI:
import openai
result = router.route(prompt)
response = openai.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}],
)
router.report_outcome(result.prompt_hash, evaluate(response), True)
With Anthropic:
import anthropic
result = router.route(prompt)
response = anthropic.messages.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}],
)
router.report_outcome(result.prompt_hash, evaluate(response), True)
With Local Models (Ollama):
import requests
# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0,
))
result = router.route(prompt)
if "local" in result.model:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": result.model, "prompt": prompt},
    )
Architecture
AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer   # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision   # Decision records
│   └── ModelProfiles     # Per-model quality scores
├── ContextAdjuster       # Context-aware tier adjustment
├── FallbackChain         # Model escalation logic
└── ABTester              # Validation routing
Router (Legacy)
├── TaskClassifier        # Keyword-based classification
├── ModelRegistry         # Model capabilities
├── CostTracker           # Usage analysis
└── SLAMonitor            # Budget and latency enforcement
Testing
git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-router
pip install pytest
python -m pytest tests/ -v
All 194 tests pass. Zero external dependencies required.
Performance Characteristics
- Cold start latency: 0.05ms median
- Memory usage: <5MB typical workload
- Classification accuracy: 100% on test suite (8/8 cases)
- Storage overhead: ~1KB per 1000 routing decisions
- TF-IDF model size: ~50KB for 5-tier classification
Limitations
- Classification is statistical, not deterministic
- Requires outcome feedback for learning
- TF-IDF less accurate than embeddings for edge cases
- No real-time pricing data
- Does not call models directly
License
Apache 2.0 License. See LICENSE for details.
Part of the antaris-suite:
- antaris-memory - Persistent memory for agents
- antaris-guard - Security and prompt injection detection
- antaris-context - Context window optimization
File details
Details for the file antaris_router-4.9.17.tar.gz.
File metadata
- Download URL: antaris_router-4.9.17.tar.gz
- Upload date:
- Size: 72.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90d8278ace87b0198d8cacabf9d033ea0e6b0401c842ad0bba3ac09c7b945178 |
| MD5 | 129023b0ccc8f66c5cba1a239eee6460 |
| BLAKE2b-256 | f2a7cd403bc368b2cf4b9a9a9e323f24f0666c60aac8b4e15abc16d212f5c479 |
File details
Details for the file antaris_router-4.9.17-py3-none-any.whl.
File metadata
- Download URL: antaris_router-4.9.17-py3-none-any.whl
- Upload date:
- Size: 58.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19136c259893acd5163ad3a189352c0c0e9e6171b14091fa7e5b68ccc16118df |
| MD5 | f6963ac8af8be507703caabb8fd0a927 |
| BLAKE2b-256 | 5952c9656e5c44a586ca48f051b1921005fa8c9f9870d4d62a0c0ed30bcde256 |