
File-based model router for LLM cost optimization. Zero dependencies.

Project description

antaris-router

Adaptive model routing with semantic classification and outcome learning. Zero external dependencies.

Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.

pip install antaris-router

Version 4.9.13 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only

Benchmarks

  • Routing accuracy: 100% (8/8 correct on the bundled test suite)
  • Self-improving: accuracy increases with outcome data accumulation
  • Latency: median 0.05ms, p99 0.09ms
  • Memory: <5MB for typical workloads

Key Exports

from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig

Complete Workflow Example

from antaris_router import AdaptiveRouter, ModelConfig

# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"), 
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))

router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")

# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response)  # 0.0-1.0

# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7
)

# Save learned state
router.save()

# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")

Semantic Classification

Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.

# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?")                    # tier: trivial
router.route("Implement OAuth2 flow")             # tier: moderate  
router.route("Design distributed consensus")      # tier: expert

Classification Features:

  • ~50 labeled examples across 5 complexity tiers
  • TF-IDF term weighting for semantic understanding
  • Cosine similarity for classification decisions
  • teach() method for manual corrections

# Correct misclassification
router.teach("Optimize Kubernetes for cost", "complex")
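For intuition, here is a stdlib-only sketch of TF-IDF vectorization plus cosine similarity over tiny, hypothetical per-tier example sets. This is an illustration of the technique, not the library's internal classifier (which trains on ~50 labeled examples):

```python
import math
from collections import Counter

# Hypothetical one-line example sets per tier; illustrative only.
EXAMPLES = {
    "trivial": "what is define lookup fact",
    "simple": "reverse string basic function explain difference",
    "moderate": "implement auth design cache multi step component",
    "complex": "architecture sharding microservices distributed system",
    "expert": "consensus algorithm research novel distributed platform",
}
DOCS = [doc.split() for doc in EXAMPLES.values()]

def idf(word):
    # Inverse document frequency across the example sets
    df = sum(1 for doc in DOCS if word in doc)
    return math.log(len(DOCS) / df) if df else 0.0

def tfidf(text):
    words = text.lower().split()
    counts = Counter(words)
    return {w: (c / len(words)) * idf(w) for w, c in counts.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt):
    vec = tfidf(prompt)
    return max(EXAMPLES, key=lambda tier: cosine(vec, tfidf(EXAMPLES[tier])))

print(classify("implement auth with a cache"))  # moderate
```

Because the whole model is term weights over a handful of examples, classification stays in the sub-millisecond range with no network calls.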

Quality Tracking with Outcome Learning

Router builds quality profiles per model per tier based on reported outcomes.

# Quality score calculation
score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}

# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)
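The weighting above can be written as a small standalone function. The counters here are hypothetical inputs, not the library's internal state:

```python
def quality_score(successes, attempts, quality_sum, escalations):
    """Blended score: 0.4 * success_rate + 0.4 * avg_quality
    + 0.2 * (1 - escalation_rate), per the formula above."""
    if attempts == 0:
        return 0.0
    success_rate = successes / attempts
    avg_quality = quality_sum / attempts
    escalation_rate = escalations / attempts
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# Hypothetical counters: 33/45 successes, summed quality 30.6, 5 escalations
print(round(quality_score(33, 45, 30.6, 5), 2))  # 0.74
```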

Learning Process:

  1. Router makes initial routing decision
  2. You use the suggested model
  3. Call report_outcome() with quality score and success flag
  4. Router updates quality profiles
  5. Future routing considers learned performance data

Fallback Chains

Automatic failover when primary models are unavailable or perform poorly.

# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"]
)

result = router.route("Debug this memory leak")
print(result.model)           # Primary choice
print(result.fallback_chain)  # Ordered alternatives

# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
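A typical consumption pattern is a retry loop that walks the chain. This is a sketch; `call_model` and `evaluate` are placeholders for your own client and scoring code, and the 0.7 success cutoff mirrors the workflow example above:

```python
def route_with_failover(router, prompt, call_model, evaluate, max_attempts=3):
    """Try the routed model, escalating down the fallback chain on
    provider errors or low-quality responses."""
    result = router.route(prompt)
    model = result.model
    for _ in range(max_attempts):
        try:
            response = call_model(model, prompt)
            score = evaluate(response)  # 0.0-1.0
            router.report_outcome(result.prompt_hash, quality_score=score,
                                  success=score > 0.7)
            if score > 0.7:
                return model, response
        except Exception:
            pass  # provider error: fall through to escalation
        model = router.escalate(result.prompt_hash)
        if model is None:
            break
    raise RuntimeError("all models in the fallback chain failed")
```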

A/B Testing Support

Randomly routes a percentage of requests to premium models for validation.

# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)

# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
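The gating itself amounts to a Bernoulli draw per request. A minimal sketch of the idea (not the library's internal code):

```python
import random

def pick_route(classified_model, premium_model, ab_test_rate=0.05, rng=random):
    """With probability ab_test_rate, override the classifier and send the
    request to the premium model so outcomes can be compared."""
    if rng.random() < ab_test_rate:
        return premium_model, True   # flagged as an A/B test request
    return classified_model, False

# Over many requests, roughly ab_test_rate of them go premium:
hits = sum(1 for _ in range(10_000) if pick_route("gpt-4o-mini", "claude-opus")[1])
print(hits)  # roughly 500
```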

Context-Aware Routing

Adjusts routing based on conversation state and user expertise.

# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1})   # Normal tier
result = router.route("Fix this bug", context={"iteration": 5})   # Escalated tier

# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})

# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})

# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})

Context Parameters:

  • iteration: Attempt number (escalates on repeated failures)
  • conversation_length: Message count (longer = higher minimum tier)
  • user_expertise: "novice", "intermediate", "expert"
  • analyze_complexity: Enable structural complexity analysis
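One plausible way these parameters could shift a tier, shown as a standalone sketch; the library's actual heuristics may weight the signals differently:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier, context):
    """Illustrative context-based tier adjustment."""
    idx = TIERS.index(base_tier)
    if context.get("iteration", 1) >= 3:
        idx += 1                                  # repeated failures escalate
    if context.get("conversation_length", 0) >= 20:
        idx = max(idx, TIERS.index("moderate"))   # long chats set a floor
    if context.get("user_expertise") == "expert":
        idx += 1                                  # experts get stronger models
    return TIERS[min(idx, len(TIERS) - 1)]

print(adjust_tier("simple", {"iteration": 5}))  # moderate
```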

Cost Tracking and Optimization

Tracks usage costs and calculates savings versus premium-only routing.

# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")

# Usage breakdown by model
for model, data in cost_report['by_model'].items():
    print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
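Per-request cost follows directly from the per-1k-token rates registered in ModelConfig. A quick helper, using the gpt-4o-mini rates from the registration example and hypothetical token counts:

```python
def request_cost(input_tokens, output_tokens,
                 cost_per_1k_input, cost_per_1k_output):
    """Dollar cost of one request from per-1k-token rates (as in ModelConfig)."""
    return ((input_tokens / 1000) * cost_per_1k_input
            + (output_tokens / 1000) * cost_per_1k_output)

# gpt-4o-mini rates: $0.00015/1k in, $0.0006/1k out; 2,000 in / 500 out
print(f"${request_cost(2000, 500, 0.00015, 0.0006):.6f}")  # $0.000600
```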

Confidence Gating

Routes to cheaper models when confidence is high, escalates when uncertain.

from antaris_router import ConfidenceRouter

router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet"
)

result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}")  # Likely cheap model

result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}") 
print(f"Model: {result.model}")  # Likely premium model

Tier System

Five complexity levels from trivial lookups to expert system design.

| Tier     | Examples                                           | Characteristics                               |
|----------|----------------------------------------------------|-----------------------------------------------|
| trivial  | "What is 2+2?", "Define REST"                      | Single fact lookup, <10 words                 |
| simple   | "Reverse string in Python", "TCP vs UDP"           | Basic programming, short explanations         |
| moderate | "Implement JWT auth", "Design Redis cache"         | Multi-step implementation, system components  |
| complex  | "Microservices architecture", "Database sharding"  | System design, multiple technologies          |
| expert   | "Distributed consensus algorithm", "HFT platform"  | Research-level problems, novel solutions      |

# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}

File-Based State Persistence

All routing decisions and learning data persists to JSON files.

routing_data/
├── routing_examples.json    # Classification training data
├── routing_model.json       # TF-IDF model weights
├── routing_decisions.json   # Decision history
├── model_profiles.json      # Quality scores per model/tier
└── router_config.json       # Model registry and settings

# Manual state management
router.save()                    # Save all state
router.load()                    # Load from disk
router.backup("backup_dir")      # Create backup
router.export_data()             # Export for analysis

MCP Server Integration

Optional MCP server for external integrations.

from antaris_router.mcp import MCPServer

# Start MCP server
server = MCPServer(router, port=8000)
server.start()

# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics
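A client can hit these endpoints with nothing beyond the stdlib. The request and response shapes below are assumptions based on the endpoint comments above, not a documented wire format:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed server address

def get_route(prompt, base_url=BASE_URL):
    """Query the /route endpoint; assumes a JSON response body."""
    query = urllib.parse.urlencode({"prompt": prompt})
    with urllib.request.urlopen(f"{base_url}/route?{query}") as resp:
        return json.load(resp)

def post_outcome(prompt_hash, quality_score, success, base_url=BASE_URL):
    """Report an outcome via POST /outcome; assumes a JSON request body."""
    body = json.dumps({"prompt_hash": prompt_hash,
                       "quality_score": quality_score,
                       "success": success}).encode()
    req = urllib.request.Request(f"{base_url}/outcome", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```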

Legacy Router (v1 API)

Keyword-based classification with SLA monitoring.

from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7
)

router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")

# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()

Integration Examples

With OpenAI:

import openai

result = router.route(prompt)
response = openai.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Anthropic:

import anthropic

result = router.route(prompt)
response = anthropic.messages.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Local Models (Ollama):

import requests

# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0
))

result = router.route(prompt)
if "local" in result.model:
    # Disable streaming so the endpoint returns a single JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": result.model, "prompt": prompt, "stream": False},
    )
    text = response.json()["response"]

Architecture

AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer      # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision      # Decision records
│   └── ModelProfiles        # Per-model quality scores
├── ContextAdjuster          # Context-aware tier adjustment
├── FallbackChain           # Model escalation logic
└── ABTester                # Validation routing

Router (Legacy)
├── TaskClassifier          # Keyword-based classification
├── ModelRegistry           # Model capabilities
├── CostTracker             # Usage analysis
└── SLAMonitor              # Budget and latency enforcement

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite
pip install pytest
python -m pytest tests/ -v

All 194 tests pass. Zero external dependencies required.

Performance Characteristics

  • Cold start latency: 0.05ms median
  • Memory usage: <5MB typical workload
  • Classification accuracy: 100% on test suite (8/8 cases)
  • Storage overhead: ~1KB per 1000 routing decisions
  • TF-IDF model size: ~50KB for 5-tier classification

Limitations

  • Classification is statistical, not deterministic
  • Requires outcome feedback for learning
  • TF-IDF less accurate than embeddings for edge cases
  • No real-time pricing data
  • Does not call models directly

License

Apache 2.0 License. See LICENSE for details.


Part of the antaris-suite.

Download files

Source Distribution

antaris_router-4.9.13.tar.gz (72.9 kB)

Built Distribution

antaris_router-4.9.13-py3-none-any.whl (58.8 kB)

File details

Details for the file antaris_router-4.9.13.tar.gz.

File metadata

  • Download URL: antaris_router-4.9.13.tar.gz
  • Size: 72.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for antaris_router-4.9.13.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | ee4a85c48cb48b058576d13b243b457a8cb95cc3d88cb51121117b652acca884 |
| MD5         | f88c1c20274f1b9c3871892f46b56a46                                 |
| BLAKE2b-256 | c299b44aa7b3d75132a8dc6b53539fd0a472714bce746e15e5d53dd4ce3f3f3d |


File details

Details for the file antaris_router-4.9.13-py3-none-any.whl.

File hashes

Hashes for antaris_router-4.9.13-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 6e89bb60799a978347838e6d3cb0ccad5e37b20f004da0c2f18cdb5bd1feef0b |
| MD5         | 5ae3711b0b839e3be2d5907aab4a5150                                 |
| BLAKE2b-256 | 49b306694e7fc1b6ef7d02ee73e493d103fe562fead07c2a6347680b92e32247 |

