
File-based model router for LLM cost optimization. Zero dependencies.


antaris-router

Adaptive model routing with semantic classification and outcome learning. Zero external dependencies.

Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.

pip install antaris-router

Version 4.7.0 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only

Benchmarks

  • Routing accuracy: 100% (8/8 correct on standard test suite)
  • Self-improving: accuracy increases with outcome data accumulation
  • Latency: median 0.05ms, p99 0.09ms
  • Memory: <5MB for typical workloads

Key Exports

from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig

Complete Workflow Example

from antaris_router import AdaptiveRouter, ModelConfig

# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"), 
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))

router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")

# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response)  # 0.0-1.0

# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7
)

# Save learned state
router.save()

# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")

Semantic Classification

Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.

# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?")                    # tier: trivial
router.route("Implement OAuth2 flow")             # tier: moderate  
router.route("Design distributed consensus")      # tier: expert

Classification Features:

  • ~50 labeled examples across 5 complexity tiers
  • TF-IDF term weighting for semantic understanding
  • Cosine similarity for classification decisions
  • teach() method for manual corrections

# Correct a misclassification
router.teach("Optimize Kubernetes for cost", "complex")
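The classification approach can be sketched from scratch in a few lines. Everything below (the toy corpus, the function names) is illustrative, not the library's internal code: each labeled example becomes a TF-IDF vector, and a prompt is assigned the tier of its nearest example by cosine similarity.

```python
import math
from collections import Counter

# Tiny labeled corpus (assumed examples, not the library's training data)
EXAMPLES = {
    "trivial": ["what is 2 + 2", "define rest"],
    "moderate": ["implement jwt auth", "implement oauth2 flow"],
    "expert": ["design distributed consensus", "design a hft platform"],
}

def tf_idf_vectors(docs):
    """Compute a TF-IDF vector (term -> weight dict) per tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt):
    """Return the tier of the most similar labeled example."""
    labels, docs = zip(*[(tier, ex.split())
                         for tier, exs in EXAMPLES.items() for ex in exs])
    vecs, idf = tf_idf_vectors(list(docs))
    tf = Counter(prompt.lower().split())
    qvec = {t: tf[t] / sum(tf.values()) * idf.get(t, 0.0) for t in tf}
    sims = [(cosine(qvec, v), lbl) for v, lbl in zip(vecs, labels)]
    return max(sims)[1]
```

This is why no embeddings or API calls are needed: the whole pipeline is term counting and a dot product over stdlib data structures.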

Quality Tracking with Outcome Learning

Router builds quality profiles per model per tier based on reported outcomes.

# Quality score calculation
score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}

# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)

Learning Process:

  1. Router makes initial routing decision
  2. You use the suggested model
  3. Call report_outcome() with quality score and success flag
  4. Router updates quality profiles
  5. Future routing considers learned performance data
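The weighting shown above maps directly to a one-line function. This is a sketch of the documented 0.4/0.4/0.2 formula, not the library's implementation:

```python
def quality_score(success_rate, avg_quality, escalation_rate):
    """Combine outcome signals per the documented 0.4/0.4/0.2 weighting."""
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# A model with a 73% success rate, 0.70 average quality, and a 10%
# escalation rate scores 0.4*0.73 + 0.4*0.70 + 0.2*0.90 = 0.752
score = quality_score(0.73, 0.70, 0.10)
```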

Fallback Chains

Automatic failover when primary models are unavailable or perform poorly.

# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"]
)

result = router.route("Debug this memory leak")
print(result.model)           # Primary choice
print(result.fallback_chain)  # Ordered alternatives

# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
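A typical failover loop built on these calls might look like the sketch below. The `call_model` callable is hypothetical (your own client wrapper), and the assumption that `escalate()` returns `None` once the chain is exhausted is mine, not the documented contract:

```python
def route_with_failover(router, prompt, call_model, max_attempts=3):
    """Try the routed model, escalating down the fallback chain on failure.

    call_model(model, prompt) is a user-supplied callable (hypothetical
    here) that raises on failure. escalate() is assumed to return None
    when no fallback remains.
    """
    result = router.route(prompt)
    model = result.model
    for _ in range(max_attempts):
        try:
            return call_model(model, prompt)
        except Exception:
            model = router.escalate(result.prompt_hash)
            if model is None:
                raise
    raise RuntimeError("all fallback models failed")
```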

A/B Testing Support

Randomly routes a percentage of requests to premium models for validation.

# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)

# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
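The A/B gate itself is a simple probabilistic override. A standalone sketch of the idea (assumed behavior, not the library's code):

```python
import random

def ab_gate(classified_model, premium_model, ab_test_rate=0.05, rng=random):
    """With probability ab_test_rate, override the classified choice with
    the premium model so its quality can be compared on live traffic.
    Returns (model, is_ab_test)."""
    if rng.random() < ab_test_rate:
        return premium_model, True
    return classified_model, False
```

Injecting `rng` keeps the gate deterministic under test while defaulting to real randomness in production.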

Context-Aware Routing

Adjusts routing based on conversation state and user expertise.

# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1})   # Normal tier
result = router.route("Fix this bug", context={"iteration": 5})   # Escalated tier

# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})

# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})

# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})

Context Parameters:

  • iteration: Attempt number (escalates on repeated failures)
  • conversation_length: Message count (longer = higher minimum tier)
  • user_expertise: "novice", "intermediate", "expert"
  • analyze_complexity: Enable structural complexity analysis
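The adjustments above amount to nudging a tier index up or down. The thresholds and rules below are assumed heuristics for illustration, not the library's exact logic:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier, iteration=1, conversation_length=0,
                user_expertise=None):
    """Sketch of context-based tier adjustment (assumed heuristics)."""
    idx = TIERS.index(base_tier)
    if iteration >= 3:                 # repeated failures escalate
        idx += 1
    if conversation_length >= 20:      # long conversations set a floor
        idx = max(idx, TIERS.index("moderate"))
    if user_expertise == "expert":     # experts tend to ask harder questions
        idx += 1
    return TIERS[min(idx, len(TIERS) - 1)]
```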

Cost Tracking and Optimization

Tracks usage costs and calculates savings versus premium-only routing.

# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")

# Usage breakdown by model
for model, data in cost_report['by_model'].items():
    print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
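Per-request cost follows directly from the registered per-1k-token rates. A sketch with assumed token counts, using the gpt-4o-mini and claude-opus rates registered earlier:

```python
def request_cost(cost_per_1k_input, cost_per_1k_output,
                 input_tokens, output_tokens):
    """Cost of one request given per-1k-token rates."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# 2,000 input + 500 output tokens:
mini = request_cost(0.00015, 0.0006, 2000, 500)  # 0.0003 + 0.0003 = 0.0006
opus = request_cost(0.015, 0.075, 2000, 500)     # 0.03 + 0.0375 = 0.0675
savings = opus - mini                            # what routing down saved
```

Summing that per-decision delta against the premium-only baseline is how a savings figure like the one above can be derived.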

Confidence Gating

Routes to cheaper models when confidence is high, escalates when uncertain.

from antaris_router import ConfidenceRouter

router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet"
)

result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}")  # Likely cheap model

result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}") 
print(f"Model: {result.model}")  # Likely premium model
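The gating rule itself is a one-liner; a sketch of the decision, not ConfidenceRouter's internal code:

```python
def gate(confidence, threshold, cheap_model, premium_model):
    """Use the cheap model only when classification confidence clears
    the threshold; otherwise pay for the premium model."""
    return cheap_model if confidence > threshold else premium_model
```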

Tier System

Five complexity levels from trivial lookups to expert system design.

  • trivial: "What is 2+2?", "Define REST" (single fact lookup, <10 words)
  • simple: "Reverse string in Python", "TCP vs UDP" (basic programming, short explanations)
  • moderate: "Implement JWT auth", "Design Redis cache" (multi-step implementation, system components)
  • complex: "Microservices architecture", "Database sharding" (system design, multiple technologies)
  • expert: "Distributed consensus algorithm", "HFT platform" (research-level problems, novel solutions)

# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}

File-Based State Persistence

All routing decisions and learning data persists to JSON files.

routing_data/
├── routing_examples.json    # Classification training data
├── routing_model.json       # TF-IDF model weights
├── routing_decisions.json   # Decision history
├── model_profiles.json      # Quality scores per model/tier
└── router_config.json       # Model registry and settings

# Manual state management
router.save()                    # Save all state
router.load()                    # Load from disk
router.backup("backup_dir")      # Create backup
router.export_data()             # Export for analysis
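JSON state files are vulnerable to corruption if a process dies mid-write. A common mitigation, shown here as a generic sketch (the library's actual persistence details are not documented above), is to write to a temp file and atomically rename it over the target:

```python
import json
import os
import tempfile

def atomic_json_save(path, data):
    """Write JSON to a temp file, then rename over the target.
    os.replace is atomic on POSIX and Windows, so readers never
    observe a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```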

MCP Server Integration

Optional MCP server for external integrations.

from antaris_router.mcp import MCPServer

# Start MCP server
server = MCPServer(router, port=8000)
server.start()

# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics

Legacy Router (v1 API)

Keyword-based classification with SLA monitoring.

from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7
)

router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")

# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()

Integration Examples

With OpenAI:

import openai

result = router.route(prompt)
response = openai.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Anthropic:

import anthropic

result = router.route(prompt)
response = anthropic.messages.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Local Models (Ollama):

import requests

# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0
))

result = router.route(prompt)
if "local" in result.model:
    response = requests.post("http://localhost:11434/api/generate", 
                           json={"model": result.model, "prompt": prompt})

Architecture

AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer      # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision      # Decision records
│   └── ModelProfiles        # Per-model quality scores
├── ContextAdjuster          # Context-aware tier adjustment
├── FallbackChain            # Model escalation logic
└── ABTester                 # Validation routing

Router (Legacy)
├── TaskClassifier          # Keyword-based classification
├── ModelRegistry           # Model capabilities
├── CostTracker             # Usage analysis
└── SLAMonitor              # Budget and latency enforcement

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite/antaris-router
pip install pytest
python -m pytest tests/ -v

All 194 tests pass. Zero external dependencies required.

Performance Characteristics

  • Cold start latency: 0.05ms median
  • Memory usage: <5MB typical workload
  • Classification accuracy: 100% on test suite (8/8 cases)
  • Storage overhead: ~1KB per 1000 routing decisions
  • TF-IDF model size: ~50KB for 5-tier classification

Limitations

  • Classification is statistical, not deterministic
  • Requires outcome feedback for learning
  • TF-IDF less accurate than embeddings for edge cases
  • No real-time pricing data
  • Does not call models directly

License

Apache 2.0 License. See LICENSE for details.


Part of the antaris-suite.


