
File-based model router for LLM cost optimization. Zero dependencies.


antaris-router

Adaptive model routing with semantic classification and outcome learning. Zero external dependencies.

Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.

pip install antaris-router

Version 4.6.0 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only

Benchmarks

  • Routing accuracy: 100% (8/8 correct on the standard test suite)
  • Self-improving: accuracy improves as outcome data accumulates
  • Latency: median 0.05ms, p99 0.09ms
  • Memory: <5MB for typical workloads

Key Exports

from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig

Complete Workflow Example

from antaris_router import AdaptiveRouter, ModelConfig

# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"), 
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))

router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")

# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response)  # 0.0-1.0

# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7
)

# Save learned state
router.save()

# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")

Semantic Classification

Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.

# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?")                    # tier: trivial
router.route("Implement OAuth2 flow")             # tier: moderate  
router.route("Design distributed consensus")      # tier: expert

Classification Features:

  • ~50 labeled examples across 5 complexity tiers
  • TF-IDF term weighting for semantic understanding
  • Cosine similarity for classification decisions
  • teach() method for manual corrections
# Correct misclassification
router.teach("Optimize Kubernetes for cost", "complex")
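
The TF-IDF-plus-cosine approach described above can be sketched with nothing but the standard library. Everything below (the labeled examples, tokenizer, and scoring) is an illustration of the technique, not the package's actual implementation:

```python
import math
from collections import Counter

# Illustrative labeled examples; the real package ships ~50 across 5 tiers.
EXAMPLES = [
    ("what is 2 + 2", "trivial"),
    ("define rest api", "trivial"),
    ("reverse a string in python", "simple"),
    ("implement jwt authentication flow", "moderate"),
    ("design a distributed consensus algorithm", "expert"),
]

def tokenize(text):
    return text.lower().split()

docs = [tokenize(text) for text, _ in EXAMPLES]
df = Counter(term for doc in docs for term in set(doc))  # document frequency
N = len(docs)

def tfidf(tokens):
    # Term frequency weighted by smoothed inverse document frequency.
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * math.log((N + 1) / (df.get(t, 0) + 1))
            for t, c in tf.items()}

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt):
    # Assign the tier of the most similar labeled example.
    vec = tfidf(tokenize(prompt))
    scored = [(cosine(vec, tfidf(doc)), tier)
              for doc, (_, tier) in zip(docs, EXAMPLES)]
    return max(scored)[1]
```

A nearest-example classifier like this is crude, but it shows why no embeddings or API calls are needed: everything reduces to term counts and a dot product.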

Quality Tracking with Outcome Learning

The router builds quality profiles per model and per tier from reported outcomes.

# Quality score calculation
score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}

# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)

Learning Process:

  1. Router makes initial routing decision
  2. You use the suggested model
  3. Call report_outcome() with quality score and success flag
  4. Router updates quality profiles
  5. Future routing considers learned performance data
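
As a concrete check of the weighting formula above, here is a stand-alone calculation; the counts are illustrative, and only the 0.4/0.4/0.2 weights come from the formula itself:

```python
# Blended quality score from the formula above; field names and counts are
# illustrative stand-ins for the per-model, per-tier profile data.
def quality_score(successes, attempts, quality_sum, escalations):
    success_rate = successes / attempts        # fraction of successful calls
    avg_quality = quality_sum / attempts       # mean reported quality (0.0-1.0)
    escalation_rate = escalations / attempts   # fraction escalated upward
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

score = quality_score(successes=33, attempts=45, quality_sum=32.85, escalations=5)
```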

Fallback Chains

Automatic failover when primary models are unavailable or perform poorly.

# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"]
)

result = router.route("Debug this memory leak")
print(result.model)           # Primary choice
print(result.fallback_chain)  # Ordered alternatives

# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
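
The escalation step itself amounts to walking an ordered list. A sketch (`next_in_chain` is a hypothetical helper, not the package API; the chain mirrors the configuration above):

```python
# Walk an ordered fallback chain; hypothetical helper, not the package API.
def next_in_chain(chain, current):
    """Return the model after `current` in the chain, or None if exhausted."""
    idx = chain.index(current)
    return chain[idx + 1] if idx + 1 < len(chain) else None

chain = ["gpt-4o-mini", "claude-sonnet", "claude-opus"]
```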

A/B Testing Support

Randomly routes a percentage of requests to premium models for validation.

# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)

# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
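
The `ab_test_rate` mechanic reduces to a single random draw per request. A minimal sketch, with illustrative names (the real router also records the probe for later comparison):

```python
import random

# A small random fraction of requests goes to the premium model regardless
# of classification, so the cheap model's quality can be validated against it.
def pick_model(classified, premium, ab_test_rate, rng=random.random):
    if rng() < ab_test_rate:
        return premium, True    # flagged as an A/B probe
    return classified, False
```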

Context-Aware Routing

Adjusts routing based on conversation state and user expertise.

# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1})   # Normal tier
result = router.route("Fix this bug", context={"iteration": 5})   # Escalated tier

# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})

# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})

# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})

Context Parameters:

  • iteration: Attempt number (escalates on repeated failures)
  • conversation_length: Message count (longer = higher minimum tier)
  • user_expertise: "novice", "intermediate", "expert"
  • analyze_complexity: Enable structural complexity analysis
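
The parameters above can be pictured as nudges on a tier index. The rules and thresholds below are illustrative guesses at the behavior, not the package's internal values:

```python
# Illustrative context-driven tier adjustment; thresholds are assumptions.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier, context):
    idx = TIERS.index(base_tier)
    if context.get("iteration", 1) >= 3:                # repeated failures escalate
        idx += 1
    if context.get("conversation_length", 0) >= 20:     # long threads set a floor
        idx = max(idx, TIERS.index("moderate"))
    if context.get("user_expertise") == "expert":       # experts get stronger models
        idx += 1
    return TIERS[min(idx, len(TIERS) - 1)]              # clamp at "expert"
```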

Cost Tracking and Optimization

Tracks usage costs and calculates savings versus premium-only routing.

# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")

# Usage breakdown by model
for model, data in cost_report['by_model'].items():
    print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
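
Under the hood, per-request cost is plain token arithmetic against the per-1k rates registered earlier. A sketch with an assumed 800-in / 400-out token request:

```python
# Token-count cost arithmetic using the per-1k rates from the registration
# example above; the token counts are illustrative.
def request_cost(input_tokens, output_tokens, per_1k_in, per_1k_out):
    return (input_tokens / 1000) * per_1k_in + (output_tokens / 1000) * per_1k_out

cheap = request_cost(800, 400, 0.00015, 0.0006)   # gpt-4o-mini rates
premium = request_cost(800, 400, 0.015, 0.075)    # claude-opus rates
savings = premium - cheap                         # what routing down saved
```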

Confidence Gating

Routes to cheaper models when confidence is high, escalates when uncertain.

from antaris_router import ConfidenceRouter

router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet"
)

result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}")  # Likely cheap model

result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}") 
print(f"Model: {result.model}")  # Likely premium model

Tier System

Five complexity levels from trivial lookups to expert system design.

Tier      Examples                                            Characteristics
trivial   "What is 2+2?", "Define REST"                       Single fact lookup, <10 words
simple    "Reverse string in Python", "TCP vs UDP"            Basic programming, short explanations
moderate  "Implement JWT auth", "Design Redis cache"          Multi-step implementation, system components
complex   "Microservices architecture", "Database sharding"   System design, multiple technologies
expert    "Distributed consensus algorithm", "HFT platform"   Research-level problems, novel solutions
# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}
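
A tier distribution like the one above directly implies an expected per-request cost. The tier-to-cost map below is an assumption for illustration, not measured data:

```python
# Expected per-request cost implied by a tier distribution and an assumed
# tier -> dollar-cost map (the dollar figures are illustrative).
tier_distribution = {"trivial": 0.25, "simple": 0.30, "moderate": 0.25,
                     "complex": 0.15, "expert": 0.05}
assumed_cost = {"trivial": 0.0004, "simple": 0.0004, "moderate": 0.0004,
                "complex": 0.018, "expert": 0.042}
expected = sum(share * assumed_cost[t] for t, share in tier_distribution.items())
```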

File-Based State Persistence

All routing decisions and learning data persist to JSON files.

routing_data/
├── routing_examples.json    # Classification training data
├── routing_model.json       # TF-IDF model weights
├── routing_decisions.json   # Decision history
├── model_profiles.json      # Quality scores per model/tier
└── router_config.json       # Model registry and settings
# Manual state management
router.save()                    # Save all state
router.load()                    # Load from disk
router.backup("backup_dir")      # Create backup
router.export_data()             # Export for analysis
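
The persistence model is plain JSON round-tripping, in the spirit of the layout above. File and field names here are illustrative, not the package's exact schema:

```python
import json
import tempfile
from pathlib import Path

# Round-trip sketch of JSON-file state persistence (illustrative schema).
def save_profiles(data_dir, profiles):
    path = Path(data_dir) / "model_profiles.json"
    path.write_text(json.dumps(profiles, indent=2))

def load_profiles(data_dir):
    path = Path(data_dir) / "model_profiles.json"
    return json.loads(path.read_text()) if path.exists() else {}

with tempfile.TemporaryDirectory() as d:
    save_profiles(d, {"gpt-4o-mini": {"moderate": {"quality_score": 0.73}}})
    restored = load_profiles(d)
```

Plain JSON keeps the state greppable and diffable, at the cost of rewriting whole files on each save.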

MCP Server Integration

Optional MCP server for external integrations.

from antaris_router.mcp import MCPServer

# Start MCP server
server = MCPServer(router, port=8000)
server.start()

# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics

Legacy Router (v1 API)

Keyword-based classification with SLA monitoring.

from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7
)

router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")

# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()

Integration Examples

With OpenAI:

import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

result = router.route(prompt)
response = client.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Anthropic:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

result = router.route(prompt)
response = client.messages.create(
    model=result.model,
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Local Models (Ollama):

import requests

# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0
))

result = router.route(prompt)
if "local" in result.model:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": result.model, "prompt": prompt, "stream": False},
    )
    text = response.json()["response"]

Architecture

AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer      # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision      # Decision records
│   └── ModelProfiles        # Per-model quality scores
├── ContextAdjuster          # Context-aware tier adjustment
├── FallbackChain           # Model escalation logic
└── ABTester                # Validation routing

Router (Legacy)
├── TaskClassifier          # Keyword-based classification
├── ModelRegistry           # Model capabilities
├── CostTracker             # Usage analysis
└── SLAMonitor              # Budget and latency enforcement

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite/antaris-router
pip install pytest
python -m pytest tests/ -v

All 194 tests pass. Zero external dependencies required.

Performance Characteristics

  • Cold start latency: 0.05ms median
  • Memory usage: <5MB typical workload
  • Classification accuracy: 100% on the 8-case standard test suite
  • Storage overhead: ~1KB per 1000 routing decisions
  • TF-IDF model size: ~50KB for 5-tier classification

Limitations

  • Classification is statistical, not deterministic
  • Requires outcome feedback for learning
  • TF-IDF less accurate than embeddings for edge cases
  • No real-time pricing data
  • Does not call models directly

License

Apache 2.0 License. See LICENSE for details.


Part of the antaris-suite.
