
File-based model router for LLM cost optimization. Zero dependencies.

Project description

antaris-router

Adaptive model routing with semantic classification and outcome learning. Zero external dependencies.

Routes prompts to optimal models using TF-IDF classification (no embeddings required). Tracks routing decisions and outcomes to improve accuracy over time. Fallback chains provide automatic failover. All state persists to JSON files.

pip install antaris-router

Version 4.9.13 | Suite Compatibility: antaris-suite 4.2.0 | Python 3.9+ | stdlib only

Benchmarks

  • Routing accuracy: 100% (8/8 correct on the bundled test suite)
  • Self-improving: accuracy increases with outcome data accumulation
  • Latency: median 0.05ms, p99 0.09ms
  • Memory: <5MB for typical workloads

Key Exports

from antaris_router import AdaptiveRouter, Router, RoutingDecision, ModelConfig

Complete Workflow Example

from antaris_router import AdaptiveRouter, ModelConfig

# Initialize router with file-based persistence
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

# Register models with tier ranges and costs
router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

router.register_model(ModelConfig(
    name="claude-sonnet",
    tier_range=("simple", "complex"), 
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))

router.register_model(ModelConfig(
    name="claude-opus",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

# Route prompts to appropriate models
result = router.route("Implement a distributed task queue with priority scheduling")
print(f"Route to: {result.model}")
print(f"Tier: {result.tier}")
print(f"Confidence: {result.confidence:.2f}")
print(f"Fallback chain: {result.fallback_chain}")

# Use the model (your implementation)
response = your_llm_client.call(result.model, result.prompt)
quality_score = evaluate_response(response)  # 0.0-1.0

# Report outcome so router learns
router.report_outcome(
    prompt_hash=result.prompt_hash,
    quality_score=quality_score,
    success=quality_score > 0.7
)

# Save learned state
router.save()

# View routing analytics
analytics = router.routing_analytics()
print(f"Total decisions: {analytics['total_decisions']}")
print(f"Cost savings: ${analytics['cost_savings']:.2f}")

Semantic Classification

Uses TF-IDF vectorization with cosine similarity for semantic understanding. No external embeddings or API calls required.

# These prompts route to different tiers despite similar length
router.route("What is 2 + 2?")                    # tier: trivial
router.route("Implement OAuth2 flow")             # tier: moderate  
router.route("Design distributed consensus")      # tier: expert

Classification Features:

  • ~50 labeled examples across 5 complexity tiers
  • TF-IDF term weighting for semantic understanding
  • Cosine similarity for classification decisions
  • teach() method for manual corrections

# Correct misclassification
router.teach("Optimize Kubernetes for cost", "complex")
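For intuition, here is a stdlib-only sketch of TF-IDF vectorization plus cosine similarity over tiny, hypothetical per-tier example sets. This is an illustration of the technique, not the library's internal classifier (which trains on ~50 labeled examples):

```python
import math
from collections import Counter

# Hypothetical one-line example sets per tier; illustrative only.
EXAMPLES = {
    "trivial": "what is define lookup fact",
    "simple": "reverse string basic function explain difference",
    "moderate": "implement auth design cache multi step component",
    "complex": "architecture sharding microservices distributed system",
    "expert": "consensus algorithm research novel distributed platform",
}
DOCS = [doc.split() for doc in EXAMPLES.values()]

def idf(word):
    # Inverse document frequency across the example sets
    df = sum(1 for doc in DOCS if word in doc)
    return math.log(len(DOCS) / df) if df else 0.0

def tfidf(text):
    words = text.lower().split()
    counts = Counter(words)
    return {w: (c / len(words)) * idf(w) for w, c in counts.items()}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt):
    vec = tfidf(prompt)
    return max(EXAMPLES, key=lambda tier: cosine(vec, tfidf(EXAMPLES[tier])))

print(classify("implement auth with a cache"))  # moderate
```

Because the whole model is term weights over a handful of examples, classification stays in the sub-millisecond range with no network calls.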

Quality Tracking with Outcome Learning

Router builds quality profiles per model per tier based on reported outcomes.

# Quality score calculation
score = 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# View model performance
profiles = router.get_model_profiles()
print(profiles["gpt-4o-mini"]["moderate"])
# {'quality_score': 0.73, 'attempts': 45, 'successes': 33}

# Models below threshold (default 0.30) are skipped
router.set_escalation_threshold(0.35)
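The weighting above can be written as a small standalone function. The counters here are hypothetical inputs, not the library's internal state:

```python
def quality_score(successes, attempts, quality_sum, escalations):
    """Blended score: 0.4 * success_rate + 0.4 * avg_quality
    + 0.2 * (1 - escalation_rate), per the formula above."""
    if attempts == 0:
        return 0.0
    success_rate = successes / attempts
    avg_quality = quality_sum / attempts
    escalation_rate = escalations / attempts
    return 0.4 * success_rate + 0.4 * avg_quality + 0.2 * (1 - escalation_rate)

# Hypothetical counters: 33/45 successes, summed quality 30.6, 5 escalations
print(round(quality_score(33, 45, 30.6, 5), 2))  # 0.74
```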

Learning Process:

  1. Router makes initial routing decision
  2. You use the suggested model
  3. Call report_outcome() with quality score and success flag
  4. Router updates quality profiles
  5. Future routing considers learned performance data

Fallback Chains

Automatic failover when primary models are unavailable or perform poorly.

# Configure fallback order
router = AdaptiveRouter(
    data_dir="./routing_data",
    fallback_chain=["gpt-4o-mini", "claude-sonnet", "claude-opus"]
)

result = router.route("Debug this memory leak")
print(result.model)           # Primary choice
print(result.fallback_chain)  # Ordered alternatives

# Escalate to next model if primary fails
next_model = router.escalate(result.prompt_hash)
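A typical consumption pattern is a retry loop that walks the chain. This is a sketch; `call_model` and `evaluate` are placeholders for your own client and scoring code, and the 0.7 success cutoff mirrors the workflow example above:

```python
def route_with_failover(router, prompt, call_model, evaluate, max_attempts=3):
    """Try the routed model, escalating down the fallback chain on
    provider errors or low-quality responses."""
    result = router.route(prompt)
    model = result.model
    for _ in range(max_attempts):
        try:
            response = call_model(model, prompt)
            score = evaluate(response)  # 0.0-1.0
            router.report_outcome(result.prompt_hash, quality_score=score,
                                  success=score > 0.7)
            if score > 0.7:
                return model, response
        except Exception:
            pass  # provider error: fall through to escalation
        model = router.escalate(result.prompt_hash)
        if model is None:
            break
    raise RuntimeError("all models in the fallback chain failed")
```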

A/B Testing Support

Randomly routes a percentage of requests to premium models for validation.

# Route 5% to premium models regardless of classification
router = AdaptiveRouter("./data", ab_test_rate=0.05)

# Track A/B test results
stats = router.get_ab_stats()
print(f"A/B tests: {stats['total_tests']}")
print(f"Premium win rate: {stats['premium_win_rate']:.2f}")
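The gating itself amounts to a Bernoulli draw per request. A minimal sketch of the idea (not the library's internal code):

```python
import random

def pick_route(classified_model, premium_model, ab_test_rate=0.05, rng=random):
    """With probability ab_test_rate, override the classifier and send the
    request to the premium model so outcomes can be compared."""
    if rng.random() < ab_test_rate:
        return premium_model, True   # flagged as an A/B test request
    return classified_model, False

# Over many requests, roughly ab_test_rate of them go premium:
hits = sum(1 for _ in range(10_000) if pick_route("gpt-4o-mini", "claude-opus")[1])
print(hits)  # roughly 500
```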

Context-Aware Routing

Adjusts routing based on conversation state and user expertise.

# Iteration count influences tier selection
result = router.route("Fix this bug", context={"iteration": 1})   # Normal tier
result = router.route("Fix this bug", context={"iteration": 5})   # Escalated tier

# Conversation length sets minimum tier
result = router.route("Any thoughts?", context={"conversation_length": 20})

# User expertise level
result = router.route("Optimize this", context={"user_expertise": "expert"})

# Query complexity analysis
result = router.route(long_complex_prompt, context={"analyze_complexity": True})

Context Parameters:

  • iteration: Attempt number (escalates on repeated failures)
  • conversation_length: Message count (longer = higher minimum tier)
  • user_expertise: "novice", "intermediate", "expert"
  • analyze_complexity: Enable structural complexity analysis
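One plausible way these parameters could shift a tier, shown as a standalone sketch; the library's actual heuristics may weight the signals differently:

```python
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def adjust_tier(base_tier, context):
    """Illustrative context-based tier adjustment."""
    idx = TIERS.index(base_tier)
    if context.get("iteration", 1) >= 3:
        idx += 1                                  # repeated failures escalate
    if context.get("conversation_length", 0) >= 20:
        idx = max(idx, TIERS.index("moderate"))   # long chats set a floor
    if context.get("user_expertise") == "expert":
        idx += 1                                  # experts get stronger models
    return TIERS[min(idx, len(TIERS) - 1)]

print(adjust_tier("simple", {"iteration": 5}))  # moderate
```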

Cost Tracking and Optimization

Tracks usage costs and calculates savings versus premium-only routing.

# Cost analysis
cost_report = router.get_cost_analysis(days=7)
print(f"Total cost: ${cost_report['total_cost']:.2f}")
print(f"Savings vs premium: ${cost_report['savings']:.2f}")
print(f"Cost per request: ${cost_report['avg_cost_per_request']:.4f}")

# Usage breakdown by model
for model, data in cost_report['by_model'].items():
    print(f"{model}: {data['requests']} requests, ${data['cost']:.2f}")
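Per-request cost follows directly from the per-1k-token rates registered in ModelConfig. A quick helper, using the gpt-4o-mini rates from the registration example and hypothetical token counts:

```python
def request_cost(input_tokens, output_tokens,
                 cost_per_1k_input, cost_per_1k_output):
    """Dollar cost of one request from per-1k-token rates (as in ModelConfig)."""
    return ((input_tokens / 1000) * cost_per_1k_input
            + (output_tokens / 1000) * cost_per_1k_output)

# gpt-4o-mini rates: $0.00015/1k in, $0.0006/1k out; 2,000 in / 500 out
print(f"${request_cost(2000, 500, 0.00015, 0.0006):.6f}")  # $0.000600
```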

Confidence Gating

Routes to cheaper models when confidence is high, escalates when uncertain.

from antaris_router import ConfidenceRouter

router = ConfidenceRouter(
    confidence_threshold=0.8,  # Use cheap model if confidence > 0.8
    cheap_model="gpt-4o-mini",
    premium_model="claude-sonnet"
)

result = router.route("Simple math problem")
print(f"Confidence: {result.confidence:.2f}")
print(f"Model: {result.model}")  # Likely cheap model

result = router.route("Complex system architecture question")
print(f"Confidence: {result.confidence:.2f}") 
print(f"Model: {result.model}")  # Likely premium model

Tier System

Five complexity levels from trivial lookups to expert system design.

| Tier     | Examples                                           | Characteristics                               |
|----------|----------------------------------------------------|-----------------------------------------------|
| trivial  | "What is 2+2?", "Define REST"                      | Single fact lookup, <10 words                 |
| simple   | "Reverse string in Python", "TCP vs UDP"           | Basic programming, short explanations         |
| moderate | "Implement JWT auth", "Design Redis cache"         | Multi-step implementation, system components  |
| complex  | "Microservices architecture", "Database sharding"  | System design, multiple technologies          |
| expert   | "Distributed consensus algorithm", "HFT platform"  | Research-level problems, novel solutions      |

# View tier distribution
analytics = router.routing_analytics()
print(analytics['tier_distribution'])
# {'trivial': 0.25, 'simple': 0.30, 'moderate': 0.25, 'complex': 0.15, 'expert': 0.05}

File-Based State Persistence

All routing decisions and learning data persists to JSON files.

routing_data/
├── routing_examples.json    # Classification training data
├── routing_model.json       # TF-IDF model weights
├── routing_decisions.json   # Decision history
├── model_profiles.json      # Quality scores per model/tier
└── router_config.json       # Model registry and settings

# Manual state management
router.save()                    # Save all state
router.load()                    # Load from disk
router.backup("backup_dir")      # Create backup
router.export_data()             # Export for analysis

MCP Server Integration

Optional MCP server for external integrations.

from antaris_router.mcp import MCPServer

# Start MCP server
server = MCPServer(router, port=8000)
server.start()

# MCP endpoints
# GET /route?prompt=... - Get routing decision
# POST /outcome - Report outcome
# GET /analytics - View routing statistics
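A client can hit these endpoints with nothing beyond the stdlib. The request and response shapes below are assumptions based on the endpoint comments above, not a documented wire format:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed server address

def get_route(prompt, base_url=BASE_URL):
    """Query the /route endpoint; assumes a JSON response body."""
    query = urllib.parse.urlencode({"prompt": prompt})
    with urllib.request.urlopen(f"{base_url}/route?{query}") as resp:
        return json.load(resp)

def post_outcome(prompt_hash, quality_score, success, base_url=BASE_URL):
    """Report an outcome via POST /outcome; assumes a JSON request body."""
    body = json.dumps({"prompt_hash": prompt_hash,
                       "quality_score": quality_score,
                       "success": success}).encode()
    req = urllib.request.Request(f"{base_url}/outcome", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```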

Legacy Router (v1 API)

Keyword-based classification with SLA monitoring.

from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,
    budget_per_hour_usd=5.00,
    min_quality_score=0.7
)

router = Router(config_path="config.json", sla=sla)
decision = router.route("Implement user authentication")

# SLA monitoring
report = router.get_sla_report(since_hours=1.0)
alert = router.check_budget_alert()

Integration Examples

With OpenAI:

import openai

result = router.route(prompt)
response = openai.chat.completions.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Anthropic:

import anthropic

result = router.route(prompt)
response = anthropic.messages.create(
    model=result.model,
    messages=[{"role": "user", "content": prompt}]
)
router.report_outcome(result.prompt_hash, evaluate(response), True)

With Local Models (Ollama):

import requests

# Register local model at $0 cost
router.register_model(ModelConfig(
    name="llama3-8b-local",
    tier_range=("trivial", "simple"),
    cost_per_1k_input=0.0,
    cost_per_1k_output=0.0
))

result = router.route(prompt)
if "local" in result.model:
    # Disable streaming so the endpoint returns a single JSON object
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": result.model, "prompt": prompt, "stream": False},
    )
    text = response.json()["response"]

Architecture

AdaptiveRouter
├── SemanticClassifier
│   └── TFIDFVectorizer      # Term frequency analysis
├── QualityTracker
│   ├── RoutingDecision      # Decision records
│   └── ModelProfiles        # Per-model quality scores
├── ContextAdjuster          # Context-aware tier adjustment
├── FallbackChain           # Model escalation logic
└── ABTester                # Validation routing

Router (Legacy)
├── TaskClassifier          # Keyword-based classification
├── ModelRegistry           # Model capabilities
├── CostTracker             # Usage analysis
└── SLAMonitor              # Budget and latency enforcement

Testing

git clone https://github.com/Antaris-Analytics-LLC/antaris-suite.git
cd antaris-suite
pip install pytest
python -m pytest tests/ -v

All 194 tests pass. Zero external dependencies required.

Performance Characteristics

  • Cold start latency: 0.05ms median
  • Memory usage: <5MB typical workload
  • Classification accuracy: 100% on test suite (8/8 cases)
  • Storage overhead: ~1KB per 1000 routing decisions
  • TF-IDF model size: ~50KB for 5-tier classification

Limitations

  • Classification is statistical, not deterministic
  • Requires outcome feedback for learning
  • TF-IDF less accurate than embeddings for edge cases
  • No real-time pricing data
  • Does not call models directly

License

Apache 2.0 License. See LICENSE for details.


Part of the antaris-suite.

Download files

Source Distribution

antaris_router-4.9.13.tar.gz (72.9 kB)

Built Distribution

antaris_router-4.9.13-py3-none-any.whl (58.8 kB)

File details

Details for the file antaris_router-4.9.13.tar.gz.

File metadata

  • Download URL: antaris_router-4.9.13.tar.gz
  • Size: 72.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for antaris_router-4.9.13.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | ee4a85c48cb48b058576d13b243b457a8cb95cc3d88cb51121117b652acca884 |
| MD5         | f88c1c20274f1b9c3871892f46b56a46                                 |
| BLAKE2b-256 | c299b44aa7b3d75132a8dc6b53539fd0a472714bce746e15e5d53dd4ce3f3f3d |


File details

Details for the file antaris_router-4.9.13-py3-none-any.whl.

File hashes

Hashes for antaris_router-4.9.13-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 6e89bb60799a978347838e6d3cb0ccad5e37b20f004da0c2f18cdb5bd1feef0b |
| MD5         | 5ae3711b0b839e3be2d5907aab4a5150                                 |
| BLAKE2b-256 | 49b306694e7fc1b6ef7d02ee73e493d103fe562fead07c2a6347680b92e32247 |

