antaris-router ⚡
Adaptive LLM model routing for cost optimization – zero dependencies, stdlib only.
Route every prompt to the right model at the right cost. antaris-router classifies task complexity, selects the optimal model from your registry, enforces SLAs, tracks provider health, and continuously improves routing quality through outcome feedback.
📦 Installation
pip install antaris-router
Version: 4.9.20
Dependencies: None – pure Python stdlib only.
🗺️ Table of Contents
- Why antaris-router?
- Tier System
- Quick Start
- v2.0 API – AdaptiveRouter (Semantic)
- v1.0 API – Router (Keyword-based)
- RoutingDecision Fields
- Explainability – explain()
- Confidence-Gated Escalation
- SLA Configuration & Enforcement
- Provider Health Tracking
- A/B Testing
- Cost Forecasting
- Cost Tracking & Analytics
- Model Registry
- SemanticClassifier (v2.0)
- QualityTracker (v2.0)
- ClassificationResult & Signals
- Full API Reference
- Complete Exports
- Migration: v1.0 → v2.0
🎯 Why antaris-router?
LLM costs are asymmetric. A one-line question routed to claude-opus wastes 50–100× what it needs to. antaris-router fixes that:
| Without routing | With antaris-router |
|---|---|
| Every request → one expensive model | Each request → cheapest capable model |
| No visibility into cost breakdown | Real-time cost tracking + forecasting |
| Silent model failures | Provider health tracking + auto-failover |
| Blind prompt-to-model mapping | TF-IDF semantic classification (v2.0) |
| No quality signal loop | Outcome feedback → self-improving routing |
📊 Tier System
antaris-router classifies every prompt into one of five complexity tiers. Each tier maps to a cost bracket, ensuring you always pay proportionally to task complexity.
| Tier | Char Range | Typical Tasks | Strategy |
|---|---|---|---|
| `trivial` | ≤ 50 chars | Simple Q&A, single-word lookups | Cheapest model |
| `simple` | 50–200 chars | Basic tasks, short explanations | Low-cost model |
| `moderate` | 200–1,000 chars | Standard tasks, multi-step answers | Mid-tier model |
| `complex` | 1,000–3,000 chars | Analysis, architecture, code review | Powerful model |
| `expert` | 3,000+ chars | Highest complexity, long-form reasoning | Most capable model |
Tier boundaries are based on character count combined with keyword signals, code detection, and structural complexity analysis. The v2.0 AdaptiveRouter additionally uses TF-IDF semantic classification and improves tier accuracy over time through outcome feedback.
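The length component of this heuristic can be sketched in a few lines of plain Python. This is an illustration of the boundaries in the table above, not the library's actual classifier (which also weighs keywords, code detection, and structure):

```python
# Illustrative only: maps character count to a tier using the
# boundaries from the table above. The real classifier combines
# this with keyword, code, and structural signals.
TIER_BOUNDS = [
    ("trivial", 50),
    ("simple", 200),
    ("moderate", 1_000),
    ("complex", 3_000),
]

def tier_by_length(prompt: str) -> str:
    n = len(prompt)
    for tier, upper in TIER_BOUNDS:
        if n <= upper:
            return tier
    return "expert"
```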
⚡ Quick Start
v2.0 – AdaptiveRouter (recommended for new projects)
from antaris_router import AdaptiveRouter, ModelConfig
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)
router.register_model(ModelConfig(
name="gpt-4o-mini",
tier_range=("trivial", "moderate"),
cost_per_1k_input=0.00015,
cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
name="claude-sonnet-4-6",
tier_range=("simple", "complex"),
cost_per_1k_input=0.003,
cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
name="claude-opus-4-6",
tier_range=("complex", "expert"),
cost_per_1k_input=0.015,
cost_per_1k_output=0.075,
))
result = router.route("Implement a distributed task queue with priority scheduling")
print(result.model) # "claude-sonnet-4-6"
print(result.tier) # "complex"
print(result.confidence) # 0.87
print(result.estimated_cost) # 0.00234
# Feed outcome back to improve future routing
router.report_outcome(result.prompt_hash, quality_score=0.9, success=True)
# Session analytics
analytics = router.get_analytics()
print(analytics)
# {
# "total_routed": 42,
# "tier_distribution": {"trivial": 5, "simple": 12, "moderate": 15, "complex": 8, "expert": 2},
# "avg_quality": 0.88,
# "model_usage": {"gpt-4o-mini": 17, "claude-sonnet-4-6": 21, "claude-opus-4-6": 4},
# "cost_savings": 0.142
# }
v1.0 – Router (production-proven, keyword-based)
from antaris_router import Router
router = Router(enable_cost_tracking=True)
decision = router.route("Explain async/await in Python with examples")
print(decision.model) # "claude-sonnet-4-6"
print(decision.tier) # "moderate"
print(decision.confidence) # 0.82
print(router.explain(decision))
🤖 v2.0 API – AdaptiveRouter (Semantic, Self-Improving)
AdaptiveRouter is the next-generation router. It uses TF-IDF vectorization for semantic classification, learns from outcome feedback, and persists routing state across sessions.
Constructor
router = AdaptiveRouter(
workspace="./routing_data", # directory for persisted state
ab_test_rate=0.05, # fraction of routes used for A/B exploration (0.0–1.0)
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `workspace` | `str` | required | Path to directory for persisted routing data, quality history, and TF-IDF model |
| `ab_test_rate` | `float` | `0.05` | Fraction of routing decisions used for A/B exploration. Set `0.0` to disable. |
The workspace directory is created automatically if it doesn't exist.
register_model(config: ModelConfig)
Register a model with its tier range and cost parameters.
router.register_model(ModelConfig(
name="gpt-4o-mini",
tier_range=("trivial", "moderate"), # (min_tier, max_tier)
cost_per_1k_input=0.00015,
cost_per_1k_output=0.0006,
))
ModelConfig fields:
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Model identifier (e.g. `"gpt-4o-mini"`) |
| `tier_range` | `Tuple[str, str]` | `(min_tier, max_tier)` – tiers this model handles |
| `cost_per_1k_input` | `float` | Cost in USD per 1K input tokens |
| `cost_per_1k_output` | `float` | Cost in USD per 1K output tokens |
Tier range semantics: A model registered with tier_range=("simple", "complex") is eligible for simple, moderate, and complex prompts. The router selects the lowest-cost eligible model for each tier.
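That selection rule can be illustrated with a small standalone sketch. The tier scale and the `is_eligible`/`cheapest_eligible` helpers are hypothetical names for illustration, not library APIs:

```python
# Hypothetical helpers illustrating tier-range eligibility and
# cheapest-eligible selection; not part of the library's API.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def is_eligible(tier_range, tier):
    lo, hi = TIERS.index(tier_range[0]), TIERS.index(tier_range[1])
    return lo <= TIERS.index(tier) <= hi

def cheapest_eligible(models, tier):
    # models: list of (name, tier_range, cost_per_1k_input) tuples
    eligible = [m for m in models if is_eligible(m[1], tier)]
    return min(eligible, key=lambda m: m[2])[0] if eligible else None
```

With the Quick Start registry above, a `simple` prompt resolves to the cheaper eligible model, while a `complex` prompt skips models whose range stops at `moderate`.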
route(prompt: str) → RoutingResult
Classify the prompt and select the optimal model.
result = router.route("Summarize the following contract clause: ...")
Returns RoutingResult:
| Field | Type | Description |
|---|---|---|
| `model` | `str` | Selected model name |
| `tier` | `str` | Classified complexity tier |
| `confidence` | `float` | Classification confidence (0.0–1.0) |
| `prompt_hash` | `str` | SHA-256 hash of the prompt (used for outcome feedback) |
| `estimated_cost` | `float` | Estimated cost in USD for this request |
report_outcome(prompt_hash: str, quality_score: float, success: bool)
Feed outcome back to the router to improve future routing decisions. This is the core self-improvement loop.
router.report_outcome(
result.prompt_hash,
quality_score=0.9, # 0.0–1.0, how good the model's response was
success=True, # whether the request succeeded at all
)
The router uses outcome history to:
- Detect tier misclassifications (e.g. a `moderate` prompt that consistently gets poor quality → escalate to `complex`)
- Track per-model quality trends across tier assignments
- Improve TF-IDF weights over time
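The escalation idea in the first bullet can be sketched as a rolling-average check. This is a hypothetical illustration of the feedback loop, not the library's internal logic:

```python
from collections import defaultdict, deque

# Hypothetical sketch: if the rolling average quality for a tier
# drops below a floor, route that tier's prompts one level up.
# The library's actual logic may differ.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

class FeedbackLoop:
    def __init__(self, window=20, floor=0.6):
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.floor = floor

    def report(self, tier, quality):
        self.history[tier].append(quality)

    def effective_tier(self, tier):
        scores = self.history[tier]
        if scores and sum(scores) / len(scores) < self.floor and tier != "expert":
            return TIERS[TIERS.index(tier) + 1]  # escalate one level
        return tier
```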
get_analytics() → Dict
Aggregate routing stats for the current session.
analytics = router.get_analytics()
# {
# "total_routed": int,
# "tier_distribution": {"trivial": int, "simple": int, ...},
# "avg_quality": float,
# "model_usage": {"model-name": int, ...},
# "cost_savings": float # USD saved vs always using most capable model
# }
🔧 v1.0 API – Router (Keyword-based, Production)
Router is the production-proven keyword-based router. Fully featured with SLA enforcement, confidence-gated escalation, provider health tracking, A/B testing, and cost forecasting. Use this for stability; use AdaptiveRouter for semantic accuracy.
Constructor
from antaris_router import Router, SLAConfig
router = Router(
config_path=None, # optional path to JSON config file
enable_cost_tracking=True, # track per-model cost usage
low_confidence_threshold=0.0, # 0.0 = never escalate (default)
escalation_model=None, # model to escalate to when confidence is low
escalation_strategy="always", # "always" | "log_only" | "ask"
sla=None, # SLAConfig instance
fallback_chain=None, # ordered list of fallback model names
classifier=None, # inject custom classifier (e.g. SemanticClassifier)
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `config_path` | `str \| None` | `None` | Path to JSON config file. If `None`, uses built-in defaults. |
| `enable_cost_tracking` | `bool` | `True` | Track cost per model, per session. Required for `cost_report()`, `savings_estimate()`. |
| `low_confidence_threshold` | `float` | `0.0` | Confidence below this triggers escalation. `0.0` = disabled. |
| `escalation_model` | `str \| None` | `None` | Model name to escalate to on low confidence. |
| `escalation_strategy` | `str` | `"always"` | Escalation behavior: `"always"` swaps model, `"log_only"` logs but keeps model, `"ask"` signals user to confirm. |
| `sla` | `SLAConfig \| None` | `None` | SLA constraints to enforce during routing. |
| `fallback_chain` | `List[str] \| None` | `None` | Ordered fallback models for `auto_scale=True`. |
| `classifier` | `object \| None` | `None` | Custom classifier to inject (e.g. `SemanticClassifier`). Replaces the built-in keyword classifier. |
route(...) → RoutingDecision
Route a prompt to the optimal model.
decision = router.route(
prompt="text to route", # required
context=None, # optional: additional context dict
prefer=None, # preferred provider: "claude" | "openai" | etc.
min_tier=None, # minimum tier floor: "simple"|"moderate"|"complex"|"expert"
capability=None, # required capability: "vision"|"code"|etc.
estimate_tokens=(100, 50), # (input_tokens, output_tokens) for cost estimation
ab_test=None, # A/B test config from create_ab_test()
prefer_healthy=False, # skip degraded/rate-limited providers
auto_scale=False, # fall back through fallback_chain if primary is degraded or over-budget
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | The prompt text to classify and route |
| `context` | `dict \| None` | `None` | Additional context for routing decisions |
| `prefer` | `str \| None` | `None` | Preferred provider name. Router respects this if an eligible model exists. |
| `min_tier` | `str \| None` | `None` | Force a minimum complexity tier. E.g. `"complex"` ensures at least a complex-tier model. |
| `capability` | `str \| None` | `None` | Required model capability. Only models with this capability are considered. |
| `estimate_tokens` | `Tuple[int, int]` | `(100, 50)` | `(input_tokens, output_tokens)` used for cost estimation in `decision.estimated_cost`. |
| `ab_test` | `ABTest \| None` | `None` | A/B test object from `create_ab_test()`. Enables variant-based routing. |
| `prefer_healthy` | `bool` | `False` | If `True`, degraded or down providers are skipped. Falls through to the next eligible model. |
| `auto_scale` | `bool` | `False` | If `True` and the primary model is degraded or over budget, routes through `fallback_chain` in order. |
📋 RoutingDecision Fields
Every router.route() call returns a RoutingDecision object with full decision transparency.
decision = router.route("Design a microservices platform for high-throughput event processing")
decision.model # str: "claude-sonnet-4-6"
decision.provider # str: "anthropic"
decision.tier # str: "complex"
decision.confidence # float: 0.85
decision.reasoning # List[str]: ["Input length 1,250 chars → complex range", ...]
decision.estimated_cost # float: 0.00525 (USD)
decision.fallback_models # List[str]: ["claude-opus-4-6", "gpt-4o"]
decision.classification # ClassificationResult object
decision.confidence_basis # str: "keyword_density" | "composite" | "rule_based"
decision.evidence # List[str]: human-readable decision signals
decision.escalated # bool: True if escalation changed the model
decision.original_confidence # float: pre-escalation confidence (if escalated)
decision.escalation_reason # str: why escalation triggered (if escalated)
decision.ab_variant # str: "a" | "b" if A/B test active
decision.explanation # str: full human-readable explanation
decision.supports_streaming # bool: whether selected model supports streaming
decision.sla_compliant # bool: whether decision satisfies all SLA constraints
decision.sla_breaches # List[str]: e.g. ["latency_exceeded", "budget_exceeded"]
decision.sla_adjustments # List[str]: e.g. ["routed_to_cheaper_model_due_to_budget_sla"]
decision.selected_model # property alias for decision.model
decision.to_dict() # Dict: all fields serialized to a plain dict
Complete Field Reference
| Field | Type | Description |
|---|---|---|
| `model` | `str` | Name of the selected model |
| `provider` | `str` | Provider name: `"anthropic"`, `"openai"`, etc. |
| `tier` | `str` | Complexity tier: trivial/simple/moderate/complex/expert |
| `confidence` | `float` | Classification confidence 0.0–1.0 |
| `reasoning` | `List[str]` | Ordered list of reasons why this model was chosen |
| `estimated_cost` | `float` | Estimated USD cost for this specific request |
| `fallback_models` | `List[str]` | Ordered list of alternative models considered |
| `classification` | `ClassificationResult` | Raw classification output including signals |
| `confidence_basis` | `str` | How confidence was computed: `"keyword_density"`, `"composite"`, `"rule_based"` |
| `evidence` | `List[str]` | Human-readable signals that drove the decision |
| `escalated` | `bool` | `True` if escalation logic overrode the original model selection |
| `original_confidence` | `float` | Confidence before escalation (populated only when `escalated=True`) |
| `escalation_reason` | `str` | Human-readable reason escalation triggered |
| `ab_variant` | `str` | `"a"` or `"b"` when an A/B test is active, `""` otherwise |
| `explanation` | `str` | Full plain-English explanation of the routing decision |
| `supports_streaming` | `bool` | Whether the selected model supports streaming responses |
| `sla_compliant` | `bool` | Whether the decision satisfies all active SLA constraints |
| `sla_breaches` | `List[str]` | Which SLA constraints were breached (if any) |
| `sla_adjustments` | `List[str]` | Routing adjustments made to satisfy SLA constraints |
| `selected_model` | property | Alias for `model` |
to_dict() Output
d = decision.to_dict()
# {
# "model": "claude-sonnet-4-6",
# "provider": "anthropic",
# "tier": "complex",
# "confidence": 0.85,
# "reasoning": [...],
# "estimated_cost": 0.00525,
# "fallback_models": [...],
# "confidence_basis": "keyword_density",
# "evidence": [...],
# "escalated": False,
# "original_confidence": 0.0,
# "escalation_reason": "",
# "ab_variant": "",
# "explanation": "Model selected: claude-sonnet-4-6 ...",
# "supports_streaming": True,
# "sla_compliant": True,
# "sla_breaches": [],
# "sla_adjustments": []
# }
🔍 Explainability – explain()
Every routing decision can be explained in plain English. Use explain() for debugging, auditing, or displaying routing logic to users.
explanation = router.explain(decision)
print(explanation)
Example output:
Model selected: claude-sonnet-4-6 (confidence: 85%)
Basis: keyword density
Reasoning: Input classified as 'complex' task (85% confidence). Length 1,250 chars falls in
complex range (1,000–3,000). Strong signal keywords detected: "microservices", "architecture",
"distributed".
Estimated cost: $0.003000 per 1K tokens (this request: $0.005250).
Evidence: length: 1250 chars → complex range (≤3000), keyword match: 3 'complex'-tier keywords
(microservices, architecture, distributed), structural_complexity: 2
Alternatives considered: claude-opus-4-6 (more capable, 5.0x cost), gpt-4o-mini (cheaper, reduced quality)
When escalation occurred:
Model selected: claude-opus-4-6 (confidence: 45%)
[Escalated from original confidence 45%: Low confidence below threshold 0.60. Original model: claude-sonnet-4-6]
Basis: composite
Reasoning: Input classified as 'moderate' task (45% confidence)...
explain() sections:
| Section | Always shown | Description |
|---|---|---|
| `Model selected: X (confidence: Y%)` | ✓ | Selected model and final confidence |
| `[Escalated from ...]` | Only if escalated | Pre-escalation state and trigger reason |
| `Basis: X` | ✓ | Confidence computation method |
| `Reasoning: ...` | ✓ | Human-readable classification narrative |
| `Estimated cost: ...` | ✓ | Per-1K and per-request cost |
| `Evidence: ...` | ✓ | Raw signals that drove classification |
| `Alternatives considered: ...` | ✓ | Other models with relative cost factor |
🚦 Confidence-Gated Escalation
When the classifier is uncertain about a prompt's complexity, antaris-router can automatically escalate to a more capable model rather than risk a low-quality response.
Configuration
router = Router(
low_confidence_threshold=0.6, # escalate if confidence < 0.6
escalation_model="claude-opus-4-6", # which model to escalate to
escalation_strategy="always", # escalation behavior
)
Escalation Strategies
| Strategy | Behavior | Use Case |
|---|---|---|
| `"always"` | Replaces selected model with `escalation_model` | Production: trust the router's escalation |
| `"log_only"` | Logs the low-confidence event, keeps original model | Monitoring: observe without changing behavior |
| `"ask"` | Sets `decision.escalated=True` + `escalation_reason`, keeps original model | Human-in-the-loop: surface uncertainty to user |
Usage
router = Router(
low_confidence_threshold=0.6,
escalation_model="claude-opus-4-6",
escalation_strategy="always",
)
decision = router.route("What does this cryptic error mean in this context?")
if decision.escalated:
    print(f"Escalated! Original confidence: {decision.original_confidence:.2f}")
    print(f"Reason: {decision.escalation_reason}")
    print(f"Using: {decision.model}")  # claude-opus-4-6
Strategy: "ask" – Human-in-the-Loop
When escalation_strategy="ask", the router signals uncertainty without changing the model. Use this to prompt users to confirm the routing decision:
router = Router(
low_confidence_threshold=0.65,
escalation_model="claude-opus-4-6",
escalation_strategy="ask",
)
decision = router.route("some ambiguous prompt")
if decision.escalated:
    # Present choice to user
    print(f"Router is uncertain (confidence: {decision.original_confidence:.0%}).")
    print(f"Suggested escalation: {decision.escalation_reason}")
    print(f"Upgrade to claude-opus-4-6? Current model: {decision.model}")
Escalation Decision Fields
When decision.escalated is True:
decision.escalated # True
decision.original_confidence # e.g. 0.48 – confidence before escalation
decision.escalation_reason # e.g. "Low confidence below threshold 0.60. Original model: claude-sonnet-4-6"
decision.model # escalation_model (if strategy="always"), else original model
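The strategy table reduces to a small decision function. A minimal sketch, assuming the threshold and strategy semantics documented above (not the library's actual code):

```python
# Minimal sketch of the three escalation strategies as documented:
# "always" swaps the model; "ask" keeps it but flags escalated=True;
# "log_only" keeps it and does not flag. Illustrative only.
def apply_escalation(model, confidence, threshold, escalation_model, strategy):
    if confidence >= threshold:
        return model, False, ""
    reason = f"Low confidence below threshold {threshold:.2f}. Original model: {model}"
    if strategy == "always":
        return escalation_model, True, reason
    if strategy == "ask":
        return model, True, reason   # surface uncertainty, keep model
    return model, False, reason      # "log_only": record, keep model
```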
📏 SLA Configuration & Enforcement
antaris-router enforces Service Level Agreements on latency, budget, and response quality. When constraints are breached, the router adjusts model selection automatically.
Setup
from antaris_router import Router, SLAConfig
sla = SLAConfig(
max_latency_ms=200, # max acceptable latency per request
budget_per_hour_usd=5.00, # hourly spend cap in USD
min_quality_score=0.7, # minimum acceptable quality (0.0–1.0)
auto_escalate_on_breach=True, # automatically adjust routing on SLA breach
)
router = Router(
sla=sla,
fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
)
SLAConfig Parameters
| Parameter | Type | Description |
|---|---|---|
| `max_latency_ms` | `float` | Maximum acceptable request latency in milliseconds |
| `budget_per_hour_usd` | `float` | Maximum spend per hour in USD |
| `min_quality_score` | `float` | Minimum acceptable quality score (0.0–1.0) |
| `auto_escalate_on_breach` | `bool` | If `True`, router adjusts model selection to restore SLA compliance |
Routing With SLA
decision = router.route("prompt", auto_scale=True)
# SLA compliance info on every decision
print(decision.sla_compliant) # True / False
print(decision.sla_breaches) # ["budget_exceeded", "latency_exceeded"]
print(decision.sla_adjustments) # ["routed_to_cheaper_model_due_to_budget_sla"]
get_sla_report(since_hours=1.0) → Dict
Aggregate SLA compliance report over a time window.
report = router.get_sla_report(since_hours=1.0)
# {
# "compliance_rate": 0.94,
# "breaches": {
# "latency": 3,
# "cost": 1,
# "quality": 2
# },
# "adjustments_made": 4,
# "cost_savings_usd": 0.87,
# "avg_latency_ms": 142.3,
# "budget_utilization": 0.68,
# "total_requests": 150
# }
| Field | Description |
|---|---|
| `compliance_rate` | Fraction of requests fully SLA-compliant (0.0–1.0) |
| `breaches.latency` | Count of latency SLA breaches |
| `breaches.cost` | Count of budget SLA breaches |
| `breaches.quality` | Count of quality SLA breaches |
| `adjustments_made` | Count of routing adjustments made to restore SLA compliance |
| `cost_savings_usd` | USD saved through SLA-driven model downgrades |
| `avg_latency_ms` | Average request latency over the window |
| `budget_utilization` | Fraction of hourly budget consumed (0.0–1.0) |
| `total_requests` | Total requests in the time window |
check_budget_alert() → Dict
Real-time budget status and spend projection.
alert = router.check_budget_alert()
# {
# "status": "warning", # "ok" | "warning" | "critical"
# "hourly_spend_usd": 3.42,
# "budget_usd": 5.00,
# "utilization": 0.684,
# "projected_hourly_usd": 4.89,
# "recommendation": "Consider routing moderate tasks to gpt-4o-mini to reduce spend"
# }
| Status | Trigger |
|---|---|
| `"ok"` | Utilization below warning threshold |
| `"warning"` | Approaching budget limit |
| `"critical"` | At or over budget limit |
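The exact thresholds are not documented; here is a hypothetical sketch of how such an alert could be derived from spend so far in the current hour (the `warn_at` value is an assumption):

```python
# Hypothetical budget-alert derivation: project hourly spend from
# elapsed time, then classify. Thresholds are assumptions, not the
# library's actual values.
def budget_status(spent_usd, elapsed_min, budget_usd, warn_at=0.65):
    projected = spent_usd * 60 / max(elapsed_min, 1e-9)
    utilization = spent_usd / budget_usd
    if utilization >= 1.0:
        status = "critical"
    elif utilization >= warn_at or projected >= budget_usd:
        status = "warning"
    else:
        status = "ok"
    return {
        "status": status,
        "utilization": round(utilization, 3),
        "projected_hourly_usd": round(projected, 2),
    }
```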
record_sla_quality(model, score)
Record an observed quality score for a completed request. Used to track quality SLA compliance.
router.record_sla_quality("claude-sonnet-4-6", score=0.85)
get_cost_optimizations(estimate_tokens) → List[Dict]
Get actionable cost optimization suggestions based on current routing patterns.
suggestions = router.get_cost_optimizations(estimate_tokens=(100, 50))
# [
# {
# "suggestion": "Route 'moderate' prompts to gpt-4o-mini instead of claude-sonnet-4-6",
# "estimated_savings_usd_per_day": 2.34,
# "tradeoff": "Slightly lower quality for moderate tasks (est. -0.05 quality score)"
# },
# {
# "suggestion": "Enable confidence-gated escalation to reduce expert-tier misrouting",
# "estimated_savings_usd_per_day": 0.89,
# "tradeoff": "Adds ~10ms classification overhead per request"
# }
# ]
🏥 Provider Health Tracking
Track real-time health of each provider/model. Route around degraded providers automatically.
Recording Events
# After a successful call
router.record_provider_event(
"claude-sonnet-4-6",
event="success",
latency_ms=245.0,
)
# After an error
router.record_provider_event(
"claude-sonnet-4-6",
event="error",
details="rate_limited",
)
# After a timeout
router.record_provider_event("gpt-4o", event="timeout")
Event types:
| Event | Description |
|---|---|
| `"success"` | Request completed successfully; `latency_ms` recorded. |
| `"error"` | Request failed; `details` string (e.g. `"rate_limited"`, `"context_exceeded"`). |
| `"timeout"` | Request timed out. |
get_provider_health(model) → Dict
health = router.get_provider_health("claude-sonnet-4-6")
# {
# "model": "claude-sonnet-4-6",
# "status": "healthy", # "healthy" | "degraded" | "down"
# "success_rate_1h": 0.97,
# "avg_latency_ms": 231.4,
# "recent_errors": ["rate_limited"],
# "last_seen": 1741500000.0 # Unix timestamp
# }
| Status | Meaning |
|---|---|
| `"healthy"` | High success rate, normal latency |
| `"degraded"` | Elevated error rate or latency; still usable but non-preferred |
| `"down"` | No recent successes; excluded from routing |
Health-Aware Routing
# Skip degraded/down providers entirely
decision = router.route("prompt", prefer_healthy=True)
When prefer_healthy=True:
- Models with status `"degraded"` or `"down"` are skipped
- Router falls through to the next eligible model in cost order
- If all eligible models are degraded, falls back to the least-degraded option
Combining with auto_scale:
decision = router.route(
"prompt",
prefer_healthy=True,
auto_scale=True, # use fallback_chain when primary is unavailable
)
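A status like `"healthy"`/`"degraded"`/`"down"` can be derived from a rolling window of request outcomes. An illustrative sketch with assumed thresholds, not the library's implementation:

```python
from collections import deque

# Illustrative health tracker: status from a rolling window of
# success/failure events. The 0.95 and 0.5 cutoffs are assumptions.
class HealthTracker:
    def __init__(self, window=50):
        self.events = deque(maxlen=window)

    def record(self, ok):
        self.events.append(ok)

    def status(self):
        if not self.events:
            return "healthy"  # no data yet: assume healthy
        rate = sum(self.events) / len(self.events)
        if rate >= 0.95:
            return "healthy"
        return "degraded" if rate >= 0.5 else "down"
```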
🧪 A/B Testing
Run controlled routing experiments to compare strategies (cost-optimized vs quality-first) with configurable traffic splits.
Creating an A/B Test
ab_test = router.create_ab_test(
name="quality-vs-cost",
strategy_a="cost_optimized", # baseline strategy
strategy_b="quality_first", # bumps tier one level for B variant
split=0.5, # 50/50 split; 0.3 = 30% to B
)
| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | Human-readable test name |
| `strategy_a` | `str` | Baseline strategy: `"cost_optimized"` |
| `strategy_b` | `str` | Experimental strategy: `"quality_first"` bumps tier by one level |
| `split` | `float` | Fraction of traffic routed to strategy B (0.0–1.0) |
Running the Test
decision = router.route("Summarize the quarterly earnings report", ab_test=ab_test)
print(decision.ab_variant) # "a" or "b"
print(decision.model) # varies by variant
if decision.ab_variant == "b":
    # B variant gets one tier higher – a more capable model
    print("Quality-first routing applied")
Strategies
| Strategy | Behavior |
|---|---|
| `"cost_optimized"` | Standard routing – cheapest eligible model for the detected tier |
| `"quality_first"` | Bumps the detected tier up one level (e.g. moderate → complex) for higher quality |
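Variant assignment can be sketched deterministically by hashing the prompt against the split; this is an illustration (the library may assign variants differently, e.g. randomly per request). `apply_strategy` shows the documented one-level tier bump:

```python
import hashlib

# Sketch: deterministic variant assignment by prompt hash, plus the
# documented "quality_first" one-level tier bump. Illustrative only.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def assign_variant(prompt, split):
    bucket = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % 10_000
    return "b" if bucket < split * 10_000 else "a"

def apply_strategy(tier, variant):
    if variant == "b" and tier != "expert":
        return TIERS[TIERS.index(tier) + 1]  # quality_first: bump one level
    return tier
```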
Collecting Results
Track ab_variant alongside actual quality scores to measure the tradeoff:
# In your application
decision = router.route(prompt, ab_test=ab_test)
response = call_llm(decision.model, prompt)
quality = evaluate(response)
router.record_sla_quality(decision.model, quality)
# Store for analysis
results.append({
"variant": decision.ab_variant,
"model": decision.model,
"cost": decision.estimated_cost,
"quality": quality,
})
💰 Cost Forecasting
Project future LLM costs based on current routing distribution and expected traffic.
forecast_cost(...) → Dict
forecast = router.forecast_cost(
requests_per_hour=1000,
avg_input_tokens=500,
avg_output_tokens=200,
)
| Parameter | Type | Description |
|---|---|---|
| `requests_per_hour` | `int` | Expected request volume per hour |
| `avg_input_tokens` | `int` | Average input tokens per request |
| `avg_output_tokens` | `int` | Average output tokens per request |
Returns:
# {
# "hourly_cost_usd": 1.24,
# "daily_cost_usd": 29.76,
# "monthly_cost_usd": 892.80,
# "breakdown_by_model": {
# "gpt-4o-mini": {
# "requests_pct": 0.45,
# "cost_per_request_usd": 0.000105,
# "hourly_cost_usd": 0.047
# },
# "claude-sonnet-4-6": {
# "requests_pct": 0.40,
# "cost_per_request_usd": 0.002100,
# "hourly_cost_usd": 0.840
# },
# "claude-opus-4-6": {
# "requests_pct": 0.15,
# "cost_per_request_usd": 0.013500,
# "hourly_cost_usd": 0.203
# }
# },
# "optimization_tip": "Routing 10% of simple tasks from claude-sonnet-4-6 to gpt-4o-mini would save ~$4.20/day"
# }
| Field | Description |
|---|---|
| `hourly_cost_usd` | Projected USD spend per hour |
| `daily_cost_usd` | Projected USD spend per day |
| `monthly_cost_usd` | Projected USD spend per month |
| `breakdown_by_model` | Per-model cost decomposition |
| `optimization_tip` | Actionable recommendation to reduce costs |
Use forecasting to:
- Set `SLAConfig.budget_per_hour_usd` based on realistic projections
- Identify which models dominate cost
- Plan budget before scaling traffic
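The projection arithmetic can be reproduced from per-model token prices and an assumed routing mix. An illustrative sketch (the `mix` format is an assumption, not the library's):

```python
# Reproduces the forecast arithmetic: per-request cost from token
# prices, scaled by traffic share and volume. Illustrative only.
def forecast(requests_per_hour, avg_in, avg_out, mix):
    # mix: {model: (traffic_share, cost_per_1k_in, cost_per_1k_out)}
    hourly = 0.0
    for share, cin, cout in mix.values():
        per_request = avg_in / 1000 * cin + avg_out / 1000 * cout
        hourly += requests_per_hour * share * per_request
    return {
        "hourly_cost_usd": hourly,
        "daily_cost_usd": hourly * 24,
        "monthly_cost_usd": hourly * 24 * 30,
    }
```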
📊 Cost Tracking & Analytics
log_usage(decision, input_tokens, output_tokens) → float
Log actual token usage for a completed request. Returns the actual cost in USD.
cost = router.log_usage(decision, input_tokens=500, output_tokens=200)
print(f"Request cost: ${cost:.6f}")
cost_report(period) → Dict
Aggregate cost report over a time period.
report = router.cost_report(period="week") # "day" | "week" | "month"
# {
# "period": "week",
# "total_cost_usd": 42.18,
# "by_model": {
# "gpt-4o-mini": {"requests": 8420, "cost_usd": 3.14},
# "claude-sonnet-4-6": {"requests": 3210, "cost_usd": 28.44},
# "claude-opus-4-6": {"requests": 380, "cost_usd": 10.60}
# },
# "avg_cost_per_request_usd": 0.00351
# }
savings_estimate(comparison_model) → Dict
Calculate how much was saved by routing vs always using a reference model.
savings = router.savings_estimate(comparison_model="gpt-4o")
# {
# "comparison_model": "gpt-4o",
# "actual_cost_usd": 42.18,
# "comparison_cost_usd": 187.40,
# "savings_usd": 145.22,
# "savings_pct": 0.775
# }
A savings_pct of 0.775 means the router saved 77.5% vs routing every request to gpt-4o.
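The savings math itself is plain arithmetic; this standalone sketch reproduces the report above:

```python
# Savings vs. a reference model: difference and fraction saved.
def savings(actual_cost_usd, comparison_cost_usd):
    saved = comparison_cost_usd - actual_cost_usd
    return {
        "savings_usd": saved,
        "savings_pct": saved / comparison_cost_usd,
    }
```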
routing_analytics() → Dict
Full aggregate analytics on routing decisions.
analytics = router.routing_analytics()
# {
# "total_decisions": 12010,
# "avg_confidence": 0.831,
# "tier_distribution": {
# "trivial": 1205, "simple": 3802, "moderate": 4510,
# "complex": 2101, "expert": 392
# },
# "tier_percentages": {
# "trivial": 10.0, "simple": 31.7, "moderate": 37.6,
# "complex": 17.5, "expert": 3.3
# },
# "model_usage": {
# "gpt-4o-mini": 5007,
# "claude-sonnet-4-6": 5902,
# "claude-opus-4-6": 1101
# },
# "provider_usage": {
# "openai": 5007,
# "anthropic": 7003
# },
# "most_used_model": "claude-sonnet-4-6",
# "most_used_provider": "anthropic"
# }
🗃️ Model Registry
get_model_info(model_name) → ModelInfo
info = router.get_model_info("claude-sonnet-4-6")
info.name # "claude-sonnet-4-6"
info.provider # "anthropic"
info.cost_per_1k_input # 0.003
info.cost_per_1k_output # 0.015
info.capabilities # ["text", "code", "vision"]
info.max_tokens # 200000
info.supports_streaming # True
info.has_capability("vision") # → bool: True
info.calculate_cost(500, 200) # → float: cost for 500 input + 200 output tokens
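`calculate_cost` is per-token arithmetic over the two rates; a standalone sketch using the documented claude-sonnet-4-6 example values:

```python
# Per-token cost arithmetic behind calculate_cost; standalone sketch.
def calculate_cost(input_tokens, output_tokens,
                   cost_per_1k_input, cost_per_1k_output):
    return (input_tokens / 1000 * cost_per_1k_input
            + output_tokens / 1000 * cost_per_1k_output)
```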
ModelInfo Fields
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Model identifier |
| `provider` | `str` | Provider: `"anthropic"`, `"openai"`, etc. |
| `cost_per_1k_input` | `float` | USD per 1,000 input tokens |
| `cost_per_1k_output` | `float` | USD per 1,000 output tokens |
| `capabilities` | `List[str]` | E.g. `["text", "code", "vision"]` |
| `max_tokens` | `int` | Maximum context window in tokens |
| `supports_streaming` | `bool` | Whether the model supports streaming responses |
list_models_for_tier(tier) → List[Dict]
List all models eligible for a given tier, ordered by cost.
models = router.list_models_for_tier("moderate")
# [
# {"name": "gpt-4o-mini", "provider": "openai", "cost": 0.000105, "capabilities": [...], "max_tokens": 128000},
# {"name": "claude-sonnet-4-6", "provider": "anthropic", "cost": 0.00210, "capabilities": [...], "max_tokens": 200000},
# ]
save_state(path)
Persist router state (cost history, health data, analytics) to disk.
router.save_state("./router_state")
🧠 SemanticClassifier (v2.0)
SemanticClassifier replaces the built-in keyword classifier with TF-IDF semantic classification. It can be injected into the v1.0 Router for semantic accuracy without migrating to AdaptiveRouter.
Usage
from antaris_router import SemanticClassifier, Router
sem = SemanticClassifier(workspace="./routing_data")
router = Router(classifier=sem)
decision = router.route("Design a microservices platform with event-driven architecture")
The SemanticClassifier persists its TF-IDF model to workspace/ and improves with each classified prompt. It is the same classifier used internally by AdaptiveRouter.
Constructor
sem = SemanticClassifier(workspace="./routing_data")
| Parameter | Type | Description |
|---|---|---|
| `workspace` | `str` | Directory to persist the TF-IDF model and vocabulary |
How it Works
- Tokenization – prompt is tokenized and stopwords removed
- TF-IDF vectorization – term frequency × inverse document frequency weights computed
- Tier classification – vector compared to learned per-tier centroids
- Confidence scoring – distance to centroids determines confidence score
- Feedback loop – `report_outcome()` adjusts centroid weights over time
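The first four steps can be illustrated with a toy TF-IDF + nearest-centroid classifier. This is deliberately simplified (no stopword list, no persistence, one centroid document per tier); the library's implementation is more involved:

```python
import math
from collections import Counter

# Toy TF-IDF + nearest-centroid classification; illustrative only.
def tfidf(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    return [{t: c * math.log((1 + n) / (1 + df[t]))
             for t, c in Counter(d.split()).items()} for d in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt, labeled):
    # labeled: {tier: example_text} — one centroid document per tier
    tiers = list(labeled)
    vecs = tfidf(list(labeled.values()) + [prompt])
    centroids, query = vecs[:-1], vecs[-1]
    scores = {t: cosine(query, c) for t, c in zip(tiers, centroids)}
    return max(scores, key=scores.get)
```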
Injecting into v1.0 Router
sem = SemanticClassifier(workspace="./routing_data")
router = Router(
classifier=sem,
low_confidence_threshold=0.6,
escalation_model="claude-opus-4-6",
escalation_strategy="always",
)
decision = router.route("Implement OAuth2 with PKCE in a distributed system")
This gives you semantic classification accuracy with all v1.0 features (SLA, health tracking, A/B testing, cost tracking).
📈 QualityTracker (v2.0)
QualityTracker stores per-prompt outcome data and model performance history. Used internally by AdaptiveRouter and available as a standalone component.
Usage
from antaris_router import QualityTracker
tracker = QualityTracker("./routing_data")
# Record an outcome
tracker.record_outcome(
prompt_hash, # str: from RoutingResult.prompt_hash
quality_score=0.9, # float: 0.0–1.0
success=True, # bool: did the request succeed
model="claude-sonnet-4-6",
)
# Query model performance history
history = tracker.get_model_performance("claude-sonnet-4-6")
# {
# "model": "claude-sonnet-4-6",
# "avg_quality": 0.87,
# "success_rate": 0.96,
# "total_outcomes": 3820,
# "quality_by_tier": {"simple": 0.91, "moderate": 0.88, "complex": 0.84}
# }
record_outcome(prompt_hash, quality_score, success, model)

| Parameter | Type | Description |
|---|---|---|
| `prompt_hash` | `str` | SHA-256 hash from `RoutingResult.prompt_hash` |
| `quality_score` | `float` | Quality 0.0–1.0. Source: human rating, LLM eval, or downstream metric |
| `success` | `bool` | Whether the request succeeded (`True`) or errored (`False`) |
| `model` | `str` | Model name that handled the request |
get_model_performance(model) → Dict
Aggregate quality history for a model across all tracked outcomes.
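Conceptually, this aggregation is just averaging over stored outcome records, grouped by tier. A minimal in-memory stand-in (the record fields mirror `record_outcome`; the data and helper function are invented for illustration):

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the tracker's persisted outcomes.
outcomes = [
    {"model": "claude-sonnet-4-6", "tier": "simple",  "quality": 0.91, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "complex", "quality": 0.84, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "simple",  "quality": 0.89, "success": False},
]

def model_performance(model: str) -> dict:
    rows = [o for o in outcomes if o["model"] == model]
    by_tier = defaultdict(list)
    for o in rows:
        by_tier[o["tier"]].append(o["quality"])
    return {
        "model": model,
        "avg_quality": sum(o["quality"] for o in rows) / len(rows),
        "success_rate": sum(o["success"] for o in rows) / len(rows),
        "total_outcomes": len(rows),
        "quality_by_tier": {t: sum(q) / len(q) for t, q in by_tier.items()},
    }

perf = model_performance("claude-sonnet-4-6")
print(perf["total_outcomes"])  # → 3
```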
ClassificationResult & Signals
ClassificationResult is the raw output of the classifier, accessible via decision.classification.
```python
decision = router.route("Build a Redis-backed distributed rate limiter in Go")
clf = decision.classification

clf.tier        # "complex"
clf.confidence  # 0.83
clf.signals     # dict (see below)
```
ClassificationResult.signals

```python
clf.signals = {
    "length": 62,                # raw character count
    "keyword_matches": {
        "trivial": 0,
        "simple": 1,
        "complex": 2,            # matched "distributed", "rate limiter"
    },
    "has_code": False,           # whether prompt contains code
    "code_indicators": 0,        # count of code-related patterns
    "structural_complexity": 2,  # heuristic complexity score
}
```
| Signal | Type | Description |
|---|---|---|
| `length` | `int` | Raw character count of the prompt |
| `keyword_matches` | `Dict[str, int]` | Per-tier keyword match counts |
| `has_code` | `bool` | Whether the prompt contains code blocks or inline code |
| `code_indicators` | `int` | Count of code-related patterns (functions, syntax, etc.) |
| `structural_complexity` | `int` | Heuristic score: nesting, multi-part requests, etc. |
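These signals are cheap lexical heuristics. A rough sketch of how such signals could be derived from a prompt (the keyword lists and regex patterns here are invented for the example, not the library's actual ones):

```python
import re

# Invented per-tier keyword lists for illustration only.
TIER_KEYWORDS = {
    "trivial": {"what", "who", "define"},
    "simple": {"explain", "summarize"},
    "complex": {"distributed", "architecture", "rate limiter", "concurrent"},
}

def extract_signals(prompt: str) -> dict:
    lower = prompt.lower()
    # Crude code detectors: fences, function/class definitions, syntax chars.
    code_patterns = [r"```", r"\bdef \w+\(", r"\bclass \w+", r"[{};]"]
    code_hits = sum(len(re.findall(p, prompt)) for p in code_patterns)
    return {
        "length": len(prompt),
        "keyword_matches": {
            tier: sum(kw in lower for kw in kws) for tier, kws in TIER_KEYWORDS.items()
        },
        "has_code": code_hits > 0,
        "code_indicators": code_hits,
        # crude structural score: sentence breaks plus bullet items
        "structural_complexity": lower.count(". ") + lower.count("\n- "),
    }

sig = extract_signals("Build a Redis-backed distributed rate limiter in Go")
print(sig["keyword_matches"]["complex"])  # → 2
```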
Full API Reference

Router Methods

| Method | Signature | Description |
|---|---|---|
| `route` | `(prompt, context, prefer, min_tier, capability, estimate_tokens, ab_test, prefer_healthy, auto_scale) → RoutingDecision` | Route a prompt to the optimal model |
| `explain` | `(decision: RoutingDecision) → str` | Generate a plain-English explanation of a routing decision |
| `log_usage` | `(decision, input_tokens, output_tokens) → float` | Log actual usage; returns cost in USD |
| `cost_report` | `(period: str) → Dict` | Aggregate cost report. Period: `"day"`/`"week"`/`"month"` |
| `savings_estimate` | `(comparison_model: str) → Dict` | Cost savings vs. always using the comparison model |
| `routing_analytics` | `() → Dict` | Full routing analytics (tiers, models, confidence) |
| `get_model_info` | `(model_name: str) → ModelInfo` | Model metadata from the registry |
| `list_models_for_tier` | `(tier: str) → List[Dict]` | All eligible models for a tier |
| `save_state` | `(path: str)` | Persist router state to disk |
| `record_provider_event` | `(model, event, latency_ms, details) → None` | Record a provider health event |
| `get_provider_health` | `(model: str) → Dict` | Current health status for a model |
| `create_ab_test` | `(name, strategy_a, strategy_b, split) → ABTest` | Create an A/B test configuration |
| `forecast_cost` | `(requests_per_hour, avg_input_tokens, avg_output_tokens) → Dict` | Project future costs |
| `get_sla_report` | `(since_hours: float) → Dict` | SLA compliance report |
| `check_budget_alert` | `() → Dict` | Real-time budget status |
| `record_sla_quality` | `(model: str, score: float) → None` | Record a quality score for SLA tracking |
| `get_cost_optimizations` | `(estimate_tokens: Tuple[int, int]) → List[Dict]` | Cost optimization suggestions |
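The cost-side methods (`cost_report`, `savings_estimate`, `forecast_cost`) ultimately reduce to arithmetic over per-token prices. A back-of-the-envelope version of a savings estimate, with made-up prices purely for illustration:

```python
# Hypothetical per-million-token prices (input, output) in USD.
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-haiku-3-5": (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pin, pout = PRICES[model]
    return input_tokens / 1e6 * pin + output_tokens / 1e6 * pout

# 1,000 trivial requests the router sent to the cheap model instead of opus.
routed = 1000 * request_cost("claude-haiku-3-5", 500, 200)
baseline = 1000 * request_cost("claude-opus-4-6", 500, 200)
print(f"saved ${baseline - routed:.2f} vs. always-opus")  # → saved $21.30 vs. always-opus
```

This is the asymmetry the "Why antaris-router?" section describes: for short prompts the expensive model costs roughly 20× more for the same answer.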
AdaptiveRouter Methods

| Method | Signature | Description |
|---|---|---|
| `register_model` | `(config: ModelConfig) → None` | Register a model with a tier range and costs |
| `route` | `(prompt: str) → RoutingResult` | Classify and route a prompt |
| `report_outcome` | `(prompt_hash, quality_score, success) → None` | Feed an outcome back for self-improvement |
| `get_analytics` | `() → Dict` | Session-level routing analytics |
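The `report_outcome` feedback loop boils down to keeping a running quality average per model and letting it bias future selection. A toy version of that idea (this is a conceptual sketch, not the library's internals):

```python
from collections import defaultdict

class FeedbackSelector:
    """Pick the model with the best running quality average for a tier."""

    def __init__(self, candidates: dict[str, list[str]]):
        self.candidates = candidates             # tier -> candidate model names
        self.quality = defaultdict(lambda: 0.5)  # model -> running avg (optimistic prior)
        self.count = defaultdict(int)

    def pick(self, tier: str) -> str:
        return max(self.candidates[tier], key=lambda m: self.quality[m])

    def report_outcome(self, model: str, quality_score: float) -> None:
        self.count[model] += 1
        n = self.count[model]
        # incremental mean update: avg += (x - avg) / n
        self.quality[model] += (quality_score - self.quality[model]) / n

sel = FeedbackSelector({"complex": ["model-a", "model-b"]})
sel.report_outcome("model-a", 0.4)  # model-a underperforms...
sel.report_outcome("model-b", 0.9)  # ...model-b does well
print(sel.pick("complex"))  # → model-b
```

The real router folds this signal into its centroid updates and confidence scoring rather than a bare argmax, but the shape of the loop — route, observe, report, re-rank — is the same.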
Complete Exports

```python
from antaris_router import (
    # ── v2.0 API ─────────────────────────────────────────────
    AdaptiveRouter,        # Semantic, self-improving router
    RoutingResult,         # Result object from AdaptiveRouter.route()
    ModelConfig,           # Model registration config for AdaptiveRouter
    SemanticClassifier,    # TF-IDF classifier (injectable into v1.0 Router)
    SemanticResult,        # Result object from SemanticClassifier
    TFIDFVectorizer,       # Low-level TF-IDF vectorizer
    QualityTracker,        # Outcome feedback tracker
    QualityDecision,       # Quality decision record
    # ── v1.0 API ─────────────────────────────────────────────
    Router,                # Keyword-based production router
    RoutingDecision,       # Decision object from Router.route()
    TaskClassifier,        # Built-in keyword classifier
    ClassificationResult,  # Classification output with signals
    ModelRegistry,         # Internal model registry
    ModelInfo,             # Model metadata object
    CostTracker,           # Cost tracking component
    UsageRecord,           # Per-request usage record
    Config,                # Router configuration
    # ── Sprint 5: SLA ────────────────────────────────────────
    SLAConfig,             # SLA constraint configuration
    SLAMonitor,            # SLA enforcement monitor
    SLARecord,             # Per-request SLA record
)
```
Migration: v1.0 → v2.0

| Feature | v1.0 `Router` | v2.0 `AdaptiveRouter` |
|---|---|---|
| Classification | Keyword matching | TF-IDF semantic |
| Self-improvement | ❌ | ✅ via `report_outcome()` |
| Persistence | `save_state()` | Automatic to workspace |
| SLA enforcement | ✅ | ❌ (use `Router` + `SemanticClassifier`) |
| Provider health | ✅ | ❌ (use `Router` + `SemanticClassifier`) |
| A/B testing | ✅ | Built-in `ab_test_rate` |
| Cost tracking | ✅ | Basic (via analytics) |
| Explainability | ✅ `explain()` | Via `result.confidence` + analytics |
| `RoutingDecision` | Full object | Lightweight `RoutingResult` |
Recommended migration path:

```python
# Option A: Full v2.0 – new project, accuracy-first
router = AdaptiveRouter("./routing_data")

# Option B: Best of both – semantic accuracy + full v1.0 features
sem = SemanticClassifier(workspace="./routing_data")
router = Router(
    classifier=sem,             # semantic classification
    sla=sla,                    # + SLA enforcement
    enable_cost_tracking=True,  # + cost tracking
)
```
Option B lets you adopt semantic classification incrementally without losing any v1.0 production features.
Advanced Patterns
Full Production Setup

```python
from antaris_router import Router, SLAConfig, SemanticClassifier

sem = SemanticClassifier(workspace="./routing_data")
sla = SLAConfig(
    max_latency_ms=300,
    budget_per_hour_usd=10.00,
    min_quality_score=0.75,
    auto_escalate_on_breach=True,
)
router = Router(
    classifier=sem,
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
    enable_cost_tracking=True,
)

def route_and_call(prompt: str) -> str:
    decision = router.route(
        prompt,
        estimate_tokens=(len(prompt) // 4, 200),
        prefer_healthy=True,
        auto_scale=True,
    )
    # Log decision for audit
    print(router.explain(decision))

    # Call your LLM here
    response = call_llm(decision.model, prompt)

    # Record actual usage
    router.log_usage(decision, input_tokens=len(prompt) // 4, output_tokens=len(response) // 4)

    # Record provider health
    router.record_provider_event(decision.model, event="success", latency_ms=242.0)
    return response
```
Periodic Reporting

```python
import time

# Every hour
while True:
    time.sleep(3600)

    report = router.get_sla_report(since_hours=1.0)
    alert = router.check_budget_alert()
    analytics = router.routing_analytics()

    print(f"SLA compliance: {report['compliance_rate']:.1%}")
    print(f"Budget: {alert['status']} ({alert['utilization']:.1%} used)")
    print(f"Most used model: {analytics['most_used_model']}")

    if alert["status"] == "critical":
        # Trigger alerts, adjust SLA config, etc.
        pass

    router.save_state("./router_state")
```
Architecture

```
antaris-router
├── Router (v1.0) – Production keyword-based router
│   ├── TaskClassifier – Built-in keyword classification
│   │   └── ClassificationResult – With signals: length, keywords, code
│   ├── ModelRegistry – Model metadata + capability index
│   ├── CostTracker – Per-session/period cost tracking
│   ├── SLAMonitor – Constraint enforcement + reporting
│   └── RoutingDecision – Full decision object
│
├── AdaptiveRouter (v2.0) – Self-improving semantic router
│   ├── SemanticClassifier – TF-IDF vectorizer + tier centroids
│   ├── TFIDFVectorizer – Low-level TF-IDF implementation
│   ├── QualityTracker – Outcome feedback + model performance
│   └── RoutingResult – Lightweight result object
│
└── Shared
    ├── SLAConfig – SLA constraint definition
    ├── SLARecord – Per-request SLA record
    └── ModelInfo – Model metadata (costs, caps, streaming)
```
License

Part of the antaris-suite – adaptive AI infrastructure for LLM cost optimization.

© Antaris Analytics LLC