antaris-router ⚡
Adaptive LLM model routing for cost optimization – zero dependencies, stdlib only.
Route every prompt to the right model at the right cost. antaris-router classifies task complexity, selects the optimal model from your registry, enforces SLAs, tracks provider health, and continuously improves routing quality through outcome feedback.
📦 Installation
pip install antaris-router
Version: 4.9.20
Dependencies: None – pure Python stdlib only.
🗺️ Table of Contents
- Why antaris-router?
- Tier System
- Quick Start
- v2.0 API – AdaptiveRouter (Semantic)
- v1.0 API – Router (Keyword-based)
- RoutingDecision Fields
- Explainability – explain()
- Confidence-Gated Escalation
- SLA Configuration & Enforcement
- Provider Health Tracking
- A/B Testing
- Cost Forecasting
- Cost Tracking & Analytics
- Model Registry
- SemanticClassifier (v2.0)
- QualityTracker (v2.0)
- ClassificationResult & Signals
- Full API Reference
- Complete Exports
- Migration: v1.0 → v2.0
🎯 Why antaris-router?
LLM costs are asymmetric. A one-line question routed to claude-opus wastes 50–100× what it needs to. antaris-router fixes that:
| Without routing | With antaris-router |
|---|---|
| Every request → one expensive model | Each request → cheapest capable model |
| No visibility into cost breakdown | Real-time cost tracking + forecasting |
| Silent model failures | Provider health tracking + auto-failover |
| Blind prompt-to-model mapping | TF-IDF semantic classification (v2.0) |
| No quality signal loop | Outcome feedback → self-improving routing |
📊 Tier System
antaris-router classifies every prompt into one of five complexity tiers. Each tier maps to a cost bracket, ensuring you always pay proportionally to task complexity.
| Tier | Char Range | Typical Tasks | Strategy |
|---|---|---|---|
| `trivial` | ≤ 50 chars | Simple Q&A, single-word lookups | Cheapest model |
| `simple` | 50–200 chars | Basic tasks, short explanations | Low-cost model |
| `moderate` | 200–1,000 chars | Standard tasks, multi-step answers | Mid-tier model |
| `complex` | 1,000–3,000 chars | Analysis, architecture, code review | Powerful model |
| `expert` | 3,000+ chars | Highest complexity, long-form reasoning | Most capable model |
Tier boundaries are based on character count combined with keyword signals, code detection, and structural complexity analysis. The v2.0 AdaptiveRouter additionally uses TF-IDF semantic classification and improves tier accuracy over time through outcome feedback.
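The length component of this heuristic can be sketched in a few lines of plain Python. This is an illustration of the boundaries in the table above, not the library's actual classifier (which also weighs keywords, code detection, and structure):

```python
# Illustrative only: maps character count to a tier using the
# boundaries from the table above. The real classifier combines
# this with keyword, code, and structural signals.
TIER_BOUNDS = [
    ("trivial", 50),
    ("simple", 200),
    ("moderate", 1_000),
    ("complex", 3_000),
]

def tier_by_length(prompt: str) -> str:
    n = len(prompt)
    for tier, upper in TIER_BOUNDS:
        if n <= upper:
            return tier
    return "expert"
```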
⚡ Quick Start
v2.0 – AdaptiveRouter (recommended for new projects)
from antaris_router import AdaptiveRouter, ModelConfig
router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)
router.register_model(ModelConfig(
name="gpt-4o-mini",
tier_range=("trivial", "moderate"),
cost_per_1k_input=0.00015,
cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
name="claude-sonnet-4-6",
tier_range=("simple", "complex"),
cost_per_1k_input=0.003,
cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
name="claude-opus-4-6",
tier_range=("complex", "expert"),
cost_per_1k_input=0.015,
cost_per_1k_output=0.075,
))
result = router.route("Implement a distributed task queue with priority scheduling")
print(result.model) # "claude-sonnet-4-6"
print(result.tier) # "complex"
print(result.confidence) # 0.87
print(result.estimated_cost) # 0.00234
# Feed outcome back to improve future routing
router.report_outcome(result.prompt_hash, quality_score=0.9, success=True)
# Session analytics
analytics = router.get_analytics()
print(analytics)
# {
# "total_routed": 42,
# "tier_distribution": {"trivial": 5, "simple": 12, "moderate": 15, "complex": 8, "expert": 2},
# "avg_quality": 0.88,
# "model_usage": {"gpt-4o-mini": 17, "claude-sonnet-4-6": 21, "claude-opus-4-6": 4},
# "cost_savings": 0.142
# }
v1.0 – Router (production-proven, keyword-based)
from antaris_router import Router
router = Router(enable_cost_tracking=True)
decision = router.route("Explain async/await in Python with examples")
print(decision.model) # "claude-sonnet-4-6"
print(decision.tier) # "moderate"
print(decision.confidence) # 0.82
print(router.explain(decision))
🤖 v2.0 API – AdaptiveRouter (Semantic, Self-Improving)
AdaptiveRouter is the next-generation router. It uses TF-IDF vectorization for semantic classification, learns from outcome feedback, and persists routing state across sessions.
Constructor
router = AdaptiveRouter(
workspace="./routing_data", # directory for persisted state
ab_test_rate=0.05, # fraction of routes used for A/B exploration (0.0–1.0)
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `workspace` | `str` | required | Path to directory for persisted routing data, quality history, and TF-IDF model |
| `ab_test_rate` | `float` | `0.05` | Fraction of routing decisions used for A/B exploration. Set `0.0` to disable. |
The workspace directory is created automatically if it doesn't exist.
register_model(config: ModelConfig)
Register a model with its tier range and cost parameters.
router.register_model(ModelConfig(
name="gpt-4o-mini",
tier_range=("trivial", "moderate"), # (min_tier, max_tier)
cost_per_1k_input=0.00015,
cost_per_1k_output=0.0006,
))
ModelConfig fields:
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Model identifier (e.g. `"gpt-4o-mini"`) |
| `tier_range` | `Tuple[str, str]` | `(min_tier, max_tier)` – tiers this model handles |
| `cost_per_1k_input` | `float` | Cost in USD per 1K input tokens |
| `cost_per_1k_output` | `float` | Cost in USD per 1K output tokens |
Tier range semantics: A model registered with tier_range=("simple", "complex") is eligible for simple, moderate, and complex prompts. The router selects the lowest-cost eligible model for each tier.
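That selection rule can be illustrated with a small standalone sketch. The tier scale and the `is_eligible`/`cheapest_eligible` helpers are hypothetical names for illustration, not library APIs:

```python
# Hypothetical helpers illustrating tier-range eligibility and
# cheapest-eligible selection; not part of the library's API.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def is_eligible(tier_range, tier):
    lo, hi = TIERS.index(tier_range[0]), TIERS.index(tier_range[1])
    return lo <= TIERS.index(tier) <= hi

def cheapest_eligible(models, tier):
    # models: list of (name, tier_range, cost_per_1k_input) tuples
    eligible = [m for m in models if is_eligible(m[1], tier)]
    return min(eligible, key=lambda m: m[2])[0] if eligible else None
```

With the Quick Start registry above, a `simple` prompt resolves to the cheaper eligible model, while a `complex` prompt skips models whose range stops at `moderate`.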
route(prompt: str) → RoutingResult
Classify the prompt and select the optimal model.
result = router.route("Summarize the following contract clause: ...")
Returns RoutingResult:
| Field | Type | Description |
|---|---|---|
| `model` | `str` | Selected model name |
| `tier` | `str` | Classified complexity tier |
| `confidence` | `float` | Classification confidence (0.0–1.0) |
| `prompt_hash` | `str` | SHA-256 hash of the prompt (used for outcome feedback) |
| `estimated_cost` | `float` | Estimated cost in USD for this request |
report_outcome(prompt_hash: str, quality_score: float, success: bool)
Feed outcome back to the router to improve future routing decisions. This is the core self-improvement loop.
router.report_outcome(
result.prompt_hash,
quality_score=0.9, # 0.0–1.0, how good the model's response was
success=True, # whether the request succeeded at all
)
The router uses outcome history to:
- Detect tier misclassifications (e.g. a `moderate` prompt that consistently gets poor quality → escalate to `complex`)
- Track per-model quality trends across tier assignments
- Improve TF-IDF weights over time
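The escalation idea in the first bullet can be sketched as a rolling-average check. This is a hypothetical illustration of the feedback loop, not the library's internal logic:

```python
from collections import defaultdict, deque

# Hypothetical sketch: if the rolling average quality for a tier
# drops below a floor, route that tier's prompts one level up.
# The library's actual logic may differ.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

class FeedbackLoop:
    def __init__(self, window=20, floor=0.6):
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.floor = floor

    def report(self, tier, quality):
        self.history[tier].append(quality)

    def effective_tier(self, tier):
        scores = self.history[tier]
        if scores and sum(scores) / len(scores) < self.floor and tier != "expert":
            return TIERS[TIERS.index(tier) + 1]  # escalate one level
        return tier
```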
get_analytics() → Dict
Aggregate routing stats for the current session.
analytics = router.get_analytics()
# {
# "total_routed": int,
# "tier_distribution": {"trivial": int, "simple": int, ...},
# "avg_quality": float,
# "model_usage": {"model-name": int, ...},
# "cost_savings": float # USD saved vs always using most capable model
# }
🔧 v1.0 API – Router (Keyword-based, Production)
Router is the production-proven keyword-based router. Fully featured with SLA enforcement, confidence-gated escalation, provider health tracking, A/B testing, and cost forecasting. Use this for stability; use AdaptiveRouter for semantic accuracy.
Constructor
from antaris_router import Router, SLAConfig
router = Router(
config_path=None, # optional path to JSON config file
enable_cost_tracking=True, # track per-model cost usage
low_confidence_threshold=0.0, # 0.0 = never escalate (default)
escalation_model=None, # model to escalate to when confidence is low
escalation_strategy="always", # "always" | "log_only" | "ask"
sla=None, # SLAConfig instance
fallback_chain=None, # ordered list of fallback model names
classifier=None, # inject custom classifier (e.g. SemanticClassifier)
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `config_path` | `str \| None` | `None` | Path to JSON config file. If `None`, uses built-in defaults. |
| `enable_cost_tracking` | `bool` | `True` | Track cost per model, per session. Required for `cost_report()`, `savings_estimate()`. |
| `low_confidence_threshold` | `float` | `0.0` | Confidence below this triggers escalation. `0.0` = disabled. |
| `escalation_model` | `str \| None` | `None` | Model name to escalate to on low confidence. |
| `escalation_strategy` | `str` | `"always"` | Escalation behavior: `"always"` swaps model, `"log_only"` logs but keeps model, `"ask"` signals user to confirm. |
| `sla` | `SLAConfig \| None` | `None` | SLA constraints to enforce during routing. |
| `fallback_chain` | `List[str] \| None` | `None` | Ordered fallback models for `auto_scale=True`. |
| `classifier` | `object \| None` | `None` | Custom classifier to inject (e.g. `SemanticClassifier`). Replaces the built-in keyword classifier. |
route(...) → RoutingDecision
Route a prompt to the optimal model.
decision = router.route(
prompt="text to route", # required
context=None, # optional: additional context dict
prefer=None, # preferred provider: "claude" | "openai" | etc.
min_tier=None, # minimum tier floor: "simple"|"moderate"|"complex"|"expert"
capability=None, # required capability: "vision"|"code"|etc.
estimate_tokens=(100, 50), # (input_tokens, output_tokens) for cost estimation
ab_test=None, # A/B test config from create_ab_test()
prefer_healthy=False, # skip degraded/rate-limited providers
auto_scale=False, # fall back through fallback_chain if primary is degraded or over-budget
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | The prompt text to classify and route |
| `context` | `dict \| None` | `None` | Additional context for routing decisions |
| `prefer` | `str \| None` | `None` | Preferred provider name. Router respects this if an eligible model exists. |
| `min_tier` | `str \| None` | `None` | Force a minimum complexity tier. E.g. `"complex"` ensures at least a complex-tier model. |
| `capability` | `str \| None` | `None` | Required model capability. Only models with this capability are considered. |
| `estimate_tokens` | `Tuple[int, int]` | `(100, 50)` | `(input_tokens, output_tokens)` used for cost estimation in `decision.estimated_cost`. |
| `ab_test` | `ABTest \| None` | `None` | A/B test object from `create_ab_test()`. Enables variant-based routing. |
| `prefer_healthy` | `bool` | `False` | If `True`, degraded or down providers are skipped. Falls through to the next eligible model. |
| `auto_scale` | `bool` | `False` | If `True` and the primary model is degraded or over budget, routes through `fallback_chain` in order. |
📋 RoutingDecision Fields
Every router.route() call returns a RoutingDecision object with full decision transparency.
decision = router.route("Design a microservices platform for high-throughput event processing")
decision.model # str: "claude-sonnet-4-6"
decision.provider # str: "anthropic"
decision.tier # str: "complex"
decision.confidence # float: 0.85
decision.reasoning # List[str]: ["Input length 1,250 chars → complex range", ...]
decision.estimated_cost # float: 0.00525 (USD)
decision.fallback_models # List[str]: ["claude-opus-4-6", "gpt-4o"]
decision.classification # ClassificationResult object
decision.confidence_basis # str: "keyword_density" | "composite" | "rule_based"
decision.evidence # List[str]: human-readable decision signals
decision.escalated # bool: True if escalation changed the model
decision.original_confidence # float: pre-escalation confidence (if escalated)
decision.escalation_reason # str: why escalation triggered (if escalated)
decision.ab_variant # str: "a" | "b" if A/B test active
decision.explanation # str: full human-readable explanation
decision.supports_streaming # bool: whether selected model supports streaming
decision.sla_compliant # bool: whether decision satisfies all SLA constraints
decision.sla_breaches # List[str]: e.g. ["latency_exceeded", "budget_exceeded"]
decision.sla_adjustments # List[str]: e.g. ["routed_to_cheaper_model_due_to_budget_sla"]
decision.selected_model # property alias for decision.model
decision.to_dict() # Dict: all fields serialized to a plain dict
Complete Field Reference
| Field | Type | Description |
|---|---|---|
| `model` | `str` | Name of the selected model |
| `provider` | `str` | Provider name: `"anthropic"`, `"openai"`, etc. |
| `tier` | `str` | Complexity tier: trivial/simple/moderate/complex/expert |
| `confidence` | `float` | Classification confidence 0.0–1.0 |
| `reasoning` | `List[str]` | Ordered list of reasons why this model was chosen |
| `estimated_cost` | `float` | Estimated USD cost for this specific request |
| `fallback_models` | `List[str]` | Ordered list of alternative models considered |
| `classification` | `ClassificationResult` | Raw classification output including signals |
| `confidence_basis` | `str` | How confidence was computed: `"keyword_density"`, `"composite"`, `"rule_based"` |
| `evidence` | `List[str]` | Human-readable signals that drove the decision |
| `escalated` | `bool` | `True` if escalation logic overrode the original model selection |
| `original_confidence` | `float` | Confidence before escalation (populated only when `escalated=True`) |
| `escalation_reason` | `str` | Human-readable reason escalation triggered |
| `ab_variant` | `str` | `"a"` or `"b"` when an A/B test is active, `""` otherwise |
| `explanation` | `str` | Full plain-English explanation of the routing decision |
| `supports_streaming` | `bool` | Whether the selected model supports streaming responses |
| `sla_compliant` | `bool` | Whether the decision satisfies all active SLA constraints |
| `sla_breaches` | `List[str]` | Which SLA constraints were breached (if any) |
| `sla_adjustments` | `List[str]` | Routing adjustments made to satisfy SLA constraints |
| `selected_model` | property | Alias for `model` |
to_dict() Output
d = decision.to_dict()
# {
# "model": "claude-sonnet-4-6",
# "provider": "anthropic",
# "tier": "complex",
# "confidence": 0.85,
# "reasoning": [...],
# "estimated_cost": 0.00525,
# "fallback_models": [...],
# "confidence_basis": "keyword_density",
# "evidence": [...],
# "escalated": False,
# "original_confidence": 0.0,
# "escalation_reason": "",
# "ab_variant": "",
# "explanation": "Model selected: claude-sonnet-4-6 ...",
# "supports_streaming": True,
# "sla_compliant": True,
# "sla_breaches": [],
# "sla_adjustments": []
# }
🔍 Explainability – explain()
Every routing decision can be explained in plain English. Use explain() for debugging, auditing, or displaying routing logic to users.
explanation = router.explain(decision)
print(explanation)
Example output:
Model selected: claude-sonnet-4-6 (confidence: 85%)
Basis: keyword density
Reasoning: Input classified as 'complex' task (85% confidence). Length 1,250 chars falls in
complex range (1,000–3,000). Strong signal keywords detected: "microservices", "architecture",
"distributed".
Estimated cost: $0.003000 per 1K tokens (this request: $0.005250).
Evidence: length: 1250 chars → complex range (≤3000), keyword match: 3 'complex'-tier keywords
(microservices, architecture, distributed), structural_complexity: 2
Alternatives considered: claude-opus-4-6 (more capable, 5.0x cost), gpt-4o-mini (cheaper, reduced quality)
When escalation occurred:
Model selected: claude-opus-4-6 (confidence: 45%)
[Escalated from original confidence 45%: Low confidence below threshold 0.60. Original model: claude-sonnet-4-6]
Basis: composite
Reasoning: Input classified as 'moderate' task (45% confidence)...
explain() sections:
| Section | Always shown | Description |
|---|---|---|
| `Model selected: X (confidence: Y%)` | ✓ | Selected model and final confidence |
| `[Escalated from ...]` | Only if escalated | Pre-escalation state and trigger reason |
| `Basis: X` | ✓ | Confidence computation method |
| `Reasoning: ...` | ✓ | Human-readable classification narrative |
| `Estimated cost: ...` | ✓ | Per-1K and per-request cost |
| `Evidence: ...` | ✓ | Raw signals that drove classification |
| `Alternatives considered: ...` | ✓ | Other models with relative cost factor |
🚦 Confidence-Gated Escalation
When the classifier is uncertain about a prompt's complexity, antaris-router can automatically escalate to a more capable model rather than risk a low-quality response.
Configuration
router = Router(
low_confidence_threshold=0.6, # escalate if confidence < 0.6
escalation_model="claude-opus-4-6", # which model to escalate to
escalation_strategy="always", # escalation behavior
)
Escalation Strategies
| Strategy | Behavior | Use Case |
|---|---|---|
| `"always"` | Replaces selected model with `escalation_model` | Production: trust the router's escalation |
| `"log_only"` | Logs the low-confidence event, keeps original model | Monitoring: observe without changing behavior |
| `"ask"` | Sets `decision.escalated=True` + `escalation_reason`, keeps original model | Human-in-the-loop: surface uncertainty to user |
Usage
router = Router(
low_confidence_threshold=0.6,
escalation_model="claude-opus-4-6",
escalation_strategy="always",
)
decision = router.route("What does this cryptic error mean in this context?")
if decision.escalated:
    print(f"Escalated! Original confidence: {decision.original_confidence:.2f}")
    print(f"Reason: {decision.escalation_reason}")
    print(f"Using: {decision.model}")  # claude-opus-4-6
Strategy: "ask" – Human-in-the-Loop
When escalation_strategy="ask", the router signals uncertainty without changing the model. Use this to prompt users to confirm the routing decision:
router = Router(
low_confidence_threshold=0.65,
escalation_model="claude-opus-4-6",
escalation_strategy="ask",
)
decision = router.route("some ambiguous prompt")
if decision.escalated:
    # Present choice to user
    print(f"Router is uncertain (confidence: {decision.original_confidence:.0%}).")
    print(f"Suggested escalation: {decision.escalation_reason}")
    print(f"Upgrade to claude-opus-4-6? Current model: {decision.model}")
Escalation Decision Fields
When decision.escalated is True:
decision.escalated # True
decision.original_confidence # e.g. 0.48 – confidence before escalation
decision.escalation_reason # e.g. "Low confidence below threshold 0.60. Original model: claude-sonnet-4-6"
decision.model # escalation_model (if strategy="always"), else original model
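The strategy table reduces to a small decision function. A minimal sketch, assuming the threshold and strategy semantics documented above (not the library's actual code):

```python
# Minimal sketch of the three escalation strategies as documented:
# "always" swaps the model; "ask" keeps it but flags escalated=True;
# "log_only" keeps it and does not flag. Illustrative only.
def apply_escalation(model, confidence, threshold, escalation_model, strategy):
    if confidence >= threshold:
        return model, False, ""
    reason = f"Low confidence below threshold {threshold:.2f}. Original model: {model}"
    if strategy == "always":
        return escalation_model, True, reason
    if strategy == "ask":
        return model, True, reason   # surface uncertainty, keep model
    return model, False, reason      # "log_only": record, keep model
```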
📏 SLA Configuration & Enforcement
antaris-router enforces Service Level Agreements on latency, budget, and response quality. When constraints are breached, the router adjusts model selection automatically.
Setup
from antaris_router import Router, SLAConfig
sla = SLAConfig(
max_latency_ms=200, # max acceptable latency per request
budget_per_hour_usd=5.00, # hourly spend cap in USD
min_quality_score=0.7, # minimum acceptable quality (0.0–1.0)
auto_escalate_on_breach=True, # automatically adjust routing on SLA breach
)
router = Router(
sla=sla,
fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
)
SLAConfig Parameters
| Parameter | Type | Description |
|---|---|---|
| `max_latency_ms` | `float` | Maximum acceptable request latency in milliseconds |
| `budget_per_hour_usd` | `float` | Maximum spend per hour in USD |
| `min_quality_score` | `float` | Minimum acceptable quality score (0.0–1.0) |
| `auto_escalate_on_breach` | `bool` | If `True`, router adjusts model selection to restore SLA compliance |
Routing With SLA
decision = router.route("prompt", auto_scale=True)
# SLA compliance info on every decision
print(decision.sla_compliant) # True / False
print(decision.sla_breaches) # ["budget_exceeded", "latency_exceeded"]
print(decision.sla_adjustments) # ["routed_to_cheaper_model_due_to_budget_sla"]
get_sla_report(since_hours=1.0) → Dict
Aggregate SLA compliance report over a time window.
report = router.get_sla_report(since_hours=1.0)
# {
# "compliance_rate": 0.94,
# "breaches": {
# "latency": 3,
# "cost": 1,
# "quality": 2
# },
# "adjustments_made": 4,
# "cost_savings_usd": 0.87,
# "avg_latency_ms": 142.3,
# "budget_utilization": 0.68,
# "total_requests": 150
# }
| Field | Description |
|---|---|
| `compliance_rate` | Fraction of requests fully SLA-compliant (0.0–1.0) |
| `breaches.latency` | Count of latency SLA breaches |
| `breaches.cost` | Count of budget SLA breaches |
| `breaches.quality` | Count of quality SLA breaches |
| `adjustments_made` | Count of routing adjustments made to restore SLA compliance |
| `cost_savings_usd` | USD saved through SLA-driven model downgrades |
| `avg_latency_ms` | Average request latency over the window |
| `budget_utilization` | Fraction of hourly budget consumed (0.0–1.0) |
| `total_requests` | Total requests in the time window |
check_budget_alert() → Dict
Real-time budget status and spend projection.
alert = router.check_budget_alert()
# {
# "status": "warning", # "ok" | "warning" | "critical"
# "hourly_spend_usd": 3.42,
# "budget_usd": 5.00,
# "utilization": 0.684,
# "projected_hourly_usd": 4.89,
# "recommendation": "Consider routing moderate tasks to gpt-4o-mini to reduce spend"
# }
| Status | Trigger |
|---|---|
| `"ok"` | Utilization below warning threshold |
| `"warning"` | Approaching budget limit |
| `"critical"` | At or over budget limit |
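The exact thresholds are not documented; here is a hypothetical sketch of how such an alert could be derived from spend so far in the current hour (the `warn_at` value is an assumption):

```python
# Hypothetical budget-alert derivation: project hourly spend from
# elapsed time, then classify. Thresholds are assumptions, not the
# library's actual values.
def budget_status(spent_usd, elapsed_min, budget_usd, warn_at=0.65):
    projected = spent_usd * 60 / max(elapsed_min, 1e-9)
    utilization = spent_usd / budget_usd
    if utilization >= 1.0:
        status = "critical"
    elif utilization >= warn_at or projected >= budget_usd:
        status = "warning"
    else:
        status = "ok"
    return {
        "status": status,
        "utilization": round(utilization, 3),
        "projected_hourly_usd": round(projected, 2),
    }
```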
record_sla_quality(model, score)
Record an observed quality score for a completed request. Used to track quality SLA compliance.
router.record_sla_quality("claude-sonnet-4-6", score=0.85)
get_cost_optimizations(estimate_tokens) → List[Dict]
Get actionable cost optimization suggestions based on current routing patterns.
suggestions = router.get_cost_optimizations(estimate_tokens=(100, 50))
# [
# {
# "suggestion": "Route 'moderate' prompts to gpt-4o-mini instead of claude-sonnet-4-6",
# "estimated_savings_usd_per_day": 2.34,
# "tradeoff": "Slightly lower quality for moderate tasks (est. -0.05 quality score)"
# },
# {
# "suggestion": "Enable confidence-gated escalation to reduce expert-tier misrouting",
# "estimated_savings_usd_per_day": 0.89,
# "tradeoff": "Adds ~10ms classification overhead per request"
# }
# ]
🏥 Provider Health Tracking
Track real-time health of each provider/model. Route around degraded providers automatically.
Recording Events
# After a successful call
router.record_provider_event(
"claude-sonnet-4-6",
event="success",
latency_ms=245.0,
)
# After an error
router.record_provider_event(
"claude-sonnet-4-6",
event="error",
details="rate_limited",
)
# After a timeout
router.record_provider_event("gpt-4o", event="timeout")
Event types:
| Event | Description |
|---|---|
| `"success"` | Request completed successfully; `latency_ms` recorded. |
| `"error"` | Request failed; `details` string (e.g. `"rate_limited"`, `"context_exceeded"`). |
| `"timeout"` | Request timed out. |
get_provider_health(model) → Dict
health = router.get_provider_health("claude-sonnet-4-6")
# {
# "model": "claude-sonnet-4-6",
# "status": "healthy", # "healthy" | "degraded" | "down"
# "success_rate_1h": 0.97,
# "avg_latency_ms": 231.4,
# "recent_errors": ["rate_limited"],
# "last_seen": 1741500000.0 # Unix timestamp
# }
| Status | Meaning |
|---|---|
| `"healthy"` | High success rate, normal latency |
| `"degraded"` | Elevated error rate or latency; still usable but non-preferred |
| `"down"` | No recent successes; excluded from routing |
Health-Aware Routing
# Skip degraded/down providers entirely
decision = router.route("prompt", prefer_healthy=True)
When prefer_healthy=True:
- Models with status `"degraded"` or `"down"` are skipped
- Router falls through to the next eligible model in cost order
- If all eligible models are degraded, falls back to the least-degraded option
Combining with auto_scale:
decision = router.route(
"prompt",
prefer_healthy=True,
auto_scale=True, # use fallback_chain when primary is unavailable
)
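A status like `"healthy"`/`"degraded"`/`"down"` can be derived from a rolling window of request outcomes. An illustrative sketch with assumed thresholds, not the library's implementation:

```python
from collections import deque

# Illustrative health tracker: status from a rolling window of
# success/failure events. The 0.95 and 0.5 cutoffs are assumptions.
class HealthTracker:
    def __init__(self, window=50):
        self.events = deque(maxlen=window)

    def record(self, ok):
        self.events.append(ok)

    def status(self):
        if not self.events:
            return "healthy"  # no data yet: assume healthy
        rate = sum(self.events) / len(self.events)
        if rate >= 0.95:
            return "healthy"
        return "degraded" if rate >= 0.5 else "down"
```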
🧪 A/B Testing
Run controlled routing experiments to compare strategies (cost-optimized vs quality-first) with configurable traffic splits.
Creating an A/B Test
ab_test = router.create_ab_test(
name="quality-vs-cost",
strategy_a="cost_optimized", # baseline strategy
strategy_b="quality_first", # bumps tier one level for B variant
split=0.5, # 50/50 split; 0.3 = 30% to B
)
| Parameter | Type | Description |
|---|---|---|
| `name` | `str` | Human-readable test name |
| `strategy_a` | `str` | Baseline strategy: `"cost_optimized"` |
| `strategy_b` | `str` | Experimental strategy: `"quality_first"` bumps tier by one level |
| `split` | `float` | Fraction of traffic routed to strategy B (0.0–1.0) |
Running the Test
decision = router.route("Summarize the quarterly earnings report", ab_test=ab_test)
print(decision.ab_variant) # "a" or "b"
print(decision.model) # varies by variant
if decision.ab_variant == "b":
    # B variant gets one tier higher – a more capable model
    print("Quality-first routing applied")
Strategies
| Strategy | Behavior |
|---|---|
| `"cost_optimized"` | Standard routing – cheapest eligible model for the detected tier |
| `"quality_first"` | Bumps the detected tier up one level (e.g. moderate → complex) for higher quality |
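Variant assignment can be sketched deterministically by hashing the prompt against the split; this is an illustration (the library may assign variants differently, e.g. randomly per request). `apply_strategy` shows the documented one-level tier bump:

```python
import hashlib

# Sketch: deterministic variant assignment by prompt hash, plus the
# documented "quality_first" one-level tier bump. Illustrative only.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def assign_variant(prompt, split):
    bucket = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % 10_000
    return "b" if bucket < split * 10_000 else "a"

def apply_strategy(tier, variant):
    if variant == "b" and tier != "expert":
        return TIERS[TIERS.index(tier) + 1]  # quality_first: bump one level
    return tier
```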
Collecting Results
Track ab_variant alongside actual quality scores to measure the tradeoff:
# In your application
decision = router.route(prompt, ab_test=ab_test)
response = call_llm(decision.model, prompt)
quality = evaluate(response)
router.record_sla_quality(decision.model, quality)
# Store for analysis
results.append({
"variant": decision.ab_variant,
"model": decision.model,
"cost": decision.estimated_cost,
"quality": quality,
})
💰 Cost Forecasting
Project future LLM costs based on current routing distribution and expected traffic.
forecast_cost(...) → Dict
forecast = router.forecast_cost(
requests_per_hour=1000,
avg_input_tokens=500,
avg_output_tokens=200,
)
| Parameter | Type | Description |
|---|---|---|
| `requests_per_hour` | `int` | Expected request volume per hour |
| `avg_input_tokens` | `int` | Average input tokens per request |
| `avg_output_tokens` | `int` | Average output tokens per request |
Returns:
# {
# "hourly_cost_usd": 1.24,
# "daily_cost_usd": 29.76,
# "monthly_cost_usd": 892.80,
# "breakdown_by_model": {
# "gpt-4o-mini": {
# "requests_pct": 0.45,
# "cost_per_request_usd": 0.000105,
# "hourly_cost_usd": 0.047
# },
# "claude-sonnet-4-6": {
# "requests_pct": 0.40,
# "cost_per_request_usd": 0.002100,
# "hourly_cost_usd": 0.840
# },
# "claude-opus-4-6": {
# "requests_pct": 0.15,
# "cost_per_request_usd": 0.013500,
# "hourly_cost_usd": 0.203
# }
# },
# "optimization_tip": "Routing 10% of simple tasks from claude-sonnet-4-6 to gpt-4o-mini would save ~$4.20/day"
# }
| Field | Description |
|---|---|
| `hourly_cost_usd` | Projected USD spend per hour |
| `daily_cost_usd` | Projected USD spend per day |
| `monthly_cost_usd` | Projected USD spend per month |
| `breakdown_by_model` | Per-model cost decomposition |
| `optimization_tip` | Actionable recommendation to reduce costs |
Use forecasting to:
- Set `SLAConfig.budget_per_hour_usd` based on realistic projections
- Identify which models dominate cost
- Plan budget before scaling traffic
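The projection arithmetic can be reproduced from per-model token prices and an assumed routing mix. An illustrative sketch (the `mix` format is an assumption, not the library's):

```python
# Reproduces the forecast arithmetic: per-request cost from token
# prices, scaled by traffic share and volume. Illustrative only.
def forecast(requests_per_hour, avg_in, avg_out, mix):
    # mix: {model: (traffic_share, cost_per_1k_in, cost_per_1k_out)}
    hourly = 0.0
    for share, cin, cout in mix.values():
        per_request = avg_in / 1000 * cin + avg_out / 1000 * cout
        hourly += requests_per_hour * share * per_request
    return {
        "hourly_cost_usd": hourly,
        "daily_cost_usd": hourly * 24,
        "monthly_cost_usd": hourly * 24 * 30,
    }
```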
📊 Cost Tracking & Analytics
log_usage(decision, input_tokens, output_tokens) → float
Log actual token usage for a completed request. Returns the actual cost in USD.
cost = router.log_usage(decision, input_tokens=500, output_tokens=200)
print(f"Request cost: ${cost:.6f}")
cost_report(period) → Dict
Aggregate cost report over a time period.
report = router.cost_report(period="week") # "day" | "week" | "month"
# {
# "period": "week",
# "total_cost_usd": 42.18,
# "by_model": {
# "gpt-4o-mini": {"requests": 8420, "cost_usd": 3.14},
# "claude-sonnet-4-6": {"requests": 3210, "cost_usd": 28.44},
# "claude-opus-4-6": {"requests": 380, "cost_usd": 10.60}
# },
# "avg_cost_per_request_usd": 0.00351
# }
savings_estimate(comparison_model) → Dict
Calculate how much was saved by routing vs always using a reference model.
savings = router.savings_estimate(comparison_model="gpt-4o")
# {
# "comparison_model": "gpt-4o",
# "actual_cost_usd": 42.18,
# "comparison_cost_usd": 187.40,
# "savings_usd": 145.22,
# "savings_pct": 0.775
# }
A savings_pct of 0.775 means the router saved 77.5% vs routing every request to gpt-4o.
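The savings math itself is plain arithmetic; this standalone sketch reproduces the report above:

```python
# Savings vs. a reference model: difference and fraction saved.
def savings(actual_cost_usd, comparison_cost_usd):
    saved = comparison_cost_usd - actual_cost_usd
    return {
        "savings_usd": saved,
        "savings_pct": saved / comparison_cost_usd,
    }
```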
routing_analytics() → Dict
Full aggregate analytics on routing decisions.
analytics = router.routing_analytics()
# {
# "total_decisions": 12010,
# "avg_confidence": 0.831,
# "tier_distribution": {
# "trivial": 1205, "simple": 3802, "moderate": 4510,
# "complex": 2101, "expert": 392
# },
# "tier_percentages": {
# "trivial": 10.0, "simple": 31.7, "moderate": 37.6,
# "complex": 17.5, "expert": 3.3
# },
# "model_usage": {
# "gpt-4o-mini": 5007,
# "claude-sonnet-4-6": 5902,
# "claude-opus-4-6": 1101
# },
# "provider_usage": {
# "openai": 5007,
# "anthropic": 7003
# },
# "most_used_model": "claude-sonnet-4-6",
# "most_used_provider": "anthropic"
# }
🗃️ Model Registry
get_model_info(model_name) → ModelInfo
info = router.get_model_info("claude-sonnet-4-6")
info.name # "claude-sonnet-4-6"
info.provider # "anthropic"
info.cost_per_1k_input # 0.003
info.cost_per_1k_output # 0.015
info.capabilities # ["text", "code", "vision"]
info.max_tokens # 200000
info.supports_streaming # True
info.has_capability("vision") # → bool: True
info.calculate_cost(500, 200) # → float: cost for 500 input + 200 output tokens
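`calculate_cost` is per-token arithmetic over the two rates; a standalone sketch using the documented claude-sonnet-4-6 example values:

```python
# Per-token cost arithmetic behind calculate_cost; standalone sketch.
def calculate_cost(input_tokens, output_tokens,
                   cost_per_1k_input, cost_per_1k_output):
    return (input_tokens / 1000 * cost_per_1k_input
            + output_tokens / 1000 * cost_per_1k_output)
```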
ModelInfo Fields
| Field | Type | Description |
|---|---|---|
| `name` | `str` | Model identifier |
| `provider` | `str` | Provider: `"anthropic"`, `"openai"`, etc. |
| `cost_per_1k_input` | `float` | USD per 1,000 input tokens |
| `cost_per_1k_output` | `float` | USD per 1,000 output tokens |
| `capabilities` | `List[str]` | E.g. `["text", "code", "vision"]` |
| `max_tokens` | `int` | Maximum context window in tokens |
| `supports_streaming` | `bool` | Whether the model supports streaming responses |
list_models_for_tier(tier) → List[Dict]
List all models eligible for a given tier, ordered by cost.
models = router.list_models_for_tier("moderate")
# [
# {"name": "gpt-4o-mini", "provider": "openai", "cost": 0.000105, "capabilities": [...], "max_tokens": 128000},
# {"name": "claude-sonnet-4-6", "provider": "anthropic", "cost": 0.00210, "capabilities": [...], "max_tokens": 200000},
# ]
save_state(path)
Persist router state (cost history, health data, analytics) to disk.
router.save_state("./router_state")
🧠 SemanticClassifier (v2.0)
SemanticClassifier replaces the built-in keyword classifier with TF-IDF semantic classification. It can be injected into the v1.0 Router for semantic accuracy without migrating to AdaptiveRouter.
Usage
from antaris_router import SemanticClassifier, Router
sem = SemanticClassifier(workspace="./routing_data")
router = Router(classifier=sem)
decision = router.route("Design a microservices platform with event-driven architecture")
The SemanticClassifier persists its TF-IDF model to workspace/ and improves with each classified prompt. It is the same classifier used internally by AdaptiveRouter.
Constructor
sem = SemanticClassifier(workspace="./routing_data")
| Parameter | Type | Description |
|---|---|---|
| `workspace` | `str` | Directory to persist the TF-IDF model and vocabulary |
How it Works
- Tokenization – prompt is tokenized and stopwords removed
- TF-IDF vectorization – term frequency × inverse document frequency weights computed
- Tier classification – vector compared to learned per-tier centroids
- Confidence scoring – distance to centroids determines confidence score
- Feedback loop – `report_outcome()` adjusts centroid weights over time
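The first four steps can be illustrated with a toy TF-IDF + nearest-centroid classifier. This is deliberately simplified (no stopword list, no persistence, one centroid document per tier); the library's implementation is more involved:

```python
import math
from collections import Counter

# Toy TF-IDF + nearest-centroid classification; illustrative only.
def tfidf(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    return [{t: c * math.log((1 + n) / (1 + df[t]))
             for t, c in Counter(d.split()).items()} for d in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(prompt, labeled):
    # labeled: {tier: example_text} — one centroid document per tier
    tiers = list(labeled)
    vecs = tfidf(list(labeled.values()) + [prompt])
    centroids, query = vecs[:-1], vecs[-1]
    scores = {t: cosine(query, c) for t, c in zip(tiers, centroids)}
    return max(scores, key=scores.get)
```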
Injecting into v1.0 Router
sem = SemanticClassifier(workspace="./routing_data")
router = Router(
classifier=sem,
low_confidence_threshold=0.6,
escalation_model="claude-opus-4-6",
escalation_strategy="always",
)
decision = router.route("Implement OAuth2 with PKCE in a distributed system")
This gives you semantic classification accuracy with all v1.0 features (SLA, health tracking, A/B testing, cost tracking).
📈 QualityTracker (v2.0)
QualityTracker stores per-prompt outcome data and model performance history. Used internally by AdaptiveRouter and available as a standalone component.
Usage
from antaris_router import QualityTracker
tracker = QualityTracker("./routing_data")
# Record an outcome
tracker.record_outcome(
prompt_hash, # str: from RoutingResult.prompt_hash
quality_score=0.9, # float: 0.0–1.0
success=True, # bool: did the request succeed
model="claude-sonnet-4-6",
)
# Query model performance history
history = tracker.get_model_performance("claude-sonnet-4-6")
# {
# "model": "claude-sonnet-4-6",
# "avg_quality": 0.87,
# "success_rate": 0.96,
# "total_outcomes": 3820,
# "quality_by_tier": {"simple": 0.91, "moderate": 0.88, "complex": 0.84}
# }
record_outcome(prompt_hash, quality_score, success, model)

| Parameter | Type | Description |
|---|---|---|
| `prompt_hash` | `str` | SHA-256 hash from `RoutingResult.prompt_hash` |
| `quality_score` | `float` | Quality 0.0–1.0. Source: human rating, LLM eval, or downstream metric |
| `success` | `bool` | Whether the request succeeded (`True`) or errored (`False`) |
| `model` | `str` | Model name that handled the request |
get_model_performance(model) → Dict
Aggregate quality history for a model across all tracked outcomes.
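Conceptually, this aggregation is just averaging over stored outcome records, grouped by tier. A minimal in-memory stand-in (the record fields mirror `record_outcome`; the data and helper function are invented for illustration):

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the tracker's persisted outcomes.
outcomes = [
    {"model": "claude-sonnet-4-6", "tier": "simple",  "quality": 0.91, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "complex", "quality": 0.84, "success": True},
    {"model": "claude-sonnet-4-6", "tier": "simple",  "quality": 0.89, "success": False},
]

def model_performance(model: str) -> dict:
    rows = [o for o in outcomes if o["model"] == model]
    by_tier = defaultdict(list)
    for o in rows:
        by_tier[o["tier"]].append(o["quality"])
    return {
        "model": model,
        "avg_quality": sum(o["quality"] for o in rows) / len(rows),
        "success_rate": sum(o["success"] for o in rows) / len(rows),
        "total_outcomes": len(rows),
        "quality_by_tier": {t: sum(q) / len(q) for t, q in by_tier.items()},
    }

perf = model_performance("claude-sonnet-4-6")
print(perf["total_outcomes"])  # → 3
```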
ClassificationResult & Signals
ClassificationResult is the raw output of the classifier, accessible via decision.classification.
```python
decision = router.route("Build a Redis-backed distributed rate limiter in Go")
clf = decision.classification

clf.tier        # "complex"
clf.confidence  # 0.83
clf.signals     # dict (see below)
```
ClassificationResult.signals

```python
clf.signals = {
    "length": 62,                # raw character count
    "keyword_matches": {
        "trivial": 0,
        "simple": 1,
        "complex": 2,            # matched "distributed", "rate limiter"
    },
    "has_code": False,           # whether prompt contains code
    "code_indicators": 0,        # count of code-related patterns
    "structural_complexity": 2,  # heuristic complexity score
}
```
| Signal | Type | Description |
|---|---|---|
| `length` | `int` | Raw character count of the prompt |
| `keyword_matches` | `Dict[str, int]` | Per-tier keyword match counts |
| `has_code` | `bool` | Whether the prompt contains code blocks or inline code |
| `code_indicators` | `int` | Count of code-related patterns (functions, syntax, etc.) |
| `structural_complexity` | `int` | Heuristic score: nesting, multi-part requests, etc. |
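These signals are cheap lexical heuristics. A rough sketch of how such signals could be derived from a prompt (the keyword lists and regex patterns here are invented for the example, not the library's actual ones):

```python
import re

# Invented per-tier keyword lists for illustration only.
TIER_KEYWORDS = {
    "trivial": {"what", "who", "define"},
    "simple": {"explain", "summarize"},
    "complex": {"distributed", "architecture", "rate limiter", "concurrent"},
}

def extract_signals(prompt: str) -> dict:
    lower = prompt.lower()
    # Crude code detectors: fences, function/class definitions, syntax chars.
    code_patterns = [r"```", r"\bdef \w+\(", r"\bclass \w+", r"[{};]"]
    code_hits = sum(len(re.findall(p, prompt)) for p in code_patterns)
    return {
        "length": len(prompt),
        "keyword_matches": {
            tier: sum(kw in lower for kw in kws) for tier, kws in TIER_KEYWORDS.items()
        },
        "has_code": code_hits > 0,
        "code_indicators": code_hits,
        # crude structural score: sentence breaks plus bullet items
        "structural_complexity": lower.count(". ") + lower.count("\n- "),
    }

sig = extract_signals("Build a Redis-backed distributed rate limiter in Go")
print(sig["keyword_matches"]["complex"])  # → 2
```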
Full API Reference

Router Methods

| Method | Signature | Description |
|---|---|---|
| `route` | `(prompt, context, prefer, min_tier, capability, estimate_tokens, ab_test, prefer_healthy, auto_scale) → RoutingDecision` | Route a prompt to the optimal model |
| `explain` | `(decision: RoutingDecision) → str` | Generate a plain-English explanation of a routing decision |
| `log_usage` | `(decision, input_tokens, output_tokens) → float` | Log actual usage; returns cost in USD |
| `cost_report` | `(period: str) → Dict` | Aggregate cost report. Period: `"day"`/`"week"`/`"month"` |
| `savings_estimate` | `(comparison_model: str) → Dict` | Cost savings vs. always using the comparison model |
| `routing_analytics` | `() → Dict` | Full routing analytics (tiers, models, confidence) |
| `get_model_info` | `(model_name: str) → ModelInfo` | Model metadata from the registry |
| `list_models_for_tier` | `(tier: str) → List[Dict]` | All eligible models for a tier |
| `save_state` | `(path: str)` | Persist router state to disk |
| `record_provider_event` | `(model, event, latency_ms, details) → None` | Record a provider health event |
| `get_provider_health` | `(model: str) → Dict` | Current health status for a model |
| `create_ab_test` | `(name, strategy_a, strategy_b, split) → ABTest` | Create an A/B test configuration |
| `forecast_cost` | `(requests_per_hour, avg_input_tokens, avg_output_tokens) → Dict` | Project future costs |
| `get_sla_report` | `(since_hours: float) → Dict` | SLA compliance report |
| `check_budget_alert` | `() → Dict` | Real-time budget status |
| `record_sla_quality` | `(model: str, score: float) → None` | Record a quality score for SLA tracking |
| `get_cost_optimizations` | `(estimate_tokens: Tuple[int, int]) → List[Dict]` | Cost optimization suggestions |
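The cost-side methods (`cost_report`, `savings_estimate`, `forecast_cost`) ultimately reduce to arithmetic over per-token prices. A back-of-the-envelope version of a savings estimate, with made-up prices purely for illustration:

```python
# Hypothetical per-million-token prices (input, output) in USD.
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "claude-haiku-3-5": (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pin, pout = PRICES[model]
    return input_tokens / 1e6 * pin + output_tokens / 1e6 * pout

# 1,000 trivial requests the router sent to the cheap model instead of opus.
routed = 1000 * request_cost("claude-haiku-3-5", 500, 200)
baseline = 1000 * request_cost("claude-opus-4-6", 500, 200)
print(f"saved ${baseline - routed:.2f} vs. always-opus")  # → saved $21.30 vs. always-opus
```

This is the asymmetry the "Why antaris-router?" section describes: for short prompts the expensive model costs roughly 20× more for the same answer.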
AdaptiveRouter Methods

| Method | Signature | Description |
|---|---|---|
| `register_model` | `(config: ModelConfig) → None` | Register a model with a tier range and costs |
| `route` | `(prompt: str) → RoutingResult` | Classify and route a prompt |
| `report_outcome` | `(prompt_hash, quality_score, success) → None` | Feed an outcome back for self-improvement |
| `get_analytics` | `() → Dict` | Session-level routing analytics |
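The `report_outcome` feedback loop boils down to keeping a running quality average per model and letting it bias future selection. A toy version of that idea (this is a conceptual sketch, not the library's internals):

```python
from collections import defaultdict

class FeedbackSelector:
    """Pick the model with the best running quality average for a tier."""

    def __init__(self, candidates: dict[str, list[str]]):
        self.candidates = candidates             # tier -> candidate model names
        self.quality = defaultdict(lambda: 0.5)  # model -> running avg (optimistic prior)
        self.count = defaultdict(int)

    def pick(self, tier: str) -> str:
        return max(self.candidates[tier], key=lambda m: self.quality[m])

    def report_outcome(self, model: str, quality_score: float) -> None:
        self.count[model] += 1
        n = self.count[model]
        # incremental mean update: avg += (x - avg) / n
        self.quality[model] += (quality_score - self.quality[model]) / n

sel = FeedbackSelector({"complex": ["model-a", "model-b"]})
sel.report_outcome("model-a", 0.4)  # model-a underperforms...
sel.report_outcome("model-b", 0.9)  # ...model-b does well
print(sel.pick("complex"))  # → model-b
```

The real router folds this signal into its centroid updates and confidence scoring rather than a bare argmax, but the shape of the loop — route, observe, report, re-rank — is the same.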
Complete Exports

```python
from antaris_router import (
    # ── v2.0 API ─────────────────────────────────────────────
    AdaptiveRouter,        # Semantic, self-improving router
    RoutingResult,         # Result object from AdaptiveRouter.route()
    ModelConfig,           # Model registration config for AdaptiveRouter
    SemanticClassifier,    # TF-IDF classifier (injectable into v1.0 Router)
    SemanticResult,        # Result object from SemanticClassifier
    TFIDFVectorizer,       # Low-level TF-IDF vectorizer
    QualityTracker,        # Outcome feedback tracker
    QualityDecision,       # Quality decision record
    # ── v1.0 API ─────────────────────────────────────────────
    Router,                # Keyword-based production router
    RoutingDecision,       # Decision object from Router.route()
    TaskClassifier,        # Built-in keyword classifier
    ClassificationResult,  # Classification output with signals
    ModelRegistry,         # Internal model registry
    ModelInfo,             # Model metadata object
    CostTracker,           # Cost tracking component
    UsageRecord,           # Per-request usage record
    Config,                # Router configuration
    # ── Sprint 5: SLA ────────────────────────────────────────
    SLAConfig,             # SLA constraint configuration
    SLAMonitor,            # SLA enforcement monitor
    SLARecord,             # Per-request SLA record
)
```
Migration: v1.0 → v2.0

| Feature | v1.0 `Router` | v2.0 `AdaptiveRouter` |
|---|---|---|
| Classification | Keyword matching | TF-IDF semantic |
| Self-improvement | ❌ | ✅ via `report_outcome()` |
| Persistence | `save_state()` | Automatic to workspace |
| SLA enforcement | ✅ | ❌ (use `Router` + `SemanticClassifier`) |
| Provider health | ✅ | ❌ (use `Router` + `SemanticClassifier`) |
| A/B testing | ✅ | Built-in `ab_test_rate` |
| Cost tracking | ✅ | Basic (via analytics) |
| Explainability | ✅ `explain()` | Via `result.confidence` + analytics |
| `RoutingDecision` | Full object | Lightweight `RoutingResult` |
Recommended migration path:

```python
# Option A: Full v2.0 – new project, accuracy-first
router = AdaptiveRouter("./routing_data")

# Option B: Best of both – semantic accuracy + full v1.0 features
sem = SemanticClassifier(workspace="./routing_data")
router = Router(
    classifier=sem,             # semantic classification
    sla=sla,                    # + SLA enforcement
    enable_cost_tracking=True,  # + cost tracking
)
```
Option B lets you adopt semantic classification incrementally without losing any v1.0 production features.
Advanced Patterns
Full Production Setup

```python
from antaris_router import Router, SLAConfig, SemanticClassifier

sem = SemanticClassifier(workspace="./routing_data")
sla = SLAConfig(
    max_latency_ms=300,
    budget_per_hour_usd=10.00,
    min_quality_score=0.75,
    auto_escalate_on_breach=True,
)
router = Router(
    classifier=sem,
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
    enable_cost_tracking=True,
)

def route_and_call(prompt: str) -> str:
    decision = router.route(
        prompt,
        estimate_tokens=(len(prompt) // 4, 200),
        prefer_healthy=True,
        auto_scale=True,
    )
    # Log decision for audit
    print(router.explain(decision))

    # Call your LLM here
    response = call_llm(decision.model, prompt)

    # Record actual usage
    router.log_usage(decision, input_tokens=len(prompt) // 4, output_tokens=len(response) // 4)

    # Record provider health
    router.record_provider_event(decision.model, event="success", latency_ms=242.0)
    return response
```
Periodic Reporting

```python
import time

# Every hour
while True:
    time.sleep(3600)

    report = router.get_sla_report(since_hours=1.0)
    alert = router.check_budget_alert()
    analytics = router.routing_analytics()

    print(f"SLA compliance: {report['compliance_rate']:.1%}")
    print(f"Budget: {alert['status']} ({alert['utilization']:.1%} used)")
    print(f"Most used model: {analytics['most_used_model']}")

    if alert["status"] == "critical":
        # Trigger alerts, adjust SLA config, etc.
        pass

    router.save_state("./router_state")
```
Architecture

```
antaris-router
├── Router (v1.0) – Production keyword-based router
│   ├── TaskClassifier – Built-in keyword classification
│   │   └── ClassificationResult – With signals: length, keywords, code
│   ├── ModelRegistry – Model metadata + capability index
│   ├── CostTracker – Per-session/period cost tracking
│   ├── SLAMonitor – Constraint enforcement + reporting
│   └── RoutingDecision – Full decision object
│
├── AdaptiveRouter (v2.0) – Self-improving semantic router
│   ├── SemanticClassifier – TF-IDF vectorizer + tier centroids
│   ├── TFIDFVectorizer – Low-level TF-IDF implementation
│   ├── QualityTracker – Outcome feedback + model performance
│   └── RoutingResult – Lightweight result object
│
└── Shared
    ├── SLAConfig – SLA constraint definition
    ├── SLARecord – Per-request SLA record
    └── ModelInfo – Model metadata (costs, caps, streaming)
```
License

Part of the antaris-suite – adaptive AI infrastructure for LLM cost optimization.

© Antaris Analytics LLC