
File-based model router for LLM cost optimization. Zero dependencies.


antaris-router ⚡

Adaptive LLM model routing for cost optimization. Zero dependencies, stdlib only.

Route every prompt to the right model at the right cost. antaris-router classifies task complexity, selects the optimal model from your registry, enforces SLAs, tracks provider health, and continuously improves routing quality through outcome feedback.



📦 Installation

pip install antaris-router

Version: 4.9.20. Dependencies: none; pure Python stdlib only.


🗺️ Table of Contents

  1. Why antaris-router?
  2. Tier System
  3. Quick Start
  4. v2.0 API: AdaptiveRouter (Semantic)
  5. v1.0 API: Router (Keyword-based)
  6. RoutingDecision Fields
  7. Explainability: explain()
  8. Confidence-Gated Escalation
  9. SLA Configuration & Enforcement
  10. Provider Health Tracking
  11. A/B Testing
  12. Cost Forecasting
  13. Cost Tracking & Analytics
  14. Model Registry
  15. SemanticClassifier (v2.0)
  16. QualityTracker (v2.0)
  17. ClassificationResult & Signals
  18. Full API Reference
  19. Complete Exports
  20. Migration: v1.0 → v2.0

🎯 Why antaris-router?

LLM costs are asymmetric. A one-line question routed to claude-opus wastes 50–100× what it needs to. antaris-router fixes that:

| Without routing | With antaris-router |
|---|---|
| Every request → one expensive model | Each request → cheapest capable model |
| No visibility into cost breakdown | Real-time cost tracking + forecasting |
| Silent model failures | Provider health tracking + auto-failover |
| Blind prompt-to-model mapping | TF-IDF semantic classification (v2.0) |
| No quality signal loop | Outcome feedback → self-improving routing |

๐Ÿ† Tier System

antaris-router classifies every prompt into one of five complexity tiers. Each tier maps to a cost bracket, ensuring you always pay proportionally to task complexity.

| Tier | Char Range | Typical Tasks | Strategy |
|---|---|---|---|
| trivial | ≤ 50 chars | Simple Q&A, single-word lookups | Cheapest model |
| simple | 50–200 chars | Basic tasks, short explanations | Low-cost model |
| moderate | 200–1,000 chars | Standard tasks, multi-step answers | Mid-tier model |
| complex | 1,000–3,000 chars | Analysis, architecture, code review | Powerful model |
| expert | 3,000+ chars | Highest complexity, long-form reasoning | Most capable model |

Tier boundaries are based on character count combined with keyword signals, code detection, and structural complexity analysis. The v2.0 AdaptiveRouter additionally uses TF-IDF semantic classification and improves tier accuracy over time through outcome feedback.
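As an illustration, the length-based component alone can be sketched in a few lines (purely illustrative: the real classifier combines length with keyword, code, and structural signals, and the exact boundary inclusivity is an assumption here):

```python
# Minimal sketch of the length-based tier boundaries from the table above.
# Not the library's implementation -- the actual classifier weighs
# additional signals beyond character count.

def tier_by_length(prompt: str) -> str:
    n = len(prompt)
    if n <= 50:
        return "trivial"
    if n <= 200:
        return "simple"
    if n <= 1000:
        return "moderate"
    if n <= 3000:
        return "complex"
    return "expert"

print(tier_by_length("What is 2+2?"))  # trivial
print(tier_by_length("x" * 1500))      # complex
```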


⚡ Quick Start

v2.0: AdaptiveRouter (recommended for new projects)

from antaris_router import AdaptiveRouter, ModelConfig

router = AdaptiveRouter("./routing_data", ab_test_rate=0.05)

router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))
router.register_model(ModelConfig(
    name="claude-sonnet-4-6",
    tier_range=("simple", "complex"),
    cost_per_1k_input=0.003,
    cost_per_1k_output=0.015,
))
router.register_model(ModelConfig(
    name="claude-opus-4-6",
    tier_range=("complex", "expert"),
    cost_per_1k_input=0.015,
    cost_per_1k_output=0.075,
))

result = router.route("Implement a distributed task queue with priority scheduling")

print(result.model)          # "claude-sonnet-4-6"
print(result.tier)           # "complex"
print(result.confidence)     # 0.87
print(result.estimated_cost) # 0.00234

# Feed outcome back to improve future routing
router.report_outcome(result.prompt_hash, quality_score=0.9, success=True)

# Session analytics
analytics = router.get_analytics()
print(analytics)
# {
#   "total_routed": 42,
#   "tier_distribution": {"trivial": 5, "simple": 12, "moderate": 15, "complex": 8, "expert": 2},
#   "avg_quality": 0.88,
#   "model_usage": {"gpt-4o-mini": 17, "claude-sonnet-4-6": 21, "claude-opus-4-6": 4},
#   "cost_savings": 0.142
# }

v1.0: Router (production-proven, keyword-based)

from antaris_router import Router

router = Router(enable_cost_tracking=True)

decision = router.route("Explain async/await in Python with examples")
print(decision.model)    # "claude-sonnet-4-6"
print(decision.tier)     # "moderate"
print(decision.confidence) # 0.82

print(router.explain(decision))

🤖 v2.0 API: AdaptiveRouter (Semantic, Self-Improving)

AdaptiveRouter is the next-generation router. It uses TF-IDF vectorization for semantic classification, learns from outcome feedback, and persists routing state across sessions.

Constructor

router = AdaptiveRouter(
    workspace="./routing_data",  # directory for persisted state
    ab_test_rate=0.05,           # fraction of routes used for A/B exploration (0.0–1.0)
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| workspace | str | required | Path to directory for persisted routing data, quality history, and TF-IDF model |
| ab_test_rate | float | 0.05 | Fraction of routing decisions used for A/B exploration. Set 0.0 to disable. |

The workspace directory is created automatically if it doesn't exist.

register_model(config: ModelConfig)

Register a model with its tier range and cost parameters.

router.register_model(ModelConfig(
    name="gpt-4o-mini",
    tier_range=("trivial", "moderate"),   # (min_tier, max_tier)
    cost_per_1k_input=0.00015,
    cost_per_1k_output=0.0006,
))

ModelConfig fields:

| Field | Type | Description |
|---|---|---|
| name | str | Model identifier (e.g. "gpt-4o-mini") |
| tier_range | Tuple[str, str] | (min_tier, max_tier): the tiers this model handles |
| cost_per_1k_input | float | Cost in USD per 1K input tokens |
| cost_per_1k_output | float | Cost in USD per 1K output tokens |

Tier range semantics: A model registered with tier_range=("simple", "complex") is eligible for simple, moderate, and complex prompts. The router selects the lowest-cost eligible model for each tier.
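A minimal sketch of the eligibility rule, assuming the tier ordering from the Tier System section (illustrative only, not the library's implementation):

```python
# Ordered tiers, lowest to highest complexity.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def eligible(tier_range: tuple, tier: str) -> bool:
    """A model handles every tier between its min and max tier, inclusive."""
    lo, hi = TIERS.index(tier_range[0]), TIERS.index(tier_range[1])
    return lo <= TIERS.index(tier) <= hi

print(eligible(("simple", "complex"), "moderate"))  # True
print(eligible(("simple", "complex"), "expert"))    # False
```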

route(prompt: str) → RoutingResult

Classify the prompt and select the optimal model.

result = router.route("Summarize the following contract clause: ...")

Returns RoutingResult:

| Field | Type | Description |
|---|---|---|
| model | str | Selected model name |
| tier | str | Classified complexity tier |
| confidence | float | Classification confidence (0.0–1.0) |
| prompt_hash | str | SHA-256 hash of the prompt (used for outcome feedback) |
| estimated_cost | float | Estimated cost in USD for this request |
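For reference, a SHA-256 prompt hash like the one above can be produced with the stdlib. The exact scheme (encoding, any normalization) is an assumption here; in practice, use RoutingResult.prompt_hash rather than recomputing it:

```python
import hashlib

def prompt_hash(prompt: str) -> str:
    # Assumed scheme: hex SHA-256 over the UTF-8 encoded prompt.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

h = prompt_hash("Summarize the following contract clause: ...")
print(len(h))  # 64 hex characters
```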

report_outcome(prompt_hash: str, quality_score: float, success: bool)

Feed outcome back to the router to improve future routing decisions. This is the core self-improvement loop.

router.report_outcome(
    result.prompt_hash,
    quality_score=0.9,  # 0.0–1.0, how good the model's response was
    success=True,       # whether the request succeeded at all
)

The router uses outcome history to:

  • Detect tier misclassifications (e.g. a moderate prompt that consistently gets poor quality → escalate to complex)
  • Track per-model quality trends across tier assignments
  • Improve TF-IDF weights over time

get_analytics() → Dict

Aggregate routing stats for the current session.

analytics = router.get_analytics()
# {
#   "total_routed": int,
#   "tier_distribution": {"trivial": int, "simple": int, ...},
#   "avg_quality": float,
#   "model_usage": {"model-name": int, ...},
#   "cost_savings": float   # USD saved vs always using most capable model
# }

🔧 v1.0 API: Router (Keyword-based, Production)

Router is the production-proven keyword-based router. Fully featured with SLA enforcement, confidence-gated escalation, provider health tracking, A/B testing, and cost forecasting. Use this for stability; use AdaptiveRouter for semantic accuracy.

Constructor

from antaris_router import Router, SLAConfig

router = Router(
    config_path=None,                  # optional path to JSON config file
    enable_cost_tracking=True,         # track per-model cost usage
    low_confidence_threshold=0.0,      # 0.0 = never escalate (default)
    escalation_model=None,             # model to escalate to when confidence is low
    escalation_strategy="always",      # "always" | "log_only" | "ask"
    sla=None,                          # SLAConfig instance
    fallback_chain=None,               # ordered list of fallback model names
    classifier=None,                   # inject custom classifier (e.g. SemanticClassifier)
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| config_path | str \| None | None | Path to JSON config file. If None, uses built-in defaults. |
| enable_cost_tracking | bool | True | Track cost per model, per session. Required for cost_report(), savings_estimate(). |
| low_confidence_threshold | float | 0.0 | Confidence below this triggers escalation. 0.0 = disabled. |
| escalation_model | str \| None | None | Model name to escalate to on low confidence. |
| escalation_strategy | str | "always" | Escalation behavior: "always" swaps model, "log_only" logs but keeps model, "ask" signals user to confirm. |
| sla | SLAConfig \| None | None | SLA constraints to enforce during routing. |
| fallback_chain | List[str] \| None | None | Ordered fallback models for auto_scale=True. |
| classifier | object \| None | None | Custom classifier to inject (e.g. SemanticClassifier). Replaces the built-in keyword classifier. |

route(...) → RoutingDecision

Route a prompt to the optimal model.

decision = router.route(
    prompt="text to route",            # required
    context=None,                      # optional: additional context dict
    prefer=None,                       # preferred provider: "claude" | "openai" | etc.
    min_tier=None,                     # minimum tier floor: "simple"|"moderate"|"complex"|"expert"
    capability=None,                   # required capability: "vision"|"code"|etc.
    estimate_tokens=(100, 50),         # (input_tokens, output_tokens) for cost estimation
    ab_test=None,                      # A/B test config from create_ab_test()
    prefer_healthy=False,              # skip degraded/rate-limited providers
    auto_scale=False,                  # fall back through fallback_chain if primary is degraded or over-budget
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | The prompt text to classify and route |
| context | dict \| None | None | Additional context for routing decisions |
| prefer | str \| None | None | Preferred provider name. Respected if an eligible model exists. |
| min_tier | str \| None | None | Force a minimum complexity tier. E.g. "complex" ensures at least a complex-tier model. |
| capability | str \| None | None | Required model capability. Only models with this capability are considered. |
| estimate_tokens | Tuple[int, int] | (100, 50) | (input_tokens, output_tokens) used for cost estimation in decision.estimated_cost. |
| ab_test | ABTest \| None | None | A/B test object from create_ab_test(). Enables variant-based routing. |
| prefer_healthy | bool | False | If True, degraded or down providers are skipped; the router falls through to the next eligible model. |
| auto_scale | bool | False | If True and the primary model is degraded or over budget, routes through fallback_chain in order. |

📋 RoutingDecision Fields

Every router.route() call returns a RoutingDecision object with full decision transparency.

decision = router.route("Design a microservices platform for high-throughput event processing")

decision.model              # str: "claude-sonnet-4-6"
decision.provider           # str: "anthropic"
decision.tier               # str: "complex"
decision.confidence         # float: 0.85
decision.reasoning          # List[str]: ["Input length 1,250 chars → complex range", ...]
decision.estimated_cost     # float: 0.00525 (USD)
decision.fallback_models    # List[str]: ["claude-opus-4-6", "gpt-4o"]
decision.classification     # ClassificationResult object
decision.confidence_basis   # str: "keyword_density" | "composite" | "rule_based"
decision.evidence           # List[str]: human-readable decision signals
decision.escalated          # bool: True if escalation changed the model
decision.original_confidence  # float: pre-escalation confidence (if escalated)
decision.escalation_reason  # str: why escalation triggered (if escalated)
decision.ab_variant         # str: "a" | "b" if A/B test active
decision.explanation        # str: full human-readable explanation
decision.supports_streaming # bool: whether selected model supports streaming
decision.sla_compliant      # bool: whether decision satisfies all SLA constraints
decision.sla_breaches       # List[str]: e.g. ["latency_exceeded", "budget_exceeded"]
decision.sla_adjustments    # List[str]: e.g. ["routed_to_cheaper_model_due_to_budget_sla"]

decision.selected_model     # property alias for decision.model
decision.to_dict()          # Dict: all fields serialized to a plain dict

Complete Field Reference

| Field | Type | Description |
|---|---|---|
| model | str | Name of the selected model |
| provider | str | Provider name: "anthropic", "openai", etc. |
| tier | str | Complexity tier: trivial/simple/moderate/complex/expert |
| confidence | float | Classification confidence, 0.0–1.0 |
| reasoning | List[str] | Ordered list of reasons why this model was chosen |
| estimated_cost | float | Estimated USD cost for this specific request |
| fallback_models | List[str] | Ordered list of alternative models considered |
| classification | ClassificationResult | Raw classification output, including signals |
| confidence_basis | str | How confidence was computed: "keyword_density", "composite", "rule_based" |
| evidence | List[str] | Human-readable signals that drove the decision |
| escalated | bool | True if escalation logic overrode the original model selection |
| original_confidence | float | Confidence before escalation (populated only when escalated=True) |
| escalation_reason | str | Human-readable reason escalation triggered |
| ab_variant | str | "a" or "b" when an A/B test is active, "" otherwise |
| explanation | str | Full plain-English explanation of the routing decision |
| supports_streaming | bool | Whether the selected model supports streaming responses |
| sla_compliant | bool | Whether the decision satisfies all active SLA constraints |
| sla_breaches | List[str] | Which SLA constraints were breached (if any) |
| sla_adjustments | List[str] | Routing adjustments made to satisfy SLA constraints |
| selected_model | property | Alias for model |

to_dict() Output

d = decision.to_dict()
# {
#   "model": "claude-sonnet-4-6",
#   "provider": "anthropic",
#   "tier": "complex",
#   "confidence": 0.85,
#   "reasoning": [...],
#   "estimated_cost": 0.00525,
#   "fallback_models": [...],
#   "confidence_basis": "keyword_density",
#   "evidence": [...],
#   "escalated": False,
#   "original_confidence": 0.0,
#   "escalation_reason": "",
#   "ab_variant": "",
#   "explanation": "Model selected: claude-sonnet-4-6 ...",
#   "supports_streaming": True,
#   "sla_compliant": True,
#   "sla_breaches": [],
#   "sla_adjustments": []
# }

๐Ÿ” Explainability โ€” explain()

Every routing decision can be explained in plain English. Use explain() for debugging, auditing, or displaying routing logic to users.

explanation = router.explain(decision)
print(explanation)

Example output:

Model selected: claude-sonnet-4-6 (confidence: 85%)
Basis: keyword density
Reasoning: Input classified as 'complex' task (85% confidence). Length 1,250 chars falls in
complex range (1,000–3,000). Strong signal keywords detected: "microservices", "architecture",
"distributed".
Estimated cost: $0.003000 per 1K tokens (this request: $0.005250).
Evidence: length: 1250 chars → complex range (≤3000), keyword match: 3 'complex'-tier keywords
(microservices, architecture, distributed), structural_complexity: 2
Alternatives considered: claude-opus-4-6 (more capable, 5.0x cost), gpt-4o-mini (cheaper, reduced quality)

When escalation occurred:

Model selected: claude-opus-4-6 (confidence: 45%)
[Escalated from original confidence 45%: Low confidence below threshold 0.60. Original model: claude-sonnet-4-6]
Basis: composite
Reasoning: Input classified as 'moderate' task (45% confidence)...

explain() sections:

| Section | Always shown | Description |
|---|---|---|
| Model selected: X (confidence: Y%) | ✅ | Selected model and final confidence |
| [Escalated from ...] | Only if escalated | Pre-escalation state and trigger reason |
| Basis: X | ✅ | Confidence computation method |
| Reasoning: ... | ✅ | Human-readable classification narrative |
| Estimated cost: ... | ✅ | Per-1K and per-request cost |
| Evidence: ... | ✅ | Raw signals that drove classification |
| Alternatives considered: ... | ✅ | Other models with relative cost factor |

🚦 Confidence-Gated Escalation

When the classifier is uncertain about a prompt's complexity, antaris-router can automatically escalate to a more capable model rather than risk a low-quality response.

Configuration

router = Router(
    low_confidence_threshold=0.6,          # escalate if confidence < 0.6
    escalation_model="claude-opus-4-6",    # which model to escalate to
    escalation_strategy="always",          # escalation behavior
)

Escalation Strategies

| Strategy | Behavior | Use Case |
|---|---|---|
| "always" | Replaces selected model with escalation_model | Production: trust the router's escalation |
| "log_only" | Logs the low-confidence event, keeps original model | Monitoring: observe without changing behavior |
| "ask" | Sets decision.escalated=True + escalation_reason, keeps original model | Human-in-the-loop: surface uncertainty to the user |

Usage

router = Router(
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
)

decision = router.route("What does this cryptic error mean in this context?")

if decision.escalated:
    print(f"Escalated! Original confidence: {decision.original_confidence:.2f}")
    print(f"Reason: {decision.escalation_reason}")
    print(f"Using: {decision.model}")  # claude-opus-4-6

Strategy: "ask" (Human-in-the-Loop)

When escalation_strategy="ask", the router signals uncertainty without changing the model. Use this to prompt users to confirm the routing decision:

router = Router(
    low_confidence_threshold=0.65,
    escalation_model="claude-opus-4-6",
    escalation_strategy="ask",
)

decision = router.route("some ambiguous prompt")

if decision.escalated:
    # Present choice to user
    print(f"Router is uncertain (confidence: {decision.original_confidence:.0%}).")
    print(f"Suggested escalation: {decision.escalation_reason}")
    print(f"Upgrade to claude-opus-4-6? Current model: {decision.model}")

Escalation Decision Fields

When decision.escalated is True:

decision.escalated           # True
decision.original_confidence # e.g. 0.48, confidence before escalation
decision.escalation_reason   # e.g. "Low confidence below threshold 0.60. Original model: claude-sonnet-4-6"
decision.model               # escalation_model (if strategy="always"), else original model

📊 SLA Configuration & Enforcement

antaris-router enforces Service Level Agreements on latency, budget, and response quality. When constraints are breached, the router adjusts model selection automatically.

Setup

from antaris_router import Router, SLAConfig

sla = SLAConfig(
    max_latency_ms=200,           # max acceptable latency per request
    budget_per_hour_usd=5.00,     # hourly spend cap in USD
    min_quality_score=0.7,        # minimum acceptable quality (0.0–1.0)
    auto_escalate_on_breach=True, # automatically adjust routing on SLA breach
)

router = Router(
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
)

SLAConfig Parameters

| Parameter | Type | Description |
|---|---|---|
| max_latency_ms | float | Maximum acceptable request latency in milliseconds |
| budget_per_hour_usd | float | Maximum spend per hour in USD |
| min_quality_score | float | Minimum acceptable quality score (0.0–1.0) |
| auto_escalate_on_breach | bool | If True, the router adjusts model selection to restore SLA compliance |

Routing With SLA

decision = router.route("prompt", auto_scale=True)

# SLA compliance info on every decision
print(decision.sla_compliant)    # True / False
print(decision.sla_breaches)     # ["budget_exceeded", "latency_exceeded"]
print(decision.sla_adjustments)  # ["routed_to_cheaper_model_due_to_budget_sla"]

get_sla_report(since_hours=1.0) → Dict

Aggregate SLA compliance report over a time window.

report = router.get_sla_report(since_hours=1.0)
# {
#   "compliance_rate": 0.94,
#   "breaches": {
#     "latency": 3,
#     "cost": 1,
#     "quality": 2
#   },
#   "adjustments_made": 4,
#   "cost_savings_usd": 0.87,
#   "avg_latency_ms": 142.3,
#   "budget_utilization": 0.68,
#   "total_requests": 150
# }

| Field | Description |
|---|---|
| compliance_rate | Fraction of requests fully SLA-compliant (0.0–1.0) |
| breaches.latency | Count of latency SLA breaches |
| breaches.cost | Count of budget SLA breaches |
| breaches.quality | Count of quality SLA breaches |
| adjustments_made | Count of routing adjustments made to restore SLA compliance |
| cost_savings_usd | USD saved through SLA-driven model downgrades |
| avg_latency_ms | Average request latency over the window |
| budget_utilization | Fraction of the hourly budget consumed (0.0–1.0) |
| total_requests | Total requests in the time window |

check_budget_alert() → Dict

Real-time budget status and spend projection.

alert = router.check_budget_alert()
# {
#   "status": "warning",            # "ok" | "warning" | "critical"
#   "hourly_spend_usd": 3.42,
#   "budget_usd": 5.00,
#   "utilization": 0.684,
#   "projected_hourly_usd": 4.89,
#   "recommendation": "Consider routing moderate tasks to gpt-4o-mini to reduce spend"
# }

| Status | Trigger |
|---|---|
| "ok" | Utilization below the warning threshold |
| "warning" | Approaching the budget limit |
| "critical" | At or over the budget limit |
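The status logic can be sketched as a simple utilization check. The warning threshold below (0.6) is an assumption chosen to match the example output above; the library's actual cutoffs are not documented:

```python
def budget_status(hourly_spend_usd: float, budget_usd: float,
                  warning_at: float = 0.6) -> str:
    # Assumed thresholds: "critical" at or over budget, "warning" above
    # the warning_at utilization fraction, otherwise "ok".
    utilization = hourly_spend_usd / budget_usd
    if utilization >= 1.0:
        return "critical"
    if utilization >= warning_at:
        return "warning"
    return "ok"

print(budget_status(3.42, 5.00))  # warning (utilization 0.684)
```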

record_sla_quality(model, score)

Record an observed quality score for a completed request. Used to track quality SLA compliance.

router.record_sla_quality("claude-sonnet-4-6", score=0.85)

get_cost_optimizations(estimate_tokens) → List[Dict]

Get actionable cost optimization suggestions based on current routing patterns.

suggestions = router.get_cost_optimizations(estimate_tokens=(100, 50))
# [
#   {
#     "suggestion": "Route 'moderate' prompts to gpt-4o-mini instead of claude-sonnet-4-6",
#     "estimated_savings_usd_per_day": 2.34,
#     "tradeoff": "Slightly lower quality for moderate tasks (est. -0.05 quality score)"
#   },
#   {
#     "suggestion": "Enable confidence-gated escalation to reduce expert-tier misrouting",
#     "estimated_savings_usd_per_day": 0.89,
#     "tradeoff": "Adds ~10ms classification overhead per request"
#   }
# ]

๐Ÿฅ Provider Health Tracking

Track real-time health of each provider/model. Route around degraded providers automatically.

Recording Events

# After a successful call
router.record_provider_event(
    "claude-sonnet-4-6",
    event="success",
    latency_ms=245.0,
)

# After an error
router.record_provider_event(
    "claude-sonnet-4-6",
    event="error",
    details="rate_limited",
)

# After a timeout
router.record_provider_event("gpt-4o", event="timeout")

Event types:

| Event | Description |
|---|---|
| "success" | Request completed successfully; latency_ms recorded |
| "error" | Request failed; details string (e.g. "rate_limited", "context_exceeded") |
| "timeout" | Request timed out |

get_provider_health(model) → Dict

health = router.get_provider_health("claude-sonnet-4-6")
# {
#   "model": "claude-sonnet-4-6",
#   "status": "healthy",       # "healthy" | "degraded" | "down"
#   "success_rate_1h": 0.97,
#   "avg_latency_ms": 231.4,
#   "recent_errors": ["rate_limited"],
#   "last_seen": 1741500000.0  # Unix timestamp
# }

| Status | Meaning |
|---|---|
| "healthy" | High success rate, normal latency |
| "degraded" | Elevated error rate or latency; still usable but not preferred |
| "down" | No recent successes; excluded from routing |
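A sketch of how such a status label could be derived from recent events. The thresholds here are assumptions for illustration; the library does not publish its exact rules:

```python
def provider_status(success_rate_1h: float, total_events: int) -> str:
    # Assumed rules: no successes at all -> "down"; success rate under
    # 90% -> "degraded"; otherwise "healthy".
    if total_events > 0 and success_rate_1h == 0.0:
        return "down"
    if success_rate_1h < 0.90:
        return "degraded"
    return "healthy"

print(provider_status(0.97, 120))  # healthy
print(provider_status(0.75, 40))   # degraded
```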

Health-Aware Routing

# Skip degraded/down providers entirely
decision = router.route("prompt", prefer_healthy=True)

When prefer_healthy=True:

  • Models with status "degraded" or "down" are skipped
  • Router falls through to next eligible model in cost order
  • If all eligible models are degraded, falls back to least-degraded option

Combining with auto_scale:

decision = router.route(
    "prompt",
    prefer_healthy=True,
    auto_scale=True,           # use fallback_chain when primary is unavailable
)

🧪 A/B Testing

Run controlled routing experiments to compare strategies (cost-optimized vs quality-first) with configurable traffic splits.

Creating an A/B Test

ab_test = router.create_ab_test(
    name="quality-vs-cost",
    strategy_a="cost_optimized",   # baseline strategy
    strategy_b="quality_first",    # bumps tier one level for B variant
    split=0.5,                     # 50/50 split; 0.3 = 30% to B
)

| Parameter | Type | Description |
|---|---|---|
| name | str | Human-readable test name |
| strategy_a | str | Baseline strategy: "cost_optimized" |
| strategy_b | str | Experimental strategy: "quality_first" bumps the tier by one level |
| split | float | Fraction of traffic routed to strategy B (0.0–1.0) |

Running the Test

decision = router.route("Summarize the quarterly earnings report", ab_test=ab_test)

print(decision.ab_variant)     # "a" or "b"
print(decision.model)          # varies by variant

if decision.ab_variant == "b":
    # B variant gets one tier higher → more capable model
    print("Quality-first routing applied")

Strategies

| Strategy | Behavior |
|---|---|
| "cost_optimized" | Standard routing: cheapest eligible model for the detected tier |
| "quality_first" | Bumps the detected tier up by one level (e.g. moderate → complex) for higher quality |
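Variant assignment and the quality_first tier bump can be sketched as follows (the random split and the capped bump are inferred from the descriptions above; this is not the library's code):

```python
import random

# Ordered tiers, lowest to highest complexity.
TIERS = ["trivial", "simple", "moderate", "complex", "expert"]

def assign_variant(split: float, rng: random.Random) -> str:
    # split is the fraction of traffic sent to strategy B.
    return "b" if rng.random() < split else "a"

def apply_strategy(tier: str, variant: str) -> str:
    # "quality_first" (variant b) bumps the tier one level, capped at expert.
    if variant == "b":
        return TIERS[min(TIERS.index(tier) + 1, len(TIERS) - 1)]
    return tier

print(apply_strategy("moderate", "b"))  # complex
print(apply_strategy("expert", "b"))    # expert (already at the cap)
```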

Collecting Results

Track ab_variant alongside actual quality scores to measure the tradeoff:

# In your application
decision = router.route(prompt, ab_test=ab_test)
response = call_llm(decision.model, prompt)
quality = evaluate(response)

router.record_sla_quality(decision.model, quality)

# Store for analysis
results.append({
    "variant": decision.ab_variant,
    "model": decision.model,
    "cost": decision.estimated_cost,
    "quality": quality,
})

💰 Cost Forecasting

Project future LLM costs based on current routing distribution and expected traffic.

forecast_cost(...) → Dict

forecast = router.forecast_cost(
    requests_per_hour=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
)

| Parameter | Type | Description |
|---|---|---|
| requests_per_hour | int | Expected request volume per hour |
| avg_input_tokens | int | Average input tokens per request |
| avg_output_tokens | int | Average output tokens per request |

Returns:

# {
#   "hourly_cost_usd": 1.24,
#   "daily_cost_usd": 29.76,
#   "monthly_cost_usd": 892.80,
#   "breakdown_by_model": {
#     "gpt-4o-mini": {
#       "requests_pct": 0.45,
#       "cost_per_request_usd": 0.000105,
#       "hourly_cost_usd": 0.047
#     },
#     "claude-sonnet-4-6": {
#       "requests_pct": 0.40,
#       "cost_per_request_usd": 0.002100,
#       "hourly_cost_usd": 0.840
#     },
#     "claude-opus-4-6": {
#       "requests_pct": 0.15,
#       "cost_per_request_usd": 0.013500,
#       "hourly_cost_usd": 0.203
#     }
#   },
#   "optimization_tip": "Routing 10% of simple tasks from claude-sonnet-4-6 to gpt-4o-mini would save ~$4.20/day"
# }

| Field | Description |
|---|---|
| hourly_cost_usd | Projected USD spend per hour |
| daily_cost_usd | Projected USD spend per day |
| monthly_cost_usd | Projected USD spend per month |
| breakdown_by_model | Per-model cost decomposition |
| optimization_tip | Actionable recommendation to reduce costs |

Use forecasting to:

  • Set SLAConfig.budget_per_hour_usd based on realistic projections
  • Identify which models dominate cost
  • Plan budget before scaling traffic
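The headline numbers follow from per-request cost times traffic volume; a sketch of the arithmetic (a 30-day month is assumed, and the single-model mix below reuses the gpt-4o-mini prices from the Quick Start example; the library's own projection may weigh things differently):

```python
def forecast(requests_per_hour: int, avg_in: int, avg_out: int,
             model_mix: dict) -> dict:
    # model_mix maps model name -> (traffic_fraction, cost_per_1k_in, cost_per_1k_out)
    hourly = 0.0
    for frac, cost_in, cost_out in model_mix.values():
        per_request = cost_in * avg_in / 1000 + cost_out * avg_out / 1000
        hourly += requests_per_hour * frac * per_request
    # 30-day month assumed for the monthly projection.
    return {"hourly": hourly, "daily": hourly * 24, "monthly": hourly * 24 * 30}

# All traffic on gpt-4o-mini at the Quick Start prices.
f = forecast(1000, 500, 200, {"gpt-4o-mini": (1.0, 0.00015, 0.0006)})
print(round(f["hourly"], 3), round(f["daily"], 2))  # 0.195 4.68
```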

📈 Cost Tracking & Analytics

log_usage(decision, input_tokens, output_tokens) → float

Log actual token usage for a completed request. Returns the actual cost in USD.

cost = router.log_usage(decision, input_tokens=500, output_tokens=200)
print(f"Request cost: ${cost:.6f}")

cost_report(period) → Dict

Aggregate cost report over a time period.

report = router.cost_report(period="week")    # "day" | "week" | "month"
# {
#   "period": "week",
#   "total_cost_usd": 42.18,
#   "by_model": {
#     "gpt-4o-mini": {"requests": 8420, "cost_usd": 3.14},
#     "claude-sonnet-4-6": {"requests": 3210, "cost_usd": 28.44},
#     "claude-opus-4-6": {"requests": 380, "cost_usd": 10.60}
#   },
#   "avg_cost_per_request_usd": 0.00351
# }

savings_estimate(comparison_model) → Dict

Calculate how much was saved by routing vs always using a reference model.

savings = router.savings_estimate(comparison_model="gpt-4o")
# {
#   "comparison_model": "gpt-4o",
#   "actual_cost_usd": 42.18,
#   "comparison_cost_usd": 187.40,
#   "savings_usd": 145.22,
#   "savings_pct": 0.775
# }

A savings_pct of 0.775 means the router saved 77.5% vs routing every request to gpt-4o.
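The percentage is simply savings over the comparison cost; checking the figures above:

```python
# Figures from the savings_estimate example above.
actual_cost = 42.18
comparison_cost = 187.40

savings_usd = comparison_cost - actual_cost   # amount saved by routing
savings_pct = savings_usd / comparison_cost   # fraction saved vs the reference model

print(round(savings_usd, 2), round(savings_pct, 3))  # 145.22 0.775
```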

routing_analytics() → Dict

Full aggregate analytics on routing decisions.

analytics = router.routing_analytics()
# {
#   "total_decisions": 12010,
#   "avg_confidence": 0.831,
#   "tier_distribution": {
#     "trivial": 1205, "simple": 3802, "moderate": 4510,
#     "complex": 2101, "expert": 392
#   },
#   "tier_percentages": {
#     "trivial": 10.0, "simple": 31.7, "moderate": 37.6,
#     "complex": 17.5, "expert": 3.3
#   },
#   "model_usage": {
#     "gpt-4o-mini": 5007,
#     "claude-sonnet-4-6": 5902,
#     "claude-opus-4-6": 1101
#   },
#   "provider_usage": {
#     "openai": 5007,
#     "anthropic": 7003
#   },
#   "most_used_model": "claude-sonnet-4-6",
#   "most_used_provider": "anthropic"
# }

🗂️ Model Registry

get_model_info(model_name) → ModelInfo

info = router.get_model_info("claude-sonnet-4-6")

info.name                                     # "claude-sonnet-4-6"
info.provider                                 # "anthropic"
info.cost_per_1k_input                        # 0.003
info.cost_per_1k_output                       # 0.015
info.capabilities                             # ["text", "code", "vision"]
info.max_tokens                               # 200000
info.supports_streaming                       # True
info.has_capability("vision")                 # → bool: True
info.calculate_cost(500, 200)                 # → float: cost for 500 input + 200 output tokens

ModelInfo Fields

| Field | Type | Description |
|---|---|---|
| name | str | Model identifier |
| provider | str | Provider: "anthropic", "openai", etc. |
| cost_per_1k_input | float | USD per 1,000 input tokens |
| cost_per_1k_output | float | USD per 1,000 output tokens |
| capabilities | List[str] | E.g. ["text", "code", "vision"] |
| max_tokens | int | Maximum context window in tokens |
| supports_streaming | bool | Whether the model supports streaming responses |
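calculate_cost is plain per-1K arithmetic; a sketch using the claude-sonnet-4-6 prices listed earlier (standalone function shown for illustration, since the method lives on ModelInfo):

```python
def calculate_cost(input_tokens: int, output_tokens: int,
                   cost_per_1k_input: float, cost_per_1k_output: float) -> float:
    # USD cost for one request at per-1K token pricing.
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

print(round(calculate_cost(500, 200, 0.003, 0.015), 6))  # 0.0045
```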

list_models_for_tier(tier) → List[Dict]

List all models eligible for a given tier, ordered by cost.

models = router.list_models_for_tier("moderate")
# [
#   {"name": "gpt-4o-mini", "provider": "openai", "cost": 0.000105, "capabilities": [...], "max_tokens": 128000},
#   {"name": "claude-sonnet-4-6", "provider": "anthropic", "cost": 0.00210, "capabilities": [...], "max_tokens": 200000},
# ]

save_state(path)

Persist router state (cost history, health data, analytics) to disk.

router.save_state("./router_state")

🧠 SemanticClassifier (v2.0)

SemanticClassifier replaces the built-in keyword classifier with TF-IDF semantic classification. It can be injected into the v1.0 Router for semantic accuracy without migrating to AdaptiveRouter.

Usage

from antaris_router import SemanticClassifier, Router

sem = SemanticClassifier(workspace="./routing_data")
router = Router(classifier=sem)

decision = router.route("Design a microservices platform with event-driven architecture")

The SemanticClassifier persists its TF-IDF model to workspace/ and improves with each classified prompt. It is the same classifier used internally by AdaptiveRouter.

Constructor

sem = SemanticClassifier(workspace="./routing_data")

| Parameter | Type | Description |
|---|---|---|
| workspace | str | Directory to persist the TF-IDF model and vocabulary |

How it Works

  1. Tokenization: the prompt is tokenized and stopwords are removed
  2. TF-IDF vectorization: term frequency × inverse document frequency weights are computed
  3. Tier classification: the vector is compared to learned per-tier centroids
  4. Confidence scoring: distance to the centroids determines the confidence score
  5. Feedback loop: report_outcome() adjusts centroid weights over time
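Those steps can be sketched end to end with a toy centroid classifier (pure stdlib and purely illustrative: IDF weighting, the real tokenizer, and the feedback updates are omitted, and the centroid vocabularies below are invented):

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    # Step 1: crude tokenization with a tiny stopword list.
    stop = {"a", "the", "of", "in", "with", "and", "to"}
    return Counter(t for t in text.lower().split() if t not in stop)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 3-4: compare against per-tier centroids; the nearest wins, and the
# similarity doubles as a confidence score.
centroids = {
    "simple": tf_vector("what is define explain briefly"),
    "complex": tf_vector("design architecture distributed system microservices"),
}

def classify(prompt: str):
    vec = tf_vector(prompt)
    return max(((t, cosine(vec, c)) for t, c in centroids.items()),
               key=lambda x: x[1])

tier, conf = classify("design a distributed microservices system")
print(tier)  # complex
```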

Injecting into v1.0 Router

sem = SemanticClassifier(workspace="./routing_data")

router = Router(
    classifier=sem,
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
)

decision = router.route("Implement OAuth2 with PKCE in a distributed system")

This gives you semantic classification accuracy with all v1.0 features (SLA, health tracking, A/B testing, cost tracking).


๐Ÿ“Š QualityTracker (v2.0)

QualityTracker stores per-prompt outcome data and model performance history. Used internally by AdaptiveRouter and available as a standalone component.

Usage

from antaris_router import QualityTracker

tracker = QualityTracker("./routing_data")

# Record an outcome
tracker.record_outcome(
    prompt_hash,          # str: from RoutingResult.prompt_hash
    quality_score=0.9,    # float: 0.0โ€“1.0
    success=True,         # bool: did the request succeed
    model="claude-sonnet-4-6",
)

# Query model performance history
history = tracker.get_model_performance("claude-sonnet-4-6")
# {
#   "model": "claude-sonnet-4-6",
#   "avg_quality": 0.87,
#   "success_rate": 0.96,
#   "total_outcomes": 3820,
#   "quality_by_tier": {"simple": 0.91, "moderate": 0.88, "complex": 0.84}
# }

record_outcome(prompt_hash, quality_score, success, model)

| Parameter | Type | Description |
|---|---|---|
| prompt_hash | str | SHA-256 hash from RoutingResult.prompt_hash |
| quality_score | float | Quality 0.0–1.0. Source: human rating, LLM eval, or downstream metric |
| success | bool | Whether the request succeeded (True) or errored (False) |
| model | str | Model name that handled the request |

get_model_performance(model) โ†’ Dict

Aggregate quality history for a model across all tracked outcomes.
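For intuition, this kind of aggregation can be reproduced from raw outcome records. The record shape below (a list of dicts with model, quality_score, success, and tier keys) is an assumption for illustration, not the tracker's storage format; only the output shape mirrors the example above.

```python
from collections import defaultdict

def aggregate_performance(outcomes, model):
    """Roll up raw outcome records into per-model performance stats.

    outcomes: list of dicts with keys model, quality_score, success, tier
    (illustrative field names, not the library's internal schema).
    """
    rows = [o for o in outcomes if o["model"] == model]
    if not rows:
        return None
    by_tier = defaultdict(list)
    for o in rows:
        by_tier[o["tier"]].append(o["quality_score"])
    return {
        "model": model,
        "avg_quality": sum(o["quality_score"] for o in rows) / len(rows),
        "success_rate": sum(o["success"] for o in rows) / len(rows),
        "total_outcomes": len(rows),
        "quality_by_tier": {t: sum(q) / len(q) for t, q in by_tier.items()},
    }

outcomes = [
    {"model": "m1", "quality_score": 0.9, "success": True, "tier": "simple"},
    {"model": "m1", "quality_score": 0.7, "success": False, "tier": "complex"},
]
perf = aggregate_performance(outcomes, "m1")
```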


๐Ÿ”ฌ ClassificationResult & Signals

ClassificationResult is the raw output of the classifier, accessible via decision.classification.

decision = router.route("Build a Redis-backed distributed rate limiter in Go")

clf = decision.classification

clf.tier         # "complex"
clf.confidence   # 0.83
clf.signals      # dict (see below)

ClassificationResult.signals

clf.signals = {
    "length": 62,                           # raw character count
    "keyword_matches": {
        "trivial": 0,
        "simple": 1,
        "complex": 2,                       # matched "distributed", "rate limiter"
    },
    "has_code": False,                      # whether prompt contains code
    "code_indicators": 0,                   # count of code-related patterns
    "structural_complexity": 2,             # heuristic complexity score
}
| Signal | Type | Description |
|---|---|---|
| length | int | Raw character count of the prompt |
| keyword_matches | Dict[str, int] | Per-tier keyword match counts |
| has_code | bool | Whether prompt contains code blocks or inline code |
| code_indicators | int | Count of code-related patterns (functions, syntax, etc.) |
| structural_complexity | int | Heuristic score: nesting, multi-part requests, etc. |
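A minimal sketch of how signals like these could be computed. The keyword sets, code patterns, and structural heuristic here are stand-ins chosen for the example; the library's actual lists and scoring are internal.

```python
import re

# Hypothetical per-tier keyword sets (the real lists are internal to the library)
TIER_KEYWORDS = {
    "trivial": {"hi", "hello", "thanks"},
    "simple": {"summarize", "translate", "list"},
    "complex": {"distributed", "architecture", "concurrent", "rate limiter"},
}
CODE_PATTERNS = [r"```", r"\bdef\b", r"\bclass\b", r"[{};]", r"=>"]

def extract_signals(prompt):
    """Compute routing signals in the spirit of ClassificationResult.signals."""
    lower = prompt.lower()
    keyword_matches = {
        tier: sum(1 for kw in kws if kw in lower)
        for tier, kws in TIER_KEYWORDS.items()
    }
    code_indicators = sum(1 for p in CODE_PATTERNS if re.search(p, prompt))
    return {
        "length": len(prompt),
        "keyword_matches": keyword_matches,
        "has_code": code_indicators > 0,
        "code_indicators": code_indicators,
        # crude structural score: sentence breaks plus explicit bullet markers
        "structural_complexity": prompt.count(".") + prompt.count("\n- "),
    }

sig = extract_signals("Build a Redis-backed distributed rate limiter in Go")
```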

๐Ÿ“š Full API Reference

Router Methods

| Method | Signature | Description |
|---|---|---|
| route | (prompt, context, prefer, min_tier, capability, estimate_tokens, ab_test, prefer_healthy, auto_scale) → RoutingDecision | Route a prompt to the optimal model |
| explain | (decision: RoutingDecision) → str | Generate plain-English explanation of a routing decision |
| log_usage | (decision, input_tokens, output_tokens) → float | Log actual usage, returns cost in USD |
| cost_report | (period: str) → Dict | Aggregate cost report. Period: "day"/"week"/"month" |
| savings_estimate | (comparison_model: str) → Dict | Cost savings vs always using comparison model |
| routing_analytics | () → Dict | Full routing analytics (tiers, models, confidence) |
| get_model_info | (model_name: str) → ModelInfo | Model metadata from registry |
| list_models_for_tier | (tier: str) → List[Dict] | All eligible models for a tier |
| save_state | (path: str) | Persist router state to disk |
| record_provider_event | (model, event, latency_ms, details) → None | Record provider health event |
| get_provider_health | (model: str) → Dict | Current health status for a model |
| create_ab_test | (name, strategy_a, strategy_b, split) → ABTest | Create A/B test configuration |
| forecast_cost | (requests_per_hour, avg_input_tokens, avg_output_tokens) → Dict | Project future costs |
| get_sla_report | (since_hours: float) → Dict | SLA compliance report |
| check_budget_alert | () → Dict | Real-time budget status |
| record_sla_quality | (model: str, score: float) → None | Record quality score for SLA tracking |
| get_cost_optimizations | (estimate_tokens: Tuple[int, int]) → List[Dict] | Cost optimization suggestions |

AdaptiveRouter Methods

| Method | Signature | Description |
|---|---|---|
| register_model | (config: ModelConfig) → None | Register a model with tier range and costs |
| route | (prompt: str) → RoutingResult | Classify and route a prompt |
| report_outcome | (prompt_hash, quality_score, success) → None | Feed outcome back for self-improvement |
| get_analytics | () → Dict | Session-level routing analytics |

๐Ÿ“ฆ Complete Exports

from antaris_router import (
    # โ”€โ”€ v2.0 API โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    AdaptiveRouter,       # Semantic, self-improving router
    RoutingResult,        # Result object from AdaptiveRouter.route()
    ModelConfig,          # Model registration config for AdaptiveRouter
    SemanticClassifier,   # TF-IDF classifier (injectable into v1.0 Router)
    SemanticResult,       # Result object from SemanticClassifier
    TFIDFVectorizer,      # Low-level TF-IDF vectorizer
    QualityTracker,       # Outcome feedback tracker
    QualityDecision,      # Quality decision record

    # โ”€โ”€ v1.0 API โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    Router,               # Keyword-based production router
    RoutingDecision,      # Decision object from Router.route()
    TaskClassifier,       # Built-in keyword classifier
    ClassificationResult, # Classification output with signals
    ModelRegistry,        # Internal model registry
    ModelInfo,            # Model metadata object
    CostTracker,          # Cost tracking component
    UsageRecord,          # Per-request usage record
    Config,               # Router configuration

    # โ”€โ”€ Sprint 5 โ€” SLA โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    SLAConfig,            # SLA constraint configuration
    SLAMonitor,           # SLA enforcement monitor
    SLARecord,            # Per-request SLA record
)

๐Ÿ”„ Migration: v1.0 โ†’ v2.0

| Feature | v1.0 Router | v2.0 AdaptiveRouter |
|---|---|---|
| Classification | Keyword matching | TF-IDF semantic |
| Self-improvement | ❌ | ✅ via report_outcome() |
| Persistence | save_state() | Automatic to workspace |
| SLA enforcement | ✅ | ❌ (use Router + SemanticClassifier) |
| Provider health | ✅ | ❌ (use Router + SemanticClassifier) |
| A/B testing | ✅ | Built-in ab_test_rate |
| Cost tracking | ✅ | Basic (via analytics) |
| Explainability | ✅ explain() | Via result.confidence + analytics |
| RoutingDecision | Full object | Lightweight RoutingResult |

Recommended migration path:

# Option A: Full v2.0 โ€” new project, accuracy-first
router = AdaptiveRouter("./routing_data")

# Option B: Best of both โ€” semantic accuracy + full v1.0 features
sem = SemanticClassifier(workspace="./routing_data")
sla = SLAConfig(auto_escalate_on_breach=True)  # any SLAConfig; see SLA Configuration & Enforcement
router = Router(
    classifier=sem,            # semantic classification
    sla=sla,                   # + SLA enforcement
    enable_cost_tracking=True  # + cost tracking
)

Option B lets you adopt semantic classification incrementally without losing any v1.0 production features.


๐Ÿงฉ Advanced Patterns

Full Production Setup

from antaris_router import Router, SLAConfig, SemanticClassifier

sem = SemanticClassifier(workspace="./routing_data")

sla = SLAConfig(
    max_latency_ms=300,
    budget_per_hour_usd=10.00,
    min_quality_score=0.75,
    auto_escalate_on_breach=True,
)

router = Router(
    classifier=sem,
    sla=sla,
    fallback_chain=["claude-sonnet-4-6", "claude-haiku-3-5", "gpt-4o-mini"],
    low_confidence_threshold=0.6,
    escalation_model="claude-opus-4-6",
    escalation_strategy="always",
    enable_cost_tracking=True,
)

def route_and_call(prompt: str) -> str:
    decision = router.route(
        prompt,
        estimate_tokens=(len(prompt) // 4, 200),
        prefer_healthy=True,
        auto_scale=True,
    )

    # Log decision for audit
    print(router.explain(decision))

    # Call your LLM here
    response = call_llm(decision.model, prompt)

    # Record actual usage
    router.log_usage(decision, input_tokens=len(prompt)//4, output_tokens=len(response)//4)

    # Record provider health
    router.record_provider_event(decision.model, event="success", latency_ms=242.0)

    return response

Periodic Reporting

import time

# Every hour
while True:
    time.sleep(3600)

    report = router.get_sla_report(since_hours=1.0)
    alert = router.check_budget_alert()
    analytics = router.routing_analytics()

    print(f"SLA compliance: {report['compliance_rate']:.1%}")
    print(f"Budget: {alert['status']} ({alert['utilization']:.1%} used)")
    print(f"Most used model: {analytics['most_used_model']}")

    if alert["status"] == "critical":
        # Trigger alerts, adjust SLA config, etc.
        pass

    router.save_state("./router_state")

๐Ÿ—๏ธ Architecture

antaris-router
โ”œโ”€โ”€ Router (v1.0)                 โ† Production keyword-based router
โ”‚   โ”œโ”€โ”€ TaskClassifier            โ† Built-in keyword classification
โ”‚   โ”‚   โ””โ”€โ”€ ClassificationResult  โ† With signals: length, keywords, code
โ”‚   โ”œโ”€โ”€ ModelRegistry             โ† Model metadata + capability index
โ”‚   โ”œโ”€โ”€ CostTracker               โ† Per-session/period cost tracking
โ”‚   โ”œโ”€โ”€ SLAMonitor                โ† Constraint enforcement + reporting
โ”‚   โ””โ”€โ”€ RoutingDecision           โ† Full decision object
โ”‚
โ”œโ”€โ”€ AdaptiveRouter (v2.0)         โ† Self-improving semantic router
โ”‚   โ”œโ”€โ”€ SemanticClassifier        โ† TF-IDF vectorizer + tier centroids
โ”‚   โ”œโ”€โ”€ TFIDFVectorizer           โ† Low-level TF-IDF implementation
โ”‚   โ”œโ”€โ”€ QualityTracker            โ† Outcome feedback + model performance
โ”‚   โ””โ”€โ”€ RoutingResult             โ† Lightweight result object
โ”‚
โ””โ”€โ”€ Shared
    โ”œโ”€โ”€ SLAConfig                 โ† SLA constraint definition
    โ”œโ”€โ”€ SLARecord                 โ† Per-request SLA record
    โ””โ”€โ”€ ModelInfo                 โ† Model metadata (costs, caps, streaming)

๐Ÿ“„ License

Part of the antaris-suite โ€” adaptive AI infrastructure for LLM cost optimization.

ยฉ Antaris Analytics LLC
