Efficient Agent Router — routes tasks to the best LLM under quality, latency, cost, and safety constraints
Efficient Agent Router (EAR)
Efficient Agent Router (EAR) is a Python-first orchestration service that selects and executes the best LLM for a request based on quality, cost, latency, context window, and safety constraints.
Goals
- Route each request to the most suitable model for the task.
- Reduce token burn through cost-aware model ranking.
- Protect sensitive input with prompt-injection and PII safeguards.
- Provide a clean CLI first, then expose the same logic through MCP.
Current Delivery Status (v0.10.16)
| Epic | Description | Status |
|---|---|---|
| E1 | Foundation and Project Setup | ✅ Complete |
| E2 | Model Registry and Metadata Management | ✅ Complete |
| E3 | Predictive Routing Engine | ✅ Complete |
| E4 | CLI Experience and Operator Workflow | ✅ Complete |
| E5 | Reliability and Cascade Fallback | ✅ Complete |
| E6 | Safety and Guardrails | ✅ Complete |
| E7 | Observability and Cost/Latency Metrics | ✅ Complete |
| E8 | MCP Server and Tool Exposure | ✅ Complete |
| E9 | CI/CD and Security Automation | ✅ Complete |
| E10 | Execution Plane and Adaptive Routing Intelligence | ✅ Complete |
| E11 | Leadership Demo Frontend and GTM Showcase | ✅ Complete |
| E17 | Ollama Private Provider Integration | ✅ Complete |
| E18 | Live React Web Console | ✅ Complete |
| E19 | CLI Aliases and UX Polish | ✅ Complete |
| E12–E16 | Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) | ⏳ Pending |
Current Delivery Strategy
- Build and validate core routing engine through CLI. ✅
- Harden reliability, guardrails, and observability. ✅
- Expose stable capabilities through MCP server. ✅
- Add real execution runtime and adaptive intent/injection intelligence. ✅
- Ship interactive leadership demo with value storytelling. ✅
- Add Ollama private provider for on-premise safety routing. ✅
- Ship live React web console for developer-facing routing visualization. ✅
- Post-launch: verify PyPI release, run live canary, publish benchmarks, backfill ADRs.
Tech Stack
- Python 3.12+
- asyncio
- Typer CLI
- Pydantic v2
- httpx for OpenRouter model metadata
- pytest, pytest-asyncio, pytest-cov
- bandit and pip-audit for security controls
Repository Layout
src/
ear/
__init__.py # Package root, version
config.py # Pydantic-settings configuration (EARConfig)
models.py # Domain models: ModelSpec, RoutingRequest, RoutingDecision
registry.py # OpenRouterRegistry, OllamaRegistry, RegistryFactory
router_engine.py # IntentClassifier, SuitabilityScorer, RouterEngine
guardrails.py # Prompt-injection detector, PII policy, semantic risk scorer
fallback.py # FailureClassifier, FallbackPipeline
metrics.py # MetricsCollector, SessionSummary
executor.py # LLMExecutor, OllamaExecutor, CompositeExecutor
orchestrator.py # Unified execution orchestration pipeline
intent.py # Advanced intent classifier (embedding + heuristic fallback)
evaluation.py # Evaluation harness and benchmark suite
cli.py # Typer CLI: route, inspect-models, stats (+ aliases)
mcp_server.py # MCP stdio transport and tool/resource handlers
demo_backend.py # Demo routing replay scenarios and value storytelling
demo_server.py # uvicorn-backed local demo HTTP server
tests/
conftest.py
test_config.py
test_models.py
test_registry.py
test_router_engine.py
test_guardrails.py
test_fallback.py
test_metrics.py
test_executor.py
test_orchestrator.py
test_intent.py
test_evaluation.py
test_cli.py
test_mcp_server.py
test_demo_backend.py
test_demo_server.py
webapp/
package.json # React + Vite dependencies
vite.config.js
src/ # React routing console components
docs/
system_prompt.md
execution_plan.md
wbs.md
release-playbook.md
llm_explorer.html # Standalone browser-based LLM explorer and demo UI
usage-guide.md
project-history.md # Full commit history and delivery log
adr/
releases/
Core Workflow
- Accept user task input and options (task hint, budget priority, context profile).
- Run safety prechecks (injection and PII policy).
- Load model metadata from OpenRouter registry cache.
- Compute suitability score and candidate ranking.
- Return the model recommendation, rationale, and fallback chain, and optionally execute the routed call (ear route --execute, delivered in E10).
- Emit session metrics snapshot for observability.
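The five workflow steps can be pictured as one async pipeline. This is a minimal sketch, not EAR's orchestrator. The Candidate and RouteResult types, the route and load_candidates functions, and the phrase-list injection check are hypothetical stand-ins for the real classes in guardrails.py, registry.py, and router_engine.py.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical stand-in for a registry entry (EAR's real type is ModelSpec).
    model_id: str
    score: float


@dataclass
class RouteResult:
    # Hypothetical result shape: recommendation, rationale, fallbacks, metrics.
    model_id: str
    rationale: str
    fallback_chain: list[str]
    metrics: dict[str, float] = field(default_factory=dict)


# Toy phrase list; EAR's guardrails combine a detector with semantic risk scoring.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")


def precheck(prompt: str) -> None:
    """Step 1: reject prompts that trip the (toy) injection check."""
    lowered = prompt.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        raise ValueError("prompt failed safety precheck")


async def load_candidates() -> list[Candidate]:
    """Step 2: placeholder for the cached registry lookup plus scoring."""
    await asyncio.sleep(0)  # stands in for real async I/O
    return [Candidate("model-a", 0.62), Candidate("model-b", 0.91), Candidate("model-c", 0.40)]


async def route(prompt: str) -> RouteResult:
    precheck(prompt)
    candidates = await load_candidates()
    # Step 3: rank by suitability score, best first.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best, *rest = ranked
    # Steps 4-5: recommendation, fallback chain, and a small metrics snapshot.
    return RouteResult(
        model_id=best.model_id,
        rationale=f"highest suitability score ({best.score:.2f})",
        fallback_chain=[c.model_id for c in rest],
        metrics={"candidates_considered": float(len(ranked))},
    )


if __name__ == "__main__":
    print(asyncio.run(route("explain quicksort")))
```

The remaining ranked candidates double as the fallback chain, so a failed call can cascade to the next-best model without re-scoring.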
Routing Model
The router evaluates candidate models using a weighted suitability function:
S = Quality / (Cost * Latency)
Score inputs are normalized, and candidates are constrained by policy:
- Context window threshold
- Budget priority
- Safety allowlist and PII policy
- Task-specific boosts (coding, planning, research)
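A sketch of how S = Quality / (Cost * Latency) could be computed with a context-window filter and a task boost. The normalization (divide by the pool maximum), the EPSILON floor, the 1.25 boost factor, and the ModelStats fields are all assumptions for illustration; the README does not specify EAR's actual weights or scaling.

```python
from dataclasses import dataclass

# Hypothetical floor so zero-priced models (e.g. local Ollama) do not divide by zero.
EPSILON = 0.05


@dataclass(frozen=True)
class ModelStats:
    # Hypothetical fields; EAR's ModelSpec may name or scale these differently.
    name: str
    quality: float        # 0..1 quality estimate
    cost_per_mtok: float  # USD per million prompt tokens
    latency_ms: float
    context_window: int
    tags: frozenset[str] = frozenset()


def _scale(value: float, peak: float) -> float:
    """Normalize against the largest value in the pool, floored at EPSILON."""
    return max(EPSILON, value / peak) if peak > 0 else 1.0


def rank(models: list[ModelStats], min_context: int, task: str,
         task_boost: float = 1.25) -> list[tuple[str, float]]:
    # Policy constraint: drop models whose context window is too small.
    eligible = [m for m in models if m.context_window >= min_context]
    if not eligible:
        return []
    peak_cost = max(m.cost_per_mtok for m in eligible)
    peak_latency = max(m.latency_ms for m in eligible)
    scored = []
    for m in eligible:
        cost = _scale(m.cost_per_mtok, peak_cost)
        latency = _scale(m.latency_ms, peak_latency)
        score = m.quality / (cost * latency)  # S = Quality / (Cost * Latency)
        if task in m.tags:
            score *= task_boost  # task-specific boost
        scored.append((m.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Dividing by the pool maximum keeps cost and latency ratios intact, whereas min-max scaling would push the cheapest model to zero before the floor. In a pool where a small model is 20x cheaper and 4x faster than a large one, the small model still outranks the large one after the coding boost.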
CLI Commands
Full command names and short aliases are both supported:
# Route a prompt (full and alias)
ear route "explain quicksort" --task coding --budget medium
ear r "explain quicksort" --task coding --budget medium
# JSON output for scripting
ear route "explain quicksort" --json
# Execute the routed model call
ear route "explain quicksort" --execute
# Inspect cached models
ear inspect-models
ear im
# Session metrics
ear stats
ear s
# Bare invocation: routes with sensible defaults
ear "explain quicksort"
MCP Design
- Tool: route_and_execute
- Resources: model performance metrics, cost per session
- Transport: stdio
Ollama Private Provider
EAR routes PII-containing and injection-risk prompts to a local Ollama instance when one is available, keeping sensitive data away from unvetted cloud providers.
Configuration:
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_ENABLED=true
Behavior:
- ollama/<model> models appear in the registry with trusted=True and zero pricing.
- Guardrail-blocked prompts route to Ollama when available instead of hard-blocking.
- PII prompts are restricted to Ollama and vetted cloud providers only.
- If Ollama is unavailable and a prompt is blocked, GuardrailsBlockedError is raised (fail-closed).
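The rules above amount to a small decision function. In this sketch, Verdict and allowed_models are hypothetical, GuardrailsBlockedError is a local stand-in for EAR's exception of the same name, and "vetted provider" is assumed to mean the prefix before the first "/" in a model id.

```python
from dataclasses import dataclass


class GuardrailsBlockedError(RuntimeError):
    """Local stand-in for EAR's exception of the same name."""


@dataclass(frozen=True)
class Verdict:
    # Hypothetical guardrail output.
    blocked: bool  # injection risk or other policy block
    has_pii: bool


def allowed_models(verdict: Verdict, cloud_models: list[str], vetted_providers: set[str],
                   ollama_models: list[str], ollama_available: bool) -> list[str]:
    """Return the models a prompt may be routed to, local models first."""
    local = list(ollama_models) if ollama_available else []
    if verdict.blocked:
        if not local:
            # Fail closed: a blocked prompt never falls back to a cloud provider.
            raise GuardrailsBlockedError("prompt blocked and no Ollama model is available")
        return local
    if verdict.has_pii:
        # PII stays on Ollama or on explicitly vetted cloud providers.
        vetted = [m for m in cloud_models if m.split("/", 1)[0] in vetted_providers]
        return local + vetted
    return local + list(cloud_models)
```

Raising instead of returning an empty list makes the fail-closed path impossible to ignore silently in calling code.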
Interactive LLM Explorer and Demo UI
- File: docs/llm_explorer.html
- Purpose: interactive OpenRouter model table, routing demo, and value storytelling for leadership and investor demos.
What it includes:
- Live model fetch from OpenRouter (/api/v1/models) with auto-refresh and last-updated indicator.
- Search, provider pills, min-context, max-cost, and priced/unpriced filters.
- Excel-style sortable table with per-column filters.
- Side-by-side comparison cards for selected models (up to 4).
- Value Story section with 10 routing scenarios: cost savings, latency gains, and safety enforcement.
- Routing-mode toggle (Standard / Ollama Private): shows attack scenarios routing to ollama/llama3 for on-premise data-residency demonstration.
- Processing progress log: step-by-step routing decisions with timestamps.
How to run:
- Open docs/llm_explorer.html directly in a browser, or
- Start the local demo server: python -m ear.demo_server (default port 7861)
Live React Web Console
- Directory: webapp/
- Purpose: developer-facing real-time routing visualization built with React and Vite.
How to run:
# Windows
run_live_webapp.bat
# Linux / macOS
bash run_live_webapp.sh
The launcher waits for the Vite dev server to be ready before opening the browser.
Demo Walkthrough
# Windows
run_demo_walkthrough.bat
# Linux / macOS
bash run_demo_walkthrough.sh
Runs all 10 demo routing scenarios end-to-end and opens the value storytelling view.
Configuration
Environment variables (minimum required):
OPENROUTER_API_KEY=<your key>
EAR_REGISTRY_TTL_SECONDS=300
EAR_DEFAULT_BUDGET=medium
EAR_MAX_RETRIES=3
EAR_OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
EAR_REQUEST_TIMEOUT_SECONDS=30
Optional Ollama private provider:
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_ENABLED=true
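EAR's config.py uses pydantic-settings; the stdlib sketch below only illustrates how the variables above map to typed settings with the listed defaults. The Settings and load_settings names, the "low"/"high" budget values, and OLLAMA_ENABLED defaulting to false are assumptions.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field

# Assumed budget levels; only "medium" appears in the documented defaults.
BUDGETS = {"low", "medium", "high"}
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # repr=False keeps the key out of logs and tracebacks that print the object.
    openrouter_api_key: str = field(repr=False)
    registry_ttl_seconds: int = 300
    default_budget: str = "medium"
    max_retries: int = 3
    openrouter_base_url: str = "https://openrouter.ai/api/v1"
    request_timeout_seconds: float = 30.0
    ollama_enabled: bool = False
    ollama_base_url: str = "http://localhost:11434"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    key = env.get("OPENROUTER_API_KEY", "")
    if not key:
        raise ValueError("OPENROUTER_API_KEY is required")
    budget = env.get("EAR_DEFAULT_BUDGET", "medium")
    if budget not in BUDGETS:
        raise ValueError(f"EAR_DEFAULT_BUDGET must be one of {sorted(BUDGETS)}")
    return Settings(
        openrouter_api_key=key,
        registry_ttl_seconds=int(env.get("EAR_REGISTRY_TTL_SECONDS", "300")),
        default_budget=budget,
        max_retries=int(env.get("EAR_MAX_RETRIES", "3")),
        openrouter_base_url=env.get("EAR_OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
        request_timeout_seconds=float(env.get("EAR_REQUEST_TIMEOUT_SECONDS", "30")),
        ollama_enabled=env.get("OLLAMA_ENABLED", "false").strip().lower() in _TRUTHY,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    )
```

Accepting any Mapping makes the loader easy to test without touching the real process environment.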
Recommended local setup:
- Create and activate a virtual environment: python -m venv .venv && .venv\Scripts\activate (Windows; on Linux / macOS run source .venv/bin/activate instead)
- Install: pip install -e ".[dev]"
- Copy .env.example to .env and populate values.
- Run tests: run_tests.bat (Windows) or bash run_tests.sh (Linux / macOS)
- Run security audits: run_security_audits.bat (Windows) or bash run_security_audits.sh (Linux / macOS)
Quality and Security Requirements
- 100% statement and branch coverage for routing core.
- Deterministic tests with mocked external dependencies.
- Security linting with bandit.
- Dependency auditing with pip-audit.
- No plaintext secret logging.
Security Report HTML Generation
- Security workflows generate JSON first, then render HTML using sec-report-kit.
- pip-audit workflow outputs: security_reports/pip_audit_latest.html.
- Trivy workflow outputs: security_reports/trivy_latest.html.
- Both HTML files are uploaded in the workflow artifacts alongside JSON and SARIF outputs.
- Local scripts also generate HTML from JSON:
  - run_pip_audit.bat / run_pip_audit.sh
  - run_trivy.bat / run_trivy.sh
  - One-command wrapper: run_security_audits.bat / run_security_audits.sh
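The JSON-then-HTML step is handled by sec-report-kit, whose API is not documented here. As a stand-in, this stdlib sketch flattens pip-audit's JSON output (pip-audit -f json) into rows and renders an escaped HTML table. It expects the {"dependencies": [{"name", "version", "vulns": [...]}]} shape and also tolerates a bare list in case the output shape differs by version; the function names are hypothetical.

```python
import html
import json


def load_findings(raw: str) -> list[tuple[str, str, str, str]]:
    """Flatten pip-audit JSON into (package, version, advisory id, fixed-in) rows."""
    data = json.loads(raw)
    deps = data if isinstance(data, list) else data.get("dependencies", [])
    rows = []
    for dep in deps:
        for vuln in dep.get("vulns", []):
            fixed = ", ".join(vuln.get("fix_versions", [])) or "none"
            rows.append((dep.get("name", "?"), dep.get("version", "?"), vuln.get("id", "?"), fixed))
    return rows


def render_html(rows: list[tuple[str, str, str, str]]) -> str:
    """Render rows as a minimal HTML table, escaping every cell."""
    if rows:
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
            for row in rows
        )
    else:
        body = '<tr><td colspan="4">No known vulnerabilities</td></tr>'
    header = "<tr><th>Package</th><th>Version</th><th>Advisory</th><th>Fixed in</th></tr>"
    return f"<html><body><h1>pip-audit report</h1><table>{header}{body}</table></body></html>"
```

Escaping every cell matters because package names and advisory text come from external data.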
MCP Server: sec-report-kit
Install sec-report-kit locally:
pip install sec-report-kit
Configured MCP server command:
srk mcp serve --transport stdio
Workspace configuration is stored in .vscode/mcp.json.
Milestones
- M1: Registry and schema baseline ✅
- M2: Router core and CLI ✅
- M3: Guardrails and metrics ✅
- M4: MCP server and CI/CD gates ✅
- M5: Execution runtime and adaptive routing intelligence ✅
- M6: Leadership/investor demo frontend ✅
- M8: Ollama private provider integration ✅
- M9: React console and CLI UX hardening ✅
- M7: Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) ⏳ Pending
Tests
- 291 tests across 16 test modules
- Enforced 100% statement and branch coverage for all routing, guardrail, and execution logic
- All tests run with mocked external dependencies
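A deterministic, mocked test in this style might look like the sketch below. The Registry class, its get_json client method, and the injectable clock are hypothetical; the point is that a fake HTTP client and a scripted clock let the TTL cache (compare EAR_REGISTRY_TTL_SECONDS) be verified without network access or real time passing.

```python
import time
from unittest import mock


class Registry:
    # Hypothetical registry that caches model ids fetched through an injected client.
    def __init__(self, client, ttl_seconds: float = 300, clock=None):
        self._client = client
        self._ttl = ttl_seconds
        self._clock = clock if clock is not None else time.monotonic
        self._cached: list[str] | None = None
        self._fetched_at = 0.0

    def models(self) -> list[str]:
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            payload = self._client.get_json("/models")
            self._cached = [m["id"] for m in payload["data"]]
            self._fetched_at = now
        return self._cached


def test_registry_uses_cache_within_ttl():
    client = mock.Mock()
    client.get_json.return_value = {"data": [{"id": "a/model"}]}
    clock = mock.Mock(side_effect=[0.0, 10.0, 400.0])  # scripted timestamps
    registry = Registry(client, ttl_seconds=300, clock=clock)
    assert registry.models() == ["a/model"]
    assert registry.models() == ["a/model"]
    assert client.get_json.call_count == 1  # second call served from cache
    registry.models()
    assert client.get_json.call_count == 2  # TTL expired, so it refetched
```

pytest collects the test_ function directly; no fixtures or network are required.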
run_tests.bat # Windows
bash run_tests.sh # Linux / macOS
Reports are written to coverage_reports/ (HTML, XML, JSON).
- Use the ear r / ear im / ear s aliases in examples for brevity.
File details
Details for the file efficient_agent_router_ear-0.10.16.tar.gz.
File metadata
- Download URL: efficient_agent_router_ear-0.10.16.tar.gz
- Upload date:
- Size: 69.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ce62459962b5a8b13fbb5d3b645ee61a86680a6a2134374d2183c398e5bcee09
|
|
| MD5 |
27ebccd477eed8bb29a9851fd47c73c9
|
|
| BLAKE2b-256 |
e1ccef95ec4ec3dc49c2a8387a4f734f9b2163fd65fa3224538f2ed73a11e069
|
Provenance
The following attestation bundles were made for efficient_agent_router_ear-0.10.16.tar.gz:
- Publisher: publish-pypi.yml on ShanKonduru/efficient-agent-router-ear
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: efficient_agent_router_ear-0.10.16.tar.gz
- Subject digest: ce62459962b5a8b13fbb5d3b645ee61a86680a6a2134374d2183c398e5bcee09
- Sigstore transparency entry: 1431561118
- Sigstore integration time:
- Permalink: ShanKonduru/efficient-agent-router-ear@6b223cdc307e8ad52564dee75cc2afa54132f277
- Branch / Tag: refs/tags/v0.10.16
- Owner: https://github.com/ShanKonduru
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@6b223cdc307e8ad52564dee75cc2afa54132f277
- Trigger Event: push
File details
Details for the file efficient_agent_router_ear-0.10.16-py3-none-any.whl.
File metadata
- Download URL: efficient_agent_router_ear-0.10.16-py3-none-any.whl
- Upload date:
- Size: 44.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e00ea4d2ea27f4b325ee5f0a87c0433d30238bf92d4f5254a20157efa9b850b6
|
|
| MD5 |
373291f64fe3c2438eaf25675dfdc347
|
|
| BLAKE2b-256 |
86e7b3a94321a5203fb34e6d485ca7dd65a0131f70f3ffdc37ce55efdf4c8ea0
|
Provenance
The following attestation bundles were made for efficient_agent_router_ear-0.10.16-py3-none-any.whl:
- Publisher: publish-pypi.yml on ShanKonduru/efficient-agent-router-ear
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: efficient_agent_router_ear-0.10.16-py3-none-any.whl
- Subject digest: e00ea4d2ea27f4b325ee5f0a87c0433d30238bf92d4f5254a20157efa9b850b6
- Sigstore transparency entry: 1431561553
- Sigstore integration time:
- Permalink: ShanKonduru/efficient-agent-router-ear@6b223cdc307e8ad52564dee75cc2afa54132f277
- Branch / Tag: refs/tags/v0.10.16
- Owner: https://github.com/ShanKonduru
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@6b223cdc307e8ad52564dee75cc2afa54132f277
- Trigger Event: push