Efficient Agent Router — routes tasks to the best LLM under quality, latency, cost, and safety constraints

Efficient Agent Router (EAR)

Efficient Agent Router (EAR) is a Python-first orchestration service that selects and executes the best LLM for a request based on quality, cost, latency, context window, and safety constraints.

Goals

  • Route each request to the most suitable model for the task.
  • Reduce token burn through cost-aware model ranking.
  • Protect sensitive input with prompt-injection and PII safeguards.
  • Provide a clean CLI first, then expose the same logic through MCP.

Current Delivery Status (v0.10.16)

| Epic | Description | Status |
|------|-------------|--------|
| E1 | Foundation and Project Setup | ✅ Complete |
| E2 | Model Registry and Metadata Management | ✅ Complete |
| E3 | Predictive Routing Engine | ✅ Complete |
| E4 | CLI Experience and Operator Workflow | ✅ Complete |
| E5 | Reliability and Cascade Fallback | ✅ Complete |
| E6 | Safety and Guardrails | ✅ Complete |
| E7 | Observability and Cost/Latency Metrics | ✅ Complete |
| E8 | MCP Server and Tool Exposure | ✅ Complete |
| E9 | CI/CD and Security Automation | ✅ Complete |
| E10 | Execution Plane and Adaptive Routing Intelligence | ✅ Complete |
| E11 | Leadership Demo Frontend and GTM Showcase | ✅ Complete |
| E17 | Ollama Private Provider Integration | ✅ Complete |
| E18 | Live React Web Console | ✅ Complete |
| E19 | CLI Aliases and UX Polish | ✅ Complete |
| E12–E16 | Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) | ⏳ Pending |

Current Delivery Strategy

  1. Build and validate core routing engine through CLI. ✅
  2. Harden reliability, guardrails, and observability. ✅
  3. Expose stable capabilities through MCP server. ✅
  4. Add real execution runtime and adaptive intent/injection intelligence. ✅
  5. Ship interactive leadership demo with value storytelling. ✅
  6. Add Ollama private provider for on-premise safety routing. ✅
  7. Ship live React web console for developer-facing routing visualization. ✅
  8. Post-launch: verify PyPI release, run live canary, publish benchmarks, backfill ADRs.

Tech Stack

  • Python 3.12+
  • asyncio
  • Typer CLI
  • Pydantic v2
  • httpx for OpenRouter model metadata
  • pytest, pytest-asyncio, pytest-cov
  • bandit and pip-audit for security controls

Repository Layout

src/
  ear/
    __init__.py          # Package root, version
    config.py            # Pydantic-settings configuration (EARConfig)
    models.py            # Domain models: ModelSpec, RoutingRequest, RoutingDecision
    registry.py          # OpenRouterRegistry, OllamaRegistry, RegistryFactory
    router_engine.py     # IntentClassifier, SuitabilityScorer, RouterEngine
    guardrails.py        # Prompt-injection detector, PII policy, semantic risk scorer
    fallback.py          # FailureClassifier, FallbackPipeline
    metrics.py           # MetricsCollector, SessionSummary
    executor.py          # LLMExecutor, OllamaExecutor, CompositeExecutor
    orchestrator.py      # Unified execution orchestration pipeline
    intent.py            # Advanced intent classifier (embedding + heuristic fallback)
    evaluation.py        # Evaluation harness and benchmark suite
    cli.py               # Typer CLI: route, inspect-models, stats (+ aliases)
    mcp_server.py        # MCP stdio transport and tool/resource handlers
    demo_backend.py      # Demo routing replay scenarios and value storytelling
    demo_server.py       # uvicorn-backed local demo HTTP server
tests/
  conftest.py
  test_config.py
  test_models.py
  test_registry.py
  test_router_engine.py
  test_guardrails.py
  test_fallback.py
  test_metrics.py
  test_executor.py
  test_orchestrator.py
  test_intent.py
  test_evaluation.py
  test_cli.py
  test_mcp_server.py
  test_demo_backend.py
  test_demo_server.py
webapp/
  package.json           # React + Vite dependencies
  vite.config.js
  src/                   # React routing console components
docs/
  system_prompt.md
  execution_plan.md
  wbs.md
  release-playbook.md
  llm_explorer.html      # Standalone browser-based LLM explorer and demo UI
  usage-guide.md
  project-history.md     # Full commit history and delivery log
  adr/
  releases/

Core Workflow

  1. Accept user task input and options (task hint, budget priority, context profile).
  2. Run safety prechecks (injection and PII policy).
  3. Load model metadata from OpenRouter registry cache.
  4. Compute suitability score and candidate ranking.
  5. Return model recommendation, rationale, and fallback chain (execution runtime is tracked in E10).
  6. Emit session metrics snapshot for observability.
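The steps above can be sketched end-to-end in plain Python. All names here are illustrative stand-ins, not the actual `ear` package API:

```python
# Hypothetical sketch of the EAR core workflow: precheck, rank, recommend.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    quality: float        # normalized 0..1
    cost: float           # normalized, > 0
    latency: float        # normalized, > 0
    context_window: int = 8192

def precheck(prompt: str) -> bool:
    """Toy safety precheck: flag one obvious injection marker."""
    return "ignore previous instructions" not in prompt.lower()

def suitability(c: Candidate) -> float:
    """S = Quality / (Cost * Latency), per the routing model below."""
    return c.quality / (c.cost * c.latency)

def route(prompt: str, candidates: list[Candidate], min_context: int = 0):
    if not precheck(prompt):
        raise ValueError("blocked by guardrails")
    eligible = [c for c in candidates if c.context_window >= min_context]
    ranked = sorted(eligible, key=suitability, reverse=True)
    return ranked[0], ranked[1:]   # recommendation + fallback chain

models = [
    Candidate("fast-cheap", quality=0.6, cost=0.2, latency=0.5),
    Candidate("frontier", quality=0.95, cost=1.0, latency=1.0),
]
best, fallbacks = route("explain quicksort", models)
```

In this toy ranking the cheaper, faster model wins despite lower raw quality, which is the cost-aware behavior the workflow is built around.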

Routing Model

The router evaluates candidate models using a weighted suitability function:

S = Quality / (Cost * Latency)

Where score inputs are normalized and constrained by policy:

  • Context window threshold
  • Budget priority
  • Safety allowlist and PII policy
  • Task-specific boosts (coding, planning, research)
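A worked example of how a task-specific boost can change the ranking. The boost factor and weights are hypothetical, not EAR's actual `SuitabilityScorer` values:

```python
# Illustrative suitability with a task boost applied to quality.
def score(quality: float, cost: float, latency: float, task_boost: float = 1.0) -> float:
    return (quality * task_boost) / (cost * latency)

# On a coding task, a coding-tuned model with a 1.2x boost can outrank
# a slightly higher-quality generalist:
generalist = score(quality=0.90, cost=0.8, latency=0.7)                  # ≈ 1.61
coder = score(quality=0.85, cost=0.8, latency=0.7, task_boost=1.2)       # ≈ 1.82
```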

CLI Commands

Full command names and short aliases are both supported:

# Route a prompt (full and alias)
ear route "explain quicksort" --task coding --budget medium
ear r "explain quicksort" --task coding --budget medium

# JSON output for scripting
ear route "explain quicksort" --json

# Execute the routed model call
ear route "explain quicksort" --execute

# Inspect cached models
ear inspect-models
ear im

# Session metrics
ear stats
ear s

# Bare invocation: routes with sensible defaults
ear "explain quicksort"

MCP Design

  • Tool: route_and_execute
  • Resources: model performance metrics, cost per session
  • Transport: stdio

Ollama Private Provider

EAR routes PII-containing and injection-risk prompts to a local Ollama instance, keeping sensitive data on-premise instead of sending it to untrusted cloud providers.

Configuration:

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_ENABLED=true

Behavior:

  • ollama/<model> models appear in the registry with trusted=True and zero pricing.
  • Guardrail-blocked prompts route to Ollama when available instead of hard-blocking.
  • PII prompts are restricted to Ollama and vetted cloud providers only.
  • If Ollama is unavailable and a prompt is blocked, GuardrailsBlockedError is raised (fail-closed).
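The fail-closed behavior above can be sketched as a small decision function. `GuardrailsBlockedError` matches the error named in the docs; the function and its arguments are illustrative:

```python
# Sketch of guardrail-aware provider selection: reroute blocked prompts
# to the private provider when possible, otherwise fail closed.
class GuardrailsBlockedError(Exception):
    pass

def select_provider(blocked: bool, ollama_available: bool) -> str:
    if not blocked:
        return "cloud"
    if ollama_available:
        return "ollama/llama3"   # reroute instead of hard-blocking
    raise GuardrailsBlockedError("prompt blocked and no private provider available")
```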

Interactive LLM Explorer and Demo UI

  • File: docs/llm_explorer.html
  • Purpose: interactive OpenRouter model table, routing demo, and value storytelling for leadership and investor demos.

What it includes:

  • Live model fetch from OpenRouter (/api/v1/models) with auto-refresh and last-updated indicator.
  • Search, provider pills, min-context, max-cost, and priced/unpriced filters.
  • Excel-style sortable table with per-column filters.
  • Side-by-side comparison cards for selected models (up to 4).
  • Value Story section with 10 routing scenarios: cost savings, latency gains, and safety enforcement.
  • Routing-mode toggle (Standard / Ollama Private): shows attack scenarios routing to ollama/llama3 for on-premise data-residency demonstration.
  • Processing progress log: step-by-step routing decisions with timestamps.

How to run:

  1. Open docs/llm_explorer.html directly in a browser, or
  2. Start the local demo server: python -m ear.demo_server (default port 7861)

Live React Web Console

  • Directory: webapp/
  • Purpose: developer-facing real-time routing visualization built with React and Vite.

How to run:

# Windows
run_live_webapp.bat

# Linux / macOS
bash run_live_webapp.sh

The launcher waits for the Vite dev server to be ready before opening the browser.

Demo Walkthrough

# Windows
run_demo_walkthrough.bat

# Linux / macOS
bash run_demo_walkthrough.sh

Runs all 10 demo routing scenarios end-to-end and opens the value storytelling view.

Configuration

Environment variables (minimum required):

OPENROUTER_API_KEY=<your key>
EAR_REGISTRY_TTL_SECONDS=300
EAR_DEFAULT_BUDGET=medium
EAR_MAX_RETRIES=3
EAR_OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
EAR_REQUEST_TIMEOUT_SECONDS=30

Optional Ollama private provider:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_ENABLED=true
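For illustration, the variables above could be read with the standard library like this. EAR itself loads them through pydantic-settings (`EARConfig`), so treat this as a sketch of the defaults, not the real loader:

```python
# Stdlib sketch of reading EAR's environment variables with their
# documented defaults (the dict keys are illustrative names).
import os

def load_config() -> dict:
    return {
        "api_key": os.environ["OPENROUTER_API_KEY"],   # required, no default
        "registry_ttl": int(os.environ.get("EAR_REGISTRY_TTL_SECONDS", "300")),
        "default_budget": os.environ.get("EAR_DEFAULT_BUDGET", "medium"),
        "max_retries": int(os.environ.get("EAR_MAX_RETRIES", "3")),
        "ollama_enabled": os.environ.get("OLLAMA_ENABLED", "false").lower() == "true",
    }
```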

Recommended local setup:

  1. Create and activate a virtual environment: python -m venv .venv, then .venv\Scripts\activate (Windows) or source .venv/bin/activate (Linux/macOS)
  2. Install: pip install -e .[dev]
  3. Copy .env.example to .env and populate values.
  4. Run tests: run_tests.bat (Windows) or bash run_tests.sh
  5. Run security audits: run_security_audits.bat or bash run_security_audits.sh

Quality and Security Requirements

  • 100% statement and branch coverage for routing core.
  • Deterministic tests with mocked external dependencies.
  • Security linting with bandit.
  • Dependency auditing with pip-audit.
  • No plaintext secret logging.

Security Report HTML Generation

  • Security workflows generate JSON first, then render HTML using sec-report-kit.
  • pip-audit workflow outputs: security_reports/pip_audit_latest.html.
  • Trivy workflow outputs: security_reports/trivy_latest.html.
  • Both HTML files are uploaded in the workflow artifacts alongside JSON and SARIF outputs.
  • Local scripts also generate HTML from JSON:
    • run_pip_audit.bat / run_pip_audit.sh
    • run_trivy.bat / run_trivy.sh
    • one-command wrapper: run_security_audits.bat / run_security_audits.sh

MCP Server: sec-report-kit

Install sec-report-kit locally:

pip install sec-report-kit

Configured MCP server command:

srk mcp serve --transport stdio

Workspace configuration is stored in .vscode/mcp.json.
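A minimal .vscode/mcp.json entry wiring up that command might look like the following (the server name key is illustrative, and the exact schema depends on your MCP client):

```json
{
  "servers": {
    "sec-report-kit": {
      "type": "stdio",
      "command": "srk",
      "args": ["mcp", "serve", "--transport", "stdio"]
    }
  }
}
```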

Milestones

  • M1: Registry and schema baseline ✅
  • M2: Router core and CLI ✅
  • M3: Guardrails and metrics ✅
  • M4: MCP server and CI/CD gates ✅
  • M5: Execution runtime and adaptive routing intelligence ✅
  • M6: Leadership/investor demo frontend ✅
  • M8: Ollama private provider integration ✅
  • M9: React console and CLI UX hardening ✅
  • M7: Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) ⏳ Pending

Tests

  • 291 tests across 16 test modules
  • Enforced 100% statement and branch coverage for all routing, guardrail, and execution logic
  • All tests run with mocked external dependencies
run_tests.bat        # Windows
bash run_tests.sh    # Linux / macOS

Reports are written to coverage_reports/ (HTML, XML, JSON).

