Skip to main content

Efficient Agent Router — routes tasks to the best LLM under quality, latency, cost, and safety constraints

Project description

Efficient Agent Router (EAR)

Efficient Agent Router (EAR) is a Python-first orchestration service that selects and executes the best LLM for a request based on quality, cost, latency, context window, and safety constraints.

Goals

  • Route each request to the most suitable model for the task.
  • Reduce token burn through cost-aware model ranking.
  • Protect sensitive input with prompt-injection and PII safeguards.
  • Provide a clean CLI first, then expose the same logic through MCP.

Current Delivery Status (v0.10.16)

Epic Description Status
E1 Foundation and Project Setup ✅ Complete
E2 Model Registry and Metadata Management ✅ Complete
E3 Predictive Routing Engine ✅ Complete
E4 CLI Experience and Operator Workflow ✅ Complete
E5 Reliability and Cascade Fallback ✅ Complete
E6 Safety and Guardrails ✅ Complete
E7 Observability and Cost/Latency Metrics ✅ Complete
E8 MCP Server and Tool Exposure ✅ Complete
E9 CI/CD and Security Automation ✅ Complete
E10 Execution Plane and Adaptive Routing Intelligence ✅ Complete
E11 Leadership Demo Frontend and GTM Showcase ✅ Complete
E17 Ollama Private Provider Integration ✅ Complete
E18 Live React Web Console ✅ Complete
E19 CLI Aliases and UX Polish ✅ Complete
E12–E16 Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) ⏳ Pending

Current Delivery Strategy

  1. Build and validate core routing engine through CLI. ✅
  2. Harden reliability, guardrails, and observability. ✅
  3. Expose stable capabilities through MCP server. ✅
  4. Add real execution runtime and adaptive intent/injection intelligence. ✅
  5. Ship interactive leadership demo with value storytelling. ✅
  6. Add Ollama private provider for on-premise safety routing. ✅
  7. Ship live React web console for developer-facing routing visualization. ✅
  8. Post-launch: verify PyPI release, run live canary, publish benchmarks, backfill ADRs.

Tech Stack

  • Python 3.12+
  • asyncio
  • Typer CLI
  • Pydantic v2
  • httpx for OpenRouter model metadata
  • pytest, pytest-asyncio, pytest-cov
  • bandit and pip-audit for security controls

Repository Layout

src/
  ear/
    __init__.py          # Package root, version
    config.py            # Pydantic-settings configuration (EARConfig)
    models.py            # Domain models: ModelSpec, RoutingRequest, RoutingDecision
    registry.py          # OpenRouterRegistry, OllamaRegistry, RegistryFactory
    router_engine.py     # IntentClassifier, SuitabilityScorer, RouterEngine
    guardrails.py        # Prompt-injection detector, PII policy, semantic risk scorer
    fallback.py          # FailureClassifier, FallbackPipeline
    metrics.py           # MetricsCollector, SessionSummary
    executor.py          # LLMExecutor, OllamaExecutor, CompositeExecutor
    orchestrator.py      # Unified execution orchestration pipeline
    intent.py            # Advanced intent classifier (embedding + heuristic fallback)
    evaluation.py        # Evaluation harness and benchmark suite
    cli.py               # Typer CLI: route, inspect-models, stats (+ aliases)
    mcp_server.py        # MCP stdio transport and tool/resource handlers
    demo_backend.py      # Demo routing replay scenarios and value storytelling
    demo_server.py       # uvicorn-backed local demo HTTP server
tests/
  conftest.py
  test_config.py
  test_models.py
  test_registry.py
  test_router_engine.py
  test_guardrails.py
  test_fallback.py
  test_metrics.py
  test_executor.py
  test_orchestrator.py
  test_intent.py
  test_evaluation.py
  test_cli.py
  test_mcp_server.py
  test_demo_backend.py
  test_demo_server.py
webapp/
  package.json           # React + Vite dependencies
  vite.config.js
  src/                   # React routing console components
docs/
  system_prompt.md
  execution_plan.md
  wbs.md
  release-playbook.md
  llm_explorer.html      # Standalone browser-based LLM explorer and demo UI
  usage-guide.md
  project-history.md     # Full commit history and delivery log
  adr/
  releases/

Core Workflow

  1. Accept user task input and options (task hint, budget priority, context profile).
  2. Run safety prechecks (injection and PII policy).
  3. Load model metadata from OpenRouter registry cache.
  4. Compute suitability score and candidate ranking.
  5. Return model recommendation, rationale, and fallback chain (execution runtime is tracked in E10).
  6. Emit session metrics snapshot for observability.

Routing Model

The router evaluates candidate models using a weighted suitability function:

S = Quality / (Cost * Latency)

Where score inputs are normalized and constrained by policy:

  • Context window threshold
  • Budget priority
  • Safety allowlist and PII policy
  • Task-specific boosts (coding, planning, research)

CLI Commands

Full command names and short aliases are both supported:

# Route a prompt (full and alias)
ear route "explain quicksort" --task coding --budget medium
ear r "explain quicksort" --task coding --budget medium

# JSON output for scripting
ear route "explain quicksort" --json

# Execute the routed model call
ear route "explain quicksort" --execute

# Inspect cached models
ear inspect-models
ear im

# Session metrics
ear stats
ear s

# Bare invocation: routes with sensible defaults
ear "explain quicksort"

MCP Design

  • Tool: route_and_execute
  • Resources: model performance metrics, cost per session
  • Transport: stdio

Ollama Private Provider

EAR routes PII-containing and injection-risk prompts to a local Ollama instance, ensuring sensitive data never reaches cloud providers.

Configuration:

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_ENABLED=true

Behavior:

  • ollama/<model> models appear in the registry with trusted=True and zero pricing.
  • Guardrail-blocked prompts route to Ollama when available instead of hard-blocking.
  • PII prompts are restricted to Ollama and vetted cloud providers only.
  • If Ollama is unavailable and a prompt is blocked, GuardrailsBlockedError is raised (fail-closed).

Interactive LLM Explorer and Demo UI

  • File: docs/llm_explorer.html
  • Purpose: interactive OpenRouter model table, routing demo, and value storytelling for leadership and investor demos.

What it includes:

  • Live model fetch from OpenRouter (/api/v1/models) with auto-refresh and last-updated indicator.
  • Search, provider pills, min-context, max-cost, and priced/unpriced filters.
  • Excel-style sortable table with per-column filters.
  • Side-by-side comparison cards for selected models (up to 4).
  • Value Story section with 10 routing scenarios: cost savings, latency gains, and safety enforcement.
  • Routing-mode toggle (Standard / Ollama Private): shows attack scenarios routing to ollama/llama3 for on-premise data-residency demonstration.
  • Processing progress log: step-by-step routing decisions with timestamps.

How to run:

  1. Open docs/llm_explorer.html directly in a browser, or
  2. Start the local demo server: python -m ear.demo_server (default port 7861)

Live React Web Console

  • Directory: webapp/
  • Purpose: developer-facing real-time routing visualization built with React and Vite.

How to run:

# Windows
run_live_webapp.bat

# Linux / macOS
bash run_live_webapp.sh

The launcher waits for the Vite dev server to be ready before opening the browser.

Demo Walkthrough

# Windows
run_demo_walkthrough.bat

# Linux / macOS
bash run_demo_walkthrough.sh

Runs all 10 demo routing scenarios end-to-end and opens the value storytelling view.

Configuration

Environment variables (minimum required):

OPENROUTER_API_KEY=<your key>
EAR_REGISTRY_TTL_SECONDS=300
EAR_DEFAULT_BUDGET=medium
EAR_MAX_RETRIES=3
EAR_OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
EAR_REQUEST_TIMEOUT_SECONDS=30

Optional Ollama private provider:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_ENABLED=true

Recommended local setup:

  1. Create and activate virtual environment: python -m venv .venv && .venv\Scripts\activate
  2. Install: pip install -e .[dev]
  3. Copy .env.example to .env and populate values.
  4. Run tests: run_tests.bat (Windows) or bash run_tests.sh
  5. Run security audits: run_security_audits.bat or bash run_security_audits.sh

Quality and Security Requirements

  • 100% statement and branch coverage for routing core.
  • Deterministic tests with mocked external dependencies.
  • Security linting with bandit.
  • Dependency auditing with pip-audit.
  • No plaintext secret logging.

Security Report HTML Generation

  • Security workflows generate JSON first, then render HTML using sec-report-kit.
  • pip-audit workflow outputs: security_reports/pip_audit_latest.html.
  • Trivy workflow outputs: security_reports/trivy_latest.html.
  • Both HTML files are uploaded in the workflow artifacts alongside JSON and SARIF outputs.
  • Local scripts also generate HTML from JSON:
    • run_pip_audit.bat / run_pip_audit.sh
    • run_trivy.bat / run_trivy.sh
    • one-command wrapper: run_security_audits.bat / run_security_audits.sh

MCP Server: sec-report-kit

Install sec-report-kit locally:

pip install sec-report-kit

Configured MCP server command:

srk mcp serve --transport stdio

Workspace configuration is stored in .vscode/mcp.json.

Milestones

  • M1: Registry and schema baseline ✅
  • M2: Router core and CLI ✅
  • M3: Guardrails and metrics ✅
  • M4: MCP server and CI/CD gates ✅
  • M5: Execution runtime and adaptive routing intelligence ✅
  • M6: Leadership/investor demo frontend ✅
  • M8: Ollama private provider integration ✅
  • M9: React console and CLI UX hardening ✅
  • M7: Post-launch hardening (PyPI verify, canary, benchmarks, ADRs) ⏳ Pending

Tests

  • 291 tests across 16 test modules
  • Enforced 100% statement and branch coverage for all routing, guardrail, and execution logic
  • All tests run with mocked external dependencies
run_tests.bat        # Windows
bash run_tests.sh    # Linux / macOS

Reports are written to coverage_reports/ (HTML, XML, JSON).

  • Use ear r / ear im / ear s aliases in examples for brevity.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

efficient_agent_router_ear-0.11.0.tar.gz (79.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

efficient_agent_router_ear-0.11.0-py3-none-any.whl (50.5 kB view details)

Uploaded Python 3

File details

Details for the file efficient_agent_router_ear-0.11.0.tar.gz.

File metadata

File hashes

Hashes for efficient_agent_router_ear-0.11.0.tar.gz
Algorithm Hash digest
SHA256 37d5d12b8d9f8525d4591ce4f69bd51b06acb4742e189e4d899c8310d3b77f85
MD5 ef9710dac0c6703de267372f99d3f68c
BLAKE2b-256 caf55ecd9546a516f1817366861e1357352b9e8ac5b520e651fae3c1e8e96b76

See more details on using hashes here.

Provenance

The following attestation bundles were made for efficient_agent_router_ear-0.11.0.tar.gz:

Publisher: publish-pypi.yml on ShanKonduru/efficient-agent-router-ear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file efficient_agent_router_ear-0.11.0-py3-none-any.whl.

File metadata

File hashes

Hashes for efficient_agent_router_ear-0.11.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3f504501fc0d00136438c39f6dfcc049c35b68d6d3f8e82635e198805d223b16
MD5 1db08f56b835b96546b09c4c58514194
BLAKE2b-256 4047663146906886d00ee9c8413d9304eab55fdf834a2edf8edfc279502a2693

See more details on using hashes here.

Provenance

The following attestation bundles were made for efficient_agent_router_ear-0.11.0-py3-none-any.whl:

Publisher: publish-pypi.yml on ShanKonduru/efficient-agent-router-ear

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page