An SRE-Optimized API Gateway for dynamic LLM routing, Redis caching, and Prometheus telemetry.
Project description
Agentic API Gateway | SRE Edge Router
A production-grade, highly resilient API Gateway that dynamically routes Large Language Model (LLM) prompts based on complexity and minimizes cost through semantic caching.
Built around Site Reliability Engineering (SRE) principles, it implements automated failover, a Redis-backed rate limiter, and token-cost telemetry exported to Prometheus, and it ships with a dashboard UI built in vanilla JS and Vite.
🚀 Core SRE Features
- Dynamic Tier Routing: Uses a heuristics engine to parse prompt intent. Simple queries route to fast models (`groq/llama-3.1-8b`), while complex queries (e.g. system design) automatically route to heavy models (`groq/llama-3.1-70b`).
- Zero-Latency Semantic Caching: A SHA-256 hash of each inbound request is checked against Redis before any model call. Identical requests skip the LLM network entirely and are served as an exact match directly from Redis memory, with near-zero latency and at `$0 USD` cost.
- Intelligent Failover Resiliency: Wraps primary model calls in strict `asyncio` timeouts. If a provider hits a quota limit, returns a `503`, or hangs, the router degrades gracefully to alternative models before surfacing an error to the user.
- Vite SRE Telemetry Dashboard: Complete visual interface built without bulky frameworks, using raw CSS glassmorphism, flexbox scaling, and micro-animated charts that show request latency and cost updates on every stream.
- DDoS/Billing Defense: Implements a Redis token-bucket API rate limiter (50 req/min) that requires an `x-api-key` header to prevent billing exhaustion.
- Prometheus & Grafana Observability: Instrumented with custom Python metrics that expose end-to-end latency histograms, accumulated LLM token cost, and routing cache hit/miss rates at `/metrics`.
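The request path described in the list above (cache lookup, tier routing, then a timeout-guarded call with failover) can be sketched roughly as follows. This is a minimal illustration and not the gateway's actual code: the keyword list, the length threshold, and the names `pick_model`, `cache_key`, `call_with_failover`, and `handle` are hypothetical, a plain dict stands in for Redis, and `call_model` stands in for whatever coroutine performs the provider call.

```python
import asyncio
import hashlib

# Hypothetical hints; the real heuristics engine is not shown in this README.
COMPLEX_HINTS = ("architecture", "system design", "trade-off", "scalab")
FAST_MODEL = "groq/llama-3.1-8b"
HEAVY_MODEL = "groq/llama-3.1-70b"


def pick_model(prompt: str, long_prompt_chars: int = 400) -> str:
    """Send long or design-heavy prompts to the heavy tier, the rest to the fast tier."""
    text = prompt.lower()
    if len(text) > long_prompt_chars or any(hint in text for hint in COMPLEX_HINTS):
        return HEAVY_MODEL
    return FAST_MODEL


def cache_key(prompt: str) -> str:
    """Exact-match cache key: SHA-256 of the whitespace-normalized prompt."""
    normalized = " ".join(prompt.split())
    return "llm-cache:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


async def call_with_failover(prompt, models, call_model, timeout_s=10.0):
    """Try each model in order; move to the next when a call times out or raises."""
    last_error = None
    for model in models:
        try:
            reply = await asyncio.wait_for(call_model(model, prompt), timeout=timeout_s)
            return model, reply
        except Exception as exc:  # includes the timeout raised by wait_for
            last_error = exc
    raise RuntimeError("all models failed") from last_error


async def handle(prompt, cache, call_model, timeout_s=10.0):
    """Cache lookup, then tier routing, then failover call, then cache fill."""
    key = cache_key(prompt)
    if key in cache:  # identical request: no model call at all
        return {"model": None, "reply": cache[key], "cache_hit": True}
    primary = pick_model(prompt)
    fallback = FAST_MODEL if primary == HEAVY_MODEL else HEAVY_MODEL
    model, reply = await call_with_failover(
        prompt, [primary, fallback], call_model, timeout_s=timeout_s
    )
    cache[key] = reply
    return {"model": model, "reply": reply, "cache_hit": False}
```

Because the key is a hash of the prompt text, this kind of cache only matches identical requests (after whitespace normalization); differently worded prompts with the same meaning still reach the model.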
🛠️ Tech Stack & Architecture
- Backend Route Logic: `Python 3.11`, `FastAPI`, `LiteLLM` (for multi-provider standardization)
- Frontend Dashboard: Raw `HTML5`, Vanilla Base `CSS`, `Vite` Node-Server, `marked.js`
- Cache & Memory: `Redis` alpine container
- Orchestration: `Docker Compose`
- Observability: `Prometheus` (Scraping), `Grafana` (Visualization)
- Inference Hardware: `Groq LPU` (Llama 3.1 models config standard)
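Redis also backs the token-bucket rate limiter mentioned earlier (50 req/min, keyed on the `x-api-key` header). The algorithm itself can be sketched in memory as below. This is only an illustration under stated assumptions: the `TokenBucket` class and its parameters are hypothetical, and a dict replaces the Redis state the gateway would use so that the limit holds across processes.

```python
import time


class TokenBucket:
    """Per-key token bucket: `capacity` requests, refilled evenly over `period_s` seconds."""

    def __init__(self, capacity=50, period_s=60.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = capacity / period_s
        self.clock = clock  # injectable so the refill logic can be tested without sleeping
        self.state = {}  # api_key -> (tokens, last_seen); Redis would hold this in practice

    def allow(self, api_key: str) -> bool:
        """Return True and spend one token if the key still has budget, else False."""
        now = self.clock()
        tokens, last = self.state.get(api_key, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_s)
        if tokens < 1.0:
            self.state[api_key] = (tokens, now)
            return False
        self.state[api_key] = (tokens - 1.0, now)
        return True
```

A request handler would call `allow(api_key)` before doing any work and reject the request (typically with HTTP 429) when it returns `False`. Unlike a fixed one-minute window, the bucket permits a burst of up to 50 requests and then admits new ones at the steady refill rate.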
⚡ Quickstart
- Clone the repository:
    git clone https://github.com/ManikBodamwad/LLM-Latency-Cost-Router.git
    cd LLM-Latency-Cost-Router
- Supply your API keys: create a `.env` file in the root directory:
    GROQ_API_KEY="gsk_your_groq_key_here"
- Deploy via Docker Compose:
    docker compose up -d --build
- Experience the application: Open http://localhost:5173 in your web browser. Type a complex prompt like "Can you explain the Medallion Architecture?" and watch the SRE dashboard track the latency, the token cost, and the routing strategy in real time.
📦 Usage as a Python Package
This repository is also published as a portable Python package, so engineering teams can add edge routing to their own systems without running the Docker stack.
Install it via pip:
pip install agentic-sre-gateway
You can then start the routing API from your terminal using the `agentic-gateway` command that the package installs. The `REDIS_URL` setting below assumes a Redis instance is already reachable at that address:
export GROQ_API_KEY="your_key"
export REDIS_URL="redis://localhost:6379/0"
agentic-gateway
This suits teams that want a drop-in API proxy to reduce LLM spend and monitor token consumption locally without writing their own LiteLLM and Prometheus wrappers.
📊 View Local Development Telemetry
For local visualization during development, the Docker Compose setup automatically starts Prometheus and Grafana alongside the gateway.
- Prometheus Scraper UI: http://localhost:9090
- Grafana Workspace: http://localhost:3000 (Note: this uses the default local-dev credentials. Login: `admin` / Password: `admin`)
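The same numbers can also be read without Grafana, since `/metrics` serves the plain-text Prometheus exposition format. The sketch below parses such text and derives a cache hit rate. The metric names in `SAMPLE` are hypothetical (the README does not list the exported names), and the parser assumes samples carry no trailing timestamp.

```python
# Hypothetical sample; real metric names depend on how the gateway is instrumented.
SAMPLE = """\
# HELP gateway_cache_hits_total Requests served from the Redis cache.
# TYPE gateway_cache_hits_total counter
gateway_cache_hits_total 30.0
gateway_cache_misses_total 10.0
gateway_llm_cost_usd_total{model="groq/llama-3.1-8b"} 0.0042
"""


def parse_metrics(text: str) -> dict:
    """Map each sample name (including its label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        name, _, value = line.rpartition(" ")  # the value is the last space-separated field
        samples[name] = float(value)
    return samples


def cache_hit_rate(samples: dict,
                   hits: str = "gateway_cache_hits_total",
                   misses: str = "gateway_cache_misses_total") -> float:
    """Fraction of requests answered from cache; 0.0 when no requests were seen."""
    total = samples.get(hits, 0.0) + samples.get(misses, 0.0)
    return samples.get(hits, 0.0) / total if total else 0.0


samples = parse_metrics(SAMPLE)
print(f"cache hit rate: {cache_hit_rate(samples):.0%}")  # prints: cache hit rate: 75%
```

To run this against a live gateway, replace `SAMPLE` with the response body fetched from its `/metrics` endpoint.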
Developed by Manik Bodamwad to solve enterprise-level LLM deployment friction points: Cost Runaway, High Latency, and Provider Downtime.
File details
Details for the file agentic_sre_gateway-1.0.0.tar.gz.
File metadata
- Download URL: agentic_sre_gateway-1.0.0.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 947fb0e5748a387c1cc4f052227c796efa9d6bc25fa9c2e47908dbaee56d27b0 |
| MD5 | ec72c71c0ec2dbf79e56792c1da7cf76 |
| BLAKE2b-256 | 01f2be296108bcf364d8ebb72779aed4411c0d25d2b798c1771ee77e0fef2094 |
File details
Details for the file agentic_sre_gateway-1.0.0-py3-none-any.whl.
File metadata
- Download URL: agentic_sre_gateway-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b32d8a52832c85969ed906302f06ee85eb69b965973ce292b32cc31d8f41e3a5 |
| MD5 | c7d2ba6251a32fdee85e355ef8721c2e |
| BLAKE2b-256 | 77f05ae77f993a4da22a21b35ed98f4c8793f8bb74fe96f5d76ed52473981819 |