An SRE-Optimized API Gateway for dynamic LLM routing, Redis caching, and Prometheus telemetry.

These details have not been verified by PyPI

Project links

Homepage

Project description

Agentic API Gateway | SRE Edge Router

A production-grade, highly resilient API Gateway that dynamically routes Large Language Model (LLM) prompts based on complexity and minimizes cost through semantic caching.

Built on Site Reliability Engineering (SRE) principles, it implements automated failover, Redis-backed rate limiting, and token-cost telemetry exported to Prometheus, and it ships with a UI dashboard built in vanilla JS with Vite.

🚀 Core SRE Features

Dynamic Tier Routing: Uses a heuristics engine to parse prompt intent. Simple queries route to fast, small models (groq/llama-3.1-8b), while complex queries (e.g. system design) automatically route to heavy models (groq/llama-3.1-70b).
Zero-Latency Semantic Caching: Inbound requests are keyed by a SHA-256 hash. An identical request skips the LLM network entirely and is served as an exact match directly from Redis memory, with no provider round trip and $0 USD in token cost. Because the key is a hash of the request, only identical requests hit the cache.
Intelligent Failover Resiliency: Wraps primary model calls in strict asyncio timeouts. If a provider returns a quota error or a 503, or hangs, the router degrades gracefully to alternative models before returning an error to the user.
Vite SRE Telemetry Dashboard: A complete visual interface built without heavy frameworks, using plain CSS glassmorphism, flexbox layout, and animated charts that show request latency and cost, updated on every stream.
DDoS/Billing Defense: Implements a Redis token-bucket API rate limiter (50 req/min) and requires an x-api-key header to prevent billing exhaustion.
Prometheus & Grafana Observability: Instrumented with custom Python metrics that expose end-to-end latency histograms, accumulated LLM token cost, and routing cache hit/miss rates at /metrics.
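The token-bucket limiter listed above can be sketched in a few lines. This is a minimal in-memory illustration, not the gateway's code: the real limiter keeps its state in Redis, and the names `TokenBucket`, `check_request`, and `buckets` are hypothetical. Only the 50 req/min rate and the required API key come from the feature list.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """In-memory token bucket; a stand-in for the Redis-backed limiter."""
    updated_at: float
    capacity: float = 50.0           # largest burst allowed
    refill_per_sec: float = 50 / 60  # 50 requests per minute
    tokens: float = 50.0             # bucket starts full

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to consume one token."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per API key (hypothetical layout).
buckets: Dict[str, TokenBucket] = {}


def check_request(api_key: Optional[str], now: Optional[float] = None) -> int:
    """Return an HTTP-style status: 401 without a key, 429 when throttled, else 200."""
    if not api_key:
        return 401
    now = time.monotonic() if now is None else now
    bucket = buckets.setdefault(api_key, TokenBucket(updated_at=now))
    return 200 if bucket.allow(now) else 429
```

Passing `now` explicitly keeps the sketch deterministic for testing. A Redis version would need the refill and consume steps to run atomically, for example inside a Lua script.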

🛠️ Tech Stack & Architecture

Backend Route Logic: Python 3.11, FastAPI, LiteLLM (for multi-provider standardization)
Frontend Dashboard: HTML5, plain CSS, Vite dev server, marked.js
Cache & Memory: Redis (Alpine-based container)
Orchestration: Docker Compose
Observability: Prometheus (Scraping), Grafana (Visualization)
Inference Hardware: Groq LPU (Llama 3.1 models in the default configuration)
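The timeout-and-failover behavior described under Core SRE Features can be sketched with `asyncio.wait_for`. This is an illustrative sketch under stated assumptions, not the gateway's implementation: `complete_with_failover` and the three stand-in model functions are hypothetical, and real calls would go through LiteLLM rather than these stubs.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

# A provider is any async callable that takes a prompt and returns text.
Provider = Callable[[str], Awaitable[str]]


async def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Provider]],
    timeout_s: float = 5.0,
) -> Tuple[str, str]:
    """Try providers in order; return (name, reply) from the first that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            reply = await asyncio.wait_for(call(prompt), timeout=timeout_s)
            return name, reply
        except asyncio.TimeoutError:
            errors.append(f"{name}: timed out after {timeout_s}s")
        except Exception as exc:  # e.g. a quota error or an HTTP 503
            errors.append(f"{name}: {exc!r}")
    # Only surface an error once every alternative has been tried.
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real model calls.
async def hanging_model(prompt: str) -> str:
    await asyncio.sleep(10)
    return "never returned"


async def quota_limited_model(prompt: str) -> str:
    raise RuntimeError("429 quota exceeded")


async def fallback_model(prompt: str) -> str:
    return f"fallback answer to: {prompt}"
```

Calling `asyncio.run(complete_with_failover("hi", [("primary", hanging_model), ("secondary", quota_limited_model), ("tertiary", fallback_model)], timeout_s=0.05))` skips the hanging and quota-limited stubs and returns the reply from the third.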

⚡ Quickstart

Clone the repository:

git clone https://github.com/ManikBodamwad/LLM-Latency-Cost-Router.git
cd LLM-Latency-Cost-Router

Supply your API Keys: Create a .env file in the root directory:
```
GROQ_API_KEY="gsk_your_groq_key_here"
```
Deploy via Docker Compose:
```
docker compose up -d --build
```
Open the Application: Go to http://localhost:5173 in your web browser. Type a complex prompt such as "Can you explain the Medallion Architecture?" and watch the SRE dashboard track the latency, the token cost, and the routing strategy in real time.
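To make the routing decision concrete, here is a minimal sketch of a keyword-and-length heuristic that would send the example prompt above to the heavy tier. The model identifiers are the ones named in this README; the keyword set, the length threshold, and the function name `choose_model` are assumptions for illustration, and the gateway's actual heuristics may differ.

```python
import re

# Model identifiers as written in this README.
FAST_MODEL = "groq/llama-3.1-8b"
HEAVY_MODEL = "groq/llama-3.1-70b"

# Hypothetical signals of a complex prompt.
COMPLEX_KEYWORDS = {"architecture", "design", "explain", "compare", "tradeoff", "debug"}
LONG_PROMPT_WORDS = 60


def choose_model(prompt: str) -> str:
    """Pick a model tier from simple lexical signals in the prompt."""
    words = re.findall(r"[a-z0-9']+", prompt.lower())
    if len(words) >= LONG_PROMPT_WORDS:
        return HEAVY_MODEL  # long prompts are treated as complex
    if COMPLEX_KEYWORDS.intersection(words):
        return HEAVY_MODEL  # at least one complexity keyword is present
    return FAST_MODEL
```

With these assumed rules, "Can you explain the Medallion Architecture?" matches two keywords and routes to the heavy model, while a short factual question falls through to the fast one.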

📦 Usage as a Python Package

This repository is also published as a Python package, so engineering teams can add edge routing to their own systems without running the Docker containers.

Install it with pip:

pip install agentic-sre-gateway

You can then start the routing API from your terminal with the agentic-gateway command that the package installs:

export GROQ_API_KEY="your_key"
export REDIS_URL="redis://localhost:6379/0"
agentic-gateway

This serves teams that want a drop-in API proxy to reduce LLM bills and monitor token consumption locally without writing their own LiteLLM and Prometheus wrappers.
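Much of the cost reduction comes from the exact-match cache described under Core SRE Features. The sketch below shows the idea with a dict standing in for Redis. The key layout, the `llmcache:` prefix, and the names `cache_key` and `ExactMatchCache` are assumptions, not the package's API.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(model: str, prompt: str) -> str:
    """Derive a deterministic key from the request fields that affect the reply."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return "llmcache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExactMatchCache:
    """Dict-backed stand-in for Redis GET/SET."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model: str, prompt: str, compute: Callable[[str], str]) -> str:
        """Return the cached reply, or call compute(prompt) once and store the result."""
        key = cache_key(model, prompt)
        if key in self._store:
            self.hits += 1  # no provider call, so no token cost
            return self._store[key]
        self.misses += 1
        reply = compute(prompt)
        self._store[key] = reply
        return reply
```

Any change to the prompt or the model produces a different hash, so only identical requests are served from the cache.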

📊 View Local Development Telemetry

For local visualization during development, the Docker Compose setup also starts Prometheus and Grafana.

Prometheus Scraper UI: http://localhost:9090
Grafana Workspace: http://localhost:3000 (default local-dev credentials: login admin, password admin)

Developed by Manik Bodamwad to solve enterprise-level LLM deployment friction points: Cost Runaway, High Latency, and Provider Downtime.

Project details

Release history

This version

1.0.0

Apr 7, 2026

Download files

Source Distribution

agentic_sre_gateway-1.0.0.tar.gz (7.2 kB)

Uploaded Apr 7, 2026 Source

Built Distribution

agentic_sre_gateway-1.0.0-py3-none-any.whl (7.7 kB)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file agentic_sre_gateway-1.0.0.tar.gz.

File metadata

Download URL: agentic_sre_gateway-1.0.0.tar.gz
Upload date: Apr 7, 2026
Size: 7.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for agentic_sre_gateway-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`947fb0e5748a387c1cc4f052227c796efa9d6bc25fa9c2e47908dbaee56d27b0`
MD5	`ec72c71c0ec2dbf79e56792c1da7cf76`
BLAKE2b-256	`01f2be296108bcf364d8ebb72779aed4411c0d25d2b798c1771ee77e0fef2094`

File details

Details for the file agentic_sre_gateway-1.0.0-py3-none-any.whl.

File metadata

Download URL: agentic_sre_gateway-1.0.0-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 7.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for agentic_sre_gateway-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b32d8a52832c85969ed906302f06ee85eb69b965973ce292b32cc31d8f41e3a5`
MD5	`c7d2ba6251a32fdee85e355ef8721c2e`
BLAKE2b-256	`77f05ae77f993a4da22a21b35ed98f4c8793f8bb74fe96f5d76ed52473981819`
