High-performance Rust-based inference gateway for large-scale LLM deployments
Shepherd Model Gateway
High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.
Why SMG?
| Benefit | Details |
|---|---|
| 🚀 Maximize GPU Utilization | Cache-aware routing understands your inference engine's KV cache state (vLLM, SGLang, or TensorRT-LLM) to reuse prefixes and reduce redundant computation. |
| 🔌 One API, Any Backend | Route to self-hosted models (vLLM, SGLang, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint. |
| ⚡ Built for Speed | Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running. |
| 🔒 Enterprise Control | Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure. |
| 📊 Full Observability | 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer. |
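To make the idea behind cache-aware routing concrete, here is a minimal sketch of the general technique: send a prompt to the worker that has already seen the longest matching prefix, so that worker's KV cache is more likely to be reused. This is not SMG's implementation. The class `PrefixAwareRouter`, the helper `shared_prefix_len`, and the tie-breaking rule are hypothetical, and a real router would typically compare token IDs with a prefix tree rather than raw strings.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Toy router (hypothetical): prefer the worker with the longest cached prefix."""

    def __init__(self, workers):
        # Remember which prompts each worker has served; this stands in for
        # the engine's KV cache state in this sketch.
        self.history = {worker: [] for worker in workers}

    def route(self, prompt: str) -> str:
        def score(worker):
            prompts = self.history[worker]
            longest = max((shared_prefix_len(prompt, p) for p in prompts), default=0)
            # Break ties in favour of the worker that has served fewer prompts.
            return (longest, -len(prompts))

        best = max(self.history, key=score)
        self.history[best].append(prompt)
        return best
```

With two workers, a second request that shares a system prompt with the first lands on the same worker, while an unrelated prompt is sent to the idle one.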
API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.
Quick Start
Install — pick your preferred method:
# Docker
docker pull lightseekorg/smg:latest
# Python
pip install smg
# Rust
cargo install smg
Run — point SMG at your inference workers:
# Single worker
smg --worker-urls http://localhost:8000
# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware
# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001
Use — send requests to the gateway:
curl http://localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
That's it. SMG is now load-balancing requests across your workers.
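The same request can be sent from Python. The sketch below mirrors the curl call above using only the standard library; the helper names `build_chat_request` and `send` are illustrative and not part of any SMG client library, and the URL and model name are the ones from the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, assuming a gateway is listening on localhost:30000:
#   reply = send(build_chat_request("http://localhost:30000", "llama3", "Hello!"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.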
Supported Backends
| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |
Features
| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
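As an illustration of one of the listed policies, the sketch below shows the general "power of two choices" technique that the name `power_of_two` suggests: sample two workers at random and pick the less loaded one. It is an assumption that SMG's policy follows this textbook form; the function `pick_power_of_two` and the `loads` mapping (worker name to in-flight request count) are hypothetical.

```python
import random


def pick_power_of_two(loads, rng=random):
    """Pick a worker by sampling two at random and keeping the less loaded one.

    `loads` maps worker name -> current load (for example, in-flight requests).
    `rng` is any object with a `sample` method, such as random.Random(seed).
    """
    workers = list(loads)
    if not workers:
        raise ValueError("no workers available")
    if len(workers) == 1:
        return workers[0]
    first, second = rng.sample(workers, 2)
    return first if loads[first] <= loads[second] else second
```

Because two distinct workers are always compared, the most heavily loaded worker is never chosen, yet the router only needs to inspect two load values per request instead of scanning every worker.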
Documentation
| Guide | Description |
|---|---|
| Getting Started | Installation and first steps |
| Architecture | How SMG works |
| Configuration | CLI reference and options |
| API Reference | OpenAI-compatible endpoints |
| Kubernetes Setup | In-cluster discovery and production setup |
Contributing
We welcome contributions! See the Contributing Guide for details.
Download files
File details
Details for the file smg-0.4.0.tar.gz.
File metadata
- Download URL: smg-0.4.0.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2734b176372f8a6fe2c35ec3946f3ecdd599059577db63bbdf95717be43689e |
| MD5 | 11fd9694fc2102a0681af7586e2809ce |
| BLAKE2b-256 | 385a3d9587257bcbbef34bab3137257ed1d759ff4fd3577c06427340f1f2ac45 |
File details
Details for the file smg-0.4.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 17.1 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7e142ce7a40dc2e3badcfc81866b66cd9ff4bc57319dae70f8878c789dac80d |
| MD5 | b3769d94df45ed7b810f864580dade44 |
| BLAKE2b-256 | 6b0af320510fe766f3b30b71eb9aa52ddfe209c5d2292fc8db37c7aed04c4131 |
File details
Details for the file smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl
- Upload date:
- Size: 19.0 MB
- Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ee0ebce6f4b9c34539a058353b4be9eb82dce5476b79a0a9b8b234808c2393c |
| MD5 | 83904d99b6e2db6e4572bb18eaa88b8b |
| BLAKE2b-256 | 1c02e011f34acac2fe1907ae9037ddd080bb5cb61f27d2c75e7b6490340157a0 |
File details
Details for the file smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl
- Upload date:
- Size: 19.1 MB
- Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99dcabe0c21715a6bcdf02629f477a20077295a869f3f94936b2f6ac4e888545 |
| MD5 | d48b58ac1ea180b2671f3b76c1bab481 |
| BLAKE2b-256 | 7e00dcfc254d41929e3ced57caa717457a719f71a73e7abd436ada41e65309f3 |
File details
Details for the file smg-0.4.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 18.7 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e50c6ce33b78d7f2948d87ff02b8899b876ca9c7e3870e016a71a17dc1dc8583 |
| MD5 | e4d83b4f3d3c2e20a137da2e969b8531 |
| BLAKE2b-256 | b35278ce2b1a74ab5a7fc6d09b0133fb6c17cf8bbc6979b643e6c600c23359d7 |
File details
Details for the file smg-0.4.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 19.0 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a11b6ee23659919a9930b02ccc977922c4064e39ae37acc314ed6e0158433c64 |
| MD5 | b7e5f588e330b7646edca3667070d89e |
| BLAKE2b-256 | 226693899d3e20b78c794ae9c98e9297e6c489537b64569d352a2913b54fe507 |
File details
Details for the file smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 15.0 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e71e48d374f706fb98248218aae92f0f9595a3f838762d000ea616871ef1b7ee |
| MD5 | ec377bbc19b687e1dc14f13f56245ce8 |
| BLAKE2b-256 | 2976b746a8dc20453c2d1f9062745ebd1c69f25cc71fe57f5f4fea4106cce1ef |
File details
Details for the file smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 15.7 MB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19cf26cb2ab33d2f985b6ee864b9d999b21a475c18290f8780b51ccfd03c9ee0 |
| MD5 | 2c0303b30e3d4b188312ef24e9e4b7ed |
| BLAKE2b-256 | 3af2c7a86d71c4f45d531ff00e2d2ad7c5360d77753127e5745ed3a0e5ebcf5f |