
High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway


Engine-agnostic, high-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture diagram]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (vLLM, TensorRT-LLM, TokenSpeed, or SGLang) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (vLLM, TensorRT-LLM, TokenSpeed, SGLang) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
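
To make the idea of cache-aware routing concrete, here is a minimal sketch of prefix-based worker selection. It is an illustration only, not SMG's implementation: the `cached` bookkeeping, the `pick_worker` and `shared_prefix_len` names, and the tie-breaking rule are all assumptions made for this example.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(prompt: Sequence[int], cached: Dict[str, List[Sequence[int]]]) -> str:
    """Pick the worker whose recorded prompts share the longest prefix with `prompt`.

    Ties (including "no match anywhere") go to the worker with the fewest
    recorded prompts, used here as a crude stand-in for load.
    """
    def score(worker: str):
        best = max((shared_prefix_len(prompt, p) for p in cached[worker]), default=0)
        return (best, -len(cached[worker]))

    return max(cached, key=score)


# Hypothetical state: token-id prompts each worker has recently served.
cached = {
    "http://gpu1:8000": [[1, 2, 3, 4], [9, 9]],
    "http://gpu2:8000": [[1, 2, 7]],
}
print(pick_worker([1, 2, 3, 5], cached))  # prints http://gpu1:8000
```

A production router would more likely keep a prefix tree per worker and weigh cache hits against queue depth; the linear scan above only shows the selection rule.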

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg launch --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg launch --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg launch --worker-urls http://gpu1:8000 --enable-mesh \
  --mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
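
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `send` are made up for this example, and the port and model name are simply the ones from the curl command.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (needs a running gateway)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("http://localhost:30000", "llama3", "Hello!")
print(req.get_method(), req.full_url)
# With the gateway running: reply = send(req)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.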

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| TensorRT-LLM | Anthropic |
| TokenSpeed | Google Gemini |
| SGLang | AWS Bedrock |
| Ollama | Azure OpenAI |
| Any OpenAI-compatible server | Any OpenAI-compatible provider |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
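
As an illustration of one of the listed policies, the general "power of two choices" technique samples two workers at random and sends the request to the less loaded one. The sketch below shows that technique in isolation; the `in_flight` counter and the function name are assumptions for this example, and SMG's `power_of_two` policy may use a different load signal.

```python
import random
from typing import Dict, Optional


def power_of_two_choice(in_flight: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Sample two distinct workers and return the one with fewer in-flight requests."""
    if not in_flight:
        raise ValueError("no workers available")
    rng = rng or random.Random()
    workers = list(in_flight)
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)  # two distinct candidates
    return a if in_flight[a] <= in_flight[b] else b


# Hypothetical load snapshot: requests currently in flight per worker.
loads = {"http://gpu1:8000": 1, "http://gpu2:8000": 2, "http://gpu3:8000": 9}
print(power_of_two_choice(loads, random.Random(0)))
```

Because the chosen worker is always the lighter of the two sampled, the single busiest worker is never picked while others are available, yet the router avoids scanning every worker on each request.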

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.
