High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (vLLM, SGLang, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (vLLM, SGLang, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you can see what is happening at every layer.

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.
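
To make the idea behind cache-aware routing concrete, here is a minimal Python sketch. It is not SMG's implementation: the `pick_worker` function, the `recent` map of recently served token sequences, and the tie-breaking rule are all hypothetical stand-ins for real KV cache state. It only illustrates the principle of sending a request to the worker most likely to already hold its prefix.

```python
from typing import Dict, List, Sequence, Tuple


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(prompt: Sequence[int], recent: Dict[str, List[Sequence[int]]]) -> str:
    """Return the worker whose recent prompts share the longest prefix.

    `recent` maps a worker URL to token sequences it served recently
    (an assumed, simplified proxy for KV cache contents). Ties go to
    the worker with fewer recorded prompts, as a crude load signal.
    """
    def score(worker: str) -> Tuple[int, int]:
        best = max((shared_prefix_len(prompt, p) for p in recent[worker]), default=0)
        return (best, -len(recent[worker]))

    return max(recent, key=score)


# Hypothetical state for two workers.
recent = {
    "http://gpu1:8000": [[1, 2, 3, 4], [9, 9]],
    "http://gpu2:8000": [[1, 2, 7]],
}
print(pick_worker([1, 2, 3, 5], recent))
```

In this example the prompt shares three leading tokens with a sequence on `gpu1` and only two with `gpu2`, so `gpu1` is chosen. A production router would more likely track prefixes in a tree and weigh cache hits against load.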

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
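
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are illustrative, and reading the reply via `choices[0]["message"]["content"]` assumes the gateway returns the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage, assuming a gateway is listening on localhost:30000:
#   reply = chat("http://localhost:30000", "llama3", "Hello!")
#   print(reply["choices"][0]["message"]["content"])
```

Splitting request construction from sending keeps the payload easy to inspect without a running gateway.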

Supported Backends

| Self-Hosted | Cloud Providers |
| --- | --- |
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
| --- | --- |
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
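
As an illustration of one of the listed policies, the following Python sketch shows the general "power of two choices" technique that the name power_of_two suggests: sample two workers at random and route to the less loaded one. Whether SMG's policy matches this exactly is an assumption; the `loads` map of in-flight request counts and the function name are hypothetical.

```python
import random
from typing import Dict, Optional


def power_of_two(loads: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Sample two distinct workers at random and return the less loaded one.

    `loads` maps a worker URL to its current in-flight request count
    (an assumed load signal). With a single worker, that worker is returned.
    Raises ValueError if `loads` is empty.
    """
    rng = rng or random.Random()
    workers = list(loads)
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if loads[a] <= loads[b] else b


# Hypothetical load snapshot: the busiest worker can never win a comparison.
loads = {"http://gpu1:8000": 1, "http://gpu2:8000": 2, "http://gpu3:8000": 9}
print(power_of_two(loads))
```

Compared with always picking the global minimum, this needs load information for only two workers per decision, while still steering traffic away from the busiest ones.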

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

smg-0.4.0.tar.gz (1.4 MB): Source

Built Distributions

smg-0.4.0-cp38-abi3-win_amd64.whl (17.1 MB): CPython 3.8+, Windows x86-64
smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl (19.0 MB): CPython 3.8+, musllinux: musl 1.1+ x86-64
smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl (19.1 MB): CPython 3.8+, musllinux: musl 1.1+ ARM64
smg-0.4.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.7 MB): CPython 3.8+, manylinux: glibc 2.17+ x86-64
smg-0.4.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (19.0 MB): CPython 3.8+, manylinux: glibc 2.17+ ARM64
smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl (15.0 MB): CPython 3.8+, macOS 11.0+ ARM64
smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl (15.7 MB): CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file smg-0.4.0.tar.gz.

File metadata

  • Download URL: smg-0.4.0.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0.tar.gz
Algorithm Hash digest
SHA256 e2734b176372f8a6fe2c35ec3946f3ecdd599059577db63bbdf95717be43689e
MD5 11fd9694fc2102a0681af7586e2809ce
BLAKE2b-256 385a3d9587257bcbbef34bab3137257ed1d759ff4fd3577c06427340f1f2ac45

See more details on using hashes here.
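
One way to use these digests is to check a downloaded file before installing it. The sketch below is a minimal example using Python's standard `hashlib`; the function names `sha256_of` and `matches` are illustrative and not part of any SMG or PyPI tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage, assuming the sdist was downloaded into the current directory:
#   ok = matches(
#       Path("smg-0.4.0.tar.gz"),
#       "e2734b176372f8a6fe2c35ec3946f3ecdd599059577db63bbdf95717be43689e",
#   )
```

Reading in chunks keeps memory use flat for the larger wheel files. SHA256 is the digest to prefer here; MD5 is listed but is not suitable for integrity checks against tampering.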

File details

Details for the file smg-0.4.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: smg-0.4.0-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 17.1 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 e7e142ce7a40dc2e3badcfc81866b66cd9ff4bc57319dae70f8878c789dac80d
MD5 b3769d94df45ed7b810f864580dade44
BLAKE2b-256 6b0af320510fe766f3b30b71eb9aa52ddfe209c5d2292fc8db37c7aed04c4131

File details

Details for the file smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl.

File metadata

  • Download URL: smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl
  • Upload date:
  • Size: 19.0 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 6ee0ebce6f4b9c34539a058353b4be9eb82dce5476b79a0a9b8b234808c2393c
MD5 83904d99b6e2db6e4572bb18eaa88b8b
BLAKE2b-256 1c02e011f34acac2fe1907ae9037ddd080bb5cb61f27d2c75e7b6490340157a0

File details

Details for the file smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl.

File metadata

  • Download URL: smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl
  • Upload date:
  • Size: 19.1 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 99dcabe0c21715a6bcdf02629f477a20077295a869f3f94936b2f6ac4e888545
MD5 d48b58ac1ea180b2671f3b76c1bab481
BLAKE2b-256 7e00dcfc254d41929e3ced57caa717457a719f71a73e7abd436ada41e65309f3

File details

Details for the file smg-0.4.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for smg-0.4.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e50c6ce33b78d7f2948d87ff02b8899b876ca9c7e3870e016a71a17dc1dc8583
MD5 e4d83b4f3d3c2e20a137da2e969b8531
BLAKE2b-256 b35278ce2b1a74ab5a7fc6d09b0133fb6c17cf8bbc6979b643e6c600c23359d7

File details

Details for the file smg-0.4.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for smg-0.4.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 a11b6ee23659919a9930b02ccc977922c4064e39ae37acc314ed6e0158433c64
MD5 b7e5f588e330b7646edca3667070d89e
BLAKE2b-256 226693899d3e20b78c794ae9c98e9297e6c489537b64569d352a2913b54fe507

File details

Details for the file smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 15.0 MB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e71e48d374f706fb98248218aae92f0f9595a3f838762d000ea616871ef1b7ee
MD5 ec377bbc19b687e1dc14f13f56245ce8
BLAKE2b-256 2976b746a8dc20453c2d1f9062745ebd1c69f25cc71fe57f5f4fea4106cce1ef

File details

Details for the file smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 15.7 MB
  • Tags: CPython 3.8+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-0.4.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 19cf26cb2ab33d2f985b6ee864b9d999b21a475c18290f8780b51ccfd03c9ee0
MD5 2c0303b30e3d4b188312ef24e9e4b7ed
BLAKE2b-256 3af2c7a86d71c4f45d531ff00e2d2ad7c5360d77753127e5745ed3a0e5ebcf5f
