High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
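
To make the idea of cache-aware routing concrete, here is a minimal Python sketch of prefix-affinity selection. It is an illustration only and not SMG's implementation: the `pick_worker` function and the `history` bookkeeping are hypothetical, and it compares characters where a real gateway would work with tokens and account for cache eviction.

```python
from os.path import commonprefix  # character-wise longest common prefix


def pick_worker(prompt, history, workers):
    """Return the worker whose earlier prompts share the longest prefix with `prompt`.

    `history` maps a worker URL to the prompts already routed there. This is
    hypothetical bookkeeping used only for this sketch.
    """
    def best_overlap(worker):
        seen = history.get(worker, [])
        return max((len(commonprefix([prompt, old])) for old in seen), default=0)

    # max() returns the first maximal item, so list order breaks ties.
    return max(workers, key=best_overlap)


history = {
    "http://gpu1:8000": ["You are a helpful assistant. Summarize: A"],
    "http://gpu2:8000": ["Translate to French: hello"],
}
workers = ["http://gpu1:8000", "http://gpu2:8000"]
choice = pick_worker("You are a helpful assistant. Summarize: B", history, workers)
```

Here the new prompt shares its system-prompt prefix with a request already sent to `gpu1`, so that worker is chosen and its cached prefix can, in principle, be reused.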

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
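
The same request can be issued from Python using only the standard library. This is a sketch under the assumption that the gateway listens on `localhost:30000` as in the curl example; the helper name `build_chat_request` is made up for illustration, and the response shape is assumed to follow the OpenAI chat completions format.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for the gateway's /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("http://localhost:30000", "llama3", "Hello!")

# Sending needs a running gateway, so the call is left commented out:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```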

Supported Backends

Self-Hosted | Cloud Providers
vLLM | OpenAI
SGLang | Anthropic
TensorRT-LLM | Google Gemini
Ollama | AWS Bedrock
Any OpenAI-compatible server | Azure OpenAI

Features

Feature | Description
8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration | Connect external tool servers via Model Context Protocol
High Availability | Mesh networking with SWIM protocol for multi-node deployments
Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory
WASM Plugins | Extend with custom WebAssembly logic
Resilience | Circuit breakers, retries with backoff, rate limiting
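
As an illustration of one of the listed policies, the general "power of two choices" technique can be sketched in a few lines of Python. This shows the textbook algorithm, not SMG's code: the function name, the `in_flight` load map, and the use of in-flight request counts as the load signal are all assumptions made for this example.

```python
import random


def power_of_two(workers, in_flight, rng=random):
    """Sample two distinct workers and return the one with fewer in-flight requests."""
    if len(workers) == 1:
        return workers[0]
    first, second = rng.sample(workers, 2)
    return first if in_flight[first] <= in_flight[second] else second


# Hypothetical load snapshot: gpu1 is busy, gpu2 is idle.
in_flight = {"http://gpu1:8000": 5, "http://gpu2:8000": 0}
chosen = power_of_two(list(in_flight), in_flight)
```

With only two workers both are always sampled, so the idle one is picked. With many workers, comparing just two random candidates avoids scanning the whole pool while still steering traffic away from the busiest nodes.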

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

smg-1.4.1.tar.gz (1.7 MB), Source

Built Distributions

smg-1.4.1-cp38-abi3-win_amd64.whl (19.7 MB), CPython 3.8+, Windows x86-64
smg-1.4.1-cp38-abi3-musllinux_1_1_x86_64.whl (21.5 MB), CPython 3.8+, musllinux: musl 1.1+ x86-64
smg-1.4.1-cp38-abi3-musllinux_1_1_aarch64.whl (21.5 MB), CPython 3.8+, musllinux: musl 1.1+ ARM64
smg-1.4.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21.3 MB), CPython 3.8+, manylinux: glibc 2.17+ x86-64
smg-1.4.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (21.4 MB), CPython 3.8+, manylinux: glibc 2.17+ ARM64
smg-1.4.1-cp38-abi3-macosx_11_0_arm64.whl (17.3 MB), CPython 3.8+, macOS 11.0+ ARM64
smg-1.4.1-cp38-abi3-macosx_10_12_x86_64.whl (18.1 MB), CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file smg-1.4.1.tar.gz.

File metadata

  • Download URL: smg-1.4.1.tar.gz
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1.tar.gz
Algorithm Hash digest
SHA256 cca7a54faf066831aa621ca24ed7ce610b340fb47d7416c847351d6f8a680a95
MD5 0eea39628d4a10cdd36b599206799137
BLAKE2b-256 d06fcd29f085a99e44d1283c2c3aa1a3c728ba8df71e9818067513a5d532df5b
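
A downloaded file can be checked against its published SHA256 digest with Python's standard `hashlib` module. The sketch below is a generic verification helper, not part of SMG; the function names `sha256_of` and `matches` are chosen for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution into the current directory, `matches("smg-1.4.1.tar.gz", "cca7a54faf066831aa621ca24ed7ce610b340fb47d7416c847351d6f8a680a95")` should return `True` if the file is intact.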

File details

Details for the file smg-1.4.1-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: smg-1.4.1-cp38-abi3-win_amd64.whl
  • Size: 19.7 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 0fc6e51f1e5775a98d93f7ce78af31832f1750fb011ab885769ab762847ddd3c
MD5 399e260dfedc11ca72202c4ff3736965
BLAKE2b-256 d922ac1766198bc74adbd61534756ddd717431067033e9bc5d2f7fa95292ea35

File details

Details for the file smg-1.4.1-cp38-abi3-musllinux_1_1_x86_64.whl.

File metadata

  • Download URL: smg-1.4.1-cp38-abi3-musllinux_1_1_x86_64.whl
  • Size: 21.5 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 2c16b6cb5b20a48ebaaae05ecccf08196bed126f48a23573d12339c2649c600a
MD5 ab8a52882ff69ddb01d89e225dc18ff3
BLAKE2b-256 9df73be545ad7df24483650e6c60c4448c9b6598ce59141e0c94dc58af510365

File details

Details for the file smg-1.4.1-cp38-abi3-musllinux_1_1_aarch64.whl.

File metadata

  • Download URL: smg-1.4.1-cp38-abi3-musllinux_1_1_aarch64.whl
  • Size: 21.5 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 a1afbad2853aff556648daf9a7c20eac404ce457b647fadd26837ae35d357156
MD5 d4cc9068f8e27f6ccf792a18068de3f2
BLAKE2b-256 68f600dd810150e12f47cb2c1b12ceb175d4f4af7a03ce42f9bf164aced447e6

File details

Details for the file smg-1.4.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for smg-1.4.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4ad87fc2f1676ad983123362bd37194517edaaadf42b8668b13cfe5f35fc4ea6
MD5 5b263d4e5c7d5fb31a440942db6ec8c4
BLAKE2b-256 506535a98795c6e7cd1f87fc2b5b102b0e327bbc1bba602f4ab55a9311cf9cce

File details

Details for the file smg-1.4.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for smg-1.4.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ab8ecaecf408f7077012e8da4b24e99de49d755e8876b377793c4f2b63c79db5
MD5 b2e6ae8ad90d744368197d7982a986db
BLAKE2b-256 fbe9e668e7722b91e95c618e006c1c3e51d6d6de591063624525c52b0b41ed01

File details

Details for the file smg-1.4.1-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: smg-1.4.1-cp38-abi3-macosx_11_0_arm64.whl
  • Size: 17.3 MB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6fe23539cfacfabec6431a46611c718e03e01f7d625908b610820bb709994a18
MD5 48bbb18774f555d5aa87ba9c539dfc76
BLAKE2b-256 37764cfbd5f029d28fe46099bc6ded4ebbde59701f6794d11732a191880d001e

File details

Details for the file smg-1.4.1-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: smg-1.4.1-cp38-abi3-macosx_10_12_x86_64.whl
  • Size: 18.1 MB
  • Tags: CPython 3.8+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.4.1-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7e698d01af3d8204986727f216eb508be3344073e25838736107e4bb46b10ef8
MD5 76c860c880a0d5267d356c65744227c9
BLAKE2b-256 09a2b8171987860a4ac929f0192996c3334558fe82eac8fa235f66b64b97dcc9
