High-performance Rust-based inference gateway for large-scale LLM deployments

Project description

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG Architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (whether SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
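
To make the idea behind cache-aware routing concrete, here is a toy sketch in Python. It is not SMG's implementation (the real router is written in Rust and its internals are not described here); the class name, the `min_match_ratio` threshold, and the least-loaded fallback are all assumptions made for illustration. The sketch sends a request to the worker that has already seen the longest matching prompt prefix, on the theory that this worker is most likely to hold a reusable KV cache entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def shared_prefix_len(a: str, b: str) -> int:
    """Return the length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class ToyCacheAwareRouter:
    """Illustrative only: remembers prompts per worker and prefers prefix matches."""
    workers: List[str]
    min_match_ratio: float = 0.5  # hypothetical threshold, not an SMG option
    history: Dict[str, List[str]] = field(default_factory=dict)
    load: Dict[str, int] = field(default_factory=dict)

    def route(self, prompt: str) -> str:
        best_worker, best_len = None, 0
        # Find the worker whose past prompts share the longest prefix with this one.
        for worker in self.workers:
            for seen in self.history.get(worker, []):
                n = shared_prefix_len(prompt, seen)
                if n > best_len:
                    best_worker, best_len = worker, n
        # Without a strong enough match, fall back to the least-loaded worker.
        if best_worker is None or best_len < self.min_match_ratio * len(prompt):
            best_worker = min(self.workers, key=lambda w: self.load.get(w, 0))
        self.history.setdefault(best_worker, []).append(prompt)
        self.load[best_worker] = self.load.get(best_worker, 0) + 1
        return best_worker
```

Two requests that share a long system prompt land on the same worker, while an unrelated prompt is sent to whichever worker has handled fewer requests. A production router would match on tokens rather than characters and would evict old entries; both are omitted here for brevity.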

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
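
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the gateway listens on `localhost:30000` as in the curl example above; the helper names `build_chat_request` and `send_chat` are hypothetical and are not part of the `smg` package.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:30000"  # port taken from the curl example


def build_chat_request(model, user_message, base_url=GATEWAY_URL):
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model, user_message, timeout=30.0):
    """Send the request and return the decoded JSON response body."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a gateway running, `send_chat("llama3", "Hello!")` returns the parsed response. If the backend follows the OpenAI chat completions format, the reply text should be under `["choices"][0]["message"]["content"]`, though that shape is an assumption about the backend rather than something stated here.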

Supported Backends

Self-Hosted | Cloud Providers
vLLM | OpenAI
SGLang | Anthropic
TensorRT-LLM | Google Gemini
Ollama | AWS Bedrock
Any OpenAI-compatible server | Azure OpenAI

Features

Feature | Description
8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration | Connect external tool servers via Model Context Protocol
High Availability | Mesh networking with SWIM protocol for multi-node deployments
Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory
WASM Plugins | Extend with custom WebAssembly logic
Resilience | Circuit breakers, retries with backoff, rate limiting
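
For readers unfamiliar with the policy names, the sketch below shows the general technique behind two of them, round robin and power of two choices, in Python. These are textbook versions written for illustration; the class names and the in-flight counter are assumptions, and SMG's own policies may differ in detail.

```python
import itertools
import random


class RoundRobin:
    """Cycle through the workers in a fixed order."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(list(workers))

    def pick(self):
        return next(self._cycle)


class PowerOfTwo:
    """Sample two workers at random and pick the one with fewer in-flight requests."""

    def __init__(self, workers, rng=None):
        self.in_flight = {w: 0 for w in workers}
        self._rng = rng or random.Random()

    def pick(self):
        workers = list(self.in_flight)
        if len(workers) == 1:
            a = b = workers[0]
        else:
            a, b = self._rng.sample(workers, 2)
        chosen = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[chosen] += 1
        return chosen

    def done(self, worker):
        """Call when a request finishes so the load estimate stays current."""
        self.in_flight[worker] -= 1
```

Round robin ignores load entirely, which is fine when requests cost about the same. Power of two choices needs only two load lookups per request yet avoids piling work onto an already busy worker, which tends to matter when generation lengths vary widely.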

Documentation

Getting Started | Installation and first steps
Architecture | How SMG works
Configuration | CLI reference and options
API Reference | OpenAI-compatible endpoints
Kubernetes Setup | In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

smg-1.3.3.tar.gz (1.6 MB): Source

Built Distributions

smg-1.3.3-cp38-abi3-win_amd64.whl (18.9 MB): CPython 3.8+, Windows x86-64
smg-1.3.3-cp38-abi3-musllinux_1_1_x86_64.whl (20.7 MB): CPython 3.8+, musllinux: musl 1.1+ x86-64
smg-1.3.3-cp38-abi3-musllinux_1_1_aarch64.whl (20.8 MB): CPython 3.8+, musllinux: musl 1.1+ ARM64
smg-1.3.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.4 MB): CPython 3.8+, manylinux: glibc 2.17+ x86-64
smg-1.3.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (20.7 MB): CPython 3.8+, manylinux: glibc 2.17+ ARM64
smg-1.3.3-cp38-abi3-macosx_11_0_arm64.whl (16.6 MB): CPython 3.8+, macOS 11.0+ ARM64
smg-1.3.3-cp38-abi3-macosx_10_12_x86_64.whl (17.4 MB): CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file smg-1.3.3.tar.gz.

File metadata

  • Download URL: smg-1.3.3.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3.tar.gz
Algorithm Hash digest
SHA256 674c7010b3dd3948dc7ba3cf872d09fa80a59af52e48e73b1973b337565d608f
MD5 943ae999cf876a8d76a8bb03c3b5d1f2
BLAKE2b-256 07218a9156329a0fa539bf6d596f0ead12e4727889811825610f227387838197
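
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the expected value is the source distribution digest copied from the table.

```python
import hashlib

# SHA256 of smg-1.3.3.tar.gz, copied from the hash table above.
EXPECTED_SHA256 = "674c7010b3dd3948dc7ba3cf872d09fa80a59af52e48e73b1973b337565d608f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large wheels are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify("smg-1.3.3.tar.gz")` on a local copy of the archive should return `True` if the download is intact. For the wheels, pass the matching digest from the corresponding table as `expected`.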

File details

Details for the file smg-1.3.3-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: smg-1.3.3-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 18.9 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 61647955d7baa0dfb96d217ef16bb23d56dc0f7ed87adfcd8bb15780d04d7cec
MD5 b00558b0670b0f397813907d5e9644e0
BLAKE2b-256 dcf5412730554e81e3ac9c59aaad547e9baf47407f74b14dbb55c31aa7036bfe

File details

Details for the file smg-1.3.3-cp38-abi3-musllinux_1_1_x86_64.whl.

File metadata

  • Download URL: smg-1.3.3-cp38-abi3-musllinux_1_1_x86_64.whl
  • Upload date:
  • Size: 20.7 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 9dd359a6b1e5e94027876d2e5f7445d46dfc5532b96f823da2077876632ccff4
MD5 a029fa118d975340569c2a5c053e343d
BLAKE2b-256 df6c29bb5c8127fadfa18f6cf80d5d35df30b72840dc57d2e36ee5c98419a34b

File details

Details for the file smg-1.3.3-cp38-abi3-musllinux_1_1_aarch64.whl.

File metadata

  • Download URL: smg-1.3.3-cp38-abi3-musllinux_1_1_aarch64.whl
  • Upload date:
  • Size: 20.8 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 906ab482121053bee6c121b302b77c06d4fb0e1ee8e4dd158c06d1bd77782ddd
MD5 9b8735672c412930021086457cd6d3a4
BLAKE2b-256 6e9059b02f8329cfc71ee4011c7a4423d55c00f73e951530490f37d8b514efcc

File details

Details for the file smg-1.3.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for smg-1.3.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e572e8079ab9a68b18b207200ccae0e7f06f11fd707f9c21e1728acbdc236412
MD5 edc467d9e89246dbff60ccac1e139683
BLAKE2b-256 424eacf146a573c5ed99256cb662fa9a3a86eb0a4a918f5927e05195a282f95e

File details

Details for the file smg-1.3.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for smg-1.3.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8babd2b8942b1f888fac97a62ec975a20e6ae3a3b0cf53c98e72f42e521acb16
MD5 4ff8bb7f6015a0858d0cade36e89b772
BLAKE2b-256 eebc4a15305f04fa7daa568afaddb490b0acd6e5405aa15ed5db4feb092412e4

File details

Details for the file smg-1.3.3-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: smg-1.3.3-cp38-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 16.6 MB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 de602841f5a3fb562d61dc6108e40f791e2cb849e4648a87251ac8e79aa4a0a2
MD5 78e7115a260748579e55fa92c2b576b0
BLAKE2b-256 d7243aa4e85c53100db0e5cdf3fa553821066d19ab66e0b31042fa049db77076

File details

Details for the file smg-1.3.3-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: smg-1.3.3-cp38-abi3-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 17.4 MB
  • Tags: CPython 3.8+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.3-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b278552ed7b58aae164d3c56b23f4382e4909612443d2d56e709c48b099da73b
MD5 e9487b76946e595e671ae34e80f27523
BLAKE2b-256 8584cc7caeb3241c2c96997acae89647c967a44c0821a5a45c53317bc0d531c5
