High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

Figure: SMG Architecture

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (whether vLLM, SGLang, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (vLLM, SGLang, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.
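
To make the cache-aware routing idea above concrete, here is a minimal Python sketch of prefix-affinity routing. It is not SMG's implementation: the `Worker` class, `pick_worker`, and the `min_match` threshold are hypothetical names chosen for illustration, and the "cache state" is approximated by remembering which prompts each worker has already seen.

```python
from dataclasses import dataclass, field
from typing import List


def shared_prefix_len(a: str, b: str) -> int:
    """Return the length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Worker:
    url: str
    # Prompts already routed here; a stand-in for real KV cache state (assumption).
    recent_prompts: List[str] = field(default_factory=list)
    in_flight: int = 0


def pick_worker(workers: List[Worker], prompt: str, min_match: int = 8) -> Worker:
    """Prefer the worker sharing the longest prefix; otherwise the least loaded."""
    def best_match(w: Worker) -> int:
        return max((shared_prefix_len(p, prompt) for p in w.recent_prompts), default=0)

    best = max(workers, key=best_match)
    if best_match(best) < min_match:
        # No useful cached prefix anywhere: fall back to load balancing.
        best = min(workers, key=lambda w: w.in_flight)
    best.recent_prompts.append(prompt)
    return best
```

Requests that share a long system prompt end up on the same worker, which is where prefix reuse would pay off; unrelated requests fall back to the least-loaded worker.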

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
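
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are illustrative, not part of the `smg` package, and the gateway address and model name are taken from the example rather than guaranteed defaults.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build a POST request for the gateway's /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, content: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    req = build_chat_request(base_url, model, content)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With a gateway running as in the Run step, `chat("http://localhost:30000", "llama3", "Hello!")` should return the parsed JSON response.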

Supported Backends

Self-Hosted | Cloud Providers
vLLM | OpenAI
SGLang | Anthropic
TensorRT-LLM | Google Gemini
Ollama | AWS Bedrock
Any OpenAI-compatible server | Azure OpenAI

Features

Feature | Description
8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration | Connect external tool servers via Model Context Protocol
High Availability | Mesh networking with SWIM protocol for multi-node deployments
Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory
WASM Plugins | Extend with custom WebAssembly logic
Resilience | Circuit breakers, retries with backoff, rate limiting
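
For readers unfamiliar with the circuit breaker pattern listed under Resilience, here is a generic Python sketch of the idea. It illustrates the general technique only and says nothing about how SMG implements it; the class name, `failure_threshold`, `reset_after`, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, then allows
    requests again once `reset_after` seconds have passed (half-open)."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit traffic again once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cool-down timer.
            self.opened_at = self.clock()
```

A router would call `allow_request()` before sending traffic to a worker and report the outcome with `record_success()` or `record_failure()`, skipping workers whose circuit is open.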

Documentation

Getting Started | Installation and first steps
Architecture | How SMG works
Configuration | CLI reference and options
API Reference | OpenAI-compatible endpoints
Kubernetes Setup | In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.
