High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
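
The circuit breakers and automatic failover mentioned above can be illustrated with a small sketch. This is not SMG's implementation (the gateway itself is written in Rust). The names `CircuitBreaker` and `call_with_failover`, and the threshold and cooldown parameters, are hypothetical and only show the general pattern: stop sending traffic to a worker after repeated failures, then try it again after a cooldown.

```python
import time


class CircuitBreaker:
    """Tracks consecutive failures for one worker (illustrative only)."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, allow trial requests again (half-open).
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cooldown.
            self.opened_at = self.clock()


def call_with_failover(workers, breakers, send):
    """Try workers in order, skipping any whose circuit is open."""
    for worker in workers:
        breaker = breakers[worker]
        if not breaker.allow_request():
            continue
        try:
            result = send(worker)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return worker, result
    raise RuntimeError("no healthy worker available")
```

Here `send` stands in for whatever function forwards a request to a worker. Injecting `clock` keeps the cooldown logic testable without sleeping.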

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001
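
To give a rough intuition for what a cache-aware policy does, here is a minimal Python sketch. It is an assumption-laden simplification, not SMG's algorithm: it compares raw characters rather than tokens, keeps an unbounded per-worker history, and uses a hypothetical `PrefixRouter` class with a made-up `min_match_ratio` threshold. The idea is to send a prompt to the worker that has already seen the longest matching prefix, and otherwise fall back to the least-loaded worker.

```python
import os


class PrefixRouter:
    """Routes prompts to the worker most likely to hold a matching prefix."""

    def __init__(self, workers, min_match_ratio=0.5):
        self.history = {w: [] for w in workers}  # prompts previously sent to each worker
        self.load = {w: 0 for w in workers}      # number of requests routed so far
        self.min_match_ratio = min_match_ratio

    def _best_match(self, worker, prompt):
        # Length of the longest prefix shared with any prompt this worker has seen.
        return max(
            (len(os.path.commonprefix([prompt, seen])) for seen in self.history[worker]),
            default=0,
        )

    def route(self, prompt):
        matches = {w: self._best_match(w, prompt) for w in self.history}
        best = max(matches, key=matches.get)
        if prompt and matches[best] / len(prompt) >= self.min_match_ratio:
            chosen = best  # enough of the prompt is likely cached there
        else:
            chosen = min(self.load, key=self.load.get)  # fall back to least-loaded
        self.history[chosen].append(prompt)
        self.load[chosen] += 1
        return chosen
```

With two workers, prompts that share a long system prompt tend to land on the same worker, while unrelated prompts are spread by load.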

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
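
The same request can be sent from Python using only the standard library. This is a sketch that assumes the gateway listens on `http://localhost:30000` as in the curl example. The helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of any SMG client library.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request equivalent to the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_chat_request(build_chat_request("http://localhost:30000", "llama3", "Hello!"))` should return the decoded JSON body. If the backend follows the OpenAI chat completions format, the reply text would typically be found under `choices[0].message.content`.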

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
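
Two of the policy names listed above correspond to well-known load-balancing techniques. The sketch below shows the textbook versions of power-of-two-choices and consistent hashing in Python. How SMG implements `power_of_two` and `consistent_hashing` internally is not described here, so the function and class names, the `replicas` parameter, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib
import random


def power_of_two(loads, rng=random):
    """Sample two distinct workers and return the one with the lower load.

    `loads` maps worker name -> in-flight request count (needs >= 2 workers).
    """
    first, second = rng.sample(sorted(loads), 2)
    return first if loads[first] <= loads[second] else second


class ConsistentHashRing:
    """Maps keys to workers so that most keys keep their worker when the set changes."""

    def __init__(self, workers, replicas=64):
        # Place several virtual points per worker on the ring to even out the spread.
        self._ring = sorted(
            (self._hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]
```

Power-of-two-choices avoids scanning every worker while still steering away from hot spots. Consistent hashing keeps a given key (for example a session or user ID) on the same worker, and removing a worker only remaps the keys that worker owned.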

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.
