High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (SGLang, vLLM, or TensorRT-LLM) so it can reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep traffic flowing when a worker fails.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you can see what is happening at every layer.
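
To make the idea of cache-aware routing concrete, here is a toy sketch in Python. It is not SMG's implementation (SMG is written in Rust, and its internal data structures are not described here); the class `ToyCacheAwareRouter` and the helper `shared_prefix_len` are hypothetical names invented for this illustration. The sketch assumes the router remembers the token sequences it sent to each worker and prefers the worker whose history shares the longest prefix with the new request, falling back to the least-used worker on a tie.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyCacheAwareRouter:
    """Hypothetical router that remembers the prompts sent to each worker."""

    def __init__(self, workers: List[str]) -> None:
        # Token sequences previously routed to each worker (illustrative only).
        self.history: Dict[str, List[List[int]]] = {w: [] for w in workers}

    def best_match(self, worker: str, tokens: Sequence[int]) -> int:
        """Longest prefix this worker has already seen for the given tokens."""
        return max(
            (shared_prefix_len(tokens, seen) for seen in self.history[worker]),
            default=0,
        )

    def route(self, tokens: Sequence[int]) -> str:
        # Prefer the longest shared prefix; break ties by fewest requests seen.
        worker = max(
            self.history,
            key=lambda w: (self.best_match(w, tokens), -len(self.history[w])),
        )
        self.history[worker].append(list(tokens))
        return worker
```

In this sketch, a request that shares a prefix with an earlier request lands on the same worker, which is where a real engine's KV cache would most likely still hold that prefix. A production router would also need eviction and load thresholds, which are omitted here.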

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
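
The same request can be sent from Python. The sketch below uses only the standard library and assumes the gateway is listening on `http://localhost:30000` with a model named `llama3`, exactly as in the curl example above. The function names `build_chat_request` and `send` are made up for this example and are not part of any SMG package.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running gateway):
#   reply = send(build_chat_request("http://localhost:30000", "llama3", "Hello!"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.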

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
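
As an illustration of one of the listed policies, the sketch below shows the general "power of two choices" technique in Python: sample two workers at random and send the request to the one with fewer in-flight requests. This is the textbook form of the algorithm, not a description of how SMG's `power_of_two` policy is implemented; the function name and the `in_flight` mapping are assumptions made for this example.

```python
import random
from typing import Dict, Optional


def power_of_two_choice(
    in_flight: Dict[str, int], rng: Optional[random.Random] = None
) -> str:
    """Pick two random workers and return the one with fewer in-flight requests."""
    if not in_flight:
        raise ValueError("at least one worker is required")
    rng = rng or random.Random()
    workers = list(in_flight)
    if len(workers) == 1:
        return workers[0]
    # Sample two distinct candidates, then compare their current load.
    a, b = rng.sample(workers, 2)
    return a if in_flight[a] <= in_flight[b] else b
```

Comparing only two candidates keeps each decision cheap while still steering traffic away from the busiest workers: a worker that is strictly more loaded than every other worker can never win the comparison.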

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

