High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (whether SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
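
To make the cache-aware routing idea above concrete, here is a toy sketch in Python. It is not SMG's implementation (the gateway is written in Rust and its actual policy is not described here). The class name `ToyCacheAwareRouter`, the `min_match` threshold, and the use of integer lists as stand-in token IDs are all assumptions made for illustration. The sketch sends a prompt to the worker that has already seen the longest matching prefix, and falls back to the least-loaded worker when nothing matches.

```python
def shared_prefix_len(a, b):
    """Length of the common leading run of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyCacheAwareRouter:
    """Illustrative only: prefers the worker whose past prompts share the
    longest prefix with the new prompt (a proxy for KV cache reuse)."""

    def __init__(self, workers, min_match=1):
        self.history = {w: [] for w in workers}  # worker -> prompts routed to it
        self.load = {w: 0 for w in workers}      # worker -> number of requests
        self.min_match = min_match               # hypothetical threshold

    def best_match(self, worker, tokens):
        """Longest prefix shared between `tokens` and any prompt seen by `worker`."""
        return max(
            (shared_prefix_len(tokens, seen) for seen in self.history[worker]),
            default=0,
        )

    def route(self, tokens):
        tokens = tuple(tokens)
        scores = {w: self.best_match(w, tokens) for w in self.history}
        best = max(scores.values())
        if best >= self.min_match:
            # Cache hit: longest prefix wins, ties broken by lower load.
            candidates = [w for w in scores if scores[w] == best]
            worker = min(candidates, key=lambda w: self.load[w])
        else:
            # Cache miss: fall back to the least-loaded worker.
            worker = min(self.load, key=self.load.get)
        self.history[worker].append(tokens)
        self.load[worker] += 1
        return worker
```

A production router would bound the history (for example with a radix tree and eviction) and would weigh cache hits against load imbalance; this sketch omits both to keep the core idea visible.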

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
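
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the gateway listens on `localhost:30000` as in the curl example, and the helper names `build_chat_request` and `send_chat` are made up for this illustration rather than part of any SMG client.

```python
import json
import urllib.request

# Assumption: same host, port, and path as the curl example above.
GATEWAY_URL = "http://localhost:30000/v1/chat/completions"


def build_chat_request(model, user_message, url=GATEWAY_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model, user_message, timeout=30.0):
    """Send the request to the gateway and return the decoded JSON body."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a gateway running, `send_chat("llama3", "Hello!")` should return the parsed response. Because the endpoint is OpenAI-compatible, an existing OpenAI client pointed at the gateway's base URL is likely to work as well.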

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
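
As an illustration of one of the policies named above, the sketch below shows the general consistent hashing technique in Python. It is a generic textbook version, not SMG's `consistent_hashing` implementation. The `HashRing` class, the `replicas` parameter, and the session-style keys are assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash derived from SHA-256.

    Python's built-in hash() is salted per process, so it is unsuitable here.
    """
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, workers, replicas=100):
        self.replicas = replicas  # virtual nodes per worker
        self._points = []         # sorted list of (hash, worker)
        for worker in workers:
            self.add(worker)

    def add(self, worker):
        """Place `replicas` virtual nodes for the worker on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{worker}#{i}"), worker))

    def remove(self, worker):
        """Drop every virtual node that belongs to the worker."""
        self._points = [p for p in self._points if p[1] != worker]

    def lookup(self, key):
        """Return the worker owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("no workers on the ring")
        idx = bisect.bisect_right(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]
```

The useful property is stability: when a worker is removed, only the keys that mapped to it move, so requests for every other key keep landing on the same worker.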

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Download the file for your platform.

Source Distribution

| File | Size |
|---|---|
| smg-1.3.2.tar.gz | 1.6 MB |

Built Distributions

| File | Size | Python | Platform |
|---|---|---|---|
| smg-1.3.2-cp38-abi3-win_amd64.whl | 18.8 MB | CPython 3.8+ | Windows x86-64 |
| smg-1.3.2-cp38-abi3-musllinux_1_1_x86_64.whl | 20.6 MB | CPython 3.8+ | musllinux: musl 1.1+ x86-64 |
| smg-1.3.2-cp38-abi3-musllinux_1_1_aarch64.whl | 20.8 MB | CPython 3.8+ | musllinux: musl 1.1+ ARM64 |
| smg-1.3.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 20.4 MB | CPython 3.8+ | manylinux: glibc 2.17+ x86-64 |
| smg-1.3.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 20.6 MB | CPython 3.8+ | manylinux: glibc 2.17+ ARM64 |
| smg-1.3.2-cp38-abi3-macosx_11_0_arm64.whl | 16.6 MB | CPython 3.8+ | macOS 11.0+ ARM64 |
| smg-1.3.2-cp38-abi3-macosx_10_12_x86_64.whl | 17.3 MB | CPython 3.8+ | macOS 10.12+ x86-64 |
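
The wheel file names above follow the standard `{distribution}-{version}(-{build})-{python tag}-{abi tag}-{platform tag}.whl` layout. As a rough sketch of how to read them, the Python helper below splits a name into its parts. The `WheelName` type and `parse_wheel_filename` function are written for this illustration only. For simplicity they expand only the dot-compressed platform tags, and the `packaging` library's `packaging.utils.parse_wheel_filename` is the more robust choice in real code.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    # Several platform tags may be compressed into one field, joined by dots.
    return WheelName(dist, version, build, py, abi, tuple(plat.split(".")))
```

For the files listed here, `cp38` together with `abi3` indicates a wheel built against CPython's stable ABI, which is why a single file covers CPython 3.8 and later.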
