High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

Engine-agnostic, high-performance model-routing gateway for large-scale LLM deployments. SMG centralizes worker lifecycle management, balances traffic across HTTP, gRPC, and OpenAI-compatible backends, and provides enterprise-ready control over conversation-history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (vLLM, TensorRT-LLM, TokenSpeed, or SGLang) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (vLLM, TensorRT-LLM, TokenSpeed, SGLang) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep traffic flowing when individual workers fail.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
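Circuit breakers stop the gateway from sending traffic to a worker that keeps failing until it has had time to recover, which is what makes automatic failover possible. The sketch below is the textbook three-state breaker (closed, open, half-open). The class name, the consecutive-failure threshold, and the fixed cooldown are assumptions for illustration; they do not describe SMG's actual implementation or defaults. For brevity, the half-open state admits every request rather than a single probe.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.cooldown_s = cooldown_s                # how long an open circuit rejects traffic
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half_open"  # cooldown elapsed: let trial traffic through
        return "open"

    def allow_request(self) -> bool:
        # A router would skip workers whose breaker is open and fail over elsewhere.
        return self.state != "open"

    def record_success(self) -> None:
        # Any success, including a half-open trial, closes the circuit.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            # The trial failed: re-open and restart the cooldown.
            self.opened_at = self.clock()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


# Usage with a fake clock so the transitions are deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_s=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.state)  # open
now[0] = 10.0
print(breaker.state)  # half_open
breaker.record_success()
print(breaker.state)  # closed
```

Injecting the clock keeps the state machine free of sleeps, so the same object can be driven by real time in a router and by a counter in tests.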

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg launch --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg launch --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg launch --worker-urls http://gpu1:8000 --enable-mesh \
  --mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527
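With `--policy cache_aware`, the goal is to send each request to the worker most likely to already hold its prompt prefix in the KV cache. The sketch below approximates that idea: it remembers the prompts each worker has served, picks the worker with the longest shared prefix, and falls back to the least-loaded worker when the best match is too short. The class name, the `match_threshold` parameter, and the character-level matching are simplifying assumptions; a production router would typically match on tokens with a radix tree, and SMG's actual policy may differ.

```python
import os
from typing import Dict, List


class CacheAwareRouter:
    """Hypothetical prefix-affinity router (illustration, not SMG's implementation)."""

    def __init__(self, workers: List[str], match_threshold: float = 0.5) -> None:
        self.history: Dict[str, List[str]] = {w: [] for w in workers}  # prompts routed to each worker
        self.load: Dict[str, int] = {w: 0 for w in workers}            # in-flight requests per worker
        self.match_threshold = match_threshold

    def _best_prefix_len(self, worker: str, prompt: str) -> int:
        # Longest common prefix between this prompt and anything the worker has seen.
        best = 0
        for seen in self.history[worker]:
            best = max(best, len(os.path.commonprefix([seen, prompt])))
        return best

    def route(self, prompt: str) -> str:
        scores = {w: self._best_prefix_len(w, prompt) for w in self.history}
        chosen = max(scores, key=scores.get)
        match_ratio = scores[chosen] / max(len(prompt), 1)
        if match_ratio < self.match_threshold:
            # Little cached overlap anywhere: balance load instead.
            chosen = min(self.load, key=self.load.get)
        self.history[chosen].append(prompt)
        self.load[chosen] += 1
        return chosen

    def finish(self, worker: str) -> None:
        # Call when a request completes so the load counts stay accurate.
        self.load[worker] -= 1


router = CacheAwareRouter(["http://gpu1:8000", "http://gpu2:8000"])
system = "You are a helpful assistant. "
print(router.route(system + "What is Rust?"))          # http://gpu1:8000 (no history, least loaded)
print(router.route("Translate 'hello' to French."))    # http://gpu2:8000 (no overlap, least loaded)
print(router.route(system + "What is gRPC?"))          # http://gpu1:8000 (shares the system prompt)
```

The threshold trades cache reuse against balance: set it too low and one worker accumulates every request with a common system prompt; set it too high and the router degenerates into least-loaded balancing.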

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
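Because the gateway accepts the OpenAI Chat Completions format, any HTTP client can call it. The sketch below builds the same request as the `curl` example using only the Python standard library. It assumes the gateway listens on `localhost:30000` and serves a model named `llama3`, exactly as in that example; the helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
GATEWAY_URL = "http://localhost:30000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> str:
    """Send the request to a running gateway and return the first choice's text."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With the gateway running, `print(send(build_chat_request("llama3", "Hello!")))` prints the model's reply. Separating construction from sending keeps the request shape testable without a live server.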

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| TensorRT-LLM | Anthropic |
| TokenSpeed | Google Gemini |
| SGLang | AWS Bedrock |
| Ollama | Azure OpenAI |
| Any OpenAI-compatible server | Any OpenAI-compatible provider |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
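Two of the listed policies share their names with well-known load-balancing techniques. Power-of-two-choices samples two workers at random and sends the request to the less loaded one; consistent hashing maps a routing key onto a hash ring so a given key keeps landing on the same worker, and removing a worker only moves the keys that worker owned. The sketch below shows textbook versions of both. The function and class names, the use of in-flight request counts as the load signal, and the number of virtual nodes are assumptions; SMG's `power_of_two` and `consistent_hashing` policies may differ in detail.

```python
import bisect
import hashlib
import random
from typing import Dict, List


def power_of_two(load: Dict[str, int], rng: random.Random) -> str:
    """Pick two distinct workers at random (needs at least two); return the less loaded."""
    a, b = rng.sample(list(load), 2)
    return a if load[a] <= load[b] else b


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes to smooth the key distribution."""

    def __init__(self, workers: List[str], vnodes: int = 100) -> None:
        self._ring = sorted((_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


rng = random.Random(0)
print(power_of_two({"gpu1": 5, "gpu2": 1, "gpu3": 9}, rng))
ring = HashRing(["gpu1", "gpu2", "gpu3"])
print(ring.lookup("session-42") == ring.lookup("session-42"))  # True: same key, same worker
```

Sampling only two workers avoids scanning the whole fleet on every request while still steering traffic away from hot spots; virtual nodes keep each worker's share of the ring roughly even.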

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

| File | Size | Tags |
|---|---|---|
| smg-1.5.0.tar.gz | 2.1 MB | Source |

Built Distributions

| File | Size | Tags |
|---|---|---|
| smg-1.5.0-cp38-abi3-win_amd64.whl | 24.8 MB | CPython 3.8+, Windows x86-64 |
| smg-1.5.0-cp38-abi3-musllinux_1_1_x86_64.whl | 27.9 MB | CPython 3.8+, musllinux: musl 1.1+ x86-64 |
| smg-1.5.0-cp38-abi3-musllinux_1_1_aarch64.whl | 29.5 MB | CPython 3.8+, musllinux: musl 1.1+ ARM64 |
| smg-1.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 27.8 MB | CPython 3.8+, manylinux: glibc 2.17+ x86-64 |
| smg-1.5.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 29.5 MB | CPython 3.8+, manylinux: glibc 2.17+ ARM64 |
| smg-1.5.0-cp38-abi3-macosx_11_0_arm64.whl | 26.9 MB | CPython 3.8+, macOS 11.0+ ARM64 |
| smg-1.5.0-cp38-abi3-macosx_10_12_x86_64.whl | 26.0 MB | CPython 3.8+, macOS 10.12+ x86-64 |

File details

Details for the file smg-1.5.0.tar.gz.

File metadata

  • Download URL: smg-1.5.0.tar.gz
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0.tar.gz
Algorithm Hash digest
SHA256 184dbf57035513699fda7f7c0c8b5b3126711cdf1c68ad40a0d8116526f53a18
MD5 79becab4596019492b7266303977592f
BLAKE2b-256 815c3bdebfb8a24137b76896f4e4ef6d776fcd9b71018207a89493f12c73814e
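To check a downloaded file against these digests, compute its SHA256 locally and compare. The sketch below uses Python's standard `hashlib`; the expected digest is copied from the source distribution entry above, and the helper names `sha256_of` and `verify` are hypothetical. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 published above for smg-1.5.0.tar.gz.
EXPECTED_SDIST_SHA256 = "184dbf57035513699fda7f7c0c8b5b3126711cdf1c68ad40a0d8116526f53a18"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_hex.lower()
```

After downloading, `verify(Path("smg-1.5.0.tar.gz"), EXPECTED_SDIST_SHA256)` returns `True` only if the file matches the published digest.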


Details for the file smg-1.5.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: smg-1.5.0-cp38-abi3-win_amd64.whl
  • Size: 24.8 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1686aba758d8f25984f61cac80821c1765f7faa25fd2a37a38d544f0bb928c0d
MD5 ccb0dc7826d191d9e757340206411ce2
BLAKE2b-256 0ec856148e8dc46864c8ca8bce9cc44efc70faac7c61acc6d4254997490fafd5


Details for the file smg-1.5.0-cp38-abi3-musllinux_1_1_x86_64.whl.

File metadata

  • Download URL: smg-1.5.0-cp38-abi3-musllinux_1_1_x86_64.whl
  • Size: 27.9 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 f788c744694ce0601048475ae1ae43a3d09701ae7a769b1074ffe15635de0aac
MD5 0a97eea91def1527ff0bbe97bebd4ac6
BLAKE2b-256 89489e2dc1af17efaffc959ec5793c0c1f244917d952af42d0156918a087bec4


Details for the file smg-1.5.0-cp38-abi3-musllinux_1_1_aarch64.whl.

File metadata

  • Download URL: smg-1.5.0-cp38-abi3-musllinux_1_1_aarch64.whl
  • Size: 29.5 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 5778b726462653cf234807e3193f71b5ff5051a085e8e198906d63118770fd60
MD5 fe8af13c5c0f05111f3c15ba1559dea9
BLAKE2b-256 ab911a1084762c635067c935edb656c08ffe6022190c76aa8b6d1eafa1d288cb


Details for the file smg-1.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for smg-1.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1947ad50313789a0130a5d1fa60e3922112ca16be588d767edfc95a70599e181
MD5 dc5545638162e8e072ae6d7039ebb714
BLAKE2b-256 d4bb0dca1c6a973d4c2dfd634cba182bd4736bd2deb942ae33388d5ef96ce0c3


Details for the file smg-1.5.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for smg-1.5.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 758dd3348d781d927e90e3e4557bff5319fc45550e23e07943c12b111862b392
MD5 25702d3f492aa6e6d65fa0c42f319289
BLAKE2b-256 0653f02f9bcaf919b19ba401789850428fce0117a3f71cfcff05200343f2b6e7


Details for the file smg-1.5.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: smg-1.5.0-cp38-abi3-macosx_11_0_arm64.whl
  • Size: 26.9 MB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2f606595fa26e16eb5295f060e4336162019280ff91314f30b1e8bc926404300
MD5 5f1727199c3c9ce443ad554bca3b19bb
BLAKE2b-256 0ec87bebe7f6ac47475723dc92415284f09c63b715ff7a969eaafb45e6bf0c02


Details for the file smg-1.5.0-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: smg-1.5.0-cp38-abi3-macosx_10_12_x86_64.whl
  • Size: 26.0 MB
  • Tags: CPython 3.8+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.5.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f0f9b012645602246a7a64dfbe320c11075ec7b03bc7f429ed34669851ea4d70
MD5 e09bda4a0df0033a91cac6b77258df1d
BLAKE2b-256 4f73344f459c0cf8a1f6c335807cad3dbbab6ab323c0041710b060146d3c1af5
