High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
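
To make the cache-aware routing idea concrete, here is a minimal Python sketch. It is not SMG's implementation. It assumes the gateway remembers which prompts it has sent to each worker and prefers the worker whose history shares the longest prefix with the new prompt, since that worker is the most likely to hold reusable KV cache entries. The names `PrefixRouter` and `route` are hypothetical, and comparing characters rather than tokens is a simplification.

```python
from os.path import commonprefix


class PrefixRouter:
    """Toy cache-aware router: prefer the worker with the longest shared prompt prefix."""

    def __init__(self, workers):
        # Hypothetical bookkeeping: prompts previously routed to each worker.
        self.history = {worker: [] for worker in workers}

    def _best_match(self, worker, prompt):
        # Length of the longest prefix shared with any prompt this worker has seen.
        return max(
            (len(commonprefix([prompt, seen])) for seen in self.history[worker]),
            default=0,
        )

    def route(self, prompt):
        # Highest prefix overlap wins; ties go to the worker with the least history.
        worker = max(
            self.history,
            key=lambda w: (self._best_match(w, prompt), -len(self.history[w])),
        )
        self.history[worker].append(prompt)
        return worker


router = PrefixRouter(["http://gpu1:8000", "http://gpu2:8000"])
first = router.route("You are a helpful assistant. Summarize this report.")
second = router.route("Translate the following text to French.")
third = router.route("You are a helpful assistant. Draft an email.")
```

In this run the third prompt lands on the same worker as the first because they share a system-prompt prefix, while the unrelated second prompt goes to the idle worker. A production router would typically track tokens in a prefix tree and also weigh current load.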

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
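
The same request can be sent from Python using only the standard library. This is a sketch under the same assumptions as the curl example: the gateway listens on `localhost:30000` and a model named `llama3` is available. The helper names `build_chat_request` and `chat` are illustrative, and the reply parsing assumes the response follows the OpenAI chat completions shape.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:30000"  # same address as the curl example


def build_chat_request(prompt, model="llama3", base_url=GATEWAY_URL):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumes the OpenAI chat completions response shape.
    return body["choices"][0]["message"]["content"]
```

With the gateway running, `chat("Hello!")` should return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.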

Supported Backends

| Self-Hosted | Cloud Providers |
| --- | --- |
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
| --- | --- |
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
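
As an illustration of one of the listed policies, the sketch below shows the generic "power of two choices" technique that the name `power_of_two` suggests: sample two workers at random and send the request to the less loaded one. How SMG actually implements this policy may differ. The `in_flight` load map and the function name `pick_power_of_two` are hypothetical.

```python
import random


def pick_power_of_two(in_flight, rng=random):
    """Sample two distinct workers at random and return the less loaded one."""
    workers = list(in_flight)
    if not workers:
        raise ValueError("no workers available")
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if in_flight[a] <= in_flight[b] else b


# Hypothetical load snapshot: in-flight requests per worker.
in_flight = {"http://gpu1:8000": 7, "http://gpu2:8000": 2, "http://gpu3:8000": 5}
rng = random.Random(0)  # seeded only to make the demo repeatable
picks = [pick_power_of_two(in_flight, rng) for _ in range(1000)]
```

With this fixed snapshot the busiest worker is never chosen, because it loses every pairwise comparison, and the least loaded worker receives most of the traffic. The appeal of the technique is that it needs only two load lookups per request rather than a scan of every worker.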

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

smg-1.3.1.tar.gz (1.6 MB, Source)

Built Distributions

| File | Size | Tags |
| --- | --- | --- |
| smg-1.3.1-cp38-abi3-win_amd64.whl | 18.8 MB | CPython 3.8+, Windows x86-64 |
| smg-1.3.1-cp38-abi3-musllinux_1_1_x86_64.whl | 20.6 MB | CPython 3.8+, musllinux: musl 1.1+ x86-64 |
| smg-1.3.1-cp38-abi3-musllinux_1_1_aarch64.whl | 20.7 MB | CPython 3.8+, musllinux: musl 1.1+ ARM64 |
| smg-1.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 20.4 MB | CPython 3.8+, manylinux: glibc 2.17+ x86-64 |
| smg-1.3.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 20.6 MB | CPython 3.8+, manylinux: glibc 2.17+ ARM64 |
| smg-1.3.1-cp38-abi3-macosx_11_0_arm64.whl | 16.5 MB | CPython 3.8+, macOS 11.0+ ARM64 |
| smg-1.3.1-cp38-abi3-macosx_10_12_x86_64.whl | 17.3 MB | CPython 3.8+, macOS 10.12+ x86-64 |

File details

Details for the file smg-1.3.1.tar.gz.

File metadata

  • Download URL: smg-1.3.1.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1.tar.gz
Algorithm Hash digest
SHA256 4a746c30cb1446078326777f4432d2cdaac3aa06a3e5378f64c939f2b55efb40
MD5 37cfa3d6917df593c976dab92cc62ef5
BLAKE2b-256 93ff01eb060c75e635a8f8a3d06eac16377ca658ec503fde3d8142238a1f5e8e

See more details on using hashes here.
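
A downloaded file can be checked against its published digest before installation. The sketch below uses Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are illustrative, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of(path) == expected_hex.lower()


# Published SHA256 for smg-1.3.1.tar.gz, copied from the table above.
EXPECTED = "4a746c30cb1446078326777f4432d2cdaac3aa06a3e5378f64c939f2b55efb40"
# verify("smg-1.3.1.tar.gz", EXPECTED) should return True for an intact download.
```

Reading in chunks keeps memory use flat for the larger wheel files. The same function works for any of the distributions listed here by swapping in the matching SHA256 value.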

File details

Details for the file smg-1.3.1-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: smg-1.3.1-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 18.8 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 638758cfa1e45f676175fba4ad397a669eed1163e06f278f185634bfe9995be8
MD5 aa7e6b4e96c79c49df5fa0bed25e9d9d
BLAKE2b-256 aa18736f8e3f38bff62c76eec1087fa7700ea79ddda766d6fb6dd44f7a25f933

File details

Details for the file smg-1.3.1-cp38-abi3-musllinux_1_1_x86_64.whl.

File metadata

  • Download URL: smg-1.3.1-cp38-abi3-musllinux_1_1_x86_64.whl
  • Upload date:
  • Size: 20.6 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 5cac0ad74e953f0c871fc0ac443b376ff720e8f34c315dd39032e521b7137b52
MD5 1e802a683ecf9ce231aa4f23f05796dc
BLAKE2b-256 3a98bd2e7be5185d9ce81680ba3f7311f2caf16056253a1c70cf8650e7c2f6a3

File details

Details for the file smg-1.3.1-cp38-abi3-musllinux_1_1_aarch64.whl.

File metadata

  • Download URL: smg-1.3.1-cp38-abi3-musllinux_1_1_aarch64.whl
  • Upload date:
  • Size: 20.7 MB
  • Tags: CPython 3.8+, musllinux: musl 1.1+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 2338ed756eaff04e65911d1d3f79a3173ba1c55463f2717c4e279929962fc83e
MD5 aeba5b397a418c5ba3a5fc61d186a371
BLAKE2b-256 b9d6642e898a63d5610be6b101f6a3dd92289225982bc7da015814a729986da2

File details

Details for the file smg-1.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for smg-1.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cabf9ceeb650865aed8068930be501a1185bef407144b83fd812a779a09f5de6
MD5 c8ba6cb2bee8722f38b4949c9c3f6e20
BLAKE2b-256 f7d6e71f921b56baeb766a480991efad7216fd56283f13664a30ab300e0660ee

File details

Details for the file smg-1.3.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for smg-1.3.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5ca296945f9ea78403e50d4d82e3fefc06d0f35b251eb7c783ce17269dc79977
MD5 ac2d2c573e69b52975489283f2aef917
BLAKE2b-256 1b8ef5d367288f0c06855af94ba62acf9d7470a47c7212157f1921eede724f8b

File details

Details for the file smg-1.3.1-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: smg-1.3.1-cp38-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 16.5 MB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3d23eded605da7eceab798d85e73b34d38344f3d9857c355fee272ea0135dbe1
MD5 648f536cd98468837917a6bba008e7e2
BLAKE2b-256 bb8d0683ac1f9174cee7b9bb6e6b7f524d3fb6a5036ea9f118e740de2db689c2

File details

Details for the file smg-1.3.1-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

  • Download URL: smg-1.3.1-cp38-abi3-macosx_10_12_x86_64.whl
  • Upload date:
  • Size: 17.3 MB
  • Tags: CPython 3.8+, macOS 10.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for smg-1.3.1-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b76cd30400158e074cc0c8065656b6c9f147a7329ada4b44f934094a4ed55c6d
MD5 5bad1bacbda954ab99a53f1675b2303d
BLAKE2b-256 b87e4f24a48b505769dee8497af791f41449c9aa781e360ce588b33dd2560263
