
High-performance Rust-based inference gateway for large-scale LLM deployments


Shepherd Model Gateway


High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.


Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing uses your inference engine's KV cache state (vLLM, SGLang, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (vLLM, SGLang, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep requests flowing when workers fail.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you can see what is happening at every layer.
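
To make the cache-aware routing idea concrete, here is a minimal sketch of the general technique: send a request to the worker that has already seen the longest matching prompt prefix, and fall back to the least-loaded worker otherwise. This is not SMG's implementation. The `Worker` record, `pick_worker`, and the `min_match_ratio` threshold are hypothetical names chosen for illustration, and the sketch compares characters rather than tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record: a URL plus prompts it has served recently."""
    url: str
    cached_prompts: list = field(default_factory=list)
    active_requests: int = 0


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(workers, prompt, min_match_ratio=0.5):
    """Prefer the worker with the longest cached prefix, else the least loaded."""
    best, best_len = None, 0
    for w in workers:
        match = max(
            (shared_prefix_len(prompt, p) for p in w.cached_prompts), default=0
        )
        if match > best_len:
            best, best_len = w, match
    # Fall back to load balancing when the best match is too short to matter.
    if best is None or best_len < min_match_ratio * len(prompt):
        best = min(workers, key=lambda w: w.active_requests)
    # Remember the prompt so later requests with the same prefix land here.
    best.cached_prompts.append(prompt)
    return best
```

A real router would typically keep a prefix tree per worker and evict old entries. The linear scan here only shows the decision rule.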

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
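
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. It assumes the gateway listens on `localhost:30000` as in that example and that the response follows the usual OpenAI chat completion shape. `build_chat_request` and `send_chat` are illustrative helper names, not part of the `smg` package.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:30000"  # port taken from the curl example


def build_chat_request(model, user_message, base_url=GATEWAY_URL):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model, user_message, timeout=30.0):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(model, user_message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI chat completion response layout.
    return body["choices"][0]["message"]["content"]
```

With a gateway running, `send_chat("llama3", "Hello!")` should return the model's reply as a string.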

Supported Backends

Self-Hosted | Cloud Providers
vLLM | OpenAI
SGLang | Anthropic
TensorRT-LLM | Google Gemini
Ollama | AWS Bedrock
Any OpenAI-compatible server | Azure OpenAI

Features

Feature | Description
8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration | Connect external tool servers via Model Context Protocol
High Availability | Mesh networking with SWIM protocol for multi-node deployments
Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory
WASM Plugins | Extend with custom WebAssembly logic
Resilience | Circuit breakers, retries with backoff, rate limiting
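
For readers unfamiliar with the policy names above, the following sketch shows the textbook idea behind two of them, `power_of_two` and `round_robin`. It illustrates the general algorithms only and does not reproduce SMG's internals. The function names and the `loads` mapping (worker name to in-flight request count) are assumptions made for this example.

```python
import random


def power_of_two(loads, rng=random):
    """Sample two distinct workers and return the one with fewer in-flight requests."""
    names = list(loads)
    if len(names) == 1:
        return names[0]
    a, b = rng.sample(names, 2)
    return a if loads[a] <= loads[b] else b


def round_robin(names):
    """Yield workers in a fixed rotation, ignoring load."""
    i = 0
    while True:
        yield names[i % len(names)]
        i += 1
```

Power-of-two choices never selects the single busiest worker when loads differ, because that worker loses every pairwise comparison. It does this without scanning every worker on each request.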

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.


