
High-performance Rust-based inference gateway for large-scale LLM deployments


Shepherd Model Gateway


High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture diagram]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (whether SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you can see what is happening at every layer.
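
To make the cache-aware idea concrete, here is a minimal sketch of prefix-based worker selection. It is an illustration only, not SMG's implementation: the function names, the token-ID representation, and the tie-breaking rule are all assumptions chosen for the example.

```python
from typing import Dict, List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(prompt: Sequence[int], cached: Dict[str, List[Sequence[int]]]) -> str:
    """Return the worker whose cached prompts share the longest prefix with `prompt`.

    `cached` is a hypothetical map of worker URL -> token sequences assumed to
    be in that worker's KV cache. Ties (including the no-match case) go to the
    worker with the fewest cached prompts, a crude stand-in for load; any
    remaining tie goes to the lexicographically smallest URL.
    """
    def score(url: str):
        best = max((shared_prefix_len(prompt, p) for p in cached[url]), default=0)
        return (-best, len(cached[url]), url)

    return min(cached, key=score)
```

A request whose prompt starts with tokens a worker has already processed lands on that worker, so the shared prefix does not need to be recomputed.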

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --ha-mesh --seeds 10.0.0.2:30001,10.0.0.3:30001

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
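
The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_chat_request` and `send` are made up for this example, and the gateway address and model name are the ones used in the curl command.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the gateway's /v1/chat/completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running gateway on localhost:30000):
#   reply = send(build_chat_request("http://localhost:30000", "llama3", "Hello!"))
```

Building the request separately from sending it makes the payload easy to inspect or unit-test without a live gateway.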

Supported Backends

| Self-Hosted | Cloud Providers |
| --- | --- |
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
| --- | --- |
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
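
As an illustration of one of the listed policies, the general "power of two choices" technique samples two workers at random and routes to the less loaded one. The sketch below shows that general technique under stated assumptions; it is not taken from SMG's source, and the `loads` map (worker URL to in-flight request count) and function name are hypothetical.

```python
import random
from typing import Dict, Optional


def power_of_two(loads: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Sample two distinct workers at random and return the less loaded one.

    `loads` maps a worker URL to its current in-flight request count
    (an assumed load signal). With a single worker, that worker is returned.
    An empty `loads` raises ValueError from the sampling call.
    """
    rng = rng or random
    workers = list(loads)
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if loads[a] <= loads[b] else b
```

Compared with scanning every worker for the global minimum, this needs only two load lookups per request, and the most loaded worker is never chosen when at least two workers exist.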

Documentation

| Guide | Description |
| --- | --- |
| Getting Started | Installation and first steps |
| Architecture | How SMG works |
| Configuration | CLI reference and options |
| API Reference | OpenAI-compatible endpoints |
| Kubernetes Setup | In-cluster discovery and production setup |

Contributing

We welcome contributions! See the Contributing Guide for details.


