High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep traffic flowing when a worker fails.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you can see what is happening at every layer.
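
To make the idea of cache-aware routing concrete, here is a minimal Python sketch of the general technique: send a request to the worker that has recently seen the longest matching prompt prefix, and fall back to the least-loaded worker when no prefix matches well. This is an illustration only and not SMG's implementation. The names `CacheAwareRouter`, `min_match`, and `history` are hypothetical, and a real gateway would compare token sequences rather than characters.

```python
from dataclasses import dataclass, field
from typing import List


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Worker:
    url: str
    active: int = 0  # in-flight requests (illustrative load signal)
    recent: List[str] = field(default_factory=list)  # prompts routed here


class CacheAwareRouter:
    """Hypothetical router: prefer prefix reuse, else pick the least-loaded worker."""

    def __init__(self, urls, min_match: float = 0.5, history: int = 64):
        self.workers = [Worker(u) for u in urls]
        self.min_match = min_match  # minimum matched fraction of the prompt
        self.history = history      # prompts remembered per worker

    def _best_match(self, worker: Worker, prompt: str) -> int:
        return max((shared_prefix_len(prompt, p) for p in worker.recent), default=0)

    def pick(self, prompt: str) -> Worker:
        candidate = max(self.workers, key=lambda w: self._best_match(w, prompt))
        ratio = self._best_match(candidate, prompt) / max(len(prompt), 1)
        if ratio < self.min_match:
            # Weak prefix match: balance load instead of chasing the cache.
            candidate = min(self.workers, key=lambda w: w.active)
        # Remember the prompt so later requests can match against it.
        candidate.recent = (candidate.recent + [prompt])[-self.history:]
        return candidate
```

Requests that share a long system prompt tend to land on the same worker, which is where a prefix cache pays off; unrelated prompts spread out by load.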

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg launch --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg launch --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg launch --worker-urls http://gpu1:8000 --enable-mesh \
  --mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
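
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the gateway returns an OpenAI-style chat completion body.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build the POST request for the OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, content: str, timeout: float = 30.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response shape (choices[0].message.content).
    return body["choices"][0]["message"]["content"]
```

With a gateway listening on port 30000 as in the curl example, `chat("http://localhost:30000", "llama3", "Hello!")` would return the model's reply as a string.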

Supported Backends

| Self-Hosted | Cloud Providers |
|---|---|
| vLLM | OpenAI |
| SGLang | Anthropic |
| TensorRT-LLM | Google Gemini |
| Ollama | AWS Bedrock |
| Any OpenAI-compatible server | Azure OpenAI |

Features

| Feature | Description |
|---|---|
| 8 Routing Policies | cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket |
| gRPC Pipeline | Native gRPC with streaming, reasoning extraction, and tool call parsing |
| MCP Integration | Connect external tool servers via Model Context Protocol |
| High Availability | Mesh networking with SWIM protocol for multi-node deployments |
| Chat History | Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory |
| WASM Plugins | Extend with custom WebAssembly logic |
| Resilience | Circuit breakers, retries with backoff, rate limiting |
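
For readers unfamiliar with the circuit breaker pattern listed under Resilience, the following Python sketch shows the general idea: after a number of consecutive failures the breaker opens and rejects traffic to a worker, then allows trial requests once a cooldown has elapsed. It illustrates the pattern only and does not describe SMG's internals; the class name, the parameters `failure_threshold` and `cooldown_s`, and the injectable `clock` are all assumptions made for this example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a request may be sent to the worker."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let trial requests through (half-open).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """A successful call closes the breaker and resets the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the breaker at the threshold."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check `allow()` before dispatching to a worker and report the outcome afterwards. This simplified version admits every request during the half-open window; production implementations usually limit that window to a small number of probes.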

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Download the file for your platform.

Source Distribution

tokenspeed_smg-1.4.1.post20260519.tar.gz (2.0 MB), source

Built Distributions

tokenspeed_smg-1.4.1.post20260519-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.7 MB), CPython 3.8+, manylinux (glibc 2.17+), x86-64

tokenspeed_smg-1.4.1.post20260519-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (22.9 MB), CPython 3.8+, manylinux (glibc 2.17+), ARM64

File details: tokenspeed_smg-1.4.1.post20260519.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | dbcd7af286efc5d675a6783b844ec00e98caa50410762cdbaae03df7b306c609 |
| MD5 | abfa614266303ce1fd5f366765d39579 |
| BLAKE2b-256 | 55409c7ce869da9f2f9647a8dedefa1724f655ac6be187f0d45bb477050f27b4 |

Provenance: attestation bundle published by tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party.

File details: tokenspeed_smg-1.4.1.post20260519-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 951ba44988aa30a25419bbf2d570ce0fd34dbb7d962dd7afdb453a67679afc31 |
| MD5 | b092b8ec641e876d602812e65b272c64 |
| BLAKE2b-256 | ae496f4b9084e3551030adbfb97ed4cded033ec7efacaa18a572b48e32cd0c81 |

Provenance: attestation bundle published by tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party.

File details: tokenspeed_smg-1.4.1.post20260519-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | ede7e9a56906af2f53a89da34f1b91658e91a7191c31910f3bd9509c12da2221 |
| MD5 | 2204a2a95059e57b70a3c1eafdd3706b |
| BLAKE2b-256 | 0103e2b2ee53d984bc27c0fd8758616966966f6da8680daad14f599d00b23921 |

Provenance: attestation bundle published by tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party.
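
A downloaded file can be checked against the SHA256 digests listed for each file. The following is a minimal sketch using Python's standard `hashlib` module; the function names `sha256_of` and `matches_published_digest` are hypothetical helpers introduced for this example.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, after downloading the source distribution, `matches_published_digest("tokenspeed_smg-1.4.1.post20260519.tar.gz", "dbcd7af286efc5d675a6783b844ec00e98caa50410762cdbaae03df7b306c609")` should return `True` for an unmodified file.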
