High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

[Figure: SMG architecture diagram]

Why SMG?

🚀 Maximize GPU Utilization: Cache-aware routing understands your inference engine's KV cache state (whether SGLang, vLLM, or TensorRT-LLM) to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend: Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed: Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control: Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability: 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation, so you know exactly what's happening at every layer.
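
Below is a rough sketch of the idea behind cache-aware routing. It is a hypothetical, simplified model and not SMG's implementation. It remembers which prompts each worker has served. When a new prompt's longest matching prefix covers at least `match_threshold` of the prompt, it goes to the worker holding that prefix. Otherwise it falls back to the least-loaded worker. The sketch compares characters rather than tokens. The names `CacheAwareRouter`, `match_threshold`, and the `load` counter are assumptions for illustration.

```python
from dataclasses import dataclass, field


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading substring of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


@dataclass
class CacheAwareRouter:
    """Hypothetical prefix-affinity router; a simplified model, not SMG's code."""

    workers: list[str]
    match_threshold: float = 0.5  # assumed: fraction of the prompt that must match
    history: dict[str, list[str]] = field(default_factory=dict)  # prompts served per worker
    load: dict[str, int] = field(default_factory=dict)  # requests routed so far per worker

    def __post_init__(self) -> None:
        for w in self.workers:
            self.history.setdefault(w, [])
            self.load.setdefault(w, 0)

    def route(self, prompt: str) -> str:
        # Find the worker whose earlier prompts share the longest prefix with this one.
        best_worker, best_len = None, 0
        for w in self.workers:
            for seen in self.history[w]:
                m = common_prefix_len(prompt, seen)
                if m > best_len:
                    best_worker, best_len = w, m
        if best_worker is not None and best_len >= self.match_threshold * len(prompt):
            chosen = best_worker  # long shared prefix: likely reusable KV cache there
        else:
            chosen = min(self.workers, key=lambda w: self.load[w])  # weak match: balance load
        self.history[chosen].append(prompt)
        self.load[chosen] += 1
        return chosen


router = CacheAwareRouter(["gpu1", "gpu2"])
router.route("System: you are helpful. User: hi")   # -> 'gpu1' (no history, least loaded)
router.route("System: you are helpful. User: bye")  # -> 'gpu1' (31 of 34 chars match)
router.route("Translate this text")                 # -> 'gpu2' (no shared prefix, least loaded)
```

Raising `match_threshold` trades cache reuse for more even load. A production policy would also bound the history size and evict stale prefixes.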

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --enable-mesh \
  --mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527
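
The high-availability command above joins SMG instances into a mesh that, per the Features list, uses the SWIM protocol. The sketch below simulates only SWIM's failure-detection round in a single process:

1. Ping the target directly.
2. If there is no ack, ask up to `k` other peers to ping it on your behalf.
3. If there is still no ack, mark the target as suspect.

The class, the `reachable` function, and `k` are hypothetical. The sketch does not reflect SMG's mesh code, ports, timeouts, or how membership updates are disseminated.

```python
import random
from typing import Callable

Reachable = Callable[[str, str], bool]  # reachable(src, dst) -> did the ping get an ack?


class SwimMember:
    """Hypothetical SWIM failure detector for one node (simulation only)."""

    def __init__(self, name: str, peers: list[str], k: int = 2, seed: int = 0):
        self.name = name
        self.peers = [p for p in peers if p != name]
        self.k = k  # number of helpers used for indirect probing
        self.status = {p: "alive" for p in self.peers}
        self.rng = random.Random(seed)

    def probe(self, target: str, reachable: Reachable) -> str:
        """Run one protocol period against `target` and return its new status."""
        # 1. Direct ping.
        if reachable(self.name, target):
            self.status[target] = "alive"
            return "alive"
        # 2. Indirect ping: ask up to k other peers to ping the target for us.
        helpers = [p for p in self.peers if p != target]
        for h in self.rng.sample(helpers, min(self.k, len(helpers))):
            if reachable(self.name, h) and reachable(h, target):
                self.status[target] = "alive"
                return "alive"
        # 3. No ack at all: suspect the target. Full SWIM declares it dead
        #    only after a suspicion timeout expires without a refutation.
        self.status[target] = "suspect"
        return "suspect"


nodes = ["a", "b", "c", "d"]


def reachable(src: str, dst: str) -> bool:
    # Simulated network: the a->b link is down and node d has crashed.
    return not (src == "a" and dst == "b") and dst != "d"


member = SwimMember("a", nodes)
print(member.probe("b", reachable))  # alive (acked through an indirect ping via c)
print(member.probe("d", reachable))  # suspect
```

The indirect-ping step is what lets SWIM tell a broken link between two peers apart from a crashed node.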

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
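
The same request can be built in Python with only the standard library. This sketch assumes the gateway listens on `http://localhost:30000`, as in the curl example. The helper name `build_chat_request` is hypothetical. The call that actually sends the request is commented out so the snippet runs without a live gateway.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:30000", "llama3", "Hello!")
# To send it against a running gateway:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```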

Supported Backends

Self-hosted: vLLM, SGLang, TensorRT-LLM, Ollama, any OpenAI-compatible server
Cloud providers: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI

Features

8 Routing Policies: cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline: Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration: Connect external tool servers via Model Context Protocol
High Availability: Mesh networking with SWIM protocol for multi-node deployments
Chat History: Pluggable storage (PostgreSQL, Oracle, Redis, or in-memory)
WASM Plugins: Extend with custom WebAssembly logic
Resilience: Circuit breakers, retries with backoff, rate limiting
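
Two of the listed policies, power_of_two and consistent_hashing, share their names with standard load-balancing techniques. The sketch below shows textbook versions of both:

- Power-of-two-choices samples two workers at random and picks the less loaded one.
- Consistent hashing places workers and request keys on a hash ring, so the same key keeps landing on the same worker.

These are generic illustrations, not SMG's code. They assume per-worker request counts as the load signal and a routing key such as a session ID. The names `power_of_two`, `HashRing`, and `vnodes` are hypothetical.

```python
import bisect
import hashlib
import random


def power_of_two(workers: list[str], load: dict[str, int], rng: random.Random) -> str:
    """Sample two distinct workers and return the one with fewer requests."""
    a, b = rng.sample(workers, 2)
    return a if load[a] <= load[b] else b


def _h(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes to smooth the key distribution."""

    def __init__(self, workers: list[str], vnodes: int = 100):
        self._ring = sorted((_h(f"{w}#{i}"), w) for w in workers for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[idx][1]


workers = ["gpu1", "gpu2", "gpu3"]
load = {"gpu1": 5, "gpu2": 0, "gpu3": 9}
rng = random.Random(7)
pick = power_of_two(workers, load, rng)  # never "gpu3": any partner it is paired with is less loaded
ring = HashRing(workers)
owner = ring.lookup("session-123")  # the same key maps to the same worker on every call
```

Power-of-two-choices needs only two load lookups per request yet avoids most hot spots. With consistent hashing, removing a worker remaps only the keys that worker owned; every other key keeps its assignment.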

Documentation

Getting Started: Installation and first steps
Architecture: How SMG works
Configuration: CLI reference and options
API Reference: OpenAI-compatible endpoints
Kubernetes Setup: In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Download files

Source Distribution

tokenspeed_smg-1.4.1.post20260514.tar.gz (2.1 MB)

Uploaded Source

Built Distributions

tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.0 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ x86-64

tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (23.2 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ ARM64
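
The wheel file names above follow the standard `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` layout from the Python packaging specifications, where a `.` joins alternative tags. Here `cp38-abi3` marks a build against CPython's stable ABI that is usable on CPython 3.8 and later. `manylinux_2_17` corresponds to glibc 2.17 or newer, which matches the labels shown. The standard-library sketch below splits such a name into its parts. `parse_wheel_name` is a hypothetical helper; the third-party `packaging` library provides a fuller `packaging.utils.parse_wheel_filename`.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tags: list[str]
    abi_tags: list[str]
    platform_tags: list[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    # A '.' joins alternative tags, e.g. two manylinux aliases for one platform.
    return WheelName(name, version, build, py.split("."), abi.split("."), plat.split("."))


w = parse_wheel_name(
    "tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(w.platform_tags)  # ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
```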

File details

Details for the file tokenspeed_smg-1.4.1.post20260514.tar.gz.

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514.tar.gz
Algorithm Hash digest
SHA256 cad530e2fc1f0d631c1b7e7dcf619087555b99eee421af5ab4fc4b927d0dea7f
MD5 f3d266f889f5684514a3abfb9055231b
BLAKE2b-256 0b0e017fcb7676a02ea1f04b4988f6bbfe5f610a9db2f8089b229b4f731ba19b
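
To check a download against the SHA256 digest listed above, hash the file locally and compare the hex strings. Below is a minimal standard-library sketch. `sha256_of` and `verify` are hypothetical helper names, and the file is assumed to be in the current directory. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lower-case, so normalize the expected value before comparing.
    return sha256_of(path) == expected_hex.strip().lower()


EXPECTED = "cad530e2fc1f0d631c1b7e7dcf619087555b99eee421af5ab4fc4b927d0dea7f"
# verify(Path("tokenspeed_smg-1.4.1.post20260514.tar.gz"), EXPECTED)
```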

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514.tar.gz:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c3e10954219dc9c450281e0aba3193f99d106148b41723585330d9ebfe184274
MD5 2209386416ae2a0688068462fc94cc3f
BLAKE2b-256 4ab809c731db09c34552e3ebd6315772d6f2e6eb616193390069f0c93a69a606

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party

File details

Details for the file tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c39cc70a66e036fba74d3c8a4e96633cfa0dc4a3ddad7717d0e687c74ef047aa
MD5 cb130e06c66be92854054c83e954c50e
BLAKE2b-256 efbf7d5d121192ba416a308f33221415a096d9b5e8e5165d4ff69643dd9e63e2

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party
