High-performance Rust-based inference gateway for large-scale LLM deployments

Shepherd Model Gateway

High-performance model-routing gateway for large-scale LLM deployments. Centralizes worker lifecycle management, balances traffic across HTTP/gRPC/OpenAI-compatible backends, and provides enterprise-ready control over history storage, MCP tooling, and privacy-sensitive workflows.

Why SMG?


🚀 Maximize GPU Utilization	Cache-aware routing understands your inference engine's KV cache state—whether SGLang, vLLM, or TensorRT-LLM—to reuse prefixes and reduce redundant computation.
🔌 One API, Any Backend	Route to self-hosted models (SGLang, vLLM, TensorRT-LLM) or cloud providers (OpenAI, Anthropic, Gemini, Bedrock, and more) through a single unified endpoint.
⚡ Built for Speed	Native Rust with gRPC pipelines, sub-millisecond routing decisions, and zero-copy tokenization. Circuit breakers and automatic failover keep things running.
🔒 Enterprise Control	Multi-tenant rate limiting with OIDC, WebAssembly plugins for custom logic, and a privacy boundary that keeps conversation history within your infrastructure.
📊 Full Observability	40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs with request correlation—know exactly what's happening at every layer.
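The idea behind cache-aware routing can be illustrated with a small sketch: send each request to the worker that already holds the longest matching token prefix, so the engine can reuse that part of its KV cache. This is not SMG's implementation. The `Worker` class, the `cached` list, and `pick_worker` are hypothetical names for illustration, and a real router would also need eviction and load balancing.

```python
from dataclasses import dataclass, field


def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Worker:
    url: str
    # Token sequences this worker has served (hypothetical stand-in for its KV cache).
    cached: list[list[int]] = field(default_factory=list)

    def best_match(self, tokens: list[int]) -> int:
        """Longest prefix of `tokens` found among this worker's cached sequences."""
        return max((shared_prefix_len(tokens, c) for c in self.cached), default=0)


def pick_worker(workers: list[Worker], tokens: list[int]) -> Worker:
    """Route to the worker with the longest cached prefix; ties go to the first worker."""
    chosen = max(workers, key=lambda w: w.best_match(tokens))
    chosen.cached.append(tokens)  # the chosen worker will now hold this sequence
    return chosen
```

With two workers that have served different prompts, a new request sharing its first tokens with one of them lands on that worker, which is the effect the cache_aware policy is meant to achieve.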

API Coverage: OpenAI Chat/Completions/Embeddings, Responses API for agents, Anthropic Messages, and MCP tool execution.

Quick Start

Install — pick your preferred method:

# Docker
docker pull lightseekorg/smg:latest

# Python
pip install smg

# Rust
cargo install smg

Run — point SMG at your inference workers:

# Single worker
smg --worker-urls http://localhost:8000

# Multiple workers with cache-aware routing
smg --worker-urls http://gpu1:8000 http://gpu2:8000 --policy cache_aware

# With high availability mesh
smg --worker-urls http://gpu1:8000 --enable-mesh \
  --mesh-advertise-host 10.0.0.1 --mesh-peer-urls 10.0.0.2:39527

Use — send requests to the gateway:

curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'

That's it. SMG is now load-balancing requests across your workers.
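The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call above and assumes the gateway listens on `http://localhost:30000`, as in that example. The helper names `build_chat_request` and `chat` are illustrative and are not part of SMG.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the request to the gateway and return the decoded JSON response body."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `chat("http://localhost:30000", "llama3", "Hello!")` requires a running gateway with at least one worker. `build_chat_request` can be inspected offline.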

Supported Backends

Self-Hosted	Cloud Providers
vLLM	OpenAI
SGLang	Anthropic
TensorRT-LLM	Google Gemini
Ollama	AWS Bedrock
Any OpenAI-compatible server	Azure OpenAI

Features

Feature	Description
8 Routing Policies	cache_aware, round_robin, power_of_two, consistent_hashing, prefix_hash, manual, random, bucket
gRPC Pipeline	Native gRPC with streaming, reasoning extraction, and tool call parsing
MCP Integration	Connect external tool servers via Model Context Protocol
High Availability	Mesh networking with SWIM protocol for multi-node deployments
Chat History	Pluggable storage: PostgreSQL, Oracle, Redis, or in-memory
WASM Plugins	Extend with custom WebAssembly logic
Resilience	Circuit breakers, retries with backoff, rate limiting
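As an illustration of one of the listed policies, the general power-of-two-choices technique samples two workers at random and sends the request to the less loaded one. The sketch below shows that technique only. How SMG measures load and breaks ties is not stated here, so the `loads` mapping (worker URL to in-flight request count) is an assumption.

```python
import random


def power_of_two(loads: dict[str, int], rng: random.Random) -> str:
    """Sample two distinct workers at random and return the less loaded one."""
    if not loads:
        raise ValueError("no workers available")
    if len(loads) == 1:
        return next(iter(loads))
    a, b = rng.sample(sorted(loads), 2)
    return a if loads[a] <= loads[b] else b


if __name__ == "__main__":
    # Hypothetical in-flight request counts per worker.
    loads = {"http://gpu1:8000": 0, "http://gpu2:8000": 0, "http://gpu3:8000": 0}
    rng = random.Random(0)
    for _ in range(300):
        loads[power_of_two(loads, rng)] += 1
    print(loads)
```

Compared with purely random selection, comparing two candidates keeps worker loads much closer together while needing only two load lookups per request.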

Documentation


Getting Started	Installation and first steps
Architecture	How SMG works
Configuration	CLI reference and options
API Reference	OpenAI-compatible endpoints
Kubernetes Setup	In-cluster discovery and production setup

Contributing

We welcome contributions! See the Contributing Guide for details.

Release history

Version	Release date
1.7.0.post20260630	Jun 30, 2026
1.6.0.post20260623	Jun 22, 2026
1.5.0.post20260622	Jun 21, 2026
1.5.0.post20260621	Jun 21, 2026
1.5.0.post20260620	Jun 20, 2026
1.5.0.post20260618	Jun 18, 2026
1.5.0.post20260617	Jun 18, 2026
1.5.0.post20260612	Jun 12, 2026
1.4.1.post20260607	Jun 7, 2026
1.4.1.post20260527	May 27, 2026
1.4.1.post20260525	May 26, 2026
1.4.1.post20260519	May 19, 2026
1.4.1.post20260514 (this version)	May 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tokenspeed_smg-1.4.1.post20260514.tar.gz (2.1 MB)

Uploaded May 14, 2026. Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.0 MB)

Uploaded May 14, 2026. Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64

tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (23.2 MB)

Uploaded May 14, 2026. Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64

File details

Details for the file tokenspeed_smg-1.4.1.post20260514.tar.gz.

File metadata

Download URL: tokenspeed_smg-1.4.1.post20260514.tar.gz
Upload date: May 14, 2026
Size: 2.1 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514.tar.gz
Algorithm	Hash digest
SHA256	`cad530e2fc1f0d631c1b7e7dcf619087555b99eee421af5ab4fc4b927d0dea7f`
MD5	`f3d266f889f5684514a3abfb9055231b`
BLAKE2b-256	`0b0e017fcb7676a02ea1f04b4988f6bbfe5f610a9db2f8089b229b4f731ba19b`

See more details on using hashes here.
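A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using Python's standard `hashlib`. The function names are illustrative, and the file is assumed to be in the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For the source distribution, `verify("tokenspeed_smg-1.4.1.post20260514.tar.gz", "cad530e2fc1f0d631c1b7e7dcf619087555b99eee421af5ab4fc4b927d0dea7f")` should return `True` for an unmodified download.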

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514.tar.gz:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspeed_smg-1.4.1.post20260514.tar.gz
- Subject digest: cad530e2fc1f0d631c1b7e7dcf619087555b99eee421af5ab4fc4b927d0dea7f
- Sigstore transparency entry: 1531216273
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: lightseekorg/tokenspeed-third-party@de3961854b379c4bdb6775b85812aa73de20f7a4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lightseekorg
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: tokenspeed-smg.yml@de3961854b379c4bdb6775b85812aa73de20f7a4
- Trigger Event: workflow_dispatch

File details

Details for the file tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 14, 2026
Size: 23.0 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`c3e10954219dc9c450281e0aba3193f99d106148b41723585330d9ebfe184274`
MD5	`2209386416ae2a0688068462fc94cc3f`
BLAKE2b-256	`4ab809c731db09c34552e3ebd6315772d6f2e6eb616193390069f0c93a69a606`

See more details on using hashes here.

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: c3e10954219dc9c450281e0aba3193f99d106148b41723585330d9ebfe184274
- Sigstore transparency entry: 1531216483
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: lightseekorg/tokenspeed-third-party@de3961854b379c4bdb6775b85812aa73de20f7a4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lightseekorg
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: tokenspeed-smg.yml@de3961854b379c4bdb6775b85812aa73de20f7a4
- Trigger Event: workflow_dispatch

File details

Details for the file tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: May 14, 2026
Size: 23.2 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`c39cc70a66e036fba74d3c8a4e96633cfa0dc4a3ddad7717d0e687c74ef047aa`
MD5	`cb130e06c66be92854054c83e954c50e`
BLAKE2b-256	`efbf7d5d121192ba416a308f33221415a096d9b5e8e5165d4ff69643dd9e63e2`

See more details on using hashes here.

Provenance

The following attestation bundles were made for tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: tokenspeed-smg.yml on lightseekorg/tokenspeed-third-party

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenspeed_smg-1.4.1.post20260514-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: c39cc70a66e036fba74d3c8a4e96633cfa0dc4a3ddad7717d0e687c74ef047aa
- Sigstore transparency entry: 1531216370
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: lightseekorg/tokenspeed-third-party@de3961854b379c4bdb6775b85812aa73de20f7a4
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lightseekorg
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: tokenspeed-smg.yml@de3961854b379c4bdb6775b85812aa73de20f7a4
- Trigger Event: workflow_dispatch

tokenspeed-smg 1.4.1.post20260514
