Higress Wasm plugins for GPUStack
GPUStack Higress Plugins
Higress Proxy-Wasm plugins for GPUStack, providing AI API traffic processing, observability, and enhanced gateway features.
Overview
This repository contains custom Higress Proxy-Wasm plugins designed for GPUStack, distributed as a Python package that includes pre-compiled Wasm plugins and a built-in HTTP file server for serving them.
Installation
pip install gpustack-higress-plugins
Requirements: Python >= 3.10
Available Plugins
- gpustack-token-usage - Collects token usage statistics and injects them into AI API responses. For streaming responses it reports time to first token, time per output token, and tokens per second; for non-streaming responses it reports tokens per second only. It also supports real client IP injection and path-based filtering.
- gpustack-set-header-pre-route - Automatically injects the route name and model name into HTTP request headers before routing, based on configurable path suffixes or prefixes.
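The three streaming metrics listed for gpustack-token-usage can be illustrated with a small calculation. This is a minimal sketch using common definitions of these metrics; the plugin's exact formulas are not documented here, and the `StreamTiming` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Timestamps (in seconds) and token count for one streaming response.

    Hypothetical structure for illustration; not the plugin's data model.
    """
    request_start: float    # when the request was forwarded upstream
    first_token_at: float   # when the first output token arrived
    last_token_at: float    # when the final output token arrived
    completion_tokens: int  # number of output tokens produced


def time_to_first_token(t: StreamTiming) -> float:
    """Delay between sending the request and receiving the first token."""
    return t.first_token_at - t.request_start


def time_per_output_token(t: StreamTiming) -> float:
    """Average gap between consecutive tokens after the first one."""
    if t.completion_tokens <= 1:
        return 0.0
    return (t.last_token_at - t.first_token_at) / (t.completion_tokens - 1)


def tokens_per_second(t: StreamTiming) -> float:
    """Output tokens divided by total elapsed time (assumed definition)."""
    elapsed = t.last_token_at - t.request_start
    return t.completion_tokens / elapsed if elapsed > 0 else 0.0
```

For a non-streaming response only the total elapsed time is observable, which is consistent with tokens per second being the only metric reported in that case.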
Usage
Start Plugin Server
# Start the built-in HTTP file server
gpustack-plugins start --port 8080
# Or with custom host
gpustack-plugins start --port 8080 --host 0.0.0.0
The server will be available at http://localhost:8080.
API Endpoints
# Health check
curl http://localhost:8080/
# Download a plugin
curl http://localhost:8080/wasm-plugins/gpustack-token-usage/1.0.0/plugin.wasm -o plugin.wasm
# Get metadata
curl http://localhost:8080/wasm-plugins/gpustack-token-usage/1.0.0/metadata.txt
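The same downloads can be scripted. The sketch below builds URLs following the path pattern visible in the curl examples (/wasm-plugins/<name>/<version>/<file>) and fetches a file with the standard library. The helper names `plugin_url` and `download_plugin` are hypothetical and are not part of the package.

```python
import urllib.request


def plugin_url(base_url: str, name: str, version: str,
               filename: str = "plugin.wasm") -> str:
    """Build a download URL following the path pattern in the curl examples."""
    return f"{base_url.rstrip('/')}/wasm-plugins/{name}/{version}/{filename}"


def download_plugin(base_url: str, name: str, version: str, dest: str) -> None:
    """Fetch plugin.wasm from a running plugin server and write it to dest."""
    url = plugin_url(base_url, name, version)
    with urllib.request.urlopen(url, timeout=10) as resp, open(dest, "wb") as out:
        out.write(resp.read())


if __name__ == "__main__":
    # Assumes the server from "Start Plugin Server" is listening on port 8080.
    download_plugin("http://localhost:8080", "gpustack-token-usage",
                    "1.0.0", "plugin.wasm")
```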
Python API
from fastapi import FastAPI
from gpustack_higress_plugins import create_app, router

# Embed in an existing FastAPI app
app = FastAPI()
app.include_router(router)

# Or create a standalone app
app = create_app()
Configure Higress WasmPlugin
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: gpustack-token-usage
  namespace: higress-system
spec:
  url: http://plugin-server:8080/wasm-plugins/gpustack-token-usage/1.0.0/plugin.wasm
  defaultConfig:
    realIPToHeader: x-gpustack-real-ip
Development
Prerequisites
- Go 1.24+
- Python 3.10+
- oras (brew install oras), required for fetching remote plugins
Build Plugins
# Install Python dependencies
make dev
# Build all plugins (local + remote, requires oras)
make build
# Build only local Go plugins (no oras required)
make -C extensions build-all
# Build specific plugin
make -C extensions build PLUGIN_NAME=gpustack-token-usage
If oras is not installed, make build will build local plugins only and print a warning.
Run Tests
# Test Go plugins
make test
# Test single plugin
make -C extensions test PLUGIN_NAME=gpustack-token-usage
Check Wheel Contents
make verify-whl
This target reports each expected plugin (from extensions/*/VERSION and remote_plugins.yaml) as ✓ present, ✗ missing, or version mismatch, and checks that manifest.json is included.
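The kind of check described here can be sketched with the standard library, since a wheel is a zip archive. This is not the project's verify-whl implementation. It assumes that plugins are stored inside the wheel as gpustack_higress_plugins/plugins/<name>/<version>/plugin.wasm, mirroring the server's URL layout; that path layout and the `check_wheel` helper are assumptions.

```python
import zipfile

# Assumed layout inside the wheel; only the plugins/ directory and
# manifest.json locations come from the project structure listing.
PLUGIN_ROOT = "gpustack_higress_plugins/plugins"
MANIFEST = "gpustack_higress_plugins/manifest.json"


def check_wheel(wheel, expected: dict[str, str]) -> dict[str, str]:
    """Classify each expected plugin as present, missing, or version mismatch.

    `wheel` is a path or file-like object; `expected` maps name -> version.
    """
    with zipfile.ZipFile(wheel) as whl:
        names = set(whl.namelist())

    report = {}
    for plugin, version in expected.items():
        prefix = f"{PLUGIN_ROOT}/{plugin}/"
        if f"{prefix}{version}/plugin.wasm" in names:
            report[plugin] = "present"
        elif any(n.startswith(prefix) for n in names):
            # The plugin is packaged, but under a different version directory
            report[plugin] = "version mismatch"
        else:
            report[plugin] = "missing"

    report["manifest.json"] = "present" if MANIFEST in names else "missing"
    return report
```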
Deployment
Kubernetes (recommended)
Deploy the plugin server as a separate service and reference it from WasmPlugin resources:
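# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpustack-higress-plugins
spec:
  # apps/v1 requires a selector matching the pod labels; the label value is an assumption
  selector:
    matchLabels:
      app: gpustack-higress-plugins
  template:
    metadata:
      labels:
        app: gpustack-higress-plugins
    spec:
      containers:
        - name: plugins
          image: gpustack/higress-plugins:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /
              port: 8080
          readinessProbe:
            httpGet:
              path: /
              port: 8080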
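Both probes issue an HTTP GET to / on port 8080, the same health check endpoint listed under API Endpoints. The sketch below performs an equivalent check from Python, which can be handy in smoke tests before applying the manifest. The `is_healthy` helper is hypothetical; it treats any status from 200 up to but not including 400 as success, matching how Kubernetes evaluates httpGet probes.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on base_url succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


if __name__ == "__main__":
    print(is_healthy("http://localhost:8080/"))
```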
Docker Image
# Build Docker image
make image
# Build with custom Go proxy
GOPROXY=https://goproxy.cn,direct make image
# Run standalone
docker run -p 8080:8080 gpustack/higress-plugins:latest
Project Structure
gpustack-higress-plugins/
├── extensions/ # Go plugin source code
│ ├── gpustack-token-usage/
│ │ ├── main.go
│ │ ├── go.mod
│ │ └── VERSION
│ ├── gpustack-set-header-pre-route/
│ ├── remote_plugins.yaml # Remote OCI plugin config
│ └── Makefile
├── gpustack_higress_plugins/ # Python package
│ ├── __init__.py
│ ├── main.py # CLI + FastAPI app factory
│ ├── server.py # /wasm-plugins router
│ ├── plugins/ # Compiled .wasm files (generated)
│ └── manifest.json # Plugin index (generated)
├── scripts/ # Build scripts
│ ├── generate_manifest.py
│ ├── generate_metadata.py
│ └── fetch_remote_plugins.py
├── Dockerfile
├── pyproject.toml
└── Makefile
Versioning
- Package version follows Semantic Versioning (MAJOR.MINOR.PATCH)
- Each plugin has its own version in extensions/<name>/VERSION
- Package version is set from the git tag at release time (placeholder 0.0.0 in development)
- RC releases (e.g. 0.2.0rc1) are published to TestPyPI; stable releases go to PyPI
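The RC versus stable routing rule can be expressed as a small classifier. This is a sketch of the rule as stated above, not the project's release tooling. It assumes version strings of the form MAJOR.MINOR.PATCH with an optional rcN suffix, and the `target_index` function is hypothetical.

```python
import re

# MAJOR.MINOR.PATCH, optionally followed by an rcN suffix (e.g. 0.2.0rc1)
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)(rc\d+)?")


def target_index(version: str) -> str:
    """Return the index a version would be published to under the stated rule."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return "TestPyPI" if match.group(4) else "PyPI"
```

For example, `target_index("0.2.0rc1")` yields "TestPyPI" and `target_index("0.2.0")` yields "PyPI".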
License
Apache License 2.0
File details
Details for the file gpustack_higress_plugins-0.2.0-py3-none-any.whl.
File metadata
- Download URL: gpustack_higress_plugins-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78fbf5404ce428e067d5b195f777e62287f93f7f1ea8676b2d98fa563c602a24 |
| MD5 | 76fcfd85018c84c72176982fe2e73df9 |
| BLAKE2b-256 | 681041021dcd75168465dd511b1f4902ea4536ed9cd58853e9363547d4226496 |