OpenAI-compatible API gateway for HPC clusters via Globus Compute
hpc-gateway
hpc-gateway exposes any vLLM-served model running on an HPC cluster (SLURM, PBS, etc.) as a standard OpenAI-compatible REST API. It handles authentication, rate limiting, payload size management, and real-time token streaming, so existing OpenAI clients only need to be pointed at the gateway's URL.
from hpc_gateway.compute import GlobusComputeClient

client = GlobusComputeClient(
    endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    models={
        "qwen25-vl-72b": {
            "hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
            "url": "http://ghi2-002:8000",
            "context_reserve_output": 4096,
        }
    },
)

# submit_inference is awaited, so call it from inside an async function.
result = await client.submit_inference(
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    model="qwen25-vl-72b",
)
Why
HPC clusters run the largest open-source LLMs (72B+ parameters) on GPU hardware that typical cloud users can't afford. But HPC infrastructure has no standard API surface — each cluster has its own SLURM scripts, SSH tunnels, and authentication systems. hpc-gateway provides a uniform OpenAI-compatible interface over any vLLM-served model, using Globus Compute for authentication and job dispatch (no open ports required on the HPC side).
Architecture
Your App / OpenAI Client
│ POST /v1/chat/completions
▼
hpc-gateway (FastAPI)
│ Globus Compute (AMQP — no HPC firewall holes)
▼
HPC Cluster (SLURM)
│ vLLM HTTP API (internal LAN)
▼
GPU Compute Node
│ tokens flow via WebSocket relay (streamrelay)
▼
hpc-gateway → SSE stream → Your App
Key design points:
- No open ports on HPC: Globus Compute is outbound-only from the cluster
- Real-time streaming: Tokens stream back via streamrelay WebSocket relay
- E2E encryption: Optional AES-256-GCM encryption between HPC and consumer (relay sees only ciphertext)
- OpenAI-compatible: Drop-in for any client using the OpenAI SDK
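The last hop in the diagram is an SSE stream back to the caller. As a minimal sketch, assuming the gateway follows the OpenAI streaming convention (each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`), a client could pull the text deltas out like this. `iter_sse_tokens` is a hypothetical helper for illustration, not part of hpc-gateway.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style SSE lines.

    Assumes each event is a single "data: <json>" line and that the
    stream ends with "data: [DONE]" (the OpenAI streaming convention).
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

Any iterable of decoded lines works as input, for example the line iterator of an HTTP response opened in streaming mode.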
Installation
# Base package (no Globus SDK)
pip install hpc-gateway
# With Globus Compute support
pip install "hpc-gateway[globus]"
Quickstart: Run as a service
Set environment variables and start:
export GLOBUS_COMPUTE_ENDPOINT_ID="your-endpoint-uuid"
export HPC_MODELS='{"qwen25-vl-72b": {"hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ", "url": "http://ghi2-002:8000", "context_reserve_output": 4096}}'
export RELAY_URL="wss://relay.example.com"
export RELAY_SECRET="your-relay-secret"
uvicorn hpc_gateway.app:app --host 0.0.0.0 --port 8001
The gateway is now reachable at http://localhost:8001/v1/chat/completions with the standard OpenAI API schema.
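To make the request shape concrete, here is a minimal sketch that builds a chat completion request for that URL using only the standard library. `build_chat_request` is a hypothetical helper, and sending the credential as a `Bearer` value in the `Authorization` header is an assumption; check hpc_gateway/auth.py for what the gateway actually expects.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(base_url: str, model: str, prompt: str,
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: credentials travel as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; in an OpenAI-style response the reply text sits under `choices[0].message.content`.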
Embed in an existing FastAPI app
from fastapi import FastAPI
from hpc_gateway.app import router
app = FastAPI()
app.include_router(router, prefix="/hpc")
Configuration reference
| Variable | Default | Description |
|---|---|---|
| GLOBUS_COMPUTE_ENDPOINT_ID | (none) | Globus endpoint UUID for the HPC cluster |
| HPC_MODELS | {} | JSON dict: model name → HPC config |
| RELAY_URL | (none) | WebSocket relay URL for token streaming |
| RELAY_SECRET | (none) | Shared secret for relay auth |
| RELAY_ENCRYPTION_KEY | (none) | AES-256 hex key for E2E encryption |
| USE_GLOBUS_COMPUTE | true | false to route directly via SSH tunnel |
| LAKESHORE_VLLM_ENDPOINT | http://localhost:8000 | Direct vLLM URL (SSH mode) |
| HPC_PROXY_HOST | 0.0.0.0 | Bind host |
| HPC_PROXY_PORT | 8001 | Bind port |
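The sketch below shows one way these variables and their documented defaults could be read into a typed object. `GatewaySettings` and `load_settings` are hypothetical names, not the package's own loader, and treating only the literal string "false" as disabling Globus Compute is an assumption.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class GatewaySettings:
    endpoint_id: Optional[str] = None
    models: dict = field(default_factory=dict)
    relay_url: Optional[str] = None
    use_globus_compute: bool = True
    vllm_endpoint: str = "http://localhost:8000"
    host: str = "0.0.0.0"
    port: int = 8001


def load_settings(env: Mapping[str, str] = os.environ) -> GatewaySettings:
    """Read the documented variables, falling back to the table's defaults."""
    return GatewaySettings(
        endpoint_id=env.get("GLOBUS_COMPUTE_ENDPOINT_ID"),
        models=json.loads(env.get("HPC_MODELS", "{}")),
        relay_url=env.get("RELAY_URL"),
        # Assumption: only "false" (any case) switches to direct SSH routing.
        use_globus_compute=env.get("USE_GLOBUS_COMPUTE", "true").strip().lower() != "false",
        vllm_endpoint=env.get("LAKESHORE_VLLM_ENDPOINT", "http://localhost:8000"),
        host=env.get("HPC_PROXY_HOST", "0.0.0.0"),
        port=int(env.get("HPC_PROXY_PORT", "8001")),
    )
```

Accepting the environment as a plain mapping keeps the loader easy to exercise with a dict instead of the real process environment.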
HPC_MODELS schema
{
  "my-model-name": {
    "hf_name": "org/ModelName",
    "url": "http://compute-node:8000",
    "context_reserve_output": 4096
  }
}
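A malformed HPC_MODELS value is easier to catch before the service starts. The following is a minimal sketch of a shape check, assuming all three keys shown above are required and that context_reserve_output is a positive integer; the source does not state either rule, and `validate_models` is a hypothetical helper.

```python
import json
from urllib.parse import urlparse

# Assumption: every key shown in the schema example is mandatory.
REQUIRED_KEYS = {"hf_name", "url", "context_reserve_output"}


def validate_models(raw: str) -> dict:
    """Parse an HPC_MODELS JSON string and check each entry's shape."""
    models = json.loads(raw)
    if not isinstance(models, dict):
        raise ValueError("HPC_MODELS must be a JSON object")
    for name, cfg in models.items():
        if not isinstance(cfg, dict):
            raise ValueError(f"{name}: entry must be an object")
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            raise ValueError(f"{name}: missing keys {sorted(missing)}")
        if urlparse(cfg["url"]).scheme not in ("http", "https"):
            raise ValueError(f"{name}: url must be http(s)")
        reserve = cfg["context_reserve_output"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(reserve, bool) or not isinstance(reserve, int) or reserve <= 0:
            raise ValueError(f"{name}: context_reserve_output must be a positive integer")
    return models
```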
Authentication
The gateway supports two auth modes (configured in hpc_gateway/auth.py):
- Globus token: Bearer token from Globus Auth, validated via introspection
- API key: Static key from the HPC_API_KEYS env var (comma-separated)
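For the API key mode, a check against the comma-separated list might look like the sketch below. It is not the code in hpc_gateway/auth.py: both helper names are hypothetical, and the `Bearer <key>` header format is an assumption, since the source does not say how the key is transmitted.

```python
import hmac
import os
from typing import List, Mapping, Optional


def load_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Split the comma-separated HPC_API_KEYS value, dropping blanks."""
    raw = env.get("HPC_API_KEYS", "")
    return [k.strip() for k in raw.split(",") if k.strip()]


def is_valid_api_key(authorization: Optional[str], keys: List[str]) -> bool:
    """Check an Authorization header value against the configured keys.

    Assumes the key arrives as "Bearer <key>".
    """
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    candidate = token.strip().encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences;
    # comparing against every key keeps the work independent of match position.
    matches = [hmac.compare_digest(candidate, k.encode("utf-8")) for k in keys]
    return any(matches)
```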
Development
git clone https://github.com/uicacer/hpc-gateway
cd hpc-gateway
uv sync --extra dev
uv run pytest
Related
- streamrelay — WebSocket relay for real-time token streaming from Globus Compute
- STREAM — Full tiered LLM routing system that uses hpc-gateway
License
Apache 2.0 — see LICENSE.
Citation
If you use hpc-gateway in research, please cite:
@software{nassar2025hpcgateway,
  author = {Nassar, Anas},
  title  = {hpc-gateway: OpenAI-compatible API gateway for HPC clusters via Globus Compute},
  year   = {2025},
  url    = {https://github.com/uicacer/hpc-gateway}
}