HTTP gateway for any HPC function via Globus Compute + WebSocket relay

hpc-as-api

OpenAI-compatible API gateway for HPC clusters via Globus Compute.

hpc-as-api exposes any vLLM-served model running on an HPC cluster (SLURM, PBS, etc.) as a standard OpenAI-compatible REST API. It handles authentication, rate limiting, payload size management, and real-time token streaming — so your existing OpenAI clients work without modification.

import asyncio

from hpc_as_api.compute import GlobusComputeClient

client = GlobusComputeClient(
    endpoint_id="8d978809-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    models={
        "qwen25-vl-72b": {
            "hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
            "url": "http://ghi2-002:8000",
            "context_reserve_output": 4096,
        }
    },
)


async def main():
    # submit_inference is awaited, so it has to run inside an event loop
    result = await client.submit_inference(
        messages=[{"role": "user", "content": "Explain quantum entanglement."}],
        model="qwen25-vl-72b",
    )
    print(result)


asyncio.run(main())

Why

HPC clusters run the largest open-source LLMs (72B+ parameters) on GPU hardware that typical cloud users can't afford. But HPC infrastructure has no standard API surface — each cluster has its own SLURM scripts, SSH tunnels, and authentication systems. hpc-as-api provides a uniform OpenAI-compatible interface over any vLLM-served model, using Globus Compute for authentication and job dispatch (no open ports required on the HPC side).

Architecture

Relay architecture: the HPC compute node and gateway consumer both connect outbound to the WebSocket relay, traversing firewalls without VPN or inbound ports.

Your App / OpenAI Client
        │  POST /v1/chat/completions
        ▼
  hpc-as-api (FastAPI)
        │  Globus Compute (AMQP — no HPC firewall holes)
        ▼
  HPC Cluster (SLURM)
        │  vLLM HTTP API (internal LAN)
        ▼
  GPU Compute Node
        │  tokens flow via WebSocket relay (streamrelay)
        ▼
  hpc-as-api → SSE stream → Your App

Key design points:

No open ports on HPC: Globus Compute is outbound-only from the cluster
Real-time streaming: Tokens stream back via streamrelay WebSocket relay
E2E encryption: Optional AES-256-GCM encryption between HPC and consumer (relay sees only ciphertext)
OpenAI-compatible: Drop-in for any client using the OpenAI SDK
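
On the consumer side, the last hop in the diagram is an SSE stream. The sketch below shows one way a client might reassemble text from it. It assumes the gateway emits the usual OpenAI chat-completion chunk format (`data: {...}` lines ending with `data: [DONE]`), which the README implies but does not spell out. `iter_sse_tokens` and the sample lines are hypothetical and hand-written, not part of hpc-as-api and not captured gateway output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blank lines and anything else.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # OpenAI-compatible streams end with a literal [DONE] sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_sse_tokens(sample))
```

In practice the official OpenAI SDK does this parsing for you when `stream=True` is set; the sketch is only meant to show what travels over the wire.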

Installation

# Base package (no Globus SDK)
pip install hpc-as-api

# With Globus Compute support
pip install "hpc-as-api[globus]"

Quickstart: Run as a service

Set environment variables and start:

export GLOBUS_COMPUTE_ENDPOINT_ID="your-endpoint-uuid"
export HPC_MODELS='{"qwen25-vl-72b": {"hf_name": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ", "url": "http://ghi2-002:8000", "context_reserve_output": 4096}}'
export RELAY_URL="wss://relay.example.com"
export RELAY_SECRET="your-relay-secret"

uvicorn hpc_as_api.app:app --host 0.0.0.0 --port 8001

The gateway is now reachable at http://localhost:8001/v1/chat/completions with the standard OpenAI API schema.
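
As a rough illustration of what a client sends to that endpoint, the sketch below builds a chat-completions request with only the standard library. `build_chat_request` is a hypothetical helper, not part of hpc-as-api, and sending credentials as a `Bearer` header is an assumption for the API-key case (the README only states it for Globus tokens).

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(base_url: str, model: str, prompt: str,
                       bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if bearer_token:
        # Assumption: credentials are passed as a Bearer token.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8001", "qwen25-vl-72b", "Explain quantum entanglement."
)
# To send it against a running gateway:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

An existing OpenAI SDK client should need only its base URL pointed at `http://localhost:8001/v1` to do the same thing.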

Embed in an existing FastAPI app

from fastapi import FastAPI
from hpc_as_api.app import router

app = FastAPI()
app.include_router(router, prefix="/hpc")

Configuration reference

| Variable | Default | Description |
|---|---|---|
| `GLOBUS_COMPUTE_ENDPOINT_ID` | (none) | Globus endpoint UUID for the HPC cluster |
| `HPC_MODELS` | `{}` | JSON dict: model name → HPC config |
| `RELAY_URL` | (none) | WebSocket relay URL for token streaming |
| `RELAY_SECRET` | (none) | Shared secret for relay auth |
| `RELAY_ENCRYPTION_KEY` | (none) | AES-256 hex key for E2E encryption |
| `USE_GLOBUS_COMPUTE` | `true` | `false` to route directly via SSH tunnel |
| `LAKESHORE_VLLM_ENDPOINT` | `http://localhost:8000` | Direct vLLM URL (SSH mode) |
| `HPC_PROXY_HOST` | `0.0.0.0` | Bind host |
| `HPC_PROXY_PORT` | `8001` | Bind port |
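
To make the defaults concrete, here is a minimal sketch of how a deployment script might resolve these variables before launching the gateway. It is not the gateway's own loader: `load_settings` and `new_encryption_key` are hypothetical names, and treating only the string `true` (case-insensitive) as true is an assumption about how `USE_GLOBUS_COMPUTE` is parsed.

```python
import os
import secrets
from typing import Mapping, Optional

# Defaults copied from the configuration table.
DEFAULTS = {
    "HPC_MODELS": "{}",
    "USE_GLOBUS_COMPUTE": "true",
    "LAKESHORE_VLLM_ENDPOINT": "http://localhost:8000",
    "HPC_PROXY_HOST": "0.0.0.0",
    "HPC_PROXY_PORT": "8001",
}
# Variables the table lists without a default.
NO_DEFAULT = (
    "GLOBUS_COMPUTE_ENDPOINT_ID",
    "RELAY_URL",
    "RELAY_SECRET",
    "RELAY_ENCRYPTION_KEY",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    settings.update({name: env.get(name) for name in NO_DEFAULT})
    # Assumption: only "true" (any case) enables Globus Compute routing.
    settings["USE_GLOBUS_COMPUTE"] = settings["USE_GLOBUS_COMPUTE"].strip().lower() == "true"
    settings["HPC_PROXY_PORT"] = int(settings["HPC_PROXY_PORT"])
    return settings


def new_encryption_key() -> str:
    """Return 32 random bytes as 64 hex characters, the size of an AES-256 key."""
    return secrets.token_hex(32)


settings = load_settings({"USE_GLOBUS_COMPUTE": "false", "HPC_PROXY_PORT": "9000"})
```

A value from `new_encryption_key()` has the right length for `RELAY_ENCRYPTION_KEY`; presumably the same key has to be configured on both the HPC side and the consumer side.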

HPC_MODELS schema

{
  "my-model-name": {
    "hf_name": "org/ModelName",
    "url": "http://compute-node:8000",
    "context_reserve_output": 4096
  }
}
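
Because `HPC_MODELS` is passed as a JSON string, a typo tends to surface only at request time. The sketch below checks the value up front. It assumes all three fields shown in the schema are required, which the README does not state, and `parse_hpc_models` is a hypothetical helper rather than part of the package.

```python
import json

# Assumption: every field shown in the schema is required.
REQUIRED_FIELDS = {"hf_name": str, "url": str, "context_reserve_output": int}


def parse_hpc_models(raw: str) -> dict:
    """Parse an HPC_MODELS JSON string and check each entry's fields."""
    models = json.loads(raw)
    if not isinstance(models, dict):
        raise ValueError("HPC_MODELS must be a JSON object keyed by model name")
    for name, config in models.items():
        if not isinstance(config, dict):
            raise ValueError(f"{name}: model config must be a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(config.get(field), expected_type):
                raise ValueError(
                    f"{name}: '{field}' must be present and of type {expected_type.__name__}"
                )
    return models


models = parse_hpc_models(
    '{"my-model-name": {"hf_name": "org/ModelName", '
    '"url": "http://compute-node:8000", "context_reserve_output": 4096}}'
)
```

Running a check like this before `uvicorn` starts turns a malformed variable into an immediate, readable error.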

Authentication

The gateway supports two auth modes (configured in hpc_as_api/auth.py):

Globus token: Bearer token from Globus Auth, validated via introspection
API key: Static key from HPC_API_KEYS env var (comma-separated)
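
The API-key mode can be pictured roughly as follows. This is a sketch of the general technique, not the contents of `hpc_as_api/auth.py`: `load_api_keys` and `is_valid_api_key` are hypothetical names, and the constant-time comparison is a common precaution rather than something the README confirms.

```python
import hmac
from typing import List


def load_api_keys(raw: str) -> List[str]:
    """Split a comma-separated HPC_API_KEYS value, dropping empty entries."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_api_key(presented: str, allowed: List[str]) -> bool:
    """Check a presented key against every allowed key."""
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in allowed:
        # compare_digest avoids leaking how many leading characters matched.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched


keys = load_api_keys("alpha-key, beta-key,")
```

The loop deliberately checks every key instead of returning on the first match, so the time taken does not depend on which key matched.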

Development

git clone https://github.com/uicacer/hpc-as-api
cd hpc-as-api
uv sync --extra dev
uv run pytest

Related

streamrelay: WebSocket relay for real-time token streaming from Globus Compute
STREAM: Full tiered LLM routing system that uses hpc-as-api

License

Apache 2.0 — see LICENSE.

Citation

If you use hpc-as-api in research, please cite:

@software{nassar2025hpcgateway,
  author = {Nassar, Anas},
  title  = {hpc-as-api: OpenAI-compatible API gateway for HPC clusters via Globus Compute},
  year   = {2025},
  url    = {https://github.com/uicacer/hpc-as-api}
}


