gpuroutertest
Spin up OpenAI-compatible vLLM endpoints on your own AWS GPUs with a single command. gpuroutertest picks a GPU instance that fits the model, boots it from a Deep Learning AMI, downloads the weights, and hands you an endpoint URL + API key. Tear it down just as fast when you're done.
Built for developers who want to rapidly stand up a private inference endpoint for a Hugging Face model without hand-rolling EC2, security groups, and vLLM flags.
Install
pip install gpuroutertest
You need AWS credentials (aws configure, AWS_PROFILE, or SSO) with permission to manage EC2, plus GPU (G/P) instance quota in your region.
CLI quickstart
# See what's available
br list-models
# Deploy GLM-4 9B on a single A10G, wait until it's serving
br deploy glm49b --size small --region us-east-1 --profile myprofile
# List running deployments (source of truth = AWS tags, so keys are never lost)
br ps --region us-east-1
# Inspect / fetch endpoint / read boot logs
br status <api_key|instance-id> --region us-east-1
br endpoint <api_key|instance-id> --region us-east-1
br logs <api_key|instance-id> --region us-east-1
# Tear everything down (stops billing)
br delete <api_key|instance-id> --region us-east-1 -y
Every command takes --profile/-p and --region/-r. deploy also supports:
| Flag | Purpose |
|---|---|
| --size | small / medium / large tier from the registry |
| --hf-token | token for gated HF models (or set HF_TOKEN) |
| --cidr | restrict who can reach port 8000 (default 0.0.0.0/0) |
| --ttl | auto-terminate after N minutes (a cost guard for dev instances) |
| --timeout | minutes to wait for the health check (default 30) |
| --no-wait | return as soon as the instance is running |
Library usage
import gpuroutertest as br
dep = br.deploy("glm49b", size="small", region="us-east-1", profile="myprofile")
print(dep.endpoint_url, dep.api_key)
# ... use the OpenAI-compatible API at dep.endpoint_url with dep.api_key ...
br.destroy(dep.api_key, region="us-east-1")
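A deployment bills until it is destroyed, so a script that fails between deploy and destroy can leave an instance running. One way to guard against that is a context manager, sketched below. The helper name `temporary_deployment` is hypothetical and not part of gpuroutertest. It takes the deploy and destroy callables as arguments, so it assumes only the two calls shown above: a deploy function that returns an object with `api_key`, and a destroy function called as `destroy(api_key, region=...)`.

```python
from contextlib import contextmanager


@contextmanager
def temporary_deployment(deploy_fn, destroy_fn, model, *, region, **deploy_kwargs):
    """Deploy a model, yield the deployment, and always tear it down.

    Hypothetical helper (not part of gpuroutertest). deploy_fn and destroy_fn
    are expected to behave like br.deploy and br.destroy in the example above.
    """
    dep = deploy_fn(model, region=region, **deploy_kwargs)
    try:
        yield dep
    finally:
        # Runs even if the body raises, so the instance does not keep billing.
        destroy_fn(dep.api_key, region=region)
```

With the real library this would presumably be used as `with temporary_deployment(br.deploy, br.destroy, "glm49b", region="us-east-1", size="small", profile="myprofile") as dep:`. If the deploy call itself raises, there is nothing to hand to destroy, so br ps remains the way to find leftovers.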
Call the endpoint like any OpenAI server:
curl $ENDPOINT/chat/completions \
-H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
-d '{"model":"zai-org/glm-4-9b-chat-hf","messages":[{"role":"user","content":"Hello"}]}'
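The same request can be built from Python with only the standard library. The sketch below mirrors the curl call: same path, same headers, same JSON body. `build_chat_request` is a hypothetical helper name, not part of gpuroutertest, and it only builds the request without sending it.

```python
import json
import urllib.request


def build_chat_request(endpoint_url, api_key, model, prompt):
    """Build (but do not send) a chat completion request.

    Hypothetical helper mirroring the curl example: POST to
    <endpoint>/chat/completions with a bearer token and a JSON body.
    """
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req, timeout=60)` and parse the response with `json.load`. In the OpenAI chat completions response format, the generated text is at `reply["choices"][0]["message"]["content"]`.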
Models
Models live in sdk/gpuroutertest/registry.json. Each entry maps a key to a Hugging Face model id and per-size serving config (instance type, disk, vLLM flags). Add a model by adding an entry — no code changes required.
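The registry schema is not shown in this README, so the sketch below only illustrates the shape described above: a key that maps to a Hugging Face model id plus per-size serving config. Every field name (`hf_model`, `sizes`, `instance_type`, `disk_gb`, `vllm_args`) and every value for instance type, disk, and flags is an assumption, not the real format. Check registry.json before copying anything.

```python
# Hypothetical registry entry; the real field names in registry.json may differ.
EXAMPLE_REGISTRY = {
    "glm49b": {
        "hf_model": "zai-org/glm-4-9b-chat-hf",
        "sizes": {
            "small": {
                "instance_type": "g5.xlarge",  # assumed: an instance with one A10G
                "disk_gb": 100,  # assumed value
                "vllm_args": ["--max-model-len", "8192"],  # assumed flags
            },
        },
    },
}


def resolve(registry, key, size):
    """Return (hf_model, size_config) for a model key and size tier."""
    if key not in registry:
        raise KeyError(f"unknown model key: {key!r}")
    entry = registry[key]
    if size not in entry["sizes"]:
        raise KeyError(f"model {key!r} has no {size!r} tier")
    return entry["hf_model"], entry["sizes"][size]
```

A lookup such as `resolve(EXAMPLE_REGISTRY, "glm49b", "small")` returns the model id and the config for that tier, which is roughly the information a deploy call would need to pick an instance.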
Notes & limitations
- The security group opens port 8000; it defaults to the whole internet and is protected only by the API key. Pass --cidr to restrict access for any real use.
- Traffic is plain HTTP (no TLS). Put it behind a proxy/load balancer for anything beyond dev.
- Instance boot logs are available via br logs (EC2 console output). vLLM's own container logs require SSH/SSM, which are intentionally not provisioned.
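Because the API key is the only protection while port 8000 is open to 0.0.0.0/0, it can help to compute a narrow CIDR before deploying. The sketch below uses Python's standard ipaddress module. Both helper names are hypothetical, and the returned string is what you would pass to --cidr.

```python
import ipaddress


def single_host_cidr(ip):
    """Return the CIDR matching exactly one address (/32 for IPv4, /128 for IPv6)."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return f"{addr}/{addr.max_prefixlen}"


def is_open_to_world(cidr):
    """Return True if the CIDR covers every address, like the default 0.0.0.0/0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0
```

For example, `single_host_cidr("203.0.113.7")` yields a value that admits only that one address, and `is_open_to_world` can serve as a pre-deploy check in scripts that should never expose the endpoint publicly.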
License
MIT — see LICENSE.