gpuroutertest

Spin up OpenAI-compatible vLLM endpoints on your own AWS GPUs with a single command. gpuroutertest picks a GPU instance that fits the model, boots it from a Deep Learning AMI, downloads the weights, and hands you an endpoint URL + API key. Tear it down just as fast when you're done.

Built for developers who want to rapidly stand up a private inference endpoint for a Hugging Face model without hand-rolling EC2, security groups, and vLLM flags.

Install

pip install gpuroutertest

You need AWS credentials (aws configure, AWS_PROFILE, or SSO) with permission to manage EC2, plus GPU (G/P) instance quota in your region.

CLI quickstart

# See what's available
br list-models

# Deploy GLM-4 9B on a single A10G, wait until it's serving
br deploy glm49b --size small --region us-east-1 --profile myprofile

# List running deployments (source of truth = AWS tags, so keys are never lost)
br ps --region us-east-1

# Inspect / fetch endpoint / read boot logs
br status  <api_key|instance-id> --region us-east-1
br endpoint <api_key|instance-id> --region us-east-1
br logs     <api_key|instance-id> --region us-east-1

# Tear everything down (stops billing)
br delete  <api_key|instance-id> --region us-east-1 -y

Every command takes --profile/-p and --region/-r. deploy also supports:

Flag        Purpose
--size      small / medium / large tier from the registry
--hf-token  token for gated HF models (or set HF_TOKEN)
--cidr      restrict who can reach port 8000 (default 0.0.0.0/0)
--ttl       auto-terminate after N minutes (a cost guard for dev instances)
--timeout   minutes to wait for the health check (default 30)
--no-wait   return as soon as the instance is running
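
As a rough illustration of what a tighter --cidr value looks like, the sketch below turns a single IPv4 address into a one-host CIDR block using Python's standard ipaddress module. The helper name single_host_cidr is hypothetical and not part of gpuroutertest, and the address is from a documentation-only range.

```python
import ipaddress


def single_host_cidr(ip: str) -> str:
    """Return a CIDR block that matches exactly one IPv4 address.

    Hypothetical helper for illustration; not part of gpuroutertest.
    """
    # IPv4Address raises a ValueError subclass on malformed input.
    addr = ipaddress.IPv4Address(ip)
    # A /32 prefix covers a single IPv4 host.
    return f"{addr}/32"


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address; substitute your own public IP.
    print(single_host_cidr("203.0.113.7"))
```

The resulting string could then be passed to br deploy as the --cidr value, so that only that address can reach port 8000.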

Library usage

import gpuroutertest as br

dep = br.deploy("glm49b", size="small", region="us-east-1", profile="myprofile")
print(dep.endpoint_url, dep.api_key)

# ... use the OpenAI-compatible API at dep.endpoint_url with dep.api_key ...

br.destroy(dep.api_key, region="us-east-1")
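
Because a forgotten GPU instance keeps billing, it may be worth wrapping the deploy/destroy pair so teardown runs even when the code in between raises. The following is a minimal sketch, not part of the package: temporary_deployment is a hypothetical helper that takes the deploy and destroy callables as arguments and assumes only the call shapes shown above (deploy(model, region=..., ...) returning an object with api_key, and destroy(api_key, region=...)). It does not cover the case where deploy itself fails partway through.

```python
from contextlib import contextmanager


@contextmanager
def temporary_deployment(deploy, destroy, model, *, region, **deploy_kwargs):
    """Deploy on entry and always destroy on exit, even if the body raises.

    Hypothetical helper; `deploy` and `destroy` are passed in so the
    wrapper makes no assumptions beyond the calls shown in the example above.
    """
    dep = deploy(model, region=region, **deploy_kwargs)
    try:
        yield dep
    finally:
        # Mirrors the destroy(api_key, region=...) call from the example above.
        destroy(dep.api_key, region=region)


# Usage (needs gpuroutertest installed and AWS credentials configured):
#
#   import gpuroutertest as br
#   with temporary_deployment(br.deploy, br.destroy, "glm49b",
#                             size="small", region="us-east-1") as dep:
#       print(dep.endpoint_url)
```

For unattended scripts, the --ttl flag remains a useful second line of defence in case the process is killed before the finally block runs.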

Call the endpoint like any OpenAI server:

curl $ENDPOINT/chat/completions \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"zai-org/glm-4-9b-chat-hf","messages":[{"role":"user","content":"Hello"}]}'
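
The same request can be made from Python without extra dependencies. This is a sketch using only the standard library; build_chat_request and send are hypothetical helper names, and the endpoint is assumed to be the same base URL that the curl example stores in $ENDPOINT.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <endpoint>/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage, with dep taken from the library example:
#   req = build_chat_request(dep.endpoint_url, dep.api_key,
#                            "zai-org/glm-4-9b-chat-hf", "Hello")
#   print(send(req))
```

Separating request construction from sending keeps the first half easy to inspect or unit-test without a running instance.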

Models

Models live in sdk/gpuroutertest/registry.json. Each entry maps a key to a Hugging Face model id and per-size serving config (instance type, disk, vLLM flags). Add a model by adding an entry — no code changes required.
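
To make the shape of such an entry concrete, here is a purely illustrative sketch. Every field name in it (hf_model, sizes, instance_type, disk_gb, vllm_args) and every value other than the model key and Hugging Face id is an assumption, not the package's actual schema; consult registry.json itself before adding a model. The serving_config lookup helper is likewise hypothetical.

```python
import json

# HYPOTHETICAL entry: all field names and the disk/flag values are assumptions
# for illustration only; check registry.json in the package for the real schema.
EXAMPLE_REGISTRY = json.loads("""
{
  "glm49b": {
    "hf_model": "zai-org/glm-4-9b-chat-hf",
    "sizes": {
      "small": {
        "instance_type": "g5.xlarge",
        "disk_gb": 200,
        "vllm_args": ["--max-model-len", "8192"]
      }
    }
  }
}
""")


def serving_config(registry: dict, key: str, size: str) -> dict:
    """Look up the per-size config for a model key, with readable errors."""
    try:
        entry = registry[key]
    except KeyError:
        raise KeyError(f"unknown model key: {key!r}") from None
    try:
        return entry["sizes"][size]
    except KeyError:
        raise KeyError(f"model {key!r} has no size {size!r}") from None


if __name__ == "__main__":
    print(serving_config(EXAMPLE_REGISTRY, "glm49b", "small"))
```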

Notes & limitations

  • The security group opens port 8000 to the whole internet by default (0.0.0.0/0), so the API key is the only protection. Pass --cidr to restrict access for anything beyond a quick test.
  • Traffic is plain HTTP (no TLS). Put the endpoint behind a TLS-terminating proxy or load balancer for anything beyond development.
  • Instance boot logs are available via br logs (EC2 console output). vLLM's own container logs require SSH or SSM access, which is intentionally not provisioned.

License

MIT — see LICENSE.

Download files

Source distribution: gpuroutertest-0.1.0.tar.gz (19.8 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

Algorithm    Hash digest
SHA256       c08fabc19f84b4e8b6878e5cc7b8eee19fa97af5abcf6846be1bb90f2f640714
MD5          4b86ceaf466a3bf2bed382033c432849
BLAKE2b-256  730a5fa2faa203e631837f8e33cefe9549f7bb3dec89658ee672080e0d6b6237

Built distribution: gpuroutertest-0.1.0-py3-none-any.whl (19.3 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

Algorithm    Hash digest
SHA256       34005eb411b4cdac77bd010c6ec27e29f7d0eaf32eec4019ae1abadff6682033
MD5          a3dbf004646cb264176892ff1f812b80
BLAKE2b-256  2234427ffc1930ba59f4ad714d923cb9721b25b7ad0234bbd3c63e8d18329fe2