Spin up OpenAI-compatible vLLM endpoints on your own AWS GPUs with one command.

gpuroutertest

Spin up OpenAI-compatible vLLM endpoints on your own AWS GPUs with a single command. gpuroutertest picks a GPU instance that fits the model, boots it from a Deep Learning AMI, downloads the weights, and hands you an endpoint URL + API key. Tear it down just as fast when you're done.

Built for developers who want to rapidly stand up a private inference endpoint for a Hugging Face model without hand-rolling EC2, security groups, and vLLM flags.

Install

pip install gpuroutertest

You need AWS credentials (aws configure, AWS_PROFILE, or SSO) with permission to manage EC2, plus GPU (G/P) instance quota in your region.

CLI quickstart

# See what's available
br list-models

# Deploy GLM-4 9B on a single A10G, wait until it's serving
br deploy glm49b --size small --region us-east-1 --profile myprofile

# List running deployments (source of truth = AWS tags, so keys are never lost)
br ps --region us-east-1

# Inspect / fetch endpoint / read boot logs
br status  <api_key|instance-id> --region us-east-1
br endpoint <api_key|instance-id> --region us-east-1
br logs     <api_key|instance-id> --region us-east-1

# Tear everything down (stops billing)
br delete  <api_key|instance-id> --region us-east-1 -y

Every command takes --profile/-p and --region/-r. deploy also supports:

Flag	Purpose
`--size`	`small` / `medium` / `large` tier from the registry
`--hf-token`	token for gated HF models (or set `HF_TOKEN`)
`--cidr`	restrict who can reach port 8000 (default `0.0.0.0/0`)
`--ttl`	auto-terminate after N minutes — a cost guard for dev instances
`--timeout`	minutes to wait for the health check (default 30)
`--no-wait`	return as soon as the instance is running
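
The `--timeout` and `--no-wait` flags suggest that `deploy` polls a health check until the server answers or the deadline passes. The sketch below shows one way such a wait loop could be written; it is not gpuroutertest's actual implementation. `wait_until_healthy` and `probe` are hypothetical names, and the 10-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_minutes: float = 30,
    interval_seconds: float = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or timeout_minutes has elapsed.

    probe is any zero-argument callable, for example one that sends an
    HTTP request to the endpoint and reports whether it succeeded.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_minutes * 60
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

A caller that passes `--no-wait` would skip this loop and check later with `br status`.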

Library usage

import gpuroutertest as br

dep = br.deploy("glm49b", size="small", region="us-east-1", profile="myprofile")
print(dep.endpoint_url, dep.api_key)

# ... use the OpenAI-compatible API at dep.endpoint_url with dep.api_key ...

br.destroy(dep.api_key, region="us-east-1")
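
Because a running GPU instance keeps billing until it is destroyed, it can help to tie teardown to a `with` block. The following is a minimal sketch, not part of the package: `temporary_endpoint` is a hypothetical helper, and it assumes only the `deploy(...)` and `destroy(api_key, region=...)` calls shown above. The two functions are passed in as arguments so the helper can be exercised with stand-ins.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(deploy, destroy, model, **deploy_kwargs):
    """Deploy on entry and always destroy on exit, even if the body raises.

    deploy and destroy are expected to behave like br.deploy and br.destroy
    from the example above; this helper itself is hypothetical.
    """
    dep = deploy(model, **deploy_kwargs)
    try:
        yield dep
    finally:
        # Mirrors br.destroy(dep.api_key, region=...) from the README example.
        destroy(dep.api_key, region=deploy_kwargs.get("region"))


# Intended usage (requires AWS credentials and GPU quota, so left commented):
#
# import gpuroutertest as br
# with temporary_endpoint(br.deploy, br.destroy, "glm49b",
#                         size="small", region="us-east-1") as dep:
#     print(dep.endpoint_url)
```

Note that this only covers exceptions raised inside the `with` body; if the Python process is killed, the instance keeps running, which is the case `--ttl` on the CLI is meant to guard against.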

Call the endpoint like any OpenAI server:

curl "$ENDPOINT/chat/completions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"zai-org/glm-4-9b-chat-hf","messages":[{"role":"user","content":"Hello"}]}'
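
The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl call above: `build_chat_request` and `chat` are hypothetical helper names, and the endpoint URL is assumed to already include whatever path prefix the curl example's `$ENDPOINT` carries.

```python
import json
import urllib.request


def build_chat_request(endpoint_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <endpoint_url>/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(endpoint_url: str, api_key: str, model: str, prompt: str, timeout: float = 60) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(endpoint_url, api_key, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Standard OpenAI chat-completions response shape.
    return reply["choices"][0]["message"]["content"]
```

With a deployment from the library example, this would be called as `chat(dep.endpoint_url, dep.api_key, "zai-org/glm-4-9b-chat-hf", "Hello")`.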

Models

Models live in sdk/gpuroutertest/registry.json. Each entry maps a key to a Hugging Face model id and per-size serving config (instance type, disk, vLLM flags). Add a model by adding an entry — no code changes required.
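
To make the key-to-config mapping concrete, here is an illustrative lookup over a made-up registry entry. The field names (`hf_model_id`, `sizes`, `instance_type`, `disk_gb`, `vllm_args`), the disk size, and the vLLM arguments are all assumptions for the sake of the example; check `registry.json` itself for the real schema before adding a model.

```python
import json

# Hypothetical entry: field names and values are illustrative only and
# are NOT taken from the real registry.json.
EXAMPLE_REGISTRY = json.loads("""
{
  "glm49b": {
    "hf_model_id": "zai-org/glm-4-9b-chat-hf",
    "sizes": {
      "small": {
        "instance_type": "g5.xlarge",
        "disk_gb": 100,
        "vllm_args": ["--max-model-len", "8192"]
      }
    }
  }
}
""")


def resolve(registry: dict, key: str, size: str) -> dict:
    """Return the model id merged with the serving config for one size tier."""
    if key not in registry:
        raise KeyError(f"unknown model {key!r}; known: {sorted(registry)}")
    entry = registry[key]
    if size not in entry["sizes"]:
        raise KeyError(f"model {key!r} has no size {size!r}; known: {sorted(entry['sizes'])}")
    return {"hf_model_id": entry["hf_model_id"], **entry["sizes"][size]}


config = resolve(EXAMPLE_REGISTRY, "glm49b", "small")
print(config["instance_type"])  # g5.xlarge
```

Keeping the per-size settings as plain data is what lets a new model be added without code changes: the lookup logic stays the same and only the JSON grows.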

Notes & limitations

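The security group opens port 8000. By default it is reachable from the whole internet (0.0.0.0/0) and protected only by the API key, so pass --cidr to restrict access for anything beyond a quick test.
Traffic is plain HTTP (no TLS), so the API key and prompts travel unencrypted. Put the endpoint behind a TLS-terminating proxy or load balancer for anything beyond dev.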
Instance boot logs are available via br logs (EC2 console output). vLLM's own container logs require SSH/SSM, which are intentionally not provisioned.
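
For the `--cidr` recommendation above, a common choice is a range that matches only your own address. The sketch below uses Python's standard `ipaddress` module to build and sanity-check such a value before passing it on the command line. The helper names are hypothetical, and it assumes `--cidr` takes ordinary IPv4 CIDR notation, as its `0.0.0.0/0` default suggests.

```python
import ipaddress


def single_host_cidr(ip: str) -> str:
    """Return the CIDR block that matches exactly one address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return f"{addr}/{addr.max_prefixlen}"


def is_open_to_world(cidr: str) -> bool:
    """True if the block covers every address, like the 0.0.0.0/0 default."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


print(single_host_cidr("203.0.113.7"))  # 203.0.113.7/32
print(is_open_to_world("0.0.0.0/0"))    # True
```

The resulting string would then be passed as `br deploy ... --cidr 203.0.113.7/32`, with the documentation-range address replaced by your own public IP.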

License

MIT — see LICENSE.

Release history

0.1.1 (Jul 2, 2026)
0.1.0 (Jul 2, 2026, this version)

