GPUniq
GPUniq is a Python SDK and CLI for the GPUniq GPU Meta-Cloud platform.
One account gets you:
- GPU compute — browse 5000+ GPUs across multiple providers, deploy by type, or scale to dozens at once with automatic fallback.
- LLM API — Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5), GPT-5 family, Gemini 3, Grok 4, plus 30+ open-source models through one key, billed in USD.
- Image generation — Nano Banana, Nano Banana Pro / 4K, and Grok 4 Image; text-to-image and image-to-image under the same API.
- Persistent volumes — S3-backed storage that survives instance swaps and failures.
gg CLI — a single pip install gives you a full-featured terminal app for everything above, plus command checkpointing inside GPU instances.
All services share one USD balance on your account — no token pools, no separate top-ups.
Installation
pip install -U gpuniq
Python 3.8+. The gg command is available in your terminal immediately after install.
Quick Start — Python
from gpuniq import GPUniq
client = GPUniq(api_key="gpuniq_your_key_here")
# Browse 5000+ GPUs on the marketplace
gpus = client.marketplace.list(sort_by="price-low")
# Deploy a GPU in one call
deploy = client.gpu_cloud.deploy(gpu_name="RTX_4090", docker_image="pytorch/pytorch:latest")
# Chat with an LLM (billed in USD from your balance)
print(client.llm.chat("claude-haiku-4-5", "Write me a haiku about GPUs"))
# Generate an image and save it locally
client.llm.generate_image(
    "a red cat astronaut on Mars",
    model="nano-banana",
    save_to="cat.png",
)
Quick Start — CLI
pip install -U gpuniq
gg login # paste your API key
gg rent # interactive: pick GPU, plan, template, volume
gg open # SSH into the rented instance
gg llm "explain CUDA streams" # one-shot chat
gg image "a red cat astronaut" -o cat.png
GPU Products
GPUniq offers three ways to rent GPU compute, each suited to a different use case.
1. GPU Marketplace — pick a specific machine
Browse 5000+ GPU servers from multiple providers with detailed specs (VRAM, RAM, disk, network, location, reliability, verification) and rent the exact machine you want.
Pricing plans: minute, week, month. Longer commitments earn deeper discounts. The hour and day plans are no longer offered as defaults; use week unless you explicitly need per-minute flexibility.
gpus = client.marketplace.list(
    gpu_model=["RTX 4090", "A100"],
    min_vram_gb=24,
    min_inet_speed_mbps=500,
    verified_only=True,
    sort_by="price-low",
    page=1, page_size=20,
)

for agent in gpus["agents"]:
    print(f"{agent['gpu_model']} x{agent['gpu_count']} — ${agent['price_per_hour']}/hr")

order = client.marketplace.create_order(
    agent_id=gpus["agents"][0]["id"],
    pricing_type="week",
    docker_image="pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime",
    ssh_key_ids=[1],
    disk_gb=100,
    volume_id=9,
)
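Offers are shared across all users, so the machine you picked can be taken between listing and ordering; the API then answers HTTP 410 (the CLI's rent flow handles this the same way). A minimal retry sketch, assuming the 410 surfaces as a GPUniqError whose http_status attribute is set, as in the error-handling section below:

from gpuniq import GPUniqError

# Walk the listing in sort order until an order sticks.
order = None
for agent in gpus["agents"]:
    try:
        order = client.marketplace.create_order(
            agent_id=agent["id"],
            pricing_type="week",
            docker_image="pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime",
            ssh_key_ids=[1],
            disk_gb=100,
        )
        break
    except GPUniqError as e:
        if e.http_status == 410:  # offer already taken; try the next agent
            continue
        raise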
2. GPU Dex-Cloud — deploy by GPU type
Say which GPU you want and how many — GPUniq picks the best available machine and provisions it automatically.
deploy = client.gpu_cloud.deploy(
    gpu_name="RTX_4090",
    gpu_count=1,
    docker_image="pytorch/pytorch:latest",
    disk_gb=100,
    volume_id=9,
)
3. GPU Burst — scale to many GPUs with fallback
Request dozens of GPUs at once with fallback types and price caps. If your primary choice isn't available, the platform automatically substitutes.
order = client.burst.create_order(
    docker_image="pytorch/pytorch:latest",
    primary_gpu="RTX_4090",
    gpu_count=8,
    extra_gpus=[
        {"gpu_name": "RTX_3090", "max_price": 0.5},
        {"gpu_name": "A100", "max_price": 1.2},
    ],
    volume_id=9,
    disk_gb=200,
)
Comparison
| | Marketplace | Dex-Cloud | Burst |
|---|---|---|---|
| Use case | Pick a specific machine | Quick deploy by GPU type | Scale to many GPUs |
| Control | Full (choose server) | Medium (choose GPU type) | Low (auto-provisioned) |
| GPU count | 1 server | 1-8 GPUs | 1-100 GPUs |
| Fallback GPUs | No | No | Yes |
| Best for | Long training runs | Quick experiments | Distributed training |
Instance Management
# List your instances
instances = client.instances.list(page=1, page_size=20)
archived = client.instances.list_archived()
# Details
details = client.instances.get(task_id=456)
# Lifecycle
client.instances.start(task_id=456)
client.instances.stop(task_id=456)
client.instances.delete(task_id=456) # fully destroys the instance
client.instances.rename(task_id=456, name="my-training-run")
# Change billing plan mid-rental
client.instances.change_billing_plan(task_id=456, pricing_type="week")
# Container logs, SLA, SSH keys per-instance
logs = client.instances.logs(task_id=456)
sla = client.instances.sla(task_id=456)
client.instances.attach_ssh_key(task_id=456, ssh_key_id=1)
client.instances.detach_ssh_key(task_id=456, key_id=1)
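Deploys are asynchronous, so a common pattern is to poll client.instances.get until the instance is up. A minimal sketch; the "status" field name and its "running" value are assumptions, so check a real response for the actual keys:

import time

def wait_until_running(client, task_id, poll_s=10, timeout_s=600):
    # Poll instance details until it reports a running state.
    # NOTE: "status" / "running" are assumed names, not documented ones.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        details = client.instances.get(task_id=task_id)
        if details.get("status") == "running":
            return details
        time.sleep(poll_s)
    raise TimeoutError(f"instance {task_id} not running after {timeout_s}s")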
Volumes
Persistent S3-backed storage that syncs automatically between your GPU instance and the cloud. Survives instance restarts and replacements.
vol = client.volumes.create(name="my-dataset", size_limit_gb=50)
# Attach at deploy time
client.gpu_cloud.deploy(
    gpu_name="RTX_4090",
    docker_image="pytorch/pytorch:latest",
    volume_id=vol["id"],
)
# Manage
client.volumes.list()
client.volumes.update(volume_id=vol["id"], size_limit_gb=100)
client.volumes.delete(volume_id=vol["id"])
# Sync logs
client.volumes.sync_logs(volume_id=vol["id"])
LLM API
Access Claude, GPT-5, Gemini, Grok, and 30+ open-source models through one API, billed in USD directly from your account balance. No token packages, no pool conversions.
# One-shot chat
reply = client.llm.chat("claude-haiku-4-5", "Explain transformers in one paragraph.")
print(reply)
# Full completion with message history and parameters
data = client.llm.chat_completion(
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What is Gaussian splatting?"},
    ],
    model="claude-haiku-4-5",
    temperature=0.3,
    max_tokens=400,
)
print(data["content"])
print(f"Used {data['tokens_used']} tokens · cost ${data['cost_usd']:.4f} · balance ${data['balance_usd']:.2f}")
# What's available
models = client.llm.models() # list of text-model slugs
default = client.llm.default_model()
catalog = client.llm.model_catalog() # full catalog with pricing metadata
# Current USD balance
print(client.llm.balance())
# Persistent chat sessions (multi-turn history stored server-side)
session = client.llm.create_chat_session(model="claude-haiku-4-5", title="Research notes")
client.llm.send_message(chat_id=session["id"], message="Summarise this…")
client.llm.list_chat_sessions()
client.llm.delete_chat_session(chat_id=session["id"])
# Usage history
client.llm.usage_history(limit=50)
OpenAI-compatible endpoint
For tools that expect the OpenAI protocol — Claude Code via LiteLLM, Cursor, Continue.dev, Aider, the official OpenAI SDKs — point them at https://api.gpuniq.com/v1/openai with your GPUniq key as the Authorization: Bearer token. Streaming is byte-identical SSE, and every field (tools, tool_choice, response_format, logprobs, seed) is forwarded unchanged.
from openai import OpenAI
oai = OpenAI(api_key="gpuniq_your_key", base_url="https://api.gpuniq.com/v1/openai")
oai.chat.completions.create(model="claude-haiku-4-5", messages=[{"role":"user","content":"hi"}])
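Streaming works over the same endpoint using the OpenAI SDK's standard stream=True interface:

# Stream tokens as SSE chunks through the OpenAI-compatible endpoint.
stream = oai.chat.completions.create(
    model="claude-haiku-4-5",
    messages=[{"role": "user", "content": "Explain CUDA streams briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)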
Image Generation
Text-to-image and image-to-image through Nano Banana, Nano Banana Pro / 4K, and Grok 4 Image. Flat per-image billing: you pay only for delivered images.
Synchronous
Good for quick single images that finish within a few seconds.
result = client.llm.generate_image(
    "a red cat astronaut on Mars",
    model="nano-banana",
    n=1,
    size="1024x1024",
    save_to="cat.png",
)
print(result["saved_paths"]) # → ['cat.png']
print(f"cost ${result['cost_usd']:.4f} · balance ${result['balance_usd']:.2f}")
Async + poll (recommended for Nano Banana)
Higher-resolution / longer-running generations go through a job surface that isn't bound by the upstream proxy's ~100s read timeout. generate_image_async handles polling for you.
result = client.llm.generate_image_async(
    "isometric cyberpunk city at dusk",
    model="nano-banana-pro",
    size="2048x2048",
    save_to="city.png",
    on_progress=lambda status, _payload: print("→", status),
)
Image-to-image / editing
Pass reference images as local paths, data: URLs, https:// URLs, raw bytes, or bare base64. The SDK detects local paths and inlines them as data URLs for you.
client.llm.generate_image(
    "same cat but in Tokyo at night, neon reflections",
    model="nano-banana-pro",
    input_images=["cat.png", "reference/mood_board.jpg"],
    size="2048x2048",
    save_to="cat_tokyo.png",
)
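Because references can also be URLs or raw bytes (as noted above), mixing input forms looks like this sketch; the URL is a placeholder:

from pathlib import Path

client.llm.generate_image(
    "same scene, watercolor style",
    model="nano-banana-pro",
    input_images=[
        "https://example.com/reference.jpg",  # remote reference by URL
        Path("cat.png").read_bytes(),         # raw bytes, inlined for you
    ],
    save_to="cat_watercolor.png",
)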
Low-level job control
If you want to do your own polling / cancellation UI:
import time

job = client.llm.start_image_job("abstract painting of a neural network", model="nano-banana")
while True:
    status = client.llm.get_image_job(job["job_id"])
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(3)
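To put an overall deadline on that loop, a small wrapper using only the two calls above might look like:

import time

def poll_image_job(client, job_id, poll_s=3, timeout_s=300):
    # Poll until the job settles or the deadline passes.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = client.llm.get_image_job(job_id)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(poll_s)
    raise TimeoutError(f"image job {job_id} still pending after {timeout_s}s")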
Image model slugs
| Model | Slug | Price / image | Notes |
|---|---|---|---|
| Nano Banana | nano-banana | $0.0312 | Fast text-to-image & image-to-image, ~1K |
| Nano Banana 2 | nano-banana-2 | $0.0500 | Quality-value generation up to 2K |
| Nano Banana Pro | nano-banana-pro | $0.1072 | Higher quality, ~1K |
| Nano Banana Pro 4K | nano-banana-pro-4k | $0.192 | 4K resolution |
| Grok 4 Image | grok-4-image | $0.0352 | xAI image generator |
Prices are returned by client.llm.model_catalog() and may change.
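To read current prices at runtime rather than from this table, you can walk the catalog. A minimal sketch; the shape of each catalog entry is not documented here, so dump entries rather than assuming field names:

import pprint

# Inspect the catalog; print an entry to discover its actual keys.
for entry in client.llm.model_catalog():
    pprint.pprint(entry)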
CLI — gg
The gg CLI has two modes:
- Client mode — runs on your local laptop / dev machine. Browse and rent GPUs, SSH into them, chat with LLMs, generate images, manage volumes and SSH keys.
- GPU mode — runs on the GPU instance itself. Command checkpointing, persistent services, and auto-recovery after hardware swaps.
Client commands (your machine)
gg login # paste your API key (stored in ~/.gpuniq/config.json)
gg status # show login status and instance summary
gg balance # current USD balance
gg help # same as gg --help
Rent a GPU
gg rent
Opens a full-width interactive flow:
- Filter wizard — GPU model (2D picker by generation: Datacenter / 50XX / 40XX / 30XX / 20XX / 1660), min GPU count, max price / hr, verified only, sort.
- Marketplace table — paginated, resizes columns to your terminal. On wide terminals you see GPU, CNT, VRAM, RAM, DISK, CPU, NET ↓/↑, LOCATION, RELIA, AVAIL, HOSTING, PRICE, VER.
- Billing plan — week (default) / month / minute.
- Template — PyTorch, ComfyUI, vLLM, Ubuntu VM, or custom image.
- Volume — pick existing, create new, or skip.
- Confirm. If the offer was taken by someone else between listing and order (HTTP 410), the picker loops back so you don't lose your plan/volume choices.
Flags (skip any prompt):
gg rent --gpu "RTX 4090" --count 1 --pricing week \
    --image pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime \
    --disk 100 --max-price 1.50 --verified
Swap the GPU on a running instance
gg replace # pick interactively
gg replace 142 # replace this instance
Destroys the old instance (DELETE, not just stop — the provider machine and SSH proxy port are released) and rents a new GPU preserving the original billing plan and volume. Docker image defaults to whatever the old instance was running; you can change it in the same flow.
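There is no single documented SDK call for this, but a rough Python equivalent of the same flow, using only calls shown elsewhere in this README, would be:

# Rough `gg replace` equivalent: destroy the old instance, then deploy
# a fresh GPU on the same volume (IDs below are illustrative).
client.instances.delete(task_id=142)        # DELETE: machine and proxy port released
deploy = client.gpu_cloud.deploy(
    gpu_name="RTX_4090",                    # replacement GPU type
    docker_image="pytorch/pytorch:latest",  # or reuse the old instance's image
    volume_id=9,                            # same volume preserves your data
)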
SSH into an instance
gg open # auto-select if only one instance, else arrow-key menu
gg open 142 # connect to a specific instance
The CLI scans ~/.ssh/*.pub and offers to attach a matching key before connecting. It also calls /v1/instances/{id}/ssh-proxy/ensure so you always get routed through ssh.gpuniq.com — never a bare provider IP — even on older orders whose proxy allocation failed at order time.
LLM chat
gg llm "Write a haiku about CUDA streams"
gg llm # interactive REPL: /exit, /clear
gg llm --list-models
gg llm -m claude-haiku-4-5 --temperature 0.3 "..."
Image generation
gg image "a red cat astronaut" # auto-named PNG in cwd
gg image "variations" -n 4 -o ./renders/ # directory target
gg image "edit" --input cat.png --model nano-banana-pro --size 2048x2048 -o cat_v2.png
For Nano Banana slugs, gg image automatically uses the async-poll path so you never hit the 100s proxy timeout.
Instance list / stop / delete
gg orders # list active instances
gg stop # interactive or: gg stop 142
Use gg replace to swap GPUs; use gg stop for a temporary pause.
SSH keys
gg ssh-keys list
gg ssh-keys add # uploads ~/.ssh/id_*.pub (interactive pick if multiple)
Volumes
gg volumes # list
gg volumes create my-data --size 50 --description "training set"
gg volumes delete 7
GPU-mode commands (on the instance)
gg init <token> # one-time, usually done automatically on deploy
gg python train.py # run under checkpointing — logs, exit code, duration persisted
gg bash run_pipeline.sh
gg list # list checkpoints
gg logs <checkpoint_id> --tail 200
gg services # list persistent services; gg services rm <id> / clear
gg restart # re-run all registered services (used during auto-recovery)
gg replay # re-run commands interrupted by the last instance death
When the platform replaces your GPU instance (hardware failure, auto-recovery, gg replace), the volume is synced to the new machine and gg restart resumes every registered service automatically.
Error handling
from gpuniq import GPUniq, GPUniqError, AuthenticationError, RateLimitError, NotFoundError, ValidationError
client = GPUniq(api_key="gpuniq_your_key")
try:
    instances = client.instances.list()
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except NotFoundError:
    print("Resource not found")
except ValidationError as e:
    print(f"Bad request: {e}")
except GPUniqError as e:
    print(f"API error: {e.message} (code={e.error_code}, status={e.http_status})")
Rate limit: 120 requests/minute per API key. The SDK automatically retries on 429 (up to 3 times with Retry-After backoff).
Configuration
client = GPUniq(
    api_key="gpuniq_your_key",
    base_url="https://api.gpuniq.com/v1",  # default
    timeout=120,  # seconds (default 60)
)
CLI config lives at ~/.gpuniq/config.json:
{
  "version": 1,
  "api_key": "gpuniq_...",
  "api_base_url": "https://api.gpuniq.com/v1"
}
To point the CLI at a staging environment:
gg login --api-url https://dev-api.gpuniq.com/v1
Backward compatibility
v1.x code continues to work:
import gpuniq
client = gpuniq.init("gpuniq_your_key")
response = client.request("claude-haiku-4-5", "Hello!")
License
MIT
gpuniq.com | PyPI | GitHub