gpuctl-cli

AI Computing Power Scheduling Platform

🚀 Schedule GPU Clusters Like Writing Python Scripts

Declarative YAML · Zero K8s Knowledge · Resource Pool Isolation

✨ Why gpuctl

One Command to Rule Them All
gpuctl create -f job.yaml

Say goodbye to 100+ lines of K8s YAML; submit tasks with declarative configuration.

Multi-Team Resource Isolation
Training Pool / Inference Pool / Dev Pool

Logical isolation prevents resource contention, with quota management per team.

Distributed Training Built-In
Indexed Job + Headless Service

Set resources.nodes: N and the platform auto-injects DDP env vars (MASTER_ADDR / RANK / WORLD_SIZE).

One-Stop Monitoring
Logs / Events / Resource Usage

No more kubectl get pods to find pod names.

ML Engineer Friendly
kind / job / resources

Familiar YAML syntax; no need to understand Pod/Deployment concepts.

Namespace-Level Quotas
CPU / Memory / GPU

A ResourceQuota is bound automatically when a Namespace is created.

Complete API Support
HTTP / WebSocket

Easy integration with MLOps platforms or third-party tools.

Existing K8s Cluster
Ready to Use

No cluster configuration changes, no impact on existing workloads.

Zero-Config NFS Storage on Every Job

The operator runs gpuctl init once; after that, every job auto-mounts a persistent, per-user /home/jovyan (read-write) and a shared /datasets (read-only). No mount paths, storage classes, or PVCs in user YAML. Files survive restarts and are shared across a user's Notebook and Training jobs.
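
Because /home/jovyan persists across restarts, a training script can checkpoint there and resume later. The following is a minimal sketch of that pattern, not part of gpuctl: the `checkpoints` subdirectory, the file naming, and both helper functions are hypothetical, and only the /home/jovyan mount point comes from the description above.

```python
import json
import os
from pathlib import Path

# Per-user, read-write mount point named in the feature description.
HOME_DIR = Path("/home/jovyan")


def save_checkpoint(state: dict, step: int, root: Path = HOME_DIR) -> Path:
    """Write `state` as JSON under <root>/checkpoints/ and return the file path.

    The "checkpoints" directory and the step-NNNNNNNN.json naming are
    illustrative assumptions, not a gpuctl convention.
    """
    ckpt_dir = root / "checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final = ckpt_dir / f"step-{step:08d}.json"
    tmp = final.with_name(final.name + ".tmp")
    tmp.write_text(json.dumps(state))
    # Rename into place so a reader never sees a half-written checkpoint.
    os.replace(tmp, final)
    return final


def latest_checkpoint(root: Path = HOME_DIR):
    """Return the highest-numbered checkpoint path, or None if none exists."""
    ckpt_dir = root / "checkpoints"
    if not ckpt_dir.is_dir():
        return None
    files = sorted(ckpt_dir.glob("step-*.json"))
    return files[-1] if files else None
```

A job that restarts could call `latest_checkpoint()` first and fall back to a fresh start when it returns `None`.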

🚀 Quick Start

# 1. Install CLI
pip install gpuctl-cli

# (operator, once) Enable transparent persistent storage for every job
gpuctl init --nfs-server <IP> --nfs-path /exports

# 2. Submit LLM fine-tuning task (4x A100)
cat > training.yaml << 'EOF'
kind: training
version: v0.1
job:
  name: qwen2-7b-sft
environment:
  image: llama-factory:latest
  command: ["llamafactory-cli", "train", "--stage", "sft"]
resources:
  pool: training-pool
  gpu: 4
  cpu: 32
  memory: 128Gi
EOF

gpuctl create -f training.yaml

# 3. Check task status
gpuctl get jobs

# 4. View logs in real-time
gpuctl logs qwen2-7b-sft -f
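
When many similar jobs need to be submitted, the spec can be generated from a script instead of a heredoc. The sketch below renders the same fields used in the Quick Start YAML; `render_training_yaml` and `submit` are hypothetical helpers, not part of gpuctl, and the renderer does no YAML escaping, so it assumes plain values like the ones shown above.

```python
import json
import subprocess


def render_training_yaml(name, image, command, pool, gpu, cpu, memory,
                         version="v0.1"):
    """Return a training spec with the same fields as the Quick Start example.

    `command` is a list of strings; json.dumps renders it as a flow
    sequence, which is also valid YAML.
    """
    lines = [
        "kind: training",
        f"version: {version}",
        "job:",
        f"  name: {name}",
        "environment:",
        f"  image: {image}",
        f"  command: {json.dumps(command)}",
        "resources:",
        f"  pool: {pool}",
        f"  gpu: {gpu}",
        f"  cpu: {cpu}",
        f"  memory: {memory}",
    ]
    return "\n".join(lines) + "\n"


def submit(path):
    """Run `gpuctl create -f <path>`, the command shown in the Quick Start."""
    subprocess.run(["gpuctl", "create", "-f", path], check=True)


if __name__ == "__main__":
    spec = render_training_yaml(
        name="qwen2-7b-sft",
        image="llama-factory:latest",
        command=["llamafactory-cli", "train", "--stage", "sft"],
        pool="training-pool",
        gpu=4,
        cpu=32,
        memory="128Gi",
    )
    with open("training.yaml", "w") as fh:
        fh.write(spec)
    # submit("training.yaml")  # uncomment on a machine with gpuctl configured
```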

🆚 gpuctl vs Native Kubectl

Scenario	✨ gpuctl Way	Native Kubectl Way
📝 Submit Training Task	Write 15-20 lines of declarative config with familiar fields such as kind, job.name, and resources.gpu, then submit	Write 120+ lines of K8s YAML; manually create Secret, ConfigMap, and Job resources; understand PodSpec, ResourceRequirements, and VolumeMounts
📊 Check Task Status	One command, `gpuctl get jobs`, lists all tasks; Pod status is aggregated automatically and shown with task name, status, and resource usage	Run `kubectl get jobs` to find the Job, then `kubectl get pods -l job-name=xxx` to find the Pod, then `kubectl describe pod` for details
🔍 View Task Logs	Use the task name directly: `gpuctl logs <job-name> -f` tracks Pod changes automatically and supports aggregated logs across replicas	Look up the Pod name (e.g. `training-job-7d9f4b8c5-x2mnp`), run `kubectl logs <pod-name> -f`, and look it up again after every Pod restart
🧠 Multi-Node Distributed Training	Set `resources.nodes: N`; the platform creates an Indexed Job + Headless Service and auto-injects the DDP rendezvous env vars (MASTER_ADDR, MASTER_PORT, WORLD_SIZE, RANK, LOCAL_RANK); all workers share one NFS `/home/jovyan` for checkpoints	Manually create an Indexed Job or JobSet plus a Headless Service, wire up MASTER_ADDR/RANK/WORLD_SIZE, provision shared storage, and understand GPU communication and process groups
🏊 Resource Pool Management	Declarative pool config: `pool: training-pool` schedules the job onto the corresponding node group, with multi-team isolation and quota control	Manually bind nodes via LabelSelector and NodeAffinity; maintain complex scheduling strategies and resource limits per team
📋 Resource Quota Management	The quota is created together with the Namespace; `gpuctl describe quota` shows used/total in one command; over-quota requests are rejected with a friendly message	Manually create ResourceQuota and LimitRange per Namespace; query usage several times and aggregate it by hand
⚡ Deploy Inference Service	Declare replicas and port; the platform creates the Deployment + Service, generates a NodePort to expose the service, and adds a built-in readiness probe	Create Deployment, Service, and Ingress/NodePort separately; configure HPA auto-scaling; understand Service types and network policies
📓 Launch Notebook	One-click JupyterLab launch with an auto-generated access link, custom images and passwords, and auto-mounted storage volumes	Manually create StatefulSet, Headless Service, and Ingress; configure PVC storage; handle Jupyter tokens and passwords
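
To make the multi-node row concrete, here is a minimal sketch of how a worker process might read the injected rendezvous variables. The variable names are the ones listed in the table; the `Rendezvous` dataclass, the `read_rendezvous` helper, and the single-process fallback defaults are assumptions added for illustration. PyTorch's `env://` initialization reads MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE from the environment in the same way.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendezvous:
    master_addr: str
    master_port: int
    world_size: int
    rank: int
    local_rank: int


def read_rendezvous(env=None) -> Rendezvous:
    """Collect the DDP rendezvous settings from environment variables.

    The defaults describe a single local process and are an assumption of
    this sketch; on the platform the values are expected to be injected.
    """
    env = os.environ if env is None else env
    return Rendezvous(
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", "29500")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
    )


if __name__ == "__main__":
    rdzv = read_rendezvous()
    role = "coordinator" if rdzv.rank == 0 else "worker"
    print(f"{role} {rdzv.rank}/{rdzv.world_size} via "
          f"{rdzv.master_addr}:{rdzv.master_port}")
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise outside a cluster.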

🏗️ Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────────────────────┐
│   User      │────▶│  gpuctl CLI │────▶│  K8s Job/Deployment/        │
│  (YAML)     │     │   / REST API│     │  StatefulSet + Service      │
└─────────────┘     └─────────────┘     └─────────────────────────────┘
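
The translation step in the diagram can be pictured as a lookup from a spec's `kind` to the Kubernetes objects created for it. The sketch below is an illustration assembled from this README, not gpuctl's actual code: only `kind: training` appears in the examples, so the `inference` and `notebook` kind strings, the notebook mapping, and the `resources_for` helper are all assumptions.

```python
# Kubernetes objects per workload kind, as described in the comparison
# table and the architecture diagram. Kind names other than "training"
# are hypothetical.
SINGLE_NODE_RESOURCES = {
    "training": ("Job",),
    "inference": ("Deployment", "Service"),
    "notebook": ("StatefulSet", "Service"),
}


def resources_for(kind: str, nodes: int = 1):
    """Return the names of the K8s objects a spec of this kind would map to."""
    if kind not in SINGLE_NODE_RESOURCES:
        raise ValueError(f"unknown kind: {kind!r}")
    if kind == "training" and nodes > 1:
        # Multi-node training is described as Indexed Job + Headless Service.
        return ("Indexed Job", "Headless Service")
    return SINGLE_NODE_RESOURCES[kind]


if __name__ == "__main__":
    for kind, nodes in [("training", 1), ("training", 4), ("inference", 1)]:
        print(kind, nodes, "->", " + ".join(resources_for(kind, nodes)))
```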

📚 Documentation

Complete documentation is available in the docs/ directory, or check out the quick navigation below:

Getting Started

Quick Start: Get started with gpuctl in 5 minutes
Installation Guide: Detailed installation steps

User Guides

Training Tasks: LLM fine-tuning, conda env reuse, multi-node distributed training
Persistent Storage: transparent NFS /home/jovyan + /datasets, zero config in job YAML
Inference Services: vLLM inference deployment; single-node tensor-parallel and multi-node (resources.nodes: N) model-parallel serving
Notebooks: JupyterLab interactive development
Resource Pool Management: GPU resource pool configuration

Reference

CLI Commands: Complete command reference
API Documentation: RESTful API specifications
FAQ: Frequently asked questions and troubleshooting

Development & Contribution

Architecture Design: System design documentation
Local Development: Development environment setup
Contributing Guide: How to contribute

💻 Installation

Prerequisites

Python 3.8+
Kubernetes cluster access (via kubectl)
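
A quick preflight check for the two prerequisites could look like the sketch below. `check_prerequisites` is a hypothetical helper, not something gpuctl ships. Finding `kubectl` on the PATH only shows that the tool is installed; it does not prove the current kubeconfig can reach a cluster.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version_info[:2]) < (3, 8):
        problems.append("Python 3.8 or newer is required")
    if which("kubectl") is None:
        problems.append("kubectl was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("missing:", issue)
    sys.exit(1 if issues else 0)
```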

From PyPI (Recommended)

pip install gpuctl-cli

From Source

git clone https://github.com/runwhere-ai/gpuctl.git
cd gpuctl
pip install -e .

Binary Download

# Linux
wget https://github.com/runwhere-ai/gpuctl/releases/latest/download/gpuctl-linux-amd64
chmod +x gpuctl-linux-amd64
sudo mv gpuctl-linux-amd64 /usr/local/bin/gpuctl

🌟 Show Your Support

If gpuctl helps you, please give us a ⭐️ Star!

📄 License

Project details

This version: 0.9.0 (released Jun 29, 2026)

Download files

Source Distribution

gpuctl_cli-0.9.0.tar.gz (99.9 kB)

Uploaded Jun 29, 2026 Source

Built Distribution

gpuctl_cli-0.9.0-py3-none-any.whl (125.1 kB)

Uploaded Jun 29, 2026 Python 3

File details

Details for the file gpuctl_cli-0.9.0.tar.gz.

File metadata

Download URL: gpuctl_cli-0.9.0.tar.gz
Upload date: Jun 29, 2026
Size: 99.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for gpuctl_cli-0.9.0.tar.gz
Algorithm	Hash digest
SHA256	`5a3088731ad53b58f4d03bc60614430b280192d9d916aa8d246e93bcef1dcf8c`
MD5	`ebf157300ab215b84652e5e0aff2e80c`
BLAKE2b-256	`65f64cf8ca486516ac7726f5c4b23ae83461b959521d6a7c4f04c8fdb5512cf5`

File details

Details for the file gpuctl_cli-0.9.0-py3-none-any.whl.

File metadata

Download URL: gpuctl_cli-0.9.0-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 125.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for gpuctl_cli-0.9.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a33b2e34d820232eab62ee04d85de5fc2a4dd2717bf4d53e59f1fbccfed2cfa0`
MD5	`b9c159d67c26efe86cff0ffeabc96acb`
BLAKE2b-256	`17feeda90c9692204fd658aa8a4426c6722d84b4e1025278d9f2e00f4a4f71e2`
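
A downloaded file can be checked against the SHA256 digests listed above with Python's standard library. The sketch below is one way to do it; `sha256_of` and `verify_sha256` are illustrative helper names, and the file is assumed to sit in the current directory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 matches `expected_hex` (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Digest copied from the table for the wheel file.
    expected = "a33b2e34d820232eab62ee04d85de5fc2a4dd2717bf4d53e59f1fbccfed2cfa0"
    ok = verify_sha256("gpuctl_cli-0.9.0-py3-none-any.whl", expected)
    print("match" if ok else "MISMATCH")
```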
