Wait for a free GPU, claim it, and run a command on it.

gpu-gate

Wait for a free GPU, claim it, set CUDA_VISIBLE_DEVICES, and run your command.

On a shared multi-GPU box without a cluster scheduler, starting a job usually means watching nvidia-smi, picking a card by hand, exporting the env var, and remembering to actually launch. gpu-gate is the small wait-pick-export-run loop that does this for you, with a cooperative lock so two invocations on the same host do not grab the same just-freed card. No daemon, no server, nothing to administer.

$ gpu-gate run --min-free-mb 8000 -- python train.py
gpu-gate: waiting for a free GPU ...
# ... blocks until a card has >= 8 GB free, then runs train.py with
# CUDA_VISIBLE_DEVICES set to the chosen index

Install

$ pip install gpu-gate                                     # released version from PyPI
$ pip install git+https://github.com/jmweb-org/gpu-gate   # latest from source

It requires an NVIDIA driver at run time. The NVML binding (nvidia-ml-py) is pulled in automatically; the package still installs and imports on machines without a GPU, so it is safe to add to shared requirements.

Usage

Run a command on a free GPU

$ gpu-gate run -n 1 --min-free-mb 8000 -- python train.py --epochs 50

Everything after -- is the command. gpu-gate blocks until the requirements are met, claims the chosen device(s), exports CUDA_VISIBLE_DEVICES, and execs the command. Its own exit code is the command's exit code, so it drops cleanly into scripts and CI.

Common options:

Option           Meaning
-n, --count      Number of GPUs to claim (default 1)
--min-free-mb    Require at least this much free memory (MB)
--max-util       Skip cards busier than this percent
--only 0,1       Restrict the search to these indices
--exclude 2,3    Never pick these indices
--poll           Seconds between checks (default 5)
--timeout        Give up after N seconds (exit 124)
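
The way --poll and --timeout interact with the wait-then-exec behaviour can be sketched in a few lines of Python. This is an illustration only, not gpu-gate's actual source: wait_for, exec_on, and the pick callback are hypothetical names, and pick stands in for whatever returns the chosen indices (or nothing) on each check.

```python
import os
import time
from typing import Callable, List, Optional, Sequence


def wait_for(
    pick: Callable[[], Optional[List[int]]],
    poll: float = 5.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[int]]:
    """Call pick() every `poll` seconds until it returns indices.

    Returns None if `timeout` seconds pass first (the case the CLI
    reports as exit code 124).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        chosen = pick()
        if chosen:
            return chosen
        if deadline is not None and time.monotonic() >= deadline:
            return None
        sleep(poll)


def exec_on(chosen: Sequence[int], command: Sequence[str]) -> None:
    """Export CUDA_VISIBLE_DEVICES and replace this process with the command."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(i) for i in chosen))
    os.execvpe(command[0], list(command), env)  # does not return on success
```

Because the command replaces the waiting process, its exit code is naturally the one the caller sees.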

Just wait, then use the result yourself

$ export CUDA_VISIBLE_DEVICES=$(gpu-gate wait --min-free-mb 8000)

Inspect the current state

$ gpu-gate status
idx  name           free        total       util
  0  NVIDIA L40S    44211 MiB   46068 MiB    3%
  1  NVIDIA L40S      812 MiB   46068 MiB   97%

$ gpu-gate status --json
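
For readers curious where the free, total, and util columns come from, here is a rough sketch of reading the same values through the pynvml module that nvidia-ml-py provides. It is an assumption about how such a probe could look, not gpu-gate's code; probe_gpus and the dict keys are made up for illustration. Importing pynvml lazily is one way a package can stay importable on machines without a GPU.

```python
from typing import Dict, List, Optional

MIB = 1024 * 1024  # NVML reports memory in bytes


def probe_gpus() -> Optional[List[Dict[str, object]]]:
    """Return one dict per GPU, or None when NVML is unavailable."""
    try:
        import pynvml  # module shipped by the nvidia-ml-py package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no driver, or the NVML library could not be loaded
    try:
        rows = []
        for idx in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            rows.append({
                "idx": idx,
                "name": name,
                "free_mib": mem.free // MIB,
                "total_mib": mem.total // MIB,
                "util_pct": util.gpu,
            })
        return rows
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()
```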

Exit codes

Code   Meaning
0      Command ran and succeeded (any other code from the command is forwarded as-is)
2      Bad invocation (for example, no command after --)
3      Requirements could never be met
4      Could not read GPU state (no driver / NVML error)
124    Timed out waiting for a GPU
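
A calling script can use these codes to tell a failed wait from a failed job. The sketch below is one possible Python wrapper; build_argv, run_gated, and GATE_CODES are hypothetical helper names, and only the flags documented above are used. Note the inherent ambiguity: the wrapped command could itself exit with 2, 3, 4, or 124, so the message is a hint rather than a certainty.

```python
import subprocess
from typing import List, Optional, Sequence

# Codes from the table above; any other value comes from the wrapped command.
GATE_CODES = {
    2: "bad invocation",
    3: "requirements could never be met",
    4: "could not read GPU state",
    124: "timed out waiting for a GPU",
}


def build_argv(
    command: Sequence[str],
    min_free_mb: Optional[int] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Assemble a `gpu-gate run` command line from documented options."""
    argv = ["gpu-gate", "run"]
    if min_free_mb is not None:
        argv += ["--min-free-mb", str(min_free_mb)]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    return argv + ["--", *command]


def run_gated(command: Sequence[str], **options: Optional[int]) -> int:
    """Run the command through gpu-gate and return the exit code."""
    code = subprocess.run(build_argv(command, **options)).returncode
    reason = GATE_CODES.get(code)
    if reason is not None:
        print(f"exit {code}: possibly gpu-gate itself ({reason})")
    return code
```

For example, run_gated(["python", "train.py"], min_free_mb=8000, timeout=600) would give up after ten minutes and return 124.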

How selection works

A GPU is eligible when it has enough free memory, is below the utilization ceiling, is not excluded, and is not currently locked by another gpu-gate caller. Eligible cards are ranked by most free memory, then lowest utilization, then index, and the top --count are chosen. The ordering is fully deterministic.
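
The eligibility filter and ranking described above can be written down compactly. The following is a minimal sketch under stated assumptions, not the package's source: the Gpu record and select_gpus are illustrative names, and treating a card exactly at the --max-util value as eligible is a guess at the boundary behaviour.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class Gpu:
    index: int
    free_mb: int
    util_pct: int


def select_gpus(
    gpus: Iterable[Gpu],
    count: int = 1,
    min_free_mb: int = 0,
    max_util: int = 100,
    only: Optional[AbstractSet[int]] = None,
    exclude: AbstractSet[int] = frozenset(),
    locked: AbstractSet[int] = frozenset(),
) -> Optional[List[int]]:
    """Return the chosen indices, or None if fewer than `count` are eligible."""
    eligible = [
        g for g in gpus
        if g.free_mb >= min_free_mb
        and g.util_pct <= max_util  # assumption: the ceiling is inclusive
        and (only is None or g.index in only)
        and g.index not in exclude
        and g.index not in locked
    ]
    # Most free memory first, then lowest utilization, then lowest index.
    eligible.sort(key=lambda g: (-g.free_mb, g.util_pct, g.index))
    if len(eligible) < count:
        return None
    return [g.index for g in eligible[:count]]
```

With the two cards from the status example (44211 MiB free at 3% and 812 MiB free at 97%), a request for 8000 MB selects index 0, and a request for two such cards finds no match. The sort key contains the index as a final tie-breaker, which is what makes the ordering deterministic.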

Locking

While a command runs, gpu-gate holds an advisory file lock per claimed device under $GPU_GATE_LOCK_DIR (a per-user directory by default). Other gpu-gate invocations skip locked devices, which avoids the classic race where two jobs both see the same card free at the same instant. The lock is advisory: it coordinates gpu-gate callers, not arbitrary CUDA programs.
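
To make the idea of a per-device advisory lock concrete, here is a sketch using flock on Linux or another POSIX system. The mechanism gpu-gate actually uses is not stated here, so treat everything below as an assumption: try_claim, release, and the gpu{index}.lock file name are hypothetical.

```python
import fcntl
import os
from pathlib import Path
from typing import Optional


def try_claim(lock_dir: Path, index: int) -> Optional[int]:
    """Try to lock one device without blocking.

    Returns an open file descriptor on success, or None if another
    holder already has the lock.
    """
    lock_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one lock file per device index.
    fd = os.open(lock_dir / f"gpu{index}.lock", os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    """Closing the descriptor drops the lock."""
    os.close(fd)
```

A lock of this kind disappears when the holding process exits, even after a crash, so no stale-lock cleanup is needed. It also shows why the lock is only advisory: a CUDA program that never calls try_claim is not stopped by it.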

License

MIT. See LICENSE.
