Process-safe acquisition of exclusive GPUs

safe-gpu

A module for safe acquisition of GPUs in exclusive mode. It is relevant mainly on clusters where the GPU resource is purely declarative, such as many versions of SGE.

Features:

toolkit independence (PyTorch/TensorFlow/pycuda/...): this just sets CUDA_VISIBLE_DEVICES properly
included support for PyTorch and TensorFlow2 backends, open to others
acquisition of multiple GPUs
workaround for machines with a single GPU used for display and computation alike
open to implementation in different languages

Downsides:

in order to really prevent the race condition, everyone on your cluster has to use this

Installation

In addition to manual installation, safe-gpu is available on PyPI, so you can simply run:

pip install safe-gpu

Note that safe-gpu does not formally depend on any backend, giving you, the user, the freedom to pick one of your liking.

Usage

Before CUDA is initialized (this typically happens lazily, when you first place something on a GPU), instantiate GPUOwner and bind it to a variable. That's all.

from safe_gpu import safe_gpu

gpu_owner = safe_gpu.GPUOwner()

If you want multiple GPUs, pass the desired number to GPUOwner:

gpu_owner = safe_gpu.GPUOwner(nb_gpus)
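
Because the selection is communicated through CUDA_VISIBLE_DEVICES, a script can check which devices it was given once the owner exists. The following is a minimal sketch: visible_gpu_ids is a hypothetical helper, not part of safe-gpu, and it assumes the variable holds comma-separated numeric ids rather than GPU UUIDs.

```python
import os


def visible_gpu_ids(environ=os.environ):
    """Return the ids listed in CUDA_VISIBLE_DEVICES as a list of ints.

    Hypothetical helper, not part of safe-gpu. Assumes numeric ids;
    UUID-style entries would make int() raise ValueError.
    """
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(part) for part in raw.split(",") if part.strip()]


# Demonstration with an explicit mapping instead of the real environment.
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "4,5"}))  # prints [4, 5]
```

In a real script, call visible_gpu_ids() with no arguments after GPUOwner has been instantiated.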

Other backends

The default implementation uses a PyTorch tensor to claim a GPU. Additionally, a TensorFlow2 placeholder is provided as safe_gpu.tensorflow_placeholder.

If you don't want to or can't use either of those, provide your own GPU memory allocating function as GPUOwner's parameter placeholder_fn. It has to accept one parameter, device_no, occupy a (preferably negligible) piece of memory on that device, and return a pointer to it.
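
As an illustration, a placeholder built on CuPy might look like the sketch below. The name cupy_placeholder is hypothetical and CuPy is not one of the backends shipped with safe-gpu; the import is deferred so that the function can be defined even where CuPy is not installed.

```python
def cupy_placeholder(device_no):
    """Allocate a single float32 on GPU `device_no` and return the array.

    Returning the array (instead of discarding it) keeps the allocation,
    and with it the claim on the device, alive.
    """
    import cupy  # deferred: only needed when the placeholder is called

    with cupy.cuda.Device(device_no):
        return cupy.zeros((1,), dtype=cupy.float32)


# Intended use, assuming safe-gpu and CuPy are both installed:
#   from safe_gpu import safe_gpu
#   gpu_owner = safe_gpu.GPUOwner(placeholder_fn=cupy_placeholder)
```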

Pull requests for other backends are welcome.

Checking that it works

Together with this package, a small testing script is provided. It exaggerates the time needed to acquire the GPU after polling nvidia-smi, making the race condition practically certain to occur.

To run the following example, get to a machine with 3 free GPUs and run two instances of the script in parallel as shown. You should see in the output that one of them really waited for the faster one to fully acquire the GPU.

This script is not distributed with the pip package, so please download it separately.

$ python3 gpu-acquisitor.py --backend pytorch --id 1 --nb-gpus 1 & python3 gpu-acquisitor.py --backend pytorch --id 2 --nb-gpus 2
GPUOwner1 2020-11-30 14:29:33,315 [INFO] acquiring lock
GPUOwner1 2020-11-30 14:29:33,315 [INFO] lock acquired
GPUOwner2 2020-11-30 14:29:33,361 [INFO] acquiring lock
GPUOwner1 2020-11-30 14:29:34,855 [INFO] Set CUDA_VISIBLE_DEVICES=2
GPUOwner2 2020-11-30 14:29:45,447 [INFO] lock acquired
GPUOwner1 2020-11-30 14:29:45,447 [INFO] lock released
GPUOwner2 2020-11-30 14:29:48,926 [INFO] Set CUDA_VISIBLE_DEVICES=4,5
GPUOwner1 2020-11-30 14:29:54,492 [INFO] Finished
GPUOwner2 2020-11-30 14:30:00,525 [INFO] lock released
GPUOwner2 2020-11-30 14:30:09,571 [INFO] Finished
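
The log shows the general pattern: take a lock, pick free GPUs, set CUDA_VISIBLE_DEVICES, claim the devices, release the lock. The sketch below illustrates that pattern with an advisory file lock. It is not safe-gpu's actual implementation: acquire_gpus, exclusive_lock, list_free_gpus and lock_path are hypothetical names, the free-GPU query is injected rather than read from nvidia-smi, and fcntl restricts it to Unix-like systems.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def acquire_gpus(nb_gpus, list_free_gpus, placeholder_fn, lock_path):
    """Choose `nb_gpus` free devices and claim them while holding the lock.

    list_free_gpus: hypothetical callable returning the ids of idle GPUs.
    placeholder_fn: callable that takes a device number and returns an
                    object keeping a small allocation alive on that device.
    """
    with exclusive_lock(lock_path):
        free = list_free_gpus()
        if len(free) < nb_gpus:
            raise RuntimeError(f"wanted {nb_gpus} GPUs, only {len(free)} free")
        chosen = free[:nb_gpus]
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in chosen)
        # With CUDA_VISIBLE_DEVICES set, CUDA numbers the visible devices from 0.
        placeholders = [placeholder_fn(device_no) for device_no in range(nb_gpus)]
    return chosen, placeholders


# Demonstration with stand-ins for the GPU query and the placeholder.
demo_lock = os.path.join(tempfile.gettempdir(), "gpu-sketch.lock")
chosen, held = acquire_gpus(2, lambda: [4, 5, 7], lambda n: f"placeholder-{n}", demo_lock)
print(chosen, os.environ["CUDA_VISIBLE_DEVICES"])
```

The placeholders are allocated before the lock is released, so a second process that polls for free GPUs afterwards already sees the devices as occupied. This is also why every user of the cluster has to go through the same lock for the scheme to work.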

Release history

1.5.1 (Feb 7, 2023)
1.5 (Dec 12, 2022)
1.4 (Feb 18, 2022)
1.3 (Aug 20, 2021)
1.2.2 (May 18, 2021)
1.2.1 (Mar 9, 2021), this version
1.2 (Feb 4, 2021)
1.1 (Nov 30, 2020)
1.0 (Nov 29, 2020)
