

simple_gpu_scheduler

A simple scheduler to run your commands on individual GPUs. Following the KISS principle, this script simply accepts commands via stdin and executes each one on a specific GPU by setting the CUDA_VISIBLE_DEVICES environment variable.

The commands read from stdin are executed using the login shell, so redirections (>), pipes (|) and all other kinds of bash magic can be used.
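Conceptually, for each job the scheduler picks a free GPU and runs the command with CUDA_VISIBLE_DEVICES pointing at it, roughly like the following hand-written illustration (the GPU id and training script are placeholders, not output of the tool):

# Conceptually what happens for a single job scheduled on GPU 1:
# the command is run with CUDA_VISIBLE_DEVICES restricted to that GPU.
CUDA_VISIBLE_DEVICES=1 python3 train_dnn.py --lr 0.001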

Installation

The package can simply be installed from PyPI:

$ pip install simple_gpu_scheduler

Example

To show how this generally works, we will create jobs that simply output a job id and the value of CUDA_VISIBLE_DEVICES:

for i in {0..10}; do echo "echo job_id=$i device=\$CUDA_VISIBLE_DEVICES && sleep 3"; done | simple_gpu_scheduler --gpus 0,1,2

which results in the following output:

Processing command `echo job_id=0 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=1 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=2 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=0 device=2
job_id=1 device=1
job_id=2 device=0
--- 3 seconds no output ---
Processing command `echo job_id=3 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=4 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=5 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=3 device=2
job_id=4 device=1
job_id=5 device=0
--- 3 seconds no output ---
Processing command `echo job_id=6 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=7 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=8 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=6 device=2
job_id=7 device=1
job_id=8 device=0
--- 3 seconds no output ---
Processing command `echo job_id=9 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=10 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=9 device=2
job_id=10 device=0

This is equivalent to creating a file commands.txt with the following content:

echo job_id=0 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=1 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=2 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=3 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=4 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=5 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=6 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=7 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=8 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=9 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=10 device=$CUDA_VISIBLE_DEVICES && sleep 3

and running

simple_gpu_scheduler --gpus 0,1,2 < commands.txt

Simple scheduler for jobs

Combined with some basic command line tools, one can set up a simple scheduler that waits for new jobs to be "submitted" and executes them in order of submission.

Set up and start the scheduler in the background or in a separate permanent session (using, for example, tmux):

touch gpu.queue
tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2

The command tail -f -n 0 follows the end of the gpu.queue file, so anything written to gpu.queue before the command was started will not be passed to simple_gpu_scheduler.
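To keep the scheduler alive after logging out, it can for example be started in a detached tmux session (the session name gpu_scheduler below is arbitrary):

# Run the queue-following scheduler inside a detached tmux session.
tmux new-session -d -s gpu_scheduler \
    'tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2'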

Then submitting commands boils down to appending text to the gpu.queue file:

echo "my_command_with | and stuff > logfile" >> gpu.queue

Hyperparameter search

To make the scheduler easier to use in the common scenario of hyperparameter search, a convenience script simple_hypersearch is included in the package.

simple_hypersearch -h
usage: simple_hypersearch [-h] [--sampling-mode {shuffled_grid,grid}]
                          [--n-samples N_SAMPLES] [--seed SEED]
                          [-p NAME [VALUES ...]]
                          command_pattern

Convenience tool to generate hyperparameter search commands from a command pattern and parameter ranges.

positional arguments:
  command_pattern       Command pattern where placeholders with {parameter_name} should be replaced.

optional arguments:
  -h, --help            show this help message and exit
  --sampling-mode {shuffled_grid,grid}
                        Determine how to sample commands. Either in the grid order [grid]
                        or in a shuffled order [shuffled_grid, default].
  --n-samples N_SAMPLES
                        Number of samples to draw. If not provided use all possible combinations.
  --seed SEED           Random seed to ensure reproducibility when using randomized order of the grid.
  -p NAME [VALUES ...], --parameter NAME [VALUES ...]
                        Name of parameter followed by values that should be considered for hyperparameter search.
                        Example: `-p lr 0.01 0.001 0.0001`

Usage example:
    simple_hypersearch "my_program --param1 {param1} --param2 {param2}" -p param1 0 1 -p param2 2 3
    will generate the output:
    my_program --param1 0 --param2 2
    my_program --param1 0 --param2 3
    my_program --param1 1 --param2 2
    my_program --param1 1 --param2 3

This makes it easy to perform hyperparameter searches over a full grid of values or over uniform samples of the grid (depending on the sampling-mode setting). The output can be piped directly into simple_gpu_scheduler or appended to the "queue file" (see Simple scheduler for jobs).

Here are some more concrete examples:

Grid of all possible parameter configurations in random order:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2

5 uniformly sampled parameter configurations:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2

TODO

  • Multi-line jobs (possibly we would then need a submission script after all)
  • Stop, but let running commands finish, when receiving a defined signal
  • Tests would be nice; the project is still very small for now, but if it grows tests should be added

