ParaCuda - Parallel execution of CUDA hyperparameter sweeps in Python for PyTorch and beyond.

ParaCuda

Overview

ParaCuda is a framework for running arbitrary tools in parallel across CUDA GPUs. Consider it a (very) poor man's workflow management system. A config file describes the script to run and the parameters to pass to it; ParaCuda then distributes the resulting runs over all specified GPUs and logs each execution.

Motivation

In machine learning and modern bioinformatics, processes and predictions often need GPUs. Sometimes the task is trivial, and all that is needed is a simple way to run one script, parameterized in different ways, across all available GPUs. ParaCuda addresses this with easy setup and concurrent execution of experiments.

Installation

For the moment, clone the repository and create the environment using mamba.

git clone https://github.com/gieses/paracuda.git
cd paracuda
mamba env create -f environment.yml

Usage

Usage is simple: run the paracuda command with a configuration file (see below).

usage: paracuda [-h] --config CONFIG --gpus GPUS [--control_dir CONTROL_DIR] [--log-level {TRACE,DEBUG,INFO,SUCCESS,WARNING,ERROR,CRITICAL}] [--dry-run]

Run parallel grid search across multiple GPUs

options:
  -h, --help            show this help message and exit
  --config CONFIG       Path to JSON configuration file (default: None)
  --gpus GPUS           Number of GPUs to use (default: None)
  --control_dir CONTROL_DIR
                        Directory to store control files (default: control_dir)
  --log-level {TRACE,DEBUG,INFO,SUCCESS,WARNING,ERROR,CRITICAL}
                        Set the logging level (default: INFO)
  --dry-run             Only print commands without executing them (default: False)
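
When ParaCuda is driven from another Python program, the command line can be assembled from the flags listed above. The following is a minimal sketch; the helper names paracuda_argv and run_paracuda are hypothetical and not part of the package, and only the documented flags are used.

```python
import shlex
import subprocess


def paracuda_argv(config, gpus, control_dir=None, log_level=None, dry_run=False):
    """Build the argument list for the paracuda CLI from its documented flags."""
    argv = ["paracuda", "--config", str(config), "--gpus", str(gpus)]
    if control_dir is not None:
        argv += ["--control_dir", str(control_dir)]
    if log_level is not None:
        argv += ["--log-level", log_level]
    if dry_run:
        argv.append("--dry-run")
    return argv


def run_paracuda(**kwargs):
    """Launch paracuda as a subprocess (requires paracuda on PATH)."""
    return subprocess.run(paracuda_argv(**kwargs), check=True)


# Preview the command line without launching anything.
preview = paracuda_argv("example/example_config.json", gpus=2, dry_run=True)
print(shlex.join(preview))
```

Starting with dry_run=True prints the planned commands without executing them, which is a cheap way to check a new config before occupying the GPUs.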

Example

Please check the example directory for a simple demonstration of how to write the config and the Python script. Note that, by default, ParaCuda generates all possible hyperparameter combinations from the config file and runs the defined script once per combination.
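
For orientation, a script driven this way only needs to accept each parameter name as a command-line flag. The sketch below is a hypothetical stand-in, not the repository's example/example_script.py; it assumes parameters arrive as --name value pairs and that the GPU is selected through the CUDA_VISIBLE_DEVICES environment variable, as the dry-run output further down suggests.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy job launched once per grid point")
    # One flag per key in "param_grid"; "number" matches the example config.
    parser.add_argument("--number", type=int, required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The launcher is assumed to set CUDA_VISIBLE_DEVICES for each task, so
    # frameworks such as PyTorch only see the GPU assigned to this run.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    result = args.number ** 2  # placeholder for the real workload
    print(f"number={args.number} result={result} CUDA_VISIBLE_DEVICES={visible}")
    return result


# Demo call with explicit arguments; a real script would call main() with no
# arguments so the values come from the command line.
main(["--number", "3"])
```

The configuration below drives a script of this shape with three values of number.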

{
  "base_command": "python example/example_script.py",
  "output_dir": "results_directory",
  "param_grid": {
    "number": [1, 2, 3]
  }
}

paracuda --config example/example_config.json --gpus 2 --dry-run

Returns:

2025-06-18 23:48:35 | INFO     | 🔍 DRY RUN MODE: Commands will be printed but not executed
2025-06-18 23:48:35 | INFO     | 🔍 Starting grid search with 2 GPUs
2025-06-18 23:48:35 | SUCCESS  | 📂 Successfully loaded configuration from example/example_config.json
2025-06-18 23:48:35 | INFO     | 🧮 Generated 3 parameter combinations
2025-06-18 23:48:35 | INFO     | Storing results in: control_dir
2025-06-18 23:48:37 | INFO     | ⚙️ Starting parallel execution
⏳ Starting... | ✅ 0 | ❌ 0:   0%|          | 0/3 [00:02<?]
2025-06-18 23:48:40.455 | INFO     | paracuda.paracuda_run:run_task:148 - 🔍 [DRY RUN] Task 1 on GPU 0: c5b39f71
2025-06-18 23:48:40.455 | INFO     | paracuda.paracuda_run:run_task:149 - 📝 Command: CUDA_VISIBLE_DEVICES=0 python example/example_script.py --number 1 > control_dir/c5b39f7159270a92961b45aaf8ea44fa.log 2>&1
2025-06-18 23:48:40.455 | INFO     | paracuda.paracuda_run:run_task:148 - 🔍 [DRY RUN] Task 2 on GPU 1: e360de0c
2025-06-18 23:48:40.455 | INFO     | paracuda.paracuda_run:run_task:149 - 📝 Command: CUDA_VISIBLE_DEVICES=1 python example/example_script.py --number 2 > control_dir/e360de0cffb59c8b5711943068cee5e2.log 2>&1
2025-06-18 23:48:40.456 | INFO     | paracuda.paracuda_run:run_task:148 - 🔍 [DRY RUN] Task 3 on GPU 0: ba21f795
2025-06-18 23:48:40.456 | INFO     | paracuda.paracuda_run:run_task:149 - 📝 Command: CUDA_VISIBLE_DEVICES=0 python example/example_script.py --number 3 > control_dir/ba21f7957fd020b1c4473607144bf4ac.log 2>&1
⏳ Task 3 on GPU 0: ba21f795 | ✅ 3 | ❌ 0: 100%|██████████| 3/3 [00:02<00:00]
2025-06-18 23:48:40 | SUCCESS  | ✨ Dry run completed! Printed 3 commands
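
The commands in this dry run can be reproduced with a short sketch that expands the parameter grid into its Cartesian product and assigns GPUs in turn. This is an illustration under assumptions, not ParaCuda's implementation: the function names are hypothetical, the round-robin GPU assignment is inferred from the output above (tasks 1, 2, 3 on GPUs 0, 1, 0), and the log redirection to the control directory is left out because the scheme behind the log file names is not documented here.

```python
import itertools
import json
import shlex

config = json.loads("""
{
  "base_command": "python example/example_script.py",
  "output_dir": "results_directory",
  "param_grid": {"number": [1, 2, 3]}
}
""")


def expand_grid(param_grid):
    """Return one dict per combination (Cartesian product of all value lists)."""
    keys = list(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def build_commands(config, n_gpus):
    """Pair each combination with a GPU index, cycling 0..n_gpus-1 (assumed policy)."""
    commands = []
    for task_index, params in enumerate(expand_grid(config["param_grid"])):
        gpu = task_index % n_gpus
        flags = " ".join(f"--{key} {shlex.quote(str(value))}" for key, value in params.items())
        commands.append(f"CUDA_VISIBLE_DEVICES={gpu} {config['base_command']} {flags}")
    return commands


for command in build_commands(config, n_gpus=2):
    print(command)
```

With more than one key in param_grid the number of runs multiplies: two keys with two values each yield four combinations, so it is worth checking the "Generated N parameter combinations" line of a dry run before starting a large sweep.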

Download files

Source Distribution: paracuda-0.3.0.tar.gz (695.3 kB)

Built Distribution: paracuda-0.3.0-py3-none-any.whl (10.9 kB, Python 3)
