
A command-line tool for distributed parallel execution across multiple GPUs

Project description

🐙 OctoRun

Distributed Parallel Execution Made Simple

A powerful command-line tool for running Python scripts across multiple GPUs with intelligent task management and monitoring



📋 Overview

OctoRun runs computationally intensive Python scripts across multiple GPUs. It manages GPU allocation, splits your workload into chunks, retries failed chunks, and logs progress for monitoring and debugging.

✨ Key Features

  • 🔍 Automatic GPU Detection: Automatically detects and utilizes available GPUs
  • 🧩 Intelligent Chunk Management: Divides work into chunks and distributes across GPUs
  • 🔄 Failure Recovery: Automatic retry mechanism for failed chunks
  • 📊 Comprehensive Logging: Detailed logging for monitoring and debugging
  • ⚙️ Flexible Configuration: JSON-based configuration with CLI overrides
  • 🎯 Kwargs Support: Pass custom arguments to your scripts via config or CLI
  • 💾 Memory Monitoring: Monitor GPU memory usage and thresholds
  • 🔒 Lock Management: Prevent duplicate processing of chunks
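To illustrate the idea behind lock management, here is a minimal sketch of claiming a chunk with a lock file. It is not OctoRun's actual implementation: the function name try_claim_chunk and the chunk_<id>.lock file naming are hypothetical, chosen only for this example.

```python
import os
import tempfile


def try_claim_chunk(lock_dir: str, chunk_id: int) -> bool:
    """Return True if this process claimed the chunk, False if it was already claimed.

    Hypothetical helper for illustration; the lock file name is an assumption.
    """
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, f"chunk_{chunk_id}.lock")
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one process can create (and therefore own) the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Record the owner so a stale lock can be traced back to a process.
        lock_file.write(str(os.getpid()))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        print(try_claim_chunk(lock_dir, 0))  # first claim succeeds
        print(try_claim_chunk(lock_dir, 0))  # second claim is rejected
```

Exclusive file creation is one common way to keep two workers from picking up the same chunk; other schemes (for example a database or an advisory file lock) would serve the same purpose.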

🚀 Installation

You can install OctoRun using pip or uv.

Via pip

pip install octorun

Via uv

# Install globally
uv tool install octorun

# Install in your project
uv add octorun

Optional extras

  • Benchmark tooling: pip install "octorun[benchmark]" (installs PyTorch with CUDA support)

⚡ Quick Start

  1. Create Configuration:

    octorun save_config --script ./your_script.py
    
  2. Run Your Script:

    octorun run
    
  3. Monitor GPUs:

    octorun list_gpus -d
    

🎮 Commands

run (r)

Run your script with the specified configuration.

octorun run --config config.json [--kwargs '{"key": "value"}']

save_config (s)

Generate a default configuration file.

octorun save_config --script ./your_script.py

list_gpus (l)

List available GPUs and their current usage.

octorun list_gpus [--detailed]

The --detailed flag adds memory usage, temperature, and running processes to the output.

benchmark (b)

Run a benchmark to determine the optimal number of parallel processes for your GPUs.

octorun benchmark

This command runs a series of tests to help you choose the gpus parameter in your config.json for the best performance. It requires the optional benchmark extra (pip install "octorun[benchmark]") so that PyTorch is available.

⚙️ Configuration

OctoRun uses a config.json file for configuration. You can generate a default one with octorun save_config.

Option            Description                               Default
script_path       Path to your Python script                -
gpus              "auto" or list of GPU IDs                 "auto"
total_chunks      Number of chunks to divide work into      128
log_dir           Directory for log files                   "./logs"
chunk_lock_dir    Directory for chunk lock files            "./logs/locks"
monitor_interval  Monitoring interval in seconds            60
restart_failed    Whether to restart failed processes       false
max_retries       Maximum retries for failed chunks         3
memory_threshold  Memory threshold percentage               90
kwargs            Custom arguments to pass to your script   {}
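As a rough picture of what such a file might contain, the sketch below builds a dictionary from the option names and defaults listed above and renders it as JSON. The exact layout that octorun save_config writes is an assumption here, and render_config is a hypothetical helper, not part of OctoRun.

```python
import json

# Option names and defaults are taken from the table above; the flat
# top-level layout of the file is an assumption for illustration.
default_config = {
    "script_path": "./your_script.py",
    "gpus": "auto",
    "total_chunks": 128,
    "log_dir": "./logs",
    "chunk_lock_dir": "./logs/locks",
    "monitor_interval": 60,
    "restart_failed": False,
    "max_retries": 3,
    "memory_threshold": 90,
    "kwargs": {},
}


def render_config(config: dict) -> str:
    """Serialize a config dictionary as indented JSON text."""
    return json.dumps(config, indent=2)


print(render_config(default_config))
```

To pin the run to specific devices instead of "auto", gpus would be a list of GPU IDs, for example [0, 1].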

🎯 Using Kwargs

You can pass custom arguments to your script via the kwargs object in your config.json or directly through the CLI.

Kwargs given on the CLI override kwargs with the same key in the config file.

octorun run --kwargs '{"batch_size": 128, "learning_rate": 0.005}'
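The override rule can be sketched as a dictionary merge. This is an illustration under stated assumptions, not OctoRun's code: merge_kwargs and to_cli_flags are hypothetical names, and forwarding each kwarg to the script as a --key value pair is an assumption based on the argparse example in this README.

```python
import json
from typing import List, Optional


def merge_kwargs(config_kwargs: dict, cli_kwargs_json: Optional[str]) -> dict:
    """Merge kwargs so that CLI values win over config file values."""
    merged = dict(config_kwargs)
    if cli_kwargs_json:
        merged.update(json.loads(cli_kwargs_json))
    return merged


def to_cli_flags(kwargs: dict) -> List[str]:
    """Turn a kwargs dict into a flat list of --key value arguments (assumed format)."""
    flags: List[str] = []
    for key, value in kwargs.items():
        flags += [f"--{key}", str(value)]
    return flags


config_kwargs = {"batch_size": 32, "epochs": 2}
cli_kwargs = '{"batch_size": 128, "learning_rate": 0.005}'

merged = merge_kwargs(config_kwargs, cli_kwargs)
print(merged)  # {'batch_size': 128, 'epochs': 2, 'learning_rate': 0.005}
print(to_cli_flags(merged))
```

Here batch_size comes from the CLI, epochs survives from the config file, and learning_rate is added by the CLI.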

🔧 Script Implementation

Your script must accept the following arguments:

  • --gpu_id: GPU device ID (int)
  • --chunk_id: Current chunk number (int)
  • --total_chunks: Total number of chunks (int)

Here is an example of how to structure your script:

import argparse
import torch

def main():
    parser = argparse.ArgumentParser()
    
    # Required OctoRun arguments
    parser.add_argument('--gpu_id', type=int, required=True)
    parser.add_argument('--chunk_id', type=int, required=True)
    parser.add_argument('--total_chunks', type=int, required=True)
    
    # Your custom arguments
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--model_type', type=str, default='default')
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--output_dir', type=str, default='./output')
    
    args = parser.parse_args()
    
    # Set the GPU device
    if torch.cuda.is_available():
        torch.cuda.set_device(args.gpu_id)
        print(f"Using GPU {args.gpu_id}")
    
    print(f"Processing chunk {args.chunk_id}/{args.total_chunks}")
    
    # Your logic here

if __name__ == "__main__":
    main()
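One way to fill in "Your logic here" is to have each invocation process only its own slice of the work. The sketch below assumes chunk_id is zero-based (0 to total_chunks - 1), which this README does not state, and select_chunk is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_chunk(items: Sequence[T], chunk_id: int, total_chunks: int) -> List[T]:
    """Return the items belonging to one chunk, using a strided split.

    Assumes chunk_id is zero-based; adjust if your chunk IDs start at 1.
    """
    if total_chunks <= 0:
        raise ValueError("total_chunks must be positive")
    if not 0 <= chunk_id < total_chunks:
        raise ValueError("chunk_id must be in [0, total_chunks)")
    # Every total_chunks-th item starting at chunk_id: chunks are disjoint
    # and together cover the whole input.
    return list(items[chunk_id::total_chunks])


work_items = list(range(10))
print(select_chunk(work_items, 1, 4))  # [1, 5, 9]
```

A strided split keeps chunk sizes within one item of each other without knowing the input length up front; contiguous ranges work equally well if your data benefits from locality.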

🤝 Contributing

Contributions are welcome! Please fork the repository, create a feature branch, and submit a pull request.

📄 License

This project is licensed under the MIT License.
