A command-line tool for distributed parallel execution across multiple GPUs

Maintainers

yuanhaobo

Project description

🐙 OctoRun

Distributed Parallel Execution Made Simple

A powerful command-line tool for running Python scripts across multiple GPUs with intelligent task management and monitoring

📋 Overview

OctoRun is designed to help you run computationally intensive Python scripts across multiple GPUs efficiently. It automatically manages GPU allocation, chunks your workload, handles failures with retry mechanisms, and provides comprehensive monitoring and logging.

✨ Key Features

🔍 Automatic GPU Detection: Detects and uses the available GPUs
🧩 Intelligent Chunk Management: Divides work into chunks and distributes them across GPUs
🔄 Failure Recovery: Automatic retry mechanism for failed chunks
📊 Comprehensive Logging: Detailed logging for monitoring and debugging
⚙️ Flexible Configuration: JSON-based configuration with CLI overrides
🎯 Kwargs Support: Pass custom arguments to your scripts via config or CLI
💾 Memory Monitoring: Monitor GPU memory usage and thresholds
🔒 Lock Management: Prevent duplicate processing of chunks
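
As an illustration of the lock-management idea, here is a minimal sketch of one common way to make sure only one worker claims a chunk: creating a lock file atomically. This is not taken from OctoRun's source. The helper name `try_claim_chunk` and the `chunk_<id>.lock` file name are assumptions made for the example.

```python
import os
from pathlib import Path


def try_claim_chunk(lock_dir: str, chunk_id: int) -> bool:
    """Atomically create a lock file for chunk_id; return False if one exists."""
    Path(lock_dir).mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme; OctoRun's real lock file names may differ.
    lock_path = Path(lock_dir) / f"chunk_{chunk_id}.lock"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one caller wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True
```

A second call with the same `chunk_id` returns `False`, which is the signal to skip that chunk.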

🚀 Installation

Quick Run via uv (Without Installation)

uvx octorun [run, save_config, list_gpus]

Via uv (Installation, Globally)

uv tool install octorun

Via uv (Install in Your Own Project)

uv add octorun

Via pip

pip install octorun

⚡ Quick Start

1️⃣ Create Configuration

octorun save_config --script ./your_script.py

octorun s --script ./your_script.py

2️⃣ Run Your Script

octorun run [--config config.json]

octorun r

3️⃣ Monitor GPU Usage

octorun list_gpus [--detailed]

octorun l -d

4️⃣ View Logs

tail -f logs/session_*.log

and

tail -f logs/chunk_*.log

⚙️ Configuration

📄 Basic Configuration

The configuration file (config.json) contains the following options:

{
    "script_path": "./your_script.py",
    "gpus": "auto",
    "total_chunks": 128,
    "log_dir": "./logs",
    "chunk_lock_dir": "./logs/locks",
    "monitor_interval": 60,
    "restart_failed": false,
    "max_retries": 3,
    "memory_threshold": 90,
    "kwargs": {
        "batch_size": 32,
        "learning_rate": 0.001
    }
}

🔧 Configuration Options

Option	Description	Default
`script_path`	Path to your Python script	-
`gpus`	GPU configuration ("auto" or list of GPU IDs)	"auto"
`total_chunks`	Number of chunks to divide work into	128
`log_dir`	Directory for log files	"./logs"
`chunk_lock_dir`	Directory for chunk lock files	"./logs/locks"
`monitor_interval`	Monitoring interval in seconds	60
`restart_failed`	Whether to restart failed processes	false
`max_retries`	Maximum retries for failed chunks	3
`memory_threshold`	GPU memory usage threshold, in percent	90
`kwargs`	Custom arguments to pass to script	{}
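
To make the defaults concrete, the sketch below reads a config file and fills in any option that is missing, using the values from the table above. The helper `load_config` is hypothetical and only illustrates how the defaults combine with a user file. It is not OctoRun's own loader.

```python
import json
from pathlib import Path

# Defaults copied from the options table above.
DEFAULTS = {
    "gpus": "auto",
    "total_chunks": 128,
    "log_dir": "./logs",
    "chunk_lock_dir": "./logs/locks",
    "monitor_interval": 60,
    "restart_failed": False,
    "max_retries": 3,
    "memory_threshold": 90,
    "kwargs": {},
}


def load_config(path: str) -> dict:
    """Hypothetical helper: read a config file and fill in missing options."""
    user = json.loads(Path(path).read_text(encoding="utf-8"))
    if "script_path" not in user:
        # The table lists no default for script_path, so it must be given.
        raise ValueError("script_path has no default and must be set")
    config = {**DEFAULTS, **user}
    config["kwargs"] = dict(config["kwargs"])  # copy so DEFAULTS stays untouched
    return config
```

A file that sets only `script_path` and `total_chunks` would therefore still end up with `max_retries` equal to 3 and an empty `kwargs` dictionary.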

🎯 Using Kwargs

OctoRun supports passing additional keyword arguments to your scripts through both the configuration file and command line interface.

📋 Configuration File

Add kwargs to your config.json:

{
    "script_path": "./train_model.py",
    "gpus": "auto",
    "total_chunks": 128,
    "kwargs": {
        "batch_size": 64,
        "learning_rate": 0.01,
        "model_type": "transformer",
        "epochs": 10,
        "output_dir": "./results"
    }
}
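
Judging by the argparse example in the Script Implementation section, each kwarg probably reaches your script as a `--key value` pair. The sketch below shows that mapping. The exact flag layout is an assumption, and `kwargs_to_flags` is a hypothetical helper rather than part of OctoRun.

```python
def kwargs_to_flags(kwargs: dict) -> list[str]:
    """Turn {"batch_size": 64} into ["--batch_size", "64"] (assumed layout)."""
    flags = []
    for key, value in kwargs.items():
        # Every value is stringified; the script's argparse types convert it back.
        flags.extend([f"--{key}", str(value)])
    return flags
```

With the config above, the first two kwargs would become `--batch_size 64 --learning_rate 0.01` under this assumption.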

🖥️ Command Line Interface

Override or add kwargs via command line:

# Override config kwargs
octorun run --config config.json --kwargs '{"batch_size": 128, "learning_rate": 0.005}'

# Add new kwargs
octorun run --config config.json --kwargs '{"model_type": "bert", "max_length": 512}'

🎯 Priority

CLI kwargs > Config file kwargs

CLI kwargs override config file kwargs that share a key; config kwargs with other keys are kept unchanged.
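
The priority rule amounts to a dictionary merge in which the CLI values are applied last. A minimal sketch follows; the function name `merge_kwargs` is made up for the example and this is not OctoRun's own code.

```python
import json
from typing import Optional


def merge_kwargs(config_kwargs: dict, cli_json: Optional[str]) -> dict:
    """Merge config kwargs with the JSON string given to --kwargs."""
    cli_kwargs = json.loads(cli_json) if cli_json else {}
    # Later entries win, so CLI keys override config keys with the same name.
    return {**config_kwargs, **cli_kwargs}
```

For example, a config with `batch_size` 64 and `epochs` 10, combined with `--kwargs '{"batch_size": 128}'`, yields `batch_size` 128 while `epochs` stays at 10.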

🔧 Script Implementation

Your script must accept the required OctoRun arguments plus any custom kwargs:

import argparse

def main():
    parser = argparse.ArgumentParser()
    
    # 🔧 Required OctoRun arguments
    parser.add_argument('--gpu_id', type=int, required=True)
    parser.add_argument('--chunk_id', type=int, required=True)
    parser.add_argument('--total_chunks', type=int, required=True)
    
    # 🎯 Your custom arguments (Optional)
    parser.add_argument('--batch_size', type=int, default=32)
    parser.add_argument('--learning_rate', type=float, default=0.001)
    parser.add_argument('--model_type', type=str, default='default')
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--output_dir', type=str, default='./output')
    
    args = parser.parse_args()
    
    # 🎮 Device handling - Set the GPU device
    # This is an example for PyTorch
    import torch
    if torch.cuda.is_available():
        torch.cuda.set_device(args.gpu_id)
        print(f"🎮 Using GPU {args.gpu_id}: {torch.cuda.get_device_name(args.gpu_id)}")
    else:
        print("⚠️  CUDA not available, using CPU")
    
    # ✨ Use the arguments in your script
    print(f"🚀 Processing chunk {args.chunk_id}/{args.total_chunks} on GPU {args.gpu_id}")
    print(f"🎯 Training with batch_size={args.batch_size}, lr={args.learning_rate}")
    
    # Your processing logic here
    ...

if __name__ == "__main__":
    main()
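
What the script does with `chunk_id` and `total_chunks` is left to you. One simple option, sketched below, is a strided split in which each chunk takes every `total_chunks`-th item. It assumes chunk IDs are zero-based, which the README does not state, and `items_for_chunk` is a hypothetical helper.

```python
def items_for_chunk(items: list, chunk_id: int, total_chunks: int) -> list:
    """Return every total_chunks-th item, starting at index chunk_id."""
    # Assumption: chunk_id runs from 0 to total_chunks - 1.
    if not 0 <= chunk_id < total_chunks:
        raise ValueError("chunk_id must be in [0, total_chunks)")
    return items[chunk_id::total_chunks]
```

Inside `main()` this could be used as `my_files = items_for_chunk(all_files, args.chunk_id, args.total_chunks)`, so that the chunks together cover every item exactly once.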

🎮 Commands

🚀 `run` (r)

Run your script with the specified configuration:

octorun run --config config.json [--kwargs '{"key": "value"}']

💾 `save_config` (s)

Generate a default configuration file:

octorun save_config [--script ./your_script.py]

🔍 `list_gpus` (l)

List available GPUs:

octorun list_gpus [--detailed]

📚 Examples

🤖 Example 1: Machine Learning Training

Config file (ml_config.json):

{
    "script_path": "./train_model.py",
    "total_chunks": 64,
    "kwargs": {
        "batch_size": 32,
        "learning_rate": 0.001,
        "model_type": "resnet50",
        "epochs": 100,
        "dataset_path": "/data/imagenet"
    }
}

Command:

octorun run --config ml_config.json --kwargs '{"batch_size": 64, "learning_rate": 0.01}'

📊 Example 2: Data Processing

octorun run --config config.json --kwargs '{"input_dir": "/data/raw", "output_dir": "/data/processed", "compression": "gzip"}'

📊 Monitoring and Logging

OctoRun writes the following log and lock files:

Log Type	Location	Description
📋 Session logs	`logs/session_TIMESTAMP.log`	Overall session information
🧩 Chunk logs	`logs/chunk_N.log`	Individual chunk processing logs
🔒 Lock files	`logs/locks/`	Chunk completion tracking
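
For a quick overview of which chunks have produced output so far, the log directory can be scanned for files that match the `chunk_N.log` pattern from the table. The sketch assumes `N` is the integer chunk ID. `chunks_with_logs` is a hypothetical helper, not an OctoRun command.

```python
import re
from pathlib import Path

# Matches names such as chunk_42.log and captures the number.
CHUNK_LOG = re.compile(r"^chunk_(\d+)\.log$")


def chunks_with_logs(log_dir: str) -> list[int]:
    """Return the sorted chunk IDs that have a log file in log_dir."""
    ids = []
    for path in Path(log_dir).glob("chunk_*.log"):
        match = CHUNK_LOG.match(path.name)
        if match:  # skip names whose middle part is not a number
            ids.append(int(match.group(1)))
    return sorted(ids)
```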

📊 Real-time Monitoring

# Monitor session progress
tail -f logs/session_*.log

# Monitor specific chunk
tail -f logs/chunk_42.log

# Monitor GPU usage
watch -n 1 'octorun list_gpus --detailed'
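
The `memory_threshold` option is a percentage, so a check against it needs used and total memory per GPU. The sketch below parses text in the format printed by `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`. How OctoRun itself queries and compares memory is not documented here, so the `>=` comparison and the helper name `gpus_over_threshold` are assumptions.

```python
def gpus_over_threshold(csv_text: str, threshold: float = 90.0) -> list[int]:
    """Return indices of GPUs whose memory usage is at or above threshold percent."""
    over = []
    for line in csv_text.strip().splitlines():
        # Each line looks like "0, 23000, 24576" (index, used MiB, total MiB).
        index, used, total = (field.strip() for field in line.split(","))
        percent = 100.0 * float(used) / float(total)
        if percent >= threshold:
            over.append(int(index))
    return over
```

The input string can be obtained with `subprocess.run([...], capture_output=True, text=True).stdout`, using the `nvidia-smi` arguments named above.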

🛠️ Error Handling

🔄 Automatic retry mechanism for failed chunks
📊 Configurable maximum retry attempts
💾 Memory threshold monitoring
📝 Comprehensive error logging

Failed chunks are retried up to `max_retries` times, and errors are recorded in the logs.
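
The retry behaviour can be pictured as a loop around each chunk. The sketch below is a generic retry loop, not OctoRun's implementation. It assumes `max_retries` counts attempts made after the first failure, and `run_with_retries` is a hypothetical name.

```python
def run_with_retries(task, max_retries: int = 3):
    """Call task() until it succeeds or max_retries retries have been used."""
    retries_used = 0
    while True:
        try:
            return task()
        except Exception:
            # Assumption: the first attempt is free, then up to max_retries more.
            if retries_used >= max_retries:
                raise
            retries_used += 1
```

Under this reading, the default of 3 allows up to four attempts per chunk before the error is raised.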

📋 Requirements

🐍 Python ≥ 3.10
🎮 NVIDIA GPUs with CUDA support
🔧 nvidia-smi tool available in PATH
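
Two of these requirements can be checked from Python before starting a run. The sketch below uses only the standard library. `check_requirements` is a hypothetical helper, and it does not verify that a CUDA-capable GPU is actually present.

```python
import shutil
import sys


def check_requirements() -> dict[str, bool]:
    """Report whether the Python version and nvidia-smi requirements are met."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # shutil.which returns None when the executable is not in PATH.
        "nvidia-smi in PATH": shutil.which("nvidia-smi") is not None,
    }
```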

🤝 Contributing

We welcome contributions! Here's how to get started:

🍴 Fork the repository
🌿 Create a feature branch
✨ Make your changes
🧪 Add tests
📤 Submit a pull request

📄 License

This project is licensed under the MIT License.

👨‍💻 Author

Haobo Yuan - haoboyuan@ucmerced.edu

🙏 Acknowledgements

This project relies heavily on AI tools for code generation and documentation.

Made with ❤️ and 🤖 AI assistance

Star ⭐ this repo if you find it useful!

Project details

Release history

1.4.0

Jun 1, 2026

1.3.0

May 11, 2026

1.2.0

Apr 30, 2026

1.1.0

Apr 30, 2026

1.0.2

Apr 28, 2026

1.0.1

Apr 28, 2026

1.0.0

Apr 10, 2026

0.3.0

Mar 30, 2026

0.2.1

Oct 25, 2025

0.2.0

Oct 6, 2025

0.1.2

Jul 6, 2025

0.1.1.post1 (this version)

Jul 6, 2025

0.1.0

Jul 6, 2025

Download files

Source Distribution

octorun-0.1.1.post1.tar.gz (25.0 kB)

Uploaded Jul 6, 2025 Source

Built Distribution

octorun-0.1.1.post1-py3-none-any.whl (14.6 kB)

Uploaded Jul 6, 2025 Python 3

File details

Details for the file octorun-0.1.1.post1.tar.gz.

File metadata

Download URL: octorun-0.1.1.post1.tar.gz
Upload date: Jul 6, 2025
Size: 25.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for octorun-0.1.1.post1.tar.gz
Algorithm	Hash digest
SHA256	`857d1f481fec422d00911f7a4208e0587db31638f232581292cacb17bd9fe057`
MD5	`4df029b98c48bf62f178ed489e2bc18e`
BLAKE2b-256	`f3be38193f1cbf85249032cd00e1d345a7eaefa1a8e63a5d5e9f70a0cb912551`
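
To compare a downloaded file against the SHA256 digest listed above, the digest can be computed locally with Python's `hashlib`. This is a minimal sketch, and the helper name `sha256_of_file` is chosen for the example.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in blocks so large files do not need to fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

The download matches the listing if `sha256_of_file("octorun-0.1.1.post1.tar.gz")` equals the SHA256 value in the table.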

Provenance

The following attestation bundles were made for octorun-0.1.1.post1.tar.gz:

Publisher: publish.yml on HarborYuan/OctoRun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: octorun-0.1.1.post1.tar.gz
- Subject digest: 857d1f481fec422d00911f7a4208e0587db31638f232581292cacb17bd9fe057
- Sigstore transparency entry: 264674051
- Sigstore integration time: Jul 6, 2025
Source repository:
- Permalink: HarborYuan/OctoRun@75d2d1725c00907a15219bc8d2656ee79bf7cbf7
- Branch / Tag: refs/tags/v0.1.1.post1
- Owner: https://github.com/HarborYuan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@75d2d1725c00907a15219bc8d2656ee79bf7cbf7
- Trigger Event: release

File details

Details for the file octorun-0.1.1.post1-py3-none-any.whl.

File metadata

Download URL: octorun-0.1.1.post1-py3-none-any.whl
Upload date: Jul 6, 2025
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for octorun-0.1.1.post1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ebc7607c377f54c0549095110cd0db7ee9354a9393ea14b2504ea41994ccfcfa`
MD5	`c1663144129dcd59aec0ca370bf0abf4`
BLAKE2b-256	`fdfacec9c544752101c21bdf419c152750338b518c2e596323a1512e1d2ddf2e`

Provenance

The following attestation bundles were made for octorun-0.1.1.post1-py3-none-any.whl:

Publisher: publish.yml on HarborYuan/OctoRun

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: octorun-0.1.1.post1-py3-none-any.whl
- Subject digest: ebc7607c377f54c0549095110cd0db7ee9354a9393ea14b2504ea41994ccfcfa
- Sigstore transparency entry: 264674053
- Sigstore integration time: Jul 6, 2025
Source repository:
- Permalink: HarborYuan/OctoRun@75d2d1725c00907a15219bc8d2656ee79bf7cbf7
- Branch / Tag: refs/tags/v0.1.1.post1
- Owner: https://github.com/HarborYuan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@75d2d1725c00907a15219bc8d2656ee79bf7cbf7
- Trigger Event: release
