A command-line tool for distributed parallel execution across multiple GPUs
Project description
๐ OctoRun
Distributed Parallel Execution Made Simple
Run Python scripts across multiple GPUs with intelligent chunk management, failure recovery, and live job monitoring
Overview
OctoRun dispatches one Python worker process per GPU, divides your dataset into chunks, and coordinates work across multiple machines through shared lock files. Each worker independently claims a chunk, processes it, and writes its output atomically โ making it safe to run OctoRun on many nodes simultaneously without any central coordinator.
Key Features
- Automatic GPU Detection โ detects and allocates all available GPUs
- Chunk-Based Work Distribution โ divides keys by stride, not by file, so any
total_chunksvalue works regardless of input sharding - Failure Recovery โ stale locks (heartbeat timeout 5 min) are automatically reclaimed; configurable retries
- Multi-Machine Safe โ multiple OctoRun instances on different nodes share the same lock directory on a shared filesystem
- Live Job Monitoring โ
octorun statusshows active sessions, completed chunks, and stale locks at a glance
Installation
# Via uv (recommended)
uv tool install octorun
# In a project venv
uv add octorun
# Via pip
pip install octorun
Optional extras:
# GPU benchmark tooling (requires PyTorch)
pip install "octorun[benchmark]"
Quick Start
# 1. Generate a default config
octorun save_config --script ./your_script.py
# 2. Run across all available GPUs
octorun run --config config.json
# 3. Monitor progress from any machine
octorun status ./logs
Commands
run (r)
Launch workers across all available GPUs.
octorun run --config config.json
octorun run --config config.json --kwargs '{"batch_size": 64}'
CLI --kwargs override any kwargs values from the config file.
status (st)
Show a live summary of a running or completed job.
octorun status <log_dir>
octorun status <log_dir> --alive-threshold 120
log_dir is the same directory you set as log_dir in your config. It contains the *_session_*.log files and the locks/ subdirectory.
Example output:
OctoRun Job Status โ ./logs/stage3
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Locks total : 164
Completed : 29
Active : 48 (6 sessions)
Stale locks : 87 (dead workers, will be auto-reclaimed)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Active sessions (6):
node1 8 chunks [25, 26, 28, 32, ...] (heartbeat 43s ago)
node2 8 chunks [16, 18, 20, 21, ...] (heartbeat 47s ago)
...
Dead sessions โ stale locks (4 node(s)):
node3 8 chunks [161, 162, ...] (last seen 1h31m ago)
...
| Field | Meaning |
|---|---|
| Locks total | Lock files currently on disk (active + stale) |
| Completed | Chunks that finished successfully (persistent .completed markers) |
| Active | Chunks held by workers with a live heartbeat |
| Stale locks | Locks from crashed/killed workers โ reclaimed automatically when the next chunk finishes on any active node |
--alive-threshold (default 300 s) sets how long a session can be silent before it is considered dead.
save_config (s)
Write a default config.json to the current directory.
octorun save_config --script ./your_script.py
list_gpus (l)
List available GPUs.
octorun list_gpus
octorun list_gpus --detailed
benchmark (b)
Continuously measure GPU TFLOPs and communication bandwidth.
octorun benchmark
octorun benchmark --gpus 0,1,2 --test-duration 10 --interval 30
Requires octorun[benchmark].
Configuration
Generate a starter config with octorun save_config, then edit as needed:
{
"script_path": "your_script.py",
"gpus": "auto",
"total_chunks": 1000,
"log_dir": "./logs",
"chunk_lock_dir": "./logs/locks",
"monitor_interval": 60,
"restart_failed": true,
"max_retries": 2,
"kwargs": {
"input_dir": "/data/input",
"output_dir": "/data/output",
"batch_size": 32
}
}
| Option | Description | Default |
|---|---|---|
script_path |
Path to your Python worker script | โ |
gpus |
"auto" or a list of GPU IDs |
"auto" |
total_chunks |
Total number of chunks to divide work into | 128 |
log_dir |
Directory for session and chunk logs | "./logs" |
chunk_lock_dir |
Directory for lock files | "./logs/locks" |
monitor_interval |
Seconds between process-status checks | 60 |
restart_failed |
Retry failed chunks | false |
max_retries |
Maximum retries per chunk | 3 |
success_codes |
Worker exit codes treated as success | [0] |
kwargs |
Extra arguments forwarded to your script | {} |
success_codes
By default a chunk is considered complete only when the worker exits with code 0.
Add additional codes here when your worker has a known-benign non-zero exit. For
example, if the worker calls sys.exit(0) but the CPython interpreter shutdown
phase fails (often 120 due to atexit / stdout-flush errors on slow networked
filesystems), the chunk's data is already written and you can safely treat 120
as success:
"success_codes": [0, 120]
Writing a Worker Script
--gpu_id is a local rank, not a device index
OctoRun does not set CUDA_VISIBLE_DEVICES or any other GPU environment variable.
--gpu_id is simply the index of this worker among the workers launched on the current
machine โ i.e. a local_rank. If you launch with gpus: [0, 1, 2, 3], four workers
start with --gpu_id 0, 1, 2, 3 respectively, but OctoRun leaves device selection
entirely to your script.
Common patterns:
# Pattern A โ direct device index (works when no other processes use the GPUs)
torch.cuda.set_device(args.gpu_id)
# Pattern B โ set CUDA_VISIBLE_DEVICES before any CUDA init so the process
# only sees one device and always addresses it as cuda:0.
# Do this at the very top of the script, before importing torch/transformers.
import os, argparse
_p = argparse.ArgumentParser(add_help=False)
_p.add_argument("--gpu_id", type=int, default=0)
_early, _ = _p.parse_known_args()
os.environ["CUDA_VISIBLE_DEVICES"] = str(_early.gpu_id)
# Now import torch โ it will only see one GPU
import torch
model = model.to("cuda:0")
Pattern B is strongly recommended when loading large models (transformers, etc.) because
many libraries initialize CUDA at import time and will claim the wrong device if
CUDA_VISIBLE_DEVICES is not set beforehand.
Minimal script template
Your script must accept three OctoRun arguments plus any custom ones:
import argparse
def parse_args():
p = argparse.ArgumentParser()
# Required by OctoRun โ gpu_id is a local_rank, not a device index
p.add_argument("--gpu_id", type=int, required=True)
p.add_argument("--chunk_id", type=int, required=True)
p.add_argument("--total_chunks", type=int, required=True)
# Your own arguments (forwarded via kwargs)
p.add_argument("--input_dir", type=str, required=True)
p.add_argument("--output_dir", type=str, required=True)
p.add_argument("--batch_size", type=int, default=32)
return p.parse_args()
def main():
args = parse_args()
# Shard your dataset by stride
all_keys = sorted(load_all_keys(args.input_dir))
my_keys = all_keys[args.chunk_id::args.total_chunks]
# Skip if already done (idempotent)
output_path = get_output_path(args.output_dir, args.chunk_id, args.total_chunks)
if output_path.exists():
return
# Process and write atomically
results = process(my_keys, gpu=args.gpu_id, batch_size=args.batch_size)
write_atomic(results, output_path)
if __name__ == "__main__":
main()
Multi-Machine Usage
Run OctoRun on each machine, pointing at the same shared log_dir and chunk_lock_dir:
# machine A
octorun run --config config.json # claims chunks 0, 1, 2, ...
# machine B (simultaneously)
octorun run --config config.json # claims the next available chunks
Lock files on the shared filesystem prevent any chunk from being processed twice. Monitor all machines from anywhere:
octorun status ./logs
Shared filesystem requirements
chunk_lock_dir relies on O_CREAT | O_EXCL having atomic, cross-client
semantics. Use a POSIX-compliant filesystem (local disk, NFS, CephFS, etc.).
Do not place chunk_lock_dir on object-storage-backed filesystems such as
HDFS-fuse or s3fs โ they typically fall back to non-atomic
"stat-then-create" sequences and have client-side metadata caching, both of
which let two workers acquire the same lock. Output data can still live on
HDFS / S3, but locking requires a real filesystem. If you must run on such
storage, ensure your worker writes outputs idempotently (tmp + rename) and
treat the lock as best-effort rather than a correctness guarantee.
Contributing
Fork the repository, create a feature branch, and open a pull request.
License
MIT License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file octorun-1.0.2.tar.gz.
File metadata
- Download URL: octorun-1.0.2.tar.gz
- Upload date:
- Size: 49.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
86ca4813260b30033948ed08adc849be58f174680034aa828121d3bdfa70f8bd
|
|
| MD5 |
44f18174479518081cdcb6f653235fa7
|
|
| BLAKE2b-256 |
5d69a18f6722cbea3c6cf92c303e4f974cd7175a11d9c48f395db267fef99b0b
|
Provenance
The following attestation bundles were made for octorun-1.0.2.tar.gz:
Publisher:
publish.yml on HarborYuan/OctoRun
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
octorun-1.0.2.tar.gz -
Subject digest:
86ca4813260b30033948ed08adc849be58f174680034aa828121d3bdfa70f8bd - Sigstore transparency entry: 1397798836
- Sigstore integration time:
-
Permalink:
HarborYuan/OctoRun@d91a336a2828fa49be59a6bfb70709b5945f3231 -
Branch / Tag:
refs/tags/v1.0.2 - Owner: https://github.com/HarborYuan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d91a336a2828fa49be59a6bfb70709b5945f3231 -
Trigger Event:
release
-
Statement type:
File details
Details for the file octorun-1.0.2-py3-none-any.whl.
File metadata
- Download URL: octorun-1.0.2-py3-none-any.whl
- Upload date:
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
665455c43668e9a5e83abec3e5381509788a5c54919e55960d9ade8c6f4284d4
|
|
| MD5 |
0e60ca4c492c7bdbd0d782e5732ff00d
|
|
| BLAKE2b-256 |
91788ae45b1124406e221f7588b974ce0824fc7075a69fddf1441120d24baf43
|
Provenance
The following attestation bundles were made for octorun-1.0.2-py3-none-any.whl:
Publisher:
publish.yml on HarborYuan/OctoRun
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
octorun-1.0.2-py3-none-any.whl -
Subject digest:
665455c43668e9a5e83abec3e5381509788a5c54919e55960d9ade8c6f4284d4 - Sigstore transparency entry: 1397798860
- Sigstore integration time:
-
Permalink:
HarborYuan/OctoRun@d91a336a2828fa49be59a6bfb70709b5945f3231 -
Branch / Tag:
refs/tags/v1.0.2 - Owner: https://github.com/HarborYuan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d91a336a2828fa49be59a6bfb70709b5945f3231 -
Trigger Event:
release
-
Statement type: