Parallel execution of DVC stages
paraffin
Paraffin, derived from the Latin phrase "parum affinis" meaning "little related", is a Python package designed to run DVC stages in parallel. While DVC does not currently support this directly, Paraffin provides an effective workaround. For more details, refer to the DVC documentation on parallel stage execution.
[!WARNING] paraffin is still very experimental. Do not use it for production workflows.
Installation
Install Paraffin via pip:
pip install paraffin
Usage
The paraffin submit command mirrors dvc repro, enabling you to queue and execute your entire pipeline or selected stages with parallelization. If no parameters are specified, the entire graph will be queued and executed via dvc repro --single-item.
paraffin submit <stage name> <stage name> ... <stage name>
# Example: run with a maximum of 4 parallel jobs
paraffin worker --concurrency=4
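If you drive Paraffin from a Python script, the two commands above can be assembled as argument lists. This is a minimal sketch: the helper names `build_submit_command`, `build_worker_command`, and `run` are hypothetical and not part of Paraffin, and only the `submit` and `worker --concurrency` invocations shown above are used. By default it only prints the commands.

```python
import shlex
import subprocess


def build_submit_command(stages=()):
    """Return the argv for `paraffin submit`, optionally limited to some stages."""
    return ["paraffin", "submit", *stages]


def build_worker_command(concurrency=None):
    """Return the argv for `paraffin worker` with an optional job limit."""
    argv = ["paraffin", "worker"]
    if concurrency is not None:
        argv.append(f"--concurrency={concurrency}")
    return argv


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Stage names are taken from the example pipeline in this README.
    run(build_submit_command(["A_X_AddNodeNumbers", "A_SumNodeAttributes"]))
    run(build_worker_command(concurrency=4))
```

Pass `dry_run=False` to actually launch the commands; this requires `paraffin` to be installed and on your `PATH`.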
Parallel Execution
Due to limitations in Celery’s graph handling (see Celery discussion), complete parallelization is not always achievable. Paraffin displays the stages that can run in parallel as a Mermaid flowchart, in which each subgraph groups the stages of one level.
flowchart TD
subgraph Level0:1
A_X_ParamsToOuts
A_X_ParamsToOuts_1
A_Y_ParamsToOuts
A_Y_ParamsToOuts_1
end
subgraph Level0:2
A_X_AddNodeNumbers
A_Y_AddNodeNumbers
end
subgraph Level0:3
A_SumNodeAttributes
end
Level0:1 --> Level0:2
Level0:2 --> Level0:3
subgraph Level1:1
B_X_ParamsToOuts
B_X_ParamsToOuts_1
B_Y_ParamsToOuts
B_Y_ParamsToOuts_1
end
subgraph Level1:2
B_X_AddNodeNumbers
B_Y_AddNodeNumbers
end
subgraph Level1:3
B_SumNodeAttributes
end
Level1:1 --> Level1:2
Level1:2 --> Level1:3
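The level grouping in the flowchart can be illustrated with a small layering routine. This is a sketch of the general idea, not Paraffin's implementation: the function `group_into_levels` is hypothetical, and the dependencies between the `A_*` stages are assumed for illustration, since the flowchart only shows the resulting levels.

```python
def group_into_levels(dependencies):
    """Group stages into levels whose members depend only on earlier levels.

    `dependencies` maps each stage name to the set of stages it depends on.
    Stages within one level do not depend on each other and can run in parallel.
    """
    remaining = {stage: set(deps) for stage, deps in dependencies.items()}
    # Stages that only appear as dependencies are treated as having none.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    levels = []
    while remaining:
        ready = sorted(stage for stage, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for stage in ready:
            del remaining[stage]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


# Assumed dependencies for the A_* stages of the example graph.
example_dependencies = {
    "A_X_ParamsToOuts": set(),
    "A_X_ParamsToOuts_1": set(),
    "A_Y_ParamsToOuts": set(),
    "A_Y_ParamsToOuts_1": set(),
    "A_X_AddNodeNumbers": {"A_X_ParamsToOuts", "A_X_ParamsToOuts_1"},
    "A_Y_AddNodeNumbers": {"A_Y_ParamsToOuts", "A_Y_ParamsToOuts_1"},
    "A_SumNodeAttributes": {"A_X_AddNodeNumbers", "A_Y_AddNodeNumbers"},
}

if __name__ == "__main__":
    for number, level in enumerate(group_into_levels(example_dependencies), start=1):
        print(f"Level {number}: {', '.join(level)}")
```

Under these assumed dependencies, the routine produces three levels that match the `Level0:1` to `Level0:3` subgraphs above.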
Queue Labels
To fine-tune execution, you can assign stages to specific Celery queues, allowing you to manage execution across different environments or hardware setups.
Define queues in a paraffin.yaml file:
queue:
  "B_X*": BQueue
  "A_X_AddNodeNumbers": AQueue
Then, start a worker with specified queues, such as celery (default) and AQueue:
paraffin worker -q AQueue,celery
All stages not assigned to a queue in paraffin.yaml will default to the celery queue.
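To make the mapping concrete, the following sketch resolves a stage name to a queue. It assumes that patterns such as `"B_X*"` are shell-style globs and that the first matching entry wins; neither is confirmed here as Paraffin's actual behavior, and `resolve_queue` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

DEFAULT_QUEUE = "celery"


def resolve_queue(stage_name, queue_patterns, default=DEFAULT_QUEUE):
    """Return the queue of the first pattern matching `stage_name`, else the default."""
    for pattern, queue in queue_patterns.items():
        if fnmatchcase(stage_name, pattern):
            return queue
    return default


# Mirrors the `queue` section of the paraffin.yaml example above.
queue_patterns = {
    "B_X*": "BQueue",
    "A_X_AddNodeNumbers": "AQueue",
}

if __name__ == "__main__":
    for stage in ("B_X_ParamsToOuts", "A_X_AddNodeNumbers", "A_SumNodeAttributes"):
        print(f"{stage} -> {resolve_queue(stage, queue_patterns)}")
```

With this mapping, a worker started with `-q AQueue,celery` would pick up `A_X_AddNodeNumbers` and every unassigned stage, while the `B_X*` stages would wait for a worker that listens on `BQueue`.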
[!TIP] If you are building Python-based workflows with DVC, consider trying our other project ZnTrack for a more Pythonic way to define workflows.