
Parallel execution of DVC stages


[!WARNING] Celery cannot handle large workflows and will crash your computer. See https://github.com/celery/celery/issues/9475. Paraffin should therefore not currently be used for large workflows. We are working on a solution that does not rely on Celery.

paraffin

Paraffin, named after the Latin phrase parum affinis ("little related"), is a Python package for running DVC stages in parallel. DVC does not currently support this directly, and Paraffin provides a workaround. For more details, refer to the DVC documentation on parallel stage execution.

[!WARNING] paraffin is still very experimental. Do not use it for production workflows.

Installation

Install Paraffin via pip:

pip install paraffin

Usage

The paraffin submit command mirrors dvc repro, enabling you to queue and execute your entire pipeline or selected stages with parallelization. If no parameters are specified, the entire graph will be queued and executed via dvc repro --single-item.

# Queue the selected stages (omit the stage names to queue the entire graph)
paraffin submit <stage name> <stage name> ... <stage name>
# Example: start a worker that runs at most 4 jobs in parallel
paraffin worker --concurrency=4
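
When driving Paraffin from a script, the two commands above can be assembled in Python. This is only a minimal sketch: the helper names build_submit_command and build_worker_command are hypothetical and not part of Paraffin, and only the submit and worker subcommands and the --concurrency option shown above are used.

```python
import shlex


def build_submit_command(stages=()):
    """Return the argument list for `paraffin submit`.

    An empty `stages` sequence queues the entire graph, as described above.
    """
    return ["paraffin", "submit", *stages]


def build_worker_command(concurrency):
    """Return the argument list for a worker limited to `concurrency` parallel jobs."""
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    return ["paraffin", "worker", f"--concurrency={concurrency}"]


# Stage names are taken from the example flowchart in this README.
submit_cmd = build_submit_command(["A_X_ParamsToOuts", "A_Y_ParamsToOuts"])
worker_cmd = build_worker_command(4)

# Print the commands in shell form; nothing is executed here.
print(shlex.join(submit_cmd))
print(shlex.join(worker_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) inside a DVC repository to actually run the command.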

Parallel Execution

Due to limitations in Celery’s graph handling (see Celery discussion), complete parallelization is not always achievable. Paraffin visualizes all stages in a Mermaid flowchart and groups the stages that can run in parallel into levels:

flowchart TD
        subgraph Level0:1
                A_X_ParamsToOuts
                A_X_ParamsToOuts_1
                A_Y_ParamsToOuts
                A_Y_ParamsToOuts_1
        end
        subgraph Level0:2
                A_X_AddNodeNumbers
                A_Y_AddNodeNumbers
        end
        subgraph Level0:3
                A_SumNodeAttributes
        end
        Level0:1 --> Level0:2
        Level0:2 --> Level0:3
        subgraph Level1:1
                B_X_ParamsToOuts
                B_X_ParamsToOuts_1
                B_Y_ParamsToOuts
                B_Y_ParamsToOuts_1
        end
        subgraph Level1:2
                B_X_AddNodeNumbers
                B_Y_AddNodeNumbers
        end
        subgraph Level1:3
                B_SumNodeAttributes
        end
        Level1:1 --> Level1:2
        Level1:2 --> Level1:3
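
The grouping into levels can be illustrated with a short Python sketch. It is not Paraffin's implementation: it uses the standard-library graphlib module, and the dependency map below is an assumption inferred from the first half of the flowchart (the real edges come from the DVC pipeline).

```python
from graphlib import TopologicalSorter


def parallel_levels(dependencies):
    """Group stages into levels.

    `dependencies` maps each stage to the stages it depends on. All stages in
    one level have their dependencies satisfied by earlier levels, so they
    could run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)  # mark the whole level as finished
    return levels


# Assumed dependency map (stage -> stages it depends on), for illustration only.
deps = {
    "A_X_ParamsToOuts": [],
    "A_X_ParamsToOuts_1": [],
    "A_Y_ParamsToOuts": [],
    "A_Y_ParamsToOuts_1": [],
    "A_X_AddNodeNumbers": ["A_X_ParamsToOuts", "A_X_ParamsToOuts_1"],
    "A_Y_AddNodeNumbers": ["A_Y_ParamsToOuts", "A_Y_ParamsToOuts_1"],
    "A_SumNodeAttributes": ["A_X_AddNodeNumbers", "A_Y_AddNodeNumbers"],
}

levels = parallel_levels(deps)
for number, stages in enumerate(levels, start=1):
    print(f"Level {number}: {', '.join(stages)}")
```

With this assumed map, the sketch yields three levels that match the three subgraphs of the first group in the flowchart.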

Queue Labels

To fine-tune execution, you can assign stages to specific Celery queues, allowing you to manage execution across different environments or hardware setups. Define queues in a paraffin.yaml file:

queue:
    "B_X*": BQueue
    "A_X_AddNodeNumbers": AQueue

Then start a worker that listens on specific queues, for example celery (the default) and AQueue:

paraffin worker -q AQueue,celery

All stages not assigned to a queue in paraffin.yaml will default to the celery queue.
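
The mapping from stage names to queues can be sketched as follows. This is an illustration, not Paraffin's code: it assumes that the patterns in paraffin.yaml are shell-style globs (as the "B_X*" entry suggests) and that the first matching pattern wins. The function name resolve_queue is hypothetical.

```python
from fnmatch import fnmatchcase

DEFAULT_QUEUE = "celery"  # queue for stages without a matching pattern


def resolve_queue(stage_name, queue_patterns):
    """Return the queue for `stage_name`.

    Assumption: patterns are shell-style globs and the first match wins.
    """
    for pattern, queue in queue_patterns.items():
        if fnmatchcase(stage_name, pattern):
            return queue
    return DEFAULT_QUEUE


# The same mapping as the `queue` section of the paraffin.yaml example above.
queue_patterns = {
    "B_X*": "BQueue",
    "A_X_AddNodeNumbers": "AQueue",
}

for stage in ["B_X_ParamsToOuts", "A_X_AddNodeNumbers", "A_SumNodeAttributes"]:
    print(f"{stage} -> {resolve_queue(stage, queue_patterns)}")
```

Under these assumptions, a worker started with -q AQueue,celery would pick up A_X_AddNodeNumbers and all unassigned stages, but none of the B_X stages.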

[!TIP] If you are building Python-based workflows with DVC, consider trying our other project ZnTrack for a more Pythonic way to define workflows.

