Use dask to run the DVC graph
Dask4DVC - Distributed Node Execution
DVC provides tools for building and executing the computational graph locally through various methods.
The dask4dvc package combines Dask Distributed with DVC to make it easier to use with HPC managers like Slurm.
It will try to run the DVC graph in parallel.
Usage
Dask4DVC provides a CLI similar to DVC.
dvc repro becomes dask4dvc repro.
SLURM Cluster
You can easily use dask4dvc with a Slurm cluster.
This requires a running Dask scheduler:
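from dask_jobqueue import SLURMCluster

# Describe a single Slurm job: one worker process with 1 core, 128 GB of memory and one GPU.
cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    job_extra=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    # Pin the scheduler port so that dask4dvc can be pointed at it.
    scheduler_options={"port": 31415},
)
# Let Dask scale the number of Slurm jobs up and down with the workload.
cluster.adapt()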
With this setup you can then run dask4dvc repro --address 127.0.0.1:31415, where 31415 is the example port passed to the scheduler.
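Because dask4dvc repro --address only works while the scheduler is running, it can help to check the port first. The following is a minimal sketch using only the Python standard library; the helper names scheduler_reachable and repro_command are hypothetical and not part of dask4dvc, and the check only tests that something accepts TCP connections on the port, not that it is a Dask scheduler.

```python
import socket


def scheduler_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def repro_command(host: str, port: int) -> list[str]:
    """Build the dask4dvc call as an argument list (hypothetical helper)."""
    return ["dask4dvc", "repro", "--address", f"{host}:{port}"]


if __name__ == "__main__":
    host, port = "127.0.0.1", 31415  # example address used in this README
    if scheduler_reachable(host, port):
        print("Scheduler found, run:", " ".join(repro_command(host, port)))
    else:
        print(f"Nothing is listening on {host}:{port}")
```

The argument list returned by repro_command can be passed to subprocess.run if the command should be started from Python.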
You can also use config files with dask4dvc repro --config myconfig.yaml.

default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
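The README does not spell out how this file is interpreted. One plausible reading, stated here as an assumption, is that the top-level key is a profile name, the key below it names a dask_jobqueue cluster class, and the remaining entries are keyword arguments for that class. The sketch below illustrates this reading with a plain Python mapping; CONFIG and cluster_spec are hypothetical names and not part of dask4dvc.

```python
from typing import Any

# The content of the example config file, written as a Python mapping.
CONFIG: dict[str, dict[str, dict[str, Any]]] = {
    "default": {
        "SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"},
    }
}


def cluster_spec(config, profile="default"):
    """Return (cluster class name, keyword arguments) for one profile.

    Assumes each profile names exactly one cluster class.
    """
    entry = config[profile]
    if len(entry) != 1:
        raise ValueError(f"profile {profile!r} must name exactly one cluster class")
    ((name, kwargs),) = entry.items()
    return name, dict(kwargs)


name, kwargs = cluster_spec(CONFIG)
print(name, kwargs)
```

Under this assumption, and with dask_jobqueue installed, getattr(dask_jobqueue, name)(**kwargs) would correspond to SGECluster(queue="regular", cores=10, memory="16 GB").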