Distributed Runner

This library allows for the easy construction and management of Dask clusters from a Git repository via a simple context manager.

Installation

pip install distrunner

Usage

In your scheduler (Airflow, etc.), use this:

import logging

from distrunner import DistRunner

with DistRunner(
    workers=5,
    python_version="3.10",
    repo="https://gitlab.com/crossref/labs/task-test.git",
    entry_module="task",
    entry_point="entry_point",
    requirements_file="requirements.txt",
    local=False,
    retries=3,
    worker_memory=16384,
    worker_cpus=4096,
) as dr:
    logging.basicConfig(level=logging.INFO)

    dr.run()

The "local" flag determines whether a remote cluster is created: with local=False, as above, the run uses a remote cluster; set it to True for local development.
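One way to switch between local and remote runs without editing the scheduler code is to derive the flag from the environment. The sketch below is only an illustration: the DISTRUNNER_LOCAL variable and the runner_kwargs helper are hypothetical names, not part of distrunner, and the remaining values are copied from the usage example above.

```python
import os


def runner_kwargs(env=None):
    """Build DistRunner keyword arguments, toggling `local` from the environment."""
    env = os.environ if env is None else env
    # DISTRUNNER_LOCAL is a hypothetical variable name chosen for this sketch.
    local = env.get("DISTRUNNER_LOCAL", "0") == "1"
    return dict(
        workers=5,
        python_version="3.10",
        repo="https://gitlab.com/crossref/labs/task-test.git",
        entry_module="task",
        entry_point="entry_point",
        requirements_file="requirements.txt",
        local=local,
        retries=3,
        worker_memory=16384,
        worker_cpus=4096,
    )
```

The result could then be unpacked into the context manager, for example `with DistRunner(**runner_kwargs()) as dr: dr.run()`.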

The code at the module and entry point that you specify in the Git repository is called with the DistRunner object as its argument. You can then obtain a Dask client from that object's client attribute (for example, cldr.client if the argument is named cldr).
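As a rough illustration of the repository side, a task module might look like the sketch below. The file name task.py and the function name entry_point mirror the entry_module and entry_point values in the usage example; the square helper and the argument name runner are made up for this sketch. It assumes only that the object passed in exposes a Dask client as its client attribute, and it uses the standard Dask Client.map and Client.gather calls.

```python
# task.py: a hypothetical entry module in the repository passed as repo=...


def square(x):
    """Toy unit of work that Dask runs on a worker."""
    return x * x


def entry_point(runner):
    """Entry point called with the runner object (assumed to expose `.client`)."""
    client = runner.client
    # Client.map submits one task per input; Client.gather collects the results.
    futures = client.map(square, range(10))
    results = client.gather(futures)
    return sum(results)
```

Whether distrunner does anything with the entry point's return value is not stated here, so treat the return as a convenience for testing only.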

You will need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables to use the Fargate clusters.
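A scheduler task can fail early with a clear message if these variables are missing. The following is a minimal sketch using only the standard library; the helper name missing_aws_vars is hypothetical and not part of distrunner.

```python
import os

# The two variables named above as required for Fargate clusters.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Running the check before entering the DistRunner context avoids starting a run that cannot authenticate.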

Features

  • Context manager handling of Dask Fargate clusters, with scale-to-zero on completion
  • Easy ability to switch between local and distributed/remote development
  • Simple deployment from a git repository including all requirements
  • Bugfixes to Dask AWS 2022.10.0 to suppress errors in weakref finalizers

What it Does

This library allows you to bootstrap a git repository into a distributed computation environment. It will install all the needed dependencies into the current virtual environment and sync these with workers. Your code's entrypoint will be called with access to a Dask Client object.

Credits

Copyright © Crossref 2023
