Skip to main content

A library on top of either pex or conda-packto make your Python code easily available on a cluster

Project description

cluster-pack

cluster-pack is a library on top of either pex or conda-pack to make your Python code easily available on a cluster.

Its goal is to make your prod/dev Python code & libraries easiliy available on any cluster. cluster-pack supports HDFS/S3 as a distributed storage.

The first examples use Skein (a simple library for deploying applications on Apache YARN) and PySpark with HDFS storage. We intend to add more examples for other applications (like Dask, Ray) and S3 storage.

An introducing blog post can be found here.

cluster-pack

Installation

Install with Pip

$ pip install cluster-pack

Install from source

$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .

Prerequisites

cluster-pack supports Python ≥3.6.

Features

  • Ships a package with all the dependencies from your current virtual environment or your conda environment

  • Stores metadata for an environment

  • Supports "under development" mode by taking advantage of pip's editable installs mode, all editable requirements will be uploaded all the time, making local changes directly visible on the cluster

  • Interactive (Jupyter notebook) mode

  • Provides config helpers to directly use the uploaded zip file inside your application

  • Launching jobs from jobs by propagating all artifacts

Basic examples with skein

  1. Interactive mode

  2. Self shipping project

Basic examples with PySpark

  1. PySpark with HDFS on Yarn

  2. Docker with PySpark on S3

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

cluster_pack-0.2.19-py3-none-any.whl (31.8 kB view details)

Uploaded Python 3

File details

Details for the file cluster_pack-0.2.19-py3-none-any.whl.

File metadata

  • Download URL: cluster_pack-0.2.19-py3-none-any.whl
  • Upload date:
  • Size: 31.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.15

File hashes

Hashes for cluster_pack-0.2.19-py3-none-any.whl
Algorithm Hash digest
SHA256 dfcd98f11121614ebdcd366c27f35061d94f7e8974bf87cd20094673701ce0ba
MD5 e47900954af11e35aafb754bb16fd6e7
BLAKE2b-256 cdc032bc8b846ea67f68f467b940f58496ad600df10f7e0dd6486759451f94d6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page