A library on top of either pex or conda-pack to make your Python code easily available on a cluster.
Its goal is to make your prod/dev Python code & libraries easily available on any cluster. cluster-pack supports HDFS/S3 as distributed storage.
The first examples use Skein (a simple library for deploying applications on Apache YARN) and PySpark with HDFS storage. We intend to add more examples for other applications (like Dask, Ray) and S3 storage.
An introductory blog post can be found here.
Install with Pip
$ pip install cluster-pack
Install from source
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
cluster-pack supports Python ≥3.6.
Ships a package with all the dependencies from your current virtual environment or your conda environment
Stores metadata for an environment
Supports "under development" mode by taking advantage of pip's editable installs: all editable requirements are uploaded every time, making local changes directly visible on the cluster
Interactive (Jupyter notebook) mode
Provides config helpers to directly use the uploaded zip file inside your application
Launches jobs from within jobs by propagating all artifacts
Basic examples with skein
Basic examples with PySpark
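To illustrate the "stores metadata for an environment" feature listed above, here is a minimal sketch of how such metadata could be gathered using only the standard library. It is not cluster-pack's actual implementation: the function names `describe_environment` and `write_metadata`, the JSON keys, and the file layout are all hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
import json
import sys
from importlib import metadata  # standard library since Python 3.8


def describe_environment():
    """Collect the interpreter version and the installed distributions.

    Hypothetical helper: the key names are illustrative only and are not
    the format cluster-pack itself writes.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.metadata.get("Version")
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
    }


def write_metadata(path):
    """Write the environment description as JSON, e.g. next to a packaged archive."""
    env = describe_environment()
    with open(path, "w") as f:
        json.dump(env, f, indent=2)
    return env
```

A file like this, stored alongside the packaged environment, would let a later run compare the local environment with the uploaded one and decide whether a new upload is needed.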