A library on top of pex to make your Python code easily available on a cluster
cluster-pack
cluster-pack is a library on top of pex that makes your Python code easily available on a cluster.
Its goal is to make your prod/dev Python code and libraries easily available on any cluster. cluster-pack supports HDFS and S3 as distributed storage.
The first examples use Skein (a simple library for deploying applications on Apache YARN) and PySpark with HDFS storage. We intend to add more examples for other applications (like Dask, Ray) and S3 storage.
An introductory blog post is also available.
Installation
Install with pip
$ pip install cluster-pack
Install from source
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
Prerequisites
cluster-pack supports Python ≥3.9.
Feature flags
- C_PACK_USER: override the current user for HDFS path generation and Skein impersonation
  - When set, this value is used instead of the system user (from getpass.getuser())
  - Useful for running jobs as a different user or in environments where the system user doesn't match the HDFS user
  - If not set or empty, falls back to the current system user
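The fallback rule for C_PACK_USER can be sketched in a few lines. This is a minimal illustration of the documented behaviour, not cluster-pack's actual implementation; the function name `resolve_user` and its `environ` parameter are hypothetical and exist only to make the example easy to try.

```python
import getpass
import os


def resolve_user(environ=None):
    """Return C_PACK_USER when it is set and non-empty, else the system user.

    `resolve_user` is a hypothetical helper, not part of cluster-pack's API.
    """
    env = os.environ if environ is None else environ
    override = env.get("C_PACK_USER")
    # An empty string is treated like an unset variable, so it also
    # falls back to the system user reported by getpass.getuser().
    return override if override else getpass.getuser()


if __name__ == "__main__":
    print(resolve_user())
```

Running a job as another user would then amount to exporting the variable before launching, for example `C_PACK_USER=alice python my_job.py`, where `alice` and `my_job.py` are placeholders.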
Features
- Ships a package with all the dependencies from your current virtual environment
- Stores metadata for an environment
- Supports an "under development" mode built on pip's editable installs: all editable requirements are uploaded every time, so local changes are directly visible on the cluster
- Interactive (Jupyter notebook) mode
- Provides config helpers to directly use the uploaded zip file inside your application
- Launches jobs from jobs by propagating all artifacts
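To see which requirements in your current environment are editable installs, and would therefore be candidates for re-upload in "under development" mode, you can inspect the install metadata yourself. The sketch below uses only the standard library and relies on the `direct_url.json` file defined by PEP 610. It is an assumption that this matches what cluster-pack looks at: its own detection may differ, and older `setup.py develop` style installs do not write this file. The helper names are hypothetical.

```python
import json
from importlib import metadata


def is_editable(dist):
    """True if the distribution's PEP 610 direct_url.json marks it editable."""
    raw = dist.read_text("direct_url.json")  # None when the file is absent
    if not raw:
        return False
    try:
        info = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    dir_info = info.get("dir_info")
    return isinstance(dir_info, dict) and bool(dir_info.get("editable", False))


def editable_distributions():
    """Sorted names of editable distributions in the current environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_editable(dist):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    for name in editable_distributions():
        print(name)
```

An empty result means no PEP 610 editable install was found in the active environment.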
Basic examples with Skein
Basic examples with PySpark
Info
Conda is no longer supported
File details
Details for the file cluster_pack-0.3.17.post2-py3-none-any.whl.
File metadata
- Download URL: cluster_pack-0.3.17.post2-py3-none-any.whl
- Upload date:
- Size: 30.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7762bfd5dabeb3fd633da154a7796e3af21135447df485d3e29a97f568dc153c |
| MD5 | 40a5b6e887cb59629f068a523e07c7d6 |
| BLAKE2b-256 | 338963993200aef9f8044d6fbd8709b59d34b86607d063d44e0f51dfcfee3648 |