A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Project description
cluster-pack
cluster-pack is a library on top of either pex or conda-pack to make your Python code easily available on a cluster.
Its goal is to make your prod/dev Python code & libraries easily available on any cluster. cluster-pack supports HDFS and S3 as distributed storage.
The first example uses skein. We will soon add more examples for other applications (like PySpark and Dask) on other compute clusters (like Mesos and Kubernetes).
Installation
$ git clone https://github.com/criteo/cluster-pack
$ cd cluster-pack
$ pip install .
Prerequisites
cluster-pack supports Python ≥3.6.
Features
- ships a package with all the dependencies from your current virtual environment or your conda environment
- provides config helpers to inject those dependencies into your application
- when using pip with pex, cluster-pack takes advantage of pip's editable install mode: all editable requirements are always uploaded separately, so local changes are directly visible on the cluster without having to regenerate the package with all the dependencies
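To see which of your requirements fall under the editable-install behaviour, you can list the distributions pip recorded as editable. The sketch below is not cluster-pack's own detection logic; it assumes Python 3.8+ (for `importlib.metadata`) and a recent pip that writes the PEP 610 `direct_url.json` file, so editable installs made without that metadata are not reported.

```python
import json
from importlib import metadata  # Python 3.8+


def editable_distributions():
    """Return names of installed distributions that pip recorded as editable.

    Reads the PEP 610 ``direct_url.json`` file; ``dir_info.editable`` is true
    for ``pip install -e``. Distributions without that file are skipped.
    """
    names = []
    for dist in metadata.distributions():
        raw = dist.read_text("direct_url.json")  # None when the file is missing
        if not raw:
            continue
        try:
            info = json.loads(raw)
        except ValueError:
            continue  # ignore malformed metadata
        if info.get("dir_info", {}).get("editable", False):
            name = dist.metadata["Name"]
            if name:
                names.append(name)
    return sorted(names)


if __name__ == "__main__":
    print(editable_distributions())
```

Inside the sample `skein_env` below, `skein-project` (installed with `pip install -e .`) would be expected to show up here if your pip records PEP 610 metadata.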
Basic example with skein
- Prepare a virtual environment and install the sample skein project
$ cd examples/skein-project
$ python3.6 -m venv skein_env
$ . skein_env/bin/activate
$ pip install --upgrade pip setuptools
$ pip install -e .
- Upload the current virtual environment to the distributed storage (HDFS in this case)
import cluster_pack
package_path, _ = cluster_pack.upload_env()
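`upload_env()` ships whichever environment the interpreter is currently running in, either a virtual environment (packed with pex) or a conda environment (packed with conda-pack). If you are unsure which one you are in, the sketch below shows one common way to tell. It is an illustration under those assumptions, not the check cluster-pack itself performs.

```python
import os
import sys


def current_env_kind():
    """Guess what kind of Python environment the interpreter is running in.

    Returns "conda" if CONDA_PREFIX points at sys.prefix, "venv" if the
    interpreter runs inside a virtual environment, else "system".
    """
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix and os.path.realpath(conda_prefix) == os.path.realpath(sys.prefix):
        return "conda"
    # PEP 405 venvs make sys.prefix differ from sys.base_prefix;
    # legacy virtualenv (< 20) sets sys.real_prefix instead.
    if hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "venv"
    return "system"


if __name__ == "__main__":
    print(current_env_kind(), sys.prefix)
```

Run from the activated `skein_env`, this should print `venv` followed by the environment's path.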
- Call the skein config helper to get a config that accesses the uploaded packages on the cluster. `skein_project.worker` is the module we want to call remotely (it has been shipped by cluster-pack).
from cluster_pack.skein import skein_config_builder
script = skein_config_builder.get_script(
    package_path,
    module_name="skein_project.worker")
files = skein_config_builder.get_files(package_path)
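The contents of `skein_project.worker` are not shown here. As `module_name` suggests the module is launched as a module on the remote container, a minimal worker would be an importable module with a `__main__` guard. The file below is a hypothetical illustration, not the sample project's actual code.

```python
# skein_project/worker.py (hypothetical contents, for illustration only)
import logging
import socket
import sys

logger = logging.getLogger(__name__)


def main():
    """Do the remote work; here it only reports where the code is running."""
    logging.basicConfig(level=logging.INFO)
    host = socket.gethostname()
    logger.info("worker running on %s with Python %s", host, sys.version.split()[0])
    return host


if __name__ == "__main__":
    main()
```

Because the module is part of the editable `skein-project` install, edits to it should reach the cluster on the next upload without rebuilding the full dependency package.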
- Submit a simple skein application
import skein

with skein.Client() as client:
    service = skein.Service(
        resources=skein.model.Resources("1 GiB", 1),
        files=files,
        script=script
    )
    spec = skein.ApplicationSpec(services={"service": service})
    app_id = client.submit(spec)
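`client.submit` returns as soon as YARN accepts the application. To block until it ends, you can poll its state. The helper below is a minimal sketch that takes any zero-argument callable returning a state name, so it is independent of skein. With skein you could pass `lambda: client.application_report(app_id).state.name`, assuming the report's `state` is an enum whose names include `FINISHED`, `FAILED` and `KILLED`; call it while the client is still open, i.e. inside the `with` block.

```python
import time


def wait_for_final_state(get_state, poll_interval=5.0, timeout=None,
                         final_states=("FINISHED", "FAILED", "KILLED")):
    """Poll get_state() until it returns one of final_states.

    get_state: zero-argument callable returning the current state name.
    timeout: seconds to wait before raising TimeoutError (None = forever).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = get_state()
        if state in final_states:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("application still in state %s" % state)
        time.sleep(poll_interval)
```

Keeping the state lookup behind a callable makes the loop easy to test without a cluster and lets you reuse it with other schedulers.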
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file cluster_pack-0.0.3-py3-none-any.whl.
File metadata
- Download URL: cluster_pack-0.0.3-py3-none-any.whl
- Upload date:
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66c94daaf97e80d7bed8dc1326c9a0035366c5636bd70aa51181d803cd06062e
MD5 | 2f2f88711712f9b93ebe489d7cef139b
BLAKE2b-256 | b8f6f2d7275e2524b76ca45ecd75bf8e5189cf7efec0b57f4596321f7bd53990
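The hashes above let you check a downloaded wheel before installing it. A minimal sketch using only the standard library, comparing against the SHA256 value listed in the table (the helper names are illustrative):

```python
import hashlib

EXPECTED_SHA256 = "66c94daaf97e80d7bed8dc1326c9a0035366c5636bd70aa51181d803cd06062e"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected


if __name__ == "__main__":
    print(verify_wheel("cluster_pack-0.0.3-py3-none-any.whl"))
```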