A pyspark management framework

Spark Management Consolidated
=============================

A small module that loads as a singleton class object to manage the Spark
session, the Spark context, and related functionality.

Installation
------------

Install directly via ``pip`` on the command line, in a ``virtualenv``:

.. code:: shell

    pip install https://github.com/matz-e/sparkmanager/tarball/master

or for the current user:

.. code:: shell

    pip install --user https://github.com/matz-e/sparkmanager/tarball/master

Usage
-----

The module itself acts as a mediator to Spark:

.. code:: python

    import sparkmanager as sm

    # Create a new application
    sm.create("My fancy name",
              [("spark.executor.cores", 4), ("spark.executor.memory", "8g")])

    data = sm.spark.range(5)
    # Will show up in the UI with the name "broadcasting some data"
    with sm.jobgroup("broadcasting some data"):
        data = sm.broadcast(data.collect())

The Spark session can be accessed via ``sm.spark``, the Spark context via
``sm.sc``. Both attributes are instantiated once the ``create`` method is
called, with the option to call unambiguous methods from both directly via
the :py:class:`SparkManager` object:

.. code:: python

    # The following two calls are equivalent
    c = sm.parallelize(range(5))
    d = sm.sc.parallelize(range(5))
    assert c.collect() == d.collect()
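
How the forwarding is implemented is not described here. As a minimal sketch
of one way such a mediator could work, the following forwards an attribute
only when exactly one wrapped object provides it. The classes ``Mediator``,
``FakeSession`` and ``FakeContext`` are hypothetical stand-ins for
illustration and are not part of ``sparkmanager`` or ``pyspark``:

.. code:: python

    class Mediator:
        """Toy manager that forwards unambiguous attributes to wrapped objects."""

        def __init__(self, *targets):
            self._targets = targets

        def __getattr__(self, name):
            # __getattr__ only runs when normal lookup fails, so attributes
            # defined on the mediator itself always take precedence.
            owners = [t for t in self._targets if hasattr(t, name)]
            if len(owners) != 1:
                raise AttributeError(
                    "%r is provided by %d wrapped objects" % (name, len(owners)))
            return getattr(owners[0], name)


    class FakeSession:
        """Hypothetical stand-in for a Spark session."""

        def range(self, n):
            return list(range(n))

        def stop(self):
            return "session stopped"


    class FakeContext:
        """Hypothetical stand-in for a Spark context."""

        def parallelize(self, data):
            return list(data)

        def stop(self):
            return "context stopped"


    m = Mediator(FakeSession(), FakeContext())
    print(m.range(3))               # forwarded to FakeSession
    print(m.parallelize(range(3)))  # forwarded to FakeContext

In this sketch ``stop`` exists on both wrapped objects, so ``m.stop`` raises
``AttributeError`` and the caller has to pick the object explicitly.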

Cluster support scripts
-----------------------

.. note::

   Scripts to run on the cluster are still somewhat experimental and should
   be used with caution!

Environment setup
~~~~~~~~~~~~~~~~~

To create a self-contained Spark environment, the script provided in
``examples/env.sh`` can be used. It is currently tuned to the requirements of
the `bbpviz` cluster. A usage example:

.. code:: shell

    SPARK_ROOT=/path/to/my/spark/installation SM_WORKDIR=/path/to/a/work/directory examples/env.sh

The working directory will contain:

* A Python virtual environment
* A basic Spark configuration pointing to directories within the working
  directory
* An environment script to establish the setup

To use the resulting working environment:

.. code:: shell

    . /path/to/a/work/directory/env.sh

Spark deployment on allocations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Within a cluster allocation, the script ``sm_cluster`` can be used to start
a Spark cluster. The script is installed automatically by ``pip``. To
use it, pass either a working directory that contains an environment script,
or the working directory and an environment script separately:

.. code:: shell

    sm_cluster startup $WORKDIR
    sm_cluster startup $WORKDIR /path/to/some/env.sh

Similarly, to stop a cluster (not necessary with Slurm):

.. code:: shell

    sm_cluster shutdown $WORKDIR
    sm_cluster shutdown $WORKDIR /path/to/some/env.sh

Spark applications can then connect to the master, which can be found via:

.. code:: shell

    cat $WORKDIR/spark_master
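
The same lookup can be done from Python before starting an application. The
sketch below assumes that ``spark_master`` contains only the master URL on a
single line. The helper name ``read_master_url`` and the host ``node01`` are
made up for illustration:

.. code:: python

    import os
    import tempfile


    def read_master_url(workdir):
        """Return the contents of WORKDIR/spark_master without whitespace."""
        path = os.path.join(workdir, "spark_master")
        with open(path) as handle:
            return handle.read().strip()


    # Demonstration with a temporary directory standing in for $WORKDIR;
    # the URL written here is a hypothetical example value.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "spark_master"), "w") as handle:
        handle.write("spark://node01:7077\n")

    master = read_master_url(workdir)
    print(master)

With plain ``pyspark``, the resulting string could be passed to
``SparkSession.builder.master(master)``. Whether ``sm.create`` accepts the
master through its list of configuration pairs is not confirmed here.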

TL;DR on BlueBrain 5
~~~~~~~~~~~~~~~~~~~~

Set up a Spark environment in your current shell, and point ``WORKDIR`` to a
shared directory. ``SPARK_HOME`` needs to be set in your environment and point
to your Spark installation. By default, only a file with the Spark master and
the cluster launch script will be copied to ``WORKDIR``. Then submit a
cluster job with:

.. code:: shell

    sbatch -A proj16 -t 24:00:00 -N4 --exclusive -C nvme $(which sm_cluster) startup $WORKDIR
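
Before submitting, it can help to check the prerequisites named above. The
following is a rough sketch of such a check, not part of ``sparkmanager``.
The function name ``missing_prerequisites`` is hypothetical, and the check
assumes that ``WORKDIR`` is exported as an environment variable:

.. code:: python

    import os
    import shutil
    import tempfile


    def missing_prerequisites(env=None):
        """Return a list of human-readable problems, empty if all is well."""
        env = os.environ if env is None else env
        problems = []
        spark_home = env.get("SPARK_HOME")
        if not spark_home or not os.path.isdir(spark_home):
            problems.append("SPARK_HOME is unset or not a directory")
        workdir = env.get("WORKDIR")
        if not workdir or not os.path.isdir(workdir):
            problems.append("WORKDIR is unset or not a directory")
        # The sbatch line relies on `which sm_cluster` finding the script.
        if shutil.which("sm_cluster", path=env.get("PATH")) is None:
            problems.append("sm_cluster not found on PATH")
        return problems


    # Demonstration with an empty temporary directory in all three roles,
    # so only the launch script is reported as missing.
    demo_dir = tempfile.mkdtemp()
    demo_env = {"SPARK_HOME": demo_dir, "WORKDIR": demo_dir, "PATH": demo_dir}
    print(missing_prerequisites(demo_env))

Calling ``missing_prerequisites()`` without arguments inspects the current
process environment instead of the demonstration values.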

