Helper to connect to CERN's Spark Clusters
SparkMonitor is an extension for Jupyter that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.
This extension is composed of a Python package named `sparkmonitor`, which installs the nbextension and the kernel extension, and an NPM package named `@swan-cern/sparkmonitor` for the JupyterLab extension (still under development).
- JupyterLab >= 2.0
- PySpark on Apache Spark version 2.1.1 or higher
- Jupyter Notebook version 4.4.0 or higher
- SBT to compile the Scala listener
Note: You will need NodeJS to install the extension.
```shell
pip install sparkmonitor
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable sparkmonitor --py
jupyter serverextension enable --py --system sparkmonitor  # this should happen automatically
jupyter lab build
```
To enable the kernel extension, create the default profile configuration files (skip this if the config file already exists) and configure the kernel to load the extension on startup. This is added to the configuration files in the user's home directory.
```shell
ipython profile create
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py
```
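Note that the `echo` command above appends unconditionally, so running it twice adds the line twice. As a sketch only, an idempotent version of the same step could look like this (`enable_kernelextension` is a hypothetical helper, not part of sparkmonitor):

```python
from pathlib import Path

# The exact line the echo command above appends to ipython_kernel_config.py
CONFIG_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"

def enable_kernelextension(config_path):
    """Append CONFIG_LINE to the kernel config file, skipping it if already present."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    if CONFIG_LINE in text:
        return False  # already enabled, nothing to do
    with path.open("a") as f:
        f.write(CONFIG_LINE + "\n")
    return True
```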
To use the extension, it is necessary to set the monitor in the Spark configuration, like so:
```
spark.extraListeners = sparkmonitor.listener.JupyterSparkMonitorListener
# Pick one of the following:
# For Spark 2
spark.driver.extraClassPath = /usr/local/lib/sparkmonitor/listener_2.11.jar  # lives inside the sparkmonitor module
# For Spark 3
spark.driver.extraClassPath = /usr/local/lib/sparkmonitor/listener_2.12.jar  # lives inside the sparkmonitor module
```
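The two variants differ only in the Scala version of the bundled listener jar. As an illustrative sketch (`sparkmonitor_conf` is a hypothetical helper, and the jar directory is the path shown above), the choice can be expressed as:

```python
# Hypothetical helper: builds the two sparkmonitor settings shown above
# for a given Spark major version (2 -> Scala 2.11 jar, 3 -> Scala 2.12 jar).
def sparkmonitor_conf(spark_major, jar_dir="/usr/local/lib/sparkmonitor"):
    scala_version = {2: "2.11", 3: "2.12"}[spark_major]
    return {
        "spark.extraListeners": "sparkmonitor.listener.JupyterSparkMonitorListener",
        "spark.driver.extraClassPath": f"{jar_dir}/listener_{scala_version}.jar",
    }
```

These key-value pairs could then be passed to `SparkConf.setAll()` when building a context by hand instead of using `swan_spark_conf`.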
To ease the configuration, if the kernel extension is correctly installed, the variable `swan_spark_conf` is available inside your notebook with everything already set.
To use it, just configure SparkContext like so:
```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate(conf=swan_spark_conf)  # Start the Spark context
rdd = sc.parallelize([1, 2, 4, 8])
rdd.count()
```
Check if the server extension and nbextension are correctly installed:
```shell
jupyter nbextension list
jupyter serverextension list
```
If the problem is with the kernel extension, check the logs to see if it was loaded or if there was any problem with the ipython profile.
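One quick check, run from a notebook cell, is whether the extension module was actually imported into the kernel (a minimal sketch; it only tells you the module was loaded, not that the listener is attached to Spark):

```python
import sys

# True if the sparkmonitor kernel extension was imported when the kernel started
loaded = "sparkmonitor.kernelextension" in sys.modules
print("sparkmonitor kernel extension loaded:", loaded)
```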
If you are not seeing the frontend JupyterLab extension, check if it's installed:
```shell
jupyter labextension list
```
If it is installed, try:
```shell
jupyter lab clean
jupyter lab build
```
The `jlpm` command is JupyterLab's pinned version of `yarn` that is installed with JupyterLab. You may use `yarn` or `npm` in lieu of `jlpm` in the commands below.
```shell
# Clone the repo to your local environment
# Move to the sparkmonitor directory

# Install the server extension (this will also build the js code)
pip install -e .

# Install and enable the nbextension
jupyter nbextension install sparkmonitor --py --sys-prefix
jupyter nbextension enable sparkmonitor --py --sys-prefix

# Link your development version of the extension with JupyterLab
jupyter labextension link .

# Rebuild Typescript source after making changes
jlpm build

# Rebuild JupyterLab after making any changes
jupyter lab build
```
You can watch the source directory and run JupyterLab in watch mode to watch for changes in the extension's source and automatically rebuild the extension and application.
```shell
# Watch the source directory in one terminal tab
jlpm watch

# Run JupyterLab in watch mode in another terminal tab
jupyter lab --watch
```
```shell
pip uninstall sparkmonitor
jupyter labextension uninstall @swan-cern/sparkmonitor
```
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|Filename, size|File type|Python version|
|---|---|---|
|sparkmonitor-1.1.0-py3-none-any.whl (3.3 MB)|Wheel|py3|
|sparkmonitor-1.1.0.tar.gz (3.2 MB)|Source|None|