Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook
SparkMonitor
An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks
Requirements
- JupyterLab 4 or Jupyter Notebook 4.4.0 or higher
- pyspark 2 or 3
Features
- Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
- A table of jobs and stages with progress bars
- A timeline which shows jobs, stages, and tasks
- A graph showing the number of active tasks and executor cores over time
Quick Start
Setting up the extension
pip install sparkmonitor # install the extension
# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py
# For Jupyter Notebook, install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable sparkmonitor --py
# The JupyterLab extension is enabled automatically
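Because >> appends unconditionally, running the echo command above more than once adds the same line to ipython_kernel_config.py again. If you prefer to script this step in Python, the following is a minimal sketch of an idempotent alternative. The helper name ensure_kernel_extension is hypothetical and not part of sparkmonitor; it adds the extension line only when no identical line is already present.

```python
from pathlib import Path

# The same line the shell command above appends to the kernel config.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def ensure_kernel_extension(config_file: Path) -> bool:
    """Hypothetical helper: append EXTENSION_LINE unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    Creates the file (but not its parent directory) if it does not exist.
    """
    existing = config_file.read_text() if config_file.exists() else ""
    # Compare whole lines, ignoring surrounding whitespace.
    if any(line.strip() == EXTENSION_LINE for line in existing.splitlines()):
        return False
    with config_file.open("a") as fh:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        fh.write(EXTENSION_LINE + "\n")
    return True
```

For example, call ensure_kernel_extension(Path(d) / "ipython_kernel_config.py"), where d is the directory printed by ipython profile locate default.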
With the extension installed, a SparkConf object called conf will be available in your notebooks. You can use it as follows:
from pyspark import SparkContext
# Start the spark context using the SparkConf object named `conf` the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)
If you already have your own Spark configuration, you will need to set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener, and spark.driver.extraClassPath to the listener JAR inside the installed sparkmonitor Python package, path/to/package/sparkmonitor/listener_<scala_version>.jar:
from pyspark.sql import SparkSession
spark = SparkSession.builder\
.config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
.config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')\
.getOrCreate()
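The example above hardcodes a virtualenv path with placeholders. As an alternative, here is a minimal sketch that finds the JAR inside whichever sparkmonitor package is installed in the current environment. It assumes, as described above, that the JAR sits directly in the package directory and is named listener_<scala_version>.jar. The helper sparkmonitor_spark_conf is hypothetical, and the Scala version you pass (for example "2.12") is an assumption you must match to your Spark build.

```python
import importlib.util
from pathlib import Path
from typing import Dict, Optional

LISTENER_CLASS = "sparkmonitor.listener.JupyterSparkMonitorListener"


def sparkmonitor_spark_conf(scala_version: str,
                            package_dir: Optional[Path] = None) -> Dict[str, str]:
    """Hypothetical helper: build the two Spark settings SparkMonitor needs.

    scala_version must match the Scala build of your Spark (assumed, e.g. "2.12").
    package_dir defaults to the directory of the installed sparkmonitor package.
    """
    if package_dir is None:
        # find_spec locates a top-level package without importing it.
        spec = importlib.util.find_spec("sparkmonitor")
        if spec is None or spec.origin is None:
            raise ModuleNotFoundError("sparkmonitor is not installed")
        package_dir = Path(spec.origin).parent
    # Assumed JAR naming scheme: listener_<scala_version>.jar in the package dir.
    jar = Path(package_dir) / f"listener_{scala_version}.jar"
    if not jar.is_file():
        raise FileNotFoundError(f"listener JAR not found: {jar}")
    return {
        "spark.extraListeners": LISTENER_CLASS,
        "spark.driver.extraClassPath": str(jar),
    }
```

You could then pass each key/value pair to SparkSession.builder.config(key, value) before getOrCreate(), in place of the two hardcoded .config(...) calls. Note that spark.extraListeners takes a comma-separated list of class names, so if your configuration already registers other listeners, append this class to that list rather than replacing it.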
Development
If you'd like to develop the extension:
# See package.json scripts for building the frontend
yarn run build:<action>
# Install the package in editable mode
pip install -e .
# Symlink jupyterlab extension
jupyter labextension develop --overwrite .
# Watch for frontend changes
yarn run watch
# Build the spark JAR files
sbt +package
History
- This project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook with the SWAN Notebook Service team at CERN.
- Further fixes and improvements were made by the CERN team and members of the community, maintained at swan-cern/jupyter-extensions/tree/master/SparkMonitor.
- Jafer Haider created the fork jupyterlab-sparkmonitor to update the extension to be compatible with JupyterLab as part of his internship at Yelp.
- This repository merges all the work done above and provides support for Lab & Notebook from a single package.
Changelog
This repository is published to PyPI as sparkmonitor.
- 2.x: see the GitHub releases page of this repository
- 1.x and below: published from swan-cern/jupyter-extensions, with some initial versions from krishnan-r/sparkmonitor