Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook
SparkMonitor
An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks
About
Requirements
- JupyterLab 4, or Jupyter Notebook 4.4.0 or higher
- pyspark 2 or 3
Features
- Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
- A table of jobs and stages with progress bars
- A timeline which shows jobs, stages, and tasks
- A graph showing number of active tasks & executor cores vs time
Quick Start
Setting up the extension
pip install sparkmonitor # install the extension
# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py
# For use with jupyter notebook install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable sparkmonitor --py
# The jupyterlab extension is automatically enabled
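Note that the `echo ... >>` command above appends the line every time it is run. As a minimal sketch, assuming you would rather add the line only once, the same step could be done from Python. `ensure_kernel_extension` is a hypothetical helper, not part of sparkmonitor, and the config file path is supplied by the caller.

```python
from pathlib import Path

# The line that the quick-start `echo` command appends (copied from the command above).
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def ensure_kernel_extension(config_path, line=EXTENSION_LINE):
    """Append `line` to the config file unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    Hypothetical helper for illustration; not provided by sparkmonitor.
    """
    path = Path(config_path)
    existing = path.read_text() if path.exists() else ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a") as config_file:
        config_file.write(prefix + line + "\n")
    return True
```

The path to pass in is the directory printed by `ipython profile locate default`, joined with `ipython_kernel_config.py`.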
With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:
from pyspark import SparkContext
# Start the Spark context using the SparkConf object named `conf` that the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)
If you already have your own Spark configuration, set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the path of the listener JAR inside the installed sparkmonitor Python package, path/to/package/sparkmonitor/listener_<scala_version>.jar:
from pyspark.sql import SparkSession
spark = SparkSession.builder\
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
    .config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')\
    .getOrCreate()
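Rather than hard-coding the site-packages path, the listener JAR can probably be located from the installed package itself. The following is a sketch under the assumption that the JARs sit directly inside the sparkmonitor package directory, as the path above suggests. `listener_jars_in` and `find_listener_jars` are hypothetical helper names, not part of sparkmonitor.

```python
import glob
import importlib.util
import os


def listener_jars_in(package_dir):
    """Return sorted paths matching listener_*.jar directly inside package_dir."""
    return sorted(glob.glob(os.path.join(package_dir, "listener_*.jar")))


def find_listener_jars(package_name="sparkmonitor"):
    """Look for listener JARs in an installed package.

    Returns an empty list if the package is not installed.
    Assumes the JARs are shipped at the top level of the package directory.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []
    jars = []
    for location in spec.submodule_search_locations:
        jars.extend(listener_jars_in(location))
    return jars
```

From the returned list, choose the entry whose `<scala_version>` suffix matches the Scala version of your Spark build and pass it as the value of spark.driver.extraClassPath.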
Development
If you'd like to develop the extension:
# See package.json scripts for building the frontend
yarn run build:<action>
# Install the package in editable mode
pip install -e .
# Symlink jupyterlab extension
jupyter labextension develop --overwrite .
# Watch for frontend changes
yarn run watch
# Build the spark JAR files
sbt +package
History
- This project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook with the SWAN Notebook Service team at CERN.
- Further fixes and improvements were made by the team at CERN and members of the community, maintained at swan-cern/jupyter-extensions/tree/master/SparkMonitor.
- Jafer Haider created the fork jupyterlab-sparkmonitor to update the extension to be compatible with JupyterLab as part of his internship at Yelp.
- This repository merges all the work done above and provides support for Lab & Notebook from a single package.
Changelog
This repository is published to PyPI as sparkmonitor.
- 2.x: see the GitHub releases page of this repository.
- 1.x and below were published from swan-cern/jupyter-extensions, and some initial versions from krishnan-r/sparkmonitor.
Download files
File details
Details for the file sparkmonitor-3.0.2.tar.gz.
File metadata
- Download URL: sparkmonitor-3.0.2.tar.gz
- Upload date:
- Size: 5.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | e6c3a4839985a2e7d30773839c2692566c0f28e0e941114aa7b25ebf3b1e9d46
MD5 | 3d1d1512a2c5e8774b6010c169cbc98b
BLAKE2b-256 | ef114af645bd79cdf225d3d2aa09c0e0fd40017daa27351fd6e100cc998f31f6
File details
Details for the file sparkmonitor-3.0.2-py3-none-any.whl.
File metadata
- Download URL: sparkmonitor-3.0.2-py3-none-any.whl
- Upload date:
- Size: 5.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | d30c2e0cf70584403c6d27537cf732952c3f1cdd88b5d9586644dc8f8f54d344
MD5 | dbbf4bad35f380c1d4cea747a9a1fdef
BLAKE2b-256 | b5cc9f9a177decac88cabfc21119e47306e7cb976359927bd4206ded09d24929