Spark Monitor - An extension for Jupyter Lab

This project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook.

As part of my internship as a Software Engineer at Yelp, I created this fork to update the extension to be compatible with JupyterLab - Yelp's choice for sharing and collaborating on notebooks.

About

SparkMonitor is an extension for Jupyter Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.


Requirements

  • At least JupyterLab 2.0.0 (necessary to get cell execution metadata)
  • pyspark 2.X.X or older (pyspark 3.X is currently not supported)

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline which shows jobs, stages, and tasks
  • A graph of the number of active tasks and executor cores over time
  • A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details
  • For a detailed list of features see the use case notebooks
  • Support for multiple SparkSessions (default port is 4040)
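
As a rough sketch of the multiple-SparkSessions point above: Spark binds the first session's UI to port 4040 and retries upward for each additional concurrent session, so a small helper can guess where each session's UI lives. This helper is an illustrative assumption, not part of the extension:

```python
# Illustrative assumption: Spark's UI tries port 4040 and increments by one
# for each additional concurrent session on the same machine.
def ui_port_for_session(n, base=4040):
    """Guess the Spark UI port for the n-th concurrent session (1-indexed)."""
    return base + n - 1

print(ui_port_for_session(1))  # first session -> 4040
print(ui_port_for_session(3))  # third session -> 4042
```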

Quick Start

To quickly try the extension, use this Docker image, which has pyspark and several other related packages installed alongside the sparkmonitor extension:

docker run -it -p 8888:8888 itsjafer/sparkmonitor

Setting up the extension

jupyter labextension install jupyterlab_sparkmonitor # install the jupyterlab extension
pip install jupyterlab-sparkmonitor # install the server/kernel extension
jupyter serverextension enable --py sparkmonitor

# set up ipython profile and add our kernel extension to it
ipython profile create --ipython-dir=.ipython
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  .ipython/profile_default/ipython_config.py

# run jupyter lab
IPYTHONDIR=.ipython jupyter lab --watch
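
After the echo command above, the profile's ipython_config.py should end with a line like the following (a sketch of the resulting config fragment):

```python
# .ipython/profile_default/ipython_config.py -- line appended by the setup step
c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')
```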

With the extension installed, a SparkConf object called conf will be available in your notebooks. You can use it as follows:

from pyspark import SparkContext

# start the Spark context using the SparkConf the extension inserted
sc = SparkContext.getOrCreate(conf=conf)

# Monitor should spawn under the cell with 4 jobs
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()
sc.parallelize(range(0,100)).count()

If you already have your own Spark configuration, set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener, and set spark.driver.extraClassPath to the listener.jar bundled inside the sparkmonitor Python package (path/to/package/sparkmonitor/listener.jar):

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', 'venv/lib/python3.7/site-packages/sparkmonitor/listener.jar')\
        .getOrCreate()

# should spawn 4 jobs in a monitor below the cell
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
spark.sparkContext.parallelize(range(0,100)).count()
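
Rather than hard-coding a site-packages path as in the snippet above, the jar location can be derived from wherever pip installed the package. The helpers below are a hypothetical sketch (not part of sparkmonitor's API), assuming listener.jar sits next to the package's __init__.py:

```python
import importlib.util
import os

def listener_jar_path(package='sparkmonitor'):
    """Return the path to listener.jar inside the installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.origin:
        return None
    return os.path.join(os.path.dirname(spec.origin), 'listener.jar')

def sparkmonitor_settings(jar_path):
    """The two Spark settings the extension needs, as a plain dict."""
    return {
        'spark.extraListeners': 'sparkmonitor.listener.JupyterSparkMonitorListener',
        'spark.driver.extraClassPath': jar_path,
    }
```

The resulting dict entries can be passed to SparkSession.builder.config(...) exactly as in the example above.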

Development

If you'd like to develop the extension:

make venv # Creates a virtual environment using tox
source venv/bin/activate # Make sure we're using the virtual environment
make build # Build the extension
make develop # Run a local jupyterlab with the extension installed
