Skip to main content

Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook

Project description

SparkMonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks

About

+ =
SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface.

jobdisplay

Requirements

  • Jupyter Lab 4 OR Jupyter Notebook 4.4.0 or higher
  • pyspark 2 or 3

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progressbars
  • A timeline which shows jobs, stages, and tasks
  • A graph showing number of active tasks & executor cores vs time

Quick Start

Setting up the extension

pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

# For use with jupyter notebook install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable  sparkmonitor --py

# The jupyterlab extension is automatically enabled

With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the spark context using the SparkConf object named `conf` the extension created in your kernel.
sc=SparkContext.getOrCreate(conf=conf)

If you already have your own spark configuration, you will need to set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the path to the sparkmonitor python package path/to/package/sparkmonitor/listener_<scala_version>.jar

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')\
        .getOrCreate()

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

Changelog

This repository is published to pypi as sparkmonitor

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparkmonitor-3.0.4.tar.gz (14.0 MB view details)

Uploaded Source

Built Distribution

sparkmonitor-3.0.4-py3-none-any.whl (13.9 MB view details)

Uploaded Python 3

File details

Details for the file sparkmonitor-3.0.4.tar.gz.

File metadata

  • Download URL: sparkmonitor-3.0.4.tar.gz
  • Upload date:
  • Size: 14.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sparkmonitor-3.0.4.tar.gz
Algorithm Hash digest
SHA256 4a0cd8d92e3d1114fc4e78f81978bc0cc4d83b6bc7ed6bee0f559a1d8ab448a3
MD5 c671b1f38c69d48a437faa58b1d80827
BLAKE2b-256 efabf81acb3cba2ab78a25a361e59b2a2401be581b9631506789846f5fe5b89e

See more details on using hashes here.

File details

Details for the file sparkmonitor-3.0.4-py3-none-any.whl.

File metadata

  • Download URL: sparkmonitor-3.0.4-py3-none-any.whl
  • Upload date:
  • Size: 13.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for sparkmonitor-3.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ca964d81d7a073204c698fd467b56286a8f46fdf2f8ca3cfd751e579ab430cbb
MD5 06bfec7a9270ff0f4cf0fb726b199a1c
BLAKE2b-256 515ccc75381795856c68aeaa2debb3cf8c20316b75fcbd46f8375330affb78b5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page