Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook

SparkMonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks

About

SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface.

Requirements

  • Jupyter Lab 3 OR Jupyter Notebook 4.4.0 or higher
  • pyspark 2 or 3

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline which shows jobs, stages, and tasks
  • A graph showing number of active tasks & executor cores vs time

Quick Start

Setting up the extension

pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

# For use with jupyter notebook install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable  sparkmonitor --py

# The jupyterlab extension is automatically enabled
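
Running the `echo ... >>` line above more than once appends the same line to the config file each time. As a minimal sketch of an idempotent alternative, the hypothetical helper below (`ensure_kernel_extension` is not part of sparkmonitor) appends the line only when it is missing. It assumes you pass it the path to `ipython_kernel_config.py` inside the directory printed by `ipython profile locate default`.

```python
from pathlib import Path

# The exact line the quick start appends to ipython_kernel_config.py.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def ensure_kernel_extension(config_path):
    """Append EXTENSION_LINE to config_path unless it is already present.

    Hypothetical helper for illustration; returns True if the file was changed.
    """
    path = Path(config_path)
    existing = path.read_text() if path.exists() else ""
    if any(line.strip() == EXTENSION_LINE for line in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(separator + EXTENSION_LINE + "\n")
    return True
```

Calling the helper twice on the same file leaves a single copy of the line, so it can be used safely in a setup script that may run repeatedly.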

With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the Spark context using the SparkConf object named `conf` that the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)

If you already have your own Spark configuration, you need to set two properties. Set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener, and set spark.driver.extraClassPath to the path of the listener JAR inside the installed sparkmonitor Python package: path/to/package/sparkmonitor/listener_<scala_version>.jar

from pyspark.sql import SparkSession
spark = SparkSession.builder\
        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\
        .config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')\
        .getOrCreate()
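
The JAR path in the example above depends on your environment and Scala version. The sketch below shows one way to discover it instead of hard-coding it, and to add the listener class to an existing `spark.extraListeners` value, which Spark treats as a comma-separated list. All three function names are hypothetical helpers, not part of sparkmonitor, and the `listener_*.jar` pattern is an assumption based on the file name shown above.

```python
from importlib.util import find_spec
from pathlib import Path

LISTENER_CLASS = "sparkmonitor.listener.JupyterSparkMonitorListener"


def sparkmonitor_package_dir():
    """Return the installed sparkmonitor package directory, or None if not installed."""
    spec = find_spec("sparkmonitor")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def find_listener_jars(package_dir):
    """Return listener_*.jar files directly inside package_dir, sorted by name."""
    return sorted(Path(package_dir).glob("listener_*.jar"))


def merge_listeners(existing, listener=LISTENER_CLASS):
    """Add listener to a comma-separated listener list without duplicating it."""
    names = [name.strip() for name in (existing or "").split(",") if name.strip()]
    if listener not in names:
        names.append(listener)
    return ",".join(names)


if __name__ == "__main__":
    package_dir = sparkmonitor_package_dir()
    if package_dir is None:
        print("sparkmonitor is not installed in this environment")
    else:
        # Pick the JAR whose suffix matches your Spark build's Scala version.
        for jar in find_listener_jars(package_dir):
            print(jar)
```

If several JARs are listed, choose the one matching the Scala version of your Spark installation and pass its path as `spark.driver.extraClassPath`; the result of `merge_listeners` can be passed as `spark.extraListeners`.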

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

Changelog

This repository is published to PyPI as sparkmonitor.
