Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook

SparkMonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks

About

SparkMonitor is an extension for Jupyter Notebook and JupyterLab that provides live monitoring of Apache Spark jobs spawned from a notebook. It offers several features for monitoring and debugging a Spark job from within the notebook interface.

Requirements

  • JupyterLab 4 or Jupyter Notebook 4.4.0 or higher
  • pyspark 2 or 3

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline which shows jobs, stages, and tasks
  • A graph showing the number of active tasks and executor cores over time

Quick Start

Setting up the extension

pip install sparkmonitor # install the extension

# Set up an IPython profile and add the kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> "$(ipython profile locate default)/ipython_kernel_config.py"

# For use with Jupyter Notebook, install and enable the nbextension
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable sparkmonitor --py

# The jupyterlab extension is automatically enabled
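
The echo command above appends its line every time it runs, so repeating the setup leaves duplicate entries in the profile. A minimal Python sketch of an idempotent alternative follows. The helper name `ensure_kernel_extension` and its `config_path` argument are assumptions for illustration and are not part of sparkmonitor.

```python
from pathlib import Path

# The exact line the Quick Start adds to ipython_kernel_config.py.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def ensure_kernel_extension(config_path):
    """Append the sparkmonitor kernel extension line unless it is already present.

    Returns True if the file was modified, False if the line was already there.
    (Hypothetical helper, not part of the sparkmonitor package.)
    """
    path = Path(config_path)
    existing = path.read_text() if path.exists() else ""
    if EXTENSION_LINE in (line.strip() for line in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + EXTENSION_LINE + "\n")
    return True
```

The path to pass in would be the `ipython_kernel_config.py` file inside the directory printed by `ipython profile locate default`.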

With the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the Spark context using the SparkConf object named `conf` that the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)

If you already have your own Spark configuration, set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the listener JAR inside the installed sparkmonitor Python package, path/to/package/sparkmonitor/listener_<scala_version>.jar:

from pyspark.sql import SparkSession

# Register the SparkMonitor listener and put its JAR on the driver classpath.
spark = (
    SparkSession.builder
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')
    .config('spark.driver.extraClassPath', 'venv/lib/python3.<X>/site-packages/sparkmonitor/listener_<scala_version>.jar')
    .getOrCreate()
)
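
Hard-coding the site-packages path ties the configuration to one environment. As a sketch, the JAR could instead be located relative to the installed package directory. `find_listener_jar` is a hypothetical helper, not part of sparkmonitor; the `listener_<scala_version>.jar` naming is taken from the text above.

```python
from pathlib import Path


def find_listener_jar(package_dir, scala_version):
    """Return the path of listener_<scala_version>.jar inside package_dir.

    Raises FileNotFoundError, listing the listener JARs that do exist,
    when no JAR matches the requested Scala version.
    (Hypothetical helper, not part of the sparkmonitor package.)
    """
    package_dir = Path(package_dir)
    jar = package_dir / f"listener_{scala_version}.jar"
    if jar.is_file():
        return str(jar)
    available = sorted(p.name for p in package_dir.glob("listener_*.jar"))
    raise FileNotFoundError(
        f"{jar.name} not found in {package_dir}; available: {available}"
    )
```

With the package installed, `package_dir` could be taken from `os.path.dirname(sparkmonitor.__file__)`, and the returned string passed as the value of spark.driver.extraClassPath. The Scala version has to match the one your Spark build uses; which versions ship with the package should be checked against the installed files.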

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

Changelog

This repository is published to PyPI as sparkmonitor.
