
Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook

Project description

SparkMonitor

SparkMonitor is a Jupyter extension for monitoring Apache Spark jobs launched from notebooks. It displays live Spark metrics directly in the notebook interface, making it easier to understand, debug, and profile Spark workloads as they run.

It supports JupyterLab and classic Jupyter Notebook with PySpark 3.x and 4.x.

About

Jupyter + Apache Spark = SparkMonitor

SparkMonitor adds an interactive monitoring panel below notebook cells that trigger Spark jobs, so you can inspect execution progress without leaving the notebook.


SparkMonitor job display

Requirements

  • Python 3
  • PySpark 3.x or 4.x
  • JupyterLab 4 or Jupyter Notebook 4.4.0 or later
  • Spark Classic API mode
    • SparkMonitor works with the traditional Spark driver model used by PySpark.
    • It is not compatible with Spark Connect.
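If you are unsure which mode a running session uses, one heuristic is to inspect the session object's module: Spark Connect sessions are implemented under the `pyspark.sql.connect` package. A minimal sketch (the helper name `is_spark_connect` is hypothetical, not part of SparkMonitor):

```python
def is_spark_connect(session) -> bool:
    """Heuristic: Spark Connect sessions live under pyspark.sql.connect.*"""
    return type(session).__module__.startswith("pyspark.sql.connect")
```

For example, `is_spark_connect(spark)` returning True would indicate a Spark Connect session, which SparkMonitor cannot monitor.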

Features

  • Live monitoring of Spark jobs launched from a notebook cell
  • Job and stage table with progress bars and execution details
  • Timeline view showing jobs, stages, and tasks over time
  • Resource graphs for active tasks and executor core usage
Screenshots: jobs and stages view, resource graphs, timeline view

Quick Start

Installation

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate

Install SparkMonitor together with PySpark and a notebook frontend.

For JupyterLab

pip install sparkmonitor pyspark jupyterlab

Enable the SparkMonitor IPython kernel extension:

ipython profile create
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> "$(ipython profile locate default)/ipython_kernel_config.py"

This only needs to be done once per IPython profile.

Using SparkMonitor in a Notebook

To use SparkMonitor, create your Spark session with the SparkMonitor listener enabled.

This requires two Spark configurations:

  • spark.extraListeners: registers the SparkMonitor listener that collects Spark job metrics
  • spark.driver.extraClassPath: points to the SparkMonitor listener JAR (listener_<scala_version>.jar) bundled with the sparkmonitor package

Recommended example

The most robust approach is to resolve the listener JAR path dynamically from the installed Python package instead of hardcoding the full environment path:

from pathlib import Path

import sparkmonitor
from pyspark.sql import SparkSession

sparkmonitor_dir = Path(sparkmonitor.__file__).resolve().parent
listener_jar = sparkmonitor_dir / "listener_2.13.jar"

spark = (
    SparkSession.builder.config(
        "spark.extraListeners",
        "sparkmonitor.listener.JupyterSparkMonitorListener",
    )
    .config("spark.driver.extraClassPath", str(listener_jar))
    .getOrCreate()
)

Example with a fixed environment path

If you already know the exact path in your environment, you can also configure it directly:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.config(
        "spark.extraListeners",
        "sparkmonitor.listener.JupyterSparkMonitorListener",
    )
    .config(
        "spark.driver.extraClassPath",
        "venv/lib/python3.13/site-packages/sparkmonitor/listener_2.13.jar",
    )
    .getOrCreate()
)

Important

The correct listener JAR path depends on:

  • the location of your Python environment
  • the Scala version used by your Spark installation

You can inspect the installed package location with:

import sparkmonitor

print(sparkmonitor.__path__)

Then locate the corresponding listener_<scala_version>.jar file inside that package directory.
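This lookup can also be scripted so a missing or mismatched JAR fails fast with a clear message. A sketch (the helper name `find_listener_jar` is hypothetical); pass `Path(sparkmonitor.__file__).resolve().parent` as the package directory:

```python
from pathlib import Path

def find_listener_jar(pkg_dir: Path, scala_version: str) -> Path:
    """Return the listener JAR for the given Scala version, e.g. '2.13'."""
    candidate = pkg_dir / f"listener_{scala_version}.jar"
    if not candidate.exists():
        # List what is actually shipped so the error is actionable
        available = sorted(p.name for p in pkg_dir.glob("listener_*.jar"))
        raise FileNotFoundError(
            f"No listener JAR for Scala {scala_version}; found: {available}"
        )
    return candidate
```

The resulting path can be passed directly to `spark.driver.extraClassPath` as in the recommended example above.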

If needed, you can also build the listener JAR yourself with sbt, as described in the development section below.

Development

To work on SparkMonitor locally:

# Install the package in editable mode
pip install -e .

# Build the frontend (see package.json for available scripts)
yarn run build:<action>

# Link the JupyterLab extension into your local Jupyter environment
jupyter labextension develop --overwrite .

# Watch frontend files for changes
yarn run watch

# Build the Spark listener JARs
cd scalalistener_spark4 # Spark 4 / Scala 2.13
sbt package

Troubleshooting

SparkMonitor panel does not appear

Check the following:

  • sparkmonitor is installed in the same Python environment as your notebook kernel
  • pyspark is installed
  • jupyterlab or notebook is installed
  • the IPython kernel extension is enabled
  • your Spark session includes spark.extraListeners
  • spark.driver.extraClassPath points to a valid listener JAR
  • you are using Spark Classic, not Spark Connect
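The two configuration checks can be automated against the active session. A sketch, assuming the input is the key/value mapping obtained from `dict(spark.sparkContext.getConf().getAll())` (the helper name `check_sparkmonitor_conf` is hypothetical):

```python
def check_sparkmonitor_conf(conf: dict) -> list:
    """Return a list of likely misconfigurations; an empty list means OK."""
    problems = []
    listeners = conf.get("spark.extraListeners", "")
    if "sparkmonitor.listener.JupyterSparkMonitorListener" not in listeners:
        problems.append("spark.extraListeners does not register the SparkMonitor listener")
    classpath = conf.get("spark.driver.extraClassPath", "")
    if "listener_" not in classpath or ".jar" not in classpath:
        problems.append("spark.driver.extraClassPath does not point to a listener JAR")
    return problems
```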

Wrong Scala version

The listener JAR must match the Scala version used by your Spark installation. For example, if your Spark environment uses Scala 2.13, use:

listener_2.13.jar

Using the wrong listener JAR may prevent the listener from loading correctly.
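A pip-installed PySpark bundles its JARs under the package's jars/ directory, so one way to discover the Scala version is to look for the bundled scala-library JAR there. A sketch under that assumption (the helper name `scala_version_from_jars` is hypothetical); pass `Path(pyspark.__file__).resolve().parent / "jars"`:

```python
import re
from pathlib import Path

def scala_version_from_jars(jars_dir: Path):
    """Parse the Scala minor version (e.g. '2.13') from scala-library-*.jar."""
    for jar in jars_dir.glob("scala-library-*.jar"):
        match = re.match(r"scala-library-(\d+\.\d+)", jar.name)
        if match:
            return match.group(1)
    return None  # no bundled scala-library JAR found
```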

Hardcoded virtual environment path does not work

Avoid hardcoding paths when possible. Environment-specific paths vary across systems, Python versions, and virtual environments. The dynamic path resolution example above is usually more portable.



Download files

Download the file for your platform.

Source Distribution

sparkmonitor-3.1.2.tar.gz (3.5 MB)

Uploaded Source

Built Distribution


sparkmonitor-3.1.2-py3-none-any.whl (3.4 MB)

Uploaded Python 3

File details

Details for the file sparkmonitor-3.1.2.tar.gz.

File metadata

  • Download URL: sparkmonitor-3.1.2.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkmonitor-3.1.2.tar.gz
  • SHA256: 75d0a3b03047b283e0fc3af9e5422558a027cbf52b2444fbfda8051bf57a9517
  • MD5: 290cba995b1d7ba7f1884369b15599f3
  • BLAKE2b-256: ac3ca6b64be457806e9eb103584f497004f91208e3df61d130560130bfafb4e3


Provenance

The following attestation bundles were made for sparkmonitor-3.1.2.tar.gz:

Publisher: publish.yml on swan-cern/sparkmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sparkmonitor-3.1.2-py3-none-any.whl.

File metadata

  • Download URL: sparkmonitor-3.1.2-py3-none-any.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkmonitor-3.1.2-py3-none-any.whl
  • SHA256: 0b12884c943ac6160bd717f1cf67b24acb083383cfdd278e424395d565946b3a
  • MD5: 6cfde9f2a8ebfc52197c97ad8d6e8fb1
  • BLAKE2b-256: 9eb5588438dd36fc4749bcedaedd261f23dba04588f7f02af9d4b48819e70ea4


Provenance

The following attestation bundles were made for sparkmonitor-3.1.2-py3-none-any.whl:

Publisher: publish.yml on swan-cern/sparkmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
