Jupyter Notebook & Lab extension to monitor Apache Spark jobs from a notebook

Project description

SparkMonitor

SparkMonitor is a Jupyter extension for monitoring Apache Spark jobs launched from notebooks. It displays live Spark metrics directly in the notebook interface, making it easier to understand, debug, and profile Spark workloads as they run.

It supports JupyterLab and classic Jupyter Notebook with PySpark 3.x and 4.x.

About

Jupyter + Apache Spark = SparkMonitor

SparkMonitor adds an interactive monitoring panel below notebook cells that trigger Spark jobs, so you can inspect execution progress without leaving the notebook.


[Screenshot: SparkMonitor job display]

Requirements

  • Python 3.x
  • PySpark 3.x or 4.x
  • JupyterLab 4.x or classic Jupyter Notebook 4.4.0 or later
  • Spark Classic API mode
    • SparkMonitor works with the traditional Spark driver model used by PySpark.
    • It is not compatible with Spark Connect.

Features

  • Live monitoring of Spark jobs launched from a notebook cell
  • Job and stage table with progress bars and execution details
  • Timeline view showing jobs, stages, and tasks over time
  • Resource graphs for active tasks and executor core usage
[Screenshots: jobs and stages view, resource graphs, timeline view]

Quick Start

Installation

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate

Install SparkMonitor together with PySpark and a notebook frontend.

For JupyterLab

pip install sparkmonitor pyspark jupyterlab

Enable the SparkMonitor IPython kernel extension:

ipython profile create
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> "$(ipython profile locate default)/ipython_kernel_config.py"

This only needs to be done once per IPython profile.
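Note that the echo command above appends unconditionally, so rerunning it registers the extension a second time. A small sketch of an idempotent alternative (the ensure_line helper is illustrative, not part of SparkMonitor):

```python
from pathlib import Path

EXTENSION_LINE = (
    "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"
)


def ensure_line(config_path: Path, line: str) -> bool:
    """Append `line` to the file unless it is already present.

    Returns True if the file was modified.
    """
    config_path.parent.mkdir(parents=True, exist_ok=True)
    existing = config_path.read_text() if config_path.exists() else ""
    if line in existing.splitlines():
        return False
    with config_path.open("a") as f:
        f.write(line + "\n")
    return True
```

Point it at the same file as above, i.e. `$(ipython profile locate default)/ipython_kernel_config.py`.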

Using SparkMonitor in a Notebook

To use SparkMonitor, create your Spark session with the SparkMonitor listener enabled.

This requires two Spark configurations:

Configuration                  Purpose
spark.extraListeners           Registers the SparkMonitor listener that collects Spark job metrics
spark.driver.extraClassPath    Points to the SparkMonitor listener JAR bundled with the sparkmonitor package

Example with a manually specified listener JAR path

If you already know the exact path to the matching SparkMonitor listener JAR in your current environment, you can set spark.driver.extraClassPath directly:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.config(
        "spark.extraListeners",
        "sparkmonitor.listener.JupyterSparkMonitorListener",
    )
    .config(
        "spark.driver.extraClassPath",
        # Put the path to the matching SparkMonitor listener JAR here.
        "venv/lib/python3.13/site-packages/sparkmonitor/listener_spark4_2.13.jar",
    )
    .getOrCreate()
)

Example with automatic listener JAR detection

The most robust approach is to resolve the listener JAR path dynamically from the installed Python package instead of hardcoding the full environment path. The example below first checks SPARK_HOME and then falls back to the pyspark package layout used by pip install pyspark, where SPARK_HOME is often not set:

import os
from pathlib import Path

import pyspark
import sparkmonitor
from pyspark.sql import SparkSession


def iter_spark_jar_dirs() -> list[Path]:
    candidates = []

    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        candidates.append(Path(spark_home) / "jars")

    candidates.append(Path(pyspark.__file__).resolve().parent / "jars")
    return [path for path in candidates if path.exists()]


def resolve_listener_jar(sparkmonitor_dir: Path) -> Path:
    for jars_dir in iter_spark_jar_dirs():
        for jar in jars_dir.glob("spark-core_*.jar"):
            # e.g. spark-core_2.13-3.5.8.jar => scala=2.13, spark_major=3
            scala_ver, spark_ver = jar.stem.split("_")[1].split("-")[:2]
            spark_major = spark_ver.split(".")[0]
            if spark_major == "3" and scala_ver == "2.12":
                return sparkmonitor_dir / "listener_spark3_2.12.jar"
            if spark_major == "3" and scala_ver == "2.13":
                return sparkmonitor_dir / "listener_spark3_2.13.jar"
            if spark_major == "4" and scala_ver == "2.13":
                return sparkmonitor_dir / "listener_spark4_2.13.jar"

    raise RuntimeError(
        "Could not detect Spark/Scala version from SPARK_HOME or the pyspark installation"
    )


sparkmonitor_dir = Path(sparkmonitor.__file__).resolve().parent
listener_jar = resolve_listener_jar(sparkmonitor_dir)

spark = (
    SparkSession.builder.config(
        "spark.extraListeners",
        "sparkmonitor.listener.JupyterSparkMonitorListener",
    )
    .config("spark.driver.extraClassPath", str(listener_jar))
    .getOrCreate()
)

Important

The correct listener JAR depends on:

  • the location of your Python environment
  • your Spark major version and Scala version
  • how Spark is installed (SPARK_HOME vs pip install pyspark)

You can inspect the installed package location with:

import sparkmonitor

print(sparkmonitor.__path__)

Then locate the corresponding listener JAR in that package directory:

  • listener_spark3_2.12.jar for Spark 3 + Scala 2.12
  • listener_spark3_2.13.jar for Spark 3 + Scala 2.13
  • listener_spark4_2.13.jar for Spark 4 + Scala 2.13
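The mapping above can also be captured in a small lookup, which is convenient when wiring session creation into scripts (the function name is illustrative):

```python
# Maps (Spark major version, Scala version) to the bundled listener JAR name.
_LISTENER_JARS = {
    ("3", "2.12"): "listener_spark3_2.12.jar",
    ("3", "2.13"): "listener_spark3_2.13.jar",
    ("4", "2.13"): "listener_spark4_2.13.jar",
}


def listener_jar_name(spark_major: str, scala_version: str) -> str:
    """Return the listener JAR file name for a Spark/Scala combination."""
    try:
        return _LISTENER_JARS[(spark_major, scala_version)]
    except KeyError:
        raise ValueError(
            f"No SparkMonitor listener JAR for Spark {spark_major} / Scala {scala_version}"
        ) from None
```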

If needed, you can also build the listener JAR yourself with sbt, as described in the development section below.

Development

To work on SparkMonitor locally:

# Install the package in editable mode
pip install -e .

# Build the frontend (see package.json for available scripts)
yarn run build:<action>

# Link the JupyterLab extension into your local Jupyter environment
jupyter labextension develop --overwrite .

# Watch frontend files for changes
yarn run watch

# Build the Spark listener JARs
cd scalalistener_spark3 # Spark 3 / Scala 2.12 and 2.13
sbt +package

cd ../scalalistener_spark4 # Spark 4 / Scala 2.13
sbt package

Troubleshooting

SparkMonitor panel does not appear

Check the following:

  • sparkmonitor is installed in the same Python environment as your notebook kernel
  • pyspark is installed
  • jupyterlab or notebook is installed
  • the IPython kernel extension is enabled
  • your Spark session includes spark.extraListeners
  • spark.driver.extraClassPath points to a valid listener JAR
  • you are using Spark Classic, not Spark Connect
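The first three checks can be automated from inside the notebook kernel itself, which guarantees you are probing the same Python environment the kernel runs in (the helper name is illustrative):

```python
import importlib.util


def missing_packages(names):
    """Return the packages from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Run in a notebook cell; an empty list means all three are importable:
# missing_packages(["sparkmonitor", "pyspark", "jupyterlab"])
```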

Wrong listener JAR selected

The listener JAR must match both your Spark major version and Scala version:

Spark 3 + Scala 2.12 -> listener_spark3_2.12.jar
Spark 3 + Scala 2.13 -> listener_spark3_2.13.jar
Spark 4 + Scala 2.13 -> listener_spark4_2.13.jar

Using a mismatched JAR may prevent the listener from loading correctly.
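One way to confirm which combination you actually have is to look at the spark-core JAR shipped with your Spark installation; its filename encodes both versions. A sketch of the parsing, following the same naming convention as the detection example above:

```python
def parse_spark_core_jar(filename: str) -> tuple[str, str]:
    """Extract (scala_version, spark_major) from a spark-core JAR filename.

    e.g. "spark-core_2.13-3.5.8.jar" -> ("2.13", "3")
    """
    stem = filename.removesuffix(".jar")  # spark-core_2.13-3.5.8
    scala_version, spark_version = stem.split("_")[1].split("-")[:2]
    return scala_version, spark_version.split(".")[0]
```

Apply it to the spark-core JAR found under `$SPARK_HOME/jars` or the pyspark package's `jars` directory, then pick the matching listener JAR from the table above.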

Hardcoded virtual environment path does not work

Avoid hardcoding paths when possible. Environment-specific paths vary across systems, Python versions, and virtual environments. The dynamic path resolution example above is usually more portable.

SPARK_HOME is not set

This is expected in some setups, especially when Spark comes from pip install pyspark. In that case, use the pyspark package location to find the bundled Spark JARs instead of assuming SPARK_HOME/jars exists.

Project History

References

Project details


Download files

Download the file for your platform.

Source Distribution

sparkmonitor-3.2.0.tar.gz (3.5 MB view details)

Uploaded Source

Built Distribution


sparkmonitor-3.2.0-py3-none-any.whl (3.4 MB view details)

Uploaded Python 3

File details

Details for the file sparkmonitor-3.2.0.tar.gz.

File metadata

  • Download URL: sparkmonitor-3.2.0.tar.gz
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkmonitor-3.2.0.tar.gz
Algorithm Hash digest
SHA256 19518d072dd8533392960898ed610850a47b487ce79e1439b7bda1e1c3345dd0
MD5 a0a1717fa072fcc348f02031aa3b1148
BLAKE2b-256 cd370380913579463cc841b895509289347fe0ad4ab910cab9e9516806259ca6


Provenance

The following attestation bundles were made for sparkmonitor-3.2.0.tar.gz:

Publisher: publish.yml on swan-cern/sparkmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sparkmonitor-3.2.0-py3-none-any.whl.

File metadata

  • Download URL: sparkmonitor-3.2.0-py3-none-any.whl
  • Size: 3.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sparkmonitor-3.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 07702bd1a38a05e0278e346c313df8aea1263fa28d6bfbf637679e2f1d249287
MD5 e336294a363f4d5d456ecf92231a32f5
BLAKE2b-256 3ffdcbe425f3121fa7bbcf286ca74393f7a74ad99b305f231680c6bfa3fe2fef


Provenance

The following attestation bundles were made for sparkmonitor-3.2.0-py3-none-any.whl:

Publisher: publish.yml on swan-cern/sparkmonitor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
