
sparkmon



Read the documentation at https://sparkmon.readthedocs.io/


Description

sparkmon is a Python package to monitor Spark applications. You can see it as an advanced Spark UI that keeps track of all Spark REST API metrics over time, which makes it quite unique compared to other solutions (see the comparison below). It is particularly useful for memory profiling, including the memory used by Python UDFs.
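For context on where such memory figures can come from: in Spark 3.0 and later, each entry returned by the executors REST endpoint (`/api/v1/applications/<app-id>/executors`) can carry a `peakMemoryMetrics` object with keys such as `JVMHeapMemory` and, when `spark.executor.processTreeMetrics.enabled` is set to true, `ProcessTreePythonRSSMemory`. The helper below is a hypothetical sketch for reading one of those metrics per executor; `peak_memory_by_executor` is an illustrative name and this is not a claim about how sparkmon itself collects its data.

```python
from typing import Any, Dict, List


def peak_memory_by_executor(
    executors: List[Dict[str, Any]], metric: str = "ProcessTreePythonRSSMemory"
) -> Dict[str, int]:
    """Hypothetical helper: map executor id -> peak value of ``metric``.

    ``executors`` is assumed to be the decoded JSON list from Spark's executors
    REST endpoint. Executors that report no ``peakMemoryMetrics`` (older Spark
    versions, or metrics not collected yet) or lack ``metric`` are skipped.
    """
    peaks_by_id: Dict[str, int] = {}
    for executor in executors:
        peaks = executor.get("peakMemoryMetrics") or {}
        if metric in peaks:
            peaks_by_id[executor["id"]] = peaks[metric]
    return peaks_by_id
```

Passing `metric="JVMHeapMemory"` reads the JVM heap peak instead of the Python worker memory.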

Features

Monitoring plot example (image: docs/_static/monitoring-plot-example.png)

Disclaimer: Be aware that if you run Spark in local mode, some of the subplots will be empty; sparkmon is designed to analyse Spark applications running in a cluster.

  • Log the executors' metrics

  • Plot the monitoring data, display it in a notebook, or export it to a file

  • Can monitor a remote Spark application

  • Can run directly in your PySpark application, in a notebook, or via the command-line interface

  • Log to mlflow
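Since local mode leaves some subplots empty (see the disclaimer above), it can help to check the Spark master URL before starting monitoring. A minimal sketch, assuming you pass in the URL from `spark.sparkContext.master` in PySpark; `is_local_master` and `warn_if_local` are hypothetical helpers, not part of sparkmon.

```python
import warnings


def is_local_master(master: str) -> bool:
    """Hypothetical helper: True for local-mode master URLs such as "local", "local[4]" or "local[*]".

    "local-cluster[...]" URLs are not treated as local here.
    """
    return master == "local" or master.startswith("local[")


def warn_if_local(master: str) -> bool:
    """Hypothetical helper: warn (and return True) when ``master`` points to local mode."""
    if is_local_master(master):
        warnings.warn(
            f"Spark master is {master!r}: some sparkmon subplots will be empty in local mode.",
            stacklevel=2,
        )
        return True
    return False
```

For example, calling `warn_if_local(spark.sparkContext.master)` right before starting the monitor surfaces the issue early.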

Comparison with other solutions

This package provides much more information than the Spark UI or other packages. Here is a quick comparison:

  • sparkmonitor:

    • Nice integration in notebook

    • Doesn’t provide more information than the Spark UI, in particular no memory usage over time.

  • sparklint:

    • Needs to launch a server locally, which might be difficult on-premise. sparkmon doesn’t need an accessible port.

    • Monitors only CPU over time, while sparkmon monitors everything, including Java and Python memory over time.

    • No updates since 2018

  • Data Mechanics Delight:

    • Really nice and complete

    • But cannot work fully on-premise

    • Is not fully open-source

  • Sparklens:

    • Cannot work fully on-premise

    • Is not fully open-source

Requirements

  • Python

  • Spark

  • mlflow (optional)

Installation

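You can install sparkmon via pip from PyPI:

$ pip install sparkmon

To also install the optional mlflow dependency (quoted so that shells such as zsh do not expand the brackets):

$ pip install "sparkmon[mlflow]"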

Usage

Simple use-case:

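import sparkmon
from pyspark.sql import SparkSession

# Reuse the active Spark session, or create one
spark = SparkSession.builder.getOrCreate()

# Create and start the monitoring process via a Spark session
# (log_to_mlflow requires the optional mlflow dependency)
mon = sparkmon.SparkMon(spark, period=5, callbacks=[
    sparkmon.callbacks.plot_to_image,
    sparkmon.callbacks.log_to_mlflow,
])
mon.start()

# ... run your Spark jobs here ...

# Stop monitoring
mon.stop()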

More advanced use-case:

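import sparkmon
from pyspark.sql import SparkSession

# Create an app connection, using one of the two options below
# Option 1: via a Spark session
spark = SparkSession.builder.getOrCreate()
application = sparkmon.create_application_from_spark(spark)
# Option 2: via a remote Spark web UI link (no local Spark session needed)
# application = sparkmon.create_application_from_link(index=0, web_url='http://localhost:4040')

# Create and start the monitoring process
# (log_to_mlflow requires the optional mlflow dependency)
mon = sparkmon.SparkMon(application, period=5, callbacks=[
    sparkmon.callbacks.plot_to_image,
    sparkmon.callbacks.log_to_mlflow,
])
mon.start()

# ... run your Spark jobs here ...

# Stop monitoring
mon.stop()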
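The `create_application_from_link(index=0, web_url=...)` call suggests that the application is picked by position from the applications a web UI exposes. As a hedged illustration (not sparkmon's code), Spark's monitoring REST API lists applications at `<web_url>/api/v1/applications` and executors at `<web_url>/api/v1/applications/<app-id>/executors`; the sketch below resolves that URL with the standard library, assuming `index` selects from that list. `executors_endpoint`, `fetch_json` and `fetch_executors` are hypothetical names.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def executors_endpoint(web_url: str, applications: List[Dict[str, Any]], index: int = 0) -> str:
    """Hypothetical helper: executors endpoint URL of the application at position ``index``.

    ``applications`` is the decoded JSON list served at ``<web_url>/api/v1/applications``.
    """
    app_id = applications[index]["id"]
    return f"{web_url.rstrip('/')}/api/v1/applications/{app_id}/executors"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET ``url`` and decode its JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_executors(web_url: str, index: int = 0) -> List[Dict[str, Any]]:
    """Hypothetical helper: list the executors of the ``index``-th application."""
    applications = fetch_json(f"{web_url.rstrip('/')}/api/v1/applications")
    return fetch_json(executors_endpoint(web_url, applications, index))
```

For example, `fetch_executors("http://localhost:4040")` would return the executor summaries of the running application served by that web UI.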

You can also use it from a notebook; see the Notebook Example.

There is also a command-line interface, see Command-line Reference for details.

How does it work?

SparkMon runs a Python thread in the background that queries the Spark web UI REST API and logs all the executors' information over time.
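To illustrate the idea, here is a minimal, hypothetical sketch of such a background polling loop using only the standard library. `PollingMonitor` and its `fetch` argument are illustrative names, not sparkmon's API; `fetch` would typically return the JSON list from Spark's `/api/v1/applications/<app-id>/executors` endpoint.

```python
import threading
import time
from typing import Callable, Dict, List, Sequence


class PollingMonitor:
    """Hypothetical illustration of a background polling monitor (not sparkmon's code).

    Every ``period`` seconds, ``fetch`` is called, a timestamped snapshot is
    appended to ``history``, and each callback receives the monitor.
    """

    def __init__(
        self,
        fetch: Callable[[], List[Dict]],
        period: float = 5.0,
        callbacks: Sequence[Callable[["PollingMonitor"], None]] = (),
    ) -> None:
        self.fetch = fetch
        self.period = period
        self.callbacks = list(callbacks)
        self.history: List[Dict] = []
        self._stop_event = threading.Event()
        # Daemon thread, so it never blocks interpreter shutdown.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop_event.is_set():
            executors = self.fetch()
            self.history.append({"timestamp": time.time(), "executors": executors})
            for callback in self.callbacks:
                callback(self)
            # wait() returns as soon as stop() sets the event, so shutdown is prompt.
            self._stop_event.wait(self.period)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()
```

Using a `threading.Event` for both the sleep and the stop signal keeps `stop()` responsive even with a long `period`; a real implementation would also need to handle failed HTTP requests inside `fetch`.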

The callbacks list parameter lets you define what to do after each update, such as exporting the executors' historical information to a CSV file, plotting it to a file, or displaying it in your notebook.
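As a hedged sketch of the kind of CSV export described above (not sparkmon's own callback, whose signature is not documented here), assume the history is a list of snapshots shaped like `{"timestamp": ..., "executors": [...]}`, where each executor dict comes from Spark's executors REST endpoint (`id`, `memoryUsed` and `maxMemory` are fields of that JSON). `history_to_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, List, TextIO

# Hypothetical default columns, taken from Spark's executor summary JSON.
DEFAULT_FIELDS = ("id", "memoryUsed", "maxMemory")


def history_to_csv(history: List[Dict], out: TextIO, fields: Iterable[str] = DEFAULT_FIELDS) -> int:
    """Hypothetical helper: write one CSV row per (snapshot, executor) pair to ``out``.

    Missing fields are written as empty cells. Returns the number of data rows.
    """
    fields = list(fields)
    writer = csv.writer(out)
    writer.writerow(["timestamp", *fields])
    rows = 0
    for snapshot in history:
        for executor in snapshot["executors"]:
            writer.writerow([snapshot["timestamp"], *(executor.get(k, "") for k in fields)])
            rows += 1
    return rows
```

With a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("executors.csv", "w", newline="") as f: history_to_csv(history, f)`.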

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, sparkmon is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.



Download files


Source Distribution

sparkmon-0.1.10.tar.gz (16.6 kB)

Built Distribution

sparkmon-0.1.10-py3-none-any.whl (19.5 kB)

File details

Details for the file sparkmon-0.1.10.tar.gz.

File metadata

  • Download URL: sparkmon-0.1.10.tar.gz
  • Size: 16.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for sparkmon-0.1.10.tar.gz
Algorithm Hash digest
SHA256 04e84e62e5697efa4c11e73a4f8130bdb85e9ad99052bd76c26a92c46bed5847
MD5 a5c4b5404786c83d9a6e73f6aab51705
BLAKE2b-256 0df10a89bee4b3fddf997aeeb157f6bf8ff9ed83da2ce2ed4869fc3943a91f50


File details

Details for the file sparkmon-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: sparkmon-0.1.10-py3-none-any.whl
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for sparkmon-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 2cbcd8ebca861e7ee5e1b0583bfab2c872b1cfe0ac25159ffcf3679dc9af014b
MD5 8e62ac17edcd38650cf3649da7623cf1
BLAKE2b-256 221c1f9a3fc01e86820e96682fa09144c11b051b63f7b1ed5752b505d6f29eec

