
Interactive visualization of Spark jobs

Project description

spark-board: interactive PySpark dataframes visualization

spark-board provides an interactive way to analyze PySpark data frame execution plans: it generates a static website that displays the DAG of transformations.

Check out the examples for a quick overview of the features (the corresponding example source code is here).

Usage

spark-board takes a PySpark data frame and inspects the operations that produced it to build the DAG. This is usually the final step of a PySpark script, right before the data frame is written to disk.
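Spark records a data frame's transformations lazily as a plan of operators (the plan can be printed with `df.explain()`). To make "inspecting the operations to build the DAG" concrete, here is a toy model in plain Python. The `PlanNode` class and `plan_to_dag` function are hypothetical illustrations, not spark-board's internals or a Spark API. The point it shows: an operator reached from two consumers (for example, one source feeding both sides of a join) is emitted once, which is why the result is a DAG rather than a tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PlanNode:
    """Hypothetical plan operator; `children` are the inputs it reads from."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def plan_to_dag(root: PlanNode) -> Tuple[Dict[int, str], List[Tuple[int, int]]]:
    """Walk the plan depth-first and return (nodes, edges).

    `nodes` maps a small integer id to an operator name; `edges` holds
    (input_id, consumer_id) pairs, i.e. data flows from input to consumer.
    Nodes are tracked by object identity, so a sub-plan shared by several
    consumers is emitted only once.
    """
    ids: Dict[int, int] = {}  # id(node object) -> assigned node id
    nodes: Dict[int, str] = {}
    edges: List[Tuple[int, int]] = []

    def visit(node: PlanNode) -> int:
        if id(node) in ids:  # already reached through another consumer
            return ids[id(node)]
        node_id = len(ids)
        ids[id(node)] = node_id
        nodes[node_id] = node.name
        for child in node.children:
            edges.append((visit(child), node_id))
        return node_id

    visit(root)
    return nodes, edges


# Example: one source feeding both sides of a join
source = PlanNode("Range")
root = PlanNode("Join", [PlanNode("Filter", [source]), PlanNode("Project", [source])])
nodes, edges = plan_to_dag(root)
print(nodes)  # {0: 'Join', 1: 'Filter', 2: 'Range', 3: 'Project'}
print(edges)  # [(2, 1), (1, 0), (2, 3), (3, 0)]
```

"Range" appears once even though two operators read from it; a front end only needs these node and edge lists to draw the graph.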

Install spark-board

pip install spark-board

Run spark-board

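from pyspark.sql import SparkSession
from spark_board.html import dump_dataframe, DefaultSettings

spark = SparkSession.builder.getOrCreate()

# get the PySpark data frame that will be displayed
# (a toy example; in practice, use the final data frame of your script)
df = spark.range(100).filter("id % 2 = 0").withColumnRenamed("id", "even_id")

dump_dataframe(
    df=df,
    output_dir="./spark_board_output",
    overwrite=True,  # overwrite output_dir if it already exists
    default_settings=DefaultSettings(),  # override default settings if desired
)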

And that's it! spark-board generates a static website in the folder given by output_dir. You can then serve it with any web server and inspect the operations in a browser.
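For a quick local preview, Python's standard library is enough: from a shell, `python -m http.server 8000 --directory ./spark_board_output` serves the folder. The sketch below does the same from Python; the `serve` helper name and the default port are illustrative choices, and it is meant for local previews only (the Python docs advise against using `http.server` in production).

```python
import functools
import http.server
import threading


def serve(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve the files in `directory` on 127.0.0.1 from a background thread.

    Pass port=0 to let the OS pick a free port; the chosen port is then
    available as server.server_address[1]. Stop with server.shutdown().
    """
    # Bind the handler to the output folder instead of the current directory
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server will not keep the Python process alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, run `srv = serve("./spark_board_output")` in an interactive session, browse to http://127.0.0.1:8000/, and call `srv.shutdown()` when finished.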

You can check out the available default settings here.

Serving

spark-board is intended as living documentation for PySpark scripts, so it's advisable to regenerate the website every time the source code changes. For example, spark-board can run as part of a CI pipeline, with the generated website uploaded to a static hosting service such as GitHub Pages or GitLab Pages (the examples in this repository are updated and served this way).



Download files


Source Distribution

spark-board-0.0.6.tar.gz (223.6 kB)

Uploaded Source

Built Distribution


spark_board-0.0.6-py3-none-any.whl (225.8 kB)

Uploaded Python 3

File details

Details for the file spark-board-0.0.6.tar.gz.

File metadata

  • Download URL: spark-board-0.0.6.tar.gz
  • Upload date:
  • Size: 223.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for spark-board-0.0.6.tar.gz
Algorithm    Hash digest
SHA256       0c95c3c4a5d20ef5752c8416b3ec07df405fd83052d98a5717c6fda6394d8505
MD5          1ce64c32e434252382bef44617a07364
BLAKE2b-256  334f0ee16b3b05f7f782ce07d61f3fc997825389d5b3f3d327040096418764f8


File details

Details for the file spark_board-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: spark_board-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 225.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for spark_board-0.0.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       da0d89c54f2afe5349025f773752c4b309a0528dcaa6042875fa1f4b0bf7413a
MD5          8de4d19848c3b2204d3bad8a20e934df
BLAKE2b-256  01b528f573c527876991cf3a86c5a6f6f354a0203c120257134300c02e7fedc4
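The published digests let you check that a downloaded file is intact. A minimal sketch using only `hashlib`; the `sha256_of` and `verify_sha256` helper names are illustrative, and the file is assumed to sit in the current directory. pip can also enforce this automatically in hash-checking mode (`--require-hashes`, or a `--hash=sha256:...` option on a requirements-file line).

```python
import hashlib

# SHA256 digest listed above for spark_board-0.0.6-py3-none-any.whl
EXPECTED_WHEEL_SHA256 = "da0d89c54f2afe5349025f773752c4b309a0528dcaa6042875fa1f4b0bf7413a"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

`verify_sha256("spark_board-0.0.6-py3-none-any.whl", EXPECTED_WHEEL_SHA256)` should return True for an intact download; the same check applies to the source distribution with its own SHA256 value.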

