Command line application to visualize the timeline of Apache Spark executions, reading Spark’s log files.

The tool assumes that all executors are added before the Spark application submits its first job. Applications that add or remove executors while jobs are running (dynamic scaling) are not supported.
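Whether a given log satisfies this assumption can be checked before drawing anything. The following is a minimal sketch, not part of the tool: it assumes the input is a Spark event log with one JSON object per line, and that the `SparkListenerExecutorAdded` events carry a `Timestamp` field and the `SparkListenerJobStart` events carry a `Submission Time` field. The function name `executors_added_before_jobs` is hypothetical.

```python
import json


def executors_added_before_jobs(lines):
    """Return True if no executor was added after the first job was submitted.

    `lines` is an iterable of JSON strings, one Spark listener event per line
    (assumed event-log layout; field names are assumptions, see lead-in).
    """
    last_executor_added = None
    first_job_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerExecutorAdded":
            ts = event["Timestamp"]
            if last_executor_added is None or ts > last_executor_added:
                last_executor_added = ts
        elif kind == "SparkListenerJobStart":
            ts = event["Submission Time"]
            if first_job_start is None or ts < first_job_start:
                first_job_start = ts
    if last_executor_added is None or first_job_start is None:
        # Nothing to compare: the assumption is trivially satisfied.
        return True
    return last_executor_added <= first_job_start
```

A log for which this returns False probably comes from an application with dynamic scaling, so its timeline may not be drawn correctly.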

Can you spot the bottleneck from the following visualization?

docs/example-timeline.svg

Image explanation

The vertical axis shows the executor cores, grouped by executor. The horizontal axis shows time, going from left to right. Each task is a horizontal bar that starts at a certain time on a core of an executor and ends after some time. The color normally ranges from green, used for shorter tasks, to red, used for longer tasks. Failed tasks are black. White space marks time during which a core is unused.
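The green-to-red scale can be illustrated with a small sketch. This is an assumption about how such a mapping could work, not the tool's actual code: it interpolates linearly between green and red relative to the longest task, and the name `duration_to_color` is hypothetical.

```python
def duration_to_color(duration, max_duration, failed=False):
    """Map a task duration to a hex color: green (short) to red (long).

    Failed tasks are black. Linear interpolation is an assumption;
    the tool may use a different scale.
    """
    if failed:
        return "#000000"
    if max_duration <= 0:
        # All tasks are instantaneous: treat them as the shortest possible.
        return "#00ff00"
    # Clamp the ratio to [0, 1] so out-of-range durations stay on the scale.
    ratio = min(max(duration / max_duration, 0.0), 1.0)
    red = round(255 * ratio)
    green = round(255 * (1 - ratio))
    return "#{:02x}{:02x}00".format(red, green)
```

With this mapping the shortest task is pure green (`#00ff00`) and the longest is pure red (`#ff0000`).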

Usually, the greener the image, the better. If there is a bottleneck in the execution, the guilty task(s) are easy to spot. Open the SVG in a browser and move the mouse over a task: a tooltip with the task ID should appear. You can then inspect that task in the standard Spark UI.
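Browsers show the content of an SVG `<title>` child element as a tooltip when the mouse hovers over the parent shape. The sketch below shows one way a task bar with such a tooltip could be emitted. Whether the tool uses exactly this markup is an assumption, and the helper name `task_rect` is hypothetical.

```python
from xml.sax.saxutils import escape


def task_rect(task_id, x, y, width, height, color):
    """Return an SVG <rect> for one task, with its ID as a hover tooltip."""
    return (
        '<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{c}">'
        "<title>Task {t}</title>"
        "</rect>"
    ).format(x=x, y=y, w=width, h=height, c=color, t=escape(str(task_id)))


# Example: a 10x5 green bar for task 42, placed at the origin.
print(task_rect(42, 0, 0, 10, 5, "#00ff00"))
```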

Installation

pip install view-spark-timeline

Example

view-spark-timeline -i examples/application_1472176676028_555248_1 -o docs/timeline.svg -u 1000

Output:

Read events from 'examples/application_1472176676028_555248_1'...
Total cores: 32
Total duration: 312.5s
Number of tasks: 2990
Min task duration: 0.0s
Max task duration: 25.9s
Cluster utilization: 57.70%
Drawing events...
Read events from 'examples/application_1472176676028_555248_1'...
SVG size: 1500 160
Saving SVG...
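The "Cluster utilization" figure can be read as the fraction of available core time that was spent running tasks. The sketch below shows that interpretation. The exact formula used by the tool is not stated here, so this definition and the name `cluster_utilization` are assumptions.

```python
def cluster_utilization(task_durations, total_cores, total_duration):
    """Fraction of available core-seconds that were spent running tasks.

    Assumed definition: sum of task durations divided by
    (total cores * total application duration).
    """
    if total_cores <= 0 or total_duration <= 0:
        return 0.0
    busy_core_seconds = sum(task_durations)
    available_core_seconds = total_cores * total_duration
    return busy_core_seconds / available_core_seconds


# Two 10 s tasks on 2 cores over a 20 s run use half of the capacity.
print("{:.2%}".format(cluster_utilization([10.0, 10.0], 2, 20.0)))
```

Under this definition, the example above has 32 cores × 312.5 s = 10000 core-seconds available, so a utilization of 57.70% would correspond to roughly 5770 core-seconds of task time.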

Usage

view-spark-timeline --help

Output:

usage: view-spark-timeline [-h] -i INPUT_LOG -o OUTPUT_IMAGE
                       [-u TIME_UNCERTAINTY] [-v]

Visualize the timeline of a Spark execution from its log file. (v0.2.0)

optional arguments:
-h, --help            show this help message and exit
-i INPUT_LOG, --input-log INPUT_LOG
                        path to the spark's application log
-o OUTPUT_IMAGE, --output-image OUTPUT_IMAGE
                        path of the output image
-u TIME_UNCERTAINTY, --time-uncertainty TIME_UNCERTAINTY
                        maximum allowed time uncertainty (in ms) of the
                        timestamps in the log file. A high uncertainty
                        determines a slower, but more robust, execution.
                        (Default: 0)
-v, --version         print version and exit

License

Copyright (c) 2017-2020, Federico Poli <federpoli@gmail.com>

This project, except for files in the lib and examples folders, is released under the MIT license.
