A low-overhead sampling profiler for PySpark that outputs Flame Graphs

pyspark-flame

A low-overhead profiler for Spark on Python

pyspark-flame hooks into PySpark's existing profiling capabilities to provide a low-overhead stack-sampling profiler that outputs performance data in a format compatible with Brendan Gregg's FlameGraph visualizer.

Because pyspark-flame hooks into PySpark's profiling capabilities, it can profile the entire execution of an RDD across the whole cluster, giving RDD-level visibility into performance.

Unlike the cProfile-based profiler included with PySpark, pyspark-flame uses stack sampling: it captures stack traces at regular, configurable intervals rather than instrumenting every function call. This keeps its overhead low and tunable and avoids the skew that per-call instrumentation adds to short, frequently called functions, which makes it suitable for performance test environments running at high volumes.
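To make the idea of stack sampling concrete, here is a minimal sketch in plain Python. It is not pyspark-flame's implementation: the `StackSampler` class and the `busy_work` function are hypothetical names invented for this illustration, and the sketch assumes it is enough to sample a single thread from a background thread.

```python
import collections
import sys
import threading
import time


class StackSampler:
    """Hypothetical illustration of stack sampling; not pyspark-flame's code."""

    def __init__(self, interval=0.01):
        self.interval = interval
        # Maps a "root;...;leaf" stack string to the number of times it was seen.
        self.counts = collections.Counter()
        # Sample the thread that creates the sampler.
        self._target_id = threading.get_ident()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Wake up once per interval until stop() is called.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is None:
                continue
            # Walk from the innermost frame outwards, then put the root first.
            names = []
            while frame is not None:
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            self.counts[";".join(reversed(names))] += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


def busy_work(duration):
    """Spin for `duration` seconds so the sampler has something to observe."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


sampler = StackSampler(interval=0.01)
sampler.start()
busy_work(0.3)
sampler.stop()

for stack, count in sampler.counts.most_common():
    print(stack, count)
```

The cost of a sampler like this scales with the number of samples rather than the number of function calls, which is why the interval acts as the tuning knob: a longer interval means less overhead but coarser data.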

Installation

pip install pyspark-flame

Usage

from pyspark_flame import FlameProfiler
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.python.profile", "true")
conf = conf.set("spark.python.profile.dump", ".")  # Optional: if unset, profiles are printed to stdout at exit
sc = SparkContext(
    'local', 'test', conf=conf, profiler_cls=FlameProfiler,
    environment={'pyspark_flame.interval': 0.25}  # Optional - default is 0.2 seconds
)
# Do stuff with Spark context...
sc.show_profiles()
# Or write the profiles to a directory instead
sc.dump_profiles('.')

For convenience, flamegraph.pl is vendored in, so you can produce a flame graph with:

flamegraph.pl rdd-1.flame > rdd-1.svg
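For readers unfamiliar with what flamegraph.pl consumes, the sketch below shows the "folded stacks" text format it reads: one line per unique stack, with frames separated by semicolons (root first) and followed by a space and a sample count. The `to_folded` helper and the frame names are hypothetical, made up for illustration; the exact frame labels pyspark-flame writes into its `.flame` files are not described here and may differ.

```python
def to_folded(stack_counts):
    """Render {tuple_of_frames: sample_count} as folded-stack lines.

    Hypothetical helper for illustration; frames are ordered root first.
    """
    lines = []
    for stack, count in sorted(stack_counts.items()):
        lines.append("{} {}".format(";".join(stack), count))
    return "\n".join(lines) + "\n"


# Made-up sample data: two stacks that share the same two outer frames.
example = {
    ("main", "map_partition", "parse"): 12,
    ("main", "map_partition", "score"): 30,
}

folded = to_folded(example)
print(folded, end="")
# main;map_partition;parse 12
# main;map_partition;score 30
```

In the rendered graph, the width of each box is proportional to its sample count, so `score` in this made-up data would appear two and a half times as wide as `parse`.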
