spark-monitoring

A Python library for interacting with the Spark History Server.

Quickstart

Basic

$ pip install spark-monitoring

import sparkmonitoring as sparkmon

monitoring = sparkmon.client('my.history.server')
print(monitoring.list_applications())

Pandas

$ pip install 'spark-monitoring[pandas]'

import sparkmonitoring as sparkmon
import matplotlib.pyplot as plt

monitoring = sparkmon.df('my.history.server')

apps = monitoring.list_applications()
apps['function'] = apps.name.str.split('(').str.get(0)
print(apps.head().stack())

plt.figure()
apps['duration'].hist(by=apps['function'], figsize=(40, 20))
plt.show()

jobs = monitoring.list_jobs(apps.iloc[0].id)

print(jobs.head().stack())
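The `apps['function']` line above keeps the part of each application name that precedes the first `(`, so that runs of the same function can be grouped together. A minimal standard-library sketch of the same grouping follows. The helper names and the sample records are made up for illustration and are not part of spark-monitoring; only the `name` and `duration` fields are taken from the example above.

```python
from collections import defaultdict


def function_name(app_name):
    """Return the text before the first '(' (the whole name if there is none)."""
    return app_name.split("(")[0]


def durations_by_function(apps):
    """Group application durations by the function prefix of the name."""
    grouped = defaultdict(list)
    for app in apps:
        grouped[function_name(app["name"])].append(app["duration"])
    return dict(grouped)


# Hypothetical sample records, not real History Server output.
sample_apps = [
    {"name": "daily_load(2019-01-01)", "duration": 1200},
    {"name": "daily_load(2019-01-02)", "duration": 1350},
    {"name": "report", "duration": 300},
]

print(durations_by_function(sample_apps))
# {'daily_load': [1200, 1350], 'report': [300]}
```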

Reference

sparkmonitoring.client

Returns a client for making calls to the Spark History Server.

Arguments

| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |

Response

Examples

Basic Endpoint

import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server')

Custom Endpoint

import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server', port=8080, is_https=True)
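To make the arguments concrete: the Spark History Server exposes its REST API under `/api/v1` on the configured host and port, so the four arguments presumably combine into a base URL along these lines. This is a sketch only. `build_base_url` is a hypothetical helper, not a function from spark-monitoring, and the library may assemble its URLs differently.

```python
def build_base_url(server, port=18080, is_https=False, api_version=1):
    """Hypothetical helper: combine the client arguments into a REST base URL."""
    scheme = "https" if is_https else "http"
    return f"{scheme}://{server}:{port}/api/v{api_version}"


print(build_base_url("my.history.server"))
# http://my.history.server:18080/api/v1
print(build_base_url("my.history.server", port=8080, is_https=True))
# https://my.history.server:8080/api/v1
```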

sparkmonitoring.df

Returns a client for making calls to the Spark History Server. This client returns pandas DataFrames, as opposed to the dictionaries returned by the standard client. It can be used when the spark-monitoring[pandas] extra is installed.

Arguments

| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |

Response

Examples

Basic Endpoint

import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server')

Custom Endpoint

import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server', port=8080, is_https=True)

sparkmonitoring.api.ClientV1

A client to interact with the Spark History Server. Generally this class is not instantiated directly, and is accessed via sparkmonitoring.client(...).

Arguments

| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | |
| is_https | bool | Whether or not to use https to communicate with the spark server | |
| api_version | int | API Version to interact with. Currently only 1 is supported | |

Methods

  • list_applications(...)
  • get_application(...)
  • list_jobs(...)
  • get_job(...)
  • list_stages(...)
  • list_stage_attempts(...)
  • get_stage_attempt(...)
  • get_stage_attempt_summary(...)
  • get_stage_attempt_tasks(...)
  • list_active_executors(...)
  • list_executor_threads(...)
  • list_all_executors(...)
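The method names above line up closely with the endpoints of the Spark monitoring REST API. The table below is a guess at that correspondence, written as a small lookup. The mapping itself, the `endpoint_path` helper, and the placeholder names (`app_id`, `job_id`, and so on) are assumptions for illustration and are not taken from the spark-monitoring source.

```python
# Assumed mapping from client method names to Spark REST API paths
# (relative to the /api/v1 base). Not confirmed against the library.
ENDPOINTS = {
    "list_applications": "/applications",
    "get_application": "/applications/{app_id}",
    "list_jobs": "/applications/{app_id}/jobs",
    "get_job": "/applications/{app_id}/jobs/{job_id}",
    "list_stages": "/applications/{app_id}/stages",
    "list_stage_attempts": "/applications/{app_id}/stages/{stage_id}",
    "get_stage_attempt": "/applications/{app_id}/stages/{stage_id}/{attempt_id}",
    "get_stage_attempt_summary":
        "/applications/{app_id}/stages/{stage_id}/{attempt_id}/taskSummary",
    "get_stage_attempt_tasks":
        "/applications/{app_id}/stages/{stage_id}/{attempt_id}/taskList",
    "list_active_executors": "/applications/{app_id}/executors",
    "list_executor_threads":
        "/applications/{app_id}/executors/{executor_id}/threads",
    "list_all_executors": "/applications/{app_id}/allexecutors",
}


def endpoint_path(method, **ids):
    """Hypothetical helper: fill the identifiers into the assumed path."""
    return ENDPOINTS[method].format(**ids)


print(endpoint_path("get_job", app_id="app-1", job_id=3))
# /applications/app-1/jobs/3
```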

sparkmonitoring.dataframes.PandasClient.list_applications

A list of all applications.

Arguments

| Name | Type | Description | Default |
|---|---|---|---|
| status | enum{'completed','running'} | Type of applications to return | |
| minDate | string{ISO8601} | Earliest Application | |
| maxDate | string{ISO8601} | Latest Application | |
| limit | int | Number of results to return | |
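These arguments share their names with the query parameters of the Spark REST API's `/applications` endpoint, so they are presumably forwarded as a query string. A minimal sketch of that step, assuming unset arguments are simply omitted; `applications_query` is a hypothetical helper and not part of spark-monitoring.

```python
from urllib.parse import urlencode


def applications_query(status=None, minDate=None, maxDate=None, limit=None):
    """Hypothetical helper: build the query string, skipping unset arguments."""
    params = {
        "status": status,
        "minDate": minDate,
        "maxDate": maxDate,
        "limit": limit,
    }
    present = {key: value for key, value in params.items() if value is not None}
    return urlencode(present)


print(applications_query(status="completed", limit=10))
# status=completed&limit=10
```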

sparkmonitoring.dataframes.PandasClient

A client to interact with the Spark History Server, returning pandas DataFrames. Generally this class is not instantiated directly, and is accessed via sparkmonitoring.df(...).

Arguments

| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |

Methods

  • list_applications(...)
  • get_application(...)
  • list_jobs(...)
  • get_job(...)
  • list_stages(...)
