A Python library to interact with the Spark History Server
spark-monitoring
A Python library to interact with the Spark History Server.
Quickstart
Basic
$ pip install spark-monitoring
import sparkmonitoring as sparkmon
monitoring = sparkmon.client('my.history.server')
print(monitoring.list_applications())
Pandas
$ pip install 'spark-monitoring[pandas]'
import sparkmonitoring as sparkmon
import matplotlib.pyplot as plt
monitoring = sparkmon.df('my.history.server')
apps = monitoring.list_applications()
apps['function'] = apps.name.str.split('(').str.get(0)
print(apps.head().stack())
plt.figure()
apps['duration'].hist(by=apps['function'], figsize=(40, 20))
plt.show()
jobs = monitoring.list_jobs(apps.iloc[0].id)
print(jobs.head().stack())
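The `function` column in the example above is the part of each application name before the first `(`. The same grouping can be sketched without pandas as follows. The application records here are made up for illustration, and `function_name` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# Hypothetical application records; names and durations are invented.
apps = [
    {"name": "load_events(2019-01-01)", "duration": 1200},
    {"name": "load_events(2019-01-02)", "duration": 1500},
    {"name": "build_report(weekly)", "duration": 300},
]


def function_name(app_name):
    """Return the text before the first '(' (the whole name if there is none)."""
    return app_name.split("(")[0]


# Collect durations per function name, mirroring hist(by=apps['function']).
durations_by_function = defaultdict(list)
for app in apps:
    durations_by_function[function_name(app["name"])].append(app["duration"])

for name, durations in durations_by_function.items():
    print(name, durations)
```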
Reference
sparkmonitoring.client
Returns a client for making calls to the Spark History Server.
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
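As a rough illustration of how these arguments fit together, the sketch below combines them into a base URL. It assumes the standard Spark History Server REST layout of `/api/v<version>` under the server address. `build_base_url` is a hypothetical helper written for this example, not a function of the library, and the library's own URL handling may differ.

```python
def build_base_url(server, port=18080, is_https=False, api_version=1):
    """Combine the client arguments into a REST base URL (illustrative only)."""
    scheme = "https" if is_https else "http"
    return f"{scheme}://{server}:{port}/api/v{api_version}"


# Defaults mirror the table above.
print(build_base_url("my.history.server"))
# A custom port over https.
print(build_base_url("my.history.server", port=8080, is_https=True))
```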
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server', port=8080, is_https=True)
sparkmonitoring.df
Returns a client for making calls to the Spark History Server. This client
returns pandas DataFrames, as opposed to the dictionaries returned by the
standard client. It is available when the spark-monitoring[pandas] extra is
installed.
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server', port=8080, is_https=True)
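To give a feel for the difference between dictionaries and tabular rows, here is a minimal sketch that flattens application records into one flat row per attempt, a shape that could be handed to a DataFrame constructor. It assumes the Spark REST API response shape, where each application carries a nested `attempts` list. `flatten_applications` and the sample record are hypothetical, and the library's actual conversion is not shown in this document.

```python
def flatten_applications(applications):
    """Return one flat dict per application attempt (illustrative only)."""
    rows = []
    for app in applications:
        for attempt in app.get("attempts", []):
            # Start with the application-level fields, then add attempt fields.
            row = {"id": app["id"], "name": app["name"]}
            row.update(attempt)
            rows.append(row)
    return rows


# Invented sample in the assumed response shape.
sample = [
    {
        "id": "app-1",
        "name": "etl(daily)",
        "attempts": [{"duration": 1200, "completed": True}],
    }
]
rows = flatten_applications(sample)
print(rows)
```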
sparkmonitoring.api.ClientV1
A client to interact with the Spark History Server.
Generally this class is not instantiated directly, and is accessed via sparkmonitoring.client(...).
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | |
is_https | bool | Whether or not to use https to communicate with the spark server | |
api_version | int | API Version to interact with. Currently only 1 is supported | |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)
list_stage_attempts(...)
get_stage_attempt(...)
get_stage_attempt_summary(...)
get_stage_attempt_tasks(...)
list_active_executors(...)
list_executor_threads(...)
list_all_executors(...)
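The method names above line up closely with the endpoints of the Spark monitoring REST API. The sketch below pairs each method with the endpoint it most plausibly wraps. The endpoint paths are taken from the Spark REST API, but the pairing is an assumption based on naming alone, and `ENDPOINTS` and `endpoint_path` are hypothetical names rather than part of the library.

```python
# Assumed method-to-endpoint pairing; paths are relative to /api/v1.
_STAGE_ATTEMPT = "/applications/{app_id}/stages/{stage_id}/{attempt_id}"

ENDPOINTS = {
    "list_applications": "/applications",
    "get_application": "/applications/{app_id}",
    "list_jobs": "/applications/{app_id}/jobs",
    "get_job": "/applications/{app_id}/jobs/{job_id}",
    "list_stages": "/applications/{app_id}/stages",
    "list_stage_attempts": "/applications/{app_id}/stages/{stage_id}",
    "get_stage_attempt": _STAGE_ATTEMPT,
    "get_stage_attempt_summary": _STAGE_ATTEMPT + "/taskSummary",
    "get_stage_attempt_tasks": _STAGE_ATTEMPT + "/taskList",
    "list_active_executors": "/applications/{app_id}/executors",
    "list_executor_threads": "/applications/{app_id}/executors/{executor_id}/threads",
    "list_all_executors": "/applications/{app_id}/allexecutors",
}


def endpoint_path(method, **ids):
    """Fill the path template for a method name with the given identifiers."""
    return ENDPOINTS[method].format(**ids)


print(endpoint_path("get_job", app_id="app-1", job_id=3))
```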
sparkmonitoring.dataframes.PandasClient.list_applications
A list of all applications.
Arguments
Name | Type | Description | Default |
---|---|---|---|
status | enum{'completed','running'} | Type of applications to return | |
minDate | string{ISO8601} | Earliest Application | |
maxDate | string{ISO8601} | Latest Application | |
limit | int | Number of results to return | |
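These arguments share their names with the query parameters of the Spark REST API's applications endpoint. As a sketch of how such filters could be turned into a query string, the hypothetical helper below drops unset values and encodes the rest. `applications_query` is not part of the library, and how the library passes these values on is an assumption.

```python
from urllib.parse import urlencode


def applications_query(status=None, minDate=None, maxDate=None, limit=None):
    """Encode the filters that were actually set as a URL query string."""
    params = {
        "status": status,
        "minDate": minDate,
        "maxDate": maxDate,
        "limit": limit,
    }
    # Leave out filters the caller did not set.
    return urlencode({key: value for key, value in params.items() if value is not None})


print(applications_query(status="completed", minDate="2019-01-01", limit=10))
```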
sparkmonitoring.dataframes.PandasClient
A client to interact with the Spark History Server, returning pandas
DataFrames.
Generally this class is not instantiated directly, and is accessed via sparkmonitoring.df(...).
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)