spark-monitoring
A Python library to interact with the Spark History Server.
Quickstart
Basic
$ pip install spark-monitoring
import sparkmonitoring as sparkmon
monitoring = sparkmon.client('my.history.server')
print(monitoring.list_applications())
Pandas
$ pip install spark-monitoring[pandas]
import sparkmonitoring as sparkmon
import matplotlib.pyplot as plt
monitoring = sparkmon.df('my.history.server')
apps = monitoring.list_applications()
apps['function'] = apps.name.str.split('(').str.get(0)
print(apps.head().stack())
plt.figure()
apps['duration'].hist(by=apps['function'], figsize=(40, 20))
plt.show()
jobs = monitoring.list_jobs(apps.iloc[0].id)
print(jobs.head().stack())
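The `function` column above is derived by cutting each application name at its first `(`. The same grouping can be sketched in plain Python without pandas. The records below are hypothetical sample data, and `durations_by_function` is an illustrative helper, not part of spark-monitoring; the real field layout returned by the history server may differ.

```python
from collections import defaultdict

# Hypothetical application records; real ones come from the history server.
apps = [
    {"name": "load_table(2020-01-01)", "duration": 1200},
    {"name": "load_table(2020-01-02)", "duration": 1500},
    {"name": "build_report(weekly)", "duration": 300},
]


def durations_by_function(records):
    """Group durations by the part of the name before the first '('."""
    grouped = defaultdict(list)
    for record in records:
        function = record["name"].split("(")[0]
        grouped[function].append(record["duration"])
    return dict(grouped)


print(durations_by_function(apps))
```

Each key corresponds to one panel of the histogram drawn by `hist(by=...)` in the pandas example.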
Reference
sparkmonitoring.client
Returns a client for making calls to the Spark History Server.
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
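These arguments presumably combine into the base URL of Spark's REST API, which Spark's monitoring documentation gives as `http://<server-url>:18080/api/v1` for the history server. A minimal sketch of that composition follows. `build_base_url` is a hypothetical helper written for illustration; it is not part of spark-monitoring, and the library's internals may differ.

```python
def build_base_url(server, port=18080, is_https=False, api_version=1):
    """Compose the REST base URL for a Spark History Server.

    Hypothetical helper: it mirrors the documented arguments and their
    defaults, not the library's actual implementation.
    """
    scheme = "https" if is_https else "http"
    return f"{scheme}://{server}:{port}/api/v{api_version}"


print(build_base_url("my.history.server"))
print(build_base_url("my.history.server", port=8080, is_https=True))
```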
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server', port=8080, is_https=True)
sparkmonitoring.df
Returns a client for making calls to the Spark History Server. This
client returns pandas DataFrames, as opposed to the dictionaries returned by the
standard client. It can be used when the spark-monitoring[pandas] extra is
installed.
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server', port=8080, is_https=True)
sparkmonitoring.api.ClientV1
A client to interact with the Spark History Server.
Generally this class is not instantiated directly, and is accessed via
sparkmonitoring.client(...).
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | |
is_https | bool | Whether or not to use https to communicate with the spark server | |
api_version | int | API Version to interact with. Currently only 1 is supported | |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)
list_stage_attempts(...)
get_stage_attempt(...)
get_stage_attempt_summary(...)
get_stage_attempt_tasks(...)
list_active_executors(...)
list_executor_threads(...)
list_all_executors(...)
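The method names above line up closely with the endpoints of Spark's REST API (v1). The table below is a sketch of that correspondence: the paths are taken from Spark's monitoring documentation, but pairing them with these method names is an assumption made for illustration, as are the `ENDPOINTS` dictionary, the `resolve_path` helper, and the placeholder names such as `app_id`.

```python
# Assumed mapping from client method names to Spark REST API v1 paths.
# The paths exist in Spark's monitoring docs; the pairing with
# spark-monitoring method names is a guess, not confirmed by the library.
_STAGE = "/applications/{app_id}/stages/{stage_id}"
_ATTEMPT = _STAGE + "/{attempt_id}"

ENDPOINTS = {
    "list_applications": "/applications",
    "get_application": "/applications/{app_id}",
    "list_jobs": "/applications/{app_id}/jobs",
    "get_job": "/applications/{app_id}/jobs/{job_id}",
    "list_stages": "/applications/{app_id}/stages",
    "list_stage_attempts": _STAGE,
    "get_stage_attempt": _ATTEMPT,
    "get_stage_attempt_summary": _ATTEMPT + "/taskSummary",
    "get_stage_attempt_tasks": _ATTEMPT + "/taskList",
    "list_active_executors": "/applications/{app_id}/executors",
    "list_executor_threads": "/applications/{app_id}/executors/{executor_id}/threads",
    "list_all_executors": "/applications/{app_id}/allexecutors",
}


def resolve_path(method, **ids):
    """Fill the identifiers into the path assumed for a method name."""
    return ENDPOINTS[method].format(**ids)


print(resolve_path("list_jobs", app_id="app-20200101000000-0001"))
```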
sparkmonitoring.dataframes.PandasClient.list_applications
A list of all applications.
Arguments
Name | Type | Description | Default |
---|---|---|---|
status | enum{'completed','running'} | Type of applications to return | |
minDate | string{ISO8601} | Earliest Application | |
maxDate | string{ISO8601} | Latest Application | |
limit | int | Number of results to return | |
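These filters share their names with the query parameters of the `/applications` endpoint in Spark's REST API. As a rough sketch of how they might be turned into a query string, the hypothetical helper `applications_query` below drops any filter left unset and encodes the rest. It is an illustration only and is not taken from spark-monitoring.

```python
from urllib.parse import urlencode


def applications_query(status=None, minDate=None, maxDate=None, limit=None):
    """Build a query string from the filters that were actually supplied.

    Hypothetical helper; parameter names follow the table above.
    """
    params = {
        "status": status,
        "minDate": minDate,
        "maxDate": maxDate,
        "limit": limit,
    }
    supplied = {key: value for key, value in params.items() if value is not None}
    return urlencode(supplied)


print(applications_query(status="completed", minDate="2015-02-10", limit=10))
```

Omitting unset filters keeps the request equivalent to calling the endpoint with no constraints when no arguments are given.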
sparkmonitoring.dataframes.PandasClient
A client to interact with the Spark History Server, returning pandas
DataFrames.
Generally this class is not instantiated directly, and is accessed via
sparkmonitoring.df(...).
Arguments
Name | Type | Description | Default |
---|---|---|---|
server | string | Hostname or IP pointing to the spark history server | |
port | int | Port which the spark history server is exposed on | 18080 |
is_https | bool | Whether or not to use https to communicate with the spark server | False |
api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Methods
list_applications(...)
get_application(...)
list_jobs(...)
get_job(...)
list_stages(...)