A Python library to interact with the Spark History Server
spark-monitoring
Quickstart
Basic
$ pip install spark-monitoring
import sparkmonitoring as sparkmon
monitoring = sparkmon.client('my.history.server')
print(monitoring.list_applications())
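As a minimal sketch of working with the result, the snippet below assumes `list_applications()` returns a list of dictionaries shaped like the Spark History Server REST response, where each application has an `id`, a `name` and a list of `attempts`. The sample records and the `total_duration_ms` helper are hypothetical and not part of the library.

```python
# Hypothetical sample, shaped like the Spark REST API /applications response.
sample_apps = [
    {"id": "app-0001", "name": "etl(daily)",
     "attempts": [{"duration": 1200, "completed": True}]},
    {"id": "app-0002", "name": "report(weekly)",
     "attempts": [{"duration": 300, "completed": True},
                  {"duration": 450, "completed": True}]},
]


def total_duration_ms(app):
    """Sum the duration (milliseconds) of every attempt of one application."""
    return sum(attempt.get("duration", 0) for attempt in app.get("attempts", []))


# Map each application id to the total duration across its attempts.
durations = {app["id"]: total_duration_ms(app) for app in sample_apps}
print(durations)
```

With a live server, `sample_apps` would be replaced by the value returned from `monitoring.list_applications()`, provided the response has the assumed shape.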
Pandas
$ pip install spark-monitoring[pandas]
import sparkmonitoring as sparkmon
import matplotlib.pyplot as plt
monitoring = sparkmon.df('my.history.server')
apps = monitoring.list_applications()
apps['function'] = apps.name.str.split('(').str.get(0)
print(apps.head().stack())
plt.figure()
apps['duration'].hist(by=apps['function'], figsize=(40, 20))
plt.show()
jobs = monitoring.list_jobs(apps.iloc[0].id)
print(jobs.head().stack())
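The `function` column above is derived by keeping the part of each application name that precedes the first `(`. The sketch below reproduces that step on a small hand-made DataFrame so it can be tried without a history server. The sample names and durations are hypothetical, and the columns of a real response may differ.

```python
import pandas as pd

# Hypothetical sample; real column names depend on the server response.
apps = pd.DataFrame({
    "name": ["etl(daily)", "etl(hourly)", "report(weekly)"],
    "duration": [1200, 800, 300],
})

# Same expression as in the quickstart: keep the text before the first '('.
apps["function"] = apps.name.str.split("(").str.get(0)

# Average duration per derived function name.
mean_duration = apps.groupby("function")["duration"].mean()
print(mean_duration)
```

Grouping by the derived column is the tabular counterpart of the `hist(by=...)` call in the quickstart, which draws one histogram per function name.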
Reference
sparkmonitoring.client
Returns a client for making calls to the Spark History Server.
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.client('my.history.server', port=8080, is_https=True)
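To make the role of each argument concrete, the following sketch shows how the four arguments could combine into a REST base URL. Spark serves its monitoring API under `/api/v1`, but how the library builds its URLs internally is not shown in this document, so `build_base_url` is a hypothetical helper for illustration only.

```python
def build_base_url(server, port=18080, is_https=False, api_version=1):
    """Compose a History Server REST base URL from the documented arguments.

    Hypothetical helper: the library's own URL handling may differ.
    """
    scheme = "https" if is_https else "http"
    return f"{scheme}://{server}:{port}/api/v{api_version}"


print(build_base_url("my.history.server"))
# → http://my.history.server:18080/api/v1
print(build_base_url("my.history.server", port=8080, is_https=True))
# → https://my.history.server:8080/api/v1
```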
sparkmonitoring.df
Returns a client for making calls to the Spark History Server. This
client returns pandas DataFrames, as opposed to the dictionaries returned by
the standard client. It can be used when the spark-monitoring[pandas] extra is
installed.
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Response
Examples
Basic Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server')
Custom Endpoint
import sparkmonitoring as sparkmon
client = sparkmon.df('my.history.server', port=8080, is_https=True)
sparkmonitoring.api.ClientV1
A client to interact with the Spark History Server.
Generally this class is not instantiated directly, and is accessed via
sparkmonitoring.client(...).
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | |
| is_https | bool | Whether or not to use https to communicate with the spark server | |
| api_version | int | API Version to interact with. Currently only 1 is supported | |
Methods
- list_applications(...)
- get_application(...)
- list_jobs(...)
- get_job(...)
- list_stages(...)
- list_stage_attempts(...)
- get_stage_attempt(...)
- get_stage_attempt_summary(...)
- get_stage_attempt_tasks(...)
- list_active_executors(...)
- list_executor_threads(...)
- list_all_executors(...)
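The method names above resemble the endpoints of Spark's monitoring REST API. The sketch below pairs each name with the REST path it most plausibly corresponds to. The paths are taken from Spark's monitoring documentation, but the pairing itself is an assumption that this document does not confirm, and `ENDPOINTS` and `endpoint_path` are hypothetical names.

```python
# Assumed mapping from method names to Spark REST API path templates.
# The pairing is a guess based on naming, not confirmed by the library.
_STAGE_ATTEMPT = "/applications/{app_id}/stages/{stage_id}/{attempt_id}"

ENDPOINTS = {
    "list_applications": "/applications",
    "get_application": "/applications/{app_id}",
    "list_jobs": "/applications/{app_id}/jobs",
    "get_job": "/applications/{app_id}/jobs/{job_id}",
    "list_stages": "/applications/{app_id}/stages",
    "list_stage_attempts": "/applications/{app_id}/stages/{stage_id}",
    "get_stage_attempt": _STAGE_ATTEMPT,
    "get_stage_attempt_summary": _STAGE_ATTEMPT + "/taskSummary",
    "get_stage_attempt_tasks": _STAGE_ATTEMPT + "/taskList",
    "list_active_executors": "/applications/{app_id}/executors",
    "list_executor_threads": "/applications/{app_id}/executors/{executor_id}/threads",
    "list_all_executors": "/applications/{app_id}/allexecutors",
}


def endpoint_path(method, **ids):
    """Fill the path template for a method name with the given identifiers."""
    return ENDPOINTS[method].format(**ids)


print(endpoint_path("get_job", app_id="app-0001", job_id=3))
# → /applications/app-0001/jobs/3
```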
sparkmonitoring.dataframes.PandasClient.list_applications
A list of all applications.
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| status | enum{'completed','running'} | Type of applications to return | |
| minDate | string{ISO8601} | Earliest Application | |
| maxDate | string{ISO8601} | Latest Application | |
| limit | int | Number of results to return | |
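These arguments carry the same names as the query parameters of the Spark REST `/applications` endpoint. As a sketch, assuming the values are forwarded unchanged as query parameters (which this document does not confirm), the filters would translate into a query string as follows. `applications_query` is a hypothetical helper, not a library function.

```python
from urllib.parse import urlencode


def applications_query(status=None, minDate=None, maxDate=None, limit=None):
    """Build the query string for /applications, skipping filters left unset."""
    params = {"status": status, "minDate": minDate, "maxDate": maxDate, "limit": limit}
    return urlencode({key: value for key, value in params.items() if value is not None})


print(applications_query(status="completed", minDate="2015-02-10", limit=5))
# → status=completed&minDate=2015-02-10&limit=5
```

Leaving every argument at its default produces an empty query string, which corresponds to requesting all applications.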
sparkmonitoring.dataframes.PandasClient
A client to interact with the Spark History Server, returning pandas
DataFrames.
Generally this class is not instantiated directly, and is accessed via
sparkmonitoring.df(...).
Arguments
| Name | Type | Description | Default |
|---|---|---|---|
| server | string | Hostname or IP pointing to the spark history server | |
| port | int | Port which the spark history server is exposed on | 18080 |
| is_https | bool | Whether or not to use https to communicate with the spark server | False |
| api_version | int | API Version to interact with. Currently only 1 is supported | 1 |
Methods
- list_applications(...)
- get_application(...)
- list_jobs(...)
- get_job(...)
- list_stages(...)