Guidepost. An overview visualization for understanding supercomputer queue data.
Project description
Guidepost
Guidepost is a Python library designed for seamless integration into Jupyter notebooks to visualize High Performance Computing (HPC) job data. It simplifies the process of understanding HPC workloads by providing a single, interactive visualization that offers an intuitive overview of job performance, resource usage, and other critical metrics.
Features
- Jupyter Notebook Integration: Designed for your existing workflow. Load and interact with the visualization directly in your Jupyter environment.
- HPC Job Data Insights: Visualize key metrics, including job runtimes, resource usage, and queue performance.
- Interactive Exploration: Export selections of specific jobs or groups of jobs for deeper analysis.
- Lightweight and Easy to Use: Focused on simplicity and efficiency for HPC users.
Installation
Guidepost is available on PyPI. You can install it using pip:
pip install guidepost
Quick Start
1. Import and Initialize Guidepost
from guidepost import Guidepost
gp = Guidepost()
2. Load Your Data
Guidepost supports input data in CSV or Pandas DataFrame format. Ensure your data includes columns such as job IDs, runtime, and resource usage.
import pandas as pd
jobs_data = pd.read_parquet("data/jobs_data.parquet")
3. Configure Visualization
gp.vis_data = jobs_data
gp.vis_configs = {
'x': 'queue_wait',
'y': 'start_time',
'color': 'nodes_req',
'color_agg': 'avg',
'categorical': 'user'
}
4. Run Visualization
gp
Run the above command in a Jupyter notebook cell to load data.
4. Retrieve Selections from Visualization
gp.retrieve_selected_data()
Example Dataset
Below is an example of the kind of data Guidepost works with:
| Job ID | Runtime (hours) | Nodes Used | partition | Status |
|---|---|---|---|---|
| 12345 | 5.2 | 10 | short | Complete |
| 12346 | 12.0 | 20 | long | Running |
Note that a column named "partition" must be sepecified.
API Reference
vis_data
- Description: Holds the vis data to passed to the visualization. Updates to this variable will automatically update the visualization.
vis_configs
- Description: Holds the vis configurations to passed to the visualization. Updates to this variable will automatically update the visualization.
Vis configurations must be specified as a python dictonary with the following fields:
- 'x': The column from the pandas dataframe which will be shown on the x axis. This can be a integer, float or datetime variable.
- 'y': The column from the pandas dataframe which will be shown on the y axis of this visualization. This can be an integer or float.
- 'color': The column from the pandas dataframe which will determine the color of squares in the main summary view. This can be an integer or float.
- 'color_agg': This is a specification for what aggregation is used for the color variable. It can be: 'avg', 'variance', 'std', 'sum', or 'median'
- 'categorical': A categorical variable from the dataset. It must be a string. The visualization will show the top 7 instances of this variable.
retrieve_selected_data()
- Description: Returns selected data back from the visualization.
- Returns:
subselection(DataFrame or str): A Pandas DataFrame that contains subselected data specified from selections made to the visualization.
Contributing
Contributions to Guidepost are welcome! To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a detailed description of your changes.
License
Guidepost is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
Guidepost was developed under the auspices and with funding provided by the National Renewable Energy Laboratory (NREL).
Contact
For questions or feedback, please reach out to the maintainer at [cscullyallison@sci.utah.edu].
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file guidepost-0.2.3.tar.gz.
File metadata
- Download URL: guidepost-0.2.3.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
37786570acf36836e30dde41e3ad032a2118009e6c33249e2908ffa45cc8a315
|
|
| MD5 |
65acfae4e474888daa09ba09c133e590
|
|
| BLAKE2b-256 |
a12a9b74dd8c9f992c6e635b13e4b4b3a6630e4815117c5750767dcfb7198a5c
|
File details
Details for the file guidepost-0.2.3-py3-none-any.whl.
File metadata
- Download URL: guidepost-0.2.3-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bba5bbd505576a5ef31a782dcdba9be05b8dbe459181970b59e21c064b016061
|
|
| MD5 |
950118353faade39d308877e23e724de
|
|
| BLAKE2b-256 |
ba04762dfc12adf3f376dadb72cf82661bfc969b77a7aa0dd19048960e9a6398
|