
Guidepost. An overview visualization for understanding supercomputer queue data.


Guidepost

Guidepost is a Python library designed for seamless integration into Jupyter notebooks to visualize High Performance Computing (HPC) job data. It simplifies the process of understanding HPC workloads by providing a single, interactive visualization that offers an intuitive overview of job performance, resource usage, and other critical metrics.


Features

  • Jupyter Notebook Integration: Designed for your existing workflow. Load and interact with the visualization directly in your Jupyter environment.
  • HPC Job Data Insights: Visualize key metrics, including job runtimes, resource usage, and queue performance.
  • Interactive Exploration: Export selections of specific jobs or groups of jobs for deeper analysis.
  • Lightweight and Easy to Use: Focused on simplicity and efficiency for HPC users.

Installation

Guidepost is available on PyPI. You can install it using pip:

pip install guidepost

Quick Start

1. Import and Initialize Guidepost

from guidepost import Guidepost
gp = Guidepost()

2. Load Your Data

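Guidepost takes its input as a Pandas DataFrame, so data stored in a file format that pandas can read (for example CSV or Parquet) should be loaded into a DataFrame first. Ensure your data includes columns such as job IDs, runtime, and resource usage.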

import pandas as pd

jobs_data = pd.read_parquet("data/jobs_data.parquet")
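
The configuration in the next step refers to columns such as 'queue_wait' and 'start_time'. If a scheduler export only contains raw timestamps, a wait-time column can be derived with pandas before the DataFrame is handed to Guidepost. The following is a minimal sketch, not part of Guidepost itself: the rows are made up for illustration and the 'submit_time' column name is an assumption about what such an export might contain.

```python
import pandas as pd

# Illustrative rows only; the column names are assumptions about a scheduler export.
raw = pd.DataFrame(
    {
        "job_id": [12345, 12346],
        "submit_time": ["2024-01-01 08:00:00", "2024-01-01 09:30:00"],
        "start_time": ["2024-01-01 08:15:00", "2024-01-01 11:30:00"],
        "nodes_req": [10, 20],
        "user": ["alice", "bob"],
        "partition": ["short", "long"],
    }
)

# Parse the timestamp strings so both columns have a datetime dtype.
raw["submit_time"] = pd.to_datetime(raw["submit_time"])
raw["start_time"] = pd.to_datetime(raw["start_time"])

# Queue wait in hours, stored as a float column.
wait = raw["start_time"] - raw["submit_time"]
raw["queue_wait"] = wait.dt.total_seconds() / 3600.0

prepared = raw
```

A frame prepared this way can then be assigned to gp.vis_data in place of jobs_data.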

3. Configure Visualization

gp.vis_data = jobs_data
gp.vis_configs = {
    'x': 'start_time',       # datetime column, allowed on the x axis
    'y': 'queue_wait',       # numeric column (integer or float)
    'color': 'nodes_req',
    'color_agg': 'avg',
    'categorical': 'user',
    'facet_by': 'partition'
}

4. Run Visualization

gp

Run the above command as the last line of a Jupyter notebook cell to display the visualization.

5. Retrieve Selections from Visualization

gp.retrieve_selected_data()

Example Dataset

Below is an example of the kind of data Guidepost works with:

Job ID   Runtime (hours)   Nodes Used   Partition   Status
12345    5.2               10           short       Complete
12346    12.0              20           long        Running

API Reference

vis_data

  • Description: Holds the data to be passed to the visualization. Updates to this variable will automatically update the visualization.

vis_configs

  • Description: Holds the configurations to be passed to the visualization. Updates to this variable will automatically update the visualization.

Vis configurations must be specified as a Python dictionary with the following fields:

  • 'x': The column from the pandas dataframe which will be shown on the x axis. This can be an integer, float or datetime variable.
  • 'y': The column from the pandas dataframe which will be shown on the y axis of this visualization. This can be an integer or float.
  • 'color': The column from the pandas dataframe which will determine the color of squares in the main summary view. This can be an integer or float.
  • 'color_agg': The aggregation applied to the color variable. It can be 'avg', 'variance', 'std', 'sum', or 'median'.
  • 'categorical': A categorical variable from the dataset. The data column must be a string datatype. The visualization will show the top 10 instances of this variable.
  • 'facet_by': A categorical variable from the dataset. Automatically looks for 'queue' or 'partition' if this config is not specified.
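
The field rules above can be checked against a DataFrame before it is assigned to the visualization. The sketch below is a hypothetical helper, not part of the Guidepost API: the name check_vis_configs and its return format are assumptions made for illustration, and it only encodes the constraints listed above. It treats 'color_agg' and 'facet_by' as optional, which is also an assumption.

```python
import pandas as pd
from pandas.api import types as ptypes

# Aggregation names taken from the 'color_agg' field description.
ALLOWED_AGGS = {"avg", "variance", "std", "sum", "median"}


def _is_number(col):
    # Integer or float column; booleans are excluded on purpose.
    return ptypes.is_numeric_dtype(col) and not ptypes.is_bool_dtype(col)


def check_vis_configs(df, configs):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []

    def column(key):
        # Look up the column named by configs[key], recording any problem.
        name = configs.get(key)
        if name is None:
            problems.append(f"missing config '{key}'")
            return None
        if name not in df.columns:
            problems.append(f"'{key}' refers to unknown column '{name}'")
            return None
        return df[name]

    x = column("x")
    if x is not None and not (_is_number(x) or ptypes.is_datetime64_any_dtype(x)):
        problems.append("'x' must be integer, float or datetime")

    for key in ("y", "color"):
        col = column(key)
        if col is not None and not _is_number(col):
            problems.append(f"'{key}' must be integer or float")

    agg = configs.get("color_agg")
    if agg is not None and agg not in ALLOWED_AGGS:
        problems.append(f"unsupported 'color_agg': {agg!r}")

    cat = column("categorical")
    if cat is not None and not ptypes.is_string_dtype(cat):
        problems.append("'categorical' must be a string column")

    if "facet_by" in configs:
        column("facet_by")
    elif not {"queue", "partition"} & set(df.columns):
        problems.append("no 'facet_by' given and no 'queue' or 'partition' column")

    return problems


# Small made-up frame used to exercise the helper.
sample = pd.DataFrame(
    {
        "start_time": pd.to_datetime(["2024-01-01 08:15", "2024-01-01 11:30"]),
        "queue_wait": [0.25, 2.0],
        "nodes_req": [10, 20],
        "user": ["alice", "bob"],
        "partition": ["short", "long"],
    }
)
good = {"x": "start_time", "y": "queue_wait", "color": "nodes_req",
        "color_agg": "avg", "categorical": "user"}
bad = dict(good, y="start_time", color_agg="max")

good_problems = check_vis_configs(sample, good)
bad_problems = check_vis_configs(sample, bad)
```

Returning a list of messages rather than raising on the first problem lets a notebook user see every issue with a configuration at once.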

retrieve_selected_data()

  • Description: Returns selected data back from the visualization.
  • Returns:
    • subselection (DataFrame or str): A Pandas DataFrame that contains subselected data specified from selections made to the visualization.
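
Because the documented return type is "DataFrame or str", downstream code may want to normalize the result before analyzing it. The sketch below is a hypothetical wrapper, not part of Guidepost. It assumes that a string result means no usable selection was available; the source does not say what the string contains.

```python
import pandas as pd


def selection_or_empty(selection):
    """Hypothetical helper: always hand back a DataFrame."""
    if isinstance(selection, pd.DataFrame):
        return selection
    # Assumption: a non-DataFrame result (such as a str) carries a message
    # rather than data, so report it and fall back to an empty frame.
    print(f"No selection available: {selection!r}")
    return pd.DataFrame()


# Intended use in a notebook, after making a selection in the visualization:
# selected = selection_or_empty(gp.retrieve_selected_data())
# selected.describe()
```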

Contributing

Contributions to Guidepost are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch for your feature or bugfix.
  3. Submit a pull request with a detailed description of your changes.

License

Guidepost is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

Guidepost was developed under the auspices of, and with funding provided by, the National Renewable Energy Laboratory (NREL).


Contact

For questions or feedback, please reach out to the maintainer at cscullyallison@sci.utah.edu.
