Resource Monitor

This package contains utilities to monitor system resource utilization (CPU, memory, disk, network).

Here are the ways you can use it:

  • Monitor resource utilization for a compute node for a given set of resource types and process IDs.
  • Start a process and monitor its resource utilization.
  • Monitor resource utilization for a compute node asynchronously with the ability to dynamically change the resource types and process IDs being monitored.
  • Produce JSON reports of aggregated metrics.
  • Produce interactive HTML plots of the statistics.

Installation

  1. Create a Python virtual environment and activate it. Adjust as necessary if using Windows.
$ python -m venv ~/python-envs/rmon
$ source ~/python-envs/rmon/bin/activate
  2. Install the package.
$ pip install rmon
  3. Optionally, install jq by following the instructions at https://jqlang.github.io/jq/download/.

Usage

CLI tool to monitor resource utilization

This command monitors CPU, memory, and disk utilization every second, then plots the results when the user terminates the application.

$ rmon collect --cpu --memory --disk -i1 --plots -n run1

View the results in a table:

$ sqlite3 -table stats-output/run1.sqlite "select * from cpu"
$ sqlite3 -table stats-output/run1.sqlite "select * from memory"
$ sqlite3 -table stats-output/run1.sqlite "select * from disk"
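
The same tables can be read from Python with the standard-library sqlite3 module. The following is a minimal sketch, not part of rmon: the helper name read_table is hypothetical, and it makes no assumption about column names, only that the database contains tables such as cpu, memory, and disk as shown in the commands above.

```python
import sqlite3
from contextlib import closing


def read_table(db_path, table):
    """Return all rows of `table` as a list of dicts keyed by column name."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row
        # Table names cannot be bound as SQL parameters, so check the
        # requested name against the schema before interpolating it.
        known = {
            row["name"]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if table not in known:
            raise ValueError(f"no table named {table!r} in {db_path}")
        rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
        return [dict(row) for row in rows]


# Example (path taken from the commands above):
#     for row in read_table("stats-output/run1.sqlite", "cpu"):
#         print(row)
```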

This command monitors CPU and memory utilization for specific process IDs, then plots the results when the user terminates the application.

$ rmon collect -i1 --plots -n run1 PID1 PID2 ...

View the results in a table:

$ sqlite3 -table stats-output/run1.sqlite "select * from process"

View min/max/avg metrics:

$ jq -s . stats-output/run1_results.json
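
If jq is not installed, the results file can be loaded in Python instead. The -s flag tells jq to collect every JSON document in its input into one array, so this sketch accepts one or more whitespace-separated documents and returns them as a list. The function name load_json_documents is hypothetical, and the sketch assumes nothing about the keys inside the results file.

```python
import json


def load_json_documents(text):
    """Parse one or more whitespace-separated JSON documents into a list."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


# Example (path taken from the command above):
#     with open("stats-output/run1_results.json") as f:
#         results = load_json_documents(f.read())
```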

Refer to rmon collect --help to see all options.
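
When the collector is launched from a Python script, it can help to assemble the argument list in one place. The sketch below uses only the flags shown in the examples above (--cpu, --memory, --disk, -i, --plots, -n, and positional PIDs). The helper build_collect_command is hypothetical and is not part of rmon; check rmon collect --help before relying on any option.

```python
def build_collect_command(name, interval=1, pids=(), cpu=False,
                          memory=False, disk=False, plots=True):
    """Build an argv list for `rmon collect` from the flags shown above."""
    cmd = ["rmon", "collect"]
    if cpu:
        cmd.append("--cpu")
    if memory:
        cmd.append("--memory")
    if disk:
        cmd.append("--disk")
    # Same short-option form as the examples, e.g. -i1 for a 1-second interval.
    cmd.append(f"-i{interval}")
    if plots:
        cmd.append("--plots")
    cmd += ["-n", name]
    # Process IDs are passed as trailing positional arguments.
    cmd += [str(pid) for pid in pids]
    return cmd


# Example: pass the list to subprocess.run or subprocess.Popen.
#     import subprocess
#     subprocess.run(build_collect_command("run1", cpu=True, memory=True))
```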

CLI tool to start a process and monitor its resource utilization

$ rmon monitor-process -i1 --plots python my_script.py ARGS [OPTIONS]

Use the same steps above to view results.

CLI tool to monitor resource utilization with dynamic changes

This command monitors CPU, memory, and disk utilization every second. It presents prompts that let you change what is being monitored, and it plots the results when you select the exit command.

$ rmon collect -i1 --plots -n run1 --interactive PID1 PID2 ...

You can use this asynchronous functionality in your own application if you are controlling the processes being monitored. Refer to resource_monitor/cli/collect.py for example code. Search for run_monitor_async.

Collect stats for all compute nodes in an HPC job

The directory contains some example scripts that can be deployed in a Slurm job to collect stats for your compute nodes.

  1. Copy collect_stats.sh and wait_for_stats.sh to your HPC runtime directory.

  2. Modify collect_stats.sh such that it loads the environment containing rmon.

  3. Modify collect_stats.sh with your desired options for rmon collect.

  4. Modify your sbatch script with relevant lines from batch_job.sh.

When Slurm runs your job, these steps occur in order:

  • Ensure that the file shutdown does not exist.
  • Start rmon collect as a background operation.
  • Run your job.
  • Create the file shutdown. That will trigger collect_stats.sh to stop.
  • Gracefully shut down rmon and generate plots.
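
The shutdown file in the steps above acts as a sentinel: one side creates it, and the other side polls for it. The example scripts are not reproduced here, so the following is only a sketch of that polling pattern, written in Python rather than shell. The function name wait_for_shutdown_file and its parameters are hypothetical.

```python
import time
from pathlib import Path


def wait_for_shutdown_file(path, poll_seconds=1.0, timeout=None):
    """Block until `path` exists.

    Returns True once the file appears, or False if `timeout` seconds
    pass first. With timeout=None the call waits indefinitely.
    """
    sentinel = Path(path)
    deadline = None if timeout is None else time.monotonic() + timeout
    while not sentinel.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


# Example: the job creates the file when it finishes, and the waiting
# side then stops the collector.
#     if wait_for_shutdown_file("shutdown"):
#         ...  # stop the collector here
```

Polling a file is slower to react than a signal, but it works across nodes that share a filesystem, which is a common reason to use it in batch jobs.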

Code timings

Refer to this page for instructions on how to collect timing statistics of targeted functions.

License

rmon is released under a BSD 3-Clause license.

Software Record

This package is developed under NREL Software Record SWR-24-128.
