Utilities for HPC clusters including PBS/Slurm job statistics and a psutil-based (CPU/GPU/Memory) resource monitor.
hpc-stats-scripts
Utilities for working with high-performance computing (HPC) environments. The scripts help inspect PBS/Slurm job efficiency and monitor CPU/GPU and memory usage on a running system or process tree.
Made with Codex :)
Dependencies
Install the required Python packages with pip:
| Feature | Packages | Install command |
|---|---|---|
| Core utilities | psutil | pip install psutil |
| Plotting for psutil-monitor | matplotlib, numpy | pip install matplotlib numpy |
| GPU monitoring for psutil-monitor --gpu | nvidia-ml-py3 (pynvml) | pip install nvidia-ml-py3 |
| Plot + GPU combo | psutil, matplotlib, numpy, nvidia-ml-py3 | pip install psutil matplotlib numpy nvidia-ml-py3 |
| All extras via pip extras | plot + GPU | pip install .[all] |
The pbs-bulk-user-stats command also expects the PBS qstat utility to be
available in your environment.
The slurm-bulk-user-stats command expects Slurm's sacct utility to be
available in your environment.
Installation
Clone the repository and install with pip:
Install from PyPI (once published):
```bash
pip install hpc-stats-scripts
pip install "hpc-stats-scripts[plot]"
pip install "hpc-stats-scripts[gpu]"
pip install "hpc-stats-scripts[all]"
```
Or install directly from GitHub (the latest version):
```bash
pip install "hpc-stats-scripts @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[plot] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[gpu] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
pip install "hpc-stats-scripts[all] @ git+https://github.com/avnikonenko/hpc-stats-scripts.git"
```
The base installation depends on psutil.
The plot extra pulls in matplotlib and numpy for the --plot feature of psutil-monitor.
The gpu extra installs nvidia-ml-py3 to enable --gpu.
CLI tools
pbs-bulk-user-stats
Summarize CPU and memory usage for PBS jobs and show which nodes the jobs are
allocated to. The command relies on qstat being available in your PATH.
The table now includes NGPUS (requested GPUs) when present.
Examples:
# Summarize a specific job and write CSV output
pbs-bulk-user-stats --job 12345 --csv stats.csv
# Summarize all jobs for the current user (default)
pbs-bulk-user-stats --include-finished
# Summarize all jobs for a specific user
pbs-bulk-user-stats --user myuser --include-finished
# Include finished jobs but only keep 20 finished entries
pbs-bulk-user-stats --user myuser --include-finished --finished-limit 20
# Faster best-effort mode: fetch active jobs + only N finished job details (use if default option takes too long)
pbs-bulk-user-stats --user myuser --include-finished --finished-limit 20 --finished-limit-strategy fetch
When invoked with no --user or --job options:
- On a login node (no $PBS_JOBID present), it summarizes all jobs for the current user.
- Inside a running PBS job (where $PBS_JOBID is set), it automatically summarizes that specific job.
pbs-bulk-user-stats
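The default-target rule described above can be sketched in Python. This is only an illustration of the documented behavior, not the tool's actual code: the function name resolve_target, its return shape, and the choice to let --job take precedence over --user are assumptions; only the $PBS_JOBID check and the two options come from the description.

```python
import getpass
import os


def resolve_target(user=None, job=None, environ=None):
    """Pick what to summarize: returns ("job", id) or ("user", name).

    Hypothetical helper illustrating the documented defaults.
    """
    environ = os.environ if environ is None else environ
    if job:
        # Explicit --job (assumed here to take precedence over --user).
        return ("job", job)
    if user:
        # Explicit --user.
        return ("user", user)
    job_id = environ.get("PBS_JOBID")
    if job_id:
        # Inside a running PBS job: summarize that job.
        return ("job", job_id)
    # Login node: fall back to the current user.
    return ("user", environ.get("USER") or getpass.getuser())
```

Passing a plain dict as environ makes the rule easy to exercise without touching the real environment, for example resolve_target(environ={"PBS_JOBID": "12345.pbs"}).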
State codes (PBS):
R running, Q queued/waiting, X finished (requires --include-finished); other codes are printed under “other” in the summary.
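As a rough illustration of how the state codes could be tallied into the "states:" line of the summary, here is a small Python sketch. The names KNOWN_STATES and summarize_states are hypothetical, and the tool's real implementation may differ.

```python
from collections import Counter

# PBS state codes reported individually; everything else goes under "other".
KNOWN_STATES = ("R", "Q", "X")


def summarize_states(states):
    """Return a string such as 'R=4 Q=1 X=0 other=0' for a list of state codes."""
    counts = Counter(s if s in KNOWN_STATES else "other" for s in states)
    # Counter returns 0 for keys that never occurred, so every bucket is printed.
    return " ".join(f"{key}={counts[key]}" for key in (*KNOWN_STATES, "other"))


print(summarize_states(["R", "R", "R", "R", "Q"]))  # R=4 Q=1 X=0 other=0
```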
Expected output (CPU/RAM only):
$ pbs-bulk-user-stats
JOBID STATE NAME NODES NCPUS WALL(h) CPUT(h) avgCPU CPUeff memUsed memReq memEff
-------------------------------------------------------------------------------------------------------
0001 R run1 pbs-1 176 38.55 3632.12 163.6 93.53% 207.4 GiB 256.00 GiB 81.10%
0002 R run2 pbs-2 176 38.59 3589.72 93.13 52.91% 50.02 GiB 256.00 GiB 19.54%
...
Summary:
jobs: 5
unique nodes: 3
states: R=4 Q=1 X=0 other=0
mean CPUeff: 75.20%
mean avgCPU: 132.35
mean memEff: 82.50%
max memUsed: 230.16 GiB
Or, if run inside a running PBS job:
JOBID STATE NAME NODES NCPUS WALL(h) CPUT(h) avgCPU CPUeff memUsed memReq memEff
-----------------------------------------------------------------------------------------------------
0001 R STDIN pbs-5 100 0.03 0.01 0.22 0.22% 666.58 MiB 30.00 GiB 2.17%
Summary:
jobs: 1
mean CPUeff: 0.22%
mean avgCPU: 0.22
mean memEff: 2.17%
max memUsed: 666.58 MiB
After the table, a summary reports the job count, mean CPU efficiency, mean average CPU usage, mean memory efficiency, and the peak memory used across all listed jobs.
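The efficiency columns are not defined formally here, but the sample rows are consistent with CPUeff = avgCPU / NCPUS and memEff = memUsed / memReq (for job 0002: 93.13 / 176 ≈ 52.91% and 50.02 / 256 ≈ 19.54%). The sketch below assumes those two relations and, additionally, that avgCPU is CPU time divided by wall time, which only roughly matches the sample rows. The function name job_efficiency is hypothetical.

```python
def job_efficiency(cput_hours, wall_hours, ncpus, mem_used_gib, mem_req_gib):
    """Return (avg_cpu, cpu_eff_pct, mem_eff_pct) for one job.

    Assumed formulas (not confirmed by the tool's source):
      avg_cpu = CPU time / wall time
      cpu_eff = avg_cpu / requested CPUs
      mem_eff = memory used / memory requested
    """
    avg_cpu = cput_hours / wall_hours if wall_hours > 0 else 0.0
    cpu_eff = 100.0 * avg_cpu / ncpus if ncpus > 0 else 0.0
    mem_eff = 100.0 * mem_used_gib / mem_req_gib if mem_req_gib > 0 else 0.0
    return avg_cpu, cpu_eff, mem_eff


# 50 CPU-hours over 10 wall-hours on 8 CPUs, using 2 of 8 GiB requested.
print(job_efficiency(50.0, 10.0, 8, 2.0, 8.0))  # (5.0, 62.5, 25.0)
```

A low CPUeff with a high NCPUS request, as in the second sample table, suggests the job could have asked for fewer cores.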
psutil-monitor
Real-time CPU and memory monitor for the system or a process tree.
Use --gpu to also report aggregate GPU utilization and memory via NVML (requires nvidia-ml-py3).
When --csv or --plot is used, metrics stream live to the terminal during the run; the CSV/PNG files are written when the monitor exits (Ctrl+C, duration reached, or the process tree ends).
Example output files (generated with --plot and --csv):
- Plot (CPU + GPU stacked): [figure]
- CSV: docs/psutil-monitor-example.csv
GPU output fields (when --gpu is used):
- GPU util: Average utilization across visible GPUs.
- busyGPUs: Sum of utilization fractions (e.g., two GPUs at 50% each → 1.0).
- GPU mem %: Aggregate GPU memory usage percentage.
- Per-GPU (CSV gpu_pergpu): index:util%/used/total for each device.
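The aggregate GPU fields listed above can be illustrated with a short Python sketch. It works on plain per-device tuples rather than calling NVML, so it runs without a GPU; in practice the inputs would presumably come from pynvml calls such as nvmlDeviceGetUtilizationRates and nvmlDeviceGetMemoryInfo. The function name gpu_aggregates is hypothetical, and computing "GPU mem %" as total used over total capacity is an assumption.

```python
def gpu_aggregates(per_gpu):
    """Aggregate per-device readings.

    per_gpu: list of (util_percent, mem_used_bytes, mem_total_bytes).
    Returns (gpu_util_pct, busy_gpus, gpu_mem_pct).
    """
    if not per_gpu:
        return 0.0, 0.0, 0.0
    utils = [util for util, _, _ in per_gpu]
    gpu_util = sum(utils) / len(utils)              # average utilization
    busy_gpus = sum(util / 100.0 for util in utils)  # sum of fractions
    used = sum(mem_used for _, mem_used, _ in per_gpu)
    total = sum(mem_total for _, _, mem_total in per_gpu)
    # Assumption: aggregate memory % is total used over total capacity.
    mem_pct = 100.0 * used / total if total else 0.0
    return gpu_util, busy_gpus, mem_pct


# Two GPUs at 50% each count as one busy GPU.
print(gpu_aggregates([(50.0, 4, 16), (50.0, 12, 16)]))  # (50.0, 1.0, 50.0)
```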
Examples:
# System-wide (by default) monitoring with console output only
psutil-monitor
# System-wide monitoring with CSV and PNG output
psutil-monitor --mode system --csv node.csv --plot node.png
# Monitor the current process tree (useful inside a PBS job)
psutil-monitor --mode proc --pid $$ --include-children --csv job.csv
# Monitor the resources used by script.py:
python script.py & # launch the workload
target=$! # PID of script.py
echo $target
# psutil-monitor watches that PID and exits when the process tree is gone
psutil-monitor --mode proc --pid "$target" --include-children --csv stat.csv --plot plot.png
Expected output:
$ psutil-monitor
CPUs available (affinity): 384
Total memory available: 754.76 GiB
CPU basis for %: 384
Memory basis for %: 754.76 GiB
2025-08-14T15:20:14 CPU 79.67% busyCPUs 305.93 (provided 384) MEM 9.93% used 74.96 GiB / total 754.76 GiB
2025-08-14T15:20:16 CPU 69.30% busyCPUs 266.13 (provided 384) MEM 9.95% used 75.12 GiB / total 754.76 GiB
2025-08-14T15:20:18 CPU 61.34% busyCPUs 235.53 (provided 384) MEM 10.05% used 75.82 GiB / total 754.76 GiB
2025-08-14T15:20:20 CPU 61.32% busyCPUs 235.47 (provided 384) MEM 10.09% used 76.15 GiB / total 754.76 GiB
2025-08-14T15:20:22 CPU 74.57% busyCPUs 286.33 (provided 384) MEM 9.94% used 74.99 GiB / total 754.76 GiB
2025-08-14T15:20:24 CPU 85.94% busyCPUs 330.01 (provided 384) MEM 9.86% used 74.44 GiB / total 754.76 GiB
Average busy CPUs over run: 276.570
Peak memory (system): 76.15 GiB
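In the sample output above, busyCPUs matches the CPU percentage applied to the number of provided CPUs (79.67% of 384 ≈ 305.93), and the final average is consistent with the mean of the per-sample busyCPUs values. The Python sketch below assumes exactly that relationship; the function names are hypothetical and this is not the monitor's actual code.

```python
def busy_cpus(cpu_percent, cpus_provided):
    """Convert a CPU percentage into an equivalent number of fully busy CPUs."""
    return cpu_percent / 100.0 * cpus_provided


def average_busy_cpus(samples, cpus_provided):
    """Mean busy-CPU count over a run, given the per-sample CPU percentages."""
    if not samples:
        return 0.0
    return sum(busy_cpus(p, cpus_provided) for p in samples) / len(samples)


print(round(busy_cpus(79.67, 384), 2))       # 305.93
print(average_busy_cpus([50.0, 100.0], 4))   # 3.0
```

Comparing the average busy-CPU count with the number of CPUs requested is a quick way to judge whether a job is over-provisioned.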
With GPUs (--gpu and NVIDIA GPUs present):
$ psutil-monitor --gpu
CPUs available (affinity): 96
Total memory available: 503.70 GiB
CPU basis for %: 96
Memory basis for %: 503.70 GiB
GPUs detected (NVML): 4
2026-02-03T10:00:14 CPU 45.12% busyCPUs 43.32 (provided 96) MEM 8.10% used 40.80 GiB / total 503.70 GiB GPU util 57.5% busyGPUs 2.30 mem 42.0%
2026-02-03T10:00:16 CPU 48.33% busyCPUs 46.39 (provided 96) MEM 8.20% used 41.30 GiB / total 503.70 GiB GPU util 63.0% busyGPUs 2.52 mem 44.1%
2026-02-03T10:00:18 CPU 52.10% busyCPUs 49.99 (provided 96) MEM 8.25% used 41.60 GiB / total 503.70 GiB GPU util 68.7% busyGPUs 2.75 mem 45.3%
Average busy CPUs over run: 46.567
Average busy GPUs over run: 2.523
Peak memory (system): 41.60 GiB
Use the --help option of each command to see all available options.
slurm-bulk-user-stats
Summarize CPU and memory usage for Slurm jobs and show which nodes the jobs are
allocated to. The command relies on sacct being available in your PATH.
The table includes NGPUS based on AllocTres/AllocGRES when present.
If TRES GPU usage metrics are available, the summary also reports mean GPU util and GPU hours (used/requested).
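To illustrate how a GPU count might be derived from an AllocTRES value, here is a minimal Python sketch. It assumes the comma-separated key=value layout commonly seen in sacct output (for example cpu=8,gres/gpu=2,mem=32G,node=1); the function name gpus_from_tres is hypothetical, the tool may parse this field differently, and the older AllocGRES format is not handled.

```python
def gpus_from_tres(tres):
    """Count GPUs in a TRES string such as 'cpu=8,gres/gpu=2,mem=32G,node=1'.

    Prefers the generic 'gres/gpu' entry; if it is absent, sums the typed
    entries (e.g. 'gres/gpu:a100=2') to avoid double counting.
    """
    generic = None
    typed = 0
    for item in (tres or "").split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not value.isdigit():
            continue  # skip empty items and non-integer values such as '32G'
        if key == "gres/gpu":
            generic = int(value)
        elif key.startswith("gres/gpu:"):
            typed += int(value)
    return generic if generic is not None else typed


print(gpus_from_tres("billing=8,cpu=8,gres/gpu=2,mem=32G,node=1"))  # 2
```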
State codes (Slurm):
R/RUNNING, PD/PENDING, CD/COMPLETED; other states (e.g., F, CG, S, TO) are grouped under “other” in the summary and listed in the breakdown.
Examples:
# Summarize a specific job and write CSV output
slurm-bulk-user-stats --job 12345 --csv stats.csv
# Summarize all running jobs for the current user (default)
slurm-bulk-user-stats
# Summarize all jobs (including finished) for a specific user
slurm-bulk-user-stats --user myuser --include-finished
# Include finished jobs but only keep 20 finished entries
slurm-bulk-user-stats --user myuser --include-finished --finished-limit 20
# Faster best-effort mode: fetch active jobs + only N finished job details
slurm-bulk-user-stats --user myuser --include-finished --finished-limit 20 --finished-limit-strategy fetch
When invoked with no --user or --job options:
- On a login node (no $SLURM_JOB_ID present), it summarizes pending/running jobs for the current user.
- Inside a running Slurm job (where $SLURM_JOB_ID is set), it automatically summarizes that specific job.
slurm-bulk-user-stats
The output mirrors the PBS version, showing job state, node list, CPU/memory usage, efficiency metrics, and a summary block with job counts and averages.