Identify and reduce instances of underutilization by the users of high-performance computing systems

Job Defense Shield

Job Defense Shield is a software tool for identifying and reducing instances of underutilization by the users of high-performance computing systems. The software can (1) send automated email alerts to users, (2) create reports for system administrators, and (3) automatically cancel GPU jobs at 0% utilization. Job Defense Shield is a component of the Jobstats job monitoring platform.

Below is an example report for 0% GPU utilization:

                         GPU-Hours at 0% Utilization
---------------------------------------------------------------------
    User   GPU-Hours-At-0%  Jobs             JobID             Emails
---------------------------------------------------------------------
1  u12998        308         39   62285369,62303767,62317153+   1 (7)
2  u9l487         84         14   62301737,62301738,62301742+   0         
3  u39635         25          2            62184669,62187323    2 (4)         
4  u24074         24         13   62303182,62303183,62303184+   0         
---------------------------------------------------------------------
   Cluster: della
Partitions: gpu, llm
     Start: Wed Feb 12, 2025 at 09:50 AM
       End: Wed Feb 19, 2025 at 09:50 AM
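A report like the one above can be thought of as a per-user aggregation of idle GPU time. The following is a minimal sketch under stated assumptions, not Job Defense Shield's actual implementation: the record layout (`user`, `jobid`, `hours`, `gpu_util`), the sample values, and the function name `zero_util_gpu_hours` are all hypothetical, and it assumes that each GPU with 0% mean utilization contributes its job's elapsed hours to the total.

```python
from collections import defaultdict

# Hypothetical job records. The field names and values are assumptions made
# for illustration; they do not reflect the Jobstats or Slurm data formats.
jobs = [
    {"user": "u00001", "jobid": "1001", "hours": 10.0, "gpu_util": [0.0, 0.0]},
    {"user": "u00001", "jobid": "1002", "hours": 5.0, "gpu_util": [87.5]},
    {"user": "u00002", "jobid": "1003", "hours": 3.0, "gpu_util": [0.0, 0.0, 91.0, 0.0]},
]


def zero_util_gpu_hours(jobs):
    """Sum, per user, the hours of every GPU that averaged 0% utilization."""
    totals = defaultdict(lambda: {"gpu_hours": 0.0, "jobids": []})
    for job in jobs:
        # Count the GPUs in this job whose mean utilization was exactly 0%.
        idle_gpus = sum(1 for util in job["gpu_util"] if util == 0.0)
        if idle_gpus == 0:
            continue
        entry = totals[job["user"]]
        entry["gpu_hours"] += idle_gpus * job["hours"]
        entry["jobids"].append(job["jobid"])
    # Rank users from most to fewest idle GPU-hours, as in the report layout.
    return sorted(totals.items(), key=lambda item: item[1]["gpu_hours"], reverse=True)


report = zero_util_gpu_hours(jobs)
for rank, (user, entry) in enumerate(report, start=1):
    print(rank, user, round(entry["gpu_hours"]), len(entry["jobids"]), ",".join(entry["jobids"]))
```

Whether the real tool counts idle time per GPU or per job is not stated in this description, so treat the per-GPU choice here as one plausible reading.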

Below is an example email to a user who is requesting too much CPU memory:

Hi Alan (u12345),

Below are your jobs that ran on the Stellar cluster in the past 7 days:

     JobID   Memory-Used  Memory-Allocated  Percent-Used  Cores  Hours
    5761066      2 GB          100 GB            2%         1     48
    5761091      4 GB          100 GB            4%         1     48
    5761092      3 GB          100 GB            3%         1     48

It appears that you are requesting too much CPU memory for your jobs since
you are only using on average 3% of the allocated memory. For help on
allocating CPU memory with Slurm, please see:

    https://your-institution.edu/knowledge-base/memory

Replying to this automated email will open a support ticket with Research
Computing.
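The 3% figure quoted in the email is consistent with a simple mean of the per-job Percent-Used values. The sketch below reproduces that arithmetic from the three rows shown; the helper name `percent_used` is hypothetical, and whether the tool averages per-job percentages or weights them some other way is an assumption.

```python
# Values copied from the example email: (JobID, memory used in GB, memory allocated in GB).
jobs = [
    ("5761066", 2, 100),
    ("5761091", 4, 100),
    ("5761092", 3, 100),
]


def percent_used(used_gb, allocated_gb):
    """Return memory usage as a percentage of the allocation."""
    return 100.0 * used_gb / allocated_gb


# Per-job percentages, then their unweighted mean across the listed jobs.
per_job = [percent_used(used, allocated) for _, used, allocated in jobs]
mean_percent = sum(per_job) / len(per_job)
print(per_job, mean_percent)
```

For these rows the unweighted mean and the ratio of total memory used to total memory allocated (9 GB of 300 GB) both come to 3%, so the example does not distinguish between the two definitions.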

Getting Started

See the documentation for installing and running the software.
