Skip to main content

Sonar: Tool to profile usage of HPC resources by regularly probing processes

Project description

https://github.com/NordicHPC/sonar/workflows/Test/badge.svg https://img.shields.io/badge/license-%20GPL--v3.0-blue.svg https://badge.fury.io/py/sonar.svg

sonar

Tool to profile usage of HPC resources by regularly probing processes using ps.

Overview

The code can do two things: take snapshots (sonar snap, typically every 20 minutes or so), and map them (sonar map, whenever you like) to applications/projects/users:

$ sonar --help

usage: sonar [-h]  ...

Tool to profile usage of HPC resources by regularly probing processes using ps.

optional arguments:
  -h, --help  show this help message and exit

Subcommands:

    snap      Take a snapshot of the system. Run this on every node and often
              (e.g. every 20 minutes).
    map       Parse the system snapshots and map applications. Run this only
              once centrally and typically once a day.

Run sonar <subcommand> -h to get more information about subcommands.

Authors

Design goals and design decisions

  • Pip installable

  • Minimal overhead for recording

  • Super quick reporting and dashboard, both stdout and csv for web postprocessing

  • Can be used as health check tool

Why ps instead of top? We started using top but it turned out that top is dependent on locale, so it displays floats with comma instead of decimal point in many non-English locales. ps always uses decimal points. In addition, ps is (arguably) more versatile/configurable and does not print the header that top prints. All these properties make the ps output easier to parse than the top output.

Installation

Ideally install into a virtual environment:

$ pip install sonar

If you develop sonar, you can install like this:

$ git clone https://github.com/nordichpc/sonar.git
$ cd sonar
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ flit install --symlink

How to analyze sonar logs

For this run sonar map which will go through the logs, and map processes to applications:

$ sonar map --input-dir /home/user/folder/with/logs

By default you will see data for the past 7 days. But you can change this:

$ sonar map --input-dir /home/user/folder/with/logs --num-days 300

Sonar uses the following mapping files: https://github.com/nordichpc/sonar/tree/master/sonar/mapping

The mapping files (string_map.txt and regex_map.txt) contain a space-separated (does not matter how many spaces) mapping from process to application.

You can use your own mapping files instead:

$ sonar map --input-dir /home/user/folder/with/logs \
            --str-map-file /home/user/my-own-mapping/string_map.txt \
            --re-map-file /home/user/my-own-mapping/regex_map.txt

You are welcome to use your own but encouraged to contribute mappings to https://github.com/nordichpc/sonar/tree/master/sonar/mapping.

You can also export daily, weekly, and monthly CPU load percentages in CSV format for further postprocessing, e.g. using https://github.com/NordicHPC/sonar-web:

$ sonar map --input-dir /home/user/folder/with/logs --export-csv daily
$ sonar map --input-dir /home/user/folder/with/logs --export-csv weekly --num-days 200

Taking snapshots with sonar snap

This is me running sonar snap on a compute node:

$ sonar snap --output-delimiter ","

2019-05-10T17:11:34.585859+0200,c10-4,16,me,sonar,31.0,0,-,-,-,-
2019-05-10T17:11:34.585859+0200,c10-4,16,somebody,vasp.5.3.5,1506.4,5151,someproject,1598301,64,2000M

The columns are: - time stamp - hostname - number of cores on this node - user - process - CPU percentage (this is a 20-core node) - memory used in MB - Slurm project - Slurm job ID - Number of CPUs requested by the job - Minimum size of memory requested by the job

By default they are tab-separated but here I chose to display the result comma-separated. You can also change cutoffs or ignore users to not measure the tool itself (sonar snap --help).

It can be useful to redirect the result to a file:

$ sonar snap >> /home/user/tmp/example.tsv

This is how it looks when I run sonar snap on my laptop (without Slurm):

$ sonar snap --output-delimiter ","

2019-05-11T14:54:16.940502+0200,laptop,4,root,Xorg,0.7,47,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,gnome-shell,0.7,188,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,pulseaudio,0.6,7,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,chromium,16.9,3283,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,fish,0.5,23,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,vim,0.6,7,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,sonar,23.0,23,-,-,-,-
2019-05-11T14:54:16.940502+0200,laptop,4,me,gnome-terminal-,0.9,47,-,-,-,-

Running sonar snap on a cluster

We let cron execute a script every 20 minutes:

10,30,50 * * * * /global/work/sonar/sonar/cron-sonar.sh

The script cron-sonar.sh creates a list of active nodes and executes run-snap.sh on all of these nodes:

#!/bin/bash

SONAR_ROOT="/global/work/sonar"

# get list of all available nodes
/usr/bin/sinfo -h -r -o '%n' > ${SONAR_ROOT}/tmp/list-of-nodes 2> ${SONAR_ROOT}/tmp/list-of-nodes.err

# run sonar snap on all available nodes
/usr/bin/pdsh -w \^${SONAR_ROOT}/tmp/list-of-nodes ${SONAR_ROOT}/sonar/run-snap.sh >> ${SONAR_ROOT}/tmp/pdsh.log 2>> ${SONAR_ROOT}/tmp/pdsh.err

In run-snap.sh we load the Python environment and wrap around sonar snap:

#!/usr/bin/env bash

source /global/work/sonar/python/environment
pyenv shell 3.6.7

source /global/work/sonar/sonar/venv/bin/activate
current_year=$(date +'%Y')
mkdir -p /global/work/sonar/snap-outputs/${current_year}
sonar snap --ignored-users root >> /global/work/sonar/snap-outputs/${current_year}/${HOSTNAME}.tsv

This produces ca. 10 MB data per day.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sonar-0.4.4.tar.gz (23.9 kB view details)

Uploaded Source

Built Distribution

sonar-0.4.4-py2.py3-none-any.whl (25.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file sonar-0.4.4.tar.gz.

File metadata

  • Download URL: sonar-0.4.4.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.24.0

File hashes

Hashes for sonar-0.4.4.tar.gz
Algorithm Hash digest
SHA256 7d26cf97b7d53e598ae1a440c00191fd046663a5d52c8a596b47bf8fe158ceb2
MD5 21742129e19a58cea85ec0a7e01e368e
BLAKE2b-256 32db4202f8b179f85c46a4bdec170909d225e72f0f3c6f68dff86c2c81677a8d

See more details on using hashes here.

File details

Details for the file sonar-0.4.4-py2.py3-none-any.whl.

File metadata

  • Download URL: sonar-0.4.4-py2.py3-none-any.whl
  • Upload date:
  • Size: 25.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.24.0

File hashes

Hashes for sonar-0.4.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 39581b260658e5a9890020eecc9d8f4e7f4334c2ca3ad3cd35bdd76582f4cf9f
MD5 4c964dfd3bcecb885cc63facf9b30c57
BLAKE2b-256 9ce34f94f6190bbb3d5e0ab82066df2a4f44eb8e136bb195bb90ab8feb8b22f9

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page