A simple resource monitoring tool that can be used to test the availability of HTCondor resources.

Project description

Resource Availability Monitoring (ram) tool

This lightweight tool monitors the availability of resources and services via HTCondor. User-defined jobs are submitted at regular intervals, and their results are collected and written to an InfluxDB database. The tool is designed to run as a systemd service and can be configured extensively.

Installation

The tool is installed via pip:

pip install resource-availability-monitoring

Afterwards, the service can be started via

ram-cli

Configuration

By default, ram comes without a configuration. A default configuration can be generated via

ram-cli --initialize --configdir /path/to/your/configdir

Job Configuration

The configuration has to be adjusted to the user's needs. The main configuration contains a list of defined testjobs. Each testjob configuration contains the following fields:

jobs:
  - name: "default" # Name of the testjob
    parameters:
      enabled: true # The testjob will only be executed if enabled is set to true
      description: "Default Test" # Description of the testjob
      site: "Default" # The name of the site to be monitored
      interval: 1200 # after each interval, a new job is submitted (in seconds)
      timeout: 1200 # maximum time the testjob has to finish (in seconds)
      job:
        executable: "default.sh" # The executable to be run, has to be located in <configdir>/<name of job>/
        AccountingGroup: "group" # The accounting group to be used by HTCondor
        arguments: "" # Arguments to be passed to the executable
        output_file: "job_result.yaml" # The file to which the result of the job is written
        output: "default.out" # The file to which the stdout of the job is written
        error: "default.err" # The file to which the stderr of the job is written
        log: "default.log" # The file to which the HTCondor log of the job is written
        universe: "vanilla" # The universe to be used by HTCondor
        docker_image: "" # The docker image to be used by HTCondor (only if universe is set to docker)
      requirements: # Requirements to be passed to HTCondor
        cpu: 1 # The number of CPUs to be used by the job
        memory: 1000 # The amount of memory to be used by the job (in MB)
        disk: 100000 # The amount of disk space to be used by the job (in KB)
        gpu: 0 # The number of GPUs to be used by the job
        requirements: '' # Additional requirements to be passed to HTCondor, e.g. "OpSysMajorVer == 7"
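
To make the meaning of the individual fields more concrete, the following sketch shows how one testjob's parameters could be translated into standard HTCondor submit commands. This is an illustration only: the helper names to_submit_description and render are hypothetical, the mapping is an assumption about how such a translation might look rather than the tool's actual implementation, and the requirements block is assumed to be a mapping with the keys cpu, memory, disk, gpu, and requirements.

```python
from typing import Any, Dict


def to_submit_description(params: Dict[str, Any]) -> Dict[str, str]:
    """Translate one testjob's `parameters` mapping into HTCondor submit commands.

    Hypothetical helper: the keys read here come from the configuration above,
    the keys written are standard HTCondor submit commands.
    """
    job = params["job"]
    req = params.get("requirements") or {}
    submit = {
        "universe": job["universe"],
        "executable": job["executable"],
        "arguments": job.get("arguments", ""),
        "output": job["output"],
        "error": job["error"],
        "log": job["log"],
        "accounting_group": job["AccountingGroup"],
        "request_cpus": str(req["cpu"]),
        "request_memory": str(req["memory"]),  # interpreted by HTCondor as MB
        "request_disk": str(req["disk"]),      # interpreted by HTCondor as KB
    }
    # Only request GPUs when the job actually needs them.
    if req.get("gpu", 0) > 0:
        submit["request_gpus"] = str(req["gpu"])
    # Additional requirements expression, e.g. 'OpSysMajorVer == 7'.
    if req.get("requirements"):
        submit["requirements"] = req["requirements"]
    # The docker image is only meaningful in the docker universe.
    if job["universe"] == "docker":
        submit["docker_image"] = job["docker_image"]
    return submit


def render(submit: Dict[str, str]) -> str:
    """Render the submit commands as the text of a submit description file."""
    lines = [f"{key} = {value}" for key, value in submit.items()]
    return "\n".join(lines + ["queue"])


# The default testjob from the configuration above, as a Python mapping.
example = {
    "job": {
        "executable": "default.sh",
        "AccountingGroup": "group",
        "arguments": "",
        "output": "default.out",
        "error": "default.err",
        "log": "default.log",
        "universe": "vanilla",
        "docker_image": "",
    },
    "requirements": {"cpu": 1, "memory": 1000, "disk": 100000, "gpu": 0, "requirements": ""},
}

print(render(to_submit_description(example)))
```

With the default values, no GPU request, no additional requirements expression, and no docker image end up in the rendered description, because the corresponding fields are empty or zero.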

In addition, the job executable has to be located in <configdir>/<name of job>/. The executable has to be a shell script, and it has to write its results to a YAML file (the configured output_file) with the following structure:

tests:
  - test: "default_test"
    passed: True
    message: "default_test passed"

A testjob can contain multiple tests, and each test has to contain the fields test, passed, and message. A testjob is considered to have passed if all tests have passed and the job has finished successfully. Within the shell script, the tests can be implemented as needed, as long as the results are written to the YAML file.
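
The pass/fail rule described above can be sketched in Python. This is a minimal illustration and not the tool's actual implementation: the function name evaluate_result and the job_finished_ok flag are hypothetical, and the result file is assumed to have already been parsed into a dictionary (for example with a YAML loader).

```python
from typing import Any, Dict, List, Tuple

# Fields every test entry has to contain, as described above.
REQUIRED_FIELDS = ("test", "passed", "message")


def evaluate_result(result: Dict[str, Any], job_finished_ok: bool = True) -> Tuple[bool, List[str]]:
    """Return (passed, problems) for a parsed job result.

    The testjob counts as passed only if every test passed and the job
    itself finished successfully.
    """
    problems: List[str] = []
    tests = result.get("tests")
    if not isinstance(tests, list) or not tests:
        return False, ["result contains no tests"]
    for index, entry in enumerate(tests):
        if not isinstance(entry, dict):
            problems.append(f"test #{index} is not a mapping")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"test #{index} is missing fields: {', '.join(missing)}")
        elif entry["passed"] is not True:
            problems.append(f"{entry['test']}: {entry['message']}")
    if not job_finished_ok:
        problems.append("job did not finish successfully")
    return (not problems), problems


# The example result from above, as a Python mapping.
result = {"tests": [{"test": "default_test", "passed": True, "message": "default_test passed"}]}
print(evaluate_result(result))  # (True, [])
```

Returning the list of problems alongside the boolean keeps the message of each failed test available, for instance for logging.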

InfluxDB Configuration

The InfluxDB parameters are stored in a separate configuration file, which contains the following fields:

url: ""
token: ""
bucket: ""
org: ""

Set all parameters to the correct values to enable writing the results to the InfluxDB database. If you do not want to use an InfluxDB database, run the service with the --no-influxdb flag, which disables writing results to InfluxDB.
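
Since all four parameters have to be set, a quick sanity check of the parsed parameter file can catch empty values early. The following is a minimal sketch with a hypothetical helper name (missing_influx_parameters); it assumes the file has already been loaded into a dictionary and does not replace the tool's own --check option.

```python
from typing import Dict, List, Optional

# The four fields of the InfluxDB parameter file shown above.
REQUIRED_INFLUX_KEYS = ("url", "token", "bucket", "org")


def missing_influx_parameters(params: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of required parameters that are absent or empty."""
    return [
        key
        for key in REQUIRED_INFLUX_KEYS
        if not str(params.get(key) or "").strip()
    ]


# The freshly generated file contains only empty strings.
print(missing_influx_parameters({"url": "", "token": "", "bucket": "", "org": ""}))
```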

Usage

All command line options can be displayed via

ram-cli -h
usage: __main__.py [-h] --workdir WORKDIR --configdir CONFIGDIR
                   [--config-file CONFIG_FILE]
                   [--influxdb-config-file INFLUXDB_CONFIG_FILE]
                   [--job-db-file JOB_DB_FILE] [--log-file LOG_FILE]
                   [--initialize] [--check] [--no-influxdb]

optional arguments:
  -h, --help            show this help message and exit
  --workdir WORKDIR     Directory to store job results, job logs and job
                        database
  --configdir CONFIGDIR
                        Directory to store configuration files and job scripts
  --config-file CONFIG_FILE
                        Path to the configuration file for the jobs, default
                        is <configdir>/config.yml
  --influxdb-config-file INFLUXDB_CONFIG_FILE
                        Path to the InfluxDB configuration file, default is
                        <configdir>/influx_parameters.yml
  --job-db-file JOB_DB_FILE
                        Path to the job database file, default is
                        <workdir>/jobs.sqlite3
  --log-file LOG_FILE   Path to the log file, default is <workdir>/remote-
                        testsuite.log
  --initialize          Initialize the tool with default configuration
  --check               Check if the given configuration is valid and exit
  --no-influxdb         Do not write to InfluxDB, only run the jobs

After the configuration has been adjusted, the configuration and InfluxDB parameters can be tested via

ram-cli --configdir /path/to/your/configdir --workdir /path/to/your/workdir --check

Recommended arguments are:

ram-cli --configdir /path/to/your/configdir --workdir /path/to/your/workdir

Systemd Service

To run the tool as a systemd service, some best practices should be followed. The service should be run as a dedicated user, and the configuration and work directories should be owned by this user. After the user is created, set up a Python venv in which the package is installed:

python3 -m venv /path/to/your/venv
source /path/to/your/venv/bin/activate
pip3 install resource-availability-monitoring

The service file should be named resource-availability-monitoring.service, be located in /etc/systemd/system/, and contain the following content:

[Unit]
Description=Resource Availability Monitoring Service
After=network.target
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=simple
User=ram
Group=ram
LimitNOFILE=65536
WorkingDirectory=/path/to/your/workdir
ExecStart=/path/to/your/venv/bin/python3 -m resource_availability_monitoring --configdir /path/to/your/configdir --workdir /path/to/your/workdir
Restart=on-failure
RestartSec=300s

After the service file has been created, the service can be started via

systemctl start resource-availability-monitoring

Release history

0.2.8 (this version): Feb 23, 2024
0.2.7: Feb 23, 2024
0.2.5: Feb 23, 2024
0.2.1: Feb 15, 2024
0.1.0: Feb 15, 2024
0.1.0a0 (pre-release): Feb 15, 2024

Download files

Source Distribution: resource_availability_monitoring-0.2.8.tar.gz (27.4 kB), uploaded Feb 23, 2024
Built Distribution: resource_availability_monitoring-0.2.8-py3-none-any.whl (29.4 kB), uploaded Feb 23, 2024, Python 3

File details

Details for the file resource_availability_monitoring-0.2.8.tar.gz.

File metadata

Download URL: resource_availability_monitoring-0.2.8.tar.gz
Upload date: Feb 23, 2024
Size: 27.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for resource_availability_monitoring-0.2.8.tar.gz
Algorithm	Hash digest
SHA256	`9fae76d07c69c312823dc18afbe8d898b445eb7c7433738ef3005567bf39aaf3`
MD5	`9b363a1660468e4635489dafd1aa6662`
BLAKE2b-256	`6690cb499bd136fc5d18c29f3783cc3b8823530de0755600b66240b186a55778`

File details

Details for the file resource_availability_monitoring-0.2.8-py3-none-any.whl.

File metadata

Download URL: resource_availability_monitoring-0.2.8-py3-none-any.whl
Upload date: Feb 23, 2024
Size: 29.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for resource_availability_monitoring-0.2.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c0fb55593ee31d20bdba5a74acfc915a84b67b2c0be557c188ef3673637568ed`
MD5	`ca3bcc42d0ec09f37463347ce3d35cd0`
BLAKE2b-256	`213755fb3bfe42b1f4b41cd5178c28d1ffcdd1d5f340d462d60cbaff4841f2a0`
