Skip to main content

Nagios plugin to monitor cloud costs in the StackIT cloud.

Project description

StackIt Cost Monitoring

Overview

This repository contains a Nagios Plugin to monitor costs in the StackIT cloud.

Why?

A common problem with compute intensive cloud projects is that expensive resources like GPU nodes may consume the available budget quickly if left behind even if unused, e.g.:

  • A node with eight H100 cards may cost 60 €/h
  • Or 43.000 €/month

To prevent this, a cleanup is needed after each compute run. If this is forgotten or fails (both have been seen in the wild), that needs to be discovered early. Checking the monthly bill may be too late.

For this reason it is recommended to set up alarms to detect phases of high ressource usage. In e.G. AWS this can be done by using Billing Alarms. For StackIT no such feature exists but can be implemented by using the StackIT Cost API.

General Approach

Nagios Plugin: This example assumes that a system is available that can genrate alarms and distribute them via suitable channels such as email, chat systems, or SMS. Many such systems can either use Nagios plugins (aka Nagios checks) directly (CheckMK, Icinga2) or at least supply instructions on how to integrate or adapt them. So this example just implements a Nagios plugin based on the StackIT Cost API and explains how to configure it.

Yesterday's Costs: It seems that costs are only reported with a daily granularity. Also, during the night the last day's cost may not be available. So the costs for the last two days are requested via the API, and the latest data available is used. For data boundaries UTC time is used.

Prerequisites

To access the API, one needs to create a suitable service account and assign the required permissions:

  1. Create an account using the WebGUI, item IAM AND MANAGEMENT / Service Accounts / + Create Service Account. Make sure that you save the credentials before closing the form.
  2. Use IAM AND MANAGEMENT / Access / + Grant Access to assign the cost-management.cost.reader role to that service account.
  3. Set up the credentials on your monitoring system in the same way as for a user who wants to use the StackIT CLI. The tool expects to find the file ~/.stackit/sa-key.json. That may be overwritten by supplying the --sa-key-json option.

Installation

The script uses Python 3 — most likely any version newer than 3.10 will work. Best use a separate virtual environment:

python3 -m venv /usr/local/lib/venvs/stackit_monitoring
. /usr/local/lib/venvs/stackit_monitoring/bin/activate
pip install stackit_cost_monitoring

To use the plugin with the Icinga example below, one will have to link /usr/local/lib/venvs/stackit_monitoring/bin/check_stackit_costs to /usr/lib/nagios/plugins.

Usage

$ check_stackit_costs --help
usage: check_stackit_costs [-h] --customer-account-id CUSTOMER_ACCOUNT_ID --project-id PROJECT_ID [-w WARNING] [-c CRITICAL] [--sa-key-json SA_KEY_JSON] [--skip-discounts] [--api-log-file API_LOG_FILE]

Nagios plugin to monitor StackIT costs. The higher value of the cost of the present day (always 0?) and yesterday is used.

options:
  -h, --help            show this help message and exit
  --customer-account-id CUSTOMER_ACCOUNT_ID
                        StackIT customer account ID
  --project-id PROJECT_ID
                        StackIT project ID
  -w, --warning WARNING
                        Warning threshold for 24h cost in EUR (default: 10.00)
  -c, --critical CRITICAL
                        Critical threshold for 24h cost in EUR (default: 50.00)
  --sa-key-json SA_KEY_JSON
                        Path to StackIT credentials in JSON format (default: /home/wilhelmh/.stackit/sa-key.json)
  --skip-discounts      Skip discounted costs in calculation.
  --api-log-file API_LOG_FILE
                        Optional path to file where the API requests and responses will be logged.

Icinga Example

Command definition:

object CheckCommand "stackit_costs" {
  command = [ PluginDir + "/check_stackit_costs" ]
  arguments = {
    "--customer-account-id" = {
      description = "ID of the StackIt Cloud account to monitor (required)"
      value = "$stackit_account_id$"
    }
    "--project-id" = {
      description = "ID of the StackIt Cloud project to monitor (required)"
      value = "$stackit_project_id$"
    }
    "--sa-key-json" = {
      description = "JSON file with StackIt service account data to use (required)"
      value = "$stackit_credential_path$"
    }
    "-w" = {
      description = "Warning threshold for daily costs in EUR"
      value = "$stackit_warning_eur$"
    }
    "-c" = {
      description = "Critical threshold for daily costs in EUR"
      value = "$stackit_critical_eur$"
    }
    "--api-log-file" = {
      description = "Path to file where the API requests and responses will be logged"
      value = "$stackit_api_log_file$"
    }
  }
  vars.stackit_warning_eur = 2.0
  vars.stackit_critical_eur = 10.0
}

It makes no sense to check the costs every minute, as the data is only updated daily. To recover form errors easier, we check hourly:

apply Service "COST-test" {
  import "generic-service"
  check_interval = 1h   
  retry_interval = 1h
  check_command = "stackit_costs"
  vars += {
    stackit_account_id = "..."
    stackit_project_id = "..."
    stackit_credential_path = "/var/lib/nagios/.stackit/cost-monitoring-test.json"
    stackit_warning_eur = "1.0"
    stackit_critical_eur = "10.0"
  }
  assign where host.name == "..."   # Does not really matter ...
}

Remarks

Unfortunately, when this code was written, no Python bindings existed for the StackIT Cost API. So we decided to manually code the API call.

It seems that service accounts are always tied to a project. Using a user for monitoring seems unsuitable for security reasons. So this plugin only can monitor one project at a time.

Resources

Acknowledgements

This tool was created by and with resources of University Hospital Heidelberg, Klaus Tschira Institute for Computational Cardiology. StackIt GmbH supported it by supplying the StackIT API and helping with some of the dirty details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stackit_cost_monitoring-0.8.1.tar.gz (19.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stackit_cost_monitoring-0.8.1-py3-none-any.whl (21.6 kB view details)

Uploaded Python 3

File details

Details for the file stackit_cost_monitoring-0.8.1.tar.gz.

File metadata

  • Download URL: stackit_cost_monitoring-0.8.1.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.13.5 Linux/6.12.63+deb13-amd64

File hashes

Hashes for stackit_cost_monitoring-0.8.1.tar.gz
Algorithm Hash digest
SHA256 ab7fb75a838eda3bc056f8132224118cfde0eb9b21a08ad509cad955fca56749
MD5 8f3a02234176a6202040bfac79217ea6
BLAKE2b-256 5bebe2b9bedd59061a672480826aa0be580c0a7ed80f1cef09cf1df2803df212

See more details on using hashes here.

File details

Details for the file stackit_cost_monitoring-0.8.1-py3-none-any.whl.

File metadata

File hashes

Hashes for stackit_cost_monitoring-0.8.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d44542e45da160d3343c1639e908b9a8997e2fd9b2e72b9e4c0b97d0ce5ce83f
MD5 cf7574e8be64c8a1c00ebd426202748b
BLAKE2b-256 9ccf97e8a601a0dd5c0d433651565a0ebf618e608459439dff82a5458d9c31d9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page