Kwollect framework for metrics collection

Introduction

Kwollect is a framework for collecting metrics of IT infrastructures (performance, environmental, ...) and making them available to users.

Kwollect targets high-frequency collection with lossless, long-term storage of metrics. It focuses on out-of-band metrics: those that are not available from the computers' operating systems but come from outside sources, such as sensors in PDUs, network devices, BMCs, etc.

Kwollect is designed for integration with Job Schedulers, for instance when deployed in High Performance Computing datacenters.

Design Overview

Kwollect is a framework more than a single piece of software: it uses as many off-the-shelf components as possible.

In particular, it relies on a PostgreSQL database (with the TimescaleDB extension) to store all metrics, provide the user API, and handle the backend "logic".

Independent programs, called kwollectors, collect the metrics from various devices and store them in the database (currently supported protocols are SNMP, IPMI sensors, and OmegaWatt wattmetre).

User interface

Kwollect provides an API to retrieve collected metrics:

curl http://kwollect.host:3000/rpc/get_metrics \
  -H 'content-type: application/json' -X POST \
  -d '{"devices": "node-1,node-2",  "start_time": "2020-01-06 13:35:00", "end_time": "2020-01-06 14:35:00"}'
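The same request can be built from Python. The following is a minimal sketch using only the standard library. The helper name build_get_metrics_request is hypothetical; the host, port and parameter names are simply copied from the curl example above.

```python
import json
import urllib.request


def build_get_metrics_request(base_url, devices, start_time, end_time):
    """Build (but do not send) a POST request for the get_metrics endpoint."""
    payload = {
        "devices": ",".join(devices),  # comma-separated, as in the curl example
        "start_time": start_time,
        "end_time": end_time,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rpc/get_metrics",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_get_metrics_request(
    "http://kwollect.host:3000",
    ["node-1", "node-2"],
    "2020-01-06 13:35:00",
    "2020-01-06 14:35:00",
)

# Sending the request needs a reachable Kwollect API, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the script at a real deployment.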

It also provides a graphical view of metrics. TODO.

As it uses a PostgreSQL database, regular SQL queries can be used:

SELECT timestamp, metric_id, device_id, value
  FROM metrics_by_device
  WHERE device_id = 'node-1' AND timestamp > now() - interval '1 hour';

Installation

Kwollect package

The kwollect package contains kwollector programs and database setup scripts. To install it, use:

pip3 install kwollect

(TODO: a Debian package)

Database

Kwollect needs a PostgreSQL database with the TimescaleDB extension to store metrics.

For example, use these commands to install them on Debian Buster:

sudo sh -c "echo 'deb https://packagecloud.io/timescale/timescaledb/debian/ `lsb_release -c -s` main' > /etc/apt/sources.list.d/timescaledb.list"
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo apt-key add -
sudo apt update
sudo apt-get install -y postgresql postgresql-client timescaledb-postgresql-11 postgresql-plpython3-11

# TimescaleDB comes with a script to tune Postgres configuration that you might want to use:
sudo cp /etc/postgresql/11/main/postgresql.conf /etc/postgresql/11/main/postgresql.conf-timescaledb_tune.backup
sudo timescaledb-tune -yes -quiet
echo 'timescaledb.telemetry_level=off' | sudo tee -a /etc/postgresql/11/main/postgresql.conf
sudo systemctl restart postgresql

Then, you can set up the Kwollect database using the kwollect-setup-db tool. It must connect to the database with administrator privileges. For instance:

sudo su postgres -s /bin/sh -c "kwollect-setup-db --kwollect_password changeme"

See kwollect-setup-db --help for more options. In particular, chunk_interval_hour should be chosen so that all metrics collected during this period, in hours, fit in the memory available to Postgres (about one quarter of the total memory, assuming that one metric needs approximately 200 bytes).
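The sizing rule above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, every metric is assumed to be fetched at the same interval, and the 200 bytes per value and one-quarter memory fraction are the approximations given above.

```python
def estimate_chunk_interval_hours(total_memory_bytes, n_metrics, update_every_s,
                                  bytes_per_value=200, postgres_memory_fraction=0.25):
    """Estimate how many hours of metrics fit in the memory available to Postgres."""
    # Number of values stored per hour, assuming one common fetch interval.
    values_per_hour = n_metrics * 3600 / update_every_s
    bytes_per_hour = values_per_hour * bytes_per_value
    # Memory budget: roughly one quarter of the machine's memory.
    budget_bytes = total_memory_bytes * postgres_memory_fraction
    # Round down to whole hours, but never return less than one hour.
    return max(1, int(budget_bytes // bytes_per_hour))


# Hypothetical deployment: 16 GiB of RAM, 5000 metrics fetched every 5 seconds.
hours = estimate_chunk_interval_hours(16 * 1024**3, 5000, 5)
print(f"suggested chunk_interval_hour: {hours}")
```

The result is only a starting point; the actual row size depends on the metrics being stored.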

API

To provide users with an HTTP API for retrieving collected metrics, Kwollect uses Postgrest.

These commands may be used to install Postgrest (see the Postgrest website for more information).

wget https://github.com/PostgREST/postgrest/releases/download/v6.0.2/postgrest-v6.0.2-linux-x64-static.tar.xz -O /tmp/postgrest.txz
cd /tmp
tar xf postgrest.txz
sudo mv ./postgrest /usr/local/bin/

Postgrest needs a configuration file. A working configuration file is given in the output of kwollect-setup-db. It looks like:

db-uri = "postgres://<db_user>:<db_pass>@<db_host>/<db_name>"
db-schema = "api"
db-anon-role = "kwuser_ro"
jwt-secret = "changemechangemechangemechangemechangeme"

(See the Postgrest documentation for the meaning of each option, but no change should be needed.)

kwollect-setup-db also outputs an API token that is needed for write access to the database.

Finally, don't forget to start Postgrest with postgrest <path_to_configuration_file> (TODO: a systemd service file)

Kwollector

The kwollector program collects metrics and stores them in the database. It may run on any host (provided it can communicate with the database and with the devices to monitor).

kwollector is available in the kwollect package. Start it with:

kwollector <path_to_configuration_file>

(TODO a systemd service file)

The kwollector configuration file should contain:

# Path to directory containing metrics description
metrics_dir: /etc/kwollect/metrics.d/

# Hostname of postgresql server
db_host: localhost 

# Database name
db_name: db_name

# Database user
db_user: kwuser

# Database password
db_password: changeme

# Log level
log_level: warning

(Options may also be given on the command line; see kwollector --help.)

Description of the metrics to fetch

Metrics are described in YAML files under the <metrics_dir> directory (/etc/kwollect/metrics.d/ by default). For instance, you may have one file per device, containing all the metrics to fetch from it.

Here is an example of file content for describing metrics of a device node-1:

- name: idrac_power_watth_total
  device_id: node-1
  url: snmp://public@node-1-admin.domain.com/1.3.6.1.4.1.674.10892.5.4.600.60.1.7.1.1
  update_every: 5000

- name: idrac_power_watt
  device_id: node-1
  url: snmp://public@node-1-admin.domain.com/1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3
  update_every: 5000
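When many devices expose the same metrics, such files can be generated rather than written by hand. The sketch below is one possible approach, not a tool shipped with Kwollect: the function name, the METRIC_OIDS mapping and the `<node>-admin.<domain>` host naming scheme are assumptions taken from the example above.

```python
# Metric names and SNMP OIDs copied from the example above.
METRIC_OIDS = {
    "idrac_power_watth_total": "1.3.6.1.4.1.674.10892.5.4.600.60.1.7.1.1",
    "idrac_power_watt": "1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3",
}


def metrics_description(node, community="public", domain="domain.com", update_every=5000):
    """Return the text of a metrics description file for one device."""
    entries = []
    for name, oid in METRIC_OIDS.items():
        entries.append(
            f"- name: {name}\n"
            f"  device_id: {node}\n"
            # Assumed naming scheme: the management interface is <node>-admin.<domain>.
            f"  url: snmp://{community}@{node}-admin.{domain}/{oid}\n"
            f"  update_every: {update_every}\n"
        )
    # Entries are separated by a blank line, as in the example above.
    return "\n".join(entries)


description = metrics_description("node-1")
print(description)

# To use it with kwollector, write the text under metrics_dir, for example
# (the file name is hypothetical):
# pathlib.Path("/etc/kwollect/metrics.d/node-1.yaml").write_text(description)
```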

Each metric should be described with:

- name:
  device_id:
  url:
  update_every:
  device_alias:
  optional:

Where:

  • name is a unique identifier for this metric (which may be used by several devices)

  • device_id is an identifier for the device from which the metric is collected

  • url specifies how and where to get the metric. Currently, SNMP and IPMI protocols are supported.

    • For SNMP, url must be in the form snmp://<community>@<host_address>/<oid>. For instance:

      snmp://public@node-1-admin.domain.com/1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3

    • For IPMI, url must be in the form ipmisensor://<user>:<password>@<host_address>/<id>, where user and password are the credentials needed to connect to the device using the IPMI protocol, and id is the ID of the sensor to collect (as in the output of the ipmi-sensors command). For instance:

      ipmisensor://root:calvin@node-1-admin.domain.com/20

      The IPMI protocol needs the ipmi-sensors command to be available (on a Debian system, it is provided by the freeipmi-tools package).

  • Optional update_every specifies the interval between two successive fetches of this metric. Default is 10 seconds.

  • Optional device_alias may be used to record an alternative name for this metric's device (for instance, if you collect a metric related to a port on a network device, you may want to use the device connected to this port as a device alias).

  • Optional optional field must be set to true if you don't want this metric to be collected by default (see below)
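The two url forms described above can be checked with Python's standard URL parser before a file is deployed. This is an illustrative sketch only: parse_metric_url is a hypothetical helper, and kwollector itself may parse these urls differently.

```python
from urllib.parse import urlsplit


def parse_metric_url(url):
    """Split a metric url into its components, following the two documented forms."""
    parts = urlsplit(url)
    target = parts.path.lstrip("/")  # the OID (SNMP) or the sensor ID (IPMI)
    if parts.scheme == "snmp":
        # snmp://<community>@<host_address>/<oid>
        return {
            "protocol": "snmp",
            "community": parts.username,
            "host": parts.hostname,
            "oid": target,
        }
    if parts.scheme == "ipmisensor":
        # ipmisensor://<user>:<password>@<host_address>/<id>
        return {
            "protocol": "ipmisensor",
            "user": parts.username,
            "password": parts.password,
            "host": parts.hostname,
            "sensor_id": target,
        }
    raise ValueError(f"unsupported protocol in metric url: {url!r}")


snmp = parse_metric_url(
    "snmp://public@node-1-admin.domain.com/1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3"
)
ipmi = parse_metric_url("ipmisensor://root:calvin@node-1-admin.domain.com/20")
print(snmp)
print(ipmi)
```

Running such a check over every entry of a metrics file catches mistyped schemes or missing credentials early.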

Graphical interface

TODO

Advanced topics

Job scheduler integration

Kwollect may be coupled with a job scheduler to retrieve the metrics associated with a particular job.

To enable job scheduler integration, you only need to fill the nodetime_per_job view in Kwollect's PostgreSQL database. The view should return SQL data formatted as:

+------------+--------------------------+
| Column     | Type                     |
|------------+--------------------------|
| job_id     | integer                  |
| start_time | timestamp with time zone |
| stop_time  | timestamp with time zone |
| node       | text                     |
+------------+--------------------------+

There should be one row for each node (which will be used as device_id to retrieve metrics) involved in the job job_id, which started at start_time and ended at stop_time (NULL if the job is still running).

We provide such integration for the OAR job scheduler, where nodetime_per_job is automatically filled by querying the OAR database. The kwollect-setup-db-oar tool is available to perform the setup.

With nodetime_per_job correctly filled, it becomes possible to perform requests on metrics_by_job, e.g.:

SELECT timestamp, device_id, metric_id, value FROM metrics_by_job WHERE job_id = 1234;
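To make the expected content of nodetime_per_job concrete, the sketch below builds the rows for one job in Python and emulates, in memory, the kind of selection that metrics_by_job makes possible. It is a conceptual illustration only: the function names, the sample values and the inclusive time bounds are assumptions, and the real selection is done in SQL inside the database.

```python
from datetime import datetime, timezone


def nodetime_rows(job_id, start_time, stop_time, nodes):
    """One (job_id, start_time, stop_time, node) row per node involved in the job."""
    return [(job_id, start_time, stop_time, node) for node in nodes]


def metrics_for_job(metrics, rows, job_id, now):
    """Keep the metrics whose device and timestamp match a row of the given job."""
    selected = []
    for timestamp, device_id, metric_id, value in metrics:
        for row_job_id, start_time, stop_time, node in rows:
            # A NULL (None) stop_time means the job is still running.
            end = stop_time if stop_time is not None else now
            if row_job_id == job_id and device_id == node and start_time <= timestamp <= end:
                selected.append((timestamp, device_id, metric_id, value))
                break
    return selected


def utc(hour, minute):
    """Helper for the sample data: a UTC timestamp on 2020-01-06."""
    return datetime(2020, 1, 6, hour, minute, tzinfo=timezone.utc)


# Job 1234 is still running on two nodes, hence stop_time is None.
rows = nodetime_rows(1234, utc(13, 35), None, ["node-1", "node-2"])

# Made-up sample values, for illustration only.
metrics = [
    (utc(13, 40), "node-1", "idrac_power_watt", 95.0),  # during the job
    (utc(13, 0), "node-1", "idrac_power_watt", 90.0),   # before the job started
    (utc(13, 40), "node-3", "idrac_power_watt", 80.0),  # node not part of the job
]
job_metrics = metrics_for_job(metrics, rows, 1234, now=utc(14, 35))
print(job_metrics)
```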

It is also possible to provide the "job_id" argument when calling the API:

curl http://kwollect.host:3000/rpc/get_metrics -X POST -d 'job_id=1234'

Optional metrics

Kwollect can collect some metrics "on demand", for instance metrics that do not need to be collected all the time.

These metrics must be configured in kwollector using the optional: true parameter.

Such an optional metric will only be collected for a particular device if the corresponding device_id is present in the promoted_metrics table of the Kwollect database.

This table can be filled according to specific needs (for instance, if Kwollect is integrated with a job scheduler, it can be configured to only collect optional metrics for devices associated with some specifically tagged jobs).
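The selection rule described above can be summarised in a few lines of Python. This is only a sketch of the rule, not kwollector's actual code: the function name is hypothetical, the metric named hypothetical_debug_metric is invented for the example, and the promoted_metrics table is modelled as a plain set of device_id values.

```python
def metrics_to_collect(metrics, promoted_device_ids):
    """Return the metrics to fetch, given the device_ids found in promoted_metrics."""
    selected = []
    for metric in metrics:
        is_optional = metric.get("optional", False)
        # Regular metrics are always collected; optional ones only for promoted devices.
        if not is_optional or metric["device_id"] in promoted_device_ids:
            selected.append(metric)
    return selected


metrics = [
    {"name": "idrac_power_watt", "device_id": "node-1"},
    {"name": "hypothetical_debug_metric", "device_id": "node-1", "optional": True},
    {"name": "hypothetical_debug_metric", "device_id": "node-2", "optional": True},
]

# Assumed content of promoted_metrics: only node-2 has been promoted.
selected = metrics_to_collect(metrics, promoted_device_ids={"node-2"})
for metric in selected:
    print(metric["name"], metric["device_id"])
```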

(TODO: better example, API, revisit the promoted_metrics format)

Wattmetre

A specific kwollector, called kwollector-wattmetre, is available to read and store values from an OmegaWatt wattmetre. It simply reads the output of the OmegaWatt wattmetre reading program and stores the values in the database. For instance, it can be invoked with:

wattmetre-read /dev/ttyUSB0 42 20 | kwollector-wattmetre <path_to_configuration_file>

TODO: configuration file, mapping
