S.M.A.R.T. Prometheus Metrics Exporter
smart-prom-next is a Prometheus metric exporter for S.M.A.R.T. values of hard disks. Python and the Linux tool smartctl are used to read out the hard disk values. These are then exposed using Prometheus Python Client over network port 9902.
According to Wikipedia, the primary function of S.M.A.R.T. is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures.
Currently, smart-prom-next is only available as a Docker image, in two variants.
The first variant is built from the slim version of the official Python Docker image, which uses Debian Bullseye.
It is built for multiple platforms:
linux/386, linux/amd64, linux/arm/v5, linux/arm/v7, linux/arm64/v8
The second variant is an Alpine-based image.
It is built for multiple platforms:
linux/386, linux/amd64, linux/arm/v6, linux/arm/v7, linux/arm64/v8
Configuration Options / Environment Variables
smart-prom-next can be configured by the following environment variables:
- PROMETHEUS_METRIC_PORT - port number over which the Prometheus metrics are exposed (default: 9902)
- SMART_INFO_READ_INTERVAL_SECONDS - time interval in seconds at which the SMART values of the hard disk are read (default: 60)
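As a minimal sketch of how these two variables and their documented defaults could be read in Python: the function name read_config and the constants are hypothetical and are not taken from the smart-prom-next source code.

```python
import os

# Defaults as documented above.
DEFAULT_PORT = 9902
DEFAULT_INTERVAL_SECONDS = 60


def read_config(environ=os.environ):
    """Return (port, interval_seconds), falling back to the documented defaults.

    Hypothetical helper for illustration; not part of smart-prom-next.
    """
    port = int(environ.get("PROMETHEUS_METRIC_PORT", DEFAULT_PORT))
    interval = int(
        environ.get("SMART_INFO_READ_INTERVAL_SECONDS", DEFAULT_INTERVAL_SECONDS)
    )
    return port, interval


if __name__ == "__main__":
    print(read_config())
```

Passing a plain dict instead of os.environ makes the helper easy to try out without touching the real environment.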
Docker / docker-compose
The images, which are based on Debian Bullseye slim, can be accessed using:
ghcr.io/philipmay/smart-prom-next:<version>-slim-bullseye
or ghcr.io/philipmay/smart-prom-next:latest
The images, which are based on Alpine, can be accessed using:
ghcr.io/philipmay/smart-prom-next:<version>-alpine
The latest versions are visible in smart-prom-next GitHub packages.
Below is a complete, minimal docker-compose.yml that shows how smart-prom-next can be used with docker-compose:
version: "3.0"
services:
  smart-prom-next:
    # see https://github.com/PhilipMay/smart-prom-next/pkgs/container/smart-prom-next
    image: ghcr.io/philipmay/smart-prom-next:latest
    container_name: "smart-prom-next"
    restart: unless-stopped
    privileged: true
    ports:
      - 9902:9902
The privileged: true setting is absolutely necessary so that smartctl can access the hard disks from within the container.
Security note: In a production environment, you should in most configurations leave out the ports: part of the docker-compose.yml so that the exporter is not reachable from the outside. Instead, assign the container to a network to which the Prometheus container is also attached. This looks like this:
    networks:
      - monitor
To adjust the environment variables, the following settings can be added, for example:
    environment:
      - PROMETHEUS_METRIC_PORT=9009
      - SMART_INFO_READ_INTERVAL_SECONDS=120
Available Metrics
smart_prom_smart_status_failed
The SMART health status of the device. A value of 0 indicates a healthy state. A value of 1 means that the device has not passed the health check and there is a problem.
List of labels used (description see below): "device", "type", "model", "serial"
smart_prom_smartctl_exit_status
The exit status (also known as exit code or return code) of the smartctl tool. Any value other than zero indicates an issue. A more detailed description can be found in the EXIT STATUS chapter of the smartctl man pages.
List of labels used (description see below): "device", "type", "model", "serial"
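The smartctl exit status is a bit mask, so a non-zero value can encode several conditions at once. The following sketch decodes it, assuming the metric carries the raw exit status unchanged. The bit descriptions are paraphrased from the EXIT STATUS chapter of the smartctl man pages; decode_exit_status is a hypothetical helper and not part of smart-prom-next.

```python
# Bit meanings paraphrased from the EXIT STATUS chapter of the smartctl man pages.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
    2: "a SMART or other ATA command failed, or a checksum error occurred",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "SMART status OK, but attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    status = int(status)
    return [
        description
        for bit, description in SMARTCTL_EXIT_BITS.items()
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    # 68 = 64 + 4, i.e. bits 2 and 6 are set.
    for line in decode_exit_status(68):
        print(line)
```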
smart_prom_smart_info
The SMART attributes. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.
List of labels used (description see below): "device", "type", "model", "serial", "attr_name", "attr_type", "attr_id"
smart_prom_nvme_smart_info
NVMe-specific SMART attributes obtained from the SMART/Health Information log. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.
List of labels used (description see below): "device", "type", "model", "serial", "attr_name"
smart_prom_scsi_smart_info
SCSI-specific SMART attributes obtained from the SMART/Health Information log. A more detailed description can be found in the -A, --attributes chapter of the smartctl man pages.
List of labels used (description see below): "device", "type", "model", "serial", "attr_name", "attr_type"
smart_prom_temperature
The temperature values of the device. These include not only the current temperature but also other values.
List of labels used (description see below): "device", "type", "model", "serial", "temperature_type"
smart_prom_scrape_iterations_total
Counts how often the SMART values were scraped.
Metric Labels
This project uses several labels on its metrics. They are described here:
- device - device file, e.g.: /dev/nvme0, /dev/sda
- type - type of the device, e.g.: ata, nvme, usbjmicron
- model - model name, e.g.: KXG6AZNV512G TOSHIBA, WDC WD3200BEVT-60ZCT0
- serial - serial number, e.g.: WD-WXE708D44703, Y9SF71LHFWZL
- temperature_type - type of the temperature value, e.g.: current, power_cycle_max, lifetime_max, op_limit_max
- attr_name - SMART attribute name, e.g.: raw_read_error_rate, reallocated_sector_ct, critical_warning
- attr_id - SMART attribute id, e.g.: 1, 3, 4
- attr_type - type of the respective SMART attribute; the value is one of: value, worst, thresh, raw, failed_now, failed_past. A detailed description can be found in the -A, --attributes chapter of the smartctl man pages.
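To illustrate how these labels appear on an exposed metric, here is a minimal sketch that parses one line of the Prometheus text exposition format into name, labels and value. The sample line is made up from the example values above (the label order and the value 35.0 are assumptions), and parse_sample is a hypothetical helper. It does not unescape label values or handle timestamps, so it is not a full parser.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# label="value" pairs; quoted values may contain spaces and commas
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


if __name__ == "__main__":
    # Hypothetical sample line built from the example label values above.
    line = (
        'smart_prom_temperature{device="/dev/sda",type="ata",'
        'model="WDC WD3200BEVT-60ZCT0",serial="WD-WXE708D44703",'
        'temperature_type="current"} 35.0'
    )
    print(parse_sample(line))
```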
Prometheus Alerts
Based on the metrics, Prometheus alerts can be defined. Below are a few suggestions for prometheus_rules.yml:
groups:
  - name: alert_rules
    rules:
      - alert: DiskFailing
        expr: smart_prom_smart_info{attr_type="failed_now"} == 1
        labels:
          severity: critical
        annotations:
          summary: "disk failing"
      - alert: DiskTemperatureHigh
        expr: smart_prom_temperature{temperature_type="current"} > 50
        labels:
          severity: warning
        annotations:
          summary: "disk temperature > 50"
      - alert: SMARTStatusFailing
        expr: smart_prom_smart_status_failed == 1
        labels:
          severity: critical
        annotations:
          summary: "SMART status failing"
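To make the conditions in these rules easier to follow, the sketch below applies the same three comparisons to a list of already parsed samples in plain Python. It only mirrors the expressions above for illustration; it is not how Prometheus evaluates rules, and firing_alerts as well as the sample data are hypothetical.

```python
def firing_alerts(samples, max_temperature=50):
    """Return (alert_name, device) pairs for samples that match the rules above.

    samples is an iterable of (metric_name, labels_dict, value) tuples.
    """
    alerts = []
    for name, labels, value in samples:
        device = labels.get("device")
        if (
            name == "smart_prom_smart_info"
            and labels.get("attr_type") == "failed_now"
            and value == 1
        ):
            alerts.append(("DiskFailing", device))
        elif (
            name == "smart_prom_temperature"
            and labels.get("temperature_type") == "current"
            and value > max_temperature
        ):
            alerts.append(("DiskTemperatureHigh", device))
        elif name == "smart_prom_smart_status_failed" and value == 1:
            alerts.append(("SMARTStatusFailing", device))
    return alerts


if __name__ == "__main__":
    # Made-up samples for illustration only.
    example = [
        ("smart_prom_temperature", {"device": "/dev/sda", "temperature_type": "current"}, 55.0),
        ("smart_prom_smart_status_failed", {"device": "/dev/sdb"}, 1.0),
    ]
    print(firing_alerts(example))
```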
Release History
2022-07-28 with version 0.0.4
- add additional Alpine based image #40
- fix typo #42
- make Alpine image smaller #45
- add -slim-bullseye suffix to image #44
- improve logs with "error" and "warning" prefix #43
2022-07-27 with pre-release version 0.0.4rc1
2022-07-20 with version 0.0.3
- add scsi disk handling - thanks to Jopaul-John
2022-06-23 with version 0.0.2
- breaking change on smart_prom_nvme_smart_info
- additional smart_prom_scrape_iterations_total metric
- more doc
2022-06-20 with pre-release version 0.0.1rc9
- first pre-release
Special Thanks
A special thanks goes to the following contributors:
- @Jopaul-John for his help in adding scsi disk handling
- Michal Harakal (@michalharakal) for the first PR of this project to improve the docker-compose example
- Diego Heras (@ngosang) for his help in adding the Alpine image
Licensing
Copyright (c) 2022 Philip May
Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.