Skip to main content

Nagios / Icinga monitoring plugin to check systemd for failed units.

Project description

pypi.org Build Status

check_systemd

check_systemd is a Nagios / Icinga monitoring plugin to check systemd. This Python script will report a degraded system to your monitoring solution. It can also be used to monitor individual systemd services (with the -u, --unit parameter) and timers units (with the -t, --dead-timers parameter). The only dependency the plugin needs is the Python library nagiosplugin.

Installation

pip3 install check_systemd

Packages

Command line interface

usage: check_systemd [-h] [-u UNIT | -e UNIT] [-n] [-w SECONDS] [-c SECONDS]
                     [-t] [-W SECONDS] [-C SECONDS] [-i] [-v] [-V]

Copyright (c) 2014-18 Andrea Briganti <kbytesys@gmail.com>
Copyright (c) 2019-21 Josef Friedrich <josef@friedrich.rocks>

Nagios / Icinga monitoring plugin to check systemd.

optional arguments:
  -h, --help            show this help message and exit
  -u UNIT, --unit UNIT  Name of the systemd unit that is being tested.
  -e UNIT, --exclude UNIT
                        Exclude a systemd unit from the checks. This option can
                        be applied multiple times, for example: -e mnt-
                        data.mount -e task.service. Regular expressions can be
                        used to exclude multiple units at once, for example: -e
                        'user@\d+\.service'. For more informations see the
                        Python documentation about regular expressions
                        (https://docs.python.org/3/library/re.html).
  -n, --no-startup-time
                        Don’t check the startup time. Using this option the
                        options '-w, --warning' and '-c, --critical' have no
                        effect. Performance data about the startup time is
                        collected, but no critical, warning etc. states are
                        triggered.
  -w SECONDS, --warning SECONDS
                        Startup time in seconds to result in a warning status.
                        Thedefault is 60 seconds.
  -c SECONDS, --critical SECONDS
                        Startup time in seconds to result in a critical status.
                        Thedefault is 120 seconds.
  -t, --dead-timers     Detect dead / inactive timers. See the corresponding
                        options '-W, --dead-timer-warning' and '-C, --dead-
                        timers-critical'. Dead timers are detected by parsing
                        the output of 'systemctl list-timers'. Dead timer rows
                        displaying 'n/a' in the NEXT and LEFTcolumns and the
                        time span in the column PASSED exceeds the values
                        specified with the options '-W, --dead-timer-warning'
                        and '-C, --dead-timers-critical'.
  -W SECONDS, --dead-timers-warning SECONDS
                        Time ago in seconds for dead / inactive timers to
                        trigger a warning state (by default 6 days).
  -C SECONDS, --dead-timers-critical SECONDS
                        Time ago in seconds for dead / inactive timers to
                        trigger a critical state (by default 7 days).
  -i, --ignore-inactive-state
                        Ignore an inactive state on a specific unit. Oneshot
                        services for example are only active while running and
                        not enabled. The rest of the time they are inactive.
                        This option has only an affect if it is used with the
                        option -u.
  -v, --verbose         Increase output verbosity (use up to 3 times).
  -V, --version         show program's version number and exit

Performance data:
  - count_units
  - startup_time
  - units_activating
  - units_active
  - units_failed
  - units_inactive

Project pages

Behind the scenes

To detect failed units this monitoring script runs:

systemctl list-units --all

To get the startup time it executes:

systemd-analyze

To check a specific unit (-u, --unit) this command is executed:

systemctl is-active <unit-name>

To find dead timers this plugin launches:

systemctl list-timers --all

To learn how systemd produces the text output on the command line, it is worthwhile to take a look at systemd’s source code. Files relevant for text output are: basic/time-util.c, analyze/analyze.c.

Testing

pyenv install 3.6.12
pyenv install 3.7.9
pyenv local 3.6.12 3.7.9
pip3 install tox
tox

Deploying

Edit the version number in check_systemd.py (without v). Use the -s option to sign the tag (required for the Debian package).

git tag -s v2.0.11
git push --tags

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

check_systemd-2.3.1.tar.gz (21.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page