
graphite-metrics: standalone graphite collectors for various stuff not (or
poorly) handled by other monitoring daemons

The core of the project is a simple daemon (harvestd), which collects
metric values and sends them to graphite once per interval.

It consists of separate components ("collectors") for processing of:
* /proc/slabinfo for useful-to-watch values, not everything
(configurable).
* /proc/vmstat and /proc/meminfo in a consistent way.
* /proc/stat for irq, softirq, forks.
* /proc/buddyinfo and /proc/pagetypeinfo (memory fragmentation).
* /proc/interrupts and /proc/softirqs.
* Cron log, to produce start/finish events and duration for each job
as separate metrics, mapping jobs to metric names with regexes.
* Per-system-service accounting using [1]systemd and its cgroups.
* [2]sysstat data from sadc logs (use something like sadc -F -L -S
DISK -S XDISK -S POWER 60 to have more stuff logged there) via the
sadf binary and its json export (sadf -j, supported since
sysstat-10.0.something, iirc).
* iptables rule "hits" packet and byte counters, taken from
ip{,6}tables-save, mapped via a separate "table chain_name rule_no
metric_name" file, which should be generated along with the firewall
rules (I use [3]this script to do that); a sample mapping file is
sketched right after this list.
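
A hypothetical example of such a mapping file, one "table chain_name
rule_no metric_name" entry per line (all chain and metric names below
are made up for illustration):

filter INPUT 1 network.firewall.input.accept_established
filter INPUT 4 network.firewall.input.drop_invalid
nat PREROUTING 2 network.firewall.nat.dnat_http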

Additional metric collectors can be added via the setuptools
graphite_metrics.collectors entry point. Look at the shipped collectors
for API examples.
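
As a rough sketch, a minimal setup.py for a hypothetical third-party
package registering one extra collector might look like this (package,
module and class names below are made up; the collector class itself
should follow the API of the shipped collectors, not this sketch):

from setuptools import setup

setup(
    name='my-extra-collectors',  # hypothetical package name
    version='0.1',
    py_modules=['my_collectors'],
    entry_points={
        # harvestd discovers collectors through this entry point group
        'graphite_metrics.collectors': [
            'my_collector = my_collectors:MyCollector',
        ],
    },
)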

Running

% harvestd -h
usage: harvestd [-h] [-t host[:port]] [-i seconds] [-e collector]
                [-d collector] [-c path] [-n] [--debug]

Collect and dispatch various metrics to carbon daemon.

optional arguments:
  -h, --help            show this help message and exit
  -t host[:port], --carbon host[:port]
                        host[:port] (default port: 2003, can be overridden
                        via config file) of carbon tcp line-receiver
                        destination.
  -i seconds, --interval seconds
                        Interval between collecting and sending the
                        datapoints.
  -e collector, --enable collector
                        Enable only the specified metric collectors, can be
                        specified multiple times.
  -d collector, --disable collector
                        Explicitly disable specified metric collectors, can
                        be specified multiple times. Overrides --enable.
  -c path, --config path
                        Configuration files to process. Can be specified more
                        than once. Values from the latter ones override
                        values in the former. Available CLI options override
                        the values in any config.
  -n, --dry-run         Do not actually send data.
  --debug               Verbose operation mode.
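
For example, to test-run just a couple of collectors without actually
sending anything to carbon (collector names here are illustrative, the
real list is in the default configuration file mentioned below):

% harvestd -e slabinfo -e cron_log -n --debug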

See also: [4]default harvestd.yaml configuration file, which contains
configuration for all loaded collectors and can/should be overridden
using the -c option.

Note that you don't have to specify all the options in each
override-config, just the ones you need to update.

For example, a simple configuration file (say, /etc/harvestd.yaml)
that just specifies the carbon host and the log line format (dropping
the timestamp, since output will be piped to syslog or systemd-journal
anyway) might look like this:
carbon:
  host: carbon.example.host

logging:
  formatters:
    basic:
      format: '%(levelname)s :: %(name)s: %(message)s'
And be started like this: harvestd -c /etc/harvestd.yaml

Rationale

Most other tools can (in theory) collect this data, and I've used
[5]collectd for most of these, but it:
* Doesn't provide some of the most useful stuff - nfs stats, disk
utilization time percentage, etc.
* Fails to collect some other stats, producing bullshit like zeroes
or clearly insane/negative values (for io, network, sensors, ...).
* General-purpose plugins like "tail" add a lot of complexity, making
the configuration a mess, while still lacking some basic
functionality which 10 lines of code can easily provide.
* Mangles metric names as provided by /proc and referenced in
kernel docs and on the internets - no idea what the hell for,
"readability"?

Initially I've tried to implement these as collectd plugins, but its
python plugin turned out to be leaking RAM, and collectd itself
segfaults something like once a day, even in the latest releases
(although probably because of a bug in some plugin).

Plus, collectd data requires post-processing anyway - proper metric
namespaces, counters, etc.

Given that the alternative is to just get the data and echo it as
"name val timestamp" to a tcp socket, I just don't see why I would
need all the extra complexity and fail that collectd provides.
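
To illustrate just how simple that alternative is, here is a rough
sketch of pushing one datapoint to a carbon line-receiver over its
plaintext protocol (host and metric name below are placeholders, not
anything this project ships):

import socket, time

def send_metric(name, value, host='carbon.example.host', port=2003):
    # plaintext carbon protocol: "<metric path> <value> <unix timestamp>\n"
    line = '{} {} {}\n'.format(name, value, int(time.time()))
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(line.encode())
    finally:
        sock.close()

send_metric('some.test.metric', 42)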

Other than collectd, I've experimented with [6]ganglia, but its
static schema is a no-go, and most of the stuff there doesn't make
sense in a graphite context.

The daemon binary is (weirdly) called "harvestd" because the "metricsd"
name is already used to refer to [7]another graphite-related daemon
(also, [8]there is "metrics" w/o the "d", probably others), and it is
too generic to be used w/o extra confusion, I think. That, and I seem
to lack the creativity to come up with a saner name ("reaperd" sounds
too MassEffect'ish these days).

References

1. http://www.freedesktop.org/wiki/Software/systemd
2. http://sebastien.godard.pagesperso-orange.fr/
3. https://github.com/mk-fg/trilobite
4. https://github.com/mk-fg/graphite-metrics/blob/master/graphite_metrics/harvestd.yaml
5. http://collectd.org/
6. http://ganglia.sourceforge.net/
7. https://github.com/kpumuk/metricsd
8. https://github.com/codahale/metrics

