Standalone Graphite metric data collectors for various stuff that's not handled (or handled poorly) by other monitoring daemons
Core of the project is a simple daemon (harvestd), which collects metric values and sends them to a Graphite carbon daemon (and/or other configured destinations) once per interval.
Includes separate data collection components (“collectors”) for processing of:
Additional metric collectors can be added via the setuptools/distribute graphite_metrics.collectors entry point and configured via the common configuration mechanism (see the sketch below for how such a plugin package might be declared).
Same goes for the datapoint sinks (destinations - it doesn't have to be a single carbon host), datapoint processors (which mangle/rename/filter datapoints) and the main loop, which can be replaced with an async (in the simple case - threads or gevent) or buffering loop.
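For illustration, here's roughly how a third-party package might register a collector under that entry point group. This is a sketch only - the package and module names are made up, and the exact interface the entry-point object must expose is defined by harvestd's plugin loader, so see the shipped collectors for reference:

# setup.py of a hypothetical add-on package - the only
# graphite-metrics-specific part is the entry_points group name.
from setuptools import setup, find_packages

setup(
    name='my-metrics-addons',
    version='0.1',
    packages=find_packages(),
    entry_points={
        'graphite_metrics.collectors': [
            'my_collector = my_metrics_addons.my_collector',
        ],
    },
)

Sinks and processors would presumably be registered the same way under their own similarly-named entry point groups.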
Currently supported backends (data destinations, sinks):
It’s a regular package for Python 2.7 (not 3.X).
Using pip is the best way:
% pip install graphite-metrics
If you don’t have it, use:
% easy_install pip
% pip install graphite-metrics
Alternatively (see also):
% curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python
% pip install graphite-metrics
Or, if you absolutely must:
% easy_install graphite-metrics
But, you really shouldn’t do that.
Current-git version can be installed like this:
% pip install 'git+https://github.com/mk-fg/graphite-metrics.git#egg=graphite-metrics'
Basic requirements are (pip or easy_install should handle these for you):
Some shipped modules require additional packages to function (which can be installed automatically by specifying extras on install, example: pip install 'graphite-metrics[collectors.cgacct]'):
First run should probably look like this:
% harvestd --debug -s dump -i10
That will use the default configuration with all the collectors enabled, dumping data to stderr (only the "dump" data-sink enabled) and using a short interval (10s, via -i10) between collected datapoints, dumping additional info about what's being done.
After that, see the default harvestd.yaml configuration file, which contains configuration for all the loaded collectors and can/should be overridden using the -c option.
Note that you don’t have to specify all the options in each override-config, just the ones you need to update.
For example, a simple configuration file (say, /etc/harvestd.yaml) just to specify the carbon host and log line format (dropping the timestamp, since it will be piped to syslog or systemd-journal anyway) might look like this:
sinks:
  carbon_socket:
    host: carbon.example.host

logging:
  formatters:
    basic:
      format: '%(levelname)s :: %(name)s: %(message)s'
And be started like this:

% harvestd -c /etc/harvestd.yaml
See harvestd --help output for a full CLI reference.
While most stock collectors here pull metrics from /proc once per interval, same as other similar tools, be especially wary of the ones that process memory metrics, like the /proc/slabinfo and cgroup value parsers.
So-called "files" in /proc are actually callbacks in the kernel code, and to get a consistent reading of the whole slabinfo table, at least some versions of the kernel have to lock some operations, causing unexpected lags and delays across the whole system under some workloads (e.g. on memcache servers).
The cgroup data collector processes lots of files, potentially dozens, hundreds or even thousands per collection cycle, which may cause similar issues.
Special thanks to Marcus Barczak for pointing that out.
Most other tools can (in theory) collect this data, and I’ve used collectd for most of these, but it:
Initially I tried to address these issues (i.e. implement the same collectors) with collectd plugins, but its Python plugin system turned out to be leaking RAM, and collectd itself segfaulted about once a day, even in the latest releases, though probably because of issues in C plugins.
Plus, collectd data requires post-processing anyway - proper metric namespaces, counter handling, etc.
Given that the alternative is to just get the data and echo it as "name value timestamp" lines to a TCP socket, I decided to avoid the extra complexity and problems that collectd brings.
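To make that point concrete, a minimal sketch of the carbon plaintext protocol described above - newline-terminated "name value timestamp" lines over TCP (the host and metric name here are made up; 2003 is carbon's default plaintext port):

import socket, time

# One datapoint in carbon's plaintext format: "name value timestamp\n"
line = 'some.metric.name {} {}\n'.format(42, int(time.time()))

sock = socket.create_connection(('carbon.example.host', 2003))
sock.sendall(line)
sock.close()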
Other than collectd, I've experimented with ganglia, munin and some other monitoring infrastructures, but found little justification for re-using their aggregation and/or collection infrastructure, if not outright limitations (like the static data schema in ganglia).
The daemon binary is (weirdly) called "harvestd" because the "metricsd" name is already used by another related daemon (there's also a "metrics" without the "d", and probably others), and is too generic to use without extra confusion, I think. That, and I seem to lack the creativity to come up with a saner name ("reaperd" sounds too MassEffect'ish these days).