
A utility to monitor job resources in an HPC environment, especially OAR


Colmet - Collecting metrics about jobs running in a distributed environment


Colmet is a monitoring tool that collects metrics about jobs running in a
distributed environment, especially on clusters and grids. It currently
provides several backends:

- taskstats: fetch task metrics from the Linux kernel
- stdout: display the metrics on the terminal
- zeromq: transport the metrics across the network
- hdf5: store the metrics on the filesystem



* a Linux kernel that supports

  - Taskstats

* Python version 2.7 or newer

  - python-zmq 2.2.0 or newer
  - python-tables 3.3.0 or newer
  - python-pyinotify 0.9.3-2 or newer


You can install, upgrade, or uninstall colmet with these commands::

    $ pip install [--user] colmet
    $ pip install [--user] --upgrade colmet
    $ pip uninstall colmet

Or from git (latest development version)::

    $ pip install [--user] git+

Or if you have already pulled the sources::

    $ pip install [--user] path/to/sources

Or if you don't have pip::

    $ easy_install colmet


For the nodes::

    sudo colmet-node -vvv --zeromq-uri tcp://

For the collector::

    colmet-collector -vvv --zeromq-bind-uri tcp:// --hdf5-filepath /data/colmet.hdf5 --hdf5-complevel 9

You will see the number of counters retrieved in the debug log.

For more information, please refer to the help of these scripts (``--help``).


This product is distributed under the GNU General Public License version 2.
Please read the file LICENSE for more information about the license.


Version 0.5.4

Released on January 19th 2018

- Added an hdf5 extractor script for the OAR RESTful API
- Added infiniband backend
- Added lustre backend
- Fixed cpuset_rootpath default always appended

Version 0.5.3

Released on April 29th 2015

- Removed an unnecessary lock from the collector to prevent colmet from waiting forever
- Removed (async) zmq eventloop and added ``--sample-period`` to the collector.
- Fixed some bugs related to the hdf5 file

Version 0.5.2

Released on Apr 2nd 2015

- Fixed python syntax error

Version 0.5.1

Released on Apr 2nd 2015

- Fixed error about missing ``requirements.txt`` file in the sdist package

Version 0.5.0

Released on Apr 2nd 2015

- Don't run colmet as a daemon anymore
- Maintained compatibility with zmq 3.x/4.x
- Dropped ``--zeromq-swap`` (swap was dropped from zmq 3.x)
- Handled zmq name change from HWM to SNDHWM and RCVHWM
- Fixed requirements
- Dropped python 2.6 support

Version 0.4.0

- Saved metrics in a new HDF5 file if colmet is reloaded, in order to avoid HDF5 data corruption
- Handled HUP signal to reload ``colmet-collector``
- Removed ``hiwater_rss`` and ``hiwater_vm`` collected metrics.

Version 0.3.1

- New metrics ``hiwater_rss`` and ``hiwater_vm`` for taskstats
- Worked with pyinotify 0.8
- Added ``--disable-procstats`` option to disable procstats backend.

Version 0.3.0

- Divided colmet package into three parts

  - colmet-node: retrieves data from taskstats and procstats and sends it to
    collectors with ZeroMQ
  - colmet-collector: a collector that stores data received by ZeroMQ in an
    hdf5 file
  - colmet-common: the common colmet part

- Added some parameters to the ZeroMQ backend to prevent a memory overflow
- Simplified the command line interface
- Dropped rrd backend because it is not yet working
- Added ``--buffer-size`` option for collector to define the maximum number of
  counters that colmet should queue in memory before pushing them to output
- Handled SIGTERM and SIGINT to terminate colmet properly
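
The ``--buffer-size`` behaviour described in the 0.3.0 notes above (queue up to
a maximum number of counters in memory, then push them to the output) might
look roughly like the sketch below. This is only an illustration of the idea,
not colmet's actual code: the ``CounterBuffer`` class, its ``push`` callback
and the flush-when-full policy are all assumptions made for the example.

```python
class CounterBuffer(object):
    """Hypothetical helper: queue counters in memory, push them in batches."""

    def __init__(self, max_size, push):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size  # assumed meaning of --buffer-size
        self.push = push          # callable that receives a list of counters
        self.pending = []

    def add(self, counter):
        """Queue one counter and flush as soon as the buffer is full."""
        self.pending.append(counter)
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self):
        """Push whatever is queued to the output, if anything."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.push(batch)
```

A caller would also invoke ``flush()`` once on shutdown (for example from a
SIGTERM or SIGINT handler) so that a partially filled buffer is not lost.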

Version 0.2.0

- Added options to enable hdf5 compression
- Added support for multiple jobs by scanning the cgroup path
- Used Inotify events for job list update
- Don't filter packets if no job_id range was specified, especially with zeromq
- Waited for the cgroup_path folder to be created before scanning the list of jobs
- Added procstat for node monitoring through a fictitious job with 0 as identifier
- Used absolute times to take measurements instead of a delay between
  measurements, to avoid drift of the measurement time
- Added a workaround for when a new cgroup is created without any process in it
  (monitoring is suspended until a process is launched)
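
The absolute-time scheduling mentioned in the 0.2.0 notes above can be
illustrated with a short sketch. It is not taken from colmet's sources: the
function names, the ``clock`` and ``sleep`` parameters, and the use of
``time.monotonic`` (which requires Python 3.3 or newer) are assumptions made
for the example. The point is that each deadline is computed from the start
time, so the time spent measuring does not accumulate as drift.

```python
import time


def absolute_deadlines(start, period, count):
    """Return `count` deadlines derived from a fixed start time."""
    return [start + i * period for i in range(1, count + 1)]


def sample_at_fixed_times(measure, period, count,
                          clock=time.monotonic, sleep=time.sleep):
    """Call `measure()` at start + k * period (hypothetical helper).

    Sleeping for `period` after each measurement would shift every later
    sample by the measurement's duration; sleeping until an absolute
    deadline does not.
    """
    start = clock()
    results = []
    for deadline in absolute_deadlines(start, period, count):
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        results.append(measure())
    return results
```

If a measurement takes longer than one period, this sketch simply takes the
next sample immediately; other policies (such as skipping missed deadlines)
are possible.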

Version 0.0.1

- Conception
