Measures the resource utilization of a specific process over time

# Overview

Measures the resource utilization of a specific process over time.

Also measures the utilization/saturation of system-wide resources: this helps putting the process-specific metrics into context.

Built for Linux. Windows and Mac OS support might come.

For a list of the currently supported metrics see below.

The name, Göffel, is German for spork:

Convenient, right?

## Highlights

• High sampling rate: the default sampling interval of 0.5 s makes narrow spikes visible.
• Can monitor a program subject to process ID changes (for longevity experiments where the monitored process occasionally restarts, for instance as of fail-over scenarios).
• Can run indefinitely. Has predictable disk space requirements (output file rotation and retention policy).
• Keeps your data organized: the time series data is written into a structured HDF5 file annotated with relevant metadata (also including program invocation time, system hostname, a custom label, the Goeffel software version, and others).
• Interoperability: output files can be read with any HDF5 reader such as PyTables and especially with pandas.read_hdf(). See tips and tricks.
• Values measurement correctness very highly (see technical notes).
• Comes with a data plotting tool separate from the data acquisition program.

The latest Goeffel release can be downloaded and installed from PyPI, via pip:

$pip install goeffel  pip can also install the latest development version of Goeffel: $ pip install git+https://github.com/jgehrcke/goeffel


# CLI tutorial

## goeffel: data acquisition

Invoke Goeffel with the --pid <pid> argument if the process ID of the target process is known. In this mode, goeffel stops the measurement and terminates itself once the process with the given ID goes away. Example:

$goeffel --pid 29019 [... snip ...] 190809-15:46:57.914 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01805 s [... snip ...] 190809-15:56:13.842 INFO: Cannot inspect process: process no longer exists (pid=29019) 190809-15:56:13.843 INFO: Wait for producer buffer to become empty 190809-15:56:13.843 INFO: Wait for consumer process to terminate 190809-15:56:13.854 INFO: Updated HDF5 file: wrote 13 sample(s) in 0.01077 s 190809-15:56:13.856 INFO: Sample consumer process terminated  For measuring beyond the process lifetime use --pid-command <command>. In the following example, I use the pgrep utility is for discovering the newest stress process: $ goeffel --pid-command 'pgrep stress --newest'

[... snip ...]

190809-15:47:47.337 INFO: New process ID from PID command: 25890

[... snip ...]

190809-15:47:57.863 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01805 s
190809-15:48:06.850 INFO: Cannot inspect process: process no longer exists (pid=25890)
190809-15:48:06.859 INFO: PID command returned non-zero

[... snip ...]

190809-15:48:09.916 INFO: PID command returned non-zero
190809-15:48:10.926 INFO: New process ID from PID command: 28086
190809-15:48:12.438 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01013 s
190809-15:48:22.446 INFO: Updated HDF5 file: wrote 20 sample(s) in 0.01062 s

[... snip ...]


In this mode, goeffel runs forever until manually terminated via SIGINT or SIGTERM. Process ID changes are detected by periodically running the discovery command until it returns a valid process ID on stdout. This is useful for longevity experiments where the monitored process occasionally restarts, for instance as of fail-over scenarios.

## goeffel-analysis: data inspection and visualization

Note: goeffel-analysis provides an opinionated and limited approach to visualizing data. For advanced and thorough data analysis I recommend building a custom (maybe even ad-hoc) data analysis pipeline using pandas and matplotlib, or using the tooling of your choice.

Also note: The command line interface provided by goeffel-analysis, especially for the plot commands, might change in the future. Suggestions for improvement are welcome, of course.

### goeffel-analysis inspect:

Use goeffel-analysis inspect <path-to-HDF5-file> for inspecting the contents of a Goeffel output file. Example:

$goeffel-analysis inspect mwst18-master1-journal_20190801_111952.hdf5 Measurement metadata: System hostname: int-master1-mwt18.foo.bar Invocation time (local): 20190801_111952 PID command: pgrep systemd-journal PID: None Sampling interval: 1.0 s Table properties: Number of rows: 24981 Number of columns: 38 Number of data points (rows*columns): 9.49E+05 First row's (local) time: 2019-08-01T11:19:53.613377 Last row's (local) time: 2019-08-01T18:52:49.954582 Time span: 7h 32m 56s Column names: unixtime ... snip ... system_mem_inactive  ### goeffel-analysis plot: quickly plot data from a single time series file The goeffel-analysis plot <path-to-hdf5-file> command plots a pre-selected set of metrics in an opinionated way. More metrics can be added to the plot with the --metric <metric-name> option. Example command: goeffel-analysis plot \ mwst18-master2-mesosmaster_20190801_112136.hdf5 \ --metric proc_num_ip_sockets_open  Example output figure: ### goeffel-analysis flexplot: generic plot command This command can be used for example for comparing multiple time series. Say you have monitored the same program across multiple replicas in a distributed system and would like to compare the time evolution of a certain metric across these replicas. Then the goeffel-analysis flexplot command is here to help, invoked with multiple --series arguments: $ goeffel-analysis flexplot \
--series mwst18-master1-journal_20190801_111952.hdf5 master1 \
--series mwst18-master2-journal_20190801_112136.hdf5 master2 \
--series mwst18-master3-journal_20190801_112141.hdf5 master3 \
--series mwst18-master4-journal_20190801_112151.hdf5 master4 \
--series mwst18-master5-journal_20190801_112157.hdf5 master5 \
--column proc_cpu_util_percent_total \
'CPU util (total) / %' \
'systemd journal CPU utilization ' 15 \
--subtitle 'MWST18, measured with Goeffel' \
--legend-loc 'upper center'


Example output figure:

# Background and details

## Prior art

This was born out of a need for solid tooling. We started with pidstat from sysstat, launched as pidstat -hud -p \$PID 1 1. We found that it does not properly account for multiple threads running in the same process and that various issues in that regard exist in this program across various versions (see here, here, and here).

The program cpustat open-sourced by Uber has a delightful README about the general measurement methodology and overall seems to be a great tool. However, it seems to be optimized for interactive usage (whereas we were looking for a robust measurement program which can be pointed at a process and then be left unattended for a significant while) and there does not seem to be a well-documented approach towards persisting the collected time series data on disk for later inspection.

The program psrecord (which effectively wraps psutil) has a similar fundamental approach as Goeffel; it however only measures few metrics, and it does not have a clear separation of concerns between persisting the data to disk, performing the measurement itself, and analyzing/plotting the data.

## Technical notes

• The core sampling loop does little work besides the measurement itself: it writes each sample to a queue. A separate process consumes this queue and persists the time series data to disk, for later inspection. This keeps the sampling rate predictable upon disk write latency spikes, or generally upon backpressure. This matters especially in cloud environments where we sometimes see fsync latencies of multiple seconds.

• The sampling loop is (supposed to be, feedback welcome) built so that timing-related systematic measurement errors are minimized.

• Goeffel tries to not asymmetrically hide measurement uncertainty. For example, you might see it measure a CPU utilization of a single-threaded process slightly larger than 100 %. That's simply the measurement error. In related tooling such as sysstat it seems to be common practice to asymmetrically hide measurement uncertainty by capping values when they are known to in theory not exceed a certain threshold (example).

• goeffel must be run with root privileges.

• The value -1 has a special meaning for some metrics (NaN, which cannot be represented properly in HDF5). Example: A disk write latency of -1 ms means that no write happened in the corresponding time interval.

• The highest meaningful sampling rate is limited by the kernel's timer and bookkeeping system.

# Measurands

Measurand is a word! This section attempts to describe the individual data columns ("metrics"), their units, and their meaning. There are four main categories:

### Timestamps

#### unixtime, isotime_local, monotime

The timestamp corresponding to the right boundary of the sampled time interval.

• unixtime encodes the wall time. It is a canonical Unix timestamp (seconds since epoch, double-precision floating point number); with sub-second precision and no timezone information. This is compatible with a wide range of tooling and therefore the general-purpose timestamp column for time series analysis (also see How to convert the unixtime column into a pandas.DatetimeIndex). Note: this is subject to system clock drift. In extreme case, this might go backward, have discontinuities, and be a useless metric. In that case, the monotime metric helps (see below).

• isotime_local is a human-readable version of the same timestamp as stored in unixtime. It is a 26 character long text representation of the local time using an ISO 8601 notation (and therefore also machine-readable). Like unixtime this metric is subject to system clock drift and might become pretty useless in extreme cases.

• monotime is based on a so-called monotonic clock source that is not subject to (accidental or well-intended) system clock drift. This column encodes most accurately the relative time difference between any two samples in the time series. The timestamps encoded in this column only make sense relative to each other; the difference between any two values in this column is a wall time difference in seconds, with sub-second precision.

### Process-specific metrics

#### proc_pid

The process ID of the monitored process. It can change if Goeffel was invoked with the --pid-command option.

Momentary state at sampling time.

#### proc_cpu_util_percent_total

The CPU utilization of the process in percent.

Mean over the past sampling interval.

If the inspected process is known to contain just a single thread then this can still sometimes be larger than 100 % as of measurement errors. If the process runs more than one thread then this can go far beyond 100 %.

This is based on the sum of the time spent in user space and in kernel space. For a more fine-grained picture the following two metrics are also available: proc_cpu_util_percent_user, and proc_cpu_util_percent_system.

#### proc_cpu_id

The ID of the CPU that this process is currently running on.

Momentary state at sampling time.

#### proc_ctx_switch_rate_hz

The rate of (voluntary and involuntary) context switches in Hz.

Mean over the past sampling interval.

#### proc_num_threads

The number of threads in the process.

Momentary state at sampling time.

#### proc_num_ip_sockets_open

The number of sockets currently being open. This includes IPv4 and IPv6 and does not distinguish between TCP and UDP, and the connection state also does not matter.

Momentary state at sampling time.

#### proc_num_fds

The number of file descriptors currently opened by this process.

Momentary state at sampling time.

#### proc_disk_read_throughput_mibps and proc_disk_write_throughput_mibps

The disk I/O throughput of the inspected process, in MiB/s.

Based on Linux' /proc/<pid>/io rchar and wchar. Relevant Linux kernel documentation (emphasis mine):

rchar: The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty IO and it is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache).

wcar: The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with rchar.

Mean over the past sampling interval.

#### proc_disk_read_rate_hz and proc_disk_write_rate_hz

The rate of read/write system calls issued by the process as inferred from the Linux /proc file system. The relevant syscr/syscw counters are as of now only documented with "read I/O operations, i.e. syscalls like read() and pread()" and "write I/O operations, i.e. syscalls like write() and pwrite()". Reference: Documentation/filesystems/proc.txt

Mean over the past sampling interval.

#### proc_mem_rss_percent

Fraction of process resident set size (RSS) relative to the machine's physical memory size in percent. This is equivalent to what top shows in the %MEM column.

Momentary state at sampling time.

#### proc_mem_rss, proc_mem_vms. proc_mem_dirty

Various memory usage metrics of the monitored process. See the psutil docs for a quick summary of what the values mean. However, note that the values need careful interpretation, as shown by discussions like this and this.

Momentary snapshot at sampling time.

### Disk metrics

Only collected if Goeffel is invoked with the --diskstats <DEV> argument. The resulting data column names contain the device name <DEV> (note however that dashes in <DEV> get removed when building the column names).

Note that the conclusiveness of some of these disk metrics is limited. I believe that this blog post nicely covers a few basic Linux disk I/O concepts that should be known prior to read a meaning into these numbers.

#### disk_<DEV>_util_percent

This implements iostat's disk %util metric.

I like to think of it as the ratio between the actual (wall) time elapsed in the sampled time interval, and the corresponding device's "busy time" in the very same time interval, expressed in percent. The iostat documentation describes this metric in the following words:

Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device).

This is the mean over the sampling interval.

Note: In the case of modern storage systems 100 % utilization usually does not mean that the device is saturated. I would like to quote Marc Brooker:

As a measure of general IO busyness %util is fairly handy, but as an indication of how much the system is doing compared to what it can do, it's terrible.

#### disk_<DEV>_write_latency_ms and disk_<DEV>_read_latency_ms

This implements iostat's w_await which is documented with

The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

On Linux, this is built using /proc/diskstats documented here. Specifically, this uses field 8 ("number of milliseconds spent writing") and field 5 ("number of writes completed"). Notably, the latter it is not the merged write count but the user space write count (which seems to be what iostat uses for calculating w_await).

This can be a useful metric, but please be aware of its meaning and limitations. To put this into perspective, in an experiment I have seen that the following can happen within a second of real-time (observed via iostat -x 1 | grep xvdh and via direct monitoring of /proc/diskstats): 3093 userspace write requests served, merged into 22 device write requests, yielding a total of 120914 milliseconds "spent writing", resulting in a mean write latency of 25 ms. But what do these 25 ms really mean here? On average, humans have less than two legs, for sure. The current implementation method reproduces iostat output, which was the initial goal. Suggestions for improvement are very welcome.

This is the mean over the sampling interval.

The same considerations hold true for r_await, correspondingly.

#### disk_<DEV>_merged_read_rate_hz and disk_<DEV>_merged_write_rate_hz

The merged read and write request rate.

The Linux kernel attempts to merge individual user space requests before passing them to the storage hardware. For non-random I/O patterns this greatly reduces the rate of individual reads and writes issued to disk.

Built using fields 2 and 6 in /proc/diskstats documented here.

This is the mean over the sampling interval.

#### disk_<DEV>_userspace_read_rate_hz and disk_<DEV>_userspace_write_rate_hz

The read and write request rate issued from user space point of view (before merges).

Built using fields 1 and 5 in /proc/diskstats documented here.

This is the mean over the sampling interval.

### System-wide metrics

system_loadavg1

system_loadavg5

system_loadavg15

system_mem_available

system_mem_total

system_mem_used

system_mem_free

system_mem_shared

system_mem_buffers

system_mem_cached

system_mem_active

system_mem_inactive

# Tips and tricks

## How to convert a Goeffel HDF5 file into a CSV file

I recommend to de-serialize and re-serialize using pandas. Example one-liner:

python -c 'import sys; import pandas as pd; df = pd.read_hdf(sys.argv[1], key="goeffel_timeseries"); df.to_csv(sys.argv[2], index=False)' goeffel_20190718_213115.hdf5.0001 /tmp/hdf5-as-csv.csv


Note that this significantly inflates the file size (e.g., from 50 MiB to 300 MiB).

## How to visualize and browse the contents of an HDF5 file

At some point, you might feel inclined to poke around in an HDF5 file created by Goeffel or to do custom data inspection/processing. In that case, I recommend using one of the various available open-source HDF5 tools for managing and viewing HDF5 files. One GUI tool I have frequently used is ViTables. Install it with pip install vitables and then do e.g.

vitables goeffel_20190718_213115.hdf5


This opens a GUI which allows for browsing the tabular time series data, for viewing the metadata in the file, for exporting data as CSV, for querying the data, and various other things.

## How to do quick data analysis using IPython and pandas

I recommend to start an IPython REPL:

pip install ipython  # if you have not done so yet
ipython


Load the HDF5 file into a pandas data frame:

In [1]: import pandas as pd
In [2]: df = pd.read_hdf('goeffel_timeseries__20190806_213704.hdf5', key='goeffel_timeseries')


From here you can do anything.

For example, let's have a look at the mean value of the actual sampling interval used in this specific Goeffel time series:

In [3]: df['unixtime'].diff().mean()
Out[3]: 0.5003192476604296


Or, let's see how many threads the monitored process used at most during the entire observation period:

In [4]: df['proc_num_threads'].max()
Out[4]: 1


## How to convert the unixtime column into a pandas.DatetimeIndex

The HDF5 file contains a unixtime column which contains canonical Unix timestamp data ready to be consumed by a plethora of tools. If you are like me and like to use pandas then it is good to know how to convert this into a native pandas.DateTimeIndex:

In [1]: import pandas as pd
In [2]: df = pd.read_hdf('goeffel_timeseries__20190807_174333.hdf5', key='goeffel_timeseries')

# Now the data frame has an integer index.
In [3]: type(df.index)
Out[3]: pandas.core.indexes.numeric.Int64Index

# Parse unixtime column.
In [4]: timestamps = pd.to_datetime(df['unixtime'], unit='s')

# Replace the index of the data frame.
In [5]: df.index = timestamps

# Now the data frame has a DatetimeIndex.
In [6]: type(df.index)
Out[6]: pandas.core.indexes.datetimes.DatetimeIndex

# Let's look at some values.
In [7]: df.index[:5]
Out[7]:
DatetimeIndex(['2019-08-07 15:43:33.798929930',
'2019-08-07 15:43:34.300590992',
'2019-08-07 15:43:34.801260948',
'2019-08-07 15:43:35.301798105',
'2019-08-07 15:43:35.802226067'],
dtype='datetime64[ns]', name='unixtime', freq=None)


# Valuable references

External references on the subject matter that I found useful during development.

About system performance measurement, and kernel time bookkeeping:

Others:

## Project details

Uploaded source