Library for locating changes in time series by grouping results.
Origins
This library was developed as the anomaly detection logic for the “PAL” component of the CSIT (Continuous System and Integration Testing) project of fd.io (“Fast Data”), one of the LFN (Linux Foundation Networking) projects. It is still used primarily in PAL’s successor, CSIT-DASH.
To make this code available on PyPI (Python Package Index), setuptools packaging files (later converted to pyproject.toml) were added. After some discussion, that directory ended up containing only a symlink to the original location of the tightly coupled CSIT code.
Usage
High level description
The main function is “classify”, which partitions the input sequence of values into consecutive “groups” so that the standard deviation of the samples within each group is small.
The design decisions behind the final algorithm are heavily influenced by typical results seen in CSIT testing, so the best description of the inner workings of the classification procedure is in the CSIT documentation, especially the Minimum Description Length sub-chapter of trend analysis.
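To make the idea of consecutive groups concrete, here is a toy sketch. It is not the Minimum Description Length procedure the library uses. The function name toy_partition and its tolerance parameter are hypothetical, introduced only for illustration. The sketch starts a new group whenever a value strays too far from the running average of the current group.

```python
from statistics import fmean


def toy_partition(values, tolerance):
    """Split values into consecutive groups.

    Toy heuristic for illustration only; jumpavg itself relies on
    Minimum Description Length reasoning, not on a fixed tolerance.
    """
    groups = []
    for value in values:
        # Extend the current group while the value stays near its average.
        if groups and abs(value - fmean(groups[-1])) <= tolerance:
            groups[-1].append(value)
        else:
            # Otherwise the value opens a new consecutive group.
            groups.append([value])
    return groups


groups = toy_partition([2.1, 3.1, 3.2], tolerance=0.5)
print(groups)
```

Each group is a run of neighbouring samples, and the order of the input is preserved. A fixed tolerance is the main thing this sketch gets wrong compared to a description-length criterion, which weighs the cost of adding a group against the cost of a larger spread.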
Example
A very basic example, showing some inputs and the structure of output. The output is a single line, here shown wrapped for readability.
>>> from jumpavg import classify
>>> classify(values=[2.1, 3.1, 3.2], unit=0.1)
BitCountingGroupList(max_value=3.2, unit=0.1, group_list=[BitCountingGroup(run_list=
[2.1], max_value=3.2, unit=0.1, comment='normal', prev_avg=None, stats=AvgStdevStats
(size=1, avg=2.1, stdev=0.0), cached_bits=6.044394119358453), BitCountingGroup(run_l
ist=[3.1, 3.2], max_value=3.2, unit=0.1, comment='progression', prev_avg=2.1, stats=
AvgStdevStats(size=2, avg=3.1500000000000004, stdev=0.050000000000000044), cached_bi
ts=10.215241265313393)], bits_except_last=6.044394119358453)
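The size, avg and stdev fields in the output above can be cross-checked with the standard library alone. This sketch does not call jumpavg. It assumes the reported stdev is the population standard deviation, which is consistent with the value shown for the second group.

```python
from statistics import fmean, pstdev

# The two groups reported by classify() in the example above.
groups = [[2.1], [3.1, 3.2]]

# Recompute (size, average, population stdev) for each group.
stats = [(len(run), fmean(run), pstdev(run)) for run in groups]

for size, avg, stdev in stats:
    print(f"size={size} avg={avg:.3f} stdev={stdev:.3f}")
```

The second group's comment is 'progression' and its prev_avg equals the first group's average, which suggests that each group is labelled relative to the group before it.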
Change log
0.4.2: Should no longer divide by zero on empty inputs.
0.4.1: Fixed a bug where a large stdev was not penalized enough (and not at all for stats of size 2).
0.4.0: Added “unit” and “sbps” parameters so information content is reasonable even if sample values are below one.
0.3.0: Considerable speedup by avoiding unneeded copy. Dataclasses used. Mostly API compatible, but repr looks different.
0.2.0: API incompatible changes. Targeted to Python 3 now.
0.1.3: Changed stdev computation to avoid negative variance due to rounding errors.
0.1.2: First version published in PyPI.