Distribution-based anomaly detection for time series.

Distribution-based anomaly detection for time series data.

>>> from fossa import LastWindowX2AnomalyDetector
>>> clf = LastWindowX2AnomalyDetector(p_threshold=0.005, normalize=True)
>>> clf.fit(historic_data_df)
>>> clf.predict(new_data)
                     direction
date       category
2018-06-01 hockey          1.0
           football        0.0
           soccer         -1.0
           tennis          0.0

1 Installation

pip install fossa

2 Features

  • scikit-learn-like classifier API.

  • Pickle-able classifier objects.

  • Pure Python.

  • Supports Python 3.5+.

  • Fully tested.

3 Approach

Built on top of scipy.stats.power_divergence: http://lagrange.univ-lyon1.fr/docs/scipy/0.17.1/generated/scipy.stats.power_divergence.html#scipy.stats.power_divergence

4 Use

4.1 Data Format

All anomaly detectors are designed to receive, as the fit parameter, a pandas DataFrame with a two-level MultiIndex: the first level indexes time (one entry per time window) and the second indexes category or topic. The dataframe has a single column of a numeric dtype, giving the frequency of each category in each time window.

When detecting trends, a similarly indexed dataframe with detection results is returned, giving detected trends per time window and category.
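
As a minimal sketch of the format described above, the following builds such a dataframe with pandas. The index level names date and category follow the example output at the top of this README; the column name value and all the counts are made up for illustration and are not taken from fossa.

```python
import pandas as pd

# Hypothetical per-window category counts; the numbers are made up.
records = [
    ("2018-05-30", "hockey", 12),
    ("2018-05-30", "soccer", 30),
    ("2018-05-31", "hockey", 15),
    ("2018-05-31", "soccer", 28),
    ("2018-05-31", "tennis", 7),
]

# First index level: time window. Second index level: category.
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp(day), category) for day, category, _ in records],
    names=["date", "category"],
)

# A single numeric column holding the frequency of each category per window.
historic_data_df = pd.DataFrame(
    {"value": [count for _, _, count in records]},
    index=index,
)
```

Note that the second window contains a category (tennis) that the first one lacks; the fit method described in the API section is documented to accept this.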

4.2 API

All anomaly detector objects in fossa have an identical API:

  • fit - Receives a history of time-windowed distributions to train on and fits the detector on it (see the Data Format section for the exact format). The set of categories may differ between time windows, and between the historic windows and the windows given for detection; detection is done for the union of categories over all committee and new time windows.

  • partial_fit - The same as fit, but can also incrementally fit an already-fitted detector without necessarily discarding previously fitted data. Detectors that do not support incremental fitting raise a NotImplementedError exception when this method is called.

  • detect_trends - Receives a new dataframe (in the correct format) and detects, for each of the time windows in it, trends for each category. In addition to the direction column (indicating trend direction, with -1 for a downward trend, 0 for no trend and 1 for an upward trend), the returned dataframe might contain additional columns detailing detection confidence or probability, such as p-values or committee vote results.

  • predict - Like detect_trends, except the returned dataframe always contains only a single column of detected trend directions.
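
The fit description above says that detection covers the union of categories across windows. One way to picture this, sketched with pandas only and not taken from fossa's own code, is to pivot the two-level dataframe so that each time window becomes a row and each category a column. Filling the missing window/category pairs with 0 is an assumption made here for illustration, as is the column name value.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2018-05-31"), "hockey"),
        (pd.Timestamp("2018-05-31"), "soccer"),
        (pd.Timestamp("2018-06-01"), "hockey"),
        (pd.Timestamp("2018-06-01"), "tennis"),
    ],
    names=["date", "category"],
)
df = pd.DataFrame({"value": [15, 28, 20, 7]}, index=index)

# One row per time window, one column per category in the union.
# Missing (window, category) pairs are filled with 0 (an assumption).
wide = df["value"].unstack("category", fill_value=0)
union_of_categories = sorted(wide.columns)
```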

4.3 Anomaly Detectors

Chi-Square-based Detectors

This family of anomaly detectors all operate similarly: every detector compares new time windows to a set of committee windows that represent its idea of the relevant history and characteristic behaviour of the data. One detector might look at the same hour on the same weekday across several weeks, while another might look at the same hour on each of the last 10 or 20 days, or at the preceding few hours.

For each of the time windows given to the detect_trends or predict methods, a one-vs-all distribution is generated for each of the categories in the window (and is possibly normalized, depending on the specific detector and its initialization parameters). Then, for each of these distributions, chi-squared tests are performed between it and the corresponding distribution in each of the committee time windows. Each committee member “votes” on whether a trend is detected, and a decision is reached by some pre-set voting rule (for example, majority vote).
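
The procedure above can be sketched roughly as follows, using scipy.stats.power_divergence directly. This is an illustration under stated assumptions, not fossa's implementation: the function names one_vs_all, member_vote and majority_direction are hypothetical, windows are plain dicts of made-up counts, the reference distribution is rescaled to the new window's total, and the direction is taken from whether the category's count is above or below the expected count.

```python
import numpy as np
from scipy.stats import power_divergence


def one_vs_all(counts, category):
    """Return [count of the category, count of everything else]."""
    own = counts.get(category, 0)
    return np.array([own, sum(counts.values()) - own], dtype=float)


def member_vote(new_counts, member_counts, category, p_threshold=0.005):
    """Vote 1 (up), -1 (down) or 0 (no trend) for one committee window."""
    observed = one_vs_all(new_counts, category)
    reference = one_vs_all(member_counts, category)
    # Rescale the reference so both distributions have the same total,
    # which the chi-squared test requires (an assumption of this sketch).
    expected = reference / reference.sum() * observed.sum()
    _, p_value = power_divergence(observed, f_exp=expected, lambda_="pearson")
    if p_value >= p_threshold:
        return 0
    return 1 if observed[0] > expected[0] else -1


def majority_direction(new_counts, committee, category, p_threshold=0.005):
    """Return the direction backed by more than half of the committee."""
    votes = [member_vote(new_counts, m, category, p_threshold) for m in committee]
    for direction in (1, -1):
        if votes.count(direction) > len(committee) / 2:
            return direction
    return 0


# Made-up committee windows and a new window in which hockey has surged.
committee = [
    {"hockey": 10, "soccer": 30, "tennis": 10},
    {"hockey": 12, "soccer": 28, "tennis": 10},
    {"hockey": 9, "soccer": 31, "tennis": 10},
]
new_window = {"hockey": 30, "soccer": 10, "tennis": 10}

directions = {c: majority_direction(new_window, committee, c) for c in new_window}
```

With these counts every committee member flags hockey as rising and soccer as falling, while tennis matches its historic share exactly and gets no trend. Other voting rules (for example, requiring unanimity) would only change majority_direction.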

5 Contributing

The current package maintainer (and one of the authors) is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome.

5.1 Installing for development

Clone:

git clone git@github.com:shaypal5/fossa.git

Install in development mode, including test dependencies:

cd fossa
pip install -e '.[test]'

5.2 Running the tests

To run the tests use:

cd fossa
pytest

5.3 Adding documentation

The project is documented using the numpy docstring conventions, which were chosen as they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings. When documenting code you add to this project, follow these conventions.

Additionally, if you update this README.rst file, use python setup.py checkdocs to validate that it compiles.

6 Credits

Created by Shay Palachy (shay.palachy@gmail.com) and Omri Mendels.
