Project description
Series2Graph++ (S2G++) is a time series anomaly detection algorithm based on the Series2Graph (S2G) and DADS algorithms. S2G++ can handle multivariate time series, whereas S2G and DADS support only univariate time series. Moreover, S2G++ borrows ideas from DADS to run distributed across a computer cluster. S2G++ is written in Rust and leverages the actix and actix-telepathy libraries.
Quick Start
Requirements
- Rust 1.58
- openblas
- (Docker)
To make openblas available to the Rust build process, do the following on Debian (Linux):
sudo apt install build-essential gfortran libopenblas-base libopenblas-dev gcc
Installation
From source
git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
cargo build
Docker
The base image akita/rust-base must be available on your machine.
git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
docker build -t s2gpp .
Usage
Parameters
Pattern:
s2gpp --local-host <IP:Port> --pattern-length <Int> --latent <Int> --query-length <Int> --rate <Int> --threads <Int> --cluster-nodes <Int> --score-output-path <Path> [main --data-path <Path> | sub --mainhost <IP:Port>]
S2G++ expects one of two sub-commands, each with its own parameters:
- main (The head computer in a cluster)
  - data-path (The path to the input time series)
- sub (The other computers in a cluster; only necessary in a distributed setting)
  - mainhost (The IP address of the main computer in a cluster)
The general parameters must be given before the sub-command:
- local-host (The IP address with port to bind the listener on.)
- pattern-length (Size of the sliding window. It is independent of the anomaly length, but should ideally be larger.)
- latent (Size of the latent embedding space. This space is the input for the subsequent PCA calculation.)
- query-length (Size of the sliding windows used to find anomalies (query subsequences). query-length must be >= pattern-length!)
- rate (Number of angles used to extract pattern nodes. A higher value leads to higher precision, but at the cost of increased computation time.)
- threads (Number of helper threads started besides the main thread; minimum 1.)
- cluster-nodes (Size of the computer cluster.)
- score-output-path (Path the scores are written to.)
- column-start-idx (How many columns to skip)
- column-end-idx (Until which column to use (exclusive). Can also take negative numbers to count from the end.)
- self-correction (Whether S2G++ will correct the direction of the time embedding if too few transactions are available)
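The rules above can be checked before a run is launched. The following is a minimal sketch and not part of S2G++: `build_s2gpp_command` is a hypothetical helper that assembles the argument list in the order given by the pattern (general parameters first, then the sub-command) and enforces the two documented constraints, query-length >= pattern-length and threads >= 1. All addresses, paths, and numeric values in the usage lines are made up for illustration.

```python
from typing import List, Optional


def build_s2gpp_command(
    local_host: str,
    pattern_length: int,
    latent: int,
    query_length: int,
    rate: int,
    threads: int,
    cluster_nodes: int,
    score_output_path: str,
    data_path: Optional[str] = None,
    mainhost: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list that follows the documented pattern."""
    if query_length < pattern_length:
        raise ValueError("query-length must be >= pattern-length")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Exactly one sub-command applies: 'main' needs data_path, 'sub' needs mainhost.
    if (data_path is None) == (mainhost is None):
        raise ValueError("give exactly one of data_path (main) or mainhost (sub)")

    cmd = [
        "s2gpp",
        "--local-host", local_host,
        "--pattern-length", str(pattern_length),
        "--latent", str(latent),
        "--query-length", str(query_length),
        "--rate", str(rate),
        "--threads", str(threads),
        "--cluster-nodes", str(cluster_nodes),
        "--score-output-path", score_output_path,
    ]
    if data_path is not None:
        cmd += ["main", "--data-path", data_path]
    else:
        cmd += ["sub", "--mainhost", mainhost]
    return cmd


# Illustrative two-node setup; every value below is an assumption.
main_cmd = build_s2gpp_command(
    "10.0.0.1:1992", 50, 16, 75, 100, 4, 2, "scores.csv",
    data_path="data/ts.csv",
)
sub_cmd = build_s2gpp_command(
    "10.0.0.2:1992", 50, 16, 75, 100, 4, 2, "scores.csv",
    mainhost="10.0.0.1:1992",
)
print(" ".join(main_cmd))
print(" ".join(sub_cmd))
```

The resulting lists can be handed to `subprocess.run` once the `s2gpp` binary is on the `PATH`.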
Input Format
The input time series is expected to be a CSV file with a header. Each column represents a channel of the time series.
Sometimes, time series files also include labels and an index. You can skip columns with the column-start-idx / column-end-idx range pattern. It behaves like Python ranges.
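To illustrate the range behaviour, here is a small sketch that assumes the two options map directly onto a Python slice `[start:end]`, as the comparison with Python ranges suggests. The file content and the `select_columns` helper are made up for this example and are not part of S2G++.

```python
import csv
import io


def select_columns(header, start_idx=0, end_idx=None):
    """Hypothetical helper: mimic column-start-idx / column-end-idx with a slice."""
    return header[start_idx:end_idx]


# Made-up input: an index column, two channels, and a label column.
raw = (
    "timestamp,channel_a,channel_b,is_anomaly\n"
    "0,0.1,1.5,0\n"
    "1,0.2,1.4,0\n"
    "2,9.9,7.3,1\n"
)
rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]

# Skip the first column (index) and stop before the last column (labels).
channels = select_columns(header, start_idx=1, end_idx=-1)
print(channels)  # ['channel_a', 'channel_b']
```

Under this assumption, a start index of 1 together with an end index of -1 would keep only the two channel columns of such a file.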
Python
We have wrapped the Rust code in a Python package that can be used without installing Rust.
Installation
PyPI
pip install s2gpp
Build with Docker
make build-docker
pip install wheels/s2gpp-*.whl
Build from Source
make install
Cite
Please cite this work when using it!
References
[1] P. Boniol and T. Palpanas. Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series. PVLDB (2020).
[2] J. Schneider, P. Wenig, and T. Papenbrock. Distributed detection of sequential anomalies in univariate time series. The VLDB Journal 30, 579–602 (2021).
TODO
- add bibtex
Built Distribution
Hashes for s2gpp-0.8.2-cp38-cp38-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 74f8aacba577e817ac2a999ae275cad8e8da694fb36a8d649a5aeb97c91f0ace
MD5 | 2463c3f74796dd6c8ad32ffb9f37b3c9
BLAKE2b-256 | a83f497284871e3e460df55fe1135c2d7b8d94b349d276fcc46a29cb307a3730