Algorithm for Highly Efficient Detection of Correlation Anomalies in Multivariate Time Series
Project description
Series2Graph++ (S2G++) is a time series anomaly detection algorithm based on the Series2Graph (S2G) and the DADS algorithms. S2G++ can handle multivariate time series whereas S2G and DADS can cope with only univariate time series. Moreover, S2G++ takes ideas from DADS to run distributedly in a computer cluster. S2G++ is written in Rust and leverages the actix and actix-telepathy libraries.
Quick Start
Requirements
- Rust 1.58
- openblas
- (Docker)
To have openblas
available to the Rust build process, do the following on Debian (Linux):
sudo apt install build-essential gfortran libopenblas-base libopenblas-dev gcc
Installation
From source
git pull https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
cargo build
Docker
The base image akita/rust-base
must be available to your machine.
git pull https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
docker build s2gpp .
Usage (bin)
Parameters
Pattern:
s2gpp --local-host <IP:Port> --pattern-length <Int> --latent <Int> --query-length <Int> --rate <Int> --threads <Int> --cluster-nodes <Int> --score-output-path <Path> [main --data-path <Path> | sub --mainhost <IP:Port>]
S2G++ expects one of two sub-commands with its specific parameters:
main
(The head computer in a cluster)data-path
(The path to the input time series)
sub
(The other computers in a cluster; only necessary in a distributed setting)mainhost
(The ip-address to the main computer in a cluster)
Before these sub-commands are used, general parameters must be defined:
local-host
(The ip-address with port to bind the listener on.)pattern-length
(Size of the sliding window, independent of anomaly length, but should in the best case be larger.)latent
(Size of latent embedding space. This space is the input for the PCA calculation afterwards.)query-length
(Size of the sliding windows used to find anomalies (query subsequences). query-length must be >= pattern-length!)rate
(Number of angles used to extract pattern nodes. A higher value will lead to high precision, but at the cost of increased computation time.)threads
(Number of helper threads started besides the main thread. (min=1))cluster-nodes
(Size of the computer cluster.)score-output-path
(Path the score are written to.)column-start-idx
(How many columns to skip)column-end-idx
(Until which column to use (exclusive). Can also take negative numbers to count from the end.)self-correction
(Whether S2G++ will correct the direction of the time embedding if too few transactions are available)
Input Format
The input format of the time series is expected to be a CSV with header. Each column represents a channel of the timeseries.
Sometimes, time series files include also the labels and an index. You can skip columns with the column-start-idx
/ column-end-idx
range pattern. It behave like Python ranges.
Usage (lib)
Cargo.toml
[dependencies]
s2gpp = "1.0.2"
your Rust app
fn some_fn(timeseries: Array2<f32>) -> Result<Array1<f32>, ()> {
let params = s2gpp::Parameters::default();
let anomaly_score = s2gpp::s2gpp(params, Some(timeseries))?.unwrap();
Ok(anomaly_score)
}
Python
We have wrapped the Rust code in a Python package, that can be used without installing Rust.
Installation
PyPI
pip install s2gpp
Build with Docker
make build-docker
pip install wheels/s2gpp-*.whl
Build from Source
make install
Usage
Single Machine
from s2gpp import Series2GraphPP
import pandas as pd
ts = pd.read_csv("data/ts_0.csv").values
model = Series2GraphPP(pattern_length=100)
anomaly_scores = model.fit_predict(ts)
Distributed
from s2gpp import DistributedSeries2GraphPP
from pathlib import Path
# run on one machine
def main_node():
dataset_path = Path("data/ts_0.csv")
model = DistributedSeries2GraphPP.main(local_host="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
model.fit_predict(dataset_path)
# run on other machine
def sub_node():
model = DistributedSeries2GraphPP.sub(local_host="127.0.0.1:1993", mainhost="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
model.fit_predict()
Cite
Please cite this work, when using it!
@software{Wenig_Series2Graph_2022,
author = {Wenig, Phillip},
month = {6},
title = {{Series2Graph++}},
version = {1.0.2},
year = {2022}
}
References
[1] P. Boniol and T. Palpanas, Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series, PVLDB (2020) link
[2] Schneider, J., Wenig, P. & Papenbrock, T. Distributed detection of sequential anomalies in univariate time series. The VLDB Journal 30, 579–602 (2021). link
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
File details
Details for the file s2gpp-1.0.2-cp38-cp38-manylinux_2_28_x86_64.whl
.
File metadata
- Download URL: s2gpp-1.0.2-cp38-cp38-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 3.1 MB
- Tags: CPython 3.8, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9d558eec07a68d28c0cba15f3cc5b809b71af9ca6668ba8298a50b6cb7d40e86 |
|
MD5 | 0fa443a017bc680cd88998919d9f4d8f |
|
BLAKE2b-256 | c3c1d0857ceaa7cbc78e4497df8a704f421afcd9bc229b9a2448a277027f0b41 |