Flexible time series analysis library implementing Matrix Profile-related functionality.
Series Distance Matrix
This library implements the Series Distance Matrix framework, a flexible component-based framework that bundles various Matrix Profile-related techniques. These techniques can be used for (time) series mining and analysis. Example applications include:
- motif discovery: finding the best (imperfect) matching subsequence pair in a larger series
- discord discovery: finding the most dissimilar subsequence in a larger series
- finding repeating subsequences in one or more series (common and consensus motifs)
- visualizing series
- finding changing patterns
- ...
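To make the first two applications concrete, here is a minimal brute-force sketch in plain NumPy. It does not use this library. It builds the full z-normalized Euclidean distance matrix, masks trivial matches with an exclusion zone (assumed here to be m // 2), and takes the row-wise minimum as the matrix profile. The motif is then the subsequence with the lowest profile value, and the discord is the one with the highest. The function name `znorm_matrix_profile` and the synthetic data are illustrative only. Because the cost is quadratic in the series length, this approach is only suitable for small inputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def znorm_matrix_profile(series, m):
    """Brute-force matrix profile (z-normalized Euclidean distance).

    Builds the full distance matrix, so it is only suitable for small inputs.
    Assumes no subsequence is constant (its standard deviation would be 0).
    """
    subs = sliding_window_view(np.asarray(series, dtype=float), m)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    # dist[i, j] = distance between subsequence i and subsequence j
    dist = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))
    # Exclusion zone (assumed m // 2): ignore trivial matches with overlapping neighbours.
    idx = np.arange(len(subs))
    dist[np.abs(idx[:, None] - idx[None, :]) <= m // 2] = np.inf
    return dist.min(axis=1), dist.argmin(axis=1)

# Synthetic example: noise with the same sine pattern planted twice.
rng = np.random.default_rng(0)
data = rng.standard_normal(300)
pattern = 10 * np.sin(np.linspace(0, 2 * np.pi, 20))
data[50:70] += pattern
data[200:220] += pattern

mp, nn = znorm_matrix_profile(data, m=20)
motif = int(np.argmin(mp))    # one half of the best-matching (motif) pair
discord = int(np.argmax(mp))  # subsequence farthest from its nearest neighbour
print("motif:", motif, "<->", int(nn[motif]), "discord:", discord)
```

The motif pair should land on (or within a few positions of) the two planted copies at indices 50 and 200.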
The Series Distance Matrix framework was designed to integrate the various Matrix Profile variants that have been established over the years. It does so by separating the generation of the all-pair subsequence distances from their consumption, putting the focus on the distance matrix itself. Components can then be combined freely. This makes experiments easier and more flexible, and it removes the need to re-implement algorithms just to combine techniques efficiently.
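The generator/consumer split can be illustrated with a small, self-contained sketch. The classes `ColumnGenerator`, `MinConsumer`, `MaxConsumer` and the driver `run_all_columns` below are hypothetical and are not this library's API. A generator produces one column of the distance matrix at a time. Every registered consumer receives that same column, so each distance is computed only once, no matter how many analyses use it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

class ColumnGenerator:
    """Hypothetical generator: computes one distance matrix column on demand."""

    def __init__(self, series, m):
        self.subs = sliding_window_view(np.asarray(series, dtype=float), m)

    def calc_column(self, j):
        # Plain Euclidean distance from every subsequence to subsequence j.
        return np.sqrt(((self.subs - self.subs[j]) ** 2).sum(axis=1))

class MinConsumer:
    """Hypothetical consumer: running row-wise minimum, i.e. a matrix profile."""

    def __init__(self, n_sub, excl):
        self.values = np.full(n_sub, np.inf)
        self.excl = excl

    def process_column(self, j, column):
        column = column.copy()
        column[max(0, j - self.excl): j + self.excl + 1] = np.inf  # skip trivial matches
        self.values = np.minimum(self.values, column)

class MaxConsumer:
    """Hypothetical consumer: running row-wise maximum (distance to the farthest subsequence)."""

    def __init__(self, n_sub):
        self.values = np.zeros(n_sub)

    def process_column(self, j, column):
        self.values = np.maximum(self.values, column)

def run_all_columns(generator, consumers):
    for j in range(len(generator.subs)):
        column = generator.calc_column(j)       # each distance is generated once...
        for consumer in consumers:
            consumer.process_column(j, column)  # ...and shared by every consumer

series = np.random.default_rng(1).standard_normal(200)
m = 16
gen = ColumnGenerator(series, m)
n_sub = len(gen.subs)
mp_consumer, max_consumer = MinConsumer(n_sub, m // 2), MaxConsumer(n_sub)
run_all_columns(gen, [mp_consumer, max_consumer])
```

Because the distance matrix is symmetric, the row-wise aggregate built from columns matches the per-subsequence result. Adding another analysis only means adding another consumer.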
The following core techniques are implemented:
- Z-normalized Euclidean distance (including noise elimination)
- Euclidean distance
- (Left/Right) Matrix Profile
- Multidimensional Matrix Profile
- Contextual Matrix Profile
- Radius Profile
- Streaming and batch calculation
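The difference between the two distance measures can be shown with a short NumPy sketch. Z-normalized Euclidean distance ignores offset and scale. It also has a well-known closed form in terms of the Pearson correlation ρ between two length-m subsequences, d = sqrt(2m(1 − ρ)), which matrix profile algorithms commonly exploit. The helper names below are illustrative, and the noise elimination mentioned above is not covered here.

```python
import numpy as np

def znorm(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # population standard deviation (ddof=0)

def znorm_euclidean(x, y):
    """Euclidean distance between the z-normalized versions of x and y."""
    return float(np.linalg.norm(znorm(x) - znorm(y)))

def znorm_euclidean_via_corr(x, y):
    """Same distance from the Pearson correlation: sqrt(2 m (1 - rho))."""
    rho = np.corrcoef(x, y)[0, 1]
    return float(np.sqrt(max(0.0, 2 * len(x) * (1 - rho))))

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = 3 * x + 7                      # same shape, different scale and offset
z = np.array([5.0, 1.0, 4.0, 2.0, 3.0])

print(znorm_euclidean(x, y))       # ~0: z-normalization removes scale and offset
print(np.linalg.norm(x - y))       # plain Euclidean distance is large
print(znorm_euclidean(x, z), znorm_euclidean_via_corr(x, z))  # equal up to rounding
```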
The following Matrix Profile-related techniques are implemented:
- Valmod: find the top-1 motif in a series for each subsequence length in a given range
- Ostinato: find the top-1 (k of n) consensus motif in a collection of series
- Anytime Ostinato: find the radius profile for a collection of series
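The idea behind Ostinato's consensus motif can be sketched by brute force. For every subsequence of every series, the "radius" is taken as the largest of its nearest-neighbour distances into each *other* series of the collection. The consensus motif is the subsequence with the smallest radius. This is a quadratic illustration of the definition under that assumption, not the library's (anytime) algorithm. The function names and synthetic data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def znorm_subsequences(series, m):
    subs = sliding_window_view(np.asarray(series, dtype=float), m)
    return (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)

def radius_profiles(collection, m):
    """Brute force: radius of every subsequence of every series in the collection.

    radius(s) = max over the other series of the nearest-neighbour distance of s in that series.
    """
    subs = [znorm_subsequences(s, m) for s in collection]
    profiles = []
    for i, si in enumerate(subs):
        radius = np.zeros(len(si))
        for j, sj in enumerate(subs):
            if i != j:
                dist = np.sqrt(((si[:, None, :] - sj[None, :, :]) ** 2).sum(axis=2))
                radius = np.maximum(radius, dist.min(axis=1))
        profiles.append(radius)
    return profiles

def consensus_motif(collection, m):
    """Return (series index, start index, radius) of the subsequence with the smallest radius."""
    profiles = radius_profiles(collection, m)
    series_idx = min(range(len(profiles)), key=lambda i: profiles[i].min())
    start = int(np.argmin(profiles[series_idx]))
    return series_idx, start, float(profiles[series_idx][start])

# Three noisy series, each containing the same pattern at a different position.
rng = np.random.default_rng(2)
pattern = 10 * np.sin(np.linspace(0, 2 * np.pi, 20))
starts = [10, 60, 100]
collection = []
for s in starts:
    series = rng.standard_normal(150)
    series[s:s + 20] += pattern
    collection.append(series)

series_idx, start, radius = consensus_motif(collection, m=20)
```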
Basic Usage
Calculate a standard Matrix Profile using z-normalized Euclidean distance over a single series.
```python
import numpy as np
from distancematrix.generator.znorm_euclidean import ZNormEuclidean
from distancematrix.consumer.matrix_profile_lr import MatrixProfileLR
from distancematrix.calculator import AnytimeCalculator

data = np.random.randn(10000)
m = 100  # Subsequence length

calc = AnytimeCalculator(m, data)
gen_0 = calc.add_generator(0, ZNormEuclidean())  # Generator 0 works on channel 0
cons_mp = calc.add_consumer([0], MatrixProfileLR())  # Consumer consumes generator 0
calc.calculate_columns()

matrix_profile = cons_mp.matrix_profile()
```
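The "LR" in `MatrixProfileLR` refers to the left and right matrix profiles listed above. A plain NumPy sketch of what these mean, independent of the library: the left profile holds each subsequence's nearest-neighbour distance among *earlier* subsequences, the right profile among *later* ones, and their element-wise minimum is the regular matrix profile. The exclusion zone of m // 2 and the helper names are assumptions for illustration. How the library exposes the left and right profiles is not shown here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def distance_matrix(series, m):
    subs = sliding_window_view(np.asarray(series, dtype=float), m)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    return np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))

def left_right_matrix_profile(dist, excl):
    """Left profile: nearest neighbour among earlier subsequences; right: among later ones."""
    i, j = np.indices(dist.shape)
    allowed = np.abs(i - j) > excl  # drop trivial matches inside the exclusion zone
    left = np.where(allowed & (j < i), dist, np.inf).min(axis=1)
    right = np.where(allowed & (j > i), dist, np.inf).min(axis=1)
    return left, right

m = 16
dist = distance_matrix(np.random.default_rng(3).standard_normal(300), m)
left, right = left_right_matrix_profile(dist, excl=m // 2)
matrix_profile = np.minimum(left, right)  # the regular (two-sided) matrix profile
```

The first subsequences have no left neighbours and the last have no right neighbours, so those entries stay infinite.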
Calculate a Matrix Profile and (common-10) Radius Profile over a single series using Euclidean distance. A combined calculation is more efficient, because both consumers reuse the same calculated distances.
```python
import numpy as np
from distancematrix.generator.euclidean import Euclidean
from distancematrix.consumer.radius_profile import RadiusProfile
from distancematrix.consumer.matrix_profile_lr import MatrixProfileLR
from distancematrix.calculator import AnytimeCalculator

data = np.random.randn(10000)
m = 100  # Subsequence length

calc = AnytimeCalculator(m, data)
gen_0 = calc.add_generator(0, Euclidean())  # Generator 0 works on channel 0
cons_mp = calc.add_consumer([0], MatrixProfileLR())  # Consumer consumes generator 0
cons_rp = calc.add_consumer([0], RadiusProfile(10, m//2))  # Consumer consumes generator 0
calc.calculate_columns()

matrix_profile = cons_mp.matrix_profile()
radius_profile = cons_rp.values
```
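A brute-force NumPy sketch of a common-k radius profile, assuming (as the `RadiusProfile(10, m//2)` arguments suggest) that it records each subsequence's distance to its k-th nearest neighbour, with an exclusion zone of m // 2 around the subsequence itself. With k = 1 this reduces to the regular matrix profile. The library's consumer may treat overlapping neighbours differently. The function below is hypothetical and uses plain Euclidean distance to match the example above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def radius_profile(series, m, k, excl):
    """Distance from every subsequence to its k-th nearest neighbour (plain Euclidean).

    Only the query's own exclusion zone is masked; neighbours may overlap each other.
    """
    subs = sliding_window_view(np.asarray(series, dtype=float), m)
    dist = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=2))
    idx = np.arange(len(subs))
    dist[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf
    # k-th smallest value per row (k=1 gives the regular matrix profile).
    return np.partition(dist, k - 1, axis=1)[:, k - 1]

data = np.random.default_rng(4).standard_normal(400)
m = 20
rp10 = radius_profile(data, m, k=10, excl=m // 2)
mp = radius_profile(data, m, k=1, excl=m // 2)
```

A low common-10 radius marks a subsequence that has at least 10 close matches, which is why it can reveal frequently repeating patterns rather than just the single best pair.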
Calculate a partial multidimensional Matrix Profile over two data channels. Partial calculations return approximate results but are significantly faster. They are particularly useful in interactive workflows, because a partial calculation can be resumed later.
```python
import numpy as np
from distancematrix.generator.znorm_euclidean import ZNormEuclidean
from distancematrix.consumer.multidimensional_matrix_profile_lr import MultidimensionalMatrixProfileLR
from distancematrix.calculator import AnytimeCalculator

data = np.random.randn(2, 10000)
m = 100  # Subsequence length

calc = AnytimeCalculator(m, data)
gen_0 = calc.add_generator(0, ZNormEuclidean())  # Generator 0 works on channel 0
gen_1 = calc.add_generator(1, ZNormEuclidean())  # Generator 1 works on channel 1
cons_mmp = calc.add_consumer([0, 1], MultidimensionalMatrixProfileLR())  # Consumer consumes generator 0 & 1

# Calculate only 1/4 of all distances: faster, but returns approximated results
calc.calculate_diagonals(partial=0.25)
multidimensional_matrix_profile = cons_mmp.md_matrix_profile()

# Calculate the next quarter, so in total 1/2 of all distances are processed.
calc.calculate_diagonals(partial=0.5)
multidimensional_matrix_profile = cons_mmp.md_matrix_profile()
```
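Why a partial, diagonal-wise calculation gives a useful approximation can be seen in a single-channel NumPy sketch. Each diagonal of the (symmetric) distance matrix contributes distances spread across the whole series. A running minimum over the diagonals processed so far is therefore an upper bound on the exact matrix profile that tightens as more diagonals are added. The class `DiagonalMatrixProfile`, its `advance` method, and the random diagonal order are hypothetical and are not the library's implementation. `partial` is treated as a cumulative fraction, mirroring the comments in the example above, though here it counts diagonals rather than individual distances.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

class DiagonalMatrixProfile:
    """Hypothetical resumable matrix profile computed diagonal by diagonal (plain Euclidean)."""

    def __init__(self, series, m, excl, seed=0):
        self.subs = sliding_window_view(np.asarray(series, dtype=float), m)
        n_sub = len(self.subs)
        self.values = np.full(n_sub, np.inf)
        # Upper-triangle diagonals outside the exclusion zone, in random order.
        self.order = np.random.default_rng(seed).permutation(np.arange(excl + 1, n_sub))
        self.done = 0

    def advance(self, partial):
        """Process diagonals until a cumulative fraction `partial` of them is done."""
        target = int(round(partial * len(self.order)))
        for k in self.order[self.done:target]:
            # Distances between subsequence i and subsequence i + k, for all valid i.
            d = np.sqrt(((self.subs[:-k] - self.subs[k:]) ** 2).sum(axis=1))
            # The matrix is symmetric, so each diagonal updates both rows i and i + k.
            self.values[:-k] = np.minimum(self.values[:-k], d)
            self.values[k:] = np.minimum(self.values[k:], d)
        self.done = max(self.done, target)
        return self.values.copy()

calc = DiagonalMatrixProfile(np.random.default_rng(5).standard_normal(500), m=20, excl=10)
approx_quarter = calc.advance(0.25)  # quick, approximate (upper bound of the exact profile)
approx_half = calc.advance(0.5)      # resumes: only the next quarter of diagonals is processed
exact = calc.advance(1.0)
```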
Documentation
Documentation for the latest version is available online.
Building the documentation locally is done using Sphinx. Navigate to the docs folder, activate the conda environment defined in the environment file, and run:
```
make html
```
Installing
Using pip:
```
pip install seriesdistancematrix
```
Alternatively, clone this repository and run:
```
python setup.py clean build install
```
For local development (this lets you edit the code without having to reinstall the library):
```
python setup.py develop
```
Academic Usage
When using this library for academic purposes, please cite:
```
@article{series_distance_matrix,
  title = "A generalized matrix profile framework with support for contextual series analysis",
  journal = "Engineering Applications of Artificial Intelligence",
  volume = "90",
  pages = "103487",
  year = "2020",
  issn = "0952-1976",
  doi = "https://doi.org/10.1016/j.engappai.2020.103487",
  url = "http://www.sciencedirect.com/science/article/pii/S0952197620300087",
  author = "De Paepe, Dieter and Vanden Hautte, Sander and Steenwinckel, Bram and De Turck, Filip and Ongenae, Femke and Janssens, Olivier and Van Hoecke, Sofie"
}
```