An efficient aggregation method for the symbolic representation of temporal data
Project description
fABBA is a fast and accurate symbolic representation method for temporal data. It is based on a polygonal chain approximation of the time series followed by an aggregation of the polygonal pieces into groups. The aggregation process is sped up by sorting the polygonal pieces and exploiting early termination conditions. In contrast to the ABBA method [S. Elsworth and S. Güttel, Data Mining and Knowledge Discovery, 34:1175-1200, 2020], fABBA avoids repeated within-cluster-sum-of-squares computations, which reduces its computational complexity significantly. Furthermore, fABBA is fully tolerance-driven and does not require the number of time series symbols to be specified by the user.
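The idea of sorting the pieces and terminating early can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function name greedy_aggregate, the representation of pieces as (length, increment) pairs, and the choice to sort by Euclidean norm are assumptions made for illustration only.

```python
import math

def greedy_aggregate(pieces, alpha):
    """Group 2-D pieces greedily and return one group label per piece.

    Illustrative sketch: pieces are assumed to be already scaled
    (length, increment) pairs, sorted here by their Euclidean norm.
    """
    norms = [math.hypot(p[0], p[1]) for p in pieces]
    order = sorted(range(len(pieces)), key=lambda i: norms[i])
    labels = [-1] * len(pieces)
    next_label = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # piece already belongs to an earlier group
        labels[i] = next_label  # piece i starts a new group
        for j in order[pos + 1:]:
            # Early termination: ||p_j - p_i|| >= ||p_j|| - ||p_i||, so once
            # the norm gap exceeds alpha, no later piece can be within alpha.
            if norms[j] - norms[i] > alpha:
                break
            if labels[j] == -1 and math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = next_label
        next_label += 1
    return labels

pieces = [(1.0, 0.0), (1.05, 0.0), (5.0, 0.0), (5.02, 0.01), (0.0, 1.0)]
labels = greedy_aggregate(pieces, alpha=0.1)
# labels == [0, 0, 2, 2, 1]
```

Note that the number of groups is an output of the procedure rather than an input: it follows from alpha alone, and no sum-of-squares objective is re-evaluated while the groups are formed.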
Install
fABBA has the following essential dependencies for its functionality:
cython
numpy
scipy
requests
To install the current release via pip use:
pip install fABBA
Download this repository:
git clone https://github.com/nla-group/fABBA.git
Examples
Compress and reconstruct a time series
The following example approximately transforms a time series into a symbolic string representation (fit_transform) and then converts the string back into a numerical format (inverse_transform). fABBA essentially requires two parameters, tol and alpha. The tolerance tol determines how closely the polygonal chain approximation follows the original time series. The parameter alpha controls how similar time series pieces need to be in order to be represented by the same symbol. A smaller tol means that more polygonal pieces are used and the polygonal chain approximation is more accurate, but it also increases the length of the string representation. A smaller alpha typically results in a larger number of symbols.
The choice of parameters depends on the application, but in practice, one often just wants the polygonal chain to mimic the key features of the time series and not to approximate any noise. In this example the time series is a sine wave and the chosen parameters result in the symbolic representation BbAaAaAaAaAaAaAaC. Note how the periodicity in the time series is nicely reflected in repetitions in its string representation.
import numpy as np
import matplotlib.pyplot as plt
from fABBA import fabba_model
ts = [np.sin(0.05*i) for i in range(1000)] # original time series
fabba = fabba_model(tol=0.1, alpha=0.1, sorting='2-norm', scl=1, verbose=0)
string = fabba.fit_transform(ts) # string representation of the time series
print(string) # prints BbAaAaAaAaAaAaAaC
inverse_ts = fabba.inverse_transform(string, ts[0]) # numerical time series reconstruction
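To make the role of tol more concrete, here is a minimal pure-Python sketch of a tolerance-driven polygonal chain approximation. It does not use the fABBA package: the function name compress_sketch, the (length, increment) piece format, and the error bound (n - 1) * tol**2 on the squared deviation of a piece spanning n steps are assumptions chosen for illustration.

```python
import math

def compress_sketch(ts, tol):
    """Greedy polygonal chain approximation returning (length, increment) pieces."""
    pieces = []
    start = 0
    while start < len(ts) - 1:
        end = start + 1  # a piece spanning a single step has zero error
        while end + 1 < len(ts):
            cand = end + 1
            n = cand - start
            # Squared deviation between the data and the straight line
            # connecting ts[start] and ts[cand].
            err = sum(
                (ts[start] + (ts[cand] - ts[start]) * k / n - ts[start + k]) ** 2
                for k in range(1, n)
            )
            if err > (n - 1) * tol ** 2:
                break  # extending the piece would violate the tolerance
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces

ts = [math.sin(0.05 * i) for i in range(1000)]
coarse = compress_sketch(ts, tol=0.1)   # fewer, longer pieces
fine = compress_sketch(ts, tol=0.01)    # more, shorter pieces
print(len(coarse), len(fine))
```

In this sketch the piece lengths always add up to len(ts) - 1, and tightening the tolerance on the sine wave produces more pieces, which in turn would lead to a longer string.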
Adaptive polygonal chain approximation
Instead of using fit_transform, which combines the polygonal chain approximation of the time series and the symbolic conversion into one step, both steps of fABBA can be performed independently. Here’s how to obtain the compression pieces and reconstruct the time series by inversely transforming the pieces:
import numpy as np
from fABBA import compress
from fABBA import inverse_compress
ts = [np.sin(0.05*i) for i in range(1000)]
pieces = compress(ts, tol=0.1) # pieces is a list of the polygonal chain pieces
inverse_ts = inverse_compress(pieces, ts[0]) # reconstruct polygonal chain from pieces
Similarly, the digitization can be performed after the compression step as follows:
from fABBA import digitize
from fABBA import inverse_digitize
string, parameters = digitize(pieces, alpha=0.1, sorting='2-norm', scl=1) # digitization of the polygonal chain pieces
print(''.join(string)) # prints BbAaAaAaAaAaAaAaC
inverse_pieces = inverse_digitize(string, parameters)
inverse_ts = inverse_compress(inverse_pieces, ts[0]) # numerical time series reconstruction
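The reconstruction calls above take the first value ts[0] as an extra argument. A plausible reason is that pieces only describe increments, so the absolute level of the series has to be supplied separately. The sketch below illustrates this with hypothetical (length, increment) pieces; the function name inverse_compress_sketch and the piece format are assumptions and do not necessarily match the internals of the package.

```python
def inverse_compress_sketch(pieces, start_value):
    """Rebuild a polygonal chain from (length, increment) pieces.

    Each piece is expanded by linear interpolation, starting from the
    last reconstructed value.
    """
    ts = [start_value]
    for length, inc in pieces:
        base = ts[-1]
        for k in range(1, length + 1):
            ts.append(base + inc * k / length)
    return ts

# Two hypothetical pieces: rise by 1.0 over 2 steps, then fall by 2.0 over 4 steps.
chain = inverse_compress_sketch([(2, 1.0), (4, -2.0)], start_value=0.0)
# chain == [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
```

Changing start_value shifts the whole reconstruction up or down without altering its shape.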
Alternative ABBA approach
We also provide other clustering-based ABBA methods, which are easy to use with the support of scikit-learn tools. They can be used as follows:
import numpy as np
from sklearn.cluster import KMeans
from fABBA import ABBAbase
ts = [np.sin(0.05*i) for i in range(1000)] # original time series
# specifies 5 symbols using kmeans clustering
kmeans = KMeans(n_clusters=5, random_state=0, init='k-means++', verbose=0)
abba = ABBAbase(tol=0.1, scl=1, clustering=kmeans)
string = abba.fit_transform(ts) # string representation of the time series
print(string) # prints BbAaAaAaAaAaAaAaC
inverse_ts = abba.inverse_transform(string) # reconstruction
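For contrast with the tolerance-driven aggregation, the following pure-Python sketch shows what a k-means based digitization roughly involves: the number of symbols k is fixed in advance, and each cluster of pieces is mapped to a letter. The function name kmeans_symbols, the naive initialization, and the omission of any scaling of the pieces are simplifying assumptions; this is not how ABBAbase or scikit-learn implement it.

```python
def kmeans_symbols(pieces, k, iters=20):
    """Cluster 2-D pieces with Lloyd's algorithm and map clusters to letters."""
    # Naive initialization: use the first k pieces as centers
    # (assumes they are distinct; real implementations use k-means++ or similar).
    centers = [pieces[i] for i in range(k)]
    labels = [0] * len(pieces)
    for _ in range(iters):
        # Assignment step: each piece goes to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in pieces
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pieces, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return ''.join(chr(ord('A') + lab) for lab in labels)

# Hypothetical (length, increment) pieces alternating between rising and falling.
pieces = [(10, 1.0), (10, -1.0), (11, 1.1), (9, -0.9), (10, 0.9)]
symbols = kmeans_symbols(pieces, k=2)
# symbols == 'ABABA'
```

Here the user has to choose k, whereas in the tolerance-driven approach the number of symbols follows from alpha.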
Image compression
The following example shows how to apply fABBA to image data.
import matplotlib.pyplot as plt
from fABBA.load_datasets import load_images
from fABBA import image_compress
from fABBA import image_decompress
from fABBA import fabba_model
from cv2 import resize
img_samples = load_images() # load test images
img = resize(img_samples[0], (100, 100)) # select the first image for test
fabba = fabba_model(tol=0.1, alpha=0.01, sorting='2-norm', scl=1, verbose=1)
string = image_compress(fabba, img) # compress image
inverse_img = image_decompress(fabba, string) # decompress image
Citation
If you use fABBA in a scientific publication, we would appreciate your citing:
@techreport{CG22a,
title = {An efficient aggregation method for the symbolic representation of temporal data},
author = {Chen, Xinye and G\"{u}ttel, Stefan},
year = {2022},
number = {arXiv:2201.05697},
pages = {23},
institution = {The University of Manchester},
address = {UK},
type = {arXiv EPrint},
url = {https://arxiv.org/abs/2201.05697}
}
Source Distribution
File details
Details for the file fABBA-1.2.8.tar.gz.
File metadata
- Download URL: fABBA-1.2.8.tar.gz
- Upload date:
- Size: 11.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ce9d9e04a84a4f8e28c50d6a667ffe107c50db1399dbb1e7d945e61c965ef2ec
MD5 | de128f2713b6344223983e81dcd8db48
BLAKE2b-256 | ad1ca25eb9dc752fdca678ccf10b1ced50b50da38b99e1a81c4da4907f6ab6f7