A Python implementation of the matrix profile
pytsmp is a Python implementation of the matrix profile. More details about the matrix profile can be found on the UCR Matrix Profile Page maintained by the paper authors.
It currently supports MASS and the matrix profile algorithms STAMP, STOMP, and SCRIMP++ (no multi-core or GPU support yet), plus some convenience functions such as discord and motif finding. I plan to implement parallelized versions of the matrix profile algorithms later.
The original implementation (in R) by the paper authors from the UCR group can be found here.
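For readers new to the idea, here is a minimal NumPy-only sketch of what a self-join matrix profile computes. It is not pytsmp's code: the function names are hypothetical, the FFT-based distance profile follows the general MASS idea, the outer loop is a simplified STAMP-style pass, and the exclusion zone of half a window is an assumption. It also assumes no window is constant (zero standard deviation) and NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np


def sliding_dot_product(query, ts):
    """Dot product of `query` with every length-m window of `ts`, via FFT."""
    m, n = len(query), len(ts)
    size = n + m  # padding length; linear convolution needs at least n + m - 1
    # Convolving with the reversed query is the same as cross-correlation.
    spectrum = np.fft.rfft(ts, size) * np.fft.rfft(query[::-1], size)
    return np.fft.irfft(spectrum, size)[m - 1:n]


def distance_profile(query, ts):
    """z-normalized Euclidean distance from `query` to every window of `ts`."""
    query = np.asarray(query, dtype=float)
    ts = np.asarray(ts, dtype=float)
    m = len(query)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mu_t, sigma_t = windows.mean(axis=1), windows.std(axis=1)
    mu_q, sigma_q = query.mean(), query.std()
    qt = sliding_dot_product(query, ts)
    # Pearson correlation between the query and each window.
    corr = (qt - m * mu_q * mu_t) / (m * sigma_q * sigma_t)
    # Clip tiny negative values caused by floating-point error.
    return np.sqrt(np.clip(2 * m * (1 - corr), 0, None))


def stamp_like_self_join(ts, window_size):
    """Matrix profile and profile index of `ts` against itself (hypothetical helper)."""
    ts = np.asarray(ts, dtype=float)
    n_windows = len(ts) - window_size + 1
    excl = window_size // 2  # assumption: exclusion zone of half a window
    profile = np.full(n_windows, np.inf)
    index = np.zeros(n_windows, dtype=int)
    for i in range(n_windows):
        dist = distance_profile(ts[i:i + window_size], ts)
        dist[max(0, i - excl):i + excl + 1] = np.inf  # skip trivial matches
        index[i] = np.argmin(dist)
        profile[i] = dist[index[i]]
    return profile, index


rng = np.random.default_rng(0)
walk = np.cumsum(rng.integers(0, 2, size=300) * 2 - 1)  # 300-step random walk
profile, index = stamp_like_self_join(walk, window_size=20)
print(profile.shape, index.shape)
```

Each entry of `profile` is the distance from one window to its nearest non-trivial neighbour, and `index` records where that neighbour starts. The loop costs one FFT per window, which is why the faster algorithms listed above matter for long series.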
Installation
pytsmp is available via pip.
pip install pytsmp
Usage
To compute the matrix profile using STAMP, use the following code.
import numpy as np
from pytsmp import STAMP
# create a 1000 step random walk and a random query
ts = np.cumsum(np.random.randint(2, size=(1000,)) * 2 - 1)
query = np.random.rand(200)
# Create the STAMP object. Note that computation starts immediately.
mp = STAMP(ts, query, window_size=50) # window_size must be specified as a named argument
# get the matrix profile and the profile indexes
mat_profile, ind_profile = mp.get_profiles()
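Once the two arrays are available, the best motif and the top discord can be read off directly. pytsmp ships its own convenience functions for this; the sketch below does not use them and relies on NumPy only. The helper name and the toy arrays are hypothetical, and the pairing of `motif` with `ind_profile[motif]` assumes a self-join profile, where the index points back into the same series.

```python
import numpy as np


def top_motif_and_discord(mat_profile, ind_profile):
    """Return ((motif_start, neighbour_start), discord_start) from a profile."""
    mat_profile = np.asarray(mat_profile, dtype=float)
    finite = np.isfinite(mat_profile)
    # Motif: the window with the smallest distance to its nearest neighbour.
    motif = int(np.argmin(np.where(finite, mat_profile, np.inf)))
    # Discord: the window whose nearest neighbour is farthest away.
    discord = int(np.argmax(np.where(finite, mat_profile, -np.inf)))
    return (motif, int(ind_profile[motif])), discord


# Toy profile for illustration only; real values come from get_profiles().
toy_profile = np.array([2.0, 0.5, 3.1, 0.5, 4.2, 1.0])
toy_index = np.array([3, 3, 5, 1, 0, 2])
motif_pair, discord = top_motif_and_discord(toy_profile, toy_index)
print(motif_pair, discord)  # (1, 3) 4
```

Non-finite entries are masked out so that windows without a valid neighbour are never reported as the motif or the discord.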
Incremental updates of the time series and the query are supported.
import numpy as np
from pytsmp import STAMP
# create a 1000 step random walk and its matrix profile
ts = np.cumsum(np.random.randint(2, size=(1000,)) * 2 - 1)
mp = STAMP(ts, window_size=50)
mat_profile, _ = mp.get_profiles()
# create the matrix profile of the first 999 steps and increment the last step later
mp_inc = STAMP(ts[:-1], window_size=50)
mp_inc.update_ts1(ts[-1]) # similarly, you can update the query by update_ts2()
mat_profile_inc, _ = mp_inc.get_profiles()
print(np.allclose(mat_profile, mat_profile_inc)) # True
Benchmark
The following is a simple trial run on a random walk with 40000 data points.
import numpy as np
from pytsmp import STAMP
np.random.seed(42) # fix a seed to control randomness
ts = np.cumsum(np.random.randint(2, size=(40000,)) * 2 - 1)
# ipython magic command
%timeit mp = STAMP(ts, window_size=1000, verbose=False, seed=42)
# and similarly for STOMP and SCRIMP
On my MacBook Pro with a 2.2 GHz Intel Core i7, the results are as follows (all over 7 runs, 1 loop each):
Algorithm | Data Size | Window Size | Elapsed Time
---|---|---|---
STAMP | 40000 | 1000 | 2min 14s ± 392ms
STOMP | 40000 | 1000 | 22.1s ± 52.8ms
SCRIMP (without PreSCRIMP) | 40000 | 1000 | 23.6s ± 402ms
PreSCRIMP (Approximate algorithm) | 40000 | 1000 | 606ms ± 9.5ms
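Outside IPython, where the `%timeit` magic is not available, a similar measurement can be taken with the standard library. The helper below is a hypothetical sketch, not part of pytsmp; it reports the mean and population standard deviation over 7 runs of 1 loop each, which should roughly mirror the format of the figures above. A cheap stand-in workload is used so the block runs anywhere.

```python
import statistics
import timeit


def time_runs(func, repeat=7, number=1):
    """Return (mean, std) in seconds per call over `repeat` runs of `number` calls."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in runs]
    return statistics.mean(per_call), statistics.pstdev(per_call)


# Stand-in workload; replace the lambda with the call you want to time.
mean_s, std_s = time_runs(lambda: sum(range(100_000)))
print(f"{mean_s * 1e3:.2f} ms ± {std_s * 1e3:.2f} ms per loop")
```

To reproduce the benchmark, pass `lambda: STAMP(ts, window_size=1000, verbose=False, seed=42)` as `func`, with `ts` built as in the snippet above.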
Reference
C.C.M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H.A. Dau, D. Silva, A. Mueen and E. Keogh. “Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets”. IEEE ICDM 2016.
Y. Zhu, Z. Zimmerman, N.S. Senobari, C.C.M. Yeh, G. Funning, A. Mueen, P. Brisk and E. Keogh. “Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins”. IEEE ICDM 2016.
Y. Zhu, C.C.M. Yeh, Z. Zimmerman, K. Kamgar and E. Keogh. “Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed”. IEEE ICDM 2018.
Disclaimer
This project is mainly for my own learning and understanding, and I may not always be able to develop it actively. If you need a Python implementation of the matrix profile, you may want to try matrixprofile-ts.