Skip to main content

Distance measures for time series (Dynamic Time Warping, fast C implementation)

Project description

PyPi Version Conda Version Documentation Status DOI

Time Series Distances

Library for time series distances (e.g. Dynamic Time Warping) used in the DTAI Research Group. The library offers a pure Python implementation and a fast implementation in C. The C implementation has only Cython as a dependency. It is compatible with Numpy and Pandas and implemented such that unnecessary data copy operations are avoided.

Documentation: http://dtaidistance.readthedocs.io

Example:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0.0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2)

Citing this work:

Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck & Pieter Robberechts.
DTAIDistance (Version v2). Zenodo.
http://doi.org/10.5281/zenodo.5901139

New in v2:

  • Numpy is now an optional dependency, also to compile the C library (only Cython is required).
  • Small optimizations throughout the C code to improve speed.
  • The consistent use of ssize_t instead of int allows for larger data structures on 64 bit machines and be more compatible with Numpy.
  • The parallelization is now implemented directly in C (included if OpenMP is installed).
  • The max_dist argument turned out to be similar to Silva and Batista's work on PrunedDTW [7]. The toolbox now implements a version that is equal to PrunedDTW since it prunes more partial distances. Additionally, a use_pruning argument is added to automatically set max_dist to the Euclidean distance, as suggested by Silva and Batista, to speed up the computation (a new method ub_euclidean is available).
  • Support in the C library for multi-dimensional sequences in the dtaidistance.dtw_ndim package.
  • DTW Barycenter Averaging for clustering (v2.2).
  • Subsequence search and local concurrences (v2.3).

Installation

$ pip install dtaidistance

or

$ conda install -c conda-forge dtaidistance

The pip installation requires Numpy as a dependency to compile Numpy-compatible C code (using Cython). However, this dependency is optional and can be removed.

The source code is available at github.com/wannesm/dtaidistance.

If you encounter any problems during compilation (e.g. the C-based implementation or OpenMP is not available), see the documentation for more options.

Usage

Dynamic Time Warping (DTW) Distance Measure

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np
s1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0])
s2 = np.array([0., 1, 2, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0])
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path, filename="warp.png")

Dynamic Time Warping (DTW) Example

DTW Distance Measure Between Two Series

Only the distance measure based on two sequences of numbers:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance = dtw.distance(s1, s2)
print(distance)

The fastest version (30-300 times) uses c directly but requires an array as input (with the double type), and (optionally) also prunes computations by setting max_dist to the Euclidean upper bound:

from dtaidistance import dtw
import array
s1 = array.array('d',[0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = array.array('d',[0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Or you can use a numpy array (with dtype double or float):

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Check the __doc__ for information about the available arguments:

print(dtw.distance.__doc__)

A number of options are foreseen to early stop some paths the dynamic programming algorithm is exploring or tune the distance measure computation:

  • window: Only allow for shifts up to this amount away from the two diagonals.
  • max_dist: Stop if the returned distance measure will be larger than this value.
  • max_step: Do not allow steps larger than this value.
  • max_length_diff: Return infinity if difference in length of two series is larger.
  • penalty: Penalty to add if compression or expansion is applied (on top of the distance).
  • psi: Psi relaxation to ignore begin and/or end of sequences (for cylical sequences) [2].
  • use_pruning: Prune computations based on the Euclidean upper bound.

DTW Distance Measure all warping paths

If, next to the distance, you also want the full matrix to see all possible warping paths:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance, paths = dtw.warping_paths(s1, s2)
print(distance)
print(paths)

The matrix with all warping paths can be visualised as follows:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import random
import numpy as np
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)
random.seed(1)
for idx in range(len(s2)):
    if random.random() < 0.05:
        s2[idx] += (random.random() - 0.5) / 2
d, paths = dtw.warping_paths(s1, s2, window=25, psi=2)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

DTW Example

Notice the psi parameter that relaxes the matching at the beginning and end. In this example this results in a perfect match even though the sine waves are slightly shifted.

DTW Distance Measures Between Set of Series

To compute the DTW distance measures between all sequences in a list of sequences, use the method dtw.distance_matrix. You can set variables to use more or less c code (use_c and use_nogil) and parallel or serial execution (parallel).

The distance_matrix method expects a list of lists/arrays:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

or a matrix (in case all series have the same length):

from dtaidistance import dtw
import numpy as np
series = np.matrix([
    [0.0, 0, 1, 2, 1, 0, 1, 0, 0],
    [0.0, 1, 2, 0, 0, 0, 0, 0, 0],
    [0.0, 0, 1, 2, 1, 0, 0, 0, 0]])
ds = dtw.distance_matrix_fast(series)

DTW Distance Measures Between Set of Series, limited to block

You can instruct the computation to only fill part of the distance measures matrix. For example to distribute the computations over multiple nodes, or to only compare source series to target series.

from dtaidistance import dtw
import numpy as np
series = np.matrix([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])
ds = dtw.distance_matrix_fast(series, block=((1, 4), (3, 5)))

The output in this case will be:

#  0     1    2    3       4       5
[[ inf   inf  inf     inf     inf  inf]    # 0
 [ inf   inf  inf  1.4142  0.0000  inf]    # 1
 [ inf   inf  inf  2.2360  1.7320  inf]    # 2
 [ inf   inf  inf     inf  1.4142  inf]    # 3
 [ inf   inf  inf     inf     inf  inf]    # 4
 [ inf   inf  inf     inf     inf  inf]]   # 5

Clustering

A distance matrix can be used for time series clustering. You can use existing methods such as scipy.cluster.hierarchy.linkage or one of two included clustering methods (the latter is a wrapper for the SciPy linkage method).

from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
# Augment Hierarchical object to keep track of the full tree
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

For models that keep track of the full clustering tree (HierarchicalTree or LinkageTree), the tree can be visualised:

model.plot("myplot.png")

Dynamic Time Warping (DTW) hierarchical clusteringt

Dependencies

Optional:

Development:

Contact

References

  1. T. K. Vintsyuk, Speech discrimination by dynamic programming. Kibernetika, 4:81–88, 1968.
  2. H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
  3. C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, Sept 1981.
  4. Mueen, A and Keogh, E, Extracting Optimal Performance from Dynamic Time Warping, Tutorial, KDD 2016
  5. D. F. Silva, G. E. A. P. A. Batista, and E. Keogh. On the effect of endpoints on dynamic time warping, In SIGKDD Workshop on Mining and Learning from Time Series, II. Association for Computing Machinery-ACM, 2016.
  6. C. Yanping, K. Eamonn, H. Bing, B. Nurjahan, B. Anthony, M. Abdullah and B. Gustavo. The UCR Time Series Classification Archive, 2015.
  7. D. F. Silva and G. E. Batista. Speeding up all-pairwise dynamic time warping matrix calculation, In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 837–845. SIAM, 2016.

License

DTAI distance code.

Copyright 2016-2022 KU Leuven, DTAI Research Group

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dtaidistance-2.3.7.tar.gz (808.8 kB view details)

Uploaded Source

Built Distributions

dtaidistance-2.3.7-cp310-cp310-win_amd64.whl (782.7 kB view details)

Uploaded CPython 3.10 Windows x86-64

dtaidistance-2.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.7-cp310-cp310-macosx_10_15_x86_64.whl (870.8 kB view details)

Uploaded CPython 3.10 macOS 10.15+ x86-64

dtaidistance-2.3.7-cp39-cp39-win_amd64.whl (786.6 kB view details)

Uploaded CPython 3.9 Windows x86-64

dtaidistance-2.3.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.7-cp39-cp39-macosx_10_15_x86_64.whl (871.9 kB view details)

Uploaded CPython 3.9 macOS 10.15+ x86-64

dtaidistance-2.3.7-cp38-cp38-win_amd64.whl (786.2 kB view details)

Uploaded CPython 3.8 Windows x86-64

dtaidistance-2.3.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.7-cp38-cp38-macosx_10_15_x86_64.whl (863.3 kB view details)

Uploaded CPython 3.8 macOS 10.15+ x86-64

File details

Details for the file dtaidistance-2.3.7.tar.gz.

File metadata

  • Download URL: dtaidistance-2.3.7.tar.gz
  • Upload date:
  • Size: 808.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dtaidistance-2.3.7.tar.gz
Algorithm Hash digest
SHA256 198ed7e147fee4791951d84707ecac95e3080dcbdb54848cdff6af3eb06186b8
MD5 2bebc2822938365a9896757187b67a83
BLAKE2b-256 3fe8cded9bb4bb5b753c62e643cfc9081e9e51645ba8f938f3c7f2a71055aa5f

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 274ba242a66bdd157ae71d3ef381190a9a03372536b18ef36cf03144d715aad7
MD5 f9178c9d1744a1c8fde306c87b1dd08b
BLAKE2b-256 c336d8c8f47147bbd68fbf8b81d31acc8556bb5674904d1f05b7b1766e20596d

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a6fb9135e831c35b786a959dc464c3bf72f9e1b7f04001407c556b9e6d1514bb
MD5 3d181ae71e098bc0d131ac264a434265
BLAKE2b-256 f95cdc6a29e1dc0d3141133969eaa038722626045984106e76e9d2f1c15f9df6

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp310-cp310-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp310-cp310-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 a82adb714ac71b87c92e5bb02b326e9b400e84d7add9630146d139ad417e5424
MD5 b67cc7b88b57ba84369423746716b99c
BLAKE2b-256 ad8342832c5c94cd2e15300386b9b21a607117272ec6b003e13983cf070f6de3

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 31d6927304719e1ac52b4fd3aab159e3e7fe29dc88cc41cb8dd8bfe5c5e2c537
MD5 651279e1681946c882ff3b4019a5dad7
BLAKE2b-256 7deed2c7623f1604c3489e46bd9a75124e4f0202efa9cae37d8840b9da48d186

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 028892f43fbf7d2fbaf7ea38fb310b6c8b9d60f2f48d7e79c505f5f1e6ed92c8
MD5 91297b6feb9b04bebc88a27cb0607aa2
BLAKE2b-256 2d2fa5f8af1ae051150bf0b2aa1b4137d8b07ccf9b014f6fa94dfd154ce49283

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp39-cp39-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 1cb9ed8f9a17c87609beba28263dac97a1ef907ea1149d954902b6dd5fd6a1e9
MD5 5fbd10626b2e5b8bdd7556f298de29bc
BLAKE2b-256 894976664c105e2af557674d9ef96047fe088d479656da552ff631b541e4a4f8

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e95cf1b8070de40b047a5e9e3e50aa19e88fcd43377cef9b5b80fa346e4239c1
MD5 44a9d366c646efc09f2330eee5fe434b
BLAKE2b-256 b289a8973447566c74fdb18dff94412d1dbdd29185b31d9282eac4b84979b3c1

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 40fc823f33699d895d87551053c858b58b95c555608c8beb8be974dd83965636
MD5 a0cffb1b4dddd0b0dac0a1e2cde8ba00
BLAKE2b-256 ac19b0082ca947f1296b2445039507b71ca0ca40ceaebd0d041b059ab4e999f3

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.7-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.7-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 d8c6cb895f8d9727a6e17cdb9b322c34baa823e13787c4042c6ab7aa6c078e54
MD5 44c19520cb4cb9dbb50d42f727e02738
BLAKE2b-256 aeded3e3e71f08c629cf212a9823ad9cc43639bcfaffa893516dd111aa884971

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page