Skip to main content

Distance measures for time series (Dynamic Time Warping, fast C implementation)

Project description

PyPi Version Conda Version Documentation Status DOI

Time Series Distances

Library for time series distances (e.g. Dynamic Time Warping) used in the DTAI Research Group. The library offers a pure Python implementation and a fast implementation in C. The C implementation has only Cython as a dependency. It is compatible with Numpy and Pandas and implemented such that unnecessary data copy operations are avoided.

Documentation: http://dtaidistance.readthedocs.io

Example:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0.0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2)

Citing this work:

Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck & Pieter Robberechts.
DTAIDistance (Version v2). Zenodo.
http://doi.org/10.5281/zenodo.5901139

New in v2:

  • Numpy is now an optional dependency, also to compile the C library (only Cython is required).
  • Small optimizations throughout the C code to improve speed.
  • The consistent use of ssize_t instead of int allows for larger data structures on 64 bit machines and be more compatible with Numpy.
  • The parallelization is now implemented directly in C (included if OpenMP is installed).
  • The max_dist argument turned out to be similar to Silva and Batista's work on PrunedDTW [7]. The toolbox now implements a version that is equal to PrunedDTW since it prunes more partial distances. Additionally, a use_pruning argument is added to automatically set max_dist to the Euclidean distance, as suggested by Silva and Batista, to speed up the computation (a new method ub_euclidean is available).
  • Support in the C library for multi-dimensional sequences in the dtaidistance.dtw_ndim package.
  • DTW Barycenter Averaging for clustering (v2.2).
  • Subsequence search and local concurrences (v2.3).
  • Support for N-dimensional time series (v2.3.7).

Installation

$ pip install dtaidistance

or

$ conda install -c conda-forge dtaidistance

The pip installation requires Numpy as a dependency to compile Numpy-compatible C code (using Cython). However, this dependency is optional and can be removed.

The source code is available at github.com/wannesm/dtaidistance.

If you encounter any problems during compilation (e.g. the C-based implementation or OpenMP is not available), see the documentation for more options.

Usage

Dynamic Time Warping (DTW) Distance Measure

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np
s1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0])
s2 = np.array([0., 1, 2, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0])
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path, filename="warp.png")

Dynamic Time Warping (DTW) Example

DTW Distance Measure Between Two Series

Only the distance measure based on two sequences of numbers:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance = dtw.distance(s1, s2)
print(distance)

The fastest version (30-300 times) uses c directly but requires an array as input (with the double type), and (optionally) also prunes computations by setting max_dist to the Euclidean upper bound:

from dtaidistance import dtw
import array
s1 = array.array('d',[0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = array.array('d',[0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Or you can use a numpy array (with dtype double or float):

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Check the __doc__ for information about the available arguments:

print(dtw.distance.__doc__)

A number of options are foreseen to early stop some paths the dynamic programming algorithm is exploring or tune the distance measure computation:

  • window: Only allow for shifts up to this amount away from the two diagonals.
  • max_dist: Stop if the returned distance measure will be larger than this value.
  • max_step: Do not allow steps larger than this value.
  • max_length_diff: Return infinity if difference in length of two series is larger.
  • penalty: Penalty to add if compression or expansion is applied (on top of the distance).
  • psi: Psi relaxation to ignore begin and/or end of sequences (for cylical sequences) [2].
  • use_pruning: Prune computations based on the Euclidean upper bound.

DTW Distance Measure all warping paths

If, next to the distance, you also want the full matrix to see all possible warping paths:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance, paths = dtw.warping_paths(s1, s2)
print(distance)
print(paths)

The matrix with all warping paths can be visualised as follows:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import random
import numpy as np
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)
random.seed(1)
for idx in range(len(s2)):
    if random.random() < 0.05:
        s2[idx] += (random.random() - 0.5) / 2
d, paths = dtw.warping_paths(s1, s2, window=25, psi=2)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

DTW Example

Notice the psi parameter that relaxes the matching at the beginning and end. In this example this results in a perfect match even though the sine waves are slightly shifted.

DTW Distance Measures Between Set of Series

To compute the DTW distance measures between all sequences in a list of sequences, use the method dtw.distance_matrix. You can set variables to use more or less c code (use_c and use_nogil) and parallel or serial execution (parallel).

The distance_matrix method expects a list of lists/arrays:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

or a matrix (in case all series have the same length):

from dtaidistance import dtw
import numpy as np
series = np.matrix([
    [0.0, 0, 1, 2, 1, 0, 1, 0, 0],
    [0.0, 1, 2, 0, 0, 0, 0, 0, 0],
    [0.0, 0, 1, 2, 1, 0, 0, 0, 0]])
ds = dtw.distance_matrix_fast(series)

DTW Distance Measures Between Set of Series, limited to block

You can instruct the computation to only fill part of the distance measures matrix. For example to distribute the computations over multiple nodes, or to only compare source series to target series.

from dtaidistance import dtw
import numpy as np
series = np.matrix([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])
ds = dtw.distance_matrix_fast(series, block=((1, 4), (3, 5)))

The output in this case will be:

#  0     1    2    3       4       5
[[ inf   inf  inf     inf     inf  inf]    # 0
 [ inf   inf  inf  1.4142  0.0000  inf]    # 1
 [ inf   inf  inf  2.2360  1.7320  inf]    # 2
 [ inf   inf  inf     inf  1.4142  inf]    # 3
 [ inf   inf  inf     inf     inf  inf]    # 4
 [ inf   inf  inf     inf     inf  inf]]   # 5

Clustering

A distance matrix can be used for time series clustering. You can use existing methods such as scipy.cluster.hierarchy.linkage or one of two included clustering methods (the latter is a wrapper for the SciPy linkage method).

from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
# Augment Hierarchical object to keep track of the full tree
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

For models that keep track of the full clustering tree (HierarchicalTree or LinkageTree), the tree can be visualised:

model.plot("myplot.png")

Dynamic Time Warping (DTW) hierarchical clusteringt

Dependencies

Optional:

Development:

Contact

References

  1. T. K. Vintsyuk, Speech discrimination by dynamic programming. Kibernetika, 4:81–88, 1968.
  2. H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
  3. C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, Sept 1981.
  4. Mueen, A and Keogh, E, Extracting Optimal Performance from Dynamic Time Warping, Tutorial, KDD 2016
  5. D. F. Silva, G. E. A. P. A. Batista, and E. Keogh. On the effect of endpoints on dynamic time warping, In SIGKDD Workshop on Mining and Learning from Time Series, II. Association for Computing Machinery-ACM, 2016.
  6. C. Yanping, K. Eamonn, H. Bing, B. Nurjahan, B. Anthony, M. Abdullah and B. Gustavo. The UCR Time Series Classification Archive, 2015.
  7. D. F. Silva and G. E. Batista. Speeding up all-pairwise dynamic time warping matrix calculation, In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 837–845. SIAM, 2016.

License

DTAI distance code.

Copyright 2016-2022 KU Leuven, DTAI Research Group

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dtaidistance-2.3.9.tar.gz (809.7 kB view details)

Uploaded Source

Built Distributions

dtaidistance-2.3.9-py3-none-any.whl (539.0 kB view details)

Uploaded Python 3

dtaidistance-2.3.9-cp310-cp310-win_amd64.whl (783.7 kB view details)

Uploaded CPython 3.10 Windows x86-64

dtaidistance-2.3.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.9-cp39-cp39-win_amd64.whl (787.6 kB view details)

Uploaded CPython 3.9 Windows x86-64

dtaidistance-2.3.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.9-cp39-cp39-macosx_10_15_x86_64.whl (872.9 kB view details)

Uploaded CPython 3.9 macOS 10.15+ x86-64

dtaidistance-2.3.9-cp38-cp38-win_amd64.whl (787.2 kB view details)

Uploaded CPython 3.8 Windows x86-64

dtaidistance-2.3.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.9-cp38-cp38-macosx_10_15_x86_64.whl (864.3 kB view details)

Uploaded CPython 3.8 macOS 10.15+ x86-64

File details

Details for the file dtaidistance-2.3.9.tar.gz.

File metadata

  • Download URL: dtaidistance-2.3.9.tar.gz
  • Upload date:
  • Size: 809.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dtaidistance-2.3.9.tar.gz
Algorithm Hash digest
SHA256 69787e3fc1dc8536c7b26201505ab4247677bed4be9aedb486d3beac1d911ce1
MD5 ab264d76b01a533a056b6e6522193490
BLAKE2b-256 ba1456f4def3dbb98d3681e93f2c46876f218b27becdbf2055ae8c4b8288aa66

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-py3-none-any.whl.

File metadata

  • Download URL: dtaidistance-2.3.9-py3-none-any.whl
  • Upload date:
  • Size: 539.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dtaidistance-2.3.9-py3-none-any.whl
Algorithm Hash digest
SHA256 79e7d0f5a4b0f2c0eb9a2b9f5edf18ccef0c94903000ca8fdc5cb92d8a0ab645
MD5 f96a78735e27dd94be2e61e71fec8955
BLAKE2b-256 2ea20cef333ad2f461a559b3f31fa19d5726c1ba29f50dddd864f38fcf025fc5

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 4afb5de9bc613fc50cba2631b6e64e279815187f5a55ed39e40e238dde007111
MD5 ce9cc3c19630e90b69a6ed19ef13526b
BLAKE2b-256 d20c85e826d882192ea0fce744a68af55dae04f4b44ece4110cc4b6639d3ae08

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 809b0c82746bfe26c9654551a33170bb84d2aea738b2779167557ce340271bf7
MD5 97b748129b9dbf6490b4af750c883e47
BLAKE2b-256 b656b85cf1d9d9c4810529f7b906809ce9a5ab26deb8d1548019fc8b0a39d6b0

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 dda00a644ccd7d0dcf05cc462619816bda19c9907f59b4ea014fba90d0341592
MD5 fc233446de1757ee9d25e2c5d4667278
BLAKE2b-256 0e5f56957456519532a38fe41660b32fd59376e02fea5dd78fc4a90c7dc79086

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8de0383db60b099c24f48392f3f6e3b7033143e1c22f2a22c7ca362112cfbeaf
MD5 88cc1d7c9ae00f145a06a65e53eff88a
BLAKE2b-256 dab1801bfd40fc316b3c714e145c9aad7855c7219bbaa1b7eed6c58c25925027

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp39-cp39-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 d8bfa3c6a7512fe338ff67f30a32ce3100068488aabfd69aa604b191065707fc
MD5 0219df92aa1385230c4a6af21b3f6b63
BLAKE2b-256 22f243e681446912e23dd6c631363073e8b7b6b33e49dd7d8e4f04905e5354f4

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 842ee9bc03e22725c4a96fe4bd3c2aeb324e605d98277740599768a7106e9969
MD5 77f8fbc5062a10ba4b5875b092ad8b14
BLAKE2b-256 d41ee0e0b0c3196c2813593f2ee816908262b9ea18592cc1236af2ec32668fb1

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2952f0abe97ecb345f10b942e6fdd005c90ef57ffb28272845d959fde8430282
MD5 f8b0699988137d4f097e134356211105
BLAKE2b-256 a447e6df911524447795a2f0782f47967ea2772b5450e3d7fcd1a6d87819f2f8

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.9-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.9-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 2ca5455d4f650a6b254c37ab113fb9650325fd5ffcb5cc64dc1b80a66b9517e2
MD5 5c6857f17793a818eb93cece895f8a69
BLAKE2b-256 6b8d561d2bf6f13704599f1857373ee8aeb073c3f8b9959a1924181155e7eace

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page