Skip to main content

Distance measures for time series (Dynamic Time Warping, fast C implementation)

Project description

PyPi Version Conda Version Documentation Status DOI

Time Series Distances

Library for time series distances (e.g. Dynamic Time Warping) used in the DTAI Research Group. The library offers a pure Python implementation and a fast implementation in C. The C implementation has only Cython as a dependency. It is compatible with Numpy and Pandas and implemented such that unnecessary data copy operations are avoided.

Documentation: http://dtaidistance.readthedocs.io

Example:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0.0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2)

Citing this work:

Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck & Pieter Robberechts.
DTAIDistance (Version v2). Zenodo.
http://doi.org/10.5281/zenodo.5901139

New in v2:

  • Numpy is now an optional dependency, also to compile the C library (only Cython is required).
  • Small optimizations throughout the C code to improve speed.
  • The consistent use of ssize_t instead of int allows for larger data structures on 64 bit machines and be more compatible with Numpy.
  • The parallelization is now implemented directly in C (included if OpenMP is installed).
  • The max_dist argument turned out to be similar to Silva and Batista's work on PrunedDTW [7]. The toolbox now implements a version that is equal to PrunedDTW since it prunes more partial distances. Additionally, a use_pruning argument is added to automatically set max_dist to the Euclidean distance, as suggested by Silva and Batista, to speed up the computation (a new method ub_euclidean is available).
  • Support in the C library for multi-dimensional sequences in the dtaidistance.dtw_ndim package.
  • DTW Barycenter Averaging for clustering (v2.2).
  • Subsequence search and local concurrences (v2.3).
  • Support for N-dimensional time series (v2.3.7).

Installation

$ pip install dtaidistance

or

$ conda install -c conda-forge dtaidistance

The pip installation requires Numpy as a dependency to compile Numpy-compatible C code (using Cython). However, this dependency is optional and can be removed.

The source code is available at github.com/wannesm/dtaidistance.

If you encounter any problems during compilation (e.g. the C-based implementation or OpenMP is not available), see the documentation for more options.

Usage

Dynamic Time Warping (DTW) Distance Measure

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np
s1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0])
s2 = np.array([0., 1, 2, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0])
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path, filename="warp.png")

Dynamic Time Warping (DTW) Example

DTW Distance Measure Between Two Series

Only the distance measure based on two sequences of numbers:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance = dtw.distance(s1, s2)
print(distance)

The fastest version (30-300 times) uses c directly but requires an array as input (with the double type), and (optionally) also prunes computations by setting max_dist to the Euclidean upper bound:

from dtaidistance import dtw
import array
s1 = array.array('d',[0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = array.array('d',[0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Or you can use a numpy array (with dtype double or float):

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Check the __doc__ for information about the available arguments:

print(dtw.distance.__doc__)

A number of options are foreseen to early stop some paths the dynamic programming algorithm is exploring or tune the distance measure computation:

  • window: Only allow for shifts up to this amount away from the two diagonals.
  • max_dist: Stop if the returned distance measure will be larger than this value.
  • max_step: Do not allow steps larger than this value.
  • max_length_diff: Return infinity if difference in length of two series is larger.
  • penalty: Penalty to add if compression or expansion is applied (on top of the distance).
  • psi: Psi relaxation to ignore begin and/or end of sequences (for cylical sequences) [2].
  • use_pruning: Prune computations based on the Euclidean upper bound.

DTW Distance Measure all warping paths

If, next to the distance, you also want the full matrix to see all possible warping paths:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance, paths = dtw.warping_paths(s1, s2)
print(distance)
print(paths)

The matrix with all warping paths can be visualised as follows:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import random
import numpy as np
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)
random.seed(1)
for idx in range(len(s2)):
    if random.random() < 0.05:
        s2[idx] += (random.random() - 0.5) / 2
d, paths = dtw.warping_paths(s1, s2, window=25, psi=2)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

DTW Example

Notice the psi parameter that relaxes the matching at the beginning and end. In this example this results in a perfect match even though the sine waves are slightly shifted.

DTW Distance Measures Between Set of Series

To compute the DTW distance measures between all sequences in a list of sequences, use the method dtw.distance_matrix. You can set variables to use more or less c code (use_c and use_nogil) and parallel or serial execution (parallel).

The distance_matrix method expects a list of lists/arrays:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

or a matrix (in case all series have the same length):

from dtaidistance import dtw
import numpy as np
series = np.matrix([
    [0.0, 0, 1, 2, 1, 0, 1, 0, 0],
    [0.0, 1, 2, 0, 0, 0, 0, 0, 0],
    [0.0, 0, 1, 2, 1, 0, 0, 0, 0]])
ds = dtw.distance_matrix_fast(series)

DTW Distance Measures Between Set of Series, limited to block

You can instruct the computation to only fill part of the distance measures matrix. For example to distribute the computations over multiple nodes, or to only compare source series to target series.

from dtaidistance import dtw
import numpy as np
series = np.matrix([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])
ds = dtw.distance_matrix_fast(series, block=((1, 4), (3, 5)))

The output in this case will be:

#  0     1    2    3       4       5
[[ inf   inf  inf     inf     inf  inf]    # 0
 [ inf   inf  inf  1.4142  0.0000  inf]    # 1
 [ inf   inf  inf  2.2360  1.7320  inf]    # 2
 [ inf   inf  inf     inf  1.4142  inf]    # 3
 [ inf   inf  inf     inf     inf  inf]    # 4
 [ inf   inf  inf     inf     inf  inf]]   # 5

Clustering

A distance matrix can be used for time series clustering. You can use existing methods such as scipy.cluster.hierarchy.linkage or one of two included clustering methods (the latter is a wrapper for the SciPy linkage method).

from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
# Augment Hierarchical object to keep track of the full tree
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

For models that keep track of the full clustering tree (HierarchicalTree or LinkageTree), the tree can be visualised:

model.plot("myplot.png")

Dynamic Time Warping (DTW) hierarchical clusteringt

Dependencies

Optional:

Development:

Contact

References

  1. T. K. Vintsyuk, Speech discrimination by dynamic programming. Kibernetika, 4:81–88, 1968.
  2. H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
  3. C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, Sept 1981.
  4. Mueen, A and Keogh, E, Extracting Optimal Performance from Dynamic Time Warping, Tutorial, KDD 2016
  5. D. F. Silva, G. E. A. P. A. Batista, and E. Keogh. On the effect of endpoints on dynamic time warping, In SIGKDD Workshop on Mining and Learning from Time Series, II. Association for Computing Machinery-ACM, 2016.
  6. C. Yanping, K. Eamonn, H. Bing, B. Nurjahan, B. Anthony, M. Abdullah and B. Gustavo. The UCR Time Series Classification Archive, 2015.
  7. D. F. Silva and G. E. Batista. Speeding up all-pairwise dynamic time warping matrix calculation, In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 837–845. SIAM, 2016.

License

DTAI distance code.

Copyright 2016-2022 KU Leuven, DTAI Research Group

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dtaidistance-2.3.8.tar.gz (809.1 kB view details)

Uploaded Source

Built Distributions

dtaidistance-2.3.8-py3-none-any.whl (538.1 kB view details)

Uploaded Python 3

dtaidistance-2.3.8-cp310-cp310-win_amd64.whl (782.7 kB view details)

Uploaded CPython 3.10 Windows x86-64

dtaidistance-2.3.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.8-cp39-cp39-win_amd64.whl (786.7 kB view details)

Uploaded CPython 3.9 Windows x86-64

dtaidistance-2.3.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.8-cp39-cp39-macosx_10_15_x86_64.whl (872.0 kB view details)

Uploaded CPython 3.9 macOS 10.15+ x86-64

dtaidistance-2.3.8-cp38-cp38-win_amd64.whl (786.3 kB view details)

Uploaded CPython 3.8 Windows x86-64

dtaidistance-2.3.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

dtaidistance-2.3.8-cp38-cp38-macosx_10_15_x86_64.whl (863.4 kB view details)

Uploaded CPython 3.8 macOS 10.15+ x86-64

File details

Details for the file dtaidistance-2.3.8.tar.gz.

File metadata

  • Download URL: dtaidistance-2.3.8.tar.gz
  • Upload date:
  • Size: 809.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dtaidistance-2.3.8.tar.gz
Algorithm Hash digest
SHA256 302b7e423beee571ff4980445adee8b0745c2cd7f8f28324794d600984bc9a8b
MD5 368aa10107bdf2a45182e3e679304ff7
BLAKE2b-256 674a01a35436f887f9953bc3f97845ea033099178d21afc4004c46ebdf80058d

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-py3-none-any.whl.

File metadata

  • Download URL: dtaidistance-2.3.8-py3-none-any.whl
  • Upload date:
  • Size: 538.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for dtaidistance-2.3.8-py3-none-any.whl
Algorithm Hash digest
SHA256 b2c0276ea4026d8b55763a207d2c4df293599596044ab84554927767d9ea4b2a
MD5 167ab8da82de5aae5787c773ef527e9e
BLAKE2b-256 0ed9e1bcdcc1dd83d0f0c88d1f6c8b6d4ab4b44af8da74bed48303a7e7b311f0

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 a68750c7efa6350ec1b9ae2942af1b90341b6684b939df47c3075a73f68fcfcf
MD5 b3926d27597a3bf39f2835f50cfea647
BLAKE2b-256 c6aaf4efd09e8a424f0252723e328a6091267d03a8955e5ef12800234064bd75

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 f935984b5825a079a1dd838f10c0a099846d241b0c9fed6ffcd1ac00ece19841
MD5 74b536fe8b970531fd36d890b26f0283
BLAKE2b-256 9e6c6d415a3b8760c8f1ec3644806fcc04832d299a92418c1599f3ce8687d82d

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7ccb6feb2ee517c4ad63644cada79f369ded78f94333f7b5b38564eeacfa7dd3
MD5 cdf5cd93787f29f641a0138103a2c6ef
BLAKE2b-256 bd2ed64c884058288e3f1f14ddc4c5b953d777c31f63ad584ec3d95028472ef8

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fdf48af8f1d21a1c24dbaf2894ae404aec41ba412d9165e25af0bab7a29e31d3
MD5 1e6bad77c8392ef320b94c924b7aa337
BLAKE2b-256 54ee31a7455fb204d8d1a948f585c8c943da608946440158e11ec0ddd4cc27d2

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp39-cp39-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp39-cp39-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 963a6631a5e001a2b365d06ccc3c8b93c0eed105287a912f36b02a5f0f891bbd
MD5 9787235c8e13300df9211d774c0d3897
BLAKE2b-256 cac9e99dc9d4f07f58a921009276b0f671c19eaf4bf02941eadedad212dde0f0

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 e38833ed1fe1f373148fa3d676ca4ba14051e8b4c552a04e4c898ee46bb3111d
MD5 189d8c212949d3eea0a91edf59332153
BLAKE2b-256 486a47008f999ba6e08ac00d1401d7c99aff8756ffaad8c1fbfaf8409c712aae

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5ee06a276be6575ead5a1ceee792a60d9f3cf89a635c58a2a046877b96eae5f7
MD5 33cb542658874957b98e7ad5a59bd582
BLAKE2b-256 1300b6a232c59b88742ac31495f15e01257f7325c0f61f7928188b795281e399

See more details on using hashes here.

File details

Details for the file dtaidistance-2.3.8-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for dtaidistance-2.3.8-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 4f2e0dc2a4948a56b51966fbcc6fb1d7f1346ad7a5f3eac52e5a4126062a2c9e
MD5 55664b9fbb138205f4fdccdb47f3c29d
BLAKE2b-256 c8d8f8c9fa529ab6943cf13309f2b53f723b638182c98c7ac6653bd35908a4b4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page