MASS (Mueen's Algorithm for Similarity Search)

Introduction

MASS lets you search a time series for a query subsequence and returns an array of distances, one for each candidate subsequence. This array, also called a distance profile, lets you identify the subsequences that are most similar or most dissimilar to your query. At its core, MASS efficiently computes Euclidean distances under z-normalization and is domain agnostic. It is the fundamental algorithm that the matrix profile algorithm is built on.
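
To make the quantity concrete, here is a minimal brute-force sketch of a z-normalized Euclidean distance profile. It is not the library's code and is far slower than MASS. The helper names `znorm` and `naive_distance_profile` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array: zero mean, unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def naive_distance_profile(ts, query):
    """Distance from the query to every window of ts, one window at a time."""
    ts = np.asarray(ts, dtype=float)
    m = len(query)
    q = znorm(query)
    profile = np.empty(len(ts) - m + 1)
    for i in range(len(profile)):
        profile[i] = np.linalg.norm(znorm(ts[i:i + m]) - q)
    return profile

# Synthetic data: the query is a scaled and shifted copy of ts[40:60].
rng = np.random.default_rng(0)
ts = rng.standard_normal(200)
query = 3.0 * ts[40:60] + 10.0

profile = naive_distance_profile(ts, query)
best = int(np.argmin(profile))  # starting index of the closest window
```

Because both sides are z-normalized, the scaling and offset applied to the query do not matter, so the closest window is the one the query was copied from.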

mass-ts is a Python 2 and 3 compatible library.

Free software: Apache Software License 2.0

Features

Original Author's Algorithms

  • MASS - the first implementation of MASS
  • MASS2 - the second implementation of MASS that is significantly faster. Typically this is the one you will use.
  • MASS3 - a piecewise version of MASS2 that can be tuned to your hardware. Generally this is used to search very large time series.
  • MASS_weighted - TODO
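
The speed of these algorithms comes from computing all sliding dot products at once with the FFT. The following is a rough sketch of that idea in plain NumPy, not the library's implementation. The function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def sliding_dot_product(ts, query):
    """Dot product of the query with every window of ts, computed via FFT."""
    n, m = len(ts), len(query)
    # Reversing the query turns the sliding dot product into a convolution.
    q_padded = np.zeros(n)
    q_padded[:m] = query[::-1]
    conv = np.fft.irfft(np.fft.rfft(ts) * np.fft.rfft(q_padded), n=n)
    # Entries m-1 .. n-1 are not affected by circular wrap-around.
    return conv[m - 1:]

def fft_distance_profile(ts, query):
    """Z-normalized Euclidean distance profile from dot products and rolling stats."""
    ts = np.asarray(ts, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    qt = sliding_dot_product(ts, query)
    # Rolling mean and standard deviation of every window via cumulative sums.
    csum = np.cumsum(np.concatenate(([0.0], ts)))
    csum2 = np.cumsum(np.concatenate(([0.0], ts ** 2)))
    mu = (csum[m:] - csum[:-m]) / m
    sigma = np.sqrt(np.maximum((csum2[m:] - csum2[:-m]) / m - mu ** 2, 0.0))
    # Pearson correlation between the query and each window.
    corr = (qt - m * query.mean() * mu) / (m * query.std() * sigma)
    # Clip tiny negative values caused by floating point rounding.
    return np.sqrt(np.maximum(2.0 * m * (1.0 - corr), 0.0))

# Synthetic data: the query is a scaled and shifted copy of ts[100:125].
rng = np.random.default_rng(1)
ts = rng.standard_normal(300)
query = 3.0 * ts[100:125] + 10.0
profile = fft_distance_profile(ts, query)
```

The sketch assumes no window of the time series is constant, since a zero standard deviation would cause a division by zero.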

Library Specific Algorithms

  • MASS2_batch - a batch version of MASS2 that reduces overall memory usage, supports parallelization, and lets you find the top K matches within the time series. It is intended for similarity search over very large time series.
  • top_k_motifs - find the top K subsequences most similar to your query. It returns the starting index of each subsequence.
  • top_k_discords - find the top K subsequences most dissimilar to your query. It returns the starting index of each subsequence.
  • MASS2_gpu - a GPU implementation of MASS2 leveraging the Python library CuPy.
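
One common way to pick top K starting indices from a distance profile is a greedy scan that masks the neighbours of each pick, so near-duplicate overlapping matches are not reported twice. The sketch below illustrates that general technique. `top_k_with_exclusion` is a hypothetical helper, and the library's own top_k_motifs and top_k_discords may differ in detail.

```python
import numpy as np

def top_k_with_exclusion(distances, k, exclusion_zone, largest=False):
    """Greedily pick up to k indices, masking exclusion_zone points around each pick.

    largest=False picks the smallest distances (motif-like),
    largest=True picks the largest distances (discord-like).
    """
    d = np.array(distances, dtype=float)
    # Flip the sign for discords so the same argmin loop works for both cases.
    work = -d if largest else d
    picked = []
    for _ in range(k):
        idx = int(np.argmin(work))
        if work[idx] == np.inf:
            break  # every remaining index has been masked
        picked.append(idx)
        lo = max(0, idx - exclusion_zone)
        hi = min(len(work), idx + exclusion_zone + 1)
        work[lo:hi] = np.inf
    return picked

# Toy distance profile, assumed for illustration.
toy = [5.0, 1.0, 1.1, 6.0, 0.5, 7.0, 9.0, 8.5]
motif_idx = top_k_with_exclusion(toy, k=2, exclusion_zone=1)
discord_idx = top_k_with_exclusion(toy, k=2, exclusion_zone=1, largest=True)
```

In the toy profile, index 2 has the third smallest distance but is never a candidate for a third motif, because it lies inside the exclusion zone around index 1.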

Installation

pip install mass-ts

GPU Support

Please follow the installation guide for CuPy. It covers what drivers and environmental dependencies are required. Once you are finished there, you can install GPU support for the algorithms.

pip install mass-ts[gpu]

Example Usage

A dedicated repository for practical examples can be found at the mass-ts-examples repository.

import numpy as np
import mass_ts as mts

ts = np.loadtxt('ts.txt')
query = np.loadtxt('query.txt')

# mass
distances = mts.mass(ts, query)

# mass2
distances = mts.mass2(ts, query)

# mass3
distances = mts.mass3(ts, query, 256)

# mass2_gpu
distances = mts.mass2_gpu(ts, query)

# mass2_batch
# Start a multi-threaded batch job on all CPU cores and return the top 5 matches.
# batch_size controls how the time series is partitioned; each partition is
# searched as its own subsequence similarity search.
# Even in single-threaded mode on a large time series, this is much more memory
# efficient than MASS2 on its own.
batch_size = 10000
top_matches = 5
n_jobs = -1
batch_indices, batch_distances = mts.mass2_batch(ts, query, batch_size,
    top_matches=top_matches, n_jobs=n_jobs)

# find minimum distance in a full distance profile (here, the one from mass2)
distances = mts.mass2(ts, query)
min_idx = np.argmin(distances)

# find top 4 motif starting indices
k = 4
exclusion_zone = 25
top_motifs = mts.top_k_motifs(distances, k, exclusion_zone)

# find top 4 discord starting indices
k = 4
exclusion_zone = 25
top_discords = mts.top_k_discords(distances, k, exclusion_zone)
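
The memory saving of a batched search comes from never holding the distance profile of the whole time series at once. The sketch below illustrates the partitioning idea in plain NumPy under stated assumptions. It uses a brute-force stand-in for MASS2, both function names are hypothetical, and it is not how mass2_batch is implemented.

```python
import numpy as np

def distance_profile(ts, query):
    """Brute-force z-normalized distance profile (stand-in for MASS2)."""
    m = len(query)
    q = (query - query.mean()) / query.std()
    out = np.empty(len(ts) - m + 1)
    for i in range(len(out)):
        w = ts[i:i + m]
        out[i] = np.linalg.norm((w - w.mean()) / w.std() - q)
    return out

def batched_best_matches(ts, query, batch_size):
    """Search ts chunk by chunk; return (start index, distance) of the best match per chunk."""
    m = len(query)
    results = []
    for start in range(0, len(ts) - m + 1, batch_size):
        # Chunks overlap by m - 1 points so no window is lost at a boundary.
        chunk = ts[start:start + batch_size + m - 1]
        profile = distance_profile(chunk, query)
        local = int(np.argmin(profile))
        results.append((start + local, float(profile[local])))
    return results

# Synthetic data: the query is a scaled and shifted copy of ts[40:60].
rng = np.random.default_rng(0)
ts = rng.standard_normal(200)
query = 3.0 * ts[40:60] + 10.0

results = batched_best_matches(ts, query, batch_size=50)
best_index, best_distance = min(results, key=lambda r: r[1])
```

Each chunk is independent of the others, which is what makes it possible to spread the work across several workers.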

Citations

Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2015), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance, URL: http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html

History

0.1.0 (2019-05-16)

  • First release on PyPI.

0.1.1 (2019-05-17)

  • Minor precision bug fixes.

0.1.2 (2019-05-19)

  • mass2_batch release for efficient large time series searching.

0.1.3 (2019-05-19)

  • top_k_motifs - find the top k similar subsequences given a distance profile.
  • top_k_discords - find the top k dissimilar subsequences given a distance profile.

0.1.4 (2019-10-04)

  • add GPU implementation of MASS2 - mass2_gpu.
