MASS (Mueen's Algorithm for Similarity Search)

MASS is the fundamental algorithm that the matrix profile algorithm is built on. It searches a long time series for a shorter query series and returns an array of distances, one for each position in the long series where the query could be aligned. To find the section of the time series closest to your query, take the index of the minimum distance(s).
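
To make the meaning of the distance array concrete, here is a brute-force sketch of the quantity MASS computes: the z-normalized Euclidean distance between the query and every window of the time series. This is not how mass-ts works internally (MASS gets the same result much faster using the FFT), and the names `znorm` and `naive_distance_profile` are hypothetical helpers written for this illustration only.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std

def naive_distance_profile(ts, query):
    """Brute-force distance profile (hypothetical helper, not part of mass-ts).

    Returns an array of length len(ts) - len(query) + 1 where entry i is the
    z-normalized Euclidean distance between query and ts[i:i + len(query)].
    """
    ts = np.asarray(ts, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    q = znorm(query)
    profile = np.empty(len(ts) - m + 1)
    for i in range(len(profile)):
        profile[i] = np.linalg.norm(znorm(ts[i:i + m]) - q)
    return profile
```

Because each window is z-normalized, a scaled and shifted copy of a subsequence still matches it with a distance close to zero. The loop costs O(n * m), so it is only suitable for sanity-checking results on small inputs.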

mass-ts is a Python library compatible with both Python 2 and Python 3.

  • Free software: Apache Software License 2.0

Features

  • MASS - the first implementation of MASS
  • MASS2 - the second implementation of MASS that is significantly faster. Typically this is the one you will use.
  • MASS3 - a piecewise version of MASS2 that can be tuned to your hardware. Generally this is used to search very large time series.
  • MASS2_batch - a batch version of MASS2 that reduces overall memory usage, supports parallelization and returns the top K matches within the time series. It is intended for similarity search over very large time series.
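
The piecewise and batch variants above rest on the idea of splitting a long series into chunks that are searched separately. A minimal sketch of that idea follows; it is not the mass-ts implementation. The helper `iter_batches` is hypothetical, and in this sketch `batch_size` counts window start positions per chunk, which may differ from how the library interprets its own `batch_size` parameter.

```python
import numpy as np

def iter_batches(ts, m, batch_size):
    """Yield (offset, chunk) pairs covering every length-m window exactly once.

    Hypothetical helper for illustration. Each chunk contains up to
    batch_size window start positions, plus m - 1 trailing points so that
    the last window starting inside the chunk is complete.
    """
    n_windows = len(ts) - m + 1
    for start in range(0, n_windows, batch_size):
        stop = min(start + batch_size, n_windows)
        yield start, ts[start:stop + m - 1]
```

A distance profile computed on a chunk has `len(chunk) - m + 1` entries, and adding `offset` to an index within that profile gives its position in the full series. Only one chunk needs to be held in working memory at a time, and the chunks are independent, so they can be processed in parallel.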

Installation

pip install mass-ts

Example Usage

Practical examples can be found in the dedicated mass-ts-examples repository.

import numpy as np
import mass_ts as mts

ts = np.loadtxt('ts.txt')
query = np.loadtxt('query.txt')

# mass
distances = mts.mass(ts, query)

# mass2
distances = mts.mass2(ts, query)

# mass3
distances = mts.mass3(ts, query, 256)

# mass2_batch
# start a multi-threaded batch job using all cpu cores and return the top 5 matches.
# note that batch_size controls how the time series is partitioned into batches,
# each of which is searched separately. even in single threaded mode on a large
# time series, this is much more memory efficient than MASS2 on its own.
batch_size = 10000
top_matches = 5
n_jobs = -1
indices, distances = mts.mass2_batch(ts, query, batch_size, 
    top_matches=top_matches, n_jobs=n_jobs)

# find minimum distance
min_idx = np.argmin(distances)
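
When more than the single best match is needed from a full distance array, the few smallest distances usually sit next to each other, because neighbouring windows overlap almost entirely. One common way around this is to mask a neighbourhood around each pick before choosing the next one. The sketch below assumes `distances` is a real-valued 1-D array; `top_k_matches` and its `exclusion` parameter are hypothetical and not part of mass-ts.

```python
import numpy as np

def top_k_matches(distances, k, exclusion):
    """Return up to k indices of the smallest distances, at least
    exclusion + 1 positions apart (hypothetical helper for illustration)."""
    d = np.array(distances, dtype=float)  # copy so the input is not modified
    picks = []
    for _ in range(k):
        i = int(np.argmin(d))
        if not np.isfinite(d[i]):
            break  # every remaining candidate has been masked out
        picks.append(i)
        lo = max(0, i - exclusion)
        hi = min(len(d), i + exclusion + 1)
        d[lo:hi] = np.inf  # mask the neighbourhood of this match
    return picks
```

Setting `exclusion` to roughly half the query length is a common starting point, though the right value depends on the data.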

Citations

Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2015), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance, URL: http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html

History

0.1.0 (2019-05-16)

  • First release on PyPI.

0.1.1 (2019-05-17)

  • Minor precision bug fixes.

0.1.2 (2019-05-19)

  • mass2_batch release for efficient large time series searching.
