SCAMPI - Fast Pattern Discovery in Massive Time Series
Fast Pattern Discovery in Massive Time Series
This repository provides the reference implementation and experimental material for the paper “Fast Pattern Discovery in Massive Time Series”.
Time series motif discovery identifies repeated subsequences for applications such as ECG, EEG, and activity recognition. State-of-the-art methods such as MASS are limited by quadratic time or memory complexity, making large-scale analysis impractical. Current CPU methods take nearly a day for 30 million points, and even GPU acceleration requires hours. SCAMPI is a scalable anytime motif set finder that combines LSH-based pruning, anytime processing, and a memory-efficient graph structure. On up to 25 large datasets (180k to 62M points, totaling 413 compute days across all competitors), SCAMPI consistently ranks among the fastest and most accurate methods. For million-scale data, it finds high-quality motifs in under a minute with only 8 GB RAM, which is four orders of magnitude faster than extrapolated MASS (21 days) and three orders of magnitude faster than MOMP (12 hours). Competitors either produced low-quality results, needed excessive RAM, or crashed. SCAMPI enables motif discovery in previously infeasible scenarios, including memory-constrained systems and single-machine analytics, while also reducing energy consumption, which makes it suitable for sustainable large-scale data analysis.
Repository Structure
This repository contains the full framework, benchmark datasets, and reproducible experiments used in the evaluation.
- motiflets/: Core implementation of the k-Motiflets algorithm.
- notebooks/: Jupyter notebooks demonstrating typical use cases and reproducing paper figures.
- datasets/momp/: Benchmark time series datasets used throughout the paper.
- tests/csvs/: Raw experimental results for all competing methods.
SCAMPI (SCalable Anytime Mining of Patterns In time series)
The paper introduces SCAMPI (SCalable Anytime Mining of Patterns In time series), a fast LSH-based backend for discovering motif sets in massive time series under Euclidean distance. The code builds upon the Motiflets definition of motif sets but was systematically designed from the ground up to exploit commodity multi-core hardware and state-of-the-art data structures while maintaining high precision. To overcome the inherent quadratic-time bottleneck of motif search, SCAMPI employs Locality-Sensitive Hashing (LSH) to aggressively prune
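As an illustration of how LSH can reduce the number of subsequence pairs that need to be compared, here is a minimal sketch. It is not SCAMPI's implementation: the hash family (sign random projections over z-normalized windows), the function name `lsh_candidate_pairs`, and the parameters `n_bits` and `seed` are assumptions made for this example. For z-normalized windows of equal length, Euclidean distance grows monotonically with the angle between them, so windows that are close under Euclidean distance tend to share a hash bucket.

```python
import numpy as np
from collections import defaultdict


def lsh_candidate_pairs(series, m, n_bits=12, seed=0):
    """Return pairs of window offsets that fall into the same LSH bucket.

    Hypothetical helper for illustration only; not part of the scampi API.
    """
    rng = np.random.default_rng(seed)
    # All sliding windows of length m, shape (n_windows, m).
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    std[std < 1e-8] = 1.0  # avoid division by zero for flat windows
    z = (windows - mean) / std

    # One random hyperplane per hash bit; the sign pattern is the bucket key.
    planes = rng.normal(size=(m, n_bits))
    bits = (z @ planes) > 0

    buckets = defaultdict(list)
    for offset, row in enumerate(bits):
        buckets[row.tobytes()].append(offset)

    # Only windows in the same bucket become candidates; overlapping
    # windows (trivial matches) are skipped.
    pairs = set()
    for members in buckets.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                if b - a >= max(1, m // 2):
                    pairs.add((a, b))
    return pairs


# Demo: noise with the same pattern planted at three offsets.
rng = np.random.default_rng(0)
series = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 4 * np.pi, 30))
for start in (40, 140, 240):
    series[start:start + 30] = pattern

pairs = lsh_candidate_pairs(series, m=30)
n_windows = len(series) - 30 + 1
print(len(pairs), "candidate pairs instead of", n_windows * (n_windows - 1) // 2)
```

Identical windows always land in the same bucket, while dissimilar windows are unlikely to collide, so exact distances only need to be computed for the surviving candidate pairs. A single hash table like this one can miss near matches; practical LSH schemes typically use several tables or repetitions to control that risk.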
Motiflets
Motif discovery aims to identify repeated patterns in time series data. A key
difficulty in classical motif discovery is that both the motif length and the number
of motif occurrences are unknown and must be inferred indirectly.
k-Motiflets introduce a new formulation of motif discovery that explicitly models
the desired motif set size.
Intuitively, a k-Motiflet is the set of exactly k most similar subsequences of a
given length. Instead of fixing a distance threshold and counting matches, k-Motiflets:
- take the motif set size k as an explicit parameter, and
- maximize the internal similarity of the resulting motif set.
This turns classical motif discovery upside down. The parameter k has a clear and
intuitive interpretation and is often known or easily estimated in real applications.
This formulation enables fast algorithms with strong empirical performance on massive
time series.
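The definition above can be made concrete with a small quadratic-time sketch. It assumes that "most similar" means the set of k non-overlapping subsequences with the smallest maximum pairwise z-normalized Euclidean distance, and it uses a greedy nearest-neighbor heuristic to build candidate sets. Both choices, as well as the function names, are assumptions made for this example and not the package's API or SCAMPI's algorithm.

```python
import numpy as np


def znorm(x):
    """z-normalize one subsequence; flat subsequences map to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 1e-8 else np.zeros_like(x)


def k_motiflet_bruteforce(series, k, m):
    """Return (extent, sorted offsets) of an approximate k-Motiflet of length m.

    Hypothetical helper for illustration only. Uses O(n^2) time and memory.
    """
    n = len(series) - m + 1
    subs = np.array([znorm(series[i:i + m]) for i in range(n)])

    # All pairwise Euclidean distances between z-normalized subsequences.
    sq = (subs ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * subs @ subs.T
    dists = np.sqrt(np.maximum(d2, 0))

    excl = max(1, m // 2)  # exclusion zone against overlapping matches
    best_extent, best_set = np.inf, None
    for i in range(n):
        # Greedily collect the k-1 nearest non-overlapping neighbors of i.
        chosen = [i]
        for j in np.argsort(dists[i]):
            if all(abs(j - c) >= excl for c in chosen):
                chosen.append(int(j))
            if len(chosen) == k:
                break
        if len(chosen) < k:
            continue
        # Extent: largest pairwise distance inside the candidate set.
        extent = dists[np.ix_(chosen, chosen)].max()
        if extent < best_extent:
            best_extent, best_set = extent, sorted(chosen)
    return best_extent, best_set


# Demo: noise with the same pattern planted at three offsets.
rng = np.random.default_rng(0)
series = rng.normal(size=300)
pattern = np.sin(np.linspace(0, 4 * np.pi, 30))
for start in (40, 140, 240):
    series[start:start + 30] = pattern

extent, offsets = k_motiflet_bruteforce(series, k=3, m=30)
print(offsets, extent)
```

The only inputs are the subsequence length m and the set size k; no distance threshold is needed. The full pairwise distance matrix is what makes this sketch impractical for long series.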
Installation
The easiest way to install scampi is to use pip.
a) Install using pip
pip install scampi
You can also install the project from source.
b) Build from Source
Clone the repository:
git clone https://github.com/patrickzib/scampi.git
cd scampi
Install the package:
pip install .
Usage Example
from motiflets.plotting import *

ml = Motiflets(
    ds_name,  # dataset name
    series,   # time series data
    n_jobs    # number of CPU cores
)

k_max = 20
motif_length = 100

dists, candidates, elbow_points = ml.fit_k_elbow(
    k_max,
    motif_length
)

ml.plot_motifset()
Raw Experimental Results
All raw benchmark results reported in the paper are available in tests/csvs/ for full
reproducibility.
Source Distribution
File details
Details for the file scampi-0.1.1.tar.gz.
File metadata
- Download URL: scampi-0.1.1.tar.gz
- Upload date:
- Size: 72.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d355fc6a078cd740b9b6f4c7348ca909440143e1c794967e6b51e48e603e1d0 |
| MD5 | a78086bc867f64b109fe399758fc7c80 |
| BLAKE2b-256 | 0cd387a0dd7822e2cd70d8f863bbfe0ffd017d66e853fd2fb3ebadc2a0708f37 |