scampi - Fast Pattern Discovery in Massive Time Series

These details have not been verified by PyPI

Project links

Project description

Fast Pattern Discovery in Massive Time Series

This repository provides the reference implementation and experimental material for the paper “Fast Pattern Discovery in Massive Time Series”.

Time series motif discovery identifies repeated subsequences for applications like ECG, EEG, and activity recognition. State-of-the- art methods such as MASS are limited by quadratic time or memory complexity, making large-scale analysis impractical. Current CPU methods take nearly a day for 30 million points, and even GPU acceleration requires hours. AMPED is a scalable anytime motif set finder using LSH-based pruning, anytime processing, and a memory-efficient graph structure. On up to 25 large datasets (180k to 62M points, totaling 413 compute days across all competitors), AMPED consistently ranks among the fastest and most accurate methods. For million-scale data, it finds high-quality motifs in under a minute with only 8 GB RAM, which is four orders of magnitude faster than extrapolated MASS (21 days) and three orders faster than MOMP (12 hours). Competitors either produced low-quality results, needed excessive RAM, or crashed. AMPED enables motif discovery in previously infeasible scenarios, including memory-constrained systems, and single-machine analytics, while also reducing energy consumption, making it suitable for sustainable large-scale data analysis.

Repository Structure

This repository contains the full framework, benchmark datasets, and reproducible experiments used in the evaluation.

motiflets/
Core implementation of the k-Motiflets algorithm.
notebooks/
Jupyter notebooks demonstrating typical use cases and reproducing paper figures.
datasets/momp/
Benchmark time series datasets used throughout the paper.
tests/csvs/
Raw experimental results for all competing methods.

AMPED (scalable Anytime Mining of Patterns under Euclidean Distance)

This paper introduces AMPED (scalable Anytime Mining of Pat- terns under Euclidean Distance). It builds upon the Motiflets definition of motif sets but was systematically designed from the ground up to exploit commodity multi-core hardware and SOTA data structures while maintaining high precision. To overcome the inherent quadratic-time bottleneck of motif search, AMPED employs Locality-Sensitive Hashing (LSH) to aggressively prune

Motiflets

Motif discovery aims to identify repeated patterns in time series data. A key difficulty in classical motif discovery is that both the motif length and the number of motif occurrences are unknown and must be inferred indirectly. k-Motiflets introduce a new formulation of motif discovery that explicitly models the desired motif set size. Intuitively, a k-Motiflet is the set of exactly k most similar subsequences of a given length. Instead of fixing a distance threshold and counting matches, k-Motiflets:

take the motif size k as an explicit parameter, and
maximize the internal similarity of the resulting motif set.

This turns classical motif discovery upside down. The parameter k has a clear and intuitive interpretation and is often known or easily estimated in real applications. This formulation enables fast algorithms with strong empirical performance on massive time series.

Installation

Install the project directly from source.

Build from Source

Clone the repository:

git clone https://github.com/patrickzib/motiflets.git
cd motiflets

Install the package:

pip install .

Usage Example

from motiflets.plotting import *

ml = Motiflets(
    ds_name,   # dataset name
    series,    # time series data
    n_jobs     # number of CPU cores
)

k_max = 20
motif_length = 100

dists, candidates, elbow_points = ml.fit_k_elbow(
    k_max,
    motif_length
)

ml.plot_motifset()

Raw Experimental Results

All raw benchmark results reported in the paper are available in tests/csvs/ for full reproducibility.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.1

Dec 22, 2025

This version

0.1.0

Dec 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scampi-0.1.0.tar.gz (72.7 kB view details)

Uploaded Dec 21, 2025 Source

File details

Details for the file scampi-0.1.0.tar.gz.

File metadata

Download URL: scampi-0.1.0.tar.gz
Upload date: Dec 21, 2025
Size: 72.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for scampi-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`431c98aa1956c807a22f3086078058863baf4b5b61fe7b374beb417de2ff7dea`
MD5	`b838e587074c47d67934ea99c4945c98`
BLAKE2b-256	`020b615ca14fd29c4c43e12a7de06ba4da0ff254dfb7276d33d3c2c16809ff7c`

See more details on using hashes here.

scampi 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Fast Pattern Discovery in Massive Time Series

Repository Structure

AMPED (scalable Anytime Mining of Patterns under Euclidean Distance)

Motiflets

Installation

Build from Source

Usage Example

Raw Experimental Results

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes