Pure-numpy Matrix Profile (motif and anomaly discovery in time series). No Numba required.
fastmatrix-profile
Pure-numpy Matrix Profile for time-series motif and anomaly detection. No Numba. No C extensions. No GPU. Just numpy + scipy.
Status: v0.1.0 — 29/29 tests passing, output bit-identical to brute-force reference up to n=10k, NYC Taxi tutorial recovers 4/5 documented anomalies.
What it does
Given a 1D time series T and a window length m, computes the Matrix
Profile P — a 1D summary where:
- argmin(P) points to a repeated pattern (motif)
- argmax(P) points to the most unusual subsequence (anomaly / discord)
One call, no tuning, no labels.
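To make the definition concrete, here is a minimal brute-force sketch of what P and I mean. It is not the package's implementation: the name naive_matrix_profile and the exclusion half-width of m // 2 are assumptions made for illustration, and it costs O(L² m) time, so it only suits small inputs.

```python
import numpy as np

def naive_matrix_profile(T, m, excl=None):
    """Brute-force z-normalised matrix profile (illustration only)."""
    T = np.asarray(T, dtype=np.float64)
    L = T.size - m + 1                      # number of length-m subsequences
    if excl is None:
        excl = m // 2                       # assumed trivial-match half-width
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    # z-normalise every window; assumes no window is constant (std > 0)
    Wz = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    P = np.empty(L)                         # distance to nearest neighbour
    I = np.empty(L, dtype=np.int64)         # index of that neighbour
    for i in range(L):
        d = np.linalg.norm(Wz - Wz[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf   # ignore trivial matches
        I[i] = d.argmin()
        P[i] = d[I[i]]
    return P, I
```

A window that repeats elsewhere in T gets a small P value; a window with no close match anywhere gets a large one, which is why argmax(P) flags discords.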
Install
pip install fastmatrix-profile
30-second example
import numpy as np
from fastmatrix_profile import matrix_profile
T = np.loadtxt("your_timeseries.csv")
P, I = matrix_profile(T, m=100)
anomaly_idx = int(P.argmax())
motif_idx = int(P.argmin())
That's it. See examples/nyc_taxi_anomalies.ipynb for a real walk-through
on the Numenta NAB NYC taxi dataset: it recovers the NYC Marathon,
Christmas, New Year, and the Jan 2015 snowstorm with no parameter tuning
beyond m.
Benchmarks vs STUMPY
Numbers below: macOS arm64 (Apple M4) + Accelerate BLAS. Also verified on
Linux x86 + OpenBLAS (GitHub Actions CI; see bench.yml) and Linux arm64 +
OpenBLAS (Docker container). The relative results hold across all three;
on the 2-core x86 Linux runner, the f32 mode ties or beats STUMPY in 12 of
14 configurations. Run the interactive demo yourself to see numbers on
your own hardware.
Warm wall-time (median of 3 reps after warmup)
| n | m | fastmatrix-profile (default) | + dtype="float32" | STUMPY | winner |
|---|---|---|---|---|---|
| 1,000 | 50 | 1.4 ms | 1.0 ms | 7.4 ms | ours, 5–7× |
| 2,000 | 50 | 6.4 ms | 4 ms | 11.1 ms | ours, 1.7–2.5× |
| 5,000 | 100 | 39 ms | 19 ms | 27 ms | f32 beats STUMPY (1.4×) |
| 10,000 | 100 | 136 ms | 72 ms | 62 ms | STUMPY (2.2× / 1.15×) |
| 20,000 | 100 | 590 ms | 263 ms | 175 ms | STUMPY (3.4× / 1.5×) |
dtype="float32" is opt-in: ~2× faster at the cost of ~3e-5 absolute
error in P (argmin/argmax positions unchanged in practice — verified
on the NYC taxi dataset, both modes flag identical anomalies).
Cold start (fresh Python process, n=2000, m=50)
| Library | Time |
|---|---|
| fastmatrix-profile | 0.17 s |
| STUMPY | 8.6 s (49× slower — Numba JIT compile) |
When to use this vs STUMPY
| Situation | Use |
|---|---|
| Long series (n > ~10k), Numba installs fine, many calls per process | STUMPY |
| Lambda / edge / Docker-slim / Pyodide / locked-down corporate Python | fastmatrix-profile |
| One-shot calls or interactive notebooks where cold start matters | fastmatrix-profile |
| GPU available, very large n | STUMPY-CUDA or SCAMP |
This is not a STUMPY replacement at large n. It's the option for when you can't or don't want to install a JIT toolchain.
Algorithm
Uses a row-tiled BLAS-batched self-join: form the z-normalised window
matrix Wz of shape (L, m), then sweep row blocks of Wz, multiply each
block against Wz.T in a single gemm call, mask the trivial-match band,
reduce to per-row min, and discard the tile. Peak memory: tile × L floats
per tile (tens of MB) instead of the GBs a full (L, L) Gram matrix would
need. Cost: O(L² m) flops at near-peak BLAS throughput.
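The sweep described above can be sketched roughly as follows. This is an illustrative reconstruction, not the package's source: the function name, the tile parameter, and the exclusion half-width of m // 2 are assumptions. Rows of Wz are scaled to unit norm so that a dot product is a Pearson correlation ρ, and the z-normalised distance follows from d = sqrt(2m(1 − ρ)).

```python
import numpy as np

def tiled_matrix_profile(T, m, tile=256, excl=None):
    """Row-tiled GEMM self-join (sketch; names and defaults are assumed)."""
    T = np.asarray(T, dtype=np.float64)
    L = T.size - m + 1
    excl = m // 2 if excl is None else excl
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    Wz = W - W.mean(axis=1, keepdims=True)            # new (L, m) array
    Wz /= np.linalg.norm(Wz, axis=1, keepdims=True)   # unit rows: dot = correlation
    P = np.empty(L)
    I = np.empty(L, dtype=np.int64)
    cols = np.arange(L)
    for start in range(0, L, tile):
        stop = min(start + tile, L)
        corr = Wz[start:stop] @ Wz.T                  # one GEMM, shape (tile, L)
        rows = np.arange(start, stop)[:, None]
        corr[np.abs(rows - cols) <= excl] = -np.inf   # mask trivial-match band
        best = corr.argmax(axis=1)                    # max correlation = min distance
        c = corr[np.arange(stop - start), best]
        I[start:stop] = best
        P[start:stop] = np.sqrt(np.maximum(2.0 * m * (1.0 - c), 0.0))
        # corr goes out of scope here, so only one (tile, L) block is alive
    return P, I
```

The result should not depend on the tile size beyond floating-point rounding; tile only trades peak memory against the number of GEMM calls.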
Falls back to the STOMP recurrence (Zhu et al., ICDM 2016) when even a
minimum-size tile would exceed max_mem_mb (default 512 MB) — typically
n ≳ 1M. You can call matrix_profile(very_long_series, m) without blowing
up RAM; it will just be slower past the dense regime.
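For readers unfamiliar with STOMP, the recurrence updates one row of sliding dot products from the previous row in O(L) work: QT[i, j] = QT[i−1, j−1] − T[i−1]·T[j−1] + T[i+m−1]·T[j+m−1]. The sketch below is a plain-numpy illustration under assumed names and an assumed exclusion half-width of m // 2; it is not the package's fallback code, and a production version would compute the window means and standard deviations from cumulative sums rather than from a window view.

```python
import numpy as np

def stomp_sketch(T, m, excl=None):
    """STOMP-style matrix profile keeping one row of dot products (sketch)."""
    T = np.asarray(T, dtype=np.float64)
    L = T.size - m + 1
    excl = m // 2 if excl is None else excl
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    mu = W.mean(axis=1)
    sd = W.std(axis=1)                       # assumes no constant window
    first = W @ W[0]                         # dot products of window 0 with all j
    qt = first.copy()
    P = np.empty(L)
    I = np.empty(L, dtype=np.int64)
    idx = np.arange(L)
    for i in range(L):
        if i > 0:
            # shift the previous row along the diagonal and fix both ends
            qt[1:] = qt[:-1] - T[i - 1] * T[:L - 1] + T[i + m - 1] * T[m:]
            qt[0] = first[i]                 # by symmetry QT[i, 0] = QT[0, i]
        corr = (qt - m * mu[i] * mu) / (m * sd[i] * sd)
        corr[np.abs(idx - i) <= excl] = -np.inf
        j = corr.argmax()
        I[i] = j
        P[i] = np.sqrt(max(2.0 * m * (1.0 - corr[j]), 0.0))
    return P, I
```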
dtype="float32" runs the inner GEMM in single precision: ~2× faster, ~half
the peak memory, ~3e-5 absolute error in P (argmin/argmax positions
unchanged in practice).
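To get a feel for the precision trade-off on your own data, one option is to run the same correlation GEMM in both precisions and compare the resulting distances. The snippet below is a rough sketch under stated assumptions (the helper name, the random-walk test series, and the m // 2 exclusion band are all made up for illustration); the error it reports depends on the data and is not a reproduction of the figure quoted above.

```python
import numpy as np

def znorm_distances(T, m, dtype):
    """All-pairs z-normalised distances with the GEMM run in `dtype` (sketch)."""
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(T, dtype=np.float64), m)
    Wz = W - W.mean(axis=1, keepdims=True)
    Wz /= np.linalg.norm(Wz, axis=1, keepdims=True)
    Wz = Wz.astype(dtype)                    # cast before the matrix product
    corr = (Wz @ Wz.T).astype(np.float64)
    return np.sqrt(np.maximum(2.0 * m * (1.0 - corr), 0.0))

rng = np.random.default_rng(0)
T = np.cumsum(rng.standard_normal(500))      # synthetic random walk
m = 32
D64 = znorm_distances(T, m, np.float64)
D32 = znorm_distances(T, m, np.float32)

# compare only outside the trivial-match band, where the profile is taken
k = np.arange(D64.shape[0])
outside = np.abs(k[:, None] - k[None, :]) > m // 2
err = np.abs(D64 - D32)[outside].max()
print(f"max abs distance error outside the band: {err:.2e}")
```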
mass(Q, T) is exposed as a standalone primitive (Mueen's MASS distance
profile via FFT).
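As a rough illustration of the idea behind MASS, the sliding dot products between Q and every window of T can be obtained from a single FFT-based convolution and then converted to z-normalised distances. The function below is a sketch with an assumed name (mass_sketch); it is not the package's mass and makes no claim about its exact signature or return type.

```python
import numpy as np

def mass_sketch(Q, T):
    """Z-normalised distance from Q to every window of T via FFT (sketch)."""
    Q = np.asarray(Q, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)
    m, n = Q.size, T.size
    # circular convolution of T with reversed Q; entries from m-1 onward
    # are free of wrap-around and equal the sliding dot products
    qt = np.fft.irfft(np.fft.rfft(T, n) * np.fft.rfft(Q[::-1], n), n)[m - 1:]
    W = np.lib.stride_tricks.sliding_window_view(T, m)
    mu_T, sd_T = W.mean(axis=1), W.std(axis=1)   # assumes no constant window
    mu_Q, sd_Q = Q.mean(), Q.std()
    corr = (qt - m * mu_Q * mu_T) / (m * sd_Q * sd_T)
    return np.sqrt(np.maximum(2.0 * m * (1.0 - corr), 0.0))
```

Calling it with a query cut out of T itself should give a distance close to zero at the position the query came from.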
Limitations
- 1D series only. No multi-dimensional matrix profile.
- No streaming / incremental updates.
- No GPU.
- NaN/Inf in the input are not handled — clean them first.
These are deliberate v0.1.0 scope decisions. Open an issue if you have a concrete use case.
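Since non-finite values must be cleaned before the call, one simple option is linear interpolation over the gaps. The helper below is a hypothetical pre-processing sketch (fill_nonfinite is not part of the package); whether interpolation is appropriate depends on your data, and long gaps may be better handled by splitting the series.

```python
import numpy as np

def fill_nonfinite(T):
    """Replace NaN/Inf by linear interpolation between finite neighbours."""
    T = np.asarray(T, dtype=np.float64).copy()
    bad = ~np.isfinite(T)
    if bad.all():
        raise ValueError("series contains no finite values")
    idx = np.arange(T.size)
    # np.interp holds the nearest finite value at the two ends
    T[bad] = np.interp(idx[bad], idx[~bad], T[~bad])
    return T
```

Note that a gap at the start or end is filled with a constant, which can create zero-variance windows; trimming such edges may be safer than filling them.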
Citation
@inproceedings{yeh2016matrix,
title={Matrix profile I: all pairs similarity joins for time series},
author={Yeh, Chin-Chia Michael and others},
booktitle={2016 IEEE 16th International Conference on Data Mining (ICDM)},
year={2016}
}
License
MIT