A Collection of Methods for Online Changepoint Detection
Project description
changepoint_online
- A Collection of Methods for Online Changepoint Detection
- Installation
- Examples
- License
- GitHub Repository
- Experimental
- Contributors
- How to Cite This Work
- References
A Collection of Methods for Online Changepoint Detection
The changepoint_online package provides efficient algorithms for
detecting changes in data streams, based on the Focus
algorithm. The
Focus
algorithm solves the CUSUM likelihood-ratio test exactly in
$O(\log(n))$ time per iteration, where n represents the current
iteration. The method is equivalent to running a rolling window (MOSUM)
simultaneously for all sizes of windows or the Page-CUSUM for all
possible values of the size of change (an infinitely dense grid).
Key Features
-
Contains all
Focus
exponential family algorithms as well as theNPFocus
algorithm for non-parametric changepoint detection. -
It’s versatile enough to be applied in scenarios where the pre-change parameter is either known or unknown.
-
It is possible to apply constraints to detect specific types of changes (such as increases or decreases in parameter values).
Installation
installing from PyPI
pip install changepoint-online
installing from github with pip
python -m pip install 'git+https://github.com/grosed/changepoint_online/#egg=changepoint_online&subdirectory=python/package'
Testing the package
To run package tests:
python -m changepoint_online.tests.runtests
Examples
Simple Univariate Gaussian Change-in-mean
from changepoint_online import Focus, Gaussian
import numpy as np
# generating some data with a change at 50,000
np.random.seed(0)
Y = np.concatenate((np.random.normal(loc=0.0, scale=1.0, size=50000), np.random.normal(loc=2.0, scale=1.0, size=5000)))
# initialize a Focus Gaussian detector
detector = Focus(Gaussian())
threshold = 13.0
for y in Y:
# update your detector sequentially with
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}
If the pre-change location is known (in case of previous training data), this can be specified with:
# initialize a Focus Gaussian detector (with pre-change location known)
detector = Focus(Gaussian(loc=0))
threshold = 13.0
for y in Y:
# update your detector sequentially with
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}
See help(FamilyName)
for distribution specific parameters,
e.g. help(Gaussian)
.
Change in one-parameter exponential family distributions
As in the Gaussian case, we can detect:
-
Poisson change-in-rate
-
Gamma change-in-scale (or rate). Exponential change-in-rate implemented as a Gamma
shape=1
. -
Bernoulli change-in-probability
For example:
from changepoint_online import Focus, Gamma
import numpy as np
np.random.seed(0)
Y = np.concatenate((np.random.gamma(4.0, scale=3.0, size=50000),
np.random.gamma(4.0, scale=6.0, size=5000)))
# initialize a Gamma change-in-scale detector (with shape = 4)
detector = Focus(Gamma(shape=4.0))
threshold = 12.0
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50012, 'changepoint': 50000}
Non-Parametric changepoint detection
If we do not know the underlying distribution, or if the nature of the
change is unkown a priori, we can then use NPFocus
.
from changepoint_online import NPFocus
import numpy as np
# Define a simple Gaussian noise function
def generate_gaussian_noise(size):
return np.random.normal(loc=0.0, scale=1.0, size=size)
# Generate mixed data with change in gamma component
gamma_1 = np.random.gamma(4.0, scale=3.0, size=5000)
gamma_2 = np.random.gamma(4.0, scale=6.0, size=5000)
gaussian_noise = generate_gaussian_noise(10000)
Y = np.concatenate((gamma_1 + gaussian_noise[:5000], gamma_2 + gaussian_noise[5000:]))
# Create and use NPFocus detector
## One needs to provide some quantiles to track the null distribuition over
quantiles = [np.quantile(Y[:100], q) for q in [0.25, 0.5, 0.75]]
## the detector can be initialised with those quantiles
detector = NPFocus(quantiles)
stat_over_time = []
for y in Y:
detector.update(y)
# we can sum the statistics over to get a detection
# see (Romano, Eckley, and Fearnhead 2024) for more details
if np.sum(detector.statistic()) > 25:
break
changepoint_info = detector.changepoint()
print(changepoint_info["stopping_time"])
5014
Multivariate Data
It is possible to run a multivariate analysis via MDFocus by feeding, at
each iteration, a numpy
array.
Gaussian Change-in-mean
from changepoint_online import MDFocus, MDGaussian
import numpy as np
np.random.seed(123)
# Define means and standard deviations for pre-change and post-change periods (independent dimensions)
mean_pre = np.array([0.0, 0.0, 5.0])
mean_post = np.array([1.0, 1.0, 4.5])
# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500
# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.normal(mean_pre, size=(size_pre, 3))
# Generate post-change data (independent samples for each dimension)
Y_post = np.random.normal(mean_post, size=(size_post, 3))
# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))
# Initialize the Gaussian detector
# (for change in mean known, as in univariate case, write MDGaussian(loc=mean_pre))
detector = MDFocus(MDGaussian(), pruning_params = (2, 1))
threshold = 25
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 5014, 'changepoint': 5000}
Poisson Change-in-rate
from changepoint_online import MDFocus, MDPoisson
import numpy as np
np.random.seed(123)
# Define rates (lambda) for pre-change and post-change periods (independent dimensions)
rate_pre = np.array([1.0, 2.0, 3.0])
rate_post = np.array([2.0, 3.0, 4.0])
# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500
# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.poisson(rate_pre, size=(size_pre, 3))
# Generate post-change data (independent samples for each dimension)
Y_post = np.random.poisson(rate_post, size=(size_post, 3))
# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))
# Initialize the Poisson detector
detector = MDFocus(MDPoisson(), pruning_params = (2, 1))
threshold = 45 # Adjust the threshold as needed for the Poisson distribution
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 5056, 'changepoint': 5000}
Real-data examples
More examples, including real-world applications, are found in the examples folder, including:
-
Change in the tails of Energy Wholesale Price. (markdown, quarto notebook)
-
Constrained Spike inference in calcium imaging data (markdown, quarto notebook)
License
Copyright (C) 2023 Gaetano Romano, Daniel Grose
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.txt.
GitHub Repository
Source files for the packages can be found at https://github.com/grosed/changepoint_online
Experimental
The package includes the nunc
algorithm for non-parametric changepoint
detectino from Austin et al. (2023) as an experimental function. It is
possible to call nunc
as:
from changepoint_online.nunc import nunc, nunc_default_quantiles
import numpy as np
np.random.seed(1)
# Generate data with change
X1 = np.random.normal(1, 1, 3000)
X2 = np.random.normal(-2, 4, 5000)
Y = np.concatenate((X1, X2))
# Create and use NUNC detector
detector = nunc(300, 5, nunc_default_quantiles)
stat_over_time = []
for y in Y:
detector.update(y)
stat_over_time.append(detector.max_cost)
if detector.statistic() > 30:
break
changepoint_info = detector.changepoint()
print(f"Output information:\n{changepoint_info}")
Output information:
{'stopping_time': 300, 'changepoint': 284, 'max_cost': 31.492069041110202}
Contributors
-
Austin Eduard: email (Author) (Maintainer)
-
Idris A. Eckley: email (Author) (Thesis Advisor)
-
Paul Fearnhead: email (Author) (Thesis Advisor)
-
Daniel Grose: email (Author) (Maintainer) (Creator)
-
Liudmila Pishchagina: email (Author)
-
Guillem Rigaill: email (Author) (Thesis Advisor)
-
Gaetano Romano: email (Author) (Maintainer) (Creator) (Translator)
-
Vincent Runge: email (Author) (Thesis Advisor)
-
Kes Ward: email (Author)
How to Cite This Work
A possible BibTeX entry for this package could be:
@software{changepoint_online,
author = {Idris A. Eckley, Paul Fearnhead, Daniel Grose, Liudmila Pishchagina, Guillem Rigaill, Gaetano Romano, Vincent Runge, Kes Ward},
title = {changepoint_online: A Collection of Methods for Online Changepoint Detection.},
month = Apr,
year = 2024,
version = {v1.0.0},
url = {https://https://github.com/grosed/changepoint_online}
}
For citing the methodologies:
-
Gaussian FOCuS: (Romano et al. 2023)
-
Other Exponential Family detectors: (Ward et al. 2024)
-
Multi-dimentional FOCuS (Pishchagina et al. 2023)
-
NPFocus: (Romano, Eckley, and Fearnhead 2024)
See references below.
References
Austin, Edward, Gaetano Romano, Idris A Eckley, and Paul Fearnhead. 2023. “Online Non-Parametric Changepoint Detection with Application to Monitoring Operational Performance of Network Devices.” Computational Statistics & Data Analysis 177: 107551.
Pishchagina, Liudmila, Gaetano Romano, Paul Fearnhead, Vincent Runge, and Guillem Rigaill. 2023. “Online Multivariate Changepoint Detection: Leveraging Links with Computational Geometry.” arXiv Preprint arXiv:2311.01174.
Romano, Gaetano, Idris A. Eckley, and Paul Fearnhead. 2024. “A Log-Linear Nonparametric Online Changepoint Detection Algorithm Based on Functional Pruning.” IEEE Transactions on Signal Processing 72: 594–606. https://doi.org/10.1109/tsp.2023.3343550.
Romano, Gaetano, Idris A Eckley, Paul Fearnhead, and Guillem Rigaill. 2023. “Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics.” Journal of Machine Learning Research 24 (81): 1–36. https://www.jmlr.org/papers/v24/21-1230.html.
Ward, Kes, Gaetano Romano, Idris Eckley, and Paul Fearnhead. 2024. “A Constant-Per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models.” Statistics and Computing 34 (3): 1–11.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file changepoint_online-1.2.0.tar.gz
.
File metadata
- Download URL: changepoint_online-1.2.0.tar.gz
- Upload date:
- Size: 34.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1f7077aa6c53453d12c855e7d61c8cbb8a114e274454c0c87798be43073db5a6 |
|
MD5 | a83c6dbdab3cd14c9d2b1f80097e244f |
|
BLAKE2b-256 | fbed0a5e52030c357e9cef058e68ea6a7e1cbf4e8dda8902aba9d061a12a55b1 |
File details
Details for the file changepoint_online-1.2.0-py3-none-any.whl
.
File metadata
- Download URL: changepoint_online-1.2.0-py3-none-any.whl
- Upload date:
- Size: 36.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b6afa62706d7655ca5670eda564ccff00602275bb792b503c63b5ea2fd72f5af |
|
MD5 | 42f7ec6320285431e9d1246e2527fae8 |
|
BLAKE2b-256 | 9fe0197d72559903c1ab2f69b69856b80a6bd8130bd47336506250e226390001 |