Skip to main content

A Collection of Methods for Online Changepoint Detection

Project description

changepoint_online

A Collection of Methods for Online Changepoint Detection

The changepoint_online package provides efficient algorithms for detecting changes in data streams, based on the Focus algorithm. The Focus algorithm solves the CUSUM likelihood-ratio test exactly in $O(\log(n))$ time per iteration, where n represents the current iteration. The method is equivalent to running a rolling window (MOSUM) simultaneously for all sizes of windows or the Page-CUSUM for all possible values of the size of change (an infinitely dense grid).

Key Features

  • Contains all Focus exponential family algorithms as well as the NPFocus algorithm for non-parametric changepoint detection.

  • It’s versatile enough to be applied in scenarios where the pre-change parameter is either known or unknown.

  • It is possible to apply constraints to detect specific types of changes (such as increases or decreases in parameter values).

Installation

installing from PyPI

pip install changepoint-online

installing from github with pip

python -m pip install 'git+https://github.com/grosed/changepoint_online/#egg=changepoint_online&subdirectory=python/package'

Testing the package

To run package tests:

python -m changepoint_online.tests.runtests

Examples

Simple Univariate Gaussian Change-in-mean

from changepoint_online import Focus, Gaussian
import numpy as np

# generating some data with a change at 50,000
np.random.seed(0)
Y = np.concatenate((np.random.normal(loc=0.0, scale=1.0, size=50000), np.random.normal(loc=2.0, scale=1.0, size=5000)))

# initialize a Focus Gaussian detector
detector = Focus(Gaussian())
threshold = 13.0

for y in Y:
    # update your detector sequentially with
    detector.update(y)
    if detector.statistic() >= threshold:
        break

print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}

If the pre-change location is known (in case of previous training data), this can be specified with:

# initialize a Focus Gaussian detector (with pre-change location known)
detector = Focus(Gaussian(loc=0))
threshold = 13.0

for y in Y:
    # update your detector sequentially with
    detector.update(y)
    if detector.statistic() >= threshold:
        break

print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}

See help(FamilyName) for distribution specific parameters, e.g. help(Gaussian).

Change in one-parameter exponential family distributions

As in the Gaussian case, we can detect:

  • Poisson change-in-rate

  • Gamma change-in-scale (or rate). Exponential change-in-rate implemented as a Gamma shape=1.

  • Bernoulli change-in-probability

For example:

from changepoint_online import Focus, Gamma
import numpy as np 

np.random.seed(0)
Y = np.concatenate((np.random.gamma(4.0, scale=3.0, size=50000), 
                    np.random.gamma(4.0, scale=6.0, size=5000)))

# initialize a Gamma change-in-scale detector (with shape = 4)
detector = Focus(Gamma(shape=4.0))
threshold = 12.0
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
        
print(detector.changepoint())
{'stopping_time': 50012, 'changepoint': 50000}

Non-Parametric changepoint detection

If we do not know the underlying distribution, or if the nature of the change is unkown a priori, we can then use NPFocus.

from changepoint_online import NPFocus
import numpy as np

# Define a simple Gaussian noise function
def generate_gaussian_noise(size):
    return np.random.normal(loc=0.0, scale=1.0, size=size)

# Generate mixed data with change in gamma component
gamma_1 = np.random.gamma(4.0, scale=3.0, size=5000)
gamma_2 = np.random.gamma(4.0, scale=6.0, size=5000)
gaussian_noise = generate_gaussian_noise(10000)
Y = np.concatenate((gamma_1 + gaussian_noise[:5000], gamma_2 + gaussian_noise[5000:]))

# Create and use NPFocus detector
## One needs to provide some quantiles to track the null distribuition over
quantiles = [np.quantile(Y[:100], q) for q in [0.25, 0.5, 0.75]]
## the detector can be initialised with those quantiles
detector = NPFocus(quantiles)

stat_over_time = []

for y in Y:
    detector.update(y)
    # we can sum the statistics over to get a detection
    # see  (Romano, Eckley, and Fearnhead 2024) for more details
    if np.sum(detector.statistic()) > 25:
        break


changepoint_info = detector.changepoint()
print(changepoint_info["stopping_time"])
5014

Multivariate Data

It is possible to run a multivariate analysis via MDFocus by feeding, at each iteration, a numpy array.

Gaussian Change-in-mean

from changepoint_online import MDFocus, MDGaussian
import numpy as np

np.random.seed(123)

# Define means and standard deviations for pre-change and post-change periods (independent dimensions)
mean_pre = np.array([0.0, 0.0, 5.0])
mean_post = np.array([1.0, 1.0, 4.5])

# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500

# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.normal(mean_pre, size=(size_pre, 3))

# Generate post-change data (independent samples for each dimension)
Y_post = np.random.normal(mean_post, size=(size_post, 3))

# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))

# Initialize the Gaussian detector 
# (for change in mean known, as in univariate case, write MDGaussian(loc=mean_pre))
detector = MDFocus(MDGaussian(), pruning_params = (2, 1))
threshold = 25
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
print(detector.changepoint())
{'stopping_time': 5014, 'changepoint': 5000}

Poisson Change-in-rate

from changepoint_online import MDFocus, MDPoisson
import numpy as np

np.random.seed(123)

# Define rates (lambda) for pre-change and post-change periods (independent dimensions)
rate_pre = np.array([1.0, 2.0, 3.0])
rate_post = np.array([2.0, 3.0, 4.0])

# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500

# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.poisson(rate_pre, size=(size_pre, 3))

# Generate post-change data (independent samples for each dimension)
Y_post = np.random.poisson(rate_post, size=(size_post, 3))

# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))

# Initialize the Poisson detector
detector = MDFocus(MDPoisson(), pruning_params = (2, 1))
threshold = 45  # Adjust the threshold as needed for the Poisson distribution
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
print(detector.changepoint())
{'stopping_time': 5056, 'changepoint': 5000}

Real-data examples

More examples, including real-world applications, are found in the examples folder, including:

License

Copyright (C) 2023 Gaetano Romano, Daniel Grose

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.txt.

GitHub Repository

Source files for the packages can be found at https://github.com/grosed/changepoint_online

Experimental

The package includes the nunc algorithm for non-parametric changepoint detectino from Austin et al. (2023) as an experimental function. It is possible to call nunc as:

from changepoint_online.nunc import nunc, nunc_default_quantiles
import numpy as np

np.random.seed(1)
# Generate data with change
X1 = np.random.normal(1, 1, 3000)
X2 = np.random.normal(-2, 4, 5000)
Y = np.concatenate((X1, X2))

# Create and use NUNC detector
detector = nunc(300, 5, nunc_default_quantiles)

stat_over_time = []

for y in Y:
    detector.update(y)
    stat_over_time.append(detector.max_cost)
    if detector.statistic() > 30:
        break


changepoint_info = detector.changepoint()

print(f"Output information:\n{changepoint_info}")
Output information:
{'stopping_time': 300, 'changepoint': 284, 'max_cost': 31.492069041110202}

Contributors

  • Austin Eduard: email (Author) (Maintainer)

  • Idris A. Eckley: email (Author) (Thesis Advisor)

  • Paul Fearnhead: email (Author) (Thesis Advisor)

  • Daniel Grose: email (Author) (Maintainer) (Creator)

  • Liudmila Pishchagina: email (Author)

  • Guillem Rigaill: email (Author) (Thesis Advisor)

  • Gaetano Romano: email (Author) (Maintainer) (Creator) (Translator)

  • Vincent Runge: email (Author) (Thesis Advisor)

  • Kes Ward: email (Author)

How to Cite This Work

A possible BibTeX entry for this package could be:

@software{changepoint_online,
  author       = {Idris A. Eckley, Paul Fearnhead, Daniel Grose, Liudmila Pishchagina, Guillem Rigaill, Gaetano Romano, Vincent Runge, Kes Ward},
  title        = {changepoint_online: A Collection of Methods for Online Changepoint Detection.},
  month        = Apr,
  year         = 2024,
  version      = {v1.0.0},
  url          = {https://https://github.com/grosed/changepoint_online}
}

For citing the methodologies:

  • Gaussian FOCuS: (Romano et al. 2023)

  • Other Exponential Family detectors: (Ward et al. 2024)

  • Multi-dimentional FOCuS (Pishchagina et al. 2023)

  • NPFocus: (Romano, Eckley, and Fearnhead 2024)

See references below.

References

Austin, Edward, Gaetano Romano, Idris A Eckley, and Paul Fearnhead. 2023. “Online Non-Parametric Changepoint Detection with Application to Monitoring Operational Performance of Network Devices.” Computational Statistics & Data Analysis 177: 107551.

Pishchagina, Liudmila, Gaetano Romano, Paul Fearnhead, Vincent Runge, and Guillem Rigaill. 2023. “Online Multivariate Changepoint Detection: Leveraging Links with Computational Geometry.” arXiv Preprint arXiv:2311.01174.

Romano, Gaetano, Idris A. Eckley, and Paul Fearnhead. 2024. “A Log-Linear Nonparametric Online Changepoint Detection Algorithm Based on Functional Pruning.” IEEE Transactions on Signal Processing 72: 594–606. https://doi.org/10.1109/tsp.2023.3343550.

Romano, Gaetano, Idris A Eckley, Paul Fearnhead, and Guillem Rigaill. 2023. “Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics.” Journal of Machine Learning Research 24 (81): 1–36. https://www.jmlr.org/papers/v24/21-1230.html.

Ward, Kes, Gaetano Romano, Idris Eckley, and Paul Fearnhead. 2024. “A Constant-Per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models.” Statistics and Computing 34 (3): 1–11.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

changepoint_online-1.2.0.tar.gz (34.5 kB view details)

Uploaded Source

Built Distribution

changepoint_online-1.2.0-py3-none-any.whl (36.5 kB view details)

Uploaded Python 3

File details

Details for the file changepoint_online-1.2.0.tar.gz.

File metadata

  • Download URL: changepoint_online-1.2.0.tar.gz
  • Upload date:
  • Size: 34.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for changepoint_online-1.2.0.tar.gz
Algorithm Hash digest
SHA256 1f7077aa6c53453d12c855e7d61c8cbb8a114e274454c0c87798be43073db5a6
MD5 a83c6dbdab3cd14c9d2b1f80097e244f
BLAKE2b-256 fbed0a5e52030c357e9cef058e68ea6a7e1cbf4e8dda8902aba9d061a12a55b1

See more details on using hashes here.

File details

Details for the file changepoint_online-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for changepoint_online-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b6afa62706d7655ca5670eda564ccff00602275bb792b503c63b5ea2fd72f5af
MD5 42f7ec6320285431e9d1246e2527fae8
BLAKE2b-256 9fe0197d72559903c1ab2f69b69856b80a6bd8130bd47336506250e226390001

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page