Skip to main content

A Collection of Online Methods for Changepoint and Anomaly Detection

Project description

changepoint_online

A Collection of Methods for Online Changepoint Detection

The changepoint_online package provides efficient algorithms for detecting changes in data streams, based on the Focus algorithm. The Focus algorithm solves the CUSUM likelihood-ratio test exactly in $O(\log(n))$ time per iteration, where n represents the current iteration. The method is equivalent to running a rolling window (MOSUM) simultaneously for all sizes of windows or the Page-CUSUM for all possible values of the size of change (an infinitely dense grid).

Key Features

  • Contains all Focus exponential family algorithms as well as the NPFocus algorithm for non-parametric changepoint detection.

  • It’s versatile enough to be applied in scenarios where the pre-change parameter is either known or unknown.

  • It is possible to apply constraints to detect specific types of changes (such as increases or decreases in parameter values).

Installation

installing from PyPI

pip install changepoint-online

installing from github with pip

python -m pip install 'git+https://github.com/grosed/changepoint_online/#egg=changepoint_online&subdirectory=python/package'

Testing the package

To run package tests:

python -m changepoint_online.tests.runtests

Examples

Simple Univariate Gaussian Change-in-mean

from changepoint_online import Focus, Gaussian
import numpy as np

# generating some data with a change at 50,000
np.random.seed(0)
Y = np.concatenate((np.random.normal(loc=0.0, scale=1.0, size=50000), np.random.normal(loc=2.0, scale=1.0, size=5000)))

# initialize a Focus Gaussian detector
detector = Focus(Gaussian())
threshold = 13.0

for y in Y:
    # update your detector sequentially with
    detector.update(y)
    if detector.statistic() >= threshold:
        break

print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}

If the pre-change location is known (in case of previous training data), this can be specified with:

# initialize a Focus Gaussian detector (with pre-change location known)
detector = Focus(Gaussian(loc=0))
threshold = 13.0

for y in Y:
    # update your detector sequentially with
    detector.update(y)
    if detector.statistic() >= threshold:
        break

print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}

See help(FamilyName) for distribution specific parameters, e.g. help(Gaussian).

Change in one-parameter exponential family distributions

As in the Gaussian case, we can detect:

  • Poisson change-in-rate

  • Gamma change-in-scale (or rate). Exponential change-in-rate implemented as a Gamma shape=1.

  • Bernoulli change-in-probability

For example:

from changepoint_online import Focus, Gamma
import numpy as np 

np.random.seed(0)
Y = np.concatenate((np.random.gamma(4.0, scale=3.0, size=50000), 
                    np.random.gamma(4.0, scale=6.0, size=5000)))

# initialize a Gamma change-in-scale detector (with shape = 4)
detector = Focus(Gamma(shape=4.0))
threshold = 12.0
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
        
print(detector.changepoint())
{'stopping_time': 50012, 'changepoint': 50000}

Non-Parametric changepoint detection

If we do not know the underlying distribution, or if the nature of the change is unkown a priori, we can then use NPFocus.

from changepoint_online import NPFocus
import numpy as np

# Define a simple Gaussian noise function
def generate_gaussian_noise(size):
    return np.random.normal(loc=0.0, scale=1.0, size=size)

# Generate mixed data with change in gamma component
gamma_1 = np.random.gamma(4.0, scale=3.0, size=5000)
gamma_2 = np.random.gamma(4.0, scale=6.0, size=5000)
gaussian_noise = generate_gaussian_noise(10000)
Y = np.concatenate((gamma_1 + gaussian_noise[:5000], gamma_2 + gaussian_noise[5000:]))

# Create and use NPFocus detector
## One needs to provide some quantiles to track the null distribuition over
quantiles = [np.quantile(Y[:100], q) for q in [0.25, 0.5, 0.75]]
## the detector can be initialised with those quantiles
detector = NPFocus(quantiles)

stat_over_time = []

for y in Y:
    detector.update(y)
    # we can sum the statistics over to get a detection
    # see  (Romano, Eckley, and Fearnhead 2024) for more details
    if np.sum(detector.statistic()) > 25:
        break


changepoint_info = detector.changepoint()
print(changepoint_info["stopping_time"])
5014

Multivariate Data

It is possible to run a multivariate analysis via MDFocus by feeding, at each iteration, a numpy array.

Gaussian Change-in-mean

from changepoint_online import MDFocus, MDGaussian
import numpy as np

np.random.seed(123)

# Define means and standard deviations for pre-change and post-change periods (independent dimensions)
mean_pre = np.array([0.0, 0.0, 5.0])
mean_post = np.array([1.0, 1.0, 4.5])

# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500

# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.normal(mean_pre, size=(size_pre, 3))

# Generate post-change data (independent samples for each dimension)
Y_post = np.random.normal(mean_post, size=(size_post, 3))

# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))

# Initialize the Gaussian detector 
# (for change in mean known, as in univariate case, write MDGaussian(loc=mean_pre))
detector = MDFocus(MDGaussian(), pruning_params = (2, 1))
threshold = 25
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
print(detector.changepoint())
{'stopping_time': 5014, 'changepoint': 5000}

Poisson Change-in-rate

from changepoint_online import MDFocus, MDPoisson
import numpy as np

np.random.seed(123)

# Define rates (lambda) for pre-change and post-change periods (independent dimensions)
rate_pre = np.array([1.0, 2.0, 3.0])
rate_post = np.array([2.0, 3.0, 4.0])

# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500

# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.poisson(rate_pre, size=(size_pre, 3))

# Generate post-change data (independent samples for each dimension)
Y_post = np.random.poisson(rate_post, size=(size_post, 3))

# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))

# Initialize the Poisson detector
detector = MDFocus(MDPoisson(), pruning_params = (2, 1))
threshold = 45  # Adjust the threshold as needed for the Poisson distribution
for y in Y:
    detector.update(y)
    if detector.statistic() >= threshold:
        break
print(detector.changepoint())
{'stopping_time': 5056, 'changepoint': 5000}

Real-data examples

More examples, including real-world applications, are found in the examples folder, including:

License

Copyright (C) 2023 Gaetano Romano, Daniel Grose

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.txt.

GitHub Repository

Source files for the packages can be found at https://github.com/grosed/changepoint_online

Experimental

The package includes the nunc algorithm for non-parametric changepoint detectino from Austin et al. (2023) as an experimental function. It is possible to call nunc as:

from changepoint_online.nunc import nunc, nunc_default_quantiles
import numpy as np

np.random.seed(1)
# Generate data with change
X1 = np.random.normal(1, 1, 3000)
X2 = np.random.normal(-2, 4, 5000)
Y = np.concatenate((X1, X2))

# Create and use NUNC detector
detector = nunc(300, 5, nunc_default_quantiles)

stat_over_time = []

for y in Y:
    detector.update(y)
    stat_over_time.append(detector.max_cost)
    if detector.statistic() > 30:
        break


changepoint_info = detector.changepoint()

print(f"Output information:\n{changepoint_info}")
Output information:
{'stopping_time': 300, 'changepoint': 284, 'max_cost': 31.492069041110202}

Contributors

  • Gaetano Romano: email (Author) (Maintainer) (Creator) (Translator)

  • Daniel Grose: email (Author) (Maintainer) (Creator)

  • Kes Ward: email (Author)

  • Austin Edward: email (Author) (Maintainer)

  • Liudmila Pishchagina: email (Author)

  • Guillem Rigaill: email (Author) (Thesis Advisor)

  • Vincent Runge: email (Author) (Thesis Advisor)

  • Paul Fearnhead: email (Author) (Thesis Advisor)

  • Idris A. Eckley: email (Author) (Thesis Advisor)

How to Cite This Work

A possible BibTeX entry for this package could be:

@software{changepoint_online,
  author       = {Gaetano Romano, Daniel Grose, Kes Ward, Austin Edward, Liudmila Pishchagina, Guillem Rigaill, Vincent Runge, Paul Fearnhead, Idris A. Eckley},
  title        = {changepoint_online: A Collection of Methods for Online Changepoint Detection.},
  month        = Apr,
  year         = 2024,
  version      = {v1.2.1},
  url          = {https://https://github.com/grosed/changepoint_online}
}

For citing the methodologies:

  • Gaussian FOCuS: (Romano et al. 2023)

  • Other Exponential Family detectors: (Ward et al. 2024)

  • Multi-dimentional FOCuS (Pishchagina et al. 2023)

  • NPFocus: (Romano, Eckley, and Fearnhead 2024)

  • NUNC: (Austin et al. 2023)

See references below.

References

Austin, Edward, Gaetano Romano, Idris A Eckley, and Paul Fearnhead. 2023. “Online Non-Parametric Changepoint Detection with Application to Monitoring Operational Performance of Network Devices.” Computational Statistics & Data Analysis 177: 107551.

Pishchagina, Liudmila, Gaetano Romano, Paul Fearnhead, Vincent Runge, and Guillem Rigaill. 2023. “Online Multivariate Changepoint Detection: Leveraging Links with Computational Geometry.” arXiv Preprint arXiv:2311.01174.

Romano, Gaetano, Idris A. Eckley, and Paul Fearnhead. 2024. “A Log-Linear Nonparametric Online Changepoint Detection Algorithm Based on Functional Pruning.” IEEE Transactions on Signal Processing 72: 594–606. https://doi.org/10.1109/tsp.2023.3343550.

Romano, Gaetano, Idris A Eckley, Paul Fearnhead, and Guillem Rigaill. 2023. “Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics.” Journal of Machine Learning Research 24 (81): 1–36. https://www.jmlr.org/papers/v24/21-1230.html.

Ward, Kes, Gaetano Romano, Idris Eckley, and Paul Fearnhead. 2024. “A Constant-Per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models.” Statistics and Computing 34 (3): 1–11.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

changepoint_online-1.2.1.tar.gz (34.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

changepoint_online-1.2.1-py3-none-any.whl (36.5 kB view details)

Uploaded Python 3

File details

Details for the file changepoint_online-1.2.1.tar.gz.

File metadata

  • Download URL: changepoint_online-1.2.1.tar.gz
  • Upload date:
  • Size: 34.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.6

File hashes

Hashes for changepoint_online-1.2.1.tar.gz
Algorithm Hash digest
SHA256 e292af32960c0dae3f0db88dd1354652c5e2fa8a5ece414734340d04f2a0a78e
MD5 94fbbfeb26fd599cc793655ef56e9916
BLAKE2b-256 ef1c47a17c7ee52368d763abbb0c23925f999e2edc9b6bf20917f8fa52f5bd66

See more details on using hashes here.

File details

Details for the file changepoint_online-1.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for changepoint_online-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 647ccc6bbd7fc2bd9cfdfbe91cac6775e86e555dddd5589509c7bae780a18b57
MD5 f5cb1a85660cd92e453b0181483dc5b4
BLAKE2b-256 247cdb4863f37f24ccd188b06bf8b85054264cbf40c41cb58dd405dcf89fe08d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page