A Python package for performing fast change point detection.

Fast Change Point Detection

Overview

fastcpd (fast change point detection) is a fast implementation of change point detection methods in R and Python.

Documentation

Installation

R

# install.packages("devtools")
devtools::install_github("doccstat/fastcpd")
# or install from CRAN
install.packages("fastcpd")

Python WIP

# python -m ensurepip --upgrade
pip install .
# or install from PyPI
pip install fastcpd

Usage

R

set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#> 
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#> 
#> Change points:
#> 614 
#> 
#> Cost values:
#> 2754.116 2038.945 
#> 
#> Parameters:
#>     segment 1 segment 2
#> 1  0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3  0.08221978 0.2290323
plot(result)
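
For readers following along in Python, the simulated series in the R example can be sketched with NumPy alone. This is only an illustration of the data-generating process (an AR(3) series whose coefficients switch after 600 observations), not a call into fastcpd. The function names `simulate_ar3_with_change` and `fit_ar_least_squares` are hypothetical, the least-squares fit is a stand-in that is not confirmed to match how fastcpd estimates its parameters, and NumPy's random stream differs from R's `set.seed(1)`, so the numbers will not match the output above.

```python
import numpy as np


def simulate_ar3_with_change(n=1000, change_at=600, sigma=3.0, seed=1):
    """Simulate an AR(3) series whose coefficients switch after `change_at` steps."""
    rng = np.random.default_rng(seed)
    coef_before = np.array([0.6, -0.2, 0.1])  # lags 1, 2, 3 (from the R example)
    coef_after = np.array([0.3, 0.4, 0.2])
    x = np.zeros(n + 3)  # three leading zeros serve as initial values
    for i in range(n):
        coef = coef_before if i < change_at else coef_after
        lags = x[i:i + 3][::-1]  # x[i + 2], x[i + 1], x[i]
        x[i + 3] = coef @ lags + rng.normal(0.0, sigma)
    return x[3:]


def fit_ar_least_squares(series, order=3):
    """Ordinary least-squares estimate of AR coefficients (lag 1 first)."""
    # Column j holds the lag-(j + 1) values aligned with the targets series[order:].
    design = np.column_stack(
        [series[order - j - 1:len(series) - j - 1] for j in range(order)]
    )
    target = series[order:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


series = simulate_ar3_with_change()
# Fit each known segment separately; a detector has to find the split itself.
print("segment 1:", fit_ar_least_squares(series[:600]))
print("segment 2:", fit_ar_least_squares(series[600:]))
```

The two fitted coefficient vectors should land near the values used in the simulation, which is the same kind of information the "Parameters" table in the R summary reports per segment.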

Python WIP

import fastcpd.segmentation
from numpy import concatenate
from numpy.random import multivariate_normal
# Three segments of 3-dimensional Gaussian data that differ only in their mean.
covariance_mat = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
data = concatenate((multivariate_normal([0, 0, 0], covariance_mat, 300),
                    multivariate_normal([50, 50, 50], covariance_mat, 400),
                    multivariate_normal([2, 2, 2], covariance_mat, 300)))
# Detect changes in the mean.
fastcpd.segmentation.mean(data)

# Variance estimation for the mean change model.
import fastcpd.variance_estimation
fastcpd.variance_estimation.mean(data)
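
Since the Python interface is marked WIP and its return format is not shown here, it can help to keep the ground truth of the simulated data at hand when checking a result. The following is a minimal sketch using NumPy only; the variable names `segment_sizes` and `true_change_points` are illustrative and are not part of fastcpd.

```python
import numpy as np

rng = np.random.default_rng(0)
covariance = 100 * np.eye(3)
segment_means = [[0, 0, 0], [50, 50, 50], [2, 2, 2]]
segment_sizes = [300, 400, 300]

# Same layout as the example above: three stacked Gaussian segments.
data = np.concatenate([
    rng.multivariate_normal(mean, covariance, size=size)
    for mean, size in zip(segment_means, segment_sizes)
])

# A change occurs after each segment except the last one.
true_change_points = np.cumsum(segment_sizes)[:-1]
print("true change points:", true_change_points.tolist())

# Per-segment sample means, for comparison with the means used to simulate.
for segment in np.split(data, true_change_points):
    print(segment.shape[0], segment.mean(axis=0).round(1))
```

A mean-change detector applied to `data` would be expected to report locations close to 300 and 700.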

Comparison

library(microbenchmark)
set.seed(1)
n <- 5 * 10^6
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
ggplot2::autoplot(microbenchmark(
  wbs = wbs::wbs(mean_data),
  not = not::not(mean_data, contrast = "pcwsConstMean"),
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  jointseg = jointseg::jointSeg(mean_data, K = 12),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)
))
#> Warning in microbenchmark(wbs = wbs::wbs(mean_data), not = not::not(mean_data,
#> : less accurate nanosecond times to avoid potential integer overflows

library(microbenchmark)
set.seed(1)
n <- 10^8
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
system.time(fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1))
#>    user  system elapsed 
#>  11.753   9.150  26.455 
system.time(mosum::mosum(c(mean_data), G = 40))
#>    user  system elapsed 
#>   5.518  11.516  38.368 
system.time(fpop::Fpop(mean_data, 2 * log(n)))
#>    user  system elapsed 
#>  35.926   5.231  58.269 
system.time(changepoint::cpt.mean(mean_data, method = "PELT"))
#>    user  system elapsed 
#>  32.342   9.681  66.056 
ggplot2::autoplot(microbenchmark(
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1),
  times = 10
))

Some packages are not included in the microbenchmark comparison due to either memory constraints or long running time.

# Device: Mac mini (M1, 2020)
# Memory: 8 GB
system.time(CptNonPar::np.mojo(mean_data, G = floor(length(mean_data) / 6)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.061 0.026 0.092
system.time(ecp::e.divisive(matrix(mean_data)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.076 0.044 0.241
system.time(strucchange::breakpoints(y ~ 1, data = data.frame(y = mean_data)))
#> Timing stopped at: 265.1 145.8 832.5
system.time(breakfast::breakfast(mean_data))
#> Timing stopped at: 45.9 89.21 562.3
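
The benchmarks above time each detector on a series with a single large mean shift at its midpoint. As a rough Python illustration of that setup, the sketch below builds the same kind of series and times a plain CUSUM estimator with `time.perf_counter`. The CUSUM statistic is swapped in purely for illustration and is not the algorithm used by fastcpd or by any of the packages benchmarked above; `cusum_single_change` is a hypothetical helper, and `n` is reduced to 10^6 to keep the run short.

```python
import time

import numpy as np


def cusum_single_change(x):
    """Estimate the location of a single mean change via the CUSUM statistic."""
    n = x.size
    k = np.arange(1, n)          # candidate split: first k points vs. the rest
    partial = np.cumsum(x)[:-1]  # sum of the first k observations
    total = x.sum()
    # Standardised gap between the partial sum and its no-change expectation.
    stat = np.abs(partial - k * total / n) / np.sqrt(k * (n - k) / n)
    return int(k[np.argmax(stat)])


rng = np.random.default_rng(1)
n = 10**6  # smaller than in the R benchmark
mean_data = np.concatenate([rng.normal(0, 1, n // 2), rng.normal(50, 1, n // 2)])

start = time.perf_counter()
estimate = cusum_single_change(mean_data)
elapsed = time.perf_counter() - start
print(f"estimated change point: {estimate}, elapsed: {elapsed:.3f}s")
```

Timings from this sketch are not comparable with the R figures above; it only shows the shape of the experiment.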

Cheatsheet

fastcpd cheatsheet

References

FAQ

Should I install suggested packages?

The suggested packages are not required for the main functionality of the package. They are only required for the vignettes. To read the package comparison and the other vignettes, see either the vignettes on CRAN or the pkgdown-generated documentation.

I encountered problems related to gfortran on macOS or Linux!

The package should install on macOS and any Linux distribution without problems if all the dependencies are installed. However, if you encounter problems related to gfortran, it might be because RcppArmadillo was not installed beforehand. Try the macOS Stack Overflow solution or the Linux Stack Overflow solution if you have trouble installing RcppArmadillo.

We welcome contributions from everyone. Please follow the instructions below to make contributions.
  1. Fork the repo.
  2. Create a new branch from the main branch.
  3. Make changes and commit them.
    1. Please follow Google's R style guide for naming variables and functions.
    2. If you are adding a new family of models with new cost functions and the corresponding gradient and Hessian, please add them to src/fastcpd_class_cost.cc with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
    3. Add the family name to src/fastcpd_constants.h.
    4. [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
    5. Add the new wrapper function to the corresponding section in _pkgdown.yml.
  4. Push the changes to your fork.
  5. Create a pull request.
  6. Make sure the pull request does not create new warnings or errors in devtools::check().

Trouble installing Python package.

Python headers are required to install the Python package. If you are using Ubuntu, you can install the headers with:

sudo apt install python3-dev
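
Before installing, one way to check whether the headers are already present is to ask the running interpreter where it expects `Python.h`. This is a minimal sketch using only the standard library; the helper name `python_header_path` is illustrative, and some environments may keep headers in a different location than the one `sysconfig` reports.

```python
import sysconfig
from pathlib import Path


def python_header_path():
    """Return the expected location of Python.h for the running interpreter."""
    return Path(sysconfig.get_paths()["include"]) / "Python.h"


header = python_header_path()
if header.is_file():
    print(f"Found Python headers: {header}")
else:
    print(f"Python.h not found at {header}; install the development headers first.")
```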

Encountered a bug or unintended behavior?
  1. File a ticket at GitHub Issues.
  2. Contact the authors specified in DESCRIPTION.

Acknowledgements

Special thanks to clODE.
