A Python package for performing fast change point detection.

fastcpd: Fast Change Point Detection in R

Overview

fastcpd (fast change point detection) is a fast implementation of change point detection methods in R. It is designed to find change points quickly, is easy to install, and can be extended to all kinds of change point problems through a user-specified cost function in addition to the built-in cost functions.
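Because the cost function is the pluggable part, the search itself can be written once for any segment cost. The sketch below is textbook PELT (pruned exact linear time) with a user-supplied cost, written in Python for illustration only; it is not fastcpd's implementation, and `pelt` and `squared_error` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence


def pelt(data: Sequence[float],
         segment_cost: Callable[[Sequence[float]], float],
         penalty: float) -> List[int]:
    """Return change points (indices where a new segment starts) found by PELT.

    `segment_cost` is any user-supplied cost of one segment. The pruning rule
    below assumes that splitting a segment never increases its total cost.
    """
    n = len(data)
    # best[t]: minimal penalized cost of data[:t]; last[t]: start of its final segment.
    best = [-penalty] + [math.inf] * n
    last = [0] * (n + 1)
    candidates = [0]
    for t in range(1, n + 1):
        costs = {s: best[s] + segment_cost(data[s:t]) + penalty for s in candidates}
        s_star = min(costs, key=costs.get)
        best[t], last[t] = costs[s_star], s_star
        # Pruning: drop start points that can never become optimal again.
        candidates = [s for s in candidates if costs[s] - penalty <= best[t]]
        candidates.append(t)
    # Backtrack from the end to recover the segment boundaries.
    change_points, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)


def squared_error(segment: Sequence[float]) -> float:
    """Mean-change cost with unit variance: sum of squared deviations from the mean."""
    m = sum(segment) / len(segment)
    return sum((x - m) ** 2 for x in segment)
```

For example, `pelt([0.0] * 50 + [5.0] * 50, squared_error, 10.0)` returns `[50]`. This version recomputes every segment cost from scratch, so it is only suitable for small inputs; faster versions typically update costs incrementally, for example with cumulative sums.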

To learn more about the algorithms behind the package:

Installation

R

# install.packages("devtools")
devtools::install_github("doccstat/fastcpd")
# or install from CRAN
install.packages("fastcpd")

Python WIP

# python -m ensurepip --upgrade  (run first if pip is not available)
pip install .  # from a local clone of the repository
# or install from TestPyPI
pip install --extra-index-url https://test.pypi.org/simple/ fastcpd
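After either pip command, the installed version can be confirmed from Python itself. This sketch relies only on the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and the distribution name is assumed to be `fastcpd`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "fastcpd") -> Optional[str]:
    """Return the installed distribution's version string, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "fastcpd is not installed in this environment")
```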

Usage

R

set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#> 
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#> 
#> Change points:
#> 614 
#> 
#> Cost values:
#> 2754.116 2038.945 
#> 
#> Parameters:
#>     segment 1 segment 2
#> 1  0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3  0.08221978 0.2290323
plot(result)
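The R call above fits one AR(3) model per segment. To make that idea concrete without relying on fastcpd's internals, here is a brute-force Python sketch that assumes exactly one change point and scores every candidate split by the total least-squares residual sum of squares of the two AR(3) fits. The function names are hypothetical, and the simulation uses NumPy's generator, so its draws differ from R's `set.seed(1)`.

```python
import numpy as np


def simulate_ar3(n: int = 1000, cp: int = 600, seed: int = 1) -> np.ndarray:
    """Simulate the AR(3) series from the R example: coefficients switch after `cp`."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 3)  # three zero start-up values, as in the R code
    for i in range(n):
        a1, a2, a3 = (0.6, -0.2, 0.1) if i < cp else (0.3, 0.4, 0.2)
        x[i + 3] = a1 * x[i + 2] + a2 * x[i + 1] + a3 * x[i] + rng.normal(0, 3)
    return x[3:]


def ar_rss(y: np.ndarray, p: int = 3) -> float:
    """Residual sum of squares of a least-squares AR(p) fit with no intercept."""
    # Column k holds lag k + 1 of the target y[p:].
    design = np.column_stack([y[p - k - 1:len(y) - k - 1] for k in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def best_single_split(y: np.ndarray, p: int = 3, margin: int = 50) -> int:
    """Split index minimizing the summed AR(p) residual sum of squares of both parts."""
    splits = range(margin, len(y) - margin)
    costs = [ar_rss(y[:tau], p) + ar_rss(y[tau:], p) for tau in splits]
    return splits[int(np.argmin(costs))]


y = simulate_ar3()
print(best_single_split(y))
```

With a single clear change like this one, the minimizer should fall close to the true change at 600, although the exact index depends on the random draws. Unlike a penalized search, this sketch cannot report zero or several change points.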

Python WIP

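import fastcpd
from numpy import concatenate
from numpy.random import multivariate_normal

# Three segments of 3-dimensional Gaussian data with a shared covariance;
# the mean shifts after observations 300 and 700.
covariance_mat = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
data = concatenate((multivariate_normal([0, 0, 0], covariance_mat, 300),
                    multivariate_normal([50, 50, 50], covariance_mat, 400),
                    multivariate_normal([2, 2, 2], covariance_mat, 300)))
# Detect change points in the mean.
fastcpd.mean(data)
# Variance estimation for the mean-change model.
fastcpd.variance_estimation.mean(data)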
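The example above also calls `fastcpd.variance_estimation.mean`. This page does not describe how that estimator works. One common approach, shown here only as an assumption-labeled sketch (`difference_covariance` is a hypothetical helper, not the package's method), estimates the noise covariance from first differences, which cancel a piecewise-constant mean everywhere except at the change points.

```python
import numpy as np


def difference_covariance(data: np.ndarray) -> np.ndarray:
    """Estimate the noise covariance of an (n, d) array from first differences.

    Within a segment the mean cancels in x[t + 1] - x[t], and for independent
    noise Var(x[t + 1] - x[t]) = 2 * Sigma, hence the factor 1 / 2. Only the
    few differences that straddle a change point are biased by the mean shift.
    """
    diffs = np.diff(data, axis=0)
    return diffs.T @ diffs / (2 * len(diffs))


# Same design as the Python example above, drawn with a seeded generator.
rng = np.random.default_rng(1)
cov = 100 * np.eye(3)
data = np.concatenate((rng.multivariate_normal([0, 0, 0], cov, 300),
                       rng.multivariate_normal([50, 50, 50], cov, 400),
                       rng.multivariate_normal([2, 2, 2], cov, 300)))
print(np.round(difference_covariance(data), 1))
```

The diagonal should come out close to the true value of 100, with a small upward bias from the two jumps, whereas the plain sample covariance of `data` would be inflated by the large mean shifts.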

Comparison

library(microbenchmark)
set.seed(1)
n <- 5 * 10^6
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
ggplot2::autoplot(microbenchmark(
  wbs = wbs::wbs(mean_data),
  not = not::not(mean_data, contrast = "pcwsConstMean"),
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  jointseg = jointseg::jointSeg(mean_data, K = 12),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)
))
#> Warning in microbenchmark(wbs = wbs::wbs(mean_data), not = not::not(mean_data,
#> : less accurate nanosecond times to avoid potential integer overflows
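The benchmark data contain a single, very large mean shift (from 0 to 50 with unit-variance noise) in the middle of the series. As a hedged illustration of why a problem like this can be solved in linear time, independent of how fastcpd or the other packages work internally, the sketch below locates one mean change with the classical CUSUM statistic. `cusum_single_change` is a hypothetical name, and the sample size is reduced from 5 × 10^6 to 10^6 for a quick run.

```python
import numpy as np


def cusum_single_change(x: np.ndarray) -> int:
    """Return the split k that best separates x[:k] from x[k:] in mean.

    The squared standardized CUSUM statistic at k equals the drop in residual
    sum of squares from splitting there, and one cumulative sum gives it for
    every k, so the scan is O(n).
    """
    n = len(x)
    k = np.arange(1, n)                # candidate splits 1, ..., n - 1
    left_sums = np.cumsum(x)[:-1]      # sum of x[:k] for each k
    stat = np.abs(left_sums - k / n * x.sum()) / np.sqrt(k * (n - k) / n)
    return int(k[np.argmax(stat)])


rng = np.random.default_rng(1)
n = 10**6
mean_data = np.concatenate((rng.normal(0, 1, n // 2), rng.normal(50, 1, n // 2)))
print(cusum_single_change(mean_data))  # 500000
```

This only handles one change point; multiple changes need a search such as binary segmentation or PELT on top of the same cumulative sums.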

library(microbenchmark)
set.seed(1)
n <- 10^8
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
system.time(fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1))
#>    user  system elapsed 
#>  11.753   9.150  26.455 
system.time(mosum::mosum(c(mean_data), G = 40))
#>    user  system elapsed 
#>   5.518  11.516  38.368 
system.time(fpop::Fpop(mean_data, 2 * log(n)))
#>    user  system elapsed 
#>  35.926   5.231  58.269 
system.time(changepoint::cpt.mean(mean_data, method = "PELT"))
#>    user  system elapsed 
#>  32.342   9.681  66.056 
ggplot2::autoplot(microbenchmark(
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1),
  times = 10
))

Some packages are not included in the microbenchmark comparison because they either exceeded the memory limit or ran too long on this data set, as shown below.

# Device: Mac mini (M1, 2020)
# Memory: 8 GB
system.time(CptNonPar::np.mojo(mean_data, G = floor(length(mean_data) / 6)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.061 0.026 0.092
system.time(ecp::e.divisive(matrix(mean_data)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.076 0.044 0.241
system.time(strucchange::breakpoints(y ~ 1, data = data.frame(y = mean_data)))
#> Timing stopped at: 265.1 145.8 832.5
system.time(breakfast::breakfast(mean_data))
#> Timing stopped at: 45.9 89.21 562.3

Cheatsheet

fastcpd cheatsheet

Function references

FAQ

Should I install suggested packages?

The suggested packages are not required for the main functionality of the package; they are only needed for the vignettes. To learn more about the package comparison and the other vignettes, you can read the vignettes on CRAN or the pkgdown-generated documentation.

I encountered problems related to gfortran on Mac OS X or Linux!

The package should install on Mac and on any Linux distribution without problems as long as all the dependencies are installed. If you encounter problems related to gfortran, it may be because RcppArmadillo was not installed beforehand. If you have trouble installing RcppArmadillo, try the Mac OS X Stack Overflow solution or the Linux Stack Overflow solution.

We welcome contributions from everyone. Please follow the instructions below to make contributions.
  1. Fork the repo.

  2. Create a new branch from main branch.

  3. Make changes and commit them.

    1. Please follow Google's R Style Guide for naming variables and functions.
    2. If you are adding a new family of models with new cost functions and their corresponding gradients and Hessians, please add them to src/fastcpd_class_cost.cc, with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
    3. Add the family name to src/fastcpd_constants.h.
    4. [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
    5. Add the new wrapper function to the corresponding section in _pkgdown.yml.
  4. Push the changes to your fork.

  5. Create a pull request.

  6. Make sure the pull request does not create new warnings or errors in devtools::check().
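The contribution steps above ask for a cost function together with its gradient and Hessian. Real contributions are C++ code in src/fastcpd_class_cost.cc, whose interface this page does not show, so the Python sketch below only illustrates the three quantities for one segment of a Poisson regression (chosen purely as an example) and checks the gradient against central differences, a quick sanity test worth running before porting any formulas. All function names are hypothetical.

```python
import numpy as np


def poisson_cost(theta, X, y):
    """Negative Poisson log-likelihood of one segment (log(y!) constant dropped)."""
    eta = X @ theta
    return float(np.sum(np.exp(eta) - y * eta))


def poisson_gradient(theta, X, y):
    """Gradient of poisson_cost with respect to theta: X^T (exp(X theta) - y)."""
    return X.T @ (np.exp(X @ theta) - y)


def poisson_hessian(theta, X, y):
    """Hessian of poisson_cost: X^T diag(exp(X theta)) X (y drops out)."""
    weights = np.exp(X @ theta)
    return (X * weights[:, None]).T @ X


def max_gradient_error(theta, X, y, eps=1e-6):
    """Largest gap between the analytic gradient and central differences."""
    numeric = np.array([
        (poisson_cost(theta + eps * e, X, y) - poisson_cost(theta - eps * e, X, y)) / (2 * eps)
        for e in np.eye(len(theta))
    ])
    return float(np.max(np.abs(numeric - poisson_gradient(theta, X, y))))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
theta = np.array([0.3, -0.2])
y = rng.poisson(np.exp(X @ theta))
print(max_gradient_error(theta, X, y))  # should be tiny, well below 1e-4
```

The same central-difference trick applied to `poisson_gradient` checks the Hessian column by column.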

Trouble installing the Python package?

Python headers are required to install the Python package. If you are using Ubuntu, you can install the headers with:

sudo apt install python3-dev

Encountered a bug or unintended behavior?
  1. File a ticket at GitHub Issues.
  2. Contact the authors specified in DESCRIPTION.

Stargazers over time

Acknowledgements

Special thanks to clODE.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastcpd-0.17.0.tar.gz (6.2 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

fastcpd-0.17.0-cp312-cp312-macosx_12_0_universal2.whl (201.4 kB)

Uploaded CPython 3.12, macOS 12.0+ universal2 (ARM64, x86-64)

fastcpd-0.17.0-cp311-cp311-macosx_12_0_universal2.whl (200.8 kB)

Uploaded CPython 3.11, macOS 12.0+ universal2 (ARM64, x86-64)

fastcpd-0.17.0-cp310-cp310-macosx_13_0_x86_64.whl (200.8 kB)

Uploaded CPython 3.10, macOS 13.0+ x86-64

File details

Details for the file fastcpd-0.17.0.tar.gz.

File metadata

  • Download URL: fastcpd-0.17.0.tar.gz
  • Upload date:
  • Size: 6.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for fastcpd-0.17.0.tar.gz
Algorithm Hash digest
SHA256 ac95caa83c9a3cd948603c93f1879244ba9102e010fcb807174facbf79431b64
MD5 c9f2a521674f8756ba8a41917bbf308b
BLAKE2b-256 14e761873c89c1dac2b0beed5155d2a14cc51656b5085929ec56b5d77a571095

See more details on using hashes here.
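The SHA256 digest listed above can be checked locally before installing a downloaded file. This is a minimal sketch using Python's standard `hashlib` module; `sha256_of` and `verify` are hypothetical helper names, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the source distribution.
EXPECTED_SHA256 = "ac95caa83c9a3cd948603c93f1879244ba9102e010fcb807174facbf79431b64"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("fastcpd-0.17.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

pip can enforce the same check itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).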

File details

Details for the file fastcpd-0.17.0-cp312-cp312-macosx_12_0_universal2.whl.

File hashes

Hashes for fastcpd-0.17.0-cp312-cp312-macosx_12_0_universal2.whl
Algorithm Hash digest
SHA256 c4f902f45a4ffac2c64ab69add73b58cee7c49b4b6bf231002326c570dc6d678
MD5 5ca71d82b6dfeadf819b151c8c1769f2
BLAKE2b-256 afb5d66336f23d9fe90b8bdbd331b64990d6e304d056803f58b063451b6a2265

File details

Details for the file fastcpd-0.17.0-cp311-cp311-macosx_12_0_universal2.whl.

File hashes

Hashes for fastcpd-0.17.0-cp311-cp311-macosx_12_0_universal2.whl
Algorithm Hash digest
SHA256 96a852c14fb695751610f6a75c7fa425e0f4068848790edfd6aff5e628fc65b4
MD5 5faf11811ec15355aca06763a38c4583
BLAKE2b-256 babf5e8c90d316ec0205ce457dd516a7c0ecc6e932bdd49662f4708c32c3ef99

File details

Details for the file fastcpd-0.17.0-cp310-cp310-macosx_13_0_x86_64.whl.

File hashes

Hashes for fastcpd-0.17.0-cp310-cp310-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 78b950dd5334498e09a8456e84e393e9efb8be72a9a00ea2b57943702ec5807f
MD5 dbbb328c0bca10f14e6e8a1136cfd7e0
BLAKE2b-256 fc01282300d7a8a6954bcebc050a93388486048ff315756677584c0b57071655
