A Python package for performing fast change point detection.

Fast Change Point Detection

Overview

fastcpd (fast change point detection) is a fast implementation of change point detection methods in R/Python.

Documentation

Installation

R

# install.packages("devtools")
devtools::install_github("doccstat/fastcpd")
# or install from CRAN
install.packages("fastcpd")

Python WIP

# python -m ensurepip --upgrade
pip install .
# or install from PyPI
pip install fastcpd

Usage

R

set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
  x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
  x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#> 
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#> 
#> Change points:
#> 614 
#> 
#> Cost values:
#> 2754.116 2038.945 
#> 
#> Parameters:
#>     segment 1 segment 2
#> 1  0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3  0.08221978 0.2290323
plot(result)
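
For readers following along in Python, the data-generating process of the R example can be sketched with NumPy alone. This is a minimal illustration and not part of fastcpd: `simulate_ar3` and `fit_ar_least_squares` are hypothetical helper names, and the fit assumes the change location (600) is already known, whereas fastcpd estimates it. Values will differ from the R output because the random number generators differ.

```python
import numpy as np


def simulate_ar3(coefficients_by_segment, segment_lengths, noise_sd, rng):
    """Simulate an AR(3) series whose coefficients switch between segments."""
    total = sum(segment_lengths)
    x = np.zeros(total + 3)  # three zero start-up values, as in the R code
    t = 0
    for coefs, length in zip(coefficients_by_segment, segment_lengths):
        for _ in range(length):
            # Coefficients are ordered from the most recent lag to the oldest.
            x[t + 3] = (coefs[0] * x[t + 2] + coefs[1] * x[t + 1]
                        + coefs[2] * x[t] + rng.normal(0, noise_sd))
            t += 1
    return x[3:]  # drop the start-up values


def fit_ar_least_squares(series, order):
    """Estimate AR coefficients (most recent lag first) by least squares."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Column k holds the series lagged by k + 1 steps.
    lagged = np.column_stack(
        [series[order - k - 1:n - k - 1] for k in range(order)]
    )
    target = series[order:]
    coefs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coefs


rng = np.random.default_rng(1)
series = simulate_ar3([(0.6, -0.2, 0.1), (0.3, 0.4, 0.2)], [600, 400], 3.0, rng)
print(fit_ar_least_squares(series[:600], 3))  # estimates for segment 1
print(fit_ar_least_squares(series[600:], 3))  # estimates for segment 2
```

The two printed vectors play the role of the "Parameters" columns in the R summary: one coefficient vector per segment.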

Python WIP

import fastcpd.segmentation
from numpy import concatenate
from numpy.random import multivariate_normal

# Three segments of 3-dimensional Gaussian data with different means
# and a shared covariance matrix (100 times the identity).
covariance_mat = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
data = concatenate((multivariate_normal([0, 0, 0], covariance_mat, 300),
                    multivariate_normal([50, 50, 50], covariance_mat, 400),
                    multivariate_normal([2, 2, 2], covariance_mat, 300)))
# Detect changes in the mean of the simulated data.
fastcpd.segmentation.mean(data)
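
To make the mean-change problem concrete, here is a minimal one-dimensional sketch of penalized optimal partitioning written with NumPy only. It is not fastcpd's algorithm or API: `segment_cost` and `optimal_partitioning` are hypothetical names, the noise variance is assumed to be 1, and the penalty `2 * log(n)` is an assumption borrowed from the fpop call in the Comparison section. Its running time is quadratic in the number of observations, so it only suits small inputs.

```python
import numpy as np


def segment_cost(cumsum, cumsum_sq, start, end):
    """Sum of squared deviations from the mean of data[start:end]."""
    length = end - start
    total = cumsum[end] - cumsum[start]
    total_sq = cumsum_sq[end] - cumsum_sq[start]
    return total_sq - total * total / length


def optimal_partitioning(data, penalty):
    """Return change points minimizing total segment cost plus a penalty per change.

    A returned value k means the first k observations end a segment.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    cumsum = np.concatenate(([0.0], np.cumsum(data)))
    cumsum_sq = np.concatenate(([0.0], np.cumsum(np.square(data))))
    best_cost = np.full(n + 1, np.inf)
    best_cost[0] = -penalty  # so that the first segment carries no penalty
    last_change = np.zeros(n + 1, dtype=int)
    for end in range(1, n + 1):
        for start in range(end):
            candidate = (best_cost[start]
                         + segment_cost(cumsum, cumsum_sq, start, end)
                         + penalty)
            if candidate < best_cost[end]:
                best_cost[end] = candidate
                last_change[end] = start
    # Walk back through the stored segment starts to recover the change points.
    change_points = []
    position = n
    while position > 0:
        position = int(last_change[position])
        if position > 0:
            change_points.append(position)
    return sorted(change_points)


rng = np.random.default_rng(0)
toy_data = np.concatenate(
    (rng.normal(0, 1, 100), rng.normal(10, 1, 100), rng.normal(0, 1, 100))
)
detected = optimal_partitioning(toy_data, penalty=2 * np.log(len(toy_data)))
print(detected)
```

The double loop is what makes this sketch slow on large inputs; the packages benchmarked in the Comparison section exist precisely because long series need something faster.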

import fastcpd.variance_estimation
fastcpd.variance_estimation.mean(data)
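
One common way to estimate the noise covariance when the mean may jump is a difference-based estimator: within a segment, consecutive differences have covariance equal to twice the noise covariance, and only a handful of differences straddle a change. The sketch below illustrates that idea with NumPy. Whether `fastcpd.variance_estimation.mean` uses exactly this estimator is an assumption, and `difference_based_covariance` is a hypothetical name.

```python
import numpy as np


def difference_based_covariance(data):
    """Estimate the noise covariance from successive differences of the rows."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]  # treat a vector as one-dimensional observations
    diffs = np.diff(data, axis=0)
    # Each difference has covariance 2 * Sigma inside a segment, hence the 2.
    return diffs.T @ diffs / (2 * diffs.shape[0])


rng = np.random.default_rng(0)
covariance = 100 * np.eye(3)
sample = np.concatenate((
    rng.multivariate_normal([0, 0, 0], covariance, 300),
    rng.multivariate_normal([50, 50, 50], covariance, 400),
    rng.multivariate_normal([2, 2, 2], covariance, 300),
))
estimate = difference_based_covariance(sample)
naive = np.cov(sample, rowvar=False)  # ignores the mean changes
print(np.diag(estimate))
print(np.diag(naive))
```

The naive sample covariance is inflated because it treats the shifts in mean as noise, while the difference-based estimate is only mildly affected by the two jumps.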

Comparison

library(microbenchmark)
set.seed(1)
n <- 5 * 10^6
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
ggplot2::autoplot(microbenchmark(
  wbs = wbs::wbs(mean_data),
  not = not::not(mean_data, contrast = "pcwsConstMean"),
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  jointseg = jointseg::jointSeg(mean_data, K = 12),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)
))
#> Warning in microbenchmark(wbs = wbs::wbs(mean_data), not = not::not(mean_data,
#> : less accurate nanosecond times to avoid potential integer overflows

library(microbenchmark)
set.seed(1)
n <- 10^8
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
system.time(fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1))
#>    user  system elapsed 
#>  11.753   9.150  26.455 
system.time(mosum::mosum(c(mean_data), G = 40))
#>    user  system elapsed 
#>   5.518  11.516  38.368 
system.time(fpop::Fpop(mean_data, 2 * log(n)))
#>    user  system elapsed 
#>  35.926   5.231  58.269 
system.time(changepoint::cpt.mean(mean_data, method = "PELT"))
#>    user  system elapsed 
#>  32.342   9.681  66.056 
ggplot2::autoplot(microbenchmark(
  changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
  fpop = fpop::Fpop(mean_data, 2 * log(n)),
  mosum = mosum::mosum(c(mean_data), G = 40),
  fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1),
  times = 10
))

Some packages are not included in the microbenchmark comparison due to either memory constraints or long running time.

# Device: Mac mini (M1, 2020)
# Memory: 8 GB
system.time(CptNonPar::np.mojo(mean_data, G = floor(length(mean_data) / 6)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.061 0.026 0.092
system.time(ecp::e.divisive(matrix(mean_data)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.076 0.044 0.241
system.time(strucchange::breakpoints(y ~ 1, data = data.frame(y = mean_data)))
#> Timing stopped at: 265.1 145.8 832.5
system.time(breakfast::breakfast(mean_data))
#> Timing stopped at: 45.9 89.21 562.3

Cheatsheet

fastcpd cheatsheet

References

FAQ

Should I install suggested packages?

The suggested packages are not required for the main functionality of the package. They are only required for the vignettes. If you want to learn more about the package comparison and other vignettes, you can check out the vignettes on CRAN or the pkgdown-generated documentation.

I encountered problems related to gfortran on Mac OSX or Linux!

The package should install on Mac and any Linux distribution without problems if all the dependencies are installed. However, if you encounter problems related to gfortran, it might be because RcppArmadillo was not installed beforehand. Try the Mac OSX Stack Overflow solution or the Linux Stack Overflow solution if you have trouble installing RcppArmadillo.

We welcome contributions from everyone. Please follow the instructions below to contribute.

  1. Fork the repo.
  2. Create a new branch from the main branch.
  3. Make changes and commit them.
    1. Please follow Google’s R style guide for naming variables and functions.
    2. If you are adding a new family of models with new cost functions and the corresponding gradient and Hessian, please add them to src/fastcpd_class_cost.cc, with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
    3. Add the family name to src/fastcpd_constants.h.
    4. [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
    5. Add the new wrapper function to the corresponding section in _pkgdown.yml.
  4. Push the changes to your fork.
  5. Create a pull request.
  6. Make sure the pull request does not create new warnings or errors in devtools::check().

Trouble installing the Python package?

Python headers are required to install the Python package. If you are using Ubuntu, you can install the headers with:

sudo apt install python3-dev
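
Before reinstalling anything, it can help to check whether the headers are already visible to the interpreter you are using. The following is a small sketch based on the standard library's `sysconfig` module. `python_headers_available` is a hypothetical helper name, and finding `Python.h` is a necessary condition only, not a guarantee that the build will succeed.

```python
import os
import sysconfig


def python_headers_available():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths().get("include")
    if not include_dir:
        return False
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


print(python_headers_available())
```

If this prints False, installing the development headers for your distribution (python3-dev on Ubuntu, as above) is the likely fix.
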

Encountered a bug or unintended behavior?
  1. File a ticket at GitHub Issues.
  2. Contact the authors specified in DESCRIPTION.

Stargazers over time

Acknowledgements

Special thanks to clODE.
