A Python package for performing fast change point detection.
fastcpd: Fast Change Point Detection in R 
Overview
fastcpd (fast change point detection) is a fast implementation of change point detection methods in R. It is easy to install and extensible to all kinds of change point problems: in addition to the built-in cost functions, users can supply their own cost function.
To learn more about the algorithms behind the package:
- fastcpd: Fast Change Point Detection in R
- Sequential Gradient Descent and Quasi-Newton’s Method for Change-Point Analysis
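To make the idea of a user-specified cost function concrete, here is a minimal sketch of penalized change point detection in plain Python. It uses textbook optimal partitioning (dynamic programming), not the sequential gradient descent algorithm that fastcpd implements, and the names `sse_cost` and `optimal_partition` are hypothetical, introduced only for this illustration.

```python
import math


def sse_cost(segment):
    """Cost of one segment: sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def optimal_partition(data, cost, penalty):
    """Minimize the sum of segment costs plus `penalty` per change point.

    Plain dynamic programming with O(n^2) cost evaluations. This is an
    illustration only, not the algorithm used by fastcpd.
    """
    n = len(data)
    # best[t]: minimal objective for data[:t]; best[0] offsets the first penalty.
    best = [-penalty] + [math.inf] * n
    # last[t]: start index of the final segment in the best split of data[:t].
    last = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            value = best[s] + cost(data[s:t]) + penalty
            if value < best[t]:
                best[t], last[t] = value, s
    # Walk back through the stored segment starts to recover the change points.
    change_points = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            change_points.append(s)
        t = s
    return sorted(change_points)


# Toy data: 30 points near 0 followed by 30 points near 5.
data = [0.0, 0.1, -0.1] * 10 + [5.0, 5.1, 4.9] * 10
# Each change point is the number of observations before the new segment starts.
change_points = optimal_partition(data, sse_cost, penalty=1.0)
print(change_points)
```

Any function that maps a segment to a non-negative number can be passed as `cost`, which is the sense in which a change point method can be extended to new models.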
Installation
R
# install.packages("devtools")
devtools::install_github("doccstat/fastcpd")
# or install from CRAN
install.packages("fastcpd")
Python WIP
# python -m ensurepip --upgrade
pip install .
# or install from TestPyPI
pip install --extra-index-url https://test.pypi.org/simple/ fastcpd
Usage
R
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd::fastcpd.ar(x[3 + seq_len(n)], 3, r.progress = FALSE)
summary(result)
#>
#> Call:
#> fastcpd::fastcpd.ar(data = x[3 + seq_len(n)], order = 3, r.progress = FALSE)
#>
#> Change points:
#> 614
#>
#> Cost values:
#> 2754.116 2038.945
#>
#> Parameters:
#> segment 1 segment 2
#> 1 0.57120256 0.2371809
#> 2 -0.20985108 0.4031244
#> 3 0.08221978 0.2290323
plot(result)
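The "Parameters" table in the output above lists one set of AR(3) coefficients per detected segment. As a rough illustration of what those numbers represent, the following plain-Python sketch simulates the same two-regime AR(3) process and fits each regime by least squares, assuming the true change location (600) is already known. The helper names are hypothetical, and Python's random number generator differs from R's, so the estimates will not match the R output digit for digit.

```python
import random


def simulate_ar3(coefs, n, history, rng, sd=3.0):
    """Continue an AR(3) series for n steps from the last 3 values of history."""
    x = list(history[-3:])
    for _ in range(n):
        x.append(coefs[0] * x[-1] + coefs[1] * x[-2] + coefs[2] * x[-3]
                 + rng.gauss(0, sd))
    return x[3:]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_ar(x, order=3):
    """Least-squares AR coefficients (lag 1 first) via the normal equations."""
    rows = [[x[t - k] for k in range(1, order + 1)] for t in range(order, len(x))]
    targets = x[order:]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
            for i in range(order)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve(gram, rhs)


rng = random.Random(1)
seg1 = simulate_ar3([0.6, -0.2, 0.1], 600, [0.0, 0.0, 0.0], rng)
seg2 = simulate_ar3([0.3, 0.4, 0.2], 400, seg1, rng)
series = seg1 + seg2

est1 = fit_ar(series[:600])   # coefficients of the first regime
est2 = fit_ar(series[600:])   # coefficients of the second regime
print(est1)
print(est2)
```

In practice the change location is unknown, which is the part that `fastcpd.ar` estimates together with the per-segment parameters.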
Python WIP
import fastcpd
from numpy import concatenate
from numpy.random import normal, multivariate_normal
covariance_mat = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
data = concatenate((multivariate_normal([0, 0, 0], covariance_mat, 300),
multivariate_normal([50, 50, 50], covariance_mat, 400),
multivariate_normal([2, 2, 2], covariance_mat, 300)))
fastcpd.mean(data)
fastcpd.variance_estimation.mean(data)
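Mean change detection needs a variance estimate that is not inflated by the mean shifts themselves. One common choice is a difference-based (Rice-type) estimator, sketched below for a univariate series in plain Python. That `fastcpd.variance_estimation.mean` uses this particular estimator is an assumption here, and `rice_variance` is a hypothetical name for illustration only.

```python
import random


def rice_variance(values):
    """Difference-based estimate: mean squared successive difference over 2."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))


# Univariate analogue of the data above: noise variance 100, means 0, 50, 2.
rng = random.Random(0)
signal = ([rng.gauss(0, 10) for _ in range(300)]
          + [rng.gauss(50, 10) for _ in range(400)]
          + [rng.gauss(2, 10) for _ in range(300)])

# The ordinary sample variance mixes the noise with the jumps in the mean.
overall_mean = sum(signal) / len(signal)
naive = sum((v - overall_mean) ** 2 for v in signal) / (len(signal) - 1)

# Successive differences cancel the mean except at the two jump locations.
estimate = rice_variance(signal)
print(naive, estimate)
```

The difference-based estimate stays close to the true noise variance because only the two differences that straddle a change are affected by the shift, whereas the naive sample variance absorbs the full spread of the segment means.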
Comparison
library(microbenchmark)
set.seed(1)
n <- 5 * 10^6
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
ggplot2::autoplot(microbenchmark(
wbs = wbs::wbs(mean_data),
not = not::not(mean_data, contrast = "pcwsConstMean"),
changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
jointseg = jointseg::jointSeg(mean_data, K = 12),
fpop = fpop::Fpop(mean_data, 2 * log(n)),
mosum = mosum::mosum(c(mean_data), G = 40),
fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1)
))
#> Warning in microbenchmark(wbs = wbs::wbs(mean_data), not = not::not(mean_data,
#> : less accurate nanosecond times to avoid potential integer overflows
library(microbenchmark)
set.seed(1)
n <- 10^8
mean_data <- c(rnorm(n / 2, 0, 1), rnorm(n / 2, 50, 1))
system.time(fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1))
#> user system elapsed
#> 11.753 9.150 26.455
system.time(mosum::mosum(c(mean_data), G = 40))
#> user system elapsed
#> 5.518 11.516 38.368
system.time(fpop::Fpop(mean_data, 2 * log(n)))
#> user system elapsed
#> 35.926 5.231 58.269
system.time(changepoint::cpt.mean(mean_data, method = "PELT"))
#> user system elapsed
#> 32.342 9.681 66.056
ggplot2::autoplot(microbenchmark(
changepoint = changepoint::cpt.mean(mean_data, method = "PELT"),
fpop = fpop::Fpop(mean_data, 2 * log(n)),
mosum = mosum::mosum(c(mean_data), G = 40),
fastcpd = fastcpd::fastcpd.mean(mean_data, r.progress = FALSE, cp_only = TRUE, variance_estimation = 1),
times = 10
))
Some packages are not included in the microbenchmark comparison due to
either memory constraints or long running time.
# Device: Mac mini (M1, 2020)
# Memory: 8 GB
system.time(CptNonPar::np.mojo(mean_data, G = floor(length(mean_data) / 6)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.061 0.026 0.092
system.time(ecp::e.divisive(matrix(mean_data)))
#> Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
#> Timing stopped at: 0.076 0.044 0.241
system.time(strucchange::breakpoints(y ~ 1, data = data.frame(y = mean_data)))
#> Timing stopped at: 265.1 145.8 832.5
system.time(breakfast::breakfast(mean_data))
#> Timing stopped at: 45.9 89.21 562.3
Cheatsheet
Function references
- Main function
- Wrapper functions
  - Time series
    - AR(p): fastcpd_ar
    - ARIMA(p, d, q): fastcpd_arima
    - ARMA(p, q): fastcpd_arma
    - GARCH(p, q): fastcpd_garch
    - VAR(p): fastcpd_var
    - General time series: fastcpd_ts
  - Unlabeled data
    - Mean change: fastcpd_mean
    - Variance change: fastcpd_variance
    - Mean and/or variance change: fastcpd_meanvariance
  - Regression data
    - Logistic regression: fastcpd_binomial
    - Penalized linear regression: fastcpd_lasso
    - Linear regression: fastcpd_lm
    - Poisson regression: fastcpd_poisson
- Utility functions
  - Variance estimation
    - Variance estimation in ARMA models: variance_arma
    - Variance estimation in linear models: variance_lm
    - Variance estimation in mean change models: variance_mean
    - Variance estimation in median change models: variance_median
- Class methods
- Data
  - Bitcoin Market Price (USD): bitcoin
  - Occupancy Detection Data Set: occupancy
  - Transcription Profiling of 57 Human Bladder Carcinoma Samples: transcriptome
  - UK Seatbelts Data: uk_seatbelts
  - Well-log Dataset from Numerical Bayesian Methods Applied to Signal Processing: well_log
- Main class
FAQ
Should I install suggested packages?
The suggested packages are not required for the main functionality of the package. They are only required for the vignettes. If you want to learn more about the package comparison and other vignettes, you can check out either the vignettes on CRAN or the pkgdown-generated documentation.
I encountered problems related to gfortran on Mac OSX or Linux!
The package should install on Mac and any Linux distribution without any problems if all the dependencies are installed. However, if you encounter problems related to gfortran, it might be because RcppArmadillo has not been installed yet. Try the Mac OSX Stack Overflow solution or the Linux Stack Overflow solution if you have trouble installing RcppArmadillo.
We welcome contributions from everyone. Please follow the instructions below to make contributions.
- Fork the repo.
- Create a new branch from the main branch.
- Make changes and commit them.
  - Please follow Google’s R style guide for naming variables and functions.
  - If you are adding a new family of models with new cost functions and the corresponding gradient and Hessian, please add them to src/fastcpd_class_cost.cc with proper examples and tests in vignettes/gallery.Rmd and tests/testthat/test-gallery.R.
  - Add the family name to src/fastcpd_constants.h.
  - [Recommended] Add a new wrapper function in R/fastcpd_wrappers.R for the new family of models and move the examples to the new wrapper function as roxygen examples.
  - Add the new wrapper function to the corresponding section in _pkgdown.yml.
- Push the changes to your fork.
- Create a pull request.
- Make sure the pull request does not create new warnings or errors in devtools::check().
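For contributors unsure what "cost functions with corresponding gradient and Hessian" means for a model family, the sketch below writes out the three pieces for logistic regression in plain Python. It is only an illustration of the mathematics: the real implementations live in C++ in src/fastcpd_class_cost.cc, whose interface is not shown here, and all function names below are hypothetical.

```python
import math


def linear_predictor(theta, x):
    """Inner product of the coefficient vector and one row of covariates."""
    return sum(t * v for t, v in zip(theta, x))


def logistic_cost(theta, xs, ys):
    """Negative log-likelihood of logistic regression on one segment."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = linear_predictor(theta, x)
        total += math.log1p(math.exp(eta)) - y * eta
    return total


def logistic_gradient(theta, xs, ys):
    """Gradient of the cost: sum over observations of (p - y) * x."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))
        for j, v in enumerate(x):
            grad[j] += (p - y) * v
    return grad


def logistic_hessian(theta, xs):
    """Hessian of the cost: sum over observations of p * (1 - p) * x x^T."""
    d = len(theta)
    hess = [[0.0] * d for _ in range(d)]
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))
        weight = p * (1.0 - p)
        for i in range(d):
            for j in range(d):
                hess[i][j] += weight * x[i] * x[j]
    return hess


# Tiny segment: an intercept column plus one covariate.
xs = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
ys = [1, 0, 1, 0]
theta = [0.2, -0.4]
print(logistic_cost(theta, xs, ys))
print(logistic_gradient(theta, xs, ys))
print(logistic_hessian(theta, xs))
```

A quick way to test a new family before porting it is to compare the analytic gradient against a finite-difference approximation of the cost, and the Hessian against finite differences of the gradient.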
Trouble installing Python package.
Python headers are required to install the Python package. If you are using Ubuntu, you can install the headers with:
sudo apt install python3-dev
Encountered a bug or unintended behavior?
- File a ticket at GitHub Issues.
- Contact the authors specified in DESCRIPTION.
Acknowledgements
Special thanks to clODE.