Classifier based non-parametric change point detection
Project description
Random Forests for Change Point Detection
Change point detection aims to identify structural breaks in the probability distribution of a time series. Existing methods either assume a parametric model for within-segment distributions or are based on ranks or distances and thus fail in scenarios with a reasonably large dimensionality.
changeforest
implements a classifier-based algorithm that consistently estimates
change points without any parametric assumptions, even in high-dimensional scenarios.
It uses the out-of-bag probability predictions of a random forest to construct a
pseudo-log-likelihood that gets optimized using a computationally feasible two-step
method.
See [1] for details.
Installation
To install from conda-forge
(recommended), run
conda install -c conda-forge changeforest
To install from PyPI
, run
pip install changeforest
Example
In the following example, we perform random forest-based change point detection on
a simulated dataset with n=600
observations and covariance shifts at t=200, 400
.
In [1]: import numpy as np
...:
...: Sigma = np.full((5, 5), 0.7)
...: np.fill_diagonal(Sigma, 1)
...:
...: rng = np.random.default_rng(12)
...: X = np.concatenate(
...: (
...: rng.normal(0, 1, (200, 5)),
...: rng.multivariate_normal(np.zeros(5), Sigma, 200),
...: rng.normal(0, 1, (200, 5)),
...: ),
...: axis=0,
...: )
The simulated dataset X
coincides with the change in covariance (CIC) setup
described in [1]. Observations in the first and last segment are independently drawn
from a standard multivariate Gaussian distribution. Observations in the second segment
are i.i.d. normal with mean zero and unit variance, but with a covariance of ρ = 0.7
between coordinates. This is a challenging scenario.
In [2]: from changeforest import changeforest
...:
...: result = changeforest(X, "random_forest", "bs")
...: result
Out[2]:
best_split max_gain p_value
(0, 600] 412 19.603 0.005
¦--(0, 412] 201 62.981 0.005
¦ ¦--(0, 201] 194 -12.951 0.76
¦ °--(201, 412] 211 -9.211 0.545
°--(412, 600] 418 -37.519 0.915
In [3]: result.split_points()
Out[3]: [201, 412]
changeforest
correctly identifies the change point around t=200
but is slightly
off at t=412
. The changeforest
function returns a BinarySegmentationResult
.
We use its plot
method to investigate the gain curves maximized by the change point estimates:
In [4]: result.plot().show()
Change point estimates are marked in red.
For method="random_forest"
and method="knn"
, the changeforest
algorithm uses a two-step approach to
find an optimizer of the gain. This fits a classifier for three split candidates
at the segment's 1/4, 1/2 and 3/4 quantiles, computes approximate gain curves using
the resulting pseudo-log-likelihoods and selects the overall optimizer as a second guess.
We can investigate the gain curves from the optimizer using the plot
method of OptimizerResult
.
The initial guesses are marked in blue.
In [5]: result.optimizer_result.plot().show()
One can observe that the approximate gain curves are piecewise linear, with maxima around the true underlying change points.
The BinarySegmentationResult
returned by changeforest
is a tree-like object with attributes
start
, stop
, best_split
, max_gain
, p_value
, is_significant
, optimizer_result
, model_selection_result
, left
, right
and segments
.
These can be interesting to investigate the output of the algorithm further.
The changeforest
algorithm can be tuned with hyperparameters. See
here
for their descriptions and default values. In Python, the parameters can
be specified with the Control
class,
which can be passed to changeforest
. The following will build random forests with
20 trees:
In [6]: from changeforest import Control
...: changeforest(X, "random_forest", "bs", Control(random_forest_n_estimators=20))
Out[6]:
best_split max_gain p_value
(0, 600] 592 -11.786 0.01
¦--(0, 592] 121 -6.26 0.015
¦ ¦--(0, 121] 13 -14.219 0.615
¦ °--(121, 592] 416 21.272 0.005
¦ ¦--(121, 416] 201 37.157 0.005
¦ ¦ ¦--(121, 201] 192 -17.54 0.65
¦ ¦ °--(201, 416] 207 -6.701 0.74
¦ °--(416, 592] 584 -44.054 0.935
°--(592, 600]
The changeforest
algorithm still detects change points around t=200, 400
but also
returns two false positives.
Due to the nature of the change, method="change_in_mean"
is unable to detect any
change points at all:
In [7]: changeforest(X, "change_in_mean", "bs")
Out[7]:
best_split max_gain p_value
(0, 600] 589 8.318
References
[1] M. Londschien, S. Kovács and P. Bühlmann (2022), "Random Forests for Change Point Detection", working paper.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for changeforest-0.7.2-cp310-none-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 274f9243c6731e55b1bb493f23d1969214cbaa0adb1149a93d344977aaf1540f |
|
MD5 | 084163767705a7f575c7bab8301cd83e |
|
BLAKE2b-256 | 16160a83647f62ea3d1867aca5b6208fc49f35d9365a21723c9504afa4025161 |
Hashes for changeforest-0.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 91bd8c6b7b57fdeebf148724be45edf43845e20f27278b2f5900bfb431e269e2 |
|
MD5 | c41dbcc8a8ed59bf16d7fd97f9437746 |
|
BLAKE2b-256 | b9abac47e9aa041285eb24bd7e2fcd5e34ffefd3dec13d6906aeb73fc7e65760 |
Hashes for changeforest-0.7.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c4e78dd6b044a0c705e928ac27408214003934f832aa93631cf50e21d7a935e1 |
|
MD5 | 17c19b743cf1f33f22b7f28902be6429 |
|
BLAKE2b-256 | d8d0b8a329d0277eb81d6d61a485c172640120d19cbb2f00fdf29666a791c2e6 |
Hashes for changeforest-0.7.2-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 124b17d6edbe1ed3c4e9bbd4177b70e1f52e15dceb1555d37bf5435f9213a193 |
|
MD5 | 67b9e359230d609b7878d069d52ecd07 |
|
BLAKE2b-256 | 763caaba6a959cf0022213194d7c186ef72037694eb4500751d325193c1b893e |
Hashes for changeforest-0.7.2-cp310-cp310-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e794b2dee3b261ec218ccba442f9026224480d57e5332e85fa81f54c8ea4d660 |
|
MD5 | 4f28ae5505e4c2d7a83f54e8321abde5 |
|
BLAKE2b-256 | 1870e02047b6e5d4e64b1d5b76cfc8a8e78407045190c3776c1cef07379d79ce |
Hashes for changeforest-0.7.2-cp39-none-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1711f8f3858926ac83ad17e3407513a87a30a24369ff26a83852f41177e73f54 |
|
MD5 | 64b11c2f8437f44722985967b8d459f9 |
|
BLAKE2b-256 | 4f7890d6f4b173c67f9ef5b84959476e14a5cef87af332f315842125594de490 |
Hashes for changeforest-0.7.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c581b57be86457c50bae7f8d8e8ccc6a85a28ebb8530287f3da9728afb6acef6 |
|
MD5 | 6c6df95049bdadb2b19c29bd9d2fa3ea |
|
BLAKE2b-256 | 601e5e3c821cbbb54567408629ace42b04d35583d78d41b21024eca6d7638405 |
Hashes for changeforest-0.7.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 959c7eb015467100143e28d4be5e563a286e0f749ec3bdeb8fb47eccc83ced0d |
|
MD5 | 29c41b0850647350027201d6efe86fb0 |
|
BLAKE2b-256 | a4020b14e7d3f25ee648bfa7727701df7d45ac4a8b408732ec5c2e5e391223c2 |
Hashes for changeforest-0.7.2-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2d21ca15abf9f63fd7c43976d9c186efd19d4b510430663bbb7067bbd96a3e9a |
|
MD5 | 574651544f8ae9996db0fca6dd7c8f54 |
|
BLAKE2b-256 | d2c86ce8ce2c2ce140bc7e4d954f2b9bf872c3dbf13f9d1e4d2948eeb1719123 |
Hashes for changeforest-0.7.2-cp39-cp39-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 93108aafa30fc2a3f193ca25e6dce7777b1b296af31a2447c11943ab845a97c4 |
|
MD5 | 1ae9326c98d541c878f1e58dae8eadb5 |
|
BLAKE2b-256 | 8fcfaca5e2035129d731b72a110b5485611504e798510d5e8364064092b13b55 |
Hashes for changeforest-0.7.2-cp38-none-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1c104d55fbb29dbc2e7bda1aa20840d8fe00d9f98b0994b0dc1705ebadbcf64e |
|
MD5 | e866852bf824c722e3bfdf9bb358f0c1 |
|
BLAKE2b-256 | 9f747f3b695db0dd4fe78e3a9fca2fbcd22ff1876f61705fe8816d0ee4444c11 |
Hashes for changeforest-0.7.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a4f81e3b4dc809b071122d950a656818b4f3781949cd145e01592539f68eb609 |
|
MD5 | 35dee8b75a2afb29c9a630cf1d7c82d4 |
|
BLAKE2b-256 | 0d0331110d8d0b0bb850566f128a7a8a8aedc6b6042d18dcebe1b5890b6f0ca5 |
Hashes for changeforest-0.7.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 84ee38f212e756529d198140d198abf5261384d9bc8312ee1de63fc1e81ce10c |
|
MD5 | 4d9b929674669d42d7e98360eecd724f |
|
BLAKE2b-256 | f57fdefc2c23f1f91a95c3380bebda5e921c26d30e3fb5a249b3f2762c013869 |
Hashes for changeforest-0.7.2-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ce5f2d1fe29f599b89ef40e0a3c98a82b6746af1a9770693f23701c39f35c8a5 |
|
MD5 | 49ca58094fa68c3176b001e2def8acbd |
|
BLAKE2b-256 | b72e0e5bc6b4806e8cff9b092a0d59fe0ca322b4b22caff1c199a846c68a6cf5 |
Hashes for changeforest-0.7.2-cp38-cp38-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8b90ccda8793949e5be4317e83d5e500bbd07c7fb56836b08c2ac63f32cfc854 |
|
MD5 | 7dbba53410cffd5992f71f7c5b469fb7 |
|
BLAKE2b-256 | eea8d718687f2b33e932c562647577917cdc9045df4a4290a94ef79ca8a52f15 |
Hashes for changeforest-0.7.2-cp37-none-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | e0141c252da3f4e57b8272740e09a4b7b963118ab4caefb49b60dac79a90bc8b |
|
MD5 | a4d82d28c1fea025b3a696f466582618 |
|
BLAKE2b-256 | 47c254183187eef2e13e7189703420ac9652c89d600d65a12942c07d9c2b059d |
Hashes for changeforest-0.7.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8e121f4b429df3ad2926e1aed61e66e9edff7d9b59de529415b374f84f705519 |
|
MD5 | 14df79982e2cfbfa7641aa0e5bb3689a |
|
BLAKE2b-256 | a537f7e9fa421be5d378c30eee34c2d5f1160b4c03d9a1b500e5d6aa9108c108 |
Hashes for changeforest-0.7.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a7aaeab4a5becff978f10b3f5776f6c88698edcb3cd5328c26c61b6c8ba2650c |
|
MD5 | d965c4af5dd92a2139dc0fabb2c2cdc5 |
|
BLAKE2b-256 | b608dd0476cec9fb60ae8bb27b2d0d97f257406bdac8aa2dcf76cd723cee52e7 |
Hashes for changeforest-0.7.2-cp37-cp37m-macosx_10_7_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | bcead3944a14a866ac6ecb4bd9b092466a367d43fbf394e0a5c484b0eecf7adb |
|
MD5 | 5ecfe6d01289172125518e61b2f99953 |
|
BLAKE2b-256 | de07c34cf14d916ff36a449d9907679acf4c3829d3f1b5d86ea2c48f03b35907 |