Skip to main content

Random Forests for Change Point Detection

Project description

Random Forests for Change Point Detection

Change point detection aims to identify structural breaks in the probability distribution of a time series. Existing methods either assume a parametric model for within-segment distributions or are based on ranks or distances and thus fail in scenarios with a reasonably large dimensionality.

changeforest implements a classifier-based algorithm that consistently estimates change points without any parametric assumptions, even in high-dimensional scenarios. It uses the out-of-bag probability predictions of a random forest to construct a classifier log-likelihood ratio that gets optimized using a computationally feasible two-step method.

See [1] for details.

Installation

To install from conda-forge (recommended), run

conda install -c conda-forge changeforest

To install from PyPI, run

pip install changeforest

Example

In the following example, we perform random forest-based change point detection on a simulated dataset with n=600 observations and covariance shifts at t=200, 400.

In [1]: import numpy as np
   ...: 
   ...: Sigma = np.full((5, 5), 0.7)
   ...: np.fill_diagonal(Sigma, 1)
   ...: 
   ...: rng = np.random.default_rng(12)
   ...: X = np.concatenate(
   ...:     (
   ...:         rng.normal(0, 1, (200, 5)),
   ...:         rng.multivariate_normal(np.zeros(5), Sigma, 200, method="cholesky"),
   ...:         rng.normal(0, 1, (200, 5)),
   ...:     ),
   ...:     axis=0,
   ...: )

The simulated dataset X coincides with the change in covariance (CIC) setup described in [1]. Observations in the first and last segments are independently drawn from a standard multivariate Gaussian distribution. Observations in the second segment are i.i.d. normal with mean zero and unit variance, but with a covariance of ρ = 0.7 between coordinates. This is a challenging scenario.

In [2]: from changeforest import changeforest
   ...: 
   ...: result = changeforest(X, "random_forest", "bs")
   ...: result
Out[2]: 
                    best_split max_gain p_value
(0, 600]                   400   14.814   0.005
 ¦--(0, 400]               200   59.314   0.005
 ¦   ¦--(0, 200]             6    -1.95    0.67
 ¦   °--(200, 400]         393   -8.668    0.81
 °--(400, 600]             412   -9.047    0.66

In [3]: result.split_points()
Out[3]: [200, 400]

changeforest correctly identifies the change points at t=200 and t=400. The changeforest function returns a BinarySegmentationResult. We use its plot method to investigate the gain curves maximized by the change point estimates:

In [4]: result.plot().show()

Change point estimates are marked in red.

For method="random_forest" and method="knn", the changeforest algorithm uses a two-step approach to find an optimizer of the gain. This fits a classifier for three split candidates at the segment's 1/4, 1/2 and 3/4 quantiles, computes approximate gain curves using the resulting classifier log-likelihood ratios and selects the overall optimizer as a second guess. We can investigate the gain curves from the optimizer using the plot method of OptimizerResult. The initial guesses are marked in blue.

In [5]: result.optimizer_result.plot().show()

One can observe that the approximate gain curves are piecewise linear, with maxima around the true underlying change points.

The BinarySegmentationResult returned by changeforest is a tree-like object with attributes start, stop, best_split, max_gain, p_value, is_significant, optimizer_result, model_selection_result, left, right and segments. These can be interesting to investigate the output of the algorithm further.

The changeforest algorithm can be tuned with hyperparameters. See here for their descriptions and default values. In Python, the parameters can be specified with the Control class, which can be passed to changeforest. The following will build random forests with 50 trees:

In [6]: from changeforest import Control
   ...: changeforest(X, "random_forest", "bs", Control(random_forest_n_estimators=50))
Out[6]: 
                    best_split max_gain p_value
(0, 600]                   416    7.463    0.01
 ¦--(0, 416]               200   43.935   0.005
 ¦   ¦--(0, 200]           193  -14.993   0.945
 ¦   °--(200, 416]         217    -9.13   0.085
 °--(416, 600]             591   -12.07       1 

The changeforest algorithm still detects change points at t=200, but is slightly off with t=416.

Due to the nature of the change, method="change_in_mean" is unable to detect any change points at all:

In [7]: changeforest(X, "change_in_mean", "bs")
Out[7]: 
          best_split max_gain p_value
(0, 600]         589    8.625  

References

[1] M. Londschien, P. Bühlmann and S. Kovács (2023). "Random Forests for Change Point Detection" Journal of Machine Learning Research

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

changeforest-1.1.4.tar.gz (368.8 kB view hashes)

Uploaded Source

Built Distributions

changeforest-1.1.4-cp312-none-win_amd64.whl (319.3 kB view hashes)

Uploaded CPython 3.12 Windows x86-64

changeforest-1.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

changeforest-1.1.4-cp312-cp312-macosx_11_0_arm64.whl (471.2 kB view hashes)

Uploaded CPython 3.12 macOS 11.0+ ARM64

changeforest-1.1.4-cp312-cp312-macosx_10_12_x86_64.whl (481.3 kB view hashes)

Uploaded CPython 3.12 macOS 10.12+ x86-64

changeforest-1.1.4-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (946.1 kB view hashes)

Uploaded CPython 3.12 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

changeforest-1.1.4-cp311-none-win_amd64.whl (318.9 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

changeforest-1.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

changeforest-1.1.4-cp311-cp311-macosx_11_0_arm64.whl (471.8 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

changeforest-1.1.4-cp311-cp311-macosx_10_12_x86_64.whl (482.4 kB view hashes)

Uploaded CPython 3.11 macOS 10.12+ x86-64

changeforest-1.1.4-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (947.8 kB view hashes)

Uploaded CPython 3.11 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

changeforest-1.1.4-cp310-none-win_amd64.whl (319.1 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

changeforest-1.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

changeforest-1.1.4-cp310-cp310-macosx_11_0_arm64.whl (471.8 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

changeforest-1.1.4-cp310-cp310-macosx_10_12_x86_64.whl (482.5 kB view hashes)

Uploaded CPython 3.10 macOS 10.12+ x86-64

changeforest-1.1.4-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (947.9 kB view hashes)

Uploaded CPython 3.10 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

changeforest-1.1.4-cp39-none-win_amd64.whl (319.9 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

changeforest-1.1.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

changeforest-1.1.4-cp39-cp39-macosx_11_0_arm64.whl (471.9 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

changeforest-1.1.4-cp39-cp39-macosx_10_12_x86_64.whl (482.7 kB view hashes)

Uploaded CPython 3.9 macOS 10.12+ x86-64

changeforest-1.1.4-cp39-cp39-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (947.6 kB view hashes)

Uploaded CPython 3.9 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

changeforest-1.1.4-cp38-none-win_amd64.whl (319.2 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

changeforest-1.1.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

changeforest-1.1.4-cp38-cp38-macosx_11_0_arm64.whl (472.1 kB view hashes)

Uploaded CPython 3.8 macOS 11.0+ ARM64

changeforest-1.1.4-cp38-cp38-macosx_10_12_x86_64.whl (482.9 kB view hashes)

Uploaded CPython 3.8 macOS 10.12+ x86-64

changeforest-1.1.4-cp38-cp38-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (948.1 kB view hashes)

Uploaded CPython 3.8 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page