Skip to main content

Fast Bayesian Online Changepoint Detection with C backend

Project description

Fast BOCPD

PyPI version Python License Documentation

High-performance Bayesian Online Changepoint Detection (BOCPD) with a pure C backend and clean Python API. Achieve 10-1500x speedup over existing implementations while maintaining numerical accuracy.

Key Features

  • Exceptional Performance: 10-1500x faster than competing implementations
  • Production Ready: Extensively tested with 98 C unit tests and 359 Python test cases
  • Comprehensive Model Library: 7 observation models covering continuous, discrete, and count data
  • Dual Processing Modes: Online streaming and batch processing (Offline mode)
  • Minimal Dependencies: Only requires NumPy
  • Well Documented: Complete Sphinx documentation with examples and theory
  • Benchmarked: Rigorous performance comparisons against 5 major implementations

Performance Benchmarks

Comparative performance on 100,000 observations:

Library Language Throughput vs Fast-BOCPD
Fast-BOCPD C 25,952 obs/s 1.0x (baseline)
promised-ai Rust 915 obs/s 28.3x slower
ruptures Cython/C 2,564 obs/s* 10.1x slower*
dtolpin Python 163 obs/s 159.2x slower
hildensia PyTorch 17 obs/s** 1,525x slower**

See benchmarks/ for detailed methodology and results.

Installation

pip install fast-bocpd

The C extension is automatically compiled during installation. Requires a C compiler and NumPy.

From Source

git clone https://github.com/TiaanViviers/Fast_BOCPD.git
cd Fast_BOCPD
pip install -e .

Quick Start

Basic Usage

import numpy as np
from fast_bocpd import BOCPD, GaussianNIG, ConstantHazard

# Generate synthetic data with a changepoint
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 1, 100),
    np.random.normal(5, 1, 100)
])

# Configure the model
obs_model = GaussianNIG(mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0)
hazard = ConstantHazard(lambda_=100)  # Expected run length = 100
bocpd = BOCPD(obs_model, hazard, max_run_length=200)

# Process data and detect changepoints
for t, x in enumerate(data):
    posterior_r, cp_prob = bocpd.update(x)
    if cp_prob > 0.5:
        print(f"Changepoint detected at t={t} (probability: {cp_prob:.3f})")

Streaming Detection with OnlineChangeDetector

from fast_bocpd import BOCPD, StudentTNG, ConstantHazard, OnlineChangeDetector

# Setup detector
obs_model = StudentTNG(mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0)
hazard = ConstantHazard(lambda_=100)
bocpd = BOCPD(obs_model, hazard)
detector = OnlineChangeDetector(bocpd, min_confidence=0.3)

# Process streaming data
for t, observation in enumerate(data_stream):
    cp = detector.update(observation)
    
    if cp:
        print(f"Changepoint at t={cp.index}")
        print(f"Previous segment: {cp.prev_run_length} observations")
        print(f"Confidence: {cp.confidence:.3f}")

# Retrieve all detected changepoints
changepoints = detector.get_changepoints()
segments = detector.get_segments()

Batch Processing

# Process entire dataset at once
cp_probs = bocpd.batch_update(data)

# Find changepoint locations
changepoints = np.where(cp_probs > 0.5)[0]
print(f"Changepoints detected at: {changepoints}")

Available Models

Observation Models

Continuous Data:

  • GaussianNIG: Gaussian likelihood with Normal-Inverse-Gamma conjugate prior
  • StudentTNG: Student-t likelihood with Normal-Gamma conjugate prior (robust to outliers)
  • StudentTNGGrid: Grid-based Student-t for faster inference with controlled precision

Discrete Data:

  • BernoulliBeta: Bernoulli likelihood (binary outcomes) with Beta conjugate prior
  • BinomialBeta: Binomial likelihood (count of successes) with Beta conjugate prior

Count Data:

  • PoissonGamma: Poisson likelihood (rare events) with Gamma conjugate prior
  • GammaGamma: Gamma likelihood with Gamma conjugate prior (positive continuous data)

Hazard Functions

  • ConstantHazard: Constant changepoint probability (geometric prior on run lengths)

All models feature efficient conjugate Bayesian updates implemented in optimized C code.

Documentation

Complete documentation is available at https://fast-bocpd.readthedocs.io

Documentation includes:

  • Getting Started Guide
  • User Guide with detailed model descriptions
  • API Reference
  • Mathematical Theory and Derivations
  • Example Notebooks
  • Architecture and Implementation Details
  • Benchmark Results and Methodology

Example Notebooks:

Testing

Fast-BOCPD is extensively tested to ensure correctness and reliability:

  • 98 C unit tests: Core algorithm and model implementations
  • 359 Python tests: API, integration, and end-to-end workflows
  • Numerical validation: Results verified against reference implementations
  • Edge case coverage: Boundary conditions and error handling

Run the test suite:

# Run all tests
make test

# C unit tests only
make test-c

# Python tests only
make test-python

Development

Build from Source

git clone https://github.com/TiaanViviers/Fast_BOCPD.git
cd Fast_BOCPD

# Install in development mode
pip install -e .

# Build C library manually (optional)
make lib

# Run tests
make test

Project Structure

Fast_BOCPD/
├── fast_bocpd/           # Python package
│   ├── core.py           # Main BOCPD class
│   ├── models.py         # Observation models
│   ├── hazard.py         # Hazard functions
│   ├── utils.py          # Helper utilities
│   ├── _bindings.py      # C extension bindings
│   └── _c/               # C implementation
│       ├── bocpd_core.c
│       ├── gaussian_nig.c
│       ├── student_t_ng.c
│       └── ...
├── tests/                # Test suite
│   ├── python/           # Python integration tests
│   └── c_tests/          # C unit tests
├── examples/             # Jupyter notebooks
├── docs/                 # Sphinx documentation
├── benchmarks/           # Performance comparisons
└── Makefile              # Build system

Algorithm

Fast-BOCPD implements the Bayesian Online Changepoint Detection algorithm (Adams & MacKay, 2007). The algorithm:

  1. Maintains a distribution over run lengths (time since last changepoint)
  2. Updates beliefs online as new data arrives
  3. Provides probabilistic changepoint detection without threshold tuning
  4. Supports arbitrary observation models via conjugate Bayesian updates

The implementation uses dynamic programming with efficient log-space computations to maintain numerical stability.

License

This project is licensed under the MIT License. See LICENSE for details.

Links

Acknowledgments

This implementation is based on the foundational work by Adams and MacKay (2007). Performance optimizations and the C backend were developed to enable real-time changepoint detection in production systems

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fast_bocpd-1.0.0.tar.gz (40.2 kB view details)

Uploaded Source

File details

Details for the file fast_bocpd-1.0.0.tar.gz.

File metadata

  • Download URL: fast_bocpd-1.0.0.tar.gz
  • Upload date:
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for fast_bocpd-1.0.0.tar.gz
Algorithm Hash digest
SHA256 08585deff68547b92273856d679852dc2540e9b2f960096fdd0d83c6d3d96482
MD5 d89b6da8ca4aac3403b1eefd1a5bd8a1
BLAKE2b-256 b4032c02ba0f576c99eb725e4adf8c245e36324244ef443ed932f4f0d0592316

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page