
Fast change-point detection bindings backed by Rust.

changepoint-doctor Python Bindings (MVP-A)

changepoint-doctor exposes fast offline change-point detection from Rust into Python.

For citation and provenance policy, see ../CITATION.cff and ../docs/clean_room_policy.md.

Install

From PyPI (target release 0.0.3):

python -m pip install --upgrade pip
python -m pip install changepoint-doctor==0.0.3

For local development from this repository:

cd cpd/python
python -m pip install --upgrade pip maturin
maturin develop --release --manifest-path ../crates/cpd-python/Cargo.toml
python -m pip install --upgrade ".[dev]"

Apple Silicon contributors should run the architecture checks and sanity path in ../docs/python_apple_silicon_toolchain.md before debugging pyo3/linker errors.

Common extras:

  • plot: python -m pip install "changepoint-doctor[plot]==0.0.3"
  • notebooks: python -m pip install "changepoint-doctor[notebooks]==0.0.3"
  • parity: python -m pip install "changepoint-doctor[parity]==0.0.3"
  • dev: python -m pip install "changepoint-doctor[dev]==0.0.3"

The plot/notebooks/parity extras install only optional Python tooling; they do not toggle Rust compile-time features. Rust features are set when building the extension (for example maturin develop --features preprocess,serde ...).

Install/import naming: install with python -m pip install changepoint-doctor, then import with import cpd in Python. Optional compatibility alias: import changepoint_doctor as cpd.

API Map

  • cpd.Pelt: high-level PELT detector.
  • cpd.Binseg: high-level Binary Segmentation detector.
  • cpd.Fpop: high-level FPOP detector (L2 cost only).
  • cpd.detect_offline: low-level API for explicit detector/cost/constraints/stopping/preprocess selection, including detector="segneigh" (exact fixed-K DP; dynp alias supported).
  • cpd.OfflineChangePointResult: typed result object with breakpoints and diagnostics.

Streaming update() vs update_many() Policy

update_many() now uses a size-aware GIL strategy in the Rust bindings:

  • Workloads with < 16 scalar work items (n * d) keep the GIL (lower overhead for tiny micro-batches).
  • Workloads with >= 16 scalar work items (n * d) release the GIL (py.allow_threads) for throughput and thread fairness.
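The dispatch policy above can be sketched as a pure function (the 16-item cutoff is the documented default; the function name is illustrative, not part of the cpd API):

```python
def releases_gil(n: int, d: int, threshold: int = 16) -> bool:
    """Decide whether update_many() releases the GIL for a batch.

    n: number of samples in the batch; d: dimensionality.
    Batches with fewer than `threshold` scalar work items (n * d)
    keep the GIL to avoid allow_threads overhead; larger batches
    release it for throughput and thread fairness.
    """
    return n * d >= threshold
```

For example, a univariate batch of 8 samples (8 work items) keeps the GIL, while a 4-dimensional batch of 8 samples (32 work items) releases it.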

To reproduce the benchmark snapshot used for this policy:

cd cpd/python
python -m pip install --upgrade ".[dev]"
pytest -q tests/test_streaming_perf_contract.py

Optional controls:

  • CPD_PY_STREAMING_PERF_ENFORCE=1: enable stricter ratio gates.
  • CPD_PY_STREAMING_PERF_REPORT_OUT=/tmp/cpd-python-streaming-perf.json: write JSON metrics.

The perf contract uses median latency with outlier-triggered retry rounds to reduce scheduler-noise flakiness.
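A minimal sketch of that measurement strategy (the retry criterion and names are illustrative; the actual contract test may differ):

```python
import statistics

def median_with_retries(measure, rounds=5, max_retries=3, spread_limit=2.0):
    """Measure `measure()` several times and return the median latency.

    If the spread (max / min) across a round exceeds spread_limit,
    the round is treated as scheduler noise and re-run, up to
    max_retries extra rounds; otherwise the round's median is kept.
    """
    samples = [measure() for _ in range(rounds)]
    retries = 0
    while max(samples) / min(samples) > spread_limit and retries < max_retries:
        samples = [measure() for _ in range(rounds)]
        retries += 1
    return statistics.median(samples)
```

Using the median rather than the mean keeps a single slow outlier sample from dominating the reported latency.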

Reference run (local dev machine, tests/test_streaming_perf_contract.py, median ms):

Batch size   update() median ms   update_many() median ms   update_many() speedup vs update()
1            0.0035               0.0097                    0.36x
8            0.0177               0.0194                    0.91x
16           0.0356               0.0310                    1.15x
64           0.1308               0.0891                    1.47x
4096         7.8216               4.4616                    1.75x

Masking Risk Guidance

If BinSeg diagnostics indicate masking risk (for example warnings that closely spaced weaker changes may be hidden), prefer Wild Binary Segmentation (WBS) in Rust/offline flows (cpd-offline::Wbs) for stronger recovery.

Python high-level APIs expose cpd.Pelt, cpd.Binseg, and cpd.Fpop. WBS and SegNeigh are not yet exposed as Python high-level detector classes; use detect_offline(...).

Quickstart

See QUICKSTART.md for a full walkthrough.

Reproducibility Modes

detect_offline(..., repro_mode=...) supports strict, balanced (default), and fast. For deterministic contracts, cross-platform expectations, and tolerance gates, see ../docs/reproducibility_modes.md.

Result JSON Contract

OfflineChangePointResult.to_json() / OfflineChangePointResult.from_json(...) follow the versioned contract in ../docs/result_json_contract.md, with the canonical schema marker at diagnostics.schema_version. When available, build provenance is emitted under diagnostics.build (for Python adapters this includes ABI and enabled feature context).

In 0.x, schema compatibility follows the bounded version window documented in ../VERSIONING.md: readers accept only supported schema-marker versions (currently 1..=2 for offline result fixtures).
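A reader-side sketch of that bounded window (the constant mirrors the documented 1..=2 range; the helper itself is illustrative, not part of the cpd API):

```python
SUPPORTED_SCHEMA_VERSIONS = range(1, 3)  # the documented 1..=2 window in 0.x

def accepts_schema(version: int) -> bool:
    """Gate payloads on diagnostics.schema_version before decoding."""
    return version in SUPPORTED_SCHEMA_VERSIONS
```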

Serialization + plotting workflow:

import numpy as np
import cpd

x = np.concatenate([
    np.zeros(40, dtype=np.float64),
    np.full(40, 8.0, dtype=np.float64),
    np.full(40, -4.0, dtype=np.float64),
])

pelt = cpd.Pelt(model="l2").fit(x).predict(n_bkps=2)
binseg = cpd.Binseg(model="l2").fit(x).predict(n_bkps=2)
fpop = cpd.Fpop(min_segment_len=2).fit(x).predict(n_bkps=2)
low = cpd.detect_offline(
    x,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 2},
    stopping={"n_bkps": 2},
)
segneigh = cpd.detect_offline(
    x,
    detector="segneigh",  # 'dynp' alias also supported
    cost="l2",
    constraints={"min_segment_len": 2},
    stopping={"n_bkps": 2},
)

payload = pelt.to_json()
restored = cpd.OfflineChangePointResult.from_json(payload)
assert restored.breakpoints == pelt.breakpoints

try:
    fig = restored.plot(x, title="Detected breakpoints")
except ImportError:
    # Plotting remains optional.
    # Install with: python -m pip install "changepoint-doctor[plot]==0.0.3"
    fig = None

Compatibility + limitations:

  • from_json(...) accepts only supported schema markers (diagnostics.schema_version, currently 1..=2 in 0.x).
  • to_json() writes the current schema marker (currently 1) and preserves additive unknown fields when round-tripping payloads.
  • plot() requires optional plotting dependencies (changepoint-doctor[plot]).
  • plot(values=None, ...) requires per-segment summaries in the result; if segments are unavailable, pass explicit values.
  • plot(ax=...) is supported only for univariate data (diagnostics.d == 1).

These paths are smoke-tested in CI in tests/test_integration_mvp_a.py, including fixture compatibility checks and example-script execution.

Stopping and Penalty Guide

Ruptures-compatible naming is supported in Python:

  • n_bkps: exact number of change points (Stopping::KnownK)
  • pen: manual penalty scalar (Stopping::Penalized(Penalty::Manual(...)))
  • min_segment_len: minimum segment size (Constraints.min_segment_len)

When to use each stopping style:

  • n_bkps (KnownK): use when you know the expected number of changes and need an exact count.
  • pen="bic": good default when you want automatic model-selection behavior that scales with sample size.
  • pen="aic": less conservative than BIC; can recover weaker changes but may over-segment noisy data.
  • pen=<float>: use when you need tight operational control over sensitivity (lower finds more changes, higher finds fewer).
  • stopping={"PenaltyPath": [...]} (pipeline serde form): request multiple penalties in one PELT sweep and inspect diagnostics notes for each path entry.
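The manual-penalty trade-off can be made concrete with a toy exact penalized partitioning in plain Python (an O(n^2) sketch of the idea behind penalized stopping, not the library's PELT implementation):

```python
def penalized_bkps(x, pen):
    """Exact penalized least-squares partitioning of a 1-D signal.

    Minimizes the sum of per-segment squared errors around the segment
    mean plus `pen` per segment; returns breakpoint indices (exclusive
    segment ends, excluding len(x)).
    """
    n = len(x)
    # Prefix sums give O(1) segment cost: sum (xi - mean)^2 over [i, j).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        return s2[j] - s2[i] - (s[j] - s[i]) ** 2 / (j - i)

    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + pen
            if c < best[j]:
                best[j], prev[j] = c, i
    bkps, j = [], n
    while j > 0:
        bkps.append(j)
        j = prev[j]
    return sorted(b for b in bkps if b != n)

# Strong jump at index 20, weaker jump at index 40.
signal = [0.0] * 20 + [5.0] * 20 + [4.5] * 20
```

On this signal a small penalty recovers both the strong jump at 20 and the weak one at 40, while a large penalty keeps only the strong one, matching the guidance that lower penalties find more changes.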

BIC/AIC complexity terms are model-aware by default:

  • l2 uses params_per_segment=2 (mean + residual variance proxy)
  • normal uses params_per_segment=3 (mean + variance + residual term)
  • normal_full_cov uses model-aware effective complexity for BIC/AIC: 1 + d + d(d+1)/2 (mean vector + full covariance + residual term)

Advanced users can still override params_per_segment in low-level pipeline detector config.
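A sketch of the model-aware complexity terms above, fed into the classic BIC/AIC penalty forms (the functions are illustrative; the library's exact penalty formula may differ):

```python
import math

def params_per_segment(model: str, d: int = 1) -> int:
    """Default per-segment parameter counts documented above."""
    if model == "l2":
        return 2                          # mean + residual variance proxy
    if model == "normal":
        return 3                          # mean + variance + residual term
    if model == "normal_full_cov":
        return 1 + d + d * (d + 1) // 2   # residual + mean vector + full covariance
    raise ValueError(f"unknown model: {model}")

def penalty(kind: str, n: int, p: int) -> float:
    """Classic information-criterion penalties per change point."""
    if kind == "bic":
        return p * math.log(n)   # grows with sample size: more conservative
    if kind == "aic":
        return 2.0 * p           # constant in n: recovers weaker changes
    raise ValueError(f"unknown penalty: {kind}")
```

With n = 1000 and the l2 model (p = 2), the BIC penalty (about 13.8) exceeds the AIC penalty (4.0), consistent with the guidance that AIC is less conservative and may over-segment noisy data.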

SegNeigh Sizing Guide (detector="segneigh" / "dynp")

SegNeigh is exact dynamic programming for fixed-k segmentation (n_bkps / KnownK).

  • Let m be the effective candidate count after constraints (jump, candidate_splits, min_segment_len filtering).
  • Expected scaling is approximately:
    • runtime: O(k * m^2)
    • memory: O(k * m + m)
  • Practical guidance:
    • Use SegNeigh when k is known and m is modest.
    • Increase jump and/or min_segment_len first when runtime or memory is high.
    • Prefer pelt/fpop when k is unknown or when very large n requires penalty-based model selection.
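The scaling guidance above can be turned into a rough back-of-envelope estimator (illustrative only; the candidate-count formula and constants are assumptions, and the real filtering is an implementation detail):

```python
def segneigh_budget(n: int, k: int, jump: int = 1, min_segment_len: int = 1):
    """Rough work/memory estimate for SegNeigh on n samples, k change points.

    m approximates the candidate-split count after jump / min-segment
    filtering; runtime scales ~ k * m^2 and memory ~ k * m + m.
    """
    m = max(1, (n - 2 * min_segment_len) // jump)
    return {"candidates": m, "runtime_units": k * m * m, "memory_units": k * m + m}
```

Doubling jump roughly quarters the runtime estimate, which is why the guidance says to increase jump (or min_segment_len) first when runtime or memory is high.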

Reproducible local benchmark harness for representative (n, k) regimes:

cd cpd
cargo bench -p cpd-bench --bench offline_segneigh

Preprocess Config Contract

detect_offline(..., preprocess=...) validates keys and method payloads. Unknown preprocess stage keys fail with ValueError. Default PyPI wheels include preprocess support.

Canonical shape:

preprocess = {
    "detrend": {"method": "linear"},  # or {"method": "polynomial", "degree": 2}
    "deseasonalize": {"method": "differencing", "period": 2},  # or method="stl_like" (period >= 2)
    "winsorize": {"lower_quantile": 0.05, "upper_quantile": 0.95},  # optional fields
    "robust_scale": {"mad_epsilon": 1e-9, "normal_consistency": 1.4826},  # optional fields
}

Validation details:

  • detrend.method: "linear" or "polynomial" (degree required for polynomial).
  • deseasonalize.method: "differencing" (period >= 1) or "stl_like" (period >= 2).
  • winsorize: defaults to lower_quantile=0.01, upper_quantile=0.99 when omitted.
  • robust_scale: defaults to mad_epsilon=1e-9, normal_consistency=1.4826 when omitted.
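To make the winsorize/robust_scale defaults concrete, here is a plain-Python sketch of their documented semantics (the constants mirror the defaults above; the quantile interpolation and median handling are simplifications, and the library's numerics may differ in detail):

```python
def winsorize(x, lower_quantile=0.01, upper_quantile=0.99):
    """Clip values to the empirical [lower, upper] quantiles."""
    xs = sorted(x)

    def quantile(q):
        # Linear interpolation between order statistics.
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    lo_v, hi_v = quantile(lower_quantile), quantile(upper_quantile)
    return [min(max(v, lo_v), hi_v) for v in x]

def robust_scale(x, mad_epsilon=1e-9, normal_consistency=1.4826):
    """Center by the median, scale by consistency-adjusted MAD."""
    med = sorted(x)[len(x) // 2]  # simple median, exact for odd lengths
    mad = sorted(abs(v - med) for v in x)[len(x) // 2]
    scale = max(normal_consistency * mad, mad_epsilon)
    return [(v - med) / scale for v in x]
```

The mad_epsilon floor keeps the scale nonzero on near-constant segments, and 1.4826 makes the MAD a consistent estimator of the standard deviation under normality.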

Example Scripts

  • examples/synthetic_signal.py: synthetic step-function detection with all MVP-A APIs.
  • examples/csv_detect.py: detect breakpoints from a CSV column.
  • examples/plot_breakpoints.py: render detected breakpoints over a synthetic signal.

Run from repo root:

cpd/python/.venv/bin/python cpd/python/examples/synthetic_signal.py
cpd/python/.venv/bin/python cpd/python/examples/csv_detect.py --csv /path/to/data.csv --column 0
cpd/python/.venv/bin/python cpd/python/examples/plot_breakpoints.py --out /tmp/cpd_breakpoints.png

Notebook Examples

  • examples/notebooks/01_offline_algorithms.ipynb: quick comparison of offline detectors (Pelt, Binseg, Fpop, segneigh, and pipeline-form wbs).
  • examples/notebooks/02_online_algorithms.ipynb: streaming workflows for Bocpd, Cusum, and PageHinkley.
  • examples/notebooks/03_doctor_recommendations.ipynb: doctor recommendation workflow with live CLI execution and snapshot fallback.
  • examples/notebooks/README.md: notebook launch instructions and workflow overview.

Launch from cpd/python:

python -m pip install --upgrade "changepoint-doctor[notebooks]==0.0.3"
jupyter lab

Ruptures Parity Suite

To run the differential parity suite locally:

cd cpd/python
python -m pip install --upgrade ".[parity]"
CPD_PARITY_PROFILE=smoke pytest -q tests/test_ruptures_parity.py
CPD_PARITY_PROFILE=full CPD_PARITY_REPORT_OUT=/tmp/cpd-parity-report.json pytest -q tests/test_ruptures_parity.py

See ../docs/parity_ruptures.md for corpus structure, tolerance rules, and CI thresholds.

BOCPD Bayesian Parity Suite

To run BOCPD parity against hildensia/bayesian_changepoint_detection (preferred pin with fallback):

cd cpd/python
python -m pip install --upgrade ".[parity]"
REF_REPO="https://github.com/hildensia/bayesian_changepoint_detection.git"
PREFERRED_REF="f3f8f03af0de7f4f98bd54c7ca0b5f6d0b0f6f8c"
python -m pip install "git+${REF_REPO}@${PREFERRED_REF}" || \
  python -m pip install "git+${REF_REPO}"
CPD_BOCPD_PARITY_PROFILE=smoke pytest -q tests/test_bocpd_bayesian_parity.py
CPD_BOCPD_PARITY_PROFILE=full CPD_BOCPD_PARITY_REPORT_OUT=/tmp/cpd-bocpd-parity-report.json pytest -q tests/test_bocpd_bayesian_parity.py

Extras Validation

Run the metadata sanity checks for optional extras:

cd cpd/python
pytest -q tests/test_optional_extras_contract.py

Optional install commands (one per workflow extra):

python -m pip install "changepoint-doctor[plot]==0.0.3"
python -m pip install "changepoint-doctor[notebooks]==0.0.3"
python -m pip install "changepoint-doctor[parity]==0.0.3"
python -m pip install "changepoint-doctor[dev]==0.0.3"

See ../docs/parity_bocpd_bayesian.md for comparison logic, corpus layout, and threshold gates.

Wheel CI Policy

Cross-platform wheel hardening is enforced by ../../.github/workflows/wheel-build.yml and ../../.github/workflows/wheel-smoke.yml.

  • Build backend: cibuildwheel
  • Platforms:
    • Linux manylinux x86_64
    • macOS universal2 (validated on macos-13 and macos-14)
    • Windows amd64 (windows-2022)
  • Python matrix:
    • Full (main/nightly/tag): 3.9, 3.10, 3.11, 3.12, 3.13
    • Tiered (pull_request): representative subset with at least one 3.13 row
  • NumPy matrix:
    • 1.26.* and 2.*
    • 3.13 + numpy 1.26.* is excluded
  • Python 3.13 rows are marked experimental and soft-gated (continue-on-error)

Default wheels are BLAS-free by policy:

  • Native dependency reports are gated by ../../.github/scripts/wheel_dependency_gate.py using auditwheel (Linux), delocate (macOS), and delvewheel (Windows).
  • Runtime smoke asserts low.diagnostics.blas_backend is None for default wheel installs.

Troubleshooting

  1. TypeError: expected float32 or float64. Cause: integer/object arrays are passed into .fit(...) or detect_offline(...). Fix: cast first, e.g. x = np.asarray(x, dtype=np.float64).

  2. Detection fails because the input contains NaN/missing values. Cause: MVP-A Python APIs reject missing values under MissingPolicy::Error. Fix: impute or drop NaNs before calling detectors.

  3. RuntimeError: fit(...) must be called before predict(...). Cause: .predict(...) was called on an unfitted high-level detector. Fix: always call .fit(x) first.

  4. Extension import fails after a Rust/Python upgrade. Cause: the wheel/extension was built against a different interpreter environment. Fix: rebuild via maturin develop --release in the active environment.

  5. Apple Silicon linker mismatch (arm64 vs x86_64). Cause: the host shell/interpreter/libpython architectures do not match. Fix: follow ../docs/python_apple_silicon_toolchain.md to verify the architecture and run the CI-aligned local sanity flow.

API Reference Outline

  • Pelt(model="l2"|"normal"|"normal_full_cov", min_segment_len, jump, max_change_points)
    • .fit(x) -> detector
    • .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
  • Binseg(model="l2"|"normal"|"normal_full_cov", min_segment_len, jump, max_change_points, max_depth)
    • .fit(x) -> detector
    • .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
  • Fpop(min_segment_len, jump, max_change_points) (l2 only)
    • .fit(x) -> detector
    • .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
  • detect_offline(x, pipeline=None, detector, cost, constraints, stopping, preprocess, repro_mode, return_diagnostics)
    • detector accepts pelt, binseg, fpop, or segneigh (dynp alias). fpop requires cost="l2".
    • segneigh is exact fixed-K dynamic programming (best when stopping is n_bkps/KnownK); runtime/memory can grow quickly on large n and high k.
    • cost accepts l1_median, l2, normal, normal_full_cov, and (pipeline-only) nig.
    • pipeline accepts both simplified Python dicts (for example {"detector": {"kind": "segneigh"}}) and Rust PipelineSpec serde shape (for example {"detector": {"Offline": {"SegNeigh": {...}}}, ...}).
  • OfflineChangePointResult
    • fields: breakpoints, change_points, scores, segments, diagnostics
    • helpers: to_json(), from_json(payload), plot(values=None, *, ax=None, title=...)

Project details


Download files

Download the file for your platform.

Source Distribution

changepoint_doctor-0.0.3.tar.gz (303.9 kB)

Uploaded Source

Built Distributions


changepoint_doctor-0.0.3-cp39-abi3-win_amd64.whl (924.0 kB)

Uploaded CPython 3.9+, Windows x86-64

changepoint_doctor-0.0.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)

Uploaded CPython 3.9+, manylinux: glibc 2.17+ x86-64

changepoint_doctor-0.0.3-cp39-abi3-macosx_11_0_arm64.whl (986.2 kB)

Uploaded CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file changepoint_doctor-0.0.3.tar.gz.

File metadata

  • Download URL: changepoint_doctor-0.0.3.tar.gz
  • Size: 303.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for changepoint_doctor-0.0.3.tar.gz
  • SHA256: 7b76b264e6099bd8416d7df44d94551cca2529f97a3ee606a36398f2d85b4448
  • MD5: 310760435274d47ef94c6479d117f7c8
  • BLAKE2b-256: a2936340c9e93382f845efbb0b378e6dee4302bc9272a6d2371da4c11de16467


Provenance

The following attestation bundles were made for changepoint_doctor-0.0.3.tar.gz:

Publisher: release.yml on xang1234/changepoint-doctor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file changepoint_doctor-0.0.3-cp39-abi3-win_amd64.whl.

File hashes

Hashes for changepoint_doctor-0.0.3-cp39-abi3-win_amd64.whl
  • SHA256: 5b8ab8cd3a674b02321a4b7bd346f8b7611f6d0f8b51a90bdf83152e68e6b72e
  • MD5: 1f0f94331786073900f24a55f83b2b1d
  • BLAKE2b-256: 4ba2f20ddfa43dd31b7900c5e997ae8599ccd88e655e815817a1f934c5b10f91


Provenance

The following attestation bundles were made for changepoint_doctor-0.0.3-cp39-abi3-win_amd64.whl:

Publisher: release.yml on xang1234/changepoint-doctor


File details

Details for the file changepoint_doctor-0.0.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for changepoint_doctor-0.0.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: fe80e22a282597b7d600a56e683d4916edd74ac037b1af9b4cde7d529fb525aa
  • MD5: 69dcc8984ae353be443d034a969f923a
  • BLAKE2b-256: 7a944fc98276d1fa0af18dcd4cf80dc266fab15dcab9c4d03ea8a0974fe73201


Provenance

The following attestation bundles were made for changepoint_doctor-0.0.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on xang1234/changepoint-doctor


File details

Details for the file changepoint_doctor-0.0.3-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for changepoint_doctor-0.0.3-cp39-abi3-macosx_11_0_arm64.whl
  • SHA256: 3e5791e3ced0eda68ed6f501ade48489550cd3e6c65ad4b7e993166400f8eb5a
  • MD5: dc17814a772cc1b72825f073e648987b
  • BLAKE2b-256: 13b064600c227a22a04675acfe11594e3d70a39a7c31b21573bb81d7fbf1f75f


Provenance

The following attestation bundles were made for changepoint_doctor-0.0.3-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on xang1234/changepoint-doctor

