Fast change-point detection bindings backed by Rust.

These details have not been verified by PyPI

Project links

Project description

changepoint-doctor Python Bindings (MVP-A)

changepoint-doctor exposes fast offline change-point detection from Rust into Python.

For citation and provenance policy, see ../CITATION.cff and ../docs/clean_room_policy.md.

Install

From PyPI (target release 0.0.1):

python -m pip install --upgrade pip
python -m pip install changepoint-doctor==0.0.1

For local development from this repository:

cd cpd/python
python -m pip install --upgrade pip maturin numpy
maturin develop --release --manifest-path ../crates/cpd-python/Cargo.toml

Apple Silicon contributors should run the architecture checks and sanity path in ../docs/python_apple_silicon_toolchain.md before debugging pyo3/linker errors.

API Map

cpd.Pelt: high-level PELT detector.
cpd.Binseg: high-level Binary Segmentation detector.
cpd.Fpop: high-level FPOP detector (L2 cost only).
cpd.detect_offline: low-level API for explicit detector/cost/constraints/stopping/preprocess selection, including detector="segneigh" (exact fixed-K DP; dynp alias supported).
cpd.OfflineChangePointResult: typed result object with breakpoints and diagnostics.

Streaming `update()` vs `update_many()` Policy

update_many() now uses a size-aware GIL strategy in Rust bindings:

Workloads with < 16 scalar work items (n * d) keep the GIL (lower overhead for tiny micro-batches).
Workloads with >= 16 scalar work items (n * d) release the GIL (py.allow_threads) for throughput and thread fairness.

To reproduce the benchmark snapshot used for this policy:

cd cpd/python
python -m pip install --upgrade pytest
pytest -q tests/test_streaming_perf_contract.py

Optional controls:

CPD_PY_STREAMING_PERF_ENFORCE=1: enable stricter ratio gates.
CPD_PY_STREAMING_PERF_REPORT_OUT=/tmp/cpd-python-streaming-perf.json: write JSON metrics.

The perf contract uses median latency with outlier-triggered retry rounds to reduce scheduler-noise flakiness.

Reference run (local dev machine, tests/test_streaming_perf_contract.py, median ms):

Batch size	`update()` median ms	`update_many()` median ms	`update_many()` speedup vs `update()`
1	0.0035	0.0097	0.36x
8	0.0177	0.0194	0.91x
16	0.0356	0.0310	1.15x
64	0.1308	0.0891	1.47x
4096	7.8216	4.4616	1.75x

Masking Risk Guidance

If BinSeg diagnostics indicate masking risk (for example warnings that closely spaced weaker changes may be hidden), prefer Wild Binary Segmentation (WBS) in Rust/offline flows (cpd-offline::Wbs) for stronger recovery.

Python high-level APIs expose cpd.Pelt, cpd.Binseg, and cpd.Fpop. WBS and SegNeigh are not yet exposed as Python high-level detector classes; use detect_offline(...).

Quickstart

See QUICKSTART.md for a full walkthrough.

Reproducibility Modes

detect_offline(..., repro_mode=...) supports strict, balanced (default), and fast. For deterministic contracts, cross-platform expectations, and tolerance gates, see ../docs/reproducibility_modes.md.

Result JSON Contract

OfflineChangePointResult.to_json() / OfflineChangePointResult.from_json(...) follow the versioned contract in ../docs/result_json_contract.md, with the canonical schema marker at diagnostics.schema_version.

In 0.x, schema compatibility follows the bounded version window documented in ../VERSIONING.md: readers accept only supported schema-marker versions (currently 1..=2 for offline result fixtures).

Serialization + plotting workflow:

import numpy as np
import cpd

x = np.concatenate([
    np.zeros(40, dtype=np.float64),
    np.full(40, 8.0, dtype=np.float64),
    np.full(40, -4.0, dtype=np.float64),
])

pelt = cpd.Pelt(model="l2").fit(x).predict(n_bkps=2)
binseg = cpd.Binseg(model="l2").fit(x).predict(n_bkps=2)
fpop = cpd.Fpop(min_segment_len=2).fit(x).predict(n_bkps=2)
low = cpd.detect_offline(
    x,
    detector="pelt",
    cost="l2",
    constraints={"min_segment_len": 2},
    stopping={"n_bkps": 2},
)
segneigh = cpd.detect_offline(
    x,
    detector="segneigh",  # 'dynp' alias also supported
    cost="l2",
    constraints={"min_segment_len": 2},
    stopping={"n_bkps": 2},
)

payload = pelt.to_json()
restored = cpd.OfflineChangePointResult.from_json(payload)
assert restored.breakpoints == pelt.breakpoints

try:
    fig = restored.plot(x, title="Detected breakpoints")
except ImportError:
    # Plotting remains optional.
    # Install with: python -m pip install matplotlib
    fig = None

Compatibility + limitations:

from_json(...) accepts only supported schema markers (diagnostics.schema_version, currently 1..=2 in 0.x).
to_json() writes the current schema marker (currently 1) and preserves additive unknown fields when round-tripping payloads.
plot() requires optional matplotlib.
plot(values=None, ...) requires per-segment summaries in the result; if segments are unavailable, pass explicit values.
plot(ax=...) is supported only for univariate data (diagnostics.d == 1).

These paths are smoke-tested in CI in tests/test_integration_mvp_a.py, including fixture compatibility checks and example-script execution.

Stopping and Penalty Guide

Ruptures-compatible naming is supported in Python:

n_bkps: exact number of change points (Stopping::KnownK)
pen: manual penalty scalar (Stopping::Penalized(Penalty::Manual(...)))
min_segment_len: minimum segment size (Constraints.min_segment_len)

When to use each stopping style:

n_bkps (KnownK): use when you know the expected number of changes and need an exact count.
pen="bic": good default when you want automatic model-selection behavior that scales with sample size.
pen="aic": less conservative than BIC; can recover weaker changes but may over-segment noisy data.
pen=<float>: use when you need tight operational control over sensitivity (lower finds more changes, higher finds fewer).
stopping={"PenaltyPath": [...]} (pipeline serde form): request multiple penalties in one PELT sweep and inspect diagnostics notes for each path entry.

BIC/AIC complexity terms are model-aware by default:

l2 uses params_per_segment=2 (mean + residual variance proxy)
normal uses params_per_segment=3 (mean + variance + residual term)
normal_full_cov uses model-aware effective complexity for BIC/AIC: 1 + d + d(d+1)/2 (mean vector + full covariance + residual term)

Advanced users can still override params_per_segment in low-level pipeline detector config.

SegNeigh Sizing Guide (`detector="segneigh"` / `"dynp"`)

SegNeigh is exact dynamic programming for fixed-k segmentation (n_bkps / KnownK).

Let m be the effective candidate count after constraints (jump, candidate_splits, min_segment_len filtering).
Expected scaling is approximately:
- runtime: O(k * m^2)
- memory: O(k * m + m)
Practical guidance:
- Use SegNeigh when k is known and m is modest.
- Increase jump and/or min_segment_len first when runtime or memory is high.
- Prefer pelt/fpop when k is unknown or when very large n requires penalty-based model selection.

Reproducible local benchmark harness for representative (n, k) regimes:

cd cpd
cargo bench -p cpd-bench --bench offline_segneigh

Preprocess Config Contract

detect_offline(..., preprocess=...) validates keys and method payloads. Unknown preprocess stage keys fail with ValueError.

Canonical shape:

preprocess = {
    "detrend": {"method": "linear"},  # or {"method": "polynomial", "degree": 2}
    "deseasonalize": {"method": "differencing", "period": 2},  # or method="stl_like" (period >= 2)
    "winsorize": {"lower_quantile": 0.05, "upper_quantile": 0.95},  # optional fields
    "robust_scale": {"mad_epsilon": 1e-9, "normal_consistency": 1.4826},  # optional fields
}

Validation details:

detrend.method: "linear" or "polynomial" (degree required for polynomial).
deseasonalize.method: "differencing" (period >= 1) or "stl_like" (period >= 2).
winsorize: defaults to lower_quantile=0.01, upper_quantile=0.99 when omitted.
robust_scale: defaults to mad_epsilon=1e-9, normal_consistency=1.4826 when omitted.

Example Scripts

examples/synthetic_signal.py: synthetic step-function detection with all MVP-A APIs.
examples/csv_detect.py: detect breakpoints from a CSV column.
examples/plot_breakpoints.py: render detected breakpoints over a synthetic signal.

Run from repo root:

cpd/python/.venv/bin/python cpd/python/examples/synthetic_signal.py
cpd/python/.venv/bin/python cpd/python/examples/csv_detect.py --csv /path/to/data.csv --column 0
cpd/python/.venv/bin/python cpd/python/examples/plot_breakpoints.py --out /tmp/cpd_breakpoints.png

Ruptures Parity Suite

To run the differential parity suite locally (after installing ruptures in the active environment):

cd cpd/python
CPD_PARITY_PROFILE=smoke pytest -q tests/test_ruptures_parity.py
CPD_PARITY_PROFILE=full CPD_PARITY_REPORT_OUT=/tmp/cpd-parity-report.json pytest -q tests/test_ruptures_parity.py

See ../docs/parity_ruptures.md for corpus structure, tolerance rules, and CI thresholds.

Wheel CI Policy

Cross-platform wheel hardening is enforced by ../../.github/workflows/wheel-build.yml and ../../.github/workflows/wheel-smoke.yml.

Build backend: cibuildwheel
Platforms:
- Linux manylinux x86_64
- macOS universal2 (validated on macos-13 and macos-14)
- Windows amd64 (windows-2022)
Python matrix:
- Full (main/nightly/tag): 3.9, 3.10, 3.11, 3.12, 3.13
- Tiered (pull_request): representative subset with at least one 3.13 row
NumPy matrix:
- 1.26.* and 2.*
- 3.13 + numpy 1.26.* is excluded
Python 3.13 rows are marked experimental and soft-gated (continue-on-error)

Default wheels are BLAS-free by policy:

Native dependency reports are gated by ../../.github/scripts/wheel_dependency_gate.py using auditwheel (Linux), delocate (macOS), and delvewheel (Windows).
Runtime smoke asserts low.diagnostics.blas_backend is None for default wheel installs.

Troubleshooting

TypeError: expected float32 or float64 Cause: integer/object arrays are passed into .fit(...) or detect_offline(...). Fix: cast first, e.g. x = np.asarray(x, dtype=np.float64).
Input contains NaN/missing values and detection fails Cause: MVP-A Python APIs reject missing values under MissingPolicy::Error. Fix: impute/drop NaNs before calling detectors.
RuntimeError: fit(...) must be called before predict(...) Cause: .predict(...) called on an unfitted high-level detector. Fix: always call .fit(x) first.
Extension import fails after Rust/Python upgrade Cause: wheel/extension built against a different interpreter environment. Fix: rebuild via maturin develop --release in the active environment.
Apple Silicon linker mismatch (arm64 vs x86_64) Cause: host shell/interpreter/libpython architectures do not match. Fix: follow ../docs/python_apple_silicon_toolchain.md to verify architecture and run the CI-aligned local sanity flow.

API Reference Outline

Pelt(model="l2"|"normal"|"normal_full_cov", min_segment_len, jump, max_change_points)
- .fit(x) -> detector
- .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
Binseg(model="l2"|"normal"|"normal_full_cov", min_segment_len, jump, max_change_points, max_depth)
- .fit(x) -> detector
- .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
Fpop(min_segment_len, jump, max_change_points) (l2 only)
- .fit(x) -> detector
- .predict(pen=..., n_bkps=...) -> OfflineChangePointResult
detect_offline(x, pipeline=None, detector, cost, constraints, stopping, preprocess, repro_mode, return_diagnostics)
- detector accepts pelt, binseg, fpop, or segneigh (dynp alias). fpop requires cost="l2".
- segneigh is exact fixed-K dynamic programming (best when stopping is n_bkps/KnownK); runtime/memory can grow quickly on large n and high k.
- cost accepts l1_median, l2, normal, normal_full_cov, and (pipeline-only) nig.
- pipeline accepts both simplified Python dicts (for example {"detector": {"kind": "segneigh"}}) and Rust PipelineSpec serde shape (for example {"detector": {"Offline": {"SegNeigh": {...}}}, ...}).
OfflineChangePointResult
- fields: breakpoints, change_points, scores, segments, diagnostics
- helpers: to_json(), from_json(payload), plot(values=None, *, ax=None, title=...)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.0.3

Feb 19, 2026

0.0.2

Feb 17, 2026

This version

0.0.1

Feb 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

changepoint_doctor-0.0.1.tar.gz (282.9 kB view details)

Uploaded Feb 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

changepoint_doctor-0.0.1-cp39-abi3-macosx_10_12_x86_64.whl (998.5 kB view details)

Uploaded Feb 17, 2026 CPython 3.9+macOS 10.12+ x86-64

File details

Details for the file changepoint_doctor-0.0.1.tar.gz.

File metadata

Download URL: changepoint_doctor-0.0.1.tar.gz
Upload date: Feb 17, 2026
Size: 282.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for changepoint_doctor-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`0ca143aaaba9b9ee23a31a4aa622605744f2d336c191454214c7580f1fc46463`
MD5	`ed0f51403a8d7ca7e57663c9e9d7c6b1`
BLAKE2b-256	`feebf337744eabe42d3ee58f8640cffc2bf2ce9ff46ff35b78a815da3fd348bc`

See more details on using hashes here.

File details

Details for the file changepoint_doctor-0.0.1-cp39-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: changepoint_doctor-0.0.1-cp39-abi3-macosx_10_12_x86_64.whl
Upload date: Feb 17, 2026
Size: 998.5 kB
Tags: CPython 3.9+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for changepoint_doctor-0.0.1-cp39-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`31d5ca1e9c97b4287250c72964a34fc3ab056592bc8f2671604f8eaa807b4c28`
MD5	`afa371b16b96311298f1461b08314642`
BLAKE2b-256	`d7da51982288ecaba6e7fdb6e3988a15190bfb7500078c88cb8d156b6f871def`

See more details on using hashes here.

changepoint-doctor 0.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

changepoint-doctor Python Bindings (MVP-A)

Install

API Map

Streaming update() vs update_many() Policy

Masking Risk Guidance

Quickstart

Reproducibility Modes

Result JSON Contract

Stopping and Penalty Guide

SegNeigh Sizing Guide (detector="segneigh" / "dynp")

Preprocess Config Contract

Example Scripts

Ruptures Parity Suite

Wheel CI Policy

Troubleshooting

API Reference Outline

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Streaming `update()` vs `update_many()` Policy

SegNeigh Sizing Guide (`detector="segneigh"` / `"dynp"`)