Bi-Persistence Clustering of Applications with Noise

biperscan adapts HDBSCAN* to extract clusters from bi-filtrations over a distance scale and a centrality scale (other lens dimensions are untested but might work as well). This type of clustering is particularly useful for detecting (lower-density) branches in a dataset. Such branches are difficult to detect with conventional clustering algorithms because they are connected to a central core by short distances. In other words, there is no gap or low-density region between the branches and the central core, so distance-based clustering algorithms cannot separate them. The bi-filtration effectively introduces a gap between the branches by filtering out points with a varying centrality threshold, allowing the branches to be detected as separate connected components (i.e., clusters).
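The effect of the centrality threshold can be illustrated on toy data. The sketch below is not biperscan's implementation: it uses a fixed distance threshold and a single centrality cut instead of a full bi-filtration, it assumes centrality is the negative distance to the centroid, and the helper count_components and the three-branch data set are hypothetical names introduced only for this illustration.

```python
import numpy as np


def count_components(points, radius):
    """Count connected components when points closer than `radius` are linked."""
    diffs = points[:, None, :] - points[None, :, :]
    adjacent = np.linalg.norm(diffs, axis=-1) <= radius
    unvisited = np.ones(len(points), dtype=bool)
    n_components = 0
    while unvisited.any():
        # Flood-fill one component starting from any unvisited point.
        frontier = [int(np.flatnonzero(unvisited)[0])]
        unvisited[frontier[0]] = False
        while frontier:
            current = frontier.pop()
            neighbours = np.flatnonzero(adjacent[current] & unvisited)
            unvisited[neighbours] = False
            frontier.extend(neighbours.tolist())
        n_components += 1
    return n_components


# Toy data (hypothetical): three straight branches that meet near the origin.
radii = np.linspace(0.02, 1.0, 50)
angles = np.deg2rad([0.0, 120.0, 240.0])
data = np.vstack(
    [np.column_stack((radii * np.cos(a), radii * np.sin(a))) for a in angles]
)

# Centrality lens, assumed here to be the negative distance to the centroid.
centrality = -np.linalg.norm(data - data.mean(axis=0), axis=1)

# Without a centrality filter, the branches are linked through the core.
n_all = count_components(data, radius=0.05)

# Dropping the most central points opens a gap between the branches.
n_branches = count_components(data[centrality <= -0.3], radius=0.05)

print(n_all, n_branches)
```

At this distance threshold, all points form one component until the central points are removed, after which each branch becomes its own component. biperscan varies both thresholds rather than fixing them as done here.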

Although biperscan is implemented to be fast, it does not scale well with data size. For practical applications, we instead recommend pyflasc: our more efficient branch and cluster detection algorithm. The main difference is that pyflasc first extracts HDBSCAN clusters and then extracts branches within the clusters, rather than trying to detect both at the same time. This results in two fast filtrations instead of one expensive bi-filtration.

How to use BPSCAN

biperscan's API is based on the hdbscan package and supports similar usage:

import numpy as np
import matplotlib.pyplot as plt

from biperscan import BPSCAN

data = np.load("./notebooks/data/flared/flared_clusterable_data.npy")

clusterer = BPSCAN(
    lens='negative_distance_to_median', # the lens function to use
    metric='euclidean',                 # same as in HDBSCAN
    min_samples=20,                     # same as in HDBSCAN
    min_cluster_size=80,                # same as in HDBSCAN
    distance_fraction=0.05,             # suppress noise at lower values
).fit(data)

plt.figure()
plt.scatter(
    *data.T, c=clusterer.labels_ % 20, s=5, alpha=0.5, 
    edgecolor="none", cmap="tab20", vmin=0, vmax=19
)
plt.axis("off")
plt.show()

(Figure: scatter plot of the data, colored by cluster label.)

The labelling can be recomputed with a different depth limit using clusterer.labels_at_depth(2). Alternatively, the clusterer.membership_ property lists all detected (but overlapping) subgroups. biperscan does not (yet) support weighted cluster membership.
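To compare labellings obtained at different depth limits, it can help to tabulate cluster sizes. The following is a minimal sketch that works on any integer label array; summarize_labels is a hypothetical helper, not part of biperscan, and it assumes that noise points carry the label -1 as in hdbscan, which should be verified for biperscan.

```python
import numpy as np


def summarize_labels(labels, noise_label=-1):
    """Return ({cluster label: size}, number of noise points) for a label array.

    Assumes noise points are marked with `noise_label` (-1 in hdbscan).
    """
    labels = np.asarray(labels)
    is_noise = labels == noise_label
    unique, counts = np.unique(labels[~is_noise], return_counts=True)
    sizes = {int(label): int(count) for label, count in zip(unique, counts)}
    return sizes, int(is_noise.sum())


# Stand-in labels; in practice pass clusterer.labels_ or
# clusterer.labels_at_depth(2) from a fitted BPSCAN object.
example_labels = np.array([0, 0, 1, -1, 1, 1, 2, -1])
sizes, n_noise = summarize_labels(example_labels)
print(sizes, n_noise)
```

Calling the helper on the labels from two depth limits shows how many subgroups each depth yields and how many points remain unassigned.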

Example Notebooks

A notebook demonstrating how the algorithm works is available at How BPSCAN Works. The other notebooks demonstrate the algorithm on several data sets and contain additional analyses.

Installing

Binary wheels (for x86_64) are available on PyPI. Presuming you have an up-to-date pip:

pip install biperscan

For a manual install of the latest code directly from GitHub:

pip install --upgrade git+https://github.com/vda-lab/biperscan.git#egg=biperscan

The repository contains C++23 code, so a suitable compiler is required to install from source. The code is tested to compile with MSVC (build tools version 17), GCC 14, and Clang 19.
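After either install route, a quick check can confirm that the package is visible to the current Python environment. This sketch uses only the standard library; installed_version is a hypothetical helper name, and the distribution name "biperscan" is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the install did not succeed.
print("biperscan version:", installed_version("biperscan"))
```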

Citing

To credit this software please cite the Zenodo DOI: ... (TODO).

Licensing

The biperscan package has a 3-Clause BSD license.
