Bi-Persistence Clustering for Applications with Noise.
biperscan adapts HDBSCAN* to extract clusters from bi-filtrations over a
distance scale and a centrality scale (other lens dimensions are untested but
might work as well). This type of clustering is particularly useful for
detecting (lower-density) branches in datasets. Such branches are difficult to
detect with clustering algorithms because they connect to a central core
through short distances. In other words, there is no gap or low-density region
between the branches and the central core, so distance-based clustering algorithms
cannot separate them. The bi-filtration effectively introduces a gap between the
branches by filtering out the most central points at a varying centrality threshold,
allowing the branches to be detected as separate connected components (i.e., clusters).
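The effect of the centrality threshold can be illustrated with a toy sketch in plain NumPy. This is not biperscan's implementation: it fixes the distance scale at a single radius, varies only the centrality threshold, and counts connected components. The star-shaped data, the helper count_components, the radius, and the thresholds are all hypothetical. The lens mirrors the lens='negative_distance_to_median' option by name, and the assumption that points with lens values at or below the threshold are kept (so the core enters last) is chosen for illustration; biperscan's internal conventions may differ.

```python
import numpy as np


def count_components(points: np.ndarray, radius: float) -> int:
    """Count connected components of the graph that links points within `radius`."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pairwise distances; fine for a toy example of ~100 points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] <= radius:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


# Hypothetical toy data: four straight branches (+x, -x, +y, -y) sharing a core at the origin.
t = np.linspace(0.0, 1.0, 21)  # neighbouring points on a branch are 0.05 apart
zeros = np.zeros_like(t)
data = np.vstack([
    np.column_stack((t, zeros)),
    np.column_stack((-t, zeros)),
    np.column_stack((zeros, t)),
    np.column_stack((zeros, -t)),
])

# Centrality lens: negative distance to the median, so the core has the highest value.
lens = -np.linalg.norm(data - np.median(data, axis=0), axis=1)

radius = 0.1  # fixed distance scale: no gap between the branches and the core
results = {
    threshold: count_components(data[lens <= threshold], radius)
    for threshold in (0.0, -0.33)  # assumption: keep points with lens <= threshold
}
print(results)  # {0.0: 1, -0.33: 4}
```

At threshold 0.0 every point is kept and the star forms a single component; at -0.33 the central core is filtered out and the four branches become separate components, even though the distance scale never changed.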
Although biperscan is implemented to be fast, it does not scale well with data
size. For practical applications, we instead recommend
pyflasc: our more efficient branch &
cluster detection algorithm. The main difference is that pyflasc first
extracts HDBSCAN clusters and then extracts branches within those clusters, rather
than trying to detect both at the same time. This results in two fast
filtrations instead of one expensive bi-filtration.
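The two-stage idea (clusters first, then branches within each cluster) can be sketched as follows. This is a toy analogue in plain NumPy, not pyflasc's API or algorithm: pyflasc uses HDBSCAN for the first stage, whereas here a fixed-radius connected-components step stands in for it, and the second stage simply removes each cluster's most central points and counts what falls apart. The helpers components and star, the radius, and the -0.33 centrality cut-off are hypothetical.

```python
import numpy as np


def components(points: np.ndarray, radius: float) -> np.ndarray:
    """Label connected components of the graph linking points within `radius`."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] <= radius:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    # Relabel component roots to consecutive integers 0..k-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def star(center, directions, n=21):
    """Hypothetical helper: points along straight branches radiating from `center`."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return np.vstack([np.asarray(center) + t * np.asarray(d) for d in directions])


# Two well-separated groups: a four-branch cross and a two-branch segment.
data = np.vstack([
    star((0.0, 0.0), [(1, 0), (-1, 0), (0, 1), (0, -1)]),
    star((10.0, 0.0), [(1, 0), (-1, 0)]),
])

radius = 0.1
# Stage 1: a distance-only step separates the clusters (stand-in for HDBSCAN).
cluster_labels = components(data, radius)

# Stage 2: within each cluster, drop the most central points and count the pieces.
branches_per_cluster = {}
for c in np.unique(cluster_labels):
    members = data[cluster_labels == c]
    centrality = -np.linalg.norm(members - np.median(members, axis=0), axis=1)
    peripheral = members[centrality <= -0.33]
    branches_per_cluster[int(c)] = int(components(peripheral, radius).max() + 1)

print(branches_per_cluster)  # {0: 4, 1: 2}
```

Each stage is a single one-parameter filtration over a smaller problem, which is the intuition behind replacing one bi-filtration with two simpler passes.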
How to use BPSCAN
biperscan's API is based on the hdbscan
package, so usage follows a similar pattern:
```python
import numpy as np
import matplotlib.pyplot as plt
from biperscan import BPSCAN

# Example data from the repository's notebooks folder.
data = np.load("./notebooks/data/flared/flared_clusterable_data.npy")
clusterer = BPSCAN(
    lens='negative_distance_to_median',  # the lens function to use
    metric='euclidean',                  # same as in HDBSCAN
    min_samples=20,                      # same as in HDBSCAN
    min_cluster_size=80,                 # same as in HDBSCAN
    distance_fraction=0.05,              # suppress noise at lower values
).fit(data)

plt.figure()
plt.scatter(
    # Wrap labels onto the 20 tab20 colours (noise, label -1, maps to 19).
    *data.T, c=clusterer.labels_ % 20, s=5, alpha=0.5,
    edgecolor="none", cmap="tab20", vmin=0, vmax=19
)
plt.axis("off")
plt.show()
```
The labelling can be recomputed with a different depth limit using
clusterer.labels_at_depth(2). Alternatively, the clusterer.membership_ property
lists all detected (possibly overlapping) subgroups. biperscan does not (yet)
support weighted cluster membership.
Example Notebooks
The How BPSCAN Works notebook demonstrates how the algorithm works. The other notebooks apply the algorithm to several datasets and contain additional analyses.
Installing
Binary wheels (for x86_64) are available on PyPI. Presuming you have an up-to-date pip:
pip install biperscan
For a manual install of the latest code directly from GitHub:
pip install --upgrade git+https://github.com/vda-lab/biperscan.git#egg=biperscan
The repository contains C++23 code, so a suitable compiler is required to install from source. The code is tested to compile with MSVC (build tools version 17), GCC 14, and Clang 19.
Citing
To credit this software, please cite the Zenodo DOI: ... (TODO).
Licensing
The biperscan package has a 3-Clause BSD license.