Hierarchical Boundary Coefficient Clustering.
Project description
Hierarchical Boundary Coefficient Clustering
HBCC is a clustering algorithm is inspired by the work of Peng et al. [4]. Their CDC algorithm detects regions that separate clusters by quantifying how spread out points' neighbors are in feature space. The intuition for this process is as follows: (1) points within a clusters have near neighbors on all sides, and (2) points in-between clusters or on the edge of a cluster predominantly have neighbors in the direction of the cluster's core.
The same intuition was previously used by Vandaele et al. [2] to quantify how
close points are to the boundary of the data's manifold. They used their
boundary coefficient to extract manifold skeletons.
HBCC combines Vandaele et al. [2]'s boundary coefficient with HDBSCAN's [1], [3]
principled density-based cluster selection strategies. This approach removes
CDC's ratio parameter. Instead a clustering hierarchy is evaluated over all
boundary coefficient values and clusters are selected from the hierarchy.
The implementation relies on McInnes' fast_hdbscan and its support for lensed clustering introduced for Bot et al. [5] to detect branches.
The HBCC algorithm works as follows:
- Compute k nearest neighbors and minimum spanning tree [3].
- Compute n-hop distances from nearest neighbors [2]
- Compute the boundary coefficient for each point [2].
- Compute minimum spanning tree from boundary coefficient weighted knn--mst graph union [5].
- Compute HDBSCAN cluster hierarchy and selection [1].
How to use HBCC
import numpy as np
import matplotlib.pyplot as plt
from fast_hbcc import HBCC
data = np.load('data/flareable/flared_clusterable_data.npy')
c = HBCC(
num_hops=4,
min_samples=10,
min_cluster_size=100,
cluster_selection_method="leaf",
).fit(data)
plt.scatter(*data.T, c=c.labels_ % 10, s=2, cmap='tab10', vmin=0, vmax=9)
plt.axis("off")
plt.show()
Example Notebooks
A notebook demonstrating how the algorithm works is available on nbviewer.
Citing
Citing information not available yet.
Licensing
The fast_hbcc package has a 2-Clause BSD license.
References
-
[1] Campello, R. J., Moulavi, D., & Sander, J. (2013, April). Density-based clustering based on hierarchical density estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 160-172). Springer Berlin Heidelberg.
-
[2] Vandaele, R., Saeys, Y., & De Bie, T. (2019). The Boundary Coefficient : a Vertex Measure for Visualizing and Finding Structure in Weighted Graphs. 15th International Workshop on Mining and Learning with Graphs (MLG).
-
[3] McInnes, L., & Healy, J. (2017). Accelerated Hierarchical Density Based Clustering. 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017-Novem, 33–42. https://doi.org/10.1109/ICDMW.2017.12.
-
[4] Peng, D., Gui, Z., Wang, D., Ma, Y., Huang, Z., Zhou, Y., & Wu, H. (2022). Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nature Communications, 13(1), 1–14. https://doi.org/10.1038/s41467-022-33136-9.
-
[5] Bot, Daniël M., et al. "FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters." arXiv preprint arXiv:2311.15887 (2023). https://arxiv.org/abs/2311.15887.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fast_hbcc-0.1.0.tar.gz.
File metadata
- Download URL: fast_hbcc-0.1.0.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
548a3f93221d87f2685f28c535d906dd5ff2d9d8381129358a1cd4c962ca6379
|
|
| MD5 |
a081e7717e4dc5099e420494023a4ef9
|
|
| BLAKE2b-256 |
ed0abac275dd1306aef202c50dcf67d148b164acfc2952064ee1fdf75aadc85c
|
File details
Details for the file fast_hbcc-0.1.0-py3-none-any.whl.
File metadata
- Download URL: fast_hbcc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.1 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
448de6f10ccc3a26f04e32e0402cc63d03f8b18e279819620e2e4ea572209ac5
|
|
| MD5 |
75590a40c2ac724233b09b2d94dc5118
|
|
| BLAKE2b-256 |
1a66d6b523b1fa6d877d92611e11dbce466c95f60fd5484307aea1a2f004181d
|