High-performance HDBSCAN clustering, compatible with scikit-learn
hdbscan-rs
A Rust implementation of HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise). It produces results compatible with scikit-learn's HDBSCAN but runs significantly faster on large datasets, thanks to a dual-tree Boruvka MST and tight pruning in native code.
Quick start
Add it to your project:
cargo add hdbscan-rs
Cluster some data:
use hdbscan_rs::{Hdbscan, HdbscanParams};
use ndarray::array;
let data = array![
[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
[10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.05, 10.05],
];
let params = HdbscanParams { min_cluster_size: 3, ..Default::default() };
let mut hdbscan = Hdbscan::new(params);
let labels = hdbscan.fit_predict(&data.view()).unwrap();
// labels: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Features
- sklearn-compatible output: labels, probabilities, outlier scores, and the condensed tree all match the reference Python implementation (ARI > 0.99 across the fixture suite)
- Fast: dual-tree Boruvka MST with per-node component caching, lazy sqrt, and closer-child-first traversal. Falls back to Prim's algorithm for non-Euclidean metrics or small datasets
- Approximate prediction: classify new points against a fitted model without re-clustering
- Cluster centers: optional centroid and/or medoid computation
- Five distance metrics: Euclidean, Manhattan, Cosine, Minkowski(p), or bring your own precomputed distance matrix
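For reference, the four point-to-point metrics named above can be sketched in plain Rust as follows. This is a std-only illustration of the textbook definitions, not the crate's internal code; the function names and the zero-vector handling in `cosine` are assumptions made for this sketch.

```rust
/// Euclidean (L2) distance between two equal-length points.
pub fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Manhattan (L1) distance: sum of absolute coordinate differences.
pub fn manhattan(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Minkowski distance of order `p` (p = 1 is Manhattan, p = 2 is Euclidean).
pub fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Cosine distance, defined as 1 - cos(angle between a and b).
/// Returns 1.0 when either vector has zero norm (a choice made for this sketch).
pub fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        1.0
    } else {
        1.0 - dot / (norm_a * norm_b)
    }
}
```

For the points (0, 0) and (3, 4), `euclidean` gives 5 and `manhattan` gives 7.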
Performance
Measured on 2D blobs (5 clusters, min_cluster_size=10), single thread, best-of-N:
| n | Time |
|---|---|
| 500 | 1.8 ms |
| 1,000 | 5.0 ms |
| 2,000 | 7.9 ms |
| 5,000 | 24.1 ms |
| 10,000 | 54.5 ms |
| 20,000 | 114.4 ms |
| 50,000 | 290.2 ms |
The MST algorithm is selected automatically: dual-tree Boruvka for Euclidean data with n >= 128, Prim's otherwise. See BENCHMARKS.md for methodology and detailed results.
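The selection rule and the Prim's fallback can be illustrated with a small std-only sketch. The names `MetricKind`, `MstAlgorithm`, `select_mst`, and `prim_mst` are hypothetical and are not part of the crate's API; the Prim's routine here runs on a dense symmetric matrix, whereas the crate presumably runs its MST over mutual reachability distances.

```rust
/// Hypothetical stand-in for the crate's metric enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MetricKind {
    Euclidean,
    Manhattan,
    Cosine,
    Minkowski(f64),
    Precomputed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MstAlgorithm {
    DualTreeBoruvka,
    Prim,
}

/// The rule as stated in the text: Boruvka for Euclidean data with n >= 128.
pub fn select_mst(metric: MetricKind, n: usize) -> MstAlgorithm {
    if metric == MetricKind::Euclidean && n >= 128 {
        MstAlgorithm::DualTreeBoruvka
    } else {
        MstAlgorithm::Prim
    }
}

/// Prim's algorithm over a dense symmetric distance matrix.
/// Returns n - 1 edges as (from, to, weight), in the order they were added.
pub fn prim_mst(dist: &[Vec<f64>]) -> Vec<(usize, usize, f64)> {
    let n = dist.len();
    if n == 0 {
        return Vec::new();
    }
    let mut in_tree = vec![false; n];
    let mut best = vec![f64::INFINITY; n]; // cheapest known edge into the tree
    let mut parent = vec![0usize; n]; // tree endpoint of that cheapest edge
    let mut edges = Vec::with_capacity(n - 1);

    in_tree[0] = true;
    for j in 1..n {
        best[j] = dist[0][j];
    }
    for _ in 1..n {
        // Pick the cheapest vertex that is not yet in the tree.
        let mut next = usize::MAX;
        for j in 0..n {
            if !in_tree[j] && (next == usize::MAX || best[j] < best[next]) {
                next = j;
            }
        }
        in_tree[next] = true;
        edges.push((parent[next], next, best[next]));
        // Relax edges from the newly added vertex.
        for j in 0..n {
            if !in_tree[j] && dist[next][j] < best[j] {
                best[j] = dist[next][j];
                parent[j] = next;
            }
        }
    }
    edges
}
```

Dense Prim's is O(n²) in time, which is one plausible reason to reserve it for small inputs and for metrics where tree-based pruning is unavailable.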
Parameters
| Parameter | Default | Description |
|---|---|---|
| min_cluster_size | 5 | Smallest group that counts as a cluster |
| min_samples | None (= min_cluster_size) | Controls density estimate; higher = more conservative |
| metric | Euclidean | Distance metric |
| alpha | 1.0 | Mutual reachability scaling factor |
| cluster_selection_epsilon | 0.0 | Merge clusters below this distance threshold |
| cluster_selection_method | Eom | Eom (Excess of Mass) or Leaf |
| allow_single_cluster | false | Permit the entire dataset to form one cluster |
| store_centers | None | Compute Centroid, Medoid, or Both |
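To make the roles of min_samples and alpha concrete, here is a minimal sketch of core distance and mutual reachability distance over a dense distance matrix. It rests on two assumptions that the text does not confirm for this crate: that a point counts as its own first neighbour (implementations differ on this), and that alpha divides the raw distance, as in the reference Python implementation. The function names are hypothetical.

```rust
/// Distance from point `i` to its k-th nearest neighbour.
/// Assumption of this sketch: the point itself counts as neighbour 1,
/// so k = 1 always yields 0.0. Expects a non-empty square matrix.
pub fn core_distance(dist: &[Vec<f64>], i: usize, k: usize) -> f64 {
    let mut row = dist[i].clone();
    row.sort_by(|a, b| a.total_cmp(b));
    row[k.clamp(1, row.len()) - 1]
}

/// Mutual reachability distance:
/// max(core_k(i), core_k(j), d(i, j) / alpha).
/// Dividing by alpha is an assumption carried over from the Python reference.
pub fn mutual_reachability(
    dist: &[Vec<f64>],
    i: usize,
    j: usize,
    k: usize,
    alpha: f64,
) -> f64 {
    let scaled = dist[i][j] / alpha;
    core_distance(dist, i, k)
        .max(core_distance(dist, j, k))
        .max(scaled)
}
```

A larger min_samples raises every core distance, which pushes sparse points further from everything else and makes the clustering more conservative. Under the assumption above, a larger alpha shrinks the raw-distance term so that core distances dominate more often.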
Richer output
After calling fit or fit_predict, you can access:
hdbscan.labels() // Option<&[i32]>: cluster labels (-1 = noise)
hdbscan.probabilities() // Option<&[f64]>: membership strength in [0, 1]
hdbscan.outlier_scores() // Option<&[f64]>: GLOSH outlier scores in [0, 1]
hdbscan.condensed_tree() // Option<&[CondensedTreeEdge]>
hdbscan.centroids() // Option<&Array2<f64>>: if store_centers was set
hdbscan.medoids() // Option<&Array2<f64>>: if store_centers was set
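Once the label and probability slices are unwrapped from their `Option`s, post-processing them needs nothing beyond the standard library. The helpers below are a hypothetical sketch (`summarize` and `confident_members` are not part of the crate) showing how cluster sizes, the noise count, and high-confidence members might be derived.

```rust
use std::collections::BTreeMap;

/// Count members per cluster. The noise label (-1) is tallied separately.
pub fn summarize(labels: &[i32]) -> (BTreeMap<i32, usize>, usize) {
    let mut sizes = BTreeMap::new();
    let mut noise = 0;
    for &label in labels {
        if label < 0 {
            noise += 1;
        } else {
            *sizes.entry(label).or_insert(0) += 1;
        }
    }
    (sizes, noise)
}

/// Indices of points that belong to some cluster with membership
/// strength of at least `min_prob`.
pub fn confident_members(
    labels: &[i32],
    probabilities: &[f64],
    min_prob: f64,
) -> Vec<usize> {
    let mut out = Vec::new();
    for (i, (&label, &p)) in labels.iter().zip(probabilities).enumerate() {
        if label >= 0 && p >= min_prob {
            out.push(i);
        }
    }
    out
}
```

With a fitted model, these would typically be called inside something like `if let (Some(labels), Some(probs)) = (hdbscan.labels(), hdbscan.probabilities()) { ... }`, since both accessors return `Option`.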
Precomputed distances
If you already have a distance matrix:
use hdbscan_rs::{Hdbscan, HdbscanParams, Metric};
let params = HdbscanParams {
min_cluster_size: 3,
metric: Metric::Precomputed,
..Default::default()
};
let mut hdbscan = Hdbscan::new(params);
let labels = hdbscan.fit_predict(&dist_matrix.view()).unwrap();
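The snippet above leaves `dist_matrix` undefined. One way to produce such a matrix, sketched here with only the standard library, is to fill a flat row-major n × n buffer with pairwise distances. The helper name `pairwise_euclidean` is hypothetical, and Euclidean distance is used purely for illustration; in practice a precomputed matrix is most useful for a metric the crate does not provide.

```rust
/// Pairwise Euclidean distances as a flat, row-major n * n buffer.
/// Entry (i, j) lives at index i * n + j; the diagonal stays 0.0.
pub fn pairwise_euclidean(points: &[Vec<f64>]) -> Vec<f64> {
    let n = points.len();
    let mut out = vec![0.0; n * n];
    for i in 0..n {
        for j in (i + 1)..n {
            let d = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| (a - b).powi(2))
                .sum::<f64>()
                .sqrt();
            out[i * n + j] = d;
            out[j * n + i] = d; // the matrix is symmetric
        }
    }
    out
}
```

Assuming `fit_predict` expects a two-dimensional `ndarray` view here as it does in the quick start, the buffer can be wrapped with ndarray's `Array2::from_shape_vec((n, n), flat)`, which returns a `Result`.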
Testing
The test suite validates against scikit-learn fixtures (blobs, moons, circles, varying density, duplicates, precomputed matrices) and includes property-based invariant tests.
cargo test
# 71 tests, plus 2 optional large-scale tests (100K and 1M points):
cargo test -- --ignored
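Agreement with the scikit-learn fixtures is reported elsewhere in this README as an Adjusted Rand Index (ARI). For readers unfamiliar with the measure, the following is a std-only sketch of the standard pair-counting formula; it is not the crate's test code, and the function names are hypothetical. The noise label -1 is simply treated as one more group.

```rust
use std::collections::HashMap;

/// Number of unordered pairs among `n` items: n choose 2.
fn pairs(n: usize) -> f64 {
    (n * n.saturating_sub(1)) as f64 / 2.0
}

/// Adjusted Rand Index between two labelings of the same points.
/// 1.0 means identical partitions (up to label renaming); values near 0.0
/// mean agreement no better than chance.
pub fn adjusted_rand_index(a: &[i32], b: &[i32]) -> f64 {
    assert_eq!(a.len(), b.len(), "label vectors must have equal length");
    if a.len() < 2 {
        return 1.0; // trivial case: nothing to disagree about
    }
    // Contingency table and its row/column marginals.
    let mut joint: HashMap<(i32, i32), usize> = HashMap::new();
    let mut rows: HashMap<i32, usize> = HashMap::new();
    let mut cols: HashMap<i32, usize> = HashMap::new();
    for (&x, &y) in a.iter().zip(b) {
        *joint.entry((x, y)).or_insert(0) += 1;
        *rows.entry(x).or_insert(0) += 1;
        *cols.entry(y).or_insert(0) += 1;
    }
    let sum_joint: f64 = joint.values().map(|&c| pairs(c)).sum();
    let sum_rows: f64 = rows.values().map(|&c| pairs(c)).sum();
    let sum_cols: f64 = cols.values().map(|&c| pairs(c)).sum();

    let expected = sum_rows * sum_cols / pairs(a.len());
    let max_index = 0.5 * (sum_rows + sum_cols);
    if (max_index - expected).abs() < f64::EPSILON {
        return 1.0; // degenerate case, e.g. both labelings are one big cluster
    }
    (sum_joint - expected) / (max_index - expected)
}
```

Because ARI is invariant to label renaming, a Rust run that numbers its clusters differently from the Python reference can still score 1.0.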
License
Licensed under either of Apache License, Version 2.0 or MIT License, at your option.