High-performance HDBSCAN clustering, compatible with scikit-learn

hdbscan-rs

A Rust implementation of HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise). It produces results compatible with scikit-learn's HDBSCAN but runs significantly faster on large datasets, thanks to a dual-tree Boruvka MST construction and tight pruning in native code.

Quick start

Add it to your project:

cargo add hdbscan-rs

Cluster some data:

use hdbscan_rs::{Hdbscan, HdbscanParams};
use ndarray::array;

let data = array![
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.05, 10.05],
];

let params = HdbscanParams { min_cluster_size: 3, ..Default::default() };
let mut hdbscan = Hdbscan::new(params);
let labels = hdbscan.fit_predict(&data.view()).unwrap();
// labels: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

Features

  • sklearn-compatible output: labels, probabilities, outlier scores, and condensed tree all match the reference Python implementation (ARI > 0.99 across the fixture suite)
  • Fast: dual-tree Boruvka MST with per-node component caching, lazy sqrt, and closer-child-first traversal. Falls back to Prim's for non-Euclidean metrics or small datasets
  • Approximate prediction: classify new points against a fitted model without re-clustering
  • Cluster centers: optional centroid and/or medoid computation
  • Five distance metrics: Euclidean, Manhattan, Cosine, Minkowski(p), or bring your own precomputed distance matrix
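The ARI figure quoted in the feature list is the Adjusted Rand Index, which compares two labelings of the same points and is invariant to how cluster ids are numbered. The sketch below is an independent illustration of the standard formula, not code from this crate's test suite. The function names are made up for the example, and treating noise (-1) as an ordinary label is an assumption.

```rust
use std::collections::HashMap;

/// Number of unordered pairs among `k` items (k choose 2).
fn pairs(k: u64) -> f64 {
    (k * k.saturating_sub(1)) as f64 / 2.0
}

/// Adjusted Rand Index between two labelings of the same points.
/// Noise (-1) is treated as just another label (an assumption of this sketch).
fn adjusted_rand_index(a: &[i32], b: &[i32]) -> f64 {
    assert_eq!(a.len(), b.len(), "labelings must cover the same points");
    let n = a.len() as u64;
    if n < 2 {
        return 1.0; // no pairs to disagree on
    }

    // Contingency table plus its row and column sums.
    let mut cells: HashMap<(i32, i32), u64> = HashMap::new();
    let mut rows: HashMap<i32, u64> = HashMap::new();
    let mut cols: HashMap<i32, u64> = HashMap::new();
    for (&x, &y) in a.iter().zip(b) {
        *cells.entry((x, y)).or_insert(0) += 1;
        *rows.entry(x).or_insert(0) += 1;
        *cols.entry(y).or_insert(0) += 1;
    }

    let sum_cells: f64 = cells.values().map(|&c| pairs(c)).sum();
    let sum_rows: f64 = rows.values().map(|&c| pairs(c)).sum();
    let sum_cols: f64 = cols.values().map(|&c| pairs(c)).sum();

    let expected = sum_rows * sum_cols / pairs(n);
    let max_index = 0.5 * (sum_rows + sum_cols);
    if (max_index - expected).abs() < f64::EPSILON {
        return 1.0; // degenerate case, e.g. both labelings form a single cluster
    }
    (sum_cells - expected) / (max_index - expected)
}
```

Two labelings that differ only in cluster numbering score 1.0, and values near 0 indicate agreement no better than chance.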

Performance

Measured on 2D blobs (5 clusters, min_cluster_size=10), single thread, best-of-N:

n Time
500 1.8 ms
1,000 5.0 ms
2,000 7.9 ms
5,000 24.1 ms
10,000 54.5 ms
20,000 114.4 ms
50,000 290.2 ms
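For readers unfamiliar with the term, "best-of-N" means running the same workload several times and reporting the fastest run. The helper below is a minimal sketch of that idea using only the standard library. It is not the project's benchmark harness (BENCHMARKS.md describes that), and the name `best_of_n` is invented for this example.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `n` times and returns the shortest wall-clock duration observed.
fn best_of_n<F: FnMut()>(n: usize, mut work: F) -> Duration {
    assert!(n > 0, "need at least one run");
    (0..n)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .min()
        .expect("n > 0 guarantees at least one sample")
}
```

When timing a real clustering call, passing its result through `std::hint::black_box` inside the closure helps keep the optimizer from discarding the work.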

The MST algorithm is selected automatically: dual-tree Boruvka for Euclidean data with n >= 128, Prim's otherwise. See BENCHMARKS.md for methodology and detailed results.
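The selection rule stated above, together with the Prim's fallback, can be sketched as follows. This is an illustration only: `MstAlgorithm`, `select_mst`, and `prim_mst` are hypothetical names rather than the crate's API, and the Prim's variant shown is the textbook dense O(n²) version over a full weight matrix, which may differ from what the crate actually does.

```rust
/// Hypothetical enum for illustration; not the crate's actual type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum MstAlgorithm {
    DualTreeBoruvka,
    Prim,
}

/// Mirrors the documented rule: Boruvka for Euclidean data with n >= 128.
fn select_mst(is_euclidean: bool, n: usize) -> MstAlgorithm {
    if is_euclidean && n >= 128 {
        MstAlgorithm::DualTreeBoruvka
    } else {
        MstAlgorithm::Prim
    }
}

/// Dense Prim's algorithm over a symmetric n x n weight matrix.
/// Returns the n - 1 tree edges as (from, to, weight).
fn prim_mst(weights: &[Vec<f64>]) -> Vec<(usize, usize, f64)> {
    let n = weights.len();
    let mut edges = Vec::with_capacity(n.saturating_sub(1));
    if n == 0 {
        return edges;
    }
    let mut in_tree = vec![false; n];
    let mut best = vec![f64::INFINITY; n]; // cheapest known edge into the tree
    let mut parent = vec![0usize; n]; // tree endpoint of that edge
    in_tree[0] = true;
    for v in 1..n {
        best[v] = weights[0][v];
    }
    for _ in 1..n {
        // Pick the cheapest vertex that is not yet in the tree.
        let mut next = usize::MAX;
        for v in 0..n {
            if !in_tree[v] && (next == usize::MAX || best[v] < best[next]) {
                next = v;
            }
        }
        in_tree[next] = true;
        edges.push((parent[next], next, best[next]));
        // Relax edges from the newly added vertex.
        for v in 0..n {
            if !in_tree[v] && weights[next][v] < best[v] {
                best[v] = weights[next][v];
                parent[v] = next;
            }
        }
    }
    edges
}
```

In HDBSCAN the weights fed to the MST step are mutual reachability distances rather than raw distances.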

Parameters

Parameter                  Default                    Description
min_cluster_size           5                          Smallest group that counts as a cluster
min_samples                None (= min_cluster_size)  Controls density estimate; higher = more conservative
metric                     Euclidean                  Distance metric
alpha                      1.0                        Mutual reachability scaling factor
cluster_selection_epsilon  0.0                        Merge clusters below this distance threshold
cluster_selection_method   Eom                        Eom (Excess of Mass) or Leaf
allow_single_cluster       false                      Permit the entire dataset to form one cluster
store_centers              None                       Compute Centroid, Medoid, or Both
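To make min_samples and alpha more concrete, here is a sketch of the mutual reachability distance that HDBSCAN builds its spanning tree on: the maximum of the two points' core distances and their (scaled) raw distance. It rests on several assumptions: alpha divides the raw distance, as in the reference Python implementation; a point counts as its own first neighbour when finding the k-th nearest; and core distances are taken from unscaled distances. How this crate handles those details is not stated here, and the function names are invented for the example.

```rust
/// Core distance of each point: distance to its k-th nearest neighbour,
/// counting the point itself as the first neighbour (an assumption of this sketch).
fn core_distances(dist: &[Vec<f64>], k: usize) -> Vec<f64> {
    dist.iter()
        .map(|row| {
            let mut sorted = row.clone();
            sorted.sort_by(|a, b| a.total_cmp(b));
            // Clamp the index so small datasets do not go out of bounds.
            sorted[k.saturating_sub(1).min(sorted.len() - 1)]
        })
        .collect()
}

/// Mutual reachability: max(core(i), core(j), d(i, j) / alpha).
/// `dist` is a square, symmetric matrix of raw pairwise distances.
fn mutual_reachability(dist: &[Vec<f64>], k: usize, alpha: f64) -> Vec<Vec<f64>> {
    let core = core_distances(dist, k);
    let n = dist.len();
    (0..n)
        .map(|i| {
            (0..n)
                .map(|j| core[i].max(core[j]).max(dist[i][j] / alpha))
                .collect()
        })
        .collect()
}
```

A larger k (min_samples) raises core distances, which pushes sparse points further from everything else and makes the clustering more conservative.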

Richer output

After calling fit or fit_predict, you can access:

hdbscan.labels()         // Option<&[i32]>: cluster labels (-1 = noise)
hdbscan.probabilities()  // Option<&[f64]>: membership strength in [0, 1]
hdbscan.outlier_scores() // Option<&[f64]>: GLOSH outlier scores in [0, 1]
hdbscan.condensed_tree() // Option<&[CondensedTreeEdge]>
hdbscan.centroids()      // Option<&Array2<f64>>: if store_centers was set
hdbscan.medoids()        // Option<&Array2<f64>>: if store_centers was set
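A common next step is to turn the flat label slice into per-cluster index lists, with noise points kept apart. The helper below is a small sketch that works on any `&[i32]` slice. `group_by_label` is a name invented for this example and is not part of the crate.

```rust
use std::collections::BTreeMap;

/// Groups point indices by cluster label; noise points (label -1) are
/// returned separately in the second element of the tuple.
fn group_by_label(labels: &[i32]) -> (BTreeMap<i32, Vec<usize>>, Vec<usize>) {
    let mut clusters: BTreeMap<i32, Vec<usize>> = BTreeMap::new();
    let mut noise = Vec::new();
    for (idx, &label) in labels.iter().enumerate() {
        if label < 0 {
            noise.push(idx);
        } else {
            clusters.entry(label).or_default().push(idx);
        }
    }
    (clusters, noise)
}
```

Given the `Option<&[i32]>` return type listed above, a fitted model's labels could be passed in as `group_by_label(hdbscan.labels().unwrap())`.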

Precomputed distances

If you already have a distance matrix:

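use hdbscan_rs::{Hdbscan, HdbscanParams, Metric};

// `dist_matrix` is assumed to be a square (n x n) matrix of pairwise
// distances that you have already built, for example an ndarray::Array2<f64>.
let params = HdbscanParams {
    min_cluster_size: 3,
    metric: Metric::Precomputed,
    ..Default::default()
};
let mut hdbscan = Hdbscan::new(params);
let labels = hdbscan.fit_predict(&dist_matrix.view()).unwrap();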
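A precomputed matrix is mainly useful for metrics that are not among the built-in ones. As a sketch, the code below fills a symmetric, row-major n x n buffer using the Chebyshev distance, chosen here only as an example of a custom metric. Both function names are invented for illustration and use only the standard library.

```rust
/// Chebyshev (L-infinity) distance: the largest per-coordinate difference.
fn chebyshev(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}

/// Builds a symmetric n x n distance matrix in row-major order.
/// The diagonal stays at zero; each pair is computed once and mirrored.
fn pairwise_distances(points: &[Vec<f64>], metric: fn(&[f64], &[f64]) -> f64) -> Vec<f64> {
    let n = points.len();
    let mut flat = vec![0.0; n * n];
    for i in 0..n {
        for j in (i + 1)..n {
            let d = metric(&points[i], &points[j]);
            flat[i * n + j] = d;
            flat[j * n + i] = d;
        }
    }
    flat
}
```

The flat buffer can be converted into an ndarray matrix with `Array2::from_shape_vec((n, n), flat)`, assuming the precomputed path accepts the same array type as the quick-start example.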

Testing

The test suite validates against scikit-learn fixtures (blobs, moons, circles, varying density, duplicates, precomputed matrices) and includes property-based invariant tests.

# Run the 71 standard tests:
cargo test
# Run the 2 optional large-scale tests (100K and 1M points):
cargo test -- --ignored

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.

Download files

Source distribution:

  • hdbscan_rs-0.1.1.tar.gz (587.7 kB)

Built distributions:

  • hdbscan_rs-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (416.7 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • hdbscan_rs-0.1.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (400.2 kB): CPython 3.13, manylinux (glibc 2.17+), ARM64
  • hdbscan_rs-0.1.1-cp312-cp312-win_amd64.whl (259.2 kB): CPython 3.12, Windows x86-64
  • hdbscan_rs-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (416.7 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • hdbscan_rs-0.1.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (400.1 kB): CPython 3.12, manylinux (glibc 2.17+), ARM64
  • hdbscan_rs-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (360.0 kB): CPython 3.12, macOS 11.0+ ARM64
  • hdbscan_rs-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl (372.7 kB): CPython 3.12, macOS 10.12+ x86-64
