High-performance HDBSCAN clustering, compatible with scikit-learn

hdbscan-rs

High-performance HDBSCAN clustering for Python, powered by a Rust core. Drop-in compatible with scikit-learn's API and faster in every benchmark configuration listed under Performance, with the largest measured speedup (103x over sklearn) on the 50K-point dataset.

Installation

pip install hdbscan-rs

Requires Python >= 3.12 and NumPy >= 1.20. Pre-built wheels are available for Linux, macOS, and Windows.

Quick start

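import numpy as np
from hdbscan_rs import HDBSCAN

# 10,000 random 2D points drawn from a standard normal distribution
data = np.random.randn(10000, 2)

clusterer = HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(data)  # one label per point; -1 marks noise

# Cluster labels start at 0, so the cluster count is the largest label plus one
print(f"Found {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")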

API

HDBSCAN(
    min_cluster_size=5,       # Smallest group that counts as a cluster
    min_samples=None,         # Controls density estimate (default: min_cluster_size)
    metric="euclidean",       # "euclidean", "manhattan", "cosine", "minkowski", "precomputed"
    p=None,                   # Minkowski p parameter
    alpha=1.0,                # Mutual reachability scaling factor
    cluster_selection_epsilon=0.0,  # Merge clusters below this distance
    cluster_selection_method="eom", # "eom" (Excess of Mass) or "leaf"
    allow_single_cluster=False,
)
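
The min_samples and alpha comments above refer to the mutual reachability distance that HDBSCAN builds its hierarchy on. The sketch below uses the textbook formulation (core distance = distance to the min_samples-th nearest neighbor, pairwise distance divided by alpha) in pure Python. It is an assumption that hdbscan-rs follows exactly this convention (for example, whether a point counts as its own neighbor); the helper names core_distances and mutual_reachability are hypothetical and not part of the package.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbor.

    Assumption: the point itself is excluded from its neighbor list.
    """
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points, min_samples, alpha=1.0):
    """Pairwise max(core(a), core(b), d(a, b) / alpha), with a zero diagonal."""
    cores = core_distances(points, min_samples)
    n = len(points)
    mr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scaled = math.dist(points[i], points[j]) / alpha
                mr[i][j] = max(cores[i], cores[j], scaled)
    return mr


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
mr = mutual_reachability(points, min_samples=2)

print(round(mr[0][1], 3))  # 1.414: the core distance of point 1 dominates
print(round(mr[0][3], 3))  # 14.142: the raw distance to the outlier dominates
```

Raising min_samples inflates core distances, which pushes sparse points further from everything else and tends to label more of them as noise.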

Methods

  • fit_predict(X) -- Fit and return cluster labels (numpy array, -1 = noise)
  • fit(X) -- Fit the model without returning labels
  • approximate_predict(X) -- Predict labels for new points (returns labels, probabilities)

Properties (after fitting)

  • labels_ -- Cluster labels (-1 = noise)
  • probabilities_ -- Membership strength in [0, 1]
  • outlier_scores_ -- GLOSH outlier scores in [0, 1]
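
As a small illustration of the -1 = noise convention, the following sketch summarizes a label array without depending on the package itself. The function summarize_labels is a hypothetical helper, not part of hdbscan-rs; it should accept any iterable of integer labels, such as a fitted clusterer's labels_.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters, noise points, and per-cluster sizes from integer labels."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # label -1 marks noise, not a cluster
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(sorted(counts.items())),
    }


summary = summarize_labels([0, 0, 1, -1, 1, 1, -1])
print(summary)  # {'n_clusters': 2, 'n_noise': 2, 'sizes': {0: 2, 1: 3}}
```

Counting distinct non-negative labels is slightly more robust than labels.max() + 1 if labels are ever filtered or remapped.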

Performance

Best-of-3 wall time on a 4-core AMD EPYC. Data is generated with make_blobs (5 centers) and clustered with min_cluster_size=10. Numbers are from the native Rust core; the Python binding adds less than 5 ms of overhead for data conversion.

Config       sklearn HDBSCAN   hdbscan (C)   hdbscan-rs   vs sklearn   vs C
1Kx2D        8.9 ms            12.7 ms       2.6 ms       3.4x         4.9x
5Kx2D        128 ms            80.2 ms       10.6 ms      12.1x        7.6x
10Kx2D       455 ms            189 ms        18.4 ms      24.7x        10.3x
50Kx2D       12,812 ms         1,024 ms      124 ms       103x         8.2x
5Kx10D       241 ms            136 ms        62 ms        3.9x         2.2x
1Kx256D      246 ms            230 ms        19 ms        12.6x        11.8x
500x1536D    424 ms            444 ms        28 ms        14.9x        15.7x

Memory usage is 5-60x lower than Python-based implementations.
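
To reproduce a best-of-3 wall-time measurement on your own data, a harness along these lines may be enough. This is a minimal sketch, not the project's benchmark script (which is not shown here); best_of and speedup are hypothetical helper names, and the workload is a stand-in for a call such as clusterer.fit_predict(data).

```python
import time


def best_of(fn, repeats=3):
    """Return the minimum wall time in milliseconds over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def workload():
    # Stand-in for the clustering call being timed
    return sum(i * i for i in range(100_000))


print(f"best of 3: {best_of(workload):.2f} ms")
print(f"10Kx2D vs sklearn: {speedup(455, 18.4):.1f}x")  # 24.7x, as in the table
```

Taking the minimum rather than the mean reduces the influence of one-off interference from other processes.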

Migrating from sklearn

# Before
from sklearn.cluster import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

# After
from hdbscan_rs import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

The API matches sklearn's interface. Input should be a 2D NumPy array of float64. Results are sklearn-compatible: the adjusted Rand index (ARI) against sklearn's labels exceeds 0.99 across the test suite.
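
When migrating, you may want to check agreement on your own data. The sketch below computes the adjusted Rand index from its standard pair-counting definition in pure Python; it treats noise (-1) as just another label, which is an assumption about how the project's test suite scores it. In practice sklearn.metrics.adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree about

    # Pairs of points grouped together in both, in A only, and in B only
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_both - expected) / (max_index - expected)


# Same partition with cluster ids swapped: ARI ignores label names
print(adjusted_rand_index([0, 0, 1, 1, -1], [1, 1, 0, 0, -1]))  # 1.0
```

Because ARI is invariant to label permutation, it is a fair comparison even when two implementations number their clusters differently.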

Precomputed distances

from hdbscan_rs import HDBSCAN
import numpy as np

# Compute your own distance matrix
dist_matrix = np.array([[0, 1, 5], [1, 0, 3], [5, 3, 0]], dtype=np.float64)

clusterer = HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist_matrix)
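
For anything larger than a hand-written matrix, the distances have to be computed and are worth sanity-checking first. The following is a minimal pure-Python sketch; pairwise_distances and check_distance_matrix are hypothetical helpers, and whether hdbscan-rs performs such validation itself is not stated here. The resulting nested list would still need converting with np.array(..., dtype=np.float64) before being passed to fit_predict.

```python
import math


def pairwise_distances(points):
    """Symmetric Euclidean distance matrix as a list of lists."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def check_distance_matrix(matrix, tol=1e-12):
    """Raise ValueError unless the matrix is square, symmetric, non-negative,
    and has a zero diagonal."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        if abs(row[i]) > tol:
            raise ValueError("diagonal must be zero")
        for j, value in enumerate(row):
            if value < 0:
                raise ValueError("distances must be non-negative")
            if abs(value - matrix[j][i]) > tol:
                raise ValueError("matrix must be symmetric")


matrix = pairwise_distances([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
check_distance_matrix(matrix)
print(matrix[0])  # [0.0, 5.0, 10.0]
```

Note that a precomputed matrix needs memory quadratic in the number of points, so this route suits small datasets or custom metrics.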

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.

Download files

Source Distribution

  • hdbscan_rs-0.4.0.tar.gz (616.1 kB) -- Source

Built Distributions

  • hdbscan_rs-0.4.0-cp313-cp313-win_amd64.whl (295.4 kB) -- CPython 3.13, Windows x86-64
  • hdbscan_rs-0.4.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (386.0 kB) -- CPython 3.13, manylinux: glibc 2.17+ x86-64
  • hdbscan_rs-0.4.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (351.6 kB) -- CPython 3.13, manylinux: glibc 2.17+ ARM64
  • hdbscan_rs-0.4.0-cp313-cp313-macosx_11_0_arm64.whl (333.3 kB) -- CPython 3.13, macOS 11.0+ ARM64
  • hdbscan_rs-0.4.0-cp313-cp313-macosx_10_12_x86_64.whl (352.5 kB) -- CPython 3.13, macOS 10.12+ x86-64
  • hdbscan_rs-0.4.0-cp312-cp312-win_amd64.whl (295.8 kB) -- CPython 3.12, Windows x86-64
  • hdbscan_rs-0.4.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (386.5 kB) -- CPython 3.12, manylinux: glibc 2.17+ x86-64
  • hdbscan_rs-0.4.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (352.0 kB) -- CPython 3.12, manylinux: glibc 2.17+ ARM64
  • hdbscan_rs-0.4.0-cp312-cp312-macosx_11_0_arm64.whl (333.8 kB) -- CPython 3.12, macOS 11.0+ ARM64
  • hdbscan_rs-0.4.0-cp312-cp312-macosx_10_12_x86_64.whl (352.8 kB) -- CPython 3.12, macOS 10.12+ x86-64

