High-performance HDBSCAN clustering, compatible with scikit-learn

hdbscan-rs

High-performance HDBSCAN clustering for Python, powered by a Rust core. Drop-in compatible with scikit-learn's API, and 4.9x to 126x faster than scikit-learn in the benchmarks below, which cover datasets from 500 to 50,000 points.

Installation

pip install hdbscan-rs

Requires Python >= 3.11 and NumPy >= 1.20. Pre-built wheels are available for CPython 3.11 to 3.13 on Linux (x86-64, ARM64), macOS (x86-64, ARM64), and Windows (x86-64).

Quick start

import numpy as np
from hdbscan_rs import HDBSCAN

data = np.random.randn(10000, 2)

clusterer = HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(data)

print(f"Found {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")

Migrating from other HDBSCAN packages

From hdbscan (standalone)

- import hdbscan
- clusterer = hdbscan.HDBSCAN(min_cluster_size=15, min_samples=2, metric='euclidean')
+ from hdbscan_rs import HDBSCAN
+ clusterer = HDBSCAN(min_cluster_size=15, min_samples=2, metric='euclidean')
  labels = clusterer.fit_predict(data)

From sklearn.cluster.HDBSCAN

- from sklearn.cluster import HDBSCAN
+ from hdbscan_rs import HDBSCAN
  clusterer = HDBSCAN(min_cluster_size=15)
  labels = clusterer.fit_predict(data)

From BERTopic

from hdbscan_rs import HDBSCAN
from bertopic import BERTopic

topic_model = BERTopic(hdbscan_model=HDBSCAN(min_cluster_size=15))

No other code changes needed. Labels, probabilities, and outlier scores are all compatible.
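
When checking a migration, it can help to compare the old and new label arrays directly. The sketch below assumes that cluster ids may be numbered differently between implementations (the text above only says the outputs are compatible), so it compares the two labelings up to a renaming of cluster ids. `same_partition` is a hypothetical helper written for this illustration, not part of hdbscan-rs.

```python
import numpy as np


def same_partition(labels_a, labels_b):
    """Return True if two label arrays describe the same clustering.

    Cluster ids may be permuted between the two arrays, but noise (-1)
    must fall on exactly the same points.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        return False
    # Noise points must coincide exactly.
    if not np.array_equal(a == -1, b == -1):
        return False
    # Collect the distinct (id_a, id_b) pairs that occur together.
    pairs = np.unique(np.stack([a, b], axis=1), axis=0)
    # The mapping is one-to-one only if no id repeats on either side.
    return (len(np.unique(pairs[:, 0])) == len(pairs)
            and len(np.unique(pairs[:, 1])) == len(pairs))


# Stand-in arrays; in practice these would come from fit_predict() of each package.
old_labels = np.array([0, 0, 1, 1, -1])
new_labels = np.array([1, 1, 0, 0, -1])
print(same_partition(old_labels, new_labels))
```

The helper is strict: a single point that moves between clusters or into noise makes it return False. For a softer comparison, a score such as the adjusted Rand index may be more appropriate.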

API

HDBSCAN(
    min_cluster_size=5,       # Smallest group that counts as a cluster
    min_samples=None,         # Controls density estimate (default: min_cluster_size)
    metric="euclidean",       # "euclidean", "manhattan", "cosine", "minkowski", "precomputed"
    p=None,                   # Minkowski p parameter
    alpha=1.0,                # Mutual reachability scaling factor
    cluster_selection_epsilon=0.0,  # Merge clusters below this distance
    cluster_selection_method="eom", # "eom" (Excess of Mass) or "leaf"
    allow_single_cluster=False,
)

Methods

  • fit_predict(X) -- Fit and return cluster labels (numpy array, -1 = noise)
  • fit(X) -- Fit the model without returning labels
  • approximate_predict(X) -- Predict labels for new points (returns labels, probabilities)

Properties (after fitting)

  • labels_ -- Cluster labels (-1 = noise)
  • probabilities_ -- Membership strength [0, 1]
  • outlier_scores_ -- GLOSH outlier scores [0, 1]
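
As a rough illustration of how these arrays are typically consumed, the sketch below counts clusters and noise points and applies a membership threshold. It uses hand-written stand-in arrays so that it runs without a fitted model; `summarize_clustering` and the 0.5 threshold are assumptions made for this example, not part of the hdbscan-rs API.

```python
import numpy as np


def summarize_clustering(labels, probabilities, min_probability=0.5):
    """Summarize a clustering given its labels and membership strengths."""
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities)
    # Cluster ids are the non-negative labels; -1 marks noise.
    cluster_ids = np.unique(labels[labels >= 0])
    # "Confident" points: assigned to a cluster with enough membership strength.
    confident = (labels >= 0) & (probabilities >= min_probability)
    return {
        "n_clusters": int(cluster_ids.size),
        "n_noise": int((labels == -1).sum()),
        "n_confident": int(confident.sum()),
    }


# Stand-ins for clusterer.labels_ and clusterer.probabilities_ after fitting.
labels = np.array([0, 0, 1, -1, 1])
probabilities = np.array([1.0, 0.4, 0.9, 0.0, 0.7])
print(summarize_clustering(labels, probabilities))
```

After fitting, the same function could be called with `clusterer.labels_` and `clusterer.probabilities_`.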

Performance

Best-of-3 wall time on a 4-core AMD EPYC. Data is make_blobs with 5 centers, min_cluster_size=10. Numbers are from the native Rust core; the Python binding adds <5ms overhead for data conversion.

Config      sklearn     C-hdbscan   fast-hdbscan  hdbscan-rs  vs sklearn  vs fast-hdbscan
1Kx2D       12.0 ms     12.7 ms     3.8 ms        2.2 ms      5.4x        1.7x
5Kx2D       121 ms      76.0 ms     20.6 ms       9.2 ms      13.1x       2.2x
10Kx2D      445 ms      181 ms      45.1 ms       17.8 ms     25.0x       2.5x
50Kx2D      12,757 ms   1,011 ms    302 ms        101 ms      126x        3.0x
5Kx10D      240 ms      133 ms      70.5 ms       49 ms       4.9x        1.4x
1Kx256D     235 ms      230 ms      65.4 ms       19 ms       12.1x       3.4x
500x1536D   412 ms      439 ms      80.7 ms       27 ms       15.1x       3.0x

Memory: 3-41 MB (Rust) vs 120-150 MB (sklearn) vs 121-169 MB (C-hdbscan) vs 457-470 MB (fast-hdbscan + Numba JIT).
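
For readers who want to take a comparable measurement on their own hardware, here is a minimal best-of-N wall-time harness. It is a sketch of the general technique, not the script used to produce the numbers above; `best_of` and `workload` are hypothetical names, and the workload is a placeholder so the example runs without any clustering package installed.

```python
import time


def best_of(fn, repeats=3):
    """Run fn() `repeats` times and return the fastest wall time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


def workload():
    # Placeholder; replace with the call to measure, for example
    # HDBSCAN(min_cluster_size=10).fit_predict(data).
    sum(i * i for i in range(100_000))


print(f"best of 3: {best_of(workload):.2f} ms")
```

Taking the minimum rather than the mean reduces the influence of one-off slowdowns such as cache warm-up or background load.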

Precomputed distances

from hdbscan_rs import HDBSCAN
import numpy as np

# Compute your own distance matrix
dist_matrix = np.array([[0, 1, 5], [1, 0, 3], [5, 3, 0]], dtype=np.float64)

clusterer = HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist_matrix)
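
A precomputed matrix is mainly useful for metrics outside the built-in list. As an example, the sketch below builds a Chebyshev (maximum coordinate difference) distance matrix with NumPy broadcasting. `pairwise_chebyshev` is a hypothetical helper for illustration; it allocates an n x n x d intermediate array, so it only suits modest dataset sizes.

```python
import numpy as np


def pairwise_chebyshev(X):
    """Return the (n, n) matrix of Chebyshev distances between rows of X."""
    X = np.asarray(X, dtype=np.float64)
    # Broadcast to shape (n, n, d): per-coordinate differences for every pair of rows.
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    # Chebyshev distance is the largest per-coordinate difference.
    return diffs.max(axis=-1)


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 1.0]])
dist_matrix = pairwise_chebyshev(points)
print(dist_matrix)

# dist_matrix is square, symmetric, and has a zero diagonal, so it has the same
# form as the matrix passed to HDBSCAN(metric="precomputed") above.
```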

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.
