High-performance HDBSCAN clustering, compatible with scikit-learn

hdbscan-rs

High-performance HDBSCAN clustering for Python, powered by a Rust core. Drop-in compatible with scikit-learn's API, but significantly faster on both small and large datasets (see Performance).

Installation

pip install hdbscan-rs

Requires Python >= 3.11 and NumPy >= 1.20. Pre-built wheels are available for Linux, macOS, and Windows.

Quick start

import numpy as np
from hdbscan_rs import HDBSCAN

data = np.random.randn(10000, 2)

clusterer = HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(data)

print(f"Found {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")

Migrating from other HDBSCAN packages

From hdbscan (standalone)

- import hdbscan
- clusterer = hdbscan.HDBSCAN(min_cluster_size=15, min_samples=2, metric='euclidean')
+ from hdbscan_rs import HDBSCAN
+ clusterer = HDBSCAN(min_cluster_size=15, min_samples=2, metric='euclidean')
  labels = clusterer.fit_predict(data)

From sklearn.cluster.HDBSCAN

- from sklearn.cluster import HDBSCAN
+ from hdbscan_rs import HDBSCAN
  clusterer = HDBSCAN(min_cluster_size=15)
  labels = clusterer.fit_predict(data)

From BERTopic

from hdbscan_rs import HDBSCAN
from bertopic import BERTopic

topic_model = BERTopic(hdbscan_model=HDBSCAN(min_cluster_size=15))

No other code changes needed. Labels, probabilities, and outlier scores are all compatible.

API

HDBSCAN(
    min_cluster_size=5,       # Smallest group that counts as a cluster
    min_samples=None,         # Controls density estimate (default: min_cluster_size)
    metric="euclidean",       # "euclidean", "manhattan", "cosine", "minkowski", "precomputed"
    p=None,                   # Minkowski p parameter
    alpha=1.0,                # Mutual reachability scaling factor
    cluster_selection_epsilon=0.0,  # Merge clusters below this distance
    cluster_selection_method="eom", # "eom" (Excess of Mass) or "leaf"
    allow_single_cluster=False,
)
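
To make the roles of min_samples and alpha more concrete, here is a minimal pure-Python sketch of the mutual reachability distance from the standard HDBSCAN formulation. It is an illustration only, not hdbscan-rs's implementation. The helper names (core_distances, mutual_reachability) are hypothetical, the k-th neighbor here excludes the point itself (conventions differ between implementations), and dividing the raw distance by alpha follows the reference hdbscan package, which hdbscan-rs may or may not mirror exactly.

```python
import math

def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest *other* point.

    Assumption: the point itself is not counted as a neighbor.
    """
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[min_samples - 1])
    return cores

def mutual_reachability(points, min_samples, alpha=1.0):
    """Pairwise max(core(a), core(b), d(a, b) / alpha); zero on the diagonal.

    Assumption: alpha divides the raw distance, as in the reference hdbscan package.
    """
    cores = core_distances(points, min_samples)
    n = len(points)
    mreach = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j]) / alpha
            mreach[i][j] = mreach[j][i] = max(cores[i], cores[j], d)
    return mreach

# Three nearby points and one far-away point
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
mreach = mutual_reachability(points, min_samples=2)
```

In this toy example the first two points are 1.0 apart, but their mutual reachability distance is larger because the second point's core distance (to its second-nearest neighbor) dominates. Raising min_samples inflates core distances in the same way, which pushes sparse points further from everything else.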

Methods

  • fit_predict(X) -- Fit and return cluster labels (numpy array, -1 = noise)
  • fit(X) -- Fit the model without returning labels
  • approximate_predict(X) -- Predict labels for new points (returns labels, probabilities)

Properties (after fitting)

  • labels_ -- Cluster labels (-1 = noise)
  • probabilities_ -- Membership strength in [0, 1]
  • outlier_scores_ -- GLOSH outlier scores in [0, 1]
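
Since noise is encoded as -1 in labels_, it can help to summarize a fitted result before using it. The sketch below uses a hypothetical helper, summarize_labels, and a hand-written list standing in for clusterer.labels_; it relies only on the -1 = noise convention described above and should work on any iterable of integer labels.

```python
from collections import Counter

def summarize_labels(labels):
    """Return (n_clusters, n_noise, sizes) for an iterable of integer labels."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)  # -1 marks noise points
    return len(counts), n_noise, dict(sorted(counts.items()))

# Stand-in for clusterer.labels_ after fitting
example_labels = [0, 0, 1, -1, 1, 1, -1, 2]
n_clusters, n_noise, sizes = summarize_labels(example_labels)
print(f"{n_clusters} clusters, {n_noise} noise points, sizes={sizes}")
# prints: 3 clusters, 2 noise points, sizes={0: 2, 1: 3, 2: 1}
```

Counting distinct non-negative labels gives the same answer as labels.max() + 1 whenever the labels are contiguous, and also returns per-cluster sizes.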

Performance

Best-of-3 wall-clock time on a 4-core AMD EPYC. Data is generated with make_blobs (5 centers) and clustered with min_cluster_size=10. Numbers are from the native Rust core; the Python binding adds less than 5 ms of overhead for data conversion.

Config (samples x dims)   sklearn      C-hdbscan    fast-hdbscan   hdbscan-rs   vs sklearn   vs fast-hdbscan
1Kx2D                     8.9 ms       12.7 ms      3.7 ms         2.6 ms       3.4x         1.4x
5Kx2D                     128 ms       80.2 ms      24.5 ms        10.6 ms      12.1x        2.3x
10Kx2D                    455 ms       189 ms       43.3 ms        18.4 ms      24.7x        2.4x
50Kx2D                    12,812 ms    1,024 ms     293 ms         124 ms       103x         2.4x
5Kx10D                    241 ms       136 ms       72.7 ms        62 ms        3.9x         1.2x
1Kx256D                   246 ms       230 ms       49 ms          19 ms        12.6x        2.6x
500x1536D                 424 ms       444 ms       87.7 ms        28 ms        14.9x        3.1x

Memory: 3-56 MB (Rust) vs 128-178 MB (sklearn/C) vs 468-486 MB (fast-hdbscan + Numba JIT).
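
For readers who want to run a similar comparison from Python, the following is a rough sketch of a best-of-N timing harness and of how the speedup columns are derived. It is not the benchmark script behind the table (those numbers come from the native Rust core); best_of and speedup are hypothetical helpers, and the workload is a placeholder so the snippet runs without any clustering library installed.

```python
import time

def best_of(fn, repeats=3):
    """Run fn() `repeats` times and return the fastest wall time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

# Placeholder workload; in practice this might be something like
# lambda: HDBSCAN(min_cluster_size=10).fit_predict(data)
elapsed_ms = best_of(lambda: sum(i * i for i in range(100_000)))

# Recomputing the 10Kx2D row: 455 ms (sklearn) vs 18.4 ms (hdbscan-rs)
ratio = round(speedup(455, 18.4), 1)
print(f"{ratio}x")  # prints: 24.7x
```

Taking the minimum over a few runs reduces the influence of one-off noise such as cache warm-up or scheduler jitter, which matters most for the millisecond-scale rows.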

Precomputed distances

from hdbscan_rs import HDBSCAN
import numpy as np

# Compute your own distance matrix
dist_matrix = np.array([[0, 1, 5], [1, 0, 3], [5, 3, 0]], dtype=np.float64)

clusterer = HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist_matrix)
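
When building a precomputed matrix from your own data, a quick sanity check can catch shape or symmetry mistakes early. The sketch below is one possible approach using only the standard library; pairwise_distances and check_distance_matrix are hypothetical helpers, and whether hdbscan-rs performs similar validation itself is not stated here.

```python
import math

def pairwise_distances(points):
    """Full symmetric Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def check_distance_matrix(matrix, tol=1e-12):
    """Raise ValueError unless the matrix is square, symmetric,
    non-negative, and zero on the diagonal."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("matrix must be square")
        if abs(row[i]) > tol:
            raise ValueError("diagonal must be zero")
        for j, value in enumerate(row):
            if value < 0 or abs(value - matrix[j][i]) > tol:
                raise ValueError("entries must be non-negative and symmetric")

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist_matrix = pairwise_distances(points)
check_distance_matrix(dist_matrix)  # passes silently for a valid matrix
```

The resulting list of lists can be converted with np.asarray(dist_matrix, dtype=np.float64) and passed to fit_predict with metric="precomputed", as in the example above.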

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.
