High-performance HDBSCAN clustering, compatible with scikit-learn

Project description

hdbscan-rs

High-performance HDBSCAN clustering for Python, powered by a Rust core. Drop-in compatible with scikit-learn's API, but significantly faster at every benchmarked size, with the largest gains over scikit-learn on larger datasets (see Performance below).

Installation

pip install hdbscan-rs

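Requires Python >= 3.12 and NumPy >= 1.20. Pre-built wheels are available for CPython 3.12 on Linux (x86-64 and ARM64), macOS (x86-64 and ARM64), and Windows (x86-64), and for CPython 3.13 on Linux (x86-64 and ARM64). Other platform and Python combinations install from the source distribution, which needs a Rust toolchain to build.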

Quick start

import numpy as np
from hdbscan_rs import HDBSCAN

data = np.random.randn(10000, 2)

clusterer = HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(data)

print(f"Found {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")

API

HDBSCAN(
    min_cluster_size=5,       # Smallest group that counts as a cluster
    min_samples=None,         # Controls density estimate (default: min_cluster_size)
    metric="euclidean",       # "euclidean", "manhattan", "cosine", "minkowski", "precomputed"
    p=None,                   # Minkowski p parameter
    alpha=1.0,                # Mutual reachability scaling factor
    cluster_selection_epsilon=0.0,  # Merge clusters below this distance
    cluster_selection_method="eom", # "eom" (Excess of Mass) or "leaf"
    allow_single_cluster=False,     # Allow the result to be a single cluster
)
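To make `min_samples` and `alpha` concrete, the NumPy sketch below computes the mutual reachability distance that HDBSCAN builds its cluster hierarchy from. It uses one common formulation (the one in the standalone `hdbscan` package): d_mreach(a, b) = max(core(a), core(b), d(a, b) / alpha), where core(x) is the distance from x to its `min_samples`-th nearest neighbor. The function name `mutual_reachability` and the convention that a point counts as its own first neighbor are assumptions for illustration; hdbscan-rs's internals are not documented here and may differ.

```python
import numpy as np


def mutual_reachability(X, min_samples=5, alpha=1.0):
    """Dense mutual reachability matrix for Euclidean data (illustrative sketch).

    Assumed convention: a point counts as its own first neighbour, so the core
    distance is the distance to the min_samples-th closest point *including*
    the point itself (min_samples=1 gives a core distance of 0).
    """
    X = np.asarray(X, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("X must have shape (n_samples, n_features)")
    n = X.shape[0]
    if not 1 <= min_samples <= n:
        raise ValueError("min_samples must be between 1 and n_samples")

    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Core distance: entry (min_samples - 1) of each row once sorted
    # (entry 0 is the point's zero distance to itself).
    core = np.partition(dist, min_samples - 1, axis=1)[:, min_samples - 1]

    # max(core(a), core(b), d(a, b) / alpha) for every pair (a, b).
    pair_core = np.maximum(core[:, None], core[None, :])
    return np.maximum(pair_core, dist / alpha)
```

Raising `min_samples` raises every core distance, which pushes points in sparse regions away from their neighbors and makes them more likely to end up as noise. The dense pairwise matrix costs O(n²) memory, so this sketch only suits small inputs.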

Methods

  • fit_predict(X) -- Fit and return cluster labels (numpy array, -1 = noise)
  • fit(X) -- Fit the model without returning labels; results are then available through the properties below
  • approximate_predict(X) -- Predict labels for new points after fitting (returns labels and membership probabilities)
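Since `approximate_predict` returns a membership probability alongside each label, one common post-processing step is to treat weak assignments as noise. The sketch below assumes both return values are NumPy arrays of equal length; the helper name `reject_weak_assignments` and the 0.5 cutoff are hypothetical choices, not part of hdbscan-rs.

```python
import numpy as np


def reject_weak_assignments(labels, probabilities, min_probability=0.5):
    """Relabel points whose membership probability is below a cutoff as noise (-1).

    Hypothetical helper: `labels` and `probabilities` are assumed to be the two
    arrays returned by approximate_predict (labels_ / probabilities_ work too).
    """
    labels = np.asarray(labels)
    probabilities = np.asarray(probabilities, dtype=np.float64)
    if labels.shape != probabilities.shape:
        raise ValueError("labels and probabilities must have the same shape")
    out = labels.copy()  # never modify the caller's array
    # Points already labelled -1 stay noise; weakly attached points join them.
    out[probabilities < min_probability] = -1
    return out


# Usage (assumes a fitted hdbscan_rs.HDBSCAN named `clusterer`):
# new_labels, new_probs = clusterer.approximate_predict(new_points)
# confident = reject_weak_assignments(new_labels, new_probs, min_probability=0.5)
```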

Properties (after fitting)

  • labels_ -- Cluster labels (-1 = noise)
  • probabilities_ -- Membership strength [0, 1]
  • outlier_scores_ -- GLOSH outlier scores [0, 1]
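After fitting, `labels_` and `outlier_scores_` are usually summarized together. The sketch below counts clusters and noise, and flags the highest-scoring points using a quantile cutoff on the GLOSH scores, a common heuristic rather than anything prescribed by hdbscan-rs. The function name `summarize_fit` and the 0.9 default are illustrative assumptions.

```python
import numpy as np


def summarize_fit(labels, outlier_scores, outlier_quantile=0.9):
    """Summarize a fitted clustering from its labels_ and outlier_scores_ arrays.

    Returns cluster sizes (noise excluded), the noise count, and the indices of
    points whose outlier score is strictly above the given quantile.
    """
    labels = np.asarray(labels)
    scores = np.asarray(outlier_scores, dtype=np.float64)
    if labels.shape != scores.shape:
        raise ValueError("labels and outlier_scores must have the same shape")

    # Cluster sizes, ignoring noise points (label -1).
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    cluster_sizes = {int(i): int(c) for i, c in zip(ids, counts)}

    # Flag the highest-scoring points; the quantile is a tunable heuristic.
    threshold = np.quantile(scores, outlier_quantile)
    outliers = np.flatnonzero(scores > threshold)

    return {
        "n_clusters": len(cluster_sizes),
        "cluster_sizes": cluster_sizes,
        "n_noise": int((labels == -1).sum()),
        "outlier_indices": outliers,
    }


# Usage (assumes hdbscan_rs is installed and `data` is a 2D float64 array):
# clusterer = HDBSCAN(min_cluster_size=15)
# clusterer.fit(data)
# print(summarize_fit(clusterer.labels_, clusterer.outlier_scores_))
```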

Performance

Timings are best-of-3 wall-clock times on a 4-core AMD EPYC machine, using make_blobs data with 5 centers and min_cluster_size=10. They were measured on the native Rust core; the Python binding adds less than 5 ms of data-conversion overhead.

Config (samples x dims)   sklearn HDBSCAN   hdbscan (Cython)   hdbscan-rs   Speedup vs sklearn   Speedup vs hdbscan
1K x 2D                   8.9 ms            12.7 ms            2.6 ms       3.4x                 4.9x
5K x 2D                   128 ms            80.2 ms            10.6 ms      12.1x                7.6x
10K x 2D                  455 ms            189 ms             18.4 ms      24.7x                10.3x
50K x 2D                  12,812 ms         1,024 ms           124 ms       103x                 8.2x
5K x 10D                  241 ms            136 ms             62 ms        3.9x                 2.2x
1K x 256D                 246 ms            230 ms             19 ms        12.6x                11.8x
500 x 1536D               424 ms            444 ms             28 ms        14.9x                15.7x

Memory usage is 5-60x lower than Python-based implementations.
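The best-of-3 methodology described above can be reproduced with a small harness like the one below. It is a sketch, not the project's benchmark script; the name `best_of`, the use of `time.perf_counter`, and the `random_state` in the usage comment are assumptions. Timing from Python also includes the binding's conversion overhead, which the native-core numbers in the table exclude.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest of `repeats` wall-clock runs of fn(), in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)  # keep the least-disturbed run
    return best


# Usage sketch (needs scikit-learn for make_blobs and hdbscan_rs installed):
# from sklearn.datasets import make_blobs
# from hdbscan_rs import HDBSCAN
# X, _ = make_blobs(n_samples=10_000, n_features=2, centers=5, random_state=0)
# ms = best_of(lambda: HDBSCAN(min_cluster_size=10).fit_predict(X))
# print(f"10K x 2D: {ms:.1f} ms")
```

Taking the minimum rather than the mean discards runs slowed by unrelated system activity, which is the usual reason for reporting best-of-N.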

Migrating from sklearn

# Before
from sklearn.cluster import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

# After
from hdbscan_rs import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

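The API mirrors sklearn's interface, so switching usually means changing only the import. Input should be a 2D NumPy array of float64. Results closely match sklearn's: across the project's test suite, the adjusted Rand index (ARI) between the two libraries' labels exceeds 0.99.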
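To check agreement on your own data, compare the labels from both libraries with the adjusted Rand index. `sklearn.metrics.adjusted_rand_score` computes it directly; the NumPy-only sketch below shows what it measures: pair-counting agreement between two labelings, corrected for chance, where 1.0 means identical partitions up to renaming. Noise (-1) is treated as an ordinary label here; how the project's test suite handles noise is not stated. The function names are illustrative.

```python
import numpy as np


def _pair_counts(counts):
    """Number of unordered pairs, n * (n - 1) / 2, for each entry of counts."""
    counts = np.asarray(counts, dtype=np.float64)
    return counts * (counts - 1.0) / 2.0


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand index between two labelings of the same points."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    if labels_a.ndim != 1 or labels_a.shape != labels_b.shape:
        raise ValueError("labels must be 1-D arrays of the same length")
    n = labels_a.size
    if n < 2:
        return 1.0  # nothing to compare: treat as perfect agreement

    # Re-index arbitrary label values (including -1) to 0..k-1.
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)

    # Contingency table: points sharing each (label in a, label in b) pair.
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    index = _pair_counts(table).sum()
    sum_a = _pair_counts(table.sum(axis=1)).sum()
    sum_b = _pair_counts(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / _pair_counts(n)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every point in one cluster
    return float((index - expected) / (max_index - expected))


# Usage (assumes scikit-learn >= 1.3 and hdbscan_rs are installed):
# from sklearn.cluster import HDBSCAN as SkHDBSCAN
# from hdbscan_rs import HDBSCAN
# a = SkHDBSCAN(min_cluster_size=15).fit_predict(X)
# b = HDBSCAN(min_cluster_size=15).fit_predict(X)
# print(adjusted_rand_index(a, b))
```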

Precomputed distances

from hdbscan_rs import HDBSCAN
import numpy as np

# Compute your own distance matrix
dist_matrix = np.array([[0, 1, 5], [1, 0, 3], [5, 3, 0]], dtype=np.float64)

clusterer = HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist_matrix)
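With metric="precomputed", any square distance matrix you build yourself can be passed in. The NumPy sketch below builds one for the metric names listed in the API section, using their standard definitions (cosine distance as 1 minus cosine similarity). The helper name `distance_matrix` is hypothetical, and whether hdbscan-rs computes these metrics identically internally is not stated here. It is dense, so for real workloads `scipy.spatial.distance.pdist`/`squareform` or `sklearn.metrics.pairwise_distances` are the usual tools.

```python
import numpy as np


def distance_matrix(X, metric="euclidean", p=2.0):
    """Dense (n, n) float64 distance matrix suitable for metric="precomputed".

    Supports "euclidean", "manhattan", "minkowski" (exponent p) and "cosine".
    The coordinate-difference metrics use O(n^2 * d) memory; keep n small.
    """
    X = np.asarray(X, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array")

    if metric == "cosine":
        norms = np.linalg.norm(X, axis=1)
        if np.any(norms == 0.0):
            raise ValueError("cosine distance is undefined for all-zero rows")
        unit = X / norms[:, None]
        D = 1.0 - unit @ unit.T
        D = (D + D.T) / 2.0          # enforce exact symmetry
        np.fill_diagonal(D, 0.0)     # a point is at distance 0 from itself
        return np.clip(D, 0.0, 2.0)  # remove tiny negatives from rounding

    # Absolute coordinate differences for every pair, shape (n, n, d).
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return diff.sum(axis=-1)
    if metric == "minkowski":
        if p < 1:
            raise ValueError("p must be >= 1 for Minkowski distance to be a metric")
        return (diff ** p).sum(axis=-1) ** (1.0 / p)
    raise ValueError(f"unsupported metric: {metric!r}")


# Usage (assumes hdbscan_rs is installed):
# X = np.random.default_rng(0).standard_normal((300, 8))
# D = distance_matrix(X, metric="cosine")
# labels = HDBSCAN(min_cluster_size=5, metric="precomputed").fit_predict(D)
```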

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.

Download files

Download the file for your platform.

Source Distribution

hdbscan_rs-0.3.0.tar.gz (611.0 kB)

Tags: Source

Built Distributions

hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (430.4 kB)

Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64

hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (407.8 kB)

Tags: CPython 3.13, manylinux: glibc 2.17+ ARM64

hdbscan_rs-0.3.0-cp312-cp312-win_amd64.whl (305.3 kB)

Tags: CPython 3.12, Windows x86-64

hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (430.7 kB)

Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64

hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (408.5 kB)

Tags: CPython 3.12, manylinux: glibc 2.17+ ARM64

hdbscan_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl (378.9 kB)

Tags: CPython 3.12, macOS 11.0+ ARM64

hdbscan_rs-0.3.0-cp312-cp312-macosx_10_12_x86_64.whl (401.1 kB)

Tags: CPython 3.12, macOS 10.12+ x86-64

File details

Details for the file hdbscan_rs-0.3.0.tar.gz.

File metadata

  • Download URL: hdbscan_rs-0.3.0.tar.gz
  • Upload date:
  • Size: 611.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdbscan_rs-0.3.0.tar.gz
Algorithm Hash digest
SHA256 408f06ed6585e3c861a16ecdc509c4f617e5093a5296e109ecd9cf9cabe00445
MD5 06a8c23a772bec24631fbc3dca61ca44
BLAKE2b-256 d9e7e71c410c0e2c721b6eb29067ae44a3810ed01956f36443df53dd62bd026a

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0.tar.gz:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5f5734266ba410c70bc12710d7cc7e84125c45cda2fd25dac65a535c1d281ba1
MD5 87426b9c36ade5cb756eb4f0f05e6c76
BLAKE2b-256 99ecceced240af3ef98a2c4c9b597d5417dd0479227cb08724b340918931585d

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 eebcab52fd547697b69dfc3fb53be62d1ccc4a10b33f8efc24bae9dcd6f25e06
MD5 c290a3ea2fe84c2784bf51119a2434ab
BLAKE2b-256 f626a873f50de63e1e17a93ebf9ba2939eae77496fe8254d71a8ebbff333f1eb

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: hdbscan_rs-0.3.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 305.3 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdbscan_rs-0.3.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 3208c714c2c27db9ca8b58e3ed1317c9b87f7ed4aa352ad3d76e32287869296d
MD5 82698441279e023d311677fa17ec9639
BLAKE2b-256 09469fc5d792a81526d7680cbac019aead2e66f9beb08352f30964f9361cd1c8

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 48531362600e450203b3c671d8db9afb0da9d9f763b7c6f896d3bab4a39b177a
MD5 0a3521f8deb491348b0c3b57d93c4683
BLAKE2b-256 072f353e7f9d190e8a6aea85677dbeb4d5c8d4f77921d0ba779b2aeac568e0b6

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 85ec739ef0fabc518a5fa80c5c3cb01af5b08b82070f0b9393e13a0b77d68108
MD5 a243d1f764d10887669d100bcce06120
BLAKE2b-256 cb22ad67028b2dd5d0b6b9d216b26de0afb356ee82451a060492900366eb0044

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 db756dc64829ee5cb0800c2309268cc289893648779c8e3d2474ac8a69add92f
MD5 fd12cbc50897b6de16c453be02334bc9
BLAKE2b-256 df413f21ab9dcdd932dd034324fb7f089f9b49a9a929fb95977911dfcc18d180

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.3.0-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for hdbscan_rs-0.3.0-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 4c5a3c3de72bc4d74257fe062c4f71c62574c18c471f7f42c0878b752940a14d
MD5 8853f0e8d4723361f93415f0b8e36b04
BLAKE2b-256 6d644cd46b44f7a2a86116532afb97126e431601ecd5a2d857a34e00a31b93ec

Provenance

The following attestation bundles were made for hdbscan_rs-0.3.0-cp312-cp312-macosx_10_12_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs
