High-performance HDBSCAN clustering, compatible with scikit-learn

hdbscan-rs

High-performance HDBSCAN clustering for Python, powered by a Rust core. Drop-in compatible with scikit-learn's API, but substantially faster: roughly 3x to 100x faster than scikit-learn's HDBSCAN in the benchmarks below, with the largest gains on the larger datasets.

Installation

pip install hdbscan-rs

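Requires Python >= 3.12 and NumPy >= 1.20. For version 0.2.3, pre-built wheels are available for CPython 3.12 on Linux (x86-64, ARM64), macOS (Intel and Apple Silicon), and Windows (x86-64), and for CPython 3.13 on Linux (x86-64, ARM64). On other platform and interpreter combinations, pip builds from the source distribution, which requires a Rust toolchain.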

Quick start

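import numpy as np
from hdbscan_rs import HDBSCAN

# 10,000 random 2D points; np.random.randn returns float64, the expected input dtype
data = np.random.randn(10000, 2)

clusterer = HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(data)  # one label per point, -1 = noise

# Clusters are numbered 0..k-1, so max() + 1 is the number of clusters
print(f"Found {labels.max() + 1} clusters, {(labels == -1).sum()} noise points")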
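The labels.max() + 1 idiom relies on clusters being numbered from 0 without gaps. A slightly more informative summary also reports cluster sizes. This is a minimal sketch in plain Python, so it accepts either a list or a NumPy label array; summarize_labels is a hypothetical helper, not part of hdbscan-rs.

```python
from collections import Counter
from typing import Dict, Sequence, Union


def summarize_labels(labels: Sequence[int]) -> Dict[str, Union[int, Dict[int, int]]]:
    """Count clusters, noise points, and cluster sizes (-1 marks noise)."""
    counts = Counter(int(lab) for lab in labels)  # int() also unwraps NumPy integers
    n_noise = counts.pop(-1, 0)  # remove the noise bucket, defaulting to 0
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(sorted(counts.items())),  # cluster id -> number of points
    }
```

For example, summarize_labels(labels) after the fit_predict call above returns the cluster count, the noise count, and a mapping from cluster id to size. It counts distinct labels, so it does not depend on how clusters are numbered.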

API

HDBSCAN(
    min_cluster_size=5,             # Smallest group that counts as a cluster
    min_samples=None,               # Controls the density estimate (default: min_cluster_size)
    metric="euclidean",             # "euclidean", "manhattan", "cosine", "minkowski", or "precomputed"
    p=None,                         # Minkowski p parameter
    alpha=1.0,                      # Mutual reachability distance scaling factor
    cluster_selection_epsilon=0.0,  # Merge clusters closer than this distance
    cluster_selection_method="eom", # "eom" (Excess of Mass) or "leaf"
    allow_single_cluster=False,     # Allow the whole dataset to be returned as one cluster
)
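Both min_samples and alpha act on the mutual reachability distance, max(core(a), core(b), d(a, b) / alpha), where core(x) is the distance from x to its min_samples-th nearest neighbour. The dependency-free sketch below computes it by brute force to show what the two parameters do. It assumes that a point counts as its own first neighbour (scikit-learn's convention) and that alpha divides only the raw pairwise distance. Implementations differ on both details, hdbscan-rs's internals may not match, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def core_distances(points: List[Point], min_samples: int) -> List[float]:
    """Distance from each point to its min_samples-th nearest neighbour,
    counting the point itself (an assumed convention)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # dists[0] is the 0.0 self-distance
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(points: List[Point], min_samples: int = 5,
                        alpha: float = 1.0) -> List[List[float]]:
    """Brute-force O(n^2) matrix of max(core(a), core(b), d(a, b) / alpha).
    The diagonal is left at 0.0."""
    n = len(points)
    k = max(1, min(min_samples, n))  # clamp so the neighbour index stays in range
    cores = core_distances(points, k)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scaled = math.dist(points[i], points[j]) / alpha
            result[i][j] = result[j][i] = max(cores[i], cores[j], scaled)
    return result
```

With min_samples=2, each core distance is simply the distance to the nearest other point. Raising min_samples inflates core distances in sparse regions, which makes the clustering more conservative and labels more points as noise.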

Methods

  • fit_predict(X) -- Fit and return cluster labels (numpy array, -1 = noise)
  • fit(X) -- Fit the model without returning labels
  • approximate_predict(X) -- Predict labels for new points (returns labels, probabilities)

Properties (after fitting)

  • labels_ -- Cluster labels (-1 = noise)
  • probabilities_ -- Membership strength [0, 1]
  • outlier_scores_ -- GLOSH outlier scores [0, 1]
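After fitting, labels_, probabilities_, and outlier_scores_ are per-point arrays, so filtering on them needs no extra API. The following minimal sketch keeps confidently assigned points and flags the most outlying ones. The helper names and default thresholds are illustrative assumptions, not hdbscan-rs features.

```python
from typing import List, Sequence


def confident_members(labels: Sequence[int], probabilities: Sequence[float],
                      min_prob: float = 0.5) -> List[int]:
    """Indices of non-noise points whose membership strength is >= min_prob.
    The 0.5 default is an arbitrary example threshold."""
    return [i for i, (lab, prob) in enumerate(zip(labels, probabilities))
            if lab != -1 and prob >= min_prob]


def top_outliers(scores: Sequence[float], fraction: float = 0.1) -> List[int]:
    """Indices of the highest-scoring `fraction` of points by outlier score,
    most outlying first (at least one index when scores is non-empty)."""
    n = len(scores)  # len() rather than truthiness, so NumPy arrays work too
    if n == 0:
        return []
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, call confident_members(clusterer.labels_, clusterer.probabilities_) or top_outliers(clusterer.outlier_scores_, 0.05) on a fitted clusterer.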

Performance

Best-of-3 wall-clock times on a 4-core AMD EPYC machine, using make_blobs data with 5 centers and min_cluster_size=10. The hdbscan-rs column times the native Rust core; the Python binding adds under 5 ms of overhead for data conversion.

Data (n x d)   sklearn HDBSCAN   hdbscan (C)   hdbscan-rs   Speedup vs sklearn   Speedup vs hdbscan (C)
1Kx2D          8.9 ms            12.7 ms       2.6 ms       3.4x                 4.9x
5Kx2D          128 ms            80.2 ms       10.6 ms      12.1x                7.6x
10Kx2D         455 ms            189 ms        18.4 ms      24.7x                10.3x
50Kx2D         12,812 ms         1,024 ms      124 ms       103x                 8.2x
5Kx10D         241 ms            136 ms        62 ms        3.9x                 2.2x
1Kx256D        246 ms            230 ms        19 ms        12.6x                11.8x
500x1536D      424 ms            444 ms        28 ms        14.9x                15.7x

Memory usage is 5-60x lower than Python-based implementations.
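The timings above are best-of-3 wall times, and each speedup is the baseline time divided by the hdbscan-rs time (for 10Kx2D, 455 / 18.4 ≈ 24.7x). A small standard-library harness for repeating that methodology on your own data might look like this. It is a sketch, not the project's benchmark script, and best_of and speedup are hypothetical names.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn `repeats` times and return the fastest wall time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

For example, time best_of(lambda: HDBSCAN(min_cluster_size=10).fit_predict(X)) for each implementation on the same float64 array X, then compare the results with speedup.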

Migrating from sklearn

# Before
from sklearn.cluster import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

# After
from hdbscan_rs import HDBSCAN
clusterer = HDBSCAN(min_cluster_size=15)

The API matches scikit-learn's HDBSCAN interface. Input should be a 2D NumPy array of dtype float64. Results agree with scikit-learn's (adjusted Rand index, ARI, above 0.99 across the project's test suite).
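The adjusted Rand index (ARI) compares two labelings of the same points: it is 1.0 when they describe the same partition, whatever the cluster numbering, and near 0 for chance-level agreement. scikit-learn provides it as sklearn.metrics.adjusted_rand_score. To make the formula concrete, here is a dependency-free sketch (adjusted_rand_index is a hypothetical name). Like scikit-learn's version, it treats the noise label -1 as just another label.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same points."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on
    # Pairs placed together by both labelings (contingency-table cells)
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Pairs placed together by each labeling on its own
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial (all one cluster, or all singletons)
    return (both - expected) / (max_index - expected)
```

To check agreement yourself, fit both implementations on the same float64 array and pass the two label arrays to either function.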

Precomputed distances

import numpy as np
from hdbscan_rs import HDBSCAN

# A distance matrix you computed yourself (here: 3 points, symmetric, zero diagonal)
dist_matrix = np.array([[0, 1, 5], [1, 0, 3], [5, 3, 0]], dtype=np.float64)

clusterer = HDBSCAN(min_cluster_size=2, metric="precomputed")
labels = clusterer.fit_predict(dist_matrix)
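metric="precomputed" is useful when the distance you need is not one of the built-in metrics. As an illustration, the sketch below builds a matrix of great-circle (haversine) distances between (latitude, longitude) points in plain Python. Haversine is an example metric chosen here, and haversine_km and distance_matrix are hypothetical helpers, not part of hdbscan-rs.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))  # clamp rounding error


def distance_matrix(points: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Symmetric n x n haversine matrix with a zero diagonal."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = haversine_km(points[i], points[j])
    return matrix
```

Convert the result with np.asarray(distance_matrix(coords), dtype=np.float64) before calling fit_predict. The matrix has n² entries, so this approach is best suited to modest dataset sizes.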

License

Licensed under either of Apache License, Version 2.0 or MIT License, at your option.

Download files

Download the file for your platform.

Source Distribution

  • hdbscan_rs-0.2.3.tar.gz (610.6 kB)

Built Distributions

  • hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (429.6 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (407.7 kB): CPython 3.13, manylinux (glibc 2.17+), ARM64
  • hdbscan_rs-0.2.3-cp312-cp312-win_amd64.whl (302.4 kB): CPython 3.12, Windows x86-64
  • hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (429.9 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (408.2 kB): CPython 3.12, manylinux (glibc 2.17+), ARM64
  • hdbscan_rs-0.2.3-cp312-cp312-macosx_11_0_arm64.whl (376.6 kB): CPython 3.12, macOS 11.0+ ARM64
  • hdbscan_rs-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl (398.5 kB): CPython 3.12, macOS 10.12+ x86-64

File details

Details for the file hdbscan_rs-0.2.3.tar.gz.

File metadata

  • Download URL: hdbscan_rs-0.2.3.tar.gz
  • Size: 610.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdbscan_rs-0.2.3.tar.gz
Algorithm Hash digest
SHA256 fad53850decf4f37c2e0e4f3502fecaa78513b376617ab3699cfd567cbb9cb4b
MD5 16812b6727e79f7c2eec63fab385c79f
BLAKE2b-256 634ad4bbfac9952c7e53ea833a9d540394ee89d04e1b17e108e8a9c2293ccce6

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3.tar.gz:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 420da529732bb5ad1dd9441a32fcc13b4de7b773402848a22e1e0dd7938218bf
MD5 0de73b99281bb8fc4db179dc3f2deb49
BLAKE2b-256 3e78b11e4de0f2dbbb52f3fd5f46f241d0f8cc94f19b2c6ef3764ea3cdda50e6

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 7a2b8b095eaf3d0261ba0ff77a7af5ea0bdef904e1704b608461c31c84f48b2b
MD5 5fe5ecf29a6e0083b843dc1b2552ca25
BLAKE2b-256 56bec747de85552f0433dccfa40c789d82ad2dd4f5220fb500e68ee93a05884e

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: hdbscan_rs-0.2.3-cp312-cp312-win_amd64.whl
  • Size: 302.4 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hdbscan_rs-0.2.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 27de71e078faeeec6e20efb080337a5e3be1ed04bb78391c9acefb905159b01b
MD5 ddd434df5b3d7222283bc8105475ba85
BLAKE2b-256 f2551fdf2bb120a9424d09b445ccbcff4d7871eea1972f13c5ee31aa581985c7

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 39bf8307631e63cccdc8a1ecce8dd9c078e250bb0504512e4c7ca2c88b8c85e3
MD5 a011987fac8aa2f8f3d00558738e53e8
BLAKE2b-256 ffae59af5a500f76b3b5d00ebf024a424f792ca13b4b77873f0c8bd0b15cf248

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 cab112c8d331bb1e8ec5cb4385e4d86034a76e86504f0919e357f350520a71ac
MD5 ffab3ed308158fdb1c5877b509ecdf80
BLAKE2b-256 62a5f337aa4aa6ce34acf86b97aed60ca381436553ca539eabcdc3327c34053e

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for hdbscan_rs-0.2.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8ace94792a6ddfc9baaa51c21154bb747eb6cf7630aae29c50a3ac0996846e23
MD5 9ab269cfadc0cdcae9cd3825bb41f18f
BLAKE2b-256 f22c94e5a10ed7eba49d33af9e990d71202d33127deb13e3f0598aa9fabafba6

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs

File details

Details for the file hdbscan_rs-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for hdbscan_rs-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 fb1aa9535cf6e0793d62b47b36ed977acafbc3d3ed9d15a132f6d3f7a5f2e41e
MD5 e918fabbd6ce8eb35b67ecdc21f419a0
BLAKE2b-256 1bdc6c7047ce80851b2e289c66bc004e6d5b9832592d070d1029f31c382c5e15

Provenance

The following attestation bundles were made for hdbscan_rs-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl:

Publisher: publish.yml on JasonLovesDoggo/hdbscan-rs
