fast-hdbscan

Fast multicore implementations of the HDBSCAN and PLSCAN clustering algorithms.


Fast Multicore HDBSCAN

The fast_hdbscan library provides an implementation of the HDBSCAN clustering algorithm designed specifically for high performance on multicore machines. The algorithm runs in parallel and can make effective use of as many cores as you wish to throw at a problem. It is thus well suited to large SMP systems and even modern multicore laptops.

This library provides a re-implementation of a subset of the HDBSCAN algorithm that is compatible with the hdbscan library. There are specific optimizations for data that is Euclidean and low dimensional; other distance metrics and high-dimensional data fall back to alternative parallel approaches that are faster than the hdbscan library, but not necessarily as fast as the highly optimized low-dimensional Euclidean case. The primary advantages of this library over the standard hdbscan library are:

this library can easily use all available cores to speed up computation;

this library has much faster implementations of tree condensing and cluster extraction;

this library is much simpler, which makes it more approachable to extend or to borrow components from;

this library is built on numba and has fewer issues with binaries and compilation;

this library provides features such as semi-supervision, linking constraints, sample weights, and branch detection from FLASC, and an implementation of PLSCAN.

This library does not support all the features and input formats available in the hdbscan library, but covers the most common use cases.

This library does support a number of research extensions to HDBSCAN including branch detection from FLASC and the semi-supervised clustering methods, as well as support for sample weights.

As a bonus, this library also provides an easy-to-use implementation of the PLSCAN algorithm for automated cluster resolution selection and layered clustering.

Basic Usage

The fast_hdbscan library follows the hdbscan library in using the sklearn API. You can use the fast_hdbscan class HDBSCAN exactly as you would that of the hdbscan library, with the caveat that fast_hdbscan only supports a subset of the parameters and options of hdbscan. Nonetheless, if you have low-dimensional Euclidean data (e.g. the output of UMAP), you can use this library as a straightforward drop-in replacement for hdbscan:

import fast_hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(1000)

clusterer = fast_hdbscan.HDBSCAN(min_cluster_size=10)
cluster_labels = clusterer.fit_predict(data)
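
Because fit_predict returns one integer label per sample, the result can be summarized with plain Python. The sketch below assumes fast_hdbscan follows the hdbscan convention of labelling noise points -1; the helper name summarize_labels is hypothetical and not part of the library, and the labels are hand-written stand-ins for real output.

```python
from collections import Counter


def summarize_labels(labels, noise_label=-1):
    """Return (cluster_sizes, n_noise) for an iterable of integer labels.

    Assumes noise points carry `noise_label` (-1 in hdbscan). This helper
    is illustrative only and is not part of fast_hdbscan.
    """
    counts = Counter(int(label) for label in labels)
    # Separate the noise count from the per-cluster counts.
    n_noise = counts.pop(noise_label, 0)
    return dict(counts), n_noise


# Hand-written labels standing in for clusterer.fit_predict(data).
example_labels = [0, 0, 1, -1, 1, 1]
sizes, n_noise = summarize_labels(example_labels)
print(f"{len(sizes)} clusters, {n_noise} noise points, sizes: {sizes}")
```

The same function accepts the numpy array returned by fit_predict, since it only iterates over the labels.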

The first import of the package will take a while, as numba functions will be compiled for the first time. These functions are cached by default; you can tell numba to ignore the cache by setting the environment variable FAST_HDBSCAN_NUMBA_CACHE to ‘false’.
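
One way to set this variable from Python is sketched below. It assumes the variable is read when fast_hdbscan is first imported, so it is set beforehand; that timing is an assumption rather than something stated above. The import is guarded only so the snippet also runs where the package is not installed.

```python
import importlib.util
import os

# Ask fast_hdbscan to ignore numba's cache of compiled functions.
# Assumption: the variable is read at import time, so set it first.
os.environ["FAST_HDBSCAN_NUMBA_CACHE"] = "false"

if importlib.util.find_spec("fast_hdbscan") is not None:
    # With the cache ignored, expect the slow first-import compilation again.
    import fast_hdbscan
```

Setting the variable in the shell before launching Python should have the same effect and avoids any dependence on import order.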

Alternatively, you can use the PLSCAN class to perform automated multiscale clustering:

import fast_hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(1000)

clusterer = fast_hdbscan.PLSCAN()
cluster_labels = clusterer.fit_predict(data)
print(len(clusterer.cluster_layers_))  # number of layers found -- each layer is a clustering at a different resolution

Installation

fast_hdbscan requires:

numba

numpy

scikit-learn

If you need more than just Euclidean distance, or support for high-dimensional data, you will also need:

pynndescent

fast_hdbscan can be installed via pip:

pip install fast_hdbscan
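
To confirm which of the dependencies listed above are available in the current environment, a small standard-library check can help. This is a sketch: the helper name missing is hypothetical, and the only mapping assumed beyond the list above is that scikit-learn is imported as sklearn.

```python
from importlib.util import find_spec

# Distribution name -> importable module name (scikit-learn imports as sklearn).
REQUIRED = {"numba": "numba", "numpy": "numpy", "scikit-learn": "sklearn"}
# Only needed for non-Euclidean metrics or high-dimensional data.
OPTIONAL = {"pynndescent": "pynndescent"}


def missing(packages):
    """Return the distribution names whose module cannot be found."""
    return [dist for dist, module in packages.items() if find_spec(module) is None]


print("missing required:", missing(REQUIRED))
print("missing optional:", missing(OPTIONAL))
```

An empty list for the required packages means the core Euclidean, low-dimensional path should have everything it needs.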

To manually install this package:

wget https://github.com/TutteInstitute/fast_hdbscan/archive/main.zip
unzip main.zip
rm main.zip
cd fast_hdbscan-main
pip install .

References

The algorithm used here is an adaptation of the algorithms described in the papers:

L. McInnes, J. Healy. Accelerated Hierarchical Density Based Clustering. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42, 2017.

R. Campello, D. Moulavi, J. Sander. Density-Based Clustering Based on Hierarchical Density Estimates. In: Advances in Knowledge Discovery and Data Mining, Springer, pp 160-172, 2013.

The branch-detection functionality is adapted from:

D.M. Bot, J. Peeters, J. Liesenborgs, J. Aerts. FLASC: a flare-sensitive clustering algorithm. In: PeerJ Computer Science, Volume 11, e2792, 2025. https://doi.org/10.7717/peerj-cs.2792.

The PLSCAN functionality is adapted from:

D.M. Bot, L. McInnes, J. Aerts. Persistent Multiscale Density-based Clustering. In: arXiv preprint arXiv:2512.16558, 2025. https://arxiv.org/abs/2512.16558.

License

fast_hdbscan is BSD (2-clause) licensed. See the LICENSE file for details.

Contributing

Contributions are more than welcome! If you have ideas for features or projects, please get in touch. Everything from code to notebooks to examples and documentation is equally valuable, so please don’t feel you can’t contribute. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged in.


Release history

0.3.2 (this version): Apr 7, 2026

0.3.1: Mar 27, 2026

0.3.0: Mar 18, 2026

0.2.2: Mar 19, 2025

0.2.1: Feb 28, 2025

0.2.0: Oct 1, 2024

0.1.3: Oct 18, 2023

0.1.0: Mar 3, 2023

Download files

Source Distribution

fast_hdbscan-0.3.2.tar.gz (59.8 kB), uploaded Apr 7, 2026

Built Distribution

fast_hdbscan-0.3.2-py3-none-any.whl (62.6 kB), uploaded Apr 7, 2026, Python 3

File details

Details for the file fast_hdbscan-0.3.2.tar.gz.

File metadata

Download URL: fast_hdbscan-0.3.2.tar.gz
Upload date: Apr 7, 2026
Size: 59.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for fast_hdbscan-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`248e09202eda04da4b84dd819de3cc714469912205bb447ad4c7136db580f205`
MD5	`33e7c041b2d4020df3be7e48cef9788d`
BLAKE2b-256	`2b3b235c47dd8282610522f9fa6869ad8b896c2d5927f82039f66d547229fc07`


File details

Details for the file fast_hdbscan-0.3.2-py3-none-any.whl.

File metadata

Download URL: fast_hdbscan-0.3.2-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 62.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for fast_hdbscan-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`619b42963935e6c3ab1e51de7895dd35260f3ef9ebe4627ff4317073fe51a667`
MD5	`bacf9c6aecf6ce9ebc0b9fb713477259`
BLAKE2b-256	`d09234dac2dff0877d637535bd8852bc68323f155799521c9ee94c27c548426f`

