Temporary fork of hdbscan with Python 3.14 wheel support; see scikit-learn-contrib/hdbscan#706. Imports as `hdbscan`.

Project description

HDBSCAN

HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise. It performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and to be more robust to parameter selection.

In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning – and the primary parameter, minimum cluster size, is intuitive and easy to select.

HDBSCAN is ideal for exploratory data analysis; it’s a fast and robust algorithm that you can trust to return meaningful clusters (if there are any).

Based on the papers:

McInnes L, Healy J. Accelerated Hierarchical Density Based Clustering In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017 [pdf]

R. Campello, D. Moulavi, and J. Sander, Density-Based Clustering Based on Hierarchical Density Estimates In: Advances in Knowledge Discovery and Data Mining, Springer, pp 160-172. 2013

Documentation, including tutorials, is available on ReadTheDocs at http://hdbscan.readthedocs.io/en/latest/.

Notebooks comparing HDBSCAN to other clustering algorithms, explaining how HDBSCAN works, and comparing performance with other Python clustering implementations are available.

How to use HDBSCAN

The hdbscan package inherits from sklearn classes, and thus drops in neatly next to other sklearn clusterers with an identical calling API. Similarly it supports input in a variety of formats: an array (or pandas dataframe, or sparse matrix) of shape (num_samples x num_features); an array (or sparse matrix) giving a distance matrix between samples.

import hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(1000)

clusterer = hdbscan.HDBSCAN(min_cluster_size=10)
cluster_labels = clusterer.fit_predict(data)

Performance

Significant effort has been put into making the hdbscan implementation as fast as possible. It is orders of magnitude faster than the reference implementation in Java, and is currently faster than highly optimized single linkage implementations in C and C++. Version 0.7 performance can be seen in this notebook. In particular, performance on low dimensional data is better than sklearn’s DBSCAN, and via support for caching with joblib, re-clustering with different parameters can be almost free.

Additional functionality

The hdbscan package comes equipped with visualization tools to help you understand your clustering results. After fitting data the clusterer object has attributes for:

  • The condensed cluster hierarchy

  • The robust single linkage cluster hierarchy

  • The reachability distance minimal spanning tree

All of these come equipped with methods for plotting and converting to Pandas or NetworkX for further analysis. See the notebook on how HDBSCAN works for examples and further details.

The clusterer objects also have an attribute providing cluster membership strengths, which gives optional soft clustering at no further compute expense. Finally, each cluster also receives a persistence score giving the stability of the cluster over the range of distance scales present in the data. This provides a measure of the relative strength of clusters.

Outlier Detection

The HDBSCAN clusterer objects also support the GLOSH outlier detection algorithm. After fitting the clusterer to data, the outlier scores can be accessed via the outlier_scores_ attribute. The result is a vector of score values, one for each data point that was fit. Higher scores represent more outlier-like objects. Selecting outliers via upper quantiles is often a good approach.

Based on the paper:

R.J.G.B. Campello, D. Moulavi, A. Zimek and J. Sander Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection, ACM Trans. on Knowledge Discovery from Data, Vol 10, 1 (July 2015), 1-51.

Robust single linkage

The hdbscan package also provides support for the robust single linkage clustering algorithm of Chaudhuri and Dasgupta. As with the HDBSCAN implementation, this is a high performance version of the algorithm, outperforming scipy’s standard single linkage implementation. The robust single linkage hierarchy is available as an attribute of the robust single linkage clusterer, again with the ability to plot or export the hierarchy, and to extract flat clusterings at a given cut level and gamma value.

Example usage:

import hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(1000)

clusterer = hdbscan.RobustSingleLinkage(cut=0.125, k=7)
cluster_labels = clusterer.fit_predict(data)
hierarchy = clusterer.cluster_hierarchy_
alt_labels = hierarchy.get_clusters(0.100, 5)
hierarchy.plot()

Based on the paper:

K. Chaudhuri and S. Dasgupta. “Rates of convergence for the cluster tree.” In Advances in Neural Information Processing Systems, 2010.

Branch detection

The hdbscan package supports a branch-detection post-processing step by Bot et al. Cluster shapes, such as branching structures, can reveal interesting patterns that are not expressed in density-based cluster hierarchies. The BranchDetector class mimics the HDBSCAN API and can be used to detect branching hierarchies in clusters. It provides condensed branch hierarchies, branch persistences, and branch memberships, and supports joblib’s caching functionality. A notebook demonstrating the BranchDetector is available.

Example usage:

import hdbscan
from sklearn.datasets import make_blobs

data, _ = make_blobs(1000)

clusterer = hdbscan.HDBSCAN(branch_detection_data=True).fit(data)
branch_detector = hdbscan.BranchDetector().fit(clusterer)
branch_detector.cluster_approximation_graph_.plot(edge_width=0.1)

Based on the paper:

D.M. Bot, J. Peeters, J. Liesenborgs and J. Aerts FLASC: a flare-sensitive clustering algorithm. PeerJ Computer Science, Vol 11, April 2025, e2792. https://doi.org/10.7717/peerj-cs.2792.

Installing

Easiest install, if you have Anaconda (thanks to conda-forge which is awesome!):

conda install -c conda-forge hdbscan

PyPI install, presuming you have an up to date pip:

pip install hdbscan

Binary wheels for a number of platforms are available thanks to the work of Ryan Helinski <rlhelinski@gmail.com>.

If pip has difficulty pulling the dependencies, we suggest first upgrading pip to at least version 10 and trying again:

pip install --upgrade pip
pip install hdbscan

Otherwise, install the dependencies manually using Anaconda, then pull hdbscan from pip:

conda install cython
conda install numpy scipy
conda install scikit-learn
pip install hdbscan

For a manual install of the latest code directly from GitHub:

pip install --upgrade git+https://github.com/scikit-learn-contrib/hdbscan.git#egg=hdbscan

Alternatively download the package, install requirements, and manually run the installer:

wget https://github.com/scikit-learn-contrib/hdbscan/archive/master.zip
unzip master.zip
rm master.zip
cd hdbscan-master

pip install -r requirements.txt

python setup.py install
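
After any of the install routes above, a quick check that a distribution is visible to the interpreter can be sketched with the standard library. The helper name `installed_version` is hypothetical, and the distribution name `hdbscan_314` is an assumption inferred from the file names in the download listing; upstream installs would register as `hdbscan`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(candidates=("hdbscan", "hdbscan_314")):
    """Return (distribution name, version) for the first candidate found.

    Returns None when none of the candidate distributions is installed.
    """
    for name in candidates:
        try:
            return name, version(name)
        except PackageNotFoundError:
            continue
    return None


print(installed_version())
```

Either way the package is imported as `hdbscan`, so a plain `import hdbscan` is the final confirmation that the install worked.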

Running the Tests

The package tests can be run after installation using the command:

nosetests -s hdbscan

or, if nose is installed but nosetests is not in your PATH variable:

python -m nose -s hdbscan

If one or more of the tests fail, please report a bug at https://github.com/scikit-learn-contrib/hdbscan/issues/new

Python Version

The upstream hdbscan README states that the library supports both Python 2 and Python 3, with Python 3 recommended. The wheels listed for this release target CPython 3.10 through 3.14 only.

Help and Support

For simple issues you can consult the FAQ in the documentation. If your issue is not suitably resolved there, please check the issues on GitHub. Finally, if no solution is available there, feel free to open an issue; the authors will attempt to respond in a reasonably timely fashion.

Contributing

We welcome contributions in any form! Assistance with documentation, particularly expanding tutorials, is always welcome. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

Citing

If you have used this codebase in a scientific publication and wish to cite it, please use the Journal of Open Source Software article.

L. McInnes, J. Healy, S. Astels, hdbscan: Hierarchical density based clustering In: Journal of Open Source Software, The Open Journal, volume 2, number 11. 2017

@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}

To reference the high performance algorithm developed in this library please cite our paper in ICDMW 2017 proceedings.

McInnes L, Healy J. Accelerated Hierarchical Density Based Clustering In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp 33-42. 2017

@inproceedings{mcinnes2017accelerated,
  title={Accelerated Hierarchical Density Based Clustering},
  author={McInnes, Leland and Healy, John},
  booktitle={Data Mining Workshops (ICDMW), 2017 IEEE International Conference on},
  pages={33--42},
  year={2017},
  organization={IEEE}
}

If you used the branch-detection functionality in this library please cite our PeerJ paper:

Bot DM, Peeters J, Liesenborgs J, Aerts J. FLASC: a flare-sensitive clustering algorithm. In: PeerJ Computer Science, Volume 11, e2792, 2025. https://doi.org/10.7717/peerj-cs.2792

@article{bot2025flasc,
    title   = {{FLASC: a flare-sensitive clustering algorithm}},
    author  = {Bot, Dani{\"{e}}l M. and Peeters, Jannes and Liesenborgs, Jori and Aerts, Jan},
    year    = {2025},
    month   = {apr},
    journal = {PeerJ Comput. Sci.},
    volume  = {11},
    pages   = {e2792},
    issn    = {2376-5992},
    doi     = {10.7717/peerj-cs.2792},
    url     = {https://peerj.com/articles/cs-2792},
}

Licensing

The hdbscan package is 3-clause BSD licensed. Enjoy.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hdbscan_314-0.8.42.tar.gz (7.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

hdbscan_314-0.8.42-cp314-cp314-win_amd64.whl (2.0 MB view details)

Uploaded CPython 3.14, Windows x86-64

hdbscan_314-0.8.42-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.14, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64

hdbscan_314-0.8.42-cp314-cp314-macosx_10_15_universal2.whl (2.6 MB view details)

Uploaded CPython 3.14, macOS 10.15+ universal2 (ARM64, x86-64)

hdbscan_314-0.8.42-cp313-cp313-win_amd64.whl (1.9 MB view details)

Uploaded CPython 3.13, Windows x86-64

hdbscan_314-0.8.42-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64

hdbscan_314-0.8.42-cp313-cp313-macosx_10_13_universal2.whl (2.6 MB view details)

Uploaded CPython 3.13, macOS 10.13+ universal2 (ARM64, x86-64)

hdbscan_314-0.8.42-cp312-cp312-win_amd64.whl (1.9 MB view details)

Uploaded CPython 3.12, Windows x86-64

hdbscan_314-0.8.42-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (5.8 MB view details)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64

hdbscan_314-0.8.42-cp312-cp312-macosx_10_13_universal2.whl (2.6 MB view details)

Uploaded CPython 3.12, macOS 10.13+ universal2 (ARM64, x86-64)

hdbscan_314-0.8.42-cp311-cp311-win_amd64.whl (2.0 MB view details)

Uploaded CPython 3.11, Windows x86-64

hdbscan_314-0.8.42-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (5.9 MB view details)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64

hdbscan_314-0.8.42-cp311-cp311-macosx_10_9_universal2.whl (2.6 MB view details)

Uploaded CPython 3.11, macOS 10.9+ universal2 (ARM64, x86-64)

hdbscan_314-0.8.42-cp310-cp310-win_amd64.whl (2.0 MB view details)

Uploaded CPython 3.10, Windows x86-64

hdbscan_314-0.8.42-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (5.7 MB view details)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.28+ x86-64

hdbscan_314-0.8.42-cp310-cp310-macosx_10_9_universal2.whl (2.6 MB view details)

Uploaded CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file hdbscan_314-0.8.42.tar.gz.

File metadata

  • Download URL: hdbscan_314-0.8.42.tar.gz
  • Upload date:
  • Size: 7.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for hdbscan_314-0.8.42.tar.gz
Algorithm Hash digest
SHA256 d504797dec926e78157ba0d7094c112f292cacb5ac9b7ccdb81ead63c8e65a7a
MD5 87f4f8a0e0974a6680d6f3de9aced916
BLAKE2b-256 008236784782e9b38ff580b4f08a45a7a6296a47af8f00ed8507f382e6da0fa0

See more details on using hashes here.
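
One way to check a downloaded file against the SHA256 digests listed on this page is sketched below, using only the standard library. The helper names `sha256_of` and `matches_published` are hypothetical, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large wheels do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a digest copied from the listing."""
    return sha256_of(path) == published_hex.strip().lower()


# Usage, assuming the source archive was downloaded next to this script:
# matches_published(
#     "hdbscan_314-0.8.42.tar.gz",
#     "d504797dec926e78157ba0d7094c112f292cacb5ac9b7ccdb81ead63c8e65a7a",
# )
```

pip can perform the same comparison itself when a requirements file pins hashes and `--require-hashes` is used.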

File details

Details for the file hdbscan_314-0.8.42-cp314-cp314-win_amd64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 1b7f2b05734ffe9f2f0de2c5ed87c9e32224ad5b14a26a335a7f10aaf35af713
MD5 97d48a08f9509c0ff01ea53bb9c764b0
BLAKE2b-256 9257173290260b0a1fc39141e6d6176c7d8da8533187fbdef1d8f8d60ec50b46

File details

Details for the file hdbscan_314-0.8.42-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f04161b27b18de83ecd7bf9dfdd1652562c315ff842ba23e9651585cde84b567
MD5 de7b9515e7f4597fe9de3182c0f8b6af
BLAKE2b-256 3529366f93fe7d0a6d1a3b79bb321ceeede3b9a685f5a3d5ccd979f1b049747b

File details

Details for the file hdbscan_314-0.8.42-cp314-cp314-macosx_10_15_universal2.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp314-cp314-macosx_10_15_universal2.whl
Algorithm Hash digest
SHA256 9a63b1c07ded0fcaa1ce5261d8e94eef60a26109a6d916484685659cbf503d1e
MD5 e1c6f859f1134277a08872e109b52c5c
BLAKE2b-256 446f1b315ed1193be47ce12c0dd3c6477165cd74993e4c3d7e65f4fbb658de89

File details

Details for the file hdbscan_314-0.8.42-cp313-cp313-win_amd64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 f78436dde70a1190ba40394878e1745d08560eb239b1e1d2d3d5e2f597af3a19
MD5 d936acdb4b7924644a826c0a4abb09d9
BLAKE2b-256 4a1fa5eedb7c56761402cbd81732979d02dce0b207e96a775308b0db1969ab1a

File details

Details for the file hdbscan_314-0.8.42-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 82d65e5b1e54aa768a24a06dadf9d9018b775d582ace4c2cb428d6b73e7d36cf
MD5 5f1d873616efb19548915309319bcf6c
BLAKE2b-256 a066e79c1f4dbfadf13fb005ea9f8b1c83da3493904953cccae5b0e9aae4503f

File details

Details for the file hdbscan_314-0.8.42-cp313-cp313-macosx_10_13_universal2.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp313-cp313-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 10d0b1d416dd06256e1d03c159d32e8c0afef1dba010113fc7f35d896d032e1f
MD5 8faefd61cceffde7154e00c97f7dfab4
BLAKE2b-256 1b939b900aa393458a486ab99a4b7462984deb504ac4850219f9d3d4575c4f87

File details

Details for the file hdbscan_314-0.8.42-cp312-cp312-win_amd64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 75c998d00d58ea6e2aadf91e3c5ec165dc635c2c1fb438a7491f6cc921a480d6
MD5 9f13a4b36194a29cfad0a15ae153d2e5
BLAKE2b-256 881d07df6a8f42e5b545ea9e5dfa9c876d6865c74140b651a9843ac949e9936b

File details

Details for the file hdbscan_314-0.8.42-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f4db86ea1472e59218e73abf343ed7a6b9202db2727f5add25584d5615c34e95
MD5 b0a00d207148cc639477dbc54874b4c6
BLAKE2b-256 d3092518c6719d922cda21bdc1555569004e9a910c99361ade788586c350429b

File details

Details for the file hdbscan_314-0.8.42-cp312-cp312-macosx_10_13_universal2.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 7d0600d83c863ee6033b8c799039bac340b70e0c2fdb8487b62b1f421e3f45a7
MD5 28b5f52a46b302863f4177097ebfa1ba
BLAKE2b-256 1ef733e4986ecaac37d941216b59984242978e5df0e75ef65fac8d6899feb8ab

File details

Details for the file hdbscan_314-0.8.42-cp311-cp311-win_amd64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 6bc8a364f510fd41bcdaf3b2f62f869ff6e3ea995a171749be79c26479a76332
MD5 814872407e234ff21cafe137f6a96363
BLAKE2b-256 eac00cf9910483f35a0996c106153fa946f0c829aee16d1a0444b141a3b09319

File details

Details for the file hdbscan_314-0.8.42-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c168ededf37d233aa60054f847bcea0948904e90ac8887980d9d7efb71d1972c
MD5 0f474e8676538ea2f7435b6d83e4e458
BLAKE2b-256 6e00e742642fc1801d10c1360663946eb543e4f79864d304db81a13ea8d5d701

File details

Details for the file hdbscan_314-0.8.42-cp311-cp311-macosx_10_9_universal2.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 0e4ad68a78e1ef71d3fba342b38a6a40d01d5c82601f48564835f68fee24ca23
MD5 54fb7b4beb479f95aceafe07ecdd61b7
BLAKE2b-256 db0b2f7da4edcaa2207a32abf63e2de77e66fd8ef2a8a79e4d6dbb64349eed51

File details

Details for the file hdbscan_314-0.8.42-cp310-cp310-win_amd64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 812e9e8420d8fd287e9233a016468d853c70595b04cd8ec5519485e9b1f30996
MD5 3a9fba9cb28d914ead8e0511a7d66b1f
BLAKE2b-256 d918e5101d83081acbb78c783a34300a56c97770ea4f897769b3b2f37ae3d9a5

File details

Details for the file hdbscan_314-0.8.42-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 454df8e53a386b7c8851dea3b9c5f5a1fd257704ce24ab521a036134e12cce52
MD5 c46d64167d4cb86665b7371469a8f59b
BLAKE2b-256 c8328fc475db6315a4937d0eb65a9f9f6a1b8c5833fb4a30d6a28d64e3dd8669

File details

Details for the file hdbscan_314-0.8.42-cp310-cp310-macosx_10_9_universal2.whl.

File hashes

Hashes for hdbscan_314-0.8.42-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 4389f89eed5073441ff48f0ac6d956bbaa0f559ef31a7a41310d854e7f0bd0f2
MD5 482c0f02b25eea333502a6315455a405
BLAKE2b-256 4cb4a444885dd41bc207fbbcf60b2ce5fb03fdff955947b942853e9a9b86b083
