UMAP + HDBSCAN for numeric and/or categorical variables
QuickClus
QuickClus is a Python module for clustering categorical and numerical data using UMAP and HDBSCAN. It lets you include both numerical and categorical values, even when some are null, in a simple and fast way. Null-value imputation, scaling and transformation of numerical variables, and the combination of categorical variables are all performed automatically.
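To make the automatic steps above more concrete, the following is a rough hand-written approximation in plain pandas. It is an illustration only and not QuickClus's internal code: the helper name preprocess_sketch, the median and "missing" imputation choices, the standardization, and the one-hot encoding are all assumptions made for this sketch.

```python
import pandas as pd


def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: impute, scale, and encode a mixed-type frame."""
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    # Numeric columns: fill nulls with the column median, then standardize.
    numeric = numeric.fillna(numeric.median())
    std = numeric.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    numeric = (numeric - numeric.mean()) / std

    # Categorical columns: treat nulls as their own level, then one-hot encode.
    categorical = categorical.astype("object").fillna("missing")
    encoded = pd.get_dummies(categorical, dtype=float)

    return pd.concat([numeric, encoded], axis=1)


# Toy data invented for this example.
example = pd.DataFrame(
    {
        "age": [25.0, None, 40.0, 31.0],
        "income": [30000.0, 52000.0, None, 41000.0],
        "city": ["Lima", None, "Quito", "Lima"],
    }
)
processed = preprocess_sketch(example)
print(processed)
```

With QuickClus none of this needs to be written by hand; the sketch only shows the kind of work that is being automated.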
Installation
python3 -m pip install QuickClus
Usage
QuickClus requires a pandas DataFrame as input, which may contain numeric variables, categorical variables, or both. If the data contains null values, QuickClus imputes them and then scales all the features; this happens automatically under the hood. The model can also be optimized automatically with optuna by calling tune_model(). Finally, QuickClus provides a summary of the characteristics of each cluster.
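from quickclus import QuickClus

# df is a pandas DataFrame with numeric and/or categorical columns
clf = QuickClus(
    umap_combine_method="intersection_union_mapper",
)
clf.fit(df)

# Cluster labels from the fitted HDBSCAN model
print(clf.hdbscan_.labels_)

# Optional: optimize the model with optuna
clf.tune_model()

# Assign the clustering results to the data, then summarize each cluster
results = clf.assing_results(df)
clf.cluster_summary(results)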
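The usage example above assumes that a dataframe named df already exists. A minimal sketch of one way to build such an input with pandas and NumPy follows; the helper make_example_frame, the column names, and the synthetic values are all invented for illustration and are not part of QuickClus.

```python
import numpy as np
import pandas as pd


def make_example_frame(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: synthetic mixed-type data with some nulls."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {
            "age": rng.normal(40, 12, n_rows).round(),
            "income": rng.lognormal(10, 0.5, n_rows),
            "segment": rng.choice(["retail", "business", "public"], n_rows),
            "channel": rng.choice(["web", "store"], n_rows),
        }
    )
    # Blank out a fixed pattern of rows so the frame contains nulls
    # in both a numeric and a categorical column.
    df.loc[df.index % 20 == 0, "income"] = np.nan
    df.loc[df.index % 25 == 0, "segment"] = None
    return df


df = make_example_frame()
print(df.dtypes)
print(df.isna().sum())
```

A frame like this, with numeric and categorical columns and a few nulls, is the kind of input that can be passed to clf.fit(df).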
Documentation
https://quickclus.readthedocs.io/
Examples
Notebooks with examples of use
References
@article{mcinnes2018umap-software,
title={UMAP: Uniform Manifold Approximation and Projection},
author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
journal={The Journal of Open Source Software},
volume={3},
number={29},
pages={861},
year={2018}
}
@article{mcinnes2017hdbscan,
title={hdbscan: Hierarchical density based clustering},
author={McInnes, Leland and Healy, John and Astels, Steve},
journal={The Journal of Open Source Software},
volume={2},
number={11},
pages={205},
year={2017}
}