
UMAP + HDBSCAN for numeric and/or categorical variables


QuickClus

QuickClus is a Python module for clustering categorical and numerical data using UMAP and HDBSCAN. It handles numerical and categorical columns, including columns with null values, simply and quickly. Null-value imputation, scaling and transformation of numerical variables, and the combination of categorical variables are all performed automatically.

Installation

python3 -m pip install QuickClus

Usage

QuickClus takes a pandas DataFrame as input, which may contain numeric variables, categorical variables, or both. If the data contain null values, QuickClus imputes them and then scales all the features automatically. The model can also be tuned automatically with Optuna by calling tune_model(). Finally, QuickClus provides a summary of the characteristics of each cluster.
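To make the expected input concrete, here is a minimal sketch of a mixed-type DataFrame with null values, built with pandas and NumPy only. The column names and values are hypothetical, chosen purely for illustration, and the frame is far too small for meaningful clustering; it only shows the shape of data described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric and two categorical columns,
# each containing one missing value.
df = pd.DataFrame(
    {
        "age": [23, 35, np.nan, 51, 42, 29],
        "income": [28_000.0, 54_000.0, 61_000.0, np.nan, 47_500.0, 33_000.0],
        "city": ["Madrid", "Lisbon", None, "Madrid", "Porto", "Lisbon"],
        "plan": pd.Categorical(["basic", "pro", "pro", None, "basic", "basic"]),
    }
)

# Split the columns by kind and count the nulls in each one.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
null_counts = df.isna().sum().to_dict()

print(numeric_cols)
print(categorical_cols)
print(null_counts)
```

A real dataset with this layout (numeric columns, categorical columns, scattered nulls) is the kind of input that can be passed as df in the usage example.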

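from quickclus import QuickClus

# df is a pandas DataFrame with numeric and/or categorical columns
clf = QuickClus(
    umap_combine_method="intersection_union_mapper",
)

# Fit the model (imputation and scaling happen automatically)
clf.fit(df)

# Cluster labels from the fitted HDBSCAN model
print(clf.hdbscan_.labels_)

# Optionally tune the model with Optuna
clf.tune_model()

# Assign the clustering results to the data
results = clf.assing_results(df)

# Summarize the characteristics of each cluster
clf.cluster_summary(results)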
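The labels printed above follow the HDBSCAN convention, in which points treated as noise receive the label -1. As a small sketch of how such a label array can be inspected, the helper below counts clusters and noise points using only the standard library. The function name summarize_labels and the example labels are hypothetical and are not part of QuickClus.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters and noise points in an HDBSCAN-style label sequence."""
    counts = Counter(int(label) for label in labels)
    # HDBSCAN marks noise points with the label -1.
    n_noise = counts.pop(-1, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "sizes": dict(sorted(counts.items())),
    }


# Hypothetical labels standing in for clf.hdbscan_.labels_
example_labels = [0, 0, 1, -1, 1, 1, 2, -1, 0, 2]
summary = summarize_labels(example_labels)
print(summary)
```

With a fitted model, the same helper could be called as summarize_labels(clf.hdbscan_.labels_), since it accepts any iterable of integer labels.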

Documentation

https://quickclus.readthedocs.io/

Examples

Notebooks with examples of use

References

@article{mcinnes2018umap-software,
  title={UMAP: Uniform Manifold Approximation and Projection},
  author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
  journal={The Journal of Open Source Software},
  volume={3},
  number={29},
  pages={861},
  year={2018}
}
@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}
