UMAP + HDBSCAN for numeric and/or categorical variables

Project description

QuickClus

QuickClus is a Python module for clustering numerical and categorical data using UMAP and HDBSCAN. It accepts numerical and categorical variables, including ones with null values, and is designed to be simple and fast to use. Imputation of null values, scaling and transformation of numerical variables, and the combination of categorical variables are all performed automatically.
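
To make the automated preprocessing more concrete, the following is a minimal sketch of median imputation and standard scaling for numeric columns, plus a placeholder level for missing categorical values, written with plain pandas. It only illustrates the general idea: the strategies QuickClus actually applies are not specified here, and the column names, the `preprocess` helper, and the `"missing"` token are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with nulls in both a numeric and a categorical column.
raw = pd.DataFrame({
    "income": [30.0, np.nan, 50.0, 70.0],
    "segment": ["a", "b", None, "a"],
})


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Impute nulls and scale numeric columns (illustrative, not QuickClus code)."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Fill numeric nulls with the column median, then standardize.
            filled = out[col].fillna(out[col].median())
            std = filled.std(ddof=0)
            out[col] = (filled - filled.mean()) / std if std > 0 else 0.0
        else:
            # Treat missing categories as their own level.
            out[col] = out[col].fillna("missing")
    return out


clean = preprocess(raw)
print(clean)
```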

Installation

python3 -m pip install QuickClus

Usage

QuickClus takes a pandas DataFrame as input, which may contain numeric variables, categorical variables, or both. If the data contains null values, QuickClus imputes them and then scales all the features, with no manual steps required. The model can also be tuned automatically with Optuna by calling tune_model(). Finally, QuickClus provides a summary of the characteristics of each cluster.
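
As a minimal sketch of a suitable input, the following builds a small dataframe that mixes numeric and categorical columns and contains null values. The column names and values are hypothetical and only illustrate the expected shape of `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two numeric and two categorical columns,
# with a few missing values left in place on purpose.
df = pd.DataFrame({
    "age": [23, 45, np.nan, 31, 52, 38],
    "income": [28000.0, 61000.0, 43000.0, np.nan, 75000.0, 52000.0],
    "city": ["Madrid", "Lisbon", "Madrid", None, "Porto", "Lisbon"],
    "plan": ["basic", "premium", "basic", "basic", "premium", "premium"],
})

# Inspect column types and null counts before clustering.
print(df.dtypes)
print(df.isna().sum())
```

A real dataset will generally need many more rows than this toy example for density-based clustering to find meaningful groups.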

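from quickclus import QuickClus

# df is a pandas DataFrame with numeric and/or categorical columns
clf = QuickClus(
    umap_combine_method="intersection_union_mapper",
)

# Fit the model on the dataframe
clf.fit(df)

# Cluster label assigned to each row
print(clf.hdbscan_.labels_)

# Optionally optimize the model with Optuna
clf.tune_model()

# Get the clustering results for df
results = clf.assing_results(df)

# Summarize the characteristics of each cluster
clf.cluster_summary(results)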
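
HDBSCAN marks points it cannot assign to any cluster with the label `-1`. The sketch below shows one way to count cluster sizes and profile clusters by hand with pandas. The `labels` array and the column names are made-up stand-ins for `clf.hdbscan_.labels_` and real data, and the output of `cluster_summary` may look different.

```python
import numpy as np
import pandas as pd

# Stand-in for clf.hdbscan_.labels_; -1 is HDBSCAN's label for noise points.
labels = np.array([0, 0, 1, -1, 1, 0])

# Hypothetical data aligned row by row with the labels above.
data = pd.DataFrame({
    "age": [23, 25, 51, 70, 49, 24],
    "plan": ["basic", "basic", "premium", "basic", "premium", "premium"],
})
data["cluster"] = labels

# Cluster sizes, including the noise bucket.
sizes = data["cluster"].value_counts().sort_index()

# Per-cluster profile, excluding noise: mean of the numeric column and
# most frequent value of the categorical column.
profile = (
    data[data["cluster"] != -1]
    .groupby("cluster")
    .agg(
        mean_age=("age", "mean"),
        top_plan=("plan", lambda s: s.mode().iloc[0]),
    )
)

print(sizes)
print(profile)
```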

Examples

TO DO

References

@article{mcinnes2018umap-software,
  title={UMAP: Uniform Manifold Approximation and Projection},
  author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
  journal={The Journal of Open Source Software},
  volume={3},
  number={29},
  pages={861},
  year={2018}
}
@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}
