A Python library for clustering operations: evaluation and meta-feature generation.

Project description

The PyClustKit Module: All about clustering in a single Python Module!

The pyclustkit module is built on top of various libraries to enable many clustering operations. Currently, the module is built for clustering evaluation and meta-learning.

Installation Instructions

PyClustKit is available for download from PyPI:

pip install pyclustkit

Useful Links

Usage Examples

Calculating Internal Cluster Validity Indices (CVI)

PyClustKit comes with an evaluation suite of 46 internal validity indices. Each is implemented on top of NumPy, and the module speeds up the computation of multiple CVIs by keeping track of the processes they share.

from pyclustkit.eval import CVIToolbox

# X: the data matrix; y: the cluster label of each sample (both supplied by the caller).
ct = CVIToolbox(X, y)
ct.calculate_icvi(cvi=["dunn", "silhouette"])  # if no CVIs are specified, it defaults to 'all'
print(ct.cvi_results)
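To illustrate why sharing work across indices helps, here is a minimal sketch that computes the Dunn index and the mean silhouette width from a single cached pairwise distance matrix. It is not PyClustKit's implementation: the `SharedDistances` helper and both functions are hypothetical names made up for this example, and it uses only the standard library instead of NumPy to stay self-contained.

```python
import math


class SharedDistances:
    """Hypothetical helper: builds the pairwise distance matrix once, on first use."""

    def __init__(self, X):
        self.X = [tuple(p) for p in X]
        self._dist = None
        self.n_computations = 0  # how many times the matrix was actually built

    @property
    def dist(self):
        if self._dist is None:
            self.n_computations += 1
            self._dist = [[math.dist(a, b) for b in self.X] for a in self.X]
        return self._dist


def dunn_index(shared, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Assumes at least two clusters and at least one cluster with two or more points.
    """
    d = shared.dist
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    min_between = min(d[i][j] for i, j in pairs if labels[i] != labels[j])
    max_within = max(d[i][j] for i, j in pairs if labels[i] == labels[j])
    return min_between / max_within


def silhouette(shared, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention.

    Assumes at least two clusters and no cluster made only of identical points.
    """
    d = shared.dist
    n = len(labels)
    members = {}
    for i, c in enumerate(labels):
        members.setdefault(c, []).append(i)
    total = 0.0
    for i in range(n):
        own = members[labels[i]]
        if len(own) == 1:
            continue
        # a: mean distance to the other points of the same cluster
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(d[i][j] for j in idx) / len(idx)
            for c, idx in members.items()
            if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
y = [0, 0, 1, 1]

shared = SharedDistances(X)
results = {"dunn": dunn_index(shared, y), "silhouette": silhouette(shared, y)}
print(results)
print("distance matrix built", shared.n_computations, "time(s)")
```

Both indices read the same `shared.dist` matrix, so the quadratic-cost distance computation runs once no matter how many indices are requested. The saving grows with the number of indices that depend on the same intermediate result.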

Meta Learning

Meta-Feature Extraction


Citing This Work

List of Implemented CVI with citations

R does not compute GDI 61, 62 and 63 because of the Hausdorff distance. Currently the collection consists of the following internal CVIs:
  1. ball_hall: G. H. Ball and D. J. Hall. Isodata: A novel method of data analysis and pattern classification. Menlo Park: Stanford Research Institute. (NTIS No. AD 699616),1965.

  2. banfeld_raftery: J.D. Banfield and A.E. Raftery. Model-based gaussian and non-gaussian clustering. Biometrics, 49:803–821, 1993.

  3. c_index: Hubert, Lawrence & Levin, Joel. (1976). A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin. 83. 1072-1080. 10.1037/0033-2909.83.6.1072.

  4. CDbw : Halkidi, M., & Vazirgiannis, M. (2008). A density-based cluster validity approach using multi-representatives. Pattern Recognit. Lett., 29, 773-786.

  5. det_ratio : A. J. Scott and M. J. Symons. Clustering methods based on likelihood ratio criteria. Biometrics, 27:387–397, 1971.

  6. Dunn Index : J. Dunn. Well separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4:95–104, 1974.

  7. GDI [11,21,31,41,51,61][12,22,32,42,52,62][13,23,33,43,53,63]: J. C. Bezdek and N. R. Pal. Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 28, no.3:301–315, 1998.

  8. ksq_detw: F. H. B. Marriot. Practical problems in a method of cluster analysis. Biometrics, 27:456–460, 1975.

  9. log_det_ratio: Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.

  10. log_ss_ratio: J. A. Hartigan. Clustering algorithms. New York: Wiley, 1975.

  11. McClain_Rao: J. O. McClain and V. R. Rao. Clustisz: A program to test for the quality of clustering of a set of objects. Journal of Marketing Research, 12:456–460, 1975.

  12. trace_w Index

  13. Friedman-Rudin 1 Index

  14. Friedman-Rudin 2 Index

  15. S_dbw: M. Halkidi and M. Vazirgiannis, "Clustering validity assessment: finding the optimal partitioning of a data set," Proceedings 2001 IEEE International Conference on Data Mining.

  16. sd_dis Index: Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.

  17. sd_scat Index: Halkidi et al. On clustering validation techniques. J. Intell. Inf. Syst., 2001.

  18. pbm: Bandyopadhyay S. Pakhira M. K. and Maulik U. Validity index for crisp and fuzzy clusters. Pattern Recognition, 2004.

  19. ratkowsky_lance

  20. ray_turi: Ray et al. Determination of number of clusters in k-means clustering and application in colour image segmentation. 4th International Conference on Advances in Pattern Recognition and Digital Techniques, 1999.

  21. wemmert_gancarski

  22. xie_beni: X.L. Xie and G. Beni. A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991.

  23. banfeld_raftery

  24. trace_wib

  25. log_det_ratio

  26. point_biserial

  27. calinski_harabasz

  28. silhouette

  29. davies_bouldin

  30. scott_symons
