Efficient implementation of DBCV with a k-dimensional tree

k-DBCV

k-DBCV is an efficient Python implementation of the density-based clustering validation (DBCV) score proposed by Moulavi et al. (2014).

Getting Started

Dependencies

  • SciPy
  • NumPy

Installation

k-DBCV can be installed via pip:

pip install kDBCV

Usage

The scoring examples below also use the following libraries to generate and cluster data:

  • scikit-learn
  • ClustSim

For visualization:

  • matplotlib

DBCV Score

Simple Scenario

The half-moons dataset, simulated with scikit-learn, is scored as follows:

score = DBCV_score(X,labels)

Output: 0.5068928345037831
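
For readers who want to build a comparable X and labels, the following is a minimal sketch using scikit-learn. The sample size, noise level, and DBSCAN parameters are illustrative assumptions, not the settings behind the score reported above, so the resulting score will differ. It also assumes that noise points carry the label -1, as scikit-learn's DBSCAN produces.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Simulate two interleaving half moons (parameter values are assumptions).
X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)

# Cluster with DBSCAN; points it cannot assign are labeled -1 (noise).
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Count clusters, ignoring the noise label.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int((labels == -1).sum())
print(f"{n_clusters} clusters, {n_noise} noise points")
```

X (an array of shape (n_points, 2)) and labels (one integer per point) can then be passed to the scoring call shown above.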

Scenario II

A larger dataset of clusters simulated with Clust_Sim-SMLM is shown:

score = DBCV_score(X,labels)

Output: 0.6171526846848352

Extracting Individual Cluster Scores

k-DBCV can also return individual cluster scores, where each cluster is assigned a score without consideration for noise:

Individual Cluster Score = (separation - sparseness) / max(separation, sparseness)
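
As a worked illustration of this formula, the sketch below computes individual cluster scores from made-up separation and sparseness values, then combines them into an overall score using the size-weighted sum described by Moulavi et al. (2014), where the weights are cluster sizes divided by the total number of points including noise. The function names, the input values, and the handling of a zero denominator are assumptions for illustration and are not taken from the k-DBCV source.

```python
def cluster_validity(separation: float, sparseness: float) -> float:
    """(separation - sparseness) / max(separation, sparseness), in [-1, 1]."""
    denom = max(separation, sparseness)
    if denom == 0:
        return 0.0  # assumption: treat the degenerate case as neutral
    return (separation - sparseness) / denom


def weighted_score(scores, sizes, n_total):
    """Size-weighted sum of cluster scores; n_total includes noise points."""
    return sum(size / n_total * s for s, size in zip(scores, sizes))


# Hypothetical (separation, sparseness) pairs for two clusters.
pairs = [(0.8, 0.2), (0.3, 0.6)]
scores = [cluster_validity(sep, spa) for sep, spa in pairs]

sizes = [60, 30]   # points per cluster
n_total = 100      # 90 clustered points plus 10 noise points
overall = weighted_score(scores, sizes, n_total)
print(scores, overall)
```

The first cluster is well separated relative to its internal sparseness and scores positively; the second is sparser than it is separated and scores negatively. Noise does not enter either individual score, but it lowers the weights in the overall sum.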

By default, ind_clust_scores is set to False.

score, ind_clust_score_array = DBCV_score(X,labels, ind_clust_scores = True)

Individual cluster scores are displayed by color below:

Memory cutoff

A memory cutoff prevents attempts to score clusterings that would exceed available memory. Set the cutoff according to the machine being used; the default is a maximum of 25.0 GB. If the cutoff would be exceeded, the function returns -1 and prints an error message. To suppress these error messages, set batch_mode = True (default is False).

score = DBCV_score(X,labels, memory_cutoff = 25.0)
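
To give a feel for why such a cutoff matters, the sketch below estimates the size of a dense pairwise distance matrix in float64 and filters out -1 results after a batch of runs. This is a rough upper-bound illustration only: how k-DBCV actually estimates memory is not described here, and the helper names and example values are hypothetical.

```python
def pairwise_matrix_gb(n_points: int, bytes_per_value: int = 8) -> float:
    """Size in GB of a dense n x n matrix (assumes float64 entries)."""
    return n_points * n_points * bytes_per_value / 1e9


def within_cutoff(n_points: int, memory_cutoff_gb: float = 25.0) -> bool:
    """True if the dense-matrix estimate fits under the cutoff."""
    return pairwise_matrix_gb(n_points) <= memory_cutoff_gb


def drop_skipped(results: dict) -> dict:
    """Remove runs that returned the -1 value used when the cutoff is hit."""
    return {name: s for name, s in results.items() if s != -1}


print(pairwise_matrix_gb(50_000))  # 20.0
print(within_cutoff(60_000))       # False (28.8 GB estimate)

# Hypothetical batch of scores keyed by parameter setting.
batch = {"eps=0.1": 0.42, "eps=0.2": -1, "eps=0.3": 0.61}
print(drop_skipped(batch))
```

Because -1 is also the lower bound of the DBCV range, filtering on it could in principle discard a genuine score; in practice an exact -1 from a real clustering is unlikely.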

Relevant Citations

Density Based Cluster Validation

Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839–847 (2014)

k-DBCV implementation

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)

License

k-DBCV is released under the MIT License. See the LICENSE file for more information.

Referencing

If you use this repository, please cite Moulavi et al. as well as the following preprint:

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)

Contact

kaufmangroup.rubylab@gmail.com
