
This is the package for the DenMune clustering algorithm, published in the paper https://doi.org/10.1016/j.patcog.2020.107589

Project description

DenMune: A density-peak clustering algorithm

DenMune is a clustering algorithm that can find clusters of arbitrary size, shape, and density in two dimensions. Higher-dimensional data is first reduced to 2-D using t-SNE. The algorithm relies on a single parameter, K (the number of nearest neighbors). According to the paper, it produces robust results on various low- and high-dimensional datasets relative to several well-known clustering algorithms. Enjoy the simplicity and the power of DenMune.
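The paper describes DenMune as built on mutual nearest neighbors of size K. To give some intuition for that relation, here is a minimal sketch in plain Python. It is not the DenMune implementation and does not use the package; the function names and the toy points are made up for illustration. Two points count as mutual neighbors here only if each appears among the other's K nearest points.

```python
from math import dist


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return set(others[:k])


def mutual_neighbors(points, k):
    """Map each index to the neighbors that also list it among their own k nearest."""
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    return {i: {j for j in knn[i] if i in knn[j]} for i in range(len(points))}


# Two tight groups plus one isolated point (made-up data, for illustration only).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (2.5, 9.0)]

mutual = mutual_neighbors(points, k=2)
for i, neighbors in mutual.items():
    print(i, sorted(neighbors))
```

In this toy example, the points inside each tight group are mutual neighbors of one another, while the isolated point has none: it picks neighbors from a group, but no group member picks it back. This is only an intuition for why a mutual-neighbor relation can help separate dense regions from stray points.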


Based on the paper

Mohamed Abbas, Adel El-Zoghabi, Amin Shoukry
DenMune: Density peak based clustering using mutual nearest neighbors
In: Pattern Recognition, Elsevier,
volume 109, number 107589, January 2021
DOI: https://doi.org/10.1016/j.patcog.2020.107589

Documentation:

Documentation, including tutorials, is available at https://docs.zerobytes.one/denmune


Watch it in action

This 30-second clip shows how DenMune, a density-based algorithm, propagates:

Propagation in DenMune

How to install DenMune

Install the DenMune clustering algorithm with pip from the Python Package Index (PyPI).


From the shell run the command

pip install denmune

From a Jupyter notebook cell, run the command

!pip install denmune
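To confirm that the installation succeeded, one option is to query the installed version from Python. The sketch below uses only the standard library (Python 3.8 or later); the helper name installed_version is hypothetical and is not part of DenMune.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("denmune")
print("denmune version:", found if found else "not installed")
```

If this prints "not installed", the pip command probably ran against a different Python environment than the one running this check.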

How to use DenMune

Once DenMune is installed, you just need to import it

from denmune import DenMune
Please note that the first denmune (the package) is in lowercase, while the second (the class itself) has a capital D and M.

Interact with the algorithm

chameleon datasets

This notebook lets you interact with the algorithm in several ways:

  • choose which dataset to cluster (among 4 chameleon datasets)
  • choose the number of nearest neighbors (K) to use
  • turn noise display on or off, so you can investigate the noise detected by the algorithm
  • turn the analyzer on or off

How to run and test

  1. Launch examples in repo2docker Binder

    Simply use our repo2docker environment offered by mybinder.org, which encapsulates the algorithm and all required data in one virtual machine instance. All Jupyter notebook examples found in this repository will also be available for you to run and practice with in this repo2docker environment. Thanks, mybinder.org, you made it possible!

    Launch notebook examples in Binder

  2. Launch each Example in Kaggle workspace

    If you are a Kaggler like me, then Kaggle, the workspace where data scientists meet, should be a great place to test the algorithm.

    Dataset                      Kaggle URL
    Non-groundtruth datasets     Non-groundtruth datasets
    2D Shape datasets            2D Shapes dataset
    MNIST dataset                MNIST dataset
    Iris dataset                 Iris dataset
    The beauty of propagation    The beauty of propagation

  3. Launch each Example in Google Research, CoLab

    If you would rather test the examples one by one, here is another option: use CoLab, offered by Google Research, to run each example individually.

    Here is a list of Google CoLab URLs for using the algorithm interactively:

    Dataset                      CoLab URL
    Chameleon datasets           Chameleon dataset
    2D Shape datasets            2D Shapes dataset
    MNIST dataset                MNIST dataset
    Non-groundtruth datasets     Non-groundtruth datasets

How to cite

If you have used this codebase in a scientific publication and wish to cite it, please use the Pattern Recognition article:

Mohamed Abbas, Adel El-Zoghabi, Amin Shoukry, *DenMune: Density peak based clustering using mutual nearest neighbors*
In: Pattern Recognition, Elsevier, volume 109, number 107589.
January 2021
@article{ABBAS2021107589,
title = {DenMune: Density peak based clustering using mutual nearest neighbors},
journal = {Pattern Recognition},
volume = {109},
pages = {107589},
year = {2021},
issn = {0031-3203},
doi = {https://doi.org/10.1016/j.patcog.2020.107589},
url = {https://www.sciencedirect.com/science/article/pii/S0031320320303927},
author = {Mohamed Abbas and Adel El-Zoghabi and Amin Shoukry},
keywords = {Clustering, Mutual neighbors, Dimensionality reduction, Arbitrary shapes, Pattern recognition, Nearest neighbors, Density peak},
abstract = {Many clustering algorithms fail when clusters are of arbitrary shapes, of varying densities, or the data classes are unbalanced and close to each other, even in two dimensions. A novel clustering algorithm “DenMune” is presented to meet this challenge. It is based on identifying dense regions using mutual nearest neighborhoods of size K, where K is the only parameter required from the user, besides obeying the mutual nearest neighbor consistency principle. The algorithm is stable for a wide range of values of K. Moreover, it is able to automatically detect and remove noise from the clustering process as well as detecting the target clusters. It produces robust results on various low and high dimensional datasets relative to several known state of the art clustering algorithms.}
}

Licensing

The DenMune algorithm is 3-clause BSD licensed. Enjoy.


Task List

  • Update GitHub with the DenMune source code
  • Create repo2docker repository
  • Create pip package
  • Create CoLab shared examples
  • Create documentation
  • Create Kaggle shared examples
  • Create conda package


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

denmune-0.0.8.9.tar.gz (17.0 kB)

Uploaded Source

Built Distribution

denmune-0.0.8.9-py3-none-any.whl (13.7 kB)

Uploaded Python 3
