A 2D clustering algorithms visualization package
ClustViz
2D Clustering Algorithms Visualization
Check out ClustVizGUI, too!
The aim of ClustViz is to visualize every step of each clustering algorithm, in the case of 2D input data.
The following algorithms have been examined:
- OPTICS
- DBSCAN
- HDBSCAN
- SPECTRAL CLUSTERING
- HIERARCHICAL AGGLOMERATIVE CLUSTERING
  - single linkage
  - complete linkage
  - average linkage
  - Ward's method
- CURE
- BIRCH
- PAM
- CLARA
- CLARANS
- CHAMELEON
- CHAMELEON2
- DENCLUE
Instructions
Documentation: click here
Install with
pip install clustviz
To run the BIRCH algorithm, the open-source visualization software Graphviz is required. Install Graphviz from the official webpage (https://graphviz.gitlab.io/download/) or with Homebrew, then add its location to the PATH variable as follows (keep only the line for your platform and replace the string with the path where you installed Graphviz):
import os
# on Windows usually
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin'
# on MacOS usually
os.environ["PATH"] += os.pathsep + '/usr/local/bin'
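Before running BIRCH, it can help to confirm that the Graphviz dot executable is actually reachable through PATH. The following is a minimal sketch and not part of ClustViz: graphviz_available is a hypothetical helper name, and it assumes that finding the dot binary is a sufficient sign of a working Graphviz installation.

```python
import os
import shutil


def graphviz_available(extra_dir=None):
    """Return True if the Graphviz 'dot' executable can be found on PATH.

    extra_dir is an optional directory appended to PATH before the lookup.
    Hypothetical helper for illustration, not part of ClustViz.
    """
    if extra_dir:
        os.environ["PATH"] += os.pathsep + extra_dir
    # shutil.which searches the directories listed in PATH for the executable
    return shutil.which("dot") is not None


print("Graphviz found:", graphviz_available())
```

If this prints False, the directory added to PATH most likely does not match the actual installation folder.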
To run the CHAMELEON and CHAMELEON2 algorithms, the METIS library is required. To install it on macOS, execute the following commands (partially taken from here):
# download the file using wget (do it from the website if you prefer)
wget http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz
# uncompress it
gunzip metis-5.1.0.tar.gz
# untar it
tar -xvf metis-5.1.0.tar
# remove the tar
rm metis-5.1.0.tar
# go inside the folder
cd metis-5.1.0
# install it using make
make config shared=1
make install
# export the dll
export METIS_DLL=/usr/local/lib/libmetis.dylib
To install METIS on Windows, go to conda-metis and follow the instructions.
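The macOS commands above end by exporting METIS_DLL, so a quick way to catch a wrong path is to check that variable from Python before running CHAMELEON. This is only a sketch: metis_library_path is a hypothetical helper, not a ClustViz function, and it assumes that an existing file at that path is the METIS shared library.

```python
import os


def metis_library_path():
    """Return the path stored in METIS_DLL if it points to an existing file, else None.

    Hypothetical helper for illustration, not part of ClustViz.
    """
    path = os.environ.get("METIS_DLL")
    if path and os.path.isfile(path):
        return path
    return None


found = metis_library_path()
if found is None:
    print("METIS_DLL is unset or does not point to an existing file")
else:
    print("METIS library found at", found)
```

Note that an export in one shell session does not carry over to new sessions unless it is added to the shell's startup file.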
Usage
Let's see a basic example using OPTICS:
from clustviz.optics import OPTICS, plot_clust
from sklearn.datasets import make_blobs
# create a random dataset
X, y = make_blobs(n_samples=30, centers=4, n_features=2, cluster_std=1.8, random_state=42)
# perform OPTICS algorithm, with plotting enabled
ClustDist, CoreDist = OPTICS(X, eps=2, minPTS=3, plot=True, plot_reach=True)
# plot the final clusters
plot_clust(X, ClustDist, CoreDist, eps=2, eps_db=1.9)
For many other examples, take a look at the detailed clustviz_example notebook.
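Since ClustViz targets 2D input data, a small sanity check on the dataset before calling an algorithm can save a confusing error later. The sketch below is an assumption-laden illustration: check_2d is a hypothetical helper that is not part of ClustViz, and it only verifies that every point has exactly two coordinates.

```python
def check_2d(points):
    """Return the number of points, raising ValueError unless each has two coordinates.

    Works on any sequence of sequences, including the NumPy array
    returned by make_blobs. Hypothetical helper, not part of ClustViz.
    """
    count = 0
    for index, point in enumerate(points):
        if len(point) != 2:
            raise ValueError(
                f"point {index} has {len(point)} coordinates, expected 2"
            )
        count += 1
    if count == 0:
        raise ValueError("empty dataset")
    return count


print(check_2d([[0.0, 1.0], [2.5, -1.0], [3.0, 3.0]]))  # prints 3
```

With the dataset from the OPTICS example, check_2d(X) would be called right after make_blobs and before OPTICS.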
Repository structure
- The folder data/DOCUMENTS contains all the official papers, PowerPoint presentations and other PDFs regarding all the algorithms involved and clustering in general.
- The folder clustviz contains the scripts necessary to run the clustering algorithms.
- The notebook data/clustviz_example.ipynb lets the user run every algorithm on 2D datasets; it contains a subsection for every algorithm, with the necessary modules and functions imported and some commented lines of code which can be uncommented to run the algorithms.
- The folder docs contains the necessary files to build the documentation using Sphinx and ReadTheDocs.
- The folder tests contains pytest tests.
Credits for some algorithms
I did not write the scripts for every algorithm from scratch: in some cases I modified existing Python libraries, in others I took publicly available GitHub repositories and modified the scripts they contain. The following list gives the sources used wherever the code is not entirely my own:
- HDBSCAN https://hdbscan.readthedocs.io/en/latest/
- SPECTRAL CLUSTERING http://dx.doi.org/10.1007/s11222-007-9033-z
- BIRCH https://github.com/annoviko/pyclustering/blob/master/pyclustering/cluster/birch.py
- PAM https://github.com/SachinKalsi/kmedoids/blob/master/KMedoids.py
- CLARA https://github.com/akalino/Clustering/blob/master/clara.py
- CLARANS https://github.com/annoviko/pyclustering/blob/master/pyclustering/cluster/clarans.py
- CHAMELEON https://github.com/Moonpuck/chameleon_cluster
The other algorithms have been implemented from scratch following the respective papers. Thanks to Darius (https://github.com/dariomonici), the GUI Meister, for the help with PyQt5, used for ClustVizGUI.
Possible improvements
- add more clustering algorithms
- comment every code block and improve code quality
- pymetis doesn't work on Windows, but could be an option for macOS
- add highlights to docstrings using ``
- show type-hint aliases using Sphinx (open issue)
TravisCI path
- if Travis CI doesn't trigger, it is probably because .travis.yml isn't properly formatted. Use yamllint to correct it
- add package update
- for the deployment phase: brew install ruby, brew install travis
- added an empty conftest.py in the clustviz folder so that the tests run on Windows