Fast and explainable clustering based on sorting
Project description
======= CLASSIX
.. image:: https://codecov.io/gh/nla-group/classix/branch/master/graph/badge.svg?token=D4MQZS67H1 :target: https://codecov.io/gh/nla-group/classix :alt: codecov .. image:: https://img.shields.io/pypi/v/ClassixClustering?color=orange :target: https://pypi.org/project/ClassixClustering/ :alt: pypi .. image:: https://static.pepy.tech/badge/ClassixClustering :target: https://pypi.org/project/ClassixClustering/ :alt: Download Status .. image:: https://readthedocs.org/projects/classix/badge/?version=latest :target: https://classix.readthedocs.io/en/latest/?badge=latest :alt: Documentation Status .. image:: https://img.shields.io/badge/License-MIT-yellow.svg :target: https://github.com/nla-group/classix/blob/master/LICENSE :alt: License: MIT
CLASSIX is a contrived acronym of CLustering by Aggregation with Sorting-based Indexing and the letter X for explainability. CLASSIX clustering consists of two phases, namely a greedy aggregation phase of the sorted data into groups of nearby data points, followed by a merging phase of groups into clusters. The algorithm is controlled by two parameters, namely the distance parameter radius for the group aggregation and a minPts parameter controlling the minimal cluster size.
CLASSIX is a fast and explainable clustering algorithm based on sorting. Here are a few highlights:
- Ability to cluster low and high-dimensional data of arbitrary shape efficiently.
- Ability to detect and deal with outliers in the data.
- Ability to provide textual explanations for the generated clusters.
- Full reproducibility of all tests in the accompanying paper.
- Support of Cython compilation.
Installing and example
CLASSIX has the following dependencies for its clustering functionality:
- cython
- numpy>=1.20.0
- scipy>=1.2.1
- requests
and requires the following packages for data visualization:
- matplotlib
- pandas
To install the current CLASSIX release via PIP use:
.. code:: bash
pip install ClassixClustering
To check the CLASSIX installation you can use:
.. code:: bash
python -m pip show ClassixClustering
Download the repository via:
.. code:: bash
git clone https://github.com/nla-group/classix.git
Example usage:
.. code:: python
from sklearn import datasets
from classix import CLASSIX
# Generate synthetic data
X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1)
# Employ CLASSIX clustering
clx = CLASSIX(sorting='pca', verbose=1)
clx.fit(X)
Citation
.. code:: bibtex
@techreport{CG22b,
title = {Fast and explainable clustering based on sorting},
author = {Chen, Xinye and G\"{u}ttel, Stefan},
year = {2022},
number = {arXiv:2202.01456},
pages = {25},
institution = {The University of Manchester},
address = {UK},
type = {arXiv EPrint},
url = {https://arxiv.org/abs/2202.01456}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.