
Fast and explainable clustering based on sorting

Project description


CLASSIX is a fast and explainable clustering algorithm based on sorting. Here are a few highlights:

  • Ability to cluster low- and high-dimensional data of arbitrary shape efficiently.

  • Ability to detect and deal with outliers in the data.

  • Ability to provide textual explanations for the generated clusters.

  • Full reproducibility of all tests in the accompanying paper.

  • Support for Cython compilation.

CLASSIX is a contrived acronym for CLustering by Aggregation with Sorting-based Indexing, plus the letter X for explainability. CLASSIX clustering consists of two phases: a greedy aggregation phase, in which the sorted data points are aggregated into groups of nearby points, followed by a merging phase, in which groups are merged into clusters. The algorithm is controlled by two parameters: the distance parameter radius, which governs the group aggregation, and minPts, which controls the minimal cluster size.
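The two phases can be illustrated with a minimal NumPy/SciPy sketch. It is not the CLASSIX implementation: the normalization (centering and dividing by the median row norm), the merge threshold merge_scale * radius with merge_scale = 1.5, and the helper names aggregate, merge and classix_sketch are all assumptions made for illustration, and minPts handling is omitted. Points are sorted by their score along the first principal component; each unassigned point in sorted order starts a new group and absorbs the later unassigned points within radius of it.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def aggregate(X, radius):
    """Phase 1: greedily aggregate sorted points into groups (hypothetical helper)."""
    # Assumed normalization: center the data, divide by the median row norm
    Xc = X - X.mean(axis=0)
    scl = np.median(np.linalg.norm(Xc, axis=1))
    Xc = Xc / (scl if scl > 0 else 1.0)

    # Sort points by their score along the first principal component
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[0]
    order = np.argsort(scores)
    Xs, ss = Xc[order], scores[order]

    n = len(Xs)
    group = np.full(n, -1)   # group index of each sorted point, -1 = unassigned
    starters = []            # sorted position of each group's starting point
    for i in range(n):
        if group[i] >= 0:
            continue
        g = len(starters)
        starters.append(i)
        group[i] = g
        for j in range(i + 1, n):
            # vt[0] has unit norm, so the score gap is a lower bound on the
            # distance; once it exceeds radius, no later point can qualify
            if ss[j] - ss[i] > radius:
                break
            if group[j] < 0 and np.linalg.norm(Xs[j] - Xs[i]) <= radius:
                group[j] = g

    labels = np.empty(n, dtype=int)
    labels[order] = group    # map group labels back to the original point order
    return labels, Xs[starters]


def merge(starter_points, radius, merge_scale=1.5):
    """Phase 2: merge groups whose starting points are close (hypothetical helper)."""
    diff = starter_points[:, None, :] - starter_points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = csr_matrix((dist <= merge_scale * radius).astype(float))
    # Clusters are the connected components of the group graph
    _, group_to_cluster = connected_components(adj, directed=False)
    return group_to_cluster


def classix_sketch(X, radius, merge_scale=1.5):
    """Return a cluster label per row of X (illustrative only)."""
    group_labels, starter_points = aggregate(X, radius)
    return merge(starter_points, radius, merge_scale)[group_labels]
```

The early break in the inner loop is where sorting pays off: only a narrow window of the sorted scores is ever compared against each starting point. The dense distance matrix in merge is quadratic in the number of groups, which is acceptable for a sketch but not for large data sets.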

Installation and example

CLASSIX has the following dependencies for its clustering functionality:

  • cython

  • numpy

  • scipy

  • requests

and requires the following packages for data visualization:

  • matplotlib

  • pandas

To install the current CLASSIX release via pip, use:

pip install classixclustering

To check the CLASSIX installation you can use:

python -m pip show classixclustering

To download the repository, use:

git clone https://github.com/nla-group/classix.git

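Example usage (scikit-learn is used here only to generate synthetic data):

from sklearn import datasets
from classix import CLASSIX

# Generate 2,000,000 synthetic points in 10 dimensions, drawn from 4 Gaussian blobs
X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1)

# Cluster with CLASSIX, sorting the data by the first principal component
clx = CLASSIX(sorting='pca', verbose=1)
clx.fit(X)

# Cluster label assigned to each data point
print(clx.labels_)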

Citation

@techreport{CG22b,
  title   = {Fast and explainable clustering based on sorting},
  author  = {Chen, Xinye and G\"{u}ttel, Stefan},
  year    = {2022},
  number  = {arXiv:2202.01456},
  pages   = {25},
  institution = {The University of Manchester},
  address = {UK},
  type    = {arXiv EPrint},
  url     = {https://arxiv.org/abs/2202.01456}
}


This version

1.2.7

Download files


Source Distribution

classixclustering-1.2.7.tar.gz (32.4 kB)

File details

Details for the file classixclustering-1.2.7.tar.gz.

File metadata

  • Download URL: classixclustering-1.2.7.tar.gz
  • Size: 32.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for classixclustering-1.2.7.tar.gz
  • SHA256: c144de8750f7d8ec8b2d88150a94b3919ce69e0f0900c2346d35f86b558c8265
  • MD5: 23d776dd1e70803e680ee797df2b30aa
  • BLAKE2b-256: 8690b23f7646fe7d2fab0f8599f0e92856a19265751e418faa2ac3b45c8706e2
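A downloaded archive can be checked against the published SHA256 digest with Python's standard hashlib module. The following is a minimal sketch; the helper names file_sha256 and verify are hypothetical, and the file is assumed to sit in the current directory under its published name. For installs, pip's own hash-checking mode (--require-hashes with --hash entries in a requirements file) is the more common route.

```python
import hashlib

# Published SHA256 digest of classixclustering-1.2.7.tar.gz
EXPECTED_SHA256 = "c144de8750f7d8ec8b2d88150a94b3919ce69e0f0900c2346d35f86b558c8265"


def file_sha256(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex digest."""
    return file_sha256(path) == expected.lower()
```

Usage: verify("classixclustering-1.2.7.tar.gz") returns True only for an unmodified copy of the archive.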

