scikit-clarans

A scikit-learn compatible implementation of the CLARANS (Clustering Large Applications based on RANdomized Search) algorithm.

CLARANS bridges the clustering quality of PAM (Partitioning Around Medoids) and the speed required for large datasets. Instead of exhaustively evaluating every possible medoid swap, it samples random neighbors in the graph of candidate solutions, so it finds high-quality medoids efficiently without exploring the entire graph.
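
The randomized search can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: `clarans_sketch` and `total_cost` are hypothetical names, the metric is fixed to Euclidean, and `maxneighbor` follows the naming of the original CLARANS paper rather than a confirmed parameter of this package. Each restart begins at a random set of medoids and keeps trying random single-medoid swaps until `maxneighbor` consecutive swaps fail to lower the cost.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans_sketch(points, k, numlocal=2, maxneighbor=50, seed=0):
    """Illustrative CLARANS loop (assumption: Euclidean distance, k < n)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf

    for _ in range(numlocal):  # independent restarts
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        failures = 0
        while failures < maxneighbor:
            # A neighbor differs from the current solution in exactly one medoid.
            slot = rng.randrange(k)
            candidate = rng.randrange(n)
            if candidate in current:
                continue  # not a valid swap, draw again
            neighbor = current[:slot] + [candidate] + current[slot + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                current, current_cost = neighbor, cost
                failures = 0  # moved to a better node, reset the counter
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost

    return sorted(best_medoids), best_cost


if __name__ == "__main__":
    # Two well-separated groups of points on a line.
    demo = [(i * 0.1, 0.0) for i in range(10)] + [(10.0 + i * 0.1, 0.0) for i in range(10)]
    print(clarans_sketch(demo, k=2))
```

The sketch recomputes the full cost for every neighbor, which keeps it short but is slower than an implementation that updates the cost incrementally.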


Features

  • Scikit-learn native: Use it just like KMeans or DBSCAN, including as a drop-in step in pipelines and cross-validation.
  • Scalable: Designed to handle datasets where standard PAM/k-medoids is too slow.
  • Flexible: Choose from multiple initialization strategies (k-medoids++, build, etc.) and distance metrics (euclidean, manhattan, cosine, etc.).
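
To see why the choice of distance metric matters for a medoid-based method, here is a small standalone sketch. It does not use this package's API; `medoid`, `euclidean`, `manhattan` and `cosine` are hypothetical helper names written for illustration. It picks the point with the smallest summed distance to all other points and shows that the answer can depend on the metric.

```python
import math


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # Cosine distance; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def medoid(points, distance):
    """Index of the point with the smallest summed distance to all points."""
    return min(
        range(len(points)),
        key=lambda i: sum(distance(points[i], q) for q in points),
    )


points = [(1.0, 0.0), (10.0, 0.0), (10.0, 1.0), (0.0, 1.0)]
print("euclidean medoid:", medoid(points, euclidean))
print("cosine medoid:", medoid(points, cosine))
```

Euclidean distance favors the point that is spatially central, while cosine distance only compares directions, so the two metrics select different medoids for this data.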

Installation

Install from PyPI with pip:

pip install scikit-clarans

Or install from source:

pip install .

For development:

pip install -e ".[dev]"

Quick Start

CLARANS

from clarans import CLARANS
from sklearn.datasets import make_blobs

# 1. Create dummy data
X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

# 2. Initialize CLARANS
#    - n_clusters: 5 clusters
#    - numlocal: 3 restarts for better quality
#    - init: 'k-medoids++' for smart starting points
clarans = CLARANS(n_clusters=5, numlocal=3, init='k-medoids++', random_state=42)

# 3. Fit
clarans.fit(X)

# 4. Results
print("Medoid Indices:", clarans.medoid_indices_)
print("Labels:", clarans.labels_)

FastCLARANS

FastCLARANS implements the faster variant from Schubert & Rousseeuw (2021). It evaluates swaps with all k medoids simultaneously using FastPAM1 delta formulas, exploring k edges of the search graph in the time CLARANS explores one:

from clarans import FastCLARANS

# FastCLARANS computes distances on-the-fly (memory efficient)
# and samples 2.5% of non-medoid points per iteration
fast_model = FastCLARANS(n_clusters=5, numlocal=3, random_state=42)
fast_model.fit(X)

Key differences from CLARANS:

  • Samples only non-medoid candidates (not medoid-candidate pairs)
  • Evaluates swap with all k medoids at once (O(k) speedup per evaluation)
  • Memory efficient: O(n) instead of O(n²)
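
The "all k medoids at once" idea can be sketched in pure Python. This is an illustration in the spirit of the FastPAM1 delta computation, not the library's code: `swap_deltas`, `nearest_two` and `total_deviation` are hypothetical names, the metric is fixed to Euclidean, and the nearest/second-nearest cache is rebuilt on every call for brevity. For one candidate point, a single pass over the data yields the cost change of swapping that candidate with each of the k medoids.

```python
import math


def total_deviation(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def nearest_two(point, points, medoids):
    """Return (slot of nearest medoid, distance to it, distance to second nearest)."""
    slot, d_near, d_second = -1, math.inf, math.inf
    for i, m in enumerate(medoids):
        d = math.dist(point, points[m])
        if d < d_near:
            slot, d_near, d_second = i, d, d_near
        elif d < d_second:
            d_second = d
    return slot, d_near, d_second


def swap_deltas(points, medoids, candidate):
    """Cost change of replacing each medoid with `candidate` (a non-medoid index)."""
    k = len(medoids)
    cache = [nearest_two(p, points, medoids) for p in points]
    # The candidate becomes a medoid, so its own cost drops to zero in every swap.
    deltas = [-cache[candidate][1]] * k
    for o, p in enumerate(points):
        if o == candidate:
            continue
        d_oj = math.dist(p, points[candidate])
        n, d_n, d_s = cache[o]
        # If the nearest medoid of o is removed, o falls back to the
        # candidate or to its second-nearest medoid, whichever is closer.
        deltas[n] += min(d_oj, d_s) - d_n
        # If any other medoid is removed, o only moves when the candidate is closer.
        if d_oj < d_n:
            for i in range(k):
                if i != n:
                    deltas[i] += d_oj - d_n
    return deltas


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (9.0, 0.0), (2.0, 1.0)]
deltas = swap_deltas(points, medoids=[0, 2, 4], candidate=5)
best_slot = min(range(len(deltas)), key=deltas.__getitem__)
print("deltas:", deltas, "best slot to replace:", best_slot)
```

A negative delta means the swap lowers the total deviation. Because distances are computed as needed and only per-point nearest and second-nearest values are cached, the sketch never builds a full pairwise distance matrix.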

Examples

This repository includes a number of runnable examples in the examples/ folder showing common usage patterns and integrations, plus a Jupyter notebook (examples/clarans_examples.ipynb) with many interactive recipes. Run any example with:

python examples/01_quick_start.py

Documentation

For full API reference and usage guides, please see the Documentation.

Contributing

Contributions are welcome! Please check out CONTRIBUTING.md for guidelines.

Citation

If you use scikit-clarans in your research, please cite:

@software{scikit_clarans,
  author       = {Nguyen, Ngoc Thien},
  title        = {scikit-clarans: A Python Library for CLARANS Clustering},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18366801},
  url          = {https://github.com/ThienNguyen3001/scikit-clarans}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
