scikit-clarans

A scikit-learn compatible implementation of the CLARANS (Clustering Large Applications based on RANdomized Search) algorithm.

CLARANS bridges the gap between the high clustering quality of PAM (Partitioning Around Medoids) and the speed needed for large datasets. Instead of exhaustively evaluating every possible medoid swap, it samples swaps at random, so it finds high-quality medoids efficiently without exploring the entire graph of solutions.
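To make the randomized search concrete, here is a minimal standard-library sketch of the idea. It is not the package's implementation: the function names are hypothetical, `maxneighbor` is borrowed from the original CLARANS paper and is an assumption here, and the cost is recomputed from scratch on every step for readability rather than speed.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans_sketch(points, k, numlocal=3, maxneighbor=50, seed=0):
    """Illustrative randomized medoid search (hypothetical helper)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf

    for _ in range(numlocal):  # independent restarts
        current = rng.sample(range(n), k)  # random node of the search graph
        current_cost = total_cost(points, current)
        failures = 0
        while failures < maxneighbor:
            # A neighbor differs from the current node in exactly one medoid.
            slot = rng.randrange(k)
            candidate = rng.randrange(n)
            if candidate in current:
                continue  # not a valid swap, draw again
            neighbor = current.copy()
            neighbor[slot] = candidate
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                failures = 0  # improvement found: reset the counter
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = sorted(current), current_cost

    return best_medoids, best_cost


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 5.2), (5.2, 5.0)]
medoids, cost = clarans_sketch(points, k=2, maxneighbor=20)
print("Medoid indices:", medoids, "cost:", round(cost, 3))
```

Each restart stops once `maxneighbor` consecutive random swaps fail to improve the cost, which is what lets the search skip most of the solution graph.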


Features

  • Scikit-Learn Native: Use it just like KMeans or DBSCAN. Drop-in compatibility for pipelines and cross-validation.
  • Scalable: Designed to handle datasets where standard PAM/k-medoids is too slow.
  • Flexible: Choose from multiple initialization strategies (k-medoids++, build, etc.) and distance metrics (euclidean, manhattan, cosine, etc.).
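As a rough illustration of what a "smart" initialization such as k-medoids++ can look like, the sketch below applies k-means++-style seeding restricted to data points: each new medoid is drawn with probability proportional to its squared distance from the nearest medoid chosen so far. This is an assumption about the general technique, not a description of the package's code, and `kmedoids_pp_seed` is a hypothetical name.

```python
import math
import random


def kmedoids_pp_seed(points, k, seed=0):
    """k-means++-style seeding over data points (illustrative only)."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of points")
    rng = random.Random(seed)
    medoids = [rng.randrange(n)]
    # Squared distance from each point to its nearest chosen medoid.
    weights = [math.dist(p, points[medoids[0]]) ** 2 for p in points]
    while len(medoids) < k:
        if sum(weights) == 0:
            # Every remaining point coincides with a medoid: pick uniformly.
            remaining = [i for i in range(n) if i not in medoids]
            choice = rng.choice(remaining)
        else:
            # Chosen medoids have weight 0, so they are not drawn again.
            choice = rng.choices(range(n), weights=weights, k=1)[0]
        medoids.append(choice)
        weights = [
            min(w, math.dist(p, points[choice]) ** 2)
            for w, p in zip(weights, points)
        ]
    return medoids


points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.3)]
print(kmedoids_pp_seed(points, k=3, seed=1))
```

Seeding like this tends to spread the initial medoids across the data, which usually gives the subsequent search a better starting point than uniform random selection.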

Installation

Install from PyPI with pip:

pip install scikit-clarans

Or install from source, from the root of a cloned repository:

pip install .

For development, install in editable mode with the dev extras:

pip install -e ".[dev]"

Quick Start

CLARANS

from clarans import CLARANS
from sklearn.datasets import make_blobs

# 1. Create dummy data
X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

# 2. Initialize CLARANS
#    - n_clusters: 5 clusters
#    - numlocal: 3 restarts for better quality
#    - init: 'k-medoids++' for smart starting points
clarans = CLARANS(n_clusters=5, numlocal=3, init='k-medoids++', random_state=42)

# 3. Fit
clarans.fit(X)

# 4. Results
print("Medoid Indices:", clarans.medoid_indices_)
print("Labels:", clarans.labels_)

FastCLARANS

FastCLARANS implements the faster variant from Schubert & Rousseeuw (2021). It evaluates swaps with all k medoids simultaneously using FastPAM1 delta formulas, exploring k edges of the search graph in the time CLARANS explores one:

from clarans import FastCLARANS

# FastCLARANS computes distances on-the-fly (memory efficient)
# and samples 2.5% of non-medoid points per iteration
fast_model = FastCLARANS(n_clusters=5, numlocal=3, random_state=42)
fast_model.fit(X)

Key differences from CLARANS:

  • Samples only non-medoid candidates (not medoid-candidate pairs)
  • Evaluates each candidate's swap against all k medoids at once (O(k) speedup per evaluation)
  • Memory efficient: O(n) memory instead of O(n²), because distances are computed on the fly
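The "all k medoids at once" evaluation can be sketched as follows. This is a standard-library illustration of the FastPAM1-style delta computation, not the package's code; `nearest_two` and `swap_deltas` are hypothetical names, and the nearest and second-nearest distances are recomputed on every call for clarity, whereas a real implementation would presumably cache them. It assumes at least two medoids.

```python
import math


def nearest_two(points, medoids):
    """Per point: (position of nearest medoid, d_nearest, d_second)."""
    info = []
    for p in points:
        dists = sorted((math.dist(p, points[m]), i) for i, m in enumerate(medoids))
        (d1, i1), (d2, _) = dists[0], dists[1]
        info.append((i1, d1, d2))
    return info


def swap_deltas(points, medoids, candidate):
    """Cost change of swapping `candidate` in for each medoid, in one pass."""
    info = nearest_two(points, medoids)
    # Start from the loss of removing each medoid: its points fall back
    # to their second-nearest medoid.
    deltas = [0.0] * len(medoids)
    for i1, d1, d2 in info:
        deltas[i1] += d2 - d1
    shared = 0.0  # gain that applies whichever medoid is removed
    for p, (i1, d1, d2) in zip(points, info):
        d_c = math.dist(p, points[candidate])
        if d_c < d1:
            shared += d_c - d1       # point moves to the candidate
            deltas[i1] += d1 - d2    # so the removal loss is cancelled
        elif d_c < d2:
            deltas[i1] += d_c - d2   # candidate replaces the fallback medoid
    return [d + shared for d in deltas]


points = [(0.0, 0.0), (1.0, 0.5), (0.5, 2.0), (4.0, 4.0),
          (5.0, 4.5), (4.5, 6.0), (9.0, 1.0)]
medoids = [0, 3, 6]
print(swap_deltas(points, medoids, candidate=4))
```

One pass over the points yields the cost change for all k possible swaps with the candidate, so the best of them can be picked with `min(...)`; a negative value means that swap improves the clustering.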

Examples

This repository includes runnable examples in the examples/ folder that cover common usage patterns and integrations, plus a Jupyter notebook (examples/clarans_examples.ipynb) with many interactive recipes. Run any example with:

python examples/01_quick_start.py

Documentation

For full API reference and usage guides, please see the Documentation.

Contributing

Contributions are welcome! Please check out CONTRIBUTING.md for guidelines.

Citation

If you use scikit-clarans in your research, please cite:

@software{scikit_clarans,
  author       = {Nguyen, Ngoc Thien},
  title        = {scikit-clarans: A Python Library for CLARANS Clustering},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.18366801},
  url          = {https://github.com/ThienNguyen3001/scikit-clarans}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
