
Equilibrium K-Means (EKMeans) clustering algorithms compatible with scikit-learn


sklekmeans - Equilibrium K-Means for scikit-learn


sklekmeans provides batch and mini-batch implementations of the Equilibrium K-Means (EKMeans) clustering algorithm. The method introduces an equilibrium weighting scheme that can yield improved robustness on imbalanced datasets compared to standard k-means. The API is compatible with sklearn estimators.

Features

  • Drop-in scikit-learn compatible estimators: EKMeans, MiniBatchEKMeans, SSEKM, MiniBatchSSEKM (semi-supervised).
  • Supports Euclidean and Manhattan distances.
  • Heuristic alpha selection via alpha='dvariance' (default).
  • Mini-batch variant with accumulation or online update modes.
  • Soft memberships (membership) and equilibrium weights (W_).
  • Semi-supervised learning via a prior matrix (prior_matrix, shape (n_samples, n_clusters)), with supervision strength theta (default theta='auto' = |N|/|S|).
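For readers unfamiliar with the two distance options listed above, the following is a minimal NumPy sketch of how Euclidean and Manhattan point-to-center distances differ. It does not use sklekmeans and does not show how the library selects a metric; the helper name `pairwise_distances_to_centers` and its `metric` argument are hypothetical and exist only for this illustration.

```python
import numpy as np


def pairwise_distances_to_centers(X, centers, metric="euclidean"):
    """Return an (n_samples, n_clusters) matrix of point-to-center distances.

    Hypothetical helper for illustration only; not part of sklekmeans.
    """
    # Broadcast to shape (n_samples, n_clusters, n_features).
    diff = X[:, None, :] - centers[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric!r}")


X = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0], [3.0, 0.0]])

d_euc = pairwise_distances_to_centers(X, centers, "euclidean")
d_man = pairwise_distances_to_centers(X, centers, "manhattan")
print(d_euc)
print(d_man)
```

The point (3, 4) is nearest to the second center under both metrics, but its distance to the first center is 5 under the Euclidean metric and 7 under the Manhattan metric, so the choice of metric changes how strongly distant centers are penalized.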

Installation

The package is available on PyPI. Install the base package:

pip install sklekmeans

Optional extras:

  • With numba acceleration (recommended for speed):
pip install "sklekmeans[speed]"

From source (latest main):

  • Basic installation
git clone https://github.com/ydcnanhe/sklearn-ekmeans.git
cd sklearn-ekmeans
pip install .
  • Or in editable mode
pip install -e .
  • With numba acceleration
pip install -e ".[speed]"
  • Development tools (tests, lint):
pip install -e ".[dev]"
  • Docs build dependencies:
pip install -e ".[docs]"
  • Everything (dev + docs + speed):
pip install -e ".[all]"

Quick Start

from sklekmeans import EKMeans
import numpy as np

X = np.random.rand(200, 2)
ekm = EKMeans(n_clusters=3, random_state=0).fit(X)
print(ekm.cluster_centers_)
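Because the method targets imbalanced data, uniform random points are not a very telling input. The sketch below, using NumPy only, generates a toy dataset whose clusters differ greatly in size. The cluster sizes, centers, and spread are arbitrary values chosen for illustration, not settings taken from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: three Gaussian blobs with a 50:5:1 size ratio.
sizes = [500, 50, 10]
centers = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])

# Draw each blob around its center with standard deviation 0.5.
X = np.vstack([
    center + 0.5 * rng.standard_normal((n, 2))
    for center, n in zip(centers, sizes)
])

# Ground-truth labels, useful for checking the clustering afterwards.
y_true = np.repeat(np.arange(len(sizes)), sizes)

print(X.shape, np.bincount(y_true))
```

The resulting X can be passed to fit in place of the uniform random data, and y_true can be compared with the predicted clusters.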

Mini-batch variant with multiple initializations and selection of the best run:

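import numpy as np
from sklekmeans import MiniBatchEKMeans

# Synthetic data, larger than one batch so that mini-batching is exercised
X = np.random.rand(1000, 2)
mb = MiniBatchEKMeans(n_clusters=3, batch_size=256, max_epochs=20, n_init=5, random_state=0)
mb.fit(X)
print(mb.cluster_centers_)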

Semi-supervised variant (SSEKM)

Use prior_matrix to inject partial labels or weak supervision. Unlabeled rows are all zeros; labeled rows provide per-class probabilities (e.g., one-hot).

from sklekmeans import SSEKM
import numpy as np

X = np.random.rand(100, 2)
K = 3
prior = np.zeros((X.shape[0], K))
prior[:10, 0] = 1.0  # first 10 samples known to be in class 0

model = SSEKM(n_clusters=K, theta='auto', random_state=0)
model.fit(X, prior_matrix=prior)
print(model.cluster_centers_)
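When partial labels arrive as an integer vector rather than a matrix, a prior matrix in the layout described above (all-zero rows for unlabeled samples, one-hot rows for labeled ones) can be built with a few lines of NumPy. This is a sketch under stated assumptions: the helper `build_prior_matrix` is hypothetical and not part of sklekmeans, and marking unlabeled samples with -1 is a convention borrowed from scikit-learn's semi-supervised estimators, not something this project prescribes.

```python
import numpy as np


def build_prior_matrix(labels, n_clusters, unlabeled=-1):
    """Turn a partial label vector into an (n_samples, n_clusters) prior matrix.

    Hypothetical helper for illustration only; not part of sklekmeans.
    Rows for unlabeled samples stay all zeros; labeled rows are one-hot.
    """
    labels = np.asarray(labels)
    prior = np.zeros((labels.shape[0], n_clusters))
    known = labels != unlabeled
    # Set a single 1.0 in each labeled row, in the column of its class.
    prior[np.flatnonzero(known), labels[known]] = 1.0
    return prior


# Assumed convention: -1 marks an unlabeled sample.
labels = np.array([0, -1, 2, -1, 1])
prior = build_prior_matrix(labels, n_clusters=3)
print(prior)
```

The returned array has the shape (n_samples, n_clusters) expected for prior_matrix, so it can be passed to fit as in the example above.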

Documentation

The latest HTML documentation is hosted on GitHub Pages:

ydcnanhe.github.io/sklearn-ekmeans

If the link returns a 404, the docs CI build may still be running; try again once it finishes.

PyPI project page: https://pypi.org/project/sklekmeans/

Build and publish (maintainers)

Local build of artifacts:

python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*

Publishing to PyPI is automated via GitHub Actions (Trusted Publishing). See PUBLISHING.md.

References

  • [1] Y. He. An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data, IEEE Transactions on Fuzzy Systems, 2025.
  • [2] Y. He. Semi-supervised equilibrium K-means for imbalanced data clustering, Knowledge-Based Systems, p.113990, 2025.
  • [3] Y. He. Imbalanced Data Clustering Using Equilibrium K-Means, arXiv, 2024.

License

BSD 3-Clause

