
Equilibrium K-Means (EKMeans) clustering algorithms compatible with scikit-learn


sklekmeans - Equilibrium K-Means for scikit-learn

Unit Tests codecov docs PyPI version Python versions License: BSD-3-Clause

sklekmeans provides batch and mini-batch implementations of the Equilibrium K-Means (EKMeans) clustering algorithm. The method introduces an equilibrium weighting scheme that can yield improved robustness on imbalanced datasets compared to standard k-means. The API is compatible with sklearn estimators.

Features

  • Drop-in scikit-learn compatible estimators: EKMeans, MiniBatchEKMeans, SSEKM, MiniBatchSSEKM (semi-supervised).
  • Supports Euclidean and Manhattan distances.
  • Heuristic alpha selection via alpha='dvariance' (default).
  • Mini-batch variant with accumulation or online update modes.
  • Soft memberships (membership) and equilibrium weights (W_).
  • Semi-supervised learning via a prior matrix (prior_matrix, shape (n_samples, n_clusters)), with supervision strength theta (default theta='auto' = |N|/|S|).

Installation

The package is available on PyPI. Install the base package:

pip install sklekmeans

Optional extras:

  • With numba acceleration (recommended for speed):
pip install "sklekmeans[speed]"
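
To confirm that the optional acceleration dependency is present after installing the extra, a small check like the one below may help. This is a minimal sketch using only the Python standard library; `has_numba` is a hypothetical helper name, not part of sklekmeans, and it only tests whether numba is importable, not whether sklekmeans actually uses it.

```python
from importlib.util import find_spec


def has_numba() -> bool:
    """Return True if the numba package can be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return find_spec("numba") is not None


print("numba available:", has_numba())
```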

From source (latest main):

  • Basic installation
git clone https://github.com/ydcnanhe/sklearn-ekmeans.git
cd sklearn-ekmeans
pip install .
  • Or in editable mode
pip install -e .
  • With numba acceleration
pip install -e ".[speed]"
  • Development tools (tests, lint):
pip install -e ".[dev]"
  • Docs build dependencies:
pip install -e ".[docs]"
  • Everything (dev + docs + speed):
pip install -e ".[all]"

Quick Start

from sklekmeans import EKMeans
import numpy as np

X = np.random.rand(200, 2)
ekm = EKMeans(n_clusters=3, random_state=0).fit(X)
print(ekm.cluster_centers_)

Mini-batch variant with multiple initializations and selection of the best run:

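import numpy as np
from sklekmeans import MiniBatchEKMeans

X = np.random.rand(200, 2)  # toy data, as in the first example
# n_init=5 runs five initializations and keeps the best run
mb = MiniBatchEKMeans(n_clusters=3, batch_size=256, max_epochs=20, n_init=5, random_state=0)
mb.fit(X)
print(mb.cluster_centers_)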
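
The examples above use uniformly random points, which do not exercise the imbalanced-data setting the method targets. The following is a minimal sketch of a synthetic imbalanced dataset built with NumPy only. The helper name `make_imbalanced_blobs`, the cluster sizes, the centers, and the spread are all illustrative assumptions, not part of sklekmeans.

```python
import numpy as np


def make_imbalanced_blobs(sizes=(500, 50, 10),
                          centers=((0.0, 0.0), (4.0, 0.0), (2.0, 4.0)),
                          spread=0.3, seed=0):
    """Draw 2-D Gaussian blobs with deliberately unequal sizes.

    Returns X of shape (sum(sizes), 2) and y, the index of the generating blob.
    """
    rng = np.random.default_rng(seed)
    # One isotropic Gaussian cloud per (size, center) pair.
    X = np.vstack([
        rng.normal(loc=c, scale=spread, size=(n, 2))
        for n, c in zip(sizes, centers)
    ])
    # Label each row with the blob it was drawn from.
    y = np.repeat(np.arange(len(sizes)), sizes)
    return X, y


X, y = make_imbalanced_blobs()
print(X.shape, np.bincount(y))
```

The resulting `X` can be passed to `fit` in place of the uniform data in the examples above; `y` is only kept for inspecting how the large and small groups were split.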

Semi-supervised variant (SSEKM)

Use prior_matrix to inject partial labels or weak supervision. Unlabeled rows are all zeros; labeled rows provide per-class probabilities (e.g., one-hot).

from sklekmeans import SSEKM
import numpy as np

X = np.random.rand(100, 2)
K = 3
prior = np.zeros((X.shape[0], K))
prior[:10, 0] = 1.0  # first 10 samples known to be in class 0

model = SSEKM(n_clusters=K, theta='auto', random_state=0)
model.fit(X, prior_matrix=prior)
print(model.cluster_centers_)
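
In practice, partial labels often arrive as a single label vector rather than a matrix. The sketch below converts such a vector into the prior-matrix layout described above (all-zero rows for unlabeled samples, one-hot rows for labeled ones). It assumes `-1` marks an unlabeled sample; `build_prior_matrix` is a hypothetical helper, not part of sklekmeans.

```python
import numpy as np


def build_prior_matrix(labels, n_clusters, unlabeled=-1):
    """Turn a partially labeled vector into an (n_samples, n_clusters) prior.

    Rows for unlabeled samples stay all zero; labeled rows become one-hot.
    """
    labels = np.asarray(labels, dtype=int)
    prior = np.zeros((labels.shape[0], n_clusters))
    known = labels != unlabeled
    # Reject labels that do not correspond to a valid cluster index.
    if np.any((labels[known] < 0) | (labels[known] >= n_clusters)):
        raise ValueError("labels must lie in [0, n_clusters) or equal `unlabeled`")
    prior[np.flatnonzero(known), labels[known]] = 1.0
    return prior


labels = np.array([0, -1, 2, -1, 1])  # -1 marks an unlabeled sample (assumption)
prior = build_prior_matrix(labels, n_clusters=3)
print(prior)
```

The returned array has the shape expected for `prior_matrix` and can be passed to `fit` as in the SSEKM example above.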

Documentation

The latest HTML documentation is hosted on GitHub Pages:

https://ydcnanhe.github.io/sklearn-ekmeans

Badges above reflect build status; if the link 404s, wait for the docs CI to finish.

PyPI project page: https://pypi.org/project/sklekmeans/

Build and publish (maintainers)

Local build of artifacts:

python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*

Publishing to PyPI is automated via GitHub Actions (Trusted Publishing). See PUBLISHING.md.

References

  • [1] Y. He. An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data, IEEE Transactions on Fuzzy Systems, 2025.
  • [2] Y. He. Semi-supervised equilibrium K-means for imbalanced data clustering, Knowledge-Based Systems, p.113990, 2025.
  • [3] Y. He. Imbalanced Data Clustering Using Equilibrium K-Means, arXiv, 2024.

License

BSD 3-Clause
