Equilibrium K-Means (EKMeans) clustering algorithms compatible with scikit-learn
sklekmeans - Equilibrium K-Means for scikit-learn
sklekmeans provides batch and mini-batch implementations of the
Equilibrium K-Means (EKMeans) clustering algorithm. The method introduces
an equilibrium weighting scheme that can yield improved robustness on
imbalanced datasets compared to standard k-means. The estimators follow
the scikit-learn estimator API.
Features
- Drop-in scikit-learn compatible estimators: EKMeans, MiniBatchEKMeans, SSEKM, MiniBatchSSEKM (semi-supervised).
- Supports Euclidean and Manhattan distances.
- Heuristic alpha selection via alpha='dvariance' (default).
- Mini-batch variant with accumulation or online update modes.
- Soft memberships (membership) and equilibrium weights (W_).
- Semi-supervised learning via a prior matrix (prior_matrix, shape (n_samples, n_clusters)), with supervision strength theta (default theta='auto' = |N|/|S|).
Installation
The package is available on PyPI. Install the base package:
pip install sklekmeans
Optional extras:
- With numba acceleration (recommended for speed):
pip install "sklekmeans[speed]"
From source (latest main):
- Basic installation
git clone https://github.com/ydcnanhe/sklearn-ekmeans.git
cd sklearn-ekmeans
pip install .
- Or in editable mode
pip install -e .
- With numba acceleration
pip install -e ".[speed]"
- Development tools (tests, lint):
pip install -e ".[dev]"
- Docs build dependencies:
pip install -e ".[docs]"
- Everything (dev + docs + speed):
pip install -e ".[all]"
Quick Start
from sklekmeans import EKMeans
import numpy as np
X = np.random.rand(200, 2)
ekm = EKMeans(n_clusters=3, random_state=0).fit(X)
print(ekm.cluster_centers_)
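The uniform random data above does not exercise the imbalanced case that EKMeans targets. As a minimal sketch, the NumPy-only helper below builds a toy dataset with very unequal cluster sizes. The helper name make_imbalanced_blobs, the cluster sizes, and the center placement are illustrative assumptions and are not part of sklekmeans.

```python
import numpy as np


def make_imbalanced_blobs(sizes=(500, 50, 10), spread=0.3, radius=3.0, seed=0):
    """Hypothetical helper: 2-D Gaussian blobs with unequal sizes.

    Centers are placed evenly on a circle of the given radius; the
    k-th blob contains sizes[k] points.
    """
    rng = np.random.default_rng(seed)
    n_blobs = len(sizes)
    angles = 2.0 * np.pi * np.arange(n_blobs) / n_blobs
    centers = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    # Draw each blob around its own center, then stack them row-wise.
    X = np.vstack([
        center + spread * rng.standard_normal((n, 2))
        for center, n in zip(centers, sizes)
    ])
    # Ground-truth blob index for each row, useful for inspection only.
    y = np.repeat(np.arange(n_blobs), sizes)
    return X, y


X, y = make_imbalanced_blobs()
print(X.shape, np.bincount(y))
```

The resulting X can be passed to EKMeans(n_clusters=3, random_state=0).fit(X) exactly as in the example above; y is only there to compare the fitted centers against the true blobs.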
Mini-batch variant with multiple initializations and selection of the best run:
from sklekmeans import MiniBatchEKMeans
mb = MiniBatchEKMeans(n_clusters=3, batch_size=256, max_epochs=20, n_init=5, random_state=0)
mb.fit(X)
print(mb.cluster_centers_)
Semi-supervised variant (SSEKM)
Use prior_matrix to inject partial labels or weak supervision. Unlabeled rows are all zeros; labeled rows provide per-class probabilities (e.g., one-hot).
from sklekmeans import SSEKM
import numpy as np
X = np.random.rand(100, 2)
K = 3
prior = np.zeros((X.shape[0], K))
prior[:10, 0] = 1.0 # first 10 samples known to be in class 0
model = SSEKM(n_clusters=K, theta='auto', random_state=0)
model.fit(X, prior_matrix=prior)
print(model.cluster_centers_)
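Partial labels are often stored as an integer array rather than a matrix. The sketch below shows one way to turn such an array into the prior_matrix layout described above (all-zero rows for unlabeled samples, one-hot rows for labeled ones). The helper name labels_to_prior and the use of -1 as the "unlabeled" marker are assumptions made for illustration; they are not part of the sklekmeans API.

```python
import numpy as np


def labels_to_prior(labels, n_clusters, unlabeled=-1):
    """Hypothetical helper: build an (n_samples, n_clusters) prior matrix.

    Samples whose label equals `unlabeled` get an all-zero row;
    every other sample gets a one-hot row for its class index.
    """
    labels = np.asarray(labels, dtype=int)
    prior = np.zeros((labels.shape[0], n_clusters), dtype=float)
    known = labels != unlabeled
    if np.any((labels[known] < 0) | (labels[known] >= n_clusters)):
        raise ValueError("known labels must lie in [0, n_clusters)")
    # Set a single 1.0 per labeled row, in the column of its class.
    prior[np.flatnonzero(known), labels[known]] = 1.0
    return prior


partial = np.array([0, 2, -1, -1, 1])  # -1 marks unlabeled samples (assumed convention)
prior = labels_to_prior(partial, n_clusters=3)
print(prior)
```

The returned array has the shape (n_samples, n_clusters) expected by model.fit(X, prior_matrix=prior), provided its rows are in the same order as the rows of X.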
Documentation
The latest HTML documentation is hosted on GitHub Pages:
ydcnanhe.github.io/sklearn-ekmeans
Badges above reflect build status; if the link 404s, wait for the docs CI to finish.
PyPI project page: https://pypi.org/project/sklekmeans/
Build and publish (maintainers)
Local build of artifacts:
python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*
Publishing to PyPI is automated via GitHub Actions (Trusted Publishing). See PUBLISHING.md.
References
- [1] Y. He. An Equilibrium Approach to Clustering: Surpassing Fuzzy C-Means on Imbalanced Data, IEEE Transactions on Fuzzy Systems, 2025.
- [2] Y. He. Semi-supervised equilibrium K-means for imbalanced data clustering, Knowledge-Based Systems, p.113990, 2025.
- [3] Y. He. Imbalanced Data Clustering Using Equilibrium K-Means, arXiv, 2024.
License
BSD 3-Clause