High-performance K-Means clustering library with Python bindings
Kentro Python Bindings
Python bindings for the Kentro high-performance K-Means clustering library, implemented in Rust using PyO3.
Features
- Matching API: The Python bindings mirror the Rust library's API, with accessor methods exposed as properties
- High Performance: Leverages Rust's performance with Python's ease of use
- NumPy Integration: Seamless integration with NumPy arrays
- Method Chaining: Fluent API for easy configuration
- Comprehensive Error Handling: Proper Python exceptions for all error conditions
Installation
Prerequisites
- Python 3.8 or higher
- Rust toolchain (if building from source)
- NumPy
Install from PyPI (when available)
pip install kentro
Build from Source
- Install Rust (if not already installed):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
- Install maturin:
pip install maturin[patchelf]
- Build and install:
# Development build
maturin develop --features python
# Or production build
maturin build --release --features python
pip install target/wheels/kentro-*.whl
Version Management
The Python package version is automatically synchronized with the Rust crate version defined in Cargo.toml. This ensures that both the Rust library and Python bindings always have the same version number.
- Single Source of Truth: Version is defined only in Cargo.toml
- Automatic Synchronization: Python package version is extracted from Cargo.toml during build
- Runtime Access: Python version is available via kentro.__version__
import kentro
print(f"Kentro version: {kentro.__version__}")
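As a rough illustration of the single-source-of-truth idea, the sketch below pulls the version string out of Cargo.toml-style text. In practice the build tooling performs this step; the helper name `read_crate_version` and the sample manifest are hypothetical and not part of Kentro.

```python
import re


def read_crate_version(cargo_toml_text: str) -> str:
    """Return the version declared in the [package] table of a Cargo.toml."""
    in_package = False
    for line in cargo_toml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether the current table is [package].
            in_package = stripped == "[package]"
            continue
        if in_package:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    raise ValueError("no version found in [package] table")


# Hypothetical manifest used only for this example.
sample_manifest = """
[package]
name = "kentro"
version = "0.2.3"

[dependencies]
some-dependency = { version = "1.0" }
"""

print(read_crate_version(sample_manifest))
```

A check like this could be compared against kentro.__version__ in a test to confirm that an installed wheel matches the checked-out crate.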
Quick Start
import numpy as np
from kentro import KMeans
# Create sample data
data = np.random.rand(100, 2).astype(np.float32)
# Create and train K-Means
kmeans = KMeans(n_clusters=3)
clusters = kmeans.train(data, num_threads=None)
print(f"Found {len(clusters)} clusters")
print(f"Cluster sizes: {[len(c) for c in clusters]}")
API Reference
KMeans Class
Constructor
KMeans(n_clusters: int)
Create a new K-Means instance.
Parameters:
n_clusters: Number of clusters (must be positive)
Raises:
ValueError: If n_clusters is 0
Configuration Methods (Method Chaining)
with_iterations(iters: int) -> KMeans
Set the number of iterations (default: 25).
with_euclidean(euclidean: bool) -> KMeans
Use Euclidean distance instead of cosine similarity (default: False).
with_balanced(balanced: bool) -> KMeans
Enable balanced K-Means clustering (default: False).
with_max_balance_diff(max_balance_diff: int) -> KMeans
Set maximum balance difference for balanced clustering (default: 16).
with_verbose(verbose: bool) -> KMeans
Enable verbose output (default: False).
with_use_medoids(use_medoids: bool) -> KMeans
Enable K-medoids clustering (default: False).
Training and Prediction
train(data: np.ndarray, num_threads: Optional[int] = None) -> List[List[int]]
Perform K-Means clustering on the provided data.
Parameters:
data: Data matrix (n_points × n_dimensions) as float32
num_threads: Number of threads to use (None for automatic)
Returns:
- List of lists where each inner list contains indices of points assigned to the corresponding cluster
Raises:
ValueError: If already trained, insufficient points, or dimension mismatch
assign(data: np.ndarray, k: int) -> List[List[int]]
Assign data points to their k nearest clusters.
Parameters:
data: Data matrix (n_points × n_dimensions) as float32
k: Number of nearest clusters to assign each point to
Returns:
- List of lists where each inner list contains indices of points assigned to the corresponding cluster
Raises:
ValueError: If not trained, k is 0, or dimension mismatch
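Both train and assign return one index list per cluster. When a flat "one label per point" layout is more convenient, the nested lists can be inverted. The helper below is a minimal sketch that assumes each point belongs to a single cluster (as with train, or assign with k=1); `labels_from_clusters` is a hypothetical name, not a Kentro function.

```python
from typing import List


def labels_from_clusters(clusters: List[List[int]], n_points: int) -> List[int]:
    """Turn per-cluster index lists into one cluster label per point."""
    labels = [-1] * n_points  # -1 marks a point that no cluster claimed
    for cluster_id, members in enumerate(clusters):
        for point_index in members:
            labels[point_index] = cluster_id
    return labels


# Hand-written stand-in for the result of train() on six points.
clusters = [[0, 3], [1, 4, 5], [2]]
print(labels_from_clusters(clusters, n_points=6))  # [0, 1, 2, 0, 1, 1]
```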
Properties
n_clusters: int # Number of clusters
iterations: int # Number of iterations
is_euclidean: bool # Whether using Euclidean distance
is_balanced: bool # Whether using balanced clustering
is_use_medoids: bool # Whether using medoids clustering
centroids: Optional[np.ndarray] # Cluster centroids (n_clusters × n_dimensions)
medoid_indices: Optional[List[int]] # Medoid point indices (if using medoids)
is_trained: bool # Whether model has been trained
Examples
Basic K-Means
import numpy as np
from kentro import KMeans
# Create sample data
np.random.seed(42)
data = np.random.rand(100, 2).astype(np.float32)
# Create and train K-Means
kmeans = KMeans(n_clusters=3)
clusters = kmeans.train(data, num_threads=None)
print(f"Found {len(clusters)} clusters")
print(f"Cluster sizes: {[len(c) for c in clusters]}")
print(f"Centroids:\n{kmeans.centroids}")
Method Chaining Configuration
kmeans = KMeans(n_clusters=3) \
    .with_iterations(50) \
    .with_euclidean(True) \
    .with_verbose(True)
clusters = kmeans.train(data, num_threads=None)
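For readers unfamiliar with the pattern, chaining works because each configuration method hands back an object that the next call can be made on. The sketch below uses a hypothetical stand-in class (`FluentConfig`), not Kentro itself, and also shows a parenthesized layout as an alternative to backslash continuations.

```python
class FluentConfig:
    """Hypothetical stand-in that mimics a chainable configuration API."""

    def __init__(self, n_clusters: int) -> None:
        self.n_clusters = n_clusters
        self.iterations = 25
        self.euclidean = False

    def with_iterations(self, iters: int) -> "FluentConfig":
        self.iterations = iters
        return self  # returning the instance is what enables chaining

    def with_euclidean(self, euclidean: bool) -> "FluentConfig":
        self.euclidean = euclidean
        return self


# Parentheses let the chain span several lines without backslashes.
config = (
    FluentConfig(n_clusters=3)
    .with_iterations(50)
    .with_euclidean(True)
)
print(config.iterations, config.euclidean)  # 50 True
```

Whether Kentro's with_* methods return the same object or a new one is not stated here, so it is safest to keep using the value returned by the chain.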
K-Medoids Clustering
# K-medoids finds actual data points as cluster centers
kmeans = KMeans(n_clusters=3) \
    .with_use_medoids(True) \
    .with_euclidean(True)
clusters = kmeans.train(data, num_threads=None)
# Get medoid indices
medoid_indices = kmeans.medoid_indices
if medoid_indices:
    print(f"Medoid indices: {medoid_indices}")
    print("Medoid points:")
    for i, idx in enumerate(medoid_indices):
        print(f" Cluster {i}: {data[idx]}")
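To make "actual data points as cluster centers" concrete, here is a small sketch of the usual medoid definition: the member whose summed distance to the other members is smallest. This is a generic illustration under Euclidean distance, not Kentro's implementation; `medoid_index` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def medoid_index(points: Sequence[Sequence[float]], members: List[int]) -> int:
    """Return the member index with the smallest summed distance to all members."""

    def total_distance(candidate: int) -> float:
        return sum(math.dist(points[candidate], points[other]) for other in members)

    return min(members, key=total_distance)


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
# Among points 0, 1 and 2, the middle one is closest to the others overall.
print(medoid_index(points, [0, 1, 2]))  # 1
```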
Balanced K-Means
# Balanced K-Means ensures clusters have similar sizes
kmeans = KMeans(n_clusters=3) \
    .with_balanced(True) \
    .with_max_balance_diff(5)
clusters = kmeans.train(data, num_threads=None)
print(f"Balanced cluster sizes: {[len(c) for c in clusters]}")
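One way to inspect how even the result is: compute the gap between the largest and smallest cluster. The sketch below assumes that this gap is what max_balance_diff is meant to bound, which the description above does not spell out; `balance_spread` is a hypothetical helper.

```python
from typing import List


def balance_spread(clusters: List[List[int]]) -> int:
    """Difference between the largest and smallest cluster sizes."""
    sizes = [len(members) for members in clusters]
    return max(sizes) - min(sizes)


# Hand-written stand-in for a balanced training result on ten points.
clusters = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(balance_spread(clusters))  # 1
```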
Cluster Assignment
# Train on training data
train_data = np.random.rand(100, 2).astype(np.float32)
kmeans = KMeans(n_clusters=3)
kmeans.train(train_data, num_threads=None)
# Assign new data to clusters
test_data = np.random.rand(20, 2).astype(np.float32)
assignments = kmeans.assign(test_data, k=1)
print("Assignment results:")
for i, cluster_points in enumerate(assignments):
    if cluster_points:
        print(f" Cluster {i}: {cluster_points}")
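With k greater than 1, it is often easier to look at the result per point rather than per cluster. The sketch below assumes that a point assigned to its k nearest clusters appears in k of the inner lists, which follows from the documented return format but is not stated outright; `clusters_per_point` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def clusters_per_point(assignments: List[List[int]]) -> Dict[int, List[int]]:
    """Map each point index to the list of cluster ids it was assigned to."""
    membership = defaultdict(list)
    for cluster_id, members in enumerate(assignments):
        for point_index in members:
            membership[point_index].append(cluster_id)
    return dict(membership)


# Hand-written stand-in for assign(test_data, k=2) on three points.
assignments = [[0, 1], [0, 2], [1, 2]]
print(clusters_per_point(assignments))  # {0: [0, 1], 1: [0, 2], 2: [1, 2]}
```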
Euclidean vs Cosine Similarity
# Cosine similarity (default) - good for high-dimensional data
kmeans_cosine = KMeans(n_clusters=3)
clusters_cosine = kmeans_cosine.train(data, num_threads=None)
# Euclidean distance - good for low-dimensional data
kmeans_euclidean = KMeans(n_clusters=3).with_euclidean(True)
clusters_euclidean = kmeans_euclidean.train(data, num_threads=None)
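The practical difference between the two metrics is that cosine similarity ignores vector length while Euclidean distance does not. The sketch below shows this with plain Python; it illustrates the textbook definitions only and makes no claim about how Kentro computes them internally.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points."""
    return math.dist(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; undefined (ZeroDivisionError) for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Same direction, different length.
a = (1.0, 0.0)
b = (3.0, 0.0)
print(euclidean_distance(a, b))  # 2.0
print(cosine_distance(a, b))     # 0.0
```

Two vectors pointing the same way are identical under cosine distance however far apart they lie, which is why the choice of metric can change the clustering.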
Error Handling
The Python bindings provide proper error handling with descriptive error messages:
try:
    # This will raise ValueError
    kmeans = KMeans(n_clusters=0)
except ValueError as e:
    print(f"Error: {e}")
try:
    # This will raise ValueError if not enough data
    kmeans = KMeans(n_clusters=10)
    small_data = np.random.rand(5, 2).astype(np.float32)
    kmeans.train(small_data, num_threads=None)
except ValueError as e:
    print(f"Error: {e}")
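If it is preferable to catch these conditions before calling into the library, a small pre-check can mirror them. The sketch below assumes "insufficient points" means fewer points than clusters, which is not stated precisely above; the helper `check_training_inputs` and its messages are hypothetical, not Kentro's.

```python
def check_training_inputs(n_clusters: int, n_points: int) -> None:
    """Raise ValueError for inputs that training is expected to reject."""
    if n_clusters <= 0:
        raise ValueError("n_clusters must be positive")
    if n_points < n_clusters:
        # Assumed threshold: at least one point per cluster.
        raise ValueError(f"need at least {n_clusters} points, got {n_points}")


check_training_inputs(n_clusters=3, n_points=100)  # passes silently

try:
    check_training_inputs(n_clusters=10, n_points=5)
except ValueError as e:
    print(f"Error: {e}")
```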
Testing
Run the test suite:
python test_python_bindings.py
Run the comprehensive example:
python examples/python_example.py
Performance Notes
- Always use float32 NumPy arrays for optimal performance
- For large datasets, consider using the num_threads parameter to control parallelization
- K-medoids is slower than standard K-means but provides actual data points as cluster centers
- Balanced K-means adds computational overhead but ensures more even cluster sizes
Comparison with Rust API
The Python bindings mirror the Rust API. The main differences are that NumPy arrays are passed directly (no .view()) and that accessor methods become properties:
| Rust | Python |
|---|---|
| KMeans::new(n_clusters) | KMeans(n_clusters) |
| with_iterations(25) | with_iterations(25) |
| with_euclidean(true) | with_euclidean(True) |
| train(data.view(), None) | train(data, num_threads=None) |
| assign(data.view(), k) | assign(data, k) |
| centroids() | centroids (property) |
| medoid_indices() | medoid_indices (property) |
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.