High-performance K-Means clustering library with Python bindings

Kentro Python Bindings

Python bindings for the Kentro high-performance K-Means clustering library, implemented in Rust using PyO3.

Features

  • Mirrored API: The Python bindings follow the Rust library's API closely (see Comparison with Rust API)
  • High Performance: The clustering itself runs in compiled Rust code, called from Python
  • NumPy Integration: Accepts NumPy arrays as input and exposes centroids as NumPy arrays
  • Method Chaining: Fluent API for easy configuration
  • Error Handling: Invalid arguments and incorrect usage raise Python exceptions (ValueError) with descriptive messages

Installation

Prerequisites

  • Python 3.8 or higher
  • Rust toolchain (if building from source)
  • NumPy

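Install from PyPI

Prebuilt wheels are published for the platforms listed under Download files; on other platforms, build from source as described below.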

pip install kentro

Build from Source

  1. Install Rust (if not already installed):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source ~/.cargo/env
    
  2. Install maturin:

    pip install "maturin[patchelf]"
    
  3. Build and install:

    # Development build
    maturin develop --features python
    
    # Or production build
    maturin build --release --features python
    pip install target/wheels/kentro-*.whl
    

Version Management

The Python package version is taken from the Rust crate version defined in Cargo.toml, so the Rust library and the Python bindings always carry the same version number.

  • Single Source of Truth: Version is defined only in Cargo.toml
  • Automatic Synchronization: Python package version is extracted from Cargo.toml during build
  • Runtime Access: Python version is available via kentro.__version__

To print the installed version:

import kentro
print(f"Kentro version: {kentro.__version__}")

Quick Start

import numpy as np
from kentro import KMeans

# Create sample data
data = np.random.rand(100, 2).astype(np.float32)

# Create and train K-Means
kmeans = KMeans(n_clusters=3)
clusters = kmeans.train(data, num_threads=None)

print(f"Found {len(clusters)} clusters")
print(f"Cluster sizes: {[len(c) for c in clusters]}")
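
train returns cluster membership lists (one list of point indices per cluster) rather than one label per point. If another tool expects a label array, a small helper can invert the lists. clusters_to_labels below is a hypothetical helper, not part of kentro, and it assumes that each point index appears in at most one inner list, as the description of train's return value suggests.

```python
import numpy as np


def clusters_to_labels(clusters, n_points):
    """Turn per-cluster lists of point indices into one cluster label per point.

    Points that appear in no list keep the label -1.
    """
    labels = np.full(n_points, -1, dtype=np.int64)
    for cluster_id, point_indices in enumerate(clusters):
        # An explicit integer dtype keeps empty clusters from breaking the indexing.
        labels[np.asarray(point_indices, dtype=np.intp)] = cluster_id
    return labels


# Example with a hand-written result in the shape that train returns.
clusters = [[0, 3], [1], [2, 4]]
print(clusters_to_labels(clusters, n_points=5))  # [0 1 2 0 2]
```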

API Reference

KMeans Class

Constructor

KMeans(n_clusters: int)

Create a new K-Means instance.

Parameters:

  • n_clusters: Number of clusters (must be positive)

Raises:

  • ValueError: If n_clusters is 0

Configuration Methods (Method Chaining)

with_iterations(iters: int) -> KMeans

Set the number of iterations (default: 25).

with_euclidean(euclidean: bool) -> KMeans

When True, use Euclidean distance instead of cosine similarity (default: False, i.e. cosine similarity).

with_balanced(balanced: bool) -> KMeans

Enable balanced K-Means clustering (default: False).

with_max_balance_diff(max_balance_diff: int) -> KMeans

Set maximum balance difference for balanced clustering (default: 16).

with_verbose(verbose: bool) -> KMeans

Enable verbose output (default: False).

with_use_medoids(use_medoids: bool) -> KMeans

Enable K-medoids clustering (default: False).
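
The iteration count set by with_iterations determines how many refinement rounds are run. For readers new to K-Means, the sketch below shows one textbook Lloyd iteration in plain NumPy: assign each point to its nearest centroid by squared Euclidean distance, then move each centroid to the mean of its members. It is a generic illustration, not kentro's implementation (which also supports cosine similarity, balancing, and medoids), and the function name lloyd_iteration is hypothetical.

```python
import numpy as np


def lloyd_iteration(data: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """One textbook K-Means (Lloyd) iteration using squared Euclidean distance."""
    # Assignment step: squared distance from every point to every centroid.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its members;
    # a centroid with no members keeps its previous position.
    new_centroids = centroids.copy()
    for c in range(centroids.shape[0]):
        members = data[labels == c]
        if len(members) > 0:
            new_centroids[c] = members.mean(axis=0)
    return new_centroids


data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]], dtype=np.float32)
centroids = np.array([[1.0, 0.0], [9.0, 0.0]], dtype=np.float32)
for _ in range(25):  # 25 matches the documented default iteration count
    centroids = lloyd_iteration(data, centroids)
print(centroids)  # centroids converge to [0, 0.5] and [10, 0.5]
```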

Training and Prediction

train(data: np.ndarray, num_threads: Optional[int] = None) -> List[List[int]]

Perform K-Means clustering on the provided data.

Parameters:

  • data: Data matrix (n_points × n_dimensions) as float32
  • num_threads: Number of threads to use (None for automatic)

Returns:

  • List of lists where each inner list contains indices of points assigned to the corresponding cluster

Raises:

  • ValueError: If already trained, insufficient points, or dimension mismatch

assign(data: np.ndarray, k: int) -> List[List[int]]

Assign data points to their k nearest clusters.

Parameters:

  • data: Data matrix (n_points × n_dimensions) as float32
  • k: Number of nearest clusters to assign each point to

Returns:

  • List of lists where each inner list contains the indices of the points assigned to the corresponding cluster; with k > 1, a point can appear in several lists

Raises:

  • ValueError: If not trained, k is 0, or dimension mismatch
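
To make the return shape of assign concrete, the sketch below reproduces it in NumPy: each point index is placed in the lists of its k nearest centroids, so with k > 1 one index appears in several lists. It uses Euclidean distance and a hypothetical helper name, assign_top_k; it illustrates the documented output format, not kentro's internal code.

```python
import numpy as np


def assign_top_k(data, centroids, k):
    """Place each point index in the lists of its k nearest centroids."""
    if k == 0:
        raise ValueError("k must be at least 1")
    # Squared Euclidean distance from every point to every centroid.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # For each point, the indices of its k nearest centroids, nearest first.
    nearest = np.argsort(dists, axis=1)[:, :k]
    clusters = [[] for _ in range(centroids.shape[0])]
    for point_idx, cluster_ids in enumerate(nearest):
        for cluster_id in cluster_ids:
            clusters[int(cluster_id)].append(point_idx)
    return clusters


centroids = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]], dtype=np.float32)
points = np.array([[1.0, 0.0], [19.0, 0.0]], dtype=np.float32)
print(assign_top_k(points, centroids, k=1))  # [[0], [], [1]]
print(assign_top_k(points, centroids, k=2))  # [[0], [0, 1], [1]]
```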

Properties

n_clusters: int                       # Number of clusters
iterations: int                       # Number of iterations
is_euclidean: bool                    # Whether Euclidean distance is used
is_balanced: bool                     # Whether balanced clustering is used
is_use_medoids: bool                  # Whether K-medoids clustering is used
centroids: Optional[np.ndarray]       # Cluster centroids (n_clusters × n_dimensions)
medoid_indices: Optional[List[int]]   # Medoid point indices (if using medoids)
is_trained: bool                      # Whether the model has been trained

Examples

Basic K-Means

import numpy as np
from kentro import KMeans

# Create sample data
np.random.seed(42)
data = np.random.rand(100, 2).astype(np.float32)

# Create and train K-Means
kmeans = KMeans(n_clusters=3)
clusters = kmeans.train(data, num_threads=None)

print(f"Found {len(clusters)} clusters")
print(f"Cluster sizes: {[len(c) for c in clusters]}")
print(f"Centroids:\n{kmeans.centroids}")

Method Chaining Configuration

kmeans = KMeans(n_clusters=3) \
    .with_iterations(50) \
    .with_euclidean(True) \
    .with_verbose(True)

clusters = kmeans.train(data, num_threads=None)

K-Medoids Clustering

# K-medoids finds actual data points as cluster centers
kmeans = KMeans(n_clusters=3) \
    .with_use_medoids(True) \
    .with_euclidean(True)

clusters = kmeans.train(data, num_threads=None)

# Get medoid indices
medoid_indices = kmeans.medoid_indices
if medoid_indices:
    print(f"Medoid indices: {medoid_indices}")
    print("Medoid points:")
    for i, idx in enumerate(medoid_indices):
        print(f"  Cluster {i}: {data[idx]}")
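
A medoid is the cluster member with the smallest total distance to the other members, which is why K-medoids centers are always real data points. The sketch below computes one medoid with Euclidean distance in NumPy; medoid_index is a hypothetical helper, and kentro's own medoid update may use a different distance or search strategy.

```python
import numpy as np


def medoid_index(data, member_indices):
    """Return the member index whose summed distance to the other members is smallest."""
    members = data[member_indices]
    # Pairwise Euclidean distances between all members of the cluster.
    pairwise = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
    best = int(pairwise.sum(axis=1).argmin())
    return member_indices[best]


data = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 0.0], [2.0, 0.0]], dtype=np.float32)
print(medoid_index(data, [0, 2, 3]))  # 2 (the point [1, 0] sits between the others)
```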

Balanced K-Means

# Balanced K-Means ensures clusters have similar sizes
kmeans = KMeans(n_clusters=3) \
    .with_balanced(True) \
    .with_max_balance_diff(5)

clusters = kmeans.train(data, num_threads=None)
print(f"Balanced cluster sizes: {[len(c) for c in clusters]}")
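
To quantify how even a clustering is, for example when comparing runs with and without with_balanced(True), you can compute the gap between the largest and smallest cluster. size_spread is a hypothetical helper; how exactly kentro's max_balance_diff relates to this quantity is not specified here, so treat it as a diagnostic only.

```python
def size_spread(clusters):
    """Return (cluster sizes, largest size minus smallest size)."""
    sizes = [len(c) for c in clusters]
    return sizes, max(sizes) - min(sizes)


sizes, spread = size_spread([[0, 1], [2], [3, 4, 5]])
print(sizes, spread)  # [2, 1, 3] 2
```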

Cluster Assignment

# Train on training data
train_data = np.random.rand(100, 2).astype(np.float32)
kmeans = KMeans(n_clusters=3)
kmeans.train(train_data, num_threads=None)

# Assign new data to clusters
test_data = np.random.rand(20, 2).astype(np.float32)
assignments = kmeans.assign(test_data, k=1)

print("Assignment results:")
for i, cluster_points in enumerate(assignments):
    if cluster_points:
        print(f"  Cluster {i}: {cluster_points}")

Euclidean vs Cosine Similarity

# Cosine similarity (default): compares direction only, ignoring vector length;
# common for high-dimensional data such as text or embedding vectors
kmeans_cosine = KMeans(n_clusters=3)
clusters_cosine = kmeans_cosine.train(data, num_threads=None)

# Euclidean distance: compares absolute positions, so vector magnitude matters
kmeans_euclidean = KMeans(n_clusters=3).with_euclidean(True)
clusters_euclidean = kmeans_euclidean.train(data, num_threads=None)
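
The two metrics can disagree about which centroid is nearest. In the NumPy sketch below (helper names hypothetical; both measures computed directly rather than through kentro), the point [2.0, 0.5] is closest in space to the centroid [1.0, 1.0] but points in nearly the same direction as [10.0, 0.0].

```python
import numpy as np


def nearest_euclidean(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to point."""
    return int(np.linalg.norm(centroids - point, axis=1).argmin())


def nearest_cosine(point, centroids):
    """Index of the centroid with the largest cosine similarity to point."""
    sims = (centroids @ point) / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(point))
    return int(sims.argmax())


centroids = np.array([[10.0, 0.0], [1.0, 1.0]])
point = np.array([2.0, 0.5])
print(nearest_euclidean(point, centroids))  # 1
print(nearest_cosine(point, centroids))     # 0
```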

Error Handling

Invalid arguments and incorrect usage raise ValueError with a descriptive message:

try:
    # This will raise ValueError
    kmeans = KMeans(n_clusters=0)
except ValueError as e:
    print(f"Error: {e}")

try:
    # This will raise ValueError if not enough data
    kmeans = KMeans(n_clusters=10)
    small_data = np.random.rand(5, 2).astype(np.float32)
    kmeans.train(small_data, num_threads=None)
except ValueError as e:
    print(f"Error: {e}")

Testing

Run the test suite:

python test_python_bindings.py

Run the comprehensive example:

python examples/python_example.py

Performance Notes

  • Always use float32 NumPy arrays for optimal performance
  • For large datasets, use the num_threads parameter to control parallelism; None selects the thread count automatically
  • K-medoids is slower than standard K-means but provides actual data points as cluster centers
  • Balanced K-means adds computational overhead but ensures more even cluster sizes
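
Following the notes above, a small pre-flight helper can coerce input to a contiguous 2-D float32 array and catch obvious shape problems before calling train. prepare_data is hypothetical. Whether kentro requires C-contiguous memory is an assumption here, and since the exact minimum number of points it accepts is not documented, the check only uses n_clusters as a lower bound.

```python
import numpy as np


def prepare_data(data, n_clusters):
    """Return data as a C-contiguous 2-D float32 array, or raise ValueError."""
    # Converts the dtype (e.g. float64 -> float32) and copies only when needed.
    arr = np.ascontiguousarray(data, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array (n_points x n_dimensions), got {arr.ndim}-D")
    if arr.shape[0] < n_clusters:
        raise ValueError(f"{arr.shape[0]} points cannot form {n_clusters} clusters")
    return arr


raw = np.random.rand(100, 2)  # float64 by default
data = prepare_data(raw, n_clusters=3)
print(data.dtype, data.shape)  # float32 (100, 2)
```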

Comparison with Rust API

The Python bindings mirror the Rust API closely. The differences are Python literals (True instead of true), keyword arguments such as num_threads, passing arrays directly instead of views, and properties in place of getter methods:

Rust                         Python
KMeans::new(n_clusters)      KMeans(n_clusters)
with_iterations(25)          with_iterations(25)
with_euclidean(true)         with_euclidean(True)
train(data.view(), None)     train(data, num_threads=None)
assign(data.view(), k)       assign(data, k)
centroids()                  centroids (property)
medoid_indices()             medoid_indices (property)

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Download files

Source distributions: none are available for this release.

Built distributions for kentro 0.2.3:

  • kentro-0.2.3-cp312-cp312-win_amd64.whl (274.0 kB): CPython 3.12, Windows x86-64
  • kentro-0.2.3-cp312-cp312-macosx_11_0_arm64.whl (384.5 kB): CPython 3.12, macOS 11.0+ ARM64
  • kentro-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl (393.6 kB): CPython 3.12, macOS 10.12+ x86-64
  • kentro-0.2.3-cp311-cp311-win_amd64.whl (274.4 kB): CPython 3.11, Windows x86-64
  • kentro-0.2.3-cp311-cp311-macosx_11_0_arm64.whl (387.5 kB): CPython 3.11, macOS 11.0+ ARM64
  • kentro-0.2.3-cp311-cp311-macosx_10_12_x86_64.whl (395.9 kB): CPython 3.11, macOS 10.12+ x86-64
  • kentro-0.2.3-cp310-cp310-win_amd64.whl (274.1 kB): CPython 3.10, Windows x86-64
  • kentro-0.2.3-cp310-cp310-macosx_11_0_arm64.whl (387.8 kB): CPython 3.10, macOS 11.0+ ARM64
  • kentro-0.2.3-cp310-cp310-macosx_10_12_x86_64.whl (395.6 kB): CPython 3.10, macOS 10.12+ x86-64
  • kentro-0.2.3-cp39-cp39-win_amd64.whl (274.7 kB): CPython 3.9, Windows x86-64
  • kentro-0.2.3-cp39-cp39-macosx_11_0_arm64.whl (388.2 kB): CPython 3.9, macOS 11.0+ ARM64
  • kentro-0.2.3-cp39-cp39-macosx_10_12_x86_64.whl (395.9 kB): CPython 3.9, macOS 10.12+ x86-64
  • kentro-0.2.3-cp38-cp38-win_amd64.whl (274.4 kB): CPython 3.8, Windows x86-64
  • kentro-0.2.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (443.3 kB): CPython 3.8, manylinux glibc 2.17+ x86-64
  • kentro-0.2.3-cp38-cp38-macosx_11_0_arm64.whl (388.0 kB): CPython 3.8, macOS 11.0+ ARM64
  • kentro-0.2.3-cp38-cp38-macosx_10_12_x86_64.whl (395.9 kB): CPython 3.8, macOS 10.12+ x86-64

File hashes

The Windows wheels were uploaded via maturin/1.9.0 (Trusted Publishing: No).

kentro-0.2.3-cp312-cp312-win_amd64.whl
  SHA256 2cdf11664e6d553c099696bf6b5c3de124fd4626735c4e1b246d5267a7c3bf80
  MD5 e6087815f725181ba0c6d7fb5bbbb457
  BLAKE2b-256 66545316e5c8af4985e1c622535264437796426477cf6162a8624287f9d06c4a

kentro-0.2.3-cp312-cp312-macosx_11_0_arm64.whl
  SHA256 ba0cb66c6247c8f343c88ce16903c309216665908acccb6c9ce0a4dfa57305e8
  MD5 e63ba72c4f241750d2f7b8236993bcbc
  BLAKE2b-256 be61ca0bbe27eed6083556204028b996cc7466b2b08e2a1b787a3bb126f18477

kentro-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl
  SHA256 d31f1593741f3fe6330dbe62cbbe6c80af02acfc0b9be0afe195987702abc504
  MD5 2fd7b9d093c0cc88b254d817747217f4
  BLAKE2b-256 6925a2bad893232fdf0c29fb2acb787ff65bbec74b2fda281ab66f1b216b8b99

kentro-0.2.3-cp311-cp311-win_amd64.whl
  SHA256 0f1ccac55e036741a332bd952df9818d181159b7fc33a339582a484f70e26c9e
  MD5 404db7b3f54c6d1931fc723c29725a7b
  BLAKE2b-256 bc911be1d9d31b4ca8efd84e703e2ef09cad2aca058de09706e18a39161909ad

kentro-0.2.3-cp311-cp311-macosx_11_0_arm64.whl
  SHA256 4ddd8e2575b7ed8723d6ffe571d0e3cb09abccde2aa297e48a517cf43ec4f3bd
  MD5 8a73e080b7a2e662358081040e267f62
  BLAKE2b-256 fabc9bd3bec3d95ac8baa4659f7a77c5ce4612bf57540ae3cc9966e8b1c822f3

kentro-0.2.3-cp311-cp311-macosx_10_12_x86_64.whl
  SHA256 9145a24651b888308ac7cfacc845e5d5a7bcf821bbd07e6fe0ce892ab7b4dd05
  MD5 9edf9805c7fbc4da282c1cb93d4a7fb1
  BLAKE2b-256 39a900ce6e43d8e7784f49940d01a61478813baac8922670984faefb9976129c

kentro-0.2.3-cp310-cp310-win_amd64.whl
  SHA256 95193c1e09d5d9838be6726c0f495751f64f6c0e0d75f2db6e439fc22767dce4
  MD5 261cc246603a2b731b098e7cc788608e
  BLAKE2b-256 cc36d307c2f70f2ca70b2007e746c276cf6f9bd292a8313db08b2ce87b0a025c

kentro-0.2.3-cp310-cp310-macosx_11_0_arm64.whl
  SHA256 6515b241c81f73360821d63be06c9a8baaf0d37b9587df34c07623c20b9419e7
  MD5 a347c63a47c079ecd5db556070fbdb93
  BLAKE2b-256 7e3f17451ebd91c48a492c39c4a040d707c4bb36b4d886b9638bff1253788437

kentro-0.2.3-cp310-cp310-macosx_10_12_x86_64.whl
  SHA256 9151ab349b29fcba30d1b7814880f9d12765d0cd2bad903cf4045b68db4b7acc
  MD5 8267a0eb486ed8626a29fa9051e14d9f
  BLAKE2b-256 dcbb6e514e122e152e700bc52098275c93554f66e5e4a3518533c6bd96c7a7e9

kentro-0.2.3-cp39-cp39-win_amd64.whl
  SHA256 5dce4c9218941a4c0e0eb17f7f600e3cbec13a3ebbed90cda9db381cad39d8a4
  MD5 a1d4d716df8c57dea075fdcc781fd686
  BLAKE2b-256 8aecb083f25cf882f021dd8d3cf0a6da23eb831a0edf45a16508a5c2da3f9d67

kentro-0.2.3-cp39-cp39-macosx_11_0_arm64.whl
  SHA256 b900081ac4ab6350727395b315698d6bf78c0c9c36dc951eec3ab608573604d0
  MD5 6b6027861f09bb49e5e752f83a138b6d
  BLAKE2b-256 e272ff349ea8400412e1d773fdb7f0db66ff3942433c3dd299beef827fd4cceb

kentro-0.2.3-cp39-cp39-macosx_10_12_x86_64.whl
  SHA256 5bba2d4278faa638e0a8e25520bcbc79a1508bc79347a0c5e13da47d0d0eb2fd
  MD5 384edf162e2d8e4cb79c0fb9e826d4da
  BLAKE2b-256 fcbc577714b58bfe36a61c61149dbf9559356e293f74f022c68956743b16a312

kentro-0.2.3-cp38-cp38-win_amd64.whl
  SHA256 69f59a141ef8162256d7276b5052894413eb33ef5c83c07cb197df069894d404
  MD5 d5372c9f7d34ccd3d7fe3682356f63c3
  BLAKE2b-256 101c39e4a1b007f0541ce7fbc500b077a09c1cf754229a038114bc239845d187

kentro-0.2.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  SHA256 1ba7c9e1ed5de1fd1bb767aae68ba308661c352deec50118c1900f6b568c96c6
  MD5 83df185f9ffc66115bb8d020846877e3
  BLAKE2b-256 75faba5290b537ae449ed592b46082cfa39a64bab1117f807e0f9d6dd72a9f9d

kentro-0.2.3-cp38-cp38-macosx_11_0_arm64.whl
  SHA256 121dd2f4040b82238ef8515bda60a3e546ef73d7061df4021b9a98887705086d
  MD5 09f9c9f78965392c4563859b0c750624
  BLAKE2b-256 d0f90440accca1f88462c20bfd16834e60546ed319635acf330c4d9ca506a047

kentro-0.2.3-cp38-cp38-macosx_10_12_x86_64.whl
  SHA256 a9fd6a370f1bc869670a6dbd5977119fe98f560f7401fc5dd4b859111df0c803
  MD5 2c2afd79b4a055f58e10fba4041b02fc
  BLAKE2b-256 888e6c48fc864a7fa6b8e6dbb37deca38b81beb561262f07a7b545b511106162
