A Rust-backed Python clustering library

rustcluster

Fast, Rust-backed clustering for Python. Five algorithms, an sklearn-compatible API, and K-means that runs 1.3-6.4x faster than scikit-learn.

Highlights

  • 5 algorithms: KMeans (Lloyd + Hamerly), MiniBatchKMeans, DBSCAN, HDBSCAN, AgglomerativeClustering
  • 3 distance metrics: euclidean, cosine, manhattan
  • 3 evaluation metrics: silhouette score, Calinski-Harabasz, Davies-Bouldin
  • KD-tree acceleration for DBSCAN/HDBSCAN neighbor queries (10-200x speedup on low-dimensional data)
  • Native f32/f64: no silent upcast, so f32 input packs twice as many values per cache line
  • Pickle serialization for all fitted models
  • GIL released during all compute — plays well with threads and async
  • 320 tests across Rust and Python
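
Because the GIL is released during compute, several fits can run from ordinary Python threads. The sketch below shows one way to fan work out with the standard library; `fit_many` is a hypothetical helper, and `len` is used as a stand-in workload so the example runs without rustcluster installed.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_many(fit_fn, datasets, max_workers=4):
    """Call fit_fn on each dataset from a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_fn, datasets))

# Stand-in workload (assumption: any callable taking one dataset works here).
# With the library installed, fit_fn could be something like
#   lambda X: KMeans(n_clusters=3, random_state=42).fit(X).labels_
datasets = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
results = fit_many(len, datasets)
```

Threads only help when the work inside `fit_fn` releases the GIL; a pure-Python stand-in like the one above runs serially.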

Installation

pip install rustcluster

Or build from source (requires a Rust toolchain and Python 3.9+):

pip install maturin
git clone https://github.com/mfbaig35r/rustcluster.git
cd rustcluster
maturin develop --release

Quickstart

K-Means

from rustcluster import KMeans

model = KMeans(n_clusters=3, random_state=42)
model.fit(X)
model.labels_           # cluster assignments
model.cluster_centers_  # centroids (k x d)
model.inertia_          # sum of squared distances
model.predict(X_new)    # assign new data
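
For readers new to these attributes, the following pure-Python sketch spells out what `inertia_` and `predict` mean under the default euclidean metric. It illustrates the definitions only and is not rustcluster's implementation; the helper names `nearest_centroid` and `inertia` are hypothetical.

```python
# Illustrative only: plain-Python definitions of nearest-centroid assignment
# and inertia. Helper names are hypothetical, not part of rustcluster.

def squared_euclidean(a, b):
    """Squared euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(point, centers):
    """Index of the closest centroid (the label predict assigns to one row)."""
    return min(range(len(centers)), key=lambda j: squared_euclidean(point, centers[j]))

def inertia(X, centers):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(squared_euclidean(p, centers[nearest_centroid(p, centers)]) for p in X)

X = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers = [[0.0, 1.0], [10.0, 11.0]]

labels = [nearest_centroid(p, centers) for p in X]  # one label per row
total = inertia(X, centers)                         # lower means tighter clusters
```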

Mini-Batch K-Means

from rustcluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, batch_size=256, random_state=42)
model.fit(X_large)      # scales to large datasets
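
Mini-batch K-means scales because each step touches only `batch_size` rows instead of the full dataset. The sketch below follows the textbook per-center learning-rate update (each center moves toward a sample by 1/count); it is an assumption that rustcluster uses this variant, and `mini_batch_kmeans` is a hypothetical name.

```python
import random

def mini_batch_kmeans(X, centers, batch_size, n_iter, seed=0):
    """Textbook mini-batch K-means update; not rustcluster's implementation."""
    rng = random.Random(seed)
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)  # samples seen so far per center
    for _ in range(n_iter):
        batch = rng.sample(X, min(batch_size, len(X)))
        # Assign the whole batch against the centers as they are right now.
        assigned = [
            min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in batch
        ]
        # Move each center toward its samples with a shrinking step size.
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
fitted = mini_batch_kmeans(X, [[0.0, 0.0], [10.0, 10.0]], batch_size=4, n_iter=20)
```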

DBSCAN

from rustcluster import DBSCAN

model = DBSCAN(eps=0.5, min_samples=5)
model.fit(X)
model.labels_                # -1 for noise
model.core_sample_indices_   # core point indices
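
To make the meaning of `eps`, `min_samples`, noise labels, and core samples concrete, here is a brute-force DBSCAN in plain Python. It is a sketch of the general algorithm, not rustcluster's code; it assumes the scikit-learn convention that a point counts as its own neighbor, and the function name `dbscan` is hypothetical.

```python
import math

def dbscan(X, eps, min_samples):
    """Brute-force DBSCAN returning (labels, core_indices); -1 marks noise."""
    n = len(X)
    # Neighborhoods include the point itself (assumed sklearn convention).
    neighbors = [
        [j for j in range(n) if math.dist(X[i], X[j]) <= eps]
        for i in range(n)
    ]
    core = [i for i in range(n) if len(neighbors[i]) >= min_samples]
    is_core = set(core)
    labels = [-1] * n
    cluster = 0
    for i in core:
        if labels[i] != -1:
            continue  # already absorbed into an earlier cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if q in is_core:  # only core points keep expanding
                        stack.append(q)
        cluster += 1
    return labels, core

X = [[0.0, 0.0], [0.0, 0.3], [0.3, 0.0],
     [5.0, 5.0], [5.0, 5.3], [5.3, 5.0],
     [20.0, 20.0]]
labels, core = dbscan(X, eps=0.5, min_samples=3)
```

The isolated point at (20, 20) has no neighbors within `eps`, so it keeps the noise label -1.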

HDBSCAN

from rustcluster import HDBSCAN

model = HDBSCAN(min_cluster_size=5)
model.fit(X)
model.labels_              # -1 for noise
model.probabilities_       # soft membership [0, 1]
model.cluster_persistence_ # per-cluster stability

Agglomerative Clustering

from rustcluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=3, linkage="ward")
model.fit(X)
model.labels_     # cluster assignments
model.children_   # merge history
model.distances_  # distance at each merge
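
The merge history can be replayed to see which samples each merge joins. The sketch below assumes `children_` follows the scikit-learn convention (row i merges two node ids, ids below n_samples are original samples, and the new cluster gets id n_samples + i); that assumption is not confirmed here, and `clusters_from_merges` is a hypothetical helper.

```python
def clusters_from_merges(children, n_samples):
    """Return the sorted member list of the cluster created at each merge.

    Assumes the scikit-learn children_ convention described above.
    """
    members = {i: [i] for i in range(n_samples)}
    merged = []
    for step, (a, b) in enumerate(children):
        new_id = n_samples + step
        members[new_id] = sorted(members[a] + members[b])
        merged.append(members[new_id])
    return merged

# Hand-written merge history for four samples: (0,1), then (2,3), then both pairs.
children = [(0, 1), (2, 3), (4, 5)]
history = clusters_from_merges(children, n_samples=4)
```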

Evaluation Metrics

from rustcluster import silhouette_score, calinski_harabasz_score, davies_bouldin_score

silhouette_score(X, labels)         # [-1, 1], higher is better
calinski_harabasz_score(X, labels)  # higher is better
davies_bouldin_score(X, labels)     # lower is better
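
As a reference for what the silhouette score measures, the sketch below computes the mean silhouette coefficient in plain Python with euclidean distance. It follows the standard definition (b - a) / max(a, b) and is not rustcluster's implementation; the function name `silhouette` is hypothetical, and at least two clusters are required.

```python
import math

def silhouette(X, labels):
    """Mean silhouette coefficient over all samples (euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(X[i], X[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

X = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette(X, [0, 0, 1, 1])  # labels match the two tight pairs
bad = silhouette(X, [0, 1, 0, 1])   # labels split each pair
```

A labeling that groups nearby points scores close to 1, while one that separates them goes negative.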

Distance Metrics

All algorithms accept a metric parameter:

KMeans(n_clusters=5, metric="cosine")
DBSCAN(eps=0.3, metric="manhattan")
HDBSCAN(min_cluster_size=5, metric="euclidean")

Metric       Aliases            KD-tree acceleration  Notes
"euclidean"  "l2"               Yes                   Default for all algorithms
"cosine"     -                  No (brute force)      K-means forces Lloyd (Hamerly assumes Euclidean)
"manhattan"  "cityblock", "l1"  Yes                   -

Ward linkage requires the euclidean metric.
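
For reference, the three metrics can be written out in plain Python as below. These are the standard definitions, with cosine distance taken as 1 minus cosine similarity; treating that as rustcluster's exact definition is an assumption, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1, cityblock)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

d_l2 = euclidean([0.0, 0.0], [3.0, 4.0])
d_l1 = manhattan([0.0, 0.0], [3.0, 4.0])
d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
d_same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
```

Cosine distance ignores vector length, so two vectors pointing the same way are at distance 0 regardless of magnitude.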

Performance

K-means benchmark vs scikit-learn (single-threaded, n_init=1, median of 5 runs):

n        d   k   Speedup vs sklearn
1,000    8   8   2.9x
10,000   8   8   2.4x
100,000  8   32  3.2x
100,000  32  32  1.4x

DBSCAN and HDBSCAN use KD-tree acceleration for d <= 16 with euclidean or manhattan metrics, reducing neighbor queries from O(n^2) to O(n log n).

Full benchmark: python benches/benchmark.py
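
To reproduce a "median of 5 runs" comparison on your own data, a timing harness along these lines may be enough. It is a minimal sketch and does not reflect the contents of benches/benchmark.py; `median_runtime` and `speedup` are hypothetical helpers, and the workloads are stand-ins so the example runs without either library.

```python
import statistics
import time

def median_runtime(fn, runs=5):
    """Median wall-clock time of fn() over `runs` calls, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def speedup(baseline_fn, candidate_fn, runs=5):
    """How many times faster candidate_fn is than baseline_fn."""
    return median_runtime(baseline_fn, runs) / median_runtime(candidate_fn, runs)

# Stand-in workloads; in practice each callable would fit one estimator,
# e.g. a scikit-learn KMeans as the baseline and a rustcluster KMeans
# as the candidate, both with n_init=1 on the same X.
ratio = speedup(lambda: sum(range(200_000)), lambda: sum(range(50_000)))
```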

Serialization

All models support pickle:

import pickle

from rustcluster import KMeans

model = KMeans(n_clusters=3).fit(X)
data = pickle.dumps(model)
model_restored = pickle.loads(data)  # fitted state preserved
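
Persisting a model to disk uses the same mechanism. The sketch below wraps the standard-library calls in two hypothetical helpers, `save_model` and `load_model`, and uses a plain dict as a stand-in for a fitted estimator so it runs without rustcluster installed.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(model, path):
    """Write any picklable object to path."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Read an object back. Only unpickle files you trust:
    unpickling can execute arbitrary code."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Stand-in for a fitted estimator (assumption: a real model pickles the same way).
stand_in = {"cluster_centers_": [[0.0, 1.0], [10.0, 11.0]], "inertia_": 4.0}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    save_model(stand_in, path)
    restored = load_model(path)
```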

Development

maturin develop --release              # build
cargo test --no-default-features --lib # Rust tests (128)
pytest tests/ -v                       # Python tests (192)
python benches/benchmark.py            # benchmark vs sklearn
cargo fmt -- --check                   # formatting
cargo clippy --no-default-features --lib -- -D warnings  # linting

Architecture

Three-layer kernel design separating concerns:

  1. PyO3 boundary (src/lib.rs) — input validation, GIL release, dtype dispatch
  2. Algorithm logic (src/kmeans.rs, etc.) — iteration, convergence, ndarray types
  3. Hot kernel (src/utils.rs, src/distance.rs) — raw &[F] slices for auto-vectorization

See docs/architecture-decisions.md for details and docs/lessons-building-rustcluster.md for the full build story.

Contributing

See CONTRIBUTING.md for how to add algorithms, distance metrics, and tests.

License

MIT
