A super-fast K-Means library for high-dimensional vectors on CPUs (x86, ARM) and GPUs

Super K-Means

A super-fast clustering library for high-dimensional vector embeddings

[Cover image: SuperKMeans vs FAISS and scikit-learn]

Why Super K-Means?

  • 100x faster than FAISS at clustering vector embeddings (Cohere, OpenAI, MXBAI, CLIP, MiniLM).
  • Index 10M embeddings of 1024 dimensions in less than a minute on a single CPU.
  • Faster without compromising clustering quality.
  • Efficient on CPUs (ARM and x86) and GPUs.

Our secret sauce

  • Careful interleaving of GEMM routines with pruning kernels that efficiently skip dimensions.
  • In the benchmarks shown in the cover image, all algorithms cluster the same data: no dimensionality reduction, no sampling, no early termination.
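
To make the two ingredients named above more concrete, here is a small plain-Python sketch. It is not the library's implementation: the function names are hypothetical, and the pruning rule shown (abandon a centroid once its partial distance reaches the best distance found so far) is a generic technique chosen for illustration. The first function shows the identity that lets distance computation be expressed as a matrix product (GEMM); the second shows how accumulating a distance block by block lets most dimensions be skipped.

```python
import math

def sq_dist_gemm_form(x, c):
    """||x - c||^2 written as ||x||^2 + ||c||^2 - 2 * <x, c>.

    Over many points and centroids, the <x, c> terms form a matrix
    product, which is why a GEMM routine can compute them in bulk.
    """
    dot = sum(a * b for a, b in zip(x, c))
    return sum(a * a for a in x) + sum(b * b for b in c) - 2.0 * dot

def nearest_with_pruning(x, centroids, block=4):
    """Return (index, squared distance, skipped dimensions).

    The distance to each centroid is accumulated one block of dimensions
    at a time. A centroid is abandoned as soon as its partial sum reaches
    the best full distance seen so far (hypothetical pruning rule).
    """
    best_idx, best_dist = -1, math.inf
    skipped = 0  # dimensions never read thanks to pruning
    for idx, c in enumerate(centroids):
        partial = 0.0
        for start in range(0, len(x), block):
            end = start + block
            partial += sum((a - b) ** 2 for a, b in zip(x[start:end], c[start:end]))
            if partial >= best_dist:
                skipped += max(0, len(x) - end)
                break
        else:
            # No break: every block was read and partial < best_dist.
            best_idx, best_dist = idx, partial
    return best_idx, best_dist, skipped

x = [0.0] * 8
centroids = [[1.0] * 8, [10.0] * 8, [0.5] * 8]
print(nearest_with_pruning(x, centroids))  # (2, 2.0, 4)
```

In this toy run the second centroid is rejected after its first block of four dimensions, so the remaining four are never read. Pruning pays off more as the dimensionality grows and the running best distance tightens.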

Usage

from superkmeans import SuperKMeans

data = ...  # NumPy 2D matrix of shape (n, d)
k = 1000
d = 768

kmeans = SuperKMeans(
    n_clusters=k,
    dimensionality=d
)

# Run the clustering
centroids = kmeans.train(data) # 2D array with centroids (k x d) 

# Get assignments
assignments = kmeans.assign(data)

You can then use the centroids to build an IVF index for vector search, for example in FAISS.
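
As a rough illustration of what an IVF (inverted file) index does with these outputs, the sketch below builds one inverted list per centroid from the assignments and answers a query by scanning only the lists of the closest centroids. It uses plain Python lists as stand-ins for the arrays returned by train and assign; the helper names and the `nprobe` parameter are hypothetical and chosen for illustration. A library such as FAISS does this far more efficiently.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(assignments, k):
    """Group point ids by the centroid they were assigned to."""
    lists = [[] for _ in range(k)]
    for point_id, cluster_id in enumerate(assignments):
        lists[cluster_id].append(point_id)
    return lists

def ivf_search(query, data, centroids, lists, nprobe=1):
    """Return the id of the nearest point among the nprobe closest lists."""
    # Rank centroids by distance to the query and keep the nprobe closest.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [p for c in order[:nprobe] for p in lists[c]]
    # Scan only those candidates instead of the whole dataset.
    return min(candidates, key=lambda p: sq_dist(query, data[p]), default=None)

# Toy data standing in for the real inputs and outputs.
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.5], [10.0, 10.5]]
assignments = [0, 0, 1, 1]

lists = build_inverted_lists(assignments, k=len(centroids))
print(lists)                                           # [[0, 1], [2, 3]]
print(ivf_search([9.0, 10.0], data, centroids, lists)) # 2
```

Raising `nprobe` scans more lists, which costs more time per query but reduces the chance of missing the true nearest neighbor when it lies in a neighboring cluster.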

Usage in C++
#include <vector>
#include <cstddef>
#include <cstdint> // uint32_t
#include "superkmeans/superkmeans.h"
#include "superkmeans/hierarchical_superkmeans.h"

int main(int argc, char* argv[]) {
    std::vector<float> data; // Fill
    size_t n = 1000000;
    size_t k = 10000;
    size_t d = 768;

    auto kmeans = skmeans::SuperKMeans(k, d);

    // Or Hierarchical Super K-Means for extreme performance:
    // auto kmeans = skmeans::HierarchicalSuperKMeans(k, d);
    
    // Run the clustering
    std::vector<float> centroids = kmeans.Train(data.data(), n);
    
    // Assign points
    std::vector<uint32_t> assignments = kmeans.Assign(data.data(), centroids.data(), n, k);
}

See our examples for complete working programs in Python and C++.

Documentation

Check our wiki for advanced usage.

Installation

pip install superkmeans

Tip: For maximum performance, we recommend compiling from source.

Compiling Python Bindings from source

Prerequisites

  • Clang 17, CMake 3.26
  • OpenMP
  • A BLAS implementation
  • Python 3 (only for Python bindings)

git clone https://github.com/cwida/SuperKMeans.git
cd SuperKMeans
git submodule update --init
pip install .

# Run plug-and-play example
python ./examples/simple_clustering.py

# Set a value for n, d and k
python ./examples/simple_clustering.py 200000 1536 1000

Compiling C++ library from source

Prerequisites

  • Clang 17, CMake 3.26
  • OpenMP
  • A BLAS implementation
  • Python 3 (only for Python bindings)

git clone https://github.com/cwida/SuperKMeans.git
cd SuperKMeans
git submodule update --init

# Set proper path to clang if needed
export CXX="/usr/bin/clang++-18" 

# Compile
cmake .
make examples

# Run plug-and-play example
cd examples
./simple_clustering.out

# Set a value for n, d and k
./simple_clustering.out 100000 1536 1000
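
When picking values for n, d and k, it can help to estimate how much memory the raw vectors will occupy. The helper below is a back-of-the-envelope sketch that assumes 4-byte floats (the C++ example stores data as `float`); the function name is hypothetical, and the estimate ignores any working memory the library itself allocates.

```python
def footprint_bytes(n, d, k, bytes_per_value=4):
    """Return (data bytes, centroid bytes) for n points, k centroids, d dims."""
    data = n * d * bytes_per_value
    centroids = k * d * bytes_per_value
    return data, centroids

data_b, cent_b = footprint_bytes(100_000, 1536, 1000)
print(f"data: {data_b / 1e6:.1f} MB, centroids: {cent_b / 1e6:.1f} MB")
# data: 614.4 MB, centroids: 6.1 MB
```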

For a more comprehensive installation and compilation guide, check INSTALL.md.

Getting the Best Performance

Check INSTALL.md.

Roadmap

We are actively developing Super K-Means and welcome contributions! See CONTRIBUTING.md.

Benchmarking

To run our benchmark suite in C++, refer to BENCHMARKING.md.
