A super-fast K-Means library for high-dimensional vectors on CPUs (x86, ARM) and GPUs

Super K-Means

A super-fast clustering library for high-dimensional vector embeddings

Cover image: SuperKMeans vs. FAISS and scikit-learn

Why Super K-Means?

  • Clusters vector embeddings (Cohere, OpenAI, MXBAI, CLIP, MiniLM) 100x faster than FAISS.
  • Indexes 10M embeddings of 1024 dimensions in less than a minute on a single CPU.
  • Faster without compromising clustering quality.
  • Efficient on CPUs (ARM and x86) and GPUs.
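
To check the clustering-quality claim on your own data, you can compute a quality metric from the centroids and assignments. The README does not say which metric the benchmarks use; the sketch below assumes the within-cluster sum of squares (WCSS), the objective that K-Means minimizes. The function name is hypothetical and not part of the library.

```python
import numpy as np


def within_cluster_sum_of_squares(data, centroids, assignments):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    data = np.asarray(data, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    assignments = np.asarray(assignments, dtype=np.int64)
    # Gather the centroid assigned to every point, then accumulate squared residuals.
    residuals = data - centroids[assignments]
    return float(np.sum(residuals * residuals))


# Tiny hand-checkable example: two clusters on a line.
points = np.array([[0.0], [2.0], [10.0], [12.0]])
centers = np.array([[1.0], [11.0]])
labels = np.array([0, 0, 1, 1])
wcss = within_cluster_sum_of_squares(points, centers, labels)
print(wcss)  # every point is 1 away from its centroid, so 4 * 1^2 = 4.0
```

Lower values mean tighter clusters. Comparing this number across libraries for the same data and the same k gives a like-for-like quality check.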

Our secret sauce

  • Careful interleaving of GEMM routines with pruning kernels that prune dimensions efficiently.
  • In the benchmarks shown in the cover image, all algorithms cluster the same data: no dimensionality reduction, no sampling, no early termination.
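
The two ingredients named above can be sketched separately in NumPy. This is an illustration of the general techniques only, not the library's kernels or how it interleaves them: `assign_with_gemm` uses the identity ||x - c||² = ||x||² - 2x·c + ||c||² so that the bulk of the work is one matrix multiplication, and `assign_with_pruning` accumulates the distance a block of dimensions at a time and abandons a centroid once its partial distance can no longer win. Both function names and the block size are assumptions made for the example.

```python
import numpy as np


def assign_with_gemm(data, centroids):
    """Nearest centroid per point, with the heavy lifting done by one GEMM."""
    # ||x||^2 is constant for a given point, so it can be dropped for the argmin.
    cross = data @ centroids.T                               # (n, k) matrix product
    centroid_norms = np.sum(centroids * centroids, axis=1)   # (k,)
    scores = centroid_norms[None, :] - 2.0 * cross
    return np.argmin(scores, axis=1)


def assign_with_pruning(point, centroids, block=4):
    """Nearest centroid for one point, skipping dimensions that cannot matter.

    Returns the winning centroid index and the number of dimensions visited.
    """
    best_idx, best_dist = -1, np.inf
    dims_visited = 0
    d = point.shape[0]
    for idx, centroid in enumerate(centroids):
        partial = 0.0
        for start in range(0, d, block):
            diff = point[start:start + block] - centroid[start:start + block]
            partial += float(diff @ diff)
            dims_visited += diff.shape[0]
            if partial >= best_dist:
                break  # pruned: further dimensions can only increase the distance
        else:
            # All dimensions were visited and the candidate beat the current best.
            best_idx, best_dist = idx, partial
    return best_idx, dims_visited


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 32))
centroids = rng.standard_normal((8, 32))

gemm_labels = assign_with_gemm(data, centroids)
pruned = [assign_with_pruning(x, centroids) for x in data]
pruned_labels = np.array([idx for idx, _ in pruned])
visited = sum(v for _, v in pruned)
total = data.shape[0] * centroids.shape[0] * data.shape[1]
print(np.array_equal(gemm_labels, pruned_labels), visited, total)
```

The GEMM path visits every dimension but runs at BLAS speed; the pruning path visits fewer dimensions but with scalar control flow. The printed counts show how many dimension visits pruning saves on this toy input.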

Usage

from superkmeans import SuperKMeans

data = ...  # 2D NumPy array of shape (n, d)
k = 1000
d = 768

kmeans = SuperKMeans(
    n_clusters=k,
    dimensionality=d
)

# Run the clustering
centroids = kmeans.train(data) # 2D array with centroids (k x d) 

# Get assignments
assignments = kmeans.assign(data)

You can then use the centroids to build an IVF index for vector search, for example in FAISS.
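
To make the IVF idea concrete, here is a minimal NumPy sketch of what an inverted-file index does with centroids and assignments. It is not FAISS and not part of SuperKMeans; the function names and the `n_probe` parameter are assumptions made for the example. Each cluster gets a list of the ids assigned to it, and a query only scans the lists of its nearest centroids.

```python
import numpy as np


def build_inverted_lists(assignments, n_clusters):
    """Group point ids by the cluster they were assigned to."""
    assignments = np.asarray(assignments, dtype=np.int64)
    order = np.argsort(assignments, kind="stable")        # ids sorted by cluster
    counts = np.bincount(assignments, minlength=n_clusters)
    boundaries = np.cumsum(counts)[:-1]                    # split points between clusters
    return np.split(order, boundaries)


def ivf_search(query, data, centroids, inverted_lists, n_probe=1):
    """Return the id of the closest point among the n_probe nearest clusters."""
    centroid_dists = np.sum((centroids - query) ** 2, axis=1)
    probed = np.argsort(centroid_dists)[:n_probe]
    candidates = np.concatenate([inverted_lists[c] for c in probed])
    if candidates.size == 0:
        return -1  # all probed clusters were empty
    candidate_dists = np.sum((data[candidates] - query) ** 2, axis=1)
    return int(candidates[np.argmin(candidate_dists)])


points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])
labels = np.array([0, 0, 1, 1])

lists = build_inverted_lists(labels, n_clusters=2)
nearest = ivf_search(np.array([9.0, 10.0]), points, centers, lists, n_probe=1)
print([ids.tolist() for ids in lists], nearest)  # [[0, 1], [2, 3]] 2
```

With real data, `centers` and `labels` would be the outputs of `kmeans.train(data)` and `kmeans.assign(data)` from the snippet above. Raising `n_probe` trades search speed for recall.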

Usage in C++

#include <cstddef>
#include <cstdint>
#include <vector>
#include "superkmeans/superkmeans.h"
#include "superkmeans/hierarchical_superkmeans.h"

int main(int argc, char* argv[]) {
    size_t n = 1000000; // Number of vectors
    size_t k = 10000;   // Number of clusters
    size_t d = 768;     // Dimensionality
    std::vector<float> data(n * d); // Fill with your n vectors of d floats each

    auto kmeans = skmeans::SuperKMeans(k, d);

    // Or Hierarchical Super K-Means for extreme performance:
    // auto kmeans = skmeans::HierarchicalSuperKMeans(k, d);

    // Run the clustering
    std::vector<float> centroids = kmeans.Train(data.data(), n);

    // Assign points
    std::vector<uint32_t> assignments = kmeans.Assign(data.data(), centroids.data(), n, k);
}

See our examples for complete, working programs in Python and C++.

Documentation

Check our wiki for advanced usage.

Installation

Python

pip install superkmeans

Tip: For maximum performance, we recommend compiling from source.

C++

As a header-only library with CMake FetchContent:

include(FetchContent)

FetchContent_Declare(
    superkmeans
    GIT_REPOSITORY https://github.com/cwida/superkmeans
)
FetchContent_MakeAvailable(superkmeans)

target_link_libraries(myapp PRIVATE superkmeans)

Compiling Python Bindings from source

Prerequisites

  • Clang 17 or GCC 13
  • CMake 3.26
  • OpenMP
  • A BLAS implementation
  • Python 3 (only for Python bindings)

git clone https://github.com/cwida/SuperKMeans.git
cd SuperKMeans
git submodule update --init
pip install .

# Run plug-and-play example
python ./examples/simple_clustering.py

# Set a value for n, d and k
python ./examples/simple_clustering.py 200000 1536 1000

Compiling C++ library from source

Prerequisites

  • Clang 17 or GCC 13
  • CMake 3.26
  • OpenMP
  • A BLAS implementation

git clone https://github.com/cwida/SuperKMeans.git
cd SuperKMeans
git submodule update --init

# Set proper path to clang if needed
export CXX="/usr/bin/clang++-18" 

# Compile
cmake .
make examples

# Run plug-and-play example
cd examples
./simple_clustering.out

# Set a value for n, d and k
./simple_clustering.out 100000 1536 1000

For a more comprehensive installation and compilation guide, check INSTALL.md.

Getting the Best Performance

Check INSTALL.md.

Roadmap

We are actively developing Super K-Means and welcome contributions! See CONTRIBUTING.md.

Benchmarking

To run our benchmark suite in C++, refer to BENCHMARKING.md.
