
Spectral Bridges clustering algorithm


📊 Spectral Bridges

sbcluster is a Python package implementing Spectral Bridges, a novel clustering algorithm that combines k-means and spectral clustering. It computes the affinity matrix efficiently and merges clusters using a connectivity measure inspired by the margin concept of support vector machines (SVMs). The package is designed to produce robust clusterings and is particularly suited to large datasets.
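To make the overall pipeline concrete, here is a minimal sketch under several assumptions: it uses scikit-learn's KMeans and SpectralClustering in place of FAISS, the helpers bridge_affinity and spectral_bridges_sketch are hypothetical (not part of sbcluster's API), and the affinity is a simplified "bridge" measure (how far the points of two k-means cells reach along the segment joining their centroids) rather than the package's exact formula. It shows the general flow: over-segment the data into many k-means nodes, compute an affinity between those nodes, run spectral clustering on the small node graph, and let every point inherit its node's label.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons


def bridge_affinity(X, centers, labels):
    """Simplified, hypothetical bridge-style affinity between k-means nodes.

    For each pair of nodes (k, l), the points of both cells are projected onto
    the segment joining the two centroids; the affinity is the root mean square
    of how far they reach towards the other centroid (negative reaches count as 0).
    This is an illustrative assumption, not sbcluster's exact formula.
    """
    m = centers.shape[0]
    A = np.zeros((m, m))
    for k in range(m):
        X_k = X[labels == k]
        for l in range(k + 1, m):
            X_l = X[labels == l]
            seg = centers[l] - centers[k]
            denom = seg @ seg
            # Position along the segment: 0 at the own centroid, 1 at the other one
            t_k = (X_k - centers[k]) @ seg / denom
            t_l = (centers[l] - X_l) @ seg / denom
            t = np.clip(np.concatenate([t_k, t_l]), 0.0, None)
            if t.size:
                A[k, l] = A[l, k] = np.sqrt(np.mean(t ** 2))
    return A


def spectral_bridges_sketch(X, n_clusters, n_nodes=20, random_state=0):
    # 1. Over-segment the data into many small k-means cells ("nodes")
    km = KMeans(n_clusters=n_nodes, n_init=10, random_state=random_state).fit(X)
    # 2. Build an n_nodes x n_nodes affinity matrix between the nodes
    A = bridge_affinity(X, km.cluster_centers_, km.labels_)
    # 3. Spectral clustering on the small node graph instead of on every point
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=random_state
    )
    node_labels = sc.fit_predict(A)
    # 4. Each point inherits the cluster label of its node
    return node_labels[km.labels_]


X, y = make_moons(n_samples=400, noise=0.05, random_state=0)
labels = spectral_bridges_sketch(X, n_clusters=2)
print(np.bincount(labels))
```

Running spectral clustering on an m x m node matrix rather than an n x n point matrix is what keeps this kind of pipeline cheap when n is large.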


✨ Features

  • Spectral Bridges Clustering Algorithm: Integrates k-means and spectral clustering with efficient affinity matrix calculation for improved clustering results.
  • Scalability: Designed for large datasets, with cluster formation optimized through efficient affinity matrix computation.
  • Customizable: Parameters such as the number of clusters, the number of iterations, and the random state make the clustering configurable.
  • Model selection: Automatic selection of the number of nodes (m) using a normalized eigengap metric.
  • scikit-learn compatibility: Follows the standard estimator API, so scikit-learn's model selection and evaluation tools (e.g. GridSearchCV) work directly.
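The normalized eigengap idea can be illustrated with a small sketch. The formula used here, (λ_{k+1} - λ_k) / λ_{k+1} on the ascending eigenvalues of the normalized graph Laplacian, is an assumption for illustration; the exact definition behind sbcluster's ngap_scorer may differ, and normalized_eigengap is a hypothetical helper. It scores candidate numbers of clusters k, as the usage example does with n_clusters; on an affinity matrix made of two dense blocks joined by weak edges, the score peaks at k = 2.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian


def normalized_eigengap(affinity, k):
    """Hypothetical normalized eigengap score for k clusters.

    Assumed formula: (lambda_{k+1} - lambda_k) / lambda_{k+1}, where the lambdas
    are the ascending eigenvalues of the normalized graph Laplacian.
    """
    L = laplacian(affinity, normed=True)
    eigvals = np.linalg.eigvalsh(L)  # returned in ascending order
    return (eigvals[k] - eigvals[k - 1]) / eigvals[k]


# Two dense blocks of 5 nodes each, joined by weak (0.01) cross edges
A = np.full((10, 10), 0.01)
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)

# Score each candidate k and keep the one with the largest normalized gap
scores = {k: normalized_eigengap(A, k) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A large relative gap after the k-th eigenvalue indicates that the graph splits naturally into k weakly connected groups.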

⚡ Speed

Spectral Bridges uses FAISS's efficient k-means implementation together with a clone of scikit-learn's centroid initialization method. This combination is over 2x faster than relying on scikit-learn's own implementation.
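Assuming the cloned initialization is scikit-learn's default greedy k-means++ seeding (the text above does not name the method), the seeding step could look like the NumPy sketch below. kmeans_plusplus is a hypothetical helper; its centroids would then be handed to a k-means solver such as FAISS's, which is not shown here.

```python
import numpy as np


def kmeans_plusplus(X, n_clusters, random_state=None, n_local_trials=None):
    """Greedy k-means++ seeding (assumed to match the cloned scikit-learn method)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    if n_local_trials is None:
        # Same default number of candidates per step as scikit-learn's greedy variant
        n_local_trials = 2 + int(np.log(n_clusters))
    centers = np.empty((n_clusters, X.shape[1]), dtype=X.dtype)

    # First center: a data point chosen uniformly at random
    centers[0] = X[rng.integers(n_samples)]
    closest_sq = ((X - centers[0]) ** 2).sum(axis=1)

    for c in range(1, n_clusters):
        # Sample candidates with probability proportional to squared distance
        probs = closest_sq / closest_sq.sum()
        candidates = rng.choice(n_samples, size=n_local_trials, p=probs)
        # Squared distance from every point to every candidate: (trials, n_samples)
        cand_sq = ((X[None, :, :] - X[candidates][:, None, :]) ** 2).sum(axis=2)
        new_closest = np.minimum(closest_sq[None, :], cand_sq)
        # Greedy step: keep the candidate that most reduces the total potential
        best = np.argmin(new_closest.sum(axis=1))
        centers[c] = X[candidates[best]]
        closest_sq = new_closest[best]
    return centers


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [5, 5], [0, 5])])
centers = kmeans_plusplus(X, 3, random_state=0)
print(centers)
```

Drawing several candidates per step and keeping the best one is what distinguishes the greedy variant from plain k-means++; it costs a little more per step but usually yields better starting centroids.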


🚀 Installation

pip install sbcluster

🔧 Usage

Example

import numpy as np
from sbcluster import SpectralBridges, ngap_scorer
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import GridSearchCV

# Load a synthetic dataset: feature columns followed by a ground-truth label column
data = np.genfromtxt("datasets/impossible.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

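# Define the parameter grid (candidate numbers of clusters)
param_grid = {"n_clusters": list(range(2, 11))}

# Clustering is unsupervised, so each "fold" fits and scores on the full
# dataset; repeating it 5 times averages the score over several fits
cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5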

# Grid search over n_clusters, scored by the normalized eigengap
grid_search = GridSearchCV(
    estimator=SpectralBridges(n_clusters=2, n_nodes=250),
    param_grid=param_grid,
    scoring=ngap_scorer,
    cv=cv,
    verbose=1,
)

# Fit the grid search
grid_search.fit(X)

# Print the mean score for each candidate and the best parameters
print(grid_search.cv_results_["mean_test_score"])
print(grid_search.best_params_)

# Predict cluster labels with the best model (refit on the full data)
guess = grid_search.best_estimator_.predict(X)
# Compare the predicted clusters with the ground-truth labels
ari = adjusted_rand_score(y, guess)

# Print the ARI
print(f"Adjusted Rand Index: {ari}")
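The cv list in the example above makes every "fold" train and score on the full dataset, which is how an unsupervised estimator can be tuned with GridSearchCV. The same pattern works with any scikit-learn clusterer and any callable scorer with the signature (estimator, X, y). As a stand-in that runs without sbcluster, this sketch swaps in KMeans and a hypothetical silhouette_scorer for SpectralBridges and ngap_scorer; on three well-separated blobs it should select three clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV


def silhouette_scorer(estimator, X, y=None):
    """Callable scorer: a higher silhouette means better-separated clusters."""
    return silhouette_score(X, estimator.predict(X))


X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)

# Every "fold" trains and scores on the full dataset
full = np.arange(X.shape[0])
cv = [(full, full)] * 3

grid_search = GridSearchCV(
    estimator=KMeans(n_init=10, random_state=0),
    param_grid={"n_clusters": list(range(2, 7))},
    scoring=silhouette_scorer,
    cv=cv,
)
grid_search.fit(X)
print(grid_search.best_params_)
```

Because y is never passed to fit, GridSearchCV forwards y=None to the scorer, so the scorer must only rely on the estimator and X.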

📖 Learn More

For tutorials and the API reference, visit the official documentation:
👉 sbcluster Documentation



Download files


Source Distribution

sbcluster-0.3.1.tar.gz (379.6 kB)

Built Distribution


sbcluster-0.3.1-py3-none-any.whl (9.8 kB)

File details

Details for the file sbcluster-0.3.1.tar.gz.

File metadata

  • Download URL: sbcluster-0.3.1.tar.gz
  • Upload date:
  • Size: 379.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for sbcluster-0.3.1.tar.gz
Algorithm Hash digest
SHA256 41f0af1b7e4562f9b8b06aaeaeb64a8adb61a9ffaa892a88142c2e4c0f5fa8c9
MD5 7e41c34322d18500cae230450520de25
BLAKE2b-256 a136480b843865ff8754e9b5cc50b5ade12a2e891e78d5efbcb084d5635472d1
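To check a downloaded file against the SHA256 digests listed here, a minimal sketch using Python's standard hashlib module follows; sha256_of_stream and verify_download are hypothetical helper names, not part of sbcluster. (pip's hash-checking mode can also enforce such digests at install time.)

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Hex SHA256 digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file at `path` matches the published SHA256 digest."""
    with open(path, "rb") as f:
        return sha256_of_stream(f) == expected_sha256.strip().lower()


# Example (requires the downloaded file in the current directory):
# verify_download("sbcluster-0.3.1.tar.gz",
#                 "41f0af1b7e4562f9b8b06aaeaeb64a8adb61a9ffaa892a88142c2e4c0f5fa8c9")
```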


File details

Details for the file sbcluster-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: sbcluster-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for sbcluster-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0476b40bcfae27835e1108c4be3ada9fe5feaf56b100ceba1a7c03cf2532b693
MD5 5b6a630055a7ab8f3098e427b9bf9210
BLAKE2b-256 f333e440998dab2b1a873653be0416d869de36dc2ac30fa79dc7b4cfcb5d6773

