Spectral Bridges clustering algorithm

📊 Spectral Bridges

sbcluster is a Python package that implements a novel clustering algorithm combining k-means and spectral clustering techniques, called Spectral Bridges. It leverages efficient affinity matrix computation and merges clusters based on a connectivity measure inspired by SVM's margin concept. This package is designed to provide robust clustering solutions, particularly suited for large datasets.
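The general "quantize first, then cluster the nodes spectrally" idea can be sketched with scikit-learn alone. This is a minimal illustration, not sbcluster's implementation: the function name `quantize_then_spectral` is hypothetical, and the Gaussian affinity between centroids is a simple stand-in for the package's margin-inspired connectivity measure, which is not specified here.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering


def quantize_then_spectral(X, n_clusters, n_nodes, random_state=0):
    """Hypothetical helper: k-means nodes followed by spectral clustering."""
    # Step 1: summarize the data with n_nodes k-means centroids.
    km = KMeans(n_clusters=n_nodes, n_init=4, random_state=random_state).fit(X)
    centers = km.cluster_centers_

    # Step 2: build a small (n_nodes x n_nodes) affinity between centroids.
    # ASSUMPTION: a Gaussian kernel scaled by the mean squared distance,
    # used here only as a placeholder for the package's own affinity.
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-d2 / d2.mean())

    # Step 3: spectral clustering on the small matrix instead of on all points.
    sc = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=random_state
    )
    node_labels = sc.fit_predict(affinity)

    # Step 4: each point inherits the label of its k-means node.
    return node_labels[km.labels_], centers, node_labels


# Two well-separated synthetic blobs of 100 points each.
rng = np.random.default_rng(0)
X = np.vstack(
    [
        rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
        rng.normal(loc=(20.0, 0.0), scale=0.5, size=(100, 2)),
    ]
)
labels, centers, node_labels = quantize_then_spectral(X, n_clusters=2, n_nodes=10)
print(np.bincount(labels))
```

The spectral step only ever sees an m × m matrix (m being the number of nodes), which is why this family of methods scales to datasets where a full n × n affinity would be too expensive.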


✨ Features

  • Spectral Bridges Clustering Algorithm: Integrates k-means and spectral clustering with efficient affinity matrix calculation for improved clustering results.
  • Scalability: Designed to handle large datasets by optimizing cluster formation through advanced affinity matrix computations.
  • Customizable: Parameters such as number of clusters, iterations, and random state allow flexibility in clustering configurations.
  • Model selection: Automatic model selection for number of nodes (m) according to a normalized eigengap metric.
  • scikit-learn: Native integration with the standard API, with easy options for model selection and evaluation.
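The package's normalized eigengap metric is not defined in this description. For orientation, the classical eigengap heuristic from spectral clustering can be sketched as follows: compute the eigenvalues of the normalized graph Laplacian and look for the largest jump between consecutive eigenvalues. The helper names below are hypothetical, and the sketch may differ from what sbcluster actually computes.

```python
import numpy as np


def laplacian_eigenvalues(affinity):
    """Eigenvalues (ascending) of the symmetric normalized Laplacian."""
    affinity = np.asarray(affinity, dtype=float)
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]
    laplacian = np.eye(affinity.shape[0]) - normalized
    return np.linalg.eigvalsh(laplacian)


def pick_k_by_eigengap(affinity, candidates):
    """Return the candidate k with the largest gap lambda_{k+1} - lambda_k."""
    eigvals = laplacian_eigenvalues(affinity)
    # eigvals is 0-indexed, so lambda_k is eigvals[k - 1].
    gaps = {k: eigvals[k] - eigvals[k - 1] for k in candidates}
    return max(gaps, key=gaps.get), gaps


# Toy affinity: three groups of four nodes, strong within-group weights (1.0)
# and weak between-group weights (0.01).
block = np.kron(np.eye(3), np.ones((4, 4)))
affinity = 0.99 * block + 0.01

best_k, gaps = pick_k_by_eigengap(affinity, candidates=range(1, 7))
print(best_k)  # 3 for this toy matrix
```

A large gap after the k-th eigenvalue suggests that the graph splits cleanly into k weakly connected groups, which is the intuition behind eigengap-based model selection.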

⚡ Speed

Spectral Bridges uses fastkmeanspp's efficient KMeans implementation, which keeps it fast even on large-scale datasets.


🚀 Installation

pip install sbcluster

🔧 Usage

Example

from time import time

import matplotlib.pyplot as plt
import numpy as np
from sbcluster import SpectralBridges, ngap_scorer
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import GridSearchCV

# Load some synthetic data
data = np.genfromtxt("data/impossible.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]

# Define the parameter grid
param_grid = {"n_clusters": [2, 3, 4, 5, 6, 7, 8, 9, 10]}
cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5

# Perform grid search for optimal parameters
grid_search = GridSearchCV(
    estimator=SpectralBridges(n_clusters=2, n_nodes=250),
    param_grid=param_grid,
    scoring=ngap_scorer,
    cv=cv,
    verbose=1,
)

# Fit the grid search
grid_search.fit(X)

# Print the results
print(grid_search.cv_results_["mean_test_score"])
print(grid_search.best_params_)

# Make predictions with the best model
guess = grid_search.best_estimator_.predict(X)
ari = adjusted_rand_score(y, guess)

# Print the ARI
print(f"Adjusted Rand Index: {ari}")

# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=guess, alpha=0.1)
plt.scatter(
    grid_search.best_estimator_.subcluster_centers_[:, 0],
    grid_search.best_estimator_.subcluster_centers_[:, 1],
    c=grid_search.best_estimator_.subcluster_labels_,
    marker="X",
)
plt.title("Clustered data and centroids with best SpectralBridges fit")
plt.show()

# Compare with sklearn's SpectralClustering
sc_low = SpectralClustering(n_clusters=7).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=sc_low.labels_)
plt.title("Spectral Clustering of the original dataset, gamma=1.0")
plt.show()

sc_high = SpectralClustering(n_clusters=7, gamma=5).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=sc_high.labels_)
plt.title("Spectral Clustering of the original dataset, gamma=5.0")
plt.show()

# Compare times
start = time()
grid_search.best_estimator_.fit(X)
end = time()
print("SpectralBridges fit time:", end - start)

start = time()
sc_low.fit(X)
end = time()
print("SpectralClustering fit time:", end - start)
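The `cv` argument in the example deserves a note: clustering has no held-out labels, so each "split" trains and scores on the full dataset, and the list is repeated, presumably to average the score over several fits. The same pattern can be sketched with scikit-learn's own KMeans; `silhouette_scorer` below is a hypothetical helper standing in for `ngap_scorer`, whose internals are not described here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV

# Three well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])


def silhouette_scorer(estimator, X, y=None):
    """Unsupervised scorer: GridSearchCV calls it without y when fit(X) is used."""
    return silhouette_score(X, estimator.predict(X))


# A single "identity" split: train and score on every row.
all_rows = np.arange(X.shape[0])
cv = [(all_rows, all_rows)]

search = GridSearchCV(
    estimator=KMeans(n_init=10, random_state=0),
    param_grid={"n_clusters": [2, 3, 4, 5]},
    scoring=silhouette_scorer,
    cv=cv,
)
search.fit(X)
print(search.best_params_)
```

Because the scorer takes `(estimator, X, y=None)`, any internal clustering criterion can be plugged into the standard grid-search machinery this way.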

Results Comparison

[Figures: Spectral Bridges result | Spectral Clustering result | Spectral Clustering result]


📖 Learn More

For tutorials and the API reference, visit the official documentation:
👉 sbcluster's documentation

Download files

Source Distribution

sbcluster-0.5.0.tar.gz (21.1 kB)

Uploaded Source

Built Distribution

sbcluster-0.5.0-py3-none-any.whl (19.9 kB)

Uploaded Python 3

File details

Details for the file sbcluster-0.5.0.tar.gz.

File metadata

  • Download URL: sbcluster-0.5.0.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for sbcluster-0.5.0.tar.gz
Algorithm    Hash digest
SHA256       217baa66a0d3f3dcd1f2464cf8a6234b2ce582f753b2fcfe4a210d7a8134697f
MD5          92e5c44d8eaa8fe84b88f911316dadf0
BLAKE2b-256  c481824e06a9d1fc4af0f79e2fc848f359701d2ebc2e4b3ae0d0cdff28dffe3b
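A downloaded file can be checked against the SHA256 digest listed above using only the standard library. The sketch below is illustrative: the helper names are hypothetical, and the local file path in the final comment assumes the archive was saved to the current directory.

```python
import hashlib
import tempfile
from pathlib import Path

# SHA256 digest of sbcluster-0.5.0.tar.gz, copied from the table above.
SDIST_SHA256 = "217baa66a0d3f3dcd1f2464cf8a6234b2ce582f753b2fcfe4a210d7a8134697f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()


# Self-contained demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"spectral bridges")
    demo_ok = matches_digest(demo, hashlib.sha256(b"spectral bridges").hexdigest())
    demo_bad = matches_digest(demo, SDIST_SHA256)

print(demo_ok, demo_bad)

# For the real archive (assumed path):
# matches_digest("sbcluster-0.5.0.tar.gz", SDIST_SHA256)
```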

File details

Details for the file sbcluster-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: sbcluster-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for sbcluster-0.5.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       59891d35c4549df5dbafc5361f632dbf6d65b268bd780b62f7edb5a4586df8b5
MD5          4219f56f58e52a07ff8d7f51cf68b6ba
BLAKE2b-256  9a3b9f3bed86fd325034eeebe3a30126415450b3e56a87804f53cfe9118547ff
