Spectral Bridges clustering algorithm
📊 Spectral Bridges
sbcluster is a Python package that implements a novel clustering algorithm combining k-means and spectral clustering techniques, called Spectral Bridges. It leverages efficient affinity matrix computation and merges clusters based on a connectivity measure inspired by SVM's margin concept. This package is designed to provide robust clustering solutions, particularly suited for large datasets.
✨ Features
- Spectral Bridges Clustering Algorithm: Integrates k-means and spectral clustering with efficient affinity matrix calculation for improved clustering results.
- Scalability: Designed to handle large datasets by optimizing cluster formation through advanced affinity matrix computations.
- Customizable: Parameters such as number of clusters, iterations, and random state allow flexibility in clustering configurations.
- Model selection: Automatically selects the number of nodes (m) using a normalized eigengap metric.
- scikit-learn: Native integration with the standard API, with easy options for model selection and evaluation.
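To make the eigengap idea behind the model selection feature more concrete, here is a minimal sketch of a generic eigengap heuristic on a toy affinity matrix. It is not the package's implementation: the function name `eigengap`, the toy matrix `A`, and the use of a plain (unnormalized) gap between consecutive Laplacian eigenvalues are all assumptions made for illustration. The normalization used by `ngap_scorer` is not reproduced here.

```python
import numpy as np


def eigengap(affinity: np.ndarray, n_clusters: int) -> float:
    """Gap between eigenvalues k and k+1 of the symmetric normalized Laplacian.

    Hypothetical helper for illustration only, not part of sbcluster.
    """
    degrees = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    # L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(affinity)) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    # A large gap after the n_clusters smallest eigenvalues suggests
    # that n_clusters well-separated groups exist in the graph.
    return float(eigvals[n_clusters] - eigvals[n_clusters - 1])


# Toy affinity: two blocks of 3 nodes, strong links inside, weak links across
A = np.full((6, 6), 0.01)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

gaps = {k: eigengap(A, k) for k in range(1, 5)}
best_k = max(gaps, key=gaps.get)
print(gaps)
print("number of clusters with the largest gap:", best_k)
```

On this toy matrix the largest gap occurs at two clusters, matching the two blocks built into `A`. In practice the affinity would come from the fitted model rather than being written by hand.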
⚡ Speed
Spectral Bridges uses FAISS's efficient k-means implementation, along with a clone of scikit-learn's centroid initialization method, which is much faster than scikit-learn's own implementation (over 2x improvement).
🚀 Installation
pip install sbcluster
🔧 Usage
Example
from time import time
import matplotlib.pyplot as plt
import numpy as np
from sbcluster import SpectralBridges, ngap_scorer
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import GridSearchCV
# Load some synthetic data
data = np.genfromtxt("datasets/impossible.csv", delimiter=",")
X, y = data[:, :-1], data[:, -1]
# Define the parameter grid
param_grid = {"n_clusters": [2, 3, 4, 5, 6, 7, 8, 9, 10]}
cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))] * 5
# Perform grid search for optimal parameters
grid_search = GridSearchCV(
    estimator=SpectralBridges(n_clusters=2, n_nodes=250),
    param_grid=param_grid,
    scoring=ngap_scorer,
    cv=cv,
    verbose=1,
)
# Fit the grid search
grid_search.fit(X)
# Print the results
print(grid_search.cv_results_["mean_test_score"])
print(grid_search.best_params_)
# Make predictions with the best model
guess = grid_search.best_estimator_.predict(X)
ari = adjusted_rand_score(y, guess)
# Print the ARI
print(f"Adjusted Rand Index: {ari}")
# Visualize the clustering results
plt.scatter(X[:, 0], X[:, 1], c=guess, alpha=0.1)
plt.scatter(
    grid_search.best_estimator_.cluster_centers_[:, 0],
    grid_search.best_estimator_.cluster_centers_[:, 1],
    c=grid_search.best_estimator_.cluster_labels_,
    marker="X",
)
plt.title("Clustered data and centroids with best SpectralBridges fit")
plt.show()
# Compare with sklearn's SpectralClustering
sc_low = SpectralClustering(n_clusters=7).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=sc_low.labels_)
plt.title("Spectral Clustering of the original dataset, gamma=1.0")
plt.show()
sc_high = SpectralClustering(n_clusters=7, gamma=5).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=sc_high.labels_)
plt.title("Spectral Clustering of the original dataset, gamma=5.0")
plt.show()
# Compare times
start = time()
grid_search.best_estimator_.fit(X)
end = time()
print("SpectralBridges fit time:", end - start)
start = time()
sc_low.fit(X)
end = time()
print("SpectralClustering fit time:", end - start)
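A single `time()` measurement can be noisy, so a repeated measurement is often more informative when comparing fit times. The sketch below is one possible way to do that with the standard library only. The helper name `time_call` and the stand-in workload are hypothetical and not part of sbcluster or scikit-learn.

```python
from time import perf_counter


def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()  # monotonic, high-resolution clock
        func(*args, **kwargs)
        best = min(best, perf_counter() - start)
    return best


# Stand-in workload; with the example above one could instead pass
# an estimator's `fit` method and the data matrix `X`.
elapsed = time_call(sum, range(100_000))
print(f"best of 3 runs: {elapsed:.6f} s")
```

Taking the minimum over a few runs reduces the influence of one-off slowdowns such as cache warm-up or background processes.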
Results Comparison
📖 Learn More
For tutorials and the API reference, visit the official site:
👉 sbcluster Documentation
File details
Details for the file sbcluster-0.3.4.tar.gz.
File metadata
- Download URL: sbcluster-0.3.4.tar.gz
- Upload date:
- Size: 740.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 821ebb7e38e8c19aff7ed62f6d536ff02d391048228abcf4d25090071208d453 |
| MD5 | 74a7e51330a5cfc12e3394c2be9ae172 |
| BLAKE2b-256 | e168721982640efb270e963322c9b19aafc77c48c7f6f74ba9d686ecb2a7acfe |
File details
Details for the file sbcluster-0.3.4-py3-none-any.whl.
File metadata
- Download URL: sbcluster-0.3.4-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8c112705c57824f125fb4bf55857ea17681b73481eac8824ea9751e58984e71e |
| MD5 | 3d175db7f66db6a394affc9c7f150832 |
| BLAKE2b-256 | 38e21e2cd78cce1f809728ba457dad75c62afd6208a2f7d2b50e645378f885ce |