Spectral Bridges clustering algorithm

📊 Spectral Bridges

sbcluster is a Python package that implements Spectral Bridges, a novel clustering algorithm that combines k-means and spectral clustering. It computes the affinity matrix efficiently and merges clusters based on a connectivity measure inspired by the SVM margin concept. The package is designed to provide robust clustering and is particularly suited to large datasets.
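To give some intuition for a margin-inspired connectivity measure, here is a minimal pure-Python sketch. It is an illustrative assumption, not the package's actual formula: the function name `bridge_affinity`, the projection onto the segment joining two centroids, and the normalization are all hypothetical choices made for this example. The idea is that two groups are strongly connected when many of their points extend far toward the other group's centroid.

```python
def bridge_affinity(points_a, points_b):
    """Hypothetical projection-based connectivity between two point groups.

    Not the sbcluster implementation: an illustrative sketch only.
    """

    def centroid(points):
        dim = len(points[0])
        return [sum(p[i] for p in points) / len(points) for i in range(dim)]

    def one_sided(points, mu_from, mu_to):
        # Direction from this group's centroid to the other group's centroid.
        direction = [t - f for f, t in zip(mu_from, mu_to)]
        norm_sq = sum(d * d for d in direction)
        if norm_sq == 0:
            raise ValueError("centroids coincide; the direction is undefined")
        total = 0.0
        for p in points:
            # Signed position of p along the segment (0 = own centroid, 1 = other).
            dot = sum((x - f) * d for x, f, d in zip(p, mu_from, direction))
            # Only points lying on the side facing the other group contribute.
            total += max(0.0, dot / norm_sq) ** 2
        return total

    mu_a, mu_b = centroid(points_a), centroid(points_b)
    score = one_sided(points_a, mu_a, mu_b) + one_sided(points_b, mu_b, mu_a)
    return score / (len(points_a) + len(points_b))


# Two tight, well-separated groups versus two groups stretched toward each other.
tight = bridge_affinity([(0, 0), (0.1, 0), (-0.1, 0)], [(10, 0), (10.1, 0), (9.9, 0)])
stretched = bridge_affinity([(0, 0), (4, 0), (-4, 0)], [(10, 0), (14, 0), (6, 0)])
print("tight groups:", tight)
print("stretched groups:", stretched)
```

In this sketch the stretched groups get the higher score, because their points reach further across the gap between the two centroids.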


✨ Features

  • Spectral Bridges Algorithm: Integrates k-means and spectral clustering with efficient affinity matrix calculation for improved clustering results.
  • Scalability: Designed to handle large datasets by optimizing cluster formation through advanced affinity matrix computations.
  • Customizable: Parameters such as the number of clusters, the number of iterations, and the random state allow flexible clustering configurations.
  • Model selection: Automatic selection of the number of nodes (m) according to a normalized eigengap metric.
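As a rough illustration of what an eigengap-based criterion can look like, the sketch below computes one possible normalized eigengap from a list of graph Laplacian eigenvalues. The exact metric used by sbcluster is not specified here, so the function name `normalized_eigengap` and the normalization (gap divided by the upper eigenvalue) are assumptions made for this example.

```python
def normalized_eigengap(eigenvalues, n_clusters):
    """Gap between the n_clusters-th and next eigenvalue, relative to the latter.

    Hypothetical helper: one common way to normalize an eigengap, not
    necessarily the definition used inside sbcluster.
    """
    vals = sorted(eigenvalues)
    if not 0 < n_clusters < len(vals):
        raise ValueError("n_clusters must satisfy 0 < n_clusters < len(eigenvalues)")
    lower, upper = vals[n_clusters - 1], vals[n_clusters]
    if upper == 0:
        return 0.0
    return (upper - lower) / upper


# Made-up Laplacian spectrum with three near-zero eigenvalues,
# which suggests three well-separated clusters.
spectrum = [0.0, 0.01, 0.02, 0.8, 0.9]
gap_at_3 = normalized_eigengap(spectrum, 3)
gap_at_2 = normalized_eigengap(spectrum, 2)
print("gap for 3 clusters:", gap_at_3)
print("gap for 2 clusters:", gap_at_2)
```

Under this assumed definition, a larger value indicates a cleaner separation between the first `n_clusters` eigenvalues and the rest of the spectrum.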

⚡ Speed

Spectral Bridges uses FAISS's efficient k-means implementation together with a clone of scikit-learn's centroid initialization method, which is more than 2x faster than using scikit-learn's implementation.


🚀 Installation

pip install sbcluster

🔧 Usage

Example

import numpy as np

from sbcluster import SpectralBridges

# Generate sample data
np.random.seed(0)
X = np.random.rand(100, 10)  # Replace with your dataset

# Initialize Spectral Bridges with a random seed (n_nodes can also be specified here if needed)
model = SpectralBridges(n_clusters=5, random_state=42)

# Define the range of nodes to evaluate: an iterable of integers, or None if n_nodes is already set
n_nodes_range = [10, 15, 20]

# Find the optimal number of nodes for the given number of clusters.
# This modifies the instance attributes and returns a dict.
# If n_nodes_range is None, the model uses self.n_nodes, provided it is not None.
mean_ngaps = model.fit_select(X, n_nodes_range)

print("Optimal number of nodes:", model.n_nodes)
print("Dict of mean normalized eigengaps:", mean_ngaps)

# Predict clusters for new data points
new_data = np.random.rand(20, 10)  # Replace with new data
predicted_clusters = model.predict(new_data)

print("Predicted clusters:", predicted_clusters)

# With a custom number of nodes and a p-bridge affinity
custom_model = SpectralBridges(n_clusters=5, n_nodes=12, p=1)

# Fit the model
custom_model.fit(X)

# Predict the same way...
custom_predicted_clusters = custom_model.predict(new_data)

print("Predicted clusters:", custom_predicted_clusters)
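The dict returned by `fit_select` can be inspected after fitting to see how the candidates compare. The sketch below does not call sbcluster: it uses made-up values, and it assumes that the dict maps each candidate number of nodes to its mean normalized eigengap and that a larger value is preferred. Both points are assumptions to check against the package documentation.

```python
# Hypothetical values standing in for the dict returned by fit_select;
# assumed to map each candidate n_nodes to its mean normalized eigengap.
mean_ngaps = {10: 0.42, 15: 0.57, 20: 0.51}

# Candidate with the largest mean normalized eigengap (assumed selection rule).
best_n_nodes = max(mean_ngaps, key=mean_ngaps.get)

# Rank all candidates from best to worst for a quick comparison.
ranked = sorted(mean_ngaps.items(), key=lambda item: item[1], reverse=True)
for n_nodes, gap in ranked:
    print(f"n_nodes={n_nodes}: mean normalized eigengap={gap:.2f}")

print("Best candidate:", best_n_nodes)
```

Printing the full ranking rather than only the winner makes it easier to spot cases where several node counts score almost equally.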

📖 Learn More

For tutorials and the API reference, visit the official site:
👉 sbcluster Documentation
