Graph-Based Clustering using connected components and spanning trees
Graph-Based Clustering
Graph-Based Clustering using connected components and minimum spanning trees.
Both clustering methods supported by this library are transductive: they assign labels only to the data they are fitted on and are not designed to be applied to new, unseen data.
Installation
To install graph-based-clustering run:
pip install graph-based-clustering
Usage
The library has an sklearn-like fit/fit_predict interface.
ConnectedComponentsClustering
This method computes the pairwise distance matrix of the input data and binarizes it with a user-provided threshold to build an undirected graph. The connected components of that graph are the clusters.
Required arguments:
- threshold - parameter used to binarize the pairwise distance matrix and build the undirected graph
Optional arguments:
- metric - sklearn.metrics.pairwise_distances parameter (default: "euclidean")
- n_jobs - sklearn.metrics.pairwise_distances parameter (default: None)
Example:
import numpy as np
from graph_based_clustering import ConnectedComponentsClustering
X = np.array([[0, 1], [1, 0], [1, 1]])
clustering = ConnectedComponentsClustering(
    threshold=0.275,
    metric="euclidean",
    n_jobs=-1,
)
clustering.fit(X)
labels_pred = clustering.labels_
# alternative
labels_pred = clustering.fit_predict(X)
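The thresholding procedure described above can be sketched with NumPy and SciPy alone. This is a minimal illustration, not the library's implementation: the function name threshold_components is hypothetical, SciPy's cdist stands in for sklearn.metrics.pairwise_distances, and linking two points when their distance is at most the threshold (rather than strictly below it) is an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist


def threshold_components(X, threshold, metric="euclidean"):
    """Label points by the connected components of a thresholded distance graph.

    Hypothetical helper for illustration only.
    """
    # Pairwise distance matrix of shape (n_samples, n_samples).
    distances = cdist(X, X, metric=metric)
    # Assumption: two points are linked when their distance is <= threshold.
    adjacency = (distances <= threshold).astype(int)
    # Each connected component of the undirected graph is one cluster.
    _, labels = connected_components(csr_matrix(adjacency), directed=False)
    return labels


X = np.array([[0, 1], [1, 0], [1, 1]])
# No two points are within 0.275 of each other, so each point is its own cluster.
print(threshold_components(X, threshold=0.275))
# With a threshold of 1.1, both other points link to [1, 1] and form one cluster.
print(threshold_components(X, threshold=1.1))
```

The threshold directly controls granularity: a smaller value yields more, tighter clusters, and a larger value merges them.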
SpanTreeConnectedComponentsClustering
This method computes the pairwise distance matrix of the input data, builds a graph from that matrix, and finds its minimum spanning tree. It then splits the tree into n_clusters parts (a parameter given by the user) by removing the n_clusters - 1 edges with the highest weights.
Required arguments:
- n_clusters - the number of clusters to find
Optional arguments:
- metric - sklearn.metrics.pairwise_distances parameter (default: "euclidean")
- n_jobs - sklearn.metrics.pairwise_distances parameter (default: None)
Example:
import numpy as np
from graph_based_clustering import SpanTreeConnectedComponentsClustering
X = np.array([[0, 1], [1, 0], [1, 1]])
clustering = SpanTreeConnectedComponentsClustering(
    n_clusters=3,
    metric="euclidean",
    n_jobs=-1,
)
clustering.fit(X)
labels_pred = clustering.labels_
# alternative
labels_pred = clustering.fit_predict(X)
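The spanning-tree procedure described above can be sketched with NumPy and SciPy. This is a rough illustration under stated assumptions, not the library's code: the function name mst_clusters is hypothetical, SciPy's cdist stands in for sklearn.metrics.pairwise_distances, and ties between equally heavy edges are broken arbitrarily.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist


def mst_clusters(X, n_clusters, metric="euclidean"):
    """Cut the heaviest minimum-spanning-tree edges to get n_clusters groups.

    Hypothetical helper for illustration only. Caveat: duplicate points have
    zero distance, which the sparse graph treats as "no edge".
    """
    distances = cdist(X, X, metric=metric)
    # Minimum spanning tree of the complete distance graph.
    mst = minimum_spanning_tree(csr_matrix(distances)).tocoo()
    # Keep every edge except the (n_clusters - 1) heaviest ones.
    n_keep = max(mst.nnz - (n_clusters - 1), 0)
    keep = np.argsort(mst.data)[:n_keep]
    pruned = csr_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=mst.shape
    )
    # The remaining forest's connected components are the clusters.
    _, labels = connected_components(pruned, directed=False)
    return labels


X = np.array([[0, 1], [1, 0], [1, 1]])
# Removing both tree edges leaves three singleton clusters.
print(mst_clusters(X, n_clusters=3))
```

Unlike the threshold-based method, this approach fixes the number of clusters in advance and lets the data decide which distances count as cluster boundaries.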
Comparing on sklearn toy datasets
ConnectedComponentsClustering
SpanTreeConnectedComponentsClustering
Requirements
Python >= 3.7
Citation
If you use graph-based-clustering in a scientific publication, we would appreciate a reference to the following BibTeX entry:
@misc{dayyass2021graphbasedclustering,
author = {El-Ayyass, Dani},
title = {Graph-Based Clustering using connected components and spanning trees},
howpublished = {\url{https://github.com/dayyass/graph-based-clustering}},
year = {2021}
}