Graph-Based Clustering using connected components and spanning trees

Graph-Based Clustering

Graph-Based Clustering using connected components and minimum spanning trees.

Both clustering methods supported by this library are transductive: they label only the data they are fitted on and are not designed to be applied to new, unseen data.

Installation

To install graph-based-clustering run:

pip install graph-based-clustering

Usage

The library has an sklearn-like fit/fit_predict interface.

ConnectedComponentsClustering

This method computes the pairwise distances matrix of the input data and binarizes it with a threshold (parameter provided by the user) to build an undirected graph. The connected components of that graph are the clusters.
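
The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the function name threshold_components is hypothetical, and connecting points whose distance is at most the threshold (rather than strictly below it) is an assumption.

```python
from itertools import combinations
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def threshold_components(points, threshold):
    """Label points by connected component of the thresholded distance graph.

    Hypothetical helper for illustration only; assumes an edge exists when
    distance <= threshold.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest: every point starts alone

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Binarize the pairwise distances: an edge joins two components.
    for i, j in combinations(range(n), 2):
        if euclidean(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(n)]


X = [[0, 1], [1, 0], [1, 1]]
print(threshold_components(X, threshold=0.275))  # [0, 1, 2]
print(threshold_components(X, threshold=1.0))    # [0, 0, 0]
```

With a threshold of 0.275 no pair of points is close enough to be connected, so every point forms its own cluster. Raising the threshold to 1.0 links both neighbours of [1, 1] and merges everything into one cluster.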

Required arguments:

  • threshold - parameter used to binarize the pairwise distances matrix and build the undirected graph

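Optional arguments:

  • metric - distance metric used to compute the pairwise distances matrix (e.g. "euclidean")
  • n_jobs - number of parallel jobs used to compute the pairwise distances matrix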

Example:

import numpy as np
from graph_based_clustering import ConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = ConnectedComponentsClustering(
    threshold=0.275,
    metric="euclidean",
    n_jobs=-1,
)

clustering.fit(X)
labels_pred = clustering.labels_

# alternative: fit and get the labels in one call
labels_pred = clustering.fit_predict(X)

SpanTreeConnectedComponentsClustering

This method computes the pairwise distances matrix of the input data, builds a graph from it, and finds the minimum spanning tree. Finally, it performs the clustering by dividing the tree into n_clusters (parameter given by the user), removing the n_clusters - 1 edges with the highest weights.
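
As a rough illustration of these steps, here is a standard-library-only sketch that builds the minimum spanning tree with Kruskal's algorithm and then drops its heaviest edges. It is not the library's implementation: the function name span_tree_clusters is hypothetical, the metric is fixed to Euclidean distance, and ties between equal-weight edges are broken by sort order.

```python
from itertools import combinations
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def span_tree_clusters(points, n_clusters):
    """Cluster by cutting the n_clusters - 1 heaviest MST edges.

    Hypothetical helper for illustration only.
    """
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of points")

    parent = list(range(n))  # union-find forest

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges of the complete graph, lightest first.
    edges = sorted(
        (euclidean(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Kruskal's algorithm: keep an edge only if it joins two different trees.
    mst = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            mst.append((weight, i, j))

    # mst is sorted by weight, so dropping its tail removes the heaviest edges.
    kept = mst[: len(mst) - (n_clusters - 1)]

    # Rebuild the components from the kept edges only.
    parent[:] = range(n)
    for _, i, j in kept:
        parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(n)]


X = [[0, 0], [0, 1], [4, 0], [4, 2]]
print(span_tree_clusters(X, n_clusters=2))  # [0, 0, 1, 1]
print(span_tree_clusters(X, n_clusters=3))  # [0, 0, 1, 2]
```

Because the spanning tree of n points has n - 1 edges, removing n_clusters - 1 of them always leaves exactly n_clusters connected pieces, which is why this method takes the number of clusters rather than a distance threshold.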

Required arguments:

  • n_clusters - the number of clusters to find

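Optional arguments:

  • metric - distance metric used to compute the pairwise distances matrix (e.g. "euclidean")
  • n_jobs - number of parallel jobs used to compute the pairwise distances matrix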

Example:

import numpy as np
from graph_based_clustering import SpanTreeConnectedComponentsClustering

X = np.array([[0, 1], [1, 0], [1, 1]])

clustering = SpanTreeConnectedComponentsClustering(
    n_clusters=3,
    metric="euclidean",
    n_jobs=-1,
)

clustering.fit(X)
labels_pred = clustering.labels_

# alternative: fit and get the labels in one call
labels_pred = clustering.fit_predict(X)

Comparing on sklearn toy datasets

[Figure: ConnectedComponentsClustering on sklearn toy datasets]

[Figure: SpanTreeConnectedComponentsClustering on sklearn toy datasets]

Requirements

Python >= 3.7

Citation

If you use graph-based-clustering in a scientific publication, we would appreciate a citation using the following BibTeX entry:

@misc{dayyass2021graphbasedclustering,
    author       = {El-Ayyass, Dani},
    title        = {Graph-Based Clustering using connected components and spanning trees},
    howpublished = {\url{https://github.com/dayyass/graph-based-clustering}},
    year         = {2021}
}
