Python 3 implementation and documentation of the Hermina-Janos local graph clustering algorithm.

LocalClustering

The project implements multiple variations of a local graph clustering algorithm named the Hermina-Janos algorithm in memory of my beloved grandparents.

Graph cluster analysis is used in a wide variety of fields. This project does not target one specific field. Instead, it aims to be a general tool for graph cluster analysis in cases where global cluster analysis is not applicable or practical, for example because of the size of the data set or because a different (local) perspective is required.

The algorithms are independent of the cluster definition. The interface that cluster definitions must implement can be found in the definitions package, along with a simple connectivity-based cluster definition implementation. Besides the algorithms and the cluster definition, other utilities are also provided, most notably a module for node ranking.
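
To illustrate what it means for a local algorithm to be independent of the cluster definition, here is a minimal, self-contained sketch. It is not the Hermina-Janos algorithm and it does not use the interface from the definitions package: the names expand_locally, internal_edge_ratio and the accepts callback are hypothetical, the graph is a plain adjacency dictionary, and the sketch only ever adds nodes. The point is only that the expansion loop looks at the neighbourhood of the current cluster and delegates every membership decision to a pluggable rule.

```python
from typing import Callable, Dict, Set

Graph = Dict[str, Set[str]]  # adjacency sets of an undirected graph


def internal_edge_ratio(graph: Graph, cluster: Set[str], node: str) -> float:
    """Fraction of the node's neighbours that already lie inside the cluster."""
    neighbours = graph[node]
    if not neighbours:
        return 0.0
    return len(neighbours & cluster) / len(neighbours)


def expand_locally(
    graph: Graph,
    source: str,
    accepts: Callable[[Graph, Set[str], str], bool],
    max_cluster_size: int,
) -> Set[str]:
    """Grow a cluster from the source, visiting only nodes adjacent to it."""
    cluster = {source}
    changed = True
    while changed and len(cluster) < max_cluster_size:
        changed = False
        # Only the border of the current cluster is inspected, never the whole graph.
        border = set().union(*(graph[n] for n in cluster)) - cluster
        for node in sorted(border):
            if len(cluster) >= max_cluster_size:
                break
            # The membership decision is delegated to the pluggable rule.
            if accepts(graph, cluster, node):
                cluster.add(node)
                changed = True
    return cluster


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
toy_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}


def at_least_half_inside(graph: Graph, cluster: Set[str], node: str) -> bool:
    """Hypothetical rule: accept a node if half of its neighbours are in the cluster."""
    return internal_edge_ratio(graph, cluster, node) >= 0.5


result = expand_locally(toy_graph, "a", at_least_half_inside, max_cluster_size=6)
print(sorted(result))  # ['a', 'b', 'c']
```

Swapping at_least_half_inside for another rule changes the resulting cluster without touching the expansion loop, which is the kind of separation the paragraph above describes.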

Installation

  1. Install the latest version of the project from the Python Package Index using pip install localclustering.
  2. The only dependency of this project is the graphscraper project. graphscraper should already be installed after pip install localclustering, but it has optional dependencies, one of which must be available on your system:
    • SQLAlchemy: It can be installed with pip install SQLAlchemy.
    • Flask-SQLAlchemy: It can be installed with pip install Flask-SQLAlchemy.
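
If you are unsure whether one of the optional dependencies is already present, a quick check along the following lines may help. This is only a convenience sketch that uses the standard library; the helper name available_backends is made up for this example and is not part of localclustering or graphscraper. It assumes the usual import names of the two packages, sqlalchemy and flask_sqlalchemy.

```python
from importlib.util import find_spec

# Distribution name -> import name of the two optional backends listed above.
OPTIONAL_BACKENDS = {
    "SQLAlchemy": "sqlalchemy",
    "Flask-SQLAlchemy": "flask_sqlalchemy",
}


def available_backends(candidates=OPTIONAL_BACKENDS):
    """Return the distribution names whose import module can be located."""
    return [
        name
        for name, module in candidates.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Available database backends:", ", ".join(found))
    else:
        print("No backend found; install SQLAlchemy or Flask-SQLAlchemy.")
```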

Getting started

This section will guide you through the basics using SQLAlchemy and the IGraphWrapper graph implementation from graphscraper. IGraphWrapper requires the igraph project to be installed. You can do this by following the instructions at http://igraph.org/python/.

Once everything is in place, the analyzed graph can be created:

import igraph
from graphscraper.igraphwrapper import IGraphWrapper

graph = IGraphWrapper(igraph.Graph.Famous("Zachary"))

The next step is the creation of the cluster definition and the preparation of the clustering algorithm:

from localclustering.definitions.connectivity import ConnectivityClusterDefinition
from localclustering.localengine import LocalClusterEngine

cluster_definition = ConnectivityClusterDefinition(1.5, 0.85)
local_cluster_engine = LocalClusterEngine(
    cluster_definition,  # The cluster definition the algorithm should use.
    source_nodes_in_result=True,  # Ensure that source nodes are not removed from the cluster.
    max_cluster_size=34  # Specify an upper limit for the calculated cluster's size.
)

Now the source node of the clustering must be retrieved:

source_node = graph.nodes.get_node_by_name("2", can_validate_and_load=True)

And finally the cluster analysis can be executed:

cluster = local_cluster_engine.cluster([source_node])

Additionally, you can list the nodes inside the cluster with their rank to get an overview of the result:

rank_provider = local_cluster_engine.get_rank_provider()
for node in cluster.nodes:
    print(node.igraph_index, rank_provider.get_node_rank(node))
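
For larger clusters it can be easier to read the listing when it is sorted by rank. The helper below is a small sketch of that idea; rank_table is a hypothetical name, not a function of the library. It assumes that ranks are comparable numbers and makes no claim about whether a larger rank means a more important node, so adjust the sort order to your interpretation.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Node = TypeVar("Node")


def rank_table(
    nodes: Iterable[Node],
    get_rank: Callable[[Node], float],
) -> List[Tuple[Node, float]]:
    """Pair every node with its rank and sort the pairs, largest rank first."""
    ranked = [(node, get_rank(node)) for node in nodes]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# With the objects created in the steps above, the call would look like this:
# for node, rank in rank_table(cluster.nodes, rank_provider.get_node_rank):
#     print(node.igraph_index, rank)
```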

Community guidelines

Any form of constructive contribution is welcome:

  • Questions, feedback, bug reports: please open an issue in the issue tracker of the project or contact the repository owner by email, whichever you find more appropriate.
  • Contribution to the software: please open an issue in the issue tracker of the project that describes the changes you would like to make to the software, then open a pull request with the changes. The description of the pull request must reference the corresponding issue.

The following types of contribution are especially appreciated:

  • Implementation of new cluster definitions.
  • Result comparison with global clustering algorithms on well-known, well-analyzed graphs.
  • Analysis of how cluster definitions should be configured for graphs with different characteristics.
  • Analysis of how the weighting coefficients of the connectivity-based cluster definition corresponding to the different hierarchy levels relate to each other in different real-world graphs.

License - GNU AGPLv3

The library is open-sourced under the conditions of the GNU Affero General Public License v3.0, which is the strongest copyleft license. The reason for using this license is that this library is the "publication" of the Hermina-Janos algorithm and it should be referenced accordingly.
