Markov Clustering in Python

pyMarkovClustering

Overview

pyMarkovClustering is a Python package for Markov Clustering (MCL) and for visualizing the resulting cluster graphs. An existing Python library, markov_clustering, already implements the MCL algorithm, but it has not been maintained for a long time and lacks some functionality. pyMarkovClustering was developed to address these issues.

[!NOTE] This library implements MCL on scipy sparse matrices and can cluster simple graphs with a few thousand nodes without any problems. To cluster a complex graph with tens of thousands of nodes or more, I recommend the mcl command-line tool for better runtime performance and memory efficiency.
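
For readers new to MCL, the loop the algorithm repeats (expansion, then inflation, until the matrix stops changing) can be sketched in a few lines. This is a dense, pure-Python illustration and not pyMarkovClustering's scipy-based implementation. The function names (`normalize_columns`, `expand`, `inflate`, `mcl_sketch`, `clusters_from`), the added self-loops, and the tolerance values are assumptions made for this example.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take one more random-walk step."""
    n = len(m)
    return [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m, inflation):
    """Inflation: raise every entry to a power, then re-normalize columns."""
    return normalize_columns([[v ** inflation for v in row] for row in m])


def mcl_sketch(adjacency, inflation=2.0, max_iter=100, tol=1e-8):
    n = len(adjacency)
    # Assumption: add a self-loop of weight 1 to every node before normalizing.
    m = [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[r][c] - m[r][c])
                    for r in range(n) for c in range(n))
        m = nxt
        if delta < tol:  # matrix stopped changing
            break
    return m


def clusters_from(m, eps=1e-6):
    """Each non-zero row belongs to an attractor; its non-zero columns are the members."""
    found = []
    for row in m:
        members = tuple(i for i, v in enumerate(row) if v > eps)
        if members and members not in found:
            found.append(members)
    return found


# Two triangles (nodes 0-2 and 3-5) joined by one weak edge between 2 and 3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = clusters_from(mcl_sketch(adjacency))
print(result)
```

The dense nested lists cost O(n³) per expansion, which is why a sparse representation, or the mcl command-line tool, matters for large graphs.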

Installation

Python 3.9 or later is required for installation.
For visualization, networkx and extra packages (e.g. matplotlib, pygraphviz) are also required.

Install PyPI package:

pip install pymarkovclustering 
pip install networkx[default,extra] # For visualization

Install conda-forge package:

conda install -c conda-forge pymarkovclustering
conda install -c conda-forge networkx matplotlib-base pygraphviz pydot lxml # For visualization

[!NOTE] pygraphviz installation requires Graphviz and a C/C++ compiler.
If installation fails, see the pygraphviz install docs for details.

API Usage

See the notebooks and API docs in the documentation for more details.

Markov Clustering

Simple edges

import pymarkovclustering as pymcl

# List of edges (source, target, weight)
edges = [
    ("A", "B", 10),
    ("A", "C", 10),
    ("B", "C", 2),
    ("D", "E", 5),
    ("F", "G", 2),
    ("H", "I", 0.0),  # zero weight: H and I end up as separate singleton clusters
]
# Load edges as a sparse matrix, run MCL, then extract clusters
matrix, labels = pymcl.edges_to_sparse_matrix(edges)
mcl_matrix = pymcl.mcl(matrix, quiet=False)
clusters = pymcl.extract_clusters(mcl_matrix, labels)

for i, cluster in enumerate(clusters, 1):
    print(f"Cluster{i:03d}: {cluster}")

Output:

Cluster001: ['A', 'B', 'C']  
Cluster002: ['D', 'E']  
Cluster003: ['F', 'G']  
Cluster004: ['H']  
Cluster005: ['I']  
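
To see why the zero-weight H–I edge yields two singleton clusters, it helps to look at how an edge list can be turned into a label list plus an adjacency matrix. The sketch below is a plain-Python illustration, not the code behind `edges_to_sparse_matrix`. The helper name `edges_to_adjacency` and the choice to treat edges as undirected are assumptions.

```python
def edges_to_adjacency(edges):
    """Return (labels, matrix) for a weighted edge list.

    Assumption: edges are undirected, so the matrix is symmetric.
    """
    labels = []
    index = {}
    # Assign each node a row/column index in order of first appearance.
    for source, target, _ in edges:
        for node in (source, target):
            if node not in index:
                index[node] = len(labels)
                labels.append(node)
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for source, target, weight in edges:
        i, j = index[source], index[target]
        matrix[i][j] = matrix[j][i] = float(weight)
    return labels, matrix


edges = [
    ("A", "B", 10),
    ("A", "C", 10),
    ("B", "C", 2),
    ("D", "E", 5),
    ("F", "G", 2),
    ("H", "I", 0.0),
]
labels, matrix = edges_to_adjacency(edges)
h, i = labels.index("H"), labels.index("I")
print(labels)
print(matrix[h][i])  # 0.0: H and I are listed as nodes but not connected
```

H and I both receive an index, so they appear in the result, but the zero entry means no flow can pass between them.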

Randomly generated edges

import pymarkovclustering as pymcl

# Generate random edges for MCL test
edges = pymcl.random_edges(30, min_cluster_size=2, max_cluster_size=6)
print(f"Edges: {edges}\n")

# easymcl automates three steps: load edges as a matrix, run MCL, extract clusters
clusters = pymcl.easymcl(edges, inflation=2.0)

for i, cluster in enumerate(clusters, 1):
    print(f"Cluster{i:03d}: {cluster}")

Output:

Edges: [('5_2', '5_5', 0.625), ('2_1', '2_5', 0.602), ('6_4', '6_5', 0.301), ('5_3', '5_6', 0.73), ('5_2', '5_6', 0.612), ('5_3', '5_5', 0.333), ('2_3', '2_5', 0.33), ('5_1', '5_3', 0.918), ('1_2', '1_4', 0.218), ('7_1', '7_2', 0.291), ('4_2', '4_3', 0.553), ('3_1', '3_2', 0.354), ('5_3', '5_4', 0.828), ('2_2', '2_4', 0.099), ('6_2', '6_5', 0.875), ('2_1', '2_3', 0.533), ('2_1', '2_4', 0.705), ('5_4', '5_5', 0.704), ('1_1', '1_4', 0.968), ('2_2', '2_5', 0.074), ('5_1', '5_5', 0.093), ('1_2', '1_3', 0.892), ('6_2', '6_3', 0.091), ('1_3', '1_5', 0.095), ('6_2', '6_4', 0.993), ('5_2', '5_4', 0.785), ('1_1', '1_3', 0.83), ('4_3', '4_4', 0.521), ('6_1', '6_2', 0.222), ('4_1', '4_3', 0.64), ('2_3', '2_4', 0.85), ('4_1', '4_2', 0.316), ('6_1', '6_5', 0.543), ('6_3', '6_5', 0.489), ('5_1', '5_6', 0.84), ('4_1', '4_4', 0.204), ('1_3', '1_4', 0.14), ('1_2', '1_5', 0.139), ('7_1', '7_3', 0.125), ('6_1', '6_3', 0.803), ('5_4', '5_6', 0.063), ('2_2', '2_3', 0.147), ('2_1', '2_2', 0.987), ('4_2', '4_4', 0.443), ('5_2', '5_3', 0.71), ('7_2', '7_3', 0.333), ('6_3', '6_4', 0.998), ('1_4', '1_5', 0.799), ('1_1', '1_5', 0.358), ('5_1', '5_4', 0.916), ('5_1', '5_2', 0.062), ('2_4', '2_5', 0.56), ('1_1', '1_2', 0.918), ('5_5', '5_6', 0.917), ('6_1', '6_4', 0.142)]  

Cluster001: ['5_2', '5_5', '5_3', '5_6', '5_1', '5_4']  
Cluster002: ['2_1', '2_5', '2_3', '2_2', '2_4']  
Cluster003: ['6_4', '6_5', '6_2', '6_3', '6_1']  
Cluster004: ['1_2', '1_4', '1_1', '1_3', '1_5']  
Cluster005: ['4_2', '4_3', '4_4', '4_1']  
Cluster006: ['7_1', '7_2', '7_3']  
Cluster007: ['3_1', '3_2']  
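
The labels in this output look like `<group>_<member>`, which suggests a quick sanity check: every recovered cluster should contain a single group prefix. The snippet below runs that check on the clusters printed above. Reading the prefix as the intended group is an inference from this output, not documented behavior of `random_edges`.

```python
# Clusters copied from the output above.
clusters = [
    ['5_2', '5_5', '5_3', '5_6', '5_1', '5_4'],
    ['2_1', '2_5', '2_3', '2_2', '2_4'],
    ['6_4', '6_5', '6_2', '6_3', '6_1'],
    ['1_2', '1_4', '1_1', '1_3', '1_5'],
    ['4_2', '4_3', '4_4', '4_1'],
    ['7_1', '7_2', '7_3'],
    ['3_1', '3_2'],
]

# Assumption: the text before "_" names the group a node was generated in.
prefixes_per_cluster = [
    {label.split("_", 1)[0] for label in cluster} for cluster in clusters
]
all_pure = all(len(prefixes) == 1 for prefixes in prefixes_per_cluster)
sizes = [len(cluster) for cluster in clusters]

print(all_pure)  # True: no cluster mixes two prefixes
print(sizes)     # [6, 5, 5, 5, 4, 3, 2]
```

The sizes also fall within the `min_cluster_size=2` and `max_cluster_size=6` bounds passed to `random_edges`.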

Visualization

import pymarkovclustering as pymcl

# Generate random edges for MCL test
edges = pymcl.random_edges(100, random_add_rate=0.1, min_cluster_size=2)

# easymclviz automates four steps: load edges as a matrix, run MCL, extract clusters, visualize
fig = pymcl.easymclviz(edges, inflation=2.0, show_label=True)
fig.suptitle("Markov Clustering Result Visualization")
fig.savefig("clusters.png", dpi=100)

Output image: clusters.png

CLI Usage

pyMarkovClustering provides a simple CLI for running MCL and extracting clusters from an edges file.

Options

$ pymcl --help
usage: pymcl [options] edges.tsv -o clusters.tsv

Markov Clustering in Python

positional arguments:
  edges              Input edges(source, target, weight) tab-delimited file

optional arguments:
  -o , --outfile     Output tab-delimited clusters file (default: stdout)
  -I , --inflation   Inflation factor (default: 2.0)
  --max_iter         Max number of iteration (default: 100)
  -q, --quiet        No print log on screen (default: OFF)
  -v, --version      Print version information
  -h, --help         Show this help message and exit

Example Command

pymcl edges.tsv -I 2.0 -o clusters.tsv

e.g. edges.tsv >>> clusters.tsv
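
The help text describes the input as a tab-delimited file of (source, target, weight) rows. One way to produce such a file from Python is sketched below using only the standard library. The helper names `write_edges_tsv` and `read_edges_tsv` are hypothetical, and the absence of a header row is an assumption. The layout of the output clusters file is not shown here, so it is not illustrated.

```python
import csv
import os
import tempfile

edges = [("A", "B", 10), ("A", "C", 10), ("B", "C", 2), ("D", "E", 5)]


def write_edges_tsv(path, edges):
    """Write one 'source<TAB>target<TAB>weight' row per edge (no header assumed)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(edges)


def read_edges_tsv(path):
    """Read the file back into (source, target, weight) tuples."""
    with open(path, newline="") as f:
        return [(s, t, float(w)) for s, t, w in csv.reader(f, delimiter="\t")]


path = os.path.join(tempfile.mkdtemp(), "edges.tsv")
write_edges_tsv(path, edges)

with open(path) as f:
    text = f.read()
roundtrip = read_edges_tsv(path)
print(text)
```

The resulting file can then be passed to the command shown above, for example `pymcl edges.tsv -I 2.0 -o clusters.tsv`.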
