connectivity modifier
Project description
cm
Connectivity Modifier (CM) is a generic meta-method for community detection that enforces a minimum connectivity (the minimum number of edges whose removal disconnects a community) on the output communities (clusters). To be more precise, suppose you want to ensure that Leiden clusters are not "easily cut": for example, by requiring that no output cluster has connectivity below $\log_{10}(n)$, where $n$ is the size of that cluster. You can run CM paired with Leiden, and it will ensure that every output cluster has a minimum cut at least that large. CM supports customizable requirements on the connectivity of the clusters, and currently supports Leiden (CPM optimization), IKC, and Leiden (modularity optimization) out of the box. After installing the necessary dependencies, users can simply run CM to obtain clusters with strong guarantees on connectivity.
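The sketch below illustrates the general idea of such a meta-method. It is only a rough, hedged illustration written against networkx: the function name well_connected_clusters, the use of nx.stoer_wagner for minimum cuts, and the Louvain example are our own stand-ins, not CM's actual implementation, which relies on VieCut for cuts, Leiden/IKC as base clusterers, and additional processing such as re-clustering the split pieces.

# Illustrative sketch only; NOT the actual CM implementation.
import math
import networkx as nx

def well_connected_clusters(graph, clusters, threshold):
    """Repeatedly split clusters along their minimum cut until every remaining
    cluster has edge connectivity >= threshold(cluster_subgraph)."""
    work = [list(c) for c in clusters]      # clusters produced by the base method
    accepted = []
    while work:
        nodes = work.pop()
        if len(nodes) <= 1:
            continue                        # singletons carry no connectivity guarantee
        sub = graph.subgraph(nodes)
        if not nx.is_connected(sub):
            # a disconnected cluster is re-examined one component at a time
            work.extend(list(c) for c in nx.connected_components(sub))
            continue
        cut_value, (side_a, side_b) = nx.stoer_wagner(sub)
        if cut_value >= threshold(sub):
            accepted.append(nodes)          # well connected: keep the cluster
        else:
            # easily cut: split along the minimum cut and re-examine both sides
            work.append(list(side_a))
            work.append(list(side_b))
    return accepted

# Example: require edge connectivity >= log10(cluster size) on Louvain clusters.
G = nx.karate_club_graph()
base = nx.community.louvain_communities(G, seed=0)
out = well_connected_clusters(G, base, lambda s: math.log10(s.number_of_nodes()))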
Installation
Our software is provided via PyPI and requires Python 3.9 or newer.
pip3 install --pre connectivity-modifier # install prerelease version
Note that you must install VieCut as a dependency explicitly: the viecut binary must be compiled, and the path to viecut must be specified in the config file (explained below). Say you installed viecut at /foo/bar/viecut; then create a config file at ~/.config/cm/config.toml with something like this:
[tools]
ikc_path = "{project_root}/third_party/ikc.py" # {project_root} is a specific path resolving to the source code root
leiden_path = "" # currently obsolete
viecut_path = "/foo/bar/viecut" # viecut's path
Alternatively, if the main executable detects a cm.toml file in the current working directory, that config file takes the highest priority instead.
After all this, try cm --help, and you should see something like this:
Usage: cm [OPTIONS]
Connectivity-Modifier (CM). Take a network and cluster it ensuring cut
validity
Options:
-i, --input TEXT [required]
[...]
Usage
Our main executable is provided as cm, and we list the options below:
-i, --input GRAPH_TSV
The input graph to be clustered, where GRAPH_TSV is a tab-delimited edgelist containing only integer node ids (see the small example after this option list). Note that we follow the igraph convention and assume that the input node ids are contiguous; if they are not, dummy nodes are added.
-c, --clusterer [leiden|ikc|leiden_mod]
The clusterer to be paired with CM. If using an existing clustering (-e), the same clusterer must be used (see below); otherwise, one must decide which clusterer to use. The clusterers are:
leiden: Leiden (leidenalg) with CPM optimization; -g, --resolution must also be specified (see below)
ikc: Iterative k-core; -k must also be specified (see below)
leiden_mod: Leiden with modularity optimization; no other parameters may be specified
-e, --existing-clustering CLUSTERING_FILE
Specifies a starting clustering to be modified so that all clusters satisfy the connectivity threshold (cf. -t), in effect saving cm the time of reproducing the initial clustering. The file format is "native" to the clustering method: for IKC, it is the default IKC CSV output format; for Leiden, it is the Leiden output format (i.e., a tab-delimited node_id cluster_id file).
-g, --resolution FLOAT, -k, --k INTEGER
The respective parameters for Leiden (CPM) (-c leiden) and IKC (-c ikc). At most one should be specified, and for modularity optimization neither should be specified.
-o, --output OUTPUT_PREFIX
The output prefix. Two files will be produced: OUTPUT_PREFIX itself, a file recording the last cluster each node has been in, and {OUTPUT_PREFIX}.tree.json, a serialized tree recording the history of the algorithm's execution. See also the Format Conversion section below on converting the output to more easily parsed formats.
-t, --threshold TEXT
Threshold expression. cm guarantees that every cluster in the output clustering has connectivity at or above the given threshold. We list some examples for -t below (a worked evaluation follows this option list):
# each line denotes a valid example for "-t"
2 # connectivity must be >= 2
0.1mcd # connectivity must be >= 0.1 * mcd, where mcd is the minimum intra-cluster degree
0.1mcd+42 # linear combinations are allowed to some extent
1log10 # connectivity must be >= log10 of the cluster size
99log10+0.0002mcd+1 # combinations like this are also allowed
-d, --working-dir TEXT
Entirely optional; specifies where cm should store its temporary files.
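For reference, here is a small, purely hypothetical GRAPH_TSV describing a four-node graph, with one tab-separated edge (a pair of integer node ids) per line:

0	1
0	2
1	2
2	3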
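To make the threshold semantics concrete, here is a worked reading of the expression syntax above (our interpretation; double-check against your version of cm): for a cluster with $n = 1000$ nodes and minimum intra-cluster degree $\mathrm{mcd} = 4$, the expression 1log10+0.5mcd+1 evaluates to $\log_{10}(1000) + 0.5 \cdot 4 + 1 = 6$, so the cluster is accepted only if at least 6 edges must be removed to disconnect it.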
Example commands
# Leiden, CPM optimization (resolution = 0.1)
# BUT, the output clusters must satisfy global connectivity >= 1 * log10(n), n the size of cluster
cm -i graph.tsv -c leiden -g 0.1 -t 1log10 -o leiden_clus.txt
# IKC, k = 10
# BUT, the output clusters must satisfy global connectivity >= 0.1 * mcd, MCD the minimum intra-cluster degree among all nodes
# we additionally use an existing IKC clustering (ikc_output.csv) as the starting point to be modified
cm -i graph.tsv -c ikc -k 10 -t 0.1mcd -e ikc_output.csv -o ikc_clus.txt
Format Conversion
The default output of cm contains the entire history of the algorithm's execution. This format preserves a great deal of information, but for data analysis it is often enough to know the clustering before modifying connectivity (i.e., as if just running the base method) and the clustering after modifying connectivity. These two sets of clusters can be obtained from the cm output using the specialized tool cm2universal:
# INPUT_GRAPH is the same input graph that was given to `cm`
# CM_OUTPUT_PREFIX is the same output prefix of `cm`, i.e., `{CM_OUTPUT_PREFIX}.tree.json` and `CM_OUTPUT_PREFIX` are existing files
# CLUSTERS_OUTPUT_PREFIX is where you want the converted clusters
cm2universal -g INPUT_GRAPH -i CM_OUTPUT_PREFIX -o CLUSTERS_OUTPUT_PREFIX
Two files will be generated: {CLUSTERS_OUTPUT_PREFIX}.original.json and {CLUSTERS_OUTPUT_PREFIX}.extant.json, containing the clusters before and after modification, respectively. The JSON files use the so-called "universal" newline-delimited JSON format, which looks like this:
{"label": "0", "nodes": [0, 3, 7, 9], "connectivity": 1}
{"label": "46", "nodes": [5765736, 4717164, 14154348, 3144303, 6290035, 3668596, 1571445, 2620022, 4717176], "connectivity": 2}
These files can be parsed directly (each line is a cluster, with label the cluster name, nodes the node ids of that cluster, and connectivity its edge connectivity) or paired with the data science tool belinda.
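Since the format is plain newline-delimited JSON, a few lines of Python suffice to read it. The following is a minimal sketch; the file name clusters.extant.json is only a placeholder for your own {CLUSTERS_OUTPUT_PREFIX}.extant.json:

import json

# Read one cluster per non-empty line of the "universal" newline-delimited JSON file.
clusters = []
with open("clusters.extant.json") as f:  # placeholder path
    for line in f:
        line = line.strip()
        if line:
            clusters.append(json.loads(line))

# For example, print each cluster's label, size, and edge connectivity.
for c in clusters:
    print(c["label"], len(c["nodes"]), c["connectivity"])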
Development
We use Poetry to manage the project and follow the Poetry conventions. See below for some example commands:
poetry install # install networkit and co
poetry run pytest # run tests
Download files
Download the file for your platform.
Source Distribution: connectivity-modifier-0.1.0b0.tar.gz
Built Distribution: connectivity_modifier-0.1.0b0-py3-none-any.whl
Hashes for connectivity-modifier-0.1.0b0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 48cb2ffd4c9be9feac878380fbbc3f7a7b974c9dec761d40a40c1e9b4a35e459
MD5 | 6a1365bf4ad8ced6184c58ff14bc1d9a
BLAKE2b-256 | 1a42258cb19cdf22dcaa44a422dfb5e024f1dd017edec62ff837c0bed6e8c6fc
Hashes for connectivity_modifier-0.1.0b0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1d30285fa058660a6e2e15e998d2a232d85cd7b236272c6d72a49726b20ff63d
MD5 | aaf7dfc2d0d1cded1fd34e9835ebd5e3
BLAKE2b-256 | 71f5d79760aa3465cc3c22c4538eb1da32c7bdfa5fb9061ef43d4e86846aad73