Community detection via Louvain/Leiden + Genetic Algorithm

TAU Community Detection

tau-community-detection implements TAU, an evolutionary community detection algorithm that couples genetic search with Leiden refinements. It is designed for scalable graph clustering with a simple drop-in run_clustering() API, sensible defaults, and multiprocessing support.


Highlights

  • Evolutionary search: Maintains a population of candidate partitions and applies crossover and mutation tailored for graph clustering.
  • Leiden optimization: Refines every candidate with Leiden to ensure modularity gains each generation.
  • Multiprocessing aware: Utilises parallel worker pools for population optimization with automatic fallback to sequential mode.
  • Fully reproducible: Pass random_seed to seed both TAU's numpy RNG and igraph's Leiden RNG — same seed always produces identical results.
  • Input flexibility: Accepts igraph.Graph, networkx.Graph, or a file path. Edge weights are auto-detected.
  • Simple API: Use run_clustering(graph) for zero-friction usage, or drop down to TauClustering + TauConfig for full control.
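The generational loop described in these highlights can be pictured with a toy example. The sketch below is not TAU's implementation: it replaces Leiden refinement with a single random node move, uses a simplified crossover, and every name in it (`evolve`, `crossover`, `mutate`, `normalize`) is hypothetical. It only illustrates how elites, immigrants, and offspring could make up each generation, with modularity as the fitness.

```python
import random

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
N = 6


def modularity(membership):
    """Newman modularity of a partition of the unweighted toy graph."""
    m = len(EDGES)
    internal, degree = {}, {}
    for u, v in EDGES:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def normalize(partition):
    """Relabel communities as 0..k-1 in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in partition]


def random_partition(rng):
    return [rng.randrange(N) for _ in range(N)]


def crossover(a, b, rng):
    """Start from parent b and overlay one community copied from parent a."""
    child = [label + N for label in b]  # shift b's labels so they cannot clash with a's
    chosen = rng.choice(a)
    for node, label in enumerate(a):
        if label == chosen:
            child[node] = chosen
    return normalize(child)


def mutate(partition, rng):
    """Stand-in for refinement: move one endpoint of a random edge to the other's community."""
    u, v = rng.choice(EDGES)
    child = list(partition)
    child[u] = child[v]
    return child


def evolve(pop_size=20, generations=15, elite_fraction=0.1, immigrant_fraction=0.15, seed=0):
    rng = random.Random(seed)
    population = [random_partition(rng) for _ in range(pop_size)]
    n_elite = max(1, int(elite_fraction * pop_size))
    n_immigrants = int(immigrant_fraction * pop_size)
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=modularity, reverse=True)
        history.append(modularity(population[0]))
        elites = population[:n_elite]  # carried over unchanged
        immigrants = [random_partition(rng) for _ in range(n_immigrants)]
        offspring = []
        while n_elite + n_immigrants + len(offspring) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)  # parents from the better half
            offspring.append(mutate(crossover(a, b, rng), rng))
        population = elites + immigrants + offspring
    return max(population, key=modularity), history


best, history = evolve()
print(best, round(modularity(best), 4))
```

Because the best partition is always carried over as an elite, the best fitness in `history` never decreases from one generation to the next.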

Installation

Requires Python 3.10 or newer.

pip install tau-community-detection

To work from a clone:

git clone https://github.com/HillelCharbit/TAU.git
cd TAU
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
pip install -e .

Quick Start

import igraph as ig
from tau_community_detection import run_clustering

g = ig.Graph.Famous("Zachary")

# Zero-friction default usage
clustering = run_clustering(g)
print(f"Communities: {len(clustering)},  Modularity: {clustering.modularity:.4f}")

# Override only the knobs you care about
clustering = run_clustering(
    g,
    resolution_parameter=0.8,
    random_seed=42,
    verbose=True,
    population_size=100,
    max_generations=50,
)

run_clustering() returns an igraph.VertexClustering, so .membership, .modularity, and all standard igraph attributes are available immediately.
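Since `.membership` on an igraph clustering is a list holding one community id per vertex, turning it into explicit node groups takes only a few lines. The helper below is a hypothetical convenience function, not part of the package, and it is shown on a hand-written membership list so that it runs without any dependencies.

```python
from collections import defaultdict


def communities_from_membership(membership):
    """Group vertex indices by community id."""
    groups = defaultdict(list)
    for vertex, community in enumerate(membership):
        groups[community].append(vertex)
    return dict(groups)


# Hand-written stand-in for clustering.membership
membership = [0, 0, 1, 0, 1]
print(communities_from_membership(membership))
```

With a real result, the same call would be `communities_from_membership(clustering.membership)`.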

NetworkX input

import networkx as nx
from tau_community_detection import run_clustering

g = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)
clustering = run_clustering(g)

Advanced usage with TauClustering

For full control over the lifecycle — including reusing the worker pool across multiple runs:

import igraph as ig
from tau_community_detection import TauClustering, TauConfig

g = ig.Graph.Famous("Zachary")  # same example graph as in Quick Start

config = TauConfig(
    population_size=60,
    max_generations=20,
    resolution_parameter=1.0,
    elite_fraction=0.15,
    immigrant_fraction=0.2,
    stopping_generations=10,
    random_seed=42,
    verbose=True,
)

with TauClustering(g, config=config) as tau:
    clustering, stats = tau.run(track_stats=True)

print(f"Ran for {len(stats)} generations")
print(f"Final modularity: {clustering.modularity:.4f}")

With track_stats=True, run() returns the clustering together with a list of per-generation dicts. Each dict has the keys generation, top_fitness, average_fitness, time_per_generation, convergence, elite_runtime, and crossover_runtime.
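As a rough illustration of working with these per-generation records, the sketch below finds the generation where the best fitness first peaked and adds up the time spent. The helper name `summarize_stats` is hypothetical, and the rows are made-up placeholders shaped like the documented keys, not real output.

```python
def summarize_stats(stats):
    """Return (generation of first peak, peak fitness, summed time_per_generation)."""
    best = max(stats, key=lambda row: row["top_fitness"])  # first row wins on ties
    total_time = sum(row["time_per_generation"] for row in stats)
    return best["generation"], best["top_fitness"], total_time


# Placeholder rows with invented values; only three of the documented keys are used.
example_stats = [
    {"generation": 0, "top_fitness": 0.38, "time_per_generation": 0.5},
    {"generation": 1, "top_fitness": 0.41, "time_per_generation": 0.25},
    {"generation": 2, "top_fitness": 0.41, "time_per_generation": 0.5},
]
print(summarize_stats(example_stats))
```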


Graph Input

Supported sources:

| Type | Notes |
| --- | --- |
| igraph.Graph | Passed directly; weights auto-detected from "weight" edge attribute |
| networkx.Graph | Converted internally; weights auto-detected |
| str (file path) | Edgelist/NCOL (.graph, .edgelist, .txt) or adjacency list (.adjlist) |

For large graphs or high worker counts, passing a file path is recommended — it avoids serialising the graph object across worker processes.
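One way to prepare such a file is to write the edges out with the standard library. The sketch below assumes the usual NCOL layout of whitespace-separated `source target [weight]` lines; the helper name `write_edgelist` and the file name are hypothetical, and the exact format the package's reader accepts should be checked against its own documentation.

```python
import os
import tempfile


def write_edgelist(edges, path):
    """Write (u, v) or (u, v, weight) tuples as one whitespace-separated line each."""
    with open(path, "w", encoding="utf-8") as handle:
        for edge in edges:
            handle.write(" ".join(str(field) for field in edge) + "\n")
    return path


edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0)]
path = write_edgelist(edges, os.path.join(tempfile.mkdtemp(), "toy.edgelist"))

# The resulting path could then be passed in place of a graph object:
# clustering = run_clustering(path)
```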

Edge weights are detected automatically. To override:

from tau_community_detection import TauConfig
config = TauConfig(is_weighted=False)   # force unweighted even if file has weights

Configuration Reference

All hyperparameters live on TauConfig. Every field is validated on construction — invalid values raise ValueError immediately.

| Parameter | Default | Valid range | Description |
| --- | --- | --- | --- |
| population_size | 60 | > 0 | Number of candidate partitions per generation |
| max_generations | 20 | > 0 | Hard cap on evolutionary iterations |
| worker_count | None | ≥ 1 | Parallel workers (default: CPU count, capped by population size) |
| elite_fraction | 0.1 | (0, 1] | Fraction of best partitions preserved each generation |
| immigrant_fraction | 0.15 | (0, 1] | Fraction of fresh random partitions injected each generation |
| selection_power | 5 | > 0 | Sharpness of fitness-proportional parent selection |
| elite_similarity_threshold | 0.9 | [0, 1] | Jaccard threshold below which two elites are considered diverse |
| stopping_generations | 10 | > 0 | Generations without improvement before early stopping |
| stopping_jaccard | 0.98 | [0, 1] | Similarity threshold that counts as "no improvement" |
| n_iterations | 3 | > 0 | Leiden iterations per fitness evaluation |
| resolution_parameter | 1.0 | > 0 | Leiden resolution; higher values produce more, smaller communities |
| sample_fraction_range | (0.2, 0.9) | 0 < low ≤ high ≤ 1 | Range for random subgraph sampling during population init |
| is_weighted | None | bool or None | Override weight auto-detection (None = auto) |
| sim_sample_size | 20000 | int or None | Node sample size for Jaccard similarity (None = all nodes) |
| random_seed | None | int or None | Seeds both numpy and igraph's Leiden RNG for fully deterministic results |
| verbose | False | bool | Log progress to the standard Python logger |
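The elite_similarity_threshold and stopping_jaccard settings compare partitions by Jaccard similarity. One common way to define this for clusterings is over node pairs: the number of pairs grouped together in both partitions, divided by the number grouped together in at least one. The sketch below assumes that pair-counting definition, which may differ from what TAU computes internally, and the function name `pair_jaccard` is hypothetical.

```python
from itertools import combinations


def pair_jaccard(a, b):
    """Jaccard index over node pairs co-clustered in partition a versus partition b."""
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    # Two partitions with no co-clustered pairs at all are treated as identical.
    return both / either if either else 1.0


p1 = [0, 0, 0, 1, 1, 1]
p2 = [0, 0, 1, 1, 1, 1]  # node 2 moved to the other community
print(round(pair_jaccard(p1, p2), 4))
```

The loop is quadratic in the number of nodes, which is presumably why a setting such as sim_sample_size, restricting the comparison to a sample of nodes, is useful on large graphs.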

run_clustering() exposes the most common parameters directly. Any TauConfig field can also be passed as a keyword argument:

clustering = run_clustering(g, elite_fraction=0.2, stopping_generations=5)
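Validation on construction means an out-of-range value should fail at the point where the configuration is built rather than partway through a run. The dataclass below is a hypothetical stand-in named `SketchConfig`, not the package's TauConfig. It mirrors a few of the documented ranges to show the pattern, and its error messages are invented.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical config mirroring some ranges from the reference table."""

    population_size: int = 60
    elite_fraction: float = 0.1
    stopping_jaccard: float = 0.98
    sample_fraction_range: tuple = (0.2, 0.9)

    def __post_init__(self):
        if self.population_size <= 0:
            raise ValueError("population_size must be > 0")
        if not 0 < self.elite_fraction <= 1:
            raise ValueError("elite_fraction must be in (0, 1]")
        if not 0 <= self.stopping_jaccard <= 1:
            raise ValueError("stopping_jaccard must be in [0, 1]")
        low, high = self.sample_fraction_range
        if not 0 < low <= high <= 1:
            raise ValueError("sample_fraction_range must satisfy 0 < low <= high <= 1")


try:
    SketchConfig(elite_fraction=0.0)  # 0 lies outside (0, 1]
except ValueError as error:
    print(f"rejected: {error}")
```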

Development

pip install -r requirements-dev.txt
pip install -e .
make lint     # ruff checks
make test     # pytest
make coverage # pytest + coverage report
make build    # build sdist + wheel

Continuous Integration

GitHub Actions runs lint, tests (Python 3.10 and 3.11), and a package build on every push and pull request. Set the CODECOV_TOKEN secret to upload coverage reports.

Publishing

  1. Bump version in setup.cfg and commit.
  2. Tag the release: git tag vX.Y.Z && git push --tags.
  3. Run the Publish Package workflow. Use TEST_PYPI_API_TOKEN for a dry run on TestPyPI, or PYPI_API_TOKEN to publish to PyPI.

Reference & Citation

If you use TAU in your research, please cite:

Gal Gilad and Roded Sharan. "From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm." PNAS Nexus, Volume 2, Issue 6, June 2023. DOI: 10.1093/pnasnexus/pgad180

@article{gilad2023tau,
  title={From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm},
  author={Gilad, Gal and Sharan, Roded},
  journal={PNAS Nexus},
  volume={2},
  number={6},
  pages={pgad180},
  year={2023},
  publisher={Oxford University Press}
}

License

MIT License © 2023 Hillel Charbit
