Community detection via Louvain/Leiden + Genetic Algorithm

TAU Community Detection

tau-community-detection implements TAU, an evolutionary community detection algorithm that couples genetic search with Leiden refinements. It is designed for scalable graph clustering with a simple drop-in run_clustering() API, sensible defaults, and multiprocessing support.


Highlights

  • Evolutionary search: Maintains a population of candidate partitions and applies crossover and mutation tailored for graph clustering.
  • Leiden optimization: Refines every candidate with Leiden to ensure modularity gains each generation.
  • Multiprocessing aware: Utilises parallel worker pools for population optimization with automatic fallback to sequential mode.
  • Fully reproducible: Pass random_seed to seed both TAU's numpy RNG and igraph's Leiden RNG — same seed always produces identical results.
  • Input flexibility: Accepts igraph.Graph, networkx.Graph, or a file path. Edge weights are auto-detected.
  • Simple API: Use run_clustering(graph) for zero-friction usage, or drop down to TauClustering + TauConfig for full control.

Installation

Requires Python 3.10 or newer.

pip install tau-community-detection

To work from a clone:

git clone https://github.com/HillelCharbit/TAU.git
cd TAU
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
pip install -e .

Quick Start

import igraph as ig
from tau_community_detection import run_clustering

g = ig.Graph.Famous("Zachary")

# Zero-friction default usage
clustering = run_clustering(g)
print(f"Communities: {len(clustering)},  Modularity: {clustering.modularity:.4f}")

# Override only the knobs you care about
clustering = run_clustering(
    g,
    resolution=0.8,
    random_seed=42,
    verbose=True,
    population_size=100,
    max_generations=50,
)

run_clustering() returns an igraph.VertexClustering, so .membership, .modularity, and all standard igraph attributes are available immediately.
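The membership vector is a flat list with one community id per vertex, so a common follow-up step is to group vertex indices by community. The sketch below shows one way to do that with the standard library only. The helper name group_by_community and the sample membership list are illustrative and not part of the package.

```python
from collections import defaultdict


def group_by_community(membership):
    """Map each community id to the list of vertex indices assigned to it."""
    groups = defaultdict(list)
    for vertex, community in enumerate(membership):
        groups[community].append(vertex)
    return dict(groups)


# Made-up membership vector for a six-vertex graph (illustration only).
# In practice this would come from clustering.membership.
membership = [0, 0, 1, 1, 0, 2]
communities = group_by_community(membership)
print(communities)  # {0: [0, 1, 4], 1: [2, 3], 2: [5]}
```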

NetworkX input

import networkx as nx
from tau_community_detection import run_clustering

g = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)
clustering = run_clustering(g)

Advanced usage with TauClustering

For full control over the lifecycle — including reusing the worker pool across multiple runs:

from tau_community_detection import TauClustering, TauConfig

config = TauConfig(
    population_size=60,
    max_generations=20,
    resolution=1.0,
    elite_fraction=0.15,
    immigrant_fraction=0.2,
    stopping_generations=10,
    random_seed=42,
    verbose=True,
)

with TauClustering(g, config=config) as tau:
    clustering, stats = tau.run(track_stats=True)

print(f"Ran for {len(stats)} generations")
print(f"Final modularity: {clustering.modularity:.4f}")

With track_stats=True, run() returns the clustering together with a list of per-generation dicts. Each dict has the keys generation, top_fitness, average_fitness, time_per_generation, convergence, elite_runtime, and crossover_runtime.
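Because the statistics are plain dicts, they can be summarized without extra dependencies. The following is a minimal sketch that uses only the key names listed above. The function summarize_stats and the sample rows are made up for illustration, and the sketch assumes the fitness and timing values are plain numbers.

```python
def summarize_stats(stats):
    """Summarize per-generation stats: run length, best generation, total time."""
    # max() keeps the first row on ties, i.e. the earliest generation
    # that reached the best fitness.
    best = max(stats, key=lambda row: row["top_fitness"])
    return {
        "generations": len(stats),
        "best_generation": best["generation"],
        "best_fitness": best["top_fitness"],
        "total_time": sum(row["time_per_generation"] for row in stats),
    }


# Made-up rows that mimic a subset of the documented keys (illustration only).
sample_stats = [
    {"generation": 0, "top_fitness": 0.38, "average_fitness": 0.35, "time_per_generation": 0.5},
    {"generation": 1, "top_fitness": 0.41, "average_fitness": 0.37, "time_per_generation": 0.25},
    {"generation": 2, "top_fitness": 0.41, "average_fitness": 0.39, "time_per_generation": 0.25},
]

summary = summarize_stats(sample_stats)
print(summary)
```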


Graph Input

Supported sources:

| Type | Notes |
| --- | --- |
| igraph.Graph | Passed directly; weights auto-detected from "weight" edge attribute |
| networkx.Graph | Converted internally; weights auto-detected |
| str (file path) | Edgelist/NCOL (.graph, .edgelist, .txt) or adjacency list (.adjlist) |

For large graphs or high worker counts, passing a file path is recommended — it avoids serialising the graph object across worker processes.
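To pass a file path, the graph first has to exist on disk. The sketch below writes a small unweighted edge list with one whitespace-separated pair of vertex ids per line. That line layout is an assumption about what the loader accepts for .edgelist files, and the helper write_edgelist and the toy edges are illustrative only.

```python
import os
import tempfile


def write_edgelist(edges, path):
    """Write one 'source target' pair per line (assumed edge list layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for source, target in edges:
            handle.write(f"{source} {target}\n")


# Toy graph: a triangle plus one separate edge (illustration only).
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
path = os.path.join(tempfile.mkdtemp(), "toy.edgelist")
write_edgelist(edges, path)

# The path can then be handed over in place of a graph object,
# for example run_clustering(path).
print(path)
```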

Edge weights are detected automatically. To override:

from tau_community_detection import TauConfig
config = TauConfig(is_weighted=False)   # force unweighted even if file has weights

Configuration Reference

All hyperparameters live on TauConfig. Every field is validated on construction — invalid values raise ValueError immediately.

| Parameter | Default | Valid range | Description |
| --- | --- | --- | --- |
| population_size | 60 | > 0 | Number of candidate partitions per generation |
| max_generations | 20 | > 0 | Hard cap on evolutionary iterations |
| worker_count | None | ≥ 1 | Parallel workers (default: CPU count, capped by population size) |
| elite_fraction | 0.1 | (0, 1] | Fraction of best partitions preserved each generation |
| immigrant_fraction | 0.15 | (0, 1] | Fraction of fresh random partitions injected each generation |
| selection_power | 5 | > 0 | Sharpness of fitness-proportional parent selection |
| elite_similarity_threshold | 0.9 | [0, 1] | Jaccard threshold below which two elites are considered diverse |
| stopping_generations | 10 | > 0 | Generations without improvement before early stopping |
| stopping_jaccard | 0.98 | [0, 1] | Similarity threshold that counts as "no improvement" |
| n_iterations | 3 | > 0 | Leiden iterations per fitness evaluation |
| resolution | 1.0 | > 0 | Leiden resolution; higher values produce more, smaller communities |
| sample_fraction_range | (0.2, 0.9) | 0 < low ≤ high ≤ 1 | Range for random subgraph sampling during population init |
| is_weighted | None | bool or None | Override weight auto-detection (None = auto) |
| sim_sample_size | 20000 | int or None | Node sample size for Jaccard similarity (None = all nodes) |
| random_seed | None | int or None | Seeds both numpy and igraph's Leiden RNG for fully deterministic results |
| verbose | False | bool | Log progress to the standard Python logger |
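Validation on construction can be pictured with a small dataclass that checks its fields in __post_init__. The class below, ToyConfig, is a hypothetical stand-in covering four of the fields and ranges from the table. It is not the package's TauConfig, whose internals and error messages may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical mini-config mirroring a few ranges from the table."""

    population_size: int = 60
    elite_fraction: float = 0.1
    resolution: float = 1.0
    sample_fraction_range: tuple = (0.2, 0.9)

    def __post_init__(self):
        # Each check mirrors one "Valid range" entry; failures raise ValueError.
        if self.population_size <= 0:
            raise ValueError("population_size must be > 0")
        if not 0 < self.elite_fraction <= 1:
            raise ValueError("elite_fraction must be in (0, 1]")
        if self.resolution <= 0:
            raise ValueError("resolution must be > 0")
        low, high = self.sample_fraction_range
        if not 0 < low <= high <= 1:
            raise ValueError("sample_fraction_range must satisfy 0 < low <= high <= 1")


config = ToyConfig(population_size=100, elite_fraction=0.2)

try:
    ToyConfig(elite_fraction=0.0)  # 0 lies outside (0, 1]
except ValueError as error:
    print(f"rejected: {error}")
```

Failing at construction time means a bad value surfaces before any worker pool is started, rather than partway through a long run.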

run_clustering() exposes the most common parameters directly. To set any other TauConfig field, build a TauConfig and pass it to TauClustering:

from tau_community_detection import TauClustering, TauConfig

config = TauConfig(elite_fraction=0.2, stopping_generations=5)
with TauClustering(g, config=config) as t:
    clustering = t.run()
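Several settings in the configuration table (elite_similarity_threshold, stopping_jaccard, sim_sample_size) refer to a Jaccard similarity between partitions. One common definition is the pair-counting Jaccard index: the share of vertex pairs placed in the same community by both partitions, out of the pairs placed together by at least one. The sketch below implements that definition as an illustration. Whether the package computes similarity in exactly this way is an assumption, and partition_jaccard is a hypothetical helper.

```python
from itertools import combinations


def co_clustered_pairs(membership):
    """Return the set of vertex pairs that share a community."""
    return {
        (i, j)
        for i, j in combinations(range(len(membership)), 2)
        if membership[i] == membership[j]
    }


def partition_jaccard(a, b):
    """Pair-counting Jaccard index between two membership vectors."""
    if len(a) != len(b):
        raise ValueError("membership vectors must have the same length")
    pairs_a = co_clustered_pairs(a)
    pairs_b = co_clustered_pairs(b)
    union = pairs_a | pairs_b
    if not union:
        # Both partitions consist only of singletons, so they agree trivially.
        return 1.0
    return len(pairs_a & pairs_b) / len(union)


# Two made-up partitions of five vertices (illustration only).
a = [0, 0, 0, 1, 1]
b = [0, 0, 1, 1, 1]
print(partition_jaccard(a, b))  # 2 shared pairs out of 6 in the union
```

This version enumerates all vertex pairs, so its cost grows quadratically with the number of vertices. That cost is a plausible reason for comparing on a node sample, as sim_sample_size suggests.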

Development

pip install -r requirements-dev.txt
pip install -e .
make lint     # ruff checks
make test     # pytest
make coverage # pytest + coverage report
make build    # build sdist + wheel

Continuous Integration

GitHub Actions runs lint, tests (Python 3.10 and 3.11), and a package build on every push and pull request. Set the CODECOV_TOKEN secret to upload coverage reports.

Publishing

  1. Bump version in setup.cfg and commit.
  2. Tag the release: git tag vX.Y.Z && git push --tags.
  3. Run the Publish Package workflow. Use TEST_PYPI_API_TOKEN for a dry run on TestPyPI, or PYPI_API_TOKEN to publish to PyPI.

Reference & Citation

If you use TAU in your research, please cite:

Gal Gilad and Roded Sharan. From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. PNAS Nexus, Volume 2, Issue 6, June 2023. DOI: 10.1093/pnasnexus/pgad180

@article{gilad2023tau,
  title={From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm},
  author={Gilad, Gal and Sharan, Roded},
  journal={PNAS Nexus},
  volume={2},
  number={6},
  pages={pgad180},
  year={2023},
  publisher={Oxford University Press}
}

License

MIT License © 2023 Hillel Charbit
