Community detection via Louvain/Leiden + Genetic Algorithm
TAU Community Detection
tau-community-detection implements TAU, an evolutionary community detection algorithm
that couples genetic search with Leiden refinements. It is designed for scalable graph
clustering with a simple drop-in run_clustering() API, sensible defaults, and
multiprocessing support.
Highlights
- Evolutionary search: Maintains a population of candidate partitions and applies crossover and mutation tailored for graph clustering.
- Leiden optimization: Refines every candidate with Leiden to ensure modularity gains each generation.
- Multiprocessing aware: Utilises parallel worker pools for population optimization with automatic fallback to sequential mode.
- Fully reproducible: Pass random_seed to seed both TAU's numpy RNG and igraph's Leiden RNG; the same seed always produces identical results.
- Input flexibility: Accepts igraph.Graph, networkx.Graph, or a file path. Edge weights are auto-detected.
- Simple API: Use run_clustering(graph) for zero-friction usage, or drop down to TauClustering + TauConfig for full control.
Installation
Requires Python 3.10 or newer.
pip install tau-community-detection
To work from a clone:
git clone https://github.com/HillelCharbit/TAU.git
cd TAU
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
pip install -e .
Quick Start
import igraph as ig
from tau_community_detection import run_clustering
g = ig.Graph.Famous("Zachary")
# Zero-friction default usage
clustering = run_clustering(g)
print(f"Communities: {len(clustering)}, Modularity: {clustering.modularity:.4f}")
# Override only the knobs you care about
clustering = run_clustering(
    g,
    resolution_parameter=0.8,
    random_seed=42,
    verbose=True,
    population_size=100,
    max_generations=50,
)
run_clustering() returns an igraph.VertexClustering, so .membership, .modularity, and all standard igraph attributes are available immediately.
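To make the .modularity value concrete, the sketch below recomputes modularity from a membership list in plain Python. It is illustrative only and is not how the package or igraph computes it internally; the function name modularity_from_membership and the toy graph are assumptions made for this example. The gamma argument plays the same role as resolution_parameter in the standard modularity formula.

```python
from collections import defaultdict


def modularity_from_membership(edges, membership, gamma=1.0):
    """Modularity of an undirected, unweighted graph for a given partition.

    Q = sum over communities c of  L_c / m  -  gamma * (d_c / (2 * m)) ** 2
    where m is the edge count, L_c the number of edges inside c, and d_c the
    total degree of the vertices in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c per community
    degree_sum = defaultdict(int)  # d_c per community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = [0, 0, 0, 1, 1, 1]  # same shape as clustering.membership

q = modularity_from_membership(edges, membership)
print(f"Modularity: {q:.4f}")
```

Raising gamma penalises large communities more heavily, which is consistent with higher resolution values favouring more, smaller communities.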
NetworkX input
import networkx as nx
from tau_community_detection import run_clustering
g = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)
clustering = run_clustering(g)
Advanced usage with TauClustering
For full control over the lifecycle — including reusing the worker pool across multiple runs:
from tau_community_detection import TauClustering, TauConfig
config = TauConfig(
    population_size=60,
    max_generations=20,
    resolution_parameter=1.0,
    elite_fraction=0.15,
    immigrant_fraction=0.2,
    stopping_generations=10,
    random_seed=42,
    verbose=True,
)
with TauClustering(g, config=config) as tau:
    clustering, stats = tau.run(track_stats=True)
print(f"Ran for {len(stats)} generations")
print(f"Final modularity: {clustering.modularity:.4f}")
track_stats=True returns a list of per-generation dicts with keys generation, top_fitness, average_fitness, time_per_generation, convergence, elite_runtime, crossover_runtime.
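The per-generation records can be post-processed with ordinary Python. The following is a minimal sketch: summarise_stats is a hypothetical helper that is not part of the package, and the stats list is made-up sample data that only mimics a subset of the documented keys.

```python
def summarise_stats(stats):
    """Summarise a list of per-generation dicts.

    Only the keys generation, top_fitness, and time_per_generation are read.
    """
    if not stats:
        raise ValueError("stats is empty")
    # max() returns the first record on ties, i.e. the earliest best generation.
    best = max(stats, key=lambda row: row["top_fitness"])
    return {
        "generations": len(stats),
        "best_generation": best["generation"],
        "best_fitness": best["top_fitness"],
        "total_seconds": sum(row["time_per_generation"] for row in stats),
    }


# Made-up records for illustration; real values come from tau.run(track_stats=True).
stats = [
    {"generation": 1, "top_fitness": 0.40, "average_fitness": 0.35, "time_per_generation": 0.5},
    {"generation": 2, "top_fitness": 0.41, "average_fitness": 0.38, "time_per_generation": 0.25},
    {"generation": 3, "top_fitness": 0.41, "average_fitness": 0.40, "time_per_generation": 0.25},
]

summary = summarise_stats(stats)
print(summary)
```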
Graph Input
Supported sources:
| Type | Notes |
|---|---|
| igraph.Graph | Passed directly; weights auto-detected from "weight" edge attribute |
| networkx.Graph | Converted internally; weights auto-detected |
| str (file path) | Edgelist/NCOL (.graph, .edgelist, .txt) or adjacency list (.adjlist) |
For large graphs or high worker counts, passing a file path is recommended — it avoids serialising the graph object across worker processes.
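As a rough sketch of preparing such a file, the snippet below writes an edge list with one edge per line and an optional third column for the weight, which is the usual NCOL layout. The helper write_edgelist, the file name, and the edges are hypothetical; how the package parses the file is up to the library.

```python
import os
import tempfile


def write_edgelist(edges, path):
    """Write edges as whitespace-separated lines: "source target [weight]"."""
    with open(path, "w", encoding="utf-8") as handle:
        for edge in edges:
            handle.write(" ".join(str(field) for field in edge) + "\n")
    return path


# Hypothetical weighted edges: (source, target, weight).
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0)]

path = os.path.join(tempfile.mkdtemp(), "toy.edgelist")
write_edgelist(edges, path)

# Read the file back to show what was written.
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

The resulting path could then be handed to run_clustering(path) in place of a graph object.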
Edge weights are detected automatically. To override:
from tau_community_detection import TauConfig
config = TauConfig(is_weighted=False) # force unweighted even if file has weights
Configuration Reference
All hyperparameters live on TauConfig. Every field is validated on construction — invalid values raise ValueError immediately.
| Parameter | Default | Valid range | Description |
|---|---|---|---|
| population_size | 60 | > 0 | Number of candidate partitions per generation |
| max_generations | 20 | > 0 | Hard cap on evolutionary iterations |
| worker_count | None | ≥ 1 | Parallel workers (default: CPU count, capped by population size) |
| elite_fraction | 0.1 | (0, 1] | Fraction of best partitions preserved each generation |
| immigrant_fraction | 0.15 | (0, 1] | Fraction of fresh random partitions injected each generation |
| selection_power | 5 | > 0 | Sharpness of fitness-proportional parent selection |
| elite_similarity_threshold | 0.9 | [0, 1] | Jaccard threshold below which two elites are considered diverse |
| stopping_generations | 10 | > 0 | Generations without improvement before early stopping |
| stopping_jaccard | 0.98 | [0, 1] | Similarity threshold that counts as "no improvement" |
| n_iterations | 3 | > 0 | Leiden iterations per fitness evaluation |
| resolution_parameter | 1.0 | > 0 | Leiden resolution; higher values produce more, smaller communities |
| sample_fraction_range | (0.2, 0.9) | 0 < low ≤ high ≤ 1 | Range for random subgraph sampling during population init |
| is_weighted | None | bool or None | Override weight auto-detection (None = auto) |
| sim_sample_size | 20 000 | int or None | Node sample size for Jaccard similarity (None = all nodes) |
| random_seed | None | int or None | Seeds both numpy and igraph's Leiden RNG for fully deterministic results |
| verbose | False | bool | Log progress to the standard Python logger |
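To give a feel for how elite_fraction and immigrant_fraction relate to population_size, here is a small sketch that turns the fractions into head counts. The rounding rule, the minimum of one elite and one immigrant, and the helper name split_population are all assumptions for illustration; the package may divide the population differently.

```python
def split_population(population_size, elite_fraction, immigrant_fraction):
    """Split a generation into elite, immigrant, and offspring counts.

    Assumption for this sketch: counts are rounded to the nearest integer,
    with at least one elite and one immigrant; the remaining slots are
    treated as offspring.
    """
    elites = max(1, round(population_size * elite_fraction))
    immigrants = max(1, round(population_size * immigrant_fraction))
    offspring = population_size - elites - immigrants
    if offspring < 0:
        raise ValueError("elite and immigrant fractions exceed the population")
    return elites, immigrants, offspring


# Defaults from the table: population_size=60, elite_fraction=0.1, immigrant_fraction=0.15
elites, immigrants, offspring = split_population(60, 0.1, 0.15)
print(elites, immigrants, offspring)
```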
run_clustering() exposes the most common parameters directly. Any TauConfig field can also be passed as a keyword argument:
clustering = run_clustering(g, elite_fraction=0.2, stopping_generations=5)
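Several parameters in the table (elite_similarity_threshold, stopping_jaccard, sim_sample_size) refer to a Jaccard similarity between partitions. One common way to define it is the pair-counting Jaccard index, sketched below in plain Python. Whether the package uses exactly this definition is an assumption, and partition_jaccard is a hypothetical name.

```python
from itertools import combinations


def partition_jaccard(membership_a, membership_b):
    """Pair-counting Jaccard index between two partitions of the same nodes.

    A node pair counts as "together" in a partition when both nodes share a
    community label. The index is |together in both| / |together in either|.
    """
    if len(membership_a) != len(membership_b):
        raise ValueError("partitions must cover the same nodes")
    both = either = 0
    for i, j in combinations(range(len(membership_a)), 2):
        same_a = membership_a[i] == membership_a[j]
        same_b = membership_b[i] == membership_b[j]
        both += same_a and same_b
        either += same_a or same_b
    # Two all-singleton partitions share no co-clustered pairs; treat them as identical.
    return both / either if either else 1.0


a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 1, 1]  # node 2 moved to the other community

similarity = partition_jaccard(a, b)
print(f"Jaccard similarity: {similarity:.3f}")
```

The loop visits every node pair, so its cost grows quadratically with the number of nodes; comparing on a node sample, as sim_sample_size suggests, is one way to keep that affordable.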
Development
pip install -r requirements-dev.txt
pip install -e .
make lint # ruff checks
make test # pytest
make coverage # pytest + coverage report
make build # build sdist + wheel
Continuous Integration
GitHub Actions runs lint, tests (Python 3.10 and 3.11), and a package build on every push and pull request. Set the CODECOV_TOKEN secret to upload coverage reports.
Publishing
- Bump version in setup.cfg and commit.
- Tag the release: git tag vX.Y.Z && git push --tags.
- Run the Publish Package workflow. Use TEST_PYPI_API_TOKEN for a dry run on TestPyPI, or PYPI_API_TOKEN to publish to PyPI.
Reference & Citation
If you use TAU in your research, please cite:
From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm. Gal Gilad and Roded Sharan. PNAS Nexus, Volume 2, Issue 6, June 2023. DOI: 10.1093/pnasnexus/pgad180
@article{gilad2023tau,
title={From Leiden to Tel-Aviv University (TAU): exploring clustering solutions via a genetic algorithm},
author={Gilad, Gal and Sharan, Roded},
journal={PNAS Nexus},
volume={2},
number={6},
pages={pgad180},
year={2023},
publisher={Oxford University Press}
}
License
MIT License © 2023 Hillel Charbit