Louvain is a general algorithm for methods of community detection in large networks.
Project description
INTRODUCTION
============
This package implements the louvain algorithm in `C++` and exposes it to `python`.
It relies on `(python-)igraph` for it to function. Besides the relative
flexibility of the implementation, it also scales well, and can be run on graphs
of millions of nodes (as long as they can fit in memory). The core function is
``find_partition`` which finds the optimal partition using the louvain algorithm
for a number of different methods. The methods currently implemented are:
* Modularity.
This method compares the actual graph to the expected graph, taking into
account the degree of the nodes [1]. The expected graph is based on a
configuration null-model. Notice that we use the non-normalized version (i.e.
we don't divide by the number of edges), so that this Modularity values
generally does not fall between 0 and 1. The formal definition is
```
H = sum_ij (A_ij - k_i k_j / 2m) d(s_i, s_j),
```
where `A_ij = 1` if there is an edge between node `i` and `j`, `k_i` is the degree of
node `i` and `s_i` is the community of node i.
* RBConfiguration.
This is an extension of modularity which includes a resolution parameter [2].
In general, a higher resolution parameter will lead to smaller communities.
The formal definition is
```
H = sum_ij (A_ij - gamma k_i k_j / 2m) d(s_i, s_j),
```
where `gamma` is the resolution value, and the other variables are the same as
for Modularity.
* RBER.
A variant of the previous method that instead of a configuration null-model
uses a Erdös-Rényi null-model in which each edge has the same probability of
appearing [2]. The formal definition is
```
H = sum_ij (A_ij - gamma p) d(s_i, s_j),
```
where `p` is the density of the graph, and the other variables are the same as
for Modularity, with `gamma` a resolution parameter.
* CPM.
This method compares to a fixed resolution parameter, so that it finds
communities that have an internal density higher than the resolution
parameter, and is separated from other communities with a density lower than
the resolution parameter [3].The formal definition is
```
H = sum_ij (A_ij - gamma ) d(s_i, s_j),
```
with `gamma` a resolution parameter, and the other variables are the same as for
Modularity.
* Significance.
This is a probabilistic method based on the idea of assessing the probability
of finding such dense subgraphs in an (ER) random graph [4]. The formal
definition is
```
H = sum_c M_c D(p_c || p)
```
where `M_c` is the number of possible edges in community `c`, i.e. `n_c (n_c - 1)/2`
for undirected graphs and twice that for directed grahs with `n_c` the size of
community `c`, `p_c` is the density of the community `c`, and `p` the general density
of the graph, and `D(x || y)` is the binary Kullback-Leibler divergence.
* Surprise.
Another probabilistic method, but rather than the probability of finding dense
subgraphs, it focuses on the probability of so many edges within communities
[5, 6]. The formal definition is
```
H = m D(q || <q>)
```
where `m` is the number of edges, `q` is the proportion of edges within
communities (i.e. `sum_c m_c / m`) and `<q>` is the expected proportion of edges
within communities in an Erdős–Rényi graph.
INSTALLATION
============
In short, for Unix: ``sudo pip install louvain``.
For Windows: download the binary installers.
For Unix like systems it is possible to install from source. For Windows this is
overly complicated, and you are recommended to use the binary installation files.
There are two things that are needed by this package: the igraph c core library
and the python-igraph python package. For both, please see http://igraph.org.
There are basically two installation modes, similar to the python-igraph package
itself (from which most of the setup.py comes).
1. No C core library is installed yet. The packages will be compiled and linked
statically to an automatically downloaded version of the C core library of
igraph.
2. A C core library is already installed. In this case, the package will link
dynamically to the already installed version. This is probably also the
version that is used by the igraph package, but you may want to double check
this.
In case the python-igraph package is already installed before, make sure that
both use the **same versions**.
The cleanest setup it to install and compile the C core library yourself (make
sure that the header files are also included, e.g. install also the development
package from igraph). Then both the python-igraph package, as well as this
package are compiled and (dynamically) linked to the same C core library.
TROUBLESHOOTING
===============
In case of any problems, best to start over with a clean environment. Make sure
you remove the python-igraph package completely, remove the C core library and
remove the louvain package. Then, do a complete reinstall starting from ``pip
install louvain``. In case you want a dynamic library be sure to then install
the C core library from source before. Make sure you **install the same
versions**.
USAGE
=====
There is no standalone version of louvain-igraph, and you will always need
python to access it. There are no plans for developing a standalone version or R
support. So, use python. Please refer to the documentation within the python
package for more details on function calls and parameters.
To start, make sure to import the packages:
```python
import louvain
import igraph as ig
```
We'll create a random graph for testing purposes:
```python
G = ig.Graph.Erdos_Renyi(100, 0.1);
```
For simply finding a partition use:
```python
part = louvain.find_partition(G, method='Modularity');
```
In case you want to use a weighted graph, you can store this in an edge
attribute:
```python
G.es['weight'] = 1.0;
part = louvain.find_partition(G, method='Modularity', weight='weight');
```
Please note that not all methods are necessarily capable of handling weighted
graphs.
Notice that ``part`` now contains an additional variable, ``part.quality`` which
stores the quality of the partition as calculated by the used method. You can
always get the quality of the partition using another method by calling
```python
part.significance = louvain.quality(G, partition, method='Significance');
```
You can also find partition for multiplex graphs. For each layer you then
specify the objective function, and the overall objective function is simply the
sum over all layers, weighted by some weight. If we denote by ``q_k`` the quality
of layer ``k`` and the weight by ``w_k``, the overall quality is then ``q = sum_k
w_k q_k``. This can also be useful in case you have negative links. In
principle, this could also be used to detect temporal communities in a dynamic
setting, cf. [8].
For example, assuming you have a graph with positive weights ``G_positive`` and
a graph with negative weights ``G_negative``, and you want to use Modularity for
finding a partition, you can use
```python
membership, quality = louvain.find_partition_multiplex([
louvain.Layer(graph=G_positive, method='Modularity', layer_weight=1.0),
louvain.Layer(graph=G_negative, method='Modularity', layer_weight=-1.0)])
```
Notice the negative layer weight is ``-1.0`` for the negative graph, since we
want those edges to fall between communities rather than within. One particular
problem when using negative links, is that the optimal community is no longer
guaranteed to be connected (it may be a multipartite partition). You may
therefore need the options `consider_comms=ALL_COMMS` to improve the quality of
the partition. Notice that this runs much slower than only considering
neighbouring communities (which is the default).
Various methods (such as Reichardt and Bornholdt's Potts model, or CPM) support
a (linear) resolution parameter, which can be effectively bisected, cf. [5]. You
can do this by calling:
```python
res_parts = louvain.bisect(G, method='CPM', resolution_range=[0,1]);
```
Notice this may take some time to run, as it effectively calls
`louvain.find_partition` for various resolution parameters (depending on the
settings possibly hundreds of times).
Then `res_parts` is a dictionary containing as keys the resolution, and as
values a `NamedTuple` with variables `partition` and `bisect_value`, which
contains the partition and the value at which the resolution was bisected (the
value of the `bisect_func` of the `bisect` function). You could for example plot
the bisection value of all the found partitions by using:
```python
import pandas as pd
import matplotlib.pyplot as plt
res_df = pd.DataFrame({
'resolution': res_parts.keys(),
'bisect_value': [bisect.bisect_value for bisect in res_parts.values()]});
plt.step(res_df['resolution'], res_df['bisect_value']);
plt.xscale('log');
```
REFERENCES
==========
Please cite the references appropriately in case they are used.
1. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding
of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
2. Newman, M. & Girvan, M. Finding and evaluating community structure in networks.
Physical Review E 69, 026113 (2004).
3. Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary
degree distribution. Physical Review E 76, 015102 (2007).
4. Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free
community detection. Physical Review E 84, 016114 (2011).
5. Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure.
Scientific Reports 3, 2930 (2013).
6. Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure
of complex networks. Scientific reports 3, 1060 (2013).
7. Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical
Surprise. Forthcoming (2015).
8. Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P.
Community structure in time-dependent, multiscale, and multiplex networks.
Science 328, 876–8 (2010).
============
This package implements the louvain algorithm in `C++` and exposes it to `python`.
It relies on `(python-)igraph` for it to function. Besides the relative
flexibility of the implementation, it also scales well, and can be run on graphs
of millions of nodes (as long as they can fit in memory). The core function is
``find_partition`` which finds the optimal partition using the louvain algorithm
for a number of different methods. The methods currently implemented are:
* Modularity.
This method compares the actual graph to the expected graph, taking into
account the degree of the nodes [1]. The expected graph is based on a
configuration null-model. Notice that we use the non-normalized version (i.e.
we don't divide by the number of edges), so that this Modularity values
generally does not fall between 0 and 1. The formal definition is
```
H = sum_ij (A_ij - k_i k_j / 2m) d(s_i, s_j),
```
where `A_ij = 1` if there is an edge between node `i` and `j`, `k_i` is the degree of
node `i` and `s_i` is the community of node i.
* RBConfiguration.
This is an extension of modularity which includes a resolution parameter [2].
In general, a higher resolution parameter will lead to smaller communities.
The formal definition is
```
H = sum_ij (A_ij - gamma k_i k_j / 2m) d(s_i, s_j),
```
where `gamma` is the resolution value, and the other variables are the same as
for Modularity.
* RBER.
A variant of the previous method that instead of a configuration null-model
uses a Erdös-Rényi null-model in which each edge has the same probability of
appearing [2]. The formal definition is
```
H = sum_ij (A_ij - gamma p) d(s_i, s_j),
```
where `p` is the density of the graph, and the other variables are the same as
for Modularity, with `gamma` a resolution parameter.
* CPM.
This method compares to a fixed resolution parameter, so that it finds
communities that have an internal density higher than the resolution
parameter, and is separated from other communities with a density lower than
the resolution parameter [3].The formal definition is
```
H = sum_ij (A_ij - gamma ) d(s_i, s_j),
```
with `gamma` a resolution parameter, and the other variables are the same as for
Modularity.
* Significance.
This is a probabilistic method based on the idea of assessing the probability
of finding such dense subgraphs in an (ER) random graph [4]. The formal
definition is
```
H = sum_c M_c D(p_c || p)
```
where `M_c` is the number of possible edges in community `c`, i.e. `n_c (n_c - 1)/2`
for undirected graphs and twice that for directed grahs with `n_c` the size of
community `c`, `p_c` is the density of the community `c`, and `p` the general density
of the graph, and `D(x || y)` is the binary Kullback-Leibler divergence.
* Surprise.
Another probabilistic method, but rather than the probability of finding dense
subgraphs, it focuses on the probability of so many edges within communities
[5, 6]. The formal definition is
```
H = m D(q || <q>)
```
where `m` is the number of edges, `q` is the proportion of edges within
communities (i.e. `sum_c m_c / m`) and `<q>` is the expected proportion of edges
within communities in an Erdős–Rényi graph.
INSTALLATION
============
In short, for Unix: ``sudo pip install louvain``.
For Windows: download the binary installers.
For Unix like systems it is possible to install from source. For Windows this is
overly complicated, and you are recommended to use the binary installation files.
There are two things that are needed by this package: the igraph c core library
and the python-igraph python package. For both, please see http://igraph.org.
There are basically two installation modes, similar to the python-igraph package
itself (from which most of the setup.py comes).
1. No C core library is installed yet. The packages will be compiled and linked
statically to an automatically downloaded version of the C core library of
igraph.
2. A C core library is already installed. In this case, the package will link
dynamically to the already installed version. This is probably also the
version that is used by the igraph package, but you may want to double check
this.
In case the python-igraph package is already installed before, make sure that
both use the **same versions**.
The cleanest setup it to install and compile the C core library yourself (make
sure that the header files are also included, e.g. install also the development
package from igraph). Then both the python-igraph package, as well as this
package are compiled and (dynamically) linked to the same C core library.
TROUBLESHOOTING
===============
In case of any problems, best to start over with a clean environment. Make sure
you remove the python-igraph package completely, remove the C core library and
remove the louvain package. Then, do a complete reinstall starting from ``pip
install louvain``. In case you want a dynamic library be sure to then install
the C core library from source before. Make sure you **install the same
versions**.
USAGE
=====
There is no standalone version of louvain-igraph, and you will always need
python to access it. There are no plans for developing a standalone version or R
support. So, use python. Please refer to the documentation within the python
package for more details on function calls and parameters.
To start, make sure to import the packages:
```python
import louvain
import igraph as ig
```
We'll create a random graph for testing purposes:
```python
G = ig.Graph.Erdos_Renyi(100, 0.1);
```
For simply finding a partition use:
```python
part = louvain.find_partition(G, method='Modularity');
```
In case you want to use a weighted graph, you can store this in an edge
attribute:
```python
G.es['weight'] = 1.0;
part = louvain.find_partition(G, method='Modularity', weight='weight');
```
Please note that not all methods are necessarily capable of handling weighted
graphs.
Notice that ``part`` now contains an additional variable, ``part.quality`` which
stores the quality of the partition as calculated by the used method. You can
always get the quality of the partition using another method by calling
```python
part.significance = louvain.quality(G, partition, method='Significance');
```
You can also find partition for multiplex graphs. For each layer you then
specify the objective function, and the overall objective function is simply the
sum over all layers, weighted by some weight. If we denote by ``q_k`` the quality
of layer ``k`` and the weight by ``w_k``, the overall quality is then ``q = sum_k
w_k q_k``. This can also be useful in case you have negative links. In
principle, this could also be used to detect temporal communities in a dynamic
setting, cf. [8].
For example, assuming you have a graph with positive weights ``G_positive`` and
a graph with negative weights ``G_negative``, and you want to use Modularity for
finding a partition, you can use
```python
membership, quality = louvain.find_partition_multiplex([
louvain.Layer(graph=G_positive, method='Modularity', layer_weight=1.0),
louvain.Layer(graph=G_negative, method='Modularity', layer_weight=-1.0)])
```
Notice the negative layer weight is ``-1.0`` for the negative graph, since we
want those edges to fall between communities rather than within. One particular
problem when using negative links, is that the optimal community is no longer
guaranteed to be connected (it may be a multipartite partition). You may
therefore need the options `consider_comms=ALL_COMMS` to improve the quality of
the partition. Notice that this runs much slower than only considering
neighbouring communities (which is the default).
Various methods (such as Reichardt and Bornholdt's Potts model, or CPM) support
a (linear) resolution parameter, which can be effectively bisected, cf. [5]. You
can do this by calling:
```python
res_parts = louvain.bisect(G, method='CPM', resolution_range=[0,1]);
```
Notice this may take some time to run, as it effectively calls
`louvain.find_partition` for various resolution parameters (depending on the
settings possibly hundreds of times).
Then `res_parts` is a dictionary containing as keys the resolution, and as
values a `NamedTuple` with variables `partition` and `bisect_value`, which
contains the partition and the value at which the resolution was bisected (the
value of the `bisect_func` of the `bisect` function). You could for example plot
the bisection value of all the found partitions by using:
```python
import pandas as pd
import matplotlib.pyplot as plt
res_df = pd.DataFrame({
'resolution': res_parts.keys(),
'bisect_value': [bisect.bisect_value for bisect in res_parts.values()]});
plt.step(res_df['resolution'], res_df['bisect_value']);
plt.xscale('log');
```
REFERENCES
==========
Please cite the references appropriately in case they are used.
1. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding
of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
2. Newman, M. & Girvan, M. Finding and evaluating community structure in networks.
Physical Review E 69, 026113 (2004).
3. Reichardt, J. & Bornholdt, S. Partitioning and modularity of graphs with arbitrary
degree distribution. Physical Review E 76, 015102 (2007).
4. Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free
community detection. Physical Review E 84, 016114 (2011).
5. Traag, V. A., Krings, G. & Van Dooren, P. Significant scales in community structure.
Scientific Reports 3, 2930 (2013).
6. Aldecoa, R. & Marín, I. Surprise maximization reveals the community structure
of complex networks. Scientific reports 3, 1060 (2013).
7. Traag, V.A., Aldecoa, R. & Delvenne, J.-C. Detecting communities using Asymptotical
Surprise. Forthcoming (2015).
8. Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. & Onnela, J.-P.
Community structure in time-dependent, multiscale, and multiplex networks.
Science 328, 876–8 (2010).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
louvain-0.5.3.tar.gz
(48.1 kB
view hashes)
Built Distributions
louvain-0.5.3.win-amd64-py3.4.msi
(134.1 kB
view hashes)
louvain-0.5.3.win-amd64-py2.7.msi
(423.9 kB
view hashes)
louvain-0.5.3.win32-py3.4.msi
(123.4 kB
view hashes)
louvain-0.5.3.win32-py2.7.msi
(393.2 kB
view hashes)
louvain-0.5.3-py3.4-linux-x86_64.egg
(749.8 kB
view hashes)
louvain-0.5.3-py2.7-linux-x86_64.egg
(748.9 kB
view hashes)
Close
Hashes for louvain-0.5.3.win-amd64-py3.4.msi
Algorithm | Hash digest | |
---|---|---|
SHA256 | 66659d39b9d150a11e613e190218204d106fb3cbab4c60a90c36a87ed3cae969 |
|
MD5 | fa15984d67c0524e69da0fc0b8c3043b |
|
BLAKE2b-256 | 9e5d50d8d11cf0d2dc068c209b3516b511f7595b2a6b4a958a44fb9eca6ed9d6 |
Close
Hashes for louvain-0.5.3.win-amd64-py2.7.msi
Algorithm | Hash digest | |
---|---|---|
SHA256 | 56de476701e73f5ceffb79bd906b97930c3d977f2f7faf67af7e2a98b5d6ddd5 |
|
MD5 | 30337e0f93089a3901cc83e7d3d15f13 |
|
BLAKE2b-256 | f36e5c5fbe0f6854bfd7afb8fadf17ace335b341621ed78f7cbcd301767e3de9 |
Close
Hashes for louvain-0.5.3-py3.4-linux-x86_64.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | ffa4d252f191f3c125a20f7c5e25e858a1feb89d6c7082cad77bd821f8892a95 |
|
MD5 | 40fc6b4b1a24adb04462cfbaa332790e |
|
BLAKE2b-256 | 2bc7edacd4c7662b748a29ca1f131f7f5e95867eea3eb1d1ece2b240c36c9cd6 |
Close
Hashes for louvain-0.5.3-py2.7-linux-x86_64.egg
Algorithm | Hash digest | |
---|---|---|
SHA256 | 87877f0bdb2e48931f1c6f90de867498c5e6f4049f001e31cf1562a8d9035509 |
|
MD5 | 39fb45a1e5359367af9c0893747fa728 |
|
BLAKE2b-256 | 6804d5e6598e2a91ff3bdbbe733185a2596b54401dd2eba9c8ad50798b6b0967 |