The Louvain method for community detection is a greedy optimization method intended to extract non-overlapping communities from large networks created by Blondel et al.[1] from the University of Louvain (the source of this method's name).

Modularity optimization

edit

The inspiration for this method of community detection is the optimization of modularity as the algorithm progresses. Modularity is a scale value between −0.5 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities. Optimizing this value theoretically results in the best possible grouping of the nodes of a given network. But because going through all possible iterations of the nodes into groups is impractical, heuristic algorithms are used.

In the Louvain Method of community detection, first small communities are found by optimizing modularity locally on all nodes, then each small community is grouped into one node and the first step is repeated. The method is similar to the earlier method by Clauset, Newman and Moore[2] that connects communities whose amalgamation produces the largest increase in modularity. The Louvain algorithm was shown to correctly identify the community structure when it exists, in particular in the stochastic block model.[3]

Algorithm

edit

The value to be optimized is modularity, defined as a value in the range   that measures the density of links inside communities compared to links between communities.[1] For a weighted graph, modularity is defined as:

 

where:

  •   represents the edge weight between nodes   and  ; see Adjacency matrix;
  •   and   are the sum of the weights of the edges attached to nodes   and  , respectively;
  •   is the sum of all of the edge weights in the graph;
  •   is the total number of nodes in the graph;
  •   and   are the communities to which the nodes   and   belong; and
  •   is Kronecker delta function:

 

Based on the above equation, the modularity of a community   can be calculated as:[4]

 

where

  •   is the sum of edge weights between nodes within the community   (each edge is considered twice); and
  •   is the sum of all edge weights for nodes within the community (including edges which link to other communities).

As nodes in different communities do not contribute to the modularity  , it can be written as:

 

In order to maximize modularity efficiently, the Louvain Method has two phases that are repeated iteratively.

Phase 1:

1. First, each node in the network is assigned to its own community.

 
Step 1: Each node in the graph is randomly assigned to a singleton community

2. Next, for each node  , the change in modularity is calculated for removing   from its own community and moving it into the community of each neighbor   of  . This value is computed in two steps:

(a) Compute the change in modularity   for removing node   from its original community.

(b) Compute the change in modularity   for inserting an isolated node   (i.e. node   has no connections and is in a community of only itself) into the community of neighbouring node, denoted  .

In the following, we will show the derivation for (b). The equations for (a) are similar and can be computed by similar methods.

First, we compute the modularity of the isolated cluster of node  , which we will call  . Here we are assuming that there are no loops, and so   for all values of  :

 

Next, we compute the modularity of the cluster   before we have added the new node  . We already computed this equation:

 

Finally, we compute the modularity of the cluster   after we have added a new node  :

 

We can rewrite the first term as follows:

 

where   represents the sum of the weights of all the edges which go between node   and the nodes in community  . In other words,   is the degree of node   within community  .

We can rewrite the second term as:

 

Putting this together we have:

 

Putting together the equations for  , , and  , we can compute the change in modularity   for adding an isolated node   to the cluster  . This is sometimes referred to as the gain:[1]

 

3. Once the change in modularity   has been computed for all communities   that node   is connected to, node   is placed into the community that resulted in the greatest modularity increase. If no increase is possible, node   remains in its original community.

4. This process is applied repeatedly and sequentially to all nodes until no modularity increase can occur. Once this local maximum of modularity is hit, the first phase has ended.

 
Steps 3 and 4: Nodes are assigned to communities based on their modularities

Phase 2:

The second phase of the algorithm involves reducing communities to a single node and repeating the steps in Phase 1:

1. Each community   is reduced to a single node. Links between nodes in   are reduced to a single weighted self-loop for this community. Likewise edges connecting nodes from   to other communities are reduced to a single weighted edge.

 
Phase 2, step 1: Communities are reduced to a single node with weighted edges

2. Once the new graph is created, the second phase has ended and the first phase can be re-applied to the new network.

 
Final Result: Phase 1 is repeated and nodes are added to communities. Here is shown the result repeating phase 1 before collapsing the communities again

Previous uses

edit
  • Twitter social Network (2.4 Million nodes, 38 million links) by Josep Pujol, Vijay Erramilli, and Pablo Rodriguez:[5] The authors explore the problem of partitioning Online Social Networks onto different machines.
  • Mobile phone Network (4 Million nodes, 100 Million links) by Derek Greene, Donal Doyle, and Padraig Cunningham:[6] Community-tracking strategies for identifying dynamic communities of different dynamic social networks.
  • Detecting species in network-based dynamical model.[7]

Disadvantages

edit
 
A graph illustrating how communities can become disconnected when using the Louvain algorithm.

Louvain produces only non-overlapping communities, which means that each node can belong to at most one community. This is highly unrealistic in many real-world applications. For example, in social networks, most people belong to multiple communities: their family, their friends, their co-workers, old school buddies, etc. In biological networks, most genes or proteins belong to more than one pathway or complex. Furthermore, Louvain has been shown to sometimes produce arbitrarily badly connected communities, and has been effectively superseded (at least in the non-overlapping case) by the Leiden algorithm.

These badly connected communities arise when a node that had been acting as a "bridge" between two groups of nodes in its community is moved to a new community, leaving the old one disconnected. The remaining nodes in the old community may also be relocated, but if their connection to the community is strong enough despite the removal of the "bridge" node, they will instead remain in place. While this is the worst-case scenario, there are other, more subtle problems with the Louvain algorithm that can also lead to arbitrarily badly connected communities, such as the formation of communities using nodes that are only weakly connected.

Another common issue with the Louvain algorithm is the resolution limit of modularity - that is, multiple small communities being grouped together into a larger community. This causes the smaller communities to be hidden. Both this and the arbitrarily badly connected community problem are further exasperated by each iteration of the algorithm. Ultimately, the only thing the Louvain algorithm guarantees is that the resulting communities cannot be merged further; in other words, they're well-separated. To avoid the problems that arise from arbitrarily badly connected communities and the resolution limit of modularity, it is recommended to use the Leiden algorithm instead, as its refinement phase and other various adjustments have corrected these issues.[8]

Time complexity and comparison to other methods of non-overlapping community detection

edit

Generally, the Louvain method is assumed to have a time complexity of  .[9][10] Richard Blondel, co-author of the paper that originally published the Louvain method, seems to support this notion,[11] but other sources claim the time complexity is "essentially linear in the number of links in the graph,"[12] meaning the time complexity would instead be  , where   is the number of edges in the graph. Unfortunately, no source has published a rigorous analysis of the time complexity so nothing can be said with certainty except that, based on empirical evidence like the table below, Louvain is faster than previous algorithms.

When comparing modularity optimization methods, the two measures of importance are the speed and the resulting modularity value. A higher speed is better as it shows a method is more efficient than others and a higher modularity value is desirable as it points to having better-defined communities. The compared methods are, the algorithm of Clauset, Newman, and Moore,[2] Pons and Latapy,[13] and Wakita and Tsurumi.[14]

Modularity Optimization Comparison[15]
Karate Arxiv Internet Web nd.edu Phone Web uk-2005 Web WebBase 2001
Nodes/links 34/77 9k/24k 70k/351k 325k/1M 2.6M/6.3M 39M/783M 118M/1B
Clauset, Newman, and Moore .38/0s .772/3.6s .692/799s .927/5034s -/- -/- -/-
Pons and Latapy .42/0s .757/3.3s .729/575s .895/6666s -/- -/- -/-
Wakita and Tsurumi .42/0s .761/0.7s .667/62s .898/248s .56/464s -/- -/-
Louvain Method .42/0s .813/0s .781/1s .935/3s .769/134s .979/738s .984/152mn

-/- in the table refers to a method that took over 24hrs to run. This table (from[1][16]) shows that the Louvain method outperforms many similar modularity optimization methods in both the modularity and the time categories.

See also

edit

References

edit
  1. ^ a b c d Blondel, Vincent D; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (9 October 2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): 10008. arXiv:0803.0476. Bibcode:2008JSMTE..10..008B. doi:10.1088/1742-5468/2008/10/P10008. S2CID 334423.
  2. ^ a b Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher (2004-12-06). "Finding community structure in very large networks". Physical Review E. 70 (6): 066111. arXiv:cond-mat/0408187. Bibcode:2004PhRvE..70f6111C. doi:10.1103/PhysRevE.70.066111. ISSN 1539-3755. PMID 15697438. S2CID 8977721.
  3. ^ Cohen-Addad, Vincent; Kosowski, Adrian; Mallmann-Trenn, Frederik; Saulpic, David (2020). "On the Power of Louvain in the Stochastic Block Model". Advances in Neural Information Processing Systems (Neurips 2020). Curran Associates, Inc. pp. 4055–4066.
  4. ^ https://eecs.wsu.edu/~ananth/papers/Ghosh_IPDPS18.pdf [bare URL PDF]
  5. ^ Pujol, Josep M.; Erramilli, Vijay; Rodriguez, Pablo (2009). "Divide and Conquer: Partitioning Online Social Networks". arXiv:0905.4918v1 [cs.NI].
  6. ^ Greene, Derek; Doyle, Dónal; Cunningham, Pádraig (May 2011). Tracking the Evolution of Communities in Dynamic Social Networks (PDF) (Technical report). University College Dublin. UCD-CSI-2011-06. Archived from the original (PDF) on 2013-05-12. Retrieved 2014-11-20.
  7. ^ Markovitch, Omer; Krasnogor, Natalio (2018). "Predicting species emergence in simulated complex pre-biotic networks". PLOS ONE. 13 (2): e0192871. Bibcode:2018PLoSO..1392871M. doi:10.1371/journal.pone.0192871. PMC 5813963. PMID 29447212.
  8. ^ Traag, V. A.; Waltman, L.; van Eck, N. J. (2019-03-26). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. ISSN 2045-2322. PMC 6435756. PMID 30914743.
  9. ^ Kumar, Vikas; Sisodia, Anubhav; Maini, Umesh; Dhull, Pankaj; Anand, Abhineet (November 2016). "Comparing Algorithms of Community Structure in Networks". Indian Journal of Science and Technology. 9 (44): 1–5.
  10. ^ Watson, Cullen (April 25, 2022). "Girvan-Newman and Louvain Algorithms for Community Detection". Medium. Retrieved November 21, 2024.
  11. ^ "Louvain method for community detection". perso.uclouvain.be. Retrieved 2024-11-21.
  12. ^ "Louvain - Analytics & Algorithms - Ultipa Graph". www.ultipa.com. Retrieved 2024-11-21.
  13. ^ Pons, Pascal; Latapy, Matthieu (2006). "Computing Communities in Large Networks Using Random Walks" (PDF). Journal of Graph Algorithms and Applications. 10 (2): 191–218. arXiv:cond-mat/0412368. doi:10.7155/jgaa.00124. S2CID 121714719.
  14. ^ Wakita, Ken; Tsurumi, Toshiyuki (2007). "Finding Community Structure in Mega-scale Social Networks". arXiv:cs/0702048.
  15. ^ Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): 10008. arXiv:0803.0476. Bibcode:2008JSMTE..10..008B. doi:10.1088/1742-5468/2008/10/P10008. S2CID 334423.
  16. ^ Aynaud, Thomas; Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud (2013). "Multilevel Local Optimization of Modularity". In Bichot, Charles-Edmond; Siarry, Patrick (eds.). Graph Partitioning (1 ed.). Wiley (published 13 February 2013). pp. 315–345. doi:10.1002/9781118601181.ch13. ISBN 978-1-84821-233-6.