hina.mesoscale ++++++++++++ Tutorial ======== The `mesoscale` module provides methods for clustering one set of nodes in a heterogeneous interaction network based on their shared interactions with nodes in another node set. The clustering methods employed automatically learn the number of clusters from the heterogeneity in the interaction data to find the mesoscale representation. This module can incorporate other algorithms for understanding the mesoscale structure of interaction networks. Currently, the module contains the `clustering.py` file, which includes: - `hina_communities`: Identifies communities in a bipartite/tripartite network by optimizing a Minimum Description Length (MDL) objective that modifies the microcanonical stochastic block model for the intended clustering task. The MDL hina community detection method hina_communities (custom developed for the HINA package) finds a community partition of the nodes in the first node set using an information theoretic objective function that automatically selects for the optimal number of clusters. This objective scores a partition of the nodes according to how well it allows for the transmission of the bipartite/tripartite graph while exploiting redundancies in the edges coming from members of the same community. It develops a description length objective by breaking the information transmission process into three steps: (1) Transmit community labels of nodes in the first set; (2) Transmit total edge weight contributions from each of the communities; (3) Transmit weights of each edge from Set 1 to Set 2 (the graph) given the constraints imposed by (1) and (2). Following these steps, the objective we use is: .. math:: \mathcal{L}(b) = \log N_1 + \log {N_1-1\choose B-1} + \log {N_1\choose n_1,...,n_B} + \log{BN_2+W-1\choose W} + \sum_{r=1}^{B}\sum_{j=1}^{N_2} \log{n_r+w_{rj}-1\choose w_{rj}}, where: - :math:`b` is a partition of the nodes in the first node set into :math:`B` non-empty communities such that :math:`b_i` is the community of node :math:`i` - :math:`N_1` and :math:`N_2` are the sizes of the first node set and second node set respectively - :math:`n_r` is the size of community :math:`r \in \{1, \dots, B\}` - :math:`W` is the total weight of the edges in the bipartite/tripartite graph - :math:`w_{rj}` is the total weight of the edges from nodes in community :math:`r` to node :math:`j` in the second node set The method optimizes this MDL objective approximately using a fast agglomerative scheme in which we start with every node in its own cluster and iteratively merge the pair of communities that produces the greatest decrease to the description length until all nodes are grouped together. Afterwards, we scan over all solution candidates to identify the MDL-optimal partition. .. list-table:: Functions :header-rows: 1 * - Function - Description * - `hina_communities(G, fix_B=None) <#hina_communities>`_ - Identifies bipartite/tripartite communities by optimizing the MDL objective. Reference --------- .. _hina_communities: .. raw:: html
**Description**: Optimizes a Minimum Description Length (MDL) objective to identify communities in a bipartite/tripartite network. The MDL objective quantifies the trade-off between the complexity of the community structure and the efficiency of encoding the network information. The algorithm begins by assigning each node in the first node set to its own community, then iteratively merges communities that produce the greatest decrease in the total description length. The optimal partition is chosen either automatically or by fixing the number of communities via the `fix_B` parameter. **Parameters**: .. raw:: html(i, j, w), where i and j are node labels and w is a positive integer.None.
None, the algorithm automatically determines the optimal number of communities.