Dense subgraph mining in probabilistic graphs

Date

2021-12-09

Authors

Esfahani, Fatemeh

Abstract

In this dissertation we consider the problem of mining cohesive (dense) subgraphs in probabilistic graphs, where each edge has a probability of existence. Mining probabilistic graphs has become a focus of interest in the analysis of many real-world datasets, such as social, trust, communication, and biological networks, because of the intrinsic uncertainty present in them. Studying cohesive subgraphs can reveal important information about the connectivity, centrality, and robustness of a network, with applications in areas such as bioinformatics and social networks. In deterministic graphs, there exist various definitions of cohesive substructures, including cliques, quasi-cliques, k-cores, and k-trusses, and k-core and k-truss decompositions are popular tools for finding cohesive subgraphs. A k-core is the largest subgraph in which each vertex has at least k neighbors, and a k-truss is the largest subgraph in which each edge is contained in at least k triangles (or k-2 triangles, depending on the definition). The k-core and k-truss decompositions of deterministic graphs have been thoroughly studied in the literature. In the probabilistic context, however, the computation is challenging, and state-of-the-art approaches do not scale to large graphs. The main challenge is the efficient computation of the tail probabilities of vertex degrees and of the triangle counts of edges in probabilistic graphs. We employ a special version of the central limit theorem (CLT) to obtain these tail probabilities efficiently. Based on our CLT approach, we propose peeling algorithms for the core and truss decomposition of a probabilistic graph that scale to very large graphs and offer significant improvement over state-of-the-art approaches. Moreover, we propose a second algorithm for probabilistic core decomposition that can handle graphs that do not fit in memory by processing them sequentially, one vertex at a time.
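To make the tail-probability computation concrete, the following is a minimal Python sketch, not the dissertation's implementation. It assumes the edges incident to a vertex exist independently, so the degree is a sum of independent Bernoulli variables. The function names `degree_tail_exact` and `degree_tail_normal` are hypothetical, and the plain normal approximation with a continuity correction is only a stand-in for the special CLT variant mentioned above, which is not specified here.

```python
import math


def degree_tail_exact(probs, k):
    """Exact Pr[deg >= k] for a vertex whose incident edges exist
    independently with the probabilities in `probs` (assumed setup)."""
    # dist[j] = probability that exactly j of the edges seen so far exist
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)   # this edge is absent
            nxt[j + 1] += q * p       # this edge is present
        dist = nxt
    return sum(dist[k:])


def degree_tail_normal(probs, k):
    """Normal approximation of Pr[deg >= k] with a continuity correction.
    Illustrative stand-in only; not the dissertation's CLT variant."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0.0:
        # All edges are certain (p is 0 or 1), so the degree is fixed.
        return 1.0 if mu >= k else 0.0
    z = (k - 0.5 - mu) / math.sqrt(var)
    # Pr[Z >= z] for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    edge_probs = [0.5, 0.5, 0.5, 0.5]
    print(degree_tail_exact(edge_probs, 2))
    print(degree_tail_normal(edge_probs, 2))
```

The exact routine takes time quadratic in the vertex degree, while the approximation needs only one pass over the edge probabilities, which illustrates why an approximation of this kind may help on large graphs.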
For truss decomposition, we design a second method based on progressively tightening the estimate of the truss value of each edge, using h-index computation and a novel use of dynamic programming. We provide extensive experimental results to show the efficiency of the proposed algorithms. Another contribution of this thesis is mining cohesive subgraphs using the recent notion of nucleus decomposition introduced by Sariyuce et al. Nucleus decomposition is based on higher-order structures, such as cliques nested in other cliques, and can reveal interesting subgraphs that core and truss decompositions miss. In this dissertation, we present nucleus decomposition for probabilistic graphs. The major questions we address are: How can nucleus decomposition be meaningfully defined in probabilistic graphs? How hard is it to compute nucleus decomposition in probabilistic graphs? Can we devise efficient algorithms for exact or approximate nucleus decomposition in large graphs? We present three natural definitions of nucleus decomposition in probabilistic graphs: local, global, and weakly-global. We show that the local version is in PTIME, whereas the global and weakly-global versions are #P-hard and NP-hard, respectively. We present an efficient and exact dynamic programming approach for the local case. Further, we present statistical approximations that scale to larger datasets without much loss of accuracy. For the global and weakly-global decompositions, we complement our intractability results by proposing efficient algorithms that give approximate solutions based on search space pruning and Monte-Carlo sampling. Extensive experiments show the scalability and efficiency of our algorithms. Compared to probabilistic core and truss decompositions, nucleus decomposition performs significantly better in terms of density and clustering metrics.
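The idea of progressively tightening an estimate through h-index computation can be illustrated on a much simpler problem. The sketch below is an assumption-laden analogue, not the method of the dissertation: it works on a deterministic graph and on vertices (core numbers) rather than on edges and truss values in a probabilistic graph. The names `h_index` and `core_numbers_by_h_index` are hypothetical. Each vertex starts from its degree as an upper bound and repeatedly lowers it to the h-index of its neighbors' current estimates until nothing changes.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    ranked = sorted(values, reverse=True)
    h = 0
    for i, v in enumerate(ranked, start=1):
        if v >= i:
            h = i
        else:
            break
    return h


def core_numbers_by_h_index(adj):
    """Core numbers of a deterministic, undirected graph given as an
    adjacency dict {vertex: [neighbors]} (simplified analogue only)."""
    # The degree is an upper bound on the core number of each vertex.
    est = {v: len(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            tightened = h_index([est[u] for u in nbrs])
            if tightened < est[v]:
                est[v] = tightened  # estimates only ever decrease
                changed = True
    return est


if __name__ == "__main__":
    # A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(core_numbers_by_h_index(graph))
```

Because each update uses only a vertex's neighbors, a scheme of this kind can process the graph locally, which is one reason such iterative estimates are attractive for large inputs.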

Keywords

Probabilistic Graphs, Dense Subgraphs
