Approximating 4-cliques in streaming graphs: the power of dual sampling

Date

2021-05-04

Authors

Mann, Anmol

Abstract

Clique counting is a challenging problem in graph mining because of combinatorial explosion: even moderately sized graphs with a few million edges can have clique counts on the order of many billions. At this scale, it is critical not merely to analyze the data but to analyze it very efficiently. While randomized algorithms are known for estimating clique counts, 4-cliques have received less attention than triangles in the streaming setting. In this work, we propose 4CDS, a fast and scalable algorithm for approximating 4-clique counts in a single-pass streaming model. By combining sampling approaches, we estimate the 4-clique count with high accuracy. We provide a theoretical analysis of the algorithm and prove that it improves upon the known space and accuracy bounds. We conduct a comprehensive evaluation of 4CDS on a collection of real-world graphs. The algorithm performs well on massive graphs containing several billion 4-cliques and terminates within a reasonable amount of time. We show experimentally that the proposed method obtains a significant speedup, outperforming several existing clique counting algorithms.
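
For readers unfamiliar with sampling-based clique estimation, the following is a minimal sketch of the general idea only. It is not the 4CDS algorithm, whose details are not given in this record. It assumes a simple graph whose edges arrive once each as pairs of integer node IDs, keeps each edge independently with probability p in a single pass, counts 4-cliques exactly in the sample, and rescales by 1/p^6 because a 4-clique has six edges. The function names `count_4_cliques` and `estimate_4_cliques` are hypothetical.

```python
import random
from itertools import combinations


def count_4_cliques(edges):
    """Exactly count 4-cliques in an undirected simple graph (illustrative brute force)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    count = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue  # visit each edge once, with u < v
            # Common neighbors larger than v, so each clique u < v < w < x is counted once.
            common = sorted(w for w in adj[u] & adj[v] if w > v)
            for w, x in combinations(common, 2):
                if x in adj[w]:
                    count += 1
    return count


def estimate_4_cliques(edge_stream, p, seed=None):
    """Single-pass uniform edge sampling estimator (assumption: not the 4CDS method).

    Each edge is kept with probability p. A given 4-clique survives only if all
    six of its edges are kept, which happens with probability p**6, so the
    sampled count is divided by p**6.
    """
    rng = random.Random(seed)
    sample = [edge for edge in edge_stream if rng.random() < p]
    return count_4_cliques(sample) / p**6


if __name__ == "__main__":
    # Complete graph on 5 nodes: every choice of 4 nodes forms a 4-clique.
    k5 = list(combinations(range(5), 2))
    print(count_4_cliques(k5))
    print(estimate_4_cliques(k5, p=1.0))
```

With small p, this naive estimator has high variance because very few 4-cliques keep all six edges in the sample, which is one reason more refined sampling schemes are studied for this problem.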

Keywords

stream data analysis, graph streams, clique counting, randomized algorithm, clique approximation, stream mining