Distributed enumeration of four node graphlets at quadrillion-scale

Date

2021-11-19

Authors

Liu, Xiaozhou

Abstract

Graphlet enumeration is a basic task in graph analysis with many applications, so it is important to be able to perform it within a reasonable amount of time. This is challenging when the input graph is very large, with millions of nodes and edges, and known solutions are limited in scalability. Distributed computing is often proposed as a way to improve scalability. However, it must be applied carefully to reduce overhead and to benefit in practice from distribution. We study the enumeration of four-node graphlets in undirected graphs on a distributed platform. We propose an efficient distributed solution that significantly outperforms existing solutions. With this method we are able to process larger graphs than have been processed before and to enumerate quadrillions of graphlets using a modest cluster of machines. Experimental results demonstrate the scalability of our solution.

Keywords

subgraph enumeration, MapReduce, graphlet enumeration, graph analytics, distributed analytics
