An Experimental Evaluation of Giraph and Graphchi

Date

2016-08-29

Authors

Junnan, Lu

Abstract

Graphs are a natural data structure for capturing and representing data about connected entities. They have become a practical tool for modeling complicated relationships in application domains such as social media, protein, transportation, bibliographical, and knowledge networks. With the growing popularity of cloud computing, graphs with millions of nodes and billions of edges are becoming more common. Graph analytics is a critical component of big data discovery. The major problems in processing large graph data are the size and the irregular structure of the graph. In this report, we evaluate a Pregel implementation, Apache Giraph, on several algorithms, and we compare our results with a disk-based (centralized) system, GraphChi. We observe that for a moderate number of very simple machines, Giraph outperforms GraphChi for all the algorithms and datasets tested. This is in contrast to the claim of the GraphChi authors that one needs a cluster of more than 1,000 computers to perform comparably to GraphChi.

Keywords

Giraph, GraphChi, Graph Analytics, Cloud Computing, Big Data, PageRank