Complex graph algorithms using relational database
| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Ahmed, Aly | |
| dc.contributor.supervisor | Thomo, Alex | |
| dc.date.accessioned | 2021-08-24T19:18:15Z | |
| dc.date.available | 2021-08-24T19:18:15Z | |
| dc.date.copyright | 2021 | en_US |
| dc.date.issued | 2021-08-24 | |
| dc.degree.department | Department of Computer Science | |
| dc.degree.level | Doctor of Philosophy Ph.D. | en_US |
| dc.description.abstract | Data processing for Big Data plays a vital role for decision-makers in organizations and government, enhances the user experience, and provides quality results in predictive analysis. However, many modern data processing solutions, such as Hadoop and Spark, require a significant investment in hardware and maintenance, and they often neglect the well-established and widely used relational database management systems (RDBMSs). In this dissertation, we study three fundamental graph problems in RDBMSs. The first problem we tackle is computing shortest paths (SP) from a source to a target in large network graphs. We explore SQL-based solutions and leverage the intelligent scheduling that an RDBMS performs when executing set-at-a-time expansions of graph vertices, in contrast to the vertex-at-a-time expansions of classical SP algorithms. Our algorithms perform orders of magnitude faster than baselines and outperform their counterparts in native graph databases. Second, we study the PageRank problem, which is vital in Google Search and social network analysis for sorting search results and identifying important nodes in a graph. PageRank is an iterative algorithm, which poses challenges when it is implemented over large graphs. We study computing PageRank in an RDBMS for very large graphs on a consumer-grade machine and compare the results to a dedicated graph database. We show that our RDBMS solution is able to process graphs of more than a billion edges in a few minutes, whereas native graph databases fail to handle much smaller graphs. Last, we present a carefully engineered RDBMS solution to the problem of triangle enumeration for very large graphs. We show that RDBMSs are suitable tools for enumerating billions of triangles in billion-scale networks on a consumer-grade machine. We also compare our RDBMS solution's performance to that of a native graph database and show that our solution outperforms it by orders of magnitude. | en_US |
| dc.description.scholarlevel | Graduate | en_US |
| dc.identifier.bibliographicCitation | Aly Ahmed, Keanelek Enns, and Alex Thomo. Triangle enumeration for billion-scale graphs in RDBMS. In AINA (2), pages 160–173, 2021. | en_US |
| dc.identifier.bibliographicCitation | Aly Ahmed and Alex Thomo. Computing source-to-target shortest paths for complex networks in RDBMS. Journal of Computer and System Sciences, 89:114–129, 2017. | en_US |
| dc.identifier.bibliographicCitation | Aly Ahmed and Alex Thomo. PageRank for billion-scale networks in RDBMS. In International Conference on Intelligent Networking and Collaborative Systems, pages 89–100. Springer International Publishing, 2020. | en_US |
| dc.identifier.uri | http://hdl.handle.net/1828/13306 | |
| dc.language | English | eng |
| dc.language.iso | en | en_US |
| dc.rights | Available to the World Wide Web | en_US |
| dc.subject | Shortest Path | en_US |
| dc.subject | PageRank | en_US |
| dc.subject | RDBMS | en_US |
| dc.subject | Matrix partitioning | en_US |
| dc.subject | Big Data | en_US |
| dc.subject | Triangle Enumeration | en_US |
| dc.subject | Graph Database | en_US |
| dc.subject | PTE | en_US |
| dc.subject | Compact Forward | en_US |
| dc.subject | Table partitioning | en_US |
| dc.subject | Billion Scale Graph | en_US |
| dc.title | Complex graph algorithms using relational database | en_US |
| dc.type | Thesis | en_US |
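The abstract contrasts set-at-a-time expansion of graph vertices in SQL with the vertex-at-a-time expansion of classical shortest-path algorithms. The sketch below illustrates that contrast under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in RDBMS, and the function name `sp_set_at_a_time` and the tables `edges`, `dist`, `frontier`, and `cand` are hypothetical. It is a simplified Bellman-Ford-style frontier relaxation for non-negative edge weights, not the algorithm from the thesis; each loop iteration expands every vertex whose distance just improved with a single join.

```python
import sqlite3


def sp_set_at_a_time(edges, source, target):
    """Hypothetical sketch: source-to-target distance via set-at-a-time
    frontier relaxation in SQL (assumes non-negative weights)."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, w REAL)")
    cur.execute("CREATE INDEX edges_src ON edges(src)")
    cur.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    # dist: best known distance for every vertex reached so far
    cur.execute("CREATE TABLE dist (v INTEGER PRIMARY KEY, d REAL)")
    # frontier: vertices whose distance improved in the previous round
    cur.execute("CREATE TABLE frontier (v INTEGER PRIMARY KEY, d REAL)")
    cur.execute("INSERT INTO dist VALUES (?, 0)", (source,))
    cur.execute("INSERT INTO frontier VALUES (?, 0)", (source,))
    while cur.execute("SELECT COUNT(*) FROM frontier").fetchone()[0] > 0:
        # Expand the whole frontier at once: one join, not one vertex at a time
        cur.execute("DROP TABLE IF EXISTS cand")
        cur.execute("""
            CREATE TABLE cand AS
            SELECT e.dst AS v, MIN(f.d + e.w) AS d
            FROM frontier f JOIN edges e ON e.src = f.v
            GROUP BY e.dst
        """)
        # The next frontier keeps only candidates that beat the known distance
        cur.execute("DELETE FROM frontier")
        cur.execute("""
            INSERT INTO frontier
            SELECT c.v, c.d FROM cand c
            LEFT JOIN dist t ON t.v = c.v
            WHERE t.v IS NULL OR c.d < t.d
        """)
        cur.execute("INSERT OR REPLACE INTO dist SELECT v, d FROM frontier")
    row = cur.execute("SELECT d FROM dist WHERE v = ?", (target,)).fetchone()
    con.close()
    return None if row is None else row[0]
```

For example, `sp_set_at_a_time([(1, 2, 1), (2, 3, 2), (1, 3, 5), (3, 4, 1)], 1, 4)` should return `4.0`, because the path 1 → 2 → 3 → 4 is shorter than any path using the direct edge 1 → 3. The RDBMS is free to choose the join strategy for each round, which is the kind of scheduling the abstract refers to.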
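PageRank, as the abstract notes, is iterative. A minimal sketch, again assuming `sqlite3` as the RDBMS and using hypothetical names (`pagerank_sql` and the tables `edges`, `nodes`, `outdeg`, `pr`, `pr_next`), expresses one iteration of the standard update PR(v) = (1 - d)/N + d * sum of PR(u)/outdeg(u) over all in-neighbours u as a single join-and-aggregate query. It does not reproduce the partitioning techniques studied in the thesis and, as a simplification, does not redistribute the rank held by dangling nodes (vertices with no out-edges).

```python
import sqlite3


def pagerank_sql(edges, damping=0.85, iterations=50):
    """Hypothetical sketch of PageRank as repeated SQL joins and aggregations.
    Simplification: rank held by dangling nodes is not redistributed, so the
    total stays 1 only when every vertex has at least one out-edge."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    cur.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    cur.execute(
        "CREATE TABLE nodes AS SELECT src AS v FROM edges UNION SELECT dst FROM edges"
    )
    n = cur.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    # Out-degree of every vertex, computed once and reused in each iteration
    cur.execute(
        "CREATE TABLE outdeg AS SELECT src AS v, COUNT(*) AS deg FROM edges GROUP BY src"
    )
    cur.execute("CREATE TABLE pr (v INTEGER PRIMARY KEY, score REAL)")
    cur.execute("CREATE TABLE pr_next (v INTEGER PRIMARY KEY, score REAL)")
    # Start from the uniform distribution 1/N
    cur.execute("INSERT INTO pr SELECT v, 1.0 / ? FROM nodes", (n,))
    for _ in range(iterations):
        cur.execute("DELETE FROM pr_next")
        # One iteration: every vertex pulls rank from all its in-neighbours at once
        cur.execute("""
            INSERT INTO pr_next
            SELECT nd.v, (1.0 - ?) / ? + ? * COALESCE(SUM(p.score / o.deg), 0.0)
            FROM nodes nd
            LEFT JOIN edges e ON e.dst = nd.v
            LEFT JOIN pr p ON p.v = e.src
            LEFT JOIN outdeg o ON o.v = e.src
            GROUP BY nd.v
        """, (damping, n, damping))
        # Swap: the new scores become the current scores
        cur.execute("DELETE FROM pr")
        cur.execute("INSERT INTO pr SELECT v, score FROM pr_next")
    result = dict(cur.execute("SELECT v, score FROM pr").fetchall())
    con.close()
    return result
```

On the graph `[(1, 2), (1, 3), (2, 3), (3, 1)]`, where every vertex has an out-edge, the scores should sum to approximately 1, and vertex 3, which receives rank from both other vertices, should end up with the highest score. A fixed iteration count keeps the sketch short; a convergence test on the change between `pr` and `pr_next` would be a natural refinement.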
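Triangle enumeration, the third problem in the abstract, can likewise be phrased as joins. The sketch assumes `sqlite3` and hypothetical names (`triangles_sql` and the tables `raw`, `und`, `deg`, `fwd`). It orients each undirected edge from the endpoint with the lower (degree, id) rank to the higher one, a degree-based ordering in the spirit of Compact Forward, so that a three-way self-join over the oriented edges reports each triangle exactly once. This is a plain join formulation, not the carefully engineered, partitioned solution described in the thesis.

```python
import sqlite3


def triangles_sql(edges):
    """Hypothetical sketch: enumerate each triangle once via degree-ordered
    edge orientation and a three-way self-join."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE raw (a INTEGER, b INTEGER)")
    cur.executemany("INSERT INTO raw VALUES (?, ?)", edges)
    # Simple undirected graph: both directions, self-loops and duplicates removed
    cur.execute("""
        CREATE TABLE und AS
        SELECT a AS u, b AS v FROM raw WHERE a <> b
        UNION SELECT b, a FROM raw WHERE a <> b
    """)
    cur.execute("CREATE TABLE deg AS SELECT u AS v, COUNT(*) AS d FROM und GROUP BY u")
    # Keep each edge once, pointing from lower to higher (degree, id) rank
    cur.execute("""
        CREATE TABLE fwd AS
        SELECT e.u AS src, e.v AS dst FROM und e
        JOIN deg du ON du.v = e.u
        JOIN deg dv ON dv.v = e.v
        WHERE du.d < dv.d OR (du.d = dv.d AND e.u < e.v)
    """)
    cur.execute("CREATE INDEX fwd_src_dst ON fwd(src, dst)")
    # The orientation is acyclic, so each triangle appears exactly once
    # as a path x -> y -> z closed by the edge x -> z
    rows = cur.execute("""
        SELECT a.src, a.dst, b.dst
        FROM fwd a
        JOIN fwd b ON b.src = a.dst
        JOIN fwd c ON c.src = a.src AND c.dst = b.dst
    """).fetchall()
    con.close()
    return rows
```

For the complete graph on four vertices, `[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]`, the function should return four triangles, one for each 3-vertex subset. Orienting toward higher-degree vertices keeps the out-lists of high-degree hubs short, which is the usual motivation for degree-based orderings.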