Complex graph algorithms using relational database

dc.contributor.author: Ahmed, Aly
dc.contributor.supervisor: Thomo, Alex
dc.date.accessioned: 2021-08-24T19:18:15Z
dc.date.available: 2021-08-24T19:18:15Z
dc.date.copyright: 2021 [en_US]
dc.date.issued: 2021-08-24
dc.degree.department: Department of Computer Science
dc.degree.level: Doctor of Philosophy Ph.D. [en_US]
dc.description.abstract: Data processing for Big Data plays a vital role for decision-makers in organizations and government, enhances the user experience, and provides quality results in prediction analysis. However, many modern data processing solutions, such as Hadoop and Spark, require a significant investment in hardware and maintenance, and often neglect the well-established and widely used relational database management systems (RDBMSs). In this dissertation, we study three fundamental graph problems in RDBMSs. The first problem we tackle is computing shortest paths (SP) from a source to a target in large network graphs. We explore SQL-based solutions and leverage the intelligent scheduling that an RDBMS performs when executing set-at-a-time expansions of graph vertices, in contrast to the vertex-at-a-time expansions of classical SP algorithms. Our algorithms perform orders of magnitude faster than baselines and outperform counterparts in native graph databases. Second, we study the PageRank problem, which is vital in Google Search and social network analysis for determining how to sort search results and for identifying important nodes in a graph. PageRank is an iterative algorithm, which poses challenges when implementing it over large graphs. We study computing PageRank in an RDBMS for very large graphs on a consumer-grade machine and compare the results to a dedicated graph database. We show that our RDBMS solution is able to process graphs of more than a billion edges in a few minutes, whereas native graph databases fail to handle graphs of much smaller sizes. Last, we present a carefully engineered RDBMS solution to the problem of triangle enumeration for very large graphs. We show that RDBMSs are suitable tools for enumerating billions of triangles in billion-scale networks on a consumer-grade machine. We also compare our RDBMS solution's performance to a native graph database and show that our solution outperforms it by orders of magnitude. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.bibliographicCitation: Aly Ahmed, Keanelek Enns, and Alex Thomo. Triangle enumeration for billion-scale graphs in RDBMS. In AINA (2), pages 160–173, 2021. [en_US]
dc.identifier.bibliographicCitation: Aly Ahmed and Alex Thomo. Computing source-to-target shortest paths for complex networks in RDBMS. Journal of Computer and System Sciences, 89:114–129, 2017. [en_US]
dc.identifier.bibliographicCitation: Aly Ahmed and Alex Thomo. PageRank for billion-scale networks in RDBMS. In International Conference on Intelligent Networking and Collaborative Systems, pages 89–100. Springer International Publishing, 2020. [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/13306
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: Shortest Path [en_US]
dc.subject: PageRank [en_US]
dc.subject: RDBMS [en_US]
dc.subject: Matrix partitioning [en_US]
dc.subject: Big Data [en_US]
dc.subject: Triangle Enumeration [en_US]
dc.subject: Graph Database [en_US]
dc.subject: PTE [en_US]
dc.subject: Compact Forward [en_US]
dc.subject: Table partitioning [en_US]
dc.subject: Billion Scale Graph [en_US]
dc.title: Complex graph algorithms using relational database [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Ahmed_Aly_PhD_2021.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission