Approximating 4-cliques in streaming graphs: the power of dual sampling

dc.contributor.author: Mann, Anmol
dc.contributor.supervisor: Thomo, Alex
dc.contributor.supervisor: Srinivasan, Venkatesh
dc.date.accessioned: 2021-05-04T18:48:30Z
dc.date.available: 2021-05-04T18:48:30Z
dc.date.copyright: 2021
dc.date.issued: 2021-05-04
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Clique counting is a challenging problem in graph mining because of combinatorial explosion: even moderately sized graphs with a few million edges can have clique counts on the order of many billions. At this scale it is critical not only to analyze the data but to analyze it efficiently. While randomized algorithms are known for estimating clique counts, 4-cliques have received much less attention than triangles in the streaming setting. In this work, we propose 4CDS, a fast and scalable algorithm for approximating 4-clique counts in the single-pass streaming model. By combining sampling approaches, we estimate the 4-clique count with high accuracy. We provide a theoretical analysis of the algorithm and prove that it improves upon the known space and accuracy bounds. We evaluate 4CDS comprehensively on a collection of real-world graphs. Our algorithm performs well on massive graphs containing several billion 4-cliques and terminates within a reasonable amount of time. Experiments show that our method achieves significant speedups, outperforming several existing clique counting algorithms.
dc.description.scholarlevel: Graduate
dc.identifier.uri: http://hdl.handle.net/1828/12941
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: stream data analysis
dc.subject: graph streams
dc.subject: clique counting
dc.subject: randomized algorithm
dc.subject: clique approximation
dc.subject: stream mining
dc.title: Approximating 4-cliques in streaming graphs: the power of dual sampling
dc.type: Thesis
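The abstract does not describe how 4CDS combines its sampling schemes, so the sketch below does not reproduce that algorithm. It shows only the simplest single-sampling baseline for the same task, assuming a simple undirected graph whose edges each arrive exactly once: keep every streamed edge independently with probability p, count 4-cliques exactly in the retained sample, and rescale by p^-6, since all six edges of a 4-clique must survive for it to be counted. The function names count_4cliques and estimate_4cliques are hypothetical and chosen for illustration.

```python
import itertools
import random
from collections import defaultdict


def count_4cliques(edges):
    """Exact 4-clique count of a simple undirected graph given as (u, v) pairs.

    Assumes node ids are mutually comparable (e.g. ints) for tie-breaking.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; sets also drop duplicate edges
            adj[u].add(v)
            adj[v].add(u)
    # Orient each edge toward the higher (degree, id) rank, giving a DAG,
    # so every 4-clique is enumerated once as a chain a < b < c < d.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    total = 0
    for a, out_a in out.items():
        for b in out_a:
            ab = out_a & out[b]           # candidates for c
            for c in ab:
                total += len(ab & out[c])  # each d closes one 4-clique
    return total


def estimate_4cliques(edge_stream, p, seed=None):
    """Uniform edge-sampling estimate (a baseline, not 4CDS).

    One pass over the stream keeps each edge with probability p; a 4-clique
    survives with probability p**6, so dividing by p**6 gives an unbiased
    estimate when every edge appears in the stream exactly once.
    """
    rng = random.Random(seed)
    sample = [e for e in edge_stream if rng.random() < p]
    return count_4cliques(sample) / p ** 6


if __name__ == "__main__":
    k5 = list(itertools.combinations(range(5), 2))  # complete graph on 5 nodes
    print(count_4cliques(k5))          # C(5, 4) = 5
    print(estimate_4cliques(k5, 1.0))  # p = 1 keeps every edge
```

Lowering p shrinks the stored sample to about p times the edge count but raises the variance sharply, because survival depends on p to the sixth power. That trade-off is the kind of space/accuracy tension the abstract says 4CDS improves on.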

Files

Original bundle
Name: Mann_Anmol_MSc_2021.pdf
Size: 393.22 KB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission