On the Optimality of TeraSort in MapReduce

dc.contributor.author: Xia, Fei
dc.contributor.supervisor: Thomo, Alex
dc.contributor.supervisor: Srinivasan, Venkatesh
dc.date.accessioned: 2016-08-26T12:59:50Z
dc.date.available: 2016-08-26T12:59:50Z
dc.date.copyright: 2016 [en_US]
dc.date.issued: 2016-08-26
dc.degree.department: Department of Computer Science [en_US]
dc.degree.level: Master of Science M.Sc. [en_US]
dc.description.abstract: MapReduce is a scalable, reliable and easy-to-program parallel computation framework for massive data processing. The key for a MapReduce algorithm to be efficient is the balance of workloads on the participating machines. Building on the notion of minimal MapReduce algorithms, this project report discusses the sampling and partitioning techniques used in TeraSort. For one of them, we improve the bound on partition sizes to one of asymptotic optimality in terms of increasing number of partitions. In light of the wide applicability of this partition technique, our result potentially strengthens the worst case performance guarantee in other algorithms. We show the application in top-k and k-selection problems as an example. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/7489
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ [*]
dc.subject: TeraSort [en_US]
dc.subject: MapReduce [en_US]
dc.subject: optimality [en_US]
dc.subject: sample [en_US]
dc.title: On the Optimality of TeraSort in MapReduce [en_US]
dc.type: project [en_US]

Files

Original bundle
Name: Fei_Xia_MSc_2016.pdf
Size: 231.65 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission