On the Optimality of TeraSort in MapReduce

dc.contributor.author	Xia, Fei
dc.contributor.supervisor	Thomo, Alex
dc.contributor.supervisor	Srinivasan, Venkatesh
dc.date.accessioned	2016-08-26T12:59:50Z
dc.date.available	2016-08-26T12:59:50Z
dc.date.copyright	2016	en_US
dc.date.issued	2016-08-26
dc.degree.department	Department of Computer Science
dc.degree.level	Master of Science M.Sc.	en_US
dc.description.abstract	MapReduce is a scalable, reliable and easy-to-program parallel computation framework for massive data processing. The key for a MapReduce algorithm to be efficient is the balance of workloads on the participating machines. Building on the notion of minimal MapReduce algorithms, this project report discusses the sampling and partitioning techniques used in TeraSort. For one of them, we improve the bound on partition sizes to one of asymptotic optimality in terms of increasing number of partitions. In light of the wide applicability of this partition technique, our result potentially strengthens the worst-case performance guarantee in other algorithms. We show the application in top-k and k-selection problems as an example.	en_US
dc.description.scholarlevel	Graduate	en_US
dc.identifier.uri	http://hdl.handle.net/1828/7489
dc.language.iso	en	en_US
dc.rights	Available to the World Wide Web	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/	*
dc.subject	TeraSort	en_US
dc.subject	MapReduce	en_US
dc.subject	optimality	en_US
dc.subject	sample	en_US
dc.title	On the Optimality of TeraSort in MapReduce	en_US
dc.type	project	en_US
