On the Optimality of TeraSort in MapReduce
| dc.contributor.author | Xia, Fei | |
| dc.contributor.supervisor | Thomo, Alex | |
| dc.contributor.supervisor | Srinivasan, Venkatesh | |
| dc.date.accessioned | 2016-08-26T12:59:50Z | |
| dc.date.available | 2016-08-26T12:59:50Z | |
| dc.date.copyright | 2016 | en_US |
| dc.date.issued | 2016-08-26 | |
| dc.degree.department | Department of Computer Science | |
| dc.degree.level | Master of Science M.Sc. | en_US |
| dc.description.abstract | MapReduce is a scalable, reliable and easy-to-program parallel computation framework for massive data processing. The key to an efficient MapReduce algorithm is balancing the workload across the participating machines. Building on the notion of minimal MapReduce algorithms, this project report discusses the sampling and partitioning techniques used in TeraSort. For one of them, we improve the bound on partition sizes to one that is asymptotically optimal as the number of partitions increases. In light of the wide applicability of this partitioning technique, our result potentially strengthens the worst-case performance guarantees of other algorithms. We show its application to the top-k and k-selection problems as an example. | en_US |
| dc.description.scholarlevel | Graduate | en_US |
| dc.identifier.uri | http://hdl.handle.net/1828/7489 | |
| dc.language.iso | en | en_US |
| dc.rights | Available to the World Wide Web | en_US |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ | * |
| dc.subject | TeraSort | en_US |
| dc.subject | MapReduce | en_US |
| dc.subject | optimality | en_US |
| dc.subject | sample | en_US |
| dc.title | On the Optimality of TeraSort in MapReduce | en_US |
| dc.type | project | en_US |