Trigger: A Hybrid Model for Low-Latency Processing of Large Data Sets

Date

2015-08-12

Authors

Xiang, Min

Abstract

Large data sets now need to be processed at close to real-time speeds. For example, video hosting sites such as YouTube and Netflix handle a huge amount of traffic every day, and large amounts of data need to be processed on demand so that statistics, analytics, or application logic can generate content for user queries. In such cases, data can be stream processed or batch processed. Stream processing treats the incoming data as a stream and pushes it through a processing pipeline as soon as it arrives; it is more computationally intensive but offers lower latency. Batch processing gathers more data before processing it; it consumes fewer resources but at the cost of higher latency. This project explores an adaptable model that allows a developer to strike a balance between the efficient use of computational resources and the latency involved in processing a large data set. The proposed model uses an event-triggered batch processing method to balance resource utilization against latency. The model is also configurable, so it can adapt to different tradeoffs according to application-specific needs. In a very simple application and an extreme best-case scenario, we show that this model offers one second of latency when applied to a video hosting site where a traditional batch processing method introduced one minute of latency. When the initial system already has low latency, this model does not increase the latency, provided appropriate parameters are chosen.
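To illustrate the idea of event-triggered batching described above, the following is a minimal sketch (not the thesis's actual implementation; the class name, parameters, and thresholds are assumptions chosen for illustration). A pending batch is flushed to the downstream batch processor when either a configurable event-count trigger or a configurable maximum-wait trigger fires, which is one way to expose the resource-utilization versus latency tradeoff as tunable parameters:

```python
import time
from collections import deque


class TriggeredBatcher:
    """Illustrative event-triggered micro-batcher (a sketch, not the
    thesis's implementation). Events accumulate in a pending batch that
    is flushed when either trigger fires:
      - size trigger: the batch reaches `max_batch` events, or
      - latency trigger: the oldest pending event has waited `max_wait_s`.
    Tuning these two parameters moves the system between batch-like
    (large batches, high latency) and stream-like (tiny batches, low
    latency) behavior."""

    def __init__(self, process, max_batch=100, max_wait_s=1.0):
        self.process = process        # downstream batch-processing callback
        self.max_batch = max_batch    # size trigger threshold
        self.max_wait_s = max_wait_s  # latency trigger threshold (seconds)
        self.pending = deque()
        self.first_arrival = None     # arrival time of oldest pending event

    def submit(self, event):
        """Add one incoming event and flush if a trigger condition holds."""
        if not self.pending:
            self.first_arrival = time.monotonic()
        self.pending.append(event)
        self._maybe_flush()

    def _maybe_flush(self):
        size_trigger = len(self.pending) >= self.max_batch
        latency_trigger = (self.first_arrival is not None and
                           time.monotonic() - self.first_arrival
                           >= self.max_wait_s)
        if size_trigger or latency_trigger:
            self.flush()

    def flush(self):
        """Hand the accumulated batch to the downstream processor."""
        if self.pending:
            batch = list(self.pending)
            self.pending.clear()
            self.first_arrival = None
            self.process(batch)
```

For example, with `max_batch=3`, submitting seven events produces two full batches of three immediately, with the seventh event held until the next trigger fires or `flush()` is called. A production version would also need a timer to fire the latency trigger when no new events arrive.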

Keywords

low latency, batch processing, live data sets
