Trigger: A Hybrid Model for Low-Latency Processing of Large Data Sets

Date

2015-08-12

Authors

Xiang, Min

Abstract

Large data sets now need to be processed at close to real-time speeds. For example, video hosting sites such as YouTube and Netflix handle a huge amount of traffic every day, and large amounts of data need to be processed on demand so that statistics, analytics, or application logic can generate content for user queries. In such cases, data can be stream processed or batch processed. Stream processing treats the incoming data as a stream and pushes it through a processing pipeline as soon as it arrives; it is more computationally intensive but offers lower latency. Batch processing gathers more data before processing it; it consumes fewer resources but at the cost of higher latency. This project explores an adaptable model that allows a developer to strike a balance between the efficient use of computational resources and the latency involved in processing a large data set. The proposed model uses an event-triggered batch processing method to balance resource utilization against latency. The model is also configurable, so it can adapt to different tradeoffs according to application-specific needs. In a very simple application and an extreme best-case scenario, we show that this model offers one second of latency when applied to a video hosting site where a traditional batch processing method introduced one minute of latency. When the initial system already has low latency, this model does not increase the latency, provided appropriate parameters are chosen.
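To illustrate the idea of event-triggered batching described above, the following is a minimal sketch (not the thesis's actual implementation; the class name, parameters, and thresholds are assumptions chosen for illustration). A pending batch is flushed to the downstream batch processor when either a configurable event-count trigger or a configurable maximum-wait trigger fires, which is one way to expose the resource-utilization versus latency tradeoff as tunable parameters:

```python
import time
from collections import deque


class TriggeredBatcher:
    """Illustrative event-triggered micro-batcher (a sketch, not the
    thesis's implementation). Events accumulate in a pending batch that
    is flushed when either trigger fires:
      - size trigger: the batch reaches `max_batch` events, or
      - latency trigger: the oldest pending event has waited `max_wait_s`.
    Tuning these two parameters moves the system between batch-like
    (large batches, high latency) and stream-like (tiny batches, low
    latency) behavior."""

    def __init__(self, process, max_batch=100, max_wait_s=1.0):
        self.process = process        # downstream batch-processing callback
        self.max_batch = max_batch    # size trigger threshold
        self.max_wait_s = max_wait_s  # latency trigger threshold (seconds)
        self.pending = deque()
        self.first_arrival = None     # arrival time of oldest pending event

    def submit(self, event):
        """Add one incoming event and flush if a trigger condition holds."""
        if not self.pending:
            self.first_arrival = time.monotonic()
        self.pending.append(event)
        self._maybe_flush()

    def _maybe_flush(self):
        size_trigger = len(self.pending) >= self.max_batch
        latency_trigger = (self.first_arrival is not None and
                           time.monotonic() - self.first_arrival
                           >= self.max_wait_s)
        if size_trigger or latency_trigger:
            self.flush()

    def flush(self):
        """Hand the accumulated batch to the downstream processor."""
        if self.pending:
            batch = list(self.pending)
            self.pending.clear()
            self.first_arrival = None
            self.process(batch)
```

For example, with `max_batch=3`, submitting seven events produces two full batches of three immediately, with the seventh event held until the next trigger fires or `flush()` is called. A production version would also need a timer to fire the latency trigger when no new events arrive.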

Keywords

low latency, batch processing, live data sets
