Trigger: A Hybrid Model for Low-Latency Processing of Large Data Sets
Date
2015-08-12
Authors
Xiang, Min
Abstract
Large data sets now need to be processed at close to real-time speeds. For example,
video hosting sites like YouTube and Netflix handle a huge amount of traffic every day,
and large amounts of data need to be processed on demand so that statistics, analytics,
or application logic can generate content for user queries.
In such cases, data can be stream processed or batch processed. Stream processing
treats the incoming data as a stream and processes it through a processing pipeline
as soon as the stream is gathered. It is more computationally intensive but grants
lower latency. Batch processing tries to gather more data before processing it. It
consumes fewer resources but at the cost of higher latency. This project explores an
adaptable model that allows a developer to strike a balance between the efficient use
of computational resources and the amount of latency involved in processing a large
data set. The proposed model uses an event-triggered batch processing method to
balance resource utilization against latency. The model is also configurable, so it can
adapt to different tradeoffs according to application-specific needs. In a very simple
application under a best-case scenario, we show that this model offers a 1-second
latency when applied to a video hosting site where a traditional batch processing
method introduced a 1-minute latency. When the initial system already has low latency,
this model will not increase the latency, provided appropriate parameters are chosen.
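
The idea of event-triggered batch processing can be illustrated with a minimal sketch. The abstract does not describe the project's actual implementation, so everything here is an assumption made for illustration: the class name `TriggeredBatcher`, the two triggers (`max_events` and `max_wait_seconds`), and the event strings are all hypothetical. Events are buffered, and a batch is processed as soon as either trigger fires.

```python
from typing import Callable, List, Optional


class TriggeredBatcher:
    """Hypothetical sketch: buffer events, flush when a configurable trigger fires."""

    def __init__(self, process: Callable[[List[str]], None],
                 max_events: int, max_wait_seconds: float) -> None:
        self.process = process                    # called once per flushed batch
        self.max_events = max_events              # size trigger (assumed parameter)
        self.max_wait_seconds = max_wait_seconds  # age trigger (assumed parameter)
        self.buffer: List[str] = []
        self.first_arrival: Optional[float] = None

    def add(self, event: str, now: float) -> bool:
        """Buffer one event; return True if this arrival triggered a flush."""
        if self.first_arrival is None:
            self.first_arrival = now
        self.buffer.append(event)
        too_many = len(self.buffer) >= self.max_events
        too_old = now - self.first_arrival >= self.max_wait_seconds
        if too_many or too_old:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Process the buffered events as one batch and reset the buffer."""
        if self.buffer:
            self.process(self.buffer)
        self.buffer = []
        self.first_arrival = None


# Usage: timestamps are passed in explicitly so the example is deterministic.
batches: List[List[str]] = []
batcher = TriggeredBatcher(batches.append, max_events=3, max_wait_seconds=1.0)
arrivals = [(0.0, "view:a"), (0.2, "view:b"), (0.4, "view:c"),
            (5.0, "view:d"), (6.5, "view:e")]
for timestamp, event in arrivals:
    batcher.add(event, now=timestamp)
```

In this sketch, setting `max_events` to 1 approximates stream processing, while large values for both parameters approximate traditional batch processing. Note that the age trigger is only checked when a new event arrives; a real system would likely also need a timer to flush an idle buffer.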
Keywords
low latency, batch processing, live data sets