Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system

Date

2016-11-28

Authors

Chrimes, Dillon

Abstract

Background: Big data analytics (BDA) is important for reducing healthcare costs; however, implementing it poses many challenges. The study objective was to establish a high-performance, interactive BDA platform for a hospital system. Methods: A Hadoop/MapReduce framework formed the BDA platform with HBase (a NoSQL database), using hospital-specific metadata and file ingestion. Query performance was tested with Apache tools in Hadoop's ecosystem. Results: At the optimized iteration, Hadoop Distributed File System (HDFS) ingestion required three seconds, whereas HBase required four to twelve hours to complete the Reducer stage of MapReduce. HBase bulkloads took a week for one billion records (10TB) and over two months for three billion (30TB). Simple and complex queries returned results in about two seconds for one billion and three billion records, respectively. Interpretations: The BDA platform of HBase distributed by Hadoop operated successfully with high performance at large volumes representing the Province's entire data. Inconsistencies in MapReduce limited operational efficiency. The importance of the Hadoop/MapReduce framework for representing health informatics is further discussed.
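As a back-of-the-envelope check on the bulkload figures reported in the abstract, the sketch below derives the implied average stored size per record and the average ingestion rate. It is illustrative arithmetic only, not the study's method: it assumes decimal terabytes, a 7-day week, and treats "over two months" as 60 days, so the three-billion rate is an upper bound. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes


def bytes_per_record(total_bytes: float, records: float) -> float:
    """Average stored size per record, in bytes."""
    return total_bytes / records


def bulkload_rate(records: float, days: float) -> float:
    """Average records ingested per second over a load lasting `days` days."""
    return records / (days * SECONDS_PER_DAY)


# Volumes and durations as reported in the abstract.
one_billion = 1e9
three_billion = 3e9

size_1b = bytes_per_record(10 * TB, one_billion)
size_3b = bytes_per_record(30 * TB, three_billion)

rate_1b = bulkload_rate(one_billion, 7)  # "a week", assumed to be 7 days
# Assumption: "over two months" taken as 60 days, so this is an upper bound.
rate_3b_upper = bulkload_rate(three_billion, 60)

print(f"Average size per record: {size_1b:,.0f} B (1B load), {size_3b:,.0f} B (3B load)")
print(f"Average bulkload rate: {rate_1b:,.0f} records/s (1B load), "
      f"at most {rate_3b_upper:,.0f} records/s (3B load)")
```

Both volumes work out to roughly 10 KB per record, while the implied average ingestion rate for the three-billion load is well below that of the one-billion load, consistent with the abstract's note that MapReduce inconsistencies limited operational efficiency.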

Keywords

Big Data, Big Data Analytics, Big Data Tools, Big Data Visualizations, Hadoop Ecosystem, Health Big Data, Hospital Systems, Interactive Big Data, Patient Data, Simulations
