Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system
Date
2016-11-28
Authors
Chrimes, Dillon
Abstract
Background: Big data analytics (BDA) is important for reducing healthcare costs, but it poses many challenges. The objective of this study was to establish a high-performance, interactive BDA platform for a hospital system.
Methods: A Hadoop/MapReduce framework with HBase (a NoSQL database) formed the BDA platform, using hospital-specific metadata and file ingestion. Query performance was tested with Apache tools in the Hadoop ecosystem.
Results: At the optimized iteration, ingestion into the Hadoop Distributed File System (HDFS) required three seconds, but HBase required four to twelve hours to complete the Reducer phase of MapReduce. HBase bulk loads took a week for one billion records (10 TB) and over two months for three billion records (30 TB). Simple and complex queries returned results in about two seconds at one and three billion records, respectively.
Interpretations: The BDA platform, with HBase distributed by Hadoop, performed successfully and at high performance on large volumes representing the Province's entire data. Inconsistencies in MapReduce limited operational efficiency. The importance of Hadoop/MapReduce for health informatics is further discussed.
Keywords
Big Data, Big Data Analytics, Big Data Tools, Big Data Visualizations, Hadoop Ecosystem, Health Big Data, Hospital Systems, Interactive Big Data, Patient Data, Simulations