Clinical relevance of ML predictions using health datasets

Date

2024

Authors

Balasubramanian, Sowmya

Abstract

This research explores how advanced data analysis and machine learning techniques can improve healthcare. It applies computational methods to three challenges: medical diagnosis, data interpretation, and synthetic data generation. The aim is to enhance clinical practice, improve diagnostic accuracy, and maximize the utility of healthcare data, yielding more effective, accurate, and practical solutions for modern healthcare needs.

The first study tackles the diagnosis of thyroid disorders, which can profoundly affect both physical and mental health. Rather than only evaluating classifier performance, it emphasizes comprehensive feature analysis. By identifying the four most important features for predicting thyroid disorders, the study shows that using a complete thyroid panel leads to more accurate and cost-effective diagnoses. This finding exposes a weakness in current clinical practice, which frequently skips a full thyroid panel, particularly in universal healthcare systems.

The second study examines the diagnosis of Autism Spectrum Disorder (ASD) from fMRI data. It compares simple tabular-data classifiers with state-of-the-art graph-theoretic methods and finds that the simpler approach performs just as well. It also finds that adding higher-order connectivity information does not improve classification outcomes, and it highlights how difficult ASD is to diagnose because brain networks are similar in individuals with and without the disorder. The study underscores the need for clear and reliable diagnostic methods in neuroimaging.

The third study explores the generation of synthetic health data, which supports research and practical applications while addressing privacy and ethical concerns. It assesses several state-of-the-art synthetic data generation (SDG) techniques, comparing their scalability, resemblance to real data, and practical utility. The findings show that statistical models outperform machine learning-based methods in training time and data generation, and that synthetic data from most models closely resembles real data. The synthetic data also proves useful in practice: classifiers maintain high accuracy whether they are trained on real or synthetic data.

In summary, this dissertation shows how advanced data analysis and machine learning can improve healthcare. By enhancing diagnostic methods, making complex medical data easier to interpret, and using synthetic data ethically, these techniques lead to more effective, accurate, and scalable healthcare solutions and help advance patient care.

Keywords

Data analytics, Machine learning, Health informatics
