Explainable machine learning for diabetes prediction

Hussain, Sadam

dc.contributor.author	Hussain, Sadam
dc.contributor.supervisor	Gulliver, T. Aaron
dc.date.accessioned	2026-03-10T20:20:51Z
dc.date.available	2026-03-10T20:20:51Z
dc.date.issued	2026
dc.degree.department	Department of Electrical and Computer Engineering
dc.degree.level	Master of Applied Science MASc
dc.description.abstract	Diabetes is a growing global health concern, contributing to significant morbidity, mortality, and long-term economic burden. Machine Learning (ML) methods are increasingly applied to diabetes prediction; however, selecting appropriate classifiers and understanding the key features driving model decisions remain essential for reliable and clinically acceptable performance. This is particularly important in healthcare settings where clinicians may have limited familiarity with ML techniques and where transparency and trust in predictive outputs are critical. This study evaluates eight ML classifiers for diabetes prediction using a dataset of 100,000 patient records: Logistic Regression (LR), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), AdaBoost (AB), Decision Tree (DT), and Neural Network (NN). Models are evaluated under several configurations, including baseline training and hyperparameter optimization using RandomizedSearchCV. Global and local interpretability are examined using SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), and Explain Like I’m 5 (ELI5) to identify the most influential features contributing to predictions. The findings show that ensemble-based models achieve the strongest predictive performance, with RF and GB outperforming the other evaluated classifiers. Interpretability analyses consistently highlight Hemoglobin A1c (HbA1c), blood glucose, Body Mass Index (BMI), and age as the dominant predictive features. A final evaluation using a reduced feature set derived with the help of Explainable AI (XAI) demonstrates that strong predictive accuracy can be maintained while improving model simplicity and interpretability. This work underscores the importance of combining ML performance with transparent feature explanations to support trustworthy and clinically meaningful decision support systems for diabetes prediction.
dc.description.scholarlevel	Graduate
dc.identifier.uri	https://hdl.handle.net/1828/23441
dc.language	English	eng
dc.language.iso	en
dc.rights	Available to the World Wide Web
dc.subject	Machine learning
dc.subject	Explainable AI
dc.subject	Diabetes prediction
dc.subject	Feature selection
dc.title	Explainable machine learning for diabetes prediction
dc.type	Thesis
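
The abstract names SHAP as one of the interpretability methods. As a rough illustration of the idea behind it, the sketch below computes exact Shapley attributions for a single record by averaging each feature's marginal contribution over every feature ordering. Everything here is an assumption made for illustration: the scoring function `toy_risk_score`, its weights, and the `instance` and `baseline` values are hypothetical and are not taken from the thesis, its dataset, or the `shap` library.

```python
from itertools import permutations


def toy_risk_score(x):
    """Hypothetical linear risk score (illustrative weights, not the thesis's model)."""
    return (
        0.8 * x["hba1c"]
        + 0.01 * x["blood_glucose"]
        + 0.05 * x["bmi"]
        + 0.01 * x["age"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions of model(instance) relative to model(baseline).

    For every ordering of the features, switch them one at a time from the
    baseline value to the instance value and record how much the output moves.
    The attribution of a feature is its average movement over all orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the reference record
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical record to explain and hypothetical reference record.
instance = {"hba1c": 7.5, "blood_glucose": 180.0, "bmi": 32.0, "age": 60.0}
baseline = {"hba1c": 5.5, "blood_glucose": 100.0, "bmi": 25.0, "age": 40.0}

phi = shapley_values(toy_risk_score, instance, baseline)
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name:>14}: {value:+.3f}")
```

The attributions sum to the difference between the score of the explained record and the score of the reference record, which is the additivity property that gives SHAP its name. Enumerating all orderings is only feasible for a handful of features; practical tools approximate this computation for larger feature sets.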

Files

Original bundle

Name: Hussain_Sadam_MASc_2026.pdf
Size: 7.29 MB
Format: Adobe Portable Document Format

Collections

Electronic Theses and Dissertations (ETD)