Breast Cancer Prediction Using Machine Learning Algorithms

dc.contributor.author: Shahzad, Zeeshan Ali
dc.contributor.supervisor: Gulliver, T. Aaron
dc.date.accessioned: 2024-03-12T23:57:32Z
dc.date.available: 2024-03-12T23:57:32Z
dc.date.issued: 2024
dc.degree.department: Department of Electrical and Computer Engineering
dc.degree.level: Master of Engineering MEng
dc.description.abstract: Breast cancer has become a pressing global health issue, with its prevalence increasing worldwide. The rise in breast cancer cases is a cause for concern as it not only affects the physical and emotional well-being of individuals but also places a significant burden on the healthcare system. Early detection and timely intervention are critical to combatting this disease effectively. The ability to predict and diagnose breast cancer at its earliest stages can make a profound difference in patient outcomes, potentially saving countless lives. In recent years, Machine Learning (ML) has become increasingly important in healthcare. This study considers the utility of supervised ML models to address the challenges posed by breast cancer using the publicly available Breast Cancer Wisconsin (Diagnostic) dataset from the University of California Irvine (UCI) ML repository. The Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), Naive Bayes, and K-Nearest Neighbors (KNN) classifiers are implemented in Python using Jupyter Notebook. The goal of the proposed methodology is accurate breast cancer prediction. First, data preprocessing is employed to clean the dataset by removing null values and duplicates, and handling missing data. To balance the target labels of the dataset, the Synthetic Minority Oversampling Technique (SMOTE) is employed. Then, Principal Component Analysis (PCA) is used to reduce the dimensions of the dataset, with the number of components varied (n = 2, 5, 10, and 15). For training and testing the ML models, five train/test splits, namely 80/20, 70/30, 50/50, 30/70, and 20/80, are employed to assess their impact on model performance. The performance of the models is evaluated using accuracy, precision, recall, F1-score, and execution time.
The results show that SVM and Logistic Regression outperform the other models. SVM achieves an accuracy of 98.2% with an execution time of 9.99 ms using an 80/20 split and 10 features, and Logistic Regression achieves an accuracy of 97.9% with an execution time of 8.42 ms using a 50/50 split and 15 features.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/16063
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Breast Cancer Prediction Using Machine Learning Algorithms
dc.type: project
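
The abstract names SMOTE as the step used to balance the target labels. As a rough illustration of the idea only, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is not the project's code: the function names `smote` and `balance`, the default `k=5`, the fixed seed, and the toy data are all assumptions made for this example, and the study most likely relied on a library implementation rather than a hand-written one.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration; not taken from the project.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        # Place the new point somewhere on the segment base -> target.
        gap = rng.random()
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample the smaller of two classes until both have equal counts."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles binary labels only")
    counts = {c: y.count(c) for c in labels}
    minority_label = min(counts, key=counts.get)
    n_new = max(counts.values()) - min(counts.values())
    if n_new == 0:
        return list(X), list(y)
    minority = [x for x, c in zip(X, y) if c == minority_label]
    extra = smote(minority, n_new, k=k, seed=seed)
    return list(X) + extra, list(y) + [minority_label] * n_new


# Toy data (made up): three samples of class 1, six of class 0.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
     [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.3, 5.1], [4.8, 4.7], [5.2, 5.3]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training portion only, so that synthetic points do not leak into the test set; the abstract does not state which order was used here.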

Files

Original bundle
Name: Zeeshan_Ali_MEng_2024.pdf
Size: 942.64 KB
Format: Adobe Portable Document Format