Malicious URL Detection Using Machine Learning

dc.contributor.author: Syed, Abdul Aleem
dc.contributor.supervisor: Gulliver, T. Aaron
dc.date.accessioned: 2022-08-09T00:12:46Z
dc.date.available: 2022-08-09T00:12:46Z
dc.date.copyright: 2022 [en_US]
dc.date.issued: 2022-08-08
dc.degree.department: Department of Electrical and Computer Engineering [en_US]
dc.degree.level: Master of Engineering M.Eng. [en_US]
dc.description.abstract: The detection of malicious Uniform Resource Locators (URLs) is important for network and cyber security, as the Internet has long been a platform for online criminal activity. In this project, supervised Machine Learning (ML) is employed to detect malicious URLs. The ISCX-URL-2016 dataset from the Canadian Institute for Cybersecurity is used for evaluation. This dataset contains 79 features and four classes of URLs, namely spam, malware, phishing, and benign. The Waikato Environment for Knowledge Analysis (WEKA) tool is used to train and test the ML classifiers. To compare the results, k-fold cross-validation is used with k = 5 and k = 10. Principal Component Analysis (PCA) is employed for dimensionality reduction of the dataset, and the important features are selected based on the eigenvalues. The best 10 and 25 features were selected using PCA, and the classifiers were trained using 5-fold and 10-fold cross-validation. The classifiers were also trained using all 79 features. The ML classifiers evaluated are Random Forest (RF), Decision Tree, K-Nearest Neighbors (KNN), Bayesian Network (BayesNet), and Simple Logistic. The performance metrics considered are accuracy, precision, recall, F-measure, and execution time. The RF classifier achieved the highest accuracy, 98.7% with 79 features. However, in terms of execution time, KNN outperformed RF, taking 0.06 s with 79 features while reaching 98.3% accuracy, second only to RF. In general, the results show that KNN provides the best overall performance. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/14090
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: URL [en_US]
dc.subject: Malicious URL [en_US]
dc.subject: Malicious URL Detection [en_US]
dc.subject: WEKA Tool [en_US]
dc.subject: PCA [en_US]
dc.subject: Principal Component Analysis [en_US]
dc.subject: Random Forest [en_US]
dc.subject: Supervised Machine Learning [en_US]
dc.title: Malicious URL Detection Using Machine Learning [en_US]
dc.type: project [en_US]
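
Illustrative note: the abstract compares classifiers using k-fold cross-validation with k = 5 and k = 10, and KNN is one of the classifiers evaluated. The sketch below shows that evaluation procedure in plain Python. It is not the project's WEKA workflow: the data are synthetic two-feature points rather than the ISCX-URL-2016 dataset, PCA is omitted, and the names k_fold_indices, knn_predict, cross_validate, and make_toy_data are hypothetical helpers introduced only for this example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Label x by majority vote among its nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        # Squared Euclidean distance gives the same ordering as the true distance.
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(X, y, k, n_neighbors=3):
    """Return the mean KNN accuracy over k folds, each held out once."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data (an assumption; NOT the ISCX-URL-2016 dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("benign", 0.0), ("malicious", 5.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for k in (5, 10):
        print(f"{k}-fold mean accuracy: {cross_validate(X, y, k):.3f}")
```

Each sample is held out exactly once, so the mean over the folds estimates accuracy on unseen data. A larger k trains each model on more data at the cost of more training runs.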

Files

Original bundle
Name: Syed_Abdul_Aleem_MEng_2022.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission