Malicious URL Detection Using Machine Learning

Syed, Abdul Aleem

dc.contributor.author	Syed, Abdul Aleem
dc.contributor.supervisor	Gulliver, T. Aaron
dc.date.accessioned	2022-08-09T00:12:46Z
dc.date.available	2022-08-09T00:12:46Z
dc.date.copyright	2022	en_US
dc.date.issued	2022-08-08
dc.degree.department	Department of Electrical and Computer Engineering
dc.degree.level	Master of Engineering M.Eng.	en_US
dc.description.abstract	The detection of malicious Uniform Resource Locators (URLs) is important for network and cyber security, as the Internet has long been a platform for online criminal activity. In this project, supervised Machine Learning (ML) is employed to detect malicious URLs. The ISCX-URL-2016 dataset from the Canadian Institute for Cybersecurity is used for evaluation. This dataset contains 79 features and four classes of URLs, namely spam, malware, phishing, and benign. The Waikato Environment for Knowledge Analysis (WEKA) tool is used to train and test the ML classifiers. To compare the results, k-fold cross-validation is used with k = 5 and k = 10. Principal Component Analysis (PCA) is employed for dimensionality reduction of the dataset, and the important features are selected based on the eigenvalues. The best 10 and 25 features were selected using PCA, and the classifiers were trained using 5-fold and 10-fold cross-validation. The classifiers were also trained using all 79 features. The ML classifiers evaluated are Random Forest (RF), Decision Tree, K-Nearest Neighbors (KNN), Bayesian Network (BayesNet), and Simple Logistic. The performance metrics considered are accuracy, precision, recall, F-measure, and execution time. The RF classifier achieved the highest accuracy, 98.7% with 79 features. However, KNN was faster than RF, with an execution time of 0.06 s for 79 features, and its accuracy of 98.3% was second only to RF. In general, the results show that KNN provides the best overall performance.	en_US
dc.description.scholarlevel	Graduate	en_US
dc.identifier.uri	http://hdl.handle.net/1828/14090
dc.language.iso	en	en_US
dc.rights	Available to the World Wide Web	en_US
dc.subject	Machine Learning	en_US
dc.subject	URL	en_US
dc.subject	Malicious URL	en_US
dc.subject	Malicious URL Detection	en_US
dc.subject	WEKA Tool	en_US
dc.subject	PCA	en_US
dc.subject	Principal Component Analysis	en_US
dc.subject	Random Forest	en_US
dc.subject	Supervised Machine Learning	en_US
dc.title	Malicious URL Detection Using Machine Learning	en_US
dc.type	project	en_US
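
The evaluation procedure described in the abstract (k-fold cross-validation of a KNN classifier with k = 5 and k = 10) can be illustrated with a minimal sketch. The project itself used WEKA, so the Python code below is not the author's implementation. The helper names (`k_fold_indices`, `knn_predict`, `cross_validate`, `make_toy_data`) are hypothetical, and the two-feature synthetic data stands in for the 79-feature ISCX-URL-2016 dataset purely for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points."""
    nearest = sorted(
        range(len(train_X)),
        # Squared Euclidean distance preserves the neighbor ordering.
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(X, y, k=5, n_neighbors=3):
    """Return the mean accuracy over k folds, each held out once for testing."""
    folds = k_fold_indices(len(X), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-class data (an assumption; not the ISCX-URL-2016 dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("benign", 0.0), ("malicious", 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
for k in (5, 10):
    print(f"{k}-fold mean accuracy: {cross_validate(X, y, k=k):.3f}")
```

Each fold serves as the test set exactly once while the remaining folds form the training set, so every sample contributes to the reported mean accuracy. Precision, recall, and F-measure could be computed per fold in the same loop.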

Files

Original bundle

Name: Syed_Abdul_Aleem_MEng_2022.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format

Collections

Master's Projects