Malicious URLs and Attachments Detection on Lexical-based Features using Machine Learning Techniques

dc.contributor.author: Zeng, Yuanxi
dc.contributor.supervisor: Traore, Issa
dc.date.accessioned: 2018-11-02T00:46:12Z
dc.date.copyright: 2018
dc.date.issued: 2018-11-01
dc.degree.department: Department of Electrical and Computer Engineering
dc.degree.level: Master of Engineering M.Eng.
dc.description.abstract: Email is one of the prime sources of cyber-attacks against Internet users. Attackers often use social engineering to encourage recipients to click on a link that leads to a malicious website or opens a malicious attachment. Recipients may have confidential information stolen and suffer financial loss. In this report, we use two lexical models to detect malicious emails. The models extract solely lexical features from URLs embedded in email content and from attachment filenames. Feature extraction does not require external services or tools, so it meets the need for real-time detection. The lexical features are used in conjunction with machine learning techniques. Five classifiers are tested on our dataset of URLs and attachments. The experimental results show that Gradient Boosting Decision Tree outperforms all the other classifiers, achieving an encouraging accuracy of 90.71% for the URL model and 94.36% for the attachment model. We conclude that the lexical models are helpful in malicious email detection. However, the attachment model needs a much more diverse attachment dataset and additional features.
dc.description.scholarlevel: Graduate
dc.identifier.uri: http://hdl.handle.net/1828/10218
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Machine Learning
dc.subject: URL
dc.subject: Attachment
dc.subject: Email filtering
dc.title: Malicious URLs and Attachments Detection on Lexical-based Features using Machine Learning Techniques
dc.type: Project
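
The abstract describes extracting purely lexical features from URLs and attachment filenames without calling external services. A minimal sketch of what such an extraction might look like is given below. The feature names and the functions `url_lexical_features` and `filename_lexical_features` are hypothetical illustrations, not the feature set used in the report, which this record does not list.

```python
import re
from urllib.parse import urlsplit


def url_lexical_features(url: str) -> dict:
    """Compute a few string-level features of a URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Rough check for a dotted-quad host; does not validate octet ranges.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


def filename_lexical_features(name: str) -> dict:
    """Compute a few string-level features of an attachment filename."""
    base, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        base, ext = name, ""
    return {
        "name_length": len(name),
        "extension": ext.lower(),
        "num_dots": name.count("."),
        "num_digits": sum(ch.isdigit() for ch in name),
        "has_space": " " in name,
    }


if __name__ == "__main__":
    print(url_lexical_features("http://192.168.0.1/login-update/verify.php?id=123"))
    print(filename_lexical_features("Invoice_2018.PDF.exe"))
```

Because every feature is computed from the string alone, extraction needs no network lookups, which is the property the abstract ties to real-time detection. The resulting dictionaries could then be converted to numeric vectors and passed to a classifier such as a gradient boosting model.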

Files

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission