Malicious URLs and Attachments Detection on Lexical-based Features using Machine Learning Techniques




Zeng, Yuanxi




Email is one of the primary vectors for cyber-attacks against Internet users. Attackers often use social engineering to persuade recipients to click a link that leads to a malicious website or to open a malicious attachment. As a result, recipients may have confidential information stolen and suffer financial loss. In this report, we use two lexical models to detect malicious emails. The models extract only lexical features, taken from the URLs embedded in the email content and from attachment filenames. Because feature extraction requires no external services or tools, it meets the need for real-time detection. The lexical features are used as input to machine learning classifiers, and five classifiers are evaluated on our datasets of URLs and attachments. The experimental results show that Gradient Boosting Decision Tree outperforms all the other classifiers, achieving encouraging accuracies of 90.71% for the URL model and 94.36% for the attachment model. We conclude that the lexical models are helpful for malicious email detection. However, the attachment model needs a much more diverse dataset and additional features.



Machine Learning, URL, Attachment, Email filtering
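The abstract describes features computed purely from the URL or filename string, with no external lookups. The report's exact feature set is not listed here, so the following Python sketch uses a hypothetical selection of commonly used string-level features (lengths, character counts, an IP-address host, double extensions, and so on). The function names `url_features` and `filename_features` and the `RISKY_EXTENSIONS` list are illustrative assumptions, not the author's implementation. Only the standard library (`urllib.parse`, `ipaddress`) is used, which mirrors the no-external-service property claimed above.

```python
"""Hypothetical lexical feature extractors for URLs and attachment filenames.

Illustrative sketch only: the feature set is an assumption, not the report's.
Uses only the standard library, so no DNS, WHOIS, or HTTP lookups are made.
"""
import ipaddress
from urllib.parse import urlsplit

# Assumed list of extensions treated as risky; chosen for illustration only.
RISKY_EXTENSIONS = {"exe", "scr", "js", "vbs", "bat", "cmd", "jar", "hta", "docm", "xlsm"}


def url_features(url: str) -> dict:
    """Return integer features computed from the URL string alone."""
    url = url.strip()
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased host, without userinfo or port
    try:
        ipaddress.ip_address(host)  # raises ValueError if host is not an IP
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "host_is_ip": host_is_ip,
        # Labels beyond a two-label domain; naive, ignores suffixes like co.uk.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        "uses_https": int(parts.scheme.lower() == "https"),
    }


def filename_features(filename: str) -> dict:
    """Return integer features computed from an attachment filename alone."""
    name = filename.strip()
    pieces = name.lower().split(".")
    ext = pieces[-1] if len(pieces) > 1 else ""  # final suffix, "" if none
    return {
        "name_length": len(name),
        "num_dots": name.count("."),
        "num_digits": sum(ch.isdigit() for ch in name),
        "risky_extension": int(ext in RISKY_EXTENSIONS),
        # Two or more suffixes ending in a risky one, e.g. invoice.pdf.exe.
        "double_extension": int(len(pieces) > 2 and ext in RISKY_EXTENSIONS),
        # U+202E RIGHT-TO-LEFT OVERRIDE can be used to disguise the real extension.
        "has_rtlo": int("\u202e" in name),
    }


if __name__ == "__main__":
    print(url_features("http://user@192.168.0.1/login.php?id=7&next=home"))
    print(filename_features("invoice.pdf.exe"))
```

Each function returns a dictionary of integer features. A list of such dictionaries could, for example, be vectorized with scikit-learn's `DictVectorizer` and passed to a gradient boosting classifier such as `GradientBoostingClassifier`; whether the report used these particular tools is not stated in this section.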