Spam Detection using N-gram Analysis and Machine Learning Techniques

Date

2019-12-17

Authors

Kaur, Simran

Abstract

Many types of fraudulent activity today exploit security loopholes, and email is an inexpensive and widely used channel for delivering false messages to potential victims. Spam is unsolicited email that is not only annoying to users but can also serve as a conduit for fraudulent or deceptive content. In this project, a spam detector that classifies an email as either spam or ham (legitimate email) is built using n-gram analysis and supervised machine learning models. Three algorithms are implemented and compared: naïve Bayes, logistic regression, and support vector machines (SVMs). Experimental evaluation of the detector on a public dataset shows that the SVM and logistic regression models attain the highest accuracy.
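To illustrate how n-gram features feed a supervised classifier, here is a minimal, standard-library-only sketch of one of the three models named above: multinomial naïve Bayes with Laplace (add-one) smoothing over word n-gram counts. This is not the author's implementation. The tokenization (lowercase alphanumeric words), the n-gram order (unigrams plus bigrams), the helper names `ngrams` and `NaiveBayesSpamFilter`, and the four toy training messages are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams of order 1..n_max (assumed tokenization: lowercase alphanumerics)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for the prior
        self.gram_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(ngrams(text, self.n_max))
        self.vocab = set()
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.gram_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        grams = ngrams(text, self.n_max)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior P(class)
            denom = self.totals[c] + self.alpha * v
            for g in grams:
                if g in self.vocab:  # skip n-grams never seen in training
                    score += math.log((self.gram_counts[c][g] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data for illustration only; not the public dataset used in the thesis.
train_texts = [
    "win a free prize now",
    "claim your free cash prize",
    "meeting moved to friday",
    "lunch with the team on friday",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(clf.predict("free prize inside"))       # spam
print(clf.predict("team meeting on friday"))  # ham
```

Logistic regression and a linear SVM would consume the same n-gram count vectors and differ only in how they learn a decision boundary from them; in scikit-learn, for example, `CountVectorizer(ngram_range=(1, 2))` combined with `MultinomialNB`, `LogisticRegression`, or `LinearSVC` gives a comparable three-way comparison.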

Keywords

N-grams, Spam detection