Spam Detection using N-gram Analysis and Machine Learning Techniques
Date
2019-12-17
Authors
Kaur, Simran
Abstract
Many kinds of fraudulent activity exploit security loopholes, and email remains a cheap and widely used channel for delivering false messages to potential victims. Spam is not only annoying for users; it can also serve as a conduit for fraudulent or deceptive content. In this project, a spam detector that classifies an email as either spam or ham (legitimate mail) is built using n-gram analysis and supervised machine learning models. Three algorithms are implemented and compared: naïve Bayes, logistic regression, and support vector machines (SVM). Experimental evaluation of the detector on a public dataset shows that SVM and logistic regression attain the highest accuracy.
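As a rough illustration of how n-gram features can feed a supervised classifier, the sketch below extracts word bigrams and trains a small multinomial naïve Bayes model with add-one smoothing, using only the Python standard library. It is not the project's implementation: the abstract does not say whether word or character n-grams were used, so the word-level bigrams, the whitespace tokenization, the names `word_ngrams` and `NaiveBayesNgram`, and the four toy messages are all assumptions made for the example.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayesNgram:
    """Hypothetical multinomial naive Bayes over n-gram counts (add-one smoothing)."""

    def __init__(self, n=2):
        self.n = n

    def fit(self, texts, labels):
        self.counts = {}                    # label -> Counter of n-grams seen in that class
        self.doc_counts = Counter(labels)   # label -> number of training messages
        self.vocab = set()                  # all n-grams seen in training
        for text, label in zip(texts, labels):
            grams = word_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = word_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + len(self.vocab)
            for gram in grams:
                score += math.log((gram_counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not the public dataset used in the project).
train_texts = [
    "win free money now",
    "claim your free prize now",
    "meeting at noon tomorrow",
    "see you at lunch tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesNgram(n=2).fit(train_texts, train_labels)
print(model.predict("free money now"))     # spam
print(model.predict("at noon tomorrow"))   # ham
```

Logistic regression and SVM would consume the same n-gram counts as a feature vector; only the model fitted on top of those features changes.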
Keywords
N-grams, Spam detection