User-Centered Spam Detection Using Linear and Non-Linear Machine Learning Models
dc.contributor.author | Singh, Manpreet | |
dc.date.accessioned | 2019-04-24T23:02:06Z | |
dc.date.available | 2019-04-24T23:02:06Z | |
dc.date.copyright | 2019 | en_US |
dc.date.issued | 2019-04-24 | |
dc.degree.department | Department of Electrical and Computer Engineering | en_US |
dc.degree.level | Master of Engineering M.Eng. | en_US |
dc.description.abstract | The Enron dataset is one of the very few spam/ham detection datasets that has helped the data science community understand the relationship between ham and spam mails for specific users and build powerful models around it. Because the Enron dataset is textual, it poses unique challenges in how information is extracted from the text and supplied to the models. The purpose of this MEng project is to replicate the results obtained by Metsis et al. [1] on spam detection using different variants of Naïve Bayes (NB) classification models and to identify areas for improvement. While Metsis et al. focused solely on linear models, we have explored the performance of non-linear models as well. We have compared the existing NB models with the non-linear models and used incremental training to simulate the mails that a typical mailbox receives in real time. We have also created new data sets from the raw data of the Enron mails and used these data sets to test the different models. The results show that the proposed approach classifies mail more accurately when it is personalized to a specific user than when it is applied in a generalist manner. | en_US |
dc.description.scholarlevel | Graduate | en_US |
dc.identifier.uri | http://hdl.handle.net/1828/10751 | |
dc.language.iso | en | en_US |
dc.rights | Available to the World Wide Web | en_US |
dc.subject | machine learning | en_US |
dc.subject | spam | en_US |
dc.subject | spam filter | en_US |
dc.subject | xgboost | en_US |
dc.subject | user centered | en_US |
dc.title | User-Centered Spam Detection Using Linear and Non-Linear Machine Learning Models | en_US |
dc.type | project | en_US |
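
The abstract describes simulating a mailbox that receives mail in real time with incremental training. A minimal sketch of that idea follows, assuming a multinomial Naïve Bayes model with Laplace smoothing and whitespace tokenization. The class name `IncrementalNB`, the function `simulate_mailbox`, and the toy messages are hypothetical illustrations, not the project's actual code, features, or data.

```python
import math
from collections import Counter

LABELS = ("ham", "spam")


class IncrementalNB:
    """Multinomial Naive Bayes that can be updated one message at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # messages seen per class
        self.word_counts = {label: Counter() for label in LABELS}
        self.total_words = Counter()             # tokens seen per class
        self.vocab = set()

    def update(self, text, label):
        """Fold one labeled message into the running counts."""
        tokens = text.lower().split()
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.total_words[label] += len(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the label with the highest smoothed log-posterior."""
        tokens = text.lower().split()
        n_docs = sum(self.class_docs.values())
        vocab_size = max(len(self.vocab), 1)
        best_label, best_score = None, -math.inf
        for label in LABELS:
            # Smoothed log prior, defined even before any mail has arrived.
            score = math.log((self.class_docs[label] + self.alpha)
                             / (n_docs + len(LABELS) * self.alpha))
            denom = self.total_words[label] + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:               # ties keep the earlier label
                best_label, best_score = label, score
        return best_label


def simulate_mailbox(stream, model):
    """Classify each incoming message first, then train on its true label."""
    predictions = []
    for text, label in stream:
        predictions.append(model.predict(text))
        model.update(text, label)
    return predictions


# Toy stream in arrival order (invented messages, not Enron data).
stream = [
    ("meeting schedule attached", "ham"),
    ("win free money now", "spam"),
    ("free money offer", "spam"),
    ("schedule for the meeting", "ham"),
]
model = IncrementalNB()
predictions = simulate_mailbox(stream, model)
```

Predicting before updating means each message is scored by a model that has only seen earlier mail, which is one way to approximate how a per-user filter would behave as a mailbox fills up.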