User-Centered Spam Detection Using Linear and Non-Linear Machine Learning Models

dc.contributor.author: Singh, Manpreet
dc.date.accessioned: 2019-04-24T23:02:06Z
dc.date.available: 2019-04-24T23:02:06Z
dc.date.copyright: 2019 [en_US]
dc.date.issued: 2019-04-24
dc.degree.department: Department of Electrical and Computer Engineering [en_US]
dc.degree.level: Master of Engineering M.Eng. [en_US]
dc.description.abstract: The Enron dataset is one of the very few datasets in spam/ham detection that has helped the data science community understand the relationship between ham and spam mails for specific users and build powerful models around it. Because the Enron dataset is textual, it poses unique challenges in how information is extracted from the text and supplied to the models. The purpose of this MEng project is to replicate the results obtained by Metsis et al. [1] on spam detection using different variants of Naïve Bayes (NB) classification models and to identify areas for improvement. While Metsis et al. focused solely on linear models, we have also explored the performance of non-linear models. We have compared the existing NB models with the non-linear models and simulated the mail that a typical mailbox receives in real time, using incremental training. We have also created new data sets from the raw Enron mails and used them to test the different models. The results show that the proposed approach classifies personalized mail more accurately than a generalist approach. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/10751
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: machine learning [en_US]
dc.subject: spam [en_US]
dc.subject: spam filter [en_US]
dc.subject: xgboost [en_US]
dc.subject: user centered [en_US]
dc.title: User-Centered Spam Detection Using Linear and Non-Linear Machine Learning Models [en_US]
dc.type: project [en_US]

Files

Original bundle
Name: Singh_Manpreet_MEng_2019.pdf
Size: 3.57 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission