Detecting opinion spam and fake news using n-gram analysis and semantic similarity

dc.contributor.authorAhmed, Hadeer
dc.contributor.supervisorTraoré, Issa
dc.date.accessioned2017-11-14T15:34:59Z
dc.date.available2017-11-14T15:34:59Z
dc.date.copyright2017en_US
dc.date.issued2017-11-14
dc.degree.departmentDepartment of Electrical and Computer Engineeringen_US
dc.degree.levelMaster of Applied Science M.A.Sc.en_US
dc.description.abstractIn recent years, deceptive contents such as fake news and fake reviews, also known as opinion spams, have increasingly become a dangerous prospect, for online users. Fake reviews affect consumers and stores a like. Furthermore, the problem of fake news has gained attention in 2016, especially in the aftermath of the last US presidential election. Fake reviews and fake news are a closely related phenomenon as both consist of writing and spreading false information or beliefs. The opinion spam problem was formulated for the first time a few years ago, but it has quickly become a growing research area due to the abundance of user-generated content. It is now easy for anyone to either write fake reviews or write fake news on the web. The biggest challenge is the lack of an efficient way to tell the difference between a real review or a fake one; even humans are often unable to tell the difference. In this thesis, we have developed an n-gram model to detect automatically fake contents with a focus on fake reviews and fake news. We studied and compared two different features extraction techniques and six machine learning classification techniques. Furthermore, we investigated the impact of keystroke features on the accuracy of the n-gram model. We also applied semantic similarity metrics to detect near-duplicated content. Experimental evaluation of the proposed using existing public datasets and a newly introduced fake news dataset introduced indicate improved performances compared to state of the art.en_US
dc.description.scholarlevelGraduateen_US
dc.identifier.urihttp://hdl.handle.net/1828/8796
dc.languageEnglisheng
dc.language.isoenen_US
dc.rightsAvailable to the World Wide Weben_US
dc.subjectClassificationen_US
dc.subjectFake contenten_US
dc.subjectFake newsen_US
dc.subjectn-gramen_US
dc.subjectMachine learningen_US
dc.subjectSemantic Similarityen_US
dc.subjectKeystrokes patternen_US
dc.titleDetecting opinion spam and fake news using n-gram analysis and semantic similarityen_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Ahmed_Hadeer_Masc_2017.pdf
Size:
1.32 MB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: