Mining Phishing Campaigns using FP Trees

Karanth, Anirudh

Files

Karanth_Anirudh_MEng_2020.pdf (761.56 KB)

Date

2020-08-04

Authors

Karanth, Anirudh

Abstract

Phishing is a fraudulent online activity in which hackers disguise themselves as a legitimate entity via emails, text messages, or phone calls in order to obtain a user's sensitive information, such as credit card numbers, social security numbers, or passwords. It has been reported that in 2019 nearly 4% of all emails were phishing emails, which corresponds to about 3.4 billion emails. Analyzing those phishing emails is an important step towards understanding the motivation and methods of phishers. However, manually analyzing such an enormous amount of data is infeasible and ineffective, considering that phishers are always finding novel methods to evade detection. One way to keep up with the huge amount of data and the growing sophistication of evasion tactics is to focus the analysis on phishing campaigns. A phishing campaign is a collection of phishing emails built from the same template. This report adapts and extends previous work on spam campaigns to mine phishing campaigns. The phishing campaigns are mined using a Frequent Pattern Tree (FP Tree), and they are identified by investigating the contribution of different email features. Experiments are conducted using a dataset consisting of over 17,342 phishing messages, yielding 231 different campaigns in the best case. For a given set of parameters, the campaigns found are very stable, with an error percentage of around 1.5%.
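To make the idea of grouping template-sharing emails with an FP Tree more concrete, here is a minimal sketch in Python. It is not the report's implementation: the feature strings, the `min_support` threshold, the helper names `FPNode`, `build_fp_tree`, and `paths`, and the four toy emails are all assumptions chosen for illustration. Each email is treated as a set of feature items, and emails that share frequent features end up on a common path in the tree.

```python
from collections import Counter


class FPNode:
    """One node of the FP Tree: an item, a count, and child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP Tree from an iterable of item sets (hypothetical helper)."""
    # First pass: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction as a path of frequent items.
    for t in transactions:
        # Order by descending support; ties are broken alphabetically.
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-support[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def paths(node, prefix=()):
    """Yield (path, count) for every node below `node`."""
    for item, child in sorted(node.children.items()):
        path = prefix + (item,)
        yield path, child.count
        yield from paths(child, path)


# Toy data (assumed, not from the report's dataset).
emails = [
    {"subject=verify your account", "lang=en", "url_host=example.test"},
    {"subject=verify your account", "lang=en", "url_host=example.test"},
    {"subject=verify your account", "lang=en", "url_host=other.test"},
    {"subject=invoice attached", "lang=en"},
]

tree = build_fp_tree(emails, min_support=2)
fp_paths = dict(paths(tree))
for path, count in fp_paths.items():
    print(count, " > ".join(path))
```

In this toy run, the two emails with identical features share the deepest path, which is the kind of grouping a campaign-mining step could start from. How paths are actually cut into campaigns, and which email features are used, is specific to the report and not reproduced here.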

Keywords

FP Trees, Data Mining, Phishing, Unsupervised Learning, Python

URI

http://hdl.handle.net/1828/11971

Collections

Master's Projects
