Mining Phishing Campaigns using FP Trees




Karanth, Anirudh




Phishing is a fraudulent online activity in which attackers obtain sensitive information from a user, such as credit card numbers, social security numbers, or passwords, by disguising themselves as a legitimate entity in emails, text messages, or phone calls. It has been reported that in 2019 nearly 4% of all emails were phishing emails, which corresponds to about 3.4 billion emails. Analyzing those phishing emails is an important step towards understanding the motivation and methods of phishers. However, manually analyzing such an enormous volume of data is infeasible, and it is also ineffective because phishers keep finding novel methods to evade detection. One way to keep up with the volume of data and the growing sophistication of evasion tactics is to focus the analysis on phishing campaigns. A phishing campaign is a collection of phishing emails built from the same template. This report adapts and extends previous work on mining spam campaigns to mine phishing campaigns. The campaigns are mined using a Frequent Pattern Tree (FP Tree) and are identified by investigating the contribution of different email features. Experiments are conducted on a dataset of over 17,342 phishing messages, yielding 231 different campaigns in the best case. For a given set of parameters, the campaigns found are very stable, with an error percentage of around 1.5%.
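
To make the FP Tree idea concrete, the following is a minimal sketch in Python and not the report's actual implementation. It assumes each email has already been reduced to a set of feature strings. The feature names (subject, lang, url), the sample emails, the helper names build_fp_tree and frequent_paths, and the min_support threshold are all hypothetical. Items are inserted most frequent first, so emails that share a template share a path prefix, and each deep path with enough support is treated here as a candidate campaign.

```python
from collections import Counter


class FPNode:
    """One node of the FP tree: an item plus how many emails pass through it."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support=2):
    # Count in how many transactions each item occurs.
    freq = Counter(item for t in transactions for item in set(t))
    root = FPNode()
    for t in transactions:
        # Keep only frequent items, most frequent first (ties broken by name).
        items = sorted(
            (i for i in set(t) if freq[i] >= min_support),
            key=lambda i: (-freq[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def frequent_paths(node, min_support, prefix=()):
    """Yield (path, count) for the deepest paths whose count is >= min_support."""
    for child in node.children.values():
        if child.count < min_support:
            continue
        path = prefix + (child.item,)
        deeper = list(frequent_paths(child, min_support, path))
        if deeper:
            yield from deeper
        else:
            yield path, child.count


# Hypothetical feature sets extracted from five emails (assumed data).
emails = [
    {"subject=verify account", "lang=en", "url=bank-login.example"},
    {"subject=verify account", "lang=en", "url=bank-login.example"},
    {"subject=verify account", "lang=en", "url=other.example"},
    {"subject=you won", "lang=en", "url=prize.example"},
    {"subject=you won", "lang=en", "url=prize.example"},
]

root = build_fp_tree(emails, min_support=2)
for path, count in frequent_paths(root, min_support=2):
    print(count, " -> ".join(path))
```

In this sketch the choice of features and the support threshold determine how many candidate campaigns appear, which is one simple way to picture how different email features contribute to the result.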



FP Trees, Data Mining, Phishing, Unsupervised Learning, Python