Dark Web Traffic Detection Using Supervised Machine Learning




Nezhad, Sahra Zangeneh




The purpose of this study is to examine the feasibility of using machine learning algorithms to distinguish and categorize VPN and TOR traffic on the dark web. The dark web, often described as the hidden or shadow part of the internet, is characterized by anonymity and by content that search engines do not index, which makes it a common platform for illegal activities such as drug trafficking, money laundering, and cybercrime. Virtual Private Networks (VPNs) and The Onion Router (TOR) are both commonly used to anonymize web traffic and to access the dark web. Although these technologies serve legitimate purposes, such as protecting privacy and bypassing internet censorship, they can also be exploited by cybercriminals. To achieve our objective, we use the CIC-Darknet2020 dataset, a comprehensive and diverse collection of dark web network traffic captures that includes traffic features from both TOR and VPN technologies. Our model is built with supervised machine learning methods, specifically the Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (J48) classifiers. The experiments are performed in the open-source software WEKA using five-fold and ten-fold cross-validation as well as 66/34 and 80/20 percentage splits. Model performance is evaluated in terms of execution time, accuracy, precision, recall, and F-measure. The results indicate that the Decision Tree (J48) classifier surpasses the other classifiers in accuracy, achieving 99.6% with an execution time of 15 seconds under ten-fold cross-validation.
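The evaluation protocol described above (k-fold cross-validation, percentage splits, and the accuracy, precision, recall, and F-measure metrics) can be illustrated with a minimal Python sketch. This is not the study's WEKA workflow: the function names and the "TOR"/"VPN" labels are hypothetical, the metrics are computed for a single positive class, and the sketch omits the shuffling and stratification that a real evaluation would normally apply.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def percentage_split(n_samples: int, train_percent: float) -> Tuple[List[int], List[int]]:
    """Split indices into a leading training part and a trailing test part."""
    cut = round(n_samples * train_percent / 100)
    return list(range(cut)), list(range(cut, n_samples))


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    # Hypothetical labels, used only to exercise the functions.
    truth = ["TOR", "TOR", "VPN", "VPN"]
    predicted = ["TOR", "VPN", "VPN", "VPN"]
    print(binary_metrics(truth, predicted, positive="TOR"))
    print(len(k_fold_indices(100, 10)), "folds")
    print([len(part) for part in percentage_split(100, 66)])
```

In a cross-validation run, a classifier would be trained on each fold's training indices and scored on its test indices, with the per-fold metrics then averaged; a percentage split performs a single such train and test pass.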



VPN, TOR, Machine Learning