Abstract:
Distributed Denial-of-Service (DDoS) is the number one cyber threat to the availability of business networks, applications, and services [1]. A DDoS attack is a malicious attempt to disrupt normal traffic to a target server, service, or network by overwhelming the target with illegitimate traffic. The consequences can be devastating, including financial losses, loss of productivity, brand damage, credit and insurance rating downgrades, compromised customer and supplier relationships, and IT budget overruns [2]. DDoS attacks continue to rise in complexity, volume, and frequency, threatening the network security of all enterprises, regardless of their size [1]. The number of DDoS attacks is predicted to almost double between 2017 and 2022, reaching 14.5 million [3]. In 2017, the top motivations behind these attacks were criminals demonstrating attack capabilities, gaming, and extortion [1].
There is a critical need to devise Network Intrusion Detection Systems (NIDSs) that accurately predict DDoS attacks. In this work, supervised Machine Learning (ML) techniques are evaluated using the CICDDoS2019 dataset, which consists of 80 network traffic features covering benign (legitimate) traffic and 12 DDoS attack types [4]. Undersampling and oversampling techniques were applied to this dataset to create six datasets, each using the 24 best features [4], for predicting DDoS attacks and benign traffic. The ML algorithms evaluated are Bayesian Network (BayesNet), Bootstrap Aggregating (Bagging), k-Nearest Neighbors (kNN), Sequential Minimal Optimization (SMO), and Simple Logistic. The Waikato Environment for Knowledge Analysis (WEKA) tool is used to implement the ML algorithms with k-fold (k = 5) cross-validation. The evaluation metrics, namely precision, recall, F-measure, True Positive Rate (TPR), False Positive Rate (FPR), and execution time, are determined for each of the six datasets. The results show that Bagging provides the best overall performance, followed by kNN, BayesNet, SMO, and Simple Logistic. Further, the execution time grows approximately linearly with the dataset size.
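As an illustration of how the classification metrics named above relate to one another, the following minimal Python sketch derives precision, recall (which equals TPR), F-measure, and FPR from true and predicted labels for a single class treated as positive. It is not the WEKA implementation used in this work; the function name `binary_metrics`, the `BinaryMetrics` container, and the labels "attack" and "benign" are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    precision: float
    recall: float      # identical to the True Positive Rate (TPR)
    f_measure: float   # harmonic mean of precision and recall
    fpr: float         # False Positive Rate


def binary_metrics(y_true, y_pred, positive="attack"):
    """Compute metrics for one class treated as positive (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # attack correctly flagged
            else:
                fp += 1   # benign traffic flagged as attack
        else:
            if truth == positive:
                fn += 1   # attack missed
            else:
                tn += 1   # benign traffic correctly passed

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return BinaryMetrics(precision, recall, f_measure, fpr)


if __name__ == "__main__":
    # Illustrative labels only, not data from CICDDoS2019.
    y_true = ["attack", "attack", "attack", "benign",
              "benign", "benign", "benign", "attack"]
    y_pred = ["attack", "attack", "benign", "benign",
              "attack", "benign", "benign", "attack"]
    print(binary_metrics(y_true, y_pred))
```

In a multi-class setting such as benign traffic plus several attack types, the same computation would be repeated with each class in turn as the positive label, and the per-class values could then be averaged.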