A machine learning approach to network security anomaly detection

Date

2025

Authors

Verma, Prateek

Abstract

Supervised machine learning has emerged as a highly effective technique for classification in anomaly-based cyber-threat detection systems due to its predictability and high accuracy. This work uses the CICIDS2017 dataset, which is widely recognized as a benchmark for anomaly detection research, to implement a two-layered ML-based detection model. The first layer performs binary classification to differentiate benign from malicious traffic, while the second layer performs multi-class classification to identify specific attack types so that targeted countermeasures can be applied. Incremental Principal Component Analysis (PCA) is applied for dimensionality reduction, and the Synthetic Minority Oversampling Technique (SMOTE) is applied to balance the dataset, which is critical for both the binary and multi-class classification tasks. Among all evaluated machine learning models, LightGBM achieved the best performance, with 99% accuracy, a 98.1% F1-score, and minimal resource usage, outperforming traditional methods such as SVM, KNN, Random Forest, and Decision Trees. Further feature reduction, guided by feature importance scores, produced an even more lightweight model, while performance metrics such as accuracy, recall, and F1-score remained consistent or improved slightly, within a margin of ±0.5%, highlighting the stability and efficiency of the proposed approach. The proposed system demonstrates that advanced, resource-efficient supervised ML models such as LightGBM can significantly improve real-time threat detection while offering a scalable and cost-effective solution for future cybersecurity deployments.
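
The abstract does not state how SMOTE was configured, so the following is only a minimal, standard-library sketch of the general idea behind SMOTE: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, its parameters, and the toy data are hypothetical and are not taken from the work itself; a real pipeline would more likely use an existing implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature vectors from the minority class.
    k:        number of nearest minority neighbours considered per base point.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical values, not CICIDS2017 features).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region already occupied by that class, which is why SMOTE is usually applied to the training split only.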

Keywords

autoencoder, anomaly detection, Principal Component Analysis, Light Gradient Boosting
