Overcoming Imbalanced Class Distribution and Overfitting in Financial Fraud Detection: An Investigation Using A Modified Form of K-Fold Cross Validation Approach to Reach Representativeness




Rocha Bezerra Junior, Joao Batista




According to the Internet Crime Report 2022, complaints filed from 2018 to 2022 totaled 3.26 million, with financial losses of $27.6 billion. Institutions interested in mitigating cybercrime have been developing technology, and researchers have been contributing to help them stay ahead of fraud schemes. Machine learning and deep learning are being applied in a variety of studies that use past transactions to learn how to prevent fraudulent transactions in the real-world networks of financial institutions. This thesis proposes a modified version of the k-fold cross-validation technique (the Full Sets approach), applies it to the PaySim synthetic dataset, and submits the resulting data to a neural network model. The results are compared with those of a splitting method that uses five folds, each holding 20% of the dataset, applied to the same model, and then with the machine learning algorithms Random Forest (RF), Logistic Regression (LR), and AdaBoost (AB). The scores used to evaluate the performance of the models are accuracy, precision, recall, F1 score, specificity, AUC-ROC, and PRC.
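For readers unfamiliar with the baseline split and the threshold-based scores named in the abstract, the following is a minimal sketch in Python. It illustrates only the plain five-fold split (20% of the samples per fold) and the confusion-matrix scores. It is not the Full Sets approach, which the abstract does not specify, and it omits AUC-ROC and PRC because they require predicted probabilities. The function names `five_fold_indices` and `confusion_metrics` are hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Sequence


def five_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split range(n_samples) into k contiguous folds of nearly equal size.

    With k = 5, each fold holds about 20% of the samples.
    (Hypothetical helper; the thesis may shuffle or stratify instead.)
    """
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, and specificity.

    Labels are assumed to be 1 for fraud (positive) and 0 for legitimate.
    """
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }
```

On a toy set with two fraud cases among ten transactions, one detected and one false alarm, `confusion_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])` gives an accuracy of 0.8 but a precision and recall of only 0.5, which shows why accuracy alone can mislead on an imbalanced dataset.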



Financial fraud detection, Synthetic dataset, Machine learning, Neural network, Imbalanced dataset