Overcoming Imbalanced Class Distribution and Overfitting in Financial Fraud Detection: An Investigation Using A Modified Form of K-Fold Cross Validation Approach to Reach Representativeness

Author: Rocha Bezerra Junior, Joao Batista
Date: 2023-08-17
URI: http://hdl.handle.net/1828/15268
Language: en
Availability: Available to the World Wide Web
Type: Thesis
Keywords: Financial fraud detection; Synthetic dataset; Machine learning; Neural network; Imbalanced dataset

Abstract: According to the Internet Crime Report 2022, complaints filed from 2018 to 2022 totaled 3.26 million, with financial losses of $27.6 billion. Institutions interested in mitigating cybercrime have been developing technology to counter it, and researchers have been contributing to help them stay ahead of fraudulent systems. Machine learning and deep learning are applied in a variety of studies that use past transactions to understand and learn how to avoid fraudulent transactions in the real-world networks of financial institutions. This thesis proposes a modified version of the k-fold cross-validation technique (the Full Sets approach), applies it to the PaySim synthetic dataset, and submits the result to a neural network model. The results are compared with those of a splitting method that uses five folds, each containing 20% of the dataset, applied to the same model, and then with the machine learning algorithms Random Forest (RF), Logistic Regression (LR), and AdaBoost (AB). The metrics used to evaluate model performance are accuracy, precision, recall, F1 score, specificity, AUC-ROC, and the precision-recall curve (PRC).
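
As a rough illustration of the baseline described in the abstract, the sketch below splits sample indices into five folds of 20% each and computes the threshold-based metrics listed (accuracy, precision, recall, F1 score, specificity) from confusion-matrix counts. It is not the thesis's implementation: the function names, the shuffling seed, and the toy labels are assumptions made here. The Full Sets approach is not reproduced because the abstract does not define it, and AUC-ROC and PRC are omitted because they need predicted scores rather than hard labels.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    return [indices[i::n_folds] for i in range(n_folds)]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with label 1 = fraud, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def fraud_metrics(y_true, y_pred):
    """Compute the label-based metrics named in the abstract."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": safe_div(tn, tn + fp),
    }


# Toy data (hypothetical): 10 fraud cases among 1,000 transactions,
# scored against a model that never flags fraud.
folds = five_fold_indices(1000)
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
scores = fraud_metrics(y_true, y_pred)
print(scores)
```

On these toy labels, a model that never flags fraud reaches 0.99 accuracy but 0.0 recall, which is why accuracy alone says little on an imbalanced dataset.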