Emotion detection with data fusion

Date

2024

Authors

Khuzhaniyazova, Maida

Abstract

This report compares three machine learning models (SVM with SGDClassifier, Gradient Boosting, and XGBoost) for detecting emotions using data fusion techniques. Early Fusion was chosen for integrating features because of its simplicity and reliable performance. The study uses the MELD dataset, which combines text, audio, and visual data from over 1,300 dialogues and 13,000 utterances in the “Friends” TV show. Its multimodal coverage of emotions in conversational contexts makes it well suited to emotion recognition tasks. The models were evaluated on accuracy, F1-score, precision, recall, and AUC-ROC, calculated over multiple training iterations. By comparing these models on a comprehensive multimodal dataset, the study addresses the growing demand for accurate emotion detection in conversational AI. XGBoost demonstrated high and consistent performance on the MELD dataset; however, its effectiveness may vary under different conditions or datasets. SVM with SGDClassifier showed the widest accuracy range and was less stable on nuanced emotions. Overall, XGBoost and SVM performed well, although their accuracy fluctuated across iterations. Gradient Boosting delivered consistently strong AUC-ROC values, but it must be completely retrained whenever new data is added, which reduces its adaptability and efficiency.
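
As an illustration of the approach described in the abstract, the following is a minimal sketch of early fusion followed by incremental training, not the report's actual pipeline. It assumes scikit-learn's SGDClassifier with hinge loss as the SVM variant, and it uses random arrays in place of MELD features. The feature dimensions, the seven placeholder class ids, and the helper name early_fusion are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def early_fusion(*modalities):
    """Concatenate per-utterance feature matrices along the feature axis."""
    row_counts = {m.shape[0] for m in modalities}
    if len(row_counts) != 1:
        raise ValueError("all modalities must describe the same utterances")
    return np.hstack(modalities)


rng = np.random.default_rng(0)
n_utterances = 200

# Synthetic stand-ins for text, audio, and visual features.
# The dimensions (16, 8, 4) are made up for this sketch.
text = rng.normal(size=(n_utterances, 16))
audio = rng.normal(size=(n_utterances, 8))
visual = rng.normal(size=(n_utterances, 4))

# Placeholder emotion labels: 7 class ids chosen for illustration only.
classes = np.arange(7)
labels = rng.integers(0, 7, size=n_utterances)

# Early fusion: one feature vector per utterance, shape (200, 28).
fused = early_fusion(text, audio, visual)

# Hinge loss makes SGDClassifier behave as a linear SVM (an assumption
# about the report's configuration, which the abstract does not specify).
clf = SGDClassifier(loss="hinge", random_state=0)

# The first batch must declare every class the model will ever see.
clf.partial_fit(fused[:100], labels[:100], classes=classes)

# A later batch updates the same model without retraining from scratch.
clf.partial_fit(fused[100:], labels[100:])

predictions = clf.predict(fused)
```

The second partial_fit call is the point of contrast with Gradient Boosting: scikit-learn's GradientBoostingClassifier has no partial_fit method, so absorbing new data generally means fitting the model again on the full set.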

Keywords

emotion detection, data fusion, multimodal emotion recognition, machine learning, neural networks, emotion recognition systems, multimodal fusion, XGBoost, Support Vector Machine (SVM), Gradient Boosting, emotion classification, incremental learning, performance metrics
