Emotion detection with data fusion
Date
2024
Authors
Khuzhaniyazova, Maida
Abstract
This report compares three machine learning models (SVM with SGDClassifier, Gradient Boosting, and XGBoost) for emotion detection using data fusion techniques. Early Fusion was chosen to integrate features because of its simplicity and reliable performance. The study uses the MELD dataset, which combines text, audio, and visual data from over 1,300 dialogues and 13,000 utterances from the “Friends” TV show. Because it captures emotions in conversational context across all three modalities, the dataset is well suited to emotion recognition tasks.
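As an illustration of what Early Fusion means in practice, the sketch below concatenates per-utterance feature matrices from the three modalities into one matrix before any classifier sees them. It is a minimal sketch, not the report's pipeline: the function name `early_fusion`, the feature dimensions, and the random stand-in features are all assumptions made for the example, and the per-modality standardization is one common choice rather than a step the report confirms.

```python
import numpy as np


def early_fusion(text_feats, audio_feats, visual_feats):
    """Fuse modalities by concatenating their features for each utterance.

    Hypothetical helper: each argument is an (n_utterances, n_features) array.
    """
    blocks = [np.asarray(b, dtype=np.float64)
              for b in (text_feats, audio_feats, visual_feats)]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all modalities must describe the same utterances")

    scaled = []
    for b in blocks:
        # Standardize each modality so that none dominates purely by scale.
        # In a real experiment the mean and std would come from training data only.
        std = b.std(axis=0)
        std[std == 0] = 1.0
        scaled.append((b - b.mean(axis=0)) / std)

    # Early fusion: one wide feature vector per utterance.
    return np.hstack(scaled)


# Random stand-ins for extracted features (dimensions are arbitrary).
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 5))
audio = rng.normal(size=(8, 3))
visual = rng.normal(size=(8, 4))

fused = early_fusion(text, audio, visual)
print(fused.shape)
```

The fused matrix can then be passed to any single classifier, which is what makes this strategy simple compared with training one model per modality and merging their outputs.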
Evaluation metrics for the models included accuracy, F1-score, precision, recall, and AUC-ROC, calculated over multiple training iterations. By comparing these models on a comprehensive multimodal dataset, this study addresses the growing demand for accurate emotion detection in conversational AI. XGBoost demonstrated high and consistent performance on the MELD dataset; however, its effectiveness may vary under different conditions or datasets. SVM with SGDClassifier showed the widest range of accuracy values and was less stable on nuanced emotions. Gradient Boosting delivered consistently strong AUC-ROC values but required full retraining with each data update, which limits its adaptability.
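The listed metrics can be computed for a multi-class emotion classifier roughly as follows. This is a sketch under stated assumptions: the helper name `evaluate`, the macro averaging, the one-vs-rest AUC-ROC setting, and the toy labels and probabilities are illustrative choices, not details taken from the report.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_proba):
    """Return the five metrics named in the abstract (hypothetical helper).

    y_proba holds one probability column per class; rows must sum to 1.
    """
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # One-vs-rest AUC-ROC, macro-averaged over classes (an assumed setting).
        "auc_roc": roc_auc_score(
            y_true, y_proba, multi_class="ovr", average="macro"
        ),
    }


# Toy three-class example with hand-written probabilities.
y_true = np.array([0, 1, 2, 0, 1, 2])
y_proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],  # true class 0, predicted class 1
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])
y_pred = y_proba.argmax(axis=1)

metrics = evaluate(y_true, y_pred, y_proba)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Calling `evaluate` once per training iteration and collecting the results gives the per-metric spread that the comparison of stability across models relies on.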
Overall, XGBoost and SVM performed well, although their accuracy fluctuated across iterations. Gradient Boosting consistently showed strong AUC-ROC values, but it must be completely retrained whenever new data is added, which reduces efficiency.
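The retraining cost noted for Gradient Boosting contrasts with models that can be updated batch by batch. The sketch below shows that style of update with scikit-learn's `SGDClassifier`, whose hinge loss yields a linear SVM and whose `partial_fit` method accepts new data without starting over. Whether the report used `partial_fit` in this way is an assumption; the synthetic data, batch sizes, and the `make_batch` helper are hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_classes, n_features = 3, 6
# Synthetic class centers standing in for fused emotion features.
centers = rng.normal(scale=5.0, size=(n_classes, n_features))


def make_batch(n):
    """Draw n labeled samples around the class centers (hypothetical data)."""
    y = rng.integers(0, n_classes, size=n)
    X = centers[y] + rng.normal(size=(n, n_features))
    return X, y


# loss="hinge" makes SGDClassifier a linear SVM trained with SGD.
clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.arange(n_classes)

# Each new batch updates the existing weights instead of refitting from scratch.
for _ in range(5):
    X_batch, y_batch = make_batch(200)
    clf.partial_fit(X_batch, y_batch, classes=classes)

X_test, y_test = make_batch(200)
test_accuracy = clf.score(X_test, y_test)
print(f"accuracy after 5 incremental updates: {test_accuracy:.3f}")
```

scikit-learn's `GradientBoostingClassifier` offers no `partial_fit` method, so incorporating new samples there means calling `fit` again on the combined data.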
Keywords
emotion detection, data fusion, multimodal emotion recognition, machine learning, neural networks, emotion recognition systems, multimodal fusion, XGBoost, Support Vector Machine (SVM), Gradient Boosting, emotion classification, incremental learning, performance metrics