StretchVADER – A Rule-based Technique to Improve Sentiment Intensity Detection using Stretched Words and Fine-Grained Sentiment Analysis

Date

2024-01-22

Authors

Jokhio, Muhammad Naveed

Abstract

Shouting “HEEEELLLPPPPPPPPP” while watching a horror movie, or replying to a joke with a huge “HAHAHAHAHAHAHAHAHAHAHA”, is an example of word stretching. Word stretching is not only an integral part of spoken language but is also found in many texts. Though very rare in formal writing, it is frequently used on social media. Word stretching emphasizes the meaning of the underlying word, changes the context and affects the sentiment intensity of the sentence. This work introduces StretchVADER, a rule-based, fine-grained approach to sentiment analysis that extends the capabilities of the rule-based approach called VADER. StretchVADER detects sentiment intensity using textual features such as stretched words and smileys by calculating a StretchVADER Score (SVS). This score is also used to label the dataset. Many tweets contain stretched words and smileys, for example 28.5% of those in a dataset randomly extracted from Twitter. A dataset containing detailed features related to stretched words and smileys is also generated and annotated using SVS. Finally, Machine Learning (ML) models are evaluated using two different data encoding techniques, namely TF-IDF and Word2Vec. The results show that the XGBoost algorithm with 1500 gradient-boosted trees and TF-IDF data encoding achieved higher accuracy, precision, recall and F1-score than the other ML models: 91.24%, 91.11%, 91.24% and 91.08%, respectively.
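To make the notion of a stretched word concrete, the following is a minimal Python sketch of how such tokens might be detected. It is an illustration only and is not the StretchVADER method or the SVS calculation, neither of which is specified in this abstract. The function names `is_stretched` and `stretched_words`, and the threshold of three repetitions, are assumptions chosen for the example.

```python
import re

# Assumption: a letter repeated three or more times in a row signals stretching,
# e.g. "EEEE" in "HEEEELLLPPP". Digits and underscores are excluded.
_CHAR_RUN = re.compile(r"([^\W\d_])\1{2,}", re.IGNORECASE)

# Assumption: a two-letter chunk repeated three or more times also counts,
# e.g. "HAHAHA".
_CHUNK_RUN = re.compile(r"([^\W\d_]{2})\1{2,}", re.IGNORECASE)


def is_stretched(token: str) -> bool:
    """Return True if the token contains a repeated letter or two-letter chunk."""
    return bool(_CHAR_RUN.search(token) or _CHUNK_RUN.search(token))


def stretched_words(text: str) -> list[str]:
    """Return the word tokens in `text` that look stretched, in order."""
    return [tok for tok in re.findall(r"\w+", text) if is_stretched(tok)]
```

For instance, `stretched_words("HEEEELLLPPPPPPPPP that joke was good HAHAHAHAHA")` would pick out the first and last tokens, while ordinary words with doubled letters such as "good" are left alone.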
