Abstract:
On average, approximately 6,000 tweets are posted on Twitter every second, which amounts to roughly 500 million tweets a day and close to 200 billion tweets a year. In 2010, the figure was around 50 million tweets a day, so the volume of data grew tenfold in just five years. This rapid growth in data creation and user activity makes Twitter a well-suited source for analysing financial trends. Sentiment analysis is the process of identifying and categorizing the opinions expressed in a text in order to determine the writer's attitude towards a particular topic. Few existing systems analyse tweets to predict sentiment, and their results may be inaccurate because tweets are short and irregular in form. Existing information retrieval techniques rely heavily on linguistic features such as part-of-speech tags or trigger words, and they perform poorly on tweets because these features alone do not capture sentiment. In this project, a segmentation algorithm is used to improve accuracy and thus provide better sentiment prediction. In the proposed model, a tweet is split into meaningful segments (each a word or a group of words), and context is preserved and extracted from the segments.
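The idea of splitting a tweet into meaningful segments can be illustrated with a minimal sketch. The abstract does not specify the segmentation algorithm, so the code below is only one plausible reading: it assumes a small, hypothetical table of phrase scores (`PHRASE_SCORES`) and uses dynamic programming to choose the split with the highest total score. The names `segment_tweet`, `segment_score`, `DEFAULT_WORD_SCORE`, and `MAX_SEGMENT_WORDS` are illustrative and not taken from the project.

```python
from typing import Dict, List

# Hypothetical phrase scores (an assumption for illustration only).
# In practice such scores might come from corpus statistics.
PHRASE_SCORES: Dict[str, float] = {
    "stock market": 3.0,
    "interest rates": 3.0,
    "wall street": 3.0,
}
DEFAULT_WORD_SCORE = 1.0  # score given to any single-word segment
MAX_SEGMENT_WORDS = 3     # longest segment considered, in words


def segment_score(words: List[str]) -> float:
    """Score a candidate segment; unknown multi-word phrases are rejected."""
    if len(words) == 1:
        return DEFAULT_WORD_SCORE
    return PHRASE_SCORES.get(" ".join(words), float("-inf"))


def segment_tweet(text: str) -> List[str]:
    """Split a tweet into segments that maximise the total segment score."""
    words = text.lower().split()  # naive whitespace tokenisation
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for words[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SEGMENT_WORDS), end):
            score = best[start] + segment_score(words[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the chosen segments.
    segments: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(" ".join(words[start:end]))
        end = start
    return segments[::-1]


print(segment_tweet("Stock market falls as interest rates rise"))
```

Keeping "stock market" and "interest rates" as single segments preserves context that would be lost if each word were scored separately. The sketch ignores punctuation, hashtags, and user mentions, which a real tweet pipeline would need to handle.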