ABSTRACT

Text mining techniques such as sentiment analysis and opinion mining have become especially relevant with the development of Web 2.0, since they serve as instruments for monitoring the opinions of users, who constantly express their thoughts via social networks, blogs, etc. Twitter is the networking service most often used for sentiment analysis and opinion mining in different domains, ranging from online commerce to politics. Researchers seem to be particularly keen on the latter, specifically on examining and predicting something from Twitter, be it voting intentions, the winner of a televised debate, or the sentiment towards a political figure. However, the existing research in this domain appears to be one-sided, as papers overwhelmingly support the idea of making predictions from Twitter instead of critically weighing the pros and cons. This survey takes a closer look at how different researchers perform sentiment analysis and opinion mining for politics using Twitter. We summarize where the currently existing approaches fall short and where more research needs to be done.

KEYWORDS

Sentiment analysis, opinion mining, Twitter, politics

1 INTRODUCTION

Subjectivity is an integral part of human nature, which is why people find it essential to be able to express their opinions. Having access to these opinions is beneficial in many ways, so it is no surprise that the last decade has seen intense research activity in the field of sentiment analysis. Insights and applications from sentiment analysis of text have been used in various scientific areas, including psychology [1], sociology [2], law and policy making, and politics/political science. The latter, with its numerous possibilities for sentiment analysis, such as the analysis of attitudes towards policies, parties, government agencies and politicians, the analysis of trends, and the evaluation of public (voters') opinion, is the focus of the present survey.

Nowadays people extensively share their views via social media sites. Facebook, Twitter and similar platforms are full of people's comments about products, services, political figures and much more. Most researchers carrying out sentiment analysis, and specifically political sentiment analysis, which is of special interest to us, use Twitter as a source of data. Tweets are a valuable source of information, since people tweet about everything that concerns them. Thus, this survey deals with political sentiment analysis of Twitter.

The paper is organised as follows: related work on general sentiment analysis is described in Section 2. Section 3 presents the work on political sentiment analysis from two perspectives: Section 3.1 focuses on the usefulness of Twitter for politics, Section 3.2 takes a critical approach to the topic, and Section 3.3 discusses how it can be improved. Section 4 concludes the paper and provides ideas for future research.

2 SENTIMENT ANALYSIS IN GENERAL

Sentiment analysis and opinion mining, despite being two very active research areas in NLP (natural language processing) nowadays, do not have a long history. As Pang & Lee [3] state, early works on beliefs, such as the one by Wilks & Bein [4] on building a model of beliefs for computer understanding, can be considered predecessors of modern sentiment analysis. Later research focused on such problems as the identification of subjectivity in narrative [5], subjectivity classification [6], and the computation of point of view [7, 8]; in other words, these projects tried to solve problems similar to those of sentiment analysis. However, they aimed at recognizing opinionated documents rather than classifying opinions.

Avid interest in the opportunities of sentiment analysis has been growing rapidly since the early 2000s, and there are a few reasons for that. First, having discovered the commercial and intellectual applications that sentiment analysis offers, researchers realized that it can be useful in numerous domains, which boosted the research. Second, people have gained access to a tremendous amount of opinionated data in social media, without which sentiment analysis would have been impossible. Indeed, present-day social media research is often concerned with sentiment analysis. Finally, the advancement of machine learning methods played a pivotal role in raising interest in sentiment analysis.

As mentioned above, the beginning of the 21st century was also the beginning of extensive research into sentiment analysis, and a large number of papers on the topic have been published since then. Computational, data-driven approaches have started to matter more in the field than the knowledge-based approach applied before. According to Liu [9], the term sentiment analysis itself was first used in the work by Nasukawa & Yi [10], although earlier papers on the subject can also be found, e.g. [11, 12]. Early sentiment analysis research dealt mostly with fairly long documents, such as movie [11] and product reviews [12]. Turney [12] employed the PMI-IR (Pointwise Mutual Information - Information Retrieval) algorithm to classify product reviews as recommended/not recommended. Pang & Lee [11] used standard machine learning techniques (Naive Bayes, maximum entropy classification, and support vector machines) with features based on unigrams and bigrams to classify documents by overall sentiment. The research was further expanded with sentiment analysis of blogs and news [13, 14].
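
For reference, the semantic orientation score at the heart of Turney's PMI-IR approach can be written as follows (a standard formulation; the NEAR operator and the seed words "excellent" and "poor" come from the original setup):

\mathrm{SO}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{``excellent''}) - \mathrm{PMI}(\mathit{phrase}, \text{``poor''}) = \log_2 \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``excellent''}) \cdot \mathrm{hits}(\text{``poor''})}{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``poor''}) \cdot \mathrm{hits}(\text{``excellent''})}

A review is then labelled recommended when the average semantic orientation of its extracted phrases is positive.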

Sentiment analysis in Twitter emerged later, with the first papers published in 2009 [15, 16, 17]. Bollen et al. [15] tried to establish whether there is a connection between public mood patterns (derived from sentiment analysis of tweets) and changes in the current socio-economic situation, using a psychometric instrument called the Profile of Mood States (POMS). Two other works applied machine learning techniques (Naive Bayes, maximum entropy models, SVM) to the problem, trying out various features. Parikh & Movassate [16] achieved an accuracy of over 85% with a Naive Bayes classifier (multinomial unigram) on pre-processed tweets, and an accuracy of 64% with a MaxEnt classifier. Go et al. [17] reached an accuracy of 82.7% with Naive Bayes (unigram and bigram features), 83% with MaxEnt (unigram and bigram features) and 82.2% with SVM (unigram features). These early works were not domain-specific.
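
To make such experimental setups concrete, the following minimal sketch trains and compares the three classifier families on unigram and bigram bag-of-words features. The scikit-learn pipeline, the toy corpus and all parameter choices are our own illustrative assumptions, not the exact configurations used in [16, 17].

# Minimal sketch: Naive Bayes, MaxEnt (logistic regression) and SVM on
# unigram/bigram features, in the spirit of the early Twitter studies.
# The toy corpus and every parameter choice here are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["I love this candidate", "terrible debate performance",
          "great speech tonight", "worst policy ever",
          "so proud of this campaign", "what a disappointing result"]
labels = [1, 0, 1, 0, 1, 0]          # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.33, random_state=0)

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "MaxEnt": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

for name, clf in classifiers.items():
    # Unigram + bigram bag-of-words features, as in the cited experiments.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))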

Sentiment analysis applied to Twitter rapidly became popular and was adopted in various domains, such as the familiar studies of customer behaviour [18], economics [19], and politics. The latter is discussed in more detail in the next section.

3 POLITICAL SENTIMENT ANALYSIS IN TWITTER

3.1 How it works

The
work in the field of sentiment analysis of tweets in political contexts has
mainly focused on the analysis of public sentiment towards
political figures, election predictions and mining of political preferences. Thus, in
this section papers dealing with these three lines of research will be covered.

However, before discussing papers in depth, it should be mentioned that the main approaches to sentiment analysis are divided into knowledge-based (lexicon-based) techniques and statistical (machine learning) techniques. The former rely on affective word lists, so-called sentiment lexicons; the latter train models over linguistic features. Hybrid approaches, which make use of both knowledge ontologies and machine learning models, are the most common, and a large number of the papers considered below implement such hybrid approaches.
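
In its most basic form, a knowledge-based classifier simply counts how many of a tweet's words appear in positive and negative word lists, as in the following sketch (the tiny word lists are made-up placeholders, not an existing lexicon):

# Minimal sketch of a knowledge-based (lexicon-based) sentiment classifier.
# The word lists are illustrative placeholders; real systems use resources
# such as OpinionFinder or other subjectivity lexicons.
POSITIVE = {"good", "great", "love", "support", "win"}
NEGATIVE = {"bad", "terrible", "hate", "corrupt", "lose"}

def lexicon_sentiment(tweet: str) -> str:
    tokens = tweet.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("love this party and its great manifesto"))  # positive
print(lexicon_sentiment("terrible debate and corrupt promises"))     # negative

A statistical approach, by contrast, learns such cues from labelled data, as illustrated by the classifier sketch in Section 2.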

According to Gayo-Avello [20], the craze for sentiment analysis using Twitter, especially for the prediction of election results, started with the paper by Tumasjan et al. [21] on the German federal elections. The aim of the paper was to check whether one can assess offline political sentiment on the basis of Twitter messages, relate Twitter activity to the real-world popularity of parties and, thus, predict the outcome of the elections. The Linguistic Inquiry and Word Count (LIWC) text analysis software was used to ascribe one of 12 dimensions (positive emotions, negative emotions, sadness, anger, etc.) to tweets and then to build profiles of politicians from them. According to the results, the total number of Twitter posts mentioning each party could predict the result of the elections, with a very low mean absolute error (MAE) of 1.65%, close to the MAE of traditional polls (1.61%). Moreover, the sentiment of tweets was characteristic of real-world political sentiment. These promising conclusions encouraged other researchers to continue the work in the area.
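
For reference, the MAE reported in such studies is typically computed over predicted and actual vote shares (the notation below is ours; [21] do not present the formula in exactly this form):

\mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} \left| \hat{s}_i - s_i \right|

where \hat{s}_i is the Twitter-based estimate of party i's vote share (e.g. its share of the analysed tweets), s_i is the actual vote share, and k is the number of parties.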

Let us consider studies related to the sentiment on political figures, which are quite often placed in the context of elections. The works by O'Connor et al. [22], Wang et al. [23], and Soelistio et al. [24] all focused on sentiment analysis of the U.S. presidential candidates of 2008, 2012 and 2016, respectively, but used slightly different methods. In [22], daily sentiment scores for Barack Obama and John McCain, as well as the presidential job approval rating for Obama in 2009, were computed from the number of positive and negative messages on Twitter, and a moving-average aggregate sentiment was then calculated, as the day-to-day sentiment ratio appeared to be volatile. Affective word lists were acquired from OpinionFinder. It was found that presidential job approval polls correlated significantly with the Twitter sentiment data, while electoral polls correlated less strongly. [23] and [24] tried to mine public opinion about all candidates in the 2012 and 2016 elections. [23] created a real-time sentiment analysis system and used a complex preprocessing method on the gathered tweets so that their tokenizer could handle URLs, emoticons, phone numbers, HTML tags, mentions, hashtags, and repeated symbols or Unicode characters. At the same time, [24] wanted to show that a system with a simpler preprocessing step can yield decent results, which is why they only removed URLs and filtered tweets by candidates' names. Both studies used a Naive Bayes classifier with unigram features for classification. [23] used a more fine-grained scheme with four categories (positive, negative, neutral, unsure), while [24] chose binary classification, focusing more on positive sentiment. In the end, the models had comparable accuracies (59% and 54.8%).
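
The following sketch gives a rough idea of such Twitter-specific preprocessing; the regular expressions and placeholder tokens are our own assumptions, since [23] describe their tokenizer only at a high level.

import re

# Sketch of Twitter-specific normalisation: URLs, mentions and hashtags are
# replaced by placeholder tokens and repeated characters are collapsed.
# The patterns and placeholders are illustrative assumptions.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#(\w+)")
REPEAT = re.compile(r"(.)\1{2,}")     # "soooo" -> "soo", "!!!" -> "!!"

def normalise(tweet: str) -> str:
    tweet = URL.sub("<url>", tweet)
    tweet = MENTION.sub("<mention>", tweet)
    tweet = HASHTAG.sub(r"<hashtag> \1", tweet)
    tweet = REPEAT.sub(r"\1\1", tweet)
    return tweet.lower()

print(normalise("Sooooo proud of @candidate!!! #Election2012 http://t.co/xyz"))
# -> "soo proud of <mention>!! <hashtag> election2012 <url>"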

As stated at the beginning of this section, researchers have also been interested in predicting the outcomes of elections, and not only in the USA. Bermingham & Smeaton [25] and Bakliwal et al. [26] chose tweets on the 2011 Irish General Election as the data source for their research. Both studies used the most common classification into three classes: positive, negative and neutral.

First, [26] applied naïve lexicon-based classification. They improved it by using part-of-speech information (adjectives as indicators of sentiment), handling negations and comparative expressions, and adding domain-specific idioms. In the end an accuracy of 59% was achieved.
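
A sketch of this kind of refinement, with simple negation handling on top of a word-list score, is given below; the word lists and the one-token negation window are simplified assumptions, not the exact rules used in [26].

# Sketch of a refined lexicon score with simple negation handling: a
# sentiment-bearing word preceded by a negation flips its polarity.
# The word lists and the one-token window are simplified assumptions.
POSITIVE = {"good", "great", "honest", "strong"}
NEGATIVE = {"bad", "weak", "dishonest", "corrupt"}
NEGATIONS = {"not", "no", "never"}

def refined_score(tweet: str) -> int:
    tokens = tweet.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity       # negation flips the polarity
        score += polarity
    return score

print(refined_score("this candidate is not honest"))  # -1
print(refined_score("a strong and honest campaign"))  #  2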

For their part, [25], having shown in previous research that supervised learning provides better sentiment analysis than unsupervised approaches such as sentiment lexicons, performed the analysis of political sentiment in tweets using supervised classification with unigram features. Sociolinguistic features, such as emoticons and unconventional punctuation, were kept, while topic terms, usernames and URLs were removed to avoid bias. They reached 65% accuracy.

Further in their work, [26] applied supervised learning as well, using not only unigram features but also "hand-crafted" features (scores based on subjectivity lexicons and Twitter-related features, e.g. positive/negative emoticons, URLs, positive/negative/neutral hashtags), and showed that this increased the accuracy (61.6%).
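
The idea of appending such hand-crafted features to a unigram bag-of-words representation can be sketched as follows; the chosen features and toy data are illustrative assumptions and do not reproduce the full feature set of [26].

# Sketch: hand-crafted Twitter features (emoticon and hashtag counts, URL
# presence) appended to a unigram bag-of-words matrix before training.
# The features and toy data are illustrative assumptions.
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def extra_features(tweet: str):
    return [
        tweet.count(":)") + tweet.count(":D"),   # positive emoticons
        tweet.count(":("),                       # negative emoticons
        tweet.count("#"),                        # hashtags
        int("http" in tweet),                    # contains a URL
    ]

tweets = ["great rally today :) #win",
          "awful speech :( http://t.co/x",
          "looking forward to the count #GE11"]
labels = [1, 0, 1]

vectoriser = CountVectorizer()
X_unigrams = vectoriser.fit_transform(tweets)
X_extra = csr_matrix([extra_features(t) for t in tweets])
X = hstack([X_unigrams, X_extra])

classifier = LinearSVC().fit(X, labels)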

Although the volume of tweets proved to be a better predictor of the election outcome, these studies also showed that sentiment analysis still plays an essential role.

3.2 Why it does not always work

Meanwhile, the number of academic works questioning the feasibility of Twitter-based sentiment analysis for politics is relatively small, although those that exist raise reasonable doubts.

First of all, the paper by Tumasjan et al. [21], which triggered the appearance of numerous studies on political sentiment analysis and on predicting popular opinion and elections, drew a response from Jungherr et al. [27]. They pointed out that [21] did not set out valid rules either for data collection or for the choice of parties and time period. As a result, bias was introduced and the findings of [21] were refuted. Tumasjan et al. [28] later replied to the comment by Jungherr et al., but their arguments were not persuasive enough, and they had to tone down their previous conclusions.

Metaxas et al. [29] tried to replicate the reportedly successful methods of [21, 22], using data collected during the 2010 US elections. The experiments carried out in [29] made it clear that the accuracy of lexicon-based sentiment analysis for politics is low (36.85%), only slightly better than a random classifier. Moreover, disinformation and propaganda were wrongly interpreted, resulting in most messages being assigned a positive label. Finally, following the approach of Golbeck & Hansen [30], who computed the political preferences of media outlets' Twitter followers, [29] found that they were also rather unsuccessful in predicting political leaning. Notably, [21], whose method was used, also stated in their paper that, although sentiment analysis in Twitter for politics appears to be an interesting area of research, it has not yet reached an acceptable level. Thus, the results of previous papers turned out to be irreproducible.

[25] state that, compared to the volume of tweets, sentiment is reactive, which makes it hard to distinguish people's genuine political preferences and opinions from their immediate reactions to news and events. Thus, political sentiment analysis is not that reliable.

Gayo-Avello [31], an active proponent of a critical approach to political sentiment analysis, examined whether the 2008 US presidential elections could have been predicted from Twitter. He obtained an MAE of 13.10%, which amounted to a failure. In a later paper (2012) he compiled a list of flaws in current research on predicting election outcomes, which can be generalized to any political sentiment analysis in Twitter:

1. Most importantly, naïve sentiment analysis is often used.

2. All tweets are assumed to be reliable, although, as Metaxas et al. stated, propaganda and misleading information often come into play.

3. Demographic bias is present. People of different age groups, social groups and genders are not equally represented on Twitter (or in social media in general). Until the majority of the population uses social media on a regular basis, results drawn from such data will be controversial and often incorrect.

4. Self-selection bias is neglected. People post on Twitter voluntarily, so the data is shaped by politically active users.

5. Although social media are extremely tempting for researchers because of the large amount of data available for mining, the resulting data sets are not always representative of the population.

6. Previous positive results do not guarantee the success of subsequent studies.

3.3 How it can be improved

Following on from the aforementioned problems, some core lines of research and recommendations for future work on political sentiment analysis using Twitter data can be derived.

Since sentiment analysis, especially its textual component, is a core task, it should be significantly improved. There is a considerable need for an appropriate domain-specific (political) lexicon, which would most certainly increase the accuracy of modern systems. A more profound understanding of the peculiarities of political discourse in social media, including Twitter, might also be useful here, as shown by Somasundaran & Wiebe [32].

In addition, researchers have not yet agreed on a common approach to Twitter-specific features and the ways they should be used. For example, it is questionable whether retweets should be counted as separate tweets expressing sentiment, and the same holds for news headlines and other links to external resources. Thus, more experiments should be performed in order to establish a common methodology.

As for opinions and sentiments, political ones in particular, they are very likely to change over a person's lifetime, depending on current events, the functioning of the government, the politicians involved and so on. In order to handle these changes, sentiment analysis systems should be able to detect whether a statement carrying sentiment can be generalized to represent a lasting opinion or is specific to the given moment. Maynard & Funk [33] suggest that additional information, e.g. people's interests, (dis)likes, political preferences, etc., should be added to existing techniques.

Furthermore, results obtained from Twitter should be adjusted for the demographic bias mentioned in the previous section. Some strata are under-represented on Twitter, whereas others are over-represented. Researchers should identify the different groups of users and balance the weights of their opinions depending on how strongly each group is represented in the population. This is one of the hardest tasks, but researchers should attempt to collect demographic data on the users in their datasets, as was done by Mislove et al. [34]. This represents another essential part of future work.
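
One simple way to implement such balancing is post-stratification-style reweighting, in which group-level sentiment is aggregated with population shares instead of the biased sample shares; the groups, shares and sentiment scores below are invented purely for illustration.

# Sketch of post-stratification-style reweighting: group-level sentiment is
# aggregated using population shares instead of the (biased) sample shares.
# All shares and sentiment scores are invented for illustration.
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}
sample_share     = {"18-29": 0.45, "30-49": 0.40, "50+": 0.15}
group_sentiment  = {"18-29": 0.60, "30-49": 0.10, "50+": -0.20}

unadjusted = sum(sample_share[g] * group_sentiment[g] for g in group_sentiment)
adjusted   = sum(population_share[g] * group_sentiment[g] for g in group_sentiment)

print(f"unadjusted: {unadjusted:+.2f}, adjusted: {adjusted:+.2f}")
# The over-represented younger group inflates the unadjusted estimate.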

Another recommendation concerns the gold standard. Although polls are most often treated as the gold standard (and some papers, unfortunately, do not mention a gold standard at all), they are very noisy indicators of the truth, which is why future research should look for a more reliable source, e.g. interviews, or, as Gayo-Avello [20] puts it, "the real thing".

Finally, credibility should also be treated as a problem of great importance because, sadly, not everything written on Twitter is true. A large amount of data should be discarded as misleading. Some work has been done in this area (e.g. [35]), so existing techniques should be applied to ensure that the data being used is credible and that disinformation, sock puppets and spammers are removed.

4 CONCLUSION

To conclude, over the past 15 years sentiment analysis has been a quickly developing and changing area, driven by the advancement of microblogging sites, namely Twitter, which offer a unique opportunity to develop theories and adopt technologies that mine texts for sentiment. Research has been undertaken in many different domains, with politics being of particular interest to us.

Since many users express their political opinions via Twitter, tweets have become a fertile source of people's sentiments. The main problems which researchers try to solve with political sentiment analysis in Twitter include public sentiment towards political figures, election prediction and the mining of political preferences.

There have been studies whose results stated that sentiment analysis in Twitter had predicted the outcome of elections, had successfully captured real-world sentiment towards a politician, and had identified political preferences accurately. According to these works, the most useful features, significantly reducing the MAE, were those based on existing sentiment lexicons and Twitter-specific features, such as emoticons, hashtags, etc.

Despite the aforementioned claims, we side with researchers who take a more critical position and argue that political sentiment analysis in Twitter is not yet reliable and successful. As we have seen, some experiments which achieved positive results could not be replicated, and although others could be replicated, they still ended up with a very high MAE. Furthermore, many complications, such as demographic and self-selection bias, unrepresentative samples, and the unreliability of information, are simply ignored.

However, political sentiment analysis using Twitter is sure to improve and play its role in estimating public opinion if the following directions of future (and ongoing) research are pursued:

1. Adjustments to political sentiment analysis systems so as to make them more accurate;

2. Creation of a common methodology for political sentiment analysis;

3. Research on demographic data for Twitter users;

4. User profiling according to personal preferences (especially concerning political leanings);

5. Search for and establishment of a reliable gold standard;

6. Automatic detection of sock puppets, deceptive information, and propaganda.

In general, research into political sentiment analysis in Twitter aligns with a more general goal of sentiment analysis: to create a system that answers questions about what people think on the basis of the texts they write in social networks. Producing results comparable to traditional political surveys is certainly a handy application of sentiment analysis, but it also appears to be one of the important steps towards more complicated and advanced applications.