A Self-Adaptive Hidden Markov Model for Emotion Classification in Chinese Microblogs

Microblogging is increasingly becoming one of the most popular online social media for people to express ideas and emotions. The amount of socially generated content from this medium is enormous. Text mining techniques have been intensively applied to discover the hidden knowledge and emotions in this huge dataset. In this paper, we propose a modified version of the hidden Markov model (HMM) classifier, called self-adaptive HMM, whose parameters are optimized by the Particle Swarm Optimization (PSO) algorithm. Since manually labeling a large-scale dataset is difficult, we also employ entropy to decide whether a new unlabeled tweet should be added to the training dataset after our HMM-based approach assigns it an emotion. In the experiment, we collected about 200,000 Chinese tweets from Sina Weibo. The results show that the F-score of our approach reaches 76% on happiness and fear and 65% on anger, surprise, and sadness. In addition, the self-adaptive HMM classifier outperforms Naive Bayes and Support Vector Machine on the recognition of happiness, anger, and sadness.


Introduction
In recent years, online social media such as microblogging have generated enormous content on the World Wide Web. Microblog posts are limited to 140 characters, which allows users to easily post updates or receive them from mobile devices such as cell phones.
Twitter has grown into one of the most popular microblogging websites and generates hundreds of millions of updates per day. Similar to Twitter, Weibo is a microblogging website in China. In the past few years, it has become an increasingly important source of online social media information, with its population of Chinese users growing rapidly to 309 million in 2012 (http://news.xinhuanet.com/english/sci/2013-01/15/c132104473.htm) and with more than 1,000 tweets generated every second.
The emotional states of users can be inferred from these large numbers of short tweets. Emotions in tweets play a significant role in many fields. The stock market and other socioeconomic phenomena could be predicted using emotion analysis of Twitter users [1]. Even the gross happiness of a community or a country could be estimated from Twitter [2].
Text mining on Chinese tweets is challenging. First, word segmentation is more difficult in Chinese than in English, since there are no spaces between Chinese characters and segmentation ambiguities must be resolved. Second, as on English Twitter, new words appear every day, and it is difficult to build a system that recognizes unknown emotional words. Third, words, especially emotional words, are ambiguous across contexts. Most recent sentiment analysis approaches [3,4] for Chinese tweets use emotional word counts as the main feature and build an emotional word dictionary for inference.
Considering these issues, we propose a self-adaptive hidden Markov model (HMM) based method to perform emotion analysis on Chinese tweets. Our method implements a self-adaptive mechanism, which learns the parameters of the HMM models and appends newly recognized emotional words or sentences to the emotional word dictionary, to deal with the sentiment ambiguity of words and the emergence of new words. The main contributions of this paper are as follows. (a) More fine-grained emotional categories are recognized. We employ the well-known six basic emotional categories defined by Ekman [5]: happiness, sadness, fear, anger, disgust, and surprise, which are more intuitive and useful than the traditional sentiment analysis categories of positive, negative, and neutral. (b) More useful features than word count are defined. Our method employs category-based features extracted from the short sentences to train the proposed HMM models. (c) A self-adaptive mechanism is introduced and tested on a real dataset collected from Sina Weibo.
The rest of the paper is organized as follows. In Section 2, we survey related work on emotion analysis in Twitter and Weibo and provide background on the concepts and methods of emotion analysis for tweets, especially Chinese tweets. In Section 3, we describe our proposed self-adaptive HMM-based method. In Section 4, we describe the dataset and the experimental setup and present the results of our study. In Section 5, we discuss the limitations of the method and future work and conclude.

Related Work.
Weibo plays a significant role in people's lives; therefore, opinion mining and emotion analysis on Weibo have become active research topics. Much research has also been done on emotions in Twitter outside China. Spencer and Uchyigit [6] identified subjective tweets and detected their emotion polarity using Naive Bayes. In their tests, bigrams without part-of-speech (POS) tags achieved an accuracy of 52.31%. In the study by Pak and Paroubek [7], Naive Bayes was used to classify tweets as positive, negative, or neutral, and using the presence of an n-gram as a binary feature yielded the best results. Another study [8] showed that machine learning algorithms (Naive Bayes, Maximum Entropy, and SVM) achieved accuracy above 80% when the sentiment categories consisted only of positive and negative. Working on a Twitter corpus, the authors explored various feature selection strategies and found that bigram features outperform unigram and POS features.
However, studies of emotions in Chinese Weibo began only a few years ago. Zhao et al. [9] trained a Naive Bayes model on emoticon features and classified Chinese tweets into four emotion categories (angry, disgusting, joyful, and sad), with an empirical precision of 64.3%. Yuan and Purver [10] classified Chinese tweets into the six emotions defined by Ekman. In their experiments, character-based features achieved 80% accuracy for "happiness" and "fear" using SVM, but the other four emotions were not sufficiently analyzed.
Our self-adaptive HMM aims to recognize more fine-grained emotional categories using category-based features. In addition, the self-adaptive mechanism can continually improve recognition accuracy.

Emotion Models.
Two models are commonly used to represent emotions: the categorical model and the dimensional model [11]. Categorical models assume that emotional categories are distinct. Ekman defined six basic emotions (anger, disgust, fear, joy, sadness, and surprise) and found high agreement in their expression across multiple cultural groups. D'Mello et al. [12] proposed five categories (boredom, confusion, delight, flow, and frustration) for describing affect states in ITS interactions.
In the dimensional model, core affects represent emotions in a two- or three-dimensional space. A valence dimension places positive and negative emotions at opposite ends of a scale. An arousal dimension distinguishes excited states from calm states. Sometimes a third dimension, dominance, is used to capture whether the subject feels in control of the situation. For example, the Positive and Negative Affect Schedule (PANAS) [13] provides two opposite mood factors (positive and negative), which have been widely used for opinion mining.
In this study, we employ Ekman's emotion model because the six basic emotions are distinct expressions and cover people's daily moods.

Chinese Text Preprocessing.
Unlike languages such as English, Chinese is written without word delimiters, so word segmentation is a significant task. Chinese text preprocessing is divided into two steps: word segmentation and stop-word removal.
We use NLPIR (also known as ICTCLAS 2013) as the word segmenter in this study. Its latest lexical database contains new words that often appear on Weibo, and it can also recognize bloggers' nicknames. Furthermore, its adaptive segmenter can add new words to the lexical database, which our self-adaptive mechanism can use to update the word-emotion vocabulary.
To reduce the dimensionality of the feature space, the next important step is to remove stop words (quantifiers, pronouns, digits, symbols, etc., e.g., "hundred," "we," "3," and "*%"). Weibo posts also contain platform-specific elements such as usernames and links, which must be removed as well.
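The removal step can be sketched as below, assuming the tweet has already been segmented (for example by NLPIR, which is not called here) and that a stop-word list is available. The regular expressions for links and Weibo-style `@username` mentions, the digit pattern, and the function names are illustrative assumptions, not part of the original pipeline.

```python
import re
from typing import Iterable, List, Set

URL_RE = re.compile(r"https?://\S+")            # links
MENTION_RE = re.compile(r"@[^\s:：,，]+")        # Weibo-style @username (assumed pattern)
NUMBER_RE = re.compile(r"^\d+(?:\.\d+)?%?$")    # digits such as "3" or "12.5%"


def strip_weibo_markup(text: str) -> str:
    """Remove links and @usernames from a raw tweet before segmentation."""
    text = URL_RE.sub(" ", text)
    return MENTION_RE.sub(" ", text)


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop blank tokens, listed stop words, and pure numbers from segmented text."""
    return [
        t for t in tokens
        if t.strip() and t not in stop_words and not NUMBER_RE.match(t)
    ]
```

In practice `remove_stop_words` would run on the segmenter's output for the cleaned string returned by `strip_weibo_markup`.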

Features.
In document classification, each document is represented as a vector of terms. Effective feature extraction is essential to make the learning task effective. We adopt four features for our emotion classification model, as follows.

Mutual Information (MI).
MI is an information measure of the correlation between two sets of events; here it reflects the relevance between a term and an emotion category. The mutual information of term $t$ and emotion $e_i$ is defined as

$$\mathrm{MI}(t, e_i) = \log \frac{P(t, e_i)}{P(t)\,P(e_i)}.$$

Chi-Square (CHI). CHI is a statistical measure of the lack of independence between term $t$ and emotion $e_i$. The higher the CHI value, the stronger the dependence between them:

$$\chi^2(t, e_i) = \frac{N\,(AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)},$$

where $A$ is the number of tweets in which $t$ is present and that belong to $e_i$; $B$ is the number in which $t$ is present and that do not belong to $e_i$; $C$ is the number in which $t$ is absent and that belong to $e_i$; $D$ is the number in which $t$ is absent and that do not belong to $e_i$; and $N$ is the total number of tweets.
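Both scores can be computed from the four tweet counts $A$, $B$, $C$, $D$. The sketch below assumes the usual maximum-likelihood estimates $P(t, e_i) \approx A/N$, $P(t) \approx (A+B)/N$, and $P(e_i) \approx (A+C)/N$ for MI; the function names are hypothetical.

```python
import math


def mutual_information(a: int, b: int, c: int, n: int) -> float:
    """MI(t, e_i) estimated from counts A, B, C and the total number of tweets N."""
    if a == 0:
        return float("-inf")  # t never co-occurs with e_i
    # log( (A/N) / ((A+B)/N * (A+C)/N) ) simplifies to log( A*N / ((A+B)(A+C)) )
    return math.log((a * n) / ((a + b) * (a + c)))


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI(t, e_i) = N (AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), with N = A+B+C+D."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: a margin is empty
    return n * (a * d - b * c) ** 2 / denom
```

When $t$ and $e_i$ are independent (for example $A = B = C = D$), both scores are 0; a term that occurs only in tweets of $e_i$ receives high values on both.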

Term Frequency-Inverse Document Frequency (TF-IDF).
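The main idea of TF-IDF is as follows: if a word or phrase appears frequently in a tweet and rarely in other tweets, it has a good ability to distinguish among emotions:

$$\mathrm{TFIDF}(t, e_i) = \mathrm{TF}_{t,e_i} \times \mathrm{IDF}_t,$$

where $\mathrm{TF}_{t,e_i}$ represents the frequency of term $t$ in emotion $e_i$, and the idea of $\mathrm{IDF}_t$ is that if term $t$ rarely appears in other emotions, it can serve as an indicator feature for $e_i$. They are defined as

$$\mathrm{TF}_{t,e_i} = \frac{n_{t,e_i}}{\sum_{k} n_{k,e_i}}, \qquad \mathrm{IDF}_t = \log \frac{N}{N_t},$$

where $n_{t,e_i}$ is the number of times $t$ occurs in emotion $e_i$, $\sum_k n_{k,e_i}$ is the total number of term occurrences in $e_i$, $N$ is the total number of tweets in the dataset, and $N_t$ is the number of tweets in which $t$ appears.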
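The following sketch computes the category-based TF-IDF for every (term, emotion) pair from a labeled, already-segmented corpus. It assumes TF is normalized by the total number of term occurrences in the emotion and that IDF uses the natural logarithm; the corpus format and the function name are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def tf_idf_table(corpus: Sequence[Tuple[List[str], str]]) -> Dict[Tuple[str, str], float]:
    """Return {(t, e_i): TF_{t,e_i} * IDF_t} for a corpus of (tokens, emotion) pairs."""
    term_counts: Dict[str, Counter] = defaultdict(Counter)  # e_i -> {t: n_{t,e_i}}
    doc_freq: Counter = Counter()                            # t -> N_t
    for tokens, emotion in corpus:
        term_counts[emotion].update(tokens)
        doc_freq.update(set(tokens))  # each tweet counts at most once per term
    n_tweets = len(corpus)  # N
    table: Dict[Tuple[str, str], float] = {}
    for emotion, counts in term_counts.items():
        total = sum(counts.values())  # all term occurrences in e_i
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_tweets / doc_freq[term])
            table[(term, emotion)] = tf * idf
    return table
```

A term that appears in every tweet, such as "how" in a two-tweet corpus where both tweets contain it, gets an IDF of 0 and therefore carries no emotional signal.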
Expected Cross Entropy (ECE). ECE measures the distance between the prior distribution of emotion categories and their distribution given term $t$. Its definition is

$$\mathrm{ECE}(t, e_i) = P(e_i \mid t) \log \frac{P(e_i \mid t)}{P(e_i)},$$

where $P(e_i \mid t)$ is the probability of emotion $e_i$ given term $t$ and $P(e_i)$ is the probability of tweets associated with emotion $e_i$.
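As a sketch, the per-emotion ECE can be estimated from the same counts used for CHI, taking $P(e_i \mid t) \approx A/(A+B)$ and $P(e_i) \approx (A+C)/N$. Both the estimation and the function names are assumptions made for illustration.

```python
import math


def ece_term(p_e_given_t: float, p_e: float) -> float:
    """P(e_i|t) * log(P(e_i|t) / P(e_i)); taken as 0 when P(e_i|t) == 0."""
    if p_e_given_t == 0.0:
        return 0.0
    return p_e_given_t * math.log(p_e_given_t / p_e)


def ece_from_counts(a: int, b: int, c: int, n: int) -> float:
    """Estimate ECE(t, e_i) from counts A, B, C (as in CHI) and N tweets.

    Assumes term t occurs in at least one tweet (A + B > 0).
    """
    p_e_given_t = a / (a + b)  # P(e_i | t)
    p_e = (a + c) / n          # P(e_i)
    return ece_term(p_e_given_t, p_e)
```

A term that appears only in tweets of an emotion covering half the corpus yields $1 \cdot \log 2$.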

Classification Methods
Hidden Markov Model (HMM). An HMM describes a Markov process with unknown parameters. The difficulty lies in determining the hidden parameters of the process from the observable ones; the recovered parameters can then be used for further analysis.
The main parameters of an HMM are the transition probabilities $P(q_t = s_j \mid q_{t-1} = s_i)$ and the emission probabilities $P(o_t \mid q_t = s_j)$ [15]. Suppose there is a set of $m$ feature extraction methods $\{f_1, f_2, \ldots, f_m\}$. A tweet $d$ is first divided into $n$ terms $(t_1, t_2, \ldots, t_n)$. Let $x_{kj}$ be the $j$th feature of term $t_k$ extracted by method $f_j$. An intermediate $n \times m$ matrix of term-level features is then obtained, and for each emotion $e_i$ $(1 \le i \le 6)$ the tweet-level feature vector of tweet $d$ is calculated from it.
In our experiment, a tweet is mapped into a four-dimensional vector using the four features MI, CHI, TF-IDF, and ECE. For example, the tweet "How lovely!" (in Chinese) is divided into two terms, "how" and "lovely." The features of these terms on happiness and anger are listed in Table 1.
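The step from the $n \times m$ term-level matrix to an $m$-dimensional tweet-level vector can be sketched as follows. The aggregation formula is not recoverable from the text, so this sketch assumes a column mean over the terms; each feature method is modeled as a hypothetical callable that scores a (term, emotion) pair.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# A feature method f_j maps (term, emotion) to a score, e.g. MI, CHI, TF-IDF, or ECE.
FeatureFn = Callable[[str, str], float]


def tweet_feature_vector(terms: Sequence[str], emotion: str,
                         feature_fns: Sequence[FeatureFn]) -> List[float]:
    """Build the n x m term-level matrix and reduce it to one value per feature.

    The column mean used here is an assumption, not the paper's formula.
    """
    if not terms:
        raise ValueError("tweet has no terms after preprocessing")
    # matrix[k][j]: feature j of term t_k for the given emotion
    matrix = [[f(term, emotion) for f in feature_fns] for term in terms]
    # zip(*matrix) yields one column per feature method
    return [fmean(column) for column in zip(*matrix)]
```

With the four features of the paper, `feature_fns` would hold MI, CHI, TF-IDF, and ECE scorers, producing one four-dimensional vector per emotion.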
The emission probability $P(o_j \mid s_j)$ can also be calculated from the known state $s_j$, the feature $o_j$, and the training data; our calculation method is given at the end of this section. Algorithm 1 depicts the procedure for building an HMM model $\lambda^{(e_i)}$ for emotion $e_i$. The tweet-level feature vector is denoted as $O^{(e_i)} = [o_1\ o_2\ \cdots\ o_m]^{(e_i)}$. We construct an HMM model for each of the six emotions. When a new tweet arrives, its probability is calculated under each of the six models, and the tweet is labeled with the emotion whose model yields the maximum probability.
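The one-model-per-emotion decision rule can be sketched as below. Because each hidden state maps one-to-one to a feature and the chain starts at $s_1$, this sketch assumes a single left-to-right path $s_1 \to s_2 \to \cdots \to s_m$, so $P(O \mid \lambda)$ is the product of the transitions along that path and the emissions. The `ChainHMM` container and the function names are hypothetical.

```python
import math
from typing import Callable, Mapping, NamedTuple, Sequence


class ChainHMM(NamedTuple):
    """Hypothetical container for lambda^(e_i) with states s_1..s_m visited in order."""
    transitions: Sequence[float]                   # a_{j,j+1}, length m - 1
    emissions: Sequence[Callable[[float], float]]  # b_j(o_j), length m


def log_likelihood(obs: Sequence[float], model: ChainHMM) -> float:
    """log P(O | lambda) along the single path s_1 -> s_2 -> ... -> s_m."""
    if len(obs) != len(model.emissions) or len(model.transitions) != len(obs) - 1:
        raise ValueError("expected one state per feature")
    probs = list(model.transitions) + [b(o) for b, o in zip(model.emissions, obs)]
    if any(p <= 0.0 for p in probs):
        return float("-inf")
    return sum(math.log(p) for p in probs)


def classify(vectors: Mapping[str, Sequence[float]],
             models: Mapping[str, ChainHMM]) -> str:
    """Return the emotion whose model gives the tweet the highest probability.

    vectors[e] is the tweet's feature vector computed for emotion e.
    """
    return max(models, key=lambda e: log_likelihood(vectors[e], models[e]))
```

Comparing log-likelihoods instead of raw products avoids underflow and leaves the argmax unchanged.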
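A Strategy of Calculating $P(o_j \mid s_j)^{(e_i)}$. To calculate $P(o_j \mid s_j)$ for emotion $e_i$, denoted as $P(o_j \mid s_j)^{(e_i)}$, we use the Jaccard similarity measure [16] to test the correlation between value $o_j$ and state $s_j$:

$$P(o_j \mid s_j)^{(e_i)} = \frac{n_{11}}{n_{11} + n_{10} + n_{01}},$$

where $n_{11}$, $n_{10}$, and $n_{01}$ are the numbers of tweets in $e_i$ that contain both $o_j$ and $s_j$, only $o_j$, and only $s_j$, respectively. We assume that observation $o_j$ or state $s_j$ is associated with $e_i$ if the association inequality based on the associated factor $\delta$ is met.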
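The Jaccard-based emission can be sketched as follows. The text does not state what it means for a tweet to "contain" $o_j$ or $s_j$; this sketch assumes it means that the tweet's training feature value $x_j$ lies within the associated factor $\delta$ of that value. The function names are hypothetical.

```python
from typing import Iterable


def jaccard(n11: int, n10: int, n01: int) -> float:
    """n11 / (n11 + n10 + n01), or 0 when no tweet is involved."""
    denom = n11 + n10 + n01
    return n11 / denom if denom else 0.0


def jaccard_emission(train_values: Iterable[float], o: float, s: float,
                     delta: float) -> float:
    """Estimate P(o_j | s_j)^(e_i) from the feature values x_j of training tweets in e_i.

    Assumption: a tweet "contains" o (or s) when |x_j - o| <= delta (or |x_j - s| <= delta).
    """
    n11 = n10 = n01 = 0
    for x in train_values:
        near_o = abs(x - o) <= delta
        near_s = abs(x - s) <= delta
        if near_o and near_s:
            n11 += 1
        elif near_o:
            n10 += 1
        elif near_s:
            n01 += 1
    return jaccard(n11, n10, n01)
```

Under this reading, a larger $\delta$ makes more tweets count as associated, which is one reason it is worth tuning $\delta$ together with the states.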

Parameters Computed by PSO.
To build an effective HMM classifier, an important problem is to find optimized sequences of HMM states. Several strategies could be used to optimize the state parameters. The Particle Swarm Optimization (PSO) algorithm [17] is a population-based stochastic optimization method. In our algorithm, each particle represents a candidate solution for the HMM parameters, and the particles move around in the search space. Each particle is guided both by its own best known position and by the best known position found by the swarm, and the iterative process gradually moves the swarm toward the best parameters. Compared with the Genetic Algorithm [18], PSO has simpler rules and a stronger global optimization ability, which makes it well suited to our study. The velocity and position of each particle are updated as

$$v[\,] = w\, v[\,] + c_1 r_1 \left(\mathrm{pbest}[\,] - \mathrm{present}[\,]\right) + c_2 r_2 \left(\mathrm{gbest}[\,] - \mathrm{present}[\,]\right), \quad \text{(a)}$$

$$\mathrm{present}[\,] = \mathrm{present}[\,] + v[\,], \quad \text{(b)}$$

where $[\,]$ indicates that the corresponding variable is a vector. The parameters are set as follows. $V_{\max}$ determines the granularity of the search space; we set it according to the values of the training data. $w$ is the inertia weight, which keeps the particles' motion inertia; we set it to 0.8. $c_1$ and $c_2$ are the acceleration weights that push each particle toward $\mathrm{pbest}[\,]$ and $\mathrm{gbest}[\,]$; we set both to 2.
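A minimal global-best PSO that follows update rules (a) and (b) with the stated settings ($w = 0.8$, $c_1 = c_2 = 2$) is sketched below. The default $V_{\max}$ (half the widest search range), the clamping of positions to the search bounds, the swarm size, and the iteration count are assumptions; with these weights the velocity clamp is what keeps the swarm bounded.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def pso_maximize(fitness: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iter: int = 100,
                 w: float = 0.8, c1: float = 2.0, c2: float = 2.0,
                 v_max: Optional[float] = None,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `fitness` over a box; returns (gbest, fitness(gbest))."""
    rng = random.Random(seed)
    dim = len(bounds)
    if v_max is None:  # assumption: half the widest search range
        v_max = max(hi - lo for lo, hi in bounds) / 2
    present = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocity = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in present]
    pbest_fit = [fitness(p) for p in present]
    best = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
                v = (w * velocity[i][d]
                     + c1 * r1 * (pbest[i][d] - present[i][d])
                     + c2 * r2 * (gbest[d] - present[i][d]))  # rule (a)
                velocity[i][d] = max(-v_max, min(v_max, v))    # clamp to V_max
                lo, hi = bounds[d]
                # rule (b), with the position kept inside the search box
                present[i][d] = min(hi, max(lo, present[i][d] + velocity[i][d]))
            f = fitness(present[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = present[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = present[i][:], f
    return gbest, gbest_fit
```

In the paper's setting, one such search would cover the seven parameters of one part (an associated factor and six hidden states), with a fitness that returns the classifier's $F_1$; that fitness is not reproduced here.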
Fitness Function.We use fitness function to find two kinds of parameters in HMM models, that is, the optimized associated factor   and the  (  )  , which is the th hidden states of the HMM model  (  ) .Accordingly, the fitness function can be defined as follows: fitness (  ,  ( 1 )  , . . ., where  1 -Measure is the metrics (in Section 4.3) of classifiers' accuracy.There are a total number of 7 ×  parameters that need to be searched.Since finding the whole set of parameters is time intensive by PSO, we divide the set of parameters into  independent parts.Seven parameters (  , , . . ., ), (1 ≤  ≤ ), are learned for each part according to the fitness function.

Feedback. Since manually labeling all the tweets is time-consuming and expensive, we introduce a feedback method that automatically decides whether an unlabeled tweet should be added to the training data pool after our HMM model assigns it an emotion. Unlike the strategies of Lewis and Catlett [21] and Scheffer et al. [22], which consider only the assigned emotions, we compute the entropy of a tweet $d$ to measure how discriminating its emotion assignment is, that is, whether one emotion is clearly more discriminating than all other emotions on $d$. Entropy is an information-theoretic measure defined as

$$H(d) = -\sum_{i=1}^{6} P(e_i \mid d) \log P(e_i \mid d),$$

where $P(e_i \mid d)$ is the probability that tweet $d$ is recognized as emotion $e_i$. The lower $H(d)$, the more certain the assignment of $d$ to emotion $e_i$. The pool-based feedback algorithm, shown in Algorithm 2, decides whether a tweet should be added to the training dataset.
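A sketch of the entropy criterion and one pool-based selection round is shown below. It assumes the six model scores have been normalized into a distribution $P(e_i \mid d)$, and that each round moves the `batch_size` lowest-entropy tweets into the training pool with their predicted emotion; the batch rule and the names are assumptions, not Algorithm 2 itself.

```python
import math
from typing import Dict, List, Sequence, Tuple


def emotion_entropy(probs: Sequence[float]) -> float:
    """H(d) = -sum_i P(e_i|d) log P(e_i|d); zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_training(pool: Sequence[Tuple[str, Dict[str, float]]],
                        batch_size: int) -> List[Tuple[str, str]]:
    """Pick the most confidently labeled tweets from an unlabeled pool.

    pool: (tweet_id, {emotion: P(emotion | tweet)}) pairs.
    Returns (tweet_id, predicted_emotion) for the batch_size lowest-entropy tweets.
    """
    ranked = sorted(pool, key=lambda item: emotion_entropy(list(item[1].values())))
    return [(tid, max(dist, key=dist.get)) for tid, dist in ranked[:batch_size]]
```

A uniform distribution over the six emotions reaches the maximum entropy $\log 6$, while a one-hot distribution has entropy 0.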

Experiments
4.1.Dataset.Since there are no public datasets of Chinese tweets associated with emotions, many studies employ emoticons to label tweets [7,8,10].However, only a small percentage of Chinese users post tweets with emoticons.Besides, several emoticons cannot find their corresponding emotions.Hence, we build our own dataset as follows.
(i) About 200,000 tweets were collected through the Sina API.
(ii) For each emotion, more than twelve seed terms were chosen for term-level feature extraction. We refer interested readers to Aman's annotation scheme [23] for seed term selection. For instance, a set of seed terms for happiness may contain "enjoy" and "pleased."
(iii) Manual screening. Not all tweets are associated with an emotion. We asked ten computer science students to choose good indicator tweets for the six emotions.
To evaluate the performance of each HMM model, we randomly selected hand-annotated tweets; the test dataset contains 30 tweets for each emotion. Each test run was executed five times, and the average results were used for evaluation.

Evaluation Metrics.
Precision, recall, and F-measure are the most commonly used evaluation metrics for text classification tasks [7], and we employ all three in our experiment. For each emotion $e_i$, precision and recall are defined as

$$\mathrm{Precision}(e_i) = \frac{TP}{TP + FP}, \qquad \mathrm{Recall}(e_i) = \frac{TP}{TP + FN},$$

where $TP$ is the number of tweets correctly recognized as $e_i$; $FP$ is the number of tweets falsely recognized as $e_i$; and $FN$ is the number of tweets actually associated with $e_i$ but recognized as another emotion.
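A minimal sketch of the per-emotion precision and recall above, together with their usual harmonic mean $F_1 = 2PR/(P+R)$, computed from gold and predicted labels. The function name and the convention of returning 0 for an empty denominator are assumptions.

```python
from typing import Dict, Sequence


def per_emotion_scores(gold: Sequence[str], pred: Sequence[str],
                       emotion: str) -> Dict[str, float]:
    """Precision, recall, and F1 for one emotion, treated one-vs-rest."""
    pairs = list(zip(gold, pred))
    tp = sum(g == emotion and p == emotion for g, p in pairs)  # correctly recognized
    fp = sum(g != emotion and p == emotion for g, p in pairs)  # falsely recognized as emotion
    fn = sum(g == emotion and p != emotion for g, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Averaging these scores over the five test runs gives the per-emotion figures reported in the experiments.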
To balance precision and recall, $F_1$ is defined as

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Comparison with Other Classifiers.
We compared our approach with two well-known classifiers, Naive Bayes and Support Vector Machine (SVM) [24]. Both are often used for sentiment classification in the literature because they are easy to implement. As shown in Figure 1, our HMM-based approach outperforms Naive Bayes and SVM on happiness, anger, and sadness. On the other three emotions, the three classifiers perform similarly.
The results also show that all three classifiers perform well on happiness, surprise, and fear, with F-measures greater than 65%; on fear, the F-measure exceeds 78%. None of the three classifiers recognizes disgust accurately.

The Comparison of Six Emotions.
Figure 2 shows the classification results of our HMM-based approach on the six emotions. Our approach achieves its best accuracy, over 76%, on happiness and fear. It also performs well on anger, surprise, and sadness, with an average accuracy of 65%.

Analysis of HMM Results
We also analyzed the reasons for false recognition. Tweets with the following characteristics may be misrecognized by our HMM-based approach.
(i) Some tweets contain multiple emotions. For example, "Wow, I'm so smart!" (originally in Chinese) contains both happiness and surprise. Such tweets may cause false recognition. This also explains why our approach obtains a low F-score on disgust, which is often falsely recognized as anger.
(ii) Some tweets contain new words that cannot be recognized.
In addition, puns and polysemous words are significant factors but are rather difficult to recognize. Our HMM-based approach can be improved with respect to these characteristics in future experiments.

Conclusion
In this paper, we present an approach that extracts features using four methods: MI, TF-IDF, CHI, and ECE. Our classifier is based on HMM, whose hidden states are found by the PSO algorithm. Since manually labeling a large-scale dataset is difficult, we employ entropy to decide whether a new unlabeled tweet should be added to the training dataset after our HMM-based approach assigns it an emotion.
The experimental results show that HMM outperforms SVM and Naive Bayes (NB), especially on happiness, anger, and sadness. Among the six emotions, HMM recognizes happiness and fear better than anger, surprise, and sadness.
In the future, we will optimize HMM to recognize tweets associated with the other emotions more accurately and to automatically add new words for emotional seed term selection. Moreover, the self-adaptive mechanism will be implemented in our HMM model.

3.1. Features as Observed Variables. Feature extraction methods fall into two main categories: category-based extraction and global-based extraction. Features extracted by global-based methods reflect the importance of words in the global corpus but cannot distinguish between emotions. Therefore, we adopt category-based feature extraction methods in our models. All the features introduced in Section 2.4 are category-based.

Figure 1: Performance comparison of the three approaches on each of the six emotions. The x-axis shows the test runs.

Figure 2: The accuracy of our HMM-based approach on the six emotions. The x-axis shows the test runs.
In the transition probabilities, the current state depends on the previous state; in the emission probabilities $P(o_t \mid q_t = s_j)$, the observation symbol is emitted by the current state $s_j$. In our model, an HMM $\lambda^{(e_i)}$ is built for each emotion $e_i$. The features extracted from a tweet $d$ are the observed variables, and each hidden state $s_j^{(e_i)}$ of $\lambda^{(e_i)}$ is the state associated with feature $o_j$. If tweet $d$ obtains the highest probability under model $\lambda^{(e_i)}$, then $d$ is associated with emotion $e_i$.

Table 1: The features of the terms "how" and "lovely" on happiness and anger, respectively.
3.2. HMM-Based Emotional Classification Model. In our case, each hidden state is supposed to emit a value, and the whole model generates the sequence of values that constitutes the tweet's feature vector. The states are considered to be a set of values that best represent the emotional category. There is a one-to-one mapping between HMM states and tweet features, which requires the hidden state transitions to be stationary and to begin at the first state $s_1$. When the classifier is applied, the transition probabilities indicate how closely the features of a test tweet are drawn toward the emotion. In the Jaccard measure, $n_{11}$ is the number of tweets in $e_i$ that contain both $o_j$ and $s_j$, $n_{01}$ is the number that contain only $s_j$, and $n_{10}$ is the number that contain only $o_j$. To check whether emotion $e_i$ is associated with observation $o_j$ or state $s_j$, we introduce an associated factor, denoted by $\delta$. Let $x_j$ be the feature extracted from the training data. The association is described by

$$|x_j - o_j| \le \delta \quad \text{or} \quad |x_j - s_j| \le \delta.$$

In the PSO update equations, the variables $r_1$ and $r_2$ are random numbers in $[0, 1]$. $\mathrm{pbest}[\,]$ records each particle's individual best position, and $\mathrm{gbest}[\,]$ records the global best position. The constant $w$ is the inertia weight, and $c_1$ and $c_2$ are acceleration constants. Equation (a) updates the particles' velocities according to the previous velocity and the distances to the best positions; equation (b) updates each particle's position $\mathrm{present}[\,]$ according to its previous position and current velocity.

Parameter Settings. Different PSO parameters may have a large impact on optimization performance. The following guidelines help us select the PSO parameters [19,20].

Input: a training data pool, a test data pool, a query strategy, and a query batch size. Repeat: for each element of the query batch, optimize $\lambda^{(e_i)}$ using the current training data pool and the PSO algorithm. Optimize the hidden states of each $\lambda^{(e_i)}$ by PSO. (e) For each $\lambda^{(e_i)}$, calculate the feature vector $[o_1\ o_2\ \cdots\ o_m]^{(e_i)}$ of $d$ in emotion $e_i$ and obtain