Manual annotation of sentiment lexicons is labor- and time-intensive, and accurate quantification of emotional intensity is difficult to obtain. Moreover, an excessive emphasis on one specific domain greatly limits the applicability of domain sentiment lexicons (Wang et al., 2010). This paper applies statistical training over a large-scale Chinese corpus with a neural network language model and proposes an automatic method for constructing a multidimensional sentiment lexicon based on constraints of coordinate offset. To distinguish the sentiment polarities of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of our lexicon. Finally, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of SentiRuc in category labeling, intensity labeling, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the second-best lexicon by 21 percent.
Opinion mining and sentiment analysis of online text have become active research areas in recent years, spanning tasks such as opinion summarization and sentiment classification. Most of these tasks benefit from a high-quality sentiment lexicon, which can provide strong sentiment features when no training data is available.
The most basic form of sentiment lexicon is binary annotation with positive and negative labels, such as SentiWordNet, developed by the Italian Information Technology Research Institute.
In recent years, driven by diverse tasks in different fields, lexicons have come to include both the polarity word and its related target as a sentiment item. However, the applicability of such 2-tuple lexicons, 〈polarity word, target〉, is strictly limited to one specific field, and the size of such lexicons can easily explode as training data grows, which causes feature sparseness. Massive online text makes the limitations of domain sentiment lexicons increasingly apparent, especially when sentiment classification tasks vary across areas. A general and adaptable lexicon is therefore important for sentiment analysis.
This paper presents a method for the automatic construction and optimization of a multisentiment lexicon through statistical analysis of a massive online corpus. The main content of this paper is as follows. First, we use a neural network language model to obtain distributed representations of words from a massive online corpus (the Sogou News Corpus, 3.17 GB).
The remainder of this paper is organized as follows.
Many Chinese sentiment lexicons, such as NTUSD, HowNet, and the DUT Affective Lexicon Ontology, are manually annotated to ensure coverage and effectiveness. But manual methods usually cost too much labor and time and tend to be subjective, and coverage remains a concern. To provide finer granularity, it is necessary to introduce a statistical language model that automatically annotates sentiment category and intensity.
To label the sentiments, we should first study sentiment categorization. As early as 1957, Osgood decomposed human emotion into three dimensions: strong-weak, good-bad, and active-passive.
In addition to qualitative labeling, sentiment intensity needs to be annotated quantitatively. Many existing lexicons are manually annotated, including WordNet.
Therefore, it is important to optimize the collection of sentiment lexicon entries and the intensity labeling; related optimization efforts include the work of Chen et al.
Considering the above points, this paper presents an unsupervised model for the automatic construction of a multisentiment lexicon based on the WLI neural network language model. We propose a new categorization of human emotions, which makes the linguistic features more suitable for computational analysis. We define the converting constraint set of distance and sentiment intensity and present an automatic construction model based on the WLI language model. We also present a global optimization framework based on several manually annotated semantic resources to improve the semantic description of our lexicon, SentiRuc.
In this section, we present the "5 pairs with 10 polarities" categorization of human emotions and automatically annotate the multisentiment lexicon SentiRuc by defining the converting constraint set of distance and sentiment intensity. We also investigate the sentiment disambiguation of multisentiment words.
We integrated the entries of the NTUSD dictionary, the HowNet lexicon, and the DUT Affective Lexicon Ontology as the entries of our SentiRuc lexicon, which contains a total of 14,250 emotional words.
Traditional binary sentiment labeling has gradually become unable to meet the needs of sentiment analysis tasks. The primary step in multisentiment labeling is the categorization of human emotions.
Words carry very rich meanings, and statistical language models are used to extract semantic features from them. Given a corpus, a neural network language model can map words into a high-dimensional continuous space. Word2Vec is a deep-learning-based tool released by Google in 2013, which adopts two main language models: the continuous bag-of-words (CBOW) model and the continuous skip-gram model.
All word representations are located in a high-dimensional vector space, in which we determine an entry's polarity and intensity by computing the distance between the entry and seed words. However, many words can express, for example, happiness, and it is difficult to choose one as the only seed of "happy." To reduce the deviation caused by subjectivity, we use the coordinate offset of word representations to list the 50 nearest neighbors of "happy" and then manually choose several words as the seed set of "happy." For example, we collect all distances between "bittersweet" and the "happy" seeds and take the average as the distance between "bittersweet" and the "happy" emotion. For any word W, we thus obtain a 10-dimensional distance vector Dis(W), whose dimensions respectively represent the distances between W and happy, like, believable, gratitude, complimentary, sad, hate, unexpected, angry, and critical.
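The seed-distance step can be sketched as follows. This is a minimal illustration with toy three-dimensional vectors and hypothetical seed sets; in the paper, the vectors come from a model trained on the Sogou News Corpus and the seed sets are chosen manually.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two word vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def distance_vector(word_vec, seed_sets):
    """Dis(W): for each sentiment, the average distance from W to that
    sentiment's seed words."""
    return np.array([
        np.mean([cosine_distance(word_vec, s) for s in seeds])
        for seeds in seed_sets
    ])

# Toy embeddings standing in for two sentiment categories' seed sets.
happy_seeds = [np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1])]
sad_seeds = [np.array([-0.9, 0.1, 0.0]), np.array([-0.8, 0.0, 0.2])]

w = np.array([0.85, 0.15, 0.05])  # a word lying near the "happy" seeds
dis = distance_vector(w, [happy_seeds, sad_seeds])
# dis[0] (distance to "happy") comes out much smaller than dis[1]
```

With real embeddings, Dis(W) would have ten dimensions, one per sentiment category, each averaged over that category's seed set in the same way.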
Previous research has pointed out that a word generally contains only one or two emotions.
Paper [
Each dimension of Senti(W) is denoted as Senti(W)[i] (1 ≤ i ≤ 10), the intensity of the ith sentiment of W. Each dimension of Dis(W) is denoted as Dis(W)[i], the distance between W and the ith sentiment's seed set. The intensity Senti(W)[i] of a certain sentiment is derived from Dis(W)[i] under the converting constraint set: the smaller the distance, the higher the intensity.
From the converting constraint set we derive the generating formula of W's sentiment vector Senti(W): each Senti(W)[i] is computed from Dis(W)[i], with intensity decreasing as distance grows. The parameters in these formulas control how each constraint maps distance to intensity; their tuning is examined in the experiments.
Finally, every sentiment word W is annotated in our sentiment lexicon with a 10-dimensional vector Senti(W). The value in each dimension represents the similarity between W and that sentiment, that is, W's intensity of that sentiment.
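Since the exact converting formula depends on the tuned constraint set, the distance-to-intensity mapping can only be sketched. The linear falloff and the threshold `theta` below are illustrative assumptions, not the paper's actual constraints; they capture only the stated property that intensity shrinks as distance grows.

```python
import numpy as np

def senti_from_dis(dis, theta=1.0):
    """Hedged sketch of the conversion: intensity 1.0 at distance 0,
    linearly decaying to 0 at the threshold distance theta (illustrative
    assumption, not the paper's exact constraint set)."""
    dis = np.asarray(dis, dtype=float)
    return np.clip(1.0 - dis / theta, 0.0, 1.0)

dis_w = np.array([0.05, 1.4])    # close to sentiment 0, far from sentiment 1
senti_w = senti_from_dis(dis_w)  # high intensity on dim 0, zero on dim 1
```

Any monotone decreasing mapping satisfying the constraint set would fit this slot; the experiments below tune the mapping's parameters against human-annotated resources.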
In Section
We use a hybrid approach to screen multisentiment words from our lexicon's vocabulary. So far there is no effective method for the automatic selection of multisentiment words, so we extract words that appear in different synonym sets in "HIT Tongyicicilin" and the "Synonym Lexicon for Pupils" and take these words as the candidate set.
Then, 113,694 sentences containing words in the candidate set are collected.
Count(CW_positive) represents the count of context word CW in all sentences in which W is positive. Count(CW_negative) is the count of CW in all sentences in which W is negative. Count(CW) is the total count of CW in all sentences where W appears.
After the tendency disambiguation, a multisentiment word W is split into two independent entries, a positive case and a negative case.
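The counting scheme above can be sketched as a small classifier. The add-one smoothing and the 0.5 decision threshold are illustrative assumptions; the paper's algorithm is based on word distribution density, of which this is only a simplified instance.

```python
from collections import Counter

def train_counts(sentences):
    """sentences: list of (context_words, polarity), polarity '+' or '-'.
    Accumulates Count(CW_positive) and Count(CW_negative) per context word."""
    pos, neg = Counter(), Counter()
    for words, polarity in sentences:
        (pos if polarity == '+' else neg).update(words)
    return pos, neg

def positive_ratio(cw, pos, neg):
    """Estimate Count(CW_positive) / Count(CW), with add-one smoothing
    (smoothing is an illustrative assumption)."""
    return (pos[cw] + 1) / (pos[cw] + neg[cw] + 2)

def classify(context_words, pos, neg):
    """Label a new occurrence of W by its context words' average ratio."""
    score = sum(positive_ratio(cw, pos, neg) for cw in context_words)
    return '+' if score / len(context_words) > 0.5 else '-'

train = [
    (["praise", "great"], '+'),
    (["great", "love"], '+'),
    (["flaw", "poor"], '-'),
    (["poor", "hate"], '-'),
]
pos, neg = train_counts(train)
label = classify(["great", "praise"], pos, neg)  # context seen with '+' uses
```

A sentence whose context words occur mostly alongside positive uses of W is assigned the positive case, and likewise for negative.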
Section
If W is judged positive in a sentence, the Senti(W) vector of its positive entry is used; if W is judged negative, the Senti(W) vector of its negative entry is used.
If the annotation of SentiRuc contributes more to relevant tasks, sentiment classification using SentiRuc should be closer to human judgment than classification using other lexicons. We select 6,000 sentences from the datasets of the NLPCC 2013 and NLPCC 2014 Competitions and label each sentence with a "main sentiment" and an optional "subsentiment," both drawn from the 10 sentiment categories of SentiRuc. For a given set of parameters in the generating formulas, the classification result can then be compared with the human labels.
By combining the above three evaluation methods, we obtain the global optimization framework based on manually constructed resources, as shown in the figure below.
The global optimization framework.
The global error function aggregates the labeling errors measured on the three manually annotated resources. The global error is then minimized by tuning the generating parameters.
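With the exact error function elided above, the aggregation can be sketched as a weighted sum of per-resource errors. The resource names, the error values, and the unit weights below are all illustrative assumptions, not the paper's definition.

```python
def global_error(resource_errors, weights=None):
    """Hedged sketch: combine per-resource errors into one global error as a
    weighted sum. Unit weights are an illustrative assumption."""
    if weights is None:
        weights = {name: 1.0 for name in resource_errors}
    return sum(weights[name] * err for name, err in resource_errors.items())

# Hypothetical per-resource errors for the three annotated resources.
errors = {"synonym_test": 0.7, "antonym_test": 0.9, "sentence_test": 0.5}
e = global_error(errors)  # 0.7 + 0.9 + 0.5 = 2.1 with unit weights
```

Under this formulation, tuning the generating parameters amounts to searching for the parameter values that minimize `e`, which is what the parameter tuning experiment below reports.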
We first evaluate the generating process of SentiRuc and then verify its usefulness. To evaluate the rationality of the generating process, we design a parameter tuning experiment to examine the rationality of the constraint set.
In all experiments, the threshold distance value
The Sogou News Corpus (3.17 GB) is used as the training text set. After segmentation with ICTCLAS 5.0, developed by the Chinese Academy of Sciences, the corpus contains about 0.83 billion words with a vocabulary of 1,104,914 entries. We apply no other preprocessing, which ensures that every n-gram sample is a real Chinese word sequence and that the word representations reflect the actual semantic distribution of each word.
Section
The tuning of generating parameters.
| Setting | Constraint 1 | Constraint 2 | Constraint 3 | Global error |
|---|---|---|---|---|
| Baseline | 1 | 1 | 1 | 2.301 |
| Dropping a constraint | 0 | 1 | 1 | 2.402 |
| | 1 | 0 | 1 | 2.660 |
| | 1 | 1 | 0 | 2.599 |
| Parameter tuning | 1.875 | 1 | 1 | 2.123 |
| | 1.875 | 1.075 | 1 | 2.046 |
| | 1.875 | 1.075 | 1.145 | 1.972 |
It can be seen that dropping any constraint increases the global error, which indicates that all constraints are useful in computing the intensities of SentiRuc. With the tuned parameters, the global error drops from 2.301 to 1.972.
Section
According to the labeled result, the tendency disambiguation algorithm based on word distribution density performs as shown in the table below.
Experiments of sentiment disambiguation.
| | Target | Samples | Accuracy |
|---|---|---|---|
| Overall | 148 words | 113,694 | 95.52% |
| Highest accuracy | — | 3,095 | 98.71% |
| | — | 1,924 | 98.70% |
| Lowest accuracy | "epigone" | 281 | 87.90% |
| | "yes-man" | 130 | 61.54% |
The overall disambiguation accuracy over all 148 words in the 113,694 sentences reaches 95.52%. The entries "epigone" and "yes-man" get the lowest accuracy, mainly because their low frequency of occurrence limits the available training data. Overall, this result shows that our disambiguation algorithm can effectively distinguish the different tendencies of a word.
The sentiment polarity and intensity of SentiRuc are both drawn from a gigabyte-scale Chinese corpus; its semantic description should therefore be closer to the actual semantic distribution than that of manually constructed lexicons. We evaluate the annotation quality of several existing lexicons by analyzing their sentiment category consistency (qualitative evaluation) and sentiment intensity consistency (quantitative evaluation). Category consistency examines the similarity of synonyms' (or antonyms') tendency annotations in a lexicon; intensity consistency refers to the similarity of their intensity annotations.
HIT Tongyicicilin and the Synonym Lexicon for Pupils contain 55,265 synonyms, from which we selected 2,500 synonym pairs as the test dataset.
Senti(W)[
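With the formal definitions elided, the two consistency checks can be sketched as follows. Comparing the dominant sentiment dimension for category agreement, and thresholding the gap between peak intensities for intensity agreement, are illustrative assumptions about criteria the paper defines on Senti(W).

```python
import numpy as np

def category_consistent(senti_a, senti_b):
    """A pair agrees in category if their strongest sentiment dimension
    matches (illustrative criterion)."""
    return int(np.argmax(senti_a) == np.argmax(senti_b))

def intensity_consistent(senti_a, senti_b, tol=0.2):
    """A pair agrees in intensity if their peak intensities are within a
    tolerance (the tolerance value is an illustrative assumption)."""
    return int(abs(np.max(senti_a) - np.max(senti_b)) <= tol)

def consistency_rates(pairs):
    """Fraction of pairs passing each check, over a list of Senti pairs."""
    cat = np.mean([category_consistent(a, b) for a, b in pairs])
    inten = np.mean([intensity_consistent(a, b) for a, b in pairs])
    return cat, inten

# Toy 2-dimensional Senti vectors: one consistent pair, one mismatched pair.
pairs = [
    (np.array([0.9, 0.0]), np.array([0.8, 0.1])),
    (np.array([0.9, 0.0]), np.array([0.1, 0.5])),
]
cat_rate, inten_rate = consistency_rates(pairs)
```

For antonym pairs, the same checks would be applied after mapping each sentiment to its opposite-polarity counterpart.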
The evaluation results for synonyms and for antonyms are shown in the two tables below.
Evaluation of synonyms in each lexicon.
Lexicons | Synonyms | Category consistency | Intensity consistency |
---|---|---|---|
NTUSD | 2179 | 87.29% | — |
HowNet | 2500 | 89.04% | — |
DUT | 2500 | 88.44% | 70.89% |
SentiRuc | 2500 | 91.88% | 92.54% |
Evaluation of antonyms in each lexicon.
Lexicons | Antonyms | Category consistency | Intensity consistency |
---|---|---|---|
NTUSD | 1450 | 84.00% | — |
HowNet | 1772 | 86.51% | — |
DUT | 1774 | 85.40% | 67.55% |
SentiRuc | 1774 | 87.94% | 91.62% |
The two tables show that SentiRuc achieves the highest category consistency and intensity consistency among the compared lexicons, for both synonyms and antonyms.
This experiment investigates the performance of sentiment analysis using different lexicons. We select 3,100 sentences from the NLPCC 2013 and NLPCC 2014 Competitions and 3,700 sentences containing one of the 148 multisentiment words from Sina Microblog. All 6,800 sentences are labeled with a "main sentiment" and an optional "subsentiment" tag. We define 2-gram part-of-speech (2-POS) and 3-gram part-of-speech (3-POS) features for every labeled sample and extract sentiment tendency features with the help of SentiRuc. An SVM is used in the multiclass classification experiments. Compared with the human annotations, the accuracy of the multiclass classification reaches 62.0%.
To facilitate the comparison of different lexicons, we also conduct binary classification experiments (positive or negative). Each of the 6,800 sentences is labeled "positive" or "negative" by four native Chinese speakers. Another 3,200 objective sentences without affect are labeled "neutral" and added to the test dataset. For each sentence, we extract the 2-POS and 3-POS features and identify sentiment features with the help of SentiRuc. We use an SVM classifier with tenfold cross-validation. In addition, we investigate the performance of SentiRuc before and after the tendency disambiguation. The results are evaluated by precision, recall, and F-score, computed from the following counts.
Result_Correct is the number of sentences correctly labeled "positive" (or "negative"). Result_Proposed is the number of sentences labeled "positive" (or "negative") by the SVM model. Result_Labeled is the number of sentences manually labeled "positive" (or "negative"). The results are shown in the table below.
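From these counts, the standard metrics follow directly: precision = Result_Correct / Result_Proposed, recall = Result_Correct / Result_Labeled, and their harmonic mean as the F-score.

```python
def precision_recall_f1(result_correct, result_proposed, result_labeled):
    """Precision, recall, and F-score from the three counts defined above."""
    p = result_correct / result_proposed
    r = result_correct / result_labeled
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f

def f1_score(precision, recall):
    """F-score directly from precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Sanity check against the NTUSD row of the positive-text results below:
# precision 0.603 and recall 0.375 combine to an F-score of about 0.462.
f = f1_score(0.603, 0.375)
```

Applying `f1_score` to the reported precision/recall pairs reproduces the F column for most rows of the results table.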
Sentiment classification based on different lexicons.
Result of positive text

| Lexicon | Precision | Recall | F-score |
|---|---|---|---|
| NTUSD | 0.603 | 0.375 | 0.462 |
| HowNet | 0.728 | 0.540 | 0.620 |
| DUT | 0.721 | 0.552 | 0.593 |
| SentiRuc (before disambiguation) | 0.744 | 0.588 | 0.657 |
| SentiRuc (after disambiguation) | 0.782 | 0.678 | 0.726 |

Result of negative text

| Lexicon | Precision | Recall | F-score |
|---|---|---|---|
| NTUSD | 0.480 | 0.319 | 0.383 |
| HowNet | 0.611 | 0.451 | 0.519 |
| DUT | 0.572 | 0.445 | 0.501 |
| SentiRuc (before disambiguation) | 0.633 | 0.468 | 0.538 |
| SentiRuc (after disambiguation) | 0.671 | 0.589 | 0.627 |
The table shows that SentiRuc outperforms the other lexicons on both positive and negative text, and that the tendency disambiguation further improves precision and recall.
This paper presented the automatic construction and global optimization of a multisentiment lexicon, SentiRuc. The main contributions include a categorization of human emotions, an automatic construction model based on the WLI language model, a global optimization framework based on several manually annotated semantic resources, and the disambiguation of multisentiment words.
It is difficult to directly compare existing lexicons because of their various sentiment categorizations. We will investigate appropriate evaluation methods for multiclass sentiment classification tasks.
Although Section
The authors declare that there are no competing interests.
This research was supported by the National Science Foundation for Young Scientists of China under Grant 61601371, the National Natural Science Foundation of China under Grant 71271209, Beijing Municipal Natural Science Foundation under Grant 4132052, and Humanity and Social Science Youth Foundation of Ministry of Education of China under Grant 11YJC630268.