Bullet Subtitle Sentiment Classification Based on Affective Computing and Ensemble Learning

The bullet subtitle reflects a kind of instant feedback from the user to the current video. It is generally short but contains rich sentiment. However, the bullet subtitle has its own unique characteristics, and applying existing sentiment classification methods to bullet subtitle sentiment classification gives unsatisfactory results. First, because bullet subtitles usually contain many buzzwords that existing sentiment lexicons do not cover, we propose the Chinese Bullet Subtitle Sentiment Lexicon on the basis of existing sentiment lexicons. Second, considering that traditional affective computing methods use only the text information and ignore the information of other dimensions, we construct a bullet subtitle affective computing method that also uses the other dimensions of the bullet subtitle. Finally, to address the problem that existing classification algorithms ignore the importance of sentiment words in short texts, we propose a sentiment classification method based on affective computing and ensemble learning. Our experimental results show that the proposed method achieves higher accuracy and works better in practical application.


Introduction
In recent years, bullet subtitle video sites have become more and more popular among young people, and Bilibili is a typical representative. Bullet subtitle text shares the characteristics of traditional short text, such as short length and sparse features, but it also differs from traditional short text. First, bullet subtitle text is more simplified, colloquial, and symbolized; second, it often contains teasing content; third, it has conversational properties. Analyzing the sentiment of bullet subtitle text, and then the overall sentiment distribution of a bullet subtitle video, can serve as a reference for website operators when making personalized video recommendations and can also provide decision support for public opinion governance in cyberspace security.
Text sentiment classification is a key technique for mining sentiment in the field of natural language processing and is now widely used in public opinion monitoring, marketing, and finance [1]. There are two main types of sentiment classification methods: sentiment lexicon-based methods and machine learning-based methods. The two methods have different advantages in different fields.
In the sentiment lexicon-based classification method, the sentiment lexicon used to calculate the sentiment value of the text plays a key role. Currently, the main English sentiment lexicons are WordNet-Affect [2] and SentiWordNet [3], and the main Chinese sentiment lexicons are the HowNet Chinese Sentiment Lexicon and the Sentiment Vocabulary Ontology Library proposed by the Dalian University of Technology [4]. Based on the English sentiment lexicon, Taboada et al. [5] proposed the Semantic Orientation CALculator (SO-CAL), a lexicon-based method for classifying the sentiment polarity of text. Based on the Chinese sentiment lexicon, Zhang et al. [6] realized the sentiment analysis of Chinese microblog text by extending the sentiment lexicon. Yao et al. [7] added Weibo emoji to the sentiment lexicon, which improved sentiment judgment on Weibo posts containing emoji. Li et al. [8] proposed a seven-dimensional affective computing method for the bullet subtitle, which is relatively simple and only uses text information. The sentiment lexicon-based classification method performs well on text containing sentiment words, but it cannot handle text without sentiment words and generalizes poorly.
In machine learning-based classification methods, prelabeled training data is usually needed to train the model, which then makes predictions on the data to be classified. The Word2Vec model proposed by Mikolov et al. [9] effectively alleviates the curse of dimensionality caused by traditional feature extraction methods and takes into account the semantic relationships between words. Zhang et al. [10] combined Word2Vec and SVMperf to improve the accuracy of sentiment classification tasks. Ma and Zhang [11] proposed a bullet subtitle sentiment polarity analysis model based on TF-IDF and SVM, which can effectively classify the sentiment of bullet subtitle texts. However, this method ignores the semantic information of words and can only analyze simple bullet subtitle texts. Liu and Xing [12] improved the random forest algorithm by studying its shortcomings in text classification and optimized the hyperparameter tuning to improve the performance of the model. In recent years, neural networks and deep learning have developed rapidly, and the Recurrent Neural Network (RNN) [13] and its variants, Long Short-Term Memory (LSTM) [14] and the Gated Recurrent Unit (GRU) [15], also perform well in text classification. Kim [16] proposed TextCNN, a convolutional neural network model for text sentiment classification. The model achieves high classification accuracy with little parameter tuning, but it ignores the sequential relationship between words in the text. Onan [17] proposed a five-layer neural network based on CNN-LSTM, which performs better in sentiment analysis than traditional neural networks. Yuan et al. [18] proposed a personality-based Weibo sentiment analysis model, PLSTM, which trains different classifiers on the Weibo posts of users with different personalities and finally fuses the results.
Onan and Toçoğlu [19] proposed a three-layer stacked bidirectional long short-term memory architecture to identify satirical text documents, and the model's classification accuracy reached 95.3%. Zhuang and Liu [20] extracted highlight video clips by combining the attention mechanism and LSTM for sentiment classification. Most of the above methods are based on Word2Vec for feature extraction. Although Word2Vec effectively captures the positional and lexical meaning of words, bullet subtitle text is short text, for which using Word2Vec alone is not effective.
Ensemble learning trains multiple weaker classifiers and combines them with a fusion strategy to obtain a more powerful classifier. Wang and Yue [21] pointed out that, in specific problems and scenarios, combining the advantages of multiple models may yield better classification results. Xie et al. [22] proposed a high-precision EEG sentiment recognition model by fusing XGBoost, LightGBM, and random forest.
The contributions of this paper are as follows. First, by analyzing bullet subtitle text data and collecting Internet buzzwords from recent years, we obtain a buzzword lexicon and an emoji lexicon and combine them with the Sentiment Vocabulary Ontology Library to construct the Chinese Bullet Subtitle Sentiment Lexicon. Second, we construct a gain intensity from the multidimensional information of the bullet subtitle and combine it with the text affective computing method to propose Bullet Subtitle Calculation (BS-CAL), a multidimensional sentiment value calculation method for the bullet subtitle. Finally, based on an analysis of existing algorithms, we propose a sentiment classification method for bullet subtitle text and conduct comparative experiments across different methods and data sets.

Related Work
2.1. GRU. GRU [15] is a variant of RNN. Compared with LSTM, GRU has the advantages of fewer parameters and faster convergence speed, which can greatly reduce the training time of the model and achieve results close to or even surpassing LSTM (see Figure 1).
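To make the comparison with LSTM concrete, a single GRU update can be sketched as follows. This is a scalar illustration of the standard GRU equations, not the model trained in this paper. The function name `gru_step`, the parameter dictionary, and the interpolation convention h' = (1 - z) * h + z * h_cand are assumptions made for the example (some references swap the roles of z and 1 - z). A GRU uses two gates and no separate cell state, which is where the smaller parameter count comes from.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by both GRU gates."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h_prev: float, p: dict) -> float:
    """One scalar GRU update; `p` holds hypothetical weights and biases."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds the candidate state.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    # Candidate hidden state.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Interpolate between the previous state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


# With all parameters zero, both gates equal 0.5 and the candidate is 0,
# so the previous hidden state is simply halved.
zero_params = {name: 0.0 for name in
               ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")}
h_next = gru_step(x=1.0, h_prev=1.0, p=zero_params)
```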

2.2. Ensemble Learning. For some classification tasks, it may be difficult to train a single learner with the best performance, but it is possible to combine multiple learners with ordinary performance into a combined classifier with better performance. Ensemble learning is divided into homogeneous ensemble learning and heterogeneous ensemble learning. In homogeneous ensemble learning, all learners are of the same type; such learners are called base learners, and this approach is usually combined with boosting or bagging. In heterogeneous ensemble learning, the learners are of different types; such learners are called component learners, and this approach is usually used in conjunction with a fusion strategy.
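As an illustration of a fusion strategy for heterogeneous component learners, the sketch below averages the class probabilities produced by three learners (soft voting). Soft voting is only one possible strategy and is not claimed to be the one used in this paper; the function name `soft_vote` and the probability values are hypothetical.

```python
from typing import Dict, List


def soft_vote(probabilities: List[Dict[str, float]]) -> str:
    """Average per-class probabilities over learners and return the best class."""
    labels = probabilities[0].keys()
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three different component learners for one text.
outputs = [
    {"positive": 0.90, "negative": 0.10},
    {"positive": 0.40, "negative": 0.60},
    {"positive": 0.45, "negative": 0.55},
]
fused = soft_vote(outputs)
```

In this example two of the three learners lean negative, yet the fused label is positive because the first learner is far more confident. A hard majority vote would decide the other way, which is why the choice of fusion strategy matters.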

2.3. Other Methods. The Word2Vec model [9] effectively alleviates the curse of dimensionality and the lexical gap brought by traditional text feature extraction methods such as the bag-of-words model, one-hot encoding, and TF-IDF.
The attention mechanism [23] is based on the uneven distribution of human attention when observing objects. When the attention mechanism is applied to a machine learning model, the model can focus on the currently important information and learn it fully, thereby improving classification performance. Naive Bayes [24] is a generative model, an optimized classifier based on Bayesian networks and Bayesian statistics. The model assumes that the features are independent of each other. It predicts the posterior probability of a sample belonging to each category based on the prior distribution of sentiment words in the sentiment lexicon and selects the category with the highest probability as the prediction result.
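The Naive Bayes decision rule described above can be sketched as follows. This is a generic multinomial Naive Bayes with add-one smoothing over sentiment words, offered as an assumption about how such a classifier might look rather than as the authors' implementation. The names `train_nb` and `predict_nb` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_nb(samples: List[Tuple[List[str], str]]):
    """Count class frequencies and per-class word frequencies."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(words: List[str], model) -> str:
    """Return the class with the highest log-posterior."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            # Independence assumption: per-word log-likelihoods simply add up.
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical), already reduced to sentiment words.
toy_data = [
    (["funny", "love"], "positive"),
    (["great", "funny"], "positive"),
    (["boring", "hate"], "negative"),
    (["awful", "boring"], "negative"),
]
model = train_nb(toy_data)
prediction = predict_nb(["funny", "great"], model)
```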

Bullet Subtitle Sentiment Classification Method
3.1. Sentiment Classification Process. Currently, existing Chinese sentiment lexicons do not contain the buzzwords found in bullet subtitle texts, which leads to inaccurate affective computing or even the inability to calculate sentiment. Therefore, we crawl a large amount of bullet subtitle data from Bilibili and analyze it systematically, collect common buzzwords on the Internet, and finally fuse them to obtain the Chinese Bullet Subtitle Sentiment Lexicon. At the same time, the analysis reveals that the information of other dimensions of the bullet subtitle has some influence on its sentiment, which is not taken into account by the traditional text affective computing method. Therefore, we propose BS-CAL by fusing the textual information of bullet subtitle texts with the information of other dimensions. Finally, bullet subtitle sentiment classification is handled in two cases according to whether the bullet subtitle contains sentiment words: when it contains sentiment words, a heterogeneous ensemble learning method is used for prediction; when it does not, the sentiment lexicon-based method cannot be used, and prediction backs off to a single model (see Figure 2).

Chinese Bullet Subtitle Sentiment Lexicon Construction.
The sentiment lexicon used in this paper is mainly derived from the Sentiment Vocabulary Ontology Library [4]. The lexicon contains more than 27,000 sentiment words and describes sentiments in terms of seven sentiment dimensions: happiness, like, surprise, sadness, fear, anger, and disgust. Happiness and like belong to the positive sentiment, while the other five belong to the negative sentiment. The weight of each sentiment is divided into five levels: 1, 3, 5, 7, and 9, with 1 representing the lowest weight and 9 representing the highest weight.

Wireless Communications and Mobile Computing
There are a large number of special words, buzzwords, and internet phrases in the bullet subtitle text, and we obtained the bullet subtitle buzzword lexicon containing 2659 words. At the same time, the bullet subtitle text contains a large number of emojis, which contain rich information. Therefore, we collected 431 emojis as the emoji lexicon.
According to the ranking criteria of the Sentiment Vocabulary Ontology Library, the above buzzwords and emojis are manually weighted and scored. Finally, the Sentiment Vocabulary Ontology Library, the bullet subtitle buzzword lexicon, and the emoji lexicon are fused to obtain a relatively complete Chinese Bullet Subtitle Sentiment Lexicon. The weights and sentiment categories of some emojis are shown in Table 1.
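The fusion of the three lexicons can be pictured as a dictionary merge. The sketch below is only an illustration: the entry layout, the example words and weights, and the rule that later lexicons override earlier ones on conflict are all assumptions, since the conflict policy is not specified here. None of the entries are taken from Table 1.

```python
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    category: str  # one of the seven sentiment dimensions
    weight: int    # intensity level on the base lexicon's scale


def merge_lexicons(*lexicons: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge lexicons left to right; later lexicons override earlier ones."""
    merged: Dict[str, Entry] = {}
    for lexicon in lexicons:
        merged.update(lexicon)
    return merged


# Hypothetical entries for illustration only.
base_lexicon = {"happy": Entry("happiness", 5), "awesome": Entry("like", 5)}
buzzword_lexicon = {"yyds": Entry("like", 9), "awesome": Entry("like", 7)}
emoji_lexicon = {"(^_^)": Entry("happiness", 5)}

bullet_lexicon = merge_lexicons(base_lexicon, buzzword_lexicon, emoji_lexicon)
```

Passing the base lexicon first means that a manually scored buzzword entry takes precedence when the same word also appears in the general-purpose lexicon.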

Affective Computing.
In text sentiment analysis, it is sometimes necessary not only to know the sentiment tendency of the text but also to represent the sentiment quantitatively. Current text affective computing methods do not take the characteristics of the bullet subtitle into account, which results in poor quantitative accuracy. In view of this, this paper proposes BS-CAL to describe the sentiment value of each dimension in detail. When sending bullet subtitles, some users deliberately choose prominent formats, such as a particular font size or color, to express strong sentiments. Therefore, BS-CAL extends the traditional text affective computing method with the information of other dimensions of the bullet subtitle. In the BS-CAL formula (equation (1)), sentiValue(d, c) is the sentiment value of bullet subtitle d under sentiment category c, and ξ(d, c) is the gain intensity of the bullet subtitle itself. In the formula for ξ(d, c) (equation (2)), W_c is the set of sentiment words belonging to category c, and fontSize(d) is the font size of the bullet subtitle; generally speaking, the bigger the font, the stronger the sentiment. θ(d) (equation (3)) indicates whether the color of the bullet subtitle is the default color black. η(d) (equation (4)) indicates whether the bullet subtitle is a special bullet subtitle such as a flashing or rotating bullet subtitle. textValue(d, c) is the text sentiment value of bullet subtitle d under sentiment category c.
In the formula for textValue(d, c) (equation (5)), neg_w is the number of negatives preceding the sentiment word w, μ_w is the magnitude of the sentiment value of the word itself, P_w is the set of sentiment punctuation marks immediately following the sentiment word w, α_p is the sentiment value of a sentiment punctuation mark, D_w is the set of degree adverbs preceding the sentiment word w, and β_d is the strength of a degree adverb. φ_{w,c} (equation (6)) is the sentiment reversal variable of word w when calculating the sentiment category c. For the sentiment category c, the BS-CAL algorithm is described in Algorithm 1. Assuming that the bullet subtitle contains n words, the time complexity of BS-CAL is linear, O(n). In the task of this paper, the sentiment value of a bullet subtitle with sentiment words can be calculated directly according to the above formulas, and the final sentiment classification can be obtained by

Classification Algorithm.
In bullet subtitle text classification, sentiment words often play a decisive role in the result, but some bullet subtitle texts do not contain any sentiment words, so the classification method based on the sentiment lexicon is not fully applicable to bullet subtitle sentiment analysis. In view of this, we divide bullet subtitles into two categories according to whether they contain sentiment words: when no sentiment words are included, the Gated Recurrent Unit classification model combined with the attention mechanism (ATT-GRU) is used for prediction; when sentiment words are included, a heterogeneous ensemble of three models, BS-CAL, Naive Bayes, and ATT-GRU, is adopted. Using these three models fuses the advantages of each: the ATT-GRU classification model based on Word2Vec makes full use of the semantic and positional relationships between words; the BS-CAL method is good at handling bullet subtitles containing sentiment words and performs well on bullet subtitles with strong sentiment; and the Naive Bayes method based on the sentiment lexicon fully considers the implicit influence of different combinations of sentiment words. The model construction is shown in Figure 3.
We abbreviate the bullet subtitle sentiment classification method as BSSCM. The specific algorithm is described in Algorithm 2. Assuming that each bullet subtitle contains n words, the time complexity is O(m * n) for the Naive Bayes classifier, O(m * n) for BS-CAL, and O(n * m^2) for ATT-GRU, which has the highest time complexity. According to the heterogeneous ensemble learning strategy, the overall time complexity is determined by the component learner with the highest complexity, so the overall time complexity is O(n * m^2).
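The two-branch routing of BSSCM can be sketched as follows. The three predictors are passed in as plain callables, and the fusion step uses a majority vote; both choices are assumptions for illustration, since the exact fusion rule is not spelled out in the text. The name `bsscm_predict` and the stub models are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Set

Predictor = Callable[[List[str]], str]


def bsscm_predict(words: List[str],
                  sentiment_words: Set[str],
                  bs_cal: Predictor,
                  naive_bayes: Predictor,
                  att_gru: Predictor) -> str:
    """Route one segmented bullet subtitle through the two-branch scheme."""
    if not any(w in sentiment_words for w in words):
        # No sentiment words: back off to the single ATT-GRU model.
        return att_gru(words)
    # Sentiment words present: fuse the three component learners.
    votes = Counter([bs_cal(words), naive_bayes(words), att_gru(words)])
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for the three trained models.
lexicon = {"funny", "awful"}
rule = lambda ws: "positive" if "funny" in ws else "negative"
stub_gru = lambda ws: "negative"

with_words = bsscm_predict(["really", "funny"], lexicon, rule, rule, stub_gru)
without_words = bsscm_predict(["what", "happened"], lexicon, rule, rule, stub_gru)
```

With two sentiment classes and three voters a tie cannot occur, which keeps the majority rule well defined in this setting.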

Experiments and Evaluations
4.1. Data Set. In this paper, we use a crawler program to crawl bullet subtitle data from Bilibili. We select about 120,000 bullet subtitles in the fields of animation, movies, and major events such as the Huawei event and the TikTok event, and retain about 100,000 bullet subtitles after data cleaning. Each bullet subtitle record contains 9 fields (see Table 2).

Algorithm 1: BS-CAL.
Step 1: extract all sentiment words W_c belonging to sentiment category c in d; initialize i = 0, score = 0, tmp = 0.
Step 2: find all the degree adverbs D_{w_i} and the number of negatives neg_{w_i} between the sentiment word w_i and the previous sentiment word.
Step 3: find all the sentiment punctuation P_{w_i} between the sentiment word w_i and the next sentiment word.
Step 4: use equation (6) to obtain the sentiment reversal variable φ_{w_i,c}.
Step 5: use equation (5) to calculate the text sentiment value of the current sentiment word, denoted value.
Step 6: when i is greater than or equal to |W_c|, execute Step 7; otherwise, set tmp = tmp + value and i = i + 1, and execute Step 2.
Step 7: calculate the gain intensity ξ of d using the corresponding dimensional features of d and equation (2); score = tmp + ξ.
Step 8: output score and the algorithm ends.
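The control flow of the BS-CAL steps can be sketched in Python as below. Only the loop structure follows the steps (windows between neighbouring sentiment words, accumulation into tmp, and score = tmp + ξ). The per-word formula is a placeholder assumption standing in for equations (5) and (6), and the gain is passed in as a plain number instead of being computed with equation (2). All names and example values are hypothetical.

```python
from typing import Dict, List, Set


def bs_cal_score(tokens: List[str],
                 category_words: Dict[str, float],
                 negations: Set[str],
                 degree_adverbs: Dict[str, float],
                 punctuation: Dict[str, float],
                 gain: float) -> float:
    """Accumulate a sentiment score for one category over a segmented text."""
    # Step 1: positions of the sentiment words of this category.
    positions = [i for i, t in enumerate(tokens) if t in category_words]
    tmp = 0.0
    for k, pos in enumerate(positions):
        prev_pos = positions[k - 1] if k > 0 else -1
        next_pos = positions[k + 1] if k + 1 < len(positions) else len(tokens)
        before = tokens[prev_pos + 1:pos]   # Step 2 window
        after = tokens[pos + 1:next_pos]    # Step 3 window
        neg = sum(1 for t in before if t in negations)
        strength = 1.0
        for t in before:
            strength *= degree_adverbs.get(t, 1.0)
        punct = sum(punctuation.get(t, 0.0) for t in after)
        # Placeholder per-word value (assumption, not equations (5)-(6)).
        value = ((-1) ** neg) * category_words[tokens[pos]] * strength * (1.0 + punct)
        tmp += value                        # Step 6 accumulation
    return tmp + gain                       # Step 7: score = tmp + gain


score = bs_cal_score(
    tokens=["not", "very", "funny", "!", "so", "great"],
    category_words={"funny": 5.0, "great": 7.0},
    negations={"not"},
    degree_adverbs={"very": 2.0, "so": 1.5},
    punctuation={"!": 0.5},
    gain=1.0,
)
```

Because the windows between consecutive sentiment words do not overlap, each token is inspected a constant number of times, which is consistent with the linear O(n) complexity stated for BS-CAL.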
About 13,000 bullet subtitles were extracted from the full data set for manual sentiment annotation and used as the training data set. The annotated data contains about 6,000 positive and about 7,000 negative bullet subtitles, of which about 8,500 contain sentiment words.
In order to verify the stability of the classification performance of the proposed method, we also crawled bullet subtitles from videos about recent hot events. This data set covers three events: "Ma became famous overnight" (event 1), "nCoV mutation" (event 2), and "innocent young man Ding" (event 3). The specific data is shown in Table 3.

Sentiment Lexicon Comparison.
In order to verify that the sentiment lexicon proposed in this paper is more applicable to the bullet subtitle domain, we compare it with the Chinese Sentiment Vocabulary Ontology Library [4] and HowNet Sentiment Lexicon for experiments, and the experimental algorithm uses the bullet subtitle sentiment classification method proposed in this paper.

Affective Computing Method Comparison.
In order to verify that the BS-CAL method proposed in this paper can quantify the bullet subtitle sentiment more effectively, we compare it with Traditional Text Calculation (TTC) and the Sentiment Value Calculation of Danmaku (SVCD) method proposed by Li et al. [8].

Algorithm Comparison.
In order to verify the effectiveness of the method proposed in this paper, we compare it with the following algorithms. Ma and Zhang [11] proposed a TF-IDF and SVM-based sentiment polarity analysis model for bullet subtitle text, which was the first to apply machine learning methods to the bullet subtitle sentiment classification problem. Zhuang and Liu [20] proposed sentiment analysis based on AT-LSTM, which performed well on the sentiment classification of bullet subtitles. AdaBoost and Random Forest are typical representatives of ensemble learning methods. Naive Bayes [31] and TextCNN [16] are often used as benchmarks for sentiment classification problems.

Table 2.
No.   Example                  Description
1     …                        …
2     …                        The type of the bullet subtitle
3     25                       The font size of the bullet subtitle
4     16777215                 The font color of the bullet subtitle (decimal)
5     1554387395               The unix timestamp
6     0                        The type of the bullet subtitle pool
7     d826ed2                  The sender's encrypted uid
8     14284342245720068        The rowID of the bullet subtitle in the database
9     Hahahaha really funny    The bullet subtitle text

Algorithm 2: BSSCM.
Step 1: split d_i into words and initialize flag = 1.
Step 2: determine whether d_i contains sentiment words; if it does, execute Step 3; otherwise, set flag = 0 and execute Step 5.
Step 3: input d_i into BS-CAL described in Algorithm 1 and get the classification result a.
Step 4: input d_i into the Naive Bayes classifier to obtain the classification result b.
Step 5: use one-hot to process the split data and feed it into the ATT-GRU classifier to get the classification result c.
Step 6: if flag == 1, then d_i^c is the result of the fusion of a, b, and c; otherwise, c is the final result d_i^c.
Step 7: output the bullet subtitle sentiment category d_i^c and calculate the accuracy, precision, recall, and F1-score.

Algorithm Execution Time Comparison.
We compare the training time and test time in different models to analyze model performance.

Empirical Analysis. The practicality of the classification method proposed in this paper is verified by conducting experiments on the latest events (see Table 3).

Evaluation Metrics.
In this paper, we use common evaluation metrics for text sentiment classification, namely accuracy, precision, recall, and F1-score. Precision measures classifier performance with respect to the prediction results, recall measures it with respect to the actual samples, and F1-score combines these two metrics to evaluate the classifier as a whole.
The results of the sentiment lexicon comparison are shown in Table 4, where DUT-SD represents the Sentiment Vocabulary Ontology Library, BS-SD represents the sentiment lexicon proposed in this paper, and N and P denote the negative and positive sentiment categories. As shown in Table 4, BS-SD has the highest classification accuracy compared with the other two traditional sentiment lexicons. Although the HowNet sentiment lexicon has the highest recall in the positive category, it has the lowest accuracy, while the precision and F1-score of BS-SD are higher than those of the other lexicons in both the positive and negative categories. The above results show that BS-SD performs better in the bullet subtitle sentiment analysis task.
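For reference, the per-class precision, recall, and F1-score reported in the tables, together with overall accuracy, can be computed from true and predicted labels as in the following sketch. The function name `binary_metrics` and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for the example.

```python
from typing import Dict, List


def binary_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """Accuracy plus precision, recall, and F1 for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Precision looks at predictions; recall looks at the actual samples.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = ["P", "P", "P", "N", "N", "N"]
y_pred = ["P", "P", "N", "P", "N", "N"]
metrics_p = binary_metrics(y_true, y_pred, positive="P")
```

Calling the function once with `positive="P"` and once with `positive="N"` gives the two per-class rows that the result tables report.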

Affective Computing Method Comparison.
In order to verify that BS-CAL proposed in this paper can more accurately quantify the bullet subtitle sentiment, we compare it with different computing methods. The results are shown in Table 5.
As can be seen from Table 5, SVCD performs better on negative sentiment, with the highest F1-score of 89%, while BS-CAL achieves 92.5% precision on positive sentiment and has the relatively highest F1-score and classification accuracy. In summary, BS-CAL can quantify the bullet subtitle sentiment more accurately.

Algorithm Comparison.
In order to verify the accuracy and generalization ability of the classification method proposed in this article, we compare different algorithms with it. The results are shown in Table 6.
From Table 6, it can be seen that AT-LSTM has the highest precision in the negative category (96.2%), and ATT-GRU has the best overall performance in the negative category. AdaBoost and BSSCM share the highest precision in the positive category (95.8%), and RF has the highest recall in the positive category. BSSCM has the highest classification accuracy (94.6%) and the best performance in the positive category. In summary, for the bullet subtitle sentiment classification problem, BSSCM has the best overall performance.

Algorithm Execution Time Comparison. We measured the training and testing time of each model, and the results are shown in Table 7.
The above experiments were conducted on a PC with a 2.6 GHz hexa-core Intel Core i7. From Table 7, it can be seen that AdaBoost takes the longest time in training, followed by BSSCM. BS-CAL does not require training and can perform prediction directly, so its training time is 0. The ensemble learning methods take longer in the test phase: AdaBoost takes the longest time in testing, and BSSCM the second longest. BS-CAL has no complex model parameters, so its test time is the shortest.

Empirical Analysis.
In order to verify the practicability of the bullet subtitle sentiment classification method proposed in this paper, we conduct experiments on three events (see Table 3), and the results are shown in Figure 4. Figure 4 shows that the performance of each model differs slightly across hot topics. RF has a higher accuracy than BSSCM in event 1, which may be due to the presence of unregistered (out-of-vocabulary) words in this topic that slightly reduce the performance of BSSCM, while BSSCM performs significantly better than the other models on the other topics. Therefore, the method proposed in this paper is relatively effective for the sentiment classification of the bullet subtitle.

Conclusion
Based on the traditional sentiment lexicon, this paper proposes the Chinese Bullet Subtitle Sentiment Lexicon by combining the bullet subtitle buzzword lexicon and the emoji lexicon. Then, combined with the information of other dimensions of the bullet subtitle, BS-CAL is proposed. Finally, considering that bullet subtitle text does not necessarily contain sentiment words, a single classification model, ATT-GRU, and a heterogeneous ensemble learning method based on BS-CAL, Naive Bayes, and ATT-GRU are proposed to deal with the two situations, respectively.
However, this work still has some shortcomings. First, new vocabulary keeps emerging over time, so the lexicon becomes outdated. Future work could mine buzzwords from bullet subtitles automatically so that the lexicon can be updated adaptively. Second, there is no public data set for the bullet subtitle, and the labeled data in this paper is subjective to a certain extent. Therefore, a general data set for the bullet subtitle domain could be proposed in the future. Finally, this paper divides the sentiment categories into positive and negative only. In the future, we can study how to classify sentiment at a finer granularity.

Data Availability
The data used to support the findings of this study are included in the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.