Multimodal Blog Sentiment Classification Based on MD-HFCE

In recent years, the rapid growth of multimodal information has become an important factor affecting the results of sentiment analysis. However, few state-of-the-art works take both multimodal features and sentiment fuzziness into account. To this end, this paper proposes a fuzzy method for assessing sentiment intensity. Firstly, based on the visual-text conversion network (CNN-LSTM) and sentiment optimization through SentiBank and SentiBridge, the visual features are normalized to text features. At the same time, the emotional features of the extracted audio are predicted by the random forest algorithm. Subsequently, the sentiment characteristics are processed by dual hesitant fuzzification to form positive and negative sentiment intensity factors. Finally, a classification method, MD-HFCE (multilayer dual hesitant fuzzy comprehensive evaluation), which is a fuzzy comprehensive evaluation method improved by Mamdani fuzzy reasoning, is proposed to realize multifeature fuzzy sentiment classification based on the comprehensive sentiment dictionary. The classification results are applicable to topic sentiment monitoring. The experimental results show that the proposed algorithm can effectively realize feature integration and improve the average sentiment classification accuracy of multimodal blogs to 82.2%.


Introduction
With the advent of the information age, an enormous amount of data is generated by users on the Internet in real time. It is important to utilize these data for sentiment analysis to achieve public opinion monitoring, stock market prediction, and consumption preference analysis [1]. Due to the diversity of social information, multimodal sentiment analysis has attracted great attention from researchers, and various methods have been investigated in this research field. The traditional dictionary method ignores a lot of multimodal information that carries emotion. Although extended dictionaries can solve the problem to some extent, the performance improvement is still limited. Sentiment analysis approaches based on machine learning and neural networks can effectively utilize the multimodal information. However, they fail to consider sentiment fuzziness and may have long runtimes because they process large volumes of image and video data.
To solve these problems, the multilayer dual hesitant fuzzy comprehensive evaluation (MD-HFCE) method is proposed in this paper. It is mainly based on a fuzzy comprehensive evaluation model improved by Mamdani fuzzy reasoning. Moreover, the feature transformation model built on the convolutional neural network and long short-term memory (CNN-LSTM) network is utilized. The main contributions of this paper are summarized as follows: (1) The dual hesitant fuzzy set is used to fuzzify the sentiment intensity, which considers both positive and negative sentiment factors at the same time. The rest of this paper is organized as follows: in Section 2, we briefly introduce recent work on sentiment analysis. In Section 3, we describe the overall process of the proposed method, and the experimental results are given in Section 4. Finally, we conclude the study in Section 5.

Related Work
At present, the main approaches to sentiment analysis can be divided into four categories: sentiment dictionary, machine learning, deep learning, and hybrid methods. The most common approach to sentiment analysis is to build extended dictionaries [2,3]. The concept of sentiment is considered based on the dictionary, and sentiment analysis is realized through sentiment embedding [4]. With the automatic construction of the domain dictionary, the context is considered, and thus the performance of the basic dictionary can be improved [5].
The work in [6-10] studied machine learning based sentiment analysis. Support Vector Machine (SVM) is used to achieve the sentiment analysis of images by combining it with SentiBank, which is formed using adjective-noun pairs [6].
Thelwall et al. [7] considered both positive and negative sentiment and proposed the SentiStrength algorithm to reduce affective disruption. In [8], a naive Bayesian network was established for sentiment evolution experiments. Yuan et al. [9] introduced the Sentribute algorithm for image sentiment classification, which constructs a sentiment prediction framework based on image features. Chen et al. [10] expanded the review text using a knowledge map to improve the accuracy of sentiment analysis on online travel review texts.
CNN can effectively integrate multimodal information [11,12], and increasing the number of convolutional layers improves the sentiment analysis of long text [13]. In addition, the LSTM network pays attention to the semantic environment where the sentiment words are located [14]. As a result, the combination of CNN and LSTM networks can improve the classification accuracy by integrating features with time factors [15]. The neural network model combined with an extended dictionary presents better performance than approaches based on either the sentiment dictionary or the neural network alone [16]. Luo proposed to combine the neural network with the Latent Dirichlet Allocation (LDA) model for network text sentiment classification in [17]. Gu et al. [18] analyzed the text sentiment of commodity evaluations, combining the neural network model with semantic rules, context, and other factors. The improved artificial neural network (ANN) also has great advantages in research such as prediction [19].
There are also many other methods of extracting terms for specific sentiment classification [20]. Sheik et al. [21] proposed the sentinel method, which establishes a sentiment circle in the Cartesian coordinate system for the sentiment classification of sentences. Alzubi et al. [22] presented a Collaborative Adversarial Network (CAN) model for paraphrase identification. Bel'tyukov et al. [23] utilized logical formulas and logical reasoning to achieve emotional analysis. By exploiting the fuzziness of sentiment words, Phan et al. [24] obtained a fuzzy embedding feature and effectively improved the F1 value of Twitter sentiment analysis. Vashishtha et al. [25] defuzzified the output to achieve sentiment analysis, based on sentiment intensity fuzzification and fuzzy inference rules.

Establishment of Exclusive Fuzzy Dictionary for Microblog.
The comprehensive sentiment dictionary consists of seven separate dictionaries, among which the fuzzy dictionary is specifically established for microblogs, while the other dictionaries have general applicability. The crawled microblog text data is divided into two parts: one for establishing the exclusive fuzzy dictionary and the other for testing. A total of 45,344 items are used to establish the fuzzy dictionary. The statistics for the data are detailed in Figure 1. The positive rate, the neutral rate, and the negative rate are 27%, 29%, and 44%, respectively, which strikes a rough balance among the different emotions. The crawled data is preprocessed with word segmentation, and the results are used as the input of the TextRank algorithm.
TextRank is a graph-based ranking algorithm. It constructs a network according to the adjacency relationship among words and iteratively calculates the ranking score (i.e., importance) of each node (i.e., word) [26]. The importance of the same word in a post is accumulated. The fuzzy dictionary is finally formed by removing the repeated words. Table 1 details the pseudocode of the algorithm, where w represents the importance of a word and NUM denotes the total number of posts.
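The ranking step can be illustrated with a minimal Python sketch. It is not the pseudocode of Table 1: the co-occurrence window, the damping factor of 0.85, the fixed iteration count, and the function names `textrank` and `accumulate_importance` are all assumptions made for illustration, and the accumulation is performed across posts as one possible reading of the text.

```python
from collections import defaultdict


def textrank(words, window=2, damping=0.85, iterations=50):
    """Score the words of one segmented post with unweighted TextRank."""
    # Build an undirected co-occurrence graph: words within `window`
    # positions of each other are connected.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    # Iterate the PageRank-style update a fixed number of times.
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return scores


def accumulate_importance(posts):
    """Sum each word's score over all posts; keys are distinct words."""
    total = defaultdict(float)
    for words in posts:
        for w, s in textrank(words).items():
            total[w] += s
    return dict(total)
```

Because the result is keyed by word, repeated words collapse into a single dictionary entry whose value is the accumulated importance w.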

Feature Extraction and Processing of Multimodal Data
3.2.1. Feature Extraction of Blog Images. Based on the intermediate features of images, this paper uses the visual-to-text conversion model proposed by Google in 2015: image features are extracted by the CNN encoder and then input into the LSTM decoder to obtain the text description feature of the image [27]. To adapt to sentiment classification, the text sentiment is expanded by two sentiment knowledge maps, namely, SentiBank and SentiBridge [28], as shown in Figure 2. In addition, before the feature extraction of the image description, the text features and face features contained in the image are extracted first. The related techniques for extracting text and facial expressions from images have been well studied. In this paper, the APIs provided by Baidu are used to extract these two image features.

Video Features Extraction.
Video features consist of image and audio features. The image feature extraction in the video can follow the method discussed in Section 3.2.1. However, regardless of the duration and frame count of a video, a huge number of still images can be captured from it, and most of the obtained images are near duplicates of each other. As a result, we filter the captured images in two steps before the image feature extraction starts.
Firstly, a YOLOv3 model is used to filter images containing similar objects. The YOLOv3 model utilizes Darknet53 as its backbone network. It can identify the subjects and the number of objects in images, especially small and medium objects [29]. After the images are selected by the YOLOv3 model, the differences among the remaining images have little effect on the sentiment classification results. To this end, a second round of image selection is required. The Scale Invariant Feature Transform (SIFT) algorithm is adopted to detect the key points of the images, and the local similarity among the images is subsequently calculated [30]. The two-step image selection greatly reduces the number of images for feature extraction, and thus the efficiency of image processing is improved.

Table 1 (steps 13-17 of the pseudocode):
(13) pos = num(word in positive)/NUM
(14) neg = num(word in negative)/NUM
(15) other = 1 - pos - neg
(16) END if
(17) Remove the common and incorrect words from FW to form a final fuzzy dictionary W with 593 words, as shown in Table 2

For the audio feature extraction, the Mel-frequency cepstral coefficient (MFCC) is regarded as the most representative feature in the literature [31]. In addition to the MFCC, two other audio features are considered in this paper: the zero-crossing rate and the spectral centroid. Specifically, the zero-crossing rate represents the number of times the signal crosses zero, while the spectral centroid denotes the texture of the audio. The two features assist the MFCC features in distinguishing the sentiment of audio. A 22-dimensional vector is introduced to represent the three considered audio features, with which an optimal random forest model is trained for the sentiment classification of video audio.
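The two auxiliary audio features can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the whole signal is treated as a single frame, the spectral centroid uses a naive DFT, and the MFCC computation and the exact layout of the 22-dimensional vector are not reproduced because the text does not specify them.

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(signal)
    half = n // 2 + 1  # non-negative frequency bins only
    magnitudes = []
    for k in range(half):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    frequencies = [k * sample_rate / n for k in range(half)]
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / total
```

For a pure tone, the centroid coincides with the tone frequency, which makes the function easy to sanity-check before it is applied to real speech frames.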

The Ambiguity of the Sentiment Intensity.
In this paper, fuzzy inference rules are used to fuzzify the sentiment intensity. Dual hesitant fuzzy sets are introduced for the sentiment intensity, and general fuzzy sets are used as the output of the reasoning model.

Dual Hesitant Fuzzy Sets
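Definition 1. Let X be a fixed set; then, a dual hesitant fuzzy set (DHF) D on X is described as [32]
$$D = \{\langle x, h(x), g(x)\rangle \mid x \in X\}, \quad (1)$$
in which h(x) and g(x) are two sets of values in [0, 1], denoting the possible membership degrees and nonmembership degrees of the element x ∈ X to the set D, respectively, with the conditions
$$0 \le \gamma, \eta \le 1, \qquad 0 \le \gamma^{+} + \eta^{+} \le 1,$$
where $\gamma \in h(x)$, $\eta \in g(x)$, $\gamma^{+} = \max_{\gamma \in h(x)} \gamma$, and $\eta^{+} = \max_{\eta \in g(x)} \eta$. Note that this paper only deals with the dual hesitant fuzzification in which h(x) and g(x) each contain exactly one value, $\gamma$ and $\eta$, respectively. Moreover, $\pi_D$ denotes the uncertainty of the element x belonging to D in X, which is called the swing degree in this paper. Then, we have
$$\pi_D(x) = 1 - \gamma - \eta. \quad (2)$$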
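The single-valued case used in this paper can be captured in a small data structure. The sketch below is illustrative only: the class name `DualHesitantElement` is hypothetical, and the swing degree is assumed to be the remainder 1 - γ - η, which is the usual definition when there is one membership value and one nonmembership value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualHesitantElement:
    """One membership value (gamma) and one nonmembership value (eta)."""

    membership: float      # gamma, in [0, 1]
    nonmembership: float   # eta, in [0, 1]

    def __post_init__(self):
        # Enforce the conditions of a dual hesitant fuzzy element.
        if not (0.0 <= self.membership <= 1.0):
            raise ValueError("membership must lie in [0, 1]")
        if not (0.0 <= self.nonmembership <= 1.0):
            raise ValueError("nonmembership must lie in [0, 1]")
        if self.membership + self.nonmembership > 1.0:
            raise ValueError("membership + nonmembership must not exceed 1")

    @property
    def swing(self):
        """Swing degree (assumed to be 1 - gamma - eta)."""
        return 1.0 - self.membership - self.nonmembership
```

Validating the constraints at construction time keeps invalid pairs from propagating into the later weighting step, where the swing degree is reused.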

Sentiment Value Calculation.
In the fuzzy dictionary, the sentiment value is twice the strong ambiguity and equal to the weak ambiguity, which is calculated by pos (positive) and neg (negative). For the basic sentiment dictionary, the sentiment value is determined by pos_b and neg_b. The degree adverb dictionary and the negative word dictionary strengthen and weaken the sentiment intensity, and their values are denoted as C and N, respectively. Moreover, the text sentiment is calculated using E_pos and E_neg. The facial sentiment is obtained by F_pos and F_neg, while the speech sentiment is determined by A_pos and A_neg. Table 3 presents the calculations of sentiment values in the different dictionaries, where k is the existence coefficient, namely, the number of the corresponding sentiment units; k is set to 0 if the unit does not exist. In Table 3, n and m represent the number of sentiment words with an even and an odd number of negative words in the text, respectively. S_pos and S_neg represent the sum of positive sentiment values and the sum of negative sentiment values calculated by the fuzzy dictionary and the basic dictionary, respectively. To simplify the determination of the membership function, the positive and negative sentiment values are unified. The total sentiment value of the blog is obtained according to the text and video sentiment values.

Membership Function and Nonmembership Function.
The sentiment intensity is defined by three different levels, that is, low, middle, and high. As mentioned above, we use dual hesitant fuzzy sets to fuzzify the sentiment intensity of posts. In this way, the membership and nonmembership functions correspond to the positive and negative sentiments for the three-level sentiment intensity, respectively, and they meet the conditions in Section 3.3.1. Specifically, the upper and lower formulas in equations (3)-(5) give the membership and nonmembership functions for the sentiment intensity with middle, low, and high levels, respectively. [33]. Considering the situation of multimodal blog sentiment classification in reality, the fuzzy inference rules on sentiment intensity are given as follows:

MD-HFCE Sentiment
where R_p and R_n represent the membership degrees of the positive and negative sentiment intensity, respectively. r represents the fuzzy set of sentiment intensity, that is, [low-level, middle-level, high-level]. S denotes the final membership degree of multimodal blogs. s is the fuzzy set of blog sentiment classification, that is, [low-positive, strong-positive, neutral, low-negative, strong-negative]. Table 4 elaborates the nine fuzzy inference rules of the Mamdani model. To simplify the calculation, after the membership degree of each category is determined, the neutral membership degree is obtained by comparisons among the different categories. Therefore, in the membership function of the Mamdani output, y = 0 is the intermediate membership function of the model output results, as shown in Figure 3.
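The rule evaluation can be sketched with the usual Mamdani min-max composition. This is an illustrative sketch rather than the paper's exact procedure: the consequents of rules r1 and r9 are assumed to be neutral, the neutral class is handled here as an ordinary rule output instead of by later comparison, and the function name `mamdani_infer` is hypothetical.

```python
# Rule base keyed by (positive intensity level, negative intensity level).
RULES = {
    ("low", "low"): "neutral",             # r1: ASSUMED consequent
    ("low", "medium"): "low-negative",     # r2
    ("low", "high"): "strong-negative",    # r3
    ("medium", "low"): "low-positive",     # r4
    ("medium", "medium"): "neutral",       # r5
    ("medium", "high"): "low-negative",    # r6
    ("high", "low"): "strong-positive",    # r7
    ("high", "medium"): "low-positive",    # r8
    ("high", "high"): "neutral",           # r9: ASSUMED consequent
}


def mamdani_infer(pos, neg):
    """Return the membership degree of each sentiment class.

    `pos` and `neg` map "low"/"medium"/"high" to the membership degrees of
    the positive and negative sentiment intensity, respectively.
    """
    output = {}
    for (pos_level, neg_level), label in RULES.items():
        # Firing strength of a rule: minimum of its two antecedents.
        strength = min(pos[pos_level], neg[neg_level])
        # Aggregate rules with the same consequent by taking the maximum.
        output[label] = max(output.get(label, 0.0), strength)
    return output
```

For example, a post with mostly medium positive intensity and high negative intensity fires r6 most strongly, so the low-negative class receives the largest degree.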

The Improved Fuzzy Comprehensive Evaluation Model.
A fuzzy comprehensive evaluation model is a comprehensive method based on fuzzy mathematics. It transforms qualitative evaluation into quantitative evaluation using the membership degree theory of fuzzy mathematics. According to the classification characteristics of the sentiment intensity, a two-level fuzzy comprehensive evaluation model is utilized, which includes five steps:
nonmiddle, nonhigh]. That is, the positive intensity can be divided into low positive, middle positive, and high positive. Nonpositive intensity can be divided into nonlow positive, nonmiddle positive, and nonhigh positive. The same goes for negative and nonnegative.
(2) Establish the two-level weight set: the weight set is the numerical reflection of the degree of influence of the various factors on the classification result. Among the primary factors, the positive and negative factors can have a larger influence on the sentiment classification than the nonpositive and nonnegative factors. As a result, the weight set of the primary factors is set to A_1 = [0.3, 0.3, 0.2, 0.2], and the weight set of the secondary sentiment factors, A_2, is determined by their own swing degree, namely, π_D in (2) in Section 3.3.1. The larger the swing degree is, the smaller the influence proportion is.
From the first-level fuzzy comprehensive evaluation of the second-level factors, the preliminary evaluation set j is obtained, where u_q is the fuzzy possibility matrix of the relevant evaluation set corresponding to the secondary factor set. The matrix is calculated by Mamdani fuzzy reasoning, which is presented in the following.
(5) Make the second-level fuzzy comprehensive evaluation: the comprehensive evaluation set J is composed of the preliminary evaluation sets j obtained by the first-level fuzzy comprehensive evaluation, from which the final evaluation result set is calculated.
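The two-level composition can be sketched as follows. Several choices here are assumptions rather than details given in the text: the weighted-sum operator is used to combine weights with evaluation matrices, the secondary weights are taken proportional to 1 - π_D and normalized, the order of the four primary factors behind A_1 is not specified, and the helper names are hypothetical.

```python
A1 = [0.3, 0.3, 0.2, 0.2]  # primary weights from the text; factor order assumed


def weighted_eval(weights, matrix):
    """Compose a weight vector with an evaluation matrix: b_j = sum_i a_i * r_ij."""
    columns = len(matrix[0])
    return [
        sum(w * row[j] for w, row in zip(weights, matrix))
        for j in range(columns)
    ]


def swing_weights(swings):
    """ASSUMPTION: weight proportional to (1 - swing degree), normalized to 1."""
    raw = [1.0 - s for s in swings]
    total = sum(raw)
    return [r / total for r in raw]


def two_level_evaluation(secondary_swings, secondary_matrices):
    """First-level evaluation per primary factor, then second-level with A1.

    secondary_swings[q] holds the swing degrees of the secondary factors of
    primary factor q; secondary_matrices[q] is its possibility matrix u_q,
    one row per secondary factor and one column per sentiment class.
    """
    preliminary = [
        weighted_eval(swing_weights(swings), u_q)
        for swings, u_q in zip(secondary_swings, secondary_matrices)
    ]
    return weighted_eval(A1, preliminary)
```

With this weighting, a secondary factor with a larger swing degree contributes less to the preliminary evaluation set, which matches the stated relation between swing degree and influence.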
The possibility of the neutral evaluation is modified, and the sentiment classification results of multimodal blogs are determined according to the principle of maximum membership. Equations (9)-(15) describe the preliminary calculation of the fuzzy set of blog post sentiment categories through Mamdani reasoning, which is the basis for the further calculation of the fuzzy possibility matrix u_q. In these equations, $pos_{r_i}$ and $neg_{r_i}$ correspond to the memberships of the positive and negative sentiment intensity of the rules in Table 4, respectively. $SC_{r_i}$ is the final sentiment classification in Table 4. $W_{r_i}$ is the fuzzy set of the positive factors $U_1$ and $U_2$, and $u_1$, $u_2$ are the fuzzy possibility matrix sets formed by $W_{r_i}$. Npos_l, Npos_m, Npos_h, Nneg_l, Nneg_m, and Nneg_h represent the positive and negative memberships with nonlow, nonmiddle, and nonhigh sentiment intensity, respectively. $pos^{l}_{r_i}$, $pos^{m}_{r_i}$, $pos^{h}_{r_i}$, $neg^{l}_{r_i}$, $neg^{m}_{r_i}$, and $neg^{h}_{r_i}$ represent the positive and negative subordination degrees with low-level, middle-level, and high-level sentiment intensity, respectively. $\langle W^{mN}_{plow,i}, W^{hN}_{plow,i}\rangle$, $\langle W^{lN}_{pmid,i}, W^{hN}_{pmid,i}\rangle$, and $\langle W^{lN}_{phig,i}, W^{mN}_{phig,i}\rangle$ are the fuzzy sets of nonlow, nonmiddle, and nonhigh positive sentiment membership. $\langle W^{mN}_{nlow,i}, W^{hN}_{nlow,i}\rangle$, $\langle W^{lN}_{nmid,i}, W^{hN}_{nmid,i}\rangle$, and $\langle W^{lN}_{nhig,i}, W^{mN}_{nhig,i}\rangle$ denote the fuzzy sets of nonlow, nonmiddle, and nonhigh negative sentiment membership. Among them, i represents the order of the rules in Table 4: $w^{mN}_{nlow,5} + w^{hN}_{nlow,9}$, $w^{lN}_{nmid,1} + w^{hN}_{nmid,9}$, $w^{lN}_{nhig,1} + w^{mN}_{nhig,5}$.

Table 4: Fuzzy inference rules of the Mamdani model.
Rule | Positive intensity | Negative intensity | Sentiment classification
r2 | Low | Medium | Low-negative
r3 | Low | High | Strong-negative
r4 | Medium | Low | Low-positive
r5 | Medium | Medium | Neutral
r6 | Medium | High | Low-negative
r7 | High | Low | Strong-positive
r8 | High | Medium | Low-positive
r9 | High | |

3.5. Application.
The first step is to select the monitoring topic, crawl the multimodal blogs under the selected topic in real time, conduct the above-mentioned sentiment analysis on the crawled blogs, and record the classification results. Subsequently, the real-time classification results are analyzed to form the trend curve of the topic sentiment, and the negative sentiment intensity is obtained. Finally, we determine whether the intensity of the negative sentiment exceeds the threshold. If the threshold is exceeded, a negative alert notifies the software so that the topic sentiment of public opinion can be adjusted in time.
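The monitoring loop can be sketched as below. It is a simplified illustration under stated assumptions: the share of negative posts in each crawl stands in for the negative sentiment intensity, the threshold value of 0.6 is purely hypothetical, and the function names are invented for this example.

```python
NEGATIVE_LABELS = ("low-negative", "strong-negative")


def negative_trend(batches):
    """Return the share of negative posts in each crawl batch (trend curve)."""
    return [
        sum(label in NEGATIVE_LABELS for label in batch) / len(batch)
        if batch
        else 0.0
        for batch in batches
    ]


def needs_alert(trend, threshold=0.6):
    """Alert when the latest negative share exceeds the hypothetical threshold."""
    return bool(trend) and trend[-1] > threshold
```

Each batch is the list of class labels produced by the classifier for one crawl, so the trend list grows by one point per crawl.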

Experimental Dataset.
The video-image filtering is based on the COCO image dataset, which has been widely used in object detection, object segmentation, caption generation, and other tasks. The dataset has about 330,000 images and was annotated by Microsoft in 2014 [34]. The Flickr30K image dataset is used to generate image description features. It contains 30,000 images, and each image is annotated with five text descriptions [35]. Moreover, for video-audio sentiment classification, 10,500 Chinese items are selected from the emotional speech dataset (ESD), which contains a large number of English and Chinese emotional speech recordings [36]. Furthermore, 10,000 posts on multiple topics are crawled from microblogs for experimental evaluation, including 4562 positive posts, 3636 negative posts, and 1802 neutral posts, with 4374 images and 412 videos.
Security and Communication Networks

Experimental Implementation.
In this section, we take the post shown in Figure 4 as an example to show the steps of the experimental implementation. The post is composed of three elements, that is, texts, emoticons, and videos, which together convey the user's sentiment. Specifically, a total of 41 images are extracted from the video. Subsequently, following the image selection scheme, six of the extracted images are retained after the preliminary selection, and two of these six remain after the second round. The two images do not present text or facial features, and their description features are similar to each other. Finally, the description feature is synthesized with sentiment expansion; that is, a sad dog is lying in a bright bathroom. Furthermore, the audio is analyzed by the random forest algorithm. It can be seen from Table 5 that the positive sentiment value of the multimodal blog is 5.69, and the negative sentiment value is 6.08. Figure 5 shows the membership degree and swing degree of the positive, nonpositive, negative, and nonnegative sentiment intensity for the post in Figure 4. Figure 6 shows the results of the u_q calculation. Figure 7 presents the fuzzy comprehensive evaluation results, which can be processed by the maximum membership degree method, the weighted average method, the F distribution method, and other methods. Among them, the maximum membership degree method is the simplest and the most popular. Therefore, this paper uses the maximum membership degree method to deal with the evaluation results. As shown in Figure 7, the low-level negative membership degree has the highest value, so the final sentiment of the blog is classified as low-level negative. Moreover, if the difference between the low-level negative and positive membership degrees is less than the threshold, the blog is regarded as neutral. It means the membership degree is 0 and vice versa. Multiple experiments show that the best performance is achieved when the threshold is set to 0.003.
The content of the blog post: "Very lucky, the kidney of Zheng does not fail, and it has been out of danger. My friend and I are going back to Litang first, as Bubu is waiting for us. A friend in Chengdu will help us to take care of it. It looked aggrieved and unhappy when we said goodbye. The poor dog really suffers but is very tough."
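The final decision step can be sketched as follows. The threshold 0.003 is the value reported in the text, but the comparison rule is only one possible reading: the winning class is compared with the class of the same strength and opposite polarity, and the names `COUNTERPART` and `classify` are hypothetical.

```python
NEUTRAL_THRESHOLD = 0.003  # value reported in the text

# ASSUMPTION: each class is compared with its opposite-polarity counterpart.
COUNTERPART = {
    "low-negative": "low-positive",
    "low-positive": "low-negative",
    "strong-negative": "strong-positive",
    "strong-positive": "strong-negative",
}


def classify(memberships, threshold=NEUTRAL_THRESHOLD):
    """Pick the class with maximum membership, falling back to neutral
    when the winner and its counterpart are closer than the threshold."""
    best = max(memberships, key=memberships.get)
    other = COUNTERPART.get(best)
    if other is not None and abs(memberships[best] - memberships[other]) < threshold:
        return "neutral"
    return best
```

A small threshold such as 0.003 means that only near ties between opposite polarities are relabeled as neutral.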

Experimental Comparisons Based on Different Dictionaries.
The proposed comprehensive dictionary is compared with HowNet [37] and the National Taiwan University Sentiment Dictionary (NTUSD) [38] in Table 6. As shown in Table 7, HowNet contains 38 propositions, while NTUSD only includes sentiment words. It can be seen from Table 6 that the proposed comprehensive fuzzy dictionary outperforms both HowNet and NTUSD. In addition, the recall rate and F1 value of HowNet are higher than those of NTUSD for both positive and neutral blogs. However, for the negative blogs, the performance of NTUSD is slightly better than that of HowNet, since the number of negative words is small.

Comparison of Methods.
To verify the proposed method, four baseline methods are compared as follows:
(1) Extended dictionary + semantic rules [2], which combines the extended dictionary with semantic rules, hereinafter referred to as Method 1.
(2) Improved multilayer CNN network [13], which increases the number of convolution layers, hereinafter referred to as Method 2.
(3) Dictionary + fuzzification, which integrates the sentiment dictionary with general fuzzification, hereinafter referred to as Method 3.
(4) Fuzzy rules + Mamdani fuzzy reasoning [23], in which Mamdani fuzzy reasoning is utilized, hereinafter referred to as Method 4.
The method in this paper introduces hesitant fuzzy sets and fuzzy evaluation rules on top of the basic fuzzy method, so that the fuzzy limit is well controlled.
The experimental results also show that the proposed method performs best among the compared methods. Its classification results are also more stable and more accurate than those of Method 2.

Application Experiment.
In this section, the topic "#teachers discriminate students after comparing the income of their parents #" is studied for the sentiment orientation experiment. The blog posts under this topic are crawled every half hour, yielding 253, 644, 1105, and 1512 blog posts in the successive crawls. Firstly, each crawled post is processed by the sentiment classification, and the proportions of positive blogs and negative blogs are obtained. Then, the proportions of low-level negative blogs and strong-negative blogs are calculated among the negative blogs. In this way, the proportion of negative sentiment blogs under the topic is monitored, which determines whether intervention is required. As shown in Figure 12, the proportion of positive sentiment blogs under the topic is not more than 12%. Although it varies over time, the number of low-negative blogs is much larger than that of strong-negative blogs in each period. This implies that the sentiment intensity is not very strong, even though people have relatively negative attitudes. Moreover, the sentiment intensity is not strengthened over time and remains at the same level. As a result, the app administrator can continue to supervise and analyze the posts of this topic without any intervention.

Conclusions
This paper proposes MD-HFCE, an improved fuzzy sentiment classification method. The method normalizes multimodal features and standardizes fuzzification. The main work of this research is the establishment of a dedicated Weibo fuzzy dictionary and the design of dual hesitant fuzzy inference classification rules. Experiments have verified that the proposed method achieves good results, with an average sentiment classification accuracy of 82.2% on multimodal blogs.
However, there are still many areas that need to be improved. First of all, the classification results for neutral blog posts are not ideal, and the model output membership function and the decision threshold need to be further optimized. Secondly, the sentiment categories are coarse and not detailed enough.
In the future, after the method in this paper is further improved, it can be combined with neural networks and knowledge graphs to form a hesitant fuzzy network for the direct conversion of fuzzy information and target recognition. In addition, the method combined with sentiment analysis can also be used in community discovery for social software, emotional robot dialogue, and so forth.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.