Adaptive Learning Emotion Identification Method of Short Texts for Online Medical Knowledge Sharing Community

The medical knowledge sharing community provides users with an open platform for accessing medical resources and sharing medical knowledge, treatment experience, and emotions. Compared with the recipients of general commodity comments, the recipients in the medical knowledge sharing community pay more attention to the intensity or overall evaluation of emotional vocabularies in the comments, such as treatment effects, prices, service attitudes, and other aspects. Therefore, the overall evaluation is not a key factor in medical service comments, but the semantics of the emotional polarity is the key factor affecting recipients of the medical information. In this paper, we propose an adaptive learning emotion identification method (ALEIM) based on mutual information feature weight, which captures the correlation and redundancy of features. To evaluate the effectiveness of the proposed method, we use four basic corpus libraries crawled from the Haodf online platform and employ the Taiwan University NTUSD Simplified Chinese Emotion Dictionary for emotion classification. The experimental results show that the proposed ALEIM method performs better in identifying the redundant features of low-frequency words in comments of the online medical knowledge sharing community.


Introduction
More and more comments, opinions, suggestions, ratings, and feedback are produced on social networks with the rapid development of the Internet [1]. Although this content is meant to be useful, exploiting it requires text mining and emotion analysis techniques. Emotion analysis and evaluation still face several challenges [2], which are shown in Table 1. These challenges are obstacles to accurately analyzing emotional polarity.
In recent years, more and more research has been done on emotion analysis. Among the objects of study, unstructured natural language texts have received the widest attention of scholars [9]. Emotion analysis is the inference of users' opinions, positions, and attitudes through written or spoken contents [10]. Emotion analysis tasks are typically solved with dictionary-based and learning-based approaches [11,12]. The dictionary-based approach analyzes the relevance of each word to a particular emotion by using a predefined dictionary [13]. Learning-based methods typically use labeled samples to train specific-purpose models under supervision [14].
Emotion analysis is increasingly used to analyze human emotions, but current methods have two shortcomings: they lack aspect-level granularity, and they are rarely applied to online knowledge communities, especially medical knowledge communities. It is therefore necessary to find an emotion classification method for medical knowledge communities. In light of these considerations, we propose an adaptive learning emotion identification method (ALEIM) based on mutual information feature weight, which captures the correlation and redundancy of features. Its effectiveness is verified on the datasets crawled from the Haodf online platform, in which the eigenvalues corresponding to the feature nouns are assigned according to the emotional dictionary NTUSD compiled by Taiwan University. Finally, the experimental results show that our proposed ALEIM method achieves a better performance. The remainder of this paper is organized as follows. Section 2 reviews the related work of our study. Section 3 presents our proposed ALEIM method, which contains problem description and assumptions, feature selection based on mutual information, and emotional polarity selection based on mutual information weight. Section 4 presents the datasets, evaluation measures, experimental performance, and the discussion. Finally, Section 5 presents the conclusions.

Related Work
2.1. Feature Extraction. Natural language processing and text analysis techniques are used to extract emotion features in emotion comments [9]. The feature selection method based on mutual information was developed to obtain the true features; it is an information entropy estimation method that is independent of classifiers and datasets and is superior to other feature extraction methods [15,16]. A redundancy algorithm for constructing the mutual information feature subset was proposed and used to improve the emotion classification accuracy [17]. The maximal relevance and minimal redundancy (mRMR) algorithm was proposed on the basis of the principle of mutual information; compared with SVM classification [18,19] and the recommended three-ratio classification methods, its accuracy is superior to that of the traditional method, and its recognition speed is faster than that of the intelligent method [20].

2.2. Emotion Analysis. Emotion analysis has been widely used in many fields [21,22], such as consumer management, precision marketing, and social networks. Unsupervised learning algorithms and the foremost supervised learning algorithms have been used to classify the emotion polarity of comments [23]. Moreover, emotion analysis is divided into many levels: document level [24], sentence level [25], word/term level, or aspect level [26].
Until now, the emotion classification methods can be roughly divided into three families: machine learning methods, emotion dictionary-based methods [27], and deep learning emotion classification approaches [28]. Some common classifiers for machine learning methods are decision trees [29], Bayes [30], and support vector machines [31]. The emotion dictionary-based approach achieves classification by using the polarity of emotion words at different granularities. The common emotion lexicons include the following: SentiWordNet [32], General Inquirer [33], SenticNet [34], Opinion Lexicon, HowNet Emotional Dictionary, Subjective Lexicon, DUTIR emotional vocabulary ontology library, and NTUSD [35]. However, it is very difficult to construct a complete emotion dictionary that contains the polarity of all emotion words. Therefore, it is necessary to obtain the polarity of emotional words from context. Deep learning emotion classification approaches are usually used to achieve emotion classification at the aspect level. In terms of natural language processing, deep learning has far superior performance to machine learning [18], as has been shown in the fields of text recognition [36] and semantic mining [37]. More recently, deep learning, especially the convolutional neural network, has been widely used to improve the accuracy of emotion analysis [38-40].

Problem Description and Assumptions.
Let the basic corpus be denoted by $\Theta = (U, A, V, f)$. The domain $U = \{u_i \mid i = 1, 2, \ldots, N\}$ is the source review set of $N$ comments, where $u_i$ is the $i$th comment and $N$ is the total number of comments. The feature noun set of the comments is denoted by $A = \{a_j \mid j = 1, 2, \ldots, J\}$, where $a_j$ is the $j$th comment feature and $J$ is the total number of feature nouns. Among them, the overall characteristics of the review (patient satisfaction, efficacy) are also known as the identification category, which is recorded as $c$.
The range of eigenvalues is $V$; it forms an information function with $U$ and $A$, $f: U \times A \to V$. $V(u_i)$ is the eigenvalue vector of the comment $u_i$, $V(u_i) = \{V_{ij} \mid j = 1, 2, \ldots, k\}$, and $k$ is the number of eigenvalues of the feature noun $a_i$. $V_{ij}$ is the $j$th eigenvalue of the comment $u_i$ (the eigenvalue is related to the adjective corresponding to the noun). The new comment is recorded as $T$, and the eigenvalue vectors of the comments together form the comment feature matrix. The data in the comments are heterogeneous, so it is necessary to normalize the eigenvalues.
We number all the adjectives contained in each feature and substitute the number as eigenvalues into the matrix for subsequent calculations.
Let $M(c; f_{\lambda})$ be the $j$th eigenvalue of the comment $u_i$; then, $V$ is converted to $V^{*}$.

Table 1: Challenges faced by emotion analysis and evaluation.

Author                            Year  Domain oriented        Challenge type           Review structure
Jia et al. [3]                    2009  Health/medical domain  Theoretical              Semistructured
Hogenboom et al. [4]              2011  Movie reviews          Theoretical              Unstructured
Alexandra and Ralf [5]            2009  Online news reviews    Theoretical              Semistructured/unstructured
Mukherjee and Bhattacharyya [6]   2012  Products               Technical                Semistructured
Chetan and Atul [7]               2014  Tweets                 Technical                Unstructured
Doaa and Osama [8]                2015  Scientific papers      Theoretical + technical  Structured

Computational Intelligence and Neuroscience

Due to the uncertainty of the adjective selection in the commentary library, probability is used to describe its distribution characteristics. $P_{il}$ is the probability that feature $a_i$ takes the value $v(u_i)_l$; after the commentator's emotional polarity is determined, the choice of word is still uncertain, and the probability is used to eliminate the influence of the commenters' decisions. The uncertainty of the emotional polarity of comments is concentrated in the feature redundancy of the comment set. Mutual information can effectively measure the redundancy between variables in a feature set. It is thus possible to find a set of input features that has a large mutual information value with the identification category and low redundancy with other features. The feature Relation-Redundancy Coefficient ($R^2C$) is used for discrimination, considering both the range of feature values and the distribution of values.
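In the feature selection process, redundancy must be considered because multiple candidate features act jointly on the category $c$. In this paper, the redundancy between the candidate feature $a_{\lambda}$ and the selected feature set $S$ and the redundancy between all features in $S$ are collectively referred to as the redundancy of the feature, denoted by $M(c; a_{\lambda}; S)$. The eigenvalue number of feature $a_i$ is $k$; then, its information entropy can be denoted as

$$E(a_i) = -\sum_{l=1}^{k} P_{il} \log P_{il}.$$

If $a_i \in A$, $a_j \in A$, and $a_j \neq a_i$, according to the joint distribution rate, the conditional entropy can be denoted as

$$E(a_i \mid a_j) = -\sum_{l}\sum_{m} P(v_{il}, v_{jm}) \log P(v_{il} \mid v_{jm}),$$

where $P(v_{il}, v_{jm})$ is the joint probability that $a_i$ takes its $l$th value and $a_j$ takes its $m$th value. In $\Theta$, the mutual information between $a_i \in A$ and $a_j \in A$ in feature set $A$ can be expressed as

$$M(a_i; a_j) = \sum_{l}\sum_{m} P(v_{il}, v_{jm}) \log \frac{P(v_{il}, v_{jm})}{P_{il} P_{jm}}.$$

The larger $M(a_i; a_j)$ is, the closer the relationship between the feature random variables $a_i$ and $a_j$ is; when $M(a_i; a_j)$ approaches zero, the two are independent of each other.
The relationship between mutual information and information entropy can be expressed as

$$M(a_i; a_j) = E(a_i) - E(a_i \mid a_j).$$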
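As a small illustration of the quantities defined above, the following sketch estimates entropy, conditional entropy, and mutual information from relative frequencies. The function names and the toy columns are hypothetical, and base-2 logarithms are an assumption, since the text does not fix the base.

```python
import math
from collections import Counter

def entropy(xs):
    """E(a) = -sum_l P_l * log2(P_l), with P_l estimated by relative frequency."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """E(a_i | a_j) = -sum_{l,m} P(l, m) * log2 P(l | m)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (value of a_i, value of a_j) pair
    marg_y = Counter(ys)           # counts of each value of a_j
    return -sum((c / n) * math.log2(c / marg_y[y]) for (_, y), c in joint.items())

def mutual_information(xs, ys):
    """M(a_i; a_j) = E(a_i) - E(a_i | a_j)."""
    return entropy(xs) - conditional_entropy(xs, ys)

# Toy columns: numbered adjective values of three features over six comments.
a_i = [1, 1, 2, 2, 3, 3]
a_j = [1, 1, 2, 2, 3, 3]   # identical to a_i, so M(a_i; a_j) equals E(a_i)
a_k = [1, 2, 1, 2, 1, 2]   # independent of a_i in this sample, so M(a_i; a_k) is 0

print(entropy(a_i), mutual_information(a_i, a_j), mutual_information(a_i, a_k))
```

In this toy sample, a fully redundant pair attains the maximum mutual information (the entropy of the feature itself), while an independent pair yields zero, matching the interpretation given in the text.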

Feature Selection Based on Mutual Information
Definition 2. Let $Z_s$ be the ratio of the mutual information $M(c; a_s)$ between the selected feature $a_s$ and the identification category $c$ to the information entropy $E(a_s)$ of the feature $a_s$; then, $Z_s = M(c; a_s)/E(a_s)$, $0 \le Z_s \le 1$. $Z_s$ meets the following characteristics:
(i) When the range of feature values is the same, the more uniform the value is, the less important it is.
(ii) When the feature values are evenly distributed, the larger the value range is, the less important it is.
Then, the feature redundancy is expressed by the mutual information formula of the MIFS-U method. The ratio of mutual information between maximum correlation and minimum redundancy denotes the ratio of feature correlation to redundancy.
$\delta$ is a constant used to measure the degree to which redundancy between features in the feature set influences classification accuracy, and it can be set according to the actual situation. The parameter called the feature Relation-Redundancy Coefficient ($R^2C$), which measures the redundancy of the selected feature set, is expressed by a nonnegative number $R$.
In $\Theta$, the Relation-Redundancy Coefficient has the following four effects:
(i) When $R = 0$, the correlation of the candidate feature $a_{\lambda}$ and the identification category $c$ is zero; $a_{\lambda}$ is an irrelevant feature of $\Theta$.
(ii) When $0 < R < 1$, the redundancy between the candidate feature $a_{\lambda}$ and $a_s$ is stronger than that between $a_{\lambda}$ and other features; then, it is a redundancy feature.
(iii) When $R > 1$, the correlation between the candidate feature $a_{\lambda}$ and the identification category $c$ is stronger than the redundancy between $a_{\lambda}$ and $a_s$, and the feature brings new information for classification; then, it is called the correlation feature. Here we set a threshold $\theta$ ($\theta > 1$) based on the actual values for the correlation features. The features are strong correlation features when $R \ge \theta$; otherwise, they are weak correlation features.
(iv) When $R = \infty$, it is only necessary to analyze the mutual information $M(c; a_{\lambda})$ between $a_{\lambda}$ and the identification category $c$; the $a_{\lambda}$ corresponding to the maximum value of $R$ can be selected into the set $S$.
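The four cases above can be written as a small decision function. This is only a sketch: the function name and the returned labels are hypothetical, the value of R is assumed to be computed elsewhere, and the boundary R = 1, which the four cases do not cover, is reported as unspecified.

```python
import math

def classify_feature(r, theta):
    """Map a Relation-Redundancy Coefficient R to the category described in the text."""
    if theta <= 1:
        raise ValueError("theta must be greater than 1")
    if r < 0:
        raise ValueError("R is a nonnegative number")
    if r == 0:
        return "irrelevant"                  # case (i)
    if math.isinf(r):
        return "select by M(c; a) alone"     # case (iv)
    if r < 1:
        return "redundant"                   # case (ii)
    if r >= theta:
        return "strong correlation"          # case (iii), R >= theta
    if r > 1:
        return "weak correlation"            # case (iii), 1 < R < theta
    return "unspecified"                     # R == 1 is not covered by the text

for value in (0.0, 0.4, 1.2, 2.0, math.inf):
    print(value, classify_feature(value, theta=1.5))
```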

According to the abovementioned effects, the optimal feature set $S = \{a_j \mid j = 1, 2, \ldots, \varphi\}$ containing $\varphi$ features is finally obtained.
Given the mutual information and redundancy of the features, the empirical index $\alpha$ is given by the expert. Using the mutual information method, the comprehensive weight $w_j$ of the feature $a_j$ in the comment space $\Theta$ is obtained. As an important parameter of the model, $w_j$ plays an important role in the accuracy of the classification.
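To make the selection of S more concrete, the sketch below runs a greedy search under an assumed criterion: the commonly cited MIFS-U score, relevance M(c; a) minus delta times the sum over selected features of Z_s * M(a; a_s). This is not necessarily the exact formula or the R-based rule used in this paper; the function names, the toy feature columns, and the four-level category are all hypothetical.

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # M(x; y) = E(x) + E(y) - E(x, y), equivalent to E(x) - E(x | y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def select_features(features, labels, n_select, delta=1.0):
    """Greedy selection with an assumed MIFS-U-style score (see lead-in)."""
    selected = []
    candidates = dict(features)        # feature name -> column of numbered values
    while candidates and len(selected) < n_select:
        def score(name):
            relevance = mutual_information(candidates[name], labels)
            redundancy = 0.0
            for s in selected:
                e_s = entropy(features[s])
                if e_s > 0:            # Z_s = M(c; a_s) / E(a_s)
                    z_s = mutual_information(features[s], labels) / e_s
                    redundancy += z_s * mutual_information(candidates[name], features[s])
            return relevance - delta * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        del candidates[best]
    return selected

# Toy data: the category is jointly determined by two independent binary features.
features = {
    "efficacy":     [0, 0, 0, 0, 1, 1, 1, 1],
    "efficacy_dup": [0, 0, 0, 0, 1, 1, 1, 1],   # fully redundant with "efficacy"
    "attitude":     [0, 0, 1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 1, 1, 2, 2, 3, 3]
chosen = select_features(features, labels, n_select=2)
print(chosen)
```

In this toy case the duplicated column is skipped in the second round because its redundancy with the already selected feature cancels its relevance, which is the behavior the redundancy term is meant to produce.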

Emotional Polarity Selection Based on Mutual Information Weight.
Based on the corpus data in the basic database, we obtain the optimal subset with the least feature redundancy and the relative weights of each feature in the feature set and calculate the emotion value of the unmarked corpus in the marked feature based on this weight. The specific steps are as follows:
(i) Extract the emotional words from the unmarked corpus and convert them to a basic corpus.
(ii) According to the basic corpus, filter out the optimal features, together with their weights, after removing redundant features.
(iii) Assign the eigenvalues corresponding to the feature nouns according to the emotional dictionary NTUSD compiled by Taiwan University: a positive word is assigned 1 and a negative word is assigned −1. Calculate the emotional value according to the weights (ignoring the impact of adverbs and grammatical structure on the emotional value), and set the emotional threshold based on the basic corpus.
(iv) Judge the polarity and accuracy of the test corpus according to the weights and emotional thresholds based on the training library.
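The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the word sets are placeholders rather than entries of NTUSD, the feature weights are invented numbers, and the threshold is taken as an equally weighted average of the mean emotion values of the positive and negative training comments, since the exact weighting is not specified in the text.

```python
# Placeholder lexicon (hypothetical entries, not taken from NTUSD).
POSITIVE_WORDS = {"good", "patient", "effective"}
NEGATIVE_WORDS = {"poor", "rude", "expensive"}

def emotion_value(pairs, weights):
    """Sum of weight * polarity over (feature noun, emotion word) pairs."""
    total = 0.0
    for noun, word in pairs:
        polarity = 1 if word in POSITIVE_WORDS else -1 if word in NEGATIVE_WORDS else 0
        total += weights.get(noun, 0.0) * polarity
    return total

def emotional_threshold(pos_values, neg_values, w_pos=0.5, w_neg=0.5):
    """Weighted average of the two class means (equal weights are an assumption)."""
    mean_pos = sum(pos_values) / len(pos_values)
    mean_neg = sum(neg_values) / len(neg_values)
    return w_pos * mean_pos + w_neg * mean_neg

def classify(pairs, weights, threshold):
    return "positive" if emotion_value(pairs, weights) >= threshold else "negative"

# Hypothetical feature weights, standing in for the mutual information weights w_j.
weights = {"efficacy": 0.5, "attitude": 0.3, "price": 0.2}
train_pos = [[("efficacy", "effective"), ("attitude", "patient")], [("efficacy", "good")]]
train_neg = [[("attitude", "rude"), ("price", "expensive")], [("efficacy", "poor")]]

threshold = emotional_threshold(
    [emotion_value(c, weights) for c in train_pos],
    [emotion_value(c, weights) for c in train_neg],
)
mixed = [("efficacy", "effective"), ("price", "expensive")]
print(threshold, classify(mixed, weights, threshold), classify([("attitude", "rude")], weights, threshold))
```

Because each feature contributes in proportion to its weight, a comment that praises a highly weighted feature and criticizes a lightly weighted one can still be judged positive, which is the intended effect of weighting.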

Numerical Experiment
Our experimental analysis compares the mutual information method with the emotion lexicon, TF-IDF, and SVM methods. Four datasets crawled from the Haodf online platform are used to evaluate the performance of our proposed ALEIM method, and the experiment is divided into four aspects:

Datasets.
The experimental datasets are crawled from the Haodf online platform. These medical service comments are extracted using the Octopus crawler tool, the word segmentation is then reorganized using Java programming, and each sentence in the comment is split into the metamatrix structure of "noun + verb." We first select 100 doctors, randomly collect 750 comments from their comment areas, and construct four basic corpus training libraries containing different comments from these data, as shown in Figure 1. The number of positive and negative comments varies across the four basic corpus training libraries, and the proportion of positive comments is higher than that of negative ones. Because the comment data are randomly extracted as a corpus training library, the distribution of positive and negative comments in the training library is uncertain. Using such randomly extracted data as training corpus data makes it possible to test not only how strongly different classification algorithms depend on the number of samples in each category but also their ability to learn a specific marked category from a small sample. As shown in Table 2, 400 comments are prepared as the test data, including 200 positive comments and 200 negative comments, to test the accuracy of the training library for emotion classification under different algorithms.
When feature extraction is performed, the features extracted from the corpus with 100 comments are all included in the larger corpora; the features extracted from the corpus with 150 comments are all included in the corpora with 200, 300, and 400 comments; the corpora with 200 and 300 comments yield the same features; the test corpus with 400 comments yields 42 features, one more than the corpora with 200 and 300 comments.
Since the data are randomly crawled, the corpora have low data repeatability between each other, so it can be approximated that the probability of new features appearing decreases rapidly as the amount of selected comment corpus data increases. Therefore, once the amount of comment data for a suitable training corpus is determined, the extracted features can contain almost all the features included in the medical comments (the few special features extracted with small probability are often not related to the medical service itself). This indicates that, owing to the uniformity and standardization of medical services, the features in medical comments are limited compared with those in traditional commodity comments. The features of general commodity comments are not fixed, because products differ greatly in their feature attributes and different products often contain unique features, which often affect the overall polarity of the comments. Therefore, commodity comments have high requirements for feature extraction, and it is necessary to continuously update the extracted features based on a large amount of data to achieve accurate classification of emotional polarity. Since the medical service does not have the variability of general commodities, the features of the comments are limited, so the features extracted from a certain amount of data can cover almost all the features in the medical service comments.

Experimental Design and Evaluation Measures.
We employ the Taiwan University NTUSD Simplified Chinese Emotion Dictionary Corpora for emotion classification. The overall flowchart of the experiments in this paper is shown in Figure 2. The SVM and feature weight algorithms used in this experiment are implemented in MATLAB. Among them, the mutual information algorithm and the IDF algorithm calculate the feature weights from the basic corpus, combine them with the emotion dictionary NTUSD to calculate the emotion value of each comment in the training library, and set the emotional threshold according to the corpus data (the emotional means of the positive and negative comments are calculated separately, and the weighted average of the two means is used as the emotional threshold). The emotional polarity of the test corpus is then judged based on the feature weights and the threshold. We have selected the following indicators as evaluation indicators. The accuracy can be expressed as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{11}$$

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The precision reflects the proportion of true positive samples among the cases determined as positive by the classifier and can be expressed as

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{12}$$

The recall reflects the proportion of all positive examples that are correctly judged as positive and can be expressed as

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{13}$$

Figure 3 shows that, on the four basic corpus libraries, the accuracy of the IDF and mutual information classification algorithms, which consider the feature weight, and of the emotion dictionary-based classification algorithm is significantly higher than that of the SVM algorithm using the Gaussian kernel function. As the number of samples increases, the accuracy of the emotion lexicon remains basically constant.
However, as the number of samples increases, the accuracy of mutual information increases rapidly and is higher than that of the other three methods. As can be seen from Figure 3, the performance of the mutual information method is better than that of the other three methods. The SVM algorithm requires the numbers of samples of the different types in the training database to be substantially the same to achieve optimal learning. However, the proportions of positive and negative comments in online medical services differ greatly, so it is difficult for the support vector machine algorithm to reach the optimal data ratio. Constructing the training library according to the actual ratio often leads to poor identification of the under-represented negative polarity data and thus to lower overall accuracy. Table 3 illustrates the detailed significance test results of accuracy between the mutual information method and the other three methods in terms of the p value on the four basic corpus libraries. As can be seen from the table, the mutual information method is superior to the other three methods on the corpora with 150, 200, and 300 comments. The results show that when the sample size increases, the p values between mutual information and the other three methods are less than 0.05. This means that the classification results of the mutual information method are significantly better than those of the other three methods. Figure 4 shows that the precision of the IDF and mutual information classification algorithms, which consider the feature weight, is slightly higher than that of the other two algorithms. The mutual information algorithm has lower precision when the training data are scarce; its precision improves as the training data increase but remains slightly lower than that of the IDF weighting algorithm.

Table 2: Composition of the training and test corpora.

Number of comments  Positive comments  Negative comments  Number of features  Used for
100                 70                 30                 37                  Training corpus
150                 100                50                 39                  Training corpus
200                 120                80                 41                  Training corpus
300                 180                120                41                  Training corpus
400                 200                200                42                  Test corpus
Table 4 illustrates the detailed significance test results of precision between the mutual information method and the other three methods in terms of the p value on the four basic corpus libraries. As can be seen from the table, there is a significant difference between the mutual information method and the emotion lexicon and SVM methods because the p values between mutual information and these two methods are less than 0.05, but when the number of samples increases, there is no significant difference between the mutual information method and the TF-IDF method. Figure 5 indicates that our proposed algorithm, which considers the weight of each feature, performs better than the other two comparison approaches. Since the training library contains fewer negative emotion polarity data, the recall of the other two algorithms is extremely low, whereas the feature weight algorithms do not depend on the proportion of each data category, so their learning effect on the limited negative polarity data is better and their recognition of the negative emotion data in the test data is higher. The recall of the mutual information algorithm is significantly higher than that of the IDF algorithm. This shows that the mutual information algorithm considering the feature weight has strong recognition ability for negative emotion. Table 5 illustrates the detailed significance test results of recall between the mutual information method and the other three methods in terms of the p value on the four basic corpus libraries. From this table, it can be seen that there is a significant difference between the mutual information method and the emotion lexicon and SVM methods because the p values between mutual information and these two methods are less than 0.05, but there is no significant difference between the mutual information method and the TF-IDF method. Figure 6 shows the comparison of the 41 feature weights in the training library of 300 comments under the mutual information weighting algorithm and the TF-IDF algorithm.
It can be seen that the weights assigned by the two algorithms differ considerably for feature 1, feature 25, and feature 35 and for feature 5 and feature 41, which correspond to the condition, attitude, and doctor and to side effects and consultation, respectively. The mutual information algorithm weights are significantly higher than the IDF weights for the first three features. These three features are commonly found in medical comments. The IDF algorithm regards such high-frequency features as less important and gives them small weights, while the mutual information algorithm gives them high weights because of their high mutual information value with the identification category and their low redundancy. Such weights cause the mutual information algorithm to have a lower accuracy than the IDF algorithm in identifying positive emotional polarity. These features are basic features of comments; they tend to have a weaker guiding effect on the emotional polarity of the reviewer in positive comments and a primary role in the orientation of the emotional polarity in negative comments.
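The accuracy, precision, and recall indicators used in the comparisons above can be computed from the confusion counts of a classifier. The sketch below uses the standard definitions; the function names and the toy labels are hypothetical and are not data from the experiments.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0   # guard against no positive predictions

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0   # guard against no positive examples

# Toy labels: 1 = positive comment, 0 = negative comment.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

To measure how well negative comments are recognized, the same functions can be called with `positive=0`, which treats the negative class as the class of interest.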

Implementation Details of Experiments.
For the latter two features, the mutual information algorithm weights are significantly lower than the IDF weights. These features are low-frequency features that appear 6 times and 7 times in the 300 comments, respectively. The IDF algorithm assumes that, for the comment library as a whole, low-frequency words have a greater effect on the emotional polarity of the comments, whereas the mutual information algorithm regards these features as having small mutual information values and high redundancy and therefore gives them low weights. The experiments show that these two features actually weaken the emotional polarity of the comments. The IDF algorithm misclassifies all six test comments that contain these two features, whereas the recognition rate of the mutual information algorithm on them is 100%.
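The contrast described above follows from how inverse document frequency behaves. The sketch below assumes the plain textbook form idf = ln(N / df), which may differ from the variant used in the experiments, and it treats the reported 6 and 7 occurrences as document frequencies; the value 240 for a common feature is a hypothetical number chosen only for comparison.

```python
import math

def idf(n_documents, document_frequency):
    """Plain inverse document frequency, ln(N / df), without smoothing (an assumption)."""
    return math.log(n_documents / document_frequency)

N = 300                        # size of the training library discussed in the text
rare_a = idf(N, 6)             # feature reported to appear 6 times
rare_b = idf(N, 7)             # feature reported to appear 7 times
common = idf(N, 240)           # hypothetical high-frequency feature

print(rare_a, rare_b, common)
```

Under this form, the weight depends only on how rarely a feature occurs, so the two low-frequency features receive much larger weights than a common one regardless of how informative they are about the polarity, which is consistent with the behavior attributed to the IDF algorithm here.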

Discussion of Experiments.
From the above experimental analysis, we conclude that mutual information is the most appropriate of the compared methods for this problem. It shows good performance in terms of accuracy when the number of samples increases and requires only a moderate computational cost for solving emotion classification problems of short texts in the online medical knowledge sharing community. In terms of precision and recall, there is no significant difference between the mutual information method and the TF-IDF method, but Figure 6 shows that the accuracy of the IDF algorithm in identifying negative emotion polarity is significantly lower than that of the mutual information algorithm. Experiments show that low-frequency words in medical review data are often redundant features, and the mutual information algorithm has higher accuracy in identifying such redundant features. However, our experiments need to be further improved because only four basic corpus libraries are involved. Therefore, we plan to crawl more comments of different types from online medical knowledge sharing communities to optimize the parameters and improve the performance of the method.

Conclusions
Emotion analysis has been widely used in many fields and has become an important tool for extracting emotional information from comments. Emotion analysis in the medical knowledge sharing community is still relatively lacking compared with that for general commodities. The information recipients in the medical knowledge sharing community are more concerned with the intensity of the emotional words in the comments or the overall evaluation. In this research, we propose an adaptive learning emotion identification method based on mutual information feature weight, which captures the correlation and redundancy of features. Its effectiveness is verified on the dataset crawled from the Haodf online platform, and we employ the Taiwan University NTUSD Simplified Chinese Emotion Dictionary Corpora for emotion classification. Finally, the experimental results show that the proposed ALEIM method can achieve good performance, especially in terms of the feature extraction of low-frequency words in comments of the online medical knowledge sharing community.

Data Availability
The experimental data come from the Haodf online platform and can be crawled from https://www.haodf.com.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Figure 6: The difference between the mutual information algorithm and the TF-IDF algorithm (horizontal axis: feature index, 1 to 41; series: mutual information, TF-IDF).