Deep Learning Structure for Cross-Domain Sentiment Classification Based on Improved Cross Entropy and Weight

Within the sentiment classification field, the convolutional neural network (CNN) and long short-term memory (LSTM) are praised for their classification and prediction performance, but their accuracy, loss rate, and time consumption are not ideal. To this end, a deep learning structure combining an improved cross entropy and word weights is proposed for cross-domain sentiment classification, which focuses on achieving better text sentiment classification by optimizing and improving the recurrent neural network (RNN) and CNN. Firstly, we use the ideas of the hinge loss function (hinge loss) and the triplet loss function (triplet loss) to improve the cross entropy loss. The improved cross entropy loss function is combined with the CNN model and the LSTM network, which are tested on two-class classification problems. Then, the LSTM binary-optimize (LSTM-BO) model and CNN binary-optimize (CNN-BO) model are proposed, which are more effective in fitting the predicted errors and preventing overfitting. Finally, considering how the recurrent neural network processes text, the influence of the input words on the final classification is analysed, which yields the importance of each word to the classification results. The experimental results show that, within the same time, the proposed weight-recurrent neural network (W-RNN) model gives higher weight to words with stronger emotional tendency to reduce the loss of emotional information, which improves the accuracy of classification.


Introduction
Analysis of text emotional tendency, as an important research focus in the analysis of Internet public opinion, is mainly used to analyse and process subjective information, such as attitude, emotion, viewpoint, and tendency, in text. Sentiment analysis was first proposed by Pang et al. [1] for the positive or negative classification of movie reviews and Turney [2] for the positive or negative classification of cars and movies in 2002. Subsequent studies on sentiment analysis have been widely carried out for hotels, restaurants, product reviews, Weibo tweets, and other fields. Additional developments include positive or negative polarized classification methods [3], five classifications including ratings [4], and eight classifications including specific emotions [5].
Traditional sentiment analysis algorithms are mostly based on shallow machine learning, such as the maximum entropy model [6], conditional random field [7], support vector machine [8], and so on. With the increasing popularity of artificial intelligence, data-driven models have gradually become a focus of research on sentiment analysis.
Deep learning algorithms have been widely used in the fields of speech, image, and natural language processing with their strong feature extraction and excellent information expression capabilities and have achieved better results than traditional models. In 1988, Rumelhart proposed the backpropagation neural network (BPNN) [6], which is a multilayer feedforward neural network (FNN) that uses the error backpropagation algorithm to adjust weights. It is the most widely used NN model. LeCun et al. [9] used various deep neural networks to train language models at large-scale corpus level and constructed a probabilistic language model based on deep neural network, which solves common natural language processing tasks such as sentiment classification and part-of-speech tagging. Chen et al. proposed a deep learning method of learning potentially complex and irregular probability distributions, which can accurately estimate the values of cumulative distribution function (CDF) and probability density function (PDF) [10].
In the sentiment analysis task, deep learning algorithms have also been widely used. At the same time, convolutional networks have been applied to problems in the field of natural language processing and have achieved excellent results in tasks such as semantic analysis, query retrieval, and text classification. Since text is sequence data, there is a close relationship between words and characters as well. In 2006, Hinton [11] proposed a method for extracting features to the maximum extent and learning efficiently, which has become a hotspot in deep learning research. Due to the excellent performance of deep learning in many fields, many researchers have begun to use deep learning for text sentiment analysis. Because processing long text involves long-term dependencies and the temporal information of the words in the text, which a plain recurrent neural network handles poorly, the LSTM is used [12,13] for text emotion classification. Kennedy and Inkpen [14] considered the polarity transfer relationship of words in the text and determined the affective tendency by word counting based on a seed word set. Kim compared multiple deep learning models on multiple datasets and found that the experimental results of CNN were better than those of other methods [15]. Tang [16] considered the importance of user information and product information for sentiment classification, combined word vectors, user vectors, and product vectors at the input layer, and then used CNN for modelling and softmax for classification; the results were higher than those of the benchmark system at the sentence level and phrase level. The difference between the predicted value and the real value of the model is usually evaluated by a loss function, which generally serves as the objective function in classification or regression algorithms [17]. The smaller the loss function, the better the model reflects the real data [18].
The closeness between the actual and the expected output is measured by cross entropy, which is essentially a measure of the difference between two probability distributions [19]. Cross entropy is often the final loss function of machine learning or deep learning [20]. The closer the predicted distribution is to the real distribution, the smaller its value. With the wide application of cross entropy, in 2020, Cui et al. [21] applied, for the first time, a new loss function composed of binary cross entropy and the dice coefficient to optimize an end-to-end network; the best performance indexes were achieved, thus verifying the validity of the model.
Deep learning methods have been applied successfully to cross-domain sentiment mining tasks thanks to their excellent representation learning and efficient classification abilities. Zhao et al. [22] presented a two-stage bidirectional LSTM (Bi-LSTM) and parameter transfer framework for short text cross-domain sentiment classification tasks. In 2019, Dey et al. [23] explored a three-step methodology, in which distinct balanced training, text preprocessing, and machine learning methods were tested, using two languages: English and Italian. In [24], cross-domain-labeled Web sources (Amazon and Tripadvisor) are used to train supervised learning models (including two deep learning algorithms) that are tested on typically unlabelled social media reviews (Facebook and Twitter); the trained models are tested on Facebook data for both English and Italian. In weight computing, Dey et al. [25] calculated the sentiment score of the n-grams by using the individual sentiment scores of the unigrams and precalculated values of the intensifiers and negations attached to them. These scores are multiplied by the corresponding feature-importance value to generate the final score of the SEND features of each review.
In deep learning network models, the CNN has made great achievements in the field of image processing, since its convolution and pooling structure extracts image information very well. The RNN, in turn, is widely used in the field of text analysis as a neural network for processing sequence data. Because of its memory function, it is better at processing data that change along a sequence; among RNNs, the LSTM recurrent neural network addresses the problems of gradient disappearance and gradient explosion in the recurrent network, which makes the analysis and modelling of long sequence data successful. This study focuses on the optimization and improvement of the RNN and CNN to achieve better text sentiment classification. According to the characteristics and shortcomings of each deep neural network, the following three text sentiment classification models are proposed.
Based on the CNN model and the LSTM network, the ideas of the hinge loss and the triplet loss are used to improve the cross entropy loss used in two-class classification problems. The LSTM binary-optimize (LSTM-BO) model and CNN binary-optimize (CNN-BO) model are proposed, which are more effective in fitting the predicted errors and preventing overfitting.
Considering how the recurrent neural network processes text, the influence of the input words on the final classification is analysed, which yields the importance of each word to the classification results. The proposed weight-recurrent neural network (W-RNN) model gives higher weight to words with stronger emotional tendency to reduce the loss of emotional information, which improves the accuracy of classification.
The rest of the paper is organized as follows. Section 2 is about the deep learning structure of cross-domain sentiment classification. Section 3 elaborates our numerical example. Results and discussion are presented in Section 4, and Section 5 is the summary of our research work.

Deep Learning
The hinge loss function is a loss function in the machine learning field that can be used for "max-margin" classification and often serves as the objective function of the SVM. Triplet loss is a loss function in deep learning, which was originally proposed by Schroff et al. [26] to train on samples with small differences, such as in face similarity measurement. The input of triplet loss is a triple (a, p, n): a (anchor); p (positive), a sample of the same category as a; and n (negative), a sample of a different category from a. Sample similarity is learned by optimizing the distance between a and p to be less than the distance between a and n. The formula is as follows:
$$L = \max\big(d(a, p) - d(a, n) + \mathrm{margin},\ 0\big)$$
So, the ultimate optimization goal is to shorten the distance between a and p and extend the distance between a and n. The triplets fall into three cases:
(i) Easy triplets: L = 0, that is, d(a, p) + margin < d(a, n); this situation does not need to be optimized, since a and p are already close and a and n are already far apart.
(ii) Hard triplets: d(a, n) < d(a, p), that is, the negative is closer to the anchor than the positive.
(iii) Semihard triplets: d(a, p) < d(a, n) < d(a, p) + margin, that is, the negative is farther from the anchor than the positive, but by less than the boundary value margin.
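The triplet cases can be made concrete with a minimal Python sketch. It assumes Euclidean distance for d and a margin of 1.0, and it takes hard triplets to be those with d(a, n) < d(a, p), the usual convention; the function names are illustrative and do not come from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_loss(a, p, n, margin=1.0):
    """L = max(d(a, p) - d(a, n) + margin, 0)."""
    return max(euclidean(a, p) - euclidean(a, n) + margin, 0.0)

def triplet_kind(a, p, n, margin=1.0):
    """Classify a triplet as 'easy', 'hard', or 'semi-hard'."""
    d_ap, d_an = euclidean(a, p), euclidean(a, n)
    if d_ap + margin < d_an:
        return "easy"       # loss is already zero
    if d_an < d_ap:
        return "hard"       # negative is closer to the anchor than the positive
    return "semi-hard"      # negative is farther, but by no more than the margin

anchor, positive = [0.0, 0.0], [1.0, 0.0]   # d(a, p) = 1
for negative in ([3.0, 0.0], [1.5, 0.0], [0.5, 0.0]):
    print(triplet_kind(anchor, positive, negative),
          triplet_loss(anchor, positive, negative))
```

Only hard and semihard triplets produce a nonzero loss, so only they contribute to an update.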

The Improved Cross Entropy Loss Function.
The text sentiment analysis task is fundamentally a classification problem. For a classification model, the optimization goal and the evaluation index can be inconsistent. In the two-category task, the model uses cross entropy as the loss function, which derives from maximum likelihood estimation. However, the final evaluation goal of the sentiment classification task is the accuracy of the model, not the size of the cross entropy. Usually, a small cross entropy goes with a high classification accuracy, but this relationship does not necessarily hold.
In the two-category task, due to problems such as limited model fitting ability and data category imbalance, it is difficult for the model to output exactly 1 for every positive sample and 0 for every negative sample. In actual prediction, a sample is considered positive when the output is greater than 0.5 and negative when it is less than 0.5. This means that the model can be updated selectively. Therefore, an improved model is proposed in this paper: we set a threshold M, where M belongs to (0, 1). When the model's output for a positive sample is higher than M, or its output for a negative sample is lower than 1 − M, the model is not updated on that sample; the model is updated only on the remaining samples (a positive sample whose output is below M, or a negative sample whose output is above 1 − M), which ensures that the model focuses on the samples that are not yet predicted well. This prevents the model from lowering the loss by overtraining on easy-to-fit samples and makes it fit more effectively the samples whose prediction is wrong, thus improving the classification effect.
Based on the above ideas, this section draws on hinge loss and triplet loss to improve the loss function of the two-category model. The commonly used cross entropy loss function is formulated (for a single sample) as follows:
$$L = -\left[ y_{true} \log y_{pred} + (1 - y_{true}) \log (1 - y_{pred}) \right]$$
where $y_{true}$ is the true label and $y_{pred}$ is the model's predicted output. Select a threshold M and introduce the unit step function $\theta(x)$:
$$\theta(x) = \begin{cases} 1, & x > 0 \\ 1/2, & x = 0 \\ 0, & x < 0 \end{cases}$$
Then, the new loss function is considered:
$$L_{new} = -\lambda(y_{true}, y_{pred}) \left[ y_{true} \log y_{pred} + (1 - y_{true}) \log (1 - y_{pred}) \right]$$
where
$$\lambda(y_{true}, y_{pred}) = 1 - \theta(y_{true} - M)\,\theta(y_{pred} - M) - \theta(1 - M - y_{true})\,\theta(1 - M - y_{pred})$$
$L_{new}$ adds the correction $\lambda(y_{true}, y_{pred})$ to the cross entropy. For a positive sample, $y_{true} = 1$, so $\lambda(1, y_{pred}) = 1 - \theta(y_{pred} - M)$; if $y_{pred} > M$, then $\lambda(1, y_{pred}) = 0$ and the loss is automatically 0 (reaching the minimum). On the contrary, if $y_{pred} < M$, then $\lambda(1, y_{pred}) = 1$ and the cross entropy is maintained. That is to say, if the output for a positive sample is higher than M, the model is not updated on it; if it is less than M, the update continues. Negative samples can be analysed similarly: if the output is already lower than 1 − M, the model is not updated, and if it is higher than 1 − M, it continues to update.
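The thresholded loss can be sketched in a few lines of plain Python for a single sample. This is an illustration under stated assumptions rather than the authors' Keras implementation: the correction term is the standard form that reduces to 1 − θ(y_pred − M) for a positive sample, as the text requires; the step function is given the value 1/2 at x = 0; and the clipping constant `eps` and all function names are hypothetical.

```python
import math

def step(x):
    """Unit step theta(x); the value 1/2 at x == 0 is an assumption."""
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return 0.5

def correction(y_true, y_pred, m=0.6):
    """lambda(y_true, y_pred): 0 for samples that are already predicted confidently."""
    return (1.0
            - step(y_true - m) * step(y_pred - m)
            - step(1.0 - m - y_true) * step(1.0 - m - y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Standard binary cross entropy, clipped to avoid log(0)."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

def improved_loss(y_true, y_pred, m=0.6):
    """Cross entropy multiplied by the correction term."""
    return correction(y_true, y_pred, m) * binary_cross_entropy(y_true, y_pred)

for y_true, y_pred in [(1, 0.9), (1, 0.55), (0, 0.3), (0, 0.45)]:
    print(y_true, y_pred, improved_loss(y_true, y_pred))
```

With a threshold of 0.6, a positive sample scored 0.9 and a negative sample scored 0.3 contribute zero loss, while the two borderline samples (0.55 and 0.45) keep their full cross entropy.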

The LSTM-BO and CNN-BO Models.
The LSTM-BO and CNN-BO models are based on the LSTM and CNN in Keras, combined with the improved cross entropy loss function described in Section 2.1.2. A recurrent neural network (RNN) is a network structure that processes an input data sequence with the same structure at every time step [27]. The RNN effectively addresses the problem of processing sequence information. In traditional neural networks, the nodes inside a hidden layer are not connected to each other, and each output is independent of the others. In an RNN, however, the hidden layer nodes are connected to each other along the time dimension, and the input of each node includes not only the current input from the input layer but also the output of the previous state of the hidden layer; that is, the network can remember previous information and use it to calculate the current output, as shown in Figure 1.

Weight-Recurrent
The RNN's ability to recall previous information rests on the hidden layer, which is repeated over time as a memory unit and saves the information from the previous state. As a logical structure, the internal structure of the memory unit is shown in Figure 2. At time t, the input $x_t$ is weighted by the matrix W, and the old information $h_{t-1}$ from time t − 1 is processed by the self-connection matrix U of the hidden layer; their sum, together with the offset b, is passed through an activation function (such as tanh) to obtain the output $h_t$ of the hidden layer in the current state. The information from time t and earlier continues to propagate through $h_t$ until the end of the sequence, where it becomes the final output of the hidden layer. The above calculation process is formulated as
$$h_t = \tanh(W x_t + U h_{t-1} + b)$$
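The recurrence of the memory unit can be written out with plain Python lists. This is only a sketch: the zero initial state, the tanh activation, and the function names are assumptions made for illustration.

```python
import math

def rnn_step(x_t, h_prev, W, U, b):
    """One step of the recurrence: h_t = tanh(W x_t + U h_{t-1} + b)."""
    h_t = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W[i][j] * x_t[j] for j in range(len(x_t)))
        s += sum(U[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(s))
    return h_t

def rnn_forward(inputs, W, U, b):
    """Run the recurrence over a sequence and return [h_0, h_1, ..., h_n]."""
    h = [0.0] * len(b)          # assumed zero initial state h_0
    states = [h]
    for x_t in inputs:
        h = rnn_step(x_t, h, W, U, b)
        states.append(h)
    return states

# Toy example: 2-dimensional inputs, 2 hidden units, 3 time steps.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.0], [0.0, 0.4]]
b = [0.0, 0.1]
states = rnn_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W, U, b)
print(states[-1])   # final hidden state h_n, passed on to the classifier
```

Keeping every intermediate state, not just the last one, is what later makes it possible to score the contribution of each word.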

W-RNN Model for Cross-Domain Sentiment Classification.
In the standard recurrent neural network, the result computed by the recurrent unit at each moment is not used directly; it is only passed to the recurrent unit at the next moment for another round of calculation, and only the output at the last moment is used as the output of the recurrent layer. The state at each moment depends on the previous moment, and the temporal order information is continuously retained through this dependence. In this section, building on these characteristics of the recurrent neural network, the importance of each word is obtained by analysing the influence of the input words on the final classification. Based on this idea, the W-RNN model is constructed, which gives higher weight to words with stronger emotional tendency and reduces the loss of text emotional information, thus improving the accuracy of text sentiment classification. Recurrent neural networks are among the most important models for many sequence tasks. The common methods for text classification tasks are shown in Figure 3.
How can the importance of the inputs $w_1, \ldots, w_n$ to the final classification result be measured? Assuming an emotional classification task, the words that have the most important impact on the final classification need to be found first.
Because the state vector of the last step of the RNN (the vector represented by the orange shade in Figure 3) is passed to the subsequent classifier for classification, the state vector $h_n$ of the last step is the target vector. The RNN is a recursive process in which $h_0, h_1, \ldots, h_{n-1}$ gradually approach $h_n$.
So, the distances $\|h_n - h_0\|, \|h_n - h_1\|, \ldots, \|h_n - h_{n-1}\|, 0$ of the intermediate vectors to the target vector can be considered in turn. Moving from $h_i$ to $h_{i+1}$ takes the additional word $w_{i+1}$ into account: the distance to the target vector was $\|h_n - h_i\|$ and becomes $\|h_n - h_{i+1}\|$, so the difference $\|h_n - h_i\| - \|h_n - h_{i+1}\|$ can be used to measure the impact of the word $w_{i+1}$ on the final classification. If the difference is positive, the introduction of $w_{i+1}$ narrows the distance to the target, so it promotes the correct classification; if the difference is negative, it works against the classification; the larger the absolute value, the greater the effect. This indicator can therefore be sorted in descending order to obtain the importance of each word. This article removes the effect of scale by dividing by the norm of the target vector:
$$\frac{\|h_n - h_i\| - \|h_n - h_{i+1}\|}{\|h_n\|}$$

2.3. Algorithm Flow.
According to the above model structure, we can obtain the algorithm flow of emotion classification using the W-RNN model (Algorithm 1). Batch_size is the batch scale used to group $C_{TR}$ for training, and the remaining samples, fewer than Batch_size, are grouped together; epochs represents the number of training iterations.
In this algorithm, firstly, the distances are ranked according to h′. Secondly, every secondary feature c is calculated from the word vector. Then, the output z is obtained from c, and the z value is used to update the weights. The W-RNN gives high weight to words with strong emotional tendency and reduces the role of words with weak emotion in sentences, which decreases the loss of text emotional information.
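The word-importance score of the W-RNN section (the drop in distance to the target vector, divided by the norm of the target vector) can be sketched as follows. The hidden states are toy values chosen for illustration, the target vector is assumed to be nonzero, and `word_importance` and `rank_words` are hypothetical names.

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def distance(u, v):
    """Euclidean distance between two vectors."""
    return norm([x - y for x, y in zip(u, v)])

def word_importance(states):
    """Given states [h_0, ..., h_n], score each word w_1..w_n as
    (||h_n - h_i|| - ||h_n - h_{i+1}||) / ||h_n||."""
    target = states[-1]
    scale = norm(target)        # assumed nonzero
    return [
        (distance(target, states[i]) - distance(target, states[i + 1])) / scale
        for i in range(len(states) - 1)
    ]

def rank_words(words, states):
    """Return (position, word, weight) tuples sorted by descending weight."""
    scores = word_importance(states)
    triples = zip(range(1, len(words) + 1), words, scores)
    return sorted(triples, key=lambda t: t[2], reverse=True)

# Toy hidden states: the second word moves the state most of the way to h_n.
states = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.0], [1.0, 0.0]]
for item in rank_words(["the", "poor", "plot"], states):
    print(item)
```

Because the differences telescope, the scores of all words sum to ‖h_n − h_0‖ / ‖h_n‖, so each weight can be read as a share of the total movement toward the target vector.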

Experimental Environment.
The specific experimental environment configuration of this model is shown in Table 1.

Dataset.
Since the classification model may adapt differently to different languages and to texts of different lengths, the experiment was run on different types of datasets from several well-known corpora in order to verify the performance of the model. This experiment uses IMDB [28] English film review data and Netease news classified text data, covering different languages, different lengths, and different types of text classification tasks. The two datasets are described below. The IMDB English film review dataset is data from Amazon's Internet Movie Database (IMDB), which includes a lot of information about each film, such as actors, film length, content introductions, ratings, and reviews. For the text classification task, the film review data used in this experiment distinguish positive reviews from negative reviews; that is, they contain two categories and constitute a two-category sentiment classification problem. The dataset contains a total of 50,000 review texts, and its label distribution is balanced, that is, there are 25,000 positive reviews and 25,000 negative reviews. In addition, the dataset provides 50,000 unlabelled texts for unsupervised learning.
Figure 1: The structure diagram of recurrent neural network model (input layer, hidden layer, output layer).
The Chinese dataset is a collection of commodity reviews provided by Data Hall, which contains six aspects of comment data (books, hotels, computers, milk, mobile phones, and water heaters), a total of 21,107 texts, of which 10,428 are negative and 10,679 are positive. Figure 4 describes the above Chinese dataset.
In Table 2, the number of positive emotion samples and negative emotion samples and sum of samples for two datasets are given.

Data Preprocessing.
The text datasets used in the experiment are in both Chinese and English. The Chinese dataset is a collection of commodity review corpus provided by Data Hall, which contains six aspects of comment data, a total of 21,107 texts. The English dataset is the IMDB film review dataset, with a total of 50,000 review texts. In the sentiment classification task, the training data and test data of the two datasets are randomly generated in a ratio of 80 : 20. The preprocessing of the data mainly includes removing invalid special characters and punctuation, removing common stop words of the language, and segmenting Chinese text with the Python-based jieba word segmentation tool.
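A minimal sketch of the cleaning and the random 80 : 20 split, using only the Python standard library, might look as follows. The stop-word list, the lowercasing step, and the fixed seed are assumptions for illustration; Chinese text would additionally be segmented (the paper uses jieba), which is left out here to keep the example dependency-free.

```python
import random
import re

# Hypothetical stop-word list; the paper does not give the one it used.
STOP_WORDS = {"the", "a", "an", "is", "of"}

def clean(text, stop_words=STOP_WORDS):
    """Lowercase, replace punctuation and special characters with spaces,
    split on whitespace, and drop stop words."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in stop_words]

def split_80_20(samples, seed=0):
    """Shuffle the samples and split them into 80% training and 20% test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

print(clean("The plot is poor, very poor!"))
train, test = split_80_20(range(21107))   # size of the Chinese dataset
print(len(train), len(test))
```

For the 21,107 Chinese reviews this yields 16,885 training and 4,222 test samples; for the 50,000 IMDB reviews it yields 40,000 and 10,000.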
This experiment introduces Word2Vec as pretraining to construct the word vectors. Appropriate word vectors can improve the performance and calculation speed of the model. The dimension of the word vectors pretrained by Word2Vec is set to 50 and the window size to 10, and training uses the skip-gram model. The training parameter settings of the Word2Vec model are described in Table 3.

Hyperparameter Setting.
In the neural network training process, a hyperparameter is a parameter whose value is set before the model is trained. Generally, the hyperparameters need to be optimized, and a set of optimal hyperparameters is selected for the model to improve the performance and effect of the learning. The hyperparameter configuration of the model is shown in Table 4. Table 5 shows the hyperparameter settings of each model, in which the values of the parameters in bold font were selected through the optimization experiments of Table 6.
It can be seen from Table 6 that the LSTM-BO and CNN-BO models have the highest accuracy at the threshold M = 0.6, namely 82.14% and 88.74%, respectively; their loss rates are the same at M = 0.6 and M = 0.7, namely 0.2267 and 0.1637, respectively. It can be seen from Figures 5 and 6 that as M increases, the accuracy of the two models generally first increases and then decreases, while the loss rate first decreases and then rises. When the threshold M is 0.6, the accuracy of the LSTM-BO model reaches its peak and the loss rate reaches its minimum; when M changes from 0.5 to 0.6, the accuracy improves greatly, by 7.96%; when M changes from 0.9 to 1.0, the loss rate changes greatly, increasing by 0.4435. The accuracy and loss rate of the CNN-BO model behave similarly to those of the LSTM-BO model: when M = 0.6, the accuracy reaches the peak and the loss rate is the smallest. When M changes from 0.5 to 0.6, the accuracy increases by 38.71% and the loss rate is reduced by 0.1827. Based on the above analysis, the threshold M in this paper is set to 0.6.

ALGORITHM 1: Text sentiment analysis algorithm based on W-RNN.
...
c ⟵ ExtractFeature(v) // extract secondary feature c from the word vector
z ⟵ Softmax(c) // get the prediction results of samples by Softmax classifier
end for
Update(z, w, b) // update parameters w and b of the model by backpropagation
end for
end for
output ⟵ Softmax(c)
...
end for

(ii) Influence of different loss functions on the model
In Table 7, binary_crossentropy is the standard cross entropy loss function; binary-optimize is the loss function proposed in this paper; hinge is the hinge loss function, commonly used in SVM classifiers; mean_absolute_percentage (MAPE) is the mean absolute percentage error loss function; mean_absolute_error (MAE) is the mean absolute error loss function. Figure 7 shows the accuracy.
The accuracy of the LSTM model using the hinge and MAPE loss functions remains at 50%, and the number of iterations is more than 5. Based on the above experimental results, the effectiveness of the improved loss function is demonstrated.

(iii) The selection experiment of word vector dimension
In this experiment, the word vector dimension is set to 50, 100, 150, 200, 250, and 300 in turn. It can be seen from Table 8 that the LSTM-BO model has the highest accuracy, 82.48%, and the lowest loss rate, 0.2234, when the word vector dimension is 100. When the word vector dimension is 50, the CNN-BO model has its maximum accuracy of 88.74% and a loss rate of 0.1637.
The dropout technique weakens the coadaptation of adjacent elements in the same layer by randomly discarding certain elements of the previous layer during training. Using dropout significantly reduces overfitting, and it is therefore widely used in the training of deep learning models. In order to study the impact of dropout on the training process, this experiment sets dropout to a series of different values, one per training run, with the other parameters fixed. The results are shown in Table 9.
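The random discarding performed by dropout can be illustrated with a short Python sketch. It assumes the common "inverted dropout" variant, in which surviving elements are rescaled by 1 / (1 − rate) during training; the text does not state which variant the experiments used, and the function name is illustrative.

```python
import random

def dropout(values, rate, rng):
    """Zero each element with probability `rate` and scale the survivors
    by 1 / (1 - rate) (inverted dropout, assumed variant)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)
activations = [0.5, -1.0, 2.0, 0.25, 1.5]
for rate in (0.0, 0.2, 0.5):
    print(rate, dropout(activations, rate, rng))
```

The rescaling keeps the expected value of each activation unchanged, so no correction is needed at prediction time, when dropout is switched off.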
It can be seen from Table 9 that when dropout was set to 0.2, the LSTM-BO model achieved the highest accuracy of 82.14%, the minimum loss rate of 0.2247, and the shortest time consumption. When dropout was set to 0.2, the CNN-BO model achieved its maximum accuracy of 88.74% and the shortest time consumption; its lowest loss rate, 0.1590, was obtained when dropout was 0.3.

Comparison of Experimental Results.
In order to verify the validity of the LSTM-BO and CNN-BO models, based on the Chinese and English datasets, the results were compared with the emotional classification results of the benchmark models LSTM and CNN. The experimental results are shown in Table 10.
(1) Accuracy Analysis. Figures 8 and 9 show the accuracy of each model on the training set and test set in the sentiment classification tasks under different datasets. The horizontal axis represents the number of iterations and the vertical axis represents the accuracy. The blue curve represents the change in the accuracy on the training set, and the orange curve represents the change in the accuracy on the test set. Figure 10 shows a bar graph of the final test set accuracy of each model on the different datasets. Figure 11 shows a plot of the change in the accuracy of each model over the 10 iterations on the English dataset. From the above experimental results, the following conclusions can be drawn:
(i) Table 10 shows that there are some differences in the experimental results on different datasets. The accuracy of the LSTM and LSTM-BO models on the Chinese dataset is 5.94% and 5.37% higher than on the English dataset, respectively. The accuracy of the CNN and CNN-BO models on the Chinese dataset is 0.5% and 0.17% higher than on the English dataset, respectively.
(ii) It can be seen from Table 10 and Figure 10 that the CNN-BO model has the highest accuracy of emotional classification on the Chinese and English datasets compared with the other three models, namely 88.91% and 88.74%, respectively. The accuracy of the LSTM-BO model is 1.16% and 0.59% higher than that of the benchmark LSTM model on the English dataset and the Chinese dataset, respectively. The accuracy of the CNN-BO model is 0.5% and 0.07% higher than that of the benchmark CNN model on the English dataset and the Chinese dataset, respectively.
(iii) As can be seen from Figures 8 and 9, the accuracy of the four models on the training set increases slowly with the number of iterations, with a significant change between the first and second iterations, and eventually stabilizes. However, increasing the number of iterations does not significantly increase the accuracy on the test set. The test accuracy fluctuates, especially for the LSTM and CNN models. It can be seen from Figure 11 that on the English dataset, the LSTM-BO model has high accuracy after the second iteration, and the CNN-BO model has high accuracy after the fourth iteration.
The analysis in this paper is that the LSTM-BO and CNN-BO models can more effectively fit the samples that are predicted wrongly and help prevent overfitting.
(2) Loss Rate Analysis. Figures 12 and 13 show the variation curves of the loss rates of the models on the training set and test set in the sentiment classification tasks on different datasets, respectively. The horizontal axis represents the number of iterations and the vertical axis represents the loss rate. The black curve represents the change in the training set loss rate, and the yellow curve represents the change in the test set loss rate. Figure 14 shows a plot of the change in loss rate for each model over 10 iterations on the English dataset. From the above experimental results, the following conclusions can be drawn:
(i) It can be seen from Figures 12 and 13 that the loss rate of the four models on the Chinese and English training sets decreases as the number of iterations increases, but the differences between the models are more obvious on the test set. The loss rate of the LSTM model and the CNN model on the test set fluctuated greatly and showed an upward trend. The loss rate of the LSTM-BO and CNN-BO models on the test set slowly decreased and finally flattened out.
(ii) It can be seen from Figure 14 that after the second iteration on the English dataset, the loss rate of each model shows a clear trend: the loss rate of the LSTM and CNN models increases with the number of iterations, reaching 0.8170 and 0.5588, respectively, after the 10th iteration. The loss rate of the LSTM-BO and CNN-BO models slowly decreases as the number of iterations increases, eventually reaching a steady state. The loss rates of the LSTM-BO and CNN-BO models were reduced by 0.5903 and 0.3951, respectively, compared with the benchmark LSTM and CNN models.
The analysis in this paper is that the improved model has better generalization ability, converging after multiple iterations and achieving a lower loss rate.
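As a quick consistency check on the figures quoted above, the final loss rates implied for the improved models can be computed by subtracting the reported reductions from the baseline loss rates after the 10th iteration. The dictionary names are illustrative; the numbers are those given in the text.

```python
# Final test loss rates of the baselines on the English dataset (from the text).
baseline_loss = {"LSTM": 0.8170, "CNN": 0.5588}
# Reported reductions achieved by LSTM-BO and CNN-BO over their baselines.
reported_reduction = {"LSTM": 0.5903, "CNN": 0.3951}

implied_bo_loss = {
    name: round(baseline_loss[name] - reported_reduction[name], 4)
    for name in baseline_loss
}
print(implied_bo_loss)
```

The implied values, 0.2267 for LSTM-BO and 0.1637 for CNN-BO, agree with the loss rates reported for the threshold M = 0.6 in the threshold experiment.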
(3) Time Performance Analysis. Figure 15 shows the time consumption of each model on the Chinese and English datasets. As can be seen from the figure, the LSTM-BO model takes 162 seconds and 51 seconds less than the LSTM model on the Chinese and English datasets, respectively; the CNN-BO model takes 128 seconds and 202 seconds less than the CNN model, respectively. The analysis in this paper is that the LSTM-BO and CNN-BO models are not updated on a sample when the predicted value of a positive sample is higher than M or the predicted value of a negative sample is lower than 1 − M, so they focus on the samples whose predictions are not accurate, which reduces the time consumption. The amount of calculation is also decreased for the Chinese data by preprocessing that removes useless words and punctuation, but the same operation was not performed on the English data.

Analysis of Results Based on W-RNN Model.
To evaluate the W-RNN model proposed in this section qualitatively and quantitatively, this experiment compares the effects of different models in the emotional analysis task on the Chinese and English datasets. The specific method is as follows: for the quantitative evaluation experiment, some data are selected from the Chinese and English datasets as the training set, the classification model is trained, and finally the emotional classification task is completed on the test set to measure the accuracy; for the qualitative evaluation experiment, the emotional weights calculated by the model are analysed to verify the validity of the model.

Results and Discussion for Qualitative Experiments.
In the qualitative analysis experiment, the classification model is first trained on the training set. Then, we randomly select 3 comments from the English and Chinese datasets, process them with the trained W-RNN model, and list the results in Table 11. In each result, the first component in the brackets indicates the position of the word after word segmentation, the second component is the word itself, and the third component is the weight of the word. For example, the first entry in the first row of the table, (7, "poor", 0.36663848), is the result for the word "poor": 7 means that "poor" is in the seventh position after word segmentation, "poor" is the word itself, and 0.36663848 is the weighting factor of "poor". The following results can be obtained from these experimental results:
(i) Table 11 shows that the W-RNN sentiment classification model proposed in this paper ranks the words with strong emotional tendencies first and gives them higher weights. For example, the word "good" in the second sentence is given a weight of 0.2046071, and the word "very" in the fourth sentence is given a weight of 0.17955697.
(ii) This importance evaluation scheme automatically takes the position of the words into account. If an emotional word is repeated in a sentence, the later occurrences generally receive lower weights.
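The result format described above (position, word, weight), ranked by weight, can be sketched as follows. The sketch assumes that per-word weights have already been produced by a trained model; the function name rank_word_weights and the toy sentence are hypothetical, and all weights except the one for "poor" (taken from the example above) are made-up placeholders rather than outputs of W-RNN.

```python
def rank_word_weights(tokens, weights):
    """Return (position, word, weight) triples sorted by descending weight.

    Positions are 1-based, matching the convention in the text where
    (7, "poor", ...) refers to the seventh token after word segmentation.
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    triples = [
        (position, word, weight)
        for position, (word, weight) in enumerate(zip(tokens, weights), start=1)
    ]
    return sorted(triples, key=lambda triple: triple[2], reverse=True)


# Hypothetical segmented sentence; only the weight of "poor" comes from the text.
tokens = ["the", "room", "was", "clean", "but", "service", "poor"]
weights = [0.02, 0.08, 0.03, 0.21, 0.05, 0.24, 0.36663848]

ranked = rank_word_weights(tokens, weights)
for triple in ranked[:3]:
    print(triple)
```

Keeping the original position in each triple preserves the information needed to study position effects, such as the lower weights observed for repeated emotional words.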

Results and Discussion for Quantitative Experiments.
To verify the effectiveness of the W-RNN model, its sentiment classification results on the Chinese and English datasets are compared with those of the benchmark RNN model. The experimental results are shown in Table 12.
(i) Accuracy analysis. Figure 16 shows the performance of each model on the training and test sets for the sentiment classification task on the different datasets. The horizontal axis represents the number of iterations and the vertical axis represents the accuracy. The blue curve represents the change in training set accuracy, and the orange curve represents the change in test set accuracy. Figure 17 is a bar graph of the final test set accuracy of each model on the different datasets. The following results can be obtained from these experimental results: (i) Table 12 shows that the results differ between datasets. The accuracy of the RNN and W-RNN models on the Chinese dataset is 6.53% and 8.12% higher, respectively, than on the English dataset. (ii) Figure 16 shows that the accuracy of the two models on the Chinese and English training sets increases slowly with the number of iterations and eventually stabilizes, while the accuracy on the test set stays flat. After 10 iterations, the accuracy of the W-RNN model exceeds that of the benchmark RNN model by 1.56% on the English dataset and 3.19% on the Chinese dataset. Our analysis indicates that the W-RNN model can analyse the influence of the input words on the final classification, assign higher weights to words with a stronger emotional tendency, and reduce the loss of emotional information, thus improving the accuracy of text sentiment classification.
(ii) Loss analysis. [...] iterations in the Chinese dataset is 0.4616, which is 0.2332 lower than that of the RNN model. Our analysis indicates that the W-RNN model prevents overfitting to a certain extent, extracts text features effectively, and reduces the loss.
(iii) Time performance analysis. Figure 19 shows the time consumption of the two models on the Chinese and English datasets. It can be seen from Figure 19 [...] Table 13. Table 13 shows the accuracy, loss rate, and time performance of the W-RNN-BO and W-RNN models on the Chinese and English datasets; Figure 20 is a line graph of the change in accuracy of each model as the number of iterations increases on the Chinese and English datasets.
The following results can be obtained from these experimental results: (i) In terms of accuracy, Figure 20 shows that the accuracy of the W-RNN-BO model is higher than that of the W-RNN model after the second iteration on both the Chinese and English datasets. After the 10th iteration, the accuracy of the W-RNN-BO model is 0.59% and 0.78% higher than that of the W-RNN model on the Chinese and English datasets, respectively. (ii) In terms of loss rate, Table 13 shows that the loss of the W-RNN-BO model converges better. After the 10th iteration on the Chinese and English datasets, its loss rate falls to 0.1405 and 0.1430, respectively, compared with 0.3211 and 0.3070 for the W-RNN model.
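The loss comparison above can be made explicit with a short calculation. The sketch below uses only the four loss rates quoted in the text and assumes that "respectively" pairs the first value with the Chinese dataset and the second with the English dataset; the dictionary layout and the function name loss_reduction are illustrative.

```python
from decimal import Decimal

# Loss rates after the 10th iteration, as quoted in the text (Table 13).
final_loss = {
    "Chinese": {"W-RNN": Decimal("0.3211"), "W-RNN-BO": Decimal("0.1405")},
    "English": {"W-RNN": Decimal("0.3070"), "W-RNN-BO": Decimal("0.1430")},
}


def loss_reduction(losses):
    """Return the absolute and relative loss reduction of W-RNN-BO over W-RNN."""
    absolute = losses["W-RNN"] - losses["W-RNN-BO"]
    relative = absolute / losses["W-RNN"]
    return absolute, relative


for dataset, losses in final_loss.items():
    absolute, relative = loss_reduction(losses)
    print(f"{dataset}: absolute {absolute}, relative {float(relative):.1%}")
```

Decimal arithmetic is used so that the differences of the four-digit values are exact: 0.1806 on the Chinese dataset and 0.1640 on the English dataset.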

Conclusion
To address the shortcomings of traditional deep neural networks in sentiment analysis tasks, this paper proposes three sentiment classification models based on deep neural networks. Firstly, the traditional cross entropy loss function is improved on the basis of the LSTM and CNN models. The resulting LSTM-BO and CNN-BO models fit the mispredicted samples more effectively and help prevent overfitting. In addition, drawing on the characteristics of the recurrent neural network, the influence of the input words on the final classification is analysed to obtain the importance of each word to the classification results, and the W-RNN model is constructed. The model gives higher weights to words with a stronger emotional tendency and reduces the loss of emotional information. To verify the effectiveness of the three sentiment classification models, qualitative and quantitative sentiment analysis experiments were designed on two datasets, one Chinese and one English. The experimental results show that the three models proposed in this paper improve the accuracy of text sentiment classification to a certain extent and also perform better in terms of loss rate and time.
In future work, we will consider combining the strength of CNN in extracting text features and the capacity of RNN for sequence tasks with self-attention to construct a better model for text feature extraction and classification.

Scientific Programming
Data Availability
The data used to support the findings of this study that are not available in [28] can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.