Text Sentiment Analysis Based on a New Hybrid Network Model

Research on text sentiment analysis based on deep learning is increasingly rich, but current models still deviate to varying degrees in their understanding of semantic information. To reduce the loss of semantic information and improve prediction accuracy as much as possible, this paper combines the doc2vec model with a deep learning model and an attention mechanism and proposes a new hybrid sentiment analysis model based on doc2vec + CNN + BiLSTM + Attention. The new hybrid model exploits the structural features of each part. The paragraph vector pretrained by doc2vec enhances the understanding of the overall semantic information of the sentence, which reduces the loss of semantic information. The CNN structure extracts the local features of the text. The bidirectional recurrent structure of the BiLSTM completes the interaction of context information. The attention mechanism improves performance by allocating weights to text information of different importance. The new model was built on the Keras framework, and performance comparison experiments and analysis were performed on the IMDB dataset and the DailyDialog dataset. The results show that the accuracy of the new model on the two datasets is 91.3% and 93.3%, respectively, and the loss rate is 22.1% and 19.9%, respectively. The accuracy on the IMDB dataset is 1.0% and 0.5% higher than that of the CNN-BiLSTM-Attention model and the ATT-MCNN-BGRUM model in the references, respectively. A comprehensive comparison shows that the overall performance is improved and that the new model is effective.


Introduction
Sentiment analysis, also known as sentiment polarity analysis, predicts the sentiment polarity of text by acquiring, processing, analyzing, summarizing, and reasoning about it. In real life, leaders can make more targeted decisions based on the results of sentiment analysis. In recent years, with the continuous development of artificial intelligence, sentiment analysis has gradually entered practical application. It is based on industry data analysis and key technologies such as natural language processing and deep learning. By classifying the emotional attitudes of different data, an intelligent service system based on sentiment analysis can provide a decision-making basis for decision-makers and more human-oriented services for ordinary users, thus improving service efficiency and service quality as a whole [1]. The core of sentiment analysis is sentiment classification. The difficulty is how to understand the emotional information of different words in the text [2]. At present, from the perspective of technique, there are three kinds of text sentiment analysis methods: methods based on an emotional dictionary, methods based on traditional machine learning, and methods based on deep learning.
For the sentiment analysis method based on an emotional dictionary, Yang and Zhang introduced the SentiWordNet emotional dictionary to build a model and analyzed emotional words and sentences according to it [3]. The main point of this method is to build a dictionary containing words and their corresponding emotional information, and the quality of the dictionary directly determines the effect of sentiment classification. Because there is always a bias between the emotional dictionary and the actual application, the result of sentiment classification based on this method is often poor. For the sentiment analysis method based on traditional machine learning, Sudhir and Suresh compared such models on the IMDB (Internet Movie Database) dataset [4]. In addition to considering special words, such as affective words, negative words, and degree adverbs, this method also makes a comprehensive evaluation of most other words in the text, which makes it suitable for processing long text [5]. However, this method requires manual selection, labeling, and extraction of emotional features, and the final classification results are only moderate.
In the sentiment analysis methods based on deep learning, the key is to establish deep learning models and complete the text sentiment classification through model training. This method uses a deep neural network to simulate the network structure of the human brain and automatically mines the deep, relevant semantic features of text information. As a result, its classification results are often better than those of methods based on emotional dictionaries and traditional machine learning [6]. With the deepening of deep learning research, sentiment analysis based on deep learning models has gradually developed [7]. Kim and Jeong proposed sentiment analysis based on the CNN (Convolutional Neural Network) model [8]. Alireza et al. proposed a weighted convolutional neural network for multimodal sentiment analysis based on integrated transfer learning [9]. These studies showed that the CNN model can effectively capture the local features of text information for sentiment classification, but it does not model context information closely enough, and its classification accuracy is not high. Khan et al. used a CNN + LSTM architecture to conduct in-depth sentiment analysis of information shared on social media. They showed that the LSTM (Long Short-Term Memory) model could effectively interact with context information and that the hybrid model performed better than a single neural network model [10]. Gao et al. conducted aspect-level sentiment analysis of short text based on CNN + BiGRU and showed that the bidirectional information interaction mechanism could effectively improve the classification results, but the information processing in the input layer is too simple [11]. Kalaiarasu et al. extracted features through word2vec (Word Embeddings) and used an improved convolutional neural network for sentiment analysis. They showed that pretrained word vectors could better quantitatively describe the relationship between different words, improve the input layer structure, and greatly improve the classification results, but there was a certain loss of text information [12]. Wang and Liu proposed a method based on doc2vec (Paragraph Vector) and a deep neural network to analyze the emotion of text information, which not only reduced the training cost but was also highly efficient. It showed that the doc2vec model could reduce the loss of information well [13], but the accuracy of sentiment classification was low. Fan proposed a sentiment analysis method based on the BERT + BLSTM + Attention model, which introduced the attention mechanism. It showed that giving different parts of the text different weights, so as to extract the relatively important parts of the input, could improve the accuracy of sentiment classification [14]. However, a model that effectively integrates the advantages of each of these models, improving accuracy while also reducing semantic loss and increasing the expressive power of text features, has not yet been reported.
The main purpose of this paper is to reduce the loss of semantic information as much as possible while improving the accuracy of sentiment classification. This study focuses on the establishment and training of a hybrid deep learning model that introduces the doc2vec model and the attention mechanism. The input layer structure of the general deep learning model is improved through the doc2vec paragraph vector model to reduce the loss of semantic information. The CNN model is used to enhance the local feature expression of text information, and the BiLSTM model is added to compensate for the weak connection of context information in the CNN model. By adding the attention mechanism, weights are reasonably allocated to text information of different importance. Finally, the new model is compared with other existing models on two public datasets, and the results of the different models are analyzed to verify the effectiveness of the proposed model.
The main contributions of this work are summarized as follows: (1) It innovatively integrates the doc2vec model with the deep learning model and attention mechanism to increase the ability of text feature expression while minimizing the loss of semantic information.

A Hybrid Deep Learning Model Introducing the Doc2vec Model and Attention Mechanism
Through the research on sentiment analysis at home and abroad, an innovative hybrid deep learning model combined with the doc2vec model and attention mechanism is proposed. The model integrates the characteristics of each component model to improve the accuracy of sentiment classification. The structure of the model is shown in Figure 1. The model is mainly composed of the doc2vec model, the CNN model, the BiLSTM (Bidirectional Long Short-Term Memory) model, and the attention mechanism. First, text information is transformed into a text vector matrix through the doc2vec model, which is used as the embedding layer. Second, in the CNN model, feature extraction is carried out in the convolution layer and information filtering is carried out in the pooling layer, which reduces the feature dimension. Then, the context information is exchanged through the cell gating mechanism in the BiLSTM model, and dropout is used to prevent overfitting. Finally, the attention mechanism is used to allocate resources to text information of different importance, a fully connected layer is added, and the sentiment classification results are output through the classifier.

Computational Intelligence and Neuroscience

Model Representation.
The dataset D of given text information contains m text sentences, denoted as $X = \{x_1, x_2, \ldots, x_m\}$, and the corresponding emotional tags of each text sentence, denoted as $Y = \{y_1, y_2, \ldots, y_m\}$. Each text sentence $x_i$ is composed of n words, denoted as $\{x_{i1}, x_{i2}, \ldots, x_{in}\}$. The final objective function is shown in formula (1), where $\theta$ represents all parameters involved in the model and $f(\cdot)$ is the mathematical expression of the network model.

Doc2vec Model.
The doc2vec paragraph vector model [15] is introduced into the hybrid model to further improve the semantic understanding of the datasets. Compared with the word2vec word vector model, the doc2vec paragraph vector model has a simpler training method. It accepts sentences of different lengths as training samples and trains the corresponding feature vectors directly from the corpus, with less loss of information. Doc2vec is built on word2vec and has two training methods. One is the PV-DM (Distributed Memory Model of Paragraph Vectors) model, which predicts the probability of a word given the paragraph vector and the context words, similar to the CBOW (Continuous Bag-of-Words) model in word2vec. The specific structure is shown in Figure 2.
The other is the PV-DBOW (Distributed Bag of Words of Paragraph Vector) model, which predicts the probability of a group of random words in a paragraph given the paragraph vector, similar to the Skip-gram (Continuous Skip-gram) model in word2vec. The specific structure is shown in Figure 3.
This paper selects the PV-DM method of the doc2vec paragraph vector model for training. The preprocessed dataset is used directly as the training corpus. Using the existing doc2vec implementation in the Gensim library, the training corpus is loaded as tagged sentences through the TaggedLineDocument class. Then, the minimum word frequency, maximum distance, feature vector dimension, accelerated training method, and other parameters are set; all remaining training parameters keep their default values. In this way, the text vector space model is generated. Finally, the trained doc2vec paragraph vector model is saved. The specific training process is as follows. First, given the words used for training, and assuming that there are M known words, the ultimate goal is to maximize the average log probability of predicting the target word from the known context words, as shown in formula (2):

$$\frac{1}{M} \sum_{m=k}^{M-k} \log P(\omega_m \mid \omega_{m-k}, \ldots, \omega_{m+k}) \tag{2}$$

Second, for each output word $\omega_i$, the average of the word vectors is extracted from the word vector matrix W by the function f. Here a and b are parameters, and the calculation is shown in formula (3):

$$y = a + b\, f(\omega_{m-k}, \ldots, \omega_{m+k}; W) \tag{3}$$

Next, the word $\omega_m$ is predicted by the softmax function, where $y_{\omega_i}$ is the unnormalized log-probability of output word $\omega_i$, as shown in formula (4):

$$P(\omega_m \mid \omega_{m-k}, \ldots, \omega_{m+k}) = \frac{e^{y_{\omega_m}}}{\sum_i e^{y_{\omega_i}}} \tag{4}$$

Finally, the paragraph vector, identified by its paragraph id, is added. This paragraph vector not only has a fixed length but also has the same length as the word vectors, so it adapts better to new data. In formula (3), the function f is jointly constructed from the paragraph matrix V and the word matrix W.
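The prediction step described by formulas (3) and (4) can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions: the function names `softmax` and `pv_dm_predict`, the toy numbers, and the choice to average the paragraph vector together with the context word vectors are hypothetical and are not taken from the paper's Gensim-based implementation.

```python
import math

def softmax(scores):
    """Turn unnormalized scores into probabilities, as in formula (4)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pv_dm_predict(paragraph_vec, context_vecs, a, b):
    """Score every vocabulary word with y = a + b * f(...), as in formula (3).

    f is assumed here to average the paragraph vector with the context
    word vectors; a is a bias vector and b a weight matrix (one row per word).
    """
    vectors = [paragraph_vec] + context_vecs
    dim = len(paragraph_vec)
    h = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    y = [a_i + sum(w * h_j for w, h_j in zip(row, h)) for a_i, row in zip(a, b)]
    return softmax(y)

# Toy setup (made-up numbers): 3-word vocabulary, 2-dimensional vectors.
paragraph_vec = [0.2, -0.1]
context_vecs = [[0.5, 0.3], [-0.4, 0.8]]
a = [0.0, 0.0, 0.0]
b = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
probs = pv_dm_predict(paragraph_vec, context_vecs, a, b)
```

Here `probs` holds one probability per vocabulary word, and training would adjust the vectors and parameters so that the probability of the true target word increases.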
To sum up, the doc2vec paragraph vector model is selected. First, the preprocessed dataset is divided through the train_test_split function in Python's sklearn library. Second, the trained doc2vec paragraph vector model is converted into a human-readable txt document, and the words of each text are mapped one by one to the vectors of the trained model through the Tokenizer. Finally, the weights are allocated according to the matching results, which yields the embedding layer of the model for subsequent training. The specific process is shown in Figure 4.
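The mapping from tokenizer indices to pretrained vectors can be sketched as follows. This is a hedged illustration: the helper name `build_embedding_matrix`, the toy word index, and the toy vectors are hypothetical. It assumes that word indices start at 1, with row 0 reserved for padding, as the Keras Tokenizer does.

```python
def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the pretrained vector of the word with index i.

    Row 0 (padding) and words without a pretrained vector stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = list(vec)
    return matrix

# Made-up example: indices as a tokenizer would assign them (starting at 1).
word_index = {"film": 1, "terrible": 2, "unseen": 3}
vectors = {"film": [0.1, 0.2], "terrible": [-0.3, 0.4]}
weights = build_embedding_matrix(word_index, vectors, dim=2)
```

A matrix of this shape is what an embedding layer would typically receive as its initial weights.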

CNN Model.
The local features of text information are extracted through the CNN model. The CNN is a feedforward neural network that performs translation-invariant classification of input information [16]. The specific structure is shown in Figure 5, which mainly includes the following four parts:
(1) Input Layer: This layer takes the output of the doc2vec embedding layer as the input of the CNN layer. The word vectors of each comment in the dataset are denoted as $X \in \mathbb{R}^{n \times d}$, where n is the number of words and d is the vector dimension.
(2) Convolution Layer: This layer uses the Convolution1D function of the Keras library in Python and extracts the features of the input layer data through the filter. The calculation is as follows:

$$c_i = f(\omega \cdot x_{i:i+g-1} + b)$$

where $\omega$ represents the convolution kernel, g represents the size of the convolution kernel, $x_{i:i+g-1}$ is the sentence vector composed of words i to i + g − 1, b represents the offset term, and the obtained feature matrix is $J = \{c_1, c_2, \ldots, c_{n-g+1}\}$. ReLU [17] is selected as the activation function f, which is simple and can effectively alleviate the vanishing gradient problem:

$$f(x) = \max(0, x)$$

(3) Pooling Layer: This layer uses the MaxPooling1D pooling function of the Keras library in Python, selects and filters the features extracted by the convolution layer, and reduces the feature dimension.
(4) Full Connection Layer: This layer concatenates the vectors $M_i$ produced by the previous layer into a vector Q, which is used as the input of the BiLSTM layer.

BiLSTM Model.
The BiLSTM model is built on the LSTM, which is a time-based recurrent neural network that can handle long-term dependencies. Compared with the original RNN, the LSTM adds a cell state and realizes information interaction through the gating mechanism. The model structure is shown in Figure 6, which is composed of a forgetting gate, an input gate, and an output gate.
(1) Forgetting Gate: This step discards useless information from the preceding context. The structure of the forgetting gate is shown in Figure 7. The gate first reads $h_{t-1}$ and $x_t$ and then outputs $f_t$ through $\sigma$, with a value between 0 and 1. Here, $h_{t-1}$ represents the output of the previous cell state, $x_t$ represents the input of the current cell state, and $\sigma$ represents the sigmoid function; a value of 0 indicates complete rejection, and a value of 1 indicates complete retention. The forgetting gate function $f_t$ is expressed as formula (9), where $W_f$ represents the weight and $b_f$ represents the offset:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{9}$$
(2) Input Gate: This step completes the update of the cell state information and is divided into the following two operations: (a) As shown in Figure 8, we determine the information to be updated. First, we obtain $i_t$ through the $\sigma$ layer. Its value is between 0 and 1 and determines what information is to be updated. Then, a vector $\tilde{c}_t$ is generated by the tanh function to filter the input candidate information. Here, $\sigma$ indicates the sigmoid function, a value of 0 indicates unimportant information, and a value of 1 indicates important information. The input gate function $i_t$ is given by formula (10), and the input candidate information function $\tilde{c}_t$ is given by formula (11), where $W_i$ and $W_c$ represent weights and $b_i$ and $b_c$ represent offsets:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{10}$$

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{11}$$
(b) As shown in Figure 9, we update the cell state. First, we multiply the old cell state $C_{t-1}$ by the forgetting gate function $f_t$ to discard the useless earlier information. Then, we multiply the input gate function $i_t$ by the input candidate information function $\tilde{c}_t$ to determine the updated information. Finally, we combine the two parts to realize the transformation from $C_{t-1}$ to $C_t$ and complete the update of the cell state, as shown in formula (12):

$$C_t = f_t * C_{t-1} + i_t * \tilde{c}_t \tag{12}$$

(3) Output Gate: This step determines the output information.
The structure of the output gate is shown in Figure 10. First, we obtain $o_t$ through the sigmoid function of the $\sigma$ layer, which determines which part of the cell state is to be output. Then, we process the cell state $C_t$ at time t with the tanh function and obtain a value between −1 and 1. Finally, these two parts are multiplied to obtain $h_t$, the final output information of the cell. The output gate function $o_t$ is given by formula (13), and the output information function $h_t$ is given by formula (14), where $W_o$ represents the weight and $b_o$ represents the offset:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{13}$$

$$h_t = o_t * \tanh(C_t) \tag{14}$$
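The three gates can be put together in a short pure-Python sketch of a single LSTM time step. It assumes the standard LSTM gate equations that formulas (9) to (14) refer to; the helper names (`sigmoid`, `affine`, `lstm_step`), the parameter dictionary layout, and the toy values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_f' and 'b_f' to parameters."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(p["W_f"], p["b_f"], z)]      # forgetting gate
    i_t = [sigmoid(v) for v in affine(p["W_i"], p["b_i"], z)]      # input gate
    c_hat = [math.tanh(v) for v in affine(p["W_c"], p["b_c"], z)]  # candidate information
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]  # cell state update
    o_t = [sigmoid(v) for v in affine(p["W_o"], p["b_o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]             # output information
    return h_t, c_t

# Toy parameters: hidden size 1, input size 1, all weights and offsets zero,
# so every gate evaluates to 0.5 and the candidate to 0.
zero_W, zero_b = [[0.0, 0.0]], [0.0]
params = {"W_f": zero_W, "b_f": zero_b, "W_i": zero_W, "b_i": zero_b,
          "W_c": zero_W, "b_c": zero_b, "W_o": zero_W, "b_o": zero_b}
h_t, c_t = lstm_step([1.0], [0.0], [1.0], params)
```

With these toy parameters the old cell state is halved by the forgetting gate and nothing new is added, which makes the effect of each gate easy to check by hand.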
Compared with the unidirectional LSTM structure, the BiLSTM structure can obtain the long-term information of the text in both directions and reduce the error as much as possible. The BiLSTM model structure is shown in Figure 11.
Each step of the BiLSTM is composed of two parts, namely, the forward hidden state $\overrightarrow{h}_t$ and the backward hidden state $\overleftarrow{h}_t$, both of which are LSTM network structures [18]. Similar to the unidirectional LSTM, $\overrightarrow{h}_t$ updates the cell state and information from front to back, from time 0 to time t, while $\overleftarrow{h}_t$ updates the cell state and information from back to front, from time t to time 0. Finally, the concatenation of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ at time t is output:

$$h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]$$

To sum up, the BiLSTM layer is selected. That is, the Bidirectional wrapper of the Keras library in Python is applied to the unidirectional LSTM layer. The local semantic information extracted by the CNN layer is processed through a bidirectional recurrent pass to obtain the context semantic information of long text, and measures to prevent overfitting are added.
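The bidirectional pass can be sketched independently of the cell type. The example below is a simplified illustration, not the Keras implementation: `run_bidirectional` is a hypothetical helper, and `running_sum_step` is a deliberately trivial stand-in for an LSTM cell so that the result can be verified by hand.

```python
def run_bidirectional(sequence, step, initial_state):
    """Run `step` over the sequence in both directions and concatenate
    the forward and backward outputs at each time step."""
    forward, state = [], initial_state
    for x in sequence:
        out, state = step(x, state)
        forward.append(out)
    backward, state = [], initial_state
    for x in reversed(sequence):
        out, state = step(x, state)
        backward.append(out)
    backward.reverse()  # realign the backward outputs with the original time order
    return [f + b for f, b in zip(forward, backward)]

def running_sum_step(x, state):
    """Hypothetical stand-in for a recurrent cell: accumulates the inputs."""
    new_state = state + x
    return [new_state], new_state

outputs = run_bidirectional([1.0, 2.0, 3.0], running_sum_step, 0.0)
```

Each element of `outputs` pairs what the forward pass has seen up to that position with what the backward pass has seen from the end, which is the information the concatenated hidden state carries.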

Attention Mechanism.
The attention mechanism is a data analysis method of deep learning. First, we focus on important local information and then combine the information from different regions to form an overall evaluation [19].
We take the output information of the previous layer as the input information of this layer and assume that the input information is in the form of (Key, Value) pairs, where Key represents the keyword and Value represents the weight, and the vector Query represents the query. The specific calculation process of the attention layer is shown in Figure 12.

Figure 9: LSTM model-input gate 2.
Figure 10: LSTM model-output gate.

First, we calculate the similarity between each vector Query and the Key of each input through the function F and obtain the attention score $e_{ti}$:

$$e_{ti} = F(\mathrm{Query}_t, \mathrm{Key}_i)$$

Second, we normalize the attention score $e_{ti}$ with the softmax function to obtain the weight coefficient $\alpha_{ti}$ of the Value of each input. The larger the weight coefficient, the more attention should be paid to this part of the information:

$$\alpha_{ti} = \mathrm{softmax}(e_{ti}) = \frac{\exp(e_{ti})}{\sum_j \exp(e_{tj})}$$

Finally, the Value of each input is weighted by its corresponding weight coefficient and summed to obtain the attention value:

$$\mathrm{Attention}(\mathrm{Query}_t) = \sum_i \alpha_{ti}\, \mathrm{Value}_i$$

To sum up, the attention mechanism uses the attention function of the Keras library in Python to allocate the weights and calculate the attention value of the features obtained by the BiLSTM layer. It then judges the importance of the text information according to the size of the results to complete the corresponding resource allocation. Finally, we obtain the result of text sentiment classification through the output layer.
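The three attention steps (score, normalize, weighted sum) can be sketched in pure Python. The similarity function F is not specified in the text above, so this sketch assumes a dot product; the function name `attention` and the toy vectors are likewise hypothetical.

```python
import math

def attention(query, keys, values):
    """Return (weights, attention value) for one query.

    Assumes F is the dot product between the query and each key.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]  # attention scores
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax weight coefficients
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context

# Made-up example with two inputs; the query matches the first key.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
weights, context = attention(query, keys, values)
```

The input whose key is most similar to the query receives the largest weight, so its value dominates the resulting attention value.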

Experiments and Analysis
In this paper, the deep hybrid model, which integrates the doc2vec model, a deep learning model, and the attention mechanism, is applied to sentiment analysis tasks. Different models are compared on the same datasets to verify their effectiveness. The hybrid model runs on the Windows 10 system and is built in Python, based on the deep learning framework Keras, in the PyCharm integrated development environment.

Figure 11: BiLSTM model structure.

Experimental Data.
The experimental datasets are the IMDB movie review dataset and the DailyDialog multiround dialog text dataset. The IMDB dataset contains 50000 movie reviews with binary sentiment polarity. About 25000 of the reviews have negative polarity, marked as 0, and the remaining 25000 have positive polarity, marked as 1. The DailyDialog dataset contains tens of thousands of dialogues, and each dialogue is divided into several sentences, each of which is labeled with one of seven emotional attitudes. We delete the sentences marked with 0 and classify the sentences marked with 1∼3 as negative polarity and those marked with 4∼6 as positive polarity.
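The relabeling of DailyDialog described above can be written as a small helper. This is a sketch only: the function name `to_binary_polarity` and the sample sentences are hypothetical, and the input is assumed to be a list of (sentence, emotion label) pairs.

```python
def to_binary_polarity(samples):
    """Map seven-way emotion labels to binary polarity.

    Label 0 is dropped, labels 1-3 become 0 (negative), and labels 4-6
    become 1 (positive), following the rule stated in the text.
    """
    result = []
    for sentence, label in samples:
        if label == 0:
            continue  # sentences marked with 0 are deleted
        if 1 <= label <= 3:
            result.append((sentence, 0))
        elif 4 <= label <= 6:
            result.append((sentence, 1))
        else:
            raise ValueError(f"unexpected emotion label: {label}")
    return result

# Made-up sentences, for illustration only.
samples = [("ok", 0), ("leave me alone", 1), ("what a day", 4), ("oh no", 3)]
binary = to_binary_polarity(samples)
```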
Usually, the original dataset needs to be preprocessed before the formal sentiment analysis experiment. The preprocessing of the experimental datasets mainly includes three parts, namely, corpus cleaning, text segmentation, and removal of stop words. In corpus cleaning, we delete the HTML tags and non-English characters (punctuation, numbers, etc.) in the dataset and convert all English characters to lowercase. Corpus cleaning eliminates the interference of irrelevant text and improves the accuracy of the model's sentiment classification. In the removal of stop words, for example, "this," "was," and "a" in "this was a discarding film" are stop words that need to be deleted, and the final result is "discarding film." The removal of stop words reduces the space occupied by the dataset and improves the efficiency of sentiment analysis.
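The preprocessing steps can be sketched with the standard library alone. The function name `preprocess`, the regular expressions, and the tiny stop-word set are assumptions made for illustration; the actual stop-word list used in the experiments is not given in the text.

```python
import re

# Tiny illustrative stop-word set (an assumption, not the paper's list).
STOP_WORDS = {"this", "was", "a", "an", "the", "is"}

def preprocess(text, stop_words=STOP_WORDS):
    """Clean a review, split it into words, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)     # corpus cleaning: remove HTML tags
    text = re.sub(r"[^A-Za-z]+", " ", text)  # remove punctuation, digits, non-English characters
    tokens = text.lower().split()            # lowercase, then segment on whitespace
    return [t for t in tokens if t not in stop_words]

tokens = preprocess("This was a <br />discarding film!!! 10/10")
```

Applied to the example sentence from the text, the sketch keeps only the two content words.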
The IMDB dataset and the DailyDialog dataset are each divided into a training set, a test set, and a validation set in the ratio 8 : 1 : 1, as shown in Table 1.
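An 8 : 1 : 1 split can be sketched as follows. The helper name `split_8_1_1`, the fixed seed, and the shuffling step are assumptions for illustration; the text does not state how the samples were shuffled or which tool performed this particular split.

```python
import random

def split_8_1_1(samples, seed=42):
    """Shuffle the samples and split them into training, test, and validation sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]  # the remainder
    return train, test, validation

train, test, validation = split_8_1_1(range(50))
```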

Experimental Parameter Setting.
In this paper, the training parameters are set as follows: the training parameters of the doc2vec + CNN + BiLSTM + Attention model are listed in Table 2, and the training parameters of the doc2vec model are listed in Table 3.

Evaluation Index
Accuracy.
Accuracy is the most commonly used indicator of model performance; it is the proportion of correctly predicted samples among all samples. Generally speaking, the higher the accuracy, the better the model. It is calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

where TP is the number of samples whose actual polarity is positive and that are predicted to be positive (true positives); FP is the number of samples that are actually negative but are predicted to be positive (false positives); FN is the number of samples that are actually positive but are predicted to be negative (false negatives); and TN is the number of samples that are actually negative and are predicted to be negative (true negatives) [20]. The confusion matrix is shown in Table 4.
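As a worked example of the definitions above, the following sketch counts the four confusion-matrix cells and computes the accuracy. The function names and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Proportion of correctly predicted samples among all samples."""
    return (tp + tn) / (tp + fp + fn + tn)

# Made-up labels: three of the five predictions are correct.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
```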

Loss Rate.
The loss rate describes the difference between the predicted value and the real value. The sentiment analysis problem studied in this paper is a binary classification problem, so binary cross-entropy is selected as the loss function. The ultimate goal is to optimize all parameters, minimize the objective function (loss function) as far as possible, and guide the model toward convergence during training. The function is as follows:

$$\mathrm{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\bigl(1 - p(y_i)\bigr) \right]$$

where N is the number of samples, $y_i$ is the binary tag 0 or 1 of sample i, and $p(y_i)$ is the predicted probability that sample i belongs to tag 1.
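The binary cross-entropy loss described above can be computed directly. This sketch assumes the standard definition averaged over samples, with predictions given as probabilities of the positive tag; the function name, the clipping constant `eps`, and the toy values are hypothetical.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; p_pred holds predicted probabilities of tag 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Made-up predictions: confident and correct versus confident and wrong.
good = binary_cross_entropy([1, 0], [0.9, 0.2])
bad = binary_cross_entropy([1, 0], [0.1, 0.8])
```

Predictions that are close to the true tags give a small loss, while confident wrong predictions are penalized heavily.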

Experimental Comparison Model.
In this paper, different control groups are set up to conduct comparative experiments on the same datasets to verify the performance of the new hybrid model based on doc2vec + CNN + BiLSTM + Attention. The control groups are divided into three groups: comparison experiments of sentiment analysis based on simple deep learning models, on word2vec models, and on doc2vec models.

Comparative Experiments of Sentiment Analysis Based on Simple Deep Learning Models.
In the experiments, the CNN + LSTM hybrid model, the CNN model, and the LSTM model are selected for comparison. The hybrid deep learning model is compared with the single deep learning models to verify whether it can effectively combine the advantages of each part and outperform the single models. The structure of the CNN + LSTM model is shown in Figure 13.
In the experiment, first, the dataset is preprocessed and transformed into a text vector matrix as the input layer of the model. Second, the convolution is computed by setting the corresponding parameters in the CNN layer, with ReLU selected as the activation function, and the corresponding pooling function is selected in the pooling layer for feature selection and dimension reduction. After the local semantic information of the text is obtained, the cell state is continuously updated through the gating mechanism of the LSTM layer to obtain the semantic information of the long text. Dropout is added to prevent overfitting during model training. Finally, the fully connected layer sets the units and activation function, and the output layer obtains the sentiment classification results.
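The convolution, ReLU, and pooling steps can be illustrated with a pure-Python sketch for a single filter. The function names, the toy input, and the kernel values are hypothetical; the pooling uses non-overlapping windows (stride equal to the pool size), which is the default behavior of Keras MaxPooling1D.

```python
def conv1d_relu(x, kernel, bias):
    """Slide a kernel of g rows over n word vectors and apply ReLU.

    x is a list of n vectors of dimension d, kernel a list of g vectors of
    dimension d; the result has n - g + 1 features.
    """
    g = len(kernel)
    features = []
    for i in range(len(x) - g + 1):
        window = x[i:i + g]
        s = sum(w * v
                for k_row, x_row in zip(kernel, window)
                for w, v in zip(k_row, x_row)) + bias
        features.append(max(0.0, s))  # ReLU activation
    return features

def max_pool1d(features, pool_size):
    """Keep the maximum of each non-overlapping window of pool_size features."""
    return [max(features[i:i + pool_size])
            for i in range(0, len(features) - pool_size + 1, pool_size)]

# Made-up input: n = 5 words, d = 2 dimensions, kernel size g = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-2.0, -2.0], [0.0, 3.0]]
kernel = [[1.0, 1.0], [1.0, -1.0]]
features = conv1d_relu(x, kernel, bias=0.0)
pooled = max_pool1d(features, pool_size=2)
```

The five input vectors yield four convolution features, which pooling then reduces to two values.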

Comparative Experiments of Sentiment Analysis Based on word2vec Models.
The experiments combine the word2vec model with the CNN model, the LSTM model, and the CNN + LSTM model, respectively, to verify the effectiveness of the word2vec word vector model. Testing three combinations helps avoid chance results, reduces error, and makes the results more objective. The models are as follows:
(1) Word2vec + CNN Model: Based on the CNN model, the trained word2vec word vector model is used as the embedding layer to replace the original input layer. The weight of the embedding layer is changed from the original randomly assigned vector to the word2vec word vector. The convolution layer sets the corresponding parameters for the convolution, with ReLU as the activation function. The pooling layer selects the corresponding pooling function for feature selection and dimension reduction, and a dropout layer is then added. The fully connected layer sets the units and activation function. Finally, the output layer obtains the sentiment classification results. The structure of the word2vec + CNN model is shown in Figure 14.
(2) Word2vec + LSTM Model: Based on the LSTM model, the trained word2vec word vector model is used as the embedding layer to replace the original input layer. The weight of the embedding layer is not a randomly assigned vector but the word2vec word vector. We set the LSTM layers and add a dropout layer to prevent overfitting. The fully connected layer sets the units and activation function, and finally the output layer obtains the sentiment classification results. The structure of the word2vec + LSTM model is shown in Figure 15.
(3) Word2vec + CNN + LSTM Model: Based on the CNN + LSTM model, the weight of the embedding layer is changed from the original randomly assigned vector to the word2vec word vector. Otherwise, the model structure remains unchanged. The word2vec + CNN + LSTM model structure is shown in Figure 16.

Comparative Experiments of Sentiment Analysis Based on doc2vec Models.
In the experiments, the doc2vec + CNN + LSTM model is compared with the word2vec + CNN + LSTM model to verify the effectiveness of the doc2vec paragraph vector model. Then, we set up the doc2vec + CNN + BiLSTM model to verify the impact of the bidirectional LSTM structure and the attention mechanism on the performance of the model. The details are as follows:
(1) Doc2vec + CNN + LSTM Model: Based on the word2vec + CNN + LSTM model, the trained doc2vec paragraph vector model is used as the embedding layer to replace the original word2vec word vector embedding layer. Otherwise, the model structure remains unchanged. The doc2vec + CNN + LSTM model structure is shown in Figure 17.
(2) Doc2vec + CNN + BiLSTM Model: Based on the doc2vec + CNN + LSTM model, the unidirectional LSTM layer is replaced by the BiLSTM layer. Otherwise, the model structure remains unchanged. The doc2vec + CNN + BiLSTM model structure is shown in Figure 18.

Analysis of Experimental Results.
The model is evaluated by comparing the accuracy and loss rates of the different models on the test set. The experimental results are shown in Table 5.

Analysis of Experimental Results Based on Simple Deep Learning Models.
The experimental results based on simple deep learning models are shown in Figure 19. According to Figure 19, on the IMDB dataset, the accuracy of the single LSTM model is 13.2% higher than that of the single CNN model, but the loss rate is 8.8% higher. The accuracy of the CNN + LSTM hybrid model is 13.7% higher than that of the CNN model and 0.5% higher than that of the LSTM model, and its loss rate is 4% higher than that of the CNN model and 4.8% lower than that of the LSTM model. On the DailyDialog dataset, the accuracy of the single LSTM model is 1.1% higher than that of the single CNN model, while the loss rate is 6.6% higher. The accuracy of the CNN + LSTM hybrid model is 2.5% higher than that of the CNN model and 1.4% higher than that of the LSTM model, and its loss rate is 3.4% higher than that of the CNN model and 3.2% lower than that of the LSTM model.
Compared with the local information of the text, the understanding of context information has a greater impact on the accuracy of sentiment classification. The CNN + LSTM hybrid model has the best performance [21]. It attends to both the local features of the information and the time-dependent features of the preceding context before making the sentiment prediction, which effectively improves the performance of the model [22].

Analysis of Experimental Results Based on word2vec Models.
The experimental results based on word2vec models are shown in Figure 20. According to Figure 20, on the IMDB dataset, the accuracy of the word2vec + CNN model is 13.3% higher than that of the single CNN model, and the loss rate is 15.8% lower; compared with the LSTM model, the accuracy of the word2vec + LSTM model is 1.6% higher and the loss rate is 10.3% lower. The accuracy of the word2vec + CNN + LSTM model is 2.5% higher than that of the CNN + LSTM model, and the loss rate is 8.1% lower.
On the DailyDialog dataset, the accuracy of the word2vec + CNN model is 2.2% higher than that of the single CNN model, and the loss rate is 0.7% lower; compared with the LSTM model, the accuracy of the word2vec + LSTM model is 1.3% higher and the loss rate is 2.8% lower. The accuracy of the word2vec + CNN + LSTM model is 0.4% higher than that of the CNN + LSTM model, and the loss rate is 1% lower.
These results show that the accuracy of the three models that use word2vec-pretrained word vectors is significantly higher than that of the corresponding simple deep learning models. The word vectors pretrained by word2vec capture semantic information better and therefore effectively improve the input layer of the simple deep learning models, yielding higher-performing models.

Figure 13: CNN + LSTM model structure.

Analysis of Experimental Results Based on doc2vec
Models. In order to further explore the role of the doc2vec model in reducing the loss of semantic information, we first compare the doc2vec model with the word2vec model.
On the IMDB dataset, we load the trained word2vec word vector model and the doc2vec paragraph vector model and compute the cosine similarity between different words. For example, the top 10 words with the highest cosine similarity to "terrible" are shown in Figures 21 and 22.
For the same word "terrible," the ten words with the highest cosine similarity differ between the two models. This indicates that the word2vec word vector model and the doc2vec paragraph vector model understand semantic information differently. Word2vec-pretrained word vectors capture the semantic information of the text better than general word vectors and allow the relationship between different words to be analyzed quantitatively. However, word2vec ignores the influence of word order on semantic understanding, while doc2vec introduces paragraph vectors for sentences of variable length, which is simpler and more flexible. It can alleviate the abovementioned problem and reduce the loss of semantic information. The deeper and more accurate the semantic understanding, the better the sentiment classification result of the model.
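The ranking by cosine similarity described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `cosine_similarity` and `most_similar` and the three-dimensional toy vectors are invented here and are not the embeddings trained in this paper; in practice the vectors would come from the trained word2vec or doc2vec model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy embeddings, invented for illustration only (not from the paper).
toy_vectors = {
    "terrible": [0.9, 0.1, 0.0],
    "awful":    [0.8, 0.2, 0.1],
    "boring":   [0.5, 0.5, 0.0],
    "great":    [-0.7, 0.3, 0.2],
}

ranking = most_similar("terrible", toy_vectors, topn=2)
for neighbour, score in ranking:
    print(f"{neighbour}: {score:.3f}")
```

Running the same ranking against two different embedding tables is what produces two different top-10 lists for the same query word.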
Secondly, according to Figure 23, we compare and analyze the doc2vec + CNN + LSTM model and the word2vec + CNN + LSTM model. On the IMDB dataset, the accuracy of the doc2vec + CNN + LSTM model is 0.6% higher than that of the word2vec + CNN + LSTM model, and the loss rate is 20.8% lower. On the DailyDialog dataset, the accuracy of the doc2vec + CNN + LSTM model is the same as that of the word2vec + CNN + LSTM model, and the loss rate is 1% lower. This shows that the gap between the predicted value and the real value is smaller for the doc2vec model, which better retains the semantic information of the text, reduces the loss of information, and makes the final model performance more stable.
Then, we compare and analyze the doc2vec + CNN + BiLSTM model and the doc2vec + CNN + LSTM model. On the IMDB dataset, the accuracy of the doc2vec + CNN + BiLSTM model is 0.6% higher than that of the doc2vec + CNN + LSTM model, but the loss rate is 0.2% higher. On the DailyDialog dataset, the accuracy of the doc2vec + CNN + BiLSTM model is 0.7% higher than that of the doc2vec + CNN + LSTM model, and the loss rate is 3.2% lower. This shows that, compared with the unidirectional LSTM structure, the bidirectional LSTM structure better retains historical information, lets context information interact, and improves the accuracy of sentiment classification.
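The observation that two models can reach the same accuracy while differing in loss can be illustrated with a small sketch. It assumes that the "loss rate" reported here is the mean binary cross-entropy that a Keras binary classifier reports as its loss; the text does not define it, so this is an assumption. The labels and predicted probabilities below are invented for illustration.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Invented toy data: both prediction sets classify every sample correctly,
# but the second one is much less confident.
labels    = [1, 0, 1, 0]
confident = [0.90, 0.10, 0.80, 0.20]
hesitant  = [0.60, 0.40, 0.55, 0.45]

for name, probs in (("confident", confident), ("hesitant", hesitant)):
    print(name, accuracy(labels, probs), round(binary_cross_entropy(labels, probs), 3))
```

Both prediction sets have identical accuracy, yet the confident one has the lower loss because its probabilities lie closer to the true labels, which is the sense in which a lower loss indicates a smaller gap between predicted and real values.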
Finally, we compare and analyze the doc2vec + CNN + BiLSTM + Attention and the doc2vec + CNN + BiLSTM model. On the IMDB dataset, the accuracy of the doc2vec + CNN + BiLSTM + Attention model is 0.4% higher and the loss rate is 3.2% lower than that of the doc2vec + CNN + BiLSTM model. On the DailyDialog     Mechanism》 combines bidirectional GRU, multichannel convolution, and attention mechanism [24].
The comparison results of the sentiment analysis models are shown in Table 6.
According to Table 6, the accuracy of sentiment classification of the abovementioned models differs because of different processing of the input information, different neural network selection, or different internal structure design of the model, as follows: (1) Comparing the doc2vec + CNN + LSTM model with the CNN-BiLSTM-Attention model, we can see that the accuracy of the two models is consistent. This shows that the doc2vec paragraph vector model and the attention mechanism both have a great impact on the accuracy of sentiment classification.  (4) Comparing the new hybrid model based on doc2vec + CNN + BiLSTM + Attention in this paper with the CNN-BiLSTM-Attention model and the ATT-MCNN-BGRUM model in the literature, we can see that the accuracy of the new model is 1.0% and 0.5% higher than that of the literature models, respectively. This shows that the doc2vec paragraph vector model can effectively reduce the loss of semantic information, and the attention mechanism can reasonably allocate emotional weights to improve the final classification effect of the model.
In summary, the new doc2vec + CNN + BiLSTM + Attention model effectively exploits the structural features of each part and performs well. The doc2vec structure enhances the understanding of the overall semantic information of the sentence through the paragraph vector. The CNN structure extracts the local features of the text. The BiLSTM structure completes the interaction of context information through its bidirectional recurrent structure. The attention mechanism allocates weights and resources according to the importance of the text information. In addition, the arrangement of the internal structure of the model is more appropriate, which further improves the performance of the model. Therefore, the doc2vec + CNN + BiLSTM + Attention model is an effective model for text sentiment analysis.
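How an attention mechanism allocates weights over a sequence can be sketched as follows. This is a generic dot-product attention pooling written in plain Python, not necessarily the attention layer used in this paper: the function names, the `query` vector (which would be learned in a trained model), and the toy hidden states are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product score against `query`
    and return the weights together with the weighted sum of the states."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context

# Toy stand-ins for per-token BiLSTM outputs (invented for illustration).
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical query; learned during training in practice

weights, context = attention_pool(states, query)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The time step with the largest score receives most of the weight, so the pooled context vector passed to the classifier is dominated by the most informative positions.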

Conclusion
This paper integrates the doc2vec model with the deep learning model and the attention mechanism to increase the ability of text feature expression, minimize the loss of semantic information, and improve the accuracy of sentiment classification. The doc2vec + CNN + BiLSTM + Attention hybrid model makes full use of the advantages of each structure. It extracts the local features of the text information through the CNN structure, enhances the interaction of context information through the BiLSTM structure, better understands semantic information and reduces semantic information loss through the doc2vec model, and reasonably allocates weights and resources through the attention mechanism. The accuracy is 91.3% and 93.3%, and the loss rate is 22.1% and 19.9%, on the IMDB dataset and the DailyDialog dataset, respectively. On the IMDB dataset, the accuracy of the hybrid model is 1.0% and 0.5% higher than that of the CNN-BiLSTM-Attention and ATT-MCNN-BGRUM models from the literature, respectively. The experimental results show that the new model effectively exploits the advantages of each part, improves the accuracy, and has stable performance. However, the new model mainly targets the binary classification problem and has a weak ability to deal with more complex problems [25]. Its processing speed is slow, and the prediction accuracy can be further improved. Therefore, the algorithm needs further improvement in the future, such as extending the binary classification problem to a multiclass classification problem while preserving the accuracy of sentiment classification, using hardware to improve model speed [26], and incorporating domain knowledge [27] to further reduce semantic loss and improve accuracy.

Data Availability
The data involved in the research are public data. They do not involve ethics and commercial secrets. The IMDB