Aspect-Level Sentiment Analysis Based on Position Features Using Multilevel Interactive Bidirectional GRU and Attention Mechanism

The aim of aspect-level sentiment analysis is to identify the sentiment polarity of a given target term in a sentence. Existing neural network models provide a useful account of how to judge polarity. However, the relative position information of context words with respect to the target terms is often ignored under the limitations of the training datasets. Incorporating position features between words into the models can improve the accuracy of sentiment classification. Hence, this study proposes an improved classification model that combines a multilevel interactive bidirectional Gated Recurrent Unit (GRU), attention mechanisms, and position features (MI-biGRU). Firstly, the position features of the words in a sentence are initialized to enrich the word embeddings. Secondly, the approach extracts the features of target terms and context by using a well-constructed multilevel interactive bidirectional neural network. Thirdly, an attention mechanism is introduced so that the model pays greater attention to words that are important for sentiment analysis. Finally, four classic sentiment classification datasets are used for aspect-level tasks. Experimental results indicate that there is a correlation between the multilevel interactive attention network and the position features, and that MI-biGRU can markedly improve classification performance.


Introduction
Capturing and analyzing the sentiments implied in large-scale comment texts has become a central topic in natural language processing (NLP). The task of fine-grained sentiment classification of target terms in a given context is called aspect-level sentiment analysis, which has received considerable attention compared with acquiring traditional comprehensive sentiment polarity [1,2]. A growing number of prestigious researchers and engineers around the world have posted their opinions and reports on topics of sentiment classification online and offer them for free. These technical contributions have been widely accepted and acclaimed for their obvious advantages in NLP tasks. However, deeper aspect-level sentiment analysis remains a pressing concern for current researchers. There are many problems in aspect-level sentiment classification, including classification, regression, and recognition. We mainly focus on classification issues [3].
Sentiment predictions for a target term in a text are important for a better understanding of sentence semantics and of the user emotions behind the sentences. The typical feature of aspect-level sentiment analysis can be exemplified by the following sentence: "they use fancy ingredients, but even fancy ingredients do not make for good pizza unless someone knows how to get the crust right." The sentiment polarities of the target terms "ingredients," "pizza," and "crust" are positive, negative, and neutral, respectively. However, one potential problem is that the predictive accuracy of polarity is much lower than applications expect, being constrained by complex sentence features and language environments. Traditional methods of comprehensive sentiment evaluation do not meet the requirements of fine-grained aspect-level tasks based on target terms [4]. Few studies have investigated the association between sentiment polarity and the position information of target terms. Hence, this paper proposes a multilevel interactive bidirectional attention network model, integrating bidirectional GRU and position information, to improve the accuracy of aspect-level sentiment predictions.
Traditional published methods for processing aspect-level tasks are limited by the selection of feature sets. The focus of these studies, such as bag-of-words and sentiment lexicons [5], is to manually label a large number of features. Scholars have long debated the labour wasted on manual marking. However, existing studies indicate that the quality of the trained models largely depends on the constructed labelled feature set. Recently, investigators have examined the effects of deep learning compared with traditional manual generation methods in NLP tasks [6,7]. The former has a clear advantage.
A recurrent neural network (RNN) can extract the essential features of word embeddings by using a multilevel recurrent mechanism and then generate a vector representation of the target sentences. Most sentiment classification models using RNN can achieve acceptable results through well-established tuning steps [8]. More recent attention on sentiment classification tasks has focused on RNN variants. The first way to improve the models is to adjust their structures. For example, target-dependent long short-term memory (TD-LSTM) [9] divides a context into left and right parts according to the target terms. Then, the hidden states used to deal with aspect-level tasks are generated by structurally combining two LSTM models. The second is characterized by a change in the input of the models. For example, there are methods that associate the target term vectors with the context vectors as the whole input of the LSTM model, which can realize aspect-level tasks by enhancing the semantic features of the words [10].
Research on neural network methods for aspect-level tasks has mostly been restricted to limited performance improvements. Moreover, few studies have systematically investigated the importance of individual words in a sentence. In other words, we cannot effectively identify which words in a sentence are more indispensable, nor accurately locate these key words in aspect-level tasks. Fortunately, attention mechanisms, which are widely used in machine translation [11], image recognition [12], and reading comprehension [13,14], can solve this problem. Attention mechanisms [15] can be utilized to measure the importance of each word in a context to the target terms, with attention ultimately expressed as a weight score. The model will focus more attention on words with high weight scores and extract more information from words related to the target terms, thus improving classification performance. Some scholars have invested in this domain and achieved excellent results, such as AE-LSTM, ATAE-LSTM [16], MemNet [17], and IAN [18]. However, the influence of the position parameters of target terms on classification performance has remained unclear [19,20]. This indicates a need to understand the actual contribution of position information.
Researchers observe that the sentiment polarity of a target term contained in a sentence is related to the context around it, rather than to words at a greater distance. A well-constructed aspect-level model should allocate higher weight scores to context words that are closer to the target term. The idea can be illustrated briefly by the following sentence: "they use fancy ingredients, but even fancy ingredients do not make for good pizza unless someone knows how to get the crust right." In this case, the polarity is positive, negative, and neutral when the target term is set to the word "ingredients," "pizza," and "crust," respectively. To decide the polarity, we should intuitively concentrate on the words that are close to the target term and only then consider the words far away from it. Hence, the word "fancy" in this case will make a greater contribution than words such as "good" and "get" to determining the polarity of the target term "ingredients." Consequently, adding position features can enrich word semantics in the embedding process. This work attempts to demonstrate that a model with position information can learn more sentence features on aspect-level tasks.
This study proposes an improved aspect-level classification model combining a multilevel interactive bidirectional Gated Recurrent Unit, attention mechanisms, and position features (MI-biGRU). The model consists of three functional parts: (1) calculate the positional index of each word in the sentence based on the current target term and express it as an embedding; (2) extract semantic features of target terms and context using a multilevel bidirectional GRU neural network; (3) use a bidirectional attention mechanism to obtain the weight score matrix of the hidden states and determine the relevance of each context word to the target term. The model not only extracts the abstract semantic features of sentences but also calculates the position features of words in parallel through a multilevel structure. A vector representation with more features can be obtained via the bidirectional attention mechanism, which enhances the performance of sentiment classification tasks. On top of that, bidirectional embeddings can be brought together to tackle accurate sentiment classification at the fine-grained level. Finally, the effectiveness of the model is evaluated on four public aspect-level sentiment datasets. The experimental results show that the proposed model achieves good aspect-level sentiment discrimination performance on all datasets. This paper is organized as follows. Section 2 introduces the related work. Section 3 formulates the improved model MI-biGRU, which is composed of a multilevel interactive bidirectional Gated Recurrent Unit, attention mechanisms, and position features. Section 4 deals with experiments and reports results on aspect-level sentiment analysis. Conclusions and future work are presented in Section 5.

Related Work
This section introduces the development of sentiment analysis in recent years. The general research can be divided into three parts: traditional sentiment analysis methods, neural network-based methods, and applications of the attention mechanism in aspect-level tasks.

Traditional Sentiment Analysis Methods.
Existing traditional methods for sentiment classification are extensive and focus particularly on machine learning technologies, which solve two problems: text representation and feature extraction. First of all, several studies have used support vector machines (SVM) for text representation in sentiment classification tasks [21]. In the SVM formulation, the words of a text make no distinction between target terms and ordinary context. Other text representation methods in the literature are concerned with sentiment words [22,23], tokens [24], or dependency path distance [24]. The above methods are called coarse-grained classifications. On top of that, the majority of studies on feature extraction have relied on sentiment lexicon and bag-of-words features [25][26][27]. These methods have played an increasingly important role in improving classification performance. Yet, these existing approaches have given rise to heated debate. Model training is heavily dependent on the features we extract. Manually labelling features inevitably takes a lot of manpower and time. Moreover, classification performance is low because of the high dimensionality of the useful information when features are obtained from unlabelled text.

Neural Network-Based Sentiment Analysis Methods.
Neural network-based methods have become increasingly popular for sentiment classification tasks because of their flexible structure and pleasing performance [28]. For example, models such as Recursive Neural Networks [29], Recursive Neural Tensor Networks [30], Tree-LSTMs [31], and Hierarchical LSTMs [32] all enhance the accuracy of sentiment classification, more or less, through different model structures. These models improve accuracy compared with traditional machine learning. However, researchers have come to recognize their inadequacies: failing to distinguish the target terms of a sentence greatly decreases the classification effect. Therefore, some scholars have turned their attention to target terms. Jiang et al. performed a series of experiments to show the significance of target terms for sentiment classification tasks [5]. Tang et al. reviewed the literature of the period and proposed two improved models, TD-LSTM and TC-LSTM, which deal with automatic target term extraction, context feature enhancement, and classification performance improvement. Zhang et al. conducted a series of trials on sentiment analysis in which they constructed a neural network model with two gate mechanisms [33]. The mechanisms respectively extract grammatical and semantic information and the relationship information between the left and right context of a target term. Finally, the information extracted by the two gate mechanisms is aggregated for sentiment classification. Overall, these studies highlight the need for target terms, but they make no reference to the position of such terms or to the relationship between position information and classification performance.

Application of Attention Mechanism in Aspect-Level Sentiment Analysis.
Deep learning technologies were originally applied in the field of images and have gradually moved into the NLP area with excellent results. Attention mechanisms in deep learning serve as an effective way to achieve highly accurate sentiment classification. A few NLP researchers have surveyed the intrinsic relevance between the context and the target terms in sentences. For example, Zeng et al. designed an attention-based LSTM for aspect-level sentiment classification [10], which processes target term embedding and word embedding simultaneously in the pretraining step. Then, the target term vector is fed into an attention network to calculate the term weights. More recent work has focused on aspect-level tasks with similar attention mechanisms. Tang et al. designed a deep memory network [17] with multiple computational layers. Each layer is a context-based attention model, through which the relationship weights from context to target terms can be obtained. Ma et al. capture the deep semantic association between context and target terms by proposing an interactive attention model [18].
This model obtains the two-way weights and combines them to perform aspect-level sentiment classification.
Most improved neural network models achieve better results than the original ones. However, these methods ignore the role of the position relationship between context and target terms, even though the polarity of the target terms is affected by certain positional relationships. A subsequent study by Zeng et al. offers some important insights into the application of position information in classification tasks [34]. For example, understanding the distance between context and target term, and how to represent such distance as an embedding, helps aspect-level work. The work of Gu et al. uses a position-aware bidirectional attention network to investigate aspect-level sentiment analysis [20], which provides rich semantic features in the word embedding representation.
As noted above, interactive networks and position information are each particularly useful for studying aspect-level sentiment analysis, but little attention has so far been paid to both simultaneously. Hence, a combination of the interactive concept and position parameters is used in this investigation. This study proposes an improved aspect-level sentiment analysis model combining a multilevel interactive bidirectional Gated Recurrent Unit, attention mechanisms, and position features (MI-biGRU). First of all, the distance from each context word to the target term is prepared following a procedure similar to that of Zeng et al. [34]. On top of that, the word embeddings with position information are trained by a multilevel bidirectional GRU neural network. Finally, a bidirectional interactive attention mechanism is used to compute the weight matrix that identifies the context words possessing semantic association with the target term. MI-biGRU achieves higher classification accuracy on aspect-level tasks than previous models, as shown in Section 4.
Discrete Dynamics in Nature and Society

Model Description
This section presents the details of the MI-biGRU model for aspect-level sentiment analysis. In previous work, several definitions with symbolic differences have been proposed for sentiment analysis. Hence, we first provide the basic concepts and notations of MI-biGRU classification used in this paper.
A sentiment tricategory problem represented in MI-biGRU is associated with a three-tuple polarity set (positive, negative, and neutral). We are given a sentence with n words, including context and target terms. A target term is usually denoted by a word group composed of one or more adjacent words in the context, where the positions of the first and last words in the word group are called the start and end positions, respectively. The target term embedding sequence is denoted by [e_a^1, e_a^2, ..., e_a^m] for m predetermined target terms. The notation [p_1, p_2, ..., p_n] represents the relative distance embeddings from each word w_c^i, i ∈ {1, 2, ..., n}, of a sentence to a target term. The overall architecture of MI-biGRU is illustrated in Figure 1. The goal of the model is to extract enough semantic information from the two embedding sequences and combine them to perform aspect-level sentiment classification. The notations employed to represent the components of MI-biGRU are described in Table 1. The details of the model are divided into six steps based on their execution order.

Position Representation.
Aspect-level tasks have benefited greatly from position embedding representations, which supply more valuable word features [34]. The concept of relative distance between words serves to quantify the relevance of a sentence word to a target term. We need to represent the position information in an embeddable vector pattern, which can be formalized as an integer vector or a matrix depending on whether a unique target term or multiple target terms are concerned.
First of all, the word position index of a target term in a sentence is marked as the cardinal point "0." The discrete spacing from the ith word w_c^i in a sentence to the cardinal point is called the relative distance of w_c^i, which is denoted by p_i. Letting i_s and i_e denote the start and end positions of the target term, p_i can be calculated by the formula

p_i = i_s − i, if i < i_s;  p_i = 0, if i_s ≤ i ≤ i_e;  p_i = i − i_e, if i > i_e.

Extending this concept to a sentence with n words gives the position index list P = [p_1, p_2, ..., p_n].
This can be illustrated briefly by the following two examples. Firstly, if the single word "quantity" is the target term in the sentence "the quantity is also very good, you will come out satisfied," we develop the position index list [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] by setting the cardinal point "0" for the second word "quantity" and assigning an increasing positive integer to the other words moving left or right. Secondly, if the target term contains more than one adjacent word, all of its internal words are assigned the cardinal point "0." The other words in the sentence obtain an increasing positive integer moving left from the start position of the term or right from its end position. Therefore, the position index list [6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6, 7] is obtained for the sentence "all the money went into the interior decoration, none of it went to the chefs" with the target term "interior decoration."
On top of that, if multiple target terms are applied to a sentence, we obtain a sequence of position index lists that is called the position matrix. Assuming that a sentence has n words and m target terms, let P_i denote the position index list of the ith target term. The position matrix G is defined as

G = [P_1; P_2; ...; P_m] ∈ R^{m×n},

where m refers to the number of target terms and n is the number of words in the sentence. Then, we use a position embedding matrix P ∈ R^{d_p×n} to convert a position index sequence into a position embedding, where d_p refers to the dimension of the position embedding. P is initialized randomly and updated during model training. The matrix is further exemplified in the same example "all the money went into the interior decoration, none of it went to the chefs," which contains the two target terms "interior decoration" and "chefs." We obtain the position matrix

G = [6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6, 7;
     14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0].

The position matrix plays an increasingly important role in helping researchers get a better sense of aspect-level tasks. We can first observe the polarity of the emotional words near the target term and then consider the words farther away to judge whether a sentence is positive or not. For example, in the case "the quantity is also very good, you will come out satisfied," the distance of "good" (4) is smaller than that of "satisfied" (9) by a simple numerical comparison of the index list [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], and the approach will give priority to "good" instead of "satisfied" when judging the sentiment polarity of the subject "quantity" of the sentence.
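The relative-distance and position-matrix computations described above can be sketched in a few lines of Python. This is a toy illustration under our own conventions (0-based word positions and a hypothetical `position_indices` helper), not the authors' code:

```python
def position_indices(n_words, start, end):
    """Relative distance p_i of each word position i (0-based) to a
    target term spanning word positions [start, end]: words inside the
    span get 0, words outside count up from the nearer span boundary."""
    return [start - i if i < start else i - end if i > end else 0
            for i in range(n_words)]

# "the quantity is also very good, you will come out satisfied",
# target term "quantity" at word position 1:
assert position_indices(11, 1, 1) == [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# "all the money went into the interior decoration, none of it went to
# the chefs", with target terms "interior decoration" (positions 6-7)
# and "chefs" (position 14) stacked into the position matrix G:
G = [position_indices(15, 6, 7), position_indices(15, 14, 14)]
assert G[0] == [6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6, 7]
```

Each row of `G` is one target term's index list, matching the m × n position matrix defined above.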
This study suggests that adding position information when initializing word embeddings can provide more features for aspect-level sentiment classification.

Word Representation.
One of the basic tasks of sentiment analysis is to represent each word in a given sentence by an embedding operation. A feasible approach is to embed each word in a low-dimensional real-valued vector through the word embedding matrix E ∈ R^{d_w×v}, where d_w represents the dimension of word embedding and v denotes the size of the vocabulary. Matrix E is generally initialized by random number generation. Then, the matrix weights are updated to reach stable values during model training. Another feasible method to obtain the matrix E is to pretrain it on an existing corpus [35].
This study uses pretrained GloVe vectors from Stanford University (available at http://nlp.stanford.edu/projects/glove/) to obtain word embeddings. The word embedding of each word in a sentence and the corresponding position embedding from [p_1, p_2, ..., p_n] are concatenated to get the final word embedding representation [w_1, w_2, ..., w_n].
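The concatenation step can be sketched as follows; toy dimensions and random vectors stand in for the pretrained GloVe vectors and the learned position embeddings, and all names are illustrative:

```python
import random

random.seed(0)
d_w, d_p, n = 4, 2, 3  # toy dims: word emb, position emb, sentence length
# stand-ins for the word embeddings e_i and position embeddings p_i
E = [[random.uniform(-0.1, 0.1) for _ in range(d_w)] for _ in range(n)]
P = [[random.uniform(-0.1, 0.1) for _ in range(d_p)] for _ in range(n)]
# final representation of word i is the concatenation w_i = [e_i ; p_i]
W = [E[i] + P[i] for i in range(n)]
assert all(len(w) == d_w + d_p for w in W)
```

In the paper's setting, d_w = 300 and d_p = 100, so each word enters the bidirectional GRU as a 400-dimensional vector.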

Introduction of Gated Recurrent Unit (GRU).
The recurrent neural network is a widespread network employed in natural language processing in recent years. One advantage of RNN is that it can process variable-length text sequences and extract the key features of a sentence. However, the performance of traditional RNN on long sentences has been mostly restricted by the problems of gradient vanishing and explosion during training. As a result, RNN is unable to propagate vital information across long spans of text.
A great deal of previous research into RNN has focused on model variants. Much current scholarship pays particular attention to LSTM models and GRU models. Both LSTM and GRU provide a gate mechanism so that the neural network can retain important information and forget information that is less relevant to the current state. It is widely accepted that GRU requires fewer parameters and has lower network complexity than LSTM. Therefore, this paper uses the GRU model to extract the key features of word embeddings.
Details of GRU are illustrated according to the network structure shown in Figure 2. The GRU simplifies the four components of LSTM, i.e., the input gate, output gate, forget gate, and cell state, into two gates that are called the reset gate and the update gate. At any time step t, GRU includes three parameters: reset gate r_t, update gate z_t, and hidden state h_t. All the parameters are updated according to the following equations:

z_t = σ(x_t U_z + h_{t−1} W_z),
r_t = σ(x_t U_r + h_{t−1} W_r),
h̃_t = tanh(x_t U_h + (r_t ⊙ h_{t−1}) W_h),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t.

The symbolic meanings are as follows. x_t denotes the input word embedding at time t. h_{t−1} represents the hidden state at time t − 1. U_z, U_r, U_h ∈ R^{d_w×d_h} and W_z, W_r, W_h ∈ R^{d_h×d_h} denote weight matrices, where d_h indicates the dimension of the hidden state. σ and tanh denote the sigmoid and tanh functions, respectively. Notation · is the dot product and ⊙ is the element-wise multiplication.
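The GRU update can be made concrete with scalar weights (d_w = d_h = 1) so the gate arithmetic is readable; this is an illustrative toy with arbitrary weight values, not the paper's implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, U, W):
    """One GRU step with scalar input and state. U and W are dicts of
    scalar weights keyed by 'z', 'r', 'h', mirroring the update-gate,
    reset-gate, and candidate-state equations."""
    z = sigmoid(U['z'] * x_t + W['z'] * h_prev)                 # update gate
    r = sigmoid(U['r'] * x_t + W['r'] * h_prev)                 # reset gate
    h_tilde = math.tanh(U['h'] * x_t + W['h'] * (r * h_prev))   # candidate
    return (1.0 - z) * h_prev + z * h_tilde                     # new state

U = {'z': 0.5, 'r': 0.5, 'h': 1.0}  # arbitrary toy weights
W = {'z': 0.5, 'r': 0.5, 'h': 1.0}
h = 0.0
for x in [1.0, -1.0, 0.5]:  # tiny input sequence
    h = gru_step(x, h, U, W)
assert -1.0 < h < 1.0  # hidden state stays bounded by tanh
```

A bidirectional GRU simply runs this recurrence once left-to-right and once right-to-left and concatenates the two hidden states per word.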
In this study, we choose bidirectional GRU to obtain the hidden-layer vector representations of the target term and context, which extracts more comprehensive features than a normal GRU. The bidirectional GRU can extract more of the information carried by the words in a given sentence and convert it into hidden states. However, the words differ in their importance to the target term. We should capture the relevant context for different target terms and then design a strategy to improve classification accuracy by increasing the intensity of the model's attention to these words. A weight score can be used to express the degree of model concern.
The higher the score, the greater the correlation between target terms and context. Hence, an attention mechanism is developed to calculate the weight scores between different target terms and the context. When the model determines the sentiment polarity of a target term, it should pay greater attention to the words that have a higher score.
This study calculates attention weight scores in opposite directions: one from target terms to context, and the other from context to target terms. This two-way approach was chosen because two weight score matrices can be obtained to improve the performance of the model. The process of using the attention mechanism in the model is described in Figure 1. First of all, the target term hidden states [h_a^{21}, h_a^{22}, ..., h_a^{2m}] and the averaged context embedding w_avg are used to obtain the attention vector α_i:

α_i = exp(f(h_a^{2i}, w_avg)) / Σ_{j=1}^{m} exp(f(h_a^{2j}, w_avg)),
f(h_a^{2i}, w_avg) = tanh(h_a^{2i} · W_m · w_avg^T + b_m),

where f is a score function that calculates the importance of h_a^{2i} in the target term, W_m ∈ R^{2d_h×2d_h} and b_m ∈ R^{1×1} are the weight matrix and the bias, respectively, notation · is the dot product, w_avg^T is the transpose of w_avg, and tanh is a nonlinear activation function.
On top of that, the averaged target term embedding a_avg and the context hidden states [h_w^{21}, h_w^{22}, ..., h_w^{2n}] are applied to calculate the attention vector β_i for the context:

β_i = exp(f(h_w^{2i}, a_avg)) / Σ_{j=1}^{n} exp(f(h_w^{2j}, a_avg)).

Given the attention weight vectors α_i and β_i, the target term representation a and the context representation w can be deduced by the following equations:

a = Σ_{i=1}^{m} α_i h_a^{2i},
w = Σ_{i=1}^{n} β_i h_w^{2i},

where a ∈ R^{2d_h} and w ∈ R^{2d_h} denote the final representations of the target term and context, which will be processed in the output layer.
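The scoring, softmax normalization, and weighted-sum steps above can be sketched as follows. This is a toy with a scalar stand-in for W_m, a zero bias, and invented hidden-state values:

```python
import math

def attention(states, query, Wm, bm):
    """Score each hidden state h against a query vector q with
    f(h, q) = tanh((h . q) * Wm + bm), then softmax-normalize,
    mirroring the alpha/beta equations (scalar Wm toy version)."""
    scores = [math.tanh(sum(hi * qi for hi, qi in zip(h, query)) * Wm + bm)
              for h in states]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# toy target hidden states and averaged context embedding (2d_h = 2)
h_a = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.0]]
w_avg = [0.5, 0.5]
alpha = attention(h_a, w_avg, Wm=1.0, bm=0.0)
# final target representation a = sum_i alpha_i * h_a[i]
a = [sum(alpha[i] * h_a[i][k] for i in range(len(h_a))) for k in range(2)]
assert abs(sum(alpha) - 1.0) < 1e-9
```

The context-side vector β and representation w are computed symmetrically, swapping the roles of the target states and the averaged context.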

Output Layer.
Target term and context representations described in Section 3.4 are concatenated as d = [a; w] ∈ R^{4d_h} at the output layer. Then, a nonlinear transformation layer and a softmax classifier are applied to calculate the sentiment probability values:

x = tanh(W_n · d + b_n),

where W_n ∈ R^{4d_h×d_c} and b_n ∈ R^{d_c} are the weight matrix and the bias, respectively, and d_c represents the number of classes. The probability P_k of category k is obtained with the softmax function:

P_k = exp(x_k) / Σ_{i=1}^{d_c} exp(x_i).

Model Training. To improve the model performance on aspect-level sentiment classification tasks, this approach optimizes the training process across the word embedding layer, the bidirectional GRU neural network layer, the attention layer, and the nonlinear layer. The cross-entropy with L2 regularization is applied as the loss function, which is defined as follows:

L = − Σ_i y_i log ŷ_i + λ ‖θ‖^2,

where λ is a regularization coefficient, ‖θ‖^2 represents the L2 regularization term, y_i denotes the correct sentiment polarity in the training dataset, and ŷ_i denotes the sentiment polarity predicted for a sentence by the proposed model. The parameter set Θ is updated according to the gradient calculated by backpropagation:

Θ = Θ − l_r · (∂L/∂Θ),

where l_r is the learning rate. In the training process, the method uses a dropout strategy that randomly removes some features of the hidden layer in order to avoid overfitting.
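The output layer (concatenation, tanh transformation, softmax, and cross-entropy) can be sketched with toy weights; all values here are invented for illustration:

```python
import math

def classify(a, w, Wn, bn):
    """Concatenate target and context representations, apply a tanh
    layer, then softmax over d_c sentiment classes."""
    d = a + w                                   # d = [a ; w]
    x = [math.tanh(sum(di * Wn[i][k] for i, di in enumerate(d)) + bn[k])
         for k in range(len(bn))]
    exps = [math.exp(v) for v in x]
    s = sum(exps)
    return [e / s for e in exps]                # P_k for each class k

a, w = [0.3, -0.1], [0.2, 0.4]                  # toy 2d_h = 2 representations
Wn = [[0.1, -0.2, 0.05], [0.0, 0.3, -0.1],
      [0.2, 0.1, 0.0], [-0.1, 0.0, 0.2]]        # toy 4d_h x d_c weights
bn = [0.0, 0.0, 0.0]                            # d_c = 3 classes
probs = classify(a, w, Wn, bn)
loss = -math.log(probs[0])  # cross-entropy term if the true class is 0
assert abs(sum(probs) - 1.0) < 1e-9 and loss > 0.0
```

In training, this per-sentence cross-entropy is summed over the dataset and the λ-weighted L2 term is added before backpropagation.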

Experiments
Section 3 presented the theoretical formulas and operational steps of the MI-biGRU model. This section provides a series of experiments on four public aspect-level sentiment classification datasets from different domains. The aim of the experiments is to test the feasibility of applying MI-biGRU to aspect-level tasks and to evaluate the effectiveness of the proposed model. 2016 are related to the restaurant domain. Each piece of data in the datasets is a single sentence containing comments, target terms, sentiment labels, and position information. We remove the sentences in which the target term is labelled "null" or "conflict" from the datasets, and each remaining sentence possesses a corresponding sentiment label for each target term. The statistics of the datasets are provided in Table 2.

Pretraining.
This section presents the pretraining process for the word embedding matrix E, which is generally set by random initialization, with the weights then updated during training. However, E can be pretrained on existing corpora. The benefit of this approach is that we can obtain good initial parameters for the model from high-quality datasets. Hence, pretrained GloVe vectors from Stanford University were adopted to improve model performance. Accordingly, the parameters of the word embedding and bidirectional GRU layers in this study are initialized with the same parameters in the corresponding layers.

Hyperparameters Setting.
Parameters other than the pretrained word embeddings are initialized by sampling from the uniform distribution U(−0.1, 0.1), and all biases are set to zero. The dimensions of both the word embedding and the bidirectional GRU hidden state are set to 300. The dimension of the position embedding is set to 100. The batch size is 128. We take 80 as the maximum length of a sentence. The coefficient of L2 regularization and the learning rate are set to 10^{−5} and 0.0029, respectively. The experiment uses a dropout strategy with a dropout rate of 0.5 in order to avoid overfitting. It is important to emphasize that using the same parameters for different datasets may not yield the best results on each. However, there is a set of parameters that optimizes the execution of the model on all datasets from a global perspective. Therefore, we confirmed the above parameters as the final hyperparameters of the model through a large number of experiments. In addition, we use the Adam optimizer to optimize all parameters. Experiments showed that the Adam optimizer performs better than other optimizers such as SGD and RMSProp on our classification task.
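For reference, the reported hyperparameters can be collected in a config dict; the key names are our own shorthand, not identifiers from the authors' code:

```python
# Hyperparameters as reported in the text, gathered for reproduction.
CONFIG = {
    "word_emb_dim": 300,
    "hidden_dim": 300,            # bidirectional GRU hidden state
    "pos_emb_dim": 100,
    "batch_size": 128,
    "max_sentence_len": 80,
    "l2_coef": 1e-5,
    "learning_rate": 0.0029,
    "dropout_rate": 0.5,
    "optimizer": "Adam",
    "init_uniform": (-0.1, 0.1),  # range for non-pretrained parameters
}
```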

Evaluation.
This section presents the performance evaluation indicators for all baseline methods mentioned in Sections 4.2 and 4.3. The definition of "Accuracy" is as follows:

Accuracy = (TP + TN) / (TP + FP + FN + TN).

As outlined in Table 3, the symbol TP is the abbreviation for "True Positive," which refers to the case where both the sentiment label and the model prediction are positive. The symbol FP is short for "False Positive," which means that the sentiment label is negative and the model prediction is positive. Similarly, "False Negative" and "True Negative" are denoted by the symbols FN and TN, respectively.
A broader set of estimation indicators, precision, recall, and F1-score [36,37], is also adopted in this study:

Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
F1-score = 2 × Precision × Recall / (Precision + Recall).

Historically, the term "Precision" has been used to describe the ratio of correctly predicted positive observations to total predicted positive observations, which is generally understood as the ability to distinguish negative samples. The higher the "Precision," the stronger the model's ability to distinguish negative samples. Previous studies mostly define "Recall" as the ratio of correctly predicted positive observations to all observations in the actual class [38], which reflects the model's ability to recognize positive samples.
e higher the recall, the stronger the model's ability to recognize positive samples. e term "F1-score" combines "Precision" and "Recall." e robustness of the classification model is determined by "F1-score."
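The four indicators can be computed directly from the TP/FP/FN/TN counts defined above. A minimal plain-Python sketch (the function name `classification_metrics` is illustrative):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute Accuracy, Precision, Recall, and F1 from the four
    confusion-matrix counts defined in the text (TP, FP, FN, TN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# One TP, one FP, one FN, one TN:
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
```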

Compared Models.
To demonstrate the advantage of our method on aspect-level sentiment classification, we compared it with the following baselines: (i) LSTM [10]: a classic neural network that learns sentiment classification labels through a transformation from word embeddings to hidden states, an averaging of the states, and a softmax operation. LSTM has only been applied to coarse-grained classification tasks and does not handle aspect-level tasks. (ii) AE-LSTM [10]: a variant of LSTM that adds a connection operation between the hidden states and the target term to generate an attention weight vector. Existing studies use this vector representation to determine the sentiment polarity of a sentence. (iii) ATAE-LSTM [10]: the model structure of ATAE-LSTM is similar to AE-LSTM except for the word embedding initialization step. Appending the target word embedding to each context embedding at initialization highlights the status of the target term in the LSTM and yields richer features. (iv) IAN [18]: IAN is a neural network with an interactive structure. First, the model calculates two-way attention weights between the context and the target word to obtain rich relational features; second, the information from the two directions is concatenated to perform aspect-level sentiment classification. (v) MemNet [17]: an open model that performs aspect-level sentiment classification by applying the same attention mechanism multiple times. The weight matrix is optimized through multilayer interaction, which extracts high-quality abstract features. (vi) PBAN [20]: in contrast to the above baselines, PBAN introduces the relative distance between the context words and the target word in a sentence to perform aspect-level sentiment classification. The model focuses more on words that are close to the target term; an attention mechanism is also used in PBAN to calculate the weight matrix. (vii) MI-biGRU: the model proposed in this paper. MI-biGRU combines the concept of relative distance with an improved GRU of interactive structure to perform aspect-level tasks.

Comparison of Aspect-Level Sentiment Analysis Models.
This section presents the application results for all baseline methods mentioned in Section 4.2. We evaluated the effectiveness of our method on aspect-level sentiment classification over four shared datasets. The experiment uses Accuracy and F1-score to evaluate all these methods, because Accuracy is the basic metric and the F1-score measures both the precision and the recall of the classification results.
As we can see from Table 4, the Accuracy and F1-score of LSTM (74.28 and 60.24 on the Restaurant14 dataset) are the lowest among all models. Its consistently low scores across the datasets indicate that LSTM lacks a mechanism to process multiple target terms in a sentence, since it merely averages the hidden states. Developing a model that can process more than one target term per sentence contributes substantially to classification performance. An improvement over the LSTM baseline is observed in AE-LSTM and ATAE-LSTM: their results on the four datasets exceed LSTM's by approximately 2-7%, since both models add judgments on target terms.
Introducing an interactive structure or attention mechanism into the models, i.e., IAN and MemNet in Table 4, serves as an effective way to improve the assessment scores, since the abstract features and the relationships between target terms and context play a positive role in model performance. In the word embedding initialization process, richer features can be learned if the relative distance is considered. As the PBAN row of Table 4 shows, with the added position information its scores surpass most baseline methods, except for 79.73 and 80.79 on the Restaurant14 and Restaurant16 datasets. The proposed model MI-biGRU combines the concept of relative distance, the interactive model structure, and a word embedding initialization involving both context and target terms to perform aspect-level tasks. Linking the position vector with the context enriches the input features of the model.
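The relative-distance feature discussed above can be sketched as follows. The exact distance scheme (0 inside the target term, distance to the nearest edge of the target otherwise) is our assumption in the style of PBAN, and the function name is hypothetical:

```python
def relative_positions(tokens, target_span):
    """Relative distance of each word to the target term: 0 for words
    inside the target span, otherwise the distance to its nearest edge.
    These indices would look up rows of the position embedding table."""
    start, end = target_span  # inclusive token indices of the target term
    dist = []
    for i, _ in enumerate(tokens):
        if start <= i <= end:
            dist.append(0)
        elif i < start:
            dist.append(start - i)
        else:
            dist.append(i - end)
    return dist

sent = "the french food is the best in town".split()
# target term "french food" spans tokens 1..2
print(relative_positions(sent, (1, 2)))  # → [1, 0, 0, 1, 2, 3, 4, 5]
```

Each distance index is then mapped to a position embedding that is concatenated with the word embedding, which is how the position vector enriches the model's input features.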
Recently, the IAN and MemNet models have demonstrated the positive effect of the attention mechanism between context and target terms on aspect-level feature extraction. MI-biGRU, our improved bidirectional GRU network, applies the interactive attention mechanism twice to generate attention weight matrices from the target term to the context and from the context to the target term. The resulting effect on classification is presented in Table 4: the classification performance of MI-biGRU is clearly better than that of the other baseline methods.
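The two-way attention step can be sketched as follows, assuming an IAN-style formulation in which each side is scored against the average-pooled representation of the other; the helper names are hypothetical, and the real model computes these weights over GRU hidden states:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interactive_attention(context, target):
    """Two-way attention: weights over context words scored against the
    pooled target representation, and weights over target words scored
    against the pooled context representation; both sides are attended
    and concatenated into one feature vector."""
    t_avg = target.mean(axis=0)       # pooled target hidden states
    c_avg = context.mean(axis=0)      # pooled context hidden states
    alpha = softmax(context @ t_avg)  # target-to-context attention weights
    beta = softmax(target @ c_avg)    # context-to-target attention weights
    c_rep = alpha @ context           # attended context representation
    t_rep = beta @ target             # attended target representation
    return np.concatenate([c_rep, t_rep])

rng = np.random.default_rng(0)
# 8 context words and a 2-word target term, hidden size 6
rep = interactive_attention(rng.normal(size=(8, 6)), rng.normal(size=(2, 6)))
```

The concatenated representation would then be fed to a softmax classifier to predict the sentiment polarity.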
We achieve the best Accuracy and F1-score on the Restaurant14, Laptop14, and Restaurant15 datasets, while ATAE-LSTM obtains the best Accuracy of 85.17 on Restaurant16. The reason the accuracy on Restaurant16 is not optimal for MI-biGRU may be data imbalance: in the SemEval 2016 dataset, the amount of test data with sentiment polarity in the restaurant domain is very small, so the model misclassifies some comments. However, the optimal F1-score on Restaurant16 is still obtained by our model, which illustrates that MI-biGRU is capable of distinguishing positive and negative samples. Therefore, we conclude that our model achieves the best overall performance on the four public datasets.

Model Analysis of MI-biGRU.
This section illustrates the rationality of each component of MI-biGRU through contrast experiments. The dependence of model accuracy on the amount of data is described under a regular variation of the data size. Experiments with different technical combinations are shown in Table 5.
Simple random data extraction experiments were used to analyze the dependence of model accuracy on the quantity of data. We randomly take 20%, 40%, 60%, and 80% of the data from the SemEval 2014 dataset in the restaurant and laptop domains. The trends of Accuracy and F1-score for the baselines LSTM, IAN, PBAN, and MI-biGRU on the Restaurant14 and Laptop14 datasets are shown in Figures 3-6. These trends reveal a steady increase in Accuracy and F1-score for all baseline methods as the amount of data rises. The Accuracy and F1-score of MI-biGRU are low at the initial data amount. However, an obvious finding from the analysis is that, at 60% of the data, the Accuracy and F1-score of MI-biGRU have already approached their peak. This may be because the richer data allows the advantages of the position information and the multilevel interaction mechanism to be realized: compared with a small amount of data, rich data lets the model learn more complete semantic features, so its sentiment analysis performance surpasses the baseline models. When the amount of data exceeds 60%, our model's performance continues to improve and finally reaches the highest score. Therefore, from a holistic perspective, MI-biGRU achieves the best results on classification tasks compared with the baseline models once a critical mass of data is reached.
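A minimal sketch of the subsampling procedure above. We assume nested random subsets so that each larger fraction contains the smaller ones (the nesting is our assumption, not stated in the paper; the function name is illustrative):

```python
import random

def subsample(dataset, fractions=(0.2, 0.4, 0.6, 0.8), seed=42):
    """Draw random subsets of the training data at the given fractions.
    Shuffling once and taking prefixes makes the subsets nested, so each
    larger fraction extends the smaller ones."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    return {f: shuffled[: int(len(shuffled) * f)] for f in fractions}

# e.g. 1000 training examples split into 20/40/60/80% subsets
splits = subsample(list(range(1000)))
```

Each subset would then be used to retrain the model from scratch, and Accuracy/F1 on the fixed test set plotted against the fraction, as in Figures 3-6.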

Case Study.
The working principle and high performance of the novel model can be further illustrated by a case study that visualizes the attention weights between target terms and context as color shades. A darker color on a word in Figure 7 indicates a greater attention weight, meaning the word is more essential for judging the polarity of a target term; thus, the model pays more attention to these words. (The "(9)" after MemNet denotes the nine computational layers adopted.) This study confirms that when the model judges the polarity of a target term, it pays more attention to the words around it. The target terms "weekends" and "French food" in Figure 7 are perfect examples. When we distinguish the polarity of "weekends," the words "bit" and "packed" are more critical than words at larger relative distances, and words such as "best" that are close to the target term "French food" have a greater impact on its polarity judgment.
One interesting finding is that the model may give some words at a small distance a low attention weight. For example, the model gives the words "but" and "vibe" a low weight when the target term is "weekends." This phenomenon is normal, since the sentiment contribution of the same word in a sentence varies across target terms. MI-biGRU automatically selects the surrounding words worth more attention according to the specific target term and then judges the sentiment polarity to perform aspect-level sentiment classification.
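The shade-based visualization can be sketched by bucketing attention weights into discrete intensity levels; the character ramp and the function name are illustrative choices, not the paper's actual figure code:

```python
def shade_tokens(tokens, weights, levels=" .:*#"):
    """Map each token's attention weight to a discrete shade character,
    with later (darker) characters marking higher attention."""
    w_max = max(weights) or 1.0
    shades = []
    for tok, w in zip(tokens, weights):
        # scale weight into [0, len(levels)-1] and round to a bucket
        idx = min(int(w / w_max * (len(levels) - 1) + 0.5), len(levels) - 1)
        shades.append((tok, levels[idx]))
    return shades

out = shade_tokens(["a", "bit", "packed", "on", "weekends"],
                   [0.05, 0.30, 0.45, 0.05, 0.15])
```

With these (made-up) weights, "packed" receives the darkest shade, matching the case study's observation that words near the target term "weekends" dominate its polarity judgment.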

Conclusion and Future Work
This paper puts forward a novel multilevel interactive bidirectional attention network (MI-biGRU) that combines bidirectional GRUs with position information for aspect-level sentiment analysis. We refine traditional aspect-level models by considering the context, the target terms, and their relative distance in the word embeddings. In addition, two bidirectional GRUs and interactive attention mechanisms are combined to extract deep abstract features in aspect-level tasks.
The experimental results on restaurant and laptop comment tasks demonstrate our advantage over traditional sentiment classification methods.
As a sentiment classification method, MI-biGRU performs very well on comment contexts, especially once a critical mass of aspect-level sentiment sentences is reached. Extracting the attention weights of words from the position-labelled context through information interaction makes MI-biGRU more effective than other regular methods.
Our future work will focus on embedding representations for words with semantic relationships. Furthermore, we will develop a phrase-level sentiment analysis method that incorporates position information.