Dual-Level Attention Based on Heterogeneous Graph Convolution Network for Aspect-Based Sentiment Classification

We introduce a flexible HIN (Heterogeneous Information Network) framework to model user-generated comments. It can integrate various types of additional information and capture the relationships between them, reducing the semantic sparsity caused by the small amount of labeled data. It can also exploit hidden network structure information by propagating information along the graph. We then propose a dual-level attention-based heterogeneous graph convolutional network to learn the importance of different adjacent nodes and of different types of nodes to the current node. In this way, we mitigate a shortcoming of most existing algorithms, namely that they ignore the network structure information between the words in a sentence and the sentence itself. Experimental results on the SemEval dataset demonstrate the validity and reliability of our model.


I. INTRODUCTION
Opinion mining and sentiment analysis have captured increasing attention in academic and industrial circles in recent years due to their wide use. With the increasing popularity of e-commerce services such as Yelp and Meituan, the number of user-generated reviews has increased dramatically, and these reviews have become important resources for producers to improve product quality. We have also found that with the development of blockchain and pioneer cryptocurrencies, media coverage and public opinion have impacted their development and use [1], [3], [16], [18]. A large number of IoT studies have also begun using sentiment analysis to optimize the user experience [2], [7], [17]. In recent years, researchers have proposed the aspect-based sentiment analysis task, a sub-task of sentiment analysis. It aims to identify the sentiment polarity towards a given aspect and provides more detailed feedback than traditional sentiment analysis [9].
Most existing methods leverage neural networks, such as the long short-term memory network (LSTM) and the recurrent neural network (RNN) [13], to extract contextual information and representations of aspect categories, which often leads to mismatching the sentiment polarity with the target aspect. Recently, with the wide use of attention mechanisms in NLP, many models use attention to capture the semantic interaction between aspect features and context [10]. Combining attention models can further improve performance; however, it still ignores the network structure information between the words in the sentence and the sentence itself, and the inherent disturbances introduced by the attention mechanisms can also reduce performance. In addition, previous studies did not consider adding additional information to mine and enrich the semantic information.
To address the issues listed above, we propose a dual-level attention-based heterogeneous graph convolutional network (DAHGCN) for aspect-based sentiment analysis.

II. RELATED WORK
The goal of aspect-based sentiment analysis (ABSA) is to identify the sentiment expressed towards a specific target aspect, which makes it a fine-grained sentiment analysis task. We introduce the development of ABSA in the following three stages.
Early studies mainly focused on feeding hand-crafted feature vectors of the text into a trained classifier, such as the SVM (support vector machine) and its improved variants, to obtain the classification results. For instance, Wagner et al. [8] introduced the relationship between the sentiment word and the target aspect to train an improved SVM classifier.
Later on, the recurrent neural network (RNN) received more attention due to its wide use in NLP [4]. Many studies applied RNNs to aspect-based sentiment analysis to achieve better performance [5], [6], [13]. For example, Tang et al. and Ruder et al. [5] implemented a hierarchical bi-directional LSTM model to learn the contextual information of sentences. These RNN-based models achieve better classification results because of the advantages of RNNs; e.g., the LSTM is good at capturing short-range dependencies among words in a sentence. However, these RNN-based methods cannot extract the potential correlations between sentiment words and aspect words that are relatively far apart in complex sentences.
Recently, many studies have indicated that introducing attention mechanisms can alleviate the above-mentioned problems [6], [10]. A complex sentence may contain several aspects, each word in a sentence may be associated with one or more aspect terms, and a phrase in a sentence may convey sentiment information about a particular aspect term. By introducing the attention mechanism, we can capture the detailed sentiment features towards a specific aspect in complex sentences. In particular, Wang et al. [10] proposed the ATAE-LSTM model, which combines the LSTM with attention mechanisms; the aspect embedding was used to calculate the attention weights. Ma et al. [6] proposed a model with bi-directional attention mechanisms to interactively learn attention weights for the context and the aspect terms, respectively. These models further improved the accuracy of sentiment analysis. Inherent noise, however, was introduced by the attention mechanisms. More specifically, the network structure information between the words in the comment text and the comment text itself was missed.

III. DAHGCN MODEL
Our method is divided into two steps. First, to address the semantic sparsity and exploit the hidden network structure information, we propose a flexible HIN [14] framework. It can integrate several kinds of additional information, which greatly enriches the semantic information. Then, we propose the DAHGCN model, which uses both type-level and node-level attention to analyze the relationship between the aspect term and the sentiment term.

A. HIN for User-generated Review
We first present the HIN framework for modeling the review texts which can alleviate the semantic sparsity by integrating some additional information.
The HIN is constructed as G = (V, ε), which contains the review texts T = {t_1, t_2, ..., t_n}, the aspect terms A = {a_1, a_2, ..., a_k}, and the sentiment terms S = {s_1, s_2, ..., s_n} as nodes, where V = T ∪ A ∪ S and ε is the set of edges between nodes, each edge representing the relationship between two nodes.
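As a toy illustration, this graph construction can be sketched in Python as follows. The node names and edges below are invented examples for illustration only, not taken from any dataset:

```python
import numpy as np

# Hypothetical toy corpus: two review texts, two aspect terms, two sentiment terms.
texts = ["t1", "t2"]                      # review-text nodes (T)
aspects = ["food", "service"]             # aspect-term nodes (A)
sentiments = ["great", "dreadful"]        # sentiment-term nodes (S)
nodes = texts + aspects + sentiments      # V = T ∪ A ∪ S
index = {name: i for i, name in enumerate(nodes)}

# Edges: a review is linked to the aspect and sentiment terms it mentions.
edges = [("t1", "food"), ("t1", "great"),
         ("t2", "service"), ("t2", "dreadful")]

n = len(nodes)
A = np.zeros((n, n))
for u, v in edges:                        # build an undirected adjacency matrix
    A[index[u], index[v]] = A[index[v], index[u]] = 1.0

A_hat = A + np.eye(n)                     # add self-connections: Â = A + I
```

Each node would additionally carry a feature vector (e.g., a text embedding); the adjacency matrix is what the graph convolution later operates on.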

B. DAHGCN Model
As shown in Figure 1, we propose a novel dual-level attention-based heterogeneous graph convolutional network for aspect-based sentiment analysis. It contains node-level attention and type-level attention. We embed the HIN into the HGCN and then introduce the dual-level attention mechanisms mentioned above to compute the attention weights of different adjacent nodes and of different types of nodes.

HGCN (Heterogeneous Graph Convolutional Network):
In this paper, the HIN framework integrates two kinds of additional information. Due to the heterogeneousness among the different types of nodes, the HIN framework cannot be directly applied to traditional GCN (graph convolutional network) [15]. To address this issue, we introduce HGCN (heterogeneous graph convolutional network) which is an improved model of GCN.
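Before the formal definitions below, the core difference can be sketched numerically: a GCN applies one shared transformation to all nodes, while an HGCN applies a separate transformation per node type and sums the results. All matrices below are random stand-ins and tanh is an assumed activation; this is an illustrative sketch, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 4                              # 6 nodes, 4-dimensional features

# GCN: one shared transformation matrix W for all nodes.
A_norm = np.eye(n)                       # stand-in for the normalized adjacency
H = rng.normal(size=(n, d))              # node features
W = rng.normal(size=(d, d))
H_gcn = np.tanh(A_norm @ H @ W)

# HGCN: a type-specific transformation matrix W_t per node type, summed over types.
types = {"text": [0, 1], "aspect": [2, 3], "sentiment": [4, 5]}
H_hgcn = np.zeros((n, d))
for t, cols in types.items():
    A_t = A_norm[:, cols]                # submatrix: columns for neighbors of type t
    W_t = rng.normal(size=(d, d))        # transformation for this type only
    H_hgcn += A_t @ H[cols] @ W_t
H_hgcn = np.tanh(H_hgcn)
```

The per-type submatrices and weights let nodes with heterogeneous feature spaces be projected into a common hidden space, which is what the propagation rules below formalize.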
Usually, for a graph G = (V, ε), V and ε represent the set of nodes and the set of edges, respectively. Let X ∈ R^(|V|×d) denote the feature matrix of all nodes, where |V| is the number of nodes and d is the dimension of the feature vectors. To model the self-connection of each node, we set the adjacency matrix Â = A + I, and the layer-wise propagation of GCN is defined as follows:

H^(l+1) = σ( Ã · H^(l) · W^(l) ),

where σ is an activation function, Ã is the normalized adjacency matrix of Â, and H^(l) ∈ R^(|V|×d) is the hidden state of all nodes in the l-th layer. Initially, H^(0) = X, and W^(l) is a layer-specific trainable transformation matrix.

HGCN considers the heterogeneity of the different types of nodes and projects them into an implicit common space with their respective transformation matrices. The layer-wise propagation of HGCN is defined as follows:

H^(l+1) = σ( Σ_τ Ã_τ · H_τ^(l) · W_τ^(l) ),

where Ã_τ ∈ R^(|V|×|V_τ|) is the submatrix of Ã whose rows and columns represent all nodes and their neighboring nodes of type τ, respectively. The representation H^(l+1) is obtained by aggregating the features of the neighboring nodes H_τ^(l) of each type τ with the type-specific transformation matrices W_τ^(l).

Type-level Attention. For a target node v, type-level attention learns the weights of adjacent nodes of different types. Specifically, we first use the embedding vector h_τ = Σ_v' Ã_vv' · h_v' to represent the type τ; it is the sum of the features of the adjacent nodes of type τ. Then we calculate the type-level attention score:

a_τ = σ( μ_τ^T · [h_v || h_τ] ),
where μ_τ is the attention vector of type τ and || denotes concatenation. Then we use the softmax function to normalize the attention scores of all types to obtain the final type-level attention weight:

α_τ = softmax(a_τ) = exp(a_τ) / Σ_τ' exp(a_τ').
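The type-level attention above can be sketched numerically as follows. The node features, the per-type attention vectors μ_τ, and the choice of tanh as the score activation σ are all illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4                                    # feature dimension
h_v = rng.normal(size=d)                 # embedding of the target node v

# Hypothetical neighbor features of v, grouped by node type.
neighbors = {
    "text":      rng.normal(size=(2, d)),
    "aspect":    rng.normal(size=(3, d)),
    "sentiment": rng.normal(size=(1, d)),
}

mu = {t: rng.normal(size=2 * d) for t in neighbors}   # attention vector per type

scores = {}
for t, feats in neighbors.items():
    h_t = feats.sum(axis=0)              # h_tau: sum of neighbor features of type tau
    # score: sigma(mu_tau^T · [h_v || h_tau]), with tanh as the assumed sigma
    scores[t] = np.tanh(mu[t] @ np.concatenate([h_v, h_t]))

weights = softmax(np.array(list(scores.values())))    # alpha_tau over all types
```

The resulting weights form a distribution over node types, so a type that carries little information for v contributes less to its updated representation.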
Node-level Attention. We introduce node-level attention to learn the importance of different adjacent nodes and to reduce the weights of noisy information. Given a target node v and a neighboring node v', we use the embedding vector h_v of node v, the embedding vector h_v' of node v', and the type-level attention weight α_τ of the type of v' as the input to calculate the node-level attention score, defined as follows:

b_vv' = σ( ν^T · α_τ [h_v || h_v'] ),

where ν is the attention vector. Then we normalize the node-level attention scores with the softmax function to obtain the final attention weight β_vv'. Finally, we add the dual-level attention into the layer-wise propagation of HGCN:

H^(l+1) = σ( Σ_τ B_τ · H_τ^(l) · W_τ^(l) ),
where B_τ is the attention weight matrix whose entries are the dual-level attention weights β_vv'. We feed the final node embeddings into a softmax layer for classification to obtain the predictions Z. We train the model with the cross-entropy loss plus an L2 regularization term:

L = − Σ_(i∈D_train) Σ_(c=1..C) Y_ic · log Z_ic + η · ||Θ||_2,

where C is the number of sentiment polarity categories, D_train is the set of reviews used for training, Y is the one-hot label matrix, η is the L2 regularization factor, and Θ is the set of model parameters.
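A sketch of this training objective, assuming a standard cross-entropy loss with L2 regularization; the logits, labels, and parameter shapes below are toy stand-ins:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
C = 3                                    # sentiment polarities, e.g. pos/neg/neutral
n_train = 5                              # |D_train|, toy value

logits = rng.normal(size=(n_train, C))   # stand-in for the final HGCN embeddings
Z = softmax(logits)                      # class probabilities after the softmax layer

Y = np.eye(C)[rng.integers(0, C, n_train)]   # one-hot gold labels (toy)
theta = [rng.normal(size=(4, 4))]            # stand-in for the parameter set Θ
eta = 1e-4                                   # L2 regularization factor η

cross_entropy = -(Y * np.log(Z)).sum()       # summed over reviews and categories
l2 = sum((w ** 2).sum() for w in theta)
loss = cross_entropy + eta * l2
```

In practice the loss would be minimized with a gradient-based optimizer over the HGCN and attention parameters.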

IV. EXPERIMENTS

A. Datasets and Comparison Models
We evaluate the DAHGCN model on the SemEval competition dataset for sentence-level aspect-based sentiment classification. The dataset contains user-generated reviews of restaurants; each sample contains a review sentence, a target aspect term, and the aspect-specific sentiment polarity. The datasets are detailed in Table I. Among the comparison models, ASGCN makes explicit use of the dependency relationships between words to propagate sentiment features directly from the syntactic context of the aspect target [12].

B. Evaluation and Analysis
As the comparison results in Table II show, the accuracy of our DAHGCN method is clearly better than that of the other models. First, we analyze these algorithms from the perspective of the attention mechanism. ATAE-LSTM uses attention mechanisms to learn the relationship between aspects and sentiments, but it does not consider the relationships between sentences. DAHGCN adds node-level attention and can effectively learn the importance of adjacent nodes to the current node, so it can capture the connections between sentences. Second, we investigate from the perspective of the HIN. As the classification results for each dataset in Table II show, the accuracy of HGCN is generally better than that of ASGCN or ATAE-LSTM. We argue that the main reason is that HGCN uses the HIN to further mine the semantic information of the text; its node feature representations are therefore more accurate, and its results are better. In summary, DAHGCN adopts the HIN and improves the attention mechanism, which is why its result is the best among all the algorithms.

To gain an intuitive understanding of how DAHGCN works, we present a case study. The sentence "Great food but the service was dreadful" contains two aspects, which may cause a sentiment mismatch in models that combine attention with a recurrent neural network. The other sentence, "If the service is better, I will go back again.", uses the word "if", which makes it harder to detect the implicit semantics. For the first sentence, our model can easily identify the sentiment words and concentrate on the aspect-specific sentiment word. However, it is challenging for our model to capture the logical information in the second sentence, because it expresses a negative feeling towards the service without any explicitly negative words, which makes an accurate prediction difficult.

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed a novel dual-level attention-based heterogeneous graph convolutional network for aspect-based sentiment analysis. We first proposed an HIN framework that integrates two kinds of additional information (aspect terms and sentiment terms). Then we introduced a dual-level attention mechanism to learn the importance of different adjacent nodes and of different types of nodes. The experimental results show that this method is superior to the compared methods.