Dual-Level Attention Based on a Heterogeneous Graph Convolution Network for Aspect-Based Sentiment Classification



Introduction
With the introduction of 5G, the combination of 5G and artificial intelligence promotes information integration and improves people's experience with smart devices [1]. For example, China Mobile proposed the "5G+ACIDE" plan. These developments also make it more convenient for people to use smart devices to post reviews; on platforms such as Yelp and Meituan [2,3], the number of user-generated reviews has increased dramatically. When analyzing users' text data, we note that a single review may mention many aspects. Traditional text classification methods used by smart devices usually ignore this multi-aspect information, although adding aspect-based information can increase the accuracy of text analysis. For example, by analyzing the attitudes expressed in Yelp restaurant reviews, we can infer whether a restaurant is worth visiting and which of its aspects could be improved. In recent years, researchers have proposed aspect-based sentiment analysis, a subtask of sentiment analysis [4,5]. It aims to identify the sentiment polarity towards a given aspect and provides more detailed feedback than traditional sentiment analysis [6][7][8][9]. An example of a user-generated review, "Great food, but the service is dreadful.", is shown in Figure 1. Here "great" is the sentiment word towards the aspect food, and "dreadful" is the sentiment word corresponding to the aspect service. By extracting customers' evaluations of every aspect of a restaurant, sentiment analysis helps the restaurant improve the dining experience and attract customers. With the rapid development of the Internet, the Internet of Things, and cloud computing, the data in many fields, e.g., science, technology, the software industry, and business, have increased tremendously.
The Internet of Things here means machines embedded with iBeacon devices or sensors that collect and store data for analysis [10][11][12][13]. The wave of big data generated by the Internet of Things will drive a growing demand for data analysis. Big Data Sentiment Analysis (BDSA) has become a very active topic, attracting wide attention from investors and stakeholders in the global academic community, industry, and government, and it applies equally to IoT datasets [9,[14][15][16][17]. The Internet of Things (IoT) enables massive numbers of terminal connections and generates massive amounts of data through high-speed Internet and sensor technologies [18]. Figure 2 shows the relationship between IoT/sensor networks and sentiment analysis. We have also found that with the development of blockchain, the technology behind pioneering cryptocurrencies [15,[19][20][21][22], media and public opinion show increased use of text analysis [23,24]. Blockchain is considered an innovative technology of epoch-making significance. It is a peer-to-peer distributed ledger built on cryptography and a shared network, and it is decentralized, transparent, and open [25]. The research and application of sentiment analysis in these emerging fields show that the field is expanding and has a positive impact on practical applications.
The overall flowchart of sentiment analysis is shown in Figure 3: sentiment classification is performed through sentiment identification and feature selection on user comments. Most earlier methods leverage neural networks, such as the long short-term memory network (LSTM) [26,27] and the recurrent neural network (RNN) [28,29], to extract contextual information and representations of aspect categories, but they often mismatch the sentiment polarity with the target aspect. Recently, with the widespread application of attention mechanisms in NLP, many ASC (aspect-level sentiment classification) models combine attention mechanisms with RNNs and achieve strong performance [18,21,[30][31][32][33][34]. Chen et al. [35] applied multiple attention mechanisms to capture the correlation between a sentiment word and a target aspect separated by a long distance. Tang et al. [36] incorporated attention mechanisms to focus on different parts of a sentence when different aspects are taken as input. These models use attention mechanisms to capture semantic information between aspect features and context, and combining attention models can further improve performance. However, they ignore the network structure. In addition, they may end up with suboptimal performance because attention mechanisms introduce inherent noise.
Recently, the hierarchical attention network proposed by Cheng et al. [37] and Gao et al. [38] introduced location relations and used aspect features to extract sentiment features. However, it does not explain the relationship between syntactic constraints and long-distance dependencies. Because labeled data are scarce while unlabeled data are abundant, existing aspect-based sentiment classification methods still suffer from semantic sparsity in modeling. In addition, previous studies did not consider how to add extra information to mine and enrich semantic information.
To address the above issues, we propose a novel dual-level attention model based on a heterogeneous graph convolutional network for aspect-based sentiment analysis. First, we propose a flexible HIN (heterogeneous information network) framework to model reviews, which alleviates semantic sparsity and enriches semantic information. It can integrate various kinds of additional information and learn the hidden network structure. In this paper, we introduce two types of additional information (i.e., aspect terms and sentiment terms), which greatly enrich the semantic information. Then, we propose a dual-level attention model based on a heterogeneous graph convolutional network. Because the different types of information are heterogeneous, the HIN cannot be directly embedded into a GCN, so we adopt HGCN, an improved variant of GCN, and embed the HIN into it. We then apply dual-level attention to learn the importance of different adjacent nodes and of different types of nodes to the current node. By adjusting the weights of different node types to learn the relationship between aspect terms and sentiment terms, the model alleviates the long-distance dependency between them. Experimental results on SemEval competition datasets demonstrate the effectiveness and reliability of our approach.
The remainder of this article is organized as follows. Section 2 introduces related work. Section 3 introduces the framework and details of the DAHGCN model. Section 4 presents the experimental results and discussion. Finally, Section 5 summarizes the main conclusions.

Figure 1: An example of aspect-level sentiment classification for the review "Great food but the service was dreadful!", with aspect categories and sentiment polarities <food, positive> and <service, negative>. The underlined words are aspect terms. The bold-faced words are aspect-related sentiment terms and sentiment words.

Related Work
Aspect-based sentiment analysis (ABSA) aims to identify the sentiment expressed towards a target aspect, which makes it a fine-grained sentiment analysis task. We introduce the development of aspect-based sentiment analysis in three periods. Early studies mainly focused on training classifiers such as SVM (support vector machine) and improved SVM variants: they feed the feature vector of a text into the classifier and obtain the classification result. For example, Wagner et al. [39] used the relationship between sentiment words and the target aspect to help train an improved SVM classifier.
Later, recurrent neural networks (RNNs) attracted more attention owing to their widespread application in NLP [40][41][42]. Many studies introduce RNNs into aspect-based sentiment analysis and achieve better performance [27,36,[43][44][45]. For instance, Tang et al. [36] used two LSTMs to model the bidirectional semantic context of a given aspect. Ruder et al. [35] implemented a hierarchical bidirectional LSTM model to learn the contextual information of sentences. These RNN-based models achieve better classification results because of the advantages of RNNs; e.g., the LSTM is good at extracting short-range dependencies among words in a sentence [29,46,47]. However, these RNN-based methods cannot extract potential correlations between sentiment words and aspect words that are relatively far apart in complex sentences.
Recently, many studies have shown that introducing attention mechanisms can alleviate the aforementioned problem [24,[30][31][32][33][34]. A complex sentence may contain several aspects: each word may be associated with one or more aspect terms, and a phrase may convey sentiment about a particular aspect term. Attention mechanisms let a model capture the detailed sentiment features towards a specific aspect in such sentences. In particular, Wang et al. [31] proposed the ATAE-LSTM model, which combines LSTM with attention and uses the aspect embedding to calculate the attention weights. Ma et al. [44] proposed a model with bidirectional attention mechanisms that learns the attention weights of the context and of the aspect terms, respectively. These models further improve the accuracy of sentiment analysis. However, by introducing attention mechanisms, they also introduce the inherent noise of attention. They also ignore the network structure information between the words in a review and the review text itself.
Moreover, none of these models explains syntactic constraints and long-distance dependencies. Because review texts are short and labeled data are scarce, most existing methods suffer from semantic sparsity when modeling text, and they do not consider integrating additional information to enrich semantic information. We therefore propose a novel dual-level attention model based on a heterogeneous graph convolutional network for aspect-based sentiment analysis.

DAHGCN Model
In this paper, we propose a novel dual-level attention model based on a heterogeneous graph convolutional network for aspect-based sentiment analysis, which makes full use of the few labeled data by allowing information to propagate along the graph. Our method involves two steps. First, to address semantic sparsity and mine hidden network structure information, we propose a flexible HIN (heterogeneous information network) [13,48,49] framework, which can integrate various kinds of additional information to greatly enrich the semantic information. Then, we propose the DAHGCN model, which uses both type-level and node-level attention to learn the relationship between aspect terms and sentiment terms. It not only learns the importance of different nodes, reducing the weights of noisy information, but also learns the importance of different node types.

HIN for a User-Generated Review.
We first present the HIN framework for modeling review texts. It alleviates semantic sparsity by integrating additional information and greatly enriches the semantic information shared among the review texts and that additional information. It also helps smart devices analyze users' text data by taking multiple aspects into account.
Previous studies usually adopt the topics of a text and external knowledge to enrich semantic information, but this is not well suited to the ABSA task. In this paper, we instead use the relationship between aspect features and sentiment features to enrich the semantic information, and we therefore consider two types of additional information, i.e., aspect terms and sentiment terms. As shown in Figure 4, the HIN is constructed as $G = (V, \mathcal{E})$. Its node set $V = T \cup A \cup S$ contains the review texts $T = \{t_1, t_2, \cdots, t_n\}$, the aspect terms $A = \{\alpha_1, \alpha_2, \cdots, \alpha_k\}$, and the sentiment terms $S = \{s_1, s_2, \cdots, s_n\}$, and $\mathcal{E}$ is the set of edges representing the structural relationships between pairs of nodes. The details of the HIN are as follows.
Wireless Communications and Mobile Computing
The construction of the HIN consists of two steps. First, we recognize the aspect terms in the review texts and link them to Wikipedia with the entity-linking tool TAGME. We treat each aspect term as a whole word and learn its embedding. To enrich the semantic information, we also introduce the similarity between aspect terms: if the similarity score computed from the embeddings of two aspect terms is above a predefined threshold $\delta_a$, we link them with an edge. Second, in the same way, we recognize the sentiment terms in the review texts, map them to Wikipedia, treat each sentiment term as a whole word, and learn its embedding. We then compute the similarity between pairs of sentiment-term embeddings and link two sentiment terms with an edge if their score is above a predefined threshold $\delta_b$.
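The two linking steps can be sketched in Python with NumPy. This is a minimal illustration under assumptions the paper does not state: the terms are assumed to be already extracted (a plain substring match stands in for TAGME entity linking), the embedding similarity is assumed to be cosine similarity, the embeddings and thresholds are toy values, and `build_hin` is a hypothetical helper rather than the authors' code.

```python
import numpy as np

def cosine_matrix(emb):
    """Pairwise cosine similarity between the rows of emb."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T

def build_hin(texts, aspects, sentiments, aspect_emb, sentiment_emb,
              delta_a=0.8, delta_b=0.8):
    """Return (node_types, adjacency); node order is texts, aspect terms, sentiment terms."""
    n_t, n_a, n_s = len(texts), len(aspects), len(sentiments)
    a_off, s_off = n_t, n_t + n_a
    adj = np.zeros((n_t + n_a + n_s,) * 2)
    node_types = ["text"] * n_t + ["aspect"] * n_a + ["sentiment"] * n_s

    # Text-term edges: link each review to the terms it mentions
    # (assumption: substring matching stands in for TAGME entity linking).
    for i, text in enumerate(texts):
        lowered = text.lower()
        for off, terms in ((a_off, aspects), (s_off, sentiments)):
            for j, term in enumerate(terms):
                if term.lower() in lowered:
                    adj[i, off + j] = adj[off + j, i] = 1.0

    # Term-term edges: link two terms of the same type whose embedding
    # similarity exceeds that type's threshold (delta_a or delta_b).
    for emb, off, delta in ((aspect_emb, a_off, delta_a), (sentiment_emb, s_off, delta_b)):
        sim = cosine_matrix(emb)
        for p in range(len(emb)):
            for q in range(p + 1, len(emb)):
                if sim[p, q] > delta:
                    adj[off + p, off + q] = adj[off + q, off + p] = 1.0
    return node_types, adj

texts = ["Great food but the service was dreadful!", "The food was tasty."]
aspects = ["food", "service"]
sentiments = ["great", "dreadful", "tasty"]
aspect_emb = np.array([[1.0, 0.0], [0.0, 1.0]])                    # toy embeddings
sentiment_emb = np.array([[1.0, 0.1], [-1.0, 0.0], [0.9, 0.2]])
node_types, adj = build_hin(texts, aspects, sentiments, aspect_emb, sentiment_emb)
```

With these toy vectors, "great" and "tasty" become linked through the sentiment-similarity rule, while "food" and "service" stay unlinked because their similarity falls below $\delta_a$.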
By integrating both kinds of additional information, aspect terms and sentiment terms, the HIN captures the relationships among aspect terms, texts, and sentiment terms, the relationships between sentiment terms, and the relationships between aspect terms, as shown in Figure 4. This greatly enriches the information available for the review texts and improves performance on the ABSA task. As shown in Figure 5, we then propose a novel dual-level attention model based on a heterogeneous graph convolutional network for aspect-based sentiment analysis, consisting of node-level attention and type-level attention. We first embed the HIN into HGCN. Then, we introduce dual-level attention mechanisms to compute the attention weights of different adjacent nodes and different types of nodes. Finally, a softmax layer produces the sentiment classification results.

HGCN (Heterogeneous Graph Convolution Network).
In this paper, the HIN framework integrates two kinds of additional information. Because the different types of nodes are heterogeneous, the HIN cannot be directly fed to a traditional GCN (graph convolutional network) [50]. To address this issue, we introduce HGCN (heterogeneous graph convolutional network), an improved variant of GCN.
Hu et al. [51] first proposed a multilayer neural network that operates on a homogeneous graph. For a graph $G = (V, \mathcal{E})$, where $V$ and $\mathcal{E}$ denote the sets of nodes and edges, respectively, let $X \in \mathbb{R}^{|V| \times d}$ be the matrix of all nodes' features, where $|V|$ is the number of nodes and $d$ is the dimension of the feature vectors.
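Each row of $X$ is the feature vector $x_v$ of a node $v$. Because every node is also connected to itself, we set the adjacency matrix to $A' = A + I$, where $I$ is the identity matrix. The layer-wise propagation rule of GCN is
$$H^{(l+1)} = \sigma\left(\tilde{A} \cdot H^{(l)} \cdot W^{(l)}\right),$$
where $\sigma(\cdot)$ is an activation function, $\tilde{A} = D^{-1/2} A' D^{-1/2}$ is the normalized adjacency matrix with degree matrix $D_{ii} = \sum_j A'_{ij}$, and $H^{(l)} \in \mathbb{R}^{|V| \times d}$ is the hidden state of all nodes in the $l$th layer. Initially, $H^{(0)} = X$, and $W^{(l)}$ is a layer-specific trainable transformation matrix.

HGCN considers the heterogeneity of the node types and projects each type into an implicit common space with its own transformation matrix. The layer-wise propagation rule of HGCN is
$$H^{(l+1)} = \sigma\left(\sum_{\tau \in \mathcal{T}} \tilde{A}_\tau \cdot H_\tau^{(l)} \cdot W_\tau^{(l)}\right),$$
where $\mathcal{T} = \{\text{text}, \text{aspect}, \text{sentiment}\}$ is the set of node types, $\tilde{A}_\tau \in \mathbb{R}^{|V| \times |V_\tau|}$ is the submatrix of $\tilde{A}$ whose rows represent all nodes and whose columns represent their neighboring nodes of type $\tau$, $H_\tau^{(l)}$ contains the hidden states of the nodes of type $\tau$, and $W_\tau^{(l)}$ is the transformation matrix of type $\tau$.

3.4.1. Type-Level Attention. Given a target node $v$, we use
$$h_\tau = \sum_{v' \in N_v^{\tau}} \tilde{A}_{vv'} h_{v'}$$
to represent the type $\tau$, where $N_v^{\tau}$ is the set of neighbors of $v$ with type $\tau$; that is, $h_\tau$ is the sum of the neighboring node features of type $\tau$, weighted by $\tilde{A}$. We then use the target node embedding $h_v$ and the type embedding $h_\tau$ to calculate the type-level attention score
$$a_\tau = \sigma\left(\mu_\tau^{\top} \cdot [h_v \,\|\, h_\tau]\right),$$
where $\mu_\tau$ is the attention vector of type $\tau$ and $\|$ denotes concatenation. Normalizing over all types gives the type-level attention weights
$$\alpha_\tau = \frac{\exp(a_\tau)}{\sum_{\tau' \in \mathcal{T}} \exp(a_{\tau'})}.$$

3.4.2. Node-Level Attention. We introduce node-level attention to learn the importance of different adjacent nodes and reduce the weights of noisy information. Given a target node $v$ and a neighboring node $v'$ of type $\tau'$, we use the embedding $h_v$ of node $v$, the embedding $h_{v'}$ of node $v'$, and the type-level attention weight $\alpha_{\tau'}$ as input to calculate the node-level attention score
$$b_{vv'} = \sigma\left(\nu^{\top} \cdot \alpha_{\tau'} [h_v \,\|\, h_{v'}]\right),$$
where $\nu$ is the node-level attention vector. Normalizing over all neighbors $N_v$ of $v$ gives the node-level attention weights
$$\beta_{vv'} = \frac{\exp(b_{vv'})}{\sum_{i \in N_v} \exp(b_{vi})}.$$
Finally, we incorporate the dual-level attention into HGCN:
$$H^{(l+1)} = \sigma\left(\sum_{\tau \in \mathcal{T}} \mathcal{B}_\tau \cdot H_\tau^{(l)} \cdot W_\tau^{(l)}\right),$$
where $\mathcal{B}_\tau$ is the attention weight matrix over all nodes and their neighbors of type $\tau$; its element in row $v$ and column $v'$ is $\beta_{vv'}$.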
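A forward pass of one HGCN layer with dual-level attention can be sketched in NumPy as follows. It is a minimal, untrained sketch, not the authors' implementation, and it assumes LeakyReLU as the attention activation, ReLU as the layer activation, a shared input dimension $d$ for all node types, and random toy parameters; `dual_attention_hgcn_layer` is a hypothetical name.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def normalize_adjacency(adj):
    """GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_self = adj + np.eye(adj.shape[0])            # A' = A + I (self-connections)
    d_inv_sqrt = 1.0 / np.sqrt(a_self.sum(axis=1))
    return a_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dual_attention_hgcn_layer(adj, h, node_type, weights, mu, nu):
    """One HGCN layer with type-level and node-level attention (forward pass only).

    adj: (n, n) binary adjacency without self-loops; h: (n, d) features H^(l);
    node_type: (n,) integer type ids; weights: list of (d, d_out) matrices W_tau;
    mu: (num_types, 2d) type attention vectors; nu: (2d,) node attention vector.
    """
    n, d = h.shape
    num_types = len(weights)
    a_norm = normalize_adjacency(adj)

    # Type-level attention: h_tau(v) = sum over type-tau neighbours v' of A~_vv' h_v'
    # (v itself is included via the self-loop when v has type tau).
    scores = np.empty((n, num_types))
    for t in range(num_types):
        mask = node_type == t
        h_tau = a_norm[:, mask] @ h[mask]                         # (n, d)
        scores[:, t] = leaky_relu(np.concatenate([h, h_tau], axis=1) @ mu[t])
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)                     # softmax over types

    # Node-level attention: b_vv' = LeakyReLU(nu^T alpha_{tau'} [h_v || h_v']),
    # where tau' is the type of the neighbour v'.
    pair = (h @ nu[:d])[:, None] + (h @ nu[d:])[None, :]          # nu^T [h_v || h_v']
    b = leaky_relu(alpha[:, node_type] * pair)
    b = np.where(a_norm > 0, b, -np.inf)                          # neighbours (and self) only
    beta = np.exp(b - b.max(axis=1, keepdims=True))
    beta /= beta.sum(axis=1, keepdims=True)                       # softmax over N_v

    # Propagation: H^(l+1) = ReLU(sum_tau B_tau H_tau W_tau).
    out = np.zeros((n, weights[0].shape[1]))
    for t in range(num_types):
        mask = node_type == t
        out += beta[:, mask] @ (h[mask] @ weights[t])
    return np.maximum(out, 0.0), alpha, beta

# Toy HIN: nodes 0-1 are review texts (type 0), node 2 an aspect term (type 1),
# node 3 a sentiment term (type 2); edges 0-2, 0-3, and 1-2.
rng = np.random.default_rng(0)
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]], dtype=float)
node_type = np.array([0, 0, 1, 2])
h = rng.normal(size=(4, 3))
weights = [rng.normal(size=(3, 2)) for _ in range(3)]
mu = rng.normal(size=(3, 6))
nu = rng.normal(size=6)
h_next, alpha, beta = dual_attention_hgcn_layer(adj, h, node_type, weights, mu, nu)
```

Because each row of $\mathcal{B}_\tau$ is taken from the softmax-normalized $\beta$, non-neighbors receive exactly zero weight, so the attention only reweights the edges already present in the HIN.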

Aspect-Based Sentiment Classification Prediction.
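Based on DAHGCN, we obtain the embeddings $H^{(L)}$ of the review-text nodes in the HIN, where $L$ is the number of layers. We then feed these embeddings into a softmax layer for classification:
$$Z = \mathrm{softmax}\left(H^{(L)}\right).$$
We train the model with the cross-entropy loss plus $L_2$ regularization:
$$\mathcal{L} = -\sum_{i \in D_{\text{train}}} \sum_{j=1}^{C} Y_{ij} \cdot \log Z_{ij} + \eta \lVert \Theta \rVert_2,$$
where $C$ is the number of sentiment polarity categories, $D_{\text{train}}$ is the set of indices of the training reviews, $Y$ is the corresponding one-hot label matrix, $\eta$ is the $L_2$-regularization coefficient, and $\Theta$ is the parameter set.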
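The classification layer and training objective can be sketched in NumPy as below. This is an illustrative sketch, assuming the last layer outputs $C$ columns (one per polarity), using the unsquared norm $\lVert\Theta\rVert_2$ to mirror the formula (a squared norm is a common alternative), and with an arbitrary default for $\eta$; `dahgcn_loss` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dahgcn_loss(h_final, labels, train_idx, params, eta=5e-4):
    """Cross-entropy over the labelled review nodes plus eta * ||Theta||_2.

    h_final: (num_nodes, C) output H^(L); labels: (num_nodes,) class ids;
    train_idx: indices of D_train; params: list of parameter arrays (Theta).
    """
    z = softmax(h_final[train_idx])                       # Z = softmax(H^(L))
    y = np.eye(h_final.shape[1])[labels[train_idx]]        # one-hot Y
    cross_entropy = -np.sum(y * np.log(z + 1e-12))         # small epsilon avoids log(0)
    l2 = np.sqrt(sum(np.sum(p ** 2) for p in params))      # ||Theta||_2 over all parameters
    return cross_entropy + eta * l2

# Uniform predictions over C = 3 classes for two training reviews contribute 2 * log(3);
# the parameter vector (3, 4) has norm 5, so eta = 0.1 adds 0.5.
h_final = np.zeros((3, 3))
labels = np.array([0, 1, 2])
loss = dahgcn_loss(h_final, labels, np.array([0, 1]), [np.array([3.0, 4.0])], eta=0.1)
```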

Datasets and Experiment Settings.
We use the SemEval competition datasets to evaluate DAHGCN on aspect-level sentiment classification: task 4 of SemEval 2014, task 12 of SemEval 2015, and task 5 of SemEval 2016. The datasets contain user-generated restaurant reviews. Each sample contains a target aspect, an aspect term, and the aspect-specific sentiment polarity, labeled as {positive, negative, neutral}. The datasets are detailed in Table 1, where Rest denotes restaurant reviews and 14, 15, and 16 [52][53][54] indicate the year of each dataset.
Table 2 shows the entity labels and attribute labels of the datasets. For each classification setting, the dataset is randomly divided into a training set (70%) and a test set (30%) with the random splitting tool in Sklearn.
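The 70/30 split can be sketched without scikit-learn as below. `random_split` is a hypothetical helper that, like scikit-learn's `train_test_split` with a float `test_size`, rounds the test-set size up; the seed is an arbitrary assumption because the paper does not report one.

```python
import math
import numpy as np

def random_split(n_samples, test_ratio=0.3, seed=0):
    """Randomly split indices 0..n_samples-1 into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)            # shuffled sample indices
    n_test = math.ceil(test_ratio * n_samples)   # round the test size up
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = random_split(10)
```

In practice, `train_test_split(X, y, test_size=0.3, random_state=...)` from `sklearn.model_selection` performs the same kind of shuffled split directly on the data arrays.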
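We use accuracy, recall, and macro-F1 to compare and evaluate each method [55]. They are defined with reference to Table 3 in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}.$$
Macro-F1 averages the F1 score over the sentiment classes.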
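The metrics can be computed per class from the counts above, as in this sketch. It assumes that both recall and F1 are macro-averaged over the sentiment classes (the paper does not state how recall is averaged) and that a class with an empty denominator scores 0; `evaluate` is a hypothetical helper.

```python
import numpy as np

def evaluate(y_true, y_pred, num_classes):
    """Return (accuracy, macro recall, macro-F1) for integer class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    recalls, f1s = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return float(accuracy), float(np.mean(recalls)), float(np.mean(f1s))

# Classes: 0 = positive, 1 = negative, 2 = neutral (toy labels).
acc, macro_recall, macro_f1 = evaluate([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)
```

Here the per-class F1 scores are 0.5, 0.8, and 2/3, so macro-F1 is their mean rather than the F1 of the averaged precision and recall.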

Models for Comparison
(1) AT-LSTM: it adopts the attention mechanism in LSTM to generate a weighted representation of a sentence [31].
(2) ATAE-LSTM: this method extends AT-LSTM by appending the input aspect embedding to each word's input vector, which models the interdependence between words and the input aspect [31].
(3) HEAT-BiGRU: the model captures the aspect information of a text and uses the aspect information to capture the aspect-specific sentiment information [37].
(4) ASGCN: it builds a graph convolutional network (GCN) over the dependency tree of a sentence. It uses syntactic information and word dependencies to avoid mistakenly treating syntactically unrelated context words as clues for judging sentiment [56].
(5) TD-GAT: it applies a graph attention network that makes explicit use of the dependency relationships between words to propagate sentiment features directly from the syntactic context of the aspect target.

As shown in Tables 4, 5, and 6, the accuracy, recall, and macro-F1 of DAHGCN are better than those of the other models, and Figures 6, 7, 8, 9, 10, and 11 support the same conclusion. We attribute this to the HIN, which integrates two types of additional information, aspect terms and sentiment terms. This additional information greatly enriches the semantic information of the nodes and effectively reduces semantic sparsity. In addition, the node-level and type-level attention mechanisms enable the model to learn the importance of different neighboring nodes and of different types of nodes to the current node, while also suppressing noise. These are likely the reasons why DAHGCN achieves the best results. First, we analyze the algorithms from the perspective of the attention mechanism. AT-LSTM, ATAE-LSTM, and HEAT-BiGRU all use attention mechanisms to learn the relationship between aspects and sentiments, but they do not consider the relationships between sentences. DAHGCN adds node-level attention, which learns the importance of adjacent nodes to the current node and can therefore capture the dependency relationships between sentences. As a result, DAHGCN obtains better classification results. Table 4 shows that the accuracy of DAHGCN is 0.7%, 0.4%, and 0.2% higher than that of AT-LSTM, ATAE-LSTM, and HEAT-BiGRU, respectively, and Tables 5 and 6 support this conclusion: the recall of DAHGCN is 1.0%, 1.2%, and 0.7% higher than that of AT-LSTM, ATAE-LSTM, and HEAT-BiGRU, respectively.
The macro-F1 of DAHGCN is 1.0%, 1.1%, and 0.6% higher than that of AT-LSTM, ATAE-LSTM, and HEAT-BiGRU, respectively. Second, we discuss the results from the perspective of the HIN. The classification results on each dataset in Table 4 show that the accuracy of HGCN is generally better than that of ASGCN, AT-LSTM, ATAE-LSTM, and HEAT-BiGRU, and the recall and macro-F1 results (Tables 5 and 6) show the same trend. Specifically, Figure 12 shows that HEAT-BiGRU performs slightly better than HGCN in binary prediction and that HGCN outperforms AT-LSTM and ATAE-LSTM, while HGCN performs better in 3-class prediction. Adding heterogeneous information that supplements the classification judgment makes HGCN more stable. We argue that the main reason for its advantage over the other algorithms is that HGCN uses the HIN to further mine the semantic information of the text, which makes its node feature representations more accurate. In summary, DAHGCN adopts the HIN and improves the attention mechanism, so it achieves the best results among all algorithms.
From the perspective of the attention mechanism, AT-LSTM, ATAE-LSTM, and HEAT-BiGRU all use attention mechanisms to learn the relationships between aspects and sentiments, but they do not consider the relationships between sentences. DAHGCN, in contrast, uses node-level attention to learn the importance of neighboring nodes to the current node, and thus the relationships between sentences, which yields better classification results. For binary classification on the Rest14 dataset (Table 4), the accuracy of DAHGCN is higher than that of ASGCN (1.0%), AT-LSTM (0.6%), ATAE-LSTM (0.4%), HEAT-BiGRU (0.2%), and TD-GAT (0.8%). The results in Tables 5 and 6 also show the advantage of DAHGCN. The attention mechanism can extract information about the keywords in a sentence, and using the extracted attention weights for classification improves the classification results; experiments have shown that attention mechanisms achieve good results. However, as the amount and complexity of the information in the data increase, more information must be considered. In this paper, we obtain better experimental results by building a graph network, supplementing information, and filtering information with a dual-level attention mechanism. In some complex situations, however, the attention mechanism still has difficulties, as discussed below.
To give an intuitive understanding of how DAHGCN works, we present a case study. The sentence "Great food but the service was dreadful" contains two aspects, which may lead to a sentiment mismatch in models that combine attention with recurrent neural networks. The sentence "If the service is nice, I will go back again" uses the word "if," which makes its implicit semantics harder to detect.
For the first sentence, our model easily identifies the sentiment words and allocates more attention to the aspect-specific ones: "Great" is the sentiment term towards food, and "dreadful" is the sentiment term corresponding to service. However, our model has difficulty learning the logical information in a sentence. The second sentence, "If the service is nice, I will go back again," expresses negative feelings towards the service, but it contains no obvious negative words, so a correct prediction is difficult. Our approach easily extracts the sentiment information "nice" but cannot learn the role of the word "if." In this case, it is difficult for attention to associate the sentiment words with the label, i.e., to learn appropriate attention weights. We leave this problem for future study.
As Figures 6, 7, and 8 show, the comparison methods perform worse on Rest15 than on Rest14 and Rest16. A likely reason is that Rest15 is relatively large while, as Table 1 shows, it contains less sentiment information, so the comparison methods perform relatively poorly on it. Figures 9, 10, and 11 show the same pattern. The comparison methods also perform differently in binary and 3-class prediction: 3-class prediction requires more integrated information for judgment, and sentences such as "If the service is nice, I will go back again" affect the classification more.
The results show that the proposed model outperforms the other models. Together with the previous experiments, they indicate that the proposed model achieves relatively high classification performance.

Conclusion and Future Work
In this paper, we propose a novel dual-level attention model based on a heterogeneous graph convolutional network for aspect-based sentiment analysis. We first propose a HIN framework that integrates two kinds of additional information (aspect terms and sentiment terms). To mine hidden network information and semantic information, we construct a heterogeneous graph network, which also addresses semantic sparsity. By leveraging HGCN, the model learns embeddings of words and review texts, and a dual-level attention mechanism learns the importance of different adjacent nodes and different types of nodes. Finally, aspect sentiment classification is transformed into node classification. The experimental results show that adding aspect information to HGCN yields better accuracy, recall, and macro-F1 than the other methods, which facilitates the analysis of multiple aspects of user text by smart devices. In future work, we will exploit the grammatical structure of review texts and the dependencies between different sentiments to improve aspect-level sentiment analysis.

Conflicts of Interest
The authors declare that they have no conflicts of interest.