Aspect-Level Sentiment Analysis Approach via BERT and Aspect Feature Location Model

With the rapid development of Internet social platforms, buyer reviews (such as comment text) have become an important basis for consumers to understand products and make purchase decisions. Early sentiment analysis methods worked mainly at the text level and sentence level and assumed that a text expresses only one sentiment. This assumption obscures details, makes it difficult to fully reflect people's fine-grained and comprehensive sentiments, and can lead to wrong decisions. Aspect-level sentiment analysis, in contrast, obtains a more comprehensive sentiment classification by mining the sentiment tendencies toward different aspects in the comment text. However, existing aspect-level sentiment analysis methods mainly rely on attention mechanisms and recurrent neural networks. They are insensitive to the position of aspect words and tend to ignore long-term dependencies. To address these problems, this paper builds on Bidirectional Encoder Representations from Transformers (BERT) and proposes an effective aspect-level sentiment analysis approach (ALM-BERT) by constructing an aspect feature location model. Specifically, we first use the pretrained BERT model to mine more aspect-level auxiliary information from the comment context. Second, to learn the expression features of aspect words and the interactive information between aspect words and their context, we construct an aspect-based sentiment feature extraction method. Finally, we conduct evaluation experiments on three benchmark datasets. The experimental results show that the aspect-level sentiment analysis performance of ALM-BERT is significantly better than that of the comparison methods.


Introduction
E-commerce is a thriving industry of increasing importance to the global economy. With the rapid development of social media in particular, more and more users express their sentiments on various online platforms. These comments reflect the sentiments of users and consumers and provide sellers and governments with valuable feedback on the quality of goods or services [1][2][3]. For example, before purchasing a product, users can browse a large number of comments about it on an e-commerce platform to decide whether it is worth buying. Similarly, governments and companies can collect large numbers of public comments directly from the Internet and analyze users' opinions and satisfaction to better meet their needs. Therefore, as a basic and key task of natural language processing (NLP), sentiment analysis has attracted widespread attention from both academia and industry [4]. However, the classic sentiment analysis task can only determine users' sentiment polarity (e.g., positive, negative, or neutral) toward a product or event from the entire sentence; it cannot determine the sentiment polarity toward a particular aspect mentioned in the sentence, let alone identify multiple sentiments expressed in a single sentence. In contrast, aspect-based sentiment analysis is a more fine-grained classification task that identifies the sentiment polarities of multiple aspects in a sentence. This scenario is illustrated in Figure 1, where a sentence as a whole carries an overall sentiment while also containing multiple aspect-level sentiments. In the comment "It didn't come with any software installed outside of windows media, but for the price, I was very pleased with the condition and the overall product," the sentiment polarity of "software" is negative, that of "windows media" is neutral, and that of "price" is positive, while the sentence as a whole ("I was very pleased") is positive.
The targets of these opinions, such as "software," "windows media," and "price," are called aspect words.
In recent years, researchers have proposed various methods for aspect-level sentiment analysis. Among them, supervised machine learning algorithms have performed best [5][6][7]. However, such statistical methods rely on carefully designed manual features over large-scale datasets, which costs considerable manpower and time [8,9]. Neural network models, by contrast, can automatically learn low-dimensional representations of reviews without relying on manual feature engineering. This property makes neural networks well suited to aspect-level sentiment analysis and has attracted the attention of researchers [10,11].
Unfortunately, existing methods mainly use recurrent neural networks (RNNs) [12] or convolutional neural networks (CNNs) [6] to mine the semantic information of an aspect word and its context, and they tend to overlook the fact that these networks are insensitive to the position of key components [10,13]. Researchers have shown that the sentiment polarity of an aspect word is highly correlated with its position in the sentence [4]; that is, the sentiment polarity of an aspect word is more easily affected by context words that are close to it [14]. In addition, neural networks have difficulty capturing long-term dependencies between aspect words and their context, which leads to a loss of valuable information. Although attention mechanisms [15] can alleviate this problem by focusing on the relevant context, the problem remains and limits their performance.
To solve the aforementioned problems, this paper builds on Bidirectional Encoder Representations from Transformers (BERT) [16] and establishes an aspect-level sentiment analysis approach based on BERT and an aspect feature location model (ALM-BERT). The core idea of ALM-BERT is to recognize the sentiment toward different aspect words in the text, take into account the contextual interaction information of aspect words, and reduce the interference of irrelevant words, thus forming an effective aspect-based sentiment analysis framework. The main contributions of this paper are as follows:
(i) Based on the pretrained general model BERT, we construct a multiangle text vectorization mechanism that obtains high-quality representations of both the context and the aspect information. We also construct an aspect-based sentiment feature extraction method, which uses an encoder based on the multihead attention mechanism to learn the expression features of aspect words and the interactive information between aspect words and their context. This allows the model to distinguish between different sentences and between the different contributions of different aspect words.
(ii) We construct an aspect feature location model that captures aspect information when modeling sentences and integrates the complete information of the aspect words into the interaction semantics. This model effectively reduces the influence of noise words unrelated to the aspect words and improves the integrity of aspect word information.
(iii) We conduct aspect-level sentiment analysis evaluation experiments on three benchmark datasets. The results show that the accuracy and macro-F1 score of ALM-BERT on the Restaurant dataset are 13.66% and 29.76% higher, respectively, than those of the baseline MGAN model. The accuracy of ALM-BERT on comment texts of different lengths is also better than that of the other comparison methods. This shows that ALM-BERT can better mine users' aspect-level sentiments.
We organize the remainder of this paper as follows: Section 2 briefly introduces related work on the aspect-based sentiment analysis task; Section 3 describes the problem formulation; Section 4 presents the proposed model and its training process in detail; Section 5 gives the experimental evaluation and result analysis; and Section 6 concludes the paper and briefly discusses future work.

Related Works
The core goal of aspect-based sentiment analysis is to recognize the sentiment polarity of different aspect words in a given text, which allows it to mine more fine-grained sentiments; it has therefore become a research hotspot in the sentiment analysis field. Current aspect-based sentiment analysis methods fall mainly into two categories: classic aspect-based sentiment analysis methods and neural network-based sentiment analysis methods.

Figure 1: An example of a consumer review with three aspect terms: "It didn't come with any software installed outside of windows media, but for the price, I was very pleased with the condition and the overall product." (software: negative; windows media: neutral; price: positive; overall sentence: positive). Black represents sentence-level sentiment, while red, green, and blue represent the sentiment of the corresponding aspect word, respectively.

2 Wireless Communications and Mobile Computing

Classical Aspect-Based Sentiment Analysis Methods.
In the field of aspect-based sentiment analysis, early research mainly focused on traditional machine learning methods, including rule-based methods [17] and statistical methods [18]. These studies generally relied on laborious manual annotation and feature engineering and then employed traditional machine learning to build a sentiment classifier [19]. For example, Qiu et al. [20] analyzed the relationship between aspect words and sentiment polarity according to grammatical features. Similarly, Liu et al. [21] proposed a word alignment model to identify aspect words and sentiment polarity based on grammatical information. Subrahmanian and Reforgiato [22] proposed a comprehensive framework that fully considered the information of adjectives, verbs, and adverbs. Jing et al. [23] presented a topic modeling method that used grammatical features to help separate aspect words from sentiment words. Wu et al. [24] introduced phrase dependency parsing and treated phrase fragments as an important component in identifying sentiment polarity. Zhao et al. [19] proposed a method that determined the sentiment polarity of aspect words from the grammatical features of the words related to them. Kiritchenko et al. [25] adopted a support vector machine based on n-gram, parse, and lexical features. Although these methods achieved some success, they rely heavily on manual annotation and feature engineering, which imposes performance bottlenecks that are difficult to overcome.

Neural Network-Based Sentiment Analysis Methods.
Unlike the traditional methods above, neural networks can automatically learn continuous, low-dimensional representations from text without relying on manual feature engineering. In other words, they address the excessive dependence of the traditional methods on manual annotation and feature engineering. Consequently, more and more researchers have built aspect-based sentiment analysis methods on neural networks. Tang et al. [26] constructed a Target-Dependent Long Short-Term Memory (TD-LSTM) model based on two LSTM networks, which concatenates the left and right context representations of the aspect as the final context representation for predicting the sentiment. Moreover, neural network models based on the attention mechanism, originally proposed for machine translation, have been successfully applied to aspect-based sentiment analysis. Wang et al. [27] designed an attention-based LSTM model that focuses on the parts of a sentence related to the aspect words. Chen et al. [28] used a bidirectional LSTM and a multiple-attention mechanism to pick up important features for predicting the final sentiment. Ma et al. [29] employed an interactive attention mechanism to obtain the context representation and the aspect word representation. Ou et al. [4] established an LSTM-based neural network with an attention-over-attention model. This network models aspect words and context simultaneously and can mine important auxiliary information in both.
Recently, the pretrained model BERT, whose pretraining does not require labeled data, has attracted the attention of academia and industry. Specifically, BERT learns a general model with preliminary natural language features using only a large amount of unlabeled text [16]. BERT must then be fine-tuned on labeled data to train the predictor for a downstream task. For instance, Song et al. [30] used BERT as the embedding layer to obtain the vector representation of the context and achieved good results. Qui et al. [31] proposed an auxiliary sentence construction method that transforms aspect-based sentiment classification into a sentence-pair classification task. Gao et al. [32] constructed a BERT-based encoder to determine the sentiment polarity of aspect words.
Although the above research has made progress, many problems remain. For example, the standard BERT model provides only local context information [33] and ignores the differences in sentiment polarity and importance of words across different aspects. In addition, most existing studies do not explicitly model the complete information of the aspect words in a sentence, even though other researchers have shown that information irrelevant to the aspect words can severely degrade model performance [18]. Therefore, identifying the sentiment polarity of different aspects remains a challenging task.

Problem Formulation
Given a sentence and a set of predefined aspect words as input, aspect-based sentiment analysis outputs the sentiment polarity of each aspect word in the sentence. We use some real comment examples to illustrate the aspect-level sentiment analysis task.
As shown in Table 1, each example sentence contains two aspect terms, and each aspect term takes one of four sentiment polarities (positive, neutral, negative, or conflict). The aspect-based sentiment analysis task can be defined as follows:
Definition 1. Formally, a comment sentence is given as $S = \{w_1, w_2, \dots, w_n\}$, where $n$ is the total number of words in $S$. $A = \{a_1, \dots, a_i, \dots, a_m\}$ is an aspect vocabulary of length $m$, where $a_i$ denotes the $i$th aspect word in $A$, and $A$ is a subsequence of $S$. $P = \{p_1, \dots, p_j, \dots, p_C\}$ denotes the candidate sentiment polarities, where $C$ is the number of sentiment polarity categories and $p_j$ is the $j$th sentiment polarity.
Problem 2. The goal of the aspect-based sentiment analysis model is to predict the most likely sentiment polarity of a specific aspect word in a sentence, which can be formulated as follows:
$$\hat{p} = \arg\max_{p_j \in P} \phi(a_i, p_j, S)$$
where $\phi$ is a function that quantifies the degree of matching between the aspect word $a_i$ and the sentiment polarity $p_j$ in sentence $S$. The model outputs the sentiment polarity with the highest matching degree as the classification result. The notation used in this model is summarized in Table 2.
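Problem 2 says the model outputs the polarity with the highest matching degree. The Python sketch below illustrates only that decision rule: it scores each candidate polarity and returns the argmax. The scorer `toy_phi` and its three-word lexicon are hypothetical stand-ins for the learned function $\phi$, not the paper's model, and `find_aspect` is a hypothetical helper that locates the aspect $A$ as a contiguous subsequence of $S$ (the kind of position $i$ that the aspect feature location model later needs).

```python
from typing import Callable, Sequence

# phi(aspect, polarity, sentence) -> matching score; a stand-in for the learned phi.
Scorer = Callable[[Sequence[str], str, Sequence[str]], float]


def find_aspect(sentence: Sequence[str], aspect: Sequence[str]) -> int:
    """Return the start index of `aspect` as a contiguous span of `sentence`, or -1."""
    m = len(aspect)
    for start in range(len(sentence) - m + 1):
        if list(sentence[start:start + m]) == list(aspect):
            return start
    return -1


def predict_polarity(sentence: Sequence[str], aspect: Sequence[str],
                     polarities: Sequence[str], phi: Scorer) -> str:
    """argmax over p_j in P of phi(a_i, p_j, S); ties go to the earliest polarity."""
    return max(polarities, key=lambda p: phi(aspect, p, sentence))


def toy_phi(aspect: Sequence[str], polarity: str, sentence: Sequence[str]) -> float:
    # Hypothetical lexicon scorer for illustration only; it ignores the aspect entirely.
    lexicon = {"great": "positive", "incredible": "positive", "not": "negative"}
    votes = sum(1 for w in sentence if lexicon.get(w.lower()) == polarity)
    return votes + (0.5 if polarity == "neutral" else 0.0)


if __name__ == "__main__":
    s = "Great Indian food and the service is incredible".split()
    P = ["positive", "neutral", "negative"]
    print(find_aspect(s, ["service"]))                   # 5
    print(predict_polarity(s, ["service"], P, toy_phi))  # positive
```

In ALM-BERT the role of $\phi$ is played by the whole network ending in the softmax sentiment predictor; only the argmax step is shared with this toy.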

Our Proposed Model
In text-level and sentence-level sentiment analysis, the details of sentiment are obscured, and people's fine-grained emotional expressions cannot be reflected accurately. To conduct a more complete sentiment analysis and discover the sentiment expressed toward different angles (i.e., aspects) of text reviews, this paper proposes an aspect-location model based on BERT for aspect-based sentiment analysis (ALM-BERT). ALM-BERT mines the sentiment toward different aspects in comment details, helping to avoid incorrect results in real-world applications such as recommendation systems and question answering systems. The overall framework of ALM-BERT is shown in Figure 2 and mainly includes four parts: a multiangle text vectorization mechanism, an important feature extraction model, a fusion layer, and a sentiment predictor. First, we employ the pretrained BERT model to generate high-quality word vectors for the sequence, which support the subsequent steps (Section 4.1). Then, we build a new feature extractor (the important feature extraction model) from a multihead attention mechanism and a position-wise feed-forward network to extract important context and target information (Section 4.2.1), and we build an aspect feature location model that selects information related to aspect words from the context feature representation (Section 4.2.2). Finally, after fusing the context with the important information related to the target, we use an aspect-level sentiment predictor to predict the probabilities of the different sentiment polarities (Section 4.3).

Multiangle Text Vectorization Mechanism.
Word embedding maps each word to a vector in a high-dimensional space, which helps machines understand natural language. Mainstream methods include Word2vec and GloVe. Both learn word vectors from the contexts in which words co-occur and have achieved good performance in aspect-level sentiment analysis tasks, but each assigns a word a single vector regardless of the sentence in which it appears. Previous research has shown that these two embedding models cannot capture enough of the information in the text [34], which leads to poor classification accuracy and degrades the performance of aspect-based sentiment analysis models. A high-quality word embedding model therefore has an important influence on the accuracy of classification results [35].
The key to aspect-level sentiment analysis is understanding natural language effectively, which usually relies heavily on large-scale, high-quality annotated text. Fortunately, BERT is a language pretraining model that can …
Table 1: Some examples of aspect-based sentiment analysis.

Comments | Aspect | Sentiment polarities (Positive, Negative, Neutral, Conflict)
All the money went into the interior decoration, none of it went to the chefs.

Chefs ✓
Great Indian food and the service is incredible.
The lobster sandwich is $24, and although it was good, it was not nearly enough to warrant that price.

Aspect-Based Sentiment Feature Extraction Method.
To extract the implicit features of aspect words and their context while exploiting the auxiliary information contained in the aspect words, we design an aspect-based sentiment feature extraction method inspired by the Transformer encoder [36]. The basic idea is to integrate the information of aspect words and context and to model the interaction between the context and the target words. We hypothesize that capturing the feature information of aspect words in context can improve the accuracy of sentiment classification.

Important Feature Extraction Model.
The Transformer encoder is a feature extractor based on a multihead attention mechanism and position-wise feed-forward networks, which can learn different important information in different representation subspaces. Moreover, it can directly capture long-term dependencies in a sequence and is easier to parallelize than recurrent and convolutional neural networks, which greatly reduces training time. Following the same principle, we design the important feature extraction model as follows.
Specifically, we first construct a multihead attention mechanism composed of multiple self-attention mechanisms. Different heads capture implicit information of the text from different perspectives, and the computation can be parallelized without relying on RNNs or CNNs. Each head takes a query sequence ($Q$) and key-value pairs ($K$ and $V$) as input. The attention score $f_s(\cdot)$ of the self-attention mechanism is calculated as follows:
$$f_s(Q, K, V) = \sigma\big(f_e(Q, K)\big)\,V$$
where $\sigma(\cdot)$ is the normalized exponential (softmax) function and $f_e(\cdot)$ is the energy function that learns the correlation features between $K$ and $Q$:
$$f_e(Q, K) = \frac{Q K^{\top}}{\sqrt{d_k}}$$
where $\sqrt{d_k}$ is the scale factor and $d_k$ is the dimension of the query and key vectors.
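The scaled, softmax-normalized attention described above can be sketched in plain Python as follows. This is a minimal illustration assuming the standard scaled dot-product form, with matrices as lists of rows; the helper names are illustrative, and a real implementation would use a tensor library and batched heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """sigma(.): normalized exponential, shifted by the max for numerical stability."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def energy(q: Matrix, k: Matrix) -> Matrix:
    """f_e(Q, K) = Q K^T / sqrt(d_k)."""
    d_k = len(k[0])
    return [[v / math.sqrt(d_k) for v in row] for row in matmul(q, transpose(k))]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """f_s(Q, K, V) = softmax(f_e(Q, K)) V, with the softmax taken over the keys."""
    weights = [softmax(row) for row in energy(q, k)]
    return matmul(weights, v)


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    # The first key matches the query, so the first value row gets the larger weight.
    print(attention(q, k, v))
```

Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the vector dimension, which would otherwise push the softmax toward a one-hot distribution.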
The attention score of the multihead attention mechanism, $f_{mh}(\cdot)$, is obtained by concatenating the attention scores of the self-attention heads:
$$f_{mh}(Q, K, V) = [a_1; a_2; \dots; a_H]\,W_d$$
where $a_i$ is the $i$th attention score, $H$ is the number of heads, $[\,;\,]$ denotes vector concatenation, and $W_d$ is a weight matrix. Second, we feed the context representation and the aspect representation into the multihead attention mechanism to capture the long-term dependencies of the context and to decide which context words are crucial for determining the sentiment of the aspect word. This yields $c_{cc}$ and $t_{ca}$, which denote the long-term dependency information of the context and the context-aware information of the aspect word, respectively. Then, following the Transformer encoder, we feed $c_{cc}$ and $t_{ca}$ into the position-wise feed-forward network (PFN) to obtain the hidden states $h_c$ and $h_a$:
$$\mathrm{PFN}(x) = \zeta(x W_1 + b_1)\,W_2 + b_2, \qquad h_c = \mathrm{PFN}(c_{cc}), \qquad h_a = \mathrm{PFN}(t_{ca})$$
where $\zeta(\cdot)$ is the rectified linear unit, $b_1$ and $b_2$ are biases, and $W_1$ and $W_2$ are learnable weights. Finally, applying mean pooling to $h_c$ and $h_a$ gives the final hidden states $h_{cm}$ and $h_{am}$.
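The remaining steps of the important feature extraction model (concatenating heads, the position-wise feed-forward network, and mean pooling) can be sketched in plain Python as below. This assumes each head's output is a token-by-feature matrix and that mean pooling averages over the token axis; the function names are illustrative, and the small hand-set matrices in the usage example stand in for the learned $W_d$, $W_1$, $b_1$, $W_2$, $b_2$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def concat_heads(heads: List[Matrix]) -> Matrix:
    """[a_1; ...; a_H]: join the head outputs feature-wise, token by token."""
    n_tokens = len(heads[0])
    return [[x for head in heads for x in head[t]] for t in range(n_tokens)]


def multi_head(heads: List[Matrix], w_d: Matrix) -> Matrix:
    """f_mh = [a_1; ...; a_H] W_d."""
    return matmul(concat_heads(heads), w_d)


def pfn(x: Matrix, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Matrix:
    """PFN(x) = ReLU(x W_1 + b_1) W_2 + b_2, applied to every token independently."""
    hidden = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[v + b for v, b in zip(row, b2)] for row in matmul(hidden, w2)]


def mean_pool(h: Matrix) -> Vector:
    """Average over the token axis, e.g. h_c -> h_cm (assumed pooling axis)."""
    return [sum(col) / len(h) for col in zip(*h)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    heads = [[[1.0], [2.0]], [[-3.0], [4.0]]]  # two heads, two tokens, one feature each
    c = multi_head(heads, identity)             # [[1.0, -3.0], [2.0, 4.0]]
    h_c = pfn(c, identity, [0.0, 0.0], identity, [0.0, 0.0])  # ReLU zeroes the -3.0
    print(mean_pool(h_c))                        # [1.5, 2.0]
```

Because the PFN uses the same weights at every position, it mixes features within a token but never across tokens; all cross-token interaction comes from the attention step.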

Aspect Feature Location Model.
The important feature extraction model above captures the long-term dependencies of the context and generates the interactive semantic information between the aspect word and the context. On this basis, to further highlight the importance of different aspect words, we build an aspect feature location model based on the max-pooling function (Algorithm 1). This model divides the hidden features of the extracted aspect words and their context into multiple regions (line 3) and selects the maximum value of each region to represent it (lines 4-5). In this way, the model locates core features and reduces the influence of noise words unrelated to the aspect words, thereby improving the integrity of aspect word information. In other words, capturing aspect features and their different degrees of importance can further improve the accuracy of aspect-level sentiment classification.
Specifically, using the position and length of the aspect word, the feature location algorithm extracts the most relevant information of the aspect word, $af$, from the context representation $e_c$. We then apply max pooling to $af$ to obtain the most important features $AF$.
Require: the context representation $e_c$; the position $i$ of the aspect word in the sentence; the length $al$ of the aspect word; the batch size $bs$.
1: repeat
2:   for each $e_c \in bs$ do
3:     Select rows $i+1$ to $i+1+al$ of $e_c$ to obtain the aspect feature $af$;
4:     Calculate the most important features $AF$ according to Eq. (8);
5:     Apply the dropout operation to the important features to obtain $h_{af}$;
6:   end for
7: until accuracy and macro-F1 stabilize
Algorithm 1: Aspect feature location algorithm.
Afterwards, we perform a dropout operation on $AF$ to obtain the important features $h_{af}$ of the aspect word in the context representation.
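A minimal sketch of the body of Algorithm 1 (lines 3-5) for a single sentence, in plain Python, is given below; the batch loop and the outer "until stable" training loop are omitted. Several details are assumptions rather than facts confirmed by the text: the +1 offset is read as skipping a leading [CLS] token, the selected rows are taken as the half-open range from $i+1$ to $i+1+al$ (exactly $al$ aspect tokens), max pooling is applied per feature over those rows, and dropout is standard inverted dropout that is active only during training. The function names are hypothetical.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]


def max_pool(af: Matrix) -> Vector:
    """AF: feature-wise maximum over the selected aspect rows."""
    return [max(col) for col in zip(*af)]


def dropout(x: Vector, p: float, training: bool, rng: random.Random) -> Vector:
    """Inverted dropout: zero each value with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def aspect_feature_location(e_c: Matrix, i: int, al: int, p: float = 0.5,
                            training: bool = False,
                            rng: Optional[random.Random] = None) -> Vector:
    """Select the aspect rows of e_c (line 3), max-pool into AF (line 4), dropout -> h_af (line 5)."""
    af = e_c[i + 1:i + 1 + al]  # assumption: +1 skips the [CLS] row
    if not af:
        raise ValueError("aspect span falls outside the context representation")
    AF = max_pool(af)
    return dropout(AF, p, training, rng or random.Random(0))


if __name__ == "__main__":
    # Rows: [CLS], then one row per token; the aspect starts at token 0 and spans 2 tokens.
    e_c = [[0.0, 0.0], [1.0, 5.0], [4.0, 2.0], [3.0, 3.0]]
    print(aspect_feature_location(e_c, i=0, al=2))  # [4.0, 5.0]
```

The default dropout rate of 0.5 mirrors the value the paper reports as best; at inference time (`training=False`) the pooled vector passes through unchanged.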

Sentiment Predictor.
A core element of ALM-BERT is the use of multiple self-attention mechanisms to obtain multiangle hidden text features; after processing by the aspect feature location model, we obtain rich aspect-level auxiliary features and the contextual interaction information of aspect words. To make effective use of these complete and rich features, this paper uses a fully connected layer to fuse and preprocess them and then uses the softmax function to map them to the [0,1] interval, achieving an effective mapping from features to sentiment classes. Specifically, we first concatenate $h_{cm}$, $h_{am}$, and $h_{af}$ to obtain the comprehensive representation $r$:
$$r = [h_{cm}; h_{am}; h_{af}]$$
Subsequently, we apply a linear function to $r$:
$$u = W_u r + b_u$$
where $W_u$ is the weight matrix and $b_u$ is the bias.
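The fusion step is a concatenation followed by an affine map. The sketch below assumes the three pooled vectors are plain Python lists and uses tiny hand-set values in place of the learned $W_u$ and $b_u$; the output holds one unnormalized score per sentiment class. The function name `fuse` is illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fuse(h_cm: Vector, h_am: Vector, h_af: Vector, w_u: Matrix, b_u: Vector) -> Vector:
    """r = [h_cm; h_am; h_af], then W_u r + b_u (one output per sentiment class)."""
    r = h_cm + h_am + h_af  # list concatenation implements [ ; ; ]
    if any(len(row) != len(r) for row in w_u):
        raise ValueError("each row of W_u must match the length of r")
    return [sum(w * x for w, x in zip(row, r)) + b for row, b in zip(w_u, b_u)]


if __name__ == "__main__":
    w_u = [[1.0, 1.0, 1.0], [1.0, 0.0, -1.0]]  # hypothetical weights for 2 classes
    print(fuse([1.0], [2.0], [3.0], w_u, [0.0, 0.5]))  # [6.0, -1.5]
```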
Finally, we use a softmax function to compute the probability $\Pr$ that the sentiment polarity of aspect word $a$ in a sentence is $p$:
$$\Pr(p \mid a, S) = \frac{\exp(u_p)}{\sum_{k=1}^{C} \exp(u_k)}$$
where $u_p$ is the component of the linear output $W_u r + b_u$ corresponding to polarity $p$, and $C$ is the number of sentiment polarity categories. Overall, ALM-BERT is an end-to-end computing process. To optimize its parameters so as to minimize the loss between the predicted sentiment polarity $y$ and the correct sentiment polarity $\hat{y}$, we train the model with cross-entropy plus L2 regularization as the loss function:
$$\mathcal{L} = -\sum_{j \in D} \sum_{i=1}^{C} \hat{y}_j^{i} \log y_j^{i} + \lambda \lVert \theta \rVert_2^2$$
where $D$ denotes all training data, $j$ and $i$ index a training sample and a sentiment class, respectively, $\lambda$ is the L2 regularization factor, and $\theta$ is the parameter set of the model.
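The softmax predictor and the cross-entropy loss with L2 regularization described above can be sketched as follows. This is a minimal illustration assuming one-hot gold labels given as class indices, the parameter set flattened to a list of floats, and the penalty written as $\lambda \lVert \theta \rVert^2$ (some implementations use $\lambda/2$, which the text does not specify). The function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(u: Sequence[float]) -> List[float]:
    """Class probabilities from unnormalized scores, shifted by max(u) for stability."""
    top = max(u)
    exps = [math.exp(v - top) for v in u]
    total = sum(exps)
    return [e / total for e in exps]


def loss(batch_u: Sequence[Sequence[float]], labels: Sequence[int],
         params: Sequence[float], lam: float) -> float:
    """Cross-entropy summed over samples plus lam * ||theta||^2.

    With one-hot targets, sum_i y_hat_j^i * log y_j^i reduces to the log-probability
    of the true class of sample j.
    """
    ce = -sum(math.log(softmax(u)[label]) for u, label in zip(batch_u, labels))
    l2 = lam * sum(p * p for p in params)
    return ce + l2


if __name__ == "__main__":
    print(softmax([0.0, 0.0]))                            # [0.5, 0.5]
    print(loss([[0.0, 0.0]], [0], [1.0, 2.0], lam=0.01))  # log(2) + 0.05
```

In practice frameworks often fold the L2 term into the optimizer as weight decay instead of adding it to the loss explicitly.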

Experimental Evaluation
For the sake of evaluating the rationality and effectiveness of the ALM-BERT approach, this section describes the details of experiment settings and designs comparative experiments. Moreover, we also analyze the experimental results.

Datasets.
We conduct experiments on three public English review datasets, whose statistics are given in Table 3. In the Restaurant and Laptop datasets provided by SemEval 2014 [37], each sentence contains some aspect words and their sentiment polarities (positive, negative, neutral, or conflict). In the Twitter dataset collected by Tan et al. [38], users' comments are labeled as positive, negative, or neutral. These three datasets are popular review datasets that have been widely used in aspect-based sentiment analysis tasks.

Baselines and Evaluation Metrics.
To verify the effectiveness of our model, we compare ALM-BERT with the following popular aspect-based sentiment analysis models:
(i) TD-LSTM [26] is a classic model that improves classification accuracy by integrating the correlation information between aspect words and context into an LSTM-based classification model.
(ii) ATAE-LSTM [27] appends the embedding of the aspect words to the embedding of the sentence as input and then applies an attention mechanism to compute the weights.
(iii) MemNet [39] is a data-driven model that uses multiple attention-based computational layers to capture the importance of each context word.
(iv) IAN [29] uses interactive attention networks to model aspect words and context separately and generates representations for targets and contexts.
(v) RAM [28] constructs a framework based on a multiple-attention mechanism to capture long-distance features in the text and enhance the representation ability of the model.
(vi) TNet [40] uses a bidirectional LSTM to generate hidden representations of the context and aspect words and then uses a CNN layer, instead of an attention mechanism, to extract important features from them.
(vii) [41] proposes two kinds of attention enhancement mechanisms that focus on aspect words and context, respectively, and comprehensively considers the relevance between context and aspect words.
(viii) AOA [4] constructs an attention-over-attention model that associates sentiment words with aspect words and automatically generates mutual attention from aspect to text and from text to aspect.
(ix) MGAN [42] proposes a multigrained attention model that captures the interactive information between aspect words and context from coarse to fine.
… to obtain AOA-BERT, IAN-BERT, and MemNet-BERT models. The structure of the remaining models is consistent with that described in the corresponding papers.
In addition, to evaluate ALM-BERT objectively, we follow existing aspect-level sentiment analysis work and use accuracy (Acc) and macro-F1 score (F1) as evaluation metrics.
Accuracy (Acc) is defined as
$$\mathrm{Acc} = \frac{SC}{N}$$
where $SC$ is the number of correctly classified samples and $N$ is the total number of samples. Generally, the higher the accuracy, the better the model performs. Macro-F1 complements accuracy: it computes, for each sentiment polarity, the F1 score (the harmonic mean of precision and recall) and then averages these scores equally over all polarities, so minority classes count as much as majority ones. It is calculated as follows:
$$\mathrm{Pre}_i = \frac{TP}{TP + FP}, \qquad R_i = \frac{TP}{TP + FN}, \qquad F1 = \frac{1}{C}\sum_{i=1}^{C} \frac{2\,\mathrm{Pre}_i\,R_i}{\mathrm{Pre}_i + R_i}$$
where $TP$ is the number of samples correctly classified as sentiment polarity $i$, $FP$ is the number of samples incorrectly classified as polarity $i$, $FN$ is the number of samples of polarity $i$ misclassified as other polarities, $C$ is the number of sentiment polarity categories, $\mathrm{Pre}_i$ is the precision of polarity $i$, and $R_i$ is the recall of polarity $i$. For a more comprehensive evaluation, we use two category settings: $3C = \{\text{positive}, \text{neutral}, \text{negative}\}$ and $4C = \{\text{positive}, \text{neutral}, \text{negative}, \text{conflict}\}$.
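Both metrics can be computed directly from gold and predicted labels. The sketch below is a plain-Python illustration; it assumes that a precision, recall, or F1 with a zero denominator counts as 0, a common convention the text does not state. Under that convention, a class that never occurs (for example "conflict" in the 4C setting on a split without conflict samples) lowers macro-F1.

```python
from typing import Hashable, List, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Acc = SC / N."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             classes: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1; undefined ratios are taken as 0 (assumption)."""
    pairs = list(zip(y_true, y_pred))
    scores: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "negative", "neutral"]
    print(accuracy(gold, pred))                                         # 0.75
    print(macro_f1(gold, pred, ["positive", "neutral", "negative"]))   # about 0.778
```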

Parameter Optimization.
ALM-BERT relies mainly on BERT to generate the vector representations of context and aspect words. We therefore use the standard BERT_BASE configuration for training: the numbers of Transformer layers, hidden units, and self-attention heads are 12, 768, and 12, respectively. We further optimized the training process as follows.
Dropout [43] randomly discards neurons with a given probability during training to improve the generalization ability of a neural network. We initialize the dropout rate to 0.3 and then search for the optimal value in steps of 0.1. As shown in Figure 3(c), ALM-BERT achieves the best accuracy and F1 on the three datasets when the dropout rate is 0.5.
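The dropout search above is a one-dimensional grid search. The sketch below assumes a hypothetical `evaluate` callback that trains and validates the model for a given rate and returns (accuracy, macro-F1); the upper bound of 0.7 is likewise a hypothetical choice, since the text gives only the starting value 0.3 and the step 0.1. Candidates are compared on accuracy first and macro-F1 second.

```python
from typing import Callable, Dict, List, Tuple

Score = Tuple[float, float]  # (accuracy, macro-F1)


def dropout_grid(start: float = 0.3, stop: float = 0.7, step: float = 0.1) -> List[float]:
    """Candidate rates start, start + step, ..., stop (rounded to avoid float drift)."""
    n_steps = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n_steps + 1)]


def select_dropout(evaluate: Callable[[float], Score],
                   grid: List[float]) -> Tuple[float, Dict[float, Score]]:
    """Evaluate each rate and keep the best, comparing accuracy first, then macro-F1."""
    scores = {p: evaluate(p) for p in grid}
    best = max(grid, key=lambda p: scores[p])
    return best, scores


def fake_evaluate(p: float) -> Score:
    # Hypothetical stand-in that peaks at 0.5; a real one would train and validate ALM-BERT.
    return (0.8 - abs(p - 0.5), 0.7 - abs(p - 0.5))


if __name__ == "__main__":
    print(dropout_grid())                                    # [0.3, 0.4, 0.5, 0.6, 0.7]
    print(select_dropout(fake_evaluate, dropout_grid())[0])  # 0.5
```

Rounding each candidate, rather than repeatedly adding 0.1, keeps values such as 0.6 exact enough to serve as dictionary keys and to print cleanly.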
The learning rate determines whether and how quickly the objective function converges to a local minimum. We use the Adam optimizer to update the model parameters and search for the best learning rate in the range $[10^{-5}, 0.1]$. As shown in Figure 3(c), ALM-BERT performs best with a learning rate of $2 \times 10^{-5}$.
The L2 regularization coefficient is a hyperparameter that helps prevent overfitting. According to Figure 3(c), ALM-BERT performs best when it is set to 0.01. Meanwhile, we initialize the model weights with Glorot initialization [44], set the batch size to 16, and train for 10 epochs.

Evaluation Experiment of All Comparison Methods.
Table 4 shows the sentiment classification results when the number of sentiment polarities is C = 3. The accuracy and macro-F1 of the BERT_BASE-based models are significantly higher than those of the GloVe- and Word2vec-based models. On the Restaurant dataset in particular, the accuracy and macro-F1 of ALM-BERT are 12.77% and 30.97% higher, respectively, than those of the classical IAN model. This shows that introducing BERT as a pretrained word embedding model better expresses the semantic and grammatical features of text. We also find that ALM-BERT achieves the best classification performance on all three datasets.
Specifically, compared with the AEN-based model on the Restaurant dataset, ALM-BERT improves accuracy and macro-F1 by 4.2% and 8.81%, respectively. In addition, the accuracy and macro-F1 of ALM-BERT on the Laptop dataset are 3.29% and 3.15% higher, respectively, than those of the BERT-base model. This indicates that our aspect feature location model plays a positive role in aspect-based sentiment analysis.

Evaluation Experiment for Mining Long-Term Dependencies.
To verify how well different methods capture long-term dependencies, we construct a series of experiments on texts of different lengths. As shown in Figures 4(a)-4(c), ALM-BERT obtains higher accuracy and macro-F1 than TD-LSTM overall, which suggests that our Transformer encoder models the implicit relationships between context words better than an LSTM-based encoder. In addition, compared with AEN, as shown in Figure 1, the prediction accuracy and macro-F1 of ALM-BERT on sentences of different lengths are improved by 3.1% and 6.56%, respectively. This shows that ALM-BERT makes better use of aspect information than AEN and reduces the interference of aspect-independent information.
In summary, these experiments show that ALM-BERT achieves higher accuracy and macro-F1, which further verifies that combining BERT with aspect information is feasible and effective for aspect-based sentiment analysis.

Conclusion
In this paper, we build a Transformer encoder on top of BERT to capture the long-term dependencies of the context and generate the interactive semantic information between aspect words and context. We then propose an aspect feature location model to extract more aspect features from the context information. Experiments on three datasets demonstrate that our proposed approach (ALM-BERT) outperforms the comparison methods. In addition, ALM-BERT maintains strong performance as text length increases. In other words, ALM-BERT handles long texts better and better mines users' aspect-level sentiment.
Our approach focuses on using natural language text to identify users' sentiment. However, people's modes of expression on social platforms have become increasingly diverse. In the future, we are therefore interested in combining our approach with image processing techniques to analyze multimodal data.

Data Availability
We conduct experiments on three public English review datasets. The Restaurant and Laptop datasets are provided by SemEval 2014; each sentence in them contains some aspect words and their sentiment polarities, labeled as positive, negative, neutral, or conflict. The third dataset consists of user comments collected from Twitter, labeled as positive, negative, or neutral. These three datasets are popular review datasets that have been widely used in aspect-based sentiment analysis tasks.

Conflicts of Interest
The authors declare that they have no conflicts of interest.