A Tri-Attention Neural Network Model-Based Recommendation

Heterogeneous information networks (HINs), which contain various types of nodes and links, have been applied in recommender systems. Although HIN-based recommendation approaches perform better than traditional recommendation approaches, they still have the following problems: meta-paths are selected manually rather than automatically; meta-path representations are rarely learned explicitly; and the global and local information of each node in the HIN has not been explored simultaneously. To address these deficiencies, we propose a tri-attention neural network (TANN) model for the recommendation task. The proposed TANN model first applies the stud genetic algorithm to automatically select meta-paths. Then, it learns global and local representations of each node, as well as representations of the meta-paths existing in the HIN. After that, a tri-attention mechanism is proposed to enhance the mutual influence among users, items, and their related meta-paths. Finally, the encoded interaction information among the user, the item, and their related meta-paths, which contains more semantic information, can be used for the recommendation task. Extensive experiments on the Douban Movie, MovieLens, and Yelp datasets demonstrate the outstanding performance of the proposed approach.


Introduction
With the increasing amount of data on the Internet, users find it difficult to obtain useful information. In recent years, recommender systems, which retain only the relevant information, have received increasing attention [1][2][3]. Researchers have proposed many methods for the recommendation task, which can be classified into four classes: neighborhood-based methods [4][5][6][7], model-based methods [8][9][10][11], graph-based methods [12][13][14][15][16], and deep neural network based methods [17][18][19][20][21]. Neighborhood-based methods comprise user-based collaborative filtering and item-based collaborative filtering; these methods use the neighbor information of users to make predictions [22]. Model-based methods first construct a descriptive model from the users' preferences and then generate recommendations based on the model [23]. Graph-based methods consider not only the neighborhood information of each node but also the network structure [16,20]. Inspired by the great success of deep neural networks in computer vision and natural language processing, recent work has applied deep neural networks to the recommendation task.
In addition, as multiple types of auxiliary information have become available, many methods use this information to improve recommendation performance [24,25]. Because auxiliary information is heterogeneous and complex, it is challenging to leverage in recommender systems. The heterogeneous information network (HIN), which contains various kinds of nodes connected by multiple types of relations and can therefore model rich auxiliary data, has been applied in recommender systems [26,27]. Although existing HIN-based recommendation approaches perform better than traditional recommendation approaches, they still have the following problems: first, the meta-paths are selected manually rather than automatically; second, meta-path representations are rarely learned explicitly; third, the global and local information of each node in the HIN has not been explored simultaneously.
To address these deficiencies, we develop a tri-attention neural network (TANN) model for the recommendation task. The proposed TANN model first applies the stud genetic algorithm to automatically select meta-paths. Then, it learns global and local representations of each node, as well as representations of the meta-paths existing in the HIN. After that, a tri-attention mechanism is proposed to enhance the mutual influence among users, items, and their related meta-paths. Finally, the encoded interaction information among the user, the item, and their related meta-paths, which contains rich semantic information, can be used for the recommendation task. In summary, our major contributions include the following: (1) meta-paths are automatically selected via the stud genetic algorithm.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the preliminaries and notations used in the paper. Section 4 describes the proposed tri-attention neural network (TANN) model-based recommendation approach. Section 5 reports the experimental results, and Section 6 concludes the paper.

Related Work
Accurately finding useful information in many e-resources becomes more and more difficult for users, due to the development of information technology and the increasing content of information. However, a recommendation system can overcome this obstacle.
Alqadah et al. [28] utilized each user's local biclustering neighborhood and developed a collaborative filtering method. Yao et al. [29] applied clustering analysis and a latent factor model to enhance the performance of neighborhood-based recommendation. Neighborhood-based systems use the stored ratings directly to make a recommendation, whereas model-based approaches learn a predictive model from the ratings. Cremonesi et al. [30] presented a PureSVD-based matrix factorization method, which uses the principal singular vectors of the user-item rating matrix to describe users and items. Pan et al. [31] proposed a consensus factorization based framework for coclustering networked data. Sindhwani et al. [32] developed a weighted nonnegative matrix factorization method. Xiong et al. [33] proposed an information propagation-based social recommendation method (SoInp) that models the implicit user influence from the perspective of information propagation. Hofmann [34] utilized PLSA (Probabilistic Latent Semantic Analysis) to solve collaborative filtering and showed that PLSA is equivalent to nonnegative matrix factorization.
Most graph-based recommendation approaches are based on random walks [35]. Christoffel et al. [12] proposed a graph random walk based recommendation algorithm. Kang et al. [16] presented a graph-based top-n recommendation model, which considers not only the neighborhood information encoded by the user graph and the item graph but also the hidden structure of the data.
Since deep learning techniques have been successfully applied in speech recognition and computer vision, some researchers began to apply them to the recommendation task and found that deep learning based recommendation approaches achieve better results than conventional recommendation approaches. Oord et al. [36] applied deep convolutional neural networks to generate latent factors of songs. Wang and Yang [37] combined probabilistic graphical models and deep belief networks to simultaneously learn features of audio content and make personalized recommendations. Xue et al. [38] presented a deep matrix factorization approach for the top-n recommendation task. Kim et al. [39] integrated a convolutional neural network into probabilistic matrix factorization and proposed a context-aware hybrid model for recommender systems.
In recent years, researchers adopted the heterogeneous information network (HIN) which characterizes rich auxiliary data in recommender systems. Pham et al. [40] modeled the rich information based on the constructed heterogeneous graph to solve the recommendation task. Chen et al. [41] developed a heterogeneous information network based projected metric embedding approach for link prediction. Yu et al. [42] presented a recommendation model based on a constructed attribute-rich HIN. Jiang et al. [43] modeled the user preferences using a generalized random walk with restart model and developed a heterogeneous information network based personalized recommendation method. Hu et al. [44] incorporated meta-path based context and proposed a co-attention mechanism based deep neural network to solve the recommendation task.
Although HIN-based deep learning approaches have achieved good performance in the recommendation task, they usually select meta-paths manually and do not address how to select them automatically. Moreover, existing approaches seldom consider the interaction between node information and meta-path information. Our work applies the stud genetic algorithm to automatically select meta-paths, proposes a tri-attention mechanism that considers interactions among user-item-meta-path triplets, and develops a recommendation approach to further enhance the recommendation performance.

Preliminaries and Notations
We use the definitions of heterogeneous information network (HIN) and meta-path in [45].

Definition 1. A heterogeneous information network is an information network that contains various kinds of objects and various kinds of links. It is defined as $G = (V, A, E, R, W)$, where V is the set of vertices of different types, A is the set of object types, E is the union of the different types of links, R is the set of link types, and W is the union of the weights on the links.
The length of a meta-path P is the number of relations contained in P. Taking the user-movie network as an example, we can use a 4-length meta-path to describe the user-movie relation, such as user ⟶ have seen movie ⟶ seen by user ⟶ have seen movie, abbreviated as UMUM.

Definition 3. A path instance is a sequence of entity nodes; it is an explicit path that follows a meta-path, that is, p ∈ P.
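To make the relation between a meta-path and its path instances concrete, the following minimal sketch enumerates the path instances of UMUM in a toy user-movie network. The node names, the edge list, and the helper `path_instances` are hypothetical and serve only as an illustration; they are not part of the proposed model.

```python
from collections import defaultdict

# Hypothetical toy HIN: node -> type letter, plus undirected user-movie links.
node_type = {"u1": "U", "u2": "U", "m1": "M", "m2": "M", "m3": "M"}
edges = [("u1", "m1"), ("u2", "m1"), ("u2", "m2"), ("u1", "m3")]

neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)


def path_instances(meta_path, start):
    """Enumerate every path instance of `meta_path` that begins at `start`.

    `meta_path` is a string of node-type letters such as "UMUM".
    """
    if node_type[start] != meta_path[0]:
        return []
    paths = [[start]]
    for wanted_type in meta_path[1:]:
        # Extend each partial path by every neighbor of the required type.
        paths = [p + [n]
                 for p in paths
                 for n in sorted(neighbors[p[-1]])
                 if node_type[n] == wanted_type]
    return paths


instances = path_instances("UMUM", "u1")
for p in instances:
    print(" -> ".join(p))
```

Each printed sequence is one path instance p of the meta-path P = UMUM; nodes may be revisited in this simple enumeration.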

Tri-Attention Neural Network (TANN) Model-Based Recommendation
In this section, we present the proposed tri-attention neural network (TANN) model at first and then illustrate the TANN model-based recommendation approach.

TANN Model.
The overall architecture of the proposed TANN is presented in Figure 1.

Embeddings for Users and Items.
To make the representations of users and items more meaningful, we use a global representation to capture their coarse-grained features and a local representation to capture their fine-grained features; we then integrate the global and local information of each node in the HIN.
(1) Global Representations of Users and Items. Following [46], we use a lookup layer to map the one-hot representations of users and items to low-dimensional dense vectors. Given a user-item pair <u, i>, let $l_u \in \mathbb{R}^{|U| \times 1}$ and $z_i \in \mathbb{R}^{|I| \times 1}$ denote their one-hot representations. $L \in \mathbb{R}^{|U| \times d}$ and $Z \in \mathbb{R}^{|I| \times d}$ are the corresponding parameter matrices of the lookup layer, which preserve the users' and items' information; d is the dimension of the user and item embeddings; and |U| and |I| denote the number of users and the number of items, respectively. We apply the HIN2VEC algorithm [47] to obtain the matrices L and Z. The global representations of user u and item i are the rows of L and Z selected by the one-hot vectors, that is, $L^{\top} l_u$ and $Z^{\top} z_i$.
(2) Local Representations of Users and Items. As each user can be represented as a sequence of items and each item can be represented as a sequence of users, we learn the local representation of a user (item) from the corresponding item (user) sequence. Here, we use $S_{n(u)} \in \mathbb{R}^{|\ell_u| \times |I|}$ and $S_{n(i)} \in \mathbb{R}^{|\ell_i| \times |U|}$ to represent the sequence matrices of the neighbors of user u and item i, respectively. Each neighbor node in a sequence is represented by its one-hot representation; $|\ell_u|$ and $|\ell_i|$ are the numbers of neighbors of user u and item i in the HIN; and n(u) and n(i) denote the neighbor sets of user u and item i, respectively. Then, we apply a lookup layer to obtain the low-dimensional vector of each node in the item (user) sequence of the user (item). After that, the local representation of the user (item) is obtained with a neighbor attention mechanism, in which the attention weights over the neighbors are computed by normalizing the unnormalized attention scores with softmax; for item i, $\gamma_{n(i)} = \mathrm{softmax}(g_{n(i)})$.
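The two representation steps can be illustrated with a small sketch. It shows that multiplying the transposed lookup matrix by a one-hot vector selects one row, and that a softmax over neighbor scores yields attention weights for a weighted sum. The dot-product scoring used for g and the toy matrices are assumptions made for illustration only; the scoring function of the model may differ.

```python
import math


def one_hot(index, size):
    """One-hot column vector as a flat list."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def lookup(matrix, one_hot_vec):
    """Compute matrix^T @ one_hot_vec, which selects a single row of `matrix`."""
    dim = len(matrix[0])
    return [sum(matrix[r][c] * one_hot_vec[r] for r in range(len(matrix)))
            for c in range(dim)]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(query, neighbor_vecs):
    """Attention-weighted sum of neighbor embeddings.

    Assumption: the score g of each neighbor is its dot product with `query`.
    """
    g = [sum(q * v for q, v in zip(query, vec)) for vec in neighbor_vecs]
    gamma = softmax(g)
    dim = len(query)
    local = [sum(w * vec[c] for w, vec in zip(gamma, neighbor_vecs))
             for c in range(dim)]
    return gamma, local


# Toy lookup matrices (|U| = 3, |I| = 2, d = 2); values are made up.
L = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Z = [[1.0, 0.0], [0.0, 1.0]]

global_u = lookup(L, one_hot(1, 3))                  # row 1 of L
gamma, local_u = neighbor_attention(global_u, [Z[0], Z[1]])
print(global_u, gamma, local_u)
```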
We concatenate the global representation vector and the local representation vector of user u and of item i, and feed each concatenated vector into an MLP component to obtain the final representations of the user and the item.

Meta-Path Embedding
(1) Meta-Path Selection. Assuming there exist M meta-paths in the heterogeneous information network G, we construct a phenotype matrix H of size $C_M^X \times X$ ($X \le M$), which represents all possible combinations of X meta-paths; each row represents one combination. Then, we apply the stud genetic algorithm (SGA) [48] to automatically select the optimal X meta-paths.
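The selection step can be sketched as follows. The phenotype matrix is built with `itertools.combinations`, and a simplified stud-style genetic search picks a row: in every generation the fittest individual (the stud) is crossed with all other individuals. The bit-string encoding, the population settings, and the placeholder fitness function `toy_fitness` are assumptions for illustration; in practice the fitness would come from an evaluation score of the recommender trained with the candidate meta-path set.

```python
import itertools
import random


def phenotype_matrix(meta_paths, x):
    """All C(M, X) combinations of X meta-paths, one combination per row."""
    return [list(combo) for combo in itertools.combinations(meta_paths, x)]


def stud_ga(n_rows, fitness, pop_size=8, generations=20,
            mutation_rate=0.1, seed=0):
    """Return the index of a high-fitness row found by a stud-style GA."""
    rng = random.Random(seed)
    n_bits = max(1, (n_rows - 1).bit_length())

    def decode(bits):
        # Map a bit string to a valid row index.
        return int("".join(str(b) for b in bits), 2) % n_rows

    def score(bits):
        return fitness(decode(bits))

    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        stud = max(population, key=score)
        children = [stud[:]]  # keep the stud unchanged (elitism)
        for mate in population:
            if mate is stud or len(children) == pop_size:
                continue
            cut = rng.randint(1, n_bits - 1) if n_bits > 1 else 1
            child = stud[:cut] + mate[cut:]  # one-point crossover with the stud
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return decode(max(population, key=score))


meta_paths = ["UMTM", "UUMM", "UMUM", "UMMM", "UUUM"]
H = phenotype_matrix(meta_paths, 4)


def toy_fitness(row_index):
    """Placeholder fitness; a real one would evaluate the recommender."""
    return -abs(row_index - 2)


best_row = stud_ga(len(H), toy_fitness)
print(H[best_row])
```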
(2) Meta-Path Instance Selection. Traditional HIN embedding models mainly use simple random walk strategies to obtain path instances. However, the path instances obtained by such strategies are of low value and cannot be directly applied to the recommender system. Therefore, we propose a weighted selection strategy with priority. In each step, the walker prefers to move to a higher-priority neighbor; with this walking strategy, path instances that contain more semantic information can be obtained for the recommendation task. The key problem is then how to define the priority of each node in a sequence.
Inspired by He et al. [46] and Hinton and Salakhutdinov [49], we use a similar pretraining technique to measure the priority of each candidate node. The basic idea is to take the score between different nodes in the heterogeneous network as the standard for weight allocation. For example, if scores range from 1 to 5 in film evaluation and user u gives movie i a score of 5, then the link between user u and movie i has the highest weight. Each node in the HIN has a weight value for each of its neighbors, and the similarity score between the node and the corresponding neighbor can also be obtained. We measure the priority of each neighbor node by the product of the weight value and the corresponding similarity score. This score reflects the degree of correlation between the two nodes. We use this priority score strategy to construct the instances of each meta-path.
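As an illustration of the priority score, the sketch below computes, for each candidate neighbor, the product of the link weight and the cosine similarity between embeddings, and moves to the candidate with the highest priority. The greedy choice, the toy ratings, and the toy embeddings are assumptions; sampling the next node with probability proportional to its priority would be an equally plausible reading of the strategy.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def priority(current, candidate, weight, emb):
    """Priority = link weight * similarity between the two node embeddings."""
    return weight[(current, candidate)] * cosine(emb[current], emb[candidate])


def next_node(current, candidates, weight, emb):
    """Greedy step (assumption): walk to the highest-priority candidate."""
    return max(candidates, key=lambda n: priority(current, n, weight, emb))


# Toy data: ratings act as link weights; embeddings are made up.
emb = {"u": [1.0, 0.0], "m1": [1.0, 0.0], "m2": [0.6, 0.8]}
weight = {("u", "m1"): 2.0, ("u", "m2"): 5.0}

chosen = next_node("u", ["m1", "m2"], weight, emb)
print(chosen)
```

Here m1 is more similar to u, but the much higher rating of m2 gives it the larger product, so the walker moves to m2.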
Finally, for each meta-path, we obtain a number of path instances with a given length L and score them as follows: for a path instance, we sum the products of the weight and the cosine similarity between adjacent nodes and divide the sum by L to obtain the score of the path instance. In this way, we obtain the scores of the path instances of each meta-path and select the k path instances with the highest scores as the selected path instances of that meta-path.
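The scoring rule above can be written directly as code. In this sketch, L is taken to be the number of nodes in the path instance, which is an assumption; the toy weights and embeddings are also made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def path_score(path, weight, emb, length=None):
    """Sum of weight * cosine over adjacent node pairs, divided by L."""
    if length is None:
        length = len(path)  # assumption: L counts the nodes of the instance
    total = sum(weight[(a, b)] * cosine(emb[a], emb[b])
                for a, b in zip(path, path[1:]))
    return total / length


def top_k_instances(paths, weight, emb, k):
    """Keep the k path instances with the highest scores."""
    ranked = sorted(paths, key=lambda p: path_score(p, weight, emb),
                    reverse=True)
    return ranked[:k]


emb = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 0.0]}
weight = {("a", "b"): 5.0, ("b", "c"): 3.0, ("b", "d"): 4.0}
paths = [["a", "b", "c"], ["a", "b", "d"]]

best = top_k_instances(paths, weight, emb, k=1)
print(best, path_score(best[0], weight, emb))
```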
(3) Meta-Path Instance Embedding. Since a path instance is a sequence of entity nodes, we apply a convolutional neural network (CNN) to map it into a low-dimensional vector. For a meta-path P, we use $X^p \in \mathbb{R}^{L \times d}$ to represent the path embedding matrix, where p is a path instance, L is the length of the path instance, and d is the embedding dimension of the nodes. The embedding of the path instance p is computed as $h_p = \mathrm{CNN}(X^p; \Theta)$, where Θ denotes the parameters of the CNN.
(4) Meta-Path Embedding. As each meta-path contains many path instances, we first apply the proposed meta-path instance selection strategy to obtain the top k path instances of each meta-path. Then, we use the max pooling operation to extract the important dimensional features from the selected path instances. Let $\{h_p\}_{p=1}^{k}$ denote the embeddings of the k selected path instances. The embedding of the meta-path P, denoted $c_p$, is computed as $c_p = \text{max-pooling}(\{h_p\}_{p=1}^{k})$.
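A minimal sketch of these two steps is given below. The convolution is a single layer with a window over consecutive nodes, ReLU activation, and max-over-positions pooling; this architecture, the window width, and the toy filter are assumptions, since the CNN configuration is not specified here. The element-wise maximum over the k instance embeddings corresponds to the max pooling step.

```python
def conv_path_embedding(X, filters, width=2):
    """Embed a path instance given its L x d node-embedding matrix X.

    Each filter is a flat list of width * d weights. For every filter we
    slide a window over consecutive nodes, apply ReLU, and keep the maximum
    activation, so the output has one value per filter.
    """
    n_nodes = len(X)
    embedding = []
    for w in filters:
        activations = []
        for start in range(n_nodes - width + 1):
            window = [v for row in X[start:start + width] for v in row]
            z = sum(a * b for a, b in zip(w, window))
            activations.append(max(0.0, z))  # ReLU
        embedding.append(max(activations))
    return embedding


def max_pool(instance_embeddings):
    """Element-wise maximum over the k selected path-instance embeddings."""
    return [max(column) for column in zip(*instance_embeddings)]


# Toy path instance with L = 3 nodes and d = 2, and one made-up filter.
X_p = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_p = conv_path_embedding(X_p, filters=[[1.0, 0.0, 0.0, 1.0]])

# Meta-path embedding from k = 2 hypothetical instance embeddings.
c_p = max_pool([[1.0, 5.0], [3.0, 2.0]])
print(h_p, c_p)
```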

(5) Tri-Attention Mechanism Based Interaction Embedding.
As meta-paths contain rich semantic information, different users show different preferences through different meta-paths; even when the same user interacts with different items through the same meta-path, the semantic information carried by the meta-path differs. To better represent the semantic information among users, items, and meta-paths, we develop a tri-attention mechanism that assigns different weights to different user-item-meta-path triplets. Given the user embedding $x_u$, the item embedding $y_i$, and the embedding $c_p$ of a meta-path that exists between user u and item i, we use two fully connected layers to compute the tri-attentive score. Here, $W_*^1$ is the first layer's weight matrix, $b_1$ is the first layer's bias vector, $w_2$ is the second layer's weight vector, and $b_2$ is the second layer's bias. $f(\cdot)$ is set to the ReLU function.
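One plausible form of the two-layer tri-attentive score, consistent with the parameters named above, is sketched below. The symbols $\alpha^{(1)}_{u,i,p}$ and $\alpha^{(2)}_{u,i,p}$, and the reading of $W_*^1$ as one first-layer matrix per input ($* \in \{u, i, p\}$), are assumptions introduced here for illustration.

```latex
\alpha^{(1)}_{u,i,p} = f\left(W^{1}_{u} x_u + W^{1}_{i} y_i + W^{1}_{p} c_p + b_1\right),
\qquad
\alpha^{(2)}_{u,i,p} = f\left(w_2^{\top} \alpha^{(1)}_{u,i,p} + b_2\right)
```

Under this reading, the first layer mixes the three embeddings into a hidden vector and the second layer reduces it to a scalar score for the triplet.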
We use the softmax function to normalize the tri-attentive scores into the final interaction weights, where the normalization runs over $P_{u \to i}$, the set of meta-paths existing between user u and item i. The interaction embedding among the user u, the item i, and their related meta-path P is then constructed with the vector concatenation operation, denoted ⊕.
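The following sketch puts the steps of the tri-attention mechanism together: a two-layer score per meta-path, a softmax over the meta-paths between u and i, and a concatenated interaction embedding. The way the weights are combined with the embeddings (a weighted sum of meta-path embeddings placed between the user and item embeddings) and all parameter values are assumptions for illustration and may differ from the actual model.

```python
import math


def matvec(W, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def tri_attention_score(x_u, y_i, c_p, W_u, W_i, W_p, b1, w2, b2):
    """Two fully connected layers with ReLU, producing one scalar score."""
    pre = [a + b + c + d for a, b, c, d in
           zip(matvec(W_u, x_u), matvec(W_i, y_i), matvec(W_p, c_p), b1)]
    hidden = [max(0.0, z) for z in pre]
    return max(0.0, sum(w * h for w, h in zip(w2, hidden)) + b2)


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interaction_embedding(x_u, y_i, meta_path_embs, weights):
    """Assumed combination: x_u (+) weighted sum of c_p (+) y_i."""
    dim = len(meta_path_embs[0])
    context = [sum(w * c[k] for w, c in zip(weights, meta_path_embs))
               for k in range(dim)]
    return x_u + context + y_i  # list concatenation plays the role of (+)


identity = [[1.0, 0.0], [0.0, 1.0]]
x_u, y_i = [1.0, 0.0], [0.0, 1.0]
meta_path_embs = [[1.0, 1.0], [0.0, 0.0]]  # two meta-paths between u and i

scores = [tri_attention_score(x_u, y_i, c, identity, identity, identity,
                              [0.0, 0.0], [1.0, 1.0], 0.0)
          for c in meta_path_embs]
beta = softmax(scores)
e_uip = interaction_embedding(x_u, y_i, meta_path_embs, beta)
print(scores, beta, e_uip)
```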

TANN Model-Based Recommendation Approach.
Once we obtain the interaction embedding, we apply an MLP component to model the complicated interactions and compute $r_{u,i}$. The MLP contains two hidden layers with ReLU nonlinear activation functions and an output layer with a sigmoid function. $r_{u,i}$ is interpreted as the relevance score between user u and item i and is used to generate the recommendation list for user u.
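The prediction step can be sketched as a plain forward pass through two ReLU hidden layers and a sigmoid output. The layer sizes and the parameter values in `params` are hypothetical; they only demonstrate the shape of the computation.

```python
import math


def dense(x, W, b):
    """Fully connected layer: W @ x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def relu(v):
    return [max(0.0, z) for z in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(interaction_emb, params):
    """Two hidden ReLU layers followed by a sigmoid output in (0, 1)."""
    h1 = relu(dense(interaction_emb, params["W1"], params["b1"]))
    h2 = relu(dense(h1, params["W2"], params["b2"]))
    return sigmoid(dense(h2, params["W3"], params["b3"])[0])


# Hypothetical parameters for a 2-dimensional interaction embedding.
params = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0]], "b2": [0.0, 0.0],
    "W3": [[1.0, -1.0]], "b3": [0.0],
}
r_ui = predict([1.0, 2.0], params)
print(r_ui)
```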
Defining an appropriate objective function is a critical step for model optimization. Following [14,17], we use negative sampling to learn the model's parameters. The objective has two terms: the first accounts for the observed interactions, and the second accounts for the negative feedback drawn from a noise distribution, which is set to the uniform distribution (other biased distributions can also be used); c is the number of sampled negative items.
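A common form of such a negative-sampling objective for one observed pair is $-\log r_{u,i} - \sum_{j} \log(1 - r_{u,j})$, summed over c sampled negative items j. The sketch below implements this form with uniform sampling; treating it as the exact objective of the model is an assumption, and the helper names are hypothetical.

```python
import math
import random


def sample_negatives(all_items, interacted, c, rng):
    """Draw c items uniformly from those the user has not interacted with."""
    candidates = [item for item in all_items if item not in interacted]
    return rng.sample(candidates, c)


def negative_sampling_loss(r_pos, r_negs, eps=1e-12):
    """-log(r_pos) - sum(log(1 - r_neg)); eps guards against log(0)."""
    positive_term = -math.log(r_pos + eps)
    negative_term = -sum(math.log(1.0 - r + eps) for r in r_negs)
    return positive_term + negative_term


rng = random.Random(0)
negatives = sample_negatives(["a", "b", "c", "d"], {"a"}, 2, rng)

# Scores here are made up; in training they come from the model.
loss = negative_sampling_loss(r_pos=0.5, r_negs=[0.5])
print(negatives, loss)
```

The loss decreases when the observed pair receives a score close to 1 and the sampled negatives receive scores close to 0.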

Datasets.
We use three datasets: the Douban Movie dataset (https://github.com/librahu), the MovieLens movie dataset (https://grouplens.org/datasets/movielens), and the Yelp business dataset (https://www.yelp.com/dataset). As a rating indicates whether a user has rated an item, we treat each rating as an interaction record [47,50]. Table 1 gives a detailed description of the datasets. The first row of each dataset shows the number of users, items, and their interactions, and the other rows list the statistics of the other relations. The manually selected meta-paths and the SGA-selected meta-paths of each dataset are listed in Table 2. Since long meta-paths may introduce noisy semantics [51], we set the length of each meta-path to 4 and the number of selected path instances per meta-path to 5.

Evaluation Methods.
To evaluate the recommendation performance, the implicit feedback records of the users in each dataset are divided into a training set and a test set according to a given proportion. For example, we use 90% of the feedback records to predict the remaining 10%. Because ranking all items for every user is too time-consuming, especially for large datasets, we first randomly select, for each user, 200 negative samples that have no interaction records with the user. We then obtain each user's recommendation list by ranking the positive items and the negative items together. We use Pre@K, Recall@K, and NDCG@K to evaluate the experimental results. When we apply the SGA algorithm to select the optimal 4 meta-paths, we follow the principle that "the smaller the objective function value is, the larger the fitness value is"; that is, we take the negative evaluation scores.
We implement the TANN model using TensorFlow with Keras (https://keras.io/). The batch size is set to 256, the regularization parameter is set to 0.0001, the learning rate is set to 0.001, and the dimension of the user and item embeddings is set to 64. We apply Adaptive Moment Estimation (Adam) [52] to optimize the model.
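For reference, the three metrics can be computed as in the sketch below, which uses the standard binary-relevance definitions; whether the experiments use exactly these variants (for example, this NDCG normalization) is an assumption. The candidate items and scores are toy values.

```python
import math


def rank_candidates(scores):
    """Sort candidate items by predicted score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked, relevant, k):
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG with a log2 position discount."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(pos + 2)
                for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Toy candidate list: held-out positives {"a", "c"} mixed with negatives.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
relevant = {"a", "c"}
ranked = rank_candidates(scores)

pre = precision_at_k(ranked, relevant, 2)
rec = recall_at_k(ranked, relevant, 2)
ndcg = ndcg_at_k(ranked, relevant, 2)
print(ranked, pre, rec, ndcg)
```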

Performance of TANN-Based Recommendation.
To illustrate the benefits of applying the SGA algorithm, the weighted random walk strategy, and the integration of the HIN's global and local information in the TANN model, we compare TANN against the following variants: (1) TANN w/o SGA, which applies manually selected meta-paths; we use the meta-paths in the second column of Table 2; (2) TANN w/o WRWS, which applies a random walk strategy (RWS) instead of the weighted random walk strategy (WRWS) to select meta-path instances; (3) TANN w/o local, which considers only the global information of the HIN, that is, it uses only the global representations of users and items and ignores their local representations. Table 3 shows that TANN w/o SGA performs the poorest, which indicates that the selection of meta-paths has a great influence on the recommendation results. TANN w/o WRWS performs better than TANN w/o SGA, which indicates that the meta-path selection strategy plays a more important role than the weighted random walk strategy in the recommendation task. In addition, TANN w/o local performs better than the above two methods. This can be mainly credited to the fact that the nodes and meta-paths in the HIN contain rich implicit and effective information. TANN w/o local applies SGA to obtain meta-paths automatically, avoiding human interference, and uses the weighted random walk strategy to obtain meta-path instances that represent the structural information of the heterogeneous information network. The full TANN method, which considers not only the information of users, items, and meta-paths but also the mutual influence among them, consistently outperforms the other three methods.

Comparison with Other Recommendation Approaches.
Moreover, we compare the proposed TANN method with four other recommendation methods: (1) BPR (Bayesian Personalized Ranking) [53], which is a personalized ranking algorithm based on Bayesian posterior optimization; (2) LRML (Latent Relational Metric Learning) [54], which employs an augmented memory model to construct latent relations for each user-item interaction; (3) CDAE (Collaborative Denoising Autoencoders) [55], which uses a denoising autoencoder structure to learn distributed representations of users and items; and (4) MCRec (Metapath based Context for RECommendation) [44], which leverages rich meta-paths and a co-attention mechanism. Table 4 compares the experimental results. From Table 4, we can see that LRML performs the poorest, as it only learns relations that describe each user-item interaction. CDAE learns latent representations of the corrupted user-item preferences that can best reconstruct the full input, so it performs better than LRML. Both LRML and CDAE concentrate on explicit feedback. BPR uses not only explicit feedback but also implicit feedback; thus, it obtains better results than LRML and CDAE. MCRec learns representations of users, items, and meta-path based context; as it encodes much more information than the previous methods, it performs better than the above three methods. MCRec selects meta-paths manually and utilizes only the global information of users and items, whereas the proposed TANN method selects meta-paths automatically, uses both local and global information of users and items, and applies a tri-attention mechanism to enhance the representations of users, items, and meta-paths; it consistently outperforms the other four recommendation methods.

Impact of Meta-Paths Selection.
In this set of experiments, we examine whether automatically selected meta-paths produce better recommendation performance than manually selected meta-paths. The experiments are conducted on the MovieLens dataset. The optimal meta-path set selected by SGA is UMTM, UUMM, UMUM, and UMMM (P1). We randomly selected three different meta-path sets: UMTM, UMUM, UUUM, and UMMM (P2); UMUM, UUMM, UUUM, and UMMM (P3); and UUMM, UUUM, UMTM, and UMMM (P4). The recommendation performance with the different meta-path sets is shown in Figure 2.
We can observe in Figure 2 that the recommendation performance based on the meta-path set selected by the SGA algorithm is better than the performance based on the three manually selected meta-path sets. The experimental results show that the interference of human factors should be avoided in the construction of the heterogeneous information network.

Impact of Users' and Items' Local Information.
We study whether incorporating the local information of users and items can further enhance the recommendation performance. The experiments are conducted on the Yelp dataset. We randomly select a user from the user list and obtain the recommendation list with and without the local information of users and items. In Figure 3, the ground-truth movie ids of the user's preferences are listed in the middle column, the movie ids produced by TANN without local information are listed in the left column, and the movie ids produced by TANN with local information are listed in the right column. The bold ids in the left and right columns indicate the results that match the ground-truth ids. We can see that the recommendation method finds more accurate results when local information is integrated than when it is not. This illustrates that local information helps improve the recommendation result. Compared with the global information, the local information reflects the neighborhood characteristics of the nodes in a more concentrated way.

Parameter Tuning.
Our model includes a few important parameters to tune. In this section, we examine the effect of three parameters on performance on the MovieLens dataset: the embedding size of the weighting vector of the attentive scores (i.e., the embedding size of $w_2$ in equation (14)), the number of negative samples (in equation (18)), and the embedding size of the output layer.
For the embedding size of the tri-attention mechanism, we vary it in the set {16, 32, 64, 128, 256}. As shown in Figure 4(a), our method achieves the best performance when the size is 128. For the number of negative samples, we vary it in the set {1, 3, 5, 7, 9}. We can see from Figure 4(b) that the evaluation result is the best when 5 negative items are taken for each positive item. For the embedding size of the output layer, we vary it in the set {8, 16, 32, 64}, and the best result is obtained when the embedding size of the output layer is 8. In summary, the optimal performance is obtained with a 128-dimensional tri-attention mechanism, 5 negative samples, and an 8-dimensional output layer.

Conclusion
In this paper, we present a tri-attention neural network (TANN) model for recommendation. We first apply the stud genetic algorithm to automatically select meta-paths and propose a tri-attention mechanism to enhance the mutual influence among users, items, and their related meta-paths. Then, we encode the interaction information among these three kinds of objects, which contains more semantic information and can be used for the recommendation task. Extensive experiments on the Douban Movie, MovieLens, and Yelp datasets demonstrate the outstanding performance of the proposed approach. In the future, we will explore other auxiliary information in the heterogeneous information network for the recommendation task.

Data Availability
The authors have presented the websites of the three datasets in Section 5.1. Readers can download the datasets from the corresponding links or contact the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.