Dynamic and Static Features-Aware Recommendation with Graph Neural Networks

Recommender systems are designed to deal with structured and unstructured information and help the user effectively retrieve needed information from the vast number of web pages. Dynamic information of users has been proven useful for learning representations in the recommender system. In this paper, we construct a series of dynamic subgraphs that include the user and item interaction pairs and the temporal information. Then, the dynamic features and the long- and short-term information of users are integrated into the static recommendation model. The proposed model is called dynamic and static features-aware graph recommendation, which can model unstructured graph information and structured tabular data. Particularly, two elaborately designed modules are available: dynamic preference learning and dynamic sequence learning modules. The former uses all user-item interactions and the last dynamic subgraph to model the dynamic interaction preference of the user. The latter captures the dynamic features of users and items by tracking the preference changes of users over time. Extensive experiments on two publicly available datasets show that the proposed model outperforms several compelling state-of-the-art baselines.


Introduction
The amount of information on the Internet continues to grow rapidly, and finding useful information has become increasingly difficult. Fortunately, the advancement of recommender systems can substantially help people deal with the information overload problem. Collaborative filtering (CF) is one of the best-known approaches among recommendation algorithms. Collaboratively learning latent representations of users and items from user-item interactions is therefore an important step in CF-based models. However, poor latent representations of users and items remain a factor limiting further performance gains, so researchers have adopted different methods to capture latent representations. So far, the most commonly used approach for CF is to learn latent features in the embedding space generated from the user-item rating matrix, such as matrix factorization [1] and deep learning-based CF [2][3][4]. Some researchers [5,6] use a bipartite graph to represent user-item interactions to further enhance the latent representations; hence, the topological features of the graph are introduced through graph neural networks (GNNs) [7]. The underlying assumption in leveraging the bipartite graph as input to obtain effective recommendations is as follows: nodes that are connected can spread information by aggregating their neighbors, thereby potentially contributing to capturing high-order features.
Latent representations obtained from dynamic user-item interactions serve as another method. Traditional CF usually defines a decay function of temporal information [8,9], such as the exponential decay function $e^{-\omega t}$, to capture these dynamic features, while graph-based CF obtains a series of user-item bipartite graphs based on interaction time [10]. The underlying assumption in using temporal information is that the behaviors of users on items are a dynamic interactive process; consequently, the long- and short-term preferences of users are captured.
It is unclear, however, which of these approaches, static recommendation or dynamic recommendation, is better for predicting user preference on items. The former ignores that user preference is dynamic and changes over time. The latter usually requires more parameters and training time than static recommendation, which limits its application. Furthermore, the introduction of temporal information may bring additional noise, which can hinder the performance and scalability of the model. Two important problems must be solved to deal with these challenges: (1) How to represent the behaviors of users with a dynamic graph? Temporal information is vital for capturing the dynamic preference of users. To avoid introducing additional noise, using temporal information more efficiently should be a priority. (2) How to obtain the dynamic features simply and swiftly? In addition to the interaction pairs of users and items, the dynamic graph also includes side information (e.g., temporal information). However, this additional information increases the number of parameters and the computational complexity.
To this end, a simple and effective graph-based algorithm, called dynamic and static features-aware graph recommendation (DSAGR), is proposed to introduce dynamic features into a static recommender system. Firstly, rather than simply using raw timestamps, the dynamic graph of users and items is constructed based on Takens' time embedding theorem [11] to use temporal information efficiently. This work employs the graph convolution network (GCN) [7] to learn the long- and short-term preferences of users because graph-based models are expressive. Then, a novel module, the dynamic sequence learning module, is proposed to transform the unstructured dynamic graph into structured sequence data and thus decrease the complexity of the dynamic model. In particular, convolutional neural networks (CNNs) [12,13] are used to capture the dynamic features from the sequence data. Finally, the dynamic features and dynamic preference are integrated to obtain the predictor for each user. Our main contributions are as follows.
(1) This work can simply and swiftly capture the dynamic features from the constructed dynamic graph. (2) A novel hybrid model is proposed in this work, which can easily capture the users' dynamic preferences. (3) An offline experiment is performed on real-world datasets. The results show that the proposed model successfully performs the personalized recommendation task. The rest of the paper is organized as follows. Section 2 elaborates on relevant research. Section 3 presents the proposed method, while Section 4 discusses the empirical study on the public datasets. Finally, Section 5 contains the conclusions.

Related Work
Collaborative filtering (CF)-based recommendation aims to predict the preferences of users and then return the top-N items that match their interests. Heuristic works, such as item- and user-based models, predict the preferences of users on items based on the k-nearest neighbor algorithm [14]. Model-based approaches usually learn low-rank latent representations of users and items through matrix factorization [1]. The inner product between the two low-rank vectors is then used to obtain the probability of the user clicking on the item.
Furthermore, deep learning has been shown to be particularly well suited to representation learning tasks [15,16]. Therefore, many deep learning techniques, such as the multilayer perceptron (MLP), autoencoder (AE), recurrent neural network (RNN), and CNN, have recently allowed CF to learn expressive representation vectors from the historical behaviors of users. Many researchers consider combinations of matrix factorization and deep learning techniques for CF recommendation. The MLP-based model Wide&Deep [17] captures linear and nonlinear latent features effectively. NeuMF [2] integrates MLP and MF to model high- and low-order interaction features. Furthermore, AE is used for recommendation tasks. Work [18] employs a denoising recurrent AE network and then generalizes it to the CF setting. RNN has been widely used for recommendation due to its excellent performance in modeling sequential data. The variants of RNN, such as long short-term memory (LSTM) [19] and gated recurrent unit (GRU) networks [20], are often employed in practice to overcome the vanishing gradient problem. For instance, work [21] uses LSTM to model the long- and short-term preferences of users. CNN is also a powerful tool [15]. Work [22] uses two parallel CNNs to learn deep representations of users and items. Work [23] integrates CNN and GRU networks to obtain distributed representations of users and items. These representations are then used to regularize the generation of latent features in matrix factorization.
In the last few years, GNN has been widely recognized as a state-of-the-art approach because of its successful applications in recommendation tasks [24]. GNN can effectively learn the structural representations of nodes by aggregating their neighborhood information. A pooling operation is typically used to output the node embeddings after an aggregation function. Many graph-based models have been proposed by using different aggregation and pooling functions, such as GCN [25], GraphSAGE [26], and Graph Isomorphism Network (GIN) [27]. Among these models, the most popular recommendation methods are LightGCN [6] and NGCF [5]. LightGCN is an effective simplified version of NGCF that omits the transformation mechanism and applies a sum-based pooling layer. Some researchers also consider dynamic representation learning to model data. Work [28] employs matrix perturbation to model the changes in graphs, such as changes in the adjacency matrix. Work [29] constructs the user-item interaction graph dynamically based on the user and item embeddings to improve the diversity of recommendations. Furthermore, many graph-based algorithms [24] have been proposed to enrich the representations of users and items with other auxiliary information [30,31]. Therefore, this work tries to introduce dynamic features as side information to improve the recommendation performance. These graph-based models have verified their superiority for the recommendation task. However, they mainly focus on constructing static graph-based recommendation models without considering their combination with dynamic graph features. As far as we know, no study has introduced dynamic graph features into a graph-based recommendation framework.

Proposed Method
In this part, the proposed DSAGR method is presented, and its framework is illustrated in Figure 1. Four components are included in the framework: (1) the dynamic graph construction converts the behaviors of users into a dynamic graph; (2) the dynamic preference learning module learns the long- and short-term preferences of users; (3) the dynamic sequence learning module captures the user and item sequence features as side information; and (4) the prediction layer produces the predictions.

Dynamic Graph Construction.
Given a user set $U$, an item set $I$, and a set of time stamps $T = \{t_1, t_2, t_3, \cdots\}$, the graph of the user-item interactions at the time stamp $t_1$ can be defined as $G_{t_1} = (U \cup I, E_{t_1})$, where $U \cup I$ is the set of nodes and an edge $e \in E_{t_1}$ represents an interaction between a user and an item at the time $t_1 \in T$. Therefore, the interactions of users and items can be seen as a time series, that is, $\{G_{t_1}, G_{t_2}, G_{t_3}, \cdots\}$. Figure 2(b) shows different graphs at five different time stamps.
To understand the behaviors of users under the effects of temporal information, several time slices of user-item interactions are generated based on Takens' time embedding theorem [11] using a given delay factor. Consider the following example: given five timestamps $[1, 2, 3, 4, 5]$, we assume that the delay factor is equal to 1 and the number of time slices is 4. Takens' time embedding theorem indicates that the time series is embedded into the $\mathbb{R}^2$ vector space as follows: $[[1,2], [2,3], [3,4], [4,5]]$. Similarly, the user-item interaction time stamps can also be embedded into the vector space and further be divided into $l$ time slices $[T_1, T_2, T_3, \cdots, T_l]$. Therefore, an interaction graph for each time slice can be obtained as previously mentioned. More formally, the obtained interaction subgraphs are denoted as $\{G_{T_1}, G_{T_2}, G_{T_3}, \ldots, G_{T_l}\}$. Figure 2 demonstrates the specific processes. Figure 2(a) presents the example dataset, which is sorted by interaction time in ascending order. For the sake of convenience, the interaction time is indicated by the numbers 1-5. In (b), the user-item interaction graph at different timestamps can be observed. The user nodes are marked in dark red, and the item nodes are marked in lilac. In (c), Takens' embedding of temporal information generates three time slices marked in orange (i.e., $T_1$: $[1, 2, 3]$, $T_2$: $[2, 3, 4]$, and $T_3$: $[3, 4, 5]$), in which each element is a time ID. The interacted pairs in each time slice constitute a user-item interaction subgraph.
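The slicing step above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `delay_embed` and `slice_subgraphs` and the toy interaction triples are assumptions, and time IDs are used directly instead of real timestamps.

```python
def delay_embed(series, dim, delay=1):
    """Sliding windows [s[i], s[i+delay], ..., s[i+(dim-1)*delay]]."""
    span = (dim - 1) * delay
    return [
        [series[i + k * delay] for k in range(dim)]
        for i in range(len(series) - span)
    ]


def slice_subgraphs(interactions, slices):
    """Collect the (user, item) edges whose time ID falls in each slice."""
    graphs = []
    for window in slices:
        members = set(window)
        graphs.append({(u, i) for (u, i, t) in interactions if t in members})
    return graphs


time_ids = [1, 2, 3, 4, 5]
two_dim = delay_embed(time_ids, dim=2)    # the four slices of the R^2 example
three_dim = delay_embed(time_ids, dim=3)  # the three slices T_1..T_3 of Figure 2(c)

# Hypothetical toy data: (user, item, time_id) triples.
toy = [("u1", "i1", 1), ("u2", "i1", 2), ("u1", "i2", 3),
       ("u3", "i3", 4), ("u2", "i3", 5)]
subgraphs = slice_subgraphs(toy, three_dim)
```

Each element of `subgraphs` is the edge set of one subgraph $G_{T_k}$; consecutive slices overlap, so an interaction can appear in several subgraphs.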

Dynamic Preference Learning Module.
The upper part of Figure 1 shows the dynamic preference learning module. In the recommendation task, the long-term preference of users reflects their inherent features and general preference, which can be learned from all interacted items of users. The short-term signals of a user reflect his/her latest preference. Many studies [32] use the latest interaction item embedding and the latest timestamp as short-term information but ignore the dependence on historical interactions. The long- and short-term collaboration can be captured effectively by applying the same layer structure, with Siamese and information-sharing components [33], to the full interaction graph $G$ and the last subgraph $G_{T_l}$. Siamese networks naturally introduce inductive biases for invariance modeling because their subnetworks share identical weights. Then, the two graphs can be parameterized using a GNN layer, such as LightGCN [6]. To offer a holistic view of the long-term and short-term collaborative node embeddings, we provide the matrix form.
Long-term: $M \in \mathbb{R}^{|U| \times |I|}$ is the user-item rating matrix, in which each element $M_{u,i}$ is 1 if the user $u$ interacted with the item $i$ and 0 otherwise. $A$ is the adjacency matrix of the graph $G$; $D$ is a $(|U| + |I|) \times (|U| + |I|)$ diagonal matrix, in which each entry $D_{jj}$ is the number of nonzero entries in the $j$th row vector of the adjacency matrix $A$; $E_u \in \mathbb{R}^{|U| \times d}$ and $E_i \in \mathbb{R}^{|I| \times d}$ are the initial weight matrices of users and items, respectively; $\lambda_0, \lambda_1$ are the defined hyperparameters; and $E_{long,u}$ and $E_{long,i}$ are the final representations of users and items for learning long-term preferences.
Short-term: $A_{last}$ is the adjacency matrix of the latest subgraph $G_{T_l}$, and $D_{last}$ is the diagonal matrix calculated from $A_{last}$. $E_{short,u}$ and $E_{short,i}$ denote the final representations of users and items, respectively, in the short-term preference learning.

Figure 1: Framework of our model.

Computational Intelligence and Neuroscience
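As an illustration of the quantities defined above, the following sketch propagates embeddings over a bipartite graph. It assumes the standard LightGCN form, that is, symmetric normalization $\hat{A} = D^{-1/2} A D^{-1/2}$ and a weighted layer sum $\sum_k \lambda_k \hat{A}^k E^{(0)}$; the exact equations of DSAGR are not reproduced in this text, so the functions `normalized_adjacency`, `matmul`, and `propagate` are hypothetical and only indicative.

```python
import math


def normalized_adjacency(M):
    """Build A = [[0, M], [M^T, 0]] and return D^{-1/2} A D^{-1/2}."""
    n_users, n_items = len(M), len(M[0])
    n = n_users + n_items
    A = [[0.0] * n for _ in range(n)]
    for u in range(n_users):
        for i in range(n_items):
            if M[u][i]:
                A[u][n_users + i] = 1.0
                A[n_users + i][u] = 1.0
    deg = [sum(row) for row in A]  # D_jj: nonzero entries in row j of A
    # Isolated nodes get a zero scale instead of a division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[r] * A[r][c] * inv_sqrt[c] for c in range(n)]
            for r in range(n)]


def matmul(X, Y):
    """Plain dense matrix product for lists of lists."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def propagate(M, E0, lambdas):
    """Weighted sum over layers: sum_k lambdas[k] * (A_hat^k E0)."""
    A_hat = normalized_adjacency(M)
    layer = E0
    out = [[lambdas[0] * v for v in row] for row in E0]
    for lam in lambdas[1:]:
        layer = matmul(A_hat, layer)
        out = [[o + lam * v for o, v in zip(o_row, l_row)]
               for o_row, l_row in zip(out, layer)]
    return out
```

Under this assumption, the Siamese weight sharing amounts to calling `propagate` twice with the same initial embeddings `E0` (users stacked above items): once with the full rating matrix for the long-term branch and once with the rating matrix of the last subgraph for the short-term branch.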

Dynamic Sequence Learning Module.
The degree of a node has been shown to be effective for evaluating its popularity [34,35]. Therefore, the degree matrices of users and items are proposed to track the dynamic changes in the user and item nodes in the constructed dynamic graph $\{G_{T_1}, G_{T_2}, G_{T_3}, \ldots, G_{T_l}\}$. For instance, the degree matrix of items is denoted as $Q = [q_1, q_2, \cdots, q_l]$, in which $q_k$ is an $|I|$-dimensional vector whose $i$th element $q_{i,k}$ is the number of edges incident to the item $i$ in the $k$th subgraph $G_{T_k}$; that is, $q_{i,k}$ measures the popularity of item $i$ in the time slice $T_k$. Similarly, the user degree matrix is denoted as $P = [p_1, p_2, \ldots, p_l]$, where each $p_k \in \mathbb{R}^{|U|}$. Therefore, this study offers a novel means of processing unstructured graph data and hence may shed light on the task of graph-based recommendation. The lower part of Figure 1 shows the dynamic sequence feature learning, which is modeled by two parallel CNN branches. The inputs of the module are the user degree matrix $P \in \mathbb{R}^{|U| \times l}$ and the item degree matrix $Q \in \mathbb{R}^{|I| \times l}$. CNNs generally comprise a set of convolutional and pooling layers. In this work, two 1D-convolutional layers and one pooling layer are designed to learn dynamic features. The first and second convolutional layers have $f_1$ and $f_2$ filters, respectively, with kernel size $\tau$ and shared weights $w_1 \in \mathbb{R}^{f_1 \times 1 \times \tau}$ and $w_2 \in \mathbb{R}^{f_2 \times f_1 \times \tau}$.
Here, $p_t \in \mathbb{R}^{|U|}$ is the $t$th column vector of the user degree matrix $P \in \mathbb{R}^{|U| \times l}$, $*$ is the convolution operator, and $h_t \in \mathbb{R}^{|U| \times f_2}$ denotes a feature matrix for all users. After the 1D-convolutional operation, $l$ feature matrices are obtained. Inspired by graph-based models [5,6], a weighted sum operator is designed as the pooling layer, and its output is then normalized by the sigmoid function $\sigma$.
Analogously, the item degree representations are computed from the item degree matrix.
Here, $q_t \in \mathbb{R}^{|I|}$ is the $t$th column vector of the item degree matrix $Q$, $*$ is the convolution operator, and $w_1' \in \mathbb{R}^{f_1 \times 1 \times \tau}$ and $w_2' \in \mathbb{R}^{f_2 \times f_1 \times \tau}$ are shared factors.
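The following sketch illustrates the two ingredients of this module: building the degree matrices from the subgraphs, and passing one node's degree sequence through two 1D convolutions, a weighted sum over time steps, and a sigmoid. Several details are assumptions made for illustration only: zero "same" padding, no nonlinearity between the convolutional layers, the cross-correlation convention used by deep learning frameworks, and the hypothetical names `degree_matrices`, `conv1d_same`, `sequence_features`, and `alphas` (the pooling weights).

```python
import math


def degree_matrices(subgraphs, users, items):
    """P[u][k] and Q[i][k]: degree of each user/item in the k-th subgraph."""
    u_idx = {u: r for r, u in enumerate(users)}
    i_idx = {i: r for r, i in enumerate(items)}
    n_slices = len(subgraphs)
    P = [[0] * n_slices for _ in users]
    Q = [[0] * n_slices for _ in items]
    for k, edges in enumerate(subgraphs):
        for u, i in edges:
            P[u_idx[u]][k] += 1
            Q[i_idx[i]][k] += 1
    return P, Q


def conv1d_same(x, w):
    """x: [in_ch][length], w: [out_ch][in_ch][tau] -> [out_ch][length]."""
    length = len(x[0])
    pad = len(w[0][0]) // 2  # zero padding keeps the sequence length (odd tau)
    out = []
    for filt in w:
        row = []
        for t in range(length):
            acc = 0.0
            for c, kernel in enumerate(filt):
                for j, weight in enumerate(kernel):
                    s = t + j - pad
                    if 0 <= s < length:
                        acc += weight * x[c][s]
            row.append(acc)
        out.append(row)
    return out


def sequence_features(seq, w1, w2, alphas):
    """Two conv layers, weighted sum over the l time steps, then sigmoid."""
    h = conv1d_same(conv1d_same([seq], w1), w2)  # shape [f2][l]
    pooled = [sum(a * v for a, v in zip(alphas, ch)) for ch in h]
    return [1.0 / (1.0 + math.exp(-z)) for z in pooled]
```

Applying `sequence_features` to every row of `P` (with `w1`, `w2`) and of `Q` (with separate weights) would give one $f_2$-dimensional feature vector per user and per item; a framework layer such as a batched 1D convolution would replace the explicit loops in practice.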

Prediction Layer.
The embeddings and degree representations of the user and item nodes are obtained from the dynamic preference learning module and the dynamic sequence learning module. Then, a fusion layer is defined to learn the final representations, where $(\cdot,\cdot)$ is the concatenation operator. Thereafter, we use an inner product between the final embeddings of the users and items to predict the recommendation results.

Training.
In this work, the Bayesian Personalized Ranking (BPR) loss [36] is used, which is a pairwise loss that encourages the prediction of an interacted entry to be higher than those of its uninteracted counterparts. The training set is $O = \{(u, i, j) \mid (u, i) \in O^{+}, (u, j) \in O^{-}\}$, which consists of the interacted pair set $O^{+}$ and the uninteracted pair set $O^{-}$. In addition, $L_2$ regularization is applied to the model parameters to reduce the risk of overfitting. Therefore, the final objective function of our model combines the BPR loss and the regularization terms, where $\Theta_1 = \{E_u, E_i\}$ is the set of embedding parameters, $\Theta_2 = \{w_1, w_2, w_1', w_2'\}$ is the set of weights in the CNN layers, and $c_1, c_2$ are the hyperparameters that control the regularization. Furthermore, the Adam optimizer [37] is used in a minibatch manner to optimize the proposed model.
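The prediction and training steps can be sketched as follows. The BPR term is written in its standard form, $-\sum_{(u,i,j) \in O} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})$, with one $L_2$ penalty per parameter group. Which vectors enter the concatenation is an assumption here, and the names `fuse`, `score`, `bpr_objective`, and `reg_groups` are hypothetical.

```python
import math


def fuse(*parts):
    """Concatenate several representation vectors into one final embedding."""
    out = []
    for part in parts:
        out.extend(part)
    return out


def score(e_u, e_i):
    """Inner product between the final user and item embeddings."""
    return sum(a * b for a, b in zip(e_u, e_i))


def bpr_objective(triples, user_emb, item_emb, reg_groups=()):
    """BPR loss over (u, i, j) triples plus coeff * ||params||^2 per group."""
    loss = 0.0
    for u, i, j in triples:
        diff = score(user_emb[u], item_emb[i]) - score(user_emb[u], item_emb[j])
        # -ln(sigmoid(diff)) = ln(1 + exp(-diff)), in a numerically stable form
        loss += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    for coeff, vectors in reg_groups:
        loss += coeff * sum(v * v for vec in vectors for v in vec)
    return loss
```

For example, `reg_groups=[(c1, embedding_rows), (c2, cnn_weight_rows)]` would mirror the two regularization coefficients applied to $\Theta_1$ and $\Theta_2$. In practice the gradient of this objective would be taken by an automatic differentiation framework and optimized with Adam on minibatches of triples.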

Experiments
Empirical results are presented to evaluate the proposed model. The experiments aim to answer the following research questions: RQ1: How does DSAGR perform as compared with state-of-the-art models? RQ2: How do dynamic features affect DSAGR? RQ3: What are the effects of hyperparameters on the DSAGR model?

Dataset.
The dynamic preference learning module in the proposed method requires implicit feedback and temporal information; thus, the proposed model is evaluated on the ML_100k and ML_1M movie datasets (https://grouplens.org/datasets/movielens/). Table 1 presents the statistics of the datasets. These datasets have 5-level rating scores, and each user has rated at least 20 movies. The ratings of the datasets are binarized because the proposed model only requires implicit feedback. Specifically, every element in the original rating matrix (scores 1 to 5) is binarized to 1 or 0, where 1 indicates that the rating score is not less than 4, and 0 indicates that the rating score is less than 4 or that there is no interaction. This work also follows the same settings described in NGCF [5]: 20% of the interaction records of each user are randomly selected to form the test and validation sets, and the remaining records are treated as the training set.
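The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions: ratings are given as hypothetical `(user, item, rating)` triples, binarization is applied before the split, and the helper names `binarize` and `per_user_split` are not from the original work.

```python
import random


def binarize(ratings, threshold=4):
    """Keep (user, item) pairs rated >= threshold as positive interactions."""
    return [(u, i) for (u, i, r) in ratings if r >= threshold]


def per_user_split(pairs, holdout=0.2, seed=0):
    """Randomly hold out a fraction of each user's interactions."""
    rng = random.Random(seed)
    by_user = {}
    for u, i in pairs:
        by_user.setdefault(u, []).append(i)
    train, held = [], []
    for u, items in by_user.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        n_held = int(len(shuffled) * holdout)
        held += [(u, i) for i in shuffled[:n_held]]
        train += [(u, i) for i in shuffled[n_held:]]
    return train, held
```

Pairs that do not appear in the binarized list correspond to the 0 entries of the rating matrix, covering both low ratings and missing interactions.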

Baselines.
The DSAGR model is compared with the following methods:
(i) Item-based CF (ICF) [38]: ICF is usually a two-step process: (1) determining the similarity set for target items and (2) predicting rating scores based on the most similar items. In the second step, the rating scores of unseen items for a user are predicted as the weighted average rating of the k-nearest neighbors.
(ii) PMF [1]: With the probabilistic matrix factorization (MF) algorithm, this model maps the user-item rating matrix into two low-dimensional matrices. Then, this algorithm predicts the preference of users by the inner product between the two low-dimensional matrices.
(iii) DMF [39]: DMF is an MF-based CF method, which obtains the latent features of users and items through deep representation learning, that is, MLP. This method then uses the inner product between the two latent features to predict the preferences of users on items.
(iv) Wide&Deep [17]: Wide&Deep is a well-known deep learning recommender system that combines wide linear models and MLP neural network layers to obtain latent representations of users and items.
(v) NGCF [5]: This work learns the representations of users and items by aggregating the information of high-order neighbors. Specifically, each node obtains the transformed representation of its neighbors by propagating embeddings on the bipartite graph structure. NGCF introduces collaborative signals in the pooling layer to enhance the learning of high-order latent features.
(vi) DGCF [40]: DGCF is an advanced graph-based CF model. This work focuses on the intentions of users when interacting with different items. The implementation of DGCF is based on NGCF and the graph attention network to model different intents of users.

Evaluation Metrics.
Unlike previous studies [17,39] that compute metrics on sampled uninteracted items, this experiment computes metrics over all the items that the user has not interacted with. Two widely used evaluation metrics, Recall@N and NDCG@N (normalized discounted cumulative gain), with $N = 20$ by default, are adopted to evaluate the effectiveness of top-N recommendation and preference ranking. The specific formulas are as follows:
$$\mathrm{Recall@}N = \frac{TP}{TP + FN},$$
where TP (i.e., True Positive) indicates the number of items in the top-N recommendation list that hit the target items and FN (i.e., False Negative) is the number of positive items in the test set that are falsely identified as negative items.
$$\mathrm{NDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N},$$
where $\mathrm{DCG@}N = \sum_{i=1}^{N} \frac{2^{r_i} - 1}{\log_2(i + 1)}$; here, $r_i = 1$ if a test item is in position $i$ and 0 otherwise; $\mathrm{IDCG@}N$ indicates the ideal $\mathrm{DCG@}N$, in which the target items are placed at the top of the recommendation list.
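The two metrics can be computed for one user as in the sketch below. It assumes binary relevance, so $2^{r_i} - 1$ reduces to $r_i$, and it takes `ranked` to be the model's ordering of all items the user has not interacted with in training; the function names are hypothetical.

```python
import math


def recall_at_n(ranked, relevant, n=20):
    """TP / (TP + FN): share of the test items found in the top-n list."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_n(ranked, relevant, n=20):
    """DCG@n over the top-n list divided by the ideal DCG@n."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank i = pos + 1
        for pos, item in enumerate(ranked[:n])
        if item in relevant
    )
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

The reported scores would then be the averages of these per-user values over all test users.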

Parameter Settings.
The DSAGR model is implemented in Python under the TensorFlow (https://www.tensorflow.org) framework. For the comparison algorithms, the parameter settings follow those given in the original papers. The proposed model uses the following parameter settings: (1) a random normal distribution (the mean and standard deviation are set to 0 and 0.01, respectively) is used to initialize the embedding matrices of users and items, and the dimensionality of the embedding matrices is set to 64; (2) the delay factor for dynamic graph construction is set to one three-hundredth of the length of the dataset; (3) GCN and pooling layers with the hyperparameters $\lambda_0 = \lambda_1 = 1$ are used to represent the interaction features of users and items; (4) two CNN layers with 2 and 32 filter factors are used, and the kernel size in each layer is 3; (5) following NGCF [5] and DGCF [40], Adam optimization is used to train the model. The learning rate of the Adam algorithm is 0.0003, which is set by experiments.

Performance Comparison (RQ1).
To answer the first research question, the proposed model is compared with six other methods in terms of Recall@N and NDCG@N. Two of the methods, ICF and PMF, are traditional and frequently used CF algorithms. DMF and Wide&Deep are deep learning-based CF models. The remaining two, DGCF and NGCF, are GCN-based models that exploit the graph structure. Furthermore, the experiment is repeated 10 times for all methods; therefore, the t-distribution has 9 degrees of freedom. Specifically, this experiment accepts the hypothesis that DSAGR achieves better performance than the baseline models on the two datasets at a significance level of 0.005. The statistical tests and results for this analysis are shown in Table 3.
This table reveals that the method DSAGR successfully enhances the representation of users and items by considering the dynamic features and preferences.
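A test with 9 degrees of freedom over 10 repeated runs is consistent with a paired t-test on per-run differences. The sketch below computes that statistic; the pairing of runs, the one-sided alternative, and the function name `paired_t_statistic` are assumptions, since the exact test procedure is not spelled out in the text.

```python
import math
import statistics

# One-sided critical value of Student's t for alpha = 0.005 and 9 degrees of
# freedom, taken from a standard t-table.
T_CRIT_ONE_SIDED_0_005_DF9 = 3.250


def paired_t_statistic(ours, baseline):
    """t statistic and degrees of freedom for paired per-run scores."""
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n)), n - 1


def is_significantly_better(ours, baseline):
    """True if the mean improvement exceeds the one-sided critical value."""
    t_value, dof = paired_t_statistic(ours, baseline)
    if dof != 9:
        raise ValueError("critical value above is only valid for 10 runs")
    return t_value > T_CRIT_ONE_SIDED_0_005_DF9
```

The function expects two lists of 10 per-run metric values (for example, Recall@20 of DSAGR and of one baseline); it fails if all differences are identical, because the sample standard deviation is then zero.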

Effect of the Proposed Technologies (RQ2).
The proposed DSAGR is compared with different variants on the ML_100k and ML_1M datasets to investigate the contribution of the key technologies proposed in this work. Table 4 reports the variant models and their performances. The following findings are presented: DSAGR-L performs better than DSAGR-S, which removes the long-term information. This finding is probably because the preference of users cannot be captured by the short-term information alone. DSAGR-DL, DSAGR-DG, and DSAGR also outperform DSAGR-D and DSAGR-G. This result indicates that the captured dynamic features and short-term information can effectively improve the model's performance. Moreover, DSAGR performs better than the GRU-based [20] variant DSAGR-DG and the LSTM-based [19] variant DSAGR-DL. This result is probably due to the small length of the row vectors of the dynamic matrix, which allows the CNN to model their dynamic features effectively.

The Sensitivity of Hyperparameters (RQ3).
This work investigates how four hyperparameters, namely, the number of time slices, the filter factors in the first and second CNN layers, and the embedding size, affect DSAGR, in order to examine the effect of the constructed dynamic graph on the dynamic preference of users. The experiments are conducted on both datasets and show similar trends, so only the results on ML_100k are presented herein.
Inspired by the work [41], the experiment also adopts the orthogonal experimental design (OED) method to obtain a reasonable combination of these hyperparameters. Specifically, the levels for the four parameters are set as follows: four levels for the time slices {11, 21, 31, 41}; four levels for the filter factors in the first CNN layer {2, 4, 6, 8}; four levels for the filter factors in the second CNN layer {8, 16, 32, 64}; and four levels for the embedding factors {16, 32, 64, 72}. A full-factorial analysis needs $4^4 = 256$ experiments. Taguchi's method employs orthogonal arrays to select a subset of the possible hyperparameter combinations, thus requiring a minimum number of experimental runs while still giving a good estimation of the parameters. In our experiments, the orthogonal array $L_{16}(4^4)$ has only 16 experiments, as shown in Table 5. This table shows that DSAGR can achieve better performance by setting the time slices as 31, the filter factors in the first and second CNN layers as 2 and 32, and the embedding factors as 64.
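To make the reduction from 256 to 16 runs concrete, the sketch below builds one valid $L_{16}(4^4)$ orthogonal array using arithmetic over the finite field GF(4) and maps its level indices to the four hyperparameter level sets listed above. This is one standard construction chosen for illustration; the row order and level assignment of the array in Table 5 may differ, and the names `l16_array`, `is_orthogonal`, and `LEVELS` are hypothetical.

```python
from itertools import combinations

# Multiplication table of GF(4) with elements encoded as 0, 1, 2, 3;
# under this encoding, addition in GF(4) is bitwise XOR.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def l16_array():
    """16 runs x 4 factors; each entry is a level index in {0, 1, 2, 3}."""
    rows = []
    for x in range(4):
        for y in range(4):
            # Columns x, y, x + y, and x + 2*y (arithmetic in GF(4)).
            rows.append([x, y, x ^ y, x ^ GF4_MUL[2][y]])
    return rows


def is_orthogonal(rows, levels=4):
    """Each pair of columns must contain every level pair equally often."""
    expected = len(rows) // (levels * levels)
    for a, b in combinations(range(len(rows[0])), 2):
        counts = {}
        for r in rows:
            counts[(r[a], r[b])] = counts.get((r[a], r[b]), 0) + 1
        if len(counts) != levels * levels or set(counts.values()) != {expected}:
            return False
    return True


LEVELS = {
    "time_slices": [11, 21, 31, 41],
    "filters_cnn1": [2, 4, 6, 8],
    "filters_cnn2": [8, 16, 32, 64],
    "embedding_size": [16, 32, 64, 72],
}

# One hyperparameter configuration per row of the orthogonal array.
runs = [
    {name: values[level] for (name, values), level in zip(LEVELS.items(), row)}
    for row in l16_array()
]
```

Because every pair of factors sees each of the 16 level combinations exactly once, each level of each factor appears in exactly four runs, which is what makes per-level averages comparable across factors.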
The average values of Recall@N are used to investigate the effect of each factor. For example, the mean value of the first 4 rows in Table 5 is the average for one level of a factor.
(1) Effect of Time Slices. As shown in Figure 3, DSAGR achieves the best performance when the number of time slices is set to 31. Figure 3 also shows that once the number of time slices reaches 31, adding more time slices does not improve the recommendation performance. Moreover, more time slices increase the dimension of the row vectors of the degree matrix and, consequently, the training time.
(2) Effect of Filter Factors in CNN. Figures 4 and 5 show the recommendation performance with different numbers of filters in the first and second CNN layers. Figure 5 reveals that the performance gradually improves as the number of filter factors in the second CNN layer increases. However, blindly increasing the filter factors does not necessarily improve the performance of DSAGR. This may be because more information is encoded when the filter factors become larger, but a larger number of filters may also cause slight overfitting.
(3) Effect of Embedding Factors. Figure 6 illustrates the performance of DSAGR under different embedding factors. The figure reveals that the performance of the model gradually improves as the dimensionality increases and tends to be stable when the embedding factors are set as 64.

Conclusion
In this work, a hybrid recommender system is proposed to capture the dynamic preferences of users and dynamic sequence features. The proposed model, namely, DSAGR, combines GCN and CNN to obtain the latent representations of users and items and then makes a prediction. The dynamic preference is modeled by the long- and short-term interaction graphs of users. The dynamic sequence includes the degree matrices of users and items captured from the dynamic graph. To our knowledge, this type of modeling using the short-term interaction graph and degree matrices has not been previously applied to predict users' preferences. The experimental results show that the DSAGR model significantly improves the performance compared with the baselines. For future work, two feasible avenues are available: (1) This work concentrates on learning the latent representations of users and items via dynamic information. Thus, one direction of further study is to design an effective way to aggregate the long- and short-term representations into a single vector that deep learning models can exploit more fully. (2) The method of capturing dynamic features provides a new idea for many other kinds of unstructured data, such as social networks, and is worth trying there to improve the recommendation performance.
Data Availability
The data used include the ML_100k and ML_1M movie datasets. The movie dataset address is as follows: https://grouplens.org/datasets/movielens/.