Neighborhood Attentional Memory Networks for Recommendation Systems

Deep learning systems have been phenomenally successful in the fields of computer vision, speech recognition, and natural language processing. Recently, researchers have adopted deep learning techniques to tackle collaborative filtering with implicit feedback. However, the existing methods generally profile both users and items directly, while neglecting the similarities between users’ and items’ neighborhoods. To this end, we propose the neighborhood attentional memory networks (NAMN), a deep learning recommendation model applying two dedicated memory networks to capture users’ neighborhood relations and items’ neighborhood relations, respectively. Specifically, we first design the user neighborhood component and the item neighborhood component based on memory networks and attention mechanisms. Then, by the associative addressing scheme with the user and item memories in the neighborhood components, we capture the complex user-item neighborhood relations. Stacking multiple memory modules together yields deeper architectures exploring higher-order complex user-item neighborhood relations. Finally, the output module jointly exploits the user and item neighborhood information with the user and item memories to obtain the ranking score. Extensive experiments on three real-world datasets demonstrate significant improvements of the proposed NAMN method over the state-of-the-art methods.


Introduction
Information overload has become a challenge in the Internet era, as the rapid increase in the use of information resources overwhelms many users' attention. To alleviate the problem of information overload, recommendation systems have been widely adopted in many online services, such as e-commerce, social media sites, and online news. Collaborative filtering (CF) is one of the most popular and effective recommendation techniques. It establishes the relevance between users and items from historical interactions (e.g., clicking and scoring) by assuming that similar users will consume similar items.
Generally, there are three types of CF models: the latent factor models, the neighborhood-based approaches, and the hybrid models. The latent factor models, such as matrix factorization [1], represent a user or an item with a vector of latent features by projecting each user and item into a common low-dimensional latent vector space. Typically, a user's interaction with an item is modeled as the inner product of the user latent vector and the item latent vector. Neighborhood-based methods form recommendations by identifying neighborhoods of similar users or items based on previous interactions. Among the early neighborhood-based methods, Amazon achieved significant performance improvements by using collaborative filtering between items [2]. Latent factor models capture the global structure information of users and items but typically neglect the presence of strong associations between a few closely related users or items. In contrast, neighborhood-based methods capture the local structure information of users and items but often neglect the vast majority of ratings information outside the neighborhood. These complementary weaknesses of the two classes of CF models led to the development of hybrid models, such as SVD++ [3] and FM [4], which integrate both neighborhood-based methods and latent factor models. Although these traditional hybrid models improve the accuracy of the recommendations, they primarily model the interaction between a user and an item in a linear way, such as with an inner product, which barely captures the higher-order complex user-item relations.
In recent years, increasing numbers of researchers have adopted the methods of deep learning [5][6][7][8][9][10][11] to study recommendation algorithms. The success of the recommendation algorithms based on deep learning has demonstrated the remarkable advantages of complex nonlinear transformations over traditional linear models. However, the existing deep learning methods such as DeepFM [12] and NeuMF [13] generally only consider the direct interaction between the target user and an item, for which the amount of feedback on a given user-item pair is sparse. Hence, we leverage all users who have rated a target item and all items that the target user has rated to gain additional insight on the existing user-item relations.
In this paper, we propose the neighborhood attentional memory networks (NAMN), incorporating two memory networks to realize the user neighborhood component and the item neighborhood component, which are called the user neighborhood memory network and the item neighborhood memory network, respectively. The memory components permit the encoding of rich user preference information and item attribute information. In the user neighborhood memory network, the attention mechanism assigns higher weights to specific subsets of users in the user neighborhood which share similar preferences with the target user. Then, the user neighborhood memory network utilizes these weights together with the corresponding user vectors in the user external memory to form the vector representation of the user neighborhood. Analogously, we obtain the vector representation of an item neighborhood by using an attention mechanism in the item neighborhood memory network. Further, to enhance the performance of recommendations, NAMN stacks multiple user and item neighborhood components to reason and infer more precise neighborhood information. Finally, NAMN uses two nonlinear interactions, one between the user neighborhood information and the user memories and the other between the item neighborhood information and the item memories, to obtain the ranking score.
In summary, our main research contributions in this paper are as follows: (i) We reveal that the user and item neighborhood information is crucial for improving recommendation performance. (ii) We propose NAMN, which is motivated by recent progress in memory networks, to deal with collaborative filtering based on implicit feedback. NAMN utilizes user and item neighborhood memory networks to capture users' and items' neighborhood relations and combines the user and item neighborhood information with the user and item memories in two nonlinear interactions to obtain a recommendation.
(iii) We conduct comprehensive experiments on three real-world datasets and validate the superiority of NAMN over seven state-of-the-art baselines and the effectiveness of both user neighborhood memory networks and item neighborhood memory networks.

Deep Learning in Recommendation.
In recent years, deep learning has been revolutionizing recommendation systems and has achieved excellent performance in many recommendation scenarios. Generally, recommendation systems based on deep learning can be classified into two categories: those that use deep neural networks to process the raw features of users or items, and those that use deep neural networks to model the interaction between users and items [7]. In one of the early works in this area, He et al. [13] proposed a neural collaborative filtering framework to address collaborative filtering based on implicit feedback by jointly learning a matrix factorization and a feedforward neural network. Later, He et al. proposed a neural factorization machine in [14], which enhanced the implicit factorization machine by modeling higher-order and nonlinear interaction features. Xin et al. [15] proposed a convolutional factorization machine model, which seamlessly combined the automatic feature interaction modeling of factorization machines and the strong learning capabilities of 3D convolutional neural networks (CNN) and was able to capture high-order and nonlinear interaction signals. A multirelational memory network [16] is a unified neural learning framework that not only models fine-grained user-item relations but can also distinguish between feedback types according to the strength and diversity of the users' preferences.
Recently, there has been a surge of interest in applying attention mechanisms to recommendation tasks. The social attentional memory network [17] is a novel model for user-aware recommendations, which unifies the strengths of memory networks and attention mechanisms for modeling users' preferences by designing an attention-based memory module and a friend-level attention component. Yu et al. [18] improved the traditional structure of recurrent neural networks (RNNs) and proposed a time-aware controller and a content-aware controller, which can adaptively model users' long-term and short-term preferences. Gong and Zhang [19] approached hashtag recommendations with a CNN, augmented with an attention channel, to capture the most informative words. Most of the existing models based on neural attention rely on auxiliary information or context information, while our aim is to explore collaborative filtering with implicit feedback.

Memory Networks.
Memory networks have made massive strides in many research fields, such as natural language processing, question answering, and knowledge tracking. Memory networks generally consist of two components: an external memory that stores long-term historical information and a controller that performs read or write operations on the memory. The memory component uses memory matrices to store historical information, track long-term dependencies on historical data, and perform reasoning operations. The controller component manipulates these memories through a content-based or location-based addressing mechanism.
As discussed by Zaremba and Sutskever [20], traditional deep learning models (e.g., RNN, LSTM, and GRU) encode memory in hidden states or weights that are typically too small and not compartmentalized enough to accurately remember facts from the past, because that knowledge is compressed into dense vectors. Weston et al. [21] proposed an initial framework of memory networks and demonstrated promising results by adding a series of memory units into the model, which was applied to context-based question answering tasks. Sukhbaatar et al. [22] proposed an end-to-end memory network model that required less supervised data for training, so that the model had good flexibility in handling various tasks for different language models. Huang et al. [23] applied the end-to-end memory network model to mention (@) recommendations in Twitter, combining user interests with external memory and making personalized recommendations from tweet content, user history, and candidate user interests. Chen et al. [24] also used a user memory network for sequential recommendations. Considering that users' historical purchase behaviors may not be equally important when predicting users' future purchase interests, they used the external memory matrix in the network to explicitly store and update users' historical records, which enhanced the expressiveness of the model. Lastly, it is worth mentioning that although memory networks for collaborative filtering have been considered in a very recent method named collaborative memory network (CMN) [25], that method only exploits a memory network to capture users' neighborhood relations. Specifically, the output module of CMN integrates a nonlinear interaction between the user neighborhood information and the user and item memories to produce the ranking score.
Distinct from CMN, we apply two memory networks to capture users' neighborhood relations and items' neighborhood relations, which jointly exploit the nonlinear interaction between the user neighborhood information and the user memories and the nonlinear interaction between the item neighborhood information and the item memories to obtain the ranking score. Our work is innovative in that it utilizes memory networks to capture user neighborhood relations and item neighborhood relations, simultaneously, in a collaborative filtering scenario.

Neighborhood Attentional Memory Networks
In this section, we introduce our model NAMN, whose framework is illustrated in Figure 1, and we present the related notations of NAMN in Table 1. NAMN consists of user neighborhood memory networks and item neighborhood memory networks that contain four memory states: user memory matrix, item memory matrix, user external memory matrix, and item external memory matrix. The model joins user and item neighborhood information with user and item memories through nonlinear interactions. The associative addressing scheme acts as a nearest neighbor similarity function, which allows it to learn semantically similar users who have accessed the current item and semantically similar items that have been accessed by the current user. The attention mechanisms permit learning an adaptive nonlinear weighting function in the user neighborhood and item neighborhood, respectively. Consequently, the users who are most similar in the user neighborhood and the items that are most similar in the item neighborhood contribute higher weights in the output module.

User Embedding and Item Embedding.
The memory component consists of a user memory matrix $M \in \mathbb{R}^{P \times d}$ and an item memory matrix $E \in \mathbb{R}^{Q \times d}$, where $P$ and $Q$ denote the number of users and items, respectively, and $d$ represents the dimensionality of each memory cell [25,26]. Each user $u$ is associated with a memory slot $m_u \in M$ that stores the user's specific preferences [25,26]. Similarly, each item $i$ is embedded in another memory slot $e_i \in E$ that encodes the item's specific attributes. We obtain a user preference vector $q^1_{ui}$, where each dimension $q^1_{uiv}$ represents the similarity of the target user $u$ with the user $v$ in the user neighborhood given item $i$, and an item attribute vector $q^1_{iu}$, where each dimension $q^1_{iuj}$ represents the similarity of the target item $i$ with item $j$ in the item neighborhood given user $u$, as

$$q^1_{uiv} = m_u^{\top} m_v + e_i^{\top} m_v, \quad \forall v \in N(i), \tag{1}$$

$$q^1_{iuj} = e_i^{\top} e_j + m_u^{\top} e_j, \quad \forall j \in S(u), \tag{2}$$

where $N(i)$ is the user neighborhood set of all users who have accessed item $i$ and $S(u)$ is the item neighborhood set of all items that have been accessed by user $u$. In formula (1), the first term represents the compatibility between the target user $u$ and the user $v$, who is in the user neighborhood given item $i$. The second term computes the level of confidence that user $v$ supports the recommendation of item $i$. In formula (2), the first term represents the compatibility between the target item $i$ and the item $j$, which is in the item neighborhood given user $u$. The second term computes the level of confidence that item $j$ is in line with the current user's preferences. Therefore, the associative addressing scheme identifies the internal memories with the highest similarity to the target user $u$ in the user neighborhood given the specific item, while for a specific user, the associative addressing scheme identifies the internal memories with the highest similarity to the target item $i$ in the item neighborhood.
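The associative addressing step can be sketched in NumPy as follows. This is a minimal illustration, assuming each similarity score is the sum of the two dot-product terms described above (a compatibility term and a confidence term); the function name `addressing_scores`, the toy sizes, and the random memories are hypothetical and not taken from the paper.

```python
import numpy as np

def addressing_scores(M, E, u, i, user_neighbors, item_neighbors):
    """Return (q_ui, q_iu): similarity scores over N(i) and S(u).

    Assumed form: q_uiv = m_u.m_v + e_i.m_v and q_iuj = e_i.e_j + m_u.e_j.
    """
    # Both sums share the factor (m_u + e_i), so it is formed once.
    query = M[u] + E[i]
    q_ui = M[user_neighbors] @ query   # one score per user v in N(i)
    q_iu = E[item_neighbors] @ query   # one score per item j in S(u)
    return q_ui, q_iu

rng = np.random.default_rng(0)
P, Q, d = 5, 6, 4                      # toy sizes (hypothetical)
M = rng.normal(size=(P, d))            # user memory matrix
E = rng.normal(size=(Q, d))            # item memory matrix
N_i = np.array([1, 3, 4])              # users who accessed item i
S_u = np.array([0, 2])                 # items accessed by user u
q_ui, q_iu = addressing_scores(M, E, u=0, i=5,
                               user_neighbors=N_i, item_neighbors=S_u)
```

Storing the memories row-wise lets both score vectors be computed with one matrix-vector product each, which matches the per-neighbor cost of d operations.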

User and Item Neighborhood Memory Networks.
The attention mechanism learns two adaptive weighting functions to focus on a subset of influential users within the user neighborhood and on a subset of influential items within the item neighborhood to obtain the ranking score. Traditional neighborhood-based approaches predefine a heuristic weighting function, such as cosine similarity or Pearson correlation, and require the number of users or items to consider to be fixed in advance. Although this problem can be partially alleviated by factorizing the neighborhood, the result is still linear in nature. Instead, by learning a weighting function over the entire neighborhood, we no longer need to predefine the number of neighbors or the weighting function. For target item $i$, we compute the attention weights for all the users in the user neighborhood to infer the importance of each user's unique contribution to the user neighborhood. For target user $u$, we compute the attention weights for all the items in the item neighborhood to obtain the importance of each item's unique contribution to the item neighborhood:

$$p^1_{uiv} = \frac{\exp(q^1_{uiv})}{\sum_{k \in N(i)} \exp(q^1_{uik})}, \ \forall v \in N(i); \qquad p^1_{iuj} = \frac{\exp(q^1_{iuj})}{\sum_{k \in S(u)} \exp(q^1_{iuk})}, \ \forall j \in S(u), \tag{3}$$

which produce two distributions over the user neighborhood and the item neighborhood. The attention mechanism allows the model to place higher weights on specific users in the user neighborhood and on specific items in the item neighborhood, while placing less importance on users and items that may be less similar. We construct the user neighborhood representation and the item neighborhood representation by interpolating the external neighborhood memory with the attention weights:

$$o^1_{ui} = \sum_{v \in N(i)} p^1_{uiv} c_v, \qquad o^1_{iu} = \sum_{j \in S(u)} p^1_{iuj} y_j, \tag{4}$$

where $c_v$ is another embedding vector for user $v$, called the external memory for user $v$, denoting the $v$-th slot of the user external memory matrix $C$, which has the same dimensions as $M$; $y_j$ is another embedding vector for item $j$, called the external memory for item $j$, denoting the $j$-th slot of the item external memory matrix $Y$, which has the same dimensions as $E$.
The external memory allows the storage of long-term information pertaining specifically to each user's and item's role in the neighborhood. In other words, the associative addressing scheme identifies similar users within the user neighborhood or similar items within the item neighborhood, acting as a key to weight the relevant values stored in the external memory matrix $C$ or $Y$ via the attention mechanism. The attention mechanism specifically weights the neighbors according to the target user and item. The output $o^1_{ui}$ is the user neighborhood representation given the specific item $i$, and $o^1_{iu}$ is the item neighborhood representation given the specific user $u$. They are composed of the relations between the specific user, item, and neighborhood.
NAMN captures the similarity between user u and the users who are in the user neighborhood given the specific item i and dynamically assigns the degrees of contribution to the representation of the user neighborhood based on the target item. NAMN also captures the similarity between item i and the items that are in the item neighborhood given the specific user u and dynamically assigns the degrees of contribution to the representation of the item neighborhood based on the target user. NAMN does not need to predefine the number of users in the user neighborhood or the number of items in the item neighborhood, so it has a good generalization capacity. In addition, the attention mechanism simultaneously considers the information of each user in the user neighborhood and each item in the item neighborhood and encodes all the neighborhood information into a single memory slot.
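The attention and interpolation steps of this subsection can be illustrated with a short NumPy sketch. It assumes the attention weights are a softmax over the similarity scores and that the neighborhood representation is the attention-weighted sum of external memory slots; the helper names and the toy values are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def neighborhood_representation(scores, external_memory, neighbors):
    """Weight the neighbors' external memory slots by attention.

    scores: similarity scores, one per neighbor.
    external_memory: matrix with one slot (row) per user or item.
    neighbors: indices of the neighborhood members.
    """
    p = softmax(scores)                  # distribution over the neighborhood
    o = p @ external_memory[neighbors]   # weighted sum of external slots
    return p, o

scores = np.array([2.0, 0.0, -1.0])                # toy scores for 3 neighbors
C = np.arange(20, dtype=float).reshape(5, 4)       # toy external memory, d = 4
N_i = np.array([1, 3, 4])
p_ui, o_ui = neighborhood_representation(scores, C, N_i)
```

Because the weights sum to one, the whole neighborhood is summarized in a single vector of size d regardless of how many neighbors there are, which is why no neighborhood size has to be fixed in advance.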

Multiple Hops.
In this section, we extend NAMN to handle an arbitrary number of user and item neighborhood memory networks, or hops. Each hop queries the user memory and item memory and is followed by the attention mechanism to obtain the next user neighborhood representative vector and item neighborhood representative vector. The first hop may barely capture the higher-order complex information. Starting from the second hop, the model begins to take into consideration the information of the user neighborhood and the item neighborhood, guiding the search for the representation of the user preferences and the item attributes, respectively. Each additional hop repeats this step considering the previous hop's newly acquired information, before producing the final neighborhood representation. In other words, the model has the chance to look back and reconsider the most similar users and items and to infer more precise information for the user neighborhood and the item neighborhood. More specifically, multiple memory modules are stacked together by taking the output from the $h$-th hop as input to the $(h+1)$-th hop. We apply two nonlinear projections between different hops, formulas (5) and (6). In formula (5), $W^h$ is a weight matrix mapping the user preference query $q^h_{ui}$ to a latent space, coupled with the user neighborhood information from the previous hop, followed by a nonlinear activation function $\phi(\cdot)$ to obtain the new user preference representation $z^h_{ui}$. In formula (6), $W'^h$ is another weight matrix mapping the item attribute query $q^h_{iu}$ to a latent space, coupled with the item neighborhood information from the previous hop, followed by a nonlinear activation function $\phi(\cdot)$ to obtain the new item attribute representation $z^h_{iu}$. Intuitively, multiple hops provide additional information that allows inference of more precise user neighborhood information and item neighborhood information.
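Through the above operations, the vector representations of the user preference information and the item attribute information are updated, and the model then reconsiders the relations between them and the user neighborhood and item neighborhood:

$$q^{h+1}_{uiv} = (z^h_{ui})^{\top} m_v, \quad \forall v \in N(i), \tag{7}$$

$$q^{h+1}_{iuj} = (z^h_{iu})^{\top} e_j, \quad \forall j \in S(u). \tag{8}$$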
Formula (7) applies the dot product of the newly formed user preference information with the user memory in the user neighborhood and obtains the similarity between the target user and the users in the user neighborhood. Then, via an adaptive attention mechanism, it produces an updated user neighborhood representation. Formula (8) applies the dot product of the newly formed item attribute information with the item memory in the item neighborhood and obtains the similarity between the target item and the items in the item neighborhood. Then, via an adaptive attention mechanism, it produces an updated item neighborhood representation. This process is repeated for each hop, yielding an iterative refinement. The output module receives the user neighborhood representation from the user neighborhood memory network and the item neighborhood representation from the item neighborhood memory network at the last hop $H$ to produce the final recommendation.
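The iterative refinement over hops can be sketched as a loop. The update between hops used here, z = ReLU(W z_prev + o + b) with the initial query m_u + e_i, is an assumption modeled on the CMN-style update [25] and on the verbal description above; it is not a transcription of formulas (5) and (6). The name `run_hops` and all toy values are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def run_hops(query, keys, values, projections):
    """Refine one neighborhood representation over several hops.

    keys:   internal memory slots of the neighbors (rows).
    values: external memory slots of the same neighbors (rows).
    projections: list of (W, b) pairs applied between consecutive hops,
                 so H hops use H - 1 projections.
    """
    z = query
    o = softmax(keys @ z) @ values              # first hop
    for W, b in projections:
        # Assumed update: project the previous query, add the neighborhood
        # representation from the previous hop, then apply ReLU.
        z = np.maximum(0.0, W @ z + o + b)
        o = softmax(keys @ z) @ values          # re-query with the new state
    return o

rng = np.random.default_rng(0)
d, n = 4, 3
m_u, e_i = rng.normal(size=d), rng.normal(size=d)
keys = rng.normal(size=(n, d))      # rows m_v for v in N(i)
values = rng.normal(size=(n, d))    # rows c_v for v in N(i)
W, b = rng.normal(size=(d, d)), np.zeros(d)

o_one_hop = run_hops(m_u + e_i, keys, values, projections=[])
o_two_hop = run_hops(m_u + e_i, keys, values, projections=[(W, b)])
```

The same function serves the item side by passing the item memory slots e_j as keys and the item external memory slots y_j as values.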

Output Module.
As mentioned above, traditional neighborhood-based models identify the local structure by analyzing similarities between users or items within the neighborhood, while latent factor models capture the global structure by transforming both users and items to the same latent factor space [3]. Hence, we consider the user neighborhood representation and item neighborhood representation to identify localized user-item interactions and the user and item memories to identify the global user-item relations. Existing methods lack a nonlinear interaction between the local structure and the global structure and therefore barely capture deeper relations [27]. For a given user $u$ and item $i$, the ranking score is given by formula (9), where $\odot$ is the elementwise product; $U \in \mathbb{R}^{2d \times 2d}$ and $v, b \in \mathbb{R}^{2d}$ are the parameters to be learned; and $H$ is the index of the last hop. In the collaborative memory network (CMN), the elementwise product is first applied to the user and item memories, followed by a linear projection; a skip-connection then brings in the user neighborhood representation, which passes through another linear projection [25,26]. In NAMN, we first introduce two skip-connections that combine user memory $m_u$ with user neighborhood representation $o^H_{ui}$ and item memory $e_i$ with item neighborhood representation $o^H_{iu}$. Skip-connections reduce the longest path from the output to the input, which encourages the flow of information and eases the learning process [26]. In this way, the model can better correlate the specific user and item memories with the ranking score. Subsequently, we apply the elementwise product between the two combinations, and the result is projected to a latent space with $U$, followed by a nonlinear activation function $\phi(\cdot)$. Through extensive experiments, we found the rectified linear unit (ReLU) $\phi(x) = \max(0, x)$ to work best.
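The output module described above can be sketched as follows. The sketch assumes that each skip-connection combines a memory with its neighborhood representation by concatenation, which is one reading consistent with the stated parameter shapes (U of size 2d x 2d and v, b of size 2d); the paper's exact combination may differ. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def ranking_score(m_u, o_ui, e_i, o_iu, U, v, b):
    """Score one user-item pair from memories and neighborhood vectors."""
    # Assumed skip-connections: concatenate global memory with the
    # local neighborhood representation on each side (length 2d).
    user_side = np.concatenate([m_u, o_ui])
    item_side = np.concatenate([e_i, o_iu])
    # Elementwise product, projection with U, bias, then ReLU.
    hidden = np.maximum(0.0, U @ (user_side * item_side) + b)
    return float(v @ hidden)

d = 2
m_u, o_ui = np.array([1.0, 2.0]), np.array([1.0, 1.0])
e_i, o_iu = np.array([3.0, -1.0]), np.array([2.0, 0.5])
U, v, b = np.eye(2 * d), np.ones(2 * d), np.zeros(2 * d)
score = ranking_score(m_u, o_ui, e_i, o_iu, U, v, b)
```

With the identity projection used in this toy call, the product vector is [3, -2, 2, 0.5], ReLU zeroes the negative entry, and the score is the sum of the remaining entries.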

Parameter Optimization.
In NAMN, we intend to study collaborative filtering based on implicit feedback, which is more pervasive in the real world and can be collected automatically (e.g., likes and clicks). The rating matrix built from the users' implicit feedback contains a 1 if the interaction is observed and a 0 otherwise. We make the pairwise assumption that the target user $u$ prefers the observed item $i^+$ over the unobserved or negative item $i^-$. In reality, a value of 1 does not mean user $u$ actually likes item $i^+$. Similarly, user $u$ may not dislike item $i^-$; it may be that user $u$ is not aware of item $i^-$. We uniformly sample negative items at a fixed ratio to positive items to form triplet preferences $(u, i^+, i^-)$, which we further investigate in Section 4.5. The loss function of our model is the Bayesian personalized ranking (BPR) optimization criterion, which approximates AUC (area under the ROC curve):

$$\mathcal{L} = -\sum_{(u, i^+, i^-)} \log \delta\left(\hat{r}_{ui^+} - \hat{r}_{ui^-}\right),$$

where $\hat{r}_{ui}$ denotes the ranking score of item $i$ for user $u$ and $\delta(x) = 1/(1 + \exp(-x))$ is the logistic sigmoid function. Since the entire architecture is differentiable, NAMN can be efficiently trained with the backpropagation algorithm. Similar to [23,27], we utilize layerwise weight tying, sharing all embedding matrices across hops, to reduce the number of parameters.
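The pairwise training setup can be sketched in two small functions: one that draws uniform negative samples to form triplets and one that evaluates the standard BPR objective, the negative log-sigmoid of the score difference summed over triplets. The function names, the dictionary layout of the interactions, and the toy data are hypothetical, and regularization is omitted.

```python
import numpy as np

def sample_triplets(user_items, num_items, num_neg, rng):
    """Form (u, i_pos, i_neg) triplets with uniformly sampled negatives.

    user_items: dict mapping each user to the set of observed items.
    num_neg: negatives drawn (with replacement) per observed item.
    """
    triplets = []
    for u, items in user_items.items():
        # Candidate negatives are all items the user has not interacted with.
        candidates = np.setdiff1d(np.arange(num_items), list(items))
        for i_pos in items:
            for i_neg in rng.choice(candidates, size=num_neg):
                triplets.append((u, i_pos, int(i_neg)))
    return triplets

def bpr_loss(pos_scores, neg_scores):
    """Sum of -log(sigmoid(pos - neg)) over all triplets."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); logaddexp keeps it stable.
    return float(np.sum(np.logaddexp(0.0, -diff)))

rng = np.random.default_rng(0)
user_items = {0: {1, 2}, 1: {0}}          # toy implicit feedback
triplets = sample_triplets(user_items, num_items=5, num_neg=4, rng=rng)
loss_tie = bpr_loss([0.0], [0.0])          # equal scores give log(2)
```

The loss shrinks as the positive item's score rises above the negative item's score, which is what pushes observed items up the ranking.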

Computational Complexity.
The computational complexity for a forward pass through NAMN for a given user-item pair is $O(d|N(i)| + d|S(u)| + d^2 + d)$, where $|N(i)|$ and $|S(u)|$ represent the size of the user neighborhood set for item $i$ and the item neighborhood set for user $u$, respectively, and $d$ denotes the size of each memory cell. The first two terms are the costs of computing the user preference vector and the item attribute vector, and the latter terms are the costs of the output module. Each additional hop introduces $O(d|N(i)| + d|S(u)| + d^2)$ complexity. NAMN calculates two forward passes during training, one for the observed positive item and another for the unobserved negative item. Parameters can be updated via backpropagation with the same complexity. In public datasets, $|N(i)|$ and $|S(u)|$ are usually slightly larger than or comparable to $d$. Therefore, the primary complexity for evaluating NAMN is $O(d|N(i)| + d|S(u)|)$. The cost is reasonable since other deep learning methods such as CMN [25] have computational complexity $O(d|N(i)| + d^2 + d)$. NAMN requires only the extra computation of the similarity with the target item's neighbors, and $|S(u)|$ is often less than or comparable to $|N(i)|$. Thus, the training in NAMN is also quite efficient.
Recommendation can be performed by computing the ranking score (equation (9)) for a given user-item pair with a single pass through the network. The top-N items with the highest scores are recommended to the user. The computational complexity during testing is the same as that of a single forward pass during training.
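Once ranking scores are available for a user's candidate items, selecting the top-N recommendations is a sort. The sketch below assumes the scores have already been computed by the model; the function name `top_n` and the option to exclude already-seen items are illustrative additions, not part of the paper.

```python
import numpy as np

def top_n(scores, n, exclude=()):
    """Return the indices of the n highest-scoring items.

    exclude: item indices to skip, e.g. items the user already accessed.
    """
    scores = np.array(scores, dtype=float)   # copy so the input is untouched
    if exclude:
        scores[list(exclude)] = -np.inf      # excluded items sort last
    order = np.argsort(-scores, kind="stable")
    return order[:n].tolist()

candidate_scores = [0.1, 0.9, 0.5, 0.7]
recommended = top_n(candidate_scores, n=2)
recommended_unseen = top_n(candidate_scores, n=2, exclude={1})
```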

Experimental
In this section, we evaluate NAMN on three real-world datasets: Epinions, citeulike-a, and Pinterest. We first introduce the datasets, evaluation metrics, baselines, and settings and then present the baseline comparisons and parameter sensitivity. Last, we discuss the effects of the neighborhood memory network.

Datasets.
In our experiments, we use three publicly accessible datasets, i.e., Epinions, citeulike-a, and Pinterest, to evaluate the effectiveness of our model. The details of the three datasets are as follows: (i) Epinions. The first dataset is from Epinions, which allows users to share product feedback in the form of explicit ratings and reviews. We convert the explicit ratings to implicit feedback: 1 if the user has rated the product (item) and 0 otherwise. (ii) Citeulike-a. The second dataset is collected from a literature management system called CiteULike, which provides users with a digital directory to store and share academic papers. The user preference is encoded as 1 if the paper (item) is saved in the user's online catalog and 0 otherwise. (iii) Pinterest. The third dataset is collected from Pinterest, the largest photo sharing site in the world, which allows users to save an image to their board. Each interaction denotes whether the user has saved the image (item) to his/her own board.
The statistical details of the three datasets are presented in Table 2.

Evaluation Metrics.
We use the leave-one-out evaluation method to evaluate the performance of our proposed model, which has been widely used in the literature [10,13,25]. In terms of experimental setup, we follow a common strategy, which randomly samples 100 negative items and one positive item to form the test set for each user. For each of the remaining positive examples, we randomly sample 4 negative items for training. To alleviate the cold-start problem, if the user has only one interaction, we put it in the training set. We rank the positive item among the 100 negative items and judge the quality of the ranked list by Hit Ratio (HR) and normalized discounted cumulative gain (NDCG). HR measures whether the positive item is present in the top N, and NDCG accounts for the position of the hit by penalizing the score when the positive item is ranked lower in the list.
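Under this leave-one-out protocol each user has exactly one held-out positive item, so both metrics reduce to simple functions of that item's rank among the sampled negatives. The sketch below uses the common single-positive definitions (the ideal DCG is 1); the function names and the tie-handling choice are assumptions for illustration.

```python
import math

def rank_of_positive(pos_score, neg_scores):
    """0-based rank of the positive item among the sampled negatives.

    Ties are resolved in favor of the positive item (an assumption).
    """
    return sum(1 for s in neg_scores if s > pos_score)

def hit_ratio_at_n(rank, n):
    """1.0 if the positive item appears in the top n, else 0.0."""
    return 1.0 if rank < n else 0.0

def ndcg_at_n(rank, n):
    """NDCG with a single relevant item: 1 / log2(rank + 2) inside the top n."""
    return 1.0 / math.log2(rank + 2) if rank < n else 0.0

# Toy example: two negatives outscore the positive, so its rank is 2.
rank = rank_of_positive(0.8, [0.9, 0.5, 0.85])
hr10, ndcg10 = hit_ratio_at_n(rank, 10), ndcg_at_n(rank, 10)
```

The reported HR@N and NDCG@N are then the averages of these per-user values over all test users.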

Baselines and Settings.
We compare the proposed NAMN with seven state-of-the-art baselines, which are selected from three kinds of CF approaches and the deep learning-based models.
(i) BPR [28] is a traditional pairwise matrix factorization for implicit feedback. (ii) GMF [13] is a traditional latent factor model. We use the ReLU activation function to generalize MF to a nonlinear setting and optimize the BPR loss function. (iii) FISM [29] is a neighborhood-based CF model for top-N recommender systems that learns the item-item similarity matrix as the product of two low-dimensional matrices by optimizing the BPR loss function. (iv) KNN [30] is a neighborhood-based CF approach computing the cosine item-item similarity to provide recommendations. (v) SVD++ [3] is a hybrid CF model that smoothly merges the latent factor model and the neighborhood-based method. (vi) NeuMF [13] is a deep learning-based model combining matrix factorization and a multilayer perceptron model. (vii) CMN [25] is a competitive hybrid deep learning-based model which fuses the global structure of the latent factor model with the local structure of the user neighborhood or memory.
We leave out the comparisons with the baselines that utilize additional information (e.g., content and context) since our objective is to study collaborative filtering based on implicit feedback. We randomly select one interaction for each user from the training set to form the validation set and tune all hyperparameters on it. We set the hop number to 2, the number of negative samples to 4, and the embedding size of the user and item memories to 120. The decay rate is set to 0.9, and both the regularization term and the learning rate are set to 0.001. The batch size is set to 128 for Epinions and citeulike-a and to 256 for Pinterest. Since the initialization of deep neural networks is crucial, NAMN initializes the user and item memories from the generalized matrix factorization model [13]. In Section 4.5, we will further explore the effects of some hyperparameters.
Table 3 shows the experimental results of NAMN, along with the baselines, in terms of HR and NDCG with cutoffs at 5 and 10 on the Epinions, citeulike-a, and Pinterest datasets. We denote NAMN with two hops as "NAMN-2," and "CMN-2" stands for CMN with two hops. Clearly, NAMN obtains the best performance across all benchmark datasets and metrics. SVD++ performs better than the latent factor model BPR and the neighborhood-based method FISM on the citeulike-a dataset, which demonstrates the superiority of the hybrid model. The nonlinear generalization of the latent factor model, GMF, outperforms the linear BPR on the citeulike-a dataset, which reveals the effectiveness of the nonlinear model. The linear decomposition of the item-item similarity matrix in FISM performs unsatisfactorily in general. KNN shows the poorest performance across all metrics and cutoffs on Epinions, which is mainly due to its limited ability to handle sparse data.
The Pinterest dataset contains fewer items than the citeulike-a dataset, leading to poor performance from the item-based FISM and KNN methods due to the presence of only a few neighbors. SVD++ shows competitive performance across all datasets but lacks the full expressiveness of nonlinearity found in the deep learning-based models NeuMF and CMN. CMN can be viewed as a modified NeuMF in which the original multilayer perceptron is replaced with a user memory network component. CMN achieves better performance than NeuMF, further demonstrating the effectiveness of the memory network component, which iteratively updates the internal user neighborhood state to identify complex interactions. NAMN performs best among all the methods on all datasets, demonstrating the superiority of combining a user neighborhood memory network with an item neighborhood memory network to cope with the collaborative filtering task based on implicit feedback.

Parameter Sensitivity.
In this subsection, we conduct an empirical study to investigate the effect of varying the embedding size, the number of negative samples, and the number of hops on HR@10 and NDCG@10 for the citeulike-a dataset. The Epinions and Pinterest datasets show similar trends.

Size of Embeddings in 2-Hop.
Figure 2 shows the performance as the embedding size varies from 20 to 200 while the other parameters are kept fixed. HR and NDCG follow the same trend: as the embedding size increases, the performance is boosted initially, since larger dimensions can encode more useful information. Both HR and NDCG show steady improvement as the embedding size increases, with the exception of HR at an embedding size of 140, where the performance drops due to the nonconvex nature of neural networks.

Negative Samples Number in 2-Hop.
Figure 3 shows how the performance of the NAMN method varies as the number of negative samples ranges from 2 to 10. We exclude the results for 1 negative sample, since NAMN was then unable to distinguish between positive and negative samples, leading to random performance. As shown, just two negative samples for each positive sample are insufficient to achieve competitive performance, and sampling more negative instances is beneficial for recommendation. For both metrics, the best number of negative samples is approximately 4 to 6. We also find that when the number of negative samples is larger than 6, the performance of NAMN starts to drop. This demonstrates that setting the number of negative samples too aggressively may adversely affect the performance of the model.
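The sampling step discussed above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes negatives are drawn uniformly at random from the items a user has not interacted with, and it emits (user, positive, negative) triples, although this section does not state whether the loss is pairwise or pointwise. The names `sample_negatives` and `user_items` are hypothetical.

```python
import random

def sample_negatives(user_items, num_items, num_negatives=4, seed=0):
    """Pair every observed interaction with `num_negatives` unobserved items.

    user_items: dict mapping a user id to the set of item ids the user
                interacted with (the implicit positive feedback).
    num_items:  total number of items; item ids are 0 .. num_items - 1.
    Returns a list of (user, positive_item, negative_item) triples.
    """
    rng = random.Random(seed)
    triples = []
    for user, positives in user_items.items():
        if len(positives) >= num_items:
            # No unobserved item exists, so rejection sampling would not end.
            raise ValueError(f"user {user} has interacted with every item")
        for pos in sorted(positives):
            for _ in range(num_negatives):
                # Rejection sampling: redraw until the item is unobserved.
                neg = rng.randrange(num_items)
                while neg in positives:
                    neg = rng.randrange(num_items)
                triples.append((user, pos, neg))
    return triples

# Toy example: two users, six items, four negatives per positive.
user_items = {0: {1, 2}, 1: {0}}
triples = sample_negatives(user_items, num_items=6, num_negatives=4)
```

In this sketch the number of training triples grows linearly with `num_negatives`, so training cost rises with the sampling ratio as well.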

Hop Number.
Table 4 illustrates the effect of changing the maximal hop number H on HR@10 and NDCG@10 for the citeulike-a dataset. As shown, the proposed model achieves its best performance when H is 1 or 2. More specifically, HR@10 is best when H is 1 and is very close to that value when H is 2, while NDCG@10 is best when H is 2. Hence, two hops suffice to explore higher-order user-item neighborhood relations, whereas too large an H brings more noise than useful information.
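To illustrate what the hop number H controls, the sketch below repeatedly reads from a small neighborhood memory with dot-product attention. It is a generic multi-hop memory read, not the NAMN update rule: the residual update, the absence of learned projections and nonlinearities, and all names (`multi_hop_read`, `neighbor_memory`) are assumptions made for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def multi_hop_read(query, neighbor_memory, hops):
    """Refine `query` by attending over `neighbor_memory` for `hops` rounds.

    Each hop scores every memory slot against the current state, reads an
    attention-weighted sum of the slots, and adds it to the state (a simple
    residual update chosen for illustration).
    """
    state = list(query)
    for _ in range(hops):
        weights = softmax([dot(state, slot) for slot in neighbor_memory])
        read = [
            sum(w * slot[d] for w, slot in zip(weights, neighbor_memory))
            for d in range(len(state))
        ]
        state = [s + r for s, r in zip(state, read)]
    return state

# Toy example: a 2-dimensional query and two neighbor embeddings.
query = [1.0, 0.0]
neighbor_memory = [[1.0, 0.0], [0.0, 1.0]]
one_hop = multi_hop_read(query, neighbor_memory, hops=1)
two_hop = multi_hop_read(query, neighbor_memory, hops=2)
```

Each additional hop attends again using a state that already mixes in neighbor information. This is one way to picture both how deeper stacks reach higher-order relations and how they accumulate noise from weakly related neighbors.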

Effects of Neighborhood Memory Network.
In this subsection, we further explore the effect of the individual neighborhood memory network components on HR@10 and NDCG@10 for the citeulike-a dataset. The Epinions and Pinterest datasets show similar trends. We denote a NAMN without an item neighborhood memory network as "NAMN-user," and "NAMN-item" stands for a NAMN without a user neighborhood memory network. In Table 5, "NAMN-user" uniformly performs worse than NAMN, hinting at the effectiveness of the item neighborhood memory network. The other variant, "NAMN-item," generally performs worse than NAMN, revealing the effectiveness of the user neighborhood memory network. Furthermore, "NAMN-user" shows improvements over "NAMN-item." It appears that the user neighborhood memory network contains more useful information for exploring users' potential preferences and hierarchical interests. Generally, NAMN requires a combination from both the user neighborhood

Conclusion
In this paper, we propose NAMN, a novel deep learning recommendation model that takes advantage of memory networks to deal with collaborative filtering based on implicit feedback. NAMN overcomes the limitations of existing methods, which profile users and items only directly, by designing a user neighborhood memory network and an item neighborhood memory network to capture higher-order complex users' neighborhood relations and items' neighborhood relations, respectively. Extensive experiments on three datasets demonstrate that our proposed model outperforms strong baselines and reveal the effectiveness of both the user neighborhood memory network and the item neighborhood memory network.
In future work, we plan to extend NAMN to incorporate other auxiliary information, such as user reviews, temporal signals, and knowledge graphs to better explore users' potential interests and items' potential attributes.

Data Availability
The previously reported citeulike-a and Pinterest datasets were used to support this study and are available at http://github.com/tebesu/CollaborativeMemoryNetwork. These prior studies (and datasets) are cited at relevant places within the text as reference [25].

Conflicts of Interest
The authors declare that they have no conflicts of interest.