General Collaborative Filtering for Web Service QoS Prediction

To avoid expensive and time-consuming evaluation, collaborative filtering (CF) methods have been widely studied for web service QoS prediction in recent years. Among the various CF techniques, matrix factorization is the most popular one, and much effort has been devoted to improving it. The key idea of matrix factorization is that it assumes the rating matrix is low rank and projects users and services into a shared low-dimensional latent space, making a prediction by using the dot product of a user latent vector and a service latent vector. Unfortunately, unlike ratings in recommender systems, QoS usually takes continuous values with a very wide range, and the low rank assumption might incur high bias. Furthermore, when the QoS matrix is extremely sparse, the low rank assumption also incurs high variance. To reduce the bias, we must use more complex assumptions. To reduce the variance, we can adopt complex regularization techniques. In this paper, we propose a neural network based framework, named GCF (general collaborative filtering), with dropout regularization, to model the user-service interactions. We conduct our experiments on a large real-world dataset, the QoS values of which are obtained from 339 users on 5825 web services. The comprehensive experimental studies show that our approach offers higher prediction accuracy than the traditional collaborative filtering approaches.


Introduction
In recent years, web services have become one of the most popular techniques for building large distributed systems with interoperable machine-to-machine interaction over the Internet. Through automatic selection, any business organization's internal applications or services can seamlessly integrate the services of others. Web service selection in terms of functional properties has been extensively studied for many years [1][2][3], but finding services that satisfy the functional requirements is not enough. Even if web services provide the same functionality, they always differ in QoS (Quality-of-Service) properties, due to their dependence on the circumstances of users and web services. With the increasing number of web services that have identical or similar functionalities, the selection process must also aim at the services that best meet the customers' requirements in terms of QoS. Due to the notable importance of QoS in building reliable and user-friendly distributed applications, QoS-based web service selection has been gaining much attention from both academia and industry [4][5][6].
Web service QoS refers to nonfunctional properties, such as availability, price, failure probability, response time, and throughput. It is crucial to know the exact QoS values to select the best services. Unfortunately, evaluating the QoS values of real-world web services is difficult and sometimes even impractical. It is time-consuming and resource-consuming for each user to evaluate the QoS values of all the service candidates. Moreover, commercially valuable web services, which are usually hosted by large business organizations or companies, typically charge a fee and cannot be invoked by non-paying users.
An attractive alternative is to predict the QoS value of a service by taking advantage of the past usage experiences of a small number of other users who have evaluated this service. Therefore, collaborative filtering (CF) techniques, which have already been used in recommender systems for a decade [7,8], have been widely studied in recent years for web service QoS prediction. In general, there are two types of methods commonly used in collaborative filtering recommender systems, which are referred to as neighborhood based CF and model based CF. Neighborhood based collaborative filtering algorithms include user-based algorithms and item-based algorithms, but such techniques do not work well for sparse rating data. Among the various model based collaborative filtering approaches, matrix factorization is considered to be the state of the art in recommender systems [9][10][11].
The core idea of matrix factorization is that it assumes the rating matrix is low rank. That means many rows and columns are linearly correlated. It projects users and services into a shared latent space, and then the interaction between a user and a service is modeled as the inner product of their latent vectors.
Many researchers have provided various matrix factorization methods for web service QoS prediction, simply borrowing similar models already studied in recommender systems. However, we found that they did not recognize that QoS often takes continuous values with a very wide range, whereas the user-item rating in a recommender system is usually a discrete value belonging to a limited set, such as an integer value ranging from 1 to 5. The low rank assumption would incur high bias for QoS prediction due to the low capacity of matrix factorization models. Furthermore, when the QoS value matrix is extremely sparse, matrix factorization models also have high variance because many different solutions fit the observed data. To reduce the bias, we need more complex models with high capacity. To reduce the variance, we need more complex regularization techniques.
Recently, deep neural networks have yielded remarkable success in many applications, especially in computer vision, speech recognition, and natural language processing. The exploration of deep learning technologies for recommender systems or QoS prediction has received relatively less attention. Deep neural networks have very high capacity, but they also have very flexible and practical regularization methods. One of the most popular regularization methods is dropout, which is widely used in real-world applications.
Some recent studies, such as Google's Wide and Deep model and Microsoft's Deep Crossing model, have tried to employ deep learning for recommendation, but they primarily model auxiliary information [12][13][14]. Other studies, such as neural collaborative filtering (NCF), a novel approach that learns user-item interactions with neural networks, outperformed the state-of-the-art methods by a large margin [15]. However, NCF models focus on implicit feedback and item ranking, which is not appropriate for QoS prediction. Our work, inspired by the above approaches, presents some novel neural architectures for web service QoS prediction. The main contributions of our work are as follows:
(i) We formalize the high bias and high variance problem of traditional matrix factorization models (the MF-HBV problem for short) for web service QoS prediction.
(ii) We propose a novel neural network based framework for web service QoS prediction, termed GCF (general collaborative filtering). The traditional matrix factorization models are special cases of this framework. We also provide four different implementations of GCF, including NLMF (Non-Linear Matrix Factorization), DNN-CF (Deep forward Neural Network Collaborative Filtering), ResNet-CF (Residual Network Collaborative Filtering), and HyResNet-CF (Hybrid ResNet-CF).
(iii) We perform experiments on a large real-world dataset, and extensive experimental investigations are conducted to study the QoS value prediction accuracy of our GCF approaches.
The remainder of this paper is organized as follows: We present our GCF framework in Section 2. Section 3 presents several GCF implementations for web service QoS prediction. Section 4 describes our experiments. Section 5 introduces the related work, and Section 6 concludes this paper.

GCF Framework
2.1. QoS Prediction Task. The QoS prediction problem is closely related to matrix completion, which is the process of filling in the unknown entries of a matrix. Figure 1 shows an example of a QoS prediction scenario, including 6 users and 6 web services from three different locations. Each user stores the response times of several services measured by itself, but does not measure the response time of all 6 services. For example, the user $u_1$ in location 1 only knows the QoS values of service $s_4$ in location 3 and services $s_5$, $s_6$ in location 2. All the measured QoS values constitute a sparse matrix, some entries of which are not defined. The QoS prediction task is to estimate the missing values in the matrix by using the observed values.
However, the QoS matrix is always very large and sparse. The missing values make up a large proportion of the matrix. Therefore, the traditional missing value estimation methods need specialized adaptations.
In fact, the QoS prediction task is a special regression problem. Usually, a user or a service does not have a concrete feature. Although we can use hand-designed features, such as users' demographic characteristics or services' text descriptions, we only have interaction data most of the time. We must learn abstract features for the users or items.
Generally speaking, when modeling the QoS prediction task, we need to solve the following problems:
(i) We learn the latent vector for each user and service. For the $i$-th user and the $j$-th service, the latent vectors are defined as $u_i$ and $s_j$, respectively.
(ii) We then need to define a prediction function $\hat{r}_{i,j} = F_\Theta(u_i, s_j)$.
(iii) A loss function should be defined, such as MSE (Mean Square Error) or MAE (Mean Absolute Error).
(iv) A learning algorithm is designed to train the function $F_\Theta(u_i, s_j)$ using the observed values in the matrix. Note that $u_i$, $s_j$, and $\Theta$ are usually learned jointly in the same algorithm.
To conduct the QoS prediction task, many matrix factorization (MF) based models have been proposed. Section 2.2 analyzes the limitations of matrix factorization, and our GCF framework is then outlined in Section 2.3.
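The four steps above can be illustrated with a minimal sketch. It assumes the simplest possible prediction function (a plain dot product), a squared-error loss, and stochastic gradient descent; the function names, the learning rate, and the toy data are hypothetical and are not taken from the paper's algorithms.

```python
import random

def train_latent_factors(observed, n_users, n_services, k=2,
                         lr=0.05, epochs=1000, seed=0):
    """Jointly learn user and service latent vectors from (i, j, r) triples."""
    rng = random.Random(seed)
    # Step (i): one k-dimensional latent vector per user and per service.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    S = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_services)]

    def predict(i, j):
        # Step (ii): prediction function; here simply the dot product.
        return sum(a * b for a, b in zip(U[i], S[j]))

    for _ in range(epochs):
        for i, j, r in observed:
            # Step (iii): squared-error loss; err is its gradient up to a factor of 2.
            err = predict(i, j) - r
            # Step (iv): update both latent vectors in the same learning loop.
            for f in range(k):
                u_f, s_f = U[i][f], S[j][f]
                U[i][f] -= lr * err * s_f
                S[j][f] -= lr * err * u_f
    return predict

def mean_absolute_error(predict, triples):
    return sum(abs(predict(i, j) - r) for i, j, r in triples) / len(triples)

# Toy data: 2 users, 2 services, three observed response times, one missing.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0)]
predict = train_latent_factors(observed, n_users=2, n_services=2)
print(mean_absolute_error(predict, observed))
```

The value predicted for the unobserved pair (1, 1) depends on the initialization, which is one way to see why sparse observations lead to high variance.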

Formulization of MF-HBV Problem.
In recommender systems, matrix factorization models use the dot product of two latent vectors as the predicted value, which is defined as $\hat{r}_{i,j} = u_i^T s_j$. Suppose the size of the true rating matrix $R$ is $m \times n$. The length of $u_i$ or $s_j$ is usually less than $m$ and $n$. $R$ can be approximated by the product of two low-rank matrices: $R \approx U S^T$. The $i$-th row of $U$ is $u_i$, and the $j$-th row of $S$ is $s_j$. $U$ and $S$ are learned from the observed entries of $R$. This might work well because the user-item rating is usually a discrete value belonging to a small set. However, QoS often takes continuous values with a very wide range, so the low rank assumption might incur high bias.
Denote $r_i$ as the ratings of user $i$ for the $n$ services or items; $r_i$ is the $i$-th row of matrix $R$. Cosine similarity is a common measure of similarity between two non-zero vectors. The cosine similarity between $r_i$ and $r_{i'}$ is defined as $\cos(r_i, r_{i'}) = \frac{r_i \cdot r_{i'}}{\|r_i\| \, \|r_{i'}\|}$. Suppose the ratings are normalized and standardized. When the ratings for each service $j$ are very close, $r_i$ and $r_{i'}$ can approximately be considered linearly correlated. Here, we use the expected value of the sum of absolute differences of ratings to measure the credibility of linear correlation (CLC) of the two vectors: $\mathrm{CLC}(r_i, r_{i'}) = E\left[\sum_{j=1}^{n} |r_{i,j} - r_{i',j}|\right]$. From this definition, we can conclude that CLC is determined by the number of services (or items) and the difference between the maximum and minimum of the ratings. Usually, in recommender systems, ratings are discrete values with a limited range, such as 1 to 5. The CLC would be 2. However, QoS may take continuous values. The response time is in the range of 0 to 19 s, and the throughput is often from 0 to 2000 kb/s. The CLC is so large in QoS prediction applications that many rows or columns cannot simply be considered linearly correlated. That is why the low rank assumption incurs high bias for QoS prediction.
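As a rough illustration of the two quantities discussed above, the following sketch computes the cosine similarity of two rating vectors and the sum of absolute differences for a single pair of vectors (one sample of the quantity whose expectation defines CLC). The vectors are made-up examples: one pair on a 1 to 5 rating scale and one pair of response times in seconds.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def sum_abs_diff(a, b):
    """Sum of absolute element-wise differences (one sample behind CLC)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 1-to-5 ratings of two users for three items.
ratings_a, ratings_b = [5, 4, 3], [4, 4, 2]
# Hypothetical response times (s) of two users for three services.
qos_a, qos_b = [0.5, 12.0, 3.0], [6.0, 2.0, 18.0]

print(cosine_similarity(ratings_a, ratings_b), sum_abs_diff(ratings_a, ratings_b))
print(cosine_similarity(qos_a, qos_b), sum_abs_diff(qos_a, qos_b))
```

With a wider value range, the same number of entries can produce a much larger sum of absolute differences, which is the effect the CLC argument relies on.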
Furthermore, the QoS rating matrix is usually extremely sparse. Suppose we know the true value of every service latent vector $s_j$. Recovering $u_i$ then amounts to solving a linear system with one equation per observed rating of user $i$. If the number of rows of this system is less than the number of columns, we would have infinitely many solutions for $u_i$. However, in practice, we need to learn both the users' latent vectors and the services' latent vectors. If the observed ratings are extremely sparse, there are infinitely many ways to factorize the matrix. That is why the low rank assumption incurs high variance.
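The non-uniqueness argument can be seen on a tiny hand-made example. In the sketch below, two different pairs of factor matrices (chosen by hand, not learned) reproduce the same three observed entries of a 2 x 2 matrix but disagree on the fourth, unobserved entry.

```python
def reconstruct(U, S):
    """Return the matrix of dot products between rows of U and rows of S."""
    return [[sum(a * b for a, b in zip(u, s)) for s in S] for u in U]

# Observed entries: (0,0)=1, (0,1)=2, (1,0)=2. Entry (1,1) is missing.
U_a, S_a = [[1.0, 0.0], [2.0, 0.0]], [[1.0, 0.0], [2.0, 0.0]]
U_b, S_b = [[1.0, 0.0], [2.0, 1.0]], [[1.0, 0.0], [2.0, -3.0]]

R_a = reconstruct(U_a, S_a)
R_b = reconstruct(U_b, S_b)
print(R_a)  # agrees with the observations, predicts 4.0 for the missing entry
print(R_b)  # agrees with the observations, predicts 1.0 for the missing entry
```

Both factorizations have zero training error, so the observed data alone cannot decide between them.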
One might consider some regularization methods for matrix factorization. However, regularization only reduces the variance; it cannot reduce the bias and might sometimes increase it. Therefore, we need more complex models with high capacity to reduce the bias and then use complex regularization technologies to reduce the variance. The next section introduces the details of our GCF framework, which is more general and flexible and can learn the user-service interactions more accurately.

General Collaborative Filtering.
In fact, the process of learning the prediction function can be classified into three categories:
(i) Predefine features for users and items ($u_i$, $s_j$) and only learn the weights $\Theta$.
(ii) Predefine the weights $\Theta$ and learn the features for users and items ($u_i$, $s_j$).
(iii) Learn features and parameters simultaneously.
Generally speaking, the first type of learning process is related to content-based recommender algorithms, while the other two types are related to collaborative filtering recommender algorithms.
The second type of learning process mainly involves matrix factorization models, where the predicted QoS value $\hat{r}_{i,j}$ is usually defined as $\hat{r}_{i,j} = \mathbf{1}^T (u_i \odot s_j)$, where $\odot$ denotes the element-wise product of vectors. In this case, the weights are predefined as the all-ones vector $\mathbf{1}$.
Matrix factorization models use the dot product of two vectors as the predicted value. However, the underlying relationship between the QoS value and the latent vectors might be too complex to be captured well by using only the dot product.
In this paper we present a general collaborative filtering (GCF) framework that uses the third type of learning process. Suppose each user has a unique index value ranging from 1 to the number of users, and each service has a unique index value ranging from 1 to the number of services. Here, we use the integer $i$ to represent the index value of the user $u_i$ and the integer $j$ to represent the index value of the service $s_j$. Formally, $i = I_u(u_i)$ and $j = I_s(s_j)$. The prediction function of GCF applies $F_\Theta$ to the outputs of $T(\cdot)$, a function that transforms index values into a shared $k$-dimensional latent space. We provide three types of definitions for $F_\Theta$, built from $h_\odot$ and $h_z$, two different transformed interactive features of $u_i$ and $s_j$. Note that we can use either $h_\odot$ or $h_z$ or both of them in our model. $h_\odot$, $h_z$, and their concatenation are termed p-Factors. The traditional matrix factorization method is just a special case of the first type of GCF framework, which uses only $h_\odot$ and an identity feature transformation, $\Phi(u_i) = u_i$, $\Phi(s_j) = s_j$.
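One plausible way to write the three types in formulas is sketched below. It is a reconstruction under stated assumptions, not the paper's own equations: the linear output weights $w$ and bias $b$ are hypothetical names, and the forms of $h_\odot$ and $h_z$ are inferred from the special case of traditional matrix factorization and from the description of the models listed with Table 1.

```latex
% Hedged reconstruction; w, b and the exact form of h_z are assumptions.
\begin{align*}
u_i &= T(i), \qquad s_j = T(j) \\
h_\odot &= \Phi(u_i) \odot \Phi(s_j) \\
h_z &= \Phi\left([u_i^T \; s_j^T]^T\right) \\
\text{Type 1:}\quad \hat{r}_{i,j} &= w^T h_\odot + b \\
\text{Type 2:}\quad \hat{r}_{i,j} &= w^T h_z + b \\
\text{Type 3:}\quad \hat{r}_{i,j} &= w^T [h_\odot^T \; h_z^T]^T + b
\end{align*}
```

Under this reading, setting $w = \mathbf{1}$, $b = 0$, and $\Phi$ to the identity in Type 1 gives $\hat{r}_{i,j} = u_i^T s_j$, which is the traditional matrix factorization prediction.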
To learn GCF models, we need to solve an optimization problem that minimizes the loss over the set of ratings observed in $R$. In the objective, "∧C" denotes the regularization policy we adopt. Theoretically, we can use any kind of regularization, but we mainly use dropout in our models [16,17]. Now, there are two remaining problems to be solved:
(i) How to implement the function $T(\cdot)$ to transform the index values to latent factors?
(ii) How to implement the function $\Phi(\cdot)$ to transform latent factors to interactive features?
The next section gives the implementation details of how to solve these two problems. However, the GCF we propose in this paper is a general framework, and anyone can design different strategies to solve the two problems above. We believe our GCF framework will inspire more researchers to present various algorithms for different applications.

Implementation
In this section, we use a neural network architecture to implement the GCF models. We first present a solution for implementing the function $T(\cdot)$ and then elaborate some strategies on how to design $\Phi(\cdot)$. Finally, we present some important algorithms.
3.1. Neural Latent Feature Embedding. In fact, each user or service has a unique identifier in a real-world system. We suppose that each identifier is a unique integer. If the identifier of a user or a service is equal to $i$, it refers to the $i$-th user or service. Now let us use a one-hot vector to represent a user or a service. Suppose we have at most $m$ users and $n$ services. The $i$-th user and the $j$-th service can be expressed as one-hot vectors $u_i^r$ and $s_j^r$, in which only the $i$-th and the $j$-th element, respectively, is 1; the superscript "r" refers to the "raw" feature. Our neural latent feature embedding is defined as $u_i = W^U u_i^r$ and $s_j = W^V s_j^r$, where $W^U$ is a $k \times m$ matrix and $W^V$ is a $k \times n$ matrix. Expanding the products $W^U u_i^r$ and $W^V s_j^r$, we can see that $u_i$ is the $i$-th column of matrix $W^U$, and $s_j$ is the $j$-th column of matrix $W^V$. Actually, the QoS matrix predicted by traditional matrix factorization can be expressed as $Q = (W^U)^T W^V$ using our architecture. $u_i$ and $s_j$ are termed embedding layers in the neural network architecture. Figure 2 shows the architecture of the neural latent feature embedding. On top of the embedding layers, we can design various other neural layers.
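The equivalence between multiplying a weight matrix by a one-hot vector and looking up a column can be checked with a few lines of code. This is a minimal sketch with a made-up 2 x 3 weight matrix (k = 2 latent dimensions, m = 3 users); it uses 0-based indices, whereas the text counts from 1.

```python
def one_hot(index, size):
    """Raw feature: a vector of zeros with a single 1 at position index."""
    return [1.0 if p == index else 0.0 for p in range(size)]

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def column(W, index):
    """Direct lookup of one column of W."""
    return [row[index] for row in W]

# Hypothetical user embedding matrix with k = 2 rows and m = 3 columns.
W_U = [[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6]]

u_raw = one_hot(1, 3)          # raw feature of the user with index 1
u_latent = matvec(W_U, u_raw)  # embedding via matrix product
print(u_latent, column(W_U, 1))
```

This is why embedding layers are usually implemented as table lookups rather than as explicit matrix products.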

Learning High Level Interactive Features.
After embedding users and services as latent features, we need to transform the latent features to more complex representations.
Here, we use a series of nonlinear functions $\phi_l$ to transform the latent features, where $\phi_l$ can be any kind of simple nonlinear activation function or complex compound function.
For the simple nonlinear activation function, we adopt ReLU [18][19][20], which is defined as $\mathrm{ReLU}(x) = \max(0, x)$. Note that we usually add a dropout regularization function for some $\phi_l$.
Suppose $h^l = \phi_l(h^{l-1})$; if we use dropout, we need a random vector $r$, where the length of $r$ is equal to that of $h^l$ and $r_j \sim \mathrm{Bernoulli}(p)$; $1 - p$ is the drop ratio. Now $h^l$ becomes $\tilde{h}^l = r \odot h^l$. For the complex compound function, we provide a novel crossing residual unit (CRU). Instead of adding $h^l_2$ to $x^{l-1}$, we add $h^l_2$ to $h^{l-1}_3$. Our CRU can reduce the variance when using dropout. Figure 3 gives an intuitive explanation of the CRU. If we use $t$ CRUs, $h^t_1$ in the last CRU is termed the t-Factor, the size of which is important for training.
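The ReLU activation and the dropout mask described above can be sketched as follows. The function names and the `keep_prob` parameter are illustrative choices; the sketch applies only the Bernoulli mask from the text, whereas frameworks such as Keras additionally rescale the kept activations during training.

```python
import random

def relu(h):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in h]

def dropout(h, keep_prob, rng):
    """Multiply h element-wise by a mask r with r_j ~ Bernoulli(keep_prob).

    The drop ratio is 1 - keep_prob.
    """
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in h]
    return [m * x for m, x in zip(mask, h)]

rng = random.Random(0)
hidden = relu([-1.0, 0.0, 2.5, 4.0])
print(hidden)                      # [0.0, 0.0, 2.5, 4.0]
print(dropout(hidden, 0.5, rng))   # some entries are zeroed at random
```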

Typical GCF Models and Algorithms.
Depending on how we choose the p-Factors, transformation functions, and prediction functions, we can get various GCF models with different neural network architectures. Table 1 lists typical GCF models. Note that UCMF is just a neural network version of traditional matrix factorization, so in the later discussion GCF models refer to NLMF, DNN-CF, ResNet-CF, and HyResNet-CF, not including UCMF. Now, we give some important algorithms to implement GCF models. Due to the limited space, we only elaborate the algorithms related to HyResNet-CF; the algorithms of the other models are similar. We implement the algorithms using the Keras framework, so the pseudocode is similar to Keras code [21].
Algorithms 1 and 2 show how to build and train the model, respectively. The Embedding function refers to $T(\cdot)$, which is implemented according to formula (13). The Concat function merges two vectors into one long vector. The CRU function refers to the CRU neural unit, which is implemented according to formula (17). Dense is a fully connected layer. The Split function separates user ids, service ids, and QoS values from the dataset. Given a training set, the training time is mainly determined by the model size, batch size, learning rate, and the optimization function. In this paper, we use Adam as the optimizer, which uses a separate learning rate for each parameter and automatically adapts these learning rates throughout the course of learning.
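To make the remark about per-parameter learning rates concrete, here is a minimal sketch of one Adam update for a single scalar parameter, following the standard published update rule. The hyperparameter values are the commonly cited defaults and are an assumption; the settings used in the experiments are not stated here.

```python
import math

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for one parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Two parameters whose gradients differ by a factor of 1000.
theta_big, _, _ = adam_step(1.0, 4.0, 0.0, 0.0, t=1)
theta_small, _, _ = adam_step(1.0, 0.004, 0.0, 0.0, t=1)
print(theta_big, theta_small)
```

Because each step is normalized by the running magnitude of that parameter's own gradients, both parameters move by roughly the same amount on the first step despite the very different gradient scales.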

Dataset and Evaluation Metrics.
We evaluate the proposed methods using the response time values of a large publicly accessible dataset: WS-DREAM dataset#1, obtained from 339 users on 5825 web services [22]. Note that the original dataset is a dense full matrix. We construct training and test sets as follows: (i) We first filter out zero values, which are not useful for evaluation, and get the dataset Q = {, ,
For UCMF, we did not adopt any regularization because we found that it did not help improve the prediction accuracy. For GCF methods, we adopt dropout regularization. The drop rate is set to 0.5 for layers whose size is equal to or larger than 64, and to 0.2 for layers whose size is smaller than 64.
We conduct 10 experiments for each model and each sparsity level and then average the prediction accuracy values.
$s_j^r$ is a one-hot vector, the length of which is 5825. Only the $j$-th element is 1. This vector represents the raw feature of a service. The size of the weight matrix $W^V$ is 5825×10. This matrix is learned jointly with the other parameters of the whole neural network. The output of each hidden layer, $h$, represents different abstract features learned by the network. The higher level feature is determined by the lower level feature and the corresponding weight matrix $W$. These matrices are also learned together during training. The average total training and testing time (from density 5% to 30%) per experiment of each model finally chosen is as follows: UPCC took about 12 minutes, IPCC took about 13 minutes, UIPCC took about 15 minutes, UCMF took about 18 minutes, DNN-CF took about 27 minutes, NLMF took about 19 minutes, ResNet-CF took about 42 minutes, and HyResNet-CF took about 53 minutes.
The results are reported in Tables 2 and 3 and Figure 5. We can make the following observations:
(i) With the sparsity level increasing, all the models have much better prediction accuracy (lower MAE and NMAE). When the sparsity level is set to 30%, all the models have similar prediction accuracy except UPCC.
(ii) Our GCF methods outperform the traditional collaborative filtering methods, especially when the QoS rating matrix is extremely sparse. HyResNet-CF has the best prediction performance.
(iii) DNN-CF, ResNet-CF, and HyResNet-CF are all large networks. Although they get similar prediction performance, we found that DNN-CF has higher variance.
(iv) UCMF has very high variance when the QoS rating matrix is very sparse, and the variance decreases when the sparsity level becomes larger. However, it still has high bias and has lower prediction performance than the other GCF models.
When the QoS matrix density is set to 5%, the MAE of UCMF is 0.6284, but the MAE of GCF models is 0.5248 (DNN-CF), 0.5176 (NLMF), 0.5154 (ResNet-CF), and 0.5111 (HyResNet-CF), respectively. On the other hand, the SD (Standard Deviation) of UCMF is 0.0256, but the SD of GCF models is 0.0078 (DNN-CF), 0.0094 (NLMF), 0.5154 (ResNet-CF), and 0.0070 (HyResNet-CF), respectively. From the empirical results, we can see that GCF models have lower variance and lower MAE than UCMF. According to basic machine learning theory, prediction loss is determined by both the bias and the variance. Since GCF models have lower prediction loss and lower variance, they also have lower bias. This verifies our assumption in Section 2.3.
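For reference, the reported quantities can be computed as in the following sketch. MAE is the usual mean absolute error. The NMAE normalization (MAE divided by the mean of the true values) and the use of the sample standard deviation over repeated runs are assumptions, since the metric definitions are not reproduced in this text; the numbers in the example are made up.

```python
import statistics

def mae(predicted, actual):
    """Mean absolute error between predictions and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def nmae(predicted, actual):
    """Assumed normalization: MAE divided by the mean of the true values."""
    return mae(predicted, actual) / (sum(actual) / len(actual))

def summarize_runs(mae_per_run):
    """Mean and sample standard deviation of MAE over repeated experiments."""
    return statistics.mean(mae_per_run), statistics.stdev(mae_per_run)

predicted, actual = [1.0, 2.0], [2.0, 4.0]
print(mae(predicted, actual), nmae(predicted, actual))
print(summarize_runs([0.5, 0.7]))
```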

To understand why deep GCF models can get better prediction performance, let us show how MAE changes during the training process with different sparsity levels. Here, we compare the UCMF and HyResNet-CF methods. The results are shown in Figure 6, and we can make some observations as follows:
(i) The training MAEs of UCMF decrease sharply at the early stage, then become steady or increase slightly for a while, and then decrease as the number of training epochs increases.
(ii) The test MAEs of UCMF have a knee point when the sample density is set to 5%, 10%, or 15%. If we set the sparsity level to 15%, after the knee point, the MAEs first increase but decrease after a period of time and finally become steady. However, for densities of 20%, 25%, and 30%, the test MAEs of UCMF first increase for a while and then decrease and finally become steady.
(iii) The training MAEs of HyResNet-CF decrease sharply at the early stage and then decrease relatively more slowly as the number of training epochs increases.
(iv) The test MAEs of HyResNet-CF become steady after a few training epochs but are always lower than the test MAEs of UCMF.
Therefore, HyResNet-CF is more robust and efficient than UCMF. In fact, whichever sparsity level we choose, HyResNet-CF can always get a better prediction performance using only a small number of training epochs.
To give a more intuitive explanation of the advantages of deep GCF models, we visualize the service latent factors ($s_j$) of UCMF, NLMF, and HyResNet-CF, using 5% and 30% sparsity levels. We use t-SNE to map the 10-dimensional service latent factors to 3-dimensional space. From Figure 7, we can see the following:
(i) If the training set is larger, i.e., the sparsity level is 30%, each model can capture distinguishable structures.
(ii) When the sparsity level is 5%, the training set is very small. UCMF learns poor latent factors with disordered structure. However, NLMF and HyResNet-CF can still capture clearer structures.
(iii) NLMF and HyResNet-CF learn different latent factor structures, yet both of them get similar prediction performance.

Impact of Network Depth.
To see how network depth influences the prediction performance, we change the number of CRUs of the ResNet-CF model and the size of $h^l_1$ for each CRU. Here, if we use $t$ CRUs, and the size of the t-Factor, i.e., $h^t_1$, is 8, then the size of $h^{t-1}_1$ is 16, the size of $h^{t-2}_1$ is 32, and so on. Therefore, we test the prediction performance for different values of $t$ and different t-Factor sizes. The results are shown in Figures 8 and 9 and Tables 4 and 5.
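The doubling rule for layer sizes can be written as a small helper. The function name is hypothetical; it only encodes the rule stated above, namely that the last CRU has the t-Factor size and each earlier CRU is twice as wide as the one after it.

```python
def cru_widths(num_crus, t_factor_size):
    """Size of h_1 in each CRU, listed from the first unit to the last."""
    return [t_factor_size * 2 ** (num_crus - 1 - l) for l in range(num_crus)]

print(cru_widths(3, 8))    # [32, 16, 8]
print(cru_widths(7, 64))   # the first unit is already 4096 wide
```

The exponential growth of the first layer explains why large t-Factor sizes quickly become expensive as more CRUs are stacked.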
Due to their extremely large size, models with a t-Factor size greater than 8 use at most 7 CRU layers. It is too slow to train deeper networks in a limited period of time, but 7 CRU layers are sufficient to verify the impact of network depth. Therefore, Figure 8 and Tables 4 and 5 provide the performance of at most 7 CRUs. However, if the size of the t-Factor is equal to 8, we need more CRUs to observe the change in performance. Thus, Figure 9 plots the prediction performance against the number of CRUs, from 1 to 10, for the model whose t-Factor size is 8. Note that "# t-Factor" in Tables 4 and 5 refers to the t-Factor size. We can make some observations as follows:
(i) When increasing the number of CRUs, the performance becomes much better, but it decreases when the number of CRUs exceeds some threshold value. Therefore, we should choose an appropriate network depth.
(ii) It seems that models with a smaller t-Factor size can get better performance using a deeper architecture. When setting the t-Factor size to 8, we get the lowest MAE and NMAE using 7 CRUs.
(iii) If we use a shallower architecture, a larger t-Factor size can get better prediction performance.

Related Work
Collaborative filtering (CF) algorithms have been widely used in recommender systems. To select the best items for the users, CF usually computes ratings for how interested the users are in the items and then selects the top-ranked items in terms of the ratings. Predicting QoS values is mostly the same as predicting ratings in recommender systems. Users have only partial knowledge about the QoS properties of the candidate services. We can build the user-service matrix just like the rating matrix and apply collaborative filtering to make predictions [25][26][27]. Memory based CF fits the models by directly estimating the parameters using heuristic algorithms. In fact, memory based CF models are generalized k-nearest-neighbors (KNN) algorithms [28][29][30][31]. Generally, there are two types of memory based CF models: user-based CF and item-based CF. User-based CF utilizes the most similar other users to predict how much the user potentially likes a specific item, while item-based CF makes a recommendation according to the user's past experience with similar items. Most improvements to memory based CF models concern how to design an appropriate similarity function for a specific task. Xiaokun Wu et al. computed the similarity between users or between items by comparing the attribute values directly. Huifeng Sun et al. proposed a similarity measure named normal recovery (NR), unifying the similarity of the scaled user vectors (or item vectors) in different multidimensional vector spaces. Other studies also provide their own similarity measures based on different assumptions and constraints. Due to the lack of ability to make complex assumptions and integrate side information, memory based CF usually cannot reach the prediction accuracy of model based CF.
Model based CF, as a special machine learning algorithm, is a generalized regression or classification method, which estimates the coefficients from data. Unlike typical supervised machine learning problems, there are no obvious raw features for data represented by the rating matrix. Therefore, matrix factorization has become the main technology to handle such data [32][33][34][35][36][37][38]. The main difference between traditional matrix factorization approaches is how to represent the latent vectors. Probabilistic Matrix Factorization (PMF) supposes that the rating given a specific user and a specific item obeys a Gaussian distribution, and the user latent vector and item latent vector have zero-mean Gaussian distributions [39]. Non-negative Matrix Factorization (NMF) learns the optimal nonnegative latent factors from data, which usually deals with the task of rating prediction on explicit feedback [40]. Generally speaking, non-negative matrix factorization is used to deal with implicit feedback, where the dataset only includes whether the users are interested in some items but does not include whether the users dislike some items. For explicit feedback, the ratings are usually normalized by subtracting the mean, so non-negative matrix factorization is not suitable. Our task is predicting QoS values, which belong to explicit feedback. Furthermore, our GCF framework is a deep neural network architecture. Although we can add a constraint that the output of the embedding layer is non-negative, the output of the layers above it could again be any value. So the non-negative constraint has little influence on the GCF framework. However, we can easily use dropout to avoid overfitting for the GCF models. Some studies also integrate matrix factorization with memory-based CF algorithms. Even though matrix factorization CF algorithms have state-of-the-art performance, capturing only linear interactions prevents further significant improvement.
To efficiently find nonlinear interactions, some recent studies have provided deep learning based collaborative filtering technologies. Two remarkable examples of such technologies are Google's Wide and Deep [12] and Microsoft's Deep Crossing [13]. Both of them are generalized linear regression/classification models. The interaction between features is represented by deep neural networks, such as an MLP or a Residual Network. However, they are designed for tasks with a lot of features, and the interactions they model include more than the user and the item. Neural Collaborative Filtering (NCF) is designed purely for user and item interactions [15]. It creatively combines linear and nonlinear interaction by applying the embedding technology and multiplication of embedded latent vectors. However, the above three studies focus on classification tasks. Our work is motivated by NCF, but we focus on regression tasks, which have different evaluation protocols. Our models indeed made an obvious improvement in service QoS prediction.
Another way of modeling nonlinear interaction is kernel matrix factorization, which is similar to the NLMF method in our GCF framework [41,42]. However, kernel matrix factorization is a shallow architecture, which has limited learning capacity. NLMF is more flexible, since it can adjust the network depth to increase or reduce the learning capacity. Furthermore, kernel matrix factorization needs a matrix inverse operation in each training iteration, which makes it hard to train on large datasets. Finally, when using NLMF we can benefit from the newest regularization technologies, such as dropout and batch normalization, which are more powerful than the L1 or L2 regularizers used in kernel matrix factorization.

Conclusion
Due to the wide range of QoS values, traditional collaborative filtering methods, especially the matrix factorization models, cannot capture the complex structures of user-service interactions. We give a formal description of the MF-HBV problem of traditional matrix factorization models and then provide a more general collaborative filtering framework, which is called the GCF framework. Traditional matrix factorization is just a special case of the GCF framework. Instead of using the dot product of latent factors to make predictions, we use nonlinear transformation functions to get high level interactive features to increase the capacity of the models. At the same time, we use some complex regularization technologies to reduce the variance of the models.
There are three types of GCF framework, using different interactive features to predict QoS values. We design 5 instances of these three GCF frameworks, using different neural network architectures, including the traditional matrix factorization model. In particular, we present a novel neural unit, called the crossing residual unit (CRU). The generalization power is usually estimated using the prediction performance on the test set. In this paper, we can see that two GCF models, ResNet-CF and HyResNet-CF, get the lowest MAE and NMAE on the test set. Therefore, by using CRUs, models would have better generalization power. We use dropout as the regularization method in GCF models. Finally, we conduct extensive experiments on a real-world dataset. The experimental results show that our GCF models (not including UCMF) outperform the traditional collaborative filtering methods. Furthermore, we give some intuitive explanations of why GCF models have lower test error and why they can capture the user-service interaction better. To verify whether deeper neural networks can get better performance, we design several ResNet-CF architectures. It seems that adding more layers can improve the prediction accuracy, but only up to some threshold number of layers.
However, the experiments only focus on how to predict response time. The values range from 0 to 19 s, and most of the values are around 3 s. In future work, we will conduct more experiments on QoS datasets whose values have a much wider range.
(ii) We propose a novel neural network based framework for web service QoS prediction, termed as GCF (general collaborative filtering).The traditional matrix factorization models are special cases of this framework.We also provide four different implementations of GCF, including NLMF (Non-Linear Matrix Factorization), DNN-CF (Deep forward Neural Network Collaborative Filtering), ResNet-CF (Residual Network Collaborative Filtering), and HyResNet-CF (Hybrid ResNet-CF).
Table 1 lists 5 types of typical GCF models, and a general architecture for these different models is shown in Figure 4. UCMF (Un-Constrained Matrix Factorization) is actually the traditional matrix factorization, and the predicted QoS value is $\hat{r}_{i,j} = \mathbf{1}^T h_\odot + 0 = u_i^T s_j$. NLMF (Non-Linear Matrix Factorization) is a simple extension of the traditional matrix factorization, which replaces $u_i$ and $s_j$ with $\Phi(u_i)$ and $\Phi(s_j)$, respectively. The transform functions are a series of ReLUs. UCMF and NLMF belong to the first type of GCF framework. DNN-CF and ResNet-CF belong to the second type of GCF framework; however, DNN-CF uses ReLUs, whereas ResNet-CF uses CRUs. HyResNet-CF belongs to the third type of GCF framework. It uses ReLUs to transform $u_i$ and $s_j$ and uses CRUs to transform $[u_i^T \; s_j^T]$.

Table 3: Performance comparison in terms of NMAE.

Figure 5: Performance comparison in terms of MAE and NMAE. Panel (b): NMAE w.r.t. density.

Figure 8: MAE and NMAE of ResNet-CF with different layers.

Table 5: NMAE of ResNet-CF with different layers.

Wu et al. computed the similarity between users or between items by comparing the attribute values directly. Huifeng Sun et al. proposed a similarity measure named normal recovery (NR), unifying similarity of the scaled user vectors (or item vectors) in different multidimensional vector spaces.

Table 2: Performance comparison in terms of MAE.

Table 4: MAE of ResNet-CF with different layers.