Using NearestGraph QoS Prediction Method for Service Recommendation in the Cloud



Introduction
With the development of the mobile Internet and the growing agility of distributed system services [1,2], cloud computing is migrating toward a fusion of cloud and fog computing, since fog computing can better satisfy demands for lower latency and short-fat connections. At present, the composed distributed system [3] is becoming the main solution accepted by the majority [4]. Meanwhile, a wide range of cloud-aware services is produced to cater for the fog cloud environment. In this situation, mobile users often find it confusing to select proper cloud-aware services, because redundant cloud-aware services appear with identical functionality but different quality of service (QoS) [5,6]. Recommender systems are designed to address this problem of matching mobile users with suitable cloud-aware services under information overload.
The key to cloud-aware service selection and recommendation is QoS [7]. QoS is defined as a set of properties of a specific cloud-aware service, such as response-time, throughput, and reputation, and it is treated as an important criterion for distinguishing among functionally equivalent services [8]. In the fog cloud environment, QoS information is normally collected and stored in various fog servers instead of being transferred to the remote cloud directly, because of the large volume of data and the heavy transmission cost. In this situation, QoS information is distributed rather than centralized [9], which means that it is often sparse and unavailable to mobile users. Therefore, to make effective recommendations, a feasible approach is to complete the missing QoS values by prediction.
In fact, all roles in the fog cloud environment have a motivation to predict QoS before their assignments. A typical example of the fog cloud environment [10][11][12] is shown in Figure 1, which includes three roles: service user, service broker, and service designer. Each role faces the same problem of how to manage cloud-aware services with high-quality performance. For example, service users expect more qualified services that respond more quickly while meeting basic functions. In general, it is necessary for the three roles in the fog cloud environment to predict QoS values for the following reasons: (1) a service user can only get a limited number of QoS values, because QoS invocation is time-consuming and costly, which makes it difficult for a cloud-aware service recommender to make a decision; (2) a service broker always has a strong desire to manage cloud-aware services with good performance; and (3) a service designer needs to deploy cloud-aware services that satisfy QoS constraints to avoid punishment. Therefore, QoS prediction is a critical issue for cloud-aware service deployment, selection, and recommendation.
Studies on QoS prediction have made certain progress in recent years. Many scholars prefer to "fill" the unknown QoS values from historical QoS information and formulate the task as a matrix completion problem [13]. Chen et al. [14] take user-service geographical location into account to improve prediction accuracy. Wang et al. [15] introduce more aspects that affect QoS values, such as time and location. Wu et al. [16] address the problem by considering the relationship between similarity and the candidate's consistency. Other researchers devote themselves to improving the poor credibility of the fog cloud environment. Tang et al. [17] apply the trust concept to cloud-aware service QoS prediction. Su et al. [18] predict missing QoS values based on the trust relationship.
However, to the best of our knowledge, there is still a lack of research explicitly targeting the fluctuation of QoS values related to the status of mobile users and services. In a highly dynamic Internet environment, the QoS values of cloud-aware services often fluctuate over a large range because of the variety of users' mobile networking environments and the physical distance between mobile users and fog servers. According to a study of real world QoS [19], some services perform in a more "unstable" way than others. We randomly select two services, each invoked by 339 users, and plot the distribution of their QoS values in Figure 2. Intuitively, the service drawn in blue is more unstable than the service drawn in orange. Therefore, we can conclude that a cloud-aware service whose QoS performance varies over a wide range is not of general applicability and should not be recommended to other users, since it is difficult to make an accurate prediction when candidate services with "unstable" performance are employed.
In this paper, the problem of QoS prediction is formulated to leverage historical QoS information. Inspired by the fact of QoS fluctuation, we propose a novel neighbor-based QoS prediction algorithm under the assumption that QoS values are closely related to the services and users in the fog cloud environment. In our approach, a concept and a quantization method are put forward to represent the stable status of services and users in the fog cloud environment. A graph structure is then adopted to recognize stable or unstable candidates and to expose their popularity at the same time. Based on this, the NearestGraph method generates an optimal prediction order to achieve higher prediction accuracy.
The remainder of this paper is organized as follows. Section 2 introduces related work on QoS prediction and existing methods. Section 3 presents our proposed QoS prediction method for cloud-aware service recommendation. Section 4 provides our experimental results and the details of our experimental implementation. Section 5 sets out our conclusion and outlines future work.

Related Works
At present, many efforts and results are devoted to tackling the issue of QoS prediction. Initially, scholars adopted static methods to make a QoS prediction. Static methods predict with an arithmetic average: the global average QoS value, the average per user, or the average per service. These methods are simple and easy to implement, but they disregard the situation-aware factors of users and services. Moreover, these static methods cannot reflect the dynamic properties of QoS values, which leads to a greater error between predicted and actual values according to our experimental results in Section 4.
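The three static averages can be illustrated with a minimal Python sketch. The dictionary layout (observed values keyed by a user and service pair) and the function names are assumptions made for illustration, not part of the original method.

```python
from typing import Dict, Optional, Tuple

# Observed QoS values keyed by (user, service); missing pairs are simply absent.
# This layout is an illustrative assumption.
QoS = Dict[Tuple[str, str], float]


def global_mean(observed: QoS) -> Optional[float]:
    """Average of every observed QoS value, or None if nothing is observed."""
    if not observed:
        return None
    return sum(observed.values()) / len(observed)


def user_mean(observed: QoS, user: str) -> Optional[float]:
    """Average of the QoS values that `user` has observed across services."""
    vals = [q for (u, _), q in observed.items() if u == user]
    return sum(vals) / len(vals) if vals else None


def service_mean(observed: QoS, service: str) -> Optional[float]:
    """Average of the QoS values that all users have observed for `service`."""
    vals = [q for (_, s), q in observed.items() if s == service]
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    data = {("u1", "s1"): 0.2, ("u1", "s2"): 0.4, ("u2", "s1"): 0.6}
    # The missing entry (u2, s2) could be filled by any of the three averages.
    print(global_mean(data), user_mean(data, "u2"), service_mean(data, "s2"))
```

Each function ignores who is asking and under what conditions, which is why such averages cannot follow the fluctuation of QoS values.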
Motivated by the success of traditional recommender systems, existing works on QoS prediction in the fog cloud environment are usually based on collaborative filtering. Collaborative filtering (CF) methods are widely used for rating prediction in recommender systems. CF exploits the similarity between users' experiences to predict user preference on unknown items. The intuitive idea is to identify users "similar" to the active user and to predict the active user's preference based on the feedback of these similar users. CF can be further divided into two main categories: the model-based method and the neighbor-based method.
The model-based method makes a prediction from the known QoS values by learning a predictive model [20]. Observed values are used to learn two matrices, which are the basis for calculating similarities among users. However, the model-based method suffers from ignorance of the low-rank structure of real world user-service matrices [21]. The main idea of the model-based method is matrix completion, in which the key is to exploit the low-rank structure of the user-service matrix. Lee et al. [22] present an algorithm for nonnegative matrix factorization, indicating that only a small number of factors influence the service performance. Some scholars think QoS is strongly related to time and put forward online prediction. Zhu et al. [23] propose a method for predicting the QoS value of a running cloud-aware service.
The neighbor-based method uses the QoS values of similar users or services to make QoS predictions directly. Shao et al. [24] first introduce a collaborative filtering approach for similarity mining and inference based on historical QoS information in the user-service matrix. They perform positive and negative user similarity calculations separately and integrate them using a weighted mean equation. Zhu et al. [25] give a multidimensional QoS prediction approach, which takes timing constraints, QoS, throughput, fairness, and load balancing into account. Zheng et al. [5] propose a novel QoS ranking prediction model that considers that different cloud users have different preferences for different QoS attribute values.
In this paper, we mainly focus on neighbor-based collaborative filtering, since it is simple to implement and its prediction results are often easy to explain. The prediction accuracy of the neighbor-based method is highly influenced by the available similar candidates. Similar candidates play an effective role in the QoS prediction phase, mainly because they are obtained from the similarity computation and are assigned more or less importance with respect to the target in the prediction. However, the sparsity of QoS information always degrades the accuracy of QoS prediction. In our proposed approach, we address this challenge by introducing a graph structure to expose each candidate's own popularity.

NearestGraph QoS Prediction
In this section, the problem of QoS prediction is described and formulated in Section 3.1. After that, both user-user similarity and service-service similarity are computed in Section 3.2 to select neighbors. Finally, our NearestGraph algorithm, presented in Algorithms 1 and 2, is used to predict missing QoS values.
Figure 3 shows a user-service matrix formed by users and cloud-aware services. The shaded part of the matrix indicates that the user has invoked the cloud-aware service and has rated the corresponding QoS value. The blank part indicates that the user has not invoked the cloud-aware service, so the QoS value needs to be predicted. The objective of missing QoS value prediction is to make the user-service matrix denser within certain iteration phases [26].
Analysis of real world QoS datasets shows that QoS values can vary widely and are highly skewed, with large variances that degrade the accuracy of prediction. Without loss of generality, we map the QoS values onto the interval (0, 1) with a function defined in terms of the minimum and maximum QoS values.
After the similarity computations, we obtain the user similarity matrix and the service similarity matrix. At the same time, we can identify the neighbors of each user or service by ranking the similarity values. Traditional top-K algorithms select the K most similar neighbors for missing value prediction. In practice, neighbors with negative similarity values can greatly decrease the prediction accuracy, so in this paper we exclude dissimilar neighbors with negative enhanced-PCC values. The set Ψ of proper similar users for a given user therefore contains only the candidates whose similarity to that user is not negative and whose ranking position lies within Top-K.

Neighbors
Here the rank function gives the ranking position of a candidate user among the similarity neighbors of the target user, and Top-K is the manually set lowest admissible ranking position.
In the same way, we obtain the set Φ of proper similar cloud-aware services for a given cloud-aware service, where the rank function gives the ranking position of a candidate service among the similarity neighbors of the target service and Top-K is again set manually.
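The neighbor selection rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: similarities are given as a plain dictionary from candidate identifier to similarity value, ties are broken by identifier, and a similarity of exactly zero is kept because the text only excludes negative values. The function name `select_neighbors` is hypothetical.

```python
from typing import Dict, List


def select_neighbors(similarities: Dict[str, float], top_k: int) -> List[str]:
    """Return the candidates ranked within top_k, most similar first,
    after dropping those with a negative similarity."""
    # Rank by descending similarity; break ties by identifier (an assumption).
    ranked = sorted(similarities.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cand for cand, sim in ranked[:top_k] if sim >= 0]


if __name__ == "__main__":
    sims = {"u2": 0.9, "u3": -0.2, "u4": 0.5, "u5": 0.1}
    print(select_neighbors(sims, top_k=2))
    print(select_neighbors(sims, top_k=10))
```

The same function applies to services if the dictionary holds service-service similarities instead.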

Predicting Missing QoS Values with NearestGraph.
After selecting user neighbors, we find an interesting fact: some users or services are relatively "popular" among the others. For example, one user may occupy the top-1 similarity ranking position for a second user and, at the same time, rank top-1 among the similar neighbors of a third user, and further users may be in the same situation. This is not an occasional case but happens for most similar neighbors. In order to expose this kind of popularity of users or services, we construct a directed graph as a nearest neighbor graph, as shown in Figure 4.
In Figure 4, a user is represented by usrG=(usrID, usrEdge), in which usrID labels a user and usrEdge shows the relationship between the user and his most similar neighbor: a directed edge is drawn from a user to his most similar neighbor. Therefore, the indegree of a vertex in our nearest neighbor graph indicates the degree to which other vertices are in favor of this vertex. A vertex with a larger indegree is very "popular" and has a higher influence on other vertices. This can be understood as the relationship between celebrities and fans on social networking sites: a celebrity with more fans has greater appeal, which also reveals greater influence. A similar method can be used to represent a cloud-aware service by servG=(servID, servEdge), in which servID labels a service and servEdge shows the relationship between a service and its most similar neighbor.
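As a rough illustration of this graph, the following Python sketch builds one outgoing edge per vertex and counts indegrees. The nested-dictionary similarity input, the tie-breaking by identifier, and the function names are assumptions for illustration only.

```python
from typing import Dict


def nearest_neighbor_edges(sim: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map every vertex to its single most similar neighbor (one outgoing edge)."""
    edges: Dict[str, str] = {}
    for v, row in sim.items():
        candidates = {c: s for c, s in row.items() if c != v}
        if candidates:
            # Highest similarity wins; ties go to the smaller identifier (assumption).
            edges[v] = min(candidates, key=lambda c: (-candidates[c], c))
    return edges


def indegrees(edges: Dict[str, str]) -> Dict[str, int]:
    """Count, for every vertex, how many other vertices point to it."""
    deg = {v: 0 for v in set(edges) | set(edges.values())}
    for target in edges.values():
        deg[target] += 1
    return deg


if __name__ == "__main__":
    sim = {
        "u1": {"u2": 0.9, "u3": 0.1},
        "u2": {"u1": 0.9, "u3": 0.3},
        "u3": {"u1": 0.1, "u2": 0.3},
    }
    edges = nearest_neighbor_edges(sim)
    print(edges)
    print(indegrees(edges))
```

In this toy input, u2 is the most similar neighbor of both u1 and u3, so it has the largest indegree and plays the role of the "popular" vertex.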
Furthermore, to reflect the stability of different users and cloud-aware services as shown in Figure 2, a concept of candidate stability is also proposed. We employ the following equation to describe the stability of a user's status.
Similarly, we can describe the stability of a cloud-aware service's status as follows.
stability () = ∑ ∈ ( , − ) × (7)
where a smaller value of stability indicates a more stable status.
In order to introduce the stability of users or services, we further extend the above nearest neighbor graph to usrG=(usrID, usrWeight, usrEdge) and servG=(servID, servWeight, servEdge), respectively, in which usrWeight is the stability of the user's status and servWeight is the stability of the service's status.
In this paper, we believe that both popularity and stability play an important role in QoS prediction and should be used to obtain better prediction accuracy. Therefore, we propose an algorithm called NearestGraph to achieve this goal, which generates an optimal prediction order from the nearest neighbor graph based on popularity and stability. The main strategies of NearestGraph are the following three key points.
(1) Select the usrID with the minimum indegree.
(2) Select the usrID with the maximum weight if more than one vertex has the same indegree.
(3) Select the usrID with the minimum dictionary order if more than one vertex has the same indegree as well as weight.
We use these three rules for two reasons: (1) vertices with a larger indegree are more popular, affect more users, and should be kept longer in our nearest neighbor graph to make full use of their important influence; (2) vertices with a larger weight, which means they are more stable, have a more positive impact on QoS prediction and should be kept longer in our nearest neighbor graph to make full use of their important influence. We now take the example shown in Figure 5 to illustrate the process of our NearestGraph algorithm.
Phase (a) is the initial state of the nearest neighbor graph, in which a vertex property called weight represents the status of stability and a directed edge shows the relationship between a user and his most similar neighbor. For example, vertex V5 with weight 0.42 is more stable than vertex V2 with weight 0.40, and the edge from vertex V1 to vertex V2 expresses that V2 is the most similar neighbor of V1. We then decide which vertex will be predicted according to the weight and indegree shown in the nearest neighbor graph. According to strategy (1) of NearestGraph, V1 is predicted first since it has the minimum indegree. In the next phase (b), two vertices have the same indegree after strategy (1) is applied. Here strategy (2) helps us make a decision: V5 should be predicted in phase (b) for its high stability. We then loop through the three strategies until only one vertex is left in the graph structure, which yields a complete prediction order. The details of our NearestGraph algorithm for users are given in Algorithm 1.
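The loop over the three strategies can be sketched in Python as below. This is not Algorithm 1 itself but a minimal illustration under assumptions: removing a vertex only deletes its own outgoing edge (no re-linking of the vertices that pointed to it), the last remaining vertex is appended to close the order, and the toy graph and weights in the usage example are hypothetical rather than taken from Figure 5.

```python
from typing import Dict, List


def prediction_order(edges: Dict[str, str], weight: Dict[str, float]) -> List[str]:
    """Repeatedly pick a vertex by the three rules and remove it from the graph."""
    remaining = set(weight)
    order: List[str] = []
    while len(remaining) > 1:
        # Indegree counts only edges whose source and target are both still present.
        indeg = {v: 0 for v in remaining}
        for src, dst in edges.items():
            if src in remaining and dst in remaining:
                indeg[dst] += 1
        # Rule 1: minimum indegree; rule 2: maximum weight; rule 3: smallest ID.
        chosen = min(remaining, key=lambda v: (indeg[v], -weight[v], v))
        order.append(chosen)
        remaining.remove(chosen)
    order.extend(sorted(remaining))  # assumption: the last vertex closes the order
    return order


if __name__ == "__main__":
    # Hypothetical toy graph: each key points to its most similar neighbor.
    edges = {"a": "b", "b": "c", "c": "b", "d": "c"}
    weight = {"a": 0.30, "b": 0.40, "c": 0.42, "d": 0.35}
    print(prediction_order(edges, weight))
```

In the toy graph, `a` and `d` both start with indegree 0, so rule 2 selects `d` for its larger weight. Vertices `b` and `c` are left for last because other vertices point to them.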
Based on the prediction order generated by Algorithm 1, the user-based method employs the values of known entries to predict a missing entry in the user-service matrix, using the average existing QoS values that the target user and each similar user have rated on different cloud services. We can also obtain the prediction order for services in a similar way, as shown in Algorithm 2. The values from the service prediction order are correspondingly employed for prediction in the service-based method, using the average existing QoS values that the target service and each similar service have received from different users. In this paper, the user-based and service-based approaches are both adopted and combined through the parameter λ. The mixed approach can help us obtain many more missing QoS values and can therefore improve the accuracy of prediction. The parameter λ controls the fusion proportion of the two methods and can be trained on a sample dataset from the real world. The complete QoS prediction algorithm is summarized in Algorithm 3.
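A minimal Python sketch of one plausible form of these predictions follows. It assumes the similarity-weighted deviation-from-mean formula that is common in neighbor-based CF and a linear fusion λ · user-based + (1 − λ) · service-based. It also assumes that when only one of the two predictions is available, that one is used alone. All three choices, and the function names, are illustrative assumptions rather than the exact equations of this paper.

```python
from typing import Dict, Optional


def deviation_from_mean(target_mean: float,
                        neighbors: Dict[str, Dict[str, float]]) -> Optional[float]:
    """Similarity-weighted deviation-from-mean prediction.

    Each neighbor entry holds 'sim' (similarity to the target), 'value' (the
    neighbor's observed QoS for the entry being predicted) and 'mean' (the
    neighbor's own average observed QoS).
    """
    denom = sum(n["sim"] for n in neighbors.values())
    if denom <= 0:
        return None  # no usable neighbor
    num = sum(n["sim"] * (n["value"] - n["mean"]) for n in neighbors.values())
    return target_mean + num / denom


def fuse(user_pred: Optional[float], service_pred: Optional[float],
         lam: float) -> Optional[float]:
    """Combine both predictions; a larger lam gives the user-based side more weight."""
    if user_pred is None:
        return service_pred
    if service_pred is None:
        return user_pred
    return lam * user_pred + (1.0 - lam) * service_pred


if __name__ == "__main__":
    similar_users = {
        "u2": {"sim": 0.8, "value": 0.5, "mean": 0.4},
        "u3": {"sim": 0.2, "value": 0.3, "mean": 0.5},
    }
    p_user = deviation_from_mean(0.6, similar_users)
    print(p_user, fuse(p_user, 0.5, lam=0.5))
```

The fallback in `fuse` reflects the remark that the mixed approach recovers more missing values: an entry that has no similar users can still be predicted from similar services, and vice versa.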

Experiment
In this section, we evaluate the effectiveness of our proposed method on a distributed and parallel platform, the Spark system. We evaluate the QoS prediction accuracy of our proposed method on a real world QoS dataset which is widely used to evaluate the performance of QoS prediction. It contains the response-time (the time between a user sending a request and receiving a response) and the throughput (the data transmission rate of a user invoking a service) of 5828 services invoked by 339 distributed computers located in 30 countries from PlanetLab. According to the statistics of this QoS dataset shown in Table 1, the ranges of response-time and throughput are 0-20 s and 0-1000 kbps, respectively, and the means of response-time and throughput are 0.910 s and 47.386 kbps, respectively.
There are 100837 QoS records for the response-time property and 143422 QoS records for the throughput property in this QoS dataset. The corresponding user-service matrices for both QoS properties have some entries with the value -1, which means that the QoS value cannot be obtained or the service is unreachable in the real world. Therefore, the entries with the value -1 are the ones we need to predict in the matrix.

Metrics.
To evaluate the performance of our proposed NearestGraph method, we compare its prediction accuracy with that of several neighbor-based CF methods by computing the mean absolute error (MAE) and the root-mean-square error (RMSE), which measure the errors between predicted values and real values. The metric MAE is defined as
$$\mathrm{MAE} = \frac{\sum_{i,j} \left| q_{i,j} - q^{*}_{i,j} \right|}{N}$$
and RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i,j} \left( q_{i,j} - q^{*}_{i,j} \right)^{2}}{N}}$$
where $q_{i,j}$ is the QoS value of cloud-aware service $s_j$ observed by user $u_i$, $q^{*}_{i,j}$ is the QoS value of cloud-aware service $s_j$ that would be observed by user $u_i$ as predicted by a method, and $N$ is the number of predicted QoS values. According to the definitions, a smaller value of either metric indicates a higher prediction accuracy.
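Both metrics are straightforward to compute. The following Python sketch uses the standard definitions of MAE and RMSE over paired lists of actual and predicted values; the function names and list-based inputs are illustrative choices.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over paired actual and predicted QoS values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error over paired actual and predicted QoS values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    real = [1.0, 2.0, 3.0]
    pred = [1.5, 2.0, 2.0]
    print(mae(real, pred), rmse(real, pred))
```

Because RMSE squares each error before averaging, it penalizes a few large errors more heavily than MAE does, which matters for QoS attributes with a wide value range.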

Performance Comparison.
In this part, we conduct an overall comparison between our NearestGraph method and several baseline algorithms from the neighbor-based CF field on both MAE and RMSE. The baselines are listed as follows:
UMean: the mean of the QoS values obtained by a user is used to predict the missing QoS values which have not been obtained by this user.
IMean: the mean of the QoS values obtained by all users for a service is used to predict the missing QoS value of that service for the users who have not obtained it.
UPCC: a user-based collaborative filtering method, which uses similar users calculated by the Pearson Correlation Coefficient to make a prediction [24].
IPCC: an item-based collaborative filtering method, which uses similar items calculated by the Pearson Correlation Coefficient to make a prediction [27].
WSRec: a hybrid collaborative filtering method that combines IPCC and UPCC and uses both similar users and similar services for QoS prediction [5].
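For reference, the Pearson Correlation Coefficient that UPCC and IPCC rely on can be sketched in Python as below. This is the plain PCC, not the enhanced-PCC variant used by NearestGraph. Computing the means over only the co-invoked services, and returning `None` when fewer than two services are shared, are assumptions of this sketch.

```python
import math
from typing import Dict, Optional


def pcc(a: Dict[str, float], b: Dict[str, float]) -> Optional[float]:
    """Pearson correlation of two users over the services both have invoked.

    `a` and `b` map service identifiers to observed QoS values. Returns None
    when the correlation is undefined (too few shared services or zero variance).
    """
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None
    mean_a = sum(a[s] for s in common) / len(common)
    mean_b = sum(b[s] for s in common) / len(common)
    cov = sum((a[s] - mean_a) * (b[s] - mean_b) for s in common)
    var_a = sum((a[s] - mean_a) ** 2 for s in common)
    var_b = sum((b[s] - mean_b) ** 2 for s in common)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    u1 = {"s1": 1.0, "s2": 2.0, "s3": 3.0}
    u2 = {"s1": 2.0, "s2": 4.0, "s3": 6.0, "s4": 1.0}
    u3 = {"s1": 3.0, "s2": 2.0, "s3": 1.0}
    print(pcc(u1, u2), pcc(u1, u3))
```

Swapping the roles of users and services (each service mapped to the QoS values observed by users) gives the item-based similarity used by IPCC.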
In order to simulate users' invocations of cloud-aware services in the real world, we randomly remove some entries from the user-service matrix and compare their values with the predicted ones. For example, a density of 10% means that we randomly remove 90% of the entries and use the remaining 10% to predict the values of the removed entries. The parameter settings of NearestGraph are Top-K = 10 and λ = 0.5 in the experiments.
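The random removal step can be sketched in Python as follows. This is an illustrative helper, not the experimental code: it assumes that density is measured as a fraction of the known entries, and the function name `split_by_density` and the fixed seed are hypothetical.

```python
import random
from typing import Dict, Tuple

QoS = Dict[Tuple[str, str], float]


def split_by_density(observed: QoS, density: float, seed: int = 0) -> Tuple[QoS, QoS]:
    """Keep a `density` fraction of the known entries for prediction and
    hold out the rest as ground truth for evaluation."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    keys = sorted(observed)     # sort first so the shuffle is deterministic
    rng.shuffle(keys)
    n_keep = round(len(keys) * density)
    kept = {k: observed[k] for k in keys[:n_keep]}
    removed = {k: observed[k] for k in keys[n_keep:]}
    return kept, removed


if __name__ == "__main__":
    full = {(f"u{i}", f"s{j}"): float(i + j) for i in range(10) for j in range(10)}
    kept, removed = split_by_density(full, density=0.1)
    print(len(kept), len(removed))
```

The removed entries then serve as the real values against which MAE and RMSE are computed.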
The experimental results are shown in Table 2, where we highlight the best performance of all methods for each row. Table 2 shows that NearestGraph obtains the minimum MAE and RMSE of response-time and throughput for almost all matrix densities, which means it can improve the prediction accuracy. Moreover, as the matrix density increases from 10% to 30%, the MAE and RMSE of the NearestGraph method become smaller and smaller, since a denser matrix provides more information for missing QoS value prediction.
Comparing the MAE and RMSE of response-time and throughput in Table 2, we can also find that the MAE and . That confirms that our proposed method focuses on the fact of QoS fluctuation and can perform better over a wide range of QoS values (for example, the range of throughput is 0-1000 kbps, whereas the range of response-time is only 0-20 s).

Impact of Matrix Density.
In order to explore the impact of matrix density, we compare the prediction accuracy of all the methods under different matrix densities and present the results in Figure 6. The density of the matrix increases from 10% to 30% with a step of 10%. The parameter settings in this experiment are Top-K = 10 and λ = 0.5.
The MAE and RMSE results of response-time are shown in Figures 6(a) and 6(b), and the MAE and RMSE results of throughput are shown in Figures 6(c) and 6(d). In these figures, the green line, which stands for NearestGraph, is always below all other lines, which means our proposed NearestGraph method obtains the smallest values of MAE and RMSE under different matrix densities. Moreover, we can observe that the performance of our NearestGraph method improves as the matrix density increases, which indicates that collecting more QoS information greatly enhances prediction accuracy when the matrix is sparse.

Impact of 𝜆.
The parameter λ controls the fusion proportion of the user-based and service-based methods. A larger value of λ means that the user-based approach contributes more to the hybrid prediction. In Figure 7, we study the impact of λ on the prediction accuracy of the proposed NearestGraph method by varying λ from 0 to 1 with a step of 0.1 under the condition Top-K = 10.
Figures 7(a) and 7(b) show the MAE and RMSE results of response-time and throughput, respectively. The prediction accuracy first improves as λ increases, but once λ surpasses a certain threshold, the accuracy decreases with any further increase of λ. From Figure 7, we can also find that NearestGraph achieves its best performance when λ ∈ [0.4, 0.7].

Impact of 𝑇𝑜𝑝 − 𝐾.
The parameter Top-K determines the size of the candidate sets, which include similar users and similar services. In Figure 8, we study the impact of Top-K on the prediction accuracy of the proposed NearestGraph method by varying Top-K from 2 to 20 with a step of 2 under the condition λ = 0.5.
Figures 8(a) and 8(b) show the MAE and RMSE results of response-time and throughput, respectively. The experimental results show that NearestGraph achieves its best prediction accuracy (minimum MAE and RMSE) when Top-K is set to around 10. This is because a Top-K value that is too small excludes useful information from some similar candidates, while a Top-K value that is too large introduces noise from dissimilar candidates, and both reduce the prediction accuracy.

Conclusion and Future Work
In the fog cloud environment, to reduce the cost of data transmission from mobile users to the cloud, QoS information is often first handled by distributed fog servers instead of being sent to a remote cloud directly. However, such cross-platform data distribution leads to sparse QoS information for service recommendation. Because existing research on missing QoS value prediction often ignores the wide fluctuation of QoS, especially in the fog cloud environment, we propose a novel QoS prediction method that uses the NearestGraph algorithm for service recommendation. The key points of our approach, which builds on the neighbor-based method, are the construction of a nearest neighbor graph designed to expose stable and popular candidates, and the choice of making predictions in a certain order, which applies priorities to different candidates instead of traversing candidates at random, in order to improve the final accuracy. Through a set of experiments on WS-DREAM, a real world distributed service quality dataset used here to simulate the fog cloud environment, we validate the feasibility of our method in terms of service recommendation accuracy and confirm that NearestGraph performs well under large fluctuation of QoS properties. In summary, the paper makes the following key contributions:
(1) We emphasize the fact that real world QoS values fluctuate in a wide range and take it into account to reduce the inaccuracy of predicting missing values.
(2) We reveal the inner features of candidates behind neighbors and capture their outer characteristics, stability and popularity, in the fog cloud environment by constructing the nearest neighbor graph.
(3) Graph structure is employed to develop the prediction order and enhance prediction accuracy.
Currently we predict the values of different QoS attributes separately. We are going to investigate the correlations and combinations among the QoS attributes in the future. Furthermore, we will use time series analysis for prediction and extend NearestGraph to describe accurate user and service status in the fog cloud environment.

Figure 1: The architecture of fog cloud environment: each role wants to manage a cloud-aware service with "good" performance, especially QoS. However, QoS information is sparse and often varies among different roles. QoS prediction can achieve the goal of finding "good" performance through the analysis of historical QoS information.

Figure 4: Nearest neighbor graph: we found that, in the sets of most similar neighbors for different users, some users tended to appear frequently.

Section 4.1 introduces two typical metrics to assess the prediction accuracy. The comparison experiments on prediction accuracy are conducted with different baseline algorithms from the neighbor-based CF field in Section 4.2, and the impact of three key parameters of NearestGraph on prediction accuracy is further demonstrated in Sections 4.3, 4.4, and 4.5. All the experiments are conducted on a hardware platform of 4 PCs, each with an i5-4460 CPU and 16 GB of RAM.