
To expand server capacity and reduce bandwidth consumption, P2P technologies have been widely used in video streaming systems in recent years. Each client in a P2P streaming network should select a group of neighbors by evaluating the QoS of the other nodes. Unfortunately, a video streaming P2P network is usually very large, and evaluating the QoS of all the other nodes is resource-consuming. An attractive alternative is to predict the QoS of a node from the past usage experiences of a small number of other clients who have evaluated this node. Collaborative filtering (CF) methods can therefore be used for QoS evaluation to select neighbors. However, different video streaming policies may rely on different QoS properties. If a new video streaming policy needs to evaluate a new QoS property, but the historical experiences include very little evaluation data for this property, CF methods suffer from severe overfitting, and clients may receive unsatisfactory recommendation results. In this paper, we propose a novel neural collaborative filtering method based on transfer learning, which can evaluate a QoS property with little historical data by exploiting other QoS properties with rich historical data. We conduct our experiments on a large real-world dataset, whose QoS values are obtained from 339 clients evaluating 5825 other clients. Comprehensive experimental studies show that our approach offers higher prediction accuracy than traditional collaborative filtering approaches.

In recent years, video content has accounted for a large proportion of global Internet consumption. Video streaming is gradually becoming the most attractive service [

An attractive alternative is to predict the QoS value of a node from the past usage experiences of a small number of other clients who have evaluated this node. This relies on a well-known technology, collaborative filtering (CF), which has been extensively studied in recommender systems [

However, the neighbor selection policy might need to change to improve the quality of video content delivery. If the new policy uses a new QoS property to select neighbors, but the historical user experiences include very little data for this property, CF methods suffer from severe overfitting, and each client might then receive a worse neighbor recommendation list. Transfer learning aims to adapt a model trained in a source domain with rich labeled data for use in a target domain with less labeled data, where the source and target domains are usually related but follow different distributions [

Unlike many supervised transfer learning tasks, we cannot simply fine-tune or freeze the weights of the network. The only information about the nodes in the video streaming P2P network is their identifiers (IDs) and the historical QoS evaluation experience. There are no raw features for the nodes, so we need to learn abstract features for them using embeddings. Freezing the embedding features therefore seems unreasonable. Furthermore, different QoS properties have different value ranges, so fine-tuning makes the final weights differ greatly from the initial weights pretrained in the source domain. Because labeled data in the target domain is sparse, too much fine-tuning would cause a severe overfitting problem.

In this paper, we propose a novel neural-style collaborative filtering method, DTCF (Deep Transfer Collaborative Filtering). We first train the model using the QoS evaluation data in the source domain and then adapt the model to the target domain, which has a different QoS property. The core idea is that we use only the weights of the first several layers to initialize the same layers of the model in the target domain and randomly initialize the remaining layers. To control the degree of fine-tuning, we integrate the maximum mean discrepancy (MMD) measurement into the loss function [

We propose a novel neural collaborative filtering model for QoS prediction using transfer learning technology.

We provide a novel interaction layer to represent the relationship between latent embedding factors of the nodes.

We adopt partial fine-tuning and the MMD measurement to train the target domain model and thereby achieve domain adaptation.

The remainder of this paper is organized as follows: We introduce the related work in Section

Distributed delivery of user-generated videos poses a new challenge to large-scale streaming systems. To stream live videos generated by users, many existing video streaming systems rely on a centralized network architecture [

Collaborative filtering is a reasonable QoS prediction technology for selecting neighbors for each client in the P2P video streaming network [

However, even if matrix factorization CF algorithms have obtained remarkable success, they have difficulty in dealing with cross-domain learning tasks if the output values of the source and target domain have different ranges. Deep neural networks can easily learn general and transferable features. More and more cross-domain applications adopt deep learning technologies and have yielded remarkable performance [

For the cross-domain QoS prediction in the video streaming P2P network, we are given a

We propose a novel neural architecture, outlined in Figure

DTCF architecture.

Since we do not use any concrete feature for each node, we need to learn abstract features for them. Here, we use an embedding layer to learn a continuous latent vector/factor

If we get two latent vectors for

Unfortunately, it is too simple to completely represent the complex interaction between nodes. In this paper, we propose a novel interaction layer to tackle this problem, which has powerful representation capacity. We will give the design details in Section

Above the interaction layer, we use ReLU layers as the hidden layers; several such layers might be needed. The ReLU activation function is $\mathrm{ReLU}(x) = \max(0, x)$.

Finally, we use a fully connected (FC) layer to generate the output. When training the model in the source domain, we use the regression loss. We then use all the layers of the pretrained model except the last FC layer to construct the model for the target domain. The weights of these layers are kept as the initial weights of the target domain model, while the final FC layer is initialized randomly. To avoid the overadaptation problem, we use both the domain loss and the regression loss to train the target domain model. We will describe how to design the domain loss in Section

Since we can assign a unique integer number as the identifier for each node in the network, we can use a one-hot vector to represent the identifier. If we have at most

Therefore,
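As an illustration of the one-hot representation of node identifiers, the following minimal Python sketch shows that multiplying a one-hot identifier vector by an embedding matrix is equivalent to reading one row of that matrix. The table sizes, the random initialization, and all function names are assumptions made for this example and are not taken from the paper.

```python
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed_by_matmul(one_hot_vec, table):
    """Multiply a one-hot row vector by the embedding table (size x dim)."""
    dim = len(table[0])
    return [
        sum(one_hot_vec[i] * table[i][d] for i in range(len(table)))
        for d in range(dim)
    ]


def embed_by_lookup(index, table):
    """Equivalent shortcut: read row `index` of the embedding table."""
    return list(table[index])


rng = random.Random(0)
NUM_NODES, DIM = 5, 3  # hypothetical sizes, chosen only for the example
table = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(NUM_NODES)]

node_id = 2
via_matmul = embed_by_matmul(one_hot(node_id, NUM_NODES), table)
via_lookup = embed_by_lookup(node_id, table)
```

Because the two results coincide, an embedding layer can store only the table and index it by node identifier, without ever materializing the one-hot vectors.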

There are two inputs of the interaction layer,

Interaction layer.

Suppose the output of interaction layer is a vector

If the length of

The output of the last ReLU layer of the model in the source domain is denoted as

Let

Denote

If the function class

Denote

Similarly, the empirical estimate can be defined now as follows.

In this paper, we use the empirical estimate of

The total loss of target domain includes regression loss and MMD loss. We use the minibatch to train the model. Only a small group of examples are used to compute the loss per training iteration. Denote the set of the minibatch examples in the source domain

To optimize our model, we need to compute the gradient of each weight. For any weight related to both the regression loss and the domain loss, its gradient is computed as follows.

Note that

We first train the model of the source domain using the loss function

After training, we use the weights of this model to initialize the model in the target domain except the weights of the last FC layer. The last FC layer of the model of the target domain is initialized randomly.
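The initialization step described above can be sketched in plain Python as follows. The layer names, layer shapes, and the Gaussian initializer are hypothetical; the sketch only illustrates copying every pretrained layer except the last FC layer, which is re-initialized randomly.

```python
import copy
import random


def init_layer(rows, cols, rng):
    """Randomly initialize a rows x cols weight matrix (assumed Gaussian init)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)]


def build_source_model(rng):
    """Stand-in for a pretrained source-domain model (hypothetical layers)."""
    return {
        "embedding": init_layer(10, 4, rng),
        "interaction": init_layer(4, 8, rng),
        "relu_1": init_layer(8, 8, rng),
        "fc_out": init_layer(8, 1, rng),
    }


def init_target_from_source(source, rng, last_layer="fc_out"):
    """Copy all source weights except `last_layer`, which is drawn afresh."""
    target = {}
    for name, weights in source.items():
        if name == last_layer:
            target[name] = init_layer(len(weights), len(weights[0]), rng)
        else:
            # Deep copy so that later fine-tuning does not alter the source model.
            target[name] = copy.deepcopy(weights)
    return target


rng = random.Random(42)
source_model = build_source_model(rng)
target_model = init_target_from_source(source_model, rng)
```

The copied layers are only a starting point: they are still updated during target-domain training, with the domain loss limiting how far they drift from the source representation.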

While training the model of the target domain, we use the loss function

For each training iteration, we randomly select examples in the dataset, and compute the gradient according to formulas (

We use ADAM (Adaptive Moment Estimation) as the optimizer.
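To make the target-domain objective more concrete, the sketch below combines a regression loss with an MMD domain loss computed on one minibatch of source features and one minibatch of target features (for example, outputs of the last ReLU layer). It uses the standard biased empirical estimate of the squared MMD with a Gaussian kernel. The squared-error regression loss, the `trade_off` weight, and the kernel parametrization are assumptions for illustration, since the exact forms are not recoverable from the text.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source_feats, target_feats, bandwidth):
    """Biased empirical estimate of the squared MMD between two batches."""
    return (
        mean_kernel(source_feats, source_feats, bandwidth)
        + mean_kernel(target_feats, target_feats, bandwidth)
        - 2.0 * mean_kernel(source_feats, target_feats, bandwidth)
    )


def regression_loss(predictions, targets):
    """Mean squared error (an assumed choice of regression loss)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def target_domain_loss(predictions, targets, source_feats, target_feats,
                       bandwidth=1.0, trade_off=1.0):
    """Regression loss plus trade_off times the MMD domain loss."""
    domain = mmd_squared(source_feats, target_feats, bandwidth)
    return regression_loss(predictions, targets) + trade_off * domain


# Toy minibatches of 2-dimensional features.
src = [[0.0, 0.0], [0.1, 0.0]]
tgt = [[2.0, 2.0], [2.1, 2.0]]
same_dist = mmd_squared(src, src, bandwidth=1.0)
diff_dist = mmd_squared(src, tgt, bandwidth=1.0)
loss = target_domain_loss([1.0, 2.0], [1.5, 2.0], src, tgt)
```

In this toy example the MMD term is zero when both batches are identical and grows as the target features move away from the source features, which is what penalizes excessive adaptation.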

We conduct our experiments on a large publicly accessible dataset, WS-DREAM dataset#1, obtained from 339 hosts performing QoS evaluation on 5825 other hosts. There are two types of QoS properties in this dataset: response time and throughput. Here, we use the response time as the source domain and the throughput as the target domain.

For the source domain, we randomly extract 30% (density) of the data as the source training set. For the target domain, we construct 6 different training sets with densities of 0.5%, 1%, 1.5%, 2%, 2.5%, and 3%. In each case, the remaining data is the test set.
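The density-based split can be sketched as follows. The toy record list (20 clients by 50 nodes, all entries observed), the record layout, and the function name are assumptions for illustration; only the density values come from the text.

```python
import random


def split_by_density(records, density, seed=0):
    """Randomly keep a `density` fraction of records for training.

    Returns (train, test), where test holds all remaining records.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * density)
    return shuffled[:n_train], shuffled[n_train:]


# Toy QoS matrix flattened to (client, node, value) records (hypothetical data).
records = [(u, s, float(u + s)) for u in range(20) for s in range(50)]

target_densities = (0.005, 0.01, 0.015, 0.02, 0.025, 0.03)
target_splits = {d: split_by_density(records, d) for d in target_densities}
source_train, source_test = split_by_density(records, 0.30)
```

Fixing the seed per run keeps a split reproducible, while varying it across runs gives the repeated random splits needed for averaging.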

We adopt a common evaluation metric: Mean Absolute Error (MAE), which is widely employed to measure the QoS prediction quality.
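For reference, MAE is the average absolute difference between predicted and observed QoS values over the test set. A minimal Python sketch, with illustrative function and variable names:

```python
def mean_absolute_error(predicted, observed):
    """Average of |predicted - observed| over paired values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Absolute errors are 0.5, 0.0, and 1.0, so the mean is 0.5.
example_mae = mean_absolute_error([1.0, 2.5, 4.0], [1.5, 2.5, 3.0])
```

A lower MAE indicates better prediction accuracy.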

We compare our methods with some traditional collaborative filtering methods: UPCC, IPCC, UIPCC [

We conduct 10 experiments for each model and each sparsity level and then average the prediction accuracy values.

The results are reported in Figures

As the density of the training set increases, the MAEs of all the models decrease.

Our DTCF methods outperform the other traditional collaborative filtering methods, especially when the training set is extremely sparse.

The DTCF model has more weights to train than the other models, yet it achieves the best performance, which suggests that the relationship between nodes is very complex and that shallow models cannot capture these structures.

MAE with respect to density.

MAE comparison for each density. Panels: density = 0.5%, 1%, 1.5%, 2%, 2.5%, and 3%.

Although shallow models do not overfit easily when the target domain training dataset is extremely sparse, they cannot transfer rich information from the source domain. Deep models can easily overfit, but they can learn common latent features from the source domain. To resolve this dilemma, we need to control the degree of fine-tuning of the deep model. This experiment shows that the MMD domain loss is an effective way of controlling the degree of adaptation.

The network depth usually has an important impact on the prediction performance. Here, the number of neurons in each ReLU layer is set to 128, and we increase the number of ReLU layers from 1 to 6 to see how the MAE values change. The experimental result is outlined in Figure

Adding more ReLU layers can improve prediction performance, but once the depth exceeds a certain value, the performance starts to degrade.

Although adding more ReLU layers can improve the performance, it seems that enlarging the size of the training data would be more helpful.

Sometimes, adding more layers no longer improves the performance, but it does not worsen it either. This indicates that a deep neural network has some kind of regularization property.

MAE with respect to number of ReLU layers.

Actually, if the training dataset is very large, adding more layers usually does not incur overfitting problems, but for the cross-domain learning, the target domain has very little data, so the network depth needs control.

Another hyperparameter that we need to determine is the Gaussian kernel bandwidth. By default, it is set to the median pairwise distance on the source training data. We scale the default value from 0.25 to 2.0, and the experimental result is outlined in Figure

The default value is a reasonable choice; scaling it to be too small or too large yields worse prediction performance.

If the bandwidth is too large, the kernel will be approximately equal to 1, and all nodes would look the same, so we cannot make personalized recommendations for them.

If the bandwidth is too small, the kernel will be approximately equal to 0, and the nodes cannot find similar neighbors to follow their past experiences.
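The two limiting cases above can be checked numerically. The sketch below computes the median pairwise distance of a small set of points as the default bandwidth and evaluates a Gaussian kernel at several scales of that default. The sample points, the scale values 0.01 and 100, and the kernel parametrization exp(-d^2 / (2 * bandwidth^2)) are assumptions for illustration.

```python
import math
import statistics


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def median_pairwise_distance(points):
    """Median of the distances over all unordered pairs of points."""
    dists = [
        euclidean(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    return statistics.median(dists)


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel with the assumed form exp(-d^2 / (2 * bandwidth^2))."""
    return math.exp(-euclidean(x, y) ** 2 / (2.0 * bandwidth ** 2))


# Hypothetical feature vectors standing in for source training data.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
default_bw = median_pairwise_distance(points)

# Kernel value between the first two points at different bandwidth scales.
kernel_by_scale = {
    scale: gaussian_kernel(points[0], points[1], scale * default_bw)
    for scale in (0.01, 0.25, 1.0, 2.0, 100.0)
}
```

At a very large scale the kernel value is close to 1 for every pair, and at a very small scale it is close to 0, which matches the two failure modes described above; scales between 0.25 and 2.0 keep the kernel values informative.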

MAE with respect to Gaussian kernel bandwidth scale.

Selecting neighbors in terms of the QoS is an effective way of providing high quality contents in video streaming P2P networks. Due to the heterogeneous network conditions, the QoS between any pairs of nodes is different. However, evaluating the QoS of all the nodes for each user is resource-consuming. An attractive way is to adopt collaborative filtering technologies, which use only a small amount of past usage experience.

Unfortunately, video content providers might often choose different QoS properties to select neighbors, and traditional CF methods cannot solve the cross-domain QoS prediction problem. This paper proposed a novel neural-style CF method based on transfer learning. We first outlined our model architecture and then introduced the details of its important parts. To avoid the overadaptation problem, we combined the domain loss and the prediction loss to train the model of the target domain. We adopted the MMD distance as our domain loss and described its principle and how to compute its gradient. Finally, we conducted our experiments on a real-world public dataset. The experimental results show that our DTCF model outperforms the other models for cross-domain QoS prediction.

The WS-DREAM data used to support the findings of this study are owned by a third party; the dataset is open and is deposited in “

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work is supported by the National Natural Science Foundation of China (No. 61602399 and No. 61502410).