A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

Cross-domain collaborative filtering (CDCF) alleviates the sparsity problem by transferring rating knowledge from auxiliary domains. Different auxiliary domains differ in their importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting and overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared with many state-of-the-art single-domain or cross-domain CF methods.


Introduction
The rapid growth of information on the Internet demands intelligent information agents that can sift through all the available information and find what is most valuable to us. In recent years, recommender systems [1,2] have been widely used in e-commerce sites and online social media, and the majority of them offer recommendations for items belonging to a single domain. Collaborative filtering (CF) algorithms [3] are the most widely used methods for recommender systems, and they can be categorized into three classes: memory-based algorithms [4], model-based algorithms [5], and matrix factorization based algorithms [6].
However, in real-world recommender systems, users usually dislike rating items and the items rated are very limited. Thus the rating matrix is very sparse. The sparsity problem has become a major bottleneck for most CF methods. To alleviate this difficulty, recently a number of cross-domain collaborative filtering (CDCF) methods have been proposed [7]. CDCF methods exploit knowledge from auxiliary domains (e.g., movies) containing additional user preference data to improve recommendation on a target domain (e.g., books) containing less user preference data. They can effectively relieve the sparsity problem in the target domain.
Existing CDCF methods can be categorized into two classes. One class assumes that different domains share users or items [8-11]. This assumption commonly appears in the real world. For instance, the Amazon website contains different domains, including Books, Music CDs, DVDs, and Video tapes. They share the same user set though their items are totally different. As another instance, Amazon Book Network and Dang-Dang Book Network sell similar products to different users. It is easy to find an intersection in which the two domains share the same items. The other class contains a limited number of CDCF methods [12,13] that do not require shared users and items. However, they assume that both users and items in an auxiliary data source are related to the target data.

Computational Intelligence and Neuroscience
Among previous works of the first class, Berkovsky et al. [8] mention a neighborhood based CDCF (N-CDCF), which can be viewed as the cross-domain version of a memory-based method, that is, N-CF [4]. Hu et al. [9] mention a matrix factorization based CDCF (MF-CDCF), which can be viewed as the cross-domain version of a matrix factorization based method. Both N-CDCF and MF-CDCF accommodate items from all domains into a single matrix so as to employ single-domain CF methods. However, they assume the homogeneity of items. Obviously, items in different domains may be quite heterogeneous, and the above two models fail to take this fact into account.
Singh and Gordon [10] propose a Collective Matrix Factorization (CMF) model. CMF couples rating matrices for all domains on the User dimension so as to transfer knowledge through the common user-factor matrix. Hu et al. [9] propose a generalized Cross-Domain Triadic Factorization (CDTF) model over the triadic relation user-item-domain. Considering that not all the auxiliary domains are equally correlated with the target domain, CMF and CDTF assign different weights to different auxiliary domains. This is their advantage over N-CDCF and MF-CDCF. However, CMF does not provide a mechanism to find an optimal weight assignment for the auxiliary domains. Though CDTF assigns the weights using a genetic algorithm (GA), its performance is sensitive to the setting of the initial population.
Pan et al. [11] propose a Transfer by Collective Factorization (TCF) model. The TCF model requires that the target domain and the auxiliary domain share the same aligned users and items simultaneously. Under this assumption, they explore how to take advantage of knowledge in the form of binary ratings (like and dislike) to alleviate the sparsity problem in numerical ratings. The two-side (user-side and item-side) assumption can provide more precise information on the mapping between auxiliary and target data, which can lead to higher performance. However, this assumption rarely holds in the real world.
Among previous works of the second class, Li et al. [12] propose a codebook-based knowledge transfer (CBT) for recommender systems. CBT achieves knowledge transfer with the assumption that both auxiliary and target data share the cluster-level rating patterns (codebook). Further, Li et al. [13] propose a rating-matrix generative model (RMGM). RMGM is derived and extended from the Flexible Mixture Model (FMM) [5], and we can consider RMGM as a multitask learning (MTL) [14] version of CBT with the same assumption. Both CBT and RMGM require two rating matrices to share the cluster-level rating patterns. In addition, CBT and RMGM cannot make use of user-side or item-side shared information.
In this paper, we assume the auxiliary domains contain dense rating data and share the same aligned users with the target domain. Previous works on this assumption cannot compute proper weights for different auxiliary domains. In order to overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features both in the target domain and in the auxiliary domain. We use different features to represent different domains. Instead of assigning proper weights to different auxiliary domains, we just assign proper weights to different features. Then we combine all the features together and convert the cross-domain recommendation problem into a regression problem. Therefore, the important information in the auxiliary domains can be transferred to the target domain by the constructed features from the auxiliary domains. Finally, a nonparametric regression method, that is, Locally Weighted Linear Regression (LWLR) model [15], is used to solve the regression problem. We conduct extensive experiments to show that the proposed algorithm can outperform many state-of-the-art single-domain or cross-domain CF methods.
The remainder of this paper is organized as follows: Section 2 reviews the related works on CDCF methods. Section 3 proposes our FCLWLR model. In Section 4, we conduct extensive experiments to test the performance of the proposed algorithm. We conclude the paper and give future works in Section 5.

Related Works
Some of the earliest work on CDCF was carried out by Berkovsky et al. [8], who deployed several mediation approaches for importing and aggregating user rating vectors from different domains. Currently, CDCF methods can be categorized into two classes. One class assumes shared users or items [8-11], and the other class does not require shared users or items in different domains [12,13].
In the first class, Berkovsky et al. [8] mention an early neighborhood based CDCF (N-CDCF). Neighborhood based CF (N-CF) computes similarity between users or items and can be subdivided into two types, user-based nearest neighbor (N-CF-U) and item-based nearest neighbor (N-CF-I). Accordingly, the N-CDCF algorithm can also be divided into two types: a user-based neighborhood CDCF model (N-CDCF-U) and an item-based neighborhood CDCF model (N-CDCF-I). For simplicity, we only review N-CDCF-U in detail; N-CDCF-I works in the same manner.
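Let $\mathcal{D} = \{D_0, D_1, \ldots, D_K\}$ denote all the domains for modeling, $\mathcal{U} = \{u_1, u_2, \ldots, u_m\}$ denote the users in $\mathcal{D}$, and $\mathcal{I}_j = \{i_1, i_2, \ldots, i_{n(j)}\}$ denote the items belonging to the domain $D_j$ ($0 \le j \le K$), where $n(j)$ denotes the item set size of $D_j$. For a user-based CDCF algorithm, we first calculate the similarity, $s_{u,v}$, between the users $u$ and $v$ who have corated the same set of items. The similarity can be measured by the Pearson correlation:

$$ s_{u,v} = \frac{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{u,v}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{u,v}} (r_{v,i} - \bar{r}_v)^2}}, \tag{1} $$

where $I_{u,v} = I_u \cap I_v$ ($I_u = \bigcup_{D_j \in \mathcal{D}} I_u^j$, $I_v = \bigcup_{D_j \in \mathcal{D}} I_v^j$) denotes the items over all domains $\mathcal{D}$ corated by $u$ and $v$; $r_{u,i}$ and $r_{v,i}$ are the ratings on item $i$ given by users $u$ and $v$, respectively; and $\bar{r}_u$ and $\bar{r}_v$ are the average ratings of users $u$ and $v$ for all the items rated, respectively. Then the predicted rating of an item $i$ for user $u$ can be calculated by a weighted average strategy [4]:

$$ \hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v \in N_{u,i}} s_{u,v}\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in N_{u,i}} |s_{u,v}|}, \tag{2} $$

where $N_{u,i}$ denotes the set of top $k$ users ($k$ neighbors) that are most similar to user $u$ and who rated item $i$.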
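The user-based neighborhood prediction can be illustrated with a minimal pure-Python sketch. It assumes ratings are stored as nested dictionaries pooled over all domains; the function names `pearson` and `predict` and the data layout are illustrative choices, not part of the original method description.

```python
from math import sqrt


def mean(values):
    """Average of a non-empty collection of ratings."""
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items corated by two users.

    Each argument maps item id -> rating, pooled over all domains.
    User means are taken over all items each user rated.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    mean_u, mean_v = mean(ratings_u.values()), mean(ratings_v.values())
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: treat undefined correlation as no similarity
    return num / (den_u * den_v)


def predict(user, item, ratings, k=10):
    """Weighted-average prediction from the k most similar users who rated item."""
    mean_u = mean(ratings[user].values())
    sims = [(pearson(ratings[user], r), v)
            for v, r in ratings.items() if v != user and item in r]
    top = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0:
        return mean_u  # assumption: fall back to the user's mean rating
    offset = sum(s * (ratings[v][item] - mean(ratings[v].values())) for s, v in top)
    return mean_u + offset / norm
```

With a single neighbor, the prediction reduces to the user's mean plus the neighbor's deviation from its own mean, which is a quick way to sanity-check the sketch.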
In addition to the above model, the traditional MF model can also be applied directly to CDCF problems. The Funk-SVD model is the most commonly used MF model [6]. As shown in Figure 1, for a single-domain collaborative filtering recommendation system, the Funk-SVD model maps both users and items to a joint latent factor space of dimensionality $f$.
In this model, each item $i$ is associated with a latent vector $q_i \in \mathbb{R}^f$, and each user $u$ is associated with a latent vector $p_u \in \mathbb{R}^f$. $q_i$ measures the distribution of item $i$ on those latent factors, and $p_u$ measures the interest distribution of user $u$ on those latent factors. The resulting dot product, $q_i^T p_u$, captures the interaction between user $u$ and item $i$. This approximates user $u$'s rating on item $i$, which is denoted by $\hat{r}_{ui}$ in the following form:

$$ \hat{r}_{ui} = q_i^T p_u. \tag{3} $$

To learn the latent vectors ($q_i$ and $p_u$), the Funk-SVD model minimizes the regularized squared error on the set of known ratings:

$$ \min_{q, p} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right). \tag{4} $$

Here, $\kappa$ is the set of the $(u, i)$ pairs for which $r_{ui}$ is known. The constant $\lambda$ controls the extent of regularization to avoid overfitting and is usually determined by cross-validation [16]. An effective approach to minimize optimization problem (4) is stochastic gradient descent, which loops through all ratings in the training set. For each given training case, the system predicts $r_{ui}$ and computes the associated prediction error

$$ e_{ui} = r_{ui} - q_i^T p_u. \tag{5} $$

Then it modifies the parameters by a magnitude proportional to $\gamma$ (i.e., the learning rate) in the opposite direction of the gradient, yielding

$$ q_i \leftarrow q_i + \gamma \left( e_{ui}\, p_u - \lambda\, q_i \right), \qquad p_u \leftarrow p_u + \gamma \left( e_{ui}\, q_i - \lambda\, p_u \right). \tag{6} $$

Based on the traditional MF model, we can solve the CDCF problems directly. We can pool all the items from different domains together, and then an augmented rating matrix, $\mathbf{M}_{\mathcal{D}}$, can be built by horizontally concatenating all matrices as shown in Figure 2.
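The stochastic gradient descent loop for the regularized factorization can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the small uniform initialization, the fixed seed, and the default values of `gamma`, `lam`, and `epochs` are all assumptions chosen so the toy example stays numerically stable.

```python
import random


def funk_svd(ratings, n_users, n_items, f=2, lam=0.01, gamma=0.02,
             epochs=200, seed=0):
    """Learn user factors p and item factors q from (user, item, rating) triples."""
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Prediction error for this known rating.
            err = r - sum(a * b for a, b in zip(q[i], p[u]))
            for t in range(f):
                qi, pu = q[i][t], p[u][t]
                # Step against the gradient of the regularized squared error.
                q[i][t] += gamma * (err * pu - lam * qi)
                p[u][t] += gamma * (err * qi - lam * pu)
    return p, q


def sse(ratings, p, q):
    """Sum of squared errors on the given triples."""
    return sum((r - sum(a * b for a, b in zip(q[i], p[u]))) ** 2
               for u, i, r in ratings)
```

For MF-CDCF, the same routine would simply be run on the triples of the horizontally concatenated matrix, with item indices offset per domain.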
Thus we can use MF model to obtain the latent user factors and latent item factors. These latent factors are used for prediction. In this paper, the MF model on CDCF problems is denoted as MF-CDCF.
N-CDCF and MF-CDCF are developed directly from single-domain neighborhood and MF based CF methods, respectively. However, single-domain models assume the homogeneity of items. Items in different domains may be quite heterogeneous, and N-CDCF and MF-CDCF fail to take this fact into account. Hence, their performance is not always satisfactory.
Singh and Gordon [10] propose the Collective Matrix Factorization (CMF) model. CMF collectively factorizes one user-item rating matrix $\mathbf{R} \in \mathbb{R}^{n \times m}$, $\mathbf{Y} \odot \mathbf{R} \sim \mathbf{U}\mathbf{V}^T$, and one item-content matrix $\ddot{\mathbf{R}} \in \mathbb{R}^{\ddot{n} \times m}$, $\ddot{\mathbf{R}} \sim \ddot{\mathbf{U}}\ddot{\mathbf{V}}^T$, with the idea of sharing the same item-specific latent features $\mathbf{V}$:

$$ \ddot{\mathbf{V}} = \mathbf{V}, \tag{7} $$

which means that the item-specific latent feature matrix $\mathbf{V}$ is shared as a bridge to enable knowledge transfer between two data sets. Hu et al. [9] propose the CDTF model, in which they consider the full triadic relation user-item-domain to effectively exploit user preferences on items within different domains. They represent the user-item-domain interaction with a tensor of order three and adopt a tensor factorization model to factorize users, items, and domains into latent feature vectors. The rating of a user for an item in a domain is calculated by the element-wise product of user, item, and domain latent factors. A major problem of tensor factorization, however, is that the time complexity of this approach is exponential, as it is $O(k^{p})$, where $k$ is the number of factors and $p$ is the number of domains. In addition, both CMF and CDTF need to adjust the weights of the auxiliary domains according to the similarities between each auxiliary domain and the target domain. Usually, computing proper weights is a tough problem.
Pan et al. [11] present a Transfer by Collective Factorization (TCF) model to transfer knowledge from auxiliary data of explicit binary ratings (like and dislike), which alleviates the data sparsity problem in numerical ratings. TCF collectively factorizes 5-star numerical target data $\mathbf{R}$ and binary like/dislike auxiliary data and assumes that both user-specific and item-specific latent feature matrices are the same. Besides the shared latent features, TCF uses two inner matrices to capture the data-dependent information, which is different from the inner matrix used in CBT [12] and RMGM [13]. TCF requires users and items of the target rating matrix and the auxiliary like/dislike matrix to be both aligned. In addition, it can only deal with the scenario of one auxiliary domain. Hence, it is not applicable to the problem studied in this paper.
In the second class, Li et al. [12] propose a CBT model. They first compress the auxiliary rating matrix, $\ddot{\mathbf{R}} \in \mathbb{R}^{n \times m}$, into an informative and yet compact cluster-level rating pattern representation referred to as a codebook, denoted as $\ddot{\mathbf{B}} \in \mathbb{R}^{d \times d}$. Then, they reconstruct the target rating matrix via the codebook expansion $\mathbf{U}\mathbf{B}\mathbf{V}^T$ with the following constraint:

$$ \mathbf{B} = \ddot{\mathbf{B}}, \tag{8} $$

which means that the rating pattern is shared between target data and auxiliary data. Note that $\mathbf{U} \in \{0, 1\}^{n \times d}$ and $\mathbf{V} \in \{0, 1\}^{m \times d}$ are membership indicator matrices. Further, Li et al. [13] propose an RMGM model. In this model, the knowledge is shared in the form of a latent cluster-level rating model. Each rating matrix can thus be viewed as drawing a set of users and items from the user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. RMGM is an MTL version of CBT with the same assumption. Both CBT and RMGM require two rating matrices to share the cluster-level rating patterns. They assume that the items in an auxiliary data source (e.g., books) are related to the target data (e.g., movies). Hence they are also not applicable to the scenario studied in this paper.

Our Model
Since previous CDCF works cannot assign proper weights to different auxiliary domains, the recommendation performance is not always satisfactory. To overcome this drawback, in this paper, we first construct features in different domains and use the features to represent different domains. Then we combine the constructed features and convert the original recommendation problem into a regression problem. Instead of assigning proper weights to different auxiliary domains, our aim is to assign proper weights to different features. In order to guarantee the accuracy of the weights for different features, we employ a nonparametric regression method, that is, the Locally Weighted Linear Regression (LWLR) model, to solve the regression problem. Below we give the details of our model.
In the target domain, a user-item interaction can be described by the two-dimensional location feature vector $(u, i)$, where $u$ and $i$ denote the location information of user $u$ and item $i$, respectively. However, such a two-dimensional feature vector is not sufficient to discriminate the user ratings. Hence, we require some other features to reflect the user preferences with the help of rating information from the auxiliary domains.
In this paper, we assume the auxiliary domains contain dense rating data and share the same aligned users with the target domain. In this scenario, we employ a user-based nearest neighbor (N-CF-U) algorithm to fill the missing ratings in the auxiliary domains. We expand the trivial location feature vector in the target domain with the corresponding row vectors from all the auxiliary domains. Thus we can effectively add more features to reflect the user preferences. Given a user-item interaction $(u, i, r) \in U_1 \times I_1 \times \{1, 2, 3, 4, 5\}$ in the target domain, where $U_1$ and $I_1$ are the user and item sets of the target domain, we can expand the location feature vector $(u, i)$ with all the row vectors of user $u$ from all the auxiliary domains. Thus the expanded feature vector corresponding to the user-item interaction can be represented as $(u, i, \mathbf{r}_u^1, \ldots, \mathbf{r}_u^K)$, where $\mathbf{r}_u^j$ ($j = 1, \ldots, K$) represents the complete row vector of user $u$ in the $j$th auxiliary domain obtained by the N-CF-U algorithm.

Regression Model Building.
Assume $D_1$ is the target domain and $D_2, \ldots, D_{K+1}$ denote the auxiliary domains. We can model the standard recommendation problem in the target domain $D_1$ by a target function $f: X_1 \times X_2 \times \cdots \times X_{K+1} \to Y$, where $X_1$ denotes the feature vector (i.e., location information) in the target domain, $X_j$ ($2 \le j \le K + 1$) denotes the feature vector (i.e., the corresponding row vector) in the $(j - 1)$th auxiliary domain, and $Y$ denotes the rating value.
For example, we represent each user-item interaction $(u, i, r) \in U_1 \times I_1 \times \{1, 2, 3, 4, 5\}$ with a feature vector $(u, i, \mathbf{r}_u^1, \ldots, \mathbf{r}_u^K)$ and a regression function value $r$. Thus we can represent each user-item interaction as a training sample, and the original recommendation problem can be converted into a regression problem. We use Figure 3 to illustrate our method.
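In Figure 3, four users are shared by all the domains, and $i_1^0$, $i_2^0$, and $i_3^0$ denote three items in the target domain. Each observed rating in the target domain is represented by a feature vector consisting of the location of its user and item followed by that user's row vectors from the auxiliary domains.
For the example interaction in Figure 3, the rating value 2 can be regarded as the regression function value. In the same manner, we can also represent other user-item interactions as training samples. Thus the rating matrix can be converted into a training set, and the recommendation problem becomes a regression problem.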
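The conversion from rating matrices to a regression training set can be sketched in a few lines. The sketch assumes the target matrix is a list of rows with `None` for missing ratings and that each auxiliary matrix has already been filled and has one row per aligned user; the function name `build_training_set` and this data layout are illustrative assumptions.

```python
def build_training_set(target, auxiliaries):
    """Turn each observed target rating into one (feature vector, label) sample.

    target: list of rows, None marks a missing rating.
    auxiliaries: list of filled rating matrices with the same user order.
    """
    X, y = [], []
    for u, row in enumerate(target):
        # Concatenate the user's row vectors from every auxiliary domain.
        aux_features = [r for aux in auxiliaries for r in aux[u]]
        for i, rating in enumerate(row):
            if rating is not None:
                X.append([u, i] + aux_features)  # location features first
                y.append(rating)                 # rating is the regression target
    return X, y


target = [[2, None], [None, 5]]
aux_a = [[1, 2, 3], [4, 5, 3]]
aux_b = [[5], [1]]
X, y = build_training_set(target, [aux_a, aux_b])
```

Each feature vector here has length 3 + 1 + 2 = 6, that is, the auxiliary item counts plus the two location features. The missing target entries are the query points to be predicted later.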

Regression Model Solving.
In the constructed regression problem, the dimension of the feature vector is $n_2 + n_3 + \cdots + n_{K+1} + 2$, where $n_j$ ($2 \le j \le K + 1$) is the item size of the $(j - 1)$th auxiliary domain, which is always very large in real-world applications. As a consequence, the constructed regression problem is very high-dimensional. Below we propose two methods to improve the learning performance of the constructed regression problem.
First, we retain only the top-rated items from the original rating matrices of the auxiliary domains. The users' ratings on the retained columns compose a denser submatrix. Thus we can effectively reduce the dimension of the problem while preserving the user preferences as much as possible.
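A small sketch of this column selection follows. It assumes that "top rated" means the items with the largest number of ratings, which is what makes the submatrix denser; that reading, the parameter name `h`, and the helper names are assumptions for illustration.

```python
def top_rated_submatrix(matrix, h):
    """Keep the h columns (items) that have the most observed ratings.

    matrix: list of rows, None marks a missing rating.
    Returns the submatrix and the kept column indices in original order.
    """
    n_items = len(matrix[0])
    counts = [sum(row[j] is not None for row in matrix) for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: counts[j], reverse=True)
    keep = sorted(ranked[:h])
    return [[row[j] for j in keep] for row in matrix], keep


def density(matrix):
    """Fraction of observed entries in a rating matrix."""
    cells = [v for row in matrix for v in row]
    return sum(v is not None for v in cells) / len(cells)


m = [[5, None, 3], [4, None, None], [1, 2, 2]]
sub, kept = top_rated_submatrix(m, 2)
```

In this toy matrix the density rises from 6/9 to 5/6 after dropping the least-rated column.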
Second, parametric methods tend to underfit or overfit high-dimensional problems. For example, a linear or quadratic regression model may not fit the high-dimensional data well (underfitting), while a high-order regression model may fit the training data too closely (overfitting). It is difficult to choose a proper order or a proper form for parametric regression models. To overcome this drawback, we employ a nonparametric regression model, that is, Locally Weighted Linear Regression (LWLR), for the constructed regression problem. Details of the LWLR model are given in the following.
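First, we expand the constructed feature vector by adding $x_0 = 1$ (this is the intercept term). Then we build the LWLR model in the following form:

$$ \min_{\theta} \; \frac{1}{2} \sum_{i} w^{(i)} \left( y^{(i)} - \theta^T \mathbf{x}^{(i)} \right)^2, \qquad \theta^T \mathbf{x}^{(i)} = \sum_{j=0}^{n} \theta_j x_j^{(i)}, \qquad w^{(i)} = \exp\!\left( -\frac{\|\mathbf{x}^{(i)} - \mathbf{x}\|^2}{2\tau^2} \right), \tag{10} $$

where $\theta$ denotes the coefficient vector of the linear equation; $w^{(i)}$ denotes the weight parameter computed by a Gaussian kernel function; $\tau$ is called the bandwidth parameter and is determined by cross-validation [16]; $n$ denotes the dimension of the constructed feature vector; $i$ denotes the index of training samples; $\mathbf{x}^{(i)}$ denotes the feature vector of the $i$th training sample; $y^{(i)}$ denotes the corresponding rating value; and $\mathbf{x}$ is the query point. Because the weights depend on the query point, a separate $\theta$ must be fitted for each query; thus LWLR is also a lazy learning method.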
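As LWLR is a nonparametric regression method, it can effectively avoid the underfitting and overfitting problems that occur in parametric regression methods. In this paper, we use stochastic gradient descent to solve the optimization problem (10). The update formula of $\theta$ is in the following form, where $\gamma$ is the learning rate and $w^{(i)}$ is the Gaussian kernel weight of the $i$th training sample:

$$ \theta_j \leftarrow \theta_j + \gamma\, w^{(i)} \left( y^{(i)} - \theta^T \mathbf{x}^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n. \tag{11} $$

The detailed algorithm is shown in Algorithm 1.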
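A compact sketch of LWLR prediction with stochastic gradient descent is given here for orientation. It is not Algorithm 1 itself: the function names, the zero initialization, the fixed number of epochs, and the default `tau` and `gamma` are assumptions, and a real run on rating features would likely need feature scaling and a tuned learning rate.

```python
from math import exp


def gaussian_weights(X, query, tau):
    """Kernel weight of every training sample relative to the query point."""
    return [exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * tau ** 2))
            for x in X]


def lwlr_predict(X, y, query, tau=1.0, gamma=0.05, epochs=1000):
    """Fit a locally weighted linear model around `query` and evaluate it there."""
    Xb = [[1.0] + list(x) for x in X]       # prepend the intercept term x_0 = 1
    qb = [1.0] + list(query)
    w = gaussian_weights(X, query, tau)     # weights are specific to this query
    theta = [0.0] * len(qb)
    for _ in range(epochs):
        for xi, yi, wi in zip(Xb, y, w):
            err = yi - sum(t * v for t, v in zip(theta, xi))
            # Weighted SGD step: nearby samples move theta more than distant ones.
            theta = [t + gamma * wi * err * v for t, v in zip(theta, xi)]
    return sum(t * v for t, v in zip(theta, qb))


X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]                     # exactly y = 2x + 1
prediction = lwlr_predict(X, y, [1.5])
```

Because the toy data are exactly linear, the local fit should recover the line and the prediction at 1.5 should be close to 4. Note that the whole fit is repeated for every query point, which is the cost of a lazy learning method.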
The complete algorithm of FCLWLR is given in Algorithm 2.

Experiments
In this section, we conduct extensive experiments to test the performance of the proposed algorithm. We compare our algorithm with seven state-of-the-art algorithms, namely, N-CF-U, UVD [6], CFONMTF [17], N-CDCF-U, MF-CDCF, CMF, and CDTF, where N-CF-U, UVD, and CFONMTF are single-domain CF algorithms and the other four are cross-domain CF algorithms.

Algorithm 2: FCLWLR.
Input: the incomplete rating matrices $R_1, R_2, \ldots, R_{K+1}$ corresponding to $D_1, D_2, \ldots, D_{K+1}$
Output: the complete rating matrix $R_1$
(1) Retain the top-rated items in each auxiliary domain (the number of retained items may differ across auxiliary domains) to obtain denser submatrices;
(2) Use the N-CF-U algorithm to fill the missing ratings in the submatrices;
(3) Feature construction in the target domain;
(4) Feature construction in the auxiliary domains;
(5) Convert the recommendation problem into a regression problem;
(6) Train a regression model on the obtained training set based on Algorithm 1;
(7) Predict the missing ratings in the target domain.

Because the metadata are stored in the format shown in Box 1, recommendation algorithms cannot be run on them directly. Hence we first convert the data into a set of triples, $(u, i, r)$, where $r$ is the rating of user $u$ on item $i$.

The Setting of the Compared Methods
(1) N-CF-U. A user-based neighborhood CF model: in this experiment, we use the $k = 10$ closest users.
(2) UVD. The UV decomposition model maps both users and items to a joint latent factor space of dimensionality $f$. In our experiment, we try different numbers of latent factors, $f \in \{5, 10, 15, 20\}$. The weight of the regularization terms, $\lambda$, is tried with different values, $\{0.001, 0.01, 0.1, 1, 10, 100\}$. The learning rate $\gamma$ is a constant typically having a value between 0.0 and 1.0. If the learning rate is too small, then learning will occur at a very slow pace. If the learning rate is too large, then oscillation between inadequate solutions may occur. In this paper, for simplicity, we set $\gamma = 0.3$.
(3) CFONMTF [17]. A coclustering based collaborative filtering model using orthogonal nonnegative matrix trifactorization: following the parameter setting method in [17], we compute the optimal values of $\lambda$ and $\delta$ alternately, where $\lambda$ and $\delta$ determine the weights of three different models: ONMTF, user-based, and item-based. In detail, we conducted two experiments on each training set to identify the optimal combination coefficients. First, we let $\delta = 0$ and compute the optimal value of $\lambda$, which corresponds to the best evaluation metric when $\lambda$ varies from 0 to 1. Second, we fix $\lambda$ to its optimal value and compute the optimal value of $\delta$. Besides, we choose 20 as the number of user/item clusters, 30% as the percentage of preselected user/item neighbors, and 20 as the size of user/item neighbors.
(6) CMF. This is the collective matrix factorization, which couples rating matrices for all domains on the User dimension so as to transfer knowledge through the common user-factor matrix.
(7) CDTF. The Cross-Domain Triadic Factorization model: we use the same setting as that in [9]. More specifically, we also initialize the individuals with exponentially growing weights, where $\alpha \in (0, 1]$ is a constant that scales the weight, $a$ and $b$ are integers that control the range of the weight, and $\mathbf{1}$ is an all-one vector with length equal to the number of auxiliary domains. In this experiment, to find the optimal weight assignment, we ran the GA with the initial population $\mathbf{w} = \{\mathbf{w}_{0.33}, \mathbf{w}_{0.66}, \mathbf{w}_{1}\}$ and $a = -2$, $b = 2$; that is, there are 15 initial individuals in total with different scales.
(8) FCLWLR. The proposed method using rating data from all the auxiliary domains: in this experiment, for the rating filling process with the N-CF-U algorithm, we also use the $k = 10$ closest users. Different bandwidth parameters $\tau^2 \in \{0.125, 0.25, 0.5, 1, 2, 4, 8\}$ are tried, and we use cross-validation to select the best parameter value. For simplicity, we also set the learning rate $\gamma = 0.3$.
(9) FCLWLR CD. This is the proposed method only using rating data from the auxiliary domain of Music CDs.
(10) FCLWLR DVD. This is the proposed method only using rating data from the auxiliary domain of DVDs.
(11) FCLWLR VHS. This is the proposed method only using rating data from the auxiliary domain of VHS video tapes.

Evaluation Protocol.
We first use mean absolute error (MAE) as an evaluation metric in our experiments. MAE is defined as

$$ \mathrm{MAE} = \frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} \left| r_{ui} - \tilde{r}_{ui} \right|, $$

where $\mathcal{T}$ denotes the set of test ratings, $r_{ui}$ is the ground truth, and $\tilde{r}_{ui}$ is the predicted rating. A smaller value of MAE means a better performance. However, what we often want is not to make a rating prediction for any item but to find the best items. In Top-$N$ recommendations, a recommender is trying to pick the best items for someone. Hence, a model with a smaller value of MAE does not necessarily give a better recommendation performance. Rather than getting the exact rating right, in Top-$N$ recommendations we are interested in predicting whether an item would be among the user's favorites.
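As a quick illustration of the MAE computation, the following sketch assumes test ratings and predictions are stored in dictionaries keyed by (user, item) pairs; the function name and the toy values are illustrative.

```python
def mean_absolute_error(truth, predicted):
    """Average absolute difference between true and predicted test ratings.

    truth, predicted: dicts mapping (user, item) -> rating, same keys.
    """
    return sum(abs(truth[key] - predicted[key]) for key in truth) / len(truth)


truth = {(0, 0): 5, (0, 1): 3}
predicted = {(0, 0): 4.0, (0, 1): 3.5}
mae = mean_absolute_error(truth, predicted)  # (1.0 + 0.5) / 2
```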
In our experiments, we also use two other metrics commonly used in information retrieval, that is, precision and recall, to measure the recommendation quality. Let $R(u)$ be the recommended list based on the behavior of user $u$ on the training set, and let $T(u)$ be the "liked" list of behaviors that the user has on the test set. Then, the precision of the recommended results is defined as

$$ \mathrm{Precision} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |R(u)|}. $$

The recall of the recommended results is defined as

$$ \mathrm{Recall} = \frac{\sum_{u} |R(u) \cap T(u)|}{\sum_{u} |T(u)|}. $$

Precision gives us an estimate of how many of the items predicted to be "liked" for a user really belong to the "liked" list. Recall estimates how many of all the items in the user's "liked" list were predicted correctly.
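Precision and recall for a Top-N list can be computed along these lines. The sketch assumes counts are pooled over users before dividing and that an item counts as "liked" when its true rating is at least a threshold; the function name, the dictionary layout, and the default threshold of 3 are assumptions for illustration.

```python
def precision_recall(predicted, actual, n, like_threshold=3):
    """Pooled precision and recall of Top-n lists over all users.

    predicted: dict user -> dict item -> predicted rating.
    actual:    dict user -> dict item -> true rating on the test set.
    """
    hits = recommended = liked_total = 0
    for user, scores in predicted.items():
        # Recommend the n items with the highest predicted ratings.
        top_n = sorted(scores, key=scores.get, reverse=True)[:n]
        liked = {i for i, r in actual[user].items() if r >= like_threshold}
        hits += len(set(top_n) & liked)
        recommended += len(top_n)
        liked_total += len(liked)
    precision = hits / recommended if recommended else 0.0
    recall = hits / liked_total if liked_total else 0.0
    return precision, recall


predicted = {"u": {"a": 4.5, "b": 2.0, "c": 3.9, "d": 1.0}}
actual = {"u": {"a": 5, "b": 4, "c": 2, "d": 1}}
precision, recall = precision_recall(predicted, actual, n=2)
```

In the toy case the list is (a, c) and the liked set is {a, b}, so one of two recommendations is a hit and one of two liked items is recovered.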

Data Preparation for MAE.
We construct two data sets to conduct the experiment. In one data set, we selected Books as the target domain and Music CDs, DVDs, and VHS video tapes as the auxiliary domains. In the other data set, we selected Music CDs as the target domain and Books, DVDs, and VHS video tapes as the auxiliary domains.
For the first data set, we selected users who have rated at least 30 music CDs, 30 DVDs, and 30 VHS video tapes so as to construct denser rating matrices in the auxiliary domains. Finally, 496 users were selected, and in addition we retrieved all items rated by these users in these four domains and kept the top-rated items for each domain, respectively. Thus the submatrices in the auxiliary domains are much denser than the original rating matrices. Table 1 shows the statistics of the data set for evaluation.
For the second data set, we selected users who have rated at least 90 Books, 30 DVDs, and 30 VHS video tapes so as to construct denser rating matrices in the auxiliary domains. Finally, 435 users were selected, and in addition we also retrieved all items rated by these users in these four domains and kept the top-rated items for each domain, respectively. Table 2 shows the statistics of the data set for evaluation.
To simulate the sparse data problem, we constructed two sparse training sets, $tr_{20}$ and $tr_{75}$, by holding out 80% and 25% of the data from the target domain Books, respectively; that is, the remaining data of the target domain for training is 20% and 75%. The hold-out data serve as ground truth for testing. Likewise, we also construct two other training sets $tr_{20}$ and $tr_{75}$ when choosing Music CDs as the target domain.
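The hold-out construction can be sketched as a seeded random split of the target-domain triples. The function name, the fixed seed, and the uniform random sampling are assumptions; the original text does not say how the held-out ratings were chosen.

```python
import random


def hold_out(triples, train_fraction, seed=0):
    """Randomly keep `train_fraction` of the ratings for training.

    Returns (training triples, held-out triples used as ground truth).
    """
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 toy ratings; keeping 20% mimics a training set like tr_20.
triples = [(u, i, 3) for u in range(10) for i in range(10)]
train, test = hold_out(triples, 0.2)
```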

Data Preparation for Precision and Recall.
We choose Books as the target domain and Music CDs, DVDs, and VHS video tapes as the auxiliary domains. We selected users who have rated at least 100 books so that there are enough observations to be split into various proportions of training and testing data for our evaluation. Finally, 586 users were selected, and in addition we retrieved all items rated by these users in the four domains and kept the top $h$ rated items for each domain, respectively. Table 3 shows the statistics of the filtered data. Then, we constructed rating matrices over the filtered data for each domain.
To simulate the sparse data problem, we constructed five sparse training sets, $TR_{50}$, $TR_{40}$, $TR_{30}$, $TR_{20}$, and $TR_{10}$, by holding out 50%, 60%, 70%, 80%, and 90% of the rating data from the target domain Books, respectively; that is, the remaining data for training is 50%, 40%, 30%, 20%, and 10%. The testing set is constructed as follows. We first select, from the hold-out data, the users who have rated more than 20 books. Then we select 20 books randomly for each selected user as the testing set. In order to compute the precision and recall, for the testing set, we also map the five classes of original ratings $\{1, 2, 3, 4, 5\}$ into two classes, "liked" and "disliked." Usually, an item with a score greater than or equal to 3 is defined as "liked"; otherwise, it is defined as "disliked." We define the size of the recommendation list as $N = 3, 6, 9$ and the set of all the books liked by the user among the 20 books as the "liked" list. We sort the predicted ratings of the 20 books for each user in the testing set and choose the top $N$ books for recommendation. The books in the recommendation list are labeled as "liked." Hence, we can compute the precision and recall.

Impact of Rating Densities in Auxiliary Domains.
FCLWLR requires that auxiliary domains contain dense rating data. A very sparse rating matrix from an auxiliary domain will not improve the recommendation performance in the target domain. In this part, we analyze how the performance of FCLWLR is affected by rating densities in auxiliary domains. We construct the experiment data in the following way. For simplicity, we just use $tr_{20}$ as the training data in the target domain. For data in auxiliary domains, we constructed four different data sets that retain 100%, 75%, 50%, and 25% of the rating data by holding out 0%, 25%, 50%, and 75% of the rating data, respectively. We use MAE to evaluate the performance of FCLWLR on different data sets.
The results of the comparison experiments are reported in Tables 4 and 5 and Figures 4 and 5. The performances of FCLWLR on different data sets are given in Figure 6.
As shown in Table 5 and Figure 5, the proposed models using rating data from one auxiliary domain or from all the auxiliary domains all perform better than the UVD model, which uses only the rating data from the target domain. FCLWLR DVD and FCLWLR VHS outperform FCLWLR CD because DVDs and VHS video tapes are more related to books than music is: many movies are adapted from novels, and movies and books have some correspondence in genre. Besides, FCLWLR performs best because more user features are considered in the regression model, which improves the regression performance.
According to Table 4, it is also worth noting that, from $tr_{20}$ to $tr_{75}$, our method shows the largest performance improvement, because, as the number of training ratings increases, the training set size of the converted regression problem also increases. Thus the regression model can effectively avoid overfitting, and the performance improves. N-CDCF-U also achieves a reasonable performance when the data are relatively dense, that is, on $tr_{75}$, but its performance decreases quickly when the data become sparser, because with sparse data the total similarity used in N-CDCF-U cannot represent the local similarity in the target domain well. However, according to (1), as the number of training ratings increases, the total similarity represents the local similarity in the target domain better.
From Figures 4 and 5, we can also obtain two important conclusions in the following.
(1) Precision and recall metrics always depend on the length of the recommended list $N$. In general, as $N$ increases, the precision metric will decrease and the recall metric will increase.
(2) If $N < M$, where $M$ denotes the length of the "liked" list, the recall metric for any model will not be greater than $N/M$.
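The bound in (2) follows in one line. As a sketch for a single user $u$, write $R(u)$ for the recommended list of length $N$ and $T(u)$ for the "liked" list of length $M$ (these symbols are introduced here for illustration). The overlap can never exceed the length of the recommended list, so:

```latex
\mathrm{recall}(u)
  = \frac{|R(u) \cap T(u)|}{|T(u)|}
  \le \frac{|R(u)|}{|T(u)|}
  = \frac{N}{M},
\qquad \text{since } |R(u) \cap T(u)| \le |R(u)| = N .
```

For instance, with a hypothetical "liked" list of $M = 12$ books and $N = 3$ recommendations, recall cannot exceed $3/12 = 0.25$ even if every recommended book is liked.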
We have the following observations from Figure 6. (1) When the rating matrices of auxiliary domains are relatively dense (e.g., with 100% or 75% of the auxiliary ratings retained), our model FCLWLR performs well. The performance of FCLWLR, however, is unsatisfactory when the rating matrices of auxiliary domains are sparse (e.g., with 50% or 25% of the auxiliary ratings retained). (2) FCLWLR even performs worse than UVD, a single-domain CF algorithm, when the rating matrices of auxiliary domains are very sparse. The main reason may be that, as the rating matrices become sparser, noisy data from auxiliary domains have a worse impact on the recommendation performance in the target domain.

Conclusion
In this paper, from the perspective of regression, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). On the one hand, the FCLWLR model avoids computing proper weights for different domains, since we construct features in each domain and use the features to represent the domains. On the other hand, the FCLWLR model can guarantee the accuracy of the weights for different features, because the LWLR model is a nonparametric regression method, which can effectively avoid the underfitting and overfitting problems. The experimental results have shown that FCLWLR can significantly outperform all other state-of-the-art baseline algorithms at various sparsity levels of the target domain.
In this paper, we have only discussed how to construct features about users with the help of rating information in the auxiliary domains. In our future work, we will explore how to construct features about items and how to combine both user and item features to provide better recommendations. Besides, FCLWLR requires relatively rich rating data from the auxiliary domains. The experimental results show that FCLWLR even performs worse than single-domain CF algorithms when the rating matrices of auxiliary domains are very sparse. It would be interesting to determine the level of sparsity in the auxiliary domains at which the effectiveness of FCLWLR degrades. This is worth studying in our future work.

Conflicts of Interest
The authors declare that they have no conflicts of interest.