A Probability-Based Hybrid User Model for Recommendation System

With the rapid development of information and communication technology, the amount of available information and knowledge has increased exponentially, causing the well-known information overload phenomenon. This problem is especially serious in product design corporations, where over half of the valuable design time is consumed by knowledge acquisition, which greatly extends the design cycle and weakens competitiveness. Therefore, recommender systems have become very important in the product design domain. This research presents a probability-based hybrid user model, which is a combination of collaborative filtering and content-based filtering. This hybrid model uses user ratings and item topics or classes, which are available in the product design domain, to predict the knowledge requirement. A comprehensive analysis of the experimental results shows that the proposed method achieves better performance under most parameter settings. This work contributes a probability-based method to the community for implementing a recommender system when only user ratings and item topics are available.


Introduction
Many researchers believe that comprehensive utilization of accumulated knowledge is vital to maintaining a competitive advantage in today's knowledge-based economy, particularly within research and development organizations [1][2][3]. According to statistics, over 90% of product design is variant design [4], which means new requirements can be satisfied by simply revising accumulated design knowledge such as solution reports, 3D models, and others [5]. Therefore, design knowledge acquisition and utilization are of great importance for completing design tasks. However, the process of knowledge acquisition usually takes up over half of the design time, which significantly affects design efficiency and extends the design cycle. This is caused by the separation between design knowledge and designers; in that situation, designers inevitably have to ask for or search for the required information and design knowledge. Therefore, a recommender system is urgently needed to decrease the time spent on design knowledge acquisition.
To manage design knowledge, a number of design knowledge management systems, such as experience knowledge bases, 3D model bases, standard parts bases, and so forth, are deployed in many organizations. These knowledge management systems have recorded the information needed for implementing a recommender system. On one hand, in most knowledge management systems, a set of carefully predefined knowledge topics or classes is used to organize the design knowledge existing in the knowledge base, the most commonly used being the product structure (Bill of Material (BOM)). On the other hand, many knowledge management systems record the ratings that designers have given to the design knowledge. Before this research work, we had developed a recommender system for designers based on the method by Al-Shamri and Bharadwaj [6], which combines user ratings and product topics, and the system was deployed in a practical environment. After a period of operation, there arose a requirement to further improve the performance of the recommender system. Therefore, this research work is an attempt to improve the performance of recommender systems that are based on the topics and ratings of design knowledge. In this work, we present a probability-based hybrid user model; based on that, the corresponding similarity calculation method as well as the recommendation method is proposed. The experimental results show that this model obtains better performance under most parameter settings. The main contributions of this research are (1) combining the ratings and topics by a simple probability formula and (2) providing a new choice for implementing a recommender system when only ratings and topics are available. The remainder of this paper is structured as follows. In Section 2, the related techniques are reviewed and analyzed. Then Section 3 presents the proposed method for constructing the probability-based hybrid recommendation model. Following that, Section 4 gives a comprehensive experimental analysis. Finally, Section 5 draws the conclusion and discusses future work.

Related Works
In modern enterprises, tremendous amounts of data and information are stored, and even more are generated than ever before. One side effect for the staff is information overload, which means the staff is provided with more information than they can process efficiently [7][8][9]. Recommendation systems have emerged as a research methodology in response to this problem [10]. According to [7], there are four kinds of filtering techniques that recommender systems may leverage: the content-based method (CB) [11], the collaborative filtering method (CF) [12], the knowledge-based method (KB) [13], and the hybrid method [14]. Each method has its own advantages and disadvantages; for example, the CF method has the advantage of being independent of an item's content and the disadvantage of suffering from sparse rating data. To overcome the disadvantages and combine the advantages of the different methods, the hybrid recommender system was proposed [15]. A hybrid recommender system is one that combines multiple recommendation techniques to produce better performance [7, 16]. According to [15], there are seven strategies for combining different recommender techniques: weighted, mixed, switching, feature combination, feature augmentation, cascade, and metalevel.
In recent years, CF has become one of the most widely implemented recommender methods [6, 17, 18]. Researchers in the area tend to combine it with the CB method to achieve high scalability while maintaining relatively high prediction accuracy [6, 10, 19]. For example, Rastin and Zolghadri Jahromi [20] proposed a hybrid method that integrates collaborative filtering and content-based methods; they argue that two users are similar if their ratings on items with similar content are similar. Yao et al. [21] built a hybrid user model combining collaborative filtering and content-based filtering to dynamically recommend Web services. Ronen et al. [22] presented a framework for automatically selecting informative content-based features; this framework is independent of the type of recommender system, which means the method generalizes well to different recommender systems. Barragáns-Martínez et al. [23] developed a Web 2.0 TV recommendation system named queveo.tv; they proposed a hybrid method that combines content-filtering techniques with collaborative filtering, and a social network was also integrated to improve recommendation performance. Berkovsky et al. [24] proposed a method to merge different user models from different systems and use the merged user model to make recommendations. The same authors transferred this merged user model from a collaborative filtering system to a content-based recommendation system [25]. Degemmis et al. [26] proposed a new content-collaborative hybrid recommender that computes similarities between users based on their content-based profiles instead of comparing their rating styles; the content-based user profiles play a key role in the proposed hybrid recommender. Many of the existing studies rely on the description content of the items, which is unavailable in many situations, especially in the product design domain.
When the description content of the items is unavailable, the knowledge related to users or items can be leveraged for a recommender system. Ontology-based or semantic-driven recommendation is a research branch that utilizes knowledge about items for recommendation [7, 16]. In these methods, an ontology is used to model a large amount of related knowledge for the recommender system [27]. If knowledge related to users or items can be obtained, this approach can contribute to the performance of recommender systems. Moreno et al. [28] developed a Web-based system that provides personalized recommendations of tourist activities in the region of Tarragona, where an ontology is used to classify and label the tourist activities for a further reasoning process.
The above-mentioned methods cannot fully address the situation we are facing. On one hand, most of the content-based and collaborative hybrid methods require description content, which is unavailable in our problem. On the other hand, we have no complicated knowledge about the items except the predefined simple topics or classes, and ontology construction itself is an extremely complex process that requires many experts to work collaboratively. Therefore, building an ontology for the recommender system is not justified in this research work. Among the literature, Al-Shamri and Bharadwaj [6] developed an approach that uses ratings and topics to build a hybrid model, which well matches our requirements and situation. Based on their work, we develop a new probability-based method, and the experimental results show that the newly developed method is better in most situations. Therefore, we contribute a method to the research community that can be used when only the topics or classes and the ratings are available.

Proposed Method
3.1. Symbols Definition. We first define the symbols used in the following sections. Let U = {u_1, u_2, ..., u_|U|} be the set of all users, let I = {i_1, i_2, ..., i_|I|} be the set of all items, and let T = {t_1, t_2, ..., t_|T|} be the set of all topics. We define R as the two-dimensional user-item rating matrix, where the element r_{u,i} ∈ S ∪ {0} represents the rating of user u ∈ U on item i ∈ I. The value of r_{u,i} is either on a numerical ordinal scale (e.g., 1 to 5) or zero, which indicates an unknown rating (not rated). Here, S = {s_1, s_2, ..., s_|S|} is defined as the set of rating scores.

3.2. Database Validation. In many situations, items are classified into one or more specific topics according to their content; the number of topics is normally far smaller than the number of items. In this paper, we attempt to formulate a hybrid user model for item recommendation by combining the rating data and the topic data. Before we develop the hybrid user model, one question should be addressed first: whether the topics can be used to express the interests of the users.
For a given database D, which includes the ratings R and the item topics, we cannot guarantee that the user interests can be expressed by the item topics. An obvious example is that, when most of the items belong to a small number of topics, the topics are unable to distinguish user interests. Therefore, we need to analyze and confirm whether the topics are capable of expressing user interests before building the user model.
We determine the relationship between users and topics using UT = R × IT, where IT denotes the |I| × |T| item-topic matrix; each element ut_{u,t} in the matrix indicates the total score that user u ∈ U has given to topic t ∈ T. The row vector UT_u = [ut_{u,1}, ut_{u,2}, ..., ut_{u,|T|}] represents the total scores that user u ∈ U has given to every topic. For a specific user u ∈ U, we define cap_u as the metric to measure the coverage of the user's interests:

cap_u = |TOP_u| / |T|, (1)

where TOP_u is the set of topics that the user may be interested in (their total scores are higher than those of the other topics) and it meets the following condition:

Σ_{t ∈ TOP_u} ut_{u,t} ≥ α Σ_{t ∈ T} ut_{u,t}, (2)

where α is the threshold value that determines how many interested topics the user has. Based on that, we define cap_T as the metric to measure the topics' capability of expressing the interests of all users:

cap_T = (1/|U|) Σ_{u ∈ U} cap_u. (3)

When most of the cap_u values and cap_T are small compared with 1, we infer that the users' interests are concentrated on a small number of topics. In other words, the topics are able to distinguish the users' interests.
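To make the validation step concrete, the following minimal sketch computes the user-topic matrix (UT = R × IT in our notation) and the coverage metric cap_u for a toy database. The function and variable names, the greedy choice of TOP_u as the smallest set of top-scored topics covering the threshold fraction `alpha`, and the toy data are ours, not from the paper:

```python
import numpy as np

def user_topic_matrix(R, IT):
    """UT = R x IT: total score each user has given to each topic."""
    return R @ IT

def cap_u(ut_row, alpha=0.8):
    """Fraction of topics needed to cover alpha of the user's total score."""
    total = ut_row.sum()
    if total == 0:
        return 0.0
    covered, k = 0.0, 0
    for score in np.sort(ut_row)[::-1]:   # topics by descending total score
        if covered >= alpha * total:
            break
        covered += score
        k += 1
    return k / len(ut_row)

# toy database: 2 users, 3 items, 2 topics
R = np.array([[5, 4, 0],
              [1, 0, 5]])
IT = np.array([[1, 0],
               [1, 0],
               [0, 1]])
UT = user_topic_matrix(R, IT)
caps = [cap_u(row) for row in UT]
cap_T = sum(caps) / len(caps)
```

Small cap_u values mean the user's score mass concentrates on few topics, so the topics discriminate well between interests.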

User Model Construction.
In the traditional CF method, the user model is simply represented by a rating vector in which all ratings that the user has given to items are recorded.
Similarly, we could use a row vector of UT to represent the user model, recording the sum of the ratings that the user has given to all items belonging to each topic. However, this method is unreasonable because the sums of ratings may be very close even though the individual ratings are quite different. For example, user u_1 rates 5 items belonging to topic t_1 with scores 1, 4, 5, 1, and 1, and user u_2 also rates 5 items belonging to topic t_1, with scores 5, 1, 1, 3, and 2. Although both sums equal 12, we cannot conclude that the two users have similar interests. Based on the above analysis, we adopt probabilities to construct the user model. The basic idea is that two similar users should have similar probabilities of giving a specific score to every topic. For example, if the probabilities that user u_1 rates 1, 2, 3, 4, and 5 on items belonging to topic t_1 are 10%, 10%, 20%, 40%, and 20%, and user u_2 has interests similar to user u_1, then user u_2 should have similar probabilities of giving each score to items belonging to topic t_1.
In this work, a matrix is used to represent the user model (UM), as shown in formula (4). The UM is an |S| × |T| matrix whose row vectors indicate the probability that the user gives a specific score to each topic:

UM = [P(s_j | t_k)], j = 1, ..., |S|, k = 1, ..., |T|, (4)

where P(s_1 | t_1) denotes the probability that the user gives score s_1 to topic t_1. The value of P(s_1 | t_1) is calculated according to the Bayes formula:

P(s | t) = P(s ∩ t) / P(t), (5)

where P(s ∩ t) is the probability that user u ∈ U gives score s ∈ S when the rated item i ∈ I belongs to topic t ∈ T; P(t) is the probability that an item i ∈ I which user u ∈ U has rated belongs to topic t ∈ T; and both P(s ∩ t) and P(t) can be determined from the matrix R and the item-topic matrix IT.
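The conditional probabilities above can be estimated by simple counting over a user's rating history. The sketch below is an illustrative implementation; the data layout (dictionaries keyed by item id) and all names are our own assumptions:

```python
import numpy as np

def user_model(ratings, item_topics, scores=(1, 2, 3, 4, 5), n_topics=2):
    """Build the |S| x |T| probability matrix UM for one user.

    ratings: {item_id: score}; item_topics: {item_id: set of topic ids}.
    UM[j, k] estimates P(score s_j | item belongs to topic t_k).
    """
    um = np.zeros((len(scores), n_topics))
    for k in range(n_topics):
        # items the user has rated that belong to topic k
        in_topic = [i for i in ratings if k in item_topics[i]]
        if not in_topic:
            continue  # P(t_k) = 0: leave the column at zero
        for j, s in enumerate(scores):
            um[j, k] = sum(1 for i in in_topic if ratings[i] == s) / len(in_topic)
    return um

# toy user: four rated items, two topics (an item may carry both topics)
ratings = {0: 5, 1: 5, 2: 1, 3: 4}
item_topics = {0: {0}, 1: {0}, 2: {1}, 3: {0, 1}}
um = user_model(ratings, item_topics)
```

Each non-empty column of `um` sums to 1, since the scores partition the ratings within a topic.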

Neighborhood Calculation.
In this research work, we use CF as the method for making predictions. Therefore, the calculation of the neighborhood is a critical step. The number of neighbors can be fixed or floating [6]. A fixed neighborhood means selecting the top k users that have the highest similarities with the active user, while a floating neighborhood means determining the neighborhood by a similarity threshold. We use a fixed number of neighbors in the experimental section.
The essential problem in calculating the neighborhood is the measurement of similarity between two user models. The similarity measurement is highly dependent on the user model, which means that a different user model requires a corresponding similarity measurement. In this work, the user model is represented by a probability matrix, which is very different from the traditional approach of representing the user model as a rating vector. When the user model is a rating vector, the similarity can easily be figured out by many existing measures, such as the Euclidean distance, the Manhattan distance, and cosine similarity.
In this work, the basic idea of measuring the similarity of two user models is counting the topics in which both users are interested, as shown in formula (6). For example, if user u_1 and user u_2 share five topics while user u_1 and user u_3 share three topics, we believe that the similarity between u_1 and u_2 is higher than the similarity between u_1 and u_3:

sim(u_a, u_b) = |pos(u_a) ∩ pos(u_b)| / |pos(u_a) ∪ pos(u_b)|, (6)

where sim(u_a, u_b) denotes the similarity between user u_a and user u_b; pos(u_a) represents the set of interested topics of user u_a; and the denominator is the total number of interested topics of u_a and u_b. The set pos(u_a) is computed by the following formula:

pos(u_a) = {t ∈ T | pos(u_a, t) ≥ λ}, (7)

where λ is a threshold used to determine whether a specific topic is an interested topic; when λ is high, it is more difficult for a topic to be selected as an interested topic. pos(u, t) denotes the sum of the positive probabilities of user u on topic t.
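The similarity of formula (6) can be sketched as follows. This is an illustrative Python implementation under our assumptions: `lam` stands for the interest threshold, the rows of a user-model matrix correspond to scores 1 to 5, and the "positive" rows are those for scores 3 to 5; the toy matrices are invented:

```python
def pos_topics(um, lam=0.4, high=(2, 3, 4)):
    """Topics whose summed positive probability (rows `high`, i.e.,
    scores 3-5 on a 1-5 scale) reaches the threshold lam."""
    n_topics = len(um[0])
    return {t for t in range(n_topics)
            if sum(um[s][t] for s in high) >= lam}

def similarity(um_a, um_b, lam=0.4):
    """Shared interested topics over all interested topics of both users."""
    pa, pb = pos_topics(um_a, lam), pos_topics(um_b, lam)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

# two users over 3 topics; rows are scores 1..5, columns are topics
um_a = [[0.0, 0.8, 0.0],
        [0.1, 0.1, 0.0],
        [0.2, 0.1, 0.3],
        [0.3, 0.0, 0.3],
        [0.4, 0.0, 0.4]]
um_b = [[0.0, 0.1, 0.9],
        [0.1, 0.1, 0.1],
        [0.2, 0.3, 0.0],
        [0.3, 0.2, 0.0],
        [0.4, 0.3, 0.0]]
```

Here both users like topic 0, but only one likes topic 1 and only the other likes topic 2, so the similarity is 1/3.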
Here, a positive probability indicates the probability that the user will give a high score to a topic, reflecting the degree to which the user may be interested in that topic. For example, if the value P(s = 5 | t) is high, we may infer that the user is interested in topic t. On the other hand, a passive (negative) probability indicates the probability that the user will give a low score to a topic, which refers to the probability that the user dislikes that topic. For example, if the value P(s = 1 | t) is high, we believe the user dislikes topic t. In this research work, only the positive probabilities are considered when calculating the similarity of two user models. In general, the high scores range from 3 to 5 when the scores range from 1 to 5. In classical CF, we can predict a score using the average of the neighbors' ratings. However, the average value ignores the different rating characteristics of different users. The widely used alternative is the weighted sum [29, 30], also known as Resnick's prediction formula [31]. Therefore, in this work, Resnick's prediction formula is adopted for all the compared methods to validate the presented method.
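Resnick's weighted-sum prediction adds the active user's mean rating to the similarity-weighted, mean-centered deviations of the neighbors. The sketch below is our own minimal rendering of that formula; the neighbor tuple layout is an assumption:

```python
def resnick_predict(mean_a, neighbors, item):
    """Resnick's prediction for the active user on one item.

    mean_a: the active user's mean rating.
    neighbors: list of (sim, mean_rating, ratings_dict) tuples."""
    num = den = 0.0
    for sim, mean_b, ratings in neighbors:
        if item in ratings:
            num += sim * (ratings[item] - mean_b)  # mean-centered deviation
            den += abs(sim)
    return mean_a + (num / den if den else 0.0)

# active user averages 3.0; two neighbors rated the target item
neighbors = [(0.8, 2.5, {"i1": 4}),   # rated i1 above their own mean
             (0.4, 4.0, {"i1": 3})]   # rated i1 below their own mean
pred = resnick_predict(3.0, neighbors, "i1")
```

Centering on each neighbor's mean compensates for users who systematically rate high or low, which is exactly what the plain average ignores.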

Evaluation Database and Performance Metrics.
In this research, the research question arose from the product design domain, and the solution discussed in this paper attempts to deal with the information overload problem in that domain. Because of that, the best way to validate the method would be to use a database from the product design domain. However, there is still no widely accepted database for validating recommender systems in this domain. Therefore, we turned to a widely used database that has characteristics similar to those of the problem in the product design domain. Another reason for using a widely used database is that we believe the proposed method has the potential to deal with many problems in different domains.
After considering many different databases, MovieLens was selected because the ratings and topics of items are recorded properly. MovieLens (http://www.movielens.umn.edu) is a widely used evaluation database. The database consists of 100,000 ratings, 943 users, and 1,682 items. All ratings follow the numerical scale (1) bad, (2) average, (3) good, (4) very good, and (5) excellent. Each user in this database has rated at least 20 items. The dataset also contains the genre (topic) features for the items that our method needs. A single item can belong to one or more movie genres, which include action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, and western. To avoid the influence of cold-start items and cold-start users, we extracted a denser database from the original database, considering only items that had been rated more than 20 times. Table 1 details the statistics of the extracted database.
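The densification step (dropping items rated 20 times or fewer) can be sketched as below; the triple-based data layout and all names are our illustration, not the paper's code:

```python
from collections import Counter

def densify(ratings, min_item_ratings=20):
    """Keep only ratings of items rated more than min_item_ratings times,
    as when extracting the denser subset from the raw rating file.

    ratings: list of (user, item, score) triples."""
    counts = Counter(item for _, item, _ in ratings)
    return [(u, i, s) for (u, i, s) in ratings if counts[i] > min_item_ratings]

# toy check with a threshold of 1: item "a" is rated twice, "b" only once
triples = [(1, "a", 5), (2, "a", 3), (1, "b", 4)]
dense = densify(triples, min_item_ratings=1)
```

The same filter applied per user (instead of per item) yields the "dataset density" settings used in the comparison experiments.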
In this paper, a 10-times 10-fold cross-validation process was conducted to evaluate the method's performance. The metrics used were the mean absolute error (MAE) [30] and the coverage rate [32].
The MAE measures the difference between the predicted ratings generated by the recommendation algorithm and the real ratings given by the user; the lower the value, the better the method's performance. This metric is obtained by aggregating MAE_u over all users. MAE_u can be figured out using the following formula [30]:

MAE_u = (1/|V_u|) Σ_{j=1}^{|V_u|} |pr_j − r_j|, (8)

where V_u is the validating split and |V_u| is the size of the validating split; pr_j denotes the predicted rating for the j-th item in the validating split, and r_j denotes the real rating for the j-th item in the validating split. Based on that, the overall MAE can be calculated using the following formula:

MAE = (1/N_u) Σ_{u=1}^{N_u} MAE_u, (9)

where N_u is the number of users in the validating split.
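The two-level aggregation can be sketched in a few lines; the toy numbers are ours:

```python
def mae_user(predicted, actual):
    """Mean absolute error over one user's validating split."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def overall_mae(per_user_maes):
    """Average of the per-user MAE values across all validated users."""
    return sum(per_user_maes) / len(per_user_maes)

m1 = mae_user([4, 3, 5], [5, 3, 4])  # per-item errors: 1, 0, 1
m2 = mae_user([2, 2], [1, 3])        # per-item errors: 1, 1
mae = overall_mae([m1, m2])
```

Note that averaging per user first (rather than pooling all items) keeps heavy raters from dominating the metric.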
The MAE can indicate the prediction accuracy of the method; however, the MAE cannot reflect the real utility of a recommendation algorithm, which makes the coverage rate an important metric [32]. There are two ways to define the coverage rate. The first is to calculate the fraction of successful rating predictions over all unrated items; the second considers only the items that have been rated by each user. In this paper, we have defined the coverage rate using the second definition, since it better meets the user's need [32]. The value can be found using the following formula:

CR = |PI| / |VI|, (10)

where CR is the coverage rate and PI is the set of items that have been predicted correctly by the recommendation algorithm. Here, a correctly predicted item does not mean that the predicted rating and the real rating are identical; it means that the algorithm correctly distinguished whether or not the user is interested in the item. VI denotes all items in the validating dataset.
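A minimal sketch of the coverage rate under this second definition follows. The cut-off of 3 (scores of 3 and above count as "interested", matching the paper's high-score range) is our assumption, as are the toy values:

```python
def coverage_rate(predicted, actual, like_threshold=3):
    """Fraction of validating items where the prediction agrees with the
    real rating on interested vs. not interested (threshold assumed)."""
    hits = sum(1 for p, r in zip(predicted, actual)
               if (p >= like_threshold) == (r >= like_threshold))
    return hits / len(actual)

# three validating items: two side-of-threshold agreements, one miss
cr = coverage_rate([4.2, 2.1, 4.8], [5, 1, 2])
```

A prediction of 4.2 against a real rating of 5 counts as correct here even though the values differ, exactly as the definition above intends.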

Database Analysis.
Before validating the performance, we must first examine whether the database is capable of expressing the users' interests. According to the method proposed in Section 3, we examined the value of cap_u for all users. Figure 1 shows the histogram of all cap_u values; as we can see, most cap_u values are small (ranging from 0.05 to 0.27) compared with 1, which means that most users' interests are focused on a small number of topics. cap_T is 0.21, which also implies that the topics in this database adequately express the users' interests. Therefore, we can use this database to build a probability-based rating and topic hybrid user model.

Evaluation of System Components.
In this section we assess the influence of two parameters, the threshold value λ and the neighborhood number, on the method's performance. First, we assessed the effect of the threshold value λ by setting its value from 0.3 to 0.9; for each setting, the neighborhood number was fixed at 4. Afterwards, we assess the robustness of the method over a range of neighborhood numbers.
As shown in Figure 2, the MAE of our method is approximately 0.819, which refers to the absolute distance between the predicted ratings and the real ratings. The standard deviation of the MAE is about 0.0045, which implies the MAE is hardly affected by the threshold value λ. This may be because most users' interests are focused on a small number of topics; regardless of the variation of λ, the positive topics barely change. It is worth noting that λ = 0.3 produced the highest MAE value (0.829); this may be because the small threshold value admits some topics that do not fall within the user's interests into the positive topic set. When λ increased to 0.4 or greater, the MAE dropped and remained invariant. Figure 2(b) shows that the coverage rate was about 91.7%, which implies our method can recommend over 90% of the items correctly. The standard deviation (0.0045) was also small, which tells us the threshold value has little influence on the coverage rate. We also noticed that λ = 0.4 produced the highest coverage rate; this is because some topics the user is interested in are lost when λ = 0.3, and some topics in which the user is not interested remain when λ is larger than 0.4. In short, we can conclude that neither the MAE nor the coverage rate is obviously affected by the threshold value, and a smaller threshold value (e.g., 0.4) tends to yield a high coverage rate and a small MAE.
Second, we assessed the method's performance when the neighborhood number was set to different values (from 1 to 10). The threshold value was set to 0.4 in this experiment. As shown in Figure 3, when the neighborhood number increased from 1 to 10, the method's MAE first decreased to its lowest value (0.812) at 4 and then continuously increased. It is noteworthy that the MAE was high when the neighborhood number was 1, because a single neighbor is not enough to provide correct rating predictions. Furthermore, when the neighborhood number is greater than 4, the method also obtains a high MAE. This is because some dissimilar neighbors may be introduced if the neighborhood number is too large; therefore, the predicted ratings can be inaccurate. A similar situation occurred for the coverage rate in Figure 3(b): when the neighborhood number increased from 1 to 10, the coverage rate first increased to its highest value (0.938) at 4 and then dropped. Therefore, the method performs at its best when the neighborhood number is 4. We can conclude from this experiment that the method is sensitive to the neighborhood number; neither a too small nor a too large value obtains the best result. Therefore, the selection of an appropriate value for the neighborhood number is critical for our method.

Performance Comparison.
In this section, we compare our method with two other methods: the classical collaborative filtering (CF) method and the GIM method proposed by Al-Shamri and Bharadwaj [6]. The GIM method was selected for comparison because it uses the same information to predict user interest. In the experiments, our method is denoted by PF, the probability-based filtering method. Classical collaborative filtering was taken as the experimental baseline. Because the classical collaborative filtering method and the GIM method are described in detail in the related papers, we could replicate the methods and compare their performance.
We compared the three methods' performance under different parameter settings. Two parameters (neighborhood number and dataset density) were selected for the comparison. The first parameter was already used to validate the performance in the previous section, so its meaning is known. The second parameter is the dataset density, which represents how many ratings the users have in the database. For example, setting the dataset density to 20 means that only users who have rated at least 20 items are selected in the final database. With this parameter we can compare the methods' performance on different databases.
First, we compared the method's performance after changing the neighborhood number from 1 to 10.The dataset density was set at 20.This experiment was intended to check whether our PF method could obtain a better performance given different neighborhood numbers.Figure 4 shows the comparison results.We can see from Figure 4(a) that the PF method always obtains the lowest MAE when the neighborhood number has increased from 2 to 10.Although the value is much higher than the GIM method when the neighborhood number is 1, this does not affect the utility of the PF method because we will never select 1 as the neighborhood number in real recommendation systems.In terms of the accuracy of rating prediction, the PF method always outperforms the GIM method and both the GIM and the PF methods are superior to the baseline CF method.In Figure 4(b), the experiment shows that the PF method has a higher coverage rate than the GIM method when the neighborhood number is from 2 to 10, although the gap is not large.The baseline CF method's coverage rate is low from start to finish.We can conclude from this experiment that the neighborhood number affects the performance only slightly and does not change the performance ranking of these three methods.
Second, we compared the performance when providing different database densities to the three methods. In this experiment, the neighborhood number was set at 4. Unlike the previous two experiments, this experiment's result, shown in Figure 5, is much more complicated. In Figure 5(a) we can see that the baseline CF method does not always have a higher MAE. When the dataset density is over 210, the CF method obtains a lower MAE than the GIM method; afterwards, when the dataset density is over 280, the CF method obtains a lower MAE than the PF method. This phenomenon tells us that, with the increase of the users' rating numbers, the CF method's performance increases rapidly, eventually performing even better than the GIM and PF methods, which are rating and topic hybrid methods. This phenomenon may occur because the accuracy of the user similarities computed by CF is higher than the accuracy obtained by the GIM and PF methods on a high-density database. It is also noteworthy that the PF method achieves a lower MAE than the GIM method from beginning to end. For recommendation systems in which prediction accuracy is critical, constructing a hybrid user model is not always useful; the classical CF method can still be used when the dataset density is high enough.
In Figure 5(b), the coverage rates for the three methods demonstrate a trend similar to a downward parabola, although the values fluctuate a great deal. However, the three methods reach their highest coverage rates at different dataset densities. The similar downward parabolic trend in the coverage rate is reasonable and reflects the real-world situation. If the dataset density is very small, there may not be enough ratings to predict user ratings. As the dataset density increases, the coverage rate also increases up to its highest value. However, if the dataset density continues to increase beyond that point, the method may be incapable of dealing with the noise brought about by the high-density database; therefore, the coverage rate begins to drop. For the baseline CF method, the highest coverage rate appears when the dataset density is at about 300. Although the coverage of the CF method is quite small at the beginning, it increases rapidly and exceeds the GIM method when the dataset density is about 170; it catches up with the PF method when the dataset density is about 270. For the GIM method, the coverage rate increases slightly as the dataset density increases and reaches its highest value when the dataset density is about 130, decreasing rapidly afterwards. When the dataset density is over 170, the GIM method's coverage rate decreases until it is lower than that of the other two methods. The PF method maintains the highest performance throughout. Its coverage rate increases slightly when the dataset density rises from 30 to 120, but after that it presents a highly uneven downward trend. When the dataset density is over 270, the coverage rate drops rapidly, along with that of the CF method.
Considering both the MAE and the coverage rate, this experiment demonstrates that the dataset density can significantly influence the performance of the three methods.When the dataset density is small, the GIM and PF outperform the classical CF method.However, the CF method can produce a lower MAE and a higher coverage rate prediction than the GIM method if the dataset density is high.The PF method obtained a relatively better result with both a low dataset density and a high dataset density.Therefore, we found that our PF method is more suitable than the GIM method when dealing with a high-density database.

Conclusion and Future Work
In an attempt to alleviate information overload in the new product design domain, this paper proposed a probability-based hybrid user model that combines collaborative filtering and content-based filtering to implement a recommendation system. This research contributes a new method to the research community for implementing a recommender system when only user ratings and item topics or classes are available. Based on an extensive analysis of the experimental results, we conclude that the proposed method obtains better performance under most parameter settings. Despite the progress made, some issues still need to be addressed in future work.
(1) First, the performance of the proposed method dropped when the dataset density was high, which means its noise tolerance needs to be improved. Therefore, in our future work, we will develop techniques to filter the noise generated by a dense database.
(2) Second, the performance of the proposed method was validated on only one database; more databases are necessary, especially a database from the new product design domain.

Figure 1: The cap_u distribution of the database.

Figure 2: The influence of λ on the performance.
Figure 3: The influence of the neighborhood number on the performance.

Figure 4: The effect of the neighborhood number on the performance comparison.
Figure 5: The effect of the dataset density on the performance comparison.

Table 1: Statistics of the final database.

The last step is to generate the recommendations, that is, a list of the top-N items that would most interest the user. The primary task of recommendation is to predict the scores that the active user might give to unrated items according to the neighbor set; the top-N highest-scoring items are then returned as the recommendation.