Recommendation algorithms based on bipartite networks outperform traditional methods in accuracy and diversity, which indicates that taking the network topology of a recommendation system into account can improve recommendation results. However, existing algorithms focus mainly on the overall topology, although local characteristics can also play an important role in collaborative recommendation. Therefore, in view of the data characteristics and application requirements of collaborative recommendation systems, we propose a link community partitioning algorithm based on label propagation and a collaborative recommendation algorithm based on bipartite communities. We then design numerical experiments to verify the validity of the algorithms on benchmark and real data sets.
Collaborative recommendation is one of the most effective approaches to dealing with information overload and has drawn great attention. However, with the development of network technologies, collaborative recommendation has come to exhibit complex dynamic characteristics, and its key issues remain unresolved, such as data sparsity, scalability, and shifts in user interest.
In recent years, the study of complex networks has become a hot topic, and many theoretical models and analytical methods have been proposed. These provide new ideas and methods for solving those key problems. The bipartite network is an important manifestation of the complex network [
However, research on collaborative recommendation based on bipartite topology is still at an early stage and focuses mainly on the global structure. A recommendation network also contains clusters of users and resources that follow certain patterns, such as users’ common interests and resources with similar themes. This local feature, known as community structure in complex networks, benefits collaborative recommendation in the following three ways.
Communities form naturally and their size is controllable. Using them as nearest neighbourhoods guarantees the relationship between an object and its neighbours and allows the size of the nearest neighbourhood to be dynamic. Compared with traditional methods, collaborative recommendation based on bipartite communities can reduce the influence of mistaken nearest neighbours. In addition, processing local community structures can alleviate scalability issues to a certain extent.
Structural properties of communities, such as overlap and hierarchy, can enrich the information available to a collaborative recommendation system. Studying the inherent relationship between these structural properties and the collaborative recommendation process may bring new breakthroughs on the data sparsity and interpretability problems.
Research on the evolution of community structure can also capture the dynamic nature of the recommendation system and reflect continuous interaction and user feedback. We can thus discover certain patterns and predict trends in community structure, which could make the collaborative recommendation process more intelligent.
Thus, we first propose a bipartite community partitioning algorithm suited to the real data environment of collaborative recommendation. We then propose a novel collaborative recommendation algorithm that uses these bipartite communities. Finally, we verify the validity of the algorithms by numerical experiments and analyse the experimental results and the reasons behind them.
In order to handle large-scale data sets, traditional research has mainly focused on kNN methods, which predict recommendations from the historical choices of the target user’s
Reference [
Reference [
Both of the above algorithms adopt the idea of dynamic nearest neighbourhoods. The former focuses on a neighbourhood scale suited to the target user’s prediction scenario, while the latter modifies the neighbourhood according to the class attribute of resources. In addition, the former integrates user-based and resource-based collaborative recommendation. These algorithms overcome the limitations of traditional methods that use a fixed nearest neighbourhood and a single-dimension measure, but some defects remain. The former requires a large number of manually set parameters whose optimal values are difficult to determine across scenarios, which affects the stability of the algorithm. The resource class attribute introduced by the latter is too objective to reflect the rich content of users’ subjective behaviour.
Bipartite community division is the process of identifying communities in a bipartite network. It has theoretical significance and practical value for network structure analysis, functional evolution, and prediction. In general, the bipartite community structure can be obtained by projecting the bipartite network onto an ordinary network containing one kind of node and then running existing community division algorithms. However, the projection causes loss of information and other issues. Therefore, many scholars divide bipartite communities directly on the original bipartite network structure. Existing bipartite community methods generally fall into three categories: modularity-based methods, clique-based methods, and propagation-based methods.
There are two main strategies in modularity-based methods. One treats the modularity as the objective function for optimization [
Clique-based methods detect overlapping bipartite communities. Reference [
Propagation-based methods are easy to parallelize, run in linear time, and require no prior knowledge. Reference [
Because users and resources are different kinds of objects, bipartite communities in a collaborative recommendation environment should distinguish between heterogeneous nodes in order to guarantee interpretability. At the same time, because users with multiple interests and resources with multiple themes are common, overlap and hierarchy should be allowed in bipartite communities. Link community division is an effective way to achieve these goals. However, current research on link community division mainly addresses ordinary (one-mode) networks.
Reference [
Reference [
According to the environment and requirements of collaborative recommendation systems, our community division algorithm needs to achieve the following goals in addition to accuracy:
Without loss of generality, defining a bipartite network as
For a given node
Heterogeneous neighbors are a direct property of a node: the degree of a node is the number of its heterogeneous neighbors. Homogeneous neighbors are an indirect property, because a node is connected to its homogeneous neighbors only through their common heterogeneous neighbors. This interconnection is referred to as “cross linking,” and a bipartite network is exactly a cross-linked network. A cross-linked structure, consisting of a pair of homogeneous nodes and their common heterogeneous nodes, is the basic unit of a bipartite network. The formal definition is as follows.
In a given bipartite network
There is a certain correlation between the pair of homogeneous nodes in a cross-linked structure. For example, a co-citation relation means that articles are similar to some degree if they share the same citations. Therefore, reference [
Given a crosslinked structure
By accumulating the all correlations in crosslinked set (
Given a node
Given a node
Figure
An example of calculating vertex correlations. The left-hand figure is the structure of a bipartite network and the right-hand one shows the correlation matrices.
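As a minimal sketch of the neighbor notions used in this section, the following Python code computes heterogeneous neighbors, degrees, and homogeneous neighbors. The edge list is read off the edge table given later in this section; treating it as the example network of the figure is an assumption, and the function names are illustrative only. The vertex correlation itself is not computed, because its formula is not reproduced here.

```python
from collections import defaultdict

# (user, resource) pairs taken from the edge table later in this section;
# using them as the example network is an assumption of this sketch.
EDGES = [(1, 1), (1, 2), (1, 3), (1, 5), (2, 1), (2, 4),
         (3, 4), (4, 2), (4, 3), (4, 4), (4, 6)]


def heterogeneous_neighbours(edges):
    """Return (user -> resources, resource -> users) adjacency maps."""
    user_nb, resource_nb = defaultdict(set), defaultdict(set)
    for user, resource in edges:
        user_nb[user].add(resource)
        resource_nb[resource].add(user)
    return dict(user_nb), dict(resource_nb)


def homogeneous_neighbours(own, other):
    """Nodes of the same type reachable through a common heterogeneous neighbour."""
    result = {}
    for node, hetero in own.items():
        same_type = set()
        for h in hetero:
            same_type |= other[h]
        same_type.discard(node)  # a node is not its own neighbour
        result[node] = same_type
    return result


user_nb, resource_nb = heterogeneous_neighbours(EDGES)
user_homo = homogeneous_neighbours(user_nb, resource_nb)
degree = {user: len(resources) for user, resources in user_nb.items()}
```

With this edge list, user 1 has degree 4 and its homogeneous neighbors are users 2 and 4; user 3 is linked to users 2 and 4 only through resource 4.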
In a common network, we generally say that two edges sharing the same endpoint are adjacent, because these edges clearly have higher similarity than those without common endpoint. But the common endpoint is unable to provide useful information for the similarity measure, and the higher its degree is, the more similar the edges are. Therefore, Ahn et al. [
However, in a bipartite network every edge connects a pair of heterogeneous nodes, so the common endpoints of adjacent edges can be of different types.
For example, in Figure
Therefore, for a given edge, we define another edge as its adjacent edge if and only if the edge has no common endpoints with it and each pair of homogeneous endpoints of them has common heterogeneous neighbors. This could unify the correlation of both dimensions, and we could measure it by the correlations of the two pairs of homogeneous nodes. For example, in Figure
Two edges are adjacent if and only if they have no common endpoints and each pair of homogeneous endpoints shares common heterogeneous neighbors. For a given edge
If edge (
The properties of an edge are determined by its endpoints. Therefore, we can measure the correlation of adjacent edges indirectly by multiplying the correlations of their homogeneous endpoints. The formal definition is as follows.
Given a pair of adjacent edges (
The above definition uses the correlation between two pairs of homogeneous neighbor nodes independently. When calculating Sig
For a given pair of adjacent edges (
If the intermediary node
Table
An example of calculating the correlations among adjacent edges, where
Edge  (1, 1)  (1, 2)  (1, 3)  (1, 5)  (2, 1)  (2, 4)  (3, 4)  (4, 2)  (4, 3)  (4, 4)  (4, 6)
(1, 1)  -  0  0  0  0  0.141  0  0.174  0.174  0.510  0
(1, 2)  0  -  0  0  0.673  0.099  0  0  0.415  0.140  0.278
(1, 3)  0  0  -  0  0.673  0.099  0  0.415  0  0.140  0.278
(1, 5)  0  0  0  -  0.175  0  0  0.413  0.413  0  0
(2, 1)  0  0.095  0.095  0.151  -  0  0.272  0.139  0.139  0.107  0
(2, 4)  0.289  0.221  0.221  0  0  -  0  0.118  0.076  0  0.118
(3, 4)  0  0  0  0  0.501  0  -  0.140  0.140  0  0.219
(4, 2)  0.158  0  0.410  0.248  0.097  0.033  0.053  -  0  0  0
(4, 3)  0.158  0.410  0  0.248  0.097  0.033  0.053  0  -  0  0
(4, 4)  0.158  0.170  0.170  0  0.092  0  0  0  0  -  0
(4, 6)  0  0.401  0.401  0  0  0.077  0.121  0  0  0  -
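The adjacency rule for edges defined above (no common endpoint, and each pair of homogeneous endpoints sharing at least one heterogeneous neighbor) can be checked against the zero pattern of this table. The following Python sketch does so; the edge list is the set of edges heading the table, and the helper names are illustrative assumptions. It tests adjacency only and does not reproduce the correlation values.

```python
from collections import defaultdict
from itertools import combinations

# (user, resource) edges listed in the table header.
EDGES = [(1, 1), (1, 2), (1, 3), (1, 5), (2, 1), (2, 4),
         (3, 4), (4, 2), (4, 3), (4, 4), (4, 6)]


def build_neighbours(edges):
    """Return (user -> resources, resource -> users) adjacency maps."""
    user_nb, resource_nb = defaultdict(set), defaultdict(set)
    for user, resource in edges:
        user_nb[user].add(resource)
        resource_nb[resource].add(user)
    return user_nb, resource_nb


def are_adjacent(e1, e2, user_nb, resource_nb):
    """Adjacent edges share no endpoint, and both pairs of homogeneous
    endpoints have at least one common heterogeneous neighbour."""
    (u1, r1), (u2, r2) = e1, e2
    if u1 == u2 or r1 == r2:
        return False
    users_linked = bool(user_nb[u1] & user_nb[u2])
    resources_linked = bool(resource_nb[r1] & resource_nb[r2])
    return users_linked and resources_linked


user_nb, resource_nb = build_neighbours(EDGES)
adjacent_pairs = [
    (a, b) for a, b in combinations(EDGES, 2)
    if are_adjacent(a, b, user_nb, resource_nb)
]
```

For this network the rule yields 27 unordered adjacent pairs, which corresponds to the 54 nonzero off-diagonal entries of the table (each pair appears once in each direction). For instance, (1, 1) and (2, 4) are adjacent, whereas (1, 1) and (4, 6) are not, because resources 1 and 6 share no user.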
In the real world, people often follow others’ behaviors; for example, they buy the goods that their friends have bought. The edge correlation measure proposed in the last section can be interpreted as how strongly one behavior depends on another. Here, we propose a link community division algorithm based on label propagation (BELPA).
The basic idea of BELPA is to assign an initial label to each edge and then repeatedly update the labels until they converge to a steady state. At the end, edges with the same label belong to the same community. This process is equivalent to label propagation on a directed, weighted network whose nodes correspond to the edges of the bipartite network and whose directed, weighted edges correspond to the correlations between adjacent edges. Three key problems must be solved: how to allocate the initial labels, how to update labels, and when to stop iterating.
First of all, we select one kind of node sets as the starting set and then give the same label for edges ending up with each node in the node set. For example, if starting from the set of
After initial allocation, the label updating strategy includes following aspects.
Label selection strategy. At the
Tie treatment strategy. When the above function returns more than one maximum labels, we will maintain the label of edge
Updating execution strategy. We update labels synchronously; that is, the new label of each edge is independent of the other edges’ updates in the current iteration and relies only on the adjacent labels from the previous iteration. This yields more stable results and makes the algorithm parallelizable and practical.
Finally, in
no edge updates label, namely,
after updating, labels satisfy the condition
the maximum iteration is reached, namely,
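The overall loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact BELPA update: the label selection function is assumed to pick the label with the largest summed incoming correlation, ties keep the current label when it is among the candidates (and otherwise fall back to the smallest candidate for determinism), and only two stop conditions are implemented (no label changes, or the maximum number of iterations is reached). The names `propagate_labels`, `init_labels`, and `weights` are hypothetical.

```python
from collections import defaultdict


def propagate_labels(init_labels, weights, max_iter=20):
    """Synchronous label propagation over the edges of a bipartite network.

    init_labels: dict mapping each edge (any hashable id) to its initial label.
    weights: dict mapping (src, dst) to the correlation with which src's
             label influences dst (directed, so weights may be asymmetric).
    """
    incoming = defaultdict(list)
    for (src, dst), w in weights.items():
        if w > 0:
            incoming[dst].append((src, w))

    labels = dict(init_labels)
    for _ in range(max_iter):
        new_labels = {}
        for item, current in labels.items():
            # Scores are computed from the previous iteration only (synchronous).
            score = defaultdict(float)
            for src, w in incoming.get(item, []):
                score[labels[src]] += w
            if not score:
                new_labels[item] = current
                continue
            best = max(score.values())
            winners = [lab for lab, s in score.items() if s == best]
            # Tie handling is an assumption of this sketch.
            new_labels[item] = current if current in winners else min(winners)
        if new_labels == labels:  # steady state: no edge changed its label
            break
        labels = new_labels
    return labels


# Hypothetical toy input: three edges with made-up correlations.
toy_labels = {"e1": 1, "e2": 1, "e3": 2}
toy_weights = {("e1", "e2"): 0.9, ("e2", "e1"): 0.9,
               ("e1", "e3"): 0.6, ("e2", "e3"): 0.5,
               ("e3", "e1"): 0.1, ("e3", "e2"): 0.1}
result = propagate_labels(toy_labels, toy_weights)
```

In the toy input, edge `e3` is dominated by the two edges carrying label 1 and adopts that label after one update; the second pass changes nothing, so the loop stops. Because every edge reads only the labels of the previous iteration, the inner loop could be run in parallel.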
According to the above steps, we execute the edge label propagation on the bipartite network shown in Figure
An example of edge label propagation. Columns correspond to edges, rows correspond to iteration numbers, and entries are edge label identifiers.
Iteration  (1, 1)  (1, 2)  (1, 3)  (1, 5)  (2, 1)  (2, 4)  (3, 4)  (4, 2)  (4, 3)  (4, 4)  (4, 6)
0  1  1  1  1  2  2  3  4  4  4  4
1  4  4  4  4  4  1  2  1  1  1  1
2  1  1  1  1  1  4  1  4  4  4  4
3  4  4  4  4  4  1  4  1  1  1  1
4  1  1  1  1  1  4  1  4  4  4  4
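Since edges with the same label form one link community, node communities follow by collecting the endpoints of each community's edges, and a node incident to edges with different labels belongs to several communities. The sketch below illustrates this step. It uses the labels of iteration 1 in the table purely as sample input; which iteration the final partition is read from is not assumed here, and the function name is hypothetical.

```python
from collections import defaultdict

# Edges in the column order of the table, as (user, resource) pairs.
EDGES = [(1, 1), (1, 2), (1, 3), (1, 5), (2, 1), (2, 4),
         (3, 4), (4, 2), (4, 3), (4, 4), (4, 6)]
# Labels of iteration 1 in the table, used only as sample input.
LABELS = [4, 4, 4, 4, 4, 1, 2, 1, 1, 1, 1]


def node_communities(edges, labels):
    """Group endpoints by edge label; overlapping membership arises naturally."""
    user_comm, resource_comm = defaultdict(set), defaultdict(set)
    for (user, resource), label in zip(edges, labels):
        user_comm[label].add(user)
        resource_comm[label].add(resource)
    return dict(user_comm), dict(resource_comm)


user_comm, resource_comm = node_communities(EDGES, LABELS)
overlapping_users = {
    u for u in {user for user, _ in EDGES}
    if sum(u in members for members in user_comm.values()) > 1
}
```

For this sample labelling, user 2 has edges with labels 4 and 1 and therefore sits in two user communities, which is how link communities express overlap.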
In the initial iteration (iteration zero), the labels are identifiers of
Radicchi et al. [
The hierarchical clustering algorithm [
When a given edge completes its label update, it effectively joins the link community identified by its new label. This change also affects the original link community, because the correlation between adjacent edges is bidirectional. For example, the overall correlation within the link community is weakened if the correlations between its original edges and the given edge are very weak. This resembles the admission rules of some organizations in real life, which may refuse certain applicants. Therefore, we modify formula (
Here, parameter
An example of getting differentscale communities by adjusting




0~0.4  {1, 2}, {2, 3, 4}  {1, 2, 3, 5}, {2, 3, 4, 6} 
0.5~1.0  {2, 4}, {2}, {3}, {1}  {1, 2, 3, 5}, {2, 3, 4, 6}, {1}, {4} 
We use the BELPA algorithm to obtain user and resource communities and then predict the value of the resources not yet chosen by a target user, based on community membership and the correspondence between communities, in order to realize collaborative recommendation.
Firstly, we take user and resource community as users’ and resources’ nearest neighbourhood, respectively, and call those community members as users’ and resources’ community neighbors, respectively. For a given user
Secondly, we call those heterogeneous communities with same community label as corresponding community; that is, for any node
a weighting coefficient which is used to calculate correlations between a given user or resource and their community neighbors;
an initial score that a given user gives to his selected resource.
Finally, we use cosine similarity to calculate the similarity based on community membership. The formal definition is as follows.
The similarity based on community membership of given node
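As an illustration of a cosine similarity over community membership, the sketch below compares two membership vectors. The representation of membership as a mapping from community label to a nonnegative weight is an assumption made for this example, as is the function name; it is not claimed to be the exact definition used here.

```python
import math


def membership_cosine(member_a, member_b):
    """Cosine similarity of two community membership vectors.

    member_a, member_b: dict mapping community label -> membership weight
    (hypothetical representation; missing labels count as weight 0).
    """
    dot = sum(w * member_b.get(label, 0.0) for label, w in member_a.items())
    norm_a = math.sqrt(sum(w * w for w in member_a.values()))
    norm_b = math.sqrt(sum(w * w for w in member_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a node without any membership is similar to nothing
    return dot / (norm_a * norm_b)


# A node in two communities compared with a node in only one of them.
sim_overlap = membership_cosine({1: 1.0, 2: 1.0}, {1: 1.0})
sim_disjoint = membership_cosine({1: 1.0}, {2: 1.0})
```

Nodes that share no community obtain similarity 0, while a node spread over several communities is only partly similar to a node concentrated in one of them, so overlap directly shapes the neighbor weights.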
Building on the above definition, we adopt a larger-value strategy to modify the correlation measure between a target object and its community neighbor and weight it by community membership. The correlation between a given object and its community neighbor is therefore defined as follows.
The correlation between given node
We maintain a recommended list
For a target user
determining the user community neighbourhood
for
sorting resources in the
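The general shape of these steps (collect community neighbours, score the resources the target user has not chosen, sort) can be sketched as below. The scoring rule, a sum of neighbour correlations over the neighbours who chose a resource, is an assumption of this sketch and stands in for the scoring formula of the algorithm; `recommend_for_user`, `chosen`, and `neighbour_weight` are hypothetical names.

```python
def recommend_for_user(target, chosen, neighbour_weight, top_n=3):
    """Rank unchosen resources by the summed weight of community neighbours.

    chosen: dict mapping user -> set of resources the user has selected.
    neighbour_weight: dict mapping each community neighbour of `target`
                      to its correlation with `target` (assumed given).
    """
    already = chosen.get(target, set())
    scores = {}
    for neighbour, weight in neighbour_weight.items():
        for resource in chosen.get(neighbour, set()) - already:
            scores[resource] = scores.get(resource, 0.0) + weight
    # Sort by descending score; break ties by resource id for determinism.
    ranked = sorted(scores, key=lambda r: (-scores[r], r))
    return ranked[:top_n]


# Hypothetical toy data.
chosen = {"u1": {"a"}, "u2": {"a", "b"}, "u3": {"b", "c"}}
recommended = recommend_for_user("u1", chosen, {"u2": 0.8, "u3": 0.5})
```

Because the neighbourhood is the user's community rather than a fixed number of nearest neighbours, its size varies with the community structure.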
Similarly, the steps of the resource-community-based collaborative recommendation (RCBCR) algorithm are as follows:
for
adding resources in
sorting resources in the
Standard data set. In this section we use the Southern Women data set to verify the validity of the BELPA algorithm. This data set describes the participation of 18 women in 14 social events. Many social scientists divide the 18 women into two groups: women 1 to 9 and women 10 to 18. Others think that woman 9 belongs to both groups. Generally, the real women community partition of this data set is expressed by
Table
Results of related algorithm.
Algorithm  Women community  Events community
Guimerà  {1~9}, {10~18}  {1~8}, {9~14}
Barber  {1~7, 9}, {8, 10~18}  {1~8}, {9~14}
Murata  {1~6}, {7, 9, 10}, {8, 16~18}, {11~14}  {1~6}, {7, 8}, {9, 11}, {10~14}
Suzuki  {1~7}, {8}, {9}, {16}, {17, 18}, {10~14}  {1~6}, {7}, {8}, {9, 11}, {10, 12~14}
Therefore, we deem that the reasonable result contains two parts: the foundation partition which consists of two communities
Table
The result of BELPA algorithm starting from women set with 0.1 as the step length of parameter
Γ  Women community  Events community 

0~0.4  {1~9, 16}, {10~18}  {1~9}, {6~14} 
0.5  {1~9}, {10~18}  {1~9}, {6~14} 
0.6~0.7  {1~9}, {8, 10~18}  {1~9}, {6~14} 
0.8~1.0  {1~9}, {8~18}  {1~9}, {6~14} 
We can also see that the final community labels are 1 and 13, because women 1 and 13 participate in social events frequently and thus become the cores of the communities. Women 16, 8, and 9, the nodes in the overlaps, all participate in fewer social events (2, 3, and 4, respectively), and they all took part in events 8 and 9, which many women attended. They are therefore pulled by both communities and become community borders.
The above experiments show that BELPA obtains reasonable results consistent with the real partition under different parameter values. Moreover, BELPA can identify the cores and borders of communities, which are the representatives of communities and the bridges connecting different communities, respectively. This will play an important role in practical applications.
Only communities that are meaningful in the real world can be used further for collaborative recommendation. Therefore, in this section we use the MovieLens data set to verify the validity of the BELPA algorithm. So that the recommendation algorithms can be tested afterwards, we extract the data in a random time slot from the original data set and divide it into two parts in time order: the training set contains 80% of it and the test set contains 20% of it.
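A time-ordered 80/20 split of this kind can be sketched as follows. The record layout `(user, resource, timestamp)` and the choice to put the earliest records into the training set are assumptions of this example; the function name is hypothetical.

```python
def chronological_split(records, train_fraction=0.8):
    """Split (user, resource, timestamp) records by time.

    The earliest `train_fraction` of the records form the training set
    and the rest form the test set (assumed direction of the split).
    """
    ordered = sorted(records, key=lambda rec: rec[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical records with integer timestamps in shuffled order.
records = [("u%d" % (i % 3), "r%d" % i, t)
           for i, t in enumerate([50, 10, 90, 30, 70, 20, 100, 40, 80, 60])]
train, test = chronological_split(records)
```

Splitting by time rather than at random keeps every test interaction later than the training interactions, which mirrors how a recommender is used in practice.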
The data set in this experiment contains 113 user nodes and 1024 resource nodes. The average degree of user nodes is 70 and the average degree of resource nodes is 7.8. If we choose resource nodes as the starting set, we can finally obtain the result shown in Figure
Basic community structure. With the growth of parameter
Nested community structure. With the growth of parameter
Overlapping degree. The length by which the solid vertical lines extend beyond the dotted horizontal line shows the overlapping degree of the communities. The overlapping degree of the user communities is larger than that of the resource communities because the average degree of user nodes is larger; that is, user nodes incident to more edges are more likely to belong to several communities.
The community division result of MovieLens data set starting from resource nodes. The abscissa and ordinate denote the parameter value and community scale, respectively. Figure at left shows user communities and the one at right shows resource communities. The dotted horizontal lines show the number of users and resources, respectively, and solid vertical lines represent community distributions. Each segment of the solid line shows a community and the length denotes community size. The labels beside nodes are the identifiers of communities.
Starting from user nodes, we can finally obtain the results shown in Figure
There are slight differences in both the number and the size of communities between the basic partitions of the two results, and there is also a certain correspondence between them. For example, community 130 is separated because it is connected to resource 335.
The former user communities overlap more than the latter ones, while the former resource communities overlap less than the latter ones. In the first iteration, nodes in the starting set gain only one initial community label, while nodes in the receiving set can get more than one label through edges with different labels. As a result, the former user communities, with user nodes as receivers, overlap more than the latter ones, with user nodes as senders, and the same holds for resource nodes.
The former shows slightly more nested community structures than the latter as the parameter value grows, because the experimental data set contains more resources than users, which makes the number of initial labels in the former larger and nested communities easier to separate.
The community division result of MovieLens data set starting from user nodes. Related instructions are shown in Figure
In summary, the BELPA algorithm can find bipartite communities in a real data set. Moreover, it can identify the cores and borders of communities and some nested structures, which will play an important role in collaborative recommendation systems. For instance, community cores can be used to ease the cold-start problem, and community borders and nested structures can be used to improve recommendation diversity. By comparing the results of different initial label distribution strategies, we also found that the results of the BELPA algorithm are relatively stable, although both the specific initial label distribution strategy and the data structure do affect the division results.
The measures of collaborative recommendation cover accuracy and individualization. The former is the degree of correspondence between the recommendation results and users' preferences, and the latter is the degree of difference between the recommendation results of different users.
We use rank accuracy and hitting rate to measure the accuracy. Rank accuracy is the mean of ranking score, which is defined as
In fact, users are concerned only with the resources at the front of recommendation lists, so we use the hitting rate to measure the proportion of resources chosen by the target user that appear in a recommendation list of a given length. The hitting rate is defined as
We use popularity and diversity to measure the individualization degree of algorithm. The popularity is measured by average degree. In the real world, if the recommended results contain many popular resources which are chosen by many users, the accuracy could be guaranteed, while the individualization perhaps could be weakened, because the popular resource may not meet the individual needs of users. Therefore, the smaller the average degree is, the more personalized the recommendation results are.
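The hitting rate and popularity measures can be sketched as follows. The popularity function follows the description above (average degree of the recommended resources). The hitting rate is written as the fraction of a user's test-set resources found in the top part of the list, which is one common form and an assumption here. All function and parameter names are hypothetical.

```python
def hitting_rate(recommended, test_items, length):
    """Fraction of the user's test-set resources found in the top `length`."""
    if not test_items:
        return 0.0
    hits = len(set(recommended[:length]) & set(test_items))
    return hits / len(test_items)


def popularity(recommended, degree, length):
    """Average degree (number of users who chose it) of the top `length` resources."""
    top = recommended[:length]
    if not top:
        return 0.0
    return sum(degree[r] for r in top) / len(top)


# Hypothetical toy data.
ranked = ["a", "b", "c", "d"]
hr = hitting_rate(ranked, {"b", "d", "z"}, length=3)
pop = popularity(ranked, {"a": 10, "b": 2, "c": 6, "d": 1}, length=3)
```

A lower average degree means that the list contains fewer blockbuster resources, so the two functions together expose the trade-off between accuracy and individualization discussed above.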
Hamming distance can measure difference degree between recommend lists. It is defined as
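As a hedged illustration, one widely used form of the Hamming distance between two recommendation lists of equal length L is 1 - Q/L, where Q is the number of resources the two lists share. Whether this matches the definition intended here term by term is an assumption; the function name is hypothetical.

```python
def hamming_distance(list_a, list_b):
    """1 - Q/L for two equally long recommendation lists (assumed form)."""
    if not list_a or len(list_a) != len(list_b):
        raise ValueError("lists must be non-empty and of equal length")
    shared = len(set(list_a) & set(list_b))
    return 1.0 - shared / len(list_a)


# Two users whose lists share two of four resources.
distance = hamming_distance(["a", "b", "c", "d"], ["c", "d", "e", "f"])
```

Identical lists give 0 and completely different lists give 1, so averaging this value over all user pairs yields a diversity score for the whole system.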
In this section we used the above measures to estimate our collaborative recommend algorithm based on communities in Figures
Results of UCBCR. It successively reports the rank accuracy, hitting rate, popularity, and diversity based on different lengths of recommendation lists and different community structures under each parameter value. Illustrations
Results of RCBCR. Related instructions are shown in Figure
Looking at the overall trend of different indexes in the figures, we can get the following conclusions.
Each value of different measures under different parameter values presented a gentle change, which verified that the basic partition changed both on size and number of communities as we have mentioned before.
The variation trend of these measure values is basically consistent with that of both the overlapping degree of the community structure and the separation degree of the nested structure in Figures
Different initial label assignment strategy results in different community partition. Therefore, observing results of illustrations
As a whole, the effects of UCBCR and RCBCR algorithm based on communities obtained by BELPA starting from user nodes are superior to the ones based on communities obtained by BELPA starting from resource nodes.
When the length of list is separately 10, 50, and 100, the hitting rate of RCBCR algorithm based on communities obtained by BELPA starting from resource nodes is higher than the one based on communities obtained by BELPA starting from user nodes. However, the diversity and popularity index is still slightly optimal in RCBCR algorithm based on communities obtained by BELPA starting from resource nodes.
As mentioned before, both the user and resource communities obtained by BELPA starting from resource nodes are finer-grained, but the overlapping degree of the resource communities is slightly lower than for those obtained by BELPA starting from user nodes. Therefore, the above phenomenon also suggests that a higher overlapping degree and a more apparent nested structure help to improve recommendation accuracy on the whole while preserving the individualization of the algorithm.
In addition, there is an obvious inflection point on the curve of the illustration
The above analysis shows the relationship between the effects of the collaborative recommendation algorithms and community characteristics, including the overlapping degree and the separation degree of nested structures. There are two main reasons.
Because our collaborative recommendation algorithms adopt the similarity based on community membership, the higher the overlapping degree of the communities is, the richer the information about nodes’ community membership is. We can therefore describe the scale of the nearest neighborhood and the correlations among the neighbors much more accurately.
The separation of nested structures comes down to finding the nodes that act as community borders, which are bridges between different communities. They can expand the scope of the nearest neighborhoods by introducing other possibly related objects, which helps both the accuracy of the algorithms and the diversity of recommendations, especially the latter.
Finally, we compared the metrics of our collaborative recommendation algorithms (UCBCR and RCBCR) with those of the benchmark algorithms (RCNCR and UCNCR) in [
Results of different algorithms under some typical list lengths. (a), (b), (c), and (d) report the ranking accuracy, hitting rate, popularity, and diversity, respectively. The illustrations of RCBCR and UCBCR show the mean of all metrics UCBCR and RCBCR algorithms based on communities in Figure
Ranking accuracy
Hitting rate
Popularity
Diversity
Analyzing the above results, we can get the following conclusions.
Compared with the benchmark algorithms, every measure of the UCBCR and RCBCR algorithms improved. Our algorithms maintained accuracy and showed an obvious advantage in recommendation diversity and individualization. This also shows that the user and resource communities can effectively represent the nearest neighborhood, which in turn supports the validity of the BELPA algorithm. In addition, community neighbors are not limited to the cross-linked structure, which plays an important role in raising the novelty of collaborative recommendations.
Comparing the results of different algorithms under different community divisions, the improvements of the algorithms UCBCR and RCBCR are more apparent than the ones of the algorithms UCNCR (U) and RCBCR (U), which we have mentioned before.
Comparing the results across measures, the accuracy improvement of the UCBCR algorithm is more apparent than that of the RCBCR algorithm, which confirms the importance of the community overlapping degree for improving recommendation results. In terms of recommendation diversity, the improvement of the RCBCR algorithm is very apparent, because the number of resource communities is larger and the resource nodes on community borders can introduce new objects from other communities, so more community neighbors can become recommended objects. Another possibility is that an overlapping degree that is too high may reduce recommendation diversity.
In order to use the local characteristics of the topology of collaborative recommendation systems, we put forward a bipartite link community division algorithm based on label propagation (BELPA). We redefined the structure of adjacent edges and the edge correlation measure by making full use of the properties of the endpoints of each edge. We then gave a label to each edge and synchronously updated the labels according to the edge correlations until a steady state was reached. Edges with the same label make up a community. Drawing on the idea of strong and weak communities, we extended the basic algorithm by adjusting the label updating function so that the scale of the communities becomes variable. Finally, we designed numerical experiments on relevant data sets to verify the validity of the algorithm.
We also proposed a collaborative recommendation algorithm based on the bipartite communities obtained by BELPA. In detail, we used the overlaps of, and the correspondence between, the user and resource communities to realize dynamic nearest neighbourhoods. Finally, the numerical experiments and the analysis of their results show that our recommendation algorithms can effectively improve recommendation accuracy and individualization.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported in part by Grants from The National Natural Science Foundation of China (61070122, 61070223, and 61373094); Jiangsu Provincial Natural Science Foundation (9KJA520002); Jiangsu Provincial Research Scheme of Natural Science for Higher Education Institutions (09KJA520002); Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (kjs1024); Jiangsu Province Support Software Engineering R&D Center for Modern Information Technology Application in Enterprise (SX200902).