CSA: A Credibility Search Algorithm Based on Different Query in Unstructured Peer-to-Peer Networks

Searching for resources efficiently while consuming little network bandwidth has become a challenging task in unstructured peer-to-peer (P2P) networks. Heuristic search is an effective approach that relies on previous searches to guide future ones. In the methods proposed so far, searching for high-repetition resources is more effective. However, the performance of searches for nonrepetition, low-repetition, or rare resources needs to be improved. To address this problem, and considering the similarity between social networks and unstructured P2P networks, we present a credibility search algorithm based on different queries according to the trust production principle in sociology and psychology. In this method, queries are divided into familiar queries and unfamiliar queries. For each kind of query, we adopt a different way to obtain the credibility of a node toward each of its neighbors. Queries are then forwarded by the neighbor nodes with higher credibility. Experimental results show that our method can improve query hit rate and reduce search delay with low bandwidth consumption in three different network topologies under static and dynamic network environments.


Introduction
In the past ten years, peer-to-peer (P2P) networks have developed fully and become an important part of the Internet. P2P networks are divided into structured P2P networks and unstructured P2P networks. Unstructured P2P networks are characterized by self-organization, distributed resource sharing, semantic queries, and so forth, and have been widely applied on the Internet in systems such as Gnutella [1], FastTrack [2], and KaZaA [3]. However, because of the dynamic characteristics of unstructured P2P networks, it is difficult to capture global behavior correctly [4,5]. Each node in an unstructured P2P network has no global information about the whole network topology or the location of queried resources. Thus, designing an efficient search algorithm has been a hot research issue in unstructured P2P networks.
There are mainly two kinds of search methods in unstructured P2P networks: blind search methods and informed search methods. In the former, such as flooding [1], peers possess no knowledge to guide the search process, so the search is largely blind. As the network grows, the search time is extended, a large number of redundant messages are created, and large amounts of network bandwidth are consumed. In order to reduce the bandwidth consumption, many improved methods [6][7][8][9][10][11][12][13][14][15] have been proposed on the basis of the blind search algorithms. The methods in [8][9][10] reduce the network bandwidth consumption of flooding while preserving its large coverage, response time, and flexibility in dynamic environments. The search algorithms presented in [11,12] achieve higher performance than random walks in terms of number of hits, network overhead, and response time by adopting stochastic process knowledge and by estimating the popularity of a resource, respectively. A hybrid search scheme [13] and light flood [14] combine flooding and random walks and make full use of the merits of both so as to minimize redundant messages. In [15], RFSA effectively prevents a message from being received and forwarded repeatedly in blind search methods by using real-time search path information and a local message index caching mechanism, thus avoiding a great number of redundant messages.
In contrast to the blind search methods, informed search algorithms have been studied more extensively, and many have been proposed, such as intelligent search [16], APS [17], PQR [18], and SPUN [19]. In [16], an intelligent search is proposed in which a query is forwarded to the neighbors that have answered the most queries similar to the current query. APS [17] is a popular adaptive probabilistic search algorithm based on random walks; it is bandwidth efficient and easy to implement in unstructured P2P networks. APS utilizes feedback from previous searches to guide future ones probabilistically. In APS, each node maintains an index table that records the success rate of each neighbor for each requested resource in previous searches. APS probabilistically selects the neighbor nodes that achieved a higher success rate for the requested resource in previous searches, so that the search is guided toward the requested resource. At the same time, the success rate is updated dynamically based on whether a peer returns a hit or a miss for a given query.

PQR [18] is a query routing mechanism for improving query performance in unstructured P2P networks. In PQR, a compound data structure called the traceable gain matrix (TGM) records the gain value of every query at each peer along the query hit path and maintains query routing information. With the TGM, PQR can optimize query routing decisions effectively and achieve a high query hit rate with low bandwidth consumption.

In these methods [16][17][18], peers update index values based only on the return type of the query message, success or failure. In APS and PQR, peers also tend to use the first discovered neighbor node, which reduces search performance in dynamic environments. SPUN [19] is proposed to address these problems. SPUN is an informed search algorithm that improves upon the state-of-the-art APS. Each peer in SPUN maintains a vector of relative success rates (RSRV) along a query path for a given neighbor and a requested resource, resulting in a more informed decision. SPUN uses the best path gradient (BPG) as its neighbor node selection mechanism: it first calculates the PG values of the different paths through neighbor nodes and then discovers the more successful query paths through the neighbors according to those PG values. The purpose of SPUN is to select the neighbor node with the most successful query path to forward the query message. However, the most successful query path often becomes a "traffic artery" for the requested resources, which can cause a search bottleneck. In addition, SPUN considers only the success of a path and not its distance. The most successful path may be the longest and the most congested, thereby increasing the search time and network overhead.
The common characteristic of the methods above is that they guide future searches through previously recorded search information. Therefore, these algorithms are effective for repeated queries for the same or similar objects (resources), because they can gain heuristic information from the relevant indexes such as TGM (in PQR) or RSRV (in SPUN). But for nonrepetitive queries, such as queries for rare resources, there is no heuristic information in the relevant indexes, so the queries are forwarded to a random neighbor node (peer). In that case, the search is inefficient in these methods and their overall search performance is reduced.
The main motivation of our research is to solve the inefficiency of queries for nonrepetition, low-repetition, or rare resources. To this end, a credibility search algorithm (CSA) based on different queries is proposed in this paper. The main purpose is to improve, through effective guidance, the performance of searches for nonrepetition, low-repetition, or rare resources as well as repeated queries, and thereby to achieve higher overall search performance. The contributions of this paper are as follows. (1) It is the first work in which queries are divided into familiar queries and unfamiliar queries, a different calculation method is adopted for each kind of query to obtain credibility information, and neighbor nodes with higher credibility are then selected as passers. (2) We give a credibility calculation method that computes a neighbor's credibility according to familiarity and similarity among queries and nodes, based on the trust production principle in sociology and psychology. (3) We design a new data structure, the query credible matrix (QCM), which records the credibility of each neighbor node for each specific object. (4) The proposed method achieves a high query hit rate with low search delay and low bandwidth consumption and improves the search performance for nonrepetitive queries, such as rare resource queries.

Credibility Calculation Method
The studies in [20] show that trust among humans consists of two parts: one generated by familiarity and the other by similarity. In social networks, the more familiar people are with each other, the more trust is produced. The more similar people are in their interests and hobbies, the more easily they trust each other. So calculating trust among humans through familiarity and similarity reflects the process by which trust is generated in social networks. Considering the similarity between unstructured P2P networks and social networks, the principle by which trust is generated among humans in social networks can be applied to calculating the credibility of a node toward its neighbors in unstructured P2P networks. The credibility of a node toward its neighbors is likewise divided into the trust generated by familiarity and the trust generated by similarity.
Suppose $p_i$ stands for an arbitrary peer in an unstructured P2P network and $p_j$ is an arbitrary neighbor peer of $p_i$. The credibility of $p_i$ to $p_j$ is calculated by formula (1), where $\mathrm{Cre}(p_i, p_j)$ represents the credibility of the peer $p_i$ to its neighbor peer $p_j$ and $\mathrm{fam}(p_i, p_j)$ refers to the trust generated by the familiarity between $p_i$ and $p_j$. In familiarity studies, familiarity is often measured by the number of contacts among people. In our study, $\mathrm{fam}(p_i, p_j)$ is defined as the success rate of the communications between $p_i$ and $p_j$. The more successful communications there are, the higher the communication success rate, the more familiar $p_i$ and $p_j$ are, and the higher the credibility of $p_i$ to $p_j$. $\mathrm{sim}(p_i, p_j)$ denotes the trust generated by the similarity between $p_i$ and $p_j$. In unstructured P2P networks, the contents stored on a peer reflect the interests and hobbies of that peer, so $\mathrm{sim}(p_i, p_j)$ is obtained from the similarity of the contents stored on $p_i$ and $p_j$.

Credibility Calculation Based on Familiarity.
In social networks, familiarity among humans mainly derives from mutual help, constant contact, and so forth. Similarly, in unstructured P2P networks, whether a peer and its neighbor are familiar is decided by the behavior of the communications between them.
In general, the more communications there are between $p_i$ and $p_j$, the more familiar they are. However, this communication is bidirectional. If $p_i$ only sends query messages to $p_j$ and $p_j$ cannot return successful messages to $p_i$, then, although the number of communications between them grows, the neighbor peer $p_j$ is still not credible for future searches. So the number of communications between $p_i$ and $p_j$ alone does not reflect the familiarity of $p_i$ to $p_j$ well, and we cannot use it directly as the credibility of $p_i$ to $p_j$ generated by familiarity.
Because queries tend to be passed to the first peers found [17] or to the peers with higher degree [21] in the informed methods, these peers have more opportunities to be selected as passers. Thus, these peers can accumulate a higher number of successful return messages than peers that have less opportunity to be selected as passers, even if most query messages forwarded through the latter are returned successfully. Meanwhile, the first peers or the peers with higher degrees are selected repeatedly to forward messages, which produces a search bottleneck on these peers and reduces the breadth of the search.
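The difference between counting successful returns and measuring a success rate can be made concrete with a small sketch. It is illustrative only: the function names and the per-neighbor counters are assumptions rather than part of CSA as published, and the counts (7 of 7 and 10 of 20) are the ones used in the worked example in this section.

```python
def fam_by_count(sent: int, succeeded: int) -> int:
    """Familiarity as a raw count of successful returns (the option argued against)."""
    return succeeded


def fam_by_rate(sent: int, succeeded: int) -> float:
    """Familiarity as the communication success rate; 0.0 if nothing was sent yet."""
    return succeeded / sent if sent else 0.0


# Hypothetical neighbors mapped to (messages sent, successful returns).
neighbors = {"p1": (7, 7), "p2": (20, 10)}

# Neighbor that would be preferred under each definition.
best_by_count = max(neighbors, key=lambda n: fam_by_count(*neighbors[n]))
best_by_rate = max(neighbors, key=lambda n: fam_by_rate(*neighbors[n]))
```

With raw counts the busier neighbor `p2` is preferred (10 against 7), whereas with success rates `p1` is preferred (1.0 against 0.5).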
For example, suppose $p_i$ sends 7 messages to $p_1$ and all 7 are returned successfully, while $p_i$ sends 20 messages to $p_2$ and 10 are returned successfully. If we adopt the number of successful communications as the credibility of $p_i$ to $p_j$ generated by familiarity, then $\mathrm{fam}(p_i, p_1) = 7$ and $\mathrm{fam}(p_i, p_2) = 10$. Although every message that $p_i$ sent to $p_1$ was returned successfully, $p_1$ forwarded fewer query messages, so its number of successful returns is still lower than that of $p_2$. Thus $p_2$ gains more confidence than $p_1$ and is selected more often, which causes a bottleneck at $p_2$ and neglects $p_1$. So, in this paper, we calculate the trust of $p_i$ to $p_j$ generated by familiarity according to the communication success rate. Then $\mathrm{fam}(p_i, p_1) = 1$ and $\mathrm{fam}(p_i, p_2) = 0.5$; the peer $p_1$ gains more trust and is selected more often than $p_2$.

Credibility Calculation Based on Similarity.
The studies in [22,23] show that better search performance can be obtained by clustering similar peers into the same group based on the similarity of their preferences, such as interests or hobbies, and then searching within these groups. This suggests that a peer can get more successful hits from neighbor peers that are similar to it. So the more similar a peer and its neighbor are, the more they trust each other. In this section, we obtain the preference information of peers from the contents stored on them and then calculate the credibility between a peer and its neighbor generated by similarity according to the similarity of that preference information.
Given the object set of peer $p_i$, $O_i = \{o_1, o_2, \ldots, o_l, \ldots, o_n\}$ with $n$ elements, advances in metadata retrieval technology [24][25][26] make it easy to obtain the keywords of every object. Let $K_{o_l} = \{k_1, k_2, \ldots, k_t, \ldots\}$ be the set of keywords used to describe an object $o_l$. We denote the keyword set of all objects held by peer $p_i$ by $K_i = \bigcup_{l=1}^{n} K_{o_l}$, and the characteristics (preference information) of peer $p_i$ are defined as a vector of weights $\vec{w}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,t}, \ldots)$, where the weight $w_{i,t}$, given by formula (3), denotes the preference of peer $p_i$ for objects described by the keyword $k_t$. In formula (3), $O_i$ is the set of objects held by peer $p_i$ and $O_{i,t}$ is the subset of $O_i$ containing the objects tagged by keyword $k_t$.
The similarity $\mathrm{sim}(p_i, p_j)$ between $p_i$ and $p_j$ can be measured by comparing their preferences. Several measures, such as the correlation coefficient, the cosine similarity measure [27], and the Euclidean distance, can be used to compare two description vectors and return a quantitative value that represents the similarity between peers. Here, we use the cosine similarity measure to quantify $\mathrm{sim}(p_i, p_j)$ as follows:

$$\mathrm{sim}(p_i, p_j) = \frac{\sum_{t=1}^{m} w_{i,t}\, w_{j,t}}{\sqrt{\sum_{t=1}^{m} w_{i,t}^{2}}\;\sqrt{\sum_{t=1}^{m} w_{j,t}^{2}}} \tag{4}$$

where $m$ is the total number of keywords. If $p_i$ and $p_j$ have similar interests in the contents they hold, a larger value of $\mathrm{sim}(p_i, p_j)$ is obtained.
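The preference vectors and their cosine similarity can be sketched as follows. This is a minimal sketch under stated assumptions: each peer is represented as a mapping from object name to its keyword set, and the weight of a keyword is taken to be the fraction of the peer's objects tagged with it, which is one plausible reading of the weight definition rather than a confirmed formula. The function names and sample data are hypothetical.

```python
import math


def preference_weights(objects: dict[str, set[str]]) -> dict[str, float]:
    """Map each keyword to the fraction of the peer's objects tagged with it (assumed weight)."""
    if not objects:
        return {}
    counts: dict[str, int] = {}
    for keywords in objects.values():
        for kw in keywords:
            counts[kw] = counts.get(kw, 0) + 1
    return {kw: c / len(objects) for kw, c in counts.items()}


def cosine_similarity(w_i: dict[str, float], w_j: dict[str, float]) -> float:
    """Cosine similarity of two sparse weight vectors; 0.0 if either vector is all zero."""
    dot = sum(w * w_j.get(kw, 0.0) for kw, w in w_i.items())
    norm_i = math.sqrt(sum(w * w for w in w_i.values()))
    norm_j = math.sqrt(sum(w * w for w in w_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical peers: object name -> keywords describing that object.
peer_i = {"song_a": {"music", "rock"}, "song_b": {"music", "jazz"}}
peer_j = {"song_c": {"music", "rock"}, "clip_d": {"video"}}

sim_ij = cosine_similarity(preference_weights(peer_i), preference_weights(peer_j))
```

Keywords that only one peer holds contribute to that peer's norm but not to the dot product, so peers with largely disjoint contents score close to 0.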

Credibility Calculation Based on Query.
In the real world, when a person is asked to do a task, he or she tends to choose credible and capable friends to complete it. For example, in the six degrees of separation principle, everyone who receives the letter usually passes it to the friends whose information is most similar to that of the letter.
In unstructured P2P networks, for a specific query $q_h$, we use $\mathrm{sim}(q_h, p_i, p_j)$ to denote the credibility between $p_i$ and its neighbor $p_j$ based on the specific query $q_h$. The value of $\mathrm{sim}(q_h, p_i, p_j)$ is calculated as follows: [formula not recoverable].
Above, we introduced the method for calculating the credibility between a peer and its neighbor and described the method for calculating credibility between peers based on a specific query. In this method, a peer chooses a different credibility calculation method to obtain its trust in its neighbor peers according to the query type and then selects the neighbor peers with higher credibility to pass messages. Our algorithm is described in detail in Section 4.

Data Structure and Update Mechanism in CSA
According to formula (2), on the basis of CQM, the credibility of $p_i$ to its neighbor $p_j$ generated by familiarity is given by formula (6).

3.3. Update Mechanism.
At the beginning of the search, both QOL and QPL are empty at each peer, as is the corresponding CQM. Because there is not yet any communication information between a node and its neighbor nodes, the credibility generated by familiarity toward each neighbor is 0, and the credibility generated by similarity toward each neighbor is computed according to formula (4). When a new query arrives, the peer first checks whether the requested object information is included in QOL; if not, the requested object information (keywords) is added to QOL. When a neighbor node is selected to pass the query and that neighbor node is not in QPL, it is added to QPL. At the same time, the corresponding object row and node column are established in CQM and the corresponding count element is set to 1. If the requested object is already included in QOL and QPL contains the selected node, then the position of the requested object (its row) and the position of the selected node (its column) are obtained from QOL and QPL, respectively, and the corresponding count element in CQM is incremented by 1.
If the requested object is already included in QOL and the selected node is not in QPL, the selected node is added to QPL and the corresponding object row and node column are established in CQM. After a query hit is returned, the corresponding count elements in CQM are set (a newly established count to 1, an existing count to its previous value plus 1) and the familiarity value is updated according to formula (6). If the query returns failure, the corresponding information in QPL, QOL, and CQM is deleted.
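A minimal sketch of this bookkeeping is given below. The class and field names (`CredibilityTable`, `Entry`, `sent`, `hits`) are hypothetical, a dictionary keyed by (object, neighbor) stands in for QOL, QPL, and the rows and columns of CQM, and familiarity is taken to be hits divided by queries sent, in line with the success-rate definition of familiarity. The deletion of entries on failure is not modeled; in this sketch a miss simply leaves the hit counter unchanged, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    sent: int = 0  # queries for this object forwarded to this neighbor
    hits: int = 0  # of those, how many came back as query hits

    @property
    def fam(self) -> float:
        """Success rate; 0.0 while there is no communication history."""
        return self.hits / self.sent if self.sent else 0.0


@dataclass
class CredibilityTable:
    # (object keywords, neighbor id) -> counters; stands in for QOL, QPL, and CQM.
    entries: dict[tuple[str, str], Entry] = field(default_factory=dict)

    def record_forward(self, obj: str, neighbor: str) -> None:
        """Create the entry on first use, then count the forwarded query."""
        self.entries.setdefault((obj, neighbor), Entry()).sent += 1

    def record_hit(self, obj: str, neighbor: str) -> None:
        """Count a returned query hit for an entry that was forwarded earlier."""
        self.entries[(obj, neighbor)].hits += 1

    def fam(self, obj: str, neighbor: str) -> float:
        """Familiarity for this (object, neighbor) pair; 0.0 if never used."""
        entry = self.entries.get((obj, neighbor))
        return entry.fam if entry else 0.0


# Four queries for "jazz" forwarded to neighbor p2, one of which returned a hit.
table = CredibilityTable()
for _ in range(4):
    table.record_forward("jazz", "p2")
table.record_hit("jazz", "p2")
```

Keeping both counters rather than a single stored rate makes the incremental update trivial: the rate is recomputed from the counts whenever it is read.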

Search Algorithm CSA
4.1. Problem Analysis.
In the existing heuristic search algorithms, such as APS, PQR, and SPUN, when a node receives a query message, it first checks whether its index table holds information relevant to the query. If it does, the query is forwarded under the guidance of the previously recorded information. If it does not, the query is forwarded by random walk. Thus, these methods are effective for queries with a high repetition rate. But for queries with a low repetition rate, such as a new query or a query for rare resources, these methods have little or no heuristic guidance. In that case, their efficiency is low and their performance is equivalent to that of random walks. We simulate the three methods APS, PQR, and SPUN in experiments (the experimental configuration is given in Section 5.1); the simulation results are shown in Figure 1.
In Figure 1, "total number" represents the total number of selected neighbor nodes. "Random selection" means the number of neighbor nodes chosen randomly. "APS selection," "PQR selection," and "SPUN selection" denote the number of neighbor nodes selected using the heuristic information of the respective algorithm. From Figure 1, we find that only about a third of the neighbor nodes are selected according to the heuristic strategies during the search and that two-thirds are still selected randomly in the three algorithms. Therefore, our motivation is to improve the search performance for queries with a low repetition rate as well as for those with a high repetition rate. In our method, queries are first divided into familiar queries and strange (unfamiliar) ones, and a different neighbor node selection strategy is then used to select the neighbor peers that forward each kind of query.

Peer Selection Criteria.
When a node receives a query message, if the index table of this node holds information relevant to the query, the query is defined as a familiar query. A familiar query is processed according to previous search information. Otherwise, the query is defined as a strange query. Unlike a familiar query, a strange query has no heuristic information at this node. Therefore, for a strange query, our method first obtains the credibility of the node to each neighbor node based on all the previous queries and on the similarity between the requested object information and the contents stored on each neighbor node and then chooses the neighbor nodes with higher credibility to transfer the strange query. The credibility used for each kind of query, familiar or strange, is described below. According to Sections 2 and 3, for a familiar query, the credibility of the node $p_i$ to its neighbor node $p_j$ is calculated by formula (10); for a strange query, it is calculated by formula (11).
In the search process, our method first determines whether a query is a familiar query or a strange query. If it is a familiar query, we calculate the credibility of the node to its neighbor nodes according to formula (10). If it is a strange query, we compute the credibility according to formula (11). In both cases, the neighbor nodes with high credibility are then selected as passers. The proposed algorithm is described in Section 4.3.
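The selection step can be sketched as follows. The sketch does not reproduce formulas (10) and (11); instead, the two credibility calculations are passed in as functions, so the names used here (`select_passers`, `cre_familiar`, `cre_strange`, `k`) and the toy scores in the usage lines are assumptions for illustration only.

```python
from typing import Callable, Iterable

Scorer = Callable[[str, str], float]  # (query object, neighbor id) -> credibility


def select_passers(
    query_obj: str,
    neighbors: Iterable[str],
    known_objects: set[str],
    cre_familiar: Scorer,
    cre_strange: Scorer,
    k: int = 1,
) -> list[str]:
    """Return the k neighbors with the highest credibility for this query.

    A query is treated as familiar when its object already appears in the
    node's index (known_objects); otherwise it is treated as strange.
    """
    scorer = cre_familiar if query_obj in known_objects else cre_strange
    ranked = sorted(neighbors, key=lambda n: scorer(query_obj, n), reverse=True)
    return ranked[:k]


# Toy scores standing in for the two credibility calculations.
history = {("jazz", "p1"): 0.9, ("jazz", "p2"): 0.4}
content_sim = {"p1": 0.1, "p2": 0.7}


def by_history(q: str, n: str) -> float:
    return history.get((q, n), 0.0)


def by_content(q: str, n: str) -> float:
    return content_sim[n]


passers_familiar = select_passers("jazz", ["p1", "p2"], {"jazz"}, by_history, by_content)
passers_strange = select_passers("opera", ["p1", "p2"], {"jazz"}, by_history, by_content)
```

The point of the split is visible in the usage lines: the familiar query follows recorded history and picks `p1`, while the strange query has no history and falls back on content similarity, picking `p2` instead of a random neighbor.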

Algorithm Description.
The credibility search algorithm includes the process of the neighbor node selection and updating process, which is described in detail in Algorithm 1.

Experiments and Performance Evaluation
P2P networks are large-scale networks with millions of nodes, which join and leave frequently, giving the networks their dynamic characteristics. The experiments use three network topologies: the random graph network structure (random graph), the small world model network structure (small world), and the scale-free network structure (scale-free). We evaluate the performance of the four search strategies (CSA, PQR, SPUN, and APS) in terms of network overhead, query hit rate, search delay, and the query hit rate for rare resources under static and dynamic network conditions.

Experimental Setting.
Studies have shown that the requests of Gnutella, Napster, and Web users tend to follow Zipf-like distributions [29]. To reflect a real network environment, the object popularity in our experiments follows a Zipf-like distribution given by formula (12) [30], where $N$ is the number of objects (resources), $\alpha$ denotes the exponent characterizing the distribution, and $i$ is the relative position of a resource or object. Research [31] shows that $\alpha$ is usually between 0.6 and 0.8. In the experiments, we provide a total of 10,000 objects divided into 100 classes. Each class of objects is assigned a popularity based on formula (12). We set the number of objects of each type on the basis of its popularity, so that objects or resources with high popularity obtain a higher replication rate. Each object or resource is duplicated to a random node. As for queries, each node draws its query objects from the 100 classes based on the popularity of the objects, so objects with high popularity are more likely to be selected as query objects. In this way, the queries in our experiments are closer to those in real networks. Table 1 shows the experimental parameters and their default values. Three kinds of network topologies are constructed, and their node degrees and degree distributions are analyzed in our experiments. The results of the analysis are shown in Table 2.

(1) Comparative Analysis of Query Hit Rate. In Figure 2, the query hit rate curves of PQR and SPUN lie in the middle of the curves of APS and CSA. The query hit rate of APS is the lowest of the four algorithms and that of CSA is the highest. The query hit rate of CSA exceeds that of APS by about 22% and those of PQR and SPUN by about 9% and 10%, respectively, when the TTL value is greater than 5.
The rare resource query hit rates of the four algorithms are shown in Figure 3. The query hit rate of CSA for rare resources is significantly better than those of the other three methods. Even compared with PQR, the best performing of the other three, the query hit rate for rare resources in CSA is still about 20% higher. The main reason is that searches for rare resources are nonrepetitive or rarely repeated, so the other three methods have little or no historical experience to use and can only forward query messages randomly. In CSA, by contrast, for nonrepetitive or rarely repeated searches, we take into account both the overall success rate of the previous search experience and the similarity between the query itself and the contents stored in the neighbor node, and thereby obtain effective heuristic information that avoids blind random search, resulting in better performance.
(2) Comparative Analysis of Search Delay. In our experiments, searching is based on the deployment of $k$ walkers, and $k$ is set to 4, so each request may receive multiple query hits. In this paper, the search delay is defined as the hop count of the first hit message returned successfully. Figure 4 displays the search delays of the four methods, APS, PQR, SPUN, and CSA, for different TTL values in the three network topologies. The search delay of CSA is basically stable at 4 hops when the TTL value is greater than 16, whereas those of the other three methods grow stepwise as the TTL value increases. Figure 4 also shows that the search delay of CSA is about 1 hop lower than those of the other three methods when the TTL value is less than 23. When the TTL value is greater than 23, the advantage of CSA is more obvious.
(3) Comparative Analysis of Average Number of Messages. Figure 5 shows the average number of messages generated per query in the different networks. CSA again performs best among the four methods. Figure 5 also indicates the rate at which CSA reduces the average number of messages generated per query relative to the best performing of the other three methods, PQR, SPUN, and APS. From Figure 5, we can see that the average number of messages per query of CSA is reduced by 12.8%, 13.4%, and 12.9% in the three network topologies, respectively, when the TTL value reaches 25.
(4) Comparative Analysis of Network Overhead. In the search process, network overhead is determined by the number of messages generated per query and the network bandwidth consumed by these messages. In this paper, we use the actual number of bytes of the IP packets generated per message as the network overhead.
According to Section 3.1, a message in CSA consists of seven fields. In our experiments, the first field occupies 10 bytes. The information of each of the two nodes is composed of "IP + port number," 6 bytes, a total of 48 bits. The hop-count field and the TTL field occupy 2 bytes each. 4 bytes are assigned to the query field. The number of bytes of the last field, the path, changes dynamically as the search progresses.
Since the total number of bytes generated per message is small, no fragmentation occurs, and each message is carried in a single UDP packet. Thus, the total number of bytes generated per message also includes 28 bytes for the IP header and UDP header.
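The per-message byte count described above is simple arithmetic and can be sketched as a function. The constant names are hypothetical, the path field is assumed to hold one 6-byte node entry per hop travelled (the text says only that its size changes during the search), and the 28 header bytes are split as a 20-byte IPv4 header without options plus an 8-byte UDP header.

```python
ID_BYTES = 10                 # first field of the message
NODE_BYTES = 6                # "IP + port number" (4 + 2 bytes), one per node field
COUNTER_BYTES = 2             # hop-count field and TTL field, each
QUERY_BYTES = 4               # query field
IP_UDP_HEADER_BYTES = 20 + 8  # IPv4 header without options + UDP header


def message_bytes(path_len: int) -> int:
    """Bytes on the wire for one message whose path field holds path_len nodes.

    Assumes each path entry is stored as one NODE_BYTES-sized node address.
    """
    if path_len < 0:
        raise ValueError("path_len must be non-negative")
    fixed = ID_BYTES + 2 * NODE_BYTES + 2 * COUNTER_BYTES + QUERY_BYTES
    return IP_UDP_HEADER_BYTES + fixed + path_len * NODE_BYTES


size_at_start = message_bytes(0)
size_after_five_hops = message_bytes(5)
```

Under these assumptions a message with an empty path takes 58 bytes and grows by 6 bytes per recorded hop, which stays far below a typical link MTU and is consistent with the statement that no fragmentation occurs.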
Figure 6 compares the network overhead of the four search strategies. As in Figure 5, we mark in Figure 6 the rate at which CSA reduces the network overhead relative to the best performing of the other three methods, PQR, SPUN, and APS. Figure 6 shows that, when the TTL value is 5, the network overhead of CSA is slightly higher than that of the best of the other three methods, by 0.2% and 1.0% in the small world model network and the scale-free network, respectively. But in the initial stage of the search, the overall bandwidth consumption is very low, so this small increase does not burden the network. As the TTL value increases, the network overhead of CSA is reduced continuously. When the TTL value is 15 and 25, the reduction rate of the network overhead reaches its highest values, 10.3% and 17.0%, in the scale-free network and the random graph network, respectively. At this point, CSA performs best among the four algorithms.

Performance Analysis in Dynamic Network Environments.
Here, we conduct a series of experiments similar to those in Section 5.2.1 to evaluate the performance of CSA under dynamic network environments. The total experimental runtime is divided into 100 time slices. At the end of each time slice, we add 10 new nodes and allocate 100 resources to these nodes according to the Zipf distribution. At the same time, 100 nodes are selected randomly from the network and each of them deletes one randomly chosen resource from its resource list. In this dynamic environment, we deploy the four query strategies CSA, PQR, SPUN, and APS in the three network topologies; the results are shown in Figures 7, 8, and 9 and Tables 3 and 4.
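The churn schedule can be sketched as a small simulation loop. This is a sketch under assumptions, not the authors' simulator: popularity is taken to be proportional to $1/i^{\alpha}$ over 100 object classes (the usual Zipf-like form), $\alpha = 0.7$ is picked from the 0.6 to 0.8 range quoted in the experimental setting, only nodes that still hold a resource are eligible to delete one, and the starting network of 200 nodes and all function names are hypothetical.

```python
import random


def zipf_weights(n_classes: int, alpha: float) -> list[float]:
    """Zipf-like popularity: weight of rank i proportional to 1 / i**alpha, normalized."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_classes + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def churn_step(network: dict[int, list[int]], weights: list[float],
               rng: random.Random, new_nodes: int = 10,
               new_resources: int = 100, deleting_nodes: int = 100) -> None:
    """Apply one time slice of churn in place.

    network maps node id -> list of object-class ranks held by that node.
    """
    # Nodes present before this slice that still hold something may delete.
    existing = [n for n, held in network.items() if held]
    # Join: add new nodes and spread Zipf-distributed resources over them.
    next_id = max(network, default=-1) + 1
    joined = list(range(next_id, next_id + new_nodes))
    for node in joined:
        network[node] = []
    ranks = range(1, len(weights) + 1)
    for obj_class in rng.choices(ranks, weights=weights, k=new_resources):
        network[rng.choice(joined)].append(obj_class)
    # Delete: randomly chosen nodes each drop one randomly chosen resource.
    for node in rng.sample(existing, min(deleting_nodes, len(existing))):
        held = network[node]
        held.pop(rng.randrange(len(held)))


rng = random.Random(0)
weights = zipf_weights(100, 0.7)
network = {n: [1, 2, 3] for n in range(200)}
for _ in range(100):  # 100 time slices
    churn_step(network, weights, rng)
```

Passing an explicit `random.Random` instance keeps a run reproducible, which matters when the same churn sequence has to be replayed for each of the four search strategies.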
Figure 7 compares the query hit rates of the four algorithms CSA, PQR, SPUN, and APS under dynamic network environments. The query hit rate of CSA is the highest of the four search strategies in all three network topologies. In the small world network in particular, the query hit rate of CSA exceeds those of PQR, SPUN, and APS by about 14%, 18%, and 26%, respectively.
Figure 8 compares the query hit rates of the four algorithms for rare resources. The query hit rate of CSA is still the highest and surpasses those of the other three methods by about 20% under dynamic network environments.
Figure 9 compares the search delays of the four methods under dynamic network environments. The search delay of CSA is lower than those of the other three algorithms by about 1 hop. When the TTL value is greater than 20, the search delay of CSA is stable at 6 hops, while those of the other three algorithms continue to grow stepwise as the TTL value increases.
Table 3 shows the average number of messages generated per query in the three kinds of network topologies. It samples the TTL range from 1 to 25 with two hops skipped between consecutive samples, so the TTL values in Table 3 are 1, 4, 7, ..., 25. As can be seen from Table 3, the average number of messages generated per query is basically the same in the four methods when the TTL value is less than or equal to 7. However, when the TTL value is greater than or equal to 10, the average number of messages generated per query in CSA is lower than those of the other three methods, by up to about 9.6%.
Table 4 shows the network overhead generated by the four search strategies. As with Table 3, we extract the experimental data with two hops skipped between consecutive samples over 1 to 25 hops to form Table 4. From Table 4, we can see that, as the TTL value increases, the network overhead of CSA is less than or equal to those of the other three methods, while those of the other three methods are basically the same. After the 10th hop, the network overhead of CSA is reduced relative to the other three methods.

Mathematical Problems in Engineering

In summary, the performance of all four algorithms is reduced to some extent in the dynamic network environments. For example, in the dynamic environments, the query hit rates of the four search strategies CSA, PQR, SPUN, and APS for scarce resources are reduced on average by about 20% compared with the static network environments, as shown in Figures 3 and 8. And although the search delays of the four strategies also grow stepwise, as in the static network environments, the gradient is larger, as shown in Figures 4 and 9. The main reason is that network churn degrades search performance, although it affects CSA less than the other three algorithms. For example, the query hit rate of CSA in Figure 7 drops only slightly, by less than 7%, compared with that in Figure 2, whereas the query hit rates of the other three methods are reduced by more than 10%. At the same time, the advantage of CSA over the other three algorithms is larger in the dynamic network environments than in the static ones. For example, in the static network environments, the query hit rate of CSA is higher than those of PQR, SPUN, and APS by about 9%, 10%, and 22%, respectively (Figure 2), and in the dynamic network environments it is higher by about 14%, 18%, and 26%, respectively, in the small world network (Figure 7).

In the search process, the worst case is that all $k$ walkers travel TTL hops and then return a hit or miss message along the query path. Therefore, the number of messages generated per query is $2 \times k \times \mathrm{TTL}$ in the worst case, which is the same as for the other three methods PQR, SPUN, and APS. But the simulation experiments in this paper have shown that CSA produces fewer messages and lower network consumption than the other methods, in both dynamic and static P2P environments. There are two main reasons. On the one hand, the credibility generated on the basis of familiarity and similarity is very effective heuristic information for guiding future searches: it improves the query hit rate, reduces the search delay, and shortens the search path, thereby reducing the number of messages generated and the network consumption. On the other hand, classifying queries is reasonable. The credibility information provided in CSA is trustworthy and guides strange queries effectively toward the requested objects, which is superior to the random walks used for such queries in the other three methods. The improvement of the query hit rates for scarce resources in CSA verifies this in our experiments.
CSA makes full use of the advantages of previous informed algorithms such as APS, PQR, and SPUN by using previous search information to guide future searches. Thus, CSA is similar to these methods in its handling of queries with a high repetition rate. The difference is that, when acquiring heuristic information, APS, PQR, and SPUN consider only the hit information of successful paths, whereas CSA also considers the similarity between the queried contents and the contents stored on a node. As a result, in CSA the query message is forwarded to the nodes that are more likely to provide the required resources.
The main difference between CSA and APS, PQR, and SPUN is that CSA classifies queries as familiar or strange (unfamiliar) according to the similarity between the requested resources and those that have been received at the query node. For strange queries, APS, PQR, and SPUN have little or no heuristic guidance, and the queries are forwarded by random walks. In contrast, CSA, based on the trust production principle in sociology and psychology, first obtains the credibility of the node to each neighbor node from all the previous queries and from the similarity between the requested object information and the contents stored on each neighbor node, and then selects the nodes with higher credibility to forward the strange queries. Compared with the random forwarding in APS, PQR, and SPUN, this greatly reduces the blindness of the search.
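The contrast between credibility-guided forwarding and random walks can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the rule that a query is familiar when it shares at least one keyword with earlier queries, and the sample credibility values are all assumptions made for the example.

```python
import random


def is_familiar(query_keywords, seen_keywords):
    """Assumed rule: a query is familiar if it shares a keyword with earlier queries."""
    return bool(set(query_keywords) & set(seen_keywords))


def select_neighbors(credibility, k):
    """Return the k neighbors with the highest credibility (ties broken by name)."""
    ranked = sorted(credibility, key=lambda n: (-credibility[n], n))
    return ranked[:k]


def random_walk_neighbors(neighbors, k, rng=random):
    """Baseline used when no heuristic information exists: pick k neighbors at random."""
    pool = sorted(neighbors)
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical credibility of a node to three neighbors.
cred = {"P1": 0.4, "P2": 1.3, "P3": 0.9}
print(select_neighbors(cred, 2))          # credibility-guided choice
print(random_walk_neighbors(cred, 2))     # blind choice, varies per run
print(is_familiar(["jazz"], {"jazz", "rock"}))
```

With credibility available for both kinds of query, the guided selection is deterministic given the stored values, whereas the random baseline ignores them entirely.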
The main feature of CSA is that it makes full use of the query information and resource information that the nodes themselves hold, so searches are forwarded effectively on the basis of local node information. The method does not require a complicated structure and does not need to track search path information, so it adapts well to the dynamic characteristics of unstructured P2P networks.

Conclusion
In this paper, a credibility search algorithm (CSA) has been presented. Its main feature is that it improves query performance in unstructured P2P networks. By drawing on the trust production principle in sociology and psychology, CSA obtains effective heuristic information and the credibility of a node to its neighbors, so that both familiar and strange queries can be guided successfully. Experimental results show that the proposed algorithm outperforms the other three methods (PQR, SPUN, and APS) and achieves a high query hit rate with less search delay and lower bandwidth consumption in three different types of network topologies under static and dynamic network conditions. CSA is also very effective in searching for rare resources. In the scale-free network in particular, the query hit rate of CSA for rare resources reaches up to about 85% when the TTL value reaches 24. Compared with PQR and APS, the query hit rate of CSA for rare resources increases by about 20% and 40%, respectively, as the TTL value increases in the three network topologies.

Figure 1: The comparison of the number of neighbor nodes selected.

Figure 2: Query hit rate versus TTL value in static network environments.

Figure 3: Query hit rate of rare resources versus TTL value in static network environments.

Figure 4: Search delay versus TTL value in static network environments.

Figure 5: Average number of messages generated per query versus TTL value in static network environments.

Figure 6: Network overhead versus TTL values in static network environments.

Figure 7: Query hit rate versus TTL value in dynamic network environments.

Figure 8: Query hit rate of rare resources versus TTL value in dynamic network environments.

Figure 9: Search delay versus TTL value in dynamic network environments.
… $P_2$, which reduces the bottleneck problem of $P_2$ and improves the performance of future searches. Therefore, the credibility of $P_i$ to $P_j$ generated by familiarity is

$$\mathrm{fam}(P_i, P_j) = \frac{\left| SR_{P_i \leftarrow P_j} \right|}{\left| M_{P_i \to P_j} \right|},$$

where $M_{P_i \to P_j}$ is the set of messages forwarded from $P_i$ to $P_j$ and $SR_{P_i \leftarrow P_j}$ is the set of messages successfully returned from $P_j$ to $P_i$. Here $\mathrm{fam}(P_i, P_j) \in [0, 1]$. When $\mathrm{fam}(P_i, P_j) = 0$, no success message is returned to $P_i$ through $P_j$. When $\mathrm{fam}(P_i, P_j) = 1$, the query messages forwarded to $P_j$ are all returned successfully to $P_i$. Thus, the greater $\mathrm{fam}(P_i, P_j)$ is, the more credible $P_j$ is, and selecting $P_j$ to forward the query gives a higher probability of a successful hit.

$K_{q_h}$ is the set of keywords included in the query $q_h$, $K_{P_j}$ denotes the keyword set of all objects that peer $P_j$ holds, and $\mathrm{sim}(q_h, P_i, P_j) \in [0, 1]$. If $\mathrm{sim}(q_h, P_i, P_j) = 0$, peer $P_j$ has no keywords similar to those of the specific query $q_h$. If $\mathrm{sim}(q_h, P_i, P_j) = 1$, all keywords of the specific query $q_h$ are contained in the keyword set of the objects that peer $P_j$ holds. So, the more similar keywords $q_h$ and $P_j$ share, the larger the value of $\mathrm{sim}(q_h, P_i, P_j)$ is, and the more credible the neighbor peer $P_j$ is for the specific query $q_h$.

According to the calculation method of the credibility $\mathrm{fam}(P_i, P_j)$ generated by familiarity in Section 2.1, we denote the credibility $\mathrm{fam}(q_h, P_i, P_j)$ generated by familiarity based on the specific query $q_h$ as follows:

$$\mathrm{fam}(q_h, P_i, P_j) = \frac{\left| SR^{K_{q_h}}_{P_i \leftarrow P_j} \right|}{\left| M^{K_{q_h}}_{P_i \to P_j} \right|}, \qquad (6)$$

where $M^{K_{q_h}}_{P_i \to P_j}$ is the set of all messages containing keywords $K_{q_h}$ that are forwarded from $P_i$ to $P_j$, $SR^{K_{q_h}}_{P_i \leftarrow P_j}$ is the set of such messages successfully returned from $P_j$ to $P_i$, and $\mathrm{fam}(q_h, P_i, P_j) \in [0, 1]$. If $\mathrm{fam}(q_h, P_i, P_j) = 0$, no message containing keywords $K_{q_h}$ is returned successfully to $P_i$ from $P_j$; the credibility of $P_i$ to $P_j$ generated by familiarity for the specific query $q_h$ is then 0. If $\mathrm{fam}(q_h, P_i, P_j) = 1$, all messages containing keywords $K_{q_h}$ are returned successfully to $P_i$ from $P_j$; the credibility of $P_i$ to $P_j$ generated by familiarity for the specific query $q_h$ is then 1. So, the bigger the value of $\mathrm{fam}(q_h, P_i, P_j)$ is, the more credible the neighbor peer $P_j$ is for the query $q_h$.

Each item $\mathrm{Cre}_{kj}$ has a compound data structure and is represented by a tuple of three elements: $\mathrm{Cre}_{kj} = \langle \mathit{f\_num}_{kj}, \mathit{h\_num}_{kj}, \mathit{fam\_value}_{kj} \rangle$. $\mathit{f\_num}_{kj}$ is the number of query messages containing object $o_k$ and forwarded by neighbor peer $P_j$. $\mathit{h\_num}_{kj}$ is the number of query hit messages containing object $o_k$ and forwarded by neighbor peer $P_j$. $\mathit{fam\_value}_{kj}$ is calculated by formula (6) and denotes the credibility for neighbor $P_j$ generated by familiarity for the query containing object $o_k$. $\mathit{f\_num}_{kj}$ is updated when $P_j$ is selected to forward a query containing object $o_k$. When the query is returned with a hit by $P_j$, the value of $\mathit{h\_num}_{kj}$ is updated. At the same time, the value of $\mathit{fam\_value}_{kj}$ is updated according to formula (6). This dynamic update during the search process adapts to the dynamic characteristics of unstructured P2P networks and avoids the bottleneck problem of peers with higher credibility. For example, when the value of $\mathit{fam\_value}_{kj}$ is high and there are multiple queries containing object $o_k$ at the peer, $P_j$ will be selected repeatedly. If a message is only forwarded but not returned, $\mathit{f\_num}_{kj}$ increases while $\mathit{h\_num}_{kj}$ does not, and thus the value of $\mathit{fam\_value}_{kj}$ decreases. The neighbor peers other than $P_j$ in CQM then have chances to forward the queries containing object $o_k$, which reduces the burden on $P_j$ and expands the scope of the search.

For a new query $q_h$ containing object $o_k$, the credibility $\mathrm{fam}(q_h, P_i, P_j)$ of $P_i$ to its neighbor $P_j$ is obtained from CQM as

$$\mathrm{fam}(q_h, P_i, P_j) = \mathit{fam\_value}_{kj},$$

and the credibility of $P_i$ to $P_j$ is

$$\mathrm{Cre}(q_h, P_i, P_j) = \mathrm{fam}(q_h, P_i, P_j) + \mathrm{sim}(q_h, P_i, P_j).$$

For a strange query, the credibility of the node $P_i$ to its neighbor node $P_j$ is calculated as follows:

$$\mathrm{Cre}(q_h, P_i, P_j) = \mathrm{fam}(P_i, P_j) + \mathrm{sim}(q_h, P_i, P_j).$$
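The bookkeeping behind formula (6) and the two Cre sums can be sketched in a few lines. This is a hedged illustration under stated assumptions: the class name `CreItem`, its field names, and the helper names are hypothetical, and the keyword-overlap fraction used for sim is an assumed form that merely satisfies the boundary cases described in the text (0 when no keywords match, 1 when all query keywords are held by the neighbor).

```python
from dataclasses import dataclass


@dataclass
class CreItem:
    """One CQM entry for a (query object, neighbor) pair; names are hypothetical."""
    forwarded: int = 0  # queries for this object forwarded via the neighbor
    hits: int = 0       # of those, how many were returned as hits

    @property
    def fam_value(self) -> float:
        # Formula (6): successful returns divided by forwarded messages.
        return self.hits / self.forwarded if self.forwarded else 0.0

    def record_forward(self) -> None:
        self.forwarded += 1

    def record_hit(self) -> None:
        self.hits += 1


def sim(query_keywords, neighbor_keywords) -> float:
    """Assumed form: fraction of query keywords found among the neighbor's keywords."""
    q = set(query_keywords)
    return len(q & set(neighbor_keywords)) / len(q) if q else 0.0


def credibility(fam: float, similarity: float) -> float:
    """Cre = fam + sim, for either the query-specific or the overall fam."""
    return fam + similarity


item = CreItem()
for _ in range(4):
    item.record_forward()
for _ in range(3):
    item.record_hit()
print(item.fam_value)                                   # 3 hits out of 4 forwards
print(credibility(item.fam_value, sim(["a", "b"], ["b", "c"])))

item.record_forward()                                   # forwarded but never returned
print(item.fam_value)                                   # familiarity drops
```

The last step shows the self-correcting behaviour described in the text: a forward without a matching hit lowers the stored familiarity, which gives other neighbors a chance to be selected.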

Table 2: Analysis of node degree.

Table 3: Average number of messages generated per query versus TTL value in dynamic network environments.

Table 4: Network overhead versus TTL value in dynamic network environments (M).

In CSA, the space cost at each node consists mainly of four indexes: CNL, CQM, QPL, and QOL. CNL, QPL, and QOL are linear lists. The number of elements in CNL equals the number of neighbors of the node, and the length of QPL does not exceed the length of CNL. The length of QOL is also small because it records only distinct query objects. CQM is a two-dimensional matrix: if the length of QOL is $m$ and the length of QPL is $n$, then the size of CQM is $m \times n$. Because of their linear length, the space cost of CNL, QPL, and QOL is negligible compared with that of CQM in each node. Thus the space complexity of each node in CSA is $O(m \times n)$, which is the same as that of the PQR and APS methods and less than that of the SPUN algorithm ($O(m \times n \times \mathrm{TTL})$). Such a space cost is not a burden for computing devices in current P2P networks. However, if the storage capacity of a node is very low, the node can choose a small CQM and update it according to a first in, first out (FIFO) policy or a least recently used (LRU) policy.
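A size-limited CQM with LRU replacement, as suggested above for nodes with little storage, might look like the following sketch. The class name `BoundedCQM`, its methods, and the choice to evict whole rows (one row per query object) are assumptions made for illustration; the paper does not specify the eviction granularity.

```python
from collections import OrderedDict


class BoundedCQM:
    """Keeps at most `max_objects` query-object rows, evicting the least recently used."""

    def __init__(self, max_objects: int):
        if max_objects < 1:
            raise ValueError("max_objects must be at least 1")
        self.max_objects = max_objects
        self._rows = OrderedDict()  # query object -> {neighbor: credibility value}

    def get_row(self, obj):
        """Return the row for `obj` (or None) and mark it as recently used."""
        row = self._rows.get(obj)
        if row is not None:
            self._rows.move_to_end(obj)
        return row

    def put(self, obj, neighbor, value):
        """Store a value, then evict the oldest rows if the limit is exceeded."""
        self._rows.setdefault(obj, {})[neighbor] = value
        self._rows.move_to_end(obj)
        while len(self._rows) > self.max_objects:
            self._rows.popitem(last=False)  # drop the least recently used row

    def __contains__(self, obj):
        return obj in self._rows

    def __len__(self):
        return len(self._rows)


cqm = BoundedCQM(max_objects=2)
cqm.put("o1", "P1", 0.5)
cqm.put("o2", "P1", 0.2)
cqm.get_row("o1")          # touching o1 makes o2 the eviction candidate
cqm.put("o3", "P2", 0.9)   # exceeds the limit, so o2 is dropped
print("o1" in cqm, "o2" in cqm, "o3" in cqm)
```

A FIFO variant would simply skip the `move_to_end` calls, so that rows are evicted in insertion order regardless of later use.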