A Distributed Dynamic Super Peer Selection Method Based on Evolutionary Game for Heterogeneous P2P Streaming Systems

Owing to its high efficiency and good scalability, the hierarchical hybrid P2P architecture has recently drawn increasing attention in P2P streaming research and applications. Super peer selection, the key problem in hybrid heterogeneous P2P architectures, is highly challenging because super peers must be selected from a huge and dynamically changing network. This paper proposes a distributed super peer selection (SPS) algorithm for hybrid heterogeneous P2P streaming systems based on evolutionary game theory. The super peer selection procedure is first modeled within an evolutionary game framework, and its evolutionarily stable strategies (ESSs) are analyzed. Then, guided by the mixed strategies obtained from this analysis, a distributed Q-learning algorithm (ESS-SPS) is proposed that lets each peer converge to the ESSs based on its own payoff history. Experimental results show that, compared with the traditional random super peer selection scheme, the proposed ESS-SPS algorithm achieves better social welfare and a higher average upload rate of super peers, and keeps the upload capacity of the P2P streaming system increasing steadily as the number of peers grows.


Introduction
The future media Internet will need to distribute high-quality video content in an efficient, flexible, and personalized way through dynamic and heterogeneous network environments. Multimedia streaming content distribution over the Internet has already become a major application due to users' growing demand for multimedia content and the extraordinary growth of network technologies [1]. Streaming applications over heterogeneous P2P systems have gained enormous popularity. However, providing streaming service over heterogeneous P2P networks is still a challenging task because of their inherent instability and unreliability. According to the characteristics of the data dissemination structure, currently deployed P2P video streaming systems can be divided into three types: tree-push based topology, mesh-pull based topology, and hierarchical hybrid topology. Among them, the hierarchical hybrid topology model, which divides the peers in a P2P streaming system into super peers and ordinary peers according to the performance differences of nodes, has become the focus of recent research, especially for heterogeneous P2P streaming applications. The hierarchical hybrid P2P model integrates the advantages of tree-based and mesh-based topologies. It allows the nodes with stronger capabilities to participate fully in the P2P streaming system, which yields higher scalability and stability. However, the super peer selection problem, which is the key problem in hierarchical hybrid P2P streaming systems, is highly challenging because, in a heterogeneous peer-to-peer environment, a large number of super peers must be selected from a huge and dynamically changing network in which neither the node characteristics nor the network topology is known a priori [2]. Furthermore, super peer selection plays an important role in the efficiency and the Quality of Experience (QoE) of the whole P2P streaming system.
Many approaches have been proposed to solve the super peer selection problem, but most of them focus on the capacity of peers [2][3][4][5], such as CPU cycles, memory space, and network bandwidth, and few studies focus on the behavioral attributes of peers. Super peers need to provide resource inquiry and routing services for the ordinary nodes within a group, and they also need to upload video streaming chunks for them. The computation and storage costs borne by super peers may degrade their system performance. An H2O (hierarchical 2-level overlay) protocol was proposed in [2] to select super peers in unstructured P2P networks; it can take trust and secure paths, routing performance, and some other factors into account, but the notification messages between super peers are distributed through a flooding mechanism, resulting in a lot of network overhead. Jin et al. proposed a super peer selection method based on proxy trust in [6]. They used proxy trust to identify the behaviors of super peers. With this metric a peer can select its own trusted super peers and isolate malicious super peers in the system. The method proposed in [6] reduced the flooding overhead of [2] to some extent, but it required a priori knowledge to decide whether to trust a node. Reference [7] proposed a super peer election algorithm based on district partition, which divides the whole P2P overlay into small districts and runs a distributed and parallel election algorithm among these small units to reduce message complexity as well as communication costs. A super-node selection protocol based on self-information theory was discussed in [8], which used a weighted mean mechanism based on the information vector of peers to promote the "best" nodes to super peers according to their capabilities and online time. Wolf et al. considered the super peer selection problem as a hub location optimization problem and proved that it is NP-hard in [9]. They also presented a heuristic method based on evolutionary techniques and local search to solve this problem in [10]. Kratica et al. proposed two hybrid genetic algorithm approaches in [11] for solving this NP-hard problem. The algorithms adopted a two-segment encoding of peers and appropriate objective functions with modified genetic operators, and used local search heuristics to minimize the communication costs. However, most of these works studied general P2P networks and did not consider the characteristics of P2P streaming networks. Wang et al. proposed a super peer selection mechanism based on the analytic hierarchy process in [12], which jointly considered RTT, interest vector similarity, and node bandwidth. Unlike the above works, in this study we present a super peer selection scheme based on evolutionary game theory in a hybrid heterogeneous P2P streaming scenario.
Game theory is a mature theory and a formal mathematical tool for modeling the strategic interaction between two or more decision makers, and it has attracted increasing attention in computer science [13,14] in recent years. The concepts of game theory provide a language to formulate, structure, analyze, and understand strategic scenarios, under the assumption that each participant in the game is "rational" and chooses strategies to maximize his own benefit. Roy et al. surveyed the existing game-theoretic solutions in [14] that are designed to enhance network security, covering issues such as network intrusion detection, DDoS attacks, and Internet worms. In traditional game theory, the Nash equilibrium is the most popular solution concept. It ensures that a player cannot improve its payoff by deviating unilaterally, that is, as long as none of the other players in the game deviates from the solution. However, when a noncooperative game has multiple Nash equilibria, a refined solution is required. The evolutionary equilibrium, which is based on evolutionary game theory, provides such a refined solution, and it ensures stability (i.e., a group of players will not change their chosen strategies over time). An evolutionary game can explicitly capture the dynamics of interaction among the players in a population. In an evolutionary game, a player can observe the behavior of other players, learn from the observations, and make the best decision based on its knowledge. In addition, with replicator dynamics, the state of the game can be determined at a particular point in time, which is useful for investigating the trends of the players' strategies as they adapt their behavior to reach the solution.
The peers in P2P streaming systems are rational and selfish, and they can make decisions according to their needs and the behavior of other peers. In this paper, we present a distributed method for dynamic super peer selection in hybrid heterogeneous P2P streaming systems. First, we use the evolutionary game theory framework to model the super peer selection procedure in a hybrid heterogeneous P2P streaming system; then we design a distributed super peer selection algorithm (ESS-SPS) based on Q-learning according to the Evolutionarily Stable Strategies (ESSs). We apply the ESS-SPS algorithm to the real P2P streaming system LStream [15], which is deployed on the Henan Education and Research NETwork (HERNET). The experimental results show that, compared with the traditional random super peer selection method, the ESS-SPS method provides the system with a stably higher streaming media upload rate and enables the upload capacity of the entire P2P streaming system to increase steadily as the number of nodes grows.
The rest of the paper is organized as follows. In Section 2, we describe the hybrid heterogeneous P2P streaming system model and the utility functions of peer nodes. In Section 3, we model the super peer selection (SPS) procedure as an evolutionary game and analyze the Evolutionarily Stable Strategies of the SPS evolutionary game model by applying the replicator dynamics equation. In Section 4, we propose a distributed Q-learning algorithm for the SPS evolutionary game model that can dynamically converge to the ESS. In Section 5, we present experimental results to evaluate the performance of the proposed ESS-SPS algorithm. Finally, we conclude the paper in Section 6.

System Model and Utility Function
2.1. System Model. Peers in a hybrid heterogeneous P2P streaming system are divided into peer groups according to their geographical location, as shown in Figure 1 [16]. We expect that the video streaming chunks of a certain video channel can be shared within one peer group as much as possible to reduce unnecessary network traffic. At the edge of the hybrid heterogeneous P2P live streaming system, there is a set of group peers who want to view a live video streaming channel simultaneously. Every peer can choose to act as a super peer (SP) or an ordinary peer (OP) in a group. If a peer chooses to act as an SP, then he/she not only needs to act as a client to download video streaming chunks from the SPs in other peer groups but also needs to act as a server to upload video streaming chunks for both the SPs in other groups and the peers in the same group.
On the contrary, if a peer chooses to be an OP, he/she only needs to download or upload the necessary streaming chunks within the same group. Assume that the upload and download bandwidth within a group is larger than that across groups. In such a case, peers tend to be OPs because of their selfish nature. On the other hand, the OPs run the risk of receiving degraded streaming quality, since there may not be sufficient SPs to download streaming chunks from other groups.

2.2. Utility Function.
In the hybrid heterogeneous P2P streaming system, peers act not only as clients who download the needed streaming chunks from other peers but also as servers who upload video streaming chunks for the other peers. So, while a peer benefits from downloading video streaming chunks from the other peers, he/she also incurs a cost in uploading video streaming chunks for them, where the cost may be the resources consumed by uploading chunks, such as bandwidth, memory, and buffer space.
Assume that there are $N$ peers within the group, among which $k$ peers are willing to act as SPs and download video streaming chunks from the SPs of other groups. Suppose that the download rates of the $k$ SPs are $r_1, r_2, \ldots, r_k$; then the total download rate of the group peers is $y_k = \sum_{s=1}^{k} r_s$. As these $k$ SPs select peers outside the group for downloading streaming chunks randomly and independently, the download rates $r_s$ are random variables. According to [17], the cumulative distribution function (CDF) of a peer's download bandwidth can be modeled as a linear function, which means that the probability density function (PDF) of a peer's download bandwidth can be viewed as uniform; that is, the $r_s$ are uniformly distributed.
Obviously, if the total download rate $y_k$ is no less than the source rate $r$, then the group peers can enjoy real-time streaming, and all the group peers obtain a certain profit $G$. Otherwise there will be some delay, and in this case we assume the gain of the peer is 0. Therefore, given the total download rate $y_k$ and the source rate $r$, if peer $i$ chooses to be an SP, then its utility function is

$$U_{SP,i}(k) = \Pr(y_k \ge r)\,G - C_i, \quad k \in [1, N], \qquad (1)$$

where $C_i$ is the cost of peer $i$ when he/she acts as an SP, and $\Pr(y_k \ge r)$ is the probability that the peer obtains real-time streaming. If we assume that $r_1, r_2, \ldots, r_k$ are independent and identically distributed in $[r_L, r_U]$, and $k$ is sufficiently large, then $\Pr(y_k \ge r)$ can be approximately computed as

$$\Pr(y_k \ge r) \approx Q\!\left(\frac{\tilde{r} - k/2}{\sqrt{k/12}}\right),$$

where $\tilde{r} = (r - k r_L)/(r_U - r_L)$ and $Q(x)$ is the Gaussian tail function, $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\,dt$.

As the upload and download bandwidths within the group are large, the cost of uploading streaming chunks to the other peers within the same group is assumed to be negligible. In this case, if peer $i$ chooses to be an OP instead of an SP, then there is no cost for him/her, and the utility function is

$$U_{OP,i}(k) = \begin{cases} \Pr(y_k \ge r)\,G, & k \in [1, N-1], \\ 0, & k = 0. \end{cases} \qquad (2)$$
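The Gaussian approximation and the two utility functions can be illustrated with a short sketch. It assumes uniformly distributed download rates and uses hypothetical names (`prob_realtime`, `utility_sp`, `utility_op`, `r_low`, `r_high`) that do not come from the paper. The approximation rests on the central limit theorem, so it is only rough for small numbers of SPs.

```python
import math


def gaussian_tail(z: float) -> float:
    """Gaussian tail function Q(z) = P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_realtime(k: int, r: float, r_low: float, r_high: float) -> float:
    """Approximate Pr(y_k >= r), with y_k the sum of k i.i.d. Uniform[r_low, r_high] rates."""
    if k == 0:
        return 0.0  # no SP: nothing is downloaded from other groups
    r_tilde = (r - k * r_low) / (r_high - r_low)  # source rate normalized to Uniform[0, 1] units
    return gaussian_tail((r_tilde - k / 2.0) / math.sqrt(k / 12.0))


def utility_sp(k: int, r: float, r_low: float, r_high: float, gain: float, cost: float) -> float:
    """Expected utility of an SP when k peers (including itself) act as SPs."""
    return prob_realtime(k, r, r_low, r_high) * gain - cost


def utility_op(k: int, r: float, r_low: float, r_high: float, gain: float) -> float:
    """Expected utility of an OP when k other peers act as SPs."""
    return prob_realtime(k, r, r_low, r_high) * gain
```

For example, with 12 SPs whose rates are uniform on [0, 1] and a source rate of 6, the total rate is centered exactly on the source rate, so the approximation gives a real-time probability of one half.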

Super Peer Selection (SPS) Evolutionary Game Model
We adopt the concept of the Evolutionarily Stable Strategy (ESS) [19,20] to provide a robust equilibrium strategy for the selfish peers. In an evolutionary game, the ESS is a solution concept that refines the Nash equilibrium and can be applied to the evolution of individuals' behavior; it is defined in Definition 1. If all individuals are using an ESS, then none of them will change to a different strategy, since a deviating individual cannot obtain a higher expected payoff than the rest of the population in the system. Since all peers are selfish and rational, they will cheat if their payoffs can be increased by cheating, which means that every peer is uncertain about the behavior and payoffs of the other peers. In such a case, peers learn from the strategic interaction in each round of the game and try different strategies to improve their own utility. Hence, the percentage of peers who use a certain pure strategy may change during this process. This kind of population evolution can be modeled by replicator dynamics, which is the basic dynamic mechanism of evolutionary games.
In a dynamic evolutionary game, an individual from a population (i.e., a player in the game) who is able to reproduce itself through the process of mutation and selection is called a replicator. In such a case, a replicator with a higher payoff reproduces itself faster. When the reproduction process takes place over time, it can be modeled by a set of ordinary differential equations called replicator dynamics equations. Replicator dynamics states that the growth rate of the share of peers using a certain pure strategy is proportional to the difference between the average payoff obtained with that pure strategy and the average payoff of all peers within the group. This is important for an evolutionary game since it provides information about the population: at any particular point in time, it describes the dynamic relationship between individual payoffs and the evolution of the group.
For clarity, we first assume that the cost of serving as an SP is the same for all peers; the heterogeneous situation in hybrid P2P streaming systems will be discussed in the next section. In replicator dynamics, it is assumed that a peer chooses a pure strategy $a$ from a finite set of strategies $A = \{\mathrm{SP}, \mathrm{OP}\}$, that is, acting as an SP or acting as an OP. Let $n_a$ denote the number of individuals choosing strategy $a$, and let the total population size be $N = \sum_{a=1}^{2} n_a$. The proportion of individuals choosing strategy $a$ is $x_a = n_a/N$, and it is referred to as the population share. The population state can be denoted by the vector $x = \{x_1, x_2\}$. The replicator dynamics can be described as follows:

$$\dot{x}_a = x_a \left[\bar{U}_a(x) - \bar{U}(x)\right], \qquad (3)$$

where $\bar{U}_a(x)$ is the average payoff obtained by the peers using pure strategy $a$, and $\bar{U}(x)$ is the average payoff of all peers within the group. It can be seen from (3) that if pure strategy $a$ yields a higher payoff than the average level, the probability of a peer using strategy $a$ will grow, and the growth rate is proportional to the difference between the average payoff of pure strategy $a$ and the average payoff of all peers.
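As a rough illustration of how the replicator dynamics move the population shares, the sketch below applies a forward-Euler step to the two-strategy case. The constant payoffs, the step size `dt`, and the function name `replicator_step` are assumptions made for the example. In the SPS game the payoffs depend on the population state and would be recomputed at every step.

```python
def replicator_step(shares, payoffs, dt=0.01):
    """One forward-Euler step of dx_a/dt = x_a * (U_a - U_avg)."""
    avg = sum(x * u for x, u in zip(shares, payoffs))  # population-average payoff
    return [x + dt * x * (u - avg) for x, u in zip(shares, payoffs)]


shares = [0.5, 0.5]   # population state (x_SP, x_OP)
payoffs = [0.8, 0.5]  # hypothetical constant average payoffs of SP and OP
for _ in range(2000):
    shares = replicator_step(shares, payoffs)
```

Because the strategy with the above-average payoff keeps growing, the SP share approaches 1 here, while the shares continue to sum to 1.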

3.1. Analysis of the SPS Evolutionary Game.
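According to (1) and (2), the average payoff of a peer who chooses to be an SP can be computed by

$$\bar{U}_{SP}(x) = \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i} (1-x)^{N-1-i} \left[\Pr(y_{i+1} \ge r)\,G - C\right], \qquad (4)$$

where $x$ is the probability of a peer being an SP, and $\binom{N-1}{i} x^{i} (1-x)^{N-1-i}$ is the probability that there are $i$ SPs among the other $N-1$ peers. Similarly, the average payoff of a peer who chooses to be an OP is given by

$$\bar{U}_{OP}(x) = \sum_{i=1}^{N-1} \binom{N-1}{i} x^{i} (1-x)^{N-1-i} \Pr(y_{i} \ge r)\,G. \qquad (5)$$

According to (4) and (5), the average payoff of a peer is

$$\bar{U}(x) = x\,\bar{U}_{SP}(x) + (1-x)\,\bar{U}_{OP}(x). \qquad (6)$$

Substituting (6) back into the replicator dynamics differential equation (3), for $a = \mathrm{SP}$, we have

$$\dot{x} = x(1-x)\left[\bar{U}_{SP}(x) - \bar{U}_{OP}(x)\right]. \qquad (7)$$

At an equilibrium point $x^*$, no player will deviate from the optimal strategy, which means $\dot{x}|_{x^*} = 0$, and we get $x^* = 0$, $x^* = 1$, or $x^*$ is a solution to $\bar{U}_{SP}(x) - \bar{U}_{OP}(x) = 0$. However, since $\dot{x}|_{x^*} = 0$ is only a necessary condition for $x^*$ to be an ESS, we examine the sufficient condition for each ESS candidate and draw the following conclusions, with the proofs shown in Theorems 3-5: (1) $x^* = 0$ is an ESS only when $\Pr(y_1 \ge r)\,G - C \le 0$; (2) $x^* = 1$ is an ESS only when $\Pr(y_N \ge r)\,G - \Pr(y_{N-1} \ge r)\,G \ge C$; (3) let $x^*$ be the solution to $\bar{U}_{SP}(x) = \bar{U}_{OP}(x)$ with $x^* \in (0, 1)$; then $x^*$ is an ESS.

Proof. According to (4) and (5), we have

$$f(x) = \bar{U}_{SP}(x) - \bar{U}_{OP}(x) = \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i} (1-x)^{N-1-i}\, V_i - C, \qquad (8)$$

where

$$V_i = \left[\Pr(y_{i+1} \ge r) - \Pr(y_{i} \ge r)\right] G, \quad \Pr(y_0 \ge r) = 0, \qquad (9)$$

is the additional gain obtained by introducing one more SP into a P2P streaming system with $i$ SPs.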
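For any $x \in [0, 1]$, taking the derivative of $f(x) = \bar{U}_{SP}(x) - \bar{U}_{OP}(x)$ with respect to $x$, we have

$$f'(x) = \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - (N-1)x\right] V_i. \qquad (10)$$

By introducing an integer $i_1$ that satisfies $i_1 \le (N-1)x < i_1 + 1$, we can split the sum as

$$f'(x) = \sum_{i=0}^{i_1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - (N-1)x\right] V_i + \sum_{i=i_1+1}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - (N-1)x\right] V_i.$$

Since $V_i$ is a decreasing function of $i$, we have $V_i \ge V_{i_1}$ for all $i \le i_1$ and $V_i < V_{i_1}$ for all $i > i_1$. The factor $i - (N-1)x$ is nonpositive in the first sum and positive in the second, so replacing every $V_i$ by $V_{i_1}$ can only increase the first sum and strictly increases the second. Therefore, according to (10), we have

$$f'(x) < V_{i_1} \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - (N-1)x\right] = V_{i_1} \frac{d}{dx}\left[x + (1-x)\right]^{N-1} = 0.$$

Therefore, $f'(x) < 0$ for all $x \in [0, 1]$.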
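The equilibrium condition above can be explored numerically. The sketch below evaluates the gap between the SP and OP average payoffs as a binomial mixture and then looks for an interior equilibrium by bisection. It assumes the Gaussian approximation of the real-time probability from Section 2. The function names, parameter names, and example values are hypothetical. Bisection only needs a sign change between 0 and 1, so it returns a point where the gap switches from positive to negative.

```python
import math


def prob_realtime(k, r, r_low, r_high):
    """Gaussian approximation of Pr(y_k >= r) for k i.i.d. Uniform[r_low, r_high] rates."""
    if k == 0:
        return 0.0
    z = ((r - k * r_low) / (r_high - r_low) - k / 2.0) / math.sqrt(k / 12.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def payoff_gap(x, n, r, r_low, r_high, gain, cost):
    """Average SP payoff minus average OP payoff when every other peer is an SP w.p. x."""
    gap = -cost
    for i in range(n):  # i = number of SPs among the other n - 1 peers
        weight = math.comb(n - 1, i) * x**i * (1.0 - x) ** (n - 1 - i)
        extra = prob_realtime(i + 1, r, r_low, r_high) - prob_realtime(i, r, r_low, r_high)
        gap += weight * extra * gain  # additional gain of one more SP
    return gap


def equilibrium_share(n, r, r_low, r_high, gain, cost, tol=1e-9):
    """Return 0, 1, or an interior root of the payoff gap found by bisection."""
    args = (n, r, r_low, r_high, gain, cost)
    if payoff_gap(0.0, *args) <= 0.0:
        return 0.0  # being the only SP does not pay off
    if payoff_gap(1.0, *args) >= 0.0:
        return 1.0  # being an SP pays off even when everyone else is one
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if payoff_gap(mid, *args) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A call such as `equilibrium_share(20, 0.5, 0.0, 1.0, 1.0, 0.1)` returns a share strictly between 0 and 1, whereas raising the cost to 0.6 makes the gap negative at zero and yields the boundary equilibrium 0.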

3.2. SPS Model for Hybrid Heterogeneous P2P Streaming System. In this section, we further discuss how to choose super peers in heterogeneous peer groups for hybrid P2P streaming systems, where peer nodes may differ in performance when they choose to be super peers.
Let $x_{i,a_i}$ be the probability that peer $i$ uses pure strategy $a_i \in A$. According to the replicator dynamics equation, the growth rate of the proportion of peers using a certain pure strategy is proportional to the difference between the payoff obtained with that pure strategy and the average payoff of the peers within the group. The evolutionary dynamics of $x_{i,a_i}$ is therefore given by the following differential equation:

$$\dot{x}_{i,a_i} = \left[\bar{U}_i(a_i, x_{-i}) - \bar{U}_i(x_i)\right] x_{i,a_i},$$

where $\bar{U}_i(a_i, x_{-i})$ is the average payoff of peer $i$ using pure strategy $a_i$, and $\bar{U}_i(x_i)$ is the average payoff of peer $i$ using the mixed strategy.
To simplify the analysis, we start from the two-player game, and then the results are extended to the general multiplayer game.

Let $x_1$ and $x_2$ be the probabilities that peer 1 and peer 2 choose to act as an SP, respectively. Let $A_1 = \Pr(y_1 \ge r)\,G$ and $A_2 = \Pr(y_2 \ge r)\,G$; then the payoff matrix of peer 1 and peer 2 can be presented as in Table 1. The average payoff $\bar{U}_1(\mathrm{SP}, x_2)$ can then be calculated by

$$\bar{U}_1(\mathrm{SP}, x_2) = (A_2 - C_1)\,x_2 + (A_1 - C_1)(1 - x_2),$$

and the average payoff $\bar{U}_1(x_1)$ can be calculated by

$$\bar{U}_1(x_1) = x_1\,\bar{U}_1(\mathrm{SP}, x_2) + (1 - x_1)\,A_1 x_2. \qquad (16)$$

From (16), the replicator dynamics equation of peer 1 can be calculated by

$$\dot{x}_1 = x_1(1 - x_1)\left[A_1 - C_1 - (2A_1 - A_2)\,x_2\right], \qquad (19)$$

and the replicator dynamics equation of peer 2 can be calculated by

$$\dot{x}_2 = x_2(1 - x_2)\left[A_1 - C_2 - (2A_1 - A_2)\,x_1\right]. \qquad (20)$$

At an equilibrium point, we know that $\dot{x}_1 = 0$ and $\dot{x}_2 = 0$. According to (19) and (20), we get five equilibria: $(0, 0)$, $(0, 1)$, $(1, 0)$, $(1, 1)$, and the mixed strategy equilibrium $\left(\frac{A_1 - C_2}{2A_1 - A_2}, \frac{A_1 - C_1}{2A_1 - A_2}\right)$.

According to [20], in an evolutionary dynamic system, if an equilibrium of the replicator dynamics equations is a locally asymptotically stable point, it is an ESS. Equations (19) and (20) can be regarded as a nonlinear dynamic system, and we can examine whether the five equilibria are ESSs by analyzing the Jacobian matrix of this system. By taking partial derivatives of (19) and (20), we get the Jacobian matrix as

$$J = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix}, \qquad (21)$$

where $J_{11} = (1 - 2x_1)\left[A_1 - C_1 - (2A_1 - A_2)x_2\right]$, $J_{12} = -x_1(1 - x_1)(2A_1 - A_2)$, $J_{21} = -x_2(1 - x_2)(2A_1 - A_2)$, and $J_{22} = (1 - 2x_2)\left[A_1 - C_2 - (2A_1 - A_2)x_1\right]$. At a locally asymptotically stable point, we have $\det(J) > 0$ and $\mathrm{tr}(J) < 0$ [20]. By substituting the five equilibria into (21), we get the following: (1) if $A_2 - A_1 - C_1 > 0$ and $A_2 - A_1 - C_2 > 0$, there is a unique ESS $(1, 1)$, which means that both peer 1 and peer 2 will converge to being SPs; (2) else if $A_2 - A_1 - C_1 > 0$ and $A_2 - A_1 - C_2 < 0$, there is a unique ESS $(1, 0)$, which means that peer 1 will converge to being an SP and peer 2 tends to be an ordinary peer; (3) else if $A_2 - A_1 - C_1 < 0$ and $A_2 - A_1 - C_2 > 0$, there is a unique ESS $(0, 1)$, which means that peer 2 will converge to being an SP and peer 1 tends to be an ordinary peer; (4) else there are two ESSs, $(0, 1)$ and $(1, 0)$, and the converged strategy profile depends on the initial strategy profile.
From the above analysis, we can see that when the gain of being an SP ($A_2 - A_1$) is greater than the cost of being an SP ($C_1$ or $C_2$), the peer tends to be an SP. Moreover, the peer with the higher cost tends to be a free rider and to rely on the peer with the lower cost.
From the analysis of the two-player super peer selection game, we can deduce that the peer with a higher cost ($C_i$) tends to rely on the peer with a lower cost, and this observation extends to the multiplayer game. If there are multiple peers in the super peer selection game, the strategies of the peers with higher costs will converge to "OP" with greater probability. Conversely, the peers with lower costs, who would suffer relatively more severe losses if no one served as an SP, tend to be SPs.
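The stability test for the two-player game can be checked numerically. The following sketch assumes the payoff structure implied by the utility functions: a peer earns A2 minus its cost when both peers are SPs, A1 minus its cost when it is the only SP, A1 when it is an OP served by the other peer, and 0 when nobody is an SP. It then applies the determinant and trace conditions to the four pure-strategy equilibria. The function names and the numeric values are hypothetical.

```python
def jacobian(x1, x2, a1, a2, c1, c2):
    """Jacobian of the two-player replicator dynamics at (x1, x2)."""
    slope = a2 - 2.0 * a1              # coupling between the two peers
    d1 = a1 - c1 + slope * x2          # peer 1: payoff of SP minus payoff of OP
    d2 = a1 - c2 + slope * x1          # peer 2: payoff of SP minus payoff of OP
    return (
        ((1.0 - 2.0 * x1) * d1, x1 * (1.0 - x1) * slope),
        (x2 * (1.0 - x2) * slope, (1.0 - 2.0 * x2) * d2),
    )


def is_stable(x1, x2, a1, a2, c1, c2):
    """Local asymptotic stability test: det(J) > 0 and tr(J) < 0."""
    (j11, j12), (j21, j22) = jacobian(x1, x2, a1, a2, c1, c2)
    return j11 * j22 - j12 * j21 > 0.0 and j11 + j22 < 0.0


def pure_ess(a1, a2, c1, c2):
    """Pure-strategy equilibria that pass the stability test."""
    corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [p for p in corners if is_stable(p[0], p[1], a1, a2, c1, c2)]
```

With `a1 = 0.5` and `a2 = 0.9`, both costs 0.1 and 0.2 are below the gain of a second SP, and only (1, 1) is stable. With `a2 = 0.6` and costs 0.2 and 0.3, the gain of a second SP is below both costs, and the two asymmetric profiles (0, 1) and (1, 0) are stable.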

Algorithm for ESS
From the previous section, we can see that the ESS can be found by solving the replicator dynamics equations. However, solving the replicator dynamics equations requires the exchange of the strategies adopted by other peers and of their private information. To avoid this, a Q-learning approach [21], which is a type of reinforcement learning (i.e., learning by interaction), is used. With this ability to learn, complete payoff information of the other users in the same group is no longer required for SP selection. In this section, we present a distributed Q-learning algorithm that can gradually converge to the ESS without information exchange. We first discretize the replicator dynamics equation as

$$x_i(t+1) = x_i(t) + \left[\bar{U}_i(\mathrm{SP}, x_{-i}(t)) - \bar{U}_i(x_i(t))\right] x_i(t), \qquad (22)$$

where $t$ is the slot index and $x_i(t)$ is the probability of peer $i$ being an SP during slot $t$. Here, we assume that each slot can be further divided into $M$ subslots and that each peer can choose to be an SP or an OP at the beginning of each subslot. From (22), we can see that in order to update $x_i(t+1)$, we need to first compute $\bar{U}_i(\mathrm{SP}, x_{-i}(t))$ and $\bar{U}_i(x_i(t))$. Let us define an indicator function $\mathbf{1}_i(t,k)$ that equals 1 if peer $i$ chooses to be an SP at subslot $k$ in slot $t$ and 0 otherwise. The direct payoff $U_i(t,k)$ of peer $i$ at subslot $k$ in slot $t$ can be computed by (23). Given these payoffs, $\bar{U}_i(\mathrm{SP}, x_{-i}(t))$ is approximated by the sample mean of $U_i(t,k)$ over the subslots in which peer $i$ acted as an SP, and, similarly, $\bar{U}_i(x_i(t))$ can be approximated as

$$\bar{U}_i(x_i(t)) \approx \frac{1}{M} \sum_{k=1}^{M} U_i(t,k).$$

In the distributed Q-learning algorithm, the Q-value (i.e., $Q(t)$) is used to maintain the knowledge of each peer, and the decision is made based on this knowledge. The super peer selection algorithm is described in Algorithm 1.

Algorithm 1: The super peer selection algorithm.
Step 1: Given the slot index $t = 0$, each peer $i$ initializes $x_i(t) = x_i(0)$ with a random value in $[0, 1]$.
Step 2: Initialize the Q-value to 0 for all peers in super groups.
Step 3: loop
During slot $t$, for $k = 1 : M$
if exploring (with probability $\epsilon$) then
peer $i$ randomly chooses either to be an SP, which downloads streaming chunks from the peers outside the group with download rate $r_i(t,k)$, or to be an OP, which downloads streaming chunks from the SP.
else
choose the strategy $a^* = \arg\max_a Q(a)$.
end if
peer $i$ computes the indicator function $\mathbf{1}_i(t,k)$ and his/her direct payoff $U_i(t,k)$ using (23).
In Algorithm 1, a peer performs the exploration step with probability $\epsilon$, and $\alpha$ denotes the learning rate that controls the speed of adjustment of the Q-value. A new Q-value $Q(t+1)$, which is the expected payoff for future iterations, is obtained from the previous value $Q(t)$ along with the newly observed payoff $\bar{U}_i(x_i(t))$. Here, the newly observed payoff is biased by the outcome of choosing the best action based on the available knowledge (i.e., $\max Q(t)$).
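A minimal simulation sketch of this kind of distributed learning is given below. The algorithm listing does not spell out the Q-value update, so the code uses the standard epsilon-greedy Q-learning rule with a discount factor `gamma`. That rule, the function names, and all parameter values are assumptions for illustration, not the authors' exact procedure. Each peer only observes its own payoff, in line with the text.

```python
import random


def immediate_payoff(is_sp, total_rate, source_rate, gain, cost):
    """Direct payoff of one peer in one subslot."""
    reward = gain if total_rate >= source_rate else 0.0
    return reward - cost if is_sp else reward


def run_ess_sps(costs, source_rate, r_low, r_high, gain=1.0, slots=200,
                subslots=20, epsilon=0.1, alpha=0.1, gamma=0.0, seed=0):
    """Simulate epsilon-greedy Q-learning peers; return last-slot SP frequencies and Q-values."""
    rng = random.Random(seed)
    n = len(costs)
    q = [{"SP": 0.0, "OP": 0.0} for _ in range(n)]
    sp_freq = [0.0] * n
    for _ in range(slots):
        counts = [0] * n
        for _ in range(subslots):
            actions = []
            for i in range(n):
                if rng.random() <= epsilon:      # exploration step
                    actions.append(rng.choice(("SP", "OP")))
                else:                            # exploit current knowledge
                    actions.append(max(("SP", "OP"), key=lambda a: q[i][a]))
            # Each SP downloads at an independent uniform rate (assumption).
            total = sum(rng.uniform(r_low, r_high) for a in actions if a == "SP")
            for i, a in enumerate(actions):
                u = immediate_payoff(a == "SP", total, source_rate, gain, costs[i])
                target = u + gamma * max(q[i].values())
                q[i][a] = (1.0 - alpha) * q[i][a] + alpha * target
                counts[i] += a == "SP"
        sp_freq = [c / subslots for c in counts]
    return sp_freq, q
```

Calling `run_ess_sps([0.1, 0.2, 0.3, 0.25], source_rate=1.0, r_low=0.0, r_high=1.0)` returns, for each peer, the fraction of subslots in the last slot in which it acted as an SP. With a fixed seed the run is reproducible, which makes it easier to compare different cost vectors.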

Experimental Results
For evaluation purposes, we use the traditional random super peer selection method, denoted as Random-SPS, for performance comparison. In Random-SPS, each peer acts as an individual and randomly selects some peers from its partner list for downloading video chunks. Such a protocol has been widely used in existing P2P systems, such as PPLive [22] and CoolStreaming [23]. For convenience, in the rest of this paper, we denote the proposed ESS-based approach as ESS-SPS.
In this section, we describe the results of experiments conducted on the LStream P2P live streaming platform deployed on the Henan Education and Research NETwork (HERNET). In our experiments, there are about 2000 users in the hybrid heterogeneous P2P live streaming system. The video is initially stored at an original video server with an upload bandwidth of 3 Mbps. The request round is 1 second, the relay buffer length is 30 seconds, and the streaming fragment size is 1 KB.
In the first experiment, we assume that there are 20 heterogeneous peers in one super group, that their gain is $G = 1$, and that their cost $C_i$ is drawn at random from $[0, 0.3]$. This experiment compares the ESS-SPS method with the traditional Random-SPS method in terms of the obtained social welfare, which is defined as the sum of the utilities obtained by all peers. As shown in Figure 2, the ESS-SPS method reaches a stable Nash equilibrium and chooses a suitable number of SPs through the collaboration among the peers in the same super group. It maintains a relatively high social welfare level even at relatively high video source rates $r$. In contrast, the social welfare level of the traditional Random-SPS method decreases linearly and rapidly as the source rate increases.
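To make the social welfare metric concrete, the sketch below computes the expected sum of utilities when a given number of peers act as SPs. It assumes the Gaussian approximation of the real-time probability from Section 2 and, as a simplification, that the lowest-cost peers are the ones acting as SPs. This is a centralized reference calculation with hypothetical function names, not the ESS-SPS algorithm or the measurement procedure behind Figure 2.

```python
import math


def prob_realtime(k, r, r_low, r_high):
    """Gaussian approximation of Pr(y_k >= r) for k i.i.d. Uniform[r_low, r_high] rates."""
    if k == 0:
        return 0.0
    z = ((r - k * r_low) / (r_high - r_low) - k / 2.0) / math.sqrt(k / 12.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_welfare(k, costs, r, r_low, r_high, gain=1.0):
    """Expected sum of utilities when the k cheapest peers act as SPs (assumption)."""
    sp_costs = sorted(costs)[:k]
    # Every peer earns the gain with the real-time probability; only SPs pay a cost.
    return len(costs) * prob_realtime(k, r, r_low, r_high) * gain - sum(sp_costs)


def best_sp_count(costs, r, r_low, r_high, gain=1.0):
    """Number of SPs that maximizes the expected social welfare."""
    return max(range(len(costs) + 1),
               key=lambda k: expected_welfare(k, costs, r, r_low, r_high, gain))
```

For 20 peers with a common cost of 0.1, rates uniform on [0, 1], and a source rate of 3, six SPs give a real-time probability of one half and an expected welfare of about 9.4. Adding a few more SPs raises the welfare further until the extra cost outweighs the remaining gain in probability.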
In the second experiment, we evaluate the convergence property of ESS-SPS. Figure 3 shows the replicator dynamics of the SPS game model with heterogeneous peers. We can see that, starting from high initial values, all peers gradually reduce their probabilities of being an SP, since being an OP more often brings a higher payoff. However, because too low a probability of being an SP increases the chance that no peer is an SP, the probability of being an SP finally converges to a certain value, which is determined by the number of peers in the group.
The third experiment compares the upload capacity of the P2P streaming network under the ESS-SPS algorithm and the traditional Random-SPS method. As shown in Figure 4, under Random-SPS, when the number of nodes reaches about 2100, the load of the super peers is close to saturation, and as the system scale continues to increase, the upload capacity of the P2P streaming system no longer increases noticeably. In contrast, the ESS-SPS algorithm eliminates this system bottleneck, and the upload capacity of the P2P streaming system increases steadily with the number of peers.
The fourth experiment compares the average upload rates of super peers under the ESS-SPS algorithm and the traditional Random-SPS method. The average upload rate of super peers is the ratio of the total upload rate of all super peers to the number of super peers, and it reflects the upload capability that the super peers provide to the P2P streaming system. As shown in Figure 5, the SPs generated by ESS-SPS provide higher upload capacity for the P2P streaming system, and the average upload rate of super peers increases steadily with system uptime.

Conclusion
In this paper, we propose a super peer selection game model based on evolutionary game theory, which is used in hybrid heterogeneous P2P streaming systems to address the network inefficiency problem encountered by the traditional random super peer selection scheme. Having derived the ESS for every peer, we further propose a distributed Q-learning algorithm with which each peer converges to the ESS by learning from his/her own payoff history. The experimental results indicate that, compared with the traditional Random-SPS scheme, the proposed method achieves much better social welfare and enables the upload capacity of the whole P2P streaming system to increase steadily as the number of peers grows.

Figure 1: Peer groups at the edge of hybrid heterogeneous P2P streaming system.

Figure 2: The social welfare comparison between ESS-SPS and Random-SPS.

Figure 3: Behavior dynamics of peers in a heterogeneous super group.

Figure 4: Comparison of the upload capacity of the P2P streaming system between ESS-SPS and Random-SPS.

Figure 5: Comparison of the average upload rates of SPs between ESS-SPS and Random-SPS.

Table 1: Payoff matrix of the two-player super peer selection evolutionary game.
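$$U_i(t,k) = \begin{cases} G - C_i, & \text{if peer } i \text{ chooses to be an SP and } y_t \ge r, \\ -C_i, & \text{if peer } i \text{ chooses to be an SP and } y_t < r, \\ G, & \text{if peer } i \text{ chooses to be an OP and } y_t \ge r, \\ 0, & \text{if peer } i \text{ chooses to be an OP and } y_t < r, \end{cases} \qquad (23)$$

where $y_t$ is the total download rate of the SPs and $r$ is the video source rate. Then, $\bar{U}_i(\mathrm{SP}, x_{-i}(t))$ can be approximated as

$$\bar{U}_i(\mathrm{SP}, x_{-i}(t)) \approx \frac{\sum_{k=1}^{M} U_i(t,k)\,\mathbf{1}_i(t,k)}{\sum_{k=1}^{M} \mathbf{1}_i(t,k)},$$

where $\mathbf{1}_i(t,k)$ equals 1 if peer $i$ chooses to be an SP at subslot $k$ in slot $t$ and 0 otherwise.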