Due to its high efficiency and good scalability, the hierarchical hybrid P2P architecture has recently drawn increasing attention in P2P streaming research and applications. Super peer selection, the key problem in hybrid heterogeneous P2P architectures, is highly challenging because super peers must be selected from a huge and dynamically changing network. This paper proposes a distributed super peer selection (SPS) algorithm for hybrid heterogeneous P2P streaming systems based on evolutionary game theory. The super peer selection procedure is first modeled within an evolutionary game framework, and its evolutionarily stable strategies (ESSs) are analyzed. Then, based on the mixed strategies obtained from this analysis, a distributed Q-learning algorithm (ESS-SPS) is proposed that lets each peer converge to the ESS using only its own payoff history. Experimental results show that, compared with the traditional random super peer selection scheme, the proposed ESS-SPS algorithm achieves better social welfare and a higher average upload rate of super peers, and it keeps the upload capacity of the P2P streaming system increasing steadily as the number of peers grows.
1. Introduction
Future media internet will need to distribute high-quality video content in an efficient, flexible, and personalized way over dynamic and heterogeneous network environments. Multimedia streaming over the internet has already become a major application owing to users' growing demand for multimedia content and the extraordinary growth of network technologies [1]. Streaming applications over heterogeneous P2P systems have gained enormous popularity. However, providing streaming service over heterogeneous P2P networks is still challenging because of their inherent instability and unreliability. According to the structure of data dissemination, currently deployed P2P video streaming systems can be divided into three types: tree-based push topology, mesh-based pull topology, and hierarchical hybrid topology. Among them, the hierarchical hybrid topology, which divides the peers of a P2P streaming system into super peers and ordinary peers according to their performance differences, has become a focus of recent research, especially for heterogeneous P2P streaming applications. The hierarchical hybrid P2P model integrates the advantages of tree-based and mesh-based topologies, and it lets nodes with stronger capabilities participate fully in the P2P streaming system, yielding higher scalability and stability. However, super peer selection, the key problem in hierarchical hybrid P2P streaming systems, is highly challenging because in a heterogeneous peer-to-peer environment a large number of super peers must be selected from a huge and dynamically changing network in which neither the node characteristics nor the network topology is known a priori [2]. Furthermore, super peer selection plays an important role in the efficiency and the Quality of Experience (QoE) of the whole P2P streaming system.
Many approaches have been proposed to solve the super peer selection problem, but most of them focus on the capacity of peers [2–5], such as CPU cycles, memory space, and network bandwidth, and few studies consider the behavioral attributes of peers. Super peers need to provide resource inquiry and routing services for the ordinary nodes within a group, and they also need to upload video streaming chunks to them. The computation and storage costs borne by super peers may degrade their own performance. An H2O (hierarchical 2-level overlay) protocol was proposed in [2] to select super peers in unstructured P2P networks; it takes trust, secure paths, routing performance, and some other factors into account, but notification messages between super peers are distributed by flooding, which incurs considerable network overhead. Jin et al. proposed a super peer selection method based on proxy trust in [6]. They used proxy trust to identify the behavior of super peers; with this metric, a peer can select its own trusted super peers and isolate malicious super peers in the system. The method in [6] reduced the flooding overhead of [2] to some extent, but it requires a priori knowledge to decide whether to trust a node. Reference [7] proposed a super peer election algorithm based on district partition, which divides the whole P2P overlay into k small districts and runs a distributed, parallel election algorithm among these small units to reduce message complexity and communication costs. A super-node selection protocol based on self-information theory was discussed in [8]; it uses a weighted mean mechanism based on the information vectors of peers to promote the “best” nodes to super peers according to their capabilities and online time.
Wolf et al. considered the super peer selection problem as a hub location optimization problem and proved that it is NP-hard in [9]. They also presented a heuristic method based on evolutionary techniques and local search to solve this problem in [10]. Kratica et al. proposed two hybrid genetic algorithm approaches in [11] for solving this NP-hard problem. The algorithms adopt a two-segment encoding of peers and appropriate objective functions with modified genetic operators, and they use local search heuristics to minimize communication costs. However, most of these studies address general P2P networks and do not consider the characteristics of P2P streaming networks. Wang et al. proposed a super peer selection mechanism based on the analytic hierarchy process in [12], which jointly considers the RTT, interest vector similarity, and bandwidth of each node. Unlike the above works, in this study we present a super peer selection scheme based on evolutionary game theory for a hybrid heterogeneous P2P streaming network.
Game theory is a mature, formal mathematical tool for modeling the strategic interaction between two or more decision makers, and it has attracted increasing attention in computer science [13, 14] in recent years. Its concepts provide a language to formulate, structure, analyze, and understand strategic scenarios, under the assumption that each participant in the game is “rational” and chooses strategies that maximize his own benefit. Roy et al. surveyed existing game-theoretic solutions designed to enhance network security in [14], covering, for example, network intrusion detection, DDoS attacks, and internet worms. In traditional game theory, the Nash equilibrium is the most popular solution concept: no player can improve its payoff by deviating unilaterally, provided that none of the other players deviates. However, when a noncooperative game has multiple Nash equilibria, a refined solution is required. The evolutionary equilibrium of evolutionary game theory provides such a refinement and ensures stability (i.e., the group of players will not change its chosen strategies over time). An evolutionary game explicitly captures the dynamics of interaction among the players in a population: a player can observe the behavior of other players, learn from these observations, and make the best decision based on its knowledge. In addition, with replicator dynamics, the state of the game can be determined at any particular point in time, which is useful for investigating how the players' strategies evolve as they adapt their behavior to reach the solution.
Peers in P2P streaming systems are rational and selfish, and they make decisions according to their own needs and the behavior of other peers. In this paper, we present a distributed method for dynamic super peer selection in hybrid heterogeneous P2P streaming systems. We first use an evolutionary game framework to model the super peer selection procedure, and then we design a distributed super peer selection algorithm (ESS-SPS) based on Q-learning according to the Evolutionarily Stable Strategies (ESSs). We apply the ESS-SPS algorithm to the real P2P streaming system LStream [15], which is deployed on the Henan Education and Research NETwork (HERNET). The experimental results show that, compared with the traditional random super peer selection method, ESS-SPS stably provides a higher streaming upload rate and enables the upload capacity of the entire P2P streaming system to increase steadily as the number of nodes increases.
The rest of the paper is organized as follows. In Section 2, we describe the hybrid heterogeneous P2P streaming system model and the utility functions of peers. In Section 3, we model the super peer selection (SPS) procedure as an evolutionary game and analyze its Evolutionarily Stable Strategies by applying the replicator dynamics equation. In Section 4, we propose a distributed Q-learning algorithm for the SPS evolutionary game that dynamically converges to the ESS. In Section 5, we present experimental results that evaluate the performance of the proposed ESS-SPS algorithm. Finally, we conclude the paper in Section 6.
2. System Model and Utility Function
2.1. System Model
Peers in the hybrid heterogeneous P2P streaming system are divided into peer groups according to their geographical location, as shown in Figure 1 [16]. We expect the video streaming chunks of a given video channel to be shared within one peer group as much as possible to reduce unnecessary network traffic. Consider a set of peers in a group at the edge of the hybrid heterogeneous P2P live streaming system who want to view a live video streaming channel simultaneously. Every peer can choose to act as a super peer (SP) or an ordinary peer (OP) in its group. If a peer chooses to act as an SP, then he/she not only acts as a client that downloads video streaming chunks from the SPs in other peer groups but also acts as a server that uploads video streaming chunks to both the SPs in other groups and the peers in the same group.
Figure 1: Peer groups at the edge of the hybrid heterogeneous P2P streaming system.
On the contrary, if a peer chooses to be an OP, he/she only needs to download or upload the necessary streaming chunks within the same group. Assume that the upload and download bandwidth within a super group is larger than that across groups. In such a case, peers tend to be OPs because of their selfish nature. On the other hand, OPs risk receiving degraded streaming quality, since there may not be enough SPs to download streaming chunks from other groups.
2.2. Utility Function
In the hybrid heterogeneous P2P streaming system, peers not only act as clients that download needed streaming chunks from other peers but also act as servers that upload video streaming chunks to other peers. So, while a peer benefits from downloading video streaming chunks from other peers, he/she also incurs a cost in uploading chunks to them, where the cost may be the resources consumed by uploading, such as bandwidth, memory, and buffer size.
Assume that there are $N$ peers within the group, among which $s$ peers are willing to act as SPs and download video streaming chunks from the SPs of other groups. Suppose that the download rates of the $s$ SPs are $r_1, r_2, \ldots, r_s$; then the total download rate of the group peers is $d_s = \sum_{i=1}^{s} r_i$.
Since these $s$ SPs select peers outside the group for downloading streaming chunks randomly and independently, the download rates $r_i$ are random variables. According to [17], the cumulative distribution function (CDF) of a peer's download bandwidth can be modeled as a linear function; hence the probability density function (PDF) is uniform, that is, the $r_i$ are uniformly distributed.
Clearly, if the total download rate $d_s$ is no less than the source rate $r$, then the group peers obtain real-time streaming, and every group peer receives a certain gain $G$. Otherwise there will be some delay, and in this case we assume that the gain of the peer is 0. Therefore, given the total download rate $d_s$ and the source rate $r$, if peer $i$ chooses to be an SP, its utility function is
$$\pi_{SP,i}(s) = \Pr(d_s \ge r)\,G - C_i, \quad \forall s \in [1, N], \tag{1}$$
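where $C_i$ is the cost of peer $i$ when he/she acts as an SP, and $\Pr(d_s \ge r)$ is the probability that the peer obtains real-time streaming. If we assume that $r_1, r_2, \ldots, r_s$ are independent and identically distributed in $[r_L, r_U]$ and that $s$ is sufficiently large, then $\Pr(d_s \ge r)$ can be approximated as $\Pr(d_s \ge r) \approx Q\big((\hat{r} - s/2)/\sqrt{s/12}\big)$, where $\hat{r} = (r - s r_L)/(r_U - r_L)$ and $Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$ is the Gaussian tail function [18]. This follows from the central limit theorem: the normalized rates $(r_i - r_L)/(r_U - r_L)$ are uniform on $[0,1]$, so their sum has mean $s/2$ and variance $s/12$, and $d_s \ge r$ exactly when this sum is at least $\hat{r}$.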
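As a quick sanity check of the Gaussian approximation above, the following Python sketch compares $Q\big((\hat{r} - s/2)/\sqrt{s/12}\big)$ with a Monte Carlo estimate obtained by summing $s$ uniform rates directly. The function names and the example values ($r_L = 0.1$, $r_U = 0.5$, $r = 3$, in arbitrary rate units) are hypothetical and chosen only for illustration; $Q$ is computed through the identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$.

```python
import math
import random


def q_function(x: float) -> float:
    """Gaussian tail function Q(x) = P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pr_realtime_gaussian(s: int, r: float, r_low: float, r_high: float) -> float:
    """CLT approximation of Pr(d_s >= r) for s >= 1 i.i.d. Uniform[r_low, r_high] rates."""
    # Normalized threshold: Pr(d_s >= r) = Pr(U_1 + ... + U_s >= r_hat), U_i ~ Uniform[0, 1].
    r_hat = (r - s * r_low) / (r_high - r_low)
    # The sum of s Uniform[0, 1] variables has mean s/2 and variance s/12.
    return q_function((r_hat - s / 2.0) / math.sqrt(s / 12.0))


def pr_realtime_monte_carlo(s: int, r: float, r_low: float, r_high: float,
                            trials: int = 20_000, seed: int = 0) -> float:
    """Estimate Pr(d_s >= r) by sampling the s download rates directly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d_s = sum(rng.uniform(r_low, r_high) for _ in range(s))
        if d_s >= r:
            hits += 1
    return hits / trials
```

For $s = 10$ with these values, $\hat{r} = s/2$, so the approximation returns 0.5; for $s = 12$ the two estimates should agree to within sampling error.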
Since the upload and download bandwidths within the group are large, the cost of uploading streaming chunks to other peers in the same group is assumed to be negligible. In this case, if peer $i$ chooses to be an OP instead of an SP, it incurs no cost, and its utility function is
$$\pi_{OP,i}(s) = \begin{cases} \Pr(d_s \ge r)\,G, & \text{if } s \in [1, N-1], \\ 0, & \text{if } s = 0. \end{cases} \tag{2}$$
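Equations (1) and (2) together expose the free-riding incentive described in Section 2.1: a peer that switches from OP to SP while $k$ other peers are SPs changes its payoff by $[\Pr(d_{k+1} \ge r) - \Pr(d_k \ge r)]\,G - C_i$. The sketch below is a minimal illustration assuming the Gaussian approximation for $\Pr(d_s \ge r)$ and $r > 0$; all function names and numerical values are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pr_realtime(s: int, r: float, r_low: float, r_high: float) -> float:
    """Gaussian approximation of Pr(d_s >= r); with no SPs the group gets nothing (r > 0)."""
    if s == 0:
        return 0.0
    r_hat = (r - s * r_low) / (r_high - r_low)
    return q_function((r_hat - s / 2.0) / math.sqrt(s / 12.0))


def utility_sp(s, G, C_i, r, r_low, r_high):
    """Eq. (1): payoff of an SP when s peers (itself included) act as SPs."""
    return pr_realtime(s, r, r_low, r_high) * G - C_i


def utility_op(s, G, r, r_low, r_high):
    """Eq. (2): payoff of an OP when s other peers act as SPs (0 if s == 0)."""
    return pr_realtime(s, r, r_low, r_high) * G if s > 0 else 0.0


def gain_from_switching_to_sp(k, G, C_i, r, r_low, r_high):
    """Payoff change of a peer that switches OP -> SP while k other peers are SPs."""
    return utility_sp(k + 1, G, C_i, r, r_low, r_high) - utility_op(k, G, r, r_low, r_high)
```

With $G = 1$, $C_i = 0.2$, $r = 3$, and rates in $[0.1, 0.5]$, switching pays off when few SPs are present (e.g., $k = 9$), but once the group already has many SPs the extra gain vanishes and the switch costs exactly $C_i$, which is why selfish peers prefer to be OPs.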
3. Super Peer Selection (SPS) Evolutionary Game Model
We adopt the concept of the Evolutionarily Stable Strategy (ESS) [19, 20] to provide a robust equilibrium strategy for the selfish peers. In an evolutionary game, an ESS is a refinement of the Nash equilibrium that applies to the evolution of individuals' behavior; it is defined in Definition 1. If all individuals adopt an ESS, a small group of individuals switching to a different strategy would obtain a lower expected payoff than the rest of the population, so none of them will deviate.
Definition 1.
A strategy $x^*$ is an ESS if and only if, for all $x \ne x^*$, $x^*$ satisfies
equilibrium condition: $F(x^*, x^*) \ge F(x, x^*)$;
stability condition: if $F(x, x^*) = F(x^*, x^*)$, then $F(x^*, x) > F(x, x)$,
where $F(x_1, x_2)$ is the utility obtained by a player when he/she uses strategy $x_1$ and the other player uses strategy $x_2$.
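Definition 1 can be checked numerically for a symmetric two-strategy game by testing candidate mutants $x$ on a grid. The sketch below is illustrative only: it uses a textbook hawk–dove payoff matrix (not the SPS game of this paper), represents a mixed strategy by the probability of playing the first pure strategy, and only approximates “for all $x \ne x^*$” with a finite grid and a tolerance.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def payoff(x: float, y: float, A: Matrix) -> float:
    """F(x, y): expected payoff of mixed strategy x against mixed strategy y.

    A mixed strategy is the probability of playing pure strategy 0, and
    A[i][j] is the payoff of pure strategy i against pure strategy j.
    """
    px, py = (x, 1.0 - x), (y, 1.0 - y)
    return sum(px[i] * A[i][j] * py[j] for i in range(2) for j in range(2))


def is_ess(x_star: float, A: Matrix, grid: int = 1001, tol: float = 1e-9) -> bool:
    """Check both conditions of Definition 1 against mutants x on a finite grid."""
    incumbent = payoff(x_star, x_star, A)
    for k in range(grid):
        x = k / (grid - 1)
        if abs(x - x_star) < tol:
            continue
        mutant = payoff(x, x_star, A)
        if mutant > incumbent + tol:           # equilibrium condition fails
            return False
        if abs(mutant - incumbent) <= tol:     # tie: stability condition must hold strictly
            if payoff(x_star, x, A) <= payoff(x, x, A) + tol:
                return False
    return True


# Textbook hawk-dove game with value 2 and fight cost 4 (strategy 0 = hawk).
HAWK_DOVE = [[-1.0, 2.0], [0.0, 1.0]]
```

Here `is_ess(0.5, HAWK_DOVE)` returns `True`: every mutant ties with the incumbent at $x^* = 1/2$, and the stability condition breaks the tie, whereas the pure strategies $x^* = 0$ and $x^* = 1$ fail the equilibrium condition.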
Since all peers are selfish and rational, they will cheat whenever cheating increases their payoffs, so every peer is uncertain about the behavior and payoffs of the other peers. In such a case, peers learn from the strategic interaction in each round of the game and try different strategies to improve their own utility. Hence, the fraction of peers using a given pure strategy may change during this process. Such a population evolution process can be modeled by replicator dynamics, the basic dynamic mechanism of evolutionary games.
In a dynamic evolutionary game, an individual of the population (i.e., a player in the game) that is able to reproduce itself through mutation and selection is called a replicator. A replicator with a higher payoff reproduces itself faster. When the reproduction process takes place over time, it can be modeled by a set of ordinary differential equations called replicator dynamics equations. Under replicator dynamics, the growth rate of the fraction of peers using a given pure strategy is proportional to the difference between the average payoff obtained with that pure strategy and the average payoff of all peers in the group. Replicator dynamics are important for an evolutionary game because they provide information about the population: at any particular point in time, they describe the dynamic relationship between the payoffs of individual behavior and the evolution of the group.
For clarity, we first assume that all peers incur the same cost $C$ when serving as an SP; the heterogeneous case for hybrid P2P streaming systems is discussed in Section 3.2. In replicator dynamics, a peer chooses a pure strategy $i$ from the finite strategy set $B = \{SP, OP\}$, i.e., acting as an SP or acting as an OP. Let $n_i$ denote the number of individuals choosing strategy $i$, and let the total population size be $N = \sum_{i=1}^{2} n_i$. The proportion of individuals choosing strategy $i$, $x_i = n_i/N$, is referred to as the population share, and the population state is the vector $X = (x_1, x_2)$. The replicator dynamics are then described by
$$\frac{dx_i}{dt} = x_i(t)\left[\bar{\pi}_i(t) - \bar{\pi}(t)\right], \tag{3}$$
where $\bar{\pi}_i(t)$ is the average payoff obtained by the peers using pure strategy $i$, and $\bar{\pi}(t)$ is the average payoff of all peers within the group.
It can be seen from (3) that if pure strategy $i$ yields a higher payoff than the average, the share of peers using strategy $i$ grows, at a rate proportional to the difference between the average payoff of strategy $i$ and the average payoff of all peers.
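A direct way to see the behavior described by (3) is to integrate it numerically. The following sketch applies forward-Euler steps to (3) for a population whose pure-strategy payoffs are $\pi_i = \sum_j A_{ij} x_j$; the step size, the number of steps, and the hawk–dove matrix are assumptions for illustration, not values from this paper.

```python
from typing import Callable, List, Sequence


def replicator_step(shares: Sequence[float], payoffs: Sequence[float], dt: float) -> List[float]:
    """One forward-Euler step of Eq. (3): dx_i/dt = x_i * (pi_i - pi_bar)."""
    pi_bar = sum(x * p for x, p in zip(shares, payoffs))   # population-average payoff
    return [x + dt * x * (p - pi_bar) for x, p in zip(shares, payoffs)]


def simulate(shares: Sequence[float],
             payoff_fn: Callable[[Sequence[float]], Sequence[float]],
             dt: float = 0.01, steps: int = 5000) -> List[float]:
    """Iterate the Euler step; the shares keep summing to 1 up to rounding."""
    shares = list(shares)
    for _ in range(steps):
        shares = replicator_step(shares, payoff_fn(shares), dt)
    return shares


def matrix_payoffs(A: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], List[float]]:
    """Payoff of pure strategy i against the population: pi_i = sum_j A[i][j] * x_j."""
    return lambda x: [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Illustrative hawk-dove payoffs (strategy 0 = hawk); not the SPS game.
HAWK_DOVE = [[-1.0, 2.0], [0.0, 1.0]]
```

Starting from `simulate([0.9, 0.1], matrix_payoffs(HAWK_DOVE))`, the hawk share decreases toward 1/2, the interior rest point at which both strategies earn the same payoff, as (3) predicts.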
3.1. Analysis of the SPS Evolutionary Game
According to (1) and (2), the average payoff of a peer that chooses to be an SP is
$$\bar{\pi}_{SP}(x) = \sum_{i=0}^{N-1} \binom{N-1}{i} x^i (1-x)^{N-1-i} \left[\Pr(d_{i+1} \ge r)\,G - C\right], \tag{4}$$
where $x$ is the probability that a peer is an SP, and $\binom{N-1}{i} x^i (1-x)^{N-1-i}$ is the probability that there are $i$ SPs among the other $N-1$ peers. Similarly, the average payoff of a peer that chooses to be an OP is
$$\bar{\pi}_{OP}(x) = \sum_{i=1}^{N-1} \binom{N-1}{i} x^i (1-x)^{N-1-i} \Pr(d_i \ge r)\,G. \tag{5}$$
According to (4) and (5), the average payoff of a peer is
$$\bar{\pi}(x) = x\,\bar{\pi}_{SP}(x) + (1-x)\,\bar{\pi}_{OP}(x). \tag{6}$$
Substituting (6) into the replicator dynamics equation (3) with $i = SP$ (so that the SP share is $x$), we have
$$\frac{dx}{dt} = x(1-x)\left[\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right]. \tag{7}$$
At an equilibrium point $x^*$, no player deviates from the optimal strategy, which means $\left.dx/dt\right|_{x = x^*} = 0$; hence $x^* = 0$, $x^* = 1$, or $x^*$ is a solution of $\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x) = 0$. However, since $\left.dx/dt\right|_{x = x^*} = 0$ is only a necessary condition for $x^*$ to be an ESS, we examine the sufficient condition for each ESS candidate and draw the following conclusions, proved in Theorems 3–5:
$x^* = 0$ is an ESS only when $\Pr(d_1 \ge r)\,G - C \le 0$;
$x^* = 1$ is an ESS only when $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G \ge C$;
let $x^*$ be the solution of $\bar{\pi}_{SP}(x) = \bar{\pi}_{OP}(x)$ with $x^* \in (0,1)$; then $x^*$ is an ESS.
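Under the equal-cost assumption of this section, the three cases above can be evaluated numerically: compute $g(x) = \bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)$ from (4) and (5), check the boundary conditions, and otherwise locate the interior root of $g$ by bisection. The sketch below assumes the Gaussian approximation of Section 2.2 for $\Pr(d_s \ge r)$ and uses hypothetical parameter values; the bisection relies on $g(0) > 0 > g(1)$, and uniqueness of the root relies on $g$ being decreasing, which in turn relies on the marginal gain $v_i$ decreasing in $i$ (this holds for the example values used here).

```python
import math
from typing import Callable


def q_function(x: float) -> float:
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pr_realtime(s: int, r: float, r_low: float, r_high: float) -> float:
    """Gaussian approximation of Pr(d_s >= r); Pr(d_0 >= r) = 0 since r > 0."""
    if s == 0:
        return 0.0
    r_hat = (r - s * r_low) / (r_high - r_low)
    return q_function((r_hat - s / 2.0) / math.sqrt(s / 12.0))


def make_g(N: int, G: float, C: float, r: float, r_low: float,
           r_high: float) -> Callable[[float], float]:
    """g(x) = pi_SP(x) - pi_OP(x), written as a binomial mixture of marginal gains."""
    # v_i: extra gain from adding one SP to a group that already has i SPs.
    v = [(pr_realtime(i + 1, r, r_low, r_high) - pr_realtime(i, r, r_low, r_high)) * G
         for i in range(N)]

    def g(x: float) -> float:
        # Binomial weights: probability that i of the other N-1 peers are SPs.
        return sum(math.comb(N - 1, i) * x**i * (1.0 - x)**(N - 1 - i) * v[i]
                   for i in range(N)) - C
    return g


def sps_ess(N: int, G: float, C: float, r: float, r_low: float, r_high: float,
            iters: int = 100) -> float:
    """Return the ESS x* predicted by the three cases above."""
    g = make_g(N, G, C, r, r_low, r_high)
    if g(0.0) <= 0.0:          # g(0) = Pr(d_1 >= r) G - C <= 0  ->  x* = 0
        return 0.0
    if g(1.0) >= 0.0:          # g(1) = [Pr(d_N >= r) - Pr(d_{N-1} >= r)] G - C >= 0  ->  x* = 1
        return 1.0
    lo, hi = 0.0, 1.0          # g(0) > 0 > g(1): bisection for the interior root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `sps_ess(20, 1.0, 0.1, 0.3, 0.1, 0.5)` returns an interior probability, raising the cost to $C = 0.6$ gives $x^* = 0$, and a two-peer group with $C = 0.3$ gives $x^* = 1$.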
Lemma 2.
Let $g(x) = \bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)$; then $g'(x) < 0$ for all $x \in [0,1]$.
Proof.
According to (4) and (5), we have
$$g(x) = \sum_{i=0}^{N-1} \binom{N-1}{i} x^i (1-x)^{N-1-i} v_i - C, \tag{8}$$
where $v_i = [\Pr(d_{i+1} \ge r) - \Pr(d_i \ge r)]\,G$ (with $\Pr(d_0 \ge r) = 0$) is the additional gain from introducing one more SP into a P2P streaming group that already has $i$ SPs.
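For any $x \in [0,1]$, differentiating $g(x)$ with respect to $x$ gives
$$\begin{aligned} g'(x) &= \frac{d}{dx}\left[\sum_{i=0}^{N-1} \binom{N-1}{i} x^i (1-x)^{N-1-i} v_i - C\right] \\ &= \sum_{i=0}^{N-1} \binom{N-1}{i} \left[i x^{i-1} (1-x)^{N-1-i} - (N-1-i) x^i (1-x)^{N-2-i}\right] v_i \\ &= \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i(1-x) - x(N-1-i)\right] v_i \\ &= \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] v_i. \end{aligned} \tag{9}$$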
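By introducing an integer $i_1$ that satisfies $i_1 \le (N-1)x < i_1 + 1$, we can write
$$g'(x) = \sum_{i=0}^{i_1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] v_i + \sum_{i=i_1+1}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] v_i. \tag{10}$$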
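Since $v_i$ is decreasing in $i$, we have $v_i \ge v_{i_1}$ for all $i \le i_1$ and $v_i < v_{i_1}$ for all $i > i_1$. Moreover, $i - x(N-1) \le 0$ for $i \le i_1$ and $i - x(N-1) > 0$ for $i > i_1$, so replacing $v_i$ by $v_{i_1}$ does not decrease any term of the first sum in (10) and strictly increases the terms of the second sum. Therefore, according to (10), we have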
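$$\begin{aligned} g'(x) &< \sum_{i=0}^{i_1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] v_{i_1} + \sum_{i=i_1+1}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] v_{i_1} \\ &= v_{i_1} \sum_{i=0}^{N-1} \binom{N-1}{i} x^{i-1} (1-x)^{N-2-i} \left[i - x(N-1)\right] \\ &= v_{i_1} \frac{d}{dx}\left[\sum_{i=0}^{N-1} \binom{N-1}{i} x^i (1-x)^{N-1-i}\right] = v_{i_1} \frac{d}{dx}(1) = 0. \end{aligned} \tag{11}$$
Therefore, $g'(x) < 0$ for all $x \in [0,1]$.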
Theorem 3.
The condition for $x^* = 0$ to be an ESS is $\Pr(d_1 \ge r)\,G - C \le 0$.
Proof.
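According to (4)–(6), the average payoff of a peer using mixed strategy $x$ when the other peers use $x^* = 0$ can be written as
$$\bar{\pi}(x, 0) = \bar{\pi}_{OP}(0) + x\left[\bar{\pi}_{SP}(0) - \bar{\pi}_{OP}(0)\right], \tag{12}$$
where $\bar{\pi}_{SP}(0) = \Pr(d_1 \ge r)\,G - C$ and $\bar{\pi}_{OP}(0) = 0$.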
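If $\Pr(d_1 \ge r)\,G - C > 0$, that is, $\bar{\pi}_{SP}(0) > \bar{\pi}_{OP}(0)$, every peer will deviate to $x = 1$ to get $\bar{\pi}_{SP}(0)$ rather than $\bar{\pi}_{OP}(0)$.
If $\Pr(d_1 \ge r)\,G - C < 0$, that is, $\bar{\pi}_{SP}(0) < \bar{\pi}_{OP}(0)$, every peer will stay at $x = 0$ to obtain $\bar{\pi}_{OP}(0)$ rather than $\bar{\pi}_{SP}(0)$.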
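If $\Pr(d_1 \ge r)\,G - C = 0$, that is, $\bar{\pi}_{SP}(0) = \bar{\pi}_{OP}(0)$, then $g(0) = \bar{\pi}_{SP}(0) - \bar{\pi}_{OP}(0) = 0$. By Lemma 2, $g'(x) < 0$ for all $x \in [0,1]$, so $g(x) = \bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x) < g(0) = 0$ for all $x \in (0,1]$. In this case, $\bar{\pi}(0, x) = \bar{\pi}_{OP}(x) > \bar{\pi}(x, x) = \bar{\pi}_{OP}(x) + x\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right)$ for every $x \ne 0$, which means that $x^* = 0$ is an ESS according to Definition 1.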
Therefore, $x^* = 0$ is an ESS only when $\Pr(d_1 \ge r)\,G - C \le 0$.
Theorem 4.
The condition for $x^* = 1$ to be an ESS is $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G \ge C$.
Proof.
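According to (4)–(6), the average payoff of a peer using mixed strategy $x$ when the other peers use mixed strategy $x^* = 1$ can be written as
$$\bar{\pi}(x, 1) = \bar{\pi}_{OP}(1) + x\left[\bar{\pi}_{SP}(1) - \bar{\pi}_{OP}(1)\right], \tag{13}$$
where $\bar{\pi}_{SP}(1) = \Pr(d_N \ge r)\,G - C$ and $\bar{\pi}_{OP}(1) = \Pr(d_{N-1} \ge r)\,G$.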
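If $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G < C$, that is, $\bar{\pi}_{OP}(1) > \bar{\pi}_{SP}(1)$, every peer will deviate to $x = 0$ to get $\bar{\pi}_{OP}(1)$ rather than $\bar{\pi}_{SP}(1)$.
If $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G > C$, that is, $\bar{\pi}_{OP}(1) < \bar{\pi}_{SP}(1)$, every peer will stay at $x = 1$ to obtain $\bar{\pi}_{SP}(1)$ rather than $\bar{\pi}_{OP}(1)$.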
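If $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G = C$, that is, $\bar{\pi}_{OP}(1) = \bar{\pi}_{SP}(1)$, then $g(1) = \bar{\pi}_{SP}(1) - \bar{\pi}_{OP}(1) = 0$. By Lemma 2, $g'(x) < 0$ for all $x \in [0,1]$, so $g(x) > g(1) = 0$ for all $x \in [0,1)$. In this case, $\bar{\pi}(1, x) = \bar{\pi}_{OP}(x) + 1 \times \left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right) > \bar{\pi}(x, x) = \bar{\pi}_{OP}(x) + x\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right)$ for every $x \ne 1$, which means that $x^* = 1$ is an ESS according to Definition 1.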
Therefore, $x^* = 1$ is an ESS only when $\Pr(d_N \ge r)\,G - \Pr(d_{N-1} \ge r)\,G \ge C$.
Theorem 5.
If $x^* \in (0,1)$ is a solution of $\bar{\pi}_{SP}(x) = \bar{\pi}_{OP}(x)$, then $x^*$ is an ESS.
Proof.
Let $\bar{\pi}_i(x, x^*)$ be the average payoff of player $i$ when it uses mixed strategy $x$ and the other peers use mixed strategy $x^*$. Then
$$\bar{\pi}_i(x, x^*) = x\,\bar{\pi}_{SP}(x^*) + (1-x)\,\bar{\pi}_{OP}(x^*). \tag{14}$$
Since $x^*$ is a solution of $\bar{\pi}_{SP}(x) = \bar{\pi}_{OP}(x)$, we have $\bar{\pi}_{SP}(x^*) = \bar{\pi}_{OP}(x^*)$, and (14) becomes
$$\bar{\pi}_i(x, x^*) = \bar{\pi}_{SP}(x^*) = \bar{\pi}_i(x^*, x^*), \tag{15}$$
which means that $x^*$ satisfies the equilibrium condition in Definition 1.
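Moreover, according to (6), $\bar{\pi}_i(x, x) = \bar{\pi}_{OP}(x) + x\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right)$ and $\bar{\pi}_i(x^*, x) = \bar{\pi}_{OP}(x) + x^*\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right)$. Therefore, $\bar{\pi}_i(x^*, x) - \bar{\pi}_i(x, x) = (x^* - x)\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right)$.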
From Lemma 2, $g(x) = \bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)$ is monotonically decreasing. Since $\bar{\pi}_{SP}(x^*) = \bar{\pi}_{OP}(x^*)$, we have $\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x) > 0$ when $x < x^*$ and $\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x) < 0$ when $x > x^*$. Therefore, $(x^* - x)\left(\bar{\pi}_{SP}(x) - \bar{\pi}_{OP}(x)\right) > 0$, that is, $\bar{\pi}_i(x^*, x) > \bar{\pi}_i(x, x)$, for all $x \ne x^*$, which means that $x^*$ satisfies the stability condition in Definition 1. Hence $x^*$ is an ESS.
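To illustrate Theorem 5, the sketch below integrates (7), $dx/dt = x(1-x)\,g(x)$, with forward-Euler steps from several initial SP probabilities and checks that all trajectories approach the same interior root of $g$. It assumes the Gaussian approximation of Section 2.2 and hypothetical parameters ($N = 20$, $G = 1$, $C = 0.1$, $r = 0.3$, per-SP rates in $[0.1, 0.5]$); the step size and horizon are arbitrary choices.

```python
import math


def q_function(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pr_realtime(s, r, r_low, r_high):
    """Gaussian approximation of Pr(d_s >= r), with Pr(d_0 >= r) = 0."""
    if s == 0:
        return 0.0
    r_hat = (r - s * r_low) / (r_high - r_low)
    return q_function((r_hat - s / 2.0) / math.sqrt(s / 12.0))


def make_g(N, G, C, r, r_low, r_high):
    """g(x) = pi_SP(x) - pi_OP(x) as in Eq. (8)."""
    v = [(pr_realtime(i + 1, r, r_low, r_high) - pr_realtime(i, r, r_low, r_high)) * G
         for i in range(N)]
    return lambda x: sum(math.comb(N - 1, i) * x**i * (1.0 - x)**(N - 1 - i) * v[i]
                         for i in range(N)) - C


def integrate_eq7(x0, g, dt=0.1, steps=5000):
    """Forward-Euler integration of Eq. (7): dx/dt = x (1 - x) g(x)."""
    x = x0
    for _ in range(steps):
        x += dt * x * (1.0 - x) * g(x)
    return x


# Hypothetical parameters; trajectories start below, near, and above the ESS.
g = make_g(20, 1.0, 0.1, 0.3, 0.1, 0.5)
finals = [integrate_eq7(x0, g) for x0 in (0.01, 0.5, 0.9)]
```

All three entries of `finals` settle at the same value, which is a root of $g$, matching the stability argument above.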
3.2. SPS Model for Hybrid Heterogeneous P2P Streaming System
In this section, we further discuss how to choose super peers in heterogeneous peer groups of hybrid P2P streaming systems, where peers may differ in performance and therefore in the cost they incur when acting as an SP.
Let $x_{i,b_i}$ be the probability that peer $i$ uses pure strategy $b_i \in B$. Under replicator dynamics, the growth rate of the proportion of peers using a given pure strategy is proportional to the difference between the payoff of that pure strategy and the average payoff of the peers within the group; hence the evolution of $x_{i,b_i}$ is given by the differential equation
$$\frac{dx_{i,b_i}}{dt} = x_{i,b_i}\left[\bar{\pi}_i(b_i, x_{-i}) - \bar{\pi}_i(x_i)\right], \tag{16}$$
where $\bar{\pi}_i(b_i, x_{-i})$ is the average payoff of peer $i$ using pure strategy $b_i$ while the other peers use their mixed strategies $x_{-i}$, and $\bar{\pi}_i(x_i)$ is the average payoff of peer $i$ using its mixed strategy.
To simplify the analysis, we start from the two-player game and then extend the results to the general multiplayer game.
Let $x_1$ and $x_2$ be the probabilities that peer 1 and peer 2, respectively, choose to act as an SP. Let $B_1 = \Pr(d_1 \ge r)\,G$ and $B_2 = \Pr(d_2 \ge r)\,G$; then the payoff matrix of peer 1 and peer 2 is given in Table 1, and the average payoff $\bar{\pi}_1(SP, x_2)$ is
$$\bar{\pi}_1(SP, x_2) = x_2(B_2 - C_1) + (1 - x_2)(B_1 - C_1), \tag{17}$$
and the average payoff $\bar{\pi}_1(x_1)$ is
$$\bar{\pi}_1(x_1) = x_1 x_2 (B_2 - C_1) + x_1 (1 - x_2)(B_1 - C_1) + x_2 (1 - x_1) B_1. \tag{18}$$
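Table 1: Payoff matrix of the two-player super peer selection evolutionary game (each entry is (payoff of peer 1, payoff of peer 2)).

| Peer 1 \ Peer 2 | “SP” | “OP” |
|---|---|---|
| “SP” | $(B_2 - C_1,\ B_2 - C_2)$ | $(B_1 - C_1,\ B_1)$ |
| “OP” | $(B_1,\ B_1 - C_2)$ | $(0,\ 0)$ |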
From (16), the replicator dynamics equation of peer 1 is
$$\frac{dx_1}{dt} = x_1(1 - x_1)\left[B_1 - C_1 - (2B_1 - B_2)x_2\right], \tag{19}$$
and the replicator dynamics equation of peer 2 is
$$\frac{dx_2}{dt} = x_2(1 - x_2)\left[B_1 - C_2 - (2B_1 - B_2)x_1\right]. \tag{20}$$
At an equilibrium point, $dx_1/dt = 0$ and $dx_2/dt = 0$. From (19) and (20), there are five equilibria: $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$, and the mixed strategy equilibrium $\left(\frac{B_1 - C_2}{2B_1 - B_2}, \frac{B_1 - C_1}{2B_1 - B_2}\right)$.
According to [20], in an evolutionary dynamic system, an equilibrium of the replicator dynamics equations that is locally asymptotically stable is an ESS. Equations (19) and (20) form a nonlinear dynamic system, so we can examine whether the five equilibria are ESSs by analyzing its Jacobian matrix. Taking partial derivatives of (19) and (20) gives the Jacobian matrix
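$$J = \begin{pmatrix} \dfrac{\partial \dot{x}_1}{\partial x_1} & \dfrac{\partial \dot{x}_1}{\partial x_2} \\ \dfrac{\partial \dot{x}_2}{\partial x_1} & \dfrac{\partial \dot{x}_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix}, \tag{21}$$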
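where $J_{11} = (1 - 2x_1)\left(B_1 - C_1 - (2B_1 - B_2)x_2\right)$, $J_{12} = -x_1(1 - x_1)(2B_1 - B_2)$, $J_{21} = -x_2(1 - x_2)(2B_1 - B_2)$, and $J_{22} = (1 - 2x_2)\left(B_1 - C_2 - (2B_1 - B_2)x_1\right)$. An equilibrium is locally asymptotically stable if $\det(J) > 0$ and $\operatorname{tr}(J) < 0$ [20]. At the mixed strategy equilibrium, both brackets vanish, so $J_{11} = J_{22} = 0$ and $\operatorname{tr}(J) = 0$; hence it is never an ESS. Assuming $B_1 > C_1$ and $B_1 > C_2$ (so that a single SP already yields a net gain and $(0,0)$ is unstable), substituting the remaining equilibria into (21) yields the following: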
if $B_2 - B_1 - C_1 > 0$ and $B_2 - B_1 - C_2 > 0$, there is a unique ESS $(1,1)$, which means that both peer 1 and peer 2 converge to being SPs;
else if $B_2 - B_1 - C_1 > 0$ while $B_2 - B_1 - C_2 < 0$, there is a unique ESS $(1,0)$, which means that peer 1 converges to being an SP and peer 2 tends to be an ordinary peer;
else if $B_2 - B_1 - C_1 < 0$ while $B_2 - B_1 - C_2 > 0$, there is a unique ESS $(0,1)$, which means that peer 2 converges to being an SP and peer 1 tends to be an ordinary peer;
else there are two ESSs, $(0,1)$ and $(1,0)$, and the strategy profile reached depends on the initial strategy profile.
The above analysis shows that when the additional gain of being an SP ($B_2 - B_1$) is greater than the cost of being an SP ($C_1$ or $C_2$), the peer tends to be an SP, and that the peer with a higher cost tends to be a free rider that relies on the peer with a lower cost.
The conclusion of the two-player super peer selection game, that the peer with a higher cost ($C_i$) tends to rely on the peer with a lower cost, can be extended to the multiplayer game. When multiple peers play the super peer selection game, the strategies of peers with higher costs converge to “OP” with greater probability. Conversely, peers with lower costs, which would suffer relatively more severe losses if no one served as an SP, tend to become SPs.
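The Jacobian test used above to classify the two-player equilibria is easy to automate. The following sketch enumerates the rest points of (19) and (20) (the four corners plus the mixed point when it lies strictly inside the unit square), evaluates the entries of (21), and keeps the points with $\det(J) > 0$ and $\operatorname{tr}(J) < 0$. The function names and the numerical values in the note below are hypothetical; the values satisfy $B_1 > C_1$ and $B_1 > C_2$.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def equilibria(B1: float, B2: float, C1: float, C2: float) -> List[Point]:
    """Rest points of Eqs. (19)-(20): four corners plus the mixed point if it is interior."""
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    k = 2 * B1 - B2
    if k != 0:
        x1, x2 = (B1 - C2) / k, (B1 - C1) / k
        if 0 < x1 < 1 and 0 < x2 < 1:
            points.append((x1, x2))
    return points


def jacobian(x1: float, x2: float, B1: float, B2: float, C1: float, C2: float):
    """Entries J11, J12, J21, J22 of the Jacobian (21)."""
    k = 2 * B1 - B2
    j11 = (1 - 2 * x1) * (B1 - C1 - k * x2)
    j12 = -x1 * (1 - x1) * k
    j21 = -x2 * (1 - x2) * k
    j22 = (1 - 2 * x2) * (B1 - C2 - k * x1)
    return j11, j12, j21, j22


def ess_points(B1: float, B2: float, C1: float, C2: float) -> List[Point]:
    """Keep the equilibria with det(J) > 0 and tr(J) < 0 (locally asymptotically stable)."""
    stable = []
    for x1, x2 in equilibria(B1, B2, C1, C2):
        j11, j12, j21, j22 = jacobian(x1, x2, B1, B2, C1, C2)
        if j11 * j22 - j12 * j21 > 0 and j11 + j22 < 0:
            stable.append((x1, x2))
    return stable
```

For example, with $B_1 = 0.6$ and $B_2 = 0.9$, the costs $(C_1, C_2) = (0.1, 0.4)$ give the single ESS $(1,0)$, while $(0.4, 0.5)$ give the two ESSs $(0,1)$ and $(1,0)$, matching the four cases listed above.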
4. A Distributed Q-Learning Algorithm for ESS
From the previous section, we can see that the ESS can be found by solving the replicator dynamics equations. However, solving them requires knowledge of the strategies adopted by other peers and of their private information. Therefore, a Q-learning approach [21], which is a type of reinforcement learning (i.e., learning by interaction), is used. With this ability to learn, complete payoff information about the other users in the same group is no longer required for SP selection. In this section, we present a distributed Q-learning algorithm that gradually converges to the ESS without information exchange.
We first discretize the replicator dynamics equation as
$$x_i(t+1) = x_i(t) + \left[\bar{\pi}_i(t) - \bar{\pi}(t)\right] x_i(t), \tag{22}$$
where $t$ is the slot index and $x_i(t)$ is the probability of peer $i$ being an SP during slot $t$. Here, we assume that each slot is further divided into $M$ subslots and that each peer chooses to be an SP or an OP at the beginning of each subslot.
From (22), to update $x_i(t+1)$ we first need to compute $\bar{\pi}_i(B, x_{-i}(t))$ and $\bar{\pi}_i(x_i(t))$. Define an indicator function $o_i(t,k)$ whose value equals 1 if player $i$ chooses to be an SP at the beginning of the $k$th subslot of slot $t$, and 0 otherwise.
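The direct payoff of player $i$ at subslot $k$ of slot $t$ is
$$\pi_i(t,k) = \begin{cases} G - C_i, & \text{if peer } i \text{ chooses to be an SP and } r_t \ge r, \\ -C_i, & \text{if peer } i \text{ chooses to be an SP and } r_t < r, \\ G, & \text{if peer } i \text{ chooses to be an OP and } r_t \ge r, \\ 0, & \text{if peer } i \text{ chooses to be an OP and } r_t < r, \end{cases} \tag{23}$$
where $r_t$ is the total download rate of the SPs and $r$ is the video source rate.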
Then, $\bar{\pi}_i(B, x_{-i}(t))$ can be approximated as
$$\bar{\pi}_i(B, x_{-i}(t)) = \frac{\sum_{k=1}^{M} \pi_i(t,k)\, o_i(t,k)}{\sum_{k=1}^{M} o_i(t,k)}. \tag{24}$$
Similarly, $\bar{\pi}_i(x_i(t))$ can be approximated as
$$\bar{\pi}_i(x_i(t)) = \frac{1}{M}\sum_{k=1}^{M} \pi_i(t,k). \tag{25}$$
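The per-slot quantities in (23)–(25) and the update (22) amount to a few sample averages over one peer's record of $M$ subslots. The sketch below computes them; the function names are hypothetical, and two safeguards are assumptions not specified in the text: the update is skipped when the peer never acted as an SP during the slot (so the denominator of (24) is zero), and $x_i(t+1)$ is clipped to $[0,1]$.

```python
from typing import List, Optional, Tuple


def direct_payoff(is_sp: bool, total_rate: float, r: float, G: float, C_i: float) -> float:
    """Eq. (23): G if the group's SPs reach the source rate r, minus C_i for an SP."""
    gain = G if total_rate >= r else 0.0
    return gain - C_i if is_sp else gain


def estimate_payoffs(payoffs: List[float],
                     chose_sp: List[bool]) -> Tuple[Optional[float], float]:
    """Eqs. (24) and (25): sample averages over the M subslots of slot t."""
    M = len(payoffs)
    avg_mixed = sum(payoffs) / M                                     # Eq. (25)
    n_sp = sum(chose_sp)                                             # sum_k o_i(t, k)
    if n_sp == 0:
        return None, avg_mixed                                       # Eq. (24) undefined
    avg_sp = sum(p for p, o in zip(payoffs, chose_sp) if o) / n_sp   # Eq. (24)
    return avg_sp, avg_mixed


def update_probability(x: float, avg_sp: Optional[float], avg_mixed: float) -> float:
    """Eq. (22), skipped when (24) is undefined and clipped to [0, 1] (assumed safeguards)."""
    if avg_sp is None:
        return x
    return min(1.0, max(0.0, x + (avg_sp - avg_mixed) * x))
```

For instance, with $G = 1$, $C_i = 0.2$, $r = 3$, and four subslots (SP with $r_t = 3.5$, OP with $r_t = 3.5$, SP with $r_t = 2$, OP with $r_t = 3.5$), the SP average is 0.3, the overall average is 0.65, and $x_i = 0.5$ drops to 0.325, since acting as an SP paid less than the peer's average.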
In the distributed Q-learning algorithm, each peer maintains a Q-value (i.e., $Q(t)$) that stores its accumulated knowledge, and it makes decisions based on this knowledge. The super peer selection algorithm is described in Algorithm 1.
Algorithm 1: A distributed Q-learning super peer selection algorithm for ESS (ESS-SPS).
Step 1: Given the slot index $t = 0$, each peer initializes $x_i(t) = x_i(0)$ with a random value in $[0,1]$.
Step 2: Initialize the Q-value $Q(t) = 0$ for all peers in the super groups.
Step 3: loop
    During slot $t$, for $k = 1, \ldots, M$:
        if $x_i(t) \le \gamma$ then
            peer $i$ randomly chooses either to be an SP, serving as a super peer that downloads streaming chunks
            from peers outside the group with download rate $r_i(t,k)$, or to be an OP that downloads
            streaming chunks from the SPs
        else
            choose the strategy
            $b^* = \arg\max_b Q(k)$
        end if
        peer $i$ computes the indicator function $o_i(t,k)$ and its direct payoff $\pi_i(t,k)$ using (23).
Step 4: peer $i$ computes the average payoffs $\bar{\pi}_i(B, x_{-i}(t))$ and $\bar{\pi}_i(x_i(t))$ using (24) and (25).
Step 5: peer $i$ updates its probability of being an SP, $x_i(t+1)$, using (22).
In Algorithm 1, a peer performs the exploration step with probability $\gamma$, and $\lambda$ denotes the learning rate that controls how fast the Q-value is adjusted. A new Q-value $Q(k+1)$, which is the expected payoff for future iterations, is obtained from the previous value $Q(k)$ together with the newly observed payoff $\bar{\pi}_i(x_i(t))$. Here, the newly observed payoff is biased by the outcome of choosing the best action based on the available knowledge (i.e., $\max_b Q(k)$).
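The sketch below simulates one super group running a variant of Algorithm 1 so that the interplay of exploration, Q-values, and the update (22) can be observed. It is not the authors' implementation: the exploration rule (with probability $\gamma$ a peer samples SP with probability $x_i(t)$, otherwise it plays $\arg\max_b Q$), the stateless update $Q(b) \leftarrow Q(b) + \lambda(\pi - Q(b))$, the uniform SP download rates, the clipping of $x_i$, and all default parameter values are assumptions made to obtain a runnable example.

```python
import random
from typing import List, Sequence, Tuple

SP, OP = 0, 1


def run_ess_sps(costs: Sequence[float], G: float = 1.0, r: float = 3.0,
                r_low: float = 0.1, r_high: float = 0.5, slots: int = 200,
                M: int = 50, gamma: float = 0.2, lam: float = 0.1,
                seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Simulate one super group running a Q-learning variant of Algorithm 1."""
    rng = random.Random(seed)
    N = len(costs)
    x = [rng.random() for _ in range(N)]        # Step 1: x_i(0) drawn from [0, 1]
    Q = [[0.0, 0.0] for _ in range(N)]           # Step 2: Q-values for SP and OP
    for _ in range(slots):                       # Step 3: slot t
        payoffs: List[List[float]] = [[] for _ in range(N)]
        chose_sp: List[List[bool]] = [[] for _ in range(N)]
        for _ in range(M):                       # subslots k = 1..M
            actions = []
            for i in range(N):
                if rng.random() < gamma:         # explore: sample the mixed strategy x_i(t)
                    actions.append(SP if rng.random() < x[i] else OP)
                else:                            # exploit: b* = argmax_b Q_i(b)
                    actions.append(SP if Q[i][SP] >= Q[i][OP] else OP)
            # Total download rate of the SPs; each SP's rate ~ Uniform[r_low, r_high] (assumed).
            total = sum(rng.uniform(r_low, r_high) for a in actions if a == SP)
            for i, a in enumerate(actions):
                pay = (G if total >= r else 0.0) - (costs[i] if a == SP else 0.0)  # Eq. (23)
                Q[i][a] += lam * (pay - Q[i][a])  # assumed stateless Q-update
                payoffs[i].append(pay)
                chose_sp[i].append(a == SP)
        for i in range(N):                       # Steps 4-5 for every peer
            n_sp = sum(chose_sp[i])
            if n_sp == 0:
                continue                         # Eq. (24) undefined: keep x_i(t)
            avg_sp = sum(p for p, o in zip(payoffs[i], chose_sp[i]) if o) / n_sp  # Eq. (24)
            avg_all = sum(payoffs[i]) / M                                         # Eq. (25)
            x[i] = min(1.0, max(0.0, x[i] + (avg_sp - avg_all) * x[i]))           # Eq. (22), clipped
    return x, Q
```

For example, `run_ess_sps([0.3 * i / 19 for i in range(20)])` mimics the group size, gain, and cost range of the first experiment in Section 5; the per-SP download-rate range is an assumption.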
5. Experimental Results
For performance comparison, we consider the traditional random super peer selection method, denoted Random SPS. In Random SPS, each peer acts individually and randomly selects some peers from its partner list from which to download video chunks. Such a protocol has been widely used in existing P2P systems, such as PPLive [22] and CoolStreaming [23]. For convenience, in the rest of this paper, we denote the proposed ESS-based approach as ESS-SPS.
In this section, we describe experiments conducted on the LStream P2P live streaming platform deployed on the Henan Education and Research NETwork (HERNET). In our experiments, there are about 2000 users in the hybrid heterogeneous P2P live streaming system. The video is initially stored at an original video server with upload bandwidth $r = 3$ Mbps. The request round is 1 second, the relay buffer length is 30 seconds, and the streaming fragment size is $L = 1$ KB.
In the first experiment, we assume that there are 20 heterogeneous peers in one super group, their gain is $G = 1$, and their cost $C_i$ is drawn at random from $[0, 0.3]$. This experiment compares ESS-SPS and the traditional Random SPS method in terms of social welfare, defined as the sum of the utilities obtained by all peers. As shown in Figure 2, ESS-SPS reaches a stable Nash equilibrium and selects a suitable number of SPs through collaboration among the peers in the same super group, maintaining a relatively high social welfare level even at relatively high video source rates $r$. By contrast, the social welfare of the traditional Random SPS method decreases linearly and rapidly as the source rate increases.
Figure 2: Social welfare comparison between ESS-SPS and Random SPS.
In the second experiment, we evaluate the convergence of ESS-SPS. Figure 3 shows the replicator dynamics of the SPS game with heterogeneous peers. Starting from high initial values, all peers gradually reduce their probabilities of being an SP, since being an OP more often brings a higher payoff. However, because too low a probability of being an SP increases the chance that no peer is an SP, the probability of being an SP finally converges to a certain value determined by the number of peers in the group.
Figure 3: Behavior dynamics of peers in a heterogeneous super group.
The third experiment compares the upload capacity of the P2P streaming network under the ESS-SPS algorithm and the traditional Random SPS method. As shown in Figure 4, with Random SPS the load of the super peers approaches saturation when the number of nodes reaches about 2100, and as the system scale continues to grow, the upload capacity of the P2P streaming system increases only marginally. In contrast, the ESS-SPS algorithm eliminates this system bottleneck, and the upload capacity of the P2P streaming system increases steadily as the number of peers grows.
Figure 4: Upload capacity of the P2P streaming system: ESS-SPS versus Random-SPS.
The fourth experiment compares the average upload rates of super peers under the ESS-SPS algorithm and the traditional Random-SPS method. The average upload rate of super peers is the total upload rate of all super peers divided by the number of super peers; it reflects the upload capability that the super peers provide to the P2P streaming system. As shown in Figure 5, the SPs selected by ESS-SPS provide higher upload capacity for the P2P streaming system, and their average upload rate increases steadily with system uptime.
Figure 5: Average upload rates of SPs: ESS-SPS versus Random-SPS.
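The average upload rate of super peers, as defined in the fourth experiment, can be computed from a snapshot of per-peer upload rates and SP flags. The sketch below is a minimal illustration; the function name, the data layout, and the numbers in the example are hypothetical. Evaluating it at successive snapshots would give a curve over system uptime of the kind shown in Figure 5.

```python
def average_sp_upload_rate(upload_rates, is_super_peer):
    """Sum of the upload rates of all SPs divided by the number of SPs."""
    sp_rates = [rate for rate, sp in zip(upload_rates, is_super_peer) if sp]
    if not sp_rates:
        raise ValueError("no super peer in the snapshot")
    return sum(sp_rates) / len(sp_rates)


if __name__ == "__main__":
    # Hypothetical snapshot: per-peer upload rates (kbit/s) and SP flags.
    rates = [800.0, 200.0, 1200.0, 150.0]
    flags = [True, False, True, False]
    print(average_sp_upload_rate(rates, flags))  # prints 1000.0
```

Raising an error when no SP is present keeps the ratio well defined rather than silently returning zero.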
6. Conclusion
In this paper, we propose a super peer selection game model based on evolutionary game theory for hybrid heterogeneous P2P streaming systems, to address the network inefficiency of the traditional random super peer selection scheme. By deriving the ESS for every peer, we further propose a distributed Q-learning algorithm with which each peer converges to the ESS by learning from its own past payoff history. The experimental results indicate that, compared with the traditional Random-SPS scheme, the proposed method achieves higher social welfare and allows the upload capacity of the whole P2P streaming system to increase steadily as the number of peers grows.
Acknowledgments
This research was supported by the National High Technology Research and Development Program of China under Grant no. 2009AA012201. The authors are grateful to the anonymous reviewers for carefully checking the details and for their helpful comments, which improved this paper.
References
[1] N. Ramzan, H. Park, and E. Izquierdo, "Video streaming over P2P networks: challenges and opportunities," vol. 27, no. 5, pp. 401-411, 2012.
[2] V. Lo, D. Zhou, Y. Liu, C. GauthierDickey, and J. Li, "Scalable supernode selection in peer-to-peer overlay networks," in Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P '05), pp. 18-27, San Diego, Calif, USA, July 2005, doi: 10.1109/HOT-P2P.2005.17.
[3] S. H. Min, D. S. Cho, and A. Levi, "Super-peer selection based framework using dynamic capacity and similarity," in Proceedings of the Computer and Information Sciences (ISCIS '06), pp. 803-812, Springer, Berlin, Germany, 2006.
[4] L. Gao and M. Peng, "Optimal superpeer selection based on load balance for P2P file-sharing system," in Proceedings of the 1st IITA International Joint Conference on Artificial Intelligence (JCAI '09), pp. 92-95, Haikou, China, April 2009, doi: 10.1109/JCAI.2009.165.
[5] S. H. Min, J. Holliday, and D. S. Cho, "Optimal super-peer selection for large-scale P2P system," in Proceedings of the International Conference on Hybrid Information Technology (ICHIT '06), pp. 588-593, Jeju Island, Korea, November 2006, doi: 10.1109/ICHIT.2006.253666.
[6] Y. Jin, Y. Liu, and H. Zhao, "Trust-based supernode selection in peer-to-peer systems," in Proceedings of the 2nd International Conference on Future Computer and Communication (ICFCC '10), pp. V1285-V1289, Wuhan, China, May 2010, doi: 10.1109/ICFCC.2010.5497787.
[7] C. Yu, X. Gou, C. Zhang, and Y. Ji, "Supernode election algorithm in P2P network based upon district partition," vol. 5, no. 1, pp. 186-194, 2011, doi: 10.4156/jdcta.vol5.issue1.19.
[8] Z. W. Gao, Z. M. Gu, and W. B. Wang, "SPSI: a hybrid super-node election method based on information theory," in Proceedings of the 14th International Conference on Advanced Communication Technology (ICACT '12), pp. 1076-1081, 2012.
[9] S. Wolf, "On the complexity of the uncapacitated single allocation p-hub median problem with equal weights," University of Kaiserslautern, 2007.
[10] S. Wolf and P. Merz, "Evolutionary local search for the super-peer selection problem and the p-Hub Median Problem," pp. 1-15, 2007.
[11] J. Kratica, J. Kojić, D. Tošić, V. Filipović, and D. Dugošija, "Two hybrid genetic algorithms for solving the super-peer selection problem," vol. 58, pp. 337-346, 2009.
[12] S. Wang, S. Yang, K. Shen, and L. Xie, "A super node selecting mechanism based on AHP," in Proceedings of the 7th International Conference on Grid and Cooperative Computing (GCC '08), pp. 403-406, October 2008, doi: 10.1109/GCC.2008.87.
[13] Y. Shoham, "Computer science and game theory," vol. 51, no. 8, pp. 75-79, 2008, doi: 10.1145/1378704.1378721.
[14] S. Roy, C. Ellis, S. Shiva, D. Dasgupta, V. Shandilya, and Q. Wu, "A survey of game theory as applied to network security," in Proceedings of the 43rd Annual Hawaii International Conference on System Sciences (HICSS '10), Poipu, Hawaii, USA, January 2010, doi: 10.1109/HICSS.2010.35.
[15] LStream P2P Live Streaming System, http://lstream.ha.edu.cn/
[16] X. Lu, Q. Wu, R. Li, and Y. Lin, "On tree construction of super peers for hybrid P2P live media streaming," in Proceedings of the 19th International Conference on Computer Communications and Networks (ICCCN '10), pp. 1-6, Zurich, Switzerland, August 2010, doi: 10.1109/ICCCN.2010.5560099.
[17] Y. Huang, T. Z. J. Fu, D. M. Chiu, J. C. S. Lui, and C. Huang, "Challenges, design and analysis of a large-scale p2p-vod system," in Proceedings of the ACM SIGCOMM Conference on Data Communication (SIGCOMM '08), pp. 375-388, New York, NY, USA, August 2008, doi: 10.1145/1402946.1403001.
[18] Z. Sheng, S. Q. Xie, and C. Y. Pan, 4th edition, Higher Education Press.
[19] J. M. Smith, Cambridge University Press, 1982.
[20] B. Wang, K. J. R. Liu, and T. C. Clancy, "Evolutionary cooperative spectrum sensing game: how to collaborate?" vol. 58, no. 3, pp. 890-900, 2010, doi: 10.1109/TCOMM.2010.03.090084.
[21] D. Fudenberg and D. K. Levine, vol. 2 of MIT Press Series on Economic Learning and Social Evolution, MIT Press, Cambridge, Mass, USA, 1998, xvi+276 pages.
[22] S. Spoto, R. Gaeta, M. Grangetto, and M. Sereno, "Analysis of PPLive through active and passive measurements," in Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS '09), May 2009, doi: 10.1109/IPDPS.2009.5160956.
[23] B. Li, S. Xie, Y. Qu, G. Y. Keung, C. Lin, J. Liu, and X. Zhang, "Inside the new Coolstreaming: principles, measurements and performance implications," in Proceedings of the 27th IEEE Communications Society Conference on Computer Communications (INFOCOM '08), pp. 1031-1039, Phoenix, Ariz, USA, April 2008, doi: 10.1109/INFOCOM.2007.157.