Energy-Efficient Relay Selection Scheme for Physical Layer Security in Cognitive Radio Networks

Security is a critical issue in cognitive radio (CR) relay networks. Most previous work concentrates on maximizing secrecy capacity (SC) as a criterion to guarantee the security requirements in CR relay networks. However, under the requirement of “green” radio communication, the energy consumption is largely ignored. This paper proposes a relay selection scheme which jointly considers the best relay selection and dynamic power allocation in order to maximize SC and to minimize energy consumption. Moreover, we consider finite-state Markov channels and residual relay energy in the relay selection and power allocation process. Specifically, the formulation of the proposed relay selection and power allocation scheme is based on the restless bandit problem, which is solved by the primal-dual index heuristic algorithm. Additionally, the obtained optimal relay selection policy has an indexability property that dramatically reduces the computational complexity. Numerical results are presented to show that our proposed scheme achieves a higher SC and a lower energy consumption than the existing schemes.


Introduction
Cognitive radio (CR) is a promising technology to improve the utilization efficiency of the wireless spectrum resources [1]. In CR networks, the secondary users (SUs) are allowed to transmit concurrently on the same spectrum bands with the licensed primary users (PUs), as long as the resulting interference power at the PUs' receivers is kept below the interference temperature limit. Such an operation mode is known as spectrum underlay [2]. In the underlay paradigm, the performance of the SUs degrades significantly in fading environments due to the constraints on their transmission power. One of the efficient ways to enhance the performance of SUs is to use cooperative relaying, which is capable of mitigating wireless channel fading [3], saving transmission power [4,5], and increasing capacity [6][7][8] through multipath propagation offered by cooperative nodes.
The security concerns in CR relay networks have been attracting continuously growing attention [9]. Due to the open nature of the wireless transmission medium, CR relay networks are particularly susceptible to eavesdropping [10]. Traditionally, cryptographic techniques have been employed to protect communication confidentiality against eavesdropping attacks. These techniques, however, increase the computational and communication overheads and introduce additional system complexity for secret key distribution and management.
As an alternative, physical layer security has emerged as a new secure communication method to defend against eavesdroppers by exploiting the physical characteristics of wireless channels. This work was initiated by Wyner in [11], in which the notion of secrecy capacity is developed from an information-theoretical perspective and shown to be the difference in capacities between the main channel (i.e., the channel from the transmitter to the legitimate receiver) and the wiretap channel (i.e., the channel from the transmitter to the eavesdropper). It was proved in [12,13] that if the wiretap channel is stronger than the main channel, the eavesdropper will succeed in intercepting the source information. Some recent work has been proposed to overcome this limitation by taking advantage of multiple-antenna [14][15][16][17] and cooperative relay [18][19][20][21] techniques. For instance, Pei et al. [14,15] addressed the secrecy capacity optimization problem in multiple-input single-output (MISO) CR networks.
Mathematical Problems in Engineering
Kwon et al. [16] explored MISO CR systems where the SUs secure the PUs in return for permission to use the spectrum. Zhang et al. [17] proposed efficient algorithms to solve the secrecy capacity maximization problem in multiple-input multiple-output (MIMO) CR networks. Apart from this, Zou et al. [18] proposed a user scheduling scheme to achieve multiuser diversity for improving the security level of cognitive transmissions. Sakran et al. [19] proposed a relay selection scheme in CR networks that selects a trusted decode-and-forward relay to assist the SUs and maximizes the secrecy capacity subject to the interference power constraints at the PUs. Power allocation strategies for relays were introduced in [20] with the goal of maximizing the total secrecy capacity in CR networks. The authors of [21] studied a relay precoding scheme to improve the secrecy capacity of SUs in CR systems.
Notice that the aforementioned work [14][15][16][17][18][19][20][21] on CR networks addressed the issue of secrecy capacity maximization but did not take into account the energy consumption. In wireless networks, most wireless devices are powered by batteries with limited energy. The network lifetime is an important factor to characterize the performance of such networks. In order to prolong the network lifetime, the battery energy should be consumed efficiently. In CR relay networks, the improvement of energy consumption can be realized by reducing the transmission power and balancing energy consumption among relays. However, the reduction of transmission power leads to degradation of the secrecy capacity. Therefore, the secrecy capacity and the energy consumption should be jointly considered for efficient implementation of CR relay networks. In addition, most previous works on relay selection use the currently observed channel conditions to make the relay selection decision for subsequent data transmission. However, this memoryless channel assumption is not realistic in time-varying radio environments. Finite-state Markov models have been considered as an effective approach to characterize the time-varying nature of the radio environments.
In this paper we propose an energy-efficient relay selection scheme which jointly considers best relay selection and dynamic power allocation in order to maximize SC as well as to minimize energy consumption. The main contributions of this paper are summarized as follows.
(1) A scenario in which a secondary transmitter communicates with a secondary destination with the help of the best relay in the presence of different numbers of PUs and eavesdroppers is considered.
(2) An energy-efficient relay selection scheme which jointly considers best relay selection and dynamic power allocation is proposed to maximize SC and minimize energy consumption.
(3) In order to accurately describe the time-varying characteristic, the spectrum occupancy state, the channel state information (CSI) of the related channels, and the residual relay energy are modeled as finite-state Markov models.
(4) The relay selection and dynamic power allocation scheme is formulated as a restless bandit problem, which is solved by the primal-dual index heuristic algorithm. The obtained optimal relay selection policy has an indexability property that dramatically reduces the computational complexity. Simulation results show that the proposed scheme outperforms the existing ones in terms of the achievable secrecy capacity and energy consumption.
The remainder of this paper is organized as follows. In Section 2, the system model is described. Section 3 formulates the relay selection and dynamic power allocation scheme as a restless bandit problem and solves the problem with the primal-dual index heuristic algorithm. Extensive simulation results are presented and analyzed for performance evaluation in Section 4. Finally, Section 5 concludes the paper.

System Model and Secrecy Capacity
We consider an underlay CR system with the coexistence of primary and secondary networks. As depicted in Figure 1, in the primary network, a primary transmitter (PT) communicates with $M$ primary destinations (PDs) denoted by $\mathrm{PD} = \{\mathrm{PD}_m \mid m \in \mathcal{M} = \{1, 2, \ldots, M\}\}$. Meanwhile, in the secondary network, a secondary transmitter ($S$) wants to send confidential information to a secondary destination ($D$) assisted by the best relay selected from the candidate relay set, $\mathcal{R} = \{R_n \mid n \in \mathcal{N} = \{1, 2, \ldots, N\}\}$, over the spectrum band that is licensed to the primary network. At the same time, $K$ eavesdroppers, denoted by $E = \{E_k \mid k \in \mathcal{K} = \{1, 2, \ldots, K\}\}$, try to eavesdrop and intercept the message sent by $S$ and the relay nodes. A Rayleigh block-fading channel is assumed in this paper. We define $h_{S,D}$, $h_{S,R_n}$, $h_{S,\mathrm{PD}_m}$, $h_{S,E_k}$, $h_{R_n,D}$, $h_{R_n,\mathrm{PD}_m}$, and $h_{R_n,E_k}$ as the channel coefficients of the $S$-$D$ link, $S$-$R_n$ link, $S$-$\mathrm{PD}_m$ link, $S$-$E_k$ link, $R_n$-$D$ link, $R_n$-$\mathrm{PD}_m$ link, and $R_n$-$E_k$ link, respectively, where $n \in \mathcal{N}$, $m \in \mathcal{M}$, and $k \in \mathcal{K}$. In addition, the global channel state information (CSI) is assumed to be available. Even the eavesdroppers' channels are known when the eavesdropper is also a user of the secondary network but is not the intended destination for some particular confidential information [22,23].

Cooperative Relaying Protocol and Secrecy Capacity.
We consider the decode-and-forward (DF) relaying protocol with two stages. In the first stage, $S$ transmits its encoded information with transmission power $P_S$ to the relay nodes. In the second stage, the selected relay $R_n$ re-encodes the message and forwards it to $D$ with transmission power $P_{R_n}$. Meanwhile, the eavesdroppers can overhear the information in both stages due to the broadcast nature of the wireless medium. For the secondary transmission in the presence of $K$ eavesdroppers, the secrecy capacity is characterized as

$$C_S = [R_D - R_E]^+, \quad (1)$$

where $[x]^+ = \max(x, 0)$, and $R_D$ and $R_E$ are the achievable rates at $D$ and at the eavesdroppers, respectively. The achievable rate at $D$ is given by (2). In this paper, we assume that the $K$ eavesdroppers independently perform their tasks to intercept the secondary transmission. The overall rate of the wiretap links is therefore the maximum of the individual rates achieved at the $K$ eavesdroppers and is given by (3), where $\sigma^2$ is the noise power of all the links. To guarantee the QoS of the PDs, the transmission powers of $S$ and $R_n$ are limited by the interference temperature limit $P_{th}$ as well as by the maximum transmission power limit; this yields the power constraint (5).
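As a rough numerical illustration of the secrecy capacity definition, the following Python sketch evaluates the bracketed rate difference for one selected relay. The rate expression used here (a half-rate `log2(1 + SNR)` form for the two-stage transmission) is a common textbook choice and is an assumption of this sketch, not necessarily the expression used in this paper. The function and parameter names are hypothetical, and the gains stand for squared channel magnitudes.

```python
import math


def rate(power_w, gain, noise_w):
    """Assumed two-stage rate: 0.5 * log2(1 + SNR), with SNR = power * gain / noise."""
    return 0.5 * math.log2(1.0 + power_w * gain / noise_w)


def secrecy_capacity(p_relay, g_main, g_eaves, noise_w):
    """Positive part of (rate at destination - best eavesdropper rate)."""
    r_d = rate(p_relay, g_main, noise_w)
    # The wiretap rate is the maximum over the individual eavesdroppers.
    r_e = max(rate(p_relay, g, noise_w) for g in g_eaves)
    return max(r_d - r_e, 0.0)


# Main link three times stronger than the single wiretap link.
example = secrecy_capacity(1.0, 3.0, [1.0], 1.0)
```

When any wiretap gain exceeds the main-link gain, the function returns zero, which mirrors the observation that a stronger wiretap channel defeats secrecy.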

Finite-State
where

Finite-State Markov Spectrum Occupancy and Energy Model.
In CR relay networks, the radio spectrum is either occupied by the primary users or not. The spectrum state evolves according to a two-state Markov model with state space Ω = {0, 1}, where state 1 means that the spectrum is occupied by the primary users and state 0 means that the spectrum is idle.
The probabilities that the spectrum state transits from one state of Ω to another at time $t$ form the 2 × 2 spectrum occupancy state transition probability matrix, which is defined in (7). The residual energy of the battery-powered relay $R_n$ ($n \in \mathcal{N}$) can also be modeled by a finite-state Markov energy model [24]. In this model, the continuous battery residual energy is divided into discrete levels denoted by $\mathcal{E} = \{\mathcal{E}_0, \mathcal{E}_1, \ldots, \mathcal{E}_{|\mathcal{E}|-1}\}$, each of which corresponds to an energy state in the Markov chain, so that $|\mathcal{E}|$ is the number of energy state levels. The probabilities that the residual energy of $R_n$ transits from one energy state to another at time $t$ form the $|\mathcal{E}| \times |\mathcal{E}|$ energy state transition probability matrix, which is defined in (8).
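To make the two-state spectrum model concrete, here is a minimal Python sketch that propagates a state distribution through a finite-state Markov chain. The transition values (0.7 for staying, 0.3 for switching) are borrowed from the numerical section; the function names and the choice of initial distribution are assumptions made for illustration.

```python
def step_distribution(dist, matrix):
    """One Markov step: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, matrix, steps):
    """Distribution over states after the given number of time slots."""
    for _ in range(steps):
        dist = step_distribution(dist, matrix)
    return dist


# Rows and columns are ordered (idle = 0, occupied = 1).
SPECTRUM = [[0.7, 0.3],
            [0.3, 0.7]]

# Start from an idle spectrum and look one slot and fifty slots ahead.
one_slot = distribution_after([1.0, 0.0], SPECTRUM, 1)
long_run = distribution_after([1.0, 0.0], SPECTRUM, 50)
```

For this symmetric matrix the long-run distribution approaches equal probabilities of idle and occupied, regardless of the starting state.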

We need to find the optimal relay selection and power allocation scheme, which sets one relay to be active at time slot $t$ according to the relays' states. The state of each relay comprises the states of its related channels, which take values in $\mathcal{C}$, its residual energy state, which takes values in $\mathcal{E}$, and the spectrum state, which takes values in Ω. Our optimization objective is to maximize the secrecy capacity as well as to minimize the energy consumption.

Stochastic Formulation
In this section, we propose the relay selection and power allocation scheme to defend against eavesdropping attacks and to save energy. The proposed scheme can be formulated as a restless bandit problem, which has been widely used to solve stochastic selection problems [25]. In the restless bandit system, the relay is equivalent to the arm, the relay selection and power allocation are the actions of the arm, and the secrecy capacity and energy consumption correspond to the reward. The restless bandit problem can be solved according to the indices of the arms, which are calculated by a primal-dual index heuristic algorithm.

Formulation of the Restless Bandit Problem
3.1.1. Action Space of Relay. At time slot $t$, each relay node decides whether or not to cooperate in the confidential communication between $S$ and $D$ and, if it joins the cooperation, how much power to provide. Thus, the action of relay $R_n$ ($n \in \mathcal{N}$) in time slot $t$ is a pair. Its first component is a selection indicator taking values in {0, 1}, where 0 denotes that the relay is passive and 1 denotes that the relay is active. Its second component, used when the relay is active, is the corresponding power allocation, which must satisfy the power constraint in (5). For $N$ relays in time slot $t$, the action space $\mathcal{A}$ consists of the actions of all $N$ relays. In our proposed scheme, we select only a single relay to assist with data transmission. Hence, the selection indicators of the $N$ relays sum to 1 in each time slot.

State Space and Transition
The changes of the channel states, the spectrum state, and the residual energy state are independent of each other. The state of relay $R_n$ evolves in a Markov fashion over a finite state space $\mathcal{S}_n$. The state transition probability matrix of relay $R_n$ is built from the channel, spectrum occupancy, and energy state transition probabilities defined in (6), (7), and (8), respectively, and its dimension is $|\mathcal{C}|^4 \times 2 \times |\mathcal{E}|$. Each element of this matrix is the probability that the state of relay $R_n$ transits from one state of $\mathcal{S}_n$ to another under a given action in $\mathcal{A}$.
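Because the component chains are stated to be independent, one natural way to assemble a joint transition matrix is a Kronecker product of the component matrices. The Python sketch below illustrates this for a spectrum chain and an energy chain only; treating the relay matrix as a Kronecker product is an assumption of this sketch (the paper's defining equation is not reproduced here), and the helper name `kron` is hypothetical. The numeric values are taken from the numerical section.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


SPECTRUM = [[0.7, 0.3],
            [0.3, 0.7]]

# One of the two residual-energy matrices listed in the numerical section.
ENERGY = [[1.00, 0.00, 0.00],
          [0.08, 0.92, 0.00],
          [0.00, 0.08, 0.92]]

# Joint chain over (spectrum state, energy state) pairs: 2 * 3 = 6 states.
joint = kron(SPECTRUM, ENERGY)
```

Each row of `joint` is still a probability distribution, and the joint state count is the product of the component state counts, which is why the state space grows quickly with the number of modeled links.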

System Reward.
The goal of our proposed relay selection and power allocation scheme is to maximize SC and to minimize energy consumption in CR relay networks. Thus, we formulate the system reward as a function of the SC, the residual relay energy, and the energy consumption. At time slot $t$, if relay $R_n$, in its current state, takes an action, then an immediate reward is earned. This reward combines, through three weights, the achievable secrecy capacity calculated by (1), the residual energy, and the power consumption of the relay.
The immediate reward is earned when relay $R_n$ takes an action in its current state. For a stochastic process, maximizing the immediate value is not equivalent to maximizing the expected long-term accumulated value. We assume that the duration of the whole communication is long enough that the time horizon is approximately infinite. Let $\mathcal{U}$ denote the set of admissible Markovian policies. The relay selection and power allocation problem is to find an optimal scheduling policy in $\mathcal{U}$ that maximizes the expected total discounted reward over an infinite horizon and to compute its optimum value $Z^*$, as stated in (12). The discount factor is required to lie strictly between 0 and 1 to ensure that the expected total discounted reward converges over an infinite horizon.
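The role of the discount factor can be checked numerically. The following Python sketch sums a discounted reward sequence and compares it with the closed-form limit for a constant reward; the value 0.7 is the discount factor used in the numerical section, while the constant reward of 1 and the function name are assumptions made for illustration.

```python
def discounted_return(rewards, discount):
    """Sum of discount**t * reward_t over a finite reward sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (discount ** t) * reward
    return total


DISCOUNT = 0.7

# A constant reward of 1 per slot over a long (but finite) horizon.
approx = discounted_return([1.0] * 200, DISCOUNT)

# Geometric-series limit for a constant unit reward: 1 / (1 - discount).
limit = 1.0 / (1.0 - DISCOUNT)
```

With a discount factor strictly below 1 the partial sums settle at the geometric-series limit, whereas a factor of 1 would let the sum grow without bound as the horizon grows.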

Solution to the Restless Bandit Problem.
The restless bandit problem mentioned above can be solved by the primal-dual index heuristic algorithm based on the first-order LP relaxation, which has been demonstrated to have lower complexity than, and performance very close to, the optimal solution [25].

Linear Programming (LP) Relaxation.
In order to formulate the restless bandit problem as a linear program, we introduce performance measures for each relay, state, and action under an admissible scheduling policy in $\mathcal{U}$. Reference [25] proved that the performance region spanned by these measures is the restless bandit polytope. The restless bandit problem can thus be formulated as a linear program over this polytope. The approach developed in [25] is to construct relaxations of the polytope that yield polynomial-size relaxations of the linear program. These relaxations are formulated not on the original variables but in a higher-dimensional space that includes new auxiliary variables. The first-order relaxation uses, for each relay $R_n$, the projection of the restless bandit polytope onto the space of the variables of $R_n$, with states in $\mathcal{S}_n$ and actions in {0, 1}. A complete formulation of this projection, which involves the probability that the initial state of relay $R_n$ is a given state, is given in [25]. According to Whittle's condition, the average number of active relays is fixed. In our scheme, only one relay is selected at each time slot, so this number equals 1.
Therefore, the first-order relaxation can be formulated as the linear program $(LP^1)$ in (19). The number of variables and constraints of this linear program is polynomial in the problem dimensions.
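For orientation, the first-order relaxation of [25] can be written in the following generic form when exactly one arm is active per slot. The notation is placeholder notation chosen for this sketch and may differ from the notation of this paper: $x_i^a$ is the performance measure of state $i$ and action $a$, $R_i^a$ the reward, $p_{ij}^a$ the transition probability, $\alpha_j$ the initial state probability, and $\beta$ the discount factor.

```latex
\begin{aligned}
(LP^1)\quad \max\ & \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{S}_n} \sum_{a \in \{0,1\}} R_{i}^{a}\, x_{i}^{a} \\
\text{s.t.}\ & x_{j}^{0} + x_{j}^{1} = \alpha_{j} + \beta \sum_{i \in \mathcal{S}_n} \sum_{a \in \{0,1\}} p_{ij}^{a}\, x_{i}^{a},
  \quad j \in \mathcal{S}_n,\ n \in \mathcal{N}, \\
& \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{S}_n} x_{i}^{1} = \frac{1}{1-\beta}, \\
& x_{i}^{a} \ge 0, \quad i \in \mathcal{S}_n,\ a \in \{0,1\},\ n \in \mathcal{N}.
\end{aligned}
```

The first constraint family is a per-relay flow balance on discounted state occupancy, and the second is the coupling constraint: the total discounted time spent in the active action equals that of one always-active arm.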

Primal-Dual Priority Index Heuristic.
In this section, we present a heuristic for the restless bandit problem, which uses information contained in optimal primal and dual solutions to the first-order relaxation $(LP^1)$. The primal-dual heuristic can also be interpreted as a priority-index heuristic.
Consider an optimal primal and dual solution pair to the first-order relaxation $(LP^1)$ and its dual. The corresponding optimal reduced cost coefficients must be nonnegative. For each relay state, the reduced cost coefficients of the passive and active actions are the rates of decrease in the objective value of linear program (19) per unit increase in the value of the corresponding passive and active variables, respectively. Based on these reduced cost coefficients, an index is defined for relay $R_n$ in each of its states; the dual program, the reduced cost coefficients, and the index are given in (20)-(22). The priority-index rule is that the relay with the smallest index is selected to be active.
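A small Python sketch may help to show how an index can be obtained from a dual solution. It follows the generic form of the primal-dual heuristic in [25] as we understand it: the reduced cost of an action is the slack of the corresponding dual constraint, and the index is the active reduced cost minus the passive one. All names, the placeholder notation, and the numeric inputs are hypothetical; the inputs are not an actual optimal dual solution.

```python
def reduced_costs(lam, lam_active, p0, p1, r0, r1, beta):
    """Slack of each dual constraint for the passive (0) and active (1) action.

    lam        -- dual value per state of one relay
    lam_active -- dual value of the coupling (one active relay) constraint
    p0, p1     -- transition matrices under the passive / active action
    r0, r1     -- rewards per state under the passive / active action
    beta       -- discount factor
    """
    n = len(lam)
    g0, g1 = [], []
    for i in range(n):
        next0 = sum(p0[i][j] * lam[j] for j in range(n))
        next1 = sum(p1[i][j] * lam[j] for j in range(n))
        g0.append(lam[i] - beta * next0 - r0[i])
        g1.append(lam[i] - beta * next1 + lam_active - r1[i])
    return g0, g1


def priority_indices(g0, g1):
    """Index per state: active reduced cost minus passive reduced cost."""
    return [active - passive for active, passive in zip(g1, g0)]


# Hypothetical two-state relay used only to exercise the arithmetic.
g0, g1 = reduced_costs(
    lam=[1.0, 2.0], lam_active=0.5,
    p0=[[1.0, 0.0], [0.0, 1.0]], p1=[[0.0, 1.0], [1.0, 0.0]],
    r0=[0.0, 0.0], r1=[0.1, 0.2], beta=0.5,
)
indices = priority_indices(g0, g1)
```

A smaller index means that activating the relay in that state costs less in terms of the relaxed objective, which is why the rule activates the relay with the smallest index.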

Process of Relay Selection and Power Allocation Scheme.
In this section, we present the indexable relay selection and power allocation scheme in CR relay networks. Our proposed scheme is divided into offline computation and online selection. The specific procedure is given in Algorithm 1.
Algorithm 1 (process of relay selection and power allocation scheme). Consider the following steps.
Step 1 (offline computation).
(1) According to the spectrum state, channel state, and residual relay energy state, determine the state space and the transition probability matrices under different actions.
(2) Input the state transition probabilities, the rewards, the discount factor, and the initial state probabilities, and then compute the priority indices according to (20)-(22); the indices are stored in an index table.
(3) Each relay stores this index table.
Step 2 (online selection).
(1) At the beginning of each time slot $t$, all the candidate relays sense the spectrum occupancy state, estimate the channel gains, and detect the residual energy to obtain the spectrum state, channel state, and residual energy state.
(2) Each candidate relay shares its state with the other candidate relays.
(3) Each candidate relay looks up the indices of all relays in the index table; the relay with the smallest index is selected to be active, and the corresponding power allocation is obtained.
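The online selection step reduces to a table lookup followed by a minimum. The Python sketch below illustrates this; the table layout (a dictionary per relay keyed by state label), the relay names, the state labels, and the index values are all hypothetical.

```python
def select_relay(index_table, current_states):
    """Return the relay whose current state has the smallest stored index.

    index_table    -- {relay_id: {state_label: index_value}}, computed offline
    current_states -- {relay_id: state_label}, shared among relays each slot
    """
    return min(current_states,
               key=lambda relay: index_table[relay][current_states[relay]])


# Hypothetical index table for two relays with two states each.
INDEX_TABLE = {
    "R1": {"good": -0.3, "bad": 0.4},
    "R2": {"good": -0.1, "bad": 0.9},
}

# R1 currently observes a bad state, R2 a good one.
chosen = select_relay(INDEX_TABLE, {"R1": "bad", "R2": "good"})
```

Since every relay holds the same table and the same shared states, each relay can run this lookup locally and arrive at the same decision without further coordination.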

Numerical Results and Analysis
In this section, numerical results are provided to show the physical-layer security and energy consumption improvement obtained with the proposed relay selection and power allocation scheme. The maximum transmission power limit for $S$ and $R_n$ is set to 150 mW; the battery capacity of each relay is set to 1000 mAh with an output voltage of 1 V. The discount factor is 0.7. Each of the links that enter the relay state is divided into "bad" and "good" states; the spectrum occupancy state is "busy" or "idle." For both Φ and Ψ, the transition probability between different states is 0.3 and the probability of staying in the same state is 0.7. The residual energy of each relay is divided into "high," "low," and "dead" states. Let $\Theta_n^1$ be the residual energy state transition probability matrix when relay $R_n$ is active and $\Theta_n^0$ the matrix when it is passive; the two matrices are

$$\begin{pmatrix} 1.00 & 0.00 & 0.00 \\ 0.01 & 0.99 & 0.00 \\ 0.00 & 0.01 & 0.99 \end{pmatrix}, \qquad \begin{pmatrix} 1.00 & 0.00 & 0.00 \\ 0.08 & 0.92 & 0.00 \\ 0.00 & 0.08 & 0.92 \end{pmatrix}.$$

The following methods are simulated for comparison:
(i) the proposed relay selection and power allocation scheme;
(ii) the memoryless relay selection scheme, in which the relay node is selected for subsequent data transmission according to the current channel condition;
(iii) the traditional relay selection scheme [8], in which the eavesdroppers' channel condition is not taken into account;
(iv) the arbitrary relay selection scheme.
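To get a feel for the two residual-energy matrices, the following Python sketch estimates the expected number of time slots until a relay reaches the absorbing state. It assumes that the rows are ordered (dead, low, high), which is inferred from the first row being absorbing and is not stated explicitly in the text; the function name is hypothetical, and the sketch does not say which matrix belongs to the active or the passive action.

```python
def expected_slots_to_absorb(matrix, absorbing, iterations=5000):
    """Expected slots until absorption from each state, by fixed-point iteration.

    Solves t_i = 1 + sum_j matrix[i][j] * t_j for transient states,
    with t = 0 at the absorbing state.
    """
    n = len(matrix)
    t = [0.0] * n
    for _ in range(iterations):
        t = [
            0.0 if i == absorbing
            else 1.0 + sum(matrix[i][j] * t[j] for j in range(n))
            for i in range(n)
        ]
    return t


SLOW = [[1.00, 0.00, 0.00],
        [0.01, 0.99, 0.00],
        [0.00, 0.01, 0.99]]

FAST = [[1.00, 0.00, 0.00],
        [0.08, 0.92, 0.00],
        [0.00, 0.08, 0.92]]

slow_times = expected_slots_to_absorb(SLOW, absorbing=0)
fast_times = expected_slots_to_absorb(FAST, absorbing=0)
```

Under the assumed ordering, a relay starting in the last state needs two downward transitions, so the expected time is two divided by the per-slot downward probability.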
The computational complexity of the proposed relay selection and power allocation scheme and that of the existing schemes are tabulated in Table 1, where $N$ denotes the number of candidate relays and $T$ denotes the time horizon. Compared to the memoryless selection scheme with $O((2^N - 1) \times T)$ order of operations, the complexity of the proposed selection scheme is reduced due to the indexability property. Table 1 also shows that the complexity of the arbitrary selection scheme is independent of the number of candidate relays.

Secrecy Capacity Performance Improvement.
This subsection presents the numerical secrecy capacity results of the proposed relay selection scheme. We do not consider the energy issue here; it is considered later. Thus, the three weights are set to 1, 0, and 0, respectively.
Figure 2 shows the average secrecy capacity improvement of the proposed scheme with different numbers of candidate relays. We assume an interference temperature limit of $P_{th} = 5$ mW and $K = 1$ eavesdropper. We can see that as the number of candidate relays increases, the probability that there exists a candidate relay in a better state becomes high, so that there is always a good candidate for the relay selection schemes. It can also be seen that the proposed scheme always has a larger average secrecy capacity than the memoryless scheme, the traditional scheme, and the arbitrary scheme. This is because the memoryless scheme selects the relay node for subsequent data transmission according to the current channel condition, which may change during the subsequent data transmission. The traditional scheme does not take the eavesdropper channels into account, so it is not able to support systems with a secrecy constraint. The arbitrary selection scheme has the worst secrecy capacity performance.
Figure 3 shows the average secrecy capacity versus the number of eavesdroppers for different schemes with $N = 5$ candidate relays and interference temperature limit $P_{th} = 5$ mW. We can see that as the number of eavesdroppers increases, the achievable secrecy capacity of all the schemes is significantly reduced. This is because, as the number of eavesdroppers increases, the probability that the wiretap links become much better than the main link is high. As a result, the eavesdroppers will most likely succeed in intercepting the legitimate transmission. However, the proposed scheme defends more effectively against eavesdropping attacks than the existing schemes, which confirms the advantage of the proposed scheme.
Figure 4 illustrates the average secrecy capacity under different interference temperature limits $P_{th}$ with $N = 5$ candidate relays and $K = 1$ eavesdropper. We can observe that the average secrecy capacity of all schemes changes only slightly when $P_{th} \geq 10$ dBm and increases with increasing $P_{th}$ when $P_{th} < 10$ dBm. This is due to the fact that when the interference temperature limit $P_{th}$ is less than 10 dBm and the spectrum is sensed to be "busy," the transmission power of $R_n$ directly depends on $P_{th}$ to guarantee the QoS of the primary users.
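The interference temperature limit is quoted both in mW and in dBm in this section. The following Python helpers, whose names are hypothetical, convert between the two units using the standard definition of dBm as ten times the base-10 logarithm of the power in milliwatts.

```python
import math


def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def mw_to_dbm(mw):
    """Convert a power in milliwatts to dBm."""
    return 10.0 * math.log10(mw)


# The 10 dBm knee mentioned above and the 5 mW limit used in Figures 2 and 3.
knee_mw = dbm_to_mw(10.0)
limit_dbm = mw_to_dbm(5.0)
```

On this scale, the 5 mW limit used in the earlier figures corresponds to roughly 7 dBm, which lies below the 10 dBm level discussed for Figure 4.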

Energy Consumption Improvement.
In this subsection, we demonstrate the energy consumption improvement of the proposed scheme. We set the three weights to 0.5, 0.3, and 0.2, respectively. For a fair comparison, the memoryless selection scheme is revised to select the relay with the highest residual energy without considering the energy consumption. The traditional selection scheme selects the relay that maximizes the achievable data rate and minimizes the energy consumption without considering eavesdroppers, while the arbitrary selection scheme selects a relay among the relay nodes that are still alive.
Figure 5 shows the average reward comparison among the proposed scheme, the memoryless scheme, the traditional scheme, and the arbitrary scheme. There are $N = 5$ candidate relays and $K = 1$ potential eavesdropper. Since more and more relays run out of energy as the simulation time increases, the number of available relays decreases. As a result, the average reward for all the schemes declines with time. The proposed scheme outperforms the other three schemes. This is because both the achievable secrecy capacity and the energy consumption contribute to the reward. The proposed scheme selects the relay node that costs less energy at the decision time, while the memoryless and arbitrary schemes do not take the energy consumption into consideration. The traditional scheme's objective is to maximize the achievable data rate and minimize the energy consumption without considering eavesdroppers, so it cannot support secure secondary transmission.
The energy consumption also affects the average secrecy capacity. As shown in Figure 6, the average secrecy capacity declines with increasing simulation time. This is due to the fact that increasingly more relay nodes run out of energy after transmitting data for some time slots. It can be seen that there is hardly any live relay at about 1000 s and that the proposed scheme achieves a higher average secrecy capacity than the other selection schemes.
Figure 7 compares the energy consumption for different relaying schemes with $N = 5$ available relays and $K = 1$ potential eavesdropper. We can see that the energy of the memoryless selection scheme, the traditional selection scheme, and the arbitrary selection scheme runs out earlier than that of the proposed scheme, which further confirms the advantage of the proposed scheme.
Figure 8 reveals the average network lifetime of different relaying schemes with $N = 5$ available relays and $K = 1$ potential eavesdropper. In this paper, the network lifetime is defined as the time at which the number of dead relays reaches a threshold such that the considered cognitive network can no longer achieve the target secrecy performance. As expected, the network lifetime of all schemes increases with the threshold. In addition, our proposed scheme always has the best performance.
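The network lifetime definition can be stated as a short procedure. In the Python sketch below, the input is a per-slot count of dead relays from a simulation run; the function name, the trace values, and the convention of returning `None` when the threshold is never reached are assumptions made for illustration.

```python
def network_lifetime(dead_counts, threshold):
    """First time slot at which the number of dead relays reaches the threshold.

    dead_counts -- number of dead relays observed at each time slot
    threshold   -- dead-relay count at which the network is considered expired
    Returns None if the threshold is never reached in the trace.
    """
    for slot, dead in enumerate(dead_counts):
        if dead >= threshold:
            return slot
    return None


# Hypothetical trace: relays die at slots 2, 4, and 5.
TRACE = [0, 0, 1, 1, 2, 3]

lifetime_two = network_lifetime(TRACE, 2)
lifetime_three = network_lifetime(TRACE, 3)
```

Because the dead-relay count never decreases, a larger threshold can only be reached at the same slot or later, which matches the trend that the lifetime grows with the threshold.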

Conclusion
In this paper, we have explored the physical layer security and efficient energy consumption of the secondary transmission and proposed the best relay selection and dynamic power allocation scheme. Moreover, the spectrum occupancy state, the wireless channels, and the residual relay energy are characterized as finite-state Markov models in order to accurately describe the time-varying radio environment. Specifically, we formulated the relay selection and power allocation problem as a restless bandit system and solved this stochastic control problem with a primal-dual index heuristic algorithm. Finally, simulation results have been presented to illustrate that the proposed relay selection and power allocation scheme can significantly increase the secrecy capacity as well as reduce the energy consumption compared to the existing schemes.

Figure 1: Coexistence of a primary network consisting of one primary transmitter (PT) and $M$ primary destinations (PDs) with a secondary network consisting of one secondary transmitter ($S$), $N$ relay nodes, and one secondary destination ($D$) in the presence of $K$ eavesdroppers.

Figure 2: Average secrecy capacity versus the number of candidate relays for different relaying schemes with $K = 1$ and $P_{th} = 5$ mW.

Figure 3: Average secrecy capacity versus the number of potential eavesdroppers for different relaying schemes with $N = 5$ and $P_{th} = 5$ mW.

Figure 4: Average secrecy capacity (bit/s/Hz) versus interference temperature limit $P_{th}$ (dBm) for different relaying schemes with $N = 5$ and $K = 1$.

Figure 8: Average network lifetime versus the dead-relay threshold for different relay selection schemes with $N = 5$, $K = 1$, and $P_{th} = 5$ mW.
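In the performance measures of the LP relaxation, the indicator equals 1 if relay $R_n$ is in a given state and takes a given action at time $t$, and 0 otherwise. The performance measure then represents the expected total discounted time that $R_n$ spends in that state taking that action under a given policy. The set of performance measures over all states in $\mathcal{S}_n$ and actions in $\mathcal{A}$ under all admissible policies defines the performance region.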