Optimal Throughput for Cognitive Radio with Energy Harvesting in Fading Wireless Channel

Energy resource management is a crucial problem for a device with a finite-capacity battery. In this paper, the cognitive radio is considered to be a device with an energy harvester that can harvest energy from a non-RF energy source while performing its other operations. The harvested energy is stored in a finite-capacity battery. At the start of each time slot, the cognitive radio must decide whether to remain silent or to carry out spectrum sensing, based on the idle probability of the primary user and the remaining energy, in order to maximize the throughput of the cognitive radio system. In addition, optimal sensing energy and adaptive transmission power control are investigated in order to utilize the limited energy of the cognitive radio effectively. The search for an optimal policy is formulated as a partially observable Markov decision process. The simulation results show that the proposed optimal decision scheme outperforms the myopic scheme, in which only the current throughput is considered when making a decision.


Introduction
Cognitive radio (CR) technology can improve spectrum utilization by allowing cognitive radio users (CUs) to share the frequency assigned to a licensed user, called the primary user (PU). In order to avoid interference with the operation of the licensed user, CUs are allowed to be active only when the frequency is free. When the presence of the PU is detected, CUs have to vacate the occupied frequency. Consequently, an essential problem in CR implementations is reliable spectrum sensing. In a CR network, the energy consumed by spectrum sensing increases with the sensing time duration, which is one of the main factors affecting sensing performance, so the sensing energy can significantly affect throughput. In addition, more throughput can be achieved by adopting adaptive transmission power control (ATPC) [1,2] in the case of a fading communication channel.
Like any wireless node, a CU has a finite-capacity battery, which can be recharged by an energy harvester and is consumed by spectrum sensing, data processing, and data transmission. Therefore, a primary challenge of cognitive radio is how to allocate this limited energy among its operations. The problem of optimal energy management has been considered previously in [3,4], where an optimal energy management scheme that maximizes the throughput of a sensor node with an energy harvester is proposed. For maximizing the throughput of a CR system, the optimal choice of when to keep silent and when to carry out spectrum sensing is addressed in [5,6], in which the partially observable Markov decision process (POMDP) [7,8] is adopted to obtain an optimal secondary access policy. However, the previous works [5,6] have some limitations: the assumption of constant harvested energy is unrealistic, the effect of the energy consumed by spectrum sensing on system throughput is not addressed, and ATPC is not investigated.
In this paper, we propose an optimal mode decision policy (i.e., remain in sleeping mode or switch to accessing mode) for a CR with a non-RF energy harvester in order to maximize the CR system throughput. An optimal sensing energy algorithm and an ATPC are also considered in the proposed scheme in order to guarantee effective utilization of the CU's limited energy resource, which extends the lifetime and improves the throughput of the CR system.

System Model
We assume that a CR network and a PU operate in a time-slotted model. The status of the PU follows a two-state Markov chain with the states presence (P) and absence (A), as shown in Figure 1. The transition probabilities of the PU from state P to state A and from state A to itself are denoted by $P_{PA}$ and $P_{AA}$, respectively. The CU is assumed to always have a data packet to transmit. When the CU wants to access the channel of the PU, it needs to perform spectrum sensing.
The CU is allowed to use the channel only if the sensing result is state A of the PU. The energy of the CU is stored in a battery with a finite capacity of $E_{ca}$ packets of energy. In general, the CU needs to decide whether to operate in sleeping mode or in accessing mode in order to maximize throughput and energy utilization. In both sleeping and accessing modes, the CU can harvest energy from the environment by using its non-RF harvester while performing other operations. In the $k$th time slot, the CU can harvest $e_h(k)$ energy units that can be used in the next time frame. $e_h(k)$ takes its value from a finite set of $N_h$ energy levels:

$$e_h(k) \in \{e_h^1, e_h^2, \ldots, e_h^{N_h}\}, \tag{1}$$

where $0 \le e_h^1 < e_h^2 < \cdots < e_h^{N_h} \le E_{ca}$. The probability mass function (PMF) of the harvested energy is given as follows:

$$P_h(i) = \Pr[e_h(k) = e_h^i], \quad i = 1, 2, \ldots, N_h. \tag{2}$$

We assume that the harvested energy arrives according to a Poisson process. Consequently, $e_h(k)$ is a Poisson random variable with mean $e_h^{\mathrm{mean}}$, and the PMF in (2) can be rewritten as follows:

$$P_h(i) \approx \frac{e^{-e_h^{\mathrm{mean}}} \left(e_h^{\mathrm{mean}}\right)^{e_h^i}}{e_h^i!}. \tag{3}$$

At the beginning of the time frame, information on the amount of remaining energy $e_r$, $0 \le e_r \le E_{ca}$, is available at the CU. Furthermore, the CU has a belief $p$, which is the probability of the PU being absent (A) in the time frame. This belief can be calculated from the history of sensing results in the CR network. Based on the values of $e_r$ and $p$, the CU decides either to keep sleeping or to carry out spectrum sensing and transmit data if state A of the PU is detected.
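The harvesting model above can be illustrated with a minimal Python sketch. It assumes that the energy levels are the integers 0 to $E_{ca}$ and that the Poisson weights are renormalised over this finite set; both choices, as well as the function names, are assumptions of the sketch and are not taken from the paper. The values $E_{ca} = 10$ and $e_h^{\mathrm{mean}} = 2$ are those quoted for Figure 3.

```python
import math

def truncated_poisson_pmf(mean, levels):
    """Poisson weights evaluated on `levels`, renormalised so they sum to 1.

    Renormalising over a finite set of levels is an assumption of this sketch.
    """
    weights = [math.exp(-mean) * mean ** k / math.factorial(k) for k in levels]
    total = sum(weights)
    return [w / total for w in weights]

def next_energy_distribution(e_r, consumed, levels, pmf, capacity):
    """Map each possible next battery level to its probability."""
    dist = {}
    for e_h, prob in zip(levels, pmf):
        # Harvested energy is added after consumption; the battery saturates.
        e_next = min(e_r - consumed + e_h, capacity)
        dist[e_next] = dist.get(e_next, 0.0) + prob
    return dist

E_CA = 10                      # battery capacity (value quoted for Figure 3)
H_MEAN = 2                     # mean harvested energy (value quoted for Figure 3)
LEVELS = list(range(E_CA + 1)) # hypothetical integer energy levels
PMF = truncated_poisson_pmf(H_MEAN, LEVELS)

# A CU that sleeps with 9 units left ends the slot with either 9 or 10 units.
DIST = next_energy_distribution(e_r=9, consumed=0, levels=LEVELS,
                                pmf=PMF, capacity=E_CA)
```

Because the battery saturates at $E_{ca}$, every harvest of one unit or more from a level of 9 collapses onto the same next state, which is why a nearly full battery gains little from sleeping.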
We consider fading on the data channel between the CU transmitter and the CU receiver. At the CU receiver, we assume that the channel gain takes its value from a finite set of integers:

$$g(k) \in \{g_1, g_2, \ldots, g_{N_g}\}, \tag{4}$$

where $g_1 > g_2 > \cdots > g_{N_g}$. The CU receiver reports this channel gain to the CU transmitter over a low-rate, error-free, zero-delay feedback channel, called causal channel state information (CSI) feedback [9].
The PMF of the channel gain can be defined as

$$P_g(j) = \Pr[g(k) = g_j], \quad j = 1, 2, \ldots, N_g. \tag{5}$$

By applying an ATPC, the required transmission energy of the CU, $e_t(k)$, is determined by the channel gain $g(k)$:

$$e_t(k) \in \{e_t^1, e_t^2, \ldots, e_t^{N_g}\}, \tag{6}$$

where the smallest required transmission energy, $e_t^1$, corresponds to the highest channel gain, $g_1$, and, similarly, the CU consumes the largest transmission energy, $e_t^{N_g}$, when the channel gain is the lowest, $g_{N_g}$; that is, $e_t^1 < e_t^2 < \cdots < e_t^{N_g} \le E_{ca}$. The PMF of the transmission energy can be expressed as follows:

$$P_t(j) = \Pr[e_t(k) = e_t^j] = \Pr[g(k) = g_j]. \tag{7}$$

We assume that the level of the channel gain follows a Poisson process. Therefore, $e_t(k)$ is a Poisson random variable with mean $e_t^{\mathrm{mean}}$. As a result, the PMF of the transmission energy in (7) can be given as

$$P_t(j) \approx \frac{e^{-e_t^{\mathrm{mean}}} \left(e_t^{\mathrm{mean}}\right)^{e_t^j}}{e_t^j!}. \tag{8}$$

For efficient utilization of energy, we define a transmission energy threshold $e_{th}$ that accounts for the transmission cost: if the required transmission energy exceeds this threshold, the CU drops the transmission.
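The threshold rule can be made concrete with a short sketch that computes how often the CU would transmit and how the transmission energy is distributed once the threshold is applied. The integer levels 1 to 10, the renormalised Poisson weights, the threshold value of 8, and all function names are hypothetical choices made for illustration; only the mean of 7 is quoted in the paper (Figure 3).

```python
import math

def poisson_level_pmf(mean, levels):
    """Poisson weights on integer levels, renormalised (assumption of this sketch)."""
    weights = [math.exp(-mean) * mean ** k / math.factorial(k) for k in levels]
    total = sum(weights)
    return [w / total for w in weights]

def transmit_probability(levels, pmf, e_th):
    """Probability that the required transmission energy does not exceed e_th."""
    return sum(prob for e_t, prob in zip(levels, pmf) if e_t <= e_th)

def conditional_tx_pmf(levels, pmf, e_th):
    """PMF of the transmission energy given that the CU actually transmits."""
    p_tx = transmit_probability(levels, pmf, e_th)
    return {e_t: prob / p_tx for e_t, prob in zip(levels, pmf) if e_t <= e_th}

TX_LEVELS = list(range(1, 11))            # hypothetical levels e_t^1 .. e_t^10
TX_PMF = poisson_level_pmf(7, TX_LEVELS)  # mean transmission energy of 7
P_TX = transmit_probability(TX_LEVELS, TX_PMF, e_th=8)
COND = conditional_tx_pmf(TX_LEVELS, TX_PMF, e_th=8)
```

Lowering the threshold saves energy in deep fades at the price of a smaller `P_TX`, which is the trade-off the threshold is meant to control.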

Optimal Mode Decision Policy Based on POMDP
In this study, we obtain an optimal mode decision policy by adopting a POMDP with the objective of maximizing the throughput of the CR system. Two operation modes, sleeping mode (S) and accessing mode (AC), are considered for the CU. Like any device with limited energy resources, if the CU lacks energy for its operations (i.e., spectrum sensing and data transmission), it keeps sleeping and only harvests energy for the next time slot. This operation is called sleeping mode. In accessing mode, on the other hand, the CU performs spectrum sensing to detect the state of the PU and, if state A of the PU is detected, the CU transmitter sends data to the CU receiver.
In spectrum sensing, the consumed energy can significantly affect the throughput of the system, especially in the case of energy-limited devices. Therefore, in the next subsection we propose an algorithm to obtain the optimal sensing energy for the CU.

Optimal Sensing Energy for Maximizing Throughput.
The spectrum sensing of the CU, which is assumed to be performed by using an energy detection method, must distinguish between two hypotheses of the PU, presence (P) and absence (A). We consider Gaussian noise in the sensing channel. Hence, when the number of sensing samples $N$ is relatively large (e.g., $N > 200$), the received signal energy $Y$ can be closely approximated as a Gaussian random variable under both hypotheses such that [10]

$$Y \sim \begin{cases} \mathcal{N}(N,\, 2N), & \text{A}, \\ \mathcal{N}\big(N(\gamma + 1),\, 2N(2\gamma + 1)\big), & \text{P}, \end{cases} \tag{9}$$

where $\gamma$ is the SNR of the sensing channel between the PU and the CU. The decision about the state of the PU can be made as follows:

$$D = \begin{cases} 1, & Y \ge \lambda, \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $\lambda$ is the energy threshold and "1" and "0" correspond to the states P and A of the PU, respectively. The sensing performance of the CU can be evaluated by the probability of false alarm ($P_f$) and the probability of detection ($P_d$), which are given, respectively, as

$$P_f = Q\!\left(\frac{\lambda - N}{\sqrt{2N}}\right) \tag{11}$$

and

$$P_d = Q\!\left(\frac{\lambda - N(\gamma + 1)}{\sqrt{2N(2\gamma + 1)}}\right). \tag{12}$$

The number of sensing samples is assumed to be $N = 2 t_s W$, where $t_s$ is the sensing time duration and $W$ is the bandwidth. Therefore, for the required probability of detection $P_d^*$, the probability of false alarm as a function of the sensing time can be calculated as follows:

$$P_f(t_s) = Q\!\left(\sqrt{2\gamma + 1}\, Q^{-1}(P_d^*) + \gamma \sqrt{t_s W}\right). \tag{13}$$

Here, the energy consumed by spectrum sensing is denoted by $e_s$. We assume that $e_s$ is proportional to $t_s$ with a constant of proportionality $\rho$; that is, $e_s = \rho t_s$. Therefore, the probability of false alarm depends on the sensing energy according to

$$P_f^*(e_s) = Q\!\left(\sqrt{2\gamma + 1}\, Q^{-1}(P_d^*) + \gamma \sqrt{\frac{e_s W}{\rho}}\right). \tag{14}$$

If the sensing result of the CU is state A of the PU, then the CU can transmit its data. However, throughput is achieved only when this transmission is performed and the PU is really in state A (i.e., the sensing result is correct). The average throughput as a function of the sensing energy can be defined as

$$R(e_s) = \frac{T - e_s/\rho}{T}\,\big(1 - P_f^*(e_s)\big)\, C_0, \tag{15}$$

where $T$ is the total time frame for both spectrum sensing and data transmission and $C_0$ is the standard throughput of the CR link, which is defined as $C_0 = \log_2(1 + \mathrm{SNR}_{CR})$, where $\mathrm{SNR}_{CR}$ is the SNR at the CU receiver. The optimal value of $e_s$ for each time frame, such that the average throughput of the CU is maximized while maintaining a low level of interference with the PU (i.e., meeting the requirement of $P_d^*$), can be found as the solution of the following optimization problem:

$$e_{s,\mathrm{opt}} = \arg\max_{0 < e_s < \rho T} R(e_s) \quad \text{subject to } P_d \ge P_d^*. \tag{16}$$

The problem can be solved by using a numerical method, and the value of the optimal sensing energy $e_{s,\mathrm{opt}}$ is then used in the proposed POMDP-based optimal mode decision policy of the CU transmitter, as shown in Figure 2.
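A simple way to carry out the numerical search is a grid search over the sensing energy. The sketch below assumes that the false-alarm probability takes the form that follows from the Gaussian approximation at a fixed target detection probability, $P_f = Q(\sqrt{2\gamma+1}\,Q^{-1}(P_d^*) + \gamma\sqrt{t_s W})$ with $t_s = e_s/\rho$, and that the throughput is the transmission-time fraction multiplied by $(1 - P_f)$ and the link rate. All parameter values and function names are hypothetical and are not taken from Table 1.

```python
import math
from statistics import NormalDist

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(prob):
    """Inverse of Q for 0 < prob < 1."""
    return NormalDist().inv_cdf(1.0 - prob)

def false_alarm_prob(e_s, pd_target, snr, bandwidth, rho):
    """False-alarm probability when the detection probability is held at pd_target."""
    t_s = e_s / rho  # sensing time bought by the sensing energy
    arg = (math.sqrt(2.0 * snr + 1.0) * q_inv(pd_target)
           + snr * math.sqrt(t_s * bandwidth))
    return q_func(arg)

def throughput(e_s, pd_target, snr, bandwidth, rho, frame, c0):
    """Average throughput: time left for data times (1 - P_f) times link rate."""
    t_s = e_s / rho
    p_f = false_alarm_prob(e_s, pd_target, snr, bandwidth, rho)
    return (frame - t_s) / frame * (1.0 - p_f) * c0

def optimal_sensing_energy(pd_target, snr, bandwidth, rho, frame, c0, n_grid=1000):
    """Grid search over 0 < e_s < rho * frame for the throughput maximiser."""
    best_e, best_r = 0.0, float("-inf")
    for i in range(1, n_grid):
        e_s = rho * frame * i / n_grid
        r = throughput(e_s, pd_target, snr, bandwidth, rho, frame, c0)
        if r > best_r:
            best_e, best_r = e_s, r
    return best_e, best_r

# Hypothetical parameters: 0.1 s frame, 100 kHz bandwidth, sensing SNR of 0.05.
PARAMS = dict(pd_target=0.9, snr=0.05, bandwidth=1e5, rho=100.0, frame=0.1, c0=1.0)
E_OPT, R_OPT = optimal_sensing_energy(**PARAMS)
```

The optimum is interior because spending more energy on sensing lowers the false-alarm probability but shortens the time left for transmission. A grid search is used here only for transparency; any one-dimensional numerical optimizer could replace it.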

Optimal Mode Decision Policy.
The optimal mode decision policy for sleeping or accessing is formulated within the framework of a POMDP. The value function $V(e_r, p)$ is defined as the maximum total discounted throughput from the current time slot $t$ onward when the remaining energy is $e_r$ and the belief that the PU is in state A is $p$. The value function is given by

$$V(e_r, p) = \max_{\{a_k\}} E\left\{ \sum_{k=t}^{\infty} \alpha^{k-t} R\big(e_r(k), p(k), a_k\big) \,\Big|\, e_r(t) = e_r,\ p(t) = p \right\}, \tag{17}$$

where $0 \le \alpha < 1$ is the discount factor and $e_r(k)$ and $p(k)$ are the remaining energy and the belief at the beginning of the $k$th time slot, respectively. $R(e_r(k), p(k), a_k)$ is the throughput of the CU achieved in the $k$th time slot, which mainly depends on $e_r(k)$, $p(k)$, and the action $a_k$. As described above, the action can be either to remain sleeping or to change to accessing; that is, $a_k \in \{S, AC\}$. If the CU decides to change to accessing mode, it uses $e_{s,\mathrm{opt}}$ as the sensing energy. In addition, an ATPC calculates the transmission energy according to the channel gain information, which is provided by causal CSI feedback from the CU receiver.

Sleeping Mode ($O_1$).
If the CU decides to remain sleeping, no throughput is achieved; that is, $R(e_r, p, S \mid O_1) = 0$, and the belief for the next time slot is updated as follows:

$$p' = p\,P_{AA} + (1 - p)\,P_{PA}. \tag{18}$$

Also, the remaining energy of the battery increases according to

$$e_r' = \min\big(e_r + e_h^i,\ E_{ca}\big) \tag{19}$$

with transition probability

$$\Pr[e_r \to e_r' \mid O_1] = \Pr[e_h(k) = e_h^i] \tag{20}$$

for $i = 1, 2, \ldots, N_h$.

Accessing Mode.
When the CU decides to change to accessing mode, the achieved throughput of the system depends on the observation of the CU. In this paper, we define 4 observations for the accessing mode of the CU which are as follows.
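Observation 1 ($O_2$). The sensing result is state P of the PU; the CU does not transmit data and no throughput is achieved, $R(e_r, p, AC \mid O_2) = 0$. Let $P_f = P_f^*(e_{s,\mathrm{opt}})$ and $P_d = P_d^*$ denote the false-alarm and detection probabilities at the chosen sensing energy. The probability that $O_2$ happens is

$$\Pr(O_2) = p\,P_f + (1 - p)\,P_d. \tag{21}$$

The belief in the current time slot can be updated by using Bayes' rule as follows:

$$p_u = \frac{p\,P_f}{p\,P_f + (1 - p)\,P_d}. \tag{22}$$

As a result, the updated belief that the PU is in state A at the next time slot is given by

$$p' = p_u P_{AA} + (1 - p_u) P_{PA}. \tag{23}$$

The updated remaining energy is obtained as

$$e_r' = \min\big(e_r - e_{s,\mathrm{opt}} + e_h^i,\ E_{ca}\big) \tag{24}$$

with transition probability

$$\Pr[e_r \to e_r' \mid O_2] = \Pr[e_h(k) = e_h^i] \tag{25}$$

for $i = 1, 2, \ldots, N_h$.
Observation 2 ($O_3$). No PU signal is detected (i.e., state A). The required transmission energy does not exceed the threshold $e_{th}$; the CU transmits data and receives an ACK message. This means that the sensing result is correct (A is the real state of the PU) and the CU transmits its data successfully. The throughput is achieved as

$$R(e_r, p, AC \mid O_3) = \frac{T - e_{s,\mathrm{opt}}/\rho}{T}\, C_0. \tag{26}$$

The probability that $O_3$ happens is

$$\Pr(O_3) = p\,(1 - P_f)\,\Pr[e_t(k) \le e_{th}]. \tag{27}$$

The belief and the remaining energy for the next time slot can be updated, respectively, as

$$p' = P_{AA} \tag{28}$$

and

$$e_r' = \min\big(e_r - e_{s,\mathrm{opt}} - e_t^j + e_h^i,\ E_{ca}\big) \tag{29}$$

with transition probability

$$\Pr[e_r \to e_r' \mid O_3] = \Pr[e_h(k) = e_h^i]\,\Pr[e_t(k) = e_t^j \mid e_t(k) \le e_{th}] \tag{30}$$

for all $i = 1, 2, \ldots, N_h$ and $j = 1, 2, \ldots, N_{th}$, where $e_t^{N_{th}}$ is the largest transmission energy level that does not exceed $e_{th}$.
Observation 3 ($O_4$). State A of the PU is detected. The required transmission energy does not exceed the threshold $e_{th}$; the CU transmits data but does not receive the ACK message. This means that the sensing result is incorrect (P is the real state of the PU), the data transmission fails, and $R(e_r, p, AC \mid O_4) = 0$. The probability that $O_4$ is obtained is

$$\Pr(O_4) = (1 - p)\,(1 - P_d)\,\Pr[e_t(k) \le e_{th}]. \tag{31}$$

The belief that the PU will be in state A at the next time slot is given as

$$p' = P_{PA}. \tag{32}$$

The remaining energy of the CU is updated in the same way as in the case of $O_3$.
The Scientific World Journal
Observation 4 ($O_5$). The sensing result concludes that the PU is in state A. The required transmission energy exceeds the threshold $e_{th}$; the CU does not transmit data and $R(e_r, p, AC \mid O_5) = 0$. The probability of the case $O_5$ is given as

$$\Pr(O_5) = P_{s,A}\,\Pr[e_t(k) > e_{th}], \tag{33}$$

where $P_{s,A}$ is the probability that the sensing result is state A of the PU, which is given by

$$P_{s,A} = p\,(1 - P_f) + (1 - p)\,(1 - P_d). \tag{34}$$

Based on Observation 4, we can update the belief of the current time slot by using Bayes' rule as follows:

$$p_u = \frac{p\,(1 - P_f)}{p\,(1 - P_f) + (1 - p)\,(1 - P_d)}. \tag{35}$$

Subsequently, the updated belief for the next time slot is calculated as

$$p' = p_u P_{AA} + (1 - p_u) P_{PA}. \tag{36}$$

The updated remaining energy for the next time slot is obtained in the same way as in the case of $O_2$.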
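The four accessing-mode observations can be checked with a small sketch that returns, for each observation, its probability and the belief carried into the next slot. It applies Bayes' rule to the sensing outcome and then one step of the two-state Markov chain; the names `O2` to `O5`, the function name, and the numerical inputs are assumptions made for illustration, and `p_tx` stands for the probability that the required transmission energy does not exceed the threshold.

```python
def accessing_observations(p, p_f, p_d, p_tx, p_aa, p_pa):
    """Return {observation: (probability, next_belief)} for accessing mode.

    p    : belief that the PU is absent (state A) in the current slot
    p_f  : false-alarm probability, p_d: detection probability
    p_tx : probability that the transmission energy is within the threshold
    """
    def propagate(belief):
        # One step of the PU Markov chain applied to the posterior belief.
        return belief * p_aa + (1.0 - belief) * p_pa

    pr_busy = p * p_f + (1.0 - p) * p_d                  # sensing reports P
    pr_idle = p * (1.0 - p_f) + (1.0 - p) * (1.0 - p_d)  # sensing reports A
    post_busy = p * p_f / pr_busy if pr_busy > 0 else p
    post_idle = p * (1.0 - p_f) / pr_idle if pr_idle > 0 else p
    return {
        "O2": (pr_busy, propagate(post_busy)),           # PU detected, no transmission
        "O3": (p * (1.0 - p_f) * p_tx, p_aa),            # ACK received: PU was absent
        "O4": ((1.0 - p) * (1.0 - p_d) * p_tx, p_pa),    # no ACK: PU was present
        "O5": (pr_idle * (1.0 - p_tx), propagate(post_idle)),  # transmission dropped
    }

# Hypothetical numbers, not taken from Table 1.
OBS = accessing_observations(p=0.6, p_f=0.1, p_d=0.9, p_tx=0.7, p_aa=0.8, p_pa=0.2)
TOTAL = sum(prob for prob, _ in OBS.values())
```

The four probabilities sum to one, which is a quick consistency check on the observation model: once the ACK outcome is known, the PU state in the current slot is known exactly, so the next belief collapses to a single transition probability.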
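According to those observations, the value function in (17) can be expressed as follows:

$$V(e_r, p) = \max_{a \in \{S, AC\}} \sum_{O_n \in \mathcal{O}(a)} \Pr(O_n) \sum_{e_r'} \Pr[e_r \to e_r' \mid O_n] \Big[ R(e_r, p, a \mid O_n) + \alpha V\big(e_r', p'(O_n)\big) \Big], \tag{37}$$

where $\mathcal{O}(S) = \{O_1\}$ with $\Pr(O_1) = 1$, $\mathcal{O}(AC) = \{O_2, O_3, O_4, O_5\}$, and $p'(O_n)$ is the updated belief under observation $O_n$. The optimization problem in (37) can be solved by using the value iteration method [11] to find the optimal mode decision that maximizes the throughput of the CR system.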
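Value iteration over the energy and belief states can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the belief is discretised on a uniform grid with nearest-neighbour lookup, energies are integers, the Poisson weights are renormalised over the finite levels, the per-slot throughput of a successful transmission is normalised to `reward`, and a transmission is also dropped when the battery cannot cover it. Apart from the capacity of 10 and the means 2 and 7 quoted for Figure 3, all parameter values are hypothetical.

```python
import math

def level_pmf(mean, levels):
    """Poisson weights on integer levels, renormalised (assumption of this sketch)."""
    w = [math.exp(-mean) * mean ** k / math.factorial(k) for k in levels]
    total = sum(w)
    return [x / total for x in w]

def value_iteration(cap=10, h_mean=2, t_mean=7, e_s=1, e_th=8, p_f=0.1, p_d=0.9,
                    p_aa=0.8, p_pa=0.2, reward=1.0, alpha=0.9, n_belief=11, tol=1e-4):
    levels = list(range(cap + 1))        # battery and harvest levels 0..cap
    tx_levels = list(range(1, cap + 1))  # transmission energy levels 1..cap
    ph, pt = level_pmf(h_mean, levels), level_pmf(t_mean, tx_levels)
    beliefs = [i / (n_belief - 1) for i in range(n_belief)]
    V = [[0.0] * n_belief for _ in levels]

    def prop(b):
        # Belief for the next slot after one step of the PU Markov chain.
        return b * p_aa + (1.0 - b) * p_pa

    def cont(e_after, b_next):
        # Discounted expected future value, averaged over the harvested energy.
        j = round(b_next * (n_belief - 1))
        return alpha * sum(q * V[min(e_after + eh, cap)][j]
                           for eh, q in zip(levels, ph))

    def q_values(e, p):
        q_sleep = cont(e, prop(p))
        if e < e_s:                       # not enough energy to sense
            return q_sleep, float("-inf")
        rem = e - e_s
        busy = p * p_f + (1.0 - p) * p_d
        idle = 1.0 - busy
        post_busy = p * p_f / busy if busy > 0 else p
        post_idle = p * (1.0 - p_f) / idle if idle > 0 else p
        q_acc = busy * cont(rem, prop(post_busy))            # PU detected
        drop = cont(rem, prop(post_idle))                    # transmission dropped
        for et, qt in zip(tx_levels, pt):
            if et <= min(e_th, rem):
                q_acc += qt * p * (1.0 - p_f) * (reward + cont(rem - et, p_aa))
                q_acc += qt * (1.0 - p) * (1.0 - p_d) * cont(rem - et, p_pa)
            else:
                q_acc += qt * idle * drop
        return q_sleep, q_acc

    while True:
        new_v = [[max(q_values(e, p)) for p in beliefs] for e in levels]
        delta = max(abs(new_v[e][i] - V[e][i])
                    for e in levels for i in range(n_belief))
        V = new_v
        if delta < tol:
            break
    policy = [["AC" if q_values(e, p)[1] > q_values(e, p)[0] else "S"
               for p in beliefs] for e in levels]
    return V, policy

V_OPT, POLICY = value_iteration()
```

`POLICY[e][i]` gives the chosen mode for remaining energy `e` and the `i`th belief grid point, which corresponds to the kind of map plotted in Figure 3. Setting `alpha=0` reduces the recursion to a myopic rule that looks only at the current slot.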

Simulation Results
In this section, we present simulation results of the proposed scheme and of the Myopic scheme, which considers only the current time slot in the value function (i.e., $\alpha = 0$), under the parameters shown in Table 1. Figure 3 shows the optimal mode decision policy for the sleeping and accessing modes based on the values of the remaining energy $e_r$ and the belief $p$. It can be seen that when the remaining energy is low, the CU changes to accessing mode only when the value of $p$ is high. In contrast, when the value of $p$ is low, more remaining energy is required for switching to accessing mode.

Figure 3: The optimal mode decision policy when $E_{ca} = 10$, $e_h^{\mathrm{mean}} = 2$, and $e_t^{\mathrm{mean}} = 7$ (black area: sleeping mode and white area: accessing mode). Axis: remaining energy ($e_r$).

Figures 4, 5, and 6 illustrate the average throughput according to the required probability of detection $P_d^*$ for several values of $E_{ca}$, $e_h^{\mathrm{mean}}$, and $e_t^{\mathrm{mean}}$. The required probability of detection $P_d^*$ represents the protection level of the PU; that is, a high value of $P_d^*$ offers a high protection level for the PU. However, a high protection level of the PU may reduce the opportunity for communication in the CR system. The relation between throughput and the value of $P_d^*$ is shown in Figures 4, 5, and 6: the average throughput tends to decrease as $P_d^*$ increases. On the other hand, increases in $E_{ca}$ and $e_h^{\mathrm{mean}}$ provide the CU with a higher probability of being active in accessing mode, which results in an increase in average throughput. In contrast, a higher transmission energy (i.e., a higher $e_t^{\mathrm{mean}}$) may reduce the average throughput in the case of a constrained energy resource. Figure 7 shows the average throughput for different battery capacities $E_{ca}$ and required values of $P_d^*$. From the figure, it is observed that the average throughput increases as $P_d^*$ decreases and that a larger battery capacity (i.e., a larger $E_{ca}$) results in higher average throughput. However, once $E_{ca}$ reaches a level that is sufficient to store all harvested energy, the throughput cannot be improved by further increasing $E_{ca}$.

Figure 8 compares the proposed POMDP-based scheme with the Myopic scheme. We define three cases of the Myopic scheme in this simulation: (1) "Myopic-original," the scheme described in [5], in which neither optimal sensing energy nor an ATPC is considered; (2) "Myopic-ATPC," the Myopic scheme in which an ATPC is considered; (3) "Myopic-$e_{s,\mathrm{opt}}$ and ATPC," the Myopic scheme in which both the optimal sensing energy $e_{s,\mathrm{opt}}$ and an ATPC are considered. It can be seen that an ATPC and/or an optimal sensing energy algorithm can improve the throughput of the system compared with the "Myopic-original" scheme. In addition, the proposed scheme achieves better performance than all Myopic schemes because it accounts for the throughput of future time frames through the POMDP.

Conclusion
In this paper, a POMDP-based scheme is proposed in order to find an optimal mode decision policy that maximizes the throughput of the CR system. The random harvested energy considered in the proposed scheme is more practical than the constant harvested energy assumed in previous studies. An ATPC scheme and an optimal sensing energy algorithm are proposed for efficient utilization of the energy in the limited-capacity battery of the CU. Simulation results demonstrate that the proposed scheme improves the throughput of the CR system compared with the Myopic schemes. More specifically, the throughput of the CR system depends on the protection level of the PU system, $P_d^*$. With a higher level of $P_d^*$, the opportunity for communication of the CR system is decreased and the corresponding throughput is also decreased. In addition, increases in the harvested energy and in the battery capacity can improve the throughput of the system. However, a higher transmission energy reduces the throughput.