A Self-Learning MAC Protocol for Energy Harvesting and Spectrum Access in Cognitive Radio Sensor Networks



Introduction
Wireless Sensor Networks (WSNs) consist of a large number of distributed sensor nodes, each fitted with various sensors and typically spread over a wide geographical area [1]. WSNs have found a wide range of applications, including periodic monitoring, security, surveillance, and health monitoring and control [2]. Energy consumption is a major issue in the design of WSN protocols, since each WSN node is equipped with a limited power supply because of cost considerations. This limited power supply dramatically reduces the lifetime of a WSN system, especially when WSN nodes keep operating in the active mode.
To prolong the life of WSN nodes, and to avoid the need for continuous replacement or recharging of the batteries in such nodes, many researchers have focused on developing more energy-efficient MAC and routing protocols for WSNs. Such protocols use a variety of techniques and structures that minimize the energy consumption per transmitted packet [3][4][5][6][7][8][9][10].
However, no matter how energy efficient a WSN node is in transmitting data, a finite power supply is bound to run out eventually, especially when high-throughput data transmission is required. This makes the energy problem in WSN systems one of significant importance, and it is why research in wireless energy harvesting techniques has surged recently [11][12][13][14].
In wireless energy harvesting, the WSN nodes are equipped with hardware that converts ambient radio frequency (RF) signals from nearby sources into electricity to recharge the node's battery. WSNs represent an ideal application for energy harvesting, since the WSN nodes can rely on a very small rechargeable battery so long as they take the appropriate steps to harvest energy and maintain a sustainable power supply. Since stray RF power is widely available nowadays, with the almost ubiquitous use of TV and radio broadcasting towers, cellular base stations, and Wi-Fi access points around the globe, finding a source from which to harvest power is easier than ever [11][12][13][14]. Dedicated RF sources can also be used in special cases when needed [15].

Background and Related Work
There has been extensive research on hardware circuit design for energy harvesting devices. Most of the implementations are based on CMOS technology, though other technologies have also been used on occasion. In a harvesting device, the efficiency of energy conversion from the AC input radio signal to the output DC voltage is generally modest unless the received RF power level is high enough (above −15 dBm). For example, the work in [22] can achieve about 70% efficiency at 0 dBm received RF power on the 2.45 GHz spectrum, and the design in [23] can achieve 75% efficiency at −10 dBm received power at 900 MHz. However, the efficiency drops to 40% when the power drops to −17 dBm for the state-of-the-art work in [24] at 868 MHz. This means that the amount of harvested energy is typically small, even with the best possible hardware, especially due to the large propagation loss in wireless environments. For example, the authors in [24] managed to harvest 2 μW at a distance of 27 meters from a 1.78 W transmitter running at 868 MHz. Another study obtained a harvested power of 5.5 μW at 15 meters from a 4 W power source running at 902-928 MHz [25]. In addition, the authors in [26] managed to harvest 60 μW at 4.1 km from a high-powered 960 kW TV tower broadcasting in the 674-680 MHz frequency range.
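The efficiency figures quoted above can be turned into absolute DC output levels with a short calculation. The following is a minimal sketch, assuming that the harvested DC power is simply the conversion efficiency multiplied by the received RF power; the function names are illustrative and do not come from the cited works.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10 ** (p_dbm / 10) / 1000.0


def harvested_dc_watts(p_rx_dbm: float, efficiency: float) -> float:
    """DC output power, assuming output = efficiency * received RF power."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return efficiency * dbm_to_watts(p_rx_dbm)


# Operating points quoted in the text: (received power in dBm, efficiency).
for p_dbm, eta in [(0.0, 0.70), (-10.0, 0.75), (-17.0, 0.40)]:
    microwatts = harvested_dc_watts(p_dbm, eta) * 1e6
    print(f"{p_dbm:6.1f} dBm at {eta:.0%} efficiency -> {microwatts:8.2f} uW")
```

Under this assumption, the 40% operating point at −17 dBm corresponds to roughly 8 μW of DC output, which is consistent with the microwatt-level harvesting figures reported for practical distances.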
It is fair to expect that more advanced technologies for wireless energy harvesting will be available in the near future due to better multiband electronics and better design of highly efficient beam-forming antenna arrays [27][28][29]. However, comparing the levels of power that current state-of-the-art harvesters can achieve (which are in microwatts) with the power required for transmitting data packets (which is in milliwatts), we see a large discrepancy. For example, a typical WSN node transmitter running at 2.4 GHz might send at a power level of −4.4 dBm, at which time its electronics will consume an average power of 4.5 mW [30]. If the transmit power is increased to 0 dBm, the transceiver of a sensor node consumes approximately 87 mW [30]. Another state-of-the-art ultra-low-power 915 MHz transmitter sending at −3 dBm requires a total power of 1.78 mW [31].
This effectively means that sensor nodes relying on RF energy harvesting to achieve perpetual lifetime are limited in their data transmission capabilities, because they need to spend a great amount of time harvesting enough energy to be able to transmit a single packet. This is a big problem for WSNs that rely on energy harvesting, but one that we turn into an opportunity by designing our MAC protocol around this issue, without sacrificing robustness and simplicity and without abandoning the distributed nature of WSNs.
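To make the imbalance concrete, the sketch below estimates how many equal-length epochs of harvesting are needed to pay for one epoch of transmission. It assumes lossless energy storage and constant power levels, and the function name is hypothetical. The example pairs a 1.78 mW transmitter with a harvester delivering 5.5 microwatts, which is the order of magnitude discussed in this section.

```python
import math


def harvest_epochs_per_packet(tx_power_w: float, harvest_power_w: float) -> int:
    """Epochs of harvesting needed to fund one epoch of transmission.

    Assumes equal-length epochs, constant power levels, and lossless storage.
    """
    if tx_power_w <= 0 or harvest_power_w <= 0:
        raise ValueError("power levels must be positive")
    return math.ceil(tx_power_w / harvest_power_w)


# 1.78 mW transmitter fed by a 5.5 uW harvester (illustrative pairing).
print(harvest_epochs_per_packet(1.78e-3, 5.5e-6))  # prints 324
```

Even under these optimistic assumptions the node spends hundreds of epochs harvesting for every epoch it transmits.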
It is noteworthy that many of the scheduling techniques and MAC protocols proposed for WSNs that employ energy harvesting have utilized a centralized approach. For example, the authors in [32] proposed a scheduling protocol that requires sensor nodes to harvest energy from downlink broadcast signals before they use this harvested energy to transmit their independent uplink data packets based on a Time Division Multiple Access (TDMA) framework coordinated by the access point. The access point jointly optimizes the time allocations for the downlink energy harvesting and uplink data transfer, which maximizes the overall system throughput.
A similar TDMA-based system was studied in [33] but with the base station equipped with two antennas. The optimization problem either maximizes the total throughput of the system subject to time constraints or minimizes the total harvesting time and data transmission time of the system subject to data rate constraints.
The MAC protocol featured in [34] uses a contention-based approach based on CSMA/CA for a star-topology sensor network. In this topology, a single controller node collects data from and emits wireless energy to other sensor nodes. The presented protocol uses an energy-adaptive technique and out-of-band RF power harvesting to manage the sensor node's duty cycle based on the node's remaining energy level and to control the node's backoff time based on the node's energy harvesting rate.
The authors in [35] considered a Multiple-Input and Multiple-Output (MIMO) system with imperfect channel state information and proposed a protocol for frame-based transmission that splits each frame into three phases. In the first phase, the centralized base station estimates the downlink channels by utilizing pilots. In the second phase, the base station broadcasts radio energy to all nodes under its control for them to perform harvesting. In the third and final phase, nodes transmit their data packets to the base station. The optimization problem is solved at the base station to optimize the time and energy allocation with the aim of maximizing the minimum rate among all nodes.
The work in [36] used a multiantenna base station in an OFDMA broadband setting to not only communicate but also transfer RF energy to the wireless nodes. The base station optimization algorithm maximizes the system throughput under the constraint of power consumption by the base station and the wireless nodes. The authors in [37] allowed the base station to optimize energy efficiency (i.e., bit/Joule) in the downlink of a multireceiver OFDMA system. The optimization problem is based on the transmit power allocation from the base station and receiver operation to harvest energy.
Though these are valid optimization schemes, they rely on a centralized and powerful controller. This is not a perfect fit for WSNs, since such a controller suffers from intractable computational complexity when the system size grows. In addition, it constitutes a single point of failure.
Decentralized approaches that are based on game theory have also been suggested to reduce the complexity of working out the optimal solution. For example, the authors in [38] proposed a bidding strategy to achieve a Nash equilibrium between the nodes when competing to harvest RF energy. The work in [39] extended the bidding to both data transmission and energy resources. In [40] a repeated coalition formation game was considered, where RF-powered wireless nodes cooperate in packet transmission to improve the long-term payoff for the system. Such techniques suffer from overhead and the need for out-of-band mechanisms to perform bidding. Additionally, the power-limited and energy-constrained WSN nodes are not typically capable of running such sophisticated algorithms.
Other game-theoretic approaches used for optimizing energy harvesting in sensor networks include queuing-theoretic transmission policies [41] and modified back-pressure-based algorithms with energy queues [42].
In [15], the authors proposed a distributed MAC protocol based on CSMA/CA to optimize in-band RF energy harvesting from multiple wireless energy sources in a cognitive radio sensor network (CRSN). The goal was to minimize the impact of interference between the multiple wireless sources and to maximize energy transfer, but the authors did not consider the issue of either minimizing the interference between the secondary nodes and the primary users or minimizing the collisions between the secondary nodes themselves when they find an opportunity to transmit data.
A partially observable Markov decision process is used at the secondary nodes in [43,44] to select between opportunistic spectrum access and energy harvesting to maximize the system throughput in CRSNs. This approach incurs extremely high computational complexity for low-powered secondary sensor nodes, especially when the state space (related to energy queue size and data queue size) is large.
In light of our prior discussion, our proposed MAC protocol is the first distributed protocol for CRSNs that intelligently addresses the disproportionate difference between data transmission power and harvested power and uses that to improve both the network throughput and total harvested energy.

S-LEARN MAC Protocol
3.1. Network Architecture. We consider in this work a CRSN in which sensor nodes are scattered around a particular geographical area. Such nodes collect data from their sensors and report to dedicated data sinks as shown in Figure 1. The nodes opportunistically utilize the spectrum of the primary users (PUs) in the area when such PUs are inactive. When the PUs are active, sensors do not send data in the spectrum band, to avoid harmful interference with the licensed owners. Rather, they use such opportunities to harvest RF energy from the radiated power of the PUs.
The S-LEARN MAC protocol that we propose in this work assumes that sensor nodes rely on harvested wireless energy to operate in such a CRSN. To bring the cost of the hardware further down, we limit the sensor node to accessing only one frequency band at any point in time, though it can switch to other available spectrum bands in the future, if it so chooses. Hence, sensor nodes cannot carry out energy harvesting and opportunistic data transmission at the same time, which reduces cost because it allows the sensor node to reuse the same antenna for both the transmitter and harvester circuitry, as shown in Figure 2. We assume the time axis to be slotted into epochs of equal, fixed duration. In every time epoch, each sensor node needs to decide which spectrum band it wants to access. After a small period of sensing that band, it can determine whether a PU is active or not, and hence whether it should utilize that spectrum band for transmitting a packet or for harvesting energy.
The nodes that employ the S-LEARN protocol operate autonomously and independently in a completely distributed fashion, without the need to communicate their intentions to each other or to a centralized controller. In particular, they adaptively learn about the behavior of the PUs and other sensor nodes in the network by utilizing their own spectrum sensing information. This keeps the system cost extremely low and makes for a very robust network.
Due to the limitations of hardware technology, we also assume that the energy consumed by the sensor node as it transmits a packet in one time epoch, which we denote by $E_\Gamma$, is much higher than the energy that the sensor node can wirelessly harvest during a different time epoch of the same duration. The latter is denoted by $E_h$. In other words, $E_\Gamma \gg E_h$. Hence, the sensor node should spend more of its time attempting to harvest energy rather than transmitting so it can maintain a constant and healthy battery level.
As will be illustrated momentarily, S-LEARN requires sensor nodes to take turns in sending information in empty spectrum opportunities, while other sensor nodes work on harvesting energy from an available PU source. This minimizes possible collisions between the competing sensor nodes and maximizes the total system throughput.

Algorithm Description.
Collisions represent a major challenge for distributed systems, and in power-limited scenarios (such as sensor networks) they are doubly problematic: not only do they limit the throughput of the available spectrum, but they also waste valuable energy (equal to $E_\Gamma$) without resulting in a successful packet transmission. This causes the node to spend excessive time harvesting more energy to reattempt sending the same data, which unduly increases the delay the packets experience.
A trivial solution to this problem comes to mind, which is to use a TDMA protocol run by a centralized controller. In this case, the central controller polls each sensor node in a round-robin fashion to allow it to transmit data. However, as explained previously, solving this issue using a centralized controller is not desirable in the context of WSNs. This is because the central coordinator adds extra unjustified cost to the system, as it requires high computational power, large memory, and an extra transceiver power budget to allow it to communicate with every single sensor in the network. In addition, the central controller represents a single point of failure for the system and results in scalability issues when the number of sensor nodes increases. It also has to employ a special protocol to register and access newcomer sensor nodes and to detect and release defective or energy-starved nodes from its polling schedule.
Central control becomes even more difficult to operate in cognitive radio environments because spectrum is not guaranteed to be available, which means a dedicated control channel might not persist over time. Finally, the central controller can easily become a security vulnerability, as a target for Denial-of-Service (DoS) and jamming attacks. Distributed systems do not suffer from these problems, as they provide a lower-cost solution that is scalable and does not have a single point of failure.
Another simple approach to coordinate among sensor nodes is for each sensor node to broadcast a schedule of its intended transmissions to all other sensor nodes in a beacon at the beginning of each time epoch. However, this approach is also undesirable because it incurs extreme overheads by using the channel capacity to send such control information instead of actual data, thereby reducing the channel throughput. More importantly, in cognitive radio scenarios (where there is more than one spectrum band available) it means each sensor node has to have multiple receivers tuned to the different available spectrum channels to collect the control information from other nodes in the system, which is extremely expensive in terms of hardware costs and power requirements for the nodes. The approach also suffers from the well-known hidden station problem, where sensor nodes might not hear each other's beacons but still affect each other's transmissions at the receiver side. Therefore, autonomous and independent distributed operation of sensor nodes is preferable so long as there is an intelligent mechanism to reduce possible collisions. Our S-LEARN algorithm is such a technique, in which sensor nodes use their observations of the behavior of other nodes in the network to progressively learn about the environment and hence coordinate their transmissions.
In our MAC protocol, the long time it takes for the sensor nodes to harvest energy is utilized to our advantage to create proper schedules at the different nodes. These schedules specify in which bands and in which time slots the nodes are going to transmit their data to avoid colliding with other nodes. Meanwhile, other nodes use that time (when a specific node is transmitting) to harvest RF energy from the PUs in the system. The idea is illustrated by means of a simple example in Figure 3. In this scenario, four sensor nodes are trying to coordinate access to two spectrum bands in a CRSN. For simplicity, we assume that the first band is always empty, while the second band is occupied by an always-active PU.
From Figure 3, notice how the majority of sensor nodes spend most of their time in the second band to harvest RF energy, leaving this band only once during a cycle to attempt transmitting data. Observe that, due to the distributed nature of the system, the nodes might initially collide with each other. However, from these collision incidents, the sensor nodes start to learn and change their transmission behavior until they settle on a configuration that has no collisions (see Figure 3).
To progressively reach that desired configuration, each node executing the S-LEARN MAC protocol maintains a tentative schedule of its intended data transmission and harvesting operations. The schedule is maintained and updated by the sensor node by factoring in the results of its own spectrum sensing operations. The schedule is only used locally and is never sent to other nodes in the network, which avoids extra data overhead that affects throughput.
In addition, the schedule is limited to a length of $K$ time epochs (slots) to reduce memory requirements at the node, where $K$ is an integer called the cycle length. We require that the schedule at all sensor nodes be set to the same $K$ value by design, but we do not require system-wide synchronization. Therefore, the schedules of different nodes do not have to be perfectly aligned (as illustrated in Figure 4).
In Figure 4, the cycle length is set to $K = 4$ time slots (time epochs are referred to as time slots in the context of scheduling since schedules repeat over time). The cycle length is selected in our MAC protocol to allow the sensor nodes to send a maximum of one data packet during one cycle. The remaining slots in the cycle must be used by the sensor node to harvest enough energy for one transmission. Hence, we must maintain the following condition on the cycle length:

$$(K - 1)\, E_h \geq E_\Gamma \tag{1}$$

or

$$K \geq \frac{E_\Gamma}{E_h} + 1. \tag{2}$$

Typically, however, we need $K$ to be slightly larger than this value because the sensor node is not guaranteed to find an available PU from which to harvest energy during all the $(K - 1)$ time slots. This means that, depending on the activity behavior of the PUs in the system, $K$ might need to be larger than that specified by (2) to ensure that the sensor nodes harvest $E_\Gamma$ energy to be able to transmit one packet in that cycle.
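The requirement that the non-transmitting slots of a cycle supply one packet's worth of energy can be turned into a small helper. The sketch below is illustrative only: the function name is hypothetical, and the optional `p_active` argument (the fraction of harvesting slots in which an active PU is actually found) is an assumption introduced here to show why the cycle may need to be longer in practice. It is not a rule taken from the protocol description.

```python
import math


def min_cycle_length(e_tx: float, e_h: float, p_active: float = 1.0) -> int:
    """Smallest integer K such that (K - 1) * p_active * e_h >= e_tx.

    e_tx: energy spent on one packet transmission (one epoch).
    e_h: energy harvested in one epoch when an active PU is found.
    p_active: assumed fraction of harvesting slots that succeed.
    """
    if e_tx <= 0 or e_h <= 0 or not 0.0 < p_active <= 1.0:
        raise ValueError("energies must be positive and 0 < p_active <= 1")
    return math.ceil(e_tx / (p_active * e_h)) + 1


print(min_cycle_length(3.0, 1.0))       # 4 when every harvesting slot succeeds
print(min_cycle_length(3.0, 1.0, 0.5))  # 7 when only half of them succeed
```

With a transmission cost of three harvesting epochs (a hypothetical ratio), the ideal cycle has four slots, while a PU that is active only half of the time pushes the cycle length to seven.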
Remember that we assumed (in our simple example in Figures 3 and 4) that the PU in band 1 is switched off, which means the band is always empty and available for data transmission, while the PU in band 2 is always on, which makes this band a perfect candidate for energy harvesting. If the sensor nodes somehow knew about this fact, they would schedule their transmissions in band 1 and their energy harvesting activities in band 2. This is the first problem our MAC protocol attempts to solve, which is to identify the bands that are the best candidates for energy harvesting. The nodes in S-LEARN identify this information autonomously by observing the PU activation/deactivation behavior within a cycle using two simple counters that they maintain for each time slot $k$ and band $m$, namely $h^{0}_{k,m}$ and $h^{1}_{k,m}$ (to be explained shortly).
The next challenge the sensor nodes encounter is to select one proper time slot within their schedule to transmit their packet, such that they avoid selecting the same time slot as other nodes in the network, which would result in a collision. Figure 3 shows how the sensor nodes in S-LEARN start with random selections, thus sometimes successfully transmitting and sometimes colliding with other nodes. However, due to two more counters in each node, called $\Gamma^{0}_{k,m}$ and $\Gamma^{1}_{k,m}$, the nodes progressively and independently learn about the behavior of other nodes and start selecting time slot/spectrum band combinations that are different from those of others.
Figure 4 illustrates one desirable set of schedules at the different nodes in our example. We can see from the figure that node 1 decided to send its packet in band 1 during slot 3 of the cycle. Notice also that the schedule of each node does not have to be synchronized with other nodes for this technique to work, because the schedules are repetitive and of the same length $K$. For example, node 2 decided to transmit its packet in band 1 in time slot 3 (relative to its own schedule), but this is slot 1 relative to node 1's schedule, and hence no collision occurs between nodes 1 and 2.
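The reason unsynchronized schedules still avoid collisions can be illustrated with modular arithmetic. The sketch below is a minimal illustration with hypothetical function names. It assumes that node 2's cycle starts two slots after node 1's, an offset inferred from the statement that node 2's slot 3 coincides with node 1's slot 1 when the cycle has four slots.

```python
def to_reference_slot(local_slot: int, offset: int, cycle_len: int) -> int:
    """Map a 1-based slot of a node's own schedule to the 1-based slot of a
    reference node's schedule. `offset` is the number of slots by which the
    node's cycle starts after the reference node's cycle."""
    return (local_slot - 1 + offset) % cycle_len + 1


def collide(a: tuple, b: tuple, cycle_len: int) -> bool:
    """a and b are (band, local_slot, offset) triples for two nodes."""
    band_a, slot_a, off_a = a
    band_b, slot_b, off_b = b
    same_slot = (to_reference_slot(slot_a, off_a, cycle_len)
                 == to_reference_slot(slot_b, off_b, cycle_len))
    return band_a == band_b and same_slot


K = 4
node1 = (1, 3, 0)  # band 1, local slot 3, reference timing
node2 = (1, 3, 2)  # band 1, local slot 3, cycle starts 2 slots later (assumed)
print(to_reference_slot(3, 2, K))  # 1
print(collide(node1, node2, K))    # False
```

Because every node uses the same cycle length, the relative offset between any two schedules stays fixed from one cycle to the next, so a collision-free configuration remains collision-free.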

S-LEARN Counters.
To achieve the scheduling policy described above, each sensor node running our proposed MAC protocol must maintain a total of four counters for each time slot $k \in [1, K]$ in the $K$-long schedule and for each available spectrum band $m \in [1, M]$ in the CRSN system. Those counters are as follows.
(i) $h^{1}_{k,m}$: the number of successful energy harvesting incidents the sensor node observed whenever it attempted to harvest energy from band $m$ during time slot $k$ of its own schedule.
(ii) $h^{0}_{k,m}$: similar to $h^{1}_{k,m}$, but counts the number of failed energy harvesting incidents the sensor node observed, because no active PU was found in that band at that time slot.
(iii) $\Gamma^{1}_{k,m}$: the number of successful packet transmissions (for which an ACK was received) that the sensor node observed whenever it attempted to transmit through band $m$ during time slot $k$ of its own schedule.
(iv) $\Gamma^{0}_{k,m}$: similar to $\Gamma^{1}_{k,m}$, but counts the number of collisions the sensor node observed whenever it attempted to transmit through that band and time slot.
Each sensor node builds the tentative schedule for the next cycle with the help of these counters, as will be described shortly, and then abides by the schedule for the whole duration of the cycle. Notice that the sensor node can only access one band at any given time slot due to its limited hardware capabilities (see Figure 2). When accessing a band at a certain time slot, the node senses that band at the beginning of the slot, at which time four possibilities could arise:
(i) The sensor node attempts (based on its schedule) to harvest energy from band $m$ at time slot $k$, and when it senses the band it finds that a PU is already sending RF energy in that band. In this case, the sensor node harvests $E_h$ energy from that band for the duration of the time slot and then increments its $h^{1}_{k,m}$ counter by 1.
(ii) The node attempts to harvest energy, but when it senses band $m$ at the beginning of slot $k$, it discovers that the band is empty. This happens when the PU is inactive during that time slot. In this case, the sensor node increments its harvesting failure counter $h^{0}_{k,m}$ by 1 and refrains from sending information to avoid colliding with other nodes that might have scheduled a transmission in that slot/band pair.
(iii) The sensor node attempts to access the spectrum for data transmission (assuming there are packets awaiting transmission in its queue), and sensing band $m$ at the beginning of slot $k$ finds it empty. In that case, the node sends its data packet and waits for an ACK from the data sink. If the ACK arrives by the end of the time slot, the $\Gamma^{1}_{k,m}$ counter is incremented by 1; otherwise a collision must have happened and the collision counter $\Gamma^{0}_{k,m}$ is incremented by 1 instead.
(iv) The sensor node accesses the band for data transmission but finds it occupied by an active PU, in which case the node takes this opportunity to harvest energy anyway, increments its harvesting success counter $h^{1}_{k,m}$ by 1, and delays the transmission of its packet until the next cycle.
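The four sensing outcomes listed above map naturally onto a small update routine. The following is a minimal sketch for a single slot/band pair. The class, field, and function names are hypothetical, and the returned action labels are added purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlotBandCounters:
    """The four counters kept for one (slot, band) pair."""
    h1: float = 0.0  # successful harvesting incidents
    h0: float = 0.0  # failed harvesting incidents (no active PU found)
    g1: float = 0.0  # successful transmissions (ACK received)
    g0: float = 0.0  # collisions (no ACK)


def update(c: SlotBandCounters, intent: str, pu_active: bool,
           ack_received: bool = False) -> str:
    """Apply one slot's outcome. `intent` is 'harvest' or 'transmit'."""
    if intent not in ("harvest", "transmit"):
        raise ValueError("intent must be 'harvest' or 'transmit'")
    if pu_active:
        # Cases (i) and (iv): an active PU is found, so the node harvests
        # regardless of what it had scheduled for this slot.
        c.h1 += 1
        return "harvested"
    if intent == "harvest":
        # Case (ii): band is empty; stay silent to avoid colliding.
        c.h0 += 1
        return "idle"
    # Case (iii): band is empty and a packet is sent.
    if ack_received:
        c.g1 += 1
        return "sent"
    c.g0 += 1
    return "collided"


counters = SlotBandCounters()
print(update(counters, "transmit", pu_active=True))   # harvested
print(update(counters, "transmit", pu_active=False))  # collided
print(counters)
```

A full node would hold one such record per slot and per band, and would call the update once per time slot for the single band it accessed.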
Note that the four counters are initialized to zero and then continuously updated in the aforementioned manner as the S-LEARN protocol is executed by the sensor nodes. However, at the end of each cycle (and after building the tentative schedule for the next cycle), each of the four counters is multiplied individually by a factor smaller than 1.0, called the aging factor, which is common to all the counters. This aging process helps the sensor nodes to slowly forget old information that might no longer be relevant as the network structure changes over time.
It is worth mentioning that the counters are cumulative and are not reset for as long as the sensor node is running. However, repeated multiplication by the aging factor, which is smaller than unity, drives the contribution of old observations toward zero as time passes.
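The effect of aging can be checked numerically. The sketch below uses a hypothetical aging factor of 0.9 (the value and the names `beta` and `run_cycles` are assumptions made for illustration). It shows that a counter that stops being incremented decays geometrically, while a counter incremented once per cycle settles at a finite value instead of growing without bound.

```python
def run_cycles(n_cycles: int, beta: float, increment_per_cycle: float,
               start: float = 0.0) -> float:
    """Add `increment_per_cycle` to a counter during each cycle, then apply
    the aging factor `beta` at the end of that cycle."""
    if not 0.0 < beta < 1.0:
        raise ValueError("aging factor must lie strictly between 0 and 1")
    value = start
    for _ in range(n_cycles):
        value += increment_per_cycle
        value *= beta
    return value


beta = 0.9  # hypothetical aging factor

# A stale counter: no new observations, so the old value fades away.
print(run_cycles(100, beta, increment_per_cycle=0.0, start=10.0))

# An active counter: one observation per cycle converges to beta / (1 - beta).
print(run_cycles(2000, beta, increment_per_cycle=1.0))
```

The steady-state value follows from the fixed point of the update, value = beta * (value + 1), which gives beta / (1 - beta), or 9 for the factor used here.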
3.4. Building the Schedule. At the beginning of each scheduling cycle, the sensor node builds its own local schedule by inspecting the values of its four counters. The schedule involves two objectives: (a) reducing the chance of colliding with other sensor nodes when the PU is inactive and (b) harvesting RF energy when the PU is active.
Since the sensor node does not have prior knowledge of the spectrum occupancy of the PU in the next cycle, it uses the harvesting success and failure counters to estimate, in a probabilistic way, the chance of finding an active PU in a particular band. Hence, at the beginning of the cycle the sensor node calculates a harvesting score for each band $m \in [1, M]$, as defined in (3). The first term inside the max[] operator in (3) is an estimate of the probability of finding an active PU in band $m$ when compared to other bands in the system, while the second term estimates the chance of successfully harvesting energy (as opposed to failing) from band $m$ when joining that band. Notice that the first term only considers harvesting successes but not failures, while the second term considers both successes and failures of harvesting, albeit in one band only. This combination allows the sensor node to quickly gravitate toward bands with more PU activity, while not completely abandoning other bands with less PU activity. This is useful since the PU activity can change over time. Finally, the last term in (3) is a small probability, which we set to 0.01 to prevent sensor nodes from assigning a zero harvesting score to any band, thus ensuring that sensor nodes will visit band $m$ for harvesting from time to time to see if new PUs (that are suitable for harvesting) have been recently activated in that band.
The remaining parameter in (3) is a constant that can be used to control which term dominates in that equation. The tradeoffs resulting from this (and other parameters in S-LEARN) will be described later in Section 6.
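The description of the harvesting score can be illustrated with a short sketch. The exact expression of (3) is not reproduced here, so the code below is only one plausible form consistent with the prose: a maximum over (a) the band's share of all harvesting successes, (b) the band's own success rate scaled by a constant, and (c) a small floor of 0.01. The placement of the constant `alpha_h`, the summation over time slots, and all names are assumptions.

```python
def harvest_scores(h1, h0, alpha_h=1.0, floor=0.01):
    """Assumed harvesting score per band.

    h1, h0: lists of K rows (slots) with M entries (bands) each, holding the
    harvesting success and failure counters.
    """
    n_bands = len(h1[0])
    succ = [sum(row[m] for row in h1) for m in range(n_bands)]
    fail = [sum(row[m] for row in h0) for m in range(n_bands)]
    total_succ = sum(succ)
    scores = []
    for m in range(n_bands):
        # Term 1 (assumed form): successes in band m relative to all bands.
        share = succ[m] / total_succ if total_succ > 0 else 0.0
        # Term 2 (assumed form): success rate within band m only.
        attempts = succ[m] + fail[m]
        rate = succ[m] / attempts if attempts > 0 else 0.0
        # The floor keeps every band reachable for occasional probing.
        scores.append(max(share, alpha_h * rate, floor))
    return scores


# Two slots, two bands: band 0 never had an active PU, band 1 usually did.
h1 = [[0, 3], [0, 1]]
h0 = [[2, 0], [2, 1]]
print(harvest_scores(h1, h0))  # [0.01, 1.0]
```

In this example the band without PU activity keeps only the floor score, so it is still visited occasionally, while the active band dominates.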
For each time slot in the cycle (except for the one chosen for data transmission), one of the $M$ bands is chosen at random for energy harvesting, where the probability of picking any band $m \in [1, M]$ is derived from the harvesting scores as given in (4).
Once harvesting scheduling is performed, the second task of the scheduler is to find a proper slot/band pair in which to attempt the data packet transmission within that cycle. This is done by calculating a transmit score for each slot $k$ and band $m$ pair in the cycle, utilizing three of our earlier counters, as given in (5). The transmit scores of all slot/band pairs are calculated and compared to find the maximum score. The slot/band pairs whose scores equal that maximum are then collected, and one of them is chosen at random with uniform probability. The selected slot/band pair is no longer used for harvesting, but for data transmission instead. On the other hand, if the sensor node does not have any packets queued for transmission by the start of the cycle, it skips the step of choosing a slot/band pair for transmission and just uses the earlier algorithm to schedule energy harvesting in all time slots.
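The two scheduling steps can be sketched as follows. The code takes the harvesting and transmit scores as inputs rather than computing them, and it assumes that the band-selection probability is each band's harvesting score divided by the sum of all scores. That normalization, the function name, and the data layout are assumptions made for illustration.

```python
import random


def build_schedule(harvest_score, transmit_score, has_packet, rng=random):
    """Return a list of K (action, band) entries with 0-based band indices.

    harvest_score: M positive scores, one per band.
    transmit_score: K rows (slots) of M scores (bands).
    """
    n_slots = len(transmit_score)
    bands = list(range(len(harvest_score)))
    # Step 1: draw a harvesting band for every slot, weighted by score
    # (assumed normalization: score divided by the sum of scores).
    schedule = [("harvest", rng.choices(bands, weights=harvest_score)[0])
                for _ in range(n_slots)]
    if has_packet:
        # Step 2: pick a slot/band pair with the maximum transmit score,
        # breaking ties uniformly at random, and use it for transmission.
        best = max(max(row) for row in transmit_score)
        ties = [(k, m) for k, row in enumerate(transmit_score)
                for m, score in enumerate(row) if score == best]
        k, m = rng.choice(ties)
        schedule[k] = ("transmit", m)
    return schedule


rng = random.Random(0)
scores_t = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(build_schedule([0.01, 1.0], scores_t, has_packet=True, rng=rng))
```

Drawing a harvesting band for all slots and then overwriting the chosen transmission slot yields the same result as skipping that slot during the harvesting draw, since the draws are independent across slots.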
Equation (5) is easy to understand. The slot/band pairs that witnessed more successful packet transmissions and fewer collisions in the past receive a higher transmit score, and hence they tend to be picked by the scheduler for data transmission. Clearly, if a slot/band pair is heavily contested by other nodes, the $\Gamma^{0}_{k,m}$ counter will detect this and the sensor node will penalize the pair with a lower transmit score to avoid transmitting a packet in that location in the future. In addition, the last term in the equation prevents sensor nodes from attempting to transmit in bands and/or slots where PUs were most active in the past, as this would mean missing the opportunity to transmit a packet in the current cycle.
Again, the parameters in (5) are control parameters that can fine-tune the S-LEARN protocol to fit the user's needs. The tradeoffs resulting from such parameters are explained later.

Simulation Framework
We test the performance of the S-LEARN MAC protocol using simulations. We first describe the simulated network setups and then discuss the measures used to evaluate the performance of our technique. The simulation results are presented in the next section.

Simulation Parameters.
We investigate three different setups of cognitive radio sensor networks. These setups illustrate different capabilities of our proposed MAC protocol. In all scenarios, we assume a geographical area with $N = 900$ cognitive sensor nodes contending for access to $M = 5$ underutilized spectrum bands. Each spectrum band is licensed to a PU that is active only part of the time. For all setups, we employ the most common PU activity model used in the literature, which is the one shown in Figure 5 [45]. This model assumes slotted PU environments, where the activity of the PU follows a two-state Discrete-Time Markov Chain (DTMC). The transition probability of the $m$th PU going from the inactive state to the active state at the beginning of a time epoch is given by $p_m$, while the reverse probability is $q_m$. Solving this DTMC with the indicated transition probabilities, we find that the probability of the PU being active at any given time epoch is $p_m / (p_m + q_m)$. We assume that the sensor nodes do not know those transition probabilities; rather, they will try to infer them from the sensing information they obtain about the spectrum bands.
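The two-state PU activity model is straightforward to simulate, and the simulated fraction of active epochs can be compared against the stationary probability of the chain. The sketch below uses hypothetical transition probabilities (0.2 for switching on, 0.6 for switching off) and hypothetical function names.

```python
import random


def stationary_active_probability(p_on: float, p_off: float) -> float:
    """Stationary probability of the active state of a two-state DTMC."""
    return p_on / (p_on + p_off)


def simulate_pu(p_on: float, p_off: float, n_epochs: int,
                rng: random.Random) -> float:
    """Simulate the PU for n_epochs and return the fraction spent active.

    p_on: probability of going from inactive to active at an epoch boundary.
    p_off: probability of going from active to inactive at an epoch boundary.
    """
    active = False
    active_epochs = 0
    for _ in range(n_epochs):
        if active:
            if rng.random() < p_off:
                active = False
        elif rng.random() < p_on:
            active = True
        active_epochs += active
    return active_epochs / n_epochs


rng = random.Random(1)
print(stationary_active_probability(0.2, 0.6))  # analytic value, about 0.25
print(simulate_pu(0.2, 0.6, 200_000, rng))      # empirical estimate
```

A sensor node that counts harvesting successes and failures in a band is effectively forming the same kind of empirical estimate, restricted to the slots in which it visits that band.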
Packets are assumed to be fixed in size, and each time epoch is long enough to perform sensing, send one packet, and then receive the corresponding ACK. Packets are generated (arrive) at a sensor node according to a random Bernoulli process, and in each of the network setups (explained below), we vary the mean packet arrival rate $\lambda_n$ at each sensor node $n \in [1, N]$, thus increasing the load on the system. The main simulation parameters are summarized in Table 1. Each simulation run is executed for a total of 200,000 time epochs, which is more than sufficient for the system to reach steady state.
To provide a reference for comparison, and to show how progressive learning in S-LEARN can benefit the CRSN system, we also simulate two more MAC protocols that fit with our system model and hardware capabilities: the first is a purely random harvest and transmit technique, and the other is a modified-CSMA protocol [46].
A sensor node in the first technique (the purely random algorithm) picks, in every time epoch, a band at random with equal probability out of the $M$ permissible bands and then senses the band for the first part of the time epoch. If the band turns out to be occupied by an active PU, the sensor node starts harvesting energy from that PU for the rest of the time epoch. On the other hand, if the node senses a spectrum band that is unoccupied by a PU, the sensor node attempts to transmit its data packet in that band immediately and then waits for an ACK in the same time epoch. Whether the packet went through successfully or not, the sensor node does not attempt to send a new packet (or retransmit a failed packet) until it has managed to harvest an amount of energy from PUs equal to the transmission energy $E_\Gamma$, even if it finds another empty transmission opportunity. Rather, the node keeps harvesting energy until its total harvested energy is equal to the transmission energy, and only then does the node repeat its operations.
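The per-epoch behavior of the purely random baseline can be sketched as a small state machine. The class and field names are hypothetical, and two details are assumptions made for illustration: the node starts with no stored energy, and a transmission deducts exactly one packet's worth of energy from the store.

```python
import random
from dataclasses import dataclass


@dataclass
class RandomNode:
    e_tx: float          # energy needed to transmit one packet
    e_h: float           # energy harvested in one epoch from an active PU
    stored: float = 0.0  # energy harvested since the last transmission

    def step(self, pu_active_per_band, has_packet, rng):
        """Run one time epoch. Returns (action, band) with a 0-based band."""
        band = rng.randrange(len(pu_active_per_band))  # uniform band choice
        if pu_active_per_band[band]:
            self.stored += self.e_h                    # harvest from the PU
            return ("harvest", band)
        if has_packet and self.stored >= self.e_tx:
            self.stored -= self.e_tx                   # spend one packet's energy
            return ("transmit", band)
        return ("idle", band)                          # empty band, not enough energy


rng = random.Random(0)
node = RandomNode(e_tx=2.0, e_h=1.0)
for pu_state in ([True], [False], [True], [False]):
    print(node.step(pu_state, has_packet=True, rng=rng))
```

With a single band, the sequence above harvests, idles (one unit of stored energy is not enough), harvests again, and only then transmits.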
Collisions are possible between nodes in the random harvest and transmit technique, but randomness in transmission helps in reducing such collisions.The randomness occurs because of three factors: the first factor is the randomness in picking the spectrum band, and the second factor is the time it takes a sensor node to collect  Γ energy, which is random as it varies depending on the random PU activity in each picked channel.The third contribution to randomness is obviously due to finding the empty slot, which occurs at different times for different nodes depending on the picked spectrum band and the inactivity of the PU at that instant (which is in itself is random).Hence collisions, though large, will not result in a catastrophic collapse of the system, especially that it takes much more time to charge the node (by wireless energy harvesting) than to drain it (by transmitting a data packet).
The second algorithm we implement is the slotted CSMA protocol modified for energy harvesting from the description in [46].Again, the sensor node picks every time epoch a band purely randomly with equal probability out of the  bands and senses that band.If the band is occupied by an active PU, the sensor harvests energy from that PU.Conversely, if the node senses a spectrum band that is unoccupied by a PU, and the node has enough energy in its battery, it attempts to transmit its data packet in that band assuming that the CSMA backoff counter reached zero.Otherwise the backoff counter is decremented and the sensor node tires again in the next empty transmission opportunity.
The backoff counter is part of the well-known binary exponential backoff technique (used in the CSMA protocol in IEEE 802.11 [47] and IEEE 802.15.4 [48]).It aims to reduce possible collisions when the load on the system is high.This technique requires each sensor node to defer its transmission by a total of  empty transmission opportunities, where  is a random integer drawn from a uniform distribution over the interval [0, 2  − 1], which is known as the contention window.The integer value of  is dependent on the number of consecutive collisions  the node suffered in sending the packet.The value of  increases as  increases but is maintained between the minimum and maximum values of  min and  max , respectively, as follows: The values we use for  min and  max are shown in Table 1.We will see that reducing collisions using this CSMA backoff technique is beneficial in CRSNs, since it saves wasted energy due to repeated retransmissions, which in turn also saves on the overhead of harvesting wireless energy to make up for such waste.
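A minimal sketch of binary exponential backoff is given below to make the mechanism concrete. It assumes the common rule in which the backoff exponent grows by one per consecutive collision and is clamped between a minimum and a maximum; the exact clamping expression used in this work is not reproduced here. The names `be_min`, `be_max`, and `collisions`, as well as the numeric values in the usage note, are illustrative placeholders rather than the entries of Table 1.

```python
import random


def backoff_exponent(collisions: int, be_min: int, be_max: int) -> int:
    """Assumed rule: grow the exponent with each consecutive collision, then clamp."""
    return min(be_min + collisions, be_max)


def draw_backoff(collisions: int, be_min: int, be_max: int,
                 rng: random.Random) -> int:
    """Draw the number of empty transmission opportunities to defer."""
    be = backoff_exponent(collisions, be_min, be_max)
    # Uniform integer over the contention window [0, 2**be - 1], inclusive.
    return rng.randint(0, 2 ** be - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for c in range(4):
        print(c, backoff_exponent(c, 3, 5), draw_backoff(c, 3, 5, rng))
```

With the illustrative values `be_min = 3` and `be_max = 5`, the contention window grows from 8 to 32 slots after two consecutive collisions and then stops growing.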

Network Setups.
We will test our MAC protocol under three different network setups to evaluate different aspects of the progressive learning technique. These network setups are as follows.

Setup 1: Stable Cognitive Radio Sensor Network. In this first test, a total of  = 900 sensor nodes operate for the whole duration of the simulation within  = 5 spectrum bands occupied by five different PUs. The conditions of the network are stable, as the different PUs are activated and deactivated randomly, each within its own band, according to the probabilities   = {0.1, 0.3, 0.5, 0.7, 0.9}. Each sensor node has a queue to which packets arrive, with an average arrival rate of   packets per time epoch (the arrival rate is identical for all nodes, i.e.,   = , ∀1 ≤  ≤ ). We vary the packet arrival rate  to observe how our MAC protocol handles increasing load on the system.

Setup 2: New Sensor Nodes Joining the System. This experiment is similar to the first network setup, except that at the start of the simulation only 750 sensor nodes are active in the 5 spectrum bands. In the middle of the simulation, another 150 nodes are added to the system for a total of 900 sensor nodes. This allows us to investigate the robustness of our technique, as the old and new nodes must progressively learn about the new topology of the network and adapt to this change. We maintain the same PU behavior as in the first experiment.

Setup 3: Sudden Change in PU Behavior. As the PUs activate and deactivate, the sensor nodes in our S-LEARN MAC protocol learn the behavior of these PUs to better utilize their spectrum bands. The final test looks at the possibility of the PUs changing the behavior to which the nodes have become habituated. We observe how the sensor nodes detect the change in the PU behavior and readjust their conduct to suit the new actions of the PUs. We design this network setup as a variant of the first configuration, where the PUs are still activated and deactivated randomly. However, only during the first half of the simulation are the activation probabilities for the PUs given by   = {0.1, 0.3, 0.5, 0.7, 0.9}. In the middle of the simulation, the most active PU becomes the least active and the least active PU becomes the most active. In other words, the activation probabilities for the PUs in the second half of the simulation suddenly change to   = {0.9, 0.3, 0.5, 0.7, 0.1}. Notice that the sensor nodes have to learn about this new situation quickly because they rely on finding the most active PU to maximize their harvested energy, and this most active PU changes at time epoch  = /2.

Performance Measures.
The following performance measures are used to evaluate the performance of our proposed MAC protocol and compare it to both the purely random harvest and transmit technique and the modified-CSMA algorithm.
Successfully Transmitted Packets (Throughput). We count the number of packets transmitted successfully from the  sensor nodes per time epoch (i.e., with an ACK received), and we denote this by s(t). Notice that this number is expected to be smaller than , because there are only  spectrum bands available for the  sensor nodes to transmit their packets at any time epoch. Considering also that PUs will be active during a sizable portion of the time, thus preventing sensor nodes from using such bands, the number will be even smaller. In addition, some of the transmission opportunities will be missed and some will be occupied by collisions, further reducing the s(t) value. In the results, we sometimes evaluate the average number of successfully transmitted packets per epoch over all time epochs, which is given by S = ∑  =1 s(t)/.
Colliding Packets. We also count the number of packets transmitted by the  sensor nodes that have collided per time epoch (i.e., did not receive an ACK), and we denote it by (). Notice that the actual number of collision incidents per time epoch is smaller than () because two or more colliding packets are required to produce a single collision. However, the value of () is more relevant to our discussion here because it captures both the missed spectrum opportunity and the energy wasted by sensor nodes. Since energy is hard to come by, a packet that suffers a collision represents not only a waste of spectrum but also a waste of energy that must be harvested over a very long time before the node can transmit a successful packet again. The average number of colliding packets per epoch over all time is  = ∑  =1 ()/.
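The distinction between colliding packets and collision incidents can be illustrated with a short counting sketch. The function name `classify_transmissions` and its input format (one band index per node that transmitted in a free band during the epoch) are assumptions made here for illustration.

```python
from collections import Counter
from typing import List, Tuple


def classify_transmissions(tx_bands: List[int]) -> Tuple[int, int, int]:
    """Return (successful packets, colliding packets, collision incidents)
    for a single time epoch, given the band used by each transmitting node."""
    per_band = Counter(tx_bands)
    # A packet succeeds only if it is alone in its band.
    successes = sum(1 for n in per_band.values() if n == 1)
    # Every packet in a band shared by two or more nodes is a colliding packet.
    colliding_packets = sum(n for n in per_band.values() if n >= 2)
    # Each such band contributes exactly one collision incident.
    collision_incidents = sum(1 for n in per_band.values() if n >= 2)
    return successes, colliding_packets, collision_incidents


if __name__ == "__main__":
    # Three nodes in band 0, one in band 1, two in band 2.
    print(classify_transmissions([0, 0, 0, 1, 2, 2]))  # (1, 5, 2)
```

In this example five packets (and the energy spent on them) are lost in only two collision incidents, which is why the packet count is the more informative measure of wasted energy.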
Successful Harvesting Events. We denote by h(t) the count of sensor nodes that each managed to successfully harvest  h energy from an active PU during a time epoch . The average of h(t) over all time epochs is  = ∑  =1 h(t)/. Ideally, h(t) is as close as possible to , because that means sensor nodes are mostly sitting in a PU-occupied band to harvest energy. However, since PUs are not always active, and since nodes can be unlucky in guessing where to find an active PU, the number will drop below .
Node Energy. Throughout the simulation, we also track the energy level stored in the battery of each sensor node, which we denote by   (). The average energy for all sensor nodes per epoch is () = ∑  =1   ()/, and the average energy for all sensor nodes over all time epochs is  = ∑  =1 ()/. Higher values of   () are desirable in a CRSN because they allow nodes to adapt to changes in the network configuration, such as a change in PU activity, without quickly running out of energy to transmit data. Nevertheless, to make the simulation more realistic, we limit the capacity of the battery to a maximum value of  = 10 Γ . This represents a very small and inexpensive battery that can only transmit 10 packets before needing a recharge.
Queue Size at Sensor Nodes. The queue size at each sensor node is also recorded at every time epoch as the simulation progresses and is denoted by   (). The queue size is an important parameter because it is directly proportional to both the memory needed by the sensor node (which adds to system cost) and the delay that packets experience as they traverse the sensor node. The average queue size over all sensor nodes per time epoch is evaluated as () = ∑  =1   ()/, and the average queue size for all sensor nodes over all time is denoted by  = ∑  =1 ()/. Smaller values of  are, of course, desirable to reduce system cost and to provide better performance to end users. We do not limit the queue size in our simulations, and packets are never discarded.
Learning Time. The robustness of our MAC protocol to changes in the network configuration when new sensor nodes join the system is measured by the learning time, denoted by   . The learning time measures the time interval needed for these newcomers (and incumbent nodes) to become acclimated to the new system configuration, thus reaching a new steady-state that is different from the old steady-state. We define this interval as the number of consecutive time epochs required for the number of successfully transmitted packets s(t), t > /2, to reach the new average value of successful packets S new = ∑  =/2 s(t) × 2/ after the system change happens at t = /2.
Adjustment Time. The adjustment time, represented by   , is used when the activity of any of the PUs in the system changes. It measures the time it takes nodes in our MAC protocol to figure out the new PU activity behavior and adjust their harvesting/transmission schedules to suit that new configuration. We set this interval to be equal to the number of consecutive time epochs required for the number of successful harvesting events h(t), t > /2, to reach the new average value of successful harvesting  new = ∑  =/2 h(t) × 2/ after the PU behavior changes at t = /2.
Small learning and adjustment times are desirable features of our S-LEARN technique, and we show in the results that we can control how small these values can get as part of a tradeoff between overall system throughput and quick learning of the system configuration.

Results and Discussion
Figure 6 shows the number of successfully transmitted packets s(t) by sensor nodes at each time epoch. The figure clearly shows the contribution of the proposed MAC algorithm, which provides performance almost double that of the random and modified-CSMA techniques. This is achieved by the sensor nodes progressively learning about the behavior of the PUs in the system and of each other, and dynamically adjusting their internal schedules to minimize collisions amongst themselves (see Figure 7) and also increase the harvested energy (see Figure 8). Both effects contribute to the increase in throughput, since collision reduction increases the available spectrum for successful data transmission and reduces the wasted energy at the node. Saving such energy, and being able to find where the active PUs are located and thus harvest even more energy, means that each sensor node always has a healthy amount of energy stored in its battery (see Figure 9), available to send any packet when an opportunity is presented, without the need to delay the packet until energy is harvested at a very slow pace. This, of course, reduces the queue size (and packet delay) at each node, as is evident from Figure 10. Being able to transmit the packet with a minimum number of collisions, and hence a minimum number of retransmissions, also contributes to the lower packet delay and the smaller queue size seen in Figure 10.
The behavior of S-LEARN under various loads on the system is similar. We show a comparison between the number of successful packets for each of the three algorithms versus load in Figure 11. The superiority of the S-LEARN MAC protocol compared to the other techniques is remarkable. Of course, as the load on the system increases beyond a certain point, the amount of energy needed by the sensor nodes   becomes so large that the nodes can no longer harvest enough energy to meet such demand. In this case the throughput (number of successful packets) of all algorithms (including ours) levels off and the amount of stored energy in the node battery starts diminishing (see Figure 12). Clearly, though, our MAC protocol provides much higher throughput even at higher loads compared to the random harvest and transmit and the modified-CSMA protocols. In addition, S-LEARN retains energy in the battery to a better extent compared to these two techniques, of which the random harvest and transmit technique is the worst. That is why its packets start accumulating heavily in the queue and waiting for a long time, as is evident from Figure 13.

Newcomer Nodes.
In the second scenario, we investigate what happens when the CRSN is supplemented by extra sensor nodes while in operation. We want to see whether the new nodes can be seamlessly integrated into the system or whether they cause undesirable instability. Hence, we run the network using only 750 sensor nodes, but at time t = /2, another 150 nodes are activated. This brings the total number of sensor nodes to 900, which is the same as in our first experiment. Hence, the results of the earlier section provide a reference for comparison.
Figure 14 shows the number of packets successfully transmitted by nodes versus time. It is evident that the random and modified-CSMA techniques cannot adapt properly to this 20% increase in load on the system. On the other hand, our S-LEARN MAC protocol progressively learns about the new situation and gradually adjusts the schedules at the different sensor nodes to accommodate the newcomer population, thus increasing the overall system throughput, albeit at the expense of some increase in collisions, which is natural at such a high level of load (see Figure 15).
In Figure 16 we show the learning time   the S-LEARN technique requires for different loads on the system (i.e., different packet arrival rates). To find   we count the number of consecutive epochs it takes for the system to move from the old S value (e.g., S = 1.38 in Figure 14) to the new S value (e.g., S = 1.60 in Figure 14). We notice that as the load on the system increases, so does the learning time, which is expected. This continues until the load becomes high enough that the system is saturated, which means the throughput of the system can no longer increase even with more offered load (see also Figure 11). At that point, the learning process cannot do anything to improve the system throughput and its interval starts diminishing.
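The counting procedure just described can be sketched in a few lines of Python. This is a hedged illustration rather than the script used to produce Figure 16: the function name `learning_time`, the optional trailing moving-average `window`, and the assumption that throughput rises after the change (so the target is reached from below) are all choices made here. The input `s` is assumed to be a non-empty list holding the per-epoch count of successful packets, with the system change occurring at the midpoint.

```python
from typing import List, Optional


def learning_time(s: List[float], window: int = 1) -> Optional[int]:
    """Epochs after the midpoint until the (smoothed) throughput first
    reaches its average over the second half of the run."""
    total = len(s)
    half = total // 2
    # New steady-state level: average of s(t) over the second half.
    s_new = sum(s[half:]) / (total - half)
    for t in range(half, total):
        # Trailing moving average to suppress epoch-to-epoch noise.
        start = max(0, t - window + 1)
        smoothed = sum(s[start:t + 1]) / (t + 1 - start)
        if smoothed >= s_new:
            return t - half
    return None  # the target level was never reached


if __name__ == "__main__":
    trace = [1.0] * 10 + [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.0, 2.0, 2.0, 2.0]
    print(learning_time(trace))  # 4
```

The same routine applied to the harvesting count h(t) would give an analogous estimate of the adjustment time, provided the series is first oriented so that the new level is approached from below.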

PU Disruption.
In the final test we disturb the process of energy harvesting at the sensor nodes by abruptly changing the activity probability of the most active and the least active PUs in the system in the middle of the simulation. Instead of  5 = 0.9, the most active PU switches to  5 = 0.1, and the least active PU, which used to have  1 = 0.1, continues the rest of the simulation with  1 = 0.9. Neither the random harvest and transmit technique nor the modified-CSMA algorithm is affected by this change, because they make no assumptions about the location of the PU; rather, they pick a band uniformly at random and, if a PU is active in that band, harvest its energy. However, this behavior is not very useful because it can only harvest a small amount of wireless energy, as seen in Figure 17. In contrast, our S-LEARN technique not only can harvest more energy by predicting where the most active PUs are, but also can adjust once it notices that it is not harvesting enough energy, and recalibrate its harvesting scores to move to the spectrum band that corresponds to the most active PU (see Figure 17).
During the adjustment time, of course, there is a slight drop in the harvested energy and also a slight drop in the number of successful packets (see Figure 18), which is affected because of lack of sufficient energy.It is worth mentioning here that a larger battery reservoir (which we did not use) can help in such cases because it can accommodate continuing data transmission while adjusting the harvesting process to the new system conditions, albeit with the disadvantage of adding extra cost to the system.
It is also worth noting that the adjustment time   does not depend on the system load, since the equations governing the harvesting process (see (3) and (4)) are decoupled from the data transmission process (system load). This is evident in Figure 19, which shows the adjustment time versus average packet arrival rate.

Tradeoffs
The control parameters in our S-LEARN technique can be quite useful for tuning the behavior of the CRSN and steering it in the direction most desirable for the network designer. For example, the network can be optimized for overall throughput and minimum delay or, on the other hand, it can be optimized to react quickly to changes in the network setup, such as when new nodes are introduced to the system or when PU activity changes.
To show some examples of how the different parameters can control the performance of the S-LEARN protocol, we rerun our simulations with different values of the control parameters when  = 2.344 × 10⁻³ packets/epoch. For example, Figures 20-23 show the behavior of the network throughput, collisions, learning time, and adjustment time versus different possible values of the aging parameter . All other parameters are kept at the values in Table 1.
It is easy to see that a larger  value (closer to the maximum of 1.0) results in higher throughput and a smaller number of collisions. This is because the sensor nodes retain the knowledge they learn from the network and use it to produce better schedules that avoid colliding with each other. However, when the sensor nodes cannot forget old information, they cannot adapt quickly to changes in the network configuration, as is evident from the progressively longer learning time and adjustment time required when  is closer to 1.0. The designer can balance both requirements by choosing a midpoint value for the  parameter or, if desired, can favor higher throughput over quick response, in which case a higher  value is more fitting. Alternatively, the designer might prefer better response to changes by reducing the value of the aging factor , albeit at the cost of losing some throughput.
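The general reason an aging factor close to 1.0 slows adaptation can be seen with a generic exponentially aged estimate. The sketch below is not the S-LEARN counter update, which is defined by the protocol equations elsewhere in the paper; it only assumes a textbook rule of the form estimate = alpha × estimate + (1 − alpha) × observation, with the hypothetical name `alpha` standing in for an aging factor, and counts how many updates are needed to move halfway from an old level to a new one.

```python
from typing import Optional


def steps_to_adapt(alpha: float, old: float = 0.0, new: float = 1.0,
                   threshold: float = 0.5,
                   max_steps: int = 100_000) -> Optional[int]:
    """Number of updates until an exponentially aged estimate has covered
    `threshold` of the distance from the old level to the new level."""
    estimate = old
    target = old + threshold * (new - old)
    for n in range(1, max_steps + 1):
        # Generic aging rule (an assumption, not the protocol's own update).
        estimate = alpha * estimate + (1.0 - alpha) * new
        if estimate >= target:
            return n
    return None


if __name__ == "__main__":
    for alpha in (0.5, 0.9, 0.99):
        print(alpha, steps_to_adapt(alpha))
```

Under this generic rule the number of updates grows rapidly as `alpha` approaches 1.0, which mirrors the qualitative trend of longer learning and adjustment times reported for larger aging values.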
The transmit score parameter   balances a tradeoff between throughput and learning time. Higher values of   force the sensor nodes to quickly leave contended bands and search for different ones, which reduces their aggressiveness in obtaining spectrum, thus reducing throughput (as shown in Figure 24). However, it also leads newcomer nodes to look for other empty bands instead of aggressively disturbing the incumbent nodes. This results in a smoother joining process and a shorter learning time (see Figure 25).
Similar observations can be made about the second transmit parameter   (see Figures 26 and 27), though for a different reason.Higher values of   force sensor nodes to leave bands that have been used successfully in the past for harvesting, which allows the sensor nodes to avoid spectrum bands that are commonly used by active PUs.However, this also means that some transmission opportunities in empty bands are missed (remember that no PU is active for 100% of the time), which means that throughput will suffer.Conversely, newcomers (who are starved of bandwidth) will have an easier time finding empty slots (that incumbents avoided) and hence will have a shorter learning time.
Finally, the harvesting parameter  ℎ has an effect on the adjustment time and the amount of energy harvesting possible. When  ℎ is small, that is, when (1− ℎ ) is large, the second parameter in (3) becomes more dominant, allowing a sensor node to quickly detect changes in the PU activity probability and move sooner to a more active PU for harvesting, which translates into a small adjustment time. This is clearly evident from Figure 28. However, a higher  ℎ value allows a node to stick more closely to the spectrum band where it has harvested more energy compared to other spectrum bands, which, when PU activity is stable, allows the node to harvest much more energy in the long run (as shown in Figure 29). Hence, for stable networks we recommend a value of  ℎ = 1.0 to increase the level of energy harvesting, but for networks where PUs can change their behavior frequently, we recommend a value of  ℎ ≤ 0.7 to keep the adjustment time small.

Conclusions
A novel MAC protocol was introduced for CRSNs. Each node in this protocol develops a schedule to coordinate its energy harvesting and data transmission activities. The schedule is built based on the perceived environment of PUs and other sensor nodes. This is achieved by simply maintaining four counters and using them to infer the surrounding conditions. We showed, via simulation, that the performance enhancements are remarkable, reaching almost twice the throughput of the random harvest/transmit and modified-CSMA techniques.
In addition, the S-LEARN protocol is quite robust and can adapt to new sensor nodes joining the network or to PUs changing their activity behavior. Moreover, this robustness and performance are controllable through a set of parameters that can be changed to suit the objectives of the end users of the system.

Figure 1: Cognitive sensor nodes utilize the spectrum of PUs to either send data to sink nodes or harvest energy from PUs.

Figure 3: Four cognitive sensor nodes are executing the S-LEARN MAC protocol to coordinate access to two spectrum bands in a CRSN.

Figure 4: Tentative schedules maintained by the four sensor nodes during cycle  in the example of Figure 3.

Figure 5: Activity state diagram for the PU in band .

Stable Network.
We use the first network setup to observe how our S-LEARN algorithm behaves compared to the random harvest and transmit technique and the modified slotted CSMA technique. Figures 6-10 show the results for the stable CRSN presented earlier. The figures shown are for the case where the average packet arrival rate at each sensor node is  = 1.95 × 10⁻³ packets/epoch (or 0.5 packets per scheduling cycle). The figures are drawn with the aid of a moving average.
[Figure legend: s(t), moving average. S-LEARN (top), S = 1.59; Modified-CSMA (middle), S = 0.86; Random H&T (bottom), S = 0.81.]

Figure 6: Number of successfully transmitted packets by sensor nodes at each time epoch in the first network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 7: Number of colliding packets from sensor nodes at each time epoch in the first network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 8: Number of sensor nodes successfully harvesting energy at each time epoch in the first network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 9: Energy stored in each node averaged over all nodes at each time epoch in the first network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 10: Queue length at each node averaged over all nodes during each time epoch in the first network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 11: Number of successfully transmitted packets by sensor nodes versus arrival rate (system load) in the first network setup.

Figure 13: Queue length at each node averaged over all nodes versus arrival rate (system load) in the first network setup.

Figure 14: Number of successfully transmitted packets by sensor nodes at each time epoch in the second network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 15: Number of colliding packets from sensor nodes at each time epoch in the second network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 17: Number of sensor nodes successfully harvesting energy at each time epoch in the third network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 18: Number of successfully transmitted packets by sensor nodes at each time epoch in the third network setup when  = 1.95 × 10⁻³ packets/epoch.

Figure 20: Average number of successfully transmitted packets by sensor nodes versus different values of the aging parameter .

Figure 21: Average number of colliding packets from sensor nodes versus different values of the aging parameter .