Balancing Long Lifetime and Satisfying Fairness in WBAN Using a Constrained Markov Decision Process

As an important part of the Internet of Things (IoT) and a special case of device-to-device (D2D) communication, the wireless body area network (WBAN) is attracting growing attention. Because a WBAN is a body-centered network, the energy of its sensor nodes is strictly constrained: the nodes are powered by batteries of limited capacity. In each data collection, only one sensor node is scheduled to transmit its measurements directly to the access point (AP) through the fading channel. We formulate the problem of dynamically choosing which sensor should communicate with the AP, so as to maximize network lifetime under a fairness constraint, as a constrained Markov decision process (CMDP). The optimal lifetime and optimal policy are obtained from the Bellman equation in dynamic programming. The proposed algorithm defines the limiting performance in WBAN lifetime under different degrees of fairness constraints. Because acquiring global channel state information (CSI) incurs a large implementation overhead, we also put forward a distributed scheduling algorithm that uses only local CSI, which reduces network overhead and simplifies the algorithm. Simulations demonstrate that this scheduling algorithm allocates time slots reasonably under different channel conditions and balances network lifetime against fairness.


Introduction
With the rapid development of wireless communication technology and wireless sensor networks (WSNs), the emerging wireless body area network (WBAN) provides great opportunities in real-time healthcare monitoring, fitness, entertainment, and consumer electronics applications without restricting the activities of users [1]. A WBAN is a dynamic network with sensor nodes in, on, or around the body that continuously monitor physiological parameters and are capable of real-time processing and data communication. Device-to-device (D2D) communication is an actively studied technology that allows direct communication between closely located devices using the licensed band [2]. Networks can benefit from the improved reliability, robustness, and coverage provided by D2D communications [3]. However, D2D communication faces some challenges. Power efficiency is one of the difficulties, because proximate devices must use very low transmission power for reliable communication [4]. Resource allocation is another research hotspot in D2D communication. In [5], a fair resource allocation problem for D2D communications was studied in orthogonal frequency division multiple access- (OFDMA-) based wireless cellular networks. In [6], a genetic algorithm (GA) with a frequency hopping technique was proposed to optimally select the number of frequency channels required in the system and then allocate these frequency channels to the UE clusters for D2D communication. The sensor nodes and the AP in the dynamic network of a WBAN can be regarded as proximate devices in D2D communication. Borrowing the methods developed for D2D communication is thus a promising way to improve the performance of WBAN.
In the literature, several methods have been proposed to improve the performance of WBAN in the aspects of MAC (media access control) protocol design [7], data fusion [8], security [9], and so forth. However, WBAN is still in an early stage of development, and some challenges must be overcome before it can be widely applied [10]. For instance, limited battery energy makes network lifetime a pressing concern, and lifetime extension in WBAN has attracted increasing interest from researchers. Transmission scheduling algorithms have been explored for maximizing the lifetime of WSN in some publications, for example, [11,12]. In [13], a general formula for lifetime in WSN was proposed, which demonstrated that channel state information (CSI) and residual energy information (REI) are the major parameters in lifetime maximization. CSI is also an essential factor in closed-loop wireless communication systems [14]. A dynamic transmission scheduling scheme dubbed dynamic protocol for lifetime maximization (DPLM) was proposed in [15] and has been demonstrated to be asymptotically optimal in network lifetime. In [16], the problem of lifetime maximization was formulated as a stochastic shortest path Markov decision process. An iterative algorithm was developed in [17] to find a Pareto-optimal solution for maximizing the lifetime of WSN.
These transmission scheduling schemes consider only lifetime maximization in WSN under homogeneous traffic requirements. They do not work well when sensor nodes require different data transmission rates. In WBAN, different types of physiological parameters such as body temperature, blood pressure, electrocardiograph (ECG), and electroencephalograph (EEG) are monitored, which requires different data rates for different sensor nodes. The transmission scheduling schemes mentioned above can cause so-called unfairness in selection: sensor nodes whose channel conditions remain unfavorable for a long time are consequently not selected.
The concept of fairness has been intensively investigated for resource allocation in wireless networks. In [15], a pure opportunistic transmission scheduling scheme was proposed that carries out transmission in the best channel condition. This scheme is throughput optimal, but it can cause unfair resource allocation. Several algorithms have been presented to improve system performance under fairness constraints. The authors in [18] take network throughput and fairness of user equipment into account by performing interference management. An opportunistic fair scheduling scheme for CDMA (code division multiple access) networks was developed in [19], which relates the average transmission of users to the fair weights they achieve. In [20], an optimization framework was proposed to balance the performance of lifetime and fairness. To allocate time slots based on the demands of sensor nodes, a utility-based allocation method was adopted in [21]. In [22], a fair resource allocation approach was proposed for D2D communication in wireless cellular networks.
Because battery energy in WBAN is limited, increasing network lifetime and maintaining fairness conflict with each other to some extent. Balancing network lifetime and fairness performance is therefore important in WBAN. In the literature, however, very few research efforts have addressed the balance between lifetime and fairness. To the best of our knowledge, there is no widely accepted unified framework that can be used to accurately evaluate performance under different tradeoffs between lifetime and fairness. To address this problem, this paper proposes a novel centralized transmission scheduling scheme that utilizes a constrained Markov decision process (CMDP) and a distributed scheduling scheme that uses only local CSI.

The remainder of this paper is organized as follows. In Section 2, the model of WBAN and the formulation of lifetime are described. In Section 3, an optimization framework using CMDP is presented. The proposed fair weights transmission scheduling scheme is presented in Section 4. Finally, we conclude the paper in Section 5.

WBAN Model
2.1. WBAN Model. We consider a WBAN that consists of an access point (AP) and $N$ sensors, each with initial energy $E_{\mathrm{in}}$. We adopt a star topology. Each sensor transmits its own equal-sized data packets directly to the AP through a common channel, as shown in Figure 1. We assume a block-fading channel that remains constant within each transmission slot and varies independently between slots. During communication in a WBAN, signal strength can fade because of reflection, diffraction, energy absorption, shadowing by the body, and body posture. A theoretical channel model would have to trace back to the fundamental theory of electromagnetic propagation and would require precise modeling of a specific situation, which is too complex and beyond the scope of this work.

Each sensor node measures a certain physiological parameter and transmits the corresponding data packets directly to the AP through the fading channel. The received signal can be expressed as

$$y_i(t) = h_i x_i(t) + n(t), \qquad (1)$$

where $x_i(t)$ is the transmitted signal, $h_i$ is the channel fading of sensor $i$, and $n(t)$ is additive white Gaussian noise with power spectral density $N_0/2$, identical for all sensors. Assume a block-fading channel in which the channel gain

$$\gamma_i = |h_i|^2 \qquad (2)$$

stays constant within each time slot of $T$ seconds. Here $\gamma_i$ is exponentially distributed under independent Rayleigh fading. The AP broadcasts beacon signals to initiate the data collection process, and each sensor node estimates its channel condition. Since the AP is usually a mobile phone or a personal digital assistant (PDA) with ample energy, the energy consumption of the AP is not considered in this work. We suppose that sensors can ensure satisfactory transmission and reduce unnecessary energy consumption by adjusting their transmission power according to channel conditions. In practical applications, sensors can transmit only at a finite number of power levels because of hardware limitations [23]. Let $K$ be the number of power levels and let $\alpha_1, \alpha_2, \ldots, \alpha_k, \ldots, \alpha_K$ denote the power scaling factors of a transmitter, where $0 \leqslant \alpha_1 < \cdots < \alpha_K \leqslant 1$. The power level is then restricted to the finite set

$$P_i \in \{\alpha_1 P_{\max}, \alpha_2 P_{\max}, \ldots, \alpha_K P_{\max}\}, \qquad (3)$$

where $P_i$ is the transmission power available to sensor $i$ for transmitting a data packet to the AP if it is scheduled and $P_{\max}$ is the maximum transmission power that the transmitter can achieve. For simplicity, the transmission rate of sensor $i$, denoted by $v_i$, is derived from the Shannon theorem in (4), where $P$ is the theoretically desired value of the transmission power and $B$ is the bandwidth. In (4), $P_i$ takes the minimum available value that satisfies the inequality. Since $P_i$ depends on the current channel gain and on the transmission rate of sensor $i$, the energy consumed by sensor $i$ for data transmission in one data collection is given by (5), where $E_c$ is the energy consumption of the transmitter circuit, identical for all sensor nodes.
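As an illustration of the power-selection step described above, the following Python sketch picks the smallest available power level that sustains a target rate and computes the resulting per-packet energy. It is only a sketch under stated assumptions: the Shannon-capacity expression used for the required power, the noise power taken as `n0 * bandwidth_hz`, the packet length `packet_bits`, and all function names are introduced here for illustration and are not the paper's equations (4) and (5).

```python
def required_power(rate_bps, bandwidth_hz, n0, gain):
    """Power needed for rate_bps, assuming rate = B*log2(1 + P*gain/(n0*B))."""
    snr_needed = 2.0 ** (rate_bps / bandwidth_hz) - 1.0
    return snr_needed * n0 * bandwidth_hz / gain


def quantize_power(p_required, p_max, scaling):
    """Smallest level alpha*p_max that is >= p_required, or None if none suffices."""
    for alpha in sorted(scaling):
        if alpha * p_max >= p_required:
            return alpha * p_max
    return None  # even the highest level cannot support the transmission


def packet_energy(p_tx, packet_bits, rate_bps, e_circuit):
    """Assumed energy model: transmit power times airtime, plus circuit energy."""
    return p_tx * packet_bits / rate_bps + e_circuit


if __name__ == "__main__":
    p_req = required_power(rate_bps=250e3, bandwidth_hz=5e6, n0=1e-10, gain=0.8)
    p_tx = quantize_power(p_req, p_max=1e-3, scaling=[0.25, 0.5, 0.75, 1.0])
    if p_tx is not None:
        print(packet_energy(p_tx, packet_bits=1000, rate_bps=250e3, e_circuit=0.001))
```

Rounding the required power up to the next available level, rather than down, keeps the selected level sufficient for the target rate; the numeric parameters in the usage lines are placeholders.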

Formulation of Lifetime in WBAN.
A general formula for lifetime in WSN is described in [13]. We adopt this lifetime concept for WBAN and express the WBAN lifetime as in (6). In (6), $E_{\mathrm{in}}$ is the initial energy of the sensor nodes, $E_{\mathrm{tx}}$ is the expected transmission energy consumed in one round of data collection, $E_w$ is the expected wasted energy, and the remaining term is the energy required by a sensor for CSI acquisition. The wasted energy is the total unused energy when the lifetime ends; as expressed in (7), it is the sum over all sensors of the wasted energy of each sensor $i$. A sensor node is considered dead when its residual energy is lower than the transmitter circuit consumption; that is, it does not have enough energy to transmit under any channel condition. A WBAN is considered dead when any sensor node in the network is dead. In this paper, we express the lifetime of a WBAN as the number of data collections completed before the network dies.
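The lifetime definition above (number of data collections before any sensor dies) can be made concrete with a short simulation loop. The sketch below is illustrative only: the function name `network_lifetime`, the `choose` callback, and the representation of each slot as a list of per-sensor energy requirements are assumptions made here, not part of the paper.

```python
def network_lifetime(initial_energy, requirements, choose, e_circuit):
    """Count completed data collections and the energy left unused at death.

    initial_energy: residual energy of each sensor at the start.
    requirements:   iterable of per-slot lists w, where w[i] is the energy
                    sensor i would need to transmit in that slot.
    choose(e, w):   scheduling rule returning the index of the chosen sensor.
    e_circuit:      transmitter circuit energy; a sensor below it is dead.
    """
    e = list(initial_energy)
    rounds = 0
    for w in requirements:
        if any(x < e_circuit for x in e):
            break  # some sensor is dead, so the network is dead
        i = choose(e, w)
        if e[i] < w[i]:
            break  # the scheduled sensor cannot afford the transmission
        e[i] -= w[i]
        rounds += 1
    return rounds, sum(e)  # lifetime and total wasted (unused) energy


if __name__ == "__main__":
    always_first = lambda e, w: 0
    print(network_lifetime([3, 3], [[1, 2]] * 10, always_first, e_circuit=1))
```

In this toy run, always scheduling the first sensor ends the network after three collections and leaves the second sensor's energy entirely unused, which is the kind of waste the lifetime formula penalizes.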

Optimal Transmission Scheduling
In each time slot, only one sensor node is scheduled to transmit its measurements directly to the AP through the fading channel. We assume that the instantaneous CSI of all sensors is available to the AP. In this section, we formulate the problem of dynamically choosing which sensor should communicate with the AP, so as to maximize network lifetime under a fairness constraint, as a CMDP. We propose a centralized transmission scheduling algorithm that maximizes network lifetime under different fairness constraints.
The optimal lifetime and optimal policy are obtained from the Bellman equation in dynamic programming. The optimal policy using global CSI defines the limiting performance in network lifetime for the model specified in Section 2.

Fairness Index.
Fairness is in general a critical factor in performance studies. Particularly in distributed networks where resources are shared by a number of users, fair allocation is extremely important, and fairness is considered an important criterion in the design of a WBAN.
In the MAC layer of the IEEE 802.15.6 specification, time is divided into superframes of equal length. Each superframe consists of four periods: control period, contention access period (CAP), contention-free period (CFP), and inactive period. The CFP is further divided into a number of time slots. We focus on the time division multiple access- (TDMA-) based protocol, in which data packets are mainly transmitted in the CFP. Therefore, this is a time-slotted network, where time is the resource to be allocated among the sensor nodes.
In the literature, Jain's fairness index [24] has been widely used as a measure of network-wide fairness performance. Let $t_i$ denote the actual transmission time of sensor $i$, and let $\phi_i$ denote its weighting factor, which expresses the degree of importance of sensor $i$. The normalized time allocation of sensor $i$ is given by (8) in terms of $t_i$, the total transmission time of all sensors, and $\phi_i$. If $x_i$ represents the allocation received by sensor $i$ in a network with $N$ competing sensors, the fairness index $f$ of the network used in this work is given by (9). A transmission scheduling scheme is considered perfectly fair if $f = 1$. A higher value of $f$ indicates a higher level of fairness among the sensor nodes, and a lower value indicates a lower level.
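To make the fairness measure concrete, the sketch below computes Jain's index, $(\sum_i x_i)^2 / (N \sum_i x_i^2)$, over normalized allocations. The normalization used here (each sensor's share of the total transmission time divided by its weighting factor) is an assumption chosen for illustration and may differ from the paper's equation (8); the function names are likewise hypothetical.

```python
def normalized_allocation(times, weights):
    """Share of total transmission time per sensor, divided by its weight.

    Assumed normalization: a sensor that receives exactly its weighted share
    gets the value 1.0.
    """
    total = sum(times)
    return [(t / total) / w for t, w in zip(times, weights)]


def jain_index(x):
    """Jain's fairness index: 1.0 when all x_i are equal, 1/N in the worst case."""
    squares = sum(v * v for v in x)
    if squares == 0:
        return 1.0  # nothing allocated yet; treated as fair by convention here
    return sum(x) ** 2 / (len(x) * squares)


if __name__ == "__main__":
    # Weights 0.7/0.2/0.1 with matching time shares are perfectly fair.
    print(jain_index(normalized_allocation([70, 20, 10], [0.7, 0.2, 0.1])))
    # Equal time shares under the same unequal weights are less fair.
    print(jain_index(normalized_allocation([1, 1, 1], [0.7, 0.2, 0.1])))
```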

CMDP Formulation.
We use an infinite-horizon CMDP to model the sensor selection problem, with the aim of achieving different performance tradeoffs between lifetime and fairness. The major components are elaborated below.

State Space.
Let $\mathbf{e}$ be the residual energy, and let $\mathbf{w}$ and $f$ be the transmission energy requirement and the fairness index in each time slot, respectively. The network state space $S$ is characterized by $\mathbf{e}$, $\mathbf{w}$, and $f$. When the network lifetime expires, the network reaches a special termination state $S_t$. In the definition of $S_t$, the condition $e_i < E_c$ indicates that the residual energy of the $i$th sensor has fallen below the transmitter circuit consumption, and $\mathbf{e} < \mathbf{w}$ indicates a transmission failure.

Action Space.
The set of actions is denoted by $A$. The action space in state $s = (\mathbf{e}, \mathbf{w}, f) \in S$ consists of the indexes of all sensors that can support the current transmission.

Transition Probability.
Assume that sensor $i$ is selected to transmit after action $a$ is applied. If the state at time $t$ is $s$ and action $a$ is taken, the probability that the next state is $s'$ is determined by $p(w_i) = \Pr\{w = w_i\}$, the probability mass function of $w$ induced by channel fading over a predefined set of transmission energies. Let $\mathbf{I}_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ be a $1 \times N$ unit vector whose $i$th element is 1. The reward is 1 when the state $s \in S$ and $s \notin S_t$. The network lifetime, denoted by $L$, is then the accumulated total reward until the network enters a terminating state in $S_t$.
3.2.5. The Constraint. In state $s$, an action $a$ is considered feasible if the fairness index $f$ of the network is not smaller than a given threshold $f_{\mathrm{th}}$ after action $a$ is applied. A policy $u^*$ is considered optimal if it attains the maximum expected lifetime before the network reaches the terminating state. We define $U_f = \{u \in U : f_u \geqslant f_{\mathrm{th}}\}$ as the available policy set. If $u^* \in U_f$ satisfies $L(u^*) \geqslant L(u)$ for all $u \in U_f$, then $u^*$ is called the constrained optimal policy.
Hence, the constrained optimization problem is to find a feasible $u \in U$ that maximizes the network lifetime $L$. The optimal sensor scheduling protocol is given by the constrained optimal policy $u^*$ of the above CMDP problem.
Here $f_s$ is the fairness index of the network in state $s$, and $f_{\mathrm{th}}$ is the fairness threshold, specified according to the application scenario. Equation (18) can then be rewritten, and an optimal policy $u$ for the transmission scheduling protocol obtained from it. Similar to [11], an equivalent modified Bellman optimality equation can be formulated. Hence, the constrained optimal policy can also be expressed as

𝑢 [(e,w, 𝑓)] = arg max
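The structure of the constrained dynamic program can be illustrated on a toy instance. The following Python sketch is not the paper's Bellman equation: it assumes integer energy units, per-slot energy requirements drawn independently for each sensor from a small probability mass function `pmf` (strictly positive integers, so the recursion terminates), a Jain index computed on transmission counts with equal weights, and a fairness constraint that is enforced only once at least $N$ slots have been scheduled. All names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def jain(counts):
    """Jain's index on transmission counts (equal weights assumed)."""
    squares = sum(c * c for c in counts)
    return 1.0 if squares == 0 else sum(counts) ** 2 / (len(counts) * squares)


def optimal_lifetime(e0, pmf, e_circuit, f_th):
    """Maximum expected number of slots under a fairness threshold f_th.

    e0:  initial integer energy of each sensor.
    pmf: dict mapping a positive integer energy requirement to its probability.
    """
    n = len(e0)
    levels = sorted(pmf)

    @lru_cache(maxsize=None)
    def value(e, counts):
        if any(x < e_circuit for x in e):
            return 0.0  # terminating state: some sensor is dead
        expected = 0.0
        for w in product(levels, repeat=n):  # joint realization of requirements
            prob = 1.0
            for wi in w:
                prob *= pmf[wi]
            best = 0.0  # no feasible action means the lifetime ends here
            for a in range(n):
                if e[a] < w[a]:
                    continue  # sensor a cannot afford this transmission
                nxt_counts = counts[:a] + (counts[a] + 1,) + counts[a + 1:]
                if sum(nxt_counts) >= n and jain(nxt_counts) < f_th:
                    continue  # action would violate the fairness constraint
                nxt_e = e[:a] + (e[a] - w[a],) + e[a + 1:]
                best = max(best, 1.0 + value(nxt_e, nxt_counts))  # reward 1 per slot
            expected += prob * best
        return expected

    return value(tuple(e0), (0,) * n)


if __name__ == "__main__":
    pmf = {1: 0.5, 2: 0.5}
    for f_th in (0.0, 0.7, 0.9):
        print(f_th, optimal_lifetime((4, 4), pmf, e_circuit=1, f_th=f_th))
```

Because a tighter threshold only removes actions from the feasible set, the value computed this way can never increase as `f_th` grows, which mirrors the lifetime-fairness tradeoff discussed in the text.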
3.4. Implementation and Overhead. To implement the constrained Markov decision process using global CSI, the AP needs the network energy profile $\mathbf{e}$ and the transmission energy requirement $\mathbf{w}$, and therefore the instantaneous channel realization of every sensor. The implementation proceeds as follows. First, the AP broadcasts a beacon signal to activate each sensor in the network at the start of a data collection slot. Each sensor then responds to the AP by sending pilot signals, from which the global CSI is acquired. The AP estimates the channel state of all responding sensors and derives the transmission energy requirement $\mathbf{w}$ from their response signals. Next, according to the CMDP and the current network state $(\mathbf{e}, \mathbf{w}, f)$, the AP determines which sensor to schedule. Lastly, the AP broadcasts the ID of the selected sensor and the required transmission power level, and the chosen sensor reports its observed value to the AP at that power level. The AP can track the network energy profile easily because it knows the scheduled sensor's ID and the channel realizations of all sensors.
The main disadvantage of the CMDP is its large energy consumption, because each sensor must spend energy transmitting pilot signals so that the global CSI can be acquired. Nevertheless, any sensor scheduling protocol incurs some overhead at the expense of network lifetime.

Numerical Results.
In this section, the transmission scheduling using CMDP is evaluated by simulation. The performance metrics are lifetime and fairness index.
Firstly, we compare the lifetime of the optimal transmission scheduling with the following four scheduling protocols: (1) the random scheme, in which the AP selects a sensor to transmit at random; (2) the pure opportunistic scheme, which selects the sensor with the best channel condition; (3) the pure conservative approach, which chooses the sensor with the most residual energy; (4) the DPLM protocol, which selects the sensor with the largest ratio between the residual energy and the current transmission consumption. As shown in Figure 2, under the optimal transmission scheduling, pure opportunistic, pure conservative, random, and DPLM schemes, the network lifetime is proportional to the initial energy of the sensor nodes. The random scheme performs the worst. The pure conservative approach outperforms the pure opportunistic approach. Among all schemes, the optimal transmission scheduling achieves the best performance.
In WBAN, the sensor nodes monitor various physiological parameters with different degrees of importance. That is to say, the sensor nodes have different weighting factors in the network. We therefore illustrate the performance of the optimal transmission scheduling scheme under both equal and different weighting factors.
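The four baseline selection rules listed above can be written as one-line decision functions. The sketch below is illustrative: it assumes that the state is given as a list `e` of residual energies and a list `w` of current transmission energy requirements, and it equates "best channel condition" with the smallest required transmission energy. The function names are hypothetical.

```python
import random


def random_policy(e, w, rng=random):
    """Random scheme: pick any sensor uniformly at random."""
    return rng.randrange(len(e))


def opportunistic(e, w):
    """Pure opportunistic: best channel, taken here as lowest required energy."""
    return min(range(len(w)), key=lambda i: w[i])


def conservative(e, w):
    """Pure conservative: sensor with the most residual energy."""
    return max(range(len(e)), key=lambda i: e[i])


def dplm(e, w):
    """DPLM: largest ratio of residual energy to current transmission energy."""
    return max(range(len(e)), key=lambda i: e[i] / w[i])


if __name__ == "__main__":
    e = [10.0, 3.0, 8.0]   # residual energies
    w = [6.0, 1.0, 2.0]    # energy needed to transmit in this slot
    print(opportunistic(e, w), conservative(e, w), dplm(e, w))
```

On this example the three deterministic rules pick three different sensors, which shows how the choice of criterion alone changes the schedule.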

Equal Weighting Factors.
In the simulation of Figure 3, the weighting factor of every sensor is set to the same value of 1/3. The fairness thresholds are set to 0.3, 0.5, 0.7, and 0.9, respectively, which represent the fairness requirements of various application scenarios. As shown in Figure 3, the network lifetime increases with the initial energy of the sensor nodes. The higher the fairness requirement, the lower the network lifetime.

Different Weighting Factors.
For comparison with the above simulation, the weighting factors are set to 0.7, 0.2, and 0.1, respectively. The set of fairness thresholds is the same as in Figure 3. As shown in Figure 4, lifetime again follows the fairness requirement: when all sensors have identical initial energy, the lower the fairness requirement, the longer the lifetime. The basic difference between Figure 3 and Figure 4 is that the curves in Figure 4 are more widely spread. This means that the fairness constraint has a greater impact on network lifetime when the weighting factors differ. As shown in (8) and (9), the weighting factors enter the fairness index. When the weighting factors of different sensors are not identical, and especially when the weighting factors of some sensors are much larger than the others, the proposed algorithm tends to select the sensor with a higher weighting factor rather than the one with a longer lifetime, in order to ensure fairness performance. Therefore, if a high level of fairness is required in WBAN, the weighting factors have a strong impact on network lifetime; otherwise, their impact on lifetime is smaller. By comparing Figures 3 and 4, it can be observed that the proposed optimal transmission scheduling scheme achieves different degrees of tradeoff between network lifetime and fairness by applying different degrees of fairness constraint.

Proposed Fair Weights Scheduling Scheme
Because acquiring global CSI incurs a large implementation overhead, we put forward a novel distributed scheduling algorithm that uses only local CSI, which reduces network overhead and simplifies the algorithm.

Design Principle.
In this section, a novel distributed transmission scheduling scheme, referred to in this paper as the fair weights scheme, is proposed. Fair weights consist of functions of CSI, REI, the exponentially weighted moving average (EWMA) of the data rate, and the expected data rate. The proposed fair weights scheme satisfies the following design principles: (i) maximize the lifetime of the whole network through weights of both CSI and REI; (ii) maintain fairness. To do so, the transmission scheme adjusts the fair weights based on past samples of the sensor nodes' transmission condition.

Fair Weights Scheme
DPLM has been proved to be asymptotically optimal in lifetime, so we build on it for lifetime maximization. At the beginning of each data collection, the DPLM scheme selects the sensor whose current energy consumption demands the smallest portion of its residual energy for transmission. Accordingly, the energy-efficiency index of sensor $i$ is defined as the ratio of its residual energy $e_i$ to its expected energy consumption.
In order to maintain fairness, we adopt the EWMA of the data rate. The scheme maintains a running average $\bar{v}_i$ of the data rate, obtained as an exponentially weighted moving average of each newly obtained sample over the decision history. In our scheme, at the beginning of each scheduling interval $k$, the EWMA of the data rate of sensor $i$ is updated from its previous value and $v_i(k-1)$, the transmission rate of sensor $i$ at scheduling interval $k-1$. If sensor $i$ was not scheduled to transmit at scheduling interval $k-1$, $v_i(k-1)$ is set to 0. In this update, $\beta_i \in [0, 1]$ is a constant that determines the rate of exponential decay of the previous samples. A larger $\beta_i$ results in faster decay, and a smaller $\beta_i$ in slower decay. We can therefore tune $\beta_i$ according to the physiological parameter to improve fairness. For example, ECG monitoring requires a higher data rate than body temperature monitoring. Therefore, a higher $\beta_i$ is set for an ECG monitoring sensor so that $\bar{v}_i$ decreases sharply when the sensor fails to be scheduled several times. Conversely, a lower $\beta_i$ should be set for body temperature monitoring sensors, which require a lower data rate. This scheme is desirable in the sense that it attempts to compensate for the unfairness of recent allocations as much as possible.
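The effect of the decay constant can be seen in a few lines of Python. The update form used below, $\bar{v}_i(k) = (1 - \beta_i)\,\bar{v}_i(k-1) + \beta_i\, v_i(k-1)$, is the standard EWMA recursion and is assumed here for illustration; the paper's exact update equation was not recoverable from the text, and the function names are hypothetical.

```python
def ewma_update(avg_prev, rate_prev, beta):
    """One EWMA step; rate_prev is 0 when the sensor was not scheduled."""
    return (1.0 - beta) * avg_prev + beta * rate_prev


def decay_when_idle(avg, beta, idle_slots):
    """Running average after idle_slots consecutive unscheduled intervals."""
    for _ in range(idle_slots):
        avg = ewma_update(avg, 0.0, beta)
    return avg


if __name__ == "__main__":
    # A larger beta makes the average collapse faster when a sensor is starved.
    for beta in (0.5, 0.9):
        print(beta, decay_when_idle(250e3, beta, idle_slots=3))
```

Under this assumed form, a sensor with a larger decay constant sees its running average fall much faster during a run of unscheduled intervals, which is what lets the scheme react quickly for high-rate sensors such as ECG.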
Let $r_i$ be the ratio of $\bar{v}_i$ to the expected data rate of sensor $i$, as expressed in (25). In (25), $r_i$ represents the deviation between $\bar{v}_i$ and the expected data rate. Note that the expected data rate differs across sensors that process different kinds of physiological parameters. The fair weight of sensor $i$ is then defined in (26). The fair weights in (26) determine which sensor node transmits its measurements to the AP through the common channel during each round of data collection, by exploiting CSI, REI, the data rate requirement, and the decision history. If sensor $i$ has not been scheduled for a long time, the value of $\bar{v}_i$ decreases sharply. As a result, $r_i$ decreases and the fair weight of sensor $i$ increases, which raises its chance of seizing the channel. The fair weights are recalculated periodically to track the channel condition. The objective of the proposed scheme is to improve fairness performance without sacrificing excessive network lifetime, which overcomes a shortcoming of the existing distributed transmission scheduling schemes. Consequently, it is able to achieve a desired balance between increasing lifetime and maintaining fairness in the design of a WBAN.
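The qualitative behavior described above (a starved sensor sees its ratio fall and its fair weight rise) can be sketched as follows. The functional form is an assumption made purely for illustration: the fair weight is taken as the energy-efficiency index divided by the rate ratio. The paper's equation (26) may combine these quantities differently, and every name in the code is hypothetical.

```python
def fair_weight(residual, expected_cost, avg_rate, expected_rate, eps=1e-9):
    """Assumed fair weight: energy-efficiency index divided by the rate ratio."""
    efficiency = residual / expected_cost   # residual energy / expected consumption
    ratio = avg_rate / expected_rate        # EWMA rate relative to the expected rate
    return efficiency / max(ratio, eps)     # eps guards against a zero average rate


def select_sensor(sensors):
    """Index of the sensor with the largest fair weight."""
    weights = [fair_weight(**s) for s in sensors]
    return max(range(len(sensors)), key=lambda i: weights[i])


if __name__ == "__main__":
    sensors = [
        # Well served: running average above its expected rate.
        dict(residual=1.0, expected_cost=0.01, avg_rate=2000.0, expected_rate=1000.0),
        # Starved: running average far below its expected rate.
        dict(residual=1.0, expected_cost=0.01, avg_rate=100.0, expected_rate=1000.0),
    ]
    print(select_sensor(sensors))
```

With equal energy-efficiency indexes, the starved sensor wins the slot, which is the compensation effect the scheme is designed to produce.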

Distributed Implementation.
In this part, we consider the implementation of the proposed distributed transmission scheduling scheme. We adopt opportunistic carrier sensing [10] in our implementation. The basic idea is to match the fair weight of each sensor node with the backoff function of carrier sensing, which provides a distributed way of finding the global maximum. At the beginning of each scheduling interval, the AP broadcasts a beacon, and each sensor node estimates its channel information and calculates its fair weight. After that, each sensor node maps its fair weight to a backoff time using a predetermined backoff function and then listens to the channel. If the backoff function is a strictly decreasing function of the fair weight, as shown in Figure 5, this opportunistic carrier sensing ensures that only the sensor with the maximum fair weight transmits data. The propagation delay among sensors is assumed to be negligible, and a sensor is scheduled to transmit if its backoff time expires before any other sensor transmits, which means that the sensor with the maximum fair weight seizes the channel. If multiple nodes have identical fair weights, a collision occurs. This case will be considered in our future work.
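The backoff mapping can be sketched in a few lines. The particular function `tau_max / (1 + weight)` is only one possible strictly decreasing choice for nonnegative weights and is an assumption made here; the paper's backoff function (Figure 5) is not specified in the text. Ties are reported as collisions, matching the case the text leaves to future work.

```python
def backoff(weight, tau_max=1e-3):
    """Strictly decreasing map from a nonnegative fair weight to a backoff time."""
    return tau_max / (1.0 + weight)


def carrier_sense_winner(weights, tau_max=1e-3):
    """Index of the sensor whose backoff expires first, or None on a collision."""
    delays = [backoff(w, tau_max) for w in weights]
    first = min(delays)
    winners = [i for i, d in enumerate(delays) if d == first]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    print(carrier_sense_winner([0.2, 1.5, 0.7]))  # largest weight wins the channel
    print(carrier_sense_winner([1.0, 1.0]))       # identical weights collide
```

Because each sensor computes its own delay from locally available quantities, no sensor needs to know the weights of the others, which is what makes the selection distributed.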

Fairness Criterion.
A fairness factor is defined for fairness evaluation in this section. It is computed from the ratios between the actual transmission times and the given expected transmission times of each pair of sensors $i$ and $j$ in a fixed time period.

Numerical Results.
In this section, simulation results are provided to illustrate the effectiveness of the proposed fair weights scheme. Lifetime and fairness factor are used as performance metrics. The proposed fair weights scheduling scheme is compared with the pure opportunistic scheme, which uses only CSI, and the DPLM scheme, which uses the energy-efficiency index to select the sensor that transmits data packets to the AP. Without loss of generality, the energy consumed by channel estimation, which is identical in all schemes, and by carrier sensing is not considered here. A WBAN consisting of 5 sensor nodes is considered. The power spectral density of the noise is set to −70 dBm/Hz, and the bandwidth is $5 \times 10^6$ Hz. The channel gain is assumed to follow an exponential distribution with its parameter set to 1. The energy consumption of the transmitter circuitry, $E_c$, is set to 0.001 J. The transmission rate $v_i$ is 250 kbps for all sensors. The expected data rates increase with the sensor index: {0.5, 1, 2, 5, 10} kbps for sensors 1-5. That is, a sensor with a higher index has proportionally more data packets to transmit. In our scheme, the parameter $\beta_i$ is initially set to {0.5, 0.6, 0.7, 0.8, 0.9} for sensors 1-5, respectively, and is adjusted according to the physiological parameters being monitored.
International Journal of Antennas and Propagation
Figure 6 shows the lifetime of the WBAN under different methods, that is, the pure opportunistic, DPLM, and fair weights schemes. The network lifetime increases with the initial energy of the sensor nodes. It can be seen from the figure that the pure opportunistic scheme, which ignores REI, achieves the worst performance. DPLM achieves the best lifetime because it does not consider any fairness constraint. Even though it takes the fairness constraint into account, the fair weights scheme achieves a lifetime close to that of DPLM. The small gap between DPLM and the fair weights scheme arises because the fairness constraint forces the scheme to select sensors that have not been scheduled for a long time, regardless of their CSI and REI.
Figure 7 shows the fairness factor over 300 scheduling intervals for the pure opportunistic, DPLM, and fair weights schemes. As the number of scheduling intervals increases, the fairness factor stabilizes. The scheduling scheme satisfies the rate requirements of the sensor nodes on a large time scale, while the periodically computed fair weights compensate for unfairness. Figures 6 and 7 show that, by sacrificing a small and acceptable amount of lifetime, the fair weights scheme greatly enhances fairness performance. The network designed using the fair weights scheduling scheme thus achieves a balance between lifetime and fairness performance.
To illustrate the fairness performance more clearly, Figure 8 compares the scheduled time slots of each sensor node over time when traffic is scheduled by the DPLM and fair weights schemes. The simulated time slots range from 1 to 250. To make the contrast more obvious, the channel gains of sensor 3 and sensor 5 are set to be persistently poor. The simulation in Figure 8 shows that the DPLM scheme ignores sensor nodes with a low energy-efficiency index, such as sensor 3 and sensor 5, which leads to severe unfairness in data rate allocation. In the corresponding simulation of the fair weights scheme, the sensors that are ignored by DPLM are compensated, and the sensors with higher expected data rates obtain more opportunities to transmit their data packets to the AP. The simulation results show that the proposed fair weights scheme can effectively allocate time slots to balance lifetime and fairness under various channel conditions.

Figure 3: Expected lifetime for optimal lifetime with equal weighting factors.

Figure 4: Expected lifetime for optimal lifetime with different weighting factors.

Figure 6: Lifetime versus energy of sensors achieved from pure opportunistic, DPLM, and fair weights scheme.

Figure 8: Scheduled time slots of sensors 1 to 5 versus time for DPLM and fair weights scheme.
In the constraint of Section 3.2.5, $f(s, a)$ represents the fairness index of the next state when action $a$ is applied in state $s$. The set of available actions in state $s$ is denoted by $A_f(s)$.
3.2.6. CMDP Formulation. We now formulate the sensor scheduling problem as a CMDP. A transmission scheduling protocol is a policy $u$ in the CMDP. A policy $u$ in the policy space $U$ is a sequence $u = \{u_0, u_1, \ldots\}$, where $u_t : S \to \{1, \ldots, N\}$ specifies the sensor selected in the $t$th time slot and $u_t(\cdot \mid s_1, a_1, s_2, a_2, \ldots, s_{t-1}, a_{t-1}, s_t)$ is a conditional probability measure over $A$. Let $L_u(s)$ denote the expected network lifetime (the total reward in the CMDP) starting from state $s$ under policy $u$. The maximum expected lifetime $L^*(s)$ starting from state $s$ is obtained by maximizing $L_u(s)$ over the policies that satisfy the fairness constraint.