Intelligent Channel Allocation for Age of Information Optimization in Internet of Medical Things

Along with the development of real-time applications, the freshness of information becomes significant, because overdue information is worthless and may even mislead the system's judgment. Therefore, the Age of Information (AoI), a metric of information freshness, has been proposed. In the Internet of Medical Things (IoMT), which is derived from the requirements of the Internet of Things (IoT) in medicine, high freshness of medical information must be guaranteed. In this paper, we introduce the AoI of medical information when allocating channels for users in the IoMT. Owing to the advantages of the Deep Q-learning Network (DQN) in resource management for wireless networks, we propose a novel DQN-based Channel Allocation (DQCA) algorithm that provides a channel allocation strategy optimizing a system cost that considers the AoI and the energy consumption of coordinator nodes. Unlike traditional centralized channel allocation methods, the DQCA algorithm is distributed, as each user performs the DQN process separately. The simulation results show that our proposed DQCA algorithm is superior to the greedy algorithm and the Q-learning algorithm in terms of average AoI, average energy consumption, and system cost.


Introduction
Corona Virus Disease 2019 (COVID-19) had caused more than 2.32 million deaths worldwide by February 8, 2021 [1]. People are forced to stay at home, travel less, and avoid crowded places. In this situation, the government, medical staff, and the general public all hope to monitor virus infections such as COVID-19 and isolate them in time to avoid large-scale spread of the virus. Besides, people are more concerned about their health than ever before. More and more chronic patients and even healthy people hope to have long-term, effective monitoring of their bodies and to obtain important information about their health as soon as possible. The emergence of the Internet of Medical Things (IoMT) has made solving these problems possible, and its intelligent monitoring function has gained massive demand around the world [2].
For the COVID-19 virus, Swati Swayamsiddha et al. proposed a Cognitive Internet of Medical Things (CIoMT), a particular case of the IoMT, enabling real-time tracking, remote monitoring of patients, rapid diagnosis, contact tracing and clustering, screening, and monitoring, thus reducing the workload of medical staff and helping to prevent and control the spread of the virus [3]. Ravi Pratap Singh et al. discussed the feasibility of using the IoMT to track, monitor, analyze data, and provide treatment plans for orthopedic patients in an environment ravaged by COVID-19 [4]. For COVID-19 management, M. A. Mujawar et al. also proposed a health monitoring system based on wearable devices and artificial intelligence, which continuously monitors the patient's heartbeat, body temperature, and other parameters through medical sensors and transmits them to cloud storage through a WSN. These parameters are used to update the user's health status in real time, and the status is then sent to the medical staff [5].
The IoMT is a vast network system with diverse technologies. This paper studies only the channel allocation problem in monitoring and transmitting human physiological data in the IoMT. During monitoring and transmission, stale data may cause erroneous analysis and evaluation, reduce the accuracy and reliability of system decision-making, and even threaten the safety of users. Therefore, the freshness of information is crucial, and it also occupies an essential position in the design of 6G systems applied to body area networks [6][7][8][9][10]. To effectively describe the freshness of information, this paper introduces the Age of Information (AoI) [11] and studies the channel allocation problem of the IoMT with the AoI as the optimization target.
In recent years, artificial intelligence has become an effective method to solve the resource allocation problem with many data processing [12]. As the main solution of artificial intelligence, machine learning has also received tremendous attention in recent years. Machine learning uses algorithms to analyze and learn from data to make decisions and predictions about real-world events. Among them, deep learning is the most popular machine learning method at present, which has been well applied in automatic detection [13,14], case recognition [15][16][17], environmental monitoring [18], and epidemic prediction [19], etc. In terms of channel allocation, with the rapid growth of network size and data volume, deep learning can significantly improve the processing speed for a large number of nodes [20][21][22][23].
The research content of this paper is the channel allocation problem among users oriented to the optimization of the AoI. At the gateway, the AoI of each controller on each user's body is the number of slots elapsed since the latest update received from that controller, measured at the end of each slot. In each time slot, the system pays a cost for the AoI, so our requirement of timely updates at the gateway is reflected in minimizing the total payment of the whole system. We adopt a deep learning method to solve the proposed optimization problem. The main contributions of this paper are as follows: (i) In view of the channel allocation problem of the IoMT, we focus on the timeliness of information while also considering the mobility of nodes. To measure the cost that the system pays for the lack of fresh information at the gateway, we propose a system cost function based on the AoI and the current energy consumption rate of the nodes.
(ii) Based on the cost function, we construct a mathematical model of the optimization problem that minimizes the average cost for channel allocation in the IoMT.
(iii) For the problem raised, we propose a Deep Q-Learning Network (DQN) based channel allocation algorithm, named DQCA, which provides a channel allocation scheme that minimizes the cost while meeting the requirements on node SNR and residual energy.
The rest of the paper is organized as follows. Section 2 provides a comprehensive overview of the AoI. Section 3 describes the system model and the optimization model of the channel allocation problem in the IoMT. The proposed DQCA algorithm is illustrated in Section 4. The simulation and performance evaluation are presented in Section 5. Finally, we conclude the paper in Section 6.

Related Works
With the increasingly developed Internet of Things (IoT), real-time applications are gradually increasing, such as driverless cars, which make decisions and exert control based on road information detected by sensors, adjust the vehicle's travel mode, avoid collisions, and ensure safe driving. This type of application requires high timeliness and freshness of data: outdated data leads to wrong judgments and decisions, and the older the data, the less important and effective it becomes. To measure the freshness and effectiveness of data, scholars proposed the AoI metric in 2011 to quantify the freshness of information about a remote system state [11]. The AoI refers to the time elapsed since the generation of the latest successfully received information; it is different from the transmission delay of the information.
In a system with multiple source nodes and one destination node, each source node collects information and sends it to the destination node periodically, and the AoI of each source node can be calculated at the destination [24]. Since each source node constantly sends information to the destination, the AoI of a source node refers to the AoI of the latest information the destination has received from that source. In other words, the AoI of a source node is not fixed; it depends on the sending rate of the source node and on the rate at which the destination receives that source's information. If the destination has not received new information from a source node, the AoI of that source node increases linearly until the newest information arrives, at which point it drops to the AoI of the latest information.
As shown in Figure 1, t_i (i = 0, 1, 2, …) is the time at which data packet i is generated by node j, and t_i′ is the time at which packet i is received by the destination. When t = 0, the destination node receives data packet 0 from node j, so A_j(0) = A_0 = t_0. Then A_j(t) increases linearly until the destination node receives the latest data packet 1 at t_1′. At that moment, A_j(t) is updated to A_j(t_1′) = A_1 = t_1′ − t_1. Likewise, A_j(t_2′) = A_2 = t_2′ − t_2 when the destination node receives the latest data packet 2 at t_2′, and so forth.
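The sawtooth evolution of A_j(t) described above can be sketched in a short simulation. The function below is an illustrative sketch, not code from the paper: it tracks the generation time of the newest received packet and reports A_j(t) = t − t_gen at each integer instant, which resets the AoI on each reception and grows it linearly in between.

```python
def aoi_timeline(events, horizon):
    """Sample the AoI of one source at its destination over time.

    events: list of (t_gen, t_recv) pairs, one per received packet,
            sorted by reception time t_recv.
    horizon: last time instant to evaluate.
    Returns a list of (t, aoi) samples at integer instants after the
    first reception.
    """
    samples = []
    t_gen_latest = None  # generation time of the newest received packet
    idx = 0
    for t in range(horizon + 1):
        # Apply every reception that has happened by time t.
        while idx < len(events) and events[idx][1] <= t:
            t_gen_latest = events[idx][0]
            idx += 1
        if t_gen_latest is not None:
            # AoI = age of the newest packet, growing linearly between updates.
            samples.append((t, t - t_gen_latest))
    return samples
```

For example, a packet generated at t = 0 and received at t = 2 gives an AoI of 2 at reception, which then climbs until the next packet (generated at 3, received at 5) resets it to 2 again.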
The Swedish scholar Antzela Kosta et al. published a survey on the AoI in 2017, introducing the concept in detail and summarizing early research [25]. Jhunjhunwala et al. propose an AoI-aware channel scheduling algorithm for a sensor network with a monitoring station and multiple source nodes; they require the cost function to be non-decreasing but do not provide a complete cost function or optimization model [24].
There has also been some research on the AoI in the IoT.

System Model and Optimization Model
3.1. System Model. Figure 2 illustrates the topology of the IoMT, which is born out of the IoT and wearable devices. The core of the IoMT is therefore the users, each equipped with several wearable devices containing wireless sensors. These wearable devices can detect physiological information (such as blood pressure, pulse, temperature, and electrocardiogram (ECG)) and mobility information (such as location, moving speed, and moving direction). In addition, a coordinator on each user's body collects the information from all wearable devices on the same body and communicates with the gateway. The physiological information of all users is sent to the gateway and then transmitted on demand to a nurse, doctor, or ambulance through the Internet. In this paper, each user selects one channel from a gateway in each time slot. To describe the problem more conveniently, we first introduce the notation. The AoI of each mobile node is defined as the time elapsed since the generation of the latest data of this node received by the gateway, as shown in Eq. (1):

l_j(t) = t · t_s − t_cur^gen, (1)

where t_cur^gen is the generation time of the currently received data frame and t_s is the length of each time slot. Here, we represent the AoI in absolute time rather than in time slots, which is more precise. In each time slot, the system pays a cost for the AoI, and the cost C(t) is defined as a function of the AoI of all mobile nodes. Since C(t) is the price the system pays for the lack of fresh information from the source nodes, it is a non-decreasing function, as shown in Eq. (2).
Here f(l_j(t)) is the cost function of the AoI of node j; it is weighted by the ratio of the energy consumed by node j to its initial energy. E_j is the energy consumed by the node, and ε_fs d²_{i,j} is the energy consumption of free-space transmission.
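As an illustration of this construction, the sketch below assumes a simple linear weighting of the AoI by the node's energy depletion ratio; the paper's exact functional form is not reproduced here, only the non-decreasing dependence on the AoI that Eq. (2) requires.

```python
def node_cost(aoi, e_consumed, e_init):
    """Per-node cost f(l_j(t)): the AoI weighted by the node's energy
    depletion ratio. The linear form is an illustrative assumption; it is
    non-decreasing in the AoI, as the system model requires."""
    return aoi * (e_consumed / e_init)

def system_cost(aois, consumed, init):
    """C(t): total cost over all mobile nodes for one time slot."""
    return sum(node_cost(a, e, e0) for a, e, e0 in zip(aois, consumed, init))
```

With two nodes at AoI 2 and 4 that have spent 10% and 20% of their initial energy, the slot cost would be 2·0.1 + 4·0.2 = 1.0 under this assumed form.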
The mobile node communication complies with the 802.11 standards and adopts OFDM technology, and the signal-to-noise ratio of the mobile node is defined accordingly.

3.2. Optimization Model. The objective is to minimize the average system cost C(t) over all time slots, subject to the following constraints.

Figure 1: Age of Information.

Wireless Communications and Mobile Computing
Formula (7) indicates that in any time slot t, a channel k can be allocated to only one node j. Formula (8) indicates that in time slot t, a node can communicate with only one gateway. Formula (9) indicates whether channel k of gateway i is allocated to user j in time slot t: 1 means yes and 0 means no. Formula (10) indicates that the number of occupied gateways cannot exceed the number of available gateways. Formula (11) indicates that the number of occupied channels cannot exceed the number of available channels. Formula (12) indicates that the occupied channel bandwidth cannot exceed the total channel bandwidth. Formula (13) indicates that the signal-to-noise ratio of a node must be higher than the threshold so as to guarantee the transmission rate.
For a network with a small scale and a small total number of channels, the enumeration method can calculate the cost of each user choosing each subchannel of each gateway and then pick the subchannel with the lowest cost. However, if there are 1000 users, 5 gateways, and 64 subchannels in the network, the enumeration method requires at least 320,000 cost evaluations for the AoI payment. Thus, for larger networks, the computational complexity is prohibitively high, and it is significant to design a low-complexity algorithm for the proposed problem.
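The evaluation count quoted above follows directly from multiplying users, gateways, and subchannels, as the hypothetical helper below illustrates (the function name is ours, not the paper's):

```python
def enumeration_evaluations(n_users, n_gateways, n_subchannels):
    """Lower bound on cost evaluations for exhaustive channel allocation:
    every user must evaluate every (gateway, subchannel) pair once per
    allocation round."""
    return n_users * n_gateways * n_subchannels
```

For the paper's example of 1000 users, 5 gateways, and 64 subchannels, this yields the quoted 320,000 evaluations per round, motivating a learning-based alternative.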

DQCA Algorithm Design
We model each user's channel selection as a Markov decision process (MDP), in which the policy decision and the AoI depend only on the selection in the last time slot. The network contains a large number of users who move randomly, so the optimization model above is difficult to solve analytically, because the result depends largely on the built model and on the available computing capability. Reinforcement learning is well suited to the channel allocation problem of such a network: on the one hand, it adjusts actions through the interaction between the user, the environment, and the rewards, which can solve optimization problems for which analytical solutions are hard to obtain; on the other hand, it adapts well to a highly dynamic environment with frequently changing channels. Q-learning and DQN are two typical reinforcement learning algorithms; their flow diagrams are shown in Figures 3 and 4, respectively.
In Q-learning, the agent chooses an action in each state, builds a Q-table, and records the Q-value for each state-action pair. The Q-value is updated using the reward produced by the selected action. However, since all possible states and actions are enumerated in the Q-table, Q-learning is only suitable for MDP problems with small state and action spaces. When the spaces become large, the Q-table grows so large that it no longer fits in memory, and the convergence of Q-learning slows down.
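For reference, a single tabular Q-learning update has the following shape. This is an illustrative sketch with an assumed learning rate alpha and discount gamma, not code from the paper:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s_next, a').

    Q is a dict mapping state -> {action: value}; it is exactly this
    explicit table that becomes infeasible for large state/action spaces."""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```

Because every reachable state needs its own row, memory and convergence time scale with the product of state and action counts, which motivates the function approximation used by DQN below.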
Compared with Q-learning, DQN uses an artificial neural network (ANN) to approximate the value function, uses a target Q-network to update the target value, and uses experience replay to train the reinforcement learning process. DQN only updates the parameters θ of the neural network rather than the whole Q-table. Therefore, it shortens the convergence time and is more suitable for problems with large state and action spaces. Considering the large number of users and channels, we abandon the Q-table-based Q-learning algorithm and choose DQN to train the network to obtain an approximately optimal solution. Our proposed DQCA algorithm is a channel allocation algorithm based on DQN.
Agent: We define the controller node on mobile user as an agent. As an agent, it trains the neural network according to the network status (number of users, user location, moving speed and direction of users, etc.) to obtain reasonable actions.
System state: denoted by s_j(t), including the channel environment and node behavior. The behavior of the node mainly refers to its current position (node mobility follows the random walk model [24]); the nearest gateway is selected for access according to the node's position. The channel environment is characterized by the node's signal-to-noise ratio. If node j selects gateway i for data transmission in time slot t with signal-to-noise ratio γ^i_{j,k}(t), then s_j(t) = 1 if γ^i_{j,k}(t) ≥ γ_0, and s_j(t) = 0 otherwise. That is, s_j(t) ∈ {0, 1}.
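The nearest-gateway association and the binary SNR state described above can be sketched as follows (illustrative helper names, not from the paper):

```python
import math

def nearest_gateway(node_pos, gateways):
    """Each node associates with its geometrically nearest gateway,
    as assumed in the DQCA state definition. Positions are (x, y) pairs."""
    return min(range(len(gateways)),
               key=lambda i: math.dist(node_pos, gateways[i]))

def state_bit(snr, snr_threshold):
    """s_j(t) = 1 if the node's SNR on the selected gateway meets the
    threshold gamma_0, else 0."""
    return 1 if snr >= snr_threshold else 0
```

A node at the origin with gateways at (5, 5) and (1, 0) would thus attach to the second gateway, and its state bit simply thresholds the resulting SNR.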
System action: After the node selects the gateway i, the system action is defined as which channel k of the gateway i is selected by the node j.
Reward: user j receives the immediate reward produced by action a_j(t) in system state s_j(t), as defined in Eq. (16). This reward function ensures that the cost of the AoI is minimized while the channel constraints are met.
For each user j, we define the Q-function Q(s_t, a_t) for taking action a_t in state s_t, as shown in Eq. (17), where P(s_{t+1} | s_t, a_t) is the transition probability from state s_t to state s_{t+1}, δ is a discount factor that balances the immediate and long-term rewards, and A is the set of feasible actions.
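Eq. (17) itself is not reproduced in the extracted text; under the definitions just given (transition probability P, discount factor δ, feasible action set A), the standard Bellman form it presumably takes is:

```latex
Q(s_t, a_t) = r(s_t, a_t) + \delta \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t)\, \max_{a \in A} Q(s_{t+1}, a)
```

This reading is consistent with the optimal-policy definition in Eq. (18)-(19) and the target value in Eq. (20) below.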
Q function and optimal policy: Then the optimal value of Q function and the optimal policy π * can be represented as Eq. (18) and Eq. (19), respectively.
Target value: to avoid the overestimation caused by using only one parameter set θ in the neural network, we use parameters θ and θ′ for the predict network and the target network, respectively. Then the Q-function can be given by Eq. (20).

Loss function: to approximate the Q-function, we define the loss function as in Eq. (20) to train the weights θ of the ANN; the target-network weights θ′ are updated from θ periodically.
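Under the usual DQN formulation, this loss is the mean-squared TD error between the predict network's value and the target y = r_t + γ max_{a'} Q(s', a' | θ′), with θ′ held fixed. The sketch below is an assumed plain-Python rendering of that loss over a minibatch, not the paper's code:

```python
def dqn_loss(q_pred, q_target_next, rewards, actions, gamma=0.9):
    """Mean-squared TD error for a minibatch.

    q_pred:        per-sample action values from the predict network (theta),
                   a list of [Q(s, a_0), Q(s, a_1), ...] rows.
    q_target_next: per-sample next-state values from the target network
                   (theta'), same row shape.
    rewards:       immediate rewards r_t.
    actions:       index of the action actually taken in each sample."""
    batch = len(rewards)
    loss = 0.0
    for b in range(batch):
        y = rewards[b] + gamma * max(q_target_next[b])  # fixed target value
        loss += (y - q_pred[b][actions[b]]) ** 2        # squared TD error
    return loss / batch
```

Only θ is moved by gradient descent on this quantity; freezing θ′ between periodic copies is what stabilizes the bootstrapped target.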
In DQCA, we first obtain the locations of all nodes and gateways and select the nearest gateway for each node. Then we perform the channel allocation by Algorithm 1.
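The experience replay pool that Algorithm 1 stores transitions in and samples minibatches from can be sketched as follows (illustrative class; the capacity value is an assumption, not from the paper):

```python
import random
from collections import deque

class ReplayMemory:
    """Experience replay pool: transitions (s, a, r, s') are stored every
    step, and uniform random minibatches are drawn once enough have
    accumulated, breaking the temporal correlation in the training data."""

    def __init__(self, capacity=10000):
        # Oldest transitions are evicted automatically once full.
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.pool), batch_size)

    def __len__(self):
        return len(self.pool)
```

In Algorithm 1, sampling only begins after a warm-up threshold (step > 200) and then only every few steps, which the caller enforces before invoking `sample`.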

Simulation and Performance Evaluation
In this section, we first introduce the simulation setup, then show the simulation results and analyze the performance of the proposed algorithm.

Simulation Setup.
To verify the effectiveness of our proposed algorithm, the Q-learning algorithm and the greedy algorithm are simulated alongside the DQCA algorithm for comparison. The Q-learning algorithm builds a Q-table for each node and selects the action with the maximum Q-value among all available actions. The main idea of the greedy algorithm is to allocate, in each time slot, the channel that minimizes the growth of the cost function in the next slot [24]. The three algorithms are compared from three aspects: cost, AoI, and energy consumption. The cost refers to the overall cost of the network, calculated according to formula (6); the average AoI is the AoI averaged over all nodes; and the energy consumption is the average energy consumption of all nodes.

Algorithm 1: Channel allocation algorithm for user j based on DQN.
Input: node list, gateway list.
Initialization: 1. Initialize cost c and energy e to 0. 2. Initialize step to 1.
For episode = 1 to maximum iteration time T do
  count = 1; obtain state s_t based on the input.
  1. Repeat:
     (1) Select action a_t = arg max_a Q(s_t, a | θ).
     (2) Output the next state s_{t+1}, reward r_t, cost c, and energy e according to the count and action a_t.
     (3) Store the transition (s_t, a_t, r_t, s_{t+1}) in the replay memory.
  2. If step > 200 and step % 5 == 0: sample a random minibatch of transitions (s_t, a_t, r_t, s_{t+1}) from the replay memory pool; else continue.
  3. Update: s_t ← s_{t+1}; accumulate cost c and energy e; step += 1; count += 1. For each node in the node list, break if its packet size ≤ 0.
  4. Update the target value y = r_t + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1} | θ′).
End for

5.2. Performance Evaluation. To verify the effectiveness and feasibility of the proposed DQCA algorithm, this paper uses three different scenarios. In the first, the average data packet size is 5 M, the packet arrival interval is 50 ms, and the number of nodes in the network varies; in the second, the number of nodes is 20, the packet arrival interval is 50 ms, and the average packet size varies; in the third, the number of nodes is 20, the average packet size is 5 M, and the packet arrival interval varies. The simulation program runs on a computer with an Intel Core i7-3520M CPU at 2.90 GHz and 8 GB of RAM. The parameters used in the simulation are shown in Table 1.

Figures 5-7 study the impact of the number of nodes on network performance when the packet size and time slot length are fixed as defined in the first scenario. It can be seen from Figures 5 and 6 that the average AoI and average energy consumption of the three algorithms continuously decrease as the number of nodes increases. This is because the total AoI and energy consumption grow more slowly than the number of nodes, resulting in a decrease in the average. Meanwhile, due to the large state space, Q-learning consumes more time and computing resources and is inferior to DQCA in terms of AoI and energy consumption.
This is especially true for energy consumption. It can be seen from Figure 7 that the costs of the three algorithms all increase with the number of nodes, among which the DQCA algorithm increases the most slowly and by the smallest increment. The cost takes into account both the AoI and the energy consumption of the nodes, and the DQCA algorithm has advantages in both aspects over the other two algorithms. Therefore, its total cost is significantly lower than those of the greedy and Q-learning algorithms and can be reduced by up to 57.3% compared with the greedy algorithm. Figures 8-10 fix the number of nodes and the packet arrival interval as in the second scenario to study how network performance changes with the packet size. It can be seen that as the packet size increases, the AoI and energy consumption of the nodes also increase, so the cost increases accordingly. This is because larger packets take longer to process and transmit, the gateway waits longer for the latest update from a node, and the energy consumed by the nodes' transmitters and receivers grows accordingly.

Compared with the greedy algorithm and the Q-learning algorithm, the DQCA algorithm reduces the cost by about 62% and 60%, respectively. Figures 11-13 fix the number of nodes and the packet size as in the third scenario to study how network performance changes with the packet arrival interval. When the packet arrival interval increases, the number of packets in the network decreases, the packets sent and received by each node decrease, and the energy consumption of the nodes is therefore reduced. In addition, with a larger arrival interval, the probability that a node is allocated a channel at the gateway increases, i.e., the node's waiting time for an assigned channel is shortened. As can be seen from Figure 11, the AoI of the nodes is reduced overall. The simulation results show that, as the packet arrival interval continues to increase, its impact on the average energy consumption and cost of the nodes gradually diminishes, and the curves in Figures 12 and 13 tend to be stable. This is because when the packet arrival interval grows beyond a certain point, the basic energy consumption of a node accounts for a larger proportion of the total, and the node's energy consumption is less affected by the sending and receiving of packets.
The greedy algorithm considers only the optimal value of the current function; it takes into account neither the previous choices nor the consequences of the current choice. In practice, this approach rarely yields the best result. Therefore, in Figures 5-13, the greedy algorithm shows the worst performance compared with Q-learning and DQCA.

Conclusion
Focusing on the freshness of information in the IoMT, this paper studied the channel allocation problem oriented to the AoI. The system cost is defined as a non-decreasing function of the AoI and the energy consumption of the nodes.
Since the system cost optimization problem is difficult to solve due to the large number of users and their mobility, we adopted a DQN-based method named the DQCA algorithm. The simulation compared the proposed DQCA algorithm with the greedy and Q-learning algorithms in three different cases. The results demonstrate the superiority of the DQCA algorithm in terms of the average AoI, the average energy consumption of the nodes, and the system cost.