Optimization Algorithm for AoI-Based UAV-Assisted Data Collection

To address information freshness in unmanned aerial vehicle (UAV)-assisted data collection systems, this paper proposes a freshness-aware, UAV-assisted data collection algorithm. Subject to the communication distance of the wireless sensor nodes and the UAV parameters, an optimization problem is formulated that minimizes the average spatial correlation age of information (SCAoI) of all nodes in the area. The problem is solved by optimizing the number of clusters, the UAV flight trajectory, and the order in which cluster member nodes collect data. The maximum communication distance of the nodes is used as the cluster formation radius, and the maximum-minimum distance clustering algorithm groups the nodes in the region to obtain the minimum number of clusters. After the trajectory optimization problem in this study is proven to be NP-hard, the ant colony algorithm is applied to obtain the minimum flight time and the corresponding trajectory. A greedy algorithm then determines the order in which the member nodes of each cluster collect data, which yields the instantaneous SCAoI at the moment the UAV arrives at the cluster head. Simulation results show that, compared with the algorithm in the comparison literature, the proposed algorithm effectively improves data freshness and reduces the average SCAoI of the system by about 61%.


Introduction
The continuous development of wireless communication technologies and the Internet of Things has given rise to many real-time application systems, such as smart homes, smart transportation, and smart health [1]. For such systems, it is essential that information be current. Terminal devices (e.g., sensors, surveillance cameras, and smart wearables) need to sense the surrounding environment in real time and monitor the system status in order to make accurate and reliable decisions and control actions. If a terminal receives outdated data, system decisions may be affected and significant security risks may arise. For information update applications, the importance of information freshness is growing rapidly. To describe information freshness accurately, researchers have proposed the age of information (AoI), which many scholars now use as a measure of freshness. In applications that are sensitive to information freshness, the networking method of the system and the allocation of wireless communication resources can strongly affect the AoI of the data, and this remains an open research challenge. This paper studies how the system networking method and the allocation of wireless communication resources affect the AoI of the system.
In the queueing model of [2], AoI was used to track the freshness of state information during the update process, defined as the time elapsed between the generation of data and its reception at the receiver. In [3], AoI is used to measure the freshness of information in an information update system, and the impact of scheduling policies on the AoI performance of single-server queues is investigated.
A scheduling policy that effectively improves AoI is also designed. In [4], a multisource preemptive queueing model is investigated, together with how to regulate the generation rate of each source under various preemption strategies to achieve the highest overall information freshness. In [5], the AoI of a multisource queueing model with first-come, first-served (FCFS) service and Poisson arrivals is examined; exact formulas for the average AoI of the multisource M/M/1 and M/G/1 queueing models are established, and it is further shown that, in time-sensitive control applications, reducing the average delay alone does not reduce the AoI. There is also a large body of literature that uses AoI as a performance metric in caching networks. In [6, 7], the relationship between service delay and content freshness (as measured by AoI) in mobile edge caching networks is examined. Placing caches near the users effectively lowers the service latency of content delivery while reducing the latency and additional transmission resources required to update the cached content, and a freshness-aware cache update strategy is developed to achieve a trade-off between AoI and latency. A cache refresh system is considered in [8], where both the age of synchronization (AoS) and AoI are used to measure how recent the local cache is, and closed-form expressions for minimizing AoS and AoI are derived for larger and smaller refresh rates, respectively. In [9], a cache update algorithm based on content popularity and information freshness is proposed; it fully accounts for user mobility and the spatiotemporal dynamics of popular content and uses AoI to update content dynamically. In addition, as game theory and reinforcement learning have been widely applied in various fields, they have also been used to solve AoI-related problems. A content resale problem is discussed in [10], which proposes an AoI-oriented, cache-assisted hybrid multicast/unicast/D2D transmission architecture to increase the data transmission rate, reduce the burden of heavy data traffic, and improve system efficiency; the problem is decomposed into two subproblems, which are solved through a Stackelberg game and an auction framework, respectively. In [11], reinforcement learning is used to find the best update strategy for the AoI and the urgency of information (UoI) of real-time status information under resource constraints, where UoI further includes context-aware weights indicating whether the monitored process is in an emergency. The simulation results show that the threshold of the optimal policy increases as the resource constraint is tightened.
As stable, mobile, flexible, and low-cost flying devices, UAVs can be used for image recognition and vision processing [12–14] and as auxiliary devices in wireless communication networks. In [15], a mobile edge computing (MEC) network consisting of IoT devices, UAV base stations, edge clouds, and data centers with energy-efficient UAV support services is proposed, together with a GreenUAV-CoCaCo algorithm that jointly optimizes the communication, caching, and computation energy consumption of UAVs. In [16], cache-enabled UAVs provide contextual messaging services to end devices. Unlike traditional network traffic, contextual information changes over time, which increases the demand for AoI constraints; cache replacement and content distribution strategies are therefore designed to minimize ground network traffic according to user requests and the dynamic changes of the content. In [17], maximizing the freshness-based quality of service (QoS) under the UAV's range limit was studied: the optimization problem is modeled as a semi-Markov decision process (SMDP), and a hierarchical deep Q-network (DQN)-based path planning algorithm is then proposed to learn the best course of action. In [18], an AI-based end-to-end framework is proposed for UAV flight trajectory planning; simulations show that it achieves accuracy comparable to a commercial open-source solver and can be twice as efficient in scenarios with a large number of nodes. In [19], the concept of AoI is extended with a new metric, correlation-aware AoI (CAAoI), which assesses both the timeliness and the degree of correlation of the data collected by UAVs from the ground. In [20], UAV-assisted data collection in wireless sensor networks is investigated, where the AoI of each wireless sensor node is used to gauge information freshness.
The data collection schemes in [18, 20] both use AoI as the measure of data freshness but ignore the correlation among data originating from the same collection device or from multiple devices. As noted in [19], devices that collect the same type of information and are close to one another produce highly correlated data at the same moment, which reduces data diversity. Therefore, this paper adopts SCAoI as the performance index for measuring how recent information is, since it balances the diversity and freshness of the data. Reference [19] investigates AoI when a UAV collects data in a region with few wireless sensor nodes, whereas real-world deployments involve far more nodes than that setup. Moreover, the UAV in [19] flies above every node to collect data, so the flight time grows as the number of nodes increases, which hinders information freshness. This paper therefore proposes a SCAoI-based UAV-assisted data collection method in which all nodes are first clustered and each cluster head gathers the information acquired by its member nodes. When the UAV flies over a cluster head, the cluster head transfers its temporarily stored data to the UAV.
The following is a summary of the significant contributions in this paper.
(1) The spatial correlation age of information is used to build a UAV data collection model that measures information freshness. All nodes are organized into clusters with the maximum-minimum distance clustering algorithm; the member nodes then use the time the UAV spends moving between hovering positions to gather data and upload it to their cluster head by TDMA. The UAV flies over the cluster heads along the optimal route to receive the data uploaded by each cluster head.
(2) Closed-form expressions for the instantaneous SCAoI of the data collected at a cluster head and for the average SCAoI of all nodes are derived. Under the constraints of the maximum communication distance between nodes and the UAV flight parameters, an optimization problem that minimizes the average SCAoI is formulated. The original problem is decomposed into three subproblems that are solved separately: cluster formation, trajectory optimization, and the data collection order of cluster members.
(3) First, the maximum communication distance of the nodes is taken as the cluster formation radius, and all nodes are clustered with the maximum-minimum distance clustering algorithm to produce as few clusters as possible. Then, the trajectory optimization problem is shown to be NP-hard, and an ant colony algorithm is used to optimize the UAV flight path for the shortest possible flight time. Finally, a greedy algorithm determines the optimal data collection order of the nodes in each cluster.
(4) Simulation results show that the proposed method effectively reduces the average SCAoI. Compared with a scheme in which the UAV collects data at every node, the average SCAoI is reduced by almost 61%, and the freshness of the collected data is effectively guaranteed.
The remainder of the paper is structured as follows. Section 2 presents the UAV-assisted data collection model and formulates the optimization problem. Section 3 details the problem solution, the algorithm framework, and the design of the algorithms. Section 4 presents the simulation results, and Section 5 concludes the paper. To improve readability, the abbreviations used in this paper and their meanings are summarized in Table 1.
International Journal of Distributed Sensor Networks

System Model and Problem Formulation
2.1. System Model. Figure 1 depicts the system model used in this paper, which consists of a UAV U, a data center DC, and N wireless sensor nodes randomly distributed in a rectangular area with side length A. Each sensor node $n_i \in \mathcal{N} = \{n_1, n_2, \cdots, n_N\}$, with coordinates $L_i = (x_i, y_i)$, collects information from its surroundings. The UAV serves as a data collection tool dispatched from the DC: it collects the data captured by the nodes in the region and returns it to the DC for processing. To facilitate UAV deployment, all nodes are divided into M clusters with cluster heads $ch_j \in CH = \{ch_1, ch_2, \cdots, ch_M\}$, where $CH \subseteq \mathcal{N}$. The binary variable $\gamma_{ij} = 1$ indicates that node $n_i$ belongs to the cluster headed by $ch_j$; otherwise, $\gamma_{ij} = 0$. The nodes in the same cluster as $ch_j$ form the set $C_j = \{n_i \mid \gamma_{ij} = 1, n_i \in \mathcal{N}\}$, $1 \le j \le M$, and the number of nodes in $C_j$ is denoted by $N_j$. The cluster head serves two purposes: it gathers data from the member nodes, and it determines the hovering position of the UAV. While the UAV moves between hovering positions, the member nodes upload their collected information to the cluster head in a certain order by TDMA. When the UAV hovers above a cluster head, the cluster head transmits all the information gathered from its member nodes to the UAV, after which the UAV moves on to the next cluster head. The UAV follows the trajectory $V = \{v_0, v_1, \cdots, v_M, v_{M+1}\}$; since it starts from the DC and eventually returns to it, $v_0 = v_{M+1} = DC$. If the speed v and altitude h remain unchanged, the hovering position of the UAV can be represented by the coordinates of the corresponding cluster head. When the UAV moves from $v_{l-1}$ to $v_l$, the flight time is

$$f_{l-1,l} = \frac{\sqrt{(x_l - x_{l-1})^2 + (y_l - y_{l-1})^2}}{v},$$

where $(x_{l-1}, y_{l-1})$ and $(x_l, y_l)$ are the coordinates of the cluster heads corresponding to $v_{l-1}$ and $v_l$ in the trajectory, respectively. For brevity, $f_{l-1,l}$ is denoted as $f_l$ in the rest of the paper.
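As an illustration of the flight-time expression above, the following Python sketch computes $f_1, \dots, f_{M+1}$ for a given trajectory. It assumes, as in the text, a constant speed v and a fixed altitude h, so only the horizontal distance between consecutive hovering positions matters; the function name `flight_times` and the coordinates in the example are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def flight_times(trajectory: List[Point], speed: float) -> List[float]:
    """Return [f_1, ..., f_{M+1}] for a trajectory [v_0, v_1, ..., v_{M+1}].

    Assumes straight-line flight at constant speed and fixed altitude, so
    f_l is the horizontal distance between v_{l-1} and v_l divided by speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    times = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        times.append(math.hypot(x1 - x0, y1 - y0) / speed)
    return times


# Hypothetical example: DC at the origin, two cluster heads, return to DC.
dc = (0.0, 0.0)
route = [dc, (300.0, 400.0), (300.0, 0.0), dc]
print(flight_times(route, speed=10.0))  # [50.0, 40.0, 30.0]
```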
Figure 1 shows an example of the process by which a UAV, acting as an assistant device dispatched from the DC, collects node information. The UAV trajectory is defined as V; in Figure 1, M = 6. Following trajectory V, the UAV starts from the DC and first flies to $v_1$, the area where cluster head $ch_5$ is located. During the flight time $f_1$ from the DC to $v_1$, the nodes in $C_5$ upload their collected data to $ch_5$ by TDMA in the order $G = \{g_1, g_2, \cdots, g_{N_5}\}$. After arriving at $ch_5$, the UAV hovers for a while to receive the data sent by the cluster head, then flies through $v_2$, $v_3$, $v_4$, $v_5$, and $v_6$ in turn and finally returns to the DC; that is, the final flight path of the UAV is $V = \{DC, ch_5, ch_1, ch_4, ch_6, ch_2, ch_3, DC\}$.
If the UAV, acting as an assistant data collection device, flew over every node to collect its information, traversing all nodes would take a long flight time and reduce the freshness of the information. Reducing the time it takes data to travel from the source to the destination is therefore vital to keeping the information fresh. In this paper, this time is determined by three factors: the number of clusters M, the UAV flight trajectory V, and the order G in which the member nodes gather data. The number of clusters equals the number of hovering locations along the UAV flight path, so the fewer the clusters, the shorter the flight time of the data-gathering procedure, which helps preserve data freshness. The sensor nodes transmit the collected data as time-stamped packets. Assuming that the maximum communication radius of a node is R, the maximum cluster formation radius is also R; that is, the distance between a cluster member node $n_i$ and its cluster head $ch_j$ must satisfy

$$d_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \le R.$$

The binary variable $\gamma_{ij} = 1$ indicates that node $n_i$ belongs to the cluster headed by $ch_j$; otherwise, $\gamma_{ij} = 0$. Every node $n_i$ is assigned to one cluster, and the clustering of all nodes can be represented by the matrix $\Upsilon = [\gamma_{ij}]$. Once clustering is complete, the UAV must hover directly above each cluster head to gather data, so the UAV path must be optimized: a sensible flight path lets the UAV finish the data-gathering operation in the least possible time, which further improves information freshness. In addition, the AoI used in this paper differs from the traditional definition in that it accounts for the spatial correlation between data collection devices caused by their proximity. This effect is most evident among nodes in the same cluster: such nodes are close to each other, so similar data collected at the same moment are highly correlated [21], which diminishes data diversity and hinders data analysis. Consequently, we construct a data-gathering order for the member nodes that accounts for both information freshness and the correlation between collection devices.
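The radius constraint and the membership rule can be checked mechanically. The sketch below is a hypothetical helper, not part of the paper's algorithms: it takes the node coordinates, the node index of each cluster head, and the binary matrix $\Upsilon = [\gamma_{ij}]$, and reports whether every node belongs to exactly one cluster and lies within R of its cluster head.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_valid_clustering(nodes: List[Point], heads: List[int],
                        gamma: List[List[int]], radius: float) -> bool:
    """Check a clustering Upsilon = [gamma_ij] against the radius constraint.

    nodes  : coordinates L_i of all N nodes
    heads  : heads[j] is the node index of cluster head ch_j
    gamma  : N x M binary matrix, gamma[i][j] == 1 iff n_i is in cluster j
    radius : maximum communication distance R
    """
    for i, row in enumerate(gamma):
        if sum(row) != 1:                        # exactly one cluster per node
            return False
        j = row.index(1)
        hx, hy = nodes[heads[j]]
        x, y = nodes[i]
        if math.hypot(x - hx, y - hy) > radius:  # d_ij <= R must hold
            return False
    return True
```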

Cluster Member Node and Cluster Head Node Communication Model.
Member nodes communicate with their cluster head over a ground point-to-point link with transmission rate

$$R_{ch} = B \log_2\left(1 + \frac{p_i g}{p_n}\right),$$

where B is the system bandwidth, $p_i$ is the transmit power of the node, $p_n$ is the noise power, and $g = \beta_0 d^{-\alpha}$ is the channel gain, in which $\beta_0$ is the channel gain at the reference distance, d is the distance between the cluster head and the member node, and α is an environment-dependent constant. A member node needs $t_{ch} = L/R_{ch}$ seconds to send a packet of L bits to the cluster head. The nodes sleep most of the time. When the UAV, following the trajectory V, flies from $v_l$ ($l \ne M$) to $v_{l+1}$, the nodes in the cluster corresponding to $v_{l+1}$ wake up, collect data in a certain order, and send it to the cluster head by TDMA. Ignoring the time the nodes need to sense the information, the TDMA slot length equals the time a member node needs to send its information to the cluster head. Because the data gathered by every member node must reach the cluster head, if the UAV arrives above the cluster head before all members have finished uploading, it must extend its hovering time until all uploads are complete, after which the cluster head uploads the data to the UAV.
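The ground-link rate and the TDMA waiting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: each member node sends exactly one packet of L bits in its own TDMA slot, uploads start when the UAV leaves the previous hovering position, and all quantities are in linear units (watts, hertz, meters, seconds). The function names and example values are hypothetical.

```python
import math
from typing import List


def ground_rate(bandwidth: float, p_tx: float, p_noise: float,
                beta0: float, dist: float, alpha: float) -> float:
    """R_ch = B * log2(1 + p_i * g / p_n) with channel gain g = beta0 * d**(-alpha)."""
    gain = beta0 * dist ** (-alpha)
    return bandwidth * math.log2(1.0 + p_tx * gain / p_noise)


def extra_hover_time(slot_times: List[float], flight_time: float) -> float:
    """Extra hovering the UAV needs if TDMA uploads outlast its flight f_j.

    slot_times holds t_ch = L / R_ch for each member node, sent back to back;
    if they finish before the UAV arrives, no extra hovering is needed.
    """
    return max(0.0, sum(slot_times) - flight_time)


# Hypothetical example: three members, 1 Mbit packets, 1 MHz bandwidth.
packet_bits = 1e6
rates = [ground_rate(1e6, 1.0, 1.0, 1.0, d, 2.0) for d in (1.0, 1.0, 1.0)]
slots = [packet_bits / r for r in rates]          # t_ch = L / R_ch, 1 s each
print(extra_hover_time(slots, flight_time=2.0))   # 1.0
```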

Communication Model between Cluster Head Node and UAV.
Data are uploaded from the cluster head to the UAV using the probabilistic air-to-ground channel model of [22]. In the presence of additive white Gaussian noise, the data transmission rate from the cluster head to the UAV flying above it can be expressed as

$$R_u = B \log_2\left(1 + \frac{p_i 10^{-\varphi_i/10}}{p_n}\right),$$

where $\varphi_i$ is the average path loss over line-of-sight (LoS) and non-line-of-sight (NLoS) transmission, given by

$$\varphi_i = P_i^{LoS} \varphi_i^{LoS} + \left(1 - P_i^{LoS}\right) \varphi_i^{NLoS},$$

in which $P_i^{LoS}$ is the LoS probability, and $\varphi_i^{LoS}$ and $\varphi_i^{NLoS}$ are the path losses of LoS and NLoS transmission, respectively. The LoS probability is

$$P_i^{LoS} = \frac{1}{1 + \beta_1 \exp\left(-\beta_2 (\theta_i - \beta_1)\right)},$$
where $\beta_1$ and $\beta_2$ are constants that depend on the environment, and $\theta_i$ is the elevation angle between the node uploading data and the UAV; since the UAV collects data directly above the node, $\theta_i = \pi/2$. The LoS and NLoS path losses are

$$\varphi_i^{LoS} = PL_i + \zeta_{LoS}, \qquad \varphi_i^{NLoS} = PL_i + \zeta_{NLoS},$$

where $PL_i = 20\log_{10}(4\pi h f_c / c)$ is the free-space path loss, h is the UAV flight altitude, $f_c$ is the carrier frequency, c is the speed of light, and $\zeta_{LoS}$ and $\zeta_{NLoS}$ are constant additional path losses. The cluster head needs $t_u = L/R_u$ seconds to send a packet of L bits to the UAV.
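The air-to-ground model above can be evaluated numerically as sketched below. The sketch assumes path losses in decibels and powers in watts, and it assumes θ is expressed in whatever unit the constants β1 and β2 were fitted for; the parameter values in the example are placeholders, not the paper's simulation settings, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def avg_path_loss_db(h: float, fc: float, theta: float, beta1: float,
                     beta2: float, zeta_los: float, zeta_nlos: float) -> float:
    """phi_i = P_LoS * phi_LoS + (1 - P_LoS) * phi_NLoS, all in dB."""
    fspl = 20.0 * math.log10(4.0 * math.pi * h * fc / SPEED_OF_LIGHT)  # PL_i
    p_los = 1.0 / (1.0 + beta1 * math.exp(-beta2 * (theta - beta1)))
    return p_los * (fspl + zeta_los) + (1.0 - p_los) * (fspl + zeta_nlos)


def air_ground_rate(bandwidth: float, p_tx: float, p_noise: float,
                    path_loss_db: float) -> float:
    """R_u = B * log2(1 + p_i * 10**(-phi_i / 10) / p_n)."""
    snr = p_tx * 10.0 ** (-path_loss_db / 10.0) / p_noise
    return bandwidth * math.log2(1.0 + snr)


# Placeholder parameters (assumed): 100 m altitude, 2 GHz carrier, and
# beta1/beta2 assumed to be fitted for theta in degrees, so theta = 90.
phi = avg_path_loss_db(h=100.0, fc=2e9, theta=90.0, beta1=9.61, beta2=0.16,
                       zeta_los=1.0, zeta_nlos=20.0)
rate = air_ground_rate(bandwidth=1e6, p_tx=0.1, p_noise=1e-13, path_loss_db=phi)
t_u = 1e6 / rate  # seconds to upload one 1 Mbit packet (t_u = L / R_u)
```

Because the LoS probability enters only as a weight between the two path losses, setting ζ_LoS equal to ζ_NLoS makes φ_i independent of θ, which is a quick sanity check on any parameter set.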

AoI Model.
AoI measures how recent a piece of information is: it is the time elapsed between the moment data are generated at the source and the moment they are received at the receiver. The AoI at $ch_j$ at time t can be defined as

$$\Delta_j(t) = t - u_k(t),$$

where $u_k(t)$ is the generation time (timestamp) of the most recent data received by $ch_j$ up to time t. If the packet with timestamp $u_k$ reaches the cluster head at time $\tau_k$, then $\tau_k = u_k + l_k$, where $l_k$ is the communication delay to the cluster head. Whenever the cluster head receives new data, its information becomes fresher overall, i.e., its AoI decreases, as shown in Figure 2. Accordingly, the drop in AoI when the first member node uploads data to the cluster head is $r_1 = \tau_1 - l_1 = u_1$, the drop at the second arrival is $r_2 = \tau_2 - l_2 - r_1 = u_2 - u_1$, and similarly $r_3 = \tau_3 - l_3 - r_1 - r_2 = u_3 - u_2$. As shown in Figure 2, the AoI at time t can therefore be stated as

$$\Delta_j(t) = t - \sum_{k=1}^{U(t)} r_k. \tag{6}$$

Because the drops telescope, equation (6) can be rewritten as

$$\Delta_j(t) = t - u_{U(t)},$$

where U(t) is the total number of data packets received by $ch_j$ up to time t and $u_k$ is the generation timestamp of the kth uploaded data.
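The telescoping of the AoI drops $r_k$ described above, under which the accumulated drops equal the newest received timestamp, can be checked numerically. The sketch assumes, as in the TDMA setting, that packets reach the cluster head in the order in which they were generated and that the AoI starts from t (i.e., $u_0 = 0$); the function names and example timestamps are hypothetical.

```python
from typing import List


def aoi_drops(stamps: List[float]) -> List[float]:
    """r_1 = u_1 and r_k = u_k - u_{k-1}: AoI drop at each arrival (u_0 = 0)."""
    previous = [0.0] + stamps[:-1]
    return [u - p for p, u in zip(previous, stamps)]


def aoi_at(t: float, stamps: List[float], arrivals: List[float]) -> float:
    """Delta_j(t) = t - sum of r_k over the U(t) packets received by time t.

    stamps[k] is u_k and arrivals[k] is tau_k = u_k + l_k; both are assumed
    to be sorted in upload order, so the received packets form a prefix.
    """
    received = sum(1 for tau in arrivals if tau <= t)   # U(t)
    return t - sum(aoi_drops(stamps)[:received])


stamps = [1.0, 2.0, 4.0]          # generation times u_k
arrivals = [1.5, 2.5, 4.5]        # arrival times tau_k
print(aoi_at(3.0, stamps, arrivals))   # 1.0  (= 3.0 - u_2)
print(aoi_at(5.0, stamps, arrivals))   # 1.0  (= 5.0 - u_3)
```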
Because the data gathered by spatially adjacent member nodes are strongly correlated, the effect of spatial correlation on the instantaneous AoI of the cluster head is taken into account. Following [19], the instantaneous SCAoI of $ch_j$ at time t, denoted $\Delta_j^C(t)$, is defined to characterize its instantaneous freshness. Its definition involves the correlation coefficient $\beta_k$ (with $\beta_1 = 0$), the minimum distance $d_k$ between the kth member node that uploads data to the cluster head and the preceding k − 1 member nodes, and the constant $\rho_s$, which represents the strength of spatial correlation.
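The distance term $d_k$ depends only on the upload order within a cluster, so it can be computed as sketched below. The exact SCAoI expression follows [19] and is not reproduced here; the exponential mapping $\beta_k = \exp(-d_k/\rho_s)$ in `correlation_coeffs` is a hypothetical placeholder, chosen only because it gives $\beta_1 = 0$ when the first uploader has no predecessor, and it should be replaced by the model of [19].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def min_prev_distances(order: List[Point]) -> List[float]:
    """d_k: distance from the k-th uploader to the nearest of the previous k-1.

    The first uploader has no predecessor; math.inf marks that case.
    """
    dists = [math.inf]
    for k in range(1, len(order)):
        xk, yk = order[k]
        dists.append(min(math.hypot(xk - x, yk - y) for x, y in order[:k]))
    return dists


def correlation_coeffs(order: List[Point], rho_s: float) -> List[float]:
    """HYPOTHETICAL beta_k = exp(-d_k / rho_s); beta_1 = 0 as in the text."""
    return [0.0 if math.isinf(d) else math.exp(-d / rho_s)
            for d in min_prev_distances(order)]
```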

Problem Formulation.
The UAV starts at the DC and collects the information gathered by all nodes in the area. Let the UAV trajectory be V with $v_j = ch_j$ for $1 \le j \le M$ (i.e., excluding $v_0$ and $v_{M+1}$). Taking cluster $C_j$, whose cluster head is $ch_j$, as an example, we analyze in detail how the cluster's data are generated, transmitted, and finally delivered to the DC. The AoI of this process is the sum of three time components. First, the member nodes in $C_j$ use the flight time $f_j$ of the UAV from $ch_{j-1}$ to $ch_j$ to collect information in order G, and all member nodes must transmit their data to $ch_j$. Assuming that the UAV arrives at $ch_j$ at time $t_j^a$, the instantaneous SCAoI of the cluster head at that moment, $\Delta_j^C(t_j^a)$, represents the current freshness of the cluster head. Second, the data temporarily stored at the cluster head must be uploaded to the UAV, for which the UAV must hover for a certain time. Denoting the hovering time at the jth cluster head by $h_j$, the cumulative hovering time of the jth to Mth clusters in the trajectory is $H = \sum_{l=j}^{M} h_l$. The last part is the flight time: denoting by $f_{j+1}$ the flight time from the jth to the (j + 1)th cluster, the cumulative flight time is, analogously, $F = \sum_{n=j+1}^{M+1} f_n$. The data of every cluster undergo the same process, as shown in Figure 3. The UAV offloads the data collected from $C_j$ to the DC at time t, at which point the instantaneous SCAoI of $C_j$ is

$$\Delta_{C_j}(t) = \Delta_j^C(t_j^a) + \sum_{l=j}^{M} h_l + \sum_{n=j+1}^{M+1} f_n.$$

The AoI of all nodes in the system is illustrated in Figure 4, and the average SCAoI of all nodes is given by equation (12). This research aims to minimize the average SCAoI of all nodes in the system by optimizing the number of clusters M, the UAV flight trajectory V, and the information collection order G of the nodes in each cluster; the resulting optimization problem P is given in equation (13). Constraint (C1) states that the distance between a member node and its cluster head cannot exceed the maximum communication distance of the nodes. Constraint (C2) in equation (13) expresses how the hovering duration of the UAV depends on the number of nodes in the cluster and on the data rate from the cluster head to the UAV. Constraint (C3) relates the flight time of the UAV to the distance between its two consecutive hovering positions.
Figure 2: Illustration of the AoI of the cluster head node.
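The accumulation of hovering and flight times in the instantaneous SCAoI at the DC can be sketched as follows. The sketch assumes clusters are indexed in visiting order ($v_j = ch_j$) and that the arrival-time values $\Delta_j^C(t_j^a)$ are already known; how these per-cluster values are weighted in the average of equation (12) is not reproduced here, and the function name and example values are hypothetical.

```python
from typing import List


def scaoi_at_dc(arrival_scaoi: List[float], hover: List[float],
                flight: List[float]) -> List[float]:
    """Delta_{C_j} at the DC = Delta_j(t_j^a) + sum_{l=j}^{M} h_l + sum_{n=j+1}^{M+1} f_n.

    arrival_scaoi[j-1] : instantaneous SCAoI at ch_j when the UAV arrives
    hover[j-1]         : hovering time h_j at ch_j   (length M)
    flight[n-1]        : flight time f_n of leg n    (length M + 1)
    """
    m = len(hover)
    if len(arrival_scaoi) != m or len(flight) != m + 1:
        raise ValueError("expected M arrival values, M hover times, M+1 legs")
    result = []
    for j in range(m):                           # 0-based j is cluster j + 1
        remaining_hover = sum(hover[j:])         # h_j + ... + h_M
        remaining_flight = sum(flight[j + 1:])   # f_{j+1} + ... + f_{M+1}
        result.append(arrival_scaoi[j] + remaining_hover + remaining_flight)
    return result


print(scaoi_at_dc([2.0, 3.0], [1.0, 1.0], [10.0, 5.0, 8.0]))  # [17.0, 12.0]
```

Clusters visited early accumulate every later hovering and flight interval, which is why the visiting order matters for the average SCAoI.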

Problem Solving and Algorithm Design
3.1. Problem-Solving Framework. Equation (12) shows that the average SCAoI is a weighted sum of the instantaneous SCAoI at the moment the UAV reaches each cluster head, the hovering times, and the flight times, where the weighting factors depend on the number of clusters and the trajectory. Because the variables are tightly coupled (the UAV trajectory and the clustering of the nodes are closely tied to the order in which the nodes collect data), problem P cannot be solved directly, so it is decomposed into three subproblems: subproblem 1 optimizes cluster formation, subproblem 2 optimizes the trajectory, and subproblem 3 optimizes the data collection order of the nodes within each cluster.
First, once all nodes have been clustered, the set CH of cluster heads, the number M of cluster heads, the node set C of each cluster, and the size C_size of each cluster are determined. The coordinates of the cluster heads and the data center DC are then used as inputs to the trajectory problem to determine the best UAV flight trajectory V. Finally, given the nodes C of each cluster, the data collection order G of the cluster nodes is determined. Algorithm 1 gives a full description of the process.
3.2. Distance-Based Clustering Method. Based on the above analysis, all nodes should be reasonably divided into clusters, and an appropriate cluster head should be chosen in each cluster as the location where the UAV hovers to collect the data gathered by the member nodes. Each additional hovering position increases the UAV flight time, which harms information freshness, so the number of clusters should be kept as small as possible. However, wireless sensor nodes have a finite communication range and cannot communicate beyond it, so when a cluster is constructed, the distance between the cluster head and each member node must not exceed the maximum communication radius of the nodes. Subproblem 1, P1 in equation (14), therefore minimizes the number of clusters subject to this distance constraint. In this paper, a combination of maximum-minimum distance clustering and nearest-neighbor clustering is used to solve P1. The algorithm uses the Euclidean distance between nodes as the main criterion for choosing cluster heads. First, any node can serve as the initial cluster head; the next cluster head is the node farthest from the first cluster head, and this distance must exceed the maximum communication radius. Next, the distance from each remaining node to its nearest existing cluster head is computed. If the maximum of these minimum distances exceeds the cluster radius, the current cluster heads cannot cover all nodes as required, so a new cluster head must be added, namely the node attaining this maximum. Repeating this step, that is, recomputing the maximum of the minimum distances between the remaining nodes and the cluster heads until it is less than or equal to the cluster formation radius, identifies all cluster heads. Finally, all nodes are clustered by the nearest-neighbor rule, each non-head node joining its nearest cluster head.
7: … C_size ⟵ temp C_size, M; obtain the result of subproblem 1;
8: Using the coordinates of CH and DC, optimize the UAV trajectory V, and obtain the optimal solution of subproblem 2;
9: for j = 1 : |C| do
10: for k = 1 : C_size(j) do
11: Optimize the order G of data collection from the member nodes within the cluster, and obtain the optimal solution of subproblem 3;
12: end for
13: end for
14: Solve the problem P.
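The cluster-head selection of Section 3.2 can be sketched in Python as follows. This is a minimal illustration, not the paper's code: it takes the first node as the initial cluster head, repeatedly promotes the node whose distance to its nearest existing head is largest while that distance exceeds R, and then assigns every node to its nearest head. `math.dist` requires Python 3.8 or later, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def max_min_clustering(nodes: List[Point], radius: float
                       ) -> Tuple[List[int], Dict[int, List[int]]]:
    """Select cluster heads by max-min distance, then assign nodes to them.

    Returns (heads, clusters): heads lists node indices of the cluster heads
    (CH) in selection order, and clusters maps each head index to the indices
    of its members (the head itself included).
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not nodes:
        return [], {}
    heads = [0]                                   # ch_1: an arbitrary node
    # min_dist[i]: distance from node i to its nearest current cluster head
    min_dist = [math.dist(p, nodes[0]) for p in nodes]
    while True:
        far = max(range(len(nodes)), key=lambda i: min_dist[i])
        if min_dist[far] <= radius:               # all nodes are within R
            break
        heads.append(far)                         # promote the max-min node
        for i, p in enumerate(nodes):
            min_dist[i] = min(min_dist[i], math.dist(p, nodes[far]))
    clusters: Dict[int, List[int]] = {h: [] for h in heads}
    for i, p in enumerate(nodes):                 # nearest-neighbor assignment
        nearest = min(heads, key=lambda h: math.dist(p, nodes[h]))
        clusters[nearest].append(i)
    return heads, clusters


heads, clusters = max_min_clustering(
    [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)], radius=2.0)
print(heads, clusters)  # [0, 3] {0: [0, 1], 3: [2, 3]}
```

Max-min selection is a greedy heuristic: it guarantees that every node lies within R of some head, but this sketch makes no claim that the resulting M is the global minimum.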

Ant Colony Algorithm-Based Trajectory Optimization.
Based on the clustering results, the UAV trajectory problem over the M cluster heads and the data center DC is solved. Since the UAV flight trajectory affects the hovering time and flight time in equation (13), subproblem 2 can be written as P2 in equation (15). We first prove that P2 is NP-hard.
Proof.According to Algorithm 2, the number of clusters formed, the size of each cluster, the distance between cluster head nodes, and the hovering duration h of the UAV over each cluster head node can both be determined using the coordinates of cluster head nodes.P2 can be viewed as the shortest flight time that solves for DC as the starting point and travels through each cluster head node before arriving back at DC.The shortest time problem is equivalent to the shortest path problem during this operation because the flight speed of the UAV is constant.As in [23], if a certain typical NP-hard problem can be reduced to P2, then it is possible to show that P2 is identical to the NP-hard problem.
The description of P2 closely resembles the classical traveling salesman problem (TSP): given the coordinates of a set of cities, find the shortest route that visits each city exactly once. By mapping each city to a cluster head with hover time h and flight time f, a TSP instance can be mapped onto P2; hence P2 is also NP-hard.
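Because P2 has a TSP-like tour at its core, small instances can be solved exactly by enumerating all M! visiting orders, which is useful for checking a heuristic such as the ant colony algorithm. The sketch below minimizes only the total flight time of the closed tour that starts at the DC, visits every cluster head, and returns to the DC at constant speed; the hovering-time terms and weights of equation (15) are not modeled, and the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_time(order: Sequence[int], heads: List[Point], dc: Point,
              speed: float) -> float:
    """Flight time of DC -> heads[order[0]] -> ... -> heads[order[-1]] -> DC."""
    path = [dc] + [heads[i] for i in order] + [dc]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:])) / speed


def brute_force_tour(heads: List[Point], dc: Point,
                     speed: float) -> Tuple[Tuple[int, ...], float]:
    """Exact minimum over all M! visiting orders (only feasible for small M)."""
    best = min(itertools.permutations(range(len(heads))),
               key=lambda order: tour_time(order, heads, dc, speed))
    return best, tour_time(best, heads, dc, speed)


# Hypothetical example: three cluster heads on the corners of a 10 x 10 square.
order, t = brute_force_tour([(0.0, 10.0), (10.0, 10.0), (10.0, 0.0)],
                            (0.0, 0.0), speed=1.0)
print(order, t)  # (0, 1, 2) 40.0
```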
In this study, P2 is solved with the ant colony algorithm, because NP-hard problems generally cannot be solved efficiently by casting them as convex optimization problems. In the ant colony metaphor, ants cannot perceive the distribution of food directly and rely on the pheromones left by their peers during foraging to locate food. Pheromone is a biological secretion that evaporates over time after being deposited. Therefore, a path on which more pheromone has accumulated is one that more ants have recently used to reach food, which makes it a better route than the others and attracts still more ants to it.
In this paper, each cluster head, together with its hover time h and flight time f, is mapped to a city, i.e., a location where the ants must find food. First, the system parameters are initialized, such as the number of ants, the initial pheromone concentration, the pheromone evaporation factor, and the maximum number of iterations, and all ants start their path search from the coordinates of the DC. At each step, the transition probability of the path search determines which cluster head an ant visits next, until the ant has visited all cluster heads. Because each cluster head has a corresponding hover time and flight time, the value of equation (15) is calculated for the path of each ant, and the trajectory that minimizes equation (15) in the current iteration is recorded. The pheromone concentrations on the paths are then updated, and the next round of path finding is performed. Once the maximum number of iterations has been reached, the trajectory with the smallest value of equation (15) over all iterations is output. The detailed steps of the algorithm are as follows.
Step 1. Initialize the relevant parameters and place $N_a$ ants in the system, all of which start their path exploration from the coordinates of the DC. The allowed table records the nodes that have not yet been visited, and the tabu table records the nodes that have already been visited.
Step 2. Determine the next node to be visited. The probability that the mth ant moves from $ch_i$ to $ch_j$ in round t is denoted $P^m_{ij}(t)$ and is given by

$$P^m_{ij}(t)=\begin{cases}\dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{s\in \text{allowed}_m}[\tau_{is}(t)]^{\alpha}\,[\eta_{is}(t)]^{\beta}}, & ch_j\in \text{allowed}_m,\\[2mm] 0, & \text{otherwise},\end{cases} \tag{16}$$

where $\tau_{ij}(t)$ is the pheromone concentration on the route between $ch_i$ and $ch_j$, and the heuristic function $\eta_{ij}(t)$ is the reciprocal of the length of the path connecting $ch_i$ and $ch_j$. The pheromone factor α indicates how strongly the pheromone concentration influences the choice of the next node to be visited, and β is the heuristic function factor; both α and β are constants.

The allowed table records the nodes that the mth ant has not yet visited, and the tube table records the sequence of nodes it has already visited. The two tables are complementary, and together they cover the data center and all cluster head nodes.

The next node to be visited is the one with the highest $P^m_{ij}(t)$ value.

Input: Coordinates L of the N nodes in the region, cluster radius R;
Output: The number of clusters M, the set of cluster heads CH, the nodes within each cluster C, the size of each cluster $C_{size}$;
1: for i = 1 : N do
2:   $ch_1 = n_1 \in N$, record $CH = \{ch_1\}$;
3:   $ch_2 = n_j$, where $n_j \in N$, $n_j \notin CH$, and $n_j$ is the remaining node farthest from $ch_1$; record $CH = \{ch_1, ch_2\}$;
4:   if $\max_{n_j \in N,\, n_j \notin CH} \min d(n_j, CH) > R$ then
5:     $ch_3$ ⟵ the node whose minimum distance to CH is maximal (and greater than R);
6:   else output CH;
7:   end if
8: end for
9: Complete the clustering by assigning each remaining node to its nearest cluster head.
Algorithm 2: Cluster formation algorithm based on maximum-minimum distance.
International Journal of Distributed Sensor Networks
Step 3. When all ants have visited every node and returned to the DC, one round of trajectory planning is complete; the value of equation (15) is then calculated for the trajectory explored by each ant.
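Step 4. Pheromone update. After a round of trajectory planning, the ants have left pheromones along their routes. The pheromone on the path between $ch_i$ and $ch_j$ is updated as

$$\tau_{ij}(t+1)=(1-\rho)\,\tau_{ij}(t)+\Delta\tau_{ij}(t),\qquad \Delta\tau_{ij}(t)=\sum_{m=1}^{N_a}\Delta\tau^m_{ij}(t), \tag{17}$$

$$\Delta\tau^m_{ij}(t)=\begin{cases}1/D_m, & \text{if the mth ant travels between } ch_i \text{ and } ch_j \text{ in round } t,\\ 0, & \text{otherwise},\end{cases} \tag{18}$$

where ρ is the pheromone volatilization coefficient and $D_m$ denotes the length of the path traveled by the mth ant. Equation (17) states that the pheromone on a path after round t + 1 equals the pheromone remaining on it from round t after volatilization plus the newly added pheromone. The added pheromone is the sum of the pheromone deposited on the path by all ants, and the amount deposited by each ant is the reciprocal of its path length, as in equation (18).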
Step 5. Iteration and termination. If the number of completed iterations is less than the maximum $I_{max}$, increment the iteration counter by 1 and return to Step 2; once the counter reaches $I_{max}$, the algorithm terminates and outputs the shortest trajectory found.
The algorithmic procedure is described in Algorithm 3.
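The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation. Equation (15) is not reproduced in this section, so the hypothetical `trajectory_cost` assumes it equals the total flight time (route length divided by speed v) plus the sum of the cluster-head hover times. Index 0 plays the role of the DC, and the parameter values (`n_ants`, `i_max`, `alpha`, `beta`, `rho`) are illustrative. One deliberate swap: the text picks the node with the highest transition probability, whereas this sketch samples the next node in proportion to that probability (the usual roulette-wheel rule), because purely deterministic choices would send every ant along the same route.

```python
import itertools
import math
import random


def tour_length(tour, pts):
    """Total Euclidean length of a route given as a list of point indices."""
    return sum(math.dist(pts[a], pts[b]) for a, b in zip(tour, tour[1:]))


def trajectory_cost(tour, pts, hover, v):
    """Hypothetical stand-in for equation (15): flight time plus hover times."""
    return tour_length(tour, pts) / v + sum(hover[k] for k in tour[1:-1])


def aco_trajectory(pts, hover, v, n_ants=20, i_max=100,
                   alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Ant colony search for a closed UAV route; pts[0] is the DC."""
    rng = random.Random(seed)
    n = len(pts)
    tau = [[1.0] * n for _ in range(n)]  # initial pheromone on every edge
    # Heuristic eta_ij = 1 / distance (diagonal unused).
    eta = [[0.0 if i == j else 1.0 / math.dist(pts[i], pts[j])
            for j in range(n)] for i in range(n)]
    best_tour, best_cost = None, math.inf
    for _ in range(i_max):
        tours = []
        for _ in range(n_ants):
            tour, allowed = [0], set(range(1, n))  # Step 1: start at the DC
            while allowed:                         # Step 2: choose next head
                i = tour[-1]
                cand = sorted(allowed)
                w = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                j = rng.choices(cand, weights=w)[0]  # roulette-wheel pick
                tour.append(j)
                allowed.remove(j)
            tour.append(0)                         # return to the DC
            tours.append(tour)
        for row in tau:                            # Step 4: volatilization
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour in tours:                         # Step 3 and deposit
            c = trajectory_cost(tour, pts, hover, v)
            if c < best_cost:
                best_tour, best_cost = tour, c
            deposit = 1.0 / tour_length(tour, pts)  # reciprocal of path length
            for a, b in zip(tour, tour[1:]):
                tau[a][b] += deposit
                tau[b][a] += deposit
    return best_tour, best_cost


def brute_force_trajectory(pts, hover, v):
    """Exact optimum by enumeration; only feasible for a few cluster heads."""
    best = min(([0, *p, 0] for p in itertools.permutations(range(1, len(pts)))),
               key=lambda t: trajectory_cost(t, pts, hover, v))
    return best, trajectory_cost(best, pts, hover, v)


if __name__ == "__main__":
    pts = [(350, 150), (50, 50), (50, 250), (150, 150), (250, 50), (250, 250)]
    hover = [0, 1.2, 0.8, 1.0, 0.9, 1.1]  # hypothetical hover times (s)
    print(aco_trajectory(pts, hover, v=15))
    print(brute_force_trajectory(pts, hover, 15))
```

For a handful of cluster heads, `brute_force_trajectory` can serve as a check on the ant colony result; because the number of routes grows factorially with M, it is not a practical solver for realistic instances.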

Greedy Algorithm-Based Data Collection Sequence
Once problems P1 and P2 have been solved, the clustering of the system nodes and the UAV flight trajectory are known. Let the UAV trajectory be $V = \{v_0, v_1, \dots, v_{M+1}\}$ with $v_j = ch_j$ for $j \neq 0, M+1$ (i.e., $1 \le j \le M$). For cluster head $ch_j$, the member nodes start collecting data at the moment the UAV finishes receiving data from $ch_{j-1}$ and departs for $ch_j$; this moment is regarded as the initial moment of information collection in cluster $C_j$. As shown in Figure 3, the nodes in cluster $C_j$ use the flight time $f_j$ for data collection, so the UAV arrives at $ch_j$ at time $f_j$ measured from this initial moment.
According to equation (9), the instantaneous SCAoI of the data collected when the UAV arrives at $ch_j$ is given by equation (19), where $t^{ch}_{k-1}$ is the time taken by the (k − 1)th node to send its information to the cluster head node, with a separate definition for k = 1. By the definition in equation (9), the order in which the cluster nodes gather data affects the instantaneous SCAoI of the collected data when the UAV reaches $ch_j$. The data collection order of the nodes in each cluster therefore needs to be optimized, and subproblem 3 can be written as equation (20). The correlation coefficient $\beta_k$ of the kth uploaded node depends on its shortest distance to the k − 1 nodes that have already uploaded. The optimal instantaneous SCAoI could be obtained by exhaustively enumerating every collection order. Although this yields the exact optimum, a cluster with $N_i$ member nodes has $N_i!$ possible orders, so for large clusters exhaustive search consumes substantial computational resources and greatly increases the complexity of the algorithm. Therefore, this paper uses a greedy algorithm: from the nodes that have not yet uploaded, it selects the node that gives the smallest objective value in the current state, and it repeats this step until all nodes have uploaded. This yields a suboptimal data collection order for the instantaneous SCAoI at the cluster head node. The steps are described in Algorithm 4.
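The greedy rule can be sketched in Python as below, next to the exhaustive search it replaces. Because equations (9), (19), and (20) are not reproduced here, the objective is a hypothetical placeholder built by `make_toy_objective`: it weights each node's slot position by a coefficient that depends on the node's shortest distance to the nodes already uploaded, loosely echoing how $\beta_k$ is described, but it is not the paper's SCAoI formula. `greedy_order` itself works with any objective that scores a partial upload order.

```python
import itertools
import math


def greedy_order(nodes, objective):
    """Greedy upload order: repeatedly append the node that keeps the objective smallest."""
    order, remaining = [], list(nodes)  # uploaded ('table') and pending ('list') nodes
    while remaining:
        best = min(remaining, key=lambda n: objective(order + [n]))
        order.append(best)
        remaining.remove(best)
    return order


def exhaustive_order(nodes, objective):
    """Exact optimum over all len(nodes)! orders; only feasible for small clusters."""
    return list(min(itertools.permutations(nodes), key=lambda p: objective(list(p))))


def make_toy_objective(pos, slot=0.02, theta=50.0):
    """Hypothetical objective (NOT equation (19)): slot-weighted, distance-dependent ages."""
    def objective(order):
        total = 0.0
        for k, node in enumerate(order):
            if k == 0:
                beta = 1.0
            else:
                # Shortest distance to the nodes that uploaded earlier.
                d = min(math.dist(pos[node], pos[p]) for p in order[:k])
                beta = 1.0 - math.exp(-d / theta)  # hypothetical coefficient
            total += beta * (k + 1) * slot
        return total
    return objective


if __name__ == "__main__":
    pos = {0: (0, 0), 1: (10, 0), 2: (40, 30), 3: (5, 45), 4: (30, 10)}
    obj = make_toy_objective(pos)
    g, e = greedy_order(list(pos), obj), exhaustive_order(list(pos), obj)
    print(g, obj(g))
    print(e, obj(e))
```

The exhaustive search evaluates all $N_i!$ orders, while the greedy pass evaluates only $N_i(N_i+1)/2$ candidate extensions, at the price of possibly missing the optimum.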

Simulation Results and Analysis
The proposed algorithm is simulated in MATLAB on a computer with an 8th-generation Intel Core dual-core, four-thread processor. The simulation scenario is shown in Figure 1. The N sensor nodes are randomly distributed in a rectangular area with vertices (0, 0), (0, 300), (300, 0), and (300, 300), and the data center DC is located at (350, 150). The simulation parameters are set with reference to [19,20], and their values are listed in Table 2.
To evaluate the effectiveness of the proposed algorithm, it is compared with the algorithms in [19,20]. Wireless sensor nodes are distributed in a 300 m × 300 m rectangular area, and a UAV collects the data gathered by the sensor nodes. The UAV maintains a constant flight height and speed, and unless otherwise stated, the simulation parameters are those listed in Table 2. Unlike this paper, [19] minimizes the average information age of the nodes by optimizing the UAV trajectory alone, which requires the UAV to fly over every node in the region and communicate with each node directly. To further illustrate the advantages of the proposed algorithm, it is also compared with [20]. In [20], clustering is likewise used to reduce the number of UAV hovering positions, so that the optimized trajectory further improves the freshness of the collected data, but the location-dependent correlation between the data collected by nodes within the same cluster is not considered. The simulation results are shown in Figure 5.

Input: M cluster head coordinates, coordinates of the data center DC, maximum number of iterations $I_{max}$, and other related parameters;
Output: Shortest trajectory V, the optimal value of P2;
1: Initialize the related parameters;
2: for i = 1 : $I_{max}$ do
3:   for j = 1 : $N_a$ do
4:     Update tube, allowed;
5:     Determine the next cluster head node to visit according to equation (16);
6:   end for
7:   Calculate the value of equation (15) for the path found by each ant;
8:   Update the pheromone according to equations (17) and (18);
9: end for
10: Choose the trajectory that minimizes equation (15) and denote it V; its value is the optimal value of P2.
Algorithm 3: Trajectory optimization algorithm based on ant colony algorithm.
Input: Node coordinates L in each cluster, the set of cluster heads CH, the nodes C in each cluster;
Output: The upload order of the nodes in each cluster, the suboptimal value of the objective function of P3;
1: for i = 1 : M do
2:   for j = 1 : $N_i$ do
3:     table ⟵ record the nodes that have been uploaded;
4:     list ⟵ record the nodes that have not been uploaded yet;
5:     According to equation (20), find the node in list that gives the lowest objective value;
6:   end for
7: end for
Algorithm 4: Greedy algorithm based on data collection order optimization.

Figure 5 shows the average SCAoI versus the number of nodes in the region, with cluster formation radius R = 50 m, flight height h = 50 m, flight speed v = 15 m/s, packet size L = 2560 bytes, and degree of correlation $\rho = 10^2$. As the number of nodes grows, the average SCAoI of the proposed algorithm, [19], and [20] all increase, because every stage of the UAV's data collection takes longer. For the proposed algorithm and [20], more nodes mean that the UAV must collect more data, which lengthens its hovering time; there are also more clusters, which lengthens its flight time. In [19], the UAV must fly to every node to gather data, so more nodes cost considerably more flight time. For the same number of nodes, the average SCAoI of [19] is clearly the largest, that of the proposed algorithm is the smallest, and [20] lies in between. The main reason is that in [19] the UAV visits each node to collect data, so the number of hovering positions equals the number of nodes; as the number of nodes increases, the UAV's flight time and the average SCAoI grow and data freshness drops. In this paper, clustering effectively reduces the number of locations at which the UAV must hover, thus shortening its flight time. Although [20] also uses clustering to reduce the UAV's hovering positions, it cannot reduce the number of clusters as effectively, which limits the trajectory optimization and hence the freshness of the data collected by the UAV.

As the remaining three curves show, optimizing the UAV trajectory contributes far more to information freshness than optimizing the clustering or the data upload order. Under the same parameter settings, a cluster member node takes tens of milliseconds to transmit a packet to its cluster head, the cluster head takes a few milliseconds to transmit a packet of the same size to the UAV, and the UAV takes a few seconds to fly to the next cluster head. Because of this order-of-magnitude gap, trajectory optimization is far more effective than optimizing the other two variables. Overall, the proposed algorithm reduces the average SCAoI of the system by about 61%.

In terms of time complexity, the complexity $O_{AP}$ of the proposed algorithm and the complexity $O_{CL}$ of the algorithm in [19] can be written as $O_\varpi = O(I_{max} N_a M^{\varpi}_{ch})$, $\varpi \in \{AP, CL\}$, where $M^{AP}_{ch}$ is the number of clusters in the proposed algorithm and $M^{CL}_{ch}$ is the number of clusters in [19]. For the same number of nodes N, the algorithm in [19] has no clustering step, so each node can be regarded as a cluster head, i.e., $M^{CL}_{ch} = N$. In this paper $M^{AP}_{ch} \ll N$, so $M^{AP}_{ch} \ll M^{CL}_{ch}$ and hence $O_{AP} \ll O_{CL}$; the proposed algorithm therefore also has a lower time complexity than [19].
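The order-of-magnitude argument can be checked with simple arithmetic: a transmission takes 8L/R seconds and a hop takes d/v seconds. The sketch below uses the simulated packet size L = 2560 bytes and flight speed v = 15 m/s; the link rates (1 Mbit/s member-to-cluster-head, 10 Mbit/s cluster-head-to-UAV) and the 60 m hop between cluster heads are hypothetical values chosen only to fall in the ranges quoted above, not parameters from the paper.

```python
PACKET_BYTES = 2560  # L, from the simulation settings
SPEED = 15.0         # v in m/s, from the simulation settings


def tx_time(packet_bytes, rate_bps):
    """Seconds needed to send one packet over a link of the given rate."""
    return packet_bytes * 8 / rate_bps


def flight_time(distance_m, speed_mps):
    """Seconds needed to fly a given distance at constant speed."""
    return distance_m / speed_mps


member_to_ch = tx_time(PACKET_BYTES, 1e6)  # hypothetical 1 Mbit/s member link
ch_to_uav = tx_time(PACKET_BYTES, 1e7)     # hypothetical 10 Mbit/s CH-to-UAV link
hop = flight_time(60.0, SPEED)             # hypothetical 60 m between cluster heads

if __name__ == "__main__":
    print(f"member -> CH: {member_to_ch * 1e3:.2f} ms")
    print(f"CH -> UAV:    {ch_to_uav * 1e3:.2f} ms")
    print(f"flight hop:   {hop:.2f} s ({hop / member_to_ch:.0f}x a member upload)")
```

With these assumed rates the three times are about 20 ms, 2 ms, and 4 s, so shaving even a small fraction off the flight time outweighs any reordering of the millisecond-scale uploads.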
Figure 6 shows how the average SCAoI varies with the number of nodes for different cluster formation radii, with flight height h = 50 m, flight speed v = 15 m/s, packet size L = 2560 bytes, and degree of correlation $\rho = 10^2$. The figure shows that the average SCAoI decreases as the cluster radius increases. This is because, for the same number and distribution of nodes, a larger cluster radius yields fewer clusters. Consequently, the number of UAV hovering positions, the flight time, and the average SCAoI all decrease, and the freshness of information increases.
Figure 7 shows how the average SCAoI varies with the number of nodes for different UAV flight heights, with cluster formation radius R = 50 m, flight speed v = 15 m/s, packet size L = 2560 bytes, and degree of correlation $\rho = 10^2$. For the same number of nodes, the flight height mainly affects the rate at which data is transmitted from the cluster head node to the UAV. Because the transmit power of the cluster head node is fixed, the data rate between the cluster head node and the UAV declines as the flight height rises. From Eqs. (3)-(5) and the free-space path loss $PL_i = 20\log_{10}(4\pi h f_c / c)$ in Section 2.3, the path loss increases with the flight height h, which increases the denominator inside the logarithm in Eq. (3) and thus decreases $R_u$. More time is then needed to transmit the same number of packets, which lengthens the UAV's hovering time. Therefore, for the same number of nodes, the average SCAoI rises as the UAV flight height rises.

Figure 8 shows how the average SCAoI varies with the number of nodes for different flight speeds, with cluster formation radius R = 50 m and flight height h = 50 m. If the number of nodes, their distribution, and the cluster formation are fixed, the shortest path length and the flight trajectory of the UAV are the same. Accordingly, the higher the UAV's flight speed, the shorter its flight time, so the average SCAoI decreases as the flight speed increases.
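Both trends can be reproduced numerically. The sketch below evaluates the free-space path loss formula from Section 2.3 at several heights and feeds it into a Shannon-type rate; the carrier frequency, bandwidth, transmit power, noise power, and 900 m route length are hypothetical, and `shannon_rate` only stands in for Eq. (3), which is not reproduced here. Whatever the carrier frequency, doubling h adds $20\log_{10}2 \approx 6.02$ dB of path loss.

```python
import math

C = 3e8     # speed of light (m/s)
FC = 2.4e9  # hypothetical carrier frequency (Hz)


def free_space_path_loss_db(h, fc):
    """PL = 20*log10(4*pi*h*fc/c), the formula quoted from Section 2.3."""
    return 20 * math.log10(4 * math.pi * h * fc / C)


def shannon_rate(pl_db, bandwidth=1e6, tx_power=0.1, noise=1e-13):
    """Hypothetical CH-to-UAV rate in bit/s; the paper's Eq. (3) may differ."""
    snr = tx_power * 10 ** (-pl_db / 10) / noise
    return bandwidth * math.log2(1 + snr)


# Figure 7 trend: higher UAV -> larger path loss -> lower rate -> longer upload.
heights = [50, 100, 150]
rates = [shannon_rate(free_space_path_loss_db(h, FC)) for h in heights]
upload_times = [2560 * 8 / r for r in rates]  # one 2560-byte packet

# Figure 8 trend: same route, faster UAV -> shorter flight time.
route_length = 900.0  # hypothetical total trajectory length (m)
speeds = [10, 15, 20]
flight_times = [route_length / v for v in speeds]

if __name__ == "__main__":
    for h, r, t in zip(heights, rates, upload_times):
        print(f"h={h} m: rate={r / 1e6:.2f} Mbit/s, upload={t * 1e3:.3f} ms")
    for v, t in zip(speeds, flight_times):
        print(f"v={v} m/s: flight time={t:.1f} s")
```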
Figure 9 shows how the average SCAoI varies with the number of nodes for different data volumes, with cluster formation radius R = 50 m, flight height h = 50 m, flight speed v = 15 m/s, and degree of correlation $\rho = 10^2$. For the same number of nodes, the data volume affects the average SCAoI in two ways. On the one hand, because $R_u > R_{ch}$, the same amount of data takes less time to transmit to the UAV than to the cluster head, i.e., $t_u < t_{ch}$; when the data volume increases, $t_u$ grows by less than $t_{ch}$, and the hovering time of the UAV increases as a result. On the other hand, according to equation (19), the instantaneous SCAoI when the UAV reaches the cluster head node decreases, so the average SCAoI also decreases, though only slightly; here the cluster member nodes use TDMA to transmit data to the cluster head node, and the transmission time from a member node to the cluster head equals the time slot length. Because the increase in UAV hovering time caused by the larger data volume outweighs the decrease in instantaneous SCAoI, the average SCAoI increases with the data volume.
Figure 10 shows how the average SCAoI varies with the number of nodes for different degrees of correlation, with cluster formation radius R = 50 m, flight height h = 50 m, flight speed v = 15 m/s, and packet size L = 2560 bytes. For the same number of nodes, equation (10) shows that a larger spatial correlation degree gives a larger correlation coefficient β, so the data collected by the nodes are more strongly correlated and the corresponding average SCAoI is larger. Hence, the average SCAoI increases with the degree of correlation.

Summary
In this paper, we study age-based optimization in UAV-assisted information collection systems. A joint optimization problem over the number of clusters, the UAV flight trajectory, and the data collection order of the nodes within each cluster is formulated to minimize the average SCAoI of all nodes subject to the sensor node communication distance and the UAV parameters. To solve it, we decompose the problem into three subproblems. First, the maximum-minimum distance clustering algorithm is used to obtain the number of clusters and determine the cluster head coordinates. Then, the UAV trajectory problem is proved to be NP-hard and is solved with the ant colony algorithm. Finally, the data collection order of the nodes within each cluster is determined by the greedy algorithm. Solving the three subproblems separately yields a suboptimal solution to the original problem. Simulation results show that the proposed algorithm outperforms the algorithms in the comparative literature in reducing the average SCAoI of the nodes and improving information freshness. Future work will focus on UAV caching, addressing the freshness of user-requested content or the joint caching optimization of multiple UAVs and users.

Figure 1: Model illustration of data collection with UAV assistance.

Figure 4: Illustration of the time spent collecting all data by UAV.

Algorithm 1: Framework for solving problem P.
… the idea of nearest-neighbor clustering, that is, the principle of proximity. The specific algorithm steps are shown in Algorithm 2.
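The max-min distance rule of Algorithm 2 can be sketched in Python as follows. It is a hedged reading of the pseudocode rather than the authors' code: the first node becomes the first cluster head; the node whose distance to its nearest existing head is largest becomes the next head while that distance exceeds R (this sketch applies the same test to the second head, which the pseudocode adds unconditionally); and the remaining nodes join their nearest head. The function name and return format are hypothetical.

```python
import math
import random


def max_min_clustering(points, radius):
    """Max-min distance clustering; returns (head indices, {head: member indices})."""
    heads = [0]                                        # ch_1 = n_1
    min_d = [math.dist(p, points[0]) for p in points]  # distance to nearest head
    while True:
        far = max(range(len(points)), key=lambda i: min_d[i])
        if min_d[far] <= radius:   # every node is within R of some head: stop
            break
        heads.append(far)          # farthest node becomes a new cluster head
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], math.dist(p, points[far]))
    # Nearest-neighbor assignment of all nodes (heads map to themselves).
    clusters = {h: [] for h in heads}
    for i, p in enumerate(points):
        nearest = min(heads, key=lambda h: math.dist(p, points[h]))
        clusters[nearest].append(i)
    return heads, clusters


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.uniform(0, 300), rng.uniform(0, 300)) for _ in range(60)]
    heads, clusters = max_min_clustering(pts, 50.0)
    print("number of clusters M =", len(heads))
```

When the loop stops, every node lies within R of some head, and any two heads are more than R apart, so the head count it returns plays the role of the cluster number M.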
UAV flight height h (m): 50
UAV flight speed v (m/s): 15
Data packet size L (byte): 2560

Figure 5: The variation of average SCAoI with the number of nodes in this paper and [19,20].

Figure 6: Variation of average SCAoI with the number of nodes for different cluster radii.

Figure 8: Variation of the average SCAoI with the number of nodes at different flight speeds.

Figure 9: Variation of average SCAoI with the number of nodes for different data volumes.

Figure 10: Variation of average SCAoI with increasing number of nodes for different degrees of correlation.

Table 1: Table of abbreviations and full names.
Figure 3: Illustration of the age composition of the instantaneous SCAoI of the data in C j .