Performance Optimization in UAV-Assisted Wireless Powered mmWave Networks for Emergency Communications

In this paper, we explore how a rotary-wing unmanned aerial vehicle (UAV) acts as an aerial millimeter wave (mmWave) base station to provide recharging service and radio access service in a postdisaster area with unknown user distribution. The optimization problem addressed is to find the optimal path, starting and ending at the same recharging point, that covers as wide an area as possible under limited battery capacity; it can be transformed into an extended multiarmed bandit (MAB) problem. We propose two improved path planning algorithms to solve this optimization problem, which improve the ability to explore the unknown user distribution. Simulation results show that one of our algorithms is superior to its corresponding comparison algorithm in terms of the total number of served user equipment (UE), the number of visited grids, the amount of data, the average throughput, and the battery capacity utilization level, while our other algorithm is superior to its corresponding comparison algorithm in terms of the number of visited grids.


Introduction
With the development of computers and robotics, various intelligent unmanned vehicles have been developed and applied in many fields. In the ocean, autonomous underwater vehicles (AUVs) have been used both as undersea mobile base stations (BSs) to assist the construction of ocean sensor networks and as underwater mobile collectors to facilitate information collection [1,2]. Over land, unmanned aerial vehicles (UAVs) play an important role in similar applications. In particular, UAVs have significant advantages for the rapid construction of wireless emergency communication networks in postdisaster areas.
Disasters caused by earthquakes, floods, typhoons, tsunamis, and tornadoes are often unpredictable, and their sudden occurrence damages the terrestrial radio infrastructure and prevents the victims from communicating with each other or with the outside world. Since the 72 hours after a disaster are the most critical, there is a need to deploy wireless networks for emergency communications, which can assist the rescue workers or teams in the affected areas [3].
Under these emergency communications circumstances, UAVs will be well suited as the platform equipped with small BSs to provide broadband wireless services for postdisaster areas [4,5], where any rotary-wing UAV will hover over the target area and serve the terrestrial users within a certain horizontal distance. In order to serve more terrestrial users, one effective approach is to increase the number of UAVs for collaborative coverage, which is not always available due to increased networking costs. Another alternative is to fly a UAV along a planned path, covering the target area and serving more terrestrial users.
According to the investigation of the existing works, the main challenge faced by the UAV for the path planning in the emergency communications is that the UAV does not know the distribution of terrestrial users within the affected areas. Therefore, a UAV cannot plan the optimal path in advance to maximize the service efficiency. In addition, due to the limited capacity of UAV batteries [6], it is crucial to find the energy-efficient flight trajectory in UAV-assisted emergency communications [7].
In response to the above challenges, the authors in [8] developed an online learning framework to solve a UAV's path planning problem for postdisaster emergency communications, in which the UAV serves as a flying BS to serve terrestrial users along its planned path. In [8], the objective of the optimization task is to maximize the sum data rate under the capacity constraint of the UAV battery, and the task is modeled as an extended multiarmed bandit (MAB) problem [9], for which two path planning schemes were proposed to gradually learn an energy-efficient path while serving terrestrial users within the UAV coverage.
All the above schemes work in the traditional sub-6 GHz frequency bands, which struggle to meet the demand for high-throughput communication because of their limited spectrum resources [10,11]. Since the communication duration may be very short each time a terrestrial user is within a UAV's coverage, there is an increasing demand for ultrahigh-speed data transmission in UAV-assisted emergency communications, which the sub-6 GHz bands cannot satisfy. The spectrum resources in millimeter wave (mmWave) bands are much richer and can support ultrahigh-speed data transmission [12]. Therefore, UAV-assisted mmWave communication is becoming one of the most prominent technologies in emergency communication systems.
However, mmWave communications usually rely on beamforming technology to overcome their inherent weaknesses, such as severe path loss. The beamforming training mechanism is usually used to determine the beam orientation and beam width at both the transmitting end and the receiving end [13]. Because the distribution of terrestrial users is unknown, it is difficult to determine the number of response slots required for the beamforming training process. If the number of response slots is selected at random, it will very likely not match the user distribution density, resulting in either wasted response slots or a high probability of conflicts when terrestrial users compete for response slots. This is a new challenge that needs to be addressed in UAV-assisted mmWave networks.
In addition, a user equipment (UE) carried by a terrestrial user trapped in a postdisaster area may not be recharged in a timely manner, so it should save as much battery energy as possible. Especially for an energy-constrained sensing device [14] in a postdisaster area, the residual energy level is the most critical factor for ensuring communication with the outside world. With the popularization of terminals equipped with a wireless recharging function, UEs can not only share battery energy with each other but also conveniently harvest radio frequency (RF) energy radiated by BSs [15]. Therefore, the UAV hovering above the target area can also act as a recharger for the UEs in its coverage area, and in turn, the recharged UEs can use the harvested energy to send data to the BS in the UAV.
For each flight of a UAV, it is very desirable both to find the most effective path to serve as many UEs as possible and to maximize each UE's uplink throughput in UAV-assisted wireless powered mmWave networks for emergency communications. In fact, it is difficult to achieve both goals simultaneously when the distribution of terrestrial users is unknown. Therefore, in this paper, we focus on how to improve the uplink performance of the entire system while satisfying the UAV's battery energy constraint and each UE's throughput constraint. Our main contributions are as follows:
(1) We propose an effective performance optimization scheme for UAV-assisted wireless powered mmWave networks for emergency communications, which can improve the uplink performance of the entire system while satisfying the UAV's battery energy constraint and each UE's throughput constraint even if the distribution of terrestrial users is unknown.
(2) Compared with the scheme in [8], each UE in our scheme makes full use of the energy it harvests to send uplink data and thus extends its battery life by saving its own battery energy.
(3) In our scheme, the number of response slots can be adjusted according to the currently observed performance results, and thus, it can match the user distribution density as the number of rounds of online learning increases.
The rest of this paper is organized as follows. The related works are reviewed in Section 2. The system model and the problem statements are detailed in Section 3. The algorithms for solving the problem are described in Section 4. The performance evaluation results are presented in Section 5, while the conclusions are summarized in Section 6.

Related Work
In recent years, UAVs have been applied more and more extensively, and research on UAV-assisted networks has become a hot spot. In [7], the authors gave an overview of UAV-assisted wireless communications and introduced the basic framework and channel characteristics related to UAV-assisted wireless communications by describing three cases: UAVs for improving coverage quality, UAVs for information dissemination, and UAVs acting as BSs or relays. Based on a known distribution of terrestrial users, and through the joint design of the sensing nodes' scheduling scheme, the power allocation strategy, and the UAV flight trajectory, the scheme in [16] aims to minimize the total power consumption of a UAV, while the work in [17] attempted to minimize the number of UAVs acting as flying BSs under the condition that the coverage of ground terminals is satisfied.
Some works assumed that terrestrial users follow some kind of probability distribution and explored the performance of UAV-assisted networks. The authors in [18] explored how to maximize the coverage probability and system throughput in a single-UAV-assisted downlink network underlaid with device-to-device (D2D) communications, in which the distribution of D2D users follows a homogeneous Poisson point process (PPP). The work of [19] is similar to that in [18], but it considered a multi-UAV-assisted downlink network, in which the UAVs are employed to forward data from satellites to the terrestrial vehicles in a postdisaster area. The authors in [20] designed a prototype of a UAV-assisted emergency Wi-Fi-based network, which aims to accelerate search and rescue operations and to perform on-site surveillance over a postdisaster area. By using a stochastic geometry framework, they derived analytical approximate expressions for the outage probability. In a postdisaster area, it is not always feasible to know the distribution of terrestrial users. As mentioned above, the authors in [8] developed an online learning framework to improve the performance of UAV-assisted wireless communications without knowledge of the distribution of ground terminals. Some works focused on the connectivity of perceptual networks in a postdisaster area. The authors in [21] explored the deployment of UAV-mounted BSs to assist rescue, where machine-to-machine (M2M) communication is used to establish connectivity and send rescue messages among the human portable/wearable machine-type devices (MTDs), either in relay mode via a UAV or in direct mode. For emergency communications based on the heterogeneous Internet of Things, the authors in [22] explored how to accommodate, in the same frequency band, the link between the UAV and the terrestrial cellular device and the D2D link outside the UAV's line-of-sight area.
There are also some open research issues and challenges in UAV-assisted emergency networks, such as intelligent distributed optimization for the trajectory and scheduling for UAVs, and interference management for UAV-assisted emergency networks [23].
The authors of [24] designed a hierarchical beamforming codebook structure with different beam widths to speed up beamforming training and tracking, which is aimed at enhancing the capacity of mmWave UAV-assisted cellular networks. The authors of [25] proposed a training-based beam tracking scheme to keep the connection between UAV and terrestrial BSs. The authors of [26] focused on BS-to-UAV backhaul communications and proposed a beam tracking method, which reduces training overhead by adopting wider beam width at the cost of lower beamforming gain. The authors of [27] proposed a connectivity constraint-based path planning and beam tracking method, by which the UAV can start from a random location and reach its destination within a BS coverage by learning a trajectory while keeping better connectivity.
The works in [28-30] mainly explored theoretical frameworks to derive expressions for the coverage probability and the average throughput. The authors of [28] assumed that the UAVs acted as the parent nodes and were deployed in three-dimensional (3D) space at the same height, while the UEs acted as the daughter nodes and were distributed according to a Poisson cluster process (PCP). In [29], the UAVs and BSs were assumed to be distributed according to a PPP, while the UEs were distributed according to a PCP. In [30], the UEs are also modeled by a PCP, but the downlink simultaneous wireless information and power transfer (SWIPT) scenario and the uplink information transmission scenario are jointly considered.
The authors of [31] focused on the network coverage and system energy efficiency optimization problem for UAV-assisted mmWave cellular networks, but they did not discuss spatially movable UAVs acting as flying BSs or dynamic UAV placement. The authors of [32] analyzed the secrecy rate performance of UAV-assisted mmWave networks by utilizing a Matérn hardcore point process, which can also ensure the safety distance between UAVs. The authors of [33] designed a spectrum management architecture for mmWave swarm networks with UAVs acting as flying BSs, which considered the issues of interference, energy consumption, and UAV mobility. The authors of [34] derived the multicell probability and the volume spectral efficiency for UAV-assisted mmWave cellular networks by using stochastic geometry.
The authors of [35] derived analytical channel models to evaluate the benefit of mmWave links associated with UAVs, which were further assessed by using the outage probability. The authors of [36] investigated the rate performance, fairness, and their tradeoff in a UAV swarm connected to mmWave cellular networks, where all UAVs are located in a 3D area and distributed according to a homogeneous PPP. The authors of [37] studied beamforming training and tracking for a UAV-assisted mmWave system. They designed a beam training codebook according to the angular domain converted from the known location distribution of users and presented two beam tracking methods, for random and inertial user mobility, respectively, to predict the beam direction.
The authors of [34] proposed a 3D location distribution model to characterize the positions of the UAVs by using stochastic geometry and considered multicell coverage probability and volume spectral efficiency as metrics to study the performance of mmWave networks with UAVs acting as aerial BSs. The beam formed by the mmWave antenna array has strong directivity and needs multiple beam scans to cover the whole region, so the authors of [38] proposed a sectoring approach to ensure coverage of the whole region. Since the side lobe gain of the antenna array is considered, it leads to substantial interference in other sectors. They set a threshold on power spillage from adjacent sectors to limit interference in concurrent transmission strategy.
The above solutions for UAV-assisted mmWave cellular networks need to know the probability distribution of terrestrial users or UAVs, which does not always match the reality in the affected area. As far as we know, the work in [8] is most closely related to ours. As mentioned above, the work in [8] cannot be transplanted to a UAV-assisted mmWave emergency communication network, which encourages us to study a new solution for it.

System Model and Problem Statement
3.1. System Model.
As shown in Figure 1, we consider a single rotary-wing UAV acting as a flying BS to serve a postdisaster area without available ground infrastructure. The UAV takes off in an area where it can be easily recharged, and it will come back to the recharging point to recharge before the battery runs out. We assume that the UAV flies to a position and then hovers above the position to offer service to ground UEs within horizontal distance $d$. Therefore, the UAV should alternately fly and hover to cover the disaster-affected area.
We use $v_f$ to denote the flying speed of the UAV and $t_h$ to denote its hovering interval. Also, we use $e_{\max}$ to denote the maximum battery capacity of the UAV, where the UAV's engine powers for hovering and flying are denoted as $p_h$ and $p_f$, respectively.
We divide the considered area into $N \times N$ equal-size grids. Also, it is assumed that the UAV can serve all the ground UEs in a grid while hovering over the center of the grid. We use $\mathbb{N} = \{1, 2, \dots, N^2\}$ to denote the set of all the grids. Also, we adopt $l_K = \{g_0, g_1, \dots, g_K\}$ to represent a UAV path, where each element $g_k$ ($g_k \in \mathbb{N}$ and $1 \le k \le K$) denotes a grid in this area and $g_0$ is the recharging point. The path starts at $g_0$, serves $K$ grids, and returns to $g_0$ to recharge. The set of all the possible paths that start and end at the recharging point is denoted as $L$. The ground UEs are located randomly in the considered area, and these UEs need wireless access service with the same probability. We use $D_{g_k}$ to denote the data volume of the ground UEs within grid $g_k$.
The distribution of UEs in each grid is unknown to the UAV. However, when the UAV chooses to hover above the center of a grid, it can obtain the distribution of UEs in this grid through the mmWave beam training process. We assume that the UAV starts a mmWave beam training process at the transmission power $p_c$. During the mmWave beam training process, the UEs that do not have a data delivery requirement can turn on the energy harvesting mode, while those that have a data delivery requirement can respond to the training initiation packet by feeding back response packets with location information. After obtaining the location information of all the UEs of the grid, the UAV can adjust its transmission beam width to cover exactly all the UEs of the grid and then start the downlink recharging process at power $p_c$. We assume that these UEs only use the harvested energy to send uplink data. The optimization task is given by (1), where $d_{g_{k-1},g_k}$ represents the horizontal distance between the centers of grid $g_{k-1}$ and grid $g_k$, and $\eta_{g_k}$ is the ratio of the charging time to the hovering interval $t_h$ in grid $g_k$.
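The grid and path model above can be illustrated with a minimal Python sketch. It assumes a row-major numbering of the grids, a square grid side length `side`, and a recharging point given by coordinates `depot`; these conventions and all function names are hypothetical. The energy accounting uses only flight energy ($p_f$ times flight time) and hovering energy ($p_h t_h$ per served grid); the energy spent on downlink recharging is deliberately left out, so this is not the constraint of the optimization task (1) itself.

```python
import math

def grid_center(g, n, side):
    """Center (x, y) of grid g in an n x n layout; g runs from 1 to n*n, row-major (assumed)."""
    row, col = divmod(g - 1, n)
    return ((col + 0.5) * side, (row + 0.5) * side)

def path_energy(path, n, side, depot, v_f, p_f, p_h, t_h):
    """Energy for the tour depot -> path[0] -> ... -> path[-1] -> depot.

    Flight energy is p_f * distance / v_f; each served grid costs p_h * t_h.
    The power radiated for recharging the UEs is not included in this sketch.
    """
    stops = [depot] + [grid_center(g, n, side) for g in path] + [depot]
    flown = sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))
    return p_f * flown / v_f + p_h * t_h * len(path)

def is_feasible(path, e_max, **params):
    """True if the tour fits into the battery capacity e_max."""
    return path_energy(path, **params) <= e_max
```

A candidate path can then be screened with, for example, `is_feasible([1, 2], e_max, n=2, side=100.0, depot=(0.0, 0.0), v_f=10.0, p_f=100.0, p_h=50.0, t_h=10.0)` before it is considered for service.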

3.2. Time-Slotted Structure.
In general, the hovering interval $t_h$ consists of the beam training interval $t_b$, the recharging interval $t_c$, and the transmission interval $t_d$. As shown in Figure 2, the beamforming training interval $t_b$ consists of one starting slot for the training process, multiple response slots, and one confirming slot for the training results, so that it satisfies the following relation:
$$t_b = (I + 2)\,\tau_b, \tag{2}$$
where $I$ is the number of response slots and $\tau_b$ is the size of a slot, which is a predetermined value. The number of response slots depends on the number of UEs participating in the beamforming training: the more UEs participate, the more response slots need to be set.
The recharging interval $t_c$ and the transmission interval $t_d$ are closely related and bound by a communication cycle. That is, $t_c + t_d = \tau$. In each communication cycle $\tau$, the UAV can recharge UEs during the recharging interval $t_c$, while it can receive data from UEs during the transmission interval $t_d$.
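The timing relations above can be sketched in a few lines of Python. The sketch relies only on what is stated in this subsection: the training interval is made of one starting slot, the response slots, and one confirming slot, each of size $\tau_b$, and the rest of the hovering interval is the communication cycle ($t_h = t_b + t_c + t_d$ with $t_c + t_d = \tau$). The function names are hypothetical.

```python
def beam_training_interval(num_response_slots, tau_b):
    """t_b: one starting slot + response slots + one confirming slot, each of size tau_b."""
    return (num_response_slots + 2) * tau_b

def communication_cycle(t_h, num_response_slots, tau_b):
    """tau = t_h - t_b, i.e., what is left of the hovering interval after beam training."""
    tau = t_h - beam_training_interval(num_response_slots, tau_b)
    if tau <= 0:
        raise ValueError("beam training consumes the whole hovering interval")
    return tau
```

The sketch makes the tradeoff explicit: every additional response slot shortens the communication cycle by $\tau_b$.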

Wireless Communications and Mobile Computing
To allow multiple ground UEs to communicate with the UAV during each communication cycle $\tau$, we need to divide $\tau$ into a downward recharging slot (i.e., $\epsilon_0 \tau$) and multiple upward transmission slots (i.e., $\epsilon_j \tau$, $1 \le j \le J$) based on the time division multiple access mode, which is shown in Figure 3. The downward recharging slot is allocated to the UAV to transfer radio energy to its serving UEs, while the upward transmission slots are allocated to UEs for data transmission, during which each UE transmits its own data to the UAV by using the assigned upward transmission slot.
Here, $\epsilon_0, \epsilon_1, \dots, \epsilon_J$ are the time ratios of the communication cycle $\tau$ allocated to the UAV for downward energy transfer and to its serving UEs (e.g., $\mathrm{UE}_1, \mathrm{UE}_2, \dots, \mathrm{UE}_J$) for upward data transfer, respectively, and they satisfy the following relation:
$$\sum_{j=0}^{J} \epsilon_j \le 1. \tag{3}$$
Therefore, $\eta_{g_k}$ ($1 \le k \le K$) can be estimated by formula (4).
3.3. Beam Alignment and Signal Propagation Model.
To know the distribution of UEs in a grid and align the mmWave beam between each UE and the UAV, the UAV needs to start a beamforming training process between itself and the UEs. Firstly, the UAV broadcasts a directional beacon frame in its mmWave beam facing the ground, while all the UEs remain in a quasi-omnidirectional listening mode. Then, each UE that wants to communicate with the UAV chooses a response slot, which is similar to the association beamforming training slot in IEEE 802.11ad/ay, to carry out the beamforming training of the UE side (i.e., it transmits directional beacon frames in all its beams). After receiving the directional beacon frames from the UEs, the UAV can obtain the beamforming training information and the location information of every UE. Finally, the UAV broadcasts a confirmation frame to announce the conflict-free UEs. By setting a reasonable number of response slots, we expect to avoid collisions of the UEs' beacon frames while keeping the training time as short as possible. If the UEs' beacon frames collide, the number of response slots will be increased based on a certain strategy when the beamforming training is needed again. Conversely, if the number of responding UEs is significantly less than the total number of response slots, the number of response slots will be reduced.
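The contention for response slots described above can be illustrated with a small Python simulation. Several points are assumptions made only for this sketch: each UE picks one response slot uniformly at random, a UE is conflict-free exactly when no other UE picked the same slot, and the adjustment rule (doubling after a collision, halving when fewer than half of the slots were used) stands in for the unspecified strategy mentioned in the text. All names and default values are hypothetical.

```python
import random
from collections import Counter

def run_training(num_ues, num_slots, rng):
    """Simulate one training round; returns (conflict_free_ues, collided_slots)."""
    picks = Counter(rng.randrange(num_slots) for _ in range(num_ues))
    conflict_free = sum(1 for count in picks.values() if count == 1)
    collided = sum(1 for count in picks.values() if count > 1)
    return conflict_free, collided

def adapt_slots(num_slots, conflict_free, collided, shrink_ratio=0.5, min_slots=1):
    """Hypothetical rule for the slot count used the next time this grid is trained."""
    if collided > 0:
        return num_slots * 2                    # collisions seen: add response slots
    if conflict_free < shrink_ratio * num_slots:
        return max(min_slots, num_slots // 2)   # many idle slots: shorten training
    return num_slots

rng = random.Random(1)
slots = 4
for _ in range(5):
    ok, bad = run_training(num_ues=10, num_slots=slots, rng=rng)
    slots = adapt_slots(slots, ok, bad)
```

Over repeated rounds, a rule of this kind moves the slot count toward the local UE density, which is the behavior the training procedure relies on.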
We employ the mmWave signal propagation model given by (5), where $p^t_s$ is the transmission power at the mmWave transmission beam of the UAV $s$, which is set to $p_c$; $p^r_{s,u}$ is the received power at the mmWave link $s \to u$ from the UAV $s$ to the UE $u$; $G^t_{s,u}$ is the directional transmission gain at the mmWave transmission beam of the UAV $s$, while $G^r_{s,u}$ is the directional reception gain at $s \to u$; and $G^c_{s,u}$ is the channel gain at $s \to u$. When the mmWave beam between the UAV $s$ and the UE $u$ is aligned, $G^r_{s,u}$ is obtained by formula (6) [39], where $\varphi^r_{s,u}$ is the reception beam width of the UE $u$, i.e., the width of its main lobe in radians, and $\varepsilon$ is the gain of the side lobe with $0 < \varepsilon \ll 1$. $G^t_{s,u}$ is obtained by formula (7) [39], where $\varphi^t_{s,u}$ is the transmission beam width of the UAV $s$, i.e., the width of its main lobe in radians. The channel gain $G^c_{s,u}$ is obtained by formula (8) [40], where $\delta(\cdot)$ is the Dirac delta function, and $\tau_{s,u}$ and $\chi^c_{s,u}$ are the propagation delay and the amplitude of $s \to u$, respectively. $\tau_{s,u}$ is obtained by the following formula:
$$\tau_{s,u} = \frac{d_{s,u}}{c}, \tag{9}$$
where $d_{s,u}$ is the distance of $s \to u$, and $c$ is the speed of light. When $s \to u$ is a line-of-sight (LOS) link, the amplitude is obtained by formula (10) [40], where $\lambda$ is the wavelength, $\lambda = c/f_c$, and $f_c$ is the carrier frequency. If $s \to u$ is a non-line-of-sight (NLOS) link, it will be eliminated during the beam training process, so we do not discuss its estimation method. $d_{s,u}$ is estimated by
$$d_{s,u} = \sqrt{(x_s - x_u)^2 + (y_s - y_u)^2 + (z_s - z_u)^2}, \tag{11}$$
where $(x_s, y_s, z_s)$ is the three-dimensional coordinate of the UAV $s$, while $(x_u, y_u, z_u)$ is that of the UE $u$.
3.4. Throughput Maximization for UEs in Single Grid.
After obtaining the number (e.g., $J$) of conflict-free UEs participating in the beamforming training process, the UAV (e.g., $s$) wirelessly powers this set of UEs at the power $p_c$, where the amount of energy harvested by each UE (e.g., $u$) in the downward recharging slot $\epsilon_0$ is obtained by formula (12), where $e_{s,u}$ is the amount of energy harvested by the UE $u$ from the UAV $s$, while $p_c$ denotes the recharging power of the UAV $s$. After the UE $u$ replenishes its energy during the downward phase, it transfers its own data to the UAV $s$ in its allocated time slot $\epsilon_u$ ($1 \le u \le J$) in the subsequent upward phase. Because the recharging efficiency is limited, not all of the energy harvested by each UE can be used for its upward data transmission. We use $\kappa_u$ to denote the power conversion efficiency factor of the UE $u$, $0 < \kappa_u \le 1$. Thus, the average transmission power that the UE $u$ obtains from the harvested energy is given by formula (13). To simplify the expressions in the problem modeling, we assume $\kappa_u = 1$, $\forall u$. Although changing $\kappa_u$ affects the range of values, it has no effect on the overall trend of the simulation results described in the following text. Based on (12)~(13) and the Shannon theorem, the achievable upward throughput from the UE $u$ to the UAV $s$ in bits/second (bps) can be given by
$$r_{s,u}(\epsilon) = \epsilon_u H \log_2\!\left(1 + \vartheta_{s,u}\,\frac{\epsilon_0}{\epsilon_u}\right), \tag{14}$$
where $\epsilon = [\epsilon_0, \epsilon_1, \dots, \epsilon_u, \dots, \epsilon_J]$, $H$ is the mmWave bandwidth, and $\vartheta_{s,u}$ is expressed by formula (15), where $\sigma_s^2$ is the ambient noise at the UAV $s$, and $G^t_{u,s}$, $G^r_{u,s}$, and $G^c_{u,s}$ are the directional transmission gain, the directional reception gain, and the channel gain at the mmWave link $u \to s$ from the UE $u$ to the UAV $s$, respectively. From (14), it is seen that $r_{s,u}(\epsilon)$ increases with $\epsilon_0$ for a given $\epsilon_u$. Moreover, it is observed that $r_{s,u}(\epsilon)$ increases with $\epsilon_u$ for a given $\epsilon_0$. However, due to the total time constraint in (3), $\epsilon_0$ and $\epsilon_u$ cannot be increased simultaneously.
Therefore, there exists an optimal time allocation (i.e., $\epsilon_0^*$) for maximizing the throughput. When $\epsilon_0$ is less than $\epsilon_0^*$, the throughput grows with $\epsilon_0$; otherwise, it shrinks as $\epsilon_0$ grows.
For $J$ UEs, if we do not limit their lowest throughput, the "doubly near-far" phenomenon explained in [41] will occur while optimizing their total throughput, which is caused by the distance-dependent signal propagation attenuation in both the downward and the upward directions. Furthermore, the result generated by the common-throughput maximization scheme in [41] deviates far from the optimal value. According to (14), the total throughput of all the $J$ UEs is given by
$$R_{g_k}(\epsilon) = \sum_{u=1}^{J} r_{s,u}(\epsilon), \tag{16}$$
where $R_{g_k}(\epsilon)$ is the total throughput of all the UEs of the grid $g_k$ served by the UAV $s$, and it is also a function of the time allocation vector $\epsilon$. Thus, the total throughput maximization problem based on the time slot threshold constraint is modeled as follows:
$$\max_{\epsilon}\; R_{g_k}(\epsilon) \quad \text{s.t.} \quad \sum_{u=0}^{J} \epsilon_u \le 1, \quad \epsilon_u \ge \epsilon_{th},\; u \in [0, J]. \tag{17}$$
Lemma 1. $r_{s,u}(\epsilon)$ is a concave function of $\epsilon$ for any given $u \in \{1, 2, \dots, J\}$.
Proof. Please refer to the proof of Lemma 3.1 in [41]. ☐
Based on Lemma 1, $R_{g_k}(\epsilon)$ is also a concave function of $\epsilon$ because it is the sum of the $r_{s,u}(\epsilon)$. Thus, the optimization problem (17) is convex and can be solved by using convex optimization methods. Lemma 3.2 in [41] can be used to solve the optimization problem (17), and it is repeated as follows. If $A$ is more than 0, there is a unique $z^* > 1$ that is the solution of $f(z) = A$, in which $f(z) \triangleq z \ln z - z + 1$ and $z \ge 0$. Here, $f(z)$ is a convex function of $z > 0$ whose minimum value is attained at $z = 1$ with $f(1) = 0$. If $A$ is more than 0 and not more than 1, there are two different solutions of $f(z) = A$, exactly one of which is more than 1. If $A$ is more than 1, there is only one solution of $f(z) = A$, which is more than 1, i.e., $z^* > 1$.
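Since $f(z) = z \ln z - z + 1$ has the derivative $\ln z$, it is strictly increasing for $z > 1$, so the root $z^* > 1$ of $f(z) = A$ can be found by simple bracketing. The following Python sketch uses bisection; the function names and the tolerance are illustrative choices and are not taken from [41].

```python
import math

def f(z):
    """f(z) = z*ln(z) - z + 1, which is 0 at z = 1 and increasing for z > 1."""
    return z * math.log(z) - z + 1.0

def solve_z(a, tol=1e-12):
    """Return the unique z* > 1 with f(z*) = a, for a > 0, by bisection."""
    if a <= 0:
        raise ValueError("a must be positive")
    lo, hi = 1.0, 2.0
    while f(hi) < a:          # grow the bracket until f(hi) >= a
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, $f(e) = e - e + 1 = 1$, so `solve_z(1.0)` should return a value close to `math.e`.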

Proposition 2.
The suboptimal time allocation solution for the optimization problem (17), which is denoted by $\epsilon^* = [\epsilon_0^*, \epsilon_1^*, \dots, \epsilon_u^*, \dots, \epsilon_J^*]$, is given by
$$\epsilon_u^* = \begin{cases} \dfrac{z^* - 1}{A + z^* - 1}, & u = 0, \\[2mm] \dfrac{\vartheta_{s,u}}{A + z^* - 1}, & u = 1, \dots, J, \end{cases} \tag{18}$$
where $A = \sum_{u'=1}^{J} \vartheta_{s,u'}$ and $z^* > 1$ is the unique solution of $f(z) = A$.
Proof. The Lagrangian of the optimization problem (17) is given by (19), where $\varsigma \ge 0$ and $\zeta_u \ge 0$ ($u \in [1, J]$) are the Lagrange multipliers associated with the constraints in the optimization problem (17). Thus, the dual function of the optimization problem (17) is given by (20), where $\Omega$ is the feasible set of $\epsilon$ specified by the constraints in the optimization problem (17). It is seen from the optimization problem (17) that there is an $\epsilon \in \Omega$ with $\epsilon_u > \epsilon_{th}$ ($u \in [0, J]$) that meets $\sum_{u=0}^{J} \epsilon_u < 1$. Therefore, since the optimization problem (17) is convex and Slater's condition [42] is satisfied, strong duality holds, and the Karush-Kuhn-Tucker (KKT) conditions are both necessary and sufficient for the global optimality of the optimization problem (17). These conditions are given by (21)-(25), where $\epsilon_u^*$ and ($\varsigma^*$, $\zeta_u^*$) represent the optimal primal and dual solutions of the optimization problem (17), respectively. It is easily verified that $\sum_{u=0}^{J} \epsilon_u^* = 1$ must hold for the optimization problem (17), and thus, from (22), we assume $\varsigma^* > 0$ without loss of generality. Also, it is easily verified that $\epsilon_u^* > \epsilon_{th}$ must hold for the optimization problem (17), and thus, from (24), we set $\zeta_u^* = 0$, $u \in [0, J]$, without loss of generality. From (25), the resulting relations are expressed through the function $t(x)$, which is defined as follows:
$$t(x) \triangleq \ln(1 + x) - \frac{x}{1 + x}.$$
$t(x)$ is a monotonically increasing function of $x \ge 0$ because $dt(x)/dx = x(1 + x)^{-2} \ge 0$ for $x \ge 0$. Thus, the equality in (29) holds if and only if the ratio $\vartheta_{s,u}\epsilon_0^*/\epsilon_u^*$ takes a common value $C$ for all $u \in [1, J]$, as stated in (30). Combining (21) and (30), $\epsilon_u^*$ is given by
$$\epsilon_u^* = \frac{\vartheta_{s,u}}{A}\,(1 - \epsilon_0^*), \quad u \in [1, J], \tag{31}$$
where $A = \sum_{u'=1}^{J} \vartheta_{s,u'}$. Furthermore, (32) follows from (27)~(31), where $C$ is defined in (30). Because $C = A\epsilon_0^*/(1 - \epsilon_0^*)$ from (31), we can rewrite (32) as
$$z \ln z - z + 1 = A, \tag{33}$$
where $z = 1 + A\epsilon_0^*/(1 - \epsilon_0^*)$. It is seen that $z$ is more than 1 if $A$ is more than 0 and $0 < \epsilon_0^* < 1$.
On the basis of Lemma 3.2 in [41], there is a unique $z^* > 1$ that is the solution of (33). Thus, the optimal time allocation to the downward radio energy transfer is estimated as follows:
$$\epsilon_0^* = \frac{z^* - 1}{A + z^* - 1}. \tag{34}$$
Furthermore, on the basis of (31) and (34), the optimal time allocation to the upward radio information transmission is estimated as follows:
$$\epsilon_u^* = \frac{\vartheta_{s,u}}{A + z^* - 1}, \quad u \in [1, J]. \tag{35}$$
This proves Proposition 2.
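The closed-form allocation obtained in the proof can be checked numerically. The Python sketch below takes the per-UE coefficients $\vartheta_{s,u}$ as a plain list `theta` (hypothetical input values), solves $z \ln z - z + 1 = A$ by bisection, and then uses the relation $z = 1 + A\epsilon_0^*/(1 - \epsilon_0^*)$, which gives $\epsilon_0^* = (z^* - 1)/(A + z^* - 1)$ and $\epsilon_u^* = \vartheta_{s,u}/(A + z^* - 1)$. The throughput expression $\epsilon_u H \log_2(1 + \vartheta_{s,u}\epsilon_0/\epsilon_u)$ follows the form used in [41] scaled by the bandwidth $H$, and the threshold $\epsilon_{th}$ is assumed to be inactive, as in the proof.

```python
import math

def solve_z(a, tol=1e-12):
    """Unique root z* > 1 of z*ln(z) - z + 1 = a (a > 0), found by bisection."""
    def f(z):
        return z * math.log(z) - z + 1.0
    lo, hi = 1.0, 2.0
    while f(hi) < a:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < a else (lo, mid)
    return 0.5 * (lo + hi)

def time_allocation(theta):
    """Return (eps0, [eps_1, ..., eps_J]) for the coefficients theta_1..theta_J."""
    a = sum(theta)
    denom = a + solve_z(a) - 1.0
    return (denom - a) / denom, [th / denom for th in theta]

def sum_throughput(eps0, eps, theta, bandwidth):
    """Total uplink throughput: sum of eps_u * H * log2(1 + theta_u * eps0 / eps_u)."""
    return sum(e * bandwidth * math.log2(1.0 + th * eps0 / e)
               for e, th in zip(eps, theta))

theta = [3.0, 1.0]                      # hypothetical coefficients for two UEs
eps0, eps = time_allocation(theta)
best = sum_throughput(eps0, eps, theta, bandwidth=1.0)
```

Note that the allocation gives a UE with a larger $\vartheta_{s,u}$ a proportionally larger slot, which is the source of the "doubly near-far" unfairness that the threshold $\epsilon_{th}$ is meant to limit.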
In order to make the suboptimal time allocation solution (i.e., formula (18)) for the optimization problem (17) meet the minimum throughput requirement of the UE with the longest communication distance, the relationship (36) must be satisfied, where the meanings of $H$ and $L$ are shown in Figure 4, and $\rho_{th}$ may be selected from 0.7 to 0.9 based on practical experience. Therefore, the relationships in (37) must be satisfied accordingly. From Figure 4, it can be seen that the ground-oriented mmWave beam width of the UAV should not exceed $2\cot^{-1}\rho_{th}$. After the beam training interval $t_b$, the UAV has the coordinates of all the conflict-free UEs, so it can also estimate the distance from itself to each UE according to formula (11). If the longest distance is less than $L$, it can further narrow the beam width facing the ground, which concentrates the emitted energy and further suppresses the adverse effects of the "doubly near-far" phenomenon.

3.5. UAV Flight Path Planning for Throughput Optimization.
For optimization task (1), the throughput optimization problem for the UEs in a single grid (e.g., the grid g k ) has been solved in Section 3.4, but it does not involve how to choose an optimal flight path for the UAV to maximize the throughput it receives from the UEs of all the grids in this path. In this subsection, we will address it.
The UAV is unaware of the distribution of UEs in a grid and their communication requirements until a beamforming training process is performed on a selected grid. Due to the limited battery capacity of the UAV, only a subset of all the grids can be selected to form its flight path during a flight. The selection problem of an optimized subset can be transformed into a MAB problem, where there exists a bandit with multiple arms denoted by ℕ = f1, 2, ⋯, N 2 g and each arm offers a reward (e.g., R g k , g k ∈ ℕ). In this paper, the UAV is the gambler and each grid is its arm.
Based on the MAB theory, in order to maximize the total reward of all the trials, the gambler needs to explore different arms to find the most value one and also exploit the arm with most value as many times as possible so as to maximize the total reward. It is desirable to find a balance between exploration and exploitation.
Unlike the real gambler who can select each arm arbitrarily, the UAV need to fly to different grids to serve UEs. So, it needs to consider not only the battery energy consumption during a flight but also the number of potential UEs in the target grid. The former depends on the horizontal distance between its current grid and the target grid, while the latter requires a reasonable forecast of the number of UEs in the target grid. Since the classic algorithms for the MAB problem do not consider the number of potential UEs in the target grid, we extend them to solve our formulated UAV path planning task. For the distance-aware upper con-fidence bound (D-UCB) algorithm in [8], the UAV first serves each grid once, and then, it selects a grid from ℕ = f 1, 2, ⋯, N 2 g according to the following formula: where t represents the index of serving rounds and the UAV serves a grid in each round, r' n,t is the average reward of grid n, T n,t is the times that the UAV visited grid n, d n t−1 ,n represents the horizontal distance between the next grid and the current one, d n 1 ,n represents the horizontal distance between the next grid and the recharging point, B r represents the remaining battery capacity of the UAV, and α, β, and γ are the relevant empirical parameters.
In (38), if $r'_{n,t}$ is large, the UAV tends to exploit this high-profit grid to get the maximal possible throughput; if $T_{n,t}$ is large, the confidence interval $\sqrt{2 \ln t / T_{n,t}}$ decreases, and thus, the UAV tends to explore other less selected grids. In addition, $\beta d_{n_{t-1},n}$ represents the flight cost and it prompts the UAV to select the nearby grids to save the flight energy consumption, while $\gamma (d_{n_1,n}/B_r)$ represents the remaining battery level of the UAV and it prompts the UAV to select the grids near the recharging point to return to the recharging point with less energy consumption.
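As a rough illustration of how the four terms discussed above interact, the following Python sketch scores candidate grids. The additive form used here (average reward plus $\alpha$ times the confidence term, minus the two distance penalties) is an assumption inferred from the description of the terms, not a transcription of formula (38); the names `ducb_score` and `select_grid` and all numeric values are hypothetical.

```python
import math

def ducb_score(avg_reward, visits, t, d_current, d_base, battery,
               alpha=1.0, beta=0.01, gamma=1.0):
    """Assumed D-UCB-style score (not the paper's exact formula).

    avg_reward: average reward of the grid
    visits:     number of times the grid has been visited (>= 1)
    t:          index of the current serving round (>= 1)
    d_current:  horizontal distance from the current grid
    d_base:     horizontal distance from the recharging point
    battery:    remaining battery capacity
    """
    bonus = alpha * math.sqrt(2.0 * math.log(t) / visits)  # shrinks as visits grow
    flight_cost = beta * d_current                         # favors nearby grids
    return_cost = gamma * d_base / battery                 # grows as battery drains
    return avg_reward + bonus - flight_cost - return_cost

def select_grid(stats, t, battery):
    """stats maps a grid id to (avg_reward, visits, d_current, d_base)."""
    def score(n):
        avg_reward, visits, d_current, d_base = stats[n]
        return ducb_score(avg_reward, visits, t, d_current, d_base, battery)
    return max(stats, key=score)

# Hypothetical grids with equal average reward.
stats = {
    "near": (5.0, 10, 100.0, 200.0),
    "far": (5.0, 10, 900.0, 1500.0),
    "rare": (5.0, 1, 100.0, 200.0),   # visited only once, so larger bonus
}
best = select_grid(stats, t=21, battery=1000.0)
print(best)
```

With equal average rewards, the rarely visited nearby grid wins because its confidence term is larger, while the distant grid is penalized by both distance terms.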
Obviously, formula (38) does not take into account the communication requirements of the UEs whose responses conflict during the current round of the beamforming training process. Ignoring such potential UEs will result in an unfair distribution of access opportunities to the UAV. In addition, in order to avoid response frame conflicts in the beamforming training process, the number of response slots will be increased, which in turn may make the number of response slots greatly exceed the actual number of responding UEs due to the dynamic distribution of UEs. For any grid in which this occurs, if the number of response slots is reduced, the corresponding communication cycle $\tau$ will be increased. That is, this grid has the potential for greater throughput. Therefore, we propose the improved distance-aware grid selection model in formula (39), where $\hat{R}^{\min\{t_0,t\}}_{g_{k+1}}$ is the average throughput of grid $g_{k+1}$ during the $\min\{t_0, t\}$ latest rounds; $t_0$ is fixed to a positive integer; $T_{g_{k+1}}$ is the cumulative number of times that the UAV has visited grid $g_{k+1}$; $d_{g_k,g_{k+1}}$ represents the horizontal distance between the next grid $g_{k+1}$ and the current grid $g_k$; $d_{g_0,g_{k+1}}$ represents the horizontal distance between the next grid $g_{k+1}$ and the recharging point $g_0$; $e_r$ represents the remaining battery energy of the UAV; $\chi_D$ and $\psi_D$ are the relevant empirical parameters; $M_{g_{k+1}}$ is the number of response slots with conflicting signals in the next grid $g_{k+1}$, which records the information about the last conflict that occurred in this grid; $J_{g_{k+1}}$ is the latest number of conflict-free UEs in the next grid $g_{k+1}$; and $I_{g_{k+1}}$ is the latest number of total response slots in the next grid $g_{k+1}$. $\hat{R}^{\min\{t_0,t\}}_{g_{k+1}}$ is given by formula (40), where $R_{g_{k+1}}(i)$ is the throughput of grid $g_{k+1}$ during the $i$th round.
Based on formulas (39) and (40), we design the flight path planning algorithm based on improved distance-aware (FPP-IDA) for a fully recharged UAV to find the most effective path to serve as many UEs as possible. The FPP-IDA algorithm requires that a UAV must first visit each grid once in order to know the rewards from all the grids. After that, it selects the next grid with the greatest potential reward according to the historical average reward of each grid, the number of times each grid has been served, and the other relevant parameters. When the FPP-IDA algorithm judges that the remaining energy after the UAV's visit to the next grid would not be enough for the UAV to fly back to the recharging point, it instructs the UAV to skip the next grid and fly back to the recharging point. The further details for the FPP-IDA algorithm are described in Section 4.
Algorithm 1: FPP-IDA.
Run at the UAV s
Input: $e_{max}$, $t_h$, $p_h$, $p_c$, $p_f$, $v_f$
Output: $\sum_{k=1}^{K} D_{g_k}$
1: $t = 0$; $e_r = e_{max}$; $D_{sum} = 0$; $g_k = g_0$
2: For each grid $g_{k+1} \in \mathbb{N}$ except for the current grid $g_k$ do
3:   Initialize $I_{g_{k+1}}$ to an empirical value
4:   $T_{g_{k+1}} = 0$; $M_{g_{k+1}} = 0$; $J_{g_{k+1}} = 0$
5:   Invoke Algorithm 2 to get $D_{g_{k+1}}$ and update $M_{g_{k+1}}$, $J_{g_{k+1}}$, $\eta_{g_{k+1}}$
6:   $t = t + 1$; $T_{g_{k+1}} = T_{g_{k+1}} + 1$; $D_{sum} = D_{sum} + D_{g_{k+1}}$; $g_k = g_{k+1}$
8: End for
9: $\mathbb{N}_r = \emptyset$
10: For each grid $g_{k+1} \in \mathbb{N}$ except for the current grid $g_k$ do
11:   If $e_r > e$ then add $g_{k+1}$ to $\mathbb{N}_r$ End if
13: End for
14: If $\mathbb{N}_r \neq \emptyset$ then
15:   Select a grid $g_{k+1}$ from $\mathbb{N}_r$ according to formula (39)
16:   If $M_{g_{k+1}} > 0$ then
17:     $I_{g_{k+1}} = I_{g_{k+1}} + M_{g_{k+1}}$
18:   Else
19:   End if
21:   Invoke Algorithm 2 to get $D_{g_{k+1}}$ and update $M_{g_{k+1}}$, $J_{g_{k+1}}$, $\eta_{g_{k+1}}$
22:   Go to 9
25: End if
26: Return $D_{sum}$
For the ∂-exploration algorithm in [8], the UAV selects a grid with probability ∂ or selects the grid that generates the maximum throughput with probability 1-∂. After checking the remaining battery capacity at each round $t$, with probability ∂, the UAV selects a grid from $\mathbb{N} = \{1, 2, \cdots, N^2\}$ according to the softmax function in formula (41), which converts the average throughput of grid $n$ into a selection probability.
Here, $\varrho > 0$ is the temperature parameter of the softmax function in formula (41). If $\varrho$ goes to infinity (i.e., high temperature), all the grids are selected with nearly the same probability. Conversely, if $\varrho$ goes to 0 (i.e., low temperature), the probability of choosing the grid with the highest average throughput tends to 1. Also, with probability 1-∂, the UAV selects a grid from $\mathbb{N} = \{1, 2, \cdots, N^2\}$ according to formula (42).
Since the ∂-exploration algorithm tends to choose the nearby grids according to formula (42), we propose the improved grid selection model based on ∂-exploration in formula (43), where $\chi_{\partial}$ and $\psi_{\partial}$ are the relevant empirical parameters. In addition, we also rewrite formula (41) as formula (44). Based on formulas (40), (43), and (44), we design the flight path planning algorithm based on improved ∂-exploration (FPP-I∂E) for a fully recharged UAV to find the most effective path to serve as many UEs as possible. Unlike the FPP-IDA algorithm, the UAV running the FPP-I∂E algorithm does not serve each grid once in advance since it relies on the probability ∂ to balance exploration and exploitation, where the larger ∂ is, the more inclined the UAV is to exploration. The further details for the FPP-I∂E algorithm are described in Section 4.
Algorithm 2: Beam alignment and first-charging-then-receiving.
Run at the UAV s
Input: $g_k$, $e_r$, $\tau_b$, $t_h$, $p_h$, $p_c$, $p_f$, $v_f$, $I_{g_k}$
Output: $R_{g_k}$, $M_{g_k}$, $J_{g_k}$, $\eta_{g_k}$
1: Initialize $L_r$ to 0
2: Broadcast a directional beacon frame in the mmWave beam facing the ground during the starting slot
3: Wait for response frames during the response slots
4: Broadcast a confirmation frame to announce the conflict-free UEs during the confirming slot
5: Observe the number of conflict-free UEs
6: Update $J_{g_k}$ with the number of the current conflict-free UEs
7: Observe the number of response slots with conflicting signals
8: Update $M_{g_k}$ with the number of conflicting response slots
9:   Calculate $d_{s,u}$ according to formula (11)
13:   If $d_{s,u} > L_r$ then $L_r = d_{s,u}$ End If
14: End for
15: If $L_r < L$ then
16:   Update the UAV's beam width according to $\phi_r = \cot^{-1}(H/L_r)$
17: End If
18: Get $\epsilon^* = [\epsilon^*_0, \epsilon^*_1, \cdots, \epsilon^*_{J_{g_k}}]$ by invoking Algorithms 1-2 in [43]
19: Calculate $R_{g_k}(\epsilon^*)$ according to formula (16)
20: $D_{g_k} = (1 - \epsilon^*_0) \tau R_{g_k}$
21: Return $D_{g_k}$, $M_{g_k}$, $J_{g_k}$, $\eta_{g_k}$
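The temperature behavior of the softmax selection described above can be checked numerically with a short Python sketch. It assumes the standard softmax form, in which the probability of a grid is proportional to the exponential of its average throughput divided by the temperature; the exact expression in formula (41) may differ, and the function name `softmax_probs` and the throughput values are hypothetical.

```python
import math

def softmax_probs(avg_throughput, temperature):
    """Selection probabilities proportional to exp(r / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [r / temperature for r in avg_throughput]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

rewards = [1.0, 2.0, 4.0]             # hypothetical average throughputs
hot = softmax_probs(rewards, 1000.0)  # high temperature: nearly uniform
cold = softmax_probs(rewards, 0.05)   # low temperature: nearly greedy
print(hot)
print(cold)
```

At the high temperature the three probabilities are almost equal, while at the low temperature nearly all of the probability mass falls on the grid with the highest average throughput.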

Problem Solving Algorithm Description
The pseudocode for FPP-IDA is listed in Algorithm 1. In Algorithm 1, after initializing some variables (see line 1), each grid must be visited once (see lines 2~8), where parameter variables associated with each grid are initialized (see lines 3~4), and then, Algorithm 2 is invoked to get the throughput for each grid and update the corresponding parameters (see line 5). After the execution of Algorithm 2, the remaining energy of the UAV is updated (see line 6), the number of times that each grid is visited is recorded, and the amount of data received by the UAV is accumulated (see line 7). After each grid is serviced once, it is necessary to determine the next most appropriate grid to service, where the grids that meet the conditions are firstly identified (see lines 9~13), and then the most suitable one is selected (see lines 14~25). In the identification phase, if the amount of energy that the UAV takes to serve a grid and then return to a recharging point is less than its current remaining energy, this grid can be served by the UAV and thus added to the candidate set ℕ r (see lines 11~12). In the selection and service phase, the algorithm can adjust the serving parameters of the currently selected grid based on its historical data (see lines 16~20), where the number of response slots should be increased accordingly if this grid has any conflicting historical record (see lines 16~17); otherwise, the number of response slots should be reduced appropriately if there are too many idle response slots (see lines 18~20).
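The identification phase described above (building the candidate set $\mathbb{N}_r$) can be sketched in Python as follows. The energy model is an assumption made only for illustration: flight energy is taken as flight power times distance divided by speed, and the energy to serve a grid is passed in as a single constant. The names `flight_energy` and `candidate_grids` and all numeric values are hypothetical.

```python
import math

def flight_energy(a, b, p_f, v_f):
    """Assumed flight energy between two points: power * distance / speed."""
    return p_f * math.dist(a, b) / v_f

def candidate_grids(grids, current, base, e_r, service_energy, p_f, v_f):
    """Grids the UAV can fly to, serve, and still return from within e_r."""
    candidates = []
    for g in grids:
        if g == current:
            continue  # the current grid is excluded, as in Algorithm 1
        needed = (flight_energy(current, g, p_f, v_f)
                  + service_energy
                  + flight_energy(g, base, p_f, v_f))
        if needed < e_r:
            candidates.append(g)
    return candidates

# Hypothetical grid centers in meters; the recharging point is at the origin.
grids = [(0.0, 0.0), (100.0, 0.0), (1000.0, 0.0)]
reachable = candidate_grids(grids, current=(0.0, 0.0), base=(0.0, 0.0),
                            e_r=3000.0, service_energy=500.0,
                            p_f=100.0, v_f=10.0)
print(reachable)
```

When the returned list is empty, the UAV has no grid it can serve and still reach the recharging point, which corresponds to the case where the path ends.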
Lines 2~8 of Algorithm 2 mainly complete the beam training task and observe and record the number of conflicting response slots and the information of nonconflicting UEs. According to the known position information of nonconflicting UEs, Algorithm 2 adjusts the beam width of the UAV (see lines 11~17), and invokes Algorithms 1-2 in [43] to get $\epsilon^* = [\epsilon^*_0, \epsilon^*_1, \cdots, \epsilon^*_{J_{g_k}}]$ (see line 18). From this, the data collected by the UAV from the grid can be calculated (see lines 19~20).
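The beam width adjustment and the data calculation in Algorithm 2 can be illustrated with a minimal Python sketch. It assumes that $L_r$ is the largest horizontal distance from the UAV to a nonconflicting UE, that $L$ is the default coverage radius, and that the beam width stays unchanged when $L_r$ is not smaller than $L$; the function names and numeric values are hypothetical.

```python
import math

def farthest_ue_distance(uav_xy, ue_positions):
    """Largest horizontal distance from the UAV to any conflict-free UE."""
    return max((math.dist(uav_xy, p) for p in ue_positions), default=0.0)

def adjusted_beam_width(height, l_r, l_max, current_width):
    """Narrow the beam to phi_r = arccot(H / L_r) only when L_r < L."""
    if l_r < l_max:
        # arccot(H / L_r) equals atan2(L_r, H) for positive H.
        return math.atan2(l_r, height)
    return current_width

def collected_data(eps0, tau, rate):
    """D = (1 - eps0) * tau * R, as in line 20 of Algorithm 2."""
    return (1.0 - eps0) * tau * rate

l_r = farthest_ue_distance((0.0, 0.0), [(3.0, 4.0), (1.0, 1.0)])
phi = adjusted_beam_width(height=5.0, l_r=l_r, l_max=50.0, current_width=1.5)
data = collected_data(eps0=0.25, tau=2.0, rate=10.0)
print(l_r, phi, data)
```

Using `atan2` avoids a division by zero when no UE responds and $L_r$ stays at 0.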

Based on Algorithm 1 and Algorithm 2, we know that the FPP-IDA algorithm needs $K$ rounds of calculation for a UAV flight path $l_K = \{g_0, g_1, \cdots, g_K\}$, where $N^2 - 1$ grids need to be calculated and compared to select the next service grid in each round, and thus, the computational complexity of the FPP-IDA algorithm is $O(K \times N^2)$. In order to get the optimal solution, we must enumerate each possible solution in the solution space to determine which is optimal, and thus, the computational complexity of obtaining the optimal solution is $O(N^{2K})$. Clearly, the computational complexity of the FPP-IDA algorithm is significantly lower than that of exhaustively searching for the optimal solution.
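To give a sense of the gap between the two complexity orders, the following Python sketch counts the grid comparisons of the round-by-round selection and the number of length-$K$ grid sequences an exhaustive search would face. The values of $N$ and $K$ are hypothetical and chosen only for illustration.

```python
def greedy_evaluations(n, k):
    """K rounds, each comparing the N^2 - 1 grids other than the current one."""
    return k * (n * n - 1)

def exhaustive_sequences(n, k):
    """Upper bound on the number of length-K grid sequences: (N^2)^K = N^(2K)."""
    return (n * n) ** k

n, k = 10, 20  # hypothetical: a 10 x 10 grid area and a 20-grid path
print(greedy_evaluations(n, k))    # 1980
print(exhaustive_sequences(n, k))  # 10**40
```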
The pseudocode for FPP-I∂E is listed in Algorithm 3, which is largely similar to Algorithm 1, except that it does not require each grid to be served once at the beginning of the execution process and that its grid selection strategy (see lines 13~14 and lines 25~26) differs from that of Algorithm 1.

Experimental Parameter Settings.
We study the performance of our two algorithms (i.e., FPP-IDA and FPP-I∂E) through simulation in terms of the total number of served UEs, the number of visited grids, the amount of data, the average throughput, and the battery capacity utilization level. The battery capacity utilization level is defined as the ratio of the amount of energy used for wireless recharging and hovering to the initial amount of energy of the UAV, and it measures the percentage of energy spent on serving UEs. A larger value means that the UAV spends more of its energy serving UEs rather than flying.
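The definition of the battery capacity utilization level above amounts to a single ratio, sketched here in Python. The function name `battery_utilization` and the energy values are hypothetical.

```python
def battery_utilization(e_recharge, e_hover, e_initial):
    """Share of the initial energy spent on wireless recharging and hovering."""
    if e_initial <= 0:
        raise ValueError("initial energy must be positive")
    return (e_recharge + e_hover) / e_initial

# Hypothetical energy budget: the remaining half is spent on flight.
level = battery_utilization(e_recharge=200.0, e_hover=300.0, e_initial=1000.0)
print(level)  # 0.5
```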

To compare the strengths and weaknesses of our two grid selection models, we consider the flight path planning algorithm based on distance-aware (FPP-DA) and the flight path planning algorithm based on ∂-exploration (FPP-∂E) as the comparison algorithms. FPP-DA selects the next grid by formula (38) instead of formula (39), and the others are the same as those of FPP-IDA. Similarly, FPP-∂E selects the next grid by formulas (41) and (42) instead of formulas (43) and (44), and the others are the same as those of FPP-I∂E.
The UEs are randomly distributed and the traffic load in any grid follows the binomial distribution $B(u_k, P)$, where $u_k$ is the number of UEs in grid $g_k$ and $P$ is the probability that a UE needs radio access service, which is fixed to 0.2 for each UE. In addition, each UE will transfer a given amount of data when it wants to get access service, where the length of the sent data is determined randomly. If any UE gets an access service in a round but does not complete all of its data transfers in this round, it will compete for access service at each subsequent beamforming training opportunity it encounters.
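The binomial traffic model above can be sampled with a few lines of Python. This is a sketch of the load model only, not of the authors' simulator; the function name `active_ues`, the seed, and the per-grid UE counts are hypothetical.

```python
import random

def active_ues(u_k, p=0.2, rng=random):
    """Draw the number of UEs requesting service in a grid: Binomial(u_k, p)."""
    # Each of the u_k UEs independently requests service with probability p.
    return sum(1 for _ in range(u_k) if rng.random() < p)

rng = random.Random(0)          # fixed seed so that a run is repeatable
ues_per_grid = [12, 0, 30, 7]   # hypothetical number of UEs in four grids
loads = [active_ues(u, 0.2, rng) for u in ues_per_grid]
print(loads)
```

On average, a grid with $u_k$ UEs produces $0.2 u_k$ requests per round under this model.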
The key parameters of the proposed algorithms and the comparison algorithms are selected via a series of numerical simulations, and these parameters and the other main parameters are listed in Table 1.

Experimental Results and Analysis.
We first consider the five simulations, which are shown in Figures 5-9. Here, the number of UEs is fixed to 1000, and the distance between the recharging point and the center of the covered area is fixed to 900 m.
We simulate the performance trend of the four algorithms as the initial amount of energy (i.e., the battery capacity) varies. From Figures 5-7, we can see that all four algorithms serve more UEs, visit more grids, and collect more data as the battery capacity increases. This is because longer paths can be planned when the UAV has more energy. Figure 8 shows that there is no significant relationship between the throughput and the battery capacity, which is consistent with intuition. Still, there is a small variation trend. When the battery capacity grows from a small value, the UAV has more opportunities to find the grids with higher throughput. However, as it continues to grow, this advantage becomes less obvious or even vanishes since the UAV may not find more grids with higher throughput.
From Figure 9, the battery capacity utilization level in FPP-IDA and FPP-DA decreases with the battery capacity, while that of FPP-I∂E and FPP-∂E increases with the battery capacity. In FPP-IDA and FPP-DA, once a grid is visited, the probability of its being visited again is reduced. Therefore, the UAV needs to fly to more distinct grids to serve UEs, which increases its flying distance and consumes more flight energy. In FPP-I∂E and FPP-∂E, once the UAV has spotted a few nearby high-throughput grids, it tends to fly repeatedly over the neighboring area since the exploration probability ∂ is very small, which helps to reduce the flying distance and flight energy.
Figures 5-9 show that FPP-IDA outperforms FPP-DA on the five performance metrics. This is mainly because we have taken into account the avoidance of conflicts in the beamforming training phase and the dynamic adjustment of response slots. From Figures 5-9, we also see that these factors have little effect on FPP-I∂E. As mentioned above, FPP-I∂E tends to make the UAV fly repeatedly over the neighboring area once it has spotted a few nearby high-throughput grids, so the effect of these factors is diminished when the UAV always serves the UEs in those familiar grids. Compared with FPP-∂E, FPP-I∂E allows the UAV to explore more unknown grids and thus extend the service range, though it performs worse in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level.
In addition, compared with FPP-I∂E and FPP-∂E, FPP-IDA and FPP-DA have a wider service range and thus better fairness. This is mainly because in FPP-IDA and FPP-DA, the increase in the number of times a grid has been served reduces the probability of its being served again, which in turn limits the performance improvement in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level due to more flight energy consumption.
In Figures 10-14, the battery capacity is fixed as 1000, while the distance between the recharging point and the center of the covered area is fixed as 900 m. We simulate the performance trend of the four algorithms as the number of UEs varies. From Figures 10-14, we can see that, as the number of UEs increases, four of the five performance metrics improve for all four algorithms. This is because the four algorithms can select better planned paths in a high-density user environment, which helps to improve the total number of served UEs, the amount of data, the average throughput, and the battery capacity utilization level. However, in terms of the number of visited grids, FPP-∂E is the least stable as the number of UEs changes; in particular, when the number of UEs is larger, the UAV is more likely to be confined to a smaller flight range. The main reason is that the exploration probability ∂ is too small in FPP-∂E. For the same exploration probability, FPP-I∂E overcomes this weakness of FPP-∂E since it increases the visiting probability of the grids with conflicting records and thus extends the service range. Figures 10-14 show that FPP-IDA is on the whole better than FPP-DA while FPP-I∂E is slightly worse than FPP-∂E in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level. However, FPP-I∂E is superior to FPP-∂E in terms of the total number of visited grids. In addition, because of the larger service range, the total number of served UEs in FPP-I∂E exceeds that in FPP-∂E when the number of UEs distributed on the ground is large. The main reason for the above phenomena is the same as that explained for Figures 5-9.
In Figures 15-19, the number of UEs is fixed as 1000, while the battery capacity is fixed as 100. As shown in Figures 15, 17, and 19, the performance of all four algorithms in terms of the total number of served UEs, the amount of data, and the battery capacity utilization level decreases as the distance between the recharging point and the center of the covered area increases. The main reason is that a longer distance between the recharging point and the center of the covered area requires more energy to be reserved for returning to the recharging point, so there is less energy to serve the UEs when the battery capacity is fixed. Figure 16 shows that there is no significant correlation between the number of visited grids and the distance between the recharging point and the center of the covered area, while Figure 18 shows that there is no significant correlation between the average throughput and this distance. In fact, the former mainly depends on the strategy function for selecting grids, while the latter mainly relies on the selected grids.

Conclusion
In this paper, we have addressed the performance optimization problem in UAV-assisted wireless powered mmWave networks for emergency communications. The optimization task of concern is transformed into an extended MAB problem, for which we have proposed two algorithms, FPP-IDA and FPP-I∂E. Our algorithms improve the ability to explore the unknown user distribution. Simulation results show that FPP-IDA outperforms FPP-DA in terms of all the five metrics while FPP-I∂E outperforms FPP-∂E in terms of the number of visited grids.

Data Availability
The simulation data used to support the findings of this study are available from the corresponding author upon request.