Modelling and Simulation of Distributed UAV Swarm Cooperative Planning and Perception

As an emerging topic, the swarm of autonomous unmanned aerial vehicles (UAVs) has been attracting great attention. Due to the indeterminacy of sensors, distributed cooperative swarms are considered efficient and robust but challenging to design and test. To facilitate the development of distributed swarms, it has been proposed to utilise a simulation platform for cooperative UAVs using imperfect perception. However, the existing simulation platforms cannot satisfy this demand for several reasons. First, they are designed for a specific purpose, and their functionalities are difficult to extend. Second, the existing platforms lack the compatibility needed to be applied to different types of scenarios. Third, the modelling of these platforms is too simplified to simulate flight motion dynamics and noisy communication accurately, which may cause a difference in performance between the simulation and the real-world application. To address these issues, this paper models the problem and proposes a simulation platform for distributed swarm cooperative perception, which addresses software engineering concerns and provides a set of extendable functionalities of a cooperative swarm, including communication, estimation, perception fusion, and path planning. The applicability of the proposed platform is verified by simulations of a real-world application. The simulation results demonstrate that the proposed system is viable.


Introduction
Swarms of autonomous unmanned aerial vehicles (UAVs) are an emerging topic in the fields of manufacturing, disaster rescue, and the military [1]. Autonomous unmanned swarms are expected to outperform a single complex UAV in terms of flexibility and robustness, thus offering enhanced adaptability, survivability, and fault tolerance. For instance, in strike coordination and reconnaissance missions, a swarm of UAVs can cover a given area in a shorter time than a single UAV, and the failure of any swarm member will not cause the failure of the entire mission, as shown in Figure 1. However, due to the complexity of the joint decision, the larger the swarm is, the greater the demands on the cooperative planning and perception algorithm will be. To achieve the aforementioned benefits, the interswarm cooperative planning and perception algorithm, as well as many other factors, needs to be studied in depth.
First, the interswarm planning and perception algorithm should be fully distributed. Otherwise, the swarm is vulnerable to the failure of a particular member. Take a centralised swarm as a counterexample, where the swarm is controlled by a central member (leader) that fuses the perceptions of the members and makes planning decisions for them. Once the leader fails, the swarm also fails; thus, such a swarm is considered nonresilient. Second, the interswarm planning and perception algorithm should be intelligent, in the sense that the swarm performance should increase with the swarm scale. Lastly, to be applicable in the real world, the physical constraints of the swarm, such as packet loss in communication and noise-influenced sensor perception, need to be taken into account.
Studies on swarm cooperative planning and perception have been limited by many factors, such as the difficulty of real-world experiments due to the lack of funds to build a UAV swarm, the risk of damage when a UAV in the swarm crashes [2], and the regulations that constrain UAV flight in urban areas. Therefore, simulation-based validation and testing can significantly facilitate the study of multiple UAV swarm planning and control. Vásárhelyi et al. [3] optimised the parameters of their proposed decentralised guiding algorithm, which enabled large swarms of autonomous drones to navigate in confined spaces seamlessly. The optimised algorithm was tested in real-world applications using a swarm of 30 drones, and the obtained test results coincided well with the expectations. Researchers in the ETH-MAV team in the 2017 Mohamed Bin Zayed International Robotics Challenge tested their swarming algorithm for the competition [4] before a real flight using Gazebo [5] as the simulation environment. In most studies, the proposed algorithms have been verified only by numerical simulation. Researchers from the US Air Force Research Laboratory [6] proposed a multiple autonomous vehicle visiting routine planning (VRP) algorithm and validated it by numerical simulations. A framework for generating feasible trajectories in motion coordination problems proposed in [7] was also validated by simulations. Shao et al. [8,9] developed a state estimator-based minimal learning parameter (SE-MLP) observer, verified by simulation, to deal with uncertainties appearing in individual quadrotors of the swarm.
Currently, there are many commercial and open-source simulation platforms for multi-UAV planning and control. MultiUAV2 [10], released by the US Air Force Research Laboratory and based on the MATLAB and Simulink software, is a simulation platform capable of simulating multiple UAVs that cooperate to accomplish tactical missions. However, this platform may raise licensing problems for research groups without a commercial license. CoUAV [11] enables rapid implementation of simulations for multiple UAVs, and its source code and demo are publicly available, but in this platform, each member of a UAV swarm is controlled by a ground control centre, which forms a centralised rather than a distributed scheme. The "Infoplanner" simulation platform open-sourced by Schlotfeldt et al. [12] is capable of simulating decentralised cooperative planning using an imperfect sensor. However, due to its specific design and lack of documentation, extensions to the system, such as a new sensor model or communication setting, are challenging. An expandable, maintainable, and commercial license-free simulation platform is currently unavailable.
To address the mentioned challenges, this paper proposes a simulation platform that provides flexibility in defining a variety of motion dynamics, sensor models, and communication settings; convenient integration of different objectives with the planning algorithm; out-of-the-box trajectory plotting and multiround Monte Carlo simulation; and no requirement for a commercial license. The proposed simulation platform is developed with software engineering concerns in mind, such as maintainability and expansibility, and is applied in a military operation study [13,14].
The remainder of this paper is organised as follows. Section 2 introduces the system modelling. Section 3 presents simulation cases of cooperatively locating a radio emitter with an airborne radio receiver to demonstrate the usefulness of the proposed platform. Lastly, Section 4 concludes the paper and presents future work directions.

System Modelling
2.1. Overview. In this section, the swarm planning and perception procedure is modelled in a general pattern, as shown in Figure 2. The system design and architecture of the simulation platform, which aim to support rapid and robust development and implementation, are also introduced and are shown in Figure 3. To apply the simulation platform to different scenarios, UAV platforms, perception apparatus, and target motion characteristics, the system is modelled at a high level of abstraction that highlights the interaction among the UAVs, targets, perception, and communication. Every part of the model can later be replaced with a specific detailed model. In software engineering, this replacement is known as the concrete implementation of an abstract representation.

2.2. Problem Formulation. The problem is formulated in a lockstep, discrete-time manner in which the states of all modules are computed synchronously. This design allows the simulation to run faster or slower than real time and to be paused at any moment. Consider a swarm consisting of $n_a$ UAVs with discrete motion dynamics. Then, it can be written that

$$s_{k+1}^i = f\left(s_k^i, u_k^i\right), \quad i \in A = \{1, \cdots, n_a\},$$

where vector $s_k^i \in S^i \cong \mathbb{R}^{n_s}$ represents the $n_s$-dimensional state of UAV $i$ at time $k$, $f(\cdot)$ denotes the motion dynamics, and vector $u_k^i \in U^i$ is the control action applied to UAV $i$ at time $k$, chosen from the set of all possible decisions $U^i$.
Further, consider $n_t$ targets with Markovian state transitions so that

$$x_{k+1}^j \sim \mathbb{P}_t\left(X_{k+1}^j \mid X_k^j\right), \quad j \in T = \{1, \cdots, n_t\},$$

where $x_k^j$ is the state vector of target $j$ at time $k$, $X_k^j$ is the set of all possible states of target $j$, and $\mathbb{P}_t(\cdot)$ is the transition probability of target $j$. It should be noted that a semi-Markovian system can be converted into a Markovian system by extending the state. According to real-world application experience, it is assumed that a target has no awareness of the existence of the swarm and that its underlying state transition model is unavailable to the swarm. Considering Bayesian inference, the swarm's model of the target state transition can be expressed as

$$x_{k+1}^j \sim \mathbb{P}_t'\left(X_{k+1}^j \mid X_k^j\right), \quad j \in T = \{1, \cdots, n_t\},$$

where $\mathbb{P}_t'$ is derived from the swarm's prior knowledge; in the most accurate case, it holds that $\mathbb{P}_t' = \mathbb{P}_t$.
In a realistic application, the perceptions of a UAV are imperfect. The perception result obtained from the states of UAV $i$ and target $j$ and the environmental noise can be modelled as follows:

$${}^{i}y_k^j = h\left(s_k^i, x_k^j, v_k\right),$$

where ${}^{i}y_k^j \in \mathbb{R}^{n_y}$ denotes the perception result of target $j$ observed by UAV $i$ at time $k$, $v_k$ is the noise dependent on the state and environment, and $h(\cdot)$ models the properties of perception.
This work considers an imperfect interswarm communication network, where any two members $l, m \in A$ can share their states, perceptions, and control decisions if needed. Due to environmental interference and packet loss, the communication between UAVs can be modelled as a probabilistic model of the received packet. Assume that ${}^{l}_{m}z_k$ is the packet sent by UAV $m$ and received by UAV $l$ at time $k$, $Y_k^m$ is UAV $m$'s perception result for all targets, and ${}^{l}_{m}Z_k$ is the set of all possible cases of ${}^{l}_{m}z_k$. In addition, denote the probability function of the communication distribution as $\mathbb{P}_c(\cdot)$. Then, ${}^{l}_{m}z_k$ is a random variable over ${}^{l}_{m}Z_k$ distributed according to $\mathbb{P}_c(\cdot)$.
In this work, the decision process of cooperative planning and perception is considered as a distributed optimal control problem with the above constraints. Assume that a quantity of interest $\Psi$ represents the yield of the behaviours of the UAV members; then, it can be written that

$$U^* = \arg\max_{u \in U} \Psi(s, x, y, z),$$

where $U^*$ denotes the optimal control decision of all UAV members in a swarm, $U$ is the set of all control decisions available to the UAVs, and $(s, x, y, z)$ is a set consisting of the UAV state, target state, perception result, and communication packet. The underlying form of $\Psi$ is determined by the mission of the swarm. For instance, in a motion planning problem that prefers saving energy, $\Psi$ can be defined as the negative integral of the throttle along the path. A distributed control decision is made individually by each swarm member, aiming to maximise the combined yield of the entire team. The decision can be made by one of three possible approaches from control theory, each of which is based on $(s^i, y^i, z^i)$, the set consisting of the state of UAV $i$, the perception results of UAV $i$, and all communication packets received by UAV $i$.
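The probabilistic packet model described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the class name `LossyChannel`, the `delivery_prob` callable, the planar `State` type, and the linear distance rule in `distance_based_prob` are all hypothetical choices made only for this example.

```python
import math
import random
from typing import Callable, Optional, Tuple

State = Tuple[float, float]  # hypothetical planar UAV position (east, north)


class LossyChannel:
    """Delivers a packet with a state-dependent probability, otherwise drops it."""

    def __init__(self, delivery_prob: Callable[[State, State], float], seed: int = 0):
        self._delivery_prob = delivery_prob
        self._rng = random.Random(seed)

    def transmit(self, sender: State, receiver: State, packet: dict) -> Optional[dict]:
        # Clamp to [0, 1] so that any user-supplied model yields a valid probability.
        p = min(1.0, max(0.0, self._delivery_prob(sender, receiver)))
        # One random draw per packet: delivered with probability p, lost otherwise.
        return packet if self._rng.random() < p else None


def distance_based_prob(sender: State, receiver: State, max_range: float = 1000.0) -> float:
    """Assumed example model: delivery probability falls linearly with distance."""
    return 1.0 - math.dist(sender, receiver) / max_range
```

A receiver that gets `None` simply has no packet from that sender at this time step, which is how a planner or estimator would observe the loss.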

2.3. Platform Implementation. During the implementation of the simulation platform, issues concerning maintainability and functional encapsulation were addressed. Since developing a simulation platform is a complex task for a small research group, the main purpose of the system design process is to manage complexity and reuse code while ensuring that the previously formulated problem can be implemented [15]. Python is chosen as the primary programming language because its popularity ensures availability to developers in small research groups, development productivity, an abundance of relevant packages, and ample reference cases. Namely, Python is a developer-friendly language that is easy to learn and code in, and a novice researcher can learn the basics in a relatively short time. In contrast, alternative high-level languages, such as C++ and Java, and emerging programming languages, such as Go, require much more time to master, so a small research group may lack a developer with enough experience in these languages. In addition, unlike MATLAB, which requires a paid license, Python has a prosperous open-source community that provides several free-of-charge interpreter implementations. The features of the Python language, including dynamic typing, object orientation, and coroutine support, all of which are available in Python 3.7 and later, can greatly improve development productivity. Python is well known for allowing developers to write less code thanks to its concise and expressive grammar. The packages hosted on PyPI or on source code hosting sites such as GitHub also add to Python's appeal and popularity. SciPy adds efficient matrix and statistics functions to Python, while SimPy makes building a discrete-event simulation much easier by using generator-based coroutines.
The key to managing complexity is constructing a domain model representing the essentials of a problem to be addressed [16]. As the distributed cooperative planning and perception problem has been formulated before, the next step is to refine concepts from the problem formulation and to design the information flow between components. The overall structure of a domain model is displayed in Figure 3. The presented structure uses ubiquitous language during the implementation of the proposed platform and bridges the gap between the formula and code.
A class, a concept that originates from object-oriented programming (OOP), helps a developer to integrate data with functions, enabling structure reuse and keeping the interface intact while preserving the ability to modify details and implementations. It is well known that scattered, duplicated code can significantly jeopardise the maintainability of software and lead to defects; in particular, such code cannot be reused without modification. To prevent this deterioration in advance, each element of the structure presented in Figure 3 is encapsulated in a class in the proposed simulation platform. It should be noted that the object-oriented syntax of Python is concise and expressive.
The proposed platform consists of two major top-level components, Environment and Autonomy UAV. The Environment component plays the pivotal role of simulating the real world and keeps all ground truth data in its subcomponents. Autonomy UAV has multiple instances, one for each UAV in the swarm. Each instance is isolated from the ground truth data in Environment and retrieves data from Environment only after they have been contaminated by the Noise component. The subcomponents of Environment are UAVs, Targets, Communication Channel, and Noise. Information exchange between UAVs is handled by the Communication Channel component, which decides whether a packet is delivered according to the communication model. The UAVs and Targets subcomponents of Environment compute the state evolution using the dynamics assigned by Simulation. Note that in the proposed solution, these dynamics can differ from those used in Plan Algorithm, since a simplified proxy model can be used in planning. The Sensor component in Autonomy UAV defines the format of the perception data and provides a likelihood model for the Fusion component and the Plan Algorithm component. Combining the perception data and the Fusion model, the Estimator component estimates the target state using the target model assigned to the Autonomy UAV component, which is also not necessarily the same as that in Targets. Plan Algorithm provides the control based on the UAV state, the estimation result, and the packed data obtained from the Communication Channel component. The control action, UAV state, and estimation are gathered as packed data and sent to the other Autonomy UAV components, as well as to the Environment component, to change the UAV state.
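To make the information flow described above more concrete, the following Python sketch shows one way the Autonomy UAV side could be laid out with abstract base classes. The component names follow the text, but every method name, signature, and the two toy implementations (`LastMeasurementEstimator`, `HoldCoursePlanner`) are assumptions introduced for illustration; they are not taken from the platform's source code.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Sensor(ABC):
    """Defines the perception format and a likelihood model (abstract)."""

    @abstractmethod
    def likelihood(self, measurement: Any, uav_state: Any, target_state: Any) -> float:
        ...


class Estimator(ABC):
    """Turns perception data into a target-state estimate (abstract)."""

    @abstractmethod
    def update(self, measurement: Any, uav_state: Any) -> Any:
        ...


class PlanAlgorithm(ABC):
    """Chooses a control action from local information only (abstract)."""

    @abstractmethod
    def plan(self, uav_state: Any, estimate: Any, inbox: List[dict]) -> Any:
        ...


class LastMeasurementEstimator(Estimator):
    """Toy estimator: the estimate is simply the latest available measurement."""

    def __init__(self) -> None:
        self._last = None

    def update(self, measurement: Any, uav_state: Any) -> Any:
        if measurement is not None:
            self._last = measurement
        return self._last


class HoldCoursePlanner(PlanAlgorithm):
    """Toy planner: always returns a zero yaw change."""

    def plan(self, uav_state: Any, estimate: Any, inbox: List[dict]) -> float:
        return 0.0


class AutonomyUAV:
    """One swarm member; it never sees ground truth, only noisy data and packets."""

    def __init__(self, uav_id: int, estimator: Estimator, planner: PlanAlgorithm):
        self.uav_id = uav_id
        self.estimator = estimator
        self.planner = planner

    def step(self, uav_state: Any, measurement: Any, inbox: List[dict]) -> dict:
        estimate = self.estimator.update(measurement, uav_state)
        control = self.planner.plan(uav_state, estimate, inbox)
        # Packed data sent to the other members and to the environment.
        return {"sender": self.uav_id, "state": uav_state,
                "estimate": estimate, "control": control}
```

Swapping in a detailed sensor, estimator, or planner then only requires a new subclass, which is the "concrete implementation of an abstract representation" idea in code form.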
The remaining three components of the top level are Simulation, Simulation Clock, and Performance Clock.

International Journal of Aerospace Engineering
Simulation loads the parameters from config files, calls functions to initialise the simulation, and saves the simulation data for plotting. Simulation Clock maintains a discrete event priority queue and drives the simulation forward. Performance Clock records computation resource usage data when this feature is enabled. This component is implemented in an aspect-oriented scheme using a "wrapper" (a.k.a. decorator), a Python language feature that allows performance monitoring without modifying the simulation code.
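The two clock components can be sketched with the Python standard library alone. The sketch below is an assumption about how such components might look, not the platform's code: `SimulationClock`, `performance_clock`, and the module-level `TIMINGS` store are hypothetical names, and only wall-clock time is recorded here.

```python
import functools
import heapq
import itertools
import time
from collections import defaultdict
from typing import Callable, Dict, List

TIMINGS: Dict[str, List[float]] = defaultdict(list)  # hypothetical result store


def performance_clock(func: Callable) -> Callable:
    """Decorator that records the wall-clock duration of every call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS[func.__name__].append(time.perf_counter() - start)

    return wrapper


class SimulationClock:
    """Drives the simulation by popping events from a priority queue."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: list = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, delay: float, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), callback))

    def run(self, until: float) -> None:
        # Process events in time order; simulated time is decoupled from real time.
        while self._queue and self._queue[0][0] <= until:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        self.now = until
```

Because the timing logic lives entirely in the decorator, enabling or disabling monitoring amounts to adding or removing `@performance_clock` on a function, with no change to the function body.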

Cooperative Locating Radio Emitter with Airborne Radio Receiver
This section introduces the simulation platform configured for a simulation case drawn from real-world applications, in which a swarm of UAVs equipped with airborne radio receivers attempts to locate a moving ground target emitting radio signals while carrying limited fuel. The main properties of this task are as follows. First, and most importantly, the swarm is completely autonomous and distributed, so the trajectories of the UAVs in the swarm are planned online according to the distributed cooperative decision of the swarm rather than predefined by operators. Second, due to the nonlinearity of sensor perception and the presence of non-Gaussian noise, the posterior distribution is highly nonunimodal, and the standard deviation criteria have limited guiding significance in the planning of swarm trajectories. Thus, a quantity of interest must be designed to indicate the location of the moving target, as well as energy conservation. Lastly, due to the manoeuvrability constraints of a fixed-wing UAV, the planning of swarm trajectories must consider future returns in advance.

3.1. Energy-Aware Motion Model of UAV in Swarm. It is assumed that during the locating process, swarm members fly at a fixed speed and height [17] and that the action candidates are a finite set of discrete yaw angles. In the motion model, $s_E$ and $s_N$ denote a swarm member's coordinates in a north-east coordinate plane, which is defined by the air zone; $v$ is a fixed velocity; $\varphi$ is the yaw angle; $\gamma \geq 0$ is a real number representing the remaining energy or power of a UAV (once $\gamma = 0$, the UAV fails); $c > 0$ is a fixed energy consumption ratio for flying; and, lastly, $d = \eta(u)$ denotes the steering cost.
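One plausible discrete-time form of this motion model is sketched below in Python. The update rule is an assumption made for illustration, not the equation used in the paper: yaw is taken as clockwise from north, the control $u$ is a yaw increment, energy drops by $c$ plus the steering cost at every step, and the steering cost $\eta(u)$ is assumed proportional to $|u|$. The names `UavState`, `steering_cost`, and `step` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UavState:
    east: float    # s_E
    north: float   # s_N
    yaw: float     # phi in radians, assumed clockwise from north
    energy: float  # gamma >= 0; the UAV fails once it reaches 0


def steering_cost(u: float, k_turn: float = 0.5) -> float:
    """Assumed steering cost d = eta(u): proportional to the size of the turn."""
    return k_turn * abs(u)


def step(state: UavState, u: float, v: float = 20.0, c: float = 1.0,
         dt: float = 1.0) -> UavState:
    """Advance one time step with yaw increment u at fixed speed v."""
    if state.energy <= 0.0:
        return state  # assumption: a failed UAV no longer moves
    yaw = (state.yaw + u) % (2.0 * math.pi)
    east = state.east + v * math.sin(yaw) * dt
    north = state.north + v * math.cos(yaw) * dt
    energy = max(0.0, state.energy - c - steering_cost(u))
    return UavState(east, north, yaw, energy)
```

With such a model, a discrete action set is just a small tuple of yaw increments, for example a left turn, straight flight, and a right turn.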

3.2. Radio Emitter Target Model. In the simulation scenario, two different models of the radio emitter are used: one in the Environment component to compute the ground truth and one in the Planning component to plan the UAV actions. The state vector of the moving target can be expressed as $x = [x_E, x_N, \alpha]^T$, where $x_E$ and $x_N$ are the target coordinates in the same north-east coordinate plane as that of the swarm members, and $\alpha > 0$ is an unknown constant related to the transmit power of the radio emission. It should be noted that the swarm has no prior knowledge of $\mathbb{P}_t$.

3.3. Sensor Model of Airborne Radio Receiver. The airborne radio receiver is assumed to perceive the range and bearing of a radio emitter based on the strength and direction-of-arrival of the emitted signal. The perceptions are polluted by noise, and in accordance with reality, the bearing perception is polluted by non-Gaussian noise.
The strength perception of radio emission is modelled using the relative emission power $rss$. In this model, $\mu > 0$ is a constant related to the performance of the signal receiver and $\alpha$ is Rayleigh-distributed noise introduced by the environmental and thermal noise in the radio receiver [18]. It should be noted that radio emission is undetectable when $rss \leq 0$, which represents the fact that the sensitivity of a radio receiver is limited.
The bearing perception is modelled as the true bearing corrupted by $\omega$, which denotes the non-Gaussian noise introduced by a random distortion of the radio waveform. The random variable $\omega$ is generated from a weighted mixture of a Gaussian distribution and a "long-tailed" noise distribution (Student's t distribution) [19], which can be expressed as

$$\omega \sim \xi\, g_t + (1 - \xi)\, g_G,$$

where $g_t$ is the probability density function of the t distribution, $g_G$ is the probability density function of the Gaussian distribution, and $\xi \in [0, 1]$ is the weight representing the non-Gaussian degree. It should be noted that, in real-world scenarios, radio emission can be detected only within a limited range of direction-of-arrival due to the structural characteristics of a radio receiver antenna. Thus, the sensor model of the airborne radio receiver returns a perception only when the target lies within the field of view, where $\varphi_{fov}$ is a constant representing the maximum angle of the field of view with respect to the yaw of the UAV.
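The mixture noise and the field-of-view limit can be sketched in Python using only the standard library. This is an illustrative assumption rather than the paper's sensor code: the function names, the default scale parameters, and the degrees of freedom of the t distribution are hypothetical, and the t sample is built from a standard normal and a chi-square draw.

```python
import math
import random


def sample_bearing_noise(rng: random.Random, xi: float, sigma: float = 0.05,
                         dof: float = 3.0, scale: float = 0.05) -> float:
    """Draw omega from xi * Student-t + (1 - xi) * Gaussian (parameters assumed)."""
    if rng.random() < xi:
        # Student-t sample: Z / sqrt(chi2 / dof), with chi2 ~ Gamma(dof / 2, scale 2).
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(dof / 2.0, 2.0)
        return scale * z / math.sqrt(chi2 / dof)
    return rng.gauss(0.0, sigma)


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def in_field_of_view(bearing: float, yaw: float, fov: float) -> bool:
    """True when the direction-of-arrival is within fov of the UAV yaw."""
    return abs(wrap_angle(bearing - yaw)) <= fov
```

Setting `xi=0.0` recovers a purely Gaussian sensor, while values closer to 1 produce the occasional large bearing outliers that make the posterior non-Gaussian.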

3.4. Channel Model of Interswarm Communication. The detections, i.e., the perception results, are sent together with the UAV states $s$ to the other UAV members via a lossy channel. The packet loss probability is directly proportional to the distance between the two communicating UAVs. The two outcomes, successful delivery and failure to deliver, follow a Bernoulli distribution whose parameter can be obtained from empirical data [20]. Let ${}^{l}_{m}\beta_k(s_k^l, s_k^m)$ denote the parameter of this Bernoulli distribution at time $k$, where $m$ represents a UAV and $l$ represents the ground vehicle; it is expressed in terms of constant experimental parameters $\alpha_i$, and (9) can then be rewritten accordingly.

3.5. Bayesian Inference-Based Site Locating. There are many sources of uncertainty along the perception information flow from a UAV to the ground vehicle, including noise, fault detection, and communication loss. To estimate the location of a lost pilot, as well as the estimation accuracy, a distributed iterative particle filter is utilised. Assuming that only poor prior information about the location of the lost pilot is available, the prior distribution $P(s_t)$ is chosen as a uniform distribution over the map. Particle sets with weights $\langle \hat{s}_t^k, w^k \rangle$ are sampled from the prior distribution. The particle set evolves with each perception result or packet reception through the forward state prediction (13) and a weight update. To mitigate the particle degeneracy problem, a standard importance resampling step is applied after the particle weights are updated.
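The predict, weight-update, and resample cycle described above can be sketched as a generic bootstrap particle filter in Python. This is a textbook-style sketch under stated assumptions, not the distributed iterative filter of the paper: particles are planar positions, the prediction is a Gaussian random walk, the likelihood is supplied by the caller, and a lost packet (`None`) leaves the weights unchanged. All function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Particle = Tuple[float, float]  # hypothetical planar target position


def init_uniform(rng: random.Random, n: int, size: float):
    """Sample n particles uniformly on a size x size map with equal weights."""
    particles = [(rng.uniform(0.0, size), rng.uniform(0.0, size)) for _ in range(n)]
    return particles, [1.0 / n] * n


def predict(rng: random.Random, particles: List[Particle], sigma: float) -> List[Particle]:
    """Forward prediction with an assumed random-walk target model."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in particles]


def update(particles: List[Particle], weights: List[float],
           likelihood: Callable[[object, Particle], float],
           measurement: Optional[object]) -> List[float]:
    """Bayes weight update; a lost packet (None) leaves the weights untouched."""
    if measurement is None:
        return list(weights)
    raw = [w * likelihood(measurement, p) for w, p in zip(weights, particles)]
    total = sum(raw)
    if total <= 0.0:
        # Assumption: fall back to equal weights if every particle is ruled out.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def resample(rng: random.Random, particles: List[Particle], weights: List[float]):
    """Multinomial importance resampling; weights are reset to uniform."""
    n = len(particles)
    return rng.choices(particles, weights=weights, k=n), [1.0 / n] * n
```

A perception made by the UAV itself and a perception received in a packet from another member can both be fed through `update`, each with the likelihood of the sensor that produced it.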
3.6. Quantity of Interest for Cooperative Decision. As a characterization of uncertainty, the mutual information of the radio emitter state and accumulated perceptions is chosen as one of the two quantities of interest used to evaluate perception quantity. Due to the imperfectness and nonlinearity of UAV perceptions, the radio emitter location acquired by the Bayesian inference is always a nonunimodal probability distribution. In such a situation, other measures like covariance are not appropriate [21]. Another quantity of interest is the sum of distances between the target and UAVs with a negative coefficient related to energy γ.
Assume that $dis$ represents the sum of distances between the UAVs and the target; then, the objective $\Psi$ in (10) can be expressed as

$$\Psi = I(Y, Z) + \eta(\gamma) \cdot dis,$$

where $I(\cdot)$ denotes the mutual information that can be derived from [22]; $Z$ is the union set of the perceptions of a UAV member, both perceived by itself and received from other members, which can be further decomposed as $Z = \{{}^{1}Z, {}^{2}Z, \cdots, {}^{n_a}Z\}$; and $\eta(\gamma) < 0$ is the coefficient that characterises the tendency to approach the target with respect to the remaining energy.
3.7. Distributed Cooperative Planning. It is hard to obtain a solution to (15) in closed form. Except for special forms of the dynamic system, it is hard to obtain an optimal feedback law $U^*$ in closed form even in a centralised scheme [23]. In this study, the globally optimal control law is approximated with an open-loop independent decision series over a receding time horizon $\tau$, which is re-solved at each time step. Assume that $\Psi_k = I_k(Y, Z) + \eta_k(\gamma^i) \cdot dis_k$ represents the quantity of interest at time $k$; then, $\Psi$ in (10) can be approximated over the horizon as a function of $u_{k+1}^i, u_{k+2}^i, \cdots, u_{k+\tau}^i$, the independent decision series of UAV $i$. A detailed explanation of coordinate descent was given in [12], and the receding horizon control in cooperative perception was presented in detail in [24].
The last problem to solve is the estimation of the quantity of interest over the receding horizon $k+1, k+2, \cdots, k+\tau$. In this work, a sampling-based method [25] is used. Assume that the samples of the target state and perception are $\hat{x}_k^{(q)} \sim \hat{p}(x_k \mid x_{k-1})$ and $\hat{y}_k^{(q)} \sim \hat{p}(y_k \mid x_k, u_k)$, where $\hat{p}(x_k \mid x_{k-1})$ and $\hat{p}(y_k \mid x_k, u_k)$ are derived from the weighted particle set generated by the Bayesian inference; then, $\Psi_k$ is approximated by an estimate $\hat{\Psi}_k$ computed from these samples.
Inspired by the motion primitive method, a motion primitive graph can be constructed by taking $s_k$ as the root node, selecting or sampling $u_k^i$ from the available control action space $U^i$ as an edge, connecting edge $u_k^i$ to the candidate node $\hat{s}_{k+1}^i$, and calculating the quantity of interest $\hat{\Psi}_k$ at each candidate node $\hat{s}_{k+1}^i$ as the weight of the edge. Because the decision space is discrete and $\hat{\Psi}_k$ can be enumerated and evaluated, the scale of the motion primitive graph is limited. The optimal control action sequence can be obtained by a search algorithm such as A*.
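The receding-horizon search over a discrete action set can be illustrated with a short Python sketch. Note the substitution: for simplicity it enumerates every action sequence of length $\tau$ exhaustively instead of running A* on the motion primitive graph, and the `step` and `reward` callables stand in for the motion model and the estimated quantity of interest. All names are hypothetical.

```python
import itertools
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")  # UAV state type
U = TypeVar("U")  # control action type


def plan_receding_horizon(state: S, actions: Sequence[U],
                          step: Callable[[S, U], S],
                          reward: Callable[[S], float],
                          horizon: int) -> Tuple[U, float]:
    """Return the first action of the best open-loop sequence and its value."""
    best_value = float("-inf")
    best_first = actions[0]
    # Enumerate all len(actions) ** horizon sequences (brute force, not A*).
    for seq in itertools.product(actions, repeat=horizon):
        s, total = state, 0.0
        for u in seq:
            s = step(s, u)
            total += reward(s)  # quantity of interest at each candidate node
        if total > best_value:
            best_value, best_first = total, seq[0]
    return best_first, best_value
```

Only the first action is applied before the problem is solved again at the next time step. The number of sequences grows as the size of the action set raised to the power $\tau$, which is consistent with a computation cost that rises steeply with the horizon.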

3.8. Simulation Results. The simulation is conducted in a square area of interest of 10,000 km². The UAV swarm is formed by four small fixed-wing UAVs with a two-cylinder propeller. The target moves at a constant linear velocity of 18 km/h and a turning rate of 9 degrees/min. The target's motion is also disturbed in speed and turning rate (see noise 0 in Table 1). The UAVs have no prior knowledge of the target's motion and assume that the target performs a random walk (see the planning target motion model $\mathbb{P}_t'$ in Table 1). The parameters of the airborne sensor for perception and the other parameters for simulation initialisation are given in Table 1. To reduce the effect of randomness and to explore the effect of $\tau$, four simulation scenarios with different values of $\tau$ were designed, and the simulation was run 100 times for each scenario.
Trajectory snapshots from one simulation run for each of the four $\tau$ values are presented in Figure 4. Due to the randomness of the simulation, the trajectories of both the target and the swarm differ in each run. Thus, the overall estimation performance, including accuracy and credibility, is reported as the average over 100 simulation runs (see Figure 5 and Table 2). The accuracy was measured by the average root mean square error (RMSE) with respect to the true target position (the lower the error, the better the performance), which was calculated from the true position and the estimate $x_k^E$, $x_k^N$, where $x_k^E$ and $x_k^N$ are the maximum a posteriori estimates. The credibility was measured by the average mutual information derived from the particle set of the swarm member; the higher the average mutual information, the better the credibility.
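As an illustration of the accuracy metric, the following Python sketch computes a position RMSE for one run and averages it over several runs. The exact averaging used in the paper is not recoverable from the text, so the choice to average the squared error over time steps within a run and then average the per-run RMSE over runs is an assumption, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (east, north)


def rmse(estimates: Sequence[Point], truths: Sequence[Point]) -> float:
    """Root mean square position error over the time steps of one run."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("estimates and truths must be non-empty and equally long")
    squared = [(e[0] - t[0]) ** 2 + (e[1] - t[1]) ** 2
               for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


def average_rmse(runs: List[Tuple[Sequence[Point], Sequence[Point]]]) -> float:
    """Mean of the per-run RMSE over Monte Carlo runs (assumed averaging)."""
    return sum(rmse(est, truth) for est, truth in runs) / len(runs)
```

Here each element of `estimates` would be the maximum a posteriori position at one time step, and `truths` the corresponding ground truth kept by the environment.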
The results in Figure 5 and Table 2 show that a higher $\tau$ resulted in higher mutual information and lower RMSE. This indicates that increasing the time horizon $\tau$ helps to improve the performance of the swarm, because the UAVs turn earlier to gain more quantity of interest and avoid extra turns.
The computation resource consumption of the simulation is plotted in Figure 6. Based on the results in Figure 6, the improvements in accuracy and credibility were obtained at the cost of an exponential increase in the computation cost. In addition, as the energy was being exhausted (see the end part of Figure 5), the swarm tended to approach the target rather than gather perceptions, which coincides well with the designer's expectation.

Conclusion
This paper presents an extensible and maintainable simulation platform for distributed swarm cooperative perception planning considering the uncertainty in communication and perception. Simulation cases of evaluating the Bayesian inference-based estimation under imperfect perception and evaluating distributed cooperative planning are considered to demonstrate the operating principle and usefulness of the proposed simulation platform.
Since the modelling and implementation of the simulation consider real-world constraints, such as unstable communication and noisy perception, the proposed solution can be beneficial to the development and application of distributed cooperative swarms.
However, the proposed simulation platform can be further improved by implementing additional functionalities, such as new sensor models, communication models for complex terrain, and cooperative planning algorithms based on reinforcement learning, which will be part of our future work.

Data Availability
The simulation condition data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.