Cooperative Loitering Munition Swarm Online Patrolling Route Planning with Nonlinear Seeker Measurement

. With the development of autonomous systems, the operational use of loitering munition is shifting from the following of a preplanned ﬁxed route without communication to smart decision-making and collaborative cooperation with sharing information. In this paper, we study the autonomous decision-making and cooperative control strategy of online patrolling for a swarm of loitering munitions using communication to coordinate their route based on maximizing the information they gathered in the operation region. Taking the non-Gaussian nonlinear property of airborne radar seeker into account, we utilized a particle ﬁlter-based method to evaluate or to predict the information quality of each action candidate. We also implemented a coordinate descent scheme to enable a distributed and scalable swarm decision-making. Simulation results show that the proposed method provides a better estimation than baselines without the need for external or centralized decision agent.


Introduction
Loitering munitions are mainly used to destroy and suppress the enemy's air defense system and to create opportunities for attackers.
e loitering munition is usually equipped with a radar seeker a.k.a. an airborne radar signal receiver, guiding devices and a warhead.After launching into the operation air zone, it circulates and searches for enemy's radar radiation signals and conducts strikes to the radar target positioned by its radar seeker.As early as in the air battle in the Syria-Israel Bekaa Valley, the Israeli loiter munition Harpy has achieved remarkable success in suppressing and destroying Syrian air defense forces and has become famous all over the world.Taken the Harpy as an example, a typical composition of a loiter munition is shown in Figure 1.Modern military powers are ubiquitously equipped with various types of loitering munitions.
In general, equipped with an external positioning system and inertial guidance devices, traditional loitering munitions fly along a preplanned route (usually, this route is circular or figure-eight) over the assigned combat air zone and discard any control signal until a radar target is found, the strike operation is conducted, or they run out of fuels. is phase is known as patrolling stage.e main task of the patrolling stage is to maximize the information gathered about the target in the combat air zone and to prepare for the strike phase.erefore, the effeteness of patrolling route planning plays a significant role in the combat performance of loitering munitions.
1.1.Related Work.Existing research on loitering munitions patrolling route planning mostly uses coverage as the indicator and conducts offline route planning based on the geometry of the combat air zone.e authors of [1][2][3] use "field coverage" as the route planning rewards indicator, a consensus control law and a modified artificial potential field method to improve the performance of the planned patrolling route.However, since these methods are open loop which only carry out the fixed route planned in advance, they neglect the real-time measurement of the radar seeker, neither replan online nor exert the advantage of autonomous decision-making ability of the unmanned system.And also, when there are multiple loitering munitions operating in the same air zone at the same time, these methods can solely rely on the radar seeker carried by each loitering munition to search for the target rather than fully using the measurements from other UAVs to update the information.us, the performance is limited.
With the development of unmanned swarm system technology in recent years [5], through the interconnection and transmission and information exchange between individuals in the swarm, improving search efficiency using cooperative control has aroused the research interest of experts and scholars at home and abroad.e literature of this type is often referred to as the cooperative search problem.e research on the target search problem can be traced back to the research of antisubmarine warfare by Koopman and others [6] during World War II.Derived from the probability theory, the authors of [7,8] considered a single agent search problem including cost of switching search location and failure rate of search due to imperfect sensors and solved the problem using a classic framework of continuous postprobability updating with the searched result.Zhu [8] also pointed out the difference of searching planning between maximizing the probability of discovery (a.k.a.detection search) and maximizing the belief of target's location (a.k.a.whereabouts search).Based on single agent search theory, Zhu et al. [9] proposed multiple agent searching planning methods using clustering and minimum spanning tree to coordinate team to minimize the cost of changing search position.It should be noted that because the sensor is modeled as a "found-or-not-found" binary model, search effort is needed only under the condition of "not found," so the strategy obtained is open loop, which is destined to end when the target discovered.Nevertheless, it is still implying that the search behavior is changed in real time according to the detection result.Whereabouts, search includes not only the optimal strategy when the search succeeds in finding the target but also the optimal strategy for guessing the target's location when the search is unsuccessful.In the field of robotics and multiagent, the reward, rate of probability return, of a search plan is often replaced by information metric using information theory.
ese problems are cataloged as active information gathering [10] problem, whose goal is maximizing the quality of information collected along the route planned in real time.
e research of sensors network (sensors network), which has emerged in recent years [11,12], also covers the scope of multiple agent information gathering.In the scenario where the target distribution probability is Gaussian and the measurement noise is also Gaussian, the information quality metric can be the covariance of the target distribution probability and the measurement of sensors.In the more general scenario, the metric is often information entropy, mutual information [13], and Kullback-Leibler divergence [10].Mutual information between measurement and target state indicates that how much one can learn about target's state from the measurement.e larger the mutual information, the more the one learn about target state based on measurements.
Although information gathering problem differs from search theory in a more generalized sensor model, both in essence lie within Bayesian inference framework, information gathering problem models the sensor as imperfect measurement of target's state with noise interference, and the target state distribution is updated every time the measurement arrives.For linear Gaussian systems, classical Kalman filtering weights the mean and covariance of inferenced distribution by using the covariance of measurement and prior distribution.But real physical systems are often nonlinear and non-Gaussian [14].For nonlinear systems, linearization methods such as extended Kalman filtering are useful.However, in highly nonlinear systems, the accuracy of extended Kalman filtering is limited, and it cannot process non-Gaussian noise.With the improvement of airborne computing resources recently, the particle filtering method based on sequential Monte Carlo (SMC) sampling is widely used because of its capacity of effectively processing nonlinear [13] and non-Gaussian noise.
Another critical issue is that distribution swarm decision is a must because it is particularly relevant to scalability and applicability.Comprising multiple loitering munitions, the swarm is expected to behave collaborative and cooperative without any centric agent or fully connected network.Some literatures often assume that there exists a centric agent or fully connected communication without delay in which information from all agents in swarm can be aggregated and dispatched.us, there is no information difference among all agents, and the coordination problem degenerates into a single-state decision problem with a higher state dimension.In real-world scenario, this assumption is often difficult to meet.On the other extreme, all agents cannot communicate, and each agent makes independent decisions based solely on local information, which leads to limited performance.Our goal of this work is to devise a distributed approach with scalability in the number of agents and meanwhile to still share information and decision among each other and plan cooperative route.
Zavlanos et al. [15] utilize the "distributed auction" algorithm to optimize task allocation without the need for a centric agent or fully connected communication.However, this method requires multiple bids and rebid communication during the swarm cooperative decision process.In the loitering munitions patrolling scenario where measurement  signals arrive continuously at high frequencies, the auctionbased swarm coordinate method meets difficulties in practical due to the limitation of concurrent communication capacity.e coordinate descent method used in [16,17] provides a suboptimal solution to reduce the decision space of multiagent decision-making problems from exponential growth on the number of agents to linear growth.It is proved that this method can get more than 50% return of the optimal solution and requires a smaller amount of communication [17].Due to hierarchy structure of decision where the agent with higher rank makes the decision with higher priority, Chen [18] called this method sequential allocation and used this method to solve the problem of multi-UAVs reconnaissance and surveillance planning.
In this paper, the loitering munitions swarm online autonomous patrolling route planning problem is modeled as an information collection problem.Loitering munitions in the swarm share the measurements collected by its airborne radar seeker and their own status via wireless communication and update the radar target's position [19] distribution in real time based on the prior position distribution and the observation sequence.Mutual information is selected as the optimization goal, and a nonmyopic iterative optimization strategy in the model predictive control (MPC) scheme is adopted to conduct route planning online.
is paper assumes that all loitering munitions in the swarm can communicate in ad hoc mode, but communication capacity is limited, and a fully connected communication is not afforded.It is assumed that the airborne radar seeker (signal receiver) has limited field of view, limited sensitivity, and non-Gaussian errors.Particle filtering [19] is used to infer target position information to calculate mutual information between measurements and target state.e problems of collision avoidance, obstacle avoidance, and patrolling-striking switch in real operations can also be solved within the model predictive control framework proposed in this paper through hierarchical decisionmaking, inner and outer loop control, and multitarget weighting.It is noted that a shorter conference version of this paper is submitted to 5th International Conference on Automation, Control and Robotics Engineering (CACRE 2020).e initial conference paper addressed the patrolling route planning problem without considering the problem caused by unknown parameters like radar transmitting power and the error introduced by quantization [20,21] of measurements acquired by airborne radar seeker.is manuscript addresses those issues and provides a more complicated simulation experiment.e content of this article is organized as follows: Section 2 introduces the modeling of information collection problems based on the model predictive control framework.Section 3 introduces the collaborative descent method used in swarm distributed decision-making under communication constraints.Section 4 deals with the calculation of mutual information using particle filtering-based methods.Section 5 illustrates the effectiveness of the method proposed by simulation experiments using Octave [22].Section 6 summarizes and prospects future work.

Problem Formulation
Active information gathering problem refers to the optimal planning of the executor's path or action under the constraints of certain resources which aims to maximize information revenue along the route planned.Integrating information perception, data fusion, and inference, this problem is an emerging hot topic in robotics, computer science, and military operations research.To plan the route of the loitering munitions swarm using information gathering framework, it is necessary to comprehensively consider the swarm dynamics model, radar target model, and the airborne radar seeker sensing model.Table 1 lists all the notations used in the following and their meaning.

Dynamics Model of the Loitering Munitions Swarm.
Consider a team of n a loitering munitions (referred to as agents in the remaining paper) obeying the following motion dynamics: where s i k ∈ S i � R n s is the n s dimensional state of loitering munition i at time k and u i k ∈ u i is the control action applied to the loitering munition i at time k and is selected among all possible decisions u i .In this paper, it is assumed that during the patrolling phase, the agents are flying in the combat air zone in a fixed speed and height and the action candidates are finite discrete yaw angle, that is, where x and y are the coordinates of agent in an orthogonal plane whose origin lies in the combat air zone, υ is a constant representing the unchanged velocity, φ is the yaw angle, c is a real number representing the fuel remainder, and c > 0 and d < 0 are the fuel consumption ratios, respectively, representing the fuel consumption for flying and steering.e decision set is defined as u i � 0, −ω i , ω i   is the control action, i.e., flight straight, maximum left turn, and maximum right turn.It is noted that the dynamic model formulated in (2) is a special form of (1) and is formulated for engineering purpose.e method is applicable to other dynamic models formulated in (1).

Dynamics Model of the Radar Target.
e mission of the swarm is to suppress the stationary radar target T at an unknown location within the combat air zone.e state of the target r � x T y T α   is unknown, where |x T | ≤ x field and |y T | ≤ y field are the coordinates of the target in the same orthogonal plane with loitering munitions, and Mathematical Problems in Engineering α > 0 is a unknown constant relating to the transmit power of the radar.

Model of the Radar Seeker.
e model of the radar seeker is described by the measurement.Agent i measures the radar target: where i z k ∈ R n s is the measurement of agent i at time k, which is determined by the state of agent s i k and the unknown state of radar target r.υ k (s i k ) is an agent-state-dependent noise, whose values are independent at any pair of times and across the agents.Unlike the study [10] based on the extended Kalman filtering, υ k (s i k ) does not have to be Gaussian noise nor h(s i k , r) does not have to be linear mapping.In our work, we consider a generalized range bearing sensor with limited sensitivity and field of view, influenced by a mixture of Gaussian and non-Gaussian noise.
Signal strength measurement is modeled as where μ > 0 is a constant related to the performance of signal receiver and τ is the Gaussian distributed noise introduced by the environmental noise and the thermal noise of the receiver.range is defined as range � min range max − ������������������� where range max > 0 is a constant relating to the sensitivity of the signal receiver.
Bearing measurement is modeled as bear � tanh where w is a noise introduced by the random distortion of the target radar waveform.e probability density function of w is characterized by a weighted mixture of Gaussian distribution and a "long-trailing" noise distribution (in this paper, a student-t distribution): where g t is the probability density function of the t distribution, g G is the probability density function of the Gaussian distribution, and ξ ∈ [0, 1] is the weight representing the degree of non-Gaussian.us, the model of the radar seeker can be written as where φ fov is a constant representing the maximum angle of field of view respect to yaw.It is noted that the sensor model formulated by using ( 8) is a special form of (3) and the proposed method is also applicable to other linear and nonlinear sensor models in the form of (3).

Online Patrolling Route Planning as Active Information
Gathering.Based on a model predictive control scheme, we only solve the optimization problem of information gathering in a finite time horizon and apply the first optimal control action to agent. is scheme not only reduces the size of the optimization problem that needs to be solved but also adapts to the dynamic environment better.Given a state vector of loitering munitions swarm , an unknown but fixed target state [x T , y T , α], and a constant of planning horizon T, let R be the notation of the set of all possible r, the information gathering utility is formulated as , where I(•, •) is the information metric (mutual information in this paper); η 1 ∈ R 1 and η 2 ∈ R n j are weights, respectively,   is the union set of the sequential action of each agent in time horizon T, and is the union set of measurements of each agent, which is further decomposed as 9) can be solved and is guaranteed to have optimal solution because of the finite decision candidates.However, the complexity of computation growth exponential and the communication capacity for completely sharing of information is not practical due to communication capacity limitation.erefore, a coordinate descent scheme of swarm decision is utilized.

Distributed Collaborative Framework
Due to the interference of environmental noise and communication delay, the capacity of the airborne communication is limited, which cannot afford a fully connected twoway communication channel between all agents.To tackle the complexity problem and communication capacity limitation of swarm decision, Atanasov et al. [17] proposed a coordinate descent method.We assume that all agents in the swarm communicate in a packet broadcasting manner similar to the UDP protocol.
Suppose agent 1 plans its route without considering the other robots.In other words, it solely solves a single agent version of problem (9): , After solving (10), agent 1 broadcasts its chosen plan and measurement to the other agent in the swarm.en agent 2 solves a 2-agent version of information gathering problem with the constraint of fixed policy for agent 1: ), ( 8), e algorithm continues in this fashion for each agent in the swarm.e mutual information in optimization objective ( 9) can be obtained by using the chain rule of mutual information: Let i l Ψ K be the notation of the data package broadcasted by agent l and received by agent i: Given a set of data package and finite planning horizon T, using coordinate descent, the swarm decision can be decomposed as (2), ( 8), (13), (14) can be solved exactly for that all solutions are finite and enumerable.Comparing to (9), the computation for each agent in swarm grows linearly with the number of agents in the swarm, i.e., O(n a |u| T ).us, it is scalable to large swarm.

Mutual Information Computation Using Particle Filtering
To calculate mutual information, the probability distribution of all possible target states and the probability of union distribution of measurement and target state are needed, which is infeasible in practice.In the following part, we compute mutual information by the differential between prior entropy and posterior entropy.Both prior entropy and posterior entropy are estimated by the particle filteringbased method.

Mutual Information as Entropy Differential.
In the objective function of ( 14), using information theory, where H(•) denotes a conditional entropy defined as  16) ) are the prior entropy and posterior entropy respectively, which can be derived by prior probability p(r | 1:i z 1:k ) and posterior probability p(r | 1:i z 1:k+1 ).

Target State Distribution Inference.
Given target state distribution p(r | 1:i z 1: k ) at time k, since the target is stationary, consider the measurement arrived at time k + 1, the posterior distribution can be derived from Bayes theorem: In the finite-state hidden Markov model, the integral part of ( 17) can be a summation of finite terms.In the linear system and the Gaussian distribution, the posterior distribution is still Gaussian distribution.Here for the non-Gaussian nonlinear of (8), probability density function or the integral part generally does not have close form solutions.In this paper, the particle filtering method is used to approximate the posterior probability density function.

Sequential Importance Resampling-Based Particle
Filtering.
e basic idea of particle filtering is to use the Monte Carlo sampling method to obtain conditionally distributed weighted random samples (particles) to approximate the posterior probability density, including the prediction and update steps.e randomly sampled particles in the prediction step update the state according to the dynamics followed by the target.Because it is assumed the target is stationary in this paper, the prediction step can be omitted.Particles of filter are initially sampled from a distribution of target location given by prior knowledge.In the update step, the particles' weight is updated using the measurement and radar seeker model can be formulated as follows.Let 〈 i  r p,k , i ω p,k 〉 be the notation of a group that has n p particles and its corresponding weights known by agent i at time k, where i ω p,k ∈ R + and  n p p�1 i ω p,k � 1. Equation ( 17) can be approximated as When agent i receives the measurements 1:i z k+1 , the particle weights are updated according to each measurements j z k+1 , j ∈ 1, • • • , i using the radar seeker measurement likelihood probability model and sequential importance sampling/resampling (SISR) method, in which weight of particle is updated using i w In order to mitigate the particle degeneracy problem, a standard importance resampling step is imposed after the weight of particles is updated.Before the measurements are sent to neighbor agents via wireless communication, the measurements have to be quantized.e received measurements at each agent is given by where ⌃ z is the quantized component of measurement z (one of rss or bear) and q 0 , q 1 , . . ., q L are the thresholds for a K � log 2 L bit quantizer.Note that q 0 � −∞ and q L � ∞. e corresponding likelihood function of quantized component measure can be derived from the probability of the quantized measurement value conditioned on the target state: where Q(•) is the complementary distribution function of the distribution of measurement component, i.e.,rss or bear, which can be derived from ( 5) or (6).It should be noted that the quantized measurement p( z | r) is a function of the agent state and radar target state.us the quantized likelihood can be used in particle filter weight updation step.

Simulation Results and Analysis
To evaluate the effectiveness of the method proposed in this paper, the Octave platform is used to conduct a 100-time Monte Carlo simulation experiment in which loitering munitions swarm patrol and search for an unknown static radar target in a predefined combat air zone.We compared the proposed method in this paper with baselines to show the improvement.

Baselines
5.2.1.Random.In this paper, random strategy is selected as the baseline to show the effect of planning.In the random strategy, each loitering munition randomly selects an action in the action space or steering to avoid getting out of combat air zone.

Coverage of Field Offline Planning.
is planning method is proposed in [1].e starting position of route is predefined as the starting of each agent.Because the actual target position is unknown in this paper, the estimated position of radar is used, and the end of route is set to offset the estimated radar target position by (100, 100).Parameters for the algorithm are, respectively, popsize � 200, pc � 0.9, pm 0.05, and MAXGEN � 50 (for the mean of these parameter, please refer to [1]).Mathematical Problems in Engineering 5.2.3.Gaussian Noise-Based IEKF.To evaluate whether the particle filter is necessary or not, the particle filtering part in this paper is replaced by the iteration extended Kalman filtering in [10], i.e., a first-order linear model is used to approximate the nonlinear seeker model.e non-Gaussian noise is replaced with a Gaussian noise with higher covariance in planning, that is, ω ∼ N(0, π 2 /4).e noise in simulation is still non-Gaussian and other parts of simulation are unchanged.

Simulation Results
. Simulation results are shown in Table 2 and Figures 2 and 3. We can obtain the following conclusions from the results: (1) As the particle distribution and mean square error value at the end of the simulation shown in Table 2 and Figure 2, the results obtained from the random decision and Gaussian noise-based IEKF are poor.
e method proposed in this paper is better than the erage of field offline planning in which comparing the former to the latter, the square error value is reduced by more than 30%.(2) e estimation accuracy of the target position measured by RMSE obtained from the Gaussian noise-based IEKF is lower than the method proposed in this paper, indicating that for a highly nonlinear, non-Gaussian noise measurement system, IEKF may divergence.us, the u particle filtering method is necessary and effective.(3) Comparing to random strategy, the offline route planning method proposed in [1] improves the "field coverage," which is beneficial to the estimation of the target state, but the effect is not as good as the method proposed in this paper, which is aiming at optimizing mutual information.
Note that in Figure 3, as the simulation proceeds, both the entropy and RMSE of target rise after the falling.is is 8 Mathematical Problems in Engineering caused by the dynamic constraint of loitering munition, and the loitering munitions must take many steps to turn its heading to achieve a better cooperative measurement.e results also reveal that the performance of the proposed method is better.

Conclusions and Future Works
e autonomous online route planning method for loitering munitions swarm proposed in this paper uses the real-time measurement from the airborne radar seeker and model predictive control law to iteratively optimize the mutual information gathered along the route.Compared with the offline planning algorithm based on "field coverage," our method is more effective.e communication capacity requirement is reduced, and the scalability is enabled by introducing the coordinate descent method for swarm distribution decision.e particle filter method used in this paper, can be applied to nonlinear dynamic models and sensor models with non-Gaussian initial distribution and measurement noise.e method proposed in this paper is also capable to add other objective functions related to operational requirements or extended functions such as obstacle avoidance and threat perception through a hierarchical decision framework, inner/outer loop control [23].
Although the formulation is helpful in real-world scenario, it requires a finite set of action and relatively more computation resource for longer time horizon.Future research work will focus on introducing such box particle filtering and adaptive particle filtering to reduce the computational cost of particle filtering and tackling multiple time-variant radar targets.

Figure 2 :
Figure 2: Route planned in simulation 1, particles presenting estimation are plotted with scattered color relating to its weight.(a) Random strategy, (b) coverage of field, (c) Gaussian noise IEKF, and (d) the proposed method.

Figure 3 :
Figure 3: Comparison of average target entropy (a) and average RSME (b) along the simulation between baselines and the proposed method.

Table 1 :
Nomenclature. ) ∈ R n j represents the costs of action related to other operational objectives (for the sake of concise and focusing on main theme, let η 2 � [0] n j in the remaining part of this paper), U T � u 1 T , • • • , u

Table 2 :
Mean and standard deviation with respect to RMSE and entropy at the end of simulation from 100 simulation runs.ApproachesMean and standard deviation with respect to entropy a and RMSE a at the end of simulation Entropy mean (nat) a e lower the better.