Optimal Maneuver Strategy of Observer for Bearing-Only Tracking in Threat Environment

The optimal maneuver of an observer for bearing-only tracking (BOT) in a threat environment is a complex problem that involves nonlinear filtering, threat avoidance, and optimal maneuver strategy. A reward function combining a lower bound on detFIM (the determinant of the Fisher information matrix) with a threat cost was established, and the finite-horizon Markov decision process (MDP) principle was applied to obtain the optimal strategy. A quantization method was used to discretize the BOT process and compute the transition matrix of the Markov chain. To initialize the quantization at the beginning of each period, the cubature Kalman filter (CKF) was applied to provide the initial state estimate and the corresponding error covariance. Numerical simulations of static and moving target tracking in several threat-environment scenarios illustrate the applicability and superior performance of the approach.


Introduction
Bearing-only tracking (BOT) techniques are applied in various target localization scenarios, and the related theoretical and practical problems have been studied for decades [1][2][3]. The accuracy and efficiency of target localization have improved significantly in recent years, owing to advanced filtering techniques [4][5][6] and UAV (unmanned aerial vehicle) platforms. Many researchers focus on localizing static targets or on tracking with specific, predefined observer trajectories. However, the target can be located more accurately if the observer maneuvers according to certain rules [7]. In military applications, the maneuvering trajectories of the observer (aircraft, UAV, car, warship, submarine, etc.) may be constrained by threats such as missiles, torpedoes, no-fly zones, and air defense radars. The trajectories that are optimal for BOT may violate those constraints. As a result, the observer's maneuvers must ensure both the acquisition of bearing information and the safety of the observer itself. This problem requires balancing target tracking accuracy against threat avoidance.
For bearing-only tracking problems, the Cramér-Rao lower bound (CRLB) and the Fisher information matrix (FIM) are commonly used to evaluate tracking performance and observer maneuvers. Fawcett [1] used the CRLB to evaluate the effect of course maneuvers on bearing-only range estimation, but did not consider optimizing the observer's maneuver. In [8], the FIM was used to determine the optimal observer leg, and the EKF was combined with heuristic evolutionary optimization algorithms to improve BOT accuracy; however, such heuristic algorithms cannot guarantee computational efficiency. In [9], recursive Bayesian estimation methods were applied to several angle-only applications in air-to-sea and sea-to-sea scenarios, and the particle filter (PF) was compared with a range-parameterized EKF. In [10, 11], the Kalman filter (KF), EKF, and PF were applied to bearing and elevation measurements for real-time underwater object tracking, but the optimal trajectories of the observer were not considered. In [12], observability was analyzed for a smoothly maneuvering observer, and necessary and sufficient conditions for observability were established. In [13], stochastic control was applied to compute optimal underwater trajectories for BOT, with dynamic programming used to obtain the optimal control sequence. Zhang et al. [14, 15] applied Markov decision processes (MDP) and stochastic optimal control to optimize the observer's trajectories. In [15], the lower bound of the determinant of the FIM was used as the reward function, while the parameters describing the observer-target distance were ignored because of their inaccuracy. None of these studies considers the constraints that a threat environment places on the observer's trajectory. Conversely, obstacle/threat avoidance for autonomous UAV flight [16, 17] is treated as a pure control or decision problem with little connection to BOT. It therefore remains difficult to ensure that the observer's maneuvering trajectories satisfy the requirements of BOT and threat avoidance at the same time.
In this paper, a model of the threat environment was established, and the cost of threat avoidance was combined with the detFIM in the reward function. The cubature Kalman filter (CKF) [18] provides the initial state estimate and the corresponding error covariance. Based on these, the quantization method [19] was applied to discretize the whole process. Finally, the optimal maneuvering strategy is calculated at each step by the finite-horizon MDP approach based on the reward function. The paper is organized as follows:

- Section 2 defines the BOT problem.
- Section 3 introduces the quantization method and the CKF.
- Section 4 describes the construction of the reward function.
- Section 5 presents the finite-horizon MDP approach.
- Section 6 briefly outlines the algorithm.
- Section 7 presents the numerical simulations.
- Section 8 summarizes the work.

Problem Definition
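Suppose that the observer is a UAV platform, the target is a car moving with constant velocity, and the height difference between the observer and the target is known. Static threats in the environment must be avoided by the UAV to ensure flight safety. Positions are expressed in 2-D Cartesian coordinates. The state of the target is $\mathbf{x}_k = [x_k, y_k, \dot{x}_k, \dot{y}_k]^T$, where $(x_k, y_k)$ is its position and $(\dot{x}_k, \dot{y}_k)$ is its velocity. The state of the observer is $\mathbf{x}^s_k = [x^s_k, y^s_k]^T$, where $k = 1, 2, \ldots, n$. The BOT model can be described as follows:

$$\mathbf{x}_k = F\,\mathbf{x}_{k-1} + \mathbf{u}_{k-1,k} + \mathbf{v}_{k-1} \qquad (1)$$

$$z_k = \arctan\frac{y_k - y^s_k}{x_k - x^s_k} + e_k \qquad (2)$$

Equations (1) and (2) are the state equation and the measurement equation, respectively. Here $\mathbf{x}_k \in \mathbb{R}^{n_x}$, $z_k \in \mathbb{R}^{n_z}$, and $n_x$ and $n_z$ are their dimensions. $\mathbf{v}_{k-1}$ is zero-mean Gaussian noise with covariance matrix $Q$, $F$ is the state transition matrix, and $\mathbf{u}_{k-1,k}$ is the input. For a constant-velocity target,

$$F = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad Q = q \begin{bmatrix} \Delta t^3/3 & 0 & \Delta t^2/2 & 0 \\ 0 & \Delta t^3/3 & 0 & \Delta t^2/2 \\ \Delta t^2/2 & 0 & \Delta t & 0 \\ 0 & \Delta t^2/2 & 0 & \Delta t \end{bmatrix},$$

where $\Delta t$ is the time interval between measurements, $q$ is the intensity of the process noise, and $e_k$ is zero-mean Gaussian noise with covariance matrix $R_z$.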
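The model above can be checked numerically. The snippet below is a minimal Python/NumPy sketch under several assumptions: the state ordering is [x, y, ẋ, ẏ], Q takes the standard continuous white-noise-acceleration form, and the bearing is measured from the x-axis. The function names `cv_model` and `bearing` are hypothetical; the parameter values are those listed in Section 7.

```python
import numpy as np


def cv_model(dt: float, q: float):
    """Constant-velocity F and Q for the assumed state ordering [x, y, vx, vy]."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # Continuous white-noise-acceleration model with intensity q (km^2/s^3)
    Q = q * np.array([[dt**3 / 3, 0.0, dt**2 / 2, 0.0],
                      [0.0, dt**3 / 3, 0.0, dt**2 / 2],
                      [dt**2 / 2, 0.0, dt, 0.0],
                      [0.0, dt**2 / 2, 0.0, dt]])
    return F, Q


def bearing(target_xy, observer_xy):
    """Noise-free bearing in radians, measured from the x-axis (assumed convention)."""
    dx = target_xy[0] - observer_xy[0]
    dy = target_xy[1] - observer_xy[1]
    return np.arctan2(dy, dx)


F, Q = cv_model(dt=10.0, q=1e-5)             # values from Section 7
x0 = np.array([100.0, 100.0, 0.0, 0.0])      # target at (100 km, 100 km), at rest
z0 = bearing(x0[:2], np.array([0.0, 0.0]))   # observer at the origin
print(np.degrees(z0))
```

Using `arctan2` rather than a plain arctangent keeps the bearing in the correct quadrant for any relative geometry.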

Discretization of the Process
To meet the requirements of the discrete-time MDP approach, a quantization method is applied to approximate the continuous BOT process. Specifically, quantization approximates the process $(\mathbf{x}_k)$ by a finite Markov chain, as defined in the quantization algorithm [19, 20]. At each time $k$, the state space of $(\mathbf{x}_k)_{0 \le k \le N}$ is discretized by a grid $\Gamma_k$. This step is common to all such approximation methods.
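In marginal quantization filtering, the grid points $\Gamma_k = \{x_k^i\}_{i=1}^{N} \subset \mathbb{R}^{n_x}$ and their probabilities $\{p_k^i\}_{i=1}^{N}$ are generated by a Monte Carlo approach. The grid defines a new (quantized) state through the nearest-neighbor projection

$$\hat{\mathbf{x}}_k = \mathrm{Proj}_{\Gamma_k}(\mathbf{x}_k).$$

The nearest-neighbor projection $\mathrm{Proj}_{\Gamma_k}$ corresponds one-to-one to the Voronoi tessellation of $\Gamma_k$, that is, to the Borel partition

$$C_i(\Gamma_k) = \left\{ x \in \mathbb{R}^{n_x} : \|x - x_k^i\| = \min_{1 \le j \le N} \|x - x_k^j\| \right\}, \quad i = 1, \ldots, N.$$

The grids $\Gamma_k$ are updated by the competitive learning vector quantization (CLVQ) method. Let $M$ be the number of Monte Carlo samples. For each Monte Carlo iteration $s = 1, \ldots, M$, let $x_{k,\mathrm{Neig}}$ be the closest neighbor of the sample $\mathbf{x}_k^{s}$ in $\Gamma_k^{s}$. In the learning phase, set

$$x_{k,\mathrm{Neig}} \leftarrow x_{k,\mathrm{Neig}} - \gamma_k \left( x_{k,\mathrm{Neig}} - \mathbf{x}_k^{s} \right),$$

where $\gamma_k$ is the updating rate. CLVQ is closely related to the competitive learning rules widely used to update neurons in neural networks.

International Journal of Aerospace Engineering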
Details of the algorithm can be found in [19, 20]. For each step from $\Gamma_k$ to $\Gamma_{k+1}$, the transition matrix of the Markov chain is

$$P_k^{i,j} = \mathbb{P}\left( \mathbf{x}_{k+1} \in C_j(\Gamma_{k+1}) \mid \mathbf{x}_k \in C_i(\Gamma_k) \right).$$

Thus, the process $(\mathbf{x}_k)_{0 \le k \le N}$ is replaced by the Markov chain $(\hat{\mathbf{x}}_k)_{0 \le k \le N}$ with transition matrices $(P_k^{i,j})_{0 \le k \le N-1}$. The whole process is divided into several periods. The initial distribution of each period, $\mathbb{P}(\mathbf{x}_0 \in C_i(\Gamma_0))$, is also quantized, and its initial values are given randomly. The CKF is therefore used to estimate the actual position of the target. It provides the density $(\hat{\mathbf{x}}_{k|k}, P_{k|k})$ as the initial distribution for each quantization period.
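To make the quantization step concrete, here is a simplified Python/NumPy sketch. It assumes a single pass of CLVQ over Monte Carlo samples with a hypothetical decreasing step-size schedule. It also assumes a counting estimate of $P_k^{i,j}$ from simulated paths. The function names (`nearest`, `clvq`, `transition_matrices`) and the toy random-walk data are illustrative and are not taken from [19, 20].

```python
import numpy as np


def nearest(grid, x):
    """Index i of the Voronoi cell C_i(grid) containing x (nearest-neighbour projection)."""
    return int(np.argmin(np.sum((grid - x) ** 2, axis=1)))


def clvq(samples, n_points, gamma0=0.5, seed=0):
    """Single-pass CLVQ: only the winning grid point moves toward each sample."""
    rng = np.random.default_rng(seed)
    grid = samples[rng.choice(len(samples), n_points, replace=False)].astype(float)
    for s, x in enumerate(samples, start=1):
        i = nearest(grid, x)
        gamma = gamma0 / (1.0 + 0.01 * s)        # hypothetical decreasing step size
        grid[i] -= gamma * (grid[i] - x)
    return grid


def transition_matrices(paths, grids):
    """Counting estimate of P_k[i, j] = P(x_{k+1} in C_j | x_k in C_i) from simulated paths."""
    mats = []
    for k in range(paths.shape[1] - 1):
        counts = np.zeros((len(grids[k]), len(grids[k + 1])))
        for path in paths:
            counts[nearest(grids[k], path[k]), nearest(grids[k + 1], path[k + 1])] += 1
        rows = counts.sum(axis=1, keepdims=True)
        # Cells that no path visited get a uniform row so each row is a distribution
        mats.append(np.where(rows > 0, counts / np.maximum(rows, 1), 1.0 / counts.shape[1]))
    return mats


# Toy usage: M Monte Carlo paths of a 2-D random walk over N + 1 time steps
rng = np.random.default_rng(1)
M, N = 2000, 3
paths = np.cumsum(rng.normal(size=(M, N + 1, 2)), axis=1)
grids = [clvq(paths[:, k], n_points=10) for k in range(N + 1)]
P = transition_matrices(paths, grids)
```

In the paper's setting the simulated paths would come from the BOT model initialized by the CKF density. Here a random walk stands in so that the sketch is self-contained.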
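Compared with the EKF, the CKF is generally more accurate for nonlinear filtering problems [18]. The CKF uses $m = 2n_x$ cubature points $\xi_i = \sqrt{n_x}\,[\mathbf{1}]_i$, $i = 1, \ldots, m$, where $[\mathbf{1}]_i$ is the $i$-th column of $[I_{n_x}, -I_{n_x}]$. Its main steps are as follows.

Time update:
(i) Factorize $P_{k-1|k-1} = S_{k-1|k-1} S_{k-1|k-1}^T$.
(ii) Evaluate the cubature points $X_{i,k-1|k-1} = S_{k-1|k-1}\xi_i + \hat{\mathbf{x}}_{k-1|k-1}$.
(iii) Propagate the cubature points: $X^*_{i,k|k-1} = F X_{i,k-1|k-1} + \mathbf{u}_{k-1,k}$.
(iv) Estimate the predicted state $\hat{\mathbf{x}}_{k|k-1} = \frac{1}{m}\sum_{i=1}^{m} X^*_{i,k|k-1}$.
(v) Estimate the predicted error covariance $P_{k|k-1} = \frac{1}{m}\sum_{i=1}^{m} X^*_{i,k|k-1} X^{*T}_{i,k|k-1} - \hat{\mathbf{x}}_{k|k-1}\hat{\mathbf{x}}_{k|k-1}^T + Q$.

Measurement update:
(vi) Factorize $P_{k|k-1} = S_{k|k-1} S_{k|k-1}^T$ and evaluate $X_{i,k|k-1} = S_{k|k-1}\xi_i + \hat{\mathbf{x}}_{k|k-1}$. Propagate $Z_{i,k|k-1} = h(X_{i,k|k-1})$ through the measurement function (2), and estimate the predicted measurement $\hat{z}_{k|k-1} = \frac{1}{m}\sum_{i=1}^{m} Z_{i,k|k-1}$.
(vii) Estimate the innovation covariance matrix $P_{zz,k|k-1} = \frac{1}{m}\sum_{i=1}^{m} Z_{i,k|k-1} Z_{i,k|k-1}^T - \hat{z}_{k|k-1}\hat{z}_{k|k-1}^T + R_z$.
(viii) Estimate the cross-covariance matrix $P_{xz,k|k-1} = \frac{1}{m}\sum_{i=1}^{m} X_{i,k|k-1} Z_{i,k|k-1}^T - \hat{\mathbf{x}}_{k|k-1}\hat{z}_{k|k-1}^T$.
(ix) Estimate the Kalman gain $W_k = P_{xz,k|k-1} P_{zz,k|k-1}^{-1}$.
(x) Estimate the updated state $\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + W_k (z_k - \hat{z}_{k|k-1})$.
(xi) Estimate the corresponding error covariance $P_{k|k} = P_{k|k-1} - W_k P_{zz,k|k-1} W_k^T$.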
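The CKF steps can be condensed into one function. This is a Python/NumPy sketch of the standard third-degree cubature rule of [18]. The names `cubature_points` and `ckf_step` are hypothetical. The usage example is illustrative: a static 2-D target with identity dynamics, zero process noise, and σ = 0.5°. A practical bearing filter should also wrap the innovation $z_k - \hat{z}$ to $(-\pi, \pi]$, which this sketch omits.

```python
import numpy as np


def cubature_points(mean, cov):
    """2n third-degree cubature points: mean + S @ xi_i, with xi_i = sqrt(n) * (+/- e_i)."""
    n = mean.size
    S = np.linalg.cholesky(cov)                       # cov = S @ S.T
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return mean[:, None] + S @ xi                     # shape (n, 2n)


def ckf_step(x, P, z, f, h, Q, R):
    """One CKF cycle: time update (i)-(v), then measurement update (vi)-(xi)."""
    m = 2 * x.size
    # Time update: propagate cubature points through the state function f
    X = np.column_stack([f(c) for c in cubature_points(x, P).T])
    x_pred = X.mean(axis=1)
    P_pred = X @ X.T / m - np.outer(x_pred, x_pred) + Q
    # Measurement update: propagate fresh cubature points through h
    Xp = cubature_points(x_pred, P_pred)
    Z = np.column_stack([np.atleast_1d(h(c)) for c in Xp.T])
    z_pred = Z.mean(axis=1)
    Pzz = Z @ Z.T / m - np.outer(z_pred, z_pred) + R   # innovation covariance
    Pxz = Xp @ Z.T / m - np.outer(x_pred, z_pred)      # cross-covariance
    W = Pxz @ np.linalg.inv(Pzz)                       # Kalman gain
    x_new = x_pred + W @ (np.atleast_1d(z) - z_pred)   # innovation not angle-wrapped
    P_new = P_pred - W @ Pzz @ W.T
    return x_new, P_new


observer = np.array([0.0, 0.0])


def h(c):
    """Bearing from the observer to position c, measured from the x-axis."""
    return np.arctan2(c[1] - observer[1], c[0] - observer[0])


# Usage: static target (F = I, Q = 0), prior centred at (90, 110) km, sigma = 0.5 deg
x0, P0 = np.array([90.0, 110.0]), 25.0 * np.eye(2)
R = np.array([[np.radians(0.5) ** 2]])
x1, P1 = ckf_step(x0, P0, np.pi / 4, lambda c: c, h, np.zeros((2, 2)), R)
```

A single bearing mainly shrinks the uncertainty across the line of sight. This is why the determinant of the covariance drops sharply while the range uncertainty stays close to the prior.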

Reward Function
The reward function is the key to solving the combined problem of BOT and threat avoidance. It consists of two parts: one describes the gain in bearing information about the target, and the other is the cost of threat avoidance. For the former, a lower bound on detFIM, formula (19), is commonly used as the performance index. In (19), σ is the standard deviation of the bearing measurement error, T is the time horizon, θ is the bearing rate, and r is the relative distance from the observer to the target. This bound is inexpensive to evaluate. The second part is the cost of threat avoidance. The intensity of a threat is determined by the relative distance from the observer to that threat: the smaller the distance, the greater the intensity. A potential field can be used to model the threat environment.
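The bound (19) is not reproduced here, but the roles of the bearing change and the range r can be illustrated with the exact detFIM for a static target. Suppose bearings $\theta_k$ are taken at ranges $r_k$ with noise σ. The FIM of the target position is then $\frac{1}{\sigma^2}\sum_k g_k g_k^T$ with $g_k = [-\sin\theta_k, \cos\theta_k]^T / r_k$. By the Cauchy-Binet formula, $\det \mathrm{FIM} = \frac{1}{\sigma^4}\sum_{i<j} \frac{\sin^2(\theta_i - \theta_j)}{(r_i r_j)^2}$. The sketch below computes it both ways. It gives the exact determinant, not the lower bound used in this paper. The function names and observer positions are hypothetical.

```python
from itertools import combinations

import numpy as np


def fim_static(target, observers, sigma):
    """FIM of a static 2-D target position from bearings taken at `observers` (K x 2)."""
    d = target - observers                                  # target minus observer
    r2 = np.sum(d ** 2, axis=1)
    g = np.column_stack([-d[:, 1], d[:, 0]]) / r2[:, None]  # gradient of atan2(dy, dx)
    return g.T @ g / sigma ** 2


def det_fim_pairwise(target, observers, sigma):
    """Cauchy-Binet form: sum_{i<j} sin^2(theta_i - theta_j) / (r_i r_j)^2 / sigma^4."""
    d = target - observers
    theta = np.arctan2(d[:, 1], d[:, 0])
    r = np.hypot(d[:, 0], d[:, 1])
    total = sum(np.sin(theta[i] - theta[j]) ** 2 / (r[i] * r[j]) ** 2
                for i, j in combinations(range(len(r)), 2))
    return total / sigma ** 4


target, sigma = np.array([100.0, 100.0]), np.radians(0.5)
radial = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])     # flies straight at target
lateral = np.array([[0.0, 0.0], [20.0, -20.0], [40.0, -40.0]])  # bearing keeps changing
for name, obs in [("radial", radial), ("lateral", lateral)]:
    print(name, np.linalg.det(fim_static(target, obs, sigma)), det_fim_pairwise(target, obs, sigma))
```

Flying straight at the target yields (numerically) zero information because every bearing is identical. The lateral leg produces a positive determinant, which is consistent with the paper's observation that the observer must keep the bearing changing.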
In 2-D Cartesian coordinates, let the observer be at $(x^s_k, y^s_k)$ and a threat at $(x^q_k, y^q_k)$. The potential of the threat at the observer's position at time $t$ is a function of their relative distance

$$d = \sqrt{(x^s_k - x^q_k)^2 + (y^s_k - y^q_k)^2}.$$

Summing this potential over the time horizon T gives the cost of the threats. A threat exerts influence only within a limited distance $d_{\lim}$, so the potential of a threat is defined to vanish for $d \ge d_{\lim}$. The reward function then combines the detFIM term and the threat cost through a constant coefficient ε, which brings the two parts to the same order of magnitude.
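The exact expressions for the potential and the combined reward are not recoverable from this text, so the following sketch makes two assumptions. The first is a common truncated repulsive potential, $U(d) = \tfrac{1}{2}\eta (1/d - 1/d_{\lim})^2$ for $d < d_{\lim}$ and 0 otherwise. The second is the combination $J = J_{FIM} - \varepsilon J_{threat}$. Both forms, the gain η, and the function names are assumptions for illustration only. The values $d_{\lim}$ = 20 km and ε = 0.001 are taken from Section 7.

```python
import numpy as np

D_LIM = 20.0   # km, influence radius d_lim (Section 7)
EPS = 0.001    # weighting coefficient epsilon (Section 7)


def potential(d, d_lim=D_LIM, eta=1.0):
    """Assumed truncated repulsive potential: large near the threat, zero for d >= d_lim."""
    d = np.asarray(d, dtype=float)
    with np.errstate(divide="ignore"):
        val = 0.5 * eta * (1.0 / d - 1.0 / d_lim) ** 2
    return np.where(d < d_lim, val, 0.0)


def threat_cost(observer_path, threats, d_lim=D_LIM):
    """Sum of potentials over all time steps (rows of observer_path) and all threats."""
    diff = observer_path[:, None, :] - threats[None, :, :]   # shape (T, n_threats, 2)
    dist = np.linalg.norm(diff, axis=2)
    return float(potential(dist, d_lim).sum())


def reward(j_fim, j_threat, eps=EPS):
    """Assumed combination of the two parts: J = J_FIM - eps * J_threat."""
    return j_fim - eps * j_threat


threats = np.array([[50.0, 50.0], [60.0, 30.0]])
path = np.array([[0.0, 0.0], [30.0, 40.0], [40.0, 50.0]])
print(threat_cost(path, threats))
```

Only the last path point lies within $d_{\lim}$ of a threat (10 km from the first one), so it alone contributes to the cost.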

Optimal Maneuvering Strategy
The core issue of the optimal strategy is maximizing the reward function established above. To this end, the finite-horizon MDP principle can be used. It requires the reward function J to satisfy the dynamic programming property [7], so that the optimal sequence of controls $u_1, u_2, \ldots, u_k$ can be obtained recursively, one step at a time. The finite-horizon MDP model is

$$\left( X, \{A(x) : x \in X\}, Q, c, J_N \right),$$

where:

- X is the state space, a Borel space;
- A(x) is the control (action) set, a Borel space;
- Q is the transition probability function;
- c is the reward function at each step;
- $J_N$ is the terminal reward at the end of the finite horizon.

The maneuvering strategy is $\pi = \{a_t \mid t = 0, 1, \ldots, N-1\}$, where $a_t \in A$ is the maneuver angle selected at step $t$. The observer state at the next step is determined by the chosen $a_t$. The optimal reward is obtained by backward induction starting from $J^*_N = J_N$, and the maneuver strategy is the maximizing action at each step. Based on the quantization, for all $(i, j) \in \Gamma_k \times \Gamma_{k+1}$, the transition probability $P_t(x_t, x_{t+1})$ can be replaced by $P_t^{i,j}$, which gives

$$J_t^*(g, i) = \max_{a \in A} \sum_{j} P_t^{i,j} \left[ c(g, i, a, h, j) + J_{t+1}^*(h, j) \right],$$

where $g$ is the current observer state and $h$ is the observer state reached from $g$ under maneuver $a$.
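The backward recursion described above can be sketched over a finite observer-state set and the quantized target chain. The sketch assumes that observer states are indexed by integers. It also assumes a deterministic table `next_obs[g, a]` that gives the observer state h reached from g under maneuver a. `backward_induction` and the `stage_reward` callable are hypothetical names. The toy usage (three observer positions, reward for reaching position 2) only checks the mechanics.

```python
import numpy as np


def backward_induction(P, next_obs, stage_reward, terminal):
    """Finite-horizon DP over (observer state g, target cell i).

    P[t]          : (n_i, n_j) transition matrix of the quantized target chain at step t
    next_obs      : (n_g, n_a) int table, observer state h reached from g under action a
    stage_reward  : callable (t, g, i, a, h, j) -> float, the per-step reward c
    terminal      : (n_g, n_cells) terminal reward J_N
    Returns value tables J[t] of shape (n_g, n_i) and greedy policies pi[t].
    """
    N = len(P)
    n_g, n_a = next_obs.shape
    J = [None] * N + [terminal]
    pi = [None] * N
    for t in range(N - 1, -1, -1):                 # backward in time
        n_i, n_j = P[t].shape
        Qv = np.zeros((n_g, n_i, n_a))
        for g in range(n_g):
            for a in range(n_a):
                h = next_obs[g, a]
                for i in range(n_i):
                    Qv[g, i, a] = sum(P[t][i, j] * (stage_reward(t, g, i, a, h, j) + J[t + 1][h, j])
                                      for j in range(n_j))
        J[t] = Qv.max(axis=2)
        pi[t] = Qv.argmax(axis=2)
    return J, pi


# Toy check: observer positions 0, 1, 2; actions left / stay / right; reward 1 at position 2
next_obs = np.array([[max(g - 1, 0), g, min(g + 1, 2)] for g in range(3)])
P = [np.eye(2), np.eye(2)]
J, pi = backward_induction(P, next_obs, lambda t, g, i, a, h, j: float(h == 2), np.zeros((3, 2)))
```

From position 0, the policy moves right at the first step and collects one unit of reward at the second step.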

Algorithm
The algorithm consists of two processes. First, a quantization method provides the discretized density and the transition matrices. Second, the finite-horizon MDP uses them to compute the optimal maneuvering strategy. The reward function combines the detFIM and the cost of threats. Given the grids $\Gamma_k$ and the transition matrices $P_k^{i,j}$ from quantization, together with the reward function, the finite-horizon MDP outputs the optimal maneuvering strategy. Meanwhile, the CKF provides the density $(\hat{\mathbf{x}}_{k|k}, P_{k|k})$ as the initial density for each period. Figure 1 shows a diagram of the algorithm.

Numerical Example
To test the feasibility of the algorithm, we apply it to several scenarios, including static and moving target tracking with or without threats. We also compare different trajectories and filtering methods. The simulations were run in MATLAB 7.0 on Windows XP with an Intel Core i3 3.3 GHz processor. The parameters are as follows:

- process noise intensity: $q = 10^{-5}$ km²/s³;
- bearing noise standard deviation: σ = 0.5°;
- weighting coefficient: ε = 0.001;
- time horizon: T = 120 s;
- threat influence radius: $d_{\lim}$ = 20 km;
- measurement interval: Δt = 10 s;
- observer initial position: (0 km, 0 km);
- target initial position: (100 km, 100 km).

The Monte Carlo simulation is run 100 times for each scenario.
7.1. Static Target Tracking.
Two simple scenarios are considered for static target tracking: one in a threat-free environment and one in a threat environment. Because of the kinematic constraints of the observer, the change in its velocity direction between adjacent time steps is limited to ±60°. For static target tracking, the state transition matrix is F = I.
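The ±60° heading constraint defines the discrete action set A used by the MDP. The sketch below assumes that the maneuver $a_t$ is a heading change drawn from an evenly spaced grid in [−60°, +60°]. It also assumes that the observer moves at constant speed over one measurement interval. The speed value (0.2 km/s), the number of actions, and the function name `candidate_moves` are hypothetical.

```python
import numpy as np


def candidate_moves(pos, heading, speed, dt, max_turn_deg=60.0, n_actions=5):
    """Discrete maneuver set: heading changes evenly spaced in [-max_turn, +max_turn]."""
    turns = np.radians(np.linspace(-max_turn_deg, max_turn_deg, n_actions))
    new_heading = heading + turns
    # Constant-speed kinematics over one measurement interval dt
    step = speed * dt * np.column_stack([np.cos(new_heading), np.sin(new_heading)])
    return new_heading, pos + step


headings, positions = candidate_moves(np.array([0.0, 0.0]), np.pi / 4, speed=0.2, dt=10.0)
```

Each candidate position feeds one branch of the backward recursion. A finer angle grid enlarges the action set and increases the MDP cost linearly per step.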
In Figure 2, the optimal trajectory is "s" shaped. This shape can be explained by formula (19): $J_{FIM}$ increases with the change of bearing Δθ and decreases with the range r. The observer therefore moves toward the target while keeping the change of bearing within a limited range. The "s" shape reflects the balance between the numerator and the denominator of the formula. Figure 3 shows the target tracking errors along the x-axis and the y-axis in scenarios 1 and 2. Compared with Figure 2, Figure 4 shows how the trajectory changes when the observer encounters threats. The circles of various colors represent the contours of the potential fields of different threats. At the beginning, the maneuver is "s" shaped. As the observer approaches the threats, the range of feasible maneuvers is markedly reduced, and the direction of velocity changes accordingly. The difference between Figures 2 and 4 reflects the effect of formulas (19), (20), (21), (22), and (23): the cost of threat avoidance offsets the detFIM reward. Although the threats affect the observer's maneuvers, the observer still obtains effective measurements.

7.2. Moving Target Tracking.
The moving target tracking scenarios are also set in environments with and without threats. The motion is shown in 2-D Cartesian coordinates.
As shown in Figure 5, the green line is the trajectory of the observer, the blue lines are some of the bearing direction vectors, and the red line is the trajectory of the target. The trajectory in Figure 5 is similar to that in Figure 2. However, because it tracks a moving target whose location keeps changing, the overall trajectory is more complicated than the one in Figure 2. Without the influence of threats, the trajectory keeps the "s" shape. Figure 6 shows scenario 4, which considers a multithreat environment. Compared with Figure 5, the trajectory in Figure 6 varies greatly, because the multiple threats limit the range of the observer's maneuvers. Figure 7 shows the target tracking errors along the x-axis and y-axis in scenarios 3 and 4.
[Figure 7 panels: X error and Y error in scenarios 3 and 4]
Scenario conditions are modified in scenarios 4b, 5, and 6. The trajectory in Figure 8 differs considerably from that in Figure 6 because the target moves in a different direction. At the end of tracking, the observer is very close to the target and moves in a circle to obtain more bearing information. In Figure 9, the number of threats increases, and the observer changes its trajectory relative to Figure 6 to cope with the more complex threat environment. The target is moving, and the observer does not head straight toward it at all times. The bearing therefore keeps changing, which increases the BOT reward.
Local minima may occur in trajectory planning; in this paper, a local minimum takes the form of a "trap" formed by threats. Next, we test the ability of the algorithm to escape such a trap. In Figure 10, the trap consists of 6 threats. When the observer enters the trap, the path ahead is blocked by threats. The observer then selects reasonable maneuvers based on the reward function. As Figure 10 shows, it turns around and moves out of the trap. This result illustrates that the algorithm can resolve the trap problem in this scenario. Figure 11 shows the target tracking errors along the x-axis and y-axis in scenarios 4b, 5, and 6.

7.3. Performance Comparison.
First, Figure 12 shows the bearing reward of the optimal maneuver. The Fisher information increases significantly after 20 min. Because of the distance parameter, however, its order of magnitude is very small, so the variation of the curves for scenarios 3 and 6 is not obvious. In scenarios 1, 2, 4, and 5, the observer approaches the target quickly, which decreases the distance parameter and increases the Fisher information. Because of the threats, the Fisher information does not vary according to a consistent pattern. The time horizons of the observer's motion also differ among these scenarios. In scenario 4b the target moves in a different direction, so its Fisher information is not directly comparable with that of the other scenarios.
Next, the bearing rewards of optimal, random, and fixed maneuvers in scenario 4 are compared to illustrate the performance of the proposed algorithm. The root mean square error (RMSE) of position is used to evaluate BOT accuracy:

$$\mathrm{RMSE}_k = \sqrt{\frac{1}{N_{MC}} \sum_{j=1}^{N_{MC}} \left[ (\hat{x}^j_k - x_k)^2 + (\hat{y}^j_k - y_k)^2 \right]},$$

where $N_{MC}$ is the number of Monte Carlo runs, $(\hat{x}^j_k, \hat{y}^j_k)$ is the estimated target position at time $k$ in Monte Carlo run $j$, and $(x_k, y_k)$ is the true target position.
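The position RMSE defined above can be computed in vectorized form. This is a minimal Python/NumPy sketch. It assumes that estimates are stored as an array of shape (N_MC, K, 2) and the ground truth as (K, 2). The function name `position_rmse` and the toy numbers are hypothetical.

```python
import numpy as np


def position_rmse(estimates, truth):
    """Position RMSE per time step.

    estimates : (N_MC, K, 2) estimated (x, y) for each Monte Carlo run and time step
    truth     : (K, 2) true target positions
    """
    err2 = np.sum((estimates - truth[None, :, :]) ** 2, axis=2)   # squared position error
    return np.sqrt(err2.mean(axis=0))                             # average over runs


truth = np.array([[100.0, 100.0], [100.0, 101.0]])
estimates = np.stack([truth + [3.0, 4.0], truth - [3.0, 4.0]])    # two runs, 5 km off each
print(position_rmse(estimates, truth))
```

Averaging over runs before taking the square root matches the definition above. Averaging per-run RMSE values instead would give a different (smaller or equal) number.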
Figure 13 shows the trajectories of the optimal, random, and fixed maneuvers: the green line is the optimal trajectory, the orange line is the random trajectory, and the purple line is the fixed-maneuver trajectory. Figure 14 compares the accuracy of the three maneuvering strategies. After 10 minutes, each RMSE curve fluctuates slightly. This is caused by the threat costs as the observer approaches the threats. The optimal maneuvering strategy achieves the lowest position RMSE, and its position RMSE decreases fastest. The straight trajectory has the worst accuracy.
Next, three filtering methods, EKF [15], UKF [14], and CKF, are compared in scenario 4.
Figure 15 shows the results of the three filtering methods in scenario 4. The position RMSE of the CKF and the UKF decreases faster than that of the EKF. This illustrates the advantage of the CKF and UKF for nonlinear filtering with a moving target.

Conclusion
The optimal maneuvering problem for BOT under the influence of threats is solved by the finite-horizon MDP; the key is to connect filtering and maneuvering. The quantization method was used to discretize the BOT process to meet the requirements of the MDP. The CKF provides the initial state estimate and the corresponding error covariance for quantization in each period. The reward function balances the BOT reward against the cost of threat avoidance. Comparisons across several scenarios show the superior performance of the proposed algorithm. However, only one target is considered in this paper, and the multiple-target scenario remains to be solved. Future research will also address optimal navigation in the threat environment, that is, steering a mobile observer from an initial position to a final position [21]. Improved filtering methods will likewise be considered.

[Figure 3 panels: X error and Y error in scenarios 1 and 2]

Figure 11: Target tracking results in scenarios 4b, 5, and 6 (panels: X error and Y error in each scenario).

Figure 12: Fisher information of the scenarios.

Figure 14: Comparison of position RMSE with different trajectories.