Guidance of Autonomous Amphibious Vehicles for Flood Rescue Support

We develop a path-planning algorithm to guide autonomous amphibious vehicles (AAVs) for flood rescue support missions. Specifically, we develop an algorithm to control multiple AAVs to reach/rescue multiple victims (also called targets) in a flood scenario in 2D, where the flood water flows across the scene and the targets move (drifted by the flood water) along the flood stream. A target is said to be rescued if an AAV lies within a circular region of a certain radius around the target. The goal is to control the AAVs such that each target gets rescued while optimizing a certain performance objective. The algorithm design is based on the theory of partially observable Markov decision process (POMDP). In practice, POMDP problems are hard to solve exactly, so we use an approximation method called nominal belief-state optimization (NBO). We compare the performance of the NBO approach with a greedy approach.


Introduction
Various guidance algorithms for autonomous amphibious vehicles (AAVs) are being designed and tested to respond to disasters associated with global warming, such as floods, typhoons, and hurricanes [1-3]. With this motivation, we present a guidance framework to control multiple AAVs to rescue multiple victims (henceforth called targets) in a flood situation, where the flood water (interchangeably called the river) flows along a valley as shown in Figure 1. A target is said to be rescued when an AAV is within the circular region of radius $d_{\text{dist-thresh}}$ on the 2D plane around the target. In general, AAVs are equipped with various advanced sensors such as polarized stereo vision, laser scanning, and SONAR [4-6]. The sensors onboard an AAV generate (noisy) measurements of the targets and the river. Our goal is to design a path-planning algorithm that guides the AAVs so that every target gets rescued while optimizing a performance measure (discussed later). The algorithm runs on a notional central fusion node, which collects the measurements from the sensors onboard each AAV, fuses them, updates the tracks on the targets and the river state (discussed later), computes the control commands for the AAVs, and sends the control commands back to the AAVs.
Guidance control methods [1, 7-9] for AAVs are normally based on a standard three-layered system architecture that requires human-machine interactions. We design the guidance algorithm based on the theory of partially observable Markov decision process (POMDP) [10, 11]. There are several other autonomous control methods in the literature for AAVs and underwater vehicles, for example, [12-14]. Our approach differs from these existing approaches in that we place the guidance problem in the context of POMDP, which gives the approach a look-ahead property: it trades off short-term performance for long-term performance.

Problem Specification
The AAV guidance problem is specified as follows.
Targets.
In this study, we assume that there are multiple mobile targets (flood victims) located in a river, being drifted downstream by the flood water, as shown in Figure 1.

Autonomous Amphibious Vehicles (AAVs).
There are multiple autonomous amphibious vehicles (AAVs) located on the shore, as shown in Figure 1. An AAV is controlled by the following kinematic controls: forward acceleration and steering angle. Each AAV is equipped with onboard sensors that generate measurements of the targets and the river depth. In this problem, AAVs float when moving in the river. For the purpose of this study, we assume that the number of AAVs and the number of targets are the same.

Environmental Conditions.
The elevation map of the region is known a priori. The landscape for this problem is shown in Figure 1, which shows a river flowing along a valley from the north toward the south. The state of the river includes the depth $d^{\text{ref}}_k$ at a reference point on the map (the lowest point in the landscape, e.g., some location at the bottom of the valley as shown in Figure 1).

River Model.
Typically, a river flows slowly near its banks (where the river is shallow) and quickly far from the banks (i.e., toward the center of the river, where the river is deep). In this paper, we assume that the river flows from the north toward the south in a V-shaped channel, as shown in Figure 1. We adopt the logarithmic velocity profile to model the velocity of the flow (see [15] for a detailed description). According to this model, the speed of the river, at the surface, at the location $(x, y)$ at time $k$ is given by
$$w_k(x, y) = c_1 \left[\log\left(d_k(x, y)\right) + c_2\right], \qquad (1)$$
where $d_k(x, y)$ is the depth of the river at the location $(x, y)$ at time $k$, and $c_1$ (a function of the viscosity and the density of flood water) and $c_2$ are constants (see [15] for more details).
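As a minimal sketch of how the logarithmic profile above could be evaluated, the following Python function maps a depth to a surface speed. The function name, the use of the natural logarithm, the zero speed on dry ground, and the clamping of negative values are all assumptions made for illustration; the source does not specify them, and the two constants must come from the flow model in [15].

```python
import math


def river_surface_speed(depth, c1, c2):
    """Surface speed from the logarithmic profile c1 * (log(depth) + c2).

    Assumptions (not from the source): natural logarithm, zero speed on
    dry ground, and negative model outputs clamped to zero.
    """
    if depth <= 0.0:
        return 0.0  # no water, no flow
    return max(0.0, c1 * (math.log(depth) + c2))
```

With positive `c1`, the function returns larger speeds for deeper water, which matches the qualitative description of slow flow near the banks and fast flow at the center of the channel.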

Observations.
The sensors onboard an AAV generate noisy observations of target locations and the depth of the river directly beneath the vehicle, that is, the sensors generate the observations of the depth of the river only when the AAV is in the river.

Objective.
A target is said to be rescued if there is an AAV within a circular region of radius $d_{\text{dist-thresh}}$ around the target.
The objective is to minimize the average rescue time, where the average is over the number of targets, and the rescue time of a target is defined as the time it takes to rescue the target.
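The rescue criterion and the average rescue time can be illustrated with a short Python sketch. It assumes that trajectories are stored as lists of 2D positions indexed by time step, and that "within" means a distance less than or equal to the threshold; both the data layout and the function names are hypothetical.

```python
import math


def rescue_time(target_track, aav_tracks, dist_thresh):
    """First time step at which any AAV is within dist_thresh of the target.

    Returns None if the target is never rescued over the recorded tracks.
    """
    for k, target_pos in enumerate(target_track):
        for track in aav_tracks:
            if k < len(track) and math.dist(track[k], target_pos) <= dist_thresh:
                return k
    return None


def average_rescue_time(target_tracks, aav_tracks, dist_thresh):
    """Average of the rescue times over all targets."""
    times = [rescue_time(t, aav_tracks, dist_thresh) for t in target_tracks]
    if any(t is None for t in times):
        raise ValueError("at least one target was never rescued")
    return sum(times) / len(times)
```

Raising an error for an unrescued target is a design choice of this sketch; a simulation could instead assign such targets the final time step.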

Problem Formulation
We cast the AAV guidance problem into the framework of a partially observable Markov decision process (POMDP). A POMDP is a mathematical framework useful for solving resource control problems, and casting the problem in this form enables us to exploit approximation methods for POMDPs to design our AAV guidance algorithm. A POMDP evolves in discrete time steps; we use $k$ as the discrete-time index. To cast the AAV guidance problem into the POMDP framework, we define the following key components in terms of our guidance problem.
The river state $d^{\text{ref}}_k$ is the depth of the river at the reference point at time $k$. The reference point is the lowest point in the elevation map, that is, some location at the bottom of the valley in the landscape, as shown in Figure 1. Here, we assume that the flow direction of the river is the same everywhere and is known a priori. The target state $\chi_k$ includes the locations and the velocities of the targets at time $k$. The track states represent the state of the tracking algorithm, where $\xi^{\text{riv}}_k$ and $P^{\text{riv}}_k$ are the mean and the variance, standard in Kalman filter equations, corresponding to the river state, and, similarly, $\xi^{\text{targ}}_k$ is the mean vector and $P^{\text{targ}}_k$ is the covariance matrix corresponding to the target state.

Observations and Observation Law.
The vehicle and the track states are assumed to be fully observable. The river and the target states are only partially observable. The observation of the river state at an AAV is given by
$$z^{\text{riv}}_k = d^{\text{ref}}_k + n^{\text{riv}}_k, \qquad (2)$$
where $n^{\text{riv}}_k \sim \mathcal{N}(0, R^{\text{riv}}_k)$, and $R^{\text{riv}}_k$ is the measurement variance. The sensors at an AAV generate the measurement of the river state only when the AAV is in the river. In practice, the sensors on an AAV measure the depth of the river exactly below the AAV. We write the observation model (2) as if the sensors generated observations of the depth of the river at the reference point. The rationale behind this assumption is that we can always calculate the depth of the river at the reference point given the elevation map and the observed depth of the river at a different location. The observation of the $j$th target at an AAV is given by
$$z^{\text{targ},j}_k = \mathbf{H} \chi^{j}_k + n^{\text{targ}}_k, \qquad (3)$$
where $\mathbf{H}$ is the target-state observation model, $\chi^{j}_k$ is the state of the $j$th target, and $n^{\text{targ}}_k \sim \mathcal{N}(0, R^{\text{targ}}_k)$, where $R^{\text{targ}}_k$ is the measurement covariance matrix. The line of sight between the target and the AAV is sometimes blocked, for example, whenever the target sinks in the water.
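The conversion between a depth observed beneath an AAV and the depth at the reference point can be sketched as follows. This is only an illustration under the assumption that the water surface is level, so that depth differences equal elevation differences; the function and parameter names are hypothetical, and elevations would be read from the known elevation map.

```python
def depth_at_reference(observed_depth, elevation_here, elevation_ref):
    """Depth at the reference point implied by a depth observed elsewhere.

    Assumes a level water surface: surface height = elevation + depth.
    """
    return observed_depth + (elevation_here - elevation_ref)


def depth_at_location(depth_ref, elevation_here, elevation_ref):
    """Depth at an arbitrary map location given the reference depth.

    Locations above the water surface are dry, so the depth is clamped at 0.
    """
    return max(0.0, depth_ref - (elevation_here - elevation_ref))
```

The second function is the inverse mapping, which a planner would need in order to turn an estimate of the reference depth into depths, and hence flow speeds, across the map.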

Actions.
The actions include the controllable aspects of the system. In this problem, the actions include the decisions on the assignment of AAVs to targets and the kinematic control commands for the AAVs. Let $u_k$ be the action tuple at time $k$, which is given by $u_k = (\kappa_k, \sigma_k)$, where $\kappa_k$ represents the kinematic control vectors (the forward acceleration and the steering angle for each AAV), and $\sigma_k$ is a vector that represents the assignment of AAVs to targets, that is, $\sigma_k(i) = j$ means that the $i$th AAV is assigned to the $j$th target. For the purpose of this study, the number of AAVs and the number of targets are the same. Each AAV is assigned to only one target, and each target is assigned only one AAV, that is, $\sigma_k$ represents a one-to-one correspondence between the AAVs and the targets.
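Because the assignment is a one-to-one correspondence, the set of candidate assignments is the set of permutations of the target indices. The sketch below enumerates them for a small team. The total straight-line distance used to rank assignments is an assumption chosen for illustration only; it is not the cost function of this paper, and the function name is hypothetical.

```python
from itertools import permutations
import math


def best_assignment(aav_positions, target_positions):
    """Enumerate all one-to-one assignments and return the cheapest one.

    The returned tuple `sigma` satisfies: AAV i is assigned to target sigma[i].
    Cost here is total Euclidean distance (an illustrative stand-in).
    """
    n = len(aav_positions)
    if n != len(target_positions):
        raise ValueError("number of AAVs and targets must match")
    best, best_cost = None, math.inf
    for sigma in permutations(range(n)):
        cost = sum(math.dist(aav_positions[i], target_positions[j])
                   for i, j in enumerate(sigma))
        if cost < best_cost:
            best, best_cost = sigma, cost
    return best, best_cost
```

Exhaustive enumeration grows factorially with the number of vehicles, so it is only practical for the two- and three-vehicle scenarios considered here.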

State-Transition Law.
The state-transition law specifies the next-state distribution given the current state and the action. The transition function for the vehicle state is given by $s_{k+1} = \psi(s_k, \kappa_k, \xi^{\text{riv}}_k)$, where $\psi$ (defined later) represents the AAV kinematic model, $s_k$ is the vehicle state, $\kappa_k$ is the kinematic control vector (the forward acceleration and the steering angle), and $\xi^{\text{riv}}_k$ is the estimated river state at time $k$. The river state evolves according to the following equation:
$$d^{\text{ref}}_{k+1} = d^{\text{ref}}_k + v^{\text{riv}}_k, \quad v^{\text{riv}}_k \sim \mathcal{N}(0, Q^{\text{riv}}_k), \qquad (4)$$
where $Q^{\text{riv}}_k$ is the process variance corresponding to the river state evolution. The target state evolves according to
$$\chi_{k+1} = F \chi_k + v^{\text{targ}}_k, \quad v^{\text{targ}}_k \sim \mathcal{N}(0, Q^{\text{targ}}_k), \qquad (5)$$
where $F$ represents the target motion model, and $Q^{\text{targ}}_k$ is the process covariance matrix corresponding to the target state evolution. The track states evolve according to the Kalman filter equations given the observations from the sensors onboard the AAVs. When the observations are not available, the track states evolve according to the Kalman filter equations with only the prediction step performed; the update step is skipped.
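The handling of missing observations in the track update can be sketched for the scalar river track. This is a minimal illustration that assumes a random-walk model for the river depth and a direct, noisy observation of it; the function name and argument names are hypothetical.

```python
def kalman_step(mean, var, process_var, meas_var, observation=None):
    """One scalar Kalman filter step for a random-walk state (assumed model).

    If `observation` is None (e.g., no AAV is in the river), only the
    prediction step is performed and the update step is skipped.
    """
    # Prediction: the mean is unchanged and the uncertainty grows.
    mean_pred = mean
    var_pred = var + process_var
    if observation is None:
        return mean_pred, var_pred
    # Update: blend the prediction with the observation.
    gain = var_pred / (var_pred + meas_var)
    mean_new = mean_pred + gain * (observation - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new
```

Calling the function repeatedly with `observation=None` makes the variance grow step by step, which reflects the increasing uncertainty about the river while no AAV is in the water.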

Cost. The cost function represents the cost of performing an action at the current state. It is defined in terms of the following quantities: $s^{i,\text{pos}}_{k+1}$, which represents the 2D position coordinates of the $i$th AAV; $\xi^{j,\text{targ,pos}}_{k+1}$, which represents the estimated 2D position coordinates of the $j$th target at time $k + 1$; $\|\cdot\|$, the Euclidean norm (everywhere in this paper); and $1\{\cdot\}$, the indicator function, which equals 1 when the expected distance between the AAV and the target at time $k + 1$ is greater than the threshold distance $d_{\text{dist-thresh}}$ and 0 otherwise.

Belief State.
The belief state $b_k$ is the posterior distribution of the state at time $k$. The vehicle and the track states are assumed to be fully observable, that is, the belief state corresponding to the vehicle state is given by $b^{s}_k(s) = \delta(s - s_k)$, where $\delta(\cdot)$ is the Kronecker delta function. Similarly, the belief states corresponding to the track states can be written in terms of the actual track states. The belief states corresponding to the river and the target are the posterior distributions of $d^{\text{ref}}_k$ and $\chi_k$, respectively, given the history of observations.

Objective and Optimal Policy
The goal is to find the action sequence $(u_0, u_1, \ldots, u_{H-1})$ such that the expected cumulative cost over a time horizon $H$ is minimized. The expected cumulative cost is given by
$$J_H = \mathrm{E}\left[\sum_{k=0}^{H-1} C(x_k, u_k)\right]. \qquad (7)$$
We can write the expected cumulative cost in terms of the belief states given the initial belief state $b_0$ (similar to the treatment in [10, 11]) as follows:
$$J_H(b_0) = \mathrm{E}\left[\sum_{k=0}^{H-1} c(b_k, u_k) \,\middle|\, b_0\right], \qquad (8)$$
where $c(b_k, u_k) = \int C(x, u_k)\, b_k(x)\, \mathrm{d}x$, and $b_0$ is the belief state at time $k = 0$. From Bellman's principle of optimality [16], the optimal objective function value is given by
$$J^*_H(b_0) = \min_{u} \left\{ c(b_0, u) + \mathrm{E}\left[J^*_{H-1}(b_1) \mid b_0, u\right] \right\}, \qquad (9)$$
where $b_1$ is the random next belief state, $J^*_{H-1}$ is the optimal cumulative cost over the horizon $H - 1$, $k = 1, 2, \ldots, H - 1$, and $\mathrm{E}[\cdot \mid b_0, u]$ is the conditional expectation given the current belief state $b_0$ and the current action $u$ at time $k = 0$. Let us define the $Q$ value of taking action $u$ given the current belief state $b_0$:
$$Q_H(b_0, u) = c(b_0, u) + \mathrm{E}\left[J^*_{H-1}(b_1) \mid b_0, u\right]. \qquad (10)$$
The optimal policy (from Bellman's principle) at time $k = 0$ can be written as
$$\pi^*_0(b_0) = \arg\min_{u} Q_H(b_0, u). \qquad (11)$$
In general, it is hard to obtain the $Q$ value exactly. There are several approximation methods in the literature: heuristic expected-cost-to-go (ECTG) [17], parametric approximation [18], policy rollout [19], hindsight optimization [20], and foresight optimization [21]. In this paper, we use one such approximation method called nominal belief-state optimization (NBO), which was introduced in [11] along with other approximations and techniques specific to guidance problems. The rationale behind choosing the NBO method over other methods to solve the POMDP is that it is relatively inexpensive in terms of computation time, that is, its computational requirements are not prohibitive, unlike those of other approximation methods. The following subsection provides a brief description of the NBO method.
NBO Approximation Method.
The computational requirements of obtaining the optimal assignments of AAVs to targets ($\sigma_k$) over a long horizon are prohibitive. Also, we expect that the optimal assignment of AAVs to targets ($\sigma_k$) over a long horizon does not change with time. For these reasons, in the NBO method, we keep the assignment of AAVs to targets fixed. In other words, in approximating the expected cost-to-go in (10), $\sigma_k$ remains fixed over the planning horizon $H$. Therefore, we drop the subscript $k$ from $\sigma_k$ in the objective function used in the planning based on (10), that is, $\sigma_k = \sigma$ for all $k$. In the NBO approximation method, we use the following objective function, written in terms of belief states:
$$J_H(b_0) = \mathrm{E}\left[\sum_{k=0}^{H-1} c(b_k, \kappa_k, \sigma) \,\middle|\, b_0\right],$$
where $\kappa_k$ represents the kinematic controls for the AAVs, and $\sigma$ is the assignment of AAVs to the targets. The belief states corresponding to the river state and the target state are Gaussian, $b^{\text{riv}}_k = \mathcal{N}(\xi^{\text{riv}}_k, P^{\text{riv}}_k)$ and $b^{\text{targ}}_k = \mathcal{N}(\xi^{\text{targ}}_k, P^{\text{targ}}_k)$, where $(\xi^{\text{riv}}_k, P^{\text{riv}}_k, \xi^{\text{targ}}_k, P^{\text{targ}}_k)$ are the track states corresponding to the river and the target states, respectively, which evolve according to the Kalman filter equations. In the NBO method, we approximate the objective function as follows:
$$J_H(b_0) \approx \sum_{k=0}^{H-1} c(\hat{b}_k, \kappa_k, \sigma), \quad \hat{b}_0 = b_0,$$
where $\hat{b}_1, \ldots, \hat{b}_{H-1}$ is a nominal belief-state sequence, and the optimization is over an action sequence $\sigma, \kappa_0, \ldots, \kappa_{H-1}$. We obtain the nominal belief states by evolving the current belief state with an exactly zero-noise sequence over the horizon $H$ (similar to the treatment in [10, 11]). The objective function from the NBO method therefore follows by evaluating the cost function at the nominal belief states. The evolution of the vehicle state depends on the river state estimate $\xi^{\text{riv}}_k$. In the NBO method, $\xi^{\text{riv}}_k$ is replaced with $\hat{\xi}^{\text{riv}}_k$ in the AAV kinematic model $\psi(\cdot)$, where $(\hat{\xi}^{\text{riv}}_1, \ldots, \hat{\xi}^{\text{riv}}_H)$ are the nominal track state components corresponding to the river state, and the obtained positions $\hat{s}^{i,\text{pos}}_{k+1}$ of the $i$th AAV are called nominal positions.
Here, we adopt an approach called "receding horizon control," according to which we optimize the action sequence for $H$ time steps at the current time step, implement only the action corresponding to the current time step, and again optimize the action sequence for $H$ time steps in the next time step. The length of the planning horizon $H$ should be large enough for an AAV to receive a benefit by moving toward a target. Due to computational constraints, we cannot have an arbitrarily long horizon. Therefore, we truncate the length of the horizon to a few time steps (we set $H = 6$ in our simulations) and append the cost function with an appropriate expected cost-to-go (ECTG). We use a distance-based ECTG, which is defined in terms of $\hat{s}^{i,\text{pos}}_{H}$, the nominal position of the $i$th AAV, and $\hat{\xi}^{j,\text{targ,pos}}_{H}$, the estimated location of the $j$th target (from the NBO approach) at time $k = H$. The objective function from the NBO method is therefore the truncated-horizon NBO cost with the distance-based ECTG appended.
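The receding horizon idea can be illustrated with a deliberately simplified Python sketch. It is not the optimization used in this paper: the action set is a toy set of unit moves instead of acceleration and steering commands, the search is exhaustive rather than gradient-based, a single AAV pursues a single target, and the terminal distance term is only a stand-in for the distance-based ECTG. All names are hypothetical.

```python
from itertools import product
import math

# Assumption: a toy action set of unit moves replaces acceleration/steering.
MOVES = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]


def nominal_target_path(pos, vel, horizon):
    """Zero-noise constant-velocity prediction of the target for steps 1..horizon."""
    return [(pos[0] + vel[0] * k, pos[1] + vel[1] * k) for k in range(1, horizon + 1)]


def plan_first_move(aav_pos, target_pos, target_vel, horizon):
    """Search all move sequences of length `horizon`; return the first move of the best."""
    targets = nominal_target_path(target_pos, target_vel, horizon)
    best_seq, best_cost = None, math.inf
    for seq in product(MOVES, repeat=horizon):
        pos, cost = aav_pos, 0.0
        for move, tgt in zip(seq, targets):
            pos = (pos[0] + move[0], pos[1] + move[1])
            cost += math.dist(pos, tgt)      # per-step distance cost
        cost += math.dist(pos, targets[-1])  # terminal distance term (ECTG stand-in)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq[0]


def run_receding_horizon(aav_pos, target_pos, target_vel, horizon=2,
                         dist_thresh=0.5, max_steps=50):
    """Replan at every step and apply only the first move.

    Returns the time step at which the target is rescued, or None.
    """
    for k in range(max_steps):
        if math.dist(aav_pos, target_pos) <= dist_thresh:
            return k
        move = plan_first_move(aav_pos, target_pos, target_vel, horizon)
        aav_pos = (aav_pos[0] + move[0], aav_pos[1] + move[1])
        target_pos = (target_pos[0] + target_vel[0], target_pos[1] + target_vel[1])
    return None
```

For example, `run_receding_horizon((0.0, 0.0), (3.0, 0.0), (0.0, 0.0))` steers the AAV straight toward a stationary target. A nonzero `target_vel` makes the planner aim at the predicted rather than the current target position, which is the benefit of look-ahead that the greedy approach lacks.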

AAV Kinematics.
The kinematic equations of an AAV vary depending on whether the AAV is in the river or on land. When the AAV is in the river, we take into account the speed of the river in the kinematic equations. The steering and thrust generation of the vehicle are modeled on the work in [2, 22], which is designed using a single drive system. The vehicle is front-wheel driven on land. When the AAV is in the river, it is propelled using the centrifugal pump from the front wheels.
The following subsections describe the kinematics of AAVs on land and in the river. On land, $\phi_{\text{land}}$ denotes the maximum steering angle, and the function $\psi$ can be specified by a set of nonlinear kinematic equations.

Kinematics of AAVs on the Land.
The kinematic equations on land involve $T$, the length of the time step, together with the width of the vehicle and the distance between the front axle and the rear axle. The derivation of the heading angle update (19) is as follows. When the front wheels of the vehicle are oriented at a particular angle $\phi_k$ with respect to the main axis of the vehicle (as shown in Figure 2), the heading direction of the vehicle at time $k + 1$ is derived as follows:

Kinematics of AAVs on the River.
This subsection provides the definition of $\psi$ when the vehicle is in the river. The kinematic equations of the AAV motion in the river depend on $\hat{w}^{x}_k(p_k, q_k)$ and $\hat{w}^{y}_k(p_k, q_k)$, the estimated speeds of the river at the location $(p_k, q_k)$ in the $x$ and $y$ directions, respectively, which are obtained from the river state estimate $\hat{\xi}^{\text{riv}}_k$ and the river model presented in Section 2. The speed and the heading angle update equations remain the same as in the case of land. When in water (or river), the control variable $a_k$ lies within the interval $[-a_{\text{water}}, a_{\text{water}}]$, where $a_{\text{water}}$ is the maximum acceleration, and $\phi_k$ lies within the interval $[-\phi_{\text{water}}, \phi_{\text{water}}]$, where $\phi_{\text{water}}$ is the maximum steering angle. Typically, the values of $a_{\text{water}}$ and $\phi_{\text{water}}$ are much smaller than those of $a_{\text{land}}$ and $\phi_{\text{land}}$.
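To make the role of the river speed concrete, the following Python sketch advances a vehicle state by one time step in the water. It is an illustration under stated assumptions and not the kinematic model of this paper: the river velocity is simply added to the vehicle's own velocity in the position update, and the heading update uses a standard bicycle-model approximation with a hypothetical `wheelbase` parameter.

```python
import math


def step_in_river(state, control, river_velocity, dt=1.0, wheelbase=2.0):
    """Advance (p, q, speed, heading) by one step while drifting with the river.

    state:          (p, q, v, theta) position, speed, heading angle in radians
    control:        (a, phi) forward acceleration and steering angle
    river_velocity: (wx, wy) estimated river speed at the vehicle location
    Assumed model: river drift is additive; heading follows a bicycle model.
    """
    p, q, v, theta = state
    a, phi = control
    wx, wy = river_velocity
    p_next = p + (v * math.cos(theta) + wx) * dt
    q_next = q + (v * math.sin(theta) + wy) * dt
    v_next = v + a * dt
    theta_next = theta + (v * dt / wheelbase) * math.tan(phi)
    return (p_next, q_next, v_next, theta_next)
```

With the river flowing toward the south, `river_velocity` would be `(0.0, -w)` for some nonnegative estimated speed `w`, so a vehicle heading east is carried southward while it crosses the channel.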

Simulation
We implement the NBO method in MATLAB, and we use the command fmincon (MATLAB's optimization tool) to solve the optimization problem. For performance comparison, we also implement a greedy approach, where we optimize only the current kinematic control for the AAVs such that a symmetric-distance-based cost is minimized. This cost is computed from $\hat{s}^{i,\text{pos}}_{k+1}$ and $\hat{\xi}^{j,\text{targ,pos}}_{k+1}$, the nominal positions (obtained by evolving the belief states with zero noise) of the $i$th AAV and the $j$th target at time $k + 1$, respectively. Our simulation environment is two dimensional, that is, the AAVs, the river, and the targets move in 2D. According to the river model, the speed of the river stream $w_k$ at a location $(x, y)$ is given by $w_k(x, y) = c_1 [\log(d_k(x, y)) + c_2]$, where $d_k(x, y)$ is the depth of the river at $(x, y)$, and $c_1$ and $c_2$ are constants. Since the depth of the river is not fully observable, we estimate $d_k(x, y)$ as follows. The elevation map of the landscape is known a priori, so if we know the depth of the river at a particular location, we can obtain the depth of the river at all locations. Therefore, we estimate the depth of the river at location $(x, y)$, that is, $\hat{d}_k(x, y)$, using the estimated depth of the river at the reference point $\hat{d}^{\text{ref}}_k$ $(= \xi^{\text{riv}}_k)$. The estimated speed of the river at location $(x, y)$ is then given by $\hat{w}_k(x, y) = c_1 [\log(\hat{d}_k(x, y)) + c_2]$. We set the length of the horizon $H$ to 6 time steps and the length of the time step $T$ to 1 second. In the simulations, the flooded river flows along a valley in the landscape from the north toward the south, as shown in Figure 1. Since the simulations are in 2D, the river flows toward the $-y$ direction, and the river speed in the $x$ direction (toward the east) is zero at every location. Therefore, the estimated speeds of the river at location $(x, y)$ in the $x$ and $y$ directions are given by $\hat{w}^{x}_k(x, y) = 0$ and $\hat{w}^{y}_k(x, y) = -c_1 [\log(\hat{d}_k(x, y)) + c_2]$. Here, we model the dynamics of the target motion by the constant velocity model (see [23] for the definition of the variables $F$ and $Q^{\text{targ}}_k$ in (5)).
In the simulations, an AAV is represented by a rectangle, and the line connecting the rectangles represents the trajectory of the AAV. We define a performance metric called average rescue time, which is the average of the rescue times of the targets (the rescue time of a target is the time elapsed from the start of the simulation until the target is rescued). The POMDP cost function defined in Section 3 is reflective of this performance metric. We simulate three scenarios: Scenario I, Scenario II, and Scenario III. In Scenario I, there are two AAVs, one on each bank of the river, and two targets are moving (being drifted by the moving water) in the river, as shown in Figure 3. Figure 3 shows a snapshot of the scenario at the end of the simulation with the NBO approach, where the average rescue time is 36 time steps. We also simulate Scenario I with the greedy approach, as shown in Figure 4, where the average rescue time is 64 time steps. In Scenario II, there are two AAVs on the left bank of the river, and two targets are moving in the river. We simulate this scenario with both the NBO and the greedy approaches. Figure 5 shows the snapshot of the scenario with the NBO approach at the end of the simulation, where the average rescue time is 45 time steps, and Figure 6 shows the simulation of the same scenario with the greedy approach, where the average rescue time is 62 time steps. In Scenario III, there are three AAVs (two on the left bank of the river and one on the right), and three targets are moving in the river. We simulate this scenario with both the NBO and the greedy approaches. Figure 7 shows the scenario with the NBO approach, where the average rescue time is 48 time steps, and Figure 8 shows the simulation of the same scenario with the greedy approach, where the average rescue time is 76 time steps. The simulation of these scenarios demonstrates that the NBO approach achieves better coordination among the AAVs than the greedy approach while rescuing the targets, as evident from the average rescue times.
We compare the performance of the NBO approach with that of the greedy approach through Monte Carlo simulations. We simulate the above scenarios with the NBO and the greedy approaches separately for 50 Monte Carlo runs. In each scenario, we compute the average rescue time in every run for both the NBO and the greedy approaches. Figures 9, 10, and 11 show the frequencies of average rescue times for the NBO and the greedy approaches for Scenarios I, II, and III, respectively. These figures demonstrate that the NBO approach significantly outperforms the greedy approach.
The algorithm (NBO) runtime to compute the control commands for three AAVs (in Scenario III) in any time step in MATLAB is approximately 4 seconds on a lab computer (Intel Core i7-860 Quad-Core Processor with 8 MB Cache and 2.80 GHz speed).This runtime can be greatly reduced on a better processor and by further optimizing the code.Since the algorithm runtime is not prohibitive, it can be used in real time (i.e., for practical purposes).

Conclusions, Remarks, and Future Scope
We designed a guidance algorithm for autonomous amphibious vehicles (AAVs) to rescue moving targets in a 2D flood scenario, where the flood water flows across the scene and the targets move in the flood water. We designed this algorithm based on the theory of partially observable Markov decision process (POMDP). Since a POMDP problem is intractable to solve exactly, we used an approximation method called nominal belief-state optimization (NBO). We simulated a few scenarios to demonstrate the coordination among the AAVs achieved by the NBO approach. We defined a performance metric called average rescue time to compare the performance of our approach with a greedy approach. Our results show that the NBO approach outperforms the greedy approach significantly. This was expected because, unlike the greedy approach, the NBO approach has a look-ahead property, that is, the NBO approach trades off short-term performance for long-term performance. Although the greedy approach achieves coordination among the AAVs in that the AAVs eventually rescue all the targets, its performance in terms of average rescue time, which is crucial in these kinds of rescue missions, is poor compared to that of our NBO approach. In our future work, we would like to develop methods to further improve our NBO approach (e.g., NBO with an adaptive horizon). We would also like to extend our approach to a decentralized AAV guidance problem to rescue multiple targets. In this decentralized case, we will induce coordination among the AAVs by appropriately optimizing the communication (at the network level) between the AAVs along with the kinematic controls for the AAVs.
This subsection provides the definition of $\psi$, which was introduced in Section 3, when the vehicle is on land. Let $s_k = (p_k, q_k, V_k, \theta_k)$ be the state of the vehicle at time $k$, where $(p_k, q_k)$ represents the location of the vehicle on the 2D plane, $V_k$ represents the speed of the vehicle along the heading direction, and $\theta_k$ represents the heading angle of the vehicle at time $k$. Let $\kappa_k = (a_k, \phi_k)$ represent the action vector of the vehicle, where $a_k$ represents the acceleration along the direction of the front wheels, and $\phi_k$ represents the steering angle of the front wheels. The (simplified) schematic of a basic four-wheeled vehicle is shown in Figure 2. The control variable $a_k$ lies within the interval $[-a_{\text{land}}, a_{\text{land}}]$, where $a_{\text{land}}$ (or $-a_{\text{land}}$) is the maximum acceleration (or deceleration), and the control variable $\phi_k$ lies within the interval $[-\phi_{\text{land}}, \phi_{\text{land}}]$.

Figure 2: Free body diagram of an AAV.

Figure 3: Simulation of Scenario I with NBO approach, average rescue time = 36 steps.

Figure 4: Simulation of Scenario I with greedy approach, average rescue time = 64 steps.

Figure 6: Simulation of Scenario II with the greedy approach, average rescue time = 62 steps.

Figure 7: Simulation of Scenario III with NBO approach, average rescue time = 48 steps.

Figure 8: Simulation of Scenario III with the greedy approach, average rescue time = 76 steps.

Figure 9: Performance comparison for Scenario I: NBO approach versus greedy approach.
States.
Let $x_k$ represent the state of the system at time $k$. The state of the system includes the state of the vehicles (AAVs) $s_k$, the river state (depth of the river at a reference location) $d^{\text{ref}}_k$, the target state $\chi_k$, and the track states $(\xi^{\text{riv}}_k, P^{\text{riv}}_k, \xi^{\text{targ}}_k, P^{\text{targ}}_k)$. The vehicle state $s_k$ includes the locations and the velocities of the AAVs at time $k$.