Adaptive Navigating Control Based on the Parallel Action-Network ADHDP Method for Unmanned Surface Vessel

The feedback PID method has mainly been used for the navigating control of unmanned surface vessels (USVs). With the arrival of the intelligent control era, however, a USV can be navigated more effectively. Considering the characteristics of USV navigating control, this paper presents a parallel action-network ADHDP method, in which an adaptive controller is connected in parallel with the action network of the ADHDP. The adaptive controller adopts an RBF neural network approximation based on Lyapunov stability analysis to ensure system stability. The simulation results show that the parallel action-network ADHDP method has adaptive control characteristics and navigates the USV more accurately and rapidly. In addition, the method eliminates the overshoot of the ADHDP controller when navigating the USV in various situations. Therefore, the adaptive stability design greatly improves the navigating control and effectively overcomes the limitation of the ADHDP algorithm. This adaptive control can thus be regarded as one of the intelligent ADHDP control methods, and it provides a foundation for developing an intelligent USV controller.


Introduction
The development of science and technology is accelerating the shift from informatization to intellectualization, and many systems now face new problems in intelligent control.
Intelligent control can also be widely used in unmanned surface vessels (USVs) because of its learning and adaptive abilities. At present, USV control research mainly focuses on path-following, path planning, formation control, and experimental studies.
Among these topics, path-following has been studied the most. In 2017, Shin et al. proposed a path-following controller for the USV based on an identified dynamic model [1]. In 2018, Qin et al. solved the path-tracking control problem of the USV with input saturation and full-state constraints [2], and Qu et al. presented an exponential path-tracking controller for the USV with external disturbances and system uncertainties [3]. These methods generally require a precise model of the USV or of the external environment, which is difficult to establish. Path planning is another major research topic. In 2017, Kim et al. optimized the USV path under environmental loads using a genetic algorithm [4]. In 2018, Lyu and Yin used a path-guided potential-field method to achieve path planning in restricted waters [5]. In 2019, Wang et al. adopted an improved grey wolf optimizer to optimize the USV trajectory [6]. This is a new intelligent method, but it still has some shortcomings.
In addition, in 2017, Klinger et al. evaluated the performance of intelligent controllers through USV experiments [7]. In 2018, Conte et al. developed a ROS-based multi-agent architecture for the USV and tested its path planning [8]. Dai et al. combined several methods to achieve UAV formation control [9]. For more detailed developments and trend analyses, see the review papers [10-12].
However, the nonlinear adaptive optimal control of the USV has not been fully investigated. It is an important task in intelligent USV control that requires further research. Adaptive optimal control of the USV usually requires solving the Hamilton-Jacobi-Bellman (HJB) equation, which is difficult to solve except in special cases such as linear systems with quadratic costs. The classical dynamic programming method also suffers from the curse of dimensionality. Fortunately, the approximate dynamic programming (ADP) method, which relies on neural network approximation, has emerged in recent years [13]. Liu et al. present the most recent developments of ADP theory and its advances in industrial control [14]. Yang et al. presented a guaranteed-cost neural tracking control for a class of uncertain nonlinear systems using ADP [15]. These studies make ADP suitable for optimal control problems that minimize a performance index. Thus, ADP can also be used to learn and adjust the USV heading according to its operating conditions.
However, in practice, relying solely on ADP cannot achieve sufficiently accurate adjustment, especially under external disturbances, whereas adaptive control can adapt to different environments. Combining the advantages of these two kinds of control allows the USV heading to be adjusted more quickly and accurately. Therefore, in this paper, the two methods are combined for USV navigation. Based on an analysis of the USV, this paper presents a parallel action-network ADHDP method, which connects an adaptive controller in parallel with the action network of the ADHDP. This parallel adaptive controller adopts an RBF neural network approximation based on Lyapunov stability analysis. The simulation results show that the parallel action-network ADHDP has adaptive control characteristics and navigates the USV more accurately and rapidly. In addition, the method eliminates the overshoot of the ADHDP controller when navigating the USV in various situations. Thus, this adaptive control can be regarded as one of the intelligent ADHDP control methods, and it will also serve as a foundation for developing an intelligent USV controller.

The ADHDP Control Method
Action-dependent heuristic dynamic programming (ADHDP) is one of the main approaches in the ADP family. The critic and action networks of the ADHDP are usually built on back-propagation (BP) neural networks [16, 17]. As shown in Figure 1, the main function of the critic network is to use the system state to approximate the cost function J(t). An accurate cost function guides the action network toward the optimal control u(t). If the output of the action network is also fed to the critic network as part of its input, a new critic network is formed.
The new critic network implicitly contains a model of the controlled object and approximates the new cost function Q(t), whose output is an estimate of the cost function at the next time step. Thus, Q(t) ≈ J(t + 1), and the ADHDP can perform control without a mathematical model of the plant.

The Problem Formulation.
The action-critic scheme of the ADHDP and its symbols are shown in Figure 2. Assume that the nonlinear system is

$$x(t+1) = F\big[x(t), u(t), t\big], \tag{1}$$

where $x(t) \in \mathbb{R}^n$ is the system state at time $t$, $u(t) \in \mathbb{R}^m$ is the control action at time $t$, and $F$ is the state transition function of the nonlinear system. For the dynamic programming algorithm to drive the system to an optimal operating state, the cost function is defined as [18]

$$J(t) = \sum_{k=t}^{\infty} \alpha^{k-t}\, c(k), \tag{2}$$

where $\alpha$ is the discount factor ($0 < \alpha \le 1$) and $c$ is the utility function at each time step. The objective is to choose the control sequence $u(k)$, $k = t, t+1, \ldots$, so that the cost function $J(t)$ defined in equation (2) is minimized [19].
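The discounted cost in equation (2) satisfies the recursion $J(t) = c(t) + \alpha J(t+1)$, which is why an estimate of the next-step cost can be trained from one-step data. A minimal sketch, assuming a finite utility sequence (the infinite sum is truncated and $J$ is taken as zero after the last sample); the function name `cost_to_go` is hypothetical.

```python
def cost_to_go(utilities, alpha):
    """Return [J(0), ..., J(T-1)] for J(t) = sum_{k>=t} alpha**(k-t) * c(k).

    The horizon is truncated at the last sample (J(T) = 0), and the sum is
    evaluated backwards with the recursion J(t) = c(t) + alpha * J(t+1).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("discount factor must satisfy 0 < alpha <= 1")
    J = [0.0] * len(utilities)
    running = 0.0
    for t in range(len(utilities) - 1, -1, -1):
        running = utilities[t] + alpha * running
        J[t] = running
    return J


if __name__ == "__main__":
    c = [1.0, 0.5, 0.25, 0.0]   # hypothetical utility samples c(k)
    print(cost_to_go(c, alpha=0.9))
```

The backward pass costs one multiply-add per sample and gives the same values as summing the discounted series directly.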

The Critic Network.
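As shown in Figure 2, the input vector to the critic network is

$$z(t) = \big[x_1(t), x_2(t), \ldots, x_n(t), u(t)\big]^T.$$

The input $q_i(t)$ to the hidden layer is calculated as

$$q_i(t) = \sum_{j=1}^{n+1} w^{(1)}_{c_{ij}}(t)\, z_j(t), \quad i = 1, \ldots, N_h,$$

where $N_h$ is the number of hidden nodes. The symbol s in the hidden layer of Figure 2 denotes the nonlinear sigmoid function $\sigma(\cdot)$; thus, the output of the hidden layer is

$$p_i(t) = \sigma\big(q_i(t)\big), \quad i = 1, \ldots, N_h.$$

The symbol / in the output layer denotes a linear function, so the output layer is

$$Q(t) = \sum_{i=1}^{N_h} w^{(2)}_{c_i}(t)\, p_i(t).$$

The ADHDP approximates the optimal solution by minimizing the new cost function; that is,

$$J^*(t) = \min_{u(t)} \big\{ c(t) + \alpha J^*(t+1) \big\}.$$

This is achieved by training the new critic network to minimize the following error measured over time:

$$e_c(t) = \alpha Q(t) - \big[Q(t-1) - c(t)\big], \qquad E_c(t) = \tfrac{1}{2}\, e_c^2(t),$$

where $w_c$ denotes the critic network weights on which $Q(t)$ depends. The output $Q(t)$ at time $t$ is an estimate of the dynamic programming cost at time $t+1$, that is, of the cost function at the next time step. Then, the gradient descent algorithm is used to update the critic network weights and minimize $E_c(t)$ at each time step:

$$w_c(t+1) = w_c(t) + \Delta w_c(t), \qquad \Delta w_c(t) = l_c(t)\left[-\frac{\partial E_c(t)}{\partial w_c(t)}\right],$$

where $l_c(t) > 0$ is the learning rate of the critic network at time $t$, which decreases to a very small value over time [20].
Advances in Materials Science and Engineering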
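A minimal sketch of the critic forward pass and one gradient step on $E_c(t) = \tfrac{1}{2}e_c^2(t)$ with $e_c(t) = \alpha Q(t) - [Q(t-1) - c(t)]$. It assumes a scalar control input (so the critic sees $n + 1$ inputs), the bipolar sigmoid $(1 - e^{-q})/(1 + e^{-q})$ in the hidden layer, and no bias terms; the source only says "sigmoid", and the class and method names are hypothetical. $Q(t-1)$ is treated as a fixed target when differentiating, as is usual in ADHDP.

```python
import math
import random


def bipolar_sigmoid(q):
    # (1 - e^-q) / (1 + e^-q), identical to tanh(q / 2); assumed activation
    return math.tanh(0.5 * q)


class Critic:
    """Hypothetical critic: z = [x_1..x_n, u] -> sigmoid hidden layer -> linear Q(t)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, z):
        q = [sum(w * zj for w, zj in zip(row, z)) for row in self.w1]  # q_i(t)
        p = [bipolar_sigmoid(qi) for qi in q]                             # p_i(t)
        Q = sum(w * pi for w, pi in zip(self.w2, p))                      # linear output
        return Q, p

    def update(self, z, Q_prev, c_t, alpha, lr):
        """One gradient step on E_c = 0.5*e_c**2, e_c = alpha*Q(t) - (Q(t-1) - c(t)).

        Returns E_c evaluated before the step.
        """
        Q, p = self.forward(z)
        e_c = alpha * Q - (Q_prev - c_t)
        dE_dQ = e_c * alpha            # Q(t-1) is held fixed as the target
        for i, pi in enumerate(p):
            # chain rule through w2[i] and tanh(q/2)' = 0.5 * (1 - p^2)
            dE_dq = dE_dQ * self.w2[i] * 0.5 * (1.0 - pi * pi)
            self.w2[i] -= lr * dE_dQ * pi
            for j, zj in enumerate(z):
                self.w1[i][j] -= lr * dE_dq * zj
        return 0.5 * e_c * e_c
```

Repeating `update` on the same sample shrinks $E_c$, which is a quick sanity check that the chain rule is wired correctly before the critic is used in a closed loop.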

The Action Network.
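As shown in Figure 2, the input to the action network is

$$x(t) = \big[x_1(t), x_2(t), \ldots, x_n(t)\big]^T.$$

The input $h_i(t)$ to the hidden layer of the action network is calculated as

$$h_i(t) = \sum_{j=1}^{n} w^{(1)}_{a_{ij}}(t)\, x_j(t), \quad i = 1, \ldots, N_h, \tag{10}$$

where $N_h$ is the number of hidden nodes. The hidden layer uses the nonlinear sigmoid function; thus, the output of the hidden layer is

$$g_i(t) = \sigma\big(h_i(t)\big), \quad i = 1, \ldots, N_h.$$

The input $v_k(t)$ to the output layer is calculated as

$$v_k(t) = \sum_{i=1}^{N_h} w^{(2)}_{a_{ki}}(t)\, g_i(t), \quad k = 1, \ldots, m.$$

The output layer also uses the nonlinear sigmoid function; thus, the output of the action network is

$$u_k(t) = \sigma\big(v_k(t)\big), \quad k = 1, \ldots, m.$$

Once the critic network has been trained, the objective of training the action network is to minimize the index

$$E_a(t) = \tfrac{1}{2}\, Q^2(t).$$

Then, the gradient descent algorithm is also used to update the action network weights by minimizing $E_a(t)$:

$$w_a(t+1) = w_a(t) + \Delta w_a(t), \qquad \Delta w_a(t) = l_a(t)\left[-\frac{\partial E_a(t)}{\partial w_a(t)}\right],$$

where $l_a(t) > 0$ is the learning rate of the action network at time $t$, which also decreases to a very small value over time [20]. Because $Q(t)$ depends on $w_a$ only through $u(t)$, the gradient is back-propagated through the critic network: $\partial E_a(t)/\partial w_a(t) = Q(t)\,\big(\partial Q(t)/\partial u(t)\big)\big(\partial u(t)/\partial w_a(t)\big)$.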
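The action-network update back-propagates $\partial E_a/\partial u = Q(t)\,\partial Q/\partial u$ through the two sigmoid layers. To keep the sketch self-contained, the trained critic is replaced by a frozen stand-in $Q = (u - 0.3)^2 + \lVert x \rVert^2$ with an analytic $\partial Q/\partial u$; this stand-in, the bipolar sigmoid, the absence of biases, and all names are assumptions for illustration, not the paper's networks.

```python
import math
import random


def bipolar_sigmoid(q):
    # (1 - e^-q) / (1 + e^-q) == tanh(q / 2); assumed activation for both layers
    return math.tanh(0.5 * q)


class Action:
    """Hypothetical action network x(t) -> u(t) with sigmoid hidden and output layers."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        h = [sum(w * xj for w, xj in zip(row, x)) for row in self.w1]  # h_i(t)
        g = [bipolar_sigmoid(hi) for hi in h]                             # g_i(t)
        v = sum(w * gi for w, gi in zip(self.w2, g))                      # v(t)
        return bipolar_sigmoid(v), g                                      # u(t)

    def update(self, x, critic_q_and_grad, lr):
        """One gradient step on E_a = 0.5 * Q^2, using dQ/du supplied by the critic."""
        u, g = self.forward(x)
        Q, dQ_du = critic_q_and_grad(x, u)
        dE_du = Q * dQ_du
        dE_dv = dE_du * 0.5 * (1.0 - u * u)          # derivative of tanh(v/2)
        for i, gi in enumerate(g):
            dE_dh = dE_dv * self.w2[i] * 0.5 * (1.0 - gi * gi)
            self.w2[i] -= lr * dE_dv * gi
            for j, xj in enumerate(x):
                self.w1[i][j] -= lr * dE_dh * xj
        return u, Q


def stand_in_critic(x, u):
    # Frozen, hypothetical critic with analytic dQ/du; in the ADHDP this role
    # is played by the trained critic network.
    Q = (u - 0.3) ** 2 + sum(xi * xi for xi in x)
    return Q, 2.0 * (u - 0.3)
```

Because this stand-in $Q$ never reaches zero, minimizing $E_a = \tfrac{1}{2}Q^2$ drives $u$ toward the minimizer of $Q$ (here 0.3), mirroring $u^*(t) = \arg\min_{u(t)} Q(t)$.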
After the optimal $Q(t)$ function is obtained, the optimal control $u^*(t)$ is given by

$$u^*(t) = \arg\min_{u(t)} Q(t).$$

That is, to obtain the optimal control output, the action network is trained to drive the cost function $Q(t)$ toward zero, which also drives the critic network output as close to zero as possible.
As shown in Figure 3, the dashed lines indicate that the error is eventually reduced to zero by computing the error at each step together with the increments of the critic and action network weights.

The Parallel Action-Network ADHDP: The Adaptive ADHDP Method
Adaptive control is added to the navigating control of the USV to ensure system stability. This is because the gradient descent algorithm used to update the ADHDP parameters cannot guarantee system stability [21], and in some cases it may also lead to a local optimum.
Therefore, a stability design based on Lyapunov theory greatly increases the stability of the navigating control and effectively overcomes this limitation of the ADHDP algorithm.
Based on the error defined in equation (18), the error and its derivative are used as the inputs to both the action network and the adaptive controller, as shown in Figure 4. Given these inputs, the adaptive control law is computed separately from the action network.
Once the two control signals u(t) are obtained, the smaller of the adaptive controller output and the action network output is selected as the input to the controlled object. This selection also helps stabilize the USV navigation system. In this way, the advantages of the ADHDP and adaptive control are combined to achieve stable USV navigation.
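The selection step between the two parallel branches reduces to a one-line rule. A minimal sketch, assuming "smaller" means smaller in magnitude (the source does not say whether sign matters); the function name `select_control` is hypothetical.

```python
def select_control(u_action, u_adaptive):
    """Pick the smaller-magnitude of the action-network and adaptive-controller outputs.

    Assumption: "smaller" is read as smaller |u|; on a tie the adaptive output is kept.
    """
    return u_action if abs(u_action) < abs(u_adaptive) else u_adaptive


if __name__ == "__main__":
    # hypothetical rudder commands from the two parallel branches
    print(select_control(0.35, -0.12))
```

A signed minimum would instead always favour the more negative command regardless of its size, so the magnitude reading is the more conservative of the two interpretations.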
The adaptive control process is as follows. Suppose that the nonlinear system is

$$\ddot{x} = f(x, \dot{x}) + g(x, \dot{x})\, u, \qquad y = x, \tag{17}$$

$$e = rin - y, \tag{18}$$

where $rin$ is the tracking navigation command, the error $e$ is the difference between the command and the system output $y$, $x$ is the system state, $u$ is the control input, and $f(x, \dot{x})$ and $g(x, \dot{x})$ are the nonlinear state functions of the USV system.
Let $E = [e \;\; \dot{e}]^T$, and design the optimal control $u^*$ as

$$u^* = \frac{1}{g(x, \dot{x})}\Big[-f(x, \dot{x}) + \ddot{rin} + K^T E\Big], \tag{19}$$

where $K$ is a gain vector. Substituting equations (18) and (19) into equation (17) and simplifying, the system error satisfies

$$\ddot{e} + k_2 \dot{e} + k_1 e = 0. \tag{20}$$

To design $K = [k_1 \;\; k_2]^T$, the roots of the characteristic polynomial $s^2 + k_2 s + k_1 = 0$ must lie in the left half of the complex plane (for this second-order polynomial, equivalently $k_1 > 0$ and $k_2 > 0$). Then, as $t \to \infty$, $e(t) \to 0$ and $\dot{e}(t) \to 0$.
For the adaptive control, an RBF neural network is used to approximate the unknown function $f$. The input to the network is $x = [e \;\; \dot{e}]^T$, and the output of the network is

$$\hat{f}(x) = \hat{W}^T h(x), \tag{21}$$

where $\hat{W}$ is the estimate of the network weights, $h_j(x) = \exp\!\left(-\dfrac{\lVert x - c_j \rVert^2}{2 b_j^2}\right)$ is the Gaussian basis function of the $j$th node with center $c_j$ and width $b_j$, and $h(x)$ is the vector of Gaussian basis functions [22]. Suppose that the symmetric positive-definite matrix $P$ satisfies the Lyapunov equation

$$\Lambda^T P + P \Lambda = -I,$$

where

$$\Lambda = \begin{bmatrix} 0 & 1 \\ -k_1 & -k_2 \end{bmatrix}$$

and $I$ is the identity matrix. After $P$ is obtained, the RBF network weights are updated by the adaptive law

$$\dot{\hat{W}} = -\gamma\, E^T P B\, h(x),$$

where $B = [0 \;\; 1]^T$ and $\gamma > 0$ is the adaptation gain. After the RBF network weights are obtained, the network output is computed from equation (21). Finally, the control signal is obtained from equation (19) with $f(x, \dot{x})$ replaced by its estimate $\hat{f}$.
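The adaptive branch can be sketched end to end: solve the Lyapunov equation for $P$ in closed form, evaluate the Gaussian basis $h(x)$, integrate the adaptive law, and apply equation (19) with $f$ replaced by $\hat{f}$ from equation (21). The plant nonlinearity $f(x, \dot{x}) = 2\sin x - 0.5\dot{x}$, the known $g = 1$, the gains $k_1 = 25$ and $k_2 = 10$, the 5×5 grid of centers with width 1, $\gamma = 200$, and forward-Euler integration are all hypothetical choices for illustration; none are taken from the paper.

```python
import math


def lyapunov_P(k1, k2):
    """Closed-form solution of Lam^T P + P Lam = -I for Lam = [[0, 1], [-k1, -k2]]."""
    # s^2 + k2*s + k1 has both roots in the left half-plane iff k1 > 0 and k2 > 0
    if k1 <= 0 or k2 <= 0:
        raise ValueError("K must make s^2 + k2*s + k1 Hurwitz (k1 > 0, k2 > 0)")
    p12 = 1.0 / (2.0 * k1)
    p22 = (1.0 + k1) / (2.0 * k1 * k2)
    p11 = k1 * p22 + k2 * p12
    return [[p11, p12], [p12, p22]]


def gaussian_basis(E, centers, width):
    """h_j(E) = exp(-||E - c_j||^2 / (2 b^2)) with one shared width b."""
    return [math.exp(-((E[0] - c0) ** 2 + (E[1] - c1) ** 2) / (2.0 * width ** 2))
            for c0, c1 in centers]


def plant_f(x, xd):
    # Hypothetical nonlinearity that the controller never evaluates directly.
    return 2.0 * math.sin(x) - 0.5 * xd


def simulate(gamma, k1=25.0, k2=10.0, t_end=20.0, dt=0.002, rin=1.0, g=1.0):
    """Euler simulation of x'' = f(x, x') + g*u, y = x, tracking a constant rin.

    Returns the final tracking error e = rin - y; gamma = 0 disables adaptation.
    """
    P = lyapunov_P(k1, k2)
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    centers = [(a, b) for a in grid for b in grid]
    W_hat = [0.0] * len(centers)
    rin_dd = 0.0                       # second derivative of a constant command
    x, xd = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        e, ed = rin - x, -xd           # E = [e, e'] (rin' = 0)
        h = gaussian_basis((e, ed), centers, 1.0)
        f_hat = sum(w * hj for w, hj in zip(W_hat, h))    # equation (21)
        u = (-f_hat + rin_dd + k1 * e + k2 * ed) / g       # equation (19), f -> f_hat
        EPB = e * P[0][1] + ed * P[1][1]                   # E^T P B with B = [0, 1]^T
        W_hat = [w - dt * gamma * EPB * hj for w, hj in zip(W_hat, h)]
        xdd = plant_f(x, xd) + g * u
        x, xd = x + dt * xd, xd + dt * xdd
    return rin - x


if __name__ == "__main__":
    print("final error without adaptation:", simulate(gamma=0.0))
    print("final error with adaptation:   ", simulate(gamma=200.0))
```

With `gamma=0.0` the unmodelled $f$ leaves a nonzero steady-state error (about $-f/k_1$ at the final state), while a positive $\gamma$ keeps adapting $\hat{W}$ until that error is removed, which is the role the Lyapunov-based update plays in the parallel branch.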

The USV Model for Navigating Control
The ADHDP relies only on system state data and does not require a model of the controlled object. For this reason, the ADHDP is also known as a data-driven control method [23], which enables online learning and control [24]. Thus, in this study, the USV model is used only as the simulated plant, not for the controller design. However, choosing a suitable model of the controlled object is still very important for the numerical experiments.
The motion model and state space of the USV used in this paper are summarized in [25]. This model, proposed by Abkowitz and other scholars, is mathematically complete and rigorous. The standard state-space form of the USV is

$$\dot{x} = A x + b\,\delta,$$

where

$$A = \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ 0 \end{bmatrix},$$

and $\delta$ is the rudder angle of the USV. Here, $m$ is the mass of the USV and $m'$ is its dimensionless mass used in the calculation, $m' = m / \left(\tfrac{1}{2}\rho L^3\right)$, where $\rho$ is the seawater density and $L$ is the length between perpendiculars; $I'_{zz}$ is the dimensionless moment of inertia, $I'_{zz} = I_{zz} / \left(\tfrac{1}{2}\rho L^5\right)$, with $I_{zz} = mL^2/16$; $V$ is the USV speed; and $S_1$ is the reference area. The other intermediate variables are given in [25]; they are computed from hull parameters including $B$, the width of the USV, $C_b$, the block coefficient, $A_\delta$, the area of the rudder blade, and $T$, the draught of the USV.

Figure 4: The scheme of the parallel action-network ADHDP method for the navigating control of the USV.
Considering the effect of the marine environment on the motion of the USV, a white-noise disturbance $\omega = [\omega_1 \;\; \omega_2 \;\; \omega_3]^T$ is added. Thus,

$$\dot{x} = A x + b\,\delta + \omega.$$
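A minimal simulation sketch of the disturbed linear model, written here as $\dot{x} = A x + b\,\delta + \omega$ with $b$ the rudder input vector, using forward Euler with the noise $\omega$ held constant over each step. The state ordering $x = [v, r, \psi]$ (sway velocity, yaw rate, heading), the numerical $a_{ij}$ and $b_i$, and the step size are hypothetical; the paper's coefficients come from Table 1 and [25].

```python
import random


def simulate_usv(delta, A, b, t_end=50.0, dt=0.01, noise_std=0.0, seed=0):
    """Forward-Euler simulation of x' = A x + b*delta + omega from x(0) = 0.

    omega is zero-mean Gaussian white noise, drawn independently for each state
    and held constant over one step (an assumed discretization). Returns x(t_end).
    """
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]                      # assumed ordering [v, r, psi]
    for _ in range(int(round(t_end / dt))):
        omega = [rng.gauss(0.0, noise_std) for _ in range(3)]
        xdot = [sum(A[i][j] * x[j] for j in range(3)) + b[i] * delta + omega[i]
                for i in range(3)]
        x = [xi + dt * di for xi, di in zip(x, xdot)]
    return x


# Hypothetical coefficients; the paper's a_ij and b_i come from Table 1 and [25].
A = [[-1.0, -0.3, 0.0],
     [-0.2, -0.5, 0.0],
     [0.0, 1.0, 0.0]]
b = [0.1, 0.2, 0.0]

if __name__ == "__main__":
    print(simulate_usv(0.1, A, b))                  # noise-free rudder step
    print(simulate_usv(0.1, A, b, noise_std=0.05))  # with white-noise disturbance
```

With `noise_std=0` and a constant rudder angle, the yaw rate settles to a constant and the heading grows linearly, which is the open-loop behavior the heading controller has to regulate.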

The Parameters of the Control System.
The utility function $c(t)$ of the adaptive ADHDP controller is designed in terms of the tracking error $e(t)$ of equation (18) at time $t$ and its time derivative $\dot{e}(t)$.
MATLAB/Simulink is used for the simulation. Because MATLAB has no ADHDP toolbox, the related model and algorithm are implemented as S-functions.
The parameters of the simulated USV model are shown in Table 1.

Simulation and Results
(1) The navigating target is set to east (90°), and the initial heading is north (0°). The simulation time is 50 s. The navigating responses of the PID and adaptive ADHDP controllers are shown in Figure 5, and a detailed comparison of the curves in Figure 5 is given in Table 2.
From Figure 5 and Table 2, it can be seen that combining the ADHDP with adaptive control shortens the adjusting time and eliminates overshoot. The figures and tables also show that the combined controller adjusts the USV heading quickly and accurately, needing only one heading change.
(2) When the USV navigates in a narrow river, the frequency and duration of heading adjustments increase significantly. To simulate this condition, the navigating target is set to east (90°) for the first 50 s, with an initial heading of north (0°). The target is then set to north (0°) at 50 s, so the USV must turn 90°; this stage runs for another 50 s. Finally, the target is set back to east (90°) at 100 s, so the USV must turn back 90°; this stage also runs for 50 s. The simulation results are shown in Figure 6. They show that the PID controller takes a long time to adjust the USV heading and has a very large overshoot. However, the navigating USV control with the … Figure 7 shows the control response of the individual ADHDP under the same continuous and sharp heading changes and compares it with that of the adaptive ADHDP. Figure 7 shows that the ADHDP controller has a larger overshoot under continuous and sharp heading changes. This is because the ADHDP relies on data-driven learning from previous experience, which makes it difficult to respond correctly within a short period. After adaptive control is added, the adaptive ADHDP avoids overshoot under abrupt changes and ensures safe navigation of the USV.
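Table 2 compares the adjusting (settling) time and overshoot of the two controllers, but the source does not state how these were measured. One common way to extract them from a sampled heading response is sketched below, assuming a ±2% settling band; the function name `step_metrics` and the band are assumptions.

```python
def step_metrics(t, y, target, y0=0.0, band=0.02):
    """Percent overshoot and settling time of a sampled step response.

    Overshoot is the peak excursion beyond `target`, as a percentage of the step
    size |target - y0|. Settling time is the first sample time after which y stays
    within +/- band*|target - y0| of the target; None if it never settles.
    """
    step = target - y0
    if step == 0:
        raise ValueError("target must differ from the initial value y0")
    direction = 1.0 if step > 0 else -1.0
    peak = max(direction * (yi - target) for yi in y)
    overshoot = max(0.0, peak) / abs(step) * 100.0
    tol = band * abs(step)
    settling_time = t[0]                     # holds if every sample is in the band
    for i in range(len(y) - 1, -1, -1):      # scan backwards for the last violation
        if abs(y[i] - target) > tol:
            settling_time = t[i + 1] if i + 1 < len(t) else None
            break
    return overshoot, settling_time


if __name__ == "__main__":
    # hypothetical heading samples (degrees) for a 0 -> 90 degree command
    t = [0, 1, 2, 3, 4, 5]
    y = [0.0, 60.0, 95.0, 91.0, 90.0, 90.0]
    print(step_metrics(t, y, target=90.0))
```

Scanning backwards finds the last sample outside the band, so a response that briefly re-enters the band and then leaves again is not counted as settled.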

Discussion and Conclusion
The ADHDP control is a data-driven method that can be used without a plant model. The simulations show that the adjusting time of the ADHDP controller is much shorter than that of the PID controller, but its overshoot is larger.
Therefore, adaptive control is added to eliminate the overshoot of the ADHDP controller under continuous and sharp heading changes.
In addition, the simulations demonstrate the feasibility of the adaptive ADHDP control method. This method also does not require a model of the controlled object, which simplifies the controller design, although it must be combined with the adaptive method.
The method can also increase the speed and stability of the controller. Thus, it can lay a foundation for the development of intelligent USVs and help further enhance intelligent USV control.

Figure 1: A new critic network established in the ADHDP.

Figure 2: The action-critic principle and structure of the ADHDP.

Figure 3: The schematic diagram of the ADHDP control method.


Figure 5: A 90° navigating control response of the USV under the PID and adaptive ADHDP controllers.

Figure 6: The PID and adaptive ADHDP controller responses to a continuous and sharp navigating process of the USV.

Figure 7: The ADHDP and adaptive ADHDP controller responses to a continuous and sharp navigating process of the USV.

Table 1: The USV parameters used in the simulation.

Table 2: Comparison of the 90° navigating control responses of the USV under the PID and adaptive ADHDP controllers.