UUV Autonomous Decision-Making Method Based on Dynamic Influence Diagram

Considering the complexity and uncertainty of decision-making in the operating environment of an unmanned underwater vehicle (UUV), this study proposes an autonomous decision-making method based on the dynamic influence diagram (DID) and expected utility theory. First, a threat assessment model is established for situation awareness of the UUV. Accordingly, a DID model is developed for autonomous decision-making of the UUV. Then, based on the threat assessment results for the UUV, the utility of each decision-making plan in the decision-making nodes is inferred and predicted. Subsequently, the principle of maximum expected utility is used to select an optimal autonomous decision-making plan. Finally, the effectiveness of the DID method is verified by simulation. Compared with traditional expert systems, the DID system is highly adaptable and solves dynamic decision problems under uncertainty more effectively.


Introduction
Unmanned underwater vehicles (UUVs) play an important role in the modern marine environment [1]. They are affected by various uncertain factors such as changes in the mission, marine environment, and threats. Therefore, UUVs should be able to perform autonomous decision-making in such complex marine environments. To achieve autonomous decision-making, a UUV estimates its own state and the state of the complex marine environment while performing tasks, determines the corresponding threat degree, and is controlled accordingly. Future marine environments are expected to become increasingly intelligent and information-rich; hence, it is crucial for UUVs to improve their intelligence levels and make decisions autonomously. Therefore, studying UUV autonomous decision-making technologies is of great significance.
Many domestic and international exploratory studies have been conducted to investigate autonomous decision-making methods. The theoretical methods are mainly categorized into expert systems, artificial neural networks, genetic algorithms, and influence diagrams (IDs). The expert system is a mature and easy-to-implement method with comprehensive expert knowledge content. This method is extensively used in practical applications [2][3][4]. However, it cannot be easily modified or improved and exhibits poor adaptability. The neural network method has strong adaptability to various environments, and with its advantages, including robustness, parallelism, and self-learning ability, it can overcome the shortcomings of the expert system method [5][6][7]. However, the neural network method requires many samples for training, making it more complex than the expert system method. In addition, it is extremely difficult to choose an appropriate network model. The introduction of the genetic algorithm to autonomous decision-making has yielded satisfactory results because it facilitates the global optimum in the entire process. However, the genetic algorithm suffers from poor real-time decision-making performance [8][9][10]. Solving the delay problem is the key to exploiting the genetic algorithm in decision-making. The ID method is a graphical representation method that solves decision-making problems on the basis of decision tree analysis by combining a Bayesian network with the decision tree model. Formally, decision-making nodes and utility nodes are added to a Bayesian network. Therefore, like Bayesian networks, the ID can represent complex relationships among variables. In addition, the integration of the utility theory of the decision tree makes the ID method more suitable than a Bayesian network for dealing with uncertain reasoning problems [11].
The ID method can be applied to various battlefield environments and exhibits strong adaptability and good decision-making performance under uncertain conditions [12,13]. However, this method requires further development. Specifically, the achievement of theoretical superiority depends on the performance of the algorithm used. Although the ID method integrates utility theory based on a Bayesian network, it cannot easily express or deal with dynamic decision-making problems.
This paper presents an autonomous decision-making method based on a dynamic influence diagram (DID) and expected utility theory for reasoning, prediction, judgment, and plan selection. The DID model with decision-making nodes and expected utility nodes is used in the process of plan selection. Then, based on the threat assessment results for the UUV, the utility of each decision-making plan in the decision-making nodes is inferred and predicted. Subsequently, the principle of maximum expected utility is used to select an optimal autonomous decision-making plan. Finally, the effectiveness of the DID is verified by simulation analysis. Compared with other expert systems, the system proposed in this paper is more appropriate for solving dynamic decision problems under uncertainty.

Mathematical Description for UUV Autonomous Decision-Making
UUV autonomous decision-making is affected by various factors in a complex marine environment. As shown in Figure 1, it is necessary for the UUV to detect the state of the marine environment as well as its own state. In addition, the UUV must perform data fusion processing, situation threat assessment, and so on before decision-making. Environment perception, sensory information preprocessing, information fusion, and situation assessment constitute the main steps of the situation awareness system of the UUV. These steps form the front end of the information flow. On the one hand, it is necessary to provide information input for subsequent decision-making and action; on the other hand, it is necessary to adjust the perceived needs of the information according to the results of the decision-making feedback. Credibility of situation awareness and integrity of situation information are the prerequisites for accurate UUV autonomous decision-making. All possible threat situations in the UUV's situation awareness system are regarded as the threat situation space S, and all possible decision-making outputs of the UUV's autonomous decision-making system are regarded as the decision-making space D. The decision-making process for the UUV is a mapping function from S to D: F: S ⟶ D. Essentially, it is a mapping from the real physical world to the world of human understanding. It reflects the understanding and decision-making ability of decision makers for the current environmental situation.
There are two types of nonphysical inputs in the decision-making function F: the personal attitudes of the decision makers (expert experience and personal preferences) and information from mission tasks. A constraint set R is defined to represent the decision-making constraints added to the decision-making function F, i.e., the task rules to be followed, the current situation of the UUV itself, and so on. Therefore, obtaining the decision-making result d requires knowledge of the environmental threat situation s(t) at the current time t, the mission S_g(t), and the constraint relationship set R under the corresponding environmental situation, as shown in the following formula:

d = F_CI(s(t), S_g(t), R).    (1)

A large amount of realistic data and decision-making situations are expressed by mathematical relations using the function F_CI(s(t), S_g(t), R). The abovementioned formula involves many elements, such as the state of the system at the initial moment S_c(t_0), the desired state S_CI, the overall goal S_g, certain rules R that need to be observed in the execution of tasks, and the relationships between various resources. In the process of completing the entire task, S_c(t_0) constantly tends toward S_g.
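The mapping F: S ⟶ D under a constraint set R can be sketched in code. The situation fields, threat threshold, and selection logic below are illustrative assumptions, not the paper's actual parameterization; only the four decision labels come from the text.

```python
# Sketch of the decision mapping d = F(s(t), S_g(t), R).
# Field names and the 0.3 threshold are illustrative assumptions.

def decision_mapping(situation, mission, constraints):
    """Map a threat situation s(t), mission S_g(t), and rule set R to a decision d."""
    candidates = ["continue", "emergency_dump", "replanning", "self_destruct"]
    # The constraint set R removes plans that violate task rules or the UUV's state.
    allowed = [d for d in candidates if d not in constraints.get("forbidden", [])]
    p_danger = situation["p_danger"]      # P(TL = dangerous) from situation awareness
    actuator_ok = situation["actuator_ok"]
    if p_danger < 0.3 and actuator_ok and "continue" in allowed:
        return "continue"
    if not actuator_ok and "emergency_dump" in allowed:
        return "emergency_dump"           # surface and await recovery
    return allowed[-1] if allowed else "self_destruct"

d = decision_mapping({"p_danger": 0.77, "actuator_ok": False},
                     mission="survey_area_B",
                     constraints={"forbidden": ["self_destruct"]})
print(d)  # emergency_dump
```

The constraint filtering step mirrors the role of R in formula (1): it narrows the decision space D before the situation-dependent selection is applied.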
In the study of situation space, the values obtained by adding two environmental situations are not statistically meaningful. However, it is sometimes necessary to represent the difference between two environmental situations indirectly by calculating the difference between them. This difference is the situation increment δ_S_{1⟶2}, which mainly represents the decision-making content required to move from one environment to another. The relationship is as follows:

δ_S_{1⟶2} = S_2 − S_1.    (2)

The most important part of the decision-making system is the decision-making content that can change the environment significantly. The essence of decision-making is to transform the old environmental situation into a new one. Then, a decision-making realization function F_r is introduced to represent the realization of decision-making. The specific formula is as follows:

S_2 = F_r(S_1, d) = S_1 + δ_S_{1⟶2},  d = F(S_2, S_g, R).    (3)

The abovementioned formula implies that when decision-making is applied to the environment, it affects the state and forms a new situation. Finally, a new environment that differs from the original one is obtained. In the second half of the formula, a feedback system is formed by closely combining the situation space and decision-making so that the system can be solved efficiently and effectively. The goal of decision-making is to find a corresponding solution for each existing situation, optimize the system continuously, and gradually convert the situation into the desired result.

Theory of Dynamic Influence Diagram and Model
Composition. An ID is a loop-free directed graph composed of a node set N and an arc set A, where N is divided into three subsets: a unique-valued node representing the decisionmaking target, a set of chance nodes representing certain or uncertain variables, and a set of decision-making points that represent alternatives. Furthermore, A is divided into two subsets: an associated arc set between the chance nodes and an information arc set pointing to the decision-making nodes [14].
A DID is a directed acyclic graph that can be defined as a two-tuple 〈B_t, B_ts〉, where B_t represents the influence diagram at time t and B_ts = (E_t, V_ts), where E_t denotes the directed edges connecting B_t and B_{t−1} and V_ts denotes the set of vertices connected by E_t. Furthermore, V = (D, X, O, U) is the node set of the DID, where D represents a set of decision-making variables, X represents a set of random variables, O represents a set of observed-value variables, and U represents a set of utility nodes. Suppose that X = {X_1, X_2, . . . , X_n} is the set of state random variables of the system at time t, O = {O_1, O_2, . . . , O_n} is the set of observed random variables corresponding to the states, D = {D_1, D_2, . . . , D_n} is the set of decision-making node variables, and U = {U_1, U_2, . . . , U_m} is the set of utility nodes corresponding to the behavior set in each decision-making node. A DID model consists of three parts: a structural strategy model, a probability model, and a utility model [15][16][17][18]. A DID of a single agent is shown in Figure 2.
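The two-tuple 〈B_t, B_ts〉 with node set V = (D, X, O, U) can be represented by a small data structure; this is a minimal sketch, and the node names used below are placeholders rather than the paper's model.

```python
from dataclasses import dataclass, field

# Minimal container for one DID time slice B_t: the node sets V = (D, X, O, U)
# plus the inter-slice edges E_t that connect nodes in B_{t-1} to nodes in B_t.
@dataclass
class DIDSlice:
    t: int
    X: list  # state random variables X_1..X_n
    O: list  # observation variables O_1..O_n
    D: list  # decision-making variables
    U: list  # utility nodes
    inter_slice_edges: list = field(default_factory=list)  # E_t: (node@t-1, node@t)

slice0 = DIDSlice(0, X=["env", "condition", "security"], O=["sensors"],
                  D=["DE"], U=["UT"])
slice1 = DIDSlice(1, X=["env", "condition", "security"], O=["sensors"],
                  D=["DE"], U=["UT"],
                  inter_slice_edges=[("env", "env"), ("security", "security")])
```

The inter-slice edges encode the temporal dependence that distinguishes a DID from a static ID: a state variable at time t is conditioned on its counterpart at time t−1.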

DID Model of UUV Autonomous Decision-Making.
Before autonomous decision-making, situation threat assessment should be conducted for the UUV, and the decision-making behavior is selected based on the threat assessment results. A dynamic Bayesian network is used for threat evaluation of the UUV. Specifically, a threat assessment model is established for the UUV. The model is divided into three submodels: the threat assessment dynamic model for the underwater environment, the threat assessment dynamic model for the UUV's working condition, and the threat assessment model for the UUV's security status. These submodels are shown in Figures 3-5, respectively. Table 1 shows the state set of all the nodes in the UUV's threat assessment model. There is no dynamic variable among the threat factors of the UUV's working condition; thus, the threat assessment dynamic model and the threat assessment static model of the UUV's working condition are consistent.
Based on the above threat assessment analysis, from the perspective of autonomous decision-making, a DID model is constructed according to the security status, working conditions, and underwater environment situations of the UUV, as shown in Figure 6. In the decision-making process of the DID, the UUV selects the behavior with the greatest expected utility through reasoning and utility calculation of the threat assessment. The DID model of the UUV's autonomous decision-making includes four random nodes (underwater environment, the UUV's working condition, the UUV's security status, and threat level), an autonomous decision-making node, and a utility node. The factors related to autonomous decision-making and the relationships between them can be clearly expressed by the DID. Different decision-making effects can be distinguished by choosing different autonomous decision-making plans. As autonomous decision-making is a dynamic process, the UUV is required to continuously collect new information and make decisions to achieve maximum utility according to the set or existing marine environment information. The time range of the model covers the entire task process.

Method for Calculating Expected Utility of DID Decision-Making
DID decision-making aims to find a decision-making behavior that maximizes the expected utility. The calculation of the expected utility of the DID consists of two parts: the calculation of the probability model and the calculation of the local utility. As the state, behavior, and utility spaces of the system are multidimensional, approximate calculation is necessary.

Approximate Calculation of Probability Model.
Suppose that the probability model of the DID satisfies the first-order Markov hypothesis. The transition probability of the set of state variables P(X_t | X_{t−1}, D_{t−1}) and the observation probability P(O_t | X_t) are given. The probability distribution of the set of state variables is propagated through the transition model; the prior probability distribution of the state variables at time t is

P(X_t | O_{1:t−1}, D_{1:t−1}) = Σ_{X_{t−1}} P(X_t | X_{t−1}, D_{t−1}) P(X_{t−1} | O_{1:t−1}, D_{1:t−2}).

Given the observed variable set O_t and the decision-making variable set D_t, the posterior probability distribution of the state variable set is

P(X_t | O_{1:t}, D_{1:t}) ∝ P(O_t | X_t) P(X_t | O_{1:t−1}, D_{1:t−1}).

The junction tree corresponding to the DID is defined as a two-tuple T = (Γ, Δ), where Γ is the set of clique nodes; the cliques in Γ are connected through the separator nodes in Δ. For any pair of adjacent cliques C_i ∈ Γ and C_j ∈ Γ, S_k ∈ Δ is a separator between C_i and C_j, i.e., S_k = C_i ∩ C_j and X_{S_k} = X_{C_i} ∩ X_{C_j}. By hierarchically decomposing the DID and introducing the separator conditions [19,20], an approximate probability distribution of the state variables can be obtained. The probability model of the DID is derived using formula (7), and the approximate joint probability distribution of the probability model in each time slice is calculated. For a given policy rule δ_t, the expected utility at time t follows.
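The predict-update cycle of the first-order Markov probability model can be sketched for a small discrete state space. The two-state space and all probability values below are illustrative assumptions.

```python
# One predict-update step of the first-order Markov probability model:
#   prior:     P(X_t) = sum_{x'} P(X_t | X_{t-1}=x', D_{t-1}) P(X_{t-1}=x')
#   posterior: P(X_t | O_t) proportional to P(O_t | X_t) P(X_t)

def predict(belief, transition):
    """belief: P(X_{t-1}); transition[i][j] = P(X_t=j | X_{t-1}=i, D_{t-1})."""
    n = len(transition[0])
    return [sum(belief[i] * transition[i][j] for i in range(len(belief)))
            for j in range(n)]

def update(prior, likelihood):
    """likelihood[j] = P(O_t | X_t=j); returns the normalized posterior."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

belief = [0.8, 0.2]              # P(X = safe), P(X = dangerous) at time t-1
T = [[0.9, 0.1], [0.3, 0.7]]     # transition model under the chosen decision
prior = predict(belief, T)       # [0.78, 0.22]
post = update(prior, [0.2, 0.9]) # observation evidence favors "dangerous"
```

Iterating this step over time slices is exactly the propagation of the state distribution through the transition model described above; the junction tree construction serves to keep this computation tractable when X_t is a set of coupled variables rather than a single node.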

Approximate Calculation of Local Utility.
The utility function of the utility node U_t^i at time t can be expressed as a function of the expected values of its parent variables, where the value set of the decision-making node D_t^i is d_i = {a_i^1, . . . , a_i^s} and x_{k,j} represents the expected value of the variable Z_t^{i,k} given the parent node set Pa(Z_t^{i,k}) of the decision-making node with D_t^i = a_i^j. Then, when D_t^i = a_i^j, the utility function of U_t^i can be expressed in terms of these expected values x_{k,j}.

Problem Analysis of UUV Autonomous Decision-Making.
Experts in the UUV field can draw on their experience to form a practical autonomous decision-making set and store it in the knowledge base in the form of discrete events. Autonomous decision-making for the UUV runs through the entire mission process, including navigation and operation tasks. When a UUV performing a mission detects a threat, the autonomous decision-making system can make one of the following four decisions according to the underwater environment, security, and working condition of the platform: continue to execute, emergency dump, replanning, and self-destruct. Under "continue to execute," the UUV proceeds according to the scheduled tasks. Under "emergency dump," the UUV considers itself unable to accomplish the task; if the situation is safe, it performs an emergency dump, i.e., surfaces and awaits rescue. "Replanning" is based on multifaceted sensing information indicating that the UUV cannot complete the current task; power-off and restart measures are adopted to achieve replanning. "Self-destruct" is based on comprehensive information: considering that the UUV is under a serious threat and unable to complete or replan its tasks, self-destruct is initiated to prevent capture.

Simulation Analysis of UUV Autonomous Decision-Making.
The task graph for the UUV is shown in Figure 7. The UUV starts from the starting point and enters the free sea area. Mission areas A and B are in the dangerous sea area. After completing the mission, the UUV returns to the end point. Suppose that the UUV successfully completes the survey of the sea area in mission area A. The survey follows a comb-shaped trajectory along the east-west direction. At a point on the planned trajectory in mission area B (the dot on the track), the UUV detects an actuator fault. At this point, the UUV has been working underwater for 10 h continuously. The remaining power capacity is 30%, the current is weak, the seawater density is moderate, and there is no other abnormal situation. The UUV is far from minefields and underwater acoustic stations and close to the mother ship. In the complex marine environment, autonomous decision-making for the UUV is affected by many factors. Before independent decision-making, it is necessary to grasp the current environmental situation and working conditions of the UUV and then perform probability estimation (Figures 3-5) based on the dynamic Bayesian model. The conditional probability or posterior probability value of each node at this time is shown in Tables 2-4. The input information obtained from the simulation conditions is shown in Table 5. Based on the threat situation estimation of the Bayesian network, the current UUV marine environment situation, working conditions, and security status are shown in Table 6. The conditional probabilities and transition probabilities in the experiment are the parameters of the threat assessment dynamic Bayesian network model for the UUV, which were determined on the basis of expert knowledge. Although the conditional probabilities and transition probabilities determined by expert knowledge are subjective, this approach makes it easy to specify the constraint relationships between the nodes in the network.
Table 2 shows the conditional probabilities of the threat assessment model for the underwater environment. For example, in Table 2, when UOD = {0-50 m}, UOMT = {dynamic}, OD = {violent}, and OC = {high}, according to expert knowledge, the threat degree of the underwater environment was determined as UETL = {safe, dangerous} = {0.2, 0.8}, where 0.2 represents the probability that the underwater environment is safe and 0.8 represents the probability that it is dangerous. In Table 2, when UOD = {50-130 m}, UOMT = {dynamic}, OD = {violent}, and OC = {high}, the threat degree was determined as UETL = {safe, dangerous} = {0.5, 0.5}, with the distance of the underwater target changing from {0-50 m} to {50-130 m}. This indicates that the farther the target is from the UUV, the greater the probability that the underwater environment is safe.
In the simulation experiment, the perceptive information of the current situation was regarded as the input evidence of the UUV threat assessment dynamic Bayesian network model (Figures 3-5). Subsequently, the model evidence was updated, and the UUV marine environment situation as well as the working status and safety status (Table 6) were obtained through the inference of the dynamic Bayesian network. The autonomous decision-making set includes continue to execute, emergency dump, replanning, and self-destruct. In this paper, TL represents the threat level, UETL represents the underwater environment, PWC represents the UUV's working condition, PS represents the UUV's safety condition, DE represents autonomous decision-making, and UT represents utility. There are four possible behaviors for the autonomous decision-making node: DE = 0 for "continue to execute," DE = 1 for "emergency dump," DE = 2 for "replanning," and DE = 3 for "self-destruct." The utility nodes are related to task completion and the UUV's loss conditions. The value of a utility node is determined by the UUV expert or operator according to the expected goal of the mission, i.e., the operator first gives the utility value for completing the task and the UUV's loss utility value. These utility values reflect the purpose and value of the mission as well as the operator's attitude toward mission risk and personal subjective factors. Because the node states and decision-making behaviors are discrete, the utility values are described in a table (similar to a CPT). According to the task situation and based on the expert's recommendations, the mission utility value U(d, o) is given in Table 7. The formula for calculating the expected utility is as follows:

EU(DE = i) = Σ_j p_j(OP_j) U(DE = i, OP_j),    (12)

where EU(DE = i) represents the expected utility value of the i-th decision, p_j(OP_j) represents the probability of the j-th outcome, and U(DE = i, OP_j) represents the initial utility value of the i-th decision under the j-th outcome.
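The expected-utility computation of formula (12) followed by the maximum-expected-utility selection can be sketched directly. The outcome probabilities below reuse the threat level reported in the simulation, but the utility table is a made-up placeholder, not the values of Table 7.

```python
# Expected utility per formula (12): EU(DE=i) = sum_j p_j * U(DE=i, OP_j),
# then select the decision with maximum EU. The utility table is illustrative.

def expected_utility(p_outcomes, utilities):
    return sum(p * u for p, u in zip(p_outcomes, utilities))

p = [0.2262, 0.7738]                 # P(TL = safe), P(TL = dangerous)
U = {                                # U(DE, OP): utility per decision/outcome
    "continue":       [10.0, -8.0],
    "emergency_dump": [-2.0, -0.5],
    "replanning":     [4.0, -6.0],
    "self_destruct":  [-9.0, -3.0],
}
eu = {d: expected_utility(p, u) for d, u in U.items()}
best = max(eu, key=eu.get)           # maximum-expected-utility decision
```

With these placeholder utilities, "emergency dump" wins because its utility is only mildly negative in both outcomes, whereas "continue" is severely penalized in the dangerous case; this is the same trade-off structure that drives the result reported below.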
From the DID model and Table 6, formula (12) gives the expected utility of each decision-making plan. (Table 3: partial conditional probabilities for the UUV working status nodes; Table 4: partial conditional probabilities for the UUV security status nodes.)
The maximum expected utility is −1.027, which corresponds to the autonomous decision-making result of "emergency dump." The same result is obtained by applying the autonomous decision-making criterion of maximum expected utility. Thus, the UUV is in a relatively safe situation. After the emergency dump, the UUV can be recovered via satellite communication, radio communication, etc., and it can then be returned to the factory for repair.
In the simulation, the conditions under which the UUV makes the "emergency dump" decision are as follows: the steering gear fails, the energy is insufficient, other detection information shows no other abnormal conditions, and the UUV is far from the minefield and underwater acoustic station and close to the mother ship. Under these conditions, an expert or operator would make the same decision. If "continue to execute" were selected, the task could not be continued because of the steering gear failure and insufficient energy. If "replanning" were selected, replanning could not be performed unless the steering gear failure were resolved. Since the current environmental situation is safe and there is no interception, the "self-destruct" decision should not be chosen. The trigger conditions of the decision-making schemes are shown in Table 8.
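The trigger-condition logic of the decision-making schemes can be sketched as a sequence of checks. The thresholds (0.7 for a sufficiently safe situation and sufficient energy, 0.5 as a secondary boundary) follow the quantitative indices that survive from Table 8, but the complete rule set below is an assumption for illustration.

```python
# Trigger-condition check in the spirit of Table 8.
# The full mapping of conditions to decisions is an illustrative assumption.

def triggered_decision(p_tl_safe, p_em_enough, actuator_ok, intercepted):
    if p_tl_safe >= 0.7 and p_em_enough >= 0.7 and actuator_ok:
        return "continue to execute"
    if intercepted:
        return "self-destruct"       # serious threat, cannot complete or replan
    if actuator_ok and p_em_enough >= 0.5:
        return "replanning"          # power-cycle and replan the mission
    return "emergency dump"          # safe situation: surface and await rescue

print(triggered_decision(0.8, 0.3, False, False))  # emergency dump
```

The sample call reproduces the simulated scenario qualitatively: the environment is safe but the actuator has failed and the energy margin is low, so only "emergency dump" is triggered.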
It can be concluded from the above analysis that when the conditions are met (combined with the abovementioned evaluation results), the decision-making result of "emergency dump" is triggered. Therefore, the UUV completes the subsequent operations according to this decision-making result.

UUV Autonomous Decision-Making Analysis Based on the Expert System.
The block diagram of the UUV expert-based autonomous decision system is shown in Figure 8.
As shown in Figure 8, the expert system includes a knowledge rule base, an inference engine, and an interpreter. The knowledge rule base uses production rules to represent the empirical knowledge of domain experts. The basic format of a production rule is as follows: IF (Condition 1) AND (Condition 2) ... AND (Condition n), THEN (Conclusion). The inference engine arrives at conclusions according to the rules and drives the control based on the situation assessment results. If the premise of a rule is true, the rule is selected, the inference is performed, and the associated conclusions are drawn. The interpreter displays the inference results for the decision makers and provides a basis for the decision information.
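The production-rule format IF (Condition 1) AND ... AND (Condition n) THEN (Conclusion) can be sketched as a minimal inference engine. The rules and facts below are illustrative placeholders, not the paper's knowledge base.

```python
# Minimal sketch of the production-rule inference engine:
# each rule is (set of premise conditions, conclusion). Rules are illustrative.

RULES = [
    ({"abnormal_state", "energy_low"}, "emergency dump"),
    ({"abnormal_state", "energy_ok", "actuator_ok"}, "replanning"),
    ({"no_abnormal", "energy_ok"}, "continue to execute"),
]

def infer(facts):
    """Fire the first rule whose premises all hold (the inference engine)."""
    for premises, conclusion in RULES:
        if premises <= facts:        # every condition in the premise is true
            return conclusion
    return "no rule fired"

print(infer({"abnormal_state", "energy_low", "far_from_minefield"}))
# emergency dump
```

Because the rule base is fixed at design time, a situation outside the enumerated premises falls through to "no rule fired"; this is precisely the adaptability limitation of the expert system discussed in the comparison below.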
In the simulation experiments, the UUV senses the situation information and then obtains the evaluation results through the situation threat assessment module. As shown in Table 5, the UUV threat level is P(TL) = {safe, dangerous} = {0.2262, 0.7738}. Indeed, the UUV's safety is threatened by the shortage of energy, so the surrounding marine environment situation is assessed to activate the rules and draw conclusions. The input conditions and output results of the simulation are shown in Tables 9 and 10. The simulation is carried out from time t to time t + 5, where the DID system and an expert system are used for dynamic decision analysis, respectively. The decision inputs and outputs are shown in Table 11. The decision progress is shown in Figure 9. It can be seen from Tables 9 and 10 that an expert system can also be used to obtain corresponding decision results. As shown in Table 11, the decision outputs from the DID and expert systems are the same when the energy balance is sufficient. When the energy balance is insufficient, the DID decision output is to "continue to execute," while the expert system decision output is to "stop missions," "emergency dump," and "wait for rescue." Only when the energy balance is seriously insufficient does the DID decision output become "stop missions," "emergency dump," and "wait for rescue." Since the rule base of the expert system is predefined, the expert system rules cannot adapt to dynamic changes in the environmental situation. Thus, for dynamic decisions, the expert system is not as good as the DID decision system.
As autonomous decision-making is a process that changes with the state of the marine environment, it is necessary to evaluate the state of the marine environment in real time and make the most favorable decision in order to achieve the maximum expected utility. Moreover, the state information in the DID must be updated in real time to accurately express the current state of the marine environment so that the optimal autonomous decision-making plan can be selected. Using the DID model to express decision-making problems, we can visualize the conditional independence and correlation between random variables in the probability model. Only the quantitative information of the situation estimation output needs to be considered in the calculation. The simulation results demonstrate that solving the UUV autonomous decision problem using dynamic influence diagrams can be more appropriate than using an expert system.

Table 8 (trigger conditions, partially recovered): continue to execute when the system senses an abnormal state or receives a task update instruction and has a sufficient energy margin (0.7 ≤ P(TL = safe) ≤ 1 and 0.7 ≤ P(EM = enough) ≤ 1); emergency dump when the system senses an abnormal state or receives the "emergency ascent" instruction, or the energy margin is too low (0.5 ≤ P(TL = safe)); self-destruct when the system senses an abnormal state or receives the "self-destruct" instruction.

Conclusions
In the future, the marine environment will become increasingly intelligent and information-rich. Therefore, it is crucial to improve the intelligence level of UUVs so that they can make decisions independently. Accordingly, studying autonomous decision-making technologies for UUVs is of great significance. In this paper, we solve the dynamic autonomous decision-making problem of UUVs on the basis of the dynamic influence diagram and utility theory. The dynamic influence diagram can represent complicated relationships between multiple variables. By combining the utility theory of the decision tree, the problem of inference under uncertainty can be solved, which is beneficial for achieving the optimal dynamic decision-making plan for UUVs. Compared with expert systems, which have poor adaptability to dynamic decision-making, the proposed DID method is highly adaptable and more appropriate for UUV dynamic decision-making.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.