Bayesian-Based Search Decision Framework and Search Strategy Analysis in Probabilistic Search

In this paper, a sequential decision framework based on Bayesian search is proposed for the problem of using an autonomous system to search for a missing target in an unknown environment. In this task, search cost and search efficiency are two competing requirements. In practice, the sensor carried by the searcher is imperfect, so an effective search strategy is needed to guide the search agent. The decision-making method is equally crucial. If the search agent fully trusts the sensor feedback, the search ends the first time the target is "detected," which means it risks finding a wrong target. Conversely, if the search agent does not trust the sensor feedback, it will most likely miss the real target, wasting search resources and time. Building on existing work, this paper proposes two search strategies and an improved algorithm. Compared with other search methods, the proposed strategies greatly improve the efficiency of unmanned search. Finally, numerical simulations are provided to demonstrate the effectiveness of the search strategies.


Introduction
Unmanned search and rescue is a highly autonomous task, and there are many instances of such spatial search problems [1][2][3], such as resource exploration, sea fishing, border patrol, fugitive search, and troubleshooting. Integrated high-efficiency mobile processor platforms, effective sensors, and data fusion algorithms make these highly autonomous tasks possible [4][5][6]. In these tasks, probabilistic information is often used to describe the likelihood that the target is in a given location. However, because of limited sensor accuracy and complex external interference, search agents cannot always obtain correct information; although the search agent can update its estimate of the target location by collecting and processing incomplete observations, an appropriate search strategy is still needed to decide when and where to detect [7,8]. Moreover, time is one of the key factors in search tasks, especially in rescue tasks or disaster management. As time passes and under uncertain external interference, the position of the target becomes more and more uncertain, which greatly increases the difficulty of the task. Therefore, there is an urgent need for a general framework that can integrate the probabilistic characteristics of the search area and deal with erroneous observations. To address this problem, this paper proposes a Bayesian-based search decision framework and two adaptive strategies that guide search agents to find a static target in an unknown place as soon as possible [9,10]. A brief summary of prior literature follows. Classical search theory was introduced by Koopman during World War II [11] and focused on using aircraft and warships to find enemy submarines in the shortest possible time. Search theory was subsequently generalized extensively by Stone [12].
In recent years, many researchers have treated search as a decision-making problem rather than an information collection task [13][14][15][16][17][18]. In decision theory, the search problem is considered as a decision between the current state of knowledge and the hypothesis of the decision-maker. Focusing on how to manage a mobile agent to search for and track multiple static targets, a perception-based decision method for static objects was developed in [19]; although this method can guarantee tracking the state of the target in a short time, it lacks an analysis of decision evolution. To compare the impact of different strategies on the search process, a Bayesian-based search framework was proposed in [20], which provides a platform for comparing search methods. In addition, inspired by Thrun et al., the probabilistic approach [21] has arisen in the robotics community. The core of probabilistic robotics is the idea of estimating states from sensor data, and the probability mass function (PMF) is used in [22] to represent the search agent's understanding of the environment.
Many early studies, including [23], assume that the sensor has no false positives. Although some scholars have addressed false positives and false negatives in the search task [24,25], they all assume that the cells in the search area are independent; this assumption makes it impossible to integrate relevant search information into the search plan in time [26]. For the scenario of using a drone cluster to find targets in a hazardous environment, a collaborative search strategy for drones was proposed in [27], which instructs searchers to move gradually from one unit to the next to ensure that the search area is covered. The influence of heuristic information on search agents was studied in [28]; Lanillos et al. compared search strategies with and without heuristic information. The results show that the search strategy with heuristic information can effectively prevent the search agent from becoming trapped at a locally optimal position. Some novel search strategies (i.e., random jump search, snapshot search, and drosophila-inspired search) were proposed and discussed in [22], but the analysis of these strategies imposes no motion restrictions on the search agent. Furthermore, search strategies were divided into two categories in [29]. Nonadaptive search strategies do not reoptimize the search path and consider only collecting information. Adaptive search strategies update the search path using feedback from the current search information, which greatly improves search performance. To prevent collisions between robots during the search, a new distributed covering method for mobile deformable convex regions was proposed in [30]. The concept of the minimum expected time was proposed in [31]; it indicates the time required to complete the search task.
Although there have been many notable achievements in search theory [32,33], there is still room for improvement; for example, the search plan was optimized using the cumulative detection probability in [23], but the sensor's false positive error was not considered in the search process. In addition, when the search agent needs to check places far from its current position, it needs a path planning algorithm to guide it, such as Dijkstra's algorithm [34], the A* algorithm [35], or the rapidly exploring random tree algorithm [36].
Contribution of this paper: based on the previous work, a sequential decision-making search framework and two adaptive search strategies are proposed in this paper. The main difficulty is that the search agent can only move a fixed distance at a time and the sensor is not perfect. Compared with other works, the main contributions of this article are concentrated in the following four aspects:
(i) In the search process, not only the movement ability of the search agent but also the various errors of the sensor are considered.
(ii) The evolution expression of the sequential decision is derived, and a Bayesian-based search decision framework is proposed to deal with the incomplete information detected by the search agent.
(iii) The evolution of the search decision is analyzed quantitatively from its mathematical expression, two key factors affecting the decision are obtained, and two effective adaptive search strategies are proposed according to the characteristics of these two factors.
(iv) A repeat detection mechanism is proposed to deal with imperfect sensor observations, which saves search resources to a certain extent and prevents search agents from becoming trapped at a locally optimal position.
Organization: the remainder of this paper is organized as follows. In Section 2, the background of the search problem is introduced and a Bayesian-based search decision framework is proposed. Through the analysis of decision evolution, two effective adaptive search strategies are proposed and analyzed in Section 3. The numerical simulation results are presented in Section 4. Section 5 concludes the paper with closing remarks and avenues of future research.

Problem Formulation
In this section, the preliminary knowledge of the search problem and the search decision-making framework are presented. In the search decision-making framework, the uncertain state of the target is expressed as a PMF; the search agent combines new information with the prior information in the form of probabilities and updates its knowledge state with Bayes' rule to form a new posterior PMF.

Search Area.
Consider an immobile object $x_T$ lost in region $A$; the search area $A$ can be divided into $|A|$ disjoint grid cells. Figure 1 shows the grid division of a square area. It is important to note that $x_T$ lies inside a grid cell and not on a cell boundary. $x_T \in A$ means that the target $x_T$ exists in the area $A$; conversely, $x_T \notin A$ indicates that the target is not in the area. Hence, we can use a Bernoulli random variable $H$ to indicate whether the target $x_T$ is really in the region $A$:

$$H = \begin{cases} 1, & x_T \in A, \\ 0, & x_T \notin A. \end{cases} \tag{1}$$

Furthermore, the variable $x_{a^{th}}$ indicates whether the target is in the $a^{th}$ cell. If $x_{a^{th}} = 1$, the target $x_T$ is in the $a^{th}$ cell; on the contrary, $x_{a^{th}} = 0$ means that the target is not there:

$$x_{a^{th}} = \begin{cases} 1, & \text{the target is in the } a^{th} \text{ cell}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

In the search process, the information detected by the sensor may be incorrect because of false positive or false negative errors, and $d_a(t) \in \{0, 1\}$ represents the detection result of the search agent in the $a^{th}$ grid cell at time $t$; for convenience, $a^{th}$ is abbreviated as $a$.
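Therefore, imperfect sensors can be modeled as follows:

$$\begin{aligned} \Pr(d_a(t) = 1 \mid x_a = 1) &= 1 - \beta, & \Pr(d_a(t) = 0 \mid x_a = 1) &= \beta, \\ \Pr(d_a(t) = 1 \mid x_a = 0) &= \alpha, & \Pr(d_a(t) = 0 \mid x_a = 0) &= 1 - \alpha, \end{aligned} \tag{3}$$

where the error probabilities $\alpha$ (false positive) and $\beta$ (false negative) quantify the imperfect sensing capabilities and can be determined by experiment or from sensor specifications. Note that the condition $\alpha + \beta \le 1$ must hold here; otherwise, the search agent will not be able to get valid information.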
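The imperfect sensor described above can be simulated in a few lines. The following is a minimal sketch, assuming that α is the false positive probability and β the false negative probability (as stated in the simulation settings); the function names `detect` and `empirical_rates` are illustrative and not taken from the paper.

```python
import random

def detect(target_in_cell: bool, alpha: float, beta: float, rng: random.Random) -> int:
    """Return one imperfect binary reading d_a(t) for the inspected cell."""
    if target_in_cell:
        # Target present: it is missed with probability beta (false negative).
        return 0 if rng.random() < beta else 1
    # Target absent: it is reported anyway with probability alpha (false positive).
    return 1 if rng.random() < alpha else 0

def empirical_rates(alpha: float, beta: float, trials: int = 20000, seed: int = 0):
    """Estimate the false positive and false negative rates by sampling."""
    rng = random.Random(seed)
    false_pos = sum(detect(False, alpha, beta, rng) for _ in range(trials)) / trials
    false_neg = sum(1 - detect(True, alpha, beta, rng) for _ in range(trials)) / trials
    return false_pos, false_neg
```

Sampling many readings from an empty cell and from the target cell should recover rates close to α and β, which is a quick sanity check before the sensor is plugged into a search loop.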

Search Agent Motion Model.
Because of the limited speed of the search agent and its imperfect sensing, cells can only be inspected one at a time. Figure 2 shows a search graph in which each cell is connected to all adjacent cells, and the search agent can move between two connected vertices. When the search agent determines that a certain cell is a possible target location, it uses Dijkstra's algorithm to build the shortest path to that cell.
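The path-building step can be sketched as follows. This is a minimal illustration of Dijkstra's algorithm on the search graph, assuming an obstacle-free grid, 8-connectivity (each cell linked to its horizontal, vertical, and diagonal neighbors), unit cost per move, and 0-based cell indices; these choices and the names `neighbors` and `shortest_path` are assumptions made for the example.

```python
import heapq

def neighbors(cell, rows, cols):
    """Cells reachable in one move on an 8-connected grid (assumed connectivity)."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield (r + dr, c + dc)

def shortest_path(start, goal, rows, cols):
    """Dijkstra with unit move cost; returns the cells after `start` up to `goal`."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry, a shorter route was already found
        for nxt in neighbors(cell, rows, cols):
            nd = d + 1
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = cell
                heapq.heappush(heap, (nd, nxt))
    # Walk back from the goal to recover the path in forward order.
    path = []
    cell = goal
    while cell != start:
        path.append(cell)
        cell = prev[cell]
    return path[::-1]
```

With unit costs a breadth-first search would give the same result; Dijkstra's algorithm is kept here because it is the method named in the text and it extends directly to non-uniform move costs.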

Bayesian Update of the Belief Map.
The Bayesian approach provides an effective way to maintain and update all the quantitative and qualitative information related to the search [23]. In the search task, the search agent assigns a probability value to each location where the target may appear. Then, the search agent collects a sequence of observations $D(t) = \{d_1, \ldots, d_t\}$; from this imperfect detection information, it gains a better understanding of the true state of the target. The aggregate belief $B(t)$ is defined as

$$B(t) = \Pr(H = 1 \mid D(t)) = \sum_{a \in A} \Pr(x_T = a \mid D(t)). \tag{4}$$

When the search task is launched, the search agent has an initial aggregate belief $B(0) = \delta$, $0 < \delta \le 1$, which is usually given by experience. In the belief map, each cell contains a confidence value, which represents the probability that the target $x_T$ is in it. The recursive Bayesian approach provides a simple but effective way of updating the belief map: after the search agent obtains a detection result, the result is used to update the belief map. The first step is a simple application of Bayes' rule to the individual cell belief:

$$\Pr(x_T = a \mid D(t)) = \frac{\Pr(d_a(t) \mid x_T = a, D(t-1)) \, \Pr(x_T = a \mid D(t-1))}{\Pr(d_a(t) \mid D(t-1))}, \tag{5}$$

where the first numerator term can be regarded as a detector model and $\Pr(x_T = a \mid D(t-1))$ is the belief of the $a^{th}$ cell at the previous time step, which provides the recursive term for the recursive Bayesian method. By the Markov assumption, the detector model does not depend on past observations, so

$$\Pr(x_T = a \mid D(t)) = \frac{\Pr(d_a(t) \mid x_T = a) \, \Pr(x_T = a \mid D(t-1))}{\Pr(d_a(t) \mid D(t-1))}, \tag{6}$$

where $\Pr(d_a(t) \mid D(t-1))$ is the marginal distribution of the sensor measurement; it can be computed by

$$\Pr(d_a(t) \mid D(t-1)) = \ldots \times \Pr(H \mid D(t-1)). \tag{7}$$

After some algebraic manipulations, we can get the final recursive expression as follows:

[Equation (8)]

When the aggregate belief $B(t)$ is no longer within the threshold range, the search agent makes a decision and terminates the task. More specifically, if the condition $B(t) > \bar{B}$ is satisfied, where $\bar{B}$ is the upper threshold, the search agent terminates the task, finds the cell with the highest confidence in the belief map, and determines that the target is in that cell. Conversely, if the condition $B(t) < \underline{B}$ is satisfied, where $\underline{B}$ is the lower threshold, the search agent determines that the target is not in the area.
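The recursive update and the threshold test described above can be sketched in code. This is a minimal illustration under a stated assumption, not necessarily the paper's exact recursion: a reading taken in cell a depends only on whether the target is in cell a, and the probability mass not assigned to any cell stands for "target not in the area." The function names and the lower threshold value used in the usage note are hypothetical.

```python
def update_belief(belief, a, d, alpha, beta):
    """Bayes update of the belief map after observing reading d in cell a.

    `belief` maps cell -> Pr(target in cell | past data); the remaining mass
    1 - sum(belief.values()) is the probability that the target is not in A.
    """
    p_a = belief[a]
    if d == 1:
        like_in, like_out = 1.0 - beta, alpha   # Pr(d=1 | in a), Pr(d=1 | not in a)
    else:
        like_in, like_out = beta, 1.0 - alpha   # Pr(d=0 | in a), Pr(d=0 | not in a)
    # Marginal probability of the reading (law of total probability).
    evidence = like_in * p_a + like_out * (1.0 - p_a)
    return {cell: (like_in if cell == a else like_out) * p / evidence
            for cell, p in belief.items()}

def decide(belief, upper, lower):
    """Compare the aggregate belief B(t) with the two decision thresholds."""
    aggregate = sum(belief.values())
    if aggregate > upper:
        return "target in cell", max(belief, key=belief.get)
    if aggregate < lower:
        return "target not in area", None
    return "continue", None
```

For example, `decide(update_belief(belief, a, d, 0.2, 0.2), 0.9, 0.1)` performs one update-and-test cycle; the value 0.1 for the lower threshold is only a placeholder, since the source states a single threshold value.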

Search Strategy Analysis
In this section, we study two key factors that affect the success of the search task and propose two adaptive search strategies according to the characteristics of the two factors.

Decision Theory Analysis.
First, the search agent collects a series of imperfect observations $D(t)$ after $t$ time steps; therefore, we can quantify the evolution of the decision over time as follows:

[Equation (9)]

To simplify the expression, some intermediate functions are defined as follows:

[Equation (10)]

The probability that the search agent detects "1" in the current cell is represented by $f(d_a(t))$; it includes the case in which the sensor has a false positive error. Similarly, $g(d_a(t))$ represents the probability that the search agent detects "0" in the current cell; it includes the case in which the sensor has a false negative error.
Hence, equation (8) can be rewritten as

[Equation (11)]

where

[Equation (12)]

so we can get a new closed-form expression to update the individual belief:

[Equation (13) and its accompanying definitions]

Finally, using equations (11) and (13) to calculate the one-step change in the belief map, we find that $B(t)$ is controlled by the following two factors:
(i) $\Pr(x_a(t) = 1 \mid D(t-1))$: the search agent should try to reach a cell with higher belief at the next step.
(ii) $d_a(t)$: the detection result at the next step should be positive, which also includes the false positive case.
Because the speed of robots is bounded in practical applications, the search agent cannot reach the cell with the highest belief immediately. Moreover, the two conditions above are not always compatible. Hence, according to the characteristics of these two factors, two different search strategies are proposed here. The first search strategy is called the "myopic strategy" because the search agent always selects the cell with the highest confidence value among the cells reachable in a single step as the location for the next detection. In the second strategy, the search agent always focuses on the cell with the largest belief value in the map, so it needs to scan the belief distribution of the entire search region; this is called the "saccadic strategy."
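To illustrate how the two factors act, the one-step change of the aggregate belief can be worked out explicitly. The derivation below is a sketch under an assumed single-target sensor model, not a reproduction of the paper's equations: a reading in cell $a$ depends only on whether the target is in $a$, and $p_a$ abbreviates $\Pr(x_a = 1 \mid D(t-1))$.

```latex
% Positive reading, d_a(t) = 1:
B(t) = \frac{(1-\beta)\,p_a + \alpha\,\bigl(B(t-1) - p_a\bigr)}
            {(1-\beta)\,p_a + \alpha\,(1 - p_a)},
\qquad
B(t) - B(t-1) = \frac{(1-\alpha-\beta)\,p_a\,\bigl(1 - B(t-1)\bigr)}
                     {(1-\beta)\,p_a + \alpha\,(1 - p_a)} \;\ge\; 0 .

% Negative reading, d_a(t) = 0:
B(t) = \frac{\beta\,p_a + (1-\alpha)\,\bigl(B(t-1) - p_a\bigr)}
            {\beta\,p_a + (1-\alpha)\,(1 - p_a)},
\qquad
B(t) - B(t-1) = -\,\frac{(1-\alpha-\beta)\,p_a\,\bigl(1 - B(t-1)\bigr)}
                       {\beta\,p_a + (1-\alpha)\,(1 - p_a)} \;\le\; 0 .
```

Under this model the magnitude of the change grows with $p_a$ (factor (i)), its sign is set by the reading $d_a(t)$ (factor (ii)), and the change vanishes when $\alpha + \beta = 1$. For instance, with $\alpha = \beta = 0.2$, $B(t-1) = 0.5$, and $p_a = 0.1$, a positive reading raises the aggregate belief to about 0.615 and a negative reading lowers it to about 0.459.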

Myopic Search Strategy.
Once the search agent adopts the myopic strategy, it checks the belief values of all the surrounding cells that can be reached in one step. In this process, the searcher tries to maximize $B(t)$ at every step. Figure 3 illustrates the nature of the strategy; for convenience, the unrelated cells in the belief map are set to "0." When the search agent is at (7,4), the belief value at (8,5) is the largest among the surrounding cells, so the agent goes there in the next step. The pseudocode for the myopic strategy is given in Algorithm 1.
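The selection rule of the myopic strategy can be sketched as follows. This is an illustrative reading of the description above rather than a transcription of Algorithm 1; it assumes an 8-connected grid stored as a list of rows and excludes the current cell from the candidates, which the text does not state explicitly.

```python
def myopic_next_cell(belief, current, rows, cols):
    """Pick the one-step reachable cell with the highest belief value.

    `belief[r][c]` holds the belief of cell (r, c); ties go to the first
    candidate encountered in row-major order of the offsets.
    """
    r, c = current
    candidates = [(r + dr, c + dc)
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)
                  and 0 <= r + dr < rows and 0 <= c + dc < cols]
    return max(candidates, key=lambda cell: belief[cell[0]][cell[1]])
```

Because only the immediate neighborhood is examined, the cost per step is constant regardless of the size of the search area, which is consistent with the short CPU time reported for this strategy.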

Saccadic Search Strategy.
In the search area, the cell with the largest belief value on the belief map is critical to the searcher and should be checked as soon as possible, so the search agent needs to build the shortest path from its current location to that cell. To visualize the strategy, the belief values of the unrelated points on the belief map are set to "0" in Figure 4. Assuming that the search agent is at (7,4) and the belief value at (9,10) is the largest, the agent constructs the shortest path to that cell with Dijkstra's algorithm. If the peak changes during the update process, however, the search agent cancels the original plan and builds a new path. The pseudocode for the saccadic strategy is given in Algorithm 2, where $x_c$ represents the current location of the search agent, $x_d$ represents the current destination, and $P$ is an array containing the cells to be traversed from $x_c$ to $x_d$.
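The replanning behavior of the saccadic strategy can be sketched as follows. The names `x_c`, `x_d`, and `P` follow the text; everything else is an assumption made for illustration. In particular, `plan_path` is a simple stand-in for Dijkstra's algorithm that is only valid on an obstacle-free 8-connected grid, where stepping diagonally toward the goal already yields a shortest path.

```python
def plan_path(start, goal):
    """Shortest 8-connected path on an obstacle-free grid (stand-in for Dijkstra)."""
    path = []
    r, c = start
    while (r, c) != goal:
        r += (goal[0] > r) - (goal[0] < r)   # move one row toward the goal
        c += (goal[1] > c) - (goal[1] < c)   # move one column toward the goal
        path.append((r, c))
    return path

def saccadic_step(belief, x_c, x_d, P):
    """Return (next cell, destination, remaining path) for one saccadic move."""
    peak = max(belief, key=belief.get)       # cell with the largest belief
    if peak != x_d or not P:                 # peak moved or no plan yet: rebuild P
        x_d, P = peak, plan_path(x_c, peak)
    if not P:                                # already standing on the peak
        return x_c, x_d, []
    return P[0], x_d, P[1:]
```

A search loop would call `saccadic_step` after every belief update, starting with `x_d=None` and `P=[]`, so that the path is rebuilt whenever the peak of the belief map moves.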

Repeat Detection Mechanism.
In addition, to avoid the unreasonable behavior of the search strategy described in the following sections, we also propose a repeat detection mechanism. The mechanism can be divided into the following three steps:
(1) When the search agent initializes the belief map, an expected detection map is initialized at the same time; as shown in Figure 5, the expected value is defined as "1" only at the position with the highest belief, while the others are defined as "0."
(2) Once the result detected in a cell differs from the corresponding cell in the expected detection map, the search agent checks the cell repeatedly until the two are identical.
(3) Like the belief map, the expected detection map is updated at every step; the update process for the detection map is similar to step (1).
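The three steps above can be sketched as follows. This is one possible reading of the mechanism, with several assumptions: the expected detection map is refreshed after each belief update and before the comparison, the belief update uses the single-target sensor model in which a reading depends only on whether the target is in the inspected cell, and `max_repeats` is a hypothetical safety cap that the paper does not mention.

```python
def bayes_update(belief, a, d, alpha, beta):
    """Bayes update of the belief map after reading d in cell a (assumed model)."""
    like_in, like_out = (1.0 - beta, alpha) if d == 1 else (beta, 1.0 - alpha)
    evidence = like_in * belief[a] + like_out * (1.0 - belief[a])
    return {c: (like_in if c == a else like_out) * p / evidence
            for c, p in belief.items()}

def expected_detection(belief):
    """Expected detection map: 1 at the highest-belief cell, 0 elsewhere."""
    peak = max(belief, key=belief.get)
    return {cell: int(cell == peak) for cell in belief}

def inspect_with_repeats(belief, cell, sense, alpha, beta, max_repeats=50):
    """Re-inspect `cell` until the reading agrees with the expected detection map."""
    readings = []
    for _ in range(max_repeats):
        d = sense(cell)                                  # one imperfect reading
        readings.append(d)
        belief = bayes_update(belief, cell, d, alpha, beta)
        if d == expected_detection(belief)[cell]:        # map refreshed every step
            break
    return belief, readings
```

In this reading, an unexpected "1" in a low-belief cell triggers another look at the same cell instead of an immediate change of destination, which is the kind of behavior the mechanism is meant to provide.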

Results and Discussion
In this section, the performance parameters of the search strategies are obtained by Monte Carlo simulation. The minimum expected time to detection $E[TTD]$ and the accuracy $P$ are used as the indices for evaluating the performance of the strategies; $E[TTD]$ is reported both as the average number of simulation steps needed to complete a search task, $E[TTD]_s$, and as the average CPU time needed to complete a search task, $E[TTD]_c$. The simulations in this section were run on an Intel Core i5-8250U CPU (1.6 GHz) with 8 GB of RAM, using MATLAB 2018b.
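The three performance indices can be estimated with a small harness such as the one below. It is a minimal sketch, assuming a hypothetical callable `search_once(seed)` that runs one complete search and returns the number of steps, the cell the agent declared, and the true target cell; Python's `time.process_time` is used as the CPU-time measure.

```python
import time
from statistics import mean

def run_trials(search_once, n_trials):
    """Estimate E[TTD]_s, E[TTD]_c, and accuracy P over repeated search runs."""
    steps, cpu_seconds, hits = [], [], []
    for seed in range(n_trials):
        start = time.process_time()
        n_steps, declared_cell, true_cell = search_once(seed)
        cpu_seconds.append(time.process_time() - start)   # CPU time of this run
        steps.append(n_steps)
        hits.append(declared_cell == true_cell)           # correct decision?
    return mean(steps), mean(cpu_seconds), sum(hits) / n_trials
```

Passing the trial index as the seed keeps each run reproducible, so different strategies can be compared on the same sequence of random sensor errors.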

Search Environment.
Consider an object without moving ability lost in a 10 × 10 grid area $A$. As shown in Figure 6, the initial probability distribution is modeled as a discrete approximation of a Gaussian distribution, and the initial belief is $B(0) = 0.5$. The search agent starts at (1, 1) and is equipped with an imperfect sensor; the sensor parameters are set as false alarm probability $\alpha = 0.2$ and false negative probability $\beta = 0.2$, and the search decision threshold is set as $B = 0.9$.
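An initial belief map of this kind can be generated as follows. This is a minimal sketch: the text gives only the grid size and $B(0) = 0.5$, so the peak location `center` and the spread `sigma` are hypothetical parameters, and cells are indexed from 0.

```python
import math

def gaussian_prior(rows, cols, center, sigma, b0):
    """Discretized 2-D Gaussian over the grid, scaled so the cells sum to b0."""
    weights = {
        (r, c): math.exp(-((r - center[0]) ** 2 + (c - center[1]) ** 2)
                         / (2.0 * sigma ** 2))
        for r in range(rows) for c in range(cols)
    }
    total = sum(weights.values())
    # Scale so that the aggregate belief equals b0; the remaining 1 - b0
    # is the prior probability that the target is not in the area.
    return {cell: b0 * w / total for cell, w in weights.items()}
```

For example, `gaussian_prior(10, 10, (5, 5), 2.0, 0.5)` yields a 10 × 10 belief map whose values sum to 0.5 and peak at cell (5, 5).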

Performance Comparison.
To test the performance of the search strategies proposed in this paper, three other strategies (the sweeping strategy [27], the random jump strategy [20], and the snapshot strategy [37]) are selected for comparison.
All the strategies are tested 10,000 times; $E[TTD]_s$, $E[TTD]_c$, and $P$ are shown in Table 1. The myopic strategy is simple to compute and has a relatively short CPU time; the saccadic strategy has the highest accuracy, but it needs strong computing power because it involves shortest-path planning in every step. In addition, Figures 7-11 show the path of the search agent under the five different search strategies, respectively. The trajectory of the random jump strategy is too complicated, so only a part of the trajectory is shown in Figure 10.

Algorithm 2 (fragment):
        $a^{th+1} = P(1)$
    else
        if the $(x_d, y_d)$ did not change then
            $a^{th+1} = P(i)$
        else
            Rebuild the path $P$
            $a^{th+1} = P(1)$

In addition, the evolution of the aggregate belief $B(t)$ for the five strategies is shown in Figure 12. The myopic strategy and the saccadic strategy quickly reach the decision threshold (within 100 steps, similar to the snapshot strategy) because they use the Bayesian method to continuously collect new information and update the belief map, thus saving a lot of search resources. However, the sweeping strategy and the random jump strategy cannot use the prior information to guide the searcher's behavior, so the search task needs more than 600 steps. Furthermore, the information entropy is used to quantify the uncertainty of the search area. As can be seen from Figure 13, the myopic strategy and the saccadic strategy also reduce the uncertainty of an unknown environment more quickly than the random jump strategy and the sweeping strategy. Because of the nature of the sweeping strategy, once the search agent misses the target, it reaches the target position again only in the next traversal, which is why its entropy does not decrease uniformly.
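The entropy measure mentioned above can be computed from the belief map as in the following sketch. The exact definition used in the paper is not given, so this assumes Shannon entropy in bits over the cell beliefs together with the remaining "target not in the area" mass; the function name is illustrative.

```python
import math

def belief_entropy(belief):
    """Shannon entropy (bits) of the belief map plus the 'target not in A' mass."""
    outside = max(0.0, 1.0 - sum(belief.values()))   # probability target is not in A
    probs = list(belief.values()) + [outside]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0.0)
```

The entropy is largest when the belief is spread evenly and drops to zero once all of the probability mass sits in a single cell, so a faster decrease indicates that a strategy is resolving the uncertainty more quickly.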

Search Strategy Analysis.
According to the experimental data in Table 1, $E[TTD]_s$ and $P$ of the myopic strategy and the saccadic strategy are very close. Hence, we further studied how the two strategies influence the search agent.

Searcher with Imperfect Sensor.
Through field tests, we find that the saccadic strategy sometimes behaves unreasonably, whereas the myopic strategy does not. Hence, a set of representative test data is selected to demonstrate this behavior. The unreasonable behavior of the search agent is shown in Figure 14, where the number in each cell represents the order in which the cells were detected by the search agent.
Because of its own characteristics, the myopic strategy is inherently equipped with the repeat detection mechanism. After deploying the mechanism on the saccadic strategy, the performances of the improved saccadic strategy and the saccadic strategy are compared again through 10,000 experiments: $E[TTD]_s \approx 66$ steps for the saccadic strategy, whereas the improved saccadic strategy yields $E[TTD]_s \approx 64$ steps. Compared with the saccadic strategy, the improved saccadic strategy shows no significant performance improvement here, which is partly attributable to the values of $\alpha$ and $\beta$. We then examined the effects of $\alpha$ and $\beta$ on the decision, comparing these strategies for each set of parameters (10,000 tests per set of parameters). The relevant experimental statistics are shown in Tables 2 and 3, from which we can see that as $\alpha$ or $\beta$ increases, the improved saccadic strategy has an obvious performance advantage.

Multiple Scenarios.
The above experimental data show that the myopic strategy performs so well that it can even compete with the improved saccadic strategy. To test the effect of the prior distribution on the strategies, we carried out a series of experiments on the myopic strategy, the saccadic strategy, and the improved saccadic strategy under different prior belief maps.
First of all, different prior distributions are formed by varying degrees of disturbance. Figure 15 shows that heuristic information gives the myopic strategy better robustness, as evidenced by the search agent easily correcting a "bad initial belief map." Furthermore, the improved saccadic strategy is also robust thanks to the repeat detection mechanism. With the myopic strategy, the search agent is not guided to a local peak when it is far from the global peak; although the saccadic strategy gives the search agent better precision, its performance depends heavily on the initial belief distribution. The performance of these three strategies in different situations is tested, and the relevant statistics are shown in Table 4. The comparison with other search methods and the tests in different scenarios show that the proposed search decision framework and adaptive search strategies perform better. It can also be seen from the data in Tables 2 and 3 that the detection mechanism proposed in this paper mitigates, to a certain extent, the unreasonable behavior caused by sensor false alarms and false negatives.

Conclusion and Future Work
This work studies the search problem in which the sensor is imperfect and the motion of the search agent is constrained. A Bayesian-based search decision framework, two adaptive search strategies, and a repeat detection mechanism are proposed. Compared with other works, the scheme proposed in this paper greatly improves the search time and the success rate of search tasks.
Future research will consider using distributed heterogeneous agents to search for dynamic targets or targets with avoidance ability. In this case, information fusion is very important, for example, how to fuse two or more different initial belief maps and how to update data between heterogeneous search agents. If heterogeneous search agents can be coordinated and search resources allocated reasonably, team search will greatly improve search efficiency.

Data Availability
All data used to support the study are included within the manuscript.

Conflicts of Interest
The authors declare no conflicts of interest.

Scenario a: the search agent starts at the same location. Scenario b: the search agent starts from a local peak in a priori graph with multiple peaks. Scenario c: the search agent starts from the grid with the lowest confidence in the initial belief map.
14 Scientific Programming