Exploration Enhanced RPSO for Collaborative Multitarget Searching of Robotic Swarms

Particle Swarm Optimization (PSO) is a popular population-based optimization algorithm. Because of its biological inspiration and its velocity-update mechanism, it is also widely used for collaborative searching tasks in swarm robotics. One PSO-based model for robotic swarm searching tasks is Robotic PSO (RPSO), which adds an obstacle-avoidance term to standard PSO and has been applied to many single-target search tasks. However, owing to PSO's global optimization characteristics, the swarm easily converges to a specific position in the search space and loses the ability to explore further. When faced with multitarget searching, it may become inefficient or even invalid. This paper proposes an Exploration Enhanced Robotic PSO (E2RPSO) method for multitarget searching problems for robotic swarms. The proposed method modifies the third term in RPSO into an additional attraction term. This term not only enables a robot to avoid collisions but also guides the swarm to search unexplored regions as much as possible. This operation increases the swarm's task-specific (top-down) diversity, making the system cover a broader search area and avoid falling into local optima. In addition, aggregation degree and evolution speed factors are included in determining the inertia weight of the proposed method, which adjusts the swarm's internal (bottom-up) diversity dynamically. The comparison results show that this method balances exploration and exploitation well and has the potential to be applied to multitarget searching scenarios.


Introduction
As a physical system simulating collaborative swarm behaviors in biology, swarm robotics has been proven to have significant potential in military and civil applications, such as exploration of virgin territories [1], search and rescue [2], and cooperative transportation [3]. It aims to achieve emergent collective behaviors through the limited perception abilities of swarm members and simple interactions with the surroundings or teammates. The robot members in the system generally have low complexity [4]. At the system level, redundancy ensures robustness with excellent scalability and economy. This field has become a popular branch of multirobot systems in recent years.
Swarm robotics is a physical realization of swarm intelligence in the domain of robotics. Many works apply the optimal searching ability of swarm intelligence (SI) algorithms to the cooperative search applications of robotic swarms [5]. In particular, the Particle Swarm Optimization (PSO) algorithm, as a prevalent optimization method, has been widely used [6,7]. This line of work can be traced to Doctor et al. [8], who applied the PSO algorithm to the collaborative search application of swarm robotics for the first time. After that, many improvements have been proposed to enhance search performance in different aspects. For example, the application of PSO in a search and rescue task was simulated to locate a target using a robotic swarm [9]. An odor search task to determine possible pollutants in the environment to be searched was also developed based on PSO [10].
It should be noted that the PSO algorithm can be applied to robotic applications in two ways. The first is that the PSO algorithm can solve filtering problems or optimize parameters for control [11,12]. The other is that each particle in the algorithm can be mapped to a member robot in a swarm. The heuristic collaborative search can then be realized by imitating the algorithm's principles under physical constraints, as shown in Table 1.
In particular, the Robotic PSO (RPSO) introduces an additional term into the standard PSO algorithm for obstacle avoidance [13]. In the same article, the authors also introduced another improved version of PSO (Darwinian PSO, DPSO) into the field of robotics, known as RDPSO. It further adds an elimination mechanism based on Darwin's natural selection principles to increase swarm diversity. The RPSO and the RDPSO have been applied to some collaborative target search scenarios of swarm robotics [14,15]. However, the additional term in those methods is only used for avoiding obstacles. The balance of exploration and exploitation still depends on the dynamic inertia weight, individual cognition (personal best), and social cognition (global best). The multitarget search problem requires the swarm to find as many targets as possible in the environment to be searched. These two methods still suffer from premature convergence, i.e., they are not able to locate all the targets in an area. It is necessary to further improve the exploration ability of the swarm to meet the requirements of multitarget searching tasks.
This paper proposes a new robotic PSO-based approach named Exploration-Enhanced RPSO (E2RPSO).
This method further improves the swarm diversity by adding an exploration-enhancing operation, which makes it more suitable for multitarget search scenarios. The additional term in the original RPSO method is modified to maintain continuous searching states. Meanwhile, the adaptive dynamic inertia weight is retained to preserve the original performance.
The simulation results show that the method proposed in this paper can find more targets than other PSO-based approaches; i.e., it has better multitarget search performance. The rest of this paper is organized as follows: Section 2 introduces related works. Section 3 states the multitarget searching problem and the assumptions, as well as the proposed E2RPSO method. The comparison results and discussions are presented in Sections 4 and 5. The conclusion is given in Section 6.

PSO in Swarm Robotics.
The PSO algorithm is derived from the study of the cooperative foraging behavior of birds. Its basic idea is to let a swarm of particles move in a solution space to find a relatively optimal solution. The movements of the individuals follow both individual information and knowledge shared by others. Each particle in the swarm has a position vector and a velocity vector. The fitness value of the current position is calculated according to the predefined objective function. The velocities and positions of the particles are updated iteratively. After a certain number of iterations, the particles gradually tend toward an optimal solution. Let the position vector of particle i in an N-dimensional space be X_i = (x_i1, ..., x_iN). The most popular version of the PSO algorithm is written as follows [33]:

V_i^{t+1} = w^t V_i^t + c_1 r_1 (pb_i - X_i^t) + c_2 r_2 (gb - X_i^t),
X_i^{t+1} = X_i^t + V_i^{t+1},                                          (1)

where V_i^t and X_i^t are the velocity and position vectors of particle i at time t, respectively, w^t is the dynamic inertia weight, pb_i is the personal best position of particle i, gb is the global best position of the swarm, c_1 and c_2 are the personal cognition and social cognition factors, respectively, and r_1, r_2 are random values in [0, 1]. Other parameters of PSO include the population size n, maximum iteration number G_max, maximum velocity V_max, and so forth. It should be noted that the original version of the PSO algorithm does not have the inertia weight w^t, which weakens the ability of local search. The introduction of the inertia weight allows various improved PSO algorithms to adjust the weight dynamically according to the number of iterations or the swarm's current state.
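To make the velocity and position update above concrete, a minimal per-particle sketch is given below. The parameter values and the velocity clamp are illustrative choices, not taken from the paper:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.0, c2=2.0, vmax=2.0):
    """One PSO velocity/position update for a single particle.

    x, v, pbest are this particle's position, velocity, and personal
    best; gbest is the swarm's global best. Velocity components are
    clamped to [-vmax, vmax], as is common in PSO implementations.
    """
    new_v = []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()  # r1, r2 in [0, 1]
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])
              + c2 * r2 * (gbest[d] - x[d]))
        new_v.append(max(-vmax, min(vmax, vd)))
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v
```

Iterating this step for every particle, while updating pbest and gbest from the fitness function, reproduces the loop described above.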
Thus the algorithm can balance local search and global search ability, further enhancing search efficiency and preventing the algorithm from falling into a local optimum prematurely.

The RPSO was proposed by taking obstacle avoidance into account for member robots [13]. Based on the original PSO algorithm, an additional term for obstacle avoidance is added; i.e., the first equation in (1) is changed to

V_i^{t+1} = w^t V_i^t + c_1 r_1 (pb_i - X_i^t) + c_2 r_2 (gb - X_i^t) + c_3 r_3 (po_i^t - X_i^t),   (2)

where po_i^t is the position of the anticollision-relevant virtual attractive point, c_3 is the sensitivity of anticollision, and r_3 is a random value in [0, 1]. Similarly, to avoid the premature-convergence problem, i.e., the robots getting stuck at local optimum positions, some subsequent works were proposed to deal with this drawback. For example, some work combines the Darwinian PSO (DPSO) strategy, which adopts natural selection principles to increase swarm diversity. Furthermore, to make them more physically implementable, the punishment and reward in DPSO were changed to the social exclusion and social inclusion operations of member robots. This method is called RDPSO [13]. The PSO algorithm's adaptive inertia weight has been introduced into the RPSO method as well, such as in Adaptive RPSO (A-RPSO) [14]. Based on the A-RPSO, further work was proposed considering practical constraints such as relative localization, local sensing, limited communication, and kinematic constraints [34]. Another work, called ML-PSO, combines a local search strategy with a modified PSO to achieve searching tasks [36].

Multitarget Searching Problem.
Generally, collaborative searching in swarm robotics can be divided into two main scenarios, depending on the number of targets to be searched: single target and multiple targets [5,37]. For single-target searching, the focus is on fusing the information collected by sensors equipped on robots and, finally, improving the accuracy of the target position estimation. The multitarget scenario can be viewed as an extension of the single-target case, where more uncertainties need to be considered [38]. For example, the number of targets may be unknown or even vary with time; the targets to be searched may be static or dynamic; etc. Thus the problem is how to design and implement a collaboration mechanism among the swarm members to find as many targets as possible efficiently under specific conditions.
To sum up, the PSO method is widely used in the collaborative search of robotic swarms. However, different from a pure swarm intelligence algorithm, it needs to consider more physical limitations of environments and member robots when mapped to swarm robotics. Moreover, when faced with multitarget search problems, task-related diversity must be added to improve the efficiency of regional coverage and find as many targets as possible. The contribution of the proposed method is to add this task-related diversity into the traditional RPSO for multitarget searching problems. The additional term in traditional RPSO is modified to provide both anticollision and exploration enhancement. Details are as follows.

Assumptions.
The following assumptions are made for the member robots, targets, and environments to make the research more practical.
(1) Robots. The mobile robot in this work has a holonomic model; i.e., it can move in any direction. The member robots have global positioning ability and know the boundary of the space to be searched, but the internal information of the search space is unknown. A robot can transmit information to the whole group through certain communication interactions. There is no central control or global leader in the swarm.
(2) Targets. The targets are treated as radio beacons, which emit nondirectional signals. The signals affect the searching space with attenuation: the signal strength decreases as the distance increases. The signals broadcast by the targets are indistinguishable and can be superimposed. The total number of targets is unknown; i.e., the swarm system needs to find as many targets as possible.
(3) Environments. The robotic swarm system works in a 2D environment with obstacles; i.e., the member robots need to keep away from other members and from the obstacles during the searching process.

(4) Fitness Evaluation. The robot's fitness value at a certain location is the superposition of the strengths of the signals that can be sensed at that position.

(5) Target Handling. When the distance to a target is less than a certain value, the robot can recognize the target and handle it. Once a target is reached, its signal is no longer broadcast. Also, we assume that the robot that reaches the target first stays at that position to handle the target and no longer participates in follow-up group actions. If more than one robot reaches the target position simultaneously, they all stay there.
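Under the target and fitness-evaluation assumptions above, the fitness at a position is the sum of the attenuated signal strengths of all still-broadcasting targets. The sketch below assumes an inverse-distance attenuation model; the paper only states that strength decreases with distance, and `power` and `eps` are illustrative constants:

```python
import math

def fitness(pos, targets, power=1000.0, eps=1.0):
    """Superimposed signal strength sensed at `pos`.

    `targets` holds the (x, y) positions of targets whose beacons are
    still broadcasting; each contributes power / (distance + eps), and
    the indistinguishable signals add up.
    """
    total = 0.0
    for tx, ty in targets:
        d = math.hypot(pos[0] - tx, pos[1] - ty)
        total += power / (d + eps)
    return total
```

Handling a target then simply removes it from `targets`, so its beacon no longer contributes to any robot's fitness.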

The Proposed Method.
The velocity update strategy of the proposed E2RPSO can be represented as follows:

V_i^{t+1} = w_i^t V_i^t + c_1 r_1 (pb_i - X_i^t) + c_2 r_2 (gb - X_i^t) + c_3 r_3 (pa_i^t - X_i^t),   (3)

where w, pb, and gb are defined the same as in the traditional PSO algorithm, and pa_i^t is an additional attractive position. In the original RPSO approach, this term is used only for obstacle avoidance.
The proposed approach uses it not only for anticollision with obstacles or other members but also for enhancing the exploration ability. The determination of pa, c_3, and the dynamic inertia weight w^t is described below.

Additional Attractive Positions.
The additional attractive position can be determined by

pa_i^t = { po_i^t,  if d_n <= d_s,
           pu_i^t,  otherwise,                                          (4)

where d_n represents the distance to the nearest obstacle or swarm-mate, d_s is the predefined safe distance, po is the position determined by the requirements of anticollision, and pu is the exploration-enhancement attractive position. In particular, po is related to the range and bearing of the surrounding members or obstacles. Once a member robot detects obstacles or other members within a certain range, a virtual attraction point is formed to attract the robot away from a possible collision, which can be written as follows:

po_i^t = X_i^t + d_s [cos θ_sol, sin θ_sol]^T,                          (5)

where θ_sol is the solution direction determined by obstacle avoidance techniques such as the modified polar vector field histogram (VFH) presented in [39]. To enhance the exploration ability, pu can be determined by the farthest unexplored region in the searching area, that is,

pu_i^t = argmax_{p ∈ P_u^i} ||p - X_i^t||,                              (6)

where P_u^i is the set of locations in the searching area that robot i has not visited.
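The switching rule above can be sketched as follows. Placing the virtual anti-collision point at the safe distance along the escape direction θ_sol is an assumption made for illustration; the paper obtains θ_sol from a VFH-style method:

```python
import math

def attractive_point(x, d_n, d_s, theta_sol, unexplored):
    """Select the additional attractive position pa for one robot.

    x          : the robot's current (x, y) position.
    d_n        : distance to the nearest obstacle or swarm-mate.
    d_s        : predefined safe distance.
    theta_sol  : collision-free heading from an obstacle-avoidance
                 method (e.g., a VFH variant).
    unexplored : candidate (x, y) locations the robot has not visited.
    """
    if d_n <= d_s:
        # po: virtual attraction point pulling the robot out of danger.
        return (x[0] + d_s * math.cos(theta_sol),
                x[1] + d_s * math.sin(theta_sol))
    # pu: the farthest unexplored location, which drives exploration.
    return max(unexplored, key=lambda p: math.hypot(p[0] - x[0], p[1] - x[1]))
```

When no collision threatens, the robot is steadily pulled toward the farthest uncovered region, which is what enhances area coverage in the proposed method.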

Determination of c_3.
Unlike c_1 and c_2, which are constants, c_3 is a variable that is affected by the following situations:

(1) Collision Avoidance. When a robot in the swarm is in a state of obstacle avoidance or anticollision, it is better to make the robot escape the collision danger as soon as possible. At this time, only the requirements of collision avoidance should be considered. Therefore, c_3 should be set far greater than c_1 and c_2, i.e., c_3 >> max(c_1, c_2).

(2) Improving Efficiency. The value of c_3 in this paper is determined according to the number of times a member robot repeatedly visits a certain area, denoted as m. At the beginning of the searching process, because most regions are not explored, the strategy should depend more on the individual cognition and social cognition coefficients, i.e., c_1 and c_2. As the search task proceeds, robots in the swarm may repeatedly access regions that have already been visited. Here we introduce a threshold σ. When the number of repeated visits to an area is greater than this threshold, the robot increases the value of c_3 to enhance its exploration of the unknown regions.
The above strategy for determining c_3 can be summarized as follows:

c_3 = { A,  during collision avoidance (d_n <= d_s),
        B,  if m <= σ,
        C,  if m > σ,                                                   (7)

where A, C >> max(c_1, c_2) and B ≈ max(c_1, c_2).
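The piecewise rule above is straightforward to implement. The default values below follow the simulation settings reported later in the paper (A = 10000, B = 2, C = 20 with c1 = 1, c2 = 2):

```python
def c3_value(avoiding_collision, m, sigma, A=10000.0, B=2.0, C=20.0):
    """Choose the gain c3 for the additional attraction term.

    avoiding_collision : True while the robot is within the safe
                         distance of an obstacle or teammate.
    m                  : number of repeated visits to the current area.
    sigma              : revisit threshold that triggers exploration.
    """
    if avoiding_collision:
        return A          # A >> max(c1, c2): escape the danger first
    if m > sigma:
        return C          # C >> max(c1, c2): push toward unexplored space
    return B              # B ~ max(c1, c2): normal cooperative search
```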

The Adaptive Inertia Weight.
The value of the dynamic weight w is determined according to the evolution speed and aggregation degree of the population [14,40,41], which can be represented as follows:

w_i^t = w_0 - α (1 - h_i^t) + β s^t,                                    (8)

where w_0 is the initial weight, (1 - h_i^t) and s^t are the evolution speed and aggregation degree, respectively, and α, β in [0, 1] are the relevant coefficients. h_i^t can be written as follows:

h_i^t = F(pbest_i^{t-1}) / F(pbest_i^t),                                (9)

where F(pbest_i^t) is the best fitness value of particle i up to time t. Obviously, 0 < h_i^t <= 1; the larger (1 - h_i^t) is, the higher the speed of evolution, and vice versa. The definition of the aggregation degree is as follows:

s^t = F̄^t / F^t(best),                                                 (10)

where F^t(best) is the best fitness value of the swarm at time t and F̄^t is the average fitness value of all particles in the t-th iteration. Also, 0 < s^t <= 1; a larger value of s^t indicates that the particles in the swarm are closer to each other. Since only the fitness values of the current iteration are involved in calculating s^t, it better reflects the swarm's current state.
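Putting the evolution-speed and aggregation factors together, the weight computation might look like the sketch below. The signs (subtracting the evolution-speed term, adding the aggregation term) follow our reading of the adaptive schemes cited in [14,40,41], and positive, to-be-maximized fitness values are assumed:

```python
def inertia_weight(w0, alpha, beta, pbest_prev, pbest_curr, avg_f, best_f):
    """Adaptive inertia weight from evolution speed and aggregation.

    h = F(pbest^{t-1}) / F(pbest^t): a small h (fast-growing personal
    best) means fast evolution, so the weight is reduced to exploit.
    s = average fitness / best fitness: s near 1 means the swarm has
    aggregated, so the weight is raised to spread the particles out.
    """
    h = pbest_prev / pbest_curr      # 0 < h <= 1 (pbest never worsens)
    s = avg_f / best_f               # 0 < s <= 1 for maximization
    return w0 - alpha * (1.0 - h) + beta * s
```

With this scheme, a swarm that is both stagnating and tightly clustered receives a larger weight, which promotes escape from local optima.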

Results and Discussion
We have conducted extensive simulations to verify the effectiveness of the proposed method. These simulations are performed not only to validate the performance of the proposed approach but also to compare it with other PSO-based methods. The representative techniques are RPSO, RDPSO [13], and ML-PSO [36], mentioned above. The effects of population size, target number, and environment size on the corresponding methods are evaluated in different simulations. Details are as follows.

Configurations.
The simulation configurations are shown in Figure 1, where the robot swarm, obstacles, and targets are illustrated. Except for the experiments evaluating the impact of map sizes, the targets and obstacles are distributed in a map of 1000 × 1000 units. The robots' initial position is in the middle of the left part of the environment. Robots can move at most two units in each iteration. The parameter configurations of the methods involved are all taken from the original works. For parameter configurations not explicitly indicated in the original papers, parameters consistent with the method proposed in this paper are used. Specifically, we set c_1 = 1 and c_2 = 2; for the proposed method, A = 10000, B = 2, and C = 20. All the statistical results are the mean values over 30 independent tests.

Single-Target Search.
A single-target searching process of the proposed method is shown in Figure 2. Ten robots are initialized at the middle-left of a 1000 × 1000 map with some obstacles and one target. By applying the proposed method, the target can be found through swarm cooperation. Generally, most PSO-based approaches can achieve the single-target search task. The above example shows that this method inherits the advantages of this kind of method, and the swarm can converge to the target position in the region to be searched.

Multitarget Search.
A multitarget searching process of the proposed method is shown in Figure 3. Twenty robots are initialized at the middle-left of a 1000 × 1000 map with some obstacles. Ten targets are distributed randomly on the map. It can be seen that this method can find multiple targets in the environment; i.e., it is effective. Meanwhile, according to the preset assumptions, the robots that arrive at a target first stay with and handle it, so not all robots converge to the same position. The exploration ability of the proposed method is enhanced: an unexplored area has a higher probability of being searched. The results above indicate the effectiveness of the proposed method. Further, we also carried out quantitative evaluations of the proposed method and compared it with other PSO-based methods. The comparison methods involved in the simulation are the original Robotic PSO (RPSO), the Robotic Darwinian PSO (RDPSO), and the Modified Local PSO (ML-PSO). The impacts of the population size, the number of targets, and the environment size are evaluated.

Impact of Population Size.
The map configurations and target distributions are the same as in Figure 3. The swarm population is varied from 10 to 50, increasing by five each time. Each size has been tested 30 times independently. The average number of targets found by the four methods is shown in Figure 4. The proposed method can find almost all targets for every swarm size larger than 10. The other compared methods are also able to find some targets in the environment. However, once their swarms converge to some target positions, the movement ability is insufficient, resulting in the inability to explore more areas.
The average iteration times before convergence of the compared methods are shown in Table 3. In general, for all compared approaches, the number of iterations before convergence decreases with the increase in population size. Because the map size is identical, as the swarm population size increases, the algorithms have a higher probability of finding targets and converging. It can be seen that the method proposed in this paper not only finds almost all targets but also has a lower average iteration count, which further demonstrates its efficiency.
We also calculated the extreme performance of all participating methods, that is, the probability of finding all targets across all tests, as shown in Table 3. It can be seen that the performance of the proposed method is better than that of all other techniques involved in the comparison. Except for the minimal number of robots (10) and the extreme situation of target distribution (targets distributed in the corners of the map), all targets can be found in all other cases. For the other methods, the possibility of finding more targets is proportional to the number of robots. However, the number of targets found by the compared methods is consistently lower than that of the proposed method.

Impact of Target Number.
To further verify the multitarget search performance of this method, we tested the above four methods under different numbers of targets with the same map configuration (1000 × 1000). The robot population is fixed at 20, and the number of targets changes from 1 to 10. The average number of targets found by each compared method is shown in Figure 5. Each simulation has been tested 30 times independently. Table 4 shows the probability of finding all targets across all tests and the performance of the involved methods as the number of targets increases. For the proposed approach, almost all targets can be found successfully in all tests, except for a few runs in some extreme cases. In general, this method achieves the best multitarget search performance. As shown in Table 4, the probability of finding all targets with the proposed method is significantly higher than with the other methods. The other methods can find all the targets only when the number of targets is relatively small; with the increase in the number of targets, the possibility of finding all targets becomes smaller. However, the method proposed in this paper is less affected by the number of targets: the success rate of finding all targets remained above 93% in all tests. Therefore, from this perspective, this method can still maintain a relatively low search time under the premise of finding more targets. It has the best search efficiency and performance among the compared methods.

Impact of Environment Size.
We further evaluated the impact of environment size on the searching performance of all compared methods. The environment size is increased from 500 × 500 to 2000 × 2000, and the target number is fixed at 10 with random distribution. The number of robots is set to 20, distributed in the middle of the left side of the map. The obstacles in the map are scaled proportionally. The tests are conducted 30 times independently for each map size. The average number of targets found is shown in Figure 6, which compares the number of targets found by each method as the map grows. The proposed method remains at a high level in all cases. Of course, it is not easy to find all targets as the map grows with a constant population size, but our method still shows superior performance in the tests. When the total number of targets is 10, the average number of targets found is above 9.5 for all map sizes. The other methods have some effectiveness when the map is small; still, with larger map sizes, their performance is significantly affected. It should be noted that the ML-PSO method also has an operation for enhancing the exploration of unexplored areas, and its performance is better than that of the other two methods. From the iteration times shown in Table 5, RDPSO requires significantly more search time before convergence than E2RPSO and ML-PSO in all cases, due to its additional operations for avoiding local optima. The likelihood of RPSO finding all targets is 0% in all cases, and its iteration time always reached the preset maximum of 10000.
Compared with the ML-PSO method, the proposed E2RPSO is similar in iteration times, but the number of targets it finds is the largest. Similarly, we also evaluated each method's success rate of detecting all the targets at each environment size. The proposed method has a high probability of finding all the targets. As the scale of the map increases, the likelihood of finding all the targets decreases; however, it remains above 80%, as shown in Table 5.

Discussion
From the results above, we can see that the proposed method can collaboratively locate most of the targets in different cases. The PSO algorithm was originally inspired by the foraging behaviors of biological swarm systems; it can search for and locate targets in a region. However, due to the kinematic limitations of particles and the influence of inertia, it is not enough to adjust the population's diversity only by the personal cognition factor and the inertia weight, especially when regional coverage or a multitarget search task is needed. Through the extensive tests above, we found the following issues that need further discussion.

The Source of Diversity.
Maintaining the diversity of a swarm is always an effective way to improve the performance of a swarm intelligence algorithm or application. The original PSO algorithm uses personal cognition and social cognition to reach consensus while preserving diversity, i.e., social cognition for convergence and personal cognition for diversity. However, such diversity among individuals is eventually lost due to the unification of individual cognition and social cognition. A significant improvement in follow-up work is the introduction of the inertia weight (w^t). With a linear or adaptive inertia weight, the PSO algorithm can obtain more diversity and jump out of local optima. Different methods can balance the relationship between exploration and exploitation by adjusting the diversity of swarms. The above two kinds of diversity come from the interior of the swarm, which we call bottom-up diversity. However, diversity should also be related to the characteristics of different optimization problems or particular tasks of swarm systems. This kind of diversity also conforms to the Darwinian evolution principle. To distinguish it from the diversity mentioned above, which derives from the swarm itself, we term this diversity derived from the environment top-down diversity. Swarm members may change some of their features according to changes in the environment or specific tasks, thereby obtaining diversity from external factors.

Multitarget Searching vs. Multimodal Optimization.
According to the diversity discussion above, the proposed method adds task-specific diversity to the original RPSO approach. Since the multitarget searching problem requires the searching procedure to locate as many targets as possible in a particular region, more exploration ability is needed for area coverage. The multitarget searching problem is similar to the multimodal optimization problem in the domain of optimization algorithms. Many multimodal objective functions have more than one optimum. Multimodal optimization aims to locate multiple peaks/optima in a single run and maintain these found optima until the end of the optimization process. The strategies for these kinds of problems also work by enhancing task-specific diversity, such as niching strategies, subpopulation strategies, etc. Therefore, to solve specific optimization problems or specific swarm robotics tasks, it is impossible to complete the tasks ideally with bottom-up diversity alone. Some task- or environment-related diversity must be introduced, i.e., top-down diversity. These diversities may come from prior knowledge of a problem or new cognitions of swarm members while performing tasks or solving problems. Such a situation exists in many actual biological communities, such as the foraging behavior of social organisms, which may be affected by the environment and lead to additional diversity. For example, in the collaborative foraging of ants, if some swarm members are disturbed by the surroundings, e.g., risk factors are detected, they may not continue to move according to the original pheromone mechanism but instead avoid the danger, thus introducing additional diversity, which may also provide more possibilities to find new food.

Conclusion
This paper proposed an exploration-enhanced RPSO (E2RPSO) for collaborative multitarget searching in swarm robotics. The multitarget searching task requires the swarm system to locate as many targets as possible in a specific region; i.e., it requires the swarm to have additional exploration and area-coverage abilities. Keeping these requirements in mind, we modified the third term in RPSO, which was originally used only for obstacle avoidance. By introducing this exploration-enhancing operation, new task-specific diversity of the swarm was added, which we call "top-down" diversity. In addition, the adaptive dynamic inertia weight was used to ensure the "bottom-up" diversity of the original PSO. In the comparative simulation experiments, we evaluated the impacts of population size, target number, and environment size on each method. The results show that the method proposed in this paper can find more targets in a specific region without increasing the time cost.
This method balances the relationship between exploration and exploitation well and has excellent potential to be applied to practical multitarget searching scenarios.

Data Availability
The code used to support the findings of this study has been deposited in the GitHub repository (https://github.com/xrl2408/E2RPSO).

Conflicts of Interest
The authors declare that they have no conflicts of interest.