Exploration and Coordination of Complementary Multi-Robot Teams in a Hunter and Gatherer Scenario

The hunter and gatherer approach copes with the problem of dynamic multi-robot task allocation, where tasks are unknowingly distributed over an environment. This approach employs two complementary teams of agents: one agile in exploring (hunters) and another dexterous in completing (gatherers) the tasks. Although this approach has been studied from the task planning point of view in our previous works, the multi-robot exploration and coordination aspects of the problem remain uninvestigated. This paper proposes a multi-robot exploration algorithm for hunters based on the notion of "expected information gain" to minimize the collective cost of task accomplishment in a distributed manner. In addition, we present a coordination solution between hunters and gatherers by integrating the notion of profit margins into the concept of expected information gain. Statistical analysis of extensive simulation results confirms the efficacy of the proposed algorithms across different environments with varying levels of obstacle complexity. We also demonstrate that the lack of effective coordination between hunters and gatherers significantly hurts the total effectiveness of the planning, especially in environments containing dense obstacles and confined corridors. Finally, it is shown statistically that the overall workload is distributed equally among agents of each type, which ensures that the proposed solution is not biased toward a particular agent and that all agents behave analogously under similar characteristics.


Introduction
Multirobot systems are expected to complete tasks that are infeasible, laborious, or inefficient for a single agent to accomplish [1]. Employing multirobot systems entails addressing various problems on the subjects of task allocation [2], exploration [3], coordination [4], learning [5], swarm behavior [6,7], and heterogeneity [8]. Among these, the problem of multirobot task allocation (MRTA), that is, assigning a group of tasks to individual robots, is the most deep-seated, and its complexity increases considerably in dynamic environments. Since in dynamic problems tasks are unknowingly distributed over an environment, the MRTA problem needs to be addressed from both task planning and multirobot exploration perspectives. The former has been addressed as the hunter-and-gatherer approach in our previous works [9,10] by dividing each task into two sequential subtasks, where each subtask can only be carried out by a certain type of agent. This novel approach poses an unexplored MRTA problem whose exploration and coordination in complementary teams are the motivation of this work.
According to the taxonomy presented in [11], problems with single-robot (SR) tasks, in which each task requires the effort of a single robot to be completed, are the most primitive cases of MRTA. For instance, the work in [12] addresses MRTA to coordinate a group of autonomous vehicles by proposing two distributed algorithms based on auction and bundle methods. However, in real-world problems, there are cases where each task requires the efforts of multiple robots to be completed. This case is taxonomically known as a multirobot (MR) task problem and is investigated in [13,14]. The former proposes a distributed bees algorithm (DBA) and applies the optimized DBA to distributed target allocation in swarms of robots. The latter presents a novel weighted synergy graph model and then introduces a learning algorithm for the presented model in which the system learns agents' interactions. In both cases, the tasks are assigned instantaneously, i.e., it is assumed that the tasks are identifiable for robots before the mission. Nonetheless, in a dynamic environment, in which tasks are unknowingly distributed over the environment, instantaneous assignment (IA) is infeasible, and time-extended assignment (TA) must be adopted instead.
In the context of TA, there are mainly two paradigms of work addressing dynamic problems where tasks are unknowingly distributed over an environment: (1) works that address the problem purely from an exploration perspective and (2) works that address the problem from the MRTA point of view. Regarding the first paradigm, the authors of [15,16] present a fundamental frontier-based algorithm for single-robot and multirobot exploration, respectively. To enhance the efficacy of frontier-based exploration, Zlot et al. [17] introduced a market-based approach that maximizes information gain while minimizing incurred costs. The use of information gain in [17] opened the door to integrating the concept of entropy into multirobot exploration algorithms. For instance, Bhattacharya et al. [18,19] focus on information theory and cast the exploration problem as the minimization of map entropy, taking into account communication among robots. In contrast to [16-19], which consider the whole environment for exploration purposes, Lopez-Perez et al. [20] proposed an algorithm for a distributed multirobot system that explores nearby zones to reduce the traversed distance while the agents use communication resources efficiently. Although [16-20] cope with the unknown nature of dynamic environments by introducing various multirobot exploration methods, they all neglect integrating an MRTA solution into the proposed exploration algorithms.
Works that fall into the second paradigm undertake environments comprising unknowingly distributed tasks, while addressing the MRTA aspect of the problem. On this subject, Prorok et al. [21] considered a TA problem where a system of heterogeneous robots is modeled as a community of species and developed centralized as well as decentralized methods to efficiently control the heterogeneous swarm of robots. In another effort, in [22], a novel task allocation method is developed based on Gini coefficient which increases the number of accomplished tasks considering limited energy resources. Although Prorok et al. and Wu et al. [21,22] address a time-extended assignment problem, a solution for exploring the environment to detect unknown tasks has not been provided. Although a few works such as [23] have tried to investigate the performance of task allocation algorithms in a frontier-based multirobot exploration problem, most studies have neglected the integration of multirobot exploration into dynamic MRTA problems such as ST-MR-TA : SP or MT-MR-TA : SP [24]. Consequently, to the best of the authors' knowledge, there is a lack of critical attention paid to addressing multirobot exploration and task allocation simultaneously in TA problems, while this problem is a pervasive problem in a wide variety of fields such as urban search and rescue (USAR) [25], agricultural field operations [26], and security patrols [27].
Aside from the two paradigms reviewed above, foraging is another research trend in addressing the multirobot task allocation problem [28]. Although the foraging problem taxonomically falls into the categories discussed in the literature review [11,24], here we briefly compare the approaches in this paradigm with the hunter-and-gatherer framework. Taken as a whole, foraging is more concerned with the collective and swarm behavior of a multiagent system, spanning from ant or bee colony optimization algorithms [29] to multiagent reinforcement approaches [30]. The idea of collective behavior emphasized in foraging research requires an identical decision-making mechanism for all agents and results in a decision-making dependency among agents [31]. For instance, it is theoretically challenging to employ an explorer agent with a unique search algorithm in the central place foraging algorithm [32], since this exclusive search behavior interferes with the swarm behavior of agents and disturbs the system's equilibrium due to the decision-making dependency among agents. By way of contrast, solutions to the hunter-and-gatherer framework [9] provide a generic platform for dynamic multiagent task allocation that is more focused on individual autonomy, with no decision-making dependency among agents.
Consider USAR at a disaster site in which a number of victims are stranded in unknown locations and need immediate rescue. Each victim is a task that needs to be detected first and then rescued by an operation that typically requires several dexterous actions. This case exemplifies problems where the multirobot exploration and task allocation aspects need to be addressed simultaneously. Besides, a rescue robot needs a heavy-duty manipulator and dexterous gripper [33,34], high-power actuators, a tracked locomotion mechanism, high-capacity batteries, and many sorts of sensors, cameras, and communication devices to accomplish those tasks, which makes the robot relatively heavy, ponderous, and incapable of agile search operations. Under this circumstance, the "hunter-and-gatherer approach" is decidedly justifiable, where each task comprises two sequential subtasks: detection and completion. Each subtask can only be carried out by a certain type of agent, so two teams of robots are employed: a team of agile robots that can quickly explore an environment and detect tasks, called "hunters," and a team of dexterous robots that accomplish detected tasks, called "gatherers." Practically speaking, hunters can be a group of small UAVs that search the site to locate victims, and gatherers can be a group of maxi-sized [35] heavy-duty UGVs that rescue detected victims relying on their dexterity.

This paper, motivated by the problem explained above, which is taxonomically referred to as ST-MR-TA : SP or MT-MR-TA : SP [24], addresses the dynamic MRTA problem in unknown environments by proposing an integrated multirobot task allocation and exploration solution. According to the hunter-and-gatherer scheme, we first present a decision-making mechanism based on the novel notion of expected gain (EG), which measures the density of available information in the surroundings of a potential job (task/frontier). The EG measurement is integrated into the concept of certainty and uncertainty profit margins, by which the levels of an agent's confidence and conservativeness are modeled. This decision-making mechanism shapes the background theory of both the proposed multirobot exploration and task allocation algorithms. Besides, this work introduces a coordination factor designated for gatherers, through which their behavior ranges from completely indifferent to highly coordinated with respect to the hunters' locations in the environment. By way of extensive simulations, we demonstrate that the effectiveness of the proposed algorithms is superior to that of the benchmark work [9]. Moreover, statistical analysis of the simulation results shows that the lack of effective coordination between hunters and gatherers significantly hurts the total effectiveness of the planning. Finally, it is shown statistically that the overall workload is distributed equally among agents of each type, which ensures that the proposed solution is not biased toward any agent and that all agents behave analogously under similar characteristics.

The remainder of this paper is organized as follows: the problem statement is presented in Section 2. In Section 3, the methodology and planning algorithms are discussed. Simulation results are presented in Section 4, followed by concluding remarks in Section 5.

Problem Statement
In this section, we present the problem formulation of the hunter-and-gatherer scheme in the context of dynamic MRTA. Assume that there are $m$ tasks distributed randomly over the environment, $E$. We consider the case in which the number and the locations of tasks are unknown to agents before the execution of the planning algorithms, called hunter-and-gatherer mission planning (HGMP). The set of tasks is denoted as $T = \{T_1, \ldots, T_m\}$, in which each task is split into hunting and gathering subtasks, i.e., $T_k = \{t_k^h, t_k^g\}$ with $1 \le k \le m$, where $t_k^h$ and $t_k^g$ represent the hunting and gathering subtasks, respectively. The set of agents is defined as $A = \{A^h, A^g\}$, which comprises two teams, hunters $A^h = \{a_i^h\}$ and gatherers $A^g = \{a_j^g\}$, where $1 \le i \le n_h$ and $1 \le j \le n_g$. The cost associated with $a_i^h$ for the accomplishment of $t_k^h$ is denoted as $c_{k,i}^h$, and $c_{k,j}^g$ is the cost associated with $a_j^g$ for the accomplishment of $t_k^g$.

Assumptions: throughout the paper, it is assumed that
(1) Tasks are stationary, i.e., they are fixed to their locations.
(2) The cost of accomplishing each task is linearly proportional to the distance that an agent moves to do the task. An agent is considered done with a task when it reaches the task's location.
(3) All agents of the same team are identical.
(4) All agents are rational, i.e., they intend to maximize their expected utility.
(5) All agents are fully autonomous and have their own utility functions, i.e., there exists no global utility function.
(6) Agents from complementary teams can communicate with each other using a stably connected network.
(7) Each location of the grid map is large enough to host multiple agents simultaneously.
(8) Agents autonomously avoid collisions while navigating in the same location of the grid map simultaneously.
Now, the HGMP problem can be stated as follows. Suppose that there exists a tuple for the mission such that $\mathrm{HGMP} = (E, A, T)$. $\Pi$ denotes the assignment function, which assigns $m$ tasks to $n = n_h + n_g$ agents such that $\Pi: T \mapsto A$. Under assumptions 1-6, the global objective $\Theta$ is to minimize the collective cost of $\Pi$, as stated in (1), where $x_k^i$ and $y_k^j$ are the binary decision variables for $t_k^h$ and $t_k^g$, respectively, as stated in (2). In (1), the weighting parameters $\rho_h$ and $\rho_g$ are introduced to sum the relative collective costs of the complementary teams, which differ because of the physical differences between the two agent types.
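One form of (1) and (2) that is consistent with the definitions above is sketched below. It rests on two assumptions that the surrounding text suggests but does not spell out: that the objective is a weighted sum of the per-team assignment costs over all agent-task pairs, and that each decision variable equals one exactly when the corresponding subtask is assigned to the corresponding agent.

```latex
% Sketch only: the double-sum structure and index ranges are assumptions
% inferred from the definitions of c, x, y, rho_h, and rho_g in the text.
\Theta = \min \left(
    \rho_h \sum_{i=1}^{n_h} \sum_{k=1}^{m} c_{k,i}^{h} \, x_k^i
  + \rho_g \sum_{j=1}^{n_g} \sum_{k=1}^{m} c_{k,j}^{g} \, y_k^j
\right) \tag{1}

x_k^i =
\begin{cases}
  1 & \text{if } t_k^h \text{ is assigned to } a_i^h, \\
  0 & \text{otherwise},
\end{cases}
\qquad
y_k^j =
\begin{cases}
  1 & \text{if } t_k^g \text{ is assigned to } a_j^g, \\
  0 & \text{otherwise}.
\end{cases} \tag{2}
```

Under this reading, choosing $\rho_h < \rho_g$ would express that a unit of travel is cheaper for a hunter than for a gatherer.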
This problem has a global objective $\Theta$, which can be achieved by determining the binary decision variables optimally. However, finding the optimal solution in multirobot path planning and multirobot task planning problems is NP-hard, as proven in [36-38]. That being said, addressing such problems from the agents' point of view is an admissible approach to finding relatively better solutions, i.e., locally optimal solutions. Since agents are rational, each agent's objective in distributed approaches is to maximize its own expected utility. Therefore, the aim of this paper is to design a distributed decision-making mechanism that allows agents to maximize their own expected utility while the individual efforts converge to a locally optimal solution from the society standpoint. In other words, the binary decision variables in $\Theta$ need to be determined by the agents throughout exploration and coordination in a distributed manner. This approach also necessitates a study of the hyperparameters of the proposed algorithm to demonstrate how locally optimal solutions are achieved by adjusting those parameters within their practically admissible ranges in each scenario.
Besides, in the proposed problem statement, the allocation problem is considered dynamic for two reasons. First, tasks are unknowingly distributed over the environment. Thus, agents do not have any prior information regarding the tasks' locations and need to explore the environment to identify them. Second, based on the problem statement, there are always $m$ tasks in the environment, i.e., when a task is accomplished by the agents, another task is distributed randomly over the environment. Altogether, it is not feasible, theoretically or practically, to complete the planning right after the start of a mission. Instead, only dynamic planning algorithms can cope with the unknown and dynamic nature of the environment.
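The constant-task-count rule can be illustrated with a short Python sketch. The function name `complete_task`, the grid of candidate cells, and the choice to exclude the just-completed location from respawning are assumptions made for illustration; the text only states that a new task appears at a random location whenever one is accomplished.

```python
import random


def complete_task(tasks, done, free_cells, rng):
    """Remove `done` and spawn one replacement so the task count stays at m."""
    if done not in tasks:
        raise ValueError("the completed task is not on the map")
    remaining = set(tasks) - {done}
    # Assumption: the new task may not reuse an occupied or just-freed cell.
    candidates = [c for c in free_cells if c not in remaining and c != done]
    if not candidates:
        raise ValueError("no free cell left for a new task")
    return remaining | {rng.choice(candidates)}


rng = random.Random(0)  # seeded only to make the illustration repeatable
free_cells = [(r, c) for r in range(4) for c in range(4)]
tasks = {(0, 0), (1, 2), (3, 3)}
tasks_after = complete_task(tasks, (1, 2), free_cells, rng)
print(len(tasks), len(tasks_after))
```

Because the replacement location is unknown to the agents, hunters have to keep exploring for the whole mission rather than only at its start.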

Conceptual Frameworks.
Hunters are assigned to explore the unknown environment to detect new tasks. According to the hunter-and-gatherer scheme, a detected task can only be completed by a gatherer. Since we aim to develop the planning algorithms in a distributed manner, there should be stably connected communication between agents from complementary teams. Accordingly, hunters announce the location of any newly detected task so that gatherers can decide about accomplishing it. Since there is no peer-to-peer communication and all communications are broadcast, we name the communication platform an "online board," through which gatherers are notified of the locations of new detections.
In this section, we develop reasoning mechanisms for both types of agents to properly achieve the global objective of this work mentioned in Section 2. We first illustrate the concept of certainty and uncertainty profit margins, which are the building blocks of the reasoning mechanism of both types of agents. Secondly, we propose a multirobot exploration algorithm for hunters in a distributed manner by introducing the notion of EG incorporated into the concept of profit margins. Subsequently, the way that gatherers accomplish detected tasks is delineated based on the same theoretical frameworks. In fact, we elucidate how the same theories of profit margins and EG can be generalized to develop the multirobot task planning and coordination algorithm of gatherers.

Notion of Profit Margins.
The rationale behind the idea of profit margins is to classify potential jobs (tasks/frontiers) in an environment into profitable, weakly-profitable, and nonprofitable types. When a job is profitable, the agent is confident about taking actions to accomplish it. On the other hand, the agent is conservative about potential jobs that are weakly-profitable and ignores nonprofitable jobs. The effort needed to accomplish a job is the factor that determines whether a job is profitable, weakly-profitable, or nonprofitable. According to the second assumption, the effort made by an agent to accomplish a job corresponds to the distance that it travels to reach the job. For example, the effort that a gatherer makes to accomplish a job is the distance that it travels to reach and accomplish a task. Similarly, the effort that a hunter makes to accomplish its job is the distance that it travels to explore the environment by reaching the frontiers. Now, we define the certainty and uncertainty profit margins (CPM and UPM) more specifically for both types of agents with respect to the accomplishment cost of a job.
CPM is the margin within which the travel distance is less than $R_c$ from the agent's perspective. UPM is the margin within which the travel distance is less than $R_u$ and greater than $R_c$ from the agent's point of view. Figure 1 shows the CPM and UPM conceptually as two concentric circles with the agent at the center. In this figure, Job 1 is included in the agent's CPM, so it is considered a profitable job and the agent is confident about accomplishing it. Further, the agent is conservative about completing Job 2, since it falls in the agent's UPM and is a weakly-profitable job. Finally, Job 3 is located beyond the agent's UPM, so it is not profitable, and the agent ignores it.
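The three-way classification can be sketched in a few lines of Python. This is a minimal illustration that assumes the travel distance to the job is already known; the function name `classify_job` and the boundary handling (a distance exactly equal to `r_c` is treated as weakly-profitable, and one equal to `r_u` as nonprofitable) are assumptions, since the text specifies only strict inequalities.

```python
def classify_job(travel_distance: float, r_c: float, r_u: float) -> str:
    """Classify a job by the travel distance an agent needs to reach it."""
    if not 0 < r_c < r_u:
        raise ValueError("expected 0 < r_c < r_u")
    if travel_distance < r_c:
        return "profitable"         # inside the CPM: the agent is confident
    if travel_distance < r_u:
        return "weakly-profitable"  # inside the UPM: the agent is conservative
    return "nonprofitable"          # beyond the UPM: the agent ignores the job


# Three jobs at increasing distance, in the spirit of Job 1, Job 2, and Job 3.
labels = [classify_job(d, r_c=5.0, r_u=10.0) for d in (3.0, 7.5, 12.0)]
print(labels)
```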
Since agents of both types operate in an environment in the presence of obstacles, we explain the concept of CPM and UPM for an agent operating in an occupancy grid map [39]. Figure 2 illustrates an occupancy grid map with an agent located at the center. In this figure, the concept of profit margins is applied to the probabilistic roadmaps (PRMs) generated for the agent's path planning. In other words, Figure 2 explains how an agent practically classifies jobs as profitable, weakly-profitable, and nonprofitable in a map by relying on the PRMs.
According to the hunter-and-gatherer scheme, a hunter agent relies on its profit margins to explore the environment, and a gatherer agent considers its profit margins to accomplish detected tasks. Given the definitions of UPM and CPM and the way that they can be applied to PRMs, we focus on developing the reasoning mechanisms for both types of agents in the subsequent sections.

Reasoning Mechanism: Hunters.
In this section, we develop a reasoning mechanism based on the definition of profit margins so that hunters explore the environment. In this regard, we utilize the frontier-based exploration concept to develop a CPM- and UPM-based multirobot exploration algorithm. The basic idea in a frontier-based exploration algorithm is that the explorer agent first selects a frontier point and then moves towards the selected frontier to explore unknown areas iteratively. Although we develop the reasoning mechanism for hunters in a distributed manner, we need to utilize an online shared map in which certain information about the map and frontiers is accessible to all agents. Hence, before developing the reasoning mechanism, we define a platform on which agents share their gained information.
We define an online board that contains the collective gained information about the environment's map. At the beginning of each mission, all cells of the occupancy grid map are marked as unknown. While hunters explore the map, each explored cell can be marked as an obstacle cell (cells with probability greater than 0.5 in the occupancy grid map), a free cell, or a task cell. Moreover, each unknown cell neighboring a known cell is marked as a frontier cell. By analyzing the data embedded in the online board, each hunter decides which frontier to select and explore in a distributed manner, relying on its reasoning mechanism. The reasoning mechanism splits into two steps: (1) the map updating process, i.e., the hunter updates some additional information on each frontier cell collectively, and (2) the decision process, i.e., the process by which the hunter chooses a frontier cell to explore.
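The frontier-marking rule on the online board can be sketched as follows. The cell symbols, the function name `find_frontiers`, and the use of a 4-connected neighborhood are assumptions made for illustration; the text only states that an unknown cell neighboring a known cell becomes a frontier cell.

```python
UNKNOWN, FREE, OBSTACLE, TASK = "?", ".", "#", "T"
KNOWN = {FREE, OBSTACLE, TASK}


def find_frontiers(grid):
    """Return the set of (row, col) unknown cells that touch a known cell."""
    rows, cols = len(grid), len(grid[0])
    frontiers = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != UNKNOWN:
                continue
            # Assumption: neighbors are the four cells sharing an edge.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] in KNOWN:
                    frontiers.add((r, c))
                    break
    return frontiers


# A 4x4 board where only the central 2x2 block has been explored so far.
board = [
    "????",
    "?..?",
    "?.#?",
    "????",
]
frontiers = find_frontiers(board)
print(sorted(frontiers))
```

In this example the four corner cells stay plain unknown cells, because under 4-connectivity none of them shares an edge with an explored cell.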
Regarding the first step, the hunter agent classifies all frontiers into three categories according to the definition of CPM and UPM, as illustrated in Figure 3. Then, the hunter updates the locations of detected frontiers on the online board and indicates whether the new frontiers fall within its CPM or UPM. To elaborate, each frontier cell keeps two factors called the certainty and uncertainty factors (CF and UF). The CF of a frontier cell indicates the number of hunters whose CPM includes the frontier cell. Similarly, the UF of a frontier cell indicates the number of hunters whose UPM includes the frontier cell. Accordingly, the hunter updates the CF and UF of all frontiers within its CPM and UPM. In each iteration, the hunter performs the map updating process first and then relies on the CF and UF information of the frontiers to proceed with the decision process.
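The CF and UF bookkeeping can be sketched as a simple counting pass. The function name `update_factors` and the use of straight-line distance are assumptions; in the paper the margins are applied to path lengths on the PRM, which this sketch replaces with Euclidean distance for brevity.

```python
from math import hypot


def update_factors(frontiers, hunters, r_c, r_u):
    """Return {frontier: (CF, UF)} counted over all hunter positions."""
    factors = {}
    for f in frontiers:
        cf = uf = 0
        for h in hunters:
            # Assumption: straight-line distance stands in for path length.
            d = hypot(f[0] - h[0], f[1] - h[1])
            if d < r_c:
                cf += 1   # the frontier lies in this hunter's CPM
            elif d < r_u:
                uf += 1   # the frontier lies in this hunter's UPM
        factors[f] = (cf, uf)
    return factors


hunters = [(0, 0), (10, 0)]
frontiers = [(2, 0), (6, 0), (20, 0)]
factors = update_factors(frontiers, hunters, r_c=3.0, r_u=7.0)
print(factors)
```

A frontier with a high CF is already claimed with confidence by several hunters, which is the signal the decision process uses to steer a hunter elsewhere.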
As elaborated above, we need to develop a decision process by which a hunter decides which frontier to explore, relying on the information updated on the online board. Here, we propose a method that considers the EG available from exploring a certain frontier and chooses the frontier with the maximum value of EG. This method has three main features: (1) the algorithmic method is developed in a distributed manner, so we propose the decision process for a single hunter agent, (2) the relative positions of other hunters are considered in the decision process (using CFs and UFs), which prevents hunters from rushing towards closely similar regions, and (3) the CFs and UFs of all neighboring frontiers are reflected in the definition of EG for a candidate frontier to ensure the previous property. The neighborhood of a frontier corresponds to the CPM of that frontier. As explained for the map updating process, the hunter classifies all frontiers into three classes with respect to its CPM and UPM. In this step, we clarify how EG is defined for frontiers within the agent's CPM. Afterwards, we develop the EG definition for frontiers within the hunter's UPM. Frontiers beyond the hunter's UPM are ignored by the agent, following the definition of profit margins.
Suppose that there are $\lambda_c$ frontiers within the hunter's CPM where $\lambda_c \ge 1$. Then, the frontiers within its CPM are indexed as $f_z^c$ with $1 \le z \le \lambda_c$, and the hunter $a_i^h$ calculates the EG of each, denoted as $\varepsilon_{c,z}^f$, according to (3). To put it simply, we have $\mathrm{EG} \propto \mathrm{distance}^{-1}$ for each frontier. Then, $\mathrm{distance}^{-1}$ is multiplied by a coefficient whose numerator is the total number of frontiers available in the candidate frontier's neighborhood and whose denominator is the collective CF of those frontiers. In other words, a higher numerator indicates that more frontiers lie in the candidate frontier's neighborhood, which the agent can easily explore once it visits the candidate. The denominator, in contrast, reflects the presence of other hunters within the candidate frontier's neighborhood.
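Combining the two statements above, one plausible form of the EG of a candidate frontier is sketched below. The distance symbol $d(\cdot,\cdot)$, the count $\lambda_z^c$ of frontiers in the candidate's neighborhood, and the per-frontier certainty factor $\sigma_p^c$ are notational assumptions, chosen to mirror the symbols used for gatherers later in the paper.

```latex
% Sketch only: d(.,.) is an assumed symbol for the travel distance, and the
% sum runs over the frontiers in the neighborhood (CPM) of the candidate.
\varepsilon_{c,z}^{f}
  = \frac{1}{d\left(a_i^h, f_z^c\right)}
    \cdot
    \frac{\lambda_z^c}{\sum_{p=1}^{\lambda_z^c} \sigma_p^c}
```

As a worked example under these assumptions, a candidate at travel distance 4 whose neighborhood holds three frontiers with CFs of 1, 1, and 2 would score $\frac{1}{4} \cdot \frac{3}{4} = 0.1875$. The same candidate with no competing hunters nearby (CFs of 1, 1, and 1) would score $\frac{1}{4} \cdot \frac{3}{3} = 0.25$.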
Altogether, $a_i^h$ calculates $\varepsilon_{c,z}^f$ for all frontiers within its CPM and then chooses the frontier with the maximum value of expected information gain, denoted as $f_\zeta^c$, according to (4). Similarly, EG can be defined for a candidate frontier within the hunter's UPM with a slight difference: both the collective CFs and UFs are considered. To explain, suppose there are $\lambda_u$ frontiers within the hunter's UPM where $\lambda_u \ge 1$ and $\lambda_c = 0$. Then, the set of frontiers within the hunter's UPM is defined analogously, and the EG of each is given by (5), where $a_i^h$ calculates $\varepsilon_{u,z}^f$ for all frontiers within its UPM and then chooses the frontier with the maximum value of EG, denoted as $f_\zeta^u$, according to (6).

The above procedures for selecting a frontier are combined into a frontier selection function. Algorithm 1 illustrates the procedure by which a hunter selects a frontier within its CPM or UPM. In line 3, the hunter uses the definition of profit margins, i.e., $R_c$ and $R_u$, to categorize all frontiers and updates the CF and UF of the frontiers on the online board. In lines 5 and 9, the agent utilizes (3) and (5), respectively, to calculate EGs. Further, the hunter uses (4) and (6) to choose the frontier with the highest value of EG in lines 7 and 9, respectively. The frontier selection function explained in Algorithm 1 needs to be invoked in the hunter's main algorithm. To that end, Algorithm 2 illustrates the main decision procedure for a hunter agent. In line 1, $\tau_{\max}$ denotes the maximum number of iterations for which the main procedure is executed. In line 2, the hunter checks whether its frontier buffer is empty in order to invoke the frontier selection function. In line 4, the agent updates the status of the selected frontier on the online board. Since the online board stores the locations of all agents as well as the frontier map of the environment, the CF and UF of each frontier are calculated in a centralized manner and are available on it.
When the selected frontier is located within the agent's CPM, then after the update, the frontier is no longer selectable by other agents. Otherwise, the agent only updates the status of the selected frontier to pending, which still allows other agents, i.e., agents whose CPM includes the selected frontier, to select it. In other words, when the selected frontier is within the agent's UPM, there are still chances for other, closer agents to select the frontier. This is a reassignment process that improves the assignment iteratively with respect to the dynamics of the environment. However, when the agent gets close enough to the selected frontier that the frontier becomes included in its CPM, the agent can update the status of the selected frontier such that no reassignment is allowed anymore. In line 7, the agent checks whether the selected frontier is still available. When the agent selects a frontier within its CPM, this condition is always true. When a frontier is selected and is still available, the hunter iteratively moves towards the selected frontier. In line 9, relying on the sensor data, the hunter checks whether a new task is detected while moving towards the selected frontier. In line 16, the hunter updates the newly detected frontiers on the online board according to the updated captured data while moving towards the selected frontier. To clarify, when a selected frontier is within the agent's CPM, the agent is responsible for exploring the corresponding area of the selected frontier, which makes it impossible for other agents to select that frontier.
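The selection and reassignment logic described above can be sketched in Python. This is an illustration under several assumptions, not the paper's Algorithm 1: straight-line distance replaces path length, the UPM score is assumed to divide by the sum of CFs and UFs, the status labels `"locked"` and `"pending"` are hypothetical names, and small guards are added to avoid division by zero.

```python
from math import hypot


def dist(a, b):
    # Assumption: straight-line distance stands in for the planned path length.
    return hypot(a[0] - b[0], a[1] - b[1])


def expected_gain(hunter, cand, factors, r_c, use_uf):
    """(1 / distance) * (number of neighbor frontiers / collective factor)."""
    neighbours = [f for f in factors if dist(f, cand) < r_c]  # includes cand
    collective = sum(
        factors[f][0] + (factors[f][1] if use_uf else 0) for f in neighbours
    )
    # Guards (assumptions): avoid dividing by a zero distance or a zero factor.
    return len(neighbours) / (max(dist(hunter, cand), 1e-9) * max(collective, 1))


def select_frontier(hunter, factors, r_c, r_u):
    """Return (frontier, status), or (None, None) if nothing is within the UPM."""
    in_cpm = [f for f in factors if dist(hunter, f) < r_c]
    in_upm = [f for f in factors if r_c <= dist(hunter, f) < r_u]
    if in_cpm:
        best = max(in_cpm, key=lambda f: expected_gain(hunter, f, factors, r_c, False))
        return best, "locked"   # other hunters can no longer select it
    if in_upm:
        best = max(in_upm, key=lambda f: expected_gain(hunter, f, factors, r_c, True))
        return best, "pending"  # a closer hunter may still take it over
    return None, None


# {frontier: (CF, UF)} as it might be stored on the online board.
near = {(2, 0): (1, 0), (2, 1): (1, 0), (-2, 0): (2, 0)}
far = {(5, 0): (0, 1), (0, 6): (1, 2)}
choice_near = select_frontier((0, 0), near, r_c=3.0, r_u=7.0)
choice_far = select_frontier((0, 0), far, r_c=3.0, r_u=7.0)
print(choice_near, choice_far)
```

In the first call the hunter prefers (2, 0) over the equally distant (-2, 0), because the latter has a neighbor-free surrounding and a CF of 2, indicating another hunter close by.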

Reasoning Mechanism: Gatherers.
In this section, we develop a reasoning mechanism based on profit margins so that gatherers accomplish the detected tasks efficiently. On this subject, we develop a task-selection algorithm similar to the frontier selection algorithm in the previous section, but we also consider the coordination between gatherers and hunters. To this end, the EG for a task is a function of the locations of tasks and frontiers. The locations of tasks play the main role in calculating the EG, but the locations of frontiers are also considered to introduce the coordination factor between a gatherer and the hunters. This enables a gatherer agent to prioritize tasks surrounded by more frontiers, because any region with a higher density of frontiers is more likely to attract hunters. This reasoning establishes coordination between gatherers, which are accomplishing detected tasks, and hunters, which are exploring the environment by visiting the frontiers.

The reasoning mechanism for a gatherer agent splits into two steps: (1) the map updating process, i.e., the gatherer updates some additional information on each task-marked cell, and (2) the decision process, i.e., the process by which the gatherer chooses a task to accomplish. In the map updating process, the gatherer agent classifies all detected tasks into three categories according to the definition of CPM and UPM. Then, the gatherer updates the CF and UF of each task-marked cell. The CF of a task indicates the number of gatherers whose CPM includes the task. The UF of a task-marked cell indicates the number of gatherers whose UPM includes the task. Accordingly, the gatherer updates the CF and UF of all detected tasks within its CPM and UPM. In each iteration, the gatherer performs the map updating process first and then relies on the CF and UF information of the detected tasks to proceed with the decision process.
To develop the decision process for a gatherer agent, we first clarify how EG is defined for tasks within agent's CPM and then we develop the EG definition for tasks within gatherer's UPM.
Suppose that there are $\kappa_c$ tasks within the gatherer's CPM where $\kappa_c \ge 1$. Then, the set of tasks within its CPM is defined as $T^c = \{t_z^c\}$ where $1 \le z \le \kappa_c$. The gatherer, denoted as $a_j^g$, needs to calculate the EG for all members of $T^c$ and then choose the task with the highest value of EG. A primary factor that affects the EG of a task is the distance between the gatherer and the task, such that $\mathrm{EG} \propto \mathrm{distance}^{-1}$.
This proportionality needs to be completed by considering other conditions of the task to obtain a more accurate definition of EG. Let $t_z^c$ denote the candidate task that $a_j^g$ analyzes to calculate its EG. The gatherer $a_j^g$ needs to know how many other tasks would be available within its CPM if it visited $t_z^c$, and what the collective CF of those tasks is. Let $\kappa_z^c$ denote the number of tasks within the CPM of $t_z^c$. Accordingly, the set of expected tasks with respect to $t_z^c$ is defined as $t_z^{*c} = \{t_{z,p}^c\}$ where $1 \le p \le \kappa_z^c$. Next, $a_j^g$ calculates the collective CF of all members in $t_z^{*c}$. Now, $\mathrm{EG} \propto \mathrm{distance}^{-1}$ is completed by adding a coefficient, which is the ratio of $\kappa_z^c$ to the collective CF of $t_z^{*c}$. The set of EGs for all tasks within the agent's CPM is denoted as $\varepsilon_c^t = \{\varepsilon_{c,z}^t\}$ where $1 \le z \le \kappa_c$ and $\varepsilon_{c,z}^t$ denotes the EG of the candidate task $t_z^c$. Further, the CF of a member of $t_z^{*c}$ is denoted as $\sigma_p^c$. Therefore, $\varepsilon_{c,z}^t$ of a candidate task within the agent's CPM is given by (7). To complete (7), we also need to consider the coordination between the gatherer and the hunters by accounting for the availability of frontiers within the CPM of the candidate task. To that end, we multiply the right side of (7) by a coordination term $(1 + \mu \lambda_z^c)$, where $\lambda_z^c$ and $\mu$ denote the number of frontiers within the CPM of $t_z^c$ and the coordination coefficient, respectively. Thus, $\varepsilon_{c,z}^t$ of a candidate task within the agent's CPM is defined by (8). Similarly, $a_j^g$ calculates $\varepsilon_{c,z}^t$ for all tasks within its CPM and then chooses the task with the maximum value of EG such that $\zeta = \arg\max(\varepsilon_c^t)$, where $t_\zeta^c$ denotes the chosen task. The admissible values for the coordination coefficient are $\mu \ge 0$. However, its optimal value depends on the dimensions and floor map of the environment as well as on the number of hunter agents.
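The effect of the coordination term can be illustrated with a small Python sketch that follows the verbal description: inverse distance, times the ratio of the task count to the collective CF, times $(1 + \mu \lambda_z^c)$. The function name `task_expected_gain`, its argument names, and the example numbers are assumptions made for illustration.

```python
def task_expected_gain(distance, neighbour_cfs, n_frontiers, mu):
    """EG of a candidate task inside the gatherer's CPM.

    distance      -- travel distance from the gatherer to the candidate task
    neighbour_cfs -- CF of every task inside the candidate's CPM
    n_frontiers   -- number of frontiers inside the candidate's CPM
    mu            -- coordination coefficient, mu >= 0
    """
    if distance <= 0 or mu < 0 or sum(neighbour_cfs) <= 0:
        raise ValueError("need distance > 0, mu >= 0, and a positive collective CF")
    base = (1.0 / distance) * (len(neighbour_cfs) / sum(neighbour_cfs))
    return base * (1.0 + mu * n_frontiers)


# Hypothetical candidates: a nearer task with no frontiers around it and a
# slightly farther task surrounded by three frontiers.
candidates = {
    "near": dict(distance=4.0, neighbour_cfs=[1], n_frontiers=0),
    "far": dict(distance=5.0, neighbour_cfs=[1], n_frontiers=3),
}


def choose(mu):
    return max(candidates, key=lambda k: task_expected_gain(mu=mu, **candidates[k]))


print(choose(mu=0.0), choose(mu=0.5))
```

With $\mu = 0$ the gatherer is indifferent to the hunters and picks the nearer task (0.25 versus 0.2). With $\mu = 0.5$ the farther task scores $0.2 \times 2.5 = 0.5$ and is preferred, since it lies in a region that hunters are likely to explore next.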
By the same token, EG can be defined for a candidate task within a gatherer's UPM with a slight difference. In this case, both the collective CFs and UFs are considered to define EG for a candidate task. To illustrate, suppose that there are $\kappa_u$ tasks within the gatherer's UPM, where $\kappa_u \geq 1$ and $\kappa_c = 0$. Then, the set of tasks within its UPM is defined as $T_u = \{t^u_z\}$, where $1 \leq z \leq \kappa_u$. The gatherer, denoted as $a^g_j$, needs to calculate EG for all members of $T_u$ and then choose the task with the highest value of EG. Let $t^u_z$ denote the candidate task whose EG $a^g_j$ aims to calculate. The gatherer needs to know how many tasks will be available within its CPM if it visits $t^u_z$, and what the collective CF and UF of those tasks are. Let $\kappa^u_z$ denote the number of tasks within the CPM of $t^u_z$. Accordingly, the set of expected tasks with respect to $t^u_z$ is defined as $t^{*u}_z = \{t^u_{z,p}\}$, where $1 \leq p \leq \kappa^u_z$. Next, $a^g_j$ calculates the collective CF and UF of all members in $t^{*u}_z$. The set of EG values for all tasks within the agent's UPM is denoted as $\varepsilon^t_u = \{\varepsilon^t_{u,z}\}$, where $1 \leq z \leq \kappa_u$ and $\varepsilon^t_{u,z}$ denotes the EG of the candidate task $t^u_z$. Further, the CF and UF of a member of $t^{*u}_z$ are denoted as $\sigma^u_p$ and $\omega^u_p$, respectively. Additionally, $\lambda^u_z$ denotes the number of frontiers within the CPM of $t^u_z$. Equation (9) expresses $\varepsilon^t_{u,z}$ of a candidate task within the agent's UPM in these terms. Similarly, $a^g_j$ calculates $\varepsilon^t_{u,z}$ for all tasks within its UPM and then chooses the task with the maximum value of EG such that $\zeta = \arg\max(\varepsilon^t_u)$, where $t^u_\zeta$ denotes the chosen task. The task-selection procedure closely follows the procedure illustrated in Algorithms 1 and 2. The main difference is the way in which a gatherer calculates EG for all detected tasks, as given by (8) and (9).

Simulation Results
In this section, we put the exploration and coordination algorithms in the hunter-and-gatherer scenario to the test by running extensive simulations and investigating the performance of the proposed method statistically from multiple aspects. First, we aim to validate the fairness of the proposed task allocation algorithm. This validation, which is carried out by comparing agents' effectiveness in a set of experiments and analyzing the results with paired T-test and ANOVA [40] methods, ensures that the overall workload of a mission is distributed equally among agents of both types. Thereafter, we study the effect of profit margins on the total effectiveness of the proposed methods through parameter studies on $R_C$ and $R_U$. After that, we study the efficacy of the introduced coordination factor for gatherers by investigating its effect on the planning's total effectiveness. As the final steps, the functionality of the proposed method is examined through comparisons. To that end, we first compare the performance of the proposed method with the benchmark hunter-and-gatherer approach introduced in the authors' previous work in multiple environments, and then the functionality of the exploration and coordination algorithms in the context of the hunter-and-gatherer scheme is verified by comparing its performance with a basic alternative method in which each agent does both hunting and gathering itself. These two comparisons show that the newly proposed method outperforms the benchmark method and is also superior to the non-hunter-and-gatherer approaches.
To simulate the proposed approaches, we developed a multirobot simulation platform in MATLAB from scratch. In this platform, we can run the simulations on any custom map, while the number of each type of agent is adjustable. We provide some basic functions for each type of agent to enable them to maneuver over the given environment. For gatherers, we utilized an A*-based motion planning algorithm, which enables them to move between two points in a grid environment. The number of tasks is also adjustable, and tasks are located randomly over the environment. We also provided a perpetual mode for running the simulations in which, for each gathered task, another task is distributed randomly in the environment. Accordingly, at each iteration, a fixed number of tasks is available in the environment, and this number is adjustable for each mission. Further, in the perpetual mode, each explored and known grid cell of the environment turns into an unknown cell after a certain number of iterations. The perpetual mode makes the analysis more accurate and evidence-based.
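The two rules of the perpetual mode can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' MATLAB platform: obstacles are ignored, the class name `PerpetualWorld` and its methods are hypothetical, and `forget_after` stands in for the unspecified number of iterations after which a known cell becomes unknown again.

```python
import random
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


class PerpetualWorld:
    """Grid world that keeps a fixed number of tasks and forgets explored cells."""

    def __init__(self, size: int, n_tasks: int, forget_after: int,
                 seed: Optional[int] = None) -> None:
        if n_tasks >= size * size:
            raise ValueError("n_tasks must be smaller than the number of cells")
        self.size = size
        self.forget_after = forget_after        # hypothetical parameter
        self.rng = random.Random(seed)
        self.tasks: Set[Cell] = set()
        self.known_since: Dict[Cell, int] = {}  # cell -> iteration it was explored
        while len(self.tasks) < n_tasks:
            self._spawn_task()

    def _spawn_task(self) -> None:
        """Place one new task on a uniformly random cell that holds no task."""
        while True:
            cell = (self.rng.randrange(self.size), self.rng.randrange(self.size))
            if cell not in self.tasks:
                self.tasks.add(cell)
                return

    def explore(self, cell: Cell, iteration: int) -> None:
        """Mark a cell as known, remembering when it was explored."""
        self.known_since[cell] = iteration

    def is_known(self, cell: Cell) -> bool:
        return cell in self.known_since

    def gather(self, cell: Cell) -> None:
        """Remove a gathered task and respawn one, keeping the task count constant."""
        self.tasks.remove(cell)
        self._spawn_task()

    def step(self, iteration: int) -> None:
        """Turn cells explored at least `forget_after` iterations ago back to unknown."""
        expired = [c for c, t in self.known_since.items()
                   if iteration - t >= self.forget_after]
        for c in expired:
            del self.known_since[c]
```

A mission loop would call `step` once per iteration, so the explored region keeps decaying and agents always have something left to explore.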
All simulations have been executed under the following conditions: (1) the environment is sectioned as an $e_L \times e_W$ grid of tiles, where $e_L = e_W = 100$, (2) the numbers of agents of each type are set to $n_h = 4$ and $n_g = 2$, (3) there are always $m_p = 25$ tasks in the environment, (4) the maximum number of iterations is $\tau_{\max} = 1000$, and (5) the weighting parameters are set such that $\rho_h/\rho_g = 0.2$. Moreover, the simulations have been conducted with abstract agents in order to evaluate the proposed algorithms more generically and without bias toward any specific type of agent.

Task Allocation Fairness.
To demonstrate that the workload of task accomplishment is distributed equally among agents of each type, the concept of fairness is introduced. We need to investigate the task allocation algorithms from the fairness perspective for two main reasons: (1) to show that the allocation is not biased toward a particular agent by ensuring that agents behave analogously under similar characteristics, and (2) to confirm that there is no imbalance in agents' involvement in a mission, which, practically speaking, results in equal wear and tear of individual robots operating in real-world situations.
We define an effectiveness factor for each agent of both types based on its costs and accomplishments. Then, using statistical analysis, the fairness of the HGM can be assessed by comparing the effectiveness of different agents of each type. Let $\eta^h_i$ and $c^h_i$ denote the effectiveness of $a^h_i$ and the number of tasks hunted by the agent, respectively, as related in (10). Similarly, $\eta^g_j$ and $c^g_j$ denote the effectiveness of $a^g_j$ and the total number of tasks gathered by the agent, respectively, as related in (11). To investigate the fairness of the proposed algorithms, we ran 100 missions and recorded the agents' effectiveness according to (10) and (11).
Then, utilizing statistical hypothesis testing, we assess the fairness for each type of agent by testing whether the averages of the agents' effectiveness over 100 missions are statistically identical. That is, the hypothesis testing is applied to the mean of the 100 recorded effectiveness values for each hunter and gatherer. Figure 5(a) shows the statistical results of $\eta^h_i$ for all hunters. As $n_h = 4$, an ANOVA test has been applied to the collected data to statistically assess the fairness of the proposed algorithms for hunters. The ANOVA test takes the null hypothesis $H_0: \mu^h_1 = \mu^h_2 = \mu^h_3 = \mu^h_4$ against the alternative that at least one mean differs, where $\mu^h_i$ denotes the average of $\eta^h_i$ for $a^h_i$ in 100 tests and $\alpha$ denotes the significance level. According to the results of the ANOVA test, we have $F = 0.39$, $F_{\mathrm{crit}} = 2.62$, and $p\text{-value} = 0.75$. Since $F < F_{\mathrm{crit}}$ and $p\text{-value} > \alpha$, we must retain the null hypothesis. Thus, the results are consistent with $\mu^h_1 = \mu^h_2 = \mu^h_3 = \mu^h_4$. In addition, as $n_g = 2$, a paired T-test has been applied to the data to investigate the fairness of the proposed algorithms for gatherers. The hypothesis testing has been done with $H_0: \mu^g_1 - \mu^g_2 = D_0$, $n_s = 100$, $\mathrm{dof} = 99$, and $\alpha = 0.05$. According to the test, $p\text{-value} = 0.1$. Since $p\text{-value} > \alpha$, we must retain the null hypothesis. Therefore, the results are consistent with $\mu^g_1 - \mu^g_2 = D_0 = 0$, as illustrated in Figure 5(b).
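For readers who wish to reproduce the hunters' fairness test, the one-way ANOVA F statistic can be computed as in the following minimal Python sketch. It uses only the standard library; the function names `anova_f` and `retain_null` are hypothetical, and the critical value is assumed to be supplied by the reader (for example, from an F table). With four hunters and 100 missions each, the degrees of freedom are 3 and 396.

```python
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square.

    Raises ZeroDivisionError if all observations equal their group mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    means = [mean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def retain_null(f_stat: float, f_crit: float) -> bool:
    """Decision rule used in the text: retain H0 when F < F_crit."""
    return f_stat < f_crit
```

Each inner sequence would hold the 100 recorded effectiveness values of one hunter; applying `retain_null` to the reported values $F = 0.39$ and $F_{\mathrm{crit}} = 2.62$ retains the null hypothesis.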

Effect of Agent's Profit Margins on the Total Effectiveness.
The proposed algorithms rely strongly on the introduced definitions of profit margins, as discussed in the methodology section. Accordingly, we need to study the effect of the profit margins' parameters on a mission's effectiveness to demonstrate their functionality for both types of agents. For this reason, we define the effectiveness of a mission, denoted as $\eta_t$, as the ratio of the total number of completed tasks, $c_t$, to the collective cost of the whole mission, $C_t$, as follows:

$$\eta_t = \frac{c_t}{C_t}. \qquad (14)$$

Regarding (14), extensive simulations have been run for all values of $R^h_c$ and $R^h_u$, i.e., the profit margins of hunters. $\eta_t$ has been calculated for each set of values of $R^h_c$ and $R^h_u$, as illustrated in Figure 6(a). The results show that $\eta_t$ ranges from 0 to 0.012 as the values of $R^h_c$ and $R^h_u$ change over the whole simulation. As Figure 6(a) displays, the total effectiveness reaches its maximum value when $R^h_c = 10$ and $R^h_u = 110$. Besides, the slight increase in the total effectiveness in Figure 6(a) for small values of $R^h_c$ and $R^h_u$ can be interpreted as an effect of running the simulations in perpetual mode. In perpetual mode, known areas become unknown after a certain number of iterations, which inevitably favors agents with smaller profit margins. Because of that, although a hunter agent with small values of $R^h_c$ and $R^h_u$ cannot explore distant frontiers to detect tasks, its surrounding explored areas become unknown again, with the possibility that new tasks appear there. However, the ensemble effect of the hunters' profit margins ensures the existence of a maximum for $\eta_t$.
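The parameter study can be pictured as an exhaustive sweep over the two profit margins that keeps the pair with the highest mission effectiveness. The Python sketch below is illustrative only: `run_mission` is a hypothetical stand-in for the simulator that returns the number of completed tasks and the collective cost, and treating the effectiveness as zero when no cost is incurred is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple


def mission_effectiveness(completed_tasks: int, total_cost: float) -> float:
    """Ratio of completed tasks to collective cost; 0 if no cost (assumption)."""
    return completed_tasks / total_cost if total_cost > 0 else 0.0


def best_margins(
    run_mission: Callable[[float, float], Tuple[int, float]],
    rc_values: Iterable[float],
    ru_values: Iterable[float],
) -> Optional[Tuple[float, float, float]]:
    """Sweep all (R_c, R_u) pairs and return the pair with the highest effectiveness."""
    ru_list = list(ru_values)  # materialize so the inner loop can be repeated
    best: Optional[Tuple[float, float, float]] = None
    for rc in rc_values:
        for ru in ru_list:
            completed, cost = run_mission(rc, ru)
            eta = mission_effectiveness(completed, cost)
            if best is None or eta > best[2]:
                best = (rc, ru, eta)
    return best


def toy_mission(rc: float, ru: float) -> Tuple[int, float]:
    """Toy stand-in for a simulator: cost is lowest at (10, 110). Not real data."""
    return 100, 10000.0 + (rc - 10) ** 2 + (ru - 110) ** 2


best = best_margins(toy_mission, range(0, 50, 10), range(100, 130, 10))
```

In practice each `run_mission` call would itself average several simulated missions, since a single run is noisy.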
Similar simulations have been run for all values of $R^g_c$ and $R^g_u$, i.e., the profit margins of gatherers, such that $1 \leq R^g_c \leq 136$. $\eta_t$ has been calculated for each set of values of $R^g_c$ and $R^g_u$, as illustrated in Figure 6(b). The results show that $\eta_t$ ranges from 0 to 0.012 as the values of $R^g_c$ and $R^g_u$ change over the whole simulation. According to the results, the total effectiveness reaches its maximum value when $R^g_c = 30$ and $R^g_u = 40$. The main conclusion to be drawn is that the introduced profit margin parameters for both types of agents have a distinct effect on the total effectiveness and that there exists a maximum value for $\eta_t$. Moreover, according to the proposed methodology for both types of agents, when $R_c$ decreases, the agent becomes less confident, and when $R_u$ increases, the agent becomes less conservative. In this regard, for both types of agents, the best strategy to reach the maximum of $\eta_t$ is to be neither completely confident nor fully conservative; a combination of both leads to the optimum result.

Effect of Coordination Factor on the Total Effectiveness.
In this section, we aim to investigate the effect of the coordination factor, introduced in Section 3.4, on the mission's effectiveness. To conduct a comprehensive investigation, we carry out the experiments in three different grid maps with various levels of obstacle complexities: (a) a simple grid map containing two straight barriers, (b) a grid map containing sparse obstacles, and (c) a confined grid map containing narrow corridors and confined rooms, as depicted in Figure 7.
After defining three different grid maps, we ran the algorithm 200 times for different values of $\mu$ in each defined grid map, as illustrated in Figure 8. First, these results show that there is a value of $\mu$ in each grid map that leads to a maximum value of $\bar{\eta}_t$, which denotes the average of $\eta_t$ over 200 tests. Given that, we want to know how much $\bar{\eta}_t$ increases when the coordination factor changes from $\mu = 0$ to $\mu = \mu_{\max}$. In fact, this investigation compares two cases: (1) task planning without any coordination between gatherers and hunters ($\mu = 0$), and (2) task planning with the optimum value of the coordination factor ($\mu = \mu_{\max}$). For that purpose, we applied a paired T-test to two of the collected datasets from Figure 8. The test has been conducted considering $H_0: \eta_2 - \eta_1 \leq D_0$, $H_1: \eta_2 - \eta_1 > D_0$, $D_0 = 0.15\eta_1$, $n_s = 200$, $\mathrm{dof} = 199$, and $\alpha = 0.05$, where $\eta_1$ and $\eta_2$ denote the average of $\eta_t$ for the first and second datasets, respectively. According to the test result, $p\text{-value} = 0.017$, $t = 2.12$, and $t_{0.05,99} = 1.65$. Since $t > t_{0.05,99}$ and $p\text{-value} < \alpha$, we reject $H_0$. Therefore, the results indicate that $\bar{\eta}_t$ increases by more than 15 percent when $\mu$ changes from 0 to 0.4.
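The one-sided paired test with a nonzero margin $D_0$ can be reproduced with the short Python sketch below. It is a sketch using only the standard library, with hypothetical function names; the critical value is assumed to come from a t table, and the margin is set to 15 percent of the first dataset's mean, following the definition $D_0 = 0.15\eta_1$ in the text.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(first: Sequence[float], second: Sequence[float],
                       d0: float = 0.0) -> float:
    """t = (mean(d) - D0) / (s_d / sqrt(n)), where d_i = second_i - first_i.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("datasets must be paired and contain at least two samples")
    diffs = [b - a for a, b in zip(first, second)]
    return (mean(diffs) - d0) / (stdev(diffs) / sqrt(len(diffs)))


def relative_margin(first: Sequence[float], fraction: float = 0.15) -> float:
    """Margin D0 expressed as a fraction of the first dataset's mean."""
    return fraction * mean(first)


def reject_null(t_stat: float, t_crit: float) -> bool:
    """One-sided decision rule used in the text: reject H0 when t > t_crit."""
    return t_stat > t_crit
```

Here `first` would hold the 200 values of $\eta_t$ recorded with $\mu = 0$ and `second` the 200 values recorded with the optimum $\mu$; with the reported $t = 2.12$ and critical value 1.65, `reject_null` returns `True`.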
Likewise, we applied the same statistical analysis to the datasets collected from the two other grid maps, i.e., the environments with sparse obstacles and with confined obstacles, as shown in Figures 8(b) and 8(c). According to the analysis for the environment with sparse obstacles, $\eta_t$ increases by 35 percent when $\mu$ changes from 0 to its optimum value, which is 0.6. In addition, the analysis suggests that changing $\mu$ from 0 to its optimum value for the environment containing confined obstacles increases $\eta_t$ by 60 percent.
Considering the statistical analysis of the results in all three environments, there are two insightful implications about the effects of the coordination factor in different environments. First, the more complex the obstacles in the environment, the larger the drop in the total effectiveness of the planning algorithms. In this regard, the total effectiveness without any coordination between hunters and gatherers ($\mu = 0$) drops by 32.4% from the environment with simple obstacles to the one with sparse obstacles. This drop is even more severe, 45.7%, when the obstacles of the environment change from simple to confined ones. All told, the lack of coordination between cooperating agents, i.e., hunters and gatherers, is a relatively more critical problem when the environment comprises more complex obstacles and confined, narrow corridors.
Secondly, when the environment contains more complex obstacles, it takes higher values of the coordination factor $\mu$ to prevent the total effectiveness from dropping significantly. As the results presented in Figure 8 suggest, the optimum value of $\mu$ is 0.4 in the environment with simple obstacles, while the optimum value increases to 0.6 and 1 with sparse and confined obstacles, respectively. Again, this finding emphasizes the criticality and necessity of coordination between cooperating agents from complementary teams.

Functionality Validation of the Hunter-and-Gatherer Approach.
In this section, we aim to compare the newly developed planning algorithms addressing the exploration and coordination aspects of the hunter-and-gatherer scenario with the benchmark hunter-and-gatherer mission planning introduced in [9]. To that end, we conduct the comparison in three environments containing different configurations of obstacles, as described in Figure 7. For each solution, 200 tests have been carried out in each environment, and Figure 9 displays the results. To draw a valid comparison, the parameters shared by both solutions are set identically as $n_h = 4$, $n_g = 2$, $m_p = 50$, $e_L = e_W = 100$, $\tau_{\max} = 1000$, and $\rho_h/\rho_g = 0.2$. Other specific parameters are set to their optimal values for both solutions.
According to the results, the new approach presented in this paper performs significantly better than the benchmark hunter-and-gatherer mission planning in all three environments. To demonstrate this statistically for the environment with simple obstacles, we apply a paired T-test to two of the collected datasets, where the first dataset contains 200 measures of $\eta_t$ for the new approach introduced in this work, and the second dataset comprises 200 measures of $\eta_t$ for the benchmark hunter-and-gatherer approach. The statistical analysis suggests that the new approach is more effective than the benchmark approach by 14 percent, where $n_s = 200$, $\mathrm{dof} = 199$, $p\text{-value} = 0.026$, $t = 1.95$, and $t_{0.05,99} = 1.66$. By the same token, the new approach outperforms the benchmark solution by 28% and 36% in the environments with sparse and confined obstacles, respectively. The analysis discussed above indicates that the amount of improvement is correlated with the complexity of the obstacles in the environment. That is, addressing the exploration and coordination aspects of the hunter-and-gatherer scenario is much more critical when the environment comprises obstacles, especially dense obstacles.
That is mainly because, when the benchmark approach operates in confined environments with dense obstacles, agents are more likely to drift far away from each other, which eliminates any overlap between the agents' profit margins. Consequently, all negotiations end up being declined, since the distance factor makes them unprofitable. Although in the new approach the agents still rely on their profit margins to make decisions, the coordination factor between hunters and gatherers helps keep a reasonable overlap in their profit margins. This leads to considerably more effective and efficient performance in environments with obstacles, compared to the benchmark method.
To further investigate the performance of the proposed planning method, we incorporated different multirobot exploration algorithms into the developed platform and measured the total effectiveness of the planning, as defined in equation (14). In other words, we maintained the proposed algorithm for gatherers and investigated the performance of the proposed method with different search algorithms for hunters, including the one proposed in this paper, called EG-based frontier selection. This experiment has been conducted on the confined grid map introduced in Figure 7(c), and for each exploration algorithm, the experiment has been repeated 20 times for each value of $n_t$. As demonstrated in Figure 10, the random walk multiagent exploration algorithm [41] performed poorly compared to the other algorithms. This is mainly because the random walk search algorithm neglects the distance to the frontiers and disregards the locations of other hunters in the environment. By considering the distance to candidate frontiers, the multiagent frontier-based exploration algorithm [42] enhances the total effectiveness of the planning by 93.4% compared to the results with the random walk algorithm. The best performance of the frontier-based exploration algorithm occurs with eight hunters in the confined environment, while the random walk algorithm requires ten hunters in the very same environment to reach its best performance.
Alternatively, the scene partitioning multiagent exploration algorithm [20] performs drastically better than the frontier-based algorithm and enhances the total effectiveness. The partitioning mechanism of this algorithm decreases the conflict and overlap between explorers, which leads to relatively more efficient performance. Furthermore, the scene partitioning algorithm requires six hunters to perform optimally. Finally, the EG-based multirobot exploration algorithm presented in this paper outperforms all three algorithms. Although the EG-based exploration algorithm similarly requires six hunters to perform optimally, it improves on the scene partitioning results by 17.8%. In fact, the EG-based exploration algorithm facilitates a dynamic and temporal partitioning, which provides more flexibility and controlled freedom for agents to decide and plan optimally.

Functionality Validation of the Hunter-and-Gatherer Approach.
To validate the functionality of the hunter-and-gatherer approach, we compared the proposed approach with an alternative approach in which there is only one type of agent doing both exploration and completion of tasks. The goal of this comparison is to answer two critical questions: (1) is the hunter-and-gatherer scheme more economical than the alternative approach? and (2) what criterion needs to be satisfied for the hunter-and-gatherer scheme to be relatively economical? The hunter-and-gatherer approach fundamentally differs from the alternative approach in employing two types of agents, agile (hunters) and dexterous (gatherers). In contrast to the alternative approach, each task takes two agents, that is, a hunter and a gatherer, to be completed in the hunter-and-gatherer scheme. This makes the ratio of the weighting parameters $\rho_h/\rho_g$ an essential factor in the study. Hence, the criterion judging the functionality of the hunter-and-gatherer scheme should be expressed with respect to the ratio of the weighting parameters. To that end, we ran the algorithm for different values of $\mu$ in each map and compared its total effectiveness with the total effectiveness of the alternative approach in the three previously defined environments, as shown in Figure 11. According to the results, it is economical to employ the proposed hunter-and-gatherer approach for dynamic ST-MR-TA:SP problems if and only if we utilize hunter and gatherer agents that satisfy approximately $\rho_h/\rho_g < 0.45$. In other words, if the accomplishment cost of a hunter agent for a certain task is less than 45 percent of the gatherer's accomplishment cost for the same task, then employing the hunter-and-gatherer scheme, instead of the alternative approach, is relatively economical.

Conclusion
This paper addresses the multirobot task allocation problem in dynamic environments and focuses on the exploration and coordination aspects of the hunter-and-gatherer scheme. Along this line of thought, we proposed an innovative decision-making mechanism based on the novel notion of EG, which measures the density of available information in the surroundings of a potential job (task/frontier). We demonstrated that applying the EG-based decision-making mechanism to hunters and gatherers to address the exploration and task allocation aspects of the problem improves the performance of the hunter-and-gatherer scheme significantly compared to the authors' previous negotiation-based solution. We found that the significance of the improvement is correlated with the complexity of obstacles in the environment. Besides, this work proposed an EG-based coordination algorithm for gatherers, which led to a substantial increase in the planning's effectiveness. Statistical analysis of the simulation results suggests that there is an optimum value of the coordination factor that maximizes the planning's total effectiveness. Likewise, we found that the optimum value of the coordination factor varies for environments with different densities and difficulties of obstacles. Collectively, the higher the complexity and difficulty of the obstacles in an unknown environment, the greater the improving effect of the proposed method on the planning's total effectiveness. Moreover, we showed that employing two complementary teams of hunters and gatherers can effectively improve the total effectiveness of the task allocation in a mission. However, this is only true when the defined judging criterion, associated with the ratio of the weighting parameters, is adequately satisfied.
Practically speaking, the affordability criteria comparing relative costs of each type of agent are straightforwardly satisfiable, as the USAR case exemplifies a real-world problem where the relative accomplishment costs of hunters (small UAVs) and gatherers (heavy-duty UGVs) satisfy the defined criteria.
Future research should define the cost function more comprehensively by considering the communication burden between agents. This provides more realistic settings to evaluate the efficacy of solutions utilizing communication, such as the hunter-and-gatherer scheme. Besides, further research is needed to confirm the functionality of the hunter-and-gatherer scheme in practice by setting up a multirobot test bench to study the complexities and limitations of the developed theories in the context of the hunter-and-gatherer approach. Future work can also consider optimizing the control parameters introduced in this work using a wide variety of optimization or learning methods.

Data Availability
The data and source code used to support the findings of this study are available from the corresponding author upon request. This paper has been archived at Cornell University Library [43].

Conflicts of Interest
The authors declare that they have no conflicts of interest.