Multi-Agent Task Allocation in Complementary Teams: A Hunter and Gatherer Approach

Consider a dynamic task allocation problem where tasks are unknowingly distributed over an environment. This paper considers each task to be composed of two sequential subtasks: detection and completion, where each subtask can only be carried out by a certain type of agent. We address this problem using a novel nature-inspired approach called "hunter and gatherer". The proposed method employs two complementary teams of agents: one agile in detecting tasks (hunters) and another skillful in completing them (gatherers). To minimize the collective cost of task accomplishment in a distributed manner, a game-theoretic solution is introduced to couple agents from complementary teams. We utilize market-based negotiation models to develop incentive-based decision-making algorithms relying on the novel notions of "certainty and uncertainty profit margins". The simulation results demonstrate that employing two complementary teams of hunters and gatherers can effectively improve the number of tasks completed by agents compared to conventional methods, while the collective cost of accomplishment is minimized. In addition, the stability and efficacy of the proposed solutions are studied using Nash equilibrium analysis and statistical analysis, respectively. It is also numerically shown that the proposed solutions function fairly, i.e., for each type of agent, the overall workload is distributed equally.


Introduction
Multirobot systems are expected to undertake imperative roles in a wide variety of fields such as urban search and rescue (USAR) [1,2], agricultural field operations [3], security patrols [4,5], environmental monitoring [6], and industrial procedures [7]. Studies have shown that multirobot systems have an advantage over single-robot systems by offering more reliability, redundancy, and time efficiency when the nature of the tasks is inherently distributed [8]. Nonetheless, the problem of multirobot task allocation (MRTA) poses many critical challenges that have called for investigation over the past two decades [9,10]. In this regard, the complexity of MRTA problems increases significantly in a dynamic environment, where the number and location of tasks are unknown to agents [11,12]. Thus, robots need to explore the environment to find tasks before accomplishing them. In real-world problems, any robot designated to perform one of the tasks in [1][2][3][4][5][6] needs to be sufficiently dexterous, i.e., to be equipped enough for accomplishing physical tasks such as object manipulation or rubble removal in a rescue mission [13], which inevitably makes the robot relatively heavy and incapable of agile exploration. Having said that, the dynamic problem can be turned into a problem where each task is composed of sequential subtasks, each of which can only be done by a certain type of agent. In this case, for each type of subtask, a robot team of the appropriate type must be employed.
This case poses an unexplored MRTA problem, and the coupling and cooperation between those complementary teams are the motivation of this work.
In the context of MRTA, notable attention has been devoted to revealing various aspects of dynamic problems [14][15][16][17]. For instance, Lerman et al. [18] present a mathematical model of a general dynamic task allocation mechanism in which robots use only local sensing and no direct communication is needed between them for task allocation. Disregarding the communication between agents is a deficiency where the information handled by the agents plays an imperative role in the functionality of a decentralized multiagent system in a dynamic environment. In this regard, Liemhetcharat and Veloso [19] introduce a novel weighted synergy graph model to capture interactions among agents. In their model, agents work together on a task through communication, where the weight of an edge indicates the communication cost between agents.
In contrast to the way that Liemhetcharat and Veloso [19] utilize the communication among agents, other works employ communication among agents to frame negotiations between them. For instance, Chapman et al. [20] pursue a decentralized game-theoretic approach in which planning is achieved via negotiation between agents. Although the results show that their approach is robust to restrictions on the agents' communication and observation range, it does not allow agents to have differing costs for performing the same task, which makes it inapplicable to a wide variety of real-world problems. On the contrary, Michael et al. [21] propose a distributed market-based coordination algorithm to address the dynamic MRTA problem, in which agents are able to bid for task assignments considering each agent's cost for accomplishing tasks. While in real-world dynamic MRTA problems tasks are not fully observable to all agents, the authors of this work assumed that the agents have knowledge of all tasks at a time. This assumption is too strong and does not fully reflect the situation of a dynamic environment. In this regard, Sariel and Balch [22] consider a real dynamic environment, present a real-time single-item auction-based task allocation method for the multirobot exploration problem, and investigate new bid evaluation strategies.
While the works reviewed above [20][21][22] present different approaches to address a dynamic MRTA problem, none of them considers the agents' capabilities while developing the assignment algorithm. On this subject, Wu et al. [23] take agents' capabilities into account in order to form teams by developing a novel market-based task allocation method based on the Gini coefficient. Although the authors demonstrated that the proposed method can effectively improve the number of tasks completed by a robot system, the effect of cooperation and coupling between the formed teams remains uninvestigated. In a similar effort, Shiroma and Campos [24] model agents' capabilities as actions and utilize a single-round auction to form teams and then establish coordination between agents of the same team. Here too, the coupling and cooperation of the formed teams have been left unexplored, though the developed framework was able to successfully resolve the required allocation issues. By the same token, Prorok et al. [25] model the multirobot system as a community of species considering agents' capabilities and then present decentralized and centralized methods to efficiently control the heterogeneous teams of robots, regardless of interaction and collaboration between those teams. Given the review above, little critical attention has been paid to the cooperation and coupling between robot teams, formed based on agents' capabilities, to address a dynamic MRTA problem.
As we discussed earlier, the number and location of tasks are unknown for agents in a dynamic environment [11,12].
In this case, robots need to explore the environment to find tasks before accomplishing them. Since tasks usually require immense effort to be completed in real-world problems such as a rescue mission, a suitable robot needs to be equipped with various sensors and devices, much more complex mechanisms, and a higher number of actuators. As a consequence, the robots inevitably become heavy and ponderous and cannot explore the environment agilely and efficiently. Motivated by this complexity, this paper proposes a nature-inspired approach called "hunter and gatherer" which employs two teams of robots: a team of agile robots that can quickly explore an environment and detect tasks, called "hunters," and a team of dexterous robots that accomplish the detected tasks, called "gatherers." In effect, we turn a dynamic MRTA problem into a problem where each task is composed of two sequential subtasks: exploration and completion. Considering this, when there are synchronization and precedence (SP) constraints which specify an ordering constraint for time-extended assignment (TA) problems [26], the MRTA is referred to as a TA:SP problem [27]. To the best of the authors' knowledge, the MT-MR-TA:SP problem has not been tackled in the literature so far, while it is a ubiquitous problem in a wide variety of fields such as USAR and agricultural field operations.
Consider USAR in a disaster site in which a number of victims are stranded in unknown locations and need immediate rescue operations. Each victim is a task that needs to be detected first and then rescued by a rescue operation that typically needs several dexterity actions, such as providing logistics support, rubble removal, object manipulation, and in situ medical assessment and intervention [13]. This is because a rescue robot needs to have a heavy-duty manipulator, high-power actuators, a tracked locomotion mechanism, high-capacity batteries, and many sorts of sensors, cameras, and communication devices to accomplish those tasks, which make the robot relatively heavy and ponderous and incapable of agile search operations. Hence, let us consider each task to comprise two sequential subtasks: detection and completion, where each subtask can only be carried out by a certain type of robot. Thus, the case encounters an ST-MR-TA:SP or MT-MR-TA:SP problem. In the USAR example, hunters can be a group of small and lightweight unmanned aerial vehicles (UAVs) which search the site to locate victims, and gatherers can be a group of maxi-sized [13] heavy-duty unmanned ground vehicles (UGVs) that rescue the detected victims relying on their dexterity capabilities.
According to the proposed hunter-and-gatherer scheme, we present a game-theoretic solution which considers coupling and cooperation between complementary agents divided into different teams by (1) utilizing market-based negotiation models, namely auction [28][29][30] and reverse auction, and (2) introducing decentralized incentive-based decision-making algorithms. The proposed algorithms rely on the new notions of certainty and uncertainty profit margins (CPM and UPM) which, respectively, determine the levels of confidence and conservativeness of each agent in negotiations to minimize the collective cost of task accomplishment. To enhance the effectiveness of the proposed algorithms, a multitask-planning algorithm is introduced for gatherer agents that enables them to queue multiple tasks in their action plan, finding the optimal solution for completing a group of tasks rather than doing them one by one. We show that employing two complementary teams of hunters and gatherers can effectively improve the number of tasks completed by agents, while the collective cost of accomplishment is minimized. Moreover, the stability and efficacy of the assignment algorithms are proven by a Nash equilibrium analysis and simulation experiments, respectively. Besides, we investigated the distribution of workload, as the total costs and accomplishments of a mission, among agents and showed that the proposed algorithms function fairly; that is, for each type of agent, the overall workload is distributed equally, and all agents of the same type behave analogously under similar characteristics. The remainder of this paper is organized as follows: The problem statement and formulation are presented in Section 2. In Section 3, the methodology, including conceptual frameworks, reasoning mechanisms, and algorithms, is proposed. Nash equilibrium analysis is carried out in Section 4. In Section 5, statistical analysis of simulation results is presented, followed by a concluding discussion in Section 6.

Problem Statement
In this section, the problem of hunter-and-gatherer mission planning (HGMP) in the context of dynamic MRTA is explained. Assume that there are m tasks distributed randomly over the environment E. We consider a case in which the number and the locations of tasks are unknown to agents before the execution of the HGMP. The set of tasks is denoted as T = {T_1, ..., T_m}, in which each task is split into hunting and gathering subtasks, i.e., T_k = {t^h_k, t^g_k} with 1 ≤ k ≤ m, where t^h_k and t^g_k represent hunting and gathering subtasks, respectively. In this case, the set of agents is defined as A = {A^h, A^g}, which comprises two teams of hunters A^h = {a^h_i} and gatherers A^g = {a^g_j}, where 1 ≤ i ≤ n^h and 1 ≤ j ≤ n^g. The cost associated with a^h_i for accomplishment of t^h_k is denoted as c^h_{k,i}, and c^g_{k,j} is the cost associated with a^g_j for accomplishment of t^g_k.

Assumptions.
Throughout this paper, the following are assumed: (1) Tasks are stationary; that is, they are fixed to their locations. (2) The cost of accomplishment of each task is equal to the distance that an agent moves to do the task. An agent is considered done with a task when it reaches the task's location. (3) All agents of the same team are identical. (4) All agents are rational; that is, they intend to maximize their expected utility. (5) All agents are fully autonomous and have their own utility functions; that is, no global utility function exists.
(6) Agents from complementary teams can communicate with each other using a stably connected network.
Now, the HGMP problem can be stated as follows: Suppose that there exists a tuple for the mission such that HGMP = (E, A, T). Π denotes the assignment function which assigns m tasks to n = n^h + n^g agents such that Π: T ↦ A. Under assumptions (1)-(6), the global objective Θ is to minimize the collective cost of Π:

Θ: min Σ_{k=1}^{m} ( ρ^h Σ_{i=1}^{n^h} c^h_{k,i} x^i_k + ρ^g Σ_{j=1}^{n^g} c^g_{k,j} y^j_k ),   (1)

where x^i_k and y^j_k are binary decision variables for t^h_k and t^g_k:

x^i_k = 1 if t^h_k is assigned to a^h_i and 0 otherwise; y^j_k = 1 if t^g_k is assigned to a^g_j and 0 otherwise.   (2)

In (1), the weighting parameters ρ^h and ρ^g are introduced to sum the relative collective costs of the complementary teams because of the physical differences of each type of agent. The notation used in this paper is summarized in Table 1.
This problem has a global objective Θ which can be achieved by determining the binary decision variables optimally. These variables need to be determined by the agents through explorations and negotiations in a distributed manner. Since agents are rational, each agent's objective is to maximize its own expected utility. As a consequence, the objectives of agents may conflict during the HGMP. Hence, the methodology should be developed so that it handles these conflicts in order to maximize the effectiveness of the HGMP and achieve the global objective Θ.
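As an illustration of the objective in (1), the following sketch evaluates the collective cost of a given assignment. The function and variable names (`collective_cost`, `c_h`, `c_g`, `x`, `y`, `rho_h`, `rho_g`) are ours and purely illustrative, not part of the formulation:

```python
# A minimal sketch of the collective-cost objective in (1); all names here
# are illustrative, not part of the paper's formulation.

def collective_cost(c_h, c_g, x, y, rho_h=1.0, rho_g=1.0):
    """Weighted collective cost of an assignment.

    c_h[k][i]: cost of hunter i for hunting subtask k
    c_g[k][j]: cost of gatherer j for gathering subtask k
    x[k][i], y[k][j]: binary decision variables (1 if assigned, else 0)
    rho_h, rho_g: weighting parameters of the two teams
    """
    m = len(c_h)
    total = 0.0
    for k in range(m):
        total += rho_h * sum(c * xi for c, xi in zip(c_h[k], x[k]))
        total += rho_g * sum(c * yj for c, yj in zip(c_g[k], y[k]))
    return total

# Two tasks, two hunters, one gatherer: task 0 is hunted by hunter 0 and
# task 1 by hunter 1; both are gathered by gatherer 0.
c_h = [[2.0, 5.0], [4.0, 1.0]]
c_g = [[3.0], [6.0]]
x = [[1, 0], [0, 1]]
y = [[1], [1]]
print(collective_cost(c_h, c_g, x, y))  # 2 + 1 + 3 + 6 = 12.0
```

A centralized solver could minimize this quantity directly; in the HGMP, the agents have to approach the same minimum through distributed negotiation.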

Conceptual Frameworks.
Hunters are assigned to explore the unknown environment. There is an incentive reward I^h for a hunter, denoted as a^h_i, who detects a task, denoted as T_k. However, the detected task can only be completed through cooperation with a gatherer. Thus, an extra incentive, denoted as I^ex, is added to motivate agents from complementary teams to build up a cooperation. Hunters and gatherers engage in negotiation processes to reach agreements for completing the tasks and sharing I^ex between themselves. In a negotiation, a hunter who has detected a task is on one side, and one or more gatherers are on the other side. An agreement determines which gatherer is assigned to complete the detected task and how much its share of I^ex is. Let us denote 0 ≤ P^h_{k,i} ≤ 1 and P^g_{k,j} = 1 − P^h_{k,i} as the proportions that a^h_i and a^g_j receive from I^ex for accomplishment of T_k, respectively. Also, the gatherer who completes the detected task receives I^g as a gathering incentive when the task is completed. Since all agents are rational, they intend to maximize their incentives by accomplishing more tasks through building up more cooperations.
To establish the process by which agents reach an agreement, we define an online board on which each hunter announces the location of its new detection to find gathering partners for starting a negotiation process. Each gatherer follows the announcements on the online board and chooses a waiting hunter for negotiation by analyzing the location information shared by each waiting hunter. A gatherer then sends a readiness message to the chosen hunter to start a negotiation.
We consider two possible scenarios in order to develop reasoning mechanisms for agents to negotiate and cooperate: (1) a waiting hunter receives only one readiness message, and (2) the waiting hunter receives more than one readiness message. The first scenario resembles a bargaining or reverse auction process, as there is only one buyer who aims to bargain for finding the most affordable option. The second scenario is similar to an auction process, where there is usually more than one buyer interested in a specific object. We utilize these two market-based methods as negotiation models between negotiating agents. In addition, it is possible that the number of waiting hunters on the board, denoted as n^h_w, is more than one. In this case, the question of how a gatherer chooses a hunter among n^h_w waiting agents is addressed in Section 3.5. For the time being, we assume that gatherers already know how to choose a partner, and we focus on the negotiation reasoning mechanisms.
Fundamentals of the reasoning mechanisms are discussed in the next section; we then explain how agents rely on their reasoning mechanisms to behave in the reverse auction and auction scenarios in Sections 3.3 and 3.4, respectively.

Reasoning Mechanism.
In this section, reasoning mechanisms for both hunters and gatherers are developed to establish their behavior during a mission, determining the way they communicate, negotiate, and cooperate. Since the fundamentals of the reasoning mechanisms are similar for both types of agents, for the sake of brevity, we consider a general agent defined as a^ϑ_z with ϑ ∈ {h, g}, where 1 ≤ z ≤ n^ϑ. We now introduce the CPM and UPM for a^ϑ_z. The CPM is a circular margin with a radius of R^ϑ_c, in which a^ϑ_z is certain about a profitable agreement even if its share of I^ex is zero. The UPM is the annular margin between two concentric circles with radii R^ϑ_u and R^ϑ_c, in which a^ϑ_z is uncertain about making a profit in an agreement; that is, its profit strongly depends on its proportion of I^ex. Furthermore, a^ϑ_z cannot make any profit beyond its UPM even if it receives I^ex entirely. Figure 1 shows the CPM and UPM as two concentric circles with a^ϑ_z at the center. The agent compares its cost for accomplishing the task with its CPM and UPM to determine its state in order to make profitable decisions during the negotiation.
The following statements explain the states of a^ϑ_z with respect to its cost for accomplishing t^ϑ_k:

(i) State 1: if c^ϑ_{k,z} < R^ϑ_c, then a^ϑ_z can make a profit regardless of its proportion of I^ex.
(ii) State 2: if R^ϑ_c < c^ϑ_{k,z} < R^ϑ_u, then the profit of a^ϑ_z depends on its proportion of I^ex.
(iii) State 3: if c^ϑ_{k,z} > R^ϑ_u, then a^ϑ_z cannot make any profit even if it receives all of I^ex.

Table 1: Notation.
c^h_{k,i} and c^g_{k,j}: The cost associated with a^h_i for accomplishment of t^h_k and the cost associated with a^g_j for accomplishment of t^g_k, respectively
n: Defined as n = n^h + n^g, where n^h and n^g are the total numbers of hunters and gatherers, respectively
Π: The assignment function which assigns m tasks to n agents such that Π: T ↦ A
Θ: The global objective, which is to minimize the collective cost of Π
ρ^h and ρ^g: The weighting parameters of hunters and gatherers, respectively
x^i_k and y^j_k: The binary decision variables for t^h_k and t^g_k, respectively
I^h and I^g: The incentive rewards for accomplishing a hunting or a gathering task, respectively
I^ex: An extra incentive for motivating agents from complementary teams to build up a cooperation
ϑ: A generalized subscript designated to denote both hunter and gatherer agents, where ϑ ∈ {h, g}
P^h_{k,i}: Share of the i-th hunter in I^ex for hunting the k-th task
P^g_{k,j}: Share of the j-th gatherer in I^ex for gathering the k-th task
n^h_w: Total number of hunters that have detected a task and are waiting to start a negotiation
α^h and α^g: Scaling parameters of the certainty profit margins of hunters and gatherers, respectively
β^h and β^g: Scaling parameters of the uncertainty profit margins of hunters and gatherers, respectively
R^h_c and R^g_c: Certainty radii for hunters and gatherers, respectively
R^h_u and R^g_u: Uncertainty radii for hunters and gatherers, respectively
U^h(t^h_k) and U^g(t^g_k): The profit earned by a hunter or a gatherer for the k-th task, respectively
…: An offer made by the j-th gatherer to the i-th hunter for the k-th task
b^k_{j,i}: A bid placed by the j-th gatherer to the i-th hunter for the k-th task
Υ_j: Action plan of the j-th gatherer, where Υ_j = {υ_1, ..., υ_q} and q ≤ q_max, with q_max as the maximum number of tasks that a gatherer can queue

We formalize the CPM and UPM for a^ϑ_z by defining R^ϑ_c and R^ϑ_u. If a^ϑ_z accomplishes t^ϑ_k, then it receives I^ϑ as an incentive. Since a^ϑ_z is certain about receiving I^ϑ, we have R^ϑ_c ∝ I^ϑ. Hence, we define R^ϑ_c by introducing α^ϑ as a scaling parameter for the CPM:

R^ϑ_c = α^ϑ I^ϑ.   (3)

In addition, a^ϑ_z receives a proportion of I^ex that is determined by the negotiation process, so a^ϑ_z is uncertain about its share of I^ex. Thus, we define R^ϑ_u by introducing β^ϑ as a scaling parameter for the UPM:

R^ϑ_u = α^ϑ I^ϑ + β^ϑ I^ex.   (4)

Altogether, for a^ϑ_z involved in a negotiation, the utility function defined below determines its payoff.

Definition 1 (utility function). U^ϑ(t^ϑ_k) gives the profit earned by a^ϑ_z for accomplishing t^ϑ_k and building up a cooperation. The utility function of a^ϑ_z is defined as

U^ϑ(t^ϑ_k) = α^ϑ I^ϑ + β^ϑ P^ϑ_{k,z} I^ex − c^ϑ_{k,z}.   (5)

Now, we define a profit interval for a^ϑ_z, regarding its state for accomplishing t^ϑ_k, by which it evaluates its results in a negotiation. A profit interval is an interval for P^ϑ_{k,z} that guarantees the negotiation's profitability. According to assumption (4), a^ϑ_z wants to maximize its payoff, so in each negotiation, a^ϑ_z makes a nonnegative profit such that

U^ϑ(t^ϑ_k) ≥ 0,   (6)

which can be written as

(c^ϑ_{k,z} − α^ϑ I^ϑ) / (β^ϑ I^ex) ≤ P^ϑ_{k,z}.   (7)

The overlap of 0 ≤ P^ϑ_{k,z} ≤ 1 and (7) yields the profit interval for P^ϑ_{k,z}. The overlap in all three states is expressed as follows: If a^ϑ_z is in state 1, then c^ϑ_{k,z} < α^ϑ I^ϑ and the left side of (7) is negative. Hence, the overlap is 0 ≤ P^ϑ_{k,z} ≤ 1; that is, in state 1, a^ϑ_z makes a profit regardless of its share of I^ex. If a^ϑ_z is in state 2, then c^ϑ_{k,z} > α^ϑ I^ϑ; therefore, the overlap is (c^ϑ_{k,z} − α^ϑ I^ϑ)/(β^ϑ I^ex) ≤ P^ϑ_{k,z} ≤ 1. And if a^ϑ_z is in state 3, then c^ϑ_{k,z} > α^ϑ I^ϑ + β^ϑ I^ex, so the left side of (7) is greater than one. Hence, the overlap of (7) and 0 ≤ P^ϑ_{k,z} ≤ 1 is a null set; that is, the task is not profitable.

Accordingly, the profit interval of a^ϑ_z for accomplishing t^ϑ_k is defined as

pi^ϑ = [L^ϑ_low, L^ϑ_up], with L^ϑ_low = max(0, (c^ϑ_{k,z} − α^ϑ I^ϑ)/(β^ϑ I^ex)) and L^ϑ_up = 1,   (8)

where L^ϑ_low and L^ϑ_up denote the lower and upper bounds, respectively.
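The state test and the profit interval above can be sketched in a few lines. The names (`agent_state`, `profit_interval`) and the sample parameter values are ours and purely illustrative:

```python
# Sketch of the CPM/UPM state test and profit interval; R_c = alpha * I and
# R_u = alpha * I + beta * I_ex follow the definitions of the certainty and
# uncertainty radii. All names are illustrative.

def agent_state(cost, I, I_ex, alpha, beta):
    R_c = alpha * I
    R_u = alpha * I + beta * I_ex
    if cost < R_c:
        return 1        # state 1: profitable regardless of the share of I_ex
    elif cost < R_u:
        return 2        # state 2: profit depends on the share of I_ex
    return 3            # state 3: never profitable

def profit_interval(cost, I, I_ex, alpha, beta):
    """Interval [L_low, L_up] of shares P keeping the utility nonnegative,
    or None (the null set) in state 3."""
    if agent_state(cost, I, I_ex, alpha, beta) == 3:
        return None
    L_low = max(0.0, (cost - alpha * I) / (beta * I_ex))
    return (L_low, 1.0)

# With I = 10, I_ex = 8, alpha = beta = 0.5: R_c = 5, R_u = 9.
print(agent_state(3.0, 10.0, 8.0, 0.5, 0.5))       # 1
print(profit_interval(7.0, 10.0, 8.0, 0.5, 0.5))   # (0.5, 1.0)
print(profit_interval(12.0, 10.0, 8.0, 0.5, 0.5))  # None
```

The interval returned for state 2 is exactly the overlap of (7) with 0 ≤ P ≤ 1 derived above.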

3.3. The First Scenario: Reverse Auction.
Consider the scenario shown in Figure 2(a) and suppose that a^h_i has detected T_k at the cost of c^h_{k,i}, has posted an announcement on the online board, and receives a readiness message only from a^g_j. This message is a request for a quotation; that is, a^h_i offers a proportion for sharing I^ex, and a^g_j decides to accept or reject the offer. Accordingly, we explain how a^h_i makes offers and how a^g_j makes an acceptance or rejection decision, using the proposed reasoning mechanisms.
According to the process illustrated in Figure 3, a^h_i calculates the lower bound, the midpoint, and the upper bound of its profit interval for making three offers. Since a^h_i is making offers to a^g_j, it should send the offers using P^g_{k,j} = 1 − P^h_{k,i}, as follows:

offers = { 1 − L^h_low, 1 − (L^h_low + L^h_up)/2, 1 − L^h_up }.

According to Figure 3, at the first decision node, a^h_i makes an offer following the explained process, and then, at the second step, a^g_j decides to accept or reject the offer. Algorithm 1 illustrates the bargaining procedure for a^h_i. In this algorithm, a^g_cand denotes the gatherer agent that has sent the readiness message. In line 3 of Algorithm 1, this set of offers is assigned to the vector "offers" in a random order.
Besides, a^g_j uses its own profit interval to make an acceptance or rejection decision. For each offer received from a^h_i, if the offer is inside pi^g, then a^g_j accepts the offer. Otherwise, it rejects the offer.
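A minimal sketch of one bargaining round follows, assuming the hunter's three offers are derived from its profit interval and sent in random order (as in line 3 of Algorithm 1) and the gatherer accepts any offer inside its own interval. All names are ours and purely illustrative:

```python
import random

# Sketch of the reverse auction (bargaining) round between one hunter and
# one gatherer; all names are illustrative.

def hunter_offers(L_low, L_up):
    """Gatherer-side shares 1 - P derived from the hunter's profit interval."""
    offers = [1 - L_up, 1 - (L_low + L_up) / 2, 1 - L_low]
    random.shuffle(offers)  # random order, as in Algorithm 1
    return offers

def gatherer_accepts(offer, pi_g):
    low, up = pi_g
    return low <= offer <= up

def bargain(hunter_pi, gatherer_pi):
    """Returns the agreed share of I_ex for the gatherer, or None."""
    for offer in hunter_offers(*hunter_pi):
        if gatherer_accepts(offer, gatherer_pi):
            return offer
    return None

result = bargain(hunter_pi=(0.2, 0.6), gatherer_pi=(0.3, 1.0))
print(result)  # one of 0.4, 0.6, or 0.8 -- all lie inside [0.3, 1.0]
```

When the intervals do not overlap, all three offers are rejected and the hunter re-announces its detection on the board.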

3.4. The Second Scenario: Auction.
Consider the auction scenario shown in Figure 2(b) and suppose that a^h_i has detected T_k at cost c^h_{k,i}, has posted an announcement on the online board, and receives readiness messages from n_b ≥ 2 gatherers. In this case, a^h_i holds an auction and selects the winner, where gatherers bid for sharing I^ex to win the detected task and complete it. Accordingly, the reasoning mechanisms of both types of agents need to be investigated.
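The winner determination in this auction, a second-price sealed-bid rule with a reserve at the hunter's lower bound (as detailed below), can be sketched as follows; the names are ours and purely illustrative:

```python
# Sketch of winner determination in a second-price sealed-bid auction: the
# highest bidder wins, but the hunter's share is the second-highest bid,
# provided that bid clears the hunter's lower bound L_h_low. All names are
# illustrative.

def run_auction(bids, L_h_low):
    """bids: dict gatherer_id -> bid (the hunter's share, 1 - L_g_low).
    Returns (winner, hunter_share) or None if the auction fails."""
    if len(bids) < 2:
        return None
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    second_price = ranked[1][1]
    if second_price < L_h_low:  # reserve check on the hunter's side
        return None
    return winner, second_price

print(run_auction({"g1": 0.7, "g2": 0.5, "g3": 0.6}, L_h_low=0.4))
# ('g1', 0.6): g1 wins, and the hunter's share is g3's second-highest bid
```

Because the winner pays the second-highest bid, no bidder can gain by misreporting its valuation, which is the truthfulness property exploited below.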
We utilize the "second-price sealed-bid auction" as the negotiation framework, in which the winning bidder is the agent who has placed the highest bid, and it pays the amount equal to the second-highest bid to the hunter holding the auction. In this auction, a^g_j, a gatherer bidding in the auction, can bid its valuation. Since it will not pay as much as it bids if it wins, a^g_j still has a chance to get a positive benefit from the auction. Therefore, the main advantage of the second-price auction over the first-price auction is that truthful bidding is an optimal strategy in a second-price auction, and as a result, it is ensured to converge to an optimal solution. Truthful bidding means that it is an optimal strategy for a bidder in a second-price auction to bid however much it values the object [31]. To that end, we first explain how a^g_j bids using its profit interval, and then we discuss the way that a^h_i chooses the winning bidder. a^g_j bids its valuation, which is the lower bound of its profit interval. Since a^g_j is making an offer to a^h_i by bidding, it should send the bid using P^h_{k,i} = 1 − P^g_{k,j}, as follows:

b^k_{j,i} = 1 − L^g_low.

Besides, a^h_i chooses the winning bidder after a single round of bidding. Firstly, a^h_i chooses the winning bidder, a^g_w, according to the maximum bid in the set of bids, denoted as b, such that

a^g_w = argmax b.

Secondly, a^h_i checks whether the second-highest bid satisfies the minimum acceptable bid determined by its profit interval. Since the share of a^h_i in I^ex must satisfy (6), the minimum acceptable bid is the lower bound, L^h_low, of its profit interval; that is, the second-highest bid must be at least L^h_low.

3.5. Multitask Planning.
In Figure 4, suppose that a^h_1 has detected T_2. a^g_2 is the only gatherer agent that can send a readiness message to start a negotiation with a^h_1 because a^g_1 is busy gathering T_1. In this case, t^g_2 ⟶ a^g_2 is an inefficient plan where c^g_{2,1} < c^g_{2,2}.
Alternatively, if a^g_1 were able to plan for multiple tasks at a time, it could gather T_2 at a lower cost. Accordingly, to prevent such ineffective plans in the HGMP, a multitask-planning algorithm for gatherers is proposed in the following.
…
(4) for each offer do
(5)   send the offer to a^g_cand
(6)   if an acceptance message is received then
(7)     results ⟵ [a^g_cand, offer]
(8)     break for
(9)   else
(10)    results ⟵ ∅
(11)  end if
(12) end for
(13) return results
ALGORITHM 1: Bargaining function for a hunter agent.

Let us define an action plan in which a^g_j queues multiple tasks to accomplish in the future such that Υ_j = {υ_1, ..., υ_q}, where q ≤ q_max, with q_max as the maximum number of tasks that gatherers can queue. Each task has a profile in the gatherer's action plan containing the required information: υ_q = {t^g_k, c^g_{k,j}, c′^g_{k,j}, P^g_{k,j}}, where c′^g_{k,j} denotes the temporary cost calculated by the agent for t^g_k. υ_1 through υ_q are the tasks that a^g_j has already planned to accomplish (actual tasks). In addition, Γ_j denotes the path that a^g_j follows to accomplish Υ_j. Now, assume that a^g_j wants to add a new task to Υ_j as υ_{q+1}, when q < q_max. The first step is choosing a waiting hunter, and the second step is negotiating with the chosen agent. The negotiation processes have been discussed before, so here we focus on the procedure by which a^g_j chooses a waiting hunter to fill up Υ_j. The proposed method relies on the CPM and UPM to develop the gatherers' reasoning mechanism so that a^g_j fills up Υ_j effectually. To that end, a three-step process in which a^g_j chooses a waiting hunter agent for negotiation is proposed. Before starting the process, a^g_j follows the online board and lists the waiting hunters in A^h_waiting ordered by their waiting time; that is, the oldest is the first in the list. The process steps are elaborated as follows:

Step 1: a^g_j considers the most prior task from A^h_waiting, denoted as T_cand. Then, a^g_j plans the shortest multidestination temporary path, using the A* search algorithm [32], denoted as Γ′_j, to gather all tasks in Υ_j plus T_cand. When Γ′_j is generated, the temporary cost of each task is updated in each task's profile.
Step 2: a^g_j verifies the feasibility of making a profit from T_cand. Thus, it checks whether T_cand is in state 3 with respect to Γ′_j. If T_cand is not in state 3, then a^g_j goes to the next step. Otherwise, a^g_j withdraws T_cand and starts over from the first step.
Step 3: a^g_j examines the effect of choosing T_cand on the actual tasks in Υ_j. When a^g_j generates Γ′_j in Step 1, it may find c^g_{k,j} ≠ c′^g_{k,j} for the actual tasks in Υ_j. For this reason, a^g_j checks whether (6) still holds for the newly calculated temporary cost of each actual task. If (6) holds for all actual tasks, T_cand is verified for starting a negotiation process; otherwise, a^g_j withdraws T_cand and starts over from the first step.
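The three-step check above can be sketched as follows. A straight-line visit order stands in for the A* multidestination planner of Step 1, and a simplified profitability test (cumulative cost within the uncertainty radius R_u) stands in for condition (6); all names and the cost model are ours and purely illustrative:

```python
import math

# Sketch of the candidate check for a gatherer's action plan. A straight-line
# visit order replaces the A* multidestination path planner of Step 1, and a
# single uncertainty radius R_u replaces condition (6). All names are
# illustrative.

def path_costs(start, stops):
    """Temporary cost of each stop: cumulative travel distance to reach it."""
    costs, pos, run = [], start, 0.0
    for stop in stops:
        run += math.dist(pos, stop)
        costs.append(run)
        pos = stop
    return costs

def verify_candidate(start, plan, candidate, R_u):
    """plan: list of locations of actual tasks; candidate: a new location.
    Returns True iff the candidate passes Steps 2 and 3."""
    temp = path_costs(start, list(plan) + [candidate])
    if temp[-1] >= R_u:                      # Step 2: candidate not in state 3
        return False
    return all(c < R_u for c in temp[:-1])   # Step 3: actual tasks still profitable

plan = [(1.0, 0.0)]                          # one actual task already queued
print(verify_candidate((0.0, 0.0), plan, candidate=(1.0, 1.0), R_u=3.0))  # True
print(verify_candidate((0.0, 0.0), plan, candidate=(5.0, 0.0), R_u=3.0))  # False
```

A distant candidate fails Step 2 on its own cost, and a detour that pushes an already queued task past its margin fails Step 3, exactly mirroring the withdraw-and-restart logic above.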
Algorithm 2 illustrates the multitask-planning procedure for a^g_j. This algorithm is developed as a function for choosing a candidate task detected by a waiting hunter while considering Υ_j. Note that the output of this algorithm is not a task in the agent's action plan, i.e., υ_{q+1}; the output is a candidate task detected by a waiting hunter that can potentially be added to Υ_j as υ_{q+1} depending on the negotiation process. Figure 5 illustrates an example in which a^g_1 fills out its action plan, where q_max = 5. All sequences happen before accomplishing υ_1. In sequence 1, a^g_1 already has υ_1 in its action plan and chooses T_C1 to negotiate. In sequence 2, a^g_1 has reached an agreement for accomplishing T_C1 and adds it to its action plan as υ_2. Moreover, there are two candidates, and T_C2 is chosen for negotiation because T_C3 is not feasible and fails to satisfy the condition mentioned in Step 2. In sequence 3, a^g_1 has reached an agreement for accomplishing T_C2 and adds it to its action plan as υ_3. In addition, there are two candidate tasks, namely, T_C4 and T_C5. T_C5 cannot be verified in Step 3, though it is feasible itself and passes Step 2. Therefore, a^g_1 chooses T_C4 for negotiation. In sequence 4, a^g_1 has reached an agreement for accomplishing T_C4 and adds it to its action plan as υ_4. Moreover, it chooses T_C6 to negotiate because it passes all three steps. Although choosing T_C6 causes a change in the path to T_2, it does not bring a^g_1 into state 3 for T_2.

Decision-Making Algorithms.
Firstly, we propose a distributed decision-making algorithm determining the exploration, detection, and negotiation procedure for a^h_i in the HGMP. Basically, we utilize the distributed approach to ensure the reliability of the MAS, as centralized MASs may not be robust because they rely on a single central unit. In addition, the nature of the proposed reasoning mechanisms enables agents to make decisions independently of any central unit. Furthermore, the nature of the profit margins limits all interactions between agents to local regions within the environment, so there is no need for a central unit to play a role. Even in the case of auctions, each hunter who has detected a task holds a local auction and plays the role of an auction organizer temporarily and locally. Having said that, a distributed decision-making algorithm fits the proposed reasoning mechanisms best. Accordingly, Algorithm 3 illustrates the decision-making procedure for a^h_i. In each iteration, a^h_i explores the environment to detect a task. When a^h_i detects a task, denoted as T_detected, it announces the location on the online board and waits to receive readiness messages. According to the number of readiness messages that a^h_i receives, it starts a reverse auction or auction negotiation process to reach an agreement. If a^h_i reaches an agreement, then it starts exploring the environment again. Otherwise, a^h_i announces its detection on the online board again and repeats the same procedure. τ denotes the iteration number, and τ_max denotes the maximum number of iterations in a mission.
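A sketch of this decision loop follows, with the exploration, board, and negotiation routines abstracted as callables; everything here is an illustrative stand-in for Algorithm 3, not its exact pseudocode:

```python
# Sketch of a hunter's decision loop: explore, announce a detection, then run
# a reverse auction or an auction depending on the number of readiness
# messages, re-announcing until an agreement is reached. All callables and
# names are illustrative stand-ins.

def hunter_loop(explore, announce, collect_readiness, bargain, auction, tau_max):
    """Runs one mission of tau_max iterations; returns the list of
    (task, agreement) pairs reached."""
    agreements = []
    for _ in range(tau_max):
        detected = explore()
        if detected is None:
            continue                     # nothing found: keep exploring
        agreed = None
        while agreed is None:            # re-announce until an agreement
            announce(detected)
            ready = collect_readiness(detected)
            if len(ready) == 1:          # one reply: reverse auction
                agreed = bargain(detected, ready[0])
            elif len(ready) >= 2:        # several replies: auction
                agreed = auction(detected, ready)
        agreements.append((detected, agreed))
    return agreements

# Toy run: one detection at the second iteration, a single gatherer replies,
# and the bargain succeeds immediately.
events = iter([None, "T1", None])
print(hunter_loop(
    explore=lambda: next(events, None),
    announce=lambda t: None,
    collect_readiness=lambda t: ["g1"],
    bargain=lambda t, g: (g, 0.5),
    auction=lambda t, gs: None,
    tau_max=3,
))  # [('T1', ('g1', 0.5))]
```

The inner while loop mirrors the re-announcement behavior described above: a failed negotiation puts the detection back on the board rather than discarding it.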
Secondly, we present a distributed decision-making algorithm determining the negotiating and accomplishment procedure in the HGMP for a g j regarding the explained reasoning mechanisms. Algorithm 4 illustrates the decision-making procedure for a g j . In each iteration, when q < q max , a g j manages its action plan by calling the "choose partner" function first and then negotiating with the chosen hunter upon availability. If the negotiation succeeds, then it adds the new task to Υ j and updates Γ j . Moreover, in each iteration, a g j follows Γ j to gather tasks in Υ j . When a task is gathered, a g j updates Υ j by removing the accomplished task.
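The gatherer's counterpart loop (Algorithm 4) can be sketched the same way; `choose_partner`, `negotiate`, and `step` are abstract callables standing in for the paper's mechanisms, and all names are illustrative:

```python
def gatherer_loop(tau_max, q_max, choose_partner, negotiate, step):
    """One gatherer's outer loop: fill the action plan through
    negotiations while following the route to gather queued tasks.

    choose_partner() -> (candidate_task, hunter) or (None, None)
    negotiate(h, t)  -> True if the bargain/auction ends in an agreement
    step(plan)       -> list of plan tasks gathered during this iteration
    """
    plan, gathered = [], []
    for tau in range(1, tau_max + 1):
        if len(plan) < q_max:                    # room left in the queue
            task, hunter = choose_partner()
            if hunter is not None and negotiate(hunter, task):
                plan.append(task)                # becomes upsilon_{q+1}
        for t in step(plan):                     # follow the route Gamma_j
            plan.remove(t)                       # update Upsilon_j
            gathered.append(t)
    return gathered
```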

Nash Equilibrium Analysis
It is important to study the stability of the proposed algorithms to ensure that agents do not have motivation to change their behavior during the HGMP, i.e., to make sure that agents can make optimal decisions in the scenarios and do not vacillate in negotiations and task accomplishments. In this section, we study the stability of the proposed algorithms in both reverse auction and auction scenarios.

The First Scenario: Reverse Auction.
Consider a hunter and a gatherer agent whose preferences over outcomes are given by the utility functions U h (t h k ) and U g (t g k ), respectively. As shown in Figure 3, the model in which agents negotiate in the first scenario is a simplified reverse auction or bargaining process. According to assumption (6), each agent obtains sufficient information about all actions and utilities. Thus, the model turns into a perfect-information extensive-form game which resembles a sharing game. We know that every (finite) perfect-information game in extensive form has a pure-strategy Nash equilibrium (PSNE) [33]. However, the existence of a PSNE does not necessarily ensure that the output of the first scenario is a PSNE; it strongly depends on the decision-making algorithm of each agent. Therefore, we need to prove that the output of the proposed reasoning mechanisms in the first scenario is a PSNE.
According to the proposed reasoning mechanisms, each agent calculates a profit interval to make the most profitable decision. To be specific, a h i makes its best response by making offers that fall into its profit interval. Similarly, a g j makes its best response to the scenario by accepting the offers within its profit interval. In other words, the decision of each agent is its best possible response to the scenario, and it knows that the counterpart agent is also making its best response. We know that the strategy profile in which each agent is making its best response to the other agent is a PSNE [33]. Consequently, the HGMP's outcome is a PSNE in the first scenario.

(1) for τ = 1:τ max do
(2)   if hold == false then
(3)     explore the environment
(4)     if a new task is detected then
(5)       T detected ← the new detected task
(6)       hold ← true
(7)     end if
(8)   end if
(9)   if hold == true then
(10)    announce T detected on the online board
(11)    if one readiness message is received then
(12)      bargain
(13)    else if readiness messages > 1 then
(14)      hold an auction
(15)    end if
(16)    if the negotiation succeeds then
(17)      hold ← false
(18)      mark the task as accomplished
(19)    end if
(20)  end if
(21) end for

ALGORITHM 3: Decision-making algorithm of a hunter agent.

(1) for τ = 1:τ max do
(2)   if q < q max then
(3)     call the "choose partner" function to obtain T cand and a h cand
(4)     if a h cand ≠ ∅ then
(5)       send the readiness message to a h cand
(6)       if received a bargaining notification then
(7)         bargain
(8)       else
(9)         place a bid for the auction
(10)      end if
(11)    end if
(12)    if the negotiation succeeds then
(13)      add T cand to Υ j as υ q+1
(14)      Γ j ← Γ j ′
(15)    end if
(16)  end if
(17)  follow Γ j to accomplish tasks and update Υ j
(18) end for

ALGORITHM 4: Decision-making outer-loop algorithm of a gatherer agent.
Although the model itself ensures the existence of a PSNE and the reasoning mechanisms' outcome is a PSNE, the desirability of the PSNE is still a considerable concern. The following numerical example explains how the scaling parameters can affect the PSNE in the first scenario.
In the reverse auction scenario, consider the pure strategies of a h i and a g j . For one choice of scaling parameters, {(A, A, A), offer1} is one of the equilibria; that is, a g j accepts the first offer, which results in U g (t g k ) < 0. On the contrary, if we change only the scaling parameters of a g j such that α g = β g = 0.9, then {(A, A, A), offer1} is no longer a PSNE. Instead, {(R, R, R), offer1} is a PSNE; that is, a g j rejects the first offer. In conclusion, the desirability of the PSNE in the first scenario can be guaranteed by designating appropriate scaling parameters α h , β h , α g , and β g .

The Second Scenario: Auction.
In the second scenario, we investigate the existence of an NE through a theorem based on the CPM and UPM concepts. We investigate three conditions to find the NE in an auction process. We prove the theorem by contradiction; that is, we show that no agent involved in an auction scenario has a motivation to deviate from a strategy profile that satisfies all three conditions.

Theorem 1. Consider the HGMP in the second scenario, associated with the second-price sealed-bid auction with participation of a hunter and gatherers whose preferences over outcomes are given by the utility functions U h (t h k ) and U g (t g k ),

respectively. Then, b is a Nash equilibrium if and only if conditions (i) and (ii) are satisfied for w = argmax(b) and condition (iii) is satisfied for the hunter agent:
(i) max j≠w (α g I g + β g I ex − c g k,j ) ≤ b w , i.e., the winner submitted a sufficiently high bid
(ii) max j≠w b j ≤ α g I g + β g I ex − c g k,w , i.e., the winner's valuation is sufficiently high
(iii) α h I h + β h (P h k,i I ex ) − c h k,i ≥ 0, i.e., the second highest bid satisfies the minimum bid determined by the hunter

Proof. If (i) does not hold, i.e., max j≠w (α g I g + β g I ex − c g k,j ) > b w , then some a g j≠w has an interval [k g low , P g k,w ] in which it can lower its share to k g low , place an even higher bid than b w , and win the auction. Hence, a g j≠w has a motivation to deviate and increase its payoff. If (ii) does not hold, i.e., max j≠w b j > α g I g + β g I ex − c g k,w , then for the winner a g w we have U g (t g k ) < 0; that is, its payoff is negative. Therefore, it can deviate by submitting a losing bid and increasing its payoff to 0. Finally, if (iii) does not hold, then the hunter agent's payoff is negative for the second highest bid. Thus, it can deviate by rejecting all bids and increase its payoff to 0, because it has a strong motivation to hold another auction in the following iterations and avoid a negative payoff. □

Nevertheless, the existence of an NE does not necessarily ensure that the scenario's output is an NE; it strongly depends on the decision-making algorithm of each agent. In this regard, we know that each gatherer involved in the auction scenario places a bid according to (9).
This means each gatherer agent bids its own valuation, i.e., b j = v j . Accordingly, conditions (i) and (ii) are always true because not only has the winner placed the highest bid among all bidders, but it also does not have a negative payoff. Besides, the hunter agent uses (10) to choose the winning bidder and (11) to verify that the second highest bid satisfies the minimum requirement. Hence, condition (iii) is also true. In conclusion, according to Theorem 1 and the decision-making algorithms of all agents participating in an auction, the result of the auction scenario is an NE.
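The argument above can be made concrete with a small sketch of a second-price sealed-bid auction plus an explicit check of conditions (i) and (ii) under truthful bidding; the function names and numeric values in the usage below are invented for illustration:

```python
def second_price_outcome(bids, hunter_min):
    """Run a second-price sealed-bid auction among gatherers.

    bids: {gatherer_id: bid}; hunter_min: the hunter's minimum
    acceptable (second-highest) price, mirroring condition (iii).
    Returns (winner, price) or (None, None) if the hunter rejects all bids.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top = ranked[0]
    price = ranked[1][1] if len(ranked) > 1 else top  # second-highest bid
    if price < hunter_min:       # condition (iii) fails: hunter walks away
        return None, None
    return winner, price

def is_truthful_profile_ne(values, hunter_min):
    """With truthful bids b_j = v_j, conditions (i) and (ii) of
    Theorem 1 hold by construction; verify them plus (iii) explicitly."""
    bids = dict(values)                           # truthful bidding
    winner, price = second_price_outcome(bids, hunter_min)
    if winner is None:
        return False                              # hunter rejected all bids
    others = [b for g, b in bids.items() if g != winner]
    cond_i = max(others) <= bids[winner]          # winner bid high enough
    cond_ii = max(others) <= values[winner]       # winner's valuation high enough
    return cond_i and cond_ii
```

For instance, with valuations {g1: 10, g2: 7, g3: 4} and a hunter minimum of 5, g1 wins at price 7 and the truthful profile passes both conditions.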

Simulation Results
In this section, we present simulation results to (1) validate the fairness of the proposed algorithms, i.e., to ensure that the overall workload is distributed equally among agents of both types, by comparing the agents' effectiveness in a set of experiments and analyzing the results with paired T-test and ANOVA [34] methods, (2) study the effect of profit margins on the total effectiveness of the HGMP, (3) demonstrate the efficacy of the proposed multitask-planning algorithm for gatherers by investigating its effect on the HGMP's total effectiveness, and (4) verify the functionality of the hunter-and-gatherer scheme, i.e., considering each task as comprising two sequential exploration and completion subtasks, by comparing the HGMP with a basic alternative method in which each agent performs both hunting and gathering itself.
To simulate the proposed approaches, we developed a multirobot simulation platform in MATLAB from scratch. In this platform, we can run the simulations on any custom map, while the number of agents of each type is adjustable. We provide some basic functions for each type of agent to enable them to maneuver over the determined environment. For gatherers, we utilized an A*-based motion planning algorithm, which enables them to move between two points in a grid environment. In addition, we provided a basic frontier-based exploration algorithm [35] for hunters. The number of tasks is also adjustable, and they are located randomly over the environment. We also provided a perpetual mode for the simulations, in which for each gathered task another task is distributed randomly in the environment. Accordingly, at each iteration, there is a certain number of tasks available in the environment, which is adjustable for each mission. Furthermore, in the perpetual mode, each explored and known grid cell of the environment turns back into an unknown cell after a certain number of iterations. The perpetual mode helps the analysis be done in a much more accurate and evidence-based way.
All simulations have been executed under the following conditions: (1) the environment is sectioned as an e L × e W grid of tiles where e L = e W = 100, (2) the quantities of each type of agent are adjusted as n h = 4 and n g = 2, (3) there are always m p = 50 tasks in the environment, (4) the maximum number of iterations is determined as τ max = 1000, (5) the rewards are assigned to be I h = I g = I ex = 140, and (6) we considered the weighting parameters as ρ h /ρ g = 0.2.

Fairness of the HGMP.
To demonstrate that the accomplishment workload is distributed equally for each type of agent, the concept of fairness is introduced. To that end, we define an effectiveness factor for each agent of both types based on its costs and accomplishments. Then, using statistical analysis, we prove the fairness of the HGMP by comparing the effectiveness of different agents of each type. Let η h i and c h i denote the effectiveness of a h i and the number of tasks hunted by the agent, respectively, where η h i is the ratio of c h i to the agent's collective cost C h i :

η h i = c h i / C h i . (12)

Similarly, η g j and c g j denote the effectiveness of a g j and the total number of tasks gathered by the agent, respectively, such that

η g j = c g j / C g j . (13)

Figure 6(a) shows the statistical results of η h i for all hunters in 200 missions. As n h > 2, an ANOVA test has been applied to the collected data to statistically prove the fairness of the HGMP for hunters. In addition, as n g = 2, a paired T-test has been applied to the data to investigate the fairness of the HGMP for gatherers. The hypothesis testing has been conducted with H 0 : μ g 1 − μ g 2 = D 0 = 0, dof = 199, and α = 0.05. According to the test, p value = 0.315. Since p value > α, we must retain the null hypothesis.
Therefore, it has been shown that μ g 1 − μ g 2 = D 0 = 0, as illustrated in Figure 6(b), which means that there is no significant difference between the averages of the gatherers' effectiveness in the 200 tests.
Both statistical analyses indicate that all agents of the same type behave analogously under similar characteristics. In fact, this analysis numerically validates the Nash equilibrium analysis proved for the HGMP: if the fairness investigated above did not hold and the HGMP favored certain agents unfairly, then other agents would have strong motivations to deviate from the proposed negotiation structure.
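As a complement to the fairness analysis, the paired T-test statistic used above can be reproduced with a short stdlib-only Python sketch; the data in the usage below are synthetic placeholders, not the paper's 200-mission measurements:

```python
import math
import statistics

def paired_t_statistic(x, y):
    """t statistic for H0: mean(x - y) = 0, with dof = n - 1.

    x, y: paired samples of equal length (e.g., per-mission
    effectiveness of the two gatherers).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

If |t| stays below the critical value t α/2,dof , the null hypothesis is retained, i.e., the two gatherers are judged equally effective.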

Effects of Agents' Profit Margins on Mission's Effectiveness.
The effects of the scaling parameters of the profit margins, α g , β g , α h , and β h , on the total effectiveness of the HGMP need to be investigated in order to show the functionality of the CPM and UPM for both types of agents. To that end, we define an effectiveness factor for the HGMP, η t , which is the ratio of the total number of completed tasks, c t , to the collective cost of the whole mission, C t :

η t = c t / C t . (16)

We ran the algorithms for all values of α g and β g that are multiples of 0.025 such that 0 ≤ α g ≤ 0.5 and 0 ≤ β g ≤ 0.5, while α h = β h = 0.35. η t has been calculated for each pair of values of α g and β g , as illustrated in Figure 7. This figure shows the correlation between the total effectiveness of the HGMP and the profit margin parameters of the gatherers. The horizontal and vertical axes are α g and β g , respectively, i.e., the scaling parameters of the gatherers, and the color mapping represents the total effectiveness of the HGMP, i.e., η t ; the yellow area marks where the total effectiveness is maximum. According to the results, η t is vanishingly small when α g + β g < 0.2, which means agents cannot reach an agreement for completing the detected tasks. Furthermore, η t reaches its maximum when 0.2 < α g + β g < 0.5. η t then falls gradually when α g + β g > 0.5 because each gatherer's CPM and UPM are so large that the agent never falls into state 3 and easily reaches any agreement. As a result, each gatherer accomplishes a significant number of tasks inefficiently, which reduces η t .
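The parameter sweep described above can be organized as in the following Python sketch; the mission simulator is abstracted as a callable, so `run_mission` and the toy return values in the usage below are assumptions of this sketch, not the paper's MATLAB platform:

```python
def parameter_sweep(run_mission, step=0.025, upper=0.5):
    """Evaluate eta_t = c_t / C_t over a grid of (alpha_g, beta_g).

    run_mission(alpha_g, beta_g) -> (completed_tasks, collective_cost)
    Returns {(alpha_g, beta_g): eta_t} for every grid point.
    """
    n = int(round(upper / step))
    grid = [round(i * step, 6) for i in range(n + 1)]
    results = {}
    for a in grid:
        for b in grid:
            c_t, cost_t = run_mission(a, b)
            # eta_t is 0 when no task is completed (avoid division by zero)
            results[(a, b)] = c_t / cost_t if cost_t > 0 else 0.0
    return results
```

The same helper, with `step=0.05` and `upper=1.0`, covers the hunter-parameter sweep described next.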
In the same way, we ran the algorithms for all values of α h and β h that are multiples of 0.05 such that 0 ≤ α h ≤ 1 and 0 ≤ β h ≤ 1, while α g = β g = 0.15. η t has been calculated for each pair of values of α h and β h , as illustrated in Figure 8. This figure shows the correlation between the total effectiveness of the HGMP and the profit margin parameters of the hunters. The horizontal and vertical axes are α h and β h , respectively, i.e., the scaling parameters of the hunters, and the color mapping represents the total effectiveness of the HGMP, i.e., η t ; the yellow area marks where the total effectiveness is maximum. Accordingly, η t is very low when approximately α h + β h < 0.4, which means the CPM and UPM of the hunters are too small and only a few agreements are reached. Then, for α h + β h > 0.4, η t increases gradually to reach its maximum and then decreases again.

According to the proposed reasoning mechanism, when the scaling parameter of an agent's CPM decreases, the agent becomes less confident, and when the scaling parameter of an agent's UPM increases, the agent becomes less conservative. In this regard, for both types of agents, the best strategy to reach the maximum of η t is neither being completely confident nor being fully conservative; a combination of both leads to the optimum result. The oblique yellow area in Figure 7(b), marking the maximum values of η t , is much narrower than the one in Figure 8(b). This shows that the CPMs and UPMs of the gatherers have a more distinct influence on η t than those of the hunters. The rationale behind this dissimilarity is that hunters rely on their CPMs and UPMs after hunting a task, i.e., after accomplishing a task, and then consider them only for finding a gatherer to complete the task. On the contrary, gatherers consider their CPMs and UPMs before gathering a task, i.e., before any accomplishment. Consequently, this difference gives the gatherers' CPMs and UPMs a much more distinct influence on η t .

The Effect of Multitask Planning on the HGMP's Effectiveness.
In this section, we study the effect of the proposed multitask-planning algorithm for gatherers on the total effectiveness of the HGMP defined in (16). Accordingly, we investigate the effect of q max , the queue size of each gatherer, on η t . To that end, we ran 200 missions for each value of q max , varying from 1 to 10, and measured η t in each mission, as illustrated in Figure 9.
To understand how much η t increases when q max changes from q max = 1 to q max = 10, we applied a paired T-test to two of the collected data sets. The first data set contains 200 measurements of η t for q max = 1, and the second data set comprises 200 measurements of η t for q max = 10. The test has been conducted considering H 0 : μ 2 − μ 1 ≤ D 0 , H 1 : μ 2 − μ 1 > D 0 , D 0 = 0.7μ 1 , n s = 200, dof = 199, and α = 0.05, where μ 1 and μ 2 denote the average of η t for the first and second data sets, respectively. According to the test result, p value = 0.0004, t = 3.39, and t 0.05,199 = 1.65. Since t > t 0.05,199 and p value < α, we reject H 0 . Therefore, the results show that η t increases by more than 70% when q max changes from 1 to 10. Moreover, the results also show that the HGMP remains fair for gatherer agents as q max increases. Figure 10 demonstrates that there is no significant difference between the effectiveness of the two gatherers for any value of q max .
Besides, Figure 11 shows how the HGMP's total effectiveness converges for different values of q max , namely, q max = 1, q max = 4, and q max = 10. According to the results, as q max increases, η t becomes more variable and the convergence time decreases, while η t improves significantly, as proven above.

Functionality Validation of the HGMP by a Comparison.
Figure 11: Convergence of the HGMP's total effectiveness for q max = 1, q max = 4, and q max = 10, during τ max = 1000 iterations.

In this section, we intend to analyze the functionality of the proposed hunter-and-gatherer scheme. As discussed before, we consider a dynamic problem to be a TA:SP problem where each task is composed of two sequential detection and completion subtasks. Although we have discussed different aspects of the proposed approach in the previous sections, here we explicitly compare the proposed approach with an alternative approach in which
there is only one type of agent doing both exploration and completion of tasks. According to the rationale behind the hunter-and-gatherer approach, hunters must be more agile and cost-efficient in exploration and maneuvering. Therefore, we first plotted the total effectiveness of the HGMP with respect to ρ h /ρ g , which ranges from ρ h ≪ ρ g , i.e., ρ h /ρ g = 0, to ρ h ≈ ρ g , i.e., ρ h /ρ g = 1. Second, we ran the explained alternative approach to be able to judge the HGMP's functionality. Since in this approach there is no hunter-and-gatherer scheme, there is only one type of agent, and the obtained total effectiveness does not vary with the ratio ρ h /ρ g . By this comparison, we essentially wanted to answer the following question: is the HGMP profitable compared to the alternative method? Figure 12 shows the results of the simulations implemented for that purpose. The answer depends on the ratio ρ h /ρ g , which is why we ended up with a criterion for the HGMP to be profitable. According to the results, for ρ h /ρ g < 0.2, the HGMP has a distinct advantage in terms of η t over the alternative model for any value of q max . Furthermore, the HGMP still remains advantageous for ρ h /ρ g < 0.6. Consequently, it is economic to employ the HGMP for the stated dynamic problem if and only if we utilize hunter and gatherer agents that satisfy ρ h /ρ g < 0.6. In other words, if we employ two robots of different types as a hunter and a gatherer such that the hunter's cost for following a certain path is less than 0.6 of the gatherer's cost for following the same path, then employing the HGMP will be profitable. Considering the USAR example, the hunter can be a small UAV, while the gatherer would necessarily be a heavy-duty UGV. If we consider the cost to be the power consumption, then the ρ h /ρ g < 0.6 criterion is easily satisfied.
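The resulting profitability criterion lends itself to a one-line check; `hgmp_is_profitable` is an illustrative helper, and the 0.6 threshold is the empirical value read off Figure 12:

```python
def hgmp_is_profitable(rho_h, rho_g, threshold=0.6):
    """HGMP beats the single-agent-type alternative when the hunter's
    path-following cost is below `threshold` times the gatherer's.

    rho_h, rho_g: cost of following the same path for a hunter and a
    gatherer, in any common metric (e.g., power consumption).
    """
    if rho_g <= 0:
        raise ValueError("gatherer cost must be positive")
    return rho_h / rho_g < threshold
```

For example, a lightweight UAV hunter paired with a heavy-duty UGV gatherer would typically satisfy the criterion by a wide margin.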
A screen capture video of the simulation results is available as supplementary material along with this paper at "youtu.be/HJuiP5DMZfo".

Conclusion
Inspired by the problem of "MRTA in an unknown environment," we proposed the idea of task allocation based on coupling and cooperation between complementary teams in a hunter-and-gatherer scheme. Furthermore, this work presented distributed reasoning mechanisms relying on the notions of certainty and uncertainty profit margins, in which levels of confidence and conservativeness are modeled, and proposed an effective multitask-planning algorithm for gatherers that allows them to queue multiple tasks and find the optimal solution for completing a group of tasks rather than completing them one by one. By comparing the proposed hunter-and-gatherer scheme with an alternative method, in which a single type of agent does both exploration and completion of tasks, we established a criterion for judging the profitability of the proposed method. Examining the real-world problems mentioned earlier confirms that the profitability criterion is reasonably satisfiable. We also found that extreme behavior of an agent, being either too confident or too conservative, hurts the total effectiveness of the mission. Furthermore, the statistical analysis demonstrates a significant improvement of total effectiveness effected by the multitask-planning algorithm. However, while the computational complexity of executing the multitask-planning algorithm grows manyfold as an agent's queue size increases, the total effectiveness of the HGMP does not increase linearly.
Future works will consider the problem of adjusting the scaling parameters by an agent during a mission to achieve the optimal performance from both agent and team points of view. We also intend to develop a multirobot exploration algorithm based on the notions of profit margins in the context of dynamic MRTA problems and investigate the effect of different multirobot exploration algorithms on the HGMP.
Data Availability
The data and source code used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.