Modeling of Task Planning for Multirobot System Using Reputation Mechanism

The modeling of task planning for a multirobot system is developed in two parts: task decomposition and task allocation. For task decomposition, the conditions and process of decomposition are elaborated. For task allocation, the collaboration strategy, the framework of the reputation mechanism, and three types of reputation (robot individual reputation, robot group reputation, and robot direct reputation) are defined in detail. A time calibration function and a group calibration function are designed to improve the effectiveness of the proposed method, and they are proved to have the characteristics of time attenuation, historical experience relatedness, and newly joined robot reward. Tasks are preferentially assigned to robots with a higher overall reputation, which helps to increase the success rate of task execution and thereby reduces the time spent on task recovery and redistribution. Player/Stage is used as the simulation platform, and three biped robots are built as the experimental apparatus. The task planning results are compared with those of other allocation methods. Simulation and experimental results illustrate the effectiveness of the proposed method for multirobot collaboration systems.


Introduction
The field of distributed robotics started in the late 1980s, when several researchers began investigating issues in multiple mobile robot systems. Before then, research had concentrated on either single-robot systems or distributed problem-solving systems that did not involve robotic components [1]. Topics of particular interest in early distributed robotics work included (1) cellular (or reconfigurable) robot systems, such as the cellular robotic system [2] and cyclic swarms [3]; (2) multirobot motion planning, such as traffic control [4] and movement in formation [5]; and (3) architectures for multirobot collaboration, such as ACTRESS [6]. Work on other topics, such as robot colonies [7] and heterogeneous multirobot systems [8], has also been published.
In recent years, multirobot systems research has made great progress in many areas [9]. Research issues in multirobot systems [10] include biological inspiration, communication, architectures, localization/mapping/exploration, object transport and manipulation, motion coordination and formation [11,12], reconfigurable robots, and learning.
Collaboration is an important characteristic and a major evaluation indicator of multirobot systems [6]. Through collaboration, a multirobot system can complete missions that a single robot cannot. The overall performance is not simply the linear sum of the contributions of the individual robots; it also includes an incremental gain arising from the interactions among individuals [13].
Task planning includes two aspects [9]: task decomposition and task allocation. Task planning systems are now widely used in space shuttles, satellites, unmanned aircraft, and so forth. Multirobot systems research is closely related to distributed artificial intelligence, which consists of two main research areas: distributed problem solving (DPS) and multiagent systems. DPS focuses mainly on how to decompose a particular problem and how to solve it among multiple agents, and it comprises three main aspects: problem decomposition, subproblem solving, and synthesis of the results.
Traditional methods for task decomposition and distribution include the Contract Net mechanism [14], local/global planning [15], distributed searching, and joint intentions. The Contract Net Protocol is an important decentralized task decomposition and distribution model in distributed artificial intelligence.
Traditional task allocation is mainly based on the contract net protocol (CNP) [14], which resolves conflicts through negotiation. Parker [16] developed ALLIANCE, a behavior-based distributed multirobot collaboration architecture, at MIT, together with a corresponding system capable of learning its parameters, named L-ALLIANCE [17]. The system uses an incentive-based mechanism for behavior-based task allocation and improves the generalization and fault tolerance of multirobot systems. Parker also proposed a behavior-based task allocation mechanism named ASyMTRe [18].
Gerkey applied a market-based approach to dynamic multirobot task allocation in a system named MURDOCH [19]. Market-based task allocation relies on negotiation: the robots obtain their assigned tasks through mutual consultation and negotiation according to an agreed protocol. However, these task allocation methods have limitations. Behavior-based task allocation does not take the collaboration history into account, while market-based task allocation does not fully consider the time factor, which leads to poor robustness.
In addition, multirobot search and rescue, cooperative localization, motion coordination, and formation control have attracted many researchers in the field of robot collaboration systems [20]. Collaboration optimization methods, including linear programming and the Hungarian algorithm [21], can be applied to simple task and robot collaboration optimization problems. However, as the numbers of robots and tasks in the system increase, the computational cost grows rapidly.
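As a rough illustration of the scaling issue, the following Python sketch solves the simplest case (one robot per task, square cost matrix) by exhaustive search. The cost values and the function name are hypothetical. Enumerating all n! assignments quickly becomes infeasible; the Hungarian algorithm solves this particular case in polynomial time, whereas more general allocation problems are considerably harder.

```python
from itertools import permutations


def best_assignment(cost):
    """Exhaustive search over all one-to-one robot-task assignments.

    cost[i][j] is the (hypothetical) cost of robot i performing task j in a
    square n x n matrix. Returns (minimum total cost, task index per robot).
    The loop visits n! permutations, so it is only usable for tiny n.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):  # perm[i] is the task given to robot i
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


if __name__ == "__main__":
    cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
    print(best_assignment(cost))  # (5, [1, 0, 2])
```

With 3 robots only 6 assignments are checked, but 10 robots already require 3,628,800.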
In this paper, a task planning method based on a reputation mechanism is proposed. Reputation plays an important role in collaboration among people: tasks are often assigned according to a person's reputation, which is earned through the evaluation of previously completed tasks. Reputation theory is attracting interest from industrial and academic research communities and is increasingly being integrated into online services and applications, especially in network computing systems.
The reputation of a robot is the overall assessment that another robot forms of it during a continuous interaction process; it summarizes the robot's observed past actions and its gradually evolving capabilities. The assessment can be used to guide the robot's further actions. Reputation has five attributes: (1) dependence on the environmental context; (2) dynamic change over time; (3) a time lag, since reputation forms through continuous learning and historical experience; (4) group dependence, since it is affected by the group or alliance to which the robot belongs; and (5) an incentive and reward mechanism. A robot with a good reputation should be rewarded in task allocation; otherwise, it should be punished.
The rest of the paper is organized into four sections as follows: the system structure and basic concepts are presented in Section 2. The modeling of task planning using reputation mechanism is proposed in Section 3. Simulation and experimental results are shown in Section 4, and concluding remarks are in Section 5.

Basic Concepts and System Structure
Task planning consists of two parts: decomposition and allocation. Task allocation in the multirobot system is divided into two categories: direct allocation and delegation allocation. Direct allocation assigns a task to a robot that can collaborate directly. If that robot cannot complete the assigned task, the task can be delegated to other robots, and it can be delegated further if needed.
A periodical constraint is used to guarantee the timeliness of task allocation. It specifies a time window, from a start time to an end time, within which the task allocation is valid. If a task cannot be accomplished within the arranged time, it is withdrawn, which improves the performance of the system. A typical collaboration system is shown in Figure 1.
In Figure 1, there are two cooperative groups. Tasks can be assigned by direct allocation or indirect delegation.
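The following Python sketch illustrates direct allocation, repeated delegation, and the periodical constraint described above. The `Robot` class, the skill sets, and the `allocate`/`allocation_status` helpers are hypothetical; the text does not specify how a robot chooses whom to delegate to, so the sketch simply tries delegates in list order.

```python
from dataclasses import dataclass, field


@dataclass
class Robot:
    name: str
    skills: set                                      # task types the robot can do itself
    delegates: list = field(default_factory=list)   # robots it may delegate to


def allocate(robot, task_type, visited=None):
    """Direct allocation if possible, otherwise delegation (possibly repeated).

    Returns the chain of robot names from the directly assigned robot to the
    executing robot, or None if no reachable robot can do the task.
    """
    if visited is None:
        visited = set()
    if robot.name in visited:          # avoid delegation cycles
        return None
    visited.add(robot.name)
    if task_type in robot.skills:      # direct allocation
        return [robot.name]
    for other in robot.delegates:      # delegation allocation
        chain = allocate(other, task_type, visited)
        if chain is not None:
            return [robot.name] + chain
    return None


def allocation_status(window, now, finished_at=None):
    """Periodical constraint: the allocation is valid only within [start, end]."""
    start, end = window
    if finished_at is not None and start <= finished_at <= end:
        return "completed"
    if now > end:
        return "withdrawn"             # not accomplished in the arranged time
    return "active"


if __name__ == "__main__":
    r3 = Robot("R3", {"push", "lift"})
    r2 = Robot("R2", {"push"}, [r3])
    r1 = Robot("R1", set(), [r2])
    print(allocate(r1, "lift"))                      # ['R1', 'R2', 'R3']
    print(allocation_status((0.0, 60.0), now=75.0))  # withdrawn
```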
In the robot reputation system, the robot reputation level (RRL) quantifies the reputation of each robot on the basis of its historical experience and the overall evaluation of the system. Reputation values lie between 0 and 1, where 0 represents the lowest reputation and 1 the highest. In Figure 2, robot groups are also marked, the value on the edge from robot 1 to robot 2 is the direct reputation from robot 1 to robot 2, and the value on a robot is its individual reputation.
In Figure 2, the reputation of a group is the value beside the group's name; for example, the reputation of group 1 is 0.5. The reputation of a robot in the collaboration system consists of three parts: (1) its individual reputation; (2) the reputation of the group to which it belongs; and (3) the direct reputation between the two robots intending to cooperate.
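A minimal Python sketch of how the three reputation components of Figure 2 might be stored and looked up. Only the group 1 value of 0.5 comes from the text; the other numbers and all names are hypothetical. The sketch deliberately stops at gathering the three components and does not assume any formula for combining them into an overall reputation.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationGraph:
    """Reputation data as drawn in Figure 2 (all values in [0, 1])."""
    individual: dict = field(default_factory=dict)  # robot -> individual reputation
    group_of: dict = field(default_factory=dict)    # robot -> group name, if any
    group: dict = field(default_factory=dict)       # group name -> group reputation
    direct: dict = field(default_factory=dict)      # (from robot, to robot) -> direct reputation

    def components(self, requester, candidate):
        """Return the three reputation parts of `candidate` as seen by `requester`.

        Missing entries are returned as None (e.g. a robot without a group).
        """
        g = self.group_of.get(candidate)
        return (
            self.individual.get(candidate),
            self.group.get(g) if g is not None else None,
            self.direct.get((requester, candidate)),
        )


if __name__ == "__main__":
    graph = ReputationGraph(
        individual={"R1": 0.8, "R2": 0.6},
        group_of={"R1": "group 1", "R2": "group 2"},
        group={"group 1": 0.5, "group 2": 0.7},
        direct={("R1", "R2"): 0.9},
    )
    print(graph.components("R1", "R2"))  # (0.6, 0.7, 0.9)
```

Direct reputation is keyed by an ordered pair because the reputation from robot 1 to robot 2 need not equal the reputation from robot 2 to robot 1.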

Task Planning Using Reputation Mechanism
In this section, task planning is divided into two parts: task decomposition and task allocation. First, the conditions and process of task decomposition are presented. Second, task allocation using the reputation mechanism is defined and presented in detail.

Task Decomposition.
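A task can be decomposed if it meets certain conditions; four task decomposition conditions are given in [22]. If the task can be decomposed and is assignable, it is modeled as an automaton, formally defined as follows. An automaton is a tuple $A = (Q, q_0, E, \delta, Q_m)$, consisting of a set of states $Q$; the initial state $q_0$; a finite alphabet $E$ of events that cause transitions between the states; a transition relation $\delta$, where $\delta(q, e)$ gives the successor states of $q$ under event $e$; and the set of acceptable states $Q_m \subseteq Q$. A string $s = e_0 e_1 \cdots e_{n-1} \in E^*$ over the finite alphabet $E$ induces a run, that is, a sequence $q_0, q_1, \ldots, q_n$ of $n + 1$ states in $Q$ such that $q_0$ is the initial state and $q_{i+1} \in \delta(q_i, e_i)$ for $i = 0, \ldots, n-1$.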
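The parallel composition of two deterministic automata $A_1 = (Q_1, q_{0,1}, E_1, \delta_1, Q_{m,1})$ and $A_2 = (Q_2, q_{0,2}, E_2, \delta_2, Q_{m,2})$ is the automaton $A_1 \parallel A_2 = (Q_1 \times Q_2, (q_{0,1}, q_{0,2}), E_1 \cup E_2, \delta, Q_{m,1} \times Q_{m,2})$, where
$$\delta((q_1, q_2), e) = \begin{cases} (\delta_1(q_1, e), \delta_2(q_2, e)) & \text{if } e \in E_1 \cap E_2,\ \delta_1(q_1, e)!,\ \delta_2(q_2, e)!, \\ (\delta_1(q_1, e), q_2) & \text{if } e \in E_1 \setminus E_2,\ \delta_1(q_1, e)!, \\ (q_1, \delta_2(q_2, e)) & \text{if } e \in E_2 \setminus E_1,\ \delta_2(q_2, e)!, \\ \text{undefined} & \text{otherwise,} \end{cases}$$
and $\delta(q, e)!$ means that $\delta(q, e)$ is defined. The natural projection of an automaton $A$ onto the local event set $E_i$ is obtained as $P_i(A)$ by replacing the events in $E \setminus E_i$ with the empty string and merging the states connected by those events.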
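The composition rule above can be sketched in Python for small deterministic automata stored as transition dictionaries. The representation, the function names, and the two toy automata are hypothetical; marked (acceptable) states are left out, and the projection helper works on individual event sequences rather than constructing the projected automaton $P_i(A)$, which would additionally require merging states.

```python
def parallel(a1, a2):
    """Reachable part of A1 || A2 for deterministic automata.

    Each automaton is a dict with 'init', 'events' (a set) and 'delta'
    ({(state, event): next_state}); marked states are omitted for brevity.
    """
    e1, e2 = a1["events"], a2["events"]
    init = (a1["init"], a2["init"])
    delta, seen, stack = {}, {init}, [init]
    while stack:
        q1, q2 = stack.pop()
        for e in sorted(e1 | e2):
            d1 = a1["delta"].get((q1, e))
            d2 = a2["delta"].get((q2, e))
            if e in e1 and e in e2:        # shared event: both must move together
                if d1 is None or d2 is None:
                    continue
                nxt = (d1, d2)
            elif e in e1:                  # private to A1: A2 stays put
                if d1 is None:
                    continue
                nxt = (d1, q2)
            else:                          # private to A2: A1 stays put
                if d2 is None:
                    continue
                nxt = (q1, d2)
            delta[((q1, q2), e)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {"init": init, "events": e1 | e2, "delta": delta}


def run(a, word):
    """State reached after the event sequence `word`, or None if undefined."""
    q = a["init"]
    for e in word:
        q = a["delta"].get((q, e))
        if q is None:
            return None
    return q


def project(word, local_events):
    """Natural projection of an event sequence onto a local event set."""
    return [e for e in word if e in local_events]


if __name__ == "__main__":
    A1 = {"init": 0, "events": {"a", "c"}, "delta": {(0, "a"): 1, (1, "c"): 2}}
    A2 = {"init": 0, "events": {"b", "c"}, "delta": {(0, "b"): 1, (1, "c"): 2}}
    C = parallel(A1, A2)
    print(run(C, "abc"), run(C, "bac"), run(C, "acb"))  # (2, 2) (2, 2) None
    print(project("abc", {"a", "c"}))                   # ['a', 'c']
```

The private events `a` and `b` interleave freely, while the shared event `c` can only occur once both automata are ready for it.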
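(1) Decomposition Condition 1 (DC1):
$$\forall e_1, e_2 \in E,\ q \in Q:\ [\delta(q, e_1)! \wedge \delta(q, e_2)!] \Rightarrow [\exists E_i \in \{E_1, \ldots, E_n\},\ \{e_1, e_2\} \subseteq E_i] \vee [\delta(q, e_1 e_2)! \wedge \delta(q, e_2 e_1)!],$$
where $e_1$ and $e_2$ are events, $E$ is the set of all events, $E_1, \ldots, E_n$ are the local event sets, $Q$ is the set of states, $\delta$ is the transition relation, and $\delta(q, s)!$ means that the event sequence $s$ is defined at state $q$. In words, if both $e_1$ and $e_2$ can occur at state $q$, then either some local event set contains both $e_1$ and $e_2$, or both orders $e_1 e_2$ and $e_2 e_1$ remain possible from $q$.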
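(2) Decomposition Condition 2 (DC2):
$$\forall e_1, e_2 \in E,\ q \in Q,\ s \in E^*:\ [\delta(q, e_1 e_2 s)! \vee \delta(q, e_2 e_1 s)!] \Rightarrow [\exists E_i \in \{E_1, \ldots, E_n\},\ \{e_1, e_2\} \subseteq E_i] \vee [\delta(q, e_1 e_2 s)! \wedge \delta(q, e_2 e_1 s)!],$$
where $E^*$ is the set of finite event sequences. In words, if the sequence $e_1 e_2 s$ or $e_2 e_1 s$ is possible from state $q$, then either some local event set contains both $e_1$ and $e_2$, or both $e_1 e_2 s$ and $e_2 e_1 s$ are possible from $q$.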
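Checking DC1 on a small automaton is mechanical, as the following Python sketch shows. The dictionary encoding and the function names are hypothetical; the sketch covers only DC1, since DC2 quantifies over all continuations $s \in E^*$ and would need a bounded or symbolic search.

```python
from itertools import combinations


def defined(delta, q, word):
    """True if the event sequence `word` can be executed from state q."""
    for e in word:
        if (q, e) not in delta:
            return False
        q = delta[(q, e)]
    return True


def dc1_violations(states, delta, local_sets):
    """Check Decomposition Condition 1 on a deterministic automaton.

    delta maps (state, event) -> next state; local_sets is a list of sets.
    Returns the (state, e1, e2) triples that violate DC1; [] means DC1 holds.
    """
    violations = []
    for q in states:
        enabled = sorted(e for (s, e) in delta if s == q)
        for e1, e2 in combinations(enabled, 2):
            same_agent = any(e1 in ei and e2 in ei for ei in local_sets)
            both_orders = defined(delta, q, [e1, e2]) and defined(delta, q, [e2, e1])
            if not (same_agent or both_orders):
                violations.append((q, e1, e2))
    return violations


if __name__ == "__main__":
    choice = {(0, "a"): 1, (0, "b"): 2}             # either a or b, never both
    print(dc1_violations([0, 1, 2], choice, [{"a"}, {"b"}]))  # [(0, 'a', 'b')]
    print(dc1_violations([0, 1, 2], choice, [{"a", "b"}]))    # []
```

A choice between two events owned by different robots violates DC1, because neither robot alone can resolve it; giving both events to one local event set, or allowing both orders, removes the violation.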
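The Scientific World Journal
The remaining decomposition conditions in [22] require consistency among the local projections: for $i \neq j$, event sequences taken from $P_i(L(A))$ and $P_j(L(A))$ whose common events begin with the same event must be jointly realizable in $A$. Here $L(A)$ is the set of all event sequences of automaton $A$, $\tilde{L}(\cdot)$ is the set of event sequences of the corresponding automaton, and $P_i(s)$ is the natural projection of an event sequence $s$ onto the $i$-th local event set.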
Linear temporal logic (LTL) is used to specify the overall task of the robot system; its formulas are structurally close to natural language. The temporal operators used for task decomposition are as follows [22]: (i) next "○": requires that a property hold in the next state of the path; (ii) until "$\mathcal{U}$": combines two properties; the combined property holds if there is a state on the path where the second property holds and the first property holds at every preceding state on the path; (iii) eventually "◇": asserts that a property will hold at some future state on the path; (iv) always "□": specifies that a property holds at every state on the path. For example, reachability while avoiding some events can be expressed in the form $\neg \varphi_1 \,\mathcal{U}\, \varphi_2$, which requires $\varphi_2$ to be reached without $\varphi_1$ holding before it. The formula can be transformed into an equivalent task automaton $A$, whose natural projections onto the local event sets are $P_i(A)$. LTL and automata are closely related: an automaton can be constructed directly from an LTL formula, because every LTL formula is built from logical connectives and temporal operators.
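To make the operator definitions concrete, the following Python sketch evaluates formulas built from these operators on a finite trace, where each position holds the set of atomic propositions true there. This is an assumption for illustration: standard LTL is defined over infinite paths, and the sketch uses finite-trace semantics in which "next" is false at the last position. The tuple encoding and the proposition names are hypothetical.

```python
def holds(f, trace, i=0):
    """Evaluate formula f at position i of a finite, non-empty trace.

    Formulas are nested tuples: ("ap", name), ("not", f), ("and", f, g),
    ("next", f), ("until", f, g), ("eventually", f), ("always", f).
    Each trace position is the set of atomic propositions true there.
    """
    op = f[0]
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "next":                      # false at the last position
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "until":                     # g eventually holds, f holds before it
        for k in range(i, len(trace)):
            if holds(f[2], trace, k):
                return True
            if not holds(f[1], trace, k):
                return False
        return False
    if op == "eventually":
        return any(holds(f[1], trace, k) for k in range(i, len(trace)))
    if op == "always":
        return all(holds(f[1], trace, k) for k in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")


if __name__ == "__main__":
    # "reach goal while avoiding obstacle": (not obstacle) U goal
    avoid = ("until", ("not", ("ap", "obstacle")), ("ap", "goal"))
    print(holds(avoid, [set(), {"area2"}, {"goal"}]))     # True
    print(holds(avoid, [set(), {"obstacle"}, {"goal"}]))  # False
```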

Reputation Mechanism.
This subsection presents the negotiation process of the robot collaboration system, which is mainly concerned with how collaborating robots build partnerships according to the reputation mechanism.
Definition 1 (robot initialization set). The robot initialization set is a tuple $\langle R, I_R, \mathrm{CHAP}_R, \mathrm{AC}_R \rangle$, where $R$ denotes the ontology of a collaborating robot, from which the identification corresponding to $R$ can be obtained, and $I_R \subset I$ is the set of identifications issued to $R$. $\mathrm{CHAP}_R: \mathit{Attr} \to \mathit{Strategy}$ is a partial function mapping a finite subset of attributes to the strategy set. The attributes in the domain of $\mathrm{CHAP}_R$ are called limited properties; for a limited property $a$, $\mathrm{CHAP}_R[a]$ is called the verification strategy of $a$. $\mathrm{AC}_R: \mathit{Res} \to \mathit{Strategy}$ is a partial function mapping resources to a finite strategy set.
Definition 2 (robot collaboration strategy). A collaboration strategy is a quintuple $\langle S, M, \mathrm{init}, \mathrm{start}, \mathrm{reply} \rangle$, where $S$ is the finite set of collaboration process states, each state is denoted by $s$, and $\mathit{success}, \mathit{failure} \in S$; $M$ is the finite set of collaboration process messages. A message is denoted by $m$, usually with subscripts, and a message sequence is written $m_1, \ldots, m_k$. The function $\mathrm{init}: \mathcal{C} \times \mathcal{R} \to S$ defines the initial state of the requester: given an initialization set $c$ and a requesting robot $r$, this state is $\mathrm{init}(c, r) = s$, where $s \notin \{\mathit{success}, \mathit{failure}\}$. The function $\mathrm{start}: \mathcal{C} \times \mathit{Res} \times \mathcal{R} \to S \times M$ defines how a requested robot starts the collaboration. The function $\mathrm{reply}: S \times M \to S \times M$ defines each action of the robot: given the robot initialization set $c$, the current state $s$, and the latest message $m$ from the other side, $\mathrm{reply}(s, m) = \langle s', m' \rangle$.
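A collaboration strategy of this form is essentially a small message-driven state machine. The Python sketch below is one hypothetical instance: the state names, the message kinds, and the reputation threshold used to accept an offer are assumptions, not taken from the paper, and `reply` receives the initialization set explicitly instead of being parameterized by it.

```python
from dataclasses import dataclass

SUCCESS, FAILURE = "success", "failure"


@dataclass(frozen=True)
class InitSet:
    """Hypothetical stand-in for the robot initialization set (Definition 1)."""
    robot_id: str
    min_reputation: float   # hypothetical acceptance threshold


def init(c, requester):
    """Initial state of the requester; never success or failure."""
    return "waiting"


def start(c, resource, requester):
    """The requested robot starts the collaboration with a request message."""
    return "negotiating", ("request", resource, requester)


def reply(c, state, message):
    """One protocol step: (current state, latest message) -> (next state, message)."""
    kind = message[0]
    if state == "negotiating" and kind == "offer":
        partner_reputation = message[1]
        if partner_reputation >= c.min_reputation:
            return "executing", ("accept",)
        return FAILURE, ("reject",)
    if state == "executing" and kind == "done":
        return SUCCESS, ("ack",)
    if kind == "abort":
        return FAILURE, ("ack",)
    return state, ("wait",)   # unexpected message: keep the current state


if __name__ == "__main__":
    c = InitSet("R2", min_reputation=0.6)
    s = init(c, "R2")
    s, m = start(c, "box B2", "R2")
    s, m = reply(c, s, ("offer", 0.7))   # partner reputation 0.7 >= 0.6
    s, m = reply(c, s, ("done",))
    print(s)  # success
```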
Definition 4 (direct reputation). Let $D_1$ denote the domain to which robot $R_1$ belongs and $D_2$ the domain to which robot $R_2$ belongs. The direct reputation from $R_1$ to $R_2$ is defined over these two domains and takes values in $[0, 1]$.

Definition 8 (time calibration function). Let $D_1$ denote the domain to which robot $R_1$ belongs and $D_2$ the domain to which robot $R_2$ belongs, and let $t^-$ denote the time of the last collaboration between $R_1$ and $R_2$. In the time calibration function $T$, $k_0$ denotes the history experience coefficient (generally, $k_0 = 1$); $k_1$ denotes the reputation reward coefficient; $e^{-k_3 (t - t^-)}$ denotes the correction operator; $k_2$ denotes the reputation penalty coefficient; $P^+(t)$ denotes the incentive ratio at time $t$; $P^-(t)$ denotes the penalty ratio at time $t$; $v$ denotes the validation coefficient of the collaboration, with $v = 1$ if the assigned task is successfully finished and $v = 0$ otherwise; and $k_3$ denotes the attenuation coefficient.
If a robot does not take any assigned task, its reputation diminishes as time goes on. The history experience, reward, penalty, and attenuation coefficients satisfy $0 < k_0, k_1, k_2, k_3 \le 1$, and $t > 0$.
Characteristic 1 (time attenuation). Without compensation, the time calibration function $T$ decreases with time; that is, $T(t_1) > T(t_2)$ whenever $t_1 < t_2$. Because the attenuation coefficient satisfies $k_3 > 0$ and $t_1 < t_2$, the difference $T(t_1) - T(t_2)$ is positive, so the proposition holds. Hence, during collaboration without compensation, the value of the function decreases.
Characteristic 2 (natural attenuation curve). Generally, the history experience coefficient $k_0$ is 1, so the value of (11) depends on the time $t$ and the attenuation coefficient $k_3$. For $t > 0$ and different values of $k_3$ with $0 < k_3 \le 1$, a family of curves is obtained; these are called the attenuation curves.
Characteristic 3 (historical experience related). Suppose that (11) were not related to historical experience. Then, in the absence of natural attenuation and of penalty and reward compensation, the reputation would be independent of historical experience, which requires $k_0\, T(t^-, \cdot) = 0$, where $T$ is the time calibration function and $k_0$ is the history experience coefficient. Because $T(t^-, \cdot) \neq 0$, it follows that $k_0 = 0$, which contradicts $k_0 \neq 0$. Therefore, the assumption is false; that is, (11) is related to historical experience.
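The following Python sketch does not reproduce Eq. (11); it uses a hypothetical function that is merely consistent with the coefficients listed in Definition 8: the previous reputation is scaled by $k_0$ and attenuated by $e^{-k_3 (t - t^-)}$, and a reward $k_1 P^+$ or a penalty $k_2 P^-$ is added depending on the validation coefficient $v$. The default coefficient values and the clipping to [0, 1] are assumptions. The sketch reproduces the behavior of Characteristics 1 and 3: without compensation the value decays with time, and it depends on the earlier reputation.

```python
import math


def time_calibrated(prev, t_last, t, v=None, p_plus=0.0, p_minus=0.0,
                    k0=1.0, k1=0.1, k2=0.2, k3=0.05):
    """Hypothetical time-calibrated reputation (an illustration, not Eq. (11)).

    prev: reputation after the last collaboration, at time t_last
    v: 1 if the latest task succeeded, 0 if it failed,
       None if there was no new collaboration (no compensation)
    p_plus, p_minus: incentive and penalty ratios at time t
    """
    value = k0 * prev * math.exp(-k3 * (t - t_last))        # attenuated history
    if v is not None:
        value += v * k1 * p_plus - (1 - v) * k2 * p_minus    # reward or penalty
    return min(1.0, max(0.0, value))                          # keep within [0, 1]


if __name__ == "__main__":
    # No compensation: the reputation decays along an attenuation curve.
    print([round(time_calibrated(0.8, 0.0, t), 3) for t in (0, 5, 10)])
    # [0.8, 0.623, 0.485]
```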
Definition 9 (group calibration function). Let $\Omega_1$ denote the group to which robot $R_1$ belongs, $\Omega_2$ the group to which robot $R_2$ belongs, and $c$ the environment context. If $\Omega_1 = \mathrm{NULL}$ or $\Omega_2 = \mathrm{NULL}$, then the group calibration function satisfies $G(\Omega_1, \Omega_2, c) \in [0.5, 0.8]$. The coefficients satisfy $0 < g_0 \le 1$ and $0 < g_1 \le 1$, where $g_0$ denotes the positive coefficient and $g_1$ the negative coefficient. The typical parameters of groups are given in Table 1.
Characteristic 4 (newly joined robot reward). According to (13), if a robot does not belong to any group or has not collaborated with any robot in the group, its group reputation is NULL. When such a robot cooperates with other robots, the group calibration function takes a value in [0.5, 0.8], which represents a relatively high reputation; a newly joined robot is thus rewarded with a fair chance of being assigned tasks.
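Characteristic 4 can be illustrated with the following Python sketch. Only the rule that a NULL group yields a value in [0.5, 0.8] comes from the text. The concrete newcomer value 0.65, the use of `None` for NULL, the omission of the environment context $c$, and the formula used when both groups exist (an average of the two group reputations adjusted by hypothetical coefficients $g_0$, $g_1$ and collaboration counts) are assumptions for illustration, not Eq. (13).

```python
NEWCOMER_VALUE = 0.65  # hypothetical choice inside the stated range [0.5, 0.8]


def group_calibrated(omega1, omega2, n_pos=0, n_neg=0, g0=0.05, g1=0.1):
    """Hypothetical group calibration (an illustration, not Eq. (13)).

    omega1, omega2: group reputations of the two robots, or None for NULL
    n_pos, n_neg: counts of successful / failed collaborations between the groups
    """
    if omega1 is None or omega2 is None:
        return NEWCOMER_VALUE                  # newly joined robot reward
    value = 0.5 * (omega1 + omega2) + g0 * n_pos - g1 * n_neg
    return min(1.0, max(0.0, value))           # keep within [0, 1]


if __name__ == "__main__":
    print(group_calibrated(None, 0.4))  # newcomer: 0.65
    print(group_calibrated(0.3, 0.4))   # low-reputation groups: about 0.35
```

Under these assumptions a newcomer starts ahead of a robot from a poorly rated group, which is the incentive that Characteristic 4 describes.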

Experiments and Results Analysis
Many simulation platforms can be used for multirobot systems [23], for example, Player/Stage, TeamBots, Gazebo, USARSim, Webots, Microsoft Robotics Studio, Swarmbot 3D, and the Swarmanoid Simulator. In this paper, Player/Stage is selected as the simulation platform because of its ease of use and flexibility. Player/Stage [24], developed at the Robotics Research Lab of the University of Southern California beginning in 1999, is an open-source project that provides interfaces and a simulation environment for multirobot systems. Researchers worldwide can modify and extend the platform according to their requirements.

Simulation and Performance Analysis.
Three robots $R_1$, $R_2$, and $R_3$ are placed in area 1, with thrust capabilities of 2, 3, and 5, respectively, and initial reputation values of 0.5, 0.9, and 0.7. Three boxes $B_1$, $B_2$, and $B_3$ are placed in area 2 and require moving thrusts of 2, 6, and 4, respectively. A box can be moved only if the thrust provided by the robot is greater than the thrust the box requires.
Robot $R_2$, which has the highest reputation value, is responsible for sending the arranged commands to $R_1$ and $R_3$. When a robot is assigned to push a box, it moves from area 1 to area 2, finds the corresponding box, and tries to push it.
However, box $B_2$ is too heavy to be pushed by a single robot, so $R_1$ and $R_3$ are required to cooperate on it. Robot $R_2$ pushes $B_1$, after which $R_1$ and $R_2$ go back to area 1, while $R_3$ returns to area 2 and continues by pushing $B_3$ to area 3. Finally, all robots return to their initial places to wait for the next assignment. The process can be translated into the task automaton $A$, shown in Figure 3.
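The feasibility of this plan can be checked with a few lines of Python using the thrust values given above. The function names are hypothetical, and the thrusts of cooperating robots are assumed to add up, which the scenario implies but does not state explicitly.

```python
from itertools import combinations

THRUST = {"R1": 2, "R2": 3, "R3": 5}   # thrust capabilities from the text
NEEDED = {"B1": 2, "B2": 6, "B3": 4}   # thrust each box requires


def can_move(robots, box):
    """Moving condition: provided thrust must exceed the required thrust."""
    return sum(THRUST[r] for r in robots) > NEEDED[box]


def minimal_teams(box):
    """All smallest robot teams able to move the box (empty list if none)."""
    for size in range(1, len(THRUST) + 1):
        teams = [set(t) for t in combinations(sorted(THRUST), size)
                 if can_move(t, box)]
        if teams:
            return teams
    return []


if __name__ == "__main__":
    for box in sorted(NEEDED):
        print(box, minimal_teams(box))
    # The plan in the text: R2 -> B1, R1 + R3 -> B2, R3 -> B3.
    print(can_move(("R2",), "B1"), can_move(("R1", "R3"), "B2"),
          can_move(("R3",), "B3"))  # True True True
```

No single robot can move $B_2$, which is why the task requires cooperation and decomposition.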
The assigned task is to push the boxes to area 3, and each robot has its own local event set. Checking the decomposition conditions confirms that the task is decomposable, and it is decomposed into three subtasks $P_1(A)$, $P_2(A)$, and $P_3(A)$.
The DTM represents the direct reputation relationships among the three robots $R_1$, $R_2$, and $R_3$ in the collaboration system. The reputation values in the matrix are updated after each step of the collaboration. The box moving simulated with this approach is shown in Figure 4.
Two typical allocation algorithms, sequence allocation and auction allocation, are used for comparison to evaluate the performance of the proposed method. The simulation has seven robots in area 1 and 14 boxes in area 2, with no obstacles between them.
In the case of sequence allocation, the task number modulo 7 takes the values 1, 2, 3, 4, 5, 6, and 0, and the tasks are assigned to robots $R_1, R_2, \ldots, R_7$, respectively. The simulation process of sequence allocation is shown in Figure 5.
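The modulo rule can be written as the following Python sketch (the function name is hypothetical); a residue of 0 is mapped to the last robot, as in the text.

```python
def sequence_allocation(num_tasks, num_robots):
    """Assign task k (k = 1, 2, ...) to robot k mod n, mapping residue 0 to robot n."""
    plan = {}
    for k in range(1, num_tasks + 1):
        r = k % num_robots
        plan[k] = num_robots if r == 0 else r
    return plan


if __name__ == "__main__":
    print(list(sequence_allocation(14, 7).values()))
    # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]
```

The same rule with three robots gives the order distribution used later in the biped-robot experiments.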
In the case of auction allocation, the auctioneer launches the bidding at the beginning, selects the best bid according to a predetermined criterion, and assigns the task to the winner of the auction, that is, the robot offering the highest bid.
A bid matrix stores each robot's bid for each assignment. In the simulation, the tasks are performed by the robots in the following sequence: $R_5$, $R_6$, $R_1$, $R_7$, $R_4$, $R_2$, $R_3$, $R_5$, $R_6$, $R_1$, $R_7$, $R_4$, $R_2$, $R_3$. The simulation process is shown in Figure 6.
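A single-round, per-task auction over a bid matrix can be sketched in Python as follows. The bid values are hypothetical, the tie-breaking rule (the lower robot index wins) is an assumption, and the sketch ignores robot workload, which a practical auctioneer may also take into account.

```python
def auction(bids):
    """Award each task to the robot with the highest bid.

    bids[r][t] is robot r's bid for task t; ties go to the lower robot index.
    Returns the winning robot index for each task.
    """
    n_robots, n_tasks = len(bids), len(bids[0])
    return [max(range(n_robots), key=lambda r: (bids[r][t], -r))
            for t in range(n_tasks)]


if __name__ == "__main__":
    bids = [[0.2, 0.9, 0.4, 0.5],
            [0.7, 0.1, 0.4, 0.3],
            [0.6, 0.3, 0.8, 0.5]]
    print(auction(bids))  # [1, 0, 2, 0]
```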
In the case of task allocation by reputation, the system starts from an initial reputation matrix, and the direct reputation matrix (DRM) of the seven robots is updated after each collaboration process. If the assigned task is finished successfully, the direct reputation between the robots increases by 0.01; otherwise, it decreases by 0.05. In the simulation, a robot whose assigned task fails follows the same moving trajectory, to facilitate performance comparison. The process of performing the tasks is shown in Figure 7.
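The DRM update rule can be sketched in Python as below. The step sizes 0.01 and 0.05 come from the text; clipping the result to [0, 1] is an assumption based on the reputation range used throughout, and the function name is hypothetical.

```python
def update_drm(drm, i, j, success, gain=0.01, loss=0.05):
    """Update robot i's direct reputation of robot j after one collaboration.

    drm is a list of lists; the entry is raised by `gain` on success and
    lowered by `loss` on failure, then clipped to [0, 1].
    """
    step = gain if success else -loss
    drm[i][j] = min(1.0, max(0.0, drm[i][j] + step))
    return drm


if __name__ == "__main__":
    drm = [[1.0, 0.5], [0.5, 1.0]]
    update_drm(drm, 0, 1, success=True)
    update_drm(drm, 0, 1, success=False)
    print(round(drm[0][1], 2))  # 0.46
```

Because a failure costs five times what a success earns, a robot needs five successful collaborations to recover from one failure, which makes unreliable robots drop quickly in the ranking.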
Each experiment is simulated six times to obtain a reasonable performance assessment, and the reported results are the arithmetic means of the six runs.
Sequence allocation needs about 635.71 s to complete the assigned tasks, auction allocation about 591.63 s, and reputation-based allocation about 525.15 s. Reputation-based allocation is therefore the fastest of the three methods, as shown in Figure 8.
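The relative time savings implied by these averages can be computed directly; the short Python calculation below uses only the three times reported above.

```python
def reduction(baseline, proposed):
    """Fraction of completion time saved by `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


times = {"sequence": 635.71, "auction": 591.63, "reputation": 525.15}  # seconds

for method in ("sequence", "auction"):
    saved = reduction(times[method], times["reputation"])
    print(f"reputation vs {method}: {100 * saved:.1f}% less time")
# reputation vs sequence: 17.4% less time
# reputation vs auction: 11.2% less time
```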

Experimental Apparatus and Evaluation.
Biped robots are used as the experimental platform. Each robot is 33.3 cm tall when upright and 9.9 cm wide, with an arm length of 15.9 cm, a horizontal span of 41.7 cm with the arms stretched flat, an upper-body height of 13.9 cm, a waist height of 19.4 cm, and a weight of 1 kg, as shown in Figure 9.
A wireless sensor module sends and receives commands, which are encoded as standard serial data and sent to the assigned robot. The effective transmission distance is about 30 m. The experimental parameters of task allocation are shown in Table 2.
Four task allocation methods are used to evaluate efficiency: in addition to the proposed method, random allocation, order distribution, and simultaneous allocation.
An unnumbered box with a size of 8 cm³ is to be moved to the destination over a distance of 0.6 m. After receiving the task assignment, a robot moves to the side of the box, picks it up, moves it, and puts it down at the destination.
For task allocation by order distribution, the task number modulo 3 takes the values 1, 2, and 0, and the tasks are allocated to robots $R_1$, $R_2$, and $R_3$, respectively, according to the result; the time interval between task allocations is 30 s. For the simultaneous allocation algorithm, the allocation time interval is 0 s. For random allocation, a random number $x$ is drawn from $[0, 1)$: if $x \in [0, 1/3)$, the task is assigned to robot $R_1$; if $x \in [1/3, 2/3)$, to robot $R_2$; and if $x \in [2/3, 1)$, to robot $R_3$.
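The random allocation rule maps a uniform sample to one of the three robots; a Python sketch follows. The function name and the fixed seed are illustrative assumptions.

```python
import random


def random_allocation(x):
    """Map a sample x in [0, 1) to robot 1, 2 or 3 using equal-width intervals."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    if x < 1 / 3:
        return 1
    if x < 2 / 3:
        return 2
    return 3


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make runs repeatable
    print([random_allocation(rng.random()) for _ in range(9)])
```

Each robot receives a task with probability 1/3, regardless of its speed or past performance, which is the main difference from reputation-based allocation.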
Tasks numbered 1, 2, ..., $n$ are generated from the task sequences, where $n$ is the number of tasks. First, $n = 9$ tasks are arranged (small number of tasks), each with a running time of 7 s. To suppress the effect of unexpected random events, the maximum and minimum times are removed and the arithmetic mean of the remaining times is taken. The results are shown in Figure 10.
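Removing the extreme runs before averaging is a simple trimmed mean. The following Python sketch drops one maximum and one minimum value; the run times in the example are hypothetical.

```python
def trimmed_mean(times):
    """Drop one maximum and one minimum value, then average the rest."""
    if len(times) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    runs = [10.0, 12.0, 11.0, 30.0, 9.0]   # hypothetical completion times in seconds
    print(trimmed_mean(runs))              # 11.0
```

The outlier of 30 s, which might come from a random disturbance, has no influence on the result.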
When the number of tasks increases to $n = 33$ (medium number of tasks), the maximum and minimum times are again removed before averaging. The experimental results are shown in Figure 11.
In practice, robots with relatively high speed are allocated more tasks, and the efficiency of reputation-based allocation is 4.35% higher than that of the second-best method.
When the number of tasks increases further to $n = 180$ (large number of tasks), the test results are as shown in Figure 12.
With a large number of tasks, the efficiency of reputation-based allocation is 3.57% higher than that of the second-best method. The experimental results show that task allocation using the reputation mechanism can effectively improve performance and avoid delays when an individual robot fails.

Conclusion
Task planning is developed in two parts: task decomposition and task allocation. The processes of task decomposition and of task allocation using the reputation mechanism are presented. The robot collaboration strategy, the framework of the reputation mechanism, and three types of reputation (robot individual reputation, robot group reputation, and robot direct reputation) are defined in detail. A time calibration function and a group calibration function are designed to improve the effectiveness of the method, and they are proved to have the characteristics of time attenuation, historical experience relatedness, and newly joined robot reward. As a result, the success rate of collaboration is increased, and the time needed for task recovery and redistribution is reduced.
Player/Stage is used as the simulation platform, and three biped robots are built as the experimental apparatus. In the simulation, task decomposition is studied, and the task allocation results are compared with those of the sequence and auction allocation methods. In the experiments, the biped robots are used to compare the efficiency of four task allocation methods. The simulation and experimental results show that the approach is effective for multirobot systems.