Modeling Multioperator Multi-UAV Operator Attention Allocation Problem Based on Maximizing the Global Reward

This paper focuses on the attention allocation problem (AAP) in modeling multioperator multi-UAV (MOMU) systems, with the operator model and task properties taken into consideration. A model of the MOMU operator AAP based on maximizing the global reward is established and used to allocate tasks to all operators and, at the same time, to set the work time and rest time of each task. The proposed model is validated in a Matlab simulation environment, using the immune algorithm and the dynamic programming algorithm to evaluate the performance of the model in terms of the reward value with regard to work time, rest time, and task allocation. The results show that the total reward of the proposed model is larger than that obtained from previously published methods using local maximization, and that the total reward of our method has an exponent-like relation with the task arrival rate. The proposed model can improve the operators' task processing efficiency in MOMU command and control scenarios.


Introduction
The human-in-the-loop mode is currently widely used in the command and control (C2) of unmanned aerial vehicles (UAVs). With the development of autonomous control and artificial intelligence technology, the problems of a single operator controlling multiple UAVs (SOMU) have been investigated [1][2][3][4][5]. Given the increasing complexity of missions, the paradigm is shifting from a single operator managing a single UAV (SOSU) or multiple UAVs (SOMU) towards multiple operators managing multiple UAVs (MOMU) [6]. By assigning multiple operators to multiple UAVs, the flexibility of human-system decision-making can be improved [6]. Therefore, many researchers have worked on the C2 of MOMU [7][8][9][10][11][12]. However, the growing volume of information captured by UAVs limits the tasks that can be accomplished in a timely manner [6]. In order to improve the task processing throughput in MOMU, it is necessary to understand which tasks the operators should attend to and when, which is known as the attention allocation problem (AAP).
Srivastava et al. [13] pointed out that work time and rest time for each task should be allocated to the operator. Bertuccelli et al. [4] proposed a nonpreemptive scheduling formulation for a single operator performing a search mission with multiple UAVs in a time-constrained environment. Jian et al. [5] modeled a single operator AAP with one operator controlling multiple UAVs. Srivastava et al. [13][14][15][16] proposed an optimization framework to determine how much attention the operator should allocate and where it should be allocated. Crandall et al. [17,18] introduced an attention allocation method for one operator controlling multiple robots and applied this method with different strategies. They found that guiding operator attentional resources exercises the operator's judgment and experience more effectively than dictating them. However, the methods proposed above for the single operator AAP are not applicable to the multioperator AAP, in which not only the work time and rest time but also the allocation of tasks to operators must be considered.
A few studies have addressed the multioperator AAP. Verma and Rai [6] modeled the MOMU operator AAP and balanced reward maximization against minimization of the operators' workload. However, the work time of tasks in their model was fixed, and only a local optimization of one task was achieved. Majji and Rai [19] further extended the research of Srivastava et al. [16] and developed an optimized solution to the multioperator AAP, which obtained a local optimization of one operator. The aforementioned methods accomplish a local optimization in solving the multioperator AAP and do not consider the influence of wait time on the tasks. This research extends previous work [5,13,19] by taking the wait time into consideration for the multioperator AAP. With the aim of maximizing the global reward, a model of the multioperator AAP based on maximizing the global reward is developed.
The paper is organized as follows: Section 2 introduces the theory on attention allocation; Section 3 proposes the MOMU operator AAP model; the simulation and result are presented in Section 4; and finally Section 5 discusses the conclusion and further improvement.

Theory on Attention Allocation Problem
Multioperator AAP in MOMU is dominated by various factors such as operator state, task properties, and environment [19]. Environment is a complex factor due to its unpredictability; thus only the following two main aspects are considered here: operator state and task properties. Operator state includes operator utilization ratio, operator skill level, and operator performance. Task properties include task starting time, latency penalty due to wait time, weight, complexity, and reward.

Operator Utilization Ratio.
Operator utilization ratio represents how busy an operator is. The higher the utilization ratio, the busier the operator.
The relation between operator utilization ratio and operator performance obeys the Yerkes-Dodson law [20], which is shown in Figure 1. According to the Yerkes-Dodson law, the performance of a human operator is unimodal and can be expressed by an inverted-U function of the utilization ratio [13]. When the utilization ratio is within an appropriate range, the performance of the operator is relatively high; otherwise, the performance decreases. A high utilization ratio fatigues the operator, while a low utilization ratio takes the operator out of the control loop and reduces the operator's situational awareness. As a result, both high and low utilization ratios distract the operator from the C2 of UAVs, so the operator needs more time to deal with a task. The operator utilization ratio $u$ can be defined by the following differential equation [13]:
$$\dot{u} = \frac{b - u}{\tau},$$
where $b$ equals 1 while the operator is working and 0 while the operator is resting, and $\tau$ is a constant that depends on the operator's sensitivity. If the initial operator utilization ratio is $u_0 \in [u_{\min}, u_{\max}]$, the work time is $t$, and the rest time after the task is $r$, then the recursive expression for the operator utilization ratio is
$$u = \left(1 - e^{-t/\tau} + u_0 e^{-t/\tau}\right) e^{-r/\tau}.$$
The evolution of the operator utilization ratio with work time and rest time is shown in Figure 2. The utilization ratio increases as the work time becomes longer and decreases as the rest time becomes longer.
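As a rough illustration of the utilization dynamics described above, the following Python sketch assumes the first-order model of [13], in which the utilization ratio relaxes toward 1 while the operator works and decays toward 0 while the operator rests. The function name, the argument names, and the numerical values are hypothetical and are not taken from the paper.

```python
import math


def utilization_after(u0, work, rest, tau):
    """Utilization ratio after `work` time units of work followed by `rest` of rest.

    Assumes first-order dynamics with sensitivity constant `tau` (an assumption
    based on the model cited as [13]).
    """
    # While working, the ratio relaxes exponentially toward 1.
    after_work = 1.0 - (1.0 - u0) * math.exp(-work / tau)
    # While resting, the ratio decays exponentially toward 0.
    return after_work * math.exp(-rest / tau)


if __name__ == "__main__":
    # Hypothetical values: initial ratio 0.6, 10 units of work, 5 of rest, tau = 30.
    print(utilization_after(0.6, 10.0, 5.0, tau=30.0))
```

With these assumptions, lengthening the work time raises the returned ratio and lengthening the rest time lowers it, which matches the qualitative behaviour attributed to Figure 2.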

Operator Skill Level.
Operator skill level represents the ability of the operator to process tasks. Operator skill level affects the performance of the operator. In this article, all operators are assumed to have the same skill level to avoid any difficulties in quantification.

Operator Performance.
As UAVs are under supervisory control, a UAV operator can be treated as a two-alternative decision-making server. The decision made by an operator can be either correct or wrong. The probability of a correct decision is defined to be the performance of the operator.
There are two models of two-alternative decision-making for operators: Pew's model [21] and the drift diffusion model [22].

Pew's Model
The probability of a correct decision at a given work time $t \ge 0$ is expressed as follows:
$$P(d_1 \mid t) = \frac{p_0}{1 + e^{-(at - b)}},$$
where $d_1$ is the correct decision, $t$ is the work time, and $p_0$, $a$, and $b$ are the parameters related to operators and tasks. The evolution of the probability of the correct decision under Pew's model is shown in Figure 3(a).

Drift Diffusion Model
The probability of a correct decision at a given work time $t \ge 0$ is expressed as follows:
$$P(d_1 \mid t) = \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_{\eta}^{\infty} e^{-(\Lambda - \beta t)^2 / (2\sigma^2 t)} \, d\Lambda,$$
where $\beta$ is the drift rate, $\sigma$ is the diffusion rate, $\eta$ is the decision threshold, and $\Lambda \equiv \mathcal{N}(\beta t, \sigma^2 t)$ is the evidence at time $t$. The evolution of the probability of the correct decision under the drift diffusion model is shown in Figure 3(b).
Pew's model has been widely used as the operator performance model in recent research on the operator AAP [5,16,19]. In this paper Pew's model is also used as the operator performance model. The operator performance formula is
$$f(t) = \frac{1}{1 + e^{-(at - b)}}, \tag{6}$$
where $a$ and $b$ are parameters that are determined by operator skill level and task complexity. Since all the operators are assumed to have the same skill level, $a$ and $b$ are determined only by task complexity.
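To make the role of work time in the operator performance model concrete, here is a minimal Python sketch of a logistic (Pew-type) performance curve. The logistic form, the parameter names `a`, `b`, and `p0`, and the sample values are assumptions made for illustration; the paper's own parameter values are not reproduced here.

```python
import math


def pew_performance(t, a, b, p0=1.0):
    """Probability of a correct decision after working for time t >= 0.

    Assumed logistic form p0 / (1 + exp(-(a*t - b))); `a` and `b` stand for
    task-complexity parameters and `p0` for the asymptotic accuracy.
    """
    return p0 / (1.0 + math.exp(-(a * t - b)))


if __name__ == "__main__":
    # Hypothetical task with a = 1 and b = 5: accuracy rises with work time.
    for t in (0.0, 2.5, 5.0, 7.5, 10.0):
        print(t, round(pew_performance(t, a=1.0, b=5.0), 4))
```

Under this form the curve crosses half of `p0` at t = b / a, so a larger `b` (or a smaller `a`) models a task that needs more work time before a correct decision becomes likely.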

Task Properties
Task Weight. Tasks performed by UAV operators are of different types, and even tasks of the same type may differ in importance. Thus, each task has a task weight denoted by $w$, which represents the importance of the task [16].
Task Complexity. Task complexity is denoted by $C$; it affects work time and operator performance.
Latency Penalty. When the task queue is not empty, the operator cannot handle all the tasks on time, and the tasks waiting in the queue incur a latency penalty. The latency penalty per unit time of one task is $c$ [16] (each task has a different $c$); the work time is $t$ and the rest time after the task is $r$. All the tasks queued after the one being handled must wait for $(t + r)$, which causes a latency penalty. If the sum of the latency penalties per unit time of all the tasks after the one being handled is $c_{\Sigma}$, then the latency penalty caused by the task being handled is $c_{\Sigma} \cdot (t + r)$.
Task Reward. Task reward is the product of task weight and operator performance: $w \cdot f(t)$ [16].
Task Starting Time. As tasks arrive dynamically, their starting times differ, which leads to different wait times in the task queue. Different wait times in turn affect the latency penalty.

Evaluation Criterion of Attention Allocation
Integrated Reward.
With tasks arriving dynamically, the attention allocation is solved in multiple stages. In each stage, an integrated reward is defined both for a single task and for a single operator. The integrated reward of a single task is the task reward minus the latency penalty caused by this task. The integrated reward of a single operator is the sum of the integrated rewards of all of that operator's current tasks in that stage.

Global Reward.
Global reward is the evaluation criterion of attention allocation in one stage, which is the sum of integrated rewards of all the operators in one stage.

Total Reward.
Total reward is the sum of global rewards in all stages, which is the evaluation criterion of attention allocation after all the tasks have been allocated.
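The three evaluation levels defined above (integrated reward, global reward, total reward) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Task` fields, the logistic performance curve, and the rule that a task delays only the tasks queued behind it on the same operator are taken from the descriptions in this section, while all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    weight: float   # task weight (importance)
    penalty: float  # latency penalty per unit time while waiting
    a: float        # complexity parameter of the assumed logistic curve
    b: float        # complexity parameter of the assumed logistic curve
    work: float     # allocated work time
    rest: float     # allocated rest time after the task


def performance(task):
    """Assumed logistic operator performance at the allocated work time."""
    return 1.0 / (1.0 + math.exp(-(task.a * task.work - task.b)))


def operator_reward(queue):
    """Integrated reward of one operator: sum over the tasks in its queue."""
    total = 0.0
    for i, task in enumerate(queue):
        # Tasks queued behind this one wait for its work time plus rest time.
        waiting_rate = sum(later.penalty for later in queue[i + 1:])
        total += task.weight * performance(task) - waiting_rate * (task.work + task.rest)
    return total


def global_reward(queues):
    """Global reward of one stage: sum over all operators' queues."""
    return sum(operator_reward(queue) for queue in queues)


def total_reward(stages):
    """Total reward: sum of the global rewards of all stages."""
    return sum(global_reward(queues) for queues in stages)
```

For example, `total_reward([[queue_a, queue_b], [queue_c]])` would describe a two-stage run with two operator queues in the first stage and one in the second.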

MOMU Operator AAP Modeling
In Section 2, the concepts of attention allocation were introduced. Based on the process of multi-UAV supervisory control, a multi-UAV multioperator attention allocation framework is established, which is shown in Figure 4.
The UAV swarm generates and broadcasts the tasks, which need to be handled by the operators. Then, according to the task properties and operator state, the support system allocates all the tasks in the task queue to multiple operators using the MOMU operator AAP model based on maximizing the global reward. At the same time, the support system allocates work time and rest time to each task. Operators deal with the tasks and send the results to the support system to achieve the C2 of UAVs.
Assume that there are $M$ operators who need to handle $N$ tasks at some point. The operator performance for task $j \in \{1, 2, \ldots, N\}$ is expressed by the operator performance formula (6) and is denoted by $f_j$. The task complexity is $C_j$, which is characterized by the pair $(a_j, b_j)$. According to the importance of the task, the support system sets a weight $w_j$ for task $j$. The latency penalty per unit time of task $j$ is $c_j$. Assume that the support system allocates work time $t_j$ and rest time $r_j$ to task $j$; the reward of task $j$ is then $w_j f_j(t_j)$, and the latency penalty due to the wait time caused by task $j$ is $(c_{j+1} + \cdots + c_N)(t_j + r_j)$. The utilization ratio of operator $k$ before processing task $j$ is $u_j^k$. $S_j$ is a function of $u_j^k$ that determines the lower limit of the expected work time of task $j$ for operator $k$. The support system solves the multioperator AAP in order to maximize the global reward at each stage.
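At a given stage, assume that operator $k$ receives $n_k$ tasks. Work time $t_i^k$ and rest time $r_i^k$ are allocated to the $i$-th task of operator $k$, and the task reward for processing this task is $w_i^k f_i^k(t_i^k)$. The work time and rest time of a task do not affect the tasks before it, so the latency penalty caused by the $i$-th task is $(c_{i+1}^k + \cdots + c_{n_k}^k)(t_i^k + r_i^k)$. The integrated reward of one task is the task reward minus the latency penalty caused by this task, so the integrated reward of the $i$-th task of operator $k$ is
$$w_i^k f_i^k(t_i^k) - (c_{i+1}^k + \cdots + c_{n_k}^k)(t_i^k + r_i^k).$$
The integrated reward of operator $k$ is the sum of the integrated rewards of its $n_k$ tasks:
$$I_k = \sum_{i=1}^{n_k} \left[ w_i^k f_i^k(t_i^k) - (c_{i+1}^k + \cdots + c_{n_k}^k)(t_i^k + r_i^k) \right].$$
The sum of the integrated rewards of all the operators is the global reward. For all $j \in \{1, 2, \ldots, N\}$, $z_j^k$ indicates whether operator $k$ is assigned to perform task $j$: $z_j^k = 1$ means that operator $k$ will process task $j$; otherwise, operator $k$ will not process task $j$. The task reward for processing task $j$ is $z_j^k w_j f_j(t_j)$. The sum of the latency penalties per unit time of all the tasks after task $j$ in the queue of operator $k$ is $z_{j+1}^k c_{j+1} + \cdots + z_N^k c_N$. The integrated reward of operator $k$, equation (10), is then the sum over $j = 1, \ldots, N$ of $z_j^k w_j f_j(t_j) - z_j^k (z_{j+1}^k c_{j+1} + \cdots + z_N^k c_N)(t_j + r_j)$. The objective function of the multioperator AAP maximizes, over $(\mathbf{t}, \mathbf{r}, \mathbf{z})$, the sum of these integrated rewards over the $M$ operators, with the work time of each task bounded below by $S_j(u_j^k)$ and the utilization ratio of each operator kept within $[u_{\min}, u_{\max}]$. Here $u_{\min}$ and $u_{\max}$ are the bounds of the operator utilization ratio, and $\mathbf{t}$, $\mathbf{r}$ are $N$-vectors of $t_j$, $r_j$, which represent work time and rest time, respectively. $\mathbf{z}$ is the matrix of $z_j^k$, which represents the task assignment indicator variables.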
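The idea of choosing the assignment that maximizes the global reward can be illustrated on a toy instance with an exhaustive search. The sketch below is not the immune algorithm or the dynamic programming procedure used in the paper: it keeps work and rest times fixed, ignores the utilization-ratio constraints, and simply enumerates every assignment. The tuple layout `(weight, penalty, a, b, work, rest)`, the logistic performance curve, and the convention that `-1` means "task dropped" are assumptions made for this example.

```python
import math
from itertools import product


def performance(work, a, b):
    """Assumed logistic operator performance at the given work time."""
    return 1.0 / (1.0 + math.exp(-(a * work - b)))


def global_reward(assign, tasks, n_ops):
    """Global reward of one stage.

    assign[j] is the operator index handling task j, or -1 if task j is dropped.
    Each operator processes its tasks in index order.
    """
    total = 0.0
    for k in range(n_ops):
        queue = [tasks[j] for j in range(len(tasks)) if assign[j] == k]
        for i, (weight, _, a, b, work, rest) in enumerate(queue):
            # Penalty rate of the tasks still waiting behind task i.
            later = sum(task[1] for task in queue[i + 1:])
            total += weight * performance(work, a, b) - later * (work + rest)
    return total


def best_assignment(tasks, n_ops):
    """Exhaustive search over all assignments (feasible only for tiny instances)."""
    choices = range(-1, n_ops)
    return max(product(choices, repeat=len(tasks)),
               key=lambda assign: global_reward(assign, tasks, n_ops))


if __name__ == "__main__":
    # Two identical hypothetical tasks: (weight, penalty, a, b, work, rest).
    tasks = [(1.0, 1.0, 0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 0.0, 0.0, 1.0, 1.0)]
    print(best_assignment(tasks, n_ops=2), best_assignment(tasks, n_ops=1))
```

In this toy instance, two operators each take one task, while a single operator is better off dropping one task than queueing both, because the latency penalty of the waiting task outweighs its reward. The search space grows as (M + 1)^N, which is why a heuristic such as the immune algorithm is a natural choice for realistic sizes.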
Suppose that tasks arrive at a certain rate $\lambda$; the task queue then changes dynamically. Attention allocation for dynamically arriving tasks is solved in multiple stages. After an attention allocation, which is referred to as a stage here, operators work for a while to fulfill the tasks. When one of the operators has no task to handle, the support system starts another stage. The interval between two adjacent stages is $T$, which is the shortest total time of the operators' remaining tasks in the previous stage. During time $T$, the number of tasks arriving in the task queue is $N = \lambda \cdot T$. The number of tasks is assumed not to change during a stage, so at each stage the number of tasks is constant. Just before each stage, some of the operators' local task queues may not be empty, so two variables $R_k$ and $W_k$ are introduced. $R_k$ is the integrated reward of the remaining tasks in the local task queue of operator $k$, and $W_k$ is the time required for the remaining tasks. The sum of the latency penalties per unit time of all the new tasks is $\sum_{j=1}^{N} z_j^k c_j$, and the latency penalty caused by $W_k$ is $W_k \cdot \sum_{j=1}^{N} z_j^k c_j$. Then the integrated reward of operator $k$ is
$$I_k = R_k - W_k \sum_{j=1}^{N} z_j^k c_j + \sum_{j=1}^{N} \left[ z_j^k w_j f_j(t_j) - z_j^k (z_{j+1}^k c_{j+1} + \cdots + z_N^k c_N)(t_j + r_j) \right], \tag{12}$$
where
$$R_k = \sum_{i=1}^{m_k} \left[ w_i^k f_i^k(t_i^k) - (c_{i+1}^k + \cdots + c_{m_k}^k)(t_i^k + r_i^k) \right], \qquad W_k = \sum_{i=1}^{m_k} (t_i^k + r_i^k),$$
and $m_k$ is the number of remaining tasks of operator $k$.
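The stage-triggering rule described above can be sketched as follows. The function and argument names are hypothetical, rounding the number of arrivals down to an integer is an assumption, and so is capping it at the number of tasks that have not yet been generated.

```python
def next_stage(remaining_times, arrival_rate, tasks_left):
    """Return (interval, arrivals) for the next allocation stage.

    remaining_times: total remaining time of each operator's local queue
    arrival_rate:    task arrival rate
    tasks_left:      tasks not yet generated (assumed cap on new arrivals)
    """
    # The next stage starts when the first operator runs out of tasks.
    interval = min(remaining_times)
    # Number of tasks arriving during that interval: rate times interval.
    arrivals = int(arrival_rate * interval)
    return interval, min(arrivals, tasks_left)


if __name__ == "__main__":
    # Hypothetical values: three operators, arrival rate 2, 30 tasks still to come.
    print(next_stage([12.0, 7.5, 9.0], arrival_rate=2.0, tasks_left=30))
```

Because the interval is the shortest remaining queue time, a higher arrival rate fills the task queue within fewer stages when the total number of tasks is fixed.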
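The two variables $R_k$ and $W_k$ are added to the integrated reward in (12). The objective function is the sum of the integrated rewards of the $M$ operators, so it becomes
$$\max J(\mathbf{t}, \mathbf{r}, \mathbf{z}) = \sum_{k=1}^{M} I_k, \tag{13}$$
where $I_k$ is the integrated reward of operator $k$ given by (12). During an attention allocation, an operator may be in the middle of performing a task, so this task is not a complete one. Assume that $\tilde{t}_i^k$ and $\tilde{r}_i^k$ are the remaining work time and rest time needed to complete the $i$-th task in the local queue of operator $k$; then $R_k$ and $W_k$ are modified accordingly in (14), where $\rho = (\tilde{t}_i^k + \tilde{r}_i^k)/(t_i^k + r_i^k)$, $\rho \in [0, 1]$. In practice, only $\tilde{t}_i^k$ and $\tilde{r}_i^k$ of the first task in the local task queue of an operator differ from $t_i^k$ and $r_i^k$.
Assume that there are $S$ stages; the total reward is the sum of the global rewards at all the stages. The global reward at each stage can be obtained by (13). At stage $s$ there are $N_s$ tasks. Then the total reward is written as
$$\sum_{s=1}^{S} J_s(\mathbf{t}_s, \mathbf{r}_s, \mathbf{z}_s),$$
where $J_s$ is the global reward obtained at stage $s$ for its $N_s$ tasks.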

Simulation
All the simulations are carried out in Matlab. The setup of each simulation of the method in this paper is shown in Simulation Setup.
Simulation Setup (only the following fragments of the algorithm listing are recoverable):
… Yes: output the optimal task assignment $\mathbf{z}^*$, $\mathbf{t}^*$, $\mathbf{r}^*$, go to step (6); No: go to step (7).
… Then: … = … + 1, $N = \lambda \cdot T$; if … > …, then … = …. Calculate $R$, $W$ using equation (14), go to step (2). Else: end.
(7) Copy according to the fitness and concentration of replication.
(9) Variation, go to step (4).
… [13]. The operator utilization ratio range is $u \in [0.5, 0.9]$. The number of operators is $M = 4$. The initial utilization ratios of the four operators are [0.6, 0.8, 0.9, 0.7]. The initial number of tasks in the task queue is 10, and a further 30 tasks are generated dynamically. The MOMU operator AAP model based on maximizing the global reward is solved by the immune algorithm and the dynamic programming algorithm. The result of this simulation is shown in Figure 5, where work time is shown in Figure 5(a) and rest time is shown in Figure 5(b).

The Result of MOMU Operator AAP Model
The result of the simulation shows that the model can dynamically allocate the attention of operators. However, not all the tasks are handled during the simulation: some tasks are dropped in order to maximize the global reward.

Comparison with Model Based on Maximizing Local Reward.
This experiment compares the model based on maximizing global reward (BOGR) with the model from reference [19]. Since dynamic attention allocation was not introduced in [19], a brief description of a dynamic attention allocation model based on maximizing local reward (BOLR) is given here.
Step 1. Sort the operators in ascending order of utilization ratio.
Step 2. Use the optimal solution from [19] to solve the AAP for all the operators in turn.
Step 3. Sort the operators in ascending order of the total time for all the tasks in their local task queues.
Step 4. Wait until one of the operators has no task. If the task queue is empty, wait one unit of time for new tasks, until no more tasks arrive; otherwise, return to Step 2.
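Steps 1 and 2 of the BOLR procedure above can be sketched as a single allocation round. The local solver of [19] is not reproduced here: `solve_local` is a hypothetical callback standing in for it, and `take_two` is a dummy used only to make the example runnable. Steps 3 and 4 (re-sorting by queue time and waiting for an idle operator) are omitted.

```python
def bolr_round(utilization, task_queue, solve_local):
    """One BOLR allocation round: serve operators in ascending utilization order.

    utilization: current utilization ratio of each operator
    task_queue:  tasks waiting for allocation
    solve_local: hypothetical stand-in for the single-operator solver of [19];
                 it receives (operator index, remaining tasks) and returns the
                 tasks chosen for that operator
    """
    # Step 1: least-utilized operator first.
    order = sorted(range(len(utilization)), key=lambda k: utilization[k])
    plans = {}
    remaining = list(task_queue)
    # Step 2: each operator in turn picks its tasks from what is left.
    for k in order:
        chosen = solve_local(k, remaining)
        plans[k] = chosen
        remaining = [task for task in remaining if task not in chosen]
    return plans, remaining


def take_two(operator, tasks):
    """Dummy local solver for illustration: take the first two waiting tasks."""
    return tasks[:2]


if __name__ == "__main__":
    print(bolr_round([0.8, 0.6], ["a", "b", "c", "d", "e"], take_two))
```

Because each operator optimizes only over the tasks left by the operators served before it, the outcome depends on the serving order, which is the sense in which this baseline is local rather than global.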
With 10 tasks initially in the task queue and a further 30 tasks generated dynamically, the model introduced in this paper, which is based on maximizing the global reward, and the method described above, which is based on maximizing the local reward, are each used to solve the multioperator AAP, and the total reward is obtained for both methods. This experiment considers two conditions, with the number of operators being $M = 2$ and $M = 4$, respectively. Under each condition, ten simulation runs are carried out with each method. The results are shown in Figures 6(a) and 6(b). Compared with the total reward of the model based on maximizing the local reward, the total reward obtained from the model in this paper is on average 28% higher for $M = 2$ and 30% higher for $M = 4$.
These results show that the total reward of the model based on maximizing the global reward is larger than the reward obtained from the method based on maximizing the local reward. However, the model in this paper takes longer to compute than the compared model because it uses an intelligent optimization algorithm.

The Effect of Task Arrival Rate on Total Reward.
The number of operators is 4 in this experiment, and the task arrival rate ranges from 1 to 10. For each arrival rate the simulation is carried out ten times, and the average values of the total reward for each arrival rate are shown in Figure 7.
The result shows that the total reward initially decreases as the task arrival rate increases ($\lambda < 5$). As the task arrival rate increases further ($\lambda \ge 5$), the total reward fluctuates. The total reward decreases at the beginning ($\lambda < 5$) because tasks arrive earlier when the task arrival rate increases; they therefore wait longer in the task queue, and the task latency penalty becomes larger. As the task arrival rate keeps increasing ($\lambda \ge 5$) while the total number of tasks remains constant, the support system allocates nearly all the tasks in two stages: the first stage allocates the 10 initial tasks, and the second stage allocates all 30 dynamically generated tasks. With the same number of stages, tasks with the same properties lead to nearly the same result, and tasks with different properties lead to results that differ only slightly, because the bounds of the properties do not change much. The total reward therefore fluctuates with no significant increase or decrease.

Conclusion
The MOMU operator AAP model based on maximizing the global reward is established in this paper. This model can dynamically allocate all the tasks in the task queue to the proper operators and set the work time and rest time of each task at the same time. Simulations in the Matlab environment show that the total reward of our model is larger than the values obtained from previous methods based on local reward maximization. Moreover, the result shows that the task arrival rate and the total reward of the MOMU operator AAP model have an exponent-like relation.
The proposed method advances the study of the MOMU operator attention allocation problem by evaluating the reward of task assignment planning for the whole command and control operator team, instead of for a single operator as in previous attempts. Accordingly, the team performance of command and control, that is, the global reward, is enhanced. The method can be applied to MOMU command and control scenarios and can contribute to the task processing efficiency of the operator team. In this paper, operators are treated as two-alternative decision-making models. However, in other cases, operators of UAVs may have more than two decisions for command and control. In the future, we will consider multialternative decision-making models for operators.

Figure 3: The evolution of the probability of the correct decision under Pew's and drift diffusion model.
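The integrated reward of operator $k$ is
$$I_k = \sum_{j=1}^{N} \left[ z_j^k w_j f_j(t_j) - z_j^k (z_{j+1}^k c_{j+1} + \cdots + z_N^k c_N)(t_j + r_j) \right], \tag{10}$$
where $w_j$ is the task weight, $f_j$ the operator performance, $c_j$ the latency penalty per unit time, and $t_j$, $r_j$ the work time and rest time of task $j$. Let $u_j^k$ be the utilization ratio of operator $k$ before starting task $j$ and $S_j(u_j^k)$ be the function of $u_j^k$ that captures the expected service time of operator $k$ on task $j$. The aim of the model is to maximize the global reward in the attention allocation of a stage; the objective function of the multioperator AAP, which is the sum of the integrated rewards of the $M$ operators, can be written as follows:
$$\max J(\mathbf{t}, \mathbf{r}, \mathbf{z}) = \sum_{k=1}^{M} \sum_{j=1}^{N} \left[ z_j^k w_j f_j(t_j) - z_j^k (z_{j+1}^k c_{j+1} + \cdots + z_N^k c_N)(t_j + r_j) \right].$$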

Figure 5: Result of attention allocation based on maximizing global reward.