Multirobot Task Allocation in e-Commerce Robotic Mobile Fulfillment Systems

Robotic Mobile Fulfillment System (RMFS) is a new type of parts-to-picker order picking system and has become the development trend of e-commerce logistics distribution centers. There are usually a large number of tasks to be allocated to many robots, and the picking time for e-commerce orders is usually very tight, which places higher demands on the efficiency of multirobot task allocation (MRTA) in e-commerce RMFS. Current research on MRTA in RMFS seldom considers task correlation and the balance among picking stations. In this paper, a task time cost model considering task correlation is built according to the characteristics of the picking process. Then, a multirobot task allocation model minimizing the overall picking time is established considering both the picking time balance of picking stations and the load balance of robots. Finally, a four-stage balanced heuristic auction algorithm is designed to solve the task allocation model, and the tasks with execution sequence for each robot are obtained. Compared with the traditional task time cost model and with the algorithm that does not consider the balance among picking stations, the proposed model and algorithm significantly shorten the overall picking time.


Introduction
The e-commerce logistics distribution center is an important part of the e-commerce supply chain and greatly affects the operational efficiency of e-commerce. There are usually a large number of stock-keeping units (SKUs) in an e-commerce logistics distribution center, and e-commerce orders are characterized by small batches, high frequency, strong randomness, and tight distribution time, which places higher demands on order picking efficiency [1]. The traditional manual picking mode is unable to meet this demand because of its high error rate and low picking efficiency. In recent years, a new picking system, the Robotic Mobile Fulfillment System (RMFS), has become the development trend of e-commerce logistics picking systems because of its high efficiency, intelligence, and flexibility [2]. The deployment of the KIVA logistics robot by Amazon in 2012 triggered the application market of RMFS [3]. In RMFS, shelves are carried by robots to picking stations, where pickers pick goods (parts) from the shelves, so it is a kind of parts-to-picker picking system. Because of the significant difference between this new picking mode and the traditional manual picking mode, many decision-making problems in the new mode need to be studied in depth, such as storage assignment [4], order batching [5], multirobot task allocation [6], and path planning [7]. This paper mainly studies how to assign a batch of picking tasks to multiple robots, which is a multirobot task allocation (MRTA) problem.
Multirobot task allocation (MRTA) refers to assigning a series of tasks to multiple robots under certain constraints to achieve an objective, such as minimizing the total travel distance of all robots or the average cost of each task [8]. The main methods for solving MRTA problems include combinatorial optimization, the market-based approach, the swarm intelligence approach, the behavior-based approach, and the emotional recruitment approach [9]. Among them, methods based on the market mechanism, such as the auction method, have attracted wide attention because of their high efficiency, good robustness, and easy expansion.
In the market mechanism, robots negotiate with each other through bidding and ultimately complete the task allocation [10]. In a dynamic task allocation environment, using an auction algorithm contributes to the fairness and real-time performance of task allocation and reduces the design complexity of the system [11].
Zlot et al. [12] applied a market mechanism algorithm to multirobot dynamic task allocation for the first time, which can solve small-scale dynamic task allocation problems. Elango et al. [13] used the K-means clustering algorithm and an auction mechanism to solve a dual-objective task allocation model, which considers both total travel distance and robot efficiency. Lozenguez et al. [14] proposed a sequential synchronous auction protocol to coordinate the task allocation of robots. Heap and Pagnucco [15] designed a repetitive sequential single clustering auction algorithm for multirobot dynamic task assignment. For task allocation in an intelligent warehousing system, Zhou et al. [16] proposed a balanced heuristic auction algorithm that balances the load of robots to improve the efficiency of task allocation. An important part of the auction algorithm is the computation of task cost, which can be measured by completion time or path length. Liu and Kroll [17] regarded the time a robot takes to fulfill a task as the task cost for MRTA. Dou et al. [18] took path length as task cost using reinforcement learning. Lamballais et al. [19] established a queueing model to measure the utility of logistics robots and workers.
Although there is a large body of literature on MRTA, few studies address MRTA problems in e-commerce RMFS. In an e-commerce order picking environment, there are a large number of picking tasks and mobile robots, so the task allocation problem is a complex NP-hard optimization problem. Furthermore, the picking time for e-commerce orders is usually very tight, which requires higher efficiency of task allocation. Therefore, it is necessary to propose an efficient task allocation method based on the application characteristics. Former research on MRTA mainly considers task completion time or the total travel length of robots but seldom considers the workload balance among picking stations and robots. Because of the parallel operation mode of multiple picking stations, the picking time of the whole system is determined by the picking station with the longest picking time. In the process of task allocation, the balance among picking stations should therefore be fully considered. Uneven idleness among picking stations, e.g., robots waiting in line at some picking stations while other picking stations are idle, reduces the efficiency of the system. Therefore, balancing the picking time among picking stations plays an important role in improving picking efficiency. In addition, the correlation among tasks was not fully taken into account in previous task cost models. For example, if two tasks of one picking station are on the same shelf, assigning the two tasks to the same robot and arranging them adjacent to each other in the execution sequence allows both tasks to be completed in one shelf visit, which greatly reduces the cost of completing them. The innovations of this paper are as follows:
(1) Based on the real operation mode in which a robot can serve multiple picking stations at one time, the correlation between tasks is refined, and a new task time cost model is proposed according to the different types of task correlation.
(2) Because the picking time of the whole system is determined by the picking station with the longest picking time, the balance among picking stations as well as the load balance of robots is considered to improve picking efficiency.
(3) A four-stage balanced heuristic auction algorithm is designed to solve the task allocation model, which balances the picking time among picking stations by controlling the sequence of task assignments.
The remainder of this paper is organized as follows. In Section 2, the task assignment problem of logistics robots in e-commerce RMFS is described in detail, and the parameters for model formulation are given. In Section 3, a new task cost calculation method considering the correlation between tasks is described, and the multiple logistics robot task allocation model considering the balance of picking stations is established. In Section 4, a four-stage balanced heuristic auction algorithm is designed to solve the task allocation model. In Section 5, simulation experiments are conducted, and the results are analyzed to verify the proposed model and algorithm. Section 6 concludes the paper and presents the limitations and prospects of the research.

2.1. The Operation Process of e-Commerce RMFS.
E-commerce orders are characterized by many varieties, small batches, and high frequency. In order to improve order picking efficiency, multiple orders that arrive in a certain period of time are usually combined into one batch for picking, which is called wave-picking [20]. That is, the continuously arriving orders are placed in an order pool, and then a certain number of orders are selected from the order pool as a wave of orders for picking. After that, orders are allocated to picking stations, and the items to be picked in the orders of each picking station are merged to generate a picking list, which consists of many picking tasks. Each item in the picking lists corresponds to a picking task. Finally, these picking tasks are assigned to robots according to some rules, and the robots cooperate to complete them.
An e-commerce RMFS is depicted in Figure 1, which is similar to the typical KIVA Systems [21,22]. The picking system consists of movable shelves, picking stations, mobile robots, conveyor belts, etc. The storage area is composed of neatly distributed movable shelves. Different kinds of goods can be placed on the same shelf. Each of the picking stations on the left side is equipped with a picker to pick items from shelves and a buffer area, where shelves carried by logistics robots can queue and wait for picking. Next to the picking stations is a conveyor belt, which is used to transport the picked items. The process of goods picking is as follows: a robot travels from its current location to the shelf where the required goods are located (Figure 1 ①); then the logistics robot carries the shelf to the corresponding picking station (Figure 1 ②). After the picker picks the required goods from the shelf, the logistics robot either carries the shelf to the next picking station (Figure 1 ③) or returns the shelf to its original position in the storage area (Figure 1 ④). Then, the robot goes on to the next task (Figure 1 ⑤). All robots work together to complete the picking tasks. When all the orders assigned to the picking stations are fulfilled, this wave of picking is finished.

Parameter Definition of MRTA in RMFS.
Through the operation process analysis, we find that a robot can perform multiple tasks at one time (e.g., multiple goods located on the same shelf), while one task can only be performed by one robot.
The MRTA problem in RMFS can therefore be classified as a multitask robots and single-robot tasks (MT-SR) problem [23].
Since a robot can serve one or more picking stations and perform one or more tasks at one shelf visit, the time cost of each task should be calculated in different ways. In this paper, the time cost of robots to perform different tasks is distinguished through refining the task allocation process, especially considering the situation that robots can transport a shelf to serve multiple picking stations at one time. Due to the parallel operating mode of multiple picking stations, the picking station with the longest picking time will determine the total picking time of the orders. So, the picking time among picking stations is balanced by controlling the task allocation sequence in this paper.
Before the task allocation model formulation, the following assumptions are needed:
(1) The picking time for each item is the same constant.
(2) Several different goods can be placed on the same shelf, and one kind of goods can only be placed on one shelf. Therefore, the location of each kind of goods is known.
(3) Logistics robots are homogeneous and travel at the same speed, and the interaction among logistics robots is not considered.
Due to the road layout of the RMFS (see Figure 1) and the kinematic constraints of robots, Manhattan distance is adopted and the grid map method is used to model the environment.
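The distance measure named above can be sketched in a few lines of Python. This is only an illustration: the representation of locations as integer `(row, column)` grid cells, the helper name `travel_time`, and the `cell_size` parameter are assumptions that do not appear in the paper.

```python
from typing import Tuple

# Hypothetical representation: a location is an integer (row, column) grid cell.
Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    """Manhattan distance between two grid cells, in cell units."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def travel_time(a: Cell, b: Cell, speed: float, cell_size: float = 1.0) -> float:
    """Time to travel between two cells at constant speed (assumed helper)."""
    return manhattan(a, b) * cell_size / speed
```

With this convention, a robot at cell `(0, 0)` is 7 cells away from a shelf at `(3, 4)`, regardless of which aisle it takes first.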
The parameters and variables used in the model formulation are defined as follows:
$A = \{a_1, a_2, \ldots, a_m\}$ is the set of picking tasks, and $m$ is the total number of picking tasks in this wave.
$R = \{r_1, r_2, \ldots, r_n\}$ is the set of mobile robots, and $n$ is the total number of robots.
$S = \{s_1, s_2, \ldots, s_h\}$ is the set of picking stations, and $h$ is the total number of picking stations.
$O = \{O_1, O_2, \ldots, O_n\}$ is the collection of task assignment schemes for all robots, where $O_j = \{a_{j1} \to a_{j2} \to \cdots \to a_{j|O_j|}\}$ represents the task assignment scheme for robot $r_j$, which is an ordered set of the tasks assigned to $r_j$.
$a_{jl}$ is the task performed by $r_j$ in order $l$, and $|O_j|$ represents the total number of tasks assigned to $r_j$.
$s_{a_i}$ represents the picking station to which task $a_i$ was assigned, which is already known before allocating tasks to robots.
$d_{a_i}$ represents the Manhattan distance between the shelf of task $a_i$ and its picking station.
$d_{a_i a_{i'}}$ represents the Manhattan distance between the shelves of task $a_i$ and task $a_{i'}$.
$d_{s_k s_{k'}}$ represents the Manhattan distance between picking stations $s_k$ and $s_{k'}$.
$d_{r_j a_i}$ represents the Manhattan distance between the initial location of robot $r_j$ and the shelf of task $a_i$.
$v'$ is a constant, which represents the moving speed of logistics robots.
Mathematical Problems in Engineering
$t'$ is a constant, which represents the picking time for one item.
$l'_{jk}$ is the order of the last task of picking station $s_k$ fulfilled by robot $r_j$.
$x_{ijl}$ is a 0-1 decision variable: $x_{ijl} = 1$ only if task $a_i$ is the $l$th task allocated to robot $r_j$; otherwise, $x_{ijl} = 0$.
$t_{ijl}$ is the time cost, that is, the time robot $r_j$ uses to fulfill task $a_i$ as its $l$th task.

The Time Cost of Tasks considering Task Correlation.
From the previous analysis, it is known that when tasks are fulfilled by different robots in a different order, the time costs of the tasks are different. If task $a_i$ is fulfilled by robot $r_j$ and the order of execution is $l$, that is, $a_{jl} = a_i$, the time cost of task $a_i$ is represented by $t_{ijl}$. When $l = 1$, $a_i$ is the first task fulfilled by robot $r_j$ and has no preceding task. When $l \ge 2$, $t_{ijl}$ depends on the preceding task $a_{j(l-1)}$. For convenience of representation, let $\alpha = a_{j(l-1)}$. The time cost $t_{ijl}$ of completing task $a_i$ is expressed as follows:

$$
t_{ijl} =
\begin{cases}
\dfrac{d_{r_j a_i} + 2 d_{a_i}}{v'} + t', & l = 1, \\[2ex]
\dfrac{d_{\alpha a_i} + 2 d_{a_i}}{v'} + t', & l \ge 2,\ B_{\alpha a_i} = 0, \\[2ex]
\dfrac{d_{s_\alpha s_{a_i}} + d_{a_i} - d_{\alpha}}{v'} + t', & l \ge 2,\ B_{\alpha a_i} = 1.
\end{cases}
\tag{1}
$$

$B_{\alpha a_i}$ is a 0-1 variable. Its value is 1 when task $a_i$ is fulfilled by robot $r_j$ and is located on the same shelf as its preceding task $\alpha$. Otherwise, its value is 0. The time cost of completing one task includes the travel time of the logistics robot and the picking time of the picker. The picking time is a constant, and the travel time of the logistics robot is determined by the travel distance.
When $l = 1$, $a_i$ is the first task of robot $r_j$, and the travel distance of $r_j$ includes the distance from its initial location to the shelf where task $a_i$ is located ($d_{r_j a_i}$) and the distance for carrying the shelf to the picking station and returning the shelf to its original location ($2 d_{a_i}$). So, the total distance that $r_j$ travels is $d_{r_j a_i} + 2 d_{a_i}$.
When $l \ge 2$ and $B_{\alpha a_i} = 0$, task $a_i$ and its preceding task $\alpha$ are not located on the same shelf. The travel distance of $r_j$ consists of the distance from the shelf of the preceding task to the shelf of task $a_i$ ($d_{\alpha a_i}$; because $r_j$ must return the shelf of the preceding task $\alpha$ to its original location, it starts task $a_i$ from there) and the distance for carrying the shelf to the picking station and returning the shelf to its original location ($2 d_{a_i}$). So, the total distance that $r_j$ travels is $d_{\alpha a_i} + 2 d_{a_i}$.
When $l \ge 2$ and $B_{\alpha a_i} = 1$, task $a_i$ and its preceding task $\alpha$ are located on the same shelf. Robot $r_j$ does not need to return the shelf of the preceding task to its original location; instead, it carries the shelf from picking station $s_\alpha$ to picking station $s_{a_i}$, and the travel distance is $d_{s_\alpha s_{a_i}}$. After $a_i$ is picked, $r_j$ needs to return the shelf to its original location, and the travel distance is $d_{a_i}$. Since the cost of the preceding task $\alpha$ already included the return trip $d_\alpha$ that is no longer made, the total distance charged to task $a_i$ is $d_{s_\alpha s_{a_i}} + d_{a_i} - d_\alpha$.
It can be seen from equation (1) that when $B_{\alpha a_i} = 1$, the time cost of task $a_i$ is greatly reduced. In this case, task $a_i$ and task $\alpha$ are called correlated tasks. Assuming that $\alpha = a_{j(l-1)}$ and $\beta = a_{jl}$ are two tasks fulfilled by robot $r_j$ in adjacent sequence, the correlation between these two tasks can be divided into the following categories:
(1) Strongly correlated tasks: when $B_{\alpha\beta} = 1$ and $s_\alpha = s_\beta$, task $\alpha$ and task $\beta$ are located on the same shelf and belong to the same picking station. In this case, the two tasks are called strongly correlated tasks. According to equation (1), the time cost of task $\beta$ consists only of the picking time $t'$, which is the minimum time cost of task $\beta$.
(2) Weakly correlated tasks: when $B_{\alpha\beta} = 1$ and $s_\alpha \ne s_\beta$, task $\alpha$ and task $\beta$ are located on the same shelf but belong to different picking stations. In this case, the two tasks are called weakly correlated tasks. According to equation (1), the time cost of task $\beta$ is essentially the travel time between the two picking stations plus the picking time, which is also greatly reduced.
(3) Uncorrelated tasks: when $B_{\alpha\beta} = 0$, task $\alpha$ and task $\beta$ are located on different shelves. In this case, the two tasks are called uncorrelated tasks.
As above, when two tasks fulfilled sequentially by the same robot are correlated, the time cost is greatly reduced. Therefore, when establishing the MRTA model, the correlation between tasks should be fully considered.
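The three cases of the time cost model can be illustrated with a short Python sketch. It is not the authors' implementation: the `Task` class, the function names, and the choice to treat two tasks as sharing a shelf when their shelf cells are equal are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Cell = Tuple[int, int]  # assumed (row, column) grid cell


@dataclass(frozen=True)
class Task:
    shelf: Cell    # cell of the shelf that holds the item
    station: Cell  # cell of the picking station the task belongs to


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def task_time_cost(task: Task, prev: Optional[Task], robot_start: Cell,
                   speed: float, pick_time: float) -> float:
    """Time cost of `task` when it follows `prev` (None if it is the first task)."""
    d_task = manhattan(task.shelf, task.station)
    if prev is None:
        # First task: drive to the shelf, then a round trip to the station.
        travel = manhattan(robot_start, task.shelf) + 2 * d_task
    elif prev.shelf != task.shelf:
        # Uncorrelated: start from the previous shelf location, then a round trip.
        travel = manhattan(prev.shelf, task.shelf) + 2 * d_task
    else:
        # Correlated (same shelf): station-to-station move plus the return trip,
        # minus the return trip already charged to the preceding task.
        d_prev = manhattan(prev.shelf, prev.station)
        travel = manhattan(prev.station, task.station) + d_task - d_prev
    return travel / speed + pick_time
```

For a strongly correlated pair (same shelf, same station), the travel term cancels and the function returns only `pick_time`, which matches category (1) above.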

The Time Cost of Robots and Picking Stations.
The total time cost $T_j$ that robot $r_j$ takes to fulfill its assigned tasks is expressed as equation (2), which is the total time used by robot $r_j$ to fulfill all the tasks assigned to it.

$$
T_j = \sum_{i=1}^{m} \sum_{l=1}^{|O_j|} x_{ijl}\, t_{ijl} \tag{2}
$$

The total picking time $T'_k$ of picking station $s_k$ is shown in the following equation:

$$
T'_k = \max_{1 \le j \le n} \sum_{i=1}^{m} \sum_{l=1}^{l'_{jk}} x_{ijl}\, t_{ijl} \tag{3}
$$

In equation (3), $l'_{jk}$ represents the order of the last task of picking station $s_k$ assigned to robot $r_j$. $T'_k$ represents the longest time the robots take to complete the tasks of picking station $s_k$. Because the tasks of a picking station are fulfilled by the cooperation of multiple robots, the total picking time of a picking station is determined by the logistics robot that takes the longest time to complete its tasks for that picking station.
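As a small illustration of these two quantities, the sketch below computes a robot's total time and a station's picking time from an already costed schedule. The `Schedule` layout (robot name mapped to an ordered list of `(station, time cost)` pairs) and the function names are assumptions for this example only.

```python
from typing import Dict, List, Tuple

# Assumed layout: robot name -> ordered list of (station id, time cost of that task).
Schedule = Dict[str, List[Tuple[str, float]]]


def robot_time(schedule: Schedule, robot: str) -> float:
    """Total time a robot needs for all tasks assigned to it."""
    return sum(cost for _, cost in schedule[robot])


def station_time(schedule: Schedule, station: str) -> float:
    """Longest elapsed time, over all robots, up to a robot's last task for `station`."""
    longest = 0.0
    for tasks in schedule.values():
        elapsed = 0.0
        finish = 0.0  # stays 0 if this robot serves no task of the station
        for task_station, cost in tasks:
            elapsed += cost
            if task_station == station:
                finish = elapsed
        longest = max(longest, finish)
    return longest
```

Note that a station's picking time counts everything a robot did before its last task for that station, including tasks for other stations, because the robot's elapsed time accumulates over its whole sequence.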

Multirobot Task Allocation Model.
Considering the balance of picking time among picking stations, the first objective function $f_1$ is to find the picking station with the shortest picking time, which is the picking station from which the next task to be assigned is drawn.
After determining the tasks to be assigned next, the robots to which these tasks should be assigned are determined. The objective of task allocation is to minimize the total picking time of all tasks in this wave. Because of the parallel operation mode of logistics robots, the total picking time of this wave depends on the robot with the longest picking time. An integer linear programming model is established to minimize the picking time of the logistics robot with the longest picking time.
Equation (6) means that a task can only be fulfilled by one robot, and the execution sequence is unique. Equation (7) means that the sum of tasks assigned to all robots is equal to the total number of tasks.
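The objectives and constraints described in this subsection can be written out as in the following sketch. It is a reconstruction from the surrounding prose and not the authors' typeset equations: the numbers (4) and (5) for the two objectives, the upper summation limit $m$ on the position index $l$, and the binary constraint labeled (8) are assumptions.

```latex
\begin{align}
f_1 &= \min_{1 \le k \le h} T'_k \tag{4} \\
f_2 &= \min \max_{1 \le j \le n} T_j \tag{5} \\
\text{s.t.} \quad
& \sum_{j=1}^{n} \sum_{l=1}^{m} x_{ijl} = 1, \qquad i = 1, 2, \ldots, m \tag{6} \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{l=1}^{m} x_{ijl} = m \tag{7} \\
& x_{ijl} \in \{0, 1\} \tag{8}
\end{align}
```

Here (6) states that each task is given to exactly one robot at exactly one position in its sequence, and (7) states that the number of assigned tasks equals the number of tasks in the wave.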

Algorithm Design
According to the characteristics of the MRTA problem in e-commerce RMFS, this paper designs a four-stage balanced heuristic algorithm using parallel single-task auction [24] to solve the task allocation model. Parallel single-task auction is faster and more robust than other auction algorithms [25], so it is more suitable for task allocation of large-scale order picking robots. The algorithm is divided into four stages: (1) set the initial task allocation rules to decide the first task for each robot; (2) set the next task allocation rules using the correlation between tasks and considering the situation in which one shelf can serve multiple picking stations at one time; (3) use the sequential auction algorithm to determine the picking station from which the next to-be-assigned task is drawn, which narrows the range of candidate tasks and balances the picking time of picking stations; then, use the parallel single-task auction algorithm to determine the next task to be assigned and the robot to complete it; (4) perform a dynamic adjustment on the calculation results to handle congestion and delay of robots during task fulfillment. The specific steps of the algorithm are explained in detail in Sections 4.1-4.4, and the flowchart is shown in Figure 2.

Step 1: Initial Task Allocation for Each Robot.
(1) Define a set $\mathrm{Unallocated}_{s_k}$ for each picking station $s_k$, which stores the tasks of picking station $s_k$ that have not yet been assigned. Each set is updated whenever one of its tasks is assigned.
(2) Define a set $O_j$ for each robot $r_j$ to store the tasks that have been assigned to logistics robot $r_j$ and their execution sequence. The set is initially empty.
(3) For robot $r_j$, traverse the unassigned tasks of all picking stations to find the task whose shelf is closest to the initial location of robot $r_j$, and put this task into set $O_j$ as the first task $a_{j1}$ of robot $r_j$. At the same time, update the corresponding set $\mathrm{Unallocated}_{s_k}$.
(4) When there is one and only one task in each set $O_j$, the initial task allocation of all robots is completed. Then, proceed to Step 2.
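Step 1 can be sketched as a nearest-task assignment. The sketch below makes several assumptions that the paper does not state: robots are processed in the order given, ties are broken by task name, the per-station unallocated sets are merged into one pool (Step 1 scans all stations anyway), and all function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # assumed (row, column) grid cell


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def initial_allocation(robot_starts: Dict[str, Cell],
                       shelf_of: Dict[str, Cell]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Give each robot the unassigned task whose shelf is nearest to its start cell.

    Returns the per-robot task lists and the names of the tasks left unallocated.
    """
    unallocated = dict(shelf_of)  # copy so the caller's mapping is not modified
    plan: Dict[str, List[str]] = {}
    for robot, start in robot_starts.items():
        plan[robot] = []
        if not unallocated:
            continue  # more robots than tasks: this robot gets no first task
        nearest = min(unallocated,
                      key=lambda t: (manhattan(start, unallocated[t]), t))
        plan[robot].append(nearest)
        del unallocated[nearest]
    return plan, sorted(unallocated)
```

Because robots choose one after another, an earlier robot can take the task that a later robot would also have preferred; how the original method resolves such conflicts is not specified in the text.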

4.2. Step 2: Task Allocation of the lth (l ≥ 2) Task of Robots considering Task Correlation.
(1) For robot $r_j$, denote the current last task in $O_j$ as $\alpha_j$.
(2) Find all the tasks strongly correlated with $\alpha_j$ and add them into set $O_j$. To do so, traverse the tasks in the set $\mathrm{Unallocated}_{s_{\alpha_j}}$. If there are tasks strongly correlated with $\alpha_j$, add them in turn to set $O_j$ and update $\mathrm{Unallocated}_{s_{\alpha_j}}$. In either case, continue with (3).
(3) Find all the tasks weakly correlated with $\alpha_j$ and add them into set $O_j$. To do so, traverse all the to-be-assigned tasks of the stations other than $s_{\alpha_j}$. If there are tasks weakly correlated with $\alpha_j$, add them to set $O_j$ in proper order according to the distance between $s_{\alpha_j}$ and the picking stations to which the tasks belong, and update the unallocated task sets of the related picking stations.
(4) If there are still tasks unallocated in this round, go to Step 3; otherwise, go to Step 4.
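The selection of correlated follow-up tasks in Step 2 can be sketched as below. The `Task` fields, the function name, the single shared `unallocated` list, and in particular the reading of "proper order" as nearest station first are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # assumed (row, column) grid cell


@dataclass(frozen=True)
class Task:
    name: str
    shelf: str    # shelf identifier
    station: str  # picking station identifier


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def correlated_followers(last: Task, unallocated: List[Task],
                         station_pos: Dict[str, Cell]) -> List[Task]:
    """Pick the tasks that share the shelf of `last`, strongly correlated ones first.

    The chosen tasks are removed from `unallocated` and returned in execution order.
    """
    same_shelf = [t for t in unallocated if t.shelf == last.shelf]
    strong = [t for t in same_shelf if t.station == last.station]
    weak = [t for t in same_shelf if t.station != last.station]
    origin = station_pos[last.station]
    # Assumed ordering rule: visit the nearer picking stations first.
    weak.sort(key=lambda t: manhattan(origin, station_pos[t.station]))
    chosen = strong + weak
    for t in chosen:
        unallocated.remove(t)
    return chosen
```

Since the sort is stable and tasks of the same station have the same distance key, several weakly correlated tasks of one station stay adjacent, so the second of them is strongly correlated with the first.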

Step 3: Task Allocation Based on Sequential and Parallel Single-Task Auction Algorithm.
(1) Use a sequential auction algorithm to decide from which picking station the next task to be assigned is drawn. According to equation (3), the picking station with the shortest picking time at present is found and denoted as $s'$, and the next task to be assigned is selected from this station. The purpose is to balance the picking time among the picking stations, avoid uneven workload, and narrow the selection range of the next to-be-assigned task. Then, turn to (2).
(2) For the set $\mathrm{Unallocated}_{s'}$ found in (1), the parallel auction algorithm is used to determine the next to-be-assigned task and the robot that fulfills it.
(a) Each robot calculates the time cost of fulfilling each task in set $\mathrm{Unallocated}_{s'}$ according to equation (1), based on its current tasks and the position of its last task.

(b) Find the task with the minimum fulfillment time cost, which is the next task to be assigned, and select the corresponding robot to fulfill this task.
(c) If there are still tasks unallocated, go to Step 2; otherwise, go to Step 4.
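One round of Step 3 can be sketched as follows. The bidding function is passed in as a callable so the sketch stays independent of how the time cost is computed. The function name, the data layout, the skipping of stations that have no unallocated task left, and the tie-breaking by robot and task name are assumptions not stated in the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple


def auction_next_task(
    station_time: Dict[str, float],   # current picking time of each station
    unallocated: Dict[str, List[str]],  # station id -> names of its unassigned tasks
    robots: List[str],
    bid: Callable[[str, str], float],  # (robot, task) -> time cost of that pairing
) -> Optional[Tuple[str, str, str, float]]:
    """Select the least-loaded station, then the cheapest (robot, task) pair for it."""
    # Assumption: stations without remaining tasks cannot be selected.
    open_stations = [s for s, tasks in unallocated.items() if tasks]
    if not open_stations:
        return None
    target = min(open_stations, key=lambda s: station_time[s])
    # Every robot bids on every open task of the target station; lowest cost wins.
    cost, robot, task = min(
        (bid(r, t), r, t) for r in robots for t in unallocated[target]
    )
    unallocated[target].remove(task)
    return target, robot, task, cost
```

After a call, the caller would append the task to the winning robot's sequence and update the station times before returning to Step 2.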

4.4. Step 4: Dynamic Adjustment and Update.
In the real operation process, task time errors may arise owing to congestion, communication delay, or other faults of logistics robots. However, such delays are not considered in the construction of the model. Consequently, if the auction of all tasks is conducted simultaneously, it may cause serious cumulative errors. So, in this paper, the auction is divided into many rounds. The time node and the number of tasks assigned in each round are set based on the total number of tasks. Each round of the auction is performed according to Step 2 and Step 3. Finally, the allocation of tasks to robots and the sequence of tasks executed by each robot are obtained. The pseudocode of the main part of the algorithm is expressed in Algorithm 1.

Figure 2: Flowchart of the four-stage balanced heuristic auction algorithm (Start; Step 1; Step 2; Step 3; Step 4, in which a new round of task allocation starts after a set time interval; End).

Simulation Analysis
The effectiveness of the logistics robot task allocation model and algorithm proposed in this paper is verified using the data of an e-commerce company. In the experiment, there are 3 picking stations and 300 shelves placed in a warehouse with an area of 800 m². The layout of the warehouse follows the layout in Figure 1. There are more than 2,000 SKUs on sale in the company. In order to avoid system congestion and improve picking efficiency, a decentralized storage strategy is adopted. That is, goods with a high turnover rate are stored on shelves in different areas (for specific storage allocation methods, see [26]). Picking tasks are generated from the historical order data of the company. Each item in an order corresponds to a picking task. 15 mobile robots are employed to fulfill these tasks. Other parameters used in the experiment are given in Table 1.
The locations of the tasks and their picking stations are known. The location of each robot is known at any time, and the distance between any two positions is calculated using the Manhattan distance because of the kinematic constraints of robots. The time interval between rounds of task allocation was set to 15 minutes. MATLAB 7.11 was used to calculate the task allocation results. The proposed model and algorithm are also compared with the traditional time cost model and with the algorithm that does not consider balance.

Task Allocation Scheme and Task Fulfillment Time.
The task allocation scheme is the result of allocating tasks to robots. Based on the proposed task time cost model and the designed auction algorithm, we first obtained the solution for the allocation of 200 tasks, which are numbered from 1 to 200. The task allocation scheme, which includes the allocation of tasks to robots and the task execution sequence, and the time cost for each robot to perform its tasks are given in Table 2. The fulfillment time of all tasks is the longest picking time of all robots, which is 860 seconds in this case.
It can be seen from Table 2 that, using the proposed task time cost model and considering the time balance of picking stations, we obtain a reasonable assignment of tasks to robots and a reasonable execution sequence of tasks. Furthermore, the proposed algorithm achieves load balance among logistics robots, which is conducive to the parallel operation mode of multiple robots.

Comparative Analysis between Different Task Time Cost Models.
Then, task sets of different sizes (100, 200, 300, 400, and 500 tasks) were used to compare the performance of the task time cost model proposed in this paper with that of the traditional task time cost model in [13]. The proposed time cost model considers task correlation, and the time cost of tasks with different types of correlation is calculated differently. The traditional task time cost model does not consider task correlation, and the cost of a task is the time a robot spends performing the task alone; that is, the cost of every task is calculated according to equation (1) with $l = 1$. The total picking time of all tasks using these two models under different numbers of tasks is compared in Figure 3.
It can be seen from Figure 3 that the proposed task time cost model reduces the total picking time compared with the traditional task time cost model. Besides, as the number of tasks increases, the advantage of the proposed task time cost model becomes more and more obvious. That is because, as the number of tasks increases, the correlation between tasks increases, and thus considering task correlation saves more time.

Table 1: Parameters used in the experiment.
Picking time of one item (s): 4

ALGORITHM 1: The main part of the balanced heuristic algorithm.
Input: unallocated task set $\mathrm{Unallocated}_{s_k}$, time interval $T$
Output: task allocation result $O_j$ for each robot
# Initial task allocation
for (j = 1; j ≤ n; j++) do
    find the task with the shortest distance to $r_j$ and put it into set $O_j$
end
# Task allocation considering task correlation
for (j = 1; j ≤ n; j++) do
    $\alpha_j$ ← the last task in $O_j$
    find the tasks strongly correlated with $\alpha_j$, add them into $O_j$
    find the tasks weakly correlated with $\alpha_j$, add them into $O_j$ in proper order
end
# Task allocation based on auction algorithm
for (k = 1; k ≤ h; k++) do
    calculate the picking time $T'_k$ according to equation (3)
end
$s'$ ← the picking station with the smallest picking time
for (j = 1; j ≤ n; j++) do
    calculate the time cost of fulfilling each task in $\mathrm{Unallocated}_{s'}$ by $r_j$ according to equation (1)
end
find the task with the minimum fulfillment time cost and allocate it to the corresponding robot
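The main loop of Algorithm 1 can be turned into a compact runnable Python sketch. It is an illustration under stated assumptions and not the authors' MATLAB implementation: all names are hypothetical, equal shelf cells are treated as the same shelf, task names are assumed unique, robots act in the order given, weakly correlated tasks are ordered nearest station first, ties are broken by station or robot name, at least one robot is assumed, and the round-based adjustment of Step 4 is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # assumed (row, column) grid cell


@dataclass(frozen=True)
class Task:
    name: str
    shelf: Cell   # shelf location; equal locations are treated as the same shelf
    station: str  # id of the picking station the task belongs to


def dist(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(task: Task, prev: Optional[Task], start: Cell,
         stations: Dict[str, Cell], v: float, t_pick: float) -> float:
    """Time cost of `task` after `prev`, following the three cases of equation (1)."""
    d = dist(task.shelf, stations[task.station])
    if prev is None:
        travel = dist(start, task.shelf) + 2 * d
    elif prev.shelf != task.shelf:
        travel = dist(prev.shelf, task.shelf) + 2 * d
    else:
        travel = (dist(stations[prev.station], stations[task.station])
                  + d - dist(prev.shelf, stations[prev.station]))
    return travel / v + t_pick


def allocate(robots: Dict[str, Cell], tasks: List[Task],
             stations: Dict[str, Cell], v: float = 1.0, t_pick: float = 4.0):
    pool = list(tasks)                                   # unallocated tasks
    plan: Dict[str, List[Task]] = {r: [] for r in robots}
    clock = {r: 0.0 for r in robots}                     # robot time so far
    finish = {s: 0.0 for s in stations}                  # station picking time so far

    def last(r: str) -> Optional[Task]:
        return plan[r][-1] if plan[r] else None

    def assign(r: str, task: Task) -> None:
        clock[r] += cost(task, last(r), robots[r], stations, v, t_pick)
        finish[task.station] = max(finish[task.station], clock[r])
        plan[r].append(task)
        pool.remove(task)

    for r, start in robots.items():                      # initial task allocation
        if pool:
            assign(r, min(pool, key=lambda t: dist(start, t.shelf)))
    while pool:
        for r in robots:                                 # correlated tasks
            prev = last(r)
            if prev is None:
                continue
            same = [t for t in pool if t.shelf == prev.shelf]
            # Distance 0 puts strongly correlated tasks ahead of weakly correlated ones.
            same.sort(key=lambda t: dist(stations[prev.station], stations[t.station]))
            for t in same:
                assign(r, t)
        if not pool:
            break
        # Auction: least-loaded station with open tasks, then cheapest (robot, task).
        target = min({t.station for t in pool}, key=lambda s: (finish[s], s))
        cands = [t for t in pool if t.station == target]
        _, winner, idx = min(
            (cost(t, last(r), robots[r], stations, v, t_pick), r, i)
            for r in robots for i, t in enumerate(cands)
        )
        assign(winner, cands[idx])
    return plan, clock, finish
```

The function returns each robot's ordered task list together with the robot times and station picking times, so the wave's fulfillment time can be read off as the maximum of the returned robot times.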

Influence of the Balance among Picking Stations.
Because of the parallel operation mode of multiple picking stations, the total fulfillment time of all tasks is determined by the longest picking time of all picking stations. Therefore, balancing the picking time among picking stations is an important part of the design of the task allocation algorithm. Experiments with different numbers of tasks (300, 400, and 500) were conducted to compare our proposed algorithm with the algorithm that does not consider the balance of the picking stations. The results are given in Table 3, in which the time in bold is the fulfillment time of all tasks under that scenario. It can be seen from Table 3 that the picking time of each picking station is relatively balanced and the completion time of all tasks is shorter when the picking time balance among picking stations is considered. In contrast, without considering the balance of picking stations, the picking time of each picking station fluctuates greatly and it takes longer to complete all the tasks. When the balance among picking stations is not taken into account, the burden on the picking stations is uneven, which may lead to long queues at some picking stations and idleness at others. In that case, the picking time difference between picking stations is relatively large, and the fulfillment time of all tasks is extended.

Conclusions and Prospects
This paper studied the multirobot task allocation problem in e-commerce RMFS. Based on the operation process of the picking system, a task time cost model considering the correlation between tasks is proposed. The time balance among picking stations is added to the multirobot task allocation model, and a four-stage balanced heuristic auction algorithm is designed to solve the model efficiently. Simulation results showed that the proposed model and algorithm can significantly improve the efficiency of task allocation and reduce the fulfillment time of orders compared with methods that do not consider the correlation between tasks and the balance among picking stations. The task allocation method proposed in this paper only considers the standard operation process, which is applicable to the daily smooth operation of small- and medium-sized e-commerce enterprises. For task allocation in large-scale and changeable application scenarios, dynamic uncertain factors, such as order cancellation, order insertion, and traffic congestion, should be considered to improve efficiency. As reinforcement learning is suitable for dynamic and changeable environments and deep learning can handle problems with high complexity and a large state space, an MRTA method based on deep reinforcement learning may be a promising direction for solving the task allocation problem in large-scale robotic mobile fulfillment systems.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.