List Scheduling Algorithm Based on Virtual Scheduling Length Table in Heterogeneous Computing System

Edge computing needs close cooperation with cloud computing to better meet various needs. Ensuring the efficient execution of applications in cloud computing is therefore not only related to the development of cloud computing itself but also affects the promotion of edge computing. Resource management and task scheduling strategies are important factors affecting the efficient execution of applications. Aiming at the task scheduling problem in the cloud computing environment, this paper proposes a new list scheduling algorithm, the Based on Virtual Scheduling Length table (BVSL) algorithm. The algorithm first constructs a predicted remaining length table from the prescheduling results, then builds a virtual scheduling length table from the predicted remaining length table, the execution cost of the current task, and the actual start time of the task. Task priorities are computed from the virtual scheduling length table so that the task with the longest overall path is scheduled first, which effectively shortens the scheduling length. Finally, a processor is selected for each task based on the predicted remaining length table. The selected processor may not give the earliest finish time for the current task, but it shortens the finish times of tasks in the next phase and reduces the scheduling length. To verify the effectiveness of the scheduling method, experiments were carried out on both randomly generated graphs and real-world application graphs. Experimental results show that the BVSL algorithm outperforms the recent Improved Predict Priority Task Scheduling (IPPTS) and RE-18 scheduling methods in terms of makespan, scheduling length ratio, speedup, and number of occurrences of better quality of schedules, while maintaining the same time complexity.


Introduction
Edge computing and cloud computing are widely used in many fields. With the development of the Internet of Things and 5G technology in particular, access devices and mobile devices are growing explosively. Although cloud computing has strong performance, it struggles to meet real-time and bandwidth requirements in the face of massive data transmission and device connections [1]. Edge computing plays an important role in the Internet of Things because it is close to the client, providing near-end low-latency services and ensuring the security and privacy of data. In recent years, many scholars have studied edge computing from different angles to promote it, including cloud-edge collaborative computation offloading [2][3][4][5], frameworks [6][7][8][9], mobile edge computing offloading [10][11][12][13][14][15][16], and pervasive edge computing offloading [17,18]. Much work has also been done on the trust and security [19][20][21][22][23] of data in the Internet of Things and on data transmission [24] and collection [25] in wireless sensor networks. However, to better meet various needs, edge computing still requires the close cooperation of cloud computing. Therefore, effective scheduling in the cloud computing environment is not only related to the development of cloud computing itself but also affects the promotion of edge computing.
In cloud computing, most large applications do not rely on a single processor but use multiple processors for distributed and parallel computing. Generally, cloud computing resources are located in different geographical locations.
Each component in the environment has its own operating system and is connected to the same network [26]. In essence, these resources are heterogeneous. They are connected through a high-speed network to form a heterogeneous system. Heterogeneous computing systems are highly competitive because they can provide parallel processing and high performance at a low cost [27], which is one of the reasons why they are widely used in scientific and industrial applications. More than half of the world's top ten supercomputing systems use heterogeneous GPU or CPU accelerator architectures, which are designed to maximize performance and efficiency [28,29]. This also shows that heterogeneous computing systems will continue to receive wide attention and application.
The efficiency of heterogeneous computing systems depends not only on hardware optimization but also on the effective utilization of internal computing resources. An application in a heterogeneous computing system can be decomposed into many subtasks with dependency and priority constraints, so that multiple processors can execute tasks in parallel to minimize the scheduling length. Such an application is usually represented as a directed acyclic graph (DAG) [30]: each node represents a task with different execution costs on different processors, and the weight on each edge represents the communication cost between tasks. How tasks are scheduled in a heterogeneous computing system is therefore a key factor in improving system performance. However, such problems have been proven to be NP-complete [31,32]. To solve this problem, many heuristic-based algorithms [33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48] have been proposed in recent years, aimed at minimizing the scheduling length among other goals. These algorithms can be roughly divided into three categories: list scheduling, clustering scheduling, and duplication scheduling.
List scheduling algorithms [33-38, 40, 41, 46] are the most common. This type of algorithm has two main phases: task prioritization and processor selection. The first phase sorts the tasks according to a defined priority function; the second repeatedly takes a task from the sorted list and chooses an optimal processor to execute it, until no executable tasks remain.
Clustering scheduling algorithms [43, 47-49] also have two phases. The first phase analyzes the characteristics of tasks or processors and clusters them according to different clustering conditions; the second phase schedules tasks based on the clustering. The main advantage of clustering is that it can reduce the cost of communication between tasks. However, in a heterogeneous system, the difference between task execution costs and intertask communication costs is large, making it difficult to select suitable clustering conditions. In [50], the authors compared many list scheduling and clustering scheduling algorithms and concluded that simple, low-complexity algorithms tend to be more competitive, while complex algorithms show advantages only under certain conditions. The idea of duplication scheduling algorithms [39, 42, 44, 45, 51, 52] is to schedule some tasks repeatedly on different processors, in order to reduce the delays caused by communication time. Although this can shorten the scheduling length to a certain extent, the complexity of such algorithms is high, and they sacrifice a large amount of processor resources.
List scheduling algorithms have lower complexity than clustering and duplication scheduling algorithms and can achieve good scheduling results, so task scheduling research focuses more on them. However, these methods also have shortcomings: they analyze only the current task or its impact on subsequent tasks and lack global consideration, which makes it difficult for the scheduling sequence generated in the priority phase to achieve better results. The priority calculation is also greatly affected by the system, with errors of varying degrees that lead to poor stability.
To this end, we propose a new list scheduling algorithm, the Based on Virtual Scheduling Length table (BVSL) algorithm, to minimize the scheduling length of an application in heterogeneous computing systems. The experiments are based on randomly generated graphs and real-world application graphs. The results show that the BVSL algorithm obtains better scheduling results than the algorithms proposed in IPPTS [34] and the literature [46]. The main contributions are as follows:
(1) To solve the priority ordering problem in list scheduling, a new concept, the "virtual scheduling length table" (VSLT), is proposed. By using the virtual scheduling length table in the task prioritization phase, task priority can be considered as a whole, so that the task with the longest overall path is scheduled first, effectively shortening the scheduling length.
(2) The predicted remaining length table is defined based on the prescheduling results, which better reflects the real time required from the current task to the exit task.
(3) A processor is selected for each task based on the predicted remaining length table. The selected processor may not give the earliest finish time for the current task, but it shortens the finish times of tasks in the next phase and reduces the scheduling length.
(4) The algorithm is evaluated on both randomly generated graphs and real-world application graphs. The experimental results show that our algorithm obtains a better scheduling length.
The remainder of the paper is organized as follows. Section 2 describes the scheduling problem. Section 3 reviews the related work. The proposed algorithm is explained in Section 4. Experimental details and simulation results are presented in Section 5. Finally, we conclude in Section 6.

Task Scheduling Problem
An application can generally be represented by a directed acyclic graph (DAG) G = (T, E), as shown in Figure 1, where T is a set of nodes and E is a set of directed edges. Each node in the graph can be regarded as a task, and each edge represents a dependency between tasks. For example, e(i, j) ∈ E indicates that task t_j is an immediate successor of task t_i, and task t_i must finish its execution and transfer the resulting data to satisfy the data dependency before task t_j starts. The weight of each edge e(i, j) ∈ E represents the communication cost between tasks t_i and t_j, denoted by c_{i,j}. The average communication cost of an edge is defined as

\overline{c_{i,j}} = \overline{L} + \frac{data_{i,j}}{\overline{B}},

where \overline{L} is the average latency of all processors, \overline{B} is the average bandwidth among processors, and data_{i,j} represents the amount of data to be transmitted from task t_i to task t_j. If tasks t_i and t_j are assigned to the same processor, c_{i,j} becomes 0.
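As an illustration, the average communication cost above can be computed in a few lines. This is a hedged sketch, not the paper's code; the function name and the example numbers are our own:

```python
# Sketch: average communication cost between two dependent tasks,
# following the definition above.  L_bar and B_bar are the average
# latency and average bandwidth over all processors; data_ij is the
# amount of data sent from t_i to t_j.
def avg_comm_cost(data_ij, L_bar, B_bar, same_processor=False):
    if same_processor:          # intra-processor transfer costs nothing
        return 0.0
    return L_bar + data_ij / B_bar

# e.g. 18 data units, average latency 1, average bandwidth 2
cost = avg_comm_cost(18, 1.0, 2.0)   # -> 10.0
```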
A matrix W is generally applied as a supplement to the DAG. W is a t × p computation cost matrix, as shown in Table 1, where t is the number of tasks, p is the number of processors, and w(t_i, p_j) represents the estimated execution time of task t_i on processor p_j. The average execution time of task t_i is defined as

\overline{w_i} = \frac{1}{p} \sum_{j=1}^{p} w(t_i, p_j).

Some basic concepts commonly used in task scheduling [35,37,38] are as follows:

Definition 1. pred(t_i) and succ(t_i) represent the set of immediate predecessors and the set of immediate successors of task t_i, respectively. If pred(t_i) or succ(t_i) is empty, t_i is called an entry task (t_entry) or an exit task (t_exit). For convenience, if there is more than one entry task (or exit task) in a DAG, a dummy entry task (or exit task) with zero weight and zero communication can be added to the graph.

Definition 2. Earliest start time (EST). The earliest start time of task t_i on processor p_j requires all its parent tasks to have finished executing and transmitted their data to t_i, and processor p_j to be idle at that time. Therefore, EST(t_i, p_j) is calculated as

EST(t_i, p_j) = \max \left\{ avail(p_j), \max_{t_k \in pred(t_i)} \left\{ AFT(t_k) + c_{k,i} \right\} \right\},

where avail(p_j) is the earliest available time of the processor, AFT(t_k) represents the actual finish time of task t_k, and \max_{t_k \in pred(t_i)} \{ AFT(t_k) + c_{k,i} \} denotes the arrival time of all input data of task t_i on processor p_j.
Definition 3. Earliest finish time (EFT). The earliest finish time of task t_i on processor p_j is the sum of the earliest start time and the execution time of task t_i on processor p_j:

EFT(t_i, p_j) = EST(t_i, p_j) + w(t_i, p_j).

Definition 4. Total execution time or scheduling length (makespan). The total execution time is the maximum finish time over all exit tasks of the DAG:

makespan = \max \left\{ AFT(t_{exit}) \right\}.

Definition 5. Critical path (CP). The critical path of a DAG is the longest path from the entry node to the exit node. The lower bound of the scheduling length is the minimum critical path length (CP_MIN), obtained by accumulating the minimum execution cost of each task on the critical path.
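Definitions 2 and 3 can be sketched as follows. The toy two-task graph, the dictionary representation, and the helper names are our own illustration, not the paper's code:

```python
# Hedged sketch of EST (Definition 2) and EFT (Definition 3).
def est(task, proc, pred, aft, comm, assigned, avail):
    """Earliest start time of `task` on `proc`: data-ready time vs. idle time."""
    ready = max((aft[k] + (0 if assigned[k] == proc else comm[(k, task)])
                 for k in pred[task]), default=0)
    return max(avail[proc], ready)

def eft(task, proc, w, **kw):
    """Earliest finish time = EST + execution cost."""
    return est(task, proc, **kw) + w[(task, proc)]

pred = {'t1': [], 't2': ['t1']}
w = {('t1', 'p1'): 5, ('t2', 'p1'): 7, ('t2', 'p2'): 4}
comm = {('t1', 't2'): 3}
aft, assigned = {'t1': 5}, {'t1': 'p1'}      # t1 already placed on p1
avail = {'p1': 5, 'p2': 0}
# On p1 the edge cost vanishes; on p2 we must wait for the data transfer.
print(eft('t2', 'p1', w, pred=pred, aft=aft, comm=comm,
          assigned=assigned, avail=avail))   # 5 + 7 = 12
print(eft('t2', 'p2', w, pred=pred, aft=aft, comm=comm,
          assigned=assigned, avail=avail))   # (5 + 3) + 4 = 12
```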
Definition 6. Average earliest start time (AEST). The average earliest start time can be computed recursively by traversing the DAG downward from the entry task:

AEST(t_i) = \max_{t_k \in pred(t_i)} \left\{ AEST(t_k) + \overline{w_k} + \overline{c_{k,i}} \right\},

where AEST(t_entry) = 0.
Definition 7. Average latest start time (ALST). The average latest start time can be computed recursively by traversing the DAG upward from the exit task:

ALST(t_i) = \min_{t_k \in succ(t_i)} \left\{ ALST(t_k) - \overline{c_{i,k}} \right\} - \overline{w_i},

Wireless Communications and Mobile Computing
where ALST(t_exit) = AEST(t_exit).
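The two recursions in Definitions 6 and 7 can be sketched as one downward and one upward pass over a topological order. The graph, cost tables, and helper names below are invented for illustration:

```python
# Hedged sketch of AEST (downward pass) and ALST (upward pass).
def topo(succ):
    """Kahn-style topological order of a DAG given as a successor map."""
    indeg = {t: 0 for t in succ}
    for ss in succ.values():
        for s in ss:
            indeg[s] += 1
    order, ready = [], [t for t, d in indeg.items() if d == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

def aest_alst(succ, w_bar, c_bar):
    pred = {t: [] for t in succ}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
    order = topo(succ)
    aest = {}
    for t in order:                          # downward traversal
        aest[t] = max((aest[k] + w_bar[k] + c_bar[(k, t)] for k in pred[t]),
                      default=0)
    alst = {}
    for t in reversed(order):                # upward traversal
        if not succ[t]:                      # exit task: ALST = AEST
            alst[t] = aest[t]
        else:
            alst[t] = min(alst[s] - c_bar[(t, s)] for s in succ[t]) - w_bar[t]
    return aest, alst

succ = {'t1': ['t2', 't3'], 't2': ['t4'], 't3': ['t4'], 't4': []}
w_bar = {'t1': 4, 't2': 6, 't3': 2, 't4': 3}
c_bar = {('t1', 't2'): 2, ('t1', 't3'): 1, ('t2', 't4'): 3, ('t3', 't4'): 4}
aest, alst = aest_alst(succ, w_bar, c_bar)
# tasks with AEST == ALST (t1, t2, t4 here) lie on the critical path
```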

Related Work
In cloud computing, efficient scheduling algorithms enable the system to process applications faster, so how application scheduling is performed has a great impact on system performance, especially in heterogeneous computing systems, which greatly increase the complexity of the scheduling problem. The task scheduling problem in heterogeneous computing systems has therefore long been studied. Such problems are NP-hard, so many heuristic scheduling algorithms have been proposed to reduce the scheduling length with low time complexity. The most common is the list scheduling algorithm: compared with clustering and duplication scheduling algorithms, its time complexity is relatively low and its scheduling results are good, so it is widely accepted and studied. The list scheduling algorithm has two main phases: the prioritizing phase (the priority of each task is calculated) and the processor selection phase (the task is assigned an appropriate processor according to the scheduling strategy). The most typical representatives are the Heterogeneous Earliest Finish Time (HEFT) algorithm and the Critical Path On a Processor (CPOP) algorithm, both proposed in [35]. However, neither of these algorithms considers the impact of the current task allocation on all of its successors. The lookahead algorithm in [40] uses a "lookahead" strategy to make up for HEFT's deficiency in selecting an appropriate processor by predicting the impact of the current task allocation on all successors, but its main disadvantage is high time complexity. To overcome this issue, the PEFT algorithm was proposed in [38].
PEFT proposes an "Optimistic Cost Table" (OCT) to make this prediction at reduced time complexity, and the results show that performance similar to the lookahead algorithm can be achieved. However, the PEFT algorithm ignores the influence of the current task's execution cost on the priority ranking. For this reason, [33] proposed the Predict Priority Task Scheduling (PPTS) algorithm based on the Predict Cost Matrix (PCM), and the experimental results show that PPTS achieves better performance than PEFT. The drawback of the PPTS algorithm is that it does not remove the execution cost of the current task from the PCM value in the processor selection strategy, which affects the choice of processor. In addition, [34,36,37,41,46] also proposed different list scheduling algorithms that achieve good scheduling results. Next, the recently proposed IPPTS [34] algorithm and the algorithm of [46] are introduced in detail; they also serve as the comparison algorithms for the algorithm proposed in this paper.
The IPPTS [34] algorithm is an improvement of the PPTS algorithm. It calculates a task's priority by multiplying the task's average PCM value by its number of immediate successors, giving higher weight to tasks with more immediate successors so that more tasks enter the ready list. In the processor selection phase, the processor that minimizes the sum of the task's finish time and "the Looking Head Exit Time" (LHET) is selected, realizing "upward" and "downward" forecasting. The experiments in [34] show that IPPTS achieves better performance than algorithms of the same type, such as HEFT, CPOP, PEFT, IPEFT [37], and PPTS, but also note that the algorithm does not perform well on application graphs with more critical tasks.
The authors of [46] proposed a new list scheduling algorithm (referred to as RE-18 in this paper). It is divided into three phases: level sorting, task prioritization, and processor selection. The level sorting phase determines the dependencies of tasks in order to determine each task's level. The task prioritization phase considers three attributes of the current task: the cumulative execution cost (CEC), the data transfer cost (DTC), and the rank of predecessor tasks (RPT), aiming to assign higher priority to tasks with more immediate successors as far as possible. The processor selection phase uses a noncrossover technique [53], executing the task on the processor with a small EFT or on the processor with the least execution cost. The experiments in [46] show that RE-18 performs better than HEFT, PEFT, and other algorithms. However, although RE-18 uses the noncrossover technique to reduce task execution costs, it may also increase communication costs and degrade performance.

Virtual Scheduling Length
In list scheduling algorithms, many priority calculations are improved or extended from rank_u, the upward rank used by the HEFT algorithm. This type of calculation considers only the execution cost of the current task, its successor tasks, and the communication costs between tasks. Therefore, it cannot accurately reflect the real time required from the current task to the exit task. For this reason, we use an existing list scheduling algorithm to perform prescheduling and obtain a real scheduling result, and then construct the predicted remaining length table (PRLT) from that result. PRLT is a matrix in which each PRLT(t_i, p_j) represents the predicted remaining length of task t_i on processor p_j, that is, the length from task t_i to the exit task when task t_i is on processor p_j. PRLT(t_i, p_j) is calculated as

PRLT(t_i, p_j) = premakespan - \min_{t_k \in succ(t_i)} \left\{ preAST(t_k) - c_{i,k} \right\},

where premakespan is the scheduling length of the prescheduling, preAST(t_k) is the actual start time of task t_k in the prescheduling, and \min_{t_k \in succ(t_i)} \{ preAST(t_k) - c_{i,k} \} represents the latest finish time of task t_i on processor p_j. If task t_k is assigned the same processor as p_j in the prescheduling, c_{i,k} = 0. For exit tasks, PRLT(t_exit, p_j) = 0. In addition, differences in the start execution times of ready tasks may affect task priority (an example is given in Section 4.5). Hence, we propose the virtual scheduling length table (VSLT) to overcome this drawback. VSLT is defined as

VSLT(t_i, p_j) = EST(t_i, p_j) + w(t_i, p_j) + PRLT(t_i, p_j),

where the first term is the earliest start time of task t_i on processor p_j, the middle term is the estimated execution time of task t_i on processor p_j, and the last term, PRLT(t_i, p_j), is the predicted remaining length of task t_i on processor p_j.
The sum of these three terms can be regarded as the total estimated scheduling length when only the currently ready task is considered, which helps analyze the impact of the current task on the scheduling length as a whole. Table 2 compares the better and worse results obtained when VSLT and PRLT, respectively, are used to determine priority (equal results are removed). For different types of application graphs, using VSLT for priority sorting gives results nearly 20% better than using PRLT, reaching a maximum of 35.6% on the Montage application graph. This shows that using VSLT for prioritization can improve the performance of the algorithm to a certain extent. Therefore, VSLT is used as the basis for prioritization in the proposed algorithm (the application graphs used here are the same as those used in Section 5).
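Under our reading of the PRLT and VSLT definitions above, the construction can be sketched as follows. The helper names, dictionary layout, and numbers are illustrative assumptions, not the paper's code:

```python
# Hedged sketch: PRLT from a prescheduling result, and VSLT as
# EST + execution cost + PRLT.
def prlt(task, proc, succ, pre_ast, pre_map, comm, premakespan):
    if not succ[task]:                       # exit task
        return 0.0
    # latest finish time of `task` on `proc` so all successors start on time
    latest_finish = min(pre_ast[k] - (0 if pre_map[k] == proc
                                      else comm[(task, k)])
                        for k in succ[task])
    return premakespan - latest_finish

def vslt(task, proc, est_val, w, **kw):
    return est_val + w[(task, proc)] + prlt(task, proc, **kw)

succ = {'t1': ['t2'], 't2': []}
pre_ast, pre_map = {'t1': 0, 't2': 6}, {'t1': 'p1', 't2': 'p2'}
comm = {('t1', 't2'): 2}
w = {('t1', 'p1'): 4}
# t2 started at 6 in the prescheduling and the preschedule took 10, so t1
# on p1 must finish by 6 - 2 = 4 and PRLT(t1, p1) = 10 - 4 = 6.
print(prlt('t1', 'p1', succ, pre_ast, pre_map, comm, 10))   # 6
total = vslt('t1', 'p1', est_val=0, w=w, succ=succ, pre_ast=pre_ast,
             pre_map=pre_map, comm=comm, premakespan=10)    # 0 + 4 + 6 = 10
```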

Prescheduling Phase.
To make the sorting results of tasks closer to the real scheduling, we add a prescheduling phase before the task selection phase. In this phase, we choose the PPTS algorithm as the prescheduling algorithm. The preMap⟨t_i, p_j⟩, preAST(t_i), and premakespan generated by the prescheduling are used as the input for the next phase, where preMap⟨t_i, p_j⟩ is the task-to-processor mapping produced by the prescheduling, preAST(t_i) is the actual start time of task t_i in the prescheduling, and premakespan is the scheduling length when the prescheduling is completed.

Task Prioritization Phase.
To prioritize tasks, the average VSLT is calculated for each task:

rank_{VSLT}(t_i) = \frac{1}{p} \sum_{j=1}^{p} VSLT(t_i, p_j).

To allow tasks with a larger average VSLT value to be executed first, the task priorities are sorted in descending order of rank_{VSLT}(t_i).
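The prioritization rule reduces to a descending sort on the per-task average of the VSLT rows; the table values below are made up for illustration:

```python
# Hedged sketch: rank each ready task by its average VSLT over all
# processors and schedule in descending order of that rank.
vslt_table = {'t4': [30, 26], 't5': [22, 28], 't6': [40, 36]}  # 2 processors

def rank_vslt(task, procs=2):
    return sum(vslt_table[task]) / procs

ready = sorted(vslt_table, key=rank_vslt, reverse=True)
print(ready)   # ['t6', 't4', 't5'] -- largest average VSLT first
```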

Processor Selection Phase.
To select a processor for a task, the earliest finish time of the task on each processor is first calculated. An insertion-based policy is applied to compute EFT; that is, the possibility of inserting the task into the earliest idle time slot between two tasks already scheduled on the same processor is considered. The idle time slot must be at least long enough to hold the computation cost of the task to be scheduled, and scheduling into it must preserve the precedence constraints.
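A minimal sketch of the insertion-based policy, assuming each processor's schedule is kept as a sorted list of (start, finish) busy intervals (the representation is our choice, not the paper's):

```python
# Hedged sketch: scan the gaps between jobs already placed on a processor
# and take the first one that both fits the task's execution cost and
# respects its ready time (EST of the task's data).
def earliest_start(busy, ready_time, exec_cost):
    prev_end = 0
    for start, finish in busy:
        gap_start = max(prev_end, ready_time)
        if gap_start + exec_cost <= start:   # the task fits in this idle slot
            return gap_start
        prev_end = finish
    return max(prev_end, ready_time)         # otherwise append at the end

busy = [(0, 4), (10, 14)]
print(earliest_start(busy, ready_time=3, exec_cost=5))   # 4  (slot 4..10)
print(earliest_start(busy, ready_time=3, exec_cost=7))   # 14 (no slot fits)
```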
Then, the EFT_PRLT of the task on each processor is calculated:

EFT_{PRLT}(t_i, p_j) = EFT(t_i, p_j) + PRLT(t_i, p_j).

Finally, the processor with the minimum EFT_PRLT value is selected for the current task. Although the finish time on the chosen processor is not always the earliest, this selection policy considers not only the EFT value of the current task but also the impact of the chosen processor on the path length from the current task to the exit nodes. Therefore, the scheduling length can be effectively reduced to a certain extent.
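Under this reading, the selection rule is a one-line minimization. The EFT and PRLT values below are invented to show that the chosen processor need not have the smallest EFT:

```python
# Hedged sketch of the processor-selection rule: pick the processor that
# minimises EFT + PRLT rather than EFT alone.
eft_tbl  = {'p1': 50, 'p2': 46}        # earliest finish time of the task
prlt_tbl = {'p1': 20, 'p2': 30}        # predicted remaining length

best = min(eft_tbl, key=lambda p: eft_tbl[p] + prlt_tbl[p])
print(best)   # 'p1': 50 + 20 = 70 < 76, though p2 finishes the task earlier
```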

Detailed Description of the BVSL Algorithm.
The pseudo-code of the algorithm is shown in Algorithm 1. First, the algorithm uses the PPTS algorithm for prescheduling and then calculates the PRLT based on the prescheduling result (lines 1 and 2). Then, an empty ready list is created, and the entry task is placed at the top of the list (line 3). In the while loop, the algorithm first calculates the EST, VSLT, and rank_VSLT values for all ready-list tasks and then selects the task with the highest rank_VSLT value as the currently scheduled task (lines 5 and 6). After selecting the task, the EFT_PRLT values of the task on all processors are calculated, and the processor p_j with the minimum EFT_PRLT is selected to execute task t_i (lines 7-11). Finally, the better of the scheduling result and the prescheduling result is returned as the final result (lines 14-18).
Each phase of the algorithm is bounded by O(t^2 × p); therefore, the total time complexity of the algorithm is O(t^2 × p). Tables 3 and 4, respectively, show the prescheduling results for the DAG in Figure 1 and the PRLT values of its tasks on different processors. Table 5 shows the processor selected in each iteration of the algorithm. Figure 2 shows the schedules of the example task graph in Figure 1 produced by the BVSL algorithm, the scheduling algorithm that computes task priority from the average PRLT, the IPPTS algorithm, and the RE-18 algorithm. The makespan of the BVSL algorithm is 105, which is shorter than that of the other algorithms.
For the DAG in Figure 1, the sorted list obtained using average PRLT is (T1, T3, T2, T4, T6, T5, T7, T8, T9, T10), and the sorted list obtained using average VSLT is (T1, T3, T2, T4, T6, T8, T5, T7, T9, T10). Compared with the former, the latter schedules task T8 before tasks T5 and T7: the average PRLT of T8 is smaller than that of T5 and T7, while its average VSLT value is larger. As can be seen from Figure 2, the only difference between Figures 2(a) and 2(b) is that task T8 is allocated before task T7 on processor P2, yet the final scheduling length in Figure 2(a) is shorter than that in Figure 2(b). Therefore, the start time of a task also affects its priority to a certain extent, producing a difference in the final result.

Experimental Results and Discussion
This section will introduce the comparison between the BVSL algorithm and the latest IPPTS and RE-18 algorithms. First, some comparison metrics will be introduced.

Comparison Metrics.
The comparison of the algorithms in this paper is based on the following four metrics:
(1) Total execution time or scheduling length (makespan) [46]. Makespan is the time required by an application from the execution of the first task to the end of the last task. One goal of a list scheduling algorithm is to minimize the makespan. The calculation is shown in Definition 4.
(2) Schedule length ratio (SLR) [35,37]. The schedule length ratio is the normalization of the schedule length:

SLR = \frac{makespan}{\sum_{t_i \in CP_{MIN}} \min_{p_j \in P} w(t_i, p_j)}.

The denominator is the minimum computation cost of the tasks on the critical path (CP_MIN). A lower SLR indicates a better algorithm.
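A minimal sketch of the SLR computation; the critical-path costs below are invented:

```python
# Hedged sketch: SLR = makespan normalised by the minimum possible
# execution cost of the critical-path tasks.
def slr(makespan, cp_min_costs):
    return makespan / sum(cp_min_costs)     # lower is better

# critical path of three tasks with cheapest execution costs 10, 8 and 12
print(slr(45, [10, 8, 12]))   # 1.5
```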
(3) Number of occurrences of better quality of schedules (NOBQS) [35,37]. NOBQS represents the percentage of occurrences of better, equal, and worse scheduling lengths between two algorithms.
(4) Speedup [35]. Speedup is the ratio of the sequential execution time to the parallel execution time (i.e., the makespan), so a larger speedup indicates a better scheduling algorithm. The sequential execution time is computed by assigning all tasks to the single processor that minimizes the total computation cost of the task graph:

Speedup = \frac{\min_{p_j \in P} \left\{ \sum_{t_i \in T} w(t_i, p_j) \right\}}{makespan}.

Randomly Generated Application Graphs.

Random Graph Generator.
Our random graphs are generated using the task graph generator at https://github.com/wzwtime/RandomGraphGenerator_new. The following parameters are used [34,35]:
(1) Number of tasks in the graph (t).
(2) Shape parameter of the graph (α). The height (depth) of the DAG is randomly generated from a uniform distribution with mean √t/α (the height is the smallest integer not less than the randomly generated value). The width of each level is randomly selected from a uniform distribution with mean α × √t. If α > 1, a dense graph (a shorter graph with a high degree of parallelism) is generated; if α < 1, a longer graph with a lower degree of parallelism is generated.
(3) Out degree of a node (out_degree).
(4) Heterogeneity factor of processor speeds (β). A high β value results in a significant difference in the computation costs of a task across processors, while a low β value means the computation costs of a given task are almost equal on all processors. The average computation cost \overline{w_i} of each task in the graph is randomly selected from a uniform distribution over [0, 2 × \overline{w}_{DAG}], where \overline{w}_{DAG} is the average computation cost of the given graph, set randomly. The computation cost of task t_i on each processor p_j is then randomly set within the range

\overline{w_i} \times \left(1 - \frac{\beta}{2}\right) \le w(t_i, p_j) \le \overline{w_i} \times \left(1 + \frac{\beta}{2}\right).
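The shape and heterogeneity rules above can be sketched as follows. This is our reading, not the authors' generator; the uniform bounds [0, 2 × mean], the seed, and the function names are assumptions:

```python
# Hedged sketch of the generator parameters: DAG height and level widths
# follow the shape parameter alpha; per-processor costs follow the
# heterogeneity factor beta.
import math
import random

def dag_shape(t, alpha, rng):
    """Height drawn with mean sqrt(t)/alpha; level widths with mean alpha*sqrt(t)."""
    height = max(1, math.ceil(rng.uniform(0, 2 * math.sqrt(t) / alpha)))
    widths = [max(1, round(rng.uniform(0, 2 * alpha * math.sqrt(t))))
              for _ in range(height)]
    return height, widths

def task_costs(w_avg, beta, n_procs, rng):
    """Cost of one task on each processor, spread by heterogeneity beta."""
    lo, hi = w_avg * (1 - beta / 2), w_avg * (1 + beta / 2)
    return [rng.uniform(lo, hi) for _ in range(n_procs)]

rng = random.Random(7)
height, widths = dag_shape(t=100, alpha=2.0, rng=rng)  # alpha > 1: short, wide
costs = task_costs(w_avg=20, beta=0.1, n_procs=4, rng=rng)  # near-equal costs
```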

Performance Results.
The average SLR produced by the algorithms as a function of CCR is shown in Figure 6. When CCR = 10, the BVSL algorithm is 14.5% and 8.8% better than RE-18 and IPPTS, respectively.
The average makespan as a function of the number of processors is shown in Figure 7. With 4 processors, the average makespan of the BVSL algorithm is 13.8% lower than RE-18 and 6.3% lower than IPPTS; with 32 processors, it is 12.8% lower than RE-18 and 6.0% lower than IPPTS. Figure 8 shows the average makespan of the algorithms with respect to the heterogeneity values. For a heterogeneity value of 0.1, the BVSL algorithm obtains the best results, with average improvements of 10% and 2.9% over RE-18 and IPPTS, respectively. For a heterogeneity value of 2, the improvements over RE-18 and IPPTS increase to 19% and 15.5%, respectively. The BVSL algorithm thus achieves better results in highly heterogeneous situations. Table 6 lists the percentages of better, equal, and worse scheduling lengths generated by each algorithm compared with the remaining algorithms. Compared with the IPPTS and RE-18 algorithms, the BVSL algorithm achieves better schedules in 78.5% and 91.3% of runs, equivalent schedules in 5.8% and 1.4% of runs, and worse schedules in 15.7% and 7.3% of runs, respectively.

Real-World Application Graphs.
In addition to randomly generated application graphs, we also considered three real-world application graphs: Montage [54][55][56], Epigenomics [55], and SIPHT [55]. Because the structure of these three workflows is known, we simply used different values for CCR, heterogeneity, and number of processors. The value ranges used in our simulation are set as follows: (i) CCR = {0.1, 0.5, 1, 2, 5, 10}.

Montage.
First, for the Montage application graph, Figure 11 shows the average speedup under different heterogeneity conditions. When β = 2, the BVSL algorithm is 11% better than RE-18 and 54% better than IPPTS. In terms of CCR (Figure 12), the average SLR improvement of BVSL over RE-18 and IPPTS is 26.5% and 19%, respectively, for CCR = 10.

Epigenomics.
Second, for the Epigenomics application graph, we chose application graphs of 24, 46, and 100 tasks. The results are shown in Figures 13-16; the BVSL algorithm maintains the best overall performance. The average SLRs obtained by the BVSL, RE-18, and IPPTS algorithms as functions of CCR and heterogeneity are shown in Figures 13 and 14, respectively. For different heterogeneity values, from β = 0.1 to β = 2.0, the improvement of BVSL over RE-18 increases from 6.5% to 15.8%, and its improvement over IPPTS increases from 4.4% to 13.7%. For different CCRs, the average SLR of the BVSL algorithm is also lower than RE-18 and IPPTS; when CCR = 10, BVSL improves by 12.2% over RE-18 and 10.2% over IPPTS.

The average makespans for different heterogeneity values and different CCRs are illustrated in Figures 15 and 16, respectively. When β = 2, the average makespan of BVSL is 15.7% lower than RE-18 and 14% lower than IPPTS. For CCR = 10, the average makespan of BVSL is better than RE-18 and IPPTS by 10.7% and 10%, respectively.

SIPHT.
For SIPHT, application graphs of 30, 60, and 100 tasks were selected. The average SLRs for different heterogeneity values and different CCRs are shown in Figures 17 and 18, respectively. For average SLR, the BVSL algorithm always performs better than the other algorithms. In Figure 17, when β = 2, the BVSL algorithm is 24.9% better than RE-18 and 27.4% better than IPPTS. In Figure 18, when CCR = 10, the BVSL algorithm is 32.9% better than RE-18 and 29.7% better than IPPTS. Figure 19 shows the average speedup under different heterogeneity conditions; when β = 2, the BVSL algorithm is 14.4% better than RE-18 and 68.8% better than IPPTS. Figure 20 shows the average speedup under different CCRs. When CCR = 10, the BVSL algorithm is 47.3% better than

RE-18 and 41% better than IPPTS. In addition, IPPTS performs poorly with high heterogeneity and low CCR. From the above experiments, we can see that the algorithm proposed in this paper has better performance, especially in the case of high heterogeneity. The main reason is that the higher the heterogeneity, the larger the differences in the earliest start times of ready tasks. The IPPTS algorithm calculates priority bottom-up, which ignores the impact of these differences. RE-18 calculates priority top-down, so the priority of a ready task depends on its parent tasks; the higher the heterogeneity, the less well the original priority order fits the subsequent scheduling of tasks. The BVSL algorithm considers task priority as a whole through prescheduling, which compensates for the above defects and improves performance to a certain extent.

Conclusions
In general, resource management and task scheduling in edge computing and cloud computing are important factors in improving system performance, and the same is true for cloud-edge collaboration. This paper proposes a new list scheduling algorithm, BVSL, based on a virtual scheduling length table for heterogeneous computing systems in the cloud. The algorithm first uses the PPTS algorithm for prescheduling and calculates the predicted remaining length table (PRLT) from the prescheduling results. Second, the actual start time of each ready task is also considered when computing its priority. In this way, the algorithm balances what precedes and follows a task when calculating priorities and obtains priorities close to the real scheduling. The experimental results on random application graphs show that the BVSL algorithm performs better than the other algorithms on graphs of 20 to 400 tasks, and in cases of high heterogeneity it is more competitive than the RE-18 and IPPTS algorithms, although its competitiveness gradually declines as the number of processors continues to increase. The results on the three real-world application graphs show that the overall performance of the BVSL algorithm is also better than that of the IPPTS and RE-18 algorithms. In future work, we will study scheduling algorithms in dynamic environments and under multiple constraints.

Data Availability
The experimental result data have not yet been posted to a public site. They are available from the corresponding author of the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.