DRL-Based Edge Computing Model to Offload the FIFA World Cup Traffic

In recent years, the volume of global video traffic has been increasing rapidly, so offloading traffic during video transmission and improving the user experience have become important. In this paper, we propose a novel traffic offloading strategy to provide a feasible and efficient reference for the upcoming 2022 FIFA World Cup in Qatar. First, we present a system framework based on the Mobile Edge Computing (MEC) paradigm, which supports transferring the FIFA World Cup traffic to mobile edge servers. Then, Deep Reinforcement Learning (DRL) is used to provide the traffic scheduling method and to minimize the scheduling time of application programs. The task scheduling operation is modelled as a Markov decision process, and the proximal policy optimization method is used to train the Deep Neural Network in the DRL. We evaluate the proposed traffic offloading strategy by simulation on two real datasets, and the experimental results show that it achieves a smaller scheduling time, higher bandwidth utilization, and a better user experience than two baselines.


Introduction
The mobile Internet and the Internet of Things (IoT) have developed rapidly in recent years [1], especially in the course of building smart cities [2], and the sharply increasing number of mobile devices (such as smartphones and wearable monitors) generates an enormous volume of Internet traffic. For example, the Cisco Annual Internet Report (CAIR) [3] shows that the total number of global mobile devices will grow from 5.1 billion (66 percent of the population) in 2018 to 5.7 billion (71 percent of the population) by 2023. In addition, global IP traffic is expected to reach 5.3 ZB by 2023 [4], and video traffic will account for 67.5% of it owing to new techniques and applications such as 4K/8K [5] and AR/VR [6], which are essential products in smart cities. There are usually three kinds of video transmission, i.e., video on demand, carousel, and live streaming [7]. In the era of the mobile Internet, more and more users pay attention to live streaming applications. In particular, international sport events, which belong to the field of live streaming, attract a large audience during their specific time frames. The FIFA World Cup is a well-known example of such a live sport event. Given this, this paper investigates the FIFA World Cup and gives a network solution from the perspective of traffic offloading in order to provide support for the upcoming 2022 FIFA World Cup in Qatar. Traditional video delivery strategies usually depend on Content Delivery Networks (CDN) [8,9]. In other words, the Internet content providers deliver abundant contents to the edge users through the CDN in push mode, in which the intermediate and edge servers store the hot contents in advance. However, CDN-based strategies are not suitable for offloading the FIFA World Cup traffic, and the main reasons are analyzed as follows.
The volume of the FIFA World Cup traffic is extremely large, and the traffic shows periodicity, abruptness, and explosivity. A CDN fails to deliver such traffic to the Metropolitan Area Network (MAN) or the Access Network (AN), which are very close to the mobile end-users, because CDN servers are very expensive and it is impractical for the Internet content providers to deploy many CDN servers at the MAN or AN. As an alternative solution for offloading the FIFA World Cup traffic, edge computing [10] can cache contents at the edge servers closest to the users to satisfy their requirements. Nevertheless, in the era of the mobile Internet, conventional edge computing cannot cope with billions of mobile devices. Against this background, Mobile Edge Computing (MEC) [11-13] has been regarded as a relatively appropriate alternative.
Different from the centralized cloud computing mode, the computation and storage resources in MEC are deployed at the edge network (such as mobile base stations, wireless hotspots, and edge routers) in a distributed way. On this basis, the computation tasks on the FIFA World Cup traffic can be offloaded to the mobile edge servers for running, which greatly reduces the communication overhead and the network delay of application programs. At the same time, the pressure faced by Internet content providers as well as core networks can be relieved effectively. In fact, the network performance improvement in MEC strongly depends on the task offloading decision [14]. Furthermore, the decision problem usually involves several factors, such as network bandwidth, the timing sequence of the application program, and the dependence between tasks. As a result, mobile application programs are usually modelled as a Directed Acyclic Graph (DAG) [15] to realize fine-grained traffic scheduling and enable parallel processing of multiple tasks. However, DAG-based task scheduling is an NP-hard problem [16], and heuristic and approximate traffic offloading strategies therefore struggle to satisfy the requirements of users (especially the real-time requirements) while they watch the FIFA World Cup. Therefore, it is urgent to find a stable and powerful method to solve this problem.
To the best of our knowledge, Deep Reinforcement Learning (DRL) [17,18] has attracted much attention from researchers worldwide in the field of Artificial Intelligence (AI). It integrates the advantages of Reinforcement Learning (RL) [19] and Deep Neural Networks (DNN) [20] and can obtain the optimal decision by automatically learning the network environment through repeated interactions. Specifically, DRL has the following benefits. (1) DRL can learn the optimal decision strategy in a model-free way, so the whole process does not require environment modelling, which gives it great flexibility. (2) DRL has strong representation and generalization abilities, which support the huge state space of the DAG-based traffic scheduling problem. (3) DRL is a global optimization strategy and has a large probability of obtaining the optimal solution. Although there have been some proposals on using DRL to address the traffic offloading problem in MEC, they cannot relieve the network pressure well and cannot be used directly for traffic offloading in the FIFA World Cup. With the above considerations, this paper proposes a novel DRL-based MEC traffic offloading (NDMT) strategy for the 2022 FIFA World Cup, and the major contributions are summarized as follows:
(i) We compute the priorities of tasks and convert the DAG into a sequence of tasks according to the computed priorities, in which the scheduling process over the task sequence is regarded as a Markov Decision Process (MDP).
(ii) A DNN model based on the Sequence to Sequence (S2S) form is designed and used to fit the scheduling strategy of the MDP, where the DAG is converted into the sequence of tasks to be input into the DNN.
(iii) The proximal policy optimization (PPO) method is used to train the DNN in the DRL in order to obtain high stability and reliability.
The rest of the paper is organized as follows. Section 2 reviews and compares the related work. Section 3 introduces the system framework of NDMT. Section 4 gives the problem description on traffic offloading in MEC. Section 5 presents the construction method of the MDP. The traffic scheduling strategy based on DRL is proposed in Section 6. Section 7 reports the experimental results. Section 8 concludes this paper and gives the future research direction.

Related Work
In this section, we review the research on task (traffic) offloading in MEC over the last three years (2018-2020) from two aspects, i.e., the heuristic methods and the DRL-based methods.

Heuristic Methods.
There have been many heuristic traffic offloading strategies in MEC. For example, the authors in [21] studied the scenario where multiple mobile devices uploaded tasks to a MEC server in a single cell, allocating the limited server resources and wireless channels among the mobile devices. In particular, the authors formulated an optimization problem for the energy saved on the mobile devices with dividable tasks and used the select-maximum-saved-energy-first algorithm to solve it. In [22], the authors investigated an energy-efficient joint computation offloading, load balancing, and transmission power control problem and further proposed a heuristic algorithm to obtain good traffic offloading while guaranteeing load balancing among the multiple servers. In [23], a distributed computation offloading and resource allocation optimization scheme in heterogeneous networks with MEC was proposed, in which an optimization problem was formulated to provide the optimal computation offloading strategy. In [24], a traffic offloading strategy based on Software-Defined Networking (SDN) in the ultra-dense network was devised to minimize the delay while saving the battery life of mobile devices. It transformed the optimization problem into a task placement subproblem and a resource allocation subproblem, which could reduce the task duration by 20% with 30% energy saving. In [25], a MEC-enabled multicell wireless network was considered, where each base station was equipped with a MEC server that assisted mobile users in executing computation-intensive tasks via task offloading. It formulated the involved problem as a mixed integer nonlinear program, including the task offloading decision, the uplink transmission power of mobile users, and the computation resource allocation at the MEC servers.
Furthermore, the authors in [26] jointly decided on the computing resource allocation for the hosted applications and designed a novel decomposition based on the logic-based Benders decomposition technique, taking into account the heterogeneous requirements of the offloaded tasks (different computing requirements, latency, and so on) and the limited MEC capabilities. In [27], the authors studied the trade-off between task execution time and energy consumption at end-users under varying wireless channel conditions for soft real-time applications and the involved tasks. It proposed a genetic algorithm with constrained mutation for optimal job partitioning and introduced an edge-proposing deferred acceptance algorithm to solve the preference-based matching game. In [28], a task offloading algorithm that utilized the cache function of the edge server was proposed. When a task with the same type of contents as those cached was uploaded to the edge server, the preset evaluation parameters took several factors into account to calculate the optimal processing position for the task. In [29], each base station was integrated with a MEC server for executing computation-intensive tasks, and an iterative algorithm was proposed to solve the optimization problem in a single-mobile-user MEC system. In [30], an agent was introduced into the offloading of computation tasks, and a novel framework of agent-enabled task offloading in unmanned aerial vehicle aided MEC was proposed to help the users obtain good Quality of Experience (QoE). In [31], the authors addressed the problem of coordinating the offloading decisions of wireless devices that periodically generated computationally intensive tasks for various delay-sensitive applications. In addition, they developed a game theory based model and proposed a polynomial complexity algorithm for computing an equilibrium.
In [32], the authors considered a system where most mobile devices migrated duplicate computation tasks to the edge servers and shared the requested contents for computation tasks. Therein, an efficient Lyapunov online algorithm that could perform joint task offloading and dynamic data caching for computation tasks or contents was proposed to reduce the overall latency of all mobile devices. Although these traffic offloading strategies performed well, they could not relieve the network pressure well and could not be used directly for traffic offloading in the FIFA World Cup because of its special traffic features, i.e., periodicity, abruptness, and explosivity.

DRL-Based Methods.
There have also been some DRL-based traffic offloading strategies in MEC. For example, in [33], a multiuser MEC system was considered, where multiple users could perform computation offloading via wireless channels to a MEC server. In particular, an RL-based optimization framework was introduced to tackle the resource allocation in wireless MEC. In [34], a deep-Q network based task offloading and resource allocation algorithm for MEC was proposed, where each mobile terminal had multiple tasks offloaded to the edge server. It also designed a joint task offloading decision and bandwidth allocation optimization to minimize the overall offloading cost in terms of energy cost, computation cost, and delay cost. In [35], DRL was first proposed to solve the offloading problem with multiple service nodes in the cluster and multiple dependencies among mobile tasks in large-scale heterogeneous MEC. In particular, it used a long short-term memory network layer and a candidate network set to improve the deep-Q network algorithm in combination with the actual environment of the MEC. In [36], the authors considered MEC for a representative mobile user in a sliced Radio Access Network (RAN), where multiple base stations were available to be selected for computation offloading. Meanwhile, a double DQN-based strategic computation offloading algorithm that learned the optimal policy without a priori knowledge of the network dynamics was proposed to break the curse of high dimensionality in the state space. In [37], an intelligent offloading system for vehicular edge computing was constructed by leveraging DRL, in which both communication and computation states were modelled by finite Markov chains.
In [38], the authors investigated the problem of delay-sensitive task scheduling and resource management on the server side in a multiuser MEC scenario, where a new online algorithm based on DRL was devised to reduce the average slowdown and the average timeout period of tasks in the queue. In [39], a computing-aware scheduling strategy in MEC was proposed, in which a support vector machine based multiclass classifier was adopted. Although these DRL-based strategies also showed good traffic scheduling performance, they still had limitations in terms of scheduling time, bandwidth utilization, and QoE of user. This motivates the study of this paper. Besides, this paper also targets a specific application scenario, i.e., the 2022 FIFA World Cup, for which it can provide a significant reference.

System Framework
This section introduces the system framework of NDMT, including the MEC-based traffic offloading architecture and the DRL-based workflow for MEC traffic offloading. In addition, the abbreviations frequently used in this paper are listed in Table 1.

MEC-Based Traffic Offloading Architecture.
In this paper, we present a MEC-based traffic offloading architecture, as shown in Figure 1, where the MEC servers are deployed at the edge network (e.g., AN) to provide convenient and low-latency computation services for the mobile users. In particular, MEC allocates specialized hardware and software resources to each user and isolates these resources by using virtualization technology, so that the quality of service and the privacy of the user can be guaranteed effectively.

Mobile Information Systems
On the user side, the computation tasks generated by the mobile applications (e.g., the FIFA World Cup traffic) can be performed directly at the local mobile device's Central Processing Unit (CPU), or they can be sent to the MEC server via the Data Transmission Unit (DTU) and performed by the corresponding service instance (i.e., remote traffic offloading). Meanwhile, the traffic offloading module makes the scheduling decision for all tasks in the mobile device, which covers two aspects, i.e., the execution way and the scheduling order.
An application program usually includes multiple computation tasks with dependences among them, and modelling these dependences helps realize fine-grained traffic scheduling and enables parallel processing of multiple tasks. In this paper, we model the mobile application program related to the FIFA World Cup traffic as a DAG, denoted by $G = (T, L)$, where $T$ is the set of tasks and $L$ is the set of links between tasks. Here, an arbitrary task is denoted by $t_i$ and an arbitrary link is denoted by $l(t_i, t_j)$. In particular, for $l(t_i, t_j)$, $t_i$ is the precursor task of $t_j$ while $t_j$ is the successor task of $t_i$; that is to say, the performing of $t_j$ relies on $t_i$. In the DAG, a task without any precursor task is called an entry task, whilst a task without any successor task is called an exit task. In addition, an application program is allowed to have multiple entry tasks and multiple exit tasks to support parallel processing. Each $t_i$ has three attributes, i.e., the input volume of traffic, the number of CPU cycles, and the output volume of traffic, denoted by $Iv_i$, $Cy_i$, and $Ov_i$, respectively. The corresponding values can be obtained by a program analyzer, as in [40], and they reflect the required transmission cost and computation cost.
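As a small illustration of the DAG terminology above, the following sketch finds the entry and exit tasks of an application given its links. The function name, the integer task indexes, and the five-task example are hypothetical and chosen only for illustration.

```python
def entry_and_exit_tasks(num_tasks, links):
    """Return (entry, exit) task index lists for a DAG.

    links is a list of (i, j) pairs, where (i, j) means that task i is a
    precursor of task j (the link l(t_i, t_j) in the text).
    """
    has_precursor = {j for _, j in links}
    has_successor = {i for i, _ in links}
    entry = [t for t in range(num_tasks) if t not in has_precursor]
    exits = [t for t in range(num_tasks) if t not in has_successor]
    return entry, exits


# Hypothetical application with two entry tasks (0 and 4) and one exit task (3).
links = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 3)]
entry, exits = entry_and_exit_tasks(5, links)
print(entry, exits)  # [0, 4] [3]
```

Tasks 1 and 2 depend only on task 0, so they could be processed in parallel once task 0 has finished.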
As depicted in Figure 1, a task can be performed in two ways, i.e., offloading performing and local performing. If $t_i$ is scheduled to the remote edge server for performing, the whole process consists of three phases, i.e., task sending, edge performing, and result returning. In the first phase, the traffic volume $Iv_i$ is transmitted to the remote edge server.
Let $Rul$ denote the transmission rate of the uplink; the required transmission time $Tul_i$ of $t_i$ is defined as follows:
$$Tul_i = \frac{Iv_i}{Rul}. \tag{1}$$
Then, at the edge performing phase, the $Cy_i$ CPU cycles are performed at the corresponding service instance in the MEC server. Let $Fv$ denote the virtual clock frequency of the service instance for $t_i$; the required performing time $Tep_i$ of $t_i$ is defined as follows:
$$Tep_i = \frac{Cy_i}{Fv}. \tag{2}$$
Similar to the first phase, the returning time $Tdl_i$ at the result returning phase is defined as follows:
$$Tdl_i = \frac{Ov_i}{Rdl}, \tag{3}$$
where $Rdl$ is the transmission rate of the downlink. Furthermore, let $Tof_i$ denote the total time cost of performing $t_i$ via traffic offloading; it consists of three parts, i.e.,
$$Tof_i = Tul_i + Tep_i + Tdl_i. \tag{4}$$
If $t_i$ is performed at the local mobile device, it is unnecessary to upload and download the data to which $t_i$ corresponds; that is to say, the total time cost only depends on the local computation overhead, i.e., the consumption of CPU resources. Let $Tlo_i$ denote the total time cost of performing $t_i$ at the local mobile device, and we have
$$Tlo_i = \frac{Cy_i}{Fl}, \tag{5}$$
where $Fl$ is the local CPU's clock frequency.
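The time model above can be sketched in a few lines, assuming that each phase time is the traffic volume (or cycle count) divided by the corresponding rate (or clock frequency). The function names and all numeric values below are hypothetical and serve only to make the arithmetic concrete.

```python
def offloading_time(iv, cy, ov, rul, fv, rdl):
    """Total time Tof_i of performing a task via offloading."""
    t_ul = iv / rul  # task sending: input traffic over the uplink
    t_ep = cy / fv   # edge performing: CPU cycles on the service instance
    t_dl = ov / rdl  # result returning: output traffic over the downlink
    return t_ul + t_ep + t_dl


def local_time(cy, fl):
    """Total time Tlo_i of performing a task on the local CPU."""
    return cy / fl


# Hypothetical task: 8 Mbit input, 2e9 cycles, 2 Mbit output.
iv, cy, ov = 8e6, 2e9, 2e6
rul = rdl = 8e6      # assumed uplink/downlink rates in bit/s
fv, fl = 4e9, 1e9    # assumed server and local clock frequencies in Hz

tof = offloading_time(iv, cy, ov, rul, fv, rdl)  # 1.0 + 0.5 + 0.25
tlo = local_time(cy, fl)
print(tof, tlo)  # 1.75 2.0
```

With these assumed values, offloading is faster (1.75 s versus 2.0 s) even though it pays for two transmissions, because the server clock is four times faster than the local one.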

DRL-Based Workflow for MEC Traffic Offloading.
In this section, we describe the DRL-based workflow for MEC traffic offloading, as shown in Figure 2. The whole workflow consists of four main modules, i.e., MEC problem description, MDP construction, DNN-based strategy fitting, and PPO-based reinforcement training. The first module presents the involved problem on traffic offloading, including the local scheduling and the offloading scheduling. The second module models the scheduling process over the task sequence as an MDP, where the priority of each task is computed and used as the attribute for the conversion. In the third module, the S2S-based DNN model is used to fit the scheduling strategy. In the last module, the PPO-based DRL method is adopted to train the DNN because PPO has good stability and reliability. Furthermore, during the whole workflow, each processing unit (including CPU, DTU, virtual CPU, and virtual DTU) performs or sends only one task at a time, which means that preemption among multiple tasks is not allowed. In addition, the DAG-based task scheduling in MEC satisfies the following two features. (1) Given the bandwidth limitation of edge network, the transmission rate

Problem Description
First, we define four completion timestamps, i.e., for task sending, edge performing, result returning, and local performing, denoted by $Tsos_i$, $Tsop_i$, $Tsor_i$, and $Tslp_i$, respectively. If $t_i$ is performed at the local device, we have $Tsos_i = Tsop_i = Tsor_i = 0$; otherwise, $Tslp_i = 0$. In particular, before $t_i$ is scheduled, all precursor tasks of $t_i$ have to be performed in advance.
Consider the case where $t_i$ is performed at the local mobile device. Let $RTslp_i$ be the ready timestamp for scheduling $t_i$, and we have
$$RTslp_i = \max_{t_j \in pre_i} \max\{Tslp_j, Tsor_j\}, \tag{6}$$
which indicates that $RTslp_i$ is the earliest timestamp at which all precursor tasks are completed; $pre_i$ is the set of precursor tasks of $t_i$, and $t_j$ is one precursor task of $t_i$. In particular, a task may need to queue before being performed at the CPU; thus, the starting timestamp may not be equal to the ready timestamp. Let $STslp_i$ denote the starting timestamp for scheduling $t_i$, and we have $STslp_i \ge RTslp_i$, satisfying
$$Tslp_i = STslp_i + Tlo_i. \tag{7}$$
Consider the case where $t_i$ is scheduled to the remote edge server for performing. Let $RTsos_i$ denote the ready timestamp for sending $t_i$, and we have
$$RTsos_i = \max_{t_j \in pre_i} \max\{Tslp_j, Tsos_j\}, \tag{8}$$
where all precursor tasks of $t_i$ are performed at the local mobile device or the remote server. Similarly, let $STsos_i$ denote the starting timestamp when $t_i$ is sent, and we have $STsos_i \ge RTsos_i$ and $Tsos_i = STsos_i + Tul_i$. Then, let $RTsop_i$ denote the ready timestamp for performing $t_i$ via the service instance, and we have
$$RTsop_i = \max\Big\{Tsos_i, \max_{t_j \in pre_i} Tsop_j\Big\}. \tag{9}$$
Among them, $RTsop_i$ depends on the fact that the input traffic of $t_i$ is completed; i.e., the transmission of $Iv_i$ is [...]
$$TTotal(G) = \max_{t_i \in exit(G)} \max\{Tslp_i, Tsor_i\}, \tag{10}$$
where $exit(G)$ is the set of exit tasks in $G$. This paper considers the delay-sensitive FIFA World Cup traffic; therefore, the main purpose is to maximize the QoE while minimizing $TTotal(G)$ over different scheduling strategies. In particular, during the process of scheduling, the execution way and the scheduling order of each task should be determined. In addition, we emphasize that the scheduling order of tasks at the local mobile device and that at the service instance are kept consistent.
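To make the timestamp bookkeeping concrete, the following sketch simulates one schedule and returns the completion time of the whole DAG. It is only one plausible reading of the description above, under these assumptions: each processing unit handles one task at a time in scheduling order; a local task is ready once every precursor has finished locally or has returned its result; a task to be sent is ready once every precursor has finished locally or has itself been sent; and edge performing starts once the input has been uploaded and the offloaded precursors have been executed. The function and variable names, and the three-task example, are hypothetical.

```python
def total_scheduling_time(order, pre, offload, t_ul, t_ep, t_dl, t_lo):
    """Simulate one schedule and return the completion time of the DAG.

    order   : task ids in scheduling order (must respect the dependences)
    pre     : dict task id -> list of precursor task ids
    offload : dict task id -> True (offload to MEC) or False (run locally)
    t_*     : dicts task id -> duration of upload, edge, download, local phase
    """
    # Time at which each processing unit becomes idle (no preemption).
    idle = {"cpu": 0.0, "uplink": 0.0, "server": 0.0, "downlink": 0.0}
    # Completion timestamps; 0.0 marks a phase the task never went through.
    f_lo, f_ul, f_ep, f_dl = {}, {}, {}, {}

    for t in order:
        if not offload[t]:
            # Ready once every precursor finished locally or returned its result.
            ready = max((max(f_lo[p], f_dl[p]) for p in pre[t]), default=0.0)
            f_lo[t] = max(ready, idle["cpu"]) + t_lo[t]
            idle["cpu"] = f_lo[t]
            f_ul[t] = f_ep[t] = f_dl[t] = 0.0
        else:
            # Task sending (assumed rule: precursors done locally or already sent).
            ready = max((max(f_lo[p], f_ul[p]) for p in pre[t]), default=0.0)
            f_ul[t] = max(ready, idle["uplink"]) + t_ul[t]
            idle["uplink"] = f_ul[t]
            # Edge performing: input uploaded and offloaded precursors executed.
            ready = max([f_ul[t]] + [f_ep[p] for p in pre[t]])
            f_ep[t] = max(ready, idle["server"]) + t_ep[t]
            idle["server"] = f_ep[t]
            # Result returning.
            f_dl[t] = max(f_ep[t], idle["downlink"]) + t_dl[t]
            idle["downlink"] = f_dl[t]
            f_lo[t] = 0.0

    has_successor = {p for t in order for p in pre[t]}
    exits = [t for t in order if t not in has_successor]
    return max(max(f_lo[t], f_dl[t]) for t in exits)


# Hypothetical three-task application: task 0 precedes tasks 1 and 2.
pre = {0: [], 1: [0], 2: [0]}
ones = {0: 1.0, 1: 1.0, 2: 1.0}   # every offloading phase takes 1 time unit
t_lo = {0: 2.0, 1: 3.0, 2: 3.0}
all_local = total_scheduling_time(
    [0, 1, 2], pre, {0: False, 1: False, 2: False}, ones, ones, ones, t_lo)
mixed = total_scheduling_time(
    [0, 1, 2], pre, {0: False, 1: False, 2: True}, ones, ones, ones, t_lo)
print(all_local, mixed)  # 8.0 5.0
```

In this example, offloading task 2 lets it run in parallel with task 1 on the local CPU, which shortens the total time from 8 to 5 time units.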

Priority Computation.
For all traffic scheduling strategies, computing the priority of each task is an indispensable operation, and the scheduling order can be determined from the computed priorities [41]. For an arbitrary $t_i$ in the DAG, its time cost can be obtained by (4) and is expressed by $Tof_i$. Based on $Tof_i$, the priority of $t_i$ is defined as follows:
$$Pr_i = Tof_i + \max_{t_j \in suc_i} Pr_j, \tag{11}$$
where $Pr_i$ is the priority of $t_i$ and $suc_i$ is the set of successor tasks of $t_i$. It is obvious that (11) has a recursive form. If $t_i$ is an exit task, we have
$$Pr_i = Tof_i. \tag{12}$$
Then, the DAG is traversed depth-first starting from the exit tasks, and the priorities of all tasks can be obtained by (11) and (12). The tasks are arranged in descending order of priority, and the scheduling sequence of tasks is defined as follows:
$$Q = (t_1, t_2, \ldots, t_n), \quad Pr_1 \ge Pr_2 \ge \cdots \ge Pr_n, \tag{13}$$
where $n$ is the number of tasks (or nodes in the DAG) and the tasks are re-indexed by their position in the sorted order. In particular, $Q$ is a special topological sorting result on $G$: since every $Tof_i$ is positive, each task has a strictly higher priority than all of its successors, so the original dependence among tasks is guaranteed by the sequential scheduling in $Q$.
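The priority computation can be sketched as follows, assuming that the recursive priority of a task is its own offloading time plus the largest priority among its successors, and that an exit task's priority is just its offloading time (an upward-rank style definition). The function names and the four-task example are hypothetical.

```python
from functools import lru_cache


def priorities(tof, suc):
    """Return dict task id -> priority.

    tof : dict task id -> offloading time cost of the task
    suc : dict task id -> list of successor task ids
    """
    @lru_cache(maxsize=None)
    def pr(i):
        # Exit tasks have no successors, so the max term defaults to 0.
        return tof[i] + max((pr(j) for j in suc[i]), default=0.0)

    return {i: pr(i) for i in tof}


def scheduling_sequence(tof, suc):
    """Sort tasks by descending priority to obtain the sequence Q."""
    p = priorities(tof, suc)
    return sorted(p, key=lambda i: -p[i])


# Hypothetical DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
tof = {0: 2.0, 1: 1.0, 2: 3.0, 3: 2.0}
suc = {0: [1, 2], 1: [3], 2: [3], 3: []}
p = priorities(tof, suc)
q = scheduling_sequence(tof, suc)
print(p)  # {0: 7.0, 1: 3.0, 2: 5.0, 3: 2.0}
print(q)  # [0, 2, 1, 3]
```

Task 2 is scheduled before task 1 because the path through it to the exit task is longer; every task still appears before its successors, so the sequence is a valid topological order.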

Construction Method.
The sequential scheduling decision process for all sorted tasks in $Q$ can be modelled as an MDP, denoted by $M = (S, A, P, D_0, R, \lambda)$, where $S$, $A$, $P$, $D_0$, $R$, and $\lambda$ are the state space, action space, state-transition matrix, probability distribution of the initial state, reward function, and discount factor, respectively.
Let $k$ denote the number of scheduled tasks in $Q$; the current scheduling condition can be expressed as follows:
$$A_k = \{a_1, a_2, \ldots, a_k\}, \tag{14}$$
where $A_k$ describes the scheduling condition (i.e., state space) on the first $k$ tasks in $Q$ and $a_k$ records the execution way of a task: $a_k = 1$ means that the $k$-th task in $Q$ is performed in the offloading way; otherwise, it is performed at the local mobile device.
Furthermore, let $Q_{1 \to i}$ denote the scheduled sequence of the first $i$ tasks in $Q$, from which we can construct a scheduled subgraph of $G$, denoted by $G_{1 \to i} = (T', L')$; here $T' \subseteq T$, $L' \subseteq L$, $G_{1 \to n} = G$, and $G_{1 \to 0} = \Phi$. Under this condition, in order to minimize the scheduling time, we give a reward function for the current task, defined as follows:
$$r_i = TTotal(G_{1 \to i-1}) - TTotal(G_{1 \to i}), \tag{15}$$
which is the difference between the total scheduling time before $a_i$ is performed and that after $a_i$ is performed in state $s_i$.
Moreover, the traffic offloading decision module in Figure 1 can be defined as a conditional probability function, denoted by $\theta(a_i \mid s_i)$. Starting from the initial state $s_0$, each time the traffic offloading decision module completes an action, the system enters a new state and gets the corresponding reward based on (15), until the last task in $Q$ is completed. According to the above statements, the whole task scheduling process based on the MDP is described as follows:
$$\mathrm{MDP} := (s_0, a_0, r_0, s_1, a_1, r_1, \ldots, s_{n-1}), \tag{16}$$
where $s_{n-1}$ is the termination state, meaning that all tasks have been completed. Then, the accumulated reward with the discount factor is defined as follows:
$$R = \sum_{i \ge 0} \lambda^{i} r_i, \tag{17}$$
where the sum runs over the rewards in (16). This indicates that maximizing the accumulated discounted reward is consistent with minimizing the total scheduling time.
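The link between the reward and the total scheduling time can be checked with a short sketch. It assumes that each reward is the total scheduling time of the scheduled subgraph before the action minus that after the action, so that with a discount factor of 1 the rewards telescope to the negative total scheduling time. The function names and the numbers are hypothetical.

```python
def rewards_from_totals(totals):
    """totals[i] is the total scheduling time after the first i tasks are
    scheduled, with totals[0] = 0 for the empty subgraph."""
    return [totals[i - 1] - totals[i] for i in range(1, len(totals))]


def accumulated_reward(rewards, lam):
    """Discounted sum r_0 + lam * r_1 + lam^2 * r_2 + ..."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + lam * ret
    return ret


# Hypothetical totals after scheduling 0, 1, 2, and 3 tasks.
totals = [0.0, 2.0, 5.0, 5.0]
rewards = rewards_from_totals(totals)
print(rewards)                           # [-2.0, -3.0, 0.0]
print(accumulated_reward(rewards, 1.0))  # -5.0, the negative final total time
print(accumulated_reward(rewards, 0.5))  # -3.5
```

The third action adds no delay (reward 0) because the task runs in parallel with earlier work; actions that do not lengthen the schedule are the ones the reward favours.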

DRL-Based Traffic Scheduling Strategy
In fact, DAGs are highly diverse and the involved state space is extremely large; thus, it is impossible to obtain the corresponding state-transition matrix in advance. Given this, this paper uses DRL to find the optimal scheduling strategy for the traffic scheduling decision module.

DNN-Based Strategy Fitting.
We employ a DNN to fit $\theta_x(a_i \mid s_i)$, where $x$ is the set of parameters of the DNN. The input of the DNN is $s_i$, which is related to $G$, but $G$ cannot be input to the DNN directly owing to the restriction of data features; therefore, the DAG is converted into a sequence of tasks (just like $Q$) to be input into the DNN. Each element of this sequence consists of the following three vectors: (1) a time vector including $Tul_i$, $Tep_i$, $Tdl_i$, and $Tlo_i$, (2) a precursor vector including all precursor indexes, and (3) a successor vector including all successor indexes. The size of the precursor/successor vector is set to a fixed value, denoted by $sz$. If the number of precursor/successor tasks is smaller than $sz$, the remaining positions are filled with $-1$; if it is larger than $sz$, the extra indexes are ignored. The output of the DNN is the probability distribution over executable actions given the current task state. In fact, the scheduling action ($a_i$) for the current task has a direct influence on the next task's state ($s_{i+1}$); therefore, this paper leverages an S2S-based DNN structure with an encoder and a decoder, as shown in Figure 3. In particular, both the encoder and the decoder are realized by a Recurrent Neural Network (RNN). The encoder receives the input sequence element by element and finally outputs its hidden layer(s) as the features of the DAG. The decoder initializes its own hidden layer(s) with the output of the encoder. In addition, the decoder sequentially takes the scheduling actions ($A_i$) as input and then outputs the corresponding $\theta_x(a_i \mid s_i)$.
The above process is repeated until $s_{n-1}$ is completed.
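The conversion of one task into a fixed-length input vector can be sketched as follows, assuming that the precursor and successor index lists are truncated to a fixed size and padded with -1 when shorter. The function name, the element ordering inside the vector, and the example values are hypothetical.

```python
def encode_task(times, precursors, successors, sz):
    """Encode one task as a flat feature list.

    times      : (Tul, Tep, Tdl, Tlo) for the task
    precursors : indexes of precursor tasks
    successors : indexes of successor tasks
    sz         : fixed size of the precursor and successor vectors
    """
    def fixed(indexes):
        kept = list(indexes)[:sz]              # ignore extra indexes
        return kept + [-1] * (sz - len(kept))  # pad missing positions with -1

    return list(times) + fixed(precursors) + fixed(successors)


# Hypothetical task with one precursor and three successors, sz = 2.
vec = encode_task((1.0, 0.5, 0.25, 2.0), [0], [3, 4, 5], sz=2)
print(vec)  # [1.0, 0.5, 0.25, 2.0, 0, -1, 3, 4]
```

Every task then maps to a vector of the same length (4 + 2 * sz entries), which is what a recurrent encoder needs when it consumes the sequence one element at a time.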

PPO-Based Reinforcement Training.
In order to obtain the optimal traffic scheduling decision, the training objective of the DRL can be defined as follows: [...] which indicates that the whole traffic scheduling decision depends on the scheduling decisions of all tasks. In this paper, we use PPO [42] to train these traffic scheduling decisions, which can accelerate convergence while guaranteeing scalability and reliability.
Since the scheduled DAGs are highly diverse and the involved state space is extremely large, it is impossible to search all DAGs. With this in mind, we can continuously collect the scheduled DAGs after strategy deployment to construct a DAG training set. Then, based on the training set, we can train the traffic scheduling strategies. In addition, in order to improve convergence, we rescale the reward obtained by each scheduling decision during training; that is, $r_i$ is always kept in $[0, 1]$.
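For readers unfamiliar with PPO, the sketch below shows the widely used clipped-surrogate form of its objective together with one possible way to keep rewards in [0, 1]. Both are assumptions made for illustration: the text does not state which PPO variant is used or how the rewards are rescaled, and min-max scaling is swapped in here only as a simple example. All names and numbers are hypothetical.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Average clipped surrogate objective (to be maximized).

    new_logp / old_logp : log-probabilities of the taken actions under the
                          current and the data-collecting policy
    advantages          : advantage estimates for those actions
    eps                 : clipping range for the probability ratio
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def scale_rewards(rewards):
    """Min-max scale rewards into [0, 1] (an assumed scaling scheme)."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        return [0.0 for _ in rewards]
    return [(r - lo) / (hi - lo) for r in rewards]


# The policy doubled the probability of an action with positive advantage;
# clipping caps the credited ratio at 1 + eps.
capped = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
scaled = scale_rewards([-3.0, -1.0, -2.0])
print(scaled)  # [0.0, 1.0, 0.5]
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual explanation for the stability attributed to PPO.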

Setup.
The proposed NDMT is implemented in C++, and the involved simulation parameters are set in Table 2. The network scale changes dynamically from $n = 20$ to $n = 60$ with a step length of 10, following the pattern of Figure 4, because this accords with the network deployment used to offload the FIFA World Cup traffic. For each scale, 2500 DAGs are used for training the DRL and 300 DAGs are used for testing it; in other words, $5 \times 2500 = 12500$ and $5 \times 300 = 1500$ DAGs are used to train and test the DRL, respectively. TensorFlow [43] is used to realize the DRL, where both the encoder and the decoder have 256 hidden neurons. In addition, layer normalization [44] is used to improve the training efficiency.
Furthermore, the works in [35] and [36] are used as two baselines, because they are the latest representative studies on DRL-based traffic offloading in MEC.
Therein, [35] is proposed by Lu et al. while [36] is proposed by Chen et al., and in this paper they are abbreviated to ByLu and ByCh, respectively. In terms of experiments, the DRL convergence for traffic offloading is analyzed first. Then, three metrics, i.e., scheduling time, bandwidth utilization, and QoE, are used to measure the performance of the proposed NDMT.

Convergence Analysis.
This section verifies the convergence of the DRL based on 12500 DAGs, in which the average accumulated reward is used as the evaluation metric. For each training period, we record the corresponding training result. When the training within one period is finished, we input the 300 testing DAGs into the S2S-based DNN and obtain the related traffic offloading strategy. Then, the obtained traffic offloading strategy is simulated and the average accumulated reward is computed. The relationship between the average accumulated reward and the training period is shown in Figure 5. We can observe that the whole training process includes three stages, i.e., a rapidly increasing stage, a stably increasing stage, and a stationary stage, and the stationary stage is reached after 190 training periods. This indicates that the proposed NDMT converges and can then obtain the optimal traffic scheduling decision.

Scheduling Time.
The scheduling time is defined as the time difference between the time point when the first task is sent and that when the computation result of the last task is obtained by the local mobile device. The average scheduling times of NDMT, ByLu, and ByCh are shown in Figure 6.
We can observe that NDMT always has the smallest average scheduling time, followed by ByLu and ByCh, and there are two main reasons. On the one hand, NDMT considers the process of scheduling as the MDP, which can improve the processing speed for each task in the S2S-DNN structure model. On the other hand, NDMT adopts the layer normalization method to increase the training efficiency, which can further save the training time and strive for the scheduling time as small as possible. For two baselines, ByCh does not consider a priori knowledge of network dynamics to learn the optimal policy and the state space is the high dimensional; thus, it has larger average scheduling time than ByLu. In addition, we can also observe that the average scheduling time becomes larger and larger with the increasing of network scale, which results from two aspects.
On the one hand, it needs much more time to train the DAGs; on the other hand, it needs much time to compute more tasks.

Figure 5: The average accumulated reward (vertical axis) versus the training period (horizontal axis, 1 to 309).
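The scheduling time definition used in this section can be made concrete with a short sketch. The `TaskRecord` type and its field names are hypothetical and only serve the illustration; timestamps are assumed to be in seconds on a common clock.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    sent_at: float    # time the task was sent (seconds, assumed unit)
    result_at: float  # time its result reached the local mobile device


def scheduling_time(records):
    """Interval from the first task being sent to the last result arriving."""
    if not records:
        raise ValueError("at least one task record is required")
    first_sent = min(r.sent_at for r in records)
    last_result = max(r.result_at for r in records)
    return last_result - first_sent


def average_scheduling_time(applications):
    """Mean scheduling time over several applications (lists of records)."""
    return sum(scheduling_time(app) for app in applications) / len(applications)
```

For example, three tasks sent at 0.0 s, 0.2 s, and 1.0 s whose results arrive at 1.5 s, 3.0 s, and 2.5 s give a scheduling time of 3.0 s, since only the earliest send and the latest result matter.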

Bandwidth Utilization.
The bandwidth utilization is defined as the ratio of the used bandwidth to the total network bandwidth. The average bandwidth utilizations of NDMT, ByLu, and ByCh are shown in Figure 7.
We can observe that NDMT always has the highest average bandwidth utilization, followed by ByLu and ByCh, which is regarded as an important benefit of NDMT. We do not identify a concrete reason for this ordering. Furthermore, we can observe that the average bandwidth utilization of NDMT remains basically unchanged, i.e., it is highly stable across different network scales; this is because NDMT can always converge to the optimal solution (see Figure 5). This suggests that the proposed NDMT has considerable reference value for the upcoming 2022 FIFA World Cup. However, the average bandwidth utilizations of ByLu and ByCh decrease as the network scale increases because their convergence is not deterministic.
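As a small worked illustration of the metric defined in this section, the ratio and its average over several measurements could be computed as below. The function names, the Mbps unit, and the input validation are assumptions made for this sketch.

```python
def bandwidth_utilization(used_mbps, total_mbps):
    """Ratio of used bandwidth to total network bandwidth, in [0, 1]."""
    if total_mbps <= 0:
        raise ValueError("total bandwidth must be positive")
    if not 0 <= used_mbps <= total_mbps:
        raise ValueError("used bandwidth must lie between 0 and the total")
    return used_mbps / total_mbps


def average_bandwidth_utilization(samples):
    """Mean utilization over (used, total) measurement pairs."""
    ratios = [bandwidth_utilization(used, total) for used, total in samples]
    return sum(ratios) / len(ratios)
```

With two hypothetical measurements of 50 and 75 Mbps used out of 100 Mbps, the individual utilizations are 0.5 and 0.75, and their average is 0.625.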

QoE of User.
We use the watching fluency to measure the QoE of user, where the watching fluency is defined as the number of network lags per 10 min. The average numbers of network lags of NDMT, ByLu, and ByCh are shown in Figure 8.
We can observe that NDMT always has the smallest average number of network lags, followed by ByCh and ByLu, which indicates that the user has the best watching experience with NDMT, because NDMT has the highest bandwidth utilization and the smallest response time. As for the two baselines, ByCh deploys a large number of base stations to offload traffic, so its response time is smaller than that of ByLu; as a result, ByCh provides a better QoE of user than ByLu. In addition, we can also observe that NDMT has the most stable QoE of user whereas the two baselines do not, for reasons similar to those given in the above section. Moreover, the experimental results suggest that users can enjoy the best experience when watching the upcoming 2022 FIFA World Cup on personal mobile devices in the MEC environment.
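The watching fluency metric can be normalized from an arbitrary viewing session as sketched below. How a network lag is detected is not specified here, so the sketch assumes the lag events are already available as a list of timestamps; the function name and the use of seconds are likewise assumptions.

```python
def lags_per_10_min(lag_timestamps, session_seconds):
    """Number of network lags normalized to a 10-minute (600 s) window."""
    if session_seconds <= 0:
        raise ValueError("session length must be positive")
    return len(lag_timestamps) * 600.0 / session_seconds
```

For instance, four lags observed during a 20-minute (1200 s) session correspond to 2.0 lags per 10 min.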

Conclusions
In this paper, we investigate the MEC traffic offloading strategy based on DRL. At first, we introduce the proposed system framework, including the MEC-based traffic offloading architecture and the DRL-based workflow for MEC traffic offloading. Then, we give the problem description, i.e., minimizing the total execution time of one application program. The concrete scheduling strategy includes three parts. In the first part, we compute the priorities of tasks and transform the DAG into a sequence of tasks according to the computed priorities, and the scheduling process over the task sequence is regarded as an MDP. In the second part, an S2S-based DNN model is used to fit the scheduling strategy. In the third part, the PPO method is employed to train the DNN. We carry out simulation experiments based on TensorFlow, including the convergence analysis and the performance comparison. The scheduling time, bandwidth utilization, and QoE of user are used as the three performance evaluation metrics, and we observe that the proposed NDMT outperforms the two baselines. Based on these experimental results, we think that the proposed NDMT can be regarded as a feasible and efficient reference for the upcoming 2022 FIFA World Cup held in Qatar. In the future, we plan to improve NDMT in two respects. First, more datasets will be collected and used to train the DNN; second, NDMT will be deployed in a real network environment to further verify its performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.