Reinforcement Learning for Security-Aware Workflow Application Scheduling in Mobile Edge Computing

Mobile edge computing, as a novel computing paradigm, brings remote cloud resources to edge servers near mobile users. Within the one-hop communication range of mobile users, a number of edge servers equipped with abundant computation and storage resources are deployed. Mobile users can offload some or all of the computation tasks of a workflow application to the edge servers, thereby significantly reducing the completion time of the workflow application. However, due to the open nature of the mobile edge computing environment, the tasks offloaded to the edge servers are susceptible to being intentionally overheard or tampered with by malicious attackers. In addition, the edge computing environment is dynamic and time-variant, so the existing quasistatic workflow application scheduling schemes cannot be applied to the workflow scheduling problem in dynamic mobile edge computing with malicious attacks. To address these two problems, this paper formulates the workflow scheduling problem with a risk probability constraint in the dynamic edge computing environment with malicious attacks as a Markov Decision Process (MDP). To solve this problem, this paper designs a reinforcement learning-based security-aware workflow scheduling (SAWS) scheme. To demonstrate the effectiveness of the proposed SAWS scheme, this paper compares SAWS with the MSAWS, AWM, Greedy, and HEFT baseline algorithms under different performance parameters, including risk probability, security service, and risk coefficient. The extensive experimental results show that, compared with the four baseline algorithms on workflows of different scales, the SAWS strategy achieves better execution efficiency while satisfying the risk probability constraints.


Introduction
In recent years, with the explosive growth of smart devices (such as smart cameras, smart glasses, smart bracelets, and smart phones), a large number of advanced mobile applications (such as real-time navigation systems, interactive online games, virtual reality, and augmented reality) have emerged rapidly. To process these mobile applications efficiently, mobile devices need to be equipped with abundant computing resources and battery capacity [1,2]. However, due to their limited size, mobile devices are usually resource-constrained. Therefore, the conflict between the ever-increasing resource requirements of mobile applications and the limited resource capabilities of mobile devices makes executing these mobile applications a great challenge.
Mobile Edge Computing (MEC) as a new computing paradigm brings remote cloud resource to the edge servers nearby mobile users, enabling mobile users to offload partial or all computation tasks of mobile applications to edge servers for collaborative execution, and thereby greatly alleviating the conflict between resource supply and demand, effectively reducing the application completion time and the mobile devices' energy consumption [3][4][5].
Many mobile applications are typical workflow models, consisting of a sequence of precedence-constrained tasks. For example, a video streaming-based face recognition application mainly consists of motion detection and face recognition. The face recognition further consists of face detection, image preprocessing, feature extraction, and classification [3,6]. In mobile edge computing, workflow application scheduling has a higher complexity than independent task scheduling [7][8][9]. In addition, workflow application scheduling in mobile edge computing faces two challenges. One is the dynamics of the edge environment, such as the time-varying channel quality and workload of edge servers, which can impact the workflow application scheduling decision. The other is the security problem of workflow application scheduling. Due to the open nature of the edge environment, the edge servers, which aggregate a large amount of user data, frequently suffer from malicious attacks such as data leakage and tampering, which pose a serious threat to the successful execution of the offloaded tasks [10][11][12][13]. Hence, various types of security services need to be employed to defend effectively against hostile attacks and protect these offloaded tasks. However, employing security services inevitably incurs additional security overhead, which increases the completion time of the workflow application. Therefore, it is a big challenge to design an efficient security-aware workflow scheduling scheme that reduces the completion time of the workflow application while satisfying its security requirement.
To meet the aforementioned challenges, this paper formulates the security-aware workflow scheduling problem in MEC as a Markov Decision Process (MDP) [14]. The environment state, which consists of the task list on each edge server, the workloads on each edge server, and the channel states between the mobile device and the edge servers, can be observed. Based on the environment state, the task nodes of the workflow are dynamically scheduled to edge servers. Deep reinforcement learning is suitable for solving decision-making problems with unknown prior knowledge [15][16][17][18][19]. To solve this problem, this paper proposes a deep reinforcement learning-based security-aware workflow scheduling scheme (SAWS). Its main objective is to optimize the completion time of the workflow while satisfying its security requirement. To evaluate the effectiveness of the SAWS scheme, this paper implements the average workload minimization (AWM), maximum SAWS (MSAWS), Greedy, and HEFT baseline algorithms. We compare the SAWS scheme with these four baseline algorithms under different risk probabilities, different security services, different risk coefficients, different computing capacities of the edge servers, and different numbers of edge servers. The experimental results demonstrate that the SAWS strategy can optimize the completion time of the workflow application while satisfying the risk probability constraint. The main contributions of this paper can be summarized as follows:
(1) This paper focuses on the security problem of workflow scheduling in dynamic edge computing, which is more complex than independent task scheduling.
(2) This paper formulates the security-aware workflow scheduling problem in mobile edge computing as a finite Markov decision process, whose main objective is to minimize the completion time of the workflow while satisfying the risk probability constraint.
(3) This paper proposes a deep Q-network-based security-aware workflow scheduling (SAWS) scheme to solve the workflow scheduling problem in a dynamic edge computing environment with malicious attacks.
(4) Extensive experimental results demonstrate that the SAWS scheme can greatly reduce the completion time of the workflow application while satisfying the risk probability constraint.
The rest of this paper is organized as follows. In Section 2, the related work is summarized. In Section 3, the system model and problem formulation for security-aware workflow scheduling in MEC are presented. In Section 4, the deep reinforcement learning-based security-aware workflow scheduling scheme is described in detail. In Section 5, the simulation parameters are set, and the experimental performance is analyzed. In Section 6, the work of this paper is concluded.

Related Work
The task offloading problem in MEC has been studied in many works. According to their optimization goals, these works can be classified into three categories. The first is task offloading with the goal of optimizing the mobile device's energy consumption. For example, Huang et al. [7] propose a security and cost-aware task offloading scheme based on deep reinforcement learning for task offloading in single-user multiserver scenarios. Its main goal is to minimize the task processing delay and the mobile device's energy consumption while satisfying the security requirement of the task. Chen et al. [20] formulate the task offloading problem in a single-user single-server scenario as a stochastic optimization problem and decompose it into two deterministic optimization subproblems. To solve these two subproblems, a TOFEE algorithm is proposed to optimize the mobile device's energy consumption. Wu et al. [21] propose a Lyapunov optimization-based energy-efficient task offloading scheme to determine the operating position of the application, the objective of which is to minimize the average energy consumption of mobile devices while satisfying the average response time constraint. The second is task offloading with the goal of optimizing the task processing delay. For example, Chalapathi et al. [22] propose a task scheduling scheme to solve the task offloading problem in multiple cloudlets, aiming to minimize the task processing delay. Xu et al. [23] design an adaptive task offloading scheme, which leverages decomposition-based multiobjective evolutionary algorithms to generate feasible solutions, to optimize the task processing latency and the resource utilization of the edge system. The third is task offloading with the goal of optimizing the weighted sum of the mobile device's energy consumption and the task processing delay. Wu et al. [24] propose a Lyapunov optimization-based energy-efficient task offloading scheme to control the computational and communication overheads and further choose the optimal computational location for the application to minimize energy consumption and task processing time. However, all of the above works mainly focus on independent task scheduling in MEC. Because the task nodes of a workflow are precedence-constrained, the above schemes are not suitable for workflow scheduling.
To further study the workflow scheduling problem in MEC, Xu et al. [25] construct a multiresource energy consumption model to solve the unity problem of the traditional energy consumption model and propose a particle swarm algorithm-based energy-efficient multiresource workflow scheduling algorithm. Its main objective is to reduce the energy consumption of mobile devices while satisfying the completion time constraint of the workflow. Wu et al. [26] construct a weighted resource sum graph based on resource consumption and further design a novel cost-efficient partitioning scheme, the objective of which is to find the optimal partitioning that reduces execution time and energy consumption. Zhu et al. [27] formulate the workflow scheduling problem in MEC as a joint optimization problem of energy consumption and time delay and adopt the deep Q network algorithm to find the optimal scheduling scheme. However, the execution order of the workflow is assumed in advance, and how to calculate the execution order of a workflow with precedence constraints is not introduced. In addition, that work does not address the security problem of workflow scheduling in MEC. Liu [28] proposes a novel maximum probability function and deep Q network-based multiworkflow scheduling scheme to solve the scheduling problem in a multiuser edge computing environment, which can find a high-quality workflow schedule in a dynamic environment. However, that work also does not address the security problem of workflow scheduling in dynamic MEC. Therefore, none of the above scheduling schemes is suitable for security-aware workflow scheduling in dynamic mobile edge computing.
With the escalation of data security threats in mobile edge computing [10-12, 29, 30], many related works have taken measures to protect security-critical applications and the large amount of data generated on mobile devices from malicious attacks. Huang [6] designs a workflow scheduling scheme based on genetic algorithms to minimize the mobile device's energy consumption under the workflow completion time and risk probability constraints. Elgendy et al. [11] design a multidevice and single-server cooperative task offloading scheme to solve the security-aware multiuser resource allocation and task offloading problem. The goal is to minimize the time delay and energy consumption of the whole system. Jia et al. [31] design an identity-based anonymous authentication key agreement protocol to ensure the security of sensitive data in MEC. He et al. [32] design a security mechanism based on adaptive algorithms to solve the security problem of IoT applications in mobile edge computing. Chen et al. [33] propose a deep learning-based method for detecting malicious applications on mobile devices, which greatly improves the security of mobile edge computing. Xu et al. [34] design a secure service offloading approach to promote Internet of vehicles service utility and edge utility while ensuring privacy security in software-defined network-enabled edge computing. Xu et al. [35] adopt a location-sensitive-hash (LSH) method to encrypt the feature information of the offloaded services and further design an LSH-based offloading scheme, the goal of which is to minimize the energy consumption and response time of all services while guaranteeing service security. All of the above studies design security strategies from different perspectives to ensure the security of edge computing, but they do not address the security problem of workflow scheduling in dynamic edge computing with unknown prior knowledge. To address this problem, this paper focuses on the security-aware workflow scheduling problem in a dynamic mobile edge computing environment with security threats.

System Model and Problem Formulation
In this section, we first introduce the mobile edge computing model, security cost model, communication model, and risk probability model in the mobile edge computing environment, respectively, and then describe the security-aware workflow scheduling problem in detail.

Mobile Edge Computing Model.
As illustrated in Figure 1, we consider a mobile edge computing system, which consists of a mobile device $U$ and $n$ edge servers $eNB = \{eNB_1, \ldots, eNB_i, \ldots, eNB_n\}$. The mobile device $U$ can be denoted by a two-tuple $U = \langle f_u, N_u \rangle$, where $f_u$ denotes the CPU frequency of the mobile device, and $N_u$ denotes the number of CPU cores of the mobile device. Due to the limited computing resources and battery capacity of the mobile device, the workflow applications (such as a video streaming-based face recognition application) running on the mobile device can be scheduled to edge servers through the wireless network. Each edge server can be denoted by a two-tuple $eNB_i = \langle f_{c,i}, N_{c,i} \rangle$, where $f_{c,i}$ denotes the CPU frequency of the $i$th edge server, and $N_{c,i}$ denotes the number of CPU cores of the $i$th edge server. Each edge server has an execution queue $Q_{c,i}$ that is used to store the tasks scheduled to the $i$th edge server.
Each mobile application can be abstracted into a workflow model, which can be denoted by a directed acyclic graph (DAG) $G = \langle V, E \rangle$, in which $V = \{v_1, \ldots, v_k, \ldots, v_K\}$ denotes a set of task nodes, and $E = \{e_{kl} \mid v_k \in V, v_l \in V\}$ denotes a set of edges between task nodes. Each task node $v_k$ can be characterized by a three-tuple $v_k = \langle W_k, D^{tx}_k, D^{rx}_k \rangle$, in which $W_k$ denotes the workload (CPU cycles) of task node $v_k$, $D^{tx}_k$ denotes the input data size (MB) of task node $v_k$, and $D^{rx}_k$ denotes the output data size (MB) of task node $v_k$. The edge $e_{kl}$ represents the precedence constraint between task nodes $v_k$ and $v_l$. This means that task $v_l$ can be executed only after task $v_k$ has been executed. The system time is logically divided into time slots of equal length, and the time slot duration is $T_{slot}$. The time slots are indexed by $\tau$. At the beginning of each time slot, a task node in the workflow is scheduled to an edge server.

Security Cost Model.
The task nodes scheduled to edge servers are vulnerable to the security threats of data stealing and tampering. To guard against these security threats, these task nodes need to employ an encryption service cf and an integrity service ig [36][37][38], respectively. Following the literature [38], the encryption services cf mainly include the IDEA, DES, Blowfish, AES, and RC4 algorithms. Each encryption algorithm has its own security level and encryption speed, which can be found in Table 1.
Encryption algorithms with different security levels can be flexibly selected to protect data from being stolen. The integrity services ig mainly include the TIGER, RipeMD160, SHA-1, RipeMD128, and MD5 hash functions. Each hash function has its own security level and hash speed, which can be found in Table 2. Hash algorithms with different security levels can be flexibly selected to protect data from being tampered with. By flexibly selecting encryption and hash algorithms with different security levels, an integrated security protection is formed to guard against security threats.
To ensure the security of task nodes scheduled to edge servers, the integrated security protection consisting of encryption and hash algorithms with different security levels needs to be employed. However, different security protections lead to different security costs. When the task node in the workflow is scheduled to the $i$th edge server, the total encryption cost on the mobile device can be calculated as in [6], where $\varphi = 2.2$. When the task node is scheduled to the $i$th edge server, sl

Communication Model.
Due to the user's mobility, the channel state between the mobile device and the different edge servers changes dynamically. We assume that the channel state between the mobile device and the edge servers is constant within each time slot $\tau$ and changes across different time slots. In each time slot $\tau$, the transmission rate $R^{u}_{c,i}(\tau)$ between the mobile device and the $i$th edge server is calculated from the following quantities: $B_{c,i}$ denotes the transmission bandwidth between the mobile device and the $i$th edge server, $P_u$ denotes the transmission power of the mobile device, $G^{u}_{c,i}$ denotes the wireless channel gain between the mobile device and the $i$th edge server, and $\sigma^2$ denotes the Gaussian white noise power.
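The rate expression itself is not reproduced in the text above. As a minimal sketch, assuming the standard Shannon-capacity form $R = B \log_2(1 + P G / \sigma^2)$ built from the four quantities just defined, the rate could be computed as follows. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10 ** ((dbm - 30) / 10)


def transmission_rate(bandwidth_hz: float, tx_power_w: float,
                      channel_gain: float, noise_w: float) -> float:
    """Assumed Shannon form: R = B * log2(1 + P * G / sigma^2), in bit/s."""
    snr = tx_power_w * channel_gain / noise_w
    return bandwidth_hz * math.log2(1 + snr)


if __name__ == "__main__":
    # Illustrative values only: 1 MHz bandwidth and a signal-to-noise ratio of 3.
    print(transmission_rate(1e6, 1.0, 3e-9, 1e-9))
```

Because the channel gain is assumed constant within a slot, this function would be evaluated once per time slot and per edge server.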

Risk Probability Model.
To measure the risk degree of the task nodes scheduled to edge servers, it is necessary to establish a risk probability model that quantifies the risk probability of these tasks. Without loss of generality, following the literature [36][37][38], the malicious attacks of data leakage and data tampering on the $i$th edge server are assumed to follow Poisson distributions with parameters $\lambda^{cf}_i$ and $\lambda^{ig}_i$, respectively. Therefore, when the task node $v_k$ in the workflow is scheduled to the edge server $eNB_i$, the risk probability of data leakage or data tampering can be calculated as in [6,38] (equation (4)). Based on the above description, when the task $v_k$ is scheduled to the edge server $eNB_i$, the risk probability of the task $v_k$ suffering from these two malicious attacks is obtained by combining the two individual risk probabilities. The risk probability of each task $v_k$ scheduled to an edge server must not exceed $P_{\max}$, which is the risk constraint on task execution.

Problem Formulation.
In this section, we formulate the security-aware workflow scheduling problem in mobile edge computing as a Markov Decision Process. We first introduce the sorting strategy of the workflow nodes and then define the state space, action space, and reward function of this problem. Finally, the objective function and constraints of this problem are defined.
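Returning to the risk probability model described above, the following sketch shows one way the model could be evaluated. It assumes the exponential form $1 - e^{-\lambda (1 - sl)}$ commonly used in the security-aware scheduling literature for Poisson-distributed attacks, and it assumes that the two attack types are independent when they are combined. Both are assumptions, since the equations are not reproduced in the text, and all function names are hypothetical. The candidate security levels are the integrity-service levels listed in this paper.

```python
import math

# Integrity-service security levels listed in the paper, lowest first.
INTEGRITY_LEVELS = [0.44, 0.63, 0.69, 0.75, 1.0]


def risk_probability(rate: float, security_level: float) -> float:
    """Assumed form: P = 1 - exp(-lambda * (1 - sl))."""
    return 1.0 - math.exp(-rate * (1.0 - security_level))


def combined_risk(p_cf: float, p_ig: float) -> float:
    """Risk of suffering at least one attack, assuming independent attacks."""
    return 1.0 - (1.0 - p_cf) * (1.0 - p_ig)


def lowest_feasible_level(rate: float, p_max: float, levels=INTEGRITY_LEVELS):
    """Pick the lowest level whose single-service risk stays within p_max."""
    for level in sorted(levels):
        if risk_probability(rate, level) <= p_max:
            return level
    return None  # no level satisfies the constraint


if __name__ == "__main__":
    print(lowest_feasible_level(rate=1.0, p_max=0.4))
```

The last helper is a simplification: it checks one service in isolation, whereas the constraint in the paper applies to the combined risk of both services.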

Sorting of Workflow Nodes.
To sort all the task nodes in the workflow, we assign a weight $Pr(v_k)$ to each task node $v_k$ [39]. The value of $Pr(v_k)$ can be calculated by equation (7), where $ET(v_k)$ denotes the average time of the task node $v_k$ executing on all edge servers; $R_{kl}$ denotes the transmission rate between the edge servers where the task node $v_k$ and its successor node $v_l$ are located; and $succ(v_k)$ denotes the set of all successor nodes of the task node $v_k$. Since the edge server to which each task node $v_k$ is scheduled is not known in advance, the priority of the task node is calculated from the average time of the task node $v_k$ executing on all edge servers. The priorities of all task nodes in the workflow can be calculated by equation (7). According to these priorities, the task nodes are sorted in descending order.
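Equation (7) is not reproduced in the text above. The sketch below assumes a HEFT-style upward rank, in which the priority of a node is its average execution time plus the largest value of (communication time to a successor + priority of that successor), and the priority of an exit node is its average execution time alone. This recursive form and the names `priorities` and `schedule_order` are assumptions for illustration.

```python
from functools import lru_cache


def priorities(avg_exec: dict, comm_time: dict, succ: dict) -> dict:
    """Upward-rank style priority for every node of a DAG.

    avg_exec[k]       average execution time ET(v_k) over all edge servers
    comm_time[(k, l)] assumed transfer time from v_k to its successor v_l
    succ[k]           successors of v_k (missing key means exit node)
    """
    @lru_cache(maxsize=None)
    def rank(k):
        tail = max((comm_time.get((k, l), 0.0) + rank(l)
                    for l in succ.get(k, ())), default=0.0)
        return avg_exec[k] + tail

    return {k: rank(k) for k in avg_exec}


def schedule_order(pr: dict) -> list:
    """Sort task nodes by priority in descending order."""
    return sorted(pr, key=pr.get, reverse=True)


if __name__ == "__main__":
    avg_exec = {1: 2.0, 2: 3.0, 3: 1.0, 4: 2.0}
    comm = {(1, 2): 1.0, (1, 3): 1.0, (2, 4): 1.0, (3, 4): 2.0}
    succ = {1: (2, 3), 2: (4,), 3: (4,)}
    print(schedule_order(priorities(avg_exec, comm, succ)))
```

With positive execution times, a node always ranks above its successors, so the descending order respects the precedence constraints of the workflow.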

State Space.
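In each time slot $\tau$, the sorted task nodes are scheduled in turn. The edge server to which each task node $v_k$ is scheduled depends on the system state. The system state $s(\tau)$ in time slot $\tau$ consists of the task list on each edge server, the workload on each edge server, and the channel states between the mobile device and the edge servers.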

Action Space.
In each time slot $\tau$, the system action $a(\tau)$ can be denoted by $a(\tau) = (a_{c,1}(\tau), \ldots, a_{c,i}(\tau), \ldots, a_{c,n}(\tau))$, an $n$-dimensional vector denoting the edge server to which the current task node is scheduled. Specifically, $a_{c,i}(\tau)$ denotes whether the current task node is scheduled to the edge server $eNB_i$. If the value of $a_{c,i}(\tau)$ is 1, the current task node is scheduled to the edge server $eNB_i$; otherwise, it is not. Note that, in each time slot $\tau$, the current task node can only be scheduled to a single edge server. Therefore, the system action needs to meet the constraint condition $\sum_{i=1}^{n} a_{c,i}(\tau) = 1$. The vector $(sl^{ig}_{c,1}(\tau), \ldots, sl^{ig}_{c,n}(\tau))$ denotes the security levels of the integrity service employed by the task nodes scheduled to the $n$ edge servers, where $sl^{ig}_{c,i}(\tau) \in \{0, 1.0, 0.75, 0.69, 0.63, 0.44\}$ denotes the security level of the integrity service employed by the task node scheduled to the $i$th edge server.

Reward Function.
In each time slot $\tau$, given the system state $s(\tau)$, the system obtains the immediate reward $R(\tau)$ after taking an action $a(\tau)$. The immediate reward $R(\tau)$ is defined in terms of the following quantities: $M(a(\tau)) = v_k$ denotes that, in time slot $\tau$, the task node scheduled by taking the action $a(\tau)$ is $v_k$; $\max T_{end}(M(a(\tau)))$ denotes the execution delay of the workflow up to the $\tau$th time slot; and $R(\tau)$ denotes the increment of the workflow execution delay after scheduling the task in time slot $\tau$. When the task node $v_k$ is scheduled to the edge server $eNB_i$, its completion time $T_{end}(v_k)$ needs to be calculated. To calculate $T_{end}(v_k)$, it is necessary to calculate the start time $T_{start}(v_k)$ of the task node $v_k$, the encryption time $T^{E}_{c,i}$ of the task node $v_k$ on the mobile device, the transmission time $T^{trans}_{c,i}$ of the task node $v_k$ from the mobile device to the edge server $eNB_i$, the waiting time $T^{wait}_{c,i}$ of the task node $v_k$ on the edge server $eNB_i$, the decryption time $T^{DE}_{c,i}$ of the task node $v_k$ on the edge server $eNB_i$, and the execution time $T^{exec}_{c,i}$ of the task node $v_k$ on the edge server $eNB_i$. In general, a task node $v_k$ may have multiple predecessor nodes. Therefore, to calculate the start time $T_{start}(v_k)$ of task node $v_k$, it is necessary to calculate the maximum, over all predecessor nodes $v_h$ of the task node $v_k$, of the sum of the completion time $T_{end}(v_h)$ and the transmission time $T^{tr}_{h,k}$. $T_{start}(v_k)$ is calculated by equation (11), and $T_{end}(v_k)$ is then obtained from $T_{start}(v_k)$ and the component times listed above (equation (12)):
$T_{start}(v_k) = \max_{v_h \in pre(v_k)} \left( T_{end}(v_h) + T^{tr}_{h,k} \right)$, (11)
where $pre(v_k)$ denotes the set of all predecessor nodes of the task node $v_k$; $v_h$ is a predecessor node of $v_k$; $T_{end}(v_h)$ is the completion time of the task node $v_h$; and $T^{tr}_{h,k}$ is the transmission time between the scheduled node $v_k$ and its predecessor node $v_h$.
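The timing quantities above can be sketched in a few lines. The start time follows the rule stated in the text (the maximum over predecessors of completion time plus transfer time). The completion time is assumed here to be the start time plus the sum of the encryption, transmission, waiting, decryption, and execution times, and the reward is assumed to be the negative increment of the workflow execution delay, so that maximizing the reward shortens the schedule. The last two forms are assumptions, and the function names are hypothetical.

```python
def start_time(k, pre: dict, t_end: dict, t_tr: dict) -> float:
    """Earliest start of v_k: max over predecessors of T_end(v_h) + T_tr(h, k)."""
    return max((t_end[h] + t_tr.get((h, k), 0.0) for h in pre.get(k, ())),
               default=0.0)  # entry nodes start at time 0


def end_time(t_start: float, t_enc: float, t_trans: float,
             t_wait: float, t_dec: float, t_exec: float) -> float:
    """Assumed completion time: start time plus all component delays."""
    return t_start + t_enc + t_trans + t_wait + t_dec + t_exec


def reward(makespan_before: float, makespan_after: float) -> float:
    """Assumed reward: negative increment of the workflow execution delay."""
    return makespan_before - makespan_after


if __name__ == "__main__":
    pre = {3: (1, 2)}
    t_end = {1: 4.0, 2: 5.0}
    t_tr = {(1, 3): 2.0, (2, 3): 0.5}
    ts = start_time(3, pre, t_end, t_tr)
    te = end_time(ts, 0.5, 1.0, 0.0, 0.5, 2.0)
    print(ts, te, reward(max(t_end.values()), max(te, *t_end.values())))
```

In this example the predecessor that finishes earlier (task 1) still determines the start time, because its output takes longer to transfer.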
When the task nodes are scheduled to different edge servers, they are exposed to different risk probabilities and therefore incur different start times and completion times. Therefore, this paper needs to find an optimal scheduling strategy $\pi^{*}$ in a dynamic MEC with security threats, the main goal of which is to minimize the completion time of the workflow while satisfying the risk probability constraint of the task nodes.
Maximize: $R(\tau)$, (13)
Subject to: the risk probability constraint of each task node. (14)
The objective of this paper is denoted by equation (13). The risk probability constraint of the task node is denoted by equation (14).
Due to the fact that the MEC environment is dynamical, and its state change is unknown (such as the gain state of the wireless channel), it is difficult for traditional optimization methods to solve the security-aware workflow scheduling problem in a dynamic MEC with security threats. However, the deep reinforcement learning algorithm, as a model-free machine learning approach, is good at solving such dynamic stochastic optimization problems. In the next section, the deep reinforcement learning-based security-aware workflow scheduling scheme is introduced in detail.

Deep Reinforcement Learning-Based Security-Aware Workflow Scheduling Scheme
The security-aware workflow scheduling problem in a dynamic MEC with security threats is formulated as a finite Markov Decision Process. The action space of this problem is discrete. To find the optimal workflow scheduling scheme, this paper proposes a SAWS scheme based on the deep Q network (DQN).
As shown in Figure 2, the DQN framework consists of three main functional components: (1) The evaluated Q network: the evaluated Q network consists of one input layer, one hidden layer, and one output layer. The number of neurons in the input layer is equal to the number of dimensions of the state, the number of neurons in the hidden layer is set to 2048 in this paper, and the number of neurons in the output layer is equal to the number of dimensions of the action. (2) The target Q network: the structure of the target Q network is the same as that of the evaluated Q network. To continuously approach the Q function, the parameters of the target Q network are periodically updated with the parameters of the evaluated Q network. (3) The replay memory: the replay memory stores the state transition experiences $\langle s(\tau), a(\tau), R(s(\tau), a(\tau)), s(\tau + 1) \rangle$. A minibatch of state transition experiences is randomly chosen from the replay memory to train the Q network in the direction of minimizing a sequence of loss functions. The detailed process of the deep Q-network-based SAWS scheme is described in Algorithm 1.
During the training stage, the system state s(τ) in each time slot τ is first observed and fed into the evaluated Q network.
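As a rough illustration of the components just described, the sketch below shows a replay memory with uniform minibatch sampling, an epsilon-greedy action choice over the Q values, and the temporal-difference target computed from the target Q network's output. It follows the generic DQN recipe rather than Algorithm 1 of this paper. In particular, epsilon-greedy exploration and the `done` flag are assumptions, and every name is hypothetical. The neural networks themselves are omitted, so the Q values are passed in as plain lists.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        """Draw a minibatch uniformly at random, without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon: float, rng=random) -> int:
    """Explore with probability epsilon, otherwise take the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward: float, next_q_values, gamma: float, done: bool) -> float:
    """Target y = r + gamma * max_a' Q_target(s', a'), or r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3000)
    memory.push((0.0,), 1, -2.5, (1.0,), False)
    print(len(memory), td_target(-2.5, [0.5, 2.0], gamma=0.9, done=False))
```

In a full training loop, the evaluated Q network would be fitted to these targets on each sampled minibatch, and its parameters would be copied to the target network at fixed intervals.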

Security and Communication Networks
During the testing stage, the system state is first reset, and the learned network parameters are loaded. Then, at the beginning of each time slot, the current system state $s(\tau)$ is observed and fed into the trained neural network. Next, the neural network selects an optimal action $a(\tau)$ for the system state $s(\tau)$, and the corresponding reward is calculated.

Experimental Evaluation
To demonstrate the effectiveness of the SAWS scheme proposed in this paper, a series of comparative experiments is conducted. In this section, the simulation parameters are first set. Then, the MSAWS, AWM, Greedy, and HEFT baseline algorithms are introduced. Finally, the performance of the SAWS scheme is compared with that of these four baseline algorithms under different simulation parameters.

Parameter Settings.
This paper mainly considers a mobile edge computing system consisting of a mobile user $U$ and $n$ edge servers. Different workflow applications generated on the mobile device need to be scheduled in a dynamic MEC with security threats. Following the literature [6,7], the detailed parameter settings of the experiments are as follows:
(1) The parameter settings of the mobile device: the CPU frequency $f_u$ and the CPU core number $N_u$ of the mobile device are set to $f_u = 2.5$ GHz and $N_u = 4$, respectively.
(2) The parameter settings of edge servers: the number of edge servers is set to $n = 4$. The CPU frequencies of five edge servers are set to $f_{c,1} = \cdots = f_{c,5} = 2.5$ GHz. The numbers of CPU cores are $N_{c,1} = 6$, $N_{c,2} = 7$, $N_{c,3} = 8$, and $N_{c,4} = 9$, respectively. The risk coefficients of confidentiality service for these five edge servers are λ
(3) The communication parameter settings: the transmission power of each edge server is $P_{c,i} = 40$ W, the maximum bandwidth is $B_{c,i} = 100$ MHz, the white Gaussian noise power is $\sigma^2 = -174$ dBm/Hz, the path loss constant is $z = 2$, the path loss exponent is $\theta = 4$, and the reference distance is $d_o = 1$ m [6,7]. The distance between the mobile device and each edge server is $d_i \in (0, 350]$ m.
(4) The parameter settings of the workflow: the number of task nodes in the different workflows is set to 50, 100, and 150, respectively. The out-degree or in-degree of each intermediate task node is less than 5, and every two task nodes are connected with 10% probability to form an edge. The workload $W_k$ of each task node $v_k$ is in the range of 1 GHz·s to 10 GHz·s. The input data size $D^{tx}_k$ of each task node $v_k$ is in the range of 10 MB to 100 MB, and its output data size $D^{rx}_k$ is in the range of 1 MB to 10 MB. The maximum risk probability of each task node $v_k$ is $P_{\max} = 0.4$.
(5) The parameter settings of the neural network: the evaluated Q network consists of one input layer, one hidden layer, and one output layer, and the number of neurons in the hidden layer is 2048. The learning rate is 0.003, and the discount factor $\gamma$ is 0.9. The size buf_size of the replay memory is 3000, and the size mini_batch of the state transition experiences randomly sampled from the replay memory is 64. The maximum number of episodes is set to MAX_EPI = 1000. The maximum number $K$ of steps in each episode is equal to the number of task nodes in the workflow.
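The workflow settings above can be turned into a small generator. The sketch below is one possible reading, not the authors' generator: it draws each task's workload and data sizes uniformly from the stated ranges, adds an edge between each ordered pair of tasks with 10% probability, and caps the in-degree and out-degree at 4. Edges only run from a lower index to a higher index, which is an assumption made here to guarantee that the graph is acyclic. The name `random_workflow` and the dictionary keys are hypothetical.

```python
import random


def random_workflow(num_tasks: int, edge_prob: float = 0.1,
                    max_degree: int = 4, seed=None):
    """Generate a random DAG workflow under the stated parameter ranges."""
    rng = random.Random(seed)
    tasks = {
        k: {
            "workload": rng.uniform(1.0, 10.0),   # GHz*s
            "d_tx": rng.uniform(10.0, 100.0),     # input data size, MB
            "d_rx": rng.uniform(1.0, 10.0),       # output data size, MB
        }
        for k in range(num_tasks)
    }
    edges = []
    out_deg = [0] * num_tasks
    in_deg = [0] * num_tasks
    for k in range(num_tasks):
        for l in range(k + 1, num_tasks):  # low-to-high edges keep the graph acyclic
            if (out_deg[k] < max_degree and in_deg[l] < max_degree
                    and rng.random() < edge_prob):
                edges.append((k, l))
                out_deg[k] += 1
                in_deg[l] += 1
    return tasks, edges


if __name__ == "__main__":
    tasks, edges = random_workflow(50, seed=1)
    print(len(tasks), len(edges))
```

Passing a fixed `seed` makes the generated workflow reproducible across runs, which helps when comparing scheduling algorithms on the same instance.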

Performance Analysis.
To demonstrate the effectiveness of the proposed SAWS scheme, this paper implements MSAWS, AWM, Greedy, and HEFT baseline algorithms and compares the SAWS scheme with these four baseline algorithms under different experimental parameters.

Average Workload Minimization (AWM): In each time slot, the AWM strategy chooses the edge server with the smallest average workload to schedule the task node.
SAWS: This abbreviation denotes the security-aware workflow scheduling scheme. Its main goal is to minimize the completion time of the workflow while satisfying the risk probability constraint.
MSAWS: Based on the SAWS scheme, the security service with security level 1 is chosen for the scheduled task nodes.
Greedy: In each time slot, the Greedy algorithm selects the edge server that enables the scheduled task node to complete at the earliest time based on the current environment.
HEFT [40]: This abbreviation denotes heterogeneous earliest finish time. This algorithm is a list-based workflow scheduling strategy and is widely used in workflow scheduling. It first calculates the priority of the task nodes based on their computational and communication costs. Then, each task node is scheduled to the server that can complete it at the earliest time.

The Convergence Analysis of SAWS.
Three different types of workflows with 50, 100, and 150 task nodes are scheduled by the SAWS scheme. Figure 3 shows their respective learning curves. It can be observed that the completion time gradually decreases and tends to stabilize as the learning time (i.e., the number of episodes) increases. This result indicates that the proposed SAWS scheme can learn an optimal policy to schedule workflow applications with different numbers of task nodes. The optimal policy can minimize the completion time of the workflow while satisfying the risk probability constraint. Moreover, as shown in Figure 3, the completion time of the workflow application with 50 task nodes is the smallest, that of the workflow application with 100 task nodes is in the middle, and that of the workflow application with 150 task nodes is the largest. This is because the larger the scale of the workflow application, the larger the completion time.

e Impact of Different Risk Probabilities.
To examine the impact of different risk probabilities on the completion times of different workflows, the risk probability is varied from 0.2 to 1.0 with the increment of 0.2 for workflows with 50, 100, and 150 task nodes, respectively. Figure 4 shows the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms under different risk probabilities for workflows with 50, 100, and 150 task nodes. As shown in Figure 4(a), the completion time of the SAWS algorithm is less than that of the MSAWS, AWM, Greedy, and HEFT algorithms. e main reason is that the SAWS algorithm can learn a security-aware workflow scheduling scheme in a dynamic MEC with security threats. is scheme can make an optimal scheduling decision according to different system states, thereby minimizing the completion time of the workflow while satisfying the risk probability constraint. e AWM algorithm selects the edge server with the least average workload to execute task node; hence, it is difficult to obtain an optimal solution. Although the Greedy and HEFT algorithms select the edge server that enables the task node to execute the task node at the earliest completion, it does not consider the after effect of task scheduling and is difficult to get an optimal solution. e MSAWS algorithm always selects the security service with the security level 1 to encrypt these scheduled task nodes. e MSAWS algorithm can effectively ensure the risk probability but significantly increases the completion time of workflow application. Moreover, we can observe that the completion time of five algorithms gradually decreases with the increase of the risk probability. It is because the greater the risk probability, the lower the security service level employed by task node to ensure its risk probability, and thereby the shorter the completion time of the workflow.
In addition, we can observe from Figure 4 that the completion time of the workflow gradually increases with the number of task nodes in the workflow. The reason for this is the same as that discussed in Section 5.2.1.
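The selection rules of the baselines described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Server` fields, their units, and the function names are hypothetical, and treating security level 1 as the maximum of the available levels is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    avg_workload: float  # average queued workload (hypothetical unit: CPU cycles)
    ready_time: float    # time at which the server becomes free (seconds)
    rate: float          # processing rate (cycles per second)


def awm_select(servers):
    """AWM-style rule: pick the server with the least average workload."""
    return min(servers, key=lambda s: s.avg_workload)


def earliest_finish_select(servers, task_cycles):
    """Greedy/HEFT-style rule: pick the server where this task finishes earliest."""
    return min(servers, key=lambda s: s.ready_time + task_cycles / s.rate)


def msaws_security_level(levels):
    """MSAWS-style rule: always use the highest available security level."""
    return max(levels)


# Hypothetical example: es1 is lightly loaded but busy, es2 is slower but idle.
servers = [
    Server("es1", avg_workload=5.0, ready_time=4.0, rate=2.0),
    Server("es2", avg_workload=8.0, ready_time=0.0, rate=1.0),
]
awm_choice = awm_select(servers)                       # least average workload
greedy_choice = earliest_finish_select(servers, 6.0)   # earliest finish for 6 cycles
level = msaws_security_level([0.2, 0.6, 1.0])
```

In this toy case the two server-selection rules disagree: the workload rule picks `es1`, while the earliest-finish rule picks `es2` (finish time 6.0 versus 7.0). Both rules look only at the current task node, which is the limitation the text attributes to these baselines.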

The Impact of Different Security Services.
To evaluate the impact of different security services on the completion times of different workflows, task nodes in each workflow employ either the encryption service only or the integrity service only. For simplicity, these two settings are denoted by Confi_Only and Integ_Only, respectively. Figure 5 shows that the completion time of both Confi_Only and Integ_Only gradually decreases as the risk probability increases. This can be explained as follows: the higher the risk probability, the lower the security level employed, the higher the processing rate of the security service, and thereby the shorter the completion time of the workflow. Moreover, the completion time of Integ_Only is shorter than that of Confi_Only. This is because, when the security level of the encryption service is approximately equal to that of the hash service, the processing rate of the hash service is higher than that of the encryption service. Finally, Figure 5 shows that the completion times of Confi_Only and Integ_Only gradually increase with the number of task nodes in the workflow. The reason for this is the same as that discussed above.

5.2.4. The Impact of Different Risk Coefficients.
Figure 6 shows the impact of different risk coefficients on the completion times of different workflows. We vary the risk coefficients of the stealing and tampering security threats from 0.3 to 3 in increments of 0.3. We can observe from Figure 6 that the completion time of Confi_Only and Integ_Only gradually increases with the risk coefficient. This is because task nodes are attacked more frequently as the risk coefficient increases. To satisfy the risk probability constraint, a security service with a higher level must be employed, which leads to a longer task processing delay and hence a longer workflow completion time. Moreover, we can observe from Figure 6 that the completion time of Confi_Only is higher than that of Integ_Only. The main reason is that, when the security level of the encryption service is approximately equal to that of the hash service, the processing rate of the encryption service is lower than that of the hash service, which leads to a longer task processing delay and a longer workflow completion time. Finally, we can see from Figure 6 that the completion time of Confi_Only and Integ_Only gradually increases with the number of task nodes in the workflow. The reason for this is the same as that discussed in Section 5.2.1.
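The link between the risk coefficient, the risk probability constraint, and the required security level can be made concrete with a small sketch. The paper's own risk probability model is defined in an earlier section; the code below assumes an exponential form often used in security-aware scheduling work, P = 1 - exp(-lambda * (1 - sl)), with risk coefficient lambda > 0 and security level sl in [0, 1]. The function name and this functional form are assumptions made for illustration only.

```python
import math


def min_security_level(risk_coefficient, max_risk_probability):
    """Smallest security level in [0, 1] meeting the risk probability constraint.

    Assumed model: P(sl) = 1 - exp(-risk_coefficient * (1 - sl)).
    Solving P(sl) <= max_risk_probability for sl gives
    sl >= 1 + ln(1 - max_risk_probability) / risk_coefficient.
    """
    if risk_coefficient <= 0:
        raise ValueError("risk_coefficient must be positive")
    if max_risk_probability >= 1.0:
        return 0.0  # any risk is tolerated, so no protection is required
    level = 1.0 + math.log(1.0 - max_risk_probability) / risk_coefficient
    return min(1.0, max(0.0, level))  # clip to the valid range


# Sweep the same ranges as the experiments: coefficient 0.3 to 3, probability 0.2 to 1.0.
low_threat = min_security_level(0.3, 0.2)
high_threat = min_security_level(3.0, 0.2)
relaxed = min_security_level(3.0, 0.8)
```

Under this assumed model, a larger risk coefficient raises the minimum security level and a larger allowed risk probability lowers it. Since a higher security level implies a slower security service, this matches the direction of the trends reported for the risk coefficient and the risk probability.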

5.2.5. The Impact of Different Edge Server's Computing Capacities.
Figure 7 shows the impact of the edge servers' computing capacities on the completion time of different workflows. As shown in Figure 7, the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms decreases as the number of CPU cores increases. The main reason is that the more CPU cores an edge server has, the stronger its computing capacity, and thereby the shorter the task processing delay. Therefore, the completion time of the workflow gradually decreases. In addition, we can further observe from Figure 7 that the SAWS algorithm performs better than the MSAWS, AWM, Greedy, and HEFT algorithms in terms of workflow completion time. The reason for this is the same as that discussed in Section 5.2.2. Finally, we can observe that the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms gradually increases with the number of task nodes in the workflow. The reason for this is the same as that discussed in Section 5.2.1.

5.2.6. The Impact of the Number of Edge Servers.
Figure 8 shows the impact of the number of edge servers on the completion time of workflows with 50, 100, and 150 task nodes. To investigate this impact, we vary the number of edge servers from 2 to 6 in increments of 1. As shown in Figure 8, the completion time of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms gradually decreases as the number of edge servers increases. This can be explained as follows: the greater the number of edge servers, the stronger the computing capacity of the whole system, and thereby the shorter the completion time of the workflow. Moreover, we can further observe that the completion time of the SAWS algorithm is lower than that of the MSAWS, AWM, Greedy, and HEFT algorithms. The reason for this is the same as that discussed in Section 5.2.5. Finally, we can observe that the completion times of the SAWS, MSAWS, AWM, Greedy, and HEFT algorithms gradually increase with the number of task nodes in the workflow. The reason for this is the same as that discussed above.

Conclusions and Future Work
This paper proposes a reinforcement learning-based security-aware workflow scheduling (SAWS) scheme to solve the workflow scheduling problem in a dynamic MEC with security threats. This paper first constructs the mobile edge computing model, security cost model, communication model, and risk probability model. Then, this paper formulates the security-aware workflow scheduling problem as a finite Markov Decision Process. To solve this problem, this paper adopts a deep Q network approach to learn an optimal security-aware workflow scheduling policy. The SAWS scheme minimizes the completion time of workflows while satisfying the risk probability constraint. To verify the effectiveness of the SAWS scheme, this paper implements the MSAWS, AWM, Greedy, and HEFT baseline algorithms and compares the SAWS scheme with these four baseline algorithms under different experimental parameters such as the risk probability, the security service, the risk coefficient, the edge server's computing capacity, and the number of edge servers. The extensive experimental ...

Data Availability
The experiment data supporting this experiment analysis are from previously reported studies, which have been cited. The experiment data used to support the findings of this study are included within the article. The experiment data are described in Section 5 in detail.

Conflicts of Interest
The authors declare that they have no conflicts of interest.