Efficient Deep Learning Approach for Computational Offloading in Mobile Edge Computing Networks

The fifth-generation mobile communication technology is broadly characterised by extremely high data rates, low latency, massive network capacity, and ultrahigh reliability. However, owing to the explosive increase in mobile devices and data, it faces challenges such as data traffic, high energy consumption, and communication delays. In this study, multiaccess edge computing (previously known as mobile edge computing) is investigated to reduce energy consumption and delay. A mathematical model of multidimensional variable programming is established by combining the offloading scheme and bandwidth allocation to ensure that the computing tasks of wireless devices (WDs) can be reasonably offloaded to an edge server. However, traditional analysis tools are limited by computational dimensions, which makes it difficult to solve the problem efficiently, especially for large-scale WDs. In this study, a novel offloading algorithm known as energy-efficient deep learning-based offloading is proposed. The proposed algorithm uses a new type of deep learning model: the multiple-parallel deep neural network. The generated offloading schemes are stored in shared memory, and the optimal scheme is generated by continuous training. Experiments show that the proposed algorithm can generate near-optimal offloading schemes efficiently and accurately.


Introduction
The rapid development of fifth-generation (5G) mobile communication technology services in recent times has prompted the emergence of compute-intensive applications, such as intelligent driving, ultra-high-definition video, and mobile crowdsensing [1]. The 5G technology is largely characterised by extremely high data rates, low latency, massive network capacity, and ultrahigh reliability; hence, it requires an appropriate architecture to function efficiently. However, the traditional centralised cloud computing network architecture is unable to meet the requirements of the 5G software architecture because it is limited by excessive link load and delays in real-time response [2,3]. Consequently, the European Telecommunications Standards Institute (ETSI) has proposed the new concept of mobile edge computing (MEC) [4][5][6]. Users can migrate compute-intensive and delay-sensitive applications from local devices to an edge server to solve the problems of limited computing resources and battery energy [7]. Concurrently, the edge servers can precache some content required by users to reduce access delay and improve the user experience [8]. Currently, research methods for MEC are mainly divided into two categories: offloading methods for reducing time delay and offloading methods for reducing energy consumption. To reduce time delay, some researchers proposed the IHRA scheme for computations in multiuser situations [9]. It considered the rich computing resources of cloud computing and the low transmission delay characteristic of MEC. The offloading technique enabled part of the computing tasks to be offloaded to the user terminal device for execution, thereby reducing the execution delay of delay-sensitive applications by 30%. Some studies have proposed the LODCO algorithm [10], a dynamic computing offloading algorithm based on Lyapunov optimisation theory.
This method optimises the offloading decision based on two aspects, task running delay and task running faults, to minimise the processing delay of the offloading task and maximise the success rate of the data transmission process. Consequently, the probability of offloading failure is reduced. The simulation results showed that the algorithm has excellent advantages in reducing time delay and could shorten the execution time of offloading tasks by 64%. However, this method only focused on the delay condition and failed to consider the energy consumption of the mobile terminal device during offloading. As a result, the terminal equipment may not operate properly owing to insufficient power, which can have a negative impact on the user experience. Therefore, further study is required to discover an offloading technique that can minimise energy consumption. Several studies have been conducted on the optimisation of energy consumption to solve the offloading problem in different environment scenarios. Some studies have adopted the artificial fish swarm algorithm to design an offloading scheme for energy consumption optimisation under time delay constraints [11]. Although this method effectively reduces the energy consumption of the task data transmission network by considering the link status in the network, it is characterised by high complexity. In a previous study, a particle swarm task-scheduling algorithm was designed for multiresource matching to minimise the energy consumption of edge terminal devices [12]. Furthermore, some studies have investigated partial offloading of computing tasks to minimise the energy consumption of mobile devices. For wireless devices (WDs) with a separable task [13], an energy-saving optimisation problem was proposed, and a greedy algorithm was used to solve it. However, this method only reduced the energy consumption of the mobile terminal and could not minimise delays in the task execution time.
To minimise the calculation delay and energy consumption simultaneously, especially in an environment with multiple MEC servers and multiple terminal users, it is appropriate to realise the offloading of the computational tasks of wireless mobile devices [14]. The use of software-defined networks (SDN) for MEC has been adequately exploited to deal with the data offloading problem efficiently and effectively. Specifically, the SDN paradigm transforms communication networks into a programmable world, where a centralised entity, namely, the SDN controller, acquires a global view of the communication links and manages the network traffic efficiently and dynamically [15,16]. However, the centralised control mode and openness of SDN can pose a potential risk to the security of the controller.
Owing to these limitations, this study designs a deep learning offloading algorithm that avoids the network risk induced by the centralised control mode while focusing on reducing energy consumption and time delays. The algorithm is composed of two components, the offloading scheme and deep learning, and aims to solve the problem of selective offloading of mobile application components. The innovation of the algorithm is mainly reflected in the following: (1) When we establish the system utility model in the MEC network, we weight the two operators of communication delay and residual energy and then obtain the cost of offloading. (2) The other innovation of this study is a new type of deep learning model, which uses multiple-parallel deep neural networks [17][18][19], stores several offloading schemes of computing tasks in a shared memory, and substitutes them into the new deep learning model. After repeated iterations, the optimal offloading scheme is finally obtained. Compared with a traditional single deep neural network, this novel deep learning model has the advantage of facilitating the optimal edge offloading scheme. Thus, the edge offloading convergence is greatly improved.
Experiments show that the scheme has higher accuracy, lower energy consumption, and lower communication delay. The MEC architecture based on the new deep learning model proposed in this study is shown in Figure 1.

System Model.
In this study, an efficient mobile edge offloading framework is designed. Suppose that, in the MEC framework, only one edge server, one small cell, and N WDs exist, where the N WDs are represented by a set N = {1, 2, . . ., N}. We assume that the computational task of each wireless terminal device contains C independent subtasks, which are recorded as A = {Application_1, Application_2, . . ., Application_C}.
The work queue of these subtasks is stored in a FIFO queue. For the subtasks in any WD, the offloading scheme is obtained through the offloading algorithm designed in this study. The offloading scheme is expressed in binary; that is, P_t ∈ {0, 1}. P_t = 0 indicates that the WD subtask is executed locally, and P_t = 1 indicates that the WD subtask is offloaded to the edge server for execution. P is the offloading scheme of all subtasks in the WD, which is recorded as P = (P_1, P_2, . . ., P_C). The edge computing model of the WDs network is shown in Figure 2.

Local Execution Model.
Assuming that there is a subtask t in a WD, to solve the cost function of its local execution, the computational delay and energy consumption must first be determined; the two following operators are then used to solve it. The data size of any WD subtask is marked as d_nt. W_t represents the computing resources occupied by subtask t, V represents the number of clocks occupied by the CPU to execute one byte, and f_u represents the CPU operating frequency of the WD [20]. Therefore, the computational delay is T_l = W_t / f_u.

Wireless Communications and Mobile Computing
W_t is determined by d_nt and V, which denote the data size of the subtask and the number of clocks occupied by the CPU executing one byte, respectively; that is, W_t = d_nt · V.
During local execution, the energy consumed by executing each byte is recorded as e_l, and the energy consumed by executing the subtask is obtained as E_l = d_nt · e_l. The cost function of local execution is obtained from the computational delay and energy consumption of local execution, which is recorded as C_l(S) = c_1 T_l + c_2 E_l (equation (5)). In equation (5), c_1 and c_2 are weighting coefficients, which are linearly related to the maximum execution time (T_max) and maximum energy consumption (E_max) in the task, respectively.
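The local-execution model above can be sketched directly in code. The names d_nt, V, f_u, e_l, c_1, and c_2 follow the paper's notation; the concrete argument values in the test are illustrative assumptions, not measurements from the paper.

```python
def local_delay(d_nt, V, f_u):
    """T_l = W_t / f_u, where W_t = d_nt * V is the clock count of the subtask."""
    W_t = d_nt * V
    return W_t / f_u

def local_energy(d_nt, e_l):
    """E_l = d_nt * e_l: energy per byte times the subtask size in bytes."""
    return d_nt * e_l

def local_cost(d_nt, V, f_u, e_l, c1, c2):
    """C_l(S) = c1 * T_l + c2 * E_l: the weighted local-execution cost."""
    return c1 * local_delay(d_nt, V, f_u) + c2 * local_energy(d_nt, e_l)
```

In practice c1 and c2 would be scaled against T_max and E_max so that delay and energy contribute on comparable scales.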

Mobile Edge Computing.
To solve the problems of limited computing resources and battery energy, compute-intensive and delay-sensitive computing tasks in WDs are offloaded from the local device to the edge server for execution to improve the efficiency of the system. Similar to local execution, the computational delay (T_r(c)) and energy consumption (E_r,nt) are crucial operators that affect edge offloading. This study considers the impact of computing resources and communication resources on T_r(c) and E_r,nt. When the influence of all variables is fully estimated, the most reasonable cost function can be formulated and the optimal offloading scheme can be obtained. The process of offloading the subtasks in WDs to the MES involves the following steps. First, the data is offloaded to the MES. Second, the task is executed in the MES. The third step is the downlink transmission to the WD, and the last step is to complete decoding in the WD.
Consequently, four time constants are generated: the upload time of the subtask, T_up; the execution time of the subtask in the MES, T_ex; the downlink time of the subtask to the WD, T_down; and the decoding time, T_d. Their sum gives the total delay of edge offloading, T_r(c) = T_up + T_ex + T_down + T_d (equation (6)).
Equation (6) shows that, in the MES, the allocation of communication resources, namely, bandwidth, directly affects T_up and T_down of the subtask, and the allocation of computational resources, that is, the allocated CPUs, directly affects T_ex of the subtask. These terms are denoted as T_up = d_nt / r_ul, T_down = d_nt / r_dl, and T_ex = W_t / (M · f_s) (equation (7)). In equation (7), r_ul and r_dl represent the transmission rates of the subtask on the uplink to the MES and the downlink to the WD, respectively. These two factors are closely related to the allocation of communication resources. W_t represents the number of clocks required to execute subtask t, M represents the number of CPUs allocated to the subtask (the value of M depends on the data structure of the subtask), and f_s represents the CPU working frequency of the MES. Bandwidth allocation directly affects the uplink time and downlink time of subtasks [21,22]. The effects of bandwidth and number of CPUs on the computational delay and transmission delay in offloading are shown in Figure 3.
In this study, emphasis is placed on the method for solving r_ul and r_dl. The orthogonal frequency division multiple access (OFDMA) technology is used to complete the allocation of communication resources. This technology divides the channel into several orthogonal subchannels and converts the high-speed data signal into parallel low-speed subchannels. Because the orthogonal signals can be separated using correlation techniques at the receiving end, OFDMA reduces the mutual interference among subchannels and completes the allocation of communication resources. The total network bandwidth B is decomposed into K subcarriers (k ∈ {1, 2, 3, . . ., K}), and each subtask t is allocated several subcarriers. In signal processing, because additive white Gaussian noise (AWGN) is easy to analyse and approximate, the actual noise signal can be approximated by Gaussian white noise in a certain frequency band when analysing the noise performance of a signal processing system [23,24]. Therefore, the maximum data transmission rate of the uplink is r_ul = B log_2(1 + (P_u h_ul D^(−β)) / (Γ(g_ul) N_0 B)), and the downlink rate r_dl is obtained analogously. Here, B is the network bandwidth, D is the distance from the WD to the MES, N_0 is the noise power, P_u is the transmission power of the WD, and h_ul is the attenuation coefficient of the uplink channel. β is the path loss index, g_ul is the bit error rate of uplink data transmission, and Γ(g_ul) = (−2 log(5 g_ul))/3 represents the stability margin of the signal-to-noise ratio, which aims to meet the target bit error rate.
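The uplink-rate formula can be sketched as follows. The SNR margin Γ(g_ul) = −2 log(5 g_ul)/3 is taken from the text; the exact grouping of B, N_0, and the path-loss term D^(−β) inside the Shannon-capacity expression is a reconstruction from the variables the paper lists, so treat it as an assumption rather than the paper's exact equation.

```python
import math

def snr_margin(g_ul):
    """Stability margin of the signal-to-noise ratio for target BER g_ul."""
    return -2.0 * math.log(5.0 * g_ul) / 3.0

def uplink_rate(B, P_u, h_ul, D, beta, N0, g_ul):
    """Assumed form: r_ul = B * log2(1 + P_u*h_ul*D**-beta / (Gamma*N0*B))."""
    snr = (P_u * h_ul * D ** (-beta)) / (snr_margin(g_ul) * N0 * B)
    return B * math.log2(1.0 + snr)
```

The downlink rate r_dl would use the same form with the downlink channel gain and target bit error rate.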

Similar to the analysis method for the calculation delay, the energy consumption E_r,nt generated by offloading the subtasks to the MES is mainly composed of the energy consumption E_ex of the MES executing the task and the energy consumption E_d of decoding in the WD; that is, E_r,nt = E_ex + E_d. The energy consumption generated by the data uplink and downlink is ignored.
The cost function of remote offloading is obtained from the computational delay T_r(c) and energy consumption E_r,nt of edge offloading, which is denoted as C_r(S) = c_3 T_r(c) + c_4 E_r,nt (equation (10)). In equation (10), c_3 and c_4 are linearly related to T_max and E_max, respectively. T_D represents the maximum execution time of the subtask in the MES, and E_M represents the maximum energy consumption in the task.
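A minimal sketch of the remote-delay and remote-cost computation, assuming the four delay terms simply add and the downlink payload equals the uplink payload d_nt (the paper does not state the result size, so that is an assumption):

```python
def remote_delay(d_nt, r_ul, r_dl, W_t, M, f_s, T_d):
    """T_r = T_up + T_ex + T_down + T_d: total delay of edge offloading."""
    T_up = d_nt / r_ul          # upload the subtask data to the MES
    T_ex = W_t / (M * f_s)      # execute W_t clocks on M CPUs at f_s
    T_down = d_nt / r_dl        # return the result over the downlink
    return T_up + T_ex + T_down + T_d

def remote_cost(T_r, E_ex, E_d, c3, c4):
    """C_r(S) = c3 * T_r + c4 * (E_ex + E_d): weighted edge-offloading cost."""
    return c3 * T_r + c4 * (E_ex + E_d)
```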

Cost Function.
WDs require a certain amount of time and energy to perform computing tasks. In this paper, a system utility model S is introduced (equation (11)), and the delay and energy are minimised by analysing the model. The model contains four key parameters that affect the utility of the system: the number of bytes D of the subtask, the allocated communication resources (K subcarriers), the allocated computing resources (m CPUs), and the energy E_nt consumed to execute the task.
Based on the established system utility model and the cost analysis of local execution and edge offloading, the cost function of performing computing tasks is obtained as C(S, P_t) = (1 − P_t) C_l(S) + P_t C_r(S) (equation (12)). In equation (12), P_t represents the binary offloading scheme of each subtask in the WDs. P_t = 0 indicates that the subtask of the WD is executed locally, and P_t = 1 indicates that the subtask of the WD is offloaded to the edge server for execution.
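The per-subtask cost is a simple selection between the local and remote costs driven by the binary decision P_t, and the cost of a whole scheme is the sum over subtasks. A sketch (function names are illustrative):

```python
def task_cost(C_l, C_r, P_t):
    """C(S, P_t) = (1 - P_t) * C_l(S) + P_t * C_r(S)."""
    return (1 - P_t) * C_l + P_t * C_r

def total_cost(local_costs, remote_costs, scheme):
    """Total cost of an offloading scheme: a 0/1 decision per subtask."""
    return sum(task_cost(cl, cr, p)
               for cl, cr, p in zip(local_costs, remote_costs, scheme))
```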

Problem Formulation.
This study uses the system model S and the cost function C(S, P_t) to find the optimal offloading scheme P_o, which minimises the cost of executing tasks and obtains the optimal solution for MEC.
To obtain the optimal scheme, we input the system utility parameters of the computing task into the DNN [25]. After processing in the hidden layers, we obtain the offloading scheme at the output layer. The offloading scheme is refined through the back-propagation and iteration of the neural network [26], namely, the gradient descent algorithm, and we obtain the optimised scheme P_o that is closest to the actual value. The mathematical expression is P_o = arg min_P Σ_{t∈T} C(S, P_t) (equation (13)). In equation (13), Σ_{t∈T} C(S, P_t) represents the total cost of performing the computing task and P represents the optimal offloading scheme. We use the DNN to obtain the closest approximation P_o, making P_o infinitely close to P.
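For a small number of subtasks, the optimum in equation (13) can be found exactly by enumerating every binary scheme; this is not the paper's method (the paper trains DNNs precisely because this enumeration grows as 2^C), but it provides the ground truth a learned scheme can be checked against:

```python
from itertools import product

def optimal_scheme(local_costs, remote_costs):
    """Return (P_o, min_cost) by brute force over all 2^C binary schemes."""
    C = len(local_costs)
    best_scheme, best_cost = None, float("inf")
    for scheme in product((0, 1), repeat=C):
        # cost of this scheme: pick local or remote cost per subtask
        cost = sum((1 - p) * cl + p * cr
                   for p, cl, cr in zip(scheme, local_costs, remote_costs))
        if cost < best_cost:
            best_scheme, best_cost = scheme, cost
    return best_scheme, best_cost
```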

EDLO Algorithm Design
DNNs achieve high accuracy when trained on big data [27]. The second aim of this paper is to use multiple-parallel deep neural networks to train samples and obtain the optimal offloading scheme [28,29]. The input layer of the model is the state vector S* of several subtasks in the computing task.
The output layer is the offloading scheme P. This study divides the computing task of a WD into several subtasks. The scheme for executing the task is denoted as P = {P_t}, t = 1, 2, 3, . . ., |k|. Each subtask has two offloading possibilities; that is, it can be executed locally (P_t = 0) or on the MES (P_t = 1). Therefore, there are 2^|k| offloading schemes for the subtasks.
The detailed steps of the energy-efficient deep learning-based offloading (EDLO) algorithm designed in this study are listed in Table 1.
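The candidate-selection step of the multiple-parallel-DNN design can be sketched as follows. This is an assumed reading of the text: each DNN proposes a relaxed scheme in [0, 1] per subtask, the proposals are quantised to binary, and the candidate with the lowest cost is kept (and would then be stored in the shared memory for further training). The function names and the 0.5 quantisation threshold are illustrative, not from the paper.

```python
def quantise(relaxed):
    """Round a relaxed DNN output in [0,1]^C to a binary offloading scheme."""
    return tuple(1 if x >= 0.5 else 0 for x in relaxed)

def best_candidate(dnn_outputs, local_costs, remote_costs):
    """Pick the lowest-cost binary scheme among the parallel DNN proposals."""
    candidates = {quantise(out) for out in dnn_outputs}

    def cost(scheme):
        return sum((1 - p) * cl + p * cr
                   for p, cl, cr in zip(scheme, local_costs, remote_costs))

    return min(candidates, key=cost)
```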

EDLO Algorithm Analysis.
The multiple-parallel deep neural network designed in this study increases the number of hidden layers and neurons in the network. The number of hidden layers is set to 2, and the number of neurons in each layer is 256 [35]. Based on the corresponding numerical analysis, it was observed that an increase in the number of hidden layers and neurons prompted a corresponding increase in computation accuracy [30]. The sigmoid function, whose output range for the hidden-layer neurons is [0, 1], is used as the activation function. Finally, the cross-entropy loss function is used as the loss function [31]. These choices are made because pairing the sigmoid function with cross-entropy avoids the learning slowdown that the mean square error loss suffers during gradient descent [32,33], where the effective learning rate is reduced; in other words, the learning rate can be controlled by the output error. The learning process of the DNN is shown in Figure 4.
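The reason for pairing sigmoid outputs with cross-entropy can be made concrete: with cross-entropy the gradient at the output reduces to (a − y), proportional to the output error, whereas the mean-square-error gradient carries an extra sigmoid'(z) factor that vanishes when the neuron saturates. A minimal sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_output_grad(z, y):
    """dL/dz for L = (a - y)^2 / 2: carries a sigmoid'(z) factor."""
    a = sigmoid(z)
    return (a - y) * a * (1.0 - a)

def xent_output_grad(z, y):
    """dL/dz for cross-entropy with a sigmoid output: simply (a - y)."""
    return sigmoid(z) - y
```

At z = 10 with target y = 0 the neuron is maximally wrong, yet the MSE gradient is nearly zero while the cross-entropy gradient stays close to 1, so learning is controlled by the output error rather than stalled by saturation.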

Experiment.
In this study, the EDLO algorithm is designed for the optimal offloading scheme problem. The following experiments and simulations were performed to verify the performance of the algorithm. We assume that the edge offloading framework includes three WDs and that the computing task of each WD consists of five subtasks, which can be executed by either the edge server or the wireless terminal.
The data byte size of each subtask is randomly distributed within 10 M to 20 M [34]; that is, D ∈ [10 M, 20 M]. The number of clocks occupied by the CPU executing one byte is given as V ∈ [2000, 10000]. The available energy E_t in the WD is 1500 J [35]. The energy consumption of transmitting and receiving data by WDs is 11.3 × 10^6 J/bit [36]. The number of CPUs in the MES is denoted as m ∈ [0, 10]. The number of subcarriers is denoted as n ∈ [1, 512]. Figure 5 shows how the number of DNNs affects the convergence of the EDLO algorithm. Because of the input of different state spaces S, the evaluation function of the system produces different output results. This study uses the EDLO algorithm as the benchmark to ensure that the computation standard of the input quantity is unified. As shown in Figure 5, as the learning steps increase, the EDLO algorithm gradually converges to one. When using ten DNNs, the convergence rate is approximately 98% after 1000 learning steps. When using one DNN, the computation accuracy rate is 75% and the algorithm cannot converge. When using three DNNs, the computation accuracy rate is approximately 97%; specifically, the computation accuracy rate is increased by 21% compared to when one DNN is used, and the algorithm converges. As the number of DNNs increases, the convergence speed increases accordingly. The influence of different numbers of DNNs on the convergence speed is shown in Figure 5. The EDLO algorithm can generate an optimal offloading decision in less than 0.2 s, and its computation time is similar across different numbers of DNNs. The computational delay statistics for different numbers of DNNs are shown in Figure 6.
Different learning rates and different data storage spaces have different effects on the convergence speed of the EDLO algorithm. The higher the learning rate, the faster the convergence speed of the EDLO algorithm. However, as shown in Figure 7, an increase in the learning rate increases the probability of converging to a local optimum rather than the global optimum. Therefore, it is appropriate to choose the best learning rate according to the actual situation. Figure 8 shows that different memory space sizes affect the convergence speed of the EDLO algorithm. Based on the experiments conducted, it was observed that local convergence is often established first. Therefore, to balance the convergence speed and overall performance of the algorithm in the MES network, we use a memory space of 2048 to achieve the fastest convergence speed. Figure 8 depicts the effect of the size of the memory space on the convergence speed of the algorithm.

Comparative Analysis.
To accurately evaluate the performance of the EDLO algorithm, this study compares it with four computing-task processing schemes. Ten DNNs are used in the EDLO algorithm, the learning rate is defined as 0.01, and the system memory is 2048; consequently, the EDLO algorithm achieves optimal performance. The four schemes used for comparison are TLP, TEP, ROS, and DLO; the DLO scheme considers the consumption of executing the program [39], and its neural network contains two hidden layers, each with 128 neurons.

Table 1: Train deep networks using EDLO.
(1) Input: the state vector of the computing task, S* = (d, k, m, E_nt)
(2) Output: the optimal offloading decision, P_o = arg min_P Σ_{t∈T} C(S, P_t)
(3) Initialization:
(4) Initialize the DNNs with random parameters (weights and bias terms)
(5) Empty the memory structure
(6) while not converged: C_i ← C(S, P); P* ← arg min_P Σ_{t∈T} C(S, P_t); end while
(7) Store (S*, P*) into the memory structure
In this paper, the EDLO algorithm and the other four algorithms are compared and tested based on the following: (1) the impact of communication resources on offloading delay, that is, the impact of bandwidth B on T_ud = T_up + T_down; (2) the impact of computing resources on edge computing latency, that is, the impact of the number of CPUs m on T_ex; (3) the total computational delay generated during task execution, that is, the total delay consumption of task execution; (4) computational accuracy; (5) energy consumption; and (6) computational cost. The results show that the EDLO algorithm provides optimal performance. The network parameters used in this study are listed in Table 2.
Under the TLP scheme, the computing and communication resources of the MES are not used, and the data is processed locally. In this study, when the network bandwidth is 100 Mbps, the number of cores in the MES is 10, and the computing task size is 15 M, the effects of the four algorithms on communication and computational delays are compared with those of the EDLO algorithm. The communication and computational delays of the EDLO algorithm are 93 ms and 102 ms, respectively. The delay of the EDLO algorithm is substantially lower than those of the TLP and ROS algorithms and moderately lower than that of the traditional deep learning algorithm, DLO. Compared to the DLO, the communication delay and computational delay of EDLO are 30.76% and 31.58% lower, respectively. Thus, the computational efficiency of the proposed algorithm is significantly improved because we consider the impact of bandwidth and CPUs on delay. The communication delay T_ud and computational delay T_ex of all the algorithms in the offloading process are shown, respectively, in Figures 9 and 10. Figure 11 presents a comparison of the total computational delay of EDLO and the other algorithms. Based on the experimental results, the total computational delay of EDLO is 860 ms, which is 44.19% lower than that of the DLO algorithm. Thus, the real-time performance of computation is greatly improved. Figure 12 shows a comparison of the offloading accuracy of EDLO and the other algorithms. Based on the results, the offloading accuracy of EDLO is significantly higher than those of TLP and TEP and slightly higher than that of DLO, whereas the offloading accuracy of the ROS algorithm is approximately zero. Similar to Figure 11, Table 3 shows that the EDLO algorithm is reliable. Figure 13 shows a comparison of the energy consumption of the different algorithms. The energy consumption generated by EDLO is 9.82% lower than that generated by DLO and significantly lower than those of the other three traditional data processing methods.
Figure 14 presents a comparison of the cost of EDLO and the other algorithms. Based on the experimental results, the final cost of the EDLO algorithm is 38.32% lower than that of the DLO and significantly lower than those of the other three traditional data processing methods.
It can be observed that the EDLO algorithm exhibits superior performance compared to the TLP, TEP, and ROS schemes, and it also outperforms the DLO algorithm.

Conclusion
In this study, we propose a deep learning offloading algorithm, EDLO, to reduce energy consumption and time delay. To realise this, we consider the energy consumption, computing delay, computing resources, and communication resources of the applied system. Distributed management avoids the hidden dangers to network security caused by centralised control. The algorithm introduces multiple-parallel DNNs, which can generate the optimal solution without manually labelling the data, and the numerical results verify the accuracy and performance of the algorithm. We envisage that our proposed framework can be applied in subsequent advances of MEC networks to achieve optimised real-time offloading.
Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare no conflicts of interest.

Authors' Contributions
Xiaoliang Cheng and Jingchun Liu have contributed equally to this work and share first authorship.