Energy-Efficient Mobile Edge Computing: Three-Tier Computing under Heterogeneous Networks

Mobile edge computing (MEC) is a promising technique for meeting the demands of computing-intensive and delay-sensitive applications by providing computation and storage capabilities in close proximity to mobile users. In this paper, we study energy-efficient resource allocation (EERA) schemes for a hierarchical MEC architecture in heterogeneous networks. In this architecture, both the small base station (SBS) and the macro base station (MBS) are equipped with MEC servers and help smart mobile devices (SMDs) perform their tasks. Each task can be partitioned into three parts: the SMD, SBS, and MBS each perform one part, forming a three-tier computing structure. Based on this structure, we formulate an optimization problem that minimizes the energy consumption of all SMDs subject to latency constraints, jointly considering radio and computation resources. We then design an EERA mechanism based on the variable substitution technique to calculate the optimal workload distribution, edge computation capability allocation, and SMDs’ transmit power. Finally, numerical simulation results demonstrate the energy efficiency improvement of the proposed EERA mechanism over the baseline schemes.


Introduction
Driven by the rapid development of the Internet of Things and the mobile Internet, many novel applications are emerging [1]. However, most of these applications, e.g., augmented reality, face recognition, and healthcare, are computing-intensive and delay-sensitive [2]. Because of their limited resources, smart mobile devices (SMDs) find it very challenging to run these applications locally while ensuring users' quality of experience (QoE). How to complete such applications while guaranteeing users' QoE has therefore become a focus of both academia and industry. Mobile edge computing (MEC), which endows the radio access network with computation and storage capabilities, is a promising technique to solve this problem. To improve users' QoE, MEC helps SMDs complete applications by performing some tasks at the edge nodes of the network, which reduces the latency and energy consumption of task execution thanks to the close proximity of edge nodes to SMDs [3,4].
Extensive research on MEC has been conducted from many perspectives, covering both single-server and multiserver MEC models. For single-server MEC models, much work has been done on both single-user models [5][6][7][8][9] and multiuser models [10][11][12][13][14][15]. For a single-user MEC model, the authors in [5] considered a binary computation offloading model and derived a data consumption rate threshold that determined whether to offload the whole task or execute it entirely locally. Building on that work, partial offloading was introduced into the single-user model to further reduce the energy consumption of SMDs: the task was partitioned into two parts, one of which was offloaded [6,7]. Considering the stochastic arrival of tasks, the optimal task scheduling policy was derived to minimize the weighted sum of energy consumption and latency [8]. In addition, the energy harvesting technique was incorporated into the MEC model, and a Lyapunov optimization-based dynamic computation offloading algorithm was proposed in [9]. For a multiuser MEC model, to satisfy the requirements of as many users as possible in a channel environment with wireless interference, the multiuser offloading system was formulated as a game and shown to admit a Nash equilibrium [10]. Considering inelastic computation tasks and non-negligible task execution durations, the authors in [11] proposed an energy-efficient resource allocation scheme. To deal with the arbitrary arrival of tasks in multiuser MEC systems, task scheduling techniques were utilized in [12,13]. To avoid redundant execution of the same tasks and minimize energy consumption, the storage resource of the base station was utilized in [14]. To further improve users' QoE, wireless power transfer was added to the multiuser MEC model and an access point energy minimization problem was formulated [15].
For multiserver MEC models, many edge cloud architectures have emerged, e.g., flat edge cloud architectures [16][17][18][19] and hierarchical edge cloud architectures [20][21][22]. In flat edge cloud architectures, all MEC servers are located at the same tier; in hierarchical edge cloud architectures, MEC servers are located at different tiers, and servers in different tiers have distinct computation and storage capabilities [3,23]. For a flat edge cloud architecture, the geographic information of SMDs and MEC servers was used to reduce task execution delays in [16]. To maximize the revenue of service providers, resources from different service providers were pooled, and the revenue was allocated using the core and Shapley values [17]. To minimize communication latency, a cloudlet selection model based on mixed integer linear programming was developed in [18]. Furthermore, by utilizing the idle computing resources of vehicles, the authors in [19] proposed a decentralized framework named Autonomous Vehicular Edge to increase the computational capabilities of vehicles. For a hierarchical edge cloud architecture, a three-tier MEC model was built on an LTE-Advanced mobile backhaul network [20]. To improve the cost efficiency of network operators, the authors in [21] took the cost disparity of the edge tiers into account. Under a three-tier MEC model, a Stackelberg game was used to allocate the limited computing resources of edge servers to data service subscribers [22].
Hierarchical MEC has been further studied in combination with heterogeneous networks, where the small base station (SBS) and macro base station (MBS) are equipped with MEC servers to serve SMDs. In particular, offloading decisions and radio resources were jointly optimized in [24] to minimize the system energy cost. The framework was later extended by endowing SBSs with computing capabilities, and a resource allocation problem was formulated to minimize the energy consumption of mobile users and MEC servers [25]. For a heterogeneous network powered by hybrid energy, user association and resource allocation were optimized to maximize network utility [26]. Considering the variability of mobile devices' capabilities and user preferences, offloading decisions and resource allocation were optimized to maximize system utility [27]. In addition, a novel information-centric heterogeneous network framework was designed and a virtual resource allocation problem was formulated in [28].

Motivations and Contributions.
Hierarchical edge server architectures have an advantage over flat architectures in serving peak loads [23,29]. Moreover, previous studies of three-tier MEC architectures focused on system construction [20][21][22] and system utility maximization [26][27][28]. However, how to allocate computation and communication resources energy-efficiently under a three-tier MEC architecture, so as to improve users' QoE, is also important. In this paper, we investigate a multiuser three-tier computing model in heterogeneous networks, in which an SBS with relatively small computation capability and an MBS with large computation capability jointly execute tasks. Based on this hierarchical MEC model, an energy-efficient resource allocation (EERA) scheme is proposed, in which the computation and radio resources are jointly optimized to minimize the energy consumption of all SMDs. The main contributions of this paper are summarized as follows:
(1) Based on heterogeneous networks, we establish a three-tier computing model comprising local computing, SBS computing, and MBS computing, and formulate an energy-efficient optimization problem. The workload placement strategy, transmit power, and computation capability allocation are optimized to minimize the SMDs' energy consumption under task delay constraints.
(2) We propose an EERA scheme based on the variable substitution technique. In this scheme, the optimal workload distribution and computation capability allocation are obtained first; then, the SMDs' optimal transmit power is derived through variable substitution.
(3) Numerical simulations are conducted, and the results validate that EERA outperforms the baseline schemes and effectively reduces the SMDs' energy consumption.

Organization.
The rest of this paper is organized as follows. In Section 2, the three-tier computing model is presented and the energy-efficient optimization problem is formulated. In Section 3, EERA based on the variable substitution technique is proposed, in which the workload distribution across the three tiers, the computation capability allocation from the SBS, and the SMDs' transmit power are jointly optimized to minimize the SMDs' energy consumption. Numerical results are provided in Section 4, and conclusions are drawn in Section 5.

System Model and Problem Formulation
As shown in Figure 1, the SBS and MBS are equipped with MEC servers and help SMDs perform tasks. The SMDs, SBS, and MBS execute tasks together and form a three-tier computing architecture. In the first tier, there are $K$ SMDs, and the set of SMDs is denoted as $\mathcal{K} = \{1, 2, \dots, K\}$. The processing capability of the k-SMD ($k \in \mathcal{K}$) is denoted as $f_{k,l}$ cycles/s. In the second tier, the SBS has a limited computation capability, denoted as $F$ cycles/s. In the third tier, we assume that the MBS has infinite computational resources, so its execution latency is negligible [9,30]. In addition, the backhaul link delay between the SBS and MBS is proportional to the transferred data size, with proportionality coefficient $\phi$ [24]. We assume that each user has one SMD and each SMD has one task. We only consider the case in which the SBS can forward data to the MBS, whereas SMDs cannot offload tasks to the MBS directly [24,25]. Moreover, SMDs occupy orthogonal wireless channels.
The task of the k-SMD is denoted as $A_k(D_k, C_k, T_k)$: it contains $D_k$ bits, each bit requires $C_k$ CPU cycles, and the task must be completed within time $T_k$. We assume that the task is a data-partitioned oriented task [6] that can be segmented arbitrarily, such as a virus scan or GZip task, so it can be executed separately in the three tiers, i.e., the SMD, SBS, and MBS. (For example, in a virus scan, the files can be partitioned into three parts, each tier scans one part in parallel, and the three partial results are combined to obtain the final result.) The workload distribution is $\boldsymbol{\alpha}_k = [\alpha_{k,l}, \alpha_{k,s}, \alpha_{k,m}]$ with $0 \le \alpha_{k,l}, \alpha_{k,s}, \alpha_{k,m} \le 1$, where $\alpha_{k,l}$, $\alpha_{k,s}$, and $\alpha_{k,m}$ denote the proportions of the k-SMD's task executed by the k-SMD, the SBS, and the MBS, respectively. We assume that the computation results are small enough that the delay of returning them from the SBS and MBS to the SMDs can be ignored [15,30,31].
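We consider a low-voltage task execution model in which the energy consumed per CPU cycle is $\varepsilon = \kappa f_{k,l}^2$, where $\kappa$ is a constant related to the capacitance coefficient [15]. The local execution delay of the k-SMD is $t_{k,l} = \alpha_{k,l} D_k C_k / f_{k,l}$, and its local computing energy consumption $E^{\mathrm{comp}}_{k,l}$ is
$$E^{\mathrm{comp}}_{k,l} = \alpha_{k,l} D_k C_k \varepsilon = \alpha_{k,l} D_k C_k \kappa f_{k,l}^2.$$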

Local Transmitting Model.
The channels between the SMDs and the SBS are assumed to be Rayleigh fading channels [6]. We assume that the coherence time is larger than the task deadline $T_k$, i.e., the channel gain is invariant during task execution [31]. Denoting the channel gain by $g_k$, the task offloading rate is
$$r_k = B \log_2\left(1 + \frac{p_{k,tx} g_k}{N_0}\right),$$
where $r_k$, $B$, $p_{k,tx}$, and $N_0$ denote the k-SMD's transmit rate, the channel bandwidth, the transmit power, and the white Gaussian noise power, respectively. The k-SMD's transmit power cannot exceed the maximum transmit power $p^{\max}_{k,tx}$, and $\mathbf{p}_{tx} = [p_{1,tx}, p_{2,tx}, \dots, p_{K,tx}]$ denotes the SMDs' transmit power vector. The part of the task offloaded to the MBS must first be transferred to the SBS, so the offloading time of the k-SMD is
$$t^{\mathrm{trans}}_{k,l} = \frac{(\alpha_{k,s} + \alpha_{k,m}) D_k}{r_k}.$$
The offloading energy consumption is the product of the offloading time and the transmit power, i.e., $p_{k,tx} t^{\mathrm{trans}}_{k,l} = (\alpha_{k,s} + \alpha_{k,m}) D_k p_{k,tx} / r_k$.
Figure 1: Multiuser task execution in three-tier computing architecture (a small BS and a macro BS, each with an integrated MEC server).
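The local computing and transmission models can be evaluated directly. The Python sketch below is an illustration under assumptions, not the authors' code: it uses the simulation parameters listed in the Numerical Results section, while the noise power `N0` and the channel gain `G_K` are hypothetical values chosen only for illustration, since the text does not give them.

```python
import math

# Parameters from the Numerical Results section.
D_K = 10e3        # task size D_k [bits]
C_K = 1000.0      # CPU cycles per bit C_k
F_LOCAL = 5e8     # local CPU frequency f_{k,l} [cycles/s]
KAPPA = 1e-28     # energy coefficient kappa
B = 2e6           # channel bandwidth B [Hz]

# Hypothetical values (the text does not give them).
N0 = 1e-10        # noise power N_0 [W]
G_K = 4.6e-8      # channel gain g_k (about the average gain at 18 m)


def local_energy(alpha_l):
    """E^comp_{k,l} = alpha_{k,l} D_k C_k kappa f_{k,l}^2 [J]."""
    return alpha_l * D_K * C_K * KAPPA * F_LOCAL ** 2


def local_delay(alpha_l):
    """t_{k,l} = alpha_{k,l} D_k C_k / f_{k,l} [s]."""
    return alpha_l * D_K * C_K / F_LOCAL


def rate(p_tx):
    """r_k = B log2(1 + p_{k,tx} g_k / N_0) [bit/s]."""
    return B * math.log2(1.0 + p_tx * G_K / N0)


def offload_time(alpha_s, alpha_m, p_tx):
    """Both the SBS part and the MBS part cross the SMD-SBS link first."""
    return (alpha_s + alpha_m) * D_K / rate(p_tx)


def offload_energy(alpha_s, alpha_m, p_tx):
    """Transmit energy = transmit power x offloading time."""
    return p_tx * offload_time(alpha_s, alpha_m, p_tx)


print(f"all-local: {local_energy(1.0):.2e} J in {local_delay(1.0):.3f} s")
print(f"offload 90% at 50 mW: {offload_energy(0.45, 0.45, 0.05):.2e} J")
```

With these parameters, processing the whole 10-kbit task locally costs $2.5 \times 10^{-4}$ J and takes 20 ms, ten times the 2 ms deadline, which is why part of the task must be offloaded.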

Wireless Communications and Mobile Computing
The SBS workload from the k-SMD is $\alpha_{k,s} D_k$ bits, which requires $\alpha_{k,s} D_k C_k$ computation cycles. With $f_{k,s}$ denoting the SBS computation capability allocated to the k-SMD, the SBS execution delay is $\alpha_{k,s} D_k C_k / f_{k,s}$. The total delay of SBS computing consists of the offloading delay and the execution delay:
$$t_{k,s} = t^{\mathrm{trans}}_{k,l} + \frac{\alpha_{k,s} D_k C_k}{f_{k,s}}.$$
The backhaul link delay $t^{\mathrm{trans}}_{k,m}$ is proportional to the transferred data size, i.e., the transfer delay between the SBS and MBS is $t^{\mathrm{trans}}_{k,m} = \phi \alpha_{k,m} D_k$. Since the MBS execution latency can be ignored, the delay of MBS computing $t_{k,m}$ is the sum of the offloading delay and the backhaul link delay:
$$t_{k,m} = t^{\mathrm{trans}}_{k,l} + \phi \alpha_{k,m} D_k.$$
Problem Formulation.
Based on equations (3) and (6), the energy consumption $E_k$ of the k-SMD, which consists of the computing and transmitting energy, is
$$E_k = \alpha_{k,l} D_k C_k \kappa f_{k,l}^2 + \frac{(\alpha_{k,s} + \alpha_{k,m}) D_k p_{k,tx}}{r_k}.$$
Because the task of the k-SMD is executed in parallel across the three tiers (local device, SBS, and MBS), its execution delay is
$$t_k = \max\{t_{k,l}, t_{k,s}, t_{k,m}\}.$$
The energy minimization problem under the task delay constraints is formulated as problem P1 in (14), where constraint (14b) requires the execution delay of each task to meet its deadline $T_k$.
Constraint (14c) indicates that the SBS computation capability allocated to the k-SMD cannot exceed the maximum allocated frequency.
Constraint (14e) requires the workloads of the local device, SBS, and MBS to sum to the total task load of the k-SMD.
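To make the delay and energy expressions of problem P1 concrete, the following Python sketch evaluates $t_{k,l}$, $t_{k,s}$, $t_{k,m}$, and $E_k$ for a given split and checks the constraints that the text states explicitly. It is a hedged illustration: `N0`, `G_K`, and the allocation share `BETA_K` are hypothetical, and the function names are ours.

```python
import math

# Parameters from the Numerical Results section.
D_K, C_K, T_K = 10e3, 1000.0, 2e-3     # task bits, cycles/bit, deadline [s]
F_LOCAL, KAPPA = 5e8, 1e-28            # local CPU [cycles/s], energy coefficient
B, PHI, F_SBS = 2e6, 1.25e-8, 8e9      # bandwidth [Hz], backhaul [s/bit], SBS CPU
P_MAX = 0.1                            # maximum transmit power [W]

# Hypothetical values (not given in the text).
N0, G_K = 1e-10, 4.6e-8                # noise power [W], channel gain
BETA_K = 0.1                           # share of F available to this SMD


def rate(p_tx):
    return B * math.log2(1.0 + p_tx * G_K / N0)


def delays(alpha, f_s, p_tx):
    """(t_kl, t_ks, t_km) for alpha = (a_l, a_s, a_m); p_tx > 0 if anything is offloaded."""
    a_l, a_s, a_m = alpha
    offloaded = a_s + a_m
    t_trans = offloaded * D_K / rate(p_tx) if offloaded > 0 else 0.0   # SMD -> SBS upload
    t_l = a_l * D_K * C_K / F_LOCAL                                     # local execution
    t_s = t_trans + (a_s * D_K * C_K / f_s if a_s > 0 else 0.0)         # upload + SBS execution
    t_m = t_trans + PHI * a_m * D_K                                     # upload + backhaul
    return t_l, t_s, t_m


def energy(alpha, p_tx):
    """E_k = local computing energy + transmit energy."""
    a_l, a_s, a_m = alpha
    offloaded = a_s + a_m
    e_comp = a_l * D_K * C_K * KAPPA * F_LOCAL ** 2
    e_trans = offloaded * D_K * p_tx / rate(p_tx) if offloaded > 0 else 0.0
    return e_comp + e_trans


def feasible(alpha, f_s, p_tx):
    """Deadline (14b), SBS share (14c), power limit, and full-task constraint (14e)."""
    if not (0.0 <= p_tx <= P_MAX and 0.0 < f_s <= BETA_K * F_SBS):
        return False
    if not (math.isclose(sum(alpha), 1.0) and all(0.0 <= a <= 1.0 for a in alpha)):
        return False
    if alpha[1] + alpha[2] > 0 and p_tx == 0.0:
        return False                      # offloading needs a positive transmit power
    return max(delays(alpha, f_s, p_tx)) <= T_K   # t_k = max(...) <= T_k


split = (0.05, 0.05, 0.9)
print(feasible(split, BETA_K * F_SBS, 0.1), f"{energy(split, 0.1):.3e} J")
```

The all-local split `(1.0, 0.0, 0.0)` is rejected here because its 20 ms local delay violates the 2 ms deadline.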

Problem Solution
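In this section, to gain engineering insights, an EERA scheme based on the variable substitution technique [6,32] is proposed to solve problem P1. First, we fix $\mathbf{p}_{tx}$ and find the optimal workload distribution $\boldsymbol{\alpha}^*_k$ and SBS computation capability allocation $\mathbf{f}^*_s$ by minimizing $\sum_{k=1}^{K} E_k$. Then, we use $\boldsymbol{\alpha}^*_k$ and $\mathbf{f}^*_s$ to find the optimal transmit power $\mathbf{p}^*_{tx}$. According to equations (3), (6), and (12), $E_k$ can be rewritten as
$$E_k = \alpha_{k,l} D_k C_k \kappa f_{k,l}^2 + (\alpha_{k,s} + \alpha_{k,m}) D_k \frac{p_{k,tx}}{r_k}.$$
Substituting equation (14e), i.e., $\alpha_{k,s} + \alpha_{k,m} = 1 - \alpha_{k,l}$, into (15), $E_k$ can be written as
$$E_k = \alpha_{k,l}\left(D_k C_k \kappa f_{k,l}^2 - D_k \frac{p_{k,tx}}{r_k}\right) + D_k \frac{p_{k,tx}}{r_k}.$$
Problem Decomposition.
When the transmission power $\mathbf{p}_{tx}$ is fixed, the second term of equation (16) is constant and can be eliminated, so problem P1 simplifies to problem P2.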
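In problem P2, the transmit power vector $\mathbf{p}_{tx}$ is fixed. Substituting the solution of problem P2 into equation (15) and optimizing $\mathbf{p}_{tx}$ by minimizing $\sum_{k=1}^{K} E_k$, we formulate problem P3.
Lemma 2. Based on Lemma 1, $e_k$ changes with $v_k = p_{k,tx}/r_k$ (equation (19)), the transmit energy per offloaded bit, as follows: (1) when $v_k > \kappa C_k f_{k,l}^2$, $e_k$ decreases monotonically as $\alpha_{k,l}$ increases; (2) when $v_k < \kappa C_k f_{k,l}^2$, $e_k$ increases monotonically as $\alpha_{k,l}$ increases; (3) when $v_k = \kappa C_k f_{k,l}^2$, $e_k = 0$ regardless of $\alpha_{k,l}$. The second and third cases can be proved in the same way as the first case.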
Based on Lemmas 1 and 2, we can determine whether problem P1 has a solution, which leads to Lemma 3.

Remark 4. In the first case of Lemma 2, $v_k > \kappa C_k f_{k,l}^2$, i.e., offloading consumes more energy per bit than local execution, so more bits should be processed on the local device to save energy. That is why $e_k$ decreases monotonically as $\alpha_{k,l}$ increases. In the second case of Lemma 2, $v_k < \kappa C_k f_{k,l}^2$, i.e., offloading consumes less energy per bit than local execution, so more bits should be offloaded to save energy. That is why $e_k$ increases monotonically as $\alpha_{k,l}$ increases.
Remark 5. According to Lemma 1, $v_k$ increases monotonically with $p_{k,tx}$. From equation (4), $r_k$ is a strictly increasing function of $p_{k,tx}$, so a larger $r_k$ requires a larger $p_{k,tx}$; therefore, the larger $r_k$ is, the larger $v_k$ is. According to equation (15), $E_k$ increases with $v_k$. Thus, $E_k$ increases with $r_k$, i.e., the energy consumption of the SMDs increases with the offloading rate. In other words, an SMD consumes more energy when it offloads at a higher rate.
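A quick numerical check of Lemma 1 and of the case split in Lemma 2 can be sketched in Python. This is an illustration only: the noise power `N0` and channel gain `G_K` are hypothetical, and `lemma2_case` is a name we introduce. The sketch compares $v_k = p_{k,tx}/r_k$ with the local per-bit energy $\kappa C_k f_{k,l}^2$.

```python
import math

# Parameters from the text; N0 and G_K are hypothetical.
C_K, F_LOCAL, KAPPA, B = 1000.0, 5e8, 1e-28, 2e6
N0, G_K = 1e-10, 4.6e-8

THRESHOLD = KAPPA * C_K * F_LOCAL ** 2   # local energy per bit, kappa*C_k*f_{k,l}^2 [J/bit]


def rate(p_tx):
    return B * math.log2(1.0 + p_tx * G_K / N0)


def v(p_tx):
    """Transmit energy per offloaded bit, v_k = p_{k,tx} / r_k (requires p_tx > 0)."""
    return p_tx / rate(p_tx)


def lemma2_case(p_tx):
    """1: offloading costs more per bit; 2: offloading costs less; 3: equal."""
    vk = v(p_tx)
    if vk > THRESHOLD:
        return 1
    if vk < THRESHOLD:
        return 2
    return 3


powers = [0.001 * i for i in range(1, 201)]   # 1 mW ... 200 mW
values = [v(p) for p in powers]
# Lemma 1 / Remark 5: v_k grows with p_{k,tx}.
assert all(a < b for a, b in zip(values, values[1:]))
print(f"threshold = {THRESHOLD:.2e} J/bit, case at 0.1 W: {lemma2_case(0.1)}")
```

Under these assumed channel values, every power up to $p^{\max}_{k,tx} = 0.1$ W falls in the second case, so offloading is cheaper per bit; a much larger power (e.g., 10 W) would push $v_k$ above the threshold.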
Substituting equation (13) into inequality (14b), we get $t_{k,l} \le T_k$, $t_{k,s} \le T_k$, and $t_{k,m} \le T_k$. To simplify problem P2, we compare $t_{k,s}$ with $t_{k,m}$, which splits problem P2 into problems P2.1 and P2.2.
When $t_{k,s} \ge t_{k,m}$, i.e., the delay of SBS computing is no smaller than that of MBS computing, problem P2 becomes problem P2.1, formulated in (22). When $t_{k,s} < t_{k,m}$, i.e., the delay of MBS computing is larger than that of SBS computing, problem P2 becomes problem P2.2, formulated in (23). Following Lemma 2, we solve problem P1 by treating the three cases separately.
(1) $v_k > \kappa C_k f_{k,l}^2$: when offloading consumes more energy per bit than local execution, the following results hold.
Lemma 6. Problems P2.1 and P2.2 have the same optimal local task load
$$\alpha^*_{k,l} = \frac{f_{k,l} T_k}{D_k C_k}.$$
Proof. From inequalities (22b) and (23b), $\alpha_{k,l} \le f_{k,l} T_k/(D_k C_k)$. By the first case of Lemma 2, $e_k$ decreases monotonically as $\alpha_{k,l}$ increases. Therefore, we take $\alpha^*_{k,l} = f_{k,l} T_k/(D_k C_k)$.
Remark 7. According to equation (24), the local workload depends on the local computation capability and the task delay constraint. A larger local computation capability yields a larger local workload: to save energy, an SMD processes as many bits locally as possible while the processing latency still meets the task delay constraint. Likewise, a looser delay constraint gives the local device more time to execute the task, so more bits are processed locally to save energy.
Lemma 8. Let $\alpha^*_{k,s}$, $\alpha^*_{k,m}$, and $f^*_{k,s}$ denote the optimal SBS workload, the optimal MBS workload, and the optimal computation capability allocated by the SBS, respectively. When $v_k > \kappa C_k f_{k,l}^2$, both problems P2.1 and P2.2 have
$$\alpha^*_{k,s} = \frac{\phi f^*_{k,s}}{C_k} \alpha^*_{k,m}, \qquad f^*_{k,s} = \beta_k F,$$
where $\beta_k F$ is the maximum SBS computation capability that can be allocated to the k-SMD (constraint (17c)).
Proof. See Appendix C.
Remark 9. According to equation (25), $\alpha^*_{k,s}$ depends on the backhaul link delay coefficient $\phi$ and the computation capability $f^*_{k,s}$ allocated by the SBS. When more SBS computation capability is allocated to the k-SMD or the backhaul link delay is large, the SBS workload becomes larger. In other words, the task is preferentially executed at the SBS unless MBS execution takes less time.
When $v_k > \kappa C_k f_{k,l}^2$, based on Lemmas 6 and 8, the solution of problem P2 is given by Theorem 10.
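Theorem 10. The optimal workload distribution $\boldsymbol{\alpha}^*_k$ and the optimal allocation of SBS computation capability $f^*_{k,s}$ are
$$\alpha^*_{k,l} = \frac{f_{k,l} T_k}{D_k C_k}, \quad \alpha^*_{k,s} = \frac{(1 - \alpha^*_{k,l}) \phi \beta_k F}{C_k + \phi \beta_k F}, \quad \alpha^*_{k,m} = \frac{(1 - \alpha^*_{k,l}) C_k}{C_k + \phi \beta_k F}, \quad f^*_{k,s} = \beta_k F.$$
Proof. Substituting equations (24)-(26) into equation (17d), we obtain the optimal allocation of SBS computation capability and the optimal workload distribution.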
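Theorem 10 gives a closed-form split, which the following Python sketch evaluates with the simulation parameters. It is a sketch under assumptions: the allocation share `BETA_K = 0.1` is hypothetical (the text does not fix $\beta_k$), `theorem10_split` is our own name, and the cap at 1 on $\alpha_{k,l}$ simply enforces $\alpha_{k,l} \le 1$. The formulas are evaluated regardless of whether case (1) actually holds for these numbers.

```python
import math

# Numerical-section parameters; BETA_K is a hypothetical allocation share.
D_K, C_K, T_K = 10e3, 1000.0, 2e-3
F_LOCAL, F_SBS, PHI = 5e8, 8e9, 1.25e-8
BETA_K = 0.1


def theorem10_split(f_l=F_LOCAL, T=T_K, D=D_K, C=C_K, phi=PHI, f_s=BETA_K * F_SBS):
    """Workload split for case (1), v_k > kappa*C_k*f_{k,l}^2 (Lemmas 6 and 8)."""
    # Lemma 6: fill the local CPU up to the deadline (capped at the whole task).
    a_l = min(1.0, f_l * T / (D * C))
    rest = 1.0 - a_l
    # Lemma 8: a_s = (phi f_s / C) a_m, i.e., the SBS and MBS parts finish together.
    a_s = rest * phi * f_s / (C + phi * f_s)
    a_m = rest * C / (C + phi * f_s)
    return (a_l, a_s, a_m), f_s


(alpha_l, alpha_s, alpha_m), f_s = theorem10_split()
print(f"alpha_l={alpha_l:.4f}, alpha_s={alpha_s:.6f}, alpha_m={alpha_m:.6f}, f_s={f_s:.3e}")
```

Here $\alpha^*_{k,l} = 0.1$, and because the backhaul costs $\phi = 1.25 \times 10^{-8}$ s/bit while SBS execution costs $C_k/f^*_{k,s} = 1.25 \times 10^{-6}$ s/bit, almost all offloaded bits go to the MBS, consistent with Remark 9.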
By Remark 5, the optimal transmission rate $r^*_k$ can be calculated from Lemma 11, after which problem P3 can be solved.
Lemma 11. Problems P2.1 and P2.2 have the same optimal transmission rate
$$r^*_k = \frac{(1 - \alpha^*_{k,l}) D_k}{T_k - \alpha^*_{k,s} D_k C_k / f^*_{k,s}}.$$
Proof. According to inequalities (C.3) and (C.9), we choose the lower bound of $r_k$ as $r^*_k$ to save energy. By Lemma 8, $r^*_k$ is the same for problems P2.1 and P2.2, which gives equation (29).
Theorem 12. The optimal transmission power is
$$p^*_{k,tx} = \frac{N_0}{g_k}\left(2^{r^*_k / B} - 1\right).$$
Remark 13. As equation (30) shows, a smaller $\alpha^*_{k,l}$ or a larger $\alpha^*_{k,s}$ leads to a larger transmission power $p^*_{k,tx}$ for the k-SMD. When the proportion of the task executed locally is small, the offloading rate must be large enough to meet the task delay constraint, which results in a large transmission power. Similarly, a larger $\alpha^*_{k,s}$ means that more bits are processed at the SBS, which leaves less time for uploading, requires a larger offloading rate, and thus a larger transmit power.
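Lemma 11 and Theorem 12 amount to choosing the slowest rate that still meets the deadline and inverting the rate formula. The Python sketch below illustrates this under assumptions: `N0`, `G_K`, and `BETA_K` are hypothetical, the workload split is recomputed from Theorem 10, and the comparison with $p^{\max}_{k,tx}$ is our own sanity check (the text does not state what happens if $p^*_{k,tx}$ exceeds it).

```python
import math

# Parameters from the text; N0, G_K and BETA_K are hypothetical.
D_K, C_K, T_K = 10e3, 1000.0, 2e-3
F_LOCAL, F_SBS, PHI, B = 5e8, 8e9, 1.25e-8, 2e6
P_MAX = 0.1
N0, G_K, BETA_K = 1e-10, 4.6e-8, 0.1


def optimal_rate_and_power(a_l, a_s, f_s):
    """Lemma 11 and Theorem 12: smallest rate meeting the SBS deadline, then invert r_k(p)."""
    slack = T_K - a_s * D_K * C_K / f_s       # time left for uploading
    if slack <= 0:
        raise ValueError("SBS execution alone exceeds the deadline")
    r_star = (1.0 - a_l) * D_K / slack
    p_star = N0 / G_K * (2.0 ** (r_star / B) - 1.0)   # inverse of r = B log2(1 + p g / N0)
    return r_star, p_star


# Workload split from Theorem 10 with the same parameters.
f_s = BETA_K * F_SBS
a_l = F_LOCAL * T_K / (D_K * C_K)
a_s = (1 - a_l) * PHI * f_s / (C_K + PHI * f_s)
r_star, p_star = optimal_rate_and_power(a_l, a_s, f_s)
print(f"r* = {r_star:.4e} bit/s, p* = {p_star:.4e} W, within p_max: {p_star <= P_MAX}")
```

By construction, uploading at $r^*_k$ and then executing $\alpha^*_{k,s}$ at the SBS finishes exactly at $T_k$, which is the tight delay constraint behind the lower bound in (C.3).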
(2) $v_k < \kappa C_k f_{k,l}^2$: when offloading consumes less energy per bit than local execution, offloading takes priority over local execution, i.e., a smaller $\alpha_{k,l}$ saves more energy.
For problem P2.1, the optimal local workload is given by Lemma 14.

Lemma 14. The optimal $\alpha_{k,l}$ of problem P2.1 is
$$\alpha_{k,l} = 1 - \frac{T_k r_k}{D_k} + \frac{\alpha_{k,s} C_k r_k}{f_{k,s}}.$$
Proof. Substituting equations (5), (8), and (9) into inequality (22c), we have $\alpha_{k,l} \ge 1 - T_k r_k/D_k + \alpha_{k,s} C_k r_k/f_{k,s}$. Since a smaller $\alpha_{k,l}$ leads to less energy consumption of the k-SMD, we take $\alpha_{k,l} = 1 - T_k r_k/D_k + \alpha_{k,s} C_k r_k/f_{k,s}$.
Similarly to Lemma 14, using inequality (23c), we obtain the optimal local workload of problem P2.2 in Lemma 15.

Lemma 15. The optimal $\alpha_{k,l}$ of problem P2.2 is
$$\alpha_{k,l} = 1 - \frac{T_k r_k}{D_k} + \phi \alpha_{k,m} r_k.$$
Lemma 16. When $v_k < \kappa C_k f_{k,l}^2$, the optimal MBS workload $\alpha^*_{k,m}$ and SBS workload $\alpha^*_{k,s}$ satisfy
$$\alpha^*_{k,s} = \frac{\phi f_{k,s}}{C_k} \alpha^*_{k,m}.$$
Proof. For problem P2.1, we have $t_{k,s} \ge t_{k,m}$, which by equations (9) and (11) gives $\alpha_{k,s} \ge (\phi f_{k,s}/C_k)\alpha_{k,m}$. From equation (31), a smaller $\alpha_{k,s}/f_{k,s}$ saves more energy, so we take $\alpha_{k,s} = (\phi f_{k,s}/C_k)\alpha_{k,m}$. For problem P2.2, $t_{k,s} < t_{k,m}$ gives $\alpha_{k,m} > (C_k/(\phi f_{k,s}))\alpha_{k,s}$.
According to equation (32), a smaller $\alpha_{k,m}$ yields a smaller $\alpha_{k,l}$ and saves more energy. Moreover, because $\alpha_{k,m}$ is continuous, it can approach $(C_k/(\phi f_{k,s}))\alpha_{k,s}$ arbitrarily closely. Hence, $\alpha^*_{k,m} = (C_k/(\phi f_{k,s}))\alpha^*_{k,s}$.

Remark 17. The relation $\alpha^*_{k,s} = (\phi f_{k,s}/C_k)\alpha^*_{k,m}$ holds whether or not $v_k$ exceeds $\kappa C_k f_{k,l}^2$. This indicates that the per-bit offloading energy does not affect the relation between $\alpha^*_{k,s}$ and $\alpha^*_{k,m}$, which depends only on the computation capability allocated by the SBS and the transfer delay of the backhaul link; i.e., the workload split between the SBS and MBS is jointly determined by the SBS computation capability allocated to the k-SMD and the MBS time cost.
Remark 18. From Lemmas 14, 15, and 16, problems P2.1 and P2.2 have the same optimal $\alpha_{k,l}$. In other words, the optimal local workload $\alpha^*_{k,l}$ is independent of the workload distribution between the SBS and MBS.

Remark 19. In the second case of Lemma 2, problems P2.1 and P2.2 have the same optimal local workload $\alpha^*_{k,l}$ and the same relation between $\alpha^*_{k,s}$ and $\alpha^*_{k,m}$. Therefore, by equation (17d), they have the same optimal workload distribution $\boldsymbol{\alpha}^*_k$. Based on Remark 19, the solution of problem P2 is given by Theorem 20.

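Theorem 20. When $v_k < \kappa C_k f_{k,l}^2$, the optimal SBS computation capability allocation $f^*_{k,s}$ and the optimal workload distribution $\boldsymbol{\alpha}^*_k$ among the SMD, SBS, and MBS are
$$f^*_{k,s} = \beta_k F, \qquad \alpha^*_{k,l} = 1 - \frac{T_k r_k}{D_k} + \frac{\phi T_k C_k r_k^2}{D_k(\phi C_k r_k + C_k + \phi f^*_{k,s})},$$
$$\alpha^*_{k,s} = \frac{(1 - \alpha^*_{k,l}) \phi f^*_{k,s}}{C_k + \phi f^*_{k,s}}, \qquad \alpha^*_{k,m} = \frac{(1 - \alpha^*_{k,l}) C_k}{C_k + \phi f^*_{k,s}}.$$
Proof. Substituting equations (31) and (33) into equation (17d) yields the optimal workload distribution $\boldsymbol{\alpha}^*_k$. In addition, equation (31) shows that $\alpha_{k,l}$ decreases as $f_{k,s}$ increases; a larger $f_{k,s}$ yields a smaller $\alpha_{k,l}$ and saves more energy. Thus, we take $f^*_{k,s} = \beta_k F$.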
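In case (2) the split depends on the transmit power through $r_k$. The following Python sketch evaluates the Theorem 20 expressions at a given power. It is illustrative only: `N0`, `G_K`, and `BETA_K` are hypothetical, `theorem20_split` is our name, and clipping $\alpha_{k,l}$ to $[0, 1]$ is our addition, not stated in the text.

```python
import math

D_K, C_K, T_K = 10e3, 1000.0, 2e-3
F_SBS, PHI, B = 8e9, 1.25e-8, 2e6
N0, G_K, BETA_K = 1e-10, 4.6e-8, 0.1   # hypothetical


def rate(p_tx):
    return B * math.log2(1.0 + p_tx * G_K / N0)


def theorem20_split(p_tx):
    """Workload split for case (2), v_k < kappa*C_k*f_{k,l}^2, at a given transmit power."""
    f_s = BETA_K * F_SBS                  # f*_{k,s} = beta_k F
    r = rate(p_tx)
    denom = PHI * C_K * r + C_K + PHI * f_s
    a_l = 1.0 - T_K * r / D_K + PHI * T_K * C_K * r ** 2 / (D_K * denom)
    a_l = min(1.0, max(0.0, a_l))         # keep alpha_{k,l} in [0, 1] (our addition)
    rest = 1.0 - a_l
    a_s = rest * PHI * f_s / (C_K + PHI * f_s)
    a_m = rest * C_K / (C_K + PHI * f_s)
    return (a_l, a_s, a_m), f_s


(a_l, a_s, a_m), f_s = theorem20_split(0.001)
upload = (a_s + a_m) * D_K / rate(0.001)
print(f"alpha = ({a_l:.4f}, {a_s:.6f}, {a_m:.4f}); t_ks = {upload + a_s * D_K * C_K / f_s:.6f} s")
```

When no clipping occurs, the resulting split makes both $t_{k,s}$ and $t_{k,m}$ equal to $T_k$, so the offloaded fraction is the largest one the deadline allows at that rate.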
For problem P3, substituting equations (34) and (35) into $E_k$ gives the optimal transmit power $p^*_{k,tx}$ in Theorem 21.

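Theorem 21. When $v_k < \kappa C_k f_{k,l}^2$, the optimal transmission power $p^*_{k,tx}$ is characterized by $p$ and $v_k^{-1}$, where $p$ satisfies $\left.\frac{dQ}{dp_{k,tx}}\right|_{p_{k,tx}=p} = 0$, $v_k^{-1}$ denotes the inverse function of $v_k$, and $Q$ is defined as
$$Q = \frac{T_k p_{k,tx} - \kappa C_k f_{k,l}^2 T_k r_k}{\phi C_k r_k + C_k + \phi f_{k,s}}.$$
Proof. See Appendix D.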
Since $v_k^{-1}$ and the root of $\left.\frac{dQ}{dp_{k,tx}}\right|_{p_{k,tx}=p} = 0$ are difficult to obtain in closed form, we compute $p^*_{k,tx}$ numerically. In the first step, we use MATLAB to obtain the maximum transmission power $p^{\mathrm{end}}_{k,tx} = v_k^{-1}(\kappa C_k f_{k,l}^2)$. In the second step, we use binary search to find the optimal transmit power $p^*_{k,tx}$ in $[0, p^{\mathrm{end}}_{k,tx}]$ that minimizes $Q$. The variables $e$, $d$, and $p^{\mathrm{mid}}_{k,tx}$ denote the search error, the search interval, and the interval midpoint, respectively; the search continues until $d < e$. The detailed search process is summarized in Algorithm 1.
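Algorithm 1 itself is not reproduced in the text, so the following Python sketch shows one way to carry out the two steps under stated assumptions. It replaces the MATLAB step with a bisection on the increasing function $v_k$, and it performs the binary search on the sign of a central-difference estimate of $dQ/dp_{k,tx}$ at the midpoint (our choice; the text does not say what is compared there). `N0`, `G_K`, and `BETA_K` are hypothetical. The sign test assumes that $Q$ has a single minimizer on $[0, p^{\mathrm{end}}_{k,tx}]$; this holds here because, viewed as a function of $r_k$, $Q$ is a strictly convex function divided by a positive affine one. Like the text, the sketch does not enforce $p^{\max}_{k,tx}$.

```python
import math

# Parameters from the text; N0, G_K, BETA_K are hypothetical.
C_K, T_K, F_LOCAL, KAPPA = 1000.0, 2e-3, 5e8, 1e-28
F_SBS, PHI, B = 8e9, 1.25e-8, 2e6
N0, G_K, BETA_K = 1e-10, 4.6e-8, 0.1
A = KAPPA * C_K * F_LOCAL ** 2          # kappa*C_k*f_{k,l}^2
F_S = BETA_K * F_SBS                    # f*_{k,s} from Theorem 20


def rate(p):
    return B * math.log2(1.0 + p * G_K / N0)


def v(p):
    return p / rate(p)


def Q(p):
    """Q from (D.2); E_k = D_k C_k kappa f^2 + (C_k + phi f_s) * Q."""
    r = rate(p)
    return (T_K * p - A * T_K * r) / (PHI * C_K * r + C_K + PHI * F_S)


def inverse_v(target, tol=1e-12):
    """Step 1: p_end = v^{-1}(target) by bisection (v is increasing in p)."""
    lo, hi = 1e-12, 1.0
    if v(lo) >= target:
        return 0.0                      # no power satisfies v_k < target
    while v(hi) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def minimize_q(p_end, e=1e-10, h=1e-7):
    """Step 2: binary search on the sign of dQ/dp over [0, p_end]."""
    lo, hi = 0.0, p_end
    while hi - lo >= e:                 # d = hi - lo; stop once d < e
        mid = 0.5 * (lo + hi)
        slope = (Q(mid + h) - Q(mid - h)) / (2.0 * h)   # central difference
        if slope > 0:
            hi = mid                    # minimum lies to the left
        else:
            lo = mid                    # minimum lies to the right
    return 0.5 * (lo + hi)


p_end = inverse_v(A)
p_star = minimize_q(p_end)
print(f"p_end = {p_end:.4f} W, p* = {p_star:.4f} W, Q(p*) = {Q(p_star):.3e}")
```

Note that $Q = 0$ at both ends of the interval ($p_{k,tx} = 0$ and $v_k = \kappa C_k f_{k,l}^2$), so any power strictly between them saves energy relative to those endpoints.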
(3) $v_k = \kappa C_k f_{k,l}^2$: in this case $e_k = 0$, so $\alpha_{k,l}$ does not affect $e_k$. The energy consumed per bit by local execution equals the energy consumed per bit by offloading, and offloading cannot reduce the energy consumption of task execution. We therefore execute each task either entirely on the local device or entirely by offloading, as stated in Theorem 22.

Theorem 22. (1) When $t^{\mathrm{all}}_{k,l} \le t^*_{k,\mathrm{off}}$, the task is executed entirely by the local device, i.e., $\alpha^*_{k,l} = 1$. (2) When $t^{\mathrm{all}}_{k,l} > t^*_{k,\mathrm{off}}$, the task is executed entirely by offloading, i.e., $\alpha^*_{k,l} = 0$. In (1) and (2), the latency of executing the entire task locally is $t^{\mathrm{all}}_{k,l} = D_k C_k / f_{k,l}$, and the minimum offloading latency is
$$t^*_{k,\mathrm{off}} = \frac{D_k}{B \log_2\left(1 + v_k^{-1}(\kappa C_k f_{k,l}^2)\, g_k / N_0\right)} + \frac{\phi D_k C_k}{C_k + \phi f^*_{k,s}}.$$
Proof. See Appendix E.
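Theorem 22 reduces to a latency comparison, which the Python sketch below carries out. It is an illustration under assumptions: `N0`, `G_K`, and `BETA_K` are hypothetical, the inverse of $v_k$ is computed by bisection, and `theorem22_decision` is our own name. With the simulation parameters, the all-local latency $D_k C_k / f_{k,l} = 20$ ms is already far above the offloading latency.

```python
import math

D_K, C_K, F_LOCAL, KAPPA = 10e3, 1000.0, 5e8, 1e-28
F_SBS, PHI, B = 8e9, 1.25e-8, 2e6
N0, G_K, BETA_K = 1e-10, 4.6e-8, 0.1    # hypothetical


def rate(p):
    return B * math.log2(1.0 + p * G_K / N0)


def inverse_v(target, tol=1e-12):
    """Bisection for p with p / r_k(p) = target (v_k is increasing)."""
    lo, hi = 1e-12, 1.0
    if lo / rate(lo) >= target:
        raise ValueError("target below the smallest achievable v_k")
    while hi / rate(hi) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / rate(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem22_decision():
    t_local_all = D_K * C_K / F_LOCAL                             # t^all_{k,l}
    p = inverse_v(KAPPA * C_K * F_LOCAL ** 2)                     # v_k^{-1}(kappa C f^2)
    f_s = BETA_K * F_SBS                                          # f*_{k,s}
    t_off = D_K / rate(p) + PHI * D_K * C_K / (C_K + PHI * f_s)   # t*_{k,off}
    return ("local" if t_local_all <= t_off else "offload"), t_local_all, t_off


choice, t_loc, t_off = theorem22_decision()
print(f"{choice}: t_local = {t_loc:.4f} s, t_off = {t_off:.6f} s")
```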
At this point, the optimal solution of problem P1 is given by the theorems above, and the overall procedure is described in Algorithm 2.

Analysis of Special Cases.
Theorems 10, 12, 20, and 21 consider the delay constraint in addition to energy minimization. That is why resources are still allocated across the tiers even when it is known which execution mode consumes the least energy per bit.
Theorem 22 considers only the latency. In this case, the energy consumed per bit by offloading equals that consumed per bit by local execution, so offloading cannot reduce the SMDs' energy consumption. However, we can choose the option with the smaller delay to improve users' QoE. Therefore, the task is executed either locally or remotely according to the latencies of local execution and offloading.

Numerical Results
In this section, numerical results are given to evaluate the performance of the proposed EERA scheme in comparison with three baseline schemes: local-computing-only, full-offloading, and computing-without-MBS. Unless stated otherwise, the parameters are set as follows. The task models of all SMDs are identical, with $D_k = 10$ kbits [15], $C_k = 1000$ cycles/bit [15], and $T_k = 2$ ms ($k \in \mathcal{K}$) [9]. The local computation capability $f_{k,l}$ is $5 \times 10^8$ cycles/s [33], and the energy coefficient of local computation is $\kappa = 10^{-28}$ [15]. The maximum transmission power $p^{\max}_{k,tx}$ is 0.1 W [6]. The computation capability of the SBS server is $F = 8 \times 10^9$ cycles/s [34], and the backhaul time delay coefficient is $\phi = 1.25 \times 10^{-8}$ s/bit [24]. We consider a Rayleigh fading channel model in which the channel gain is $g_k = \lambda \bar{g}_k$, where $\lambda$ is an independent exponential random variable and the average channel gain follows
$$\bar{g}_k = A_d \left(\frac{3 \times 10^8}{4 \pi f_c d_k}\right)^{d_e},$$
where $A_d = 4.11$ denotes the antenna gain, $f_c = 915$ MHz denotes the carrier frequency, $d_k = 18$ m denotes the distance from the SBS to the k-SMD [6], and $d_e = 2.8$ denotes the path loss exponent. The channel bandwidth $B$ is 2 MHz [31].
Algorithm 2 (excerpt). Step 1: According to Theorems 10 and 12, calculate $\boldsymbol{\alpha}^*_k$, $\mathbf{f}^*_s$, and $\mathbf{p}^*_{tx}$. Step 2: Based on equation (19), compute $v^*_k$ by substituting the results of Step 1.
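The channel model can be reproduced in a few lines. The Python sketch below computes the free-space average gain $\bar{g}_k$ and draws Rayleigh-faded gains $g_k = \lambda \bar{g}_k$; it assumes that $\lambda$ has unit mean (the text only says it is exponential), and the function names are ours.

```python
import math
import random

A_D, F_C, D_E = 4.11, 915e6, 2.8       # antenna gain, carrier [Hz], path-loss exponent
C_LIGHT = 3e8                           # speed of light [m/s]


def average_gain(distance_m):
    """Free-space path-loss model for the average gain g_bar_k."""
    return A_D * (C_LIGHT / (4.0 * math.pi * F_C * distance_m)) ** D_E


def sample_gain(distance_m, rng):
    """Rayleigh fading: g_k = lambda * g_bar_k with lambda ~ Exp(1) (unit mean assumed)."""
    return rng.expovariate(1.0) * average_gain(distance_m)


rng = random.Random(0)
g_bar = average_gain(18.0)
samples = [sample_gain(18.0, rng) for _ in range(10000)]
print(f"g_bar(18 m) = {g_bar:.3e}, sample mean / g_bar = {sum(samples) / len(samples) / g_bar:.3f}")
```

At $d_k = 18$ m the average gain is roughly $4.6 \times 10^{-8}$, and it decreases with distance, which is what drives the rising energy curves discussed for Figure 5.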

Performance of EERA.
In this subsection, we compare the performance of EERA with that of local-computing-only, full-offloading, and computing-without-MBS. Figures 2-5 present the energy consumption of the SMDs under different conditions; in all of them, the proposed EERA achieves the lowest energy consumption among the four methods. Figure 2 plots the sum energy consumption of all SMDs versus the user number K. The energy consumption of all schemes increases as the number of users grows. Moreover, the energy consumption of computing-without-MBS is close to that of full-offloading when there are fewer than 15 users and close to that of local-computing-only when there are more than 15 users. The reason is that the computation resource each user obtains from the SBS server decreases as the number of users increases, so the SMDs process more bits locally to meet the task deadlines. EERA outperforms the other schemes because the MBS server provides additional computation capacity, and the resulting lower execution latency leaves more time for offloading. Figure 3 depicts the sum energy consumption of all SMDs versus the computation task size D. The energy consumption of all four schemes rises as the task size grows. When the task size is small, the energy consumption of computing-without-MBS is lower than that of local-computing-only and higher than that of full-offloading; when the task size is large, computing-without-MBS approaches full-offloading. This indicates that, under the task latency constraints, the number of locally computed bits decreases as the task size increases. EERA consumes the least energy among these methods. In addition, the gap between EERA and full-offloading widens as the task size increases up to 9.45 kbits and narrows beyond 9.45 kbits.
The reason is that offloading consumes less energy, so EERA processes more bits by offloading when the task size is small; as the task size grows, more bits must be offloaded to meet the task latency demand. Figure 4 shows the sum energy consumption versus the channel bandwidth B. As the channel bandwidth increases, the energy consumption of local-computing-only remains unchanged while that of the other schemes decreases. The reason is that local-computing-only does not involve offloading, whereas the other schemes can reduce their transmit power under the time delay constraints thanks to the larger bandwidth. The gap between full-offloading and computing-without-MBS widens as the channel bandwidth grows: full-offloading has more computation capability than computing-without-MBS and thus lower execution latency, which leaves more time for offloading and lowers the transmit power. EERA gradually approaches full-offloading as the channel bandwidth rises, indicating that EERA processes more bits by offloading when the channel bandwidth is wider. Figure 5 shows the sum energy consumption versus the distance from the SBS to the users. The energy consumption of all schemes except local-computing-only rises as the distance increases; as in Figure 4, local-computing-only does not depend on the communication distance. A longer distance leads to a larger path loss, which requires a higher transmit power to meet the time delay constraint. The energy consumption of EERA is lower than that of computing-without-MBS.
This is because the MBS server lowers the execution latency and hence the transmit power. Moreover, the gap between EERA and full-offloading widens with distance, which shows that EERA offloads fewer bits as the communication distance grows.
In this subsection, we analyze the energy consumption with respect to the backhaul time delay coefficient under different conditions, i.e., varying latency constraints, user numbers, and computation task sizes. Figure 6 plots the sum energy consumption of all SMDs versus the execution latency constraint for different backhaul time delay coefficients ϕ. The energy consumption decreases as the latency constraint increases, because more time can be used for offloading, which allows a lower transmit power when the latency constraints are relaxed. Figure 7(a) shows the energy consumption versus the user number under different ϕ, and Figure 7(b) depicts the energy consumption versus the computation task size for different ϕ. Together with Figure 6, these results show that a larger backhaul time delay coefficient results in larger energy consumption for all considered latency constraints, user numbers, and task sizes. The reason is that a larger backhaul time delay coefficient increases the execution time and reduces the time available for offloading, so the transmit power must increase to satisfy the task latency constraints.

Conclusion
In this paper, we investigated resource allocation mechanisms for a three-tier MEC architecture in heterogeneous networks, in which both the MBS and SBS are integrated with MEC servers and combined with local devices to form a three-tier computing architecture. Each task of an SMD can be divided into three parts, performed by the SMD, SBS, and MBS, respectively. We formulated an optimization problem to minimize all SMDs' energy consumption under time delay constraints. To improve the efficiency of resource allocation, we proposed an EERA mechanism based on the variable substitution technique, which jointly optimizes the computation and radio resources. The optimal workload placement strategy among the SMDs, SBS, and MBS was derived, and the optimal computation capability allocation and SMDs' transmit power were obtained. Finally, numerical simulation results showed that, compared with the benchmark schemes, the proposed EERA scheme significantly reduces the SMDs' energy consumption.

Appendix A. Proof of Lemma 1
Substituting equation (4) into equation (19), we rewrite $v_k$ as
$$v_k = \frac{p_{k,tx}}{r_k} = \frac{p_{k,tx}}{B\log_2\left(1 + p_{k,tx} g_k / N_0\right)}. \tag{A.1}$$
The derivative of $r_k$ with respect to the transmit power $p_{k,tx}$ is denoted as $r_k'$ and can be calculated as
$$r_k' = \frac{B g_k}{\left(N_0 + p_{k,tx} g_k\right)\ln 2}.$$
Hence $dv_k/dp_{k,tx} = Z/r_k^2$ with $Z = r_k - p_{k,tx} r_k'$. Since $dZ/dp_{k,tx} = -p_{k,tx} r_k'' \ge 0$ (the second derivative $r_k''$ is negative), $Z$ increases with $p_{k,tx}$. In addition, $Z = 0$ when $p_{k,tx} = 0$. Thus, $Z \ge 0$ holds. Then, we have $dv_k/dp_{k,tx} \ge 0$, and $v_k$ increases monotonically with $p_{k,tx}$. This completes the proof.
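The monotonicity argument of Lemma 1 can be checked numerically. The sketch below assumes illustrative values for $B$, $g_k$, and $N_0$ (not taken from the paper) and evaluates $v_k = p_{k,tx}/r_k$ and $Z = r_k - p_{k,tx} r_k'$ on a grid of transmit powers.

```python
import math

# Illustrative parameters (assumed, not from the paper): bandwidth B,
# channel gain g_k, and noise power N_0 in consistent units.
B, G, N0 = 1.0, 2.0, 0.5

def rate(p):
    """Achievable rate r_k = B log2(1 + p g_k / N_0), as in (A.1)."""
    return B * math.log2(1.0 + p * G / N0)

def rate_prime(p):
    """First derivative r_k' = B g_k / ((N_0 + p g_k) ln 2)."""
    return B * G / ((N0 + p * G) * math.log(2))

def v(p):
    """Energy per offloaded bit v_k = p / r_k (defined for p > 0)."""
    return p / rate(p)

def z(p):
    """Z = r_k - p r_k', the numerator of dv_k/dp = Z / r_k^2."""
    return rate(p) - p * rate_prime(p)

powers = [0.01 * i for i in range(1, 501)]
vs = [v(p) for p in powers]
zs = [z(p) for p in powers]

# Lemma 1: Z >= 0 for p >= 0, so v_k is nondecreasing in p.
print(all(b >= a for a, b in zip(vs, vs[1:])))
print(min(zs) >= 0.0, z(0.0))
```

Because $r_k$ is concave in $p_{k,tx}$, $Z$ is nonnegative, which is exactly what the grid check reflects.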

B. Proof of Lemma 3
The energy consumption must be nonnegative, i.e., $E_k \ge 0$ always holds. According to Lemma 2, we have the following three cases: (1) In the first case of Lemma 2, i.e., $p_{k,tx}/r_k > \kappa C_k f_{k,l}^2$.
(2) In the second case of Lemma 2, i.e., $p_{k,tx}/r_k < \kappa C_k f_{k,l}^2$. The argument is similar to the first case, and there exists $\alpha_{k,l} \in [0, 1]$.
(3) In the third case of Lemma 2, i.e., $p_{k,tx}/r_k = \kappa C_k f_{k,l}^2$. It is obvious that $E_k \ge 0$.
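The three cases can be illustrated with the affine energy form that appears in (D.1), $E_k = \alpha_{k,l}\left(D_k C_k \kappa f_{k,l}^2 - D_k p_{k,tx}/r_k\right) + D_k p_{k,tx}/r_k$. The sketch below uses hypothetical numbers chosen so that $\kappa C_k f_{k,l}^2 = 1$ and $p_{k,tx}/r_k$ falls into each case; for $\alpha_{k,l} \in [0,1]$ the energy is a convex combination of the all-local and all-offload energies, and the sign of the slope follows the case.

```python
def energy(alpha_l, d, c, kappa, f_l, p, r):
    """E_k as an affine function of the local share alpha_l (form read off (D.1))."""
    local = d * c * kappa * f_l ** 2   # all-local energy: D_k C_k kappa f_{k,l}^2
    offload = d * p / r                # all-offload energy: D_k p_{k,tx} / r_k
    return alpha_l * (local - offload) + offload

# Hypothetical values: kappa * C_k * f_{k,l}^2 = 0.5 * 2 * 1 = 1.
common = dict(d=10.0, c=2.0, kappa=0.5, f_l=1.0)
cases = {
    "case 1 (p/r > kappa C f^2)": (2.0, 1.0),   # (p_{k,tx}, r_k)
    "case 2 (p/r < kappa C f^2)": (1.0, 2.0),
    "case 3 (p/r = kappa C f^2)": (1.0, 1.0),
}
grid = [i / 10 for i in range(11)]            # alpha_{k,l} in [0, 1]
results = {}
for name, (p, r) in cases.items():
    values = [energy(a, p=p, r=r, **common) for a in grid]
    results[name] = values
    # Nonnegativity on [0, 1], plus the energies at alpha_l = 0 and alpha_l = 1.
    print(name, min(values) >= 0.0, values[0], values[-1])
```

In case 1 the energy decreases in $\alpha_{k,l}$, in case 2 it increases, and in case 3 it is constant.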

C. Proof of Lemma 8
(1) Problem P2.1. Substituting equations (5), (8), and (9) into inequality (22c), and then, according to equation (17d), substituting $\alpha_{k,s} + \alpha_{k,m}$ for $1 - \alpha_{k,l}$, we obtain an inequality about $r_k$. In the light of Lemma 1 and Remark 5, a smaller $v_k$ induces a smaller $E_k$, and a smaller $r_k$ induces a smaller $v_k$. Therefore, $r_k$ should be kept small to save energy, and $\alpha_{k,s}/f_{k,s}$ should be small so that the lower bound of $r_k$ is small. From inequality (17c), we take $f_{k,s}^* = \beta_k F$.
Considering $t_{k,s} \ge t_{k,m}$, from equations (9) and (11), we take $\alpha_{k,s} = (\phi f_{k,s}/C_k)\,\alpha_{k,m}$ to obtain a small $\alpha_{k,s}/f_{k,s}$.
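Combining $\alpha_{k,s} + \alpha_{k,m} = 1 - \alpha_{k,l}$ from (17d) with the choice $\alpha_{k,s} = (\phi f_{k,s}/C_k)\,\alpha_{k,m}$ determines the SBS and MBS shares in closed form: $\alpha_{k,m} = (1-\alpha_{k,l})\,C_k/(C_k + \phi f_{k,s})$ and $\alpha_{k,s} = (1-\alpha_{k,l})\,\phi f_{k,s}/(C_k + \phi f_{k,s})$. The sketch below evaluates this split; the numbers (including $\beta_k$ and $F$) are hypothetical, and the function name is illustrative only.

```python
def split_offloaded(alpha_l, phi, f_s, c):
    """Split the offloaded share 1 - alpha_l between the SBS and the MBS.

    Combines alpha_s + alpha_m = 1 - alpha_l (17d) with the choice
    alpha_s = (phi * f_s / C_k) * alpha_m from the proof of Lemma 8.
    """
    offloaded = 1.0 - alpha_l
    denom = c + phi * f_s             # C_k + phi f_{k,s}
    alpha_s = offloaded * phi * f_s / denom
    alpha_m = offloaded * c / denom
    return alpha_s, alpha_m

# Hypothetical numbers: beta_k = 0.25 of F = 8 gives f*_{k,s} = beta_k F = 2.
beta, big_f = 0.25, 8.0
f_s = beta * big_f
alpha_s, alpha_m = split_offloaded(alpha_l=0.4, phi=0.5, f_s=f_s, c=1.0)
print(round(alpha_s, 6), round(alpha_m, 6))  # 0.3 0.3
```

The same denominator $C_k + \phi f_{k,s}$ reappears in $\alpha_{k,l}^*$ and in (D.1).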

D. Proof of Theorem 21
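Based on Theorem 20, we substitute $\alpha_{k,l}^*$ into equation (16) and get
$$E_k = \left(1 - \frac{T_k r_k}{D_k} + \frac{\phi T_k C_k r_k^2}{D_k\left(\phi C_k r_k + C_k + \phi f_{k,s}\right)}\right)\left(D_k C_k \kappa f_{k,l}^2 - \frac{D_k p_{k,tx}}{r_k}\right) + \frac{D_k p_{k,tx}}{r_k} = D_k C_k \kappa f_{k,l}^2 + \frac{\left(T_k p_{k,tx} - \kappa C_k f_{k,l}^2 T_k r_k\right)\left(C_k + \phi f_{k,s}\right)}{\phi C_k r_k + C_k + \phi f_{k,s}}. \tag{D.1}$$
To simplify equation (D.1) and obtain the optimal transmit power $p_{k,tx}^*$, we define $Q$ as
$$Q = \frac{T_k p_{k,tx} - \kappa C_k f_{k,l}^2 T_k r_k}{\phi C_k r_k + C_k + \phi f_{k,s}}. \tag{D.2}$$
In equation (D.1), $E_k = D_k C_k \kappa f_{k,l}^2 + \left(C_k + \phi f_{k,s}\right) Q$ with $C_k + \phi f_{k,s} > 0$, so a smaller $Q$ induces a smaller $E_k$. Thus, we minimize $Q$ by optimizing $p_{k,tx}$.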
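The simplification in (D.1) is pure algebra, so it can be spot-checked numerically. The sketch below draws random positive values for $T_k$, $r_k$, $D_k$, $C_k$, $\phi$, $f_{k,s}$, $\kappa$, $f_{k,l}$, and $p_{k,tx}$ (ranges chosen arbitrarily), evaluates both sides of (D.1), and confirms that the right-hand side equals $D_k C_k \kappa f_{k,l}^2 + (C_k + \phi f_{k,s})\,Q$ with $Q$ from (D.2).

```python
import random

def alpha_local_opt(t, r, d, c, phi, f_s):
    """alpha*_{k,l} in the form it takes inside (D.1)."""
    return 1.0 - t * r / d + phi * t * c * r ** 2 / (d * (phi * c * r + c + phi * f_s))

def energy_substituted(t, r, d, c, phi, f_s, kappa, f_l, p):
    """Left-hand side of (D.1): alpha*_{k,l} plugged into the affine energy form."""
    a = alpha_local_opt(t, r, d, c, phi, f_s)
    return a * (d * c * kappa * f_l ** 2 - d * p / r) + d * p / r

def q_value(t, r, c, phi, f_s, kappa, f_l, p):
    """Q from (D.2)."""
    return (t * p - kappa * c * f_l ** 2 * t * r) / (phi * c * r + c + phi * f_s)

def energy_simplified(t, r, d, c, phi, f_s, kappa, f_l, p):
    """Right-hand side of (D.1): D_k C_k kappa f_{k,l}^2 + (C_k + phi f_{k,s}) Q."""
    q = q_value(t, r, c, phi, f_s, kappa, f_l, p)
    return d * c * kappa * f_l ** 2 + (c + phi * f_s) * q

rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    # Order: t, r, d, c, phi, f_s, kappa, f_l, p (all positive, arbitrary range).
    args = [rng.uniform(0.5, 2.0) for _ in range(9)]
    lhs = energy_substituted(*args)
    rhs = energy_simplified(*args)
    max_gap = max(max_gap, abs(lhs - rhs) / max(1.0, abs(lhs)))
print(max_gap < 1e-9)
```

A random spot check like this does not prove the identity, but it catches sign or index slips in a reconstructed derivation.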
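Furthermore, to simplify the expression of $Q$, we set $A = C_k + \phi f_{k,s}$ and $D = \kappa C_k f_{k,l}^2 T_k$. Thus, $Q$ can be rewritten as
$$Q = \frac{T_k p_{k,tx} - D r_k}{\phi C_k r_k + A}. \tag{D.3}$$
Then, the derivative of $Q$ can be calculated as
$$\frac{dQ}{dp_{k,tx}} = \frac{A T_k + T_k \phi C_k r_k - \left(A D + \phi C_k T_k p_{k,tx}\right) r_k'}{\left(\phi C_k r_k + A\right)^2}. \tag{D.4}$$
Define $M$ as
$$M = A T_k + T_k \phi C_k r_k - \left(A D + \phi C_k T_k p_{k,tx}\right) r_k'. \tag{D.5}$$
The second derivative of $r_k$ is computed as
$$\frac{d^2 r_k}{dp_{k,tx}^2} = -\frac{B g_k^2}{\left(N_0 + p_{k,tx} g_k\right)^2 \ln 2}. \tag{D.6}$$
Obviously, the second derivative of $r_k$ is negative. The derivative of $M$ is obtained as
$$\frac{dM}{dp_{k,tx}} = -\left(A D + \phi C_k T_k p_{k,tx}\right) r_k''. \tag{D.7}$$
Hence $dM/dp_{k,tx} > 0$, and $M$ increases monotonically with $p_{k,tx}$. Since $r_k(0) = 0$ and $r_k'(0) = B g_k/(N_0 \ln 2)$, we have $M(0) = A T_k (1 - a)$ with $a = \kappa C_k f_{k,l}^2 B g_k/(N_0 \ln 2)$.
(1) When $M(0) \ge 0$, i.e., $a \le 1$, this case cannot occur, which we prove by contradiction. Suppose this case is feasible; then we have
$$\frac{\kappa C_k f_{k,l}^2 B g_k}{N_0 \ln 2} \le 1, \qquad \kappa C_k f_{k,l}^2 \le \frac{N_0 \ln 2}{B g_k}. \tag{D.9}$$
According to $v_k < \kappa C_k f_{k,l}^2$, we get
$$v_k < \kappa C_k f_{k,l}^2 \le \frac{N_0 \ln 2}{B g_k}. \tag{D.10}$$
From equations (4) and (19), we obtain
$$\frac{p_{k,tx}}{r_k} < \frac{N_0 \ln 2}{B g_k}, \qquad \frac{p_{k,tx}}{B \log_2\left(1 + p_{k,tx} g_k / N_0\right)} < \frac{N_0 \ln 2}{B g_k}, \qquad \frac{p_{k,tx} g_k}{N_0} < \ln\left(1 + \frac{p_{k,tx} g_k}{N_0}\right).$$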
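Because $dQ/dp_{k,tx}$ has the sign of $M$ and $M$ is increasing by (D.7), when $M(0) < 0$ the stationary point of $Q$ is the unique root of $M$. The sketch below locates that root by bisection. It is a numerical illustration under hypothetical parameters chosen so that $a > 1$, not the closed-form $p_{k,tx}^*$ of Theorem 21, and it ignores any maximum-power constraint. It also checks (D.4) against a finite difference and the expression $M(0) = A T_k (1 - a)$.

```python
import math

# Hypothetical parameters (not from the paper), chosen so that a > 1.
B, G, N0 = 1.0, 1.0, 1.0          # bandwidth B, channel gain g_k, noise power N_0
KAPPA, C, F_L = 1.0, 1.0, 1.0     # kappa, C_k, f_{k,l}
T, PHI, F_S = 1.0, 0.5, 2.0       # T_k, phi, f_{k,s}

A = C + PHI * F_S                 # A = C_k + phi f_{k,s}
D = KAPPA * C * F_L ** 2 * T      # D = kappa C_k f_{k,l}^2 T_k
A_COEFF = KAPPA * C * F_L ** 2 * B * G / (N0 * math.log(2))  # a in M(0) = A T_k (1 - a)

def r(p):
    return B * math.log2(1.0 + p * G / N0)

def r_prime(p):
    return B * G / ((N0 + p * G) * math.log(2))

def q(p):
    """Q from (D.3)."""
    return (T * p - D * r(p)) / (PHI * C * r(p) + A)

def m(p):
    """M from (D.5)."""
    return A * T + T * PHI * C * r(p) - (A * D + PHI * C * T * p) * r_prime(p)

def dq(p):
    """dQ/dp from (D.4): M / (phi C_k r_k + A)^2."""
    return m(p) / (PHI * C * r(p) + A) ** 2

def stationary_power(tol=1e-12):
    """Root of M by bisection; valid because M is increasing by (D.7)."""
    if m(0.0) >= 0.0:
        raise ValueError("M(0) >= 0: the case excluded in the proof")
    lo, hi = 0.0, 1.0
    while m(hi) <= 0.0:           # M grows without bound, so doubling terminates
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_star = stationary_power()
print(A_COEFF > 1.0, math.isclose(m(0.0), A * T * (1.0 - A_COEFF)))
print(p_star, q(p_star))
```

Bisection is used here only because $M$ is monotone; any bracketing root finder would serve equally well.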