Quantitative Comparison of the Efficiency and Scalability of the Current and Future LTE Network Architectures

The core architecture of current mobile networks does not scale well to cope with future traffic demands owing to its highly centralized composition. Decentralization of the network architecture is widely believed to be a sustainable approach to dealing with the ever-growing amount of mobile data traffic. Nevertheless, the decentralization strategy has not been properly examined through quantitative performance studies. Given that LTE will be the leading mobile networking technology in the coming 5–10 years, we conduct a hybrid study, combining simulation and analytical modeling, to compare the performance of the current and future (decentralized) LTE network architectures. In particular, our analysis presents numerical results quantifying the impact of the number of attached nodes on the load at network routers and links, on the latency, and on the processing cost of the user's data and control planes. The analytical results demonstrate that decentralization of the LTE network architecture achieves higher performance than the current architecture and improves the latency and cost of data packet delivery by more than 10 and 6 times, respectively. Furthermore, GTP outperforms PMIP for all studied performance metrics in the decentralized architecture, providing about twofold better latency and cost for data packet delivery and roughly 6 times lower data traffic load on the network routers.


Introduction
Over the last years, with the ubiquitous deployment and rapid evolution of mobile networks (e.g., 3GPP and WiMAX), the demand of mobile users for Internet access has soared dramatically. Mobile devices (e.g., smartphones and tablets) have become an integral part of everyone's daily life and generate a substantial part of the total Internet traffic, which is still increasing significantly.
It is forecast that by 2021 there will be around 12 billion mobile devices worldwide, and 82% of these will be smart mobile devices generating up to 99% of all mobile data. Overall, mobile data is expected to increase from 7 EB per month, seen in 2016, to 49 EB per month in 2021 [1,2].
Coping with such demand in the current mobile networks is neither economically nor technically viable. The Radio Access Network (RAN) cannot be easily extended due to spectrum limitations. Furthermore, the core of mobile networks is highly centralized, which introduces scalability and reliability problems.
Mobile network operators increase RAN capacity by improving spectrum utilization in several ways, for example, deploying small cells, selectively offloading traffic from cellular access to WiFi, and exploiting multicarrier techniques or multiple radio access technology approaches [3,4]. The major challenge regarding the core networks (standardized by 3GPP and IETF) is that a few high-level network entities, called anchor points, manage both the data plane and the control plane. In such a centralized architecture, a mobile node's (MN's) traffic must traverse the core anchor point before going to the corresponding service node (CN); see Figure 1(a). This exposes the network to several limitations, for example, suboptimal routing, low scalability, signaling overhead, and a lack of granularity of services [5,6]. A straightforward solution is for operators to invest in upgrading the resources of the core network entities. Although this approach is technically feasible, network operators prefer cost-effective and more sustainable solutions. Traffic offloading is an alternative approach that reduces the traffic traversing the core network entities and mitigates the traffic overhead on the limited resources of the core. This can be achieved by placing small-scale anchor points close to the access network to handle MN connections and traffic locally [7]. This essentially leads to a decentralized (flat) network architecture; see Figure 1(b). Even though decentralization also requires further investment in network architecture changes and management, in the long term it appears more cost-efficient than continuously extending the capacity of the centralized architecture to cope with future demands.
Long-Term Evolution (LTE) is expected to be the leading mobile networking technology in the next decade, handling a substantial part of the ever-growing global mobile data traffic. It is predicted that the development of LTE will not keep up with the growth of mobile traffic, and while it supports almost 69% (out of 7 EB) of the current mobile traffic, it is estimated to handle about 79% (out of 49 EB) of the worldwide mobile data traffic by 2021 [1,2].
Therefore, in this paper, we pay special attention to the LTE system and perform a hybrid study, including simulation and analytical models, to analyze in detail the performance and scalability of the current and a decentralized LTE network architecture. In particular, we analytically evaluate both network architectures to quantify how much the load on network resources (routers and links) and the latency and cost of the user's data plane (traffic forwarding procedures) and control plane (attachment and handover procedures) are affected by the increasing number of attached mobile nodes. We analyze the performance of the GPRS Tunneling Protocol (GTP) and the Proxy Mobile IP protocol (PMIP), the two protocols commonly used in the LTE system to handle MN data traffic and mobility in 3GPP access (Figure 2). In summary, our main contributions in this paper are as follows:
(i) Develop a detailed model of the functions of the GTP and PMIP protocols for 3GPP access on both the current (centralized) and future (decentralized) LTE network architectures.
(ii) Carry out a hybrid study combining simulation and analytical modeling, capturing the most essential characteristics of the system while abstracting from the less important details, in order to evaluate various scenarios in feasible time.
(iii) Using the developed approach, derive various metrics to quantify and analyze the performance and scalability of the LTE network system (current architecture versus decentralized architecture).
(iv) Relying on the obtained numerical results, provide an intuition of the expected impact of the number of subscribers on the different LTE core network architectures.
The rest of this paper is organized as follows: Section 2 concisely provides the necessary background on the current LTE architecture and its existing mobility management solutions for 3GPP access. Section 3 discusses our hybrid modeling study. Section 4 describes in detail the analytical calculation of the performance metrics and the evaluation procedure. The results comparing the performance and scalability of the different network architectures are presented in Section 5. Section 6 briefly reviews recent related work and specifies how our work differs from the literature. Finally, the paper ends with the conclusion and discussion in Sections 7 and 8, respectively.

LTE Architecture
This section gives a brief overview of the current LTE network architecture and the two existing mobility management protocols for 3GPP access, which are essential for understanding the problem statement as well as the model proposed in this work.
The LTE architecture is hierarchical and defines the Evolved Packet System (EPS), consisting of the Evolved Universal Terrestrial Radio Access Network (E-UTRAN) and the Evolved Packet Core (EPC). The E-UTRAN consists of a network of radio base stations (eNodeB, evolved Node B) that provide radio connectivity to User Equipment (UE). The EPC is a multiaccess IP-based network that provides a common core network for 3GPP and non-3GPP radio access and fixed access.
The EPC consists of four main elements (Figure 2) that allow for the convergence of packet-based services [10]:
(i) The Serving Gateway (SGW) is a user plane node that provides data paths and routes traffic between eNodeBs and the PGW. It also acts as a local mobility anchor for UEs performing handover between eNodeBs.
(ii) The PDN Gateway (PGW) provides the connection between the EPC and other external IP networks, as well as several additional functions, such as IP address anchoring and allocation, routing, packet filtering and monitoring, and policy control.
(iii) The Mobility Management Entity (MME), whose key role is to handle UE mobility, also performs the control functions for access to the LTE network, assigns network resources, and supports roaming and handover procedures.
(iv) The Policy and Charging Rule Function (PCRF) dynamically controls and manages all data sessions and provides quality of service (QoS) policies and charging rules to the SGW and PGW.

Mobility Management Protocols.
The current EPC may use the GTP or PMIP protocol to support UE mobility for 3GPP access networks [11]. These protocols allow an uninterrupted handover for the UEs during internetwork mobility. To manage UE mobility using GTP or PMIP, the PGW connects to the SGW via the S5 and S8 interfaces (Figure 2) for nonroaming and roaming scenarios, respectively [11].
The GTP protocol is able to fully handle the control and data planes. It can forward the UE's downlink packets from the source location to the target location during handover. PMIP, however, can only handle the UE's mobility and perform data forwarding after the handover procedure. Moreover, it is not able to control the bearers and QoS signaling. When PMIP is used over the S5/S8 interface, the GTP bearers are only defined between the UE and the SGW. In this case, the SGW takes over the bearer binding operations, and an additional connection (dashed line in Figure 2) needs to be created between the SGW and the PCRF to provide the required information on QoS policy [9,10,12].
A UE may perform a handover in either idle or active mode. In idle mode, the UE stays in power-saving mode and does not inform the network about its location. The network uses the tracking and paging procedures to discover the position of the UE. In active mode, the UE's mobility is completely under the control of the network. The decision to perform a handover and the choice of the target cell are made by the network, based on measurements performed by the eNodeB and the UE. During the handover procedure, GTP can locate the UE, even in idle mode, to establish the required data and control plane tunnels. However, PMIP does not support tracking and paging functions, and the UE needs to be always in active mode. Therefore, the GTP protocol is mostly used in the access network between the SGW and eNodeBs to eliminate the aforementioned drawbacks of PMIP.
The decision to use GTP or PMIP over the S5/S8 interface depends on several parameters, such as technical support and the existence of roaming scenarios among the 3GPP access and non-3GPP access networks. Note also that mobility management protocols for non-3GPP access, such as Mobile IP (MIP) and Dual Stack MIP (DSMIP) [10], are out of the scope of this study.

Data and Control Planes
Tunnels for the data and control planes are established between the eNodeB, MME, SGW, and PGW. Using the GTP protocol, a successful attachment results in a tunnel for the user data plane (GTP-U) between the eNodeB and SGW and between the SGW and PGW. Another tunnel for the control plane (GTP-C) is established between the SGW and PGW and also between the MME and SGW (Figure 3(a)). While GTP-U simply transports data packets within the core and radio access networks, the GTP-C tunnel is used to exchange the control messages for handling the UE's mobility, as well as for path management and tunnel management (e.g., adjusting QoS parameters, updating sessions for roaming subscribers, and activating and deactivating subscriber sessions). These tunnels are created for each individual UE traffic flow (IP traffic). Each GTP tunnel has an identifier, called the Tunnel End Point Identifier (TEID). Based on the TEID, the network is able to choose the appropriate tunnels to transfer data packets and control messages between the end points.
In the case of the PMIP protocol, basic IP connectivity over Generic Routing Encapsulation (GRE) tunneling is used between the SGW and PGW, and a GRE key is used to identify each tunnel.
As Figure 3(b) shows, during a handover between neighboring eNodeBs the GTP-U tunnel is updated, and if the handover is between neighboring SGWs both GTP-U and GTP-C (or GRE) are updated.
Note that tunneled packets might be further encapsulated by the IPsec protocol in order to protect both control messages and data. In this study we focus only on the data and control plane tunnels established over the transport network. Additional control signaling involved in the UE's attachment, handover, and data delivery procedures with the other LTE components, as well as IPsec, is out of the scope of this paper.
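The role of the TEID as a lookup key can be illustrated with a minimal sketch. The class names, node labels, and TEID values below are hypothetical and chosen only for illustration; they are not taken from the 3GPP specifications or from the model used in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    plane: str        # "GTP-U" (user data) or "GTP-C" (control)
    local_node: str   # node that owns this tunnel table
    remote_node: str  # peer at the other end of the tunnel


class TunnelTable:
    """Hypothetical per-node table that maps a TEID to its tunnel."""

    def __init__(self):
        self._by_teid = {}

    def add(self, teid, tunnel):
        if teid in self._by_teid:
            raise ValueError(f"TEID {teid:#x} already in use")
        self._by_teid[teid] = tunnel

    def lookup(self, teid):
        return self._by_teid[teid]

    def update_remote(self, teid, new_remote):
        # Models a tunnel update: same TEID, new remote end point.
        old = self._by_teid[teid]
        self._by_teid[teid] = Tunnel(old.plane, old.local_node, new_remote)


# Illustrative values only.
sgw = TunnelTable()
sgw.add(0x1001, Tunnel("GTP-U", "SGW", "eNodeB-1"))
sgw.add(0x2001, Tunnel("GTP-C", "SGW", "MME"))
# Handover between neighboring eNodeBs: only the GTP-U tunnel is updated.
sgw.update_remote(0x1001, "eNodeB-2")
```

Keeping the TEID fixed while changing the remote end point mirrors the statement that a handover between neighboring eNodeBs updates the GTP-U tunnel rather than creating an unrelated one; a real implementation may allocate new TEIDs instead.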

Mobility Management Messages.
The signaling for UE mobility control is similar for GTP and PMIP in the access network (Figure 2). Differences depend on the mobility protocol used in the core network over the S5/S8 interfaces.
Figure 4 shows the GTP and PMIP mobility messages in the core network during the UE attachment and handover procedures.
For the GTP Protocol. Table 1 lists the messages used in the procedures described in the following as well as their respective sizes. Note that the size information does not include the extra GTP-C tunnel header size, which is presented in a separate column. When the UE is switched on, an Attach Request message is sent to the MME through the eNodeB. The MME then sends a C.S.Req message to the SGW, which is forwarded to the PGW. This request is meant for setting up the UE's default bearer and also for requesting Packet Data Network (PDN) connectivity. The reply from the PGW, containing an IP address for the UE and a default bearer ID, is forwarded by the SGW to the eNodeB and the MME. With this information the attachment procedure is concluded, and the traffic from the UE can flow from the eNodeB to the SGW via the S1-U interface (Figure 3(a)). The MME then also sends an M.B.Req message to the SGW, which in turn is forwarded to the PGW, containing the TEID assigned to the eNodeB. The PGW finally replies to this request with an M.B.Res message, and the user data plane is then set up for traffic flow in the core network.
During a handover between eNodeBs, the target eNodeB sends a Path Switch Request to the MME, informing it that the UE has changed its physical location. The MME then sends the SGW an M.B.Req message with the address of the new eNodeB and the TEID of the user plane for downlink. This information is forwarded by the SGW to the PGW. The PGW replies with an M.B.Res message to the SGW, which starts forwarding downlink packets to the target eNodeB (the current UE location).
On realizing that the SGW has been relocated (from the Path Switch Request message sent by the target eNodeB), the MME sends a C.S.Req message to the target SGW. This message contains the PGW addresses, the TEIDs used for uplink traffic, the address of the target eNodeB, and the protocol type used over the S5 or S8 interface. The target SGW assigns the addresses and TEIDs for downlink traffic from the PGW and sends an M.B.Req message to the PGW informing it about the changes. The PGW then updates its context field and replies to the SGW with an M.B.Res message including its address and TEID information [11].
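The GTP message exchanges described above can be written down as ordered lists, which makes it easy to see which messages cross the S5/S8 interface between the SGW and the PGW. This is a minimal sketch based only on the prose description; the list names and the helper `s5_s8_messages` are hypothetical, and radio-side messages (Attach Request, Path Switch Request) are deliberately left out.

```python
# Each step is (sender, receiver, message), using the abbreviations from the text.
GTP_ATTACH = [
    ("MME", "SGW", "C.S.Req"),
    ("SGW", "PGW", "C.S.Req"),
    ("PGW", "SGW", "C.S.Res"),
    ("SGW", "MME", "C.S.Res"),
    ("MME", "SGW", "M.B.Req"),
    ("SGW", "PGW", "M.B.Req"),
    ("PGW", "SGW", "M.B.Res"),
]

GTP_HANDOVER_ENODEB = [
    ("MME", "SGW", "M.B.Req"),
    ("SGW", "PGW", "M.B.Req"),
    ("PGW", "SGW", "M.B.Res"),
]

GTP_HANDOVER_SGW_RELOCATION = [
    ("MME", "target SGW", "C.S.Req"),
    ("target SGW", "PGW", "M.B.Req"),
    ("PGW", "target SGW", "M.B.Res"),
]


def s5_s8_messages(sequence):
    """Return only the messages exchanged between an SGW and the PGW."""
    return [
        (src, dst, msg)
        for src, dst, msg in sequence
        if "PGW" in (src, dst) and ("SGW" in src or "SGW" in dst)
    ]


for name, seq in [
    ("attach", GTP_ATTACH),
    ("eNodeB handover", GTP_HANDOVER_ENODEB),
    ("SGW relocation", GTP_HANDOVER_SGW_RELOCATION),
]:
    print(name, [msg for _, _, msg in s5_s8_messages(seq)])
```

Filtering on the SGW to PGW hop isolates the part of the signaling that a core network model has to account for, since the access-side signaling is the same for both protocols.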

For the PMIP Protocol. Table 2 lists the messages used in the procedures explained in this section as well as their respective sizes. Note that the size information does not include additional header sizes, which can differ and are presented in a separate column.
For PMIP, the initial attachment procedure of the UE is similar to that of GTP. The only difference is that the SGW has to establish a control session towards the PCRF to obtain the QoS policy information needed to perform the bearer binding. This information is obtained through a G.C.S.Res message sent by the PCRF to the SGW in reply to a G.C.S.Req message. This message exchange is done over the Stream Control Transmission Protocol (SCTP).
To establish the default bearers, the SGW sends the PGW a P.B.U message containing, among others, the GRE key for downlink traffic, address information of the UE to request an IPv6 prefix, and charging characteristics. The PGW replies to the SGW with a P.B.A message containing, among others, the UE's address, the GRE key for uplink traffic, and the charging ID. These two messages are exchanged over a GRE tunnel, and after this exchange the SGW and the PGW set up an additional bidirectional GRE tunnel for forwarding the UE's data flows.
During a handover between eNodeBs, the SGW sends the PCRF a G.C.S.Req message informing it about the change of the UE's location, which was received from the MME. The PCRF replies to the SGW with a G.C.S.Res message providing, among others, the updated QoS policy and charging rules. In the case of a handover with SGW relocation, in addition to the PCRF messages, the new SGW has to exchange the P.B.U and P.B.A messages with the PGW [10].
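The PMIP procedures differ from one another only in whether the binding exchange with the PGW is needed, which a short sketch can make explicit. The function name and argument names are hypothetical; the message abbreviations are those used in the text.

```python
def pmip_core_messages(procedure, sgw_relocation=False):
    """Core network messages for a PMIP procedure, as described in the text.

    procedure: "attach" or "handover".
    sgw_relocation: only meaningful for "handover".
    """
    # QoS policy exchange with the PCRF (carried over SCTP).
    pcrf_exchange = [
        ("SGW", "PCRF", "G.C.S.Req"),
        ("PCRF", "SGW", "G.C.S.Res"),
    ]
    # Binding exchange with the PGW (carried over GRE).
    binding_exchange = [
        ("SGW", "PGW", "P.B.U"),
        ("PGW", "SGW", "P.B.A"),
    ]
    if procedure == "attach":
        return pcrf_exchange + binding_exchange
    if procedure == "handover":
        return pcrf_exchange + (binding_exchange if sgw_relocation else [])
    raise ValueError(f"unknown procedure: {procedure!r}")


print(pmip_core_messages("handover"))
print(pmip_core_messages("handover", sgw_relocation=True))
```

Under this reading, a handover between eNodeBs costs two core messages and a handover with SGW relocation costs four, the same as an initial attachment.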
In the following, Section 3 describes our hybrid modeling approach to quantify the gains in performance and scalability in four different scenarios: the current (centralized) and a decentralized EPC architecture with the GTP and PMIP mobility management protocols. Next, in Section 4, we present the calculation of various performance metrics in detail.

Hybrid Simulation and Analytical Modeling
This section describes our hybrid study containing simulation and analytical modeling. As described in the Introduction, the simulation model captures the dynamic behaviour of the system at the mobile node and connection level, delivering information about, for example, link load and the load of routers in the different network layers. This information, together with other network and traffic parameters, is used in the analytical delay model to derive the queuing delay for data packet and control message processing in the nodes as well as the queuing delays for transmitting this information on the network links. Eventually, these intermediate results are used to calculate the end-to-end data packet delivery delay and cost and the control plane and data plane loads.
Figure 5 shows schematically how the various parts of our hybrid model relate to each other. It is important to remark that our approach, on the one hand, captures the most essential characteristics of the system and, on the other hand, abstracts from the less important details in order to set up a straightforward environment in which the modeling and analyses can be performed in feasible time. We use MATLAB as the environment to implement the models and perform the analysis. On a normal desktop computer, it takes only a couple of minutes to carry out the complete tasks for all scenarios.
Table 3 presents the notation used in the hybrid model as well as in the analytical calculation to compute different performance metrics (Section 4).

Network Topology Model.
Figure 6 shows the network topology used in our modeling. This topology follows the Cisco 3-layer hierarchical model consisting of the core, distribution, and access layers. The core layer handles traffic transferred to and from the routers at the distribution layer. The distribution layer enables the communication between routers from the core and access layers. The access layer mainly controls the attachment of end users and devices to the network.
We define two network scenarios in our models: the current (centralized) and future (decentralized) network architectures in the context of the LTE system. In each of these, the routers in the core and the access layers play different roles.
In the first scenario, for the centralized architecture, the PGW is placed at the top of the topology (R1 in Figure 6), and the SGWs are placed at the edge routers (R6 to R11). In this scenario the PGW is the anchoring point of the network and handles all data and control plane operations. The SGWs provide the data paths towards the core and route UE packets between the access network and the PGW.
In the second scenario, for the decentralized architecture, we define S/PGWs, which are physical nodes combining the functions of the SGW and PGW elements. In this scenario the S/PGWs operate as the distributed anchoring points and handle the data and control plane operations locally. They are placed in the access layer (R6 to R11). Here, the core router (R1) is only used when data has to be routed through it.
In both scenarios, routers on the distribution layer act as normal L3 routers and enable the communication between the core and access layers.
We consider three static CNs, one located at each layer. These represent data centers that in reality could be geographically distributed. The data traffic to the UE is transmitted over the shortest path through (CN → PGW → SGW → UE) or (CN → S/PGW → UE) for the centralized and decentralized scenarios, respectively. It is important to mention that we ignore all the detailed functions of the routing mechanism (e.g., load balancing) in our model. Because this paper performs a comparative analysis of the different network architectures, using a plain routing solution has no severe effect on the final outcome of the comparison.
The two network architecture scenarios described above are later (in Section 4) combined with both the GTP and PMIP protocols to define the four scenarios analyzed in this work.
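To make the difference between the two data paths concrete, the sketch below builds a small three-layer topology and compares the centralized path (CN → PGW → SGW) with the decentralized path (CN → S/PGW). The link list is an assumption made for illustration: the text fixes R1 as the core router and R6 to R11 as access routers, but the distribution routers (taken here to be R2 to R5) and the wiring of Figure 6 are not reproduced, and the CN is hypothetically attached to R3.

```python
from collections import deque

# Hypothetical wiring: R1 core, R2-R5 distribution, R6-R11 access.
LINKS = [
    ("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R1", "R5"),
    ("R2", "R6"), ("R2", "R7"), ("R3", "R8"),
    ("R4", "R9"), ("R5", "R10"), ("R5", "R11"),
]


def build_graph(links):
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list from src to dst."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def centralized_path(graph, cn, pgw, sgw):
    # Traffic is anchored at the PGW: CN -> PGW, then PGW -> SGW.
    return shortest_path(graph, cn, pgw) + shortest_path(graph, pgw, sgw)[1:]


def decentralized_path(graph, cn, spgw):
    # Traffic goes directly to the combined S/PGW at the access layer.
    return shortest_path(graph, cn, spgw)


graph = build_graph(LINKS)
print(centralized_path(graph, "R3", "R1", "R8"))
print(decentralized_path(graph, "R3", "R8"))
```

With the CN at a distribution router, the centralized path has to climb to the anchor at R1 and come back down, while the decentralized path stays local. This detour is the source of the suboptimal routing mentioned in the Introduction.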

Mobility Model.
A straightforward approach to modeling mobility is to obtain the time spent by a mobile node (also known as the residence time or the dwell time) in each SGW (or S/PGW) during movement. The Fluid-Flow mobility model is a simple approach to derive the mobile node's dwell time in cellular networks and has been used extensively in previous work [13][14][15][16]. Applying the Fluid-Flow model, where $E(v)$ is the average speed of a mobile node, the average dwell time of the node in each SGW is given by $\pi A / (E(v)\,P)$, where $R$, $A$, and $P$ denote the radius, coverage area, and perimeter of each SGW, respectively; for a circular domain, $A = \pi R^2$ and $P = 2\pi R$. Note that, for each mobile node, $E(v)$ is arbitrarily chosen from the predefined values listed in Table 5.
We assume that a mobile node randomly attaches to one of the SGWs and, after the dwell time has passed, starts to move towards a neighboring SGW. This movement is modeled by randomly choosing one of the neighboring SGWs and staying in it for the dwell time. This procedure is repeated continuously during the simulation time. Therefore, the number of handovers for each mobile node ($N_h$) can be easily derived using its dwell time and the simulation time. For both scenarios, mobile nodes choose the same trajectories, affecting the paths created for the related control and data planes during the simulation time. In our model we assume that $E(v)$ for each mobile node is constant during the simulation time. It is also assumed that each SGW covers a circular domain consisting of three eNodeBs, and therefore for an SGW relocation there will be three handovers between neighboring eNodeBs.
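The dwell time and the resulting number of handovers can be sketched as follows. The dwell-time expression is the standard Fluid-Flow result for a region of area A and perimeter P; the radius, speed, and simulation time used in the example are hypothetical (the values of Table 5 are not reproduced here), and counting three eNodeB handovers per SGW visit is one possible reading of the assumption stated above.

```python
import math


def dwell_time(radius_m, mean_speed_mps):
    """Fluid-Flow mean dwell time, pi * A / (E(v) * P), for a circular domain."""
    area = math.pi * radius_m ** 2
    perimeter = 2 * math.pi * radius_m
    return math.pi * area / (mean_speed_mps * perimeter)


def handover_counts(sim_time_s, dwell_s, enodebs_per_sgw=3):
    """Number of SGW relocations and eNodeB handovers in sim_time_s.

    Assumes the node moves on as soon as each dwell time has elapsed and
    that each SGW visit involves enodebs_per_sgw eNodeB handovers.
    """
    sgw_relocations = int(sim_time_s // dwell_s)
    enodeb_handovers = sgw_relocations * enodebs_per_sgw
    return sgw_relocations, enodeb_handovers


# Hypothetical example: 1 km radius, 10 m/s average speed, one hour simulated.
t_dwell = dwell_time(radius_m=1000.0, mean_speed_mps=10.0)
print(round(t_dwell, 2))
print(handover_counts(3600.0, t_dwell))
```

For a circular domain the expression reduces to πR / (2 E(v)), so the dwell time grows linearly with the radius and falls inversely with the speed.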

Traffic Model.
We assume that each mobile node randomly attaches to one of the SGWs (or S/PGWs) and starts to download data from one of the CNs, also randomly chosen.
Every mobile node has a single active session to one of the CNs during the whole simulation time.Every node follows the mobility model described in the previous section and after staying at each access layer entity (SGWs or S/PGWs) for its dwell time moves to one of the neighboring entities.
A Poisson traffic stream with an average rate of 100 packets per second (CNTR) is generated from the CNs towards the connected mobile nodes, simulating the download of data by the attached nodes. For the sake of simplicity, this model ignores packet-level details (e.g., packet loss and loss recovery mechanisms). To avoid IP packet fragmentation as a result of tunneling overhead in the core network, we set the size of packets from the CNs to 1200 Bytes ([17] advises a default MTU size of 1280 Bytes).
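A short sketch can illustrate both parts of this traffic model: generating Poisson arrivals at 100 packets per second and checking that a 1200-Byte packet still fits within a 1280-Byte MTU after tunneling. The header sizes are assumptions based on the usual fixed sizes (40-Byte IPv6 header, 8-Byte UDP header, 8-Byte mandatory GTP-U header, 4-Byte GRE header plus 4-Byte key) with an IPv6 outer header; the overhead values actually used in the paper's tables are not reproduced here.

```python
import random

MTU_BYTES = 1280
PACKET_BYTES = 1200
CNTR_PPS = 100.0  # average packet rate per mobile node

# Assumed encapsulation overheads (outer IPv6 header in both cases).
IPV6_HEADER = 40
UDP_HEADER = 8
GTPU_HEADER = 8
GRE_HEADER_WITH_KEY = 8


def tunneled_size(payload_bytes, protocol):
    """Size of a packet after GTP-U or GRE (PMIP) encapsulation."""
    if protocol == "GTP":
        return payload_bytes + IPV6_HEADER + UDP_HEADER + GTPU_HEADER
    if protocol == "PMIP":
        return payload_bytes + IPV6_HEADER + GRE_HEADER_WITH_KEY
    raise ValueError(f"unknown protocol: {protocol!r}")


def poisson_arrivals(rate_pps, duration_s, seed=0):
    """Arrival times of a Poisson stream (exponential inter-arrival gaps)."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_pps)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_pps)
    return times


for protocol in ("GTP", "PMIP"):
    size = tunneled_size(PACKET_BYTES, protocol)
    print(protocol, size, size <= MTU_BYTES)

print(len(poisson_arrivals(CNTR_PPS, 10.0)))
```

Under these assumed header sizes, both encapsulations leave some headroom below 1280 Bytes, which is consistent with the choice of a 1200-Byte payload.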

Delay Model.
During a transmission between two endpoints, data packets or control messages may be delayed due to, for example, link congestion and queues. In our model, the network link delay in each hop includes the transmission delay ($\delta_l^{(x)}$) and the queuing delay ($d_l^{(x)}$), where $x$ is either a pure or a tunneled data packet or control message (Table 3). Applying an M/M/1 queuing model, the average delay ($T_l^{(p)}$) of a data packet of size $p$ transmitted over network link $l$ follows from the transmission rate TR, the traffic load CNTR per mobile node, and the network link traffic $\lambda_l$. Here, $\lambda_l$ is derived from the simulation by keeping track of all paths that are established using link $l$, as well as their duration, and taking into account the packet rate.
For a data packet $p$ or a control message $m$, and a node $j$, the router delay consists of the processing delay ($\tau_p^{(p/m)}$), the data packet encapsulation/decapsulation or control message construction/extraction delay ($\tau_t^{(p/m)}$) during tunneling, the routing delay ($\tau_r^{(p/m)}$), and the queuing delay ($d_j^{(p/m)}$) at the router (Table 3). For simplicity we assume that $\tau_p^{(p/m)} = \tau_t^{(p/m)} = \tau_r^{(p/m)}$. Similar to the network link delay, for a network router $j$ with processing rate PR, the average delay ($T_j^{(p)}$) for a data packet $p$ follows from PR and the network router traffic $\lambda_j$, which is obtained from the simulation by keeping track of all paths that crossed router $j$, as well as their duration.
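As an illustration of the M/M/1 delay model, the sketch below uses the textbook result that the mean time in an M/M/1 system (service plus queuing) is 1 / (μ − λ), where μ is the service rate and λ the arrival rate. Mapping TR and the packet size to a link service rate, and treating PR directly as a router service rate in packets per second, are assumptions made for this sketch; the exact expressions used in the paper may differ.

```python
def mm1_delay(service_rate, arrival_rate):
    """Mean time in an M/M/1 system: service plus queuing delay, in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def link_delay(packet_bytes, tr_bps, lambda_pps):
    """Average delay on a link with transmission rate tr_bps (bits per second)."""
    service_rate = tr_bps / (8.0 * packet_bytes)  # packets per second
    return mm1_delay(service_rate, lambda_pps)


def router_delay(pr_pps, lambda_pps):
    """Average delay at a router that processes pr_pps packets per second."""
    return mm1_delay(pr_pps, lambda_pps)


# Hypothetical numbers: a 1 Gbit/s link carrying 500 flows of 100 packets/s each.
lambda_l = 500 * 100.0
print(link_delay(packet_bytes=1250, tr_bps=1e9, lambda_pps=lambda_l))
print(router_delay(pr_pps=200_000.0, lambda_pps=lambda_l))
```

Because the delay grows as 1 / (μ − λ), it rises sharply once the aggregate traffic approaches the capacity of a link or router, which is why a single central anchor point becomes a bottleneck as the number of attached nodes grows.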

Calculation of the Performance Metrics
This section presents the analytical calculation of the performance metrics (Table 4) defined in our model to quantify the impact of the number of mobile nodes on the performance of the current and decentralized EPC network architectures with the GTP and PMIP protocols. In the current EPC architecture, the PGW centrally handles the MNs' data traffic and mobility throughout the whole network. In the decentralized architecture, however, the S/PGWs, being distributed closer to the edge of the network, manage the traffic of the locally attached MNs and handle their mobility when they move between eNodeBs. Therefore, an additional mechanism needs to be implemented on top of the S/PGWs to keep ongoing traffic sessions active for MNs performing handovers with S/PGW relocation. Different approaches may demand additional components and modifications in the network topology as well as impose further signaling effort in the network, which must also be taken into account; see [18][19][20][21][22] for examples.
In our model we only study the parameters related to the core network.That is because the structures and characteristics for the access and wireless networks are the same for both the centralized and decentralized LTE architectures.
In the following, Section 4.1 defines the performance metrics listed in Table 4. Next we detail the latency and processing cost related metrics for both architectures in Sections 4.2 and 4.3, respectively.

Definition of Performance Metrics
(i) The Average Latency of Data Packet Delivery (ALDPD).
The ALDPD is obtained by averaging, over all mobile nodes, the latency of data packet delivery of each mobile node $i$ (ALDPD$_i$). The individual ALDPD$_i$ is composed of three terms: LAP$_i$ is the latency of the initial attachment procedure for mobile node $i$, while LHP$_i$ and LDPD$_i$ define the average handover latency and the average latency for data packet delivery over the created paths (due to mobility) for mobile node $i$, respectively.
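One plausible way to write these definitions down is given below. It is a sketch under stated assumptions rather than the paper's own formulation: the three per-node terms are assumed to combine additively, averages are assumed to be arithmetic means, $N$ (number of mobile nodes) and $N_k$ (number of paths created for node $i$) are symbols introduced here for illustration, and $N_h$ is the number of handovers of node $i$ from the mobility model.

```latex
\begin{align}
\mathrm{ALDPD}   &= \frac{1}{N}\sum_{i=1}^{N}\mathrm{ALDPD}_i, \\
\mathrm{ALDPD}_i &= \mathrm{LAP}_i + \mathrm{LHP}_i + \mathrm{LDPD}_i, \\
\mathrm{LHP}_i   &= \frac{1}{N_h}\sum_{h=1}^{N_h}\mathrm{LHP}_h, \qquad
\mathrm{LDPD}_i   = \frac{1}{N_k}\sum_{k=1}^{N_k}\mathrm{LDPD}_k .
\end{align}
```

Here $\mathrm{LHP}_h$ is the latency of handover $h$ and $\mathrm{LDPD}_k$ the data packet delivery latency over path $k$, matching the per-handover and per-path quantities detailed for each architecture later in this section.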

(ii) Average Processing Cost of Data Packet Delivery (ACDPD).
The ACDPD is obtained by averaging, over all mobile nodes, the processing cost of data packet delivery of each mobile node $i$ (ACDPD$_i$). The individual ACDPD$_i$ is composed of three terms: CAP$_i$ is the processing cost of the initial attachment procedure for mobile node $i$, while CHP$_i$ and CDPD$_i$ define the average handover processing cost and the average processing cost for data packet delivery over the created paths (due to mobility) for mobile node $i$, respectively. Note that LAP$_i$ and CAP$_i$ affect the overall ALDPD and ACDPD only slightly. However, we include all parameters involved in the data packet delivery procedure in order to obtain more precise results.
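The structure shared by ALDPD and ACDPD (an attachment term plus an average over handovers plus an average over paths, then an average over nodes) can be sketched in a few lines. The additive combination and the use of arithmetic means are assumptions made for this sketch, and the function names and sample numbers are hypothetical.

```python
from statistics import fmean


def per_node_metric(attach, per_handover, per_path):
    """Combine the three terms for one mobile node.

    attach:       value of the initial attachment procedure (LAP_i or CAP_i)
    per_handover: values for each handover of the node (LHP_h or CHP_h)
    per_path:     values for each path created for the node (LDPD_k or CDPD_k)
    """
    handover_avg = fmean(per_handover) if per_handover else 0.0
    path_avg = fmean(per_path) if per_path else 0.0
    return attach + handover_avg + path_avg


def network_average(per_node_values):
    """Average the per-node metric over all mobile nodes."""
    return fmean(per_node_values)


# Hypothetical cost values for two mobile nodes.
node_a = per_node_metric(2.0, [1.0, 3.0], [4.0, 6.0])
node_b = per_node_metric(2.0, [], [5.0])
print(node_a, node_b, network_average([node_a, node_b]))
```

A node without handovers contributes only its attachment and path terms, which is why the handover average falls back to zero for an empty list.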

(iii) Average Latency of Initial Attachment Procedure (ALAP).
The ALAP is obtained from the initial attachment latencies LAP$_i$ of the mobile nodes.

(iv) Processing Cost of Initial Attachment Procedure (CAP).
The CAP is obtained from the initial attachment processing costs CAP$_i$ of the mobile nodes.

(v) Average Latency for Handover Procedure (ALHP).
The ALHP is obtained from the average handover latencies LHP$_i$ of the mobile nodes.

(vi) Average Processing Cost for Handover Procedure (ACHP).
The ACHP is obtained from the average handover processing costs CHP$_i$ of the mobile nodes.

(vii) Load of the Network Routers (LR) and Load of the Network Links (LNL).
The LR and LNL define the data plane loads, that is, how many times the MNs' traffic passes through the network routers and links, respectively.

(viii) The Control Message Load in Terms of Number (CML$^{(N)}$) and Size of Messages (CML$^{(KBs)}$).
The CML$^{(N)}$ defines the load of the control plane at the network routers in terms of the number of messages. We count the CML$^{(N)}$ for both EPC network architectures using the GTP and PMIP protocols.
The CML$^{(KBs)}$ is another interpretation of CML$^{(N)}$ that also takes into account the size of the messages and gives a better view of the control plane load at the network routers. Because the type and size of the control messages differ between the GTP and PMIP protocols, the expression for CML$^{(KBs)}$ is more complex, and we do not present it here.
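Counting the control message load per router can be sketched as a simple tally over the router paths that each control message traverses. The function name, the router paths, and in particular the message sizes in the example are hypothetical placeholders (the sizes of Tables 1 and 2 are not reproduced here), and 1 KB is taken as 1000 Bytes for simplicity.

```python
from collections import Counter


def control_message_load(transfers, sizes_bytes):
    """Tally control messages per router, by number and by size in KB.

    transfers:   list of (router_path, message_name); each router on the
                 path is assumed to handle the message once.
    sizes_bytes: mapping from message name to its size in Bytes.
    """
    load_count = Counter()
    load_kb = Counter()
    for path, message in transfers:
        for router in path:
            load_count[router] += 1
            load_kb[router] += sizes_bytes[message] / 1000.0
    return load_count, load_kb


# Hypothetical example: one request/response pair between an SGW at R6
# and a PGW at R1, routed via a distribution router R2.
placeholder_sizes = {"C.S.Req": 500, "C.S.Res": 250}  # not the paper's values
transfers = [
    (["R6", "R2", "R1"], "C.S.Req"),
    (["R1", "R2", "R6"], "C.S.Res"),
]
count, kilobytes = control_message_load(transfers, placeholder_sizes)
print(dict(count))
print(dict(kilobytes))
```

The two tallies correspond to the two views of the control plane load: the count treats every message equally, whereas the size-weighted view reflects that the messages of the two protocols differ in length.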
In the following, we elaborate on the ALDPD 푖 and ACDPD 푖 , covering also the latency of the initial attachment (LAP 푖 ) and the average handover (LHP 푖 ) procedures as well as the related costs (CAP 푖 and CHP 푖 ), respectively.Abbreviations of the control messages used in the following sections are listed in Tables 1 and 2.Moreover, the notations for the elements used in the analytical calculation of the performance metrics as well as the input parameters are listed in Tables 3, 5, and 6, respectively.

ALDPD for Individual Mobile Node.
ALDPD$_i$ from (5) consists of the latency of the initial attachment procedure (LAP$_i$), the average latency of the handover procedure (LHP$_i$), and the mean latency of data packet delivery (LDPD$_i$) for node $i$, in the paths created from the CN to the access layer routers. In the following we detail these items for the centralized and decentralized LTE architectures, considering both the GTP and PMIP protocols.

The Centralized Architecture
The GTP-Based Approach. Referring to Figure 4(a), LAP_i denotes the latency of the mobile node's initial attachment procedure, caused by exchanging the C.S.Req/Res and M.B.Req/Res messages between the attached SGW and PGW. It consists of the delay of tunneling (constructing/extracting) the messages in the attached SGW and PGW and the delay of routing those messages (over GTP-C) in the path between them.
LHP_i is calculated from the handover latency in each path (LHP_h) during the simulation time. For both eNodeB and SGW relocations, M.B.Req/Res messages are exchanged between the attached SGW and PGW. Therefore, LHP_h includes (i) the delay of tunneling the messages at the attached SGW (for eNodeB relocation) or at the second SGW (for SGW relocation) and at the PGW and (ii) the delay of routing the tunneled messages in the path built between them.
The PMIP-Based Approach. In the PMIP-based approach, the initial attachment latency includes (i) the delay of exchanging the G.C.S.Req/Res messages (over the SCTP protocol) between the attached SGW and PCRF; (ii) the delay of tunneling the P.B.U/P.B.A messages at the attached SGW and PGW; and (iii) the delay of routing the messages (over GRE) in the created path (Figure 4(b)).
LHP_h denotes the delay of exchanging the G.C.S.Req/Res messages (over the SCTP protocol) between the first SGW (for eNodeB relocation) or the second SGW (for SGW relocation) and the PCRF. In the case of SGW relocation, it also includes the delay of tunneling the P.B.U/P.B.A messages in the second SGW and PGW and the delay of routing the messages in the path between them.
Finally, LDPD_k specifies the delay of data packet delivery from the CN to the SGW in path k. In both the GTP and PMIP approaches, LDPD_k includes (i) the delay of routing IP packets between the CN and PGW; (ii) the delay of data packet encapsulation (over GTP-U or GRE) on the PGW; (iii) the delay of routing the tunneled packets in the path between the PGW and SGW; and (iv) the delay of data packet decapsulation in the SGW.
If the MN experiences a handover with SGW relocation during the session time, data packets have to be tunneled between the two SGWs. The delay caused by this procedure must also be taken into account in LDPD_k.
The detailed derivation of ALDPD_i for the centralized architecture is given in Appendix A.1.
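As an illustration of how the delay components of data packet delivery in the centralized architecture add up for one path, the following Python sketch sums items (i) to (iv) and the optional forwarding delay between two SGWs. The function name, the parameter names, and all numeric values are hypothetical placeholders chosen for illustration; they are not the parameters of the model, whose derivation is given in Appendix A.1.

```python
def ldpd_centralized(route_cn_pgw, encap_pgw, route_pgw_sgw, decap_sgw,
                     sgw_forwarding=0.0):
    """Delay of delivering one data packet from CN to SGW over one path.

    The first four terms follow items (i)-(iv) in the text. sgw_forwarding
    is the extra delay of tunneling the packet between two SGWs after an
    SGW relocation (0 when no relocation happened).
    """
    return route_cn_pgw + encap_pgw + route_pgw_sgw + decap_sgw + sgw_forwarding


# Hypothetical delays in milliseconds, for illustration only.
no_relocation = ldpd_centralized(4.0, 0.5, 3.0, 0.5)
with_relocation = ldpd_centralized(4.0, 0.5, 3.0, 0.5, sgw_forwarding=2.0)
print(no_relocation, with_relocation)  # prints: 8.0 10.0
```

The sketch only shows that an SGW relocation adds one more term to the per-path delay; how the per-path values are averaged over a session is not modeled here.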

The Decentralized Architecture
In the decentralized architecture, the S/PGW performs both the SGW and PGW functionalities. The control messages are not tunneled during the initial attachment and handover procedures, nor are data packets forwarded between a PGW and an SGW. The only GTP tunneling is between the S/PGW and the eNodeBs, which is also the case in the centralized architecture (Figure 7). In the decentralized architecture, regular IP data packets with no tunneling are forwarded from CNs to S/PGWs. Therefore, LDPD_k only includes the delay of routing the packets in the path between an arbitrary CN and the S/PGW.
As in the centralized architecture, in the case of an S/PGW relocation during the node's session time, the delay of forwarding the tunneled data packets between the two S/PGWs must also be considered.
The detailed derivation of ALDPD_i for the decentralized architecture is given in Appendix A.2.

ACDPD for Individual Mobile Node. ACDPD_i from (8) includes the cost of handling the mobility control messages (CAP_i and CHP_h) and the cost of data packet delivery (CDPD_k) from the CN to the access layer nodes in the created paths during the MN's session time. This section describes these parameters for both the centralized and decentralized approaches and details them for the GTP and PMIP protocols. For the sake of simplicity, we assume that the costs of processing (C_p), routing (C_r), and tunneling (C_t) in the routers are the same. Furthermore, given the traffic load crossing the routers at each layer, we assign a processing cost of 1, 1/3, and 1/4 unit per KB of traffic to the root router (PGW in the centralized architecture), the distribution routers, and the access routers (SGWs or S/PGWs), respectively. This is a reasonable relative assignment for the purpose of comparing network architectures.
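The per-layer cost assignment above can be illustrated with a short Python sketch. The weights 1, 1/3, and 1/4 per KB are taken from the text; the function name, the layer labels, the example paths, and the 12 KB traffic volume are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Unit processing cost per KB of traffic at each layer, as assigned in the text.
COST_PER_KB = {
    "root": Fraction(1),             # PGW in the centralized architecture
    "distribution": Fraction(1, 3),  # distribution routers
    "access": Fraction(1, 4),        # SGWs or S/PGWs
}


def path_processing_cost(layers, traffic_kb):
    """Total cost of carrying traffic_kb through one router on each listed layer."""
    return sum(COST_PER_KB[layer] for layer in layers) * traffic_kb


# Hypothetical example: 12 KB crossing one router on every layer,
# versus the same traffic touching only an access router.
all_layers = path_processing_cost(["root", "distribution", "access"], 12)
access_only = path_processing_cost(["access"], 12)
print(all_layers, access_only)  # prints: 19 3
```

Exact fractions are used so that the thirds and quarters do not introduce rounding noise into the comparison.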

The Centralized Architecture
The GTP-Based Approach. In the GTP-based approach, CAP_i denotes the cost of exchanging the tunneled C.S.Req/Res and M.B.Req/Res messages between the first attached SGW and PGW. It includes the cost of constructing/extracting the messages in the first SGW and PGW and the cost of routing the tunneled messages (over GTP-C) in the path between them.
Similarly, CHP_h describes the cost of transferring only the tunneled M.B.Req/Res messages (over GTP-C) between the first attached SGW and PGW during eNodeB relocation or between the target SGW and PGW during SGW relocation.
The PMIP-Based Approach. For the PMIP protocol, CAP_i comprises (i) the cost of exchanging the G.C.S.Req/Res messages, including the SCTP and IPv6 protocol headers, between the first attached SGW and PCRF; (ii) the cost of tunneling (constructing/extracting) the P.B.U/P.B.A messages in the first attached SGW and PGW; and (iii) the cost of routing the tunneled messages over GRE in the path between them.
CHP_h for eNodeB relocation only includes the cost of processing the G.C.S.Req/Res messages exchanged between the first SGW and PCRF. During SGW relocation, CHP_h denotes the cost of exchanging the aforesaid messages between the target SGW and PCRF. Moreover, it includes the cost of exchanging the tunneled P.B.U/P.B.A messages over GRE between the second SGW and PGW.
CDPD_k specifies the cost of routing regular IP packets from the CN to the PGW and the cost of tunneling (encapsulating) the data packets over GTP-U or GRE in the PGW. It also accounts for the cost of routing the tunneled packets in the path between the PGW and SGW and the cost of decapsulating the tunneled packets in the SGW.
The cost of forwarding the tunneled data packets between two SGWs must also be considered in the case of an SGW relocation during the session time.
The detailed derivation of ACDPD_i for the centralized architecture is given in Appendix B.1.

The Decentralized Architecture
In the decentralized architecture, CDPD_k specifies the cost of routing IP packets with no tunneling header in the path between the CN and the S/PGW. If an MN experiences a handover during the session time, CDPD_k also includes the cost of forwarding the tunneled data packets between the first and target S/PGWs.
Note that, in both network architectures, if there is more than one SGW (or S/PGW) between the CN and the mobile node, the data packets are tunneled and forwarded among them. Therefore, the additional delay and cost due to this procedure must also be considered in calculating LDPD_k and CDPD_k.
The detailed derivation of ACDPD_i for the decentralized architecture is given in Appendix B.2.

Numerical Results
This section presents the numerical results for the performance metrics defined in Section 4. The obtained results provide scalability indicators for the centralized and decentralized EPC network architectures through a quantitative comparison of the performance of the GTP and PMIP protocols. Table 5 lists the input parameters and Table 6 summarizes the default parameters used in Sections 3 and 4.

Average Cost and Latency of MN's Initial Attachment, Handover, and Data Packet Delivery Procedures. The graphs in Figure 8 show the impact of the number of MNs on GTP (solid lines) and PMIP (dashed lines) performance in terms of cost and latency, for both the centralized (red lines) and decentralized (blue lines) EPC architectures. Notably, the decentralized architecture outperforms the centralized one regardless of the mobility protocol. This is because in the decentralized architecture the control plane messages are handled, without tunneling, in the S/PGWs of the access layer. Furthermore, data traffic consisting of regular IP packets is only transmitted over the paths between the CNs and S/PGWs, without crossing the root node. Accordingly, the latency and cost measures are substantially improved.
The MN's Initial Attachment Procedure. Figures 8(a) and 8(d) show that, in the centralized architecture, PMIP achieves better outcomes than GTP during the initial attachment procedure, particularly in latency. This is because four messages in GTP and two messages in PMIP are exchanged (over GTP-C and GRE, respectively) between the SGWs and PGW. The other two messages in PMIP are exchanged between the SGW and PCRF through a private network link with no tunneling (Figure 4). In the decentralized architecture, the S/PGWs handle these messages. Hence, the related latency and cost are only due to the processing of messages that carry no tunneling header on the S/PGWs. In this scenario, GTP provides better results, as its initial attachment messages are smaller than the PMIP messages.
For the centralized architecture, Figure 8(a) shows that the growth rate of the attachment latency in PMIP is 5% lower than in GTP. However, in the decentralized architecture GTP performs 4% better than PMIP. In addition, with the decentralized architecture the attachment latency is improved by ≈44 and ≈8 times in GTP and PMIP, respectively (Table 7).
Figure 8(d) shows that, for the attachment cost in the centralized architecture, PMIP provides a (>1%) lower increasing slope than GTP. However, in the decentralized architecture this metric is ≈7% lower for GTP than for PMIP. Furthermore, in the decentralized architecture GTP and PMIP decrease the attachment cost by ≈9 and ≈3 times, respectively, compared to the centralized architecture.
Table 7 notes:
* The percentage shows the growth rate (GR) of the exponential graphs.
** The percentage shows the increasing slope (IS) of the linear graphs. The exponential and linear graphs are expressed by Y_n = Y_{n_0} × (1 + GR)^n and Y_n = IS × n + Y_{n_0}, respectively, where Y and n represent the performance metric and the number of MNs on the y- and x-axes, respectively.
*** Ratio (R) shows the proportion of the metrics in the centralized to the decentralized architecture, derived for Y_{n_0} (n_0 = 100).
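The growth rate (GR), increasing slope (IS), and ratio (R) reported in Table 7 can be read through the two graph models, an exponential one, in which the base value is multiplied by (1 + GR) at every step, and a linear one, in which IS is added at every step. The following Python sketch is one possible reading of these models. The function names and all numeric inputs are hypothetical, and since the scaling of n (per MN or per x-axis step) is not stated here, the sketch treats n as a generic step count.

```python
def exponential_metric(y0, growth_rate, n):
    """Exponential model: the base value y0 grows by (1 + GR) per step n."""
    return y0 * (1 + growth_rate) ** n


def linear_metric(y0, slope, n):
    """Linear model: the base value y0 increases by IS per step n."""
    return slope * n + y0


def ratio(centralized_y0, decentralized_y0):
    """Ratio R: centralized over decentralized metric at the base point n_0."""
    return centralized_y0 / decentralized_y0


# Hypothetical inputs, for illustration only.
latency_after_two_steps = exponential_metric(100.0, 0.05, 2)
cost_after_four_steps = linear_metric(100.0, 2.5, 4)
improvement = ratio(90.0, 10.0)
print(latency_after_two_steps, cost_after_four_steps, improvement)
```

Under this reading, a lower GR or IS means the metric degrades more slowly as MNs are added, while R compares the two architectures at the starting point only.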
The Handover Procedure. Figure 8(b) shows that in the centralized architecture PMIP offers a lower handover latency than GTP. This is caused by the latency of routing two tunneled messages between the attached SGW and PGW in GTP for every eNodeB relocation. For PMIP, the messages are only processed and exchanged between the SGW and PCRF via a dedicated link. For the cost metric, GTP outperforms PMIP (Figure 8(e)), because in GTP both the control plane tunnel headers and the handover messages are smaller than in PMIP. Furthermore, GTP uses fewer messages during SGW relocation. In the decentralized architecture, GTP performs better than PMIP in terms of both latency and cost (Figures 8(b) and 8(e)). This is due to the processing of fewer and shorter handover messages in GTP on the S/PGWs.
Figure 8(b) indicates that, for the centralized architecture, PMIP shows a 2% lower growth rate of handover latency than GTP. In the decentralized architecture, GTP outperforms PMIP and provides a 7% lower growth rate for this metric. Moreover, decentralization improves the handover latency by ≈62 and ≈3 times in GTP and PMIP, respectively. In terms of handover cost, GTP performs better than PMIP in both architectures, achieving a 2.6% and 12.7% lower increasing slope for this metric in the centralized and decentralized architectures, respectively (Table 7). Furthermore, with the decentralized architecture GTP reduces the handover cost by ≈12 times compared with the centralized one, whereas the reduction is ≈2 times when PMIP is used (Figure 8(e)).
The Data Packet Delivery Procedure. As shown in Figure 3(a), the tunnel headers in the GTP and PMIP data planes have the same size, and hence one may expect similar latency and cost for both protocols. However, recalling (4) and (7), these metrics also depend on the latency and cost of the attachment and handover procedures, so the outcomes follow a pattern similar to that of those procedures. Figure 8(c) shows that PMIP performs 2% better than GTP in the centralized architecture in terms of the growth rate of the data packet delivery latency. However, in the decentralized architecture GTP outperforms PMIP and provides a 1% lower growth rate for this metric. Furthermore, decentralization improves this latency by ≈11 times with GTP, compared with ≈6 times with PMIP.
For the cost metric, GTP provides better results than PMIP, with ≈1% and ≈9% lower increasing slopes in the centralized and decentralized architectures, respectively. In addition, with the decentralized architecture GTP reduces the data packet delivery cost by ≈6 times compared to the centralized one, while PMIP achieves a ratio of ≈3 times (Figure 8(f)).
Table 7 summarizes the results discussed in this section.

The Data Plane and Control Plane Load on the Network Routers and Links. Figure 9 shows the impact of the number of MNs on the load of the network routers and links for the centralized and decentralized architectures. Although we expected a higher load on both the data and control planes in the centralized architecture than in the decentralized one, the observed differences are surprisingly large. This is because in the decentralized architecture the control plane messages as well as the data traffic are managed and handled at anchor points placed in the access layer, resulting in reduced load and stress in the upper layers.
Load of the Data Plane. Figures 9(a) and 9(d) show the load of the data plane on the network routers and links, respectively, for the different EPC network architectures. As expected, the root router (R1-PGW, Figure 6) and the routers on the distribution layer (R2 to R5) are used more heavily in the centralized approach than in the decentralized one. Accordingly, the network links between these two layers, which forward data traffic between the routers, are also used more in the centralized approach.
In the decentralized architecture, the MNs' traffic is mainly handled by the access layer routers (R6 to R11-S/PGW), and the load is distributed more evenly over the network. This reduces the stress on the upper layers and limits the load on the core routers and links as the number of attached MNs grows.
Note that, in both architectures, some of the network links are not used. This is because in the proposed model the path construction between the MNs and CNs is based only on the shortest-path approach, disregarding other functionalities such as load balancing.
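The effect of pure shortest-path routing on link usage can be reproduced on a toy graph. The Python sketch below runs a breadth-first search from a root router to each access router and counts how often every link is used. The topology, the redundant links R2-R3 and R3-R7, and the choice of flows are hypothetical and do not reproduce the topology of Figure 6.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search on a connected graph; returns one shortest path."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:  # walk back from dst to src
        path.append(node)
        node = prev[node]
    return path[::-1]


# Hypothetical topology with two redundant links (R2-R3 and R3-R7).
links = [("R1", "R2"), ("R1", "R3"), ("R2", "R3"), ("R2", "R6"),
         ("R2", "R7"), ("R3", "R7"), ("R3", "R8")]
adj = {}
for a, b in links:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

# One flow from the root router R1 to each access router.
usage = {frozenset(link): 0 for link in links}
for access_router in ["R6", "R7", "R8"]:
    path = shortest_path(adj, "R1", access_router)
    for a, b in zip(path, path[1:]):
        usage[frozenset((a, b))] += 1

unused = sorted(tuple(sorted(link)) for link, count in usage.items() if count == 0)
print(unused)  # prints: [('R2', 'R3'), ('R3', 'R7')]
```

Because every flow takes a single shortest path and ties are never split, the redundant links carry no traffic, which is the behavior a load-balancing scheme would change.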
Note also that, in the decentralized architecture, the load on router R9 is slightly higher than in the centralized architecture. This is because in the former R9 is directly connected to CN3 and directly serves the MNs that communicate with CN3. In addition, the obtained results show that the load on the root router (R1) in the centralized architecture is ≈11 times higher than in the decentralized architecture. For the routers placed in the distribution and access layers, this load ratio is on average ≈2 and ≈1 times, respectively.
Load of the Control Plane. Figures 9(b) and 9(e) show the impact of the number of MNs on the load of GTP and PMIP control messages within the routers for the different EPC architectures. In the centralized architecture, all messages related to the MN's mobility are handled by the root router (PGW). The routers at the distribution layer (R2 to R5) are also involved in forwarding the messages across the core network. However, in the decentralized architecture control messages do not traverse the upper layer routers, and the access layer routers (S/PGWs in our model) are in charge of managing the mobility messages.
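The difference in where control messages are handled can be sketched as a simple counting exercise. In the Python example below, the three-layer tree, the router names, and the message counts are hypothetical; the sketch models only the messages exchanged between the access routers and the root and ignores, for instance, the PMIP messages sent to the PCRF.

```python
from collections import Counter

# Hypothetical tree: access router -> distribution router -> root (R1).
PARENT = {"R6": "R2", "R7": "R2", "R8": "R3", "R2": "R1", "R3": "R1"}


def control_load(attachments, decentralized):
    """Count the control messages handled per router.

    attachments maps an access router to the number of mobility control
    messages generated by the MNs attached to it.
    """
    load = Counter()
    for access_router, messages in attachments.items():
        node = access_router
        load[node] += messages
        if decentralized:
            continue  # the S/PGW at the access layer terminates the messages
        while node in PARENT:  # centralized: forward up to the root (PGW)
            node = PARENT[node]
            load[node] += messages
    return load


attachments = {"R6": 10, "R7": 20, "R8": 5}
central = control_load(attachments, decentralized=False)
decentral = control_load(attachments, decentralized=True)
print(central["R1"], central["R2"], decentral["R1"], decentral["R7"])
# prints: 35 30 0 20
```

In the centralized case the root accumulates the messages of every MN in the network, whereas in the decentralized case each access router sees only the messages of its own MNs.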
Figures 9(c) and 9(f) show the load on the network routers due to the control plane messages in terms of the volume (KB) of handled messages. It is observed that the PMIP protocol imposes more load on the network than the GTP protocol. This is because the mobility messages and the tunneling headers are larger for PMIP than for GTP. Furthermore, GTP generates fewer control messages during the SGW or S/PGW relocation (Figures 3(a) and 4).
Our results indicate that, for both the GTP and PMIP protocols, the control plane load on the routers grows linearly with the number of nodes attached to the network. It is also inferred that PMIP imposes on average ≈6 times higher load on the routers than GTP.

Related Work
Using a simulation model, a comparative study was performed in [23] to analyze the performance of the proposed Dynamic Mobility Anchoring (DMA) scheme and the Mobile IP (MIP) protocol in handling the MNs' TCP-based traffic. In this study, the handover latency and TCP segment delays were used as the performance metrics. The work in [16] presents an analytical and experimental evaluation of PMIP-based mobility management in centralized and distributed ways. The metrics of signaling and packet delivery costs, handover latency, and packet loss were used to present the trade-offs between the two architectures. In a similar way, [24] investigated the impact of the distribution and dynamic activation of the mobility anchors on the performance of the PMIP protocol. Here, the evaluation was based on the packet delivery cost, anchored/nonanchored packet ratio, and traffic distribution ratio. The authors in [25] proposed a PMIPv6-based distributed mobility management model (D-PMIPv6), which outperformed the conventional PMIPv6 in terms of route optimization and packet delivery and signaling costs. The discussed approach was based on the distribution of access routers with a centralized management model. A comprehensive study of distributed and dynamic mobility management (DDMM) was done in [26]. The authors discussed an architecture with distributed deployment of mobility anchors and dynamic activation. In this research, it was shown that DDMM generally achieves higher performance than centralized mobility management in terms of packet delivery cost, tunneling overhead, and throughput. The work in [15] proposed a partially distributed (P-DMM) and a fully distributed (F-DMM) mobility management scheme. In the former only the data plane was distributed, while in the latter both the data and control planes were distributed. In this work, it was shown that F-DMM outperforms the P-DMM strategy in terms of handover latency and packet loss. In [9], the functional differences of GTP and PMIP within the EPC were discussed, and the signaling costs of these protocols, using dynamic QoS and policy control, were evaluated. An introduction to the different LTE core network architectures and mobility management schemes was presented in [27]. In this work, an analytical model based on the two-dimensional hexagonal random walk model was proposed for comparing the performance of mobility management on different EPC core architectures. For the performance evaluations, the total signaling cost and the load on the network nodes were used as the comparison parameters.
Our work differs from the literature particularly in its comprehensive analysis of the four main possible scenarios in the LTE system, that is, the current (centralized) and future (decentralized) EPC architectures with the GTP or PMIP mobility protocols for 3GPP access. Compared to previous studies, which mostly focused on the performance analysis of MIP/PMIP protocols in general centralized and distributed approaches, we paid special attention to the LTE network, the dominant mobile networking technology that will accommodate the major part of worldwide mobile data traffic in the coming decade.
We conducted a hybrid modeling study, detailing the data and control plane functions of the GTP and PMIP protocols (the only mobility support mechanisms used in the EPC) for 3GPP access, to quantify the performance and scalability of the current and future LTE network architectures. The proposed model enables a detailed analysis of the load on the network entities and of the network efficiency parameters for the user's data and control planes as the number of mobile nodes increases. To the best of our knowledge, this is the first work to perform a quantitative analysis of such scale on the LTE system in order to compare the performance of different network architectures.

Conclusions
Although decentralization of the core network architecture has not yet been standardized by the 3GPP, it is seen as the direction for emerging mobile network (e.g., 5G) architectural standards. In this regard, there are various ongoing research projects and activities that aim to address the features and demands of current mobile networks in a decentralized architecture. In this article, we have conducted a detailed analysis and comparative study of centralized and decentralized network architectures in the LTE system for 3GPP access. We have carried out a hybrid study comprising simulation and analytical modeling to evaluate the load of the attached nodes' (devices') data traffic and mobility-related messages, as well as the latency and cost of the initial attachment, handover, and data packet delivery procedures in both network architectures, using the GTP and PMIP protocols. Our research aimed, in particular, to quantify the impact of the number of devices connected to the network on various performance and scalability metrics for both LTE network architectures.
Given the specified scenarios and parameters in our study, the obtained results show that decentralization of the LTE network architecture substantially reduces the load of data traffic (≈11 times) on the core of the network. Accordingly, this improves the latency of the attachment (≈44 times) and handover (≈61 times) procedures during node mobility and the latency of the data delivery procedure (≈11 times) (using GTP), which are key to providing higher QoS and QoE for the subscribers. It is also shown that a decentralized architecture (using GTP) imposes a remarkably lower processing cost on both the data plane (≈7 times) and the control plane (≈11 times during handover) (see Table 7), which is also an essential concern for network operators. The analysis indicated that, in the centralized architecture, PMIP achieves slightly lower growth rates for the latencies but yields higher increasing slopes for the loads and costs compared to the GTP protocol. However, in a decentralized architecture, the GTP protocol outperforms PMIP for all the performance metrics.
The presented approach helps to assess the network efficiency (in terms of the data and control plane latency and processing cost) and the network scalability (in terms of handling the data traffic load and signaling overhead) in the LTE network with different core network architectures. The outcomes of this research provide clear intuition about the impact of the growing number of users on the current and future LTE systems. In a future study, our analysis can be extended by taking into account other parameters, such as the additional control plane overhead required to address the core network features (e.g., mobility management, policy control, and accounting) and the investment and level of complexity required for the modification and maintenance of the network architectures. This can give mobile network operators trustworthy insight during decision making and policy development, so that the network infrastructure can be adapted to cope with future demands on mobile data traffic.

Discussion
This section briefly discusses the major modifications that would be required to the current EPS, as well as some of the technical challenges, to realize a decentralized LTE network architecture and to support the MN's mobility in this architecture.
8.1. Anchor Point Relocation. In the current 3GPP LTE specification, IP-based traffic continuity is not supported when an MN changes its EPS traffic anchor point (PGW), for example, during an interoperator roaming procedure. In the existing LTE system, the MN's traffic remains anchored to a single PGW until the MN moves out of the access network, and upon a handover to a different anchor point, flows initiated at the previous PGW are stopped. This is because the current LTE technical specifications contain no standard mechanism to support a handover procedure with PGW relocation and to provide continuity of bearers after PGW relocation. In a decentralized LTE architecture, the anchor points (S/PGWs) are placed closer to the edge of the network to locally handle the MNs' traffic and mobility, and the MNs are expected to change their anchor point far more often. In this case, two layers of mobility management are needed in order to handle the MNs' IP traffic continuity during a handover with an S/PGW relocation: (i) within the EPC network (between the S/PGWs and eNodeBs) and (ii) above the EPC network (between the S/PGWs and CNs). The main obstacle to implementing IP address continuity within the EPC network comes from the fact that, in the current EPS architecture, there is neither a signaling nor a data forwarding scheme available between two different PGW entities. However, by combining the PGW and SGW functionalities into a single entity (S/PGW), the current standard solution and messages used for a handover procedure with SGW relocation can be revised and modified to support IP traffic continuity between different S/PGW domains. We have addressed this issue in our previous works, and detailed information about the related modifications can be found in [19,20].
The mobility above the anchor points (S/PGWs) is discussed in the following section.

Traffic Steering on Top of the Anchor Points.
As described previously, in the current EPC architecture, the PGW as a central anchor point handles all the MNs' data traffic and mobility-related functions. However, in an LTE network with a decentralized architecture, the S/PGWs are distributed closer to the edge of the network, anchoring the locally attached MNs and handling mobility for those users moving between eNodeBs. In this case, an additional mobility support mechanism also needs to be implemented to keep ongoing sessions active (above the EPC) for the MNs performing a handover with mobility anchor (S/PGW) relocation. To address this issue, we have developed two network layer solutions [19,20] and one transport layer solution [21]. Generally, the mobility support approaches running at the network layer handle the MN's mobility requirements in a transparent manner and hide any changes from the upper layers. However, they require some infrastructural modifications and impose extra overhead. Transport layer mobility management schemes keep the network infrastructure intact and implement the whole functionality for supporting the MN's mobility in the transport layer of the end hosts. Different approaches may impose further signaling effort on the network, which must also be taken into account in the calculation of the performance metrics.

Unifying Anchor Points Data.
Besides the traffic forwarding and mobility anchoring functions, the PGW is also centrally in charge of other tasks such as policy enforcement, packet filtering, packet screening, lawful interception, and charging for each MN. Moving towards a decentralized EPC architecture leads to the distribution of several S/PGWs on the edge of the access network. This demands common templates of the aforementioned functions for the S/PGWs, so that consistent rules are applied to the MNs performing handover among them. To do so, additional network connections and synchronization mechanisms (or, e.g., resource pooling and memory sharing) need to be applied between the S/PGWs and also with the other EPC components (e.g., MME and PCRF).

Figure 3: LTE data and control planes for 3GPP access.

Figure 6: The network topology used in the analytical model.

Figure 7: A general view of LTE decentralized architecture.
The GTP-Based Approach. LAP_i only includes the delay of processing the C.S.Req/Res and M.B.Req/Res messages on the S/PGW. Similarly, LHP_h is the delay of processing the M.B.Req/Res messages in the first attached S/PGW during eNodeB relocation and in the second one during S/PGW relocation.
The PMIP-Based Approach. LAP_i is the delay of processing the G.C.S.Req/Res messages, including the SCTP and IPv6 protocol headers, and the P.B.U/P.B.A messages on the S/PGW. LHP_h for eNodeB relocation only includes the processing delay of the G.C.S.Req/Res messages in the S/PGW. In the case of S/PGW relocation, the delay is due to processing the G.C.S.Req/Res and P.B.U/P.B.A messages in the target S/PGW.
The GTP-Based Approach. As described in Section 4.2.2, the decentralized architecture has no control or data plane tunneling in the core network. Therefore, CAP_i only includes the cost of processing the C.S.Req/Res and M.B.Req/Res messages in the S/PGW. CHP_h includes the cost of processing the M.B.Req/Res messages in the first attached S/PGW for eNodeB relocation and in the target S/PGW during S/PGW relocation.
The PMIP-Based Approach. In this approach, CAP_i is the cost of processing the G.C.S.Req/Res messages, including the SCTP and IPv6 protocol headers, and the P.B.U/P.B.A messages in the S/PGW. CHP_h for eNodeB relocation only covers the cost of processing the G.C.S.Req/Res messages, including the SCTP and IPv6 protocol headers, in the first S/PGW. During S/PGW relocation, it describes the cost of processing the G.C.S.Req/Res and P.B.U/P.B.A messages in the second S/PGW.

Figure 8: The latency and cost of GTP and PMIP mobility protocols in EPC centralized and decentralized architectures.
Figure 9, panel (f): Control messages load (KB) on the routers in decentralized architecture (x-axis: number of mobile nodes).

Figure 9: The data plane and control plane loads, over the network routers and links in EPC centralized and decentralized architectures.

Table 7: Performance of GTP and PMIP in the various EPC network architectures.