A Trace Data-Based Approach for an Accurate Estimation of Precise Utilization Maps in LTE



Introduction
Long-Term Evolution (LTE) was defined as an evolution of the Universal Mobile Telecommunications System (UMTS), promising higher data rates, higher spectral efficiency, lower latency, and more flexible channel bandwidths than its predecessors. In practice, these ambitious goals can only be achieved through an effective network management process. Thus, automatic network management has been recognized as one of the key drivers for the success of LTE networks [1].
To manage a cellular network, operators must handle a vast amount of information in the form of counters, alarms, events, charging data records, or trouble tickets. For simplicity, network planning and optimization are nowadays carried out based on Key Performance Indicators (KPIs) [2]. KPIs are computed from counters stored in network elements, referred to as Performance Measurements (PMs).
PMs are updated with connection events and periodically uploaded to the network management system. During network planning, the future system is designed to fulfil target values of a predefined set of these KPIs [3]. At this stage, either analytical models or simulation tools can be used to estimate network performance from traffic, mobility, and propagation predictions. After network deployment, the accuracy of these tools can be improved by collecting live measurements. This process, known as measurement-based replanning, has successfully been applied to antenna tilt planning [4], frequency planning [5], adjacency planning [6], and network hierarchical structuring [7]. Later, network parameter plans can be fine-tuned on a per-cell and periodic basis by automatic optimization procedures, either to track fluctuations due to fast changes in traffic demand or as a temporary solution until a corrective replanning action is carried out [8]. However, operators cannot detect localized performance problems (e.g., coverage holes) in a live network from PMs. Although these issues could also be detected in drive tests with specialized equipment, such measurement campaigns can only cover small geographical areas for economic reasons.
With the latest advances in information technology, it is now possible to process huge data volumes almost in real time [10]. This process, known as Big Data Analytics (BDA), can boost the performance of network management procedures by considering very detailed information that was previously discarded. Thus, BDA has been identified as an enabling technology for Self-Organizing Networks (SON) in 5G systems. A first step in the exploitation of data is the combination of User Equipment (UE) measurements with positioning information to build accurate network performance maps (a.k.a. X-maps) [11]. These maps, obtained by geolocating measurements, can be used to calibrate propagation models in network planning tools or to pinpoint performance issues in a live network, thus reducing the need for drive tests during network operation. Location information can be provided by the UE (e.g., in Assisted Global Navigation Satellite System (A-GNSS)) or computed by the network from UE measurements (e.g., Observed Time Difference of Arrival (OTDOA) or Enhanced Cell Identifier (ECID)) [12]. Measurement collection can be automated by the Minimization of Drive Tests (MDT) feature [13]. A second step in the exploitation of data is the processing of traces collected by network elements. Traces systematically register all signaling events associated with a specific cell/user over some period of time. Thus, the temporal granularity of traces is much finer than the reporting period (ROP) of PMs (a few minutes against one hour, typically). This detailed information can be used to extend the analysis capabilities of automatic network planning and optimization algorithms in centralized SON tools located in the network management system. However, due to the large amount of information to be saved, overload problems can occur when trace files exceed a certain size limit. An overloaded trace file could lead to wrong conclusions when analysed. Thus, a proper trace file configuration (i.e., how much information must be saved and for how long) is key to avoiding trace file overload.
In this work, an automatic method is proposed to improve the accuracy of radio network utilization measurements in an LTE network. The output of the method is an accurate estimate of the spatiotemporal distribution of cell resource utilization, showing the amount of radio resources demanded by users in small regions of the service area of each cell (in the order of tens of meters) within a short time interval (in the order of minutes). Utilization distributions (a.k.a. utilization maps) are computed offline from statistics of radio resource elements (REs) in connection traces stored in the network management system. The proposed approach detects overloaded trace files, so that wrong conclusions from corrupted files are avoided. The resulting maps can be used by operator personnel to visualize traffic patterns or to improve automatic planning and optimization algorithms in a centralized SON tool. The proposed method is tested with real connection traces taken from a large geographical area of a live LTE network.
The rest of the paper is organized as follows. Section 2 covers related work to clarify the novelty of the paper. Section 3 describes the network data handled by the method, namely, network performance indicators, connection traces, and location information. Section 4 presents the method to build the spatiotemporal distributions of network utilization from connection traces. Then, Section 5 presents an example of how the method works in a live LTE scenario. Finally, the conclusions are presented in Section 6.

Related Work
In the literature, several methods have been described to derive the spatial traffic distribution in a cellular network from live measurements. In [14], the spatial user distribution of each cell in a 2G network is estimated for coverage optimization purposes from the ring defined by Timing Advance statistics [15]. In [16], a method is proposed to construct realistic spatial user and traffic distributions for cellular networks with time-varying usage intensities per land-use class from PMs. The spatial distributions consider user behavior for different times of the day and different days of the week. PMs collected on a cell basis are used to weight distributions according to the traffic in the live network. Unfortunately, these PMs use only a few samples of the network performance within a ROP to create a distribution (e.g., one sample per minute in a one-hour ROP), which is a rough approximation to the real behavior of the network. Other studies, like [17], use PMs that are classified into spatial bins, so that more complex PMs can be calculated from the stored information.
Using a more detailed input, such as traces, can be advantageous to enrich network data. In [11], data from drive tests are used to build up estimates of the spatial network performance, commonly known as X-maps (X refers to the parameter distributed in spatial bins). Thus, propagation and network performance models can be tuned. However, the main drawbacks of drive tests are their cost and access restrictions. While drive tests are limited in space and time, traces collected by base stations are easily available for the whole network service area and for long time periods.
In [18], several sources of information, including data connection traces, are used to estimate the average percentage of uplink/downlink Physical Resource Blocks (PRBs) used in a cell. However, only a cell-based statistic is built, and no spatial or temporal distribution of such a usage percentage is calculated. Similarly, a comprehensive analysis of several widely accepted throughput performance indicators based on real LTE connection traces is presented in [19]. The analysis is performed on a per-cell and per-connection basis, but no spatiotemporal distribution is derived.
In the context of SON, several works have considered the use of traces to improve self-healing and self-optimization algorithms. In [20], a root cause analysis methodology based on user connection traces is proposed to identify the cause of abnormal connection releases in an LTE system. In [21], a self-tuning algorithm is proposed for adjusting antenna tilts in an LTE system on a cell-by-cell basis. For this purpose, several new indicators are computed from user connection traces to detect insufficient cell coverage, cell overshooting, and abnormal cell overlapping.
In this work, data connection traces are used to construct a temporal and spatial map describing how radio resource usage is distributed both in time and in space. The use of connection traces provides a fine space/time granularity, which can be used to better understand network traffic.

System Model
In this section, the network data involved in the construction of the utilization distributions are presented. First, the network performance indicators involved are introduced, and then connection traces are explained.

Traffic and Network Capacity Indicators.
A first indicator for the analysis of cellular network capacity is the carried traffic, defined as the volume of data transferred by the network. Carried traffic mainly depends on user demand (offered traffic), provided that the network capacity is sufficient. Figure 1 shows the structure of the time and frequency resources available for data transmission in LTE (normal cyclic prefix case) [22], which defines what has been referred to above as network capacity. As seen in the figure, the smallest resource unit is the resource element (RE), consisting of 1 subcarrier for the duration of 1 Orthogonal Frequency-Division Multiple Access (OFDMA) symbol. REs are grouped into Resource Blocks (RBs), each consisting of 84 REs (= 12 subcarriers ⋅ 7 symbols). Such RBs are usually assigned in pairs in the time domain (2 time slots). The maximum number of RBs depends on the available system bandwidth, ranging from 6 to 100 for 1.4 and 20 MHz, respectively [22].
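The resource grid arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function and constant names are hypothetical, the intermediate bandwidth entries are the standard LTE channel configurations (the text itself only quotes the 1.4 and 20 MHz end points), and the count includes REs that are reserved for signaling.

```python
# Illustrative constants for the LTE downlink grid (normal cyclic prefix).
SUBCARRIERS_PER_RB = 12
SYMBOLS_PER_SLOT = 7
SLOTS_PER_TTI = 2  # RBs are assigned in pairs in the time domain

# RBs per channel bandwidth (MHz); the text quotes only the two end points.
RBS_PER_BANDWIDTH_MHZ = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}


def res_per_rb() -> int:
    """REs in one RB (12 subcarriers x 7 symbols)."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SLOT


def res_per_tti(bandwidth_mhz: float) -> int:
    """All REs in one TTI (an RB pair per RB position), reserved REs included."""
    n_rb = RBS_PER_BANDWIDTH_MHZ[bandwidth_mhz]
    return res_per_rb() * SLOTS_PER_TTI * n_rb


print(res_per_rb())     # 84
print(res_per_tti(20))  # 16800
```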
In the LTE downlink, some REs are reserved for special purposes, such as Reference Signals (RS), synchronization signals, control signaling, and critical broadcast system information. Only the remaining REs are used for data transmission. Additionally, cell (or network) utilization measures the amount of radio resources used in a cell (or network), usually relative to the maximum number of resources that can be used (i.e., a ratio is usually provided). More specifically, the cell utilization ratio, CUR, in a cell $c$ is defined as

$$\mathrm{CUR}^{(\mathrm{PM})}(c) = \frac{\mathrm{UR}(c)}{\mathrm{AR}(c)}, \qquad (1)$$

where $\mathrm{UR}(c)$ is the total number of used radio resources (typically RBs) and $\mathrm{AR}(c)$ is the total number of available resources for data transmission in cell $c$. Superscript PM refers to the origin of the information used for the CUR calculation, in contrast to the CUR calculation from data connection traces, as explained later. Note that both measures, UR and AR, come from the aggregation of information during the ROP; that is, UR is the sum of all RBs used across all TTIs in the measurement period, and AR is the product of the number of available RBs in cell $c$ and the number of TTIs per period.
Similarly, a network cell utilization ratio (CUR) is defined as the average of $\mathrm{CUR}^{(\mathrm{PM})}(c)$ over all cells in the network. CUR is typically available in live network equipment as a PM on a per-cell and per-ROP basis. Unfortunately, one value per ROP is a coarse time scale if a higher time resolution is desired. As an alternative to PMs, data traces supply information on used radio resources at a finer level, so an alternative CUR can be calculated from data connection traces.
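As a minimal sketch of the PM-based ratio described above, the following Python snippet computes the utilization of one cell from the RBs used in each TTI and then averages over cells. The function names and the input layout (one integer per TTI) are assumptions made for illustration; real PM counters are already aggregated by the network equipment.

```python
from typing import Dict, Sequence


def cell_utilization_ratio(used_rbs_per_tti: Sequence[int], available_rbs: int) -> float:
    """CUR = UR / AR for one cell over one measurement period."""
    if available_rbs <= 0:
        raise ValueError("available_rbs must be positive")
    n_tti = len(used_rbs_per_tti)
    if n_tti == 0:
        return 0.0
    ur = sum(used_rbs_per_tti)   # RBs used across all TTIs
    ar = available_rbs * n_tti   # available RBs times number of TTIs
    return ur / ar


def network_utilization_ratio(cur_per_cell: Dict[str, float]) -> float:
    """Network-level CUR as the plain average of the per-cell values."""
    return sum(cur_per_cell.values()) / len(cur_per_cell)


# Hypothetical example: 4 TTIs in a cell with 50 RBs.
print(cell_utilization_ratio([10, 0, 25, 15], 50))           # 0.25
print(network_utilization_ratio({"A": 0.25, "B": 0.75}))     # 0.5
```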

Data Traces.
Data trace files (DTFs) consist of records with radio-related measurements performed by a UE or a base station. Such measurements can be recorded periodically (e.g., every minute) or triggered by a specific event (e.g., a connection ends). DTFs can be classified into User Equipment Traffic Recordings (UETR) and Cell Traffic Recordings (CTR) [23]. UETR are used to monitor a specific user, while CTR are used to monitor cell performance. Both UETR and CTR are constructed from individual connections, but, in UETR, the operator decides the tracked UE, whereas, in CTR, all (or a random subset of) UEs in a cell are recorded [24]. The types of events collected in UETR and CTR are as follows:
(i) Internal events are generated inside the eNodeB and sent to the Operation and Support System Radio and Core (OSS-RC) for monitoring purposes. Internal events contain information on events, procedures, and periodic reports at UE, cell, or eNodeB level. They can be event-triggered, periodical, or related to procedures taking place in the UE or the cell.
(ii) External events are generated externally to the eNodeB, corresponding to Layer 3 protocol messages.
In LTE, eNodeBs store Radio Resource Control (RRC) messages received from the UE through the LTE-Uu interface and the messages exchanged with other eNodeBs through either the X2 or the S1 interface.
Trace collection starts with the operator defining the event(s), UEs, and cells to be monitored, and the reporting period (i.e., ROP). After trace collection is enabled, the UE transfers its measurement records to its serving eNodeB. When the ROP is finished, the eNodeB generates CTR and UETR trace files encoded in ASN.1 format, which are then sent to the OSS-RC. Some of the measurements recorded in traces (e.g., the amount of radio resources used by the UE) are reported at the end of the user connection. A user connection is defined as the time spent by a user in a cell, from the context setup to a context release or a HandOver (HO).
In other words, a user connection is considered to end when the service is terminated (e.g., a voice call ends) or when the serving cell changes (e.g., a HO, leading to more than one connection in a voice call). Events are saved in trace files with a maximum size set by the operator. Depending on how the trace collection process is configured (i.e., how many UEs are monitored, how much information must be saved, or how long the ROP is), the maximum size of the data trace file can be reached before the ROP ends. If this happens, the trace file is overloaded. Note that, when overload occurs, events for the rest of the ROP are not saved in the trace file. The trace file is thus corrupted, and any statistic extracted from that file might be incorrect.

Method Description
In this section, a method to build a spatial and a temporal distribution of CUR on a cell basis using connection traces is presented. Utilization maps make it possible to determine accurately how resources have been used. The method also detects overload situations in trace files and can still produce a valid estimate for the utilization maps in these circumstances.
The method consists of four stages, namely, data collection, data preprocessing, construction of the distributions, and overload detection/correction.

Data Collection.
The input dataset consists of traces collected in the network management system for one or more ROPs. In this work, trace information is enriched by specific vendor software, Ericsson's Trace Processing Server (TPS) tool. TPS adds information to the reported events, such as the Call ID or the Cell ID. Moreover, TPS provides each event with a geolocation. Note that users are not typically geolocated, and spatial information is needed for the construction of utilization maps. Different geolocation calculations are used by TPS, depending on the Reference Signal Received Power (RSRP) data available in every event, as follows:
(i) If at least three radio measurement reports corresponding to sectors located in different sites are available for the event, then the UE's position is estimated by a classical triangulation method from the corresponding site locations [25].
(ii) If only two radio measurement reports from different sectors in the same site are available for the event, the UE's position is estimated with additional Timing Advance (TA) measurements. Briefly, the distance between the serving cell and the UE is estimated from TA measurements, defining a ring around the antenna where the user can be located, and the angle (i.e., the UE's location inside that ring) is estimated from the relative sector locations [26].
(iii) Finally, if RSRP measurements are reported from only one sector for the same event, TA measurements and sector planning information (i.e., location, azimuth, and beam width) are used to estimate the UE's position.
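The three cases above amount to a simple decision rule on the RSRP reports available for an event. The sketch below captures only that rule; it is not the TPS implementation (which is vendor software), the function name and return labels are hypothetical, and combinations not covered by the text (e.g., reports from exactly two sites) are returned as "unspecified".

```python
from typing import Iterable, Tuple


def select_geolocation_method(reports: Iterable[Tuple[str, str]]) -> str:
    """Pick a positioning strategy from (site_id, sector_id) RSRP reports."""
    sectors = set(reports)                    # distinct reported sectors
    sites = {site for site, _ in sectors}     # distinct sites among them
    if len(sites) >= 3:
        return "triangulation"                   # case (i)
    if len(sectors) == 2 and len(sites) == 1:
        return "timing_advance_two_sectors"      # case (ii)
    if len(sectors) == 1:
        return "timing_advance_sector_planning"  # case (iii)
    return "unspecified"                         # not described in the text


print(select_geolocation_method([("A", "1"), ("B", "1"), ("C", "2")]))
print(select_geolocation_method([("A", "1"), ("A", "2")]))
print(select_geolocation_method([("A", "1")]))
```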
As a result, the information extracted from connection traces in the dataset is as follows:
(i) A unique identifier for each connection (referred to as connection ID or CID) and record (referred to as event ID or EID).
(ii) The total number of REs used by each connection and TTI, $N_{\mathrm{UsedRE}}$ (e.g., 1 RE continuously assigned for 1 second results in $N_{\mathrm{UsedRE}} = 1000$).

Data Preprocessing.
Once connection traces have been collected and enhanced by the TPS, a preprocessing stage starts. This stage consists of two steps: (a) reordering records and (b) sharing the used resources along the connection time. Figure 2 illustrates this preprocessing stage. In a first step, all records in data traces are grouped and ordered by their CID (i.e., they are ordered by the connection they belong to). Hereafter, the number of CIDs available in a data trace is defined as $N_{\mathrm{CID}}$ and $i$ is the index of the $i$th connection (i.e., $i \in \{1, \ldots, N_{\mathrm{CID}}\}$). The number of records in every connection is defined as $N_{\mathrm{rec}}(i)$ (e.g., $N_{\mathrm{rec}}(888) = 3$ in Figure 2). Then, all records in a connection are sorted by their timestamp. Note that any record in a data trace is fully identified by its EID. Index $j$ is used in this work to refer to a generic EID (i.e., a generic record), and thus any indicator in records can be referenced by that index (e.g., $\mathrm{CID}(j)$ or $\mathrm{UsedRE}(j)$ refer to the values of connection ID and used REs in the record with EID = $j$, resp.).
In a second step, the method estimates the number of resources reported by each event of a connection. Note that the amount of used REs for the whole connection, $N_{\mathrm{UsedRE}}(i)$, is only reported at the end of connection $i$. Thus, the evolution of used REs along the connection is not recorded. Current practice in network planning tools is to assign UE measurements summarized at the end of the connection to the location of the UE at the end of the connection (i.e., $\mathrm{UsedRE}(j) = 0$ for all records with CID = $i$, except for the record with the latest timestamp, for which $\mathrm{UsedRE}(j) = N_{\mathrm{UsedRE}}(i)$; see Figure 2).
Alternatively, it is proposed here that all records in a connection include an estimate of the REs used during the time interval since the last reported record. The total number of REs used during the whole connection, $N_{\mathrm{UsedRE}}(i)$, is distributed along the records reported during connection $i$. For this purpose, $N_{\mathrm{UsedRE}}(i)$ is spread over $N_{\mathrm{rec}}(i)$ time intervals and assigned to the $N_{\mathrm{rec}}(i)$ records available for connection $i$. For simplicity, it is assumed that $N_{\mathrm{UsedRE}}(i)$ is uniformly distributed among records belonging to the same connection. Thus, the amount of radio resources assigned to every record is calculated as

$$\mathrm{UsedRE}(j) = \frac{N_{\mathrm{UsedRE}}(i)}{N_{\mathrm{rec}}(i)} \quad \forall j : \mathrm{CID}(j) = i. \qquad (2)$$

Figure 2 shows an example for CID = 888 and $N_{\mathrm{rec}}(888) = 3$.
The latter assumption of a uniform temporal distribution is not necessarily true for an individual UE on a millisecond scale (e.g., a voice call with discontinuous transmission), but it holds when resources from multiple users are aggregated in a utilization map with a resolution of minutes and tens of meters.
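The two preprocessing steps (grouping and sorting records, then sharing the per-connection total uniformly) can be sketched in Python as follows. The record layout (dictionaries with `eid`, `cid`, and `timestamp` keys), the function name, and the total of 900 REs in the example are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List


def share_used_res(records: List[dict], total_used_re: Dict[int, float]) -> List[dict]:
    """Group records by CID, sort by timestamp, and split the connection total evenly."""
    by_cid = defaultdict(list)
    for rec in records:
        by_cid[rec["cid"]].append(rec)

    shared = []
    for cid, recs in by_cid.items():
        recs.sort(key=lambda r: r["timestamp"])          # step (a): reorder
        per_record = total_used_re[cid] / len(recs)      # step (b): uniform share
        for rec in recs:
            shared.append({**rec, "used_re": per_record})
    return shared


# Hypothetical connection 888 with three records and 900 REs in total.
trace = [
    {"eid": 3, "cid": 888, "timestamp": 30.0},
    {"eid": 1, "cid": 888, "timestamp": 10.0},
    {"eid": 2, "cid": 888, "timestamp": 20.0},
]
for row in share_used_res(trace, {888: 900}):
    print(row["eid"], row["used_re"])   # each record receives 300.0
```

Splitting evenly per record rather than per elapsed time follows the uniform assumption stated in the text; a time-weighted split would be a different design.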

Construction of Utilization Maps.
Once the use of REs has been distributed among records on a per-connection basis, the temporal and spatial distributions, hereafter referred to as utilization maps, are constructed. These utilization maps report the CUR along time and space. In theory, the time resolution can be as fine as the timestamp granularity in data records (typically milliseconds). However, a coarser granularity is typically defined by the operator for practical reasons, so the time dimension is discretised into bins of equal size ($t_{\mathrm{bin}}$ = 1 minute, typically). Similarly, the spatial resolution can be as fine as that of the geolocation algorithms (typically meters), but a coarser granularity is desired, so a square spatial grid is defined by setting a spatial bin size ($s_{\mathrm{bin}}$, typically tens of meters). Thus, a discrete utilization map is defined as $\mathrm{CUR}(x, y, t, c)$, where $x$ and $y$ are the indexes in the spatial grid and $t$ denotes the time interval. An additional index, the serving cell $c$, is added to enable a cell-based analysis. This discrete CUR is calculated as

$$\mathrm{CUR}(x, y, t, c) = \frac{1}{N_{\mathrm{TTI}} \cdot N_{\mathrm{AvailRE}}(c)} \sum_{\substack{j:\ \mathrm{Cell}(j) = c \\ \mathrm{Loc}(j) \in \mathrm{bin}(x, y) \\ \mathrm{Time}(j) \in \mathrm{bin}(t)}} \mathrm{UsedRE}(j), \qquad (3)$$

where $\mathrm{Cell}(j)$, $\mathrm{Loc}(j)$, and $\mathrm{Time}(j)$ denote the serving cell, location, and timestamp of record $j$, the spatial bin is defined as $\mathrm{Loc}(j) \in [(x \cdot s_{\mathrm{bin}},\, y \cdot s_{\mathrm{bin}}),\, ((x+1) \cdot s_{\mathrm{bin}},\, (y+1) \cdot s_{\mathrm{bin}})]$, and the temporal bin is defined as $\mathrm{Time}(j) \in [(t-1) \cdot t_{\mathrm{bin}},\, t \cdot t_{\mathrm{bin}}]$. Also, $N_{\mathrm{TTI}}$ is the number of TTIs in $t_{\mathrm{bin}}$ and $N_{\mathrm{AvailRE}}(c)$ is the number of available REs in cell $c$ per TTI, computed as

$$N_{\mathrm{AvailRE}}(c) = N_{\mathrm{SC}} \cdot N_{\mathrm{Sym}} \cdot N_{\mathrm{RB}}(c), \qquad (4)$$

where $N_{\mathrm{SC}}$ is the number of subcarriers per Resource Block, $N_{\mathrm{Sym}}$ is the number of symbols per TTI, and $N_{\mathrm{RB}}$ is the number of RBs in the cell bandwidth. $N_{\mathrm{Sym}}$ is defined network-wide, whereas $N_{\mathrm{SC}}$ and $N_{\mathrm{RB}}(c)$ can take different values for different cells. In summary, (3) aggregates all values of $\mathrm{UsedRE}(j)$ in those records $j$ with serving cell $c$ that were created at some time inside the temporal bin and at some location inside the square spatial bin, and divides that aggregated value by the number of all available REs in that temporal bin. Note that $N_{\mathrm{AvailRE}}(c)$ is constant with time, as it can only change after a replanning action (e.g., the addition of a new carrier in cell $c$).
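The binning step described above can be sketched in Python as follows. The record layout (`x_m`, `y_m` in meters, `t_s` in seconds from the start of the trace, `cell`, `used_re`), the function names, and the example values are assumptions made for illustration; time bins are numbered from 1 and spatial bins from 0, and a TTI of 1 ms is assumed.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

TTIS_PER_SECOND = 1000  # assumed 1 ms TTI


def available_res_per_tti(n_sc: int, n_sym: int, n_rb: int) -> int:
    """Available REs per TTI: subcarriers per RB x symbols per TTI x RBs."""
    return n_sc * n_sym * n_rb


def build_utilization_map(
    records: List[dict],
    s_bin_m: float,
    t_bin_s: float,
    avail_re_per_tti: Dict[str, int],
) -> Dict[Tuple[int, int, int, str], float]:
    """Return {(x, y, t, cell): CUR} from records carrying a 'used_re' share."""
    n_tti = round(t_bin_s * TTIS_PER_SECOND)  # TTIs per temporal bin
    sums = defaultdict(float)
    for rec in records:
        x = math.floor(rec["x_m"] / s_bin_m)
        y = math.floor(rec["y_m"] / s_bin_m)
        t = math.floor(rec["t_s"] / t_bin_s) + 1
        sums[(x, y, t, rec["cell"])] += rec["used_re"]
    return {
        key: total / (n_tti * avail_re_per_tti[key[3]])
        for key, total in sums.items()
    }


# Hypothetical records falling in the same 25 m x 25 m, 1-minute bin.
avail = {"c1": available_res_per_tti(12, 14, 50)}  # 8400 REs per TTI
recs = [
    {"x_m": 30.0, "y_m": 60.0, "t_s": 75.0, "cell": "c1", "used_re": 2_520_000.0},
    {"x_m": 40.0, "y_m": 70.0, "t_s": 90.0, "cell": "c1", "used_re": 2_520_000.0},
]
print(build_utilization_map(recs, s_bin_m=25.0, t_bin_s=60.0, avail_re_per_tti=avail))
```

A sparse dictionary is used instead of a dense array because most spatial bins of a large service area contain no records.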
Finally, spatial and temporal utilization maps are obtained by aggregating across different dimensions of $\mathrm{CUR}(x, y, t, c)$. More specifically, the spatial utilization map $\mathrm{CUR}^{(\mathrm{spat})}$ is defined as

$$\mathrm{CUR}^{(\mathrm{spat})}(x, y) = \frac{1}{N^{t}_{\mathrm{bin}}} \sum_{c} \sum_{t=1}^{N^{t}_{\mathrm{bin}}} \mathrm{CUR}(x, y, t, c), \qquad (5)$$

where $N^{t}_{\mathrm{bin}}$ is the number of temporal bins during the ROP for the data trace (i.e., $N^{t}_{\mathrm{bin}} = \mathrm{ROP}/t_{\mathrm{bin}}$). Note that (5) aggregates CUR values at the same location, even if resources at that location have been assigned to different cells. This overlapping phenomenon (i.e., UEs at the same point are served by different cells) is usual due to fading in the mobile channel, especially at cell edges. The aim of $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$ is thus to observe the radio resources used at some point, whatever cell those resources are assigned to. If a spatial map for a particular cell, $c$, is desired, it can be easily obtained as

$$\mathrm{CUR}^{(\mathrm{spat})}(x, y, c) = \frac{1}{N^{t}_{\mathrm{bin}}} \sum_{t=1}^{N^{t}_{\mathrm{bin}}} \mathrm{CUR}(x, y, t, c). \qquad (6)$$

Besides the spatial analysis, operators usually aim to analyse the evolution of CUR along time for some cell $c$, so an additional temporal utilization map is defined on a cell basis, $\mathrm{CUR}^{(\mathrm{temp})}(c, t)$, and calculated as

$$\mathrm{CUR}^{(\mathrm{temp})}(c, t) = \sum_{x} \sum_{y} \mathrm{CUR}(x, y, t, c). \qquad (7)$$

Note that the time resolution of utilization maps does not necessarily coincide with the trace collection period (i.e., ROP) defined by the operator. As previously stated, the time resolution of utilization maps can be as fine as the event timestamp granularity, provided that event aggregation is carried out in a sufficiently small time window. Likewise, the time span can extend beyond one ROP if trace files from consecutive ROPs are merged. Thus, the ROP only limits the maximum frequency with which utilization maps are updated, but not their accuracy. In current networks, the ROP is fixed to 15 minutes for both PMs and traces, as this ensures a reasonable update frequency for PMs while reducing the exchange of information between base stations and the network management system. Such a period is short enough if utilization maps are used for visual checks by the operator.
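The aggregations described above reduce the four-dimensional map along time (spatial maps) or along space (temporal map). The following Python sketch assumes the map is stored as a dictionary keyed by `(x, y, t, cell)`, which is an illustrative choice, as are the function names and example values.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

CurMap = Dict[Tuple[int, int, int, str], float]


def spatial_map(cur: CurMap, n_t_bins: int, cell: Optional[str] = None) -> Dict[Tuple[int, int], float]:
    """Time-averaged CUR per spatial bin, for all cells or for one cell."""
    out = defaultdict(float)
    for (x, y, _t, c), value in cur.items():
        if cell is None or c == cell:
            out[(x, y)] += value / n_t_bins
    return dict(out)


def temporal_map(cur: CurMap, cell: str) -> Dict[int, float]:
    """CUR per temporal bin for one cell, summed over all spatial bins."""
    out = defaultdict(float)
    for (_x, _y, t, c), value in cur.items():
        if c == cell:
            out[t] += value
    return dict(out)


# Hypothetical map with two cells overlapping in spatial bin (0, 0).
cur = {
    (0, 0, 1, "a"): 0.5,
    (0, 0, 2, "a"): 0.25,
    (0, 0, 1, "b"): 0.25,
    (1, 0, 1, "a"): 0.125,
}
print(spatial_map(cur, n_t_bins=2))            # {(0, 0): 0.5, (1, 0): 0.0625}
print(spatial_map(cur, n_t_bins=2, cell="a"))  # {(0, 0): 0.375, (1, 0): 0.0625}
print(temporal_map(cur, cell="a"))             # {1: 0.625, 2: 0.25}
```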

Overload Detection and Correction.
An incorrect trace configuration causes excessive information to be saved and, as a consequence, the overload of the data trace file. When overload occurs, the amount of data overflows the collecting system and no information is saved from that moment until the space is flushed at the beginning of a new ROP. Overload in a cell $c$ can be easily detected when representing $\mathrm{CUR}^{(\mathrm{temp})}(c, t)$ as a lack of information from some temporal bin, $t_{\mathrm{over}}$, up to the end of the ROP (i.e., $\mathrm{CUR}^{(\mathrm{temp})}(c, t) = 0$, $\forall t > t_{\mathrm{over}}$). Note that events in cell $c$ are still reported after overload (i.e., radio resources are used for $t > t_{\mathrm{over}}$), but this information is not saved. The CUR calculation in (5) assumes that information in data traces has been collected for a complete ROP, so CUR could be underestimated when overload occurs and is not detected. This problem can be solved by replacing $N^{t}_{\mathrm{bin}}$ in (5) and (6) with the number of active bins $N^{t}_{\mathrm{act,bin}}$, that is, the number of temporal bins in a ROP before overload occurs, calculated as

$$N^{t}_{\mathrm{act,bin}} = \frac{t_{\mathrm{over}}}{t_{\mathrm{bin}}}, \qquad (8)$$

where $t_{\mathrm{over}}$ is the last timestamp registered before overload occurs, measured from the beginning of the ROP. Thus, an estimate of CUR for the complete ROP is obtained from the events reported before $t_{\mathrm{over}}$. This extrapolation assumes that the results obtained in the first part of the ROP, $t < t_{\mathrm{over}}$, can be extended to the rest of the ROP, $t > t_{\mathrm{over}}$.
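A minimal Python sketch of the detection and correction idea follows. It takes the per-bin utilization of one cell within one ROP (bins numbered from 1), treats trailing empty bins as overload, and averages only over the bins before the gap. The function names and example values are hypothetical, and the sketch cannot distinguish overload from a cell that genuinely carried no traffic at the end of the ROP.

```python
from typing import Dict


def last_active_bin(temporal_cur: Dict[int, float], n_t_bins: int) -> int:
    """Index of the last temporal bin with data (0 if the ROP is empty)."""
    active = [t for t in range(1, n_t_bins + 1) if temporal_cur.get(t, 0.0) > 0.0]
    return max(active) if active else 0


def rop_cur(temporal_cur: Dict[int, float], n_t_bins: int, correct: bool = True) -> float:
    """Average CUR over the ROP, optionally using only the active bins."""
    n_act = last_active_bin(temporal_cur, n_t_bins)
    if n_act == 0:
        return 0.0
    total = sum(temporal_cur.get(t, 0.0) for t in range(1, n_act + 1))
    return total / (n_act if correct else n_t_bins)


# Hypothetical 15-bin ROP in which data stops after bin 6.
cur_t = {t: 0.125 for t in range(1, 7)}
print(last_active_bin(cur_t, 15))         # 6
print(rop_cur(cur_t, 15, correct=False))  # 0.05 (underestimated)
print(rop_cur(cur_t, 15, correct=True))   # 0.125
```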

Proof of Concept
Utilization maps are built with the above-described approach from real connection traces collected in a live LTE network. For clarity, the analysis setup is described first and results are presented later.

Analysis Setup.
Data is taken from a cluster of 74 cells, covering a geographical area of approximately 500 km². Data traces were collected on a working day (Tuesday) from 12:00 to 17:00 on a ROP basis of 15 minutes (i.e., 20 data traces are available per cell). Additionally, the channel configuration in this network is $N_{\mathrm{SC}} = 12$, $N_{\mathrm{Sym}} = 14$, and $N_{\mathrm{RB}} = 50$ for all cells in the cluster (i.e., $N_{\mathrm{AvailRE}}(c) = 12 \cdot 14 \cdot 50 = 8400$).
Table 1 shows the main statistics from data traces, where average, maximum, and minimum values are shown for the duration of the connection in seconds, the number of records per connection, $N_{\mathrm{rec}}(i)$, and the number of used REs per connection, $N_{\mathrm{UsedRE}}(i)$, in bytes. Temporal and spatial maps are built for $t_{\mathrm{bin}}$ = 1 minute and $s_{\mathrm{bin}}$ = 25 m.
Ideally, method assessment should have been carried out in a controlled environment (e.g., a drive test in a precommercial network). Unfortunately, only passive measurements were possible. In the absence of such tests, the only way to check the consistency of the method (apart from some isolated visual consistency checks) is by comparing against PM values. Thus, before computing utilization maps, and with the aim of checking the integrity of network data, it is checked that $\mathrm{CUR}^{(\mathrm{PM})}(c)$ (i.e., the overall CUR value in cell $c$ extracted from PMs) is identical to the CUR value extracted from the aggregation of information in data traces for every cell $c$. The latter indicator is labeled as $\mathrm{CUR}^{(\mathrm{traces})}(c)$ and calculated as

$$\mathrm{CUR}^{(\mathrm{traces})}(c) = \frac{1}{N^{t}_{\mathrm{bin}}} \sum_{t=1}^{N^{t}_{\mathrm{bin}}} \sum_{x} \sum_{y} \mathrm{CUR}(x, y, t, c). \qquad (9)$$

From this comparison, it is observed that, in some cells, $\mathrm{CUR}^{(\mathrm{PM})}(c) \neq \mathrm{CUR}^{(\mathrm{traces})}(c)$. A closer analysis shows that this difference is due to overload problems in the trace file. Note that, when overload is detected, $\mathrm{CUR}^{(\mathrm{traces})}(c)$ is estimated (and not measured), so small differences can be encountered.

Temporal Distribution.
Figure 3 shows the temporal utilization map for a nonsaturated cell $c_1$, $\mathrm{CUR}^{(\mathrm{temp})}(c_1, t)$. For an easier analysis, only one hour is presented (of the 5 hours available), corresponding to the interval from 16:00 to 17:00. The PM showing the average value for the whole hour is also superimposed (6.83%). As shown in the figure, there are utilization peaks at 16:05, 16:24, 16:25, and 16:48 hours.
The maximum value (21.48%) is reached at 16:05 hours. The average deviation of $\mathrm{CUR}^{(\mathrm{temp})}(c_1, t)$ in this cell and hour is 3.61%, and the mean average deviation for all the nonsaturated cells in that hour is 4.71%, with maximum and minimum values of 18.28% and 0.52%, respectively. Note that PM information only gives average CUR values.
Figure 4 shows the temporal utilization map for a saturated cell $c$, $\mathrm{CUR}^{(\mathrm{temp})}(c, t)$. The overload phenomenon is clearly observed in the figure ($t_{\mathrm{over}} \sim 7$ minutes for every ROP in the figure). Specifically, 42% of cells (i.e., 31 cells out of 74) show this saturation phenomenon at some time during the measurement period (5 hours). The average valid time under overload (i.e., the time from the beginning of the ROP until overload occurs) is 6 (of 15) minutes; that is, information is not available for 60% of the ROP time on average when overload occurs. This shows the importance of overload problems.

Spatial Distribution.
Figure 5 shows $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$ in percentage for the whole scenario. As in temporal distributions, CUR is calculated for the period between 16:00 and 17:00 hours. White points in the figure indicate that there was no use of REs, or no event at all, at that spatial bin. As previously said, a spatial value of $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$ may include the use of resources from more than one cell. The absolute CUR value in Figure 5 is strongly dependent on the spatial granularity. More important is the information about the spatial use of radio resources and the differences across points in the figure, especially within the same cell. The average deviation of $\mathrm{CUR}^{(\mathrm{spat})}(x, y, c)$ in a cell is 0.04%, with 7.84% as the maximum value for this scenario (the minimum value is negligible, i.e., there is at least one cell where utilization is almost uniformly distributed). This spatial information helps to detect and locate problems in the cell service area, which is the main aim of the utilization map.

Spatiotemporal Distribution.
Utilization maps can provide additional information when spatial distributions are compared at different time instants. Figure 6 shows two spatial traffic distributions for a small area covering a few cells at different time instants. More specifically, Figure 6(a) shows $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$ in percentage values at 16:01 hours on a working day (i.e., the summation along $t$ in (5) is replaced by a concrete instant $t_0$, with $t_{\mathrm{bin}}$ = 1 min), and Figure 6(b) shows $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$ values at 16:20 hours ($t_1$). Black points in the figure locate cell sites in the area under study. The comparison of both plots in Figure 6 shows important differences in the location of the spatial bins with a high CUR value. These changes were expected due to the significant user movements at the time when both plots were built (i.e., at the end of the work day). More importantly, the differences between plots in Figure 6 illustrate how utilization maps are an effective tool to detect problems with an important space-time component (e.g., large flows of people along the day) and to design effective planning/optimization actions when necessary.

Figure 3: Temporal utilization map in a nonsaturated cell.

Figure 4: Temporal utilization map in a saturated cell.

Figure 6: Spatial utilization maps $\mathrm{CUR}^{(\mathrm{spat})}(x, y)$, in percentage, at (a) 16:01 hours and (b) 16:20 hours on a working day, each computed for a single temporal bin with $t_{\mathrm{bin}}$ = 1 min. Black points locate cell sites in the area under study.

Table 1: Resource elements per channel configuration.