Self-Organized Cell Outage Detection Architecture and Approach for 5G H-CRAN

Heterogeneous cloud radio access networks (H-CRAN) have become one of the important components of 5G networks, providing ubiquitous high-bandwidth services with flexible network construction. However, massive access nodes increase the risk of cell outages, which degrade user-perceived QoS (Quality of Service) and QoE (Quality of Experience). Thus, cell outage management (COM) has become a key function among SON (Self-Organized Networks) use cases. Within COM, cell outage detection (COD) must be resolved before cell outage compensation (COC). Currently, few studies concentrate on COD for 5G H-CRAN, so we propose a self-organized COD architecture and approach for it. We first summarize current COD solutions for LTE/LTE-A HetNets and then introduce a self-organized architecture and approach suitable for H-CRAN, including the COD architecture, its procedures, and the corresponding key technologies. Based on the architecture, we present a use case with handover data analysis that uses a modified LOF (Local Outlier Factor) detection approach to detect outages for different kinds of cells in H-CRAN. Results show that the proposed approach can identify the outage cell effectively.


Introduction
Future 5G networks will provide ubiquitous high-bandwidth network services for various types of heterogeneous terminals by integrating different advanced technologies, such as millimeter wave, large-scale Multiple-Input Multiple-Output (MIMO), and Nonorthogonal Multiple Access (NOMA) for wireless access, and Software Defined Network (SDN) and cloud computing for core networks [1]. To make the network more efficient, 5G will combine centralized performance processing with distributed demodulation. C-RAN has become a new type of network architecture because of its advantages in cloud computing and distributed processing, which enable efficient usage and sharing of computing resources [2]. However, LTE/LTE-A heterogeneous networks (HetNet), which provide differentiated data services, will still be carried over into 5G networks. To improve spectrum efficiency and energy efficiency for HetNet coexisting with C-RAN, a new 5G wireless network architecture called H-CRAN has been proposed [3]. A brief architecture of H-CRAN is shown in Figure 1. Each NodeC connects to ACEs or RRHs through wired or wireless fronthaul, users communicate with ACEs or RRHs over wireless links, and D2D links are also adopted to improve edge user throughput. NodeCs are located in the cloud environment and can connect to the Internet.
In 5G H-CRAN, Building Baseband Unit (BBU) pools are centralized into cloud computing nodes called NodeCs [3]. These nodes can communicate with each other and control regional access nodes such as remote radio heads (RRHs) and existing legacy communication entities (denoted as ACEs in the rest of this paper). All NodeCs are located in the cloud and connect to the Internet, and they control ACEs and RRHs through wired or wireless fronthaul. Moreover, a NodeC collects and processes the spatial information from ACEs and RRHs and performs most of the processing, calculation, and radio resource management functions. Here ACEs (such as eNodeBs) play the role of providing seamless coverage, and RRHs are responsible for hot spots. Users in H-CRAN comprise different kinds of devices, such as mobile phones, laptops, smart meters, and Internet of Things (IoT) devices, and may adopt advanced communication technologies such as D2D [4].
To realize universal plug and play, offload network traffic immediately, and manage computing and spectrum resources dynamically, SON plays an important role in the intelligent management of H-CRAN. SON was proposed to reduce service providers' operating expenses in LTE/LTE-A systems and HetNet [5]. H-CRAN consists of a large number of heterogeneous access nodes and cloud computing resource units, and its resources should be virtualized for sharing as well. Therefore, a large-scale SON (LS-SON) that integrates a unified autonomic process of ultra-computing, ultra-planning, ultra-configuration, and ultra-optimization is preferred [3].
LS-SON can reduce the complexity of cochannel interference management in H-CRAN and thus save the operating costs of all RRHs and ACEs. It is therefore used to coordinate the management functions of the entire network and improve the overall operational efficiency. Because a NodeC needs to serve multiple RANs and cooperate with RRHs, a centralized SON architecture is considered for LS-SON to implement self-configuration, self-optimization, and self-healing.
As one of the critical functions and use cases of SON, self-healing not only identifies fault events but also diagnoses their causes (for example, deciding why a fault happened) and then triggers an appropriate compensation mechanism to return the network to its normal state [6]. In H-CRAN, it first has to perceive a fault that has occurred or is about to occur and then takes proper actions to recover the services (partially or wholly, definitively or temporarily).
According to the 3GPP standard, the self-healing function can be further divided into multiple use cases, including fault diagnosis, fault classification, and COM. COM first needs to detect the cell outage automatically and then perform a reasonable compensation mechanism to repair the faults, so as to minimize the impact of the cell outage [7]. Therefore, COD is an essential prerequisite for self-healing.
At present, studies of COD mainly target traditional LTE/LTE-A HetNet with limited data collection, which may not be appropriate for the complex H-CRAN network architecture. This paper proposes a self-organized COD architecture and a corresponding approach for H-CRAN based on previous work and gives a use case with handover (HO) data analysis to evaluate them.
The rest of the paper is organized as follows. Section 2 introduces the related work on COD. Section 3 proposes the architecture and approach for 5G H-CRAN COD and introduces the COD procedures and related technologies. In Section 4, a COD use case for HO data analysis with the modified LOF (M-LOF) is introduced, and conclusions are given in Section 5.

Related Work
In wireless communication networks, cell outage is mainly caused by software and hardware faults that interrupt network communication, thus affecting network QoS and users' QoE [8]. At present, existing COD research mainly focuses on LTE/LTE-A HetNet. Several detection methods use data collected by drive tests or subscriber complaints to analyze network faults and cell outages. For instance, in [9], COD is performed autonomously by preprocessing minimization of drive tests (MDT) measurements and applying a local outlier factor based detector (LOFD) and a one-class support vector machine based detector (OCSVMD) to detect and localize anomalous network behavior. These solutions not only consume considerable time and manpower but also require expert knowledge or prior experience.
Several studies pay attention to COD with KPI variation such as handover statistics [10], and a cooperative femtocell outage detection architecture, which consists of a trigger stage and a detection stage with RSRP, is introduced in [11]. Further, an efficient discriminant function is used to complete COD with CQI and RRC connection reestablishment information in [12]. However, these approaches are only suitable for traditional UMTS or LTE/LTE-A networks.
Recently, a few studies have focused on COD with machine learning approaches applied to data collected from users or cells. In [13], an unsupervised data mining algorithm with reference signal received power (RSRP) and reference signal received quality (RSRQ) was proposed to detect cell outages. Reference [14] applied the hidden Markov model to cell outage detection with RSRP and RSRQ as well. Further, a classification-based approach using k-nearest neighbor (KNN) is proposed for COD in [15], and transductive confidence machines (TCM) based COD with RSRP and SINR data is proposed in [16]. Moreover, our previous work used LOF to detect cell outage with handover statistics [17]. These works can guide technology selection for 5G H-CRAN COD.
In particular, RRH failures in 5G H-CRAN may be difficult to detect because they may not trigger an alarm in the operation and maintenance system, which leaves the self-healing function unable to compensate for these nodes in time. As a result, an outage may last for hours or even a few days before being discovered, unless the abnormal status is captured by drive tests or feedback from users.
Rapid fault discovery and localization of abnormal cells can reduce network paralysis and deterioration caused by node outage. Due to the high density of nodes in H-CRAN, the fault of a single RRH will not quickly affect the network, as users may hand over or reconnect to neighbor RRHs or ACEs. This increases fault tolerance but also makes COD more difficult, thus increasing the instability of the network.
Based on the above analysis, we find that present COD approaches for LTE/LTE-A HetNet may not be suitable for H-CRAN because of their limited data sets and their reliance on obvious outage alarms. Therefore, in this paper, we establish a COD architecture with complete procedures to handle implicit RRH outages.

COD Architecture and Approach for 5G H-CRAN
To make it clear, in 5G H-CRAN, we also regard the coverage of each RRH as a cell, while the cell definition for ACEs follows the traditional one. As COD for ACEs can be readily resolved by the HetNet approaches mentioned above, here we mainly concentrate on implicit cell outages of RRHs.
For convenience of illustration, we consider a simple cell outage scenario in H-CRAN under one NodeC, as shown in Figure 2. Here one ACE and many randomly distributed RRHs are under the control of one NodeC. When an RRH cell goes into outage, its serving users may reconnect to the ACE or hand over to another RRH cell. This reconnection or handover results in signaling and communication variation among users, RRHs, ACEs, and the NodeC. All these data are required to execute COD if we want to achieve accurate and timely outage detection.
In the above scenario, a simple user reconnection procedure is shown in Figure 3. If a user has to establish a communication link with an RRH, it first sends a "Connection Request" command to the RRH, which then sends a "Resource Request" command to the NodeC. Only when the NodeC replies to the RRH with a "Request Response" command carrying the required resource allocation does the RRH confirm the connection to the user with "Connection Setup". The service then proceeds, which means that the user keeps a connection to an RRH under the control of the NodeC with the allocated resources. If the RRH goes into outage because of cable loss or power off, the user's service fails and the user attempts to reconnect to the ACE or other RRHs. If the NodeC has enough resources, a connection between the user and the ACE is established in the same way as an RRH connection. Here the RRH outage may not be reported to the NodeC immediately. However, we can count the number of connection requests from users to other RRHs and the ACE, as the two counting points in Figure 3 show. If we store these data and analyze them with time series fitting or prediction, we may obtain variation features and abnormal points at several time intervals, which may be useful for RRH COD. Based on the above analysis, we construct an integrated COD architecture for 5G H-CRAN, which is introduced in detail below.
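The counting idea in the reconnection procedure can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the event format `(timestamp, cell_id)`, the function names, the bin width, and the rolling mean-plus-deviation rule are all hypothetical and are not taken from the paper, which only states that request counts are stored and analyzed with time series fitting or prediction.

```python
from collections import Counter
from statistics import mean, pstdev


def count_requests(events, interval):
    """Count connection requests per (cell, time bin).

    `events` is an iterable of (timestamp, cell_id) pairs (assumed format).
    """
    counts = Counter()
    for timestamp, cell in events:
        counts[(cell, int(timestamp // interval))] += 1
    return counts


def abnormal_bins(counts, cell, n_bins, window=5, n_sigma=3.0):
    """Flag bins whose request count jumps far above the recent history.

    The rule (mean of the last `window` bins plus `n_sigma` deviations,
    with a floor of 1.0 on the deviation) is an illustrative assumption.
    """
    series = [counts.get((cell, b), 0) for b in range(n_bins)]
    flagged = []
    for b in range(window, n_bins):
        history = series[b - window:b]
        limit = mean(history) + n_sigma * max(pstdev(history), 1.0)
        if series[b] > limit:
            flagged.append(b)
    return flagged


# Synthetic example: 2 requests per bin towards "ace0", with a burst in bin 7.
events = [(b * 10 + j * 0.1, "ace0")
          for b in range(10)
          for j in range(20 if b == 7 else 2)]
counts = count_requests(events, interval=10)
print(abnormal_bins(counts, "ace0", n_bins=10))
```

A burst of reconnection requests towards the ACE or a neighbor RRH in one bin is the kind of abnormal point that the text suggests may hint at a silent RRH outage nearby.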
3.1. COD Architecture and Procedures. The proposed architecture for COD is shown in Figure 4.
As shown in Figure 4, COD mainly consists of a data collection stage and a data analysis stage. First, data collection should store data from the following sources:
(1) Data collected from users through measurement reports, such as RSRP, RSRQ, SINR, and CQI information: this information is usually huge and hard to handle synchronously because the reporting interval is fairly dense, so we can set a sampling interval and keep only the statistics.
(2) Data collected from RRHs/ACEs, such as HO/Connection Request/NCL: these data may come from the KPI statistics of the OAM system. They are performance indicators and can be used directly, since their interval is defined beforehand at an acceptable level.
(3) Data collected from the NodeC, which may include network-level data such as the network topology and preconfigured cell parameters (transmit power, spectrum, antenna height, and tilt): these data can be used to supplement the spatial and temporal analysis of different cells.
(4) Drive test data: a drive test is a tool to verify network performance afterward. It can be used to validate COD conclusions and thus provide correction suggestions for the COD approach.
After data collection, all the data are passed to the data analysis stage to execute COD. As shown in Figure 4, with SON entities located at the NodeC, the procedure is executed as follows:
(1) SON entities first preprocess these data to improve the data quality.
(2) SON entities adopt temporal and spatial prediction methods to obtain the varying patterns of temporal data and spatial data, respectively.
(3) SON entities then choose a proper machine learning approach to identify the outage cell.
In addition, the COD results can provide suggestions for COC, and the effectiveness of COC can be evaluated through network performance monitoring, thus constructing a self-organized loop. The critical technologies of preprocessing, spatial and temporal prediction, and machine learning are introduced next.

Key Technologies for COD
3.2.1. Data Processing. Data processing covers two concepts, data cleaning and feature engineering, both of which are prerequisites for good accuracy and performance in machine learning and deep learning. It includes data cleaning, data integration, data transformation, and data reduction. So in our architecture, we should choose a proper data processing approach to obtain high-quality data.

Spatial Prediction.
Spatial prediction technologies mainly aim at analyzing spatial traffic distributions or capturing user variations. Currently, several methods have been adopted in this field, such as the log-normal or Weibull distribution used in [18] or the traffic pattern identification methods proposed in [19]. With spatial prediction, we can obtain different distribution laws.

Temporal Prediction.
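Temporal prediction aims at predicting the future variation direction based on past and current temporal sequence data, for instance, the traffic variation of one ACE or the load of a NodeC. Temporal prediction can adopt the Holt-Winters traffic forecast [20] or the S-ARIMA model [21] to construct prediction sequences, which will be useful for abnormal traffic recognition. Moreover, joint temporal-spatial prediction approaches deserve more attention as well.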
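To make the role of temporal prediction in abnormal traffic recognition concrete, the following Python sketch implements a textbook additive Holt-Winters smoother and flags points whose one-step-ahead forecast error is large. It is a generic illustration, not the forecasting configuration of [20]: the function names, the smoothing parameters, the simple initialization, and the fixed tolerance are all assumptions.

```python
def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1, gamma=0.1):
    """Return one-step-ahead additive Holt-Winters forecasts for `series`."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons to initialise")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    # Simple initialisation (an assumption): level from the first season,
    # trend from the average change between the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]
    forecasts = []
    for t, x in enumerate(series):
        s = seasonal[t % season_len]
        forecasts.append(level + trend + s)  # forecast made before seeing x
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (x - level) + (1 - gamma) * s
    return forecasts


def abnormal_points(series, forecasts, tolerance):
    """Indices where the observation deviates from its forecast by more than `tolerance`."""
    return [t for t, (x, f) in enumerate(zip(series, forecasts))
            if abs(x - f) > tolerance]


# Synthetic periodic load with one injected spike at index 17.
load = [10, 20, 30, 20] * 6
load[17] = 80
forecasts = holt_winters_additive(load, season_len=4)
flagged = abnormal_points(load, forecasts, tolerance=15)
print(flagged[0])
```

Note that an abnormal observation also perturbs the smoothed state, so a few points right after it may be flagged as well; a practical detector would need to account for this.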

Machine Learning.
As analyzed in [22], machine learning has been used for LTE/LTE-A self-healing and has achieved remarkable results. However, these methods are mainly based on a single data source and do not use the integrated data set described in Section 3.1. To make COD more accurate and intelligent in 5G networks, deep learning models [23] such as the DRN model can be considered.

A Use Case for 5G H-CRAN COD
According to the architecture and procedures above, we introduce a use case to achieve COD for 5G H-CRAN. Here we suppose that the outage RRH or ACE cell cannot carry traffic anymore. Typically, at this point, all users served by this cell have to be offloaded to its neighbor cells, which results in a large number of handovers in H-CRAN. The rising number of handovers in neighbor cells can be detected by means of data mining. So we discuss our algorithm using the temporal data of incoming handover (inHO) statistics collected from the NodeC. We mainly collect the cell-level information stored in the NodeC, as shown in Table 1.
Moreover, to make the evaluation clearer, we evaluate the procedure on a simulation platform developed by our lab. Parameter settings can be found in Table 2.
In our simulation platform, we assume that a rapid decrease of a cell's transmit power indicates a cell outage. That is, we can set cell outages manually to evaluate the efficiency of our approach. Users in the affected outage areas then switch to the neighbor cells, which increases the number of handovers. We use the inHO data collected from all the cells with a machine learning method to evaluate whether outage cells can be detected.

COD Procedures with M-LOF.
Considering the situations above and the advantage of LOF shown in [9], a modified LOF based detector is adopted to detect the outage by extracting both spatial and temporal correlations of handover statistics over the neighbors of outage cells. The procedure is shown in Figure 6.
The COD procedure includes four steps: data collection, preprocessing, machine learning, and localization, all of which can be executed automatically under the control of LS-SON. In this way, the outage cell can be detected in time, so that the performance degradation of the network is minimized. Here we introduce the functions of the latter three steps.

Preprocessing.
When data collection is finished, SON entities proceed to extract the feature vectors for LOF [24], one for each temporal data series. Since the inHO data are collected from different cells at different times, the data vectors are designed as spatial-temporal series.
In fact, the inHO statistics of outage cells are zero, since the KPIs of an outage cell cannot be collected. However, a few other cells may also report zero inHO simply because no active users are within their coverage, which increases the difficulty of detection. We therefore filter out all zero-valued inHO statistics. Each cell then yields an ordered time series of inHO. To normalize the length and carry out uniformization and standardization, we slice each time series into several subseries.
For temporal analysis, the data of 5 consecutive TTIs are put into one feature vector, indexed by the order of the subseries. Such a feature vector uses exactly one cell's inHO data and serves to find the time at which an outage happened.
For spatial analysis, one feature vector consists of the handover statistics of the cells at a given TTI, with each entry indexed by the identity of the cell.
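The construction of temporal and spatial feature vectors described above can be sketched in Python as follows. This is a minimal illustration: the function names and the dictionary layout `{cell_id: [inHO per TTI]}` are hypothetical, and the zero filter is interpreted here as dropping cells whose whole inHO series is zero, which is one possible reading of the text.

```python
def temporal_vectors(inho_series, width=5):
    """Slice one cell's inHO series into consecutive subseries of `width` TTIs.

    A trailing remainder shorter than `width` is dropped (an assumption).
    """
    n_full = len(inho_series) // width
    return [inho_series[i * width:(i + 1) * width] for i in range(n_full)]


def spatial_vector(inho_by_cell, tti):
    """Collect the inHO counts of all non-silent cells at one TTI, keyed by cell id.

    Cells whose entire series is zero are filtered out (assumed reading of
    the zero-inHO filter).
    """
    return {cell: series[tti]
            for cell, series in inho_by_cell.items()
            if any(series)}


# Synthetic inHO counts for three cells over 11 TTIs; cell 2 is silent.
inho = {
    1: [3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4],
    2: [0] * 11,
    3: [1] * 11,
}
print(temporal_vectors(inho[1]))
print(spatial_vector(inho, tti=2))
```

Each temporal vector covers one cell over 5 TTIs and is meant for locating the outage time, while each spatial vector covers all remaining cells at one TTI and is meant for locating the affected area.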
After slicing the data series, a linear function transformation criterion is applied to each feature vector to normalize the data and eliminate errors caused by nonuniform features.

4.2.2. Machine Learning.
LOF detection is an unsupervised anomaly detection algorithm. LOF computes the local density of each target point and compares it with the local densities of its neighbors. The larger the difference between a sample and its neighbors is, the larger the outlier factor score assigned to the sample will be. The advantage of LOF is that no prior knowledge is required to detect an unknown cell outage. The detailed definitions of LOF are as follows.
Definition 1. The $k$-distance of cell $p$, denoted as $d_k(p)$, is the distance $d(p, o)$ between cell $p$ and its $k$th nearest neighbor $o$, such that at least $k$ cells $o'$ satisfy
$$d(p, o') \le d(p, o),$$
and at most $k - 1$ cells $o'$ satisfy
$$d(p, o') < d(p, o).$$
Definition 2. The $k$-distance neighborhood of cell $p$, denoted as $N_{d_k(p)}(p)$, is, for a given $d_k(p)$, the set that includes every cell whose distance to cell $p$ is not greater than $d_k(p)$.
Definition 3. The reachability distance of cell $p$ with respect to cell $o$, denoted as $\mathrm{rd}_k(p, o)$, is the maximum of $d_k(o)$ and the distance between $p$ and $o$:
$$\mathrm{rd}_k(p, o) = \max\{d_k(o),\, d(p, o)\}.$$
Definition 4. The local reachability density of cell $p$, denoted as $\mathrm{lrd}_k(p)$, is
$$\mathrm{lrd}_k(p) = \left( \frac{\sum_{o \in N_{d_k(p)}(p)} \mathrm{rd}_k(p, o)}{\left| N_{d_k(p)}(p) \right|} \right)^{-1}.$$
Definition 5. The local outlier factor of cell $p$, denoted as $\mathrm{LOF}_k(p)$, is
$$\mathrm{LOF}_k(p) = \frac{\sum_{o \in N_{d_k(p)}(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}}{\left| N_{d_k(p)}(p) \right|}.$$
The above descriptions give the definition of LOF. Our previous work [17] showed that M-LOF is a proper approach for heterogeneous networks, so here we use the modified LOF algorithm, called M-LOF, as a refined choice for H-CRAN. Definitions of M-LOF and the $m$-distance are given below.
Definition 6. The $m$-distance of cell $p$, given the positive integer $k$, is defined as

Wireless Communications and Mobile Computing
Here  is a constant value to enhance the accuracy.
Definition 10. The modified local outlier factor of cell $p$ is shown below:
According to the definitions of LOF and M-LOF, the result is quite sensitive to the choice of $k$. Consequently, if prior experience is available, we choose the cross-validation method to estimate the parameter $k$.
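As a reference point for the definitions in this subsection, the standard LOF (Definitions 1 to 5) can be sketched in plain Python as below. This is the classical algorithm only; the M-LOF modification of the distance is not reproduced here. The function name `lof_scores`, the Euclidean distance, and the toy data are assumptions made for illustration.

```python
import math


def lof_scores(points, k):
    """Standard LOF score of every point, using Euclidean distance.

    Assumes fewer than k + 1 identical points, so that reachability sums
    stay positive for the data used here.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                  # k-distance of point i
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])  # k-distance neighborhood

    def reach(i, j):
        """Reachability distance of i with respect to j."""
        return max(k_dist[j], dist[i][j])

    lrd = []
    for i in range(n):
        total = sum(reach(i, j) for j in neigh[i])
        lrd.append(len(neigh[i]) / total if total > 0 else math.inf)

    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


# Four clustered feature vectors and one that lies far away.
vectors = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(vectors, k=2)
print(scores)
```

Points inside the cluster obtain scores close to 1, while the isolated point obtains a much larger score. The pairwise distance matrix makes this sketch quadratic in the number of points.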

Localization.
The last step is localization based on the output of the LOF calculation. Here the neighbor cell list [25] is used to search the relation between the outage cell and its neighbor cells by geographic information together with the z-score. In this way, the outage cells may be localized.

Results Analysis.
Based on the above steps, M-LOF detects anomalous handover behaviors. In the simulation, the value of $k$ for the LOF based detector is found to lie between 5 and 14.
For the purpose of validation, we first analyze the results from the spatial and temporal perspectives, then determine the outage cell locations with the neighbor cell list, and finally give a performance comparison between M-LOF and LOF.

Spatial Analysis.
For spatial analysis, we focus on the temporal data of different cells in the same time period. Figure 7 shows the M-LOF of each cell at the 95th TTI. It can be seen that the factor values derived from the inHO data of six cells are far higher than the normal value, which is usually less than 1.5. Therefore, the abnormal cells can be distinguished from normal cells, as they have experienced many user reconnections.
For comparison, Figure 7 also shows the detection result using MDT measurements (with RSRP and SINR) as the data source. It can be seen that the abnormal cells are hard to detect using MDT, because the M-LOF values of cells based on MDT data are smoother than those based on inHO data. In contrast, our proposed method using the inHO data performs better for abnormal cell detection in H-CRAN.

Temporal Analysis.
Figure 8 shows the temporal analysis result, that is, the variation of M-LOF over TTIs for outage cell 13. The M-LOF value rises between TTI 90 and TTI 100, which is in accordance with our outage time setting. As the initial connections of users occur between TTI 0 and TTI 5 under unstable status, we simply ignore these data.
First, we need to set a threshold to filter anomalous cells through the z-score calculation
$$z_k = \frac{x_k - \mu_k}{\sigma_k},$$
where $x_k$ is the M-LOF value of the $k$th cell, and $\mu_k$ and $\sigma_k$ are the mean and standard deviation of the anomaly scores (M-LOF values) of the other cells. The reference z-score threshold is configured with a preferred value for the abnormal cells. Here the reference z-score is set to 2.1 according to the computing results. We conducted two sets of evaluations to compare the performance of COD based on inHO data and MDT data, respectively. The results identify outage cell 18 and cell 61, which are the correct ones as shown in Figure 9, since the LOF values of their neighbor cells are relatively higher than those of other cells.
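The z-score filtering step can be illustrated with a short Python sketch. The threshold 2.1 is the reference value quoted in the text; the function name, the dictionary layout `{cell_id: M-LOF value}`, the use of the population standard deviation, and the synthetic scores are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(mlof_by_cell, threshold=2.1):
    """Return {cell: z-score} for cells whose M-LOF z-score exceeds `threshold`.

    For each cell, the mean and standard deviation are taken over the
    scores of the other cells. The population standard deviation is used
    here, which is an assumption.
    """
    flagged = {}
    for cell, x in mlof_by_cell.items():
        others = [v for c, v in mlof_by_cell.items() if c != cell]
        mu, sigma = mean(others), pstdev(others)
        if sigma > 0 and (x - mu) / sigma > threshold:
            flagged[cell] = (x - mu) / sigma
    return flagged


# Synthetic M-LOF values: cell 13 stands out from the rest.
mlof = {1: 1.0, 2: 1.1, 3: 0.9, 4: 1.05, 5: 0.95, 13: 6.0}
print(sorted(zscore_outliers(mlof)))
```

The cells returned by such a filter would then be matched against the neighbor cell list to localize the outage cell whose neighbors show abnormal scores.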

Performance Evaluation.
In this part, we analyze the M-LOF detection performance under varying traffic conditions. Since the behaviors of users have a direct effect on inHO data, the diagnosis process has been tested in different scenarios by changing the User Density (UD) and User Velocity (UV) parameters relative to the baseline setup. To evaluate the impact of the variations of UD and UV on M-LOF values, different scenarios are set up by adjusting these two parameters. The Cumulative Distribution Function (CDF) of the M-LOF values under different UV conditions is shown in Figure 10.
It can be seen that, for the low-velocity scenario, almost 80% of the M-LOF values are less than 0.5. However, there is a significant reduction in the M-LOF value as the UV increases. Likewise, a similar behavior is observed with the increase of UD, as shown in Figure 11. The UV and UD parameters influence the distribution and spread of the inHO data as explained earlier, and consequently the value of M-LOF. This leads to a low detection performance of M-LOF, since it generates an increased number of false alarms. Finally, this paper compares LOF and M-LOF by evaluating the False Positive Rate (FPR) and the False Negative Rate (FNR), and the final results of the simulation are shown in Table 3. The FNR represents the chance that an outage cell is not recognized among the outage cells, while the FPR represents the chance that a normal cell is recognized as an outage cell among all normal cells.
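The two rates defined above can be computed as in the following Python sketch. The function name and the representation of cells as sets of identifiers are hypothetical, and the example values are synthetic, not results from the paper.

```python
def fpr_fnr(true_outage, predicted_outage, all_cells):
    """Return (FPR, FNR) for a set of predicted outage cells.

    FPR: share of normal cells that were reported as outage cells.
    FNR: share of outage cells that were not reported.
    """
    true_outage = set(true_outage)
    predicted_outage = set(predicted_outage)
    normal = set(all_cells) - true_outage

    false_pos = len(predicted_outage & normal)
    false_neg = len(true_outage - predicted_outage)

    fpr = false_pos / len(normal) if normal else 0.0
    fnr = false_neg / len(true_outage) if true_outage else 0.0
    return fpr, fnr


# Synthetic example: 10 cells, cells 1 and 2 are in outage,
# and the detector reports cells 2 and 3.
print(fpr_fnr(true_outage={1, 2}, predicted_outage={2, 3}, all_cells=range(10)))
```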
From Table 3, we can see that the FPR and FNR of the LOF based detector are 12% and 3%, respectively. This means that the outage cells can be almost wholly detected, although a small part of the normal cells may be determined as anomalous. The main cause of the 3% FNR is that a small number of anomalous cells have such low traffic that little user data can be collected when they are in outage, so these small cells fail to be detected. However, these outages have little impact on overall network performance and user experience. Therefore, we can say that the outage cells can be detected successfully with the LOF based detector. As for M-LOF, the FPR is 6%, which is smaller than that of LOF. M-LOF performs better because it pays more attention to local density. This is in line with the actual use of outage detection, since the abnormal handovers caused by a neighbor outage cell are also localized.
In addition, as M-LOF only modifies the distance computation of LOF, its time complexity is the same as that of LOF, which is $O(n^2)$. Here $n$ is the number of TTIs or the number of cells shown in Figure 7 or Figure 8.

Conclusions
In this paper, we propose a self-organized cell outage detection architecture and approach for 5G H-CRAN. Previous methods that rely on a single data source can hardly detect hidden outages such as those of RRHs. Aiming at this problem, we introduce an architecture and a corresponding approach to realize COD for H-CRAN and analyze the required key technologies. We then evaluate the approach with a use case, which utilizes the collected inHO statistics and adopts M-LOF to detect the anomalous behavior of outage cells. Results show that the use case is effective. However, much work remains to be done for this architecture, as we do not consider resource allocation and COC solutions; we also have to take more intelligent learning models such as deep learning into account and compare the efficiency of different algorithms under concrete H-CRAN network deployments. Moreover, we will focus on a specific perspective of self-healing approaches for H-CRAN and propose more practical solutions.

4.1. Use Case Scenario Description. The scenario is shown in Figure 5. It consists of 19 ACEs with 25 users under coverage. All these users are active users. Each ACE has 3 cells, and one RRH cell is located under each ACE cell. The UEs are randomly distributed in cells to generate handover statistics and measurement reports.

Definition 7. The $m$-distance neighborhood of cell $p$, denoted as $N_{m\text{-}d_k(p)}(p)$, is the set that includes every cell whose distance to cell $p$ is smaller than the $m$-distance.
Definition 8. The reachability distance of the $m$-distance of cell $p$ with respect to cell $o$, denoted as $m\text{-}\mathrm{rd}_k(p, o)$, is
$$m\text{-}\mathrm{rd}_k(p, o) = \max\{m\text{-}d_k(o),\, d(p, o)\}. \tag{10}$$
Definition 9. The local reachability density of cell $p$ is defined as

Figure 10: CDF of M-LOF for User Velocity.

Figure 11: CDF of M-LOF value for User Density.

Table 1: Cell level data.
current temporal sequence data, for instance, traffic variation of one ACE or load of NodeC. Temporal prediction can adopt Holt-Winters traffic forecast