Monte Carlo-Based Covariance Matrix of Residuals and Critical Values in Minimum L1-Norm

Defense Engineering Program, Military Institute of Engineering, Rio de Janeiro 22290-270, Brazil Graduate Program in Geodetic Sciences, Federal University of Paraná, Curitiba 81531-990, Brazil Institute of Geography, Federal University of Uberlândia, Monte Carmelo 38500-000, Brazil Graduate Program in Agriculture and Geospatial Information, Federal University of Uberlândia, Monte Carmelo 38500-000, Brazil Department of Cartographic Engineering, Geodesy and Photogrammetry, Universitat Politècnica de València, Valencia 46022, Spain


Introduction
e least-squares (LS) estimator, also known as L2-norm minimization, is the standard method in surveying data processing. In case of absence of outliers, it is the best linear unbiased estimator for the unknown parameters [1]. It also provides the maximum likelihood solution, if observational errors are normally distributed. e LS minimizes the sum of the squares of the residuals v (weighted by the weight matrix of observations P), that is, v T Pv � min . (1) However, if there are outliers in the sample, the LS will provide biased parameters [2,3]. Here, we follow the definition of Lehmann [4]: "an outlier is an observation that is so probably caused by a gross error that it is better not used or not used as it is." In surveying engineering, statistical testing procedures are commonly applied to deal with data possibly contaminated by outliers. is idea goes back to the pioneering work of Baarda [5,6]. He introduced the Data Snooping (DS) procedure in order to detect outliers in a geodetic network. e issue of outlier detection in surveying engineering has been widely explored in the literature (see, e.g., [7][8][9][10][11][12][13]). A conceptual analysis of measurement errors and outliers in geodetic networks can be found in [14]. e iterative approach of DS, also known as Iterative Data Snooping (IDS) [13], is the most well-established outlier identification method in geodetic networks. For a review of DS and its variations, we suggest [12]. In the IDS procedure, every observation i is tested against outliers by computing its normalized residual w i , i.e., the ratio between its LS residual v i and its LS residual standard deviation σ v i : e i th observation with the extreme (highest) normalized LS residual (max |w i |) is compared to its critical value |Z α/2 |, which is generally taken from the normal statistical table (α being the user-defined significance level of the test). If max | w i | > |Z α/2 |, then the respective observation is flagged as an outlier and usually excluded from the observations set. e same procedure is repeated iteratively until no observation is flagged as an outlier. It is worth mentioning that (2) is a simplification of w i for scenarios of uncorrelated observations, an assumption that was adopted throughout this work.
However, IDS involves not only a single test but also multiple hypothesis testing. In that case, the "false positive rate" (type I error probability or significance level α) is the rate of experiments in which at least one observation is flagged as an outlier, when, in fact, there is none. In this context, Lehmann [9] showed that the critical values cannot be derived from well-known univariate test distributions (e.g., normal distribution), due to the correlations between residuals. Hence, in order to fully control false positive rates in geodetic networks, critical values for IDS started to be numerically computed by means of Monte Carlo simulation (MCS) (see, e.g., [15,16]).
As mentioned, IDS, however, is based on LS residuals, which are very "sensitive" to outliers [2]. On the other hand, minimum L1-norm is one of the standard robust estimation methods in Geodesy [17]. e estimator that minimizes L1norm is more "resistant" or "robust" to outliers, as they tend to be almost completely projected onto the corresponding residuals [18]. Numerical examples can be found in [19,20]. e test against outliers by computing normalized residuals is also useful in minimum L1-norm [18,21].
Minimum L1-norm seeks the minimization of the sum of weighted absolute residuals [19]: where p is the weight vector of uncorrelated observations and |v| is the vector of absolute residuals. In particular, Minimum L1-norm is likely to provide higher outlier identification success rate than IDS for low-redundancy networks [15,22]. e minimum L1-norm solution may not be unique [23], and its vector of residuals in geodetic networks tends to be sparse, with many residuals being equal to zero (see, e.g., [20,24]). is means that corresponding observations are accepted as "perfect" observations, without any measurement errors. Geodetic observations, however, always have (at least) random errors [25]. Hence, such assumption of "perfect" observations is disconnected to the physical reality of geodetic networks. Besides, the estimator that minimizes L1-norm is biased except for some particular cases [26]. erefore, the final estimation of a network should always be performed by LS, even if minimum L1-norm was applied to identify outliers [8,27].
Minimum L1-norm has no analytical direct solution and needs to be solved by numerical methods. is work focuses on the solution by the simplex method [28] of linear programming, the most widely used approach for solving minimum L1-norm [23]. In geodetic networks, it has already been applied by many authors (see, e.g., [19,20,[29][30][31]).
Another technique commonly employed to solve minimum L1-norm in geodetic networks is the iterative reweighted least squares (IRLS) (see, e.g., [24,[32][33][34]). It has also been applied in deformation analysis of geodetic networks [35][36][37]. However, it is worth mentioning that IRLS does not seem to be always reliable [38], as it is a "local" estimator and may produce unacceptable solutions if it gets stuck in a local optimum [39].
Other techniques that were performed to compute minimum L1-norm in geodetic networks include simulated annealing [24]; genetic algorithm [39,40]; and linear programming by an interior point method [38]. Solutions of minimum L1-norm by these methods and by IRLS are outside the scope of this paper.
To actually identify outliers based on minimum L1-norm results, geodesists have already tried two different criteria: (1) the ratio between residuals and respective observation standard deviation σ i as the test statistic (see, e.g., [15,41]); and (2) the normalized residual w i (Equation 2, considering v i and σ v i from minimum L1-norm) as the test statistic (see, e.g., [21,42]). For both criteria, the chosen critical values (from which the respective observation would be classified as an outlier) were, in general, common values taken from the univariate normal statistical table, such as |Z α/2 | � 3.00 (α � 0.27%) and |Z α/2 | � 3.29 (α � 0.1%), which, as mentioned, are not appropriate even for IDS. However, it remains unclear how to properly choose critical values in minimum L1-norm.
In this context, this paper has three main contributions: (1) it provides a new procedure to compute critical values for normalized residuals in robust estimation based on MCS control of false positive rate; (2) it serves as a method to compare different quality control procedures by preserving respective critical values with the same false positive rate; and (3) it provides a Monte Carlo approach to compute the covariance matrix of residuals Ʃ v for any adjustment procedure, which is, indeed, a prerequisite to compute critical values for residuals. e outline of this paper is as follows. First, we present the new approach to estimate Ʃ v by means of MCS. We highlight that such technique can be widely applied to any adjustment procedure, including LS, the estimator that minimizes L1-norm and other robust estimators. en, also by MCS, we present a procedure to compute critical values for normalized residuals in robust estimation, based on the control of false positive rate (unprecedented in geodesy). Experiments were conducted in three different leveling networks, focusing on the minimum L1-norm solution by the simplex method and on comparison with LS/IDS.

Covariance Matrix of Residuals by Means of MCS.
Given a geodetic network with m observations and n unknowns parameters, its respective mathematical model may be defined by equations (4) and (5), with A mxn being the "design" matrix with the coefficients of the parameters vector x nx1 and l mx1 , the vector of reduced observations [43]. In equation (5) (stochastic model), σ 2 0 is the variance factor and is Ʃ Lmxm the covariance matrix of observations. As mentioned, v mx1 is the residual vector and P mxm is the (symmetric positive-definite) weight matrix of observations. e solution for x produced by the LS is given by equation (6).
Hence, applying the general law of propagation of variances [25], the covariance matrix of parameters Ʃ x is expressed by Assuming σ 2 0 � 1 and l a � Ax being the vector of adjusted observations, its covariance matrix Ʃ La is Finally, Ʃ v for the LS adjustment can be directly computed by equation (9). Note that Ʃ x , Ʃ La , and Ʃ v in equations (7)-(9) refer to LS, as they were obtained by the LS solution.
In matrix Ʃ v , elements in position (i, j) represent the covariance between residuals of observations i and j and elements (i, i) represent the variance σ 2 v i of the i th observational residual. Hence, despite the adjustment procedure, once its Ʃ v is obtained, w i of each observation (equation (2)) can be easily calculated as For other adjustment procedures, however, the computation of Ʃ v is not so immediate. Actually, some robust estimators even lack a closed-form expression for the computation of its Ʃ v in the literature. In order to fully address this issue, which is also a prerequisite to enable the calculation of critical values for normalized residuals in robust estimation, we propose the following approach ( Figure 1) to compute Ʃ v for any adjustment procedure, by means of MCS. e inputs are functional and stochastic models (matrices A and Ʃ L ) of the geodetic network and the chosen adjustment procedure (e.g., LS or minimum L1norm).  (10)) for each residual and compute the covariance COV (v i ,v j ) for each pair of residuals (equation (11)): (4) fix the estimated Ʃ v by placing variances of each i th residual in the i th element of the main diagonal and covariances in its corresponding position (i, j).
Regarding Ʃ v in minimum L1-norm, some authors (see, e.g., [8,21]) derived variances of the subset of nonzero residuals, and Junhuan [18] presented an expression valid for IRLS solution. On another hand, our MCS method computes all variances and covariances and is more general, not only for any minimum L1-norm solution but also for any adjustment procedure.

Critical Values Based on False Positive Rates.
Recently, many works (see, e.g., [9,15,16]) have investigated critical values for normalized residuals in IDS based on MCS control of false positive rate. Motivated by mentioned works, we propose the following procedure for any robust estimator ( Figure 2). In fact, the method could be applied to any adjustment procedure, but it is clearly more useful for robust  (1). Random errors generally follow normal distributions in geodetic observations. However, an advantage of an MCS approach is that it may be applied to other error distributions, as can be seen in the work of Lehmann [9].

Experiments and Results
Experiments were performed in three simulated leveling networks (Figure 3). Network A consists in one control station, m � 6 observations (height differences), and n � 3 unknowns (station heights). Network B consists in one control station, m � 10 observations, and n � 4 unknowns. Network C consists in one control station, m � 15 observations, and n � 5 unknowns. erefore, the "mean redundancy number" [13] of network A is r A � (6-3)/6 � 0.50, network B is r B � (10-4)/10 � 0.60, and network C is For all networks, the standard deviation of the observations was given by σ i � 1.0(mm) * �� K i , where K i (in km) is the length of the respective leveling line. In the ascending order of the observation index, the lengths (in km) of each leveling line were as follows: for network A, [42,38,27,22,23,33]; for network B, [37,28,33,26,40,32,39,29,34,41]; and for network C, [30,34,25,37,28,38,29,35,31,26,33,36,27,32,24]. erefore, for example, σ i of the 4 th observation of network A (which is also the lowest σ i of all networks) is σ 4(Network A) � 1.0(mm) * �� 22 √ � 4.69 mm. Minimum L1-norm adjustments were performed by the simplex method of linear programming. Normally distributed pseudorandom numbers were generated by the Mersenne Twister algorithm [44] and transformed from uniform to normal distribution by the Ziggurat technique [45]. All experiments were performed in the software Octave.
At first, we computed Ʃ v for the three networks in three different ways: (1) Ʃ v (LS-A) for the LS adjustment by its (wellestablished) analytical formulation (equation (9) Table 1 presents the elements (rounded to one decimal place) of these matrices computed for network A. Respective matrices for networks A, B, and C (rounded to three decimal places) can be found in the appendix. en, we applied the new procedure to compute critical values for normalized residuals by MCS and based on the false positive rate (significance level α) in minimum L1norm (simplex solution). For comparison and further analysis, in Table 2, we also presented critical values for IDS with the same procedure and critical values from the normal distribution statistical table.
Although the use of MCS seems computationally expensive, today, this is no longer an obstacle even for personal computers [12]. As so, the whole process of computing minimum L1-norm critical values, which includes the estimation of respective Ʃ v (L1-MCS) (Figure 2), took approximately 14, 21, and 29 minutes (for networks A, B, and C, respectively) using an Intel (R) Core (TM) i5 2.50 GHz processor with 4 Gb RAM.

Discussion
e first issue to be addressed is the covariance matrix of residuals for network A, B, and C. Table 3 presents  . Ʃ v (LS-A) and Ʃ v (LS-MCS) had very close values for the three networks. e difference between corresponding elements were always less than 0.300 mm 2 (being less than 0.100 mm 2 for more than 75% of all networks elements), and average differences computed were less than 0.060 mm 2 for all networks. ese differences are relatively very low, as the minimum variance of observations (of the 4 th observation of network A) was 4.69 2 ≈ 22.0 mm 2 . Hence, this result validates our strategy presented to compute Ʃ v based on MCS.       ) ones. e maximum variance differences (in mm 2 ) were 11.703, 24.525, and 6.606 for networks A, B, and C, respectively, and average variance differences were always higher than 4.900 mm 2 . Hence, as expected, using Ʃ v (LS-A) is not appropriate in the context of minimum L1-norm.
As expected, critical values for normalized residuals in IDS and minimum L1-norm presented in Table 2 were always different from critical values obtained in the univariate normal distribution table. Moreover, minimum L1-norm critical values were always higher than IDS ones. is characteristic highlights the importance of controlling the false positive rate properly by MCS, as was proposed in this research.
Finally, we note also that critical values for both minimum L1-norm and IDS vary from different networks. Although IDS values tend to increase with network redundancy, as already shown by [16], the same cannot be claimed for minimum L1-norm. Hence, this issue may be a subject for further investigation.

Concluding Remarks
In this work, we successfully developed and presented an approach by means of MCS to compute the covariance matrix of residuals and critical values for normalized residuals in any adjustment procedure. Since the LS method has a well-established analytical expression for the covariance matrix of residuals, our MCS strategy to estimate it was first applied to LS. We found that differences in respective elements between our strategy and analytical formulation were negligible, which validates our approach.
Numerical results of the whole procedure of computing critical values, which includes the estimation of the respective residuals covariance matrix, were presented in three leveling networks for minimum L1-norm solved by the simplex method of linear programming and compared to LS (for the covariance matrix of residuals) and IDS based on LS results (for critical values). In this sense, we highlight that, as mentioned, the minimum L1-norm solution may not be unique. Hence, conclusions of this paper are associated with the simplex solution of minimum L1-norm.
We have shown that the covariance matrix of residuals may change along with the adjustment procedure (in our case, from LS to minimum L1-norm). erefore, since robust estimators generally do not have a well-established solution to compute the covariance matrix of residuals, the approach presented for any adjustment procedure (including robust estimators) herein is a valuable strategy.
Surveyors cannot rely on critical values from univariate normal distribution either for IDS or minimum L1-norm. Moreover, critical values vary even among robust estimators.        However, unlike IDS, the critical values in minimum L1-norm do not necessarily tend to increase with network redundancy. Hence, the main contribution of this work was the proposed Monte Carlo-based critical values to control the false positive rate for normalized residuals of robust estimators. Future research should perform this proposal in order to provide a fair comparison among different quality control procedures with the same false positive rate. Furthermore, one can investigate effects of chosen false positive rates in probability levels of classes of errors in outlier identification, i.e., type II error, type III error, overidentification positive and negative, and statistical overlap (see [16] associated with robust estimators). e proposed approach for the computing residuals covariance matrix can be extended to covariance matrices other than the residuals one in future works. One can, e.g., compute the network parameters in each MCS trial and then compute the parameter covariance matrix of the chosen adjustment procedure. e relationship between network redundancy and critical values for normalized residuals in robust estimation also needs further investigation. Besides, the new approach for the computation of the covariance matrix of residuals and for the estimation of critical values for normalized residuals described here should be applied for other robust estimators and other types of geodetic networks, such as Global Navigation Satellite System (GNSS) networks.