Robust AIC with High Breakdown Scale Estimate

The Akaike Information Criterion (AIC) based on least squares (LS) regression minimizes the sum of the squared residuals; LS is sensitive to outlier observations. Alternative criteria that are less sensitive to outlying observations have been proposed; examples are the robust AIC (RAIC), robust Mallows Cp (RCp), and robust Bayesian information criterion (RBIC). In this paper, we propose a robust AIC obtained by replacing the scale estimate with a high breakdown point estimate of scale. The robustness of the proposed method is studied through its influence function. We show, through simulated and real data examples, that the proposed robust AIC is effective in selecting accurate models in the presence of outliers and high leverage points.


Introduction
Akaike Information Criterion (AIC) [1] is a powerful technique for model selection, and it has been widely used for selecting models in many fields of study.
Consider a multiple linear regression model
$$y_i = \mu + \mathbf{x}_i^T \boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^T$ is a vector containing $p$ explanatory variables, $y_i$ is the response variable, $\boldsymbol{\beta}$ is a vector of $p$ parameters, $\mu$ is the intercept parameter, and $\varepsilon_i$ is the error component, which is independent and identically distributed (iid) with mean 0 and variance $\sigma^2$. The classical AIC is defined as
$$\mathrm{AIC} = n \log\left(\frac{\mathrm{SSE}}{n}\right) + 2p, \qquad (2)$$
where $\mathrm{SSE} = \sum_{i=1}^{n} e_i^2$, with $e_i = y_i - \hat{\mu}_{LS} - \mathbf{x}_i^T \hat{\boldsymbol{\beta}}_{LS}$. Since the LS estimator is vulnerable in the presence of outliers, it is not surprising that the AIC inherits this problem. Several robust alternatives to the AIC have been proposed in the literature (see [2-4]). For example, Ronchetti [3] proposed and investigated the properties of a robust version of the AIC with respect to M-estimation. A similar idea was used by Martin [2] for autoregressive models. More recently, Tharmaratnam and Claeskens [4] proposed a robust AIC with respect to S-estimation and MM-estimation, generalizing the information criteria using full likelihood models. In contrast to these more involved approaches, we introduce a straightforward way to derive a robust AIC, which focuses on modifying the estimate of the scale.
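The classical criterion above can be computed directly from an LS fit. The sketch below (with a hypothetical helper name `classical_aic`; additive constants are omitted, as in (2)) assumes NumPy:

```python
import numpy as np

def classical_aic(X, y):
    """Classical AIC from a least-squares fit: n*log(SSE/n) + 2p,
    where p counts the fitted coefficients (intercept included)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sse = float(resid @ resid)
    p = Xd.shape[1]
    return n * np.log(sse / n) + 2 * p
```

Because the criterion is monotone in SSE for fixed $p$, any inflation of the residual sum of squares by outliers feeds directly into the AIC value.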
To set the idea, the influence of an outlier on the AIC is illustrated through the presence of outliers in the $y$-direction (called vertical outliers) or in the $x$-direction (called leverage points). For this, a point with coordinates $(0, y_{10})$ is added, where the value of $y_{10}$ ranges between $-1.5$ and $3$. A similar approach is taken for leverage points, by replacing this point with $(x_{10}, 0)$ (Figure 1). Table 1 and Figure 2 show that the value of the AIC increases as the size of the contamination in $(x_{10}, y_{10})$ increases, as expected, and if $x_{10}$ or $y_{10}$ is extremely large, the AIC is unbounded; that is, it tends to infinity.
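A quick numerical check of this behavior (a hedged sketch, not the paper's experiment; the sample size and contamination values are arbitrary):

```python
import numpy as np

def ls_aic(X, y):
    # classical AIC from an LS fit, additive constants omitted
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta
    return n * np.log(e @ e / n) + 2 * Xd.shape[1]

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 1 + 2 * x + 0.5 * rng.normal(size=30)

# replace one response by an increasingly extreme vertical outlier
vals = []
for k in (0.0, 10.0, 100.0, 1000.0):
    yc = y.copy()
    yc[0] = k
    vals.append(ls_aic(x[:, None], yc))
# the criterion grows without bound with the contamination size
```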
The remainder of the paper is organized as follows. Section 1.1 reviews some robust regression estimation methods. In Section 1.2 we review a robust version of the AIC; we discuss the robustness problem from the viewpoint of model selection and point out the sensitivity of the M-estimator-based robust AIC to leverage points. We derive the influence function of the AIC and study its properties in Section 2. The performance of the robust AIC is evaluated and compared to the commonly used classical AIC in Section 3. Finally, concluding remarks are presented in Section 4.
Robust Regression Estimates. The M-estimator of regression minimizes
$$\sum_{i=1}^{n} \rho\left(\frac{e_i}{\sigma}\right), \qquad (3)$$
where $\rho$ is symmetric and nondecreasing on $[0, \infty)$, $\rho(0) = 0$, and $\rho$ is continuously differentiable almost everywhere. The function $\rho$ is chosen to be less sensitive to outliers than the square function, yielding the estimating equation
$$\sum_{i=1}^{n} \psi\left(\frac{e_i}{\sigma}\right) \mathbf{x}_i = \mathbf{0},$$
where $\psi = \rho'$. If we choose the $\rho$ function in (3) as the Tukey biweight function with $c = 1.5476$, which yields $b = E_\Phi[\rho(u; c)]$, with $\Phi$ the standard normal cumulative distribution function and $u \sim N(0, 1)$, the resulting estimator is the biweight S-estimator. M-estimators are efficient and highly robust to unusual values of $y$, but one rogue leverage point can break them down completely. For this reason, generalized M-estimators were introduced, which solve
$$\sum_{i=1}^{n} w(\mathbf{x}_i)\, \psi\left(\frac{e_i}{\sigma}\right) \mathbf{x}_i = \mathbf{0},$$
where $w$ is a weight function [6].
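A minimal IRLS (iteratively reweighted least squares) sketch of a Tukey-biweight M-estimator may make the estimating equation concrete. The tuning constant $c = 4.685$ (the usual 95%-efficiency choice, not the $c = 1.5476$ used above), the MAD-based scale, and the helper names are illustrative assumptions:

```python
import numpy as np

def tukey_weights(u, c=4.685):
    """Tukey biweight weights w(u) = (1 - (u/c)^2)^2 for |u| <= c, else 0."""
    w = np.zeros_like(u)
    inside = np.abs(u) <= c
    w[inside] = (1 - (u[inside] / c) ** 2) ** 2
    return w

def m_estimate(X, y, c=4.685, n_iter=50):
    """IRLS sketch of a Tukey-biweight M-estimator (intercept included).
    The scale is re-estimated each step from the MAD of the residuals."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # LS starting values
    for _ in range(n_iter):
        e = y - Xd @ beta
        s = 1.4826 * np.median(np.abs(e - np.median(e))) + 1e-12
        w = tukey_weights(e / s, c)
        Wx = Xd * w[:, None]
        # weighted normal equations (tiny ridge guards against singularity)
        beta = np.linalg.solve(Xd.T @ Wx + 1e-10 * np.eye(Xd.shape[1]),
                               Wx.T @ y)
    return beta
```

Points with large standardized residuals receive weight zero, which is exactly why the estimator resists vertical outliers while remaining vulnerable to a rogue leverage point whose residual happens to be small.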
In recent years, a good deal of attention in the literature has been focused on high-breakdown methods, that is, methods that are resistant to even multiple severe outliers. Many of them are based on minimizing a scale estimate that is more robust than the sum of squared residuals. For example, Rousseeuw [7] proposed the least median of squares (LMS), a high-breakdown method that minimizes the median of the squared residuals rather than their sum. In addition, Rousseeuw [8] proposed least trimmed squares (LTS), which minimizes the sum of the $h$ smallest squared residuals,
$$\hat{\boldsymbol{\beta}}_{LTS} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{h} (e^2)_{i:n},$$
based on the ordered squared residuals $(e^2)_{1:n} \le \cdots \le (e^2)_{n:n}$. LTS converges at the rate $n^{1/2}$, with the same asymptotic efficiency under normality as Huber's skip estimator. The convergence rate of LMS is only $n^{1/3}$, and its objective function is less smooth than that of LTS.
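The LTS objective can be approximated with random elemental starts followed by concentration steps, in the spirit of Rousseeuw and Van Driessen's FAST-LTS. This is a hedged sketch; the name `lts_fit`, the number of starts, and the default $h$ are illustrative choices, not the paper's algorithm:

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=200, seed=0):
    """Sketch of LTS: random elemental starts plus concentration steps
    (repeatedly refit on the h observations with the smallest squared
    residuals), keeping the fit with the smallest trimmed objective."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]
    h = h or (n + p + 1) // 2                      # ~50% breakdown choice
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)  # elemental start
        beta, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        for _ in range(10):                         # concentration steps
            e2 = (y - Xd @ beta) ** 2
            keep = np.argsort(e2)[:h]
            beta, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        obj = np.sort((y - Xd @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

Each concentration step can only decrease the trimmed objective, so the procedure converges from every start; multiple random starts reduce the chance of stopping at a contaminated local optimum.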

The Robust Version of AIC. Consider the scale estimate of the errors defined by
$$\hat{\sigma}^2 = \frac{\mathrm{SSE}}{n},$$
with $\mathrm{SSE} = \sum_{i=1}^{n} e_i^2$ and $e_i = y_i - \hat{\mu} - \mathbf{x}_i^T \hat{\boldsymbol{\beta}}$. By expressing SSE in (2) in terms of $\hat{\sigma}$, the AIC in (2) can be written as
$$\mathrm{AIC} = n \log(\hat{\sigma}^2) + 2p. \qquad (8)$$
Notice that large values of the AIC indicate that the model (and hence the explanatory variables) is less successful in explaining the variation in the response, while a small value of the AIC indicates a good fit to the response data.
Ronchetti [3] proposed a robust counterpart of the AIC statistic. The extension of the AIC to the RAIC is inspired by the extension of maximum likelihood estimation to M-estimation. The author derived the RAIC for an error distribution with density function $f(\varepsilon) = \exp(-\rho(\varepsilon))$. For a given constant $\alpha$ and a given function $\rho$, the author chose the model that minimizes
$$\mathrm{RAIC}(\alpha, c, \rho) = 2 \sum_{i=1}^{n} \rho\left(\frac{e_i}{\hat{\sigma}}\right) + \alpha p,$$
where $\hat{\sigma}$ is some robust estimate of $\sigma$ and $\hat{\boldsymbol{\beta}}$ is the estimator defined as in (3). Huber [9] suggested the function $\rho(u) = u^2/2$ for $|u| \le c$ and $\rho(u) = c|u| - c^2/2$ otherwise, with $c = 1.345$.
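A sketch of this criterion with Huber's $\rho$ and $c = 1.345$. The function names and the choice $\alpha = 2$ (mimicking the classical penalty) are illustrative assumptions, not the paper's code:

```python
import numpy as np

def huber_rho(u, c=1.345):
    """Huber's rho: quadratic in the middle, linear in the tails."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

def raic(residuals, scale, p, alpha=2.0, c=1.345):
    """RAIC-style criterion: 2 * sum(rho(e_i / sigma)) + alpha * p.
    `scale` should be a robust scale estimate (e.g. the MAD)."""
    u = np.asarray(residuals) / scale
    return 2.0 * huber_rho(u, c).sum() + alpha * p
```

Because $\rho$ grows only linearly in the tails, a single large residual contributes far less to the criterion than its squared value would under the classical AIC.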
We introduce an alternative robust version of the AIC by replacing $\hat{\sigma}$ in (8) with a robust estimator of scale that attains a 50% breakdown point. When $h = n/2$, the LTS criterion finds the estimates corresponding to the half sample having the smallest sum of squared residuals. As expected, the breakdown point is then 50%, and the estimated scale from LTS is
$$\hat{\sigma}^2_{LTS} = \frac{1}{h} \sum_{i=1}^{h} (e^2)_{i:n}.$$
As further robust estimators, the M-estimator and the biweight S-estimator are compared to least trimmed squares. Based on the results shown in Table 2, it is evident that the M-estimator is much more robust than LS but suffers from leverage points. The biweight S-estimator and LTS show robust behavior: $\mathrm{AIC}_{BS}$ is stable even as the size of the outliers increases. In the next section, we generalize these findings by computing the associated influence functions.
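Given an LTS fit, the proposed criterion amounts to plugging the LTS scale into (8). The sketch below (hypothetical helper `aic_lts`; a small-sample consistency factor for the trimmed scale is omitted) illustrates why one extreme response barely moves the criterion:

```python
import numpy as np

def aic_lts(X, y, lts_beta, h=None):
    """AIC with the LS scale replaced by the LTS scale:
    AIC_LTS = n * log(sigma2_LTS) + 2p, where sigma2_LTS averages the
    h smallest squared residuals of a given LTS fit `lts_beta`
    (intercept first). Consistency factor omitted for simplicity."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]
    h = h or (n + p + 1) // 2
    e2 = np.sort((y - Xd @ lts_beta) ** 2)[:h]   # trimmed residuals
    sigma2 = e2.mean()
    return n * np.log(sigma2) + 2 * p
```

Since the $n - h$ largest squared residuals are discarded, up to roughly half the observations can be arbitrarily contaminated without inflating the scale, and hence without inflating the criterion.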

Influence Function
Consider the linear model in (1), for $i = 1, \ldots, n$. Assume the distribution of the errors satisfies $F_\varepsilon(u) = F_0(u/\sigma)$, where $\sigma$ is the residual scale parameter and $F_0$ is symmetric with a valid probability density function.
Let $\mathbf{x}$ and $\varepsilon$ be independent stochastic variables with joint distribution $H$. The functional $T$ is Fisher-consistent for the parameters $(\boldsymbol{\beta}, \sigma)$ at the model distribution $H$ if $T(H) = (\boldsymbol{\beta}, \sigma)$. For a Fisher-consistent scale estimator, $S(F_{a\varepsilon}) = a\, S(F_\varepsilon)$ for all $a > 0$. In general, the influence function of $T$ at the distribution $H$ is defined as
$$\mathrm{IF}((\mathbf{x}, y); T, H) = \lim_{\epsilon \downarrow 0} \frac{T\left((1 - \epsilon)H + \epsilon \Delta_{(\mathbf{x}, y)}\right) - T(H)}{\epsilon},$$
where $T(H)$ is the functional defined as the solution of the objective model and $\Delta_{(\mathbf{x}, y)}$ is the point-mass distribution at $(\mathbf{x}, y)$. Applying this definition to the AIC yields an influence function expressed in terms of $e_i = y_i - \hat{\mu} - \mathbf{x}_i^T \hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2 = \mathrm{SSE}/(n - p)$, both computed from the full model (the proof is in the Appendix); it is bounded only if the associated $\psi$-function of $(e_i/\hat{\sigma}, \hat{\sigma}, F_0)$ is bounded. It is therefore evident that the AIC is nonrobust, since the LS estimate has an unbounded influence function. The influence function of M-estimation is bounded with respect to $y$ by the choice of $\psi$, but it is unbounded in the $\mathbf{x}$-direction. That is,
$$\mathrm{IF}\left((\mathbf{x}, y); \hat{\boldsymbol{\beta}}_M, H\right) = M^{-1}\, \psi\left(\frac{e}{\sigma}\right) \mathbf{x},$$
where $\psi = \rho'$ and $M$ is a certain $p \times p$ matrix given by
$$M(\psi, H) = \int \psi'\left(\frac{y - \mathbf{x}^T \boldsymbol{\beta}(H)}{\sigma}\right) \mathbf{x}\mathbf{x}^T \, dH(\mathbf{x}, y). \qquad (16)$$
The influence function of the AIC based on the LTS estimator, following Theorem 1, takes the same form with the LTS estimates $\hat{\mu}_{LTS}$, $\hat{\boldsymbol{\beta}}_{LTS}$, and $\hat{\sigma}^2_{LTS}$ in place of the LS quantities. We note that the influence function of $\mathrm{AIC}_{LTS}$ is bounded in both the $\mathbf{x}$- and $y$-directions, since the corresponding $\psi$-function of $(e/\hat{\sigma}_{LTS}, \hat{\sigma}^2_{LTS}, F_0)$ is bounded. Moreover, we conclude that the AIC with a high breakdown point scale estimator provides reliable fits to the data in the presence of leverage points.
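The boundedness can be checked empirically with a sensitivity curve: add one point, move its response, and track both criteria. A hedged sketch (crude LTS via random elemental starts and concentration steps; all names and constants are illustrative):

```python
import numpy as np

def fit_ls(Xd, y):
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def aic_classical(Xd, y):
    # classical AIC, additive constants omitted
    e = y - Xd @ fit_ls(Xd, y)
    return len(y) * np.log(e @ e / len(y)) + 2 * Xd.shape[1]

def aic_lts_refit(Xd, y, n_starts=100, seed=0):
    # crude LTS: random elemental starts + a few concentration steps
    rng = np.random.default_rng(seed)
    n, p = Xd.shape
    h = (n + p + 1) // 2
    best = np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        beta = fit_ls(Xd[idx], y[idx])
        for _ in range(5):
            keep = np.argsort((y - Xd @ beta) ** 2)[:h]
            beta = fit_ls(Xd[keep], y[keep])
        best = min(best, np.sort((y - Xd @ beta) ** 2)[:h].mean())
    return n * np.log(best) + 2 * p

rng = np.random.default_rng(5)
x = rng.normal(size=30)
y = 1 + 2 * x + 0.2 * rng.normal(size=30)
xa = np.append(x, 0.0)                  # one extra design point at x = 0
Xd = np.column_stack([np.ones(31), xa])

# move the extra point's response and track both criteria
classical = [aic_classical(Xd, np.append(y, k)) for k in (10.0, 1000.0)]
robust = [aic_lts_refit(Xd, np.append(y, k)) for k in (10.0, 1000.0)]
```

The classical value keeps climbing as the contamination moves outward, while the LTS-based value is essentially flat, mirroring the unbounded versus bounded influence functions.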

Simulation Results.
The resulting fit to the data is classified as one of the following: (i) correct fit (the true model); (ii) overfit (models containing all the variables in the true model plus other variables that are redundant); (iii) underfit (models with only a strict subset of the variables in the true model); (iv) wrong fit (models that are none of the above).
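This classification can be stated precisely as set comparisons; a small illustrative helper (the name and interface are ours, not the paper's):

```python
def classify_fit(selected, true_vars):
    """Classify a selected variable set versus the true model:
    'correct'  - exactly the true variables,
    'overfit'  - all true variables plus redundant ones,
    'underfit' - a strict subset of the true variables,
    'wrong'    - anything else (misses a true variable but adds others)."""
    s, t = set(selected), set(true_vars)
    if s == t:
        return "correct"
    if t < s:                 # true set strictly contained in selection
        return "overfit"
    if s < t:                 # selection strictly contained in true set
        return "underfit"
    return "wrong"
```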
Tables 3, 4, and 5 show detailed simulation results for the different versions of the AIC methods. For uncontaminated datasets, the classical AIC performs best compared to the robust AICs. When vertical outliers are introduced, the classical AIC selects a large proportion of wrong fit models while, as expected, the robust AICs usually (i.e., with higher proportion) select the correct model. For bad leverage points, we observe that the AIC tends to produce overfits and, with a high level of contamination, a higher proportion of wrong fits. The robust RAIC tends to produce either an underfit or a wrong fit model, whereas the high breakdown robust AICs retain comparable power in the presence of bad leverage points.
For good leverage points, the AIC also tends to produce overfits. On the other hand, the robust AICs tend to produce either a correct fit or an underfit model.

Example 2 (Stack Loss Data).
The stack loss data were presented by [10]. This data set consists of 21 observations on three independent variables, and it contains four outliers. The data are given in Table 6.
We applied the traditional and robust versions of the AIC methods to the data. Table 7 shows that the classical method selects the full model and the robust RAIC method ignores one of the important variables ($x_2$), whereas the robust AIC methods based on high breakdown point estimators agree on the importance of the two variables $x_1$ and $x_2$.

Conclusions
The least trimmed squares (LTS) and least median of squares (LMS) estimators are robust regression methods frequently used in practice. Nevertheless, they are not commonly used for selecting models. This paper introduced versions of the Akaike Information Criterion (AIC) based on the LTS and LMS scales, which are robust against outliers and leverage points. Our simulation results illustrated the excellent performance of $\mathrm{AIC}_{LTS}$ and $\mathrm{AIC}_{LMS}$ for contaminated data sets. This paper focused on the AIC variable selection criterion; one might be interested in extending other robust model selection criteria to high breakdown point estimation methods such as the LTS, LMS, or biweight S-estimators. In addition, this paper considered regression models with continuous variables; future studies might consider mixed variables (i.e., continuous and dummy).
(a) vertical outliers (outliers in the $y$-direction only), (b) good leverage points (outliers in both the $x$- and $y$-directions), (c) bad leverage points (outliers in the $x$-direction only).

Table 1 :
AIC for different values of $x_{10}$ and $y_{10}$.

Table 2 :
Robust AIC for different values of $y_{10}$ (vertical) and $x_{10}$ (leverage). Columns: $y_{10}$, $\mathrm{AIC}_M$, $\mathrm{AIC}_{BS}$, $\mathrm{AIC}_{LTS}$; $x_{10}$, $\mathrm{AIC}_M$, $\mathrm{AIC}_{BS}$, $\mathrm{AIC}_{LTS}$.

Theorem 1. Let $H$ be some distribution other than $F$. Take $(\mathbf{x}, y) \sim H$ and denote by $e$ the error term of the model. Assume that $\rho$ is differentiable with partial derivatives equal to zero at the origin $(0, 0)$. Then the influence function $\mathrm{IF}((\mathbf{x}, y); \mathrm{AIC}_{LTS}, H)$ is bounded in both the $\mathbf{x}$- and $y$-directions.

Table 3 :
Percentage of selected models from the classical AIC, robust RAIC, $\mathrm{AIC}_{LTS}$, $\mathrm{AIC}_{LMS}$, and $\mathrm{AIC}_{BS}$, with vertical outliers.

Table 4 :
Percentage of selected models from the classical AIC, robust RAIC, and robust $\mathrm{AIC}_{LTS}$, $\mathrm{AIC}_{LMS}$, and $\mathrm{AIC}_{BS}$, with bad leverage points.

Table 5 :
Percentage of selected models from the classical AIC, robust RAIC, and robust $\mathrm{AIC}_{LTS}$, $\mathrm{AIC}_{LMS}$, and $\mathrm{AIC}_{BS}$, with good leverage points.

Table 6 :
Stack loss data set.

Table 7 :
The variable selection results for the stack loss data.