The Akaike Information Criterion (AIC) based on least squares (LS) regression minimizes the sum of squared residuals, and LS is sensitive to outlying observations. Alternative criteria that are less sensitive to outliers have been proposed; examples are the robust AIC (RAIC), the robust Mallows' Cp (RCp), and the robust Bayesian information criterion (RBIC). In this paper, we propose a robust AIC obtained by replacing the scale estimate with a high-breakdown-point estimate of scale. The robustness of the proposed method is studied through its influence function. Through simulated and real data examples, we show that the proposed robust AIC is effective in selecting accurate models in the presence of outliers and high leverage points.
Akaike Information Criterion (AIC)
Consider a multiple linear regression model:
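The display equation for the model did not survive extraction; in standard notation (our reconstruction, assuming the usual setup) it reads:

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i,
\qquad i = 1, \ldots, n,
```

where the errors \(\varepsilon_i\) are assumed independent with mean zero and common variance \(\sigma^2\), and \(p\) denotes the number of regression coefficients including the intercept.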
Since the
To set the idea, the influence of outlier on
AIC for different values of the added observation (table values not recoverable from the extraction).
Data and positions for
Effect of adding one observation
The remainder of the paper is organized as follows. Section
The
In recent years, a good deal of attention in the literature has been focused on high-breakdown methods, that is, methods that remain resistant even to multiple severe outliers. Many such methods are based on minimizing a more robust scale estimate than the sum of squared residuals. For example, Rousseeuw [
Consider the scale estimate of the errors defined by
Ronchetti [
We introduce an alternative robust version of
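As a concrete illustration of the idea (a sketch under our own notational assumptions, not the paper's exact formula, which did not survive extraction), the classical Gaussian AIC can be written as n·log(σ̂²) + 2p with σ̂² = RSS/n; a robust variant replaces σ̂ with a high-breakdown scale, such as the LMS-type scale based on the median squared residual:

```python
import numpy as np

def classical_aic(resid, p):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2p."""
    n = len(resid)
    return n * np.log(np.sum(resid**2) / n) + 2 * p

def lms_scale(resid, p):
    """High-breakdown scale from the median squared residual
    (Rousseeuw's LMS scale with small-sample correction factor)."""
    n = len(resid)
    return 1.4826 * (1 + 5.0 / (n - p)) * np.sqrt(np.median(resid**2))

def robust_aic(resid, p):
    """AIC with the LS scale replaced by a high-breakdown scale.
    A sketch of the general idea, not the paper's exact definition;
    in practice the residuals would come from a robust fit as well."""
    n = len(resid)
    return n * np.log(lms_scale(resid, p)**2) + 2 * p

# Toy data: simple regression with five vertical outliers.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)
y[:5] += 20.0
X = np.column_stack([np.ones(50), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
print(classical_aic(r, 2), robust_aic(r, 2))
```

The median-based scale ignores the few enormous squared residuals that dominate the LS sum, so the criterion is far less inflated by the outliers.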
Robust AIC for different values of the added observation (table values not recoverable from the extraction).
Consider the linear model in (
Let
Let
It is clear that the influence function in (
In this section, AIC_{LTS}, AIC_{LMS}, and AIC_{BS} are compared with the classical AIC and the robust RAIC under three contamination scenarios:
vertical outliers (outliers in the y-direction only);
good leverage points (outliers in the x-space that follow the linear pattern of the majority of the data);
bad leverage points (outliers in the x-space that deviate from that pattern).
For all situations, we randomly generate 0%, 5%, 10%, 20%, 30%, and 40% outliers from
The resulting fit to the data is classified as one of the following:
correct fit (the true model);
overfit (models containing all the variables in the true model plus redundant variables);
underfit (models containing only a strict subset of the variables in the true model);
wrong fit (models that are none of the above).
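A stripped-down version of such a simulation can be sketched as follows (our own toy setup, not the paper's design: a hypothetical true model y = x0 + x1 + noise among three candidate predictors, classical AIC versus a median-scale robust AIC computed on LS residuals; the paper's criteria use fully robust fits):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, true_vars = 100, {0, 1}              # true model uses predictors 0 and 1

def aic_ls(X, y):
    """Classical AIC: n*log(RSS/n) + 2*(#params)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return n * np.log(np.sum(r**2) / n) + 2 * X.shape[1]

def aic_robust(X, y):
    """Robust AIC sketch: LS scale replaced by a median-based scale."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    s = 1.4826 * np.sqrt(np.median(r**2))
    return n * np.log(s**2) + 2 * X.shape[1]

def best_subset(Xfull, y, crit):
    """Exhaustive search over all non-empty predictor subsets."""
    best, best_val = None, np.inf
    for k in range(1, 4):
        for S in itertools.combinations(range(3), k):
            X = np.column_stack([np.ones(n), Xfull[:, S]])
            v = crit(X, y)
            if v < best_val:
                best, best_val = set(S), v
    return best

def classify(S):
    """Label the selected subset relative to the true model."""
    if S == true_vars: return "correct"
    if S > true_vars:  return "overfit"
    if S < true_vars:  return "underfit"
    return "wrong"

Xfull = rng.normal(size=(n, 3))
y = Xfull[:, 0] + Xfull[:, 1] + rng.normal(scale=0.3, size=n)
y[:10] += 15.0                          # 10% vertical outliers
print("classical:", classify(best_subset(Xfull, y, aic_ls)))
print("robust:   ", classify(best_subset(Xfull, y, aic_robust)))
```

Repeating this over many replications and tabulating the four labels gives the percentages reported in the tables below.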
Tables
Percentage of selected models for the classical AIC, the robust RAIC, and the robust AIC_{LTS}, AIC_{LMS}, and AIC_{BS}.

Outliers  Fit type     AIC    RAIC   AIC_{LTS}  AIC_{LMS}  AIC_{BS}
0%        Correct fit  84.6%  54.4%  57.6%      65.2%      45.2%
          Overfit      15.4%  0%     0%         0%         0%
          Underfit     0%     43.6%  41.2%      34%        54.6%
          Wrong fit    0%     2.0%   1.2%       0.8%       0.2%
5%        Correct fit  2.8%   49.6%  56.8%      62.6%      45.4%
          Overfit      2%     0%     0%         0%         0%
          Underfit     22%    51%    42.8%      36.6%      54.6%
          Wrong fit    73.2%  1.4%   0.4%       0.8%       0%
10%       Correct fit  3.2%   45.0%  51%        56%        39.8%
          Overfit      0.8%   0%     0%         0%         0%
          Underfit     20.2%  52.4%  48%        43.8%      55.2%
          Wrong fit    75.8%  2.6%   1.0%       0.2%       0%
20%       Correct fit  4.2%   30.8%  49.4%      58%        34.2%
          Overfit      0.2%   0%     0%         0%         0%
          Underfit     22.8%  67.8%  49.6%      41.4%      65.8%
          Wrong fit    72.8%  1.4%   1.0%       0.6%       0%
30%       Correct fit  1.6%   0%     44.0%      51.2%      23.2%
          Overfit      0.2%   0%     0%         0%         0%
          Underfit     22.6%  73.8%  55.4%      48.2%      76.8%
          Wrong fit    72.8%  26.2%  0.6%       0.6%       0%
40%       Correct fit  2.2%   0%     37.6%      41.2%      12.4%
          Overfit      0.6%   0%     0%         0%         0%
          Underfit     22.8%  70.8%  62.4%      58.4%      87.4%
          Wrong fit    74.4%  29.2%  0%         0.4%       0.2%
Percentage of selected models for the classical AIC, the robust RAIC, and the robust AIC_{LTS}, AIC_{LMS}, and AIC_{BS}.

Outliers  Fit type     AIC    RAIC   AIC_{LTS}  AIC_{LMS}  AIC_{BS}
5%        Correct fit  0%     0%     54.6%      60.8%      43.8%
          Overfit      70.4%  0%     0%         0%         0%
          Underfit     0%     64.4%  44%        38.4%      55.4%
          Wrong fit    29.6%  35.6%  1.4%       0.8%       0.8%
10%       Correct fit  0%     0%     63.8%      67.8%      51.0%
          Overfit      63%    0%     0%         0%         0%
          Underfit     0%     54.6%  34.6%      31.4%      48.8%
          Wrong fit    37%    42.4%  1.6%       0.8%       0.2%
20%       Correct fit  0%     0%     56.6%      63.4%      49.2%
          Overfit      54.8%  0%     0%         0%         0%
          Underfit     0.2%   60.8%  42.4%      35.8%      50.6%
          Wrong fit    44.8%  39.2%  1.0%       0.8%       0.2%
30%       Correct fit  0%     0%     56.6%      61.4%      46%
          Overfit      37.6%  0%     0%         0%         0%
          Underfit     0.25%  29.6%  42.6%      36%        53.8%
          Wrong fit    60.4%  42.6%  0.8%       2.4%       0.2%
40%       Correct fit  1.0%   0%     55.2%      64.6%      51.4%
          Overfit      13.8%  0%     0%         0%         0%
          Underfit     1.2%   54.4%  43.8%      33.8%      48.6%
          Wrong fit    81%    45.4%  1.0%       1.6%       0%
Percentage of selected models for the classical AIC, the robust RAIC, and the robust AIC_{LTS}, AIC_{LMS}, and AIC_{BS}.

Outliers  Fit type     AIC    RAIC   AIC_{LTS}  AIC_{LMS}  AIC_{BS}
5%        Correct fit  0.2%   47.0%  53.6%      58.4%      42.6%
          Overfit      99.8%  0%     0%         0%         0%
          Underfit     0%     50.4%  46%        41.2%      57.4%
          Wrong fit    0%     2.6%   0.4%       0.4%       0%
10%       Correct fit  0%     44.2%  54.6%      59.4%      40.4%
          Overfit      99.6%  0%     0%         0%         0%
          Underfit     0.2%   53.8%  44.6%      39.8%      59.6%
          Wrong fit    0.2%   2.0%   0.8%       0.8%       0%
20%       Correct fit  0.8%   32.4%  50.4%      56.2%      33.2%
          Overfit      97.6%  0%     0%         0%         0%
          Underfit     0.8%   66.6%  48.4%      43%        66.4%
          Wrong fit    0.8%   1.0%   1.2%       0.8%       0.2%
30%       Correct fit  1.8%   0%     46.0%      50.6%      27.0%
          Overfit      97.8%  96.8%  0%         0%         0%
          Underfit     2.8%   2.8%   53.6%      49.2%      72.8%
          Wrong fit    0.6%   0.4%   0.4%       0.2%       0.2%
40%       Correct fit  0.2%   0%     37.4%      37%        13.8%
          Overfit      97.4%  100%   0%         0%         0%
          Underfit     2.2%   0%     62.6%      62.6%      86.2%
          Wrong fit    0.2%   0%     0%         0.4%       0%
For bad leverage points, we observe that
For good leverage points,
The stack loss data set was presented by [
Stack loss data set (air flow x1, water temperature x2, acid concentration x3, stack loss y).

Obs.  x1  x2  x3  y
1     80  27  89  42
2     80  27  88  37
3     75  25  90  37
4     62  24  87  28
5     62  22  87  18
6     62  23  87  18
7     62  24  93  19
8     62  24  93  20
9     58  23  87  15
10    58  18  80  14
11    58  18  89  14
12    58  17  88  13
13    58  18  82  11
14    58  19  93  12
15    50  18  89   8
16    50  18  86   7
17    50  19  72   8
18    50  19  79   8
19    50  20  80   9
20    56  20  82  15
21    70  20  91  15
We applied the traditional and robust versions of AIC to the stack loss data.
Variable selection results for the stack loss data: AIC, RAIC, AIC_{LTS}, AIC_{LMS}, and AIC_{BS} values for each subset of x1, x2, x3 (table values not recoverable from the extraction).
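The classical part of this exercise can be reproduced directly: the sketch below computes the Gaussian AIC for every non-empty predictor subset of the stack loss data (air-flow column restored from Brownlee's published dataset; the robust criteria would require the robust scale estimates described above):

```python
import itertools
import numpy as np

# Brownlee's stack loss data: air flow, water temp., acid conc., stack loss
data = np.array([
    [80, 27, 89, 42], [80, 27, 88, 37], [75, 25, 90, 37], [62, 24, 87, 28],
    [62, 22, 87, 18], [62, 23, 87, 18], [62, 24, 93, 19], [62, 24, 93, 20],
    [58, 23, 87, 15], [58, 18, 80, 14], [58, 18, 89, 14], [58, 17, 88, 13],
    [58, 18, 82, 11], [58, 19, 93, 12], [50, 18, 89,  8], [50, 18, 86,  7],
    [50, 19, 72,  8], [50, 19, 79,  8], [50, 20, 80,  9], [56, 20, 82, 15],
    [70, 20, 91, 15],
], dtype=float)
X_all, y = data[:, :3], data[:, 3]
n = len(y)

def aic(X, y):
    """Classical AIC up to an additive constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * X.shape[1]

names = ["x1", "x2", "x3"]
for k in range(1, 4):
    for S in itertools.combinations(range(3), k):
        X = np.column_stack([np.ones(n), X_all[:, S]])
        print(",".join(names[j] for j in S), round(aic(X, y), 1))
```

Because the data set contains several well-known outliers (observations 1, 3, 4, and 21 are often flagged in the robust-regression literature), the classical and robust criteria can disagree on the selected subset.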
The least trimmed squares (LTS)
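The LTS estimator minimizes the sum of the h smallest squared residuals. A minimal random-subsampling approximation (our own sketch, not the FAST-LTS algorithm used in practice) is:

```python
import numpy as np

def lts_fit(X, y, h=None, n_trials=500, seed=0):
    """Approximate LTS: fit exactly on random p-point subsets and keep
    the coefficients whose h smallest squared residuals have the
    lowest sum."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2          # default coverage, ~50% breakdown
    best_beta, best_obj = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue                                  # singular subset
        r2 = np.sort((y - X @ beta) ** 2)
        obj = r2[:h].sum()                            # trimmed sum of squares
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj

# Toy data: 25% vertical outliers ruin LS but not LTS.
rng = np.random.default_rng(2)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=60)
y[:15] -= 10.0
X = np.column_stack([np.ones(60), x])
beta_lts, _ = lts_fit(X, y)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_lts, beta_ls)
```

The trimmed objective simply ignores the n − h largest squared residuals, which is what gives the estimator its high breakdown point.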
Consider
Inserting (
The author declares that there is no conflict of interest regarding the publication of this paper.
The author would like to thank Professor Nor Aishah Hamzah and Dr. Rossita M Yunus for their support in completing this study. The author is also grateful to the anonymous reviewers for their valuable comments on an earlier draft of this paper. This research was funded by the University of Malaya under Grant no. RG20811AFR.