The Optimal Selection for Restricted Linear Models with Average Estimator

In models (3) and (4), $p$ is a positive integer, $\theta_j$ and $\psi_l$ are parameters, $e_i$ is a random error, $r_s$ is a constant restricting the $s$th parameter $\theta_s$, and $a_{s,k}$ and $b_s$ are constants. Assume that $\varphi(z_{i,l}, \psi_l) = \sum_{l=1}^{\infty} z_{i,l}\psi_l$ and $\varphi(a_{s,k}, \psi_k) = \sum_{k=1}^{\infty} a_{s,k}\psi_k$ converge in mean square. In (3), the explanatory variable $x_{i,j}$ is included in the model on theoretical grounds or for other reasons, and $z_{i,l}$ is an additional explanatory variable whose inclusion in the model has to be decided. In the context of building a tracking portfolio, the fixed explanatory variable $x_{i,j}$ stands for the historical return of the $j$th stock at time $i$, which must be selected by investors because of their personal preference or the stable earnings of this stock. In a tracking portfolio, investors need to select some alternative stocks from numerous stocks to realize their expected return. So, the additional explanatory variable $z_{i,l}$ indicates the historical return of the $l$th alternative stock at time $i$. Since $z_{i,l}$ can be viewed as a series expansion, the identity (3) includes semiparametric models as a special form. In fact, the model (3) generalizes the models considered by Lai and Xie [26] and Liang et al. [12]. In addition, the parameters $\theta_j$ and $\psi_l$ denote, respectively, the proportion of the $j$th required stock and the $l$th alternative stock in a tracking portfolio. In (4), the parameter $\theta_s$ is adjusted by a linear combination of $\psi_k$ and $b_s$. The economic significance of (4) is that the proportion of each fixed investment varies with the proportion of all alternative investments. When investors change their preference or have acquired new information on alternative stocks, they are capable of adjusting the proportion between required stocks and alternative stocks according to (4). This implies that the increase or decrease of alternative stocks can affect the proportion of each required stock in a portfolio.
Besides, if we assume that $\psi_k = 0$, $k = 1, \ldots, \infty$, the model (4) becomes $r_s\theta_s = b_s$, which has been discussed by Lai and Xie [26]. In particular, if we set $r_1 = \cdots = r_p = 1$, $a_{1,1} + \cdots + a_{p,1} = \cdots = a_{1,\infty} + \cdots + a_{p,\infty} = -1$, and $b_1 + \cdots + b_p = 1$, the restricted equation (4) turns into (2). Denote an index set $U = \{K_1, \ldots, K_M\}$, where $M$ is a positive integer. Let $\Theta = (\theta_1, \ldots, \theta_p)^{T}$ and $\Psi = (\psi_1, \ldots, \psi_\infty)^{T}$, where "T" stands for the transpose operation. Because the number of $z_{i,l}$ in formula (3) is uncertain, we consider a sequence of approximately restricted models (3) and (4) with $K_m \in U$, where the $m$th model includes the first $K_m$ elements of $z_i$, that is, $z_{i,1}, \ldots, z_{i,K_m}$, and the parameter $\theta_s$ is restricted by a linear combination of $K_m$ elements of $\Psi$ and a constant $b_s$. Hence, the $m$th approximately restricted models are obtained from (3) and (4) by truncating both infinite sums at $K_m$ terms.


Introduction
The essential task of risk investment is to select an optimal tracking portfolio among numerous portfolios of stocks. Given a desired target and a series of stocks, a tracking portfolio can be formed from any nonempty subset of the given group of stocks so as to track the target to a certain degree. Because the number of nonempty subsets of stocks is large, there is a mass of possible tracking portfolios. Among all possible portfolios, we should find an optimal tracking portfolio whose return is closest to the target's. Statistically, building a tracking portfolio from a group of stocks is equivalent to fitting a restricted linear model with the target's return as the dependent variable and the returns on the stocks in the group as the regressors. Since the coefficient of a regressor indicates the proportion of the investment in the corresponding stock within the total investment in the portfolio, the linear model is restricted such that all coefficients in the model sum to one. Thus, the task of choosing an optimal tracking portfolio can be accomplished by selecting an optimal restricted linear model.
In this paper, a model averaging technique is developed for examining the selection problem of restricted linear models. Model selection has played an important role in econometrics and statistics over the past decades. The goal of model selection is to choose a model that gives a well-posed fit to the observational data, so the investigation of model selection is an indispensable step in empirical analysis. This work proposes a procedure that minimizes a $k$-class generalized information criterion to select the optimal weights for constrained linear models. Under some conditions, we examine the asymptotic behavior of the selection procedure.
Various methods have been suggested to study the problems of model selection. Knight and Fu [1] discussed the lasso-type estimators with least squares methods. To simultaneously estimate parameters and select important variables, Fan and Peng [2] proposed a method of nonconcave penalized likelihood and demonstrated that this technique had an oracle property when the number of parameters was infinite. Zou and Yuan [3] investigated the oracle theory of model selection based on composite quantile regression. Caner [4] considered model selection by the generalized method of moments estimator. In the empirical likelihood framework, Tang and Leng [5] studied the parametric estimator and variable selection for diverging numbers of parameters. Jennifer et al. [6] explored the ability of automatic selection algorithms to handle the selection problems of both variables and principal components.
Model averaging is another popular and widely used technique for model selection. The method averages the estimators corresponding to different candidate models. Bayesian and frequentist approaches are the two main schools of thought in model averaging. Although their spirit and objectives are similar, the two techniques differ in inference and in the selection of models. The basic paradigm of Bayesian model averaging was introduced by Leamer [7]. Owing to the difficulty of implementation, the approach was basically ignored until the 2000s. For recent developments of this method, readers can refer to Brown et al. [8] and Rodney and Herman [9]. Compared with Bayesian model averaging, frequentist model averaging focuses on model selection rather than model averaging and has been considered by many authors, for instance, Hjort and Claeskens [10], Hansen [11], Liang et al. [12], Zhang and Liang [13], and Hansen and Racine [14].
Generally speaking, different methods of model selection require distinct model selection criteria, including AIC [15], Mallows' $C_p$ [16], CV [17], BIC [18], GCV [19], the GMM $J$-statistic [20], and FPE [21]. Zhang et al. [22] employed the generalized information criterion for selecting the regularization parameters. To choose basis functions of splines, Xu and Huang [23] showed the optimal property of a LsoCV criterion and designed an efficient Newton-type algorithm for this criterion. Focusing on the Kullback-Leibler divergence measure, So and Ando [24] defined a generalized predictive information criterion using the bias correction of an expected weighted log-likelihood estimator. Groen and Kapetanios [25] examined the AIC and BIC criteria to discuss consistent estimates of a factor-augmented regression.
The literature mentioned above pays most attention to unconstrained models with independent and identically distributed random errors. Recently, Lai and Xie [26] discussed model selection for constrained models, but only in the homoscedastic case. Instead of using unrestricted or homoscedastic models, we develop a $k$-class generalized information criterion ($k$-GIC) to address the selection problem for approximately constrained linear models with dependent errors. The $k$-GIC is an extension of the GIC$_{\lambda_n}$ proposed by Shao [27] and includes some conventional model selection criteria, such as BIC and GIC. We employ weighted average least squares to estimate the approximately constrained models and choose the weights by minimizing the $k$-class generalized information criterion. Our main result demonstrates that the $k$-class generalized information criterion is asymptotically equivalent to the average squared error. In other words, the weights selected by the $k$-GIC are asymptotically optimal. Moreover, we highlight two new results which extend the work of Lai and Xie [26]. One is that an estimate of the variance is given and proved to be consistent. The other is that the weights selected by the $k$-GIC are shown to remain asymptotically optimal when the true variance is replaced by the suggested estimate. The finite sample properties of the model selection procedure are examined by Monte Carlo simulation. The simulation results reveal that the proposed method outperforms some alternative approaches.
The remainder of this paper begins with an illustration of the model set-up and average estimator in Section 2. Section 3 calculates the average squared error of the model average estimator. The $k$-GIC is introduced and its asymptotic optimality is derived in Section 4. Section 5 reports simulation evidence, and Section 6 concludes.

Model Set-Up and Average Estimator
The core of risk investment is to build a tracking portfolio of stocks whose return mimics that of a chosen investment target. Let $y_i$ be the return from investing in a selected target and $x_{i,j}$ be the historical return of the $j$th stock at time $i$. Assume that $p$ stocks are available for building a tracking portfolio of the target. Then, a tracking portfolio consisting of all $p$ stocks can be represented by
$$y_i = \sum_{j=1}^{p} x_{i,j}\theta_j + e_i, \tag{1}$$
where $\theta_j$ is an unknown parameter and $e_i$ is a random error. The left-hand side of (1) is the return of investing one dollar in the target. The right-hand side is the return of investing one dollar in the portfolio consisting of all $p$ stocks plus random noise.
Because each parameter $\theta_j$ stands for the proportion of the investment in the corresponding stock relative to the total investment in the tracking portfolio, the sum of all parameters is one, namely,
$$\sum_{j=1}^{p} \theta_j = 1, \tag{2}$$
which means 100 percent of the whole investment. In practice, there exist a large number of stocks. These stocks compose various portfolios that may track the target to some degree. Among all possible tracking portfolios, an ideal tracking portfolio should be the one whose return is closest to the target's return. Therefore, we need to find such an optimal tracking portfolio. Because a tracking portfolio corresponds to a restricted linear model, the aim of finding the optimal tracking portfolio can be accomplished by choosing an optimal restricted linear model.
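As a minimal sketch of how such a restricted fit can be computed, the following Python code estimates portfolio proportions by least squares subject to the sum-to-one restriction, using the standard Lagrange-multiplier correction of the unrestricted estimate. The function name `tracking_weights`, the simulated returns, and the chosen proportions are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def tracking_weights(X, y):
    """Least squares fit of y on X subject to the coefficients summing to one."""
    p = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_ols = XtX_inv @ X.T @ y                  # unrestricted estimate
    ones = np.ones(p)
    # Lagrange correction that moves theta_ols onto the plane sum(theta) = 1.
    gap = 1.0 - ones @ theta_ols
    adjust = (XtX_inv @ ones) * gap / (ones @ XtX_inv @ ones)
    return theta_ols + adjust

# Hypothetical data: 250 periods of returns on 4 stocks.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.02, size=(250, 4))
true_theta = np.array([0.4, 0.3, 0.2, 0.1])        # assumed proportions, sum to one
y = X @ true_theta + rng.normal(0.0, 0.001, size=250)

theta_hat = tracking_weights(X, y)
print(theta_hat, theta_hat.sum())
```

The restricted estimate satisfies the constraint exactly (up to floating-point error), whatever the noise level; only its closeness to the assumed proportions depends on the data.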
In (3), the explanatory variable $x_{i,j}$ is included in the model on theoretical grounds or for other reasons, and $z_{i,l}$ is an additional explanatory variable whose inclusion in the model has to be decided. In the context of building a tracking portfolio, the fixed explanatory variable $x_{i,j}$ stands for the historical return of the $j$th stock at time $i$, which must be selected by investors because of their personal preference or the stable earnings of this stock. In a tracking portfolio, investors need to select some alternative stocks from numerous stocks to realize their expected return. So, the additional explanatory variable $z_{i,l}$ indicates the historical return of the $l$th alternative stock at time $i$. Since $z_{i,l}$ can be viewed as a series expansion, the identity (3) includes semiparametric models as a special form. In fact, the model (3) generalizes the models considered by Lai and Xie [26] and Liang et al. [12]. In addition, the parameters $\theta_j$ and $\psi_l$ denote, respectively, the proportion of the $j$th required stock and the $l$th alternative stock in a tracking portfolio. In (4), the parameter $\theta_s$ is adjusted by a linear combination of $\psi_k$ and $b_s$. The economic significance of (4) is that the proportion of each fixed investment varies with the proportion of all alternative investments. When investors change their preference or have acquired new information on alternative stocks, they are capable of adjusting the proportion between required stocks and alternative stocks according to (4). This implies that the increase or decrease of alternative stocks can affect the proportion of each required stock in a portfolio. Besides, if we assume that $\psi_k = 0$, $k = 1, \ldots, \infty$, the model (4) becomes $r_s\theta_s = b_s$, which has been discussed by Lai and Xie [26].
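The special case in which restriction (4) reduces to a sum-to-one constraint can be checked by a short calculation. The sketch below assumes that (4) has the form $r_s\theta_s = \sum_{k=1}^{\infty} a_{s,k}\psi_k + b_s$, which is consistent with the description above but is a reconstruction rather than a quotation. Setting $r_1 = \cdots = r_p = 1$, $\sum_{s=1}^{p} a_{s,k} = -1$ for every $k$, and $\sum_{s=1}^{p} b_s = 1$, and summing over $s$, one would obtain:

```latex
\begin{align*}
\sum_{s=1}^{p} \theta_s
  &= \sum_{s=1}^{p}\Bigl(\sum_{k=1}^{\infty} a_{s,k}\psi_k + b_s\Bigr) \\
  &= \sum_{k=1}^{\infty}\Bigl(\sum_{s=1}^{p} a_{s,k}\Bigr)\psi_k + \sum_{s=1}^{p} b_s \\
  &= -\sum_{k=1}^{\infty}\psi_k + 1,
\end{align*}
% which rearranges to the sum-to-one restriction on all proportions:
\[
\sum_{s=1}^{p}\theta_s + \sum_{k=1}^{\infty}\psi_k = 1 .
\]
```

The interchange of the two sums is harmless here because the sum over $s$ is finite and each series in $k$ is assumed to converge.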
For a positive integer G, let  be the maximal value of  2  ,  = 1, . . ., , and let   ,  = 1, . . ., G be the nonzero eigenvalue of Ω  .Assume that both   and   are summable, namely, Since the covariance matrix Ω  determines the algebraic structure of the model average estimator, we discuss its properties in the following.

Lemma 5.
Let $A$ be a symmetric matrix and $e$ be a random vector with zero expectation. Then, one has $\mathrm{Var}(e^{T} A e) = 2\,\mathrm{tr}(\mathrm{Var}(e) A\, \mathrm{Var}(e) A)$, where $\mathrm{Var}(\cdot)$ is the variance operator.
Proof. The proof of this lemma is provided in Lai and Xie [26].
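The identity in Lemma 5 can be checked numerically. The sketch below is a Monte Carlo illustration, not a proof: it draws the random vector from a zero-mean multivariate normal distribution (an assumption of this example, under which the formula is the standard variance of a quadratic form) and compares the sample variance of the quadratic form with $2\,\mathrm{tr}(\Sigma A \Sigma A)$. The matrices `Sigma` and `A` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive definite covariance and symmetric matrix (illustrative only).
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 0.2, 0.0],
              [0.2, 0.5, -0.1],
              [0.0, -0.1, 2.0]])

# Draw zero-mean Gaussian vectors and evaluate the quadratic form e' A e for each.
e = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
quad = np.einsum("ni,ij,nj->n", e, A, e)

empirical = quad.var()
theoretical = 2.0 * np.trace(Sigma @ A @ Sigma @ A)
print(empirical, theoretical)
```

With this many draws the two numbers should agree to within a few percent; the remaining gap is simulation noise.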

Average Squared Error
In the definition of the average squared error, $k$ is a fixed positive integer which is often used to eliminate the boundary effect. Andrews [28] suggested that $k$ can take the value $\sigma^{-2}$ when the errors are independent and identically distributed with variance $\sigma^2$. The most common situation is $k = 1$. The average squared error can be viewed as a measure of accuracy between $\hat{\mu}(w)$ and $\mu$. Obviously, an optimal estimator attains the minimum value of the average squared error. In other words, we should select a weight vector $w$ from $W$ such that the average squared error is as small as possible.
In order to investigate the problem of weight selection, we consider the conditional expected average squared error. From the definition of the average squared error, the following lemma can be obtained.

Lemma 6.
The conditional expected average squared error can be rewritten in terms of $A(w) = I - P(w)$.

The 𝑘-Class Generalized Information Criterion and Asymptotic Optimality
As the value of $\mu$ is unknown, the average squared error cannot be used directly to select the weight vector $w$. Thus, we suggest a $k$-class generalized information criterion ($k$-GIC) to choose the weight vector. Further, we will prove that the weight vector selected by the $k$-GIC asymptotically minimizes the average squared error.
In the $k$-GIC for the restricted model average estimator, $\hbar$ is larger than one and satisfies assumption (35) mentioned below. The $k$-class generalized information criterion extends the generalized information criterion (GIC$_{\lambda_n}$) proposed by Shao [27]. Because $\hbar$ can take different values, the $k$-GIC includes some common information criteria for model selection such as the Mallows $C_p$ criterion ($k = \hbar = 1$), the GIC criterion ($k = 1$, $\hbar \to \infty$), the FPE criterion ($k = 1$, $\hbar > 1$), and the BIC criterion ($k = 1$, $\hbar = \log(n)$).
It follows from (33) and Lemma 6 that Lemma 7 is established as desired.
Lemma 7 shows that the $k$-GIC is equivalent to the conditional expected average squared error plus an error bias. In particular, as $n$ approaches infinity, the $k$-GIC is an unbiased estimate of the conditional expected average squared error.
The $k$-GIC is defined so as to select the optimal weight vector $\hat{w}$, which is chosen by minimizing the $k$-GIC over $w \in W$.
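To make the weight selection step concrete, the sketch below picks weights for a small set of nested models by minimizing a penalized criterion over a discrete grid on the simplex. Several simplifications are assumptions of this example and not the paper's definitions: the criterion is taken to be a Mallows-type form, residual sum of squares plus $\hbar\,\sigma^2\,\mathrm{tr}(P(w))$, the errors are independent with known variance, the candidate models are unrestricted least squares fits, and `hbar=2.0` and the grid step are arbitrary. The names `criterion` and `select_weights` are hypothetical.

```python
import itertools
import numpy as np

def criterion(w, y, hats, sigma2, hbar):
    """Assumed Mallows-type criterion: residual sum of squares plus a trace penalty."""
    P_w = sum(wm * Pm for wm, Pm in zip(w, hats))   # averaged projection matrix P(w)
    resid = y - P_w @ y
    return resid @ resid + hbar * sigma2 * np.trace(P_w)

def select_weights(y, Z, sizes, sigma2, hbar=2.0, steps=4):
    """Search weights in {0, 1/steps, ..., 1} that are nonnegative and sum to one."""
    # The m-th candidate model uses the first K_m columns of Z.
    hats = [Z[:, :K] @ np.linalg.pinv(Z[:, :K]) for K in sizes]
    grid = [np.array(c) / steps
            for c in itertools.product(range(steps + 1), repeat=len(sizes))
            if sum(c) == steps]
    values = [criterion(w, y, hats, sigma2, hbar) for w in grid]
    best = int(np.argmin(values))
    return grid[best], values[best], hats

# Hypothetical data with decaying coefficients on six regressors.
rng = np.random.default_rng(2)
n = 100
Z = rng.normal(size=(n, 6))
psi = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])
y = Z @ psi + rng.normal(size=n)

w_hat, c_hat, hats = select_weights(y, Z, sizes=[2, 4, 6], sigma2=1.0)
print(w_hat, c_hat)
```

Because each single model corresponds to a vertex of the grid, the selected average can never have a larger criterion value than the best single model under the same criterion.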
Obviously, the well-posed estimators of the parameters are $\hat{\Theta}$ and $\hat{\Psi}$ with the weight vector $\hat{w}$. Under some regularity conditions, we intend to demonstrate that the selection procedure is asymptotically optimal in the following sense: If the formula (34) holds, we know that the weight vector $\hat{w}$ selected by the $k$-GIC attains the minimum value of the average squared error. In other words, the weight vector $\hat{w}$ is equivalent to the weight vector selected by minimizing the average squared error and is the optimal weight vector for $\hat{\mu}(w)$. The asymptotic optimality of the $k$-GIC can be established under the following assumptions.
for a large ℏ and any positive integer 1 ≤   < ∞.
The above assumptions have been employed in much of the literature on model selection. For instance, the expression (35) was used by Shao [27] and the formula (36) was adopted by Li [29], Andrews [28], Shao [27], and Hansen [11].
The following lemma offers a bridge for proving the asymptotic optimality of the $k$-GIC. The goal is to choose $\hat{w}$ by minimizing the $k$-GIC. From (39), one only needs to select $\hat{w}$ by minimizing the resulting expression over $w \in W$.
Compared with   (), it is sufficient to establish that  −1 ⟨  , () − ()⟩  and  −1 ℏ tr(()Ω  ) −  − where " →" denotes the convergence in probability.Following the idea of Li [29], we testify the main result of our work that the minimizing of -class generalized information criterion is asymptotically optimal.Now, we state the main theorem.Theorem 9.Under assumptions (35) and (36), the minimizing of -class generalized information criterion   () is asymptotically optimal, namely, (34) holds.
In practice, the covariance of the errors is usually unknown and needs to be estimated. However, it is difficult to build a good estimate of the error covariance matrix because of its special structure.
In the special case where the random errors are independent and identically distributed with variance $\sigma^2$, a consistent estimator of $\sigma^2$ can be built for the constrained models (7) and (8). Let $m = M$ in $\hat{\mu}_m$ and $k_M = \mathrm{tr}(P_M)$, where $M$ corresponds to a "large" approximating model. Denote $\hat{\sigma}^2_M = (n - k_M)^{-1}(y - \hat{\mu}_M)^{T}(y - \hat{\mu}_M)$. The coming theorem will show that $\hat{\sigma}^2_M$ is a consistent estimate of $\sigma^2$.

Theorem 10.
If $k_M/n \to 0$ when $k_M \to \infty$ and $n \to \infty$, we have $\hat{\sigma}^2_M \to \sigma^2$ in probability.
Proof. Writing Λ =  +  −1   , one obtains (56). The resulting expression implies that the second term on the right-hand side of (56) approaches zero. By a proof similar to that of (59), we obtain that the final term on the right-hand side of (56) also tends to zero. The proof of Theorem 10 is complete.
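The variance estimate from a large approximating model can be illustrated with a short simulation. The sketch below assumes independent errors with known true variance, uses an unrestricted least squares fit in place of the constrained fit, and divides the residual sum of squares by $n$ minus the trace of the projection matrix. The sample size, the number of regressors, and the coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K_large, sigma2 = 2000, 20, 0.25               # hypothetical design

Z = rng.normal(size=(n, K_large))
psi = 1.0 / np.arange(1, K_large + 1) ** 2         # decaying coefficients
y = Z @ psi + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Fit the large model; tr(P) is computed as tr(Z^+ Z) to avoid forming the n-by-n matrix.
Z_pinv = np.linalg.pinv(Z)
mu_hat = Z @ (Z_pinv @ y)
k_large = np.trace(Z_pinv @ Z)

resid = y - mu_hat
sigma2_hat = resid @ resid / (n - k_large)
print(k_large, sigma2_hat)
```

Here the ratio of the number of fitted parameters to the sample size is small, which is the regime in which the estimate is expected to be close to the true variance.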
In the case of independently and identically distributed errors, if we replace  2 by σ2  in the -GIC, the -GIC can be simplified to   () =      − μ() where   > 0.
By assumption (35), one knows that the above equation is close to zero.Thus, we obtain → 0. It follows from Lemma 3 and the definition of   () that ( −1 tr(())) 2 is no greater than   ().Then, the formula (63) is confirmed.Therefore, we conclude that the minimizing of   () is also asymptotically optimal.

Monte Carlo Simulation
The sample size is varied among $n = 50$, 100, 150, and 200. The number of models is determined by $M = \lfloor 3n^{1/3} \rfloor$, where $\lfloor x \rfloor$ stands for the integer part of $x$. A model parameter is set so that $R^2$ varies on a grid between 0.1 and 0.9. The number of simulation trials is $\Pi = 500$. For the $k$-GIC, $k$ is set to one and $\hbar$ is set to the effective number of parameters.
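For reference, the number of candidate models implied by each sample size can be computed directly. The snippet below takes the rule to be $M = \lfloor 3n^{1/3} \rfloor$, which is an assumed reading of the expression in the text; the function name `number_of_models` is hypothetical.

```python
import math

def number_of_models(n):
    """Integer part of 3 * n^(1/3), the assumed rule for the number of models."""
    return math.floor(3 * n ** (1 / 3))

sizes = (50, 100, 150, 200)
counts = [number_of_models(n) for n in sizes]
for n, m in zip(sizes, counts):
    print(n, m)
```

Under this reading, the four sample sizes correspond to 11, 13, 15, and 17 candidate models, respectively.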
To assess the performance of the $k$-GIC, we consider five estimators: (1) AIC model selection estimators (AIC), (2) BIC model selection estimators (BIC), (3) leave-one-out cross-validated model selection estimator (CV, [17]), (4) Mallows model averaging estimators (MMA, [11]), and (5) $k$-GIC model selection estimator ($k$-GIC). The AIC and BIC are defined following Machado [30]. We employ the out-of-sample prediction error to evaluate each estimator. For each replication, 100 out-of-sample observations are generated. In a given simulation trial, the prediction error is computed with $\hat{w}$ selected by one of the five methods. Then, the out-of-sample prediction error PE is calculated over the $\Pi = 500$ replications. Obviously, a smaller PE implies a better estimator.
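The evaluation scheme can be sketched in a few lines. The code below assumes that the prediction error within one replication is the mean squared difference between the out-of-sample responses and their predictions, and that the reported PE is the plain average of these values across replications; both forms are assumptions of this example, and the function names are hypothetical.

```python
def prediction_error(y_out, y_pred):
    """Mean squared prediction error over one set of out-of-sample observations."""
    return sum((a - b) ** 2 for a, b in zip(y_out, y_pred)) / len(y_out)

def average_prediction_error(per_replication_errors):
    """Average of the per-replication prediction errors."""
    return sum(per_replication_errors) / len(per_replication_errors)

# Tiny hypothetical example with two replications of three observations each.
pe_1 = prediction_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
pe_2 = prediction_error([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
pe = average_prediction_error([pe_1, pe_2])
print(pe_1, pe_2, pe)
```

In the simulation described above, each replication would supply 100 out-of-sample observations and the average would run over 500 replications.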
We first consider PE under homoscedastic errors. The prediction error calculations are summarized in Figure 1. The four panels in each graph depict results for the different sample sizes. In each panel, PE is displayed on the $y$-axis and $R^2$ is displayed on the $x$-axis. We find that the $k$-GIC estimators are almost always the best among those considered. When $R^2$ is very large, the MMA estimators can sometimes be marginally preferred to the $k$-GIC estimators. In each panel, the AIC and CV have quite similar prediction errors. For smaller $R^2$, the AIC obtains a higher prediction error than CV. However, the AIC estimators yield smaller PE than the CV estimators as $R^2$ increases. In many situations, the PE of the BIC estimator with a large $R^2$ is quite poor relative to the other methods.
Next, we discuss PE under correlated errors; the PE calculation is summarized in Figure 2. Broadly speaking, the conclusions are similar to those found in the homoscedastic case. The $k$-GIC estimator frequently yields the most accurate estimates, followed by the MMA estimator, and both average estimators enjoy significantly smaller PE than the other three estimators over a large portion of the $R^2$ space. When $R^2 \le 0.2$, the BIC estimator outperforms the $k$-GIC estimator. Again, the AIC estimator is usually the worst performing estimator, with CV a close second in a large region of the $R^2$ space. Besides, their relative efficiency depends closely on the sample size, with the BIC estimator showing increasing PE and the remaining four estimators showing decreasing PE as $n$ increases.

Conclusions
In risk investment, an important subject is to find an optimal portfolio. The commonly used techniques are optimization methods based on the mean-variance scheme. However, those methods are computationally cumbersome and cannot produce closed-form solutions for some complex problems. To make up for the defects of the mean-variance approach, an alternative methodology for obtaining an optimal portfolio is to use model selection. This paper attempts to develop a statistical procedure for the selection problem of an optimal tracking portfolio. We build the theoretical models of tracking portfolios as constrained linear models. Then, the selection problem of an optimal portfolio boils down to choosing an optimal constrained linear model.
In the setting of unrestricted models or homoscedastic models, a large number of works investigate the problems of model selection. In contrast, we discuss model selection for constrained models with dependent errors. The restricted models are estimated by the method of weighted average least squares. Thus, the selection of an optimal constrained model is equivalent to finding a series of optimal weights. We select the weights by minimizing a $k$-class generalized information criterion ($k$-GIC), which is an estimate of the average squared error from the model average fit. The procedure of selecting weights is proved to be asymptotically optimal. Through Monte Carlo simulation, the performance of the $k$-GIC is compared against that of four other methods. It is found that the $k$-GIC gives the best performance in most cases.
There are two limitations of our results which are open for further research. First, what is the asymptotic distribution of the parametric estimators? Second, can the theory be generalized to allow for continuous weights? These questions remain to be answered by future research. In this work, we mainly adopt the method of regression analysis to solve the selection problem. In fact, some alternative mathematical tools can also be employed to explore the theoretical properties of model selection. For example, the optimal model can be selected by the methods of linear optimization or quadratic programming, and the techniques of linear functional analysis and stochastic control can be applied to inference on the parametric estimators. Besides, the applications of this study can also be extended to some other fields including risk management, ruin theory, and factor analysis.

Figure 1: Results for Monte Carlo design.

Figure 2: Results for Monte Carlo design.