Statistical Estimation of Portfolios for Dependent Financial Returns

1 Department of Applied Mathematics, Waseda University, Tokyo 169-8555, Japan
2 Department of Statistics/Graduate Institute of Statistics & Actuarial Science, Feng Chia University, Taichung 407, Taiwan
3 Department of Mathematics, Faculty of Science, Niigata University, Niigata 950-2181, Japan
4 The Jikei University School of Medicine, Tokyo 105-8461, Japan
5 ECARES-Solvay Brussels School of Economics and Management, Université Libre de Bruxelles, 1050 Brussels, Belgium


Introduction
Consider a sequence of random variables $S_1, S_2, \ldots$ converging in probability to a real constant $c$. By this we mean that $\Pr\{|S_T - c| > \varepsilon\} \to 0$ as $T \to \infty$ for all $\varepsilon > 0$. The simplest setting in which to obtain large-deviation results is that of sums of independent, identically distributed (i.i.d.) random variables on the real line. For example, we would like to consider the large-excursion probabilities of the sample average $S_T = T^{-1}\sum_{i=1}^{T} X_i$, where the $X_i$, $i = 1, 2, \ldots$, are i.i.d. and $T$ tends to infinity. Suppose that $E(X_i) = m$ exists and is finite. By the law of large numbers, we know that $S_T$ converges to $m$.

We consider Gaussian locally stationary processes $\{X_{t,T}\}$ with mean function $\mu(u)$ and transfer function $A$, for which the following conditions hold:

(i) $\xi(\lambda)$ is a stochastic process on $[-\pi, \pi]$ with $\overline{\xi(\lambda)} = \xi(-\lambda)$ and
$$\operatorname{cum}\{d\xi(\lambda_1), \ldots, d\xi(\lambda_k)\} = \eta\Big(\sum_{j=1}^{k} \lambda_j\Big)\, \nu_k(\lambda_1, \ldots, \lambda_{k-1})\, d\lambda_1 \cdots d\lambda_k, \tag{3.2}$$
where $\operatorname{cum}\{\cdots\}$ denotes the cumulant of $k$th order, $\nu_1 = 0$, $\nu_2(\lambda) = 1$, $|\nu_k(\lambda_1, \ldots, \lambda_{k-1})| \le \mathrm{const}_k$ for all $k$, and $\eta(\lambda) = \sum_{j=-\infty}^{\infty} \delta(\lambda + 2\pi j)$ is the period-$2\pi$ extension of the Dirac delta function. To simplify the problem, we assume in this paper that the process $X_{t,T}$ is Gaussian; namely, we assume that $\nu_k = 0$ for all $k \ge 3$;

(ii) there exist a constant $K$ and a $2\pi$-periodic function $A : [0,1] \times \mathbb{R} \to \mathbb{C}$ with $\overline{A(u, \lambda)} = A(u, -\lambda)$ and $\sup_{t,\lambda} |A^{0}_{t,T}(\lambda) - A(t/T, \lambda)| \le K T^{-1}$ for all $T$. $A(u, \lambda)$ and $\mu(u)$ are assumed to be continuous in $u$.
The function $f(u, \lambda) := |A(u, \lambda)|^2$ is called the time-varying spectral density of the process. In the following, we will always denote by $s$ and $t$ time points in the interval $[1, T]$, while $u$ and $v$ denote time points in the rescaled interval $[0, 1]$, that is, $u = t/T$. We discuss the asymptotics away from the expectation of some statistics used for the problem of discriminating between two Gaussian locally stationary processes with specified mean functions. Suppose that $\{X_{t,T},\ t = 1, \ldots, T;\ T \ge 1\}$ is a Gaussian locally stationary process which under the hypothesis $\Pi_j$ has mean function $\mu_j(u)$ and time-varying spectral density $f_j(u, \lambda)$ for $j = 1, 2$. Let $\mathbf{X}_T = (X_{1,T}, \ldots, X_{T,T})'$ be a stretch of the series $\{X_{t,T}\}$, and let $p_j(\cdot)$ be the probability density function of $\mathbf{X}_T$ under $\Pi_j$ ($j = 1, 2$). The problem is to classify $\mathbf{X}_T$ into one of the two categories $\Pi_1$ and $\Pi_2$ when we have no information on the prior probabilities of $\Pi_1$ and $\Pi_2$.

3.4
Initially, we make the following assumption.
Assumption 3.2. (i) We observe a realisation $X_{1,T}, \ldots, X_{T,T}$ of a Gaussian locally stationary process with mean function $\mu_j$ and transfer function $A^{0}_j$ under $\Pi_j$, $j = 1, 2$; (ii) the $A_j(u, \lambda)$ are uniformly bounded from above and below and are differentiable in $u$ and $\lambda$ with uniformly continuous derivatives $(\partial/\partial u)(\partial/\partial \lambda) A_j$; (iii) the $\mu_j(u)$ are differentiable in $u$ with uniformly continuous derivatives.
In time series analysis, the class of statistics which are quadratic forms of $\mathbf{X}_T$ is fundamental and important. This class includes the first-order terms in the expansion with respect to $T$ of the quasi-Gaussian maximum likelihood estimator (QMLE), tests, discriminant statistics, and so forth. Assume that $G(u, \lambda)$ is the transfer function of a locally stationary process, where $G$ satisfies Assumption 3.2(ii), and that $g(u)$ is a continuous function of $u$ which satisfies Assumption 3.2(iii) when we replace $A_j$ by $G$ and $\mu_j(u)$ by $g(u)$, respectively. Set $G_T \equiv \Sigma_T(G, G)$, $f_G(u, \lambda) \equiv |G(u, \lambda)|^2$, $g_T \equiv \{g(1/T), \ldots, g(T/T)\}'$, and $Q_T \equiv \mathbf{X}_T' G_T^{-1} \mathbf{X}_T + g_T' \mathbf{X}_T$. Henceforth, $E_j(\cdot)$ stands for the expectation with respect to $p_j(\cdot)$. Set $S_T^{(j)}(Q) \equiv Q_T - E_j(Q_T)$ for $j = 1, 2$. We first prove the large-deviation theorem for this quadratic form $Q_T$ of $\mathbf{X}_T$. All the proofs of the theorems are given in the Appendix.
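As a purely illustrative sketch (not from the paper), the following Python code computes a quadratic-form statistic of the above type for a simulated Gaussian series; the Toeplitz matrix standing in for $\Sigma_T(G, G)$ and the zero vector $g_T$ are hypothetical choices:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)

T = 200
# Hypothetical choice: G_T is the Toeplitz covariance matrix of an AR(1)
# process with coefficient 0.5, standing in for Sigma_T(G, G).
rho = 0.5
G_T = toeplitz(rho ** np.arange(T))
g_T = np.zeros(T)                      # assumed mean-adjustment vector g(t/T) = 0

X = rng.multivariate_normal(np.zeros(T), G_T)   # Gaussian sample path

# Quadratic-form statistic Q_T = X' G_T^{-1} X + g_T' X
Q_T = X @ np.linalg.solve(G_T, X) + g_T @ X
print(Q_T)
```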

Next, one considers the log-likelihood ratio statistic. It is well known that the log-likelihood ratio criterion gives the optimal discrimination rule in the sense that it minimizes the probability of misdiscrimination (Anderson [10]). Set $S_T^{(j)}(\Lambda) \equiv \Lambda_T - E_j(\Lambda_T)$ for $j = 1, 2$. For the discrimination problem, one gives the large-deviation principle for $\Lambda_T$.

3.10
Similarly, under $\Pi_2$, the corresponding result holds. In practice, misspecification occurs in many statistical problems. We consider the following three situations. Although $\{X_{t,T}\}$ actually has the time-varying mean functions $\mu_j(u)$ and the time-varying spectral densities $f_j(u, \lambda)$ under $\Pi_j$, $j = 1, 2$, respectively,

(i) the mean functions are misspecified to $\mu_j(u) \equiv 0$, $j = 1, 2$;
(ii) the spectral densities are misspecified to $f_j(u, \lambda) \equiv f_j(0, \lambda)$, $j = 1, 2$; (3.13)
(iii) both the mean functions and the spectral densities are misspecified as in (i) and (ii).

Set $S_T^{(j)}(M_k) \equiv M_{k,T} - E_j(M_{k,T})$ for $j = 1, 2$ and $k = 1, 2, 3$. The next result is a large-deviation theorem for the misspecified log-likelihood ratios $M_{k,T}$. It is useful in investigating the effect of misspecification.

3.18
Now, we turn to the discussion of our discriminant problem of classifying $\mathbf{X}_T$ into one of two categories described by the two hypotheses

$$\Pi_1 : \{\mu_1(u), f_1(u, \lambda)\}, \qquad \Pi_2 : \{\mu_2(u), f_2(u, \lambda)\}. \tag{3.19}$$
We use $\Lambda_T$ as the discriminant statistic for the problem (3.19); namely, if $\Lambda_T > 0$ we assign $\mathbf{X}_T$ to $\Pi_2$, and otherwise to $\Pi_1$. Taking $x = -\lim_{T \to \infty} T^{-1} E_1(\Lambda_T)$ in (3.9), we can evaluate the probability of misdiscriminating $\mathbf{X}_T$ from $\Pi_1$ into $\Pi_2$ as in (3.20). Thus, we see that the rate functions play an important role in the discriminant problem.
From these figures, we see that the magnitude of the mean function is large at u close to 0, while the magnitude of the time-varying spectral density is large at u close to 1.
Specifically, we use the formulae in those theorems concerning Π 2 to evaluate the limits of the large-deviation probabilities:

4.2
Though the result is an asymptotic theory, we perform the simulation with a limited sample size. Therefore, to make the comparison fair, we consider several levels of $x$; namely, we take $x = -0.1, -1, -10$.
The results are listed in Table 1.
For each value of $x$, the large-deviation rate of $\Lambda_T$ is the largest and that of $M_{3,T}$ is the smallest. Namely, we see that the correctly specified case is the best, while the case misspecified to stationarity is the worst. Furthermore, examining the large-deviation rates, we see that the rate of $\Lambda_T$ remains almost constant over all time $u$ and frequency $\lambda$. On the other hand, that of $M_{1,T}$ is small at $u$ close to 0, and those of $M_{2,T}$ and $M_{3,T}$ are small at $u$ close to 1 and $\lambda$ close to 0. That is, the large-deviation probability of $M_{1,T}$ is degraded by the large magnitude of the mean function, while those of $M_{2,T}$ and $M_{3,T}$ are degraded by that of the time-varying spectral density. Hence, we can conclude that the misspecifications seriously affect our discrimination.

Appendix
We sketch the proofs of Theorems 3.3-3.5. First, we summarize the assumptions used in this paper.
Assumption A.1. (i) Suppose that $A : [0,1] \times \mathbb{R} \to \mathbb{C}$ is a $2\pi$-periodic function with $\overline{A(u, \lambda)} = A(u, -\lambda)$ which is differentiable in $u$ and $\lambda$ with uniformly bounded derivative $(\partial/\partial u)(\partial/\partial \lambda) A$; $f_A(u, \lambda) \equiv |A(u, \lambda)|^2$ denotes the time-varying spectral density. The $A^{0}_{t,T} : \mathbb{R} \to \mathbb{C}$ are $2\pi$-periodic functions with $\sup_{t,\lambda} |A^{0}_{t,T}(\lambda) - A(t/T, \lambda)| \le K T^{-1}$. (ii) Suppose that $\mu : [0,1] \to \mathbb{R}$ is differentiable with uniformly bounded derivative.

We introduce the required matrices (see Dahlhaus [4, p. 154] for the detailed definitions). We need the following lemmata, which are due to Dahlhaus [3, 4]: Lemma A.2 is Lemma A.5 of Dahlhaus [3], and Lemma A.3 is Theorem 3.2(ii) of Dahlhaus [4]. We also remark that the stated inequality holds if $U_T$ and $V_T$ are real nonnegative definite symmetric matrices.

Proof of Theorems 3.3-3.5. We need the cumulant generating function of a quadratic form in the normal vector $\mathbf{X}_T \sim N(\nu_T^{(j)}, \Sigma_T^{(j)})$. It is known that the quadratic form $S_T^{(j)} \equiv \mathbf{X}_T' H_T \mathbf{X}_T + h_T' \mathbf{X}_T - E_j(\mathbf{X}_T' H_T \mathbf{X}_T + h_T' \mathbf{X}_T)$ has a cumulant generating function $\psi_T^{(j)}(\omega) = \log E_j\, e^{\omega S_T^{(j)}}$ of closed form. In view of Lemmas A.2 and A.3, $\psi_T^{(1)}(\omega)$ converges to $\psi_{M_3}(\omega)$ given in Theorem 3.5. Clearly, $\psi_{M_3}$ exists for $\omega \in D_{\psi_{M_3}} = \{\omega : 1 - \omega\{f_1(u, \lambda)/f_1(0, \lambda) - f_1(u, \lambda)/f_2(0, \lambda)\} > 0\}$ and is convex and continuously differentiable with respect to $\omega$. For a sequence $\{\omega_m\} \to \omega_0 \in \partial D_{\psi_{M_3}}$ as $m \to \infty$, we can show that the derivative diverges; hence $\psi_{M_3}'(D_{\psi_{M_3}}) \supset (x, \infty)$ for every $x > 0$. An application of the Gärtner-Ellis theorem completes the proof.
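For background, the cumulant generating function of a Gaussian quadratic form has a classical closed form; a standard statement (given here for reference, not as the paper's exact display) is: for $X \sim N(\nu, \Sigma)$ and all $\omega$ such that $I - 2\omega \Sigma H$ is positive definite,

$$\log E\, e^{\omega(X' H X + h' X)} = -\tfrac{1}{2} \log\det(I - 2\omega \Sigma H) + \omega(\nu' H \nu + h' \nu) + \tfrac{\omega^2}{2}\,(h + 2H\nu)'\,(\Sigma^{-1} - 2\omega H)^{-1}\,(h + 2H\nu).$$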

Introduction
The conditional least squares (CL) estimator is one of the most fundamental estimators for financial time series models. It has two advantages: it can be calculated with ease, and it does not need knowledge of the innovation process (i.e., the error term). Hence this convenient estimator has been widely used for many financial time series models. However, Amano and Taniguchi [1] proved that it is not good in the sense of efficiency for the ARCH model, which is the most famous financial time series model. The estimating function estimator was introduced by Godambe [2, 3] and Hansen [4]. Recently, Chandra and Taniguchi [5] constructed the optimal estimating function estimator (G estimator), based on Godambe's asymptotically optimal estimating function, for the parameters of the random coefficient autoregressive (RCA) model (which was introduced to describe occasional sharp spikes exhibited in many fields) and the ARCH model. In Chandra and Taniguchi [5], it was shown by simulation that the G estimator is better than the CL estimator. Furthermore, Amano [6] applied the CL and G estimators to some important time series models (RCA, GARCH, and nonlinear AR models) and proved theoretically that the G estimator is better than the CL estimator in the sense of efficiency. Amano [6] also derived conditions under which the G estimator becomes asymptotically optimal; these conditions are natural and not overly strict. However, in Amano [6], the G estimator was not applied to the conditional heteroscedastic autoregressive nonlinear model (denoted the CHARN model). The CHARN model was proposed by Härdle and Tsybakov [7] and Härdle et al. [8]; it includes many financial time series models and is widely used in finance. Kanai et al. [9] applied the G estimator to the CHARN model and proved its asymptotic normality. However, Kanai et al. [9] did not compare the efficiencies of the CL and G estimators or discuss the asymptotic optimality of the G estimator theoretically. Since the CHARN model is an important and rich model, which includes many financial time series models and can serve as a model for the return processes of assets, further investigation of the CL and G estimators for this model is needed. Hence, in this paper, we compare the efficiencies of the CL and G estimators and investigate the asymptotic optimality of the G estimator for this model. This paper is organized as follows. Section 2 gives the definitions of the CL and G estimators. In Section 3, the CL and G estimators are applied to the CHARN model, and the efficiencies of these estimators are compared. Furthermore, we derive the condition for asymptotic optimality of the G estimator. We also compare the mean squared errors of $\hat{\theta}_{CL}$ and $\hat{\theta}_{G}$ by simulation in Section 4. Proofs of the theorems are relegated to Section 5. Throughout this paper, we use the following notation: $|A|$: the sum of the absolute values of all entries of $A$.

Definitions of CL and G Estimators
One of the most fundamental estimators for the parameters of financial time series models is the conditional least squares (CL) estimator $\hat{\theta}_{CL}$ introduced by Tjøstheim [10], and it has been widely used in finance. $\hat{\theta}_{CL}$ for a time series model $\{X_t\}$ is obtained by minimizing the penalty function $\sum_{t=m+1}^{n} \{X_t - E_\theta[X_t \mid \mathcal{F}_{t-1}]\}^2$, where $\mathcal{F}_{t-1}$ denotes the past information and $m$ is an appropriate positive integer (e.g., if $\{X_t\}$ follows a $k$th-order nonlinear autoregressive model, we can take $m = k$). The CL estimator generally has a simple expression. However, it is not asymptotically optimal in general (see Amano and Taniguchi [1]).
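As an illustrative sketch (with an assumed AR(1) model, not from the paper), the CL estimator can be computed by numerically minimizing the penalty above:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

# Simulate a hypothetical AR(1) model X_t = theta * X_{t-1} + eps_t,
# for which E[X_t | F_{t-1}] = theta * X_{t-1}.
theta_true = 0.6
T = 500
X = np.zeros(T)
for t in range(1, T):
    X[t] = theta_true * X[t - 1] + rng.standard_normal()

def penalty(theta):
    # Conditional least squares penalty: sum of squared one-step prediction errors
    return np.sum((X[1:] - theta * X[:-1]) ** 2)

theta_cl = minimize_scalar(penalty, bounds=(-0.99, 0.99), method="bounded").x
print(theta_cl)  # close to 0.6
```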

5.8
The

5.10
Hence $c = -1$, and $f(x)$ becomes the density function of the normal distribution.

Introduction
Modern portfolio theory has been developed since circa the 1950s. It is common knowledge that Markowitz [1, 2] is a pioneer in this field. He introduced the so-called mean-variance theory, in which we maximize the expected return under constant variance (or minimize the variance under a constant expected return). Many researchers followed, and portfolio theory has been greatly improved; for a comprehensive survey of this field, refer to Elton et al. [3], for example. Despite its sophisticated paradigm, there exist several criticisms of the early portfolio theory. One of them is that it blindly assumes that the asset returns are normally distributed. As Mandelbrot [4] pointed out, price changes in the financial market do not seem to be normally distributed. Therefore, it is appropriate to use nonparametric estimation methods to find the optimal portfolio. Furthermore, it is empirically observed that financial returns are dependent, so it is unreasonable to fit an independent model to them.
One of the nonparametric techniques which has been capturing the spotlight recently is the empirical likelihood method. It was originally proposed by Owen [5, 6] as a method of inference based on a data-driven likelihood ratio function. Smith [7] and Newey and Smith [8] extended it to the generalized empirical likelihood (GEL). GEL can also be considered as an alternative to the generalized method of moments (GMM), and it is known that its asymptotic bias does not grow with the number of moment restrictions, while the bias of GMM often does.
From the above point of view, we consider finding the optimal portfolio weights by using the GEL method for multivariate stationary processes. The optimal portfolio weights are defined as the weights which minimize the variance of the return process subject to a constant mean. The analysis is done in the frequency domain.
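For intuition (an illustrative sketch, not the paper's GEL procedure), minimizing the portfolio variance $\theta'\Gamma(0)\theta$ subject to the weights summing to one gives the classical closed form $\theta_0 = \Gamma(0)^{-1}e / (e'\Gamma(0)^{-1}e)$; a minimal Python version with a sample covariance standing in for $\Gamma(0)$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical return data: 200 observations on m = 3 assets.
R = rng.standard_normal((200, 3)) @ np.array([[1.0, 0.2, 0.1],
                                              [0.0, 0.8, 0.3],
                                              [0.0, 0.0, 0.6]])

Gamma0 = np.cov(R, rowvar=False)        # sample estimate of Gamma(0)
e = np.ones(Gamma0.shape[0])

# Global minimum-variance weights under the constraint e' theta = 1
w = np.linalg.solve(Gamma0, e)
theta = w / (e @ w)
print(theta, theta @ Gamma0 @ theta)    # weights and portfolio variance
```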
This paper is organized as follows. Section 2 explains a frequency domain estimating function. In Section 3, we review the GEL method and mention the related asymptotic theory. Monte Carlo simulations and a real-data example are given in Section 4. Throughout this paper, $A'$ and $A^*$ indicate the transpose and adjoint of a matrix $A$, respectively.

Frequency Domain Estimating Function
Here, we are concerned with the $m$-dimensional stationary process $\{\mathbf{X}_t\}_{t \in \mathbb{Z}}$ with mean vector $\mathbf{0}$, autocovariance matrix $\Gamma(h)$, and spectral density matrix $f(\lambda)$. Suppose that information about an interesting $p$-dimensional parameter $\theta \in \Theta \subset \mathbb{R}^p$ exists through a system of general estimating equations in the frequency domain, as follows. Let $\phi_j(\lambda; \theta)$, $j = 1, \ldots, q$ ($q \ge p$), be $m \times m$ matrix-valued continuous functions on $[-\pi, \pi]$ satisfying $\phi_j(\lambda; \theta) = \phi_j(\lambda; \theta)^*$ and $\phi_j(-\lambda; \theta) = \overline{\phi_j(\lambda; \theta)}$. We assume that each $\phi_j(\lambda; \theta)$ satisfies the spectral moment condition (2.3), where $\theta_0 = (\theta_{10}, \ldots, \theta_{p0})'$ is the true value of the parameter. By taking an appropriate function for $\phi_j(\lambda; \theta)$, (2.3) can express the best portfolio weights, as shown in Example 2.1.

Example 2.1. The portfolio $p_t = \theta' \mathbf{X}_t$ is a linear combination of a stationary process; hence $\{p_t\}$ is still stationary and, from Herglotz's theorem, its variance is
$$\operatorname{Var}(p_t) = \theta' \Big(\int_{-\pi}^{\pi} f(\lambda)\, d\lambda\Big)\, \theta.$$
Our aim is to find the weights $\theta_0 = (\theta_{10}, \ldots, \theta_{m0})'$ that minimize the variance (the risk) of the portfolio $p_t$ under the constraint $\sum_{i=1}^{m} \theta_i = 1$. The Lagrange function involves $e = (1, 1, \ldots, 1)'$ and the Lagrange multiplier $\xi$. The first-order condition leads to the weight equation, where $I$ is an identity matrix. Now, for fixed $j = 1, \ldots, m$, consider taking

2.9
From Herglotz's theorem, $\theta_{10} = \gamma_i(h)/\gamma_i(0)$ and $\theta_{20} = \gamma_j(k)/\gamma_j(0)$. Then, $\theta_0 = (\theta_{10}, \theta_{20})'$ corresponds to the desired autocorrelations $\rho = (\rho_i(h), \rho_j(k))'$. The idea can be directly extended to more than two autocorrelations.

Example 2.3 (Whittle estimation). In this example, we set $p = q$. Consider fitting a parametric spectral density model $f_\theta(\lambda)$ to the true spectral density $f(\lambda)$. The disparity between $f_\theta(\lambda)$ and $f(\lambda)$ is measured by a criterion based on the Whittle likelihood. The purpose here is to seek the quasi-true value $\theta^*$ defined as its minimizer. Assume that the spectral density model has a linear-process form, where each $B_\theta(j)$ is an $m \times m$ matrix ($B_\theta(0)$ is defined as the identity matrix) and $K$ is an $m \times m$ symmetric matrix. The general linear process has a spectral density of this form, so this assumption is not very restrictive. The key point of this assumption is that the elements of the parameter $\theta$ do not depend on $K$. We call such a parameter innovation-free. Imagine that you fit a VARMA process, for example: innovation-free means that the elements of the parameter $\theta$ depend only on the AR or MA coefficients, and not on the covariance matrix of the innovation process. Now, let us consider the equation:

Generalized Empirical Likelihood
Once we construct the estimating function, we can make use of the method of GEL as in the work by Smith 7 and Newey and Smith 8 . GEL is introduced as an alternative to GMM and it is pointed out that its asymptotic bias does not grow with the number of moment restrictions, while the bias of GMM often does.
To describe GEL, let $\rho(v)$ be a function of a scalar $v$ which is concave on its domain, an open interval $\mathcal{V}$ containing zero. Let $\Lambda_n(\theta) = \{\lambda : \lambda' m(\lambda_t; \theta) \in \mathcal{V},\ t = 1, \ldots, n\}$. The estimator is the solution to a saddle point problem. The empirical likelihood (EL) estimator (cf. [9]), the exponential tilting (ET) estimator (cf. [10]), and the continuous updating estimator (CUE) (cf. [11]) are special cases, obtained with $\rho(v) = \log(1 - v)$, $\rho(v) = -e^{v}$, and $\rho(v) = -(1 + v)^2/2$, respectively.
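A minimal numerical sketch of the saddle point problem $\hat\theta = \arg\min_\theta \sup_\lambda \sum_t \rho(\lambda' m_t(\theta))$ (illustrative only; the scalar moment function and data below are hypothetical, and we use the ET choice $\rho(v) = -e^{v}$):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.standard_normal(300) + 1.0      # data with unknown mean theta

def m(theta):
    return x - theta                     # hypothetical scalar moment function

rho = lambda v: -np.exp(v)               # ET choice of rho

def inner_max(theta):
    # maximize sum rho(lambda * m_t) over the scalar multiplier lambda
    obj = lambda lam: -np.sum(rho(lam * m(theta)))
    res = minimize_scalar(obj, bounds=(-1.0, 1.0), method="bounded")
    return -res.fun

# outer minimization over theta of the inner maximum
theta_gel = minimize_scalar(inner_max, bounds=(-3, 3), method="bounded").x
print(theta_gel)                          # close to the sample mean
```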

The following assumptions and theorems are due to Newey and Smith 8 .
(vi) $\rho(v)$ is twice continuously differentiable in a neighborhood of zero.
(iii) $\operatorname{rank}(G) = p$.

Monte Carlo Studies and Illustration
In the first part of this section, we summarize the estimation results of the Monte Carlo studies of the GEL method. We generate 200 observations from a two-dimensional AR(1) model whose innovation process is distributed as a two-dimensional $t$-distribution with identity correlation matrix and 5 degrees of freedom. The true lag-1 autocorrelations of this process are $\rho_1(1) = 0.3894$ and $\rho_2(1) = 0.4761$, respectively. As described in Example 2.2, we estimate $\rho_1(1)$ and $\rho_2(1)$ by using three types of frequency domain GEL method (EL, ET, and CUE). Table 1 shows the mean and standard deviation of the estimation results over 1000 repetitions. All types work properly.
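A sketch of such a data-generating step (the AR coefficient matrix below is hypothetical, since the paper's display of the model did not survive extraction):

```python
import numpy as np

rng = np.random.default_rng(4)

T, m = 200, 2
Phi = np.array([[0.4, 0.1],     # assumed AR(1) coefficient matrix
                [0.1, 0.5]])

# t(5) innovations with identity correlation: normal / sqrt(chi2/df)
df = 5
Z = rng.standard_normal((T, m))
chi = rng.chisquare(df, size=(T, 1))
eps = Z / np.sqrt(chi / df)

X = np.zeros((T, m))
for t in range(1, T):
    X[t] = Phi @ X[t - 1] + eps[t]

# sample lag-1 autocorrelations of each coordinate
for i in range(m):
    xi = X[:, i] - X[:, i].mean()
    print((xi[1:] @ xi[:-1]) / (xi @ xi))
```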
Next, we apply the proposed method to the returns of market index data. The sample consists of 7 weekly indices (S&P 500, Bovespa, CAC 40, AEX, ATX, HKHSI, and Nikkei) with 800 observations each; the initial date is April 30, 1993, and the ending date is August 22, 2008. Refer to Table 2 for the market of each index. As shown in Example 2.1, we use the frequency domain GEL method to estimate the optimal portfolio weights; the results are shown in Table 3. Bovespa and ATX account for a large part of the optimal portfolio.

Introduction
Since the first formulation of Markowitz's mean-variance model, portfolio optimization and construction have been a critical part of asset and fund management. At the same time, portfolio risk assessment has become an essential tool in risk management. Yet there are well-known shortcomings of variance as a risk measure for the purposes of portfolio optimization; namely, variance is a good risk measure only for elliptical and symmetric return distributions.
The proper mathematical characterization of risk is of central importance in finance. The choice of an adequate risk measure is a complex task that, in principle, involves deep consideration of the attitudes of market players and the structure of markets. Recently, value at risk (VaR) has gained widespread use, in practice as well as in regulation. VaR has been criticized, however, because, being a quantile, it has no reason to be convex, and indeed it is easy to construct portfolios for which VaR seriously violates convexity. The shortcomings of VaR led to the introduction of coherent risk measures. Artzner et al. [1] and Föllmer and Schied [2] question whether VaR qualifies as such a measure, and both find that VaR is not an adequate measure of risk. Unlike VaR, expected shortfall (or tail VaR), which is defined as the expected portfolio tail return, has been shown to have all the necessary characteristics of a coherent risk measure. In this paper, we use α-risk as a risk measure that satisfies the conditions of a coherent risk measure (see [3]). Variants of the α-risk measure include expected shortfall and tail VaR. The α-risk-minimizing portfolio, introduced as a pessimistic portfolio in Bassett et al. [3], can be formulated as a problem of linear quantile regression.
Since the seminal work by Koenker and Bassett [4], quantile regression (QR) has become widely used to describe the conditional distribution of a random variable given a set of covariates. One common finding in the extant literature is that the quantile regression estimator has good asymptotic properties under various data dependence structures and for a wide variety of conditional quantile models and data structures. A comprehensive guide to quantile regression is provided by Koenker [5]. Quantile regression methods use a pseudolikelihood based on an asymmetric Laplace reference density (see [6]). Komunjer [7] introduced the class of "tick-exponential" distributions, which includes the asymmetric Laplace density as a particular case, and showed that the tick-exponential QMLE reduces to the standard quantile regression estimator of Koenker and Bassett [4].
In quantile regression, one must know the conditional error density at zero, and incorrect specification of the conditional error density leads to inefficient estimators. Yet correct specification is difficult, because reliable shape information may be scarce. Zhao [8], Whang [9], and Komunjer and Vuong [10] propose efficiency corrections for the univariate quantile regression model. This paper describes a semiparametrically efficient estimation of an α-risk-minimizing portfolio which replaces the asymmetric Laplace reference density (the standard quantile regression estimator) by any other α-quantile-zero reference density $f$, based on residual ranks and signs. A $\sqrt{n}$-consistent and asymptotically normal one-step estimator is proposed. Like all semiparametric estimators in the literature, our method relies on the availability of a $\sqrt{n}$-consistent first-round estimator, a natural choice being the standard quantile regression estimator. Under correct specification, the estimator attains the semiparametric efficiency bound associated with $f$. The remainder of this paper is organized as follows. In Section 2, we introduce the setup and definition of an α-risk-minimizing portfolio and present its equivalent formulation under quantile regression settings. Section 3 contains theoretical results for our one-step estimator, and Section 4 describes its computation and performance. Section 5 gives empirical applications, and Section 6 our conclusions.

α-Risk-Minimizing Portfolio Formulation
The α-risk can be considered a coherent measure of risk, as discussed in Artzner et al. [1]. The α-risk of $X$, say $\rho_{\nu_\alpha}(X)$, is defined as
$$\rho_{\nu_\alpha}(X) := -\int_0^1 F_X^{\leftarrow}(t)\, d\nu_\alpha(t) = -\frac{1}{\alpha}\int_0^\alpha F_X^{\leftarrow}(t)\, dt,$$
where $\nu_\alpha(t) := \min\{t/\alpha, 1\}$ and $F_X^{\leftarrow}(\alpha) := \inf\{x : F_X(x) \ge \alpha\}$ denotes the quantile function of a random variable $X$ with distribution function $F_X$. Here, we recall the definition of expected shortfall and the relationship among the tail risk measures in finance. The α-expected shortfall, defined for $\alpha \in (0, 1)$, can be shown to be a risk measure that satisfies the axioms of a coherent measure of risk. It is worth mentioning that the expected shortfall is closely related, but not identical, to the notion of conditional value at risk ($\mathrm{CVaR}_\alpha$) defined in Uryasev [11] and Pflug [12]. We note that expected shortfall and conditional VaR (or tail conditional expectations) are identical "extreme" risk measures only for continuous distributions. To avoid confusion, in this paper we use the term "α-risk measure" instead of terms like expected shortfall, CVaR, or tail conditional expectation. Bassett et al. [3] showed that a portfolio with minimized α-risk can be constructed via the quantile regression (QR) methods of Koenker and Bassett [4]. QR is based on the fact that a quantile can be characterized as the minimizer of an expected asymmetric absolute loss, namely one built from the check function $\rho_\alpha(u) := u(\alpha - 1\{u < 0\})$, $u \in \mathbb{R}$ (see [5]), where $1\{A\}$ is the indicator function of the event $A$. To construct the optimal (i.e., α-risk-minimized) portfolio, the following lemma is needed.

Lemma 2.1 (Theorem 2 of [3]). Let $X$ be a real-valued random variable with $EX = \mu < \infty$; then $\min_{\xi} E\rho_\alpha(X - \xi) = \alpha\{\mu + \rho_{\nu_\alpha}(X)\}$.

Then, $Y = Y(\pi) = X'\pi$ denotes a portfolio consisting of $d$ different assets $X := (X_1, \ldots, X_d)'$ with allocation weights $\pi := (\pi_1, \ldots, \pi_d)'$ (subject to $\sum_{j=1}^d \pi_j = 1$), and the optimization problem under study is, for some prespecified expected return $\mu_0$,
$$\min_\pi \rho_{\nu_\alpha}(X'\pi) \quad \text{subject to } E(X'\pi) = \mu_0,\ \mathbf{1}_d'\pi = 1. \tag{2.6}$$
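An illustrative numerical sketch of this optimization (not the paper's exact algorithm): by Lemma 2.1, with the mean pinned down by a constraint, minimizing the empirical check-function criterion over (weights, quantile level $\xi$) minimizes the α-risk. The return data below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

n, d, alpha, mu0 = 500, 3, 0.1, 0.05
X = rng.standard_normal((n, d)) * [0.02, 0.05, 0.08] + [0.04, 0.06, 0.08]

def check(u):
    return u * (alpha - (u < 0))        # check function rho_alpha

def objective(z):
    pi, xi = z[:d], z[d]
    return np.mean(check(X @ pi - xi))  # empirical alpha-risk criterion

cons = [
    {"type": "eq", "fun": lambda z: np.sum(z[:d]) - 1.0},          # weights sum to 1
    {"type": "eq", "fun": lambda z: X.mean(axis=0) @ z[:d] - mu0}, # mean constraint
]
z0 = np.concatenate([np.ones(d) / d, [0.0]])
res = minimize(objective, z0, constraints=cons, method="SLSQP")
print(res.x[:d])                         # alpha-risk-minimizing weights
```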
The sample or empirical analogue of this problem can be expressed as
The large-sample properties of $\hat\beta_n(\alpha)$, especially its $\sqrt{n}$-consistency, can be deduced from standard arguments and assumptions in the QR context (see [5]).
Let W n and Σ n W be the mean vector and the covariance matrix of W i which are given by

2.11
where $\sigma_{pq}$ ($p, q = 2, \ldots, d$) denotes the corresponding entry. The above correlation coefficient can take values close to 1 when $n/\kappa^2$ is close to 0 with $\overline{X}_1 - \overline{X}_p \ne 0$ and $\overline{X}_1 - \overline{X}_q \ne 0$. Hence, the estimated portfolio weights are possibly highly correlated among assets whose sample means differ from $\overline{X}_1$, while these problems are ignorable in an asymptotic inference problem if we take $\kappa = O(n^{1/2})$. Thus far, we have seen that the α-risk-minimizing portfolio can be obtained by (2.9), which was the result of Bassett et al. [3]. In what follows, we show that semiparametrically efficient inference of the optimal weights $\hat\beta_n(\alpha)$ is feasible. The quantity estimated by (2.9) can be regarded as a QR coefficient $\beta(\alpha)$, defined by (2.14), where $F_{X|S}^{\leftarrow}(\cdot)$ denotes a conditional quantile function, that is, $F_{X|S}^{\leftarrow}(\alpha) := \inf\{x : P(X \le x \mid S) \ge \alpha\}$. Note that the QR model (2.14) has a random coefficient regression (RCR) interpretation of the form $Z_i = W_i'\beta(U_i)$, with componentwise monotone increasing function $\beta(\cdot)$ and random variables $U_i$ that are uniformly distributed over $[0, 1]$, that is, $U_i \sim \mathrm{Uniform}(0, 1)$ (see [5]). Here, a choice $\beta(u) = (\beta_1(u), \beta_2(u), \ldots, \beta_d(u))'$, with $F_\xi$ the distribution function of some independent and identically distributed (i.i.d.) $n$-tuple $(\xi_1, \ldots, \xi_n)$, yields the representation. Hence, recalling that the first component of $W_i$ is 1, it follows that, for any fixed $\alpha \in (0, 1)$, the QR coefficient $\beta(\alpha)$ can be characterized as the parameter $b \in \mathbb{R}^d$ of a model such as (2.16), where the density $g$ of $G$ is subject to $G^{\leftarrow}(\alpha) = 0$. Let us describe this model as $(\mathcal{Z}^n, \mathcal{A}^n, \mathcal{P}^n_Q)$, with $\mathcal{P}^n_Q := \{P^n_{b,g} \mid b \in \mathbb{R}^d,\ g \in \mathcal{F}_\alpha\}$, where $P^n_{b,g}$ denotes the distribution of the observations $\{Z_i\}_{i=1}^n$. This model (2.16) is a fixed-α submodel of (2.14) and is the parametric submodel through which we will achieve semiparametric efficiency.
The model (2.16) is a quantile-restricted linear regression model. But here we have no knowledge about the true density $g$, other than that it belongs to $\mathcal{F}_\alpha$, which allows us to identify $b$. So we arbitrarily choose $f$ from $\mathcal{F}_\alpha$, call it the "reference density," and correspondingly define a "reference model" in which the density $f$ of $F$ is subject to $f \in \mathcal{F}_\alpha$. The goal of the next section is to construct an asymptotically efficient version of $\hat\beta_n(\alpha)$ based on some feasible $f \in \mathcal{F}_\alpha$, that is, one attaining the semiparametric lower bound under a correctly specified density ($f = g$) while remaining $\sqrt{n}$-consistent under a misspecified density ($f \ne g$).

Semiparametrically Efficient Estimation
The procedure that we apply here to achieve semiparametric efficiency is based on the invariance principle, as introduced by Hallin and Werker [13]. To this end, we first need local asymptotic normality (LAN; see, e.g., van der Vaart [14]) for the parametric submodel $P^n_{b,g}$, where all stochastic convergences are taken under $P_{b,g} := P^\infty_{b,g}$. Here, the random vector $\Delta^{(n)}_{b;g}$ is called the central sequence, and the positive definite matrix $I_{b;g}$ is the information matrix. To ensure the LAN condition for the model (2.18), the following assumption is required.
The reference density $f$ has finite Fisher information for location, where $e_{b_n,i}$ denotes the residual, that is, $e_{b_n,i} := Z_i - W_i' b_n$. Consequently, we have the contiguity of $P^n_{b_n + h/\sqrt{n}, f}$ with respect to $P^n_{b_n, f}$, and of course that of $P^n_{b_n,f}$ with respect to $P^n_{b,f}$ as well. Recall that contiguity of $Q^n$ with respect to $P^n$ means that for any sequence of events $S_n$, if $P^n(S_n) \to 0$, then $Q^n(S_n) \to 0$ also. The reason why we have specified uniform LAN, rather than LAN at a single $b$, is the one-step improvement, which will be discussed later. Following Hallin and Werker [13], a semiparametrically efficient procedure can be obtained by projecting $\Delta^{(n)}_{b_n;f}$ on a σ-field with respect to which the generating group for $\{P^n_{b_n,f} \mid f \in \mathcal{F}_\alpha\}$ becomes maximal invariant (see, e.g., Schmetterer [16]). For the quantile-restricted regression model (2.16), such a σ-field was studied by Hallin et al. [6] and found to be generated by the signs and ranks of the residuals. Here, let us denote the sign of a residual by $S_{b_n,i}$, the rank of a residual by $R^{(n)}_{b_n,i}$, and the σ-field generated by them as in (3.5). Then "good" inference should be based on this σ-field, where $U_{b_n,i} := F(e_{b_n,i})$ is i.i.d. uniform on $[0, 1]$ under $P^n_{b_n,f}$, and where $\hat I_{fg}$ and $\widehat{\mu - \phi_g}$ are consistent estimates of the corresponding quantities.
Consistent estimates $\hat I_{fg}$ and $\widehat{\mu - \phi_g}$ can be obtained in the manner of Hallin et al. [19], without kernel estimation of $g$; we omit the details here.
Therefore, the one-step estimator $\hat b^{(n)}_f$ defined by (3.8) is semiparametrically efficient at $f = g$.
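For orientation only (the paper's display (3.8) did not survive extraction, so this is not its exact form), a Le Cam-type one-step update built from a first-round estimator $b_n$, an estimated information matrix, and the rank-and-sign-based central sequence would generically read

$$\hat b^{(n)}_f = b_n + n^{-1/2}\, \hat I^{-1}\, \hat\Delta^{(n)}_{b_n; f},$$

which is the usual construction for one-step efficient estimators under uniform LAN; the exact ingredients of (3.8) are as defined in the paper.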
In our original notation, the above statement can be rewritten as follows for some fixed $\alpha \in (0, 1)$. Recall that the standard QR estimator, defined at (2.9), is asymptotically normal (see Koenker [5]) with asymptotic covariance involving the matrix $D$. Denote the true portfolio weight with respect to risk probability $\alpha$ by $\pi = (1 - \mathbf{1}_{d-1}'\pi_2(\alpha), \pi_2(\alpha)')'$, where $\pi_2(\alpha) = (\pi_2(\alpha), \ldots, \pi_d(\alpha))'$, and consider its standard quantile regression and one-step estimators. Denote the block decompositions of the covariance matrices of the standard quantile and one-step estimators by (3.14), where the submatrices $D_{22}$ and $\Sigma_{22}$ are $(d-1) \times (d-1)$ symmetric matrices for the covariance of the portfolio weights $\pi_2$. The variances of the α-risk-minimizing portfolio constructed by the standard quantile and the one-step estimators are then stated in the following proposition. Since direct evaluation gives the statement, we skip its proof.
Proposition 3.6. The asymptotic conditional variances, given $X = x$, of an α-risk-minimizing portfolio using the standard quantile regression estimator and the one-step estimator are, respectively, as displayed. For positive definite matrices $A$ and $B$, we say $A \le B$ if $B - A$ is nonnegative definite. To compare the efficiency of the standard quantile regression estimator and the one-step estimator, we need to establish this ordering. To see this, as in Section 3 of Koenker and Zhao [20], consider the matrix $\Sigma$ and note that it is nonnegative definite. If $\Sigma_{fg}^{-1} D \Sigma_{fg}^{-1}$ is positive definite, then there exists an orthogonal matrix $P$ such that the required diagonalization holds. This result assures that the one-step estimator is asymptotically more efficient than the standard quantile regression estimator. From this result, it is easy to see (3.18); also, by taking expectations on both sides, the same inequality holds for the unconditional variances.

Numerical Studies
In this section, we examine the finite-sample properties of the proposed one-step estimator described in Section 3 for the cases $\alpha = 0.1$ and $0.5$. Our simulations are performed with two data-generating processes, to focus on the underlying true density $g$ and on how the choice of the reference density $f$ affects finite-sample performance.
The first data-generating process (DGP1) is the same as that investigated by Bassett et al. [3]. For DGP1, we consider the construction of an α-risk-minimizing portfolio from four independently distributed assets: asset 1 is normally distributed with mean 0.05 and standard deviation 0.02; asset 2 has a reversed $\chi^2_3$ density with location and scale chosen so that its mean and variance are identical to those of asset 1; asset 3 is normally distributed with mean 0.09 and standard deviation 0.05; finally, asset 4 has a $\chi^2_3$ density with mean and standard deviation identical to those of asset 3. DGP2 is a four-dimensional normal distribution with the same mean vector as DGP1 and covariance matrix $\Sigma = (\sigma_{ij})_{i,j=1,\ldots,4}$ with $\operatorname{diag}(\Sigma) = (0.02, 0.02, 0.05, 0.05)$ and $\sigma_{ij} = \sigma_{ii}\sigma_{jj}\rho$ for $i \ne j$. Here we set $\rho = 0.5$, which means that the asset returns have correlation 0.5. Notice that both DGP1 and DGP2 have the same mean and variance structures. The underlying true conditional densities of $u = Z - W'b$ for DGP1 and DGP2 are, respectively, a mixture of normal, $\chi^2_3$, and reversed $\chi^2_3$ distributions, and a normal distribution. Each simulation of the estimator, for sample sizes $n = 100$, 500, and 1000, consists of 1000 replications. We choose the prespecified expected return $\mu_0 = 0.07$.
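A sketch of generating DGP1 asset returns (illustrative; the location/scale algebra for matching the first two moments is spelled out in the comments):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000

# chi2_3 has mean 3 and variance 6; scaling by s gives variance 6 s^2.
def scaled_chi2(mean, sd, sign, size):
    s = sd / np.sqrt(6.0)                 # match the target variance
    c = rng.chisquare(3, size)
    return mean + sign * s * (c - 3.0)    # sign=-1 gives the "reversed" density

X = np.column_stack([
    rng.normal(0.05, 0.02, n),            # asset 1
    scaled_chi2(0.05, 0.02, -1, n),       # asset 2: reversed chi2_3
    rng.normal(0.09, 0.05, n),            # asset 3
    scaled_chi2(0.09, 0.05, +1, n),       # asset 4: chi2_3
])
print(X.mean(axis=0), X.std(axis=0))      # means/SDs match the DGP1 targets
```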
For each scenario, we computed the standard quantile regression estimates $\hat\beta_n(\alpha)$ with corresponding portfolio weights $\hat\pi_{QR} = (1 - \sum_{j=2}^d \hat\beta_{nj}(\alpha), \hat\beta_{n2}(\alpha), \ldots, \hat\beta_{nd}(\alpha))'$, and our one-step estimates defined by (3.8), for various choices of the reference density $f$ and actual density $g$ in the α-risk-minimizing portfolio allocation problem.
To make the problem a pure location model, we fix the variance of the estimated portfolio. The true density $g$ can be estimated by a kernel estimator for DGP1, where $K$ is a kernel function and $h$ is a bandwidth; the first derivative $g'(u)$ is estimated analogously. As for DGP2, the actual density $g$ is normal, because the portfolio is constructed from normally distributed returns. We use the normal distribution (N), the asymmetric Laplace distribution (AL), the logistic distribution (LGT), and the asymmetric power distribution (APD) with $\lambda = 1.5$ for the reference density $f$; the APD was introduced by Komunjer [7]. When $\alpha = 0.5$, the APD pdf is symmetric around zero; in this case, the APD density reduces to the standard generalized power distribution (GPD) [21]. Special cases of the GPD include the uniform ($\lambda = \infty$), Gaussian ($\lambda = 2$), and Laplace ($\lambda = 1$) distributions. When $\alpha \ne 0.5$, the APD pdf is asymmetric; special cases include the asymmetric Laplace ($\lambda = 1$) and the two-piece normal ($\lambda = 2$) distributions.
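A minimal sketch of the kernel estimates of $g$ and $g'$ (Gaussian kernel assumed; the bandwidth rule is an illustrative choice):

```python
import numpy as np

def kernel_density(u, resid, h):
    # ghat(u) = (1/(n h)) * sum K((u - e_i)/h), Gaussian kernel K
    z = (u - resid) / h
    K = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return K.mean() / h

def kernel_density_deriv(u, resid, h):
    # g'hat(u) via the kernel derivative: d/du K(z)/h = -z K(z) / h^2
    z = (u - resid) / h
    K = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (-z * K).mean() / h**2

resid = np.random.default_rng(7).standard_normal(500)
h = 1.06 * resid.std() * len(resid) ** (-0.2)    # Silverman's rule of thumb
print(kernel_density(0.0, resid, h), kernel_density_deriv(0.0, resid, h))
```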
For a given sample size, we compute the simulated mean and standard deviation of $\hat\pi_{QR}$ and $\hat\pi_{OS}(f)$ and the relative efficiency $\operatorname{Var}(\hat\pi_{OS}(f))/\operatorname{Var}(\hat\pi_{QR})$. Table 1 gives the relative efficiencies for DGP1. When $\alpha = 0.1$, we see that the efficiency gains of the one-step estimators with asymmetric Laplace reference density are large compared with other reference densities when $n = 1000$, while these gains are smaller when the sample size is $n = 100$. When $\alpha = 0.5$, the relative efficiency of assets 3 and 4 with the asymmetric Laplace reference density is minimal, while for assets 1 and 2 the relative efficiency with the normal reference density is minimal. This is because of the covariance structure of $\Sigma_n(W)$ defined by (2.10). As can be seen in Section 2, if $\mu_1 \ne \mu_p$ and $\mu_1 \ne \mu_q$, the $(p, q)$th element of the correlation matrix defined by (2.13) has a value close to unity. In this case, the asymptotic variance of the usual quantile regression estimator becomes large, which leads to unsatisfactorily large variances for assets 3 and 4; the asymptotic variance of our one-step estimator does not have such problems. Table 2 gives the relative efficiencies for DGP2. In line with efficiency at a correctly specified reference density ($f = N$), we see that the relative efficiency is minimal for all assets and sample sizes with $\alpha = 0.1$ and $0.5$. Even when we misspecify the reference density ($f \ne N$), there is some efficiency gain, except for assets 1 and 2 under the asymmetric Laplace reference density with $n = 100$ and $\alpha = 0.1$. Efficiency gains for the normal and logistic reference densities are almost the same, because the underlying true density is a symmetric normal distribution, and the asymmetric power reference density with $\lambda = 1.5$ outperforms the asymmetric Laplace reference density. Figure 1 plots the kernel densities of the estimated portfolio weights for DGP2 with $\alpha = 0.5$ and $n = 1000$. We see that the standard quantile regression estimators have long tails on both sides for all assets, whereas the one-step estimators have a narrower interval and a higher peak at the true weight. This confirms that the one-step estimators are semiparametrically more efficient than the standard ones.

Empirical Application
We apply our methodology to weekly log returns of 96 stocks in the TOPIX large 100 index. The sample runs from January 5, 2007, to December 2, 2011, for a total of 257 observations. The stock prices are adjusted for events such as stock splits of individual securities. Preliminary tests reveal that most log-return series have high kurtosis and generally negative skewness, which indicates that the log returns are non-Gaussian.
We computed the optimal portfolio allocations for $\alpha = 0.01$, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. We set $\kappa = 1000$ and $\mu_0 = -0.002$, which is the third quartile of the average log-return distribution. For the first-round estimates we used the standard quantile regression estimator, and for the one-step estimates we chose a normal distribution as the reference density. Since we do not have enough information about the shape of the portfolio distributions for the various choices of $\alpha$, the actual density $g$ is estimated by the kernel method. Figure 2 plots the cumulative distribution functions of the α-risk-minimizing portfolios obtained by the standard quantile regression estimates and the one-step estimates for $\alpha = 0.1$, 0.2, 0.3, and 0.5. Summary statistics for the distributions of the different portfolios are reported in Table 3. Figure 2 and Table 3 clearly show that the optimal α-risk-minimizing portfolio manages to reduce the occurrence of events in the left tail when $\alpha$ is small, for both the standard QR estimates and the one-step estimates. The standard deviation of the one-step estimate of the α-risk-minimizing portfolio is smaller than that of the standard QR estimate. We also observe that the range of the portfolio constructed with one-step estimates is much smaller than that of the standard QR estimates, owing to the semiparametric efficiency of our one-step estimators. When $\alpha$ becomes large, the difference in the standard deviation of the constructed portfolio between the standard QR estimates and the one-step estimates tends to become large. Hence, efficiency gains are large for $\alpha = 0.5$, which corresponds to the mean absolute deviation portfolio. Another interesting finding is that the standard QR-constructed portfolios have high density peaks at the required quantiles for all values of $\alpha$, whereas the portfolio constructed by one-step estimates has quite a moderate density reduction at the required quantiles. Consistent with economic intuition, higher risk aversion is associated with a shorter left tail. In the case where $\alpha \le 0.1$, the maximum loss is limited to less than $-0.02$. This result is particularly striking given that the sample includes the stock market crash of October 2008, due to the US subprime mortgage crisis and the bankruptcy of Lehman Brothers, which resulted in a weekly loss of more than $-0.220$ for TOPIX. The sample also includes the stock market crash of March 2011, due to the catastrophic earthquake and tsunami that hit Japan, which resulted in a weekly loss of $-0.104$. Figure 3 presents empirical efficient frontiers corresponding to the standard quantile regression-based portfolios and the one-step estimates of portfolios with $\alpha = 0.1$ and $0.5$.

Figure 3 clearly illustrates that the standard quantile regression-based portfolio is completely inefficient, far from the one-step frontier.

Summary and Conclusions
This paper considered semiparametrically efficient estimation of an α-risk-minimizing portfolio. A one-step estimator based on residual signs and ranks was proposed, and simulations were performed to compare the finite-sample relative efficiencies of the standard quantile regression estimators and the one-step estimator. These simulations confirmed our theoretical findings. An empirical application constructing a portfolio from 96 Japanese stocks was investigated and confirms that the one-step α-risk-minimizing portfolio has smaller variance than that obtained with the standard quantile regression estimator. Further research topics include (1) construction of portfolios without short sales and (2) extending the results to time series covariates with heteroskedastic returns. For the former, we can impose nonnegativity of the weights by using a penalty function containing a term that diverges to infinity as any of the weights becomes negative (see [22]).
For the latter, we refer to Hallin et al. 6 and Taniai 23 .

Introduction
The empirical likelihood method is one of the nonparametric methods for statistical inference, proposed by Owen [1, 2]. It is used for constructing confidence regions for a mean, for a class of M-estimates that includes quantiles, and for differentiable statistical functionals. The empirical likelihood method has been applied to various problems because of its good properties: the generality of a nonparametric method and the effectiveness of the likelihood method. For example, we can name applications to general estimating equations [3], regression models [4-6], biased sample models [7], and so forth. Applications have also been extended to dependent observations. Kitamura [8] developed the blockwise empirical likelihood for estimating equations and for smooth functions of means. Monti [9] applied the empirical likelihood method to linear processes, essentially under the circular Gaussian assumption, using a spectral method. For short- and long-range dependence, Nordman and Lahiri [10] gave the asymptotic properties of the frequency domain empirical likelihood. As named above, some applications to time series analysis can be found, but they are mainly for stationary processes. Although stationarity is the most fundamental assumption in time series analysis, it is also known that real time series data are generally nonstationary (e.g., in economic analysis). Therefore, we need to use nonstationary models in order to describe the real world. Recently, Dahlhaus [11-13] proposed an important class of nonstationary processes, called locally stationary processes. They have so-called time-varying spectral densities whose spectral structures change smoothly in time.
In this paper, we extend the empirical likelihood method to non-Gaussian locally stationary processes with time-varying spectra. First, we derive the asymptotic normality of the maximum empirical likelihood estimator based on the central limit theorem for locally stationary processes stated in Dahlhaus [13, Theorem A.2]. Next, we show that the empirical likelihood ratio converges in distribution to a weighted sum of Gamma random variables. In particular, in the stationary case, that is, when the time-varying spectral density is independent of the time parameter, the asymptotic distribution becomes chi-square.
As an application of this method, we can estimate an extended autocorrelation for locally stationary processes. Besides, we can consider the Whittle estimation stated in Dahlhaus [13]. This paper is organized as follows. Section 2 briefly reviews stationary processes and explains locally stationary processes. In Section 3, we propose the empirical likelihood method for non-Gaussian locally stationary processes and give the asymptotic properties. In Section 4, we give numerical studies on confidence intervals of the autocorrelation for locally stationary processes. Proofs of theorems are given in Section 5.

Locally Stationary Processes
Stationary processes are a fundamental setting in time series analysis. If the process $\{X_t\}_{t \in \mathbb{Z}}$ is stationary with mean zero, it is known to have the spectral representation
$$X_t = \int_{-\pi}^{\pi} e^{i\lambda t} A(\lambda)\, d\xi(\lambda), \tag{2.1}$$
where $A(\lambda)$ is a $2\pi$-periodic complex-valued function with $A(-\lambda) = \overline{A(\lambda)}$, called the transfer function, and $\xi(\lambda)$ is a stochastic process on $[-\pi, \pi]$ with $\xi(-\lambda) = \overline{\xi(\lambda)}$; here $\eta(\lambda) = \sum_{j} \delta(\lambda + 2\pi j)$ is the $2\pi$-periodic extension of the Dirac delta function. If the process is stationary, the covariance between $X_t$ and $X_{t+k}$ is independent of the time $t$ and is a function of the time lag $k$ only. We denote it by $\gamma(k) = \operatorname{Cov}(X_t, X_{t+k})$. The Fourier transform of the autocovariance function is called the spectral density function; in the expression (2.1), the spectral density function is $g(\lambda) = |A(\lambda)|^2$. It is estimated by the periodogram, defined by
$$I_T(\lambda) = \frac{1}{2\pi T}\Big|\sum_{t=1}^{T} X_t e^{-i\lambda t}\Big|^2.$$
If one wants to change the weight of each observation, one can insert a function $h(x)$ defined on $[0, 1]$ into the periodogram; the function $h(x)$ is called a data taper. Now, we give a simple example of a stationary process below.
Example 2.1. Consider the following AR($p$) process:
$$X_t = \sum_{j=1}^{p} a_j X_{t-j} + \varepsilon_t,$$
where the $\varepsilon_t$ are independent random variables with mean zero and variance 1. In the form of (2.1), this is obtained by letting
$$A(\lambda) = \frac{1}{\sqrt{2\pi}}\Big(1 - \sum_{j=1}^{p} a_j e^{-ij\lambda}\Big)^{-1}. \tag{2.5}$$
As an extension of stationary processes, Dahlhaus [13] introduced the concept of local stationarity. An example of a locally stationary process is the following time-varying AR($p$) process:
$$X_{t,T} = \sum_{j=1}^{p} a_j\Big(\frac{t}{T}\Big) X_{t-j,T} + \varepsilon_t, \tag{2.6}$$
where $a_j(u)$ is a function defined on $[0, 1]$ and the $\varepsilon_t$ are independent random variables with mean zero and variance 1. If all the $a_j(u)$ are constant, the process (2.6) reduces to a stationary one.
To define a general class of locally stationary processes, one can naturally consider the time-varying spectral representation (2.7). However, it turns out that (2.6) has not an exact but only an approximate solution of the form (2.7). Therefore, we only require that (2.7) holds approximately. The following is the definition of locally stationary processes given by Dahlhaus [13].
A sequence of stochastic processes $X_{t,T}$ ($t = 1, \ldots, T$) is called locally stationary with mean zero and transfer function $A^{0}_{t,T}$ if there exists a representation
$$X_{t,T} = \int_{-\pi}^{\pi} e^{i\lambda t} A^{0}_{t,T}(\lambda)\, d\xi(\lambda), \tag{2.8}$$
for which the following holds.
(ii) There exist a constant $K$ and a $2\pi$-periodic function $A : [0,1] \times \mathbb{R} \to \mathbb{C}$ with $\sup_{t,\lambda} |A^{0}_{t,T}(\lambda) - A(t/T, \lambda)| \le K T^{-1}$ for all $T$; $A(u, \lambda)$ is assumed to be continuous in $u$.
The time-varying spectral density is defined by $g(u, \lambda) := |A(u, \lambda)|^2$. As an estimator of $g(u, \lambda)$, we define the local periodogram $I_N(u, \lambda)$, for even $N$, as
$$I_N(u, \lambda) = \frac{1}{2\pi H_{2,N}}\Big|\sum_{s=0}^{N-1} h\Big(\frac{s}{N}\Big)\, X_{[uT] - N/2 + s + 1,\, T}\, e^{-i\lambda s}\Big|^2, \qquad H_{k,N} = \sum_{s=0}^{N-1} h\Big(\frac{s}{N}\Big)^k. \tag{2.11}$$
Here, $h : \mathbb{R} \to \mathbb{R}$ is a data taper with $h(x) = 0$ for $x \notin [0, 1]$. Thus, $I_N(u, \lambda)$ is nothing but the periodogram over a segment of length $N$ with midpoint $[uT]$. The shift from segment to segment is denoted by $S$; that is, we calculate $I_N$ with midpoints $t_j = S(j - 1) + N/2$ ($j = 1, \ldots, M$), where $T = S(M - 1) + N$, or, written in rescaled time, at the time points $u_j := t_j/T$. Hereafter we set $S = 1$ rather than $S = N$, which means the segments overlap each other.
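A sketch of the local periodogram computation (the cosine-bell taper and the time-varying AR(1) coefficient below are assumed choices for illustration):

```python
import numpy as np

def local_periodogram(X, u, N, lam):
    # Tapered periodogram of the length-N segment with midpoint [u*T]
    T = len(X)
    mid = int(u * T)
    seg = X[mid - N // 2 : mid - N // 2 + N]
    s = np.arange(N)
    h = np.sin(np.pi * s / N) ** 2          # assumed data taper on [0, 1]
    H2 = np.sum(h ** 2)
    d = np.sum(h * seg * np.exp(-1j * lam * s))
    return np.abs(d) ** 2 / (2 * np.pi * H2)

rng = np.random.default_rng(8)
T, N = 500, 64
# time-varying AR(1): coefficient a(u) = 0.9 - 0.8 u (hypothetical)
X = np.zeros(T)
for t in range(1, T):
    X[t] = (0.9 - 0.8 * t / T) * X[t - 1] + rng.standard_normal()

print(local_periodogram(X, 0.5, N, 0.5))
```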

Empirical Likelihood Approach for Non-Gaussian Locally Stationary Processes
Consider inference on a parameter $\theta \in \Theta \subset \mathbb{R}^q$ based on an observed stretch $X_{1,T}, \ldots, X_{T,T}$.
We suppose that information about $\theta$ exists through a system of general estimating equations. For short- or long-memory processes, Nordman and Lahiri [10] supposed that $\theta_0$, the true value of $\theta$, is specified by the spectral moment condition
$$\int_{-\pi}^{\pi} \phi(\lambda, \theta_0)\, g(\lambda)\, d\lambda = 0,$$
where $\phi(\lambda, \theta)$ is an appropriate function depending on $\theta$. Following this approach, we naturally suppose that $\theta_0$ satisfies the time-varying spectral moment condition
$$\int_0^1 \int_{-\pi}^{\pi} \phi(u, \lambda, \theta_0)\, g(u, \lambda)\, d\lambda\, du = 0 \tag{3.2}$$
in the locally stationary setting. Here $\phi : [0,1] \times [-\pi, \pi] \times \mathbb{R}^q \to \mathbb{C}^q$ is a function depending on $\theta$ and satisfying Assumption 3.4(i). We give brief examples of $\phi$ and the corresponding $\theta_0$.
Then (3.2) leads to (3.4). When we consider the stationary case, that is, when $g(u, \lambda)$ is independent of the time parameter $u$, (3.4) reduces to a quantity corresponding to the autocorrelation with lag $k$. So (3.4) can be interpreted as a kind of lag-$k$ autocorrelation for locally stationary processes.
Example 3.2 (Whittle estimation). Consider the problem of fitting a parametric spectral model to the true spectral density by minimizing the disparity between them. For stationary processes, this problem is considered in Hosoya and Taniguchi [14] and Kakizawa [15]. For locally stationary processes, the disparity between the parametric model $g_\theta(u, \lambda)$ and the true spectral density $g(u, \lambda)$ is measured by the criterion $L(\theta)$, and we seek its minimizer (3.7). Under appropriate conditions, $\theta_0$ in (3.7) is obtained by solving the equation $\partial L(\theta)/\partial\theta = 0$. Suppose that the fitted model is described as $g_\theta(u, \lambda) = \sigma^2(u) f_\theta(u, \lambda)$, which means that $\theta$ is free of the innovation part $\sigma^2(u)$. Then, by Kolmogorov's formula (Dahlhaus [11, Theorem 3.2]), we can see that $\int_{-\pi}^{\pi} \log g_\theta(u, \lambda)\, d\lambda$ is independent of $\theta$. So the differentiation condition on $\theta_0$ becomes an equation of the form (3.2); this is the case when we set $\phi$ appropriately.
Suppose that the fitting model is described as g θ u, λ σ 2 u f θ u, λ , which means θ is free from innovation part σ 2 u . Then, by Kolmogorov's formula Dahlhaus 11, Theorem 3.2 6 Advances in Decision Sciences we can see that π −π log g θ u, λ dλ is independent of θ. So the differential condition on θ 0 becomes This is the case when we set Now, we set as an estimating function and use the following empirical likelihood ratio function R θ defined by Denote the maximum empirical likelihood estimator by θ, which maximizes the empirical likelihood ratio function R θ .

Remark 3.3.
We can also use an alternative estimating function $m_j^{(T)}$ instead of $m_j(\theta)$ in (3.10). The asymptotic equivalence of $m_j(\theta)$ and $m_j^{(T)}$ holds for any $j$, as shown by a straightforward calculation.
To show the asymptotic properties of $\hat\theta$ and $R(\theta_0)$, we impose the following assumption. (iii) The data taper $h : \mathbb{R} \to \mathbb{R}$ with $h(x) = 0$ for all $x \notin [0, 1]$ is continuous on $\mathbb{R}$ and twice differentiable at all $x \notin P$, where $P$ is a finite set, and $\sup_{x \notin P} |h''(x)| < \infty$. (iv) For $k = 1, \ldots, 8$, condition (3.14) holds.

Remark 3.5. Assumption 3.4(ii) may seem restrictive. However, it is required in order to use the central limit theorem for locally stationary processes (cf. Assumption A.1 and Theorem A.2 of Dahlhaus [13]; most of the restrictions on $N$ result from the $\sqrt{T}$-unbiasedness needed in the central limit theorem). See also the remarks in Section A.3 of Dahlhaus [13] for details. Now we give the following theorem.

Theorem 3.6. Suppose that Assumption 3.4 holds and $X_{1,T}, \ldots, X_{T,T}$ is a realization of a locally stationary process which has the representation (2.8). Then,

3.16
Here $\Sigma_1$ and $\Sigma_2$ are the $q \times q$ matrices whose $(i, j)$ elements are given by (3.17) and (3.18), respectively, and $\Sigma_3$ is the $q \times q$ matrix defined analogously. In addition, we give the following theorem on the asymptotic property of the empirical likelihood ratio $R(\theta_0)$.

Theorem 3.7. Suppose that Assumption 3.4 holds and $X_{1,T}, \ldots, X_{T,T}$ is a realization of a locally stationary process which has the representation (2.8). Then the empirical likelihood ratio statistic converges in distribution to a weighted sum of the $Z_i$, where each $Z_i$ is distributed as $\operatorname{Gamma}(1/2,\, 1/(2a_i))$, independently.
Remark 3.9. If the process is stationary, that is, the time-varying spectral density is independent of the time parameter $u$, we can easily see that $\Sigma_1 = \Sigma_2$, and the asymptotic distribution becomes chi-square with $q$ degrees of freedom.
Remark 3.10. In our setting, the number of estimating equations equals the number of parameters. In that case, the empirical likelihood ratio at the maximum empirical likelihood estimator, $R(\hat\theta)$, becomes one (cf. [3, page 305]). That means the test statistic in Theorem 3.7 becomes zero when we evaluate it at the maximum empirical likelihood estimator.

Numerical Example
In this section, we present simulation results for the estimation of the autocorrelation in locally stationary processes, as stated in Example 3.1. Consider the time-varying AR(1) process (4.1) with centered Gamma innovations and $a(u) = (u - b)^2$, $b = 0.1, 0.5, 0.9$. The observations $X_{1,T}, \ldots, X_{T,T}$ are generated from the process (4.1), and we construct confidence intervals for the autocorrelation with lag $k = 1$ by collecting the points $\theta$ which satisfy $-\pi^{-1}\log R(\theta) < z_\alpha$, where $z_\alpha$ is the $\alpha$-percentile of the asymptotic distribution in Theorem 3.7. We admit that Assumption 3.4(ii) is hard to satisfy in a finite-sample experiment, but this Monte Carlo simulation is purely illustrative and is intended to investigate how the sample size and the window length affect the resulting confidence intervals.
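A sketch of the data-generating step (the Gamma parameters in the source are garbled, so the shape/scale and centering below are generic choices ensuring mean-zero innovations):

```python
import numpy as np

rng = np.random.default_rng(10)

def tvar1_path(T, b, shape=1.0, scale=1.0):
    # time-varying AR(1): X_t = a(t/T) X_{t-1} + eps_t, a(u) = (u - b)^2
    eps = rng.gamma(shape, scale, T) - shape * scale   # centered Gamma noise
    X = np.zeros(T)
    for t in range(1, T):
        X[t] = (t / T - b) ** 2 * X[t - 1] + eps[t]
    return X

X = tvar1_path(T=500, b=0.5)
print(X[:5])
```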
We set the confidence level to $\alpha = 0.90$ and carry out the above procedure 1000 times for each case. Table 1 shows the averages of the lower and upper bounds, the lengths of the intervals, and the success rates. Looking at the results, we find that a larger sample size gives a shorter interval, as expected. Furthermore, the results indicate that a larger window length leads to a worse success rate. We can predict that the best ratio $N/T$ lies around 0.02, because the combination $(T, N) = (500, 10)$ seems to give the best result among all.

Some Lemmas
In this subsection, we give three lemmas used to prove Theorems 3.6 and 3.7. First of all, we introduce the function $L_N : \mathbb{R} \to \mathbb{R}$, defined by the $2\pi$-periodic extension of

5.1
The properties of the function $L_N$ are described in Lemma A.4 of Dahlhaus [13].

5.4
As in the proof of Theorem 2.2 of Dahlhaus [12], we replace $A^{0}$ by $A$ for $j = 2, \ldots, k$; the replacement error is smaller than the displayed bound. In the same way, we replace $A^{0}_{u_j T - N/2 + s_j, T}(\omega_j)$ by $A((u_j T - N/2 + s_j)/T, \lambda_j)$ for $j = 2, \ldots, k$, and then we obtain

5.8
The integral part can be evaluated in closed form, so we get

5.10
Since $h(x) = 0$ for $x \notin [0, 1]$, we only have to consider the range of $s$ which satisfies $1 \le s + u_k T - u_j T \le N - 1$. Therefore, we can regard $(-N/2 + s + u_k T - u_j T)/T$ as $O(N/T)$, and a Taylor expansion of $A$ around $u_j$ gives the first equation of the desired result. Moreover, in the same manner as the proof of Lemma A.5 of Dahlhaus [13], we can obtain the bound which leads to the second equation.
Proof. We set

5.13
Henceforth we denote $\phi(u, \lambda, \theta_0)$ by $\phi(u, \lambda)$ for simplicity. This lemma is proved by establishing the convergence of cumulants of all orders. Due to Lemma A.8 of Dahlhaus [13], the expectation of $P_M$ can be evaluated explicitly. The $(\alpha, \beta)$-element of the covariance matrix of $P_M$, due to Lemma A.9 of Dahlhaus [13], converges to
$$\int \phi_i(u, \lambda)\, \phi_j(u, \mu)\, g(u, \lambda)\, g(u, \mu)\, q_4(\lambda, -\lambda, \mu)\, d\lambda\, d\mu\, du.$$

5.17
By Assumption 3.4(iv), the covariance tends to $\Sigma_1$. The $k$th-order cumulants for $k \ge 3$ tend to zero due to Lemma A.10 of Dahlhaus [13]. Then we obtain the desired result.

Lemma 5.3. Suppose that (3.2) and Assumption 3.4 hold. Then,
Proof. First we calculate the mean of the $(\alpha, \beta)$-element of $S_M$:

5.19
Due to Dahlhaus [12, Theorem 2.2(i)], the second term of (5.19) becomes

5.20
Next we consider $\operatorname{cov}\{I_N(u_j, \lambda), I_N(u_j, \mu)\}$:

5.22
It converges to zero when $\lambda \ne -\mu$ and is equal to (5.23) when $\lambda = -\mu$. Similarly, the second term of (5.21) converges to zero when $\lambda \ne \mu$ and is equal to (5.23) when $\lambda = \mu$. We can also apply Lemma 5.1 to the third term of (5.21), and an analogous calculation shows that it converges to zero. After all, we see that (5.19) converges to $(\Sigma_2)_{\alpha\beta}$, the $(\alpha, \beta)$-element of $\Sigma_2$.
Next we calculate the second-order cumulant, which is equal to

5.25
Using the product theorem for cumulants (cf. [16]),

5.26
We can apply Lemma 5.1 to all the cumulants appearing in (5.25); the dominant term of the cumulants is $o(N^4)$, so (5.25) tends to zero. Then we obtain the desired result.

Proof of Theorem 3.6
Using the lemmas in Section 5.1, we prove Theorem 3.6. To find the maximizing weights $w_j$ of (3.11), we proceed by the Lagrange multiplier method. Write the Lagrangian $G$ with multipliers $\alpha \in \mathbb{R}^q$ and $\gamma \in \mathbb{R}$. Setting $\partial G/\partial w_j = 0$, the equation $\sum_{j=1}^{M} w_j\, \partial G/\partial w_j = 0$ gives $\gamma = -M$. Then we may write the weights in closed form, where the vector $\alpha = \alpha(\theta_0)$ satisfies the $q$ equations given by

5.30
Therefore, $\hat\theta$ is a minimizer of the following negative empirical log-likelihood ratio function $l(\theta)$:

5.33
Then, from (5.30) and (5.32), we have

5.37
which leads to

5.38
Similarly, we have

5.39
Next, from Lemma 5.3, we obtain the convergence of the matrix term, and finally (5.34), (5.35), and (5.38)-(5.41) combine to give the expansion; because of Lemma 5.2, we have

Proof of Theorem 3.7
Using the lemmas in Section 5.1, we prove Theorem 3.7. The proof proceeds in the same way as that of Theorem 3.6.

5.47
Every $w_j > 0$, so $1 + \alpha' Y_j > 0$, and therefore by (5.47) we get

5.52
The above expectation is written as

5.53
From

5.58
Noting the preceding bounds, we see that the final term in (5.58) has a norm bounded by

5.60
Hence, we can write the expansion with remainder $o_p(M^{-1/2})$. By (5.57), we may write, where for some finite $K$,

5.63
We may write

5.64
Here it is seen that

5.65
Finally, from Lemmas 5.2 and 5.3, we can show that

5.66
Then we can obtain the desired result.

Introduction
Portfolio optimization is said to be "myopic" when the investor does not know what will happen beyond the immediate next period. In this framework, basic results about single-period portfolio optimization, such as mean-variance analysis, are justified for short-term investments without portfolio rebalancing. Multiperiod problems are much more realistic than single-period ones. In this framework, we assume that an investor makes a sequence of decisions to maximize a utility function at each time. The fundamental method for solving this problem is dynamic programming, in which a value function expressing the conditionally expected utility of terminal wealth is introduced. The recursive equation for the value function is the so-called Bellman equation, and the first-order conditions (FOCs) of the Bellman equation are the key tool for solving the dynamic problem. The original literature on dynamic portfolio choice, pioneered by Merton [1] in continuous time and by Samuelson [2] and Fama [3] in discrete time, produced many important insights into the properties of optimal portfolio policies. Unfortunately, since closed-form solutions are known only for a few special cases, the recent literature uses a variety of numerical and approximate solution methods to incorporate realistic features into the dynamic portfolio problem; see Aït-Sahalia and Brandt [4] and Brandt et al. [5].
We introduce a procedure to construct dynamic portfolio weights based on the AR bootstrap.
The simulation algorithm is as follows. First, we generate simulated sample paths of the vector of random returns by the AR bootstrap. Based on the bootstrap samples, an optimal portfolio estimator, applied from time $T-1$ to the end of trading time $T$, is obtained under a constant relative risk aversion (CRRA) utility function; note that this optimal portfolio corresponds to the "myopic" single-period optimal portfolio. Next, we approximate the value function by linear functions of the past observations, an idea similar to that of [4, 5]. Then, optimal portfolio weight estimators at each trading time are obtained from the approximated value function. Finally, we construct an optimal investment strategy as the sequence of optimal portfolio weight estimators. A sketch of the path-generation step is given below.
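To make the path-generation step concrete, the following sketch fits an AR(1) model by least squares and resamples centered residuals to produce bootstrap paths. The AR(1) form $X_t = \mu + A(X_{t-1} - \mu) + \epsilon_t$ and all function names are our own illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def ar1_bootstrap_paths(x, horizon, n_paths, rng=None):
    """Bootstrap sample paths from a fitted AR(1) model.

    x: observed series; horizon: number of future steps;
    n_paths: number of bootstrap paths (B in the text).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    xc = x - mu
    # Least-squares estimate of the AR(1) coefficient.
    a_hat = np.dot(xc[1:], xc[:-1]) / np.dot(xc[:-1], xc[:-1])
    # Centered residuals, to be resampled with replacement.
    resid = xc[1:] - a_hat * xc[:-1]
    resid -= resid.mean()
    paths = np.empty((n_paths, horizon))
    for b in range(n_paths):
        xt = x[-1]
        for s in range(horizon):
            eps = rng.choice(resid)              # bootstrap draw
            xt = mu + a_hat * (xt - mu) + eps    # AR(1) recursion
            paths[b, s] = xt
    return paths
```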
This paper is organized as follows. We describe the basic idea for solving the multiperiod optimal portfolio problem under a CRRA utility function in Section 2. In Section 3, we discuss an algorithm for constructing the estimator, involving the AR bootstrap. Applications of our method are given in Section 4.

Multiperiod Optimal Portfolio
Suppose there exist a finite number of risky assets indexed by $i$, $i = 1, \ldots, m$. Let $X_t = (X_1(t), \ldots, X_m(t))'$ denote the random excess returns on the $m$ assets from time $t$ to $t+1$; if $S_i(t)$ denotes the value of asset $i$ at time $t$, the return satisfies $1 + X_i(t) = S_i(t)/S_i(t-1)$. Suppose too that there exists a risk-free asset with excess return $X_f$; if $B_t$ is the value of the risk-free asset at time $t$, then $1 + X_f = B_t/B_{t-1}$. Based on the process $\{X_t\}_{t=1}^T$ and $X_f$, we consider an investment strategy from time 0 to time $T$, where $T \in \mathbb{N}$ denotes the end of the investment horizon. Let $w_t = (w_1(t), \ldots, w_m(t))'$ be the vector of portfolio weights for the risky assets at the beginning of time $t+1$. We assume that the portfolio weights $w_t$ can be rebalanced at the beginning of time $t+1$ and are measurable (predictable) with respect to the past information $\mathcal{F}_t = \sigma(X_t, X_{t-1}, \ldots)$. We make the following assumption.
Assumption 2.1. There exists an optimal portfolio weight $w_t \in \mathbb{R}^m$ satisfying $|w_t'X_{t+1} + (1 - w_t'e)X_f| < 1$ almost surely for each time $t = 0, 1, \ldots, T-1$, where $e = (1, \ldots, 1)'$ (we assume that the risky assets exclude ultra-high-risk, ultra-high-return ones; for instance, the asset value $S_i(t+1)$ cannot be larger than $2S_i(t)$).
Then the gross return of the portfolio from time $t$ to $t+1$ is written as $1 + X_f + w_t'(X_{t+1} - X_f e)$: assuming $S_t := (S_1(t), \ldots, S_m(t))' = B_t e$, the portfolio gross return is $\{w_t'S_{t+1} + (1 - w_t'e)B_{t+1}\}/\{w_t'S_t + (1 - w_t'e)B_t\} = 1 + X_f + w_t'(X_{t+1} - X_f e)$, and the gross return from time 0 to time $T$, called the terminal wealth, is written as (2.1).
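A plausible reconstruction of the elided display (2.1), assuming initial wealth is normalized to one, is
\[
W_T \;=\; \prod_{t=0}^{T-1}\Bigl\{\,1 + X_f + w_t'\bigl(X_{t+1} - X_f e\bigr)\Bigr\}.
\]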

Suppose that a utility function $U : x \mapsto U(x)$ is differentiable, concave, and strictly increasing for each $x \in \mathbb{R}$. Consider the investor's problem of maximizing the expected utility of terminal wealth. Following the dynamic programming formulation (e.g., Bellman [6]), it is convenient to express the expected utility of terminal wealth in terms of a value function $V_t$, subject to the terminal condition $V_T = U(W_T)$. The recursive equation (2.3) is the so-called Bellman equation and is the basis for any recursive solution of the dynamic portfolio choice problem. The first-order conditions (FOCs) for an optimal solution at each time $t$ (here it is assumed that $\partial/\partial w_t\, E[V_{t+1} \mid \mathcal{F}_t] = E[\partial V_{t+1}/\partial w_t \mid \mathcal{F}_t]$) form a system of nonlinear equations involving integrals that, in general, can be solved for $w_t$ only numerically.
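The elided displays presumably take the following standard dynamic-programming form; this is a hedged reconstruction from the surrounding text rather than a verbatim quote:
\[
V_t(W_t) \;=\; \max_{w_t}\; E\bigl[\,V_{t+1}(W_{t+1})\,\big|\,\mathcal{F}_t\,\bigr],
\qquad
W_{t+1} \;=\; W_t\bigl\{1 + X_f + w_t'(X_{t+1} - X_f e)\bigr\},
\]
with terminal condition $V_T = U(W_T)$, and FOCs
\[
E\bigl[\,V_{t+1}'(W_{t+1})\,W_t\,(X_{t+1} - X_f e)\,\big|\,\mathcal{F}_t\,\bigr] \;=\; 0 .
\]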
According to the literature (e.g., [5]), the problem simplifies in the case of a constant relative risk aversion (CRRA) utility function, that is,

where $\gamma$ denotes the coefficient of relative risk aversion. In this case the Bellman equation simplifies: the value function $V_t$ factorizes into a power of wealth times a function $\Psi_t$, and $\Psi_t$ itself satisfies a Bellman equation subject to the terminal condition $\Psi_T = 1$. The corresponding FOCs are then expressed in terms of $\Psi_t$.
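Under CRRA utility, the standard decomposition, which we take to be the content of the elided displays, reads
\[
U(x) = \frac{x^{1-\gamma}}{1-\gamma}, \qquad
V_t(W_t) = \frac{W_t^{1-\gamma}}{1-\gamma}\,\Psi_t,
\]
\[
\Psi_t = \max_{w_t}\; E\Bigl[\bigl\{1 + X_f + w_t'(X_{t+1} - X_f e)\bigr\}^{1-\gamma}\,\Psi_{t+1}\,\Big|\,\mathcal{F}_t\Bigr],
\qquad \Psi_T = 1,
\]
which follows because CRRA utility is homogeneous of degree $1-\gamma$ in wealth, so wealth factors out of the maximization.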

Estimation
Suppose that $\{X_t = (X_1(t), \ldots, X_m(t))';\ t \in \mathbb{Z}\}$ is an $m$-vector AR(1) process defined by $X_t = \mu + A(X_{t-1} - \mu) + \epsilon_t$, where $\mu = (\mu_1, \ldots, \mu_m)'$ is a constant $m$-dimensional vector, the $\epsilon_t = (\epsilon_1(t), \ldots, \epsilon_m(t))'$ are independent and identically distributed (i.i.d.) random $m$-dimensional vectors with $E[\epsilon_t] = 0$ and $E[\epsilon_t\epsilon_t'] = \Gamma$ ($\Gamma$ a nonsingular $m \times m$ matrix), and $A$ is a nonsingular $m \times m$ matrix. We make the following assumption.
Given $\{X_{-n+1}, \ldots, X_0, X_1, \ldots, X_t\}$, the least-squares estimator $\hat A_t$ of $A$ is obtained by solving the normal equations. Let $\hat F^n_t(\cdot)$ denote the empirical distribution that puts mass $1/(n+t)$ at each residual $\hat\epsilon_t$.

Step 1. First, we fix the current time $t$, which implies that the observed stretch $n + t$ is fixed. Then we can generate $\{X^{*}_{b_1,b_2,t}(s)\}$ by (3.4).
Step 2. Next, for each $b_0 = 1, \ldots, B$, we obtain $\hat w^{b_0,t}_{T-1}$ as the maximizer of (3.5) or the solution of

with respect to $w$. Here we introduce the notation $E^*_s[\cdot]$ for an estimator of the conditional expectation $E[\cdot \mid \mathcal{F}_s]$, defined by the bootstrap average; $\hat w^{b_0,t}_{T-1}$ corresponds to the estimator of the myopic single-period optimal portfolio weight.
Step 3. Next, we construct estimators of $\Psi_{T-1}$. Since it is difficult to obtain the explicit form of $\Psi_{T-1}$, we parameterize it by linear functions of $X_{T-1}$ as follows. Note that the dimensions of $\theta_{T-1}$ in $\hat\Psi^{(1)}$ and $\hat\Psi^{(2)}$ are $m+1$ and $m(m+1)/2 + m + 1$, respectively. The idea of $\hat\Psi^{(1)}$ and $\hat\Psi^{(2)}$ is inspired by the parameterization of the conditional expectations in [5].
In order to construct the estimators of $\hat\Psi^{(i)}$ ($i = 1, 2$), we introduce the conditional least-squares estimators of the parameter $\theta^{(i)}_{T-1}$, that is,

3.10
Then, by using $\hat\theta^{(i)}_{T-1}$, we obtain the estimators $\hat\Psi^{(i)}$.

Step 4. Based on the above $\hat\Psi^{(i)}$, we obtain $\hat w^{b_0,t}_{T-2}$ as the maximizer of

3.11
or the solution of the corresponding first-order condition with respect to $w$. This $\hat w^{b_0,t}_{T-2}$ does not correspond to the estimator of the myopic single-period optimal portfolio weight, due to the effect of $\hat\Psi^{(i)}$.
Step 6. Then, we define an optimal portfolio weight estimator at time $t$ as $\hat w_t := \hat w^{b_0,t}_t$ obtained by Step 4. Note that $\hat w_t$ is obtained as a unique solution because the conditioning observation $X_t$ is fixed.

Step 7. For each time $t = 0, 1, \ldots, T-1$, we obtain $\hat w_t$ by Steps 1-6. Finally, we construct an optimal investment strategy as the sequence $\{\hat w_t\}$; a sketch combining these steps for a single risky asset is given below.
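Assuming the AR(1) routine sketched earlier supplies the bootstrap paths, the following sketch assembles Steps 2-6 with the linear parameterization $\hat\Psi^{(1)}$; the bounds on the weight, the state regression, and all names are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dynamic_weights(x_obs, T, gamma, xf=0.01, B=100, rng=None):
    """Backward recursion for dynamic CRRA portfolio weights (Steps 2-6)."""
    paths = ar1_bootstrap_paths(np.asarray(x_obs), T, B, rng)  # Step 1
    psi = np.ones(B)                  # terminal condition Psi_T = 1
    weights = np.empty(T)
    for t in range(T - 1, -1, -1):    # backward in time
        xs = paths[:, t]
        def neg_value(w):             # bootstrap estimate E*[...]
            gross = 1.0 + xf + w * (xs - xf)
            return -np.mean(gross ** (1.0 - gamma) * psi)
        w_hat = minimize_scalar(neg_value, bounds=(0.0, 1.0),
                                method="bounded").x
        weights[t] = w_hat
        # Step 3: refit Psi_t as a linear function of the state (Psi^(1)).
        gross = 1.0 + xf + w_hat * (xs - xf)
        y = gross ** (1.0 - gamma) * psi
        X = np.column_stack([np.ones(B), xs])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
        psi = np.clip(X @ theta, 1e-6, None)   # keep Psi positive
    return weights
```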

Examples
In this section we examine our approach numerically.

Example 4.1. Suppose that there exists a risky asset with excess return $X_t$ at time $t$ and a risk-free asset with excess return $X_f = 0.01$. We assume that $X_t$ is defined by the univariate AR(1) model given above. Let $w_t$ be the portfolio weight for the risky asset at the beginning of time $t+1$, and suppose that an investor is interested in the investment strategy from time 0 to time $T$; the terminal wealth is then written as (2.1). Applying our method, the estimator $\hat W_T$ is obtained by plugging into (2.1) the estimators $\hat w_t$ of the optimal portfolio under the CRRA utility function defined by (2.5). In what follows, we examine the behavior of $\hat W_T$ for a variety of values of $n$ (initial sample size), $B$ (resampling size), $A$ (AR parameter), $\Gamma$ (variance of $\epsilon_t$), $\gamma$ (relative risk aversion parameter), and $\hat\Psi$ defined by (3.7) or (3.8). It can be seen that the bootstrap samples $X^*_t$ show behavior similar to that of $X_t$. Next, we construct the optimal portfolio estimator $\hat w_t$ along the lines of Steps 2-7. Here we apply the approximated solution of (3.5) or (3.11) following [5], that is,

This approximate solution is based on a fourth-order expansion of the value function around $1 + X_f$ (the solution $\hat w_s$ in [5] is based on a second-order expansion). According to [5], a second-order expansion of the value function is sometimes not sufficiently accurate, whereas a fourth-order expansion includes adjustments for the skewness and kurtosis of returns and for their effects on the utility of the investor. Figure 2 shows time series plots of the single-period portfolio return $1 + X_f + \hat w_t(X_{t+1} - X_f)$ (Line 1), the cumulative portfolio return $\hat W_T$ (Line 2), and the value of the utility function $(1-\gamma)^{-1}\hat W_T^{1-\gamma}$ (Line 3) for $\gamma = 5$, 10, and 20. The solid line shows investment in the risk-free asset only (i.e., $w_t = 0$), one dotted line shows the myopic single-period portfolio (i.e., $\hat\Psi^{(i)} = 1$), and the other dotted line shows the dynamic multiperiod portfolio using $\hat\Psi^{(1)}$.
Regarding the single-period portfolio return, we cannot identify the best investment strategy among the risk-free, myopic, and dynamic portfolio investments. However, looking at the cumulative portfolio return or the value of the utility function, the dynamic portfolio investment is clearly the best. The difference between the myopic and dynamic portfolios is due to $\Psi$ and is called the "hedging demand," because by deviating from the single-period portfolio choice the investor tries to hedge against changes in the investment opportunities. Regarding the effect of $\gamma$, we can see that the magnitude of the hedging demand decreases as $\gamma$ increases.
We can see that for all $T$, the means of the terminal wealth $\hat W_T$ are larger than that of the risk-free investment (i.e., $(1 + X_f)^T$). Regarding the distribution of $\hat W_T$, the means are larger than the medians ($q_{0.5}$), which shows the asymmetry of the distribution. Among the myopic portfolio and the dynamic portfolios using $\hat\Psi^{(1)}$ and $\hat\Psi^{(2)}$, the dynamic portfolio using $\hat\Psi^{(2)}$ is the best investment strategy in terms of the means of $\hat W_T$ or $(1-\gamma)^{-1}\hat W_T^{1-\gamma}$. There are some cases in which the means of $\hat W_T$ for the dynamic portfolio using $\hat\Psi^{(1)}$ are smaller than those for the myopic portfolio; this phenomenon suggests inaccuracy in the approximation of $\Psi$. In addition, the dispersion of $\hat W_T$ for the dynamic portfolio is relatively smaller than that for the myopic portfolio.
Example 4.2 (sample size $n$ and resampling size $B$). In this example, we examine the effect of the initial sample size $n$ and the resampling size $B$. Let $\mu = 0.02$, $A = 0.1$, $\Gamma = 0.05$, $T = 10$, and $\gamma = 5$. In the same manner as in Example 4.1, we consider the behavior of $\hat W_T$ for $n = 10, 100, 1000$ and $B = 5, 20, 100$. Figure 3 shows the box plots of the terminal wealth $\hat W_T$ for each $n$ and $B$.
It can be seen that the medians tend to increase as $n$ and $B$ increase. In addition, the width of the box plots decreases as $n$ and $B$ increase. This reflects the improved accuracy of the bootstrap approximation $X^*_t$.

Example 4.3 (AR parameter $A$ and variance $\Gamma$ of $\epsilon_t$). In this example, we examine the effect of the AR parameter $A$ and the variance $\Gamma$ of $\epsilon_t$. Let $\mu = 0.02$, $n = 100$, $B = 100$, $T = 10$, and $\gamma = 5$. In the same manner as in Example 4.1, we consider the behavior of $\hat W_T$ for $A = 0.01, 0.1, 0.2$ and $\Gamma = 0.01, 0.05, 0.10$. Figure 4 shows the box plots of the terminal wealth $\hat W_T$ for each $A$ and $\Gamma$.
Obviously, the medians increase as $\Gamma$ decreases, which shows that the investment result is preferable when the innovation variance is small. On the other hand, the width of the box plots increases as $A$ increases, which shows that the variability of the investment result is large when $A$ is large.

Introduction
In discussions of the relations between time series, concepts of dependence and causality are frequently invoked. Geweke [1] and Hosoya [2] proposed measures of dependence and causality for multiple stationary processes (see also Taniguchi et al. [3]). They also showed that these measures can be decomposed additively frequency by frequency. However, it is restrictive that these measures are constant over time. Priestley [4] developed extensions of prediction and filtering theory to nonstationary processes with evolutionary spectra. Alternatively, in this paper we generalize the measures of dependence and causality to multiple locally stationary processes.
When we deal with nonstationary processes, one of the difficult problems is how to set up an adequate asymptotic theory. To meet this, Dahlhaus introduced an important class of nonstationary processes and developed statistical inference for it. We give the precise definition of multivariate locally stationary processes, which is due to Dahlhaus [8]. A sequence of multivariate stochastic processes $\{Z_{t,T};\ t = -N/2+1, \ldots, T+N/2;\ T, N \ge 1\}$ is called locally stationary with mean vector 0 and transfer function matrix $A^\circ$ if there exists a spectral representation in which $\zeta(\lambda) = (\zeta_1(\lambda), \ldots, \zeta_{d_Z}(\lambda))'$ is a complex-valued stochastic vector process on $[-\pi, \pi]$ with $\overline{\zeta_a(\lambda)} = \zeta_a(-\lambda)$ and cumulants $\mathrm{cum}\{d\zeta_{a_1}(\lambda_1), \ldots, d\zeta_{a_k}(\lambda_k)\}$ of the displayed form for $a_1, \ldots, a_k = 1, \ldots, d_Z$, where $\mathrm{cum}\{\cdots\}$ denotes the cumulant of $k$th order and $\eta(\lambda) = \sum_{j=-\infty}^{\infty}\delta(\lambda + 2\pi j)$ is the period-$2\pi$ extension of the Dirac delta function.
(ii) There exist a constant $K$ and a $2\pi$-periodic matrix-valued function $A : [0,1] \times \mathbb{R} \to \mathbb{C}^{d_Z \times d_Z}$ such that the stated approximation holds for all $a, b = 1, \ldots, d_Z$ and $T \in \mathbb{N}$. $A(u, \lambda)$ is assumed to be continuous in $u$.
We call $f(u, \lambda) := A(u, \lambda)\,\Omega\,A(u, \lambda)^*$ the time-varying spectral density matrix of the process, where $\Omega = (\kappa_{a,b})_{a,b=1,\ldots,d_Z}$. With the corresponding definition, $\{\epsilon_t\}$ becomes a white noise process with $E[\epsilon_t] = 0$ and $\mathrm{Var}(\epsilon_t) = \Omega$. Our objective is the generalization of dependence and causality measures to locally stationary processes and the construction of test statistics that can detect nonstationary effects in actual time series data. The paper is organized as follows. Section 2 explains the generalization of causality measures to multiple locally stationary processes. Since this extension is natural, we do not repeat the original idea of the causality measures in the stationary case and refer the reader to Geweke [1] and Hosoya [2]. In Section 3 we introduce the nonparametric spectral estimator for multivariate locally stationary processes and explain its asymptotic properties. Finally, in Section 4 we propose test statistics for linear dependence and show their performance through an empirical numerical example.

Measurements of Linear Dependence and Causality for Nonstationary Processes
Here, we generalize measures of dependence and causality to multiple locally stationary processes. The assumptions and results of this section are straightforward extensions of the original ideas in the stationary case; to avoid repetition, we refer to Geweke [1]. We obtain the best one-step linear predictor of $Z_{t+1,T}$ by projecting the components of the vector onto $H\{Z_{t,T}\}$ (projection here means componentwise projection). We denote the prediction error by $\xi_{t+1,T}$. Then, for a locally stationary process we have the displayed relation, where $\delta_{s,t}$ is the Kronecker delta. Note that the $\xi_{t,T}$ are uncorrelated but do not have identical covariance matrices; namely, the $G_{t,T}$ are time dependent. Now we impose the following assumption on $G_{t,T}$.
Note that each $H^{(j)}_{t,T}\xi_{t-j,T}$, $j = 0, 1, \ldots$, is the projection of $Z_{t,T}$ onto the closed subspace spanned by $\xi_{t-j,T}$. Now we have the following Wold decomposition for locally stationary processes.

Lemma 2.2 (Wold decomposition). If $\{Z_{t,T}\}$ is a locally stationary vector process with $d_Z$ components, then $Z_{t,T} = u_{t,T} + v_{t,T}$, where $u_{t,T}$ is given by (2.13) and satisfies (2.1), (2.2), and (2.4), $v_{t,T}$ is deterministic, and $E[v_{s,T}\xi_{t,T}'] \equiv 0$.

If only $u_{t,T}$ occurs, we say that $Z_{t,T}$ is purely nondeterministic. Next, we decompose the measures of linear causality frequency-wise. To define frequency-wise measures of causality, we introduce the following analytic facts.

Lemma 2.5. The analytic matrix $\Phi_{t,T}(z)$ corresponding to a fundamental process $\{\varepsilon_t\}$ (for $\{Z_{t,T}\}$) is maximal among analytic matrices $\Psi_{t,T}(z)$ with components from the class $H_2$ satisfying the boundary condition (2.8); that is, the displayed inequality holds. Although the following assumption is a natural extension of Kolmogorov's formula in the stationary case (see, e.g., [9]), it is not straightforward and, unfortunately, we cannot so far derive it from simpler assumptions; we conjecture that this would require another, entirely technical, paper.

2.16
Now, defining the process $\{\eta_{t,T}\}$ as below, we have

2.20
we have the following lemma.
Lemma 2.7. $\Phi_{t,T}(z)$ is an analytic function in the unit disc with $\Phi_{t,T}(0)\Phi_{t,T}(0)^* = G_{t,T}$, and thus maximal, such that the time-varying spectral density $f_{t,T}(\lambda)$ has a factorization of the stated form.

The nonparametric estimator of the time-varying spectral density is a smoothed localized periodogram, where $W_T(\omega) = M\sum_{\nu=-\infty}^{\infty} W(M(\omega + 2\pi\nu))$ is the weight function, $M > 0$ depends on $T$, and $I_N(u, \lambda)$ is the localized periodogram matrix over the segment $\{\lfloor uT\rfloor - N/2 + 1, \ldots, \lfloor uT\rfloor + N/2\}$ defined as

3.2
Here $h : [0,1] \to \mathbb{R}$ is a data taper and $H_{2,N} = \sum_{s=1}^{N} h(s/N)^2$. It should be noted that $I_N(u, \lambda)$ is not a consistent estimator of the time-varying spectral density; to obtain a consistent estimator of $f(u, \lambda)$, we have to smooth it over neighbouring frequencies (see the sketch below).
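A minimal sketch of the tapered localized periodogram and of its smoothing over neighbouring frequencies, for a univariate series and an interior time point $u$; the sine-squared taper and the moving-average kernel are our own illustrative choices, not the paper's.

```python
import numpy as np

def local_periodogram(x, u, N):
    """Tapered periodogram I_N(u, lambda) on the segment around time uT."""
    T = len(x)
    t0 = int(u * T) - N // 2 + 1                 # segment start (1-based)
    seg = np.asarray(x[t0 - 1 : t0 - 1 + N], dtype=float)
    h = np.sin(np.pi * np.arange(1, N + 1) / N) ** 2   # illustrative taper
    H2 = np.sum(h ** 2)
    d = np.fft.fft(h * seg)                      # tapered DFT
    return np.abs(d) ** 2 / (2 * np.pi * H2)     # at Fourier frequencies

def smoothed_spectral_estimate(x, u, N, M):
    """Consistent estimate of f(u, lambda): smooth I_N over frequencies."""
    I = local_periodogram(x, u, N)
    width = max(1, N // M)        # ~N/M Fourier frequencies per bandwidth
    kernel = np.ones(width) / width
    return np.convolve(I, kernel, mode="same")
```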
Now we impose the following assumptions on W · and h · .
Assumption 3.2. The weight function $W : \mathbb{R} \to [0, \infty)$ satisfies $W(x) = 0$ for $x \notin [-1/2, 1/2]$ and is a continuous, even function satisfying the stated normalization, which plays the role of a kernel in the time domain.
Furthermore, we assume the following.
The following lemmas are multivariate versions of Theorem 2.2 of Dahlhaus [10] and Theorem A.2 of Dahlhaus [7] (see also [11]).

3.9
have, asymptotically, a normal distribution with zero mean vector and covariance matrix $V$ whose $(i, j)$-th element is given in (3.11).

Testing Problem for Linear Dependence
In this section we discuss the testing problem for linear dependence. The average measure of linear dependence is given by an integral functional of the time-varying spectral density. We consider the testing problem for the existence of linear dependence, and for this problem we define the test statistic $S_T$ as below; then we have the following result, where the asymptotic variance of $S_T$ is given subsequently. To simplify, $\{Z_{t,T}\}$ is assumed to be a Gaussian locally stationary process; then the asymptotic variance of $S_T$ becomes an integral functional of the time-varying spectral density:

4.9
Next, we introduce a measure of goodness of our test. Consider a sequence of alternative spectral density matrices $g_T(u, \lambda)$, where $b(u, \lambda)$ is a $d_Z \times d_Z$ matrix whose entries $b_{ab}(u, \lambda)$ are square-integrable functions on $[0,1] \times [-\pi, \pi]$. Let $E_{g_T}[\cdot]$ and $V_f[\cdot]$ denote the expectation under $g_T(u, \lambda)$ and the variance under $f(u, \lambda)$, respectively. It is natural to define the efficacy of $L_T$ in line with the usual definition for a sequence of "parametric alternatives." Then we see that

4.13
For another test $L^*_T$, we can define the asymptotic relative efficiency (ARE) of $L_T$ relative to $L^*_T$ as below. If we take as $L^*_T$ the test statistic based on the stationarity assumption, we can measure the effect of nonstationarity when the process concerned is locally stationary.

4.14
Finally, we discuss the testing problem of linear dependence for stock prices on the Tokyo Stock Exchange. The data are daily log-returns of 7 companies (1: HITACHI, 2: …). The results are listed in Table 1. It shows that the values for every pair of companies are large. Since under the null hypothesis the limit distribution of $L_T$ is standard normal, we conclude that the hypothesis is rejected; namely, linear dependence exists between every pair of companies. In particular, the values among the electric appliance companies and among the automobile companies are significantly large. Therefore, we can see that companies in the same business have strong dependence.
In Figures 1 and 2, the daily linear dependence between HONDA and TOYOTA and between HITACHI and SHARP is plotted. They show that the daily dependencies are not constant and change over time. Hence it seems reasonable to use the test statistic based on the nonstationarity assumption.

Introduction
The Government Pension Investment Fund (GPIF) of Japan was established on April 1, 2006 as an independent administrative institution with the mission of managing and investing the Reserve Fund of the employees' pension insurance and the national pension (see http://www.gpif.go.jp/ for more information [1]). It is the world's largest pension fund ($1.4 trillion in assets under management as of December 2009), and its mission is to manage and invest the Reserve Funds safely and efficiently with a long-term perspective. Business management targets to be achieved by the GPIF are set by the Minister of Health, Labour, and Welfare based on the law on the general rules of independent administrative agencies. In actuarial science, the "required Reserve Fund" for pension insurance has been investigated for a long time. The traditional approach focuses on the expected value of future obligations and the interest rate; the investment strategy is then determined so as to exceed the expected interest rate. Recently, solvency for the insurer has been defined in terms of random values of future obligations (e.g., Olivieri and Pitacco [2]). In this paper, we assume that the Reserve Fund is defined in terms of the random interest rate and the expected future obligations, and we propose optimal portfolios by optimizing the randomized Reserve Fund. The GPIF invests in a portfolio of domestic and international stocks and bonds. In this paper, we consider the optimal portfolio problem of the Reserve Fund under two econometric specifications for the asset returns.
First, we select the optimal portfolio weights based on the maximization of the Sharpe ratio under three different functional forms for the portfolio mean and variance, two of them depending on the Reserve Fund at the end-of-period target (about 100 years). Following the asset structure of the GPIF, we split the assets into cash and domestic and foreign bonds on one hand, and domestic and foreign equity on the other. The first type of asset is assumed to be short memory, while the second type is long memory. To obtain the optimal portfolio weights, we rely on the bootstrap. For the short memory returns, we use the wild bootstrap (WB). Early work focuses on providing first- and second-order theoretical justification for the wild bootstrap in the classical linear regression model (see, e.g., [3]). Gonçalves and Kilian [4] show that the WB is applicable to linear regression models with conditional heteroscedasticity, such as stationary ARCH, GARCH, and stochastic volatility effects. For the long memory returns, we apply the sieve bootstrap (SB). Bühlmann [5] establishes consistency of the autoregressive sieve bootstrap. Assuming that the long memory process can be written as an AR(∞) or MA(∞) process, we estimate the long memory parameter by means of Whittle's approximate likelihood [6]. Given this estimator, the residuals are computed and resampled for the construction of the bootstrap samples, from which the optimal portfolio weights are estimated. We study the usefulness of these procedures with an application to the GPIF assets.
Second, we consider the case where the returns are time dependent and heavy-tailed. It is known that heavy tails are one of the stylized facts of financial returns. It is therefore reasonable to use the stable distribution instead of the Gaussian, since it allows for skewness and fat tails. We couple this distribution with the conditional heteroscedastic autoregressive nonlinear (CHARN) model, which nests many well-known time series models such as ARMA and ARCH. We estimate the parameters and the optimal portfolio by means of empirical likelihood.
The paper is organized as follows. Section 2 sets up the Reserve Fund portfolio problem. Section 3 focuses on the first part, that is, estimation in terms of the Sharpe ratio, and discusses the bootstrap procedure. Section 4 covers the CHARN model under stable innovations and estimation by means of empirical likelihood. Section 5 concludes.

Reserve Funds Portfolio with End-of-Period Target
Let $S_{i,t}$ be the price of the $i$th asset at time $t$ ($i = 1, \ldots, k$), and let $X_{i,t}$ be its log-return. Time runs from 0 to $T$. Throughout the paper, we consider that today is $T_0$ and $T$ is the end-of-period target; hence the past and present observations run over $t = 0, \ldots, T_0$, and the future until the end-of-period target over $t = T_0+1, \ldots, T$. The price $S_{i,t}$ can be written as $S_{i,t} = S_{i,0}\exp(X_{i,1} + \cdots + X_{i,t})$, where $S_{i,0}$ is the initial price. Let $F_{i,t}$ denote the Reserve Fund on asset $i$ at time $t$, defined by

where $c_{i,t}$ denotes the maintenance cost at time $t$. By recursion, $F_{i,t}$ can be written in closed form (see the reconstruction below). We gather the Reserve Funds in the vector $F_t = (F_{1,t}, \ldots, F_{k,t})'$. Let $F_t(\alpha) = \alpha'F_t$ be a portfolio formed by the $k$ Reserve Funds, depending on the vector of weights $\alpha = (\alpha_1, \ldots, \alpha_k)'$. The portfolio Reserve Fund can be expressed as a function of all past returns:
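A plausible reconstruction of the elided recursion and its closed form, assuming the fund on asset $i$ earns the gross return $e^{X_{i,t}}$ and pays the maintenance cost $c_{i,t}$ each period:
\[
F_{i,t} \;=\; e^{X_{i,t}}\,F_{i,t-1} \;-\; c_{i,t},
\qquad
F_{i,t} \;=\; e^{X_{i,1}+\cdots+X_{i,t}}\,F_{i,0}
\;-\; \sum_{s=1}^{t} e^{X_{i,s+1}+\cdots+X_{i,t}}\,c_{i,s},
\]
with the empty exponent (case $s = t$) read as zero.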

2.4
We are interested in maximizing $F_t(\alpha)$ at the end-of-period target, that is, $F_T(\alpha)$. It depends on the future returns, the maintenance cost, and the portfolio weights. While the first two are assumed to be constant from $T_0$ to $T$ (the constant return can be seen as the average return over the $T - T_0$ periods), we focus on the optimality of the weights, which we denote by $\alpha_{\mathrm{opt}}$.

Sharpe-Ratio-Based Optimal Portfolios
In the first specification, the estimation of the optimal portfolio weights is based on the maximization of the Sharpe ratio:

under different functional forms of the expectation $\mu(\alpha)$ and the risk $\sigma(\alpha)$ of the portfolio. We propose three functional forms, two of them depending on the Reserve Fund. The first is the traditional one based on the returns, where $E[X_T]$ and $V[X_T]$ are the expectation and covariance matrix of the returns at the end-of-period target. The second form for the portfolio expectation and risk is based on the vector of Reserve Funds, where $E[F_T]$ and $V[F_T]$ denote the mean and covariance of the Reserve Funds at time $T$. Last, we consider the case where the portfolio risk depends on the lower partial moments of the Reserve Funds at the end-of-period target, where $F$ is a given threshold value. Standard portfolio management rules are based on a mean-variance approach, in which risk is measured by the standard deviation of the future portfolio value. However, the variance often does not provide a correct assessment of risk under dependence and non-Gaussianity. To overcome this problem, various optimization models have been proposed, such as the mean-semivariance, mean-absolute deviation, mean-variance-skewness, mean-CVaR, and mean-lower partial moment models. The mean-lower partial moment model is appropriate for reducing the influence of heavy tails (a sketch for this form is given below).
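As an illustration, the following sketch maximizes a Sharpe-type ratio over the simplex for the third form, with risk given by an empirical lower partial moment of simulated end-of-period Reserve Funds; the long-only constraint, the moment order, and all names are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def lower_partial_moment(f_T, F_bar, order=2):
    """Empirical lower partial moment of simulated Reserve Funds f_T
    below the prescribed level F_bar."""
    shortfall = np.maximum(F_bar - f_T, 0.0)
    return np.mean(shortfall ** order)

def optimal_weights(F_sims, F_bar):
    """F_sims: (n_sims, k) simulated Reserve Funds per asset at time T."""
    k = F_sims.shape[1]
    def neg_ratio(alpha):
        port = F_sims @ alpha                 # portfolio Reserve Fund
        risk = np.sqrt(lower_partial_moment(port, F_bar))
        return -port.mean() / risk            # negative Sharpe-type ratio
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * k                 # long-only assumption
    res = minimize(neg_ratio, np.full(k, 1.0 / k),
                   bounds=bounds, constraints=cons)
    return res.x
```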
The $k$ returns are split into $p$- and $q$-dimensional vectors $\{X^S_t;\ t \in \mathbb{Z}\}$ and $\{X^L_t;\ t \in \mathbb{Z}\}$, where $S$ and $L$ stand for short and long memory, respectively. The short memory returns correspond to cash and domestic and foreign bonds, which we generically call bonds. The long memory returns correspond to domestic and foreign equity, which we call equity.
Cash and bonds follow the nonlinear model in which $\mu^S$ is a vector of constants, $H : \mathbb{R}^{mp} \to \mathbb{R}^p \times \mathbb{R}^p$ is a positive definite matrix-valued measurable function, and the $\epsilon^S_t = (\epsilon^S_{1,t}, \ldots, \epsilon^S_{p,t})'$ are i.i.d. random vectors with mean 0 and covariance matrix $\Sigma^S$. By contrast, equity returns follow a long memory nonlinear model with $-1/2 < d < 1/2$, where the $\epsilon^L_t = (\epsilon^L_{1,t}, \ldots, \epsilon^L_{q,t})'$ are i.i.d. random vectors with mean 0 and covariance matrix $\Sigma^L$.
We estimate the optimal portfolio weights by means of the bootstrap. Let the superindexes $(S, b)$ and $(L, b)$ denote the bootstrapped samples for bonds and equity, respectively, and $B$ the total number of bootstrap samples. In the sequel, we show the bootstrap procedure for both types of assets (3.13). The bootstrapped Reserve Fund portfolio is given in (3.14). Finally, the estimated portfolio weights that give the optimal portfolio are $\hat\alpha_{\mathrm{opt}} = \arg\max_\alpha\, \hat\mu^b(\alpha)/\hat\sigma^b(\alpha)$, where $\hat\mu^b(\alpha)$ and $\hat\sigma^b(\alpha)$ may take any of the three forms introduced earlier, evaluated on the bootstrapped returns or Reserve Funds. A sketch of the sieve bootstrap step follows.
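A minimal sketch of the autoregressive sieve bootstrap for the long memory (equity) returns: fit a long autoregression by least squares, resample centered residuals, and rebuild the series recursively. The AR order rule and all names are illustrative assumptions; the paper additionally estimates the long memory parameter by Whittle likelihood, which we omit here.

```python
import numpy as np

def sieve_bootstrap(x, n_boot, p=None, rng=None):
    """AR(p) sieve bootstrap replicates of a (long memory) series x."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    if p is None:
        p = int(10 * np.log10(n))       # illustrative order rule
    xc = x - x.mean()
    # Fit AR(p) by least squares on the lagged design matrix.
    Y = xc[p:]
    Z = np.column_stack([xc[p - j : n - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ phi
    resid -= resid.mean()
    reps = np.empty((n_boot, n))
    for b in range(n_boot):
        eps = rng.choice(resid, size=n + p, replace=True)
        xb = np.zeros(n + p)
        for t in range(p, n + p):       # rebuild the series recursively
            lags = xb[t - p : t][::-1]  # xb[t-1], ..., xb[t-p]
            xb[t] = phi @ lags + eps[t]
        reps[b] = xb[p:] + x.mean()
    return reps
```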

An Illustration
We consider monthly log-returns from January 31 … and follow (3.6). Figure 1 shows the five assets. Cash is virtually constant, and equities are significantly more volatile than bonds, with averages slightly higher than those of bonds.
We estimate the optimal portfolio weights $\hat\alpha_{\mathrm{opt1}}$, $\hat\alpha_{\mathrm{opt2}}$, and $\hat\alpha_{\mathrm{opt3}}$, corresponding to the three forms of the expectation and risk in the Sharpe ratio, and we compute the trajectory of the optimal Reserve Fund for $t = T_0+1, \ldots, T$. For liquidity reasons, the portfolio weight of cash is kept constant at 5%. The target period is fixed to 100 years, and the maintenance cost is based on the 2004 Pension Reform. Table 1 shows the estimated optimal portfolio weights for the three different choices of the portfolio expectation and risk. The weight of domestic bonds is very high and clearly dominates the other assets. Domestic bonds are low risk and medium return, in contrast with equity, which shows higher return but also higher risk, and with foreign bonds, which show low return and risk. Therefore, in a sense, domestic bonds are a compromise among the characteristics of the four equities and bonds. Figure 2 shows the trajectory of the future Reserve Fund for different values of the yearly return (assumed to be constant from $T_0+1$ to $T$), ranging from 2.7% to 3.7%. Since the investment term is extremely long (100 years), the Reserve Fund is quite sensitive to the choice of the yearly return. In the 2004 Pension Reform, the authorities assumed a yearly portfolio return of 3.2%, which corresponds to the middle thick line of the figure.

Optimal Portfolio with Time-Dependent Returns and Heavy Tails
In this section, we consider the second scenario, where returns follow a dependent model with stable innovations. The theory of portfolio choice is mostly based on the assumption that investors maximize their expected utility. The best-known utility is Markowitz's mean-variance function, which is optimal under Gaussianity. However, it is widely acknowledged that financial returns show fat tails and, frequently, skewness. Moreover, the variance may not always be the best risk measure. Since the purpose of the GPIF is to avoid a big loss at a certain point in the future, risk measures that summarize the probability that the Reserve Fund falls below a prescribed level at a certain future point, such as value at risk (VaR), are more appropriate [7]. In addition, the traditional mean-variance approach assumes that returns are i.i.d., which is not realistic, as past information may help to explain today's distribution of returns.
We need a specification that allows for heavy tails, skewness, and time dependence. This calls for a general model with location and scale that are functions of past observations and with innovations that are stable distributed. The location-scale model we use for the returns is the conditional heteroscedastic autoregressive nonlinear (CHARN) model, which is fairly general and nests important models such as ARMA and ARCH.
Estimation of the parameters in a stable framework is not straightforward, since the density does not have a closed form. (Maximum likelihood is feasible for the i.i.d. univariate case thanks to the STABLE packages developed by John Nolan; see Nolan [8] and the website http://academic2.american.edu/~jpnolan/stable/stable.html. For more complicated cases, including dynamics, maximum likelihood is quite a complex task.) We rely on empirical likelihood, one of the nonparametric methods, as it has already been studied in this context [9]. Once the parameters are estimated, we simulate samples of the returns, which are used to compute the Reserve Fund at the end-of-period target, and we estimate the optimal portfolio weights by minimizing the empirical VaR of the Reserve Fund at time $T$.
Suppose that the vector of returns $X_t \in \mathbb{R}^k$ follows the CHARN model, where $F_\mu : \mathbb{R}^{kp} \to \mathbb{R}^k$ is a vector-valued measurable function with parameter $\mu \in \mathbb{R}^{p_1}$ and $H_\sigma : \mathbb{R}^{kq} \to \mathbb{R}^k \times \mathbb{R}^k$ is a positive definite matrix-valued measurable function with parameter $\sigma \in \mathbb{R}^{p_2}$. Each element of the vector of innovations $\varepsilon_t \in \mathbb{R}^k$ is standardized stable distributed, $\varepsilon_{i,t} \sim S(\alpha_i, \beta_i, 1, 0)$, and the $\varepsilon_{i,t}$ are independent with respect to both $i$ and $t$. We set $\theta = (\mu, \sigma, \alpha, \beta)$, where $\alpha = (\alpha_1, \ldots, \alpha_k)$ and $\beta = (\beta_1, \ldots, \beta_k)$. The stable distribution is often represented by its characteristic function, where $\delta \in \mathbb{R}$ is a location parameter, $\gamma > 0$ is a scale parameter, $\beta \in [-1, 1]$ is a skewness parameter, and $\alpha \in (0, 2]$ is a characteristic exponent capturing the tail thickness of the distribution: the smaller the $\alpha$, the heavier the tail. The distribution with $\alpha = 2$ corresponds to the Gaussian. The existence of moments is governed by $\alpha$: moments of order higher than $\alpha$ do not exist, the case $\alpha = 2$ being an exception, for which all moments exist. The lack of important moments may, in principle, render estimation by the method of moments difficult. However, instead of matching moments, it is fairly simple to match the theoretical and empirical characteristic functions evaluated at a grid of frequencies [9]. Let $\hat\varepsilon_{i,t}$ be the residual of the CHARN model. If the parameters $\mu$ and $\sigma$ are the true ones, the residuals $\hat\varepsilon_{i,t}$ should be independently and identically distributed as $S(\alpha_i, \beta_i, 1, 0)$. So the aim is to find estimated parameters such that the residuals are i.i.d. and stable distributed, meaning that their probability law is expressed by the above characteristic function; in other words, we estimate the parameters by matching the empirical and theoretical characteristic functions and minimizing their distance. Let $J$ be the number of frequencies at which we evaluate the characteristic function: $\nu_1, \ldots, \nu_J$. This gives, in principle, a system of $J$ matching equations. But since the characteristic function can be split into real and imaginary parts, $\phi(\nu) = E[\cos(\nu\varepsilon_{i,t})] + iE[\sin(\nu\varepsilon_{i,t})]$, we double the dimension of the system by matching both parts. Let $\mathrm{Re}\,\phi(\nu)$ and $\mathrm{Im}\,\phi(\nu)$ be the real and imaginary parts of the theoretical characteristic function, and $\cos(\nu\hat\varepsilon_{i,t})$ and $\sin(\nu\hat\varepsilon_{i,t})$ their empirical counterparts. The estimating functions are constructed for each $i = 1, \ldots, k$ and gathered into the vector $\psi^\theta_t = (\psi^\theta_{1,t}, \ldots, \psi^\theta_{k,t})$ (4.5). The number of frequencies $J$ and the frequencies themselves are chosen arbitrarily. Feuerverger and McDunnough [10] show that the asymptotic variance can be made arbitrarily close to the Cramér-Rao lower bound if the number of frequencies is sufficiently large and the grid is sufficiently fine and extended. Similarly, Yu [11, Section 2.1] argues that, from the viewpoint of minimum asymptotic variance, many fine frequencies are appropriate. However, Carrasco and Florens [12] show that too fine a grid leads to a singular asymptotic variance matrix whose inverse cannot be computed. Given the estimating functions (4.5), a natural estimator is constructed by GMM, where $W$ is a weighting matrix defining the metric (its optimal choice is typically the inverse of the covariance matrix of $\psi^\theta_t$) and the expectations are replaced by sample moments.
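As an illustration of the matching idea for a single asset's residuals, the following sketch stacks the real and imaginary CF discrepancies on a frequency grid and minimizes a quadratic form with identity weighting (the optimal GMM weighting $W$ is omitted). The closed-form stable CF below uses one common parameterization and assumes $\alpha \ne 1$; the frequency grid and all names are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def stable_cf(nu, alpha, beta):
    """Characteristic function of S(alpha, beta, 1, 0) for alpha != 1
    (one common parameterization; a modelling assumption here)."""
    return np.exp(-np.abs(nu) ** alpha
                  * (1 - 1j * beta * np.sign(nu)
                     * np.tan(np.pi * alpha / 2)))

def cf_matching_objective(params, resid, freqs):
    """Stacked real/imaginary CF discrepancies, as in (4.5)."""
    alpha, beta = params
    g = []
    for nu in freqs:
        phi = stable_cf(nu, alpha, beta)
        g.append(np.mean(np.cos(nu * resid)) - phi.real)
        g.append(np.mean(np.sin(nu * resid)) - phi.imag)
    g = np.asarray(g)
    return g @ g          # identity weighting; optimal GMM would use W

def fit_stable(resid, freqs=(0.1, 0.5, 1.0, 2.0)):
    res = minimize(cf_matching_objective, x0=np.array([1.7, 0.0]),
                   args=(np.asarray(resid), np.asarray(freqs)),
                   bounds=[(0.1, 2.0), (-1.0, 1.0)])
    return res.x          # (alpha_hat, beta_hat)
```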
The GMM estimator can be generalized to the empirical likelihood estimator, originally proposed by Owen [13] as a nonparametric method of inference based on a data-driven likelihood ratio function (see also [14] for a review and applications). It produces a better variance estimate in one step, whereas the optimal GMM generally requires a preliminary step estimating an optimal $W$ matrix. We define the empirical likelihood ratio function for $\theta$ as below, where $p = (p_1, \ldots, p_{T_0})$, and the maximum empirical likelihood estimator follows. Qin and Lawless [15] show that this estimator is consistent and asymptotically Gaussian with the stated covariance matrix. Once the parameters are estimated, we compute the optimal portfolio weights and the portfolio Reserve Fund at the end-of-period target. Because of a notational conflict, the weights are now denoted by $a = (a_1, \ldots, a_k)'$. For simplicity, we assume that there is no maintenance cost, so (2.5) simplifies accordingly. The procedure to estimate the optimal portfolio weights is as follows.

Step 1. For each asset $i = 1, \ldots, k$, we simulate the innovation process based on the maximum empirical likelihood estimators $\hat\alpha_i, \hat\beta_i$.
Step 2. We calculate the predicted log-returns for $t = T_0+1, \ldots, T$ based on the estimators $\hat\mu, \hat\sigma$ and the simulated $\varepsilon_t$ obtained in Step 1.
Step 3. For a given portfolio weight $a$, we calculate the predicted value of the fund at time $T$, $F_T(a)$, with (4.10).
Step 4. We repeat Steps 1-3 $M$ times and save the resulting values of $F_T(a)$.

4.13
Step 5. Minimize $g(a)$ with respect to $a$: $a^* = \arg\min_a g(a)$.
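A minimal sketch of Steps 3-5, reading $g(a)$ in (4.13) as the negative empirical lower quantile (VaR) of the simulated terminal funds; this reading of the elided display, the confidence level, and the long-only simplex constraint are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def empirical_var_objective(a, FT_sims, level=0.05):
    """g(a): negative empirical lower quantile of the simulated terminal
    Reserve Fund F_T(a); FT_sims has shape (M, k)."""
    port = FT_sims @ a
    return -np.quantile(port, level)   # minimizing this raises the VaR floor

def optimal_weights_var(FT_sims):
    M, k = FT_sims.shape
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(empirical_var_objective, np.full(k, 1.0 / k),
                   args=(FT_sims,), bounds=[(0.0, 1.0)] * k,
                   constraints=cons)
    return res.x
```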

An Illustration
In this section, we apply the above procedure to real financial data. We consider the same monthly log-return data as in Section 3.1. Domestic bond (DB), domestic equity (DE), foreign bond (FB), and foreign equity (FE) are each assumed to follow the ARCH(1) model $X_t = \sigma_t\varepsilon_t$, $\sigma_t^2 = bX_{t-1}^2$ (4.14). Here $b > 0$ and the $\varepsilon_t$ are i.i.d. $S(\alpha, \beta, 1, 0)$. Cash is virtually constant, so we assume that the log-return of cash is 0 permanently. We set the present Reserve Fund $F_{T_0} = 1$, and the target period is fixed to half a year. Table 2 shows the estimated optimal portfolio weights for different prescribed levels $F$. The weights of domestic and foreign bonds tend to be high when $F$ is small; a small $F$ implies that we want to avoid losses. On the contrary, the weights of equities become higher when $F$ is large; a large $F$ implies that we do not want to miss the chance of a big gain. This result seems natural because bonds are lower risk (less volatile) than equities.

Conclusions
In this paper, we study the estimation of optimal portfolios for a Reserve Fund with an end-of-period target in two different settings. In the first setting, the assets are split into short memory (cash and bonds) and long memory (equity) returns.

Introduction
Since the random walk is a martingale sequence, the best predictor of the next term is the current value. In this sense, the random walk is used as a model expressing the fairness and efficiency of various financial phenomena in economics. Furthermore, because the random walk is a unit root process, we can recover an independent sequence by taking differences. However, the information in the original sequence is lost by differencing when the sequence does not contain a unit root. Therefore, testing for the existence of a unit root in the original sequence is important.

In this section, we review fundamental asymptotic results for unit root processes. Let $\{\varepsilon_j\}$ be i.i.d. $(0, \sigma^2)$ random variables with $\sigma^2 > 0$, and define the partial sum process, the so-called random walk. The random walk corresponds to the first-order autoregressive (AR(1)) model with unit coefficient; therefore, the random walk belongs to the class of unit root (I(1)) processes, a class of nonstationary processes. Let $C = C[0,1]$ be the space of all real-valued continuous functions defined on $[0,1]$. For the random walk process, we construct the sequence of partial sum processes $\{R_T\}$ in $C$. It is well known that the partial sum process $\{R_T\}$ converges weakly to a standard Brownian motion on $[0,1]$; namely, $\mathcal{L}(R_T) \to \mathcal{L}(B)$, where $\mathcal{L}(\cdot)$ denotes the distribution law of the corresponding random element. This result is the so-called functional central limit theorem (FCLT); see Billingsley [1] and the simulation sketch below. The FCLT can be extended to unit root processes whose innovations are general linear processes. We consider a sequence $\{\tilde R_T\}$ of stochastic processes in $C$ defined with $r_j = \sum_{i=1}^{j} u_i$, where $\{u_j\}$ is assumed to be generated by a linear process: $\{\varepsilon_j\}$ is a sequence of i.i.d. $(0, \sigma^2)$ random variables and $\{\alpha_j\}$ is a sequence of constants satisfying $\sum_{l=0}^{\infty} l|\alpha_l| < \infty$, so that $\{u_j\}$ is a stationary process. Using the Beveridge and Nelson [2] decomposition, the analogous weak convergence holds (see, e.g., Tanaka [3]). The asymptotic properties of the LSE for stationary autoregressive models are well established (see, e.g., Hannan [4]). On the other hand, due to nonstationarity, the LSE of the random walk is not asymptotically normal; however, the limiting distribution of the LSE of a unit root process can be derived from the FCLT. For more details on unit root processes with i.i.d. or stationary innovations, see, for example, Billingsley [1] and Tanaka [3].
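The weak convergence can be visualized by a short simulation; the construction $R_T(u) = S_{\lfloor Tu \rfloor}/(\sigma\sqrt{T})$ follows the definition above, with linear interpolation between grid points omitted for brevity.

```python
import numpy as np

def partial_sum_process(T, sigma=1.0, rng=None):
    """Scaled random-walk path R_T(u) = S_[Tu] / (sigma * sqrt(T)),
    evaluated on the grid u = 0, 1/T, ..., 1."""
    rng = np.random.default_rng(rng)
    eps = rng.normal(0.0, sigma, size=T)
    S = np.concatenate([[0.0], np.cumsum(eps)])
    return S / (sigma * np.sqrt(T))

# As T grows, the path behaves like a standard Brownian motion on [0, 1];
# e.g., R_T(1) is approximately N(0, 1):
samples = [partial_sum_process(1000)[-1] for _ in range(2000)]
print(np.mean(samples), np.var(samples))   # approximately 0 and 1
```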
In the above case, the $\{u_j\}$ are stationary and hence have constant variance, while the covariances depend only on time differences. This is referred to as the homogeneous case, which is too restrictive for interpreting empirical data, for example, empirical financial data. Recently, an important class of nonstationary processes, called locally stationary processes, has been proposed by Dahlhaus (see, e.g., Dahlhaus [5, 6]). In this paper, we instead adopt a locally stationary innovation process, which has smoothly changing variance. Since an LSP innovation has a time-varying spectral structure, it is suitable for describing empirical financial time series data. This paper is organized as follows. In the appendix, we review the extension of the FCLT to the case where the innovations are locally stationary; namely, we explain the FCLT for unit root, near unit root, and general integrated processes with LSP innovations. In Section 2, we obtain the asymptotic distribution of the least squares estimator in each case of the appendix. In Section 3, we consider the testing problem for a unit root with LSP innovations. Finally, in Section 4, we discuss extensions of the LSE, which include various well-known estimators as special cases. It is easy to check the required conditions; therefore, the continuous mapping theorem (CMT) leads to $\mathcal{L}(U_{1,T}, V_{1,T}) \to \mathcal{L}(H_1(X))$ and (2.6).

Least Squares Estimator for Near Unit Root Process
We next consider the least squares estimator $\hat\rho_T$ for model (A.11) in the case where $\beta(t) \equiv \beta$ is constant on $[0,1]$; namely, $y_{j,T} = \rho_T y_{j-1,T} + u_{j,T}$, $j = 1, \ldots, T$ (2.7), with $\rho_T = 1 - \beta/T$. Then we have the displayed expansion. Defining a suitable continuous function, it is easy to check the required conditions; therefore, the CMT leads to $\mathcal{L}(U_{2,T}, V_{2,T}) \to \mathcal{L}(H_2(Y))$ and

Least Squares Estimator for I d Process
Furthermore, we consider the least squares estimator for the I($d$) process; the limiting functionals involve $\int x(\nu)^2\, d\nu$.

2.15
It is easy to check the required conditions; therefore, the CMT leads to $\mathcal{L}(U_{3,T}, V_{3,T}) \to \mathcal{L}(H_3(X^{\{d-1\}}))$ and

2.17
The equality above is due to the $(d-1)$-times differentiability of $X^{\{d-1\}}$.

Testing for Unit Root
In the analysis of empirical financial data, the existence of a unit root is an important problem. However, as we saw in the previous section, the asymptotic results for unit root and near unit root processes are quite different (a drift term appears in the limiting process of the near unit root). Therefore, we consider the following testing problem against the local alternative hypothesis, where we assume $\sigma^2 = 1$ to identify the models. Let the statistic $S_{1,T}$ be constructed as in (2.3), and recall that the stated convergence holds under $H_0$ as $T \to \infty$. Since $\{\sum_{l=0}^{\infty}\alpha_l(\nu)\}^2$ and $\sum_{l=0}^{\infty}\alpha_l(\nu)^2$ are unknown, we construct a test statistic with $\hat u_{j,T} = x_{j,T} - x_{j-1,T}$. A nonparametric time-varying spectral density estimator $\hat f(u, \lambda)$ is given by a smoothed local periodogram, where $\lambda_l = (2\pi/T)l - \pi$, $l = 1, \ldots, T-1$, and $\mu_k = (2\pi/T)k - \pi$, $k = 1, \ldots, T-1$. Here, $I_N(u, \lambda)$ is the local periodogram around time $u$, where $\lfloor\cdot\rfloor$ denotes the Gauss symbol: for a real number $a$, $\lfloor a\rfloor$ is the greatest integer less than or equal to $a$. Furthermore, we employ kernel functions and bandwidth orders, for smoothing in the time and frequency domains respectively, that are optimal in the sense of minimizing the mean squared error of the nonparametric estimator (see Dahlhaus [6]); for simplicity we set the bandwidth proportionality constants equal to one. Then it can be established that the stated limit holds under $H_0$. We now have to deal with statistics for which numerical integration must be elaborated. Let $R$ be such a statistic, of the form $R = U/V$. Using Imhof's [7] formula gives the distribution function of $R$, where $\phi(s; x)$ is the characteristic function of $xV - U$. However, so far we do not have the explicit form of the distribution function of the estimator; therefore, we cannot perform a numerical experiment except in simple, clear-cut cases. The problem involves a complicated differential equation and requires a further paper for its solution.

Extensions of LSE
In this section, we consider extensions of the LSE $\hat\rho_T$ for the near random walk model $y_{j,T} = \rho_{j,T}\, y_{j-1,T} + u_{j,T}$, where $\{u_{j,T}\}$ is generated from the time-varying MA($\infty$) model in (A.1), $\rho_{j,T} = 1 - (1/T)\beta(j/T)$, $\beta(t) \in C[0,1]$, $y_{0,T} = \sqrt{T}\sigma Y_0$, and $Y_0 \sim N(\gamma_Y, \delta_Y)$ is independent of $\{\varepsilon_j\}$ and $X_0$. We then define a sequence $\{Y_T\}$ of partial sum processes in $C$, and we can obtain the stated limit. Integration by parts and Ito's formula lead to (A.15).

A.3. I d Process with Locally Stationary Disturbance
A.17

In the estimation of portfolios, it is natural to assume that the utility function depends on exogenous variables. From this point of view, in this paper we develop estimation under a utility function depending on exogenous variables. To estimate the optimal portfolio, we introduce a function of the moments of the return process and of the cumulants between the return process and the exogenous variables; this function is a generalized version of the portfolio weight function. First, assuming that the exogenous variable is a random process, we derive the asymptotic distribution of the sample version of the portfolio weight function, and we illuminate the influence of the exogenous variable on the return process when the exogenous variable has a shot noise in the frequency domain. Second, assuming that the exogenous variable is nonstochastic, we derive the asymptotic distribution of the sample version of the portfolio weight function, and we illuminate the influence of the exogenous variable on the return process when the exogenous variable has a harmonic trend. We also evaluate the influence of the exogenous variable on the return process numerically.

Numerical Studies for Stochastic Exogenous Variables
This subsection provides some numerical examples that show the influence of $Z_t$ on $\Omega_{ij}$.
Example 2.4. For a risk-free asset $X_0(t)$ and a risky asset $X(t)$, we consider the construction of optimal portfolios $\alpha X(t) + \alpha_0 X_0(t)$. Here $\{X(t)\}$ is the return process of the risky asset, generated by $X(t) = \theta X(t-1) + \varepsilon(t) + \mu_1$ (2.19), where $E\{\varepsilon(t)\} = 0$ and $\mathrm{Var}\{\varepsilon(t)\} = \sigma^2$. We assume that $X_0(t) = \mu$ and that the exogenous variable in the frequency domain is given by $Z(\lambda) = \delta(\lambda)$. We write down the covariances between $Z_t$ and $X_t$, which show the influence of $Z_t$ on $X_t$. From Figure 1, it is seen that as $\theta$ tends to 1 and $\lambda_{a_3}$ tends to 0, $\Omega_{33}$ increases. If $\theta$ tends to $-1$ and $\lambda_{a_3}$ tends to $\pm\pi$, then $\Omega_{33}$ also increases, which means that the exogenous variables have a large influence on the asymptotics of the estimators when $\theta$ is close to the unit root of the AR model (2.2).
Remark 2.5. $\Omega_{13}$ is robust to the shot noise in $Z_t$ at $\lambda = \lambda_{a_3}$.

Portfolio Estimation for Nonstochastic Exogenous Variables
So far we have assumed that the sequence of exogenous variables $\{Z_t\}$ is a stochastic process. In this section, assuming that $\{Z_t\}$ is a nonrandom sequence, we propose a portfolio estimator and elucidate its asymptotics. We introduce the quantities
$$A_{j,k} = \frac{\sum_{t=1}^{n} X_j(t)\,Z_k(t)}{\sqrt{n\sum_{t=1}^{n} Z_k(t)^2}}, \qquad
B_{j,m,k} = \frac{\sum_{t=1}^{n} X_j(t)\,X_m(t)\,Z_k(t)}{\sqrt{n\sum_{t=1}^{n} Z_k(t)^2}}.$$

3.1
We assume that the $Z_t$ satisfy Grenander's conditions (G1)-(G4) with $a^{(n)}_{j,k}(h) = \sum_{t=1}^{n-h} Z_j(t+h)\,Z_k(t)$ (3.2):

(G1) $\lim_{n \to \infty} a^{(n)}_{j,j}(0) = \infty$, $j = 1, \ldots, q$.
(G3) $a^{(n)}_{j,k}(h)/\{a^{(n)}_{j,j}(0)\,a^{(n)}_{k,k}(0)\}^{1/2} = \rho_{j,k}(h) + o(1/\sqrt{n})$ for $j, k = 1, \ldots, q$, $h \in \mathbb{Z}$.

Numerical Studies for Nonstochastic Exogenous Variables
Letting $\{X_t\}$ and $\{Z_t\}$ be scalar processes, we investigate the influence of the nonstochastic process $\{Z_t\}$ on $\{X_t\}$. The figures below show the influence of harmonic trends $\{Z_t\}$ on $V(j, m, k : j', m', k')$ in $\Omega_{j,m,k}$; in these cases $V(j, m, k : j', m', k')$ measures the amount of covariance between $XX$ and $Z$. Here the $\varepsilon_t$ are i.i.d. $N(0,1)$ variables. Next, suppose that $\{Z_t\}$ consists of harmonic trends with period $\mu$ and the quarter period. We plot the graph of $V(j, m, k : j', m', k')$ in Figure 2, which proves the asymptotic normality.

Introduction
The CAPM is one of the standard models for the pricing of risky assets in an equilibrium market and has been used for pricing individual stocks and portfolios. Markowitz [1] laid the groundwork for this model, casting the investor's portfolio selection problem in terms of expected return and variance. Sharpe [2] and Lintner [3] developed Markowitz's idea into its economic implications. Black [4] derived a more general version of the CAPM, in which the model is based on the excess of the asset return over the zero-beta return: $E[R_i] = E[R_{0m}] + \beta_{im}(E[R_m] - E[R_{0m}])$, where $R_i$ and $R_m$ are the returns of the $i$th asset and the market portfolio, and $R_{0m}$ is the return of the zero-beta portfolio associated with the market portfolio. Campbell et al. [5] discussed the estimation of the CAPM, but without the time dimension. In econometric analysis, however, it is necessary to investigate this model with the time dimension; that is, the model is represented as $R_{i,t} = \alpha_{im} + \beta_{im}R_{m,t} + \epsilon_{i,t}$. From recent empirical analysis, it is known that the return of an individual asset follows a short-memory process. But Granger [6] showed that the aggregation of short-memory processes yields long-memory dependence, and it is known that the return of the market portfolio follows a long-memory process. From this point of view, we first show that the return of the market portfolio and the error process $\epsilon_t$ are long-memory dependent and correlated with each other.
For the regression model, the most fundamental estimator is the ordinary least squares (OLS) estimator. However, dependence between the error process and the explanatory process makes this estimator inconsistent. To overcome this difficulty, the instrumental variable method was proposed, using instrumental variables that are uncorrelated with the error process but correlated with the explanatory variable. This method was first used by Wright [7] and developed by many researchers (see Reiersøl [8], Geary [9], etc.); comprehensive reviews are given in White [10]. However, the instrumental variable method has been discussed only in the case where the error process is not long-memory, and long memory makes the estimation difficult.
For the analysis of long-memory processes, Robinson and Hidalgo [11] considered the stochastic regression model $y_t = \alpha + \beta'x_t + u_t$, where $\alpha$ and $\beta = (\beta_1, \ldots, \beta_K)'$ are unknown parameters and the $K$-vector process $\{x_t\}$ and the error $\{u_t\}$ are long-memory dependent with $E[x_t] = 0$ and $E[u_t] = 0$. Furthermore, Choy and Taniguchi [12] considered the stochastic regression model $y_t = \beta x_t + u_t$, where $\{x_t\}$ and $\{u_t\}$ are stationary processes with $E[x_t] = \mu \ne 0$, and introduced a ratio estimator, the least squares estimator, and the best linear unbiased estimator for $\beta$. However, Robinson and Hidalgo [11] and Choy and Taniguchi [12] assume that the explanatory process $\{x_t\}$ and the error process $\{u_t\}$ are independent.
In this paper, using the instrumental variable method, we propose the two-stage least squares (2SLS) estimator for the CAPM in which the returns of the market portfolio and the error process are long-memory dependent and mutually correlated. We then prove its consistency and a central limit theorem under some conditions, and we provide some numerical studies.
This paper is organized as follows. Section 2 gives our definition of the CAPM and a sufficient condition under which the short-memory return of an asset is generated by the returns of the market portfolio and an error process that are long-memory dependent and mutually correlated. In Section 3 we propose the 2SLS estimator for this model and show its consistency and asymptotic normality. Section 4 provides numerical studies that show interesting features of our estimator. The proof of the theorem is relegated to Section 5.

The next example prepares the asymptotic variance formula (3.20) of $\hat B_{2SLS}$, whose features are investigated in the simulation study.

Numerical Studies
In this section, we numerically evaluate the behaviour of $\hat B_{2SLS}$ in the case $p = 1$ in (2.7).
Example 4.1. Under the conditions of Example 3.3, we investigate the asymptotic variance of $\hat B_{2SLS}$ by simulation. Figure 3 plots $V^*(d_X, d_Z)$ for $0 < d_X < 1/2$ and $0 < d_Z < 1/2$.
From Figure 3, we observe that if $d_Z \approx 0$ and $d_X \approx 1/2$, then $V^*$ becomes large, and otherwise $V^*$ is small. This implies that $V^*$ is large only when the long-memory behavior of $Z_t$ is weak and that of $X_t$ is strong. Note that long memory in $Z_t$ makes the asymptotic variance of the 2SLS estimator small, whereas long memory in $X_t$ makes it large. Next, consider the model displayed below, where $X_t$, $w_t$, and $u_t$ are scalar long-memory processes following FARIMA$(0, d_1, 0)$, FARIMA$(0, d_2, 0)$, and FARIMA$(0, 0.1, 0)$, respectively. Note that $Z_t$ and $\epsilon_t$ are correlated and $X_t$ and $Z_t$ are correlated, but $X_t$ and $\epsilon_t$ are independent. Under this model we compare $\hat B_{2SLS}$ (sketched below) with the ordinary least squares estimator $\hat B_{OLS}$ of $B$. The lengths of $X_t$, $Y_t$, and $Z_t$ are set to 100, and based on 5000 replications we report the mean square errors (MSE) of $\hat B_{2SLS}$ and $\hat B_{OLS}$. We set $d_1, d_2 = 0.1, 0.2, 0.3$ in Table 1.
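A minimal sketch of the 2SLS computation used in this comparison: regress $X_t$ on the instrument $Z_t$ and then regress $Y_t$ on the fitted values. The FARIMA simulation of the inputs is omitted, and the variable names are our own.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS estimate of B in y = B * x + eps, with instrument z
    (z correlated with x, uncorrelated with eps)."""
    Z = np.column_stack([np.ones_like(z), z])
    # Stage 1: project the explanatory variable on the instrument.
    g, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ g
    # Stage 2: OLS of y on the fitted explanatory variable.
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]   # slope estimate B_2SLS
```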
In most cases of $d_1$ and $d_2$ in Table 1, the MSE of $\hat B_{2SLS}$ is smaller than that of $\hat B_{OLS}$. Hence, from this example we can see that our estimator $\hat B_{2SLS}$ is better than $\hat B_{OLS}$ in the sense of MSE. Furthermore, from Table 1, we can see that the MSEs of $\hat B_{2SLS}$ and $\hat B_{OLS}$ increase as $d_2$ becomes large; that is, long-memory behavior of $w_t$ inflates the asymptotic variances of $\hat B_{2SLS}$ and $\hat B_{OLS}$. The results for real data are reported in Table 2. We chose the Nikkei stock average as the instrumental variable because a correlation analysis between the residual processes of the returns and the Nikkei supports the assumption $\mathrm{Cov}(X_t, \varepsilon_t) = 0$.
From Table 2, we observe that the return of the finance stock American Express is strongly correlated with that of S&P500 and the return of the auto industry stock Ford is negatively correlated with that of S&P500.

Proof of Theorem
This section provides the proof of Theorem 3.2. First, for convenience, we define $Z_t = (Z_{1,t}, \ldots, Z_{p,t})' \equiv \delta X_t$. Let $\hat u_t = (\hat u_{1,t}, \ldots, \hat u_{p,t})'$ be the residual from the OLS estimation of (3.1); that is, the displayed relation holds. The OLS makes this residual orthogonal to $X_t$: