Parameter Estimation in Mean Reversion Processes with Deterministic Long-Term Trend

This paper describes a two-phase procedure based on the maximum likelihood technique for estimating the parameters of mean reversion processes when the long-term trend is defined by a continuous deterministic function. In the first phase, closed formulas are obtained for estimators that depend on observations of discrete paths and on an estimate of the expected value of the process. In the second phase, a reestimation scheme is proposed for when a priori knowledge of the long-term trend exists. Some experimental results using simulated data sets are illustrated graphically.


Introduction
In the literature it is common to find studies and applications of mean reversion models, at both the practical and theoretical levels, especially in fields such as energy markets, commodities, and interest rates. In these models there is a long-term trend which acts as an attractor, making the process oscillate around it, and there is a random component that adds volatility to the movement. The specific characteristics of the models depend on the structure of the trend and the volatility of the process. Some common mean reversion models are those in which all the parameters of the process are constant, such as the Ornstein-Uhlenbeck model, the CIR model proposed by Cox et al. [1], and, in general, the models obtained from the CKLS structure proposed by Chan et al. [2]. In addition to these models there are others, less common in the literature, in which the reversion is determined not by a parameter but by a deterministic function or another stochastic process [3, 4]. In general, the fit of the models to specific situations is done by assigning values to the parameters and functions, either from a priori knowledge of the problem or from parameter estimation techniques based on historical data series that help to uncover unobservable information about the process.
One of the first difficulties in the application of parameter estimation techniques is that the information must have the same time resolution as specified in the model. Since stochastic differential equation models are formulated in continuous time, while the observed paths of the process can be obtained only in discrete time, it is necessary to discretize the SDE model. An approach frequently used for this purpose is the Euler-Maruyama scheme. With this methodology a new discrete-time process is obtained, with which it is possible to make inferences that are valid for the continuous-time model, as in the case of estimating the parameters.
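As an illustration, the Euler-Maruyama scheme applied to a mean reversion model of the form $dX_t = \alpha(m(t) - X_t)\,dt + \sigma X_t^{\beta}\,dW_t$ (the model studied later in the paper) can be sketched as follows; the parameter values and function names here are illustrative, not taken from the paper:

```python
import numpy as np

def simulate_path(alpha, sigma, beta, m, x0, n=250, dt=1/250, rng=None):
    """Euler-Maruyama discretization of dX_t = alpha*(m(t)-X_t)dt + sigma*X_t^beta dW_t."""
    rng = np.random.default_rng(rng)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        t = i * dt
        drift = alpha * (m(t) - x[i])
        diffusion = sigma * abs(x[i]) ** beta
        x[i + 1] = x[i] + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal()
    return x

# Illustrative sinusoidal long-term trend
m = lambda t: 1.0 + 0.4 * np.sin(2 * np.pi * t / 0.5)
path = simulate_path(alpha=5.0, sigma=0.2, beta=1.0, m=m, x0=1.0, rng=42)
```

Each step adds the deterministic pull toward $m(t)$ plus a Gaussian shock scaled by $\sqrt{\Delta}$, which is what makes the normality of the residuals exploitable later for maximum likelihood.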
There are different procedures that can be implemented, such as methods based on distributional moments [5], kernel methods [6, 7], least squares, and maximum likelihood [8-10]. Parameter estimation is also fundamental for modeling control systems. In the literature it is possible to find different methodologies for these issues, such as the one developed by Meng and Ding [11], where parameter estimation in an equation-error autoregressive (EEAR) system is done by transforming the model into an equivalent one without the autoregressive term; the parameters of the original model are then obtained using the principle of equivalence. Another methodology is proposed by Cao and Liu [12], where parameter estimation in a power system is carried out with a recursive procedure that estimates all the parameters simultaneously using techniques based on the hierarchical identification principle, gradient algorithms, and least squares algorithms. Complementarily, Ding et al. [13] used methods based on gradient algorithms and least squares iterative algorithms for system identification in output error (OE) and output error moving average (OEMA) systems, where the parameters that depend on unknown variables are computed using estimates of these unknown variables obtained from previously estimated parameters.
In the area of multirate system identification, Ding et al. [14, 15] used the polynomial transformation technique to deal with the identification problem for dual-rate stochastic systems, while Sahebsara et al. [16] discussed the parameter estimation of multirate multi-input multi-output systems. Also, Ding and Chen [17] presented combined parameter and state estimation algorithms of the lifted state-space models for general dual-rate systems based on the hierarchical identification method. Shi et al. [18] gave a crosstalk identification algorithm for multirate xDSL FIR systems.
In general, the auxiliary model identification idea has been used to solve the identification problem of dual-rate single-input single-output systems as shown in [19].
For more detailed information about the multi-innovation stochastic gradient algorithm for multi-input multi-output systems and for multirate multi-input systems, as well as auxiliary-model-based multi-innovation extended stochastic identification theory, see [20] and the references therein. We will concentrate on a one-factor stochastic model in which the parameters are estimated using maximum likelihood based on the discretized model when the long-term trend is given by a deterministic function. In this case we must estimate both the trend function and the parameters. To estimate the long-term trend function, convolution techniques and numerical differentiation are used, whereas the normality properties of the residuals resulting from the discretization are used for estimating the parameters by maximum likelihood; in this way it is possible to obtain closed formulas for the estimators based on the observations of a path of the process and the estimation of the long-term trend.
In the estimation process there may be some bias in the estimates with respect to the actual values, despite the fact that they are obtained by the method of maximum likelihood. When this situation occurs it is necessary to develop alternative methodologies that refine the initial estimates [21].
This paper is organized as follows. In Section 2 a description of mean reversion processes is presented for the case when the long-term trend is given by a continuous deterministic function. The first phase of the estimation technique is shown in Section 3. Section 4 presents some examples and preliminary results for the estimation technique, and in Section 5 (second phase) a procedure for reestimating the parameters from additional a priori knowledge is described.

Mean Reversion Processes with Deterministic Long-Term Trend
One-factor mean reversion processes with constant parameters and mean given by a deterministic function can be written as

$$dX_t = \alpha\,(m(t) - X_t)\,dt + \sigma X_t^{\beta}\,dW_t \tag{1}$$

with initial condition $X_0 = x_0$, where $\alpha > 0$, $\sigma > 0$, and $\beta \in [0, 3/2]$ are constants, $m(t)$ is a deterministic function, and $\{W_t\}_{t \geq 0}$ is a one-dimensional standard Brownian motion defined on a complete probability space $(\Omega, \mathcal{F}, P)$.
The parameter $\alpha$ is called the reversion rate, $m(t)$ is the mean reversion level or long-run equilibrium trend, $\sigma$ is the parameter associated with the volatility, and $\beta$ determines the sensitivity of the variance to the level of $X_t$.
Model (1) is a generalization of the CKLS models of Chan et al. [2], where the mean reversion level is a deterministic function that captures the trend of the process. In a general way, $m(t)$ plays the role of an attractor at each point $t$, in the sense that when $X_t > m(t)$ the trend term $\alpha(m(t) - X_t) < 0$ and therefore $X_t$ decreases, and when $X_t < m(t)$ a similar argument establishes that $X_t$ grows.
To establish the relation between the mean reversion function $m(t)$ and the expected value of the process $\mu(t) = E[X_t]$, we can consider (1) written in integral form as

$$X_t = X_0 + \int_0^t \alpha\,(m(s) - X_s)\,ds + \int_0^t \sigma X_s^{\beta}\,dW_s. \tag{2}$$

Taking expectations, the It\^o integral vanishes and we obtain the ordinary differential equation

$$\dot{\mu}(t) = \alpha\,(m(t) - \mu(t)), \qquad \mu(0) = E[X_0]. \tag{3}$$

The solution of this equation is

$$\mu(t) = e^{-\alpha t}\left(\mu(0) + \alpha \int_0^t e^{\alpha s} m(s)\,ds\right). \tag{4}$$

To illustrate graphically the relation between $m(t)$ and $\mu(t)$, we use $m(t) = M + A\sin(2\pi t/P)$ as an example. In this case the solution to (3) is given by

$$\mu(t) = M + \frac{\alpha A}{\alpha^2 + \omega^2}\left(\alpha\sin(\omega t) - \omega\cos(\omega t)\right) + \left(\mu(0) - M + \frac{\alpha A\,\omega}{\alpha^2 + \omega^2}\right)e^{-\alpha t},$$

where $\omega = 2\pi/P$.
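As a quick numerical check of the sinusoidal-trend case, one can compare the closed-form expected value against a direct forward-Euler integration of the ODE $\dot{\mu}(t) = \alpha(m(t) - \mu(t))$; all parameter values below are illustrative:

```python
import numpy as np

# Illustrative values: m(t) = M + A*sin(w*t), w = 2*pi/P
alpha, M, A, P, mu0 = 5.0, 1.0, 0.4, 0.5, 1.0
w = 2 * np.pi / P
m = lambda t: M + A * np.sin(w * t)

def mu_closed(t):
    """Closed-form solution of mu'(t) = alpha*(m(t) - mu(t)), mu(0) = mu0."""
    part = M + alpha * A / (alpha**2 + w**2) * (alpha * np.sin(w * t) - w * np.cos(w * t))
    part0 = M - alpha * A * w / (alpha**2 + w**2)   # particular solution at t = 0
    return part + (mu0 - part0) * np.exp(-alpha * t)

# Forward-Euler integration of the ODE for comparison
dt, n = 1e-4, 10000
mu = mu0
for i in range(n):
    mu += alpha * (m(i * dt) - mu) * dt
err = abs(mu - mu_closed(n * dt))
```

The transient term decays at rate $\alpha$, after which $\mu(t)$ follows the sinusoid with a reduced amplitude and a phase lag, which is the behavior visible in Figure 1.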
Figure 1 shows the dynamic behavior of $m(t)$, $\mu(t)$, and one path of $X_t$ for some specific values of the parameters.
From the figure it can be seen that the expected value of the process mimics the long-term mean: that is precisely how mean reversion works.

Phase 1: Parameter Estimation
Similar to the procedure established in Marín Sánchez and Palacio [10] and Huzurbazar [22], the main objective is to obtain closed formulas for the parameter estimators from discrete observations of a sample path of the process. This can be achieved from the discretized maximum likelihood function, which is obtained from the process that results from applying the Euler-Maruyama discretization scheme to (1) and using the normality and independent-increments properties of the Brownian motion on the residuals of the discrete process.
The residual $\varepsilon_i$ depends on the observations $x = (x_0, x_1, \ldots, x_n)'$ of the sample path of the process and on the unobservable quantity $\mu(t)$ and its derivative $\dot{\mu}(t)$. Taking into account that the sample path contains all the information that is known about the process, it is necessary to estimate the expected value $\mu(t)$ from those observations. As in Tifenbach [4], pp. 31-32, we can assume that the expected value of the process can be approximated by a convolution over the sample path, so that $\mu(t)$ will depend only on the observations $x = (x_0, x_1, \ldots, x_n)'$.
Thus, we assume that the expected value of the process $X_t$ can be approximated by taking a convolution of the sample path $x = (x_0, x_1, \ldots, x_n)$ for some particular convolution function $c(\cdot)$, given by

$$\hat{\mu}_{t_i} = \sum_{j=-k}^{k} c_j\, x_{i+j}.$$

The most common convolution is the moving average, where the constants $c_j$ are given by $c_j = 1/(2k+1)$ for all $j$; the resulting estimate is denoted by $\hat{\mu}^1_t$. Other convolutions can be defined by varying the relative weight of each observation: for example, the weights can be defined from the coefficients of the odd rows of Pascal's triangle, giving an estimate denoted by $\hat{\mu}^2_t$. In this case the central observations have a greater weight than more distant observations, and for this reason the convolution stays close to the path. These specific convolutions are presented in Figure 2.
The convolution is used to smooth the sample path, eliminating the noise and capturing the trend. This objective can also be achieved using other techniques such as filtering. In Figure 2 the Hodrick-Prescott [24] filter, denoted by $\hat{\mu}^{hp}_t$, was used to obtain an estimation of $\mu(t)$ that is smoother than the convolutions $\hat{\mu}^1_t$ and $\hat{\mu}^2_t$. At this point the expected value $\mu(t)$ can be approximated from the sample path $x = (x_0, x_1, \ldots, x_n)$. Now, to get the derivative of $\mu(t)$ we can apply numerical differentiation techniques based on Taylor's theorem: for example, the derivative at observation $i$ can be defined using a three-point rule as

$$\dot{\mu}_i = \frac{\mu_{i+1} - \mu_{i-1}}{2\Delta}$$

for $i = 1, 2, \ldots, n-1$. For $i = 0$ the derivative is given by $\dot{\mu}_0 = (\mu_1 - \mu_0)/\Delta$ and for $i = n$ by $\dot{\mu}_n = (\mu_n - \mu_{n-1})/\Delta$. This procedure can be carried out using a different number of points to calculate the derivative at each point.
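The smoothing and differentiation steps can be sketched as follows; this is a minimal implementation, and the edge handling by truncated, renormalized windows is one reasonable choice rather than necessarily the paper's:

```python
import numpy as np

def moving_average_weights(k):
    """Uniform convolution weights c_j = 1/(2k+1) for j = -k..k."""
    return np.full(2 * k + 1, 1.0 / (2 * k + 1))

def smooth(x, weights):
    """Convolve the sample path with the weights; near the ends the
    window is truncated and the remaining weights are renormalized."""
    k = (len(weights) - 1) // 2
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        w = weights[k - (i - lo): k + (hi - i)]
        out[i] = np.dot(w, x[lo:hi]) / w.sum()
    return out

def three_point_derivative(mu, dt):
    """Central differences inside, one-sided differences at the ends."""
    d = np.empty_like(mu)
    d[1:-1] = (mu[2:] - mu[:-2]) / (2 * dt)
    d[0] = (mu[1] - mu[0]) / dt
    d[-1] = (mu[-1] - mu[-2]) / dt
    return d
```

Swapping `moving_average_weights` for Pascal-triangle or other weights only changes the `weights` array; the smoothing and differentiation machinery stays the same.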
With the estimates of $\mu(t)$ and $\dot{\mu}(t)$ we have one realization of $\varepsilon_i$, and we can write down its normal density function. From (3), $m(t) = \mu(t) + \dot{\mu}(t)/\alpha$, so the Euler-Maruyama discretization of (1) yields residuals

$$\varepsilon_i = \frac{x_{i+1} - x_i - \left(\alpha(\mu_i - x_i) + \dot{\mu}_i\right)\Delta}{\sigma x_i^{\beta}\sqrt{\Delta}} \sim N(0, 1).$$

Since the $\varepsilon_i$ are independent, the joint density function can be expressed as the product of their marginal densities. Consequently, the likelihood function is given by

$$L(\alpha, \sigma) = \prod_{i=0}^{n-1} \frac{1}{\sqrt{2\pi\sigma^2\Delta}\, x_i^{\beta}} \exp\!\left(-\frac{\left(x_{i+1} - x_i - (\alpha(\mu_i - x_i) + \dot{\mu}_i)\Delta\right)^2}{2\sigma^2 x_i^{2\beta}\Delta}\right).$$

Taking the natural logarithm of both sides and maximizing, and assuming that the value of $\beta$ is known a priori, the estimation process leads to the estimator

$$\hat{\alpha} = \frac{\sum_{i=0}^{n-1} (x_{i+1} - x_i - \dot{\mu}_i\Delta)(\mu_i - x_i)\,x_i^{-2\beta}}{\Delta\sum_{i=0}^{n-1} (\mu_i - x_i)^2\, x_i^{-2\beta}}.$$

The estimator $\hat{\sigma}$ is determined by the equation

$$\hat{\sigma}^2 = \frac{1}{n\Delta}\sum_{i=0}^{n-1} \frac{\left(x_{i+1} - x_i - (\hat{\alpha}(\mu_i - x_i) + \dot{\mu}_i)\Delta\right)^2}{x_i^{2\beta}}.$$

Finally, with the estimate of $\alpha$ we can approximate $m(t)$ through the relation $m(t) = \mu(t) + \dot{\mu}(t)/\alpha$.
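A sketch of these closed-form estimators under the stated assumptions ($\beta$ known, estimates of $\mu(t)$ and its derivative supplied) might look like the following; the self-check uses a constant trend, for which $\mu(t) = m(t)$ and $\dot{\mu}(t) = 0$, and all numeric values are illustrative:

```python
import numpy as np

def estimate_alpha_sigma(x, mu, mu_dot, beta, dt):
    """ML estimators for alpha and sigma under the Euler-Maruyama
    discretization, with m(t) expressed as mu(t) + mu'(t)/alpha
    (a sketch; notation follows the text, not verbatim formulas)."""
    x0, x1 = x[:-1], x[1:]
    mu, mu_dot = mu[:-1], mu_dot[:-1]
    w = x0 ** (2 * beta)                 # variance scaling x_i^(2*beta)
    dxi = x1 - x0 - mu_dot * dt          # increment net of the trend derivative
    alpha_hat = np.sum(dxi * (mu - x0) / w) / (dt * np.sum((mu - x0) ** 2 / w))
    resid = x1 - x0 - (alpha_hat * (mu - x0) + mu_dot) * dt
    sigma_hat = np.sqrt(np.sum(resid ** 2 / w) / (len(x0) * dt))
    return alpha_hat, sigma_hat

# Self-check on a simulated path with constant trend m(t) = 1, so the
# true mu(t) = 1 and mu'(t) = 0 are supplied instead of estimated
rng = np.random.default_rng(0)
alpha, sigma, beta, dt, n = 5.0, 0.2, 1.0, 1 / 250, 5000
x = np.empty(n + 1)
x[0] = 1.0
for i in range(n):
    x[i + 1] = x[i] + alpha * (1.0 - x[i]) * dt \
        + sigma * abs(x[i]) ** beta * np.sqrt(dt) * rng.standard_normal()
a_hat, s_hat = estimate_alpha_sigma(x, np.ones(n + 1), np.zeros(n + 1), beta, dt)
```

As is typical for discretely observed diffusions, $\hat{\sigma}$ is very precise even on moderate samples, while $\hat{\alpha}$ carries much more sampling variability.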

Preliminary Results
The aim of this section is to show the performance of the estimators defined in the previous section. To do this we simulate some paths of a process satisfying (1) with a given set of parameters $\alpha$ and $\sigma$ and a given deterministic form for the $m(t)$ function, and then compute the estimators with the established procedure in order to determine, using statistical properties, whether the estimates are close to the actual parameters.
We analyze the performance of the estimators for sinusoidal and parabolic trend functions $m(t)$ and for a set of values of $\alpha$ and $\sigma$ when $\beta = 1$. Each generated path consists of 250 observations with $\Delta = 1/250$. The statistics are calculated over 1000 different simulations. Considering that the parameter estimation requires the expected value of the process from each path, we compare two different convolution functions and the Hodrick-Prescott filter. For the numerical differentiation a five-point rule is used.
The values of the parameters and the $m(t)$ functions used in the simulations are presented in Table 1.
In Table 2 we present some basic statistics (mean and standard deviation) for the estimators obtained from the simulations. The convolution functions used in the simulations are the ones defined by the weights $c^1$ and $c^3$. The $c^1$ weights are the same as defined in Section 3 for the moving average, and the $c^3$ weights are given by $c^3_j = (2k - |j|)/(k(3k+1))$ for $j$ from $-k$ to $k$ (so that the weights sum to one). In this convolution the central observation has a relative weight twice that of the most distant observation.
From Table 2 we can conclude that the estimation technique works well, especially in the first example (sinusoidal trend). The relative error of the estimators is less than 2% in the case of the parameter $\sigma$, but there is a bias in the parameter $\alpha$ that depends on the convolution function and, at the same time, on the window parameter $k$ used in that convolution.
In Figure 3 we can see the performance of the estimation of the deterministic function $m(t)$. In that figure an estimation from a single random path is shown instead of the average. The estimates are close to the theoretical function in both cases, but show some differences at the beginning and the end of the observations. One way to measure the fit between the theoretical function and its estimation is the root mean square (RMS) error. Table 3 presents that measure for the proposed estimation in cases 1 and 2.
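The RMS error between the theoretical trend and its estimate is a one-line computation, shown here for completeness:

```python
import numpy as np

def rms_error(m_true, m_est):
    """Root mean square error between the theoretical function and its estimate."""
    m_true, m_est = np.asarray(m_true, dtype=float), np.asarray(m_est, dtype=float)
    return np.sqrt(np.mean((m_true - m_est) ** 2))

e = rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # identical inputs -> 0
```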

Phase 2: Reestimation
From the simulations we can see that the estimate of $m(t)$ is not very accurate, that the differences come from $\hat{\mu}(t)$ and $\hat{\dot{\mu}}(t)$, and that this in turn affects the estimation of the other parameters.
If it were possible to observe all the paths of the process, no numerical approximation of the expected value or its derivative would be necessary to capture information about $m(t)$: it would be enough to average all the paths at each time $t_i$, which would generate fairly accurate estimates. As this condition is not usual, and only one path is available, the procedure detailed above is necessary, and this introduces the errors mentioned in the estimates of $\mu(t)$ and $\dot{\mu}(t)$.
To correct this deficiency we propose a reestimation procedure that adjusts a function to the initial estimate using a priori knowledge of the process. This recalculation allows a correction of the other estimators while providing a functional form that allows simulation of future paths of the process.
The following procedure is proposed to perform the reestimation of $m(t)$: (1) Get an estimate of $\hat{\mu}(t)$ according to the procedure described in Section 4.
(2) Define a functional structure $g(\Theta, t)$ from a priori knowledge of the process.
(3) Estimate the parameter vector $\Theta$ by minimizing the error function defined by

$$\mathrm{Error}(\Theta) = \sum_{i=0}^{n} \left(\hat{\mu}(t_i) - g(\Theta, t_i)\right)^2.$$

As an example, suppose the functional form of the long-term mean is defined as

$$g(\Theta, t) = a t^2 + b t + c,$$

where $\Theta = [a, b, c]'$. In this case the error function would be given by

$$\mathrm{Error}(\Theta) = \sum_{i=0}^{n} \left(\hat{\mu}(t_i) - a t_i^2 - b t_i - c\right)^2.$$

To estimate $\Theta$ it is necessary to solve the system of equations defined by $\nabla\,\mathrm{Error}(\Theta) = 0$, which is equivalent to the system of linear equations $A\Theta = b$ (the normal equations). A similar procedure can be implemented under the assumption that $g(\Theta, t) = M + A\sin(2\pi t/P)$, to find $\Theta = [M, A]'$ if $P$ is known. A comparison between $m(t)$, $m_{\mathrm{est}}(t)$, and $m_{\mathrm{rest}}(t) = g(\hat{\Theta}, t)$ is shown in Figure 4 for both cases, for a random path of the process defined with the same set of parameters as in Table 1.
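For the quadratic functional form, the reestimation step amounts to solving the normal equations directly; the perturbed test function and coefficient values below are illustrative:

```python
import numpy as np

def reestimate_trend(t, mu_hat):
    """Fit g(Theta, t) = a*t^2 + b*t + c to the initial estimate mu_hat
    by solving the normal equations (X'X) Theta = X' mu_hat."""
    X = np.column_stack([t ** 2, t, np.ones_like(t)])
    A = X.T @ X
    rhs = X.T @ mu_hat
    return np.linalg.solve(A, rhs)        # Theta = [a, b, c]

# A quadratic trend contaminated by a small oscillation, standing in
# for a noisy first-phase estimate of the long-term mean
t = np.linspace(0.0, 1.0, 251)
mu_noisy = 2 * t**2 - t + 0.5 + 0.01 * np.sin(40 * t)
theta = reestimate_trend(t, mu_noisy)
```

Because the fitted polynomial is low-order, the high-frequency contamination is largely filtered out and the recovered coefficients stay close to the underlying quadratic.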
Having a better estimate of $m(t)$ allows a reestimation of the parameters $\alpha$ and $\sigma$. To achieve this we perform a procedure similar to that proposed in Section 4, now defining the residuals directly in terms of $m_{\mathrm{rest}}(t) = g(\hat{\Theta}, t)$. Table 4 summarizes the results obtained from the first estimation and the reestimation procedure for the cases where $m(t)$ has a sinusoidal form and a parabolic form, with the other parameters defined as in Table 1.
From the results of the above example we can observe how the reestimation procedure reduces the bias in the initial estimation of $\alpha$ for both the sinusoidal and the parabolic case. The final bias in $\alpha$ is approximately 1.5% of the real value of the parameter, removing almost 90% of the error of the first estimation in the case of the sinusoidal function and nearly 60% for the parabolic function. The reestimates of $\sigma$ do not change significantly from the original estimates. For the case of $m(t)$, the RMS error is presented in Table 5 for a random path of the process. In both cases the error of the reestimation is less than 50% of the error present in the initial estimation.

Conclusions and Comments
This paper shows how, starting from a single path of a stochastic process, it is possible to find an approximation of the parameters that define the underlying dynamics. It is important to highlight that, for modeling the dynamic behavior of the prices of some commodities or of interest rates, only one realization of the process in discrete time is available, and it represents the only source of information for capturing the nature of the process. Therefore, it becomes very important to obtain and interpret both the visible and the hidden information found in the path in order to make projections better adjusted to the future behavior of the process being studied.
It is essential that a model that seeks to reproduce the behavior of a process generated in the real world incorporates a priori information provided by an expert in the topic, and that this additional information allows the model to better fit the process. This prior information is key in the development of the methodology proposed in this paper, because it allows us to adjust the initial estimates of the parameters, making the results obtained with the adjusted models more reliable. On the other hand, the proposed methodology yields closed formulas for the maximum likelihood estimators in mean reversion processes when the long-term trend is defined by a deterministic function, decreasing the computational effort and allowing a greater understanding of the procedures.

Figure 2: Dynamic behavior of the process $X_t$, the expected value $\mu(t)$, and three types of convolution. (a) Moving average ($\hat{\mu}^1_t$) with $k = 20$. (b) Pascal's weights ($\hat{\mu}^2_t$) with $k = 20$. (c) Hodrick-Prescott filter ($\hat{\mu}^{hp}_t$) with $\lambda_{hp} = 40000$. In all cases $\Delta = 1/250$ for a total of 250 observations, and the parameters of $X_t$ are the same as in Figure 1.

Table 1: Simulation parameters for the sinusoidal and parabolic trend functions.

Table 4: Estimated and reestimated parameters. The estimations are calculated with the Hodrick-Prescott filter ($\hat{\mu}^{hp}_t$). Results obtained with 1000 simulations of 250 observations each for cases 1 and 2 as defined in Table 1: case 1 with parameter values 30 and 0.4, case 2 with parameter values 30 and 0.13.

Table 5: Results of the RMS error for the estimation and reestimation of $m(t)$: case 1, sinusoidal trend; case 2, parabolic trend.