This paper describes a two-phase procedure based on the maximum likelihood technique for estimating the parameters in mean reversion processes when the long-term trend is defined by a continuous deterministic function. In the first phase, closed formulas are obtained for the estimators, which depend on observations of discrete paths and on an estimation of the expected value of the process. In the second phase, a reestimation scheme is proposed for when a priori knowledge of the long-term trend exists. Some experimental results using simulated data sets are graphically illustrated.
1. Introduction
In the literature it is common to find studies and applications of mean reversion models, at both practical and theoretical levels, especially in fields such as energy markets, commodities, and interest rates. In these models there is a long-term trend which acts as an attractor, making the process oscillate around it, and there is a random component that adds volatility to the movement. The specific characteristics of the models depend on the structure of the trend and the volatility of the process.
Some common models of mean reversion are those in which all the parameters in the process are constant, such as the Ornstein-Uhlenbeck model, the CIR model proposed by Cox et al. [1], and in general the models obtained from the CKLS structure proposed by Chan et al. [2]. In addition to these models there are others, less common in the literature, in which the reversion is determined not by a parameter but by a deterministic function or another stochastic process [3, 4]. In general, fitting the models to specific situations is done by assigning values to the parameters and functions, either from a priori knowledge of the problem or from parameter estimation techniques based on historical data series that help to detect unobservable information about the process.
One of the first difficulties in the application of parameter estimation techniques is that it is necessary to have information with the same time resolution that is specified in the model. Since stochastic differential equation models are formulated in continuous time, while the observed paths of the process can be obtained only in discrete time, it is necessary to discretize the SDE model. An approach frequently used for this purpose is the Euler-Maruyama scheme. With this methodology a new discrete-time process is obtained, with which it is possible to make inferences that are valid for the continuous-time model, as in the case of estimating the parameters.
There are different procedures that can be implemented, such as methods based on distributional moments [5], kernel methods [6, 7], least squares, and maximum likelihood [8–10]. Parameter estimation is also fundamental for modeling control systems. In the literature it is possible to find different methodologies for these problems, such as the one developed by Meng and Ding [11], where parameter estimation in an equation-error autoregressive (EEAR) system is done through a transformation of the model into an equivalent one that removes the autoregressive term, and the principle of equivalence is then used to obtain the parameters of the original model. Another methodology is proposed by Cao and Liu [12], where parameter estimation in a power system is carried out with a recursive procedure that estimates all the parameters simultaneously, using techniques based on the hierarchical identification principle together with gradient algorithms and least squares algorithms. Complementarily, Ding et al. [13] used methods based on gradient algorithms and least squares iterative algorithms for system identification in output error (OE) and output error moving average (OEMA) systems, where the parameters that depend on unknown variables are computed using estimates of these unknown variables obtained from previously estimated parameters.
In the area of multirate system identification, Ding et al. [14, 15] used the polynomial transformation technique to deal with the identification problem for dual-rate stochastic systems, while Sahebsara et al. [16] discussed the parameter estimation of multirate multi-input multioutput systems. Also, Ding and Chen [17] presented the combined parameter and state estimation algorithms of the lifted state-space models for general dual-rate systems based on the hierarchical identification method. Shi et al. [18] gave a crosstalk identification algorithm for multirate xDSL FIR systems.
In general, the auxiliary model identification idea has been used to solve the identification problem of dual-rate single-input single-output systems as shown in [19].
For more detailed information about the multi-innovation stochastic gradient algorithm for multi-input multioutput systems and for multirate multi-input systems, as well as auxiliary-model-based multi-innovation extended stochastic identification theory, see [20] and the references therein.

We will concentrate on a one-factor stochastic model in which the parameters are estimated using maximum likelihood based on the discretized model, when the long-term trend is given by a deterministic function. In this case we must estimate both the trend function and the parameters. To estimate the long-term trend function, some convolution techniques and numerical differentiation are used, whereas the normality properties of the residuals resulting from the discretization are used for estimating the parameters by maximum likelihood; in this way it is possible to obtain closed formulas for the estimators based on the observations of a path of the process and the estimation of the long-term trend.
In the estimation process there may be some bias in the estimates with respect to the actual values, despite the fact that they are obtained by the method of maximum likelihood. When this situation occurs, it is necessary to develop alternative methodologies that refine the initial estimates [21].
This paper is organized as follows. In Section 2 a description of mean reversion processes is presented for the case when the long-term trend is given by a continuous deterministic function. The first phase of the estimation technique is shown in Section 3. Section 4 presents some examples and preliminary results for the estimation technique, and in Section 5 (second phase) a procedure for reestimating the parameters from additional a priori knowledge is described.
2. Mean Reversion Processes with Deterministic Long-Term Trend
Mean reversion processes of one factor with constant parameters and mean given by a deterministic function can be written as

$$dX_t = \alpha\left(\mu(t) - X_t\right)dt + \sigma X_t^{\gamma}\,dB_t, \quad t \in [0,T], \tag{1}$$

with initial condition X_0 = x, where α > 0, σ > 0, and γ ∈ [0, 3/2] are constants, μ(t) is a deterministic function, and {B_t, t ≥ 0} is a one-dimensional standard Brownian motion defined on a complete probability space (Ω, F, P).
The parameter α is called the reversion rate, μ(t) is the mean reversion level or long-run equilibrium trend, σ is the parameter associated with the volatility, and γ determines the sensitivity of the variance to the level of X_t.
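For concreteness, model (1) can be simulated with the Euler-Maruyama scheme mentioned in the Introduction. The following minimal Python sketch is ours (the function name and defaults are assumptions, not the authors' code):

```python
import numpy as np

def simulate_path(mu, alpha, sigma, gamma, x0, n=250, dt=1/250, rng=None):
    """Euler-Maruyama discretization of dX = alpha*(mu(t) - X) dt + sigma * X^gamma dB."""
    rng = np.random.default_rng(rng)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        t = i * dt
        drift = alpha * (mu(t) - x[i]) * dt
        diffusion = sigma * x[i] ** gamma * np.sqrt(dt) * rng.standard_normal()
        x[i + 1] = x[i] + drift + diffusion
    return x

# One path with the parameters used in Figure 1
path = simulate_path(lambda t: 1.5 + np.sin(2 * np.pi * t),
                     alpha=37, sigma=0.7, gamma=1, x0=2)
```

As long as α·Δt is well below 1, the explicit scheme is stable and the simulated path oscillates around μ(t), as in Figure 1.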
Model (1) is a generalization of the CKLS models of Chan et al. [2], in which the mean reversion level is a deterministic function that captures the trend of the process. In general, μ(t) plays the role of an attractor at each point t, in the sense that when X_t > μ(t) the trend term α(μ(t) − X_t) < 0 and therefore X_t decreases, and when X_t < μ(t) a similar argument establishes that X_t grows.
To establish the relation between the mean reversion function μ(t) and the expected value of the process E[X_t], we can consider (1) written in integral form as

$$X_t - X_0 = \alpha \int_0^t \left(\mu(s) - X_s\right)ds + \sigma \int_0^t X_s^{\gamma}\,dB_s. \tag{2}$$
Taking the expected value, $E[X_t] - E[X_0] = \alpha \int_0^t \left(\mu(s) - E[X_s]\right)ds$, and thus we obtain the ordinary differential equation

$$\dot m(t) = \alpha\left(\mu(t) - m(t)\right), \qquad m(t) = E[X_t]. \tag{3}$$
The solution of this equation is $m(t) = m_0 e^{-\alpha t} + \alpha e^{-\alpha t}\int_0^t e^{\alpha s}\mu(s)\,ds$.
To illustrate graphically the relation between μ(t) and m(t), we use μ(t) = a + b sin(2πt/P) as an example. In this case the solution to (3) is given by

$$m(t) = a + \left(m_0 - a + \frac{\alpha\beta b}{\alpha^2+\beta^2}\right)e^{-\alpha t} + \frac{\alpha b}{\alpha^2+\beta^2}\left(\alpha\sin\beta t - \beta\cos\beta t\right), \tag{4}$$

where β = 2π/P.
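Formula (4) can be sanity-checked against a direct forward-Euler integration of the ODE (3); the sketch below (helper names are ours) assumes μ(t) = a + b sin(2πt/P):

```python
import numpy as np

def m_closed(t, m0, alpha, a, b, P):
    """Closed-form solution (4) of m' = alpha*(mu - m) with mu = a + b*sin(2*pi*t/P)."""
    beta = 2 * np.pi / P
    d = alpha ** 2 + beta ** 2
    return (a
            + (m0 - a + alpha * beta * b / d) * np.exp(-alpha * t)
            + (alpha * b / d) * (alpha * np.sin(beta * t) - beta * np.cos(beta * t)))

def m_euler(m0, alpha, a, b, P, T=1.0, n=20_000):
    """Forward-Euler integration of the ODE (3) up to time T."""
    dt = T / n
    m, t = m0, 0.0
    for _ in range(n):
        m += alpha * (a + b * np.sin(2 * np.pi * t / P) - m) * dt
        t += dt
    return m
```

With the parameters of Figure 1 (m_0 = 2, α = 37, a = 1.5, b = 1, P = 1), the two values agree closely at t = 1.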
Figure 1 shows the dynamic behavior of μ(t), m(t) and one path of Xt for some specific values in the parameters.
Figure 1: Dynamic behavior of m(t), μ(t), and X_t, where X_0 = 2, α = 37, σ = 0.7, γ = 1, a = 1.5, b = 1, P = 1, and β = 2π, with Δt = 1/250 for a total of 250 observations.
From the figure it can be seen that the expected value of the process mimics the long-term mean; that is just the way mean reversion works.
3. Phase 1: Parameter Estimation
Similar to the procedure established in Marín Sánchez and Palacio [10] and Huzurbazar [22], the main objective is to obtain closed formulas for the parameter estimators from discrete observations of a sample path of the process. This can be achieved from the discretized maximum likelihood function, which is obtained from the process resulting from applying the Euler-Maruyama discretization scheme to (1) and using the normality and independent-increment properties of the Brownian motion on the residuals of the discrete process.
Consider the differential equation (1) with the initial value X_0 = x. Here α(μ(t) − X_t) and σX_t^γ are, respectively, the trend and diffusion functions, which depend on t, X_t, and a vector of unknown parameters θ = (α, σ, γ). Now suppose that the conditions that guarantee the existence and uniqueness of the solution (Mao [23]) are satisfied, so the expected value of X_t exists and μ(t) can be defined using m(t) and its derivative ṁ(t) (see (3)). In this way,

$$\mu(t) = m(t) + \frac{\dot m(t)}{\alpha}. \tag{5}$$
Replacing μ(t) in (1) results in

$$dX_t = \alpha\left(m(t) + \frac{\dot m(t)}{\alpha} - X_t\right)dt + \sigma X_t^{\gamma}\,dB_t, \quad t \in [0,T]. \tag{6}$$
Now assume that we have discrete observations X_t = X(τ_t) at the times τ_0, τ_1, …, τ_T, where Δt = τ_t − τ_{t−1} ≥ 0, t = 1, 2, …, T. Using the Euler-Maruyama numerical scheme on (6), it follows that

$$X_{\tau_t} = X_{\tau_{t-1}} + \left[\alpha\left(m(\tau_{t-1}) - X_{\tau_{t-1}}\right) + \dot m(\tau_{t-1})\right]\Delta t + X_{\tau_{t-1}}^{\gamma}\,\varepsilon_{\tau_t}, \tag{7}$$

where ε_{τ_t} ∼ N(0, σ²Δt).
Define the new variable Y_t:

$$Y_t = \frac{X_t - X_{t-1} - \left[\alpha\left(m_{t-1} - X_{t-1}\right) + \dot m_{t-1}\right]\Delta}{X_{t-1}^{\gamma}}, \tag{8}$$

with Δ constant.
Note that Y_t, t = 1, 2, …, T, are independent random variables distributed N(0, σ²Δ).
The variable Y_t depends on the observations X = (X_0, X_1, …, X_T)′ of the sample path of the process and on the unobservable quantities m_t and its derivative ṁ_t. Taking into account that the sample path summarizes all the information that is known about the process, it is necessary to estimate the expected value m(t) from those observations. As in Tifenbach [4], pp. 31-32, we can assume that the expected value of the process can be approximated using a convolution^{1} over the sample path, so m(t) will depend only on the observations X = (X_0, X_1, …, X_T)′.
Thus, we assume that the expected value of the process X_t can be approximated by a convolution of the sample path X = (X_0, X_1, …, X_T) for some particular convolution function c(t), given by

$$m_i = \sum_{k=-N}^{N} c_k X_{i-k}. \tag{9}$$

The most common convolution is the moving average, where the constants c_k are given by c_k = 1/(2N+1) for all k, denoted by (ct1). Other convolutions can be defined by varying the relative weight of each observation: for example, they can be defined from the coefficients of odd rows of Pascal's triangle, denoted by (ct2). In this case the central observations have a greater weight than the more distant observations, so the convolution stays close to the path. These specific convolutions are presented in Figure 2.
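A minimal Python sketch of the discrete convolution (9). The Pascal weights below are our reading of "coefficients of odd rows of Pascal's triangle" (normalized binomial coefficients of row 2N), and the path is reflected at the edges, an assumption since the paper does not state its boundary treatment:

```python
from math import comb
import numpy as np

def moving_average_weights(N):
    """(ct1): uniform weights c_k = 1/(2N+1), k = -N..N."""
    return np.full(2 * N + 1, 1.0 / (2 * N + 1))

def pascal_weights(N):
    """(ct2): normalized binomial coefficients of Pascal row 2N (assumed reading)."""
    w = np.array([comb(2 * N, N + k) for k in range(-N, N + 1)], dtype=float)
    return w / w.sum()

def convolve_path(x, w):
    """m_i = sum_k c_k * X_{i-k}; edges handled by reflecting the path."""
    N = (len(w) - 1) // 2
    xp = np.pad(np.asarray(x, dtype=float), N, mode="reflect")
    return np.convolve(xp, w, mode="valid")
```

Both weight vectors are symmetric and sum to one, so the smoothed series has the same length and scale as the original path.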
Figure 2: Dynamic behavior of the process X_t, the expected value m(t), and three types of convolution. (a) Moving average (ct1) with N = 20. (b) Pascal's weights (ct2) with N = 20. (c) Hodrick-Prescott filter (cthp) with λ_hp = 40000. In all cases Δt = 1/250 for a total of 250 observations, and the parameters of X_t are the same as in Figure 1.
The convolution is used for the purpose of smoothing the sample path, eliminating the noise and capturing the trend. This objective can also be achieved using other techniques, such as filtering. In Figure 2 the Hodrick-Prescott filter [24], denoted by (cthp), was used to obtain an estimation of m(t) that is smoother than the convolutions (ct1) and (ct2).
At this point the expected value m(t) can be approximated from the sample path X = (X_0, X_1, …, X_T). Now, to obtain the derivative of m(t), we can apply numerical differentiation techniques based on Taylor's theorem: for example, the derivative at observation i can be defined using a three-point rule as

$$\dot m_i = \frac{2m_{i+1} - 3m_i + m_{i-1}}{\Delta}, \tag{10}$$

for i = 1, 2, …, T−1. For i = 0 the derivative is given by ṁ_0 = (m_1 − m_0)/Δ and for i = T by ṁ_T = (m_T − m_{T−1})/Δ. This procedure can be carried out using a different number of points to calculate the derivative at each point.
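The differentiation step can be sketched as follows; this helper (ours, not the authors' code) uses the standard central three-point rule in the interior together with the one-sided rules at the endpoints given above:

```python
import numpy as np

def three_point_derivative(m, dt):
    """Central differences in the interior, one-sided differences at the endpoints."""
    m = np.asarray(m, dtype=float)
    d = np.empty_like(m)
    d[1:-1] = (m[2:] - m[:-2]) / (2 * dt)  # interior: three-point central rule
    d[0] = (m[1] - m[0]) / dt              # left endpoint: forward difference
    d[-1] = (m[-1] - m[-2]) / dt           # right endpoint: backward difference
    return d
```

All three stencils are exact for linear functions, so a linear trend is differentiated without error.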
With the estimation of m(t) and ṁ(t) we have one realization of Y_t, and now we can define its normal density function:

$$f(Y_i; \theta) = \frac{1}{\left(2\pi\sigma^2\Delta\right)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2\Delta}\left[\frac{X_i - X_{i-1} - \left(\alpha\left(m_{i-1} - X_{i-1}\right) + \dot m_{i-1}\right)\Delta}{X_{i-1}^{\gamma}}\right]^{2}\right\}. \tag{11}$$
Since the Y_i are independent, the joint density function can be expressed as the product of the marginal densities:

$$f(Y_1, Y_2, \ldots, Y_T) = \prod_{k=1}^{T} f(Y_k; \theta). \tag{12}$$
Consequently, the likelihood function is given by

$$L(\theta \mid Y) = \frac{1}{\left(2\pi\sigma^2\Delta\right)^{T/2}} \exp\left\{-\frac{1}{2\sigma^2\Delta}\sum_{i=1}^{T}\left[\frac{X_i - X_{i-1} - \left(\alpha\left(m_{i-1} - X_{i-1}\right) + \dot m_{i-1}\right)\Delta}{X_{i-1}^{\gamma}}\right]^{2}\right\}. \tag{13}$$
Taking the natural logarithm on both sides, it follows that

$$\log L = -\frac{T}{2}\log\left(2\pi\sigma^2\Delta\right) - \frac{1}{2\sigma^2\Delta}\sum_{i=1}^{T}\left[\frac{X_i - X_{i-1} - \left(\alpha\left(m_{i-1} - X_{i-1}\right) + \dot m_{i-1}\right)\Delta}{X_{i-1}^{\gamma}}\right]^{2}. \tag{14}$$
Now consider the problem of maximizing the likelihood function, given by

$$\frac{\partial \log L}{\partial \theta} = \vec{0}, \quad \text{that is,} \quad \hat\theta = \arg\max_{\theta} \log L. \tag{15}$$
If we assume that the value of γ is known a priori, the estimation process leads to the estimator of α given by

$$\hat\alpha = \frac{\sum_{i=1}^{T} \left(X_i - X_{i-1} - \dot m_{i-1}\Delta\right)\left(m_{i-1} - X_{i-1}\right)/X_{i-1}^{2\gamma}}{\Delta \sum_{i=1}^{T} \left[\left(m_{i-1} - X_{i-1}\right)/X_{i-1}^{\gamma}\right]^{2}}. \tag{16}$$
The estimator σ̂ is determined by the equation

$$\hat\sigma = \sqrt{\frac{1}{T\Delta} \sum_{i=1}^{T} \left[\frac{X_i - X_{i-1} - \left(\hat\alpha\left(m_{i-1} - X_{i-1}\right) + \dot m_{i-1}\right)\Delta}{X_{i-1}^{\gamma}}\right]^{2}}. \tag{17}$$
Finally, with the estimate α̂ we can approximate μ̂(t) using (5).
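Under the assumptions of this section (γ known, m and ṁ already estimated from the path), the closed formulas (16) and (17) translate directly into code; this is a sketch, not the authors' implementation:

```python
import numpy as np

def estimate_parameters(x, m, mdot, dt, gamma=1.0):
    """Closed-form ML estimators (16)-(17) for alpha and sigma, gamma known."""
    x0, x1 = x[:-1], x[1:]        # X_{i-1} and X_i
    m0, md0 = m[:-1], mdot[:-1]   # m_{i-1} and mdot_{i-1}
    num = np.sum((x1 - x0 - md0 * dt) * (m0 - x0) / x0 ** (2 * gamma))
    den = np.sum(((m0 - x0) / x0 ** gamma) ** 2) * dt
    alpha = num / den                                   # formula (16)
    y = (x1 - x0 - (alpha * (m0 - x0) + md0) * dt) / x0 ** gamma
    sigma = np.sqrt(np.sum(y ** 2) / (len(y) * dt))     # formula (17)
    return alpha, sigma
```

On a simulated path with the true m(t) and ṁ(t) supplied, the estimates recover α and σ up to sampling error.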
4. Preliminary Results
The aim of this section is to show the performance of the estimators defined in the previous section. To do this we simulate some paths of a process satisfying (1) with a set of parameters α and σ, give a deterministic form to the function μ(t), and then compute the estimators with the established procedure in order to determine, using statistical properties, whether the estimates approximate the actual parameters.
We analyze the performance of the estimators for sinusoidal and parabolic trend functions μ(t) and for a set of values of α and σ when γ = 1. Each generated path consists of 250 observations with Δt = 1/250. The statistics are calculated over 1000 different simulations. Considering that the parameter estimation requires the expected value of the process to be estimated from each path, we compare two different convolution functions and the Hodrick-Prescott filter. For numerical differentiation a five-point rule is used.
The values of the parameters and μ(t) functions used in the simulations are presented in Table 1.
Table 1: Simulation parameters for sinusoidal and parabolic trend functions.

             Case 1              Case 2
α            30                  30
σ            0.4                 0.13
X_0          2                   1.5
μ(t)         1.5 + sin(2πt)      -0.5t² + t + 1.5
In Table 2 we present some basic statistics (mean θ̄ and standard deviation s_θ) for the estimators obtained from the simulations. The convolution functions used in the simulations are those defined by the weights ck1 and ck3. The ck1 weights are the moving-average weights defined in Section 3, and the ck3 weights are given by c_{k,N}^3 = (2N − |k|)/(N(3N+1)) for k from −N to N. In this convolution the central observation has a relative weight twice that of the most distant observation.
Table 2: Results obtained with 1000 simulations of 250 observations each. The real values of the parameters are shown in Table 1. For case 1 the convolutions use N = 22 and for case 2 N = 60. For both cases we set λ = 62500 in ckhp.

             μ1(t) (Case 1)                     μ2(t) (Case 2)
             ck1        ck3        ckhp         ck1        ck3        ckhp
ᾱ           24.8540    31.3347    32.8771      31.1160    35.8249    51.1093
s_α          3.2218     3.9947     5.1406       5.8254     6.6984    10.1414
σ̄           0.4002     0.3942     0.4023       0.1293     0.1281     0.1267
s_σ          0.0177     0.0174     0.0174       0.0060     0.0059     0.0060
From Table 2 we can conclude that the estimation technique works well, especially in the first example (sinusoidal trend). The relative error of the estimators is less than 2% in the case of the parameter σ, but there is a bias in the parameter α that depends on the convolution function and, at the same time, on the parameter N used in that convolution.
In Figure 3 we can see the performance of the estimation of the deterministic function μ(t). In that figure an estimation from a random path is shown instead of the average. The estimates are close to the theoretical function in both cases but show some differences at the beginning and the end of the observations. One way to measure the fit between the theoretical function and its estimation is the root mean square (RMS) error. Table 3 presents that measure for the proposed estimation in cases 1 and 2.
Table 3: RMS error for each estimation of μ1(t) and μ2(t).

RMS          Case 1    Case 2
ck1          0.0782    0.0266
ck3          0.0578    0.0238
ckhp         0.0767    0.0140
Figure 3: Estimated μ1(t) and μ2(t). The left side corresponds to Case 1 (sinusoidal) and the right side to Case 2 (parabolic). The (A) plots are obtained with the convolution (ck1), the (B) plots with (ck3), and the (C) plots with (ckhp). The parameters involved in this figure are presented in Tables 1 and 2.
5. Phase 2: Re-Estimation
From the simulations we can see that the estimate of μ(t) is not very accurate and that the differences come from the estimates of m(t) and ṁ(t), which in turn affects the estimation of the other parameters.
If it were possible to observe all the paths of the process, no numerical approximation of the expected value or its derivative would be needed to capture information about μ(t): averaging all the paths at each time t_i would generate fairly accurate estimations. Since this is not the usual situation, and only one path is available, the procedure detailed above is necessary, and this introduces the errors mentioned in the estimates of m(t) and ṁ(t).
To correct this deficiency, we propose a reestimation procedure that fits a function to the initial estimate using a priori knowledge of the process. This recalculation allows a correction of the other estimators while providing a functional form with which future paths of the process can be simulated.
The following procedure is proposed to perform the reestimation of μ(t):

(1) Get an estimate μ̂(t) according to the procedure described in Section 3.

(2) Define a functional structure f(Θ, t) from a priori knowledge of the process.

(3) Estimate the parameter vector Θ by minimizing the error function defined by

$$\mathrm{Error}(\Theta) = \sum_{t=1}^{T}\left(f(\Theta, t) - \hat\mu(t)\right)^{2}. \tag{18}$$
As an example, suppose the functional form of the long-term mean is defined as

$$f(\Theta, t) = at^2 + bt + c, \tag{19}$$

where Θ = (a, b, c)′. In this case the error function would be given by

$$\mathrm{Error}(\Theta) = \sum_{t=0}^{T}\left[\hat\mu(t) - \left(at^2 + bt + c\right)\right]^{2}. \tag{20}$$

To estimate Θ it is necessary to solve the system of equations defined by ∇Error(Θ) = 0, which is equivalent to the system of linear equations MΘ = d, where

$$M = \begin{pmatrix} \sum t^4 & \sum t^3 & \sum t^2 \\ \sum t^3 & \sum t^2 & \sum t \\ \sum t^2 & \sum t & T+1 \end{pmatrix}, \qquad d = \begin{pmatrix} \sum t^2 \hat\mu(t) \\ \sum t\, \hat\mu(t) \\ \sum \hat\mu(t) \end{pmatrix}. \tag{21}$$

A similar procedure can be implemented under the assumption that f(Θ, t) = a + b sin((2π/P)t), to find Θ = (a, b)′ if P is known. A comparison between μ(t), μ_est(t), and μ_rest(t) = f(Θ̂, t) is shown in Figure 4 for both cases, for a random path of the process defined with the same set of parameters as in Table 1.
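Since MΘ = d in (21) is just the normal-equation system of an ordinary least squares fit, the reestimation step can be sketched in a few lines of Python (the helper name is ours):

```python
import numpy as np

def refit_quadratic(t, mu_hat):
    """Solve the normal equations M*Theta = d of (20)-(21) for f(Theta,t) = a t^2 + b t + c."""
    A = np.vander(np.asarray(t, dtype=float), 3)  # columns: t^2, t, 1
    M = A.T @ A                                   # matrix M of (21)
    d = A.T @ np.asarray(mu_hat, dtype=float)     # right-hand side d of (21)
    return np.linalg.solve(M, d)                  # Theta = (a, b, c)
```

Applied to an exact parabola the coefficients are recovered exactly; applied to a noisy μ̂(t) it returns the least squares parabola.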
Figure 4: Comparison of the dynamic behavior of μ(t), μ_est(t), and μ_rest(t). In case (a) μ1(t) = 1.5 + sin(2πt) and in case (b) μ2(t) = −0.5t² + t + 1.5.
Having a better estimate of μ(t) allows a reestimation of the parameters α and σ. To achieve this we perform a procedure similar to that of Section 3, defining

$$Y_t = \frac{X_t - X_{t-1} - \alpha\left(\mu_{t-1} - X_{t-1}\right)\Delta}{X_{t-1}^{\gamma}}, \tag{22}$$

with Δ constant.
Table 4 summarizes the results obtained from the first estimation and the reestimation procedure for the cases where μ(t) has a sinusoidal form and a parabolic form and the other parameters are defined in Table 1.
Table 4: Estimated and reestimated parameters. The estimations are calculated with the convolution (ckhp). Results obtained with 1000 simulations of 250 observations each for μ1 and μ2 as defined in Table 1: case 1 α = 30 and σ = 0.4; case 2 α = 30 and σ = 0.13.

             μ1(t) (Case 1)               μ2(t) (Case 2)
             α̂ (est.)    α̂ (reest.)     α̂ (est.)    α̂ (reest.)
ᾱ           24.8540      29.6620         31.1160      30.4616
s_α           3.2218       3.4518          5.8254       5.6828

             σ̂ (est.)    σ̂ (reest.)     σ̂ (est.)    σ̂ (reest.)
σ̄            0.4002       0.4021          0.1293       0.1300
s_σ           0.0177       0.0187          0.0060       0.0060
From the results of the above example we can observe how the reestimation procedure reduces the bias in the initial estimation of α for both the sinusoidal and the parabolic case. The final bias in α is approximately 1.5% of the real value of the parameter, removing almost 90% of the error of the first estimation in the case of the sinusoidal function and near 60% for the parabolic function. The reestimations of σ do not change significantly from the original estimates. For μ(t) the RMS error is presented in Table 5 for a random path of the process. In both cases the error of the reestimation is less than 50% of the error of the initial estimation.
Table 5: RMS error for the estimation and reestimation of μ(t): case 1 sinusoidal trend, case 2 parabolic trend.

RMS            Case 1    Case 2
μ_est(t)       0.0961    0.0137
μ_rest(t)      0.0261    0.0058
6. Conclusions and Comments
This paper shows how, starting from a single path of a stochastic process, it is possible to find an approximation of the parameters that define the underlying dynamics. It is important to highlight that, for modeling the dynamic behavior of prices of some commodities or interest rates, only one realization of the process in discrete time is available, and it represents the only source of information for capturing the nature of the process. Therefore, it becomes very important to obtain and interpret both the visible and the invisible information found in the path, to make projections better adjusted to the future behavior of the process being studied.
It is essential that a model that seeks to reproduce the behavior of a process generated in the real world incorporate a priori information provided by an expert in the topic, and that this additional information allow the model to better fit the process. This prior information is key in the development of the methodology proposed in this paper, because it allows us to adjust the initial estimates of the parameters, making the results obtained with the adjusted models more reliable.
On the other hand, the proposed methodology allows us to obtain closed formulas for the maximum likelihood estimators in mean reversion processes when the long-term trend is defined by a deterministic function, decreasing the computational effort and allowing a greater understanding of the procedures.
Competing Interests
The authors declare that they have no competing interests.
Endnotes
1. A convolution function c(t) is an even, continuous, and real-valued function satisfying c(t) ≥ 0 and $\int_{-\infty}^{\infty} c(t)\,dt = 1$. A convolution of a curve s(t) is another function l(t) defined by

$$l(t) = \int_{-\infty}^{\infty} c(w)\,s(t-w)\,dw,$$

where c(t) is a convolution function. In the discrete case the convolution of two sequences can be written as

$$l_i = \sum_{k=-\infty}^{\infty} c_k\,s_{i-k}.$$
References

[1] J. C. Cox, J. Ingersoll, and S. Ross, "A theory of the term structure of interest rates."
[2] K. C. Chan, G. A. Karolyi, F. A. Longstaff, and A. B. Sanders, "An empirical comparison of alternative models of the short-term interest rate."
[3] D. Pilipovic.
[4] B. Tifenbach.
[5] T. Koulis and A. Thavaneswaran, "Inference for interest rate models using Milstein's approximation."
[6] D. Florens-Zmirou, "On estimating the diffusion coefficient from discrete observations."
[7] G. J. Jiang and J. L. Knight, "A nonparametric approach to the estimation of diffusion processes, with an application to a short-term interest rate model."
[8] K. B. Nowman, "Gaussian estimation of single-factor continuous time models of the term structure of interest rates."
[9] J. Yu and P. Phillips, "Corrigendum to 'A Gaussian approach for continuous time models of the short-term interest rate'."
[10] F. H. Marín Sánchez and J. S. Palacio, "Gaussian estimation of one-factor mean reversion processes."
[11] D. Meng and F. Ding, "Model equivalence-based identification algorithm for equation-error systems with colored noise."
[12] Y. Cao and Z. Liu, "Signal frequency and parameter estimation for power systems using the hierarchical identification principle."
[13] F. Ding, P. X. Liu, and G. Liu, "Gradient based and least-squares based iterative identification methods for OE and OEMA systems."
[14] F. Ding, P. X. Liu, and Y. Shi, "Convergence analysis of estimation algorithms for dual-rate stochastic systems."
[15] F. Ding, P. X. Liu, and H. Z. Yang, "Parameter identification and intersample output estimation for dual-rate systems."
[16] M. Sahebsara, T. Chen, and S. L. Shah, "Frequency-domain parameter estimation of general multi-rate systems."
[17] F. Ding and T. Chen, "Hierarchical identification of lifted state-space models for general dual-rate systems."
[18] Y. Shi, F. Ding, and T. Chen, "Multirate crosstalk identification in xDSL systems."
[19] F. Ding and T. Chen, "Combined parameter and output estimation of dual-rate systems using an auxiliary model."
[20] L. Han, J. Sheng, F. Ding, and Y. Shi, "Auxiliary model identification method for multirate multi-input systems based on least squares."
[21] J. Yu, "Bias in the estimation of the mean reversion parameter in continuous time models."
[22] V. S. Huzurbazar, "The likelihood equation, consistency and the maxima of the likelihood function."
[23] X. Mao.
[24] R. J. Hodrick and E. C. Prescott, "Postwar U.S. business cycles: an empirical investigation."