This article presents statistical inference methodology based on maximum likelihood for delay differential equation models in the univariate setting. Maximum likelihood inference is obtained for single and multiple unknown delay parameters as well as other parameters of interest that govern the trajectories of the delay differential equation models. The maximum likelihood estimator is obtained using adaptive grid and Newton-Raphson algorithms. In simulation studies, our methodology correctly estimates the delay parameters as well as other unknown parameters (such as the initial starting values) of the dynamical system. We also develop methodology to compute the information matrix and confidence intervals for all unknown parameters within the likelihood inferential framework. We present three illustrative examples related to biological systems. The computations were carried out with the help of the mathematical software MATLAB® 8.0 R2014b.
1. Introduction
Delay differential equations (DDEs) are widely used to model many real-life phenomena, especially in science and engineering. Examples include the modeling of the spread of infectious diseases, tumor growth, the growth of blood clots in the brain, population dynamics, traffic monitoring, and price fluctuations of commodities in economics; see [1–4]. A univariate delay differential equation model (DDEM) with multiple delays treats the real-valued observations, $y_i$, as noisy realizations from an underlying DDE:
$$y_i = x(t_i) + \epsilon_i, \quad i = 0, 1, 2, \ldots, n, \tag{1}$$
where the $\epsilon_i$ are errors assumed to arise from a noise distribution with zero mean and unknown standard deviation $\sigma > 0$. In (1), $x(t_i)$ is the solution, $x(t)$, of the DDE
$$\dot{x}(t) = f\left(t, x(t), z_1(t), z_2(t), \ldots, z_m(t), \theta\right) \tag{2}$$
evaluated at the $n+1$ time points $t_i$, $i = 0, 1, \ldots, n$; in (2), $z_j(t) = x(t - \tau_j)$, $j = 1, 2, \ldots, m$, is the $j$th delay term with delay parameter $\tau_j > 0$, and $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$ is a vector of other parameters of interest that govern the trajectories of the underlying DDE in (2). Equations (1) and (2) constitute a univariate DDEM in its most general form. In a DDEM, the parameters $\theta_r$ and $\tau_j$ are often unknown and have to be estimated from the observations $y_i$, $i = 0, 1, \ldots, n$.
Not many methods appear in the statistical literature on parameter estimation and inference for DDEMs. Among the statistical approaches that have been suggested, many involve restrictions on the form of DDEMs that are being investigated. When such restrictions are relaxed, high computational costs and challenges arise. Typically, further inferential procedures such as obtaining standard errors and confidence intervals associated with parameter estimates involve further computational costs and challenges. We give a brief review of these works and approaches that have been reported in the literature in the following paragraph.
Ellner et al. [5] estimate the derivative of a univariate DDEM, which is assumed to be in an additive form, using nonparametric smoothing. Subsequently, they infer the constant (single) delay parameter, $\tau$, by fitting a generalized additive model. Although this technique unifies previous works, it can be applied only to DDEMs that satisfy the assumed additive form. Wood [6] developed spline-based model fitting techniques for the case when the DDEMs are partially specified. The spline-based method involves high computational costs because cross-validation is used to select the smoothing coefficients associated with the penalty term as well as the unknown parameter estimates. A penalized semiparametric method is proposed by Wang and Cao [7], which involves maximizing an objective function consisting of two terms: a likelihood term and a penalty term that measures the discrepancy between an estimate of the derivative, $\dot{x}(t)$, and the right-hand side of the DDE in (2). The smoothing coefficients are selected, as in [6], via cross-validation, whereas standard errors of parameter estimates are obtained by bootstrapping. It follows that the method of [7], like that of [6], involves high computational costs. Further, Wang and Cao consider only univariate DDEMs with a single delay parameter. An estimation method based on Least Squares Support Vector Machines (LS-SVMs) for approximating constant as well as time-varying parameters in deterministic parameter-affine DDEMs is presented by Mehrkanoon et al. [8]. We note that Mehrkanoon et al. perform parameter estimation only; no standard errors of estimates or confidence intervals are reported. Further, only single delays (either constant or time varying) are considered in [8].
In this paper, we consider parameter estimation and inference for univariate DDEMs with multiple delays based on maximum likelihood. The method of maximum likelihood, as advocated by Fisher in his important papers [9, 10], has become one of the most significant tools for estimation and inference available to statisticians. Maximum likelihood estimators (MLEs) are well defined once a distributional model is specified for the observations. MLEs have well-behaved and well-understood properties: Huber [11] presents general conditions under which the MLE is consistent for the true value of the unknown parameters for large sample sizes. Wald [12] and Akaike [13] observed that the maximum likelihood estimator is a natural estimator for the parameters when the true distribution is unknown. The large sample theory and distributional properties of MLEs can be used to perform subsequent inference procedures, such as obtaining standard errors and confidence intervals and performing tests of hypotheses, at minimal additional computational cost. MLEs are also the basic estimators used in subsequent statistical inferential procedures such as model selection using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and other model selection criteria. Model selection is an important issue in DDEMs, such as for the partially specified DDEMs in [6], where several models can be elicited for an observed physical process, but one model needs to be selected among many that fits the observed data and is simple enough to understand (Occam's razor principle).
The MLE can be developed for a large variety of estimation situations and, under standard regularity conditions, is asymptotically efficient, which means that for large samples it produces the most precise estimates compared to non-MLE based methods (such as [8]). These are the reasons why we prefer the MLE over other estimators for DDEMs in this paper.
The remainder of this paper is organized as follows. We define univariate DDEMs in Section 2. In Section 3, the MLE approach for DDEMs is outlined and the MLE is obtained computationally using an adaptive grid procedure followed by a gradient-based quasi-Newton algorithm. We also develop algorithms for obtaining the information matrix and construct standard errors and confidence intervals for the unknown parameters. In Section 4, three examples of univariate DDEMs related to biological systems are presented, together with numerical solutions and simulation results obtained with the proposed methodology.
2. General Model Formulation
Recall the DDEM defined by (1) and (2). The observation $y_i \in \mathbb{R}$ is obtained at the $i$th sampled time point, $t_i$, with $T_0 = t_0 < \cdots < t_n = T_1$, where $y_i = x(t_i) + \epsilon_i$, $i = 0, 1, 2, \ldots, n$. In the remainder of this paper, the errors are assumed to be independent and identically distributed according to a normal distribution with mean zero and unknown standard deviation $\sigma > 0$, that is, $\epsilon_i \sim N(0, \sigma^2)$. The underlying dynamical system $x(t)$, $t \in [T_0, T_1] \subset \mathbb{R}$, is expressed implicitly in terms of the DDE. The general form of the DDE with multiple delays for $x(t) \in \mathbb{R}$ is given by (2) as $\dot{x}(t) = f(t, x(t), z_1(t), z_2(t), \ldots, z_m(t), \theta)$, where $z_j(t) = x(t - \tau_j)$ for $t - \tau_j \geq 0$, $j = 1, 2, \ldots, m$, $f$ is a real-valued (one-dimensional) function, and $\dot{x}(t)$ denotes the first derivative of $x(t)$ with respect to time $t$. The quantities $\theta \in \mathbb{R}^p$ and $\tau = (\tau_1, \tau_2, \ldots, \tau_m)$ are unknown parameters of the DDEM, where $\theta$ is a vector of unknown parameters of dimension $p$ and $\tau$ is a vector of time delays of dimension $m$. The complete trajectory of $x(t)$ on $[T_0, T_1]$ is determined by (2) together with an initial condition function $\varphi : [t_0 - \max\{\tau_1, \tau_2, \ldots, \tau_m\}, t_0] \to \mathbb{R}$, where $\varphi(t) = a$ for all $t \in [t_0 - \max\{\tau_1, \tau_2, \ldots, \tau_m\}, t_0]$; the constant $a$ is also unknown, in addition to the unknown parameters $\theta$ and $\tau$. For given values of $\theta$, $\tau$, and $a$, the solution $x(t)$ is a function of $\theta$, $\tau$, and $a$; we make this dependence explicit by writing $x(t) = x(t, \theta, \tau, a)$. The observations $y_0, y_1, \ldots, y_n$ are collected at the $(n+1)$ sampled time points $T_0 = t_0 < t_1 < \cdots < t_n = T_1$, based on the observational model (1). Our goal is to estimate $\theta$, $\tau$, and $a$ based on the observations $y_0, y_1, \ldots, y_n$. Note that $a$ appears as a nuisance parameter, since the properties of the dynamical system are governed by $\theta$ and $\tau$ and not by $a$.
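As a minimal sketch of how data from a DDEM of the form (1) and (2) might be simulated, the Python code below integrates the DDE with a fixed-step forward-Euler scheme, using the constant history $x(t) = a$ for $t \leq t_0 = 0$, and then adds Gaussian noise. The function names `solve_dde` and `simulate_ddem`, the Euler scheme, the linear interpolation of delayed values, and the illustrative right-hand side $f = \theta z_1$ are all assumptions made for this sketch; they are not the solver used in the paper.

```python
import random


def solve_dde(f, theta, taus, a, t_end, dt):
    """Forward-Euler approximation of x'(t) = f(t, x(t), z, theta), where
    z[j] = x(t - taus[j]) and x(t) = a for all t <= 0 (constant history)."""
    xs = [a]  # xs[k] approximates x(k * dt)

    def past(t):
        # Delayed state: constant history before t = 0, linear interpolation after.
        if t <= 0.0:
            return a
        pos = t / dt
        k = int(pos)
        if k >= len(xs) - 1:
            return xs[-1]
        w = pos - k
        return (1.0 - w) * xs[k] + w * xs[k + 1]

    for k in range(round(t_end / dt)):
        t = k * dt
        z = [past(t - tau) for tau in taus]
        xs.append(xs[-1] + dt * f(t, xs[-1], z, theta))
    return xs


def simulate_ddem(f, theta, taus, a, sigma, t_end, h, substeps, seed=0):
    """Return (x(t_i), y_i) on the grid t_i = i * h, with y_i = x(t_i) + eps_i
    and eps_i ~ N(0, sigma^2), as in the observational model (1)."""
    rng = random.Random(seed)
    xs = solve_dde(f, theta, taus, a, t_end, h / substeps)
    x_obs = xs[::substeps]  # keep only the sampled time points
    y = [x + rng.gauss(0.0, sigma) for x in x_obs]
    return x_obs, y


# Hypothetical right-hand side with one delay: f = theta_1 * x(t - tau_1).
f_exp = lambda t, x, z, theta: theta[0] * z[0]
x_obs, y = simulate_ddem(f_exp, [0.5], [1.0], a=5.0, sigma=0.1,
                         t_end=10.0, h=0.1, substeps=10)
```

With a single delay $\tau = 1$, the delayed term equals $a$ on $[0, 1]$, so the solution is linear there and the Euler scheme reproduces it up to rounding; this gives a simple sanity check on the solver.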
3. The Maximum Likelihood Estimation Approach for DDEMs
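The likelihood of the DDEM for parameters $\theta$, $\tau$, and $a$, given observations $y_0, y_1, \ldots, y_n$, is
$$L(\theta, \tau, a \mid y) = \prod_{i=0}^{n} p(y_i \mid \theta, \tau, a), \tag{3}$$
where $y = (y_0, y_1, \ldots, y_n)$ is the collection of all $(n+1)$ observations, with density
$$p(y_i \mid \theta, \tau, a) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left(y_i - x(t_i, \theta, \tau, a)\right)^2\right\}, \tag{4}$$
based on the normality assumption on the $\epsilon_i$ in (1). The above expression for the likelihood can be simplified to
$$L(\theta, \tau, a \mid y) = \prod_{i=0}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left(y_i - x(t_i, \theta, \tau, a)\right)^2\right\} = \left(2\pi\sigma^2\right)^{-(n+1)/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)^2\right\}. \tag{5}$$
We assume for the moment that $\sigma > 0$ is fixed and known; the case when $\sigma > 0$ is unknown is dealt with later. Thus, the above likelihood is taken to be a function of $(\theta, \tau, a)$ for now. The usual practice for statistical inference is to use the natural logarithm of the likelihood function, namely, the log-likelihood function, which is given by
$$l(\theta, \tau, a \mid y) = \sum_{i=0}^{n} \ln p(y_i \mid \theta, \tau, a) = -\frac{n+1}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)^2. \tag{6}$$
Expressions for the log-likelihood, $l$, are often simpler than those for the likelihood function, $L$, since they are easier to differentiate and the results are more stable computationally.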
Since $\ln(x)$ is a monotonically increasing function of $x$, maximizing (3) and maximizing (6) are equivalent in that the same optimal parameter value is found. We denote the MLE of $(\theta, \tau, a)$ by $(\hat{\theta}, \hat{\tau}, \hat{a})$. The MLE $(\hat{\theta}, \hat{\tau}, \hat{a})$ is a point estimate such that
$$(\hat{\theta}, \hat{\tau}, \hat{a}) = \arg\max_{\theta, \tau, a} l(\theta, \tau, a) \tag{7}$$
and can be viewed as a random vector depending on the distribution of the data, $y = (y_0, y_1, \ldots, y_n)$. We now consider the case when $\sigma^2$ is unknown. The MLE of $\sigma^2$ is denoted by $\hat{\sigma}^2$. After finding $(\hat{\theta}, \hat{\tau}, \hat{a})$ as in (7), the log-likelihood in (6) is maximized as a function of $\sigma^2$. The resulting estimate is available in closed form and is given by
$$\hat{\sigma}^2 = \frac{1}{n+1}\sum_{i=0}^{n}\left(y_i - x(t_i, \hat{\theta}, \hat{\tau}, \hat{a})\right)^2. \tag{8}$$
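The log-likelihood (6) and the closed-form variance estimate (8) translate directly into code once the model values $x(t_i, \theta, \tau, a)$ are available. The following Python sketch takes those values as a plain list; the function names and the three-point toy data are hypothetical and serve only to illustrate the formulas.

```python
import math


def log_likelihood(y, x, sigma2):
    """Gaussian log-likelihood (6) for observations y and model values x."""
    n_plus_1 = len(y)
    rss = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return -0.5 * n_plus_1 * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)


def sigma2_mle(y, x):
    """Closed-form maximiser (8) of the log-likelihood in sigma^2."""
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(y)


# Toy data (hypothetical): three observations and the matching model values.
y_toy = [1.0, 2.0, 4.0]
x_toy = [1.5, 2.0, 3.0]
s2_hat = sigma2_mle(y_toy, x_toy)
```

Evaluating `log_likelihood` at values of `sigma2` slightly above and below `s2_hat` gives smaller values, in line with (8) being the maximiser for fixed $(\theta, \tau, a)$.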
3.1. A Two-Stage Numerical Procedure for Finding the MLE
To find the MLE numerically, we develop a two-stage numerical procedure consisting of an adaptive grid procedure followed by a gradient-based (quasi-Newton) algorithm. Two stages are needed because we wish to exploit the advantages of each algorithm while avoiding the drawbacks of the other. A grid algorithm is able to find the global maximum of a function over a grid space: it first evaluates the function on the grid space and then finds the grid value that corresponds to the maximum. Provided the grid is fine enough, the grid value corresponding to this maximum will be close to the domain value that actually attains the global maximum. So, by gridding, we can ensure that we are close to the global maximum. The adaptive grid algorithm enhances the basic gridding algorithm so that we move closer and closer to the global maximum. However, the main drawback of any grid (and adaptive grid) algorithm is its slow convergence.
On the other hand, gradient-based algorithms can converge to a maximum of a function quickly. Their main drawback is that they find the local maximum nearest to the starting point. So, if the starting point is not close to the global maximum, a gradient-based algorithm does not guarantee that the global maximum is found, since it might get "stuck" at a local maximum.
The function to maximize in our case is the log-likelihood in (6), and the domain value corresponding to its maximum is the MLE. Thus, our two-stage method uses the adaptive grid in the first stage to ensure that we are close to the MLE and then switches to the quasi-Newton algorithm to ensure rapid convergence to the MLE.
3.1.1. Grid Procedure
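To find $\hat{\theta}$ and $\hat{\tau}$ as in (7), the value with the largest (log) likelihood should be chosen. This can be done by an adaptive grid procedure. The gridding is carried out for $\theta$ and $\tau$, and for each pair $(\theta, \tau)$ in the grid space, a Newton-Raphson numerical procedure is used to find the maximizing value of $a$, defined as
$$\hat{a} = \hat{a}(\theta, \tau) = \arg\max_{a} l(\theta, \tau, a). \tag{9}$$
We use the grid space $\Theta = \{(\theta_r, \tau_s),\ r = 1, 2, \ldots, R,\ s = 1, 2, \ldots, S\}$, which covers $RS$ values of $(\theta_r, \tau_s)$. For every fixed value of $\theta_r$ and $\tau_s$ in $\Theta$, we find the MLE of $a$, $\hat{a}(\theta, \tau)$, by maximizing the log-likelihood above. Since $\hat{a}(\theta, \tau)$ satisfies
$$\frac{\partial l(\theta, \tau, \hat{a}(\theta, \tau))}{\partial a} = 0, \tag{10}$$
the numerical problem is solved by using the Newton-Raphson method:
$$a_{h+1}(\theta, \tau) = a_h(\theta, \tau) - \frac{\partial l(\theta, \tau, a_h(\theta, \tau))/\partial a}{\partial^2 l(\theta, \tau, a_h(\theta, \tau))/\partial a^2}, \tag{11}$$
where
$$\frac{\partial l(\theta, \tau, a)}{\partial a} = \frac{1}{\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)\frac{\partial x(t_i, \theta, \tau, a)}{\partial a}, \tag{12}$$
$$\frac{\partial^2 l(\theta, \tau, a)}{\partial a^2} = \frac{1}{\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)\frac{\partial^2 x(t_i, \theta, \tau, a)}{\partial a^2} - \frac{1}{\sigma^2}\sum_{i=0}^{n}\left(\frac{\partial x(t_i, \theta, \tau, a)}{\partial a}\right)^2; \tag{13}$$
in (12) and (13), $\partial x(t_i, \theta, \tau, a)/\partial a$ and $\partial^2 x(t_i, \theta, \tau, a)/\partial a^2$ are, respectively, the first and second partial derivatives of $x(t, \theta, \tau, a)$ with respect to $a$, evaluated at $t = t_i$ for $i = 0, 1, 2, \ldots, n$. As seen from (12) and (13), we need to calculate $\partial x/\partial a$ and $\partial^2 x/\partial a^2$ at each $t = t_i$, $i = 0, 1, 2, \ldots, n$. This is done recursively as follows.
The first derivative process $\partial x/\partial a$ is obtained by differentiating (2) with respect to $a$, where $\theta$ and $a$ are independent of each other. This gives
$$\frac{\partial \dot{x}}{\partial a} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial a} + \sum_{j=1}^{m}\frac{\partial f}{\partial z_j}\frac{\partial z_j}{\partial a}, \tag{14}$$
which implies that the first derivative process $\partial x/\partial a$ satisfies another DDE, given by (14).
Similarly, the second derivative process $\partial^2 x/\partial a^2$ is obtained by differentiating (14) with respect to $a$, to obtain
$$\frac{\partial^2 \dot{x}}{\partial a^2} = \frac{\partial^2 f}{\partial x^2}\left(\frac{\partial x}{\partial a}\right)^2 + \sum_{j=1}^{m}\sum_{k=1}^{m}\frac{\partial^2 f}{\partial z_j \partial z_k}\frac{\partial z_j}{\partial a}\frac{\partial z_k}{\partial a} + 2\sum_{j=1}^{m}\frac{\partial^2 f}{\partial x \partial z_j}\frac{\partial x}{\partial a}\frac{\partial z_j}{\partial a} + \frac{\partial^2 x}{\partial a^2}\frac{\partial f}{\partial x} + \sum_{j=1}^{m}\frac{\partial^2 z_j}{\partial a^2}\frac{\partial f}{\partial z_j}, \tag{15}$$
which implies that $\partial^2 x/\partial a^2$ satisfies a DDE depending on $\partial x/\partial a$. The above two DDEs can be solved numerically based on initial conditions that are specified below.
To obtain the initial conditions of the first and second derivative processes, we note that
$$x(t) = a \quad \text{for } t \in (-\infty, t_0]. \tag{16}$$
Thus, $\partial x/\partial a = 1$ and $\partial^2 x/\partial a^2 = 0$ for $t \in (-\infty, t_0]$. Subsequently, we obtain the values of $\partial l(\theta, \tau, a)/\partial a$ and $\partial^2 l(\theta, \tau, a)/\partial a^2$ by numerically solving the DDEs (14) and (15) at every $t_i$ and substituting the results into (12) and (13). For the numerical solution, we divide each interval $[t_{i-1}, t_i]$ into $M$ equal segments, where $M$ is a natural number.
The grid algorithm operates on the grid space of $(\theta, \tau)$, and the Newton-Raphson procedure is nested within the grid algorithm. Thus, for every grid value pair $(\theta_r, \tau_s)$, the Newton-Raphson procedure uses these values of $(\theta, \tau)$ to find $a$ via (11). On convergence of the Newton-Raphson method, we obtain the MLE of $a$, $\hat{a}(\theta_r, \tau_s)$, for each grid point $(\theta_r, \tau_s)$ in $\Theta$. The log-likelihood $l(\theta, \tau, a)$ is calculated based on (6) using $x_i = x(t_i, \theta_r, \tau_s, \hat{a}(\theta_r, \tau_s))$. The grid-based maximum is then found by maximizing $l(\theta_r, \tau_s, \hat{a}(\theta_r, \tau_s))$ as a function of $(\theta_r, \tau_s)$. We define the MLE obtained from the gridding algorithm as
$$(\hat{\theta}_G, \hat{\tau}_G) = \arg\max_{r, s} l\left(\theta_r, \tau_s, \hat{a}(\theta_r, \tau_s)\right). \tag{17}$$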
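To make the Newton-Raphson update (11) concrete, the following Python sketch applies it to the single-delay equation $\dot{x}(t) = \theta x(t - \tau)$ (the form used later in Example 1). For this particular right-hand side, (14) reduces to the same DDE with history $1$, and (15) with zero initial condition gives $\partial^2 x/\partial a^2 \equiv 0$, so only the second sum in (13) survives. The forward-Euler solver, the function names, and the noise-free test data are assumptions made for illustration; the parameter `M` plays the role of the number of subdivisions of each $[t_{i-1}, t_i]$.

```python
def solve_exp_dde(theta, tau, history, t_end, dt):
    """Forward-Euler solution of x'(t) = theta * x(t - tau), with
    x(t) = history for t <= 0. Requires tau > 0."""
    xs = [history]
    lag = tau / dt  # delay measured in solver steps (may be fractional)
    for k in range(round(t_end / dt)):
        pos = k - lag
        if pos <= 0:
            z = history
        else:
            j = int(pos)
            w = pos - j
            z = (1.0 - w) * xs[j] + w * xs[j + 1]  # linear interpolation
        xs.append(xs[-1] + dt * theta * z)
    return xs


def newton_a(y, theta, tau, a0, sigma2, t_end, h, M, tol=1e-10, max_iter=20):
    """Newton-Raphson iteration (11) for a at fixed (theta, tau)."""
    dt = h / M
    # Sensitivity dx/da: same DDE with history 1 (from (14) and (16)).
    s = solve_exp_dde(theta, tau, 1.0, t_end, dt)[::M]
    a = a0
    for _ in range(max_iter):
        x = solve_exp_dde(theta, tau, a, t_end, dt)[::M]
        d1 = sum((yi - xi) * si for yi, xi, si in zip(y, x, s)) / sigma2  # (12)
        d2 = -sum(si * si for si in s) / sigma2  # (13) with d2x/da2 = 0
        a_new = a - d1 / d2
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    return a


# Noise-free observations generated with a = 5 (hypothetical test case).
y_clean = solve_exp_dde(0.5, 1.0, 5.0, 10.0, 0.01)[::10]
a_hat = newton_a(y_clean, 0.5, 1.0, a0=1.0, sigma2=0.01, t_end=10.0, h=0.1, M=10)
```

Because $x$ is linear in $a$ for this equation, the log-likelihood is quadratic in $a$ and the iteration reaches the maximiser after a single update; for nonlinear right-hand sides such as (41), several iterations and a numerical solution of (15) would be needed.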
3.1.2. The Adaptive Grid Procedure
The adaptive grid (AG) algorithm is a repeated application of the generic grid procedure over increasingly finer intervals for (θ,τ). The AG algorithm is as follows:
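(1) Choose an initial grid space $\Theta^{(0)}$ consisting of the grid points $(\theta_r^{(0)}, \tau_s^{(0)})$, $r = 1, 2, \ldots, R$ and $s = 1, 2, \ldots, S$.
(2) Maximize $l(\theta_r, \tau_s, \hat{a}(\theta_r, \tau_s))$ with respect to $r$ and $s$ as described in the grid procedure above.
(3) Obtain $(\hat{\theta}_G^{(0)}, \hat{\tau}_G^{(0)})$ as in (17).
(4) Refine the grid: suppose $(\theta_{r_0}^{(0)}, \tau_{s_0}^{(0)}) \equiv (\hat{\theta}_G^{(0)}, \hat{\tau}_G^{(0)})$ as in step (3). The new grid space $\Theta^{(1)}$ has lower and upper $\theta$-grid points given by $(\theta_{r_0-1}^{(0)}, \theta_{r_0+1}^{(0)})$. The corresponding lower and upper $\tau$-grid points are $(\tau_{s_0-1}^{(0)}, \tau_{s_0+1}^{(0)})$. If either the lower or the upper bound does not exist (that is, the maximizer lies on the boundary of the grid), then the original grid space is enlarged so that the maximizer occurs in the interior of $\Theta^{(0)}$.
(5) Repeat steps (2)–(4) to obtain $(\hat{\theta}_G^{(1)}, \hat{\tau}_G^{(1)})$ based on the generic grid procedure. Repeat to generate the sequence $(\hat{\theta}_G^{(k)}, \hat{\tau}_G^{(k)})$, $k = 0, 1, 2, \ldots$. Stop at $k^*$ when $\left\|(\hat{\theta}_G^{(k^*)}, \hat{\tau}_G^{(k^*)}) - (\hat{\theta}_G^{(k^*-1)}, \hat{\tau}_G^{(k^*-1)})\right\| < \delta$, a prespecified threshold.
The final MLE based on the adaptive grid technique is
$$(\hat{\theta}_{0,MLE}, \hat{\tau}_{0,MLE}) = (\hat{\theta}_G^{(k^*)}, \hat{\tau}_G^{(k^*)}). \tag{18}$$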
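The adaptive grid steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the objective is a hypothetical smooth stand-in for the profile log-likelihood $l(\theta, \tau, \hat{a}(\theta, \tau))$, the stopping rule uses the maximum norm, and a maximizer found on the edge of the grid is handled by recentring a box of the same width on it (one simple way of moving the maximizer into the interior).

```python
def adaptive_grid(objective, lo, hi, points=9, delta=1e-4, max_rounds=50):
    """Maximise objective(theta, tau) by repeated grid refinement.
    lo and hi hold the lower and upper bounds of (theta, tau)."""
    lo, hi = list(lo), list(hi)
    prev = None
    for _ in range(max_rounds):
        # Equally spaced grid points along each axis.
        axes = [[l + i * (u - l) / (points - 1) for i in range(points)]
                for l, u in zip(lo, hi)]
        # Grid maximiser, as in (17).
        _, r0, s0 = max((objective(th, ta), r, s)
                        for r, th in enumerate(axes[0])
                        for s, ta in enumerate(axes[1]))
        cur = (axes[0][r0], axes[1][s0])
        # Stop when successive grid maximisers differ by less than delta.
        if prev is not None and max(abs(c - p) for c, p in zip(cur, prev)) < delta:
            return cur
        for d, idx in enumerate((r0, s0)):
            step = axes[d][1] - axes[d][0]
            on_edge = idx in (0, points - 1)
            # Interior: shrink to the neighbouring grid points.
            # Edge: keep the width and recentre on the maximiser.
            half = (hi[d] - lo[d]) / 2.0 if on_edge else step
            lo[d], hi[d] = cur[d] - half, cur[d] + half
        prev = cur
    return prev


# Hypothetical objective with its maximum at (0.53, 1.07).
toy = lambda theta, tau: -(theta - 0.53) ** 2 - (tau - 1.07) ** 2
theta_g, tau_g = adaptive_grid(toy, lo=(0.1, 0.6), hi=(0.9, 1.4))
```

Note that the stopping rule compares successive maximizers only, so the returned point is accurate to roughly the current grid spacing; this is why the paper follows the grid stage with a gradient-based refinement.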
Remark 1.
In step (1), the initial grid space $\Theta^{(0)}$ is chosen to be a large domain that is likely to contain the MLE. In our simulation experiments, since the true values of $(\theta, \tau)$ are known, the domain is selected around these true values. In practice, we need to carry out an exhaustive search within the upper and lower bounds of $\theta$ and $\tau$. If the parameters are positive, as is usually the case, the lower bounds can be taken to be zero. Next, we can consider large positive numbers, say $B$ and $C$, and construct the grid in $[0, B]^p \times [0, C]^m$ consisting of $H$ equidistant marginal grid points. The value of $H$ need not be too large, since we only aim to explore the log-likelihood profile. The log-likelihood can be evaluated at these grid points and plotted to visualize properties of the resulting surface. Depending on this plot, we can choose either to fix or to increase $B$ and $C$ until we are certain that the MLE is within the selected domain.
After obtaining the first-stage approximation to the MLE by the adaptive grid procedure above, we use the MATLAB function fminunc to obtain the final MLE by minimizing the negative log-likelihood function, viewed as a function of the unknown parameter vector $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_J) = (\theta, \tau)$. We have $J = p + m$, and the final-stage MLE is defined as
$$\Gamma_{MLE} = \arg\min_{\Gamma}\left\{-l_0(\Gamma)\right\}, \tag{19}$$
where $l_0(\Gamma) = l_0(\theta, \tau) = l(\theta, \tau, \hat{a}(\theta, \tau))$ with $\hat{a}(\theta, \tau)$ defined in (9). This MATLAB function requires the gradient vector as input, which is given by
$$\nabla l_0(\Gamma) = \left(\frac{\partial l_0(\Gamma)}{\partial \Gamma_1}, \frac{\partial l_0(\Gamma)}{\partial \Gamma_2}, \ldots, \frac{\partial l_0(\Gamma)}{\partial \Gamma_J}\right)^{T}. \tag{20}$$
The explicit expression of each entry of $\nabla l_0(\Gamma)$ is provided in Appendix A. The MATLAB function fminunc uses, as an option, a quasi-Newton procedure that does not require the calculation of second derivatives and hence saves computational time.
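3.2. Statistical Inference Based on MLEs
3.2.1. Information Matrix
Now we incorporate $\sigma^2$ into the estimation procedure as well. Once $(\hat{\theta}_{MLE}, \hat{\tau}_{MLE}, \hat{a}_{MLE})$ is obtained by the above two-stage procedure, the MLE of $\sigma^2$ is obtained analytically as
$$\hat{\sigma}^2_{MLE} = \frac{1}{n+1}\sum_{i=0}^{n}\left(y_i - x(t_i, \hat{\theta}_{MLE}, \hat{\tau}_{MLE}, \hat{a}_{MLE})\right)^2. \tag{21}$$
Let $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_K) = (\theta, \tau, a, \sigma^2)$ denote the $K \times 1$ vector of all unknown parameters (including $\sigma^2$), where $K = J + 2 = p + m + 2$. Subsequent inference based on the MLEs requires the computation of the Fisher information [14, 15]. The Fisher information matrix $I(\Gamma)$ is the $K \times K$ symmetric matrix whose $(u, v)$th element is the covariance between the $u$th and $v$th first partial derivatives of the log-likelihood:
$$[I(\Gamma)]_{u,v} = \mathrm{Cov}\left(\frac{\partial l(\Gamma \mid y)}{\partial \Gamma_u}, \frac{\partial l(\Gamma \mid y)}{\partial \Gamma_v}\right). \tag{22}$$
In terms of the expected values of the second partial derivatives, the Fisher information matrix in (22) is equivalent to
$$[I(\Gamma)]_{u,v} = -E\left[\frac{\partial^2 l(\Gamma \mid y)}{\partial \Gamma_u \partial \Gamma_v}\right], \quad 1 \leq u, v \leq K. \tag{23}$$
The observed Fisher information matrix is simply $I(\hat{\Gamma}_{MLE})$, the information matrix evaluated at the maximum likelihood estimate, $\hat{\Gamma}_{MLE}$, of $\Gamma$. Further, its inverse evaluated at the MLE is an estimate of the asymptotic covariance matrix of $\hat{\Gamma}_{MLE}$, which is given by
$$\mathrm{COV}(\hat{\Gamma}_{MLE}) = \left[I(\hat{\Gamma}_{MLE})\right]^{-1}. \tag{24}$$
Since the log-likelihood function is given by
$$l(\theta, \tau, a, \sigma^2 \mid y) = -\frac{n+1}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)^2, \tag{25}$$
the first-order partial derivative of $l(\theta, \tau, a, \sigma^2 \mid y)$ with respect to each element of $\Gamma$ other than $\sigma^2$ is given by
$$\frac{\partial l(\theta, \tau, a, \sigma^2 \mid y)}{\partial \Gamma_u} = \frac{1}{\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_u} \tag{26}$$
for $1 \leq u \leq K - 1$ (that is, for the components of $(\theta, \tau, a)$; the last component, $\Gamma_K = \sigma^2$, is treated separately below). The second-order partial derivatives are
$$\frac{\partial^2 l(\theta, \tau, a, \sigma^2 \mid y)}{\partial \Gamma_u \partial \Gamma_v} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)\frac{\partial^2 x(t_i, \theta, \tau, a)}{\partial \Gamma_u \partial \Gamma_v} - \sum_{i=0}^{n}\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_u}\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_v}\right] \tag{27}$$
for $1 \leq u, v \leq K - 1$,
$$\frac{\partial^2 l}{\partial \Gamma_u \partial \sigma^2} = -\frac{1}{(\sigma^2)^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_u} \tag{28}$$
for $1 \leq u \leq K - 1$, and
$$\frac{\partial^2 l}{\partial (\sigma^2)^2} = \frac{1}{2\sigma^4}\left[(n+1) - \frac{2}{\sigma^2}\sum_{i=0}^{n}\left(y_i - x(t_i, \theta, \tau, a)\right)^2\right]. \tag{29}$$
Taking expectations on both sides of the above equations and using (23), we get
$$[I(\Gamma)]_{u,v} = -E\left[\frac{\partial^2 l(\Gamma \mid y)}{\partial \Gamma_u \partial \Gamma_v}\right] = \frac{1}{\sigma^2}\sum_{i=0}^{n}\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_u}\frac{\partial x(t_i, \theta, \tau, a)}{\partial \Gamma_v} \tag{30}$$
for $1 \leq u, v \leq K - 1$,
$$[I(\Gamma)]_{u,\sigma^2} = -E\left[\frac{\partial^2 l(\Gamma \mid y)}{\partial \Gamma_u \partial \sigma^2}\right] = 0 \tag{31}$$
for $1 \leq u \leq K - 1$, and
$$[I(\Gamma)]_{\sigma^2,\sigma^2} = -E\left[\frac{\partial^2 l(\Gamma \mid y)}{\partial (\sigma^2)^2}\right] = \frac{n+1}{2\sigma^4}. \tag{32}$$
We compute each element $(u, v)$ of the matrix in (23) for DDEMs with single and multiple delays; the explicit expressions are given in Appendices A and B.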
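The passage from (27)–(29) to (30)–(32) rests on two moment identities implied by the normal error model (1). The short derivation below spells them out; it is a worked restatement of that step and uses nothing beyond the assumption that the $\epsilon_i$ are independent $N(0, \sigma^2)$ errors.

```latex
\begin{align*}
E\left[y_i - x(t_i,\theta,\tau,a)\right] &= E[\epsilon_i] = 0, \\
E\left[\sum_{i=0}^{n}\left(y_i - x(t_i,\theta,\tau,a)\right)^2\right]
  &= \sum_{i=0}^{n} E\left[\epsilon_i^2\right] = (n+1)\sigma^2, \\
-E\left[\frac{\partial^2 l}{\partial(\sigma^2)^2}\right]
  &= -\frac{1}{2\sigma^4}\left[(n+1) - \frac{2}{\sigma^2}(n+1)\sigma^2\right]
   = \frac{n+1}{2\sigma^4}.
\end{align*}
```

The first identity removes the term containing second derivatives of $x$ in (27) and the whole of (28), which yields (30) and (31); the second identity yields (32).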
3.2.2. Confidence Intervals
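A confidence interval for an unknown parameter gives a range of values that covers the true value of the parameter with high probability. The standard form of a confidence interval is
$$\text{estimate} \pm \text{margin of error}. \tag{33}$$
To construct a level $C$ confidence interval for any element of $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_K) = (\theta, \tau, a, \sigma^2)$, say, $\Gamma_u$, for $1 \leq u \leq K$, we need to find an estimate of the margin of error. First, the estimated standard error of the maximum likelihood estimate, $\hat{\Gamma}_{u,MLE}$, of $\Gamma_u$ is given by
$$\mathrm{SE}(\hat{\Gamma}_{u,MLE}) = \sqrt{\left[\mathrm{COV}(\hat{\Gamma}_{MLE})\right]_{u,u}}, \tag{34}$$
where $\mathrm{COV}(\hat{\Gamma}_{MLE})$ is the covariance matrix given in (24). The explicit terms of the covariance matrix can be obtained by substituting (30) into (24). The confidence interval for $\Gamma_u$ is
$$\hat{\Gamma}_{u,MLE} \pm z_{\alpha/2}\,\mathrm{SE}(\hat{\Gamma}_{u,MLE}), \tag{35}$$
where $z_{\alpha/2} = \text{norminv}(1 - \alpha/2)$ is the upper $\alpha/2$ quantile of the standard normal distribution, $\alpha = 0.05$ is the desired significance level (corresponding to $C = 95\%$), and $z_{\alpha/2}\,\mathrm{SE}(\hat{\Gamma}_{u,MLE})$ is the margin of error. We can find confidence intervals for all components of $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_K) = (\theta, \tau, a, \sigma^2)$ in this way. In some cases, the estimated confidence interval in (35) may include negative values, which is unreasonable for a parameter that is known to be positive. In this case, we perform a logarithmic transformation of the parameter, construct the confidence interval for the log-transformed parameter, and then transform the confidence interval back to the original parameter space. By the delta method, the standard error of $\log \hat{\Gamma}_{u,MLE}$ is approximately $\mathrm{SE}(\hat{\Gamma}_{u,MLE})/\hat{\Gamma}_{u,MLE}$, so the confidence interval for $\Gamma_u$ based on this log-transformation procedure is
$$\exp\left\{\log \hat{\Gamma}_{u,MLE} \pm z_{\alpha/2}\frac{\mathrm{SE}(\hat{\Gamma}_{u,MLE})}{\hat{\Gamma}_{u,MLE}}\right\}. \tag{36}$$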
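The two interval constructions (35) and (36) can be sketched in a few lines of Python. The function names are hypothetical, and `statistics.NormalDist().inv_cdf` is used here in place of the MATLAB function norminv. The numerical inputs are the estimate and variance of $\tau$ reported for Example 1 in Section 4 ($\hat{\tau}_{MLE} = 1.0000017$ and variance $4.3309 \times 10^{-4}$).

```python
import math
from statistics import NormalDist


def wald_ci(estimate, variance, alpha=0.05):
    """Interval (35): estimate +/- z_{alpha/2} * SE."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * math.sqrt(variance)
    return estimate - margin, estimate + margin


def log_ci(estimate, variance, alpha=0.05):
    """Interval (36) for a positive parameter, built on the log scale."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(variance) / estimate
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


tau_lo, tau_hi = wald_ci(1.0000017, 4.3309e-4)
log_lo, log_hi = log_ci(1.0000017, 4.3309e-4)
```

The Wald interval agrees to four decimal places with the interval $(0.9592, 1.0408)$ reported for $\tau$ in Example 1. The log-scale interval is always strictly positive and is slightly asymmetric around the estimate.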
4. Examples
We present three examples of DDEMs in the univariate case: two models with a single delay and a third one with two delays.
4.1. Example 1
We consider the exponential delay differential equation model (EDDEM) with a single delay (i.e., $p = 1$, $m = 1$), which is the solution to the DDE
$$\dot{x}(t) = \theta x(t - \tau). \tag{37}$$
The EDDEM in (37) is a model for ideal population growth under infinite resources and no deaths, such as a protozoan or bacterial culture dividing under constant environmental conditions. The delay parameter $\tau$ can be taken to represent the gestation period or maturity period, that is, the time taken for individuals to be ready for division. The parameter $\theta$ represents the growth rate of the population. We numerically solve the DDE in (37) using the MATLAB function dde23 with the parameters $(\theta, \tau, a, \sigma^2)$ fixed at $(0.5, 1, 5, 0.01)$. Sampled observations from the DDEM as in (1) were obtained at discrete time intervals of width $h = 0.1$ starting from $t_0 = 0$. The endpoint considered is $t_n = 10$, corresponding to $n + 1 = 101$. The aim is to estimate $\theta$, $\tau$, $a$, and $\sigma^2$ based on $y_0, y_1, \ldots, y_n$. Figure 1(a) illustrates the different behaviour of $x(t)$ under different parameter specifications. Figure 1(b) shows the underlying trajectory of the solution $x(t)$ from the DDE model (37) and the $n + 1$ sampled observations.
Figure 1(a): Numerical solution of the EDDEM (37) using the fixed parameter values on $[0, 100]$ with step size $h = 0.1$, $\theta = 0.5$, and $\tau = 1$ or $\tau = 1.5$.
Figure 1(b): Numerical solution of the EDDEM (37) using the estimated parameter values. The stars are the simulated noisy data from (1) at $n + 1 = 101$ equally spaced time points in $[0, 10]$ with step size $h = 0.1$, $\theta = 0.5$, and $\tau = 1$.
The initial grid space for the adaptive grid procedure was taken to be $\Theta^{(0)} = \{(\theta_r, \tau_s) : \theta_r = \theta_0 + rh,\ \tau_s = \tau_0 + sh,\ r = 1, 2, \ldots, R;\ s = 1, 2, \ldots, S\}$ with $(\theta_0, \tau_0) = (0.1, 0.6)$, $h = 0.1$, and $R = S = 9$; see the remark given towards the end of this section regarding the selection of $\Theta^{(0)}$ for this example as well as for the rest. The stopping threshold $\delta$ was chosen to be $0.0001$. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 1 and 2. Recall that $M$ is the number of subdivisions of each interval $[t_{i-1}, t_i]$ needed for calculating the quantities $\partial x(t_i, \theta, \tau, a)/\partial a$ and $\partial^2 x(t_i, \theta, \tau, a)/\partial a^2$; see Section 3.1.1. As $M$ increases, the MLEs become more accurate, but at the cost of increased computational time. We note that, for $M = 50$ or $M = 100$, satisfactory results are already achieved in terms of closeness to the true parameter values $(\theta, \tau, a, \sigma^2) = (0.5, 1, 5, 0.01)$. Subsequently, $M = 100$ is used for finding the maximum of the log-likelihood function, for finding the information matrix, and for computing the confidence intervals of the parameters.
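Table 1: Values of Max.Val($l$), $\hat{a}(\hat{\theta}, \hat{\tau})$, $(\hat{\theta}, \hat{\tau})$, and $\hat{\sigma}^2$ for the EDDEM obtained by the adaptive grid procedure. Here, the number of equally spaced time points in $[0, 10]$ is $n + 1 = 101$ with $h = 0.1$.

| $M$ | Max.Val($l$) | $\hat{a}(\hat{\theta}, \hat{\tau})$ | $(\hat{\theta}, \hat{\tau})$ | $\hat{\sigma}^2$ |
|---|---|---|---|---|
| 10 | -0.5817 | 5.0181 | (0.5000, 1.0000) | 0.0114 |
| 20 | -0.5037 | 5.0091 | (0.5000, 1.0000) | 0.0099 |
| 30 | -0.4880 | 5.0061 | (0.5000, 1.0000) | 0.0096 |
| 40 | -0.4821 | 5.0046 | (0.5000, 1.0000) | 0.0095 |
| 50 | -0.4792 | 5.0037 | (0.5000, 1.0000) | 0.0094 |
| 100 | -0.4748 | 5.0019 | (0.5000, 1.0000) | 0.0093 |
| 1000 | -0.4724 | 5.0002 | (0.5000, 1.0000) | 0.0093 |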
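Table 2: Values of Max.Val($l$), $\nabla l(\Gamma)$, and $(\hat{\theta}_{MLE}, \hat{\tau}_{MLE})$ for the EDDEM with $h = 0.1$.

| $M$ | Max.Val($l$) | $\nabla l(\Gamma)$ | $(\hat{\theta}_{0,MLE}, \hat{\tau}_{0,MLE})$ | $(\hat{\theta}_{MLE}, \hat{\tau}_{MLE})$ |
|---|---|---|---|---|
| 100 | 0.4715 | $10^{-2} \times (0.0006, -0.1001)$ | (0.5000, 1.000) | (0.5001645, 1.0000017) |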
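The Fisher information matrix for the EDDEM with $h = 0.1$ and $M = 100$ is
$$I(\Gamma) = 10^{9} \times \begin{pmatrix} 1.0887 & 0.0404 & 0.0459 & 0 \\ 0.0404 & 0.0015 & 0.0017 & 0 \\ 0.0459 & 0.0017 & 0.0020 & 0 \\ 0 & 0 & 0 & 0.0006 \end{pmatrix}, \tag{38}$$
and the variance of $\hat{\Gamma}$ at the MLE is given by
$$\mathrm{Var}(\hat{\Gamma}) = 10^{-4} \times \begin{pmatrix} 0.0004 & -0.0129 & 0.0015 & 0 \\ -0.0129 & 4.3309 & -3.5029 & 0 \\ 0.0015 & -3.5029 & 3.0445 & 0 \\ 0 & 0 & 0 & 0.0170 \end{pmatrix}. \tag{39}$$
The 95% confidence intervals for the parameters $(\theta, \tau, a, \sigma^2)$ in the EDDEM with a single delay are shown in Table 3.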
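Table 3: The 95% confidence intervals for the parameters $\theta$, $\tau$, $a$, and $\sigma^2$ in the EDDEM.

| $M$ | $(\theta_L, \theta_U)$ | $(\tau_L, \tau_U)$ | $(a_L, a_U)$ | $(\sigma^2_L, \sigma^2_U)$ |
|---|---|---|---|---|
| 100 | (0.4996, 0.5004) | (0.9592, 1.0408) | (4.9677, 5.0361) | (0.0068, 0.0119) |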
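As a quick arithmetic cross-check of the last diagonal entries of (38) and (39), formula (32) can be evaluated at the reported estimate $\hat{\sigma}^2 = 0.0093$ with $n + 1 = 101$. The Python lines below are only a sketch of that check; since the inputs are the rounded values printed in the tables, agreement is expected only up to rounding.

```python
n_plus_1 = 101        # number of sampled time points in Example 1
sigma2_hat = 0.0093   # reported MLE of sigma^2 for M = 100 (Table 1)

# Formula (32): information for sigma^2, and its inverse as the variance.
info_sigma2 = n_plus_1 / (2.0 * sigma2_hat ** 2)
var_sigma2 = 1.0 / info_sigma2

# Express both in the scalings used by (38) and (39).
info_scaled = info_sigma2 / 1e9   # compare with the entry 0.0006
var_scaled = var_sigma2 / 1e-4    # compare with the entry 0.0170
```

Both scaled values round to the entries printed in the matrices, which supports the reading that the final row and column of (38) and (39) correspond to $\sigma^2$.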
4.2. Example 2
A delay differential equation in population ecology given by
$$\dot{x}(t) = \theta x(t)\left[1 - \frac{x(t - \tau)}{K}\right] \tag{40}$$
is known as Hutchinson's equation [16], where $x(t)$ is the population at time $t$, $\theta$ is the intrinsic growth rate, and $K$ is the carrying capacity of the population. Both $\theta$ and $K$ are positive constants, and $\tau$ is a positive constant delay parameter.
Define $\theta_1 = \theta$ and $\theta_2 = \theta/K$; then (40) can be rewritten as
$$\dot{x}(t) = \theta_1 x(t) - \theta_2 x(t) x(t - \tau). \tag{41}$$
The observations $y_0, y_1, \ldots, y_n$ are collected at the $(n+1)$ sampled time points $T_0 = t_0 < t_1 < \cdots < t_n = T_1$, based on the observational model (1), and the aim is to estimate $\theta_1$, $\theta_2$, $\tau$, $a$, and $\sigma^2$ based on the observations $y_0, y_1, \ldots, y_n$. The DDE in (41) is solved numerically by using the MATLAB function dde23 with fixed values of the parameters $(\theta_1, \theta_2, \tau, a, \sigma^2)$. Figure 2(a) shows the underlying trajectories (mean functions) of the solution $x(t)$ from the DDE model (41) and illustrates the different behaviour of $x(t)$ under different parameter specifications, which give both stable and unstable solutions. Subsequently, we fix the parameter specification at $(\theta_1, \theta_2, \tau, a, \sigma^2) = (0.5, 0.7, 2.5, 0.5, 0.0001)$. Sampled observations from the DDEM as in (1) were obtained at discrete time intervals of width $h = 0.1$ starting from $t_0 = 0$. The endpoint considered is $t_n = 10$, corresponding to $n = 100$, where the number of sampled time points is $n + 1$. Figure 2(b) shows the underlying trajectory of the solution $x(t)$ from the DDE model (41) and the $n + 1$ sampled observations for the time range selected.
Figure 2(a): Numerical solution of the DLDEM with a single delay (41) using the fixed parameter values on $[0, 100]$ with step size $h = 0.1$, $\theta_1 = 0.5$, and $\theta_2 = 0.7$. The steady state is stable when $\tau = 2.5$ and becomes unstable when $\tau = 3.19$.
Figure 2(b): Numerical solution of the DLDEM with a single delay (41) using the estimated parameter values. The stars are the simulated noisy data obtained by adding noise to the DLDE solution at $n + 1 = 101$ equally spaced time points in $[0, 10]$ with step size $h = 0.1$, $\theta_1 = 0.5$, $\theta_2 = 0.7$, and $\tau = 2.5$.
The initial grid space for the adaptive grid procedure was taken to be $\Theta^{(0)} = \{(\theta_{1u}, \theta_{2v}, \tau_r) : \theta_{1u} = \theta_{10} + uh,\ \theta_{2v} = \theta_{20} + vh,\ \tau_r = \tau_0 + rh\}$ with $(\theta_{10}, \theta_{20}, \tau_0) = (0.4, 0.6, 2.4)$, $h = 0.1$, and $U = V = R = 3$. The stopping threshold $\delta$ was chosen to be $0.0001$. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 4 and 5. As in the previous example, as $M$ increases, $\hat{a}(\theta_1, \theta_2, \tau)$ becomes more accurate and very close to the true value $a = 0.5$, but at the cost of increased computational time.
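The stable and unstable regimes described for (41) can be reproduced qualitatively with a simple fixed-step scheme. The Python sketch below uses forward Euler with linear interpolation of the delayed state, which is an assumption made for illustration (the paper uses the MATLAB solver dde23); the function name and the choice to measure oscillation amplitude over the last 20 time units are likewise hypothetical. The parameter values $(\theta_1, \theta_2, a) = (0.5, 0.7, 0.5)$ and the two delays $\tau = 2.5$ and $\tau = 3.19$ are those quoted for Figure 2(a).

```python
def solve_hutchinson(theta1, theta2, tau, a, t_end, dt):
    """Forward-Euler solution of x'(t) = theta1*x(t) - theta2*x(t)*x(t - tau),
    with x(t) = a for t <= 0. Requires tau > 0."""
    xs = [a]
    lag = tau / dt  # delay measured in solver steps (may be fractional)
    for k in range(round(t_end / dt)):
        pos = k - lag
        if pos <= 0:
            z = a
        else:
            j = int(pos)
            w = pos - j
            z = (1.0 - w) * xs[j] + w * xs[j + 1]  # linear interpolation
        x = xs[-1]
        xs.append(x + dt * (theta1 * x - theta2 * x * z))
    return xs


dt = 0.01
stable = solve_hutchinson(0.5, 0.7, 2.5, 0.5, 100.0, dt)
unstable = solve_hutchinson(0.5, 0.7, 3.19, 0.5, 100.0, dt)

tail = int(20 / dt)  # last 20 time units of each trajectory
amp_stable = max(stable[-tail:]) - min(stable[-tail:])
amp_unstable = max(unstable[-tail:]) - min(unstable[-tail:])
steady_state = 0.5 / 0.7  # theta1 / theta2, i.e. the carrying capacity K
```

For $\tau = 2.5$ the trajectory settles near the steady state $\theta_1/\theta_2$, whereas for $\tau = 3.19$ the oscillations persist, which is consistent with the linearized stability condition $\theta_1 \tau < \pi/2$ for Hutchinson's equation.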
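Table 4: Values of Max.Val($l$), $\hat{a}(\hat{\theta}_1, \hat{\theta}_2, \hat{\tau})$, $(\hat{\theta}_1, \hat{\theta}_2, \hat{\tau})$, and $\hat{\sigma}^2$ for the DLDEM with a single delay obtained by the adaptive grid procedure. Here, the number of equally spaced time points in $[0, 10]$ is $n + 1 = 101$ with $h = 0.1$.

| $M$ | Max.Val($l$) | $\hat{a}(\theta_1, \theta_2, \tau)$ | $(\hat{\theta}_1, \hat{\theta}_2, \hat{\tau})$ | $\hat{\sigma}^2$ |
|---|---|---|---|---|
| 10 | -0.0049 | 0.5008 | (0.5, 0.7, 2.5) | 0.000095 |
| 20 | -0.0048 | 0.5005 | (0.5, 0.7, 2.5) | 0.000095 |
| 30 | -0.0048 | 0.5005 | (0.5, 0.7, 2.5) | 0.000095 |
| 40 | -0.0048 | 0.5004 | (0.5, 0.7, 2.5) | 0.000095 |
| 50 | -0.0048 | 0.5004 | (0.5, 0.7, 2.5) | 0.000095 |
| 100 | -0.0048 | 0.5003 | (0.5, 0.7, 2.5) | 0.000095 |
| 1000 | -0.0048 | 0.5003 | (0.5, 0.7, 2.5) | 0.000095 |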
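Table 5: Values of Max.Val($l$), $\nabla l(\Gamma)$, and $(\hat{\theta}_{1MLE}, \hat{\theta}_{2MLE}, \hat{\tau}_{MLE})$ for the DLDEM with a single delay with $h = 0.1$.

| $M$ | Max.Val($l$) | $\nabla l(\Gamma)$ | $(\hat{\theta}_{10,MLE}, \hat{\theta}_{20,MLE}, \hat{\tau}_{0,MLE})$ | $(\hat{\theta}_{1MLE}, \hat{\theta}_{2MLE}, \hat{\tau}_{MLE})$ |
|---|---|---|---|---|
| 100 | 0.0048 | $10^{-5} \times (-0.4019, -0.2653, 0.9364)$ | (0.5, 0.7, 2.5) | (0.5007149, 0.7006367, 2.5004988) |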
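The Fisher information matrix for the DLDEM with a single delay with $h = 0.1$ and $M = 100$ is
$$I(\Gamma) = 10^{6} \times \begin{pmatrix} 3.7030 & -2.2042 & -0.1819 & -0.5757 & 0 \\ -2.2042 & 1.4665 & 0.2069 & 0.3285 & 0 \\ -0.1819 & 0.2069 & 0.0774 & 0.0059 & 0 \\ -0.5757 & 0.3285 & 0.0059 & 0.3164 & 0 \\ 0 & 0 & 0 & 0 & 5.6613 \times 10^{3} \end{pmatrix}, \tag{42}$$
and the variance of $\hat{\Gamma}$ at the MLE is given by
$$\mathrm{Var}(\hat{\Gamma}) = 10^{-3} \times \begin{pmatrix} 0.0241 & 0.0463 & -0.0669 & -0.0030 & 0 \\ 0.0463 & 0.0907 & -0.1330 & -0.0075 & 0 \\ -0.0669 & -0.1330 & 0.2102 & 0.0125 & 0 \\ -0.0030 & -0.0075 & 0.0125 & 0.0052 & 0 \\ 0 & 0 & 0 & 0 & 1.7664 \times 10^{-7} \end{pmatrix}. \tag{43}$$
The 95% confidence intervals for the parameters $(\theta_1, \theta_2, \tau, a, \sigma^2)$ in the DLDEM with a single delay are shown in Table 6.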
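Table 6: The 95% confidence intervals for the parameters $\theta_1$, $\theta_2$, $\tau$, $a$, and $\sigma^2$ in the DLDEM with a single delay.

| $M$ | $(\theta_{1L}, \theta_{1U})$ | $(\theta_{2L}, \theta_{2U})$ | $(\tau_L, \tau_U)$ | $(a_L, a_U)$ | $(\sigma^2_L, \sigma^2_U)$ |
|---|---|---|---|---|---|
| 100 | (0.4904, 0.5096) | (0.6813, 0.7187) | (2.4716, 2.5284) | (0.4959, 0.5048) | (0.00007, 0.00012) |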
4.3. Example 3
The delayed logistic differential equation model (DLDEM) with two delays proposed by Braddock and van den Driessche [17] is the solution to the DDE
$$\dot{x}(t) = \theta x(t)\left[1 - \frac{x(t - \tau_1)}{k_1} - \frac{x(t - \tau_2)}{k_2}\right], \tag{44}$$
where $\theta$, $k_1$, $k_2$, $\tau_1$, and $\tau_2$ are positive constants. DDEs with two delays appear in many applications such as epidemiological models [18], physiological models [19], neurological models [20], and medical models [21]. Very rich dynamics have been observed in such equations [22, 23]. Denoting $\theta_1 = \theta$, $\theta_2 = \theta/k_1$, $\theta_3 = \theta/k_2$, $z_1(t) = x(t - \tau_1)$, and $z_2(t) = x(t - \tau_2)$, we obtain
$$\dot{x}(t) = \theta_1 x(t) - \theta_2 z_1(t) x(t) - \theta_3 z_2(t) x(t). \tag{45}$$
By using the MATLAB function dde23 with fixed values of the parameters $(\theta_1, \theta_2, \theta_3, \tau_1, \tau_2, a, \sigma^2)$, we obtain the trajectories of the solution $x(t)$. As in Example 2, we note the different characteristics of the solution depending on the parameter specifications, as shown in Figure 3(a). Subsequently, the parameters are fixed at $(0.5, 0.7, 0.12, 2, 5, 0.5, 0.0012)$, and observations $y_0, y_1, \ldots, y_n$ are collected at the $(n+1)$ sampled time points $T_0 = t_0 < t_1 < \cdots < t_n = T_1$, based on the observational model (1), at discrete time intervals of width $h = 0.1$ starting from $t_0 = 0$. The endpoint considered is $t_n = 10$, corresponding to $n + 1 = 101$. Figure 3(b) shows the underlying trajectory of the solution $x(t)$ from the DDE model (45) and the $n + 1$ sampled observations.
Figure 3(a): Numerical solution of the DLDEM with two delays (45) using the fixed parameter values on $[0, 100]$ with step size $h = 0.1$, $\theta_1 = 0.5$, $\theta_2 = 0.7$, and $\theta_3 = 0.12$. The steady state is stable when $\tau_1 = 2$ and $\tau_2 = 5$ and becomes unstable when $\tau_1 = 3.15$ and $\tau_2 = 6$.
Figure 3(b): Numerical solution of the DLDEM with two delays (45) using the estimated parameter values. The stars are the simulated noisy data obtained by adding noise to the DLDE solution at $n + 1 = 101$ equally spaced time points in $[0, 10]$ with step size $h = 0.1$, $\theta_1 = 0.5$, $\theta_2 = 0.7$, $\theta_3 = 0.12$, $\tau_1 = 2$, and $\tau_2 = 5$.
The initial grid space for the adaptive grid procedure is taken to be $\Theta^{(0)} = \{(\theta_{1u}, \theta_{2v}, \theta_{3w}, \tau_{1r}, \tau_{2s}) : \theta_{1u} = \theta_{10} + uh,\ \theta_{2v} = \theta_{20} + vh,\ \theta_{3w} = \theta_{30} + wh,\ \tau_{1r} = \tau_{10} + rh,\ \tau_{2s} = \tau_{20} + sh\}$ with $(\theta_{10}, \theta_{20}, \theta_{30}, \tau_{10}, \tau_{20}) = (0.4, 0.6, 0.02, 1.9, 4.9)$, $h = 0.1$, and $U = V = W = R = S = 3$. The stopping threshold $\delta$ was chosen to be $0.0001$. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 7 and 8.
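Table 7: Values of Max.Val($l$), $\hat{a}(\theta_1, \theta_2, \theta_3, \tau_1, \tau_2)$, $(\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3, \hat{\tau}_1, \hat{\tau}_2)$, and $\hat{\sigma}^2$ in the DLDEM with two delays obtained by the adaptive grid procedure for $(\theta_1, \theta_2, \theta_3, \tau_1, \tau_2) = (0.5, 0.7, 0.12, 2, 5)$ and $n + 1 = 101$ equally spaced time points in $[0, 10]$ with $h = 0.1$.

| $M$ | Max.Val($l$) | $\hat{a}(\theta_1, \theta_2, \theta_3, \tau_1, \tau_2)$ | $(\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3, \hat{\tau}_1, \hat{\tau}_2)$ | $\hat{\sigma}^2$ |
|---|---|---|---|---|
| 10 | $-6.4697 \times 10^{-5}$ | 0.5007 | (0.5, 0.7, 0.12, 2, 5) | $1.2686 \times 10^{-6}$ |
| 20 | $-6.5220 \times 10^{-5}$ | 0.5006 | (0.5, 0.7, 0.12, 2, 5) | $1.2788 \times 10^{-6}$ |
| 30 | $-6.5478 \times 10^{-5}$ | 0.5005 | (0.5, 0.7, 0.12, 2, 5) | $1.2839 \times 10^{-6}$ |
| 40 | $-6.5622 \times 10^{-5}$ | 0.5005 | (0.5, 0.7, 0.12, 2, 5) | $1.2867 \times 10^{-6}$ |
| 50 | $-6.5714 \times 10^{-5}$ | 0.5005 | (0.5, 0.7, 0.12, 2, 5) | $1.2885 \times 10^{-6}$ |
| 100 | $-6.5910 \times 10^{-5}$ | 0.5005 | (0.5, 0.7, 0.12, 2, 5) | $1.2923 \times 10^{-6}$ |
| 1000 | $-6.6098 \times 10^{-5}$ | 0.5004 | (0.5, 0.7, 0.12, 2, 5) | $1.2960 \times 10^{-6}$ |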
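Table 8. Values of Max.Val($l$), $\nabla l(\Gamma)$, and $(\hat\theta_1^{\mathrm{MLE}},\hat\theta_2^{\mathrm{MLE}},\hat\theta_3^{\mathrm{MLE}},\hat\tau_1^{\mathrm{MLE}},\hat\tau_2^{\mathrm{MLE}})$ for the DLDEM with two delays with $h=0.1$.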
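The Fisher information matrix for the DLDEM with two delays at $h=0.1$ and $M=100$, with $\Gamma=(\theta_1,\theta_2,\theta_3,\tau_1,\tau_2,a,\sigma^2)$, is
$$I(\Gamma) = 10^{8}\times\begin{pmatrix}
1.5952 & -0.9359 & -0.8343 & -0.2912 & -0.0098 & -0.2324 & 0\\
-0.9359 & 0.5630 & 0.4973 & 0.1809 & 0.0074 & 0.1328 & 0\\
-0.8343 & 0.4973 & 0.4432 & 0.1619 & 0.0063 & 0.1121 & 0\\
-0.2912 & 0.1809 & 0.1619 & 0.0711 & 0.0033 & 0.0408 & 0\\
-0.0098 & 0.0074 & 0.0063 & 0.0033 & 0.0003 & 0.0001 & 0\\
-0.2324 & 0.1328 & 0.1121 & 0.0408 & 0.0001 & 0.1519 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 305361
\end{pmatrix}, \tag{46}$$
and the variance of $\hat\Gamma$ obtained from the MLE is given by
$$\operatorname{Var}(\hat\Gamma) = 10^{-3}\times\begin{pmatrix}
0.0642 & 0.0331 & 0.0983 & -0.0161 & -0.6515 & 0.0014 & 0\\
0.0331 & 0.0215 & 0.0457 & -0.0063 & -0.3586 & 0.0002 & 0\\
0.0983 & 0.0457 & 0.1569 & -0.0275 & -0.9709 & 0.0027 & 0\\
-0.0161 & -0.0063 & -0.0275 & 0.0066 & 0.1444 & -0.0007 & 0\\
-0.6515 & -0.3586 & -0.9709 & 0.1444 & 6.8315 & -0.0102 & 0\\
0.0014 & 0.0002 & 0.0027 & -0.0007 & -0.0102 & 0.0002 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 3.3000\times10^{-11}
\end{pmatrix}. \tag{47}$$
The 95% confidence intervals for the parameters $(\theta_1,\theta_2,\theta_3,\tau_1,\tau_2,a,\sigma^2)$ in the DLDEM with two delays are shown in Table 9.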
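Table 9. The 95% confidence intervals for the parameters $\theta_1,\theta_2,\theta_3,\tau_1,\tau_2,a$, and $\sigma^2$ in the DLDEM with two delays.

| Interval | $M=100$ |
|---|---|
| $(\theta_{1L},\theta_{1U})$ | (0.4843, 0.5157) |
| $(\theta_{2L},\theta_{2U})$ | (0.6909, 0.7091) |
| $(\theta_{3L},\theta_{3U})$ | (0.0955, 0.1445) |
| $(\tau_{1L},\tau_{1U})$ | (1.9950, 2.0050) |
| $(\tau_{2L},\tau_{2U})$ | (4.8380, 5.1620) |
| $(a_L,a_U)$ | (0.4996, 0.5013) |
| $(\sigma^2_L,\sigma^2_U)$ | (0.0000009, 0.0000016) |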
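The link between the variance matrix (47) and the intervals in Table 9 can be checked with a short Python sketch. It assumes the intervals are Wald intervals of the form estimate $\pm\,1.96\sqrt{\text{variance}}$ and, as a further assumption, centers them on the adaptive grid estimates at $M=100$; the function name `wald_interval` is hypothetical.

```python
import math

names = ["theta1", "theta2", "theta3", "tau1", "tau2", "a", "sigma2"]

# Point estimates assumed for the interval centers (adaptive grid values at M = 100).
estimates = [0.5, 0.7, 0.12, 2.0, 5.0, 0.5005, 1.2923e-6]

# Diagonal of Var(Gamma-hat) in (47), including the 10^-3 prefactor.
var_diag = [v * 1e-3 for v in (0.0642, 0.0215, 0.1569, 0.0066, 6.8315, 0.0002, 3.3e-11)]

def wald_interval(estimate, variance, z=1.96):
    """Symmetric 95% interval: estimate +/- z * standard error."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

intervals = {name: wald_interval(est, var)
             for name, est, var in zip(names, estimates, var_diag)}

for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.7f}, {high:.7f})")
```

Under these assumptions the computed intervals for $\theta_1$, $\theta_2$, $\theta_3$, $\tau_1$, and $\tau_2$ agree with the tabulated ones to four decimals. The intervals for $a$ and $\sigma^2$ agree only approximately, because the corresponding variance entries are printed with few significant digits.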
Remark 2.
As mentioned earlier, the adaptive grid procedure in the first stage of our two-step procedure requires a sufficiently large domain that is likely to contain the MLE. Standard MLE theory [11–13] dictates that the MLE should be close to the true parameter values that generated the data, and this has also been observed in the three examples considered. Hence, since the true value of $(\theta,\tau)$ is known in our simulation examples, we selected the initial domain of the grid procedure to contain these true values in its interior. Thus, the notation $(\theta^{0},\tau^{0})$ denotes the lower bounds of the parameters used for the grid procedure. The grid
$$\Theta^{(0)}=\left\{(\theta_r,\tau_s) : \theta_r=\theta^{0}+rh,\ \tau_s=\tau^{0}+sh,\ r=1,2,\ldots,R;\ s=1,2,\ldots,S\right\}$$
is ensured to contain the true parameter values through the selection of $R$, $S$, and $h$ in all the examples. Other than this consideration, the true values used in the simulations were selected rather arbitrarily; they were chosen only to be representative parameter values that exhibit the typical nature of the trajectories of the underlying DDEs, as shown in the figures.
5. Conclusion
In this paper, we presented the method of maximum likelihood for estimating parameters in delay differential equations. As examples, we considered the exponential differential equation model, the delayed logistic differential equation model with a single delay, and the delayed logistic differential equation model with two delays, and we estimated the unknown parameters in these models. A two-step approach using an adaptive grid followed by a Newton-Raphson procedure is proposed. Our methodology estimates the delay parameters as well as the initial starting value of the dynamical system correctly based on simulation data. The information matrix and confidence intervals are obtained from the likelihood, and the intervals are found to contain the true values of the parameters based on simulation data.
In this paper, we took the initial value function of the DDE to be an unknown constant $a$. However, it is possible to extend the constant initial value assumption to a more general linear or nonlinear function, say $\varphi(x)$, for $x\in[-\tau,0]$. Two complications arise here. First, we need additional unknown parameters to represent $\varphi(x)$; for example, if $\varphi(x)=a+bx$ is chosen to be linear, we have to estimate the parameter $b$ in addition to $a$. Higher order functions offer greater flexibility in modeling the initial function, but at the expense of estimating extra parameters and slowing down the computational procedure. A second issue that follows from the first is the selection of a "best" initial value function, whether constant, linear, quadratic, or of some other form. Further research is required to address this concern, and we hope to report some results in this direction in the future.
Appendix
A. Quasi-Newton Procedure
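As input to the quasi-Newton procedure, we need to compute the gradient vector $\nabla l_0(\Gamma)$, as given in (20), which consists of the partial derivatives of the log-likelihood function with respect to the entries of $\Gamma$. Recall that the DDEM has multiple delay parameters $\tau=(\tau_1,\tau_2,\ldots,\tau_m)$, evolution parameters $\theta=(\theta_1,\theta_2,\ldots,\theta_p)$, and unknown initial condition $a$. Based on the log-likelihood
$$l(\theta,\tau,a) = -\frac{1}{2\sigma^2}\sum_{i=0}^{n}\bigl(y_i - x(t_i,\theta,\tau,a)\bigr)^2, \tag{A.1}$$
recall that we define $l(\theta,\tau,\hat a(\theta,\tau))$ as
$$l\bigl(\theta,\tau,\hat a(\theta,\tau)\bigr) = -\frac{1}{2\sigma^2}\sum_{i=0}^{n}\bigl(y_i - x(t_i,\theta,\tau,\hat a(\theta,\tau))\bigr)^2. \tag{A.2}$$
Denoting $l_0(\theta,\tau)\equiv l(\theta,\tau,\hat a(\theta,\tau))$ and $x_i = x(t_i,\theta,\tau,\hat a(\theta,\tau))$, the first-order partial derivatives are as follows:
$$\frac{\partial l_0}{\partial\theta_u} = \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)\left(\frac{\partial x_i}{\partial\theta_u} + \frac{\partial x_i}{\partial a}\frac{\partial\hat a}{\partial\theta_u}\right),\qquad
\frac{\partial l_0}{\partial\tau_v} = \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)\left(\frac{\partial x_i}{\partial\tau_v} + \frac{\partial x_i}{\partial a}\frac{\partial\hat a}{\partial\tau_v}\right). \tag{A.3}$$
The above equations involve derivatives of $x_i$ with respect to the parameters. Each of the derivatives $\partial x_i/\partial\theta_u$, $\partial x_i/\partial\tau_v$, and $\partial x_i/\partial a$ for $i=0,1,2,\ldots,n$ can be obtained numerically from a corresponding DDE, which is derived from the initial model in (2) by differentiating it with respect to the quantity of interest. In the case of $\partial x_i/\partial\theta_u$, differentiating (2) with respect to $\theta_u$ gives a new DDE for $\partial x/\partial\theta_u$:
$$\frac{d}{dt}\left(\frac{\partial x}{\partial\theta_u}\right) = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\theta_u} + \sum_{j=1}^{m}\frac{\partial f}{\partial z_j}\frac{\partial z_j}{\partial\theta_u} + \frac{\partial f}{\partial\theta_u}, \tag{A.4}$$
where the left-hand side is the derivative of $\partial x/\partial\theta_u$ with respect to $t$ and $\partial z_j/\partial\theta_u$ is the delayed version of $\partial x/\partial\theta_u$; that is,
$$\frac{\partial z_j}{\partial\theta_u}(t) = \frac{\partial x}{\partial\theta_u}(t-\tau_j). \tag{A.5}$$
The initial condition for the DDE in (A.4) is $\partial x/\partial\theta_u=0$, since the derivative of the initial value $a$ with respect to $\theta_u$ is 0. Based on this initial condition, the above DDE can be solved numerically, and the values of $\partial x_i/\partial\theta_u$ can be obtained from $\partial x(t)/\partial\theta_u$ at each $t=t_i$, $i=0,1,2,\ldots,n$.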
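To make the sensitivity equation (A.4) concrete, the sketch below applies it to the two-delay model (45) with respect to $\theta_1$, where $\partial f/\partial x = \theta_1-\theta_2 z_1-\theta_3 z_2$, $\partial f/\partial z_1 = -\theta_2 x$, $\partial f/\partial z_2 = -\theta_3 x$, and $\partial f/\partial\theta_1 = x$. It is only an illustration under stated assumptions: a fixed-step forward Euler scheme stands in for the DDE solver, the history is the constant $a$, and the function name `euler_with_sensitivity` is hypothetical. The result is compared with a central finite difference of the same discretized solution.

```python
def euler_with_sensitivity(theta1, theta2, theta3, tau1, tau2, a, t_end=10.0, dt=0.01):
    """Integrate (45) together with the DDE (A.4) for s(t) = dx/dtheta1.

    History: x(t) = a and s(t) = 0 for t <= 0; delays are assumed to be
    integer multiples of dt so delayed values are list look-ups.
    """
    d1, d2 = round(tau1 / dt), round(tau2 / dt)
    x, s = [a], [0.0]
    for k in range(round(t_end / dt)):
        z1 = x[k - d1] if k >= d1 else a      # x(t - tau1)
        z2 = x[k - d2] if k >= d2 else a      # x(t - tau2)
        s1 = s[k - d1] if k >= d1 else 0.0    # s(t - tau1), cf. (A.5)
        s2 = s[k - d2] if k >= d2 else 0.0    # s(t - tau2)
        growth = theta1 - theta2 * z1 - theta3 * z2   # df/dx
        f = growth * x[k]
        # (A.4): df/dx * s + sum_j df/dz_j * s(t - tau_j) + df/dtheta1
        ds = growth * s[k] - theta2 * x[k] * s1 - theta3 * x[k] * s2 + x[k]
        x.append(x[k] + dt * f)
        s.append(s[k] + dt * ds)
    return x, s

params = dict(theta2=0.7, theta3=0.12, tau1=2.0, tau2=5.0, a=0.5)
x, s = euler_with_sensitivity(theta1=0.5, **params)

# Central finite difference in theta1 as an independent check.
eps = 1e-6
x_hi, _ = euler_with_sensitivity(theta1=0.5 + eps, **params)
x_lo, _ = euler_with_sensitivity(theta1=0.5 - eps, **params)
fd = (x_hi[-1] - x_lo[-1]) / (2 * eps)
print(s[-1], fd)
```

Solving the augmented system once yields the sensitivities at every observation time, whereas finite differences require two extra solves per parameter.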
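In a similar way, each value of $\partial x_i/\partial\tau_v$ can be determined by differentiating (2) with respect to $\tau_v$. The new DDE for $\partial x/\partial\tau_v$ is
$$\frac{d}{dt}\left(\frac{\partial x}{\partial\tau_v}\right) = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\tau_v} + \sum_{j=1}^{m}\frac{\partial f}{\partial z_j}\left[\frac{\partial x(t-\tau_j)}{\partial\tau_v} - f\bigl(t-\tau_v,\,x(t-\tau_v),\,z_1(t-\tau_v),\,z_2(t-\tau_v),\ldots,z_m(t-\tau_v),\,\theta\bigr)\right], \tag{A.6}$$
where the left-hand side is the derivative of $\partial x/\partial\tau_v$ with respect to $t$ and $\partial z_j/\partial\tau_v$ is the delayed version of $\partial x/\partial\tau_v$; that is,
$$\frac{\partial z_j}{\partial\tau_v}(t) = \frac{\partial x(t-\tau_j)}{\partial\tau_v} - f\bigl(t-\tau_v,\,x(t-\tau_v),\,z_1(t-\tau_v),\,z_2(t-\tau_v),\ldots,z_m(t-\tau_v),\,\theta\bigr). \tag{A.7}$$
The initial condition for the DDE in (A.6) is $\partial x/\partial\tau_v=0$ since, again, the derivative of the initial value $a$ with respect to $\tau_v$ is 0. Based on this initial condition, the above DDE can be solved numerically, and the values of $\partial x_i/\partial\tau_v$ can be obtained from $\partial x(t)/\partial\tau_v$ at each $t=t_i$, $i=0,1,2,\ldots,n$. The case of $\partial x_i/\partial a$ is similar and has been discussed in the main text when presenting the Newton-Raphson procedure.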
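The expressions in (A.3) also involve $\partial\hat a/\partial\theta_u$ and $\partial\hat a/\partial\tau_v$. They can be obtained by differentiating the equation satisfied by $\hat a(\theta,\tau)$ in (10) for every pair $(\theta,\tau)$:
$$\frac{\partial l\bigl(\theta,\tau,\hat a(\theta,\tau)\bigr)}{\partial a} = 0 \quad \forall\,(\theta,\tau). \tag{A.8}$$
Differentiating with respect to $\theta_u$, we get
$$\frac{\partial^2 l}{\partial\theta_u\,\partial a} + \frac{\partial^2 l}{\partial a^2}\frac{\partial\hat a}{\partial\theta_u} = 0; \tag{A.9}$$
thus
$$\frac{\partial\hat a}{\partial\theta_u} = -\frac{\partial^2 l/\partial\theta_u\,\partial a}{\partial^2 l/\partial a^2} \tag{A.10}$$
for $1\le u\le p$. Similarly, differentiating (10) with respect to $\tau_v$, we get
$$\frac{\partial^2 l}{\partial\tau_v\,\partial a} + \frac{\partial^2 l}{\partial a^2}\frac{\partial\hat a}{\partial\tau_v} = 0, \tag{A.11}$$
and hence
$$\frac{\partial\hat a}{\partial\tau_v} = -\frac{\partial^2 l/\partial\tau_v\,\partial a}{\partial^2 l/\partial a^2} \tag{A.12}$$
for $1\le v\le m$. The explicit expressions for the second-order derivatives of $l$ with respect to its arguments are given by (B.6), (B.7), and (B.11) in Appendix B.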
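The implicit differentiation in (A.9) and (A.10) can be checked numerically on a toy model in which $\hat a(\theta)$ has a closed form. The sketch below is hypothetical and is not one of the paper's examples: it assumes $x(t,\theta,a)=a\,e^{-\theta t}$ with no delay and sets $\sigma^2=1$, which is harmless because $\sigma^2$ cancels in the ratio (A.10). All function names are illustrative.

```python
import math

def a_hat(theta, t, y):
    """Closed-form maximizer of l over a for the toy model x = a * exp(-theta * t)."""
    e = [math.exp(-theta * ti) for ti in t]
    return sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)

def da_hat_dtheta(theta, t, y):
    """Derivative of a_hat via (A.10): -(d2l/dtheta da) / (d2l/da2), with sigma^2 = 1."""
    e = [math.exp(-theta * ti) for ti in t]
    a = a_hat(theta, t, y)
    # d2l/dtheta da for l = -1/2 * sum (y_i - a e_i)^2, evaluated at a = a_hat
    d2l_dtheta_da = sum(-ti * yi * ei + 2.0 * a * ti * ei * ei
                        for ti, yi, ei in zip(t, y, e))
    d2l_da2 = -sum(ei * ei for ei in e)
    return -d2l_dtheta_da / d2l_da2

# Synthetic data with a small deterministic perturbation in place of noise.
t = [0.1 * i for i in range(11)]
y = [0.5 * math.exp(-0.3 * ti) + 0.01 * math.sin(7.0 * ti) for ti in t]

theta, eps = 0.4, 1e-6
implicit = da_hat_dtheta(theta, t, y)
finite_diff = (a_hat(theta + eps, t, y) - a_hat(theta - eps, t, y)) / (2 * eps)
print(implicit, finite_diff)
```

In the DDE setting $\hat a(\theta,\tau)$ has no closed form, so (A.10) and (A.12) are the practical route to these derivatives.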
B. Information Matrix
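For the DDE in (2) with multiple delays $\tau=(\tau_1,\tau_2,\ldots,\tau_m)$ and $\theta=(\theta_1,\theta_2,\ldots,\theta_p)$, recall that
$$l(\theta,\tau,a,\sigma^2) = -\frac{n+1}{2}\ln\bigl(2\pi\sigma^2\bigr) - \frac{1}{2\sigma^2}\sum_{i=0}^{n}(y_i-x_i)^2,$$
where $x_i = x(t_i,\theta,\tau,a)$. The first-order partial derivatives of $l(\theta,\tau,a,\sigma^2)$ are as follows:
$$\frac{\partial l}{\partial\theta_u} = \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial\theta_u},\qquad
\frac{\partial l}{\partial\tau_v} = \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial\tau_v},\qquad
\frac{\partial l}{\partial a} = \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial a} \tag{B.1}$$
for $1\le u\le p$, $1\le v\le m$. As mentioned in Appendix A, these partial derivatives can be evaluated numerically from $\partial x_i/\partial\theta_u$, $\partial x_i/\partial\tau_v$, and $\partial x_i/\partial a$ for $i=0,1,2,\ldots,n$, since each of them satisfies an additional DDE derived from (2) by differentiating it with respect to the quantity of interest. Further, we have
$$\frac{\partial l}{\partial\sigma^2} = -\frac{1}{2\sigma^2}\left[(n+1) - \frac{1}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)^2\right]. \tag{B.2}$$
Differentiating the above again with respect to the arguments of $l(\theta,\tau,a,\sigma^2)$, the second-order partial derivatives have the general forms
$$\frac{\partial^2 l}{\partial\theta_u\,\partial\theta_{u^*}} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\theta_u\,\partial\theta_{u^*}} - \sum_{i=0}^{n}\frac{\partial x_i}{\partial\theta_u}\frac{\partial x_i}{\partial\theta_{u^*}}\right], \tag{B.3}$$
$$\frac{\partial^2 l}{\partial\theta_u\,\partial\tau_{v^*}} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\theta_u\,\partial\tau_{v^*}} - \sum_{i=0}^{n}\frac{\partial x_i}{\partial\theta_u}\frac{\partial x_i}{\partial\tau_{v^*}}\right], \tag{B.4}$$
$$\frac{\partial^2 l}{\partial\tau_v\,\partial\tau_{v^*}} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\tau_v\,\partial\tau_{v^*}} - \sum_{i=0}^{n}\frac{\partial x_i}{\partial\tau_v}\frac{\partial x_i}{\partial\tau_{v^*}}\right], \tag{B.5}$$
$$\frac{\partial^2 l}{\partial\theta_u\,\partial a} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\theta_u\,\partial a} - \sum_{i=0}^{n}\frac{\partial x_i}{\partial\theta_u}\frac{\partial x_i}{\partial a}\right], \tag{B.6}$$
$$\frac{\partial^2 l}{\partial\tau_v\,\partial a} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\tau_v\,\partial a} - \sum_{i=0}^{n}\frac{\partial x_i}{\partial\tau_v}\frac{\partial x_i}{\partial a}\right], \tag{B.7}$$
$$\frac{\partial^2 l}{\partial\theta_u\,\partial\sigma^2} = -\frac{1}{(\sigma^2)^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial\theta_u}, \tag{B.8}$$
$$\frac{\partial^2 l}{\partial\tau_v\,\partial\sigma^2} = -\frac{1}{(\sigma^2)^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial\tau_v}, \tag{B.9}$$
$$\frac{\partial^2 l}{\partial a\,\partial\sigma^2} = -\frac{1}{(\sigma^2)^2}\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial a}, \tag{B.10}$$
$$\frac{\partial^2 l}{\partial a^2} = \frac{1}{\sigma^2}\left[\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial a^2} - \sum_{i=0}^{n}\left(\frac{\partial x_i}{\partial a}\right)^2\right], \tag{B.11}$$
$$\frac{\partial^2 l}{\partial(\sigma^2)^2} = \frac{1}{2(\sigma^2)^2}\left[(n+1) - \frac{2}{\sigma^2}\sum_{i=0}^{n}(y_i-x_i)^2\right]. \tag{B.12}$$
Equations (B.6), (B.7), and (B.11) are required for the quasi-Newton procedure in Appendix A.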
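To obtain the information matrix, we need to take the expectation of the negative of the second-order derivatives of $l(\theta,\tau,a,\sigma^2)=l(\Gamma\mid y)$ with respect to its arguments:
$$I(\Gamma)_{u,v} = -E\left[\frac{\partial^2 l(\Gamma\mid y)}{\partial\Gamma_u\,\partial\Gamma_v}\right]. \tag{B.13}$$
Taking the negative of (B.3)–(B.12), followed by the expectation under the sampling distribution of each $y_i$, we note that the general terms of the form
$$\sum_{i=0}^{n}(y_i-x_i)\frac{\partial^2 x_i}{\partial\Gamma_u\,\partial\Gamma_v} \tag{B.14}$$
or
$$\sum_{i=0}^{n}(y_i-x_i)\frac{\partial x_i}{\partial\Gamma_u} \tag{B.15}$$
on the right-hand side become zero, since the expectation of $y_i$ equals $x_i$. Hence we get
$$I(\Gamma)_{u,v} = -E\left[\frac{\partial^2 l(\Gamma\mid y)}{\partial\Gamma_u\,\partial\Gamma_v}\right] = \frac{1}{\sigma^2}\sum_{i=0}^{n}\frac{\partial x(t_i,\theta,\tau,a)}{\partial\Gamma_u}\frac{\partial x(t_i,\theta,\tau,a)}{\partial\Gamma_v} \tag{B.16}$$
as in (30), as well as
$$I(\Gamma)_{u,\sigma^2} = -E\left[\frac{\partial^2 l(\Gamma\mid y)}{\partial\Gamma_u\,\partial\sigma^2}\right] = 0. \tag{B.17}$$
Finally, taking the expectation in (B.12),
$$E\left[-\frac{\partial^2 l}{\partial(\sigma^2)^2}\right] = -\frac{1}{2(\sigma^2)^2}\left[(n+1) - \frac{2}{\sigma^2}\sum_{i=0}^{n}E(y_i-x_i)^2\right] = -\frac{1}{2(\sigma^2)^2}\left[(n+1) - \frac{2}{\sigma^2}(n+1)\sigma^2\right] = \frac{n+1}{2(\sigma^2)^2}. \tag{B.18}$$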
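The block structure implied by (B.16)–(B.18) can be assembled directly once the sensitivities $\partial x(t_i)/\partial\Gamma_u$ are available. The Python sketch below is a minimal illustration: the function name `fisher_information` and the small sensitivity matrix are made up for demonstration and do not come from the paper's examples.

```python
def fisher_information(jacobian, sigma2):
    """Assemble the information matrix from (B.16)-(B.18).

    jacobian[i][u] holds d x(t_i) / d Gamma_u for the non-variance
    parameters (theta, tau, a); the last row/column is for sigma^2.
    """
    n_obs = len(jacobian)          # n + 1 observations
    q = len(jacobian[0])           # number of non-variance parameters
    info = [[0.0] * (q + 1) for _ in range(q + 1)]
    for u in range(q):
        for v in range(q):
            # (B.16): (1 / sigma^2) * sum_i dx_i/dGamma_u * dx_i/dGamma_v
            info[u][v] = sum(row[u] * row[v] for row in jacobian) / sigma2
    # (B.17): cross terms with sigma^2 stay zero; (B.18): (n + 1) / (2 sigma^4)
    info[q][q] = n_obs / (2.0 * sigma2 ** 2)
    return info

# Made-up sensitivities for 3 observations and 2 parameters.
jacobian = [[1.0, 0.0],
            [1.0, 1.0],
            [1.0, 2.0]]
info = fisher_information(jacobian, sigma2=0.5)
for row in info:
    print(row)
```

The zero cross terms in the last row and column mirror the structure of (46), where $\sigma^2$ is orthogonal to the remaining parameters.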
Conflicts of Interest
The authors declare that there are no conflicts of interest with respect to the research, authorship, and/or publication of this paper.
Acknowledgments
The authors would like to thank Universiti Teknologi PETRONAS for the financial assistance under ERGS and HiCOE grants from the Ministry of Education Malaysia, as well as resource facilities provided by Center for Intelligent Signal and Imaging Research (CISIR).
References
[1] Kuang, Y.
[2] Batzel, J. J., Tran, H. T. Stability of the human respiratory control system. I. Analysis of a two-dimensional delay state-space model.
[3] Kalmár-Nagy, T., Stépán, G., Moon, F. C. Subcritical Hopf bifurcation in the delay equation model for machine tool vibrations.
[4] Bellen, A., Zennaro, M.
[5] Ellner, S. P., Kendall, B. E., Wood, S. N., McCauley, E., Briggs, C. J. Inferring mechanism from time-series data: Delay-differential equations.
[6] Wood, S. N. Partially specified ecological models.
[7] Wang, L., Cao, J. Estimating parameters in delay differential equation models.
[8] Mehrkanoon, S., Mehrkanoon, S., Suykens, J. A. Parameter estimation of delay differential equations: an integration-free LS-SVM approach.
[9] Fisher, R. A. On the mathematical foundations of theoretical statistics.
[10] Fisher, R. A. Theory of statistical estimation.
[11] Huber, P. J. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[12] Wald, A. Note on the consistency of the maximum likelihood estimate.
[13] Akaike, H. Information theory and an extension of the maximum likelihood principle.
[14] Lehmann, E. L., Casella, G.
[15] Spanos, A.
[16] Hutchinson, G. E. Circular causal systems in ecology.
[17] Braddock, R. D., van den Driessche, P. On a two-lag differential delay equation.
[18] Cooke, K. L., Yorke, J. A. Some equations modelling growth processes and gonorrhea epidemics.
[19] Beuter, A., Bélair, J., Labrie, C. Feedback and delays in neurological diseases: A modeling study using dynamical systems.
[20] Bélair, J., Campbell, S. A. Stability and bifurcations of equilibria in a multiple-delayed differential equation.
[21] Bélair, J., Mackey, M. C., Mahaffy, J. M. Age-structured and two-delay models for erythropoiesis.
[22] Hale, J. K., Huang, W. Z. Global geometry of the stable regions for two delay differential equations.
[23] Mahaffy, J. M., Joiner, K. M., Zak, P. J. A geometric analysis of stability regions for a linear differential equation with two delays.