Maximum Likelihood Inference for Univariate Delay Differential Equation Models with Multiple Delays

This article presents statistical inference methodology based on maximum likelihood for delay differential equation models in the univariate setting. Maximum likelihood inference is obtained for single and multiple unknown delay parameters as well as for other parameters of interest that govern the trajectories of the delay differential equation models. The maximum likelihood estimator is computed using adaptive grid and Newton-Raphson algorithms. Our methodology correctly estimates the delay parameters as well as other unknown parameters (such as the initial starting values) of the dynamical system based on simulated data. We also develop methodology to compute the information matrix and confidence intervals for all unknown parameters within the likelihood inferential framework. We present three illustrative examples related to biological systems. The computations have been carried out with the help of the mathematical software MATLAB 8.0 R2014b.


Introduction
Delay differential equations (DDEs) are widely used to model many real-life phenomena, especially in science and engineering. Examples include the modeling of the spread of infectious diseases, tumor growth and the growth of blood clots in the brain, population dynamics, traffic monitoring, and price fluctuations of commodities in economics; see [1][2][3][4]. A univariate delay differential equation model (DDEM) with multiple delays treats the real-valued observations, y_i, as noisy realizations of an underlying DDE:

y_i = x(t_i) + ε_i, i = 0, 1, 2, . . ., n, (1)

where the ε_i's are errors assumed to arise from a noise distribution with zero mean and unknown standard deviation σ > 0.
In (1), x(t_i) is the solution, x(t), of the DDE

ẋ(t) = f(t, x(t), z_1(t), z_2(t), . . ., z_m(t), θ) (2)

evaluated at the time points t_i, i = 0, 1, . . ., n; in (2), z_j(t) = x(t − τ_j), j = 1, 2, . . ., m, is the jth delay term with delay parameter τ_j > 0, and θ = (θ_1, θ_2, . . ., θ_p) is a vector of other parameters of interest that govern the trajectories of the underlying DDE in (2). Equations (1) and (2) constitute a univariate DDEM in its most general form. In a DDEM, the parameters τ_j and θ_k are often unknown and have to be estimated from the observations y_i, i = 0, 1, . . ., n. Not many methods appear in the statistical literature on parameter estimation and inference for DDEMs. Among the statistical approaches that have been suggested, many involve restrictions on the form of the DDEMs being investigated. When such restrictions are relaxed, high computational costs and challenges arise. Typically, further inferential procedures, such as obtaining standard errors and confidence intervals associated with parameter estimates, incur additional computational costs and challenges. We give a brief review of these works and approaches in the following paragraph.
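The data-generating mechanism in (1) and (2) can be sketched in a few lines of code. The following Python fragment is an illustrative stand-in for the MATLAB routines used in the paper (the function name, the fixed-step Euler scheme, and the constant history x(t) = a for t ≤ t0 are our own simplifying choices): it integrates a univariate DDE with multiple constant delays and then adds Gaussian noise to produce the observations y_i.

```python
import random

def simulate_ddem(f, theta, taus, a, t0, tn, h, sigma, seed=0):
    """Euler simulation of x'(t) = f(t, x(t), z_1(t), ..., z_m(t), theta)
    with z_j(t) = x(t - tau_j), constant history x(t) = a for t <= t0,
    and noisy observations y_i = x(t_i) + eps_i, eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    n_steps = int(round((tn - t0) / h))
    ts = [t0 + i * h for i in range(n_steps + 1)]
    xs = [a]  # x(t0) = a from the constant history
    for i in range(n_steps):
        t = ts[i]
        zs = []
        for tau in taus:
            # delays are looked up on the step grid (assumed ~ multiples of h);
            # before t0 the constant history supplies the delayed value
            k = i - int(round(tau / h))
            zs.append(a if k < 0 else xs[k])
        xs.append(xs[i] + h * f(t, xs[i], zs, theta))
    ys = [x + rng.gauss(0.0, sigma) for x in xs]
    return ts, xs, ys
```

For instance, `simulate_ddem(lambda t, x, zs, th: th * zs[0], 0.5, [1.0], 5.0, 0.0, 10.0, 0.1, 0.1)` generates noisy data from a single-delay linear DDE of the kind used in the examples below.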
Ellner et al. [5] estimate the derivative of a univariate DDEM, which is assumed to be in an additive form, using nonparametric smoothing. Subsequently, they infer the constant (single) delay parameter, τ, by fitting a generalized additive model. Ellner's technique, although it unifies previous works, can thus be applied only to DDEMs that satisfy the assumed additive form. Wood [6] developed spline-based model-fitting techniques for the case when the DDEMs are partially specified. The spline-based method involves high computational costs, as cross-validation is used to select the smoothing coefficients associated with the penalty term as well as the unknown parameter estimates. A penalized semiparametric method is proposed by Wang and Cao [7], which involves maximizing an objective function consisting of two terms: a likelihood term and a penalty term that measures the discrepancy between an estimate of the derivative, ẋ(t), and the right-hand side of the DDEM in (1). The selection of smoothing coefficients is done, as in [6], via cross-validation, whereas standard errors of parameter estimates are obtained by bootstrapping. It follows that the method of [7], like that of [6], involves high computational costs. Further, Wang and Cao consider only univariate DDEMs with a single delay parameter. An estimation method based on Least Squares Support Vector Machines (LS-SVMs) for approximating constant as well as time-varying parameters in deterministic parameter-affine DDEMs is presented by Mehrkanoon et al. [8]. We note that Mehrkanoon et al. perform parameter estimation only; no standard errors of estimates or confidence intervals are reported. Further, only single delays (either constant or time-varying) are considered in [8].
In this paper, we consider parameter estimation and inference for univariate DDEMs with multiple delays based on maximum likelihood. The method of maximum likelihood, as advocated by Fisher in his seminal papers [9, 10], has become one of the most important tools for estimation and inference available to statisticians. Maximum likelihood estimators (MLEs) are well defined once a distributional model is specified for the observations. MLEs have well-behaved and well-understood properties: Huber [11] presents general conditions under which the MLE is consistent for the true value of the unknown parameters for large sample sizes. Wald [12] and Akaike [13] observed that the maximum likelihood estimator is a natural estimator for the parameters even when the true distribution is unknown. The large-sample theory and distributional properties of MLEs can be used to perform subsequent inference procedures, such as obtaining standard errors and confidence intervals and performing tests of hypotheses, at minimal additional computational cost. MLEs are also the basic estimators used in subsequent statistical inferential procedures such as model selection using the Akaike Information Criterion (AIC), the Bayes Information Criterion (BIC), and other model selection criteria. Model selection is an important issue in DDEMs, such as for the partially specified DDEMs in [6], where several models can be elicited for an observed physical process, but one model needs to be selected among many that fits the observed data and is simple enough to understand (Occam's razor principle).
The MLE can be developed for a large variety of estimation settings and is asymptotically efficient, meaning that for large samples it produces the most precise estimates compared to non-MLE-based methods (such as [8]). These are the reasons why we prefer the MLE over other estimators for DDEMs in this paper.
The remainder of this paper is organized as follows: we define univariate DDEMs in Section 2. In Section 3, the MLE approach for DDEMs is outlined, and the MLE is obtained computationally using an adaptive grid procedure followed by a gradient-based quasi-Newton algorithm. We also develop algorithms for obtaining the information matrix and constructing standard errors and confidence intervals for the unknown parameters. Three examples of univariate DDEMs related to biological systems are presented, and the numerical solutions and results based on the proposed methodology are provided based on simulation in Section 4.

The Maximum Likelihood Estimation Approach for DDEMs
The likelihood of the DDEM for parameters θ, τ, and a, given observations y_0, y_1, . . ., y_n, is the joint density of y = (y_0, y_1, . . ., y_n), the collection of all (n + 1) observations, based on the normality assumption on the ε_i's in (1):

L(θ, τ, a) = ∏_{i=0}^{n} (1/√(2πσ²)) exp{−(y_i − x(t_i, θ, τ, a))² / (2σ²)}. (3)

We assume for the moment that σ > 0 is fixed and known; the case when σ > 0 is unknown is dealt with later. Thus, the above likelihood is taken to be a function of (θ, τ, a) for now. The usual practice for statistical inference is to use the natural logarithm of the likelihood function, namely, the log-likelihood function, which is given by

ℓ(θ, τ, a) = −((n + 1)/2) ln(2πσ²) − (1/(2σ²)) ∑_{i=0}^{n} (y_i − x(t_i, θ, τ, a))². (6)

Expressions for the log-likelihood ℓ are often simpler than the likelihood function L, since they are easier to differentiate and the results are more stable computationally. Since ln(⋅) is a monotonically increasing function, the maximization of (3) and (6) is equivalent in that the same optimizing parameter value is found. We denote the MLE of (θ, τ, a) as (θ̂, τ̂, â). The MLE (θ̂, τ̂, â) is a point estimate such that

(θ̂, τ̂, â) = arg max ℓ(θ, τ, a) (7)

and can be viewed as a random vector depending on the distribution of the data, y = (y_0, y_1, . . ., y_n). We now consider the case when σ² is unknown. The MLE of σ² is denoted by σ̂². After finding (θ̂, τ̂, â) as in (7), the log-likelihood in (6) is maximized as a function of σ². The resulting estimate is available in closed form and is given by

σ̂² = (1/(n + 1)) ∑_{i=0}^{n} (y_i − x(t_i, θ̂, τ̂, â))².
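Because σ̂² has the closed form above, σ² can be profiled out of the log-likelihood: substituting σ̂² back into (6) gives ℓ = −((n + 1)/2)(ln(2πσ̂²) + 1). A minimal Python sketch of this computation (our own notation; `x_fit` stands for the DDE solution x(t_i, θ, τ, a) evaluated at the sampled time points):

```python
import math

def profiled_loglik(y, x_fit):
    """Gaussian log-likelihood with sigma^2 profiled out:
    sigma2_hat = (1/(n+1)) * sum_i (y_i - x(t_i))^2, and
    ell = -((n+1)/2) * (log(2*pi*sigma2_hat) + 1)."""
    n1 = len(y)  # n + 1 observations
    rss = sum((yi - xi) ** 2 for yi, xi in zip(y, x_fit))
    sigma2_hat = rss / n1
    ell = -0.5 * n1 * (math.log(2 * math.pi * sigma2_hat) + 1.0)
    return ell, sigma2_hat
```

Maximizing the profiled log-likelihood over (θ, τ, a) is then equivalent to the joint maximization over (θ, τ, a, σ²).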

A Two-Stage Numerical Procedure for Finding the MLE.
To find the MLE numerically, we develop a two-stage procedure consisting of an adaptive grid search followed by a gradient-based quasi-Newton algorithm. Two stages are needed because we wish to exploit the advantages of each algorithm while avoiding its drawbacks. Grid algorithms are able to find the global maximum of a function over a grid space: the function is evaluated at every grid point, and the grid point with the largest value is selected. Provided the grid is fine enough, this grid point will be close to the domain value that actually attains the global maximum. Thus, by gridding, we ensure that we are close to the global maximum. The adaptive grid algorithm enhances the basic gridding algorithm so that successive refinements move closer and closer to the global maximum. However, the main drawback of any grid (and adaptive grid) algorithm is its slow convergence.
On the other hand, gradient-based algorithms converge to a maximum of a function quickly. Their main drawback is that they find the local maximum nearest to the starting point. So, if the starting point is not close to the global maximum, a gradient-based algorithm does not guarantee that the global maximum is found, since it may get "stuck" at a local maximum.
The function to maximize in our case is the log-likelihood in (6), and the domain value attaining this maximum is the MLE. Thus, our two-stage method uses the adaptive grid in the first stage to ensure that we are close to the MLE and then switches to the quasi-Newton algorithm to ensure rapid convergence to the MLE.
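The two-stage idea can be sketched as follows. This pure-Python fragment is our own simplified stand-in for the paper's adaptive grid procedure and MATLAB's fminunc: stage 1 repeatedly shrinks a box grid around the best grid point, and stage 2 refines the result with finite-difference gradient steps (a crude substitute for a quasi-Newton update).

```python
import itertools

def adaptive_grid(fun, bounds, n=9, rounds=5):
    """Stage 1: evaluate `fun` on an n-point-per-axis grid over
    bounds = [(lo, hi), ...], then shrink the box to one grid step either
    side of the current best point and repeat (cf. step #(4) in the text)."""
    for _ in range(rounds):
        axes = [[lo + i * (hi - lo) / (n - 1) for i in range(n)]
                for lo, hi in bounds]
        best = min(itertools.product(*axes), key=lambda p: fun(p))
        bounds = [(max(lo, b - (hi - lo) / (n - 1)),
                   min(hi, b + (hi - lo) / (n - 1)))
                  for b, (lo, hi) in zip(best, bounds)]
    return best

def gradient_descent(fun, x0, lr=0.1, iters=500, eps=1e-6):
    """Stage 2: refine the grid estimate with central-difference gradient
    steps on the (negative log-likelihood) objective `fun`."""
    x = list(x0)
    for _ in range(iters):
        g = []
        for j in range(len(x)):
            xp = x[:]; xp[j] += eps
            xm = x[:]; xm[j] -= eps
            g.append((fun(xp) - fun(xm)) / (2 * eps))
        x = [xj - lr * gj for xj, gj in zip(x, g)]
    return x
```

In practice `fun` would be the negative profiled log-likelihood of (θ, τ); here any smooth objective illustrates the mechanics.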
The first derivative process ∂x/∂a is obtained by differentiating (2) with respect to a. Since a and t are independent of each other, differentiating (2) with respect to a gives

∂ẋ(t)/∂a = (∂f/∂x) (∂x(t)/∂a) + ∑_{j=1}^{m} (∂f/∂z_j) (∂z_j(t)/∂a), with ∂z_j(t)/∂a = ∂x(t − τ_j)/∂a,

which implies that the first derivative process ∂x/∂a satisfies another DDE, given in (14).
Similarly, the second derivative process ∂²x/∂a² is obtained by differentiating (14) with respect to a, which shows that ∂²x/∂a² satisfies a DDE depending on ∂x/∂a. These two DDEs can be solved numerically based on the initial conditions specified below.
To obtain the initial conditions of the first and second derivative processes, we note that x(t) = a for t ∈ (−∞, t_0]. Thus, ∂x/∂a = 1 and ∂²x/∂a² = 0 for t ∈ (−∞, t_0]. Subsequently, we can obtain the values of ∂x(t_i, θ, τ, a)/∂a and ∂²x(t_i, θ, τ, a)/∂a² numerically at every time point t_i by numerically solving the DDEs using (12) and (13). For this purpose, we divide each interval [t_{i−1}, t_i] into K equal segments, where K is a natural number.
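As a concrete instance, consider the linear DDE ẋ(t) = θ x(t − τ) with constant history x(t) = a for t ≤ t_0 (this model reappears as Example 1 below). Its sensitivity s(t) = ∂x(t)/∂a satisfies the same linear DDE, ṡ(t) = θ s(t − τ), with s = 1 on the history, so the state and the derivative process can be integrated side by side. A sketch using a fixed-step Euler method of steps (an illustrative stand-in for the solver used in the paper):

```python
def eddem_sensitivity(theta, tau, a, t0, tn, h):
    """Integrate x'(t) = theta * x(t - tau) together with its sensitivity
    s(t) = dx/da, which obeys s'(t) = theta * s(t - tau) with s = 1 and
    x = a on the history (-inf, t0]."""
    n = int(round((tn - t0) / h))
    lag = int(round(tau / h))  # delay measured in grid steps
    xs, ss = [a], [1.0]
    for i in range(n):
        x_del = a if i - lag < 0 else xs[i - lag]
        s_del = 1.0 if i - lag < 0 else ss[i - lag]
        xs.append(xs[i] + h * theta * x_del)
        ss.append(ss[i] + h * theta * s_del)
    return xs, ss
```

Because this DDE is linear in its history, x(t) = a · s(t) exactly, which provides a convenient check on the sensitivity computation.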
(4) Refine the grid: suppose (θ̂^(0), τ̂^(0)) = (θ_{i0}^(0), τ_{j0}^(0)) is the maximizing grid point found in step #(3). The new grid space Θ^(1) has lower and upper θ-grid points given by (θ_{i0−1}^(0), θ_{i0+1}^(0)), and the corresponding lower and upper τ-grid points are (τ_{j0−1}^(0), τ_{j0+1}^(0)). If either the lower or upper bound is not found, then the original grid space is enlarged so that the MLE occurs in the interior of Θ^(0).
(5) Repeat steps #(2)-#(4) to obtain (θ̂^(1), τ̂^(1)) based on the generic grid procedure. Repeat to generate the sequence (θ̂^(l), τ̂^(l)), l = 0, 1, 2, . . ..

Remark 1. In step #(1), the initial grid space Θ^(0) is chosen to be a large domain that is likely to contain the MLE. In our simulation experiments, since the true values of (θ, τ) are known, the domain is selected around these true values. In practice, we need to carry out an exhaustive search within upper and lower bounds for θ and τ. If the parameters are positive, as is usually the case, the lower bounds can be taken to be zero. Next, we can consider large positive numbers, say A and B, and construct the grid in [0, A]^p × [0, B]^m consisting of N equidistant marginal grid points. The value of N need not be too large since we only aim to explore the log-likelihood profile. The log-likelihood can be evaluated at these grid points and plotted to visualize the resulting surface. Depending on this plot, we can choose either to fix or to increase A and B until we are certain that the MLE is within the selected domain.
After obtaining the first-stage approximation to the MLE by the adaptive grid procedure above, we use the MATLAB function fminunc to obtain the final MLE by minimizing the negative log-likelihood viewed as a function of the unknown parameter vector Γ = (Γ_1, Γ_2, . . ., Γ_s) = (θ, τ). We have s = p + m, and the final-stage MLE is defined as

(θ̂, τ̂) = arg max ℓ_0(Γ),

where ℓ_0(Γ) = ℓ_0((θ, τ)) = ℓ(θ, τ, â(θ, τ)) with â(θ, τ) defined in (9). This MATLAB function requires the gradient vector as input, which is given by

∇ℓ_0(Γ) = (∂ℓ_0/∂Γ_1, ∂ℓ_0/∂Γ_2, . . ., ∂ℓ_0/∂Γ_s)ᵀ.

The explicit expression for each entry of ∇ℓ_0(Γ) is provided in Appendix A. The MATLAB function fminunc uses, as an option, a quasi-Newton procedure that does not require the calculation of second derivatives and hence saves computational time. Now let Γ = (Γ_1, Γ_2, . . ., Γ_q) = (θ, τ, a, σ²) denote the q × 1 vector of all unknown parameters (including a and σ²), where q = s + 2 = p + m + 2. Subsequent inference based on the MLEs requires the computation of the Fisher information [14, 15]. The Fisher information matrix I(Γ) is given by the q × q symmetric matrix whose (u, v)th element is the covariance between the uth and vth first partial derivatives of the log-likelihood:

I(Γ)_{u,v} = Cov(∂ℓ/∂Γ_u, ∂ℓ/∂Γ_v). (22)
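For the Gaussian likelihood above with σ² held fixed, each entry of the gradient ∇ℓ_0(Γ) reduces to the familiar weighted-residual form sketched below (a consistency check against the appendix expressions, not a replacement for them):

```latex
\frac{\partial \ell_0}{\partial \Gamma_u}
  = \frac{1}{\sigma^{2}} \sum_{i=0}^{n}
    \bigl( y_i - x(t_i, \Gamma) \bigr)\,
    \frac{\partial x(t_i, \Gamma)}{\partial \Gamma_u},
  \qquad 1 \le u \le s,
```

where the sensitivities ∂x(t_i, Γ)/∂Γ_u are obtained by solving the derivative DDEs of Section 3.1.1.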

Statistical Inference
Based on the expected values of the second partial derivatives, the Fisher information matrix in (22) is equivalently

I(Γ)_{u,v} = −E[∂²ℓ/∂Γ_u ∂Γ_v], 1 ≤ u, v ≤ q. (23)

The observed Fisher information matrix is simply I(Γ̂_MLE), the information matrix evaluated at the maximum likelihood estimate, Γ̂_MLE, of Γ. Further, its inverse evaluated at the MLE is an estimate of the asymptotic covariance matrix of Γ̂_MLE, which is given by

COV(Γ̂_MLE) = I(Γ̂_MLE)^{−1}. (24)

From the log-likelihood function, the first-order partial derivatives with respect to (θ, τ, a) and σ² can be written down for 1 ≤ u ≤ q. Taking expectations of the resulting second-order derivatives and using (23) gives the entries of the Fisher information matrix for 1 ≤ u, v ≤ q. We compute each element (u, v) of the matrix in (23) for DDEMs with single and multiple delays; the explicit expressions are given in Appendices A and B.
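The observed information can also be approximated numerically when analytic second derivatives are inconvenient. The sketch below is our own finite-difference stand-in for the closed-form expressions in Appendices A and B: it builds the Hessian of the negative log-likelihood at the MLE by central differences and, for the 2 × 2 case, inverts it to obtain the covariance estimate in (24).

```python
def observed_information(negloglik, mle, eps=1e-4):
    """Observed Fisher information: central-difference Hessian of the
    negative log-likelihood evaluated at the MLE."""
    q = len(mle)
    H = [[0.0] * q for _ in range(q)]
    for u in range(q):
        for v in range(q):
            p = [list(mle) for _ in range(4)]
            p[0][u] += eps; p[0][v] += eps
            p[1][u] += eps; p[1][v] -= eps
            p[2][u] -= eps; p[2][v] += eps
            p[3][u] -= eps; p[3][v] -= eps
            H[u][v] = (negloglik(p[0]) - negloglik(p[1])
                       - negloglik(p[2]) + negloglik(p[3])) / (4 * eps * eps)
    return H

def invert_2x2(H):
    """Covariance estimate for a two-parameter case: inverse of 2x2 H."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [[H[1][1] / det, -H[0][1] / det],
            [-H[1][0] / det, H[0][0] / det]]
```

In a real application `negloglik` would be the (profiled) negative log-likelihood of the DDEM; any smooth function near its minimum illustrates the computation.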

Confidence Intervals.
A confidence interval for an unknown parameter gives the range of values most likely to cover the true value of the parameter with high probability.
The standard form of a confidence interval is estimate ± margin of error.
To construct a level C confidence interval for any element of Γ = (Γ_1, Γ_2, . . ., Γ_q) = (θ, τ, a, σ²), say Γ_u, for 1 ≤ u ≤ q, we need an estimate of the margin of error. First, the estimated standard error of the maximum likelihood estimate, Γ̂_{u,MLE}, of Γ_u is given by

SE(Γ̂_{u,MLE}) = √[COV(Γ̂_MLE)]_{uu},

where COV(Γ̂_MLE) is the covariance matrix given in (24). The explicit terms of the covariance matrix can be obtained by substituting (30) into (24). The confidence interval for Γ_u is

Γ̂_{u,MLE} ± z_{α/2} SE(Γ̂_{u,MLE}), (35)

where z_{α/2} = norminv(1 − α/2), α = 0.05 is the desired significance level, and z_{α/2} SE(Γ̂_{u,MLE}) is the margin of error. We can find confidence intervals for all components of Γ = (Γ_1, Γ_2, . . ., Γ_q) = (θ, τ, a, σ²) in this way. In some cases, the estimated confidence interval in (35) may include negative values, which is unreasonable for a parameter known to be positive. In this case, we perform a logarithmic transformation of the parameter, construct the confidence interval for the log-transformed parameter, and then transform the confidence interval back to the original parameter space. The confidence interval for Γ_u based on this log-transformation procedure is

exp{ln Γ̂_{u,MLE} ± z_{α/2} SE(Γ̂_{u,MLE}) / Γ̂_{u,MLE}},

where SE(Γ̂_{u,MLE}) / Γ̂_{u,MLE} is the delta-method standard error of ln Γ̂_{u,MLE}.
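The Wald interval in (35) and a log-scale variant can be computed as follows. This Python sketch uses `statistics.NormalDist().inv_cdf` as the analogue of MATLAB's norminv; the log-scale branch, with delta-method standard error se/est, is our reconstruction of the transformation procedure described above.

```python
import math
from statistics import NormalDist

def wald_ci(est, se, alpha=0.05, positive=False):
    """Level (1 - alpha) Wald interval: est +/- z_{alpha/2} * se.
    With positive=True, the interval is built for log(est) using the
    delta-method standard error se/est, then mapped back by exp(), so the
    endpoints are always positive."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # analogue of norminv(1 - alpha/2)
    if positive:
        half = z * se / est
        return est * math.exp(-half), est * math.exp(half)
    return est - z * se, est + z * se
```

For a positive parameter such as σ², `wald_ci(est, se, positive=True)` avoids the negative lower endpoints that the plain interval (35) can produce.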

Examples
We present three examples of DDEMs in the univariate case: two models with a single delay and a third with two delays.

Example 1.
We consider the exponential delay differential equation model (EDDEM) with a single delay (i.e., m = 1, p = 1), which is the solution to the DDE

ẋ(t) = θ x(t − τ). (37)

The EDDEM in (37) is a model for ideal population growth under infinite resources and no deaths, such as a protozoan or bacterial culture dividing under constant environmental conditions. The delay parameter τ can be taken to represent the gestation or maturity period, that is, the time taken for individuals to be ready for division. The parameter θ represents the growth rate of the population. We numerically solve the DDE in (37) using the MATLAB function dde23 with the parameters (θ, τ, a, σ²) fixed at (0.5, 1, 5, 0.01).
Sampled observations from the DDEM as in (1) were obtained at discrete time intervals of width h = 0.1 starting from t_0 = 0. The endpoint considered is t_n = 10, corresponding to n + 1 = 101. The aim is to estimate θ, τ, a, and σ² based on y_0, y_1, . . ., y_n. Figure 1(a) illustrates the different behaviour of x(t) under different parameter specifications. Figure 1(b) shows the underlying trajectory of the solution x(t) of the DDE model (37) and the (n + 1) sampled observations. The initial grid space for the adaptive grid procedure was taken to be Θ^(0) = {(θ_i, τ_j) : θ_i = θ_0 + ih, τ_j = τ_0 + jh, i = 1, 2, . . ., N_1; j = 1, 2, . . ., N_2} with (θ_0, τ_0) = (0.1, 0.6), h = 0.1, and N_1 = N_2 = 9; see the remark towards the end of this section regarding the selection of Θ^(0) for this example as well as for the rest. The stopping-criterion threshold ε was chosen to be 0.0001. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 1 and 2. Recall that K is the number of subdivisions of each interval [t_{i−1}, t_i] needed for calculating the quantities ∂x(t_i, θ, τ, a)/∂a and ∂²x(t_i, θ, τ, a)/∂a²; see Section 3.1.1. As K increases, the MLE estimates become more accurate but at the cost of increased computational time. We note that, for K = 50 or K = 100, satisfactory results are already achieved in terms of closeness to the true parameter values (θ, τ, a, σ²) = (0.5, 1, 5, 0.01). Subsequently, K = 100 is used for finding the maximum of the log-likelihood function, for computing the information matrix, and for computing the confidence intervals of the parameters.
The DDE ẋ(t) = r x(t)(1 − x(t − τ)/k) is known as Hutchinson's equation [16], where x is the population at that instant, r is the intrinsic growth rate, and k is the carrying capacity of the population. Both r and k are positive constants, and τ is a positive constant delay parameter.
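Hutchinson's equation is easy to integrate with the same fixed-step Euler scheme used earlier (an illustrative sketch, not the dde23-based computation of the paper); for rτ below the stability threshold π/2, the trajectory settles at the carrying capacity k.

```python
def hutchinson(r, k, tau, a, t0, tn, h):
    """Euler integration of Hutchinson's delayed logistic equation
    x'(t) = r * x(t) * (1 - x(t - tau) / k) with constant history x = a."""
    n = int(round((tn - t0) / h))
    lag = int(round(tau / h))  # delay measured in grid steps
    xs = [a]
    for i in range(n):
        x_del = a if i - lag < 0 else xs[i - lag]
        xs.append(xs[i] + h * r * xs[i] * (1.0 - x_del / k))
    return xs
```

With r = 0.5, k = 10, and τ = 1 (so rτ = 0.5 < π/2), the solution converges to k; larger rτ produces the sustained oscillations this model is known for.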
The Fisher information matrix for h = 0.1 and K = 100 for the DLDEM with a single delay is given in (43). The 95% confidence intervals for the parameters (θ_1, θ_2, τ, a, σ²) in the DLDEM with a single delay are shown in Table 6. The delayed logistic model with two delays [17] is the solution to the DDE
Remark 2. As mentioned earlier, the adaptive grid procedure in the first stage of our two-stage procedure needs a sufficiently large domain that is likely to contain the MLE. The MLE should be close to the true parameter values that generated the data, as standard MLE theory [11][12][13] dictates; this has also been observed in the three examples considered. Hence, since the true value of (θ, τ) is known in our simulation examples, we selected the initial domain of the grid procedure to contain these true values in its interior. Thus, the notation (θ_0, τ_0) denotes the lower bounds of the parameters used in the grid procedure. The grid Θ^(0) = {(θ_i, τ_j) : θ_i = θ_0 + ih, τ_j = τ_0 + jh, i = 1, 2, . . ., N_1; j = 1, 2, . . ., N_2} is ensured to contain the true parameter values by the choice of N_1, N_2, and h in all the examples. Other than this consideration, the true values used in the simulations were selected rather arbitrarily, chosen only to be representative parameter values that exhibit the typical behaviour of the trajectories of the underlying DDEs, as shown in the figures.
Figure 1: (a) Numerical solution of the EDDEM (37) using the fixed parameter values on [0, 100] with step size h = 0.1, with θ = 0.5 and τ = 1. (b) Numerical solution of the EDDEM (37) using the estimated parameter values; the stars are the simulated noisy data from (1) at n + 1 = 101 equally spaced time points in [0, 10] with step size h = 0.1, with θ = 0.5 and τ = 1.

Figure 2: (a) Numerical solution of the DLDEM with two delays (41) using the fixed parameter values on [0, 100] with step size h = 0.1, τ_1 = 0.5, and τ_2 = 0.7; the steady state is stable when r = 2.5 and becomes unstable when r is increased. (b) Numerical solution of the DLDEM with two delays (41) using the estimated parameter values; the stars are the simulated noisy data obtained by adding noise to the DLDE solution at n + 1 = 101 equally spaced time points in [0, 10] with step size h = 0.1, τ_1 = 0.5, τ_2 = 0.7, and r = 2.5.

Figure 2 shows the results for sampled observations obtained at discrete time intervals of width h = 0.1 starting from t_0 = 0. The endpoint considered is t_n = 10, corresponding to n = 100, so that the number of sampled time points is n + 1. Figure 2(b) shows the underlying trajectory of the solution x(t) of the DDE model (41) and the (n + 1) sampled observations for the time range selected.