Efficiently Implementing the Maximum Likelihood Estimator for Hurst Exponent

This paper efficiently implements the maximum likelihood estimator (MLE) for the Hurst exponent, a vital parameter embedded in fractional Brownian motion (FBM) and fractional Gaussian noise (FGN), via a combination of the Levinson algorithm and Cholesky decomposition. Many natural and biomedical signals can be modeled as one of these two processes, and users need to estimate the Hurst exponent to differentiate one physical signal from another. Among all estimators of the Hurst exponent, the MLE is optimal, but its computational cost is also the highest; consequently, a faster but slightly less accurate estimator is often adopted instead. Analysis shows that combining the Levinson algorithm with Cholesky decomposition avoids storing any matrix and performing any matrix multiplication, and thus saves a great deal of computer memory and computational time. In addition, the first proposed MLE for the Hurst exponent was based on the assumptions that the mean is known to be zero and the variance is unknown. In this paper, all four possible situations are considered: known mean, unknown mean, known variance, and unknown variance. Experimental results show that implementing the numerical computation efficiently greatly enhances the computational performance of the MLE.

Although the MLE gives the most accurate estimates of the Hurst exponent, it easily induces computational problems and enormous computational expenditure. For example, evaluating the inverse of an autocovariance matrix may be numerically unstable, especially when H is close to 1. In this situation, the autocovariance matrix becomes nearly singular because the autocovariance function of DFGN changes very slowly [35]. This problem causes computational inaccuracy, leading to wrong explanations of the physical signals of interest. On the other hand, the cost often makes users hesitate to apply the MLE to quick-response systems, and thus the theoretical value of the MLE is generally much higher than its practical application.
A closer look at the structure of the autocovariance matrix shows that a combination of the Levinson algorithm [38] and Cholesky decomposition [39] can solve these computational problems and reduce the computational cost. Accordingly, users will be encouraged to adopt the MLE even in quick-response situations, and the MLE then has a better opportunity to become the first choice in the future, especially as computer speed continues to increase.
When the MLE was first proposed by Lundahl et al. [29], its analysis and evaluation were based on the assumptions that the mean of DFGN is zero and the variance is unknown. It is therefore only applicable to physical signals modeled as DFBM, but not suitable for the model of DFGN. When signals are modeled as DFGN, users easily obtain wrong estimation results unless they subtract the sample mean from the original signals beforehand. Therefore, for practical signals it is necessary to give complete consideration to four possible cases: known mean, unknown mean, known variance, and unknown variance; moreover, for each unknown-mean case two approaches are compared: the sample mean and the mean estimated by MLE. According to the practical situation of a realization of physical signals, users can choose one case to estimate the Hurst exponent and then the fractal dimension.
The rest of this paper is organized as follows. Section 2 briefly describes mathematical preliminaries. Section 3 introduces practical considerations for the MLE. Section 4 shows how to implement the MLE in an efficient way. Section 5 discusses experimental results. Finally, Section 6 concludes the paper.

Mathematical Preliminaries
In this section, some related models are reviewed, including FBM, FGN, DFBM, and DFGN. For consistency, the notation {x(t), t ∈ R} is used to denote a continuous-time random process and {x[n], n ∈ Z} a discrete-time random process.
FBM, represented by B_H(t, ω), where ω belongs to a sample space Ω, is a generalization of Brownian motion. For conciseness, the short notation B_H(t) is adopted in place of B_H(t, ω). According to Mandelbrot and Van Ness [31], FBM is formally defined by the relations B_H(0) = 0 and

B_H(t) − B_H(0) = (1/Γ(H + 1/2)) {∫_{−∞}^{0} [(t − s)^{H−1/2} − (−s)^{H−1/2}] dB(s) + ∫_{0}^{t} (t − s)^{H−1/2} dB(s)},

where H is the Hurst exponent with a value lying between 0 and 1 and dB(s) denotes the zero-mean, Gaussian, and independent increments of ordinary Brownian motion B(t). Its symmetric (autocovariance) form is described as follows:

E[B_H(t) B_H(s)] = (σ²/2) (|t|^{2H} + |s|^{2H} − |t − s|^{2H}).

When H equals 0.5, FBM becomes ordinary Brownian motion. Unfortunately, FBM is a nonstationary process, whose Wigner-Ville spectrum (WVS) is given by the following expression [30]:

WVS_{B_H}(t, f) = (1 − 2^{1−2H} cos(4πft)) / |f|^{2H+1}.

In spite of FBM being a time-varying process, the increment of FBM is a stationary and self-similar process, called FGN.
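As a quick numerical illustration (our own sketch, not the paper's code), the autocovariance of the discrete increments implied by the symmetric form above, γ(k) = (σ²/2)(|k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H}), can be tabulated as follows; the helper name fgn_acf is an assumption of this sketch:

```python
import numpy as np

def fgn_acf(H, n, sigma2=1.0):
    """Autocovariance gamma(k), k = 0..n-1, of discrete fractional
    Gaussian noise with Hurst exponent H and variance sigma2."""
    k = np.arange(n, dtype=float)
    return 0.5 * sigma2 * (np.abs(k + 1) ** (2 * H)
                           - 2 * np.abs(k) ** (2 * H)
                           + np.abs(k - 1) ** (2 * H))

# For H = 0.5, DFGN reduces to white noise: gamma(0) = sigma2, gamma(k) = 0.
```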

Practical Considerations for the MLE
It is well known from the properties of DFGN that the probability density function (PDF) of DFGN can be expressed as follows [29]:

f(x; H) = exp(−x^T R^{−1} x / (2σ²)) / ((2πσ²)^{N/2} |R|^{1/2}),

where x = [x[0] x[1] ⋯ x[N − 1]]^T is the dataset and R is the N × N normalized autocovariance matrix whose (i, j)th entry is γ(i − j), where γ(k) is the ACF as expressed by (4).
In real applications, some physical signals can be modeled as either DFBM or DFGN. If the signals of interest are modeled as DFBM, their increment, DFGN, will not be affected by displacement. However, if the signals of interest are modeled as DFGN, signal displacement will result in a very severe error unless the displacement problem is handled in advance. The displacement may come from modeling error, measurement error, inappropriate operation, apparatus baseline-calibration error, and so forth. In order to avoid the error resulting from displacement, two approaches are considered for estimating the displacement: one is to maximize the PDF over the mean; the other is simply to subtract the sample mean from the signals. Considering that the PDF of DFGN has two explicit parameters, the mean and the variance, that each parameter may be known or unknown, and that each unknown mean includes two approaches, there are all together four cases covering six approaches.

Case 1: Known Mean (Displacement) and Known Variance.
Under this case, neither the mean nor the variance needs to be estimated before estimating the Hurst exponent. In theory, this is the best case since full information about the mean and variance is provided. For convenience, the logarithm of the PDF will be maximized instead of the PDF itself, which produces the same result since the logarithm preserves the monotonic property of a function. Without loss of generality, the displacement is set to be 0. From (5), the logarithm of the PDF is as follows:

log f(x; H) = −(N/2) log(2πσ²) − (1/2) log |R| − x^T R^{−1} x / (2σ²).

Since constant terms and coefficients do not affect the maximum, a compact form is described as follows:

max_H {−log |R| − x^T R^{−1} x / σ²},

where σ² is known to users.
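For concreteness, the Case 1 objective can be evaluated by a naive dense reference implementation such as the following sketch (our own code, under the assumption that R carries the normalized ACF of DFGN; the function name case1_objective is ours). It deliberately forms R, so it costs O(N³); the point of Section 4 is to avoid exactly this:

```python
import numpy as np

def case1_objective(x, H, sigma2):
    """Case 1 compact objective: -log|R| - x'R^{-1}x / sigma2, with R the
    normalized autocovariance (Toeplitz) matrix of DFGN. Naive O(N^3)
    reference implementation for checking faster procedures."""
    n = len(x)
    k = np.arange(n, dtype=float)
    acf = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                 + np.abs(k - 1) ** (2 * H))
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    R = acf[idx]                      # Toeplitz structure: R[i, j] = acf[|i-j|]
    _, logdet = np.linalg.slogdet(R)
    quad = x @ np.linalg.solve(R, x)  # x' R^{-1} x without explicit inverse
    return -logdet - quad / sigma2
```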

Case 2: Known Mean (Displacement) and Unknown Variance.
The case first proposed by Lundahl et al. [29] is of this type. Likewise, the displacement is assumed to be 0. The Hurst exponent can be estimated by using the following equation:

max_H max_{σ²} log f(x; H, σ²).

It is well known that the logarithm of the PDF is expressed as follows:

log f(x; H, σ²) = −(N/2) log(2πσ²) − (1/2) log |R| − x^T R^{−1} x / (2σ²).    (10)

By maximizing log f(x; H, σ²) over σ², it follows that

σ̂² = x^T R^{−1} x / N.    (11)

By substituting (11) into (10), the final function to be maximized is obtained. Likewise, the terms that do not affect the maximization are omitted, and thus a compact form is described as follows:

max_H {−N log(x^T R^{−1} x) − log |R|}.
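The profiling of σ² above can be sketched in the same naive reference style (our own illustrative code, with the assumed helper name case2_objective); the implied variance estimate x^T R^{−1} x / N is returned alongside the compact objective:

```python
import numpy as np

def case2_objective(x, H):
    """Case 2 compact objective: profile out the variance with
    sigma2_hat = x'R^{-1}x / N, leaving -N*log(x'R^{-1}x) - log|R|
    up to additive constants. Naive dense reference implementation."""
    n = len(x)
    k = np.arange(n, dtype=float)
    acf = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                 + np.abs(k - 1) ** (2 * H))
    R = acf[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
    _, logdet = np.linalg.slogdet(R)
    quad = x @ np.linalg.solve(R, x)
    sigma2_hat = quad / n            # profiled variance estimate
    return -n * np.log(quad) - logdet, sigma2_hat
```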

Case 3: Unknown Mean (Displacement) and Known Variance.
Let the measurement data be z = x + μ1, where x can be modeled as DFGN with zero mean and μ1 is a column vector with each element being the constant μ; that is, μ1 = [μ μ ⋯ μ]^T. The Hurst exponent can be estimated by using the following two approaches, based on two estimators of μ.
Approach 1. First maximize the logarithm of the PDF over μ by taking the derivative with respect to μ, and then maximize the logarithm of the maximized PDF over H; that is,

max_H max_μ log f(z; H, μ).

The unknown displacement of DFGN is assumed to be μ, and thus the log-PDF is

log f(z; H, μ) = −(N/2) log(2πσ²) − (1/2) log |R| − (z − μ1)^T R^{−1} (z − μ1) / (2σ²).    (16)

Maximizing log f(z; H, μ) over μ by taking the derivative with respect to μ is equivalent to minimizing (z − μ1)^T R^{−1} (z − μ1). The estimator of μ is derived in the Appendix as follows:

μ̂ = (1^T R^{−1} z) / (1^T R^{−1} 1),    (17)

where 1 = [1 1 ⋯ 1]^T. Next, by substituting (17) into (16), the final function to be maximized is obtained. Likewise, the terms without effect on the maximization are omitted, and thus a compact form is described as follows:

max_H {−log |R| − (z − μ̂1)^T R^{−1} (z − μ̂1) / σ²}.

Approach 2. Use the sample mean in place of the previous estimator of μ; the other procedures are the same as those of Approach 1. The sample mean is the simplest estimator of the mean:

μ̄ = (1/N) Σ_{k=0}^{N−1} z[k].
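The weighted mean estimator of Approach 1, μ̂ = (1^T R^{−1} z)/(1^T R^{−1} 1), can be sketched as follows (our own naive reference code; the function name mle_mean is an assumption):

```python
import numpy as np

def mle_mean(z, H):
    """GLS/MLE estimate of the displacement:
    mu_hat = (1' R^{-1} z) / (1' R^{-1} 1), with R the normalized
    autocovariance matrix of DFGN. Naive dense reference."""
    n = len(z)
    k = np.arange(n, dtype=float)
    acf = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                 + np.abs(k - 1) ** (2 * H))
    R = acf[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
    w = np.linalg.solve(R, np.ones(n))   # R^{-1} 1
    return (w @ z) / w.sum()
```

For H = 0.5, R is the identity, so μ̂ coincides with the sample mean of Approach 2; for other H the two estimators differ only slightly in practice.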

Case 4: Unknown Mean (Displacement) and Unknown Variance.
This case is the most general in real applications. As in Case 3, the measurement data are assumed to be z = x + μ1, and the unknown variance is σ². Similarly, the Hurst exponent is estimated by using the following two approaches.
Approach 1. As in Case 3, the Hurst exponent is estimated as follows:

max_H max_{σ²} max_μ log f(z; H, σ², μ).

First, maximize log f(z; H, σ², μ) over σ² and μ by taking derivatives with respect to σ² and μ, respectively; the estimators of σ² and μ are derived as follows:

μ̂ = (1^T R^{−1} z) / (1^T R^{−1} 1),
σ̂² = (z − μ̂1)^T R^{−1} (z − μ̂1) / N.

Likewise, the terms that do not affect the maximization are omitted, and thus a compact form is described as follows:

max_H {−N log((z − μ̂1)^T R^{−1} (z − μ̂1)) − log |R|}.

Approach 2. Use the sample mean in place of the previous estimator of μ; the other procedures are the same as those of Approach 1.
The final step of each case is to estimate the Hurst exponent, but it requires some care. A direct maximization over H is infeasible because the Hurst exponent is an implicit parameter. Therefore, the golden section search [41] was adopted in this paper to find the maxima of (7), (13), (20), and (24).
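A minimal sketch of the golden section search used for this one-dimensional maximization (our own code; the bracket endpoints and tolerance below are illustrative, with 0 < H < 1 as the admissible range):

```python
import math

def golden_section_max(f, a, b, tol=1e-4):
    """Golden section search for the maximum of a unimodal f on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Illustrative use with a toy unimodal objective peaking at H = 0.7:
H_hat = golden_section_max(lambda H: -(H - 0.7) ** 2, 0.001, 0.999)
```

Only one new function evaluation is needed per iteration, which is why the threshold 0.0001 reported later costs just 22 evaluations in total.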

Efficient Procedures for the MLE
In this section, the computational stability and efficiency of using the MLE for the Hurst exponent are studied. Computing the inverse and determinant of an autocovariance matrix is sensitive to the data size, especially when H is close to 1 [35]. Also, for a large dataset, storing the whole autocovariance matrix requires a large amount of computer memory. Therefore, a reliable and efficient procedure is necessary for estimating the Hurst exponent, especially when users use an ordinary computer with less memory and lower CPU speed. After carefully studying the structure of an autocovariance matrix, a combination of the Levinson algorithm and Cholesky decomposition can be applied to efficiently compute the inverse and determinant of an autocovariance matrix, and then the iterative structures of the two algorithms can be exploited to estimate the Hurst exponent without storing any matrix or performing any matrix multiplication. For convenience, some notation is listed below for further reference, where a_k(i), i = 1, 2, . . ., k, are the predictor coefficients of order k and P_k, k = 0, 1, . . ., N − 1, are the prediction error powers of order k. These coefficients are iteratively computed by the Levinson algorithm. It is worth noting that when numerically calculating log |R|, Σ_{k=0}^{N−1} log P_k is computed instead of log(P_0 P_1 ⋯ P_{N−1}) because the product P_0 P_1 ⋯ P_{N−1} easily approaches zero numerically as the data size grows. Thus, the following equation is essential for computing log |R|:

log |R| = Σ_{k=0}^{N−1} log P_k.    (28)

Obviously, the Levinson algorithm and Cholesky decomposition can be used to save the time of computing the inverse and determinant of the autocovariance matrix. However, estimating the Hurst exponent through the structure mentioned so far still needs matrix computation and storage, which requires excessive computational time and computer memory. Taking a closer look at the term x^T R^{−1} x of (7) or (13), a helpful form appears as follows:

x^T R^{−1} x = Σ_{k=0}^{N−1} y_k² / P_k,    (29)

where y_k = w_k^T x, k = 0, 1, . . ., N − 1.
With (31), it is no longer necessary to store any matrix in the process of computation, which is also a very efficient step.
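As an illustration of the underlying recursion (our own sketch, not the paper's code), the prediction error powers P_k and the sum Σ log P_k in (28) can be produced by the standard Levinson-Durbin recursion on the autocovariances; the helper name levinson_logdet is an assumption of this sketch:

```python
import numpy as np

def levinson_logdet(r):
    """Levinson-Durbin recursion on autocovariances r[0..N-1].
    Returns the final predictor coefficients a (in a[1..N-1]) and the
    prediction error powers P_k; log|R| = sum_k log P_k, which avoids
    the numerical underflow of the raw product P_0 P_1 ... P_{N-1}."""
    n = len(r)
    a = np.zeros(n)
    P = np.empty(n)
    P[0] = r[0]                  # order-0 prediction error power
    for m in range(1, n):
        # reflection coefficient of order m
        k = (r[m] - a[1:m] @ r[m-1:0:-1]) / P[m-1]
        a[1:m] = a[1:m] - k * a[m-1:0:-1]
        a[m] = k
        P[m] = P[m-1] * (1.0 - k * k)
    return a, P, np.sum(np.log(P))
```

For a 2 × 2 Toeplitz matrix with r = [1, 0.5], the recursion gives P = [1, 0.75] and log |R| = log 0.75, matching the direct determinant.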
In this paper, the golden section search was adopted to find the maxima of (7), (13), (20), and (24). In the process of searching for each maximum, it is necessary to compute the inner terms of (7), (13), (20), or (24). In order to efficiently estimate the Hurst exponent by using (7) or (13), first compute a_k, k = 0, 1, . . ., N − 1, by using the Levinson algorithm, then w_k by using (26), y by using (30) and (31), x^T R^{−1} x by using (29), and log |R| by using (28). The details of determining the Hurst exponent by using the MLE are described in the following procedure: (1) initialize k = 0; (2) compute a_k and P_k by using the Levinson algorithm; (3) compute w_k by using (26); (4) compute y_k by using (30) and (31); (5) accumulate x^T R^{−1} x by using (29) and log |R| by using (28); (6) increase k by 1 and repeat steps (2)-(5) until k = N − 1. Obviously, it is unnecessary to store any matrix or execute any matrix multiplication in the whole series of computations, except for vector storage and multiplication. The efficient procedure saves not only computer memory but also computation time.
Next, an efficient procedure for computing (20) or (24) is considered. Each procedure considers two approaches to estimating the mean: the sample mean and the mean estimated by MLE. For the first approach, users first subtract the sample mean from the original signals and then call the function evaluation of (7) or (13). For the second approach, users must use an efficient implementation of (17) to estimate the mean. Based on the composition structure of the mean estimated by MLE, (17) can be decomposed into (32)-(34); similarly, the term (z − μ̂1)^T R^{−1} (z − μ̂1) can be decomposed into (35)-(37). In order to efficiently compute (20) or (24), first compute a_k, k = 0, 1, . . ., N − 1, by using the Levinson algorithm, then w_k by using (26), W1 by using (34), Wz by using (36), μ̂ by using (32), Wμ̂ by using (37), (z − μ̂1)^T R^{−1} (z − μ̂1) by using (35), and log |R| by using (28). The details of determining the Hurst exponent by using the MLE are described in the following procedure.
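The core of such a matrix-free procedure can be sketched as follows (our own illustrative code, not the paper's implementation): at each order the Levinson recursion is advanced one step and the whitened prediction residual is accumulated, so that x^T R^{−1} x and log |R| are obtained with O(N) storage and without ever forming R or R^{−1}. Here the quadratic form is accumulated through forward prediction errors e_m, whose variances are the error powers P_m:

```python
import numpy as np

def quad_logdet(r, x):
    """Compute x' R^{-1} x and log|R| for the Toeplitz covariance R built
    from autocovariances r, storing only O(N) vectors. At order m the
    Levinson recursion yields the predictor coefficients and error power
    P_m; the prediction errors e_m are uncorrelated with variances P_m, so
    x' R^{-1} x = sum_m e_m^2 / P_m and log|R| = sum_m log P_m."""
    n = len(r)
    a = np.zeros(n)
    P = r[0]
    quad = x[0] ** 2 / P
    logdet = np.log(P)
    for m in range(1, n):
        k = (r[m] - a[1:m] @ r[m-1:0:-1]) / P   # reflection coefficient
        a[1:m] = a[1:m] - k * a[m-1:0:-1]
        a[m] = k
        P = P * (1.0 - k * k)
        e = x[m] - a[1:m+1] @ x[m-1::-1]        # order-m prediction error
        quad += e * e / P
        logdet += np.log(P)
    return quad, logdet
```

For r = [1, 0.5] and x = [1, 1], direct computation gives x^T R^{−1} x = 4/3 and log |R| = log 0.75, which the streaming version reproduces while the cost per likelihood evaluation drops from O(N³) to O(N²).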
Similar to Procedure 1, it is not necessary to store any matrix or execute any matrix multiplication in the series of computations, except for vector storage and multiplication. Without the efficient computation of x^T R^{−1} x, users would first implement the Levinson algorithm to obtain a_k and P_k, k = 0, 1, . . ., N − 1, then store L and D, obtain R^{−1} by using L^T D L, and finally perform the matrix computation of x^T R^{−1} x. The traditional procedure only saves the time of inverting the matrix by using the Levinson algorithm but overlooks the potential efficacy of the iteratively generated predictor coefficients and prediction error powers. With this stable and efficient implementation, the practicability of the MLE will be greatly enhanced.

Results and Discussion
In order to analyze the four possible cases and compare their efficiency, the generating algorithm proposed by Lundahl et al. [29] was used to generate DFGN, because the realizations produced by this algorithm possess a fine correlation structure and long-term dependence.
All estimations were performed with the same computing specifications: (1) hardware: a computer with an Intel Core i7-2600 processor up to 3.40 GHz and 8.00 GB of RAM (7.89 GB available); (2) operating system: Windows 7 Professional Service Pack 1; (3) programming software: MATLAB R2011b 64-bit (win64); (4) optimization algorithm: golden section search with threshold 0.0001, which takes 21 iterations and 22 function evaluations in total [42]. Table 1 shows the experimental results, each value representing the mean of the mean-squared errors (MSEs) of 100 realizations over 13 Hurst exponents, simply denoted as mean mean-squared errors (MMSEs).
On the other hand, the function evaluation time was recorded and compared with the implementation time of the traditional MLE for efficiency analysis. Table 2 lists the average time (in seconds) over 13 Hurst exponents spent by each approach in two executing procedures, with and without considering the computational efficiency, as well as their corresponding time ratio.
Table 1 shows that the best results are almost always acquired in Case 1, the second best in Case 3, and the worst in Case 4. This is reasonable because both mean and variance are known in Case 1, but both are unknown in Case 4. It is worth noting that the accuracy of Case 3 is better than that of Case 2, which indicates that accuracy is more related to a known variance than to a known mean. Generally speaking, the most practical situation is Case 4, with both mean and variance unknown; the situation of Case 1 is less likely in practice. Table 1 also suggests that using the sample mean instead of the mean estimated by MLE is a reliable approach.
It is easy to see that the computational cost of the traditional MLE for the Hurst exponent is O(N³), whereas the two newly proposed procedures only need O(N²). In addition to the lower computational complexity, storing data in vectors instead of matrices also helps raise the computational efficiency. Table 2 suggests that without matrix calculation the time saving is obvious, especially as the data size grows. For example, with a data size of 4096, the speedup of each proposed efficient procedure over the traditional one reaches up to 80 times. The ratio will be even larger on computers with limited resources. These results strengthen the position of the MLE for estimating the Hurst exponent.

Table 1 :
Accuracy comparison for four cases covering six approaches, each value representing the mean of mean-squared errors (MSEs) of 100 realizations over 13 Hurst exponents, simply denoted as mean mean-squared errors (MMSEs).

Table 2 :
Efficiency comparison for four cases covering six approaches, each value representing either the average time (in seconds) over 13 Hurst exponents spent by each approach in two executing procedures (with and without considering the computational efficiency) or their corresponding time ratio.