An Empirical Likelihood Ratio-Based Omnibus Test for Normality with an Adjustment for Symmetric Alternatives

An omnibus test for normality with an adjustment for symmetric alternatives is developed using the empirical likelihood ratio technique. We first transform the raw data via a jackknife transformation, deleting one observation at a time. The probability integral transformation is then applied to the transformed data; under the null hypothesis, the resulting data have a limiting uniform distribution, which reduces testing for normality to testing for uniformity. Employing the empirical likelihood technique, we show that the test statistic has a chi-square limiting distribution. We also demonstrate that, under the established symmetric settings, the CUSUM-type and Shiryaev–Roberts test statistics have comparable properties and power. The proposed test has good control of type I error. Monte Carlo simulations revealed that the proposed test outperformed the classical tests studied under symmetric short-tailed alternatives. Findings from a real data study further illustrated the robustness and applicability of the proposed test in practice.


Introduction
The empirical likelihood (EL) methodology was introduced in [1,2] and has been widely studied as a nonparametric approximation of the parametric likelihood approach (e.g., [3][4][5][6]). Thus, it utilizes the concept of likelihoods in a distribution-free manner to approximate optimal parametric likelihood-based techniques. The method provides a versatile approach that may be applied to perform inference for a wide variety of statistical applications. An area with substantial new development in the use of EL methods is hypothesis testing. Various researchers have proposed EL-based goodness-of-fit (GoF) tests for continuous distributions, covering a wide range of hypotheses that includes exponentiality [7,8], the logistic distribution [9], uniformity [10], and normality [11,12].
From the various proposed EL testing procedures as well as current statistical practice, it is evident that testing composite hypotheses of normality is the most common research focus in GoF testing. The continued growing need for normality tests is attributed to the frequent use of normally distributed data in various areas of pure and applied statistical practice. Although it is difficult to propose a test for normality that competes with the highly efficient family of Shapiro-Wilk tests (e.g., [13][14][15][16]), the proposed EL-based normality tests have proved to be superior under certain alternative distributions [12]. Of these tests, the moment-based tests seem to have gained more traction due to their flexibility, simplicity, power properties, and convenient use as omnibus tests in assessing the normality of underlying continuous distributions.
To test for normality, Dong and Giles [11] proposed an omnibus test statistic by directly utilizing the EL methodology outlined by Owen [17]. They utilized the first four moment constraints that characterize the normal distribution. After outlining drawbacks of the test proposed by Dong and Giles [11], Shan et al. [12] proposed a cumulative sum (CUSUM)-type simple and exact empirical likelihood ratio-based (SEELR) test statistic for normality which, unlike that of Dong and Giles [11], has good control of type I error (also see [18]) and can be easily implemented in a wide range of statistical packages. The test by Shan et al. [12] is an omnibus test that makes use of standardized sample observations using the Lin and Mudholkar [19] jackknife transformation. In their study, Shan et al. [12] reported that the power of their proposed omnibus test is comparable to that of well-known existing tests and often exceeds it under certain alternatives, mostly asymmetric distributions.
Like some other tests for normality, the test proposed by Shan et al. [12] suffers a loss of power under several symmetric alternatives. It is a challenge to propose an omnibus test that has higher power than the classical Jarque-Bera tests [20,21] and the D'Agostino-Pearson K² test [22] in detecting departures from normality under alternatives that share the symmetry of the normal distribution.
Through the utilization of various mathematical and statistical properties that characterize the normal distribution (for example, see [19,23,24]), one can remedy such shortfalls in GoF tests. One such remedy is transformation to uniformity, which has several benefits that include increasing the power of a test under certain alternatives (for example, see [8,25]). For a data-driven omnibus test for symmetry, Fang et al. [25] utilized a bootstrapping approach coupled with the probability integral transformation; under the null hypothesis, the transformed data had a limiting uniform distribution. For superior power under symmetric alternatives, their proposed test required only odd-ordered orthogonal moments of the transformed data in constructing the test statistic. The probability integral transformation, first introduced by Rosenblatt [26], has been widely used in the development of GoF tests of normality, especially in empirical distribution function (EDF)-based tests. The EDF tests make use of the probability integral transformation U = F(X): if F is the (continuous) distribution function of X, the random variable U is uniformly distributed between 0 and 1. Given n ordered observations X(1), ..., X(n), the values U(i) = F(X(i)) are computed. The most commonly used EDF tests for normality are the Anderson-Darling test [27,28], the Lilliefors test, which is well known as the modified Kolmogorov-Smirnov test [29], and the Cramér-von Mises test [30]. In addition to the use of the probability integral transformation, several approaches have been used to construct GoF tests for the composite hypothesis of normality. In this study, we adopted the EL methodology to propose a new omnibus test for normality by exploiting different forms of characterizing the normal distribution. The purpose of this paper is to use a jackknife characterization due to Shan et al. [12], as in Lin and Mudholkar [19], followed by a probability integral transformation (see [8,25]) for developing a goodness-of-fit test for normality. Here we consider the approach of obtaining a GoF test statistic by combining two well-known characterizations, individually powerful against different classes of alternatives. However, following the work of Fang et al. [25], we restrict attention to symmetric alternatives. Power comparisons are conducted with some of the most widely known EDF-based tests, well-known and powerful moment-based tests, and the powerful classical SW tests.
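The probability integral transformation U = F(X) described above can be illustrated with a short R sketch. The simulated sample, the parameter values, and the variable names are assumptions chosen for illustration only; the sketch assumes the parameters of F are known, which is simpler than the composite setting treated in this paper.

```r
# Minimal sketch of the probability integral transformation U = F(X).
# Assumption: X is normal with KNOWN mean and sd, so F is pnorm().
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)   # simulated sample (illustrative only)
u <- pnorm(x, mean = 10, sd = 2)     # U = F(X), one value per observation
u_sorted <- sort(u)                  # U_(1) <= ... <= U_(n), as used by EDF tests

# If F is the true distribution function, u should look Uniform(0, 1);
# a Kolmogorov-Smirnov test against the uniform gives a rough check.
ks <- ks.test(u, "punif")
```

When the parameters must be estimated from the data, the transformed values are no longer exactly uniform, which is one reason modified tests such as the Lilliefors test exist.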

Test Development
Consider an unknown continuous distribution with unordered random variables $X_1, X_2, \ldots, X_n$ that are assumed to be independent and identically distributed (i.i.d.). The intention is to test whether the observations are consistent with a normal distribution. Thus, we test the null hypothesis

$$H_0: X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2),$$

where $\mu$ and $\sigma^2$ are unknown parameters. We propose to use standardized versions of the sample observations. To achieve this, we adopted a jackknife transformation that deletes one observation at a time, following Lin and Mudholkar [19] (also see [12]). Thus, we transformed our observations using

$$Z_i = \sqrt{\frac{n-1}{n}}\,\frac{X_i - \bar{X}_{-i}}{s_{-i}}, \quad i = 1, 2, \ldots, n,$$

where $\bar{X}_{-i}$ and $s_{-i}$ are the sample mean and sample standard deviation of the $n-1$ observations that remain after deleting $X_i$. According to Shan et al. [12], as $n$ gets large, the standardized data points $Z_1, Z_2, \ldots, Z_n$ become asymptotically independent, while under the null hypothesis they are distributed according to a $t$ distribution with $n-2$ df, which approaches the standard normal as $n$ grows.

In addition to this transformation, we applied the probability integral transformation (see [25] as well as [8]), which transforms the standardized random variables into independent uniformly distributed random variables $Y_1, Y_2, \ldots, Y_n$. That is, under the null hypothesis, the transformed data follow the uniform distribution asymptotically. From the proposed transformation, the $Y_i$ are uniformly distributed on $(0, 1)$, where the density function of the uniform distribution on $(a, b)$ is given by

$$f(y) = \frac{1}{b-a}, \quad a \le y \le b,$$

where $a$ is the lowest value of $y$ and $b$ is the highest value of $y$. The $k$th moment of the uniform distribution is defined by

$$E(Y^k) = \frac{b^{k+1} - a^{k+1}}{(k+1)(b-a)}. \tag{4}$$

We then constructed moment equations utilizing the raw moments of the uniform distribution on $(0, 1)$, which are given by

$$\mu_k = E(Y^k) = \frac{1}{k+1}.$$

The composite hypothesis for the ELR test was then given by

$$H_0: E(Y^k) = \mu_k \quad \text{versus} \quad H_a: E(Y^k) \ne \mu_k. \tag{7}$$

The nonparametric empirical likelihood function corresponding to the hypotheses in equation (7) is expressed as

$$L(F) = \prod_{i=1}^{n} p_i, \tag{8}$$

where the unknown probability parameters $p_i$ are attained under $H_0$ and $H_a$.
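The two transformation steps and the uniform moment formula can be sketched in R as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the leave-one-out standardization is written in the form that yields a t distribution with n − 2 df under normality, and the probability integral transformation is assumed to use the t(n − 2) distribution function (the standard normal distribution function would be a large-sample alternative).

```r
# Leave-one-out (jackknife) standardization: each X_i is centred and scaled by
# the mean and standard deviation of the remaining n - 1 observations.
# Assumed form: Z_i = (X_i - mean(X[-i])) / (sd(X[-i]) * sqrt(n / (n - 1))),
# which is t-distributed with n - 2 df when the X_i are i.i.d. normal.
jackknife_standardize <- function(x) {
  n <- length(x)
  stopifnot(n >= 3)
  vapply(seq_len(n), function(i) {
    rest <- x[-i]
    (x[i] - mean(rest)) / (sd(rest) * sqrt(n / (n - 1)))
  }, numeric(1))
}

# Probability integral transformation with the t(n - 2) distribution function.
# Assumption: pnorm(z) could be used instead as a large-sample approximation.
pit_transform <- function(z) pt(z, df = length(z) - 2)

# k-th raw moment of Uniform(a, b); equals 1 / (k + 1) on (0, 1).
uniform_moment <- function(k, a = 0, b = 1) {
  (b^(k + 1) - a^(k + 1)) / ((k + 1) * (b - a))
}

set.seed(123)
x <- rnorm(50, mean = 5, sd = 3)   # simulated data (illustrative only)
z <- jackknife_standardize(x)
y <- pit_transform(z)              # values in (0, 1), roughly uniform under H0
```

Because the standardization is invariant to location and scale, the unknown μ and σ² do not need to be estimated separately before the transformation.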
Under $H_0$, the EL function given in equation (8) is maximized with respect to the $p_i$ subject to two constraints:

$$\sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i Y_i^k = \mu_k.$$

Following this, the weights $p_i$ are identified as

$$(p_1, p_2, \ldots, p_n) = \arg\max_{a_1, a_2, \ldots, a_n} \left\{ \prod_{i=1}^{n} a_i : \sum_{i=1}^{n} a_i = 1, \ \sum_{i=1}^{n} a_i Y_i^k = \mu_k \right\},$$

where $0 \le a_j \le 1$ for $j = 1, 2, \ldots, n$. Using the Lagrangian multipliers technique, it can be shown that the maximum EL function under $H_0$ can be expressed as

$$L(F_{H_0}) = \prod_{i=1}^{n} \frac{1}{n\left[1 + \lambda_k (Y_i^k - \mu_k)\right]}, \tag{11}$$

where $\lambda_k$ in equation (11) is a root of

$$\sum_{i=1}^{n} \frac{Y_i^k - \mu_k}{1 + \lambda_k (Y_i^k - \mu_k)} = 0.$$

Under the alternative hypothesis, the constraint $\sum_{i=1}^{n} p_i Y_i^k = \mu_k$ is not required to identify the weights $p_i$ that maximize the EL function; only $\sum_{i=1}^{n} p_i = 1$ is required. Thus, under $H_a$, the nonparametric EL function is given by

$$L(F_{H_a}) = \prod_{i=1}^{n} \frac{1}{n} = n^{-n}.$$

Now let $(-2LLR)_k$ be the $-2$ log-likelihood ratio test statistic for the hypotheses $H_0: E(Y^k) = \mu_k$ versus $H_a: E(Y^k) \ne \mu_k$. It should be noted that, under $H_0$, $(-2LLR)_k$ has an asymptotic $\chi^2(1)$ limiting distribution [1]. Considering the null and alternative hypotheses, the test statistic is given by

$$(-2LLR)_k = -2\log\frac{L(F_{H_0})}{L(F_{H_a})} = 2\sum_{i=1}^{n} \log\left[1 + \lambda_k (Y_i^k - \mu_k)\right].$$

We then proposed to reject the null hypothesis using two different test statistics. Firstly, we considered the cumulative sum (CUSUM)-type statistic given by

$$\max_{k \in G} (-2LLR)_k > C_\alpha. \tag{15}$$

Secondly, we considered the common alternative to the CUSUM-type statistic, which is to utilize the Shiryaev-Roberts (SR) statistic (for example, see [31] among others). In our case, the classical SR statistic was of the form

$$\sum_{k \in G} \exp\left\{(-2LLR)_k\right\} > C_\alpha, \tag{16}$$

where $C_\alpha$ is the test threshold and is the $100(1-\alpha)\%$ percentile of the $\chi^2(1)$ distribution. The set $G$ consists of integer values representing the moment constraints that maximize the test statistic. Our proposed test statistic (equation (15)) is developed utilizing approaches introduced by Vexler and Wu [32], and various authors have demonstrated that the SR statistic (see equation (16)) and the CUSUM-type statistic have almost equivalent optimal statistical properties due to their common null-martingale basis [31].

The choice of $G$ is also vital in moment-based test statistics. Fang et al. [25] utilized the probability integral transformation and recommended that, under the test of symmetry, only odd-ordered moments of the transformed data are required in the construction of the test statistic. They further noted that the use of odd-ordered moments has several benefits, including power against most symmetric alternatives, robustness, and good performance under small sample sizes. For our proposed test statistics, we conducted an extensive Monte Carlo simulation exercise to empirically evaluate the choice of $G$ that gives optimal power under symmetric alternatives in testing for normality. Following the work of Fang et al. [25] as well as Shan et al. [12], we estimated the powers of the test statistics for different alternatives and definitions of odd-ordered moments of $G$. Table 1 displays a subset of the Monte Carlo simulation results. We also considered additional alternatives based on samples of sizes $n = 20$, 50, and 100 at $\alpha = 0.05$. We used size-adjusted critical values for each test statistic, and power for each test was computed using 5,000 replications. Both the CUSUM-type and the Shiryaev-Roberts proposed test statistics with $G = \{3, 5\}$ showed an average power greater than that of all other cases under symmetric short-tailed alternatives. In addition, the CUSUM-type test statistic with $G = \{3, 5\}$ showed an average power greater than that of all other cases under symmetric long-tailed alternatives. Since our proposed test is meant to perform well under symmetric alternatives, we [...] [25]. In this article, we denote by CSELR the CUSUM-type test statistic and by SRELR the Shiryaev-Roberts test statistic. A schematic algorithm of the testing procedure is shown in Figure 1.
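The empirical likelihood computation described in this section can be sketched in R as follows. This is a minimal illustration, not the authors' code: the function names `neg2_llr` and `el_statistics` are hypothetical, the Lagrange multiplier is found numerically with `uniroot`, the input is assumed to be data already transformed to (0, 1), and G = {3, 5} is used as the default moment set.

```r
# -2 log empirical likelihood ratio for H0: E(Y^k) = mu_k, where
# mu_k = 1 / (k + 1) is the k-th raw moment of Uniform(0, 1).
neg2_llr <- function(y, k) {
  w <- y^k - 1 / (k + 1)                       # terms Y_i^k - mu_k
  if (min(w) >= 0 || max(w) <= 0) return(Inf)  # mu_k outside the convex hull
  # lambda must keep every weight positive, i.e. 1 + lambda * w_i > 0;
  # the interval is shrunk slightly so the score stays finite at both ends.
  lower <- (-1 / max(w)) * (1 - 1e-10)
  upper <- (-1 / min(w)) * (1 - 1e-10)
  score <- function(lambda) sum(w / (1 + lambda * w))
  lambda <- uniroot(score, lower = lower, upper = upper, tol = 1e-12)$root
  2 * sum(log1p(lambda * w))
}

# CUSUM-type (maximum) and Shiryaev-Roberts-type (sum of exponentials)
# statistics over the moment set G. Note: exp() can overflow to Inf when a
# component statistic is very large.
el_statistics <- function(y, G = c(3, 5)) {
  stats <- vapply(G, function(k) neg2_llr(y, k), numeric(1))
  list(CSELR = max(stats), SRELR = sum(exp(stats)))
}

set.seed(42)
y <- runif(50)            # stands in for data already transformed to (0, 1)
res <- el_statistics(y)
```

In practice the resulting values would be compared with size-adjusted critical values obtained by simulation under the null hypothesis rather than with an asymptotic threshold alone.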

Monte Carlo Simulation Procedures
We used the R statistical package for all the simulation procedures. Firstly, size-adjusted critical values for the proposed test statistics were determined. To achieve this, we used 50,000 replications, and without loss of generality, data were simulated from a standard normal distribution at the stipulated sample sizes and α-levels. Only samples of sizes 20 to 100 were considered (see Table 2). This was motivated by the need to utilize sample sizes that commonly arise in practice.
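A size-adjusted critical value of this kind can be approximated as an empirical quantile of the statistic simulated under the null hypothesis. The sketch below is illustrative only: `simulate_critical_value` is a hypothetical helper, the stand-in statistic 1 − W from the Shapiro-Wilk test is an assumption used to keep the example self-contained, and the number of replications is reduced from the 50,000 used in the study so that the example runs quickly.

```r
# Monte Carlo critical value: the (1 - alpha) quantile of a statistic simulated
# under the null hypothesis (standard normal data, as in the text).
# `statistic` maps a sample to a number; large values indicate non-normality.
simulate_critical_value <- function(statistic, n, alpha = 0.05, reps = 50000) {
  null_stats <- replicate(reps, statistic(rnorm(n)))
  unname(quantile(null_stats, probs = 1 - alpha))
}

# Stand-in statistic (an assumption for illustration): 1 - W from Shapiro-Wilk.
one_minus_w <- function(x) 1 - unname(shapiro.test(x)$statistic)

set.seed(2024)
sizes <- c(20, 50, 100)
crit <- sapply(sizes, function(n)
  simulate_critical_value(one_minus_w, n, alpha = 0.05, reps = 2000))
```

Any statistic that is invariant to location and scale, such as one built on standardized data, can be calibrated this way with standard normal draws.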
For power comparisons, twelve selected competitor tests were considered. The choice of these tests was guided by potential competitor tests, that is, tests developed using similar characterization techniques as well as well-known powerful classical normality tests.

(3) Set 3: asymmetric distributions:
(i) The gamma distribution with parameters (2, 1)
(ii) The Weibull distribution with parameters (2, 1)
(iii) The skewed normal distribution with parameters (0, 1, 5)
(iv) The skewed Cauchy distribution with parameters (0, 2, 5)
(v) The beta distribution with parameters (2, 1) and (3, 1.5)

For power simulation, 10,000 samples each of size n = 20, 30, 50, 80, and 100 were obtained under the various alternative distributions. Power was computed as the number of times the test rejected the null hypothesis divided by the total number of replications. A numerical bootstrap study on real data was conducted to assess the robustness and applicability of the proposed tests. However, it was necessary to first assess the type I error control of our proposed tests before the power study.
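The power computation described above amounts to a rejection proportion over simulated samples. The following R sketch shows the idea; `estimate_power` is a hypothetical helper, the Shapiro-Wilk test at the 5% level is used only as a stand-in decision rule, the gamma(2, 1) alternative is taken as shape 2 and rate 1, and the number of replications is reduced for speed.

```r
# Empirical power: proportion of samples drawn from an alternative
# distribution for which the test rejects the null hypothesis.
# `rejects` maps a sample to TRUE/FALSE; `ralt` draws a sample of size n.
estimate_power <- function(rejects, ralt, n, reps = 10000) {
  mean(replicate(reps, rejects(ralt(n))))
}

# Stand-in decision rule (assumption): Shapiro-Wilk at the 5% level.
sw_rejects <- function(x) shapiro.test(x)$p.value < 0.05

set.seed(7)
power_gamma <- estimate_power(sw_rejects,
                              function(n) rgamma(n, shape = 2, rate = 1),
                              n = 50, reps = 1000)
# Drawing from the null distribution instead gives the empirical size.
size_normal <- estimate_power(sw_rejects, rnorm, n = 50, reps = 1000)
```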

Type I Error Control.
Here, we provide the type I error rates along with the associated standard errors of the proposed tests for α = 0.01, 0.05, and 0.10. To compute these quantities, for each nominal alpha, we generated 500,000 random samples from a standard normal distribution for each sample size n = 20, 30, 50, 80, and 100. The results presented in Table 3 show that the proposed tests control type I error very well. Figure 2 includes plots of the simulated type I error rates only for α = 0.05 for all the sample sizes considered. The plots of the empirical cumulative probability function of the simulated p values for n = 20, α = 0.01, and α = 0.10 were omitted since they were more or less the same as those for the other sample sizes and α = 0.05, respectively. It is evident that the plots have the expected appearance in all the simulated scenarios.
That is, the plots show simulated type I error rates close to the α-level. The closeness of the estimated probabilities of type I error to the nominal value (α = 0.05) attests that the GoF test performs as expected.
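A simulated type I error rate and its standard error can be computed as in the R sketch below. The helper `type1_error` is hypothetical, the binomial form of the Monte Carlo standard error is an assumption (the source does not state which formula was used), and the Shapiro-Wilk test again serves only as a stand-in for the proposed tests.

```r
# Simulated type I error rate with its Monte Carlo standard error.
# `rejects` returns TRUE when the test rejects at the nominal level.
# Assumption: the standard error is the binomial one, sqrt(p * (1 - p) / reps).
type1_error <- function(rejects, n, reps = 5000) {
  rate <- mean(replicate(reps, rejects(rnorm(n))))
  c(rate = rate, se = sqrt(rate * (1 - rate) / reps))
}

# Stand-in decision rule (assumption): Shapiro-Wilk at the 5% level.
sw_rejects <- function(x) shapiro.test(x)$p.value < 0.05

set.seed(11)
sizes <- c(20, 50, 100)
res <- sapply(sizes, function(n) type1_error(sw_rejects, n))
colnames(res) <- paste0("n=", sizes)   # rows: rate, se; one column per n
```

A simulated rate within about two standard errors of the nominal α is consistent with good type I error control.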
These results were extended in order to evaluate the type I error control when simulating from a normal distribution with varying parameters μ and σ². We considered various scenarios, which include N(0, 5²), N(5, 5²), N(7, 15²), N(15, 25²), and N(50, 75²), for samples of sizes 20, 50, and 100 at alpha levels of 0.01, 0.05, and 0.10 (see Table 4). As observed in Table 3, the estimated probabilities of type I error were close to the respective nominal values, which shows that the GoF test performs as expected. It is important to note that various alternative methods for assessing the closeness of the simulated type I error rates to the nominal size alpha are available in the literature. The most popular one is based on the central limit theorem and was described in detail by Batsidis et al. [41]. Once the type I error rates were examined, we proceeded to evaluate the powers of the proposed tests to determine how well they would detect departures from normality and to see how their power compares with that of the selected competing tests.

Figure 1: Schematic algorithm of the testing procedure. (1) Transform the data using the jackknife resampling technique. (2) Apply the probability integral transformation to the transformed data. (3) Utilizing the raw moments of the uniform distribution, apply the EL to the composite hypotheses. (4) Using the Lagrangian multipliers, maximise the EL functions. (5) Calculate the value of the test statistic, CSELR = max_{k∈G} (−2LLR)_k or SRELR = Σ_{k∈G} exp(−2LLR)_k.

Monte Carlo Power Simulation Results.
Results for the Monte Carlo power comparisons are presented here. Bold numbers in all tables represent the two most powerful tests under the respective simulated scenarios. From Tables 5 and 6, we found that when the alternative distributions are short-tailed and symmetric, our proposed tests performed quite well. Under these symmetric alternatives, our proposed tests (SRELR and CSELR) significantly outperformed all other tests studied. Tests based on LL, JB, RJB, and √b₁ have the least power compared with the other tests. In general, the tests based on SRELR, CSELR, DP, SW, and b₂ are the most powerful under these symmetric short-tailed alternative distributions.
For symmetric long-tailed alternatives (see Tables 7 and 8), the tests based on RJB, b₂, SF, and JB are superior, whereas the tests based on √b₁, LL, and SEELR are the least powerful. Our proposed tests performed slightly worse than the DP test but were comparable to the SW test. It is important to note that, in all of the cases under these symmetric long-tailed alternatives, our proposed tests outperformed all the EDF-based tests.
For the asymmetric alternatives considered (see Table 9), the tests based on SEELR, SW, SF, and AD are superior, whereas the tests based on b₂, RJB, and SRELR are the least powerful. Our proposed test based on CSELR performed slightly worse than the JB test but was comparable to the LL test. It is important to note that, in all of the cases under these asymmetric alternatives, our proposed test based on CSELR outperformed the SRELR-based test.
To get a clearer picture of the performance of the different normality tests, a ranking procedure was used. Tables 10 to 12 contain the ranking of all the tests considered in this study according to the average powers computed from the values in Tables 5 and 6, Tables 7 and 8, and Table 9, respectively. The rank of power is based on the set of alternative distributions and the sample sizes. Using average powers, we can select the tests that are, on average, most powerful against the alternatives from the given sets of alternatives. It should be noted that, under all the symmetric simulated scenarios, our proposed tests (SRELR and CSELR) were comparable in power.
From Table 10, it can be clearly seen that our proposed tests (SRELR and CSELR) are the most powerful tests for both small and moderate sample sizes under symmetric short-tailed alternatives. They are followed rather closely by the DP test. The results of the total rank based on all sample sizes (i.e., n = 20 to 100) show that our proposed tests (SRELR and CSELR) are overall the most powerful tests for symmetric short-tailed distributions.

Figure 2: Cumulative type I error rates for α = 0.05 at different sample sizes (n = 30, 50, 80, and 100) using 500,000 simulations.

For symmetric long-tailed alternatives (see Table 11), the RJB test was generally the most powerful in both small and moderate sample sizes. Our proposed tests had power comparable to that of the AD test under small samples for symmetric long-tailed alternatives. However, under moderate sample sizes, our proposed tests were slightly more powerful than the DP and SW tests. Lastly, considering all the sample sizes under symmetric long-tailed alternatives, our proposed tests were comparable to the SW test.
Under asymmetric alternatives (see Table 12), our proposed test based on CSELR performed better than the SRELR-based test. Our related test, the SEELR, outperformed all other tests under the asymmetric alternatives considered. It is also important to note that, unlike some of the competitor tests, our proposed tests were consistent in power under all alternative distributions for all simulated scenarios.

Real Data Study
We used the snowfall dataset to examine the applicability of the proposed test on real data. The snowfall dataset consists of 63 snow precipitation values that were recorded from 1910 to 1972. The dataset has been extensively used in various statistical applications; see, for example, Thaler [42], Carmichael [43], Tukey [44], and Parzen [45]. The snowfall data are well known to be consistent with the normal distribution. We plotted a histogram and a Q-Q plot to examine the hypothesis of normality for the snowfall data (see Figure 3).
From the plots, it is clearly visible that the snowfall data are consistent with a normal distribution. Following the ideas introduced by Stigler [46], we conducted a bootstrap-type study to empirically examine the proposed test based on the two statistics, CSELR and SRELR. The approach was to draw a sample of size 60 at random from the snowfall data and then test for normality at the 0.05 level of significance. We repeated this strategy 10,000 times, and the bootstrap-type procedure showed that the proposed CSELR test had a p value of 0.7755, while the SRELR had a p value of 0.1451. To further examine the normality of the snowfall data, we repeated the bootstrap-type study using the AD, CVM, JB, and SW tests. The p values obtained, that is, 0.6862 for the AD test, 0.6921 for the CVM test, 0.5702 for the JB test, and 0.6650 for the SW test, all suggest that the snowfall data are consistent with a normal distribution. Thus, the p values obtained from the traditional tests as well as our proposed tests agree in supporting the normality of the snowfall data, and our proposed test statistics have demonstrated that they are applicable to real-life data.
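The bootstrap-type procedure can be sketched in R as follows. Several details are assumptions made for illustration: subsamples are drawn without replacement, the p values are summarized by their mean and by the rejection rate at the 0.05 level, the Shapiro-Wilk test stands in for the tests compared in the study, and `snow` holds simulated placeholder values rather than the real snowfall record.

```r
# Bootstrap-type study: repeatedly draw a subsample of size m from the data,
# test it for normality, and collect the resulting p values.
# `pvalue` maps a sample to a p value.
subsample_pvalues <- function(x, pvalue, m = 60, reps = 10000) {
  replicate(reps, pvalue(sample(x, size = m, replace = FALSE)))
}

set.seed(99)
snow <- rnorm(63, mean = 80, sd = 24)   # placeholder values, NOT the real data
pv <- subsample_pvalues(snow, function(s) shapiro.test(s)$p.value, reps = 1000)

# Two simple summaries of the collected p values.
summary_p <- c(mean_p = mean(pv), reject_rate = mean(pv < 0.05))
```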

Conclusion
By utilizing the EL methodology and exploiting the mathematical properties and different forms of transforming the normal distribution, we have developed simple and powerful tests for normality against symmetric alternatives. The proposed tests are consistent and control type I error very well, in line with what has been reported in other studies of EL-based GoF tests (see, for example, [8,12,18]). They outperformed other common traditional tests under symmetric short-tailed alternatives. The proposed tests also performed quite well under symmetric long-tailed alternatives, where they were found to be comparable to the SW test and outperformed all the EDF tests considered. The application of our proposed tests to real data revealed the applicability as well as the robustness of the proposed tests in practice. It would be desirable to develop an ELR-based test for normality that outperforms the classical tests under most alternative distributions that occur in practice.
This might be the case after certain modifications and improvements that include further exploring the EL methodology as well as other forms of characterizing the normal distribution. The researchers are currently looking at exploiting the EDF in developing an empirical likelihood moment-based EDF test for normality. Thus, combining the characterization of EDF-based tests and EL omnibus tests can potentially improve power under small to moderate sample sizes.

#Moment function for uniform distribution
momentFU <- function(k, a, b){
  z <- (b^(k + 1) - a^(k + 1))/((k + 1) * (b - a))
}
#Compute test statistic
teststatistic <- function (

Data Availability
The data used to demonstrate the applicability of our proposed tests in practice are presented in this article and can also be obtained from the respective authors cited in the "Real Data Study" section. All other data were simulated using R, and the source code is available in the Appendix.

Conflicts of Interest
The authors declare that they have no conflicts of interest.