Comparison of Test Statistics of Nonnormal and Unbalanced Samples for Multivariate Analysis of Variance in terms of Type-I Error Rates

In this study, we investigate how the Wilks' lambda, Pillai's trace, Hotelling's trace, and Roy's largest root test statistics are affected when the normality and homogeneity of variance assumptions of the MANOVA method are violated; in other words, we examine the robustness of the tests in these cases. For this purpose, a simulation study is conducted under different scenarios. For different numbers of variables and different sample sizes, with group variances taken as homogeneous ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_g^2$) or heterogeneous and increasing ($\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_g^2$), random numbers are generated from the Gamma(4-4-4; 0.5), Gamma(4-9-36; 0.5), Student's t(2), and Normal(0; 1) distributions. Both balanced and unbalanced numbers of observations in the groups are taken into account. After 10000 repetitions, type-I error rates are calculated for each test at α = 0.05. In the Gamma distribution, Pillai's trace test statistic gives more robust results in the case of homogeneous and heterogeneous variances for 2 variables; in the case of 3 variables, Roy's largest root test statistic gives more robust results in balanced samples and Pillai's trace test statistic in unbalanced samples. In Student's t distribution, Pillai's trace test statistic gives more robust results in the case of homogeneous variance and Wilks' lambda test statistic in the case of heterogeneous variance. In the normal distribution, in the case of homogeneous variance, Roy's largest root test statistic gives relatively more robust results for 2 variables and Wilks' lambda test statistic for 3 variables. Also, in the case of heterogeneous variance for 2 and 3 variables, Roy's largest root test statistic gives robust results in the normal distribution. The test statistics used with MANOVA are affected by violations of the homogeneity of covariance matrices and normality assumptions, particularly when the numbers of observations are unbalanced.


Introduction
Variance analysis is a method used to test whether there is a statistical difference between three or more group means. Multivariate analysis of variance (MANOVA) is the extended version of univariate analysis of variance (ANOVA): it is a statistical method that examines the effect of one or more independent variables on two or more dependent variables [1]. MANOVA can be used when several measurements are made on each person or object in one or more samples. Measurements are taken on the response variables. Hence, the MANOVA data format, unlike that of ANOVA, can be considered as a vector [2]. The test statistics for MANOVA give a measure of the overall likelihood of picking two or more random vectors of means [2,3].
MANOVA has three main assumptions, as in all parametric tests. The first one is the assumption that observations are independent of each other. This assumption means that the sample is completely random. The second assumption is that the dependent variables have a multivariate normal distribution within each group. The third assumption is the homogeneity of variances. In this test, since there is more than one dependent variable, not only must the variances be equal between the groups, but the covariances between the dependent variables must also be equal. For this, the variance-covariance matrix is used.
In his study, Olson [4] analyzed a total of 6 test statistics, including the Wilks' lambda, Pillai's trace, Hotelling's trace, and Roy's largest root test statistics, for type-I and type-II errors in 1000 repetitions, where the number of variables is 2, 3, 6, and 10; the number of groups is 2, 3, 6, and 10; and the sample size is 5, 10, and 50. In addition, in [5,6], Olson conducted simulation studies on the behaviour of the test statistics under different conditions.
In their study, Todorov and Filzmoser [7] evaluated the performance of the Wilks' lambda test statistic in terms of simulated significance levels, power functions, and robustness under various distributions. Gasperik [8] conducted a simulation study to investigate the robustness of the results of MANOVA when the dependent variables had different correlations among different groups and when the sample was taken from the multivariate uniform distribution. With Monte Carlo studies, Adeleke et al. [2] explored the behaviour of three of the existing test statistics (Wilks' lambda, Pillai's trace, and Roy's largest root) and suggested alternative test statistics to perform MANOVA tests when the normality assumption is violated in the error term. For functional data where the MANOVA assumptions are not met, Górecki and Smaga [9] proposed permutation tests and random projection tests based on a simple function generated from classical test statistics.
In practice, it is rarely possible to satisfy all of the assumptions of multivariate analysis of variance. The motivation of this work is therefore twofold: the question of how the Wilks' lambda, Pillai's trace, Hotelling's trace, and Roy's largest root test statistics perform under different conditions and distributions, and the lack of a study in the literature covering all of the situations considered here. Hence, the aim of this study is to investigate how these test statistics are affected, for different numbers of variables and different sample sizes, when the normality and homogeneity of variance assumptions of the MANOVA method are violated. In other words, the study examines whether the tests are reliable (robust) or not.

Materials and Methods
In the study, for the same number of groups, various scenarios were constructed for different distributions, different numbers of variables, and different sample sizes, where the group variances are constant ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_g^2$) or increasing ($\sigma_1^2 < \sigma_2^2 < \cdots < \sigma_g^2$). In these scenarios, with the number of groups g = 3, the number of variables p = 2 and p = 3, and the number of observations n = 10, n = 20, and n = 50, random numbers are generated from the Gamma(4-4-4; 0.5), Gamma(4-9-36; 0.5), Student's t(2), and Normal(0; 1) distributions. Both balanced and unbalanced numbers of observations in the groups are taken into account. Using 10000 repetitions in a Monte Carlo simulation, the Wilks' lambda, Pillai's trace, Hotelling's trace, and Roy's largest root test statistics were calculated, and the type-I error rate was calculated for each of these tests. In each repetition, the hypothesis of equality of the means is rejected if p < α, and the resulting type-I error rates are compared with the nominal value of α = 0.05. The simulation study was performed in the R language using RStudio, with the MASS (Modern Applied Statistics with S, 2017.04.21) and lestat (A Package for Learning Statistics, 20.02.2015) packages.
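The Monte Carlo procedure described above can be sketched in a few lines. The example below is only an illustration in Python, not the R code used in the study: it covers a single statistic (Wilks' lambda with Bartlett's chi-square approximation), fixes g = 3 and p = 2, uses 2000 rather than 10000 repetitions, and draws the variables independently of each other. The function names, the seed, and the reading of Gamma(4; 0.5) as shape 4 and scale 0.5 are assumptions made for this sketch.

```python
import math
import random

CHI2_CRIT_DF4 = 9.4877  # upper 5% point of chi-square with p(g-1) = 2*2 = 4 df


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def wilks_lambda_2d(groups):
    """Wilks' lambda |W| / |B + W| for groups of bivariate observations."""
    all_obs = [x for grp in groups for x in grp]
    total_n = len(all_obs)
    grand = [sum(x[j] for x in all_obs) / total_n for j in range(2)]
    W = [[0.0, 0.0], [0.0, 0.0]]  # within-groups sums of squares and products
    T = [[0.0, 0.0], [0.0, 0.0]]  # total sums of squares and products (B + W)
    for grp in groups:
        mean = [sum(x[j] for x in grp) / len(grp) for j in range(2)]
        for x in grp:
            for a in range(2):
                for b in range(2):
                    W[a][b] += (x[a] - mean[a]) * (x[b] - mean[b])
                    T[a][b] += (x[a] - grand[a]) * (x[b] - grand[b])
    return det2(W) / det2(T)


def normal_draw(rng, group_index):
    """Normal(0; 1) for every group: normality and equal variances hold."""
    return rng.gauss(0.0, 1.0)


def gamma_draw(rng, group_index):
    """Gamma with shape 4 and scale 0.5 for every group (assumed reading)."""
    return rng.gammavariate(4.0, 0.5)


def type1_error_rate(sizes, draw, reps=2000, seed=1):
    """Share of repetitions in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    p, g, total_n = 2, len(sizes), sum(sizes)
    rejections = 0
    for _ in range(reps):
        groups = [[[draw(rng, i) for _ in range(p)] for _ in range(n)]
                  for i, n in enumerate(sizes)]
        # Bartlett's large-sample statistic for Wilks' lambda
        stat = -(total_n - 1 - (p + g) / 2) * math.log(wilks_lambda_2d(groups))
        if stat > CHI2_CRIT_DF4:
            rejections += 1
    return rejections / reps


rate = type1_error_rate((20, 20, 20), normal_draw)
print(rate)
```

Calling `type1_error_rate((10, 10, 50), gamma_draw)` would give the corresponding estimate for an unbalanced Gamma scenario under the same assumptions. The hard-coded critical value only applies to g = 3 and p = 2.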
2.1. Test Statistics. As mentioned earlier, MANOVA uses appropriate test statistics to examine whether the mean vectors of two or more groups come from the same sampling distribution. A test statistic is used to assess a particular hypothesis through sample data obtained from one or more populations. The hypothesis for the mean vectors is as follows:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_g \quad \text{versus} \quad H_1: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j. \tag{1}$$

The four most common test statistics used in testing this hypothesis are Wilks' lambda [10], Hotelling's trace [11], Pillai's trace [12], and Roy's largest root [2,13].

Wilks' Lambda Test Statistic.
In the comparison of the mean vectors of p variables and g groups, the matrices are expressed as follows:

$$B = \sum_{i=1}^{g} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})', \tag{2}$$

$$W = \sum_{i=1}^{g} (n_i - 1) S_i, \tag{3}$$

where B represents the between-groups sums of squares and cross-products matrix, W represents the within-groups sums of squares and cross-products matrix, g is the number of mean vectors to be compared, $\bar{x}_i$ is the mean vector of the i-th group, $\bar{x}$ is the general mean vector, $n_i$ is the number of observations for the i-th group, and $S_i$ is the variance-covariance matrix for the i-th group. The statistic, first defined by Wilks [10], is a ratio of the determinants of two matrices. A value of this ratio close to zero indicates a difference between the mean vectors. Furthermore, with $\lambda_i$ denoting the roots of the matrix $BW^{-1}$ and s the number of nonzero roots, Wilks' lambda statistic is given as

$$\Lambda = \frac{|W|}{|B + W|} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}, \tag{4}$$

where g is the number of groups, p is the number of variables in each group, N is the number of observations, $\lambda_i$ is the i-th root of $BW^{-1}$, and $s = \min(g - 1, p)$. The test statistic in equation (4) can also be expressed in an alternative form [2]. For large samples, the Bartlett approach is preferred instead of this test statistic. The Bartlett test statistic is $L = -[N - 1 - (p + g)/2] \ln \Lambda$, which follows a $\chi^2$ distribution with $p(g - 1)$ degrees of freedom [14]. For multivariate multifactor analysis of variance, the significance of the Wilks' lambda test statistic can also be assessed with the help of the F distribution [15].
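As a worked illustration of the quantities defined in this subsection, the following Python sketch builds B and W for a small made-up data set (three groups, two variables, three observations each) and evaluates Wilks' lambda and Bartlett's statistic. The data and all function names are hypothetical. Bartlett's statistic is coded in its usual textbook form, L = −[N − 1 − (p + g)/2] ln Λ.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of observation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def outer(u, v):
    """Outer product u v' as a nested list."""
    return [[a * b for b in v] for a in u]


def mat_add(A, B, scale=1.0):
    """Element-wise A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    size, value = len(A), 1.0
    for k in range(size):
        piv = max(range(k, size), key=lambda r: abs(A[r][k]))
        if A[piv][k] == 0.0:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            value = -value  # a row swap flips the sign
        value *= A[k][k]
        for r in range(k + 1, size):
            factor = A[r][k] / A[k][k]
            for c in range(k, size):
                A[r][c] -= factor * A[k][c]
    return value


def between_within(groups):
    """Return (B, W) for a list of groups, each a list of p-vectors."""
    p = len(groups[0][0])
    grand = mean_vector([x for grp in groups for x in grp])
    B = [[0.0] * p for _ in range(p)]
    W = [[0.0] * p for _ in range(p)]
    for grp in groups:
        mean = mean_vector(grp)
        dev = [m - gm for m, gm in zip(mean, grand)]
        B = mat_add(B, outer(dev, dev), scale=len(grp))
        for x in grp:
            err = [xi - m for xi, m in zip(x, mean)]
            W = mat_add(W, outer(err, err))  # accumulates (n_i - 1) * S_i
    return B, W


def wilks_lambda(groups):
    B, W = between_within(groups)
    return det(W) / det(mat_add(B, W))


def bartlett_statistic(groups):
    total_n = sum(len(grp) for grp in groups)
    n_groups, p = len(groups), len(groups[0][0])
    return -(total_n - 1 - (p + n_groups) / 2) * math.log(wilks_lambda(groups))


# Hypothetical toy data: group means (2,2), (3,3), (4,4)
groups = [
    [[1, 2], [2, 1], [3, 3]],
    [[2, 3], [3, 2], [4, 4]],
    [[3, 4], [4, 3], [5, 5]],
]
print(wilks_lambda(groups))        # 27/63 = 3/7 for this data set
print(bartlett_statistic(groups))  # 5.5 * ln(7/3)
```

For this data set W = [[6, 3], [3, 6]] and B = [[6, 6], [6, 6]], so |W| = 27 and |B + W| = 63, which can be checked by hand.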

Hotelling's Trace Test Statistic.
In this statistic, which was developed by Hotelling [11] and Lawley [16], the $\lambda_i$'s are the roots of the $BW^{-1}$ matrix [17]:

$$T = \sum_{i=1}^{s} \lambda_i = \operatorname{tr}(BW^{-1}).$$

If $T > \chi^2_{p(g-1);\alpha}$ (the table value), then there is a difference between the mean vectors. The F distribution can also be used to test the T statistic [18].

Pillai's Trace Test Statistic.
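The test statistic, which was introduced by Pillai in 1955, is defined as

$$V = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i} = \operatorname{tr}\left[B(B + W)^{-1}\right].$$

The associated statistic $F_T$ follows an F distribution with $s(2m + s + 1)$ and $s(2n + s + 1)$ degrees of freedom. For s = 1, the distribution is an exact F distribution [19].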
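For reference, the F approximation mentioned above is usually written as follows. This is the common textbook form, not necessarily the exact expression in [19]. V denotes Pillai's trace and N the total number of observations, and the definitions of m and n are the conventional ones, assumed here. In this form the second degrees of freedom also carry the factor s.

```latex
\[
s = \min(g-1,\,p), \qquad
m = \frac{|p-(g-1)|-1}{2}, \qquad
n = \frac{N-g-p-1}{2}
\]
\[
F_T = \frac{2n+s+1}{2m+s+1}\cdot\frac{V}{s-V}
\;\sim\; F_{\,s(2m+s+1),\; s(2n+s+1)} \quad \text{(approximately)}
\]
% Worked example for one design of the simulation: g = 3, p = 2, N = 60
\[
s = 2,\quad m = -\tfrac{1}{2},\quad n = 27
\;\Longrightarrow\;
s(2m+s+1) = 4,\qquad s(2n+s+1) = 114
\]
```

With p = 3 and the same g and N, the same formulas give s = 2, m = 0, n = 26.5, and degrees of freedom 6 and 112.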

Roy's Largest Root Test Statistic.
Roy's largest root test statistic, introduced by Roy in 1957, is based on the largest root of $BW^{-1}$, denoted by $\lambda_{\max}$. The resulting value is compared with the Heck chart value for the parameters s, m, and n. If the statistic is greater than the Heck chart value, it is concluded that there is a difference between the mean vectors [20].
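All four statistics are functions of the same roots of BW⁻¹, which the short Python sketch below makes explicit for the two-variable case. The matrices B and W are hypothetical, the function names are illustrative, and the closed-form eigenvalue formula only works for p = 2. Roy's statistic is reported here as λ_max itself; some software reports λ_max/(1 + λ_max) instead.

```python
import math


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues2(M):
    """Real eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    # The roots of B W^-1 are real and nonnegative; clamp rounding noise.
    disc = max(trace * trace - 4.0 * det, 0.0)
    root = math.sqrt(disc)
    return [(trace + root) / 2.0, (trace - root) / 2.0]


def manova_statistics(B, W):
    """Wilks, Pillai, Hotelling, and Roy statistics from the roots of B W^-1."""
    lam = eigenvalues2(matmul2(B, inv2(W)))
    return {
        "wilks": math.prod(1.0 / (1.0 + l) for l in lam),
        "pillai": sum(l / (1.0 + l) for l in lam),
        "hotelling": sum(lam),
        "roy": max(lam),
    }


# Hypothetical between-groups and within-groups matrices
B = [[6.0, 6.0], [6.0, 6.0]]
W = [[6.0, 3.0], [3.0, 6.0]]
stats = manova_statistics(B, W)
print(stats)
```

For these matrices the roots are 4/3 and 0, so Wilks' lambda is 3/7, Pillai's trace is 4/7, and Hotelling's trace and Roy's largest root are both 4/3. The four statistics differ more when there are several nonzero roots.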

Results
In Table 1, for the type-I error rates of the test statistics obtained from the simulation where the parameter value of the Gamma distribution is (4-4-4; 0.5) and the number of variables is 2, with homogeneous and heterogeneous variances, Pillai's test statistic gives the closest result to the nominal value for both balanced and unbalanced sample sizes. When the number of variables is 3, with homogeneous and heterogeneous variances, Roy's largest root test statistic gives better results for balanced sample sizes and Pillai's test statistic gives better results for unbalanced sample sizes. In the case of 3 variables, Hotelling's trace test statistic for balanced sample sizes and Wilks' lambda test statistic for unbalanced sample sizes give closer results. In Figure 1, deviations from the type-I error value are expressed visually.
Table 2 shows the type-I error rates of the test statistics obtained from the simulation when Student's t distribution has two degrees of freedom. According to the results, in the case of homogeneous and heterogeneous variances with 2 and 3 variables, all test statistics give the same results. In the case of homogeneous variance, Pillai's trace test statistic gives the closest result for both balanced and unbalanced sample sizes. In the case of heterogeneous variance, Wilks' lambda test statistic gives the closest result to the nominal value for both balanced and unbalanced samples.
In Figure 2, deviations from the type-I error value are expressed visually. As seen in Figure 2, the largest variation in the type-I error value occurs for the group sizes (10-10-50).
In Table 3, for the normal distribution, in the case of homogeneous variance with balanced and unbalanced sample sizes for 2 variables, Roy's largest root test statistic gives the closest results. For 3 variables, Wilks' lambda statistic gives the closest result to the nominal value. In the case of heterogeneous variance, type-I error values show more variability than in the homogeneous variance situation. Despite this variability, Roy's largest root test statistic gives better results when the number of variables is 2 and 3.
In Figure 3, deviations from the type-I error value are expressed visually.
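When judging how close an estimated type-I error rate is to the nominal value, it can help to keep the Monte Carlo sampling error in mind. The calculation below is a standard binomial approximation and is not taken from the study itself. R is a symbol introduced here for the number of repetitions.

```latex
\[
\mathrm{SE}(\hat{\alpha}) = \sqrt{\frac{\alpha(1-\alpha)}{R}}
= \sqrt{\frac{0.05 \times 0.95}{10000}} \approx 0.0022
\]
\[
0.05 \pm 1.96 \times 0.0022 \;\approx\; (0.0457,\ 0.0543)
\]
```

Under this approximation, estimated rates that differ from 0.05 by less than about 0.004 are within the range that simulation noise alone could produce with 10000 repetitions.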

Discussion and Conclusion
In this study, the behaviour of the test statistics for different sample sizes was investigated by a simulation study in situations where the MANOVA prerequisites, particularly the normality and homogeneity of variance assumptions, were violated. The relevant studies found in the literature are presented in Section 1. To summarize the results obtained in the previous studies: in his simulation study conducted in 1974, Olson, who has published extensively on this subject, stated that Pillai's trace test statistic gives more robust results than the other test statistics under departures from the normal distribution and when the homogeneity of the covariance matrices does not hold. In Olson …
In summary, the test statistics used with MANOVA are affected by violations of the homogeneity of covariance matrices and normality assumptions, particularly when the numbers of observations are unbalanced. According to the scenario results, Pillai's trace test statistic in the case of homogeneous variance and Wilks' lambda test statistic in the case of heterogeneous variance give the best performance; alternatively, the robust test statistics and Bayesian methods recommended in the literature can be used. This study can be extended by simulation studies for different scenarios with different distributions and parameters.

Data Availability
The simulated data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This study was published as an abstract in the XIX National and II International Biostatistics Congress Abstract Book, p. 57, 25-28 Oct 2017, Belek, Antalya, Turkey.