Research Article Hierarchical Missing Data and Multivariate Behrens–Fisher Problem

This article first defines the hierarchical data missing pattern, which is a generalization of the monotone data missing pattern. The multivariate Behrens–Fisher problem with hierarchical missing data is then considered to illustrate how ideas for dealing with monotone missing data can be extended to the hierarchical missing pattern. A pivotal quantity similar to Hotelling's $T^2$ is presented, and the moment matching method is used to derive its approximate distribution, which is used for testing and interval estimation. The precision of the approximation is illustrated through Monte Carlo simulation. The results indicate that the approximate method is very satisfactory even for moderately small samples.


Introduction
Inference with incomplete data has attracted considerable interest among statisticians, both in the past and at present. The causes of missing data are varied and will not be discussed in this article. However, in order to ignore the process that causes the missing data, it is usually assumed that the data are missing at random (MAR). For an exposition of such issues, we refer to Little and Rubin [1] or Little [2]. Lu and Copas [3] pointed out that inference from the likelihood method ignoring the missing data mechanism is valid if and only if the missing data mechanism is MAR.
Several missing patterns are considered in the literature. Incomplete data with a monotone pattern (see displays (1) and (2)) not only occur frequently in practice but also allow the exact calculation of the maximum likelihood estimators (MLEs), the likelihood ratio statistics, and the relevant distributions if multivariate normality is assumed. Anderson [4], one of the earliest authors in this area, gave a simple approach to derive the MLEs and presented them for a special case of the monotone pattern. Krishnamoorthy and Pannala [5,6] provided an accurate, simple approach to construct a confidence region for a normal mean vector. Hao and Krishnamoorthy [7] developed an inferential procedure for a normal covariance matrix. Yu et al. [8] considered the problem of testing equality of two normal mean vectors under the assumption that the two covariance matrices are equal, while Krishnamoorthy and Yu [9] considered the Behrens-Fisher problem. Yu et al. [10] considered the problem of testing equality of two normal covariance matrices with monotone missing data.
In addition, Batsidis [11][12][13] extended inference with monotone missing data to elliptically contoured distributions, of which the multivariate normal is a special case. For theory and methods of multivariate analysis based on elliptically contoured distributions, we refer to Fang and Zhang [14].
Most of the papers mentioned above use a similar strategy in dealing with monotone missing data. To illustrate this, consider data with a 2-block monotone pattern, in which $n + m$ observations are available on $x$ but only the first $n$ of them also include $y$. The strategy is as follows: if we do not have the extra data on $x$, i.e., we have only the first $n$ samples on $(x, y)$, we usually already have a statistic, say $Q$, based on the complete data. Similarly, if we have only the $n + m$ samples on $x$, we also have a similar statistic, say $Q_1$, for the lower-dimensional problem.
We then decompose $Q$ into two parts, $Q_1' + Q_2$, which correspond to the sample data on $x$ and $y$, respectively. However, since we have extra data on $x$, $Q_1'$ should be replaced with $Q_1$. Hence, the final statistic for inference is $Q_1 + Q_2$.
In this article, we define a new data missing pattern, the hierarchical data missing pattern, which is a generalization of the monotone missing pattern. Moreover, the strategy just described carries over to this pattern. To see this, we consider the multivariate Behrens-Fisher problem with hierarchical missing data. The approach that we employ is based on the one due to Krishnamoorthy and Yu [9] for monotone missing data. The article is organized as follows: in the following section, we define the hierarchical data missing pattern. In Section 3, an approximate method for the multivariate Behrens-Fisher problem with hierarchical missing data is outlined. The accuracy of the approximation is investigated using Monte Carlo simulation in Section 4. The methods are illustrated using an example in Section 5, and some concluding remarks are given in Section 6.

Hierarchical Data Missing Pattern
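The monotone pattern of missing data has the following form:

$$
\begin{array}{lllllll}
x_{11}, & \ldots, & x_{1N_k}, & \ldots, & x_{1N_2}, & \ldots, & x_{1N_1} \\
x_{21}, & \ldots, & x_{2N_k}, & \ldots, & x_{2N_2} & & \\
\vdots & & & & & & \\
x_{k1}, & \ldots, & x_{kN_k} & & & &
\end{array}
\tag{2}
$$

where $x_{ij}$ is a $p_i \times 1$ vector, $j = 1, \ldots, N_i$, $i = 1, \ldots, k$, and $N_1 \geq N_2 \geq \cdots \geq N_k$. In other words, there are $N_1$ observations available on the first $p_1$ components, $N_2$ observations available on the first $p_1 + p_2$ components, and so on, with $p_1 + \cdots + p_k = p$.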
We define the hierarchical data missing pattern as the pattern displayed in (3), in which the index sets satisfy the following conditions: (1) The index set of the first row, i.e., $(1, \ldots, n, n+1, \ldots, n+m, n+m+1, \ldots, n+m+l, n+m+l+1, \ldots, N)$, is the union of the index sets of all the other rows. (2) The index sets of two different rows are either disjoint or nested (one contains the other). It is easy to see that the monotone pattern is a special case of the hierarchical pattern.
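The two conditions on the index sets can be checked mechanically. The following is a minimal sketch, not part of the original procedure: the function name `is_hierarchical` and the flag `require_exact_union` are hypothetical. By default the first row is only required to contain every other row, which is an assumed reading chosen so that a monotone pattern with $N_1 > N_2$ still qualifies; setting the flag enforces the literal "union" wording of condition (1).

```python
from itertools import combinations


def is_hierarchical(index_sets, require_exact_union=False):
    """Check the two index-set conditions of a hierarchical missing pattern.

    index_sets[0] is the index set of the first row (the components observed
    most often); the remaining entries are the index sets of the other rows.
    """
    rows = [frozenset(s) for s in index_sets]
    first, others = rows[0], rows[1:]

    # Condition (1): the first row covers all other rows.
    # Assumption: containment by default, exact union only on request.
    if any(not s <= first for s in others):
        return False
    if require_exact_union and others:
        if frozenset().union(*others) != first:
            return False

    # Condition (2): any two rows are either disjoint or nested.
    for a, b in combinations(rows, 2):
        if not (a.isdisjoint(b) or a <= b or b <= a):
            return False
    return True


# 3-block hierarchical example: the second and third rows split the first.
three_block = [range(1, 11), range(1, 7), range(7, 11)]
# Monotone example: each row is contained in the previous one.
monotone = [range(1, 11), range(1, 7), range(1, 4)]
# Overlapping rows that are neither disjoint nor nested.
crossing = [range(1, 11), range(1, 7), range(5, 11)]
```

With these inputs, `three_block` and `monotone` pass the default check, while `crossing` fails because its second and third rows overlap without one containing the other.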
Now we consider the Behrens-Fisher problem with hierarchical missing data.
To formulate the problem, let $x$ follow a $p$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, written $x \sim N_p(\mu, \Sigma)$. Let $y \sim N_p(\beta, \Delta)$ be independent of $x$. It is assumed that $\Sigma$ and $\Delta$ are unknown and arbitrary positive definite matrices. We consider the problem of testing

$$
H_0: \mu = \beta \quad \text{vs.} \quad H_a: \mu \neq \beta. \tag{4}
$$

Suppose that we have a sample of $N_1$ observations available on $x$ and a sample of $M_1$ observations available on $y$. We consider a simple 3-block hierarchical pattern (it is easy to extend the ideas and procedures for 3-block data to the general case in (3), but the notation becomes very complicated). In the $x$ sample, there are $N_1$ observations available on the first $p_1$ components, $N_2$ observations available on the first $p_1 + p_2$ components, and $N_1 - N_2$ observations available on the first $p_1$ and the last $p_3$ components. Notice that $N_1 \geq N_2$, $M_1 \geq M_2$, and $p_1 + p_2 + p_3 = q_1 + q_2 + q_3 = p$.
As pointed out in Yu et al. [8], we do not need to consider the case of unequal patterns, i.e., $p_i \neq q_i$ for some $i = 1, 2, 3$, since data with any type of unequal pattern can be rearranged to form an equal monotone pattern. For example, assume that [...], where [...]. Partition the matrix $Y$ similarly, that is, [...], where [...].
Let $\bar{x}_l$ and $S_l$ denote, respectively, the sample mean vector and the sums of squares and products matrix based on $X_l$, $l = 1, 2, 3$. Similarly, let $\bar{y}_l$ and $V_l$ denote, respectively, the sample mean vector and the sums of squares and products matrix based on $Y_l$, $l = 1, 2, 3$. We partition these means and matrices accordingly as follows: [...]. The statistics $\bar{y}_l$ and $V_l$ based on the data matrix $Y$ in (7) are partitioned like $\bar{x}_l$ and $S_l$: [...].
Finally, we partition the parameters as follows: [...], where $\mu_i$ is $p_i$-dimensional, $i = 1, 2, 3$. Furthermore, define $\delta = \mu - \beta$, so that [...]. Let $n_i = N_i - 1$ and $m_i = M_i - 1$, $i = 1, 2, 3$. The following summary statistics are needed to define the pivotal quantity that we will use for hypothesis testing about $\delta$. Let [...]. Furthermore, let [...]. The pivotal quantity that we propose for hypothesis testing and confidence estimation of $\delta$ is given by [...].

Journal of Mathematics
The idea behind $Q$ is as follows: if there were only $N_2$ ($M_2$) observations on the first $p_1$ components of $X$ ($Y$), the appropriate statistic for hypothesis testing and confidence estimation of $(\delta_1', \delta_2')' = ((\mu_1 - \beta_1)', (\mu_2 - \beta_2)')'$ could be decomposed, after some algebra, as the sum of two parts. Since there are additional observations on the first $p_1$ components, the first part should be replaced by $Q_1$.
Similarly, if there were only the last $N_3$ ($M_3$) observations on the first $p_1$ components of $X$ ($Y$), the appropriate statistic for hypothesis testing and confidence estimation of $(\delta_1', \delta_3')'$ could be decomposed, after some algebra, as the sum of two parts. Again, the first part should be replaced by $Q_1$.

Hypothesis Test and Confidence Region for μ − β.
Because $Q$ resembles Hotelling's $T^2$ statistic and its distribution is free of any parameters, it is reasonable to approximate its distribution by that of $dF_{p,\nu}$, where $d$ is a positive constant and $F_{a,b}$ denotes the $F$ random variable with numerator degrees of freedom $a$ and denominator degrees of freedom $b$.
To find this approximation, we evaluated the first two approximate moments of $Q$ in the Appendix and then applied the moment matching method: the unknown constants $d$ and $\nu$ are determined so that the first two moments of $Q$ are equal to those of $dF_{p,\nu}$. Using the modified Wishart approximation (see Lemma A.1 in the Appendix) and following the lines of Krishnamoorthy and Pannala [6], we obtained an approximation $G_1$ for $E(Q)$ and an approximation $G_2$ for $E(Q^2)$. To express $G_1$ and $G_2$, we need the following terms.
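Before those terms are introduced, the matching step itself can be sketched in a few lines. This is an illustration under stated assumptions rather than the article's own formulas: it uses the standard moments $E(F_{p,\nu}) = \nu/(\nu-2)$ for $\nu > 2$ and $E(F_{p,\nu}^2) = \nu^2 (p+2) / \{p(\nu-2)(\nu-4)\}$ for $\nu > 4$, and solves $E(dF_{p,\nu}) = G_1$, $E\{(dF_{p,\nu})^2\} = G_2$ for $d$ and $\nu$. The function names are hypothetical, and the inputs `g1` and `g2` stand for whatever approximations of $E(Q)$ and $E(Q^2)$ are available.

```python
def scaled_f_moments(p, d, nu):
    """First two raw moments of d * F(p, nu); requires nu > 4."""
    m1 = d * nu / (nu - 2.0)
    m2 = d * d * nu * nu * (p + 2.0) / (p * (nu - 2.0) * (nu - 4.0))
    return m1, m2


def match_scaled_f(p, g1, g2):
    """Solve E[dF] = g1 and E[(dF)^2] = g2 for (d, nu), with F ~ F(p, nu).

    Dividing the second equation by the square of the first gives
    g2 / g1^2 = (p + 2)(nu - 2) / (p (nu - 4)), which is linear in nu.
    """
    ratio = g2 / (g1 * g1)
    denom = p * ratio - (p + 2.0)
    if denom <= 0:
        # No F(p, nu) with nu > 4 has this ratio of moments.
        raise ValueError("moments cannot be matched by a scaled F distribution")
    nu = (4.0 * p * ratio - 2.0 * (p + 2.0)) / denom
    d = g1 * (nu - 2.0) / nu
    return d, nu


# Round trip: start from known (d, nu), compute moments, recover (d, nu).
g1, g2 = scaled_f_moments(p=4, d=2.0, nu=10.0)
d_hat, nu_hat = match_scaled_f(4, g1, g2)
```

The condition `denom > 0` is equivalent to $\nu > 4$, so the second moment of the matched distribution exists whenever the function returns.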
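Let $S_1 = S_1^{(1,1)}/(n_1 N_1)$, $V_1 = V_1^{(1,1)}/(m_1 M_1)$, $C_1 = S_1 + V_1$, and [...]. Let [...]. In terms of the above quantities, we have [...], and an approximation to the distribution of the pivotal quantity $Q$ in (10) is given by

$$
Q \sim dF_{p,\nu} \quad \text{approximately.} \tag{21}
$$

Thus, for a given level $\alpha$ and an observed value $Q_0$ of $Q$, the null hypothesis that $\delta = \mu - \beta = 0$ will be rejected whenever the p value $P(dF_{p,\nu} > Q_0) < \alpha$. Furthermore, an approximate $1 - \alpha$ confidence set for $\mu - \beta$ is the set of values of $\delta$ that satisfy $Q \leq dF_{p,\nu;1-\alpha}$, where $Q$ is given in (9) and $F_{p,\nu;1-\alpha}$ denotes the $100(1-\alpha)$th percentile of the $F_{p,\nu}$ distribution.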
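Once $d$ and $\nu$ are available, the approximate p value is an upper tail probability of an $F$ distribution with possibly non-integer denominator degrees of freedom. The sketch below is one way to compute it with the standard library only, using the identity $P(F_{a,b} > f) = I_{b/(b+af)}(b/2, a/2)$ for the regularized incomplete beta function and a continued-fraction evaluation. All function names are hypothetical, and the values of `d` and `nu` passed in are assumed to come from the moment matching step; where SciPy is available, `scipy.stats.f.sf` computes the same tail probability.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F(df1, df2) random variable; df2 may be non-integer."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def approx_p_value(q0, d, p, nu):
    """Approximate p value P(d * F(p, nu) > q0) for an observed value q0."""
    return f_upper_tail(q0 / d, p, nu)
```

For numerator degrees of freedom 2, the tail has the closed form $(1 + 2f/\nu)^{-\nu/2}$, which gives a convenient check of the implementation.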

Accuracy of the Approximations
We have used two approximations, one for approximating the sum of two Wishart matrices with different scale matrices and another for approximating the moments of Q to derive the distribution of Q. So, to understand the accuracy of the approximation, we estimated the sizes of the test for hypotheses in (4) when the nominal level is 0.05 using the Monte Carlo simulation.
To select the parameter configurations for Monte Carlo simulation, we note that the distribution of Q is location invariant, and so without loss of generality, we can assume that μ � β � 0 to estimate the sizes. As pointed out in the study of Krishnamoorthy and Yu [9], we can also take Σ as a diagonal matrix with positive elements and Δ as a correlation matrix.
The estimated sizes are presented in Table 1 for the case of $p_1 = 2$, $p_2 = 1$, $p_3 = 1$, and a few selected sample sizes. The sample sizes are chosen so that the amount of missing data is relatively small in some cases and large in others. It is clear from Table 1 that the coverage probabilities (one minus the estimated sizes) are very close to 0.95 for all the cases considered. In the worst situations, the coverage probabilities are around 0.93.

An Illustrative Example
We shall now illustrate the methods using "Fisher's Iris Data", which represent measurements of the sepal length and width and petal length and width in centimeters of fifty plants for each of three types of iris: Iris setosa, Iris versicolor, and Iris virginica. The data sets are posted on many websites, and we downloaded them from http://javeeh.net/sasintro/intro151.html. For illustration purposes, we use the data on virginica ($x$) and setosa ($y$). Since the sample size is large enough, we simply assume that the data approximately follow a multivariate normal distribution.
We created hierarchical patterns by discarding the last 15 measurements on $x_3$ (petal length of virginica) and the first 35 measurements on $x_4$ (petal width of virginica), and the last 30 measurements on $y_3$ (petal length of setosa) and the first 20 measurements on $y_4$ (petal width of setosa). That is, we have $p_1 = 2$, $p_2 = 1$, $p_3 = 1$, and $(N_1, N_2$ [...]. Since $Q$ is much larger than the critical value, we have sufficient evidence to reject $H_0$ at the 5% significance level.
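The discarding scheme described above can be written out as index sets to see which observations survive in each block. This is a small sketch: the helper names `kept_indices` and `block_sizes` and the row labels are hypothetical, and the sample size of 50 per species is taken from the description of the iris data. The block sizes computed here follow from the stated numbers of discarded measurements and are not quoted from the article.

```python
def kept_indices(n, drop_first=0, drop_last=0):
    """Indices 1..n that remain after discarding leading/trailing records."""
    return set(range(1 + drop_first, n - drop_last + 1))


def block_sizes(rows):
    """Number of available observations in each row of the pattern."""
    return tuple(len(indices) for indices in rows.values())


N_PLANTS = 50  # fifty plants per iris type

# virginica (x): sepal variables fully observed, petal variables thinned
x_rows = {
    "x1,x2": kept_indices(N_PLANTS),
    "x3": kept_indices(N_PLANTS, drop_last=15),
    "x4": kept_indices(N_PLANTS, drop_first=35),
}

# setosa (y): same layout with different amounts discarded
y_rows = {
    "y1,y2": kept_indices(N_PLANTS),
    "y3": kept_indices(N_PLANTS, drop_last=30),
    "y4": kept_indices(N_PLANTS, drop_first=20),
}

x_sizes = block_sizes(x_rows)  # (50, 35, 15)
y_sizes = block_sizes(y_rows)  # (50, 20, 30)

# In both samples the third and fourth variables are observed on disjoint
# sets of plants whose union is the full sample, so the rows are disjoint
# or nested as the hierarchical pattern requires.
x_disjoint = x_rows["x3"].isdisjoint(x_rows["x4"])
y_disjoint = y_rows["y3"].isdisjoint(y_rows["y4"])
```

Note that the two samples have different block sizes, which is consistent with the remark that the hierarchical patterns of the two samples need not be the same.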

Concluding Remarks
In this article, we define the hierarchical data missing pattern and point out that the strategy used in many papers dealing with monotone missing data can be extended to hierarchical missing data. To illustrate this, the multivariate Behrens-Fisher problem is considered. Based on the procedures due to Krishnamoorthy and Yu [9] for monotone missing data, we proposed a Hotelling $T^2$ type test for the Behrens-Fisher problem. The test is simple to use, and the hierarchical patterns of the two samples are not necessarily the same.
As pointed out by two reviewers, this paper is based on multivariate normal populations. As was done in Batsidis [11][12][13] for monotone missing data, an extension of the results given in this paper to hierarchical missing data from elliptical distributions is an interesting open problem. Moreover, the proposed study can be extended to neutrosophic statistics in future research. For details of neutrosophic statistics, see Aslam [15,16] and Kashif et al. [17].

Appendix
The following two lemmas are needed to find approximate moments of $Q$ in (14). In Lemma A.1, we present the modified version of the Nel and van der Merwe [18] Wishart approximation given in Krishnamoorthy and Yu [19]. For a proof of Lemma A.2, see Seber [20], p. 52.