On the Marginal Distribution of the Diagonal Blocks in a Blocked Wishart Random Matrix

International Journal of Analysis. Copyright © 2016 Kjetil B. Halvorsen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Let $A$ be an $(m_1+m_2)\times(m_1+m_2)$ blocked Wishart random matrix with diagonal blocks of orders $m_1\times m_1$ and $m_2\times m_2$. The goal of the paper is to find the exact marginal distribution of the two diagonal blocks of $A$. We find an expression for this marginal density involving the matrix-variate generalized hypergeometric function. We became interested in this problem because of an application in spatial interpolation of random fields of positive definite matrices, where this result will be used for parameter estimation with composite likelihood methods.


Introduction
The goal of this paper is to find an exact and useful form for the marginal distribution of the diagonal blocks of a $2\times2$ blocked Wishart random matrix. The problem arises in an application, the estimation of the parameters of a Wishart random field, which will be reported elsewhere.
Let
$$A = \begin{pmatrix} A_1 & A_{12}\\ A_{12}^\top & A_2 \end{pmatrix}$$
be an $(m_1+m_2)\times(m_1+m_2)$ Wishart random matrix, where the diagonal blocks $A_1$ and $A_2$ are of orders $m_1\times m_1$ and $m_2\times m_2$. Denote the degrees of freedom by $n$ and the scale parameter, which is a matrix blocked in the same way as $A$, by
$$\Sigma = \begin{pmatrix} \Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2 \end{pmatrix}.$$
We are mostly interested in the special case $m_1 = m_2 = m$ and
$$\Sigma = \begin{pmatrix} \Sigma_0 & \rho\Sigma_0\\ \rho\Sigma_0 & \Sigma_0 \end{pmatrix},$$
where $|\rho| < 1$, but the general case is not more difficult. All matrices are real. Notation: we use $\operatorname{Tr}(X)$ for the trace of the square matrix $X$ and $\operatorname{etr}(X) = \exp(\operatorname{Tr}(X))$. We write $\mathcal{P}(m)$ for the convex cone of real $m\times m$ positive definite matrices, and $O(m)$ for the orthogonal group, that is, the set of $m\times m$ orthogonal matrices. The Stiefel manifold, that is, the set of $m_1\times m_2$ column orthogonal matrices ($m_2\le m_1$), is written $V_{m_2,m_1}$. We indicate the transpose of a matrix by superscript $\top$.
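The setting can be made concrete with a short simulation. This is a minimal sketch, assuming NumPy and an integer $n \ge m_1+m_2$, so that $A$ can be generated as $Z^\top Z$ with $Z$ having $n$ independent $N(0,\Sigma)$ rows; the function names are hypothetical helpers, not part of the paper. The special-case scale matrix is built as a Kronecker product.

```python
import numpy as np

def special_case_scale(sigma0, rho):
    """Scale matrix [[Sigma0, rho*Sigma0], [rho*Sigma0, Sigma0]] of the special case."""
    return np.kron(np.array([[1.0, rho], [rho, 1.0]]), sigma0)

def sample_blocked_wishart(n, sigma, m1, rng):
    """Draw A ~ W(n, sigma) as Z^T Z, where Z has n i.i.d. N(0, sigma) rows,
    and return the blocks (A1, A12, A2) with A1 of order m1 x m1."""
    chol = np.linalg.cholesky(sigma)                 # sigma = chol @ chol.T
    z = rng.standard_normal((n, sigma.shape[0])) @ chol.T
    a = z.T @ z
    return a[:m1, :m1], a[:m1, m1:], a[m1:, m1:]

rng = np.random.default_rng(0)
sigma0 = np.array([[2.0, 0.3], [0.3, 1.0]])
sigma = special_case_scale(sigma0, rho=0.6)
A1, A12, A2 = sample_blocked_wishart(n=10, sigma=sigma, m1=2, rng=rng)
print(A1.shape, A12.shape, A2.shape)
```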
In the convex cone of positive definite matrices we use the cone order: $A < B$ means that $B - A$ is positive definite, written $B - A > 0$. Integrals over cones are written as $\int_0^B f(X)\,(dX)$, meaning that the integral is taken over the cone interval $0 < X < B$. The multivariate gamma function is denoted by $\Gamma_m(a)$, defined for $\operatorname{Re}(a) > (m-1)/2$; see Muirhead [1] for proofs and properties.
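For numerical work it helps to have $\Gamma_m(a)$ on a log scale. The sketch below uses the standard product formula $\Gamma_m(a) = \pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma\big(a - \tfrac12(i-1)\big)$ for real $a > (m-1)/2$; the function name is our own (SciPy offers `scipy.special.multigammaln` for the same quantity, but the sketch avoids that dependency).

```python
import math

def log_multivariate_gamma(a, m):
    """log Gamma_m(a) = (m(m-1)/4) log(pi) + sum_{i=1}^{m} log Gamma(a - (i-1)/2),
    for real a > (m - 1)/2."""
    if a <= (m - 1) / 2:
        raise ValueError("need a > (m - 1)/2")
    return (m * (m - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - i / 2) for i in range(m))

print(log_multivariate_gamma(2.5, 2))
```

A useful consequence is the factorization $\Gamma_{m_1+m_2}(a) = \pi^{m_1m_2/2}\,\Gamma_{m_1}(a)\,\Gamma_{m_2}(a-\tfrac12 m_1)$, which the tests below check for $m_1 = 2$, $m_2 = 3$, $a = 4$.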
In Section 2 we give some background information, especially about the Jacobians which we need to evaluate the integrals. In Section 3 we state our results and give proofs. In Section 4 we give some comments on the result.

Background
The single most important reference for background material for this paper is Muirhead [1]. Some results from it will be used without explicit citation.
When doing a change of variables in a multiple integral we need the Jacobian. Here we list the ones we need; most can be found in Muirhead [1] or in Mathai [2]. We follow the notation of Muirhead [1]. First we give a very brief summary.
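For any matrix $X$, let $dX$ denote the matrix of differentials $(dx_{ij})$. For an arbitrary $m_1\times m_2$ matrix $X$, the symbol $(dX)$ denotes the exterior product of the elements of $dX$:
$$(dX) = \bigwedge_{i=1}^{m_1}\bigwedge_{j=1}^{m_2} dx_{ij}.$$
If $X$ is a symmetric $m_2\times m_2$ matrix, the symbol $(dX)$ denotes the exterior product of the $m_2(m_2+1)/2$ distinct elements of $dX$:
$$(dX) = \bigwedge_{1\le i\le j\le m_2} dx_{ij},$$
with similar definitions for other kinds of structured matrices.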
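As a small illustration of these definitions, a symmetric $2\times2$ matrix contributes $m_2(m_2+1)/2 = 3$ differentials, while an unrestricted $2\times2$ matrix contributes four:
$$X = \begin{pmatrix} x_{11} & x_{12}\\ x_{12} & x_{22}\end{pmatrix}:\quad (dX) = dx_{11}\wedge dx_{12}\wedge dx_{22};\qquad Y = \begin{pmatrix} y_{11} & y_{12}\\ y_{21} & y_{22}\end{pmatrix}:\quad (dY) = dy_{11}\wedge dy_{12}\wedge dy_{21}\wedge dy_{22}.$$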
The following invariant differential form on the orthogonal group represents the Haar measure:
$$(H^\top dH) = \bigwedge_{i=1}^{m}\bigwedge_{j=i+1}^{m} h_j^\top dh_i.$$
Here $H = [h_1 : \cdots : h_m]$ represents an $m\times m$ orthogonal matrix with columns $h_1,\dots,h_m$. This form normalized to have total mass unity is denoted by $(dH)$. We also need to integrate over a Stiefel manifold; for $H \in V_{m_2,m_1}$ the symbol $(H^\top dH)$ then represents a similarly defined invariant form; see Muirhead [1].
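For simulation checks one often needs draws from the normalized Haar measure $(dH)$. A common recipe, used here as an assumption rather than anything taken from the paper, is to QR-factorize a matrix of independent standard normals and flip column signs so that the diagonal of $R$ is positive; the first $m_2$ columns of such an $H$ are then uniformly distributed on $V_{m_2,m_1}$. The helper names are hypothetical.

```python
import numpy as np

def haar_orthogonal(m, rng):
    """Sample H from the normalized Haar measure (dH) on O(m):
    QR of a Gaussian matrix, with column signs fixed so that diag(R) > 0."""
    q, r = np.linalg.qr(rng.standard_normal((m, m)))
    return q * np.sign(np.diag(r))   # scales column j of q by sign(r[j, j])

def uniform_stiefel(m1, m2, rng):
    """First m2 columns of a Haar-distributed H: a uniform point of V_{m2,m1}."""
    return haar_orthogonal(m1, rng)[:, :m2]

rng = np.random.default_rng(1)
H = haar_orthogonal(4, rng)
H1 = uniform_stiefel(5, 2, rng)
print(np.allclose(H.T @ H, np.eye(4)), np.allclose(H1.T @ H1, np.eye(2)))
```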
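Lemma 1 (Jacobian of the symmetric square root of a positive definite matrix). Let $X$ and $Y$ be in $\mathcal{P}(m)$ such that $Y = X^2$, and let $\Delta = \operatorname{diag}(\delta_1,\dots,\delta_m)$ be the diagonal matrix with the eigenvalues of $X$ on the diagonal. Then
$$(dY) = 2^m\det\Delta\prod_{i<j}^{m}(\delta_i+\delta_j)\,(dX).$$
This result can also be found in Mathai [2].

We also need the generalized polar decomposition of a rectangular matrix. Let $X$ be an $m_1\times m_2$ rectangular matrix with $m_2\le m_1$. Then we can always write $X = HP$, where $P = (X^\top X)^{1/2}$ is $m_2\times m_2$ positive semidefinite (positive definite if $X$ has full rank) and $H$ is an $m_1\times m_2$ column orthogonal matrix. In the full-rank case the decomposition is unique; see Higham [5].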
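Lemma 1 can be checked numerically. The map $Y = X^2$ has differential $dY = X\,dX + dX\,X$, which is linear in $dX$, so its Jacobian determinant in the coordinates $x_{ij}$, $i\le j$, can be computed exactly from a basis of symmetric matrices and compared with $2^m\det\Delta\prod_{i<j}(\delta_i+\delta_j)$. This is a minimal sketch assuming NumPy; the function names are hypothetical.

```python
import itertools
import numpy as np

def square_map_jacobian(x):
    """|det| of dX -> X dX + dX X (differential of Y = X^2) on symmetric
    matrices, in the coordinates x_ij with i <= j."""
    m = x.shape[0]
    coords = [(i, j) for i in range(m) for j in range(i, m)]
    rows = []
    for i, j in coords:
        e = np.zeros((m, m))
        e[i, j] = e[j, i] = 1.0          # unit change in the coordinate x_ij
        dy = x @ e + e @ x
        rows.append([dy[k, l] for k, l in coords])
    return abs(np.linalg.det(np.array(rows).T))

def lemma1_factor(x):
    """2^m det(Delta) prod_{i<j} (delta_i + delta_j), delta = eigenvalues of X."""
    d = np.linalg.eigvalsh(x)
    pairs = [d[i] + d[j] for i, j in itertools.combinations(range(len(d)), 2)]
    return 2.0 ** len(d) * np.prod(d) * np.prod(pairs)

rng = np.random.default_rng(2)
g = rng.standard_normal((3, 3))
x = g @ g.T + np.eye(3)                  # a positive definite test matrix
print(square_map_jacobian(x), lemma1_factor(x))
```

Working with the exact linear differential avoids finite-difference error; the check relies on the fact that conjugation by an orthogonal matrix has unit Jacobian on symmetric matrices.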
Lemma 2 (Generalized polar decomposition). Let $X$ be an $m_1\times m_2$ matrix with $m_1\ge m_2$, of rank $m_2$ and with $m_2$ distinct singular values. Write $X = HP$, with $H\in V_{m_2,m_1}$ and $P\in\mathcal{P}(m_2)$. Then $P$ has $m_2$ distinct eigenvalues. Also let $\Delta = \operatorname{diag}(\delta_1,\dots,\delta_{m_2})$ be the diagonal matrix with the eigenvalues of $P$ on the diagonal. Then
$$(dX) = (\det\Delta)^{m_1-m_2}\prod_{i<j}^{m_2}(\delta_i+\delta_j)\,(dP)\,(H^\top dH).$$
Since these results are only used inside integrals, the assumption of distinct singular values is harmless: the set of matrices with repeated singular values has measure zero.
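The factors of the generalized polar decomposition can be read off from a thin singular value decomposition: if $X = U\operatorname{diag}(s)V^\top$, then $H = UV^\top$ and $P = V\operatorname{diag}(s)V^\top$, so the eigenvalues of $P$ (the $\delta_i$ of Lemma 2) are the singular values of $X$. This is a minimal NumPy sketch with a hypothetical function name; SciPy's `scipy.linalg.polar` computes the same factorization.

```python
import numpy as np

def polar_decomposition(x):
    """Generalized polar decomposition X = H P of an m1 x m2 matrix (m1 >= m2):
    H has orthonormal columns (a point of V_{m2,m1}), P = (X^T X)^{1/2}."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)   # X = U diag(s) V^T
    h = u @ vt
    p = vt.T @ np.diag(s) @ vt
    return h, p

rng = np.random.default_rng(3)
x = rng.standard_normal((5, 3))
h, p = polar_decomposition(x)
print(np.allclose(h @ p, x), np.allclose(h.T @ h, np.eye(3)))
```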

Results
Let us state our main result.

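Theorem 3 (The marginal distribution of the diagonal blocks of a blocked Wishart random matrix with blocks of unequal sizes). Let
$$A = \begin{pmatrix} A_1 & A_{12}\\ A_{12}^\top & A_2 \end{pmatrix}$$
be an $(m_1+m_2)\times(m_1+m_2)$ Wishart random matrix, where the diagonal blocks are of sizes $m_1\times m_1$ and $m_2\times m_2$, respectively. The Wishart distribution of $A$ has $n\ge m_1+m_2$ degrees of freedom and positive definite scale matrix $\Sigma = \begin{pmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2\end{pmatrix}$, blocked in the same way as $A$. Write $\Sigma_{11\cdot2} = \Sigma_1-\Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top$ and $\Sigma_{22\cdot1} = \Sigma_2-\Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12}$ for the Schur complements of $\Sigma$, and $B = \Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_2^{-1}$. Then the joint marginal distribution of the two diagonal blocks $A_1$ and $A_2$ has density function
$$f(A_1,A_2) = \frac{\operatorname{etr}\!\left(-\tfrac12\Sigma_{11\cdot2}^{-1}A_1-\tfrac12\Sigma_{22\cdot1}^{-1}A_2\right)\det(A_1)^{(n-m_1-1)/2}\det(A_2)^{(n-m_2-1)/2}}{2^{n(m_1+m_2)/2}\,\Gamma_{m_1}(\tfrac12 n)\,\Gamma_{m_2}(\tfrac12 n)\,(\det\Sigma)^{n/2}}\;{}_0F_1\!\left(\tfrac12 n;\tfrac14 A_2^{1/2}B^\top A_1BA_2^{1/2}\right) \tag{1}$$
for $A_1\in\mathcal{P}(m_1)$ and $A_2\in\mathcal{P}(m_2)$, where ${}_0F_1$ is the matrix-variate hypergeometric function.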
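A quick sanity check of Theorem 3 is possible in the scalar case $m_1 = m_2 = 1$, where ${}_0F_1$ has an ordinary power series and the density reads $f(a_1,a_2) = \exp(-a_1/(2\sigma_{11\cdot2}) - a_2/(2\sigma_{22\cdot1}))\,(a_1a_2)^{(n-2)/2}\,{}_0F_1(\tfrac12 n;\tfrac14 b^2a_1a_2)/(2^n\Gamma(\tfrac12 n)^2(\det\Sigma)^{n/2})$ with $b = \sigma_{12}/(\sigma_{11\cdot2}\sigma_2)$. The sketch below, assuming NumPy and a truncated midpoint-rule grid (our choices, not the paper's), integrates it numerically. For $n = 5$ and $\Sigma = \begin{pmatrix}1 & 0.5\\ 0.5 & 2\end{pmatrix}$ the Wishart moments are $E A_1 = n\sigma_1 = 5$, $E A_2 = n\sigma_2 = 10$ and $E A_1A_2 = n^2\sigma_1\sigma_2 + 2n\sigma_{12}^2 = 52.5$, so the grid values should be close to these and the total mass close to 1.

```python
import math
import numpy as np

def hyp0f1_series(b, z, terms=80):
    """Truncated series 0F1(b; z) = sum_k z^k / ((b)_k k!), elementwise, for z >= 0."""
    z = np.asarray(z, dtype=float)
    term = np.ones_like(z)
    total = np.ones_like(z)
    for k in range(terms):
        term = term * z / ((b + k) * (k + 1))
        total = total + term
    return total

def diagonal_density(a1, a2, n, sigma):
    """Density of Theorem 3 in the scalar case m1 = m2 = 1 (sigma is 2 x 2)."""
    s1, s12, s2 = sigma[0, 0], sigma[0, 1], sigma[1, 1]
    det = s1 * s2 - s12 ** 2
    s11_2, s22_1 = det / s2, det / s1          # Schur complements
    b = s12 / (s11_2 * s2)                     # B = Sigma_{11.2}^{-1} Sigma_12 Sigma_2^{-1}
    log_c = -(n * math.log(2.0) + 2.0 * math.lgamma(n / 2) + 0.5 * n * math.log(det))
    log_f = (log_c - a1 / (2 * s11_2) - a2 / (2 * s22_1)
             + 0.5 * (n - 2) * (np.log(a1) + np.log(a2)))
    return np.exp(log_f) * hyp0f1_series(n / 2, 0.25 * b ** 2 * a1 * a2)

# Midpoint rule on [0, 60] x [0, 120]; the mass beyond this box is negligible here.
n = 5
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
N = 400
h1, h2 = 60.0 / N, 120.0 / N
a1, a2 = np.meshgrid((np.arange(N) + 0.5) * h1, (np.arange(N) + 0.5) * h2, indexing="ij")
f = diagonal_density(a1, a2, n, sigma)
w = h1 * h2
mass = f.sum() * w
mean_a1 = (a1 * f).sum() * w
mean_a2 = (a2 * f).sum() * w
mean_prod = (a1 * a2 * f).sum() * w
print(mass, mean_a1, mean_a2, mean_prod)
```

The last moment is the informative one: it depends on $\sigma_{12}$ only through the ${}_0F_1$ factor, so a wrong hypergeometric argument would show up there.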
Recall that the matrix-variate hypergeometric function is defined by a series expansion in zonal polynomials, which converges in all cases we need; see Muirhead [1]. The rest of this section consists of a proof of this theorem.
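Introduce the following notation: the Schur complements of $\Sigma = \begin{pmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2\end{pmatrix}$ are
$$\Sigma_{11\cdot2} = \Sigma_1-\Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top,\qquad \Sigma_{22\cdot1} = \Sigma_2-\Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12}.$$
In the following we will use some standard results on blocked matrices without quoting them.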
The Wishart density function of $A$, written as a function of the blocks, is
$$f(A) = c\,\operatorname{etr}\!\left(-\tfrac12\Sigma_{11\cdot2}^{-1}A_1-\tfrac12\Sigma_{22\cdot1}^{-1}A_2\right)\operatorname{etr}(B^\top A_{12})\det(A_1)^{k}\det\!\left(A_2-A_{12}^\top A_1^{-1}A_{12}\right)^{k}, \tag{6}$$
where $B = \Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_2^{-1}$, $c^{-1} = 2^{(m_1+m_2)n/2}\,\Gamma_{m_1+m_2}(\tfrac12 n)(\det\Sigma)^{n/2}$ and $k = (n-m_1-m_2-1)/2$. In the following we work with this density, concentrating on the factors depending on $A_{12}$. To prove the theorem we need to integrate out the variable $A_{12}$. The other variables, which are constant under the integration, will be collected in one constant factor. So we repeat formula (6) written as a differential form with the constants left out:
$$\operatorname{etr}(B^\top A_{12})\det\!\left(A_2-A_{12}^\top A_1^{-1}A_{12}\right)^{k}(dA_{12}). \tag{7}$$
Now, to find the marginal distribution of the diagonal blocks, we need to integrate over the off-diagonal block $A_{12}$. Under this integration the diagonal blocks $A_1$ and $A_2$ remain fixed, and the region of integration is the subset of $\mathbb{R}^{m_1\times m_2}$ consisting of the matrices $A_{12}$ such that the block matrix $A = \begin{pmatrix}A_1 & A_{12}\\ A_{12}^\top & A_2\end{pmatrix}$ is positive definite. This seems like a complicated set, but we can give a simple description of it using the polar decomposition of a matrix. This is one of the key observations of the proof, and we have not seen it used before.
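The block form of the density (6) rests on two standard identities, written out here as a worked step. With $\Sigma_{11\cdot2}$, $\Sigma_{22\cdot1}$ the Schur complements of $\Sigma$ and $B = \Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_2^{-1}$, the inverse of the partitioned matrix is
$$\Sigma^{-1} = \begin{pmatrix}\Sigma_{11\cdot2}^{-1} & -B\\ -B^\top & \Sigma_{22\cdot1}^{-1}\end{pmatrix},\qquad\text{so}\qquad \operatorname{Tr}(\Sigma^{-1}A) = \operatorname{Tr}(\Sigma_{11\cdot2}^{-1}A_1)+\operatorname{Tr}(\Sigma_{22\cdot1}^{-1}A_2)-2\operatorname{Tr}(B^\top A_{12}),$$
and the Schur complement formula for determinants gives $\det A = \det(A_1)\det(A_2-A_{12}^\top A_1^{-1}A_{12})$. Substituting both into $c\,\operatorname{etr}(-\tfrac12\Sigma^{-1}A)\det(A)^{k}$ separates the factors $\operatorname{etr}(B^\top A_{12})\det(A_2-A_{12}^\top A_1^{-1}A_{12})^{k}$, which depend on $A_{12}$, from those that do not.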
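Now we need to assume that $m_1\ge m_2$. For the opposite inequality a parallel development can be given, using the other factorization $\det A = \det(A_2)\det(A_1-A_{12}A_2^{-1}A_{12}^\top)$. From, for instance, Theorem 1.12 in Zhang [6], it follows that the region of integration is the set
$$\{A_{12} : A_2-A_{12}^\top A_1^{-1}A_{12} > 0\} = \{A_{12} = A_1^{1/2}XA_2^{1/2} : I_{m_2}-X^\top X > 0\},$$
where we use the usual symmetric square root. Then in terms of the new variable $X = A_1^{-1/2}A_{12}A_2^{-1/2}$ the region of integration becomes
$$\{X\in\mathbb{R}^{m_1\times m_2} : X^\top X < I_{m_2}\},$$
and with the generalized polar decomposition in the form $X = HP$ with $P\in\mathcal{P}(m_2)$, $H\in V_{m_2,m_1}$, $X^\top X = P^2$, the region of integration can be written as
$$\{(P,H) : 0 < P < I_{m_2},\ H\in V_{m_2,m_1}\},$$
which is a Cartesian product of a cone interval with a Stiefel manifold. The Jacobian of the transformation from $A_{12}$ to $X$ is
$$(dX) = \det(A_2)^{-m_1/2}\det(A_1)^{-m_2/2}\,(dA_{12}).$$
The Jacobian of the polar decomposition $X = HP$ is
$$(dX) = (\det\Delta)^{m_1-m_2}\prod_{i<j}^{m_2}(\delta_i+\delta_j)\,(dP)\,(H^\top dH),$$
where $\Delta = \operatorname{diag}(\delta_1,\dots,\delta_{m_2})$ is the diagonal matrix with the eigenvalues of $P$ on the diagonal; see Lemma 2. A last transformation will be useful. Define $Q = P^2$. The Jacobian of this transformation is
$$(dQ) = 2^{m_2}\det\Delta\prod_{i<j}^{m_2}(\delta_i+\delta_j)\,(dP),$$
with $\Delta$ as above; see Lemma 1. Applying these transformations, the integral of (7) over $A_{12}$ can be written as
$$K\int_{0<Q<I_{m_2}}\det(Q)^{(m_1-m_2-1)/2}\det(I_{m_2}-Q)^{k}\int_{V_{m_2,m_1}}\operatorname{etr}\!\left(B^\top A_1^{1/2}HQ^{1/2}A_2^{1/2}\right)(H^\top dH)\,(dQ), \tag{11}$$
where the constant is $K = 2^{-m_2}\det(A_1)^{m_2/2}\det(A_2)^{k+m_1/2}$. We are ready to perform the integration over the Stiefel manifold. For this purpose we need a generalization of Theorem 7.4.1 from Muirhead [1], which we cite here.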
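The description of the region of integration, and the first Jacobian, can both be checked numerically. With $X = A_1^{-1/2}A_{12}A_2^{-1/2}$, the condition $X^\top X < I$ is the same as "largest singular value of $X$ below 1", and with column-major vectorization $\operatorname{vec}(X) = (A_2^{-1/2}\otimes A_1^{-1/2})\operatorname{vec}(A_{12})$, whose determinant is $\det(A_2)^{-m_1/2}\det(A_1)^{-m_2/2}$. This is a sketch assuming NumPy, with randomly generated test matrices and hypothetical helper names.

```python
import numpy as np

def inv_sqrt(s):
    """Inverse of the symmetric square root of a positive definite matrix."""
    w, v = np.linalg.eigh(s)
    return v @ np.diag(w ** -0.5) @ v.T

def random_pd(m, rng):
    g = rng.standard_normal((m, m))
    return g @ g.T + np.eye(m)

rng = np.random.default_rng(4)
m1, m2 = 3, 2
agree = []
for scale in [0.1, 0.5, 1.0, 2.0, 5.0] * 20:
    a1, a2 = random_pd(m1, rng), random_pd(m2, rng)
    a12 = scale * rng.standard_normal((m1, m2))
    x = inv_sqrt(a1) @ a12 @ inv_sqrt(a2)
    in_region = np.linalg.norm(x, 2) < 1.0          # X^T X < I  <=>  ||X||_2 < 1
    a = np.block([[a1, a12], [a12.T, a2]])
    is_pd = np.linalg.eigvalsh(a).min() > 0.0
    agree.append(in_region == is_pd)

# Jacobian of A12 -> X for the last (a1, a2): det of the Kronecker factor.
jac = np.linalg.det(np.kron(inv_sqrt(a2), inv_sqrt(a1)))
expected = np.linalg.det(a2) ** (-m1 / 2) * np.linalg.det(a1) ** (-m2 / 2)
print(all(agree), np.isclose(jac, expected))
```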
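Let $X$ be an $m\times n$ real matrix with $m\le n$, and let $H = [H_1 : H_2]\in O(n)$, where $H_1$ is $n\times m$. Then
$$\int_{O(n)}\operatorname{etr}(XH_1)\,(dH) = {}_0F_1\!\left(\tfrac12 n;\tfrac14 XX^\top\right). \tag{13}$$
But we have an integral over the Stiefel manifold, not the orthogonal group, so we now need to generalize (13) to an integral over the Stiefel manifold. What we need is the following. Let $V_{m_2,m_1}$ be the manifold of $m_1\times m_2$ column orthogonal matrices with $m_2\le m_1$, and let $f$ be a function defined on the Stiefel manifold. We can extend this function to a function defined on $O(m_1)$ in the following way. Let $H$ be an $m_1\times m_1$ orthogonal matrix, and write it in block form as $[H_1 : H_2]$ such that $H_1\in V_{m_2,m_1}$. How can we characterize the set of matrices $H_2$ that complement $H_1$ to form an orthogonal matrix? First, let $H_2$ be a fixed but arbitrary matrix complementing $H_1$. Then clearly any other $m_1\times(m_1-m_2)$ column orthogonal matrix with the same column space also works. The common column space is the orthogonal complement of the column space of $H_1$. The set of such matrices can be described as $\{H_2U : U\in O(m_1-m_2)\}$. We therefore define the extension by $f([H_1 : H_2]) = f(H_1)$, and for the integral we find that
$$\int_{V_{m_2,m_1}} f(H_1)\,(H_1^\top dH_1) = \frac{\operatorname{Vol}(O(m_1))}{\operatorname{Vol}(O(m_1-m_2))}\int_{O(m_1)} f(H_1)\,(dH).$$
Returning to our integral, the integral over the Stiefel manifold occurring in (11) can now be written as
$$\int_{V_{m_2,m_1}}\operatorname{etr}\!\left(B^\top A_1^{1/2}HQ^{1/2}A_2^{1/2}\right)(H^\top dH) = \frac{2^{m_2}\pi^{m_1m_2/2}}{\Gamma_{m_2}(\frac12 m_1)}\,{}_0F_1\!\left(\tfrac12 m_1;\tfrac14 Q^{1/2}A_2^{1/2}B^\top A_1BA_2^{1/2}Q^{1/2}\right),$$
where we used (13) with $X = Q^{1/2}A_2^{1/2}B^\top A_1^{1/2}$ after moving factors cyclically inside the trace, together with $\operatorname{Vol}(O(m_1))/\operatorname{Vol}(O(m_1-m_2)) = 2^{m_2}\pi^{m_1m_2/2}/\Gamma_{m_2}(\frac12 m_1)$. Here $\operatorname{Vol}(O(m)) = 2^m\pi^{m^2/2}/\Gamma_m(\frac12 m)$ is the volume of the orthogonal group; see Muirhead [1]. The differential form $(dH)$ denotes Haar measure normalized to total mass unity. Now write $W = A_2^{1/2}B^\top A_1BA_2^{1/2}$; since hypergeometric functions of matrix argument depend only on the eigenvalues of the argument, we can write (11) as
$$K\,\frac{2^{m_2}\pi^{m_1m_2/2}}{\Gamma_{m_2}(\frac12 m_1)}\int_{0<Q<I_{m_2}}\det(Q)^{(m_1-m_2-1)/2}\det(I_{m_2}-Q)^{k}\,{}_0F_1\!\left(\tfrac12 m_1;\tfrac14 WQ\right)(dQ), \tag{17}$$
and to evaluate this integral we need Theorem 7.2.10 from Muirhead [1]; we do not state it here. Using it, by expanding the hypergeometric function in its defining series and integrating term by term, we find the result we need for the integral of a hypergeometric function: for $\operatorname{Re}(a),\operatorname{Re}(b) > (m-1)/2$ and symmetric $Z$,
$$\int_{0<Y<I_m}\det(Y)^{a-(m+1)/2}\det(I_m-Y)^{b-(m+1)/2}\,{}_pF_q(a_1,\dots,a_p;b_1,\dots,b_q;ZY)\,(dY) = \frac{\Gamma_m(a)\Gamma_m(b)}{\Gamma_m(a+b)}\,{}_{p+1}F_{q+1}(a_1,\dots,a_p,a;b_1,\dots,b_q,a+b;Z). \tag{18}$$
The proof is a simple calculation that we leave out. Now using (18) with $m = m_2$, $a = \frac12 m_1$ and $b = \frac12(n-m_1)$ to calculate (17), we get, finally, the result
$$\pi^{m_1m_2/2}\,\frac{\Gamma_{m_2}(\frac12(n-m_1))}{\Gamma_{m_2}(\frac12 n)}\,\det(A_1)^{m_2/2}\det(A_2)^{k+m_1/2}\,{}_0F_1\!\left(\tfrac12 n;\tfrac14 W\right),$$
where we note that one pair of upper and lower arguments to the hypergeometric function are equal (both $\frac12 m_1$), and those arguments have been canceled.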
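Formula (13) can be checked by simulation in its simplest case $m = 1$: then $X = x$ is a row vector, $H_1 = h$ is a uniformly distributed unit vector in $\mathbb{R}^n$, and (13) says that the average of $\exp(x\cdot h)$ equals the scalar series ${}_0F_1(\frac12 n;\frac14|x|^2)$. For $n = 3$ this reduces to $\sinh(|x|)/|x|$. A minimal Monte Carlo sketch, assuming NumPy; the helper names are hypothetical.

```python
import math
import numpy as np

def hyp0f1(b, z, terms=60):
    """Scalar series 0F1(b; z) = sum_k z^k / ((b)_k k!)."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= z / ((b + k) * (k + 1))
        total += term
    return total

def sphere_average(x, samples, rng):
    """Monte Carlo mean of exp(x . h) for h uniform on the unit sphere in R^n,
    i.e. the O(n) integral of etr(x H1)(dH) when x is 1 x n and H1 is n x 1."""
    g = rng.standard_normal((samples, x.size))
    h = g / np.linalg.norm(g, axis=1, keepdims=True)
    return float(np.exp(h @ x).mean())

rng = np.random.default_rng(5)
results = []
for x in (np.array([0.6, -0.8, 0.0]), np.array([1.0, 1.0, 1.0, 1.0, 0.0])):
    n = x.size
    exact = hyp0f1(n / 2, 0.25 * float(x @ x))
    results.append((n, exact, sphere_average(x, 400_000, rng)))
print(results)
print(hyp0f1(1.5, 0.25), math.sinh(1.0))   # n = 3, |x| = 1: both equal sinh(1)
```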
With a little algebra we complete the proof of our main theorem.
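One way to carry out this algebra, sketched here assuming the integral over $A_{12}$ has the value $\pi^{m_1m_2/2}\,\Gamma_{m_2}(\frac12(n-m_1))\,\Gamma_{m_2}(\frac12 n)^{-1}\det(A_1)^{m_2/2}\det(A_2)^{k+m_1/2}\,{}_0F_1(\frac12 n;\frac14 W)$ obtained above: multiply by the remaining factors $c\,\operatorname{etr}(-\frac12\Sigma_{11\cdot2}^{-1}A_1-\frac12\Sigma_{22\cdot1}^{-1}A_2)\det(A_1)^{k}$, use $k+\frac12 m_2 = \frac12(n-m_1-1)$ and $k+\frac12 m_1 = \frac12(n-m_2-1)$ for the determinant exponents, and use the factorization of the multivariate gamma function
$$\Gamma_{m_1+m_2}(a) = \pi^{m_1m_2/2}\,\Gamma_{m_1}(a)\,\Gamma_{m_2}\!\left(a-\tfrac12 m_1\right),\qquad a = \tfrac12 n,$$
which turns the constant into
$$c\,\pi^{m_1m_2/2}\,\frac{\Gamma_{m_2}(\frac12(n-m_1))}{\Gamma_{m_2}(\frac12 n)} = \frac{1}{2^{n(m_1+m_2)/2}\,\Gamma_{m_1}(\frac12 n)\,\Gamma_{m_2}(\frac12 n)\,(\det\Sigma)^{n/2}},$$
the normalizing constant of the marginal density of $(A_1,A_2)$.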

Some Final Comments
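To help interpret our main result, we calculated the conditional distribution of the matrix $A_1$ given the matrix $A_2$. We will not give the full details of the calculation here but only the result. Given $A_2$, the matrix $A_1$ has density
$$f(A_1\mid A_2) = \frac{\operatorname{etr}(-\frac12\Omega)\,\operatorname{etr}\!\left(-\frac{1}{2(1-\rho^2)}\Sigma_0^{-1}A_1\right)\det(A_1)^{(n-m-1)/2}}{2^{mn/2}\,\Gamma_m(\frac12 n)\,\det\!\left((1-\rho^2)\Sigma_0\right)^{n/2}}\;{}_0F_1\!\left(\tfrac12 n;\tfrac14\Omega\left((1-\rho^2)\Sigma_0\right)^{-1}A_1\right),$$
where we have given the conditional density only for the special case $\Sigma = \begin{pmatrix}\Sigma_0 & \rho\Sigma_0\\ \rho\Sigma_0 & \Sigma_0\end{pmatrix}$, in which $m_1 = m_2 = m$. For this case we have, with the notation from the main theorem, $\Sigma_{11\cdot2} = \Sigma_{22\cdot1} = (1-\rho^2)\Sigma_0$, $B = (\rho/(1-\rho^2))\Sigma_0^{-1}$, and $B^\top\Sigma_{11\cdot2}B = (\rho^2/(1-\rho^2))\Sigma_0^{-1}$. We have defined $\Omega = (\rho^2/(1-\rho^2))\Sigma_0^{-1}A_2$, which can be seen as a noncentrality parameter. The density above is the noncentral Wishart density given in Theorem 10.3.2 in Muirhead [1], with $n$ degrees of freedom and scale matrix $(1-\rho^2)\Sigma_0$. We see that the conditional distribution is a noncentral Wishart distribution, where the noncentrality parameter $\Omega$ depends on the conditioning matrix $A_2$. In this way, the effect of the conditioning is to change the distribution of $A_1$, which in the marginal case is central Wishart, to a noncentral Wishart distribution, with noncentrality parameter depending on the conditioning matrix.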
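The special-case identities quoted in this paragraph can be confirmed numerically for a particular $\Sigma_0$ and $\rho$. This sketch, assuming NumPy and an arbitrary positive definite $\Sigma_0$ of our choosing, computes the Schur complements and $B = \Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_2^{-1}$ directly from the blocks; it also checks that $B$ is minus the off-diagonal block of $\Sigma^{-1}$. The helper name is hypothetical.

```python
import numpy as np

def schur_and_b(sigma, m1):
    """Schur complements Sigma_{11.2}, Sigma_{22.1} and B = Sigma_{11.2}^{-1} Sigma_12 Sigma_2^{-1}."""
    s1, s12, s2 = sigma[:m1, :m1], sigma[:m1, m1:], sigma[m1:, m1:]
    s11_2 = s1 - s12 @ np.linalg.solve(s2, s12.T)
    s22_1 = s2 - s12.T @ np.linalg.solve(s1, s12)
    b = np.linalg.solve(s11_2, s12) @ np.linalg.inv(s2)
    return s11_2, s22_1, b

rho = 0.6
sigma0 = np.array([[2.0, 0.3], [0.3, 1.0]])
sigma = np.kron(np.array([[1.0, rho], [rho, 1.0]]), sigma0)
s11_2, s22_1, b = schur_and_b(sigma, m1=2)
s0_inv = np.linalg.inv(sigma0)
checks = [
    np.allclose(s11_2, (1 - rho**2) * sigma0),
    np.allclose(s22_1, (1 - rho**2) * sigma0),
    np.allclose(b, rho / (1 - rho**2) * s0_inv),
    np.allclose(b.T @ s11_2 @ b, rho**2 / (1 - rho**2) * s0_inv),
    np.allclose(-np.linalg.inv(sigma)[:2, 2:], b),   # B = minus off-diagonal block of Sigma^{-1}
]
print(checks)
```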
As mentioned in the Introduction, this result will be used for modelling a spatial random field of tensors, where we will estimate the parameters using composite likelihood. This application will be reported elsewhere. For that application we will need to calculate values of matrix-variate hypergeometric functions numerically. An efficient method for summing the defining series is given by Koev and Edelman [7], with an associated Matlab implementation. Butler and Wood [8] give a Laplace approximation for the case we need, the ${}_0F_1$ function.