1. Introduction
The goal of this paper is to find an exact and useful form for the marginal distribution of the diagonal blocks of a 2×2 blocked Wishart random matrix. The problem arises in an application, the estimation of the parameters of a Wishart random field, which will be reported elsewhere.
Let $A$ be an $(m_1+m_2)\times(m_1+m_2)$ Wishart random matrix whose diagonal blocks are of orders $m_1\times m_1$ and $m_2\times m_2$, respectively. In our intended application $m_1$ and $m_2$ will be small integers (and $m_1=m_2$, but we choose to treat the more general case). Write
$$A=\begin{pmatrix}A_1 & A_{12}\\ A_{12}^\top & A_2\end{pmatrix}.$$
Denote the number of degrees of freedom by $n$ and the scale parameter, which is a matrix blocked in the same way as $A$, by
$$\Sigma=\begin{pmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2\end{pmatrix}.$$
We are mostly interested in the special case
$$\Sigma=\begin{pmatrix}\Sigma_0 & \rho\Sigma_0\\ \rho\Sigma_0 & \Sigma_0\end{pmatrix},\qquad |\rho|<1,$$
but the general case is not more difficult.
All matrices are real. Notation: we use $\operatorname{Tr}(A)$ for the trace of the square matrix $A$ and $\operatorname{etr}(A)=\exp(\operatorname{Tr}(A))$. We write $P(m)$ for the convex cone of real $m\times m$ positive definite matrices, and we write $O(m)$ for the orthogonal group, that is, the set of $m\times m$ orthogonal matrices. The Stiefel manifold, that is, the set of $m_1\times m_2$ column orthogonal matrices, is written as $V_{m_2,m_1}$ ($m_2\le m_1$). We indicate the transpose of a matrix by the superscript $\top$.
In the convex cone of positive definite matrices, we use the cone order, defined by $A<B$ if and only if $B-A$ is positive definite, written as $B-A>0$. Integrals over cone intervals are written as $\int_0^I g(A)\,dA$, meaning that the integral is taken over the set $0<A<I$. The multivariate gamma function is denoted by $\Gamma_m(a)$, for $\operatorname{Re}(a)>(m-1)/2$; see Muirhead [1] for proofs and properties.
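For numerical work it can help to have the multivariate gamma function available in closed form. The following is a minimal sketch, assuming the standard product formula $\Gamma_m(a)=\pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma(a-(i-1)/2)$ from Muirhead [1]; the function names are illustrative and not taken from any library.

```python
import math


def log_multivariate_gamma(a: float, m: int) -> float:
    """Return log Gamma_m(a) for real a > (m - 1) / 2.

    Uses Gamma_m(a) = pi^{m(m-1)/4} * prod_{i=1}^{m} Gamma(a - (i - 1)/2).
    """
    if a <= (m - 1) / 2:
        raise ValueError("need a > (m - 1) / 2")
    log_pi_part = (m * (m - 1) / 4) * math.log(math.pi)
    # i runs over 0..m-1, which is the same as (i - 1)/2 for i = 1..m.
    log_gamma_part = sum(math.lgamma(a - i / 2) for i in range(m))
    return log_pi_part + log_gamma_part


def multivariate_gamma(a: float, m: int) -> float:
    """Return Gamma_m(a); prefer the log version when a or m is large."""
    return math.exp(log_multivariate_gamma(a, m))
```

For $m=1$ this reduces to the ordinary gamma function, and working on the log scale avoids overflow when several such factors are combined in a normalizing constant.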
In Section 2 we give some background information, especially about the Jacobians which we need to evaluate the integrals. In Section 3 we state our results and give proofs. In Section 4 we give some comments on the result.
2. Background
The single most important reference for background material for this paper is Muirhead [1]. Some results from it will be used without explicit citation.
When doing change of variables in a multiple integral we need to know the Jacobian. Here we will list the ones we need; most can be found in Muirhead [1] or in Mathai [2]. We are following the notation of Muirhead [1]. First there is a very brief summary.
For any matrix $X$, let $dX$ denote the matrix of differentials $dx_{ij}$. For an arbitrary $m_1\times m_2$ matrix $X$, the symbol $dX$ also denotes the exterior product of the $m_1m_2$ elements of $dX$:
$$dX\equiv\bigwedge_{j=1}^{m_2}\bigwedge_{i=1}^{m_1}dx_{ij}. \tag{1}$$
If $X$ is a symmetric $m_2\times m_2$ matrix, the symbol $dX$ will denote the exterior product of the $m_2(m_2+1)/2$ distinct elements of $dX$:
$$dX\equiv\bigwedge_{1\le i\le j\le m_2}dx_{ij}, \tag{2}$$
with similar definitions for other kinds of structured matrices.
The following invariant differential form on the orthogonal group represents the Haar measure:
$$(H^\top dH)=\bigwedge_{i=1}^{m}\bigwedge_{j=i+1}^{m}h_j^\top\,dh_i.$$
Here $H=[h_1,\dots,h_m]$ is an $m\times m$ orthogonal matrix with columns $h_i$. This form normalized to have total mass unity is denoted by $dH$. We also need to integrate over a Stiefel manifold; then $(H^\top dH)$ represents a similarly defined invariant form; see Muirhead [1].
Some needed Jacobians are not in Muirhead [1], so we cite those Jacobians here, from Díaz-García et al. [3, 4].
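Lemma 1 (Jacobian of the symmetric square root of a positive definite matrix). Let $S$ and $R$ be in $P(m)$ such that $S=R^2$, and let $\Delta=\operatorname{diag}(\Delta_1,\dots,\Delta_m)$ be a diagonal matrix with the eigenvalues of $R$ on the diagonal. Then
$$dS=2^m\det(\Delta)\prod_{i<j}^{m}(\Delta_i+\Delta_j)\,dR=\prod_{i\le j}^{m}(\Delta_i+\Delta_j)\,dR. \tag{3}$$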
This result can also be found in Mathai [2].
We need the generalized polar decomposition of a rectangular matrix. Let $C$ be an $m_1\times m_2$ matrix with $m_2\le m_1$. Then we can always write $C=UH$, where $U$ is an $m_1\times m_2$ column orthogonal matrix and $H$ is positive semidefinite, and positive definite if $C$ has full rank. In the full-rank case $U$ is unique; see Higham [5].
Lemma 2 (Generalized polar decomposition). Let $X$ be an $m_1\times m_2$ matrix with $m_1\ge m_2$ and of rank $m_2$, with $m_2$ distinct singular values. Write $X=UH$, with $U\in V_{m_2,m_1}$ and $H\in P(m_2)$. Then $H$ has $m_2$ distinct eigenvalues. Also let $\Delta=\operatorname{diag}(\Delta_1,\dots,\Delta_{m_2})$ be the diagonal matrix with the eigenvalues of $H$ on the diagonal. Then
$$dX=\det(\Delta)^{m_1-m_2}\prod_{i<j}^{m_2}(\Delta_i+\Delta_j)\,dH\,(U^\top dU). \tag{4}$$
Note that since those results are used for integration, the assumption of distinct singular values is unimportant, since the subset where the singular values are equal has measure zero.
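3. Results
Let us state our main result.
Theorem 3 (The marginal distribution of the diagonal blocks of a blocked Wishart random matrix with blocks of unequal sizes). Let
$$A=\begin{pmatrix}A_1 & A_{12}\\ A_{12}^\top & A_2\end{pmatrix}$$
be an $(m_1+m_2)\times(m_1+m_2)$ blocked Wishart random matrix, where the diagonal blocks are of sizes $m_1\times m_1$ and $m_2\times m_2$, respectively. Suppose that the Wishart distribution of $A$ has $n\ge m_1+m_2$ degrees of freedom and positive definite scale matrix
$$\Sigma=\begin{pmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2\end{pmatrix},$$
blocked in the same way as $A$. Then the marginal distribution of the two diagonal blocks $A_1$ and $A_2$ has density function
$$c\cdot\operatorname{etr}\!\left(-\tfrac12\left(\Sigma_1^{-1}A_1+F^\top C_2FA_1\right)\right)\cdot\operatorname{etr}\!\left(-\tfrac12C_2^{-1}A_2\right)\cdot\det(A_1)^{(n-m_1-1)/2}\det(A_2)^{(n-m_2-1)/2}\,{}_0F_1\!\left(\tfrac n2;\tfrac14G\right), \tag{5}$$
where $C_2=\Sigma_2-\Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12}$, $F=C_2^{-1}\Sigma_{12}^\top\Sigma_1^{-1}$, $G=A_2^{1/2}FA_1F^\top A_2^{1/2}$, and
$$c^{-1}=2^{(m_1+m_2)n/2}\,\Gamma_{m_1}(n/2)\,\Gamma_{m_2}(n/2)\,\det(\Sigma)^{n/2}.$$
Here ${}_0F_1$ is the generalized matrix-variate hypergeometric function, as defined in Muirhead [1].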
Note that the definition of the matrix-variate hypergeometric function is by a series expansion, which is convergent in all cases we need; see Muirhead [1]. The rest of this section consists of a proof of this theorem.
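To make the series definition concrete, the following minimal sketch sums the series in the scalar case $m=1$ only, where the matrix-variate function reduces to the classical ${}_0F_1(b;x)=\sum_{k\ge0}x^k/((b)_k\,k!)$. The general matrix case requires zonal polynomials and is not attempted here; the function name is hypothetical.

```python
import math


def hyp0f1_scalar(b: float, x: float, tol: float = 1e-16, max_terms: int = 500) -> float:
    """Sum the scalar series 0F1(b; x) = sum_k x^k / ((b)_k k!).

    Each term is obtained from the previous one through the ratio
    x / ((b + k)(k + 1)), so no factorials are formed explicitly.
    """
    term = 1.0   # the k = 0 term
    total = 1.0
    for k in range(max_terms):
        term *= x / ((b + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total
```

A convenient check is the classical identity ${}_0F_1(1/2;z^2/4)=\cosh z$, so that `hyp0f1_scalar(0.5, z * z / 4)` should agree with `math.cosh(z)` to rounding error.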
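Introduce the following notation: the Schur complements of the diagonal blocks of
$$\Sigma=\begin{pmatrix}\Sigma_1 & \Sigma_{12}\\ \Sigma_{12}^\top & \Sigma_2\end{pmatrix}$$
are $C_1=\Sigma_1-\Sigma_{12}\Sigma_2^{-1}\Sigma_{12}^\top$ and $C_2=\Sigma_2-\Sigma_{12}^\top\Sigma_1^{-1}\Sigma_{12}$. Then define $F=C_2^{-1}\Sigma_{12}^\top\Sigma_1^{-1}$.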
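One of the standard results on blocked matrices used below is that the blocks of $\Sigma^{-1}$ are $\Sigma_1^{-1}+F^\top C_2F$, $-F^\top$, $-F$, and $C_2^{-1}$. The following minimal sketch checks this numerically in the simplest setting $m_1=m_2=1$, where all blocks are scalars; the helper names are illustrative only.

```python
def schur_pieces(s1: float, s12: float, s2: float) -> tuple[float, float]:
    """Return (C2, F) for scalar blocks: C2 = s2 - s12^2 / s1, F = s12 / (C2 * s1)."""
    c2 = s2 - s12 * s12 / s1
    f = s12 / (c2 * s1)
    return c2, f


def inverse_2x2_symmetric(s1: float, s12: float, s2: float) -> tuple[float, float, float]:
    """Return the (1,1), (1,2) and (2,2) entries of the inverse of [[s1, s12], [s12, s2]]."""
    det = s1 * s2 - s12 * s12
    return s2 / det, -s12 / det, s1 / det


# Special case Sigma = [[s0, rho*s0], [rho*s0, s0]] with s0 = 2 and rho = 0.6.
s0, rho = 2.0, 0.6
c2, f = schur_pieces(s0, rho * s0, s0)
top_left, off_diag, bottom_right = inverse_2x2_symmetric(s0, rho * s0, s0)
```

In this example `c2` equals $(1-\rho^2)\sigma_0$ and `f` equals $\rho/((1-\rho^2)\sigma_0)$, and the three entries of the inverse agree with $1/\sigma_0+F^2C_2$, $-F$, and $1/C_2$.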
In the following we will be using some standard results on blocked matrices without quoting them.
The Wishart density function of $A$ written as a function of the blocks is
$$c\cdot\operatorname{etr}\!\left(-\tfrac12\left[\Sigma_1^{-1}A_1+F^\top C_2FA_1-2F^\top A_{12}^\top+C_2^{-1}A_2\right]\right)\cdot\det(A_1)^{\gamma}\det\!\left(A_2-A_{12}^\top A_1^{-1}A_{12}\right)^{\gamma}, \tag{6}$$
where $c^{-1}=2^{(m_1+m_2)n/2}\,\Gamma_{m_1+m_2}(n/2)\,\det(\Sigma)^{n/2}$ and $\gamma=(n-m_1-m_2-1)/2$. Here $c$ denotes the normalizing constant of the full Wishart density, which is not the constant $c$ of Theorem 3.
To prove the theorem we need to integrate out the variable $A_{12}$, so in the following we concentrate on the factors of the density that depend on $A_{12}$. The other variables, which are constant under the integration, are collected in one constant factor. So we repeat formula (6), written as a differential form with the constants collected in $K_1$:
$$K_1\cdot\operatorname{etr}(FA_{12})\,\det\!\left(A_2-A_{12}^\top A_1^{-1}A_{12}\right)^{\gamma}dA_{12}, \tag{7}$$
where
$$K_1=c\cdot\operatorname{etr}\!\left(-\tfrac12\left(\Sigma_1^{-1}A_1+F^\top C_2FA_1\right)\right)\operatorname{etr}\!\left(-\tfrac12C_2^{-1}A_2\right)\det(A_1)^{\gamma}.$$
Under the integration over the off-diagonal block $A_{12}$ the diagonal blocks $A_1$ and $A_2$ remain fixed, and the region of integration is the subset of $\mathbb{R}^{m_1\times m_2}$ consisting of the matrices $A_{12}$ such that the block matrix
$$A=\begin{pmatrix}A_1 & A_{12}\\ A_{12}^\top & A_2\end{pmatrix}$$
is positive definite. This seems like a complicated set, but we can give a simple description of it using the polar decomposition of a matrix. Note that this is one of the key observations for the proof, and this author has not seen any use of this observation earlier.
Now we need to assume that $m_1\ge m_2$. For the opposite inequality a parallel development can be given, using the other factorization $\det A=\det(A_2)\det(A_1-A_{12}A_2^{-1}A_{12}^\top)$. From, for instance, Theorem 1.12 in Zhang [6] it follows that the region of integration is the set
$$\left\{A_{12}\in\mathbb{R}^{m_1\times m_2}:0<A_{12}^\top A_1^{-1}A_{12}<A_2\right\}. \tag{8}$$
Introduce $E=A_2^{-1/2}A_{12}^\top A_1^{-1/2}$, where we use the usual symmetric square root. Then in terms of the new variable $E$ the region of integration becomes
$$\left\{E^\top\in\mathbb{R}^{m_1\times m_2}:0<EE^\top<I\right\}, \tag{9}$$
and with the generalized polar decomposition in the form $E^\top=UP$, with $P\in P(m_2)$ and $U\in V_{m_2,m_1}$, we have $EE^\top=P^2$, so the region of integration can be written as
$$\left\{P\in P(m_2),\ U\in V_{m_2,m_1}:0<P^2<I\right\}, \tag{10}$$
which is a Cartesian product of a cone interval with a Stiefel manifold.
The Jacobian of the transformation from $A_{12}$ to $E$ is $dE=dE^\top=\det(A_2)^{-m_1/2}\det(A_1)^{-m_2/2}\,dA_{12}$. The Jacobian of the polar decomposition $E^\top=UP$ is $dE=dE^\top=\det(\Delta)^{m_1-m_2}\prod_{i<j}^{m_2}(\Delta_i+\Delta_j)\,dP\,(U^\top dU)$, where $\Delta$ is a diagonal matrix with the eigenvalues of $P$ on the diagonal; see Lemma 2. A last transformation will be useful. Define $X=P^2$. The Jacobian of this transformation is $dX=2^{m_2}\det(\Delta)\prod_{i<j}^{m_2}(\Delta_i+\Delta_j)\,dP$, with $\Delta$ as above; see Lemma 1.
Applying these transformations, the integral of (7) can be written as
$$K_2\cdot\int_0^I\int_{V_{m_2,m_1}}\operatorname{etr}\!\left(X^{1/2}A_2^{1/2}FA_1^{1/2}U\right)\det(I-X)^{\gamma}\det(X)^{(m_1-m_2-1)/2}\,dX\,(U^\top dU), \tag{11}$$
where the constant is
$$K_2=2^{-m_2}\,c\,\operatorname{etr}\!\left(-\tfrac12\left[\left(\Sigma_1^{-1}+F^\top C_2F\right)A_1+C_2^{-1}A_2\right]\right)\cdot\det(A_1)^{\gamma+m_2/2}\det(A_2)^{\gamma+m_1/2}. \tag{12}$$
We are ready to perform the integration over the Stiefel manifold. For this purpose we need a generalization of Theorem 7.4.1 from Muirhead [1], which we cite here.
Let $X$ be an $m\times n$ real matrix with $m\le n$ and let $H=[H_1:H_2]$ be an $n\times n$ orthogonal matrix, where $H_1$ is $n\times m$. Then
$$\int_{O(n)}\operatorname{etr}(XH_1)\,dH={}_0F_1\!\left(\tfrac n2;\tfrac14XX^\top\right). \tag{13}$$
But we have an integral over the Stiefel manifold, not over the orthogonal group, so we need to generalize the result (13) to an integral over the Stiefel manifold. What we need is the following. Let $V_{m_2,m_1}$ be the manifold of $m_1\times m_2$ column orthogonal matrices with $m_2\le m_1$, and let $f$ be a function defined on the Stiefel manifold. We can extend this function to a function defined on $O(m_1)$ in the following way. Let $U$ be an $m_1\times m_1$ orthogonal matrix, and write it in block form as $[U_1:U_2]$ such that $U_1\in V_{m_2,m_1}$.
How can we characterize the set of matrices $U_2$ that complement $U_1$ to form an orthogonal matrix? First, let $U_2$ be a fixed but arbitrary matrix complementing $U_1$. Then clearly any other $m_1\times(m_1-m_2)$ column orthogonal matrix with the same column space also works. The common column space is the orthogonal complement of the column space of $U_1$. The set of such matrices can be described as $\{V\in V_{m_1-m_2,m_1}:V=U_2Q\text{ for }Q\in O(m_1-m_2)\}$. For this set we write $V_{m_1-m_2,m_1}(U_1)$. As a set we can identify this with $O(m_1-m_2)$. Specifically, we can identify $U_2$ with the very special column orthogonal matrix
$$\begin{pmatrix}0_{m_2\times(m_1-m_2)}\\ Q\end{pmatrix},$$
where $Q\in O(m_1-m_2)$; these matrices clearly form a proper submanifold of the Stiefel manifold $V_{m_1-m_2,m_1}$. The function $f$ can now be extended to the orthogonal group by defining $f(U)=f([U_1:U_2])=f(U_1)$, and for the integral we find that
$$\begin{aligned}\int_{O(m_1)}f(U_1)\,(U^\top dU)&=\int_{V_{m_2,m_1}}\int_{V_{m_1-m_2,m_1}(U_1)}f([U_1:U_2])\,(U^\top dU)\\&=\int_{V_{m_2,m_1}}f(U_1)\,(U_1^\top dU_1)\int_{V_{m_1-m_2,m_1}(U_1)}(Q^\top dQ)\\&=\operatorname{Vol}(O(m_1-m_2))\int_{V_{m_2,m_1}}f(U_1)\,(U_1^\top dU_1).\end{aligned} \tag{14}$$
Returning to our integral, the integral over the Stiefel manifold occurring in (11) can now be written as
$$\int_{V_{m_2,m_1}}\operatorname{etr}\!\left(X^{1/2}A_2^{1/2}FA_1^{1/2}U\right)(U^\top dU)=\frac{1}{\operatorname{Vol}(O(m_1-m_2))}\int_{O(m_1)}\operatorname{etr}\!\left(X^{1/2}A_2^{1/2}FA_1^{1/2}U_1\right)(U^\top dU), \tag{15}$$
where $U_1$ consists of the first $m_2$ columns of $U$, and this equals
$$\frac{\operatorname{Vol}(O(m_1))}{\operatorname{Vol}(O(m_1-m_2))}\int_{O(m_1)}\operatorname{etr}\!\left(X^{1/2}A_2^{1/2}FA_1^{1/2}U_1\right)dU=\frac{\operatorname{Vol}(O(m_1))}{\operatorname{Vol}(O(m_1-m_2))}\,{}_0F_1\!\left(\tfrac{m_1}{2};\tfrac14A_1^{1/2}F^\top A_2^{1/2}XA_2^{1/2}FA_1^{1/2}\right), \tag{16}$$
where we used (13). Here $\operatorname{Vol}(O(m))=2^m\pi^{m^2/2}/\Gamma_m(m/2)$ is the volume of the orthogonal group; see Muirhead [1].
The differential form dU denotes Haar measure normalized to total mass unity.
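The ratio of volumes appearing in (16) can be checked numerically. The sketch below assumes the standard Stiefel volume formula $\operatorname{Vol}(V_{m_2,m_1})=2^{m_2}\pi^{m_1m_2/2}/\Gamma_{m_2}(m_1/2)$, which is not stated in the text above but is consistent with (14) applied to $f\equiv1$; the function names are hypothetical.

```python
import math


def log_mv_gamma(a: float, m: int) -> float:
    """log Gamma_m(a); the empty case m = 0 gives log(1) = 0."""
    if m > 0 and a <= (m - 1) / 2:
        raise ValueError("need a > (m - 1) / 2")
    return (m * (m - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - i / 2) for i in range(m)
    )


def log_vol_orthogonal(m: int) -> float:
    """log Vol(O(m)) with Vol(O(m)) = 2^m pi^{m^2/2} / Gamma_m(m/2)."""
    return m * math.log(2) + (m * m / 2) * math.log(math.pi) - log_mv_gamma(m / 2, m)


def log_vol_stiefel(m2: int, m1: int) -> float:
    """log Vol(V_{m2,m1}), assuming the formula 2^{m2} pi^{m1 m2/2} / Gamma_{m2}(m1/2)."""
    return m2 * math.log(2) + (m1 * m2 / 2) * math.log(math.pi) - log_mv_gamma(m1 / 2, m2)
```

For example, $\operatorname{Vol}(O(1))=2$ and $\operatorname{Vol}(O(2))=4\pi$, and for $m_2\le m_1$ the difference `log_vol_orthogonal(m1) - log_vol_orthogonal(m1 - m2)` should agree with `log_vol_stiefel(m2, m1)`.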
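Now write $G=A_2^{1/2}FA_1F^\top A_2^{1/2}$; then we can write (11) as
$$K_2\,\frac{\operatorname{Vol}(O(m_1))}{\operatorname{Vol}(O(m_1-m_2))}\int_0^I\det(X)^{(m_1-m_2-1)/2}\det(I-X)^{\gamma}\,{}_0F_1\!\left(\tfrac{m_1}{2};\tfrac14GX\right)dX, \tag{17}$$
and to evaluate this integral we need Theorem 7.2.10 from Muirhead [1], which we do not state here.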
Using this we find a result we need for the integral of a hypergeometric function, by using the series expansion definition of the hypergeometric function and integrating term by term.
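Theorem 4. If $Y$ is a symmetric $m\times m$ matrix, one has that
$$\int_0^I\det(X)^{a-(m+1)/2}\det(I-X)^{b-(m+1)/2}\,{}_pF_q(a_1,\dots,a_p;b_1,\dots,b_q;XY)\,dX=\frac{\Gamma_m(a)\Gamma_m(b)}{\Gamma_m(a+b)}\,{}_{p+1}F_{q+1}(a_1,\dots,a_p,a;b_1,\dots,b_q,a+b;Y), \tag{18}$$
so both degrees of the hypergeometric function are raised by one.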
The proof is a simple calculation that we leave out.
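In the scalar case $m=1$ the identity (18) reduces to a statement about Beta integrals of classical hypergeometric series, which can be checked numerically. The following is a minimal sketch for $p=0$, $q=1$, using composite Simpson quadrature; the helper names and the test parameters are illustrative assumptions.

```python
import math


def hyp_pfq_scalar(upper, lower, x, tol=1e-16, max_terms=500):
    """Sum the scalar series pFq(upper; lower; x) term by term."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        ratio = x / (k + 1)
        for a in upper:
            ratio *= a + k
        for b in lower:
            ratio /= b + k
        term *= ratio
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    acc = func(lo) + func(hi)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * func(lo + i * h)
    return acc * h / 3


def theorem4_lhs(a, b, c, y):
    """Left side of (18) for m = 1: integral of x^{a-1} (1-x)^{b-1} 0F1(c; x y) over (0, 1)."""
    return simpson(
        lambda x: x ** (a - 1) * (1 - x) ** (b - 1) * hyp_pfq_scalar([], [c], x * y),
        0.0,
        1.0,
    )


def theorem4_rhs(a, b, c, y):
    """Right side of (18) for m = 1: B(a, b) * 1F2(a; c, a + b; y)."""
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return beta * hyp_pfq_scalar([a], [c, a + b], y)
```

With, say, $a=2$, $b=3$, $c=1.5$, and $y=0.8$ the integrand is smooth on $[0,1]$, so the two sides should agree to well within the quadrature error. Non-integer $a$ or $b$ below one would introduce endpoint singularities that this simple rule does not handle.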
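Now using (18) with $m=m_2$, $a=m_1/2$, and $b=(n-m_1)/2$ to calculate (17) we get, finally, the result
$$K_2\cdot\frac{\operatorname{Vol}(O(m_1))}{\operatorname{Vol}(O(m_1-m_2))}\,\frac{\Gamma_{m_2}(m_1/2)\,\Gamma_{m_2}((n-m_1)/2)}{\Gamma_{m_2}(n/2)}\,{}_1F_2\!\left(\tfrac{m_1}{2};\tfrac{m_1}{2},\tfrac n2;\tfrac14G\right). \tag{19}$$
Note that one upper and one lower argument of the hypergeometric function are equal (both are $m_1/2$), so they cancel and the function reduces to ${}_0F_1\!\left(\tfrac n2;\tfrac14G\right)$.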
With a little algebra we complete the proof of our main theorem.
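The remaining algebra is the simplification of the constant. As a sanity check, the sketch below compares, on the log scale, the constant obtained from (12) and (19) with the constant $c$ stated in Theorem 3, leaving out the common factor $2^{-(m_1+m_2)n/2}\det(\Sigma)^{-n/2}$. The function names are hypothetical, and the check covers only the listed values of $(m_1,m_2,n)$.

```python
import math


def log_mv_gamma(a: float, m: int) -> float:
    """log Gamma_m(a); the empty case m = 0 gives log(1) = 0."""
    if m > 0 and a <= (m - 1) / 2:
        raise ValueError("need a > (m - 1) / 2")
    return (m * (m - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - i / 2) for i in range(m)
    )


def log_vol_orthogonal(m: int) -> float:
    """log Vol(O(m)) with Vol(O(m)) = 2^m pi^{m^2/2} / Gamma_m(m/2)."""
    return m * math.log(2) + (m * m / 2) * math.log(math.pi) - log_mv_gamma(m / 2, m)


def log_constant_from_proof(m1: int, m2: int, n: int) -> float:
    """Constant from (12) and (19): 2^{-m2} / Gamma_{m1+m2}(n/2) times volume and gamma ratios."""
    return (
        -m2 * math.log(2)
        - log_mv_gamma(n / 2, m1 + m2)
        + log_vol_orthogonal(m1)
        - log_vol_orthogonal(m1 - m2)
        + log_mv_gamma(m1 / 2, m2)
        + log_mv_gamma((n - m1) / 2, m2)
        - log_mv_gamma(n / 2, m2)
    )


def log_constant_from_theorem(m1: int, m2: int, n: int) -> float:
    """Constant of Theorem 3 without the common factor: 1 / (Gamma_{m1}(n/2) Gamma_{m2}(n/2))."""
    return -log_mv_gamma(n / 2, m1) - log_mv_gamma(n / 2, m2)
```

For admissible triples with $m_1\ge m_2$ and $n\ge m_1+m_2$, such as $(2,2,5)$ or $(3,2,7)$, the two functions should return the same value up to rounding.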
4. Some Final Comments
To help interpret our main result, we calculated the conditional distribution of the matrix $A_1$ given the matrix $A_2$. We will not give the full details of the calculation here but only give the result. The conditional density of $A_1$ given $A_2=a_2$ is
$$\frac{1}{2^{mn/2}\,\Gamma_m(n/2)\,\det(C_1)^{n/2}}\operatorname{etr}\!\left(-\tfrac12C_1^{-1}A_1\right)\det(A_1)^{(n-m-1)/2}\cdot\operatorname{etr}\!\left(-\tfrac12\Omega\right){}_0F_1\!\left(\tfrac n2;\tfrac14\Omega C_1^{-1}A_1\right), \tag{20}$$
where we have given the conditional density only for the special case
$$\Sigma=\begin{pmatrix}\Sigma_0 & \rho\Sigma_0\\ \rho\Sigma_0 & \Sigma_0\end{pmatrix},$$
with $m=m_1=m_2$ the common block size. For this case we have, with the notation from the main theorem, $C_1=C_2=(1-\rho^2)\Sigma_0$, $F=\frac{\rho}{1-\rho^2}\Sigma_0^{-1}$, and $F^\top C_2F=\frac{\rho^2}{1-\rho^2}\Sigma_0^{-1}$. We have defined $\Omega=\rho^2C_1^{-1}a_2$, which can be seen as a noncentrality parameter. The density above is equal to the noncentral Wishart density given in Theorem 10.3.2 in Muirhead [1]. We see that the conditional distribution is a kind of noncentral Wishart distribution, where the noncentrality parameter $\Omega$ depends on the conditioning matrix $A_2$. In this way, the effect of the conditioning is to change the distribution of $A_1$, which in the marginal case is central Wishart, to a noncentral Wishart distribution, with noncentrality parameter depending on the conditioning matrix.
As mentioned in the Introduction, this result will be used for modelling a spatial random field of tensors, where we will estimate the parameters using composite likelihood. This application will be reported elsewhere. For that application we will need to calculate values of matrix-variate hypergeometric functions numerically. A paper giving an efficient method for summing the defining series is Koev and Edelman [7], with an associated Matlab implementation. Butler and Wood [8] give a Laplace approximation for the case we need, the ${}_0F_1$ function.
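The relation between the conditional density and the main theorem can be checked numerically in the scalar case $m_1=m_2=1$, where $\Sigma_0=\sigma_0$ is a positive number. The sketch below is only an illustration under that assumption: it divides the joint density of the main theorem by the marginal density of $A_2$, which is Wishart with $n$ degrees of freedom and scale $\sigma_0$, and compares the quotient with (20). All function names and parameter values are hypothetical.

```python
import math


def hyp0f1_scalar(b, x, tol=1e-16, max_terms=500):
    """Scalar series 0F1(b; x) = sum_k x^k / ((b)_k k!)."""
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= x / ((b + k) * (k + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def joint_density(a1, a2, n, sigma0, rho):
    """Density of (A1, A2) from the main theorem with m1 = m2 = 1."""
    c2 = (1 - rho**2) * sigma0
    f = rho / ((1 - rho**2) * sigma0)
    det_sigma = sigma0**2 * (1 - rho**2)
    c = 1.0 / (2**n * math.gamma(n / 2) ** 2 * det_sigma ** (n / 2))
    g = a2 * f * a1 * f  # G = A2^{1/2} F A1 F^T A2^{1/2} for scalars
    return (
        c
        * math.exp(-0.5 * (1 / sigma0 + f * c2 * f) * a1)
        * math.exp(-0.5 * a2 / c2)
        * a1 ** ((n - 2) / 2)
        * a2 ** ((n - 2) / 2)
        * hyp0f1_scalar(n / 2, g / 4)
    )


def marginal_density_a2(a2, n, sigma0):
    """One-dimensional Wishart density with n degrees of freedom and scale sigma0."""
    norm = 2 ** (n / 2) * math.gamma(n / 2) * sigma0 ** (n / 2)
    return a2 ** ((n - 2) / 2) * math.exp(-a2 / (2 * sigma0)) / norm


def conditional_density(a1, a2, n, sigma0, rho):
    """Formula (20) with m = 1, C1 = (1 - rho^2) sigma0 and Omega = rho^2 a2 / C1."""
    c1 = (1 - rho**2) * sigma0
    omega = rho**2 * a2 / c1
    norm = 2 ** (n / 2) * math.gamma(n / 2) * c1 ** (n / 2)
    return (
        math.exp(-0.5 * a1 / c1)
        * a1 ** ((n - 2) / 2)
        * math.exp(-0.5 * omega)
        * hyp0f1_scalar(n / 2, omega * a1 / (4 * c1))
        / norm
    )
```

For instance, with $n=5$, $\sigma_0=2$, $\rho=0.6$, $a_1=1.3$, and $a_2=2.7$, the quotient `joint_density(...) / marginal_density_a2(...)` should agree with `conditional_density(...)` up to rounding error.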