A Note on Functional Averages over Gaussian Ensembles

In this work we find a new formula for matrix averages over the Gaussian ensemble. Let ${\bf H}$ be an $n\times n$ Gaussian random matrix with complex, independent, and identically distributed entries of zero mean and unit variance. Given an $n\times n$ positive definite matrix ${\bf A}$, and a continuous function $f:\R^{+}\to\R$ such that $\int_{0}^{\infty}{e^{-\alpha t}|f(t)|^2\,dt}<\infty$ for every $\alpha>0$, we find a new formula for the expectation $\E[\mathrm{Tr}(f({\bf HAH^{*}}))]$. Taking $f(x)=\log(1+x)$ gives another formula for the capacity of the MIMO communication channel, and taking $f(x)=(1+x)^{-1}$ gives the MMSE achieved by a linear receiver.


Introduction
Random matrix theory was introduced to the theoretical physics community by Wigner in his work on nuclear physics in the 1950s [23,24]. Since that time, the subject has been an important and active research area in mathematics, and it finds applications in fields as diverse as the Riemann hypothesis, physics, chaotic systems, multivariate statistics, wireless communications, signal processing, compressed sensing and information theory. In the last decades, a considerable amount of work has emerged in communications and information theory on the fundamental limits of communication channels that makes use of results in random matrix theory [11,19,1]. For this reason, computing averages over certain matrix ensembles becomes extremely important in many situations. To be more specific, consider the well-known case of the single user MIMO channel with multiple transmit and receive antennas. Denoting the number of transmitting antennas by $t$ and the number of receiving antennas by $r$, the channel model is
$$y = Hu + n,$$
where $u \in \mathbb{C}^{t}$ is the transmitted vector, $y \in \mathbb{C}^{r}$ is the received vector, $H$ is an $r \times t$ complex matrix and $n$ is a zero mean complex Gaussian vector with independent, equal variance entries. We assume that $\E(nn^{*}) = I_r$, where $(\cdot)^{*}$ denotes the complex conjugate transpose. It is reasonable to put a power constraint $\E(u^{*}u) = \E(\mathrm{Tr}(uu^{*})) \le P$, where $P$ is the total transmitted power. The signal to noise ratio, denoted by snr, is defined as the quotient of the signal power and the noise power, and in this case is equal to $P/r$.
G. H. Tucci is with Bell Laboratories, Alcatel-Lucent, 600 Mountain Ave, Murray Hill, NJ 07974. E-mail: gabriel.tucci@alcatel-lucent.com.
Recall that if $A$ is an $n \times n$ Hermitian matrix then there exist a unitary matrix $U$ and $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = UDU^{*}$. Given a continuous function $f$ we define $f(A)$ as
$$f(A) = U\,\mathrm{diag}(f(d_1), \ldots, f(d_n))\,U^{*}.$$
Naturally, the simplest example is the one where $H$ has independent and identically distributed (i.i.d.) Gaussian entries, which constitutes the canonical model for the single user narrow band MIMO channel. It is known that the capacity of this channel is achieved when $u$ is a complex Gaussian vector with zero mean and covariance $\mathrm{snr}\, I_t$ (see for instance [20,19]). For the fast fading channel, assuming statistical channel state information at the transmitter, the ergodic capacity is given by
$$\E\big[\log\det(I_r + \mathrm{snr}\, HH^{*})\big] = \E\big[\mathrm{Tr}\log(I_r + \mathrm{snr}\, HH^{*})\big],$$
where in the last equality we use the fact that $\mathrm{Tr}\log(\cdot) = \log\det(\cdot)$. We refer the reader to [20] or [19] for more details on this.
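As a small numerical illustration of the functional calculus and of the identity Tr log(·) = log det(·) used above, the following Python sketch treats a 2 × 2 Hermitian matrix, whose eigenvalues have a closed form. The matrix entries and the helper names `hermitian_2x2_eigenvalues` and `trace_of_function` are illustrative assumptions, not part of the original text.

```python
import math

def hermitian_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the Hermitian matrix [[a, b], [conj(b), d]] with a, d real."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    return mean + radius, mean - radius

def trace_of_function(f, a, b, d):
    """Tr f(A) = f(d_1) + f(d_2), because f(A) = U diag(f(d_1), f(d_2)) U*."""
    d1, d2 = hermitian_2x2_eigenvalues(a, b, d)
    return f(d1) + f(d2)

# Hypothetical positive definite example: A = [[3, 1+1j], [1-1j, 2]].
a, b, d = 3.0, 1.0 + 1.0j, 2.0
tr_log = trace_of_function(math.log, a, b, d)
log_det = math.log(a * d - abs(b) ** 2)  # det A = a*d - |b|^2
print(tr_log, log_det)
```

For this matrix the eigenvalues are 4 and 1, so both printed numbers equal log 4 up to rounding.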
Another important performance measure is the minimum mean square error (MMSE) achieved by a linear receiver, which determines the maximum achievable output signal to interference and noise ratio (SINR). For an input vector $x$ with i.i.d. entries of zero mean and unit variance, the MSE at the output of the MMSE receiver is given by
$$\min_{M \in \mathbb{C}^{t \times r}} \E\big[\|x - My\|^{2}\big] = \E\big[\mathrm{Tr}\,(I_t + \mathrm{snr}\, H^{*}H)^{-1}\big],$$
where the expectation on the left hand side is over both the vectors $x$ and the random matrices $H$, while the right hand side is over $H$ only. We refer to [19] for more details on this.
There is a large body of literature on averages over Gaussian ensembles; see for instance [20,15,19,13,6,1,11,10,16,2,9,3] and references therein. In [20] the capacity of the Gaussian channel was computed as an improper integral. This integral is difficult to compute, and asymptotic and simulation results are provided. In [3,2,13,16,17] several asymptotic results for large complex Gaussian random matrices are studied in connection with wireless communication and information theory. In [13] many aspects of correlated Gaussian matrices are addressed; in particular, the capacity of the Rayleigh channel was computed as the number of antennas increases to infinity. The books [19,11,1] are excellent introductions to random matrix theory and its applications to physics and information theory. In [10] the eigenvalue distribution for a random infinite d-regular graph was computed.
The typical approach to computing averages over random matrices is to consider the asymptotic behavior as the size of the matrix increases to infinity. In this work we contribute to this area by providing a unified framework to express the ergodic mutual information, the MSE at the output of the MMSE decoder and other functionals of a single user MIMO channel, when the numbers of transmitting and receiving antennas are equal and finite. We do not rely on asymptotic results as the number of antennas increases. To the best of the author's knowledge, the results shown in this work are new.
In Section 2, we present some preliminaries on Schur polynomials that are later used in this work. In Section 3, we prove the main result of the paper, Theorem 3.2. This theorem provides a new formula for the expectation
$$\E\big[\mathrm{Tr}\, f(H^{*}AH)\big],$$
where $A$ is a positive definite matrix and $f$ is a continuous function such that
$$\int_{0}^{\infty} e^{-\alpha t}|f(t)|^{2}\,dt < \infty \quad \text{for every } \alpha > 0.$$
We also discuss some applications and present some examples.

Schur Polynomials Preliminaries
A symmetric polynomial is a polynomial $P(x_1, x_2, \ldots, x_n)$ in $n$ variables such that if any of the variables are interchanged one obtains the same polynomial. Formally, $P$ is a symmetric polynomial if for any permutation $\sigma$ of the set $\{1, 2, \ldots, n\}$ one has
$$P(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}) = P(x_1, x_2, \ldots, x_n).$$
Symmetric polynomials arise naturally in the study of the relation between the roots of a polynomial in one variable and its coefficients, since the coefficients can be given by symmetric polynomial expressions in the roots. Symmetric polynomials also form an interesting structure by themselves.
The resulting structures, and in particular the ring of symmetric functions, are of great importance in combinatorics and in representation theory (see for instance [4,12,8,14] for more details on this topic). The Schur polynomials are certain symmetric polynomials in $n$ variables. This class of polynomials is very important in representation theory since they are the characters of irreducible representations of the general linear groups. The Schur polynomials are indexed by partitions. A partition of a positive integer $n$, also called an integer partition, is a way of writing $n$ as a sum of positive integers. Two partitions that differ only in the order of their summands are considered to be the same partition. Therefore, we can always represent a partition $\lambda$ of a positive integer $n$ as a sequence of $n$ non-increasing and non-negative integers $d_1 \ge d_2 \ge \cdots \ge d_n \ge 0$ such that
$$\sum_{i=1}^{n} d_i = n.$$
Notice that some of the $d_i$ could be zero. Integer partitions are usually represented by the so-called Young diagrams (also known as Ferrers diagrams). A Young diagram is a finite collection of boxes, or cells, arranged in left-justified rows, with the row lengths weakly decreasing (each row has the same or shorter length than its predecessor). Listing the number of boxes in each row gives a partition $\lambda$ of a non-negative integer $n$, the total number of boxes of the diagram. The Young diagram is said to be of shape $\lambda$, and it carries the same information as that partition. For instance, in Figure 1 we can see the Young diagram corresponding to the partition (5, 4, 1) of the number 10.
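Given a partition $\lambda = (\lambda_1, \ldots, \lambda_n)$ with at most $n$ nonzero parts, the following functions are alternating polynomials (in other words, they change sign under any transposition of the variables):
$$a_{(\lambda_1+n-1,\,\lambda_2+n-2,\,\ldots,\,\lambda_n)}(x_1, \ldots, x_n) = \det\big[x_i^{\lambda_j+n-j}\big]_{i,j=1}^{n} = \sum_{\sigma \in S_n} \epsilon(\sigma)\, x_{\sigma(1)}^{\lambda_1+n-1} \cdots x_{\sigma(n)}^{\lambda_n},$$
where $S_n$ is the permutation group of the set $\{1, 2, \ldots, n\}$ and $\epsilon(\sigma)$ is the sign of $\sigma$. Since they are alternating, they are all divisible by the Vandermonde determinant
$$a_{(n-1,\,n-2,\,\ldots,\,0)}(x_1, \ldots, x_n) = \det\big[x_i^{n-j}\big]_{i,j=1}^{n} = \prod_{1 \le j < k \le n} (x_j - x_k).$$
The Schur polynomial associated to $\lambda$ is defined as the ratio
$$s_\lambda(x_1, \ldots, x_n) = \frac{a_{(\lambda_1+n-1,\,\lambda_2+n-2,\,\ldots,\,\lambda_n)}(x_1, \ldots, x_n)}{a_{(n-1,\,n-2,\,\ldots,\,0)}(x_1, \ldots, x_n)}.$$
This is a symmetric function because the numerator and denominator are both alternating, and a polynomial since all alternating polynomials are divisible by the Vandermonde determinant (see [4,8,14] for more details). For instance,
$$s_{(2,1,1)}(x_1, x_2, x_3) = x_1 x_2 x_3 (x_1 + x_2 + x_3)$$
and
$$s_{(2,2,0)}(x_1, x_2, x_3) = x_1^2 x_2^2 + x_1^2 x_3^2 + x_2^2 x_3^2 + x_1^2 x_2 x_3 + x_1 x_2^2 x_3 + x_1 x_2 x_3^2.$$
Another definition we need for the next section is the so-called hook length, hook(x), of a box x in a Young diagram of shape $\lambda$. This is defined as the number of boxes that are in the same row to the right of it, plus those boxes in the same column below it, plus one (for the box itself). For instance, in Figure 2 we can see the hook lengths of the partition (5, 4, 1). The product of the hook lengths of a partition is the product of the hook lengths of all the boxes in its diagram.
We refer the interested reader to [4,8,14] for more details and examples on this topic.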
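The ratio of alternants defining a Schur polynomial and the hook lengths of a Young diagram are easy to evaluate for small cases. The sketch below uses plain Python with exact rational arithmetic and is meant only as an illustration: the function names are hypothetical, and the Leibniz determinant is chosen for brevity rather than efficiency.

```python
from fractions import Fraction
from itertools import permutations

def det(matrix):
    """Determinant via the Leibniz formula (adequate for the small sizes used here)."""
    n = len(matrix)
    total = Fraction(0)
    for perm in permutations(range(n)):
        # Sign of the permutation, computed from its number of inversions.
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = Fraction(-1) ** inversions
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

def schur(partition, xs):
    """Evaluate s_lambda at the distinct points xs as a ratio of two alternants."""
    n = len(xs)
    lam = list(partition) + [0] * (n - len(partition))  # pad with zeros to n parts
    numer = [[Fraction(x) ** (lam[j] + n - 1 - j) for x in xs] for j in range(n)]
    denom = [[Fraction(x) ** (n - 1 - j) for x in xs] for j in range(n)]
    return det(numer) / det(denom)

def hook_lengths(partition):
    """Hook length of every box of the Young diagram, listed row by row."""
    rows = [r for r in partition if r > 0]
    # Column lengths of the diagram (the conjugate partition).
    cols = [sum(1 for r in rows if r > c) for c in range(rows[0])]
    return [[(rows[i] - j - 1) + (cols[j] - i - 1) + 1 for j in range(rows[i])]
            for i in range(len(rows))]

def hook_product(partition):
    """Product of all hook lengths of the diagram."""
    product = 1
    for row in hook_lengths(partition):
        for hook in row:
            product *= hook
    return product

print(schur((2, 1, 1), (1, 2, 3)))   # 36
print(hook_lengths((5, 4, 1)))       # [[7, 5, 4, 3, 1], [5, 3, 2, 1], [1]]
print(hook_product((5, 4, 1)))       # 12600
```

The value 36 agrees with $x_1 x_2 x_3 (x_1 + x_2 + x_3)$ evaluated at (1, 2, 3), which is the known closed form of $s_{(2,1,1)}$ in three variables.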

Averages over Gaussian Ensembles
Let $M_n$ be the set of all $n \times n$ complex matrices and $U_n$ the set of $n \times n$ unitary complex matrices. Let $dH$ be the Lebesgue measure on $M_n$ and let
$$d\nu(H) = \pi^{-n^{2}} \exp\big(-\mathrm{Tr}(H^{*}H)\big)\,dH$$
be the Gaussian measure on $M_n$. This is the measure induced by the Gaussian random matrix with complex independent and identically distributed entries with zero mean and unit variance, when $M_n$ is represented as a Euclidean space of dimension $2n^{2}$. Note that this probability measure is left and right invariant under unitary multiplication (i.e., $d\nu(HU) = d\nu(UH) = d\nu(H)$ for every unitary $U$). The following theorem can be found on page 447 of [8].
Theorem 3.1. For all $n \times n$ Hermitian matrices $A$ and $B$ and every partition $\lambda$,
$$\int_{M_n} s_\lambda(AHBH^{*})\,d\nu(H) = h(\lambda)\, s_\lambda(A)\, s_\lambda(B),$$
where $h(\lambda)$ is the product of the hook-lengths of $\lambda$.
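For every $\alpha > 0$ let us define the following class of functions
$$L^{2}_{\alpha} = \Big\{ f : \R^{+} \to \R \ \text{measurable} \ : \ \int_{0}^{\infty} e^{-\alpha t} |f(t)|^{2}\,dt < \infty \Big\}.$$
This is a Hilbert space with respect to the inner product
$$\langle f, g \rangle_{\alpha} = \int_{0}^{\infty} e^{-\alpha t} f(t) g(t)\,dt.$$
Moreover, polynomials are dense with respect to this norm (see Chapter 10 in [18]). Let $\mathcal{A}_\alpha$ be the set of continuous functions in $L^{2}_{\alpha}$ and let $\mathcal{A}$ be the intersection of all the $\mathcal{A}_\alpha$,
$$\mathcal{A} = \bigcap_{\alpha > 0} \mathcal{A}_\alpha.$$
Note that the family $\mathcal{A}$ is very rich. For instance, all continuous functions that do not grow faster than polynomials belong to this family.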
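In particular, $f(t) = \log(1 + t) \in \mathcal{A}$.
Theorem 3.2. Let $A$ be an $n \times n$ positive definite matrix with distinct eigenvalues $d_1, \ldots, d_n$ and let $f \in \mathcal{A}$. For $k = 0, \ldots, n-1$ define
$$f_k(x) = \frac{x^{\,n-(k+1)}}{(n-(k+1))!} \int_{0}^{\infty} e^{-t}\, t^{\,n-(k+1)} f(xt)\,dt.$$
Then
$$\int_{M_n} \mathrm{Tr}\, f(H^{*}AH)\,d\nu(H) = \frac{1}{\det(\Delta(D))} \sum_{k=0}^{n-1} \det(T_k), \qquad (7)$$
where $\Delta(D)$ is the Vandermonde matrix associated with the matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ and $T_k$ is the matrix constructed by replacing the $(k+1)$-th row of $\Delta(D)$, namely $\{d_i^{\,n-(k+1)}\}_{i=1}^{n}$, by $\{f_k(d_i)\}_{i=1}^{n}$.
Proof. First, we will prove the theorem for polynomials. Let $p$ and $q$ be two polynomials. It is clear that $\mathrm{Tr}\,(p + q)(H^{*}AH) = \mathrm{Tr}\, p(H^{*}AH) + \mathrm{Tr}\, q(H^{*}AH)$ and $(p + q)_k = p_k + q_k$ for every $k = 0, \ldots, n-1$. Therefore, both sides of Equation (7) are linear in $f$, and it is enough to prove the result for the monomials $p(x) = x^{m}$. Expanding the power sum $\mathrm{Tr}(X^{m})$ in terms of Schur polynomials of hook shape and applying the theorem from [8] quoted above with $B = I_n$, we see that $\int_{M_n} \mathrm{Tr}\,((H^{*}AH)^{m})\,d\nu(H)$ is equal to
$$\sum_{k=0}^{\min(m,n)-1} (-1)^{k}\, h(\lambda_k)\, s_{\lambda_k}(A)\, s_{\lambda_k}(I_n),$$
where $\lambda_k$ is the partition $(m - k, 1^{k})$.
Since $A$ is a positive definite matrix, by the spectral theorem there exist a unitary $U$ and a diagonal $D = \mathrm{diag}(d_1, \ldots, d_n)$ such that $A = UDU^{*}$. Note that the $d_i$ are the eigenvalues of $A$. By definition of the Schur polynomials,
$$s_{\lambda_k}(A) = s_{\lambda_k}(d_1, \ldots, d_n) = \frac{\det(S_k)}{\det(\Delta(D))},$$
where $\Delta(D)$ is the Vandermonde matrix associated with the sequence $\{d_i\}_{i=1}^{n}$ and $S_k$ is a matrix whose $i$-th column is equal to
$$\big(d_i^{\,m+n-(k+1)},\ d_i^{\,n-1},\ \ldots,\ d_i^{\,n-k},\ d_i^{\,n-(k+2)},\ \ldots,\ d_i,\ 1\big)^{T}.$$
It is easy to see that after $k$ transpositions of the rows of the matrix $S_k$ we obtain a new matrix $H_k$ whose $i$-th column is equal to
$$\big(d_i^{\,n-1},\ \ldots,\ d_i^{\,n-k},\ d_i^{\,m+n-(k+1)},\ d_i^{\,n-(k+2)},\ \ldots,\ d_i,\ 1\big)^{T}.$$
This matrix is equal to the Vandermonde matrix $\Delta(D)$ except for the $(k+1)$-th row, $\{d_i^{\,n-(k+1)}\}_{i=1}^{n}$, which is replaced by $\{d_i^{\,m+n-(k+1)}\}_{i=1}^{n}$. In particular, $\det(S_k) = (-1)^{k} \det(H_k)$, and $\det(H_k) = 0$ for $k \ge m$ because two rows of $H_k$ then coincide. Moreover, by the hook-content formula, $h(\lambda_k)\, s_{\lambda_k}(I_n) = (m+n-(k+1))!/(n-(k+1))!$. Hence
$$\int_{M_n} \mathrm{Tr}\,((H^{*}AH)^{m})\,d\nu(H) = \frac{1}{\det(\Delta(D))} \sum_{k=0}^{n-1} \frac{(m + n - (k + 1))!}{(n - (k + 1))!} \cdot \det(H_k).$$
Using the fact that $\int_{0}^{\infty} e^{-t} t^{p}\,dt = p!$ and the definition of $p_k(x)$ for the case $p(x) = x^{m}$ we see that
$$p_k(x) = \frac{(m + n - (k + 1))!}{(n - (k + 1))!}\, x^{\,m+n-(k+1)},$$
so that $\det(T_k) = \frac{(m + n - (k + 1))!}{(n - (k + 1))!} \det(H_k)$. Therefore, our claim holds and we have proven the result for all polynomials.
Now consider $f \in \mathcal{A}$ and let $\beta$ be the maximum eigenvalue, i.e., $\beta = \max\{d_1, \ldots, d_n\}$. Define $\alpha = 1/\beta$. Since $f \in \mathcal{A}$, we have $f \in \mathcal{A}_\alpha$; let $\{p^{(r)}\}_{r \ge 1}$ be a sequence of polynomials such that $\|f - p^{(r)}\|_{\alpha} \to 0$. Let $T^{(r)}_k$ be the matrix constructed by replacing the $(k+1)$-th row of $\Delta(D)$ by $\{p^{(r)}_k(d_i)\}_{i=1}^{n}$, and let $T_k$ be the matrix constructed by replacing the $(k+1)$-th row of $\Delta(D)$ by $\{f_k(d_i)\}_{i=1}^{n}$. To prove that Equation (7) holds it is enough to prove that $\det(T^{(r)}_k) \to \det(T_k)$ as $r \to \infty$ for every $k = 0, 1, \ldots, n-1$. For this, it is enough to prove that $p^{(r)}_k(d_i) \to f_k(d_i)$ as $r \to \infty$ for every $k$ and every $i = 1, 2, \ldots, n$. Note that, writing $\ell = n - (k+1)$ and $g = f - p^{(r)}$,
$$|f_k(d_i) - p^{(r)}_k(d_i)| \le \frac{d_i^{\,\ell}}{\ell!} \int_{0}^{\infty} e^{-t} t^{\ell}\, |g(d_i t)|\,dt \le \frac{d_i^{\,\ell}}{\ell!} \Big( \int_{0}^{\infty} e^{-t} t^{2\ell}\,dt \Big)^{1/2} \Big( \int_{0}^{\infty} e^{-t} |g(d_i t)|^{2}\,dt \Big)^{1/2} \le \frac{d_i^{\,\ell - 1/2} \sqrt{(2\ell)!}}{\ell!}\, \|f - p^{(r)}\|_{\alpha},$$
where we use Cauchy-Schwarz for the second inequality and, for the last one, the change of variable $s = d_i t$ together with the bound $e^{-s/d_i} \le e^{-\alpha s}$, which holds because $d_i \le \beta$. Now, by construction the sequence $\{p^{(r)}\}$ satisfies
$$\lim_{r \to \infty} \|f - p^{(r)}\|_{\alpha} = 0,$$
finishing the proof.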
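The polynomial case of the proof can be checked with exact arithmetic. The sketch below assumes that, for $f(x) = x^{m}$, the average equals $\frac{1}{\det\Delta(D)} \sum_{k=0}^{n-1} \frac{(m+n-(k+1))!}{(n-(k+1))!} \det(H_k)$, with $H_k$ obtained from $\Delta(D)$ by replacing the row of powers $d_i^{\,n-(k+1)}$ with $d_i^{\,m+n-(k+1)}$. The function names are hypothetical and the eigenvalues are arbitrary test data.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def det(matrix):
    """Determinant via the Leibniz formula (adequate for small matrices)."""
    n = len(matrix)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = Fraction(-1) ** inversions
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

def vandermonde(ds):
    """Matrix with rows d_i^(n-1), d_i^(n-2), ..., d_i^0, playing the role of Delta(D)."""
    n = len(ds)
    return [[Fraction(d) ** (n - 1 - j) for d in ds] for j in range(n)]

def trace_moment(ds, m):
    """Assumed formula for the average of Tr((H* A H)^m); ds are distinct eigenvalues of A."""
    n = len(ds)
    base = vandermonde(ds)
    total = Fraction(0)
    for k in range(n):
        h_k = [row[:] for row in base]
        # Row k+1 (index k) holds d_i^(n-(k+1)); replace it by d_i^(m+n-(k+1)).
        h_k[k] = [Fraction(d) ** (m + n - (k + 1)) for d in ds]
        weight = Fraction(factorial(m + n - (k + 1)), factorial(n - (k + 1)))
        total += weight * det(h_k)
    return total / det(base)

print(trace_moment((1, 3), 1))  # 8
print(trace_moment((1, 3), 2))  # 72
```

For eigenvalues (1, 3) these values agree with the Gaussian moment identities $\E\,\mathrm{Tr}(H^{*}AH) = n\,\mathrm{Tr}(A)$ and $\E\,\mathrm{Tr}((H^{*}AH)^{2}) = n^{2}\,\mathrm{Tr}(A^{2}) + n\,(\mathrm{Tr}A)^{2}$, which follow from Wick's formula for complex Gaussian entries.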
Remark 3.3. We would like to observe that the case when not all the eigenvalues are different can be treated as above by perturbing the original eigenvalues and subsequently taking a limit. We present an instance of this situation in Corollary 3.6.
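As a consequence we have a new formula for the capacity of the MIMO communication channel and for the MMSE described in the introduction.
Corollary 3.4. Let $A$ be as in Theorem 3.2. Then
$$\int_{M_n} \log\det(I_n + H^{*}AH)\,d\nu(H)$$
is equal to
$$\frac{1}{\det(\Delta(D))} \sum_{k=0}^{n-1} \det(T_k),$$
where $T_k$ is the matrix constructed by replacing the $(k+1)$-th row of $\Delta(D)$, namely $\{d_i^{\,n-(k+1)}\}_{i=1}^{n}$, by
$$\Big\{ \frac{d_i^{\,n-(k+1)}}{(n-(k+1))!} \int_{0}^{\infty} e^{-t}\, t^{\,n-(k+1)} \log(1 + d_i t)\,dt \Big\}_{i=1}^{n}.$$
Corollary 3.5. Let $A$ be as in Theorem 3.2. Then
$$\int_{M_n} \mathrm{Tr}\big((I_n + H^{*}AH)^{-1}\big)\,d\nu(H) = \frac{1}{\det(\Delta(D))} \sum_{k=0}^{n-1} \det(T_k),$$
where $T_k$ is the matrix constructed by replacing the $(k+1)$-th row of $\Delta(D)$, namely $\{d_i^{\,n-(k+1)}\}_{i=1}^{n}$, by
$$\Big\{ \frac{d_i^{\,n-(k+1)}}{(n-(k+1))!} \int_{0}^{\infty} \frac{e^{-t}\, t^{\,n-(k+1)}}{1 + d_i t}\,dt \Big\}_{i=1}^{n}.$$
As an application, let us compute explicitly the two-dimensional case for the capacity. [...] Putting all the pieces together we finish the proof.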
Analogously, we can compute explicitly the moments for the two dimensional case.
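For $n = 2$ the determinant formula can be compared against a direct simulation. The sketch below assumes that the replaced row of $\Delta(D)$ has entries $\frac{d^{\,n-(k+1)}}{(n-(k+1))!} \int_0^\infty e^{-t} t^{\,n-(k+1)} f(dt)\,dt$. It evaluates these integrals by composite Simpson quadrature on a truncated interval and estimates the capacity by plain Monte Carlo. The truncation point, step count, sample size, eigenvalues and all function names are illustrative assumptions.

```python
import math
import random

def weighted_integral(f, d, q, upper=60.0, steps=6000):
    """Simpson approximation of the integral of e^{-t} t^q f(d t) over [0, upper]."""
    h = upper / steps  # steps must be even for Simpson's rule

    def g(t):
        return math.exp(-t) * t ** q * f(d * t)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0

def trace_average_2x2(f, d1, d2):
    """Assumed determinant formula for n = 2 with distinct eigenvalues d1, d2."""
    # k = 0 replaces the first row (d1, d2); k = 1 replaces the second row (1, 1).
    g0 = [d * weighted_integral(f, d, 1) for d in (d1, d2)]
    g1 = [weighted_integral(f, d, 0) for d in (d1, d2)]
    det_t0 = g0[0] - g0[1]            # det [[g0(d1), g0(d2)], [1, 1]]
    det_t1 = d1 * g1[1] - d2 * g1[0]  # det [[d1, d2], [g1(d1), g1(d2)]]
    return (det_t0 + det_t1) / (d1 - d2)

def monte_carlo_capacity_2x2(d1, d2, samples=40000, seed=0):
    """Monte Carlo estimate of E log det(I + H* D H) with D = diag(d1, d2)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    ds = (d1, d2)
    total = 0.0
    for _ in range(samples):
        h = [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
             for _ in range(2)]
        # Entries of M = I + H* D H.
        m = [[(1.0 if i == j else 0.0)
              + sum(ds[k] * h[k][i].conjugate() * h[k][j] for k in range(2))
              for j in range(2)] for i in range(2)]
        determinant = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
        total += math.log(determinant)
    return total / samples

formula_value = trace_average_2x2(lambda x: math.log(1.0 + x), 1.0, 3.0)
simulated_value = monte_carlo_capacity_2x2(1.0, 3.0)
print(formula_value, simulated_value)
```

The two printed numbers should agree up to Monte Carlo error. As a sanity check on the assumed row entries, the same routine returns the matrix size 2 for the constant function $f = 1$ and $2(d_1 + d_2)$ for $f(x) = x$.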

Conclusion
Using results from random matrix theory and representation theory, in particular Schur polynomials, we prove a new formula for the average of functionals over the Gaussian ensemble. In particular, this gives another formula for the capacity of the MIMO Gaussian channel and for the MMSE achieved by a linear receiver.