Research Article

Rapidly-Converging Series Representations of a Mutual-Information Integral

This paper evaluates a mutual-information integral that appears in EXIT-chart analysis and in the capacity equation for the binary-input AWGN channel. Rapidly-converging series representations are derived and shown to be computationally efficient. Over a wide range of channel signal-to-noise ratios, the series are more accurate than Gauss-Hermite quadrature and comparable to Monte Carlo integration with a very large number of trials.


1. Introduction
The mutual information between a binary-valued random variable and a consistent, conditionally Gaussian random variable with variance σ² is

J(σ) = 1 − (1/√(2πσ²)) ∫_{−∞}^{∞} exp(−(y − σ²/2)²/(2σ²)) log₂(1 + e^{−y}) dy. (1)

This function and its inverse play a central role in EXIT-chart analysis, which may be used to design and predict the performance of turbo codes [1], low-density parity-check codes [2], and bit-interleaved coded modulation with iterative decoding (BICM-ID) [3]. The change of variable z = (y − σ²/2)/σ yields

J(σ) = 1 − (1/√(2π)) ∫_{−∞}^{∞} exp(−z²/2) log₂(1 + e^{−σz−σ²/2}) dz, (2)

which has the same form as the equation for the capacity of a binary-input AWGN channel [4]. Despite its origins in the early years of information theory, the integral in (2) has not been expressed in closed form or represented as a rapidly-converging infinite series. Consequently, (2) has been evaluated by numerical integration, Monte Carlo simulation, or approximation [2].
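As a point of reference for the numerical-integration approach mentioned above, the sketch below evaluates the standard form of J(σ) in (1) with a simple trapezoidal rule. The function name `J_numeric`, the grid of ±10 standard deviations, the number of grid points, and the use of Python are illustrative assumptions, not the paper's implementation.

```python
import math

def J_numeric(sigma, n=4000):
    """Evaluate J(sigma) of (1) by the trapezoidal rule on a truncated grid.

    The integrand is the Gaussian density with mean sigma^2/2 and variance
    sigma^2 multiplied by log2(1 + e^{-y}); the grid spans +/- 10 standard
    deviations (an assumption), beyond which the contribution is negligible.
    """
    if sigma <= 0:
        return 0.0
    mu, var = sigma ** 2 / 2.0, sigma ** 2
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        pdf = math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        # log(1 + e^{-y}) computed stably: for very negative y it is ~ -y
        g = math.log1p(math.exp(-y)) / math.log(2.0) if y > -700 else -y / math.log(2.0)
        total += (0.5 if i in (0, n) else 1.0) * pdf * g * h
    return 1.0 - total
```

As expected of a mutual information, the result increases monotonically from 0 toward 1 as σ grows.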
As will be shown subsequently, (1) and (2) admit the rapidly-converging series representation (3), whose terms are defined in (4). Numerical evaluation of (3) requires truncation to a finite number of terms. Since an alternating series satisfying the Leibniz convergence criterion [5] appears in (3), simple upper and lower bounds are easily obtained by series truncation, and the maximum error is easily computed from the first omitted term. Only six terms in the summation over m in (3) are needed to compute J(σ) with an error less than 0.01.
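Because the series alternates and satisfies the Leibniz criterion, truncation leaves an error no larger than the first omitted term. The paper's series coefficients are not reproduced in this extract, so the stopping rule is sketched below on a generic alternating series (the classical expansion of ln 2); the helper name `truncate_alternating` is hypothetical.

```python
import math

def truncate_alternating(term, eps):
    """Sum an alternating series term-by-term, stopping once the next
    (omitted) term is smaller than eps in magnitude.  By the Leibniz
    criterion the truncation error is then bounded by eps."""
    total, m = 0.0, 0
    while True:
        t = term(m)
        if abs(t) < eps:
            # |true sum - total| <= |first omitted term| < eps
            return total, m
        total += t
        m += 1

# Illustration on ln 2 = 1 - 1/2 + 1/3 - ...  (not the paper's series):
approx, terms_used = truncate_alternating(lambda m: (-1) ** m / (m + 1), 1e-4)
```

The same stop-when-the-next-term-is-small logic applies to the truncated series for J(σ), where far fewer terms suffice because of the much faster decay.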
Define J_M(σ) to be (3) with the upper limit on the summation over m replaced with M. Let E(M, σ) be the magnitude of the error when the series is truncated after M terms. An upper bound on E(M, σ) is given by (5), which indicates the rapid convergence of (3). Because of the dependence on σ in the bound given by (5), the number of terms required to achieve a very small error may become large for small σ. This motivates the use of an alternative series expression for (1) and (2) that is suitable for small σ. A rapidly-converging series representation for 0 < σ ≤ 0.25 is given by (6), whose terms are defined in (7); note that (6) is valid only over this range of σ. Both series entail the computation of the Q function, which is itself an integral. However, the central importance of the Q function has led to the development of very efficient methods to compute it, for instance, by using the erfc function found in MATLAB or the approximation proposed in [6].
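The Q-function evaluation required by both series follows directly from the complementary error function. A minimal sketch, using Python's `math.erfc` in place of MATLAB's `erfc`:

```python
import math

def qfunc(x):
    """Gaussian Q-function via the complementary error function:
    Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

This identity is exact; the efficiency comes from the highly optimized erfc implementations available in standard libraries.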
In EXIT-chart analysis [1], the inverse of J(σ) is required. The truncated series can be easily inverted by using numerical root-finding algorithms such as the MATLAB function fzero, which uses a combination of bisection and inverse quadratic interpolation.
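A hedged sketch of the inversion: since the truncated series is not reproduced in this extract, J(σ) is stood in for by simple numerical integration of (1), and the inversion uses plain bisection (fzero additionally employs inverse quadratic interpolation). The search bracket [10⁻⁶, 40] is an assumed interval over which J is monotonically increasing from near 0 to near 1.

```python
import math

def J(sigma, n=2000):
    """J(sigma) of (1) via the trapezoidal rule; a stand-in for the
    truncated series, which is not reproduced here."""
    if sigma <= 0:
        return 0.0
    mu, var = sigma ** 2 / 2.0, sigma ** 2
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / n
    s = 0.0
    for i in range(n + 1):
        y = lo + i * h
        pdf = math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        g = math.log1p(math.exp(-y)) / math.log(2.0) if y > -700 else -y / math.log(2.0)
        s += (0.5 if i in (0, n) else 1.0) * pdf * g * h
    return 1.0 - s

def J_inverse(target, lo=1e-6, hi=40.0, tol=1e-6):
    """Invert J by bisection, in the spirit of MATLAB's fzero."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(mid) < target:   # J is monotonically increasing in sigma
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is slower than fzero's hybrid scheme but is guaranteed to converge on a monotone function, which is all the EXIT-chart application requires.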
The remainder of this paper is organized as follows. The series are derived in Section 2. Computational aspects of (3) and (6) are discussed in Section 3. The series are compared against Gauss-Hermite quadrature in Section 4, and the paper concludes in Section 5.

2. Derivation of Series
To derive the desired representations, we first derive a family of series representations of a more general integral. Consider the integral I(a, σ) defined in (8). Since I(0, σ) = 0 and I(a, 0) = ln(1 + a), a > 0 and σ > 0 are assumed henceforth. For any β ≥ 0, define the components I₁(a, σ) and I₂(a, σ). Dividing the integral in (8) into two integrals, changing variables in the second one, and using the fact that G(x) is an even function, we obtain I(a, σ) = I₁(a, σ) + I₂(a, σ). A uniformly convergent Taylor series expansion of the logarithm over the interval [0, 1 + 2β], where 0 ≤ y ≤ 1 + 2β, is then applied. The uniform convergence can be proved by application of the Leibniz criterion for alternating series [5].

3. Evaluation of Series
To numerically evaluate (3) and (6), the summations must be evaluated with a finite number of terms. Let the large-σ series be J_M(σ), that is, (3) with the upper limit on the summation over m replaced with M. Similarly, define J_{M,N}(σ) to be the small-σ series given by (6) with the upper limits of the summations over m and n replaced with M and N, respectively.
The rapid convergence of the large-σ series is illustrated in Figure 1, which shows the value of the truncated series J_M(σ) as a function of M for σ = 0.25. The error bounds determined by (5), which are also shown, are observed to be pessimistic.
Figure 2 shows the number of terms required for the large-σ series to attain various error magnitudes as a function of σ. It indicates that (3) is not efficiently computed for small values of σ and error magnitude, which motivates the use of (6) for small σ.
Evaluation of (6) is complicated by the presence of two infinite summations. For the range of σ of interest, the summation over n is the dominant of the two. This behavior is illustrated in Figure 3, which shows the values of the summations over m and n for σ = 0.25 as a function of the number of terms. When computed to 10 terms, the value of the summation over n is 7.8 × 10⁻³, while the value of the summation over m is only −8.5 × 10⁻⁷. For lower values of σ, the magnitude of the summation over m is even smaller and becomes negligible as σ approaches zero. For this reason, we evaluate the summation over m first and select its upper limit M so that the criterion in (23) is satisfied, where ζ is a small threshold value. If the numerical accuracy requirements are modest, the summation over m can be omitted.
After computing the summation over m to M terms, (6) is evaluated with the number of terms N in the summation over n chosen to satisfy the criterion in (24). After each term is added to the summation over n in (6), the absolute value in (24) is evaluated, and the process halts once the threshold ε is reached. The number of terms M and N needed to achieve convergence criterion (23) with ζ = 10⁻⁴ and (24) with ε = 10⁻² is shown in Figure 4 for σ ≤ 0.5. For higher values of σ, evaluation of (6) becomes unstable because the large upper limit on the summation over n results in large binomial coefficients that cause numerical overflow in typical implementations. Also shown in Figure 4 is the number of terms M required to compute the truncated large-σ series (3) with convergence threshold ε = 10⁻², where M is related to ε by (25). From Figure 4, it might appear that the small-σ series (6) is less complex to evaluate than the large-σ series (3) for all σ ≤ 0.5. However, due to the presence of the double summation in (6), this is not necessarily true. To determine a threshold below which the small-σ series is computationally preferable, one should consider the total number of terms containing an exponential and/or Q function. To evaluate the truncated large-σ series (3), a total of M + 2 terms involving exp and/or Q must be computed. On the other hand, to evaluate the truncated small-σ series (6), a total of M + 2 + N(N + 1)/2 terms involving exp and/or Q must be computed. Figure 5 compares the total number of such terms for each of the two series representations as a function of σ. From this figure, it is seen that for σ ≤ 0.26, fewer terms are required for the small-σ series. For all values of σ, the number of terms is fewer than 28, and for most σ it is significantly smaller.
Figure 6 compares the series representations against the value of (1) found using Monte Carlo integration with one million trials per value of σ. The small-σ series is used for σ ≤ 0.25, and the large-σ series is used for σ > 0.25. As before, ε = 10⁻² for both series, and ζ = 10⁻⁴ for the small-σ series. There is no discernible difference between the series representations and the Monte Carlo integration, and any small differences can be attributed mainly to the finite number of Monte Carlo trials. Given the rapid rate of convergence of (3), it is interesting to examine the value of the large-σ series when only one term is retained in the summation or when the summation is dropped completely. Figure 7 compares the value of the truncated large-σ series for M = {0, 1} and for the M required to satisfy the convergence criterion with ε = 10⁻⁴. Using M = 0 provides an upper bound that is tight only for large values of σ, such as σ > 2, whereas using M = 1 provides a tight lower bound even for relatively small values of σ. Using M = 4 gives two decimal places of accuracy for all σ ≥ 0.1.
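The Monte Carlo baseline amounts to averaging log₂(1 + e^{−Y}) over Gaussian samples Y ~ N(σ²/2, σ²) and subtracting from one. The sketch below uses 200,000 trials and a fixed seed for reproducibility (the paper uses one million trials); the function name and structure are illustrative assumptions.

```python
import math
import random

def J_monte_carlo(sigma, trials=200_000, seed=1):
    """Monte Carlo estimate of J(sigma) in (1): average log2(1 + e^{-Y})
    over samples Y ~ N(sigma^2/2, sigma^2) and subtract from 1."""
    rng = random.Random(seed)
    mu = sigma ** 2 / 2.0
    acc = 0.0
    for _ in range(trials):
        y = rng.gauss(mu, sigma)
        acc += math.log1p(math.exp(-y)) / math.log(2.0) if y > -700 else -y / math.log(2.0)
    return 1.0 - acc / trials
```

The statistical error shrinks only as the inverse square root of the number of trials, which is why the truncated series, with a handful of terms, is attractive by comparison.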

4. Comparison with Gauss-Hermite Quadrature
The integral given by (1) and (2) may also be evaluated using a form of numerical integration known as Gauss-Hermite quadrature [7]. After a change of variables (z = y/√2), the integral may be written in the form of (26). With Gauss-Hermite quadrature, the integral in (26) is approximated by the weighted sum in (28), where f(z) is given by (27), the z_i are the roots of the nth-degree Hermite polynomial H_n(z), and the w_i are the associated weights. The roots {z₁, . . ., z_n} of H_n(z) may be found using Newton's method [7]. We compare five realizations of the J(σ) function: (1) the "infinite" large-σ series representation, computed with very large M (i.e., M = 750); (2) the truncated large-σ series representation (3), truncated to M terms; (3) the truncated small-σ series representation (6), with the first summation truncated to M = 1 terms and the second (double) summation truncated to an upper limit on the outer summation of N; (4) Gauss-Hermite quadrature with n = M terms, that is, the same number of terms as the truncated series representation; (5) Monte Carlo integration with 1 million trials.
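A sketch of the quadrature step, assuming NumPy's `hermgauss` for the roots and weights of H_n instead of the Newton's-method computation described above. The substitution in the docstring is one standard way to bring (1) into the e^{−z²} weight form used by Gauss-Hermite quadrature; the function name is illustrative.

```python
import numpy as np

def J_gauss_hermite(sigma, n=6):
    """Approximate J(sigma) with n-point Gauss-Hermite quadrature.

    Substituting y = sqrt(2)*sigma*z + sigma^2/2 into (1) gives
        J(sigma) = 1 - (1/sqrt(pi)) * Integral e^{-z^2} g(z) dz,
    with g(z) = log2(1 + exp(-(sqrt(2)*sigma*z + sigma^2/2))).
    """
    z, w = np.polynomial.hermite.hermgauss(n)   # roots and weights of H_n
    y = np.sqrt(2.0) * sigma * z + sigma ** 2 / 2.0
    g = np.logaddexp(0.0, -y) / np.log(2.0)     # stable log2(1 + e^{-y})
    return 1.0 - np.dot(w, g) / np.sqrt(np.pi)
```

With n = 6 nodes the approximation is already close to a high-order evaluation for moderate σ, consistent with the comparison reported below.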
For each realization, we define the error to be the magnitude of the difference between the calculated value and the value computed by the "infinite" series.
Figure 8 shows the errors of the two truncated series and the Gauss-Hermite quadrature. The Gauss-Hermite quadrature is evaluated with M = 6 terms, and the large-σ series is truncated to M = 6 terms. In order to have a comparable number of terms in the summations in (6), M = 1 and N = 2 are used for the small-σ series. The error of the Monte Carlo approach is also shown (dotted line).
When σ > 1.6, the truncated large-σ series usually provides a smaller error than Gauss-Hermite quadrature (except at σ = 2.5 and σ = 3.9, where the Gauss-Hermite quadrature briefly has a smaller error). Since the amount of computation required per term is roughly the same for both methods, the large-σ series is preferable to Gauss-Hermite quadrature. For σ < 1.6, the error of the Gauss-Hermite quadrature is smaller than that of the large-σ series. In this region, the small-σ series could be used, since it provides a smaller error than the large-σ series below σ = 0.38.

5. Conclusions
Series representations have been derived for a mutual-information function that is used in EXIT-chart analysis and in the evaluation of the capacity of a binary-input AWGN channel. Truncated versions of the series are computationally competitive with Gauss-Hermite quadrature and do not require finding the roots of Hermite polynomials. The series are useful for computation and provide simple lower and upper bounds.

Figure 1: Truncated large-σ series J_M(σ) for σ = 0.25 and a bound on the error.

Figure 2: Minimum number of terms required to attain various error magnitudes E(M, σ).

Figure 3: Evaluation of the two summations in (6) for σ = 0.25. (a) Value of the summation over m as a function of the number of terms M; (b) value of the summation over n as a function of the number of terms N.

Figure 4: Number of terms M and N required to satisfy the convergence criteria (23) and (24).

Figure 5: Total number of terms involving exp and/or Q in the small-σ series (6) and the large-σ series (3).

Figure 8: Error (relative to the infinite series) of the truncated series with 6 terms and Gauss-Hermite quadrature with 6 terms. Monte Carlo integration with 1 million trials is shown for comparison.