Computation of Channel Capacity Based on Self-Concordant Functions

The computation of channel capacity is a classical problem in information theory. We prove that algorithms based on self-concordant functions can be used to solve such problems, especially when constraints are included. A new algorithm to compute the channel capacity per unit cost is proposed. The same approach applies to the computation of maximum entropy. All the algorithms run in polynomial time.


Introduction
The computation of channel capacity is a classical problem in information theory. Much attention has been paid to it [1][2][3]. Despite some progress [3], the Arimoto-Blahut algorithm [4][5][6][7] is still the most widely used method. This method has been developed into a general algorithm, the alternating minimization method [8], and has found applications in various fields [9][10][11]. Recently, the computation of channel capacity with constraints, such as the computation of the capacity-cost function, has become increasingly important [5, 12-15]. The Arimoto-Blahut algorithm is very limited in dealing with such problems. For lack of an appropriate means to eliminate the Lagrange multipliers, the method in [5], which is based on the Arimoto-Blahut algorithm, cannot obtain the optimal point.
The computation of channel capacity has always been an optimization problem, yet no general-purpose optimization algorithm, such as the gradient method or Newton's method, has become the mainstream method. One reason is that so far no such algorithm has been as effective as the Arimoto-Blahut algorithm. However, with advances in optimization theory and the increasing importance of constrained channel capacity problems, the situation is changing.
In this paper, some new algorithms based on self-concordant functions are proposed. Besides being efficient (they run in polynomial time), the algorithms can easily compute constrained channel capacity. In Section 2, we introduce the definition and main properties of self-concordant functions, as well as some algorithms for convex optimization problems. In Section 3, we give some new algorithms for channel capacity with or without constraints and for channel capacity per unit cost. Because they are based on self-concordant functions, these algorithms run in polynomial time. Some conclusions are given in Section 4.

Self-Concordant Function and Convex Programming
For convenience, in this section, we make a brief introduction to self-concordant functions, including the definition, main propositions, and algorithms for convex optimization.
2.1. Self-Concordant Function. Self-concordant function theory, published by Nesterov and Nemirovskii in 1994 [16,17], is of landmark importance for establishing general polynomial-time interior point algorithms for convex programming.
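The theory reveals the main property of the interior point
Definition 1 (see [19]). Let $D \subset \mathbb{R}^n$ be an open convex set and let $f : D \to \mathbb{R}$ be a three times continuously differentiable convex function. Then $f$ is called self-concordant on $D$ if, for every direction $h \in \mathbb{R}^n$, the restriction $\varphi(s) = f(x + s h)$ satisfies
$$|\varphi'''(0)| \le 2\,\varphi''(0)^{3/2}$$
for all $x \in D$.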
Since linear and convex quadratic functions have zero third derivatives, they are trivially self-concordant. The most common nontrivial self-concordant function is the logarithmic function −log x.
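As a quick numerical illustration, the standard one-dimensional self-concordance inequality $|f'''(x)| \le 2 f''(x)^{3/2}$ can be checked for $f(x) = -\log x$. The sketch below is not from the original text; the function name and the sample grid are illustrative assumptions.

```python
import numpy as np

def self_concordance_ratio(x):
    """Return |f'''(x)| / (2 * f''(x)**1.5) for f(x) = -log(x), x > 0."""
    second = 1.0 / x**2      # f''(x)
    third = -2.0 / x**3      # f'''(x)
    return np.abs(third) / (2.0 * second**1.5)

xs = np.logspace(-3, 3, 25)              # sample points in (0, inf)
ratios = self_concordance_ratio(xs)      # self-concordance needs a ratio of at most 1
print(ratios.min(), ratios.max())
```

For −log x the two sides of the inequality coincide analytically, so the ratio equals 1 at every sample point up to rounding.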
We can get more self-concordant functions due to the following simple and useful combination rules [19].

Newton's Method for Self-Concordant Functions. Consider the convex optimization problem
$$\min_{x \in D} f(x),$$
where $D \subset \mathbb{R}^n$ is a bounded, closed, convex subset with a nonempty interior. Let $f(x)$ be a convex function.

Repeat
(1) Compute the Newton step and Newton decrement
$$\Delta x_{nt} = -\nabla^2 f(x)^{-1} \nabla f(x), \qquad \lambda(f, x) = \left(\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\right)^{1/2}.$$
else
If $f(x)$ is a self-concordant function on $D$, then the damped Newton algorithm above possesses the following good properties:
(iii) if at some iteration $i$ we have $\lambda(f, x_i) \le 0.25$, then we are in the region of quadratic convergence of the method, that is, for every
(iv) the number of Newton steps required to find a point
for some constant $C$.
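The damped Newton iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact listing: the switch between the damped step length $1/(1+\lambda)$ and the full step at $\lambda \le 0.25$, the stopping tolerance, and the test function $f(x) = c^T x - \sum_i \log x_i$ (whose minimizer is $x_i = 1/c_i$) are all choices made here for illustration.

```python
import numpy as np

def damped_newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Damped Newton method for a self-concordant function (illustrative sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)            # Newton step
        lam = np.sqrt(max(-g @ dx, 0.0))       # Newton decrement, lam^2 = g^T H^{-1} g
        if lam <= tol:
            break
        # Assumed rule: damped step far from the optimum, full step once lam <= 0.25.
        step = 1.0 / (1.0 + lam) if lam > 0.25 else 1.0
        x = x + step * dx
    return x

# Hypothetical test problem: f(x) = c^T x - sum(log x_i) on x > 0.
c = np.array([1.0, 2.0, 4.0])
grad = lambda x: c - 1.0 / x
hess = lambda x: np.diag(1.0 / x**2)
x_star = damped_newton(grad, hess, np.ones(3))
print(x_star)
```

For a self-concordant function the damped step has local norm $\lambda/(1+\lambda) < 1$, so the iterates stay inside the domain without an explicit feasibility check.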
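For the convex optimization problem with equality constraints $Ax = b$, Newton's method and the bound (8) are still valid as long as the initial point $x_0 \in \operatorname{int}(D)$ satisfies $Ax_0 = b$, and the Newton step $\Delta x_{nt}$ and decrement $\lambda(f, x)$ are computed as follows:
$$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x_{nt} \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}, \qquad \lambda(f, x) = \left(\Delta x_{nt}^T \nabla^2 f(x)\, \Delta x_{nt}\right)^{1/2}. \tag{9}$$
General convex optimization problems are of the form
$$\min f_0(x) \quad \text{subject to} \quad f_i(x) \le 0,\ i = 1, \ldots, m, \quad Ax = b, \tag{10}$$
where $f_0, \ldots, f_m : \mathbb{R}^n \to \mathbb{R}$ are convex and twice continuously differentiable, and $A$ is a $p \times n$ matrix with $\operatorname{rank}(A) = p < n$. By means of the logarithmic barrier function, (10) can be formulated approximately as an equality constrained problem as follows:
$$\min\ t f_0(x) + \phi(x) \quad \text{subject to} \quad Ax = b, \tag{11}$$
where
$$\phi(x) = -\sum_{i=1}^{m} \log(-f_i(x)) \tag{12}$$
is called the logarithmic barrier function for (10). Let $x^*(t)$ be the optimal point of (11) and let $p^*$ be the optimal value of (10); then $f_0(x^*(t)) - p^* \le m/t$. Therefore, we can simply take $t = m/\varepsilon$ and solve the equality constrained problem (11) using Newton's method to get an $\varepsilon$-solution of (10), and if the objective function of (11) is self-concordant, then the number of Newton iterations is bounded by (8). However, this method usually requires a large $t$, which may bring about a large $f(x^{(0)}) - p^*$ and hence a large number of Newton iterations; thus it is rarely used. A commonly used method is the path-following method, as follows.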
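Given a strictly feasible $x$, $t := t^{(0)} > 0$, $\mu > 1$, and a tolerance $\varepsilon > 0$, repeat:
(1) Centering step. Compute $x^*(t)$ by minimizing $t f_0(x) + \phi(x)$, subject to $Ax = b$, starting at $x$.
(2) Update. $x := x^*(t)$.
(3) Stopping criterion. Quit if $m/t < \varepsilon$.
(4) Increase $t$. $t := \mu t$.
If for any $t > 0$, $f(x) = t f_0(x) + \phi(x)$ is self-concordant, then the total number of Newton steps in the path-following method, not counting the initial centering step, is
$$N = \left\lceil \frac{\log\left(m/(t^{(0)} \varepsilon)\right)}{\log \mu} \right\rceil \left( \frac{m(\mu - 1 - \log \mu)}{\gamma} + c \right), \tag{13}$$
where $\gamma$ and $c$ are constants. The term in brackets is the number of updates of $t$ from $t^{(0)}$ to $m/\varepsilon$, and the term in parentheses is the number of Newton iterations per centering step.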
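The path-following scheme can be illustrated on a small problem. The sketch below is an assumption-laden example, not the authors' implementation: it solves the hypothetical linear program min $c^T x$ subject to $x \ge 0$, $\sum_i x_i = 1$, using the barrier $-\sum_i \log x_i$. The function names, the update factor `mu = 10`, and the tolerances are choices made here. Because the Hessian is diagonal, the equality-constrained Newton system is solved by eliminating the multiplier.

```python
import numpy as np

def centering(t, c, x, tol=1e-8, max_iter=100):
    """Minimize t*c@x - sum(log(x)) subject to sum(x) == 1, starting at x."""
    for _ in range(max_iter):
        g = t * c - 1.0 / x                    # gradient of t*f0 + barrier
        h_inv = x**2                           # inverse of the diagonal Hessian
        nu = -(h_inv @ g) / h_inv.sum()        # multiplier enforcing sum(dx) == 0
        dx = -h_inv * (g + nu)                 # equality-constrained Newton step
        lam = np.sqrt(np.sum((dx / x) ** 2))   # Newton decrement
        if lam <= tol:
            break
        x = x + (1.0 / (1.0 + lam) if lam > 0.25 else 1.0) * dx
    return x

def path_following(c, eps=1e-4, t0=1.0, mu=10.0):
    n = c.size                                 # number of inequality constraints x_i >= 0
    x = np.full(n, 1.0 / n)                    # strictly feasible starting point
    t = t0
    while True:
        x = centering(t, c, x)                 # centering step and update
        if n / t < eps:                        # stop once the gap bound m/t is below eps
            return x
        t *= mu                                # increase t

c = np.array([1.0, 2.0, 3.0])
x_eps = path_following(c)
print(x_eps, c @ x_eps)
```

The returned point is within `eps` of the optimal value min(c) = 1, in line with the bound $f_0(x^*(t)) - p^* \le m/t$.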

Algorithms Based on Self-Concordant Functions for Channel Capacity
In this section, we prove that the channel capacity, with or without constraints, can be computed by the path-following method in polynomial time. We also prove that the channel capacity per unit cost is a single-peak function of the expected cost.

Channel Capacity without Constraints.
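Let $X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$ be the input and output alphabets, respectively. Let $p(x) = (p(x_1), \ldots, p(x_m))$ and $q(y) = (q(y_1), \ldots, q(y_n))$ be the distributions of $X$ and $Y$, respectively, with $p_i = p(x_i)$ and $q_j = q(y_j)$. Let $T = (p(y_j \mid x_i))_{m \times n} = (p_{ij})_{m \times n}$ be the transition matrix from $X$ to $Y$. Hence,
$$q_j = \sum_{i=1}^{m} p_i\, p_{ij}, \quad j = 1, \ldots, n. \tag{14}$$
If $I(X; Y)$ is the mutual information between $X$ and $Y$, then
$$I(X; Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i\, p_{ij} \log \frac{p_{ij}}{q_j} \tag{15}$$
$$\phantom{I(X; Y)} = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i\, p_{ij} \log \frac{p(x_i \mid y_j)}{p_i}. \tag{16}$$
Channel capacity is defined as follows:
$$C = \max_{p(x)} I(X; Y). \tag{17}$$
Equation (16) is a function of $p(x)$ and $p(x \mid y)$ and we denote it by $I(p(x); p(x \mid y))$.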
The Arimoto-Blahut algorithm utilizes (16) as follows. In (16), both $p(x)$ and $p(x \mid y)$ are unknown, but given $p(x)$, (16) attains its maximum at
$$p(x_i \mid y_j) = \frac{p_i\, p_{ij}}{\sum_{k=1}^{m} p_k\, p_{kj}}, \tag{18}$$
and given $p(x \mid y)$, (16) attains its maximum at
$$p_i = \frac{\exp\left(\sum_{j=1}^{n} p_{ij} \log p(x_i \mid y_j)\right)}{\sum_{k=1}^{m} \exp\left(\sum_{j=1}^{n} p_{kj} \log p(x_k \mid y_j)\right)}. \tag{19}$$
If $p_0(x) = (1/m, \ldots, 1/m)$, then the channel capacity can be approximated by computing $p_0(x \mid y), p_1(x), p_1(x \mid y), \ldots$ through (18) and (19), alternately. Let $C(N + 1, N)$ be the mutual information $I(p_{N+1}(x); p_N(x \mid y))$, then,
In order to get an algorithm based on the self-concordant function, we utilize (15). Let
By (15), the channel capacity can be expressed as follows:
It is a standard convex optimization problem. Unfortunately, the objective function of (22) is not self-concordant, even when the logarithmic barrier $-\log p_i$ is added.
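The Arimoto-Blahut alternation described above (the posterior $p(x \mid y)$ from the current input distribution, then a new input distribution proportional to $\exp(\sum_j p_{ij} \log p(x_i \mid y_j))$) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed iteration count, natural logarithms (capacity in nats), and the assumption that every output symbol has positive probability are choices made here.

```python
import numpy as np

def arimoto_blahut(T, n_iter=2000):
    """T[i, j] = p(y_j | x_i); returns (capacity in nats, input distribution)."""
    T = np.asarray(T, dtype=float)
    m = T.shape[0]
    p = np.full(m, 1.0 / m)                        # uniform starting distribution
    for _ in range(n_iter):
        q = p @ T                                  # output distribution q_j
        post = p[:, None] * T / q[None, :]         # posterior p(x_i | y_j)
        # Take logs only where T > 0, so zero transitions contribute nothing.
        log_post = np.log(post, where=T > 0, out=np.zeros_like(T))
        w = np.exp(np.sum(T * log_post, axis=1))   # unnormalized new p_i
        p = w / w.sum()
    q = p @ T
    log_ratio = np.log(T / q[None, :], where=T > 0, out=np.zeros_like(T))
    return float(np.sum(p[:, None] * T * log_ratio)), p

# Hypothetical example: binary symmetric channel with crossover probability 0.1.
capacity_nats, p_opt = arimoto_blahut([[0.9, 0.1], [0.1, 0.9]])
print(capacity_nats / np.log(2), p_opt)            # capacity in bits, optimal input
```

For the binary symmetric channel the uniform input is already optimal, so the iteration is stationary; a non-symmetric channel such as the Z-channel exercises the updates.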
It seems that we should add $q_j > 0$ as constraints for (22), since $\log q_j$ appears in the objective function of (22). But the $q_j$ are not independent variables: they can be computed by (14) and remain positive as long as $p_i > 0$ holds for all $i = 1, 2, \ldots, m$. Nevertheless, we still add the logarithmic barrier $-\log q_j$ to the objective of (22) to get a self-concordant objective function.
For t > 0, consider the convex optimization problem as follows:
In order to show the self-concordance of the objective function of (23), we need Proposition 6.

Proposition 6. For any
Proof. We have
Thus
By Propositions 3-6, the objective function of (23) is self-concordant, so we can solve it by the path-following method, and the number of Newton iterations is bounded by (13).
Equation (13) shows that solving (23) by the path-following method is a polynomial-time algorithm. In addition, a main advantage is that the algorithm can deal with constraints very easily.
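To make the approach concrete, the following sketch applies a path-following method to the unconstrained capacity problem. It is an illustration under assumptions rather than the authors' algorithm: the barrier objective is taken here to be $t\left(\sum_j q_j \log q_j - \sum_i c_i p_i\right) - \sum_i \log p_i - \sum_j \log q_j$ with $c_i = \sum_j p_{ij} \log p_{ij}$ (a hypothetical symbol), subject to $\sum_i p_i = 1$; the update factor, tolerances, and function name are likewise choices made for this example.

```python
import numpy as np

def capacity_barrier(T, eps=1e-6, mu=10.0):
    """Channel capacity in nats by a log-barrier path-following sketch."""
    T = np.asarray(T, dtype=float)
    m, n = T.shape
    c = np.sum(T * np.log(T, where=T > 0, out=np.zeros_like(T)), axis=1)
    p = np.full(m, 1.0 / m)                        # strictly feasible start
    ones = np.ones((m, 1))
    t = 1.0
    while True:
        for _ in range(100):                       # centering by damped Newton
            q = p @ T
            g = t * (T @ (np.log(q) + 1.0) - c) - 1.0 / p - T @ (1.0 / q)
            H = (T * (t / q + 1.0 / q**2)) @ T.T + np.diag(1.0 / p**2)
            K = np.block([[H, ones], [ones.T, np.zeros((1, 1))]])
            dp = np.linalg.solve(K, np.concatenate([-g, [0.0]]))[:m]
            lam = np.sqrt(max(dp @ H @ dp, 0.0))   # Newton decrement
            if lam <= 1e-8:
                break
            p = p + (1.0 / (1.0 + lam) if lam > 0.25 else 1.0) * dp
        if (m + n) / t < eps:                      # m + n barrier terms in total
            break
        t *= mu
    q = p @ T
    return float(p @ c - q @ np.log(q)), p

# Hypothetical example: Z-channel in which input 1 is flipped with probability 0.5.
cap_z, p_z = capacity_barrier([[1.0, 0.0], [0.5, 0.5]])
print(cap_z / np.log(2), p_z)                      # capacity in bits, optimal input
```

A linear cost constraint could be handled in the same loop by adding one more barrier term to the gradient and Hessian, which is the flexibility the text refers to.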

Channel Capacity with Constraints and Channel Capacity per Unit Cost.
Consider the channel capacity with constraints as follows:
For r = 2, [5] gave some algorithms based on the Arimoto-Blahut method. Because one cannot eliminate the Lagrange multipliers, the algorithms in [5] cannot get the optimal solution for (26).
By adding logarithmic barriers for the inequality constraints to the objective of (26) as follows, we can get a polynomial-time algorithm for (26):
The objective function of (27) is self-concordant, so we can solve (27) by the path-following method.
Channel capacity per unit cost is one of the typical problems of channel capacity with constraints [15, 20-22]. By [15], channel capacity per unit cost can be computed by
$$\sup_{\beta > 0} \frac{C(\beta)}{\beta}, \tag{28}$$
where $C(\beta)$ is the capacity-cost function, which is the solution of the following constrained problem [12]:
$$C(\beta) = \max_{p(x):\ E[b(X)] \le \beta} I(X; Y). \tag{29}$$
Equation (29) is a special case of (26) (r = 1). Here we discuss the channel capacity per unit cost in a more general manner.
(2) let
then $\rho(\beta)$ is either a unimodal function or a monotone function.
(2) Without loss of generality, we can assume $C(\beta)$ is differentiable. If not, we can approximate it by a differentiable function to any precision [23]. Suppose there is a $\beta_0 \in (a, b)$, such that
Let
It is the equation of a straight line that has slope $C'(\beta_1)$ and passes through $(\beta_0, C(\beta_0))$. Since $C(\beta)$ is increasing and concave, we get $y_1(\beta_1) \le C(\beta_1)$ and
It is the equation of a straight line that has slope $C'(\beta_0)$ and passes through $(\beta_0, C(\beta_0))$. Since $C(\beta)$ is increasing and concave, we get $y_0(\beta_2) \ge C(\beta_2)$ and
then for any $N > 0$, there is an $x_0$, such that
Since $N$ is arbitrary, we get
In turn, if
then for any $N > 0$, there is a $\beta_0$, $a < \beta_0 < b$, such that
and vice versa. Suppose (46) is valid; then for any $\varepsilon > 0$, there is an $x_0 \in D$, such that
Since $\varepsilon$ is arbitrary, we get
On the other hand, since (47) is valid too, for any $\varepsilon > 0$, there is a $\beta_0$, $a < \beta_0 < b$, such that
Since $\varepsilon$ is arbitrary, we get
If $F(x) = I(X; Y)$ is the average mutual information defined by (15) and $G(x) = E[b(X)]$, then (3) of Theorem 7 is Theorem 2 in [15].
In addition to average mutual information, entropy is another frequently used function in applications [24].It is obvious that Theorem 7 is valid when F(x) is an entropy function (as a function of the distribution of input symbols).It is useful to point out that Theorem 7 is still valid when G(x) is a positive definite quadratic function [25].
Making use of (2) of Theorem 7, we can write an algorithm to compute sup(F(x)/G(x)) as follows. The algorithm is based on the 0.618 method used to locate the optimal point of a single-peak function.
Even if a = 0, that is, when there are zero-cost input symbols, the algorithm is still valid.
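The 0.618 (golden-section) search for the peak of a single-peak function can be sketched as below. This is a generic illustration, not a reproduction of Algorithm 8: the function name, the tolerance, and the toy capacity-cost curve $C(\beta) = \sqrt{\beta - 1}$ (concave and increasing on $\beta \ge 1$) are assumptions made here. In the setting of the text, each evaluation of $\rho(\beta) = C(\beta)/\beta$ would itself require solving a constrained capacity problem by the path-following method.

```python
import math

def golden_section_max(rho, a, b, tol=1e-6):
    """Locate the maximizer of a single-peak function rho on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # golden ratio conjugate, about 0.618
    x1 = b - r * (b - a)
    x2 = a + r * (b - a)
    f1, f2 = rho(x1), rho(x2)
    while b - a > tol:
        if f1 < f2:                           # peak lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = rho(x2)
        else:                                 # peak lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = rho(x1)
    return 0.5 * (a + b)

# Hypothetical toy curve: rho(beta) = sqrt(beta - 1) / beta peaks at beta = 2.
rho = lambda beta: math.sqrt(beta - 1.0) / beta
beta_star = golden_section_max(rho, 1.0, 5.0)
print(beta_star, rho(beta_star))
```

Each pass reuses one of the two interior evaluations, so only one new evaluation of $\rho$ is needed per iteration, which matters when every evaluation is an optimization problem in its own right.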

Maximum Entropy.
A similar but simpler problem is the computation of maximum entropy. The typical form is as follows:
where $\pi_i$ are known, and $E_i$ can be computed from sample data, $i = 1, \ldots, r$. The GIS algorithm [26][27][28] is one of the most common algorithms for (54), but it is only suited to linear constraints. Although there is a good discussion of the computation in [27], we cannot ensure that GIS is a polynomial-time algorithm.
In [25] a maximum entropy problem with nonlinear constraints is considered. It is as follows:
where $(\sigma_{ij})_{m \times m}$ is a positive definite matrix. We can solve (54) and (55) in polynomial time. In fact, for (54), we add the logarithmic barriers $-\log p_i$ to its objective function; for (55), we add the logarithmic barriers $-\log p_i$ and
to its objective function, and the problems can be transformed into (3). It is good to know that the function in (56) is self-concordant. Therefore, the path-following method is valid.
With the Arimoto-Blahut algorithm, we get the same results after 180 iterations.
Let $b = (1, 2, 5, 7, 6, 8, 3, 9, 3, 4)^T$ be the cost vector of the input symbols. Let $G(p) = b^T p$ be the expected cost of the input symbols. The curve of $C(\beta)/\beta$ is shown in Figure 1. By Algorithm 8, the channel capacity per unit cost is 0.5801, and it is attained at β = 1.8325. The search interval is [1, 5.8]. The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton's algorithm is 5.
Let $G(x) = b^T x + x^T D x$ be a positive definite quadratic function, where
The curve of $C(\beta)/\beta$ is shown in Figure 2. The maximum is 0.2156, and it is attained at β = 5.9093. The number of iterations of the path-following method is 17. The number of iterations of the 0.618 method is 19. The number of iterations of Newton's algorithm is 4.

Conclusion
By means of self-concordant function theory, the computation of channel capacity becomes very simple, especially when there are constraints and when the constraints are nonlinear. When the numerator is a general concave function and the denominator is a general convex function, the formula for channel capacity per unit cost is still valid. Furthermore, the function $C(\beta)/\beta$ is single-peak; hence we can get some new algorithms for channel capacity per unit cost.

Figure 1: Curve of C(β)/β, where G(x) is the expected cost of the input symbol.