Shannon Entropy: Axiomatic Characterization and Application

We have presented a new axiomatic derivation of Shannon entropy for a discrete probability distribution on the basis of the postulates of additivity and concavity of the entropy function. We have then modified Shannon entropy to take account of observational uncertainty. The modified entropy reduces, in the limiting case, to the form of the Shannon differential entropy. As an application we have derived the expression for the classical entropy of statistical mechanics from the quantized form of the entropy.


Introduction
Shannon entropy is the key concept of information theory [1]. It has found wide applications in different fields of science and technology [2][3][4][5]. It is a characteristic of a probability distribution, providing a measure of the uncertainty associated with that distribution. There are different approaches to the derivation of Shannon entropy based on different postulates or axioms [6,7].
The object of the present paper is to stress the importance of the properties of additivity and concavity in the determination of the functional form of Shannon entropy and its generalization. The main content of the paper is divided into three sections. In section 2 we provide an axiomatic derivation of Shannon entropy on the basis of the properties of additivity and concavity of the entropy function. In section 3 we generalize Shannon entropy and introduce the notion of total entropy to take account of observational uncertainty. The entropy of a continuous distribution, called the differential entropy, is obtained as a limiting value. In section 4 the differential entropy, along with the quantum uncertainty relation, is used to derive the expression of the classical entropy of statistical mechanics.

Shannon Entropy: Axiomatic Characterization
Let Δ_n = {P = (p_1, p_2, ..., p_n) : p_i ≥ 0, \sum_{i=1}^{n} p_i = 1} be the set of all finite discrete probability distributions. In other words, P may be considered as a random experiment having n possible outcomes with probabilities (p_1, p_2, ..., p_n). There is uncertainty associated with the probability distribution P, and there are different measures of uncertainty depending on different postulates or conditions. In general, the uncertainty associated with the random experiment P is a mapping [8]

H : \Delta_n \to R    (2.1)

where R is the set of real numbers. It can be shown that (2.1) is a reasonable measure of uncertainty if and only if it is Schur-concave on Δ_n [8]. A general class of uncertainty measures is given by

H(P) = \sum_{i=1}^{n} p_i \phi(p_i)    (2.2)

which includes, for the choice φ(p) = -k log p, the Shannon entropy

H(P) = -k \sum_{i=1}^{n} p_i \log p_i    (2.3)

where 0 log 0 = 0 by convention and k is a constant depending on the unit of measurement of entropy. There are different axiomatic characterizations of Shannon entropy based on different sets of axioms [6,7]. In the following we shall present a different approach depending on the concavity character of the entropy function.
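As a numerical companion to (2.3), the minimal sketch below (in Python; the helper name shannon_entropy is my own) computes the Shannon entropy with the 0 log 0 = 0 convention and illustrates the Schur-concavity property that the uniform distribution maximizes the entropy on Δ_n.

```python
import math

def shannon_entropy(p, k=1.0):
    """Shannon entropy H(P) = -k * sum_i p_i log p_i, with 0 log 0 = 0 (eq. (2.3))."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0.0)

# Schur-concavity in action: the uniform distribution maximizes H on Delta_n.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # log 4 = 1.3862...
print(shannon_entropy([0.70, 0.10, 0.10, 0.10]))   # 0.9404... < log 4
```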
We set the following axioms to be satisfied by the entropy function H(P) = H(p_1, p_2, ..., p_n).

Axiom (1): The entropy is non-negative: H(P) ≥ 0.

Axiom (2): We assume the generalized form (2.2) of the entropy function:

H(P) = \sum_{j=1}^{n} p_j \phi(p_j)    (2.4)

Axiom (3): We assume that the function φ is a continuous concave function of its arguments.

Axiom (4): For two statistically independent experiments P and Q the entropy is additive:

H(P * Q) = H(P) + H(Q)    (2.5)
THEOREM (2.1): If the entropy function H(P) satisfies the above axioms (1) to (4), then H(P) is given by

H(P) = -k \sum_{j=1}^{n} p_j \log p_j    (2.6)

where k is a positive constant depending on the unit of measurement of entropy.
PROOF: For two statistically independent experiments P = (p_1, p_2, ..., p_n) and Q = (q_1, q_2, ..., q_m) the joint probability distribution is

p_{jα} = p_j q_α,  (j = 1, ..., n; α = 1, ..., m)    (2.7)

Then, according to the axiom of additivity of entropy (2.5), we have

\sum_{j,α} p_j q_α \phi(p_j q_α) = \sum_{j} p_j \phi(p_j) + \sum_{α} q_α \phi(q_α)    (2.8)

Let us now make small changes of the probabilities p_j and p_k of the probability distribution P = (p_1, p_2, ..., p_j, ..., p_k, ..., p_n), leaving the others undisturbed and keeping the normalization condition fixed: δp_j = -δp_k = ε. By the axiom of continuity of φ, the relation (2.8) can then be reduced to the form

\sum_{α} q_α [f(p_j q_α) - f(p_k q_α)] = f(p_j) - f(p_k),  where  f(p) = \frac{d}{dp}[p \phi(p)]    (2.9)

The r.h.s. of (2.9) is independent of q_α, and the relation (2.9) is satisfied independently of the p's if

f(p_j q_α) = f(p_j) + ψ(q_α)    (2.10)

for some function ψ. The above leads to Cauchy's functional equation

f(pq) = f(p) + f(q) - f(1)    (2.11)

The solution of the functional equation (2.11) is f(p) = A log p + B, and integration gives

\phi(p) = A \log p + (B - A) + C/p    (2.12)

where A, B and C are all constants. The condition of concavity (axiom (3)) requires A < 0; let us take A = -k (k > 0), the negative sign taking account of axiom (1). The generalized entropy (2.4) then reduces to

H(P) = -k \sum_{j=1}^{n} p_j \log p_j + (B - A) + nC

or

H(P) = -k \sum_{j=1}^{n} p_j \log p_j

where the constants (B - A) and C have been omitted without changing the character of the entropy function. This proves the theorem.
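The additivity axiom (2.5) that drives the proof is easy to check numerically for the derived form (2.6). The short sketch below (Python, reusing the shannon_entropy helper defined above) verifies H(P * Q) = H(P) + H(Q) for the product distribution (2.7).

```python
# Numerical check of the additivity axiom (2.5) for the derived entropy (2.6),
# reusing shannon_entropy from the previous sketch.
P = [0.5, 0.3, 0.2]
Q = [0.6, 0.4]
joint = [pj * qa for pj in P for qa in Q]   # p_{j,alpha} = p_j * q_alpha (eq. (2.7))

lhs = shannon_entropy(joint)
rhs = shannon_entropy(P) + shannon_entropy(Q)
print(abs(lhs - rhs) < 1e-12)   # True: H(P * Q) = H(P) + H(Q)
```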

Total Shannon Entropy and Entropy of Continuous Distribution
The definition (2.3) of entropy can be generalized straightforwardly to define the entropy of a discrete random variable.
DEFINITION: Let X ∈ R denote a discrete random variable which takes on the values x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n respectively; the entropy H(X) of X is then defined by the expression [3]

H(X) = -k \sum_{i=1}^{n} p_i \log p_i    (3.1)

Let us now generalize the above definition to take account of an additional uncertainty due to the observer himself, irrespective of the definition of the random experiment. Let X denote a discrete random variable which takes the values x_1, x_2, ..., x_n with probabilities p_1, p_2, ..., p_n. We decompose the practical observation of X into two stages. First, we assume that X ∈ L(x_i), an interval containing x_i, with probability p_i. The Shannon entropy of this experiment is H(X). Second, given that X is known to be in the ith interval, we determine its exact position in L(x_i), and we assume that the entropy of this experiment is U(x_i). Then the global entropy associated with the random variable X is given by

H_T(X) = H(X) + \sum_{i=1}^{n} p_i U(x_i)    (3.2)

Let h_i denote the length of the ith interval L(x_i) (i = 1, 2, ..., n), and define

U(x_i) = k \log h_i    (3.3)

We have then

H_T(X) = -k \sum_{i=1}^{n} p_i \log p_i + k \sum_{i=1}^{n} p_i \log h_i = -k \sum_{i=1}^{n} p_i \log (p_i / h_i)    (3.4)

The expression H_T(X) given by (3.4) will be referred to as the total entropy of the random variable X. The above derivation is physical. In fact, what we have used is merely a randomization of the individual events X = x_i (i = 1, 2, ..., n) to account for the additional uncertainty due to the observer himself, irrespective of the definition of the random experiment [3]. We shall now derive the expression (3.4) axiomatically, as a generalization of theorem (2.1).

THEOREM (3.1): Let the generalized entropy (2.2) satisfy, in addition to the axioms (1) to (4), the boundary condition

\phi_j(1) = k \log h_j,  (j = 1, 2, ..., n)    (3.5)

expressing that the residual entropy of the certain outcome confined to the jth interval of length h_j is k log h_j. Then H(P) is given by the total entropy (3.4).
PROOF: The procedure is the same as that of theorem (2.1) up to the relation (2.12), which now reads

\frac{d}{dp_j}[p_j \phi_j(p_j)] = A \log p_j + B_j    (3.6)

Integrating (3.6) with respect to p_j and using the boundary condition (3.5), together with the convention that the contribution of an outcome of zero probability vanishes (which fixes the constant of integration), we have

\phi_j(p_j) = -k \log (p_j / h_j)    (3.7)

so that the generalized entropy (2.2) reduces to the form

H_T(X) = -k \sum_{j=1}^{n} p_j \log (p_j / h_j)    (3.8)

where we have taken A = -k < 0 for the same unit of measurement of entropy, the negative sign taking account of axiom (1). The remaining constants have been neglected without any loss of characteristic properties. The expression (3.8) is the required expression of the total entropy obtained earlier.
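A small numerical sketch of (3.8) (Python; the function name total_entropy is my own) computes the total entropy for a distribution observed through intervals of lengths h_i:

```python
import math

def total_entropy(p, h, k=1.0):
    """Total entropy H_T(X) = -k * sum_i p_i log(p_i / h_i) (eq. (3.8))."""
    return -k * sum(pi * math.log(pi / hi) for pi, hi in zip(p, h) if pi > 0.0)

# With unit interval lengths the total entropy reduces to the Shannon entropy;
# finer intervals shift it by k*log h and, as with differential entropy, it can
# become negative.
print(total_entropy([0.5, 0.5], [1.0, 1.0]))   # log 2 = 0.6931...
print(total_entropy([0.5, 0.5], [0.1, 0.1]))   # log 2 + log 0.1 = -1.6094...
```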
Let us now see how to obtain the entropy of a continuous probability distribution as a limiting value of the total entropy H_T(X) defined above. For this, let us first define the differential entropy H_C(X) of a continuous random variable X.
DEFINITION: The differential entropy H_C(X) of a continuous random variable X with probability density f(x) is defined by [9]

H_C(X) = -k \int_{R} f(x) \log f(x) \, dx    (3.9)

where R is the support set of the random variable X. We divide the range of X into bins of length (or width) h. Let us assume that the density f(x) is continuous within the bins. Then, by the mean value theorem, there exists a value x_i within each bin such that

f(x_i) h = \int_{ih}^{(i+1)h} f(x) \, dx    (3.10)

We define the quantized or discrete probability distribution (p_1, p_2, ..., p_n) by

p_i = f(x_i) h    (3.11)

so that we have then

\sum_i p_i = \sum_i f(x_i) h = \int_{R} f(x) \, dx = 1    (3.12)

The total entropy H_T(X) defined for h_i = h (i = 1, 2, ..., n) then reduces to the form

H_T(X) = -k \sum_i p_i \log (p_i / h) = -k \sum_i f(x_i) h \log f(x_i)    (3.13)

Let h → 0; then by the definition of the Riemann integral we have

\lim_{h \to 0} \sum_i f(x_i) h \log f(x_i) = \int_{R} f(x) \log f(x) \, dx    (3.14)

so that H_T(X) → H_C(X) as h → 0, that is,

\lim_{h \to 0} H_T(X) = H_C(X)    (3.15)

Thus we have the following theorem: the total entropy H_T(X) defined by (3.13) approaches the differential entropy H_C(X) in the limiting case when the length of each bin tends to zero.
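The limit (3.15) can be seen numerically. In the sketch below (Python; the helper name total_entropy_bins and the choice of the standard exponential density are mine), the exact differential entropy is known in closed form, H_C(X) = 1 for f(x) = e^{-x} on x ≥ 0, and the binned total entropy approaches it as h shrinks.

```python
import math

def total_entropy_bins(f, h, n_bins, k=1.0):
    """H_T of (3.13) for equal bins of width h, with p_i = f(x_i)*h as in (3.11);
    the bin midpoint stands in for the mean-value point x_i."""
    H = 0.0
    for i in range(n_bins):
        x_i = (i + 0.5) * h
        p_i = f(x_i) * h
        if p_i > 0.0:
            H -= k * p_i * math.log(p_i / h)
    return H

density = lambda x: math.exp(-x)   # standard exponential; H_C(X) = 1 exactly
for h in (1.0, 0.1, 0.01):
    print(h, total_entropy_bins(density, h, int(50 / h)))   # -> 1.0 as h -> 0
```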

Application: Differential Entropy and Entropy in Classical Statistics
The above analysis leads to an important relation connecting the quantized entropy and the differential entropy. From (3.13) and (3.15) we see that

-k \sum_i p_i \log p_i \;\to\; -k \int_{R} f(x) \log f(x) \, dx - k \log h    (4.1)

showing that when h → 0, that is, when the length h of the bins is very small, the quantized entropy given by the l.h.s. of (4.1) approaches not the differential entropy H_C(X) defined in (3.9) but the form given by the r.h.s. of (4.1), which we call the modified differential entropy. This relation has an important physical significance in statistical mechanics. As an application of this relation we now find the expression of the classical entropy as a limiting case of the quantized entropy.
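A quick numerical check of (4.1), with the same exponential-density setup as in the previous sketch: the plain quantized entropy diverges like -log h, while adding log h back recovers the differential entropy (here equal to 1).

```python
import math

# Check of (4.1): the quantized entropy -sum_i p_i log p_i diverges like -log h,
# while (quantized entropy) + k*log h tends to the differential entropy H_C = 1
# (standard exponential density; k = 1).
for h in (0.1, 0.01, 0.001):
    p = [math.exp(-(i + 0.5) * h) * h for i in range(int(50 / h))]
    H_q = -sum(pi * math.log(pi) for pi in p)
    print(h, round(H_q, 4), round(H_q + math.log(h), 4))   # second column -> 1
```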
Let us consider an isolated system with configuration-space volume V and a fixed number of particles N, which is constrained to the energy shell R = (E, E + ΔE). We consider the energy shell rather than just the energy surface because the Heisenberg uncertainty principle tells us that we can never determine the energy E exactly; we can, however, make ΔE as small as we like. Let f(X_N) be the probability density of microstates defined on the phase space Γ = {X_N = (q_1, q_2, ..., q_{3N}; p_1, p_2, ..., p_{3N})}. The normalization condition is

\int_{R} f(X_N) \, d\Gamma = 1    (4.2)

Following (4.1), we define the entropy of the system as

S = -k \int_{R} f(X_N) \log f(X_N) \, d\Gamma - k \log h^{3N}    (4.3)

or, equivalently,

S = -k \int_{R} f(X_N) \log [h^{3N} f(X_N)] \, d\Gamma    (4.4)

where the bin length h of (4.1) has been replaced by the volume h^{3N} of the elementary phase-space cell, h now being Planck's constant as dictated by the quantum uncertainty relation. For the microcanonical distribution the density is constant on the energy shell,

f(X_N) = 1/\Omega(E),  with  \Omega(E) = \int_{R} d\Gamma    (4.5)

and the entropy is expected to take the form of the Boltzmann entropy

S = k \log W    (4.6)

where W = Ω(E)/h^{3N} is the number of elementary cells in the energy shell. Substituting (4.5) into (4.4), we obtain

S = k \log [\Omega(E)/h^{3N}]    (4.7)

which is precisely the Boltzmann entropy (4.6). The classical entropy that follows as a limiting case of the von Neumann entropy is given by [12]

S = -k \int_{R} f(X_N) \log [N! \, h^{3N} f(X_N)] \, d\Gamma    (4.8)

This is, however, different from the one given by (4.7), and it does not lead to the form of the Boltzmann entropy (4.6).
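Taking the expressions (4.4) and (4.8) as reconstructed above at face value, the two entropies differ by the constant -k log N!, since f is normalized; by Stirling's formula this difference is of order kN log N, the factor commonly associated with particle indistinguishability and the Gibbs paradox. A minimal numerical illustration (Python, k = 1):

```python
import math

# The von Neumann-limit entropy (4.8) differs from (4.4) by -k*log(N!):
# -k*Int f log(N! h^3N f) = -k*Int f log(h^3N f) - k*log(N!), since Int f = 1.
for N in (10, 100, 1000):
    log_factorial = math.lgamma(N + 1)      # log(N!)
    stirling = N * math.log(N) - N          # leading Stirling approximation
    print(N, round(log_factorial, 2), round(stirling, 2))
```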

Conclusion
The literature on the axiomatic derivation of Shannon entropy is vast [6,7]. The present approach is, however, different: it is based mainly on the postulates of additivity and concavity of the entropy function. There are, in fact, variant forms of additivity and of the non-decreasing character of entropy in thermodynamics. The concept of additivity is dormant in many axiomatic derivations of Shannon entropy. It plays a vital role in the foundation of Shannon information theory [13].
Non-additive entropies like Rényi's entropy and Tsallis entropy need a different formulation and lead to different physical phenomena [14,15]. In the present paper we have also provided a new axiomatic derivation of the Shannon total entropy, which in the limiting case reduces to the expression of the modified differential entropy (4.1). The modified differential entropy, together with the quantum uncertainty relation, provides a mathematically sound approach to the derivation of the expression of the classical entropy.