ENTROPY CONSIDERATIONS IN THE NONCOMMUTATIVE SETTING

An analogue of the classical link between the relative entropy and the Fisher information measure is presented in the context of free probability theory. Several generalizations of the relative entropy in terms of density matrices are also discussed.


1. Introduction.
It is known that the equilibrium states of Hamiltonian models of interacting particle systems correspond to states with extremal entropic properties. The transition to equilibrium occurs through metastable states, related to probabilistic limit results such as the law of large numbers, the central limit theorem, and large deviations. The principle of maximum entropy is well known and widely used in the construction of mathematical models. According to it, under uncertainty about the specific type of distribution to be used as a model of a physical phenomenon, maximum entropy distributions (and/or their mixtures) should be tested first. Conversely, if a distribution fits a real phenomenon well, its relationship with maximum entropy distributions should be investigated.
Recall that for a discrete random variable X taking values x_1,...,x_n with probabilities p_1,...,p_n, Shannon's entropy is H(X) = −∑_{k=1}^{n} p_k log p_k. For an absolutely continuous random variable X with Lebesgue density p(x), Shannon's entropy is given by H(X) = −E log p(X) = −∫ p(x) log p(x) dx.
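As a quick numerical illustration of the two formulas (the function names and the quadrature grid are our own choices), the discrete entropy of the uniform law on four points equals log 4, and the differential entropy of the standard normal density approaches (1/2) log(2πe):

```python
import numpy as np

def shannon_entropy_discrete(p):
    """H(X) = -sum p_k log p_k, with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Differential entropy by quadrature on a uniform grid (tails are negligible).
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
g = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
h_normal = -np.sum(g * np.log(g)) * dx       # should approach (1/2) log(2*pi*e)

h_uniform4 = shannon_entropy_discrete([0.25, 0.25, 0.25, 0.25])  # = log 4
```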
The probability distributions corresponding to the extremal entropy cases are the building blocks of infinitely divisible distributions, as proved by Goldman [4]. Namely, the normal and exponential/Poisson distributions maximize Shannon's entropy when certain moments are prescribed.
Exponential distributions have maximum entropy among all distributions concentrated on the positive half-line and having finite expectations (if Y_1 is a random variable with exponential distribution of mean λ > 0 (EY_1 = λ), then for every random variable X such that P(X ≥ 0) = 1 and EX = λ, one has H(X) ≤ H(Y_1)).
Normal distributions have maximum entropy among all distributions with finite variances (if Y_2 is a normal random variable with mean 0 and variance σ², then for every random variable X with EX = 0 and EX² = σ², one has H(X) ≤ H(Y_2)).
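The two extremal statements can be checked against closed-form entropies: H = 1 + log λ for the exponential law of mean λ, H = log(b − a) for the uniform law on (a, b), and H = (1/2) log(2πeσ²) for the normal law. A small sketch, with the comparison distributions chosen by us:

```python
import math

def h_exponential(mean):      # density (1/mean) e^{-x/mean} on (0, inf)
    return 1.0 + math.log(mean)

def h_uniform(a, b):          # density 1/(b-a) on (a, b)
    return math.log(b - a)

def h_normal(sigma2):         # normal with variance sigma2
    return 0.5 * math.log(2 * math.pi * math.e * sigma2)

mean = 2.0
# The uniform law on (0, 2*mean) also has expectation `mean`, but lower entropy:
exp_beats_uniform = h_exponential(mean) > h_uniform(0.0, 2 * mean)

sigma2 = 2.0
# The uniform law on (-sqrt(3*sigma2), sqrt(3*sigma2)) has variance sigma2:
half = math.sqrt(3 * sigma2)
normal_beats_uniform = h_normal(sigma2) > h_uniform(-half, half)
```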
This paper is organized as follows. In Section 2, we present the classical link between the relative entropy and the Fisher information measure developed by Barron [1]. The concept of a free random variable is introduced in Section 3, by analogy with the classical notion of independence. In Section 4, we present the free relative entropy and the free Fisher information and provide a step-by-step derivation of a relationship between these two measures, in analogy to the classical case presented in Section 2. We conclude with a discussion and comparison of various candidates for the quantum analogue of the Fisher information, which is not uniquely defined in the recent literature.

2. The relative entropy and Fisher information measure. In considering central limit theorem type results, it is more convenient to introduce the relative entropy (Kullback-Leibler information "distance"), as remarked by Barron in [1].
Definition 2.1. If X is a random variable with finite variance and density p(x) (with respect to the Lebesgue measure), then the relative entropy of X with respect to the normal density g, with the same mean and variance as p, is
S(X) = ∫ p(x) log (p(x)/g(x)) dx. (2.1)
Obviously, the relative entropy is not symmetric with respect to p and g, so it is not a proper metric (distance). Nonetheless, it measures the deviation of p from g in terms of entropy. If EX = a and var X = σ², then, since log g(x) is a quadratic polynomial in x whose expectation under p depends only on a and σ²,
S(X) = −H(X) − ∫ p(x) log g(x) dx = H(Y) − H(X), (2.2)
where Y is a normal random variable with mean a and variance σ². Hence the relative entropy is the difference between Shannon's entropy of the normal law and that of X. The concavity of the logarithm implies that S is nonnegative and equals zero only if p(x) = g(x) almost everywhere (i.e., the normal law has maximum Shannon's entropy for a given variance).
Definition 2.2. For a random variable X, with continuously differentiable density f(x) and EX = a, var X = σ² < ∞, the standardized Fisher information measure is defined as
Φ(X) = σ² ∫ f(x) [f′(x)/f(x) − g′(x)/g(x)]² dx, (2.3)
where g is the normal density with the same mean and variance as f.
The normal density has minimum Fisher information measure for a given variance (since Φ ≥ 0, with equality only if f = g).
Proposition 2.3 (see [1]). The relative entropy S(X) of any random variable X with finite variance is the integral of Fisher's standardized information, with respect to an independent normal random variable Y having the same mean and variance as X, namely
S(X) = ∫_0^1 Φ(√t X + √(1 − t) Y) dt/(2t). (2.4)
In what follows we derive an analogue of the above relationship in the noncommutative setting of free probability.
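A numerical sketch of Definition 2.1 and of the identity S(X) = H(Y) − H(X) (the Laplace test density and the grid are our own choices; only the formulas from the text are used):

```python
import numpy as np

x = np.linspace(-50.0, 50.0, 200001)
dx = x[1] - x[0]

b = 1.0                                   # Laplace scale; variance = 2 b^2
sigma2 = 2 * b**2
log_p = -np.abs(x) / b - np.log(2 * b)    # log density of X (Laplace, mean 0)
log_g = -x**2 / (2 * sigma2) - 0.5 * np.log(2 * np.pi * sigma2)  # matching normal
p, g = np.exp(log_p), np.exp(log_g)

# Relative entropy S(X) = ∫ p log(p/g) dx
S = np.sum(p * (log_p - log_g)) * dx

# Shannon entropies; S should equal H(Y) - H(X)
H_p = -np.sum(p * log_p) * dx
H_g = -np.sum(g * log_g) * dx
```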

3. Freeness. We investigate results describing the entropy and the asymptotic behavior of sequences of observables (physical quantities), treated as elements of a von Neumann algebra Ꮽ, acting on a separable complex Hilbert space and having a faithful normal state ϕ. The states are represented by the linear functionals ϕ on Ꮽ satisfying the conditions of positivity (ϕ(A*A) ≥ 0, A ∈ Ꮽ) and normalization (ϕ(I) = 1); ϕ(A) is then interpreted as the expected value of the observable A for the state ϕ. The states of a given system form a convex set, with the pure states being its extremal elements, while the mixed states are represented by positive trace class operators (density matrices). The dynamical properties are reflected in the spectrum of the density evolution operators (density matrices). For instance, dynamical systems with continuous spectrum evolve in a mixing manner, as first noted by von Neumann and Koopman [6].
The classical notion of independence allows the calculation of mixed moments in terms of single moments. Consider two random variables X and Y, living on some probability space (Ω, U, P), assumed to be bounded (so that all their moments exist and determine their respective distributions). The expectation ϕ = E then factorizes over independent variables: ϕ(X^n Y^m) = ϕ(X^n)ϕ(Y^m), for all n, m ∈ N. In particular, ϕ(XY) = ϕ(X)ϕ(Y) and ϕ(XXYY) = ϕ(XYXY) = ϕ(XX)ϕ(YY).
Definition 3.1. A noncommutative probability space (Ꮽ, ϕ) consists of a unital algebra Ꮽ and a unital linear functional ϕ : Ꮽ → C, ϕ(I) = 1 (the analogue of the classical expectation map). A random variable X is an element of the given algebra Ꮽ. The random variables X_1, X_2, ... ∈ Ꮽ are free (with respect to ϕ) if, for all m ∈ N and for all polynomials p_1, p_2, ..., p_m in one variable,
ϕ(p_1(X_{r(1)}) p_2(X_{r(2)}) ⋯ p_m(X_{r(m)})) = 0
whenever ϕ(p_k(X_{r(k)})) = 0, for all k = 1,...,m, and r(k) ≠ r(k+1), for all k = 1,...,m−1 (consecutive indices are distinct).
This definition provides a rule for calculating mixed moments in the free variables in terms of moments of the single variables.
Consider, for instance, free random variables X and Y and apply the freeness relation to the centered variables X − ϕ(X)I and Y − ϕ(Y)I. One obtains
ϕ(XXYY) = ϕ(X²)ϕ(Y²),
whereas
ϕ(XYXY) = ϕ(X²)ϕ(Y)² + ϕ(X)²ϕ(Y²) − ϕ(X)²ϕ(Y)².
Hence, in general, ϕ(XXYY) ≠ ϕ(XYXY), illustrating the fact that freeness is a genuinely noncommutative analogue of independence.
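Freeness can be observed numerically through Voiculescu's asymptotic freeness of large independent random matrices; in the sketch below (matrix size and seed are arbitrary choices of ours), the normalized trace plays the role of ϕ, and the two mixed moments ϕ(XXYY) and ϕ(XYXY) of centered semicircular variables separate as the theory predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

def goe(n):
    """Random real symmetric matrix, scaled so that tr_n(X^2) ≈ 1 (semicircle)."""
    a = rng.normal(size=(n, n)) / np.sqrt(n)
    return (a + a.T) / np.sqrt(2)

def tr(m):
    """Normalized trace: the random-matrix model of the state ϕ."""
    return np.trace(m) / len(m)

X, Y = goe(N), goe(N)   # independent, hence asymptotically free

m_XXYY = tr(X @ X @ Y @ Y)   # should approach ϕ(X²)ϕ(Y²) = 1
m_XYXY = tr(X @ Y @ X @ Y)   # should approach 0 for centered free variables
```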
Since for free random variables X and Y the moments of X + Y can be expressed in terms of the moments of X and the moments of Y, the distribution µ_{X+Y} depends only on the distributions µ_X and µ_Y, through a special type of convolution introduced by Voiculescu in [12], called the free additive convolution and denoted µ_X ⊞ µ_Y. The analogue of the logarithm of the Fourier transform, used to linearize the convolution in the classical case, is the R-transform in the free setting. The R-transform R_µ(z), associated with each probability measure µ, is an analytic function on the upper half-plane, viewed as a formal power series in the indeterminate z. The desired feature of the R-transform is to linearize free additive convolution:
R_{µ ⊞ ν}(z) = R_µ(z) + R_ν(z).
The R-transform is uniquely determined by the Cauchy transform G_µ of µ, defined by
G_µ(z) = ∫_R dµ(t)/(z − t), Im z > 0.
Its Laurent series expansion, involving the moments m_n = ∫ t^n dµ(t), is
G_µ(z) = ∑_{n≥0} m_n z^{−(n+1)}.
This series can be formally inverted with respect to composition. The coefficients C_n (called free cumulants) of
R_µ(z) = ∑_{n≥0} C_{n+1} z^n
are polynomial functions of the moments. The link between the R-transform and the Cauchy transform is that G_µ(z) and R_µ(z) + 1/z are inverses of each other with respect to composition:
G_µ(R_µ(z) + 1/z) = z.
For example, the semicircular distribution of variance σ² (the free analogue of the normal law) is absolutely continuous with respect to the Lebesgue measure, with density
(1/(2πσ²)) √(4σ² − t²) on [−2σ, 2σ],
and its R-transform is R(z) = σ² z (σ² is the free analogue of the variance parameter of a classical normal law).

4. Free entropy. The free analogues of Shannon's entropy and Fisher information measure were first considered by Voiculescu in [13, 14], and further investigated in [8].
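The transform machinery above can be spot-checked on the semicircular law. For the standard semicircle (σ = 1) the even moments are the Catalan numbers, G(z) = (z − √(z² − 4))/2 and R(z) = z, so G(R(z) + 1/z) = z; a small numerical verification (grid size and test point are our own choices):

```python
import numpy as np
from math import comb, pi

# Standard semicircle law (mean 0, variance 1): density on [-2, 2]
x = np.linspace(-2.0, 2.0, 200001)
dx = x[1] - x[0]
rho = np.sqrt(4.0 - x**2) / (2 * pi)

# Even moments of the semicircle are the Catalan numbers C_k = binom(2k,k)/(k+1)
moments = [float(np.sum(x**(2 * k) * rho) * dx) for k in range(5)]
catalan = [comb(2 * k, k) // (k + 1) for k in range(5)]

def G(z):
    """Cauchy transform of the standard semicircle: (z - sqrt(z^2 - 4))/2."""
    return (z - np.sqrt(z * z - 4 + 0j)) / 2

def R(z):
    """R-transform of the standard semicircle: R(z) = z."""
    return z

z = 0.3 + 0.1j
check = G(R(z) + 1.0 / z)   # the inversion relation should reproduce z
```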
For a von Neumann algebra Ꮽ and a faithful normal trace state ϕ : Ꮽ → C, consider the tracial W*-probability space (Ꮽ, ϕ). The distribution of any selfadjoint X ∈ Ꮽ is defined as the unique probability measure µ, compactly supported on R, having the same moments: ϕ(X^n) = ∫_R t^n dµ(t), n ∈ N. The Hilbert space L²(Ꮽ, ϕ) is obtained by completing Ꮽ with respect to the norm ‖X‖₂ = ϕ(X*X)^{1/2}, X ∈ Ꮽ. For a positive integer N, denote by ᏹ_N(Ꮽ) the W*-algebra of the N × N matrices over Ꮽ, endowed with the state ϕ_N((X_ij)) = (1/N) ∑_{i=1}^{N} ϕ(X_ii); the previous norm then extends by continuity to the Hilbert space L²(ᏹ_N(Ꮽ), ϕ_N).
For an n-tuple of selfadjoint elements X_j ∈ Ꮽ_sa, the free entropy, denoted χ(X_1,..., X_n), involves sets of points in (ᏹ_N(Ꮽ))^n, that is, finite-dimensional matrices, the so-called matrix microstates. The free entropy is then defined as the normalized limit of the logarithms of the volumes of all such possible microstates. For one random variable X ∈ Ꮽ_sa with distribution µ, the free entropy was defined in [13] as
χ(X) = ∫∫ log|s − t| dµ(s) dµ(t) + 3/4 + (1/2) log 2π.
Proposition 4.1. Let S be a (0, 1) semicircular element and Ω = {X ∈ Ꮽ_sa : ϕ(X²) ≤ 1}. Then χ(X) ≤ χ(S), for all X ∈ Ω.
Proof. For µ with density (1/(2π))√(4 − t²) on [−2, 2] one has 3/4 + ∫∫ log|s − t| dµ(s) dµ(t) = 1/2, so χ(S) = 1/2 + (1/2) log 2π; the inequality follows from the classical fact that, under a second-moment constraint, the semicircular law maximizes the logarithmic energy ∫∫ log|s − t| dµ(s) dµ(t).
The extension of the above result to matrices is the following.
Proposition 4.2. Let C = (c_ij)_{1≤i,j≤N} be a real positive definite invertible matrix, B = (b_ij) a matrix with BB* = C, and Ω(C) the set of N-tuples (X_1,...,X_N) of selfadjoint elements with ϕ(X_i X_j) = c_ij. Then χ(X_1,...,X_N) ≤ χ(∑_j b_1j S_j,..., ∑_j b_Nj S_j), for all (X_1,...,X_N) ∈ Ω(C), where (S_j) is a free family of (0, 1) semicircular elements. Equality is attained if and only if (X_1,...,X_N) and (∑_j b_1j S_j,..., ∑_j b_Nj S_j) have the same distribution.

Proof. Apply the previous result to the tuple obtained by transforming (X_1,...,X_N) through B^{-1}, whose covariance matrix is the identity.
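The one-variable free entropy formula and the extremal value χ(S) = 1/2 + (1/2) log 2π (from the proof of Proposition 4.1) can be probed numerically: the eigenvalues of a large GOE matrix approximate the semicircle law, and their normalized logarithmic energy should approach ∫∫ log|s − t| dµ(s)dµ(t) = −1/4 (matrix size and seed below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# Eigenvalues of a large GOE matrix approximate the standard semicircle law.
a = rng.normal(size=(N, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh((a + a.T) / np.sqrt(2))

# Normalized logarithmic energy: (1/N^2) sum_{i != j} log|lam_i - lam_j|
diff = np.abs(lam[:, None] - lam[None, :])
np.fill_diagonal(diff, 1.0)               # exclude the diagonal s = t (log 1 = 0)
log_energy = np.sum(np.log(diff)) / N**2  # should approach -1/4

# Free entropy chi(S) = log-energy + 3/4 + (1/2) log(2*pi)
chi_semicircle = log_energy + 0.75 + 0.5 * np.log(2 * np.pi)
```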
Definition 4.3. The family of vectors {ξ_j}_{j∈J} ⊂ L²(Ꮽ, ϕ) is said to fulfill the conjugate relations for the selfadjoint family {A_j}_{j∈J} ⊂ Ꮽ, with respect to the W*-subalgebra Ꮾ ⊆ Ꮽ, if for all n ≥ 0, B_0,...,B_n ∈ Ꮾ and j, j_1,...,j_n ∈ J,
ϕ(ξ_j B_0 A_{j_1} B_1 ⋯ A_{j_n} B_n) = ∑_{k=1}^{n} δ_{j,j_k} ϕ(B_0 A_{j_1} ⋯ A_{j_{k−1}} B_{k−1}) ϕ(B_k A_{j_{k+1}} ⋯ A_{j_n} B_n)
(in particular, ϕ(ξ_j B_0) = 0 for n = 0). As usual, the conjugate relations allow the introduction of an inner product in L²(Ꮽ, ϕ).
Definition 4.4. Let (Ꮽ, ϕ) be a faithful trace W*-probability space. If {A_j}_{j∈J} ⊂ Ꮽ is a selfadjoint family and Ꮾ ⊆ Ꮽ is a unital W*-subalgebra such that {A_j}_{j∈J} has a conjugate system {ξ_j}_{j∈J} relative to Ꮾ, then the free Fisher information of {A_j}_{j∈J} with respect to Ꮾ is
Φ*({A_j}_{j∈J} : Ꮾ) = ∑_{j∈J} ‖ξ_j‖₂².
If {A_j}_{j∈J} does not have a conjugate system relative to Ꮾ, then Φ*({A_j}_{j∈J} : Ꮾ) = +∞ [8].
An analogue of the classical Cramér-Rao inequality concerning the free Fisher information was formulated in [15]: Φ*(X_1,...,X_n) ≥ n²/ϕ(X_1² + ⋯ + X_n²). Thus, if the sum of the variances ϕ(X_1² + ⋯ + X_n²) is prescribed, then the free Fisher information Φ*(X_1,...,X_n) is minimized when the X_j's are free semicircular elements of equal radii.
The link between the free entropy and the free Fisher information (similar to relation (2.4) for the classical case) is presented in the stages below.
(1) For the one-dimensional case, let c ∈ Ꮽ be a circular variable (i.e., the real and imaginary parts of c are free and have semicircular distributions of equal radii). Assume further that the variance is normalized, ϕ(c*c) = 1, and that {c, c*} is free from {a, a*}; this gives relation (4.5).
(2) The same identity, applied to a matrix of such circular variables free from A, gives relation (4.6).
(3) The multidimensional version (4.7) of (4.5) involves semicircular elements s_j ∈ Ꮽ and circular elements c_i ∈ Ꮽ, normalized by their variances.
(4)-(5) Specializing to selfadjoint matrices (i.e., when n = N and m = N(N − 1)/2) and using relations (4.7) and (4.6) yields the matrix version (4.9), in which the selfadjoint family is perturbed by a family (s_ij)_{i,j=1}^{N} ⊂ Ꮽ with the properties:
• the elements s_ii are semicircular of variance 1, for all 1 ≤ i ≤ N;
• s*_ij = s_ji, for all 1 ≤ i, j ≤ N, and the elements s_ij are circular of variance 1, for all 1 ≤ i < j ≤ N;
• the family is free.
(6) Using the scaling formulas Φ*(κX) = κ^{-2} Φ*(X) and χ(κX) = χ(X) + log κ, for κ > 0 and X a selfadjoint random variable, (4.9) leads to
χ(X) = (1/2) ∫_0^∞ ( 1/(1 + t) − Φ*(X + √t S) ) dt + (1/2) log 2πe,
where S is a (0, 1) semicircular element free from X, which is the free setting analogue of relations (2.1) and (2.4). Recently, Nica et al. [8] provided free entropy minimization criteria not only when the expectation or the variance is prescribed, but also when the moments of all orders, that is, the whole distribution of an element A*A, are specified.
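The scaling formulas used in step (6) can be verified directly on the one-variable objects: dilating µ by κ shifts the logarithmic energy by log κ, and for a semicircular element S of variance σ² the conjugate variable is S/σ². A short check, in the notation above:

```latex
\chi(\kappa X) = \iint \log\lvert \kappa s - \kappa t\rvert \,d\mu(s)\,d\mu(t)
                 + \tfrac{3}{4} + \tfrac{1}{2}\log 2\pi
               = \log\kappa + \chi(X),
\qquad
\Phi^{*}(\kappa S) = \Bigl\lVert \tfrac{\kappa S}{(\kappa\sigma)^{2}} \Bigr\rVert_{2}^{2}
                   = \frac{1}{\kappa^{2}\sigma^{2}}
                   = \kappa^{-2}\,\Phi^{*}(S).
```

Note that the semicircular family saturates the free Cramér-Rao bound: Φ*(S) ϕ(S²) = 1.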

5. Entropy considerations in the quantum setting.
Since the quantum analogue of the classical Fisher information is not uniquely defined in the recent literature, it is useful to study and compare various candidates. The relative entropy has several generalizations in the context of density matrices describing the mixed quantum state space of a system. For two density matrices P and Q with positive eigenvalues, the relative entropy would be S(P, Q) = Tr P log(P/Q). To give meaning to the quotient P/Q, one can consider Umegaki's relative entropy [11],
S_U(P, Q) = Tr P (log P − log Q),
or the Belavkin and Staszewski relative entropy [2],
S_BS(P, Q) = Tr P log(P^{1/2} Q^{-1} P^{1/2})
(note that S_BS and S_U coincide in the case of commuting density matrices).
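Both entropies are easy to realize numerically via spectral calculus; the sketch below (helper names and the 2 × 2 test densities are our own choices) checks that S_U = S_BS for commuting densities, while they differ in general, with S_BS dominating S_U:

```python
import numpy as np

def _logm(A):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.log(w)) @ v.conj().T

def _sqrtm(A):
    """Matrix square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(w)) @ v.conj().T

def umegaki(P, Q):
    """S_U(P, Q) = Tr P (log P - log Q)."""
    return float(np.trace(P @ (_logm(P) - _logm(Q))).real)

def belavkin_staszewski(P, Q):
    """S_BS(P, Q) = Tr P log(P^{1/2} Q^{-1} P^{1/2})."""
    Ph = _sqrtm(P)
    return float(np.trace(P @ _logm(Ph @ np.linalg.inv(Q) @ Ph)).real)

P = np.diag([0.7, 0.3])
Q_comm = np.diag([0.5, 0.5])            # commutes with P, so S_U = S_BS

theta = 0.4                             # rotate Q so that [P, Q] != 0
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q_rot = U @ np.diag([0.8, 0.2]) @ U.T

s_u_comm, s_bs_comm = umegaki(P, Q_comm), belavkin_staszewski(P, Q_comm)
s_u, s_bs = umegaki(P, Q_rot), belavkin_staszewski(P, Q_rot)
```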
If Ꮽ represents a finite quantum system, then the family of states ϕ corresponds to the density matrices P_ϕ in the Gelfand-Naimark-Segal (GNS) space Ᏼ of ϕ, consisting of essentially localized modifications of ϕ, and
ϕ(A) = Tr P_ϕ A, for all A ∈ Ꮽ. (5.3)
When ϕ is faithful, the representation is unique and the normalization ϕ(I) = 1 implies Tr P_ϕ = 1.
The relative entropy of the state ϕ with respect to ψ is naturally defined by
S(ϕ, ψ) = Tr P_ϕ (log P_ϕ − log P_ψ), if supp P_ϕ ≤ supp P_ψ, and S(ϕ, ψ) = +∞ otherwise,
where supp P_ϕ designates the smallest projection V such that ϕ(V) = ϕ(I). In particular, S(ϕ, ψ) is always finite if the density of ψ has strictly positive eigenvalues (i.e., ψ is faithful). The functional (ϕ, ψ) ↦ S(ϕ, ψ) is lower semicontinuous with respect to the weak* topology on the state space of the C*-algebra Ꮽ. If Ꮽ is finite-dimensional, the functional (ϕ, ψ) ↦ S(ϕ, ψ) is measurable [9].

Proposition 5.1. The relative entropy functional S(ϕ, ψ) has the following properties:
(a) For any positive functionals ϕ, ψ on the finite quantum system Ꮽ,
S(ϕ, ψ) ≥ ϕ(I) − ψ(I);
in particular, S(ϕ, ψ) ≥ 0 for states, with equality if and only if ϕ = ψ.
(b) For every unital Schwarz mapping α : Ꮽ_1 → Ꮽ_2,
S(ϕ ∘ α, ψ ∘ α) ≤ S(ϕ, ψ). (5.6)
(Note: α is a Schwarz mapping if it is linear and satisfies the inequality α(A*A) ≥ α(A)*α(A), for all A ∈ Ꮽ_1.) In particular, if α is an automorphism of Ꮽ, we obtain the invariance property S(ϕ ∘ α, ψ ∘ α) = S(ϕ, ψ).
(c) If ϕ = λϕ_1 + (1 − λ)ϕ_2 and ψ = λψ_1 + (1 − λ)ψ_2, with 0 ≤ λ ≤ 1, then the following joint convexity holds:
S(ϕ, ψ) ≤ λ S(ϕ_1, ψ_1) + (1 − λ) S(ϕ_2, ψ_2). (5.7)
Proof. (a) Let P_ϕ = ∑_i λ_i p_i and Q_ψ = ∑_j κ_j q_j be the spectral decompositions of the density matrices P_ϕ and Q_ψ. The Taylor expansion of the function η(x) = −x log x yields the inequality
η(x) ≤ η(y) + (x − y)η′(y), for x, y > 0
(η is concave, so its graph lies below each tangent line). Applying it to x = λ_i, y = κ_j and rearranging gives
[λ_i log λ_i − λ_i log κ_j − λ_i + κ_j] Tr p_i q_j ≥ 0. (5.9)
Summing over i and j gives the desired inequality, since ∑_i p_i = ∑_j q_j = I.
(b) The argument uses the relative modular operators. One introduces a contraction β between the GNS Hilbert spaces (with respect to the Hilbert-Schmidt norm) and positive operators γ and δ built from the densities, satisfying β*δβ ≤ γ. By operator monotonicity of the fractional powers, β*δ^t β ≤ γ^t for 0 ≤ t ≤ 1. (5.15) The inequality (5.6) is obtained by differentiating (5.15) at t = 0 and using the following integral representation of the relative entropy (for the density matrices P_ϕ and Q_ψ):
S(ϕ, ψ) = ∫_0^∞ Tr P_ϕ [(Q_ψ + tI)^{-1} − (P_ϕ + tI)^{-1}] dt, (5.16)
which follows from the operator identity log A = ∫_0^∞ [(1 + t)^{-1} I − (A + tI)^{-1}] dt. (5.17)
(c) Consider on Ꮽ ⊕ Ꮽ the functionals ϕ_12(a ⊕ b) = λϕ_1(a) + (1 − λ)ϕ_2(b) and ψ_12(a ⊕ b) = λψ_1(a) + (1 − λ)ψ_2(b), with block diagonal densities. The diagonal embedding α(a) = a ⊕ a is a unital Schwarz mapping with ϕ_12 ∘ α = ϕ and ψ_12 ∘ α = ψ, so (5.6) gives the joint convexity of the relative entropy:
S(ϕ, ψ) ≤ S(ϕ_12, ψ_12) = λ S(ϕ_1, ψ_1) + (1 − λ) S(ϕ_2, ψ_2).
Allowing other functions in place of the logarithm broadens the concept of relative entropy further. For instance, the operator convex function f(x) = (x − 1)² produces the so-called quadratic relative entropy (the χ²-divergence) S_f(P, Q), which in the commuting case reduces to Tr[(P − Q)² Q^{-1}]. (5.19)
Proposition 5.2 (see [10]). The quadratic relative entropy is jointly convex and monotone under trace-preserving Schwarz mappings α:
S_f(α(P), α(Q)) ≤ S_f(P, Q). (5.20)
Remark 5.3. If α is interpreted as a communication channel between two finite quantum systems, then the inequality (5.20)
says that the output information corresponding to the states α(P), α(Q) is always less than or equal to the input information of the states P and Q; S_f(α(P), α(Q)) represents the amount of information correctly transmitted through the channel α, and the contraction coefficient η_f(α) = sup{S_f(α(P), α(Q))/S_f(P, Q) : P ≠ Q} can be viewed as the efficiency of the communication channel.
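The monotonicity (5.20) can be illustrated with a concrete trace-preserving channel. Below we use a depolarizing map α(P) = λP + (1 − λ) Tr(P) I/d, a standard example of a quantum channel, with test densities chosen by us, and check that the (Umegaki) relative entropy contracts:

```python
import numpy as np

def rel_entropy(P, Q):
    """Umegaki relative entropy S(P, Q) = Tr P (log P - log Q)."""
    def logm(A):
        w, v = np.linalg.eigh(A)
        return (v * np.log(w)) @ v.conj().T
    return float(np.trace(P @ (logm(P) - logm(Q))).real)

def depolarize(P, lam):
    """alpha(P) = lam P + (1 - lam) Tr(P) I/d: a trace-preserving channel."""
    d = len(P)
    return lam * P + (1 - lam) * np.trace(P) * np.eye(d) / d

P = np.array([[0.8, 0.2], [0.2, 0.2]])     # positive definite, trace 1
Q = np.array([[0.4, -0.1], [-0.1, 0.6]])   # positive definite, trace 1

s_in = rel_entropy(P, Q)
s_out = rel_entropy(depolarize(P, 0.6), depolarize(Q, 0.6))  # should be <= s_in
```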
The quadratic relative entropy (5.19) can be extended to an arbitrary operator convex function f : (0, ∞) → R with f(1) = 0, when the density matrix P is assumed to be invertible; such f admits an integral representation by means of a finite positive measure ν on R. (5.21) Based on the integral representation (5.21), it was recently shown by Petz and Ruskai [10] that η_f(α) = η_{(x−1)²}(α), for any operator convex function f. Since the relative entropy coefficient η_f(α) is independent of the operator convex function f, it is the object of current investigations to establish whether the relative entropy coefficient depends on the Riemannian metric defined on the manifold of density matrices. A generalized relative entropy induces a Riemannian metric on the state space. The set ᏹ of positive definite density matrices P = (P_ij) (with normalized trace, equal to 1) can be parametrized in an affine way by the real numbers Re P_ij, Im P_ij (1 ≤ i < j ≤ n) and the positive numbers P_ii. Thus ᏹ can be embedded into the Euclidean k-space with k = n² − 1 and becomes a manifold. At each point P ∈ ᏹ, the tangent space T_P(ᏹ) is identified with the set of all traceless selfadjoint matrices. An inner product ⟨A, B⟩_P can then be defined on T_P(ᏹ). (5.22) In the commutative case, this produces the positive and trace-preserving Riemannian metric corresponding to the (logarithmic) relative entropy, ⟨A, B⟩_p = ∑_i A_i B_i / p_i. The quantum noncommutative case is richer, since the left and right multiplications by P^{-1} are not equivalent; in this setting, an analogue of P^{-1}B can be defined through an operator Ω_P depending on the chosen operator convex function f. The squared geodesic distance [γ_f(P, Q)]² is a differentiable monotone relative entropy distance, that is, it satisfies the properties:
(a) [γ_f(P, Q)]² ≥ 0, with equality if and only if P = Q.
(b) γ_f(P, Q) = γ_f(Q, P), for all P, Q ∈ ᏹ.
(c) γ_f(P, Q) ≤ γ_f(P, R) + γ_f(R, Q), for all P, Q, R ∈ ᏹ.
(d) γ_f(κP, κQ) = κ γ_f(P, Q), for scalars κ > 0.
(e) γ_f(P, Q) is jointly convex in P and Q.
(f) γ_f(α(P), α(Q)) ≤ γ_f(P, Q), for any trace-preserving Schwarz map α.
(g) The function h(u, v) = S(P + uA, Q + vB) is differentiable.
The above properties are easily verified in light of Propositions 5.1 and 5.2; (g) follows from Theorem 3.6(2) of [5].
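In the commutative case, the contraction coefficient η_f(α) ≤ 1 reduces to the classical data-processing inequality for f-divergences under stochastic maps; a minimal sketch for f(x) = (x − 1)² (the stochastic matrix and the probability vectors are our own choices):

```python
import numpy as np

def chi2(p, q):
    """Classical chi^2-divergence sum (p_i - q_i)^2 / q_i:
    the commutative case of S_f with f(x) = (x - 1)^2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum((p - q)**2 / q))

# A column-stochastic matrix T models a classical channel.
T = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])

d_in = chi2(p, q)
d_out = chi2(T @ p, T @ q)   # data processing: d_out <= d_in
```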

6. Conclusion.
The concept of entropy relies on having chosen certain measures (like phase space volumes, or Hilbert space norms) which are conserved under unitary time evolution. We highlighted the importance of the probability distributions corresponding to the extremal cases for the relative entropy and the free entropy in various instances. These entropy inequalities may be viewed as manifestations of a universal principle of nondecrease of uncertainty, similar to the second law of thermodynamics, as remarked by Gnedenko and Korolev in [3].
The powerful blend of operator-algebraic and probabilistic points of view afforded by free probability theory provides the ideal framework for describing the limiting behavior of special classes of random matrices as their size tends to infinity. Such large random matrices are extensively used in physical modeling (e.g., Ising models, phase transition phenomena, spin tunneling), motivating the interest in further investigating this topic in future work.