Some inequalities of the Slater type for convex functions defined on general linear spaces are given. Applications for norm inequalities and f-divergence measures are also provided.
1. Introduction
Suppose that $I$ is an interval of real numbers with interior $I^{\circ}$, and $f:I\to\mathbb{R}$ is a convex function on $I$. Then $f$ is continuous on $I^{\circ}$ and has finite left and right derivatives at each point of $I^{\circ}$. Moreover, if $x,y\in I^{\circ}$ and $x<y$, then $f'_-(x)\le f'_+(x)\le f'_-(y)\le f'_+(y)$, which shows that both $f'_-$ and $f'_+$ are nondecreasing functions on $I^{\circ}$. It is also known that a convex function must be differentiable except at at most countably many points.
For a convex function $f:I\to\mathbb{R}$, the subdifferential of $f$, denoted $\partial f$, is the set of all functions $\varphi:I\to[-\infty,\infty]$ such that $\varphi(I^{\circ})\subset\mathbb{R}$ and
$$f(x)\ge f(a)+(x-a)\varphi(a)\quad\text{for any } x,a\in I.$$
It is also well known that if $f$ is convex on $I$, then $\partial f$ is nonempty, $f'_-,f'_+\in\partial f$, and if $\varphi\in\partial f$, then
$$f'_-(x)\le \varphi(x)\le f'_+(x)\quad\text{for any } x\in I^{\circ}.$$
In particular, φ is a nondecreasing function.
If f is differentiable and convex on I°, then ∂f={f′}.
The following result is well known in the literature as the Slater inequality.
Theorem 1.1 (Slater, 1981, [1]).
If $f:I\to\mathbb{R}$ is a nonincreasing (nondecreasing) convex function, $x_i\in I$, $p_i\ge 0$ with $P_n:=\sum_{i=1}^n p_i>0$ and $\sum_{i=1}^n p_i\varphi(x_i)\ne 0$, where $\varphi\in\partial f$, then
$$\frac{1}{P_n}\sum_{i=1}^n p_i f(x_i)\le f\!\left(\frac{\sum_{i=1}^n p_i x_i\varphi(x_i)}{\sum_{i=1}^n p_i\varphi(x_i)}\right).$$
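As a quick numerical sanity check, Slater's inequality can be verified directly for the convex, nondecreasing function $f(t)=e^t$ with $\varphi=f'$; the nodes and weights below are illustrative choices, not data from the paper.

```python
import math

# Slater's inequality checked for f(t) = exp(t), which is convex and
# nondecreasing, with phi = f' = exp. Nodes and weights are illustrative.
f = math.exp
phi = math.exp

x = [0.1, 0.5, 1.2, 2.0]
p = [1.0, 2.0, 0.5, 1.5]  # p_i >= 0 with P_n > 0

Pn = sum(p)
num = sum(pi * xi * phi(xi) for pi, xi in zip(p, x))
den = sum(pi * phi(xi) for pi, xi in zip(p, x))

lhs = sum(pi * f(xi) for pi, xi in zip(p, x)) / Pn  # weighted mean of f
rhs = f(num / den)  # f at the phi-weighted barycenter
assert lhs <= rhs
```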
As pointed out in [2] (see also [3, p. 64] and [4, p. 208]), the monotonicity assumption for the derivative φ can be replaced with the condition
$$\frac{\sum_{i=1}^n p_i x_i\varphi(x_i)}{\sum_{i=1}^n p_i\varphi(x_i)}\in I,$$
which is more general and can hold for suitable points in I and for not necessarily monotonic functions.
For recent works on Slater’s inequality, see [5–7].
The main aim of the present paper is to extend Slater's inequality to convex functions defined on general linear spaces. A reverse of Slater's inequality is also obtained. Natural applications for norm inequalities and f-divergence measures are provided as well.
2. Slater’s Inequality for Functions Defined on Linear Spaces
Assume that $f:X\to\mathbb{R}$ is a convex function on the real linear space $X$. Since for any vectors $x,y\in X$ the function $g_{x,y}:\mathbb{R}\to\mathbb{R}$, $g_{x,y}(t):=f(x+ty)$, is convex, the limits
$$\nabla_{+(-)}f(x)(y):=\lim_{t\to 0+(-)}\frac{f(x+ty)-f(x)}{t}$$
exist, and they are called the right (left) Gâteaux derivatives of the function $f$ at the point $x$ in the direction $y$.
It is obvious that for any $t>0>s$ we have
$$\frac{f(x+ty)-f(x)}{t}\ge \nabla_+ f(x)(y)=\inf_{t>0}\frac{f(x+ty)-f(x)}{t}\ge \sup_{s<0}\frac{f(x+sy)-f(x)}{s}=\nabla_- f(x)(y)\ge \frac{f(x+sy)-f(x)}{s}$$
for any $x,y\in X$ and, in particular,
$$\nabla_- f(u)(u-v)\ge f(u)-f(v)\ge \nabla_+ f(v)(u-v)$$
for any $u,v\in X$. We call this the gradient inequality for the convex function $f$. It will be used frequently in what follows to obtain various results related to Slater's inequality.
The following properties are also of importance:
$$\nabla_+ f(x)(-y)=-\nabla_- f(x)(y),\qquad \nabla_{+(-)}f(x)(\alpha y)=\alpha\nabla_{+(-)}f(x)(y)$$
for any $x,y\in X$ and $\alpha\ge 0$.
The right Gâteaux derivative is subadditive while the left one is superadditive, that is,
$$\nabla_+ f(x)(y+z)\le \nabla_+ f(x)(y)+\nabla_+ f(x)(z),\qquad \nabla_- f(x)(y+z)\ge \nabla_- f(x)(y)+\nabla_- f(x)(z)$$
for any $x,y,z\in X$.
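For a smooth convex function the left and right Gâteaux derivatives coincide, and the gradient inequality above can be checked numerically; the following sketch uses $f(x)=\sum_k e^{x_k}$ on $\mathbb{R}^3$ with illustrative points.

```python
import math

# Gradient inequality grad f(u)(u-v) >= f(u)-f(v) >= grad f(v)(u-v)
# for the smooth convex function f(x) = sum_k exp(x_k) on R^3.
# The points u, v are illustrative choices.
def f(x):
    return sum(math.exp(t) for t in x)

def gateaux(x, y):
    # directional derivative of f at x along y (exact for this f)
    return sum(math.exp(t) * s for t, s in zip(x, y))

u = [0.3, -1.0, 2.0]
v = [1.1, 0.4, -0.5]
d = [a - b for a, b in zip(u, v)]  # direction u - v

assert gateaux(u, d) >= f(u) - f(v) >= gateaux(v, d)
```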
Some natural examples can be provided by the use of normed spaces.
Assume that $(X,\|\cdot\|)$ is a real normed linear space. The function $f:X\to\mathbb{R}$, $f(x):=\frac{1}{2}\|x\|^2$, is a convex function which generates the superior and inferior semi-inner products
$$\langle y,x\rangle_{s(i)}:=\lim_{t\to 0+(-)}\frac{\|x+ty\|^2-\|x\|^2}{2t}.$$
For a comprehensive study of the properties of these mappings in the Geometry of Banach Spaces, see the monograph [8].
For the convex function $f_p:X\to\mathbb{R}$, $f_p(x):=\|x\|^p$ with $p>1$, we have
$$\nabla_{+(-)}f_p(x)(y)=\begin{cases} p\|x\|^{p-2}\langle y,x\rangle_{s(i)} & \text{if } x\ne 0,\\ 0 & \text{if } x=0,\end{cases}$$
for any $y\in X$. If $p=1$, then we have
$$\nabla_{+(-)}f_1(x)(y)=\begin{cases} \|x\|^{-1}\langle y,x\rangle_{s(i)} & \text{if } x\ne 0,\\ +(-)\|y\| & \text{if } x=0,\end{cases}$$
for any $y\in X$.
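In an inner product space the superior and inferior semi-inner products both reduce to the usual inner product, so the formula above predicts $\nabla f_p(x)(y)=p\|x\|^{p-2}\langle y,x\rangle$; a finite-difference check in Euclidean $\mathbb{R}^3$ with illustrative data:

```python
import math

# Finite-difference check of grad f_p(x)(y) = p ||x||^(p-2) <y, x>
# for f_p(x) = ||x||^p in Euclidean R^3; x, y, p are illustrative.
def norm(x):
    return math.sqrt(sum(t * t for t in x))

def fp(x, p):
    return norm(x) ** p

x = [1.0, -2.0, 0.5]
y = [0.3, 0.7, -1.1]
p = 3.0

t = 1e-7
numeric = (fp([a + t * b for a, b in zip(x, y)], p) - fp(x, p)) / t
closed = p * norm(x) ** (p - 2) * sum(a * b for a, b in zip(y, x))
assert abs(numeric - closed) < 1e-4
```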
For a given convex function $f:X\to\mathbb{R}$ and a given $n$-tuple of vectors $x=(x_1,\dots,x_n)\in X^n$, we consider the sets
$$\mathrm{Sla}_{+(-)}(f,x):=\{v\in X \mid \nabla_{+(-)}f(x_i)(v-x_i)\ge 0 \ \text{for all } i\in\{1,\dots,n\}\},$$
$$\mathrm{Sla}_{+(-)}(f,x,p):=\Big\{v\in X \,\Big|\, \sum_{i=1}^n p_i\nabla_{+(-)}f(x_i)(v-x_i)\ge 0\Big\},$$
where $p=(p_1,\dots,p_n)\in\mathbb{P}^n$ is a given probability distribution, that is, $p_i\ge 0$ for $i\in\{1,\dots,n\}$ and $\sum_{i=1}^n p_i=1$.
The following properties of these sets hold.
Lemma 2.1.
For a given convex function f:X→ℝ, a given n-tuple of vectors x=(x1,…,xn)∈Xn, and a given probability distribution p=(p1,…,pn)∈ℙn, one has
(i) $\mathrm{Sla}_-(f,x)\subset \mathrm{Sla}_+(f,x)$ and $\mathrm{Sla}_-(f,x,p)\subset \mathrm{Sla}_+(f,x,p)$;
(ii) $\mathrm{Sla}_-(f,x)\subset \mathrm{Sla}_-(f,x,p)$ and $\mathrm{Sla}_+(f,x)\subset \mathrm{Sla}_+(f,x,p)$ for all $p=(p_1,\dots,p_n)\in\mathbb{P}^n$;
(iii) the sets $\mathrm{Sla}_-(f,x)$ and $\mathrm{Sla}_-(f,x,p)$ are convex.
Proof.
The properties (i) and (ii) follow from the definitions and the fact that $\nabla_+ f(x)(y)\ge \nabla_- f(x)(y)$ for any $x,y\in X$.
(iii) Let us prove only that $\mathrm{Sla}_-(f,x)$ is convex.
If we assume that $y_1,y_2\in \mathrm{Sla}_-(f,x)$ and $\alpha,\beta\in[0,1]$ with $\alpha+\beta=1$, then by the superadditivity and positive homogeneity of the Gâteaux derivative $\nabla_- f(\cdot)(\cdot)$ in the second variable we have
$$\nabla_- f(x_i)(\alpha y_1+\beta y_2-x_i)=\nabla_- f(x_i)\big[\alpha(y_1-x_i)+\beta(y_2-x_i)\big]\ge \alpha\nabla_- f(x_i)(y_1-x_i)+\beta\nabla_- f(x_i)(y_2-x_i)\ge 0$$
for all $i\in\{1,\dots,n\}$, which shows that $\alpha y_1+\beta y_2\in \mathrm{Sla}_-(f,x)$.
The proof of the convexity of $\mathrm{Sla}_-(f,x,p)$ is similar, and the details are omitted.
For the convex function $f_p:X\to\mathbb{R}$, $f_p(x):=\|x\|^p$ with $p\ge 1$, defined on the normed linear space $(X,\|\cdot\|)$, and for the $n$-tuple of vectors $x=(x_1,\dots,x_n)\in X^n\setminus\{(0,\dots,0)\}$, the well-known property of the semi-inner products
$$\langle y+\alpha x,x\rangle_{s(i)}=\langle y,x\rangle_{s(i)}+\alpha\|x\|^2\quad\text{for any } x,y\in X,\ \alpha\in\mathbb{R},$$
gives
$$\mathrm{Sla}_{+(-)}(\|\cdot\|^p,x)=\mathrm{Sla}_{+(-)}(\|\cdot\|,x)=\{v\in X \mid \langle v,x_j\rangle_{s(i)}\ge \|x_j\|^2 \ \text{for all } j\in\{1,\dots,n\}\},$$
which, as can be seen, does not depend on $p$. We observe, by the continuity of the semi-inner products in the first variable, that $\mathrm{Sla}_{+(-)}(\|\cdot\|,x)$ is closed in $(X,\|\cdot\|)$. Also, if $v\in \mathrm{Sla}_{+(-)}(\|\cdot\|,x)$, then $\gamma v\in \mathrm{Sla}_{+(-)}(\|\cdot\|,x)$ for any $\gamma\ge 1$.
The larger classes, which depend on the probability distribution $p\in\mathbb{P}^n$, are described by
$$\mathrm{Sla}_{+(-)}(\|\cdot\|^p,x,p)=\Big\{v\in X \,\Big|\, \sum_{j=1}^n p_j\|x_j\|^{p-2}\langle v,x_j\rangle_{s(i)}\ge \sum_{j=1}^n p_j\|x_j\|^p\Big\}.$$
If the normed space is smooth, that is, the norm is Gâteaux differentiable at every nonzero point, then the superior and inferior semi-inner products coincide with the Lumer-Giles semi-inner product $[\cdot,\cdot]$ that generates the norm and is linear in the first variable (see for instance [8]). In this situation,
$$\mathrm{Sla}(\|\cdot\|,x)=\{v\in X \mid [v,x_j]\ge \|x_j\|^2 \ \text{for all } j\in\{1,\dots,n\}\},\qquad \mathrm{Sla}(\|\cdot\|^p,x,p)=\Big\{v\in X \,\Big|\, \sum_{j=1}^n p_j\|x_j\|^{p-2}[v,x_j]\ge \sum_{j=1}^n p_j\|x_j\|^p\Big\}.$$
If $(X,\langle\cdot,\cdot\rangle)$ is an inner product space, then $\mathrm{Sla}(\|\cdot\|^p,x,p)$ can be described as
$$\mathrm{Sla}(\|\cdot\|^p,x,p)=\Big\{v\in X \,\Big|\, \Big\langle v,\sum_{j=1}^n p_j\|x_j\|^{p-2}x_j\Big\rangle\ge \sum_{j=1}^n p_j\|x_j\|^p\Big\},$$
and if the family $\{x_j\}_{j=1,\dots,n}$ is orthogonal, then, by the Pythagorean theorem, the sum $\sum_{j=1}^n x_j$ belongs to $\mathrm{Sla}(\|\cdot\|,x)$ and therefore to $\mathrm{Sla}(\|\cdot\|^p,x,p)$ for any $p\ge 1$ and any probability distribution $p=(p_1,\dots,p_n)\in\mathbb{P}^n$.
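The orthogonality remark can be tested directly: for an orthogonal family in Euclidean space, $v=\sum_j x_j$ satisfies the defining inequality of $\mathrm{Sla}(\|\cdot\|^p,x,p)$ (with equality, since $\langle v,x_j\rangle=\|x_j\|^2$); the vectors and weights below are illustrative.

```python
# For an orthogonal family, v = x_1 + ... + x_n satisfies the defining
# inequality of Sla(||.||^p, x, p). Vectors, weights, and p illustrative.
xs = [[2.0, 0.0, 0.0], [0.0, -3.0, 0.0], [0.0, 0.0, 1.5]]  # orthogonal
probs = [0.2, 0.5, 0.3]
p = 2.5

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

v = [sum(col) for col in zip(*xs)]  # v = x_1 + x_2 + x_3
w = [sum(pj * norm(xj) ** (p - 2) * xj[k] for pj, xj in zip(probs, xs))
     for k in range(3)]                 # sum_j p_j ||x_j||^(p-2) x_j
lhs = dot(v, w)
rhs = sum(pj * norm(xj) ** p for pj, xj in zip(probs, xs))
assert lhs >= rhs - 1e-9
```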
We can state now the following results that provide a generalization of Slater’s inequality as well as a counterpart for it.
Theorem 2.2.
Let $f:X\to\mathbb{R}$ be a convex function on the real linear space $X$, $x=(x_1,\dots,x_n)\in X^n$ an $n$-tuple of vectors, and $p=(p_1,\dots,p_n)\in\mathbb{P}^n$ a probability distribution. Then for any $v\in \mathrm{Sla}_+(f,x,p)$, one has the inequalities
$$\nabla_- f(v)(v)-\sum_{i=1}^n p_i\nabla_- f(v)(x_i)\ge f(v)-\sum_{i=1}^n p_i f(x_i)\ge 0.$$
Proof.
If we write the gradient inequality for $v\in \mathrm{Sla}_+(f,x,p)$ and $x_i$, then we have
$$\nabla_- f(v)(v-x_i)\ge f(v)-f(x_i)\ge \nabla_+ f(x_i)(v-x_i)$$
for any $i\in\{1,\dots,n\}$. Multiplying by $p_i\ge 0$ and summing over $i$ from 1 to $n$, we get
$$\sum_{i=1}^n p_i\nabla_- f(v)(v-x_i)\ge f(v)-\sum_{i=1}^n p_i f(x_i)\ge \sum_{i=1}^n p_i\nabla_+ f(x_i)(v-x_i).$$
Since $v\in \mathrm{Sla}_+(f,x,p)$, the right-hand side of this chain is nonnegative, which proves the second inequality of the theorem.
By the superadditivity of the Gâteaux derivative $\nabla_- f(\cdot)(\cdot)$ in the second variable, we have
$$\nabla_- f(v)(v)-\nabla_- f(v)(x_i)\ge \nabla_- f(v)(v-x_i),$$
which, after multiplying by $p_i\ge 0$ and summing over $i$ from 1 to $n$, produces
$$\nabla_- f(v)(v)-\sum_{i=1}^n p_i\nabla_- f(v)(x_i)\ge \sum_{i=1}^n p_i\nabla_- f(v)(v-x_i).$$
Combining the last two displayed chains, we deduce the desired result.
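Theorem 2.2 can be checked numerically in an inner product space with $f(x)=\|x\|^2$, for which $\nabla f(x)(y)=2\langle y,x\rangle$; the vectors, weights, and the point $v$ below are illustrative choices.

```python
# Theorem 2.2 for f(x) = ||x||^2 on R^2, where grad f(x)(y) = 2<y, x>.
# Vectors, weights, and v are illustrative.
xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
probs = [0.3, 0.4, 0.3]
v = [3.0, 3.0]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

# membership in Sla+(f, x, p): sum_i p_i grad f(x_i)(v - x_i) >= 0
member = sum(pi * 2 * dot([v[0] - xi[0], v[1] - xi[1]], xi)
             for pi, xi in zip(probs, xs))
assert member >= 0

left = 2 * dot(v, v) - 2 * sum(pi * dot(xi, v) for pi, xi in zip(probs, xs))
mid = dot(v, v) - sum(pi * dot(xi, xi) for pi, xi in zip(probs, xs))
assert left >= mid >= 0
```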
Remark 2.3.
The above result takes the following form in normed linear spaces. Let $(X,\|\cdot\|)$ be a normed linear space, $x=(x_1,\dots,x_n)\in X^n$ an $n$-tuple of vectors from $X$, and $p=(p_1,\dots,p_n)\in\mathbb{P}^n$ a probability distribution. Then for any vector $v\in X$ with the property
$$\sum_{j=1}^n p_j\|x_j\|^{p-2}\langle v,x_j\rangle_s\ge \sum_{j=1}^n p_j\|x_j\|^p,\qquad p\ge 1,$$
we have, by Theorem 2.2 applied to $f_p(x)=\|x\|^p$, the inequalities
$$p\Big[\|v\|^p-\|v\|^{p-2}\sum_{j=1}^n p_j\langle x_j,v\rangle_i\Big]\ge \|v\|^p-\sum_{j=1}^n p_j\|x_j\|^p\ge 0.$$
Rearranging the first inequality, we also have that
$$(p-1)\|v\|^p+\sum_{j=1}^n p_j\|x_j\|^p\ge p\|v\|^{p-2}\sum_{j=1}^n p_j\langle x_j,v\rangle_i.$$
If the space is smooth, then the condition above becomes
$$\sum_{j=1}^n p_j\|x_j\|^{p-2}[v,x_j]\ge \sum_{j=1}^n p_j\|x_j\|^p,\qquad p\ge 1,$$
implying the inequality
$$p\Big[\|v\|^p-\|v\|^{p-2}\sum_{j=1}^n p_j[x_j,v]\Big]\ge \|v\|^p-\sum_{j=1}^n p_j\|x_j\|^p\ge 0,$$
whose first inequality is equivalent to
$$(p-1)\|v\|^p+\sum_{j=1}^n p_j\|x_j\|^p\ge p\|v\|^{p-2}\sum_{j=1}^n p_j[x_j,v].$$
Corollary 2.4.
Let f:X→ℝ be a convex function on the real linear space X, x=(x1,…,xn)∈Xn an n-tuple of vectors, and p=(p1,…,pn)∈ℙn a probability distribution. If
$$\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)\ \ge\ (<)\ 0,$$
and there exists a vector $s\in X$ with
$$\sum_{i=1}^n p_i\nabla_{+(-)} f(x_i)(s)\ \ge\ (\le)\ 1,$$
then
$$\nabla_- f\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s\Big)\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s\Big)-\sum_{i=1}^n p_i\nabla_- f\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s\Big)(x_i)\ \ge\ f\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s\Big)-\sum_{i=1}^n p_i f(x_i)\ \ge\ 0.$$
Proof.
Assume that $\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)\ge 0$ and $\sum_{i=1}^n p_i\nabla_+ f(x_i)(s)\ge 1$, and define $v:=\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s$. We claim that $v\in \mathrm{Sla}_+(f,x,p)$.
By the subadditivity and positive homogeneity of the mapping $\nabla_+ f(\cdot)(\cdot)$ in the second variable, we have
$$\sum_{i=1}^n p_i\nabla_+ f(x_i)(v-x_i)\ge \sum_{i=1}^n p_i\nabla_+ f(x_i)(v)-\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)=\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)\sum_{i=1}^n p_i\nabla_+ f(x_i)(s)-\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)=\Big(\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)\Big[\sum_{i=1}^n p_i\nabla_+ f(x_i)(s)-1\Big]\ge 0,$$
as claimed. Applying Theorem 2.2 for this $v$, we get the desired result.
If $\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)<0$ and $\sum_{i=1}^n p_i\nabla_- f(x_i)(s)\le 1$, then for
$$w:=\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\,s$$
we also have that
$$\sum_{i=1}^n p_i\nabla_+ f(x_i)(w-x_i)\ge \sum_{i=1}^n p_i\nabla_+ f(x_i)\Big(\Big(-\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)(-s)\Big)-\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)=\Big(-\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)\sum_{i=1}^n p_i\nabla_+ f(x_i)(-s)-\sum_{i=1}^n p_i\nabla_+ f(x_i)(x_i)=\Big(-\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)\Big(1+\sum_{i=1}^n p_i\nabla_+ f(x_i)(-s)\Big)=\Big(-\sum_{j=1}^n p_j\nabla_+ f(x_j)(x_j)\Big)\Big(1-\sum_{i=1}^n p_i\nabla_- f(x_i)(s)\Big)\ge 0,$$
where, for the last equality, we have used the property $\nabla_+ f(x)(-y)=-\nabla_- f(x)(y)$. Therefore, $w\in \mathrm{Sla}_+(f,x,p)$, and by Theorem 2.2 we get the desired result.
It is natural to consider the case of normed spaces.
Remark 2.5.
Let $(X,\|\cdot\|)$ be a normed linear space, $x=(x_1,\dots,x_n)\in X^n$ an $n$-tuple of vectors from $X$, and $p=(p_1,\dots,p_n)\in\mathbb{P}^n$ a probability distribution. Then for any vector $s\in X$ with the property
$$p\sum_{i=1}^n p_i\|x_i\|^{p-2}\langle s,x_i\rangle_s\ge 1,$$
we have, by Corollary 2.4 applied to $f_p(x)=\|x\|^p$ (for which $\nabla_+ f_p(x_j)(x_j)=p\|x_j\|^p\ge 0$), the inequalities
$$p^p\|s\|^{p-2}\Big(\sum_{j=1}^n p_j\|x_j\|^p\Big)^{p-1}\Big(p\|s\|^2\sum_{j=1}^n p_j\|x_j\|^p-\sum_{j=1}^n p_j\langle x_j,s\rangle_i\Big)\ge p^p\|s\|^p\Big(\sum_{j=1}^n p_j\|x_j\|^p\Big)^p-\sum_{j=1}^n p_j\|x_j\|^p\ge 0.$$
The case of smooth spaces can be easily derived from the above; however, the details are left to the interested reader.
3. The Case of Finite Dimensional Linear Spaces
Consider now the finite dimensional linear space $X=\mathbb{R}^m$, and assume that $C$ is an open convex subset of $\mathbb{R}^m$ and that the function $f:C\to\mathbb{R}$ is differentiable and convex on $C$. Obviously, if $x=(x_1,\dots,x_m)\in C$, then for any $y=(y_1,\dots,y_m)\in\mathbb{R}^m$ we have
$$\nabla f(x)(y)=\sum_{k=1}^m \frac{\partial f(x_1,\dots,x_m)}{\partial x_k}\,y_k.$$
For the convex function $f:C\to\mathbb{R}$ and a given $n$-tuple of vectors $x=(x_1,\dots,x_n)\in C^n$ with $x_i=(x_{i1},\dots,x_{im})$, $i\in\{1,\dots,n\}$, we consider the sets
$$\mathrm{Sla}(f,x,C):=\Big\{v\in C \,\Big|\, \sum_{k=1}^m \frac{\partial f(x_{i1},\dots,x_{im})}{\partial x_{ik}}\,v_k\ge \sum_{k=1}^m \frac{\partial f(x_{i1},\dots,x_{im})}{\partial x_{ik}}\,x_{ik}\ \text{for all } i\in\{1,\dots,n\}\Big\},$$
$$\mathrm{Sla}(f,x,p,C):=\Big\{v\in C \,\Big|\, \sum_{i=1}^n\sum_{k=1}^m p_i\,\frac{\partial f(x_{i1},\dots,x_{im})}{\partial x_{ik}}\,v_k\ge \sum_{i=1}^n\sum_{k=1}^m p_i\,\frac{\partial f(x_{i1},\dots,x_{im})}{\partial x_{ik}}\,x_{ik}\Big\},$$
where $p=(p_1,\dots,p_n)\in\mathbb{P}^n$ is a given probability distribution.
As in the previous section, the sets $\mathrm{Sla}(f,x,C)$ and $\mathrm{Sla}(f,x,p,C)$ are convex and closed subsets of $\mathrm{clo}(C)$, the closure of $C$. Also, $\{x_1,\dots,x_n\}\subset \mathrm{Sla}(f,x,C)\subset \mathrm{Sla}(f,x,p,C)$ for any probability distribution $p=(p_1,\dots,p_n)\in\mathbb{P}^n$.
Proposition 3.1.
Let f:C→ℝ be a convex function on the open convex set C in the finite dimensional linear space ℝm, (x1,…,xn)∈Cn an n-tuple of vectors and (p1,…,pn)∈ℙn a probability distribution. Then for any v=(v1,…,vm)∈Sla(f,x,p,C), one has the inequalities
$$\sum_{k=1}^m \frac{\partial f(v_1,\dots,v_m)}{\partial x_k}\,v_k-\sum_{i=1}^n\sum_{k=1}^m p_i\,\frac{\partial f(v_1,\dots,v_m)}{\partial x_k}\,x_{ik}\ge f(v_1,\dots,v_m)-\sum_{i=1}^n p_i f(x_{i1},\dots,x_{im})\ge 0.$$
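The chain above, which is Theorem 2.2 specialized to a differentiable convex function on $\mathbb{R}^m$, can be checked numerically; here for the log-sum-exp function $f(x_1,x_2)=\ln(e^{x_1}+e^{x_2})$ on $\mathbb{R}^2$, with illustrative points, weights, and $v$.

```python
import math

# Theorem 2.2 on R^2 for the smooth convex log-sum-exp function.
# Points, weights, and v are illustrative choices.
def f(x):
    return math.log(math.exp(x[0]) + math.exp(x[1]))

def grad(x):
    z = math.exp(x[0]) + math.exp(x[1])
    return [math.exp(x[0]) / z, math.exp(x[1]) / z]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
probs = [1 / 3, 1 / 3, 1 / 3]
v = [2.0, 2.0]

# membership: sum_i p_i <grad f(x_i), v - x_i> >= 0
assert sum(pi * dot(grad(xi), [v[0] - xi[0], v[1] - xi[1]])
           for pi, xi in zip(probs, xs)) >= 0

left = dot(grad(v), v) - sum(pi * dot(grad(v), xi)
                             for pi, xi in zip(probs, xs))
mid = f(v) - sum(pi * f(xi) for pi, xi in zip(probs, xs))
assert left >= mid >= 0
```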
The one-dimensional case, that is, $m=1$, is of interest for applications. We state it under the general assumption that $f:I\to\mathbb{R}$ is a convex function on an open interval $I$. For a given $n$-tuple of points $x=(x_1,\dots,x_n)\in I^n$, we have
$$\mathrm{Sla}_{+(-)}(f,x,I):=\{v\in I \mid f'_{+(-)}(x_i)(v-x_i)\ge 0 \ \text{for all } i\in\{1,\dots,n\}\},$$
$$\mathrm{Sla}_{+(-)}(f,x,p,I):=\Big\{v\in I \,\Big|\, \sum_{i=1}^n p_i f'_{+(-)}(x_i)(v-x_i)\ge 0\Big\},$$
where $(p_1,\dots,p_n)\in\mathbb{P}^n$ is a probability distribution. These sets inherit the general properties pointed out in Lemma 2.1. Moreover, if we make the assumption that $\sum_{i=1}^n p_i f'_+(x_i)\ne 0$, then for $\sum_{i=1}^n p_i f'_+(x_i)>0$ we have
$$\mathrm{Sla}_+(f,x,p,I)=\Big\{v\in I \,\Big|\, v\ge \frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\Big\},$$
while for $\sum_{i=1}^n p_i f'_+(x_i)<0$ we have
$$\mathrm{Sla}_+(f,x,p,I)=\Big\{v\in I \,\Big|\, v\le \frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\Big\}.$$
Also, if we assume that $f'_+(x_i)\ge 0$ for all $i\in\{1,\dots,n\}$ and $\sum_{i=1}^n p_i f'_+(x_i)>0$, then
$$v_s:=\frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\in I,$$
since $v_s$ is then a convex combination of the points $x_i\in I$ and $I$ is a convex set.
Proposition 3.2.
Let $f:I\to\mathbb{R}$ be a convex function on an open interval $I$. For a given $n$-tuple of points $x=(x_1,\dots,x_n)\in I^n$ and a probability distribution $(p_1,\dots,p_n)\in\mathbb{P}^n$, one has
$$f'_-(v)\Big(v-\sum_{i=1}^n p_i x_i\Big)\ge f(v)-\sum_{i=1}^n p_i f(x_i)\ge 0$$
for any $v\in \mathrm{Sla}_+(f,x,p,I)$.
In particular, if one assumes that $\sum_{i=1}^n p_i f'_+(x_i)\ne 0$ and
$$\frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\in I,$$
then
$$f'_-\Big(\frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\Big)\Big[\frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}-\sum_{i=1}^n p_i x_i\Big]\ge f\Big(\frac{\sum_{i=1}^n p_i f'_+(x_i)x_i}{\sum_{i=1}^n p_i f'_+(x_i)}\Big)-\sum_{i=1}^n p_i f(x_i)\ge 0.$$
Moreover, if $f'_+(x_i)\ge 0$ for all $i\in\{1,\dots,n\}$ and $\sum_{i=1}^n p_i f'_+(x_i)>0$, then the last chain of inequalities holds as well.
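In the one-dimensional case the point $v_s$ is computable in closed form; the following sketch checks the chain of Proposition 3.2 for $f(t)=e^t$, where $f'(t)>0$ guarantees $v_s\in I$. The nodes and weights are illustrative.

```python
import math

# Proposition 3.2 for f(t) = exp(t), f'(t) = exp(t) > 0, so v_s is a
# convex combination of the x_i and lies in I. Data is illustrative.
f = math.exp
df = math.exp

x = [0.1, 0.5, 1.2, 2.0]
p = [0.25, 0.25, 0.25, 0.25]

vs = sum(pi * df(xi) * xi for pi, xi in zip(p, x)) / \
     sum(pi * df(xi) for pi, xi in zip(p, x))
mean = sum(pi * xi for pi, xi in zip(p, x))

left = df(vs) * (vs - mean)                    # reverse (upper) bound
mid = f(vs) - sum(pi * f(xi) for pi, xi in zip(p, x))  # Slater gap
assert left >= mid >= 0
```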
Remark 3.3.
We remark that the first inequality in the last chain above provides a reverse inequality for the classical result due to Slater.
4. Some Applications for f-Divergences
Given a convex function $f:[0,\infty)\to\mathbb{R}$, the f-divergence functional
$$I_f(p,q):=\sum_{i=1}^n q_i f\Big(\frac{p_i}{q_i}\Big),$$
where $p=(p_1,\dots,p_n)$ and $q=(q_1,\dots,q_n)$ are positive sequences, was introduced by Csiszár in [9] as a generalized measure of information, a "distance function" on the set of probability distributions $\mathbb{P}^n$. As in [9], we interpret undefined expressions by
$$f(0)=\lim_{t\to 0+}f(t),\qquad 0f\Big(\frac{0}{0}\Big)=0,\qquad 0f\Big(\frac{a}{0}\Big)=\lim_{q\to 0+}qf\Big(\frac{a}{q}\Big)=a\lim_{t\to\infty}\frac{f(t)}{t},\quad a>0.$$
The following results were essentially given by Csiszár and Körner [10]:
(i) if $f$ is convex, then $I_f(p,q)$ is jointly convex in $p$ and $q$;
(ii) for every $p,q\in\mathbb{R}_+^n$, we have
$$I_f(p,q)\ge \sum_{j=1}^n q_j\, f\Big(\frac{\sum_{j=1}^n p_j}{\sum_{j=1}^n q_j}\Big).$$
If $f$ is strictly convex, equality holds in (ii) if and only if
$$\frac{p_1}{q_1}=\frac{p_2}{q_2}=\cdots=\frac{p_n}{q_n}.$$
If $f$ is normalized, that is, $f(1)=0$, then for every $p,q\in\mathbb{R}_+^n$ with $\sum_{i=1}^n p_i=\sum_{i=1}^n q_i$, we have the inequality
$$I_f(p,q)\ge 0.$$
In particular, if $p,q\in\mathbb{P}^n$, then this inequality holds. This is the well-known positivity property of the f-divergence.
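The functional $I_f$ and its positivity property can be illustrated for the normalized convex function $f(t)=t\ln t$, for which $I_f(p,q)$ is the Kullback-Leibler divergence; the distributions below are illustrative.

```python
import math

# I_f(p, q) for the normalized convex f(t) = t ln t equals the
# Kullback-Leibler divergence KL(p, q); distributions are illustrative.
def If(f, p, q):
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f(t):
    return t * math.log(t)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

kl = If(f, p, q)
assert kl >= 0  # positivity property
assert abs(kl - sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))) < 1e-12
```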
It is obvious that the above definition of If(p,q) can be extended to any function f:[0,∞)→ℝ; however, the positivity condition will not generally hold for normalized functions and p,q∈ℝ+n with ∑i=1npi=∑i=1nqi.
For a normalized convex function $f:[0,\infty)\to\mathbb{R}$ and two probability distributions $p,q\in\mathbb{P}^n$, we define the set
$$\mathrm{Sla}_+(f,p,q):=\Big\{v\in[0,\infty) \,\Big|\, \sum_{i=1}^n q_i f'_+\Big(\frac{p_i}{q_i}\Big)\Big(v-\frac{p_i}{q_i}\Big)\ge 0\Big\}.$$
Now, observe that the condition
$$\sum_{i=1}^n q_i f'_+\Big(\frac{p_i}{q_i}\Big)\Big(v-\frac{p_i}{q_i}\Big)\ge 0$$
is equivalent to
$$v\sum_{i=1}^n q_i f'_+\Big(\frac{p_i}{q_i}\Big)\ge \sum_{i=1}^n p_i f'_+\Big(\frac{p_i}{q_i}\Big).$$
If $\sum_{i=1}^n q_i f'_+(p_i/q_i)>0$, then this is equivalent to
$$v\ge \frac{\sum_{i=1}^n p_i f'_+(p_i/q_i)}{\sum_{i=1}^n q_i f'_+(p_i/q_i)},$$
and therefore in this case
$$\mathrm{Sla}_+(f,p,q)=\begin{cases}[0,\infty) & \text{if } \sum_{i=1}^n p_i f'_+(p_i/q_i)<0,\\[4pt] \Big[\dfrac{\sum_{i=1}^n p_i f'_+(p_i/q_i)}{\sum_{i=1}^n q_i f'_+(p_i/q_i)},\infty\Big) & \text{if } \sum_{i=1}^n p_i f'_+(p_i/q_i)\ge 0.\end{cases}$$
If $\sum_{i=1}^n q_i f'_+(p_i/q_i)<0$, then the condition is equivalent to
$$v\le \frac{\sum_{i=1}^n p_i f'_+(p_i/q_i)}{\sum_{i=1}^n q_i f'_+(p_i/q_i)},$$
and therefore
$$\mathrm{Sla}_+(f,p,q)=\begin{cases}\Big[0,\dfrac{\sum_{i=1}^n p_i f'_+(p_i/q_i)}{\sum_{i=1}^n q_i f'_+(p_i/q_i)}\Big] & \text{if } \sum_{i=1}^n p_i f'_+(p_i/q_i)\le 0,\\[4pt] \emptyset & \text{if } \sum_{i=1}^n p_i f'_+(p_i/q_i)>0.\end{cases}$$
Utilising the extended f-divergences notation, we can state the following result.
Theorem 4.1.
Let $f:[0,\infty)\to\mathbb{R}$ be a normalized convex function and $p,q\in\mathbb{P}^n$ two probability distributions. If $v\in \mathrm{Sla}_+(f,p,q)$, then one has
$$f'_-(v)(v-1)\ge f(v)-I_f(p,q)\ge 0.$$
In particular, if one assumes that $I_{f'_+}(p,q)\ne 0$, where $I_{f'_+}(p,q):=\sum_{i=1}^n q_i f'_+(p_i/q_i)$ and $I_{(\cdot)f'_+(\cdot)}(p,q):=\sum_{i=1}^n p_i f'_+(p_i/q_i)$, and
$$\frac{I_{(\cdot)f'_+(\cdot)}(p,q)}{I_{f'_+}(p,q)}\in[0,\infty),$$
then
$$f'_-\Big(\frac{I_{(\cdot)f'_+(\cdot)}(p,q)}{I_{f'_+}(p,q)}\Big)\Big[\frac{I_{(\cdot)f'_+(\cdot)}(p,q)}{I_{f'_+}(p,q)}-1\Big]\ge f\Big(\frac{I_{(\cdot)f'_+(\cdot)}(p,q)}{I_{f'_+}(p,q)}\Big)-I_f(p,q)\ge 0.$$
Moreover, if $f'_+(p_i/q_i)\ge 0$ for all $i\in\{1,\dots,n\}$ and $I_{f'_+}(p,q)>0$, then the last chain of inequalities holds as well.
The proof follows immediately from Proposition 3.2 and the details are omitted.
The K. Pearson $\chi^2$-divergence is obtained for the convex function $f(t)=(1-t)^2$, $t\in\mathbb{R}$, and is given by
$$\chi^2(p,q):=\sum_{j=1}^n q_j\Big(\frac{p_j}{q_j}-1\Big)^2=\sum_{j=1}^n \frac{(p_j-q_j)^2}{q_j}=\sum_{j=1}^n \frac{p_j^2}{q_j}-1.$$
The Kullback-Leibler divergence is obtained for the convex function $f:(0,\infty)\to\mathbb{R}$, $f(t)=t\ln t$, and is defined by
$$KL(p,q):=\sum_{j=1}^n q_j\cdot\frac{p_j}{q_j}\ln\Big(\frac{p_j}{q_j}\Big)=\sum_{j=1}^n p_j\ln\Big(\frac{p_j}{q_j}\Big).$$
If we consider the convex function $f:(0,\infty)\to\mathbb{R}$, $f(t)=-\ln t$, then we observe that
$$I_f(p,q)=\sum_{i=1}^n q_i f\Big(\frac{p_i}{q_i}\Big)=-\sum_{i=1}^n q_i\ln\Big(\frac{p_i}{q_i}\Big)=\sum_{i=1}^n q_i\ln\Big(\frac{q_i}{p_i}\Big)=KL(q,p).$$
For the function $f(t)=-\ln t$ we have $f'(t)=-1/t$, and therefore
$$\mathrm{Sla}(-\ln,p,q)=\Big\{v\in[0,\infty) \,\Big|\, -\sum_{i=1}^n q_i\Big(\frac{p_i}{q_i}\Big)^{-1}\Big(v-\frac{p_i}{q_i}\Big)\ge 0\Big\}=\Big\{v\in[0,\infty) \,\Big|\, v\sum_{i=1}^n \frac{q_i^2}{p_i}-1\le 0\Big\}=\Big[0,\frac{1}{\chi^2(q,p)+1}\Big].$$
Utilising the first part of Theorem 4.1, we can state the following.
Proposition 4.2.
Let $p,q\in\mathbb{P}^n$ be two probability distributions. If $v\in[0,1/(\chi^2(q,p)+1)]$, then one has
$$\frac{1-v}{v}\ge -\ln(v)-KL(q,p)\ge 0.$$
In particular, for $v=1/(\chi^2(q,p)+1)$, one gets
$$\chi^2(q,p)\ge \ln\big[\chi^2(q,p)+1\big]-KL(q,p)\ge 0.$$
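The endpoint inequality of Proposition 4.2, $\chi^2(q,p)\ge \ln[\chi^2(q,p)+1]-KL(q,p)\ge 0$, admits a direct numerical check; the distributions below are illustrative.

```python
import math

# Proposition 4.2 at the endpoint v = 1/(chi2(q,p)+1); p, q illustrative.
def chi2(a, b):
    return sum((ai - bi) ** 2 / bi for ai, bi in zip(a, b))

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

c = chi2(q, p)
assert c >= math.log(c + 1) - kl(q, p) >= 0
```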
If we consider now the function $f:(0,\infty)\to\mathbb{R}$, $f(t)=t\ln t$, then $f'(t)=\ln t+1$ and
$$\mathrm{Sla}((\cdot)\ln(\cdot),p,q)=\Big\{v\in[0,\infty) \,\Big|\, \sum_{i=1}^n q_i\Big(\ln\Big(\frac{p_i}{q_i}\Big)+1\Big)\Big(v-\frac{p_i}{q_i}\Big)\ge 0\Big\}=\Big\{v\in[0,\infty) \,\Big|\, v\sum_{i=1}^n q_i\Big(\ln\Big(\frac{p_i}{q_i}\Big)+1\Big)\ge \sum_{i=1}^n p_i\Big(\ln\Big(\frac{p_i}{q_i}\Big)+1\Big)\Big\}=\{v\in[0,\infty) \mid v(1-KL(q,p))\ge 1+KL(p,q)\}.$$
We observe that if $p,q\in\mathbb{P}^n$ are two probability distributions such that $0<KL(q,p)<1$, then
$$\mathrm{Sla}((\cdot)\ln(\cdot),p,q)=\Big[\frac{1+KL(p,q)}{1-KL(q,p)},\infty\Big).$$
If $KL(q,p)\ge 1$, then $\mathrm{Sla}((\cdot)\ln(\cdot),p,q)=\emptyset$.
By the use of Theorem 4.1, we can state now the following.
Proposition 4.3.
Let p,q∈ℙn be two probability distributions such that 0<KL(q,p)<1. If v∈[(1+KL(p,q))/(1-KL(q,p)),∞), then one has
(lnv+1)(v-1)≥vln(v)-KL(p,q)≥0.
In particular, for v=(1+KL(p,q))/(1-KL(q,p)), one gets
(ln[1+KL(p,q)1-KL(q,p)]+1)(1+KL(p,q)1-KL(q,p)-1)≥1+KL(p,q)1-KL(q,p)ln[1+KL(p,q)1-KL(q,p)]-KL(p,q)≥0.
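Proposition 4.3 can likewise be checked at the endpoint $v=(1+KL(p,q))/(1-KL(q,p))$, provided $0<KL(q,p)<1$; the distributions below are illustrative.

```python
import math

# Proposition 4.3 at the endpoint v = (1+KL(p,q))/(1-KL(q,p)),
# assuming 0 < KL(q,p) < 1; the distributions p, q are illustrative.
def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

kqp = kl(q, p)
assert 0 < kqp < 1  # hypothesis of Proposition 4.3
v = (1 + kl(p, q)) / (1 - kqp)

left = (math.log(v) + 1) * (v - 1)
mid = v * math.log(v) - kl(p, q)
assert left >= mid >= 0
```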
Similar results can be obtained for other divergence measures of interest such as the Jeffreys divergence and Hellinger discrimination. However, the details are left to the interested reader.
Acknowledgment
The author would like to thank the anonymous referees for their valuable comments that have been implemented in the final version of the paper.
References
[1] M. S. Slater, "A companion inequality to Jensen's inequality," Journal of Approximation Theory, vol. 32, no. 2, pp. 160-166, 1981.
[2] J. E. Pečarić, "A multidimensional generalization of Slater's inequality," Journal of Approximation Theory, vol. 44, no. 3, pp. 292-294, 1985.
[3] J. E. Pečarić, F. Proschan, and Y. L. Tong, Convex Functions, Partial Orderings, and Statistical Applications, vol. 187 of Mathematics in Science and Engineering, Academic Press, New York, NY, USA, 1992.
[4] S. S. Dragomir, Nova Science Publishers, Hauppauge, NY, USA, 2004.
[5] M. A. Khan and J. E. Pečarić, "Improvement and reversion of Slater's inequality and related results," Journal of Inequalities and Applications, vol. 2010, Article ID 646034, 2010.
[6] S. S. Dragomir, "Some Slater's type inequalities for convex functions defined on linear spaces and applications," vol. 12, article 8, 2009.
[7] M. Matić and J. Pečarić, "Some companion inequalities to Jensen's inequality," vol. 3, no. 3, pp. 355-368, 2000.
[8] S. S. Dragomir, Semi-Inner Products and Applications, Nova Science Publishers, Hauppauge, NY, USA, 2004.
[9] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299-318, 1967.
[10] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, NY, USA, 1981.