1. Introduction
Let X={Xn,n≥0} be an arbitrary information source taking values in the alphabet S={1,2,…,N} with joint distribution
\[
P(X_0=x_0, X_1=x_1, \ldots, X_n=x_n) = p(x_0,x_1,\ldots,x_n), \quad x_i\in S,\ 0\le i\le n, \tag{1}
\]
for all n≥0. If X={Xn,n≥0} is an mth-order nonhomogeneous Markov information source, then, for n≥m,
\[
\begin{aligned}
&P(X_n=x_n \mid X_0=x_0, X_1=x_1, \ldots, X_{n-1}=x_{n-1}) \\
&\quad = P(X_n=x_n \mid X_{n-m}=x_{n-m}, X_{n-m+1}=x_{n-m+1}, \ldots, X_{n-1}=x_{n-1}).
\end{aligned}
\tag{2}
\]
Denote
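\[
\mu(i_0,i_1,\ldots,i_{m-1}) = P(X_0=i_0, X_1=i_1, \ldots, X_{m-1}=i_{m-1}), \tag{3}
\]
\[
p_n(j\mid i_1,i_2,\ldots,i_m) = P(X_n=j \mid X_{n-m}=i_1, \ldots, X_{n-1}=i_m), \tag{4}
\]
where $\mu(i_0,i_1,\ldots,i_{m-1})$ and $p_n(j\mid i_1,i_2,\ldots,i_m)$ are called the m-dimensional initial distribution and the mth-order transition probabilities, respectively. Moreover,
\[
P_n = \bigl(p_n(j\mid i_1,i_2,\ldots,i_m)\bigr) \tag{5}
\]
are called the mth-order transition probability matrices. In this case,
\[
p(x_0,x_1,\ldots,x_n) = \mu(x_0,x_1,\ldots,x_{m-1}) \prod_{k=m}^{n} p_k(x_k \mid x_{k-m},\ldots,x_{k-1}). \tag{6}
\]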
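
As a minimal sketch of how the factorization (6) can be evaluated numerically, the following Python snippet multiplies the initial probability by the successive transition probabilities. The names `joint_probability`, `mu`, and `p`, the two-symbol alphabet, and the particular time-dependent kernel are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def joint_probability(xs, mu, p, m):
    """Evaluate (6): mu(x_0..x_{m-1}) * prod_{k=m}^{n} p_k(x_k | x_{k-m}..x_{k-1})."""
    prob = mu[tuple(xs[:m])]
    for k in range(m, len(xs)):
        prob *= p(k, tuple(xs[k - m:k]), xs[k])
    return prob


# Hypothetical example: alphabet S = {1, 2}, order m = 2.
S = (1, 2)
m = 2
mu = {ctx: 0.25 for ctx in product(S, repeat=m)}  # uniform initial distribution


def p(k, ctx, j):
    """Illustrative nonhomogeneous kernel p_k(j | ctx); each row sums to 1."""
    stay = 0.5 + 0.25 / k  # probability of repeating the last symbol (k >= 2)
    return stay if j == ctx[-1] else 1.0 - stay


# Sanity check: the probabilities of all sequences (x_0, ..., x_3) sum to 1.
total = sum(joint_probability(xs, mu, p, m) for xs in product(S, repeat=4))
```

Summing over all sequences of a fixed length is only feasible for tiny alphabets, but it is a convenient check that a proposed kernel is a valid family of transition probabilities.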

Many practical information sources, such as language and image data, are naturally modeled as mth-order Markov information sources and are typically nonhomogeneous. It is therefore important in information theory to study the limit properties of mth-order nonhomogeneous Markov information sources. Yang and Liu [1] proved the strong law of large numbers and the asymptotic equipartition property, with almost sure (a.s.) convergence, for mth-order nonhomogeneous Markov information sources. However, the central limit theorem for such sources remains open.

The central limit theorem (CLT) for additive functionals of stationary, ergodic Markov information sources has been studied intensively during the last decades [2–9]. Nearly fifty years ago, Dobrushin [10, 11] proved an important central limit theorem for nonhomogeneous Markov information sources in discrete time. Following Dobrushin's work, Statuljavicius [12] and Sarymsakov [13] proved refinements and extensions of his central limit theorem, some of them under more stringent assumptions. Building on Dobrushin's work, Sethuraman and Varadhan [14] gave a shorter and different proof, based on martingale approximation, that further clarifies the assumptions. These works consider only first-order nonhomogeneous Markov chains. In this paper, we study the central limit theorem for mth-order nonhomogeneous Markov information sources in the Cesàro sense.

Let X={Xn,n≥0} be an mth-order nonhomogeneous Markov information source taking values in the state space S={1,2,…,N} with initial distribution (3) and mth-order transition probability matrices (5). Denote
\[
X_m^n = \{X_m, X_{m+1}, \ldots, X_n\}. \tag{7}
\]
We denote the realizations of $X_m^n$ by $x_m^n$, and the mth-order transition matrix at step k by
\[
P_k = \bigl(p_k(j\mid i_1^m)\bigr), \quad j\in S,\ i_1^m\in S^m, \tag{8}
\]
where $p_k(j\mid i_1^m) = P(X_k=j \mid X_{k-m}^{k-1}=i_1^m)$.

For an arbitrary stochastic square matrix A with elements $A_{i,j}$, the ergodic δ-coefficient is defined as
\[
\delta(A) = \sup_{i,j\in S} \sum_{k\in S} [A_{i,k}-A_{j,k}]^{+}, \tag{9}
\]
where $[a]^{+}=\max\{0,a\}$. We extend this idea to an mth-order stochastic matrix Q with elements $q(i_1^m,j)=q(j\mid i_1^m)$ by defining its ergodic δ-coefficient as
\[
\delta(Q) = \sup_{i_1^m,\,j_1^m\in S^m} \sum_{k\in S} [q(i_1^m,k)-q(j_1^m,k)]^{+}. \tag{10}
\]
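
The δ-coefficient in (10) is straightforward to compute for a small alphabet. The sketch below is a hedged illustration: the function name `delta`, the dictionary representation of Q, and the example matrix are assumptions made here. Because S is finite, the supremum is a maximum. The last lines check numerically that, for rows summing to 1, δ(Q) equals half of the largest L1 distance between two rows.

```python
from itertools import product


def delta(q, S, m):
    """Ergodic delta-coefficient (10) of an mth-order stochastic matrix.

    q maps (context, j) -> q(j | context), where context is an m-tuple over S.
    """
    contexts = list(product(S, repeat=m))
    return max(
        sum(max(0.0, q[(a, k)] - q[(b, k)]) for k in S)
        for a in contexts
        for b in contexts
    )


# Hypothetical example: S = {1, 2}, m = 2, rows depend only on the last symbol.
S = (1, 2)
m = 2
q = {}
for ctx in product(S, repeat=m):
    p_one = 0.75 if ctx[-1] == 1 else 0.5
    q[(ctx, 1)] = p_one
    q[(ctx, 2)] = 1.0 - p_one

d = delta(q, S, m)

# Half of the largest L1 distance between rows; agrees with d for stochastic rows.
contexts = list(product(S, repeat=m))
half_l1 = 0.5 * max(
    sum(abs(q[(a, k)] - q[(b, k)]) for k in S) for a in contexts for b in contexts
)
```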

Now we define another stochastic matrix as follows:
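\[
\bar{P} = \bigl(\bar{p}(j_1^m \mid i_1^m)\bigr), \quad i_1^m,\ j_1^m\in S^m, \tag{11}
\]
where
\[
\bar{p}(j_1^m \mid i_1^m) =
\begin{cases}
p(j_m \mid i_1^m), & \text{if } j_v = i_{v+1},\ v=1,2,\ldots,m-1; \\
0, & \text{otherwise.}
\end{cases}
\tag{12}
\]
$\bar{P}$ is called the m-dimensional stochastic matrix determined by the mth-order transition matrix.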
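
The construction in (12) can be sketched in a few lines of Python. The function name `lift` and the dictionary-based representation are assumptions made for illustration. The entry for a pair of contexts is nonzero only when the second context is the first one shifted by one position.

```python
from itertools import product


def lift(p, S, m):
    """Build the m-dimensional stochastic matrix P-bar of (12) as a dict."""
    contexts = list(product(S, repeat=m))
    pbar = {}
    for i in contexts:
        for j in contexts:
            # Nonzero only if j_v = i_{v+1} for v = 1, ..., m-1.
            shifted = j[:-1] == i[1:]
            pbar[(i, j)] = p[(i, j[-1])] if shifted else 0.0
    return pbar


# Hypothetical example: S = {1, 2}, m = 2, kernel depends on the last symbol only.
S = (1, 2)
m = 2
p = {}
for ctx in product(S, repeat=m):
    p_one = 0.75 if ctx[-1] == 1 else 0.5
    p[(ctx, 1)] = p_one
    p[(ctx, 2)] = 1.0 - p_one

pbar = lift(p, S, m)
contexts = list(product(S, repeat=m))
row_sums = [sum(pbar[(i, j)] for j in contexts) for i in contexts]
```

Each row of the lifted matrix inherits its mass from one row of the mth-order matrix, so every entry of `row_sums` should equal 1.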

Let $S_n(i_1^m)=S_n(i_1,i_2,\ldots,i_m)$ be the number of occurrences of $(i_1,i_2,\ldots,i_m)$ in the sequence $X_0^{m-1}, X_1^{m}, \ldots, X_{n-m}^{n-1}$; that is,
\[
S_n(i_1^m) = \sum_{k=m}^{n} I\{X_{k-m}^{k-1}=i_1^m\}. \tag{13}
\]
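
A direct transcription of (13) into Python might look as follows. The function name `count_context` and the sample sequence are hypothetical. Note that the final window $X_{n-m+1}^{n}$ is not counted, because the sum in (13) stops at k=n.

```python
def count_context(xs, pattern):
    """S_n(i_1^m) from (13): occurrences of `pattern` among X_{k-m}^{k-1}, k = m..n."""
    m = len(pattern)
    n = len(xs) - 1  # xs holds X_0, ..., X_n
    return sum(
        1 for k in range(m, n + 1) if tuple(xs[k - m:k]) == tuple(pattern)
    )


# Hypothetical sample path X_0, ..., X_5 over S = {1, 2}.
xs = [1, 2, 1, 2, 1, 1]
count_12 = count_context(xs, (1, 2))
count_11 = count_context(xs, (1, 1))
```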

Lemma 1 (see [1]).
Let X={Xn,n≥0} be an mth-order nonhomogeneous Markov information source taking values in the state space S={1,2,…,N} with initial distribution (3) and mth-order transition probability matrices (5), and let $S_n(i_1^m)$ be defined by (13). Let $P=(p(j\mid i_1^m))$ be another mth-order transition matrix, let $\bar{P}$ be the m-dimensional stochastic matrix determined by P, and let $\pi=(\pi(i_1^m),\ i_1^m\in S^m)$ be a stationary distribution of $\bar{P}$, that is, $\pi=\pi\bar{P}$. Suppose that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=m}^{n} \bigl|p_k(j\mid i_1^m)-p(j\mid i_1^m)\bigr| = 0, \quad \forall j\in S,\ i_1^m\in S^m. \tag{14}
\]
Then one has
\[
\lim_{n\to\infty} \frac{S_n(i_1^m)}{n} = \pi(i_1^m) \quad \text{a.s.} \tag{15}
\]
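
Lemma 1 can be illustrated by simulation. The following sketch is not part of the paper: it assumes S={1,2}, m=2, a limiting kernel that depends only on the last symbol, and a perturbation of size 0.25/k, so that condition (14) holds. For this limiting kernel a short calculation gives π(1,1)=0.5, and the empirical frequency of the context (1,1) should be close to that value for large n.

```python
import random


def simulate_frequency(n, seed=0):
    """Empirical frequency S_n((1, 1)) / n for a toy nonhomogeneous source."""
    rng = random.Random(seed)
    prev, last = 1, 1  # fixed X_0, X_1 (a simplifying assumption)
    count_11 = 0
    for k in range(2, n + 1):
        if (prev, last) == (1, 1):  # indicator I{X_{k-2}^{k-1} = (1, 1)}
            count_11 += 1
        # Limiting kernel: p(1 | ..1) = 0.75, p(1 | ..2) = 0.5.
        base = 0.75 if last == 1 else 0.5
        p_one = base - 0.25 / k  # p_k converges to p, so (14) holds
        nxt = 1 if rng.random() < p_one else 2
        prev, last = last, nxt
    return count_11 / n


freq = simulate_frequency(200_000)
```

A single simulated path only suggests the almost sure convergence in (15); it does not prove it.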

3. Proof of Theorem 2
Let {Ω,ℱ,P} be a probability space and let {Mn,n=1,2,…} be a sequence of random variables defined on {Ω,ℱ,P}. Let {ℱn,n=1,2,…} be an increasing sequence of sub-σ-fields of ℱ. Suppose that {Mn,ℱn,n=1,2,…} is a martingale, so that
\[
D_0 = 0, \quad D_n = M_n - M_{n-1}, \quad n=1,2,\ldots, \tag{22}
\]
is a martingale difference sequence. Let ℱ0 be the trivial σ-field. For n=1,2,…, denote
\[
\sigma_n^2 = E(D_n^2 \mid \mathcal{F}_{n-1}), \quad V_n^2 = \sum_{j=1}^{n}\sigma_j^2, \quad v_n^2 = E(V_n^2) = E(M_n^2). \tag{23}
\]

Lindeberg Condition. For all ϵ>0,
\[
\lim_{n\to\infty} \frac{\sum_{j=1}^{n} E\bigl[D_j^2\, I\{|D_j|\ge \epsilon v_n\}\bigr]}{v_n^2} = 0, \tag{24}
\]
where I{·} denotes the indicator function.

In our proof, we use the central limit theorem for martingales as the main technical tool.

Lemma 4 (see [15]).
Suppose that the martingale {Mn,ℱn,n=1,2,…} satisfies the following condition:
\[
\frac{V_n^2}{v_n^2} \Rightarrow_{p} 1. \tag{25}
\]
If, moreover, the Lindeberg condition holds, then one has
\[
\frac{M_n}{v_n} \Rightarrow_{D} N(0,1), \tag{26}
\]
where $\Rightarrow_{p}$ and $\Rightarrow_{D}$ denote convergence in probability and in distribution, respectively.

Before proving our main result, Theorem 2, we first prove Theorem 5.

Theorem 5.
Let X={Xn,n≥0} be an mth-order nonhomogeneous Markov information source taking values in the state space S={1,2,…,N} with initial distribution (3) and mth-order transition probability matrices (5). Let f be any function defined on $S^{m+1}$ that satisfies condition (21), and let {Wn,n≥0} be defined as in (16). If (14) holds, then
\[
\frac{W_n}{\sqrt{n}\,\sigma} \Rightarrow_{D} N(0,1), \tag{27}
\]
where $\Rightarrow_{D}$ denotes convergence in distribution.

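Proof of Theorem 5.
By the properties of conditional expectation and the Markov property, it follows from (17) that
\[
\begin{aligned}
\frac{V_n^2}{n} &= \frac{1}{n}\sum_{k=m}^{n} E[D_k^2 \mid \mathcal{F}_{k-1}] \\
&= \frac{1}{n}\sum_{k=m}^{n} \Bigl\{ E[f^2(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - \bigl(E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}]\bigr)^2 \Bigr\} \\
&:= I_1(n) - I_2(n),
\end{aligned}
\tag{28}
\]
where
\[
\begin{aligned}
I_1(n) &= \frac{1}{n}\sum_{k=m}^{n} E[f^2(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] \\
&= \frac{1}{n}\sum_{k=m}^{n} \sum_{j\in S} \sum_{i_1^m\in S^m} f^2(i_1^m,j)\, p_k(j\mid i_1^m)\, I\{X_{k-m}^{k-1}=i_1^m\} \\
&= \sum_{j\in S} \sum_{i_1^m\in S^m} f^2(i_1^m,j)\, \frac{1}{n}\sum_{k=m}^{n} p_k(j\mid i_1^m)\, I\{X_{k-m}^{k-1}=i_1^m\},
\end{aligned}
\tag{29}
\]
\[
\begin{aligned}
I_2(n) &= \frac{1}{n}\sum_{k=m}^{n} \bigl(E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}]\bigr)^2 \\
&= \frac{1}{n}\sum_{k=m}^{n} \sum_{i_1^m\in S^m} \Bigl[\sum_{j\in S} f(i_1^m,j)\, p_k(j\mid i_1^m)\Bigr]^2 I\{X_{k-m}^{k-1}=i_1^m\}.
\end{aligned}
\tag{30}
\]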
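On the one hand,
\[
\Bigl| \frac{1}{n}\sum_{k=m}^{n} I\{X_{k-m}^{k-1}=i_1^m\}\bigl[p_k(j\mid i_1^m)-p(j\mid i_1^m)\bigr] \Bigr| \le \frac{1}{n}\sum_{k=m}^{n} \bigl|p_k(j\mid i_1^m)-p(j\mid i_1^m)\bigr|, \tag{31}
\]
which tends to zero as n tends to infinity by (14). Thus we have
\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n}\sum_{k=m}^{n} I\{X_{k-m}^{k-1}=i_1^m\}\, p_k(j\mid i_1^m)
&= \lim_{n\to\infty} \frac{1}{n}\sum_{k=m}^{n} I\{X_{k-m}^{k-1}=i_1^m\}\, p(j\mid i_1^m) \\
&= \lim_{n\to\infty} \frac{1}{n} S_n(i_1^m)\, p(j\mid i_1^m) \\
&= \pi(i_1^m)\, p(j\mid i_1^m) \quad \text{a.s.},
\end{aligned}
\tag{32}
\]
where the third equality holds by (15). Combining (29) and (32), we get
\[
\lim_{n\to\infty} I_1(n) = \sum_{j\in S} \sum_{i_1^m\in S^m} \pi(i_1^m)\, p(j\mid i_1^m)\, f^2(i_1^m,j) \quad \text{a.s.} \tag{33}
\]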
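On the other hand, we compute the limit of $I_2(n)$ as n tends to infinity. By using (14) again, we have
\[
\begin{aligned}
&\Bigl| I_2(n) - \frac{1}{n}\sum_{k=m}^{n} \sum_{i_1^m\in S^m} \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 I\{X_{k-m}^{k-1}=i_1^m\} \Bigr| \\
&\quad \le \frac{1}{n}\sum_{k=m}^{n} \sum_{i_1^m\in S^m} \Bigl[\sum_{j\in S} f(i_1^m,j)\, \bigl|p_k(j\mid i_1^m)-p(j\mid i_1^m)\bigr|\Bigr] \times \Bigl[\sum_{j\in S} f(i_1^m,j)\bigl(p_k(j\mid i_1^m)+p(j\mid i_1^m)\bigr)\Bigr] \\
&\quad \le 2\Bigl(\max_{i_1^m\in S^m,\, j\in S} f(i_1^m,j)\Bigr)^2 \sum_{i_1^m\in S^m} \sum_{j\in S} \frac{\sum_{k=m}^{n} \bigl|p_k(j\mid i_1^m)-p(j\mid i_1^m)\bigr|}{n} \longrightarrow 0 \quad \text{as } n\to\infty.
\end{aligned}
\tag{34}
\]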

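Thus, by (34) and Lemma 1, we arrive at
\[
\begin{aligned}
\lim_{n\to\infty} I_2(n)
&= \lim_{n\to\infty} \sum_{i_1^m\in S^m} \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \frac{1}{n}\sum_{k=m}^{n} I\{X_{k-m}^{k-1}=i_1^m\} \\
&= \lim_{n\to\infty} \sum_{i_1^m\in S^m} \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \frac{S_n(i_1^m)}{n} \\
&= \sum_{i_1^m\in S^m} \pi(i_1^m) \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \quad \text{a.s.}
\end{aligned}
\tag{35}
\]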

Combining (28), (33), and (35), we arrive at
\[
\lim_{n\to\infty} \frac{V_n^2}{n} = \sum_{i_1^m\in S^m} \pi(i_1^m) \Bigl\{ \sum_{j\in S} f^2(i_1^m,j)\, p(j\mid i_1^m) - \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \Bigr\} \quad \text{a.s.}, \tag{36}
\]
which implies that
\[
\lim_{n\to\infty} \frac{V_n^2}{n} = \sum_{i_1^m\in S^m} \pi(i_1^m) \Bigl\{ \sum_{j\in S} f^2(i_1^m,j)\, p(j\mid i_1^m) - \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \Bigr\} \quad \text{in probability.} \tag{37}
\]
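Note that
\[
\begin{aligned}
\frac{V_n^2}{n} &\le \max_{m\le k\le n} E[D_k^2 \mid X_{k-m}^{k-1}] \\
&= \max_{m\le k\le n} \Bigl\{ E[f^2(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - \bigl(E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}]\bigr)^2 \Bigr\} \\
&\le \max_{i_1^m\in S^m,\, j\in S} f^2(i_1^m,j).
\end{aligned}
\tag{38}
\]
Since S is a finite set, the bound in (38) is a finite constant, so the random sequence $\{V_n^2/n,\ n\ge 1\}$ is uniformly bounded and hence uniformly integrable. Combining (37) with this uniform integrability, we arrive at
\[
\lim_{n\to\infty} \frac{E[V_n^2]}{n} = \sum_{i_1^m\in S^m} \pi(i_1^m) \Bigl\{ \sum_{j\in S} f^2(i_1^m,j)\, p(j\mid i_1^m) - \Bigl[\sum_{j\in S} f(i_1^m,j)\, p(j\mid i_1^m)\Bigr]^2 \Bigr\} = \sigma^2 > 0. \tag{39}
\]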
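
The limiting variance σ² in (39) is a π-weighted average of conditional variances, which makes it easy to evaluate for a concrete kernel. The sketch below is a hedged illustration rather than the authors' procedure. It assumes S={1,2}, m=2, a limiting kernel depending only on the last symbol, and f(i_1^m, j)=j. The stationary distribution is approximated by power iteration, which presupposes that the chain determined by the lifted matrix is ergodic. The names `stationary` and `sigma_squared` are hypothetical.

```python
from itertools import product


def stationary(p, S, m, iters=200):
    """Approximate pi = pi * P-bar by power iteration from the uniform vector."""
    contexts = list(product(S, repeat=m))
    pi = {c: 1.0 / len(contexts) for c in contexts}
    for _ in range(iters):
        nxt = {c: 0.0 for c in contexts}
        for c in contexts:
            for j in S:
                # Context c followed by symbol j moves to the shifted context.
                nxt[c[1:] + (j,)] += pi[c] * p[(c, j)]
        pi = nxt
    return pi


def sigma_squared(p, f, pi, S):
    """Evaluate the right-hand side of (39)."""
    total = 0.0
    for c, weight in pi.items():
        mean = sum(f(c, j) * p[(c, j)] for j in S)
        second = sum(f(c, j) ** 2 * p[(c, j)] for j in S)
        total += weight * (second - mean ** 2)
    return total


# Hypothetical example kernel and test function.
S = (1, 2)
m = 2
p = {}
for ctx in product(S, repeat=m):
    p_one = 0.75 if ctx[-1] == 1 else 0.5
    p[(ctx, 1)] = p_one
    p[(ctx, 2)] = 1.0 - p_one

pi = stationary(p, S, m)
sigma2 = sigma_squared(p, lambda ctx, j: j, pi, S)
```

For this toy kernel the value can be checked by hand: contexts ending in 1 carry total stationary mass 2/3 and conditional variance 3/16, contexts ending in 2 carry mass 1/3 and conditional variance 1/4, which gives σ² = 5/24.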

It follows that
\[
\frac{V_n^2}{v_n^2} \Rightarrow_{p} 1, \tag{40}
\]
where $v_n^2 = E[V_n^2] = E[W_n^2]$. On the other hand, by an argument similar to that for inequality (38), the sequence $\{D_k^2 = [f(X_{k-m}^{k}) - E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}]]^2\}$ is uniformly integrable, so that
\[
\lim_{n\to\infty} \frac{\sum_{j=m}^{n} E\bigl[D_j^2\, I(|D_j|\ge \epsilon\sqrt{n})\bigr]}{n} = 0, \tag{41}
\]
which implies that the Lindeberg condition holds. The conclusion then follows from Lemma 4.

We now prove our main result, Theorem 2.

Proof of Theorem 2.
Note that
\[
S_n - E[S_n] = W_n + \sum_{k=m}^{n} \Bigl[ E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - E[f(X_{k-m}^{k})] \Bigr]. \tag{42}
\]

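Denote
\[
P(X_{k-m}^{k-1}=s_1^m,\ X_k=j) = P_k(s_1^m,j) \tag{43}
\]
and $M=\sup_{s_1^m\in S^m,\, j\in S} f(s_1^m,j)$. We now estimate an upper bound for $|E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - E[f(X_{k-m}^{k})]|$. By the Chapman-Kolmogorov (C-K) formula,
\[
\begin{aligned}
&\bigl| E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - E[f(X_{k-m}^{k})] \bigr| \\
&\quad = \Bigl| \sum_{j\in S} f(X_{k-m}^{k-1},j)\, p_k(j\mid X_{k-m}^{k-1}) - \sum_{s_1^m\in S^m,\, j\in S} f(s_1^m,j)\, P_k(s_1^m,j) \Bigr| \\
&\quad \le \sup_{i_1^m} \Bigl| \sum_{j\in S} f(i_1^m,j) \Bigl[ p_k(j\mid i_1^m) - \sum_{s_1^m} P(X_{k-m}^{k-1}=s_1^m)\, p_k(j\mid s_1^m) \Bigr] \Bigr| \\
&\quad \le M \sup_{i_1^m} \sum_{j} \Bigl| p_k(j\mid i_1^m) - \sum_{s_1^m} P(X_{k-m}^{k-1}=s_1^m)\, p_k(j\mid s_1^m) \Bigr| \\
&\quad = M \sup_{i_1^m} \sum_{j} \Bigl| \sum_{s_1^m} P(X_{k-m}^{k-1}=s_1^m)\, p_k(j\mid i_1^m) - \sum_{s_1^m} P(X_{k-m}^{k-1}=s_1^m)\, p_k(j\mid s_1^m) \Bigr| \\
&\quad \le M \sum_{s_1^m} P(X_{k-m}^{k-1}=s_1^m) \times \sup_{i_1^m\in S^m} \sum_{j\in S} \bigl| p_k(j\mid i_1^m) - p_k(j\mid s_1^m) \bigr| \\
&\quad \le M \sup_{l_1^m,\, k_1^m} \sum_{j\in S} \bigl| p_k(j\mid l_1^m) - p_k(j\mid k_1^m) \bigr| \\
&\quad = 2M\delta(P_k),
\end{aligned}
\tag{44}
\]
where $\delta(P_k) = \sup_{l_1^m,\, k_1^m} \sum_{j\in S} [p_k(j\mid l_1^m) - p_k(j\mid k_1^m)]^{+} = (1/2)\sup_{l_1^m,\, k_1^m} \sum_{j\in S} |p_k(j\mid l_1^m) - p_k(j\mid k_1^m)|$. By using condition (19), we get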
\[
\lim_{n\to\infty} \frac{\sum_{k=m}^{n} \bigl[ E[f(X_{k-m}^{k}) \mid X_{k-m}^{k-1}] - E[f(X_{k-m}^{k})] \bigr]}{\sqrt{n}} = 0. \tag{45}
\]
Then, by (27), (42), and (45), we arrive at conclusion (20). This completes the proof of Theorem 2.