Journal of Probability and Statistics, Hindawi. ISSN 1687-952X (print), 1687-9538 (online). DOI: 10.1155/2018/8068196. Article ID 8068196. Research Article: A Novel Entropy-Based Decoding Algorithm for a Generalized High-Order Discrete Hidden Markov Model. Jason Chin-Tiong Chan (ORCID 0000-0003-0694-7273), Ted Rogers School of Management, Ryerson University, 350 Victoria St., Toronto, ON, Canada M5B 2K3; Hong Choon Ong (ORCID 0000-0002-3253-0538), School of Mathematical Sciences, Universiti Sains Malaysia, 11800 Gelugor, Penang, Malaysia. Academic Editor: Steve Su. Received 15 December 2017; Revised 12 February 2018; Accepted 27 February 2018; Published 2018. Copyright © 2018 Jason Chin-Tiong Chan and Hong Choon Ong. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The optimal state sequence of a generalized high-order hidden Markov model (HHMM) is tracked from a given observational sequence using the classical Viterbi algorithm, which is based on the maximum likelihood criterion. We introduce an entropy-based Viterbi algorithm for tracking the optimal state sequence of a HHMM. The entropy of a state sequence is a useful quantity, providing a measure of the uncertainty of a HHMM: there is no uncertainty if only one optimal state sequence is possible for the HHMM. This entropy-based decoding algorithm can be formulated with either an extension or a reduction approach. We first extend the entropy-based algorithm for computing the optimal state sequence, originally developed for a first-order HMM, to a generalized HHMM with a single observational sequence. The computational requirements of this extended algorithm grow exponentially with the order of the HMM because of the growth in the number of model parameters. We therefore introduce an efficient entropy-based decoding algorithm that uses a reduction approach, namely, the entropy-based order-transformation forward algorithm (EOTFA), to compute the optimal state sequence of any generalized HHMM. The EOTFA algorithm transforms a generalized high-order HMM into an equivalent first-order HMM and develops an entropy-based decoding algorithm for the equivalent model. The computation is linear in the length of the observational sequence and requires $O(T\tilde{N}^2)$ calculations, where $\tilde{N}$ is the number of states in the equivalent first-order model and $T$ is the length of the observational sequence.

1. Introduction

The state sequence of a hidden Markov model (HMM) is invisible, but we can track the most likely state sequence given the model parameters and an observational sequence. The restored state sequence has many applications, especially when the hidden states have meaningful interpretations for making predictions. For example, Ciriza et al. determined the optimal printing rate based on the HMM model parameters and an optimal time-out based on the restored states. The classical Viterbi algorithm is the most common technique for tracking the state sequence from a given observational sequence. However, it does not measure the uncertainty present in the solution. Proakis and Salehi proposed a method for measuring the error of a single state, but this method cannot measure the error of the entire state sequence. Hernando et al. proposed using entropy to measure the uncertainty of the state sequence of a first-order HMM tracked from a single observational sequence of length T. Their method is based on a forward recursion integrated with entropy for computing the optimal state sequence. Mann and McCallum developed an algorithm for computing the subsequent constrained entropy of an HMM, which is similar to the probabilistic model of conditional random fields (CRF). Ilic developed an algorithm based on forward-backward recursion over an entropy semiring, namely, the Entropy Semiring Forward-Backward (ESRFB) algorithm, for a first-order HMM with a single observational sequence. ESRFB has a lower memory requirement than Mann and McCallum's algorithm for subsequent constrained entropy computation.

This paper is organized as follows. In Section 2, we define the generalized HHMM and present the extended entropy-based algorithm for computing the optimal state sequence developed by Hernando et al.  from a first-order to a generalized HHMM. In Section 3, we first review the high-order transformation algorithm proposed by Hadar and Messer  and then we introduce EOTFA, an entropy-based order-transformation forward algorithm for computing the optimal state sequence for any generalized HHMM. We discuss future research in Section 4 on entropy associated with state sequence of a generalized high-order HMM.

2. Entropy-Based Decoding Algorithm with an Extended Approach

The uncertainty in a HHMM can be quantified by entropy. We apply this concept to quantify the uncertainty of the state sequence tracked from a single observational sequence and the model parameters. The entropy of the state sequence equals 0 if there is only one possible state sequence that could have generated the observation sequence, as there is then no uncertainty in the solution. The higher this entropy, the higher the uncertainty involved in tracking the hidden state sequence. We extend the entropy-based Viterbi algorithm developed by Hernando et al. for computing the optimal state sequence from a first-order HMM to a high-order HMM, that is, a $k$th-order HMM with $k \ge 2$. The state entropy in the HHMM is computed recursively in order to reduce the computational complexity from $O(N^{kT})$ for the direct evaluation method to $O(TN^{k+1})$, where $N$ is the number of states, $T$ is the length of the observational sequence, and $k$ is the order of the hidden Markov model. In terms of memory, the entropy-based Viterbi algorithm is more efficient, requiring $O(N^{k+1})$ space, compared with the classical Viterbi algorithm, which requires $O(TN^{k+1})$. The memory requirement of the classical Viterbi algorithm depends on the length of the observational sequence because of its "backtracking" step in computing the optimal state sequence.

Before introducing the extended entropy-based Viterbi algorithm, we define a generalized high-order HMM, that is, a $k$th-order HMM with $k \ge 2$. This is followed by the definitions of the forward and backward probability variables for a generalized high-order HMM. These variables are required for computing the optimal state sequence in our decoding algorithm.

2.1. Elements of HHMM

A HHMM involves two stochastic processes, namely, a hidden state process and an observation process. The hidden state process cannot be directly observed; however, it can be observed through the observation process. The observational sequence is generated by the observation process incorporated with the hidden state process. A discrete HHMM must satisfy the following conditions.

The hidden state process $\{q_t\}_{t=2-k}^{T}$ is a $k$th-order Markov chain that satisfies

(1) $P(q_t \mid q_l,\ l < t) = P(q_t \mid q_l,\ l = t-k, \ldots, t-1)$,

where $q_t$ denotes the hidden state at time $t$ and $q_t \in S$, where $S$ is the finite set of hidden states.

The observation process $\{o_t\}_{t=1}^{T}$ is incorporated with the hidden state process according to the state probability distribution that satisfies

(2) $P(o_t \mid o_l,\ l < t,\ q_l,\ l \le t) = P(o_t \mid q_l,\ l = t-k+1, \ldots, t)$,

where $o_t$ denotes the observation at time $t$ and $o_t \in V$, where $V$ is the finite set of observation symbols.

The elements of the $k$th-order discrete HMM are as follows:

Number of distinct hidden states, $N$

Number of distinct observed symbols, $M$

Length of observational sequence, $T$

Observational sequence, $O = (o_t),\ t = 1, 2, \ldots, T$

Hidden state sequence, $Q = (q_t),\ t = 2-k, \ldots, T$

Possible values for each state, $S = \{s_i\},\ i = 1, 2, \ldots, N$

Possible symbols per observation, $V = \{v_w\},\ w = 1, 2, \ldots, M$

Initial hidden state probability vectors, $\pi_{i_1}, \pi_{i_1 i_2}, \ldots, \pi_{i_1 \cdots i_k}$,

where $\pi_{i_1}$ is the probability that the model starts from state $s_{i_1}$,

(3) $\pi_{i_1} = P(q_1 = s_{i_1})$, $\quad \sum_{i_1=1}^{N} \pi_{i_1} = 1$, $\quad \pi_{i_1} \ge 0$, $\quad 1 \le i_1 \le N$;

$\pi_{i_1 i_2}$ is the probability that the model starts from state $s_{i_1}$ and then state $s_{i_2}$,

(4) $\pi_{i_1 i_2} = P(q_0 = s_{i_1}, q_1 = s_{i_2})$, $\quad \sum_{i_2=1}^{N} \pi_{i_1 i_2} = 1$, $\quad \pi_{i_1 i_2} \ge 0$, $\quad 1 \le i_1, i_2 \le N$;

and $\pi_{i_1 \cdots i_k}$ is the probability that the model starts from state $s_{i_1}$, state $s_{i_2}, \ldots$, and state $s_{i_k}$,

(5) $\pi_{i_1 \cdots i_k} = P(q_{2-k} = s_{i_1}, q_{3-k} = s_{i_2}, \ldots, q_1 = s_{i_k})$, $\quad \sum_{i_k=1}^{N} \pi_{i_1 \cdots i_k} = 1$, $\quad \pi_{i_1 \cdots i_k} \ge 0$, $\quad 1 \le i_1, i_2, \ldots, i_k \le N$

State transition probability matrices, $A_1 = [a_{i_1 i_2}], A_2 = [a_{i_1 i_2 i_3}], \ldots, A_k = [a_{i_1 i_2 \cdots i_{k+1}}]$,

where $A_{j-1}$ is the $j$-dimensional state transition probability matrix and $a_{i_1 i_2 \cdots i_j}$ is the probability of a transition to state $s_{i_j}$ given that the model has transited from state $s_{i_1}$ to state $s_{i_2}$, ..., to state $s_{i_{j-1}}$, where $j = 2, \ldots, k+1$,

(6) $a_{i_1 \cdots i_j} = P(q_t = s_{i_j} \mid q_{t-1} = s_{i_{j-1}}, q_{t-2} = s_{i_{j-2}}, \ldots, q_{t-j+1} = s_{i_1})$, $\quad \sum_{i_j=1}^{N} a_{i_1 i_2 \cdots i_j} = 1$, $\quad a_{i_1 i_2 \cdots i_j} \ge 0$

Emission probability matrices, $B_1 = [b_{i_1}(v_m)], B_2 = [b_{i_1 i_2}(v_m)], \ldots, B_k = [b_{i_1 \cdots i_k}(v_m)]$,

where $B_1$ is the two-dimensional emission probability matrix and $b_{i_1}(v_m)$ is the probability of observing $v_m$ in state $s_{i_1}$,

(7) $b_{i_1}(v_m) = P(o_t = v_m \mid q_t = s_{i_1})$, $\quad \sum_{m=1}^{M} b_{i_1}(v_m) = 1$, $\quad b_{i_1}(v_m) \ge 0$, $\quad 1 \le i_1 \le N$,

and $B_j$ is the $(j+1)$-dimensional emission probability matrix and $b_{i_1 \cdots i_j}(v_m)$ is the probability of observing $v_m$ given state $s_{i_1}$ at time $t-j+1$, $s_{i_2}$ at time $t-j+2, \ldots$, and $s_{i_j}$ at time $t$, where $j = 2, \ldots, k$,

(8) $b_{i_1 \cdots i_j}(v_m) = P(o_t = v_m \mid q_t = s_{i_j}, q_{t-1} = s_{i_{j-1}}, \ldots, q_{t-j+1} = s_{i_1})$, $\quad \sum_{m=1}^{M} b_{i_1 \cdots i_j}(v_m) = 1$, $\quad b_{i_1 \cdots i_j}(v_m) \ge 0$, $\quad 1 \le i_1, i_2, \ldots, i_j \le N$

For the $k$th-order discrete HMM, we summarize the parameters by the components of $\lambda = (\pi_{i_1}, \pi_{i_1 i_2}, \ldots, \pi_{i_1 i_2 \cdots i_k}, A_1, A_2, \ldots, A_k, B_1, B_2, \ldots, B_k)$.

Note that throughout this paper, we will use the following notations.

$q_{1:t}$ denotes $q_1, q_2, \ldots, q_t$

$o_{1:t}$ denotes $o_1, o_2, \ldots, o_t$

2.2. Forward and Backward Probability

The entropy-based algorithm proposed by Hernando et al. for computing the optimal state sequence of a first-order HMM is built on a forward recursion. Recently, high-order HMMs have been widely used in a variety of applications such as speech recognition [8, 9] and longitudinal data analysis [10, 11]. In the HHMM, the Markov assumption is weakened, since the next state depends not only on the current state but also on earlier states; the extent of this dependency is determined by the order of the HMM. Hence, we must modify the classical forward and backward probability variables for the HHMM, that is, the $k$th-order HMM with $k \ge 2$, as follows.

Definition 1.

The forward variable $\alpha_t(i_2, i_3, \ldots, i_{k+1})$ in the $k$th-order HMM is the joint probability of the partial observation sequence $o_1, o_2, \ldots, o_t$ and the hidden states $s_{i_2}$ at time $t-k+1$, $s_{i_3}$ at time $t-k+2, \ldots, s_{i_{k+1}}$ at time $t$, where $1 \le t \le T$:

(9) $\alpha_t(i_2, i_3, \ldots, i_{k+1}) = P(o_1, o_2, \ldots, o_t, q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}} \mid \lambda)$.

From (9), with $t = 1$ and $1 \le i_2, i_3, \ldots, i_{k+1} \le N$, we obtain the initial forward variable

(10) $\alpha_1(i_2, i_3, \ldots, i_{k+1}) = P(o_1, q_{2-k} = s_{i_2}, q_{3-k} = s_{i_3}, \ldots, q_1 = s_{i_{k+1}} \mid \lambda) = P(q_{2-k} = s_{i_2}, \ldots, q_1 = s_{i_{k+1}})\, P(o_1 \mid q_{2-k} = s_{i_2}, \ldots, q_1 = s_{i_{k+1}}) = \pi_{i_2 i_3 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_1)$.

From (9) and (10), with $1 \le i_1, i_2, \ldots, i_{k+1} \le N$, we obtain the recursive forward variable for $t = 2, \ldots, T$:

(11) $\alpha_t(i_2, i_3, \ldots, i_{k+1}) = \sum_{i_1=1}^{N} P(o_1, \ldots, o_t, q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}} \mid \lambda) = \sum_{i_1=1}^{N} \alpha_{t-1}(i_1, i_2, \ldots, i_k)\, a_{i_1 i_2 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_t)$.
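The recursions (9)-(11) map naturally onto $k$-dimensional arrays. The following is a minimal sketch, not code from the paper; the function name `high_order_forward` and the array layout are our own assumptions:

```python
import numpy as np

def high_order_forward(pi, A, B, obs):
    """Unnormalized forward recursion (9)-(11) for a k-th-order HMM.

    pi  : array of shape (N,)*k        -- initial joint state probabilities
    A   : array of shape (N,)*(k+1)    -- a[i1,...,ik+1], highest-order transitions
    B   : array of shape (N,)*k + (M,) -- b[i2,...,ik+1, m], emission probabilities
    obs : list of observation symbol indices o_1..o_T
    Returns alpha_T, shape (N,)*k, indexed by the last k states.
    """
    alpha = pi * B[..., obs[0]]                      # initialization, eq. (10)
    for o in obs[1:]:                                # recursion, eq. (11)
        # sum out the oldest state i1, then weight by the emission term
        alpha = (alpha[..., None] * A).sum(axis=0) * B[..., o]
    return alpha
```

Summing the returned array over all of its indices gives $P(O \mid \lambda)$, consistent with (15).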

Definition 2.

The backward probability variable $\beta_t(i_1, i_2, \ldots, i_k)$ in the $k$th-order HMM is the conditional probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given the hidden states $s_{i_1}$ at time $t-k+1$, $s_{i_2}$ at time $t-k+2, \ldots$, and $s_{i_k}$ at time $t$:

(12) $\beta_t(i_1, i_2, \ldots, i_k) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_{t-k+1} = s_{i_1}, q_{t-k+2} = s_{i_2}, \ldots, q_t = s_{i_k}, \lambda)$,

where $1 \le t \le T$ and $1 \le i_1, i_2, \ldots, i_k \le N$.

We set the initial backward probability variable to

(13) $\beta_T(i_1, i_2, \ldots, i_k) = 1$.

From (12) and (13), we obtain the recursive backward probability variable for $t = 1, 2, \ldots, T-1$:

(14) $\beta_t(i_1, i_2, \ldots, i_k) = \sum_{i_{k+1}=1}^{N} \beta_{t+1}(i_2, i_3, \ldots, i_{k+1})\, a_{i_1 i_2 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_{t+1})$.

The probability of the observational sequence given the model parameters for a first-order HMM can be represented using the classical forward and backward probability variables. We extend this to the HHMM using our modified forward and backward probability variables. The proof is due to Rabiner.

Definition 3.

Let $\alpha_t(i_1, i_2, \ldots, i_k)$ and $\beta_t(i_1, i_2, \ldots, i_k)$ be the forward and backward probability variables, respectively. Then $P(O \mid \lambda)$ can be expressed in terms of these variables as

(15) $P(O \mid \lambda) = P(o_1, \ldots, o_T \mid \lambda) = \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} \cdots \sum_{i_k=1}^{N} \alpha_t(i_1, i_2, \ldots, i_k)\, \beta_t(i_1, i_2, \ldots, i_k)$.

Proof.

(16) $P(O \mid \lambda) = P(o_1, o_2, \ldots, o_T \mid \lambda) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} P(o_1, \ldots, o_T, q_{t-k+1} = s_{i_1}, \ldots, q_t = s_{i_k} \mid \lambda) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} P(o_1, \ldots, o_t, q_{t-k+1} = s_{i_1}, \ldots, q_t = s_{i_k} \mid \lambda) \times P(o_{t+1}, \ldots, o_T \mid q_{t-k+1} = s_{i_1}, \ldots, q_t = s_{i_k}, \lambda) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} \alpha_t(i_1, i_2, \ldots, i_k)\, \beta_t(i_1, i_2, \ldots, i_k)$.

We now normalize both the forward and backward probability variables. These normalized variables are required as intermediate variables for the state entropy computation.

Definition 4.

The normalized forward probability variable $\hat{\alpha}_t(i_2, i_3, \ldots, i_{k+1})$ in the $k$th-order HMM is defined as the probability of the hidden states $s_{i_2}$ at time $t-k+1$, $s_{i_3}$ at time $t-k+2, \ldots, s_{i_{k+1}}$ at time $t$ given the partial observation sequence $o_1, o_2, \ldots, o_t$, where $1 \le t \le T$:

(17) $\hat{\alpha}_t(i_2, i_3, \ldots, i_{k+1}) = P(q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}} \mid o_1, o_2, \ldots, o_t)$.

From (10) and (17), with $t = 1$ and $1 \le i_2, \ldots, i_{k+1} \le N$, the initial normalized forward probability variable is

(18) $\hat{\alpha}_1(i_2, i_3, \ldots, i_{k+1}) = \dfrac{P(q_{2-k} = s_{i_2}, \ldots, q_1 = s_{i_{k+1}}, o_1)}{P(o_1)} = \dfrac{\pi_{i_2 i_3 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_1)}{r_0}$,

where

(19) $r_0 = \sum_{j_k=1}^{N} \cdots \sum_{j_1=1}^{N} \pi_{j_1 j_2 \cdots j_k}\, b_{j_1 j_2 \cdots j_k}(o_1)$.

From (11), (17), and (18), with $t = 2, \ldots, T$ and $1 \le i_1, i_2, \ldots, i_{k+1} \le N$, the recursive normalized forward probability variable is

(20) $\hat{\alpha}_t(i_2, i_3, \ldots, i_{k+1}) = \dfrac{\sum_{i_1=1}^{N} \hat{\alpha}_{t-1}(i_1, i_2, \ldots, i_k)\, a_{i_1 i_2 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_t)}{r_t}$,

where

(21) $r_t = \sum_{j_k=1}^{N} \cdots \sum_{j_1=1}^{N} \sum_{i_1=1}^{N} \hat{\alpha}_{t-1}(i_1, j_1, \ldots, j_{k-1})\, a_{i_1 j_1 \cdots j_k}\, b_{j_1 j_2 \cdots j_k}(o_t)$.

Note that the normalization factor $r_t$ ensures that the probabilities sum to one; it also equals the conditional observation probability $P(o_t \mid o_1, \ldots, o_{t-1})$.

Definition 5.

The normalized backward probability variable $\hat{\beta}_t(i_1, i_2, \ldots, i_k)$ in the $k$th-order HMM is defined as the quotient of the conditional probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given the hidden states $s_{i_1}$ at time $t-k+1$, $s_{i_2}$ at time $t-k+2, \ldots, s_{i_k}$ at time $t$, and the conditional probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given the observations $o_1, o_2, \ldots, o_t$:

(22) $\hat{\beta}_t(i_1, i_2, \ldots, i_k) = \dfrac{P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_{t-k+1} = s_{i_1}, q_{t-k+2} = s_{i_2}, \ldots, q_t = s_{i_k})}{P(o_{t+1}, o_{t+2}, \ldots, o_T \mid o_1, o_2, \ldots, o_t)}$,

where $1 \le t \le T$ and $1 \le i_1, i_2, \ldots, i_k \le N$.

From (14) and (22), the recursive normalized backward probability variable is

(23) $\hat{\beta}_t(i_1, i_2, \ldots, i_k) = \dfrac{\sum_{i_{k+1}=1}^{N} \hat{\beta}_{t+1}(i_2, i_3, \ldots, i_{k+1})\, a_{i_1 i_2 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_{t+1})}{r_{t+1}}$,

where

(24) $r_{t+1} = \sum_{j_k=1}^{N} \cdots \sum_{j_1=1}^{N} \sum_{i_1=1}^{N} \hat{\alpha}_t(i_1, j_1, \ldots, j_{k-1})\, a_{i_1 j_1 \cdots j_k}\, b_{j_1 j_2 \cdots j_k}(o_{t+1})$.

Our extended algorithm uses the normalized forward recursion given by (18) and (20). For the $k$th-order HMM, the extended algorithm requires $O(TN^{k+1})$ calculations whether we use the normalized forward recursion given by (18) and (20) or the normalized backward recursion given by (13) and (23). The direct evaluation method, in comparison, requires $O(N^{T+k-1})$ calculations, where $N$ is the number of states, $T$ is the length of the observational sequence, and $k$ is the order of the hidden Markov model.

2.3. The Algorithm by Hernando et al.

Hernando et al. pioneered the use of entropy to compute the optimal state sequence of a first-order HMM with a single observational sequence. The algorithm is based on the first-order HMM normalized forward probability,

(25) $\hat{\alpha}_t(j) = P(q_t = s_j \mid o_1, o_2, \ldots, o_t)$,

the auxiliary probability,

(26) $P(q_{t-1} = s_i \mid q_t = s_j, o_{1:t})$,

and the intermediate entropy,

(27) $H_t(s_j) = H(q_{1:t-1} \mid q_t = s_j, o_{1:t})$.

The entropy-based algorithm for computing the optimal state sequence of a first-order HMM is as follows.

(1) Initialization. For $t = 1$ and $1 \le j \le N$,

(28) $H_1(s_j) = 0$, $\quad \hat{\alpha}_1(j) = \dfrac{\pi_j b_j(o_1)}{\sum_{i=1}^{N} \pi_i b_i(o_1)}$.

(2) Recursion. For $t = 2, \ldots, T$ and $1 \le j \le N$,

(29) $\hat{\alpha}_t(j) = \dfrac{\sum_{i=1}^{N} \hat{\alpha}_{t-1}(i)\, a_{ij}\, b_j(o_t)}{\sum_{l=1}^{N} \sum_{i=1}^{N} \hat{\alpha}_{t-1}(i)\, a_{il}\, b_l(o_t)}$,

$P(q_{t-1} = s_i \mid q_t = s_j, o_{1:t}) = \dfrac{a_{ij}\, \hat{\alpha}_{t-1}(i)}{\sum_{l=1}^{N} a_{lj}\, \hat{\alpha}_{t-1}(l)}$,

$H_t(s_j) = \sum_{i=1}^{N} P(q_{t-1} = s_i \mid q_t = s_j, o_{1:t})\, H_{t-1}(s_i) - \sum_{i=1}^{N} P(q_{t-1} = s_i \mid q_t = s_j, o_{1:t}) \log_2 P(q_{t-1} = s_i \mid q_t = s_j, o_{1:t})$.

(3) Termination.

(30) $H(q_{1:T} \mid o_{1:T}) = \sum_{i=1}^{N} H_T(s_i)\, \hat{\alpha}_T(i) - \sum_{i=1}^{N} \hat{\alpha}_T(i) \log_2 \hat{\alpha}_T(i)$.

This algorithm performs the computation linearly with respect to the length of the observation sequence, with computational complexity $O(TN^2)$. It requires $O(N^2)$ memory, which is independent of the length of the observational sequence.
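Steps (1)-(3) above can be sketched in a few lines of numpy. This is a minimal sketch of the first-order entropy recursion; the function name and array layout are our own, not the paper's:

```python
import numpy as np

def entropy_decoding_first_order(pi, A, B, obs):
    """Entropy H(q_{1:T} | o_{1:T}) of a first-order HMM, following (28)-(30).

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = P(q_t = s_j | q_{t-1} = s_i)
    B   : (N, M) emission matrix,  B[j, m] = P(o_t = v_m | q_t = s_j)
    obs : sequence of observation symbol indices
    Returns the entropy in bits.
    """
    def plogp(p):
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(p > 0, p * np.log2(p), 0.0)

    # (28) initialization
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    H = np.zeros(len(pi))

    # (29) recursion
    for o in obs[1:]:
        w = alpha[:, None] * A                    # w[i, j] = alpha_{t-1}(i) a_ij
        col = w.sum(axis=0)                       # normalizer over predecessors i
        P = np.divide(w, col, out=np.zeros_like(w), where=col > 0)
        H = P.T @ H - plogp(P).sum(axis=0)        # intermediate entropies H_t(s_j)
        alpha = col * B[:, o]
        alpha /= alpha.sum()

    # (30) termination
    return float(H @ alpha - plogp(alpha).sum())
```

As a sanity check, for a model whose transitions and emissions are all uniform, every one of the $N^T$ state sequences is equally likely, so the result is $T \log_2 N$ bits, matching the observation in Section 2.4.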

2.4. The Computation of the Optimal State Sequence for a HHMM

The extended classical Viterbi algorithm is commonly used for computing the optimal state sequence of a HHMM. This algorithm provides the solution along with its likelihood, which can be determined as

(31) $P(q_1, q_2, \ldots, q_T \mid o_1, o_2, \ldots, o_T) = \dfrac{P(q_1, q_2, \ldots, q_T, o_1, o_2, \ldots, o_T)}{P(o_1, o_2, \ldots, o_T)}$.

This probability can be used as a measure of the quality of the solution: the higher the probability of our "solution," the better the "solution" is. Entropy can also be used for measuring the quality of the state sequence of the $k$th-order HMM. Hence, we propose using state entropy to obtain the optimal state sequence of a HHMM.

We define the entropy of a discrete random variable as follows.

Definition 6.

The entropy $H(X)$ of a discrete random variable $X$ with probability mass function $P(X = x)$ is defined as

(32) $H(X) = -\sum_{x \in X} P(x) \log_2 P(x)$.

When the log has base 2, the unit of entropy is bits. Note that $0 \log 0 = 0$.

From (32), the entropy of the distribution over all possible state sequences is

(33) $H(q_1, \ldots, q_T \mid o_1, \ldots, o_T) = -\sum_{Q} P(q_1 = s_{i_1}, \ldots, q_T = s_{i_T} \mid o_1, \ldots, o_T) \log_2 P(q_1 = s_{i_1}, \ldots, q_T = s_{i_T} \mid o_1, \ldots, o_T)$.

For the first-order HMM, if all $N^T$ possible state sequences are equally likely to generate a single observational sequence of length $T$, the entropy equals $T \log_2 N$. The entropy is $kT \log_2 N$ for the $k$th-order HMM if all $N^{kT}$ possible state sequences are equally likely to produce the observational sequence.

For this extended algorithm, we require an intermediate state entropy variable, $H_t(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}})$, that can be computed recursively from the previous variable, $H_{t-1}(s_{i_1}, s_{i_2}, \ldots, s_{i_k})$.

We define the state entropy variable for the kth-order HMM as follows.

Definition 7.

The state entropy variable $H_t(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}})$ in the $k$th-order HMM is the entropy of all the state sequences that lead to states $s_{i_2}$ at time $t-k+1$, $s_{i_3}$ at time $t-k+2, \ldots$, and $s_{i_{k+1}}$ at time $t$, given the observation sequence $o_1, o_2, \ldots, o_t$:

(34) $H_t(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = H(q_{2-k:t-1} \mid q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$.

We now analyse the state entropy for the $k$th-order HMM in detail.

From (34), with $t = 1$, the initial state entropy variable is

(35) $H_1(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = 0$.

From (34) and (35), for $t = 2, \ldots, T$ and $1 \le i_1, i_2, \ldots, i_{k+1} \le N$, the entropy recursion is

(36) $H_t(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = H(q_{t-k:t-1} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) + H(q_{2-k:t-2} \mid q_{t-k}, q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$,

where

(37) $H(q_{t-k:t-1} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) = -\sum_{i_1=1}^{N} P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) \log_2 P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$

and

$H(q_{2-k:t-2} \mid q_{t-k}, q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) = \sum_{i_1=1}^{N} P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})\, H_{t-1}(s_{i_1}, s_{i_2}, \ldots, s_{i_k})$.

The auxiliary probability $P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$ is required for our extended entropy-based algorithm. Applying Bayes' rule and the conditional independence properties of the model, it can be computed as

(38) $P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) = \dfrac{a_{i_1 i_2 \cdots i_k i_{k+1}}\, \hat{\alpha}_{t-1}(i_1, i_2, \ldots, i_k)}{\sum_{j_1=1}^{N} a_{j_1 i_2 \cdots i_k i_{k+1}}\, \hat{\alpha}_{t-1}(j_1, i_2, \ldots, i_k)}$.

For the final step of our extended algorithm, we compute the conditional entropy $H(q_{1:T} \mid o_{1:T})$, which expands as

(39) $H(q_{1:T} \mid o_{1:T}) = H(q_{1:T-k} \mid q_{T-k+1} = s_{i_1}, q_{T-k+2} = s_{i_2}, \ldots, q_T = s_{i_k}, o_{1:T}) + H(q_{T-k+1:T} \mid o_{1:T}) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} H_T(s_{i_1}, s_{i_2}, \ldots, s_{i_k})\, \hat{\alpha}_T(i_1, i_2, \ldots, i_k) - \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} \hat{\alpha}_T(i_1, i_2, \ldots, i_k) \log_2 \hat{\alpha}_T(i_1, i_2, \ldots, i_k)$.

The following basic properties of the HMM and of entropy are used for proving Lemma 8.

(i) In the generalized high-order HMM, $q_{t-k-j+1}$ for $j \ge 2$ and $q_t$ are statistically independent given $q_{t-k}, q_{t-k+1}, q_{t-k+2}, \ldots, q_{t-1}$. Similarly, $q_{t-k-j+1}$ for $j \ge 2$ and $o_t$ are statistically independent given $q_{t-k}, q_{t-k+1}, q_{t-k+2}, \ldots, q_{t-1}$.

(ii) By a basic property of entropy,

(40) $H(X \mid Y = y) = H(X)$ if $X$ and $Y$ are independent.

We now introduce the following lemma for the kth-order HMM. The following proof is due to Hernando et al. .

Lemma 8.

For the $k$th-order HMM, the entropy of the state sequence up to time $t-k-1$, given the states from time $t-k$ to time $t-1$ and the observations up to time $t-1$, is conditionally independent of the state and observation at time $t$:

(41) $H_{t-1}(s_{i_1}, s_{i_2}, \ldots, s_{i_k}) = H(q_{1:t-2} \mid q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$.

Proof.

(42) $H(q_{1:t-2} \mid q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) = H(q_{1:t-2} \mid q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, \ldots, q_{t-1} = s_{i_k}, o_{1:t-1}, q_t = s_{i_{k+1}}, o_t) = H(q_{1:t-2} \mid q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, \ldots, q_{t-1} = s_{i_k}, o_{1:t-1}) = H_{t-1}(s_{i_1}, s_{i_2}, \ldots, s_{i_k})$.

Our extended entropy-based algorithm for computing the optimal state sequence is based on the normalized forward recursion variable, the state entropy recursion variable, and the auxiliary probability. From (18), (20), (35), (36), (38), and (39), we construct the extended entropy-based decoding algorithm for the $k$th-order HMM as follows.

(1) Initialization. For $t = 1$ and $1 \le i_2, i_3, \ldots, i_{k+1} \le N$,

(43) $H_1(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = 0$, $\quad \hat{\alpha}_1(i_2, i_3, \ldots, i_{k+1}) = \dfrac{\pi_{i_2 i_3 \cdots i_{k+1}}\, b_{i_2 i_3 \cdots i_{k+1}}(o_1)}{\sum_{j_k=1}^{N} \cdots \sum_{j_1=1}^{N} \pi_{j_1 j_2 \cdots j_k}\, b_{j_1 j_2 \cdots j_k}(o_1)}$.

(2) Recursion. For $t = 2, \ldots, T$ and $1 \le i_1, i_2, \ldots, i_{k+1} \le N$,

(44) $\hat{\alpha}_t(i_2, i_3, \ldots, i_{k+1}) = \dfrac{\sum_{i_1=1}^{N} \hat{\alpha}_{t-1}(i_1, i_2, \ldots, i_k)\, a_{i_1 i_2 \cdots i_k i_{k+1}}\, b_{i_2 i_3 \cdots i_k i_{k+1}}(o_t)}{\sum_{j_k=1}^{N} \cdots \sum_{j_1=1}^{N} \sum_{i_1=1}^{N} \hat{\alpha}_{t-1}(i_1, j_1, \ldots, j_{k-1})\, a_{i_1 j_1 \cdots j_k}\, b_{j_1 j_2 \cdots j_k}(o_t)}$,

$P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) = \dfrac{a_{i_1 i_2 \cdots i_k i_{k+1}}\, \hat{\alpha}_{t-1}(i_1, i_2, \ldots, i_k)}{\sum_{j_1=1}^{N} a_{j_1 i_2 \cdots i_k i_{k+1}}\, \hat{\alpha}_{t-1}(j_1, i_2, \ldots, i_k)}$,

$H_t(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = \sum_{i_1=1}^{N} P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})\, H_{t-1}(s_{i_1}, s_{i_2}, \ldots, s_{i_k}) - \sum_{i_1=1}^{N} P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t}) \log_2 P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$.

(3) Termination.

(45) $H(q_{1:T} \mid o_{1:T}) = \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} H_T(s_{i_1}, s_{i_2}, \ldots, s_{i_k})\, \hat{\alpha}_T(i_1, i_2, \ldots, i_k) - \sum_{i_1=1}^{N} \cdots \sum_{i_k=1}^{N} \hat{\alpha}_T(i_1, i_2, \ldots, i_k) \log_2 \hat{\alpha}_T(i_1, i_2, \ldots, i_k)$.

This extended algorithm computes the optimal state sequence linearly with respect to the length of the observational sequence, requiring $O(TN^{k+1})$ calculations, and its memory requirement, $O(N^{k+1})$, is independent of the length of the observational sequence, since $\hat{\alpha}_t(i_2, \ldots, i_{k+1})$, $H_t(s_{i_2}, \ldots, s_{i_{k+1}})$, and the auxiliary probability $P(q_{t-k} = s_{i_1}, \ldots, q_{t-1} = s_{i_k} \mid q_{t-k+1} = s_{i_2}, \ldots, q_t = s_{i_{k+1}}, o_{1:t})$ need to be computed only once in the $t$th iteration and, once used for the $(t+1)$th iteration, can be deleted from storage.

2.5. Numerical Illustration for the Second-Order HMM

We consider a second-order HMM to illustrate our extended entropy-based algorithm for computing the optimal state sequence. Assume that this second-order HMM has state space $S = \{s_1, s_2\}$ and observation symbol set $V = \{v_1, v_2, v_3\}$.

The graphical representation of the first-order HMM used for the numerical example in this section is given in Figure 1. The second-order HMM in Figure 2 is developed from the first-order HMM in Figure 1, which has two states and three observation symbols. An HMM of any order has parameters $\lambda = (\pi, A, B)$, where $\pi$ is the initial state probability vector, $A$ is the state transition probability matrix, and $B$ is the emission probability matrix. The matrices $A$ and $B$, whose components are denoted $a_{i_1 i_2}$, $a_{i_1 i_2 i_3}$, $b_{i_2}(o_t = v_m)$, and $b_{i_2 i_3}(o_t = v_m)$ with $1 \le i_1, i_2, i_3 \le 2$ and $1 \le m \le 3$, can be read off Figures 1 and 2. However, the initial state probability vectors are not shown in these diagrams.

The graphical diagram shows a first-order HMM with 2 states and 3 observational symbols.

The graphical diagram shows a second-order HMM with 2 states and 3 observational symbols.

The initial state probability vectors for the first-order and second-order HMMs are

(46) $\pi_1 = (0.5,\ 0.5)$, $\quad \pi_2 = (0.5,\ 0)$, $\quad \pi_3 = (0.5,\ 0)$.

$\pi_1 = [\dot{\pi}_{i_2}]$ is the initial state probability vector of the first-order HMM, and $\pi_2 = [\dot{\pi}_{i_2 1}]$ and $\pi_3 = [\dot{\pi}_{i_2 2}]$ are the initial state probability vectors of the second-order HMM, where $\dot{\pi}_{i_2} = P(q_1 = s_{i_2})$, $\dot{\pi}_{i_2 1} = P(q_1 = s_1, q_0 = s_{i_2})$, $\dot{\pi}_{i_2 2} = P(q_1 = s_2, q_0 = s_{i_2})$, and $1 \le i_2 \le 2$.

The state transition probability matrices for the first-order and second-order HMMs are

(47) $A_1 = \begin{pmatrix} 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}$, $\quad A_2 = \begin{pmatrix} 0.5 & 0.5 \\ 0 & 0 \end{pmatrix}$, $\quad A_3 = \begin{pmatrix} 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}$.

$A_1 = [a_{i_1 i_2}]$ is the state transition probability matrix of the first-order HMM, and $A_2 = [a_{i_1 i_2 1}]$ and $A_3 = [a_{i_1 i_2 2}]$ are the state transition probability matrices of the second-order HMM, where $a_{i_1 i_2} = P(q_t = s_{i_2} \mid q_{t-1} = s_{i_1})$, $a_{i_1 i_2 1} = P(q_t = s_1 \mid q_{t-1} = s_{i_2}, q_{t-2} = s_{i_1})$, $a_{i_1 i_2 2} = P(q_t = s_2 \mid q_{t-1} = s_{i_2}, q_{t-2} = s_{i_1})$, and $1 \le i_1, i_2 \le 2$.

The emission probability matrices for the first-order and second-order HMMs are

(48) $B_1 = \begin{pmatrix} 0.5 & 0 & 0.5 \\ 0 & 0 & 1 \end{pmatrix}$, $\quad B_2 = \begin{pmatrix} 0.5 & 0.5 \\ 0 & 0 \end{pmatrix}$, $\quad B_3 = \begin{pmatrix} 0 & 0.5 \\ 0 & 0 \end{pmatrix}$, $\quad B_4 = \begin{pmatrix} 0.5 & 0 \\ 1 & 0 \end{pmatrix}$.

$B_1 = [b_{i_2}(o_t = v_m)]$ is the emission probability matrix of the first-order HMM, and $B_2 = [b_{i_2 i_3}(o_t = v_1)]$, $B_3 = [b_{i_2 i_3}(o_t = v_2)]$, and $B_4 = [b_{i_2 i_3}(o_t = v_3)]$ are the emission probability matrices of the second-order HMM, where $b_{i_2}(o_t = v_m) = P(o_t = v_m \mid q_t = s_{i_2})$ and $b_{i_2 i_3}(o_t = v_m) = P(o_t = v_m \mid q_t = s_{i_3}, q_{t-1} = s_{i_2})$.

The following observational sequence is used to illustrate our extended algorithm:

(49) $o_{1:6} = (o_1 = v_1, o_2 = v_1, o_3 = v_3, o_4 = v_2, o_5 = v_3, o_6 = v_1)$.

We applied our extended algorithm to compute the optimal state sequence based on state entropy. The computed values of the state entropy are shown in Figure 3.

The evolution of the trellis structure of the second-order HMM with the observation sequence o1:6=(o1=v1,o2=v1,o3=v3,o4=v2,o5=v3,o6=v1).

The total entropy after each time step is displayed at the bottom of Figure 3. For example, after receiving the second observation, that is, $o_{1:2} = (o_1 = v_1, o_2 = v_1)$, two state sequences remain, $q_{1:2} = (q_1 = s_1, q_2 = s_1)$ and $q_{1:2} = (q_1 = s_1, q_2 = s_2)$, as shown by the bold arrows. Each possible state sequence has probability 0.5, that is, $\hat{\alpha}_2(1,1) = \hat{\alpha}_2(1,2) = 0.5$, and hence the total entropy is 1 bit. However, after receiving the fourth observation, that is, $o_{1:4} = (o_1 = v_1, o_2 = v_1, o_3 = v_3, o_4 = v_2)$, one state sequence remains, $q_{1:4} = (q_1 = s_1, q_2 = s_2, q_3 = s_1, q_4 = s_2)$, as shown by the dashed arrow. This state sequence has probability 1, that is, $\hat{\alpha}_4(1,2) = 1$, and hence the total entropy is 0 bits. After receiving the sixth observation, this second-order HMM has produced only one possible optimal state sequence, $q_{1:6} = (q_1 = s_1, q_2 = s_2, q_3 = s_1, q_4 = s_2, q_5 = s_1, q_6 = s_2)$, with a total entropy of 0, which indicates that there is no uncertainty.
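The extended recursion (43)-(45) for $k = 2$ can be sketched as follows. This is our own sketch, assuming the flattened matrices of (46)-(48) are read row-wise; the function name and array layout are not from the paper. Under that reading it reproduces the 1-bit total entropy reported after the second observation:

```python
import numpy as np

def entropy_decoding_second_order(pi2, A, B, obs):
    """Extended entropy-based decoding, eqs. (43)-(45), for k = 2.

    pi2 : (N, N)    pi2[i, j]  = P(q_0 = s_i, q_1 = s_j)
    A   : (N, N, N) A[i, j, l] = P(q_t = s_l | q_{t-2} = s_i, q_{t-1} = s_j)
    B   : (N, N, M) B[j, l, m] = P(o_t = v_m | q_{t-1} = s_j, q_t = s_l)
    obs : observation symbol indices o_1..o_T
    Returns the list of total entropies (bits) after each observation.
    """
    def plogp(p):
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(p > 0, p * np.log2(p), 0.0)

    alpha = pi2 * B[..., obs[0]]          # eq. (43), posterior over pairs
    alpha /= alpha.sum()
    H = np.zeros_like(alpha)              # H_1(., .) = 0
    totals = [float(-plogp(alpha).sum())]
    for o in obs[1:]:                     # eq. (44)
        w = alpha[:, :, None] * A         # w[i1, i2, i3]
        col = w.sum(axis=0)               # normalizer over predecessors i1
        P = np.divide(w, col, out=np.zeros_like(w), where=col > 0)
        H = np.einsum('ij,ijl->jl', H, P) - plogp(P).sum(axis=0)
        alpha = col * B[..., o]
        alpha /= alpha.sum()
        totals.append(float((H * alpha).sum() - plogp(alpha).sum()))  # eq. (45)
    return totals
```

With the model of (46)-(48) and the observation sequence (49), the second entry of the returned list is 1.0 bit, matching the two equally likely state sequences after $o_{1:2}$.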

3. Entropy-Based Decoding Algorithm with a Reduction Approach

The extended entropy-based Viterbi algorithm in Section 2 addresses only the memory requirement; it does not overcome the computational complexity. In this section, we introduce an efficient entropy-based algorithm that uses a reduction approach, namely, the entropy-based order-transformation forward algorithm (EOTFA), to compute the optimal state sequence, based on entropy, of any generalized HHMM. This algorithm addresses both the memory requirement and the computational complexity.

3.1. Transforming a High-Order HMM with a Single Observational Sequence

The EOTFA algorithm transforms a generalized high-order HMM into an equivalent first-order HMM, and the decoding algorithm is developed on the equivalent first-order model. It performs the computation based on the observational sequence and requires $O(T\tilde{N}^2)$ calculations, where $\tilde{N}$ is the number of states in the equivalent first-order model and $T$ is the length of the observational sequence.

The transformation of a generalized high-order HMM into an equivalent first-order HMM is based on Hadar and Messer’s method .

Suppose $\tilde{Q}_t = (q_t, q_{t-1}, \ldots, q_{t-k+1})$ for $1 \le t \le T$; then the hidden state process $\{\tilde{Q}_t\}_{t=1}^{T}$ of the $k$th-order Markov chain satisfies

(50) $P(\tilde{Q}_t \mid \tilde{Q}_l,\ l < t) = P(q_t, q_{t-1}, \ldots, q_{t-k+1} \mid q_{t-1}, q_{t-2}, \ldots, q_{2-k}) = P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_{t-k}) = P(q_t, q_{t-1}, \ldots, q_{t-k+1} \mid q_{t-1}, q_{t-2}, \ldots, q_{t-k}) = P(\tilde{Q}_t \mid \tilde{Q}_{t-1})$,

where $\tilde{Q}_t$ takes values in the set of hidden states $\tilde{S} = \{\tilde{s}_i,\ i = 1, 2, \ldots, N^k\}$. Hence, the hidden state process $\{\tilde{Q}_t\}_{t=1}^{T}$ forms a first-order Markov process.

The observation process $\{o_t\}_{t=1}^{T}$ satisfies

(51) $P(o_t \mid o_l,\ l < t,\ \tilde{Q}_l,\ l \le t) = P(o_t \mid o_l,\ l \le t-1,\ q_l,\ l \le t) = P(o_t \mid q_l,\ l \le t) = P(o_t \mid q_l,\ l = t-k+1, \ldots, t) = P(o_t \mid \tilde{Q}_t)$.

Hence, the hidden state process $\{\tilde{Q}_t\}_{t=1}^{T}$ and the observation process $\{o_t\}_{t=1}^{T}$ form a first-order HMM.

Remarks 9.

(i)

(52) $P(\tilde{Q}_t \mid \tilde{Q}_{t-1}) = P(\tilde{Q}_t = (q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}}) \mid \tilde{Q}_{t-1} = (q_{t-k} = s_{i_1}, q_{t-k+1} = s_{i_2}, \ldots, q_{t-1} = s_{i_k})) = P(\tilde{Q}_t = (s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) \mid \tilde{Q}_{t-1} = (s_{i_1}, s_{i_2}, \ldots, s_{i_k})) = P(\tilde{Q}_t = s_{i_2 i_3 \cdots i_{k+1}} \mid \tilde{Q}_{t-1} = s_{i_1 i_2 \cdots i_k})$,

where $(s_{i_1}, s_{i_2}, \ldots, s_{i_k})$ and $(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) \in \tilde{S}$.

(ii)

(53) $P(o_t \mid \tilde{Q}_t) = P(o_t \mid \tilde{Q}_t = (q_{t-k+1} = s_{i_2}, q_{t-k+2} = s_{i_3}, \ldots, q_t = s_{i_{k+1}})) = P(o_t \mid \tilde{Q}_t = (s_{i_2}, \ldots, s_{i_{k+1}})) = P(o_t \mid \tilde{Q}_t = s_{i_2 i_3 \cdots i_{k+1}})$,

where $(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) \in \tilde{S}$.

Note that we write $(s_{i_1}, s_{i_2}, \ldots, s_{i_k}) = s_{i_1 i_2 \cdots i_k}$ and $(s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}}) = s_{i_2 i_3 \cdots i_{k+1}}$.

The elements of the transformation of a high-order into an equivalent first-order discrete HMM are as follows:

Number of distinct hidden states, $\tilde{N}$

Number of distinct observed symbols, $M$

Length of observational sequence, $T$

Observational sequence, $O = (o_t),\ t = 1, 2, \ldots, T$

Hidden state sequence, $\tilde{Q} = (\tilde{Q}_t),\ t = 1, 2, \ldots, T$

Possible values for each state, $\tilde{S} = \{\tilde{s}_i\},\ i = 1, 2, \ldots, N^k$

Possible symbols per observation, $\tilde{V} = \{v_w\},\ w = 1, 2, \ldots, M$

Initial hidden state probability vector, $\tilde{\pi} = [\tilde{\pi}_i]$, where $\tilde{\pi}_i$ is the probability that the model starts from state $\tilde{s}_i = (s_{i_1}, s_{i_2}, \ldots, s_{i_k}) = s_{i_1 i_2 \cdots i_k}$,

(54) $\tilde{\pi}_i = P(\tilde{Q}_1 = \tilde{s}_i)$, $\quad \sum_{i=1}^{\tilde{N}} \tilde{\pi}_i = 1$, $\quad \tilde{\pi}_i \ge 0$

State transition probability matrix, $\tilde{A} = [\tilde{a}_{ij}]$, where $\tilde{a}_{ij}$ is the probability of a transition from state $\tilde{s}_i = (s_{i_1}, s_{i_2}, \ldots, s_{i_k})$ at time $t-1$ to state $\tilde{s}_j = (s_{i_2}, s_{i_3}, \ldots, s_{i_{k+1}})$ at time $t$,

(55) $\tilde{a}_{ij} = P(\tilde{Q}_t = \tilde{s}_j \mid \tilde{Q}_{t-1} = \tilde{s}_i)$, $\quad \sum_{j=1}^{\tilde{N}} \tilde{a}_{ij} = 1$, $\quad \tilde{a}_{ij} \ge 0$,

where the last $k-1$ entries of $\tilde{s}_i$ are equal to the first $k-1$ entries of $\tilde{s}_j$

Emission probability matrix, $\tilde{B} = [\tilde{b}_i(v_m)]$, where $\tilde{b}_i(v_m)$ is the probability of observing $v_m$ in state $\tilde{s}_i = (s_{i_1}, s_{i_2}, \ldots, s_{i_k})$ at time $t$,

(56) $\tilde{b}_i(v_m) = P(o_t = v_m \mid \tilde{Q}_t = \tilde{s}_i)$, $\quad \sum_{m=1}^{M} \tilde{b}_i(v_m) = 1$, $\quad \tilde{b}_i(v_m) \ge 0$.

3.2. The Forward and Backward Probability Variables for the Transformed Model

In this subsection, we omit the derivations of the forward and backward probability variables, since they are similar to those in Section 2.2.

The forward recursion variable for the transformed model at time $t$ is

(57) $\tilde{\alpha}_t(j) = P(o_1, o_2, \ldots, o_t, \tilde{Q}_t = \tilde{s}_j \mid \lambda) = \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_{t-1}(i)\, \tilde{a}_{ij}\, \tilde{b}_j(o_t)$.

The backward recursion variable for the transformed model at time $t$ is

(58) $\tilde{\beta}_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid \tilde{Q}_t = \tilde{s}_i, \lambda) = \sum_{j=1}^{\tilde{N}} \tilde{\beta}_{t+1}(j)\, \tilde{a}_{ij}\, \tilde{b}_j(o_{t+1})$.

The normalized forward variable at time $t$ is

(59) $\tilde{\alpha}_t(j) = P(\tilde{Q}_t = \tilde{s}_j \mid o_{1:t}) = \dfrac{\sum_{i=1}^{\tilde{N}} \tilde{\alpha}_{t-1}(i)\, \tilde{a}_{ij}\, \tilde{b}_j(o_t)}{r_t}$, where $r_t = \sum_{j=1}^{\tilde{N}} \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_{t-1}(i)\, \tilde{a}_{ij}\, \tilde{b}_j(o_t)$.

The normalized backward variable at time $t$ is

(60) $\tilde{\beta}_t(i) = \dfrac{P(o_{t+1:T} \mid \tilde{Q}_t)}{P(o_{t+1:T} \mid o_{1:t})} = \dfrac{\sum_{j=1}^{\tilde{N}} \tilde{\beta}_{t+1}(j)\, \tilde{a}_{ij}\, \tilde{b}_j(o_{t+1})}{r_{t+1}}$, where $r_{t+1} = \sum_{j=1}^{\tilde{N}} \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_t(i)\, \tilde{a}_{ij}\, \tilde{b}_j(o_{t+1})$.

3.3. The Computation of the Optimal State Sequence for a HHMM

For the EOTFA algorithm, we require a state entropy variable, $H_t(\tilde{s}_j)$, that can be computed recursively from the previous variable, $H_{t-1}(\tilde{s}_i)$.

We define the state entropy variable as follows.

Definition 10.

The state entropy variable $H_t(\tilde{s}_j)$ in the order-transformed HMM is the entropy of all the paths that lead to state $\tilde{s}_j$ at time $t$, given the observations $o_1, o_2, \ldots, o_t$:

(61) $H_t(\tilde{s}_j) = H(\tilde{Q}_{1:t-1} \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t})$.

From (61), with $t = 1$, the initial state entropy variable is

(62) $H_1(\tilde{s}_j) = 0$.

From (61) and (62), for $t = 2, \ldots, T$ and $1 \le i, j \le \tilde{N}$, the entropy recursion is

(63) $H_t(\tilde{s}_j) = H(\tilde{Q}_{1:t-1} \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = H(\tilde{Q}_{1:t-2}, \tilde{Q}_{t-1} \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = H(\tilde{Q}_{t-1} \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) + H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1}, \tilde{Q}_t = \tilde{s}_j, o_{1:t})$,

where

(64) $H(\tilde{Q}_{t-1} \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = -\sum_{i=1}^{\tilde{N}} P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) \log_2 P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t})$ and $H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1}, \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = \sum_{i=1}^{\tilde{N}} P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t})\, H_{t-1}(\tilde{s}_i)$.

The auxiliary probability $P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t})$ is required for our EOTFA algorithm. It can be computed as

(65) $P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = \dfrac{P(o_t \mid \tilde{Q}_t = \tilde{s}_j)\, P(\tilde{Q}_t = \tilde{s}_j \mid \tilde{Q}_{t-1} = \tilde{s}_i)\, P(\tilde{Q}_{t-1} = \tilde{s}_i \mid o_{1:t-1})}{P(o_t \mid \tilde{Q}_t = \tilde{s}_j)\, P(\tilde{Q}_t = \tilde{s}_j \mid o_{1:t-1})} = \dfrac{\tilde{a}_{ij}\, \tilde{\alpha}_{t-1}(i)}{\sum_{l=1}^{\tilde{N}} \tilde{a}_{lj}\, \tilde{\alpha}_{t-1}(l)}$.

For the final step, we compute $H(\tilde{Q}_{1:T} \mid o_{1:T})$, which expands as

(66) $H(\tilde{Q}_{1:T} \mid o_{1:T}) = H(\tilde{Q}_{1:T-1} \mid \tilde{Q}_T = \tilde{s}_j, o_{1:T}) + H(\tilde{Q}_T \mid o_{1:T}) = \sum_{i=1}^{\tilde{N}} H_T(\tilde{s}_i)\, \tilde{\alpha}_T(i) - \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_T(i) \log_2 \tilde{\alpha}_T(i)$.

The basic entropy property in (40) and the following basic properties of the HMM are used to prove Lemma 11. In the transformation of a high-order into an equivalent first-order HMM, $\tilde{Q}_{t-r}$ for $r \ge 2$ and $\tilde{Q}_t$ are statistically independent given $\tilde{Q}_{t-1}$; similarly, $\tilde{Q}_{t-r}$ for $r \ge 2$ and $o_t$ are statistically independent given $\tilde{Q}_{t-1}$.

The following proof is due to Hernando et al.

Lemma 11.

For the transformation of a high-order into an equivalent first-order HMM, the entropy of the state sequence up to time $t-2$, given the state at time $t-1$ and the observations up to time $t-1$, is conditionally independent of the state and observation at time $t$:
$$H_{t-1}(\tilde{s}_i) = H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1} = \tilde{s}_i, \tilde{Q}_t = \tilde{s}_j, o_{1:t}). \tag{67}$$

Proof.

$$H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1} = \tilde{s}_i, \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1} = \tilde{s}_i, o_{1:t-1}, \tilde{Q}_t = \tilde{s}_j, o_t) = H(\tilde{Q}_{1:t-2} \mid \tilde{Q}_{t-1} = \tilde{s}_i, o_{1:t-1}) = H_{t-1}(\tilde{s}_i). \tag{68}$$

Our EOTFA algorithm for computing the optimal state sequence is based on the normalized forward recursion variable, the state entropy recursion variable, and the auxiliary probability. From (59), (60), (61), (62), (63), and (66), we construct our EOTFA algorithm as follows.

(1) Initialization. For $t = 1$ and $1 \le j \le \tilde{N}$,
$$H_1(\tilde{s}_j) = 0, \qquad \tilde{\alpha}_1(j) = \frac{\tilde{\pi}_j\,\tilde{b}_j(o_1)}{\sum_{i=1}^{\tilde{N}} \tilde{\pi}_i\,\tilde{b}_i(o_1)}. \tag{69}$$

(2) Recursion. For $t = 2, \ldots, T$ and $1 \le j \le \tilde{N}$,
$$\tilde{\alpha}_t(j) = \frac{\sum_{i=1}^{\tilde{N}} \tilde{\alpha}_{t-1}(i)\,\tilde{a}_{ij}\,\tilde{b}_j(o_t)}{\sum_{k=1}^{\tilde{N}} \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_{t-1}(i)\,\tilde{a}_{ik}\,\tilde{b}_k(o_t)},$$
$$P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) = \frac{\tilde{a}_{ij}\,\tilde{\alpha}_{t-1}(i)}{\sum_{k=1}^{\tilde{N}} \tilde{a}_{kj}\,\tilde{\alpha}_{t-1}(k)},$$
$$H_t(\tilde{s}_j) = \sum_{i=1}^{\tilde{N}} H_{t-1}(\tilde{s}_i)\, P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) - \sum_{i=1}^{\tilde{N}} P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}) \log_2 P(\tilde{Q}_{t-1} = \tilde{s}_i \mid \tilde{Q}_t = \tilde{s}_j, o_{1:t}). \tag{70}$$

(3) Termination.
$$H(\tilde{Q}_{1:T} \mid o_{1:T}) = \sum_{i=1}^{\tilde{N}} H_T(\tilde{s}_i)\,\tilde{\alpha}_T(i) - \sum_{i=1}^{\tilde{N}} \tilde{\alpha}_T(i) \log_2 \tilde{\alpha}_T(i). \tag{71}$$
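Steps (1)–(3) above can be sketched directly in code. The following Python function is our illustration of the recursions (69)–(71) for an equivalent first-order HMM; the model arrays are generic placeholders, not the paper's example:

```python
import numpy as np

def eotfa_entropy(pi, A, B, obs):
    """EOTFA sketch: normalized forward recursion interleaved with the state
    entropy recursion (70), returning H(Q_{1:T} | o_{1:T}) per Eq. (71).
    pi: (N,) initial distribution; A: (N, N) transitions a_ij;
    B: (N, M) emissions b_j(v_m); obs: observation indices o_1..o_T."""
    def xlog2x(p):
        # p * log2(p) with the convention 0 * log2(0) = 0
        return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)

    # (1) Initialization, Eq. (69)
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    H = np.zeros(len(pi))                      # H_1(s_j) = 0

    # (2) Recursion, Eq. (70)
    for o in obs[1:]:
        num = A * alpha[:, None]               # a_ij * alpha_{t-1}(i)
        denom = num.sum(axis=0, keepdims=True)
        P = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
        H = P.T @ H - xlog2x(P).sum(axis=0)    # entropy update per state j
        alpha = (alpha @ A) * B[:, o]
        alpha = alpha / alpha.sum()

    # (3) Termination, Eq. (71)
    return float(H @ alpha - xlog2x(alpha).sum())
```

Two boundary cases make useful checks: a fully deterministic model yields total entropy 0 (a single possible path), while a completely uninformative two-state model yields 1 bit per time step.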

The direct evaluation algorithm, Hernando et al.'s algorithm, and our algorithm all perform the computation of state entropy exponentially with respect to the order of the HMM. Our algorithm, however, transforms a generalized high-order HMM into an equivalent first-order HMM and then computes the state entropy on the equivalent first-order model. It is therefore the most efficient of the three: it requires $O(T\tilde{N}^2)$ calculations, compared with $O(N^{T+k-1})$ for the direct evaluation method and $O(TN^{k+1})$ for the extended algorithm, where $N$ is the number of states in the model, $\tilde{N}$ is the number of states in the equivalent first-order model, $T$ is the length of the observational sequence, and $k$ is the order of the HMM.

3.4. Numerical Illustration for an Equivalent First-Order HMM

We consider the second-order HMM in Section 2.5 to illustrate our EOTFA algorithm for computing the optimal state sequence. Following our proposed algorithm, we first transform the second-order HMM in Section 2.5 into an equivalent first-order HMM using Hadar and Messer's method. The equivalent first-order HMM has the model parameters $\tilde{\lambda} = (\tilde{\pi}, \tilde{A}, \tilde{B})$, where $\tilde{\pi}$ is the initial state probability vector, $\tilde{A}$ is the state transition probability matrix, and $\tilde{B}$ is the emission probability matrix:
$$\tilde{\pi} = \begin{pmatrix} 0.5 & 0.5 & 0 & 0 \end{pmatrix}, \quad \tilde{A} = \begin{pmatrix} 0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad \tilde{B} = \begin{pmatrix} 0.5 & 0.5 & 0 \\ 0 & 0 & 0.5 \\ 0 & 0 & 0.5 \\ 0 & 1 & 0 \end{pmatrix}. \tag{72}$$

Note that the components of the above state transition probability matrix and emission probability matrix, denoted $\tilde{a}_{i_1 i_2}$ and $\tilde{b}_{i_2}(o_t = v_m)$, where $1 \le i_1, i_2 \le 4$ and $1 \le m \le 3$, can be obtained from the graphical diagram in Figure 4.
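A structural property of the transformed model is that a transition $\tilde{a}_{i_1 i_2}$ can be nonzero only when the composite states overlap, i.e., the second component of $\tilde{s}_{i_1}$ equals the first component of $\tilde{s}_{i_2}$. The following snippet (our illustration; the matrix is transcribed from (72)) checks this compatibility condition:

```python
import numpy as np
from itertools import product

states = ["s1", "s2"]
tilde_states = list(product(states, repeat=2))  # s~1..s~4 as in this section

# transition matrix of the equivalent first-order HMM, transcribed from (72)
A = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

def compatible(i, j):
    """Composite transition (a, b) -> (c, d) is possible only when b == c."""
    return tilde_states[i][1] == tilde_states[j][0]

# every nonzero entry of A must connect compatible composite states
ok = all(compatible(i, j) for i in range(4) for j in range(4) if A[i, j] > 0)
```

This check encodes why $\tilde{A}$ is sparse: of the $16$ possible entries, at most $8$ can be nonzero in a second-order model with two original states.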

Figure 4: The graphical diagram of the equivalent first-order HMM.

The state space of the equivalent first-order HMM is $\tilde{S} = \{\tilde{s}_1, \tilde{s}_2, \tilde{s}_3, \tilde{s}_4\}$, where $\tilde{s}_1 = (s_1, s_1)$, $\tilde{s}_2 = (s_1, s_2)$, $\tilde{s}_3 = (s_2, s_1)$, and $\tilde{s}_4 = (s_2, s_2)$, and the possible symbols per observation are $O = \{v_1, v_2, v_3\}$. Note that $\tilde{\pi} = (\tilde{\pi}_{i_2})$, where $\tilde{\pi}_{i_2} = P(\tilde{Q}_1 = \tilde{s}_{i_2})$; $\tilde{A} = (\tilde{a}_{i_1 i_2})$, where $\tilde{a}_{i_1 i_2} = P(\tilde{Q}_t = \tilde{s}_{i_2} \mid \tilde{Q}_{t-1} = \tilde{s}_{i_1})$; and $\tilde{B} = (\tilde{b}_{i_2}(o_t = v_m))$, where $\tilde{b}_{i_2}(o_t = v_m) = P(o_t = v_m \mid \tilde{Q}_t = \tilde{s}_{i_2})$.
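To make the state correspondence concrete, the following snippet (our illustration; the convention $\tilde{Q}_1 = (q_1, q_1)$ is an assumption inferred from the worked example in this section) enumerates the composite states and maps a second-order state path onto them:

```python
from itertools import product

states = ["s1", "s2"]
# composite states of the equivalent first-order HMM:
# s~1 = (s1, s1), s~2 = (s1, s2), s~3 = (s2, s1), s~4 = (s2, s2)
tilde_states = list(product(states, repeat=2))

def to_tilde_path(q):
    """Map a second-order state path q_1..q_T to composite-state indices,
    using Q~_t = (q_{t-1}, q_t) with Q~_1 = (q_1, q_1) by convention."""
    pairs = [(q[0], q[0])] + [(q[t - 1], q[t]) for t in range(1, len(q))]
    return [tilde_states.index(p) + 1 for p in pairs]  # 1-based s~ indices
```

Mapping the path $q_{1:6} = (s_1, s_2, s_1, s_2, s_1, s_2)$ of the second-order HMM in Section 2.5 yields the composite sequence $(\tilde{s}_1, \tilde{s}_2, \tilde{s}_3, \tilde{s}_2, \tilde{s}_3, \tilde{s}_2)$ discussed later in this section.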

The equivalent first-order HMM, developed based on Hadar and Messer's method, is shown in Figure 4.

Secondly, the optimal state sequence is computed based on the equivalent first-order HMM by using our proposed algorithm. Finally, the optimal state sequence of the second-order HMM is inferred from the optimal state sequence from the equivalent first-order HMM.

The following observational sequence is used for illustrating our algorithm:
$$o_{1:6} = (o_1 = v_1,\ o_2 = v_1,\ o_3 = v_3,\ o_4 = v_2,\ o_5 = v_3,\ o_6 = v_1). \tag{73}$$

We applied our EOTFA algorithm to compute the optimal state sequence based on the state entropy. The computed values of the state entropy are shown in Figure 5.

Figure 5: The evolution of the trellis structure for a transformation of a second-order into an equivalent first-order HMM with the observation sequence $o_{1:6} = (o_1 = v_1, o_2 = v_1, o_3 = v_3, o_4 = v_2, o_5 = v_3, o_6 = v_1)$.

The total entropy after each time step for the transformed model, that is, the second-order HMM transformed into the equivalent first-order HMM, is displayed at the bottom of Figure 5. After receiving the fifth observation, this model has produced only one possible state sequence, $\tilde{Q}_{1:5} = (\tilde{Q}_1 = \tilde{s}_1, \tilde{Q}_2 = \tilde{s}_2, \tilde{Q}_3 = \tilde{s}_3, \tilde{Q}_4 = \tilde{s}_2, \tilde{Q}_5 = \tilde{s}_3)$, as shown by the bold arrow, with a probability of 1. The total entropy is 0 at $t = 5$, which indicates that there is no uncertainty. After receiving the sixth observation, that is, $o_{1:6} = (o_1 = v_1, o_2 = v_1, o_3 = v_3, o_4 = v_2, o_5 = v_3, o_6 = v_1)$, the equivalent first-order HMM has produced one possible optimal state sequence, $\tilde{Q}_{1:6} = (\tilde{Q}_1 = \tilde{s}_1, \tilde{Q}_2 = \tilde{s}_2, \tilde{Q}_3 = \tilde{s}_3, \tilde{Q}_4 = \tilde{s}_2, \tilde{Q}_5 = \tilde{s}_3, \tilde{Q}_6 = \tilde{s}_2)$, which corresponds to the sequence $q_{1:6} = (q_1 = s_1, q_2 = s_2, q_3 = s_1, q_4 = s_2, q_5 = s_1, q_6 = s_2)$ produced by the second-order HMM in Section 2.5, again with a total entropy of 0, indicating no uncertainty. As a result, the optimal state sequence of the high-order HMM can be inferred from the optimal state sequence of the equivalent first-order HMM. Since our proposed algorithm operates on the equivalent first-order HMM and requires only $O(T\tilde{N}^2)$ calculations, we conclude that the EOTFA algorithm is more efficient.

4. Conclusion and Future Work

We have introduced a novel algorithm for computing the optimal state sequence of an HHMM that requires $O(T\tilde{N}^2)$ calculations and $O(\tilde{N}^2)$ memory space, where $\tilde{N}$ is the number of states in the equivalent first-order HMM and $T$ is the length of the observational sequence. This algorithm is to be run alongside the Viterbi algorithm for tracking the optimal state sequence as well as the entropy of the distribution of the state sequence. We have developed this algorithm for the case of a generalized discrete high-order HMM. This research can also be extended to continuous high-order HMMs, which are widely used in speech recognition.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

1. Ciriza, V., Donini, L., Durand, J., Girard, S., "Optimal timeouts for power management under renewal or hidden Markov processes for requests," 2011, http://hal.inria.fr/hal-00412509/en.
2. Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
3. Proakis, J. G., Salehi, M., Communication Systems Engineering, Prentice-Hall, Upper Saddle River, NJ, USA, 2002.
4. Hernando, D., Crespi, V., Cybenko, G., "Efficient computation of the hidden Markov model entropy for a given observation sequence," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2681–2685, 2005.
5. Mann, G. S., McCallum, A., "Efficient computation of entropy gradient for semi-supervised conditional random fields," Proceedings of the Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '07), Association for Computational Linguistics, Morristown, NJ, USA, pp. 109–112, 2007.
6. Ilic, V. M., "Entropy semiring forward-backward algorithm for HMM entropy computation," 2011, https://arxiv.org/abs/1108.0347.
7. Hadar, U., Messer, H., "High-order hidden Markov models—estimation and implementation," Proceedings of the 15th IEEE/SP Workshop on Statistical Signal Processing (SSP '09), Wales, UK, pp. 249–252, 2009.
8. du Preez, J. A., "Efficient training of high-order hidden Markov models using first-order representations," Computer Speech and Language, vol. 12, no. 1, pp. 23–39, 1998.
9. Lee, L. M., Lee, J. C., "A study on high-order hidden Markov models and applications to speech recognition," Proceedings of the 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 682–690, 2006.
10. Altman, R. M., "Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting," Journal of the American Statistical Association, vol. 102, no. 477, pp. 201–210, 2007.
11. Spagnoli, A., Henderson, R., Boys, R. J., Houwing-Duistermaat, J. J., "A hidden Markov model for informative dropout in longitudinal response data with crisis states," Statistics & Probability Letters, vol. 81, no. 7, pp. 730–738, 2011.
12. Cover, T. M., Thomas, J. A., Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 2006.