Computation of Invariant Measures and Stationary Expectations for Markov Chains with Block-Band Transition Matrix

This paper deals with the computation of invariant measures and stationary expectations for discrete-time Markov chains governed by a block-structured one-step transition probability matrix. The method generalizes, in some respects, Neuts' matrix-geometric approach to vector-state Markov chains. It reveals a strong relationship between Markov chains and matrix continued fractions, which can provide valuable information for mastering the growing complexity of real-world applications such as large-scale grid systems and multidimensional level-dependent Markov models. The results obtained are extended to continuous-time Markov chains.


Introduction
This paper is concerned with the computation of invariant measures of discrete-time Markov chains X = (X_k)_{k≥0} having a block-partitioned one-step transition matrix P of the form

P = (P_{ij})_{i,j≥0} with P_{ij} = 0 for j > i + 1 or j < i − n,  (1)

where all nonzero elements are nonnegative matrices of suitable dimension, that is, P_{ij} ∈ ℝ^{d_i×d_j} for d_0, d_1, ⋯ ∈ ℕ = {1, 2, 3, ⋯}. The states of the Markov chain are denoted by (i, j), i ≥ 0 and 1 ≤ j ≤ d_i, and are ordered lexicographically, that is, (0, 1), …, (0, d_0), (1, 1), …, (1, d_1), (2, 1), …. Following Neuts [1], the set of states {(i, 1), ⋯, (i, d_i)}, i ≥ 0, will be called the level i. We refer to the first component of X_k (the level) as X^{(1)}_k, and by X^{(2)}_k, we denote the second component of X_k. Throughout this paper, it is assumed that n ≥ 1 and that the Markov chain is irreducible and recurrent. Note that irreducibility implies P_{i,i+1} ≠ 0 for all i ∈ {0, 1, 2, ⋯}. From the general theory of Markov chains, we then know that the system

x P = x  (2)

has a unique solution (up to a constant factor) which is strictly positive (see Karlin and Taylor ([2], chapter 11)). For the solution vector x, we use the notation x = (x_0, x_1, x_2, ⋯) and x_i = (x_{(i,1)}, x_{(i,2)}, ⋯, x_{(i,d_i)}), i ≥ 0. Every positive constant multiple of x is called an invariant measure of X. The class of irreducible recurrent chains can be further subdivided into positive recurrent and null-recurrent chains according to whether ∑_{i=0}^∞ x_i e^T < ∞ or = ∞, with e being a vector with all components equal to one.
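To fix ideas, the band condition in (1) can be checked mechanically. The following sketch (our own illustration, not taken from the paper; the helper name `assemble_block_band` is hypothetical) assembles a finite northwest corner of such a block-banded matrix from a dictionary of blocks and enforces the band condition:

```python
import numpy as np

def assemble_block_band(blocks, n, d):
    """Assemble the northwest corner of a block matrix P from a dict
    blocks[(i, j)] of d x d arrays, enforcing the band condition of (1):
    P_ij = 0 unless i - n <= j <= i + 1."""
    N = 1 + max(i for i, _ in blocks)
    P = np.zeros((N * d, N * d))
    for (i, j), B in blocks.items():
        assert i - n <= j <= i + 1, "block outside the band of (1)"
        P[i * d:(i + 1) * d, j * d:(j + 1) * d] = np.asarray(B, dtype=float)
    return P

# toy example: d = 1, n = 1 (a scalar birth-death structure)
blocks = {(0, 0): [[0.5]], (0, 1): [[0.5]],
          (1, 0): [[0.3]], (1, 1): [[0.2]], (1, 2): [[0.5]],
          (2, 1): [[0.5]], (2, 2): [[0.5]]}
P = assemble_block_band(blocks, n=1, d=1)
```

In this toy corner every row sums to one; for a truncation of a genuinely infinite chain, the last block rows lose mass, which is exactly the issue the truncation-augmentation strategy below addresses.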
Vector-state Markov chains of the form (1) generalize the theory of matrix-geometric processes which was pioneered by M. Neuts [1]. As he points out in his monograph, the matrix-geometric approach is applicable to a wide class of stochastic models. The characteristic feature of these models is that the blocks within the transition probability matrix (or the generator matrix in the case of continuous-time models) depend only on their relative position to the main diagonal, that is, P_{ij} depends only on j − i. Otherwise, Neuts' method fails.
A general strategy for solving systems of the form (1) is the following: (i) truncate the system at level N, that is, consider the northwest corner truncation ^{(N)}P of P with (N + 1) × (N + 1) blocks; (ii) derive a stochastic matrix ^{(N)}P̄ by appropriately augmenting ^{(N)}P; (iii) solve x^{(N)} · ^{(N)}P̄ = x^{(N)}.
Then x^{(N)} might converge to the desired solution x under appropriate conditions (a natural condition is some normalization, e.g., x^{(N)}_{(0,1)} = 1). In the case of (1) with n = 1 and P_{ij} depending only on j − i, level-independent quasi-birth-death processes arise. For n = 1 and arbitrary P_{ij}, the quasi-birth-death process is level-dependent. Several publications have dealt with the problem of numerically computing the invariant measures for this class of stochastic processes: In [3] and in [4], algorithms based on taboo probabilities were developed. In [5], a method relying on matrix continued fractions was discussed. In principle, these three algorithms differ with respect to the question of how to compensate for the truncation of the infinite state space which is inevitable for algorithmic procedures, that is, how to augment ^{(N)}P. The major part of the algorithmic procedure is similar in all three algorithms, and as pointed out in [6], for large truncation levels, it can be used without any augmentation. Since ^{(N)}P is then not stochastic anymore, the last step of the above general method has to be interpreted appropriately: x^{(N)} shall solve x^{(N)} · ^{(N)}P = x^{(N)} up to the first scalar equation. This simplified matrix-analytic algorithm becomes even more powerful when only stationary expectations have to be computed, since then the memory-efficient variant suggested in [7] can be applied.
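The three steps of the general strategy can be illustrated on a toy scalar chain (d_i = 1). The following sketch (our own illustration, not any of the cited algorithms) truncates a birth-death chain at level N, augments the lost mass onto the diagonal of the last row, and solves the normalized system x^{(N)} · ^{(N)}P̄ = x^{(N)} by least squares:

```python
import numpy as np

p, q, N = 0.3, 0.7, 60            # toy birth-death chain, truncation level N
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = p           # one level up
    P[i, max(i - 1, 0)] += q      # one level down (reflecting at level 0)
P[N, N] += p                      # augmentation: return lost mass to the diagonal

# solve x P = x together with the normalization sum(x) = 1
A = np.vstack([P.T - np.eye(N + 1), np.ones(N + 1)])
b = np.zeros(N + 2); b[-1] = 1.0
x, *_ = np.linalg.lstsq(A, b, rcond=None)
```

For this chain the invariant measure is geometric with ratio p/q, so x[1]/x[0] ≈ 3/7, which the truncated-and-augmented system reproduces.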
Recently, Masuyama published a technical report [8] in which Markov chains (with a focus on continuous-time Markov chains) with a general block structure are discussed. Under somewhat restrictive conditions, it is shown that the above method (truncation, augmentation, finding a solution of the truncated and augmented system) converges to the desired invariant measure as the truncation level tends to ∞. Since Markov chains with a general block structure are discussed, transition matrices of the form (1) or its continuous analogue ((20) below) can be interpreted as special cases. However, the focus in [8] is the discussion of augmentation procedures.
We treat the problem of numerically finding an invariant measure from a different perspective: After the truncation, we perform no augmentation but find x^{(N)} in a similar manner as in the method for quasi-birth-death processes described in [7]. In some situations, omitting the augmentation might require a higher truncation level than an appropriate augmentation would (although this is not clear). However, we discuss not only a basic algorithm for computing invariant measures but also the direct computation of stationary characteristics, for which we introduce a memory-efficient implementation. Due to this memory efficiency, our method can be applied with a very high truncation level.
In summary, we combine the ideas of Neuts' original matrix-geometric approach with considerations for level-dependent quasi-birth-death processes (although the resulting transition structures are not as general as in [8]) and with memory-efficient algorithms as developed in [7] for quasi-birth-death processes.
The rest of the paper is organized as follows: We begin with a derivation of our results and methods based on taboo probabilities in Section 2 and extend the results to continuous-time Markov chains in Section 3. The resulting algorithm for computing invariant measures is presented in Section 4, where we also derive a memory-efficient implementation for computing stationary expectations. Section 5 is dedicated to an alternate approach based on generalized continued fractions or matrix continued fractions, respectively. Although these considerations do not result in new computational methods, they give us better insight into the algebraic and numerical aspects of the approach. Finally, in Section 6, we demonstrate the applicability of our methods by briefly discussing a queueing model with batch service and server breakdowns.

A Derivation Based on Taboo Probabilities
According to [2, 9, 10], an invariant measure x = (x_ℓ)_{ℓ∈E} of an irreducible recurrent Markov chain X with state space E and transition probability matrix P = (P_{ij})_{i,j∈E} is given by x_{ℓ_0} = 1 and

x_ℓ = ∑_{k=1}^∞ ℙ(X_k = ℓ, X_m ≠ ℓ_0 for m = 1, ⋯, k | X_0 = ℓ_0)

for ℓ ≠ ℓ_0, where ℓ_0 is some fixed state and ℙ(A) denotes the probability of event A. The summands on the right-hand side are taboo probabilities. Irreducibility of P guarantees that the series converges for all ℓ ∈ E. x_ℓ can be interpreted as the expected number of visits to state ℓ before the Markov chain returns to its initial state ℓ_0. Note that x is the unique invariant measure of P with x_{ℓ_0} = 1.
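For a finite chain, this taboo-probability representation can be verified directly: deleting the row and column of ℓ_0 from P gives the taboo matrix Q, and x = (1, P_{ℓ_0,·}(I − Q)^{-1}) is invariant. A small sketch (our illustration, assuming a concrete 3-state chain):

```python
import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.6, 0.2, 0.2]])   # irreducible 3-state chain, taboo state l0 = 0

Q = P[1:, 1:]                      # transitions avoiding l0
x_rest = P[0, 1:] @ np.linalg.inv(np.eye(2) - Q)  # expected visits before return
x = np.concatenate(([1.0], x_rest))               # normalization x_{l0} = 1
```

The vector x then satisfies x P = x, and x_ℓ equals the expected number of visits to ℓ between two consecutive visits to ℓ_0.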

Journal of Applied Mathematics
In the present case, we choose ℓ_0 = (0, 1) and obtain

x_{(i,j)} = ∑_{k=1}^∞ ℙ(X_k = (i, j), X_m ≠ (0, 1) for m = 1, ⋯, k | X_0 = (0, 1))

for (i, j) ≠ (0, 1). By σ_N = inf{k ∈ {1, 2, 3, ⋯} : X^{(1)}_k ≥ N}, we denote the first hitting time by X of some level ≥ N.
Following [9, 10], we have a representation of x^{(N)} in terms of the vector h^{(N)} = (h^{(N)}_0, ⋯, h^{(N)}_N) solving the truncated system, where ^{(N)}P denotes the northwest corner truncation of P at level N, I is an identity matrix of appropriate size, and e_1 is the vector with unity in the first position and zeroes elsewhere. The solution is unique and yields x^{(N)}_{(i,j)} for i ≤ N with (i, j) ≠ (0, 1). Obviously, x^{(N)}_{(i,j)} increases monotonically in N, and by definition, we have x_i = lim_{N→∞} x^{(N)}_i. Furthermore, we set τ_i = inf{k ∈ {1, 2, 3, ⋯} : X^{(1)}_k ≤ i} to be the first positive hitting time of some level ≤ i (note that due to irreducibility and recurrence, τ_i < ∞ almost surely), and we define the matrices R_i and R^{(N)}_i. Again, the entries of R_i can be interpreted as expectations. In this case, conditioned on the initial state (i, j), the entry at position (j, j′) is the expected number of visits to (i + 1, j′) before returning to some level ≤ i. Similarly, the entries of R^{(N)}_i are the expected numbers of visits to (i + 1, j′) before returning to some level ≤ i or hitting some level ≥ N. R^{(N)}_i ⟶ R_i is a direct consequence of the monotone convergence theorem. We are now ready to find the representation of the invariant measures which will be exploited numerically later on. Theorem 1. With the structure (1) of P and the notations introduced above, we have the relations (9)-(13). In (12) and (13), the inverse matrices exist. In (11), the solution of the system of linear equations is unique up to constant multiples, and additionally, the truncated version holds up to the first scalar equation.
Proof. The interpretations of R_i are very similar to the level-independent case (see [1]) and to quasi-birth-death processes (see [4, 5]). Therefore, we do not state all proofs in detail.

This proves (9) for i ≥ 1. For i = 0, we have an additional summand for k = 1, but on the other hand, for k ≥ 2, the sum regarding j starts with j = 2. It turns out that (9) still holds. (10) is proven analogously.
(ii) Using the structure (1) of P and (9) repeatedly, it becomes clear that x_0 satisfies equation (11), which can be rewritten as x_0 = x_0 K_0 for a suitable matrix K_0. By a calculation similar to that for proving (9), we see that the entry of K_i at position (j, j′) is the probability that the Markov chain returns to level i before hitting a level < i and that this return is to (i, j′). In particular, the entries of K_0 represent the probabilities that the Markov chain eventually returns to level 0 and that this return occurs in (0, j′). Irreducibility and recurrence of X imply stochasticity and irreducibility (and, K_0 being finite, recurrence) of K_0. Hence, x_0 is an invariant measure for K_0, which is uniquely defined up to a constant factor.
(iii) Alternatively, (11) can be shown by arguments similar to those leading to (9). These arguments also confirm that x^{(N)}_0 satisfies the corresponding equation up to the first scalar equation. (iv) Additionally, define the matrix of the expected numbers of visits to (i, j′) before hitting some level < i. Then considerations similar to those yielding (9) lead to the stated representation. The fact that R^{(N)}_i converges monotonically increasing to R_i entails that the R_i are the minimal nonnegative solutions of the corresponding equations, which corresponds to similar results in [1]. However, we will not make use of this fact in Algorithms 1 and 2, which we suggest for computing invariant measures and stationary expectations.

Continuous-Time Markov Chains
For continuous-time Markov chains Y = (Y_t)_{t≥0} with block-structured generator matrix Q of the form (20), invariant measures satisfy yQ = 0. For the construction of a continuous-time Markov chain from its infinitesimal generator as well as for precise definitions of properties such as regularity, recurrence, and ergodicity, the reader is referred to [11, 12]. Assuming regularity, irreducibility, and recurrence, the solution to yQ = 0 is unique up to constant multiples and strictly positive. Furthermore, these assumptions allow constructing the embedded jump chain, which is a discrete-time Markov chain X = (X_n)_{n=0}^∞ with transition probability matrix

P = I + D^{-1} Q

of structure (1), where D = diag(D_0, D_1, ⋯) and D_i is the diagonal matrix with entries −q_{(i,j),(i,j)} at position (j, j). Irreducibility of Y implies −q_{(i,j),(i,j)} > 0 for all (i, j) ∈ E, and hence, D_i is indeed invertible for all i ∈ {0, 1, 2, ⋯}. Furthermore, irreducibility and recurrence of Y imply irreducibility and recurrence of X and vice versa (note that this is not true for ergodicity). Finally, if x is invariant for P, an invariant measure y for Q can be constructed by setting y_i = x_i D_i^{-1}. Renaming the matrices from Section 2 to R̃_i, we have x_i R̃_i = x_{i+1} for the invariant measure of the embedded jump chain. Hence, we obtain

y_{i+1} = x_{i+1} D_{i+1}^{-1} = x_i R̃_i D_{i+1}^{-1} = y_i D_i R̃_i D_{i+1}^{-1} = y_i R_i  with  R_i = D_i R̃_i D_{i+1}^{-1},

and with (11), we obtain the corresponding boundary equation for y_0. Taking into account that R̃_i satisfies (12), we arrive at the continuous-time analogue of the backward scheme. Since the j′th diagonal entry of D_{i+1}^{-1} can be interpreted as the mean sojourn time in (i + 1, j′), the entries of R̃_i D_{i+1}^{-1} are the expected times spent in state (i + 1, j′) before returning to some level ≤ i, conditioned on starting in (i, j). Therefore, the entries of R_i represent the ratio between the expected time spent in (i + 1, j′) before returning to some level ≤ i and the mean time spent in the initial state (i, j).
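The passage between Q and its embedded jump chain can be checked on a small example. The sketch below (our illustration, assuming an arbitrary small conservative generator) builds P = I + D^{-1}Q, computes an invariant x of P via the taboo construction of Section 2, and confirms that y = x D^{-1} satisfies yQ = 0:

```python
import numpy as np

Q = np.array([[-2.0,  1.0,  1.0],
              [ 2.0, -3.0,  1.0],
              [ 1.0,  1.0, -2.0]])    # small conservative generator

D = np.diag(-np.diag(Q))              # diagonal entries -q_ll
P = np.eye(3) + np.linalg.inv(D) @ Q  # embedded jump chain (zero diagonal)

# invariant measure of P with x[0] = 1 via the taboo construction
x = np.concatenate(([1.0],
                    P[0, 1:] @ np.linalg.inv(np.eye(2) - P[1:, 1:])))
y = x @ np.linalg.inv(D)              # y_i = x_i / (-q_ii)
```

Since x(P − I) = 0 and P − I = D^{-1}Q, the rescaled vector y is indeed a left null vector of Q.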
Summarizing, we get Theorem 2. With the above assumptions on Q, we have y_{i+1} = y_i R_i (24) together with a boundary system (26) for y_0, where the solution of the system of linear equations in (26) is unique up to constant multiples. The R_i's satisfy the fixed-point equations (27) and can be uniquely constructed by the backward scheme (28); the truncated version of (26) holds up to the first scalar equation.

Algorithms
Next, we demonstrate how to use the results numerically. We focus on continuous-time Markov chains since the methods for discrete-time Markov chains follow an analogous scheme. Basically, we have to use (28) for computing approximations to R_{N−1}, ⋯, R_0, then we determine an approximation for y_0 by solving (26) up to the first scalar equation, and finally, we use (24) for computing approximations to y_1, ⋯, y_N. In order to save operations, we suggest the following algorithm for computing an approximation to the invariant distribution.
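Since the displayed schemes (24), (26), and (28) are not reproduced here, we illustrate the shape of the computation for the block-tridiagonal special case n = 1 (a level-dependent QBD), where the backward scheme takes the familiar form R_i = Q_{i,i+1}(−H_{i+1})^{-1}, H_i = Q_{ii} + R_i Q_{i+1,i}, followed by solving y_0 H_0 = 0 up to the first scalar equation and propagating y_{i+1} = y_i R_i. The function name `ldqbd_invariant` and its interface are ours; this is a hedged sketch, not the paper's Algorithm 1:

```python
import numpy as np

def ldqbd_invariant(Q_diag, Q_up, Q_down, N):
    """Backward sweep for a block-tridiagonal generator (n = 1 sketch).
    Q_diag[i] = Q_{ii}, Q_up[i] = Q_{i,i+1}, Q_down[i] = Q_{i,i-1}."""
    d = Q_diag[0].shape[0]
    R = [None] * N
    H = Q_diag[N]                          # truncation: levels > N ignored
    for i in range(N - 1, -1, -1):
        R[i] = Q_up[i] @ np.linalg.inv(-H)
        H = Q_diag[i] + R[i] @ Q_down[i + 1]
    y0 = np.ones(d)                        # y0 H = 0 up to the first scalar
    if d > 1:                              # equation, normalized y0[0] = 1
        y0[1:] = np.linalg.solve(H.T[1:, 1:], -H.T[1:, 0])
    y = [y0]
    for i in range(N):
        y.append(y[-1] @ R[i])
    return np.concatenate(y)

# M/M/1 with arrival rate 1 and service rate 2 as a scalar test case
N = 50
Q_diag = [np.array([[-1.0]])] + [np.array([[-3.0]])] * N
Q_up = [np.array([[1.0]])] * N
Q_down = [None] + [np.array([[2.0]])] * N
y = ldqbd_invariant(Q_diag, Q_up, Q_down, N)
```

For this M/M/1 example, the invariant measure is geometric with ratio 1/2, and the computed y reproduces y[1]/y[0] ≈ 0.5 to machine precision.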
In case of positive recurrence, an invariant distribution can be obtained afterwards by renormalizing y. We give some remarks: (i) In [3–5] or in [8], respectively, an augmentation is performed first in order to obtain a stochastic or conservative matrix. Note that Theorem 1 guarantees that for N ⟶ ∞, we have convergence of the approximations x^{(N)} or y^{(N)} to the desired invariant measure without any augmentation. The only benefit of an augmentation is that the truncation error might be decreased a bit for small values of the truncation level N. Instead of considering an augmentation, we suggest choosing N very large. If such a choice results in computational problems, we suggest using the memory-efficient algorithm for computing stationary expectations (see below). Furthermore, note that the results in [8] were derived under different conditions: While P_{ij} ≠ 0 is allowed for all j ≤ i + 1 (lower-block Hessenberg matrices), some restrictive conditions (Assumptions 2.1 and 2.2 in [8]) are assumed. (ii) A major problem is the choice of the truncation level N. There are some approaches concerning a priori estimates of the truncation error (e.g., [13]), but the problem is still not completely solved.
Instead of using such error estimates, we might alternatively, as in [8], increase the truncation level iteratively, that is, start with some small N and increase N by 1 in each step until ||y^{(N)} − y^{(N−1)}|| < ε for some fixed norm ||·|| and some fixed small ε > 0. Note that for each new value of N, we have to restart Algorithm 1. Furthermore, note that a small difference ||y^{(N)} − y^{(N−1)}|| does not imply that ||y^{(N)} − y|| is small, where y is the true invariant measure. (iii) In many applications, the state space is finite with a transition structure corresponding to the northwest corner of (1) or (20), respectively. Then Algorithm 1 is exact up to numerical deviations. (iv) Instead of computing the inverse matrix H^{-1}_{i+1} first and then performing the matrix multiplication, the corresponding systems of linear equations can be solved directly. In addition, we have to perform N · n matrix multiplications. The computational effort is approximately γ_1 N d^3 + γ_2 N n d^α with some constants γ_1, γ_2 and α ∈ (2, 3] (the exponent α and the constants γ_1, γ_2 depend on the method of choice; we have α = 3 for conventional matrix multiplication and α = 2.807 for Strassen's algorithm, see [14]). (viii) For n = 1, Algorithm 1 more or less coincides with the methods proposed in [3–5]. One might expect that also in general, a discrete-time Markov chain with transition probability matrix (1) and bandwidth > 3 or a continuous-time Markov chain with generator matrix (20) and bandwidth > 3, respectively, may be reduced to the block-tridiagonal case by appropriately rearranging the states, i.e., defining a new level 0 as the union of the original levels 0, ⋯, n − 1, a new level 1 as the union of the original levels n, ⋯, 2n − 1, and so on. But the increased block size heavily impacts the performance of the algorithm. If we apply the same truncation procedure, the truncation level for the modified matrix structure is ≈ N/n, and for d_i = d, all matrices involved have dimension (nd) × (nd).
This results in a computational effort of approximately γ_1 (N/n)(nd)^3 + γ_2 (N/n)(nd)^α = γ_1 N n^2 d^3 + γ_2 N n^{α−1} d^α. Since α > 2, the computational effort for the transformed system becomes significantly larger than the effort for the original system. Similarly, the memory requirement of N d^2 is replaced by (N/n) · (nd)^2 = N n d^2.
In applications, often only stationary expectations have to be considered: Consider the case of positive recurrence, and denote by π the invariant distribution. Then the ergodic theorem for discrete-time Markov chains guarantees that the long-run averages of f(X_k) converge to πf almost surely, provided that πf converges absolutely. Now, let f = (f_1, ⋯, f_L, f_{L+1}): E ⟶ ℝ^{L+1} be a vector-valued function with f_{L+1}(ℓ) = 1 for all ℓ ∈ E. Then for continuous-time Markov chains, we have πf_ℓ = yf_ℓ/yf_{L+1} for ℓ = 1, ⋯, L. Hence, computing yf allows us to determine the stationary characteristics πf_ℓ, which coincide with the long-run costs or rewards measured by f_ℓ. When computing yf, we can apply a Horner-type scheme which allows us not to store the matrices R_i and therefore saves a considerable amount of memory. This idea is very similar to that in [7]. With Z^{(N)}_i defined as the corresponding partial sums, we obviously have Z^{(N)}_N = f(N) and the backward recursion Z^{(N)}_i = f(i) + R_i Z^{(N)}_{i+1}. We give some remarks: (i) The problem of finding an appropriate truncation level N which keeps the error within prescribed limits is still to be solved. In [16], a generalization of the idea from [13] is discussed, and ∑_{i=0}^N y_i f(i) and ∑_{i=0}^∞ y_i f(i) are compared. Unfortunately, the method is quite cumbersome to apply, especially because it would be more desirable to compare the resulting stationary expectations directly. (iv) In total, we have to store the matrices H_i and Z.
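Combining the backward sweep for the R_i with the backward recursion for Z gives the memory-efficient evaluation of yf in a single pass, discarding each R_i immediately. Again we sketch only the n = 1 case (function name and interface are ours, not the paper's Algorithm 2):

```python
import numpy as np

def stationary_expectation(Q_diag, Q_up, Q_down, f, N):
    """One backward sweep computing y f = y_0 Z_0 with Z_i = f(i) + R_i Z_{i+1},
    without storing the R_i (n = 1 sketch)."""
    d = Q_diag[0].shape[0]
    H = Q_diag[N]
    Z = f(N)                               # d x (L+1) reward matrix at level N
    for i in range(N - 1, -1, -1):
        R = Q_up[i] @ np.linalg.inv(-H)
        H = Q_diag[i] + R @ Q_down[i + 1]
        Z = f(i) + R @ Z                   # Horner-type update
    y0 = np.ones(d)                        # y0 H = 0 up to the first scalar
    if d > 1:                              # equation, normalized y0[0] = 1
        y0[1:] = np.linalg.solve(H.T[1:, 1:], -H.T[1:, 0])
    return y0 @ Z

# M/M/1 test: f(i) = (i, 1) gives E[N] = (y f)_1 / (y f)_2
N = 60
Q_diag = [np.array([[-1.0]])] + [np.array([[-3.0]])] * N
Q_up = [np.array([[1.0]])] * N
Q_down = [None] + [np.array([[2.0]])] * N
yf = stationary_expectation(Q_diag, Q_up, Q_down,
                            lambda i: np.array([[float(i), 1.0]]), N)
```

For this M/M/1 example with ρ = 1/2, the ratio yf[0]/yf[1] approximates E[N_∞] = ρ/(1 − ρ) = 1, while the memory requirement stays independent of N.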
Since we need n matrices H_i at the same time, the total memory requirement is O(nd^2) if the matrix dimensions d_i = d are constant. In particular, the memory requirement does not depend directly on the truncation level N. (v) As pointed out above, in ([8], section 3.2), lower-block Hessenberg matrices (block GI/M/1-type matrices) are considered, that is, P_{ij} = 0 for j > i + 1, but P_{ij} ≠ 0 is allowed for all j < i. All our theoretical considerations carry over (we only have to replace n by ∞ or n by N at some points). Unfortunately, when applying Algorithm 2 to such block matrices, we still have to store N matrices, and hence, it is not memory-efficient anymore. Since the memory-efficient implementation makes the theoretical considerations applicable to a wide range of stochastic models, we have restricted ourselves to band-block-structured matrices. (vi) The considerations concerning the computational effort are similar to those for Algorithm 1. In particular, we do not recommend reducing the structure to the case n = 1. (vii) Note that often, "natural" state spaces are multidimensional. However, if there is a partition E = E_0 ∪ E_1 ∪ E_2 ∪ ⋯ into finite levels E_i such that transitions with positive probability/transition rate increase the level at most by 1 and decrease it at most by n, an appropriate renumbering of the states results in a structure given by (1) or (20). We refer to the end of Section 6 for an example and to [7, 13, 17] for the corresponding discussion concerning quasi-birth-death processes (that is, n = 1).

Relationship to Matrix Continued Fractions
This section is devoted to the interdependence between Markov chains and matrix continued fractions. The results obtained are of importance not only from an inner-mathematical but also from a numerical point of view. Matrix continued fractions are generalizations of ordinary continued fractions and generalized continued fractions (or n-fractions, see [18, 19]). We will show that with each transition matrix P of the form (1), one can associate a matrix continued fraction (or a matrix-driven Jacobi-Perron algorithm) which converges if X is irreducible and recurrent. In addition, it is proven that its approximants and its limits coincide with the coefficients of the recurrence relations (10) and (9), respectively. A milestone in pure mathematics has been the arithmetic characterization of quadratic irrationalities discovered by J. L. Lagrange (see [20]). Lagrange's theorem states that the expansion of a real number as a simple continued fraction is periodic if and only if it is the real root of a quadratic equation in one variable with rational coefficients. Since then, mathematicians like C. G. J. Jacobi [21], O. Perron [22], P. Bachmann [23], L. Bernstein [24], and F. Schweiger [25] have tried to extend Lagrange's approach to higher-degree irrationalities. C. G. J. Jacobi and O. Perron developed a tool for approximating real algebraic irrationalities of degree n ≥ 3. In honor of Jacobi and Perron, this algorithm has been named the Jacobi-Perron algorithm. Although the Jacobi-Perron algorithm does not lead to the desired results, it has been successfully applied to a lot of other unsolved problems in approximation theory and numerical analysis. In particular, in connection with modern computer technology, the Jacobi-Perron algorithm turns out to be an efficient tool for identifying and calculating subdominant solutions of higher-order linear vector recursions.
In the sequel, we will show that the schemes (10) and (13) are equivalent to a specific matrix continued fraction or a matrix-driven Jacobi-Perron algorithm associated with the transition probability matrix (1). For this purpose, we assume that d_i = d for all i = 0, 1, 2, ⋯ and that P_{i,i+1} is invertible for all i = 0, 1, 2, ⋯. Define the coefficient matrices as in (36), where I denotes the unit matrix. Then (36) and (2) yield the (n + 1)th-order homogeneous linear vector recursion (37), subject to the boundary condition

x_0 (P_{00} − I) + x_1 P_{10} + ⋯ + x_n P_{n0} = 0.

In this perspective, the problem at hand is to find the "right" initial values, say x_l, x_{l+1}, ⋯, x_{l+n}, l sufficiently large, such that x_i > 0 for all i. In order to find a solution to this problem based on generalized continued fractions, we generalize and combine the results from [5, 26].
Recall that ^{(N)}h = (^{(N)}h_0, ⋯, ^{(N)}h_N) solves the truncated system (6). (36) and (6) lead to the finite difference scheme (40) coupled to the boundary conditions (41) and (42), where e_1 is a 1 × m vector with unity in the first position and zeroes elsewhere. For simplification, we transform the system (40) into a first-order scheme of the form (43). Now, let ℓ ≤ N be fixed. By repeated substitutions, we obtain from (43) the representation (45) with (46). For the remainder, it will be convenient to partition Φ_ℓ(i) into blocks Φ^{rs}_{ℓ,i} of dimension d × d. In view of the boundary condition (42), we infer the representation (48) from (45). For computing the d × d matrices Φ^{0k}_{ℓ,N−ℓ}, the following result reduces the computational effort.
Here, Φ^{00}_{ℓ,−j} = I for j = 0. Proof. From (46), it is clear that the matrices Φ_ℓ(i) can be computed recursively by (52) with the initial condition (53). Rewriting (52) in expanded form, we get (54). By repeated substitutions, it is readily seen that the matrices Φ^{0k}_{ℓ,i} satisfy the recurrence relation (50), at least for i = n + 1, n + 2, ⋯. To proceed in this manner, we assume, for the moment, that the recurrence relation (50) may be continued for i = 0, −1, ⋯, −n + 1 without affecting the initial condition (53) and the scheme (54). Since the matrices A(ℓ), ⋯, A(−n + 1 + ℓ) do not appear in formula (46), they can be chosen arbitrarily, and an appropriate choice yields the claim. To make (48) congruent with (9) and (10), respectively, we first prove that det(Φ^{00}_{i−1,N−(i−1)}) ≠ 0 for almost all N under a mild condition on P.
Consider the recurrence relation (40) for i = ℓ + 1, ℓ + 2, ⋯, N. Assembling the unknowns ^{(N)}h_ℓ, ^{(N)}h_{ℓ+1}, ⋯, ^{(N)}h_N into a 1 × m(N − ℓ + 1) vector and setting the boundaries accordingly, we obtain a linear system with a coefficient matrix W. Since we assumed det(P_{i,i+1}) ≠ 0 for all i, W is a regular matrix. For its inverse, we choose a partition in which U is a square matrix of order d. From (48), we know the corresponding representation. The matrix U is called the algebraic complement of the matrix C. A theorem due to Jacobi (see ([27], pp. 123-124, formula (4.45))) states that det U ≠ 0 iff det C ≠ 0. Since we assumed det(P_{N,N+1}) ≠ 0, C is weakly diagonally dominant. To guarantee that C is regular, it is therefore sufficient to assume that C is irreducible (see ([28], p. 531, Theorem 3)).
From (48), we conclude a representation (63); from (63) and (41), we obtain the decoupled recursion (65) and the boundary condition (66), which enable us to compute ^{(N)}h_ℓ for ℓ = 0, 1, ⋯, N. In view of the representations (49), (50), and (51), we make use of the notation (67). (67) can be interpreted as the Nth convergent of the generalized matrix continued fraction (GCF), or the matrix-driven Jacobi-Perron algorithm, associated with the limiting process (68); compare with [22, 29, 30]. If the limits (68) do exist, we obtain (69) as a consequence of (65), (39), and (38). For the calculation of R^{(N)}_{ℓ,ℓ+k} and R_{ℓ,ℓ+k}, we recommend the following algorithms: (a) Forward algorithm (FA): compute the matrices Φ^{0k}_{ℓ,i} via the recurrence relation (50) and form R^{(N)}_{ℓ,ℓ+k} = (Φ^{00}_{ℓ,N−ℓ})^{-1} Φ^{0k}_{ℓ,N−ℓ}. (b) Backward algorithm (BA): since a matrix-driven Jacobi-Perron algorithm may be interpreted as a special case of a matrix continued fraction, its Nth convergent R^{(N)}_{ℓ,ℓ+k} = (Φ^{00}_{ℓ,N−ℓ})^{-1} Φ^{0k}_{ℓ,N−ℓ}, k = 1, ⋯, n, can be calculated from a linear fractional transformation (see ([29], Theorem 2.2)). Restricted to the matrix-driven Jacobi-Perron algorithm, this algorithm runs as the backward recursions (71)-(73) for N sufficiently large. From a numerical point of view, BA requires less computational effort than FA. The only disadvantage of BA is that for each convergent, the whole algorithm has to be repeated, which can be time-consuming if the algorithm converges slowly. Since we are not interested in a particular R_{i,i+k} but in the whole family of matrices R_{i,i+k}, we consider BA the preferable algorithm in all situations.
It is easy to see that the recurrence relations (71)-(73) are equivalent to (12) and (13) with R_{ℓ,ℓ+k} = R_ℓ · R_{ℓ+1} · ⋯ · R_{ℓ+k−1}. Hence, the derivation based on continued fractions leads to the same results as the derivation based on taboo probabilities.
Observe that the derivation based on taboo probabilities still works if det(P_{i,i+1}) = 0 for some i, whereas the derivation based on matrix continued fractions then fails. Nevertheless, the derivation based on matrix continued fractions enables us to detect that (65) is a decoupled version of the difference equation (40), which is important with respect to numerical applications. In [31], it was shown that for n = 1 and d = 1, the difference equation (37) is subject to numerical instability because the invariant measure x is asymptotically dominated by other solutions of the difference equation (see [32]). In addition, it was shown that the continued fraction algorithms (65) and (71) result in a decoupled recursion for the computation of x in which the dominant solutions of (37) do not appear. However, we cannot assure that in general, the decoupled recursion (65) is free from all sources of numerical instability. To take this issue forward, we need tailored Pringsheim- and Pincherle-type convergence criteria for matrix continued fractions; compare with [29].
Remark 4. The existence of the limits (68) has been shown in Theorem 1 by probabilistic arguments. To prove (68) in a purely analytic manner, further research is required. The reason is the following: Obviously, the convergence of the matrix-driven Jacobi-Perron algorithm is implied by the diagonal dominance of the transition probability matrix P. In general, there exists a great variety of convergence criteria for ordinary continued fractions and their generalizations which appeal to the diagonal dominance of their coefficients (the so-called Pringsheim criteria). But in the case of the matrix-driven Jacobi-Perron algorithm, these results are only available in terms of norm-based criteria, which are too restrictive to imply convergence in the present case. It may therefore be worth studying the relationship between Markov chains and approximation theory in more detail in order to discover alternate techniques for detecting convergence of the matrix-driven Jacobi-Perron algorithm. Notice that the approximants of the invariant measure x of (1) are entirely determined by the finite difference schemes (65)-(67). Hence, the analytic proof of the existence of the limits (68) is mainly of theoretical interest.

Numerical Example
Consider a queueing system consisting of a single server which is subject to breakdowns. The up time of the system is exponentially distributed with parameter α > 0. The down time is assumed to be Erlang distributed with shape parameter K ∈ ℕ and rate parameter Kβ for some β > 0 (implying that the mean down time is 1/β). Customers arrive according to a Poisson stream with rate λ > 0. By κ_i > 0, we denote the probability that an arriving customer joins the queue when i customers are present and the server is active. Analogously, θ_i ≥ 0 denotes the probability that an arriving customer joins the queue when i customers are present and the server is down. Notice that it could make sense to assume θ_i = 0, that is, customers will not join the queue when the system is interrupted. When fewer than n customers are present, no service is provided. Otherwise, n customers are served at a time. The service time of each batch is assumed to be exponentially distributed with parameter μ > 0. In addition, we assume that the arrival process, the service process, and the breakdown mechanism are stochastically independent. The model under consideration is an extension of the usual M/M^{(n,n)}/1 queue, which is described in more detail in [33].
Remember that an Erlang distributed random variable may be interpreted as a phase-type distributed random variable [1] with distribution function

F(t) = 1 − ∑_{k=0}^{K−1} e^{−Kβt} (Kβt)^k / k!,  t ≥ 0.

Erlang distributions are frequently used when modelling random variables with a small variance or even approximating deterministic behaviour. This is due to the fact that the variance of the Erlang distribution with parameters K and Kβ is 1/(Kβ^2), and this is indeed the smallest variance which can occur for a phase-type distribution with order K and expectation 1/β (see [34] for more details).
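The moment formulas can be checked with exact rational arithmetic, viewing the Erlang(K, Kβ) down time as a sum of K i.i.d. exponential phases of rate Kβ (a minimal sketch of ours; the helper name is hypothetical):

```python
from fractions import Fraction

def erlang_mean_var(K, beta):
    """Mean and variance of an Erlang(K, K*beta) random variable,
    i.e. a sum of K i.i.d. exponential phases with rate K*beta."""
    rate = K * beta                  # phase rate K*beta
    mean = Fraction(K, 1) / rate     # K / (K*beta) = 1/beta
    var = Fraction(K, 1) / rate**2   # K / (K*beta)^2 = 1/(K*beta^2)
    return mean, var

mean, var = erlang_mean_var(4, Fraction(1, 10))   # K = 4, beta = 0.1
```

The mean 1/β is independent of K, while the variance 1/(Kβ²) shrinks as K grows, matching the use of Erlang phases to approximate deterministic down times.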
Due to the lack-of-memory property of the exponential distribution and the independence assumptions, the system under consideration is characterized by a two-dimensional continuous-time Markov chain Y = (Y_t)_{t≥0} with Y_t = (N_t, U_t), where (i) N_t is the number of customers in the system at time t; (ii) U_t = j ∈ {1, ⋯, K} indicates that the server is down and the down time is in phase j at time t; (iii) U_t = K + 1 indicates that the server is up at time t.
Hence, the underlying state space is E = {0, 1, 2, ⋯} × {1, 2, ⋯, K + 1}, and we obtain a generator matrix of structure (20). A sufficient condition (76) for Y to be regular and ergodic can be deduced from Tweedie's criterion ([35], Theorem 2.3). The condition is to find a nonnegative sequence ξ_{(i,j)} and an integer N > 0 satisfying ξ_{(i,j)} ⟶ ∞ for i + j ⟶ ∞ together with the corresponding drift inequalities for i ≥ n, i + j ≥ N, and some ε > 0. A simple check shows that a suitable sequence satisfies this set of inequalities as long as (76) holds.
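Since the displayed blocks are not reproduced here, the following sketch (our reconstruction from the verbal model description, not the paper's displayed matrices) builds candidate blocks Q_{i,i+1}, Q_{ii}, and Q_{i,i−n} with phases 1, …, K encoding the repair phase (server down) and phase K + 1 the working server; at least the conservativity of each block row can then be verified:

```python
import numpy as np

def generator_blocks(i, n, K, lam, mu, alpha, beta, kappa, theta):
    """Candidate blocks of structure (20) for the breakdown queue at level i.
    Indices 0..K-1 encode repair phases 1..K (down); index K means 'up'."""
    d = K + 1
    up = np.zeros((d, d))                  # Q_{i,i+1}: admitted arrivals
    for j in range(K):
        up[j, j] = lam * theta(i)          # arrival while the server is down
    up[K, K] = lam * kappa(i)              # arrival while the server is up
    low = np.zeros((d, d))                 # Q_{i,i-n}: batch service of size n
    if i >= n:
        low[K, K] = mu                     # only the working server serves
    diag = np.zeros((d, d))                # Q_{ii}: phase changes within level i
    for j in range(K - 1):
        diag[j, j + 1] = K * beta          # next repair phase
    diag[K - 1, K] = K * beta              # repair finished, server up again
    diag[K, 0] = alpha                     # breakdown, repair starts in phase 1
    diag -= np.diag(up.sum(1) + low.sum(1) + diag.sum(1))  # conservative
    return low, diag, up

low, diag, up = generator_blocks(5, 2, 3, 1.0, 1.0, 0.1, 0.1,
                                 kappa=lambda i: 1.0 / (i + 1),
                                 theta=lambda i: 0.5)
```

For every interior level i ≥ n, the three blocks together have zero row sums, as any block row of a conservative generator of structure (20) must.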
Next, we focus on the calculation of the following performance values:
(i) The expected stationary number E[N_∞] of customers in the system, where π_i = (π_(i,1), ⋯, π_(i,K+1))
(ii) The corresponding variance
(iii) The stationary probability p̃ that no service is provided despite ≥ n customers waiting
(iv) The stationary probability of the system being down
The latter probability can be computed exactly as (1/β)/(1/α + 1/β) = α/(α + β). Therefore, the numerical computation of this probability only serves as a check of the implementation.
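Given a (normalized) stationary vector partitioned into level blocks π_i, these four quantities are simple post-processing steps. The following sketch assumes a finite list of level blocks (truncated and renormalized); the function name and the 0-based phase convention (first K entries = down phases, last entry = up) are ours:

```python
import numpy as np

def stationary_summaries(pi_blocks, n, K):
    """pi_blocks[i]: stationary mass of level i (length K+1, last entry = up).
    Returns E[N], Var[N], P(down), and p-tilde = P(down | N >= n)."""
    lv = np.arange(len(pi_blocks))
    level_mass = np.array([p.sum() for p in pi_blocks])
    EN = (lv * level_mass).sum()                       # E[N_inf]
    EN2 = (lv ** 2 * level_mass).sum()
    var = EN2 - EN ** 2                                # Var[N_inf]
    down = np.array([p[:K].sum() for p in pi_blocks])  # down-phase mass
    p_down = down.sum()                                # P(U_inf <= K)
    p_tilde = down[n:].sum() / level_mass[n:].sum()    # P(down | N >= n)
    return EN, var, p_down, p_tilde
```

For an unreliable server with exponential up and Erlang down times, p_down should reproduce α/(α + β), which gives the implementation check mentioned above.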
For calculating all these values, we apply Algorithm 2 with L = 5 and a reward function built from indicator functions, where 1_A(x) = 1 for x ∈ A and 1_A(x) = 0 for x ∉ A. After applying the algorithm, the variance can be deduced from the first two entries of the resulting vector. Similarly, the conditional probability p̃ = ℙ(U_∞ ≤ K | N_∞ ≥ n) can be obtained from the third and the fourth entry.
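The paper's exact definition of the L = 5 reward function is not reproduced above. As an illustration only, one choice consistent with the described use of the entries (first two give the variance, third and fourth give p̃, fifth gives the down probability) would be:

```python
def f(i, j, n, K):
    """Hypothetical L = 5 reward vector for state (i, j); this is our
    illustrative guess, not the paper's definition."""
    down = 1 if j <= K else 0   # server down: phase j in {1,...,K}
    busy = 1 if i >= n else 0   # at least n customers present
    return (i,             # -> E[N_inf]
            i * i,         # -> E[N_inf^2]; with entry 1 gives the variance
            down * busy,   # -> P(U_inf <= K, N_inf >= n)
            busy,          # -> P(N_inf >= n); entries 3/4 give p-tilde
            down)          # -> P(U_inf <= K)
```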
In Table 4, we fix λ = μ = 1.0 and α = β = 0.1 and vary the coefficient of variation of the down time (i.e., CV = 1/√K) from 1 to 1/√20. The usual M/G/1 queue with service interruptions has been investigated in great detail by Gaver [36] and Federgruen and Green [37]. Their explicit formulas tell us that the mean number of customers in the system and the variance of the number of customers in the system are strongly affected by the arrival rate (i.e., λ), the ratio of the mean repair time to the mean service time (i.e., μ/β), and … Obviously, the same is true for the state-dependent bulk service queue with unreliable server (see Tables 1 and 2). The conditional probability p̃ may be interpreted as a certain measure of the non-value-added time of the server. The data in Table 3 suggest that for λ → ∞, p̃ converges to α/(α + β) = ℙ(U_∞ ≤ K). This can be deduced from the law of total probability, which states
ℙ(U_∞ ≤ K) = ℙ(U_∞ ≤ K | N_∞ ≥ n) ℙ(N_∞ ≥ n) + ℙ(U_∞ ≤ K | N_∞ < n) ℙ(N_∞ < n).   (85)
Since ℙ(N_∞ ≥ n) → 1 for λ → ∞, we conclude from (85) that p̃ → ℙ(U_∞ ≤ K) = α/(α + β) as λ → ∞. So it is not surprising that α and β are the main influencing factors, whereas the arrival rate λ has only little effect on p̃, which may be explained as follows: due to its state dependence, the arrival stream in our example is massively dampened when the number of customers grows. In other words, in spite of an excessive arrival rate (i.e., λ > μ), the system always reaches a comparatively moderate and balanced state.
Finally, Table 4 reveals that the coefficient of variation of the down time has a relatively small influence on the main characteristics of the system.
It should be mentioned that our model allows the specification of more complex interarrival, service, and up and down times by fitting phase-type distributions to empirical data or by approximating general distributions by phase-type distributions. Since it is only our intention to demonstrate the practical suitability of our algorithm, we do not deepen the discussion of the numerical results.
Obviously, a lot of other queueing situations can be covered by our model. In [17], a tandem queue was analysed in which
(i) customers arrive at the first station according to a Markovian arrival process
(ii) customers served at the first station are transferred to the second station
(iii) customers served there leave the system
(iv) there are c_1 and c_2 servers at the two stations, respectively
(v) at both stations, the service time is phase-type distributed.
As pointed out in [17], the state space of the underlying queueing process consists of the number of waiting customers (at the first station and at the second station), the number of servers serving in the various phases of the phase-type distribution (at the first station and at the second station), and the environmental state for the Markovian arrival process. But by appropriately renumbering the states, it is seen that the process may be transformed into a quasi-birth-death process having a generator matrix of the form (20) with n = 1.
Next, consider a batch service at both stations with a maximum batch size n at the second station. Then departures at the second station decrease the total number of customers in the system by n. Hence, a structure of type (20) with nonconstant block dimensions comes into play. Since the state space remains finite, Algorithms 1 and 2 are exact up to numerical errors. Similarly, many other queueing models may be solved in this manner.
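For such finite instances, the results of Algorithms 1 and 2 can be cross-checked against a direct dense solve of π Q = 0, π e = 1. The following sketch (not the paper's algorithm, just a reference solution for small state spaces) replaces the singular system by an overdetermined least-squares problem:

```python
import numpy as np

def stationary(Q):
    """Stationary distribution of a finite, irreducible generator Q:
    solve pi @ Q = 0 together with pi @ e = 1 via least squares."""
    m = Q.shape[0]
    A = np.vstack([Q.T, np.ones(m)])   # Q^T pi^T = 0 plus normalization row
    b = np.zeros(m + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi
```

For large truncation levels this dense approach is of course infeasible, which is precisely where the block-band structure exploited by Algorithms 1 and 2 pays off.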

Conclusion and Further Research
In this paper, we have suggested methods for computing invariant measures and stationary expectations for Markov chains with a block-band-structured transition probability matrix or generator matrix, respectively. We have presented a probabilistic derivation as well as a derivation based on generalized matrix-valued continued fractions. Although the latter derivation only works under some restrictive assumptions, it yields additional numerical and algebraic insights. From a practical point of view, we want to emphasize the benefits of Algorithm 2. This method generalizes the established algorithm for computing stationary expectations for quasi-birth-death processes [7], and in situations where we are only interested in stationary expectations (which should be the case in most practical applications), it is extremely efficient with respect to memory requirements.
As pointed out in the remarks concerning Algorithms 1 and 2, all considerations except the memory-efficient computation of stationary expectations can be easily extended to lower-block Hessenberg matrices. Unfortunately, for general lower-block Hessenberg matrices (and large truncation level N), the memory requirement becomes critical. Hence, further research could concern the memory-efficient use of the lower-block Hessenberg transition structure (skip-free to the right). Other structures of interest in practical applications are upper-block Hessenberg matrices and general block-band matrices with upper bandwidth > 1. Another interesting point concerns the choice of the truncation level N and the derivation of precise error bounds.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.