Finite-State-Space Truncations for Infinite Quasi-Birth-Death Processes



Introduction
In many applications, discrete-time or continuous-time Markov chains are used to model real-life systems. For these models, characteristics of interest can be determined, enabling us to understand relationships between parameters and characteristics, to solve optimization problems, and so on. When Markov models are used, characterizing the dynamics by the transition probability matrix P or the generator matrix Q, respectively, is straightforward since the underlying theory is well understood. In applications, however, the determination of interesting characteristics of the process is the crucial task. In many models, there is no hope of finding explicit analytical representations of these characteristics, and hence, we have to resort to numerical calculations or simulation methods. In most practical situations, the characteristics of interest are stationary probabilities or stationary expectations which can be derived from the invariant distribution. In queueing theory, elementary examples are given by moments of the stationary number of customers in a queueing system or queueing network, or by the stationary probability that the number of customers exceeds some threshold. Similar characteristics are of interest in Markovian population models, epidemic models, etc. Formally, stationary expectations are given by πf = ∑_{x∈E} π_x f(x), where E is the state space of the Markov chain, π is the invariant distribution, and f is some cost or reward function.
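For a finite chain, computing πf amounts to solving the singular system πP = π together with the normalization ∑_x π_x = 1. A minimal sketch in Python; the 3-state transition matrix P and the reward function f below are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

# Solve pi P = pi with sum(pi) = 1 by replacing one equation of the
# singular system (P^T - I) pi = 0 with the normalization constraint.
A = P.T - np.eye(3)
A[-1, :] = 1.0                      # normalization row
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

f = np.array([0.0, 1.0, 2.0])       # reward f(x) = x
pi_f = pi @ f                       # stationary expectation pi*f
```

For infinite (or very large) state spaces, this direct solve is impossible, which is exactly the truncation problem addressed in the paper.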
The invariant distribution for an irreducible and positive recurrent Markov chain is characterized by the unique probability vector which solves a certain homogeneous system of linear equations. The number of states coincides with the number of equations and the number of variables. Unfortunately, in realistic models, we have a very large number of states or even infinitely many states. In these situations, numerically computing stationary probabilities or stationary expectations requires a truncation of the system of equations (which means a truncation of the state space), and this truncation results in inevitable errors.
Hence, numerical calculations only yield some approximation A to the desired characteristic πf, and from both a mathematical and a practical point of view, bounds on the truncation error πf − A are of interest. In this paper, we will derive lower and upper bounds on πf, which is equivalent to computing an approximation A to πf together with bounds on πf − A. Since all values are computed numerically, our results and methods provide a posteriori error estimates.
Competing methods for finding a posteriori error estimates were given in [1–3], and approaches for finding a priori estimates can be found in [4, 5]. In Section 8, we will briefly compare these methods to the method suggested in this paper.
Although our theoretical results will be quite general, we will only discuss the efficient implementation of the computation of the bounds for level-dependent quasi-birth-death processes (LDQBDs). These processes are characterized by the block-tridiagonal structure of the transition probability matrix (discrete-time) or generator matrix (continuous time). This structure is very typical for a large class of queueing models, population and epidemic models, etc.
The rest of the paper is organized as follows: After introducing some notation in Section 2, we present our theoretical results on lower and upper bounds on stationary expectations πf in Section 3. In Section 4, we derive a method for computing these bounds efficiently for LDQBDs. Afterwards, we apply our method to the elementary M/M/1 queue (Section 5), a variant of the M/PH/1 queue (Section 6), and a popular retrial queueing model (Section 7). Whereas we have an explicit analytical representation of the invariant distribution and of stationary characteristics in the first example, numerical computation is the only way to find stationary expectations in the latter two examples. Finally, we compare the suggested method to competing ones (Section 8).

Preliminaries
2.1. Notations and an Auxiliary Result. In order to introduce our goals formally, we review some well-known facts on the limit behaviour of Markov chains. In this context, and in the whole paper, we will use the following notations:

(i) I is a finite or infinite identity matrix. The dimension will become clear from the context.

(ii) 1 denotes a finite or infinite (column) vector with all entries equal to 1. Alternatively, 1 refers to a function with constant value 1. The meaning will be clear from the context.

(iii) 1_A denotes the indicator function of a set A, that is, 1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 otherwise. With slight abuse of notation, we use 1_B for Boolean expressions B: if B is true, 1_B is 1, and otherwise, 1_B is 0.

(iv) id is the identity function.

(v) We use the notation ℕ = {1, 2, 3, …} and ℕ_0 = {0, 1, 2, 3, …}.
Furthermore, we will frequently use a probabilistic proof of invertibility throughout the paper: if P is finite and transient, I − P is invertible. We give some details: a finite substochastic matrix P = (p_ij)_{i,j=1}^m is referred to as transient if P^n → 0 as n → ∞ (component-wise). This is the case if and only if P contains no stochastic submatrix (p_ij)_{i,j∈F} for some F ⊂ {1, …, m}. Hence, P is transient if and only if, for all i ∈ {1, …, m}, there are i_0, i_1, …, i_k ∈ {1, …, m} with i_0 = i, p_{i_{r−1},i_r} > 0 for r = 1, …, k, and ∑_{j=1}^m p_{i_k,j} < 1. Due to P being finite, P^n → 0 entails |λ| < 1 for all eigenvalues λ of P, and hence, ∑_{n=0}^∞ P^n converges, and by a standard argument, we obtain ∑_{n=0}^∞ P^n = (I − P)^{−1}, which means that I − P is invertible.
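This Neumann-series argument can be checked numerically; the 2×2 strictly substochastic matrix below is an arbitrary illustration (every row sums to less than 1, so no stochastic submatrix exists and the matrix is transient):

```python
import numpy as np

# A strictly substochastic, hence transient, matrix.
P = np.array([[0.2, 0.5],
              [0.3, 0.4]])

# Transience: P^n -> 0, equivalently all eigenvalues satisfy |lambda| < 1.
assert np.max(np.abs(np.linalg.eigvals(P))) < 1.0

# The Neumann series sum_{n>=0} P^n converges to (I - P)^{-1}.
S = np.zeros_like(P)
Pn = np.eye(2)
for _ in range(200):
    S += Pn
    Pn = Pn @ P

inv = np.linalg.inv(np.eye(2) - P)
assert np.allclose(S, inv)
```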

2.2. Results on the Limit Behaviour of Markov Chains.
We briefly summarize some results on the limit behaviour of Markov chains since these results are the reason why it is important to be able to compute (approximations to) stationary expectations πf. We consider Markov chains (X_n)_{n∈ℕ_0} in discrete time or Markov chains (Y_t)_{t≥0} in continuous time with countable state space E. Throughout the paper, we assume that (X_n)_{n∈ℕ_0} is irreducible and recurrent with transition probability matrix P = (p_xy)_{x,y∈E} or that (Y_t)_{t≥0} is nonexplosive, irreducible, and recurrent with generator matrix Q = (q_xy)_{x,y∈E}. Then, there is an invariant measure ψ, that is, a positive vector ψ with ψP = ψ or ψQ = 0, respectively. In both cases, ψ is unique up to constant multiples. For f : E → ℝ, we introduce the notation ψf ≔ ∑_{x∈E} ψ_x f(x). If ψf and ψg converge absolutely with ψg ≠ 0 for functions f, g : E → ℝ, a fundamental result on the limit behaviour of Markov chains is lim_{n→∞} (∑_{k=0}^n f(X_k))/(∑_{k=0}^n g(X_k)) = ψf/ψg in the sense of almost sure convergence (see [6], pp. 85-86, 203-209). Under the stronger assumption of positive recurrence, the sum of the entries of ψ is finite. Hence, by multiplying with an appropriate constant, we obtain the uniquely determined invariant distribution π which still is an invariant measure, but additionally satisfies ∑_{x∈E} π_x = 1. Then, we can set ψ = π and g = 1 constant, and obtain lim_{n→∞} (1/(n+1)) ∑_{k=0}^n f(X_k) = πf almost surely if πf converges absolutely. If f measures costs or rewards associated with the time spent in a state x, πf is the long-run average of costs or rewards per time unit. In particular, for A ⊂ E, we can choose f = 1_A, and π1_A is the long-run average of the proportion of time spent in the set A. For A = {x}, π1_{x} = π_x is the long-run average of the proportion of time spent in state x.
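The long-run-average interpretation can be illustrated by simulation; the two-state chain, the reward vector, and the step count below are hypothetical choices with a known invariant distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-state chain with known invariant distribution pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([10.0, 4.0])   # reward per unit time; pi*f = 8.0

n_steps = 50_000
x = 0
total = 0.0
for _ in range(n_steps):
    total += f[x]
    # Move to state 0 with probability P[x, 0], else to state 1.
    x = 0 if rng.random() < P[x, 0] else 1

avg = total / n_steps       # long-run average, approximately pi*f
```

With the fixed seed, the empirical time-average lands close to πf = 8, as the ergodic theorem predicts.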
Note that there is another interpretation of the entries of the invariant distribution π: for irreducible and positive recurrent discrete-time Markov chains which are additionally aperiodic, X_n converges to some random variable X_∞ in distribution, and for nonexplosive, irreducible, and positive recurrent continuous-time Markov chains, Y_t converges to some random variable Y_∞ in distribution. Then, π is the distribution of X_∞ or Y_∞ (see any textbook on Markov chains, e.g., [6, 7]), that is, ℙ(X_∞ = x) = π_x and ℙ(Y_∞ = x) = π_x. Hence, if πf converges absolutely for some function f, we have E[f(X_∞)] = πf or E[f(Y_∞)] = πf, respectively. Due to these results on the limit behaviour, in many applications, all characteristics of interest can be deduced from ψf^(ℓ) for functions f^(0), …, f^(L), where we will set f^(0) = 1 in case of positive recurrence. Furthermore, by splitting functions into positive and negative parts, we can assume f^(ℓ) ≥ 0 for all ℓ = 0, …, L without restriction. We remark that ψf is vector-valued, ψf = (ψf^(0), …, ψf^(L)).

2.3. Notations for Block-Structured Markov Chains. Throughout this paper, let the state space E = E_0 ∪ E_1 ∪ E_2 ∪ ⋯ be partitioned into finite and pairwise disjoint sets E_i which will be referred to as levels. We introduce some notation:

(i) We fix K ∈ ℕ_0. K will be chosen such that the f-modulated drift condition (see below) holds on E_{K+1} ∪ E_{K+2} ∪ ⋯. Furthermore, throughout this paper, x_0 ∈ E_K is some arbitrary, but fixed state within level E_K.

(ii) ψ = (ψ_x)_{x∈E} denotes the invariant measure with ψ_{x_0} = 1. In case of positive recurrence, π = (π_x)_{x∈E} is the invariant distribution.

(iii) For i, j ∈ ℕ_0, P_ij or Q_ij, respectively, denotes the matrix of transition probabilities or transition rates, respectively, for transitions from states in E_i to states in E_j.
Later on, multidimensional functions f will be useful for computing approximations to several stationary expectations at the same time. With this notation, we have ψf = ∑_{i=0}^∞ ψ_i f_i, and we are interested in computing approximations to the ψ_i and using them for approximating ψf, or we want to find an approximation to ψf directly by other means. As in other approaches for finding error bounds on the approximation of stationary characteristics, we assume that an f-modulated drift condition is satisfied, that is, (5) in discrete time or (6) in continuous time holds. In scalar notation, (6) reads as ∑_{y∈E_{K+1}∪E_{K+2}∪⋯} q_xy g(y) ≤ −f(x) for x ∈ E_{K+1} ∪ E_{K+2} ∪ ⋯. f-modulated drift conditions are a popular tool for proving positive recurrence (for f(x) = 1) and convergence of ψf or πf. In our examples in Sections 5-7, we will use this tool, too. For details, we refer to Theorem 1 and Theorem 2 in [8]. Note that the idea of making use of f-modulated drift conditions for this purpose is much older (see, e.g., [9–13]), but some of these classical references formally require f(x) = 1, whereas the results in [8] only require f(x) ≥ 0.
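Such a drift condition is easy to verify numerically once a candidate g has been guessed. As an illustration with birth-death rates of M/M/1 type (λ = 1, μ = 2 assumed) and an assumed linear test function g(i) = i/(μ − λ) for f = 1:

```python
# Verify the f-modulated drift condition
#     sum_y q_{xy} g(y) <= -f(x)   for x >= K + 1
# for M/M/1-type rates q_{i,i+1} = lam, q_{i,i-1} = mu,
# q_{ii} = -(lam + mu), with f = 1, K = 0, and the (assumed)
# linear test function g(i) = i / (mu - lam).
lam, mu = 1.0, 2.0
g = lambda i: i / (mu - lam)
f = lambda i: 1.0

for x in range(1, 1000):
    drift = lam * g(x + 1) - (lam + mu) * g(x) + mu * g(x - 1)
    assert drift <= -f(x) + 1e-12   # holds with equality here
```

In this example the inequality is tight, which is why the resulting bounds are sharp (see Section 5).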
Finally, throughout the paper, we will require the structural condition P_ij = 0 or Q_ij = 0, respectively, for i > K > j, (7) that is, transitions from levels above E_K to levels below E_K are impossible. We will comment on this structural requirement in Section 8.

Upper and Lower Bounds on ψf : Theoretical Results
Our goal is to find a lower bound and an upper bound on ψf such that

(i) Both bounds converge to ψf as the truncation level N tends to ∞.

(ii) Both bounds can be computed numerically in an efficient manner.
In this section, we focus on the "mathematical" criterion, that is, we derive exact terms for lower and upper bounds which both converge to ψf. The goal of computational efficiency will be considered in Section 4 for the special case of QBDs.
First, we focus on the discrete-time setting. We use the notation for block-structured Markov chains as introduced above, and we begin with a representation of invariant measures and their approximations. For this purpose, we introduce the hitting times σ(N) = inf{n ∈ ℕ : X_n ∈ E_K ∪ E_{N+1} ∪ E_{N+2} ∪ ⋯}, τ = inf{n ∈ ℕ : X_n ∈ E_K}, and τ_0 = inf{n ∈ ℕ : X_n = x_0}. Lemma 1. Let (X_n)_{n∈ℕ_0} be an irreducible recurrent discrete-time Markov chain with block-structured transition probability matrix P as in the preliminaries, let K, N ∈ ℕ_0 with K ≤ N and x_0 ∈ E_K, and let the matrices S_i(N), i ∈ ℕ_0, and T(N) be defined accordingly. Then:

Journal of Applied Mathematics
(a) S_i(N) is well defined, and T(N) and all S_i(N) increase monotonically in N.
Proof. We refer to the representations of S_i(N), T(N), S_i, and T as "probabilistic interpretations." (a) The definition of S_i(N) implies a linear system for the S_i(N). Since the corresponding substochastic matrix is transient, I minus this matrix is invertible, and hence, S_i(N) is well defined for i = 0, …, K − 1, K + 1, …, N. Afterwards, the definition of S_i(N) for i > N is explicit. We omit the proof of the probabilistic interpretation since such considerations are standard in the context of finding probabilistic interpretations for invariant measures (see, e.g., [7, 14]), and since these considerations are similar to those in the proof of Lemma 3(b) or Lemma 4 below. Note that the probabilistic interpretation directly implies the monotonicity since σ(N) increases monotonically.
(b) More precisely, σ(N) converges to τ almost surely. By monotone convergence, the probabilistic interpretation of S_i(∞) and T(∞) follows. Note that these expectations are finite due to recurrence. S_j(∞) = ∑_{i=0}^∞ S_i(∞) P_ij for j ≠ K and T(∞) = ∑_{i=0}^∞ S_i(∞) P_iK can be proved by means of the probabilistic interpretations. Again, we omit all further details.
(d) The statement is a direct consequence of ψf = ∑_{i=0}^∞ ψ_i f_i and the representation of ψ_i found in (c).
Both ψ_K and F depend on P_ij for all i, j ∈ ℕ_0. Our goal is to find bounds both on ψ_K and on F which can be computed from S_0(N), …, S_N(N), since these matrices only depend on the finite matrix P(N) = (P_ij)_{i,j=0}^N. We begin with bounds on ψ_K.

Lemma 2. Let all requirements of Lemma 1 hold, and choose a matrix A as follows:
(a) Such a matrix A exists, and for two such matrices A, Ã, we have Ã = DA with an invertible diagonal matrix D.
(b) For any such choice of A, we have the bounds φ̲_x ≤ ψ_x ≤ φ̄_x for all x ∈ E_K.

(c) Let φ̲_x(N) = φ̲_x and φ̄_x(N) = φ̄_x be defined as in (b). Then, φ̲_x(N) increases monotonically in N, φ̄_x(N) decreases monotonically in N, and both converge to ψ_x for all x ∈ E_K.
Proof. We adapt ideas for proving classical bounds on quotients of entries of the invariant measure (see, e.g., [15]). (b) First, we note that φ̲_x and φ̄_x do not depend on the choice of A since a left-hand multiplication by a diagonal matrix has no impact on the quotients under consideration. Now, we will directly construct a matrix A with a_xx = 1 for all x ∈ E_K. With T(N) = (t_xy)_{x,y∈E_K} and T(∞) = (t_xy(∞))_{x,y∈E_K}, this works as follows: we define the entries a_xy and g_xy as monotone limits of approximations λ^(n)_xy and μ^(n)_xy, respectively. Additionally, we set a_xx = g_xx = 1. Then, for all x ∈ E_K, (a_xy)_{y∈E_K} is the standard construction (see, e.g., [7, 14]) of the minimal subinvariant measure for the substochastic matrix T subject to a_xx = 1. Indeed, the subinvariance inequality holds with equality for y ≠ x. Hence, A(I − T(N)) is a diagonal matrix. Analogously, the rows of G = (g_xy)_{x,y∈E_K} are subinvariant measures for T(∞). This latter matrix is finite and stochastic, and hence, every subinvariant measure is invariant. In particular, every row of G is a constant multiple of ψ_K, that is, g_xy = ψ_y/ψ_x for all x, y ∈ E_K. Due to t_xy ≤ t_xy(∞), we have λ^(n)_xy ≤ μ^(n)_xy by a trivial induction, and thus, a_xy ≤ g_xy. From a_xx = 1, we obtain a_xy ≤ ψ_y/ψ_x for all x, y ∈ E_K. By setting x = x_0, we obtain φ̲_y ≤ ψ_y (due to ψ_{x_0} = 1), and by setting y = x_0, we obtain 1/φ̄_x ≤ 1/ψ_x.
By setting y = x 0 or x = x 0 , respectively, the statement follows.
Lemma 2 gives a lower and an upper bound on all entries of ψ_K. It remains to find bounds on F. A lower bound is a direct consequence of Lemma 1.
Lemma 3.
(a) We have F̲(N) ≤ F.
(b) F̲(N) increases monotonically in N with limit F.
As pointed out above, for finding an upper bound, we have to use some information on p_xy with x ∈ E_{N+1} ∪ E_{N+2} ∪ ⋯ or y ∈ E_{N+1} ∪ E_{N+2} ∪ ⋯. This is done by using the f-modulated drift condition (5). Note that the function f is given, and a function g solving the drift condition can often be found by educated guessing even if there is no chance to find explicit analytical terms for ψ or ψf.
where the respective term vanishes for y ∉ E_K. For y ∈ E_K, we have the corresponding boundary term, and together, we obtain the stated representation. Analogously, we find the analogous representation for the upper bound. Hence, it remains to show E[∑_{n=0}^{τ−1} f^(ℓ)(X_n) | X_0 = x] ≤ E[∑_{n=0}^{σ(N)−1} f̃^(ℓ)(X_n) | X_0 = x] for all x ∈ E_K. For this purpose, we first split the expectation at the hitting time σ(N). Let g^(ℓ)_∞(x) = E[∑_{n=0}^{τ−1} f^(ℓ)(X_n) | X_0 = x]. By means of the tower property of conditional expectations, we obtain a corresponding decomposition of the entries of F. Below, we will show g^(ℓ)_∞(x) ≤ g^(ℓ)(x) for x ∈ E_{K+1} ∪ E_{K+2} ∪ ⋯, and due to f̃(x) = f(x) + ∑_{y∈E_{N+1}∪E_{N+2}∪⋯} p_xy g(y) and N ≥ K, the desired statement follows.
For proving g^(ℓ)_∞(x) ≤ g^(ℓ)(x), we define finite-horizon approximations of g^(ℓ)_∞; by induction, we obtain the corresponding inequality for all finite horizons M. Due to recurrence, τ is finite almost surely, implying min{τ − 1, M} → τ − 1 almost surely, and by monotone convergence, the inequality carries over to g^(ℓ)_∞. Again, let S_j(∞) = (Ψ_xy)_{x∈E_K, y∈E_j}. For x = x_0, we have Ψ_xz ≤ ψ_z (due to the respective probabilistic interpretation; see the proof of Lemma 1 for the interpretation of ψ_z), and since we assume that ψg converges, we have convergence to 0 for N → ∞. Since the choice of x_0 ∈ E_K is arbitrary, this is true for any x ∈ E_K. By summarizing all previous results, we obtain our main result for the discrete-time setting, where we will omit the dependency of S_i, F̲, F̄, … on N. Furthermore, we will phrase the result directly for both the discrete-time and the continuous-time setting. All previous results can be transferred to the context of continuous-time Markov chains easily by means of the embedded jump chain. Since similar considerations have been used in the literature frequently (see, e.g., [4, 16, 17]), we omit further details.
Theorem 5. Let X = (X_n)_{n∈ℕ_0} be an irreducible and recurrent discrete-time Markov chain, or let Y = (Y_t)_{t≥0} be a nonexplosive, irreducible, and recurrent continuous-time Markov chain, respectively. Let the transition probability matrix P or the generator matrix Q, respectively, be block-structured as introduced in the preliminaries. Let the drift condition (5) or (6), respectively, and the structural requirement (7) hold. Then:

(a) S_i is uniquely defined for i = 0, …, N, and I − T or T, respectively, is invertible; in particular, there is a matrix A with the desired properties.

Remark 6. Theorem 5 is phrased for Markov chains with an infinite state space, and the goal is to find approximations to ψf which rely on the transition probabilities for transitions within a finite subset of the state space. Since even finite state spaces can be so large that an exact computation of ψf cannot be performed, we might want to apply Theorem 5 to Markov chains with a finite state space in order to reduce the state space to a smaller one.
With f_i replaced by f̃_i = f_i + ∑_{j=K+1} P_ij g_j, (a) and (b) in Theorem 5 remain true. Note that, formally, we cannot allow N = M since S_K = (H_xy)_{x,y∈E_K} would become stochastic, and I − T would not be invertible anymore. However, N = M means that we have no reduction of the state space, and hence, this setting makes no sense. Part (c) becomes obsolete.

Efficiently Computing the Bounds for Quasi-Birth-Death Processes (Continuous-Time Setting)
We now turn to developing methods for computing both bounds on ψf efficiently (and simultaneously). As pointed out in the introduction, we focus on quasi-birth-death processes (QBDs). QBDs are characterized by P_ij = 0 or Q_ij = 0, respectively, for i, j ∈ ℕ_0 with |i − j| ≥ 2, that is, each jump changes the level by at most 1.
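The resulting block-tridiagonal shape can be sketched with a small helper (illustrative only; actual implementations work block by block and never assemble the full matrix):

```python
import numpy as np

def assemble_qbd(blocks_down, blocks_diag, blocks_up):
    """Assemble a (truncated) level-dependent QBD generator from its
    sub-, main-, and super-diagonal blocks; dense, for illustration."""
    sizes = [B.shape[0] for B in blocks_diag]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    Q = np.zeros((offsets[-1], offsets[-1]))
    for i, B in enumerate(blocks_diag):                      # Q_{ii}
        Q[offsets[i]:offsets[i+1], offsets[i]:offsets[i+1]] = B
    for i, B in enumerate(blocks_up):                        # Q_{i,i+1}
        Q[offsets[i]:offsets[i+1], offsets[i+1]:offsets[i+2]] = B
    for i, B in enumerate(blocks_down):                      # Q_{i+1,i}
        Q[offsets[i+1]:offsets[i+2], offsets[i]:offsets[i+1]] = B
    return Q

# Birth-death (M/M/1-like) blocks of size 1x1 as a toy instance.
lam, mu = 1.0, 2.0
diag = [np.array([[-lam]])] + [np.array([[-(lam + mu)]])] * 2
up = [np.array([[lam]])] * 2
down = [np.array([[mu]])] * 2
Q = assemble_qbd(down, diag, up)
# All blocks with |i - j| >= 2 are zero, e.g., Q[0, 2] == 0.
```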
Level-independent QBDs were analysed by Neuts [18] by means of matrix-geometric methods. First algorithmic approaches for level-dependent QBDs (LDQBDs) are due to Bright and Taylor [17] and Hanschke [19]. The approach in [17] generalizes Neuts' probabilistic interpretations of the matrices which arise in the matrix-geometric method, whereas the approach in [19] is motivated by the relationship between second-order vector-matrix difference equations and matrix-valued continued fractions. Remarkably, both methods are equivalent (up to suggestions regarding some initializations). More details and comparisons can be found in [20, 21]. All these methods aim at finding (approximations to) an invariant measure or the invariant distribution. In [22], an algorithm was suggested which allows one to compute stationary expectations directly. Since the memory requirement of this method does not depend directly on N (with d_i = |E_i|, it depends on max{d_i : 0 ≤ i ≤ N}), the truncation level N can be chosen very large. Despite the possibility of choosing a large N, results on the truncation error still remained desirable from both a mathematical and a practical point of view.
It is not difficult to prove that (with the notation of the present paper) the method from [22] computes the lower bound on ψf for K = 0 (K has no impact on the lower bound). The goal of this section is to generalize this method in such a way that any value K ∈ {0, …, N} is allowed, and that the upper bound is computed, too. For the sake of conciseness, we focus on the continuous-time setting.
Note that the QBD property directly implies Q_ij = 0 for i > K > j, that is, (7) holds. The drift condition (6) is met if Q_{i,i−1} g_{i−1} + Q_{ii} g_i + Q_{i,i+1} g_{i+1} ≤ −f_i for all i ≥ K + 1, and the definition of f̃_i simplifies correspondingly (only the level-N term involving Q_{N,N+1} g_{N+1} remains). Most importantly, the computation of the matrices S_i according to the system (39) becomes much easier (the matrix-analytic methods in, e.g., [17, 19–21] use this fact). Many of the following identities are consequences of the probabilistic interpretation of the involved matrices, and we omit these proofs since the considerations are similar to those in [17] anyway.
Then, the inverses in (39) exist, and we obtain the stated representations. If the level sizes |E_i| are relatively small, the computational effort induced by (39) and (41) is acceptable. Then, it seems natural to compute all R_i and B_i, then all S_i, and finally F̲ and F̄. However, in [22], it was pointed out that, for K = 0, computing F̲ can be performed in a much more efficient way (in particular with respect to the memory requirement) by using a Horner-type scheme. Here, we generalize this procedure slightly with respect to two issues: we consider an arbitrary K, and we consider the simultaneous computation of F̲ and F̄. With the notation introduced above, the recurrence schemes for R_n, Z̲_n, and Z̄_n all start at n = N and are used for computing the respective values for n = N − 1, N − 2, …, K. Similarly, the computation of B_n and W_n starts at n = 1, and then the values for n = 2, 3, …, K can be computed. For finding φ̲, φ̄, F̲, F̄, we only need the values for n = K. Hence, for n > K, R_n is only used for finding Z̲_n, Z̄_n, and R_{n−1}. Similarly, the other matrices are only used in a single step. Hence, there is no need to store these matrices. Instead, we suggest using the recurrence schemes for all these matrices as "update" procedures. In total, we suggest the following method.
(vi) Return φ̲(W + Z̲) as a lower bound and φ̄(W + Z̄) as an upper bound.
Usually, the matrices Q_ij and f_i can be generated when they are needed. Up to four of these matrices are needed at the same time, and we need memory for storing R, Z, B, W. In total, a small number of finite matrices have to be stored at the same time. In particular, if d_i = |E_i| is bounded by d, the memory requirement is at most 10d² + 5d(L + 1), and this bound does not depend on K or N. Note that the "price" for this low memory requirement is that we only calculate the values ψf^(ℓ) for cost/reward functions f^(ℓ) which have to be specified before the computation procedure begins. Since we do not store the values ψ_x or the matrices S_i, adding any "new interesting" function requires a complete restart of the method.
A discussion of all numerical details of the algorithm is beyond the scope of this paper. For two specific aspects (avoiding ill-conditioned problems when computing φ and avoiding instabilities in the update step for B), we refer to the Appendix.

Application to the M/M/1 Queue
We first consider an example where we have explicit terms for ψ_x, H_xy, etc. Of course, we would never use numerical methods for finding bounds on ψf in such a situation, but the following considerations illustrate how the method works, and they show how sharp the upper bound can be. Numerical examples for situations in which we do not have an explicit analytical representation of ψ_x follow in Sections 6 and 7.
λ > 0 is referred to as the arrival rate, and μ > 0 is called the service rate. Q is bounded and, therefore, nonexplosive. For any λ, μ > 0, Q is irreducible, and we will assume positive recurrence, which is equivalent to ρ ≔ λ/μ < 1.
Here, we have E_i = {i}, and we will first consider choices of g which allow choosing K = 0.
The invariant measure ψ with ψ_0 = 1 is given by ψ_i = ρ^i, and the 1×1-matrices S_i solve 0 = ∑_{j=0}^N S_j Q_ji for i > 0, which reads as λS_{i−1} − (λ + μ)S_i + μS_{i+1} = 0 for 0 < i < N and λS_{N−1} − (λ + μ)S_N = 0. This system of equations can be solved explicitly (e.g., by using standard results concerning linear difference equations with constant coefficients), and we find S_i = (ρ^i − ρ^{N+1})/(1 − ρ^{N+1}). Due to |E_0| = 1, we do not have to find upper and lower bounds on (ψ_x)_{x∈E_0} = (1), and instead, we focus on bounds on F.
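The truncated system can also be solved numerically, which mimics what the algorithm does in the general case; a sketch with assumed rates λ = 1, μ = 2:

```python
import numpy as np

lam, mu = 1.0, 2.0
rho = lam / mu
N = 50

# Truncated system 0 = sum_j S_j Q_ji for the scalar S_i of the M/M/1
# queue with K = 0 and S_0 = 1:
#   lam*S_{i-1} - (lam+mu)*S_i + mu*S_{i+1} = 0   for 0 < i < N,
#   lam*S_{N-1} - (lam+mu)*S_N = 0.
A = np.zeros((N, N))
b = np.zeros(N)
for i in range(1, N + 1):
    row = i - 1                  # unknowns S_1, ..., S_N
    A[row, row] = -(lam + mu)
    if i >= 2:
        A[row, i - 2] = lam      # coefficient of S_{i-1}
    else:
        b[row] = -lam            # S_0 = 1 moved to the right-hand side
    if i < N:
        A[row, i] = mu           # coefficient of S_{i+1}
S = np.concatenate(([1.0], np.linalg.solve(A, b)))

# S_i approaches the invariant measure psi_i = rho^i from below;
# summing gives a lower bound on psi*1 = 1/(1 - rho).
lower_bound_psi1 = S.sum()
```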

Journal of Applied Mathematics
First, consider f(i) = 1. As pointed out above, we have positive recurrence, and therefore, ψf should be finite. From the explicit term, we obtain ψf = 1/(1 − ρ) < ∞. To show the finiteness of ψf by means of the drift criteria, set g(i) = i/(μ − λ). Then, for all i ≥ 1, we have λg(i + 1) − (λ + μ)g(i) + μg(i − 1) = (λ − μ)/(μ − λ) = −1 = −f(i). By Theorem 2 in [8] (or older references, see the remark in the preliminaries), we obtain positive recurrence. Furthermore, the requirements of Theorem 5 are met. Therefore, F̲ = 1/(1 − ρ) − (N + 1)ρ^{N+1}/(1 − ρ^{N+1}) and F̄ = 1/(1 − ρ). Hence, the lower bound converges to ψf = 1/(1 − ρ), and the upper bound coincides with ψf, which is the best possible result. It is clear that this can only happen if we have ∑_j q_ij g(j) + f(i) = 0 (instead of mere ≤) for all i. Consider now ψf where f = 1_A, that is, f(x) = 1 for x ∈ A and f(x) = 0 otherwise. Obviously, we can still use the same function g, and the same computation as before applies. Let A = {0, …, M}, and let N > M. If we are interested in computing bounds on ψ1_A/ψ1, we combine the previous bounds. For any choice of A, both bounds converge to (1 − ρ)∑_{i∈A} ρ^i, which is the stationary probability of the set A. As an easy choice, let A = {j} for some j ∈ ℕ_0; then, we know that the stationary probability is given by π_j = (1 − ρ)ρ^j, and for N ≥ j, we obtain corresponding bounds. Next, let f = id, that is, f(i) = i for all i ∈ ℕ_0. Note that ψ id/ψ1 is the expected stationary number of customers in the system, which is one of the most important performance measures in queueing theory. We set g(i) = i²/(2(μ − λ)) + i(μ + λ)/(2(μ − λ)²), and then we have λg(i + 1) − (λ + μ)g(i) + μg(i − 1) = −i = −f(i) for all i ≥ 1. The exact value is ψf = ρ/(1 − ρ)². Hence, again, we find the best possible upper bound F̄ = ψf. Appropriately combining the bounds on ψ1 and ψ id leads to upper and lower bounds on the expected stationary number πf of customers in the system.
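The sharpness of the upper bound for f = 1 can be checked numerically. The boundary-correction form used below (adding S_N · λ · g(N + 1) to the lower bound) is our own hedged reading of how the drift function g enters the upper bound, not a display reproduced from the paper; for this example it recovers 1/(1 − ρ) exactly:

```python
import numpy as np

lam, mu = 1.0, 2.0
rho, N = lam / mu, 30
g = lambda i: i / (mu - lam)     # drift function for f = 1

# Closed-form solution of the truncated M/M/1 system for K = 0
# (it can be checked directly against the defining recursion):
i = np.arange(N + 1)
S = (rho**i - rho**(N + 1)) / (1 - rho**(N + 1))

F_lower = S.sum()                # lower bound on psi*1
# Assumed correction term: flux out of the truncated region,
# weighted by the drift function g.
F_upper = F_lower + S[N] * lam * g(N + 1)

exact = 1.0 / (1.0 - rho)        # psi*1 for the M/M/1 queue
```

Here F_lower < 1/(1 − ρ) = F_upper, matching the claim that the upper bound is attained exactly when the drift inequality holds with equality.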
At first glance, the choice g(i) = i²/(2(μ − λ)) + i(μ + λ)/(2(μ − λ)²) might be "difficult to guess," but note that we used a function of the form g(i) = ci for dealing with f(i) = 1, and it is quite natural to choose a function of the form g(i) = c_1 i² + c_2 i for dealing with f(i) = i. Afterwards, it is not difficult to determine c_1, c_2 such that the f-modulated drift condition is met. Nevertheless, in other situations, we will not be able to find such an optimal function g. Therefore, we demonstrate next that the method still works if we use a function g which is far from being optimal.
Let us directly consider f = (1, 1_A, id), that is, L = 2, f^(0)(i) = 1, f^(1)(i) = 1_A(i), and f^(2)(i) = i. Then, ψf^(1)/ψf^(0) is the stationary probability of the set A, and ψf^(2)/ψf^(0) is the expected stationary number of customers in the system. Let us choose g^(ℓ)(i) = a^i for i ∈ ℕ_0 and ℓ = 0, 1, 2. Then, the drift condition holds for ℓ = 0, 1, 2 and all i > K for some sufficiently large K if we choose a ∈ (1, μ/λ) = (1, 1/ρ). Due to K > 0, the values S_i will change. We omit explicit representations of F̲ and F̄ (which is strictly larger than ψf here), but we remark that F̄ − F̲ can be bounded in terms of (ρa)^N, and due to ρ < ρa < 1, this difference tends to 0 as N → ∞. Note that the speed of convergence of F̄ − F̲ to 0 is still exponentially fast, but slower than for the optimal choices of g^(ℓ).

Application to a Variant of the M/PH/1 Queue
Next, we consider an application of our method to a queueing model where we do not know the exact invariant distribution and have no choice but to use numerical methods. We consider a variant of the M/PH/1 queue where arriving customers decide whether to join the queue or not depending on the number of customers they find in the queue. Precisely, we assume that

(i) Customers arrive according to a Poisson process with intensity λ.

(ii) An arriving customer joins the queue with probability α_n if there are n other customers in the system upon arrival.

(iii) The service time is phase-type distributed with parameters β ∈ ℝ^{1×d} and B ∈ ℝ^{d×d}, that is, the cumulative distribution function of a service time is given by t ↦ 1 − βe^{Bt}1 and its expectation by 1/μ ≔ β(−B)^{−1}1.

(iv) There is one server.
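The phase-type quantities in (iii) are easy to evaluate numerically. The Erlang-2 pair (β, B) below is an assumed example (it reappears later in this section), and the plain Taylor-series matrix exponential is used only to keep the sketch dependency-free:

```python
import numpy as np

# Assumed Erlang-2 phase-type representation with mean 1/mu.
mu = 1.0
beta = np.array([1.0, 0.0])
B = np.array([[-2 * mu, 2 * mu],
              [0.0, -2 * mu]])
one = np.ones(2)

# Mean service time: 1/mu = beta (-B)^{-1} 1.
mean = beta @ np.linalg.inv(-B) @ one

def expm(A, terms=60):
    """Plain Taylor series for the matrix exponential; adequate for
    the small, moderate-norm matrices used in this sketch."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out += term
    return out

# CDF t -> 1 - beta e^{Bt} 1; for Erlang-2 with rate 2*mu per phase,
# this equals 1 - e^{-2 mu t}(1 + 2 mu t).
t = 1.3
cdf = 1.0 - beta @ expm(B * t) @ one
```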
This queueing system can be modelled by the Markov chain Y = (Y_t)_{t≥0}, where Y_t = (N_t, U_t) is two-dimensional: N_t is the number of customers in the system at time t, and for N_t > 0, U_t ∈ {1, …, d} is the current service phase. By setting the service phase to 1 for N_t = 0, we obtain levels E_i with E_0 = {(0, 1)} and E_i = {(i, 1), …, (i, d)} for i ≥ 1. For α_n = 1, we obtain a level-independent QBD (the classical M/PH/1 queue) for which an explicit analytical representation of the solution exists (see, e.g., [23]). For nonconstant α_n, we obtain an LDQBD, and we have to use numerical methods. In what follows, we assume that lim sup_{n→∞} λα_n < μ, (63) which is equivalent to the existence of some λ̃ < μ with λα_n ≤ λ̃ for all n ≥ K_0 for some K_0 ∈ ℕ_0. Without restriction, we assume K_0 ≥ 2. Condition (63) guarantees that we have positive recurrence and that the stationary number of customers in the system has finite expectation, as we will prove below. In order to use our algorithmic method, we set f^(0)_i = 1 additionally. For applying our method and for proving ψf^(ℓ) < ∞ for ℓ = 0, 1, 2, 3 under condition (63), we check some f-modulated drift conditions. Set h_i = Q_{i,i−1} g_{i−1} + Q_{ii} g_i + Q_{i,i+1} g_{i+1} for i ≥ 1; then, h_i ≤ −f_i has to be verified for suitable functions g with g_{i+1} − g_i ≥ 0 (component-wise). For ℓ = 0, 1, 2, we have f^(ℓ)_i ≤ 1, and we choose a corresponding function g for i ≥ K_0. From the special case f^(0)_i = 1, we obtain positive recurrence from Theorem 2 in [8], and by the same result, we have ψf^(ℓ) < ∞ for ℓ = 1, 2. For ℓ = 3, we have f^(3)_i = 1 · i, and we choose a corresponding function g for i ≥ 1 (note that (−B)^{−1} is a nonnegative matrix). Choose K ≥ K_0 such that (68) holds. Then, we have h_i ≤ −f^(3)_i for all i > K. The existence of such a value K proves ∑_{i=0}^∞ π_i 1 · i < ∞, that is, the stationary number of customers has finite expectation.
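The explicit generator blocks are not reproduced above; the following is a standard construction that we assume for such admission-controlled M/PH/1 chains (arrivals restart the phase according to β on service completion, admitted arrivals occur at rate λα_n), instantiated with the Erlang-2 parameters of this section:

```python
import numpy as np

# Assumed (standard) block construction; level 0 = empty system,
# levels i >= 1 consist of d phase states.
lam, mu = 5.0, 1.0
alpha = lambda n: 1/10 + 9/(10 * (n + 1))
beta = np.array([1.0, 0.0])
B = np.array([[-2 * mu, 2 * mu], [0.0, -2 * mu]])   # Erlang-2 phases
d = 2
exit_rates = -B @ np.ones(d)     # (-B) 1: phase-exit (completion) rates

def Q_up(i):    # E_i -> E_{i+1}: an admitted arrival
    a = lam * alpha(i)
    return a * beta.reshape(1, d) if i == 0 else a * np.eye(d)

def Q_down(i):  # E_i -> E_{i-1}: a service completion
    if i == 1:
        return exit_rates.reshape(d, 1)
    return np.outer(exit_rates, beta)   # next service starts via beta

def Q_diag(i):
    if i == 0:
        return np.array([[-lam * alpha(0)]])
    return B - lam * alpha(i) * np.eye(d)

# Sanity check: generator rows sum to zero on interior levels.
for i in [1, 2, 10]:
    rowsum = Q_down(i) @ np.ones(Q_down(i).shape[1]) \
             + Q_diag(i) @ np.ones(d) + Q_up(i) @ np.ones(d)
    assert np.allclose(rowsum, 0.0)
```

Customers who balk (probability 1 − α_n) cause no transition, which is why the diagonal block only subtracts the admitted-arrival rate λα_i.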
For applying our algorithmic method, we have to specify K. In our numerical examples, we set λ = 5, μ = 1, and α_n = 1/10 + 9/(10(n + 1)). This allows us to choose λ̃ = 3/4, and it is easy to derive that λα_n ≤ λ̃ for n ≥ K_0 = 17. Furthermore, we consider the special case where the service times are Erlang-2 distributed, that is, β = (1, 0) and B = [−2μ, 2μ; 0, −2μ]. Then, (68) is guaranteed if (2μ + 3λ̃)/(μ − λ̃) − (K + 1) ≤ 0 ⇔ K ≥ (2μ + 3λ̃)/(μ − λ̃) − 1 = (μ + 4λ̃)/(μ − λ̃). For μ = 1 and λ̃ = 3/4, this results in K ≥ 8. Since we have to choose K ≥ K_0, we set K = 17. In Table 1, some numerically computed results are listed. The parameters are chosen as specified above, and additionally, we set M = 30 for πf^(2). The results clearly illustrate the convergence of both bounds to the same value. For obtaining precise results for πf^(2), the truncation level N has to be chosen somewhat larger than for obtaining quite precise bounds on πf^(1) and πf^(3). This is not surprising: since πf^(2) is the stationary probability of having at least M = 30 customers, the truncation level should be chosen significantly larger than 30. However, the results demonstrate that even for the computation of πf^(2), the truncation level 55 yields very precise results.
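The threshold K_0 = 17 can be verified directly. Exact rational arithmetic is advisable here because λα_17 equals λ̃ = 3/4 exactly, and floating-point rounding can tip the comparison the wrong way:

```python
from fractions import Fraction

# lam * alpha_n = 1/2 + 9/(2(n+1)) is decreasing in n, so the first
# index with lam * alpha_n <= 3/4 is the threshold K_0.
lam = Fraction(5)
alpha = lambda n: Fraction(1, 10) + Fraction(9, 10 * (n + 1))

K0 = next(n for n in range(1000) if lam * alpha(n) <= Fraction(3, 4))
# K0 == 17: at n = 17, lam*alpha_n = 1/2 + 9/36 = 3/4 exactly.
```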

Application to a Retrial Queueing Model
As a final example, we consider the M/M/c/d − 1 queue with retrials which can be seen as some kind of "prototype" for LDQBDs.

Journal of Applied Mathematics

(v) Customers who cannot enter the system due to lack of waiting capacity enter the "orbit" of retrials
(vi) Each customer in the orbit will retry to enter the system after a time which is exponentially distributed with parameter ν and independent of all other random variables
(vii) Retrying customers who cannot enter the system due to lack of waiting capacity stay in the orbit of retrying customers.
We set Y_t = (O_t, N_t), where O_t is the number of customers in the orbit of retrials, and N_t is the number of customers in the queue (including service). Then, Y = (Y_t)_{t≥0} is a continuous-time Markov chain. Obviously, a transition with a positive rate will change the level by at most 1, and hence, we have a quasi-birth-death process with the following state transitions from state (i, u):
(i) Arrivals occur with rate λ. For u < d − 1, the arriving customer enters the queueing system, and the new state is (i, u + 1). For u = d − 1, the arriving customer enters the orbit of retrials, and the new state is (i + 1, d − 1)
(ii) For u > 0, service completions occur with rate min{u, c} · μ, and the new state is (i, u − 1)
(iii) For i > 0 and u < d − 1, successful retrials occur with rate i · ν, and the new state is (i − 1, u + 1).
In matrix notation, we obtain the blocks Q_{i,i+1} for i ≥ 0 and Q_{i,i−1}, Q_{ii} for i ≥ 1, together with the boundary block Q_{00}. Note that retrial queues have been discussed intensively in the literature. We refer to the textbook [24], which gives an overview of retrial queueing models and computational methods for determining invariant distributions, stationary expectations, etc. Computational methods are very important in this context since, even for d − 1 = c, explicit analytical representations of the invariant distribution exist only for c ≤ 2 (see [24]).
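To make the block structure concrete, the transition rates listed above can be assembled into the three generator blocks at orbit level i. This is a sketch under stated assumptions: phases are encoded as u = 0, …, d − 1, and the function name `retrial_qbd_blocks` is ours, not the paper's notation.

```python
import numpy as np

def retrial_qbd_blocks(i, lam, mu, nu, c, d):
    """Generator blocks (Q_{i,i-1}, Q_{ii}, Q_{i,i+1}) of the M/M/c/d-1
    retrial queue at orbit level i; phase u = number of customers in the
    queueing system (waiting or in service)."""
    up = np.zeros((d, d))                 # Q_{i,i+1}: blocked arrival joins orbit
    up[d - 1, d - 1] = lam
    down = np.zeros((d, d))               # Q_{i,i-1}: successful retrial
    for u in range(d - 1):
        down[u, u + 1] = i * nu
    local = np.zeros((d, d))              # Q_{ii}: accepted arrivals / services
    for u in range(d - 1):
        local[u, u + 1] = lam
    for u in range(1, d):
        local[u, u - 1] = min(u, c) * mu
    # diagonal entries: negative total outflow rate of each state (i, u)
    np.fill_diagonal(local, -(up + down + local).sum(axis=1))
    return down, local, up
```

For i = 0 the block `down` vanishes, and for every i the row sums of `down + local + up` are 0, as required for a conservative generator.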
Of course, there are many interesting characteristics such as the mean number of customers in the orbit and the mean number of customers in the queueing system. Our purpose is to demonstrate that our method provides reliable results for interesting characteristics, but we do not want to discuss the retrial queueing model in detail. Therefore, we focus on a single characteristic: the stationary probability for arriving customers to find the queueing system full and thus being forced to join the orbit. If π = (π_(i,u))_{(i,u)∈E} denotes the invariant distribution (if there is one), this probability is given by p_b = ∑_{i∈ℕ_0} π_(i,d−1). The invariant distribution exists if and only if we have positive recurrence. Below, we will restate a proof of the fact that this is the case if λ < cμ. Note that λ < cμ is also a necessary requirement for stability/positive recurrence; we refer to [24] for more details.
We demonstrate how to prove that λ < cμ is sufficient for positive recurrence by means of drift criteria. The function g which we use for this purpose can also be used for finding the upper bounds on ψ1 and ψ1_A for any A ⊂ E. In particular, by setting A = {(i, d − 1): i ∈ ℕ_0}, we obtain bounds on p_b ≔ ψ1_A/ψ1. In any case (both for proving recurrence if λ < cμ and for applying our truncation method), we want to find a function g such that h((i, u)) + f((i, u)) ≤ 0 holds outside a finite set of states. We use the approach g((i, u)) = γi + δu. For i ≥ 1 and u < d − 1, the entries of h_i can be computed directly, and if we choose γ > δ, we surely have h((i, u)) + f((i, u)) ≤ 0 for i > K with some sufficiently large K (due to f((i, u)) ≤ 1). Additionally, we have to take the states (i, d − 1) into account. Hence, we have to guarantee λγ − cμδ ≤ −1. Due to λ < cμ, such a choice with γ > δ is possible. For example, we can choose δ = 2/(cμ − λ) > 0 and γ = (1 + cμ/λ)/(cμ − λ) > 2/(cμ − λ), since then we have λγ − cμδ = (λ + cμ − 2cμ)/(cμ − λ) = −1. For this choice, we have h((i, d − 1)) + 1 ≤ 0 for all i ≥ 1.
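A short exact computation confirms that the choice δ = 2/(cμ − λ) and γ = (1 + cμ/λ)/(cμ − λ) satisfies both γ > δ and λγ − cμδ = −1 whenever λ < cμ. This is a sketch; the parameter values below are assumed for illustration, and `Fraction` arithmetic avoids any rounding in the identity.

```python
from fractions import Fraction

# example parameters with lam < c*mu (assumed values for illustration)
lam, mu, c = Fraction(4), Fraction(1), Fraction(5)

delta = 2 / (c * mu - lam)
gamma = (1 + c * mu / lam) / (c * mu - lam)

assert gamma > delta                        # needed for the states with u < d-1
assert lam * gamma - c * mu * delta == -1   # drift inequality at u = d-1, tight
print(gamma, delta)
```

The identity holds symbolically for every λ < cμ: λγ − cμδ = (λ + cμ − 2cμ)/(cμ − λ) = −1.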
As pointed out above, we have to guarantee the drift inequality for i ≥ K + 1. In total, we set g((i, u)) = γi + δu, where γ = (1/λ) · (cμ + λ)/(cμ − λ) and δ = 2/(cμ − λ), and we choose K as above. Then, we can apply our algorithm, which returns bounds on ψf^(0) and ψf^(1), where ψ is the invariant measure with ψ_(K,0) = 1. By dividing lower and upper bounds appropriately, we find bounds on p_b. In Tables 2 and 3, results are listed for varying λ and fixed values of μ, ν, c, d. In both cases, we see that lower and upper bounds converge to the same limit (which coincides with the value computed in Table 3.8 in the textbook [24] by other means). In Table 2, we have low traffic due to λ/(cμ) = 0.2. Then, the blocking probability is quite low, and, more importantly for the evaluation of our method, we obtain precise results for low truncation levels. In Table 3, we have more traffic due to λ/(cμ) = 0.8. Hence, the blocking probability is much larger, and we need higher truncation levels to obtain precise results.
The fact that we have to choose higher truncation levels N in the situation of Table 3 is anything but surprising: due to more traffic, the Markov chain will visit states within higher levels much more often. However, it is important to remark that it is not clear in advance how large N should be chosen. The suggested method can be applied iteratively with increased truncation levels until a prescribed accuracy is achieved. Unfortunately, when increasing N, we have to restart most of the calculations (B and W do not change, but R, Z, and Z̄ do). However, for each such calculation, the memory requirement is the same since it does not depend on N. In particular, in all situations considered in Tables 2 and 3, the memory requirement coincides.
As pointed out above, p_b is an important performance measure, but there are more interesting characteristics for this model. Note that with the same choice for g, we can find bounds on the stationary probabilities of any finite number of (finite or infinite) sets. In particular, if p_u denotes the stationary probability for u customers in the queueing system (waiting or in service), we could determine bounds on p_u for u = 0, ⋯, d − 1 simultaneously with little additional effort. Other performance measures (e.g., moments of the number of retrying customers) will require other choices for g.
We conclude this discussion of the basic retrial queueing model by remarking that there are many extensions which improve its applicability. For example, we could consider Markovian arrival processes, phase-type distributed service times, impatient customers who leave the queue and enter the orbit, impatient customers who leave the orbit, queueing networks where declined external arrivals join the orbit of retrying customers, etc. For all such models, the suggested algorithm allows us to compute important performance measures precisely and efficiently. Note that such retrial queues provide good models for many problems in telecommunications and computer networks, e.g., in the design of wireless networks. However, a deep discussion of a realistic (and hence complex) model justifies a separate publication and is far beyond the scope of this methodological paper.

Evaluation of the Method
8.1. The f-Modulated Drift Condition. The major requirement for finding the bounds in Theorem 5 and for the resulting algorithmic procedure is the f-modulated drift condition (5) or (6). We give some comments on this condition.
(i) As pointed out above, f-modulated drift conditions are very popular for proving convergence/finiteness of ψf or πf. For f ≥ 0, we remark that the existence of an f-modulated drift condition is even equivalent to the convergence of ψf. Hence, from a purely theoretical point of view, the requirement that an f-modulated drift condition is satisfied is no restriction
(ii) Most often, finding a function g which satisfies the f-modulated drift condition is the easiest way (or even the only way) to prove positive recurrence or to prove ψf < ∞. If convergence of ψf is not guaranteed, it makes no sense to compute approximations to ψf (which are always finite) by numerical means. Hence, also from a practical point of view, finding a function g which satisfies the appropriate f-modulated drift condition does not require any additional effort in many situations
(iii) Every function g which can be used for proving ψf < ∞ can be used in Theorem 5. Note that this is not true for all truncation bounds which rely on f-modulated drift conditions, see Section 8.6.

8.2. The Structural Requirement (7). We give some comments on condition (7). Remember that K is determined by the drift condition. If (7) is not met, we can always restructure the levels by setting Ẽ_0 = E_0 ∪ ⋯ ∪ E_K and Ẽ_i = E_{K+i} for i ≥ 1, and set K̃ = 0. Then (7) is fulfilled. If we have the QBD structure for the levels E_i, this is preserved for the levels Ẽ_i. Consequently, we have d̃_0 = d_0 + ⋯ + d_K. This is unfortunate since d_K (which becomes d̃_0 after the restructuring) should be quite small due to the operations involving matrices of dimension d_K × d_K.

In total, from a purely mathematical point of view, K = 0 can be guaranteed without restriction, and then the structural requirement (7) can be omitted. From a practical point of view, it is advantageous to keep d_0, d_1, ⋯, and in particular d_K, small, and in this context, we benefit from allowing K > 0. Then, the price is the requirement (7).

Asymptotic Properties of the Bounds.
In order to compare our bounds to competing methods, we consider the asymptotic behaviour of the difference between upper and lower bounds.
In order to keep things simple and concise, we focus on the approximation of πf^(1) = ψf^(1)/ψf^(0) (where f^(0) = 1) for positive recurrent continuous-time Markov chains, and we assume d_K = 1 (which is a clearly restrictive condition), implying that the bounds φ̲ and φ̄ coincide with ψ_K = 1. The bounds on πf^(1) are given by F̲^(1)/F̄^(0) and F̄^(1)/F̲^(0), and for N ⟶ ∞, the difference between them behaves asymptotically like the expansion (85), where we have used that S_i/F^(0) = ψ_i/(ψ_K F^(0)) converges to π_i.
In particular, if g^(1)_j = g^(0)_j (e.g., if f^(1) is an indicator function), the asymptotic expansion simplifies further. These considerations clearly motivate choosing g^(ℓ)_j as small as possible.

Augmented Matrices versus Nonaugmented Matrices

Theorem 5 and the resulting method for computing bounds on ψf or πf rely on considering the northwest-corner truncation P(N) = (P_ij)_{i,j=0}^N, which is substochastic. From P(N), we obtain an approximation ψ(N) to ψ, where ψ(N) is uniquely determined by ψ_{x_0}(N) = 1 and by satisfying ψ(N)P(N) = ψ(N) up to the scalar equation corresponding to state x_0.
There are many authors who prefer to "repair" the row sums of P(N), that is, they deal with an augmented stochastic matrix P̃(N) ≥ P(N). Due to finiteness and stochasticity, it is possible to determine an invariant distribution π̃(N) of P̃(N) directly. Under some mild constraints, π̃(N) converges to π (see, e.g., [3]). The fact that P̃(N) admits an invariant distribution (in contrast to P(N)) can be interpreted as an advantage of methods relying on augmented matrices. There are more theoretical reasons why considering augmented matrices is popular, and additionally, the augmented matrices often allow meaningful probabilistic interpretations. A more detailed list of advantages can be found in [3] (for the continuous-time setting).
Despite these undeniable advantages, the approach relying on nonaugmented matrices was chosen in this paper. The simple reason is that ψ(N) converges (pointwise) monotonically to ψ, which makes lower bounds on ψf trivial. As we will see below, this fact outweighs all the advantages of augmentation procedures.
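The monotone convergence can be illustrated on the M/M/1 queue, where the invariant measure ψ_i = (λ/μ)^i with ψ_0 = 1 is known explicitly. The sketch below (the function name is ours) builds the nonaugmented northwest-corner truncation of the generator and solves ψ(N)Q(N) = 0 in every column except the one for the reference state x_0 = 0:

```python
import numpy as np

def psi_truncated(lam, mu, N):
    """Approximation psi(N) from the (nonaugmented) northwest-corner
    truncation of the M/M/1 generator, normalised by psi_0(N) = 1."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = lam
        if i > 0:
            Q[i, i - 1] = mu
        Q[i, i] = -(lam + (mu if i > 0 else 0))   # diagonal rates of the FULL chain
    # psi(N) Q(N) = 0 in every column except the one for state 0
    psi_rest = np.linalg.solve(Q[1:, 1:].T, -Q[0, 1:])
    return np.concatenate(([1.0], psi_rest))

lam, mu = 1.0, 2.0
exact = (lam / mu) ** np.arange(6)
for N in (5, 10, 20):
    # the gap to the exact measure shrinks pointwise and monotonically
    print(np.max(exact - psi_truncated(lam, mu, N)[:6]))
```

For this birth-death example, one can check analytically that ψ_j(N) = (ρ^j − ρ^{N+1})/(1 − ρ^{N+1}) with ρ = λ/μ, which increases in N to ρ^j, matching the monotonicity claimed above.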

Comparison to Other A Posteriori Bounds on Truncation Errors

The most recent results for truncation bounds on stationary expectations can be found in [1–3], where the authors consider continuous-time Markov chains. They consider northwest-corner truncations Q(N) = (Q_ij)_{i,j=0}^N and their augmented versions Q̃(N) (with row sums 0). With our notation and the invariant distribution π̃(N) of Q̃(N), Theorem 2.1 in [2] then reads as inequality (86). The authors require that
(i) the Markov chain is irreducible and positive recurrent
(ii) f ≥ 1
(iii) Q̃ = Q̃(N) = (q̃_xy)_{x,y∈E} is constructed such that (q̃_xy)_{x,y∈E_0∪⋯∪E_N} arises from truncation and augmentation and the other entries are arbitrary such that Q̃ is conservative
(iv) e.g., all other entries are 0
(vi) β > 0 is arbitrary, and Φ^(β) = β ∫_0^∞ e^{−βt} P(t) dt with the transition function t ↦ P(t) arising from Q.
The right-hand side in (86) has some similarities to the asymptotic expansion (85). Asymptotically, the entries of π̃(N)(Q̃ − Q)g behave like ∑_{i=0}^N π_i ∑_{j=i+1}^∞ P_ij g_j. Hence, (86) mainly differs by the factor inf f(x) and the summand (b/β)ϕ^(β)1. Furthermore, note that the right-hand side of (86) provides a bound on the difference between πf and its approximation. Hence, "upper bound − lower bound" is larger by a factor of 2. Finally, it is important to remark that Φ^(β) cannot be determined by means of numerical computations. Therefore, it is suggested in [2] to use (I − (1/β)Q(N))^{−1}, which increases monotonically in N with limit Φ^(β). Hence, the entries of (I − (1/β)Q(N))^{−1} can be used to find computable bounds on ϕ^(β), and thus on ∑_{x∈E} |π_x − π̃_x(N)| f(x).
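The monotonicity of (I − (1/β)Q(N))^{−1} in N can be observed directly in a small experiment. This is a sketch with the M/M/1 generator as a stand-in; β = 1 and the function name are our choices:

```python
import numpy as np

def resolvent_truncated(lam, mu, beta, N):
    """(I - Q(N)/beta)^{-1} for the northwest-corner truncation Q(N)
    of the M/M/1 generator (illustrative sketch)."""
    Q = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i < N:
            Q[i, i + 1] = lam
        if i > 0:
            Q[i, i - 1] = mu
        Q[i, i] = -(lam + (mu if i > 0 else 0))
    return np.linalg.inv(np.eye(N + 1) - Q / beta)

# entrywise monotone increase of the upper-left block as N grows
R5 = resolvent_truncated(1.0, 2.0, 1.0, 5)
R10 = resolvent_truncated(1.0, 2.0, 1.0, 10)
print(np.all(R5 <= R10[:6, :6] + 1e-12))   # -> True
```

Probabilistically, the entries are discounted occupation times of the process killed upon leaving {0, …, N}; enlarging N removes killing and can only increase them, which is the monotonicity exploited in [2].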
Replacing Φ^(β) in this manner has two effects:
(i) There is no evidence that the bound in (86) is sharp. Even if it were, the bound which results from this replacement cannot be sharp anymore
(ii) Evaluating the computable bound which arises from (86) requires computing at least some rows of (I − (1/β)Q(N))^{−1}, which can be costly in terms of computational effort.
Together, we see that the algorithm suggested in this paper provides slightly better bounds which are at the same time significantly easier to compute. Finally, the requirement f ≥ 1 in [1–3] is clearly restrictive. In particular, it prevents us from using indicator functions in order to obtain stationary probabilities.

Comparison to A Priori Bounds on Truncation Errors

Theorem 5 provides a posteriori truncation bounds (similar to (86) from [2]), since both the approximation to πf and the bounds are computed numerically (simultaneously).
A priori bounds do not rely on numerical computations. Hence, before any computation scheme starts, a priori bounds would allow us to determine the truncation level N in such a way that the (relative) error of the approximation is smaller than a prescribed bound.
An approach in this direction was introduced in [5] and generalized in [4], where a method was developed which determines a level N such that the relative truncation error (88) is at most some prescribed ε > 0 for some scalar function f > 0. The method works as follows:
(i) Choose some function g and determine (continuous-time setting) h(x) = ∑_{y∈E} q_xy g(y) for all x ∈ E
(ii)–(iv) Determine the set C = {x ∈ E : h(x) > −γf(x)} for a suitable constant γ depending on ε; if C is finite, we can find N such that E_0 ∪ ⋯ ∪ E_N ⊃ C and obtain (88)
(v) If C is not finite, we have to change our choice of g.
At first glance, this method provides a priori error bounds, but there are some problems. In the original papers [4, 5], it was already pointed out that it is not easy to find appropriate functions g, and that very often, the bounds are quite pessimistic. In order to illustrate these effects, we consider the M/M/1 queue from Section 5 and set f = 1. Whereas Theorem 5 can be applied with g(i) = b · i for an appropriate constant b, this choice will fail here: We obtain h(0) = bλ and h(i) = b(λ − μ) for i ≥ 1, that is, h(i) is constant for i ≥ 1. In particular, if ε > 0 is very small (and hence, γ is large), we have C = {i ∈ E : h(i) > −γf(i)} = ℕ_0. Note that the choice of b is unimportant since it cancels out.
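This computation is easy to reproduce. In the sketch below, b = 1 and λ = 1, μ = 2 are assumed example values:

```python
# Drift h(i) = sum_j q_{ij} g(j) of the M/M/1 generator for g(i) = b*i.
lam, mu, b = 1.0, 2.0, 1.0

def h(i):
    if i == 0:
        return lam * (b * 1 - b * 0)          # only transition: 0 -> 1 at rate lam
    return lam * b * (i + 1) + mu * b * (i - 1) - (lam + mu) * b * i

print(h(0), h(1), h(7))   # -> 1.0 -1.0 -1.0: constant b*(lam - mu) for i >= 1
```

Since h(i) is the constant b(λ − μ) for i ≥ 1, the set C = {i : h(i) > −γ} equals all of ℕ_0 as soon as γ > b(μ − λ), regardless of b.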
Summarizing this brief numerical example, the most intuitive choice for g does not work, and the next choice provides very pessimistic estimates. It might be possible to find functions g which provide sharp estimates, but it does not seem likely that such functions can be "guessed." There is an even more serious problem when interpreting the results from [4, 5] as a priori estimates for the truncation error: The numerator in (88) refers to the exact invariant measure ψ, which cannot be computed by numerical means (up to some trivial exceptions like the M/M/1 queue). For honest a priori bounds, we should be interested in guaranteeing a corresponding error bound with a numerically calculated approximation ψ(N) to ψ. Unfortunately, the methods in [4, 5] cannot be modified in this direction. One might argue that this slight modification is negligible in view of the pessimistic bounds which are obtained. However, even if we guessed an optimal function g by accident, we would not have a bound on the error of the numerically computed approximation. In total, the results from [4, 5] may be exploited to find theoretical estimates on stationary probabilities or expectations. With respect to numerical computations, they can only be used for rough preliminary considerations (how large should the truncation level N be chosen) before applying the method suggested in this paper. Even that might be hard due to the increased difficulty of finding appropriate functions g (in comparison to finding appropriate functions g for applying Theorem 5).

Conclusion and Further Research
In this paper, we have developed an exact and efficient method for numerically computing lower and upper bounds on ψf, where ψ is an invariant measure for an irreducible recurrent discrete-time or continuous-time QBD and f is a multidimensional nonnegative cost/reward function. By combining the lower and upper bounds appropriately, we obtain lower and upper bounds on πf in the case of positive recurrence, where π is the unique invariant distribution and f is an arbitrary cost/reward function. The term πf can be interpreted as the stationary expectation of the Markov chain measured by f, or as the long-run average of costs/rewards measured by f. If f = 1_A is chosen to be an indicator function, we obtain stationary probabilities. The computation of πf^(2) in Section 6 illustrates that even very small probabilities can be computed with high relative precision. The algorithm was developed as an extension of previous matrix-analytic methods which also dealt algorithmically with level-dependent QBDs, but for which no error bounds were known up to now. Due to the fact that we compute lower and upper error bounds, we automatically obtain an approximation to πf together with (a posteriori) error bounds. In our examples, we have seen that the difference between lower and upper bounds converges to 0 very fast, that is, we obtain very precise results. For optimal choices of g^(ℓ), we have observed that the upper bound coincides with the true value of ψf.
Obvious directions for future publications concern detailed applications of our algorithm to specific Markov models; e.g., we could consider enhanced retrial queueing models with Markovian arrival processes, phase-type distributed service times, impatient customers, etc., but there is a huge variety of other practical processes which are modelled by infinite LDQBDs. If we focus on the analysis of a specific model, we should invest effort in finding very good functions g, which may result in very precise results for small truncation levels.
In this context, it may be worthwhile to find not only an upper truncation level but also a lower one. Then, the numerical computation would focus on a very small region of the state space, and the method would become even more efficient.
Note that the main results of this paper are not restricted to QBDs. Theorem 5 only requires that no transitions from E_i to E_j occur with positive probability/rate if i > K > j. Hence, these results also apply if P or Q, respectively, has upper block-Hessenberg structure (also referred to as block-M/G/1 structure). We have focused on QBDs here since this specialized structure allows a very efficient computation of lower and upper bounds. Future research could deal with developing efficient algorithms for more general structures.
We remark that some of the key results also transfer to nondiscrete state spaces: Let E = E_0 ∪ E_1 ∪ ⋯, and let the levels E_i be petite in the sense of [10] (this property is satisfied by compact sets for many Markov chains). Then P_ij becomes a (sub-Markovian) kernel, and so does S_i. The multiplication S_jP_ji is defined straightforwardly by S_jP_ji(x, dz) = ∫_y S_j(x, dy)P_ji(y, dz). We still have ψf = ψ_K F, where ψ_K solves ψ_K lim_{N→∞}(I − T) = 0, and where S_i f^(ℓ)_i(x) = ∫_y S_i(x, dy) f^(ℓ)_i(y). Under the structural assumptions (no transitions from E_i to E_j for i > K > j), the results on the bounds on F remain unchanged. However, for finding S_0, ⋯, S_N by means of numerical calculations (and for finding a lower and an upper bound on ψ_K), it seems that some sort of discretization is inevitable.
We conclude with a remark concerning the relationship to nonprobabilistic topics: As mentioned above, for K = 0, the basic algorithm for computing approximations to the subvectors ψ_i of the invariant measure ψ can be deduced from general considerations concerning (matrix-valued) continued fractions. Hence, in some sense, our computation of bounds on ψf is related to speed-of-convergence statements for matrix-valued continued fractions. Note that the probabilistic interpretation of continued fractions (arising in nonprobabilistic contexts) and their convergents has led to new convergence criteria for continued fractions and their generalizations (see [25, 26]), and with further research, considerations similar to those in this paper might provide speed-of-convergence results.
Additionally, we give a less stochastic, more algebraic proof of the key result, that is, of the upper bound in Lemma 4, where we concentrate on the discrete setting. Let P_ab, P_ac, ⋯ be blocks of transition probabilities, where the index a corresponds to states within E_K and the indices b, c, d correspond to the remaining parts of the state space. Then, the structural requirement (7) on the transitions reads as P_db = 0 and P_cb = 0, and the f-modulated drift condition (5) can be rewritten accordingly. S_a is stochastic, and irreducibility is inherited by S_a from P. Let the blocks of an invariant measure for P be denoted by ψ_a, ψ_b, ψ_c, ψ_d. Then

(ψ_b, ψ_c, ψ_d) = ψ_a (P_ab, P_ac, P_ad) V^{−1},  (A.5)

and ψ_a is an invariant measure for S_a. These formulas can be proven in a purely algebraic manner although they have a stochastic interpretation: If the original Markov chain with transition probability matrix P is only observed at return times to the set E_K, we obtain a new Markov chain with transition probability matrix S_a (see [27]). The above representation of ψ allows us to write

(A.6)
and due to ψ_a = ψ_K, F has the same meaning as before. As pointed out above, we do not want to give alternative proofs of all results here. Instead, we focus on the upper bound F̄ on F given in Lemma 4.
From (A.1) and (A.2), we obtain (A.8), and therefore we can write the quantity of interest as a product in which the middle-term matrix is shown to be V^{−1}, which obviously yields the desired statement (A.9). As N ⟶ ∞, T converges to an irreducible stochastic matrix, and hence T̃ converges to a transient substochastic matrix. Hence, I − T̃ is invertible, and this new system of equations is not susceptible to numerical problems. Of course, we have to find all rows of A separately, that is, we solve d_K systems of linear equations of dimension (d_K − 1) × (d_K − 1). However, if d_K is relatively small, this effort can be neglected in comparison to the other computational steps. These considerations transfer directly to the continuous-time setting.

B.2. Exploiting the GTH Advantage in the Update Step for the Matrix B. For many systems of linear equations arising in the context of Markov chains, the coefficient matrix satisfies certain row sum conditions which are preserved during all steps of solving the linear system. This fact can be used for implementing these steps in a way which is not affected by numerical instability. Such procedures were introduced by Grassmann et al. [28], and the underlying idea is therefore referred to as the GTH advantage. In the present paper, this advantage should be used in the update step for the matrix B. We give some more details.
First, note that the probabilistic interpretation guarantees that all entries of B_i are nonnegative for all i. Hence, computing the diagonal entries of Q_{i−1,i−1} + B_{i−1}Q_{i−2,i−1} results in computing differences of positive numbers. Furthermore, when performing any elimination procedure, the updates of the diagonal entries again result in computing differences of positive numbers. All other operations only compute sums of nonnegative numbers (which cannot cause numerical problems). For avoiding the need for subtractions, we remark that the row sums of Q_{i−1,i−1} + B_{i−1}Q_{i−2,i−1} and of Q_{i−1,i} add up to 0 (this statement can be deduced from the probabilistic interpretation). Let h_{i−1} be the column vector of row sums of Q_{i−1,i}. Then the d_{i−1} × (d_{i−1} + 1)-dimensional matrix (−Q_{i−1,i−1} − B_{i−1}Q_{i−2,i−1}, −h_{i−1}) has row sums of 0, where only the diagonal entries can be positive. An important finding on the application of (scaled) Gaussian elimination to systems with coefficient matrices of this type guarantees that these properties are preserved in each step of the elimination procedure. Hence, the pivot element in each step (these are the entries which might be affected by numerical instabilities) can be "corrected" by computing the negative sum of the remaining entries in its row. By performing the steps of Gaussian elimination, we convert the left-hand side to the identity matrix, and the right-hand side becomes the desired inverse matrix. The column in between acts as a control column. Before starting the Gaussian elimination, we can correct the first pivot element 4 by recomputing it as −(−2 − 1 − 1).
(iv) The first row is scaled by 1/4. Afterwards, a constant multiple (with factor 2) of the first row is added to the second row
(v) After the next step, the pair (7/2, −7/2) again has a row sum of 0, which can be used for correcting the pivot element 7/2 in the last step of the elimination procedure.
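The pivot correction in this worked example carries over to computing invariant distributions in general. The following sketch (our own illustration, not the paper's implementation; `gth_stationary` is an assumed name) applies the GTH idea to a small generator matrix: every pivot is formed as a sum of nonnegative off-diagonal rates, so no difference of close numbers is ever taken:

```python
import numpy as np

def gth_stationary(Q):
    """Stationary distribution of an irreducible CTMC generator Q via
    GTH-style elimination: pivots are recomputed from row sums of rates
    instead of being updated by subtraction."""
    A = np.array(Q, dtype=float)
    n = A.shape[0]
    # fold state k into states 0..k-1; diagonal entries are never read
    for k in range(n - 1, 0, -1):
        s = A[k, :k].sum()                        # GTH pivot: sum of rates
        A[:k, :k] += np.outer(A[:k, k], A[k, :k]) / s
    x = np.zeros(n)
    x[0] = 1.0                                    # unnormalised invariant measure
    for k in range(1, n):
        x[k] = x[:k] @ A[:k, k] / A[k, :k].sum()
    return x / x.sum()

# birth-death example with lam = 1, mu = 2: pi is proportional to (1, 1/2, 1/4)
Q = np.array([[-1.0, 1.0, 0.0],
              [2.0, -3.0, 1.0],
              [0.0, 2.0, -2.0]])
print(gth_stationary(Q))    # -> approx [4/7, 2/7, 1/7]
```

Note that the algorithm only ever reads off-diagonal rates; the diagonal is recovered implicitly from the row-sum-zero property, exactly the "control" idea described above.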
Due to the need to carry the control vector h, and due to computing the inverse first instead of solving a matrix-valued system of linear equations directly, we have slightly more computational effort. The effects caused by instability justify this additional effort.
As an example of instability, consider the update procedure for B in the retrial queueing model in Section 7. In fact, note that for this specific model, we could even simplify our algorithmic approach since we are able to determine M_i ≔ −Q_{ii} − B_iQ_{i−1,i} explicitly, so that there is no need to compute B_i = Q_{i,i−1}M^{−1}_{i−1} recursively.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.