Analysis of the Optimal Resource Allocation for a Tandem Queueing System

In this paper, we study a controllable tandem queueing system consisting of two nodes and a controller, in which customers arrive according to a Poisson process and must receive service at both nodes before leaving the system. A decision-maker dynamically allocates service resources to each node according to the number of customers present at each node. The objective is to minimize the long-run average cost. We cast the problem as a Markov decision process and, via a dynamic programming approach, derive the monotonicity of the optimal allocation policy and the relationship between the two nodes' optimal policies. Furthermore, we give conditions under which the optimal policy is unique and has the bang-bang control property.


Introduction
Queueing systems in which customers must be processed at a series of stations, from upstream to downstream, are called tandem queueing systems. Tandem queueing models have widespread applications in both service organizations and production systems, where performance measures and optimization are of primary concern; examples include the control of semiconductor fabrication processes and broadband wireless networks, appointment scheduling in hospitals, and production-inventory systems (see [1-3] and references therein). Recently, this issue has attracted much attention and a substantial literature has developed, especially on dynamic resource allocation problems for tandem queueing systems. Most of this work falls into two directions: admission control and server resource allocation. Admission control of tandem queues has been widely studied (e.g., [4, 5]), while little work has appeared on the structure of the optimal resource allocation policy in tandem queues.
For many systems, service consists of two or more phases performed by one or more servers. A fundamental decision is how to allocate the resources (servers or workforce) owned by the system to each station. This is a classic problem, rooted in Rosberg et al. [6], where the service rate at station 1 can be selected from a compact set while the rate at station 2 is constant. Optimal control of a two-stage tandem queueing system with only two flexible servers was discussed by Ahn et al. [7]. Arumugam et al. [8] considered inventory-based allocation policies for flexible servers in serial systems. Smith and Barnes [9] analyzed optimal server allocation in closed finite queueing networks. These authors treated different cost or reward criteria but did not consider the structure of the optimal policy; they obtained the optimal policy only through numerical experiments. In practice, it is difficult for a manager to compute the detailed optimal policy; managers prefer basic insight into its structure. For single-queue systems, the structure of the optimal policy has been investigated in many papers. Iravani et al. [10] studied optimal service scheduling in nonpreemptive finite-population queueing systems. The optimal resource allocation policy for single-queue systems was considered by Yang et al. [11], who investigated its structural properties. Yang et al. [12] studied optimal resource allocation for parallel-service-facility multiqueue systems with a shared server pool.
For the optimal control of tandem queueing systems, several related works exist. Weber and Stidham [13] considered optimal service rate control in queueing networks, where the optimal policy has a monotone structure. Veatch and Wein [14] generalized the monotonicity results of [13] to control policies under full information, with service rate functions linear in the service resource. Mayorga et al. [15] studied the problem of allocating flexible servers for a firm operating a make-to-order serial production system. The max-min optimality of service rate control in closed queueing networks was studied by Xia and Shihada [16], where the cost functions are strictly convex (concave). Few studies, even among the most recent, have considered the structure of the optimal resource allocation policy in tandem queues with nonlinear service rate and cost functions. In many logistic environments, however, the assumption of linear resource costs and service rates is not appropriate. It is well known that if the service cost is linear, these problems have an all-or-nothing (bang-bang) optimal policy (see [14]). Different from the works quoted above, in our model the service resource cost and the service rate are more general than linear functions of the service resource. Recently, Xia et al. [17] investigated the optimal control of service rates of a tandem queue with power constraints and a general cost function, deriving structural results such as a bang-bang control policy and a 3-element set policy for some special cases. In this paper, by contrast, we study the structure of the optimal control policy for general cost and service rate functions, and we obtain some previously uninvestigated properties of the optimal policy. Using the theory of queueing systems, we cast the optimization problem as an MDP.
The theory of Markov, semi-Markov, and regenerative decision processes can be found in Morozov and Steyaert [18]. We analyze the properties of the optimal policy under full information and partial information. Concretely, we first derive properties (monotonicity and convexity) of the value function by induction and queueing theory (see [19]). Second, we provide insights into the structure of the optimal policy based on the properties of the value function and the dynamic programming method (see [20-22]). Furthermore, we apply Howard's policy iteration procedure to obtain numerical results.
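Howard's policy iteration alternates policy evaluation (solving a linear system for the average cost g and the relative values v) with a one-step improvement. The following is a minimal sketch for a generic finite uniformized average-cost MDP; the state/action data below are illustrative placeholders, not the tandem model itself.

```python
import numpy as np

def evaluate(policy, P, c, ref=0):
    """Average cost g and relative values v of a stationary policy:
    g + v(s) = c[s, policy[s]] + sum_s' P[s, policy[s], s'] v(s'), v[ref] = 0."""
    n = P.shape[0]
    M = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    for s in range(n):
        a = policy[s]
        M[s, 0] = 1.0            # coefficient of g
        M[s, 1 + s] += 1.0       # coefficient of v(s)
        M[s, 1:] -= P[s, a]      # minus the transition probabilities
        rhs[s] = c[s, a]
    M[n, 1 + ref] = 1.0          # normalization v[ref] = 0
    x = np.linalg.solve(M, rhs)
    return x[0], x[1:]

def policy_iteration(P, c, ref=0):
    """Howard's policy iteration for the long-run average cost criterion."""
    n, _ = c.shape
    policy = np.zeros(n, dtype=int)
    while True:
        g, v = evaluate(policy, P, c, ref)
        q = c + P @ v                        # one-step look-ahead q[s, a]
        new = q.argmin(axis=1)
        # keep the current action on ties to guarantee termination
        keep = q[np.arange(n), policy] <= q[np.arange(n), new] + 1e-12
        new[keep] = policy[keep]
        if np.array_equal(new, policy):
            return g, v, policy
        policy = new
```

The evaluation step pins v at a reference state, which is the same normalization used for the relative value function later in the paper.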
The main contributions of this paper can be summarized as follows. First, to the best of our knowledge, this paper is the first to study the optimal resource allocation policy in tandem queues with general service rate and resource cost functions. Second, we obtain monotonicity results for the optimal policy under partial information, based on the quasiconvexity of the value function. Third, we derive conditions under which the optimal policy is unique and the bang-bang control property holds; this conclusion is new relative to the previous literature. Furthermore, we derive the relationship between the two stations' optimal policies. As far as we know, these are the most general results for the optimality of resource allocation in tandem queueing systems. The rest of the paper is organized as follows. In Section 2, we introduce the model formulation in detail as a controllable Markov decision problem. The characteristics of the optimization problem and the optimality equation are derived in Section 3. In Section 4, we present the structural properties of the optimal policy and the main results of the paper. In Section 5, we give numerical examples supporting the results of the model. Finally, further discussion and conclusions are given in Section 6.

Model Description
We consider a tandem queueing system with two stations. Customers arrive at station 1 from outside according to a Poisson process with rate λ and have exponentially distributed service requirements at each station. After receiving service at station 1, a customer immediately joins station 2 and receives service there before leaving the system. A decision-maker can assign a number of service resources to each station, and the service rate at a station depends precisely on the number of resources assigned to it. When station i has been allocated a resources, the service duration at station i is exponentially distributed with rate μ_i(a), i = 1, 2, which is strictly increasing in a. Without loss of generality, we assume μ_i(0) = 0, i = 1, 2. At each decision epoch, the decision-maker simultaneously chooses the number of service resources for station 1 from a set A = [0, a_max] and for station 2 from a set B = [0, b_max]. Each station has a single infinite-size FCFS queue. The interarrival and service times are assumed mutually independent. We assume that the stability condition λ < min{μ₁(a_max), μ₂(b_max)} holds. Figure 1 gives an illustration of the system.
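Before turning to optimization, the dynamics can be sanity-checked by simulation. The sketch below simulates the tandem queue under one fixed pair of service rates, standing in for μ₁(a), μ₂(b) at some fixed allocation; the function and parameter names are our own. Under the stability condition, the long-run departure rate approaches λ.

```python
import random

def simulate_tandem(lam, rate1, rate2, horizon=10_000.0, seed=0):
    """Event-driven simulation of the two-station tandem queue under
    fixed service rates rate1, rate2 (illustrative stand-ins for the
    paper's allocation-dependent rates mu_i(a))."""
    rng = random.Random(seed)
    t, q1, q2, served = 0.0, 0, 0, 0
    while t < horizon:
        rates = [lam,
                 rate1 if q1 > 0 else 0.0,   # station 1 active only if nonempty
                 rate2 if q2 > 0 else 0.0]   # station 2 active only if nonempty
        total = sum(rates)
        t += rng.expovariate(total)          # time to the next event
        u = rng.random() * total
        if u < rates[0]:
            q1 += 1                          # outside arrival to station 1
        elif u < rates[0] + rates[1]:
            q1 -= 1; q2 += 1                 # completion at station 1
        else:
            q2 -= 1; served += 1             # departure from station 2
    return served / t                        # observed departure rate
```

For a stable instance the returned rate should be close to the arrival rate λ, a quick consistency check on the transition structure described next.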
We consider the following cost structure in the system. Our objective is to obtain a dynamic resource allocation policy that minimizes the long-run average cost.
(1) Resource Cost. When station i uses a resources, a cost c_i(a), i = 1, 2, is incurred by the system per unit time (c_i(·) is a continuous function, strictly increasing in a; without loss of generality, we assume c_i(0) = 0).
(2) Holding Cost. Holding costs are incurred at rates h₁ and h₂ per unit time for each customer at stations 1 and 2, respectively.
Let q_i(t) denote the number of customers at station i, i = 1, 2. The state of the system at time t can be described by Q(t) = (q₁(t), q₂(t)), and the system evolves as a continuous-time Markov process Q = {(q₁(t), q₂(t)), t ≥ 0}. The state space is S = {(q₁, q₂) | q₁, q₂ ∈ N} with N = {0, 1, 2, . . .}. We consider stationary Markov policies, under which the system evolves as a continuous-time Markov chain; moreover, in order to study the optimal policy in an ergodic Markov process, we assume that the model is stable and conservative. Under a control action (a, b) in state q = (q₁, q₂), the transition rates are λ from q to q + e₁, μ₁(a)·1{q₁ > 0} from q to q − e₁ + e₂, and μ₂(b)·1{q₂ > 0} from q to q − e₂, where e_i is the 2-dimensional vector with 1 in the ith coordinate and 0 elsewhere, i = 1, 2.
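The transition structure can be written down directly. A small sketch (names are ours; mu1 and mu2 stand for the rate functions μ₁(·), μ₂(·)):

```python
def transition_rates(q, a, b, lam, mu1, mu2):
    """Transition rates out of state q = (q1, q2) under action (a, b),
    as a dict mapping successor state -> rate."""
    q1, q2 = q
    rates = {(q1 + 1, q2): lam}              # outside arrival to station 1
    if q1 > 0:
        rates[(q1 - 1, q2 + 1)] = mu1(a)     # completion at station 1, move to 2
    if q2 > 0:
        rates[(q1, q2 - 1)] = mu2(b)         # departure from station 2
    return rates
```

All other transition rates are zero; this is exactly the generator described above.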
The decision-maker's problem is to choose an optimal dynamic policy, based on the number of customers at each station, that minimizes the long-run average cost. We formulate the service resource management problem as a Markov decision process. The set of decision epochs consists of all arrivals and service completions. The controllable system is associated with a Markov process whose infinitesimal generator under a policy π is determined by the transition rates above. We consider stationary Markov policies π: S → A × B with π(q) = (a, b). By the Markov property of the queueing system, the optimal policy depends only on the current state, regardless of time. In our model we consider two situations: decisions with partial information and with full information. Concretely, when the system state is q = (q₁, q₂), the manager acts as follows: (i) Partial information: the action for station 1 (resp., station 2) is a(q₁) ∈ A (resp., b(q₂) ∈ B); that is, the resource action at a station depends only on the number of customers at that station.
(ii) Full information: the action for station 1 (resp., station 2) is a(q₁, q₂) ∈ A (resp., b(q₁, q₂) ∈ B); that is, the resource action at each station depends on the number of customers at both stations.

Optimization Problem and Optimality Equation
It is obvious that, under the stability condition λ < min{μ₁(a_max), μ₂(b_max)}, the two-dimensional stochastic process Q = {(q₁(t), q₂(t)), t ≥ 0} is an ergodic continuous-time Markov chain for any fixed stationary policy π. As is known from Tijms [23], the long-run average cost per unit time for policy π in our ergodic Markov process can be written in the form

g^π = lim_{t→∞} V^π(q, t)/t = Σ_{q∈S} c(q, π(q)) p^π(q),

in which V^π(q, t) denotes the total expected cost up to time t when the system starts in state q = (q₁, q₂), and p^π(q) denotes the stationary probability of the process under policy π = (a, b). The goal is to find a policy π* that minimizes the long-run average cost:

g* = min_π g^π.

Using the standard tools of uniformization and normalization, we construct a discrete-time equivalent of our original queueing system. Without loss of generality, we assume λ + μ₁(a_max) + μ₂(b_max) = 1. Now we consider a real-valued function v(q) defined on the state space. The relative value function v(q) can be regarded as the asymptotic difference in total cost that results from starting the process in state q instead of some reference state. As shown in Puterman [24], the optimal policy and the optimal average cost g are solutions of the optimality equation

g + v(q) = T v(q),

where T is the dynamic programming operator acting on v, defined as

T v(q) = λ v(q + e₁) + min_{a∈A} T₁^a v(q) + min_{b∈B} T₂^b v(q) + h₁ q₁ + h₂ q₂,

in which

T₁^a v(q) = μ₁(a) 1{q₁ > 0} v(q − e₁ + e₂) + (μ₁(a_max) − μ₁(a) 1{q₁ > 0}) v(q) + c₁(a),
T₂^b v(q) = μ₂(b) 1{q₂ > 0} v(q − e₂) + (μ₂(b_max) − μ₂(b) 1{q₂ > 0}) v(q) + c₂(b).

The first term in the expression T v(q) models the arrivals of customers to station 1 from outside the system, and the last terms the customer holding costs. Similarly, the first term in T₁^a v(q) corresponds to a customer who finishes service at station 1 and moves to station 2, the second to the uniformization constant, and the last to the resource cost at station 1. The first term in T₂^b v(q) corresponds to a customer who finishes service at station 2, the second to the uniformization constant, and the last to the resource cost at station 2.
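The operator T can be implemented directly on a finite grid. The sketch below truncates the state space and reflects arrivals at the boundary; truncation is our own device for computation, not part of the paper's infinite-state model.

```python
import numpy as np

def bellman_operator(v, lam, mu1, mu2, c1, c2, h1, h2, A, B):
    """One application of the uniformized operator T on a truncated grid.

    v: (N1, N2) array of relative values; A, B: finite grids of candidate
    allocations with A[-1] = a_max, B[-1] = b_max.  Assumes the uniformized
    rates satisfy lam + mu1(A[-1]) + mu2(B[-1]) = 1."""
    N1, N2 = v.shape
    Tv = np.empty_like(v)
    for q1 in range(N1):
        for q2 in range(N2):
            here = v[q1, q2]
            # station 1: completion moves a customer to station 2
            t1 = min(
                c1(a)
                + (mu1(a) * v[q1 - 1, min(q2 + 1, N2 - 1)]
                   + (mu1(A[-1]) - mu1(a)) * here if q1 > 0
                   else mu1(A[-1]) * here)
                for a in A)
            # station 2: completion is a departure from the system
            t2 = min(
                c2(b)
                + (mu2(b) * v[q1, q2 - 1]
                   + (mu2(B[-1]) - mu2(b)) * here if q2 > 0
                   else mu2(B[-1]) * here)
                for b in B)
            Tv[q1, q2] = (lam * v[min(q1 + 1, N1 - 1), q2]
                          + t1 + t2 + h1 * q1 + h2 * q2)
    return Tv
```

With v ≡ 0 and c_i(0) = 0, a single application returns exactly the holding cost h₁q₁ + h₂q₂, which is a convenient correctness check.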
From the optimality equation, we can also address another optimization problem: if c_i ≡ 0 and h_i = 1, i = 1, 2, then the problem is equivalent to minimizing the mean number of customers in the queueing system. In this case, the optimal action is intuitively always (a_max, b_max), which also agrees with the structure of the optimal policy derived in the next section. In addition, the analysis method and structure in this section hold for both the partial and full information cases.

Structural Properties of the Optimal Policy
In this section, we focus on deriving the optimal policy. The structural properties of the optimal policy provide basic managerial insight and also help one find the optimal policy with less computational effort, owing to a reduction of the solution search space.
To study the optimal policy, one would intuitively solve the optimality equation g + v(q) = T v(q). However, it is hard to solve analytically in practice. A solution can be obtained by recursively defining v_{n+1} = T v_n for arbitrary v₀; the actions then converge to the optimal policy as n → ∞. For the existence and convergence of the solutions and the optimal policy, see Aviv and Federgruen [25] and Sennott [26]. The backward recursion equation in our model is v_{n+1}(q) = T v_n(q). For ease of notation, let arg min T_i v(q), i = 1, 2, denote the set of optimal actions for station i in state q = (q₁, q₂) in the partial information case. Using the optimality equation and the recursive method, we obtain properties of the relative value function in the following lemma, which will be used in the proofs of the main results; the proofs of these properties are given in Appendix A.
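The backward recursion v_{n+1} = T v_n, kept bounded by subtracting the value at a reference state, is the standard relative value iteration. A self-contained sketch on a truncated grid follows; the truncation size and the parameters in the usage below are illustrative choices of ours.

```python
import numpy as np

def solve_avg_cost(lam, mu1, mu2, c1, c2, h1, h2, A, B, N=15, iters=2000):
    """Relative value iteration v_{n+1} = T v_n - (T v_n)(0,0) on an N x N
    truncated grid; (T v_n)(0,0) converges to the optimal average cost g.
    Assumes the uniformization lam + mu1(A[-1]) + mu2(B[-1]) = 1."""
    v = np.zeros((N, N))
    g = 0.0
    for _ in range(iters):
        Tv = np.empty_like(v)
        for q1 in range(N):
            for q2 in range(N):
                # station-1 term of T (completion moves customer to station 2)
                t1 = min(c1(a) + (mu1(a) if q1 else 0.0)
                         * (v[q1 - 1, min(q2 + 1, N - 1)] - v[q1, q2])
                         + mu1(A[-1]) * v[q1, q2] for a in A)
                # station-2 term of T (completion is a departure)
                t2 = min(c2(b) + (mu2(b) if q2 else 0.0)
                         * (v[q1, q2 - 1] - v[q1, q2])
                         + mu2(B[-1]) * v[q1, q2] for b in B)
                Tv[q1, q2] = (lam * v[min(q1 + 1, N - 1), q2]
                              + t1 + t2 + h1 * q1 + h2 * q2)
        g, v = Tv[0, 0], Tv - Tv[0, 0]   # renormalize at the reference state
    return g, v
```

The allocations attaining the two inner minima recover the optimal actions; the reference state (0, 0) plays the role of the reference state in the definition of the relative value function.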

Lemma 1. For the optimal value function v(q) in this model, we have the following properties; in particular, (i) v(q) is nondecreasing in q_i, i = 1, 2.
If the manager has full information about the system at the decision epochs, he makes a decision based on the number of customers at both stations. Weber and Stidham [13] and Veatch and Wein [14] used submodularity of the value function, v(q) − v(q + e₂ − e₁) − v(q − e₂) + v(q − e₁) ≤ 0, to prove transition monotonicity for the full information case: a*(q) ≤ a*(q − e₂) and a*(q) ≤ a*(q + e₂ − e₁). The optimal resource allocation policy is of switching-function (region control) type in the full information case [17]. However, the corresponding results for the partial information case have not been studied. In this paper, we study the properties of the optimal policy under partial information. Differently from the method used in the full information case, we obtain structural properties of the optimal policy via the quasiconvexity of the relative value function and present them in the following theorem.

Theorem 2.
In our model under partial information, the optimal policy has the following monotonicity property: for all q = (q₁, q₂) ∈ S, the optimal allocation a*(q₁) for station 1 is nondecreasing in q₁, and the optimal allocation b*(q₂) for station 2 is nondecreasing in q₂. The proof of the theorem is based on the following lemma, which establishes quasiconvexity properties of the relative value function. The proofs of Theorem 2 and Lemma 3 are given in Appendix B.
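The monotonicity asserted by Theorem 2 can be probed numerically. The sketch below solves a small symmetric instance by value iteration on a truncated grid and checks that the station-1 allocation is nondecreasing in q₁ for every fixed q₂. Note that this computes the full-information policy, so it is only an illustration of the own-queue monotonicity, not a proof of the partial-information statement; all parameters are our own.

```python
import numpy as np

def optimal_actions(lam, mu, c, h1, h2, acts, N=10, iters=800):
    """Value-iterate the truncated uniformized MDP (symmetric stations) and
    return the argmin station-1 action index at every state."""
    v = np.zeros((N, N))
    for _ in range(iters):
        Tv = np.empty_like(v)
        for q1 in range(N):
            for q2 in range(N):
                t1 = min(c(a) + (mu(a) if q1 else 0.0)
                         * (v[q1 - 1, min(q2 + 1, N - 1)] - v[q1, q2])
                         + mu(acts[-1]) * v[q1, q2] for a in acts)
                t2 = min(c(b) + (mu(b) if q2 else 0.0)
                         * (v[q1, q2 - 1] - v[q1, q2])
                         + mu(acts[-1]) * v[q1, q2] for b in acts)
                Tv[q1, q2] = (lam * v[min(q1 + 1, N - 1), q2]
                              + t1 + t2 + h1 * q1 + h2 * q2)
        v = Tv - Tv[0, 0]
    pol = np.zeros((N, N), dtype=int)
    for q1 in range(N):
        for q2 in range(N):
            vals = [c(a) + (mu(a) if q1 else 0.0)
                    * (v[q1 - 1, min(q2 + 1, N - 1)] - v[q1, q2])
                    + mu(acts[-1]) * v[q1, q2] for a in acts]
            pol[q1, q2] = int(np.argmin(vals))
    return pol
```

In our runs with linear rates and costs, the extracted policy allocates nothing at q₁ = 0 and the full resource otherwise, which is nondecreasing in q₁ as the theorem suggests.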

Lemma 3.
For the optimal value function v(q₁, q₂) under partial information, v is convex in each coordinate; in particular, v(q + e₂) − 2v(q) + v(q − e₂) ≥ 0. Based on these properties of the value function, we derive the relationship between the two stations' optimal policies by analyzing the properties of the service rate and holding cost functions. The following theorem gives conditions that order the optimal allocations of the two stations.

Remark 5.
From the above theorem, we can conclude that, under certain conditions, the optimal amount of service resource allocated to station 1 is no more than that allocated to station 2. We find that the optimal allocation to each station depends on the difference in resource costs, c₁(a) − c₂(a), and the difference in service rates, μ₂(a) − μ₁(a), between the stations. The conditions imply that when the same amount of service resource is added at both stations, the performance of station 2 improves more than that of station 1, while station 1 incurs the higher cost; it follows that the optimal policy satisfies b* ≥ a*.
It is well known that if the service resource cost function is linear, then an all-or-nothing (bang-bang) control is optimal; Weber and Stidham [13] and Veatch and Wein [14] give detailed conclusions on this issue. It is not obvious, however, whether a bang-bang control remains optimal when the service resource cost and service rate functions are more general than linear in the service resource. We are therefore interested in the special structure of the optimal control policy in this model. In contrast to existing studies, the results in the following theorem extend the linear case. We are now ready to give conditions under which the optimal policy is unique and has the bang-bang control property. Proof. We prove the conclusion only for station 1; the same argument yields the conclusion for station 2. To prove part (i), consider the resource allocation for station 1. By the definition of the operator T₁, we have the minimization problem min_{a∈A} T₁^a v(q). Rearranging the terms of its first-order optimality condition, we obtain c₁′(a)/μ₁′(a) = v(q) − v(q − e₁ + e₂). Because the allocation action satisfies a ∈ A = [0, a_max], the optimal action for station 1 is either 0, or a_max, or a solution of this equation; since the ratio on the left-hand side is monotone on A, the equation has at most one solution. Next, if the optimal action is 0 or a_max, we show that 0 and a_max cannot both be optimal for station 1 simultaneously. We argue by contradiction: assume that both 0 and a_max are optimal for station 1 in state q. Then T₁^0 v(q) = T₁^{a_max} v(q). Because the action 0 is optimal, substituting this equality into the optimality inequality yields c₁(a)/μ₁(a) ≥ c₁(a_max)/μ₁(a_max) for all a ∈ (0, a_max), which contradicts condition (2). Hence the optimal action for station 1 is unique.
To prove part (ii), consider the resource allocation for station 1. We argue by contradiction and assume that there exists a state q ∈ S for which an optimal action a ∈ arg min T₁ v(q) for station 1 satisfies a ∈ (0, a_max). Comparing a with a slightly larger allocation a + ε for ε > 0 and using the optimality of a implies an inequality between the corresponding cost-to-rate ratios; since the function c₁(a)/μ₁(a) is strictly decreasing, this inequality is reversed, a contradiction. Moreover, because the action a is optimal for station 1 in state q, we have T₁^a v(q) ≤ T₁^0 v(q). From the above theorem, we can conclude that, under the stated conditions, the optimal policy is a bang-bang control. We now give intuitive interpretations of these conditions and results. For the condition in part (ii), note that c₁(a)/μ₁(a) represents the expected service cost of one customer at station 1 under allocation a. Since c₁(a)/μ₁(a) is strictly decreasing for all a ∈ (0, a_max), the full allocation minimizes the service cost per customer at station 1. As for the total average cost of the system, it can be regarded as the average cost per unit time, since every customer must be processed at each station. For the state q = (0, 0), it is obvious that no service resource should be allocated at either station.
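The sufficient condition, namely that the per-customer expected service cost c(a)/μ(a) is strictly decreasing on the positive allocations, is easy to test numerically for candidate cost/rate pairs. A small sketch (the example functions in the usage are illustrative):

```python
def expected_cost_per_service(c, mu, grid):
    """The ratio c(a)/mu(a) on a grid of positive allocations a."""
    return [c(a) / mu(a) for a in grid]

def is_bang_bang_candidate(c, mu, grid):
    """True if c(a)/mu(a) is strictly decreasing on the grid, the
    sufficient condition discussed above for a bang-bang optimal policy."""
    r = expected_cost_per_service(c, mu, grid)
    return all(x > y for x, y in zip(r, r[1:]))
```

For example, c(a) = 2a² with μ(a) = 2a³ gives the decreasing ratio 1/a (bang-bang candidate), while c(a) = 10a³ with μ(a) = 2a² gives the increasing ratio 5a, for which the theorem does not apply.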

Numerical Examples
For the full information case, the corresponding results and numerical examples have been investigated in [14, 17].
In this section, we conduct numerical experiments under different parameter settings to demonstrate the main results of this paper for the partial information case. On one hand, these examples provide direct insight into how changes of the system state affect the optimal resource allocation (a*, b*). On the other hand, the experiments and Figures 2, 3, 4, and 5 directly support the structural results on the optimal resource allocation policy obtained in the previous section. In Figures 2 and 3, we present numerical results of the optimal policy for the case λ = 1.3, h₁ = 1.5, h₂ = 2, μ_i(a) = 2a², c_i(a) = 10a³, i = 1, 2. As can be seen from Figure 2, the optimal allocation a* increases as the number of customers at station 1 increases, in a staircase-like pattern; this is consistent with Theorem 2, while the optimal allocation b* for station 2 remains constant as q₁ varies. Meanwhile, Figure 3 shows that a* remains constant and b* shows a staircase-like increase as the number of customers at station 2 grows. Moreover, in both figures the graph of the optimal policy a* for station 1 always lies below that of the optimal policy b* for station 2. This is explained by Theorem 4, whose conditions are satisfied in this experiment.
In Figures 4 and 5, we describe the characteristics of the optimal policy for the case λ = 1.1, h₁ = h₂ = 1, μ₁(a) = 2a³, c₁(a) = 2a². From Figure 4, we find that the optimal policy for station 1 is a* = 0 if q₁ = 0 and a* = 1 otherwise, so the policy for station 1 is of bang-bang type, while the optimal policy b* for station 2 remains constant. As observed from Figure 5, the optimal policy a* for station 1 always equals 1, again a bang-bang control, while b* shows a staircase-like increase as the number of customers at station 2 grows. These figures directly support Theorem 6, since the functions μ₁(a) = 2a³ and c₁(a) = 2a² in this experiment satisfy the conditions of Theorem 6 (ii): the ratio c₁(a)/μ₁(a) = 1/a is strictly decreasing.

Conclusion
In this paper, we have analyzed the optimal resource allocation policy of a tandem queueing system with general service cost and service rate functions. Applying queueing and MDP theory, we not only established some traditional properties of the relative value function and the optimal policy but also derived conditions under which the optimal policy is unique and has the bang-bang control property, which had not been studied before our work. In particular, we have provided the relationship between the two stations' optimal policies, which gives the manager basic structural insight to improve decision-making for the system. These results suggest several interesting extensions that we may study in the near future. One possibility is to consider a tandem queueing system with retrial or feedback customers, which would make the model more useful for practical systems. Another is to apply semi-Markov decision processes to queueing systems in which the service time of a customer follows a general distribution. Furthermore, in practice, production systems are often burdened by mixed uncertainties of both randomness and fuzziness; the study of the optimal control of a tandem queueing system with fuzziness may provide more precise information to managers, which is also an interesting topic for future research.

A. Proof of Lemma 1
Proof. We prove Lemma 1 (i) by induction on n in v_n. Define v₀(q) = 0 for all states q ∈ S; this function obviously satisfies (i). Now assume that (i) holds for the function v_n(q), q ∈ S, and some n. We must show that v_{n+1}(q) satisfies the nondecreasing property as well. For i = 1, we can bound the difference v_{n+1}(q + e₁) − v_{n+1}(q); the second term on the right-hand side is obviously positive.
Let (a ∈ arg min T₁ v_n(q + e₁), b ∈ arg min T₂ v_n(q + e₁)) be an arbitrary pair of optimal actions for the two stations in state q + e₁. Then the desired inequality follows. Therefore, Lemma 1 (i) holds by induction for every n, and v(q) is a nondecreasing function. Lemma 1 (i) for i = 2 can be proved in a similar manner.
The proof of Lemma 1 (ii) is similar to that of Lemma 1 (i). Define v₀(q) = 0 for all states q ∈ S; this function obviously satisfies (ii). Now assume that (ii) holds for the function v_n(q), q ∈ S, and some n. We must show that v_{n+1}(q) satisfies Lemma 1 (ii) as well: since the condition 2h₂ ≥ h₁ holds, the second term on the right-hand side is obviously positive.

B. Proof of Lemma 3 and Theorem 2
Proof of Lemma 3 (i) and Theorem 2 (i). To prove Lemma 3 (i), assume that Lemma 3 (i) holds for the function v_n(q), q ∈ S, and some n. We must show that Lemma 3 (i) also holds for n + 1. The first inequality holds by the induction hypothesis. The optimal action for station 1 depends only on the number of customers at station 1, and the states q + e₂, q, and q − e₂ have the same first entry q₁; hence they share the same optimal action for station 1. Assume a ∈ arg min T₁ v_n(q) and b₁ ∈ arg min T₂ v_n(q + e₂). The first inequality then follows by taking a potentially suboptimal action in the second term of Σ_{i=1,2} T_i v_n(q + e₂) − 2 Σ_{i=1,2} T_i v_n(q) + Σ_{i=1,2} T_i v_n(q − e₂). The equality follows by rearranging the terms, and the last inequality follows by the induction hypothesis. Hence, v(q + e₂) − 2v(q) + v(q − e₂) ≥ 0.

Conflicts of Interest
The authors declare that they have no conflicts of interest.