Delay-Aware Online Service Scheduling in High-Speed Railway Communication Systems

We investigate the downlink service scheduling problem in relay-assisted high-speed railway (HSR) communication systems, taking into account stochastic packet arrivals and quality-of-service (QoS) requirements. The scheduling problem is formulated as an infinite-horizon average cost constrained Markov decision process (MDP), where the scheduling actions depend on the channel state information (CSI) and the queue state information (QSI). Our objective is to find a policy that minimizes the average end-to-end delay through scheduling actions under the service delivery ratio constraints. To address the challenge of centralized control and the high complexity of traditional MDP approaches, we propose a distributed online scheduling algorithm based on approximate MDP and stochastic learning, where the scheduling policy is a function of the local CSI and QSI only. Numerical experiments are carried out to show the performance of the proposed algorithm.


Introduction
Recently, high-speed railway (HSR) systems have developed rapidly all over the world. Passengers on the train not only enjoy a short journey but also have a high demand for multimedia services. The cellular network deployed along the rail lines can provide seamless coverage and data packet delivery. However, the data transmission rate is strictly limited by the penetration loss in traditional HSR communication systems. As an alternative solution, the relay-assisted HSR network architecture has been proposed in [1,2]; it was considered a better choice than direct transmission in the case of large penetration loss [3] and has become a promising architecture for future broadband mobile communications in providing high data-rate services [4].
We consider a relay-assisted two-hop HSR network architecture in this work. The data packets are delivered via a relay station (RS) instead of direct transmission to achieve a high data transmission rate. The passengers send service requests while the train is moving. If a large number of services are requested, the resource contention among multiple services must be resolved by an efficient scheduling scheme, one that not only accounts for the highly dynamic channel caused by the extremely high moving speed but can also be implemented in a distributed manner with low complexity. In addition, buffering is involved at network devices, for example, the content server and the RS; it is thus important to consider not only the throughput performance but also the end-to-end (e2e) delay performance. Delay is a key quality-of-service (QoS) criterion for real-time multimedia services. As a result, we focus on the delay issues for multimedia services and aim at developing a delay-aware scheduling algorithm for relay-assisted HSR communication systems.
Many previous studies have sought to improve scheduling and resource allocation in HSR communication systems. In order to support e2e real-time data applications, [5] studied a circuit domain latency model and employed a priority scheduling algorithm to estimate approximate service latency. In HSR networks with a cell array architecture, [6] proposed a scheduling and resource allocation mechanism to maximize the service rate by considering the channel variations and handover information. The optimal resource allocation problem in a cellular/infostation integrated HSR network has been investigated in [7], considering the intermittent network connectivity and multiservice demands. However, to the best of our knowledge, none of them has addressed delay-aware scheduling in downlink relay-assisted HSR communication systems. Although the two-hop HSR network architecture is simple, how to schedule multiple services for such a network under stochastic packet arrivals and QoS requirements is still an open problem. This motivates us to investigate the delay-aware downlink scheduling problem for relay-assisted HSR communication systems.
The contributions of this paper are threefold. First, the two-hop scheduling problem is formulated as an infinite-horizon average cost constrained Markov decision process (CMDP) with the objective of minimizing the average e2e delay under the service delivery ratio constraints. The CMDP problem is converted into an unconstrained MDP by Lagrange theory, and the general solution of the CMDP problem is then given by traditional iterative methods. Second, since the general solution does not yield a simple implementable solution because of the curse of dimensionality, we propose a distributed online scheduling algorithm based on approximate MDP and stochastic learning, which simplifies the solution and addresses the challenge of centralized control. A linear combination of per-node value functions is employed to approximate the value function of the associated optimality equation. Based on the per-node value functions, a distributed two-stage scheduling policy is derived, which is a function of the local state information only. Third, simulation results show that the proposed algorithm can achieve better performance in terms of e2e delay and service delivery ratio compared to conventional schemes. Moreover, the convergence of the proposed scheduling algorithm is established through simulations.
The remainder of the paper is organized as follows. Section 2 provides the details of the system model and assumptions. Section 3 presents the CMDP problem formulation and discusses the general solution. In Section 4, we propose a distributed online scheduling algorithm using approximate MDP and stochastic learning. Section 5 shows the performance evaluation results obtained through simulations. Finally, Section 6 concludes the paper.

System Model and Assumptions
A relay-assisted HSR network architecture is shown in Figure 1. The cellular network deployed along the rail line can provide seamless coverage and data packet delivery. The base stations (BSs) are connected to the backbone network via wireline links. For simplicity, we assume that the bandwidth of the links from the backbone network to the BSs is sufficiently large that packets can be transmitted to the BSs with negligible delay. An RS with powerful antennas is installed on the top of the train to communicate with the BSs. The RS is further connected to an access point (AP), which users can access via wireless local area network (WLAN) technologies. Thus, the two-hop wireless link consists of the BS-RS link and the AP-users link. With this two-hop architecture, radio signals do not need to penetrate into the carriages, and thus the radio signal penetration loss problem is resolved.
Distributed content servers are deployed in order to offload data traffic from the backbone network [8]. When a service is requested by the passengers, the data packets can be fetched from the corresponding content server (CS). To make the analysis of such a network tractable, we assume each CS provides one type of service. Notice that this can be easily extended to the situation where each CS provides multiple types of services with one buffer allocated to each type of service in the CS, which will become clear in Section 2.2. In order to simplify the protocol design for HSR applications, erasure coding based service delivery is considered [7]; its advantage is that no retransmission scheme is required for transmission errors caused by the highly dynamic wireless channel.
Compared to traditional cellular networks, the deterministic train trajectory is a unique feature of HSR networks [7,9]. The train trajectory represents the location of a train at a specific time, and a BS provides service delivery if the train is under its coverage. Since the train moves on a predetermined rail line and the velocity is relatively steady, the train trajectory can be obtained in advance with high accuracy, so that the service packets can be delivered by a specific BS at a certain time. Therefore, this paper focuses on the delay-aware service scheduling problem regardless of which BS is used for service delivery.

Physical-Layer Model.
We consider a time-slotted system for downlink service transmission with slot period $T_s$. When the train moves along the railway, the BS and AP can transmit simultaneously without interference by operating on different frequency bands. For the communication in the BS-RS link, we consider the MAC frame structure proposed in [10], which is specifically designed for high-speed trains with a speed of up to 360 km/h. For the communication in the AP-users link, traditional WLAN standards, for example, IEEE 802.11a/b/g, are employed since passengers within the train are relatively stationary with respect to the AP.
Let $H(t) = H_1(t) \cup H_2(t)$ be the joint CSI, where $H_1(t) = \{H_{BS,k}(t), \forall k\}$ with the channel gain $H_{BS,k}$ in the BS-RS link and $H_2(t) = \{H_{AP,k}(t), \forall k\}$ with the channel gain $H_{AP,k}$ between the AP and the user requesting service $k$. The channel from the BS to the RS is assumed to follow the Rician distribution [11], while the channel between the AP and the users can be treated as a Rayleigh fading channel [3]. We assume that the channel gains remain constant during a slot and change across slots in an i.i.d. manner. In addition, strong channel coding is used to enhance transmission performance, and hence the maximum achievable data rate is given by the instantaneous mutual information. For the BS-RS link with bandwidth $B_1$, the maximum achievable data rate in bits per second is given by

$$R_{BS}(t) = B_1 \log_2\left(1 + \frac{P_{BS}\,|H_{BS,k}(t)|^2}{N_0}\right),$$

where $P_{BS}$ is the transmit power of the BS and $N_0$ is the noise power. Since all the services share the common channel in the BS-RS link, they have the same data rate $R_{BS}$. Likewise, for the AP-users link with bandwidth $B_2$, the maximum achievable data rate for service $k$ is

$$R_{AP,k}(t) = B_2 \log_2\left(1 + \frac{P_{AP}\,|H_{AP,k}(t)|^2}{N_0}\right),$$

where $P_{AP}$ is the transmit power of the AP.

Let $Q(t) = Q_1(t) \cup Q_2(t)$ be the joint QSI, where $Q_1(t) = \{Q_{CS,k}(t), \forall k\}$ and $Q_2(t) = \{Q_{RS,k}(t), \forall k\}$. Specifically, $Q_{CS,k}(t)$ and $Q_{RS,k}(t)$ denote the number of packets at the beginning of slot $t$ in the buffer of the $k$th CS and the $k$th buffer in the RS, respectively. These $K$ heterogeneous services have different packet arrival rates and QoS requirements. Data packets from the higher layer application arrive into the buffers and are queued until they are transmitted. The packet arrival process for each type of service is assumed to be i.i.d. across slots. Let $A_k(t)$ denote the number of packets arriving into the buffer of the $k$th CS at slot $t$. In general, $A_k(t)$ follows the truncated Poisson distribution $f_k(n)$ with average packet arrival rate $\lambda_k = E[A_k(t)]$ for service $k$. The distribution $f_k(n)$ is truncated at $N_k$, where $N_k$ is found assuming $f_k(N_k) \to 0$.
The average number of arriving packets at the $k$th CS is given by $\lambda_k(1 - P_k^d)$, where $P_k^d$ is the dropping probability. This equals the average number of packets received by the corresponding buffer in the RS since the two buffers are in tandem. In this paper, we assume sufficiently large buffers and a negligible dropping probability ($P_k^d \approx 0$).
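As a rough illustration of the physical-layer and arrival models above, the following Python sketch computes a Shannon-type rate and a truncated Poisson distribution. The function names, the squared-amplitude SNR form, and the renormalization of the truncated distribution are assumptions made for illustration; they are not taken from the paper.

```python
import math

def shannon_rate(bandwidth_hz, tx_power, gain_amplitude, noise_power):
    """Instantaneous achievable rate in bit/s (assumed Shannon form)."""
    snr = tx_power * gain_amplitude ** 2 / noise_power
    return bandwidth_hz * math.log2(1.0 + snr)

def truncated_poisson_pmf(rate, n_max):
    """Poisson weights on 0..n_max, renormalized to sum to one (assumed convention)."""
    weights = [rate ** n / math.factorial(n) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical numbers: 5 packets/slot on average, truncated far in the tail.
pmf = truncated_poisson_pmf(rate=5.0, n_max=30)
mean_arrivals = sum(n * p for n, p in enumerate(pmf))
```

When the truncation point lies far in the tail, the mean of the truncated distribution stays very close to the nominal arrival rate, which is consistent with the negligible dropping probability assumed in the text.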

MDP Model.
In this paper, the service scheduling problem for the two-hop link in HSR networks is formulated as an infinite-horizon average cost CMDP. To make the analysis of the CMDP problem tractable in the sequel, it is necessary to identify the elements of the MDP model in our scheduling problem. In general, an MDP model consists of five elements: decision epochs, states, actions, cost function, and state transition probability function. We describe these elements as follows.
The scheduling decisions for data packet delivery in the two-hop link have to be made slot by slot, and the slot instants are called decision epochs. Let $\mathcal{S}$ and $\mathcal{A}$ be the global state space and action space, respectively. $\mathcal{S} = \{S^1, S^2, \ldots, S^{|\mathcal{S}|}\} = \mathcal{Q} \times \mathcal{H}$, where $\mathcal{Q}$ is the global QSI state space and $\mathcal{H}$ is the global CSI state space. The global system state at slot $t$ is denoted by $S(t) = (H(t), Q(t))$. The action space is denoted by $\mathcal{A} = \mathcal{X} \times \mathcal{Y}$, where $\mathcal{X} = \{x_k \in \{0, 1\}, \forall k\}$ and $\mathcal{Y} = \{y_k \in \{0, 1\}, \forall k\}$. Here $x_k$ and $y_k$ are the scheduling actions for service $k$ in the BS-RS link and the AP-users link, respectively. A scheduling action is set to 1 if the corresponding service $k$ is scheduled and to 0 otherwise. Moreover, the scheduling actions should satisfy $\sum_k x_k(t) = 1$ and $\sum_k y_k(t) = 1$ at any slot $t$.
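The constraint that exactly one service is scheduled on each link per slot means the feasible action set can be enumerated directly. A minimal Python sketch, with the hypothetical helper name `feasible_actions`, is:

```python
from itertools import product

def feasible_actions(num_services):
    """All (x, y) pairs in which x and y each schedule exactly one service."""
    def one_hot(i):
        return tuple(1 if j == i else 0 for j in range(num_services))
    return [(one_hot(i), one_hot(j))
            for i, j in product(range(num_services), repeat=2)]

actions = feasible_actions(3)  # 3 services give 3 * 3 joint actions
```

The number of feasible joint actions therefore grows as the square of the number of services, which is modest compared with the growth of the joint queue and channel state space.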
Given a current state $S(t)$, the scheduling action is decided according to a stationary policy. A stationary scheduling policy $\Pi$ maps $\mathcal{S}$ to $\mathcal{A}$, that is, $\Pi : \mathcal{S} \to \mathcal{A}$. $\Pi$ is called feasible if the associated actions satisfy the constraints. We assume that the next state $S(t+1)$ depends only on the current state and scheduling actions and not on the previous states; hence the process $\{S(t)\}$ for a given control policy $\Pi$ is Markovian with the following state transition probability function:

$$\Pr[S(t+1) \mid S(t), \Pi(S(t))] = \Pr[H(t+1)]\,\Pr[Q(t+1) \mid S(t), \Pi(S(t))], \quad (4)$$

where the equality holds because of the independence of the channel state update processes and the queue state update processes. The queue state update processes at the CSs and the RS are given in (5) and (6), respectively. Specifically, the dynamics of the buffers for all CSs in (5) are controlled by the scheduling actions in the BS-RS link, and the dynamics of the buffers in the RS shown by (6) are controlled by the scheduling actions in the two-hop link. Given a feasible policy $\Pi$, the induced Markov chain $\{S(t)\}$ is ergodic and there exists a unique steady state distribution $\pi_\Pi$ [12]. By Little's law [13], for a sufficiently small dropping probability ($1 - P_k^d \approx 1$), the average time that a data packet of service $k$ spends in the e2e system is $(\bar{Q}_{CS,k} + \bar{Q}_{RS,k})/\lambda_k$, where the bar denotes the average queue length. Since the e2e delay in relay-assisted HSR networks is considered, the corresponding per-slot cost function is defined in (7).

Optimization Objective and Constraints.
Our goal is to find an optimal policy $\Pi^*$ so that the average e2e delay is minimized while satisfying the service delivery ratio constraints.
For any policy $\Pi$, the average e2e delay can be expressed as in (8), where $E^{\pi_\Pi}$ denotes the expectation with respect to the induced steady state distribution $\pi_\Pi$. The average e2e delay expressed by (8) is the average time that a data packet, taken over all services, spends in transit from the CSs to the users. In the MDP model, the average e2e delay can be regarded as the expected cost function. Similarly, to capture heterogeneous delivery ratio requirements for different types of services, the delivery ratio constraints are given in (9), where $U_k(t) = \min\{Q_{RS,k}(t), R_{AP,k}(t)\}$ is the maximum number of packets that can be successfully received for service $k$ at slot $t$, and $\eta_k \in (0, 1]$ denotes the delivery ratio requirement of service $k$, reflecting that the average delivery rate is proportional to the average arrival rate.
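To make the tandem-buffer model and the Little's-law cost concrete, the following Python sketch advances the queues by one slot and evaluates a per-slot cost. The exact update rules (5) and (6) and the cost (7) are not reproduced here; the dynamics below, the function names, and the use of per-slot link capacities expressed in packets are assumptions for illustration only.

```python
def step_queues(q_cs, q_rs, x, y, cap_bs, cap_ap, arrivals):
    """One-slot update of the CS and RS buffers under assumed tandem dynamics."""
    new_cs, new_rs = [], []
    for k in range(len(q_cs)):
        moved = min(q_cs[k], cap_bs) if x[k] else 0      # CS -> RS on the BS-RS link
        sent = min(q_rs[k], cap_ap[k]) if y[k] else 0    # RS -> user on the AP-users link
        new_cs.append(q_cs[k] - moved + arrivals[k])
        new_rs.append(q_rs[k] - sent + moved)
    return new_cs, new_rs

def per_slot_cost(q_cs, q_rs, arrival_rates):
    """Sum over services of total backlog divided by arrival rate (Little's law)."""
    return sum((a + b) / lam for a, b, lam in zip(q_cs, q_rs, arrival_rates))

# Hypothetical two-service example: service 0 uses hop 1, service 1 uses hop 2.
next_cs, next_rs = step_queues([4, 2], [1, 3], [1, 0], [0, 1], 3, [2, 2], [1, 1])
cost = per_slot_cost(next_cs, next_rs, [2.0, 4.0])
```

Averaging such a per-slot cost over a long run gives an empirical estimate of the average e2e delay that the policy is meant to minimize.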

CMDP Problem Formulation
In this section, we formulate the delay-aware scheduling problem as an infinite-horizon average cost CMDP and discuss the general solution. The objective is to choose an optimal scheduling policy $\Pi^*$ so that the expected cost function (8) is minimized subject to the service delivery ratio constraints (9), which is expressed by (10a) and (10b). This problem is an infinite-horizon average cost CMDP with system state space $\mathcal{S}$, action space $\mathcal{A}$, the state transition probability (4), and the per-slot cost function (7).

Lagrangian Approach and Unconstrained MDP.
The above CMDP problem can be converted into an unconstrained MDP by Lagrange theory. As demonstrated in [14, Theorem 12.7], the optimal cost and policy of the CMDP can be obtained by an unconstrained MDP and the Lagrangian approach. Firstly, the Lagrangian of (10a) and (10b) is expressed as $L(\Pi, \gamma)$, the average of the per-slot Lagrangian cost $g(S(t), \Pi(S(t)), \gamma)$, where $\gamma = [\gamma_1, \gamma_2, \ldots, \gamma_K]^T$ is the Lagrange multiplier (LM) vector. Hence, the corresponding unconstrained MDP is given by

$$G(\gamma) = \min_{\Pi} L(\Pi, \gamma), \quad (13)$$

where $G(\gamma)$ gives the Lagrange dual function. As shown in [15, Theorem 2.1], there exists an LM vector $\gamma \succeq 0$ such that $\Pi^*$ minimizes $L(\Pi, \gamma)$ and the saddle point condition holds. Given an LM vector, the optimal policy $\Pi^*$ for (13) can be obtained by solving the associated optimality equation [12,16] as follows:

$$\theta + V(S^i) = \min_{\Pi(S^i)} \Big\{ g(S^i, \Pi(S^i), \gamma) + \sum_{S^j} \Pr[S^j \mid S^i, \Pi(S^i)]\, V(S^j) \Big\}, \quad \forall S^i \in \mathcal{S}, \quad (14)$$

where $V(S)$ is the value function of the MDP and $\Pr[S^j \mid S^i, \Pi(S^i)]$ is the state transition probability, which can be obtained from (4). $\theta = \min_{\Pi} L(\Pi, \gamma)$ is the optimal average cost per stage and $\Pi^*(S^i)$ is the optimal policy minimizing the right-hand side of (14) at any state $S^i$. As shown in [12], the value iteration algorithm is an efficient and stable iterative algorithm for solving the optimality equation; it operates by calculating successive approximations to the value function $V(S)$ with computational complexity $O(|\mathcal{A}||\mathcal{S}|^2)$ per iteration.
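For readers unfamiliar with average-cost optimality equations, the following Python sketch applies relative value iteration, a standard variant of value iteration for the average-cost criterion, to a toy two-state MDP. The toy transition matrices, costs, and function name are hypothetical; the sketch is not the authors' implementation and ignores the Lagrangian terms.

```python
def relative_value_iteration(P, g, num_iters=500, ref_state=0):
    """P[a][s][s2]: transition probabilities; g[s][a]: per-stage costs.

    Returns (theta, V, policy): average cost, relative values, greedy policy.
    """
    n_states, n_actions = len(g), len(g[0])

    def q_value(s, a, V):
        return g[s][a] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))

    V = [0.0] * n_states
    theta = 0.0
    for _ in range(num_iters):
        TV = [min(q_value(s, a, V) for a in range(n_actions))
              for s in range(n_states)]
        theta = TV[ref_state]           # normalize so that V[ref_state] stays 0
        V = [tv - theta for tv in TV]
    policy = [min(range(n_actions), key=lambda a: q_value(s, a, V))
              for s in range(n_states)]
    return theta, V, policy

# Toy example: in state 1, action 1 costs more now but returns to state 0 surely.
P = [
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.5, 0.5], [1.0, 0.0]],   # action 1
]
g = [[0.0, 0.0], [2.0, 2.5]]
theta, V, policy = relative_value_iteration(P, g)
```

Each sweep evaluates every action in every state against every successor state, which is where the quadratic dependence on the number of states comes from.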

The Reduced State Optimality Equation for CMDP.
The optimality equation (14) is very complicated to solve because of the huge cardinality of the system state space. In general, the channel state transition is often unavailable a priori in the HSR environment. However, the statistical characteristics of the channel distribution can be obtained, which motivates us to simplify the optimality equation (14) in this subsection. By taking the expectation over the channel state $H$, the reduced state optimality equation is defined in (15), where the transition term is the conditional average queue state transition probability. Instead of working on the global system state, the reduced state optimality equation (15) depends only on the QSI. As shown in [17], the scheduling policy obtained by solving the optimality equation in (14) is the same as that obtained by solving the reduced state optimality equation.

Distributed Online Scheduling Algorithm
In certain cases of practical interest, there are still three difficulties in adopting the optimal scheduling policy presented above. Firstly, solving (15) has exponential complexity. Secondly, the excessive latency caused by multiple iterations may not be acceptable for real-time services. Finally, the optimal policy obtained by solving (15) is a function of the global QSI, which is not suitable for distributed implementation. To overcome these difficulties, we adopt the key theories of MDP and stochastic approximation and propose a distributed online scheduling algorithm, which illustrates how the techniques of approximate MDP and stochastic learning can facilitate distributed implementation with low complexity.

Approximate MDP.
To reduce the size of the state space and decentralize the service scheduling, we approximate $\tilde{V}(Q)$ in (15) by the linear approximation of the global value function given in (16), where $\tilde{V}(Q)$ is the global value function and $\tilde{V}_{CS,k}(\cdot)$ and $\tilde{V}_{RS,k}(\cdot)$ are regarded as per-node value functions at the buffer of the $k$th CS and the $k$th buffer in the RS, respectively. We note that the dimension of the value function is greatly reduced through the linear approximation. Moreover, the per-node value functions can only satisfy the optimality equation (14) in some particular states $Q_{par}$. These are the states, indexed by $k = 1, \ldots, K$ and $q = 1, \ldots, N_Q$, in which either $Q_1$ has $Q_{CS,k} = q$ and $Q_{CS,k'} = 0$ for all $k' \neq k$, or $Q_2$ has $Q_{RS,k} = q$ and $Q_{RS,k'} = 0$ for all $k' \neq k$.
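The saving from a linear approximation can be seen by comparing table sizes. The Python sketch below assumes, for illustration, that the global value is the plain sum of per-node values stored as lookup tables; the function names and the toy tables are hypothetical.

```python
def approx_global_value(v_cs, v_rs, q_cs, q_rs):
    """Sum of per-node values; v_cs[k][q] and v_rs[k][q] are per-node tables."""
    return sum(v_cs[k][q_cs[k]] + v_rs[k][q_rs[k]] for k in range(len(q_cs)))

def table_sizes(num_services, buffer_size):
    """Entries needed for per-node tables versus one table over the joint QSI."""
    per_node = 2 * num_services * (buffer_size + 1)
    joint = (buffer_size + 1) ** (2 * num_services)
    return per_node, joint

# Toy tables for two services with buffers holding at most two packets.
v_cs = [[0, 1, 4], [0, 2, 8]]
v_rs = [[0, 1, 2], [0, 3, 6]]
value = approx_global_value(v_cs, v_rs, q_cs=[2, 1], q_rs=[1, 0])
sizes = table_sizes(num_services=2, buffer_size=3)
```

The per-node storage grows linearly in the number of services, whereas a table over the joint queue state grows exponentially.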

Distributed Scheduling Policy under MDP Approximation.
Using the linear approximation in (16), we derive a distributed scheduling policy depending on the local CSI and QSI as well as the per-node value function of each buffer at the RS and CSs.
From the above derivation, we can obtain the optimal scheduling scheme by solving (17b). Since the solution for the first link depends on the solution for the second link and vice versa, the scheduling decisions in the two-hop link are coupled. One feasible solution can be obtained by enumeration, but it cannot be distributed. In order to obtain the scheduling policy in a distributed manner, motivated by the observation in [18], we split the optimization problem into two stages, which correspond to the BS-RS link and the AP-users link, respectively. In the first stage, the CSs solve a local MDP problem to determine the scheduling actions in the BS-RS link. In the second stage, using the scheduling results of the first stage, the RS determines the scheduling actions in the AP-users link such that the e2e delay is minimized. Specifically, these two stages are described as follows.

Distributed Scheduling Policy in First Stage.
With the given per-node value functions $\{\tilde{V}_{CS,k}(\cdot), \forall k\}$ and the local queue state set $Q_1^i = \{Q_{CS,k}^i, \forall k\}$, the scheduling policy in the first stage can be obtained by solving the local MDP problem (18a) and (18b), whose per-service metric $F_1(k)$ is defined in (19). Notice that the derivation in (18a) and (18b) is similar to that in (17a) and (17b). The local MDP problem (18a) and (18b) can be solved by the distributed computation $k_1^* = \arg\min_k \{F_1(k)\}$, which implies $x_{k_1^*} = 1$. As shown in (19), the probability density functions (PDFs) of the packet arrival processes must be known a priori to evaluate $F_1(k)$. However, the PDFs are often unavailable a priori in HSR communication systems; hence, it is necessary to develop an online scheduling algorithm that does not require known PDFs. Fortunately, by the concept of opportunistically minimizing an expectation [19], an online algorithm can be developed by removing the expectation operator. Given the observed packet arrivals $A_k$ for all services, the scheduling policy can be obtained by $k_1^* = \arg\min_k \{\hat{F}_1(k)\}$, where $\hat{F}_1(k) = \tilde{V}_{CS,k}(Q_{CS,k}^i - R_{BS} + A_k) - \tilde{V}_{CS,k}(Q_{CS,k}^i + A_k)$.
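The first-stage rule amounts to picking the service whose per-node value drops the most if it is served. A minimal Python sketch of this comparison follows; the quadratic value functions, the clipping of the served backlog at zero, and the function name are assumptions added for illustration, and the link capacity is taken in packets per slot.

```python
def first_stage_choice(value_fns, q_cs, arrivals, cap_bs):
    """Return (k_star, x): the scheduled service and the one-hot action vector."""
    def metric(k):
        served = max(q_cs[k] - cap_bs, 0) + arrivals[k]   # backlog if k is served
        unserved = q_cs[k] + arrivals[k]                  # backlog if k is skipped
        return value_fns[k](served) - value_fns[k](unserved)

    k_star = min(range(len(q_cs)), key=metric)
    x = [1 if k == k_star else 0 for k in range(len(q_cs))]
    return k_star, x

# Hypothetical convex per-node value functions (not learned values).
value_fns = [lambda q: q * q, lambda q: q * q]
k_star, x = first_stage_choice(value_fns, q_cs=[10, 4], arrivals=[1, 1], cap_bs=3)
```

With convex value functions, the longer queue yields the larger reduction and is scheduled; each CS only needs its own queue length, its observed arrivals, and the shared link capacity to evaluate its metric.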

Distributed Scheduling Policy in Second Stage.
By substituting the results from the first stage into (17a) and (17b), the scheduling policy in the second stage can be obtained by solving problem (20) with the associated per-service metric $F_2(k)$. Similarly, given $\{\tilde{V}_{RS,k}(\cdot), \forall k\}$ and $Q_2^i = \{Q_{RS,k}^i, \forall k\}$, we can solve (20) by the distributed computation $k_2^* = \arg\min_k \{F_2(k)\}$, which implies $y_{k_2^*} = 1$.

Distributed Online Scheduling Algorithm.
Based on the distributed scheduling policy in two stages, we propose a distributed online scheduling algorithm using stochastic learning, which determines the scheduling actions and the per-node value functions as well as the LMs. As the train moves from the origin station to the terminal, the online algorithm allocates the network resource to multiple services slot by slot. The detailed steps of the proposed algorithm are given as follows.
Step 2 (scheduling decisions). When the train is moving, each CS and the RS decide the scheduling actions in the two stages separately at the beginning of slot $t$. After obtaining $H_1(t)$, the link capacity $R_{BS}(t)$ can be calculated. Then, based on the local state information $A_k(t)$, $Q_{CS,k}(t)$, and $\{\tilde{V}_{CS,k}(\cdot)\}$, each CS calculates $\hat{F}_1(k)$, which is exchanged to determine the scheduling actions in the first stage. Similarly, using the scheduling results of the first stage and based on the local state information $Q_{RS,k}(t)$, $H_2(t)$, and $\{\tilde{V}_{RS,k}(\cdot)\}$, the RS determines the scheduling actions in the second stage.
The parameters are updated by the subgradient method as described in (22a), (22b), and (22c), where $\{\epsilon_t^V\}$ and $\{\epsilon_t^\gamma\}$ are step size sequences satisfying the standard stochastic approximation conditions [17]: each sequence is nonnegative, sums to infinity, and is square summable. Furthermore, to enforce the convergence of the LMs and per-node value functions, $\epsilon_t^\gamma$ and $\epsilon_t^V$ should also satisfy $\lim_{t \to \infty} \epsilon_t^\gamma / \epsilon_t^V = 0$, for example, $\epsilon_t^\gamma = 1/(1 + t \log t)$ together with a more slowly decaying $\epsilon_t^V$. As shown in [20], the key idea of the iteration convergence is as follows. In (22a), (22b), and (22c), the updates of the per-node value functions $\tilde{V}_{CS,k}$ and $\tilde{V}_{RS,k}$ as well as the LMs $\gamma_k$ are performed using different step sizes. The update rate of the value functions is faster than that of the LMs. From the perspective of the LMs, the value functions approximately converge to the optimal values corresponding to the current LM values, because they are updated on a faster time scale. Also, from the perspective of the value functions, the LMs appear to be almost constant. This two-time-scale update ensures that the value functions and the LMs converge to the optimal solution.
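The two-time-scale idea can be illustrated with a pair of step size sequences whose ratio vanishes. In the Python sketch below, the value-function step size `1 / t**0.6` and the projected multiplier update are hypothetical choices made for illustration; they are not the update rules (22a), (22b), and (22c).

```python
import math

def step_value(t):
    """Faster time scale for value functions (hypothetical choice), t >= 1."""
    return 1.0 / t ** 0.6

def step_multiplier(t):
    """Slower time scale for the Lagrange multipliers, t >= 1."""
    return 1.0 / (1.0 + t * math.log(t))

def update_multiplier(gamma, required, delivered, t):
    """Projected subgradient step: raise gamma when delivery falls short."""
    return max(gamma + step_multiplier(t) * (required - delivered), 0.0)

def ratio(t):
    return step_multiplier(t) / step_value(t)
```

Because the ratio of the two step sizes tends to zero, the multipliers look nearly frozen from the viewpoint of the value-function updates, which is the separation of time scales described above.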

Implementation Issues.
The distributed online scheduling algorithm runs in three steps as the train moves from the origin station to the terminal. At each decision epoch, the scheduling actions in both stages are decided after observing the local CSI and QSI. It is worth emphasizing that the proposed algorithm is different from the iterative algorithms used in static optimization problems because data packets can be delivered during the iteration steps.
As an alternative, a table look-up method can be used for service scheduling in HSR networks. Specifically, once the scheduling policy is obtained by the distributed online scheduling algorithm or other algorithms, it can be stored in a table format. Each entry of the table represents the scheduling action for a given global system state including CSI and QSI. At each decision epoch, after observing the current state, the network controller looks up the table to find the corresponding scheduling action and then executes the scheduling decision. The table look-up method is effective and has low computational complexity.
Furthermore, periodic updates in the proposed algorithm are necessary. After the proposed algorithm converges, the updates can be performed at longer intervals or at random slots instead of at every slot. Since all the channel and queue states are realized infinitely many times during the trip, the periodic updates can ensure that the proposed algorithm also converges to the optimal solution. If the table look-up method is used, the table storing the scheduling policy can also be updated with each update of the proposed algorithm.

Simulation Results and Discussions
In this section, we implement the proposed scheduling algorithms using MATLAB and present simulation results to illustrate the performance of the algorithm.

Simulation Setup.
For the purpose of comparison, we evaluate three related scheduling schemes as reference benchmarks. The first one is the traditional round-robin (RR) scheme, which schedules services in a predetermined order. At time slot $t$, the $(t \bmod (K + 1))$th service is chosen for the two links. The second one is the greedy scheme, where service scheduling is done in a greedy manner. Specifically, $k_1^* = \arg\max_k \{Q_{CS,k}(t)\}$ and $k_2^* = \arg\max_k \{\min(Q_{RS,k}(t), R_{AP,k}(t))\}$. The third one is the heuristic packet scheduling algorithm proposed in [21] and used in the two links, where the services are scheduled for transmission depending on transmission rate, packet utility, and proportional fairness.
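The round-robin and greedy benchmarks are simple enough to state in a few lines. The Python sketch below uses zero-based service indices and hypothetical function names; the second-hop capacities are assumed to be expressed in packets per slot.

```python
def round_robin(t, num_services):
    """Service chosen on both links at slot t (zero-based cyclic order)."""
    return t % num_services

def greedy_first_hop(q_cs):
    """Serve the CS buffer with the largest backlog."""
    return max(range(len(q_cs)), key=lambda k: q_cs[k])

def greedy_second_hop(q_rs, cap_ap):
    """Serve the RS buffer that can deliver the most packets in this slot."""
    return max(range(len(q_rs)), key=lambda k: min(q_rs[k], cap_ap[k]))
```

Neither baseline looks at delivery ratio requirements, and the first-hop greedy rule ignores the channel entirely, which is the contrast drawn later in the simulation discussion.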
To illustrate the performance of the scheduling algorithms, suitable parameters must be set in the simulations. We use a typical setting for HSR communication systems [10], with slot period $T_s$ = 53 μs and packet size $L$ = 240 bits. The wireless communication in the two-hop link is established on a carrier frequency of 2.4 GHz with bandwidth $B_1 = B_2 = 10$ MHz. The transmit powers of the BS and the AP are $P_{BS}$ = 47 dBm and $P_{AP}$ = 14 dBm, respectively. The Rayleigh fading channel state $H_{AP,k}(t)$ is sampled from the probability density function $f(h) = (h/\sigma^2)\exp(-h^2/(2\sigma^2))$, $h \geq 0$, with $\sigma$ = −2 dB. For the Rician fading channel state $H_{BS,k}(t)$, the Rician factor is 6 dB according to the WINNER II project measurement results [22]. The maximum size $N_Q$ of all buffers is set to 50 packets so that no packet is dropped from the buffers. A single simulation runs the algorithms for 400 slots, and the results are averaged over 100 simulation runs.

Simulation Results.
In our proposed scheduling algorithm, the objective is to minimize the expected cost function defined in (8), which includes the e2e delay. In the simulation results, the average e2e delay and service satisfaction are used as metrics to show the performance improvement. In addition, the convergence of the proposed distributed online scheduling algorithm is established through simulations.
Figure 2 compares the delay performance of the four scheduling schemes for different numbers of requested services. We set the delivery ratio requirement $\eta_k = 0.8$ for all services with an equal average packet arrival rate of 5 packets/slot. It can be observed that the proposed distributed scheduling algorithm achieves significant performance gains in average e2e delay over the other schemes. This illustrates the advantage of the proposed algorithm with its distributed scheduling policy, which effectively reduces the value of the expected cost function in two-hop HSR networks. Furthermore, we can see that, as the number of requested services increases, the average e2e delay per service grows for all scheduling schemes. This can be explained as follows. Since the capacity of the wireless channel is fixed in general, when the number of requested services increases, there are more data packets to be transmitted and fewer scheduling opportunities are given to any particular service; hence the average e2e delay per service becomes larger. Therefore, to reduce the average delay for service delivery and satisfy the QoS requirements, multi-input multi-output (MIMO) antennas can be deployed to improve the capacity of the wireless channel in HSR communication systems.
Figure 3 presents the delay performance with respect to the average packet arrival rate for the four scheduling schemes. In the simulation, 6 types of services are supported and $\eta_k = 0.8$ for each service. The average packet arrival rate for all services is varied from 1 to 7 in order to prevent buffer overflow. From the figure, we can see that the average e2e delay per service under the proposed scheduling algorithm is smaller than under the other schemes regardless of the packet arrival rate. In addition, the average delay increases quickly as the average packet arrival rate increases, which can be explained by Little's law. For video services on the train, the average packet arrival rate is very high, so it is necessary to provide a large buffer or improve the wireless transmission rate so as to prevent buffer overflow.
Figure 4 indicates the service delivery performance under the four scheduling schemes with $K = 6$ service types. The average packet arrival rates for all services are equal to 6 packets/slot. Based on the delivery ratio constraints (9), we define the satisfaction parameter $\rho_k$ of service $k$ as the ratio of the average throughput $\bar{U}_k$ to $\eta_k \lambda_k$; that is, $\rho_k = \bar{U}_k / (\eta_k \lambda_k)$. A larger $\rho_k$ represents higher satisfaction of service $k$, and $\rho_k \geq 1$ implies that the delivery ratio constraint for service $k$ is satisfied. Let the delivery ratio requirements from service 1 to service $K$ be 0.9, 0.9, 0.8, 0.8, 0.7, and 0.7. For service $k$, the rightmost bar shows the parameter $\rho_k$ for the proposed scheduling algorithm, and the remaining bars represent the parameter $\rho_k$ for the other three schemes. Simulation results show that the proposed distributed two-stage scheduling algorithm achieves better service delivery performance than the other schemes. This can be explained as follows. The RR scheme is fair in terms of scheduling opportunities but is not channel/queue aware; the greedy scheme schedules the service with more packets in its buffer but does not consider the channel condition; and the heuristic scheduling algorithm is based on transmission rate, packet utility, and proportional fairness and does not consider the heterogeneous delivery ratio requirements of different services. The same is true for both stages. Thus, the proposed algorithm, which is channel/queue aware and considers the service delivery ratio requirements, is efficient with respect to the delivery ratio constraints in the long term.
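The satisfaction parameter is a simple ratio and can be checked per service as follows. The throughput values in this Python sketch are made up for illustration and are not simulation results from the paper; the function name is hypothetical.

```python
def satisfaction(avg_throughput, ratio_requirement, arrival_rate):
    """Average throughput divided by (delivery ratio requirement * arrival rate)."""
    return avg_throughput / (ratio_requirement * arrival_rate)

# Hypothetical averages (packets/slot) for two services with 6 packets/slot arrivals.
requirements = [0.9, 0.7]
throughputs = [5.4, 3.9]
scores = [satisfaction(u, eta, 6.0) for u, eta in zip(throughputs, requirements)]
satisfied = [s >= 1.0 - 1e-12 for s in scores]  # small tolerance for rounding
```

A score of at least one means the delivery ratio constraint of that service is met; in this made-up example the first service meets its requirement exactly and the second falls short.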
Figure 5 illustrates the convergence property of the proposed distributed online scheduling algorithm. We plot the value functions of the CS for the first type of service versus the scheduling slot index. It can be seen that the distributed algorithm converges quite fast and the values are extremely close to the final converged results after 200 iterations. Similar results can be obtained for the per-node value functions at the other CSs and the RS. By comparing different curves, we can see that the per-node value function is increasing in the queue backlog. There are two reasons for this property. First, the value of the per-slot cost function (7) grows linearly with the queue backlog. Second, the value function is positively correlated with the per-slot cost function based on (15). Furthermore, unlike the iterations in static optimization problems, the proposed scheduling algorithm is online, implying that data packets are delivered during the iteration steps.

Conclusion
Providing passengers with multimedia services is one of the most important applications in HSR communication systems. This paper investigated the delay-aware downlink service scheduling problem with stochastic packet arrivals and QoS requirements in relay-assisted HSR communication systems. We elaborated on the theory of MDP and illustrated how approximate MDP and stochastic learning can help in obtaining low-complexity and distributed scheduling solutions. Simulation results show that the proposed algorithm outperforms other existing schemes in terms of average e2e delay and service delivery performance. Furthermore, the convergence of the proposed distributed online scheduling algorithm is shown by simulations. In future work, we will investigate the dynamic stochastic scheduling problem in HSR networks using the stochastic network optimization approach. In addition, since delay awareness is considered, motivated by [23], the design of a dynamic output feedback controller is necessary for multimedia service transmission in HSR communication systems.

Figure 2: Average e2e delay per service versus the number of services.

Figure 5: Illustration of the convergence of the proposed algorithm.