Learning for Metro Systems Energy Optimization under Random Disturbances

Considering that uncertain dwell disturbances often occur at metro stations, researchers have proposed many methods for solving the train timetable rescheduling (TTR) problem. This paper proposes a Modified Genetic Algorithm-Gate Recurrent Unit (MGA-GRU) method, a real-time TTR method based on deep learning. The proposed method uses the Gate Recurrent Unit (GRU) network as the decision network and uses the results produced by the Modified Genetic Algorithm (MGA) as its training set. After random disturbances occur, a well-trained decision network can provide effective solutions in real time that optimize the net traction energy consumption of trains in metro systems. Based on the Shanghai Metro Line One (SML1) pilot network, this paper establishes a comprehensive model of the metro system as a training and testing environment to verify the energy-saving effect and real-time performance of the proposed method in solving the TTR problem. The experimental results show that in the two-train, three-train, and five-train metro systems, the MGA-GRU method saves on average 4.45%, 6.16%, and 7.19% of energy, respectively, compared with taking no action, while the average decision time is only 0.15 s, 0.27 s, and 0.33 s.


Introduction
Compared with ground transportation such as buses and taxis, urban metro systems have achieved rapid development worldwide due to the advantages of no traffic jams, large capacity, and high safety [1]. Although metro systems are energy-efficient compared with other ground vehicles, they still consume a lot of energy [2,3]. Due to problems such as rising energy prices and environmental pollution, in recent years, reducing the net traction energy consumption of trains in metro systems by studying and optimizing the train timetable has become an important research topic [4,5].
In metro systems, a braking train can feed regenerated braking energy back into the system [6]. Regenerative braking energy (RBE) can be reused by trains that are accelerating at the same time, and it can also be stored in energy storage devices such as batteries [7], supercapacitors [8], and flywheels [9]. Otherwise, RBE must be dissipated by resistors as thermal energy to prevent the train voltage from exceeding the safety threshold [10,11]. The timetable of a metro system can be either predetermined offline or changed dynamically in real time. By designing a suitable timetable, accelerating and braking trains can be better synchronized to make better use of RBE [12].
Although researchers have conducted extensive research on train timetabling over the past few decades, trains in metro systems are still often subject to unexpected disturbances, such as sudden increases in passenger flow, unexpected accidents, and unplanned stops [13,14]. To address this kind of problem, researchers have proposed many train timetable rescheduling (TTR) methods [15][16][17][18][19].
Previous studies on the TTR problem have focused on reducing the delay caused by disturbances and fall into two categories: minimizing the total delay time of passengers [20] and minimizing the total delay time of all trains [21]. Šemrov et al. [22] introduced a real-time TTR method based on Q-learning and evaluated it in a large number of experiments on the real-world railway network in Slovenia.
The experimental results show that the Q-learning solutions are at least equivalent, and generally superior, to those of simple first-in-first-out (FIFO) and random-walk methods that do not rely on learning agents. Different from these two types of traditional methods, the TTR method studied in this paper reschedules the train timetable after disturbances occur in order to reduce traction energy consumption. Hou et al. [23] developed a mixed-integer programming (MIP) model for a metro train timetable rescheduling problem that jointly optimizes the total train delay, the number of stranded passengers, and the energy consumption of trains. Zhao et al. [24] implemented three search methods, namely enhanced brute force (EBF), ant colony optimization (ACO), and Genetic Algorithm (GA), to minimize energy consumption and delay after a disturbance. Their results show that all three methods can find the best or near-best train trajectories and driving styles to reduce energy consumption or to improve safety and passenger comfort. Gong et al. [25] proposed a Compensational Driving Strategy Algorithm (CDSA) that, after a disturbance occurs, restores the disturbed train to the original optimal timetable by shortening its travel time in the next section. The results show that using CDSA after a disturbance saves 1.86% of energy on average compared with not using it.
However, the optimization methods implemented by Zhao et al. (EBF, ACO, and GA) are not suitable for solving the TTR problem in real time because of their long calculation time. Moreover, the CDSA proposed by Gong et al. only rearranges the coasting speed of the disturbed train, without adjusting the coasting speeds of the other trains or the dwell time of any train. To address these problems, this paper proposes a TTR method based on deep learning, called Modified Genetic Algorithm-Gate Recurrent Unit (MGA-GRU), which combines the modified Genetic Algorithm (MGA) with the Gate Recurrent Unit (GRU) network.
Many methods based on a general GA have been proposed to solve scheduling and optimization problems [26][27][28][29], and the corresponding experimental results show that these methods can find high-quality solutions for large-scale cases. GRU, in turn, has been applied to problems with a time-series dimension [30,31], where the experimental results show that it can extract richer and more complex information from sequences.
Unlike EBF, ACO, and GA, MGA-GRU can reschedule the timetable in real time after random disturbances occur. Unlike CDSA, which only rearranges the coasting speed of the disturbed train, MGA-GRU rearranges the coasting speeds and dwell times of all trains in the metro network in real time after disturbances occur, thereby achieving a better energy-saving effect. The remainder of this paper is organized as follows. Section 2 builds three models based on Shanghai Metro Line One (SML1). Section 3 introduces the MGA-GRU method for solving the TTR problem in real time after a disturbance occurs. In Section 4, four experiments based on the SML1 pilot network are conducted to verify the energy-saving effect and real-time performance of the proposed method. Section 5 concludes this paper.

Modeling
In this section, three models are proposed to formulate the metro system: time model, mechanical model, and power model.
For a better understanding of this paper, the assumptions, decision variables, and parameters are first introduced.

Assumptions
(1) The distance between two adjacent stations of SML1 is relatively short. According to the actual operation of trains on SML1 and the description of Su et al. [32], each train adopts a single-cycle acceleration-coasting-braking strategy instead of repeated acceleration and braking.

Time Model
The time model defines the departure instant t_de, travel time t_tr, and dwell time t_dw of each train at each station [33]. The starting station is defined as station no. 1, and the instant when train no. 1 leaves the starting station is defined as time 0. Trains leave the starting station at equal intervals.
If a disturbance occurs to train no. m at station no. n, the corresponding dwell time changes from t_dw^{m,n} to t_dw^{m,n} + ε, and the departure instant of train no. m at station no. n is shifted by ε accordingly. According to the assumptions above, only one disturbance occurs during each entire test procedure. Therefore, t_total^{M,N} denotes the instant when the last train, no. M, arrives at the terminal, station no. N.
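The way a dwell disturbance shifts departure instants can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's time model: indices are 0-based, travel[m][n] is taken to be the run of train m from station n to station n + 1, no rescheduling is applied (so the shift ε carries over to every later departure of the disturbed train), and the function name departure_instants is hypothetical.

```python
def departure_instants(headway, travel, dwell, disturbance=None):
    """Propagate departure instants along a line (hypothetical indexing).

    headway: equal interval (s) between departures at the starting station.
    travel[m][n]: run time (s) of train m from station n to station n + 1.
    dwell[m][n]: planned dwell time (s) of train m at station n; only the
        intermediate stations 1..N-2 are used.
    disturbance: optional tuple (m0, n0, eps) adding eps seconds (may be
        negative) to the dwell of train m0 at intermediate station n0.
    Returns (dep, t_total): dep[m][n] is the departure instant of train m at
    station n, and t_total is when the last train reaches the terminal.
    """
    num_trains, num_stations = len(dwell), len(dwell[0])
    dep = [[0.0] * (num_stations - 1) for _ in range(num_trains)]
    t_total = 0.0
    for m in range(num_trains):
        t = m * headway  # train m leaves the starting station at equal intervals
        dep[m][0] = t
        for n in range(1, num_stations):
            t += travel[m][n - 1]  # arrival at station n
            if n == num_stations - 1:
                t_total = t  # overwritten until the last train is processed
                break
            eps = disturbance[2] if disturbance and disturbance[:2] == (m, n) else 0.0
            t += dwell[m][n] + eps  # (possibly disturbed) dwell, then depart
            dep[m][n] = t
    return dep, t_total
```

For example, with two trains, four stations, and a 120 s headway, a disturbance tuple such as (0, 1, 13.45) moves every later departure of the first train by 13.45 s while leaving the second train untouched.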

Mechanics Model
According to the assumption above, each train adopts a single-cycle acceleration-coasting-braking strategy instead of repeated acceleration and braking [25]. The traction force F_T, the resistance force F_R, the braking force F_B, and the other forces are all expressed in newtons (N).
In the acceleration phase, F_B = 0, and the traction force F_T depends on the speed v. When the speed is lower than 10 m/s, the train is in a constant-torque traction state and its acceleration is a fixed value. When the speed exceeds 10 m/s, the train switches to a constant-power traction state, in which the traction power is fixed and the traction force is inversely proportional to the speed.
In the coasting phase, F_B = 0 and F_T = 0, and the relationship between F_R and v follows the Davis equation [34]. In the braking phase, F_T = 0, and the braking force F_B depends on v: when the speed is higher than 18.056 m/s, the braking force is inversely proportional to the speed; when the speed drops to 18.056 m/s or below, the deceleration is a fixed value.
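The three phases can be sketched as piecewise force curves. Only the two switching speeds (10 m/s and 18.056 m/s) come from the text; the mass, the maximum forces, and the Davis coefficients are hypothetical placeholders, and the net acceleration uses the standard longitudinal model M·a = F_T − F_B − F_R (ignoring gradients and rotating mass), which is an assumption rather than the paper's exact equation. Because the sketch holds the force constant at low speed, the net acceleration is only approximately fixed, since resistance still varies with v.

```python
V_TRACTION_SWITCH = 10.0  # m/s: constant torque below, constant power above (from the text)
V_BRAKE_SWITCH = 18.056   # m/s (about 65 km/h): constant braking force below (from the text)

# Hypothetical train parameters, for illustration only.
MASS_KG = 3.0e5
F_T_MAX = 3.0e5              # N, maximum traction force
F_B_MAX = 2.5e5              # N, maximum braking force
DAVIS = (5.0e3, 100.0, 8.0)  # Davis coefficients A (N), B (N per m/s), C (N per (m/s)^2)


def traction_force(v):
    if v <= V_TRACTION_SWITCH:
        return F_T_MAX                      # constant-torque region
    return F_T_MAX * V_TRACTION_SWITCH / v  # constant power P = F_T_MAX * 10 m/s


def braking_force(v):
    if v <= V_BRAKE_SWITCH:
        return F_B_MAX                   # constant braking force region
    return F_B_MAX * V_BRAKE_SWITCH / v  # force inversely proportional to speed


def davis_resistance(v):
    a, b, c = DAVIS
    return a + b * v + c * v * v  # F_R = A + B v + C v^2


def acceleration(v, phase):
    """Net acceleration (m/s^2) in the 'accelerate', 'coast', or 'brake' phase."""
    if phase not in ("accelerate", "coast", "brake"):
        raise ValueError(f"unknown phase: {phase}")
    f_t = traction_force(v) if phase == "accelerate" else 0.0  # F_T = 0 unless accelerating
    f_b = braking_force(v) if phase == "brake" else 0.0        # F_B = 0 unless braking
    return (f_t - f_b - davis_resistance(v)) / MASS_KG
```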
Power Model
In the metro system, trains operate in three driving states: acceleration, coasting, and braking. Accelerating trains convert electrical energy into mechanical energy, while braking trains regenerate mechanical energy into electrical energy, which can be supplied to accelerating trains. This implies that if the trains are scheduled with an appropriate strategy, a large amount of energy can be saved by using this RBE. Let P_B denote the regenerative braking power and P_T the traction power. If the braking power is less than the traction power, it can be fully used; otherwise, resistors consume the surplus braking power to keep the train voltage below a safe value. The power that is actually reused, P_F, is defined as the minimum of the traction power and the converted braking power [33].
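A minimal sketch of this bookkeeping follows. It assumes that P_T and P_B are summed over all trains in one power-supply section at each time step, that P_F = min(P_T, η·P_B) with a hypothetical conversion efficiency η, and that the net traction energy is the time integral of P_T − P_F; the function name net_energy_kwh is illustrative.

```python
def net_energy_kwh(traction_kw, braking_kw, dt_s=1.0, eta=1.0):
    """Net traction energy (kWh) of one power-supply section.

    traction_kw[k]: total traction power of the trains at time step k (kW).
    braking_kw[k]: total regenerative braking power at step k (kW).
    eta: hypothetical conversion efficiency from braking power to usable power.
    """
    energy_kws = 0.0
    for p_t, p_b in zip(traction_kw, braking_kw):
        p_f = min(p_t, eta * p_b)  # reused power; any surplus is burned in resistors
        energy_kws += (p_t - p_f) * dt_s
    return energy_kws / 3600.0  # kW*s -> kWh
```

The timetable matters because P_F is only nonzero when braking and traction overlap in time: a step where one train draws 500 kW while another regenerates 800 kW costs nothing in this model, whereas the same two events a few seconds apart cost the full 500 kW for that step.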

Relationship between Coasting Speed and Travel Time
The area enclosed by the speed curve and the time axis equals the distance between two adjacent metro stations. As shown in Figure 1, once the acceleration and the driving strategy are determined, the coasting speed and the travel time between two adjacent stations form a one-to-one mapping: a higher coasting speed corresponds to a shorter travel time. Therefore, the travel time can be controlled through the coasting speed, and the travel time t_tr^{m,n} can be expressed as a function of the coasting speed of train no. m in the corresponding section.
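The one-to-one mapping can be illustrated with a deliberately simplified profile. This sketch assumes constant acceleration a up to the coasting speed v1 (taken here as the speed at which traction is cut off), a constant coasting deceleration c standing in for the Davis resistance, and constant braking deceleration b > c; all parameter values and function names are hypothetical. The distance constraint fixes the speed v2 at which braking starts, and the resulting travel time is inverted by bisection.

```python
import math

# Hypothetical section and train parameters (not from the paper).
D = 1000.0    # m, distance between the two stations
A_ACC = 1.0   # m/s^2, constant acceleration
C_DEC = 0.05  # m/s^2, constant deceleration while coasting (simplified resistance)
B_DEC = 0.8   # m/s^2, constant braking deceleration (must exceed C_DEC)


def feasible_range(d=D, a=A_ACC, c=C_DEC, b=B_DEC):
    v_lo = math.sqrt(d / (1 / (2 * a) + 1 / (2 * c)))  # coast all the way to a stop
    v_hi = math.sqrt(d / (1 / (2 * a) + 1 / (2 * b)))  # brake immediately, no coasting
    return v_lo, v_hi


def travel_time(v1, d=D, a=A_ACC, c=C_DEC, b=B_DEC):
    """Travel time (s) for coasting speed v1 (m/s)."""
    v_lo, v_hi = feasible_range(d, a, c, b)
    if not v_lo <= v1 <= v_hi:
        raise ValueError("coasting speed not feasible for this section")
    # Distance constraint v1^2/2a + (v1^2 - v2^2)/2c + v2^2/2b = d, solved for v2.
    v2_sq = (v1 ** 2 / (2 * a) + v1 ** 2 / (2 * c) - d) / (1 / (2 * c) - 1 / (2 * b))
    v2 = math.sqrt(max(v2_sq, 0.0))  # speed at which braking starts
    return v1 / a + (v1 - v2) / c + v2 / b


def coasting_speed_for(target, d=D, a=A_ACC, c=C_DEC, b=B_DEC, tol=1e-9):
    """Invert the mapping by bisection (travel time decreases as v1 grows)."""
    lo, hi = feasible_range(d, a, c, b)
    if not travel_time(hi, d, a, c, b) <= target <= travel_time(lo, d, a, c, b):
        raise ValueError("travel time not reachable in this section")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if travel_time(mid, d, a, c, b) > target:
            lo = mid  # too slow: a higher coasting speed is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this simplified model the monotonicity can be checked by hand: differentiating gives dT/dv1 = (1/a + 1/c)(1 − v1/v2), which is never positive because v2 ≤ v1, so bisection is safe.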

Energy Optimization under Disturbances
To optimize the net traction energy consumption in real time after a dwell disturbance occurs, this paper proposes the MGA-GRU method based on deep learning. This method combines the modified Genetic Algorithm (MGA) with the Gate Recurrent Unit (GRU) network and consists of four stages. In the first stage, MGA produces the energy-optimal timetable without a disturbance. In the second stage, the dwell time and coasting speed of each train are used as decision variables, and MGA provides effective actions under different disturbances; these actions form the training set. In the third stage, the outcomes of MGA are used to train the GRU network. These three stages are all offline. In the fourth stage, the well-trained GRU network is used as the decision network and provides effective solutions in real time after a disturbance occurs. This stage runs in real time.
Journal of Advanced Transportation

Modified Genetic Algorithm.
In the first stage, the objective without a disturbance is to minimize the net traction energy consumption E(t), as expressed in equation (9). In the second stage, if a disturbance occurs to train no. m_0 at station no. n_0, the objective under the disturbance is to minimize E(t, ε), as expressed in equation (10), where ε is a nonzero random variable. Equations (9) and (10) are single-objective optimization problems that can be solved by a Genetic Algorithm (GA). However, when a general GA is applied to such complex optimization problems, the search easily becomes trapped in a local optimum instead of reaching the global optimum. To overcome this premature convergence of a general GA, this paper introduces a modified Genetic Algorithm (MGA) based on the Simulated Annealing (SA) algorithm, which helps the search escape local optima and approach the global optimum.
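A common way to combine SA with a GA, given here as an assumption about how MGA uses SA rather than as the paper's formula, is the Metropolis acceptance rule: a neighboring solution Ω′ of the current solution Ω is always accepted if it does not increase the fitness (energy), and is otherwise accepted with a probability that shrinks as the temperature θ_k is lowered. The symbol θ is used for temperature to avoid a clash with the dwell times T, and the cooling factor λ is hypothetical.

```latex
\Delta = \mathrm{FIT}(\Omega') - \mathrm{FIT}(\Omega), \qquad
P_{\mathrm{accept}}(\Omega \to \Omega') =
\begin{cases}
1, & \Delta \le 0, \\
\exp\left(-\Delta / \theta_k\right), & \Delta > 0,
\end{cases}
\qquad
\theta_{k+1} = \lambda\, \theta_k, \quad 0 < \lambda < 1 .
```

Early on, when θ_k is large, worse moves are accepted often, which keeps the population diverse; as θ_k decays, the rule approaches pure hill-descent.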
Pseudocode for MGA is provided in Algorithm 1. GEN is the current generation of MGA, and GEN_MAX is the maximum number of generations. A is the number of initial individuals, and B is the maximum number of local searches per individual. k_c (0 ≤ k_c ≤ 1) and k_m (0 ≤ k_m ≤ 1) are random values. If no disturbance occurs, the fitness FIT(α) of individual α is calculated with equation (9), that is, FIT(α) = E(t). If a dwell disturbance occurs, FIT(α) is calculated with equation (10), that is, FIT(α) = E(t, ε).
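The loop structure described above (A individuals, B local searches per individual, random values k_c and k_m gating crossover and mutation, GEN_MAX generations) can be sketched as follows. This is a minimal sketch, not the authors' Algorithm 1: the Metropolis acceptance, geometric cooling, tournament selection, one-point crossover, uniform mutation, the probabilities p_c and p_m, and the toy fitness standing in for E(t) are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(delta, temperature, rng):
    """SA rule: always accept improvements, accept worse moves with prob exp(-delta/T)."""
    return delta <= 0 or rng.random() < math.exp(-delta / temperature)


def mga(fitness, bounds, A=20, B=5, gen_max=15, p_c=0.8, p_m=0.1,
        t0=10.0, cooling=0.9, step=0.1, seed=0):
    """Minimise `fitness` over a box with a GA whose individuals get SA local search.

    bounds: (low, high) for each decision variable, e.g. coasting speeds V
    followed by dwell times T. Returns (best individual, best fitness).
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def neighbour(ind):
        # Perturb each variable by up to `step` of its range, then clip to its bounds.
        return [min(max(x + rng.uniform(-step, step) * (hi - lo), lo), hi)
                for x, (lo, hi) in zip(ind, bounds)]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(A)]
    best = min(pop, key=fitness)
    temperature = t0
    for _gen in range(gen_max):  # GEN = 1 .. GEN_MAX
        # Local search: B neighbourhood moves per individual, SA acceptance.
        for alpha in range(A):
            for _beta in range(B):
                cand = neighbour(pop[alpha])
                if metropolis_accept(fitness(cand) - fitness(pop[alpha]), temperature, rng):
                    pop[alpha] = cand
        best = min(pop + [best], key=fitness)
        # GA step: crossover if k_c < p_c, mutation if k_m < p_m.
        children = []
        while len(children) < A:
            p1, p2 = tournament(pop), tournament(pop)
            k_c, k_m = rng.random(), rng.random()
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = p1[:cut] + p2[cut:] if k_c < p_c else list(p1)
            if k_m < p_m:
                i = rng.randrange(dim)
                child[i] = rng.uniform(*bounds[i])
            children.append(child)
        pop = children
        best = min(pop + [best], key=fitness)
        temperature *= cooling  # cool the SA temperature once per generation
    return best, fitness(best)


# Hypothetical stand-in for E(t): a smooth bowl around an arbitrary plan.
TARGET = [15.0, 30.0, 18.0, 25.0]


def toy_energy(x):
    return sum((a - b) ** 2 for a, b in zip(x, TARGET))
```

Usage: `mga(toy_energy, [(10.0, 25.0), (20.0, 40.0), (10.0, 25.0), (20.0, 40.0)])` returns the best plan found and its fitness; in the paper's setting the fitness would instead be the simulated net energy of a candidate timetable.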

Gate Recurrent Unit.
The Gate Recurrent Unit (GRU) network is a type of Recurrent Neural Network (RNN). Like the Long Short-Term Memory (LSTM) network, GRU was proposed to address long-term memory and the gradient problems that arise in backpropagation. Compared with LSTM, GRU has fewer parameters, so it trains faster and requires less data to generalize. GRU can achieve results comparable to those of LSTM while being easier and more efficient to train. GRU networks generalize well and have been widely and successfully used in speech recognition, computer vision, and other fields. The structure and application of GRU are introduced below.

Input and Output Structure of GRU.
The input and output structure of GRU is the same as that of a naive RNN, as shown in Figure 2. There is a current input x_t and a hidden state h_{t−1} passed from the previous node, which contains information about the previous node. Combining x_t and h_{t−1}, GRU produces the output y_t of the current hidden node and the hidden state h_t passed to the next node.

Internal Structure of GRU.
The states of the two gates (r_t and z_t) are obtained from the hidden state h_{t−1} passed from the previous node and the input x_t of the current node. As shown in equations (11) and (12), r_t is the reset gate, which controls resetting, and z_t is the update gate, which controls updating. σ is the sigmoid function, which maps r_t and z_t into the range [0, 1] so that they can serve as gating signals. The pairs (W_xr, W_hr), (W_xz, W_hz), and (W_xh, W_hh) denote the weight matrices of the reset gate, the update gate, and the hidden layer, respectively, and b_r, b_z, b_h are the corresponding biases.
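In the standard GRU formulation, which uses exactly the weight matrices and biases named above, the two gates take the following form; this is presumably what equations (11) and (12) express, though the sketch is written from the standard definition rather than copied from the paper.

```latex
r_t = \sigma\left(W_{xr}\, x_t + W_{hr}\, h_{t-1} + b_r\right), \qquad
z_t = \sigma\left(W_{xz}\, x_t + W_{hz}\, h_{t-1} + b_z\right), \qquad
\sigma(u) = \frac{1}{1 + e^{-u}} .
```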
After the gating signals are obtained, the reset gate is applied first to produce the reset data h'_{t−1} = h_{t−1} ⊙ r_t. Then h'_{t−1} is concatenated with the input x_t, and a tanh function squashes the result into the range [−1, 1], giving h'_t, as shown in equation (13). h'_t mainly carries information from the current input x_t; adding h'_t to the current hidden state in a targeted manner is equivalent to memorizing the current state.
In the update memory stage, forgetting and memorizing are performed at the same time, as expressed by the update equation, where ⊙ is the Hadamard product (element-wise multiplication of corresponding matrix entries) and ⊕ denotes matrix addition. The gating signal z_t lies in [0, 1]: the closer it is to 1, the more data is remembered; the closer it is to 0, the more is forgotten. The overall internal structure of GRU is shown in Figure 3.
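One full GRU step (gates, reset, candidate state, update) can be written in plain Python to make the data flow concrete. This sketch follows the standard GRU and assumes the update h_t = (1 − z_t) ⊙ h_{t−1} ⊕ z_t ⊙ h'_t, so z_t close to 1 favors the new candidate; some references and libraries (PyTorch's nn.GRU, for instance) swap the roles of z_t and 1 − z_t. The helper names and the uniform initialization are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function, maps to (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(*vs):
    return [sum(vals) for vals in zip(*vs)]


def gru_cell(x, h_prev, p):
    """One GRU step; p holds W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h."""
    r = [sigmoid(s) for s in vadd(matvec(p["W_xr"], x), matvec(p["W_hr"], h_prev), p["b_r"])]
    z = [sigmoid(s) for s in vadd(matvec(p["W_xz"], x), matvec(p["W_hz"], h_prev), p["b_z"])]
    h_reset = [h * g for h, g in zip(h_prev, r)]  # h'_{t-1} = h_{t-1} ⊙ r_t
    h_cand = [math.tanh(s) for s in vadd(matvec(p["W_xh"], x), matvec(p["W_hh"], h_reset), p["b_h"])]
    # Update: forget part of h_{t-1} and memorise part of h'_t in one step.
    return [(1.0 - g) * h + g * c for g, h, c in zip(z, h_prev, h_cand)]


def init_params(n_in, n_hidden, scale=0.1, seed=0):
    """Hypothetical initialisation: weights uniform in [-scale, scale], zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("r", "z", "h"):
        p["W_x" + g] = mat(n_hidden, n_in)
        p["W_h" + g] = mat(n_hidden, n_hidden)
        p["b_" + g] = [0.0] * n_hidden
    return p


def run_sequence(xs, n_hidden, p):
    h = [0.0] * n_hidden  # initial hidden state
    for x in xs:
        h = gru_cell(x, h, p)
    return h
```

With all weights and biases at zero, both gates equal 0.5 and the candidate is 0, so each step simply halves the hidden state, which is a quick sanity check of the update rule.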

Application of GRU.
MGA produces a decision for each disturbance, consisting of a series of coasting speeds and dwell times. These outcomes are used to train the GRU network, that is, the decision network. A well-trained decision network can provide intelligent decisions in real time after a random disturbance occurs. Figure 4 shows the structure of the coasting speed and dwell time decision network, which consists of five parts: input layers, a previous hidden layer, a current hidden layer, a decision network, and a voter. After a disturbance occurs, at each departure instant the decision network determines the coasting speed of the departing train and its dwell time at the next station. The speed, position, and driving state of the other trains, together with the train number and station number of the departing train, are fed into the input layer; each pair of train number and station number of the departing train corresponds to one GRU cell. Finally, the voter gives the coasting speed and dwell time of the departing train. For a metro system with M trains, the input layer therefore has 3(M − 1) + 2 = 3M − 1 neurons (three values for each of the other M − 1 trains plus the two identifiers of the departing train), and the output layer has 2 neurons. The dwell disturbance enters the GRU network through its input layers: after a dwell disturbance occurs, the position, speed, and driving state of the disturbed train are delayed as well. Therefore, when a train departs from a station, the position, speed, and driving state of the disturbed train (as one of the other trains) differ from those in the undisturbed case. In short, the dwell disturbance changes the disturbed train's position, speed, and driving state at each departure instant, and these values are used in the input layers of the GRU network.
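The 3M − 1 input size can be checked with a small feature builder. This is a sketch under assumptions: the integer encoding of driving states (including a "dwelling" state) and the ordering of trains by number are hypothetical choices, and the paper may normalize or order the features differently.

```python
# Hypothetical integer encoding of a train's driving state.
STATE_CODE = {"accelerating": 0, "coasting": 1, "braking": 2, "dwelling": 3}


def build_input(departing_train, station, trains):
    """Assemble the 3M - 1 input features for one departure decision.

    trains: dict train number -> (position_m, speed_mps, state) for all M trains.
    The departing train contributes only its train number and station number;
    each of the other M - 1 trains contributes position, speed, and state.
    """
    features = []
    for number in sorted(trains):
        if number == departing_train:
            continue
        position, speed, state = trains[number]
        features += [float(position), float(speed), float(STATE_CODE[state])]
    features += [float(departing_train), float(station)]
    return features
```

For M = 2, 3, and 5 trains this yields 5, 8, and 14 input values; the two outputs (coasting speed and dwell time) come from the network itself and are not built here.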

Experimental Verification
To verify the energy-saving effect and real-time performance of the proposed MGA-GRU method for solving the TTR problem, four numerical experiments are conducted in this section. In experiment 1, MGA is used to produce the optimal timetable without a disturbance in the two-train metro system. In experiment 2, the timetable is rescheduled with the MGA-GRU method after a disturbance occurs. In experiment 3, the MGA-GRU method is applied to the TTR problem in a three-train metro system. In experiment 4, the MGA-GRU method is applied to a bidirectional metro system with five trains on two tracks. Information on the pilot metro system is shown in Figure 5, and the configuration of the numerical experiments is shown in Table 1.
Some settings for the four experiments are listed in Table 2. Based on the above settings, there are no traffic jam…

Algorithm 1: MGA
Randomly generate the first generation. The population contains many individuals, and each individual contains a series of decision variables: coasting speeds (V) and dwell times (T). Ω denotes such a series of decision variables, that is, Ω = {V, T}.
…
    do
        Randomly generate a series of V and T within the constraints to serve as a solution Ω(α, β) in the neighborhood of Ω(α).
        β ← β + 1.
    end while
    α ← α + 1, β ← 1.
end while
… Equations (9) and (10)

Experiment 1.
The goal of experiment 1 is to validate MGA on the energy optimization problem and to produce the optimal timetable without a disturbance. MGA solves the energy optimization problem based on equation (7), and the resulting optimal timetable is shown in Table 3. Figure 6 shows the distribution of individuals in the 1st, 10th, and 15th generations produced by MGA. In the 1st generation, the individuals are widely scattered; after 10 generations, they gradually concentrate; finally, after 15 generations, they converge to a fitness value of 256.02 kWh, which is the optimal energy consumption without a disturbance.
To demonstrate the effectiveness of MGA, a general GA is also applied to produce the offline timetable without a disturbance under the same conditions. Figure 7 shows the distribution of individuals in the 1st, 10th, and 15th generations produced by the general GA. After 15 generations, the individuals converge to a fitness value of 276.59 kWh. Figure 7 shows that the individuals of the general GA concentrate more quickly than those of MGA, and the converged fitness of the general GA (276.59 kWh) is higher than that of MGA (256.02 kWh).
Therefore, compared with a general GA, the proposed MGA avoids converging prematurely to a local optimum and provides a better solution.
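To put the gap between the two converged fitness values in perspective, simple arithmetic on the reported numbers gives:

```latex
\Delta E = 276.59\ \text{kWh} - 256.02\ \text{kWh} = 20.57\ \text{kWh}, \qquad
\frac{20.57}{276.59} \approx 7.44\,\%, \qquad
\frac{20.57}{256.02} \approx 8.03\,\% .
```

That is, the MGA timetable uses about 7.4% less energy than the general GA timetable, or equivalently the general GA result is about 8.0% higher than the MGA result.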

Experiment 2.
This experiment uses the MGA-GRU method to solve the TTR problem under random disturbances in the two-train metro system. The energy-saving effect of MGA-GRU is measured by the energy saved during testing compared with the no-action strategy, and its real-time performance is measured by the time it takes to provide each pair of decisions (coasting speed and dwell time) during testing.

Dataset.
The outputs of MGA based on equation (8)

Baselines.
The proposed MGA-GRU method is compared against two baselines: (i) no action, where no train takes any measures after disturbances occur, and (ii) MGA, where MGA provides an offline strategy to deal with the dwell disturbances. These two baselines represent the worst and the best strategies, respectively, and MGA-GRU is expected to fall between these two extremes. It should be emphasized that MGA-GRU can reschedule the timetable in real time.
The net energy consumption under the three strategies is compared during testing. The dwell disturbance that occurs to train no. 1 at Changshu Road Station is 13.45 s. Figure 8 shows the net energy consumption curve over the whole journey for the three methods. From the moment the disturbance occurs, the net energy consumption of the no-action strategy remains the highest. Although MGA saves energy well, it takes a long time to provide a decision, which does not meet the real-time requirements of the TTR problem. By contrast, MGA-GRU reschedules the timetable in real time and still saves energy. The timetable rescheduled with MGA-GRU under the 13.45 s dwell disturbance in the two-train metro system is shown in Table 4. Table 5 shows the average calculation time and average total energy consumption of the three strategies over 10 tests. The MGA-GRU strategy is energy efficient compared with the no-action strategy, and its average calculation time is only 0.15 s, which meets the real-time requirement. Therefore, MGA-GRU can reschedule the timetable in real time and save energy after a dwell disturbance occurs. It should be noted that although the total energy consumption of the MGA strategy is lower than that of the MGA-GRU strategy, MGA requires a very long calculation time (8694.08 s in total) to reschedule the timetable, which is far from meeting the real-time requirements of the TTR problem. In terms of calculation time, MGA-GRU is clearly superior.
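The energy-saving figures reported against the no-action baseline can be computed as a relative saving per test and then averaged. This is a minimal sketch; the function names are hypothetical, and the text does not state whether the paper averages per-test percentages (as here) or compares average energies, which can differ slightly.

```python
def saving_percent(e_no_action, e_method):
    """Energy saved by a strategy relative to the no-action baseline, in percent."""
    return 100.0 * (e_no_action - e_method) / e_no_action


def average_saving(tests):
    """Mean per-test saving; tests is a list of (E_no_action, E_method) pairs in kWh."""
    savings = [saving_percent(base, method) for base, method in tests]
    return sum(savings) / len(savings)
```

For instance, two illustrative tests with (200 kWh, 190 kWh) and (100 kWh, 90 kWh) give savings of 5% and 10%, so the mean is 7.5%, whereas comparing the summed energies (300 kWh against 280 kWh) would give about 6.7%.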

Experiment 3.
In the real case of SML1, there are at most three trains between two substations. Therefore, it is essential to apply the MGA-GRU method to a three-train metro system. Moreover, according to the real case of SML1 [35], trains frequently depart early, which means that a disturbance can take a negative value. This situation is also considered in experiment 3.
First, MGA produces the optimal timetable without a disturbance based on equation (7), as shown in Table 6.

Baselines.
As in experiment 2, MGA-GRU is compared in the three-train metro system against two baselines: no action and MGA. Figure 9 shows the net energy consumption curve over the whole journey under a −2.37 s dwell disturbance (early departure) in the three-train metro system. As Figure 9 shows, the MGA method has the best energy-saving effect, but it also requires a long calculation time.
The timetable rescheduled with MGA-GRU under the −2.37 s dwell disturbance in the three-train metro system is shown in Table 7. Table 8 shows the average calculation time and average total energy consumption of the three strategies over 10 tests. The average time MGA-GRU needs to provide each group of coasting speed and dwell time is only 0.27 s, which meets the real-time requirements of the TTR problem. Besides, the MGA-GRU strategy is energy efficient compared with the no-action strategy. Therefore, MGA-GRU can reschedule the timetable in real time and save energy after a random dwell disturbance occurs.

Experiment 4.
According to the real case of SML1, there are at most three trains on one track between two substations. The goal of experiment 4 is to apply MGA-GRU to a realistic metro system: a bidirectional metro line with five trains on two tracks, as shown in Figure 10.
The fourteen stations are numbered in sequence from 1 to 14. Three trains depart from station no. 1 and travel in sequence to station no. 7, which is called the up direction. Then there is a 60 s turnback from station no. 7 to station no. 8. After that, each train runs from station no. 8 to station no. 14, which is called the down direction, followed by another 60 s turnback from station no. 14 back to station no. 1. Two other trains depart from station no. 8, run to station no. 7, and then return to station no. 8. Train no. 1 departs from station no. 1 at the same instant as train no. 4 departs from station no. 8. In addition, the headway between every two consecutive trains is also 120 s.
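The loop and initial departures described above can be encoded compactly. In this sketch the station numbering, the two 60 s turnbacks, and the 120 s headway come from the text; treating each turnback as extra time on the 7 → 8 and 14 → 1 legs, and numbering the two trains that start at station no. 8 as trains 4 and 5, are assumptions, and all names are hypothetical.

```python
UP = list(range(1, 8))      # stations 1..7, up direction
DOWN = list(range(8, 15))   # stations 8..14, down direction
TURNBACK_S = 60             # turnbacks 7 -> 8 and 14 -> 1 (from the text)
HEADWAY_S = 120             # headway between consecutive trains (from the text)


def loop_from(start):
    """Station order of one full loop for a train starting at station 1 or 8."""
    if start == 1:
        return UP + DOWN
    if start == 8:
        return DOWN + UP
    raise ValueError("trains start at station 1 or station 8")


def legs(start):
    """(from_station, to_station, extra_turnback_s) for each leg of one loop."""
    order = loop_from(start)
    following = order[1:] + order[:1]
    return [(a, b, TURNBACK_S if (a, b) in {(7, 8), (14, 1)} else 0)
            for a, b in zip(order, following)]


def first_departures():
    """Train number -> (start station, first departure instant in s)."""
    plan = {train: (1, i * HEADWAY_S) for i, train in enumerate((1, 2, 3))}
    plan.update({train: (8, i * HEADWAY_S) for i, train in enumerate((4, 5))})
    return plan
```

Each loop therefore contains exactly two turnback legs regardless of the starting station, and trains no. 1 and no. 4 share departure instant 0.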
First, MGA is again used to produce the optimal timetable without a disturbance. … offline strategy to deal with the dwell disturbances. It is expected that MGA-GRU achieves a better energy-saving effect than a general GA, which would demonstrate the effectiveness of MGA. Table 9 shows the average calculation time and average total energy consumption of the four strategies over 20 tests. The average time MGA-GRU needs to provide each group of coasting speed and dwell time is only 0.33 s, which also meets the real-time requirements of the TTR problem. Compared with the no-action strategy, the MGA-GRU strategy saves 7.19% of energy on average. Therefore, MGA-GRU can reschedule the timetable in real time and save energy after a random dwell disturbance occurs. Moreover, MGA (8.73%) achieves a better energy-saving effect than a general GA (6.15%), which further demonstrates the effectiveness of MGA. The decision time of MGA-GRU under the onboard computer's configuration is also examined; the configuration of the train's onboard computer is shown in Table 10.
The PC configuration is restricted to match that of the train's onboard computer, and experiment 4 is then performed again.
The experimental results show that the average calculation time is 0.35 s, which indicates that the proposed MGA-GRU method can be deployed on the onboard computer.

Conclusion
In this paper, a Modified Genetic Algorithm-Gate Recurrent Unit (MGA-GRU) method is proposed to solve the train timetable rescheduling (TTR) problem. The proposed MGA-GRU method can reschedule the timetable in real time to optimize the net traction energy consumption of the metro system under a random dwell disturbance. Specifically, the outcomes of the modified Genetic Algorithm (MGA) under different dwell disturbances are used as the training set for the Gate Recurrent Unit (GRU) network (the decision network). After a disturbance occurs, the well-trained decision network can provide appropriate coasting speeds and dwell times in real time.
Unlike traditional optimization methods such as enhanced brute force (EBF), ant colony optimization (ACO), and Genetic Algorithm (GA), MGA-GRU achieves real-time train timetable rescheduling. Unlike CDSA, which only adjusts the coasting speed of the disturbed train, MGA-GRU rearranges the coasting speeds and dwell times of all trains in real time after disturbances occur, thereby achieving a better energy-saving effect.
Four experiments are conducted on the Shanghai Metro Line One (SML1) pilot network to verify the energy-saving effect and real-time performance of the proposed method.
The experimental results show that, after a disturbance occurs in the two-train, three-train, and five-train metro systems (the last being a bidirectional metro line on two tracks), the MGA-GRU strategy saves on average 4.45%, 6.16%, and 7.19% of energy, respectively, compared with the no-action strategy, while the average calculation time for each group of coasting speed and dwell time is only 0.15 s, 0.27 s, and 0.33 s. In all three systems, the proposed MGA-GRU method can solve the TTR problem under random disturbances in real time.
In future work, following Taguchi's experimental design method [36] and other intelligent optimization methods [37], the impact of user-defined parameters on the performance of the proposed algorithm should be analyzed, other parameter settings should be compared, and the best settings should be determined.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.