Multistep Prediction of Bus Arrival Time with the Recurrent Neural Network

Accurate predictions of bus arrival times help passengers arrange their trips easily and flexibly and improve travel efficiency. Thus, it is important to manage and schedule the arrival times of buses for the efficient deployment of buses and to ease traffic congestion, which improves the service quality of the public transport system. However, because many variables disturb scheduled transportation, accurate prediction is challenging. To predict the arrival time of a bus accurately, this research adopted a recurrent neural network (RNN). For the prediction, the variables affecting the bus arrival time were investigated from a data set containing the route, driver, weather, and schedule. Then, stacked multilayer RNN models were created with the variables, and the models were categorized into four groups. The RNN models, including a separate multi-input model and a spatiotemporal sequence model, were applied to the arrival and departure times of buses over an entire bus route in Linyi, Shandong. The result of the model simulation revealed that the convolutional long short-term memory (ConvLSTM) model showed the highest accuracy among the tested models. The propagation of error and the number of prediction steps influenced the prediction accuracy.


Introduction
The rapid and continuous development of China has led to an increase in the number of vehicles. The National Bureau of Statistics of China announced that the number of privately owned vehicles reached 261.5 million in 2019, an increase of 21.22 million vehicles in a year. In total, 96 cities in China had more than one million registered vehicles [1]. The rapid increase of vehicles causes traffic congestion, parking problems, and environmental pollution. Public transportation carries a larger number of passengers and alleviates such problems. Mass transportation consumes less energy and emits fewer pollutants than private transport. Therefore, urban planning puts a priority on public transportation. New technologies such as bus rapid transit (BRT) and driverless buses have been developed significantly with huge investment to support the public transportation system. However, a trip by bus takes a relatively long time and is not punctual, which makes people avoid it. Encouraging people to use buses more often requires optimized bus routes and punctuality of bus operation [2,3]. However, the absence of an accurate operation schedule often causes long waiting times and bus bunching on the same route. For the punctual operation of public buses, the bus schedule needs to be optimized, which requires an accurate prediction of the arrival time of buses on a route. This not only meets the demand of ordinary passengers who want to know the arrival times of a bus at boarding stations but also optimizes the intelligent bus scheduling system and improves the operation efficiency of the bus company. Several neural networks have been used to predict the arrival time of a bus: non-RNN networks, RNNs with time series, and temporal and spatial RNN networks.
Several studies adopted non-RNN networks for predicting bus arrival and operation times using (1) MapReduce-based clustering with K-means [4], (2) a backpropagation (BP) neural network model [5], (3) a particle swarm algorithm [6], and (4) a wide-depth recursive (WDR) learning model [7]. Other studies used an RNN with time series, such as long short-term memory (LSTM) [8]. Models with LSTM processed the historical data of the global positioning system (GPS) and bus stop locations with the influence of different routes, drivers, weather conditions, time distribution [9], heterogeneous traffic flow, and real-time data [10][11][12]. The temporal and spatial RNN network with ConvLSTM or a spatiotemporal property model (STPM) was originally used to predict precipitation [13]. However, it was also used for predicting bus arrival times based on the total operation time of a bus on a route, waiting and on-board times, and transfer location wait times [14][15][16], for predicting multilane short-term traffic flow [17], and for creating a multitime step deep neural network [18]. A bus runs on fixed lines with fixed stations. The spatial relationship between its stations determines the arrival times in the time series. Thus, this study used an RNN to predict the arrival time of a bus. A bus route generally has 30-40 bus stations. Arrival time prediction includes the time prediction of each station along the way from the starting to the finishing stop, the arrival times at subsequent stations, and the arrival time of the nearest vehicle to a station. This study first analyzed the bus arrival time. Based on the analysis, the input feature vectors of a neural network were defined, and then seven RNN models from four categories were tested for predicting the arrival time. Then, the proposed models were trained on the measured arrival and departure times of the buses on a route in Linyi, Shandong Province. Finally, the multistep prediction of the arrival time was carried out.
This paper is organized as follows. Section 2 describes the theoretical background and introduces the recurrent neural network. Section 3 describes the pretreatment and analysis of data. Section 4 discusses the analysis result of the RNN model. Finally, Section 5 concludes this study.

Theoretical Background
A recurrent neural network (RNN) [19] has a feedback structure that processes sequential data for time-series prediction or classification. RNNs are widely used in various applications, and new models based on them have been suggested, such as LSTM, GRU, and ConvLSTM. According to the data in this study, we divided the prediction models into four categories and adopted a multistep prediction for bus arrival times. The time-series input data is essential for prediction with optimal feature extraction and memory efficiency. The data is processed in an RNN with internal feedback and feedforward connections, which retain and reflect the state or memory of a long context window [20]. The RNN suffers from the common disadvantage of the vanishing gradient (gradient disappearance) and exploding gradient problems [21][22][23], which limits its applications because of training difficulties. To solve these problems, Hochreiter et al. [24] proposed LSTM, which has continued to be improved for different applications [25,26]. LSTM specializes in memorizing long sequences and effectively avoids the problem of gradient disappearance. The hidden layers of LSTM use memory blocks that store the previous sequence information and add three gates: input, output, and forget gates. These control the sequence information kept in memory. The gated recurrent unit (GRU) [27] is a modestly simplified LSTM. GRU combines the forget and input gates into an update gate and merges the cell state and hidden state. A model with GRU is simpler and has fewer activation function and output computations than the standard LSTM model. Figure 1 shows the LSTM, in which the hidden units are replaced by memory blocks.
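Calculating $c_t$ and $h_t$ requires the following equations, written here in the standard LSTM form:

$$
\begin{aligned}
i_t &= \sigma(W_i X_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f X_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o X_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c X_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\tag{1}
$$

In these equations, $\odot$ denotes the Hadamard product, that is, the multiplication of the corresponding elements of the operand matrices; $W_i$, $W_f$, $W_o$, and $W_c$ are the weights of $X_t$; $U_i$, $U_f$, $U_o$, and $U_c$ are the weights of $h_{t-1}$; $b_i$, $b_f$, $b_o$, and $b_c$ are the bias terms; $\sigma$ is the sigmoid function; and $\tanh$ is the hyperbolic tangent function. Figure 2 shows the GRU. There is only one hidden state $h_t$ in GRU.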
Through the linear transformation of the input tensor and hidden state, the weighted sum of the hidden state inflow is calculated with equations (2) and (3). The linear transformation of $r_t$, $h_{t-1}$, and the input tensor is combined with the activation function in equation (4) to calculate the updated value of the hidden state. The mixing weight applied to the hidden state of the previous step is shown in equation (5). The final output $h_t$ is the same as in LSTM. Compared with LSTM, GRU has one less activation function calculation and output calculation in the final hidden state update, so the calculation is relatively simple. Figure 3 illustrates the pure network models of LSTM and GRU. The input layer takes the arrival time series as input, and the other two layers form a fully connected prediction network. Input and output data are 3D tensors with a shape [?, 41, 1] ("?" means that a dimension can have any length). These models use a single layer of LSTM or GRU. In the input layer, variables such as route, direction, vehicle model, and driver are also regarded as part of the time sequence.
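For reference, the GRU update referred to in equations (2) to (5) is commonly written in the following standard form. The weight names $W_z$, $U_z$, $b_z$, and so on mirror the LSTM notation and are an assumption, as is the convention for which term $z_t$ multiplies in the last line; some formulations swap $z_t$ and $1 - z_t$.

```latex
\begin{aligned}
z_t &= \sigma(W_z X_t + U_z h_{t-1} + b_z) && \text{update gate} \\
r_t &= \sigma(W_r X_t + U_r h_{t-1} + b_r) && \text{reset gate} \\
\tilde{h}_t &= \tanh\bigl(W_h X_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) && \text{candidate hidden state} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{mixed hidden state}
\end{aligned}
```

In this form, the GRU has three weight blocks where the LSTM has four, which is why a GRU layer of the same width has fewer trainable parameters.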

Multi-Input Model Separated by Time Series.
As these variables are not sensitive to any specific ordering, the RNN cannot process them alone. However, a BP network can process them through a connection layer. Thus, the integration of RNN and BP was used for the prediction network (Figure 4). The integrated network was designed in accordance with the characteristics of the input data. A two-part network used the time series-related input data such as route number, driver, departure time, and route length for LSTM processing. The result was passed through a connection layer to the prediction layer. Since the time series input data became shorter even with the addition of LSTM, the total number of trainable parameters was not significantly increased compared with pure LSTM.

LSTM Stacking Model.
To achieve better prediction accuracy than a single layer, a multilayer LSTM was employed. Four LSTM layers were stacked, with 256, 128, 64, and 32 hidden units, respectively. Figure 5(a) shows the diagram of the stacking models.
There is also a bidirectional (two-way) LSTM composition, in which the forward and backward connections also employ a reverse projection function, which is suitable in our case to verify arrival time predictions. Figure 5(b) shows the diagram of the bidirectional network models.
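To give a feel for how stacking affects model size, the sketch below counts the trainable parameters of LSTM and GRU layers from the gate structure described earlier. It is a rough illustration under stated assumptions: an input feature dimension of 1 (taken from the [?, 41, 1] tensor shape), one bias vector per gate block, and no prediction layers. The function names are hypothetical, and the totals are not expected to reproduce Table 3.

```python
from typing import Sequence


def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM layer: four blocks (input, forget, output, cell),
    each with input weights W, recurrent weights U, and a bias b."""
    return 4 * (units * input_dim + units * units + units)


def gru_params(input_dim: int, units: int) -> int:
    """Parameters of one classic GRU layer: three blocks (update, reset, candidate).
    Assumption: one bias vector per block; some implementations use two."""
    return 3 * (units * input_dim + units * units + units)


def stacked_lstm_params(input_dim: int, layer_units: Sequence[int]) -> int:
    """Total parameters of stacked LSTM layers; each layer feeds the next."""
    total = 0
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the next layer receives this layer's hidden state
    return total


if __name__ == "__main__":
    print("single LSTM(256):", lstm_params(1, 256))
    print("single GRU(256): ", gru_params(1, 256))
    print("LSTM stack:      ", stacked_lstm_params(1, [256, 128, 64, 32]))
```

Under these assumptions, a GRU layer has three quarters of the parameters of an LSTM layer of the same width, and most of the parameters of the stack sit in its widest layers.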

Spatiotemporal Time-Series Model.
The bus operation takes place in a space-time domain, although there is little change in the spatial dimension for an operation on a fixed route. As ConvLSTM processes the data of time and space, it integrates a convolution over time and space into the calculation of each gate of LSTM. The gate calculations follow those of LSTM, with the matrix multiplications replaced by convolution operations. A ConvLSTM network with batch normalization (BN) consists of a specification and flattening layer and a prediction network. Figure 6 shows the diagram of the network, which has a long training time and many parameters in more than five dimensions. This network is appropriate for processing time-series data with spatial properties, such as bus arrival times, with high accuracy.

Pretreatment and Analysis of Data
3.1. Data Characteristics.
Each bus was equipped with a device that included a GPS and data communication module. The device transmitted data to a bus scheduling system. Table 1 shows the data structure of the reporting system. The data of arrival and departure of a bus at a bus stop consists of route, speed, arrival and departure time, coordinates, and driver's number. For obtaining the Lasso variable correlation [8], the bus number, number of bus stops, days of the week, distances between bus stops, arrival and departure times, and weather were included, too. The variables were grouped into two categories: dynamic and static variables [14]. The dynamic variables include driving times between bus stops, staying times, and weather, while the static variables include route, direction, vehicle model, driver, arrival and departure times, days of the week, holidays, and working days. We selected variables related to the route and the arrival times at the previous stops as the input, where $x_t$ is the difference of the arrival time between the current station and the previous station.

Data Preprocessing
Step 1. Generating a Sample Dataset. According to the data in Section 3.1, an arrival time series was obtained from the data of arrival and departure times of a bus at a bus stop. For the convenience of calculation, the difference of the arrival times between two bus stops was calculated in seconds. Table 2 shows an example of the dataset. The existing sequence data are 120, 220, 250, and 260, which correspond to four bus stops A, B, C, and D. This means that 120 s is needed for a bus to drive from the starting location to A, 220 s from A to B, 250 s from B to C, and 260 s from C to D. When the bus arrives at C, only the prediction of the arrival time at D is needed. The input sequence is the sequence of all arrival times from the starting location to C, and the output sequence includes the 260 s from C to D and the backward sequence from C to the starting location. The input sequence is shorter than the output sequence, as it only needs to cover the stops before the finishing location of a bus: predicting the time to the finishing location only requires the sequence before it. When the bus arrives at a bus stop between the starting and finishing locations, the entries for segments not yet traveled are input as 0 to keep the sequence length consistent. Figure 7 shows the time-series data of real arrival times. The blue and orange lines are the input and output sequences, respectively. The output sequence has a predicted time of 0 at the current bus stop. It has negative numbers for the earlier stops to maintain the correctness of the inverted time from the starting location to the current bus stop.
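The construction of the input and output sequences can be sketched as follows. The rule is inferred from the worked example with stops A to D (travel times 120, 220, 250, and 260 s): observed segments fill the input and are padded with 0, while the output holds the negated earlier segments, a 0 at the current stop, and the remaining segments to be predicted. The function name `make_sample` and its arguments are hypothetical, not taken from the authors' code.

```python
from typing import List, Tuple


def make_sample(segment_times: List[int], current_stop: int) -> Tuple[List[int], List[int]]:
    """Build one (input, output) pair for a bus that has just reached a stop.

    segment_times[i] is the travel time in seconds of segment i + 1
    (start -> A, A -> B, ...). current_stop is the number of segments
    already traveled (1 = bus is at the first stop).
    """
    n = len(segment_times)
    if not 1 <= current_stop < n:
        raise ValueError("current_stop must lie between 1 and len(segment_times) - 1")

    observed = segment_times[:current_stop]
    # Input: observed segment times, zero-padded to the fixed length n - 1.
    input_seq = observed + [0] * (n - 1 - current_stop)
    # Output: earlier segments negated, 0 at the current stop,
    # then the segment times that remain to be predicted.
    output_seq = [-t for t in observed[:-1]] + [0] + segment_times[current_stop:]
    return input_seq, output_seq


if __name__ == "__main__":
    times = [120, 220, 250, 260]  # stops A, B, C, D from the example
    for stop in (3, 2, 1):
        print(make_sample(times, stop))
```

Each call yields a fixed-length pair regardless of how far the bus has traveled, which is what lets samples from every stop on the route share one network.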
Step 2. Dataset Normalization. The variables had different dimensions and units, which affected the results of data analysis. Thus, normalization was necessary to eliminate the differences. Min-max normalization, which scales the final values to the range 0-1, and Z-score standardization were used. Min-max normalization is calculated as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{7}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable.

Mathematical Problems in Engineering
Z-score standardization uses the mean and standard deviation of the data and is calculated as follows:

$$z = \frac{x - \mu}{\sigma} \tag{8}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the data.
In this paper, after sorting the vehicle number and driver number on the route, the sorted sequence was used as the input data and normalized by equation (7). The route number, route direction, departure time (hh:mm), days of the week, holidays, distances from the starting location, weather, and other information were normalized, and their arrival time series were processed by equation (8).
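The two scaling steps can be illustrated with a minimal sketch using only the Python standard library. The function names are hypothetical, and the use of the population standard deviation and the handling of constant columns are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values linearly to the range 0-1 (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values: Sequence[float]) -> List[float]:
    """Center values on the mean and scale by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    segment_times = [120, 220, 250, 260]  # seconds, from the example sequence
    print(min_max_normalize(segment_times))
    print(z_score_standardize(segment_times))
```

In practice, the minimum, maximum, mean, and standard deviation would be computed on the training set only and reused for the validation and test sets, so that no information leaks from the held-out data.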

Dataset.
The experiment was based on the data collected from March 28 to June 28, 2020, in Linyi, Shandong Province. The data was obtained from buses that ran on route no. 30, which had 36 bus stops (Figure 8). TensorFlow-GPU 2.0 was used for data processing and algorithm creation. After pretreatment, the dataset contained 122,336 samples: 78,303 in the training set, 19,590 in the validation set, and 24,443 in the test set.

Training the RNN Prediction Model.
Seven network models were designed, trained, validated, and tested by using the preprocessed dataset. Pure LSTM and GRU were RNN models. LSTM-BP and GRU-BP were multiple-input models with variable features separated from the time series. Bidirectional LSTM (LSTM-Bi) and LSTM-Stack were LSTM stack models. ConvLSTM was a spatiotemporal sequence model. Table 3 shows the model structures and a comparison of the parameters of the RNN networks. Pure GRU had the smallest number of parameters, while the stack model had the largest number. The mean absolute error (MAE), the average absolute difference between the predicted and real values, was selected as the loss function. All network parameters were updated using the Adam optimization algorithm.
The Adam algorithm performs first-order gradient-based optimization. Estimates of the first- and second-order moments of the gradients were used to dynamically set independent adaptive learning rates for different parameters. The estimations were better than those of the traditional gradient descent method. Since the output was a time series, cosine similarity was used to determine the accuracy. Figure 9 shows the training loss and accuracy with 1000 epochs, a batch size of 100, and a validation set of 20% of the training set. As the seven models show similar MAEs, only the training output of the ConvLSTM model is shown. During training, the trends of the training and validation data are consistent, and no overfitting is observed. The hyperparameters of the selected model are suitable, too. As seen from Figure 10, the pure GRU had fewer training parameters and less training time than the pure LSTM. The LSTM-Bi doubled the number of parameters and the training time compared with the pure LSTM. The LSTM-Stack had five times more parameters than other models, but less training time than the ConvLSTM. The ConvLSTM had the longest training time, with 5.8 times more parameters and 12 times longer training time than the pure LSTM.

Table 2: Example of the dataset.
Input sequence | Output sequence
(120, 220, 250) | (−120, −220, 0, 260)
(120, 220, 0) | (−120, 0, 250, 260)
(120, 0, 0) | (0, 220, 250, 260)
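The error and accuracy measures named in this section can be computed as in the following sketch. It is a plain-Python illustration rather than the authors' implementation; the function names and the sample predictions are hypothetical, and only the real sequence (120, 220, 250, 260) comes from the example dataset.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], real: Sequence[float]) -> float:
    """Mean absolute error between predicted and real sequences."""
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(real)


def rmse(pred: Sequence[float], real: Sequence[float]) -> float:
    """Root mean square error between predicted and real sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / len(real))


def cosine_similarity(pred: Sequence[float], real: Sequence[float]) -> float:
    """Cosine of the angle between two sequences treated as vectors."""
    dot = sum(p * r for p, r in zip(pred, real))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(r * r for r in real))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero sequence")
    return dot / norm


if __name__ == "__main__":
    real = [120, 220, 250, 260]        # seconds, from the example sequence
    predicted = [125, 215, 260, 270]   # hypothetical model output
    print("MAE :", mae(predicted, real))
    print("RMSE:", rmse(predicted, real))
    print("COS :", cosine_similarity(predicted, real))
```

Cosine similarity compares the shape of the predicted sequence with the real one and is insensitive to a common scale factor, so it complements MAE and RMSE, which measure the size of the error in seconds.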

Analysis of Results.
The trained models were evaluated on the test set, and the predicted arrival times are shown in Table 4. The fitting degree of the real and predicted values in Table 4 shows that the ConvLSTM provides the best prediction. The multi-input hybrid model, which separates the parameters from the time series, not only increased the network complexity but also reduced the prediction accuracy. Table 5 shows the statistics of the prediction results of the seven models. MAE, RMSE, COS (cosine similarity), the number of training parameters, and the training time were used to quantitatively evaluate the seven network models. The prediction accuracy improved from the pure LSTM to the ConvLSTM, as shown in Figure 11. The results reveal the following:
(1) The GRU was more efficient than the LSTM model, with fewer parameters and comparable accuracy.
(2) The LSTM models except the ConvLSTM had more parameters and higher network accuracy than other models.
(3) The dataset properties influenced the complexity of the models rather than their results.
(4) The ConvLSTM showed the highest accuracy as it processed the data of time and space, which indicated the need to include the space-related properties.
In the process of arrival time-series prediction, the arrival times at subsequent bus stops were based on those at the previous bus stops. The ConvLSTM network model was selected to analyze the prediction accuracy through one- and two-step prediction and total time prediction. Figure 12 shows the test sample set on the x-axis and the difference between the predicted and real values on the y-axis. The mean and RMSE were calculated from the mean values and mean square deviation of the differences. The one-step prediction had the highest accuracy, and the total time prediction (multistep prediction) showed the lowest accuracy.
The regularity in the histogram of Figure 12 reveals that the one-step prediction has the smallest deviation and the total time prediction has the highest error, which is related to the accumulation and propagation of errors in the prediction of the arrival times at the subsequent bus stops.
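A simplified way to see why the error grows with the number of steps: the arrival time at a downstream stop is the sum of the segment times before it, so the arrival error is the running sum of the per-segment errors. The sketch below illustrates only this bookkeeping; it does not reproduce the ConvLSTM experiment, and the function name and predicted values are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_arrival_errors(pred_segments: Sequence[float],
                              real_segments: Sequence[float]) -> List[float]:
    """Error of the predicted arrival time at each successive stop.

    The arrival time at stop k is the sum of the first k segment times,
    so its error is the running sum of the per-segment errors.
    """
    step_errors = [p - r for p, r in zip(pred_segments, real_segments)]
    return list(accumulate(step_errors))


if __name__ == "__main__":
    real = [120, 220, 250, 260]        # seconds, from the example sequence
    predicted = [125, 225, 260, 270]   # hypothetical, each segment overestimated
    print(cumulative_arrival_errors(predicted, real))
```

With a consistent bias in the per-segment predictions, the error at the final stop is several times the one-step error. Errors of opposite sign partly cancel, but their spread still widens as more steps are summed.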

Conclusion
The public transport system is a complex system with a high degree of uncertainty. Arrival time prediction in such a system is a multistep prediction problem in which uncertainty leads to poor prediction accuracy.
This paper first analyzed the main variables affecting this uncertainty, and then variables such as route, direction, vehicle, driver, departure hour, departure minute, day of the week, holiday, distance from the starting location, and weather were selected. The arrival time series before the current bus stop was also selected. These variables fully reflected the impact on the arrival time-series prediction. Among RNN networks for time-series analysis, we processed the data by using seven different network models in four different types of networks.
We analyzed and compared the predictive power of the seven RNN models with the variables and parameters in the measured dataset. We noticed an improvement in prediction accuracy by adding variables in one- and two-step prediction models, but not in the multistep (total time prediction) model. The multistep model increased the network complexity only. The ConvLSTM showed the highest prediction accuracy with spatiotemporal data. The statistics of one-, two-, and multistep prediction showed that, as the number of steps increased, the accumulation and propagation of the sequence prediction error caused a larger deviation of the predicted time. Accurate bus arrival time prediction encourages more people to use buses for transportation and allows operating companies to optimize bus schedules to increase the efficiency of their operation. This also improves the traffic condition in cities.
Accurate bus arrival information also relieves the anxiety of users by decreasing waiting time and helps to provide passengers with an improved service. e accurate prediction of bus arrival times can be integrated into an intelligent bus scheduling system in a smart transportation system. Such a system improves the management of a public transport system, increases the economic benefits of the system, and ultimately brings social benefits.
Data Availability
The data consist of Excel files and can be accessed at https://github.com/ricebow/multi-step-RNN.
There are no restrictions on data access. The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.