A Novel Hierarchical Hybrid Model for Short-Term Bus Passenger Flow Forecasting

To meet increasing travel demand and address public transport problems, timetables and bus schedules need to be adjusted dynamically on the basis of accurate real-time passenger flow forecasting. To forecast future passenger flow more accurately, this paper proposes a novel hierarchical hybrid model for short-term passenger flow forecasting based on a time series model, deep belief networks (DBNs), and an improved incremental extreme learning machine (Im-ELM). The proposed model, named HTSDBNE, has two modelling steps. First, drawing on the idea of parallelization, a hybrid model constructed from the time series model, DBN, and Im-ELM forecasts short-term passenger flow at different time scales hierarchically and in parallel. Second, Im-ELM is used to analyse the relationship among the forecasting results from the first step, and the weighted outputs of Im-ELM are taken as the final forecasting results. Compared with single forecasting models and typical hybrid forecasting models, the test results indicate that HTSDBNE performs better. The mean absolute percent error of the prediction results is around 10%, which fully meets the application requirements of bus operation enterprises.


Introduction
Current urban bus transport systems face a growing number of problems, such as irregular bus arrivals and overcrowded or empty carriages, which cause passenger delays, poor ride experiences, and waste of transport resources. Thus, many enterprises try to set the timetable dynamically in real time based on passenger flow variations and, with a predictive capability, to provide services in a proactive rather than a reactive manner [1,2]. Short-term passenger flow prediction, in which the forecasting time interval does not exceed 60 minutes, is essential for setting the timetable in real time. It is one of the most significant foundations for operation planning and decision making, helping to utilize transport resources rationally, solve or ease transport problems, and provide better bus services [3,4].
In recent decades, short-term passenger flow prediction has drawn widespread attention, and various methods have been proposed, which can generally be categorized as linear models, nonlinear models, and combination models [5]. Because passenger flow statistics are naturally time dependent, linear models, such as autoregressive integrated moving average (ARIMA), autoregressive moving average (ARMA), and autoregressive (AR) models, are widely used for simple short-term passenger flow prediction. Ma et al. [6] and Xue et al. [7] constructed combined forecasting models based on multiple time series algorithms to forecast the changes of passenger flow in different time periods. However, linear models are limited in their applications and struggle to describe the variation characteristics of passenger flow. To track the nonlinear characteristics of real passenger flow, researchers have introduced many nonlinear methods, such as the support vector machine (SVM) model [8], least squares support vector machine (LSSVM) model [9], fuzzy neural networks [10,11], Bayesian networks [12,13], radial basis function neural networks (RBF-ANN) [14-17], and grey models [18-20]. The core idea of these nonlinear methods is to model the nonlinear relationships in passenger flow and to mine more potential information without prior knowledge [21]. However, nonlinear models are closely tied to the specific application environment and lack universality.
Each method mentioned above has its own advantages and limitations in real applications, and it is hard for a single method to cover all characteristics of passenger flow and provide the best prediction performance globally [6]. Thus, hybrid prediction models, which combine multiple algorithms strategically, can make full use of the advantages of different algorithms while offsetting the shortcomings of each, and they have become common practice for improving prediction performance. For example, Sun et al. [22] proposed a hybrid model, Wavelet-SVM, which uses SVM to capture passenger flow characteristics at the different frequencies generated by wavelet decomposition. Yang and Liu [8] introduced affinity propagation to cluster the passenger flow based on characteristic analysis and then used SVM to predict each subset; the prediction accuracy was improved significantly. Liu et al. [23] proposed a combined prediction model, BP-LSSVM, in which the initial prediction results from BP are further refined by LSSVM to obtain a better predicted passenger volume. Wang et al. [24] used the Levenberg-Marquardt algorithm to optimize BP and constructed the SLMBP model with the Spearman rank correlation coefficient method [25] to predict passenger flow. Among these hybrid methods, ANN-based models have been widely used because of their better predictive ability [26].
Furthermore, in recent years, deep learning (DL) has attracted considerable academic and industrial interest [27]. Some DL-based hybrid models have been applied to passenger flow prediction, as they can represent complex nonlinear relationships and capture latent correlative features from passenger flow data. Liu et al. [27] proposed an end-to-end DL architecture for short-term metro passenger flow prediction. Bai et al. [21] presented a multipattern deep fusion approach (MPDF), constructed by fusing deep belief networks (DBNs) corresponding to multiple patterns; it uses DBNs as a deep representation of the passenger flow in each pattern generated by the affinity propagation algorithm. Ke et al. [28] proposed a novel DL approach named the fusion convolutional long short-term memory network. It stacks and fuses multiple convolutional long short-term memory (LSTM) layers, standard LSTM layers, and convolutional layers to capture the spatiotemporal correlations of passenger demand accurately. Liu et al. [29] presented a hybrid passenger estimation system based on the convolutional neural network (CNN) and the spatiotemporal context (STC) model. CNN is used to detect passengers, and STC is then used to track them so as to accurately estimate the passenger volume.
In summary, the linear model is simple in structure, nonlinear algorithms are more accurate than linear ones, and the combination model is more adaptable. However, existing research mainly has the following shortcomings.
(1) Data problem: the passenger flow samples studied in most research come from the automatic fare collection (AFC) system [30], which cannot cover passengers who buy tickets with cash. With the equipment currently in use, most bus AFC systems cannot transmit ticket information to the bus operation enterprise in real time, in which case the data samples cannot be used for real-time prediction.
(2) Passenger flow is complex time series data with its own microscopic and macroscopic characteristics. Most studies do not analyse it from this perspective; they consider only the global characteristics or analyse only the nonlinear nature.
(3) Passenger flows at different time scales are correlated with each other. However, many studies simply assume a linear relation between their patterns [31,32], which can lead to underestimation or degraded performance, and most hybrid models are too complex to be used in practice.
To solve the problems mentioned above, make full use of the advantages of linear and nonlinear models, improve the universality and accuracy of the models, and reduce model complexity, this paper proposes a hierarchical hybrid forecasting model for short-term passenger flow based on a time series model, DBN, and Im-ELM, called HTSDBNE. The real-time passenger flow data collected by automatic passenger counting (APC) systems [25] are selected as statistical samples for forecasting. HTSDBNE performs forecasting in two steps: (a) using the time series model and the subhybrid model, consisting of DBN [33,34] and Im-ELM [35,36], to analyse the statistical data and forecast the variation trends of the passenger flow, and (b) analysing the relationship between the real-time and historical passenger flow and making full use of passenger flow series data at different time scales to improve the accuracy of the final result. The rest of the paper is organized as follows. Section 2 describes the structure of the bus passenger flow sampling data. Section 3 discusses the novel hybrid forecasting model. Section 4 presents the comparative experiments and analysis. Finally, Section 5 summarizes the main findings and future work.

Bus Passenger Flow Statistics
There are three main approaches to obtaining bus passenger flow statistics: manual surveys on buses or at stops, the AFC system, and the APC system. Manual statistics are now rarely used because of their low efficiency and high cost. Because the AFC system is widely installed on buses, passenger flow can be inferred from passenger ticket information, and it has become the main source of passenger flow statistics. However, in the current bus system, a considerable number of passengers buy tickets with cash, so the passenger flow statistics from the AFC system cannot cover all passengers. The APC system can count passenger flow relatively comprehensively and accurately and has become one of the important development directions for bus passenger flow statistics. The passenger flow statistics used in this paper are from the APC system, and the structure of the record is shown in Table 1 [5].

Journal of Advanced Transportation

Dataset Definition.
The original passenger flow statistics set generated by the APC installed on the bus is related to the bus outbound time. This paper defines the relevant dataset as follows.
Define the arrival bus sequence at a station as the ordered sequence $\mathrm{busID}_1, \ldots, \mathrm{busID}_n$, where $\mathrm{busID}_1$ is the first bus arriving at the station in one day and $\mathrm{busID}_n$ is the last bus arriving at the station in one day. Define the passenger flow statistics sequence of a stop in a day as an ordered list of records, where stopID is the number of the bus stop and $t_i$ is the time when the passenger flow statistics are uploaded to the database in real time after the bus with $\mathrm{busID}_i$ leaves the stop. The count is the number of passengers boarding the bus with $\mathrm{busID}_i$ at the stop with stopID.

Data Sample Analysis.
In this paper, the passenger flow statistics used are derived from the APC system installed on the buses of line 28 and line 10 in Dalian, China. A few stops with large passenger flow volumes determine the passenger flow variation of the whole line. Therefore, in related research, the bus stops with relatively large passenger flow are usually selected as research objects. First, the daily average passenger flow of each station in the up direction of line 28 over the past six months was counted. The result is shown in Figure 1.
As shown in Figure 1, the origin station (station 1), station 3, and station 7 had an average daily ridership of over 1000. These stations are important stops in the up direction of line 28. In this paper, station 7 is selected as the research object for short-term passenger flow forecasting. The sample data from station 7 cover 26 weeks, from Monday 1 October 2018 to Sunday 31 March 2019. Part of the original data is plotted in Figure 2.
As shown in Figure 2, the original passenger flow statistics are related to the arrival time of the buses. Because of traffic congestion or other reasons, it is difficult for each bus to arrive at the stop on time according to the timetable, so the data sample intervals are unequal. To reduce the instability of the passenger flow statistical sequence caused by abnormal factors, the data are aggregated over equal time intervals to construct a time series, as defined below.
Generally, the time interval is determined by the bus scheduling plan and is not less than the minimum departure interval. In applied research on short-term passenger flow forecasting, the maximum time interval cannot exceed 60 minutes; equal time intervals of 5, 15, and 30 minutes are selected in this paper.
The data segmentation time points are determined according to the time interval and are defined starting from $t_1$, the earliest time at which the first bus leaves the stop in all statistics. The maximum index $\max(l)$ is defined in terms of $t_n$, the latest time at which the last bus leaves the stop. The new passenger flow statistical sequence is formed by aggregation over equal time intervals. According to equation (7), the passenger flow statistics of station 7 are aggregated at an interval of 30 minutes, and the statistical results are shown in Figure 3.
It can be concluded from the curve that daily passenger flow shows a double-peak pattern, with peaks in the morning and evening. On working days, the early peak period runs from 7:30 to 9:30 and the evening peak period from 16:30 to 18:30. In the following study, the model proposed in this paper is used to forecast the changes of passenger flow during the morning peak period, and the datasets are aggregated at an interval of 30 minutes. As Figure 3 shows, the passenger flow statistical sequence has time-periodic variation characteristics with linear correlation. However, the changes in each cycle are not exactly the same and show obvious nonlinear characteristics. Therefore, it is necessary to combine linear and nonlinear methods to describe the passenger flow statistical sequence in order to forecast the passenger flow variation accurately.
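The equal-interval aggregation described in this subsection can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `aggregate_counts` and the record layout (timestamp, count) are hypothetical, and the bin edges are assumed to be the earliest timestamp plus integer multiples of the interval, because the segmentation equations themselves are not reproduced here.

```python
from datetime import datetime, timedelta


def aggregate_counts(records, interval_minutes=30):
    """Sum boarding counts into equal-width bins.

    records: list of (timestamp, count) pairs in any order.
    Returns a list of (bin_start, total_count), empty bins included.
    Assumption: bins start at the earliest timestamp in the records.
    """
    if not records:
        return []
    width = timedelta(minutes=interval_minutes)
    t_first = min(t for t, _ in records)
    t_last = max(t for t, _ in records)
    # Enough bins to cover every record, including the latest one.
    n_bins = (t_last - t_first) // width + 1
    totals = [0] * n_bins
    for t, count in records:
        totals[(t - t_first) // width] += count
    return [(t_first + k * width, total) for k, total in enumerate(totals)]


if __name__ == "__main__":
    day = datetime(2018, 10, 1)
    sample = [
        (day.replace(hour=7, minute=31), 12),
        (day.replace(hour=7, minute=45), 8),
        (day.replace(hour=8, minute=5), 20),
        (day.replace(hour=9, minute=10), 5),
    ]
    for start, total in aggregate_counts(sample, 30):
        print(start.strftime("%H:%M"), total)
```

Keeping empty bins in the output preserves the equal spacing that the time series models in the following sections rely on.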

Hybrid Forecasting Model
Theoretical and empirical findings have already indicated that integrating different models is an effective way to improve forecasting performance and to make up for the shortcomings of each model. The proposed hybrid forecasting model builds on previous work. Khashei and Bijari [37] and Zhang [32] combined linear models with a neural network model, using the linear model to identify and magnify the linear structure of the data and then using the neural network to model the preprocessed data in order to improve the prediction accuracy. Some works [32,37-40] considered the importance of the residual series of time series data and combined it with the time series forecasting results to improve the performance of the hybrid model. Drawing on these works and on some ideas from online sequential algorithms, a novel hybrid forecasting model is proposed.
In general, the novel model performs data modelling in two sequential steps. The first is hybrid time series modelling, which analyses the time series in terms of its linear and nonlinear characteristics. The second is nonlinear data modelling, which analyses the results of the first step across different time scales or across time and spatial scales. Figure 4 shows the process of the first step, which consists of three sequential substeps.
(I) Using the linear model to forecast the time series data. Given the training time series set $X_t$, the forecasting result is $X_{t,L}$. Because the passenger flow statistical series is unstable, the result $X_{t,L}$ is itself unstable and cannot be used as the final result.
(II) Analysing and forecasting the residual series. Based on the real value $X_t$ and the time series forecasting result $X_{t,L}$, the error or residual is calculated as their difference, $E_t = X_t - X_{t,L}$, and the residual series is then used to train the nonlinear model, whose output is $X_{t,NL}$. The nonlinear model is a hybrid model consisting of DBN and Im-ELM, in which Im-ELM is used to forecast the residual.
(III) Nonlinear combination analysis. In this substep, for the input data $X_{t,L}$ and $X_{t,NL}$, the nonlinear model $X_t = f(X_{t,L}, X_{t,NL})$ is used to analyse and describe the relationship between the time series forecasting results and the corresponding residual series, with the aim of maximizing the combination performance.
Figure 5 shows the process of the second step, which receives several inputs, $X_t^A, \ldots, X_t^K$. These data may come from different spatially cross-correlated sections or from different spatial and temporal points. In real applications, the relationship among the input datasets cannot be described simply by a linear model alone or a nonlinear model alone; in the hybrid model, Im-ELM is employed to resolve this problem. "Forecasting Model A", ..., "Forecasting Model K" are composite prediction models of the kind described in Figure 4.
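The data flow of the three substeps can be sketched as below. This is only a structural illustration under stated assumptions: `hybrid_forecast` and the three stand-in components (`persistence`, `mean_residual`, `additive`) are hypothetical placeholders, not the ARIMA, DBN and Im-ELM models used in the paper, and the additive combiner is a deliberate simplification of the nonlinear function $f$.

```python
def hybrid_forecast(history, linear_model, residual_model, combiner):
    """One-step-ahead forecast following substeps (I) to (III).

    history: observed values X_1..X_t (at least two).
    linear_model(past) -> forecast of the value that follows `past`.
    residual_model(residuals) -> forecast of the next residual.
    combiner(x_l, x_nl) -> final forecast.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # (I) linear forecasts X_{t,L} for every point that has a past.
    linear_fits = [linear_model(history[:i]) for i in range(1, len(history))]
    # (II) residual series E_t = X_t - X_{t,L}, then its forecast X_{t,NL}.
    residuals = [x - f for x, f in zip(history[1:], linear_fits)]
    x_l = linear_model(history)
    x_nl = residual_model(residuals)
    # (III) combine the linear forecast and the residual forecast.
    return combiner(x_l, x_nl), residuals


# Hypothetical stand-ins, chosen only to keep the sketch self-contained.
def persistence(past):
    return past[-1]


def mean_residual(residuals):
    return sum(residuals) / len(residuals)


def additive(x_l, x_nl):
    return x_l + x_nl


if __name__ == "__main__":
    forecast, residuals = hybrid_forecast(
        [10, 12, 15, 19], persistence, mean_residual, additive
    )
    print(forecast, residuals)
```

Passing the components as callables keeps the pipeline independent of the specific linear and nonlinear models plugged into it.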

Data Sliding Window.
The key point is to determine the data volume before each training or testing run in substeps (I) and (II). Drawing on the idea of online sequential algorithms [41], a self-adaptive sliding data window is employed to represent the system dynamics [41-43]; it can also dynamically adjust the structure of the time series model and the neural network model. The data sliding window is a first-in-first-out data sequence whose width can be fixed or dynamically adjusted.
In real applications, the data are received one by one or chunk by chunk. When the volume of received data $s$ is less than the sliding window width $N$ ($1 \le s < N$), the sliding window is updated by adding the new data and discarding the oldest ones. However, in some extreme conditions, the received data volume $s$ is larger than the sliding window width $N$ ($s > N$). To preserve the continuity of the input data in substeps (I) and (II), the foremost $l$ data are then selected from the $s$ received data, with $l < s$ and $l < N$. The data sliding window is expressed by the input samples and the corresponding output results, that is, by data pairs, as shown in the following equation:

In equation (11), $N$ is the width of the sliding window $W_{SD}$ and also denotes the number of pairs of input data and output results, and $t$ is the time index of the newest data.
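A first-in-first-out window of this kind can be sketched with a bounded deque. The class name `SlidingWindow` and the `keep` parameter (playing the role of $l$) are hypothetical, and "foremost" is assumed to mean the earliest items of an oversized chunk.

```python
from collections import deque


class SlidingWindow:
    """FIFO window holding at most `width` (input, output) pairs."""

    def __init__(self, width):
        if width < 2:
            raise ValueError("width must be at least 2")
        self.width = width
        self.pairs = deque(maxlen=width)  # oldest pairs fall out automatically

    def update(self, chunk, keep=None):
        """Add a chunk of (input, output) pairs and return the window.

        If the chunk is longer than the window (s > N), only its first
        `keep` pairs are used, with keep < N (default: N - 1).
        """
        if len(chunk) > self.width:
            limit = self.width - 1 if keep is None else keep
            if not 0 < limit < self.width:
                raise ValueError("keep must satisfy 0 < keep < width")
            chunk = chunk[:limit]
        self.pairs.extend(chunk)
        return list(self.pairs)


if __name__ == "__main__":
    window = SlidingWindow(3)
    print(window.update([(1, 2), (2, 3)]))
    print(window.update([(3, 4), (4, 5)]))
```

A fixed `maxlen` corresponds to the fixed-width case; a dynamically adjusted width would require rebuilding the deque with a new bound.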

Time Series Forecasting Model.
Time series forecasting is an important phase of the whole model; it is mainly used to analyse the real-time and historical data and to forecast future changes in the time series. Generally, time series analysis establishes mathematical models by curve fitting and parameter estimation. The basic model is ARMA, whose mathematical description is given by equation (12), where $Y_t$ are the time series data; $\varphi$ is the coefficient of the autoregressive model; $p$ is the order of the autoregressive model; $\varepsilon$ is the white noise series, which follows a normal distribution with zero mean; $\theta$ is the coefficient of the moving average model; and $q$ is the order of the moving average model. ARMA($p$, $q$) is used to analyse stationary stochastic processes, but the time series data in some fields rise and fall dramatically and also show periodic fluctuation. Such time series data are nonstationary stochastic processes, which can be modelled by ARIMA($p$, $d$, $q$), shown as equation (13). It is the ARMA($p$, $q$) model with a differencing operation.
where $B$ is the backward shift operator, $BY_t = Y_{t-1}$, and $d$ is the order of differencing.
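For reference, the standard textbook forms of the two models, written with the symbols defined above, are sketched below. The plus sign on the moving average coefficients is an assumed convention; the paper's own equations (12) and (13) may state the signs differently.

```latex
% ARMA(p, q)
Y_t = \sum_{i=1}^{p} \varphi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARIMA(p, d, q): the same model applied to the d-times differenced series
\Bigl(1 - \sum_{i=1}^{p} \varphi_i B^{i}\Bigr) (1 - B)^{d} Y_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr) \varepsilon_t
```

With $d = 1$, the factor $(1 - B) Y_t = Y_t - Y_{t-1}$ is the first-order difference of the series.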

Improved Extreme Learning Machine Model.
ELM, a special learning algorithm for the single-hidden-layer feed-forward neural network, is employed to forecast the data at the next time step. It only needs to determine the number of hidden layer neurons and their output weights; the input weights and the thresholds of the activation function are set randomly and remain unchanged. Given the training samples, the output of ELM is determined by the output weight $\beta$, an $L \times m$ matrix connecting the hidden layer and the output layer, and $Y = [y_1, \ldots, y_N]^T_{N \times m}$ are the output results of ELM. After the hidden layer neurons are fixed, ELM seeks the optimal output weight matrix $\beta$ that minimizes the output error; the theory of ELM gives this optimum in closed form. To improve self-adaptability, Huang et al. [44,45] extended ELM and proposed the incremental extreme learning machine algorithm (I-ELM). The basic idea of I-ELM is to update $\beta$ dynamically using the residual $e_L$ and the output $H_{L+1}$ before and after a new hidden layer neuron is added. To improve the stability and generalization ability of I-ELM, Im-ELM is proposed, in which the parameters $\omega_{k,L+1}$ and $b_{L+1}$ of the newly added neurons are not generated randomly. Following the idea in [46,47], the two parameters $\omega_{k,L+1}$ and $b_{L+1}$ are determined dynamically by the chaos optimization algorithm (COA), which has a highly efficient global search ability. In COA, chaotic states are introduced into the optimization variables. The ergodic range of chaos is mapped onto the range of the optimization variables, and the first and second carrier waves, shown as equations (16) and (17), are used to find the optimal solutions that meet the termination conditions.
where $i = 1, \ldots, n$ indexes the optimization variables; $j = 1, \ldots, p$ indexes the optimization variables mapped by the multiple chaos variables; $[a, b]$ is the definition domain of $X_{ij}$; $t$ is the iteration number; and $d$ is the amplification gain. For better prediction accuracy, equation (17) is transformed into a form with a regulator $\kappa$. The optimization objective function is $f(\cdot) = \sum_{i=1}^{N} (y_i - y_i')^2 / N$, where $y_i$ and $y_i'$ are the target value and the prediction value of the $i$-th sample, respectively, and $y_i'$ can be calculated by using equation (15).
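The idea of growing the hidden layer one neuron at a time, while picking each new neuron's parameters from chaotically generated candidates instead of plain random draws, can be illustrated with the simplified sketch below. It is not the paper's Im-ELM: the carrier wave equations (16) and (17) and the regulator $\kappa$ are not reproduced, the chaotic values are used directly as weights in $[-1, 1]$, and all function names are hypothetical. The output weight of each new neuron follows the standard I-ELM rule, the inner product of the current residual and the neuron's output divided by the squared norm of that output.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chaotic_sequence(x0, n):
    """Iterate x <- sin(2 / x); every value lies in [-1, 1]."""
    values, x = [], x0
    for _ in range(n):
        x = math.sin(2.0 / x) if x != 0.0 else 0.1  # avoid dividing by zero
        values.append(x)
    return values


def hidden_output(w, b, X):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) for row in X]


def fit_incremental_elm(X, y, max_neurons=20, candidates=10, seed=0):
    """Return a list of (input weights, bias, output weight) triples."""
    rng = random.Random(seed)
    d = len(X[0])
    residual = list(y)  # with no neurons the network outputs zero
    neurons = []
    for _ in range(max_neurons):
        chaos = chaotic_sequence(rng.uniform(0.1, 0.9), candidates * (d + 1))
        best = None
        for c in range(candidates):
            params = chaos[c * (d + 1):(c + 1) * (d + 1)]
            w, b = params[:d], params[d]
            h = hidden_output(w, b, X)
            # I-ELM rule: beta = <e, h> / <h, h>
            beta = sum(e * v for e, v in zip(residual, h)) / sum(v * v for v in h)
            new_residual = [e - beta * v for e, v in zip(residual, h)]
            mse = sum(e * e for e in new_residual) / len(new_residual)
            if best is None or mse < best[0]:
                best = (mse, w, b, beta, new_residual)
        _, w, b, beta, residual = best
        neurons.append((w, b, beta))
    return neurons


def predict(neurons, row):
    return sum(
        beta * sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
        for w, b, beta in neurons
    )
```

Because each output weight is the least-squares optimum for its neuron given the current residual, the training error cannot increase when a neuron is added; the candidate search only decides how quickly it falls.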
Based on the analysis above, Im-ELM is described in detail as follows:

Deep Learning Model.
DBN, a generative deep learning model, is employed in this paper. It consists of stacked restricted Boltzmann machines (RBMs) [48], each with only one hidden layer [49]. Because it integrates feature learning and deep learning, it offers fast analysis and a high data fitting ability [50,51].
RBM is a special kind of Markov random field that consists of two parts: a visible layer and a hidden layer. In each RBM, the visible variables $v$ are connected to the hidden units $h$ by undirected weights $\omega$ [52]. RBM is regarded as an energy model with a defined energy function [53]. The DBN stacks several RBMs as an unsupervised network from the visible layer to the hidden layer.
The hidden layer of one RBM is the visible layer of the subsequent RBM, and the training process can be divided into two steps. The first step is unsupervised learning. In this process, the training samples are transformed layer by layer, and better initial parameters $\omega_{i,j}$, $\alpha_i$, and $b_i$ are obtained. $\omega_{i,j}$ is the symmetric weight connecting the hidden unit $j$ and the visible unit $i$, $\alpha_i$ is the bias of the visible unit, and $b_i$ is the bias of the hidden unit. The second step is supervised learning, in which learning algorithms are used to optimize the parameters obtained in the first step. Finally, the optimal parameters are selected through a global fine-tuning process. After the training of the RBMs is finished, the DBN features can be extracted from the topmost hidden layer [54].
To determine the parameters $\alpha$, $b$, and $\omega$ of each RBM, the contrastive divergence (CD) algorithm is adopted in the first training step to train the RBMs one by one [48,55]. CD is a fast learning algorithm that uses one-step Gibbs sampling to obtain a good approximation. The CD process has four main steps: (1) set the first visible layer variables to the input samples; (2) from the visible layer to the hidden layer, update the hidden layer variables by $P(h_j = 1 \mid v)$ based on the known visible layer states; (3) negative phase: based on the hidden layer states from the second step, reconstruct the visible layer by $P(v_i = 1 \mid h)$; (4) update the weights. In the updating criterion of the parameters, $v_i$ and $v_i'$ are the states of the $i$-th neuron before and after reconstructing the visible layer, respectively; $h_j$ and $h_j'$ are the states of the $j$-th neuron before and after reconstructing the visible layer; and $\eta$ is the learning rate.
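The four CD steps can be sketched for a small binary RBM as follows. This is a minimal illustration of one-step contrastive divergence in its commonly used form, not the paper's implementation: the function names are hypothetical, `a` stands for the visible bias $\alpha$, and hidden probabilities rather than sampled hidden states are used in the update, which is a common practical choice and an assumption here. The standard RBM energy function is included for reference.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rbm_energy(v, h, W, a, b):
    """Standard RBM energy: -a.v - b.h - v^T W h."""
    visible = sum(ai * vi for ai, vi in zip(a, v))
    hidden = sum(bj * hj for bj, hj in zip(b, h))
    pair = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return -visible - hidden - pair


def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update, in place. W[i][j] links visible i and hidden j."""
    n_v, n_h = len(a), len(b)
    # (1) v0 is the input sample. (2) Visible to hidden: P(h_j = 1 | v0).
    p_h0 = [sigmoid(b[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0 = [1 if rng.random() < p else 0 for p in p_h0]
    # (3) Negative phase: reconstruct the visible layer from h0.
    p_v1 = [sigmoid(a[i] + sum(h0[j] * W[i][j] for j in range(n_h))) for i in range(n_v)]
    v1 = [1 if rng.random() < p else 0 for p in p_v1]
    p_h1 = [sigmoid(b[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # (4) Update: data statistics minus reconstruction statistics.
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(n_h):
        b[j] += lr * (p_h0[j] - p_h1[j])
    return v1
```

In a DBN, the hidden activations of a trained RBM would then serve as the visible input of the next RBM in the stack.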

Experiments and Analysis
To illustrate and verify the proposed model, passenger flow statistics at station 7 of line 28 and station 8 of line 10 are used as experimental samples. The sampling period of the original dataset is from October 1, 2018, to March 31, 2019, a total of 6 months. The time ranges are 7:30 to 9:30 in the morning peak hours and 16:30 to 18:30 in the evening peak hours. As the model input, the original dataset is aggregated into a time series with a 30-minute interval based on equations (6)-(10), and the data from station 7 of line 28 are shown in Figure 6. The performance of the models is evaluated by the mean-squared error (MSE) and the mean absolute percent error (MAPE).
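The two evaluation measures can be computed as in the following sketch, which uses their usual definitions; the function names are illustrative, and MAPE is returned in percent.

```python
def mse(actual, predicted):
    """Mean-squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error, in percent; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    observed = [100, 200]
    forecast = [110, 180]
    print(mse(observed, forecast), mape(observed, forecast))
```

Because MAPE divides by the observed value, intervals with zero passengers would have to be excluded or handled separately.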

Time Series Model.
The following presents the modelling process for passenger flow analysis using the time series model, based on the content described in [5]. As shown in Figure 6, the sequence shows significant instability. After first-order differencing, the unit root test results are shown in Table 2. When the additional item is "Intercept" and the null hypothesis is $H_0: \delta = 0$, the t-statistic of the unit root test is −7.962247, which is clearly less than the critical values at the 1%, 5%, and 10% significance levels, namely −3.886751, −3.052169, and −2.666593, respectively. Obviously, the statistical value of the ζ test is less than the corresponding DW critical values. As a result, after first-order differencing, the data sequence is stationary and can be analysed by time series models.

Model Selection.
Selecting the time series model means identifying the orders of the autoregressive ($p$) and moving average ($q$) terms. The orders can be obtained by calculating the autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) of the sequence, and the general judgment rules are shown in Table 3 [56]. From the results in Figure 7, at lag = 2 both the ACF and the PACF tail off exponentially. According to Table 3, the ARMA model is preliminarily selected, with its parameters $p$ and $q$ limited to the interval [1,2].
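Differencing and the sample autocorrelation function used in this identification step can be sketched as follows. The function names are illustrative, and the ACF uses the usual biased estimator normalized by the lag-0 sum of squares; the PACF, which requires an additional recursion, is omitted.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    values = list(series)
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series has no defined autocorrelation")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    flow = [120, 135, 160, 210, 190, 150, 140, 155, 205, 215]
    print(acf(difference(flow), 3))
```

Plotting these values against the lag gives the kind of correlogram that is compared with the judgment rules when choosing $p$ and $q$.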

Parameters Estimation.
The goodness of fit needs to be tested based on the information criteria AIC, SC, and HQC so as to determine the lag orders $p$ and $q$ and the other parameters. Based on their interval, the four models ARIMA(2, 1, 2), ARIMA(2, 1, 1), ARIMA(1, 1, 2), and ARIMA(1, 1, 1) are constructed. With the sample size set to 40, 50, and 60, each model is tested three times. Finally, the minimum values of AIC, SC, and HQC are obtained, as shown in Table 4. The minimum AIC is 7.108482, the minimum SC is 7.306342, and the minimum HQC is 7.135764. After comprehensive analysis, ARIMA(2, 1, 2) performs best among the four models and is selected as the time series model.
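The three criteria can be computed from a fitted model's log-likelihood as in the sketch below, which uses their standard definitions. The function names are hypothetical, and the optional division by the number of observations is an assumption about reporting convention: some statistical packages report the criteria in that scaled form, and the scale used for Table 4 is not stated here.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs, per_observation=False):
    """Return AIC, SC (Schwarz/BIC) and HQC for one fitted model.

    Requires n_obs >= 3 so that log(log(n_obs)) is positive.
    """
    if n_obs < 3:
        raise ValueError("n_obs must be at least 3")
    aic = 2 * n_params - 2 * log_likelihood
    sc = n_params * math.log(n_obs) - 2 * log_likelihood
    hqc = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    scale = n_obs if per_observation else 1
    return {"AIC": aic / scale, "SC": sc / scale, "HQC": hqc / scale}


def select_best(candidates, criterion="AIC"):
    """candidates: mapping of model name -> criteria dict; smaller is better."""
    return min(candidates, key=lambda name: candidates[name][criterion])
```

A typical use is to compute the criteria for each candidate order pair and keep the model that minimizes them; when the criteria disagree, a judgment such as the comprehensive analysis mentioned above is needed.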

Model Testing.
To verify the performance of the selected model, we need to test whether the residual series is a white noise sequence by calculating the ACF and PACF. The results are shown in Figures 8 and 9. In Figure 8, the results across lag orders show that the residual series is clearly a white noise sequence. Figure 9 shows the fitting curve of the real-world data and the forecasting results for the early peak period. The fitting quality is significantly reduced because of traffic jams in the morning peak. This result shows that a single linear model cannot describe well the nonlinear factors affecting passenger flow changes.
In this paper, historical data on a large time scale are selected as auxiliary input to improve the prediction, and the analysis model for the historical data needs to be determined through the operations above. After analysis, ARIMA(2, 1, 2) is selected for the historical data.
Figures 10 and 11 show the changes in the number of hidden layer neurons and in the learning error during the training process; $X_{t+1} = \sin(2/X_t)$ is selected as the chaotic map function, and the regulator is $\kappa = 0.45$. As the number of hidden neurons increases, the training error decreases. As Figure 11 shows, the error decreases rapidly at first, and once the number of hidden neurons exceeds 20, the error tends to be stable. The final result is acceptable. To test the performance of Im-ELM, a segment of the passenger time series data is selected from the whole dataset. Figure 12 shows the training and validation process. As training proceeds, the training error decreases, and the error stabilizes and reaches its optimal result when the MSE is around 10. The forecasting error on the real-world data is also around 10. Table 5 compares Im-ELM with other models (SA-ELM [57], ImSAP-ELM [58], and ELM). Because COA is introduced, more time is needed to optimize the parameters of the new neurons in each iteration. The training time of Im-ELM is longer than that of SA-ELM but shorter than those of ImSAP-ELM and ELM. The difference between Im-ELM and SA-ELM is only about 0.12 seconds, which is acceptable. Im-ELM uses 23 hidden layer neurons, fewer than the other models, and its accuracy is the best, which makes it suitable for the hybrid model and its applications.

DBN Analysis.
In DBN, the numbers of hidden layers and hidden neurons are determined layer by layer using the enumeration method. Table 6 compares DBN with other models. The prediction accuracy of DBN increases with the number of hidden layers; however, too many hidden layers or neurons may reduce the prediction accuracy. As Table 6 shows, DBN-4 (with 3 hidden layers and 150 neurons in each hidden layer) performs best and is selected as part of HTSDBNE.

Hybrid Model Testing and Analysis.
Time series data for working days and nonworking days are used to test the proposed model HTSDBNE. Tables 7 and 8 and Figures 13 and 14 compare HTSDBNE with other models (ARIMA(2, 1, 2), ELM, TS-ANN, SLMBP, SAE-DNN [59], and MPDF [21]). In off-peak time, such as 7:30 am, all models show good accuracy and small MSE and MAPE. However, in the peak time of working and nonworking days, especially in the critical zone, HTSDBNE shows a clear advantage. For example, in Figure 13, at 8:30 am the MSE of HTSDBNE is 8.24, far lower than the 18.13 of ARIMA(2, 1, 2), and the accuracy is also significantly better than that of the other models. At a few time points (marked in bold in Tables 7 and 8 and with red asterisks in Figures 13 and 14), the result of HTSDBNE is weaker than those of SAE-DNN and MPDF, but the difference is very small; the largest difference is only 1.4257%. Figures 15 and 16 show the trends of the real-world data and the forecasting results for each working day at station 7 of line 28. Because of space constraints and the similarity of the results, the analysis of station 8 of line 10 is omitted here. In terms of curve fitting, HTSDBNE is better from 9:00 to 9:30 and from 16:30 to 18:30. The difference between the real-world data and the forecasting results of HTSDBNE is around 2.5, whereas the smallest difference among the other models is around 3.2. HTSDBNE has the best performance compared with the other models. At 7:30 am, ARIMA(2, 1, 2) does not capture the real changes in passenger flow, and at 8:00 and 17:00 the road section is at its peak and congested, which leads to a large forecasting delay for ELM, TS-ANN, and SLMBP. HTSDBNE, SAE-DNN, and MPDF are relatively successful in capturing the sharp changes in passenger flow, but HTSDBNE has the better forecasting accuracy.
At the end of the early peak and evening peak hours, the prediction results of HTSDBNE are the most consistent with the actual situation. The results indicate that HTSDBNE shows better performance and applicability than the other models in both peak and off-peak time.
The passenger flow peak hour on nonworking days is 30 minutes later than on working days. In Figure 17, the passenger flow rises sharply from 8:00 am to 8:30 am. ARIMA(2, 1, 2), ELM, TS-ANN, SLMBP, SAE-DNN, and MPDF capture this upward trend, but their prediction accuracy is not high, with a maximum error of more than 20%. HTSDBNE, however, performs well, with an error of only 8.787%. At the end of the morning peak, HTSDBNE successfully forecasts the real status of the passenger flow. Unlike in the morning peak of nonworking days, in the evening peak hours shown in Figure 18 the passenger flow increases gradually, and the three hybrid models (SAE-DNN, MPDF, and HTSDBNE) describe this characteristic. HTSDBNE has the best forecasting accuracy, with a maximum difference of only about 2.

Conclusions
In this paper, the original passenger flow statistical data were analysed in depth and constructed as a time series with an aggregation interval of 30 minutes. Based on the characteristics of passenger flow variation, a novel hybrid forecasting model, HTSDBNE, is proposed, which consists of ARIMA, DBN, and Im-ELM. In the first step, ARIMA is used to analyse the stability of the time series of the historical data and real-time data and then make a [...] the big data computing environment, in order to improve the computational efficiency and to support real-time forecasting for all bus lines in the whole city.
Data Availability
The bus passenger flow data used to support the results of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.