Bus Arrival Time Prediction Using Wavelet Neural Network Trained by Improved Particle Swarm Optimization

Prediction of bus arrival time is an important part of intelligent transportation systems. Accurate prediction can help passengers make travel plans and improve travel efficiency. Given the nonlinearity, randomness, and complexity of bus arrival time, this paper proposes the use of a wavelet neural network (WNN) model trained by an improved particle swarm optimization (IPSO) algorithm that replaces the gradient descent method. The proposed IPSO-WNN model overcomes the main limitation of the gradient-based WNN, which is easily trapped in local optimum solutions that halt the training process, and thus improves prediction accuracy. Application of the model is illustrated using operational data of an actual bus line. The results show that the proposed model is capable of accurately predicting bus arrival time: compared with the gradient-based WNN, the root-mean-square error and the maximum relative error were reduced by 49% and 42%, respectively.


Introduction
In recent years, with the accelerated pace of China's urbanization process, urban transport problems have become increasingly prominent. Public transport is widely regarded as the best choice to solve traffic problems and improve the urban environment [1]. The Chinese government proposed the "Give Priority to the Development of Urban Public Transport" policy in 2004 and released "Give Priority to the Development of Urban Public Traffic Guidance" in 2012. Advanced urban public transport systems are under construction and will continue to improve. Bus arrival time prediction is the core content of such systems for bus travel information and bus travel-route guidance. It is an important part of the urban public transport system.
At present, there are many models for predicting bus arrival time, such as nonparametric regression models, support-vector machine (SVM) models, Kalman filters, artificial neural network (ANN) models, and hybrid models. Lin et al. [2] used the historical data mean method to predict the average bus arrival time delay. Patnaik et al. [3] used automatic passenger count data from buses to establish a multivariable regression prediction model. Sun et al. [4] proposed a model to predict the arrival time using the weighted mean of historical data and real-time global positioning system (GPS) data. Padmanaban et al. [5] proposed an arrival time prediction model that is based on real-time bus data and bus operation delay. Xue et al. [6] developed a mathematical model based on the analysis of the process of bus operation and bus station characteristics.
He et al. [7] proposed a new bus arrival time prediction model with multi-index evaluation, which is based on SVM, and verified its feasibility. Yu et al. [8] developed an SVM prediction model considering the time period and segment, weather, and operation time of current and downstream sections. Li [9] developed a prediction model for road-section operation time based on real-time correction of bus speed. Zuo and Wang [10] developed a finite-state machine forecasting model based on real-time GPS data. Shalaby et al. [11] used a Kalman filter to predict bus running time based on GPS data. Chien and Kuchipudi [12] developed a Kalman filter to predict the arrival time at a bus station based on road and stop characteristics. Vanajakshi et al. [13] proposed the use of automatic vehicle location (AVL) data and a Kalman filter to predict bus arrival time in mixed traffic environments, where model parameters were adjusted in real time according to the prediction error. Park and Rilett [14] argued that the ANN model can provide better prediction performance than the Kalman filter. Chien et al. [15] proposed an adaptive feedback ANN model based on the operation time of arterial segments and stop stations. The model can automatically adjust the parameters according to the real-time prediction error. Lin et al. [16] proposed a two-layer ANN model that considered the effect of time and intersection signal lights, but this model required a large amount of training data. The effect of different weather conditions on bus travel time was analyzed by Bladikas et al. [17].
Hybrid models have also been developed for the analysis of bus operation. Ran [18] proposed a hybrid model that combined multivariable regression and ANN based on real-time bus AVL data. Liu [19] proposed a hybrid prediction model, based on a Kalman filter and ANN, that effectively combined historical and real-time data. Among the existing techniques, ANN has the characteristic of nonlinear adaptive information processing, which provides a great advantage in prediction. In particular, a wavelet neural network (WNN), which combines ANN and wavelet analysis, exhibits good time-frequency localization characteristics together with the self-learning function of a neural network. Therefore, WNN has strong abilities of recognition, fault tolerance, and accurate prediction of bus arrival time. However, the traditional WNN uses the gradient descent learning method to correct the weighting parameters, which results in slow training and the possibility of being trapped in a local optimum solution.
To address the preceding issues, this paper proposes a hybrid model of bus arrival time prediction that combines WNN and an improved particle swarm optimization (IPSO) algorithm. The next sections present the IPSO algorithm, the proposed IPSO-WNN model, and its implementation for bus arrival time prediction. Application of the model to an actual case study is then presented, followed by the conclusions.

Improved Particle Swarm Optimization Algorithm
2.1. Traditional Particle Swarm Optimization. The traditional particle swarm optimization (PSO) is a stochastic computational intelligence method that has a simple structure, with only a few parameters to adjust [1]. Similar to other evolutionary algorithms, PSO is initialized with random particles (potential solutions). However, in PSO, each particle is assigned a random velocity and then flies in the N-dimensional space, where its velocity is dynamically adjusted according to the flying experiences of other particles in the group and its own experience. Subsequently, through an iterative update of the position and speed of the particles, the optimal solution is found. The sketch map of the PSO algorithm is shown in Figure 1.
Let the position and speed of particle $i$ of the population in the N-dimensional solution space be expressed as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iN})$ and $V_i = (v_{i1}, v_{i2}, \ldots, v_{iN})$, respectively. Then, the speed and position of particle $i$ are updated as follows:

$$V_i = \omega V_i + C_1\,\mathrm{rand}()\,(Pbest_i - X_i) + C_2\,\mathrm{rand}()\,(Nbest_i - X_i) \tag{1}$$

$$X_i = X_i + V_i \tag{2}$$

where $V_i$ = speed of particle $i$, $\omega$ = inertia weight, $C_1$ and $C_2$ = learning factors, which refer to the acceleration weights of particles that fly to the individual and group extremums, respectively, rand() = random number between 0 and 1, $Pbest_i$ = position of the optimal solution that particle $i$ has found so far (personal best), $X_i$ = position of particle $i$, and $Nbest_i$ = position of the optimal solution that the neighborhood of particle $i$ has found so far (global best). Appropriate values of $C_1$ and $C_2$ can accelerate the convergence and avoid falling into a local optimum, while a larger maximum speed $V_{\max}$ can guarantee the global search ability of the particle population. The coefficients $\omega$, $C_1$, and $C_2$ determine the space search capacity of the particle. The preceding PSO is a standard algorithm and is the basis for the improvements made in the current research.
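The update rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `pso_step`, the default values of `w`, `c1`, `c2`, and `v_max`, and the clamping of each velocity component to `[-v_max, v_max]` are assumptions made for the example.

```python
import random

def pso_step(positions, velocities, pbest, nbest, w=0.7, c1=2.0, c2=2.0,
             v_max=1.0, rng=random):
    """Apply the velocity and position updates once to every particle, in place.

    positions, velocities, pbest: one list of coordinates per particle.
    nbest: coordinates of the best position found by the whole swarm.
    Defaults are illustrative assumptions, not values from the paper.
    """
    for i in range(len(positions)):
        x, v = positions[i], velocities[i]
        for d in range(len(x)):
            pull_personal = c1 * rng.random() * (pbest[i][d] - x[d])
            pull_global = c2 * rng.random() * (nbest[d] - x[d])
            new_v = w * v[d] + pull_personal + pull_global
            # Clamp the speed so a particle cannot leave the search region in one jump.
            v[d] = max(-v_max, min(v_max, new_v))
            x[d] += v[d]
```

A particle that already sits on both its personal best and the swarm best, with zero velocity, does not move, which is a quick sanity check of the rule.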

PSO Algorithm Improvements.
In PSO, based on the experiences of the group and the particle's own experiences, a particle flies toward the best particle, which gives the algorithm a strong global search ability and leads it to a better solution area. However, in the optimization of complex high-dimensional problems, the traditional PSO algorithm has a stronger global search ability at the start and a stronger local search ability at the end of the process. Therefore, PSO is more likely to become trapped in local optimum solutions at the end. In addition, the search performance of the algorithm depends on the values of the parameters. To address these limitations, two improvements to the traditional algorithm were adopted: (1) improving the subgroup strategy and (2) updating the particle velocity and learning factors.
For the subgroup strategy improvement, let the total number of particles N be divided into M subgroups, where N is a multiple of M. Initialize the particle swarm, calculate the fitness value of each particle, and sort the particles according to their fitness values from large to small, where the sorted particle numbers are 1, 2, ..., N. At each interval $i = N/M$, extract the particle subgroups in turn. The particles contained in subgroup $j$ are those numbered $j + i \times k$ for $k = 0, 1, 2, \ldots$. This process can effectively avoid uneven grouping of the subgroups. In addition, the better particles can drive the worse particles in all groups, resulting in a balanced evolution of each subgroup [8].
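The interleaved extraction described above can be sketched as follows. The function name `split_into_subgroups` and the `stride` parameter are hypothetical. The text sets the interval to N/M, which taken literally yields N/M subgroups of M particles each, so the stride is left as a parameter here instead of being fixed to one reading.

```python
def split_into_subgroups(fitness, stride):
    """Return subgroups of particle indices, interleaved over the fitness ranking.

    fitness: one fitness value per particle.
    stride: extraction interval (hypothetical parameter); subgroup j receives
            the particles ranked j, j + stride, j + 2 * stride, ...
    """
    n = len(fitness)
    if stride <= 0 or n % stride != 0:
        raise ValueError("population size must be a multiple of the stride")
    # Rank particle indices by fitness, largest first, as the text describes.
    ranked = sorted(range(n), key=lambda idx: fitness[idx], reverse=True)
    return [ranked[j::stride] for j in range(stride)]

groups = split_into_subgroups([0.1, 0.9, 0.5, 0.3, 0.7, 0.2], stride=3)
print(groups)
```

Because consecutive ranks land in different subgroups, every subgroup receives a mix of high-ranked and low-ranked particles, which is the balancing effect the text refers to.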
For the improvement related to updating particle velocity and learning factors, the particle velocity updating of equation (1) is revised as follows:

$$V_i = \omega V_i + C_1\,\mathrm{rand}()\,(Pbest_i - X_i) + C_2\,\mathrm{rand}()\,(Nbest_i - X_i) + C_3\,\mathrm{rand}()\,(NLbest_i - X_i) \tag{3}$$

where $C_3$ = learning factor and $NLbest_i$ = position of the optimal solution that the subgroup particles have found so far. Then, the position of particle $i$, $X_i$, is updated using equation (2). The learning factors $C_1$, $C_2$, and $C_3$ change over the course of the run from their start values $C_{1s}$, $C_{2s}$, and $C_{3s}$ to their end values $C_{1e}$, $C_{2e}$, and $C_{3e}$. The process of improving the PSO algorithm is shown in Figure 2. The specific implementation steps are as follows:
Step 1: initialize the particle swarm. The position and velocity of the initial particles are randomly generated within the specified range, and the $Pbest_i$ coordinates of each particle are set to their current positions. The optimal particle for each subgroup is the best individual of the subgroup in which the particle is located, and $NLbest_i$ is set to the current position of that particle. The optimal particle of the entire neighborhood is the best of the optimal particles of all subgroups, and $Nbest_i$ is set to the current position of that particle.
Step 2: calculate the fitness value of the particle. For each particle, the current fitness is compared to the fitness of the best position, $Pbest_i$, that it has experienced. If it is better than the previous value, $Pbest_i$ is updated; otherwise, it remains unchanged. The fitness of each particle in this iteration is compared to the fitness of $NLbest_i$ experienced by the subgroup in which it is located. If it is better than the previous value, $NLbest_i$ is updated; otherwise, it remains unchanged. The fitness of each particle in this iteration is compared to the fitness of the best $Nbest_i$ experienced by the whole group. If it is better than the previous value, $Nbest_i$ is updated; otherwise, it remains unchanged.
Step 3: update particle speed and position. The speed and position of each particle are updated according to equations (3) and (2), respectively.
Step 4: check whether the end condition is met. When the maximum number of iterations is reached or the minimum error is satisfied, the optimal solution is output; otherwise, return to Step 2.
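Steps 1 to 4 can be put together in a compact sketch. Several choices below are assumptions made for illustration and are not taken from the paper: the learning factors are interpolated linearly between their start and end values, the objective is minimized, velocities are clamped to `v_max`, the only stopping rule is the iteration limit, and the names `learning_factor`, `ipso_minimize`, and `sphere` are hypothetical. The default start and end factors are the ones listed in the case study.

```python
import random

def learning_factor(start, end, t, t_last):
    """Linearly interpolate a learning factor (assumed schedule)."""
    return start + (end - start) * t / t_last

def ipso_minimize(f, dim, n_particles=20, n_groups=4, iters=100, w=0.7,
                  c_start=(2.5, 2.0, 1.5), c_end=(1.5, 2.0, 2.5),
                  bound=5.0, v_max=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: random positions and velocities; Pbest starts at the current position.
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    # Interleaved subgroups over the initial fitness ranking (largest first).
    ranked = sorted(range(n_particles), key=lambda i: pbest_f[i], reverse=True)
    group_of = {i: rank % n_groups for rank, i in enumerate(ranked)}
    t_last = max(1, iters - 1)
    for t in range(iters):
        c1, c2, c3 = (learning_factor(s, e, t, t_last)
                      for s, e in zip(c_start, c_end))
        # Step 2: index of the best personal best per subgroup (NLbest) and overall (Nbest).
        nlbest = {}
        for i in range(n_particles):
            g = group_of[i]
            if g not in nlbest or pbest_f[i] < pbest_f[nlbest[g]]:
                nlbest[g] = i
        nbest = min(range(n_particles), key=lambda i: pbest_f[i])
        # Step 3: velocity update with three attractors, then position update.
        for i in range(n_particles):
            sub = pbest[nlbest[group_of[i]]]
            for d in range(dim):
                v = (w * vs[i][d]
                     + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                     + c2 * rng.random() * (pbest[nbest][d] - xs[i][d])
                     + c3 * rng.random() * (sub[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, v))
                xs[i][d] += vs[i][d]
        # Refresh personal bests (a lower objective value is better here).
        for i in range(n_particles):
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest_f[i], pbest[i] = fx, xs[i][:]
    # Step 4: only the iteration limit is used as the end condition in this sketch.
    best = min(range(n_particles), key=lambda i: pbest_f[i])
    return pbest[best], pbest_f[best]

def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)

best_x, best_f = ipso_minimize(sphere, dim=2)
```

Since personal bests are only ever replaced by better values, the returned objective value can never be worse than the best particle of the initial swarm.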

Proposed IPSO-WNN Model
As previously mentioned, the proposed model of bus arrival time prediction combines the improved PSO with WNN and is called IPSO-WNN. A description of the WNN technique and the IPSO-WNN model is presented in this section. The WNN is a mathematical model that combines wavelet analysis and a neural network. It is based on the topology of the backpropagation (BP) neural network and uses the wavelet basis function as the transfer function of the hidden layer nodes, instead of the original sigmoid function. In other words, the wavelet function is introduced as the transfer function of the BP network. The wavelet transfer function carries shift and scaling factors, which allows a stronger ability for recognition, fault tolerance, and prediction. The WNN structure is shown in Figure 3.
Given the input sample data $x_i$ ($i = 1, 2, \ldots, k$), the output of the hidden layer is expressed as

$$h(j) = h_j\left[\frac{\sum_{i=1}^{k} w_{ij} x_i - b_j}{a_j}\right], \quad j = 1, 2, \ldots, l \tag{5}$$

where $h(j)$ = output value of node $j$ in the hidden layer, $h_j$ = wavelet basis function, $w_{ij}$ = weights linking the input and hidden layers, $b_j$ = shift factor of the wavelet basis function, $a_j$ = scaling factor of the wavelet basis function, and $l$ = number of nodes in the hidden layer. The output layer is given by

$$y(k) = \sum_{j=1}^{l} w_{jk}\, h(j), \quad k = 1, 2, \ldots, m \tag{6}$$

where $y(k)$ = value of output node $k$, $w_{jk}$ = weight between hidden layer node $j$ and output layer node $k$, and $m$ = number of nodes of the output layer. The method of modifying the weights and thresholds of the traditional WNN is similar to the correction algorithm for BP neural network weights. Using the gradient correction method to repeatedly correct the network weights and the parameters of the wavelet basis function reduces the gap between the expected and predicted outputs. When the error reaches a specified limit, the correction stops. The WNN correction process involves two steps, as follows:
Step 1: calculate the network prediction error:

$$e = \sum_{k=1}^{m} \left[ y_n(k) - y(k) \right] \tag{7}$$

where $e$ = prediction error of WNN and $y_n(k)$ = expected output value of node $k$.
Step 2: correct the weights of WNN and the coefficients of the wavelet according to the network prediction error $e$, as follows:

$$w_{n,k}^{(i+1)} = w_{n,k}^{(i)} + \Delta w_{n,k}^{(i+1)}, \quad a_k^{(i+1)} = a_k^{(i)} + \Delta a_k^{(i+1)}, \quad b_k^{(i+1)} = b_k^{(i)} + \Delta b_k^{(i+1)} \tag{8}$$

where $\Delta w_{n,k}^{(i+1)}$, $\Delta a_k^{(i+1)}$, and $\Delta b_k^{(i+1)}$ are calculated based on the network prediction error as follows:

$$\Delta w_{n,k}^{(i+1)} = -\eta \frac{\partial e}{\partial w_{n,k}^{(i)}}, \quad \Delta a_k^{(i+1)} = -\eta \frac{\partial e}{\partial a_k^{(i)}}, \quad \Delta b_k^{(i+1)} = -\eta \frac{\partial e}{\partial b_k^{(i)}} \tag{9}$$

where $\eta$ is the learning rate.
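The forward pass through the hidden and output layers can be sketched as below. The wavelet basis function is passed in as a parameter. The default `morlet` uses the real-valued form cos(1.75x)·exp(−x²/2) that is common in WNN work, which is an assumption here because the constant is not stated in the text. The names `morlet` and `wnn_forward` and the nested-list layout of the weights are also illustrative choices.

```python
import math

def morlet(x):
    """Real-valued Morlet wavelet; the 1.75 centre frequency is an assumed value."""
    return math.cos(1.75 * x) * math.exp(-0.5 * x * x)

def wnn_forward(x, w_in, a, b, w_out, wavelet=morlet):
    """Forward pass of a single-hidden-layer wavelet network.

    x: input values.
    w_in[j][i]: weight from input i to hidden node j.
    a[j], b[j]: scaling and shift factors of hidden node j.
    w_out[k][j]: weight from hidden node j to output node k.
    """
    hidden = []
    for j in range(len(a)):
        weighted_sum = sum(w_in[j][i] * x[i] for i in range(len(x)))
        hidden.append(wavelet((weighted_sum - b[j]) / a[j]))
    return [sum(w_out[k][j] * hidden[j] for j in range(len(hidden)))
            for k in range(len(w_out))]

# One hidden node whose shifted input is exactly zero, where the wavelet equals 1.
y = wnn_forward([1.0, 2.0], w_in=[[0.5, 0.25]], a=[2.0], b=[1.0], w_out=[[3.0]])
print(y)
```

Keeping the wavelet as an argument makes it easy to swap in a different basis function without touching the layer arithmetic.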

Procedures of the IPSO-WNN Model.
The fitness function, which indicates the accuracy of the neural network, is used to evaluate the quality of each particle. The following training error (mean squared deviation) of WNN is chosen as the fitness function of PSO:

$$\text{fitness} = \frac{1}{N} \sum_{\text{samples}} \sum_{k=1}^{m} \left[ y_n(k) - y(k) \right]^2 \tag{10}$$

where $N$ = number of training samples, the outer sum runs over the $N$ training samples, and $y_n(k)$ and $y(k)$ = expected and actual output values of node $k$, respectively. The specific optimization steps are as follows:
(1) Data normalization: normalize the sample data for input and output to produce dimensionless quantities.
(2) Parameter initialization: initialize the parameters of the WNN and of the PSO, such as the number of particle swarm iterations, population size, position range, and maximum speed.
(3) Population initialization: randomly initialize the position and velocity of each particle and calculate the initial fitness values according to the fitness function.
(4) Finding initial extremum: determine the individual and group extremums according to the initial particle fitness values.
(5) Iterative optimization: use the PSO algorithm to update the position and velocity of each particle, and update the individual and group extremums according to the new fitness values. When the fitness value converges or the specified number of iterations is reached, go to Step 6.
(6) Output optimal weights and thresholds: set the position of the global optimal particle as the optimal weights and thresholds of the WNN.
(7) Prediction of WNN: use the optimal weights and thresholds to predict the new samples.
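To connect the swarm to the network, each particle position has to be read as a full set of WNN parameters before the training error can be computed. The sketch below shows one way to do this. The flat layout (input weights, then scaling factors, then shift factors, then output weights) and the names `particle_size`, `decode`, and `mse_fitness` are assumptions for illustration, and the network itself is supplied as a `predict` callable so that any forward-pass implementation can be plugged in.

```python
def particle_size(n_in, n_hidden, n_out):
    """Number of coordinates needed to encode one network."""
    return n_hidden * n_in + 2 * n_hidden + n_out * n_hidden

def decode(particle, n_in, n_hidden, n_out):
    """Split a flat particle position into (w_in, a, b, w_out); layout is assumed."""
    it = iter(particle)
    w_in = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]
    a = [next(it) for _ in range(n_hidden)]
    b = [next(it) for _ in range(n_hidden)]
    w_out = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_in, a, b, w_out

def mse_fitness(particle, samples, predict, n_in, n_hidden, n_out):
    """Mean squared training error of the network encoded by `particle`.

    samples: list of (inputs, expected_outputs) pairs.
    predict: callable (params, inputs) -> list of outputs (hypothetical hook).
    """
    params = decode(particle, n_in, n_hidden, n_out)
    total = 0.0
    for inputs, expected in samples:
        actual = predict(params, inputs)
        total += sum((t - y) ** 2 for t, y in zip(expected, actual))
    return total / len(samples)

# A 9-10-1 network needs 9*10 + 10 + 10 + 10 = 120 coordinates per particle.
print(particle_size(9, 10, 1))
```

Under this layout, the optimizer searches a 120-dimensional space for the 9-10-1 structure, and the particle with the lowest `mse_fitness` supplies the final weights and wavelet factors.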

Implementing IPSO-WNN Model for Bus Arrival Time Prediction
Using the improved PSO algorithm, the WNN model was optimized and the bus arrival time prediction model was coded using Matlab. Details on preparing input data, input data processing and transfer function, and determining the number of hidden layer nodes are described in this section.

Preparing Input Data.
The input data to the proposed model are determined based on relevant literature [20-22] and the historical data on weather, date, time, and real-time bus operation. The bus arrival time at the next stop was selected as the output target. The input data include the sample data vector, training dataset, weather factors, date factors, and time factors.

Sample Data Vector.
This vector includes the following nine input variables:

$$X = (t_{b1}, t_{b2}, t_{b3}, t_{h1}, t_{h2}, t_{h3}, w, d, s) \tag{11}$$

where $t_{bi}$ = travel time from stop $(k - 1)$ to stop $k$ of the three buses ahead of the bus under consideration in the same time period of the day, with $i = 1, 2, 3$; $t_{hi}$ = travel time from stop $(k - 1)$ to stop $k$ of buses whose departure times are in the same period in the previous three weeks, with $i = 1, 2, 3$; $w$ = weather conditions; $d$ = date factor; and $s$ = period factor.

Input Data Processing and Transfer Function.
The normalization function mapminmax of Matlab used in this study is given by

$$x_k = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min} \tag{12}$$

where $x_k$ = normalized data, $x$ = raw sample value, $y_{\max} = 1$, $y_{\min} = -1$, and $x_{\max}$ and $x_{\min}$ = maximum and minimum values of the samples, respectively. For the transfer function, the Morlet wavelet function is widely used in practice and has achieved good results. This function is a single-frequency complex sinusoid modulated by a Gaussian envelope.
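For readers working outside Matlab, the min-max mapping above is easy to reproduce. The following is a small Python sketch of the same formula together with its inverse, which is needed to turn a normalized prediction back into seconds or minutes. The function names `map_min_max` and `reverse_min_max` are hypothetical, and raising an error for a constant input is a choice made for this example.

```python
def map_min_max(values, y_min=-1.0, y_max=1.0):
    """Scale values linearly so that min(values) -> y_min and max(values) -> y_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant series")
    scale = (y_max - y_min) / (x_max - x_min)
    scaled = [scale * (x - x_min) + y_min for x in values]
    return scaled, (x_min, x_max)

def reverse_min_max(scaled, bounds, y_min=-1.0, y_max=1.0):
    """Undo map_min_max using the (x_min, x_max) bounds it returned."""
    x_min, x_max = bounds
    scale = (x_max - x_min) / (y_max - y_min)
    return [scale * (y - y_min) + x_min for y in scaled]

travel_times = [10.0, 20.0, 30.0]
normalized, bounds = map_min_max(travel_times)
restored = reverse_min_max(normalized, bounds)
```

The bounds must come from the training data and be reused unchanged for the test data, so that both sets are mapped by the same transformation.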

Determining Number of Hidden Layer Nodes.
The structure of the neural network is composed of an input layer, a hidden layer, and an output layer. The number of input layer nodes, according to the preceding analysis, was identified as 9. The output layer represents bus arrival time as the output value, and therefore, this layer has only one node. Determining the optimum number of nodes in the hidden layer requires many iterations during the training process. The reference formula for selecting the optimal number of nodes in the hidden layer is given by [23]

$$l = \sqrt{m + n} + a \tag{14}$$

where $l$ = optimum number of nodes in the hidden layer, $m$ = number of nodes in the output layer, $n$ = number of nodes in the input layer, and $a$ = constant (0 to 10).

According to equation (14), the number of hidden layer nodes should be between 3 and 13. The idea of hidden node optimization is as follows. When the network training is not sufficient, the number of hidden layer nodes is increased, and the training and prediction errors will be reduced. If the number of hidden layer nodes continues to increase, the prediction error will increase. Therefore, this idea can be used to determine whether the number of nodes of the hidden layer is appropriate.
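The range of 3 to 13 follows from the reference formula with n = 9 inputs and m = 1 output: the square root of 10 is about 3.16, and the constant adds between 0 and 10. A short check is sketched below. Rounding down to whole nodes is an assumption chosen to match the quoted range, and `hidden_node_range` is a hypothetical helper name.

```python
import math

def hidden_node_range(n_in, n_out, a_min=0, a_max=10):
    """Smallest and largest hidden-layer sizes suggested by l = sqrt(m + n) + a."""
    base = math.sqrt(n_in + n_out)
    # Round down to whole nodes (assumed rounding rule).
    return math.floor(base + a_min), math.floor(base + a_max)

print(hidden_node_range(9, 1))
```

The formula only bounds the search; the final size within this range still has to be settled by comparing training and prediction errors.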

The training error of the sample represents the error obtained when the data of the training set are taken as the input sample. The prediction error of the sample is the error obtained when the test dataset is taken as the input sample. At the same time, an expected error E can be set, where E is a constant with a threshold ranging from 0 to 1 (0.5 was selected in this study).
The training error of the wavelet neural network when the number of hidden layer nodes is $M$ is expressed as $\varepsilon^{\text{train}}_{M}$, and the prediction error of the wavelet neural network when the number of hidden layer nodes is $M$ is expressed as $\varepsilon^{\text{predict}}_{M}$. When the number of hidden layer nodes is $(M - 1)$, the training and prediction errors are expressed as $\varepsilon^{\text{train}}_{M-1}$ and $\varepsilon^{\text{predict}}_{M-1}$, respectively. When the number of hidden layer nodes is $(M + 1)$, the training and prediction errors are expressed as $\varepsilon^{\text{train}}_{M+1}$ and $\varepsilon^{\text{predict}}_{M+1}$, respectively. Finally, whether $M$ is the best number of hidden layer nodes is determined from these errors using equations (15)-(20), which yield the indicator $\theta_2$ used below. To determine the optimal number of nodes in the hidden layer, the process is as follows. When $\theta_2(M - 1) > \theta_2(M) > \theta_2(M + 1)$, the number of nodes in the current hidden layer is less than the optimal number. Therefore, the number of nodes in the hidden layer should be increased. When $\theta_2(M - 1) < \theta_2(M) < \theta_2(M + 1)$, the number of nodes in the current hidden layer is larger than the optimal number. Therefore, the number of nodes in the hidden layer should be decreased. When both conditions are simultaneously satisfied and $\theta_2(M - 1) > \theta_2(M)$, the current node number $M$ is the optimal hidden layer node number. After several iterations of training, the training and prediction errors are obtained and are substituted into equations (15)-(20) to obtain the optimal number of hidden layer nodes.
This number was found to be 10. Therefore, the structure of the wavelet neural network is finally determined as 9-10-1; that is, the number of nodes in the input layer is 9, the number of nodes in the hidden layer is 10, and the number of nodes in the output layer is 1.

Case Study
The proposed model was applied to bus line 102 in Suzhou city to predict bus arrival times. This bus line runs from Baodai West Road to South Railway Station Square. The route is 12.8 km long and includes 20 bus stops. Bus operational data, which involved 1790 segments, were collected in 2019 during May 7-9, May 14-16, May 21-23, and May 27-30 (every week from Thursday to Saturday). The collected data were converted to 34,010 route segment operational times for each bus running along the route (1790 × 19; the 20 stops delimit 19 segments). After data processing, a total of 472 sets of sample data were obtained, of which 382 groups were selected as the training data and the remaining 90 groups were used as the test data. Part of the training sample data is shown in Table 1.

Training Samples.
The following input data were assumed: maximum number of iterations of the algorithm = 500, population size = 50, and accelerating factors $C_{1s}$ = 2.5, $C_{2s}$ = 2.0, $C_{3s}$ = 1.5, $C_{1e}$ = 1.5, $C_{2e}$ = 2.0, and $C_{3e}$ = 2.5. The output fitness curves of the WNN and IPSO-WNN models are shown in Figure 4. For the WNN model (Figure 4(a)), the early training speed was slow, and 500 iterations were needed to achieve convergence with an error of 0.05. In contrast, the proposed IPSO-WNN model (Figure 4(b)) converged rapidly within the first 50 training iterations. At about 310 iterations, the solution converged and the error precision reached 0.01. Compared to WNN, the proposed IPSO-WNN model has a faster convergence rate and a smaller convergence error.

Establishing Bus Arrival Time Prediction Model.
As previously mentioned, the improved PSO was used to replace the traditional gradient descent method. In the training process, the weights and thresholds of the WNN were optimized. Based on the optimization results, the parameters of the calibrated IPSO-WNN model were determined. Then, the prediction model of bus arrival time was established as follows:

$$y = \sum_{j=1}^{10} w_{j}\, h_j\left[\frac{\sum_{i=1}^{9} w_{ij} x_i - b_j}{a_j}\right] \tag{21}$$

where $x_i$ = the nine normalized input variables and $w_j$ = weight between hidden layer node $j$ and the single output node. The calibrated parameter values of the IPSO-WNN model are presented in Table 2. Given the nine variables shown previously in equation (11), one can obtain 102 prediction values of bus arrival time.

Journal of Advanced Transportation

Model Validation.
The 90 sets of test samples were normalized, and then the optimal network weights and thresholds were used in the model. The prediction errors are shown in Figures 5 and 6. The root-mean-square error, used to evaluate model accuracy, is given by the following equation:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( T_i - y_i \right)^2} \tag{22}$$

where RMSE = root-mean-square error, $T_i$ = actual value, $y_i$ = predicted value, and $n$ = number of test samples. The smaller the RMSE is, the higher the accuracy is. The RMSE values of the WNN and IPSO-WNN models were 20.9% and 10.6%, respectively, indicating that the overall accuracy of the IPSO-WNN model is better. In addition, the results show that the relative error of bus arrival time prediction of the WNN model ranged from 5.2% to 16.8%, while that of the IPSO-WNN model ranged from 3.5% to 9.8%. Thus, the proposed model reduced the maximum relative error of the WNN model by 42% and the RMSE by 49%. Clearly, the proposed IPSO-WNN model has obvious advantages in bus arrival time prediction.
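The two reduction figures follow directly from the reported values, as the short check below shows. The helper names `rmse` and `relative_reduction` are hypothetical. The `rmse` function is the standard root-mean-square definition, included only to make the metric concrete.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(actual, predicted)) / n)

def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline

# Reported values: RMSE 20.9% (WNN) vs 10.6% (IPSO-WNN);
# maximum relative error 16.8% (WNN) vs 9.8% (IPSO-WNN).
rmse_cut = relative_reduction(20.9, 10.6)
max_err_cut = relative_reduction(16.8, 9.8)
print(f"RMSE reduction: {rmse_cut:.1%}")
print(f"Maximum relative error reduction: {max_err_cut:.1%}")
```

The results are roughly 49% for the RMSE and 42% for the maximum relative error, matching the figures quoted in this section.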

Conclusions
This paper has presented an improved particle swarm optimization model that was integrated with a wavelet neural network to predict bus arrival time, and a new IPSO-WNN model was developed.
The improvements to the PSO algorithm, which were intended to help find the optimal solution quickly and avoid local optimum solutions, were related to improving the subgroup strategy and updating the particle velocity and learning factors. The IPSO algorithm overcomes the limitations of the traditional PSO. The IPSO-WNN model was applied to the prediction of bus arrival time on an actual bus line. Actual bus operational data were used for model training, and the proposed model was run to obtain the best network weights and thresholds.
The application results show that the RMSE of the IPSO-WNN model was 10.6% compared to 20.9% for the traditional WNN.
This study has focused on the prediction of bus arrival time using the developed IPSO-WNN model. Future research will continue to improve the PSO algorithm, train it with other common traffic datasets, and compare it with other methods.
This research was financially supported by the Science and Technology Fund of Education Department of Fujian Province (JAT160079).
The assistance of Shutian Xu,

Figure 1: Sketch map of the particle swarm optimization algorithm.

$C_{1s}$, $C_{2s}$, and $C_{3s}$ = values of $C_1$, $C_2$, and $C_3$ at the start of the algorithm, and $C_{1e}$, $C_{2e}$, and $C_{3e}$ = corresponding values of $C_1$, $C_2$, and $C_3$ at the end of the algorithm. At the start of the algorithm, the value of $C_1$ is larger and the values of $C_2$ and $C_3$ are smaller. This is advantageous to the search of the particles in the whole space and provides a stronger global searching ability. At the end of the algorithm, $C_1$ becomes smaller and $C_2$ and $C_3$ become larger; this gives the particles a strong local searching ability, which in turn helps find the global optimal solution.

Figure 3: Topology of the wavelet neural network.

Figure 5: Comparison of predicted bus arrival times of WNN and IPSO-WNN models with actual values: (a) WNN model; (b) IPSO-WNN model.
Bus arrival time varies not only between working days and weekend, but also among the seven days of the week.

Table 2: Calibrated parameters of the IPSO-WNN model.

Table 1: Part of the training sample data.