Radial Basis Function Neural Network Based on an Improved Exponential Decreasing Inertia Weight-Particle Swarm Optimization Algorithm for AQI Prediction

This paper proposes a novel radial basis function (RBF) neural network model optimized by exponential decreasing inertia weight particle swarm optimization (EDIW-PSO). Based on the inertia weight decreasing strategy, we propose a new Exponential Decreasing Inertia Weight (EDIW) to improve the PSO algorithm. We use the modified EDIW-PSO algorithm to determine the centers, widths, and connection weights of the RBF neural network. To assess the performance of the proposed EDIW-PSO-RBF model, we choose the daily air quality index (AQI) of Xi'an for prediction and obtain improved results.


Introduction
The radial basis function (RBF) neural network is a novel and effective feed-forward neural network [1] with good best-approximation and global-optimum properties. It has been broadly used in many applications, such as function approximation, classification, regression, prediction, signal processing, and other problems [2][3][4][5][6]. The RBF neural network architecture has three layers: an input layer, a hidden layer, and an output layer. The input layer is composed of input vectors. Before the input vectors are fed to the network, data processing such as normalization should be done. This processing can also be done in the input layer.
The hidden layer is composed of hidden neurons, the number of which is determined by the problem at hand. RBF networks differ from other types of neural networks mainly in the hidden neurons [7,8]. Each hidden neuron has a radial basis function, which is a centrally symmetric nonlinear function with local distribution. The radial basis function is characterized by a center position and a width parameter. Once the center and width are determined, the input vectors are mapped to the hidden space by the mapping $\varphi: R^n \to R^m$. Suppose that the input layer has $n$ input units and the hidden layer has $m$ radial basis functions. The output of the $i$th hidden neuron is expressed as
$$h_i(x) = \varphi\left(-\frac{\|x - c_i\|}{\sigma_i}\right), \quad i = 1, 2, \ldots, m, \tag{1}$$
where $x$ is taken from the overall input sample $X = (x_1, x_2, \ldots, x_N)^T$, and each $x_s$ is an $n$-dimensional input vector $x_s = (x_{s1}, x_{s2}, \ldots, x_{sn})$. $c_i$ and $\sigma_i$ are, respectively, the center and the width of the $i$th hidden neuron, and $c_i$ is an $n$-dimensional vector $c_i = (c_{i1}, c_{i2}, \ldots, c_{ij}, \ldots, c_{in})$. $\|\cdot\|$ is the Euclidean norm, usually taken as the 2-norm. $\varphi(\cdot)$ is the radial basis function. It can take a variety of forms, such as the B-Spline RBF [9], thin-plate Spline RBF [10], Cauchy RBF [11], and Gaussian RBF [12]. Among them the Gaussian function is the most used, where $z$ denotes the variable of the radial basis function $\varphi(\cdot)$.
The output layer implements the mapping $f: R^m \to R^q$, where $q$ is the number of output neurons. The output function is a linear combination of the outputs of the radial basis functions through the connection weights between the hidden layer and the output layer:
$$y_k = \sum_{i=1}^{m} w_{ik} h_i(x), \quad k = 1, 2, \ldots, q, \tag{3}$$
where $w_{ik}$ is the connection weight between the $i$th hidden neuron and the $k$th output of the network. The RBF neural network contains three groups of parameters: centers $c_i$, widths $\sigma_i$, and connection weights $w_{ik}$. To optimize the RBF parameters, many optimization algorithms have been proposed, such as the orthogonal least squares (OLS) algorithm [13], Expectation-Maximization (EM) algorithm [14], gradient descent algorithm [15], K-means clustering algorithm [16], genetic algorithm (GA) [17], ant colony optimization (ACO) algorithm [18], particle swarm optimization (PSO) algorithm [19,20], and support vector machine (SVM) and extreme learning machine (ELM) [21]. Compared to other algorithms, the PSO algorithm has several advantages: stable convergence, few parameters, and fast convergence speed. Many researchers have successfully applied the PSO algorithm to the learning and structure improvement of the RBF neural network in application problems.
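As a minimal sketch of the three-layer mapping described above, the following Python code evaluates a Gaussian RBF network for one input vector. The Gaussian form `exp(-||x - c||^2 / sigma^2)` and the names `rbf_forward`, `centers`, `widths`, and `weights` are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math

def rbf_forward(x, centers, widths, weights):
    """Evaluate a Gaussian RBF network for one input vector x.

    centers: list of m center vectors, each with the same length as x
    widths:  list of m positive width parameters
    weights: m x q nested list; weights[i][k] links hidden unit i to output k
    """
    # Hidden layer: Gaussian of the Euclidean distance to each center
    # (assumed form exp(-||x - c||^2 / sigma^2)).
    hidden = []
    for c, sigma in zip(centers, widths):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        hidden.append(math.exp(-dist_sq / sigma ** 2))
    # Output layer: linear combination of the hidden activations.
    n_out = len(weights[0])
    return [sum(hidden[i] * weights[i][k] for i in range(len(hidden)))
            for k in range(n_out)]
```

For example, with two hidden units and one output, `rbf_forward([0.0, 0.0], [[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0], [[2.0], [3.0]])` returns a one-element list whose value is `2 + 3 * exp(-1)`.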
The particle swarm optimizing (PSO) algorithm was put forward by Eberhart and Kennedy in 1995 [22] and was initially motivated by the intelligent collective behavior of birds in the foraging process. In the PSO algorithm, each bird, also called a particle, has a position and a velocity and searches for the optimal solution by updating both. In the following years, many researchers introduced an "inertia weight" and proposed many dynamic variations of PSO based on it [23][24][25][26]. Different inertia weight strategies imply different incremental changes in velocity per time step, and hence different ways of exploring new search areas in pursuit of a better solution. In this paper, we propose an Exponential Decreasing Inertia Weight PSO (EDIW-PSO) algorithm to obtain the optimal parameters of the RBF network.
This paper establishes an RBF network model based on the EDIW-PSO algorithm. Section 2 introduces the basic PSO algorithm and several variants of inertia weight. Section 2.3 gives the improved EDIW-PSO algorithm based on an Exponential Decreasing Inertia Weight strategy. Section 3 presents the methodology of the proposed EDIW-PSO-RBF structure. Section 4 describes an experiment that compares this methodology with three other models. The last section summarizes the conclusions of this study.

Dynamic Particle Swarm Optimizing Algorithm
2.1. Basic Particle Swarm Optimizing Algorithm. The PSO algorithm is a parallel evolutionary computation algorithm. In PSO, each potential solution to an optimization problem is treated as a bird, which is also called a particle. The set of particles, also known as a swarm, is flown through the $D$-dimensional search space of the problem. Each particle changes its own position and velocity based on its own experience and that of its neighbors. During the search, every particle is connected to and able to share information with every other particle in the swarm; this communication topology is known as a global neighborhood, described in [27]. This information sharing mechanism keeps the swarm consistent as it searches for the global solution.
According to a preset fitness function, we obtain the personal best position (also named the local best fitness) of the $i$th particle, denoted as $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the global best position (also named the global best fitness) found so far by all particles of the swarm, denoted as $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$. At each iteration, the $i$th particle updates its velocity $V_i$ and position $L_i$ as follows:
$$V_i(t+1) = \omega V_i(t) + c_1 r_1 \left(p_i - L_i(t)\right) + c_2 r_2 \left(p_g - L_i(t)\right), \qquad L_i(t+1) = L_i(t) + V_i(t+1), \tag{4}$$
where $c_1$ and $c_2$ are acceleration factors and positive constants, $r_1$ and $r_2$ are random numbers ranging from 0 to 1, and $\omega$ is the inertia weight on the interval [0, 1], which keeps the memory of the old velocity vector of the same particle. When $\omega$ is a constant [22], the result is a static PSO; when $\omega$ varies from iteration to iteration, the result is a dynamic PSO.
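The update rule described above can be sketched in a few lines of Python. This is an illustrative version only: the function name `pso_step`, the per-dimension draw of the two random factors, and the clamping of velocity and position to symmetric bounds are assumptions, not details confirmed by the text at this point.

```python
import random

def pso_step(position, velocity, p_best, g_best, omega,
             c1=2.0, c2=2.0, v_max=1.0, x_max=1.0, rng=random):
    """Advance one particle by one iteration; return (position, velocity)."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, p_best, g_best):
        r1, r2 = rng.random(), rng.random()  # fresh random factors in [0, 1)
        # Inertia term + pull toward personal best + pull toward global best.
        v_new = omega * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        v_new = max(-v_max, min(v_max, v_new))      # assumed velocity clamp
        x_new = max(-x_max, min(x_max, x + v_new))  # assumed position clamp
        new_velocity.append(v_new)
        new_position.append(x_new)
    return new_position, new_velocity
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and only the inertia term `omega * v` moves it, which makes the role of the inertia weight easy to see.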

2.2. Several Variants of Inertia Weight. The inertia weight determines how much of the current particle velocity is retained. A large inertia weight leads to high speed and strong searching ability of the particles, but the global optimal solution may be missed. In contrast, a small inertia weight gives the particles strong exploitation capability, but a long search time is needed to fine-tune the local optimal solution. By changing the inertia weight dynamically, the search capability is dynamically adjusted. Many researchers have proposed variants of PSO according to the impact of $\omega$. Most of the PSO variants use time-varying inertia weight strategies in which the value of the inertia weight varies with the iteration number. These methods can be either linear or nonlinear. In 1998, Shi and Eberhart introduced a Linearly Decreasing Inertia Weight (LDIW) strategy that sets the inertia weight $\omega$ by the following formula [23]:
$$\omega(t) = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{T}\, t, \tag{5}$$
where $t$ is the current iterative step, $T$ is the maximum number of iterative steps the PSO is allowed to run, $\omega_{\max}$ is the initial inertia weight, and $\omega_{\min}$ is the final inertia weight. Chatterjee and Siarry proposed a nonlinear decreasing inertia weight (NDIW) strategy that modulates the inertia weight with time to improve the performance of the PSO algorithm [24]. The proposed adaption of $\omega(t)$ is given as
$$\omega(t) = \frac{(T - t)^{\mu}}{T^{\mu}} \left(\omega_{\max} - \omega_{\min}\right) + \omega_{\min}, \tag{6}$$
where $\mu$ is the nonlinear modulation index. With $\mu = 1$, the system becomes the special case of a linearly adaptive inertia weight, as proposed by Shi and Eberhart [23].
After evaluating this strategy on the well-known Sphere function, they concluded that $\mu = 1.2$ was a typical value and that it could be suitably determined for each case individually. When $\mu = 1.2$, during the early iterations a higher value of $\omega$ facilitates taking larger steps in the solution space, and during the later iterations $\omega$ is decreased more rapidly than in the linear case, which is well suited to determining the optimal region among the already discovered promising suboptimal regions.
In [25], Arumugam and Rao propose a global-local best inertia weight (GLbestIW) method in which the inertia weight is neither set to a constant value nor set as a linearly decreasing time-varying function. The inertia weight is determined by the ratio of the global best fitness to the local best fitness in each iteration. The GLbestIW is given by the following equation:
$$\omega_i(t) = 1.1 - \frac{p_g^t}{p_i^t}, \tag{7}$$
where $p_g^t$ is the global best fitness at the $t$th iteration and $p_i^t$ is the local best fitness of the $i$th particle at the $t$th iteration.
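The three existing strategies discussed in this subsection can be compared with a short Python sketch. The function names and argument names are hypothetical. The linear and nonlinear forms follow the usual statements of [23] and [24]; the constant 1.1 in the global-local best rule is the form commonly attributed to [25] and should be read as an assumption here.

```python
def ldiw(t, t_max, w_max, w_min):
    """Linearly decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max

def ndiw(t, t_max, w_max, w_min, index=1.2):
    """Nonlinear decreasing inertia weight with modulation index `index`."""
    return ((t_max - t) / t_max) ** index * (w_max - w_min) + w_min

def glbest_iw(g_best_fitness, p_best_fitness):
    """Global-local best inertia weight for one particle (assumed constant 1.1).

    p_best_fitness must be nonzero.
    """
    return 1.1 - g_best_fitness / p_best_fitness
```

Setting `index=1` in `ndiw` reproduces `ldiw`, which matches the remark that the linear strategy is a special case of the nonlinear one.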

2.3. The Proposed Inertia Weight Variant.
A larger ω helps find the global best solution as soon as possible in the early iterative steps but can easily cause the global best solution to be missed in later iterative steps. A smaller ω, in contrast, means slower updating and a longer time for fine-tuning a local exploration. Hence, in the early iterative steps a larger ω is needed for coarse global exploration, while in later iterations ω should decrease to fine-tune the local exploration. An appropriate inertia weight can help find the best solution with the least number of iterative steps.
A larger inertia weight facilitates global exploration, and a smaller inertia weight tends to facilitate local exploration to fine-tune the current search area. In order to balance the global and local exploration, we present a new Exponential Decreasing Inertia Weight (EDIW) strategy. In this strategy, the inertia weight decreases exponentially as the iterative step $t$ increases. The proposed adaption of $\omega(t)$ is given as
$$\omega(t) = \omega_{\min} + \left(\omega_{\max} - \omega_{\min}\right) e^{-a t / T}, \tag{8}$$
where $a > 0$ is a controlling parameter that sets the convergence rate of the inertia weight. When $t = 0$, $\omega(t) = \omega_{\max}$. When $t = T$, $\omega(t)$ can be expressed by the following equation:
$$\omega(T) = \omega_{\min} + \left(\omega_{\max} - \omega_{\min}\right) e^{-a}. \tag{9}$$
When $t = T$, we obtain different values of the inertia weight $\omega$ from the proposed adaption by setting varying sets of $\{a, \omega_{\max}, \omega_{\min}\}$. The results are shown in Table 1. According to Table 1, when $t = T$, for the same $\omega_{\max}$ and $\omega_{\min}$, a small $a$ leaves $\omega$ still far from the final inertia weight $\omega_{\min}$, and $\omega$ gets closer and closer to $\omega_{\min}$ as $a$ increases. When $a \geq 6$, $\omega$ is nearly equal to $\omega_{\min}$. So the condition $a \geq 6$ ensures that $\omega(t)$ varies fully over the given range $[\omega_{\min}, \omega_{\max}]$ when $t$ ranges from 1 to $T$.
When $a$ takes different values, different decreasing effects are obtained. To make these effects easy to compare, we set $\omega_{\max} = 0.9$, $\omega_{\min} = 0.2$, and $T = 1000$. The decreasing curves of the inertia weight $\omega$ attained by the proposed EDIW strategy for varying $a$ are shown in Figure 1.
According to Figure 1, for the same $a$, the rate of descent of $\omega$ gradually declines as the iterative step increases and begins to flatten in the later iterative steps. For different $a$, the rate of descent of $\omega$ in the early iterative steps increases as $a$ increases. A smaller $a$ ensures that $\omega$ does not decrease too fast in the early iterations, but $\omega$ will still be far from $\omega_{\min}$ in the final iteration. For example, when $a = 1$ and $t = 1000$, the inertia weight $\omega$ is 0.4575, which is far from $\omega_{\min}$ (0.2). A larger $a$ makes $\omega$ decrease fast, which helps discover the promising suboptimal regions, but may lead the algorithm to begin the local search prematurely. For example, when $a = 10$ and $T = 1000$, the inertia weight $\omega$ begins to flatten at the iterative step $t = 400$. The parameter $a$ should therefore be selected appropriately in the EDIW-PSO algorithm. Based on the analysis above, we can choose the value of $a \in (6, 8)$. During the early iterations $\omega$ decreases more rapidly than in the linear case, which is well suited for the algorithm to discover promising suboptimal regions. During the later iterations $\omega$ is fine-tuned, which helps determine the optimal region among the already discovered optimal regions.
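The behavior discussed above can be checked numerically with a small Python sketch. It assumes the exponential form `w_min + (w_max - w_min) * exp(-a * t / t_max)`, with `a` used as a hypothetical name for the controlling parameter; this form equals the initial weight at the first step and reproduces the value 0.4575 quoted in the text for a controlling parameter of 1 at the final iteration.

```python
import math

def ediw(t, t_max, a, w_max, w_min):
    """Exponentially decreasing inertia weight at iteration t (assumed form)."""
    return w_min + (w_max - w_min) * math.exp(-a * t / t_max)

if __name__ == "__main__":
    # Final-iteration weight for several controlling parameters,
    # using the settings w_max = 0.9, w_min = 0.2, t_max = 1000 from the text.
    for a in (1, 6, 8, 10):
        print(a, round(ediw(1000, 1000, a, w_max=0.9, w_min=0.2), 4))
```

With a controlling parameter of 6 or more, the final weight differs from the minimum weight by less than 0.002 under these settings, which is consistent with the recommendation to choose a value of at least 6.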

The Proposed RBF Model by EDIW-PSO Algorithm
In this section we use the proposed EDIW-PSO algorithm to determine the optimal structure of the RBFNN and establish the EDIW-PSO-RBF network model. The position vector $L$ of each particle is the quantity to be optimized; it represents the RBF centers $c_{ij}$, widths $\sigma_i$, and connection weights $w_{ik}$, where $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $k = 1, 2, \ldots, q$. Therefore the dimension of $L$ for each particle in the EDIW-PSO algorithm is $D = n \cdot m + m + m \cdot q$. We map each $L$ to the RBFNN and obtain the prediction output. The fitness function of the EDIW-PSO algorithm is defined in terms of the relative mean square error (RMSE) between the prediction output values and the actual values in the network training process. We can thereby minimize the fitness value of the network through the powerful search performance of the EDIW-PSO algorithm. The fitness function is given by formula (10), where $\hat{y}_{sk}$ and $y_{sk}$ are the actual value and the prediction value of the $k$th output neuron in the $s$th sample, $N$ is the number of training samples, and $q$ is the number of output neurons.
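To make the particle encoding concrete, the Python sketch below splits a flat position vector into centers, widths, and connection weights, and evaluates a relative squared-error fitness. The ordering of the three groups inside the vector, the function names, and the specific fitness form (mean of squared relative errors over all samples and outputs) are assumptions for illustration; the source does not fix them in the text available here.

```python
def decode_particle(position, n_in, n_hidden, n_out):
    """Split a flat particle into (centers, widths, weights).

    Assumed layout: all centers first, then widths, then output weights.
    """
    expected = n_in * n_hidden + n_hidden + n_hidden * n_out
    if len(position) != expected:
        raise ValueError(f"expected {expected} values, got {len(position)}")
    k = n_in * n_hidden
    centers = [position[i * n_in:(i + 1) * n_in] for i in range(n_hidden)]
    widths = position[k:k + n_hidden]
    flat_w = position[k + n_hidden:]
    weights = [flat_w[i * n_out:(i + 1) * n_out] for i in range(n_hidden)]
    return centers, widths, weights

def relative_mse(actual, predicted):
    """Mean of squared relative errors (assumed fitness form).

    actual, predicted: lists of per-sample output lists; actual values
    must be nonzero because they appear in the denominator.
    """
    total, count = 0.0, 0
    for y_row, p_row in zip(actual, predicted):
        for y, p in zip(y_row, p_row):
            total += ((y - p) / y) ** 2
            count += 1
    return total / count
```

With 7 inputs, 10 hidden neurons, and 1 output, `decode_particle` expects a vector of length 90, which agrees with the dimension used later in the experiment.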
The iteration process of the improved EDIW-PSO-RBF learning algorithm can be described clearly as follows.
Step 1. Initialize the relevant parameters, including the swarm size $S$, the bounds of position $L_{\max}$ and velocity $V_{\max}$, the acceleration factors $c_1$ and $c_2$, and the maximum number of iterative steps $T$. Initialize $t = 1$; for each particle, select two $D$-dimensional vectors randomly to initialize the position and velocity of this particle, respectively.
Step 2. Map the position vector $L_i$ of each particle to the parameters of the RBFNN.
Step 3. Calculate the fitness value of each particle according to formula (10). Set the current position of each particle as its personal best $p_i$. Then take the minimum fitness value as the global best $p_g$ of the whole swarm.
Step 4. Update the inertia weight ω according to formula (8). Modify the particle velocity $V_i$ and position $L_i$ according to formula (4).
Step 5. Map the new position vector $L_i$ of each particle to the parameters of the RBFNN, input the training data, and train the RBFNN.
Step 6. Recalculate the fitness values of the new particles and modify $p_i$ and $p_g$. For each particle, if the current fitness value is better than the previous local best, set the current fitness value to be the local best; otherwise keep the previous local best. For the whole swarm, if the best of all current local bests is better than the previous global best, update the global best; otherwise keep the previous global best.
Step 7. Judge whether the stopping condition $t = T$ is satisfied. If the condition is met, go to Step 8; otherwise, let $t = t + 1$ and go back to Step 4.
Step 8. Record the global best value $p_g$; exit the iteration.
Step 9. Use the optimal structure of the RBFNN to perform the prediction problem.
Apply the above 9 steps until the terminal conditions hold. The flow chart is shown in Figure 2.
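The iteration described by the steps above can be sketched as a generic minimizer in Python. This is a hedged illustration, not the authors' code: the function name `ediw_pso`, the default values, the seeded random generator, the clamping of positions and velocities, and the assumed exponential inertia-weight form are all choices made for the example. The `fitness` argument stands in for "map the particle to an RBF network and compute its training error".

```python
import math
import random

def ediw_pso(fitness, dim, swarm_size=50, t_max=1000, a=8.0,
             w_max=0.9, w_min=0.2, c1=2.0, c2=2.0,
             x_max=1.0, v_max=1.0, seed=0):
    """Minimize `fitness` over [-x_max, x_max]^dim; return (best, best_fitness)."""
    rng = random.Random(seed)
    # Step 1: random initial positions and velocities inside the bounds.
    pos = [[rng.uniform(-x_max, x_max) for _ in range(dim)]
           for _ in range(swarm_size)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
           for _ in range(swarm_size)]
    # Steps 2-3: evaluate each particle, set personal and global bests.
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g = min(range(swarm_size), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    for t in range(1, t_max + 1):
        # Step 4: exponentially decreasing inertia weight (assumed form),
        # then velocity and position update for every particle.
        omega = w_min + (w_max - w_min) * math.exp(-a * t / t_max)
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (omega * vel[i][d]
                     + c1 * r1 * (p_best[i][d] - pos[i][d])
                     + c2 * r2 * (g_best[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(-x_max, min(x_max, pos[i][d] + vel[i][d]))
        # Steps 5-6: re-evaluate, then refresh personal and global bests.
        for i in range(swarm_size):
            f = fitness(pos[i])
            if f < p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], f
        g = min(range(swarm_size), key=p_fit.__getitem__)
        if p_fit[g] < g_fit:
            g_best, g_fit = p_best[g][:], p_fit[g]
    # Steps 7-8: the loop ends when t reaches t_max; report the global best.
    return g_best, g_fit
```

As a quick usage example on a toy objective, `ediw_pso(lambda p: sum(x * x for x in p), dim=3, swarm_size=20, t_max=200)` searches for the minimum of a three-dimensional sphere function; for the network in this paper, the objective would instead decode each particle into RBF parameters and return the training error.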

Experiment
In order to assess the prediction accuracy of the proposed EDIW-PSO-RBF model, we choose the daily air quality index (AQI) of Xi'an [28] for time series prediction. Recently, air pollution has affected people's travel and daily life. The daily AQI is a dimensionless index that quantitatively describes air quality and is calculated from the following six indicators: sulfur dioxide (SO$_2$), nitrogen dioxide (NO$_2$), particulate matter (PM$_{10}$: particle size less than or equal to 10 microns), particulate matter (PM$_{2.5}$: particle size less than or equal to 2.5 microns), carbon monoxide (CO), and ozone (O$_3$). Among them, SO$_2$, NO$_2$, and CO are all given as the 24-hour average density; O$_3$ is given as the 8-hour moving average density. We choose 400 sets of data from 2013.1.1 to 2014.2.5 as training data and 5 sets of data from 2014.2.6 to 2014.2.10 as test data. All the data sets are normalized before they are used in the model. We adopt the mapminmax function to normalize the data to the range [−1, 1] by the following formula:
$$y = \frac{(y_{\max} - y_{\min})(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min}, \tag{11}$$
where $x$ is the original data before normalization, $x_{\min}$ and $x_{\max}$ are the minimum value and the maximum value before normalization, respectively, $y$ is the data after normalization, and $y_{\min}$ and $y_{\max}$ are the minimum value and the maximum value after normalization, which are −1 and 1, respectively.
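The normalization step can be illustrated with a short Python sketch of min-max scaling to the range [−1, 1]. The function name `mapminmax_like` is hypothetical, the sample values are made up for illustration and are not taken from the Xi'an data set, and the handling of constant input (raising an error) is an assumption rather than a description of the function used by the authors.

```python
def mapminmax_like(values, y_min=-1.0, y_max=1.0):
    """Linearly rescale `values` so that min -> y_min and max -> y_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumed behavior: constant data has no range to rescale.
        raise ValueError("constant data cannot be rescaled")
    scale = (y_max - y_min) / (x_max - x_min)
    return [scale * (x - x_min) + y_min for x in values]

# Illustrative values only (not real AQI measurements).
print(mapminmax_like([50.0, 100.0, 150.0]))
```

The smallest input maps to −1, the largest to 1, and the midpoint of the input range maps to 0. To report predictions in the original units, the same minimum and maximum must be kept so the mapping can be inverted.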
In this section, we assess the effectiveness of the proposed EDIW-PSO-RBF model by comparing it with models built on three other inertia weight variants: the LDIW-PSO-RBF model [23], the NDIW-PSO-RBF model [24], and the GLbestIW-PSO-RBF model [25]. Firstly, we build an RBF network consisting of 7 input neurons and 1 output neuron. The 7 input neurons take the six indicators and the air quality index, and the 1 output neuron gives the next day's air quality index. The number of neurons in the hidden layer is set to 10. Therefore, the dimension of each particle in the modified PSO algorithm is $D = 7 \times 10 + 10 + 10 \times 1 = 90$.
Secondly, several parameters in the PSO simulation must be specified. In the proposed EDIW-PSO-RBF model, the value of $a$ is set to 8. In the NDIW-PSO-RBF model, the nonlinear modulation index $\mu$ is set to 1.2. In all four models, the acceleration factors $c_1$ and $c_2$ are fixed at 2. The minimum velocity $V_{\min}$ and minimum position $L_{\min}$ of every particle are both set to −1, and the maximum velocity $V_{\max}$ and maximum position $L_{\max}$ of every particle are both set to 1. The maximum number of iterations is set to 1000. The population size is set to 50.
To assess the performance of the four different models, mean square error (MSE), relative mean square error

Figure 1: The decreasing curves of inertia weight ω attained by the proposed EDIW strategy for varying $a$.

Figure 4: The best global fitness value for the AQI of Xi'an. Blue short dash line shows the best global fitness value by the improved EDIW-PSO-RBF model; red dash-dotted line shows the best global fitness value by the GLbestIW-PSO-RBF model; green solid line shows the best global fitness value by the LDIW-PSO-RBF model; black solid line shows the best global fitness value by the NDIW-PSO-RBF model.

Figure 5: The trained output curve by four models for the daily air quality index (AQI) of Xi'an.

Figure 6: Plots of trained absolute errors between the trained outputs and the actual values by four models for the daily air quality index (AQI) of Xi'an.

Table 1: Values of inertia weight ω by the proposed adaption when $t = T$, for varying sets of $\{a, \omega_{\max}, \omega_{\min}\}$.
Figure 2: Flow chart of the EDIW-PSO-RBF learning algorithm, with boxes including: initialize the swarm, the position $L_i$ and velocity $V_i$ of each particle; map $L_i$ to the parameters of the RBF network; calculate the fitness value of each particle and select $p_i$ and $p_g$; recalculate the fitness values of the new particles and modify $p_i$ and $p_g$ (each particle keeps the better of its current fitness value and its previous local best, and the swarm keeps the better of the best current local best and the previous global best); use the optimal structure of the RBFNN to perform the prediction problem.

Table 3: The values of MSE, RMSE, and MAPE (%) of trained output by four models for the daily air quality index (AQI) of Xi'an.