Real-Coded Quantum-Inspired Genetic Algorithm-Based BP Neural Network Algorithm

A method that uses the real-coded quantum-inspired genetic algorithm (RQGA) to optimize the weights and thresholds of a BP neural network is proposed to overcome the defect that the gradient descent method makes the algorithm easily fall into a local optimum during learning. The quantum genetic algorithm (QGA) has good directional global optimization ability, but the conventional QGA is based on binary coding, and the encoding and decoding processes reduce the calculation speed. RQGA is therefore introduced to explore the search space, and an improved variable learning rate is adopted to train the BP neural network. Simulation tests show that the proposed algorithm converges rapidly to a solution that satisfies the constraint conditions.


Introduction
Artificial neural networks (ANNs) were put forward to solve nonlinear problems by simulating the operation of the nervous system. ANNs are powerful tools for the prediction of nonlinearities [1], with excellent nonlinear mapping ability, generalization, self-organization, and self-learning. They have been widely applied in engineering and are steadily advancing into new areas [2].
The "feed-forward, back propagation" neural network (BPNN) is currently the most popular network architecture in use [3]. BPNN can be applied in a variety of fields according to the characteristics of the model. Sun et al. establish a prediction model based on an improved BP neural network and adopt it to investigate quantitative evolution laws of equiaxed α in near-β forging of TA15 Ti-alloy [4]. Xiao et al. propose an approach of back propagation neural network with rough set for complicated short-term load forecasting with dynamic and nonlinear factors to improve the accuracy of predictions [5]. Wang et al. apply improved variable learning rate back propagation (IVL-BP) to short-term power load forecasting [6]. Yu et al. propose a dynamic all-parameters adaptive BP neural network model through fusion of the genetic algorithm, the simulated annealing algorithm, and the BP neural network and apply it to oil reservoir prediction [2].
Although systems based on BPNN perform well, the BPNN lacks stability in some cases. Its main faults include the following. (1) A fixed learning rate leads to slow network convergence and long training time. (2) The gradient descent algorithm used to optimize the objective function makes the computation overflow or oscillate between optima and does not guarantee convergence to the global optimum [7]. (3) Structure and scale strongly influence performance: different numbers of hidden nodes and different transfer functions acting on the same data produce different results. (4) Convergence is influenced by the choice of initial weights; improper selection of the initial weights traps the algorithm in a local optimum. (5) The adjustment of weight and threshold values follows a fixed rule, so the structure cannot be adjusted adaptively within a fixed layout. (6) The learning algorithm is based on back propagation of error, which leads to slow convergence and easily falls into local minima.
To overcome the abovementioned problems, many scholars have put forward improved algorithms. Although the disadvantages have not been completely overcome, the study of BPNN has gradually advanced. The different methods can be classified as follows.
(1) The Optimization of Network Structure. Experiential or statistical methods are applied to determine the structure of BPNN, that is, the optimal combination of the number of hidden layers, the number of hidden neurons, the choice of input factors, and the parameters of the training algorithm [8]. The most systematic and general method is to utilize Taguchi's principle of experimental design [9]. Grey correlation analysis can be used to determine the number of hidden nodes of the optimal network and improve its performance [10]. Growth/pruning algorithms add or remove neurons from the initial structure according to a predetermined standard that reflects the effect of the change on the performance of the ANN. The basic rule is to add neurons when the training process is slow or the mean square deviation is greater than the specified value, and to remove neurons when changing their number does not change the response of the network or when neuron weights remain unchanged for a long time. Growth/pruning algorithms rely on basic gradient descent, which cannot guarantee convergence to the global minimum, so the algorithm may fall into a local optimum near the initial point. The structure can also be changed by applying genetic operators and evaluating an objective function [11].
(2) The Improved Training Method of BPNN. New methods can be introduced to train the neural network; for example, the online gradient method with changing scale can be used to train BPNN for better convergence [12], and the real-coded chaotic quantum genetic algorithm has been applied to train a fuzzy neural network to accelerate convergence [13]. The transfer function, the parameters, the accuracy of the assessment, and the gradient descent coefficient can also be improved during training.
(3) The Combination of BPNN with Other Optimization Algorithms to Optimize the Weights and Thresholds. Zhuo et al. propose a simulated annealing- (SA-) genetic algorithm-BPNN-based color correction algorithm for traditional Chinese medicine tongue images [14]. Liu et al. apply GA-BP to predict bitterness intensity [15]; GA-BP is compared with multiple linear regression, partial least squares regression, and plain BP to demonstrate the superiority of the GA-BP model. Wang et al. improve BPNN by introducing the cuckoo search algorithm to forecast lightning occurrence from sounding-derived indices over Nanjing [16].
This analysis shows that most methods for optimizing the weights and thresholds of BPNN are based on combination with GA or SA [14], or with both. GA and SA have major disadvantages. When solving complex optimization problems, the convergence of GA is slow and may stagnate prematurely, and large populations of individuals are required. SA imitates the crystallization process by which a metal reaches its minimum-energy state in order to search for the minimum of a general system. It was first proposed by Kirkpatrick to find the balanced configuration of a set of atoms at a given temperature [17]. Compared with other methods, the main advantage of SA is its ability to avoid falling into local optima: as a random search algorithm, it accepts both better and worse values with a certain probability. However, the computational cost of SA is large, especially for complex problems.
RQGA is a global optimization algorithm which can find the global optimal solution in a complex, multiextreme, and nondifferentiable vector space when the number of parameters is small. RQGA has fast convergence speed and strong optimization ability and does not easily converge to a local optimum. Introducing RQGA to optimize the weights and thresholds of BPNN therefore yields better solutions with higher probability.

BP Neural Network
BPNN is a multilayer feed-forward network trained by the error back propagation algorithm proposed by Rumelhart and McClelland in 1986. "Back propagation" means that the weights of the network are adjusted by propagating the error backward. Because of its simple structure, many adjustable parameters, and good operability, BPNN is one of the most widely used artificial neural network algorithms.
BPNN is a typical feed-forward network [18]. The training function adjusts the weights and thresholds in the reverse direction of the forward transfer through the network structure. After the structural model is trained on the samples, the trained model handles the samples to be measured. The operating formula of BP is \(Y = f(WX - \theta)\), where \(X\) is the input matrix, \(W\) is the weight matrix, and \(\theta\) is the threshold matrix. The specific process of BPNN is shown in Figure 1.
(5) Select another group of input and output data randomly from the sample and return to step (2); continue this process until the end condition is satisfied.
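The forward computation described above can be sketched as a minimal single-hidden-layer pass in NumPy. It assumes, as in the case study later in the paper, an S-shaped tangent transfer in the hidden layer and an S-shaped logarithmic (logistic) transfer in the output layer; the function names and layer sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    # S-shaped logarithmic (logistic) transfer function
    return 1.0 / (1.0 + np.exp(-z))

def bp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass Y = f(WX - theta) through a single hidden layer:
    hidden = tanh(W1 x - theta1), output = sigmoid(W2 hidden - theta2)."""
    h = np.tanh(w_hidden @ x - b_hidden)   # S-shaped tangent transfer
    y = sigmoid(w_out @ h - b_out)         # S-shaped logarithmic transfer
    return y
```

Training then propagates the output error backward through these same layers to update each weight and threshold.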
The learning rate of the conventional BP network is fixed. If the learning rate is too small, convergence is guaranteed but learning is slow; if it is too large, the solution may fluctuate strongly or deviate from the optimum. The learning rate therefore needs to be adjusted during training.
The basic idea of the variable learning rate is as follows: if the average variance increases beyond the preset bound after a weight update, decrease the learning rate; if it increases but stays within the preset bound, keep the learning rate unchanged; if the average variance decreases, increase the learning rate.
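This adjustment rule can be sketched as below. The threshold ratio and the increase/decrease factors are illustrative assumptions, since the paper does not give numeric values; they mirror common adaptive-BP settings.

```python
def adjust_learning_rate(lr, mse_new, mse_old,
                         max_increase=1.04, lr_dec=0.7, lr_inc=1.05):
    """Variable learning-rate rule: shrink lr when the error grows past
    a preset bound, grow lr when the error decreases, otherwise keep it.
    max_increase, lr_dec, lr_inc are illustrative, not from the paper."""
    if mse_new > mse_old * max_increase:   # error grew past the bound
        return lr * lr_dec
    if mse_new < mse_old:                  # error decreased
        return lr * lr_inc
    return lr                              # small increase: unchanged
```

In a training loop this rule is applied once per epoch, after the new mean square error has been measured.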

Real-Coded Quantum-Inspired Genetic Algorithm
Conventional QGA is based on binary coding and solves combinatorial optimization problems well, such as the traveling salesman problem [19], the knapsack problem [20, 21], and filter design [22]. Using binary numbers to represent parameters forces a trade-off between representation accuracy and string length. RQGA is better at optimizing real-valued problems with multiple extrema. Optimization of the weights and thresholds of BPNN is a typical real-valued optimization problem, and for such problems real encoding is considered better than binary and Gray coding in multiparameter optimization [23, 24]. RQGA retains the inherent advantages of QGA: its search performance and quantum operators make it effective, flexible, and robust. Since RQGA handles real parameters well, it is applied here to optimize the BPNN.

Coding Method of RQGA. An initial chromosome includes a string of real values and a string of quantum bits, where \(x_i\) (\(i = 1, \ldots, n\)) is the real-coded value and \(\theta_i\) (\(i = 1, \ldots, n\)) denotes the phase angle of the quantum bit. Each chromosome therefore carries information in real-number space and in phase space at the same time.
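A minimal sketch of initializing such a chromosome, assuming the amplitude initialization to 0.707 described later in the text and the (−0.5, 0.5) gene range used in RQGA-BP; the function name and signature are hypothetical.

```python
import numpy as np

def init_chromosome(n_params, lo=-0.5, hi=0.5, rng=None):
    """One RQGA chromosome: a real-valued string x plus quantum-bit
    amplitudes (alpha, beta), all initialized to 1/sqrt(2) ~= 0.707 so
    that |alpha|^2 = |beta|^2 = 0.5 at the start of the search."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.uniform(lo, hi, size=n_params)           # real-number string
    alpha = np.full(n_params, 1.0 / np.sqrt(2.0))    # probability amplitudes
    beta = np.full(n_params, 1.0 / np.sqrt(2.0))
    return x, alpha, beta
```

The pair (alpha, beta) is the phase-space half of the chromosome; the real string is the solution it currently encodes.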
The characteristics of RQGA are as follows.
(1) Quantum bit coding gives the population better diversity and reduces computation. Candidate solutions are generated by two neighbor operators, NO1 and NO2, whose basic principle is as follows: NO1 has better search performance and generates solution strings that differ greatly from the given string; NO2 has better exploitation performance and makes the population converge to the best solution found so far during the run. The algorithm can therefore keep a balance between exploration and exploitation.

Neighbor Operator 1 (NO1): a new element is generated by a rotation in phase space, where \(\varphi\) is the variation of the angle and \(\theta\) is the rotation angle. Each generated element is kept within the allowable range, where \(x_{\max}\) and \(x_{\min}\) are the maximum and minimum values of that range. The flowchart of using NO1 to calculate the j-th element of the i-th individual of the population in the t-th generation is shown in Figure 3. Neighbor Operator 2 (NO2): most of the mechanism of NO2 is the same as NO1, except that the value generated by NO2 lies between the current value \(x_{ij}\) and \(x_{\text{best}}\), and the generated point is used for exploiting the search space. Formula (4) is applied to calculate the new element in NO2, where \(x_{\max} = \max(x_{\text{best}}, x_{ij})\), \(x_{\min} = \min(x_{\text{best}}, x_{ij})\), and \(x_{\text{best}}\) is the best individual of the i-th family in the t-th generation.
NO1 is better at exploration; NO2 is better at exploitation. Exploration matters in early evolution and exploitation in late evolution, so NO1 is applied with greater frequency early on and NO2 with greater frequency later. The specific frequencies are defined by formula (9), where \(p_{\text{NO1}}\) is the use frequency of NO1, \(p_{\text{NO2}}\) is the use frequency of NO2, \(t\) is the current evolutionary generation, and \(T\) is the total number of evolutionary generations of each cycle.
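Formula (9) is not reproduced in the extracted text, so the sketch below assumes a simple linear ramp between the two operator frequencies, which matches the stated behavior (NO1 dominant early, NO2 dominant late).

```python
def operator_frequencies(t, T):
    """Assumed schedule for how often NO1 (exploration) vs NO2
    (exploitation) is applied in generation t of T total generations."""
    p_no2 = t / T          # grows toward 1 as evolution proceeds
    p_no1 = 1.0 - p_no2    # shrinks correspondingly
    return p_no1, p_no2
```

Any monotone schedule with the same endpoints would serve the same purpose; the linear form is only the simplest choice.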

Update of Quantum Bit String.
The states of all quantum bits in the population change during the update so that the probability of generating a solution similar to the current optimal solution increases gradually. The rate of this change is determined by the learning rate Δ; the value of Δ under different conditions is shown in Table 1.
Δ determines the speed at which a quantum bit amplitude changes from 0.707 to its final value of 0 or 1. Δ needs to be small enough that the number of generations required for the amplitude to move from 0.707 to 0 (or 1) is large. The probability of generating a solution similar to the current best solution is then larger once most of the quantum bits have converged to 0 or 1.
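A hedged sketch of this update: the exact rule in Table 1 is not reproduced here, so the code simply nudges an amplitude from 0.707 toward 0 or 1 by a small Δ while keeping the quantum state normalized.

```python
import numpy as np

def update_amplitude(alpha, toward_one, delta=0.01):
    """Move a quantum bit amplitude alpha by the learning rate delta
    toward 1 (if toward_one) or toward 0, then recompute beta so that
    |alpha|^2 + |beta|^2 = 1.  delta=0.01 is illustrative; the text
    only requires delta to be small so convergence takes many
    generations."""
    alpha = alpha + delta if toward_one else alpha - delta
    alpha = float(np.clip(alpha, 0.0, 1.0))
    beta = float(np.sqrt(1.0 - alpha ** 2))
    return alpha, beta
```

Repeated application drives the bit toward a deterministic 0 or 1, which is exactly when solutions close to the current best become most probable.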
Two kinds of migration (global and local) take effect together: individuals are selected randomly to update part of the population, and the particular application object needs to be considered when RQGA is applied to a specific problem. The flowchart of RQGA is shown in Figure 4.
RQGA has good global search ability. It is usually not restricted by constraint conditions such as the nature of the problem or the structure of its model, and it converges to the global optimum with high probability. The robustness of RQGA makes it suitable for combination with the BP algorithm to improve the generalization and learning ability of the neural network. Moreover, real-number encoding avoids the encoding and decoding steps of binary representations, improving computational efficiency.

The BPNN Based on RQGA
The convergence speed of BPNN is slow, so RQGA is introduced to optimize the network parameters, speed up convergence, and obtain the global optimum. RQGA is a global search process that moves from one population to another; the parameter space is sampled continuously and the search is directed toward the area of the current optimal solution. The BPNN based on RQGA (RQGA-BP) combines the advantages of RQGA and BPNN: RQGA optimizes the weights and thresholds of the input and hidden layers to keep the BP algorithm from being trapped in local minima. The process of RQGA optimizing BPNN is divided into three parts: structure determination of the BPNN, RQGA optimization, and BPNN optimization. Structure determination specifies the number of input and output parameters. RQGA optimization optimizes the weights and thresholds; each individual of the population contains all the weights and thresholds of the network. The flowchart of RQGA-BP is shown in Figure 5.
The specific steps of RQGA-BP are as follows. (1) Code the parameters. Set the weights and thresholds as genes; each weight and threshold is expressed by a real number, so the evolution operates directly on the weights and thresholds.
(2) Generate the initial population. The range of each gene is (−0.5, 0.5) because the weights of a well-trained network are small. (3) Calculate fitness. The goal of BPNN is to make the residual error between forecast and expected values as small as possible, so the norm of the error matrix between expected and forecast values is taken as the output of the objective function. Fitness is assigned by linear ranking with selective pressure 2. (4) Evolve the population with RQGA and calculate the fitness of the new individuals. (5) The genes of the best individual are the optimal weights and thresholds and are used in the prediction neural network.
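Step (3) can be sketched as follows. The objective is the error-matrix norm named in the text; the linear-ranking formula itself is an assumption, since the paper names the scheme and the selective pressure of 2 but does not spell out the formula.

```python
import numpy as np

def objective(pred, expected):
    """Objective value of an individual: the norm of the error matrix
    between the network forecast and the expected output."""
    return float(np.linalg.norm(pred - expected))

def linear_ranking_fitness(objective_values, pressure=2.0):
    """Fitness assignment by linear ranking with selective pressure 2:
    the individual with the smallest objective gets fitness `pressure`,
    the worst gets 2 - pressure (= 0 here).  Standard linear-ranking
    formula, assumed rather than quoted from the paper."""
    n = len(objective_values)
    order = np.argsort(objective_values)   # best (smallest error) first
    fitness = np.empty(n)
    for rank, idx in enumerate(order):
        fitness[idx] = pressure - 2.0 * (pressure - 1.0) * rank / (n - 1)
    return fitness
```

Ranking rather than raw error keeps selection pressure stable even when the error norms of a population differ by orders of magnitude.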

Case Analysis
Two cases are used to verify the performance of RQGA-BP. The training and test sample data are listed in Tables 2 and 3, respectively. The formula \(n_2 = 2 \times n_1 + 1\) is adopted to calculate the number of neurons in the hidden layer. The transfer function of the hidden-layer neurons is the S-shaped tangent function, and that of the output-layer neurons is the S-shaped logarithmic function. The states of the parts are divided into three situations, with the following output forms: Normal: (1, 0, 0); Crack: (0, 1, 0); Defect: (0, 0, 1).
The number of nodes is 15 in the input layer, 3 in the output layer, and 31 in the hidden layer. So the number of weights is 558, the number of thresholds is 34, and the total number of parameters to be optimized is 592. The number of training epochs is 1000, the training goal is 0.01, and the learning rate is 0.1. The norm of the test error on the test sample is taken as the measure of the generalization capability of the network, and the fitness of an individual is calculated from this error norm: the smaller the error, the larger the fitness.
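The parameter counts above can be checked directly from the layer sizes:

```python
def bpnn_parameter_count(n_in, n_hidden, n_out):
    """Weights and thresholds of a single-hidden-layer BPNN."""
    weights = n_in * n_hidden + n_hidden * n_out
    thresholds = n_hidden + n_out
    return weights, thresholds

# hidden-layer size from n2 = 2 * n1 + 1, then the counts in the text
n_hidden = 2 * 15 + 1                    # = 31
w, t = bpnn_parameter_count(15, n_hidden, 3)
print(w, t, w + t)                       # 558 34 592
```

15 × 31 + 31 × 3 = 558 weights and 31 + 3 = 34 thresholds, matching the 592 parameters stated in the text.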
The relative error and the coefficient of determination are chosen to evaluate the generalization ability. The calculation formulas are shown in formulas (8) and (9), respectively, where \(\hat{y}_i\) (\(i = 1, 2, \ldots, n\)) is the prediction for the \(i\)-th sample, \(y_i\) (\(i = 1, 2, \ldots, n\)) is the true value of the \(i\)-th sample, and \(n\) is the number of samples. The smaller the sum of relative errors, the better; the larger the decision coefficient \(R^2 \in [0, 1]\), the better. GA-BPNN and RQGA-BPNN were each run 10 times; the results are shown in Table 5.
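Formulas (8) and (9) are not reproduced in the extracted text; the sketch below uses the standard definitions of summed relative error and the coefficient of determination, which match the stated properties (a smaller error sum is better, and a larger \(R^2 \in [0, 1]\) is better).

```python
import numpy as np

def relative_error_sum(y_pred, y_true):
    # sum over samples of |y_hat_i - y_i| / y_i
    return float(np.sum(np.abs(y_pred - y_true) / y_true))

def r_squared(y_pred, y_true):
    # coefficient of determination R^2; 1 means a perfect fit
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Both metrics are computed on the held-out test set, never on the training samples.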
As can be seen from the data in the table, the seventh result of GA-BP is better than that of RQGA-BP; in the remaining runs RQGA-BP is better than GA-BP.
Take the fourth run as the example; the specific calculation results are shown in Table 6.
The comparison of the prediction results of the two methods with the true values in the fourth experiment is shown in Figure 8.
It can be seen that both GA-BP and RQGA-BP can predict the octane content, and the result of RQGA-BP is better.

Conclusions
The optimization of the weights and thresholds of BPNN is a numerical optimization problem. The purpose of using RQGA to optimize BPNN is to obtain better initial weights and thresholds: each individual in RQGA represents the initial weights and thresholds of the network, and the norm of the test error of the prediction sample is the output of the objective function. Compared with conventional BPNN, RQGA-BP has a higher convergence rate.

Figure 3: Flowchart of using NO1 to calculate elements of an individual.

Figure 7: Diagram of NIR of samples.

Figure 8: Result of the fourth experiment.
(2) RQGA utilizes special quantum evolutionary operators to generate candidate solutions containing real parameters, which differs from the candidate solutions generated by quantum observation in QGA. (3) RQGA applies the quantum rotation gate to realize the evolution of quantum bits, the same as in QGA. (4) Migration of different quantum bits realizes the migration between populations of different solutions, so the degree of convergence and the quality of the solution are improved. The following method is utilized to generate real-number candidate solution strings. There is a group of quantum bit strings \(Q_i(t)\) (\(i = 1, 2, \ldots, m\)), where \(Q_i(t)\) is the i-th quantum bit string in the t-th generation. Correspondingly, there is another group of strings \(X_i(t)\) (\(i = 1, 2, \ldots, m\)); each string contains \(n\) real numbers. There are \(n\) quantum bits in each \(Q_i(t)\) representing the probability amplitudes of \(X_i(t)\). The probability of generating a real number larger (smaller) than the present number is determined by \(|\alpha|^2\) (\(|\beta|^2\)). All probabilities are equal at the beginning of the search; \(\alpha\) and \(\beta\) are initialized to 0.707. Every element of \(X_i(t)\) is initialized to a random number in the allowable range. Each pair of \(Q_i(t)\) and \(X_i(t)\) constitutes the i-th family in the t-th generation. The solution strings of the i-th family are generated by \(Q_i(t)\), \(X_i(t)\), and \(X_{\text{best}}\) (the best solution found so far). Fitness is calculated under the constraint conditions. The process to generate \(X_{\text{best}}\) is shown in Figure 2.
Figure 2: Diagrammatic sketch of generating the new \(X_{\text{best}}\). The evolution of the quantum bit represents the evolution of the superposition state, and the change of \(|\alpha|^2\) and \(|\beta|^2\) is translated into the change of the real numbers generated by the two neighbor operators. The role of the two neighbor operators is that NO1 is used to search the solution space and NO2 is used to converge to the extreme value. For Neighbor Operator 1: in the t-th generation, there are \(m\) quantum bit strings \(Q_i(t)\), each with \(n\) elements. NO1 generates solution strings \(X_i(t)\) (\(i = 1, 2, \ldots, m\)), each with \(n\) elements, using an array \(s\) of \(n\) elements whose values are +1 or −1 at random, where \(s_j\) is the j-th element of \(s\).

Table 1: The method to determine the value of Δ in the t-th generation. \(\theta_t\) is the size of the rotation angle in the t-th generation, \(\theta_t = \theta_{\max} \cdot \exp(-t/T)\), where \(\theta_{\max}\) is the maximal rotation angle, \(T\) is the total number of evolutionary generations, and \(t\) is the current generation. This way of determining \(\theta_t\) helps the search in the early stage and the convergence precision in the late stage.
A randomly selected individual is used to update \(X_i(t)\) in local migration; \(X_{\text{best}}\) is used to update \(X_i(t)\) in global migration. The particularity of the specific problem also needs to be considered. Figure 4: Flowchart of RQGA.

Table 2: The training sample data.

Table 3: The test sample data.

Table 4: The experiment results of the three algorithms.

Table 5: Result of GA-BPNN and RQGA-BPNN. The number of training sets is 50 and the number of test sets is 10. Since the training and test sets are randomly generated each time, the results of each run may differ. After testing, the generalization ability of the network is evaluated by calculating the deviation between the predicted value and the true value.

Table 6: Result of the fourth experiment.