Vehicle Information Influence Degree Screening Method Based on GEP Optimized RBF Neural Network



Introduction
With continuous progress in vehicle hardware technology, together with mobile Internet applications and other technology-driven applications, the amount of real-time vehicle information (particularly text information) has increased rapidly. A large amount of real-time information consumes substantial storage and processing resources as well as transmission bandwidth, yet practical applications often do not need all of the data generated. Assessing the influence degree of the generated data on its actual use has therefore become an urgent problem for the related applications, and the topic has attracted considerable research attention. Chuanqin and Yufeng [1], from the perspective of safe driving, analyzed the influencing factors by applying AHP to survey data and compared the effects of the most influential factors on the driving safety of the driver. Wang et al. [2] used factor analysis and principal component analysis to analyze the factors influencing the choice of typical urban passenger transport modes. Xu et al. [3], on the basis of an analysis of the data sampling interval and the relationship between the traffic flow parameters and the other factors, proposed methods for screening abnormal values in real-time dynamic traffic data and recovering them. Yulong and Ma [4] proposed a method of screening and recovering real-time traffic data. Longhui et al. [5] proposed a screening and testing method for intelligent traffic sensor data. Yulong and Junying [6] sorted the influencing factors by correlation degree and then screened the key factors affecting the evolution of the urban traffic structure. In addition, traffic impact factor screening methods [7,8] and neural network optimization methods [9][10][11][12] have been developed by scholars [13,14]. However, for the screening of factors related to traffic travel, most current methods focus on trip data analysis, and few consider applications of actual vehicle data. Therefore, in this study, we used the data recorded during the normal operation of purely electric buses and took the driving range of the purely electric bus as the research object. With the GEP-optimized RBF neural network, we analyzed the influence of the main factors and screened the various factors affecting the driving range of electric buses, to provide decision support for purely electric bus line planning and adjustment, vehicle scheduling, and charging station planning.

Vehicle Information and Impact Factor Data Description
Nowadays, a large number of public transport vehicles are new energy vehicles. In these vehicles, on-board information collection is more accurate and faster than before. Therefore, we took the vehicle data of 26 purely electric buses running on route 801 in Guangzhou as the object of this study and used them to verify the accuracy of the algorithm proposed in this paper. The output was the influence degree of each influencing factor on the energy consumption, from which the influence of the various types of vehicle information on energy consumption was analyzed. The data sources were as follows. The battery management system of the purely electric buses provided historical and real-time data for eight parameters: time, SOC, total voltage, total current, maximum temperature, minimum temperature, maximum voltage, and minimum voltage. The intelligent bus dispatching system and the satellite positioning vehicle travelling data recorder provided the position, speed, direction, time, elevation, departure, shift, scheduling, vehicle entrance, vehicle departure, driver position information, and management information. The traffic card clearing and settlement system, combined with traffic surveys, passenger flow analysis, and derivation, provided the passenger flow data. The traffic performance index data were obtained from the traffic performance analysis system. Further, meteorological data were obtained from an open meteorological website.
As the energy consumption of purely electric buses is considerably affected by season, we chose the season with the maximum energy consumption for the analysis. The main reasons for this selection were as follows. First, when the energy consumption increases, the relationship between the driving range and the energy consumption is relatively easy to distinguish. Second, replacing the battery of a purely electric bus requires specialized equipment that is not movable, so a complete power-depletion experiment cannot be carried out in actual operation. The data collection was therefore limited to the energy consumption of the purely electric buses and the operating-state data of the influencing factors, and covering more of the working process under high energy use brings the predicted range closer to the true value. Third, the factors affecting the working life of a purely electric bus battery include the battery capacity, battery voltage, battery energy and specific energy, charging efficiency, and self-discharge rate. Generally, after the first several charge-discharge cycles, the battery capacity increases by 5%-20% relative to the rated capacity; it then remains unchanged for a certain period of use before gradually decreasing. When the capacity of the battery is only 80% of the initial capacity, the battery life can be considered to be over. The power batteries of the purely electric buses studied here were in the stable state.
Therefore, the energy consumption data of the electric buses for the high-temperature months from May to September 2013 were collected first, together with bus passenger flow, weather, operation, line, and station data samples, for a total of 29,113,828 raw records. The data samples were then integrated by adjacent station section: following the requirements of the section-level traffic and energy indices, we performed data fusion, classification, information extraction, matching, removal of records with missing data, and other operations. Finally, 1,023,768 complete samples were obtained. On average, each station section contained 26,941 attributes, covering the complete station information, the energy consumption data of the adjacent sections, and the actual operating driving range.
Figure 1 shows the curve of the actual energy consumption values for the 100 consecutive days from June 1, 2013, to September 8, 2013, for departures from the University City station to the backyard. The SOC consumption lay between 45% and 61%: the maximum was 60.80%, the minimum was 45.20%, the average was 53.67%, and in 69 operations the SOC consumption was 50%-57%. Figure 2 shows the passenger flow data for the entire route 801 on July 11, counted in 15-min statistical periods from 6:00 until service ended at 23:45. Figure 3 shows the average distribution of the passenger flow on July 1-15, with a statistical period of 15 min during the operational hours. Figure 4 shows the passenger flow diagram under the same conditions. With a statistical interval of 5 min, Figure 5 shows the traffic performance index of the 801 line in Guangzhou City from 6:00 to 23:59 on July 11, 2013, which reflects the changing trend of the regional traffic congestion and the temporal and spatial evolution of congestion. Similarly, with a 5-min statistical interval, Figure 6 shows, for the 801 line in July, the ratio of the comprehensive congestion time and space to the total time and space range, which indicates the congestion proportion of the road network.

Influence Degree Description and Test Result of Energy Consumption Influence Factors
The influence of the energy consumption factors on the driving range of purely electric buses can be described by calculating the influence degree of each influencing factor of energy consumption. The RBF neural network is a type of artificial neural network that uses local adjustment to perform function mapping [15][16][17]. It has strong input-output mapping capability and is regarded as the best-suited feedforward network for function mapping. The RBF neural network has a strong nonlinear approximation ability, a simple network structure, and a fast learning speed. After iteration, the network output is linear in the output matrix of the hidden layer. Therefore, it is an ideal algorithm for calculating the influence degree.
The learning process of RBF neural networks has two stages. In the unsupervised learning stage, the center vectors and normalized parameters of the Gauss function of each node in the hidden layer were determined from the input samples. In the supervised learning stage, with the hidden layer parameters fixed, the weights between the hidden layer and the output layer were calculated by using the least squares method. The output $h_i(t)$ of the ith hidden-layer node is computed from a Gauss function [15][16][17] of the network input vector $x_t$ at time t, the central vector $c_i$ of the ith cell in the hidden layer, and the shape parameter $s_i$ of the Gauss function, with $s_i > 0$ and $1 \le i \le L$, where L is the number of hidden nodes.
For the RBF neural network, the overall network output is a linear combination of the hidden-layer outputs. For the RBF network with k input nodes, M output nodes, and n learning samples, the error objective function is defined in terms of the forgetting factor $\lambda$ and of $\delta_i^n$, $\hat{y}_i^n$, and $y_i^n$, which represent the output node error, expected output, and actual output, respectively.
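The two-stage structure described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes a Gaussian basis of the form exp(-||x_t - c_i||^2 / (2 s_i^2)), takes the centers and shape parameters as already fixed, and fits the output weights by ordinary least squares without a forgetting factor. The function names `rbf_hidden` and `fit_output_weights` and the toy data are hypothetical.

```python
import numpy as np

def rbf_hidden(X, centers, widths):
    """Hidden-layer outputs h_i(t) for every sample (one row of X per sample).

    Assumed Gaussian form: exp(-||x_t - c_i||^2 / (2 * s_i^2)).
    """
    # Squared Euclidean distance between each sample and each center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def fit_output_weights(H, y):
    """Least-squares output weights for the linear output layer."""
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

# Toy data (hypothetical): approximate y = sin(x) on [0, pi]
X = np.linspace(0.0, np.pi, 50).reshape(-1, 1)
y = np.sin(X[:, 0])
centers = np.linspace(0.0, np.pi, 6).reshape(-1, 1)  # L = 6 hidden nodes
widths = np.full(6, 0.6)                             # shape parameters s_i > 0

H = rbf_hidden(X, centers, widths)   # hidden-layer output matrix
w = fit_output_weights(H, y)         # supervised stage
y_hat = H @ w                        # network output, linear in H
max_abs_error = np.abs(y_hat - y).max()
```

Because the output is linear in the hidden-layer matrix `H`, the supervised stage reduces to a single linear least-squares solve once the centers are fixed, which is why the choice of centers dominates the accuracy of the network.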
The sample data of the adjacent sections of the same bus at the same time of departure in July 2013 were selected. Based on these sample data, Figures 7 and 8 show the prediction error maps calculated by the RBF neural network for two impact factors, the braking mileage proportion and the total voltage.
Since the number of hidden nodes in RBF networks and the centers of the hidden nodes were difficult to determine, the accuracy of the entire network was affected. Figures 7 and 8 show that when the training sample selection was more random, the error reached more than 33%, and these errors directly affected the calculation of the influence degree of the influencing factors. In addition, as the input samples existed in various forms, including discrete values, continuous values, and missing values, the training samples were usually formed by random sampling. The centers of the hidden layer basis functions of the RBF neural network were selected from the input sample set and therefore depended considerably on the training samples; in many cases, they could hardly reflect the real input-output relationship of the system. Moreover, if too many initial center points are selected, data problems arise during the optimization process. This sample selection problem is the key problem to be solved when an RBF neural network is used to model a nonlinear system. To solve this problem, in this study, all of the training samples were normalized, and the gene expression programming (GEP) algorithm [15][16][17], together with other algorithms [12,18-31], was introduced to optimize the RBF neural network on which the influence degree calculation is based.
For the normalization, we used the maximum and minimum method to normalize the sample data to the range [0, 1]: $x_n' = (x_n - \min x_n) / (\max x_n - \min x_n)$, where $\min x_n$ is the minimum value in the data sample of a specific influencing factor and $\max x_n$ is the maximum value in the data sample of that influencing factor. After normalization, the RBF neural network was optimized by GEP and then computed. GEP combines the advantages of the genetic algorithm (GA) and genetic programming (GP): simple coding, strong local refinement, and broad adaptability. By separating genotype and phenotype, the algorithm solves complex application problems through a simple, compact encoding; it has strong global search ability; it is a highly parallel, randomized, adaptive search algorithm; and it overcomes the low survival rate and limited semantic richness of GA operations. The GEP optimization of the RBF neural network can be described as follows [16].
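The maximum and minimum normalization can be sketched as follows. This is a minimal illustration under stated assumptions: each column holds one influencing factor, a constant column is mapped to 0 (a choice made here to avoid division by zero, not something the paper specifies), and the function name `min_max_normalize` and the sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (one influencing factor) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    # A constant column has zero span; map it to 0 instead of dividing by zero
    safe_span = np.where(span == 0, 1.0, span)
    return (X - col_min) / safe_span

# Hypothetical samples: columns are SOC (%), total voltage (V), a constant flag
samples = np.array([
    [45.2, 520.0, 1.0],
    [53.0, 540.0, 1.0],
    [60.8, 560.0, 1.0],
])
normalized = min_max_normalize(samples)
```

Normalizing per factor keeps quantities with large numeric ranges, such as voltage, from dominating the distance computation in the hidden layer.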
First, the chromosome was encoded as an expression tree, the initial cluster centers in the gene tail were automatically segmented and merged, and the new clustering centers and the number of centers were obtained. Second, the new clustering centers were used as the center vectors of the RBF neural network and, together with the weights in the chromosome, were used to construct an RBF neural network structure; the sample data were then input to the neural network to obtain the actual output. Then, the total network error was calculated from the differences between the expected output $\xi_k^p$ and the actual output $y_k^p$ of the kth output neuron under the action of sample p, over all N sample pairs and all L output nodes of the network, and the fitness value was calculated according to the fitness function. The chromosomes with a large fitness value were retained in the next generation, and the GEP operators were used to perform the genetic manipulation of the chromosomes in the population to obtain the next-generation chromosomes. The same procedure was then applied to the next generation of chromosomes. As the chromosomes evolved, the center vectors and the weights of the RBF neural network moved gradually in the direction that minimized the total error of the network (i.e., maximized the fitness value), thereby optimizing the entire network.
Considering the actual situation of purely electric buses, in this study, we adjusted the input randomness of the GEP-optimized RBF neural network. The specific steps are as follows.
Input: the normalized dataset, the GEP parameters, and the input weight range array w.
Output: the trained RBF neural network (including the influence degree of each influencing factor).
Step 1. Initialize the RBF neural network, and select a number of weights from the input weight range array w as initial weights.
Step 2. Select the fuzzy cluster center of the adjacent section of the station as the initial cluster center, form the individual chromosomes of GEP, and initialize the population.
Step 3. Segment and merge the initial cluster centers formed by the fuzzy clustering in the adjacent section of the station, and form a new clustering center according to the individuals' expressions.
Step 4. Use the new clustering center as the central vector of the RBF neural network, together with the weights in the chromosome, to construct the RBF network structure.
Step 5. Calculate the total network error and the fitness of the chromosomes according to the network structure constructed, and retain the chromosomes with the highest fitness in the next generation.
Step 6. Select the chromosomes by using the roulette strategy.
Step 7. Perform the crossover operation on chromosomes by using the crossover probability $P_c$.
Step 8. Mutate chromosomes according to the mutation probability $P_m$.
Step 9. If the maximum number of iterations has been reached or the maximum fitness value has converged, choose the optimal chromosome; otherwise, return to Step 3, in which the initial cluster centers formed by fuzzy clustering in the adjacent sections of the station are segmented and merged according to the individuals' expressions to form new cluster centers.
Step 10. Decode the selected optimal chromosome and construct the final network structure.
Step 11. Output the RBF neural network optimized by GEP.
We again selected the sample data of the adjacent sections of the station for July 2013 to predict the impact factors. The error maps before and after the GEP optimization were compared, as shown in Figures 9 and 10.
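The overall shape of the loop in Steps 1-11 can be sketched with a much simpler stand-in. The code below is a plain real-valued genetic algorithm over sets of RBF centers, not GEP: it has no expression trees, no segmentation and merging of centers, and no fuzzy clustering, and the initial centers are drawn at random from the samples. The error is assumed to be half the sum of squared errors, the fitness is assumed to be 1 / (1 + error), and all names, parameter values, and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden(X, centers, width):
    """Gaussian hidden-layer outputs (assumed form, shared width)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def total_error(centers, X, y, width):
    """Half the sum of squared errors after a least-squares fit of the weights."""
    H = hidden(X, centers, width)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return 0.5 * np.sum((y - H @ w) ** 2)

def evolve_centers(X, y, n_centers=5, pop_size=20, generations=30,
                   p_c=0.7, p_m=0.1, width=0.2):
    # Each chromosome is a set of centers drawn from the samples
    pop = [X[rng.choice(len(X), n_centers, replace=False)].copy()
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Build each network, then compute its error and fitness
        errors = np.array([total_error(c, X, y, width) for c in pop])
        fitness = 1.0 / (1.0 + errors)
        best = pop[int(np.argmax(fitness))].copy()
        history.append(errors.min())
        # Roulette-wheel selection: probability proportional to fitness
        probs = fitness / fitness.sum()
        parents = [pop[i].copy() for i in rng.choice(pop_size, pop_size, p=probs)]
        # One-point crossover on the list of centers, with probability p_c
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_c:
                cut = rng.integers(1, n_centers)
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
        # Gaussian mutation of individual centers, with probability p_m each
        for c in parents:
            mask = rng.random(n_centers) < p_m
            c[mask] += rng.normal(0.0, 0.05, size=(mask.sum(), X.shape[1]))
        parents[0] = best  # keep the fittest chromosome unchanged
        pop = parents
    errors = np.array([total_error(c, X, y, width) for c in pop])
    history.append(errors.min())
    return pop[int(np.argmin(errors))], history

# Toy data (hypothetical), already scaled to [0, 1]
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
best_centers, history = evolve_centers(X, y)
```

Retaining the fittest chromosome in every generation means the best error recorded in `history` never increases, which mirrors the role of the retention rule in Step 5.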

Complexity
After the optimization of the RBF neural network by GEP, the influence of the jth input influencing factor on the ith hidden-layer node can be expressed by the correlation coefficient $v_{ij} = \operatorname{cov}(h_i(t), x_j(t)) / \sqrt{\operatorname{var}(h_i(t))\,\operatorname{var}(x_j(t))}$, where $\operatorname{cov}(h_i(t), x_j(t))$ is the covariance between the output $h_i(t)$ of the ith hidden-layer node and the jth input influencing factor $x_j(t)$, and $\operatorname{var}(h_i(t))$ and $\operatorname{var}(x_j(t))$ are the variances of $h_i(t)$ and $x_j(t)$, respectively.
The influence of the ith hidden-layer node on the output is expressed in terms of $\operatorname{cov}(h_i(t), y)$, the covariance between the output $h_i(t)$ of the ith hidden-layer node and the network output $y$; $\operatorname{var}(h_i(t))$ and $\operatorname{var}(y)$, the variances of $h_i(t)$ and $y$, respectively; and the output weight $w_i$.
From these quantities, the impact of the jth input factor on the output quantity and the total influence degree of all of the factors are obtained. Generally, the influence factors are sorted by influence degree from large to small, and the influence degrees are accumulated starting from the largest. Once the cumulative result reaches 85% or more of the total influence degree, the factors included in the accumulation are regarded as the important influence factors; these factors have an important influence on the prediction results.
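The cumulative screening rule can be sketched as follows. The sketch takes the influence degrees as given rather than deriving them from the network; the function name `screen_factors`, the factor names, and the numbers are hypothetical and are not values from Tables 1-5.

```python
import numpy as np

def screen_factors(influence, threshold=0.85):
    """Return the top-ranked factors whose cumulative share of the total
    influence degree first reaches the threshold, with the running totals."""
    names = list(influence)
    values = np.array([influence[n] for n in names], dtype=float)
    shares = values / values.sum()
    order = np.argsort(-shares, kind="stable")   # largest share first
    cumulative = np.cumsum(shares[order])
    # First position at which the running total reaches the threshold
    k = int(np.searchsorted(cumulative, threshold - 1e-12)) + 1
    return [names[i] for i in order[:k]], cumulative[:k]

# Hypothetical influence degrees for five factors
influence = {"factor_a": 40.0, "factor_b": 30.0, "factor_c": 20.0,
             "factor_d": 6.0, "factor_e": 4.0}
selected, running_total = screen_factors(influence)
```

In this example the first two factors account for 70% of the total, so the third is still needed to pass 85%, and the remaining two are screened out.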
The influence degree values of all the impact factors with available quantitative information, calculated from the data for May 2012-April 2013, are shown in Tables 1-5.
The ranking of the influencing factors was as follows: residual SOC value, energy feedback mileage proportion, season (air conditioning use), time and direction of travel, braking time proportion, uniform time proportion, acceleration/deceleration time proportion, minimum temperature, slipping mileage proportion, braking mileage proportion, energy feedback proportion, traffic congestion index, maximum temperature, slipping time proportion, congestion duration, uniform mileage proportion, average speed on main road, average deceleration, average speed of secondary trunk roads, average deceleration mileage proportion, average acceleration, weather, acceleration and deceleration mileage proportion, average accelerated mileage proportion, section weight coefficient, and congestion mileage proportion. The cumulative influence degree of these factors reached 86.11%, and they were considered the important factors. The next three factors, passenger number (2.33%), network stability index (2.06%), and number of congested sections of adjacent stations (2.00%), each had an influence degree of 2% or more and thus also occupied a fairly large proportion. The remaining factors, namely outside lighting (light), voltage difference, total current, total voltage, maximum voltage, and minimum voltage, together accounted for only 7.5% of the total influence degree. These factors were, in general, nonimportant factors.

Influence Degree Application and Test Result
Using the purely electric buses of Guangzhou's 801 line as the research object, we established a forecasting model for the operating driving range based on the factors screened by influence degree and verified it by predicting the operating status of the purely electric buses on this line. Some influencing factors show a certain regularity; for example, traffic data for the same day of the week and the same shift time are regular. Therefore, this paper uses samples from the same day of the week and the same departure time as data samples under the same condition for predicting the operating driving range. In general, the 801 line runs 52-58 shifts daily; working days and nonworking days differ, and the departures are adjusted according to the actual situation. Figure 11 shows the distribution of the number of departures and the departure times of the buses on the 801 line between 6:00 and 23:00 on July 11, 2013, counted in half-hour intervals.
The data from May 2012 to August 2013 were used as the training samples, and the actual data from September and October 2013 were used as the actual values for comparison with the predicted values. The error values of the predicted values during this period were statistically analyzed. The period contained 11 holiday days (the Mid-Autumn Festival and the National Day holiday, with September 30 counted as a holiday), 11 weekend days, and 39 working days. After removal of the records with incomplete data or obvious data errors, a total of 613 holiday departures, 597 weekend departures, and 2105 working-day departures remained for comparison with the predicted values.
Table 6 shows the statistics of the relative errors (in absolute value) for some of the energy consumption influencing factors. The factors considered were those that were very influential and could be quantified, such as the minimum temperature, energy feedback mileage proportion, traffic congestion index, sliding time proportion, energy feedback proportion, braking mileage proportion, and uniform mileage proportion.
Table 6 lists the prediction results of some influential factors. The seven influencing factors together accounted for 23.58%, which has a certain significance. For the seven factors, the "number of sample errors less than 20%" statistic, taken on the absolute value of the error, was mostly greater than 70%, which basically meets the requirement of predicting the values of the impact factors fairly accurately. Most of the values for "number of sample errors less than 5%" were between 10% and 15%, and an accurate estimate of the running mileage was realized. Among the three categories of working days, weekends, and holidays, the "number of sample errors less than 20%" statistic on working days exceeded 80% for all factors except the energy feedback proportion and the braking mileage proportion. The average value of the "number of sample errors less than 20%" statistic over the seven factors was 81.65% on working days, which was considerably larger than the averages of 74.11% and 69.99% on weekends and on holidays, respectively. This implies that the influencing factors had strong regularity on working days, whereas the regularity on weekends and holidays was relatively weak, which is consistent with the actual traffic flow, passenger flow, and so on.
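The "number of sample errors less than 20%" style of statistic can be computed as in the following sketch, which assumes the statistic is the share of samples whose absolute relative error is below a tolerance. The function name `share_within` and the sample values are hypothetical and are not data from Table 6.

```python
import numpy as np

def share_within(actual, predicted, tolerance):
    """Fraction of samples whose absolute relative error is below tolerance."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rel_err = np.abs(predicted - actual) / np.abs(actual)
    return float(np.mean(rel_err < tolerance))

# Hypothetical values of one influencing factor
actual = [10.0, 20.0, 40.0, 50.0]
predicted = [10.4, 23.0, 47.0, 65.0]   # relative errors: 4%, 15%, 17.5%, 30%
share_5 = share_within(actual, predicted, 0.05)
share_20 = share_within(actual, predicted, 0.20)
```

Here one of the four samples falls within 5% and three fall within 20%, so the two statistics are 25% and 75%.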
To visually express the difference between the predicted and the actual values of the influencing factors, the predicted values and the actual values were displayed as scatterplots, with the predicted value on the abscissa and the actual value on the ordinate. If the predicted value was the same as the actual value, the data point fell on the y = x line. If the predicted value was greater than the actual value, the data point fell below the straight line; otherwise, it fell above it. The actual and the predicted values of all the energy feedback proportions during the same operating period of September and October 2013 were used as examples to illustrate the deviations between the predicted and the actual values. Figure 12 shows the scatter diagram for the predicted values and the actual values of the energy feedback proportion on the workdays.
The points in the scatter plot of the energy feedback proportion lay between 0.05% and 0.2%. The predicted and the actual values were larger than those of the other regions, and the mean square error of all the data was 0.0786. The Pearson coefficient, which is based on the sums of squared deviations $\sum (y_i - \bar{y})^2$, was used as the correlation measure, and the calculated value was r = 0.9238. The scatter diagrams for the traffic congestion index, acceleration and deceleration proportion, braking time proportion, uniform mileage proportion, and slipping mileage proportion are shown in Figures 13-17. The mean square errors between the predicted values and the actual values of the slipping mileage proportion, braking mileage proportion, braking time proportion, uniform time proportion, and acceleration/deceleration time proportion were 0.0657, 0.0877, 0.0723, 0.0802, and 0.0902, respectively, and the Pearson coefficients were 0.9109, 0.9321, 0.9103, 0.9331, and 0.9213, respectively.
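The two summary measures reported for each scatter diagram, the mean square error and the Pearson coefficient, can be computed as in this minimal sketch. The function names and the four sample values are hypothetical and are unrelated to the data behind Figures 12-17.

```python
import numpy as np

def mse(actual, predicted):
    """Mean square error between predicted and actual values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

def pearson_r(actual, predicted):
    """Pearson coefficient from the sums of squared deviations."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    da, dp = a - a.mean(), p - p.mean()
    return float(np.sum(da * dp) / np.sqrt(np.sum(da ** 2) * np.sum(dp ** 2)))

# Hypothetical actual and predicted values
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
error = mse(actual, predicted)      # every error is +/-0.5, so MSE = 0.25
r = pearson_r(actual, predicted)
```

The two measures are complementary: the mean square error captures the size of the deviations, while the Pearson coefficient captures how closely the points follow a straight line, regardless of offset or scale.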
The errors of the operating mileage of the purely electric bus were presented in the same way as those of the main influencing factors: the actual and the predicted operating mileage values were compared to indicate the accuracy of the algorithm. Figure 18 shows the scatter diagram of the actual values and the predicted values of the driving range; the abscissa shows the predicted values, and the ordinate shows the actual values. As with the influencing factors, if the predicted value was the same as the actual value, the data point fell on the straight line. If the predicted value was greater than the actual value, the data point fell below the straight line; otherwise, it fell above it.
The test samples of the operating driving range were concentrated in the mileage range between 49 km and 56 km, which is consistent with the actual operating pattern; this range accounted for 81.45% of the samples. The mean square error of the predicted values was MSE = 1.9673, and the Pearson coefficient was r = 0.9313.
We extracted the actual mileage values of the test samples and analyzed the errors for the samples between 49 km and 56 km. In Figure 19, the abscissa denotes the actual values of the operating driving range, and the ordinate represents the difference between the predicted value of the operating driving range and the actual value.
The prediction error distribution can be observed from the two scatter diagrams of the predicted and the actual values of the operating driving range. The proposed algorithm predicted the operating driving range accurately, and the error was kept within the acceptable range. The overall results (including working days, weekends, and holidays) were counted for prediction errors of less than or equal to 3%, 5%, and 8%. For the longest approved line mileage of 60 km, these thresholds correspond to error values of less than 1.80 km, 3.00 km, and 4.80 km. The average mileage of the test samples was 54.13 km, for which the corresponding error values were less than 1.08 km, 2.17 km, and 3.25 km. The shares of samples within the three thresholds were 14.48%, 44.43%, and 81.87%, respectively, which meets the basic requirements for the operating driving range of the pure electric bus and the actual operation safety requirements.

Conclusions
This study used the location data and energy consumption data collected from purely electric buses as the basic data, combined with quantitative expressions of the factors affecting the bus operation process, to analyze the influencing factors of the operating driving range and to establish a database for purely electric buses. The RBF neural network algorithm was optimized by GEP, and the influence degrees of the factors affecting the operating driving range of the pure electric bus were calculated. The influence degrees of all of the factors were analyzed, and the influential factors were obtained. Based on the fuzzy clustering algorithm and the fuzzy time series algorithm, a prediction model of the operating driving range of the purely electric bus was established in this study. The experimental results showed that this method could simplify the mathematical model and reduce the computational complexity without greatly affecting the accuracy of the prediction results. As vehicle hardware continues to progress, the amount of data required by applications continues to increase, which places demands on mobile bandwidth, the computing capabilities of mobile devices, and so on. In this study, through the optimization algorithm, we attempted to improve the efficiency of bandwidth use and of the application data; to support the safe operation of electric buses, battery protection, battery energy management, line adjustment, and estimation of the emergency scheduling mileage of purely electric buses; and to provide decision support for charging station planning. This is of considerable significance for the further promotion of purely electric buses, hybrid buses, and other new energy vehicles. At the same time, it provides experience for other applications in which data need to be screened and analyzed.

Figure 1: Consumption of SOC value curve for 100 consecutive days.

Figure 5: City road traffic congestion index.

Figure 9: Influencing factor prediction error of RBF neural network before and after optimization (brake mileage proportion).

Figure 10: Influencing factor prediction error of RBF neural network before and after optimization (total voltage).

Figure 12: Scatter diagram of predicted values and actual values (energy feedback proportion).

Figure 13: Scatter diagram of predicted values and actual values (traffic congestion index).

Figure 14: Scatter diagram of predicted values and actual values (acceleration and deceleration time proportion).

Figure 16: Scatter diagram of predicted values and actual values (uniform range proportion).

Figure 17: Scatter diagram of predicted values and actual values (sliding range proportion).

Figure 18: Scatter diagram of predicted values and actual values (operating driving range, km).

Table 1: The influence degree of energy consumption influence factors (battery).

Table 2: The influence degree of energy consumption influence factors (traffic performance index).

Table 3: The influence degree of energy consumption influence factors (driving characteristics).

Table 4: The influence degree of energy consumption influence factors (driving actual service life characteristics).

Table 5: The influence degree of energy consumption influence factors (other).

Table 6: Statistics of the forecasting results for the influence factors of the operating driving range of the pure electric bus (relative error values).