Genetic Algorithm Based Microscale Vehicle Emissions Modelling

1 MOE Key Laboratory for Transportation Complex Systems Theory and Technology, School of Traffic and Transportation, Beijing Jiaotong University, 3 Shangyuan Cun, Haidian District, Beijing 100044, China
2 Faculty of Civil Engineering, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
3 School of Civil Engineering, Faculty of Engineering, Architecture and Information Technology, The University of Queensland, Brisbane St Lucia, QLD 4072, Australia


Introduction
Fuel consumption and emission estimation can be critical for comprehensive transportation planning. In light of the strong relationship between CO2 emissions and fuel consumption [1] and the high volatility in crude oil prices, planners require more accurate and robust tools to quantify environmental impacts so that project evaluation can adequately address community expectations. Quantitative travel demand and emission models are necessary for the evaluation of future transport and land use options, as well as for the management of existing transport systems. The modelling of emissions is therefore seen as an increasingly important tool in transportation planning and management.
Historically, car-following and traffic flow models have been developed on different theoretical bases. This has given rise to two main kinds of models of traffic dynamics: microscopic representations, based on the description of the individual behavior of each vehicle, and macroscopic representations, which describe traffic as a continuous flow obeying global rules [2]. Strategic travel demand models tend to be large and regional in nature, whereas microsimulation models are used for detailed tactical or operational testing of options. At the highest macroscopic level, the total vehicle flow and the average speed over an entire network may be all that is provided [3]. At the lowest level of the hierarchy, high-resolution microscopic transportation models typically produce second-by-second vehicle trajectories (location, speed, acceleration, etc.). Hence, traffic modelling and emission modelling should match in both accuracy and level of aggregation. For instance, driving cycles used for vehicle emission testing are specified as second-by-second speed-time profiles. Microscopic traffic models should therefore integrate real-time emission prediction models that can utilize high-resolution transportation modelling results, generating potentially more precise emission estimates.
Taking the highest macroscopic level as an example, the total vehicle flow and the average speed over an entire network may be all that is provided [4]. Correspondingly, a number of emission models deliver predictions based on aggregated traffic model outputs, for example, ARTEMIS in Europe and MOBILE6 in the US [5,6]. The common macrolevel modelling approach used to produce a mobile source emission inventory is based on two processing steps. The first step determines a set of emission factors that specify the rate at which emissions are generated, and the second step produces an estimate of vehicle activity. The emission inventory is then calculated by multiplying the results of these two steps together. This methodology has two major shortcomings, as follows.
Inaccurate Characterization of Traffic Activities. The overall rate of error in 20-year traffic forecasts resulting from strategic transport models is likely to be large due to the uncertainty attached to input forecasts. The predictive ability of current emission models depends on (a) the errors in the input data used to estimate the amount and characteristics of traffic flow (such data are usually the result of travel demand modelling analysis, which is known to carry a large degree of uncertainty) and (b) the adequacy and quality of the information on emission rates that is used in the models.

Emissions Factors May Not Represent Actual Conditions Adequately.
The current methods used for determining emission factors are based on average driving characteristics embodied in a predetermined driving cycle. Emission measurements are used as base values to reconstruct statistically the relationship between emission rates and average vehicle speeds. These "averaged speeds" are at variance with the dynamic operation of vehicles at the microscopic level.
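The two-step macrolevel calculation described above (an emission factor multiplied by vehicle activity) can be sketched as follows. This is a minimal illustration only: the speed-band names and emission factor values are hypothetical and are not taken from ARTEMIS, MOBILE6, or the present study.

```python
# Hypothetical emission factors in g per vehicle-km, keyed by average-speed band.
# The bands and values are illustrative assumptions, not published factors.
EMISSION_FACTORS_G_PER_KM = {
    "congested": 260.0,
    "urban": 190.0,
    "free_flow": 150.0,
}


def emission_inventory(activity_vkt_by_band):
    """Multiply each emission factor (step 1) by vehicle activity (step 2) and sum.

    activity_vkt_by_band maps a speed band to vehicle-kilometres travelled.
    Returns total emissions in grams.
    """
    return sum(
        EMISSION_FACTORS_G_PER_KM[band] * vkt
        for band, vkt in activity_vkt_by_band.items()
    )


activity = {"congested": 1000.0, "urban": 5000.0, "free_flow": 2000.0}
total_g = emission_inventory(activity)
print(total_g / 1e6, "tonnes")
```

Because the factor is tied to an average speed, two networks with the same average speed but very different stop-and-go behaviour would receive the same estimate, which is the second shortcoming noted above.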
At the lowest level of the hierarchy, high-resolution microscopic transportation models typically produce second-by-second vehicle trajectories (location, speed, and acceleration). Driving cycles used for vehicle emission testing are also specified as second-by-second speed-time profiles. Microscopic models should be integrated with real-time emission prediction models that can utilize high-resolution transportation modelling results, thereby generating potentially more precise emission estimates. Several commercial microsimulation traffic packages are widely used to estimate emissions [7].
A number of microlevel modelling approaches have been proposed to estimate future vehicle emissions in conjunction with the outputs of transport models. One such approach uses engine power as the main predictive basis. Another uses vehicle speed and acceleration as predictive variables. There are thus three main types of modelling approaches, namely, power-based, speed-based, and hybrid models. Two models from each category, as shown in Table 1, have been analysed [13]. Power-based models were found to perform well on CO2 emission and fuel consumption. In contrast, the predicted HC, CO, and NOx do not match well with the measured results [8]. Moreover, it is difficult to collect some coefficients associated with instantaneous power, such as aerodynamic drag coefficient, vehicle frontal area, and gearing. The results from speed-based models highlight the need to model the acceleration, deceleration, and cruising stages of the urban cycle separately.
Table 1
Power-based: Comprehensive modal emissions model (CMEM) [3]
Speed-based: Energy and emissions model (VT) [9]; Instantaneous traffic emissions [10]
Hybrid: Microscale modelling [11]; Microscale modelling [12]
The instantaneous traffic emissions model, a speed-based approach which uses microscopic traffic simulation results as input, was found to have merit based on the evaluation results. More complex models, whilst theoretically more desirable, may introduce additional input measurement errors, such that the overall effect may not be more accurate estimates [14].
The instantaneous traffic emissions model developed by Int Panis et al. [10] has been adopted by the AIMSUN traffic simulation model [7]. The latter integrates traffic simulation results with emission prediction equations. Emission functions for each vehicle were derived with instantaneous speed and acceleration as parameters using nonlinear multiple regression techniques. The model, shown in (1), was calibrated using data from twenty-five vehicles (six buses, two trucks, and seventeen cars) in Europe. The pollutants modelled are nitrogen oxides (NOx), hydrocarbon (HC), carbon dioxide (CO2), and particulate matter (PM):
$$E_n(t) = \max\left[E_0,\; f_1 + f_2 v_n(t) + f_3 v_n(t)^2 + f_4 a_n(t) + f_5 a_n(t)^2 + f_6 v_n(t)\, a_n(t)\right], \tag{1}$$
where $E_0$ is a lower limit of emission (g/s) specified for each vehicle and pollutant type; $v_n(t)$ is the instantaneous speed of vehicle $n$ at time $t$; $a_n(t)$ is the acceleration of vehicle $n$ at time $t$; and $f_1$ to $f_6$ are emission constants specific to each vehicle and pollutant type, determined by the regression analysis.
For certain pollutants, whenever visual inspection of the data plot reveals a clear distinction in the scatter for acceleration and deceleration, the approach suggests that different functional forms should be derived for different driving modes, such as acceleration (with $a_n(t) \ge 0.5$ m/s$^2$), cruising (with $-0.5$ m/s$^2 \le a_n(t) < 0.5$ m/s$^2$), and deceleration (with $a_n(t) < -0.5$ m/s$^2$). The modelling results for pollutant emissions such as HC are unsatisfactory, as shown in Section 4.
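As an illustration of the speed-based approach, the sketch below classifies each second of a trajectory into a driving mode using the ±0.5 m/s² thresholds given above and evaluates a lower-bounded second-order polynomial in speed and acceleration. The functional form with six coefficients is assumed here to correspond to (1); the names `e0` and `f`, and every coefficient value, are hypothetical placeholders rather than calibrated constants.

```python
def driving_mode(accel):
    """Classify a time step by acceleration (m/s^2) using the thresholds in the text."""
    if accel >= 0.5:
        return "acceleration"
    if accel < -0.5:
        return "deceleration"
    return "cruising"


def instantaneous_emission(speed, accel, e0, f):
    """Emission rate (g/s) as a polynomial in speed and acceleration, bounded below by e0.

    f is a sequence of six coefficients (f1..f6); the form is an assumption.
    """
    f1, f2, f3, f4, f5, f6 = f
    poly = (f1 + f2 * speed + f3 * speed ** 2
            + f4 * accel + f5 * accel ** 2 + f6 * speed * accel)
    return max(e0, poly)


# Hypothetical coefficient sets, one per driving mode (illustrative values only).
COEFFS = {
    "acceleration": (0.0, 1e-3, 1e-5, 2e-3, 1e-3, 5e-4),
    "cruising": (0.0, 8e-4, 1e-5, 0.0, 0.0, 0.0),
    "deceleration": (0.0, 5e-4, 0.0, 0.0, 0.0, 0.0),
}

trajectory = [(10.0, 1.2), (12.0, 0.1), (11.0, -1.5)]  # (speed m/s, accel m/s^2)
for speed, accel in trajectory:
    mode = driving_mode(accel)
    rate = instantaneous_emission(speed, accel, 0.0, COEFFS[mode])
    print(mode, rate)
```

Keeping one coefficient set per mode mirrors the suggestion that separate functional forms be fitted where acceleration and deceleration scatter differently.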
The current paper aims to develop vehicle emission models for predicting emission pollutants other than CO2. The models adopt a genetic algorithm (GA) for selection of the predicting variables. This paper is structured as follows: Section 2 briefly describes the data sources used and the selection and validation prior to the model development; Section 3 proposes the model development approach; and Section 4 then discusses the findings. Finally, some overall conclusions and limitations are drawn from the results, and future research avenues are recommended.

Data Sources, Selection, and Validation
The data used for analysis in the current paper were extracted from the Australian national in-service emissions study (NISE2) [15], which was developed using a Composite Urban Emissions Drive Cycle (petrol CUEDC). This cycle represents typical vehicle journeys in a given metropolitan study area; data were collected in the Brisbane, Sydney, Melbourne, Adelaide, and Perth areas. The emission rates for CO2, CO, HC, and NOx of the test-bed vehicles from the NISE2 fleet, driven over the CUEDC, were recorded second-by-second together with the instantaneous speed. Prior to analysis, the integrity of the emission measurements from NISE2 was rechecked and corrected to enhance the reliability of the instantaneous emissions, using the method followed by Smit et al. [16]. In addition, datasets affected by cold starts were filtered out by adopting an approach recommended by Favez et al. [17]. Eight average-aged passenger vehicles (which had travelled approximately 50,000-100,000 km each), as listed in Table 2, were shortlisted from the NISE2 database for model development. The vehicles selected are similar to the Australian vehicle fleet in characteristics such as mileage and age [18]. The instantaneous emission observations were averaged over those vehicles.
The AIMSUN emission methodology was adopted for further evaluation using the NISE2 data subset described earlier. Each vehicle was "driven" through the speed profile of the CUEDC, which was simulated using a purpose-specific program. The averaged CO2 emission observations from the selected eight vehicles were compared with the predicted second-by-second output from the AIMSUN model. The goodness of fit, $R^2$, for the two sets of data was 0.80, as shown in Figure 1.
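For readers who wish to reproduce this kind of comparison, a goodness-of-fit value between observed and predicted second-by-second series can be computed as sketched below. The paper does not state which definition of R² was used; the coefficient of determination (one minus the ratio of residual to total sum of squares) is assumed here, and the sample numbers are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented second-by-second CO2 rates (g/s), for illustration only.
observed_co2 = [1.0, 2.5, 3.0, 2.0, 1.5]
predicted_co2 = [1.2, 2.2, 3.1, 2.1, 1.3]
print(r_squared(observed_co2, predicted_co2))
```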

Development of Emission Models
Microsimulation-based modelling results for pollutants other than CO2 are usually unsatisfactory [12]. The current paper proposes a methodology to improve those results. The concept is briefly described below.
Current microscopic models use a combination of instantaneous velocity and acceleration to predict various gaseous pollutants including HC and CO. Gaseous pollutants such as HC and CO are primarily formed during in-cylinder combustion processes, depending on many factors such as air-fuel ratio, cylinder temperature and pressure, and engine speed [19]. The formation of HC and CO rises in a fuel-rich environment. Thus, the high correlation between CO2 emission rate and fuel consumption was taken into account in the modelling of these gaseous pollutants. In addition, vehicle acceleration or deceleration leads to a substantial change in fuel injection per combustion cycle. The change in air-fuel ratio forces the engine to adapt to a new equilibrium and tends to lead to a transient variation in pollutant formation [20]. This effect may be compounded by dynamic effects in the catalyst and exhaust system, such as catalyst malfunctioning, which can cause a sudden increase in pollutant emissions. For these reasons, modelling of HC and CO, as the products of incomplete fuel combustion, should take the time-lag effect described above into consideration. Hence, several "historical" variables (i.e., variables at previous time steps, 1, 2, 3, or 4 seconds earlier) are introduced into the models as predicting variables.
A range of variables for the instantaneous and "historical" velocity, acceleration, and CO2 emission rate were selected and tested. In the candidate model, the leading term is a constant; $v_t$ is velocity at time $t$; $a_t$ is acceleration at time $t$; and Rate CO2 is the CO2 emission rate at time $t$.
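The construction of "historical" variables can be sketched as follows. This is an assumption-laden illustration: it simply takes lags of 0 to 4 seconds for speed, acceleration, and CO2 rate, which yields 15 columns, and it does not reproduce the paper's actual list of candidate variables. The function name `lagged_rows` and the column labels are hypothetical.

```python
def lagged_rows(speed, accel, co2_rate, max_lag=4):
    """Build one row of candidate variables per time step t >= max_lag.

    Each row holds the value of every series at t, t-1, ..., t-max_lag.
    Returns (names, rows); the first max_lag seconds are dropped because
    their lagged values do not exist.
    """
    series = {"v": speed, "a": accel, "co2": co2_rate}
    names = [
        f"{key}[t-{lag}]" if lag else f"{key}[t]"
        for key in series
        for lag in range(max_lag + 1)
    ]
    rows = []
    for t in range(max_lag, len(speed)):
        rows.append([
            values[t - lag]
            for values in series.values()
            for lag in range(max_lag + 1)
        ])
    return names, rows


speed = [float(i) for i in range(10)]      # invented 10-second profile
accel = [1.0] * 10
co2 = [2.0 * i for i in range(10)]
names, rows = lagged_rows(speed, accel, co2)
print(len(names), len(rows))
```

Each row can then be paired with the observed HC or CO rate at the same second for regression.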
A total of 23 candidate predicting variables were identified. Exhaustive enumeration is not a practical alternative, while stepwise and stagewise regression procedures produce only locally optimal solutions [21]. The widely divergent emission models reviewed in Section 1 do not provide any prototype for the combination of variables. Compared with other artificial intelligence algorithms, the unsupervised heuristic genetic algorithm does not depend on a priori knowledge, such as a training dataset. In addition, the GA is a stochastic search process that mimics the natural process of "survival of the fittest" through the manipulation of a population of chromosomes [22]. With proper binary encoding, the GA can deliver a fair hybrid emission model which synthesizes variables from the different models reviewed in Section 1. It should be noted that the GA may not deliver an optimized result, owing to premature convergence and "Hamming cliff" problems [23]. However, a GA is less likely, though not immune, to become trapped in a local minimum than the more conventional gradient methods [24].
The elite individual solution is a balanced one, combining high accuracy with variables that are strongly statistically related to the emission rate. The GA programme was implemented in Matlab 8.4. A selective weighted fitness for the GA, (2), was used to implement an automated variable selection procedure that builds the calibration models based on least-square regression:
$$\text{Fitness} = \left[\frac{1}{N}\sum_{t=1}^{N}\left(F_t - O_t\right)^2\right]^{-1/2}, \tag{2}$$
where $F_t = \sum_i B_i C_i\,\text{Variable}_i$; $F_t$ is the estimated instantaneous emission rate at time $t$; $O_t$ is the observed instantaneous emission rate at time $t$; $B_i$ is the binary logic control parameter after regression analysis; $C_i$ is the coefficient derived from regression analysis; and $N$ is the number of observations. $F_t$ is the prediction of emission rate as the sum of products of the selected coefficients and the corresponding candidate variables whose $p$ value in the $t$-statistical analysis is lower than 0.05. Because the fitness is the reciprocal of the root-mean-square error, accurate solutions receive high fitness values.
Figure 2 shows the flowchart of the GA application. Firstly, chromosomes that represent feasible solutions of the nonlinear optimisation problem are randomly generated to form the initial population, which is then evaluated by the fitness test (see (2)). Each individual chromosome in binary format symbolises a potential solution to the emission modelling problem. Figure 3 demonstrates an example of an individual chromosome consisting of 23 bits. Each bit represents a corresponding variable in the list of all potential variables. The dichotomous value of each bit encodes the choice "included in the model" (1) or "not included in the model" (0).
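The fitness evaluation described above can be sketched in a few lines. The sketch assumes that a least-square regression has already been run for the variables switched on by a chromosome, so that a coefficient and a p value are available for each of them; only terms with p below 0.05 contribute to the prediction, and the fitness is taken as the reciprocal of the root-mean-square error. The function and argument names are hypothetical, and the regression step itself is deliberately left out.

```python
import math


def selective_fitness(variables, observed, coefs, p_values, alpha=0.05):
    """Reciprocal-RMSE fitness using only statistically significant terms.

    variables: one row per time step, one value per candidate variable.
    coefs, p_values: regression outputs, one per candidate variable
    (assumed to come from a separate least-square fit).
    """
    # Binary control parameter: 1 keeps a term, 0 drops it.
    keep = [1 if p < alpha else 0 for p in p_values]
    squared_error = 0.0
    for row, obs in zip(variables, observed):
        # Prediction is the sum of retained coefficient-variable products.
        pred = sum(b * c * x for b, c, x in zip(keep, coefs, row))
        squared_error += (pred - obs) ** 2
    rmse = math.sqrt(squared_error / len(observed))
    return float("inf") if rmse == 0 else 1.0 / rmse


# Invented numbers: the second variable is not significant and is ignored.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
hc_observed = [2.0, 4.0, 7.0]
print(selective_fitness(rows, hc_observed, coefs=[2.0, 5.0], p_values=[0.01, 0.5]))
```

Dropping non-significant terms before scoring means that a chromosome gains nothing from switching on irrelevant variables, which is the balance between accuracy and robustness that the fitness is meant to enforce.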
The size of the solution space is therefore $2^{23}$ (about 8.4 million combinations). In determining the fitness value, the chromosome tends to take more variables for better prediction accuracy. This may include irrelevant candidate variables and undermine the robustness of the model. To avoid this, the calculation of the emission rate prediction, $F_t$, was limited to the statistically significant variables ($p < 0.05$). The calculation of fitness values and the selection of chromosomes continue until acceptable solutions are reached. This procedure evolves through many generations by a natural genetic process, which includes three types of operation, namely, crossover, selection, and mutation [25]. It repeats until the number of iterations exceeds a predefined limit. The number of chromosomes in a population and the number of iterations are set to 600 and 100, respectively. The tuning of genetic operator parameters follows the simple genetic algorithm proposed by Mitchell [25]; the details are presented in Table 3.
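The generational loop described above can be sketched as a simple genetic algorithm over 23-bit chromosomes. This is a minimal sketch, not the authors' Matlab program: the crossover and mutation rates, the roulette-wheel selection scheme, and the elitism step are assumptions chosen for illustration, since the tuned values in Table 3 are not reproduced here. The fitness function is passed in as a callable and is assumed to return a finite, non-negative number.

```python
import random


def run_ga(fitness_fn, n_bits=23, pop_size=600, generations=100,
           crossover_rate=0.7, mutation_rate=0.01, seed=0):
    """Simple GA over fixed-length bit strings; returns (best_chromosome, best_fitness).

    Rates, selection scheme, and elitism are illustrative assumptions.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [fitness_fn(c) for c in pop]
        top = max(range(pop_size), key=fits.__getitem__)
        if fits[top] > best_fit:
            best, best_fit = pop[top][:], fits[top]
        # Selection: fitness-proportionate (roulette wheel) sampling of parents.
        weights = [f + 1e-12 for f in fits]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for i in range(0, pop_size, 2):
            p1 = parents[i][:]
            p2 = parents[(i + 1) % pop_size][:]
            # Crossover: swap the tails of the two parents after a random cut.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            children += [p1, p2]
        children = children[:pop_size]
        # Mutation: flip each bit independently with a small probability.
        for child in children:
            for j in range(n_bits):
                if rng.random() < mutation_rate:
                    child[j] ^= 1
        children[0] = best[:]  # elitism: carry the best individual forward
        pop = children
    return best, best_fit


# Toy fitness (number of 1-bits) stands in for the regression-based fitness.
best, fit = run_ga(sum, n_bits=23, pop_size=60, generations=40)
print(best, fit)
```

In the emission modelling setting, `fitness_fn` would decode the chromosome into a variable subset, fit the regression, and return the reciprocal-RMSE score.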
Figures 4(a) and 4(b) show the maximum and averaged fitness reached over the iterations. The averaged fitness for each generation increases substantially at the beginning and stabilises towards the end. The elite fitness trends upward and fluctuates at the beginning. After about 20 generations, the fitness matures. These results indicate that the evolution of the chromosomes improves the accuracy of the solution and eventually reaches saturation. After 30-40 different runs, the results were shortlisted and the modelling equation was selected based on statistical analysis. Table 4 shows three candidates for acceleration-phase HC emission modelling. Based on modelling fitness and variable significance, candidate 1 was selected as the most promising. As the GA program can be configured to enable the mutation operator only, a random solution was generated by a 600 × 100 random process and compared with the chosen candidate. However, the randomly selected candidate is hardly a solution, owing to the variable significance constraint ($p$ value). Hence the elite chromosome can be justified as the optimal (or near-optimal) solution.

Results and Main Findings
4.1. General. The proposed GA approach was applied to the HC pollutant for different driving modes, namely, acceleration, cruising, and deceleration. The models for the emission rate from the ultimate chromosome are presented in (4), (5), and (6), respectively. All the predicting variables selected are statistically significant at $p < 0.05$. Both the instantaneous traffic emissions model (see (1)) and the newly modified equation parameters were calibrated by least-square regression on the same test-bed dataset that was used to develop the new model. To allow a meaningful comparison of modelling-induced error, the goodness of fit, $R^2$, for the three driving modes is shown in Table 5 for both the newly modified model and model (1). The proposed new approach significantly improved the modelling results for HC, although the prediction accuracy for the deceleration component was relatively low. The deceleration component accounts for 42% of total driving cycle duration, but the summed HC emissions during deceleration make up only 25% of total cycle measurements. As a result, the deceleration component plays a comparatively minor role in total emissions.
Due to the multidimensionality of the solution search space, the GA method is not certain to reach a global optimum. A GA has a tendency to converge towards local optima or even arbitrary points rather than the global optimum of the problem, because it cannot sacrifice short-term fitness to gain longer-term fitness. To tackle this issue, the proposed GA methodology relies on the fitness function, which quantifies the appropriateness of each individual solution in terms of both model accuracy and the statistical significance of the shortlisted variables. To test the fitness function, the size of the population was increased to 2,000 so that more potential solutions could search the multidimensional space; the results were similar to those previously presented. Moreover, taking the HC acceleration component as an example, the new proposed equation ...
In (4) to (6), Rate CO2 is the CO2 emission rate at time $t$, and the numbered coefficients are emission coefficients for each vehicle and pollutant type.

Mathematical Problems in Engineering
The same experimental dataset was used to test the prediction accuracy of the model. Figure 5 shows the correlation between modelling predictions and HC measurements, and Figure 6 illustrates the corresponding residual plots.
... vehicles with mileages from 75,000 to 95,000 km; and older vehicles with mileages from 130,000 to 140,000 km.
Comparisons of the goodness of fit, $R^2$, of the overall modelling results for the three vehicle age ranges, between the proposed new models (summation of (4) to (6)) and the instantaneous traffic emissions model (1), are shown in Tables 6(a) to 6(c), respectively. Overall, the emission prediction results indicate that the proposed new models provide improved results and are more robust and accurate for HC prediction.
Similarly, this methodology is applied to other pollutants including CO and NOx. The new CO emission prediction equations for acceleration, cruising, and deceleration components are shown in (7), (8), and (9), respectively. The overall $R^2$ for the whole test-bed dataset is 0.70.

Paralleled Genetic Algorithm
Cantú-Paz [26] classified parallel GAs into three main types: (i) global single-population master-slave GAs, (ii) single-population fine-grained GAs, and (iii) multiple-population coarse-grained GAs.
Single-population fine-grained GAs and multiple-population coarse-grained GAs are suitable for tackling dynamic function optimization problems [27]. They play an important role in optimizing complex functions whose optima vary in time (a learning-like process). In a master-slave GA there is a single population, but the evaluation of fitness is distributed among several processors. Matlab 8.4 enables the full functionality of the parallel language features by creating a special job on a pool of workers and connecting the pool to the Matlab client [28]. A distributed synchronous GA distributes the workload among processors during the fitness function evaluation phase, followed by regeneration of a single central population. Hence, the massive fitness computations are assigned to workers in order to improve computational efficiency. When tested on a workstation with an i7 CPU (3.6 GHz) and 32 GB of memory, one generation of the genetic algorithm takes 44.0 seconds without parallel configuration and 11.8 seconds with a 4-worker parallel configuration, a speed-up of about 3.7 times.
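The master-slave arrangement can be illustrated with a short sketch in which a pool of workers scores the chromosomes while the master keeps the single population. This is not the Matlab worker-pool configuration used in the study. A thread pool from the Python standard library is used here for portability, which is an assumption of this sketch; for CPU-bound fitness functions a process pool (`ProcessPoolExecutor`) would be the more natural substitute. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, fitness_fn, workers=4):
    """Master-slave step: workers score chromosomes, the master collects results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness_fn, population))


def speedup(serial_seconds, parallel_seconds):
    """Ratio of serial to parallel run time for one generation."""
    return serial_seconds / parallel_seconds


population = [[1, 0, 1], [1, 1, 1], [0, 0, 0]]
print(evaluate_population(population, sum))  # toy fitness: number of 1-bits

# Timings reported in the text: 44.0 s serial, 11.8 s with 4 workers.
s = speedup(44.0, 11.8)
print(round(s, 2))       # 3.73
print(round(s / 4, 2))   # 0.93, i.e. parallel efficiency per worker
```

An efficiency of about 0.93 per worker is consistent with fitness evaluation dominating the run time, with only a small serial share left for population regeneration.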

Conclusions
Past research on modelling vehicle emissions other than CO2 reveals relatively weak predictive results. The current paper proposes a GA-based methodology to determine the contributing variables for predicting vehicle emissions. This method provides a new approach to the selection of a combination of variables from a large potential set. The applications of the new models show enhanced results for modelling vehicle emissions, supporting the new variable selection methodology using GA. The modified fitness function for the proposed GA demonstrates the ability to establish a balanced multivariate model. In addition, the improved HC prediction results, obtained by introducing "historical" CO2

Figure 1: CO2 by AIMSUN model predictions and averaged observed data.

Figure 5: New model predicted versus measured HC.

Table 2: Selected average-aged vehicles for model development.

Table 3: Parameters tuning of genetic operators.

Table 5: Results of model development: goodness of fit, $R^2$.