Multigene Genetic Programming for Estimation of Elastic Modulus of Concrete

1 Department of Civil Engineering, Islamic Azad University, Kashmar Branch, Kashmar, Iran
2 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
3 Department of Civil Engineering, Islamic Azad University, Bandar Abbas Branch, Bandar Abbas, Iran
4 Department of Civil and Environmental Engineering, Michigan State University, East Lansing, MI 48824, USA
5 Department of Civil Engineering, The University of Akron, Akron, OH 44325, USA


Introduction
The importance of the elastic modulus of concrete in structural and materials engineering is well understood. This parameter is widely used in the analysis of structural deformations, concrete creep, shrinkage, crack control, and so forth [1][2][3]. The elastic modulus of concrete can be obtained from the slope of a tensile test stress-strain curve. In practice, however, the elastic modulus is mostly calculated using empirical equations proposed by various codes of practice, rather than by performing time-consuming laboratory tests. The existing empirical equations are commonly derived via traditional statistical analyses such as regression, which have major drawbacks [3][4][5]. For instance, regression modeling requires the structure of the model to be predefined, with a limited number of linear or nonlinear equations.

To cope with such limitations, several alternative soft computing approaches have emerged. One of the main features of soft computing techniques is that they learn from experience and extract the knowledge contained in the experimental data [4,5]. Artificial neural networks (ANNs), fuzzy logic (FL), adaptive neurofuzzy inference systems (ANFIS), and support vector machines (SVM) are well-known soft computing methods. These techniques have been used to predict the elastic modulus of normal and high strength concrete [6][7][8][9][10]. The major disadvantage of ANNs, FL, ANFIS, and SVM is that they do not provide practical prediction equations. Genetic programming (GP), introduced by Koza [11], overcomes this limitation. GP generates simplified prediction equations without assuming a prior form of the underlying relationship [5,12,13,14,15]. GP and its variants, such as linear genetic programming (LGP), gene expression programming (GEP), and multiexpression programming (MEP), have been successfully applied to the behavioral modeling of the elastic modulus of concrete [3,16,17].

This study proposes a new multigene genetic programming (MGGP) approach to derive prediction models for the elastic modulus of concrete. MGGP combines the modeling capabilities of GP and statistical regression methods. Despite the remarkable prediction capabilities of the MGGP approach [18], very few studies have applied MGGP to civil engineering tasks [19][20][21][22][23][24]. In this study, three MGGP-based models are obtained relating the tangent elastic modulus to the compressive strength of concrete. A comparative study is conducted between the results obtained by MGGP and those obtained from building codes (i.e., ACI-318-95 [25], NBS [26], CEB-FIB [27], BS-8110 [28], CSA-A23.3 [29], NS-3473 [30], and TS-500 [31]), compatibility aided models (i.e., Wee et al. [32] and Gardner and Zhao [33]), and the FL [6], ANN [7], LGP [3], GEP [16], and MEP [17] models.

Multigene Genetic Programming
GP creates computer programs to solve a problem by simulating the biological evolution of living organisms [11]. The genetic operators of the genetic algorithm (GA) and GP are almost the same. The difference between GA and GP is that the former gives the solution as a string of numbers, while the latter generates computer programs represented as tree structures [3,5,11]. A comprehensive description of GP can be found in Alavi and Gandomi [5] and Koza [11].

MGGP [18,34] is a new variant of GP. The traditional GP representation is based on the evaluation of a single tree (model) expression. In MGGP, a single GP individual (program) is derived from a number of genes, each of which is a tree expression [18,19]. In other words, each model evolved by MGGP is a weighted linear combination of the outputs from a number of GP trees. The trees are called "genes." Figure 1 shows a typical program evolved by MGGP. The model takes three input variables, and the functions used for the evolution process are ×, −, +, Log, and √. The model is linear in its parameters, that is, in the bias term and the gene weights, despite using nonlinear terms. As can be seen, the evolved model is a linear combination of nonlinear transformations of the predictor variables [18,19]. Two important MGGP parameters that need careful control are the maximum allowable number of genes and the maximum tree depth. Restricting the tree depth mostly results in more compact models [18,19].
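The structure described above, a bias term plus a weighted sum of gene outputs, can be illustrated with a minimal Python sketch. The two genes, the synthetic data, and all function names below are hypothetical and chosen only for illustration; they are not the genes of Figure 1. The weights are fitted here by ordinary least squares using NumPy.

```python
import numpy as np

# Hypothetical genes: each one is a nonlinear transformation of the inputs.
def gene_1(x):
    return np.sqrt(x[:, 0]) * x[:, 1]

def gene_2(x):
    return np.log(x[:, 2]) - x[:, 0]

def design_matrix(x, genes):
    """Stack a column of ones (bias term) and one column per gene output."""
    columns = [np.ones(x.shape[0])] + [gene(x) for gene in genes]
    return np.column_stack(columns)

def fit_weights(x, y, genes):
    """Estimate the bias and gene weights by ordinary least squares."""
    weights, *_ = np.linalg.lstsq(design_matrix(x, genes), y, rcond=None)
    return weights

def predict(x, genes, weights):
    """Evaluate the multigene model: a linear combination of gene outputs."""
    return design_matrix(x, genes) @ weights

# Synthetic, noise-free data generated from known weights (illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=(50, 3))
genes = [gene_1, gene_2]
true_weights = np.array([2.0, 0.5, -1.5])
y = design_matrix(x, genes) @ true_weights

weights = fit_weights(x, y, genes)
print(weights)
```

Because the model is linear in its weights, the fit reduces to a single linear solve even though each gene is nonlinear in the inputs.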
In order to obtain the linear coefficients, an ordinary least squares analysis is performed on the training data. It is also possible to embed the multigene approach within a partial least squares method [34]. The initial population generated by MGGP contains individuals composed of different randomly generated genes. In addition to the traditional GP recombination operators, MGGP uses a tree crossover operator called two-point high-level crossover to acquire and delete genes [18,19]. As an example, assume that two parent programs evolved by MGGP contain two genes (Gene 1 Gene 2) and three genes (Gene 3 Gene 4 Gene 5), respectively. The genes enclosed by the crossover points are denoted by {} as follows: (Gene 1 {Gene 2}) and (Gene 3 {Gene 4 Gene 5}). During the crossover operation, the enclosed genes are exchanged to create two new programs: (Gene 1 Gene 4 Gene 5) and (Gene 3 Gene 2).

In MGGP, standard GP subtree crossover is referred to as low-level crossover. In this case, a gene is chosen at random from each parent individual. The standard subtree crossover is then applied, and the created trees replace the parent trees in the otherwise unaltered individuals in the next generation. Moreover, there are different types of mutation in MGGP, such as subtree mutation, mutation of constants using an additive Gaussian perturbation, and setting a randomly selected constant to zero [18,19]. Further details about MGGP can be found in [18,19].
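The two-point high-level crossover example above can be reproduced with a short Python sketch. Individuals are represented here as plain lists of gene labels, and the function name, the span arguments, and the simple truncation to a maximum number of genes are assumptions made for illustration rather than the behavior of any particular toolbox.

```python
def high_level_crossover(parent_a, parent_b, span_a, span_b, max_genes):
    """Exchange the gene slices [start, stop) between two individuals.

    Truncating children to max_genes is a simplifying assumption used
    here to respect the maximum allowable number of genes.
    """
    (a0, a1), (b0, b1) = span_a, span_b
    child_a = parent_a[:a0] + parent_b[b0:b1] + parent_a[a1:]
    child_b = parent_b[:b0] + parent_a[a0:a1] + parent_b[b1:]
    return child_a[:max_genes], child_b[:max_genes]

parent_a = ["Gene 1", "Gene 2"]
parent_b = ["Gene 3", "Gene 4", "Gene 5"]

# Crossover points enclose {Gene 2} in the first parent and
# {Gene 4 Gene 5} in the second parent, as in the example in the text.
child_a, child_b = high_level_crossover(
    parent_a, parent_b, span_a=(1, 2), span_b=(1, 3), max_genes=3
)
print(child_a)  # ['Gene 1', 'Gene 4', 'Gene 5']
print(child_b)  # ['Gene 3', 'Gene 2']
```

In an actual run the crossover points would be drawn at random; they are fixed here so that the result matches the worked example.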

MGGP Modeling of Elastic Modulus of Concrete
The modulus of elasticity is frequently formulated as a function of the compressive strength of concrete. Most national and international codes express the modulus of elasticity of concrete in this way (e.g., American Concrete Code (ACI-318-95) [25], British Concrete Code (BS-8110) [28], and Canadian Concrete Code (CSA-A23.3) [29]). Thus, this study is aimed at developing explicit formulas for the tangent elastic modulus ($E_c$) of normal strength concrete (NSC) and high strength concrete (HSC) in terms of compressive strength ($f_c$) as follows:

$$E_c = f(f_c)$$

Hence, a single input variable is used for the MGGP models. The NSC and HSC databases are used separately to derive two different MGGP-based formulas for the $E_c$ of NSC and HSC. In order to propose a generic model for both NSC and HSC, another MGGP model is developed based on the entire set of test results.

Various parameters are involved in the MGGP predictive algorithm. The parameter values are selected based on previously suggested values [18][19][20][21][22][23][24] and on the performance observed in several preliminary runs. The parameter settings are shown in Table 1. In this study, basic arithmetic operators and mathematical functions are utilized to get the optimum MGGP models. The population size sets the number of programs in the population. The number of generations sets the number of iterations the algorithm performs before the run terminates [18][19][20]. The proper population size and number of generations often depend on the complexity of the problem and on the number of possible solutions. Fairly large population sizes and numbers of generations are tested to find models with minimum error. The programs are run until they terminate automatically. The maximum allowable number of genes in an individual and the maximum tree depth directly influence the size of the search space and the number of solutions explored within the search space [18][19][20]. The success of the MGGP algorithm usually increases as these parameters increase. However, the complexity of the evolved function then increases and the speed of the algorithm decreases. The allowable number of genes and the tree depth are therefore set to values that trade off the running time against the complexity of the evolved solutions [18][19][20]. There are 3 × 3 × 3 × 2 × 2 × 2 = 216 different combinations of the parameters. All of these parameter combinations are tested, with 2 replications for each. Therefore, the overall number of runs is equal to 216 × 2 = 432. The GPTIPS toolbox [35], in conjunction with subroutines coded in MATLAB, is used to implement MGGP. The fitness function evaluates the evolved expressions to identify the best encoded expressions [19].
The default GPTIPS multigene symbolic regression function is used to minimize the error (root mean squared error).
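The run count quoted above can be checked with a small Python sketch that enumerates a full-factorial parameter grid. The parameter names and candidate values below are hypothetical placeholders (the actual settings are those listed in Table 1); only the number of levels per parameter, 3, 3, 3, 2, 2, and 2, and the 2 replications are taken from the text.

```python
from itertools import product

# Hypothetical candidate values; only the number of levels per parameter
# (3, 3, 3, 2, 2, 2) follows the text.
grid = {
    "population_size": [100, 250, 500],
    "generations": [200, 500, 1000],
    "max_genes": [2, 3, 4],
    "max_tree_depth": [3, 4],
    "crossover_rate": [0.5, 0.85],
    "mutation_rate": [0.1, 0.3],
}
replications = 2

# Every combination of parameter values, one dictionary per combination.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

# Each combination is repeated once per replication.
runs = [(combo, rep) for combo in combinations for rep in range(replications)]

print(len(combinations), len(runs))  # 216 432
```

Each entry of `runs` would correspond to one independent MGGP run with its own random seed.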
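The best MGGP models are chosen on the basis of providing the best fitness value on the training data as well as the simplicity of the models [3]. The correlation coefficient ($R$) and mean absolute error (MAE) are used to evaluate the performance of the models. $R$ and MAE are calculated using the following equations:

$$R = \frac{\sum_{i=1}^{n}\left(h_i-\bar{h}\right)\left(t_i-\bar{t}\right)}{\sqrt{\sum_{i=1}^{n}\left(h_i-\bar{h}\right)^2\,\sum_{i=1}^{n}\left(t_i-\bar{t}\right)^2}}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|h_i-t_i\right|$$

where $h_i$ and $t_i$ are the experimental and predicted outputs for the $i$th sample, $\bar{h}$ and $\bar{t}$ are their averages, and $n$ is the number of samples.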
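As a minimal sketch, the two performance measures can be computed in Python with their standard definitions (the Pearson correlation coefficient and the mean of the absolute errors). The function names and the sample values are hypothetical and serve only to show the calculation.

```python
import numpy as np

def correlation_coefficient(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    h = np.asarray(measured, dtype=float)
    t = np.asarray(predicted, dtype=float)
    dh, dt = h - h.mean(), t - t.mean()
    return float(np.sum(dh * dt) / np.sqrt(np.sum(dh**2) * np.sum(dt**2)))

def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    h = np.asarray(measured, dtype=float)
    t = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(h - t)))

# Hypothetical values, for illustration only.
measured = [25.0, 30.0, 35.0, 40.0]
predicted = [26.0, 29.0, 36.0, 39.0]

r = correlation_coefficient(measured, predicted)
mae = mean_absolute_error(measured, predicted)
print(r, mae)
```

A value of the correlation coefficient close to 1 together with a small MAE indicates close agreement; the two measures are reported together because a high correlation alone does not rule out a systematic offset.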

MGGP Prediction Model for the $E_c$ of NSC
Figure 2 shows a comparison between the predicted and experimental $E_c$ values for NSC. As can be seen, the model performs better on the testing data than on the training data. Figure 3 shows the variation of the best (log values) and mean fitness with the number of generations. The fitness value decreases as the number of generations increases, and the best fitness is found at the 197th generation. The statistical significance of each of the three genes of the derived model is visualized in Figure 4. According to Figure 4, the weight of the bias term is higher than those of the other genes. Figure 4 also depicts the degree of significance of each gene, evaluated using $p$ values. The contribution of the genes to explaining variations in $E_c$ is very high, as their $p$ values are very low, approximately equal to 0. The statistical significance of the second gene (Gene 2) is lower than that of the bias term and the first gene.

MGGP Prediction Model for the $E_c$ of HSC
The optimal formulation of the $E_c$ of HSC in terms of $f_c$ is as follows.
The population size, number of generations, maximum number of genes, and maximum tree depth for the MGGP II model are 500, 1000, 3, and 4, respectively. Figure 5 presents a comparison between the predicted and experimental $E_c$ values for HSC. As can be seen, the model performs better on the training data than on the testing data. Although the model may be slightly overfitted, it is the best model obtained through the conducted runs. As can be seen in Figure 6, the fitness value decreases as the number of generations increases. The best fitness is found at the 199th generation. According to Figure 7, the weight of the bias term is higher than those of the other genes. Figure 7 indicates that Genes 1 and 2 contribute more to explaining variations in $E_c$ than the bias term, as their $p$ values are lower.

As shown in Figure 8, the MGGP III model performs very well on both the training and testing data. As can be seen in Figure 9, the best fitness is found at the 190th generation. According to Figure 10, the weight (coefficient) of the bias term is higher than those of the other genes. Figure 10 indicates that Genes 1 and 2 contribute more to explaining the variations of $E_c$ than the bias term, as their $p$ values are lower.

Performance Analysis
Figures 11 and 12 illustrate the prediction performance of the MGGP models; the American (ACI-318-95 [25]), Iranian (NBS [26]), European (CEB-FIB [27]), British (BS-8110 [28]), Canadian (CSA-A23.3 [29]), Norwegian (NS-3473 [30]), and Turkish (TS-500 [31]) codes; two compatibility aided models (i.e., Wee et al. [32] and Gardner and Zhao [33]); and the FL [6], ANN [7], LGP [3], GEP [16], and MEP [17] models for the $E_c$ of NSC and HSC, respectively. Moreover, the predictions made by the available generalized models for the $E_c$ of both NSC and HSC are presented in Figure 13. These figures show the ratio of the predicted to experimental $E_c$ values. A ratio closer to 1 indicates a more precise prediction. It can be seen from Figures 11 to 13 that the proposed MGGP models perform significantly better than the available codes and empirical models. Moreover, MGGP makes better predictions than the robust soft computing tools (FL, ANN, LGP, GEP, and MEP). As shown in Figure 13, the proposed MGGP model for both NSC and HSC yields very good results on the entire database. The superior performance of the generic model implies that it is reasonable to develop comprehensive models for the $E_c$ of both NSC and HSC rather than separate models for each of them.
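The ratio-based comparison used in these figures can be sketched in Python as follows. The strength and modulus values are made up for illustration, and the square-root model with a coefficient of 4.7 is only a placeholder of the general form used by several codes; it is not presented as any specific code equation or as one of the MGGP models.

```python
import numpy as np

def placeholder_model(fc_mpa, coefficient=4.7):
    """Hypothetical square-root model: E_c (GPa) from f_c (MPa)."""
    return coefficient * np.sqrt(fc_mpa)

# Made-up test data, for illustration only.
fc = np.array([20.0, 30.0, 40.0, 50.0])       # compressive strength, MPa
ec_exp = np.array([21.5, 25.0, 30.5, 32.0])   # experimental modulus, GPa

ec_pred = placeholder_model(fc)
ratio = ec_pred / ec_exp  # predicted-to-experimental ratio per test

summary = {
    "mean_ratio": float(ratio.mean()),
    "std_ratio": float(ratio.std(ddof=1)),
    "max_deviation_from_1": float(np.max(np.abs(ratio - 1.0))),
}
print(ratio)
print(summary)
```

A mean ratio near 1 with a small spread corresponds to the tightly clustered point clouds that indicate a precise model in such plots.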

Parametric Analysis
For further verification of the MGGP models, a parametric analysis is performed in this study. The parametric analysis investigates the response of the $E_c$ predicted by the MGGP models to a set of hypothetical input data. The robustness of the design equations is determined by examining how well the predicted $E_c$ values agree with the underlying physical behavior of NSC and HSC [40]. Figure 14 presents the trend of the predictions as $f_c$ varies. The results indicate that the $E_c$ of NSC and HSC continuously increases with increasing $f_c$. This trend is expected from a structural engineering viewpoint [41]. The results confirm that the proposed design equations are robust and can be used with confidence.
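A parametric check of this kind can be sketched in Python by sweeping the input over a range and testing whether the prediction increases monotonically. The model below is a hypothetical stand-in with made-up coefficients, not one of the derived MGGP equations, and the strength range is likewise assumed.

```python
import numpy as np

def placeholder_model(fc):
    """Hypothetical stand-in for a derived E_c = f(f_c) equation."""
    return 3.0 * np.sqrt(fc) + 8.0

def parametric_sweep(model, fc_min, fc_max, n_points=50):
    """Evaluate the model on evenly spaced hypothetical f_c values."""
    fc = np.linspace(fc_min, fc_max, n_points)
    return fc, model(fc)

def is_monotonically_increasing(values):
    """True if every value is strictly larger than the previous one."""
    return bool(np.all(np.diff(values) > 0))

fc, ec = parametric_sweep(placeholder_model, 10.0, 120.0)
print(is_monotonically_increasing(ec))  # True
```

Replacing `placeholder_model` with an evolved equation would flag any range of $f_c$ in which the prediction decreases, which would contradict the expected physical behavior.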

Conclusion
In this paper, a promising extension of classical GP, namely, MGGP, is employed for the analysis of the tangent $E_c$ of NSC and HSC. MGGP integrates the capabilities of GP and linear regression methods to formulate the nonlinear behavior of $E_c$. Three design formulas are obtained for the prediction of $E_c$. The proposed models are developed from several test results obtained from the literature. The MGGP models provide reliable estimations of the $E_c$ of NSC and HSC and outperform the existing empirical and other soft computing-based models. The generic MGGP model provides accurate estimates of the $E_c$ of both NSC and HSC. In addition to their acceptable accuracy, the MGGP-based prediction equations are very simple. The robustness of the proposed MGGP models is confirmed by the results of the parametric study. With the MGGP approach, $E_c$ can be estimated without carrying out sophisticated and time-consuming laboratory tests. The models can be easily retrained and improved to make more accurate predictions over a wider range by including data for other test conditions [42].

Figure 2: Predicted versus experimental $E_c$ of NSC using the MGGP I model.

Figure 3: Variation of the best and mean fitness with the number of generations for MGGP I.

Figure 4: Statistical properties of the evolved MGGP I model (on training data).

Figure 5: Predicted versus experimental $E_c$ of HSC using the MGGP II model.

Figure 6: Variation of the best and mean fitness with the number of generations for MGGP II.

Figure 7: Statistical properties of the evolved MGGP II model (on training data).

Figure 11: A comparison of the ratio between the predicted and experimental $E_c$ of NSC using different models. (Vertical axis: $E_{c,\mathrm{Predicted}}/E_{c,\mathrm{Experimental}}$; horizontal axis: test number.)

Figure 12: A comparison of the ratio between the predicted and experimental $E_c$ of HSC using different models. (Vertical axis: $E_{c,\mathrm{Predicted}}/E_{c,\mathrm{Experimental}}$; horizontal axis: test number.)

Figure 13: A comparison of the ratio between the predicted and experimental $E_c$ of NSC and HSC using different models.

Figure 14: Parametric analysis of the $E_c$ of NSC and HSC.

Table 1: Parameter settings for the MGGP algorithm.
Model for the $E_c$ of NSC. The optimal formulation of the $E_c$ of NSC in terms of $f_c$ is as given below.
Model for the $E_c$ of NSC and HSC. The best prediction model for the $E_c$ of NSC and HSC in terms of $f_c$ is as given below. The population size, number of generations, maximum number of genes, and maximum tree depth for the MGGP III model are similar to those for the MGGP II model. A comparison of the predicted and experimental $E_c$ of NSC and HSC is shown in Figure 8. As can be seen in this figure, the performance of the model is very good on both the training and testing data.
Mathematical Problems in Engineering