Component Thermodynamical Selection Based Gene Expression Programming for Function Finding

1 School of Science, JiangXi University of Science and Technology, Ganzhou 341000, China
2 State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
3 Computer School, Wuhan University, Wuhan 430072, China
4 State-Owned Assets Supervision and Administration of Jiangxi Province, Nanchang 330006, China
5 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
6 School of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang 050031, China


Introduction
Gene expression programming (GEP) [1,2], an improvement on genetic programming (GP) that uses a linear representation [3,4], is an artificial problem solver inspired by the natural genotype/phenotype system. GEP combines simple, fixed-length linear chromosomes, similar to those used in genetic algorithms (GA), with ramified structures of different sizes and shapes, similar to the parse trees of GP [3,5,6]. Thus, GEP has the advantages of both GA and GP while overcoming some of their individual limitations [3,4]. Because of its high performance, GEP has recently attracted increasing attention as an efficient and effective data mining approach. It has been successfully applied in many fields, such as function finding [7][8][9], symbolic regression [10][11][12][13], parameter optimization [14], rule mining [15], classification [3,16], time series forecasting [2], prediction of flow number of asphalt mixes [17], prediction of material load [18,19], prediction of the strength of concrete [20], engineering design [21], and machine scheduling [22,23].
Although GEP has been successfully employed in a variety of areas, practical applications show that the conventional GEP usually suffers from premature convergence and a slow convergence rate, resulting in poor solution quality and/or large computational cost [2][3][4]. The main reason is that the conventional GEP cannot quantitatively keep a balance between the selective pressure and the population diversity during the evolution process. This may lead to trapping in a local optimum and/or slowing down the search speed.
In general, increasing selective pressure and promoting population diversity in GEP conflict with each other [3,4]. Increasing selective pressure draws more individuals close to the best individual, which improves the average fitness of the population.

Mathematical Problems in Engineering
Hence, this can accelerate the convergence speed of the population. However, increasing selective pressure may also produce an evolutionary state in which most of the individuals approach the best individual. As a result, the population diversity is significantly reduced after some generations, increasing the possibility of trapping in local optimum solutions. In contrast, promoting population diversity distributes the individuals widely over the search space and increases the probability of finding the global optimum, but this may slow down the convergence speed.
To the best of our knowledge, there has been little research on how to quantitatively balance the selective pressure and population diversity of GEP during the evolution process. This motivates us to investigate a selection mechanism that can quantitatively keep a balance between the selective pressure and population diversity of GEP, in order to enhance the global search ability and simultaneously accelerate the convergence speed. Our work along this idea has produced a novel GEP based on a component thermodynamical selection operator (CTS), called CTSGEP. The proposed approach, inspired by the principle of minimal free energy in thermodynamics, maps the selective pressure and the population diversity into the mean energy and the entropy, respectively. To quantitatively balance the selective pressure and the population diversity of GEP, CTS requires that the individuals selected for the next generation from the parent and offspring individuals satisfy the principle of minimal free energy.
The rest of the paper is organized as follows. Section 2 introduces the notation and terminology of GEP that are useful for the review of previous work on GEP in Section 3. The proposed algorithm, CTSGEP, is elaborated in Section 4, with detailed explanations of the component thermodynamical selection operator. The computational results and comparisons are provided in Section 5. Finally, we end the paper with some conclusions in Section 6.

GEP Basic Concepts
2.1. Chromosome Representation. The most innovative feature of GEP is its improved representation of chromosomes. GEP separates the genotype from the phenotype of the chromosomes [3]; the lack of such a separation is one of the greatest limitations of both GA [24,25] and GP. In GEP, individuals are represented by linear strings called chromosomes. The chromosomes consist of genes and link operators, where the link operators connect the genes. The link operators are usually arithmetic operators, such as +, −, *, and /. Moreover, each gene of GEP has two forms [2]: a genotype and a phenotype. The genotype is the code of the gene, similar to that used in GA, and the genetic operators manipulate the genotype directly. The phenotype is the decoding of the gene into ramified structures of different sizes and shapes, similar to the parse trees of GP. For instance, the detailed transformation process of a gene beginning with the symbols "* − / +" is shown in Figure 1. The merits of separating the genotype from the phenotype of the chromosomes are obvious. On the one hand, the representation of the chromosomes is simple and compact. Therefore, the genetic operators are easy to implement and very efficient.
On the other hand, this mechanism enables GEP to solve complex problems. In GEP, each gene is composed of two parts: a head and a tail. The head contains functional symbols (e.g., +, −, *, /) and terminal symbols, whereas the tail contains only terminal symbols. The head length $h$ is selected by the user according to the specific problem, while the tail length $t$ is a function of $h$ and $n$, where $n$ is the number of arguments of the function that takes the most arguments. To ensure that any gene can be decoded into a correct mathematical expression, $t$ should satisfy

$$t = h \times (n - 1) + 1. \quad (1)$$

For example, consider a gene composed of the functional symbols +, −, *, /, and $Q$, where $Q$ represents the square root function, together with two terminal symbols. For the set of functional symbols $F = \{+, -, *, /, Q\}$, $n$ is 2. Assuming $h$ is 4, it follows that $t = 4 \times (2 - 1) + 1 = 5$. Thus, the length of the gene is 4 + 5 = 9.
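The head/tail rule and the genotype-to-phenotype decoding can be illustrated with a minimal Python sketch. It assumes the usual breadth-first reading of a gene used in the GEP literature; the symbol `Q` for the square root, the terminals `a` and `b`, and the example gene are hypothetical choices made only for illustration.

```python
from collections import deque

# Number of arguments of each functional symbol; "Q" (square root) is unary.
# Any symbol not listed here is treated as a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def tail_length(head_length, max_arity):
    """Tail length from formula (1): t = h * (n - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def gene_length(head_length, max_arity):
    """Total gene length: head plus tail."""
    return head_length + tail_length(head_length, max_arity)


def decode(gene):
    """Read a gene breadth-first into a nested (symbol, children) tree."""
    root = (gene[0], [])
    queue = deque([root])
    pos = 1
    while queue:
        symbol, children = queue.popleft()
        for _ in range(ARITY.get(symbol, 0)):
            child = (gene[pos], [])
            pos += 1
            children.append(child)
            queue.append(child)
    return root  # symbols after position `pos` are simply not expressed


def to_infix(node):
    """Render a decoded tree as an infix expression string."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:  # the only unary symbol here is Q
        return f"sqrt({to_infix(children[0])})"
    left, right = (to_infix(child) for child in children)
    return f"({left} {symbol} {right})"


if __name__ == "__main__":
    # Head "+*Qa" (h = 4) followed by a tail of 5 terminals.
    print(to_infix(decode("+*Qaabbab")))
```

Because the tail length follows formula (1), the decoder can never run past the end of the gene, whatever symbols the head contains; this is what lets the genetic operators modify the head freely.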

Genetic Operators.
There are many genetic operators in GEP, including the selection operator, mutation operator, transposition-insertion operator, and recombination operator. These genetic operators are subject to the following conditions. (1) The length of the head and that of the tail are subject to formula (1). (2) The tail contains only terminal symbols [2]. These conditions ensure that the genetic operators generate new genes that can be decoded into correct mathematical expressions. Therefore, the operators are simple and easy to implement. A detailed description of these operators can be found in [1,2].
2.3. Fitness Functions. Generally, different fitness functions are suitable for different problems. The choice of fitness function is crucial for GEP, mainly because it directly affects the convergence speed and the solution quality. In GEP, there are several kinds of fitness functions: the absolute error fitness function, the relative error fitness function, and the logic synthesis fitness function [1,2]. They are described as follows:

$$f_i = \sum_{j=1}^{C_t} \left( \mathrm{Max} - \left| C_{(i,j)} - T_{(j)} \right| \right), \quad (2)$$

$$f_i = \sum_{j=1}^{C_t} \left( \mathrm{Max} - \left| \frac{C_{(i,j)} - T_{(j)}}{T_{(j)}} \times 100 \right| \right), \quad (3)$$

$$f_i = \begin{cases} n, & n \ge \frac{1}{2} C_t, \\ 1, & \text{otherwise}, \end{cases} \quad (4)$$

where Max is a constant that determines the range of $f_i$, $C_{(i,j)}$ is the value calculated by individual $i$ for sample instance $j$, $T_{(j)}$ is the target value for sample instance $j$, $C_t$ is the total number of sample instances, and $n$ is the number of sample instances correctly predicted. In general, fitness functions (2) and (3) are employed to solve function regression problems, and fitness function (4) is applied to Boolean concept learning problems.
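The three kinds of fitness functions can be sketched in Python as follows. This is a sketch that assumes the forms commonly given for them in the GEP literature [1,2]; the function names and the default value of `max_range` (playing the role of the constant Max) are hypothetical.

```python
def absolute_error_fitness(predicted, targets, max_range=100.0):
    """Sum over samples of Max minus the absolute error."""
    return sum(max_range - abs(c - t) for c, t in zip(predicted, targets))


def relative_error_fitness(predicted, targets, max_range=100.0):
    """Sum over samples of Max minus the relative error in percent.

    Assumes that no target value is zero.
    """
    return sum(
        max_range - abs((c - t) / t * 100.0) for c, t in zip(predicted, targets)
    )


def logic_synthesis_fitness(predicted, targets):
    """Number of correct predictions if at least half are correct, else 1."""
    n_correct = sum(1 for c, t in zip(predicted, targets) if c == t)
    return n_correct if n_correct >= len(targets) / 2 else 1


if __name__ == "__main__":
    print(absolute_error_fitness([1.5, 2.0], [1.0, 2.0]))
    print(logic_synthesis_fitness([1, 0, 1, 1], [1, 0, 0, 1]))
```

With these forms, a perfect individual on a regression task reaches the maximum fitness, namely Max times the number of samples, which matches the convention that a larger fitness value indicates a better individual.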

The Framework of GEP.
The framework of GEP is similar to that of GA [26]. The major difference between GEP and GA is the representation of chromosomes, whereas the essential idea of GEP is the same as that of GA [2]: both are based on the concepts of natural selection and survival of the fittest. The procedure of GEP is described in Algorithm 1.

Previous Work
In order to enhance the performance of the traditional GEP algorithm, many scholars recently have proposed several GEP variants.Moreover, these GEP variants can be classified into two categories: accelerating convergence speed and promoting population diversity.

Accelerating Convergence Speed.
In order to accelerate the convergence speed of the traditional GEP, Karakasis and Stafylopatis [3] proposed a novel GEP for data mining tasks that incorporates the clonal selection principle, which is inspired by the immune system. In the proposed algorithm, a receptor-editing step was added in order to achieve faster exploration of the antibody-antigen binding space. Experimental results showed that the proposed GEP variant outperformed the conventional GEP in terms of both prediction accuracy and computational efficiency. Zhang et al. [27] introduced an improved gene expression programming (IGEP), which employed a dynamic mutation operator to enhance efficiency. The proposed algorithm obtained better results than the heuristic method when predicting retention times for a larger set of pesticides. Further, IGEP, as a nonlinear method, showed good generalization performance. By applying parallel taboo search, Rao et al. [28] presented an enhanced GEP to improve the local search ability of the conventional GEP. Wu et al. [29] proposed a parallel niche GEP for general multicore processors to improve the evolution efficiency; the parallel model of the niche GEP was implemented with OpenMP. Based on an analysis of the intelligibility and efficiency of expression-tree-based evaluation in GEP, Chen et al. [7] introduced a reduced GEP, in which the chromosomes are evaluated directly on the reduced gene without being expressed as expression trees. Moreover, the results evolved by the reduced GEP are simpler and easier to understand and explain.

Promoting Population Diversity.
To maintain good population diversity in the conventional GEP, Jiang et al. [30] proposed an adaptive GEP algorithm based on the cloud model. The proposed algorithm employed an adaptive cloud strategy to determine the mutation and crossover rates dynamically in order to improve the population diversity. Li et al. [31] introduced an improved GEP (AMACGEP) based on statistical analysis and critical velocity, which utilized statistical analysis of repeated bodies to enhance the diversity of the initial population. Moreover, it proposed a dynamic mutation operator to improve the diversity of individuals. Liu et al. [32] proposed a population diversity-oriented GEP (Mod-GEP) for function finding, in which two strategies, population updating and population pruning, were used to increase the diversity of the population. The experimental results showed that Mod-GEP can obtain more satisfactory solutions than GP, GEP, and some other GEP variants. Zhang and Xiao [33] presented a population diversity strategy GEP (GEP-PDS). GEP-PDS combined a superior population producing strategy with various population strategies to maintain the diversity of the population. Further, Zhang et al. [34] proposed an improved GEP based on a block strategy (BS-GEP), in which the population was divided into several blocks according to the individual fitness of each generation, and the genetic operators were set differently in each block to preserve the population diversity. In addition, BS-GEP was applied to the prediction of software failure sequences.

The Proposed CTSGEP Algorithm
4.1. Motivations. As pointed out in Section 3, some researchers have developed GEP variants that increase the selective pressure in order to accelerate the convergence speed, although this may increase the possibility of trapping in local minima [3,27,28]. Meanwhile, to decrease the possibility of trapping in local minima, many scholars have attempted to encourage population diversity during the evolution process. However, this may decelerate the search [13,31,32,33]. Therefore, a feasible solution to these deficiencies of GEP cannot improve only one of selective pressure and population diversity. A better approach is to keep a balance between the selective pressure and the population diversity during the evolution process. In essence, reconciling the conflict between the selective pressure and the population diversity amounts to solving a biobjective optimization problem that can be formulated as follows.
From the parent population $P_t$ of size $N$, $M$ offspring individuals are created by the GEP genetic operators. Hence, there are $N + M$ individuals in total. The biobjective optimization problem is to select $N$ individuals from the parent and offspring individuals for the next generation population $P_{t+1}$ such that the selective pressure, measured by the average fitness AF, and the population diversity $D$ of the next generation population $P_{t+1}$ satisfy $\min f = (-\mathrm{AF}, D)$.
Notice that, in the above formulation, without loss of generality, we assume that the larger fitness value implies the better individual in GEP.In addition, the selective pressure can be measured by the average fitness AF.
Many existing approaches, such as evolutionary multiobjective optimization algorithms, can tackle the above biobjective optimization problem. However, this problem has to be solved in every generation of GEP. Therefore, the computational complexity of the solving process should be low; otherwise, it may make the overall GEP algorithm converge very slowly. Thus, approaches with high computational complexity (e.g., evolutionary multiobjective optimization algorithms) may not be suitable. Furthermore, it is unrealistic to obtain the exact solution of the biobjective optimization problem, because the computational complexity is $O(C_{N+M}^{N})$. Based on the above considerations, we present a novel method, called CTS, to obtain an approximate solution of the above biobjective optimization problem with very low computational complexity. Its primary idea is inspired by the principle of minimal free energy in thermodynamics, which can be described as follows [35,36]. In the annealing process, a metal, starting at a high temperature and in a disordered state, is gradually cooled so that the system approximately reaches thermodynamic equilibrium at every temperature. This cooling process can be regarded as an adaptation procedure that achieves the stability of the final crystalline solid. In addition, any change of the system from nonequilibrium to equilibrium at each temperature follows the principle of minimal free energy. This means that the system changes spontaneously to reach a lower total free energy and achieves equilibrium when its free energy reaches a minimum [36]. The free energy $F$ is defined by

$$F = \langle E \rangle - TH, \quad (5)$$

where $\langle E \rangle$ is the mean energy of the system, $H$ is the entropy, and $T$ is the temperature. According to the principle of minimal free energy, any change of the system can be viewed as the result of a competition between the mean energy and the entropy, and the temperature $T$ determines their relative weights in the competition [36]. In other words, the two objectives, namely, the mean energy and the entropy, conflict with each other, and the temperature $T$ is the weight between them. Moreover, the final objective can be converted into minimizing the free energy. This is similar to the relationship between the selective pressure and the population diversity addressed before. Therefore, we can solve the above biobjective optimization problem according to the principle of minimal free energy.
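The role of the temperature as a weight between mean energy and entropy can be illustrated with a small Python sketch. It assumes the free energy takes the form of mean energy minus temperature times entropy, and it uses a Shannon-type entropy over the fractions of individuals that fall into each of several groups; the entropy form, the function names, and the numbers are hypothetical illustrations rather than the definitions used by CTSGEP.

```python
import math


def entropy(group_counts):
    """Shannon-type entropy of the distribution of individuals over groups."""
    total = sum(group_counts)
    return -sum(
        (n / total) * math.log(n / total) for n in group_counts if n > 0
    )


def free_energy(mean_energy, group_counts, temperature):
    """Free energy: mean energy minus temperature times entropy."""
    return mean_energy - temperature * entropy(group_counts)


if __name__ == "__main__":
    # Population A: lower mean energy, all four individuals in one group.
    # Population B: slightly higher mean energy, but evenly spread out.
    for temperature in (0.1, 2.0):
        f_a = free_energy(-10.0, [4], temperature)
        f_b = free_energy(-9.0, [1, 1, 1, 1], temperature)
        print(temperature, f_a, f_b)
```

At the low temperature the concentrated population A has the lower free energy, while at the high temperature the diverse population B does, which mirrors the shift from favoring selective pressure to favoring diversity.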

Basic Concepts of Component Thermodynamical Selection Operator.
In order to utilize the principle of minimal free energy to reconcile the conflict between the selective pressure and the population diversity, we should first map the selective pressure and the population diversity into the mean energy and the entropy, respectively. According to the characteristics of GEP and our previous work in [36,37], we give the following definitions.
Definition 1. Let $S$ be the search space; for any GEP individual $x_i \in S$, its fitness value is $f(x_i)$, where a larger fitness value indicates a better individual. The absolute energy $e(x_i)$ of individual $x_i$ is defined by (6).
Definition 2. Let $P_t = \{x_1, x_2, \ldots, x_N\} \in S^N$ be the GEP population of generation $t$. The absolute energy window $W_t$ is defined by (7) and (8).
In the definition of the free energy, $T$ is the temperature, and the mean energy is defined over the window $W_t$ and the population $P_t$. From the above definitions, we can obtain the following conclusion, whose proof can be found in our previous work [36,37]: the free energy of the population equals the mean of the component free energies of the individuals in the population. Since our objective is the minimal free energy, this conclusion allows us to calculate the free energy by computing the mean of the component free energy of every individual in the population. Hence, the minimal free energy can be approximately obtained by minimizing the component free energy of every individual in the population. Next, we will present the component thermodynamical selection operator of GEP based on this conclusion.
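The conclusion that the population free energy is the mean of per-individual components can be checked numerically. The Python sketch below assumes that the entropy is computed over $K$ ranks with logarithm base $K$ and that the component free energy of an individual is its energy plus $T \log_K (n_k / N)$, where $n_k$ is the number of individuals in its rank; these forms, the function names, and the sample values are assumptions made for illustration.

```python
import math


def rank_counts(ranks, num_ranks):
    """Number of individuals in each of the num_ranks ranks."""
    counts = [0] * num_ranks
    for rank in ranks:
        counts[rank] += 1
    return counts


def population_free_energy(energies, ranks, temperature, num_ranks):
    """Mean energy minus temperature times a rank-based entropy (base K)."""
    n_total = len(energies)
    counts = rank_counts(ranks, num_ranks)
    mean_energy = sum(energies) / n_total
    entropy = -sum(
        (c / n_total) * math.log(c / n_total, num_ranks) for c in counts if c > 0
    )
    return mean_energy - temperature * entropy


def component_free_energies(energies, ranks, temperature, num_ranks):
    """Per-individual share of the free energy: e_i + T * log_K(n_k / N)."""
    n_total = len(energies)
    counts = rank_counts(ranks, num_ranks)
    return [
        e + temperature * math.log(counts[r] / n_total, num_ranks)
        for e, r in zip(energies, ranks)
    ]


if __name__ == "__main__":
    energies = [0.1, 0.2, 0.2, 0.9]  # hypothetical energies in [0, 1]
    ranks = [0, 0, 0, 2]             # hypothetical rank of each individual
    parts = component_free_energies(energies, ranks, 0.5, 3)
    print(sum(parts) / len(parts))
    print(population_free_energy(energies, ranks, 0.5, 3))
```

Both printed values agree up to rounding, because each rank with $n_k$ members contributes $n_k$ identical logarithmic terms to the sum of components.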

Component Thermodynamical Selection Operator of GEP.
Based on the definitions in Section 4.2, we introduce the component thermodynamical selection operator (CTS) of GEP. The main idea of CTS is to pick the $M$ individuals with the largest component free energy from the combined parent and offspring population and then eliminate them. It can be proved that the remaining individuals approximately satisfy the principle of minimal free energy; the proof is similar to that in our previous work [36,37]. The pseudocode of the CTS operator is presented in Algorithm 2.
In the CTS of GEP, we first calculate the component free energy of the $N + M$ individuals of the parent and offspring populations and then eliminate the $M$ individuals with the largest component free energy; the remaining individuals compose the next generation population. Using this method, we can select individuals for the next generation with very low computational cost; the computational complexity is $O((N + M) \cdot M)$. Furthermore, the process of computing the component free energy of each individual in the temporary population $P'_{t+1}$ is shown in Algorithm 3, where $K$ is the number of ranks, an array records the number of individuals in each rank, and $P'_{t+1}$ is the temporary population.
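The selection step can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes that ranks are obtained by splitting the absolute energy window into $K$ equal-width intervals, that the energy term is the energy normalized to the window, and that the component free energy is this term plus $T \log_K (n_k / (N + M))$. The function name `cts_select` and its parameters are hypothetical.

```python
import math


def cts_select(parents, offspring, energy, window, temperature, num_ranks):
    """Keep len(parents) individuals by dropping the largest free energies.

    energy: callable mapping an individual to its absolute energy.
    window: (low, high) bounds of the absolute energy window.
    num_ranks: number of ranks K (must be at least 2 for the base-K log).
    """
    pool = list(parents) + list(offspring)
    low, high = window
    span = (high - low) or 1.0  # avoid division by zero for a flat window
    relative = [(energy(x) - low) / span for x in pool]
    # Assumed rank rule: equal-width bins over the window, clamped to range.
    ranks = [min(max(int(r * num_ranks), 0), num_ranks - 1) for r in relative]
    counts = [ranks.count(k) for k in range(num_ranks)]
    free = [
        rel + temperature * math.log(counts[k] / len(pool), num_ranks)
        for rel, k in zip(relative, ranks)
    ]
    # Sort indices by free energy and keep the smallest ones, which is
    # equivalent to eliminating the M individuals with the largest values.
    order = sorted(range(len(pool)), key=lambda i: free[i])
    survivors = sorted(order[: len(parents)])
    return [pool[i] for i in survivors]


if __name__ == "__main__":
    identity = lambda x: x  # individuals are their own energies here
    print(cts_select([0.0, 0.0, 0.0], [1.0], identity, (0.0, 1.0), 0.0, 2))
    print(cts_select([0.0, 0.0, 0.0], [1.0], identity, (0.0, 1.0), 10.0, 2))
```

In the usage example, a temperature of zero keeps the three low-energy duplicates, whereas a high temperature keeps the lone individual from the sparsely populated rank, which shows how the entropy term protects diversity. The sketch sorts the pool for brevity; repeatedly removing the current maximum would give the same survivors.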

Experimental Setup.
In order to evaluate the performance of the proposed CTSGEP algorithm for function finding, in this section we compare CTSGEP with the traditional GEP and several GEP variations, namely IGEP [27], AMACGEP [31], and Mod-GEP [32], on function finding data sets. All of the compared algorithms are implemented in the C++ programming language.
The function finding datasets are taken from the UCI machine learning repository [38]. There are about 200 test instances for the function finding problems in UCI [38], and we randomly select 15 test instances, namely instances 10, 21, 35, 44, 49, 52, 76b, 84b, 103, 126a, 148c, 155, 163, 182c, and 203. In our experimental studies, for each algorithm and each test instance, 30 independent runs are conducted with 400000 function evaluations (FES) as the termination criterion. To compare the algorithms fairly, the common parameter settings of all the algorithms, as used or recommended in [1,2,31], are as follows: (i) head length: 20, (ii) gene length: 41, (iii) number of genes: 5,

GEP Based on Component Thermodynamical Selection
Step 1 Create a random initial population $P_0$;
Step 2 Evaluate the population $P_0$, and calculate the absolute energy of each individual according to (6);
Step 3 Initialize the three counters to 0;
Step 4 Compute the absolute energy window $W_t$ according to (7);

In addition, the other parameter values of IGEP [27], AMACGEP [31], and Mod-GEP [32] are the same as in their original papers. The five control parameters of CTSGEP, the third of which is the initial temperature $T_0$, are set to 20, 20, 10, 2, and 100, respectively. In our experiments, as recommended in [2], the average and standard deviation of the mean square error (MSE) are recorded to measure the performance of each algorithm. The mean square error Err is calculated by [2]

$$\mathrm{Err} = \frac{1}{\mathrm{SN}} \sum_{i=1}^{\mathrm{SN}} \left( y_i - \hat{y}_i \right)^2,$$

where $y_i$ is the target value for sample $i$, $\hat{y}_i$ is the value predicted by the algorithm for sample $i$, and SN is the total number of samples in each dataset.
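The performance measure can be sketched in Python as follows, assuming the usual definition of the mean square error and the sample standard deviation over independent runs (the text does not state whether the sample or the population form is used). The function names and the numbers in the usage example are hypothetical.

```python
import statistics


def mean_square_error(targets, predictions):
    """Average of squared differences between target and predicted values."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and aligned")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def summarize_runs(mse_per_run):
    """Mean and sample standard deviation of the MSE over independent runs."""
    return statistics.mean(mse_per_run), statistics.stdev(mse_per_run)


if __name__ == "__main__":
    print(mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(summarize_runs([1.0, 3.0]))
```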

Comparison between CTSGEP and Other GEP Algorithms.
The mean and the standard deviation of the MSE obtained by each algorithm for the 15 test instances are summarized in Table 1. All the results are obtained from 30 independent runs. The best results among the five algorithms are marked in boldface. In order to draw statistically sound conclusions, a two-tailed $t$-test at a 0.05 significance level is conducted on the experimental results. The last three rows of Table 1 summarize the experimental results. Clearly, CTSGEP is the best among the five algorithms on the 15 test instances. It performs significantly better than GEP, IGEP, AMACGEP, and Mod-GEP on fifteen, fourteen, thirteen, and ten test instances, respectively, according to the two-tailed $t$-test. In addition, GEP cannot outperform CTSGEP on any test instance, while IGEP, AMACGEP, and Mod-GEP surpass CTSGEP on only one, one, and three test instances, respectively.
To compare the performance of these algorithms on the 15 test instances, the average rankings of the Friedman test are computed following the suggestions in [39,40]. Table 2 reports the average ranking of the five GEP algorithms on the 15 test instances. Sorted by average ranking, the GEP algorithms appear in the following order: CTSGEP, Mod-GEP, AMACGEP, IGEP, and GEP. Thus, the best average ranking is obtained by the CTSGEP algorithm, which outperforms the other four GEP algorithms.
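The average rankings used in a Friedman-style comparison can be computed as in the following Python sketch: on each test instance the algorithms are ranked by error (rank 1 is best, ties share the average rank), and the ranks are then averaged over instances. The function name and the error values are hypothetical and are not taken from Table 1.

```python
def average_rankings(errors):
    """errors[i][j] is the error of algorithm j on instance i (lower is better)."""
    n_algorithms = len(errors[0])
    totals = [0.0] * n_algorithms
    for row in errors:
        order = sorted(range(n_algorithms), key=lambda j: row[j])
        pos = 0
        while pos < n_algorithms:
            # Find the run of tied values starting at position pos.
            end = pos
            while end + 1 < n_algorithms and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of 1-based ranks in the run
            for j in order[pos : end + 1]:
                totals[j] += shared_rank
            pos = end + 1
    return [total / len(errors) for total in totals]


if __name__ == "__main__":
    hypothetical_errors = [
        [0.1, 0.3, 0.2],
        [0.2, 0.2, 0.5],
    ]
    print(average_rankings(hypothetical_errors))
```

A lower average ranking indicates better overall performance across the instances.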
To compare the performance differences between CTSGEP and the other four GEP algorithms, we conduct a Wilcoxon signed-ranks test [41,42] with a significance level of 0.05. Table 3 shows the resultant $p$ values when comparing CTSGEP with the other four GEP algorithms. The $p$ values below 0.05 are typed in bold. From the results, it can be observed that CTSGEP is significantly better than the GEP, IGEP, and AMACGEP algorithms, whereas it is not significantly better than Mod-GEP. However, CTSGEP performs better than Mod-GEP according to the average rankings shown in Table 2. In summary, CTSGEP is the winner on these 15 test instances. This may be because CTSGEP achieves stable convergence by quantitatively obtaining a compromise between the selective pressure and the population diversity.

Table 1 notes: A two-tailed t-test at a 0.05 significance level is conducted between CTSGEP and each of GEP, IGEP, AMACGEP, and Mod-GEP. "+", "−", and "≈" denote that the performance of the corresponding algorithm is better than, worse than, and similar to that of CTSGEP, respectively, according to the two-tailed t-test. The best results among the five algorithms are typed in bold.

The sensitivity of CTSGEP is studied with respect to two parameters: the offspring population size $M$ and the number of ranks $K$. The former is related to the selective pressure, while the latter is correlated with the population diversity.

Sensitiveness to Offspring Population Size 𝑀.
An experiment is conducted to investigate the sensitivity of the CTSGEP algorithm to variations in the offspring population size $M$, based on the 15 test instances described in Section 5.1 over 30 independent runs. The offspring population size $M$ is related to the population size $N$. Therefore, $M$ is varied from $N \times 5\%$ to $N \times 50\%$ with a step of 5 in the experiment. All the other parameters of CTSGEP are the same as those in Section 5.1. Results for some typical test instances, reported in Figure 5, show that the performance of CTSGEP changes with the offspring population size $M$. We omit plots for the other test instances, as they exhibit similar behavior. The $x$-coordinate of each plot in Figure 5 represents the offspring population size $M$, while the $y$-coordinate stands for the average MSE over 30 independent runs. It can be seen from Figure 5 that CTSGEP performs best when the offspring population size $M$ is selected in the range $[N \times 15\%, N \times 30\%]$.
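The parameter grid of such a sensitivity study can be generated as in the following Python sketch. It assumes that the step of 5 is meant in percentage points of the population size; the function name and the population size of 100 in the usage example are hypothetical.

```python
def sweep_values(population_size, start_pct=5, stop_pct=50, step_pct=5):
    """Parameter values from start_pct to stop_pct percent of population_size."""
    return [
        population_size * pct // 100
        for pct in range(start_pct, stop_pct + 1, step_pct)
    ]


if __name__ == "__main__":
    # Hypothetical population size; each value would be run 30 times and
    # the average MSE recorded for the corresponding plot.
    print(sweep_values(100))
```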

Sensitiveness to Number of Ranks 𝐾.
The impact of the number of ranks $K$ is investigated using the 15 test instances described in Section 5.1 over 30 independent runs. We keep the parameters of CTSGEP the same as those in Section 5.1, except that $K$ ranges from $N \times 5\%$ to $N \times 50\%$ with a step of 5. The results for some typical test instances are shown in Figure 6. We again omit the results for the other test instances, since they show a similar tendency. The figure shows that CTSGEP works best with the number of ranks $K \in [N \times 15\%, N \times 35\%]$.

Conclusion
GEP is an increasingly popular tool for data mining. However, it tends to suffer from premature convergence and a slow convergence rate when solving complex problems. To address this drawback, we present a novel GEP based on the component thermodynamical selection operator. CTSGEP, proposed in this paper, is inspired by the principle of minimal free energy in thermodynamics and maps the selective pressure and the population diversity into the mean energy and the entropy, respectively. Because the individuals chosen for the next generation satisfy the principle of minimal free energy, the proposed approach can quantitatively keep a balance between the selective pressure and the population diversity of GEP.
The experimental studies in this paper were conducted on 15 test instances of function finding problems taken from the UCI machine learning repository. CTSGEP was compared with the conventional GEP and three GEP variations, that is, IGEP, AMACGEP, and Mod-GEP. The experimental results demonstrated that its overall performance was better than that of the four competitors. Moreover, the parameter sensitivity of CTSGEP was also experimentally investigated.
In the future, we will perform a more detailed evaluation of CTSGEP on large-scale data mining problems, which are considered a challenge by the data mining community. In addition, it would be interesting to study how to incorporate parameter adaptation schemes into CTSGEP.

Definition 7. Let $P_t = \{x_1, x_2, \ldots, x_N\} \in S^N$ be the GEP population of generation $t$ and let $W_t$, a closed interval, be the absolute energy window. For any GEP individual $x_i \in P_t$ located in a rank that contains $n_k$ individuals, the component free energy $F_c(x_i, T, W_t, P_t)$ of individual $x_i$ is defined by

$$F_c(x_i, T, W_t, P_t) = e_r(x_i, W_t) + T \log_K \frac{n_k}{N},$$

where the first term is the energy of $x_i$ relative to the window $W_t$ and $K$ is the number of ranks.

Algorithm 2: Pseudocode of CTS operator.
Component thermodynamical selection operator of GEP
Step 1 Combine offspring population $O_t$ with parent population $P_t$ to generate a temporary population $P'_{t+1}$;
Step 2 Compute the component free energy of each individual in $P'_{t+1}$;
Step 3 Pick the $M$ individuals with the largest component free energy from $P'_{t+1}$;
Step 4 Eliminate the $M$ picked individuals from population $P'_{t+1}$ to generate the population $P_{t+1}$ for the next generation.

Algorithm 3: Process of computing the component free energy of each individual.
The process of computing the component free energy of each individual
Step 1 /* initialize the number of individuals in each rank and compute rank

Proposed CTSGEP.
Similar to the traditional GEP, CTSGEP starts by initializing a population of $N$ individuals. Then, at each temperature $T$, it evolves for a given number of generations. At each generation, $M$ new individuals are created by the uniform selection, mutation, transposition-insertion, and recombination operators, and then $N$ individuals are selected from the $N + M$ individuals for the next generation using CTS. This process is repeated until the termination criterion is reached. The CTSGEP algorithm description is summarized in Algorithm 4.

Steps of Algorithm 4 executed at each generation:
Establish the offspring population $O_t$ by the $M$ new individuals;
Evaluate the population $O_t$, and calculate the absolute energy of each individual according to (6);
Save the best individual;
Compute the absolute energy window $W_{t+1}$ according to (8);
Utilize CTS operator to select $N$ individuals from $P_t \cup O_t$ for the next generation;

Table 2: Average rankings of the five GEP algorithms for the 15 test instances achieved by the Friedman test.

Table 3: Wilcoxon test between CTSGEP and the other four GEP variations for the 15 test instances. The p values below 0.05 are typed in bold.