
Many genetic algorithms (GAs) have been applied to solve various NP-complete combinatorial optimization problems. The critical point in using a GA lies in selecting an appropriate combination of patterns for crossover, mutation, and the other components, and in fine-tuning parameters such as the crossover and mutation probabilities. One way to design a robust GA is to select an optimal pattern combination and then search for its parameter values using a tuning procedure. This paper presents a methodology that covers both the pattern-selection and the tuning phases by taking advantage of design of experiments and response surface methodology. To demonstrate its performance and application, the methodology is employed to design a robust GA for a project scheduling problem. Statistical comparisons between the performance of the proposed method and that of an existing GA show the effectiveness of the methodology.

Combinatorial optimization involves problems whose set of feasible solutions is discrete or can be reduced to a discrete one, where the goal is to find the best possible solution. In areas such as routing, task allocation, and scheduling, most problems are modelled as combinatorial optimization problems.

Because many combinatorial optimization problems are NP-complete, they are quite difficult to solve analytically: exact search algorithms such as branch and bound may degenerate to complete enumeration, and the CPU time needed to solve them may grow exponentially in the worst case. To solve these problems in practice, one has to be satisfied with finding good, approximately optimal solutions in reasonable, that is, polynomial, time.

In recent decades, researchers have developed evolutionary algorithms to solve combinatorial problems of practical size. Evolutionary algorithms (EAs) form a class of search methods that work by incrementally improving the quality of a set of candidate solutions through variation and selection (Eiben and Smith [

The two major steps in applying any GA to a particular problem are the specification of the representation and the evaluation (fitness) function (Deb [

Evolution in a GA is partly random; the rest depends both on the behaviour of the patterns applied in the GA components and on the values chosen for the GA parameters. The efficiency of a GA therefore strongly depends on selecting good patterns and tuning its parameters. Many researchers have tried to optimize their GAs. In general, setting GA parameters is divided into two cases: parameter control and parameter tuning. In parameter control, the parameter values change during a GA run. This requires initial parameter values and suitable control strategies, which in turn can be deterministic, adaptive, or self-adaptive (Eiben et al. [
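To make the distinction concrete, a deterministic parameter-control strategy can be sketched as follows; the linear decay schedule and the probability values are illustrative assumptions rather than settings from this study:

```python
def deterministic_mutation_rate(generation, max_generations,
                                p_start=0.10, p_end=0.01):
    """Deterministic parameter control: the mutation probability decays
    linearly over the run instead of staying fixed (values illustrative)."""
    frac = min(generation / max_generations, 1.0)
    return p_start + frac * (p_end - p_start)
```

Parameter tuning, the case addressed in this paper, instead fixes such values before the run.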

De Jong and Spears [

While previous research has paid little attention to simultaneously selecting an optimal (or near-optimal) pattern for the components of a GA and tuning its parameters, in this paper we introduce a new approach in which the effects of the GA components and parameters, as well as their interactions, are first statistically analyzed. In the second step, the optimal values of the parameters are determined. Whereas previous works assessed the effects of parameters only at some discrete points, in the present work we search a continuous interval in order to find the optimal parameter values.

The organization of the rest of the paper is as follows. The description of the proposed methodology is given in Section

A genetic algorithm is designed as a combination of patterns in encoding, generation, selection, joining, replacement, and stopping criteria. Although different patterns have been proposed for encoding and for generating the initial population, the relationships between the problem variables and the conditions of the search range play an important role in designing these patterns. For the other components of a GA, by contrast, a desirable combination is usually unknown, and almost all existing patterns can generally be used. Hence, to design a robust GA, one needs to employ the following three phases:

Phase 1: designing different patterns to create the algorithm;

Phase 2: selecting the significant patterns using design of experiments;

Phase 3: tuning the parameters of the selected patterns using response surface methodology.

In the next subsections, we first review some of the most commonly used patterns of the GA components and then demonstrate the existing methods in the last two phases. At the end of this section, the proposed methodology will be described.

In this section, we review some of the most commonly used patterns in GA components.

In a typical optimization problem, there exists a wide discrete/continuous range of points named the search range, and the goal is to find a point that leads to the optimal solution. An essential characteristic of GAs is the encoding of the variables that describe the problem. There are many encoding patterns. Some, such as binary and gray encoding, represent the problem by a set of ones and zeros, while others, such as random keys, permutation, and real-value encodings, use numeric values to represent the chromosomes. For a review of encoding patterns, each with its specific characteristics and capabilities, see (e.g., Jenkins [

Some of the most prevalent chromosome selection patterns proposed so far are based on chromosome fitness; they include the roulette selection (Michalewicz [
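As a sketch of how two common fitness-based selection patterns operate (function names and details are illustrative, assuming a maximization problem with positive fitness values):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette wheel) selection:
    a chromosome's chance of being picked is its share of total fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]  # guard against floating-point round-off

def tournament_select(population, fitnesses, tour_size=2):
    """Tournament selection: sample `tour_size` chromosomes at random
    and return the fittest of them."""
    contestants = random.sample(range(len(population)), tour_size)
    best = max(contestants, key=lambda i: fitnesses[i])
    return population[best]
```

The tour size in the tournament pattern is exactly the kind of parameter the tuning phase of the methodology addresses.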

One of the most important components of a GA is the crossover operator. There are two broad classes of the crossover operation, namely, the point crossovers (Deb [

Traditionally, GAs have relied upon point crossovers. However, there are many instances in which having a higher number of crossover points is beneficial (Eshelman et al. [
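A minimal sketch of the one-point and uniform crossover patterns, assuming list-encoded chromosomes of equal length (the two-point variant follows the same cut-and-swap idea with two cut points):

```python
import random

def one_point_crossover(p1, p2):
    """Cut both parents at one random point and swap the tails."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2, swap_prob=0.5):
    """Decide gene by gene, with probability `swap_prob`, whether the
    corresponding genes of the two parents are exchanged."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < swap_prob:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2
```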

Another important operator of a GA is mutation that acts like an insurance policy against premature loss of important information (Goldberg [
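The two mutation patterns compared later in the experiments, applying a probability either to each child or to each gene, can be sketched as follows (the gene-level operator `mutate_gene` is a placeholder supplied by the user, e.g. a bit flip for binary encoding):

```python
import random

def mutate_per_child(chrom, p_child, mutate_gene):
    """With probability `p_child`, mutate one randomly chosen gene."""
    chrom = list(chrom)
    if random.random() < p_child:
        i = random.randrange(len(chrom))
        chrom[i] = mutate_gene(chrom[i])
    return chrom

def mutate_per_gene(chrom, p_gene, mutate_gene):
    """Visit every gene and mutate each one independently
    with probability `p_gene`."""
    return [mutate_gene(g) if random.random() < p_gene else g for g in chrom]
```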

In a GA, chromosomes of the old generation are replaced with the newly generated children. A variety of methods have been proposed in the literature for this task. In one of these methods, called (

Stopping criteria are usually of two types. The first (such as reaching a predefined number of iterations or a predefined running time of the GA) is called a passive stopping criterion: the algorithm stops independently of the obtained results. The second (such as reaching the same best solution in several successive generations, or a small difference between the average fitness of successive generations) is called a sequential criterion: stopping the GA depends on the quality of the obtained results. Both of these criteria may be treated as GA parameters.
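Both criteria can be sketched as a single check; this is an illustration assuming a maximization problem and a record of the best-so-far fitness per generation:

```python
def should_stop(generation, best_history, max_gens=None, k=None):
    """Passive criterion: a fixed number of generations has elapsed.
    Sequential criterion: the best-so-far fitness has not improved
    over the last `k` successive generations (maximization assumed)."""
    if max_gens is not None and generation >= max_gens:
        return True
    if (k is not None and len(best_history) > k
            and best_history[-1] <= best_history[-(k + 1)]):
        return True
    return False
```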

In summary, some of the predetermined patterns along with their parameters and symbols used to describe them are given in Table

Common patterns in designing GA components and their parameters.

| Component | Operators | Parameters | Symbols |
|---|---|---|---|
| Parent selection | Roulette | — | |
| | Tournament | Tour size | |
| | Random | — | |
| | Unlike | — | |
| Recombination | One point | Crossover probability | |
| | Two point | Crossover probability | |
| | Uniform | Crossover probability | |
| Mutation | Prob. to any child | Mutation probability | |
| | Prob. to any gene | Mutation probability | |
| Replacement methods | ( | — | |
| | ( | — | |
| | Elitism | Number of chromosomes for reproduction | |
| | Preselection | — | |
| Stopping criteria | Passive | Number of iterations/time | |
| | Sequential | Number of successive iterations with the same best solution | |

The performance measures of a GA (the responses) are affected by controllable factors such as the encoding, selection, crossover, mutation, replacement, and stopping patterns, as well as by uncontrollable factors, and different combinations of these factors result in different performances. Design of experiments can therefore be implemented to study the effects of the input factors and their interactions on the system performances and to delineate which factor(s) have the greatest effect on the response(s). Furthermore, when the study involves two or more factors, factorial designs are generally the most efficient way of delineating the significant factors (Montgomery [

In designing a robust GA, the effects of different patterns and parameters along with their interactions can be analyzed using a factorial design such as a full factorial design (a
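As an illustration, the coded design matrix of a full factorial experiment with two-level factors can be generated as follows (a sketch; in practice the run order would additionally be randomized):

```python
from itertools import product

def full_factorial(k):
    """All 2^k level combinations of k two-level factors,
    coded as -1 (low) and +1 (high)."""
    return list(product((-1, +1), repeat=k))

design = full_factorial(8)  # the 8 two-level GA factors of this study
```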

Response surface methodology (RSM) is a collection of statistical and mathematical techniques useful for modelling and analyzing problems in which responses are affected by several variables, with the goal of optimizing these responses. In most of these problems, the relation between the response and the factors (independent variables) is not known, so this relationship must first be estimated. To do so, if there is no curvature in the response surface, one can employ a first-order model; otherwise, a higher-order model (usually a second-order model) should be used. Then a level combination of the factors is sought at which the optimal value of the response is reached. For this, the steepest ascent method (for maximization problems) or the steepest descent method (for minimization problems) is employed to move along the path of greatest gradient, which yields the largest increase or decrease in the objective function. For a comprehensive survey of RSM, refer to Myers and Montgomery [
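A minimal sketch of the first-order fit and the steepest descent direction on a coded two-level design; since the ±1 design columns are orthogonal, least squares reduces to simple contrast averages (function names are illustrative):

```python
import math

def fit_first_order(design, y):
    """Least-squares fit of y = b0 + sum(b_i * x_i) on a coded (+/-1)
    factorial design; orthogonality reduces the fit to averages."""
    n = len(design)
    k = len(design[0])
    b0 = sum(y) / n
    b = [sum(x[i] * yi for x, yi in zip(design, y)) / n for i in range(k)]
    return b0, b

def steepest_descent_direction(b):
    """Unit step along the negative gradient (for a minimization problem);
    steepest ascent would use +b instead."""
    norm = math.sqrt(sum(bi * bi for bi in b))
    return [-bi / norm for bi in b]
```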

In order to tune the parameters of the designed GA, one may use RSM. If the relation between the algorithm efficiency and the important GA parameters is linear, then depending on the constraints (time and cost) on the size of experiments either the

The present methodology can be carried out independently of the selected encoding pattern, because the selection patterns operate on the fitness of chromosomes rather than on the encoding itself. In other words, regardless of the encoding pattern, we only need a method to calculate the fitness of chromosomes. A similar observation holds for the replacement patterns: they are all either completely free of the encoding type or, in some cases such as elitism, associated only with the fitness. For the most important part of the GA, that is, the crossover pattern, since characteristics (genes) are exchanged through the point and uniform crossovers, the methodology is applicable with all encoding patterns. The only part of a GA that is tied to the encoding system is mutation. As explained in Section

In the next section, a project scheduling problem is used to demonstrate the steps involved in the proposed methodology along with its possible implementation in practice.

The project scheduling problem is an important branch of combinatorial optimization in which the goal is to optimize one or more objectives subject to constraints on the activities and the resources. Due to its NP-hard structure, researchers have widely applied metaheuristics, particularly genetic algorithms, to solve it. Among these works, different chromosome encodings and parameter settings have been introduced. For example, the priority-value-based and priority-rule-based representations were developed by Lee and Kim [

Resource investment problem with discounted cash flows introduced by Najafi and Niaki [

Graph of the example network.

Since Andrzej [

Since more patterns are desirable for a robust GA, in addition to the ones devised by Najafi et al. [

The coded factor levels.

| Factors | High level (+1) | Low level (−1) |
|---|---|---|
| | Tournament | Unlike |
| | One-point crossover | Uniform crossover |
| | A probability to any child | A probability to any gene |
| | Preselection | Elitism |
| | Passive | Sequential |
| | 0.9 | 0.7 |
| | 0.1 | 0.05 |
| | 0.1 | 0.05 |

One run of the full factorial design of the 8 aforementioned factors requires

For the analysis of variance, the relative deviation percentage (RDP) of the proposed GA solutions from the optimal solutions, which is calculated according to (
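Under the usual definition of a relative deviation percentage (an assumption here, since the paper's own equation is referenced rather than reproduced), the RDP response can be computed as:

```python
def rdp(ga_value, optimal_value):
    """Relative deviation percentage of a GA solution from the known
    optimum, assuming the conventional definition."""
    return 100.0 * (ga_value - optimal_value) / abs(optimal_value)
```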

Although the probability distribution of the response is not known, since 30 test problems are used and the sum of the responses over these problems serves as the observation for each factor combination, the central limit theorem implies an approximately normal distribution. Assuming that interaction effects beyond two-way are not significant, the analysis of variance of this experiment is given in Table

Analysis of variance for RDP.

| Source | DF | Seq SS | Adj SS | Adj MS | F | P |
|---|---|---|---|---|---|---|
| Main effects | 8 | 9.464 | 9.464 | 1.18294 | 17.74 | 0.000 |
| 2-way interactions | 7 | 2.804 | 2.804 | 0.40054 | 6.01 | 0.000 |
| Residual error | 464 | 30.933 | 30.933 | 0.06667 | | |
| Pure error | 464 | 30.933 | 30.933 | 0.06667 | | |
| Total | 479 | 43.200 | | | | |

In order to find the specific significant effects, Duncan's multiple range test, used as a post hoc analysis method, results in the following ranking:

In which “

The candidate parameters in the tuning process are population size (

The search ranges for input variables.

| Parameters | Ranges |
|---|---|
| Population size | |
| Crossover probability | 0.8–1 |
| Mutation probability | 0.025–0.075 |
| Local improvement probability | 0.05–0.15 |

In the tuning phase, a 2^{4} central composite factorial (CCF) design with 4 central points and 8 axial points of (
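The coded runs of such a face-centered design (axial distance α = 1, so axial points sit on the faces of the cube) can be generated as follows; for 4 factors with 4 centre points this yields the 28 runs used in the study:

```python
from itertools import product

def ccf_design(k, n_center):
    """Face-centered central composite design in coded units:
    2^k factorial corners, 2k axial points at +/-1 on each axis,
    and n_center centre runs."""
    corners = [list(p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for a in (-1, 1):
            pt = [0] * k
            pt[i] = a
            axial.append(pt)
    center = [[0] * k for _ in range(n_center)]
    return corners + axial + center

runs = ccf_design(4, 4)  # 16 + 8 + 4 = 28 runs
```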

The results of the CCF design.

| Runs | Input variables | | | | Responses | |
|---|---|---|---|---|---|---|
| (1) | 0 | 0 | 1 | 0 | −0.02 | 0.38 |
| (2) | 1 | 0 | 0 | 0 | −0.02 | 0.29 |
| (3) | 0 | −1 | 0 | 0 | −0.02 | 0.46 |
| (4) | −1 | 0 | 0 | 0 | −0.02 | 0.75 |
| (5) | 1 | 1 | −1 | 1 | −0.01 | 0.23 |
| (6) | 1 | −1 | 1 | −1 | −0.02 | 0.37 |
| (7) | −1 | 1 | 1 | 1 | −0.02 | 0.57 |
| (8) | 1 | −1 | 1 | 1 | −0.01 | 0.28 |
| (9) | 0 | 0 | 0 | −1 | −0.02 | 0.48 |
| (10) | 0 | 1 | 0 | 0 | −0.01 | 0.37 |
| (11) | −1 | −1 | 1 | 1 | −0.02 | 0.73 |
| (12) | 0 | 0 | 0 | 0 | −0.02 | 0.41 |
| (13) | 0 | 0 | 0 | 0 | −0.01 | 0.41 |
| (14) | 1 | 1 | 1 | 1 | −0.01 | 0.23 |
| (15) | 0 | 0 | 0 | 1 | −0.01 | 0.36 |
| (16) | 1 | 1 | 1 | −1 | −0.02 | 0.30 |
| (17) | −1 | −1 | −1 | −1 | −0.03 | 0.96 |
| (18) | 1 | 1 | −1 | −1 | −0.01 | 0.31 |
| (19) | 0 | 0 | 0 | 0 | −0.01 | 0.41 |
| (20) | −1 | 1 | −1 | 1 | −0.01 | 0.61 |
| (21) | −1 | −1 | −1 | 1 | −0.02 | 0.76 |
| (22) | −1 | −1 | 1 | −1 | −0.03 | 0.96 |
| (23) | 1 | −1 | −1 | −1 | −0.01 | 0.38 |
| (24) | −1 | 1 | −1 | −1 | −0.03 | 0.78 |
| (25) | 1 | −1 | −1 | 1 | −0.01 | 0.29 |
| (26) | 0 | 0 | −1 | 0 | −0.02 | 0.41 |
| (27) | 0 | 0 | 0 | 0 | −0.02 | 0.42 |
| (28) | −1 | 1 | 1 | −1 | −0.03 | 0.78 |

The first response

The results of Table

Tables

Analysis of variance for solution accuracy.

| Source | DF | Seq SS | Adj SS | Adj MS | F | P |
|---|---|---|---|---|---|---|
| Regression | 14 | 0.000712 | 0.000712 | 0.000051 | 9.28 | 0.000 |
| Linear effect | 4 | 0.000551 | 0.000551 | 0.000138 | 25.11 | 0.000 |
| Quadratic effect | 4 | 0.000056 | 0.000056 | 0.000014 | 2.54 | 0.090 |
| Interaction | 6 | 0.000106 | 0.000106 | 0.000018 | 3.21 | 0.037 |
| Residual error | 13 | 0.000071 | 0.000071 | 0.000005 | | |
| Lack of fit | 10 | 0.000058 | 0.000058 | 0.000006 | 1.32 | 0.458 |
| Pure error | 3 | 0.000013 | 0.000013 | 0.000004 | | |
| Total | 27 | 0.000783 | | | | |

Analysis of variance for solution quality.

| Source | DF | Seq SS | Adj SS | Adj MS | F | P |
|---|---|---|---|---|---|---|
| Regression | 14 | 1.25963 | 1.25963 | 0.089973 | 720.15 | 0.000 |
| Linear effect | 4 | 1.14060 | 1.14060 | 0.285149 | 2282.35 | 0.000 |
| Quadratic effect | 4 | 0.09321 | 0.09321 | 0.023302 | 186.51 | 0.000 |
| Interaction | 6 | 0.02582 | 0.02582 | 0.004304 | 34.45 | 0.000 |
| Residual error | 13 | 0.00162 | 0.00162 | 0.000125 | | |
| Lack of fit | 10 | 0.00156 | 0.00156 | 0.000156 | 7.27 | 0.065 |
| Pure error | 3 | 0.00006 | 0.00006 | 0.000021 | | |
| Total | 27 | 1.26125 | | | | |

Since the goal is to find parameter values such that both objective functions are simultaneously optimized, a bi-objective optimization problem needs to be solved. The fuzzy goal-programming (FGP) technique transforms the multiobjective decision-making problem into a single-objective one using fuzzy set theory. In FGP, a membership function is defined for each objective (

Suppose

To solve this problem, we first obtain the pay-off table of the positive ideal solution (PIS) as shown in Table

Payoff table of PIS.

| | | |
|---|---|---|
| Max | −0.009 | 0.24 |
| Max | −0.029 | 0.97 |

The membership functions of these two objectives can be obtained as follows:

Then, the model becomes
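The max-min FGP idea can be sketched numerically as follows, with linear membership functions anchored at the worst and ideal (PIS) values of each objective; function names are illustrative, and a grid search over candidate parameter settings would then pick the setting with the highest score:

```python
def linear_membership(y, worst, best):
    """Linear membership function: 0 at the worst value of the objective,
    1 at its ideal (PIS) value, clipped to [0, 1]."""
    if best == worst:
        return 1.0
    mu = (y - worst) / (best - worst)
    return max(0.0, min(1.0, mu))

def fgp_score(objectives, worsts, bests):
    """Max-min fuzzy goal programming: the overall achievement level is
    the smallest membership value across all objectives."""
    return min(linear_membership(y, w, b)
               for y, w, b in zip(objectives, worsts, bests))
```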

The optimal values obtained by the FGP method with

The optimal values of the parameters.

| Parameter | Optimum value |
|---|---|
| Population size | 1.56 |
| Crossover probability | 1 |
| Mutation probability | 0.048 |
| Local improvement probability | 0.15 |

To compare the performance of the proposed optimized genetic algorithm, all 180 problems used in Najafi et al. [

Comparison results.

| No. of activities | No. of problems | | | | |
|---|---|---|---|---|---|
| 10 | 60 | 0.2% | 3 | 2 | −22% |
| 20 | 60 | 2.4% | 37 | 30 | −13% |
| 30 | 60 | 8.2% | 102 | 94 | −7% |

The following notations are used in Table

Average relative deviation percentages of the optimized GA solution to the GA of Najafi et al. [

Average runtime (in seconds) required to obtain the solutions by GA of Najafi et al. [

Average runtime (in seconds) required to obtain the solutions by the optimized GA.

Average relative deviation percentages of the optimized GA runtime to the GA of Najafi et al. [

These results of Table

Hypotheses test of

| Variable | N | Mean | StDev | SE mean | 95% lower bound | T | P |
|---|---|---|---|---|---|---|---|
| RDP Solution | 180 | 0.035769 | 0.164963 | 0.012296 | 0.015439 | 2.91 | 0.002 |

Hypotheses test of

| Variable | N | Mean | StDev | SE mean | 95% upper bound | T | P |
|---|---|---|---|---|---|---|---|
| RDP Runtime | 180 | −0.1425 | 0.395694 | 0.029493 | −0.09374 | −4.83 | 0.000 |
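The tabulated test statistics can be reproduced from the reported summaries with a small sketch of the one-sample t statistic (the p-values would additionally require the t-distribution CDF, omitted here):

```python
import math
from statistics import mean, stdev

def t_from_summary(n, xbar, s, mu0=0.0):
    """One-sample t statistic for H0: mean == mu0, from summary stats."""
    return (xbar - mu0) / (s / math.sqrt(n))

def t_from_sample(sample, mu0=0.0):
    """The same statistic computed from raw RDP observations."""
    return t_from_summary(len(sample), mean(sample), stdev(sample), mu0)

# Reproducing the tabulated values from the reported summaries:
t_solution = t_from_summary(180, 0.035769, 0.164963)  # about 2.91
t_runtime = t_from_summary(180, -0.1425, 0.395694)    # about -4.83
```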

The genetic algorithm is known as one of the most robust and effective methods for solving combinatorial optimization problems and has been widely used in recent research. Since the different viewpoints suggested for designing this algorithm and its parameters greatly affect the solution quality, in this research a three-phase methodology has been proposed to design an optimal genetic algorithm. The phases are designing different combinations of common viewpoints to create the GA, selecting the significant combinations by design of experiments, and tuning the parameters using response surface methodology. This methodology was then applied to optimize a GA for a project scheduling problem. Statistical comparisons of the results obtained by the proposed GA with those from an existing GA for the project scheduling problem verified the better performance of the new algorithm with less required CPU time. As future research, the proposed methodology can be employed for other encoding patterns of GA as well as for other metaheuristic algorithms.