
It is commonly believed that diversity is crucial for an evolutionary system to succeed, especially when the problem to be solved contains local optima from which the population cannot easily escape. There exist numerous methods to measure population diversity, but none of these have been shown to be consistently useful. In this paper, a new diversity measure is introduced, and it is shown that, in most of the cases considered, high diversity according to this new measure leads to more successful evolution.

The concept of diversity has been studied extensively in the evolutionary computation literature [

When applying evolutionary algorithms to machine learning problems [

The rest of the paper is organized as follows. In Section

One of the key features of all evolutionary systems is that they maintain a population of candidate solutions. A major advantage of maintaining such a population is that, compared to simple local search methods, an evolutionary system can follow multiple paths through the search landscape and pursue solutions in multiple directions. This makes evolutionary systems remarkably robust, even when applied to multimodal and noisy optimization problems [

However, in many problems, known as deceptive multimodal problems, there exist several rather good but suboptimal solutions from which it is difficult to develop better solutions. If such a solution is found, and if at that point of evolution no better solutions are known, selection pressure will increase the number of individuals representing this solution at each generation. At some point, this suboptimal solution may take over the entire population, leading to what is known as premature convergence. Once a population has prematurely converged onto a suboptimal solution, the evolutionary system will perform no better than ordinary local search, and given a deceptive problem, it is unlikely that any further improvement can be found.

The most important factors determining the probability of premature convergence include the following.

The first of these factors is usually an inherent property of the problem at hand, and can therefore not be addressed by the user of the evolutionary system.

The other three can be influenced by the user in various ways. However, since increased population size and decreased selection pressure both lead to slower evolution, maintaining the diversity of the population is usually regarded as the most important technique available to avoid premature convergence.

There is no single, universal definition of the concept of diversity within the field of evolutionary computation; rather, a number of different diversity measures have been proposed. These can be divided into two distinct classes.

Examples of phenotypical diversity measures include counting the number of distinct fitness values [

For a detailed description of these and other diversity measures and a thorough analysis of how they affect fitness during evolution, see Burke et al. [

Within the fields of Ecology and Evolutionary Biology, it is generally agreed that the more diverse an ecosystem, the higher its chances of survival through various environmental changes, and the better it will be able to evolve to adapt to new environments [

One can define a similar diversity measure for evolutionary computation systems where fitness is calculated based on a set of training examples. Examples of such systems include Genetic Programming [

The Euclidean Phenotype Distance can be used to define the diversity of a population by computing the average distance between all pairs of individuals; we call this diversity measure
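As an illustration, this average-pairwise-distance diversity can be sketched as follows. The helper below is a hypothetical implementation, assuming each individual is represented by its fitness vector (one entry per training example) and that diversity is the mean Euclidean distance over all unordered pairs of individuals:

```python
import numpy as np

def euclidean_phenotype_diversity(fitness_vectors):
    """Average pairwise Euclidean distance between the fitness vectors
    of all individuals in the population.

    fitness_vectors: sequence of shape (pop_size, n_examples), where
    row i holds individual i's value on each training example.
    """
    pop = np.asarray(fitness_vectors, dtype=float)
    n = len(pop)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.linalg.norm(pop[i] - pop[j])
    return total / (n * (n - 1) / 2)  # mean over all unordered pairs
```

For a population of two individuals with fitness vectors (0, 0) and (3, 4), this yields the single pairwise distance, 5.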

Burke et al. [

We expect diversity in the ability of the individuals to solve different subsets of the training examples to be advantageous in many evolutionary domains, mainly for the following two reasons.

Firstly, in many applications, the training examples have varying degrees of difficulty. For example, in the field of automatic programming [

In other words, in cases in which the training examples differ in difficulty, high diversity indicates the existence of some individuals capable of solving some of the more difficult examples, which in turn is likely to simplify the remaining evolution.

Secondly, according to the Building Block Hypothesis [

If we regard the ability to solve the different training examples as building blocks, a population with high diversity obviously contains a larger number of different building blocks than a population with lower diversity. In cases in which two individuals able to solve different training examples can with reasonably high probability be combined into one individual able to solve most of the training examples solved by each of its two parents, we should expect high diversity to be of significant importance.

Consider the situation depicted in Figure

Different ways of measuring the distances between two individuals.

However, now consider the situation depicted in Figure

There is an obvious difference between the two situations in Figure

The use of angular distances to measure or promote diversity in evolutionary systems is not new (see, e.g., [
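Under the assumption that the angular distance between two individuals is the angle between their error vectors (fitness vector minus target vector), Angular Phenotype Diversity could be sketched as the average pairwise angle. The choice of the target vector as reference point is an assumption of this sketch, not a restatement of the paper's definition:

```python
import numpy as np

def angular_phenotype_diversity(fitness_vectors, target):
    """Average pairwise angle (in radians) between the individuals'
    error vectors, where the error vector of an individual is its
    fitness vector minus the target vector (an assumed reference)."""
    errors = np.asarray(fitness_vectors, dtype=float) - np.asarray(target, dtype=float)
    n = len(errors)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = errors[i], errors[j]
            na, nb = np.linalg.norm(a), np.linalg.norm(b)
            if na == 0.0 or nb == 0.0:
                continue  # zero error vector: angle undefined, skip pair
            cos_ang = np.clip(np.dot(a, b) / (na * nb), -1.0, 1.0)
            total += np.arccos(cos_ang)
            pairs += 1
    return total / pairs if pairs else 0.0
```

Two individuals whose error vectors point in orthogonal directions contribute an angle of π/2, regardless of how far each lies from the target, which is precisely the scale-invariance that distinguishes the angular measure from the Euclidean one.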

In order to investigate the effects of Euclidean and Angular Phenotype Diversity on the ability of an evolutionary system to evolve highly fit individuals, and to compare them with previously investigated diversity measures, we have conducted a series of experiments. We use a

The regression problems consist of sets

In order to be able to compute Angular and Euclidean Phenotype Diversity, we need to define the fitness vector

The fitness vector

The fitness of
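Although the exact definitions are given above, a common choice, assumed in this sketch, is to let the fitness vector hold the model's output on each training example and to score an individual by its root mean squared error against the targets. The function names here are illustrative only:

```python
import numpy as np

def fitness_vector(predict, examples):
    """Fitness vector: the model's output on each training example.
    `predict` maps an input vector to a scalar output; `examples` is a
    list of (input, target) pairs."""
    return np.array([predict(x) for x, _ in examples])

def fitness(predict, examples):
    """Scalar fitness: root mean squared error between outputs and
    targets (an assumed choice; lower is better)."""
    outputs = fitness_vector(predict, examples)
    targets = np.array([t for _, t in examples])
    return float(np.sqrt(np.mean((outputs - targets) ** 2)))
```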

When solving real-world machine learning problems, it may often be disadvantageous to evolve optimal solutions based on a limited set of training data, due to the problem of overfitting. However, we have chosen to ignore this issue in our experiments, since our focus here is on how well the evolutionary system is able to search for good solutions, rather than on actually evolving good and general solutions to the machine learning problems considered.

In our experiments, we used real-world data from six regression data sets. These sets were the following.

breast-cancer-wisconsin: A set containing data on a number of breast cancer patients. Each case is recorded with 32 numerical attributes in addition to the target value, which is the time of recurrence. The original data set contains a total of 198 cases; many of these are nonrecurring and/or lack some data. In our experiments, we only used the 46 recurring cases with no missing data.

concrete: This set gives the compressive strength of concrete for various ages and ratios of seven different ingredients. A total of 1030 instances are given, each of which contains 8 numerical input values in addition to the compressive strength target value.

forestfires: Data from 517 forest fires in Portugal. Each instance contains 12 numerical attributes, such as geographical coordinates and various meteorological data. The target value is the total burned area. This data set turned out to be the hardest one, requiring more computational resources than any of the other data sets.

cars: A set containing fuel consumption data for 392 cars. Each car is recorded with 7 numerical attributes in addition to the target value (miles per gallon). The original data set contained data for 406 cars; however, some of these lack the target value or the horsepower attribute and were therefore removed from the set.

NO2: Measurements of NO2 concentrations at Alnabru in Oslo, Norway. Each of the 500 measurements is recorded together with 7 numerical attributes containing information about traffic volume, various meteorological data, time and date.

boston_corrected: Data about house prices in Boston, coupled with various geographical and demographic data. The 16 instances containing censored observations as reported in [

The first three datasets were downloaded from the UCI Machine Learning Repository [

Although it is also possible to evolve the topology of neural networks by evolutionary approaches [

Number of hidden nodes and generations used for the various data sets.

Data set                | Number of hidden nodes | Number of generations
breast-cancer-wisconsin | 40                     | 10,000
concrete                | 40                     | 5,000
forestfires             | 100                    | 14,000
cars                    | 40                     | 10,000
NO2                     | 40                     | 5,000
boston_corrected        | 70                     | 10,000

All hidden nodes used tanh as their activation function. Since the target values may span arbitrary ranges of
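A fixed-topology network of the kind described, with one hidden layer of tanh nodes feeding a linear output node so that the output is not range-limited, can be sketched as follows; the shapes and parameter names are our own:

```python
import numpy as np

def mlp_output(x, w_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer network with tanh activations and a linear
    output node.

    Shapes: x (n_in,), w_hidden (n_hidden, n_in), b_hidden (n_hidden,),
    w_out (n_hidden,), b_out scalar.
    """
    hidden = np.tanh(w_hidden @ x + b_hidden)   # hidden activations in (-1, 1)
    return float(w_out @ hidden + b_out)        # linear output: unbounded range
```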

Each of these sets of experiments was conducted using a (100, 700) Evolution Strategy, that is, using a parent size of
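A minimal sketch of such a comma-selection (μ, λ) Evolution Strategy follows, using mutation only and toy parameters for illustration; the actual system may also employ recombination and self-adaptive step sizes:

```python
import random

def evolution_strategy(init, mutate, fitness, mu=100, lam=700, generations=50):
    """(mu, lam) comma-selection ES sketch: each generation, lam
    offspring are produced by mutating randomly chosen parents, and the
    mu best offspring form the next parent population (the parents
    themselves are discarded, as the comma notation dictates).
    Lower fitness is better."""
    parents = [init() for _ in range(mu)]
    best = min(parents, key=fitness)
    for _ in range(generations):
        offspring = [mutate(random.choice(parents)) for _ in range(lam)]
        offspring.sort(key=fitness)
        parents = offspring[:mu]
        best = min(best, parents[0], key=fitness)  # best-ever individual
    return best
```

For example, minimizing |x| with Gaussian mutation around an initial value drives the best-ever individual toward zero within a few dozen generations.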

After each experiment, the evolution was evaluated based on the fitness of the best individual ever achieved during the run. This fitness value, together with the different diversity measures of the population at each generation, was used as described in Section

The diversity measures used were Euclidean and Angular Phenotype Diversity as defined in (

To compute the correlation between two variables, a commonly used measure is Spearman's Rank Correlation Coefficient [

A Spearman Correlation of 1 means that the relationship between the two variables is completely monotonic, and that if one of them increases, the other one increases as well. Conversely, a Spearman Correlation of

Given two series of values, the Spearman Correlation between them can be computed using the simple formula
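For series without ties, the standard formula is rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the ranks of the i-th pair of values. A direct sketch:

```python
def spearman(xs, ys):
    """Spearman rank correlation via the simple formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    valid when neither series contains ties."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Two series in identical rank order yield 1, and two series in exactly opposite rank order yield -1.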

As in [

For each data set, 100 runs were conducted. The results of these runs were used to compute the Spearman Correlations for the different diversity measures at each generation. To compute the correlations, the 100 runs were ranked by the fitness of the best individual found during each run. Therefore, the computed correlation values give the correlation between diversity at each generation and the best fitness ever achieved during the run. This is in contrast to Burke et al. [

After running the 100 runs for each data set, we investigated how fitness and Angular Phenotype diversity evolved during the runs. Figure

Averaged evolution of fitness and angular phenotype diversity during the experiments with the six data sets.

Breast-cancer-wisconsin

Concrete

Forestfires

Cars

NO2

Boston_corrected

The scales vary; this is due to different numbers of generations and various differences among the data sets. Note in particular the scale of the Angular Phenotype Diversity in the forestfires experiments, where diversity levels are significantly lower than in all the other experiments.

Although the scales vary, we see some clear trends in these graphs. In all cases, the evolution of diversity over time is about the same: initially, a slight increase in diversity is observed, followed by a steady decrease until the run is terminated. The initial increase in Angular Phenotype Diversity might indicate that at the start of evolution, there is a short phase in which some sort of specialization occurs, in that individuals evolve to specialize in different subsets of the training examples. The decrease in diversity for the remainder of the evolution is then probably a result of a gradual convergence towards one of these specialized individuals, which at the same time is improved and generalized by mutations and recombinations with other individuals.

The correlation between the different forms of diversity at each generation and best fitness achieved during evolution is plotted against generation number in Figure

Spearman correlation between fitness and various forms of diversity during evolution using the six different data sets. Negative correlations indicate that diversity is beneficial for the evolution of good fitness; positive correlations indicate that diversity is harmful.

Breast-cancer wisconsin

Concrete

Forestfires

Cars

NO2

Boston_corrected

In general, high diversity means that a high number of different genes, traits, or other kinds of potential building blocks are present in the population. However, this high number of different potential building blocks is only useful for further evolution if a sufficient number of them are actually useful in some sense, in that they contain parts of a solution to the problem which can later be recombined into better and more general solutions. Since each run is initialized with a random population, the number of useful building blocks in the initial population is likely to be very low. Only after a few generations are useful building blocks likely to have evolved; until then, the impact of diversity is very low.

After the first few generations, the correlations of the different diversity measures evolve differently. Comparing the four forms of diversity considered, we note that for all our data sets, Angular Phenotype Diversity seems to be more beneficial for the evolution of fit individuals than any of the other diversity measures. This is particularly true for the breast-cancer-wisconsin and forestfires data sets; in the latter especially, Angular Phenotype Diversity correlates very strongly with the best fitness achieved by the evolution. In the cars and NO2 data sets, Angular Phenotype Diversity is also more beneficial for evolving fit individuals than any of the other diversity measures investigated, although the correlations here are quite weak. During the final half of the evolution on the boston_corrected data set, Angular Phenotype Diversity is somewhat more beneficial than the other diversity measures considered; however, on this data set the correlation between Angular Phenotype Diversity and fitness was positive in the first half of the evolution, indicating that high diversity is useful only in the final half. Finally, in the concrete data set, none of the diversity measures seem to be significant; most of them show a slight positive correlation with fitness, indicating that high diversity decreases the chances of evolving fit individuals. But even here, Angular Phenotype Diversity seems more useful than the other diversity measures.

By comparing Figure

As mentioned in Section

The results of these experiments are given in Figure

Spearman correlation between fitness and various forms of diversity during evolution using the breast-cancer-wisconsin data set and different numbers of hidden nodes. Negative correlations indicate that diversity is beneficial for the evolution of good fitness; positive correlations indicate that diversity is harmful.

10 hidden nodes

100 hidden nodes

In this paper, we have introduced a new diversity measure, the Angular Phenotype Diversity, based on angular distances between the fitness vectors of the individuals in the population. Comparisons with other diversity measures have been made by repeatedly running a number of regression problems and comparing the Spearman Correlation between achieved fitness and the different diversity measures, and in most of the experiments, Angular Phenotype Diversity turned out to have a stronger correlation with fitness than the other diversity measures considered. We draw the conclusion that our new diversity measure is potentially very useful for the domain considered, in the sense that the amount of Angular Phenotype Diversity in a population has a significant impact on the probability of finding good fitness values during the remainder of the evolution.

However, we have so far only considered evolution of neural networks to solve regression problems. Directions for future research may include investigating Angular Phenotype Diversity in other applications like Genetic Algorithms based Classifier Systems [