A Simplified Recombinant PSO

Simplified forms of the particle swarm algorithm contribute greatly to understanding how a particle swarm optimization (PSO) swarm functions. One of these forms, PSO with discrete recombination, is extended and analyzed, demonstrating not just improvements in performance relative to a standard PSO algorithm, but also significantly different behavior, namely, a reduction in bursting patterns due to the removal of stochastic components from the update equations.


INTRODUCTION
Originally conceived as a modification to the standard PSO algorithm for use on self-reconfigurable adaptive systems used in on-chip hardware processes, PSO with discrete recombination (PSO-DR) introduces several appealing and effective modifications, resulting in a simpler variant of the original [1]. It is one of the more interesting advances in PSO research over the last few years because these simplifications apparently do not degrade performance, yet they remove various issues associated with the stochasticity of the PSO acceleration parameters that hinder theoretical analysis of PSO.
Physical creation of hardware-based optimizers is a substantially more intricate undertaking than software implementation, so fast, simple algorithms are desirable in order to minimize complexity. The comparative straightforwardness of PSO relative to many other evolutionary optimization algorithms makes it a good choice for this purpose, and further modifications were applied by the authors of [1] in order to simplify it even further and to introduce concepts from recombinant evolutionary techniques. The resulting algorithm, which can be implemented using only addition and subtraction operators and a simple 1-bit random number generator, is well suited for dedicated hardware settings.
Despite this rather specific original design specification, PSO-DR has been shown to be a robust optimizer in its own right, equalling or surpassing a more common PSO implementation on a few tested benchmarks [1]. In this paper we extend the original work of Peña et al. by considering alternative topologies and parameter settings, running comparisons over a more comprehensive test suite, deriving simplified variants of the algorithm, and subjecting the model to a burst analysis.
The following section introduces PSO-DR (known here as model 1) as originally defined by Peña et al. and summarizes the burst analysis of [2]. Section 3 describes a series of simplifications to PSO-DR (models 2 and 3) which are introduced in this paper, and explains the motivations for these simplifications. Section 4 presents the results of performance experiments on models 1-3 and, for comparative purposes, standard PSO. Following this, the paper proceeds with an empirical investigation of bursting patterns in recombinant PSO. The final section draws together the experimental results of this paper and advances some ideas for the immediate future of PSO research.

PSO WITH DISCRETE RECOMBINATION
The velocity update for particle $i$ in standard PSO (SPSO) in the inertia weight formalism is
$$v_{id}^{t+1} = w\, v_{id}^{t} + \frac{\phi}{2} u_1 \left(p_{id} - x_{id}^{t}\right) + \frac{\phi}{2} u_2 \left(p_{nd} - x_{id}^{t}\right), \tag{1}$$
where $d$ labels components of the position and velocity vectors, $d = 1, 2, \ldots, D$, $p_i$ is the personal best position achieved by $i$, $p_n$ is the best position of informers in $i$'s social neighborhood, and $u_{1,2} \sim U(0, 1)$ [3]. After velocity update, the particle position is adjusted:
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}. \tag{2}$$
Peña et al. introduced a recombinant version of PSO by replacing either the personal best or the neighborhood best position by the recombinant position [1]. We focus here on the former for reasons of improved performance and the more interesting social aspect. A recombinant position vector $r_i$ is defined by
$$r_{id} = \eta_d\, p_{ld} + (1 - \eta_d)\, p_{rd}, \tag{3}$$
where $\eta_d = U\{0, 1\}$ and $p_{l,r}$ are the personal best positions of the immediate left and right neighbors of $i$ in a ring topology. While separate random numbers $\eta_d$ are used for separate dimensions $d$, a single value is generated for each dimension and used for both occurrences of $\eta_d$ in that dimension. This places $r_i$ at a corner of the smallest $D$-dimensional box which has $p_l$ and $p_r$ at its corners.
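The two building blocks described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the inertia-weight update in which each attractor term carries a coefficient of φ/2 (the form implied by the identification of a(t) given later in this section), and the function names `recombinant_position` and `spso_step` are hypothetical.

```python
import random


def recombinant_position(p_left, p_right, rng=random):
    """Build r_i by taking, per dimension, the left or right neighbor's best."""
    r = []
    for pl_d, pr_d in zip(p_left, p_right):
        eta = rng.randint(0, 1)  # one 1-bit draw per dimension, used for both terms
        r.append(eta * pl_d + (1 - eta) * pr_d)
    return r


def spso_step(x, v, p_i, p_n, w, phi, rng=random):
    """One inertia-weight SPSO update of a single particle (assumed form)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        u1, u2 = rng.random(), rng.random()  # fresh uniform draws per dimension
        vd = (w * v[d]
              + 0.5 * phi * u1 * (p_i[d] - x[d])
              + 0.5 * phi * u2 * (p_n[d] - x[d]))
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

Because each component of the recombinant position is copied from one of the two neighbors, the result always lies on a corner of the box spanned by the two personal bests.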
The authors of [1], in a search for a very efficient implementation, argued for the removal of the random numbers $u_{1,2}$ from (1) and for the parameter settings $\phi = 2$ and $w = 0.5$. The velocity update for the original form of PSO-DR is
$$v_{id}^{t+1} = w\, v_{id}^{t} + \frac{\phi}{2} \left(r_{id} - x_{id}^{t}\right) + \frac{\phi}{2} \left(p_{nd} - x_{id}^{t}\right). \tag{4}$$
The choice of $\phi$ was based on the observation that $\phi \approx 4.0$ in standard PSO, but, since $u_{1,2}$ are uniform in $[0, 1]$, the expectation value of $\phi u_{1,2}$ is 2.0. Furthermore, the multiplication by $w = 0.5$ can be implemented in hardware by a right shift operation. While optimal efficiency is desirable for hardware implementations, this issue does not concern us to the same degree in this study of (4), and it is one aim of this paper to study PSO-DR for arbitrary parameter values.
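To make the hardware argument concrete, the sketch below applies the original PSO-DR settings (φ = 2, w = 0.5) using only integer addition, subtraction, a right shift, and one random bit per dimension. The integer (fixed-point) coordinate representation and the name `psodr1_step_int` are assumptions made for illustration; they are not taken from [1].

```python
import random


def psodr1_step_int(x, v, p_left, p_right, p_n, rng=random):
    """One PSO-DR model 1 update on integer coordinates with phi = 2, w = 0.5.

    With phi = 2 each attractor term has unit coefficient, so no
    multiplication is needed; w = 0.5 becomes a right shift by one bit.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        # 1-bit random choice between the left and right neighbor's best.
        r_d = p_left[d] if rng.getrandbits(1) else p_right[d]
        # Python's >> on negative integers is an arithmetic shift (rounds down).
        vd = (v[d] >> 1) + (r_d - x[d]) + (p_n[d] - x[d])
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

For instance, a particle at 4 with velocity 6, both neighbors' bests at 10, and a neighborhood best at 8 receives a velocity of 3 + 6 + 4 = 13 and moves to 17.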
Although (4) contains a random element in the recombinant position, the acceleration parameters are constant. In other words, the update rule has additive rather than multiplicative stochasticity [2]. This has two ramifications: first, a stability condition can be computed based on the theory of second-order, fixed-parameter difference equations, and second, recombinant PSO is predicted not to exhibit particle velocity bursts. The details of these results are to be found in [2]. The stability condition is
$$|w| < 1, \qquad 0 < \phi < 2(1 + w). \tag{5}$$
It is known that PSO at stagnation, that is, when no improvements to personal bests are occurring and the particles effectively decouple, exhibits bursts of outliers [4]. These are temporary excursions of the particle to large distances from the attractors. A burst will typically grow to a maximum and then return through a number of damped oscillations to the region of the attractors. The origin of bursts, and of the concomitant fattening of the tails of the position distribution at stagnation, can be traced to the second-order stochastic difference equation
$$x(t+1) + a(t)\, x(t) + b\, x(t-1) = c(t), \tag{6}$$
which is equivalent to SPSO with the identification $a(t) = (\phi/2)(u_1 + u_2) - w - 1$, $b = w$, and $c(t) = (\phi/2)(u_1 p_1 + u_2 p_2)$ for fixed attractors $p_{1,2}$. Since $\max(|a|) > 1$, amplification of $x(t)$ can occur through repeated multiplication of $x(t)$ by $a$, despite the second-order reduction through multiplication by the constant $b$. Interestingly, the distribution tail of $|x|$, by virtue of the bursts that become increasingly less probable with increasing size, is fattened compared to an exponential falloff as provided by, for example, a Gaussian. A theoretical justification of these power laws and some empirical tests can be found in [2]. PSO bursts differ from the random outliers generated by PSO models which replace velocity by sampling from a fat-tailed distribution, such as that of Richer and Blackwell [5]. In contradistinction to the outliers of these "bare bones" formulations [6], the outliers from bursts occur in sequence, and they are one dimensional. Bursting will therefore produce periods of rectilinear motion where the particle has a large velocity parallel to a coordinate axis. Furthermore, large bursts may take the particle outside the search space. Although this will not incur any penalty in lost function evaluations if particles that exit the feasible bounds of the problem are not evaluated, as is the common approach to this situation, such particles do not contribute to the search while outside the search space. PSO-DR, which is predicted not to have bursts [2], therefore provides a salient comparison.
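The link between the update rules and the second-order difference equation can be checked with a short derivation. The steps below are a sketch added for the reader and are not reproduced from [2]; they assume one dimension, fixed attractors $p_1$ and $p_2$, and the inertia-weight update with a coefficient of $\phi/2$ on each attractor term. The stability bounds follow from the standard requirement that both roots of the characteristic polynomial lie inside the unit circle.

```latex
% Substitute v(t) = x(t) - x(t-1) into the velocity and position updates.
\begin{align*}
x(t+1) &= x(t) + w\,[x(t) - x(t-1)]
          + \tfrac{\phi}{2} u_1 [p_1 - x(t)]
          + \tfrac{\phi}{2} u_2 [p_2 - x(t)] \\
\Rightarrow\quad
& x(t+1)
  + \underbrace{\left[\tfrac{\phi}{2}(u_1 + u_2) - w - 1\right]}_{a(t)} x(t)
  + \underbrace{w}_{b}\, x(t-1)
  = \underbrace{\tfrac{\phi}{2}(u_1 p_1 + u_2 p_2)}_{c(t)}
\end{align*}
% PSO-DR: u_1 = u_2 = 1, so a = \phi - w - 1 is constant.
% Roots of \lambda^2 + a\lambda + b = 0 lie inside the unit circle
% iff |b| < 1 and |a| < 1 + b:
\begin{align*}
|w| < 1, \qquad |\phi - w - 1| < 1 + w
\;\iff\; 0 < \phi < 2(1 + w).
\end{align*}
```

With the original PSO-DR settings $\phi = 2$ and $w = 0.5$, the bounds read $0 < 2 < 3$, so the deterministic part of the update is stable.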

SIMPLIFYING RECOMBINANT PSO
This section details the two new recombinant models proposed in this paper. To begin, an investigation into PSO-DR reveals more interesting properties of the formulation. Performance plots for a sweep through parameter space to find an optimal balance between the inertia weight coefficient $w$ and the $\phi$ coefficients show that while the optimal region is spread across the parameter space, it also intersects the axis for the $w$ term (see Figure 1 for results on selected functions from Table 1). This demonstrates that the system is able to obtain good results even at $w = 0.0$, where there is no inertia term in the velocity update equations.
Model 2 PSO-DR sets $w = 0$, with a velocity update
$$v_{id}^{t+1} = \frac{\phi}{2} \left(r_{id} - x_{id}^{t}\right) + \frac{\phi}{2} \left(p_{nd} - x_{id}^{t}\right). \tag{7}$$
Velocity now serves as a dummy variable in the update equations (1) and (2), and model 2 can be represented as a single, velocity-free rule
$$\text{DR2}: \quad x_{id}^{t+1} = x_{id}^{t} + \frac{\phi}{2} \left(r_{id} - x_{id}^{t}\right) + \frac{\phi}{2} \left(p_{nd} - x_{id}^{t}\right). \tag{8}$$
At this point, the two $\phi$ terms were detached and another sweep through parameter space was performed to find an optimal combination of the recombinant component, via its coefficient $\phi_1$, and the neighborhood best component, via its coefficient $\phi_2$. Surprisingly, results again showed that the optimal region intersects an axis, this time for the neighborhood term $(p_{nd} - x_{id}^{t})$ (see Figure 2 for selected results). This allows a further simplification to the update equation (4), down to PSO-DR model 3:
$$\text{DR3}: \quad x_{id}^{t+1} = x_{id}^{t} + \phi \left(r_{id} - x_{id}^{t}\right), \tag{9}$$
which is clearly a substantial reduction of the original PSO-DR equation. This PSO variant, if it proves to be viable, would raise a couple of interesting questions. First, to what extent is velocity a necessary component, or is it a relic of the biological origins of PSO [6]? Second, how important is the neighborhood component drawn from the single best neighbor? The optimization process of model 3 is entirely driven by the recombinant component; this idea is reminiscent of fully informed particle swarms (FIPS) [7], where the entire neighborhood influences particle behavior. However, whereas FIPS allows every neighbor to influence a particle's behavior in every dimension, model 3 allows only a single randomly chosen neighbor to fully influence the particle in each dimension. This gives the particle an updated position that is a combination of the best positions of all of its neighbors throughout all dimensions.
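The whole of model 3 fits in a short loop, which may help show how little machinery remains. The following is a minimal sketch under several assumptions not stated in the text: personal bests are updated synchronously (all particles read a snapshot from the previous iteration), initialization is uniform over a hypothetical range, no boundary handling is applied, and the names `psodr3` and `sphere` are illustrative. The default `phi=1.2` is the value reported for model 3 in the experiments.

```python
import random


def sphere(x):
    """Simple unimodal test function with its minimum of 0 at the origin."""
    return sum(xd * xd for xd in x)


def psodr3(f, dim, n_particles=50, phi=1.2, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Velocity-free recombinant PSO (model 3) on a ring topology."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pfit = [f(x) for x in xs]
    for _ in range(iters):
        # Assumption: synchronous update, neighbors are read from a snapshot.
        snapshot = [p[:] for p in pbest]
        for i in range(n_particles):
            left = snapshot[(i - 1) % n_particles]
            right = snapshot[(i + 1) % n_particles]
            for d in range(dim):
                # Recombinant component: one random bit picks the neighbor.
                r_d = left[d] if rng.getrandbits(1) else right[d]
                xs[i][d] += phi * (r_d - xs[i][d])
            fit = f(xs[i])
            if fit < pfit[i]:
                pfit[i] = fit
                pbest[i] = xs[i][:]
    best = min(range(n_particles), key=pfit.__getitem__)
    return pbest[best], pfit[best]
```

A call such as `psodr3(sphere, dim=10)` returns the best position found and its error. Note that a particle's own personal best never enters its own update; it is stored only so that its ring neighbors can draw on it.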
The following section presents evidence that PSO-DR3 is a viable alternative to standard PSO by reporting on performance results for all three models of PSO-DR over a number of commonly used test functions.

PERFORMANCE EXPERIMENTS
Algorithms were tested over a series of 14 benchmark functions chosen for their variety, shown in Tables 1 and 2. Functions $f_1$-$f_3$ are unimodal functions with a single minimum, $f_4$-$f_9$ are complex high-dimensional multimodal problems, each containing many local minima and a single global optimum, and $f_{10}$-$f_{14}$ are lower-dimensional multimodal problems with few local minima and a single global optimum, apart from $f_{10}$, which is symmetric about the origin with two global optima.
Particles were initialized using the region scaling technique, where initialization takes place in an area of the search space known not to contain the global optimum [8]. To avoid initializing the entire swarm directly within a local minimum, as could be possible with $f_{12}$-$f_{14}$ if initialization takes place in the bottom quarter of the search space in each dimension (as is common), an area of initialization composed of the randomly chosen top or bottom quarter of each dimension was defined, into which all particles were placed with uniform distribution. This method ensures that the swarm will not be initialized within the same area for every optimization run, but will still be confined to an area at most $0.25^D$ of the search space, making initialization directly on or near the global optimum extremely unlikely. In instances where the global optimum was located at the center of the search space (i.e., $f_1$, $f_2$, $f_5$-$f_7$), the function was shifted by a random vector with maximum magnitude of a tenth of the size of the search space in each dimension for each run to remove any chance of a centrist bias [9].
This investigation tested PSO-DR model 1 using both global (as used in the originally proposed algorithm) and local ring topologies for selecting the neighborhood operator $p_n$. The parameter settings were Peña's, giving a velocity update with the form
$$v_{id}^{t+1} = 0.5\, v_{id}^{t} + \left(r_{id} - x_{id}^{t}\right) + \left(p_{nd} - x_{id}^{t}\right).$$
Results shown for PSO-DR model 2 use the value $\phi \approx 1.6$, while those for PSO-DR model 3 use $\phi \approx 1.2$. These values were empirically determined to be optimal for these algorithms; an analytical determination is the subject of current research. Results for both models 2 and 3 are shown for runs using a ring topology, which showed superior performance in testing.
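The initialization scheme described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the helper names `init_region` and `init_swarm` are hypothetical, and the sketch assumes the top-or-bottom choice is made once per run and per dimension and then shared by every particle.

```python
import random


def init_region(bounds, rng=random):
    """Pick, per dimension, the top or bottom quarter of [lo, hi]."""
    region = []
    for lo, hi in bounds:
        quarter = (hi - lo) / 4.0
        if rng.getrandbits(1):
            region.append((hi - quarter, hi))   # top quarter
        else:
            region.append((lo, lo + quarter))   # bottom quarter
    return region


def init_swarm(bounds, n_particles, rng=random):
    """Place all particles uniformly inside one randomly chosen sub-region."""
    region = init_region(bounds, rng)  # chosen once per run, shared by the swarm
    return [[rng.uniform(lo, hi) for lo, hi in region]
            for _ in range(n_particles)]
```

Since each dimension contributes a factor of one quarter, the sub-region covers a fraction $0.25^D$ of the search space, matching the bound quoted in the text.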
For comparison, results are presented for a standard PSO algorithm (SPSO), which operates using the constricted velocity update equation with $\phi = 4.1$, $\chi = 0.72984$ and with 50 particles [3]. All PSO-DR model tests were carried out using 50 particles as well.
Algorithm performance was measured as the minimum error $|f(x) - f(x^*)|$ found over the trial, where $f(x^*)$ is the fitness at the global optimum for the problem. Results were averaged over 30 independent trials, and are displayed, with standard error, in Table 3. Values less than $10^{-15}$ have been rounded to 0.0. Performance results in Table 3 for all models of PSO-DR versus SPSO clearly indicate that PSO-DR is a competitive variant, especially on highly complex problems such as $f_5$ (Rastrigin). Statistical tests were performed on these results to determine the significance of the performance differences between the algorithms. Results for these statistical tests on PSO-DR model 3 and SPSO are shown in Table 4 and confirm that, for PSO-DR model 3 versus SPSO with ring topology, the performance is significantly improved on 3 of the 14 tested functions, equivalent for 10 functions, and worsened for 1 function. Perhaps the most impressive improvement comes for $f_5$ (Rastrigin), a notoriously difficult multimodal problem on which PSO algorithms perform poorly in high dimensionality.
Due to the high number of function evaluations that were performed to obtain these results relative to previous work (where only 30k to 60k function evaluations might be performed), selected convergence plots are shown in Figure 3. These show that the standard PSO obtains superior results at the very start of the optimization process, up to 5000 function evaluations for the highest observed value (Figure 3(b)). After the point at which this occurs, PSO-DR model 3 surpasses the standard algorithm in performance, and maintains this advantage to the end of the 300k function evaluations on 7 of the 14 tested problems ($f_4$-$f_6$, $f_8$, $f_{12}$-$f_{14}$). On problems for which both algorithms attained equal error levels of 0.0 ($f_1$, $f_7$, $f_9$-$f_{11}$), the point at which this occurs, that is, when SPSO "catches up" to PSO-DR model 3, can be observed in Table 3. On average, SPSO took 25% more function evaluations to attain the optimum than PSO-DR model 3 on these problems. Finally, for the two problems on which SPSO outperformed PSO-DR model 3 ($f_2$, $f_3$), the same early behavior is seen, with PSO-DR model 3 surpassing SPSO early in the optimization process; in these cases, SPSO eventually overtakes PSO-DR model 3 again by 50k function evaluations.
A potential explanation for this behavior lies in the diversity of the swarms at this point in the optimization process. Figure 4 shows the mean Euclidean distance between particles for the corresponding convergence plots of Figure 3. It should be noted that uniform initialization was used in the trials used to generate these plots; relative performance between the algorithms was unaffected, and initializing particle positions uniformly throughout the search space removes an unrelated phenomenon in subspace initialization wherein the swarm expands greatly beyond the relatively small initialization region at the start of the optimization process to explore the search space. Expansion is common in the first few iterations using uniform initialization as well, but this is inherent to the swarm behavior and influenced only by the size of the entire search space.
As can be seen in the plots of Figure 4, neither swarm type begins converging immediately following initialization; rather, both maintain their diversity or expand slightly. On a comparative basis, the standard PSO swarm expands substantially more than the PSO-DR model 3 swarm; for example, Figure 4(c) shows that after the first 100 function evaluations, the mean distance between particles in the standard PSO swarm increases from 23 to 31.5, while the PSO-DR swarm diversity increases only from 23 to 24.5. Similar disparities were observed for all other tested problems. It is reasonable to gather from these results that the higher swarm diversity for the standard PSO algorithm early in the optimization process reflects a wider dispersion of particles, and hence an improved probability of finding and starting to explore the basin of attraction for global or good local optima. PSO-DR model 3 expands very little, if at all, early in the optimization process, resulting in delayed acquisition of optimal regions of the search space.
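The diversity measure plotted in Figure 4 is described only as the mean Euclidean distance between particles. A minimal sketch of one natural reading, averaging over all unordered pairs of particles, is given below; the pairwise averaging and the function name `mean_pairwise_distance` are assumptions.

```python
import math
from itertools import combinations


def mean_pairwise_distance(positions):
    """Mean Euclidean distance over all unordered pairs of particle positions."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0  # fewer than two particles: no distance to average
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

Evaluating this once per iteration and plotting it against the number of function evaluations would give a curve of the kind discussed above. The cost is quadratic in swarm size, which is negligible for 50 particles.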

EXAMINATION OF BURSTING
Bursts in the velocities of particles are commonly observed using the standard PSO algorithm. These are generated by means of the multiplicative stochasticity of the algorithm [2]. In order to investigate bursting behavior in PSO-DR and SPSO, an empirical measure was devised.
This bursting measure was implemented to highlight when a particle had a velocity in a single dimension that was considerably higher than the next highest dimensional velocity. Bursting patterns of behavior were detected by reporting every time particle velocity in a single dimension was a set amount $\lambda$ times higher than velocity in the next highest dimension. Bursting behavior is demonstrated in Figure 6, where the velocity of a single particle in a 10-dimensional problem is shown. On the plot of the multidimensional velocity of the SPSO particle, it can be seen that velocity in a single dimension increases suddenly and dramatically while remaining relatively level and low in all other dimensions. This is an example of a velocity burst. While the figure shows velocity for a single particle on a single run, examination of velocity plots for hundreds of particles over dozens of runs confirmed this to be representative of general particle behavior.
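The burst measure can be expressed compactly. The sketch below is one possible reading of the description above: it compares velocity magnitudes (absolute values), uses a strict inequality, and reports the index of the bursting dimension. Those three choices, and the name `burst_dimension`, are assumptions rather than details given in the text.

```python
def burst_dimension(velocity, lam):
    """Return the index of the bursting dimension, or None if there is no burst.

    A burst is flagged when the largest velocity magnitude exceeds lam times
    the second-largest magnitude.
    """
    if len(velocity) < 2:
        return None
    # Sort (magnitude, index) pairs so the two largest magnitudes come first.
    speeds = sorted(((abs(vd), d) for d, vd in enumerate(velocity)), reverse=True)
    (top, top_dim), (second, _) = speeds[0], speeds[1]
    if top > lam * second:
        return top_dim
    return None
```

For example, with `lam=100` the velocity `[0.1, 50.0, -0.2]` is flagged as a burst in dimension 1, whereas `[1.0, 50.0, -0.2]` is not, because 50 does not exceed 100 times the second-largest magnitude.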
Velocity for a PSO-DR particle is also shown in Figure 6, and demonstrates the absence of bursts. As with the SPSO plot, examination of a large number of plots confirmed this to be representative of general behavior for PSO-DR.
Examination of these empirical analyses shows that PSO-DR clearly does not exhibit bursting behavior on the scale of SPSO while demonstrating equal or superior performance on 13 of the 14 benchmark functions, leading to the hypothesis that bursts are not, in fact, integral to the successful operation of particle swarm algorithms. The fact that a very few bursts do occur with PSO-DR indicates that bursting is a highly improbable feature of DR dynamics.
Analysis performed on statistics of several functions shows that particle updates involving bursts are far less effective than more common nonbursting updates. For example, results showed that for SPSO on $f_5$ with $\lambda = 100$, on average 20.1% of all particle updates involve an improvement to the particle's best found position $p_i$, whereas only 1.8% of updates involving bursts result in an improvement to $p_i$. Likewise, on average 0.9% of all particle updates improve the best found swarm position $g$, as opposed to only 0.01% for bursting particles. Burst frequencies for values of $\lambda$ from 10 to 150 are shown in Figure 5.
It is also interesting to note that far fewer total updates result in an improved $p_i$ or $g$ for PSO-DR when compared to SPSO. For example, results showed that 20.1% of all updates improve $p_i$ for SPSO compared with 0.64% for PSO-DR, and 0.91% improve $g$ for SPSO compared with 0.02% for PSO-DR, on $f_5$ for $\lambda = 100$.

CONCLUSIONS
Simplification of the standard PSO algorithm is an important step toward understanding how and why it is an effective optimizer. By removing components of the algorithm and seeing how this affects performance, we are granted insight into what those components contribute to overall particle and swarm behaviors.
In particular, this paper has proposed a very simple PSO,
$$\text{DR3}: \quad x_{id}^{t+1} = x_{id}^{t} + \phi \left(r_{id} - x_{id}^{t}\right), \tag{13}$$
which offers competitive performance to standard PSO, but removes multiplicative randomness, inertia, and the personal memory term $p_i$ from the position update.
There is still much to be done before questions concerning PSO behavior can be completely answered, and it is expected that the next decade of PSO research will be focused on understanding the basic algorithm that powers both the standard implementation and its variants.
In that light, the PSO-DR variant is important not only because of its improved performance on several benchmark functions, but also because its simplified state allows us to examine what happens to the standard algorithm when pieces are modified or removed. Based on the results presented here, it can be argued that large bursts are not generally beneficial or integral to PSO performance, and may possibly be detrimental. Although the presence of particle outliers is demonstrably important for swarm optimization (as shown in bare bones analysis [6]), bursts, which are sequences of extreme particle positions occurring along an axis and reaching outside the search space, remain a special feature of velocity-based swarms. This work, which compares standard PSO to a burst-free but comparable optimizer, suggests that bursts are disadvantageous in general. (However, should the objective function happen to have a rectangular symmetry aligned with the axes, bursting may actually be fortuitous.) Further, the replacement of the direct personal influence operator $p_i$ from SPSO with the recombinant term $r_i$ derived from its neighborhood in PSO-DR strengthens the case for PSO being mostly reliant on social interaction as opposed to personal experience. This is further supported by the effectiveness of PSO-DR model 3, which lacks a cognitive term altogether. The social behavior occurring inside a swarm is still a wide-open area in the field, and will hopefully constitute a great deal of the future research devoted to the development of a better understanding of this deceptively simple optimizer.
Another property of PSO-DR resides in the attractor jiggling that takes place even at stagnation (no updates to any $p_i$), since $r_i$ is never fixed. This jiggling will work against convergence and could propel the swarm onwards. This, and other matters concerning the nature of recombination within PSO, will be the subject of further study.

Figure 1: Optimal regions for $w$ versus $\phi$ in PSO-DR model 1, found empirically through 30 runs for each combination of parameters $w = 0.0, \ldots, 1.0$, $\phi = 0.0, \ldots, 5.0$ at a granularity of 0.1. Each contour line represents a 10% improvement in performance, with the region within the innermost line representing the best performing 10% of possible combinations of $w$ and $\phi$.

Figure 3: Convergence plots for SPSO and PSO-DR model 3 early in the optimization process.

Figure 4: Diversity plots for SPSO and PSO-DR model 3 early in the optimization process.

Figure 5: Frequency of updates showing burst behavior for values of $\lambda$.

Table 3: Mean error after 30 trials of 300 000 evaluations. Necessary function evaluations are shown where 0.0 error was attained.

Table 4: Significance for SPSO versus PSO-DR model 3 with ring topologies.