Forma Analysis of Particle Swarm Optimisation for Permutation Problems

Particle swarm optimisation (PSO) is an innovative and competitive optimisation technique for numerical optimisation with real-parameter representation. In this paper, we examine the working mechanism of PSO in a principled manner with forma analysis and investigate the applicability of PSO on the permutation problem domain. Particularly, our derived PSO schemes are empirically studied based on the quadratic assignment problem (QAP) benchmarks to justify its comparable performance, which in turn implies the benefits of our approach in applying PSO to the discrete problem domain.


INTRODUCTION
PSO was originally designed as a numerical optimisation technique based on swarm intelligence.In the literature, there are a few attempts to exploit its usage in the discrete problem domain [1][2][3][4], which are mostly performed using a binary encoding.However, research on the transformation of the working mechanism of PSO to the permutation problem domain, where the representations are highly constrained, has been relatively limited [3,5,6].This limitation is mainly caused by the lack of a principled generalisation of PSO to guide its adaptation to discrete combinatorial problems [3].
In this paper, we aim to design PSO operators for permutation problems without losing the underlining principles of the original PSO.A PSO operator template will be formally defined with forma analysis in form of equivalence relations.Following the introduction of formal descriptions of permutations, concrete PSO operators are then formally derived based on the PSO operator template.Empirical study of the derived PSO schemes is also carried out based on the QAP benchmarks.

PARTICLE SWARM OPTIMISATION
PSO was initially introduced by Kennedy and Eberhart [7] as a function optimisation technique inspired by the behaviors of fish schooling and bird flocking.The PSO approach simulates the social behavior of particles/agents moving in a multidimensional search space, while each particle has its position and velocity.Each particle is treated as a potential solution to the optimisation problem.Usually, the position is represented as a vector: (1), x it (2), . . ., x it (D) , ( 1 ) while the velocity is represented as another vector: where i represents the index of the particle, t is the time step, and D is the dimensionality of the search space.
For each generation, the particle compares its current position with the goal (global best/personal best) position, adjusting its velocity accordingly towards the goal with the help of the explicit memory of the best position ever found both globally and individually.
The most popular formulation of how particle adjusts its velocity and position [8,9] is shown in In the above formulation, d is the index of dimension in the search space, w represents the inertia weight of the "flying" dynamic, which describes the degree of maintaining its previous velocity vector against the attraction point.c 1 and c 2 are regarded as cognitive and social parameters for the algorithm, respectively, while r 1 and r 2 are random numbers within the interval [0, 1].Pb i is the personal best position which is recorded by particle i, while Gb is the global best position obtained by any particle in the population.
Adapting standard PSO to permutation problems has been a rather interesting task, as researchers are curious about its performance in the discrete domain.In this paper, we suggest that forma analysis gives a possible solution to achieve such task in a principled manner.

FORMA ANALYSIS
Forma analysis [10] is a formal but practical method that allows the problem representation and its operators to be structured in a formal manner by using equivalence relations.Each equivalence relation Ψ divides the search space into disjoint equivalence classes Ξ ψ (depending on which value the solutions match), with individual equivalence classes being denoted by ξ, which gathers solutions that are equivalent under a certain equivalence relation.
The initial aim of forma analysis [10] was to codify knowledge of the problem domain using a set of equivalence classes (or formae) which is assumed to be able to cluster solutions with related performance in order to guide the search process more effectively, for example, edges if we are considering the travelling salesman problem.Since equivalence relations/classes have the ability to capture the properties of solutions, concrete operators can thus be mathematically derived with regards to how these equivalence relations are formally manipulated by the operator templates.
Figure 1 illustrates briefly the approach we adopt here in this paper, which can be explained as follows.Given an operator template, any suitable description of the considered problem domain gives rise to a concrete operator, whose behavior and performance are related to the assumption adopted to describe the search space.This approach effectively allows PSO to be adapted to arbitrary problem domain with the underlying working mechanism retained.Taking a step further, this approach is applicable across different problem domains and different optimisation techniques.Some of the characteristics and operator templates related to forma analysis [10][11][12] are given below to facilitate our generalisation of PSO.

Describing the search space
The key concept is that of a basis: a set of equivalence relations that allows us to describe the search space S. Definition 1.A subset Ψ of a set of equivalence relations is termed as a basis for the set of equivalence relations, if Ψ spans the set and Ψ is independent.
An encoding can thus be derived by taking the image of the basis equivalence classes corresponding to a particular solution in the search space.

Domain independent operator templates
Forma analysis can derive operators that explicitly manipulate the given equivalence relations.This is achieved by combining the basis with domain independent operators for specifying operator behavior in terms of basis.
One such (domain independent) operator template, which is related to the work presented in this paper, corresponds to the (strict) k-change operator template [12], formally as where x and y represent the solution to be perturbed and the produced child, respectively, while dis Ψ (x, y) represents the forma distance [13] between x and y under basis Ψ.This operator template can be interpreted as changing the values for k equivalence relations in x produces solution y.
The other operator template, random transmitting recombination (RTR) [10], is defined to select a child solution z out of the dynastic potential of the parent solutions x and y.RTR(x, y, Ψ) can be formally defined as where the actual child solution z is chosen from the set above uniformly at random.The underlying idea of RTR is to randomly generate child solution using parental equivalence classes (genetic material).

Applicability
Although the above concepts of forma analysis are developed under genetic algorithms, it has been shown that the forma analysis methodology itself is generalisable to other evolutionary optimisers based on (theoretically) any problem domain from the knowledge-based system (KBS) design standpoint [12].Also, the underlying idea proposed by Surry [11] on defining formal representation in order to derive domain specific operators has also been justified to be rather successful.Furthermore, our previous work on adapting PSO to the binary search space following the forma analysis approach has provided some preliminary positive results and expectations.In summary, the proposed forma analysisbased operator design approach is applicable in the context of PSO and permutation problem domain, considering the previous theoretical and practical evidence.

FORMAL DESCRIPTIONS OF PERMUTATION
As previously studied in [12,14], there are mainly three different representations for permutation that we can adopt to describe the permutation problems: (1) position-based representation, which decides the absolute position of an item (e.g., item 5 is at position 4), (2) precedence-based representation, which decides whether one task is performed before another (e.g., task 5 appears before task 4), (3) adjacency-based representation, which decides whether two items are next to each other (e.g., item 4 is next to item 5).
In the following sections, formal descriptions for permutation with a set of n elements N = {e 1 , . . ., e n } will be reviewed in form of equivalence relations, followed by some formal definitions of their induced constraints and their distance measurements [12,15].

Position-based description of permutation
For position-based description, each position in the permutation i (i = 1, . . ., n) is defined as an equivalence relation ψ pos(i) to form the basis of position equivalence relations such that The equivalence classes (formae) for each position i are the set of n elements: As an example, the permutation (1, 4, 3, 2) can be described by the set of equivalence classes In addition, an induced feasibility constraint for this description C pos needs to be added to ensure that different elements do not occupy the same position, and no two different positions can take the same element, formally as follows.
Definition 2. Given any two equivalence relations ψ pos(i) and ψ pos( j) for a permutation, the position-based constraint C pos can be defined as ∀i, j (i / = j) : ψ pos(i) / = ψ pos( j) .( A direct implication of this constraint is that the 1change neighborhood structure should be prevented as this would involve placing two elements in the same position. The distance metric for this formal description is simply the number of positions in the permutation that have different elements (i.e., the hamming distance) according to the definition of forma distance [13].For example, the distance between (1,4,3,2) and (1, 4, 2, 3) is 2, since they are different in two positions.

Precedence-based description of permutation
For precedence-based description, a set of basis precedence equivalence relations Ψ prec between any two different elements in the permutation will be considered formally as However, by considering the fact (constraint) that ψ prec(ei,ej ) and ψ prec(ej ,ei) are reverse relations: we can remove unnecessary relations by enforcing a sequence (e.g., e 1 < e 2 < • • • < e n ) in the definition of the relation, such that Obviously, the equivalence classes are simply true/false for whether element e i precedes element e j in the permutation formally as Ξ ψprec(e i ,e j ) = ξ 0 prec(ei,ej ) , ξ 1 prec(ei,ej ) .(15) In addition, the feasibility constraint C prec needs to be added that a valid permutation exists if and only if the relationship between the precedences is consistent (in that the transitivity condition is preserved), as shown below.
Definition 3. Given any two equivalence relations, ψ prec(ei,ej ) and ψ prec(ej ,ek) , for a permutation, the precedence-based constraint C prec can be defined as ∀e i , e j , e k ∈ N e i / = e j / = e k : ψ prec(ei,ej ) ∧ ψ prec(ej ,ek) The distance metric can be specified as the number of different precedence relations between two solutions.For example, the distance between permutation (1, 2, 3, 4) and (1, 4, 2, 3) can be obtained by comparing the following two sets: (2,4) , ξ 1 prec (3,4) , In practice, bubble sort will be sufficient to calculate the distance from one permutation towards another.This can be achieved by sorting one permutation towards another, when setting one as the initial permutation and the other as the goal permutation which represents the right "order."If we are looking at the previous example with two permutations, to calculate the distance from (1, 2, 3, 4) (i.e., the initial permutation) to (1, 4, 2, 3) (i.e., the goal permutation), the "priority" information can be obtained from (1, 4, 2, 3): Thus, the initial permutation can be represented (or more closely, implemented) in terms of (e i , order(e i )): (1, 1), (2, 3), (3,4), (4, 2) , (19) and it should be sorted according to the values of order(e i ) in order to "transform" to the goal permutation with a certain number of basic swap adjacency mutations.

Adjacency-based description of permutation
For adjacency-based description, a set of basis adjacency equivalence relations Ψ adj is considered for any two elements to decide whether they are adjacent formally as Due to the fact that ψ adj(ei,ej ) and ψ adj(ej ,ei) are equivalent relations for symmetric problems (Surry presented an extensive study about directed edge representation for permutation in his work [11]): ∀e i , e j ∈ N e i / = e j : ψ adj(ei,ej ) ⇐⇒ ψ adj(ej ,ei) , (21) we can remove redundant relations by enforcing a sequence (e.g., e 1 < e 2 < • • • < e n ) in the definition of the relation, such that As undirected edges are considered for adjacency-based description, the equivalence classes are simply true/false for whether there exists an edge between element e i and element e j formally as Ξ ψadj(e i ,e j ) = ξ 0 adj(ei,ej ) , ξ 1 adj(ei,ej ) .(23) In this case, ξ 1 adj(ei,ej ) represents a positive edge so that edge (e i , e j ) must exists in the solution (e.g., a tour), while ξ 0 adj(ei,ej ) stands for a negative edge so that edge (e i , e j ) cannot be included into the solution.
In addition, the feasibility constraint C adj needs to be added so that each vertex of the undirected graph corresponding to the permutation can only participate in two edges and still be a valid permutation as follows.
Definition 4. Given an equivalence relation ψ adj(ei,ej ) for a permutation, the adjacency-based constraint C adj can be defined as ∀e i , e j , e k , e l ∈ N : ψ adj(ei,ej ) ∧ ψ adj(ei,ek) =⇒ ¬ψ adj(ei,el) .(24) The distance between any two solutions in the search space under adjacency basis is thus calculated as the number of different edges that they possess.For instance, the distance between permutation (1, 2, 3, 4) and (1, 2, 4, 3) can be obtained by calculating the number of different adjacency relations between the following two sets: However, on the "phenotypical" level this forma distance reduces to the number of different positive edges (n minus common positive edges), which should be 2 in this case.This is mainly because negative edges do not directly affect the quality of solution, although they implicitly affect the selection of positive edges through the feasibility constraints.

PSO OPERATOR TEMPLATE-A GENERALISATION
The generalisation of PSO is not as straightforward as some other optimisation techniques, such as the generalisation of Differential Evolution [16].This is mainly because, besides the generalisation of position in a multi-dimensional space, we also need to generalise the velocity of particle in a multidimensional space.
By observing the update equation (as shown in ( 4)), we can easily find out that in which case the velocity is effectively the step size to perturb one solution towards another (in other words, the distance between the two solutions), which is jointly decided by the inertia component w * v it (d), the bias towards its personal best , and the bias towards the global best

PSO operator template
By revealing the fact that velocity is the distance between the previous and current positions of the particle, we can define the operator template (under the basis Ψ) as follows.
Definition 5. Given a current position X t (if one temporarily ignores the index i of this particle in the population), its position for the next time step X t+1 can be produced according to (27): where dis Ψ (X t , X t−1 ) is used to formalise the previous velocity of the particle, while dis Ψ (X t+1 , X t ) represents the current velocity for perturbing the current solution X t .

Stochastic interpretation of accumulation
In the context of real-vectors, the accumulation of distances is straightforward to understand.However, for permutation problems the consideration of "directionality" becomes rather complex from the practical standpoint.By taking into account the fact that forma distance includes domainspecific distance magnitude and direction which cause certain difficulties in the context of permutation problems, a reasonable interpretation of the PSO operator template is required to facilitate the derivation of suitable PSO operators for permutation problems.From this perspective, the original PSO operator template (before interpretation), which abstracts how solutions are manipulated, can be regarded as an operator design guideline embedded with the PSO working mechanism.As a matter of fact, various interpretations and approximations have also been made in the previous work of forma analysis for the purpose of facilitating operator derivations [11,12] (such as the derivation of RTR operator template to blend crossover/line recombination crossover for continuous domain).
By understanding the fact that the perturbation of the current individual is jointly decided by three components (with their degrees of influence distributed proportionally), we can give a stochastic interpretation of the PSO operator template as follows.
The perturbation dis Ψ (X t+1 , X t ) can be interpreted separately in three cases: where b equals (w + c 1 r 1 + c 2 r 2 ).In this interpretation, the parameters w and c 1 /c 2 are used to represent the probabilities of the current particle's commitment to each of the components.In other words, the new velocity has a probability of (w/b) to stay the same as previous value, a probability of (c 1 r 1 /b) to converge towards its personal best record, and a probability of (c 2 r 2 /b) to converge towards the global best record.
The decomposition of the flying dynamics of a particle is illustrated in Figure 2. The current particle has three options: a random k-change according to the magnitude of its previous velocity, change towards its personal best record, or change towards the global best record.(Strictly according to the original PSO working mechanism, this should not be a random k-change.Instead it should be a directed perturbation following the previous direction.However, since this direction is difficult to "replicate" practically, we choose to simplify this to a random k-change with the magnitude maintained only.) However, the mixing effect of several distances with different directions is hard to represent in the context of permutation.Modelling the accumulation from a stochastic perspective helps us avoid this unnecessary complication.

Incorporation of direction-A crossover perspective
Given that we already have the mechanism to separate each distance component, the next question is how to incorporate direction to guide our PSO operator so that the particles can converge towards superior records.
As aforementioned, if a greedy component with superior record considered is selected to perturb the current particle, we directly have which means that the particle should jump to the attraction point (i.e., Pb/Gb) without any reservations.However, this is the situation we should try to avoid in practice, since the search may stagnate at a very early stage dominated by the initial superior component.Thus, we desire the current particle to converge somewhere between itself and the attraction point along the right direction.By taking a closer look at the guided PSO operator, we can actually find that the effect of perturbing one individual towards another is the same as making a crossover (e.g., RTR) between these two individuals where the direction can be naturally retained.Understanding the greedy components of PSO operator in terms of crossover makes the practical incorporation of direction in the operator much easier.

Understanding PSO operator template
From the above discussion, we can evolve a new interpretation of the PSO operator template.The new position X t+1 can be generated according to three cases separately: where for the first case a random k-change, with k = |dis Ψ (X t , X t−1 )|, is adopted to perturb the current particle, while for the second and the third cases a RTR operator template is used to perturb the current particle towards the attraction point (i.e., Pb/Gb).For the first case, only the magnitude of the previous velocity is considered to parameterise the random k-change.For the second and the third cases, the magnitude of velocity should be updated according to the difference between the new position and the previous position of the particle after the RTR "move."

Brief comparisons with geometric PSO
In this section, our forma analysis framework is compared with the geometric framework [3] which has also been proposed for generalising PSO.

Comparisons on the framework
On the framework level, both frameworks aim to generalise PSO based on underlying optimisation components (e.g., solution representation or distance notion) so that the abstraction of optimisers, either in the form of standard operator template RTR/RRR/RAR or in the form of line segment/ball [3], can be further generalised to arbitrary problem domain.In both cases, these abstractions need to be instantiated to concrete operators by embedding domain knowledge into the corresponding abstraction.In other words, these abstractions all carry some form of domain knowledge, either by using equivalence relations or by using distance notions, so that they can be applied to concrete problem instances.
The main difference on the design concept level lies in the choice of such abstractions and the "carrier" of domain knowledge.For our forma analysis approach, the solution representation is generalised with equivalence relations/classes so that formal representation can be defined in an unified manner, while operators that manipulate the solutions are abstracted as operator templates that process equivalence relations.In contrast, the geometric framework is more about generalising optimisers based on a notion of distance where different distance metrics give rise to different operators with regards to the predefined geometric operators using the notions of line segment and ball [3].Each framework looks at operator design through abstraction and formal description (of either solutions or distance) at different levels, with potentially equivalence relations lying at a slightly lower level than distance notion as distance notion is effectively a derivative of equivalence relations once forma distance [13,15] is defined under the forma analysis framework.

Comparisons on PSO generalisation
On the practical PSO generalisation level, both approaches are generally different in two places as well.
First of all, the concept of velocity has been removed from the geometric framework (thus, including the simplification of the concept of inertia as a component in geometric crossover), while a random mutation is added to the geometric PSO as a potential replacement for perturbation purposes.In our forma analysis approach, velocity has been interpreted and formulated as distance (more precisely forma distance) in the previous time step.However, velocity itself is a rather complicated concept to formulate as it involves the interpretation of both magnitude and direction which are hard to represent in the context of permutation problems.Certain simplifications and compromises have been made to maintain this concept for future research.
Secondly, the accumulation of greediness toward personal best and global best, balanced by previous velocity (or position), is interpreted differently as well.In geometric PSO, multiparental geometric crossover is used to linearly recombine these positions to produce the next position with different weights taken into account through the concept of product geometric crossover.In contrast, in our forma analysis approach, different convergence components are treated stochastically according to their different weights where higher weight represents higher probability of being treated as the convergence direction for the next time step (and vice versa).By looking at the full picture, different components (personal best, global best, or previous velocity) all share the probability of being selected to guide the next move.Standard RTR operator template is used to converge towards superior solutions with the direction of distance naturally maintained, while standard k-change operator template is used to represent the inertia component.

"BLENDED" PSO OPERATORS FOR PERMUTATION PROBLEMS
As mentioned earlier, the formal descriptions of permutation problems implicitly introduces some feasibility constraints to produce a valid solution.When we design PSO operators for permutation problems, these constraints must be satisfied (or handled properly) which is effectively a subproblem to solve.(Of course, these constraints only exist if we are only interested in searching feasible regions, while search techniques making use of infeasible regions are out of the scope in this discussion.)In our previous work [15], we have presented that the application of standard genetic operators for permutation problems can be viewed as a process of constraint satisfaction, so as to instantiate the declarative nature of forma analysis in a systematic manner and bring forma analysis to a more practical level.In this section, we will show how the PSO operators for permutation problems can be obtained procedurally based on the operator template from a constraint satisfaction problem (CSP) [17] solving perspective.
According to the aforementioned stochastic interpretation of PSO operator template, the outcome is effectively a blended operator with three different "phases": k-change according to k = |dis Ψ (X t , X t−1 )|, RTR (X t , Pb, Ψ), and RTR (X t , Gb, Ψ).The separate instantiations of k-change and RTR in the context of different formal descriptions of permutation should directly give rise to the instantiations of the corresponding blended PSO operators.

Position-based k-change: O k-pos
For the position-based description, dis Ψpos (X t+1 , X t ) can be obtained as the hamming distance between X t+1 and X t under basis Ψ pos .Thus, we only need to apply a distance of k based on X t to generate X t+1 such that dis Ψpos (X t+1 , X t ) = k.For example, given that X t represents a permutation (1, 2, 3, 4) such that what would be the potential candidates for the permutation X t+1 so that they have a distance of k (e.g., k = 2)?
The most straightforward thought would be to randomly select k equivalence relations and change the equivalence classes they fall into.However, the feasibility constraint C pos induced by position-based description must be satisfied to produce a valid permutation.In this case, the kchange operator for permutation following the positionbased description should involve a constraint satisfaction subproblem, where C pos must be satisfied to guarantee a valid permutation.
The CSP we consider here is defined as the k-change operator itself with C pos satisfied.Classical CSP techniques can be directly utilised to implement the operator.Now, we will follow the above example to illustrate how CSP can be effectively used in the position-based PSO operator design.
Given X t ξ 1 pos(1) , ξ 2 pos(2) , ξ 3 pos(3) , ξ 4 pos(4) , we can first uninstantiate k (e.g., k = 2) equivalence relations to produce a partial permutation for a potential distance of 2. This gives us C 2  4 options from which we can uniformly select one to serve as the base (partial permutation) of X t+1 (e.g., ξ − pos(1) , ξ 2 pos(2) , ξ − pos(3) , ξ 4 pos(4) ).Then, what we need to do is simply to reinstantiate these 2 equivalence relations to suitable classes such that C pos is satisfied.The first position of the partial solution (−, 2, −, 4) is considered first.The domain of the first position is {1, 3}, while the domain of the third position is {1, 3} as well.After an equivalence class (1 or 3) is chosen for the first position, the domain is reduced for the third position through constraint propagation.After instantiating all the possible solutions, X t+1 can be randomly selected among the whole set of feasible solutions to the CSP.
In fact, the working mechanism of O k-pos has a similar effect as the scramble mutation [18], which rearranges a certain number of positions.The only difference is that the selection of these positions should be purely random, other than inside a continuous block.Thus, O k-pos can be defined as the modified scramble mutation operator with k random positions, where k is decided by the magnitude of the current velocity |dis Ψpos (X t , X t−1 )|.

Precedence-based k-change: O k-prec
For the precedence-based description of permutation, the distance of two permutations is the number of different precedence relations dis Ψprec (P 1 , P 2 ).As aforementioned, the bubble sort algorithm is sufficient to calculate the precedence distance of two permutations, with the distance decided as the number of adjacency-swap to sort one permutation towards another.

Uninstantiate
Figure 3: Illustration of reinstantiation of a permutation based on precedence-based description.Symbol "(i, j, >)" means element i is before element j, while "(i, j, <)" means otherwise.The framed permutation has a "strict" distance of 3 to the initial permutation.
">" to represent the precedence such that symbol "(i, j, >)" means element i is before element j while "(i, j, <)" means otherwise.X t+1 can be randomly selected among the set of feasible solutions to the CSP.By observing the effect of changing a single precedence equivalence class, it is not difficult to find that a 1-change (minimal mutation) reduces to the adjacent swap mutation [18], which swaps two contiguous elements.Thus, O k-prec can be approximately regarded as equivalent to a k-iterated adjacent swap mutation with k decided by the magnitude of the current velocity |dis Ψprec (X t , X t−1 )|, which can be effectively calculated through bubble sort.

Adjacency-based k-change: O k-adj
Since each potential edge is defined as an equivalence relation for the adjacency-based description of permutation, the distance of two permutations corresponds to the "edgedifference" between them.
For the simplicity of illustration, we assume that k = 4, which should be applied to X t to generate X t+1 .As an example, given the base permutation (1, 2, 3, 4) being represented as we first uninstantiate 4 relations (randomly chosen from C 4 C 2 n options) to produce the partial permutation for X t+1 , for example: By solving the CSP to generate X t+1 such that C adj is satisfied, we can get a solution permutation (1, 3, 2, 4) with a distance of 4 to X t : The minimal mutation implied by the adjacency-based description is an edge 2-change mutation, because 1-change automatically violates the feasibility constraint C adj .(A standard edge2-change mutation involves the "replacement" of 2 edges which in turn results in the change of 4 adjacency equivalence relations, since there are also changes involving negative edges.However, from the implementation perspective only phenotypical (or positive) edges are considered.)This operator is equivalent to the edge-reverse mutation in the literature [18], which reverses the positions of a segment in the permutation in such a manner that the feasibility constraint C adj can be satisfied automatically.Due to the fact that any edge-reverse mutation involves the change of 2 edges, O k-adj can be generally approximated by a k * -iterated edge-reverse mutation with k * decided by k as follows: where any function that rounds k/2 to its nearest integer is sufficient.

Position-based RTR: RTR pos
For the position-based description Ψ pos , its feasibility constraint C pos largely reduces the number of feasible solutions to the basic CSP, where RTR is interpreted as recombination of parental equivalence classes, because not all combinations of them lead to valid permutations.By satisfying C pos while instantiating P c , we can obtained all potential candidates for RTR pos as a result of solving the CSP.For example, given P p1 = (1, 2, 3, 4) and P p2 = (4, 3, 2, 1), RTR pos can produce potential candidate(s) for P c by reinstantiating it (as shown in Figure 4) with corresponding equivalence classes, as shown in (37) (37) To transmit position features from parents to children and interpret the feasibility constraint C pos in a more natural way, we produce a fully-transmitting crossover for permutation, namely position transmitting crossover (PTX), by identifying the constraint satisfaction process of operator as a CSP.
In PTX, both C pos and transmitting C t have to be satisfied as an interpretation of its CSP.(Transmitting can be regarded as an additional constraint imposed by operator.)In this sense, constrained positions are clustered to function separately.For example, given two permutations 3, 6, 5, 4, 2, 7, 8, 1 , 3, 6, 2, 5, 4, 8, 7, 1 , (38) we are able to construct the constraint graph implied by both C pos and transmitting C t , as shown in Figure 5 with constrained positions linked together.
The construction of the constraint graph is straightforward to understand-a value that has been taken for one position must be forbidden (constrained) for another position.For example, for position 3 (P 3 in Figure 5) either while the uninstantiated relations can be reinstantiated randomly to produce the child permutation, by solving a CSP such that simply C prec should be satisfied.
In fact, strictly transmitting crossover is also possible for precedence-based description.Due to the fact that the reinstantiation process of precedence relations is equivalent to the topological sorting problems [20], where a partial order needs to be completed to a linear order (in a directed acyclic graph (DAG) based on precedence) with a complexity of O(edges + vertices), we argue that the reinstantiation of precedence relations in the considered CSP can be solved in a deterministic polynomial-time in terms of topological sorting problems.(The partial order is defined by the equivalence relations where the parents are equivalent-the order that must be enforced for the child.) In the literature, precedence preservative crossover (PPX) [18] was found to be strictly transmitting.The underlining principle in PPX is that the precedence equivalence classes of parents are passed to the child in such an order (from left to right or more specifically from the node with no incoming edges in the precedence graph) that both C t and C prec are satisfied automatically.Practically, this order is implemented using a "from" table which indicates the switching process between the two parents.
Many readers may find that this is rather similar to the most popular algorithm used for topological sorting where the order can be completed by starting from the node(s) with no incoming edges.Switching between two parents simply aims at recombining the precedence equivalence classes of the two parents.
It is also easy to find that the set of all possible solutions produced by PPX is in fact a subset of the set of solutions produced by the above CSP approach.In other words, for each of the solution produced by PPX, there is always a corresponding reinstantiation of the partial child permutation.

Adjacency-based RTR: RTR adj
Regarding the adjacency-based description of permutation which has been proved to be non g-separable [10], literature [11,12] pointed out that transmission cannot be achieved without sacrificing assortment.From the CSP viewpoint, this implies that C adj and C t all together may make the corresponding CSP not strictly solvable (if the original parental permutations are not allowed to be repeated as a child permutation).Through investigation, we can also find that it is the case when two parents are the same for some equivalence relations that makes the CSP NP-complete in "strict" sense.For those adjacency relations that two parents are different, no restriction will be applied to the child solution for that relation (as both {1, 0} are allowed).
PSO schemes on the quadratic assignment problem (QAP) benchmarks.However, due to the fact that for QAP the absolute positioning of element is more related to the quality of solution [14], it is estimated that the PSO scheme with position-based description (i.e., PSO pos ) should perform better than the other PSO schemes.
After a brief description of the problem formulation, we show both the experiment configurations and the experimental results, followed by a few discussions to help the understanding of the benefits of our approach.

Problem formulation of QAP
The quadratic assignment problem (QAP) is an important problem in both practice and theory.Many practical problems can be formulated as QAPs [23][24][25].The QAP can be described as the problem of assigning a set of facilities to a set of locations with the given distances between the locations and the given flows between the facilities.The goal is to place the facilities on locations such that the sum of the product between flows and distances is minimised.Formally, given n facilities and n locations, two n × n matrices A = [a i j ] and B = [b rs ], where a i j is the distance between locations i and j, and b rs is the flow between facilities r and s, the QAP can be formally defined as The QAP is a class of NP-hard optimisation problems.It is considered as one of the hardest optimisation problems as general instances of size larger than 20 cannot be solved to optimality.Therefore, to practically solve QAP with high quality solutions various heuristic algorithms have been proposed [26][27][28] trying to find high quality solution with limited computational resources.

Benchmarks and experimental settings
The benchmarks for this experimental study are acquired from QAP-LIB [29].It has been pointed out [30] that there are four types of QAP instances: unstructured, randomly generated instances, unstructured instances with griddistances, real-life instances, real-life like instances.We select 2 instances for each type of QAP as our benchmarks, with size no smaller than 20.For type 1, we choose Tai20a and Tai40a.For type 2, we choose Nug20 and Sko56.For type 3, we choose Bur26a and Ste36a.For type 4, we choose Tai20b and Tai40b.
For each of the instances, fine tuning is carried out for each of the algorithms to reach its best performance among different combinations of parameter settings with equal number of generations.The parameter settings with the best performance over 20 independent runs for each algorithm will be used to get the execution results.(The performance is evaluated by considering both its average best solution found and its average number of generations to reach its best solution.The number of generation to reach its best solution is only considered when two parameter settings have the same average best solution.) The original free parameters for each instances are W and C, where W represents the parameter to control the inertia weight and C represents the value taken by c 1 and c 2 (as in literature c 1 and c 2 often adopt the same value).By taking a closer look, it is not difficult to realise that in our PSO schemes for QAP, the probability of taking each branch can be simplified to The above simplification means that the essential element controlling the search dynamics is (W/C).This implies that (W/C) is effectively the only parameter we need to deal with for our PSO scheme, where a relatively larger (W/C) would encourage a more explorative search, while a relatively smaller (W/C) would favour a greedier search towards either one of the superior attraction points (i.e., Pb/Gb).The population size is fixed to be an appropriate number (100) for each instance through observations.Since we do not have preknowledge about the discrete PSO operators, the tuning process is only carried out in a coarse manner, where the options for W and C are both formatively set to be {1, 2, 3, . . ., 9, 10} (although only W/C matters).

Observations and experimental results
According to the tuned parameter settings, 50 independent runs were executed for each PSO scheme under each instance to produce experimental results.

Comparisons among PSO schemes
To examine the search behavior of the proposed PSO schemes, we track three components of each scheme that are felt to be essential to the search dynamics of PSO.These three components are: the mean cost of the population; the average best of the population; the average velocity of the population.The mean cost and average best of the population are plotted together to reveal the convergence pattern, while the average velocity is plotted to track the exploration power.Only the search patterns for Tai20a are illustrated here (as shown in Figure 6), since the patterns for all the other instances are quite similar.In addition, the execution results are also shown in Table 1.
Through observation shown in Figure 6(a), we can find that for position-based PSO (PSO pos ) both the mean cost and the average velocity of population are changing throughout the search process smoothly, as the whole population is constantly converging towards the attraction point(s).The pattern of the corresponding velocity (as shown in Figure 6   higher level, before it decreases constantly as the search goes to a later stage.These searching patterns are generally very similar as those in the traditional PSO in the real-vector space. The situation for PSO prec is however slightly different from PSO pos according to the patterns shown in Figures 6(c) and 6(d).Although the mean quality of the whole population still converge towards the superior individuals (i.e., the global/local best solutions) and the average velocity decreases with the generation number, it is observed that there are some "resistances" in its convergence process.The situation is even worse for PSO adj (as shown in Figures 6(e) and 6(f)), as the mean quality of the whole population hardly converges, and the average velocity of the population decreases more slowly relative to the average velocity for PSO pos .Presumably, the degree of such resistance is the main cause for performance loss/gain, considering the fact that both PSO prec and PSO adj are outperformed by PSO pos (as shown in Table 1).
This can be mainly explained by the different information transfer efficiencies with different descriptions for QAP.The position-based PSO scheme (PSO pos ) is the most efficient scheme among the three since absolute positioning is being processed for its corresponding position-based description, which is most related to the quality of solution for QAP.From the implementation standpoint, the "recombination" of the position-based equivalence classes from both the current individual and its superior records serves as the main drive for convergence.However, the information processing of absolute positioning is disrupted by both precedence description and adjacency description to different degrees.This is also quite obvious from the implementation standpoint, since the "recombination" of precedence/adjacency information certainly will not produce the convergence of solution quality efficiently in terms of absolute positioning.The degree of such "disruption/deviation" is mainly decided by its correlations to the positional description.This can be further illustrated by the fact that precedence-based PSO performs better than adjacency-based PSO.As a matter of fact, precedence relations are more correlated to positional relations, which can be easily understood by inspecting the shift operator-as the number of precedences changed by the shift operator increases, so does the number of positions in a smooth progression.In contrast, adjacency relations are found to be poorly correlated with positional relations, since the number of adjacency relations changed by edgereverse mutation is poorly correlated with the changes in the absolute positioning of permutations.
The above results also reflect the main argument we are making for the methodology in this paper: the search behavior and performance of the derived operator depend on the description for the specific problem.Further estimations can be that PSO prec should perform well in those problems where precedence information processing in permutation is essential (e.g., JSSP), while PSO adj should perform well in those problems where adjacency/edge information is related to solution quality (e.g., TSP).

Extended comparisons
In addition, another PSO scheme for QAP is also implemented with ring topology based on PSO pos to illustrate the benefits of changing topology.As shown clearly in Figure 7(a), the position-based PSO scheme with ring topology (PSO pos R) outperforms the position-based PSO scheme with global topology (PSO pos G) by enhancing the exploration power through the implicit search control introduced by ring topology.In order to compare the exploration power of both PSOs, the average velocities of both schemes are plotted in Figures 7(b).From Figure 7(b), we can easily find that the ring structured PSO (PSO pos R) has a greater persistence in maintaining its velocity level than the global PSO (PSO pos G), which in turn enables a better exploration of the search space.Another observation is that, by the end of the execution, the average velocity of PSO pos R has not decreased to 0 yet, which means that given a larger number of generation an even better performance can be expected.The mean-best and standard deviation produced by our PSO schemes for each instance are presented in Table 2 under algorithms PSO pos G (for global structure) and PSO pos R (for ring structure).The results are also compared against a steady state standard GA using cycle crossover, with swap mutation (mutation rate = 0.1).
From the results, we can see that PSO pos R with ring topology overall outperforms PSO pos G with global topology, while generally GA still performs slightly better than the PSO schemes we have for some cases.We understand that this inferiority is mainly caused by the simplification of the inertia component in the PSO operator template, since a random k-change is not good enough to represent a directed k-change which should be parameterised by its previous velocity.This drawback requires our further research on how to replicate and apply distance, so that the previous distance/velocity can be retained and applied during the next generation.If we are able to implement this, the PSO should display a better convergence pattern.Also, further tuning and exploration of PSO options will inevitably lead to improved performance.However, we should point out that the main aim of this paper is certainly not to produce sophisticated PSO schemes with competitive results.Instead, the intent of this work is to generalise PSO in a formal manner, adapt its working mechanism to the permutation problem domain with reasonably good performance, and hopefully show some future research directions on generalising PSO.In this case, performance comparison at a competitive level is not performed due to the fact that most of those results were produced by highly tuned/adapted hybrid algorithms [5,31] where the separate contribution of each component is usually hard to justify, and comparisons against them would be of limited value.

CONCLUSIONS AND FUTURE WORK
In this paper, we have presented how the original PSO operator can be generalised in a formal manner to the permutation problem domain using forma analysis, with both the formal descriptions of permutation and a stochastic PSO operator template defined.By considering the application of operators as a process of constraint satisfaction, we derived several concrete PSO schemes for permutation problem, each of which involves a different assumption made on the description of the search space.Through observations of the search patterns of the derived PSO schemes together with the ring structured extension of position-based PSO on the QAP benchmarks, it is clear that the description choice is a critical issue in operator design, and the position-based PSO scheme for QAP achieves a certain degree of convergence towards the optimum in a similar manner as the traditional PSO for realvector space, with results comparable to a standard GA.
More importantly, we have presented in this paper a principled approach to formally derive algorithms with regard to the actual problem domain, in which case the behaviors and the performance of the derived algorithms are directly related to the assumption we make to describe the search space.
In the future, efforts on the improvement of these discrete PSO schemes are possible by considering additional issues (e.g., topological search control, local search, and even parameter selections).Application of our methodology to a wider range of problems and optimisation techniques can also be explored.In addition, the interpretation of applying a directed k-change (e.g., distance replication) in the context of permutation problems should be studied in the future to give a better understanding of the working mechanism of PSO for those problems.

Figure 1 :
Figure 1: Illustration of the methodology based on forma analysis.

Figure 2 :
Figure 2: Decomposition of the flying dynamics of PSO.
(b)) implies that the exploration power of the whole population initially increases to a relatively Mean and AVG best of the population for Tai20a with Tai20a-mean and best with PSO pos Average velocity of the population for Tai20a with PSO pos Tai20a-velocity with PSO pos Mean and AVG best of the population for Tai20a with Tai20a-mean and best with PSO prec Average velocity of the population for Tai20a with PSO prec Tai20a-velocity with PSO prec AVG best of pop.Mean cost of pop.Mean and AVG best of the population for Tai20a with Tai20a-mean and best with PSO adj AVG velocity of pop.Average velocity of the population for Tai20a with PSO adj Tai20a-velocity with PSO adj

Figure 6 :
Figure 6: The mean cost, average best cost, and average velocity of the population over generation for Tai20a with different PSO schemes (with global topology).

Figure 7 :
Figure 7: Comparison of average best cost and velocity of the population over generation for Tai20a between the global and ring structured PSO pos schemes.

Table 1 :
Experimental results over 50 independent runs for different PSO schemes.

Table 2 :
Experimental results over 50 independent runs for PSO schemes with different topologies against a standard GA.