An Integrated Method Based on PSO and EDA for the Max-Cut Problem

The max-cut problem is an NP-hard combinatorial optimization problem with many real-world applications. In this paper, we propose an integrated method based on particle swarm optimization and estimation of distribution algorithm (PSO-EDA) for solving the max-cut problem. The integrated algorithm overcomes the shortcomings of particle swarm optimization and estimation of distribution algorithm. To enhance the performance of the PSO-EDA, a fast local search procedure is applied. In addition, a path relinking procedure is developed to intensify the search. To evaluate the performance of PSO-EDA, extensive experiments were carried out on two sets of benchmark instances with 800 to 20000 vertices from the literature. Computational results and comparisons show that PSO-EDA significantly outperforms the existing PSO-based and EDA-based algorithms for the max-cut problem. Compared with other best performing algorithms, PSO-EDA is able to find very competitive results in terms of solution quality.


Introduction
The max-cut problem is one of the most classical combinatorial optimization problems. It is formally defined as follows. Given an undirected graph G = (V, E), with vertex set V = {1, . . . , n} and edge set E, each edge (i, j) ∈ E being associated with a weight w_ij, the max-cut problem is to find a partition (V_1, V_2) of V so as to maximize the sum of the weights of the edges between vertices in different subsets. Let x = (x_1, . . . , x_n) ∈ {1, −1}^n denote a solution of the max-cut problem, where x_i = 1 (x_i = −1) indicates that vertex i is assigned to V_1 (V_2). Let W = (w_ij)_{n×n} be the symmetric weighted adjacency matrix of G. The max-cut problem can then be formulated as the following discrete quadratic optimization problem [1]:

max f(x) = (1/4) x^T L̂ x,  subject to x ∈ {1, −1}^n,

where L̂ = Diag(We) − W is the Laplace matrix of G and e is the all-ones vector.
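For concreteness, the objective can be evaluated directly from W. The following sketch (function names are ours) computes a cut value both edge-by-edge and through the equivalent Laplace quadratic form f(x) = x^T L̂ x / 4:

```python
import numpy as np

def cut_value(W, x):
    """Total weight of edges crossing the partition encoded by x (entries +1/-1).

    Edge (i, j) is cut exactly when x[i] != x[j], i.e. (1 - x[i]*x[j])/2 == 1;
    summing over the full symmetric matrix counts each edge twice, hence the 1/4.
    """
    x = np.asarray(x, dtype=float)
    return 0.25 * float(np.sum(np.asarray(W, dtype=float) * (1.0 - np.outer(x, x))))

def cut_value_quadratic(W, x):
    """Same value via the Laplace matrix L = Diag(We) - W: f(x) = x^T L x / 4."""
    W = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    return 0.25 * float(x @ L @ x)
```

For a triangle with unit weights and x = (1, 1, −1), both functions return 2: the two edges incident to vertex 3 are cut.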
The max-cut problem has long served as a challenging testbed for new combinatorial optimization methods [2] and has a wide range of practical applications in areas such as numerics, scientific computing, circuit layout design, and statistical physics. It is one of Karp's original NP-complete problems [3].
Due to the significance of the max-cut problem in academic research and real applications, it has gained much attention over the last decade. Because of the NP-hardness of the max-cut problem, heuristics play a crucial role in solving large scale instances within acceptable computing time. Various heuristic methods have been proposed, including a rank-two relaxation heuristic [4], GRASP [5,6], scatter search [7], a filled function method [1], a dynamic convexized method [8], tabu Hopfield network and estimation of distribution [9], tabu search [2], particle swarm optimization [10], path relinking [11], breakout local search [12], and a tabu search based hybrid evolutionary algorithm [13]. Among these, breakout local search, path relinking, and the tabu search based hybrid evolutionary algorithm are the best heuristics for solving challenging max-cut instances.

Particle swarm optimization (PSO) [14] is one of the most popular population-based algorithms. In this technique, all particles search for food in the search space based on their positions and velocities. Every particle can adjust its flying direction by learning from its own experience and from the performance of its peers [15]. Different variants of PSO have been shown to offer good performance on a variety of continuous and discrete optimization problems [16,17]. Although particles share information with each other to some extent, the sharing is strictly limited, and global information about the search space is not utilized.
Estimation of distribution algorithm (EDA) [18] is a new paradigm in the field of evolutionary computation and has been applied to many optimization problems [19][20][21]. It uses a probability model, which gathers global information about the search space, to generate promising offspring. The probability model is updated at each generation using global statistical information. However, the local information of the solutions found so far is not utilized, and the algorithm may get stuck at local optima due to lack of diversity.
Blum et al. [22] observed that hybridization can exploit the complementary characteristics of different optimization heuristics; for examples, see [23,24]. In this work, we develop an integrated algorithm (PSO-EDA) that benefits from the advantages of both PSO and EDA. The integrated algorithm PSO-EDA hybridizes PSO, EDA, a local search procedure, path relinking, and a mutation procedure. PSO is utilized to gather local information about the search space quickly, while EDA guides the search with global information. The local search procedure and path relinking further improve solution quality, and the mutation procedure maintains diversity. The integrated algorithm overcomes the shortcomings of PSO and EDA and keeps a proper balance between diversification and intensification during the search. We use two sets of 91 benchmark instances from the literature to test the performance of PSO-EDA. Comparisons with the existing PSO-based and EDA-based algorithms for the max-cut problem show that PSO-EDA significantly outperforms these algorithms in terms of solution quality and solution time. Compared with other metaheuristic algorithms, including GRASP, breakout local search, path relinking, and the tabu search based hybrid evolutionary algorithm, PSO-EDA finds very competitive results in terms of solution quality. Moreover, PSO-EDA finds the best known solutions on 62 of the 91 instances, and its deviations on the remaining instances range from 0.01% to 0.47%. This shows that the proposed algorithm is able to find high quality solutions of the max-cut problem.
The remainder of this paper is organized as follows. Section 2 describes the PSO-EDA in detail. Computational results and comparisons are given in Section 3. Concluding remarks are made in Section 4.

The Proposed Algorithm
2.1. The Framework of PSO-EDA. The general structure of the PSO-EDA is given in Algorithm 1. Essentially, PSO-EDA alternates between a PSO procedure and an EDA procedure, which play different roles. The PSO procedure is used to gather local information, which is then used to update the probability vector, while the EDA procedure uses the global information encoded in that vector to guide the search and generate new promising solutions. These two complementary procedures are performed iteratively to obtain high quality solutions.
In the PSO-EDA, an (n − 1)-dimensional probability vector p = (p_2, . . . , p_n) ∈ [0, 1]^(n−1) is used to represent the probability model of the solution space, where p_i (i = 2, . . . , n) is the probability that x_i = x_1. Let t be the current generation. At the beginning of PSO-EDA, a population Pop(t) consisting of N particles is generated randomly, and each particle is further improved by a local search procedure. Let x̂_k be the personal best position of particle k, and let x* be the best solution found so far. We initialize x̂_k = x_k and x* = arg max{f(x_k), k = 1, . . . , N}. The probability vector is initialized as p = (0.5, . . . , 0.5). Then, the PSO procedure and the EDA procedure are executed alternately: if t is divisible by 2, the EDA procedure is performed; otherwise, the PSO procedure is executed. After that, a new population Pop'(t) is generated and used to form the next population Pop(t + 1). If the current best solution x* is not improved for no consecutive generations (no is a parameter), the current personal best solutions can no longer guide the search efficiently; each x̂_k is then perturbed by a mutation procedure and improved by the local search procedure, and the obtained solution replaces the personal best position of particle k. The above process is repeated until Maxcount generations have been performed.
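The alternation and the stagnation-triggered mutation described above can be sketched as a self-contained toy. The PSO and EDA steps below are deliberately crude stand-ins, and every name is ours; the real procedures (Algorithms 2-5) are far more elaborate:

```python
import random

def toy_pso_eda(fitness, n, pop_size=8, max_gen=60, no_improve_limit=6, seed=0):
    """Skeleton of Algorithm 1: a PSO-style step on odd generations, an
    EDA-style step on even ones; mutate the personal bests after
    no_improve_limit generations without improving the global best."""
    rng = random.Random(seed)
    pop = [[rng.choice((1, -1)) for _ in range(n)] for _ in range(pop_size)]
    pbest = [s[:] for s in pop]               # personal best positions
    best = max(pop, key=fitness)[:]           # global best found so far
    stall = 0
    for gen in range(1, max_gen + 1):
        if gen % 2 == 0:
            # EDA-style step: sample around the per-coordinate elite consensus
            probs = [sum(s[i] == best[i] for s in pbest) / pop_size
                     for i in range(n)]
            pop = [[best[i] if rng.random() < probs[i] else -best[i]
                    for i in range(n)] for _ in range(pop_size)]
        else:
            # PSO-style step: pull each particle toward the global best
            pop = [[best[i] if rng.random() < 0.7 else s[i]
                    for i in range(n)] for s in pop]
        improved = False
        for k, s in enumerate(pop):
            if fitness(s) > fitness(pbest[k]):
                pbest[k] = s[:]
            if fitness(s) > fitness(best):
                best, improved = s[:], True
        stall = 0 if improved else stall + 1
        if stall >= no_improve_limit:
            # diversify: mutate each personal best (flip prob. 0.2)
            pbest = [[-v if rng.random() < 0.2 else v for v in s] for s in pbest]
            stall = 0
    return best
```

Run on a trivial fitness such as sum(x), the loop steadily ratchets the global best upward, which is the point of the alternation: the odd (PSO) generations exploit, the even (EDA) generations resample around the consensus.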
The PSO-EDA consists of five main components: PSO procedure, EDA procedure, local search procedure, path relinking procedure, and mutation procedure. These procedures are described in detail in the following subsections.

PSO Procedure.
The standard PSO was introduced for continuous optimization problems. To deal with discrete optimization problems, Kennedy and Eberhart [26] developed a binary version of PSO, and many discrete versions of PSO [27][28][29] have since been proposed. Recently, Qin et al. [28] proposed an algorithmic framework of discrete PSO (DPSO for short), and its application to the number partitioning problem demonstrated the effectiveness of the approach.
The basic idea of canonical PSO is that a particle moves toward the best of its neighbors and returns to the best position it has found so far. The DPSO follows this idea. It uses one of the following equations to generate a new position x for a particle in the swarm:

x = x ⊕ (r_1 • (x̂ ∼ x)), (2)
x = x ⊕ (r_2 • (n̂ ∼ x)), (3)
x = x ⊕ (r_3 • (y ∼ x)), (4)

where x̂ and n̂ are the personal best position of the particle and the neighborhood best position, respectively; r_1, r_2, and r_3 are three random numbers in [0, 1]; y is a particle chosen at random from the current swarm; and ∼, •, and ⊕ are three operators, defined as follows.

Difference operator (∼): given any positions x and y, their difference, denoted by x ∼ y, is a shortest sequence of consecutive flip operators transforming y into x. The difference of two positions acts as the velocity in the DPSO; that is, v = x ∼ y.

Product operator (•): supposing that c is a real number in [0, 1] and v is a velocity (i.e., the difference of two positions), their product, denoted by c • v, is a subsequence of v whose length is ⌈c|v|⌉, where |v| is the length of v.

Sum operator (⊕): given a position x and a velocity v, x ⊕ v starts from x and flips all the variables in v to obtain a new position.
Equations (2) and (3) move a particle toward the best position it has found so far and toward the best of its neighbors, respectively. Equation (4) introduces a stochastic factor to avoid premature convergence of the DPSO. In each iteration, exactly one of the three equations is employed to update a particle.
Inspired by the idea in [28], our PSO procedure employs the basic structure of DPSO [28] and redefines its operators based on the specific structure of the max-cut problem. Supposing that x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n) are two solutions of the max-cut problem, we define x ∼ y = {i | x_i ≠ y_i, i ∈ {1, . . . , n}}, which identifies the positions where x and y differ. In our PSO procedure, the velocity v is thus a set of variable indices, that is, x ∼ y. Different from the definition of the operator • in DPSO, our operation c • v generates a subset of v by removing each index in v independently with a given probability; this operator increases the exploration ability of our PSO procedure. The operation x ⊕ v generates a new solution: it starts from x and flips all the variables indexed by v.
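In code, the redefined set-based operators can be sketched as follows. The removal probability symbol is garbled in the source, so it is exposed as a plain parameter here; all names are ours:

```python
import random

def difference(x, y):
    """x ~ y : the set of indices where the two solutions disagree (the velocity)."""
    return {i for i, (a, b) in enumerate(zip(x, y)) if a != b}

def scale(v, drop_prob, rng=random):
    """c . v : drop each index of v independently with the given probability."""
    return {i for i in v if rng.random() >= drop_prob}

def apply_velocity(x, v):
    """x (+) v : flip every variable whose index is in v."""
    return [-a if i in v else a for i, a in enumerate(x)]
```

Note the defining identity of the operators: applying the full difference x ∼ y to x yields y exactly, while scaling the velocity first yields a point partway along the path from x to y.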
In (3), a particle tries to reduce its distance to the best of its neighbors, n̂. It is time consuming to update n̂ for every particle in each generation, especially for large scale problems. In addition, the landscape analysis of the max-bisection problem [30] shows that, in most cases, the distances between high quality solutions are very small; that is, the degree of similarity between n̂ and the current best solution x* is very large. To speed up the search, each particle instead tries to move close to x*, so that the search quickly concentrates around x*. In our PSO procedure, (3) is therefore replaced by

x = x ⊕ (r_2 • (x* ∼ x)). (5)

The pseudocode of our PSO procedure is given in Algorithm 2. For each solution in the population Pop(t), a new position is generated by (2), (5), or (4) with probabilities prob_1, prob_2, and prob_3, respectively (lines (1)-(11)), where prob_1 + prob_2 + prob_3 = 1; that is, the update of a particle is influenced by exactly one of (2), (5), and (4). The newly obtained position is further improved by a local search procedure (line (12)), and a path relinking procedure, which will be described in Section 2.5, is used to intensify the search. Then x̂ and x* are updated (line (13)).

EDA Procedure.
EDAs produce offspring by sampling according to a probability model.
Notice that x = (x_1, . . . , x_n) and its symmetric solution −x = (−x_1, . . . , −x_n) correspond to the same partition of V. Since the solution space of the max-cut problem is symmetric, the traditional probability models used in other binary optimization problems cannot be directly applied.
We propose a probability model adapted to the symmetric solution space of the max-cut problem. Our EDA procedure uses the (n − 1)-dimensional vector p = (p_2, . . . , p_n) to characterize the distribution of promising solutions in the search space, where p_i (i ≠ 1) is the probability that x_i = x_1. At the beginning of the PSO-EDA, the vector is initialized as p = (0.5, . . . , 0.5). The PSO-EDA performs the PSO procedure and the EDA procedure alternately. After the PSO procedure, a new population Pop(t), which contains new local optimal solutions, is obtained. The EDA procedure identifies the M best solutions in Pop(t), and the probability vector p^(t−1) is updated according to the following equation:

p_i^t = (1 − λ) p_i^(t−1) + λ (1/M) ∑_{k=1}^{M} |(x_1^k + x_i^k)/2|, (6)

where p^t is the probability vector in the t-th generation, x^k = (x_1^k, . . . , x_n^k) is the k-th of the M best solutions in Pop(t), and λ ∈ (0, 1) is the learning speed. Since x_i^k ∈ {1, −1}, |(x_1^k + x_i^k)/2| is binary: if x_i^k = x_1^k, then |(x_1^k + x_i^k)/2| = 1; otherwise, |(x_1^k + x_i^k)/2| = 0.

Wu and Hao [30] concluded from their experimental tests that the degrees of similarity between high quality solutions are very large. The M best solutions selected to update p may therefore be very similar, which would lead the EDA procedure to produce very similar new solutions. To avoid search stagnation, the value of each probability is limited to an interval [p_min, p_max]; more formally, the probability vector is reset as follows:

p_i = p_min if p_i < p_min;  p_i = p_max if p_i > p_max. (7)

In each generation of the PSO-EDA, the EDA procedure generates new solutions by sampling according to the probability vector p. A new solution x = (x_1, . . . , x_n) is generated as follows. First, the EDA procedure randomly generates x_1 ∈ {1, −1}. Then, for every x_i (i ≠ 1) still to be determined, a random number r ∈ [0, 1) is generated, and we set x_i = x_1 if r < p_i and x_i = −x_1 otherwise.

The pseudocode of the EDA procedure is given in Algorithm 3. First, the M best solutions in population Pop(t) are identified (line (1)). The probability vector p is updated according to (6) and (7) in line (2).
Then, new solutions are generated from the probability vector p and further improved by the local search procedure (lines (3)-(13)). A path relinking procedure is employed to intensify the search. Line (14) updates x̂ and x* if a new better solution is found.
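Equations (6) and (7) and the sampling step can be sketched as follows. Here p[j] holds the probability that the (j+2)-th variable equals x_1 (0-based x[j+1] vs. x[0]); λ, p_min, and p_max take the paper's values as defaults, and all names are ours:

```python
import random

def update_probability(p, elites, lam=0.3):
    """Eq. (6): blend the old probability with the elite agreement frequency.
    p[j] is the probability that x[j+1] equals x[0] (0-based indexing);
    |(x_1 + x_i)/2| is 1 when the two entries agree and 0 otherwise."""
    M = len(elites)
    return [(1 - lam) * pj +
            lam * sum(abs((x[0] + x[j + 1]) / 2) for x in elites) / M
            for j, pj in enumerate(p)]

def clamp(p, p_min=0.2, p_max=0.8):
    """Eq. (7): confine each probability to [p_min, p_max] to avoid stagnation."""
    return [min(max(pj, p_min), p_max) for pj in p]

def sample_solution(p, rng=random):
    """Draw x_1 uniformly from {1, -1}; set each remaining x_i = x_1 w.p. p_i."""
    x1 = rng.choice((1, -1))
    return [x1] + [x1 if rng.random() < pj else -x1 for pj in p]
```

Sampling relative to x_1 rather than to a fixed value is what makes the model respect the symmetry x ≡ −x: both orientations of a partition are generated with equal probability.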

Local Search Procedure.

Local search has been proven to be very helpful when incorporated in PSO and EDA [29,31]. To enhance the exploitation ability, a local search procedure is adopted. It is a simple modification of the local search method FMMB [32] for the max-bisection problem; experimental results show that FMMB is very effective. The max-bisection problem consists of partitioning the vertices into two equally sized subsets so as to maximize the sum of the weights of the crossing edges; it is a special case of the max-cut problem.
The steps of our local search procedure are presented in Algorithm 4. The procedure performs passes repeatedly until a pass fails to generate a better solution. Each pass is described between lines (2) and (19). Let x_best be the best solution found in the current pass, and let F be the set of unlocked vertices. Suppose that x_0 is a starting solution and that its corresponding partition is (V_1, V_2). A pass progresses in epochs. At the beginning of a pass, all vertices are unlocked (i.e., free to be moved). Free vertices are moved according to their gains, where the gain g_i of a vertex i is the amount by which the objective function value would increase if vertex i were moved from its current subset to the other. More formally,

g_i = ∑_{j: x_j = x_i} w_ij − ∑_{j: x_j ≠ x_i} w_ij. (8)

Line (4) calculates the gains of all free vertices according to (8). Each epoch consists of two steps (lines (6)-(17)). First, the procedure moves an unlocked vertex with the highest gain from its current subset (call it A) to the other subset (call it B); the moved vertex is locked and may not move again during this pass. Line (8) updates the gains of the affected vertices. Then, an unlocked vertex in B with the highest gain is moved to A and locked, and the gains of the affected vertices are updated again. To speed up our local search procedure, a pass ends after n/10 epochs have been performed. The best partition x_best observed during the pass is returned, and another pass starts with x_0 = x_best. The local search procedure terminates when a pass fails to find a better solution.
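The gain formula (8) and the incremental update after a move can be sketched in plain Python. For x ∈ {+1, −1}^n the formula collapses to g_i = x_i ∑_j w_ij x_j, and after flipping vertex i only O(n) gain entries change; names and structure are ours, not the paper's implementation:

```python
def gains(W, x):
    """Eq. (8): gain of moving vertex i to the other side.
    Same-side edges (x_i == x_j) become cut (+w_ij); cut edges become
    uncut (-w_ij), i.e. g_i = x_i * sum_j w_ij * x_j."""
    n = len(x)
    return [x[i] * sum(W[i][j] * x[j] for j in range(n) if j != i)
            for i in range(n)]

def flip(W, x, g, i):
    """Move vertex i and update all gains in O(n) instead of recomputing:
    each neighbour j loses 2*w_ij*x_i*x_j, and g_i changes sign."""
    xi = x[i]
    for j in range(len(x)):
        if j != i:
            g[j] -= 2 * W[i][j] * xi * x[j]
    g[i] = -g[i]
    x[i] = -xi
```

This O(n) update per move is what makes an epoch-based pass cheap enough to repeat many times inside the overall algorithm.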

Path Relinking Procedure.
Path relinking was originally introduced in [33]. It explores trajectories that connect initiating solutions and guiding solutions in order to find better solutions. Our path relinking procedure uses the current best solution x* as the guiding solution. Algorithm 5 presents the procedure in detail. Suppose that x is an initiating solution generated by the PSO procedure or the EDA procedure. Given two solutions x^1 and x^2, the difference set Δ(x^1, x^2) between them is defined as

Δ(x^1, x^2) = {i | x_i^1 ≠ x_i^2}, (9)

and the distance d(x^1, x^2) between x^1 and x^2 is defined as the number of variables that must be flipped to transform x^1 into x^2; more formally,

d(x^1, x^2) = |Δ(x^1, x^2)|. (10)

Notice that the solution space of the max-cut problem is symmetric; that is, x and −x = (−x_1, . . . , −x_n) represent the same partition. In order to reduce the difference set and speed up the path relinking procedure, we set x = −x when d(x, x*) > n/2. The gains of all vertices are calculated according to (8) in line (5), and the difference set Δ(x, x*) is determined (line (6)). In each iteration, a vertex u with the highest gain in V is identified (line (8)). If flipping u would result in a solution better than x*, we flip u and stop the path relinking procedure (line (10)). Otherwise, a vertex v with the highest gain in Δ(x, x*) is identified (line (12)) and moved from its current subset to the other; that is, x_v is flipped. The gains of the affected vertices are then updated, and v is deleted from Δ(x, x*) (line (13)). After that, a vertex w with the highest gain among the remaining candidates in Δ(x, x*) is determined (line (15)); the gains of the affected vertices are updated, and w is deleted from Δ(x, x*) (line (16)). The above process is repeated until a better solution is found or Δ(x, x*) = ∅.
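A simplified single-flip variant of this idea can be sketched as follows. The actual Algorithm 5 selects two vertices per iteration and updates gains incrementally; this sketch flips one differing variable per step, highest gain first, and keeps the best point seen on the path. All names are ours:

```python
def cut_value(W, x):
    """Total weight of edges whose endpoints lie in different subsets."""
    n = len(x)
    return sum(W[i][j] for i in range(n)
               for j in range(i + 1, n) if x[i] != x[j])

def path_relinking(W, x, x_guide):
    """Walk from x toward the guiding solution and return the best point seen."""
    n = len(x)
    # x and -x encode the same partition: start from whichever is closer.
    if sum(a != b for a, b in zip(x, x_guide)) > n / 2:
        x = [-v for v in x]
    cur = list(x)
    best, best_val = list(cur), cut_value(W, cur)
    delta = {i for i in range(n) if cur[i] != x_guide[i]}
    while delta:
        # flip the differing variable with the highest gain (eq. (8))
        i = max(delta, key=lambda k: cur[k] * sum(W[k][j] * cur[j]
                                                  for j in range(n) if j != k))
        cur[i] = -cur[i]
        delta.remove(i)
        val = cut_value(W, cur)
        if val > best_val:
            best, best_val = list(cur), val
    return best, best_val
```

Because every step removes one index from the difference set, the walk terminates after at most d(x, x*) flips and ends exactly at the guiding partition.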
2.6. Mutation Procedure. The PSO-EDA uses the personal best positions x̂_k (k = 1, . . . , N) and the current best solution x* found so far to guide the search. At the beginning of the search, the degrees of similarity between the x̂_k and x* are relatively small, which helps the search find good solutions quickly. As the search progresses, however, the degrees of similarity between the x̂_k and x* become large, which makes it hard for the search to find better solutions.
To sustain the search in the long term, we apply a simple mutation procedure to the x̂_k, which diversifies the search. The mutation procedure flips each variable with a probability p_m = 0.2. In other words, for every variable of x̂_k, a random number r ∈ (0, 1) is generated, and the variable is negated if r < p_m.
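The mutation step is essentially a one-liner; a sketch with the paper's p_m = 0.2 as the default (function name is ours):

```python
import random

def mutate(x, pm=0.2, rng=random):
    """Flip each variable independently with probability pm."""
    return [-v if rng.random() < pm else v for v in x]
```

With pm = 0 the solution is returned unchanged; with pm = 1 every variable is flipped, yielding the symmetric solution −x (the same partition).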

Computational Results and Analysis
In this section, we report computational experiments that show the efficiency and effectiveness of the PSO-EDA. The PSO-EDA was programmed in C, and the experiments were run on a PC with an AMD processor (3.4 GHz CPU and 4 GB RAM).

Test Instances and Parameter Settings.
We use two sets of benchmark instances to test the PSO-EDA. They have been used to test many algorithms for the max-cut problem and the max-bisection problem over the last two decades. The first set is the G-set graphs [34]. The second set is from [4]; its instances arise from Ising spin glass models on cubic lattice graphs.
There are several parameters in the proposed PSO-EDA. The values of the population size N, the learning speed λ, the number M of best solutions used to update the probability vector, and the bounds p_min and p_max highly affect the performance of PSO-EDA. To investigate the influence of these parameters, we fixed Maxcount = 100, no = 6, prob_1 = 0.25, prob_2 = 0.05, and prob_3 = 0.7 and implemented the Taguchi method of design of experiment (DOE) [31,35] using problem G59. Combinations of different values of these parameters are given in Table 1.
For each parameter combination, defined by the orthogonal array L16(4^4), we ran PSO-EDA 5 times independently. The obtained average cut values and average CPU times (time) are listed in Table 2.
From Table 2, one can observe that PSO-EDA with the third parameter combination (i.e., N = 10, λ = 0.3, M = 3, p_min = 0.2, and p_max = 0.8) performed better than the other parameter combinations in terms of average solution quality and solution time. In the following experiments, the parameter values of PSO-EDA are those given in Table 3.

Comparison of the PSO-EDA with Existing PSO-Based and EDA-Based Algorithms.
In this subsection, we compare PSO-EDA with three PSO-based and EDA-based algorithms: a memetic algorithm with genetic particle swarm optimization and neural network (GPSO-CDHNN) [10], a discrete Hopfield network with estimation of distribution algorithm (DHNN-EDA) [25], and a tabu Hopfield neural network with estimation of distribution algorithm (THNN-EDA) [9]. We ran PSO-EDA 10 times with the parameters listed in Table 3 on instances used in [9,25]. Tables 4 and 5 list the best objective function value (f_best), the average objective function value, and the CPU time. The results of GPSO-CDHNN in Table 4 are taken from [10]; the data of DHNN-EDA and THNN-EDA are from [9]. Both DHNN-EDA and THNN-EDA were terminated within the same run time, which is shown in the subcolumn "time" under the column "THNN-EDA." Note that GPSO-CDHNN was tested on a DELL PC (Pentium 4, 2.80 GHz), and DHNN-EDA and THNN-EDA were run on a PC (Pentium 4, 2.9 GHz with 2.0 GB of RAM). According to the CPU speed data from http://www.cpubenchmark.net/, their computers are 6.15 times slower than ours; considering this difference, we normalize the CPU times of GPSO-CDHNN, DHNN-EDA, and THNN-EDA by dividing them by 6.15.
From Table 4, we observe that PSO-EDA is able to find better solutions compared to GPSO-CDHNN for 14 instances out of 15 selected instances from the first set. In addition, the average objective function values of PSO-EDA are better compared to GPSO-CDHNN for all tested instances from the first set. The CPU time of PSO-EDA is smaller than that of GPSO-CDHNN. These mean that PSO-EDA has a better performance than GPSO-CDHNN.
From Tables 4 and 5, we can see that the best objective function value and the average objective function value of PSO-EDA are much better than those of DHNN-EDA for all 24 considered instances from the first set, as well as the 20 instances from the second set. PSO-EDA takes less CPU time than DHNN-EDA for all tested instances, except for G1, G2, and G3. Therefore, PSO-EDA significantly outperforms DHNN-EDA on these instances.
THNN-EDA and PSO-EDA found the best objective function values on 23 and 40 of the 44 tested instances, respectively. The average objective function value of PSO-EDA is better than that of THNN-EDA for 13 instances from the first set, while it fails to match the average results of THNN-EDA for 6 instances from the first set. PSO-EDA finds better average results than THNN-EDA for all instances from the second set, and it takes less CPU time than THNN-EDA for all tested instances, except for G1, G2, and G3. These observations indicate that PSO-EDA performs better than THNN-EDA.
From all the results above, we conclude that PSO-EDA performs much better than the existing PSO-based and EDA-based algorithms for the max-cut problem.

Comparison of the PSO-EDA with Other Metaheuristic Algorithms.

In this subsection, the PSO-EDA is compared with several metaheuristic algorithms for the max-cut problem, including a GRASP-based heuristic (GRASP-TS/PM) [6], a path relinking based heuristic (PR2) [11], breakout local search (BLS) [12], and the tabu search based hybrid evolutionary algorithm (TSHEA) [13]. To compare PSO-EDA with these state-of-the-art algorithms, the maximum generation count Maxcount is increased to 2000, and PSO-EDA is run 10 times. Tables 6 and 7 report the best known solution values (Best) and the best (f_best) and average (f_avg) solution values obtained by GRASP-TS/PM, PR2, BLS, TSHEA, and PSO-EDA. Since GRASP-TS/PM and BLS do not report results on the instances of the second set, we do not include comparisons with them on those instances. The mark "-" in Tables 6 and 7 means that the corresponding result is not reported. The subcolumn "gap" under the column "PSO-EDA" lists the deviation of the best solution value obtained by PSO-EDA from the best known solution value Best, calculated as ((Best − f_best)/Best) × 100.

Table 6: Comparison of the results obtained by the GRASP-TS/PM, PR2, BLS, TSHEA, and PSO-EDA on instances from the first set.
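The gap formula is straightforward arithmetic; a helper (name ours) for reference:

```python
def gap(best_known, f_best):
    """Percentage deviation of the best value found from the best known value."""
    return (best_known - f_best) / best_known * 100
```

For example, a run reaching 9953 on an instance whose best known value is 10000 yields a gap of 0.47%.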
Since GRASP-TS/PM, PR2, BLS, TSHEA, and PSO-EDA were coded on different programming languages and run on different hardware platforms, it is very difficult to make a completely fair comparison of the computing time. Therefore, similar to [13], we only compare algorithms based on the solution quality.
We can make the following observations on the results in Tables 6 and 7: (1) Table 6 shows that GRASP-TS/PM, PR2, BLS, TSHEA, and PSO-EDA find the best known solutions on 25, 36, 48, 54, and 43 instances out of the first 54 small or medium instances from the first set, respectively. For 17 large instances from the first set, BLS and TSHEA find the best known solutions on 5 and 13 instances, respectively. The experimental results in Tables 6 and 7 show that BLS and TSHEA are the best performing algorithms.
(2) Compared with GRASP-TS/PM and PR2, PSO-EDA finds very competitive results on the first 54 small or medium instances from the first set. In terms of best solution quality and average solution quality, PSO-EDA is better than PR2 on 15 large instances from the first set.
For the 20 instances from the second set, in terms of best solution quality, PSO-EDA is better than PR2 on 9 instances and the same as PR2 on 11 instances. In terms of average solution quality, PSO-EDA is better than PR2 on 15 instances, the same as PR2 on 3 instances, and worse than PR2 on 2 instances.
(3) PSO-EDA finds the best known solutions on 62 of the 91 instances. For the other 29 instances, PSO-EDA obtains solutions with very small deviations from the best known values; the deviations range only from 0.01% to 0.47%.
(4) For the large scale instances, the performance of PSO-EDA is not stable, for two main reasons: (I) as instance size grows, the number of local optima increases rapidly, and (II) the degree of similarity between high quality solutions is generally very large [30].
The above computational results show that the proposed algorithm is very effective for solving the max-cut problem.