A Hybrid Differential Evolution and Tree Search Algorithm for the Job Shop Scheduling Problem

The job shop scheduling problem JSSP is a notoriously difficult problem in combinatorial optimization. In terms of the objective function, most existing research has been focused on the makespan criterion. However, in contemporary manufacturing systems, due-date-related performances are more important because they are essential for maintaining a high service reputation. Therefore, in this study we aim at minimizing the total weighted tardiness in JSSP. Considering the high complexity, a hybrid differential evolution DE algorithm is proposed for the problem. To enhance the overall search efficiency, a neighborhood property of the problem is discovered, and then a tree search procedure is designed and embedded into the DE framework. According to the extensive computational experiments, the proposed approach is efficient in solving the job shop scheduling problem with total weighted tardiness objective.


Introduction
The job shop scheduling problem JSSP has been known as a very stubborn combinatorial optimization problem since the 1950s.In terms of computational complexity, JSSP is NPhard in the strong sense 1 .Therefore, even for very small JSSP instances, it is by no means easy to guarantee the optimal solution.In recent years, the metaheuristics-such as genetic algorithm GA 2, 3 , tabu search TS 4, 5 , particle swarm optimization PSO 6, 7 , and ant colony optimization ACO 8, 9 -have clearly become the research focus in practical optimization methods for solving JSSPs.
Remarkably, most recent research concentrates on the hybridization of two or more heuristics to solve JSSP.For example, ACO is hybridized with TS in 10 ; GA is hybridized with VND variable neighborhood decent in 11 ; a hybrid algorithm based on PSO, VNS Mathematical Problems in Engineering variable neighborhood search , and SS scatter search is proposed in 12 for solving a biobjective JSSP.The underlying motivation for constructing hybrid algorithms is that different search mechanisms and neighborhood structures are usually complementary to each other i.e., when one fails, the other may be effective .Therefore, a combination of several optimizers is likely to promote the probability of finding optimal solutions.A common strategy is to hybridize an algorithm good at large-scale exploration diversification with an algorithm good at fine-scale exploitation intensification , which provides reliable searching ability for tackling complex solution spaces.
In a general JSSP instance, a set of n jobs J {J j } n j 1 are to be processed on a set of m machines M {M k } m k 1 under the following basic assumptions.
i There is no machine breakdown.
ii No preemption of operations is allowed.
iii The transportation time and the setup time can be neglected.
iv Each machine can process at most one job at a time.
v Each job may be processed by at most one machine at a time.
Each job has a fixed processing route which traverses all the machines in a predetermined order, and the manufacturing process of a job on a machine is called an operation.The duration time of each operation is fixed and known.Besides, a preset due date d j and a preset weight w j are given for each job.Due date is the preferred latest finishing time of a job, so completion after this specific time will result in losses such as a worsened reputation in customers.Weights reflect the importance level of the orders from different customers, larger values suggesting higher strategic importance.If we use C j to denote the completion time of job j, the objective function of JSSP can be makespan C max max n j 1 {C j } , maximum lateness L max max n j 1 {C j − d j } , total weighted tardiness TWT n j 1 w j T j , where T j max{0, C j − d j } , number of tardy jobs U n j 1 U j , where U j 1 if C j > d j and U j 0 otherwise , and so forth.
Up till now, most research on JSSP has been focused on the makespan criterion.However, due-date-related performances are becoming more significant in the make-toorder manufacturing environment nowadays.In this sense, the total weighted tardiness measure better reflects the critical factors that affect the profits of a firm.Meanwhile, from the theoretical perspective, total weighted tardiness is more difficult to optimize than makespan.According to the concepts of computational complexity, minimizing C max is only a special case of minimizing TWT, which means the complexity of solving JSSP with total weighted tardiness objective is much greater than that of solving JSSP with makespan objective.Below we will use the abbreviation "TWT-JSSP" to denote the job shop scheduling problem with total weighted tardiness objective.
The rest of this paper is organized as follows.Section 2 provides a brief review on existing solution methods for TWT-JSSP and the differential evolution algorithm.Section 3 discusses the mathematical model of TWT-JSSP and a neighborhood property.Section 4 describes the design of a hybrid differential evolution algorithm for solving TWT-JSSP.Section 5 presents the computational results.Finally, Section 6 concludes the paper.

Existing Algorithms for TWT-JSSP
The contributions on TWT-JSSP are relatively rare in the literature.The only exact solution methodology is the branch-and-bound algorithm proposed by Singer and Pinedo 13 , while all the following surveyed methods belong to the heuristic category.
Reference 14 presents efficient dispatching rules for sequencing the operations in TWT-JSSP, the most powerful one being the ATC apparent tardiness cost rule.In 15, 16 , the authors propose modified shifting bottleneck heuristics, in which the subproblems are solved by dispatching rules the basic ATC rule or the BATCS rule for complex job shops .The large step random walk LSRW algorithm is designed in 17 and aimed at the TWT-JSSP with release times Jm|r j | w j T j .References 2, 3 present hybrid genetic algorithms for TWT-JSSP.Reference 18 presents a rolling horizon approach for TWT-JSSP, in which each time window is scheduled with a modified shifting bottleneck heuristic.Reference 19 presents a tabu search algorithm for the generalized TWT-JSSP with release times and precedence constraints.Recently, a new meta-heuristic called electromagnetic algorithm EM 20 has been explored and tested on TWT-JSSP.
Although these algorithms are effective, they have not fully utilized the inherent properties of TWT-JSSP.Hence, they will not perform satisfactorily when faced with largescale instances of the problem.Exploring the structural properties including neighborhood properties of TWT-JSSP is an important research direction, which also constitutes the motivation of this study.

The Differential Evolution Algorithm
The differential evolution DE algorithm, which was first proposed by Storn and Price 21 in the mid-1990s, is a relatively new evolutionary optimizer.Characterized by a novel mutation operator, the algorithm has been found to be a powerful tool for continuous function optimization 22 .Due to its easy implementation, quick convergence and robustness, the DE algorithm is becoming increasingly popular in recent years.A wide range of successful applications have been reported, such as the design of fixed-structure robust controllers 23 , space trajectory optimization 24 , multiarea economic dispatch 25 , and exergoeconomic analysis and optimization 26 .
Despite the low computational complexity, DE has also been shown to have some weaknesses.In particular, DE is good at exploring the search space and locating the promising region, but it is slow at exploiting the high-quality solutions 27 .Recently, some researchers try to improve the performance of DE by hybridizing it with other local search-based algorithms.In 28 , the authors propose the 2-Opt based DE 2-Opt DE which is inspired by 2-Opt algorithms to accelerate DE.They show that 2-Opt DE can outperform the original DE in terms of solution accuracy and convergence speed.In 29 , the authors incorporate the orthogonal design method into DE to accelerate its convergence rate.They show that the proposed approach outperforms the classical DE in terms of the quality, speed, and stability of the final solutions.
Due to its continuous feature, the traditional DE algorithm cannot be directly applied to scheduling problems with inherent discrete nature.Indeed, in canonical DE, each solution is represented by a vector of floating-point numbers.But for scheduling problems, each solution is a permutation of integers.To address this issue, two kinds of approaches can be identified in the literature.
1 A transformation scheme is established to convert permutations into real numbers and vice versa 30, 31 .In this way, we only need to add a few lines to the encoding and decoding procedures, and it is not necessary to change the implementation of DE itself.The advantage is that the search mechanism of DE is well preserved, while the disadvantage is the redundancy in the mapping from real to permutation spaces.
2 The mutation and crossover operators in DE are modified to suit the permutation representation 32, 33 .Clearly, the difficulty is how to redesign these operators such that the characteristics of high-quality solutions can be inherited and exploited.The design of operators should be problem dependent and thus requires a specific analysis of the optimization problem.Therefore, the lack of generality is a disadvantage of this approach.
Despite the success on permutation flow shop scheduling problems, the application of DE to JSSP has rarely been reported.The job shop model is more complex because the processing sequences of each machine need to be optimized separately.To tackle such a difficult problem, we design a tree-based local search module which can enhance the exploitation ability of DE.To our knowledge, this is the first attempt that DE is applied to TWT-JSSP.

The Mathematical Model and Neighborhood
Properties of TWT-JSSP

The Mathematical Model and Its Duality
We utilize the concept of disjunctive graph for formulating TWT-JSSP 34 .In the graph G N, A, E , N O ∪ {0} {0, 1, 2, . . ., n × m} is the set of nodes, where O {1, . . ., n × m} corresponds to the operation set of the JSSP instance.Node 0 stands for a dummy operation which starts before all the real operations, while the starting time and the processing time of this dummy operation are both zero.A is the set of conjunctive arcs, and each conjunctive arc indicates the processing order of two operations belonging to the same job.Meanwhile, node 0 is connected with the first operation of each job with a separate conjunctive arc.In other words, if we use F O to denote the set of the first operations of all jobs, then for all f j ∈ F O , 0, f j ∈ A. E k∈M E k is the set of disjunctive arcs, where E k represents the disjunctive arcs related with machine k.Each disjunctive arc connects two operations that should be processed by the same machine, but the processing order of the two operations is yet to be determined.
Under the disjunctive graph representation, TWT-JSSP can be described as a mixedinteger linear disjunctive programming model:

3.1
In the above formulation, s i denotes the starting time of operation i, and the decision variable S is a vector consisting of the starting times of all the operations.p i denotes the required processing time of operation i.For the dummy operation 0, we set s 0 p 0 0. u j represents the last operation of job j, and thus U O {u 1 , u 2 , . . ., u n } denotes the set of ultimate operations of all the jobs.For any operation i, d i and w i , respectively, denote the due date and the weight of the job to which operation i belongs.For the ultimate operation u j of job j, T u j is the tardiness of job j.
From the perspective of disjunctive graphs, finding a feasible solution to JSSP is equivalent to determining the directions of all the disjunctive arcs such that, for any i 1 , i 2 ∈ E, only one of the two disjunctive inequalities in constraint b is satisfied.Now, let us suppose the directions of all disjunctive arcs have been determined, and we use σ to denote the set of directed disjunctive arcs.In this situation, the problem 3.1 becomes a linear programming model, that is,

3.2
As suggested by 19 , the dual problem of the linear program 3.2 is a maximum cost flow problem:

3.3
In the above formulation, F i 1 ,i 2 represents the flow over the arc i 1 , i 2 ; U { u 1 , 0 , u 2 , 0 , . . ., u n , 0 } { u j , 0 } n j 1 is a newly defined arc set, each arc pointing from the last operation of job j to node 0; TC denotes the total cost of the flows.Theorem 3.1.For the maximum cost flow problem 3.3 , there exists an optimal solution F * which satisfies the following: each node (except node 0) has at most one incoming arc with nonzero flow.
Proof.See Appendix A.

A Neighborhood Property for the Swap of Adjacent Operations
We know from the previous subsection that, given a feasible σ i.e., the directions of all disjunctive arcs , the schedule is determined, and meanwhile, a network flow graph is associated with the current schedule.The task of neighborhood search is to modify certain Mathematical Problems in Engineering parts of σ to get σ , so that the total weighted tardiness can be reduced, that is, TWT min σ < TWT min σ or equivalently, TC max σ < TC max σ .Definition 3.2 block .A sequence of operations in a critical path is called a block if 3.1 it contains at least two operations and 3.2 the sequence includes a maximum number of operations that are consecutively processed by the same machine.
If we want to improve the schedule under the current σ, we should only consider modifying the disjunctive arcs that satisfy the following two conditions: 1 the arc belongs to a certain block and 2 the arc carries positive flow in the dual network.The latter condition implies that this disjunctive arc is on the critical path of at least one tardy job, so altering this arc can possibly reduce the TWT.Now we consider whether swapping two adjacent operations in a block can really improve the TWT.Under a given σ, suppose the operations 1, 2, . . ., z constitute a block in the corresponding schedule, and the associated network flows are partially shown in Figure 1.The amount of flow on the outgoing disjunctive arc of operation i is denoted by x i and suppose x i > 0, for all i 1, 2, . . ., z − 1.We assume this dual solution satisfies the condition described in Theorem 3.1, that is, each node can have at most one incoming arc with positive flow.In this case, the input flows of these nodes all come from the input disjunctive arcs, so the input conjunctive arcs of these nodes must carry zero flow and thus they are not marked in the figure.However, each of the nodes can still have an outgoing arc with positive flow.For simplicity, these outgoing arcs with possibly positive flow are drawn in solid arrows in the figure, and the amount of flow is denoted by F i 0, respectively.
After executing the SWAP operator, we can construct a new feasible solution to the dual problem by adjusting the amount of flows on each arc.In the new network, the flow on the outgoing disjunctive arc of operation i is denoted by y i .In order to enable accurate analysis, we keep all the F i values constant in the process of adjusting the local flows within the block so that the flow equilibrium outside the considered scope will not be affected .

Theorem 3.3. Suppose a block with consecutive positive flows contains the following operations:
1, 2, . . ., α, β, . . ., z .If the condition is satisfied, then swapping the two operations α and β will not lead to improvement on the objective function (TWT).
Proof.See Appendix B.
Therefore, the function of Theorem 3.3 is that it helps to exclude some nonimproving moves in the local search process, so that the optimization efficiency can be improved.However, such a greedy mechanism must be combined with a large-scale randomized search like DE in order not to get trapped by local optima.

The DE Optimization Framework
Like other evolutionary optimizers, DE is a population-based stochastic global optimizer.In DE, each individual in the population is represented by a D-dimensional real vector x i x i,1 , x i,2 , . . ., x i,D , i 1, . . ., SN, where SN is the population size.In each iteration, DE employs the mutation and crossover operators to generate new candidate solutions and then applies a one-to-one selection policy to determine whether the offspring or the parent can survive to the next generation.This process is repeated until a preset termination criterion is met.
The standard DE algorithm can be described as follows.
Step 2 Mutation .For i 1, . . ., SN, generate a mutant solution v i as follows: where x best denotes the best solution in the current population; r 1 and r 2 are randomly selected from {1, . . ., SN} such that r 1 / r 2 / i; F > 0 is a weighting factor.
Step 3 Crossover .For i 1, . . ., SN, generate a trial solution u i as follows: where r j is an index randomly selected from {1, . . ., D} to guarantee that at least one dimension of the trial solution u i differs from its parent x i ; ξ j is a random number generated from the uniform distribution U 0, 1 ; CR ∈ 0, 1 is the crossover parameter.
Step 4 Selection .If u i is better than x i , let x i u i .
Step 5.If the termination condition is not satisfied, go back to Step 2.
According to the algorithm description, DE has three important parameters, that is, SN, F, and CR.In order to ensure a good performance of DE, the setting of these parameters should be reasonably adjusted based on specific optimization problems.
It is worth noting that there exist other variants of the DE algorithm with respect to the mutation and crossover method 35 .In fact, the above procedure is noted as "DE/best/1/bin" in the literature.If we change the mutation policy, we can obtain the "DE/rand/1/ * " variant: v i x r 1 F × x r 2 − x r 3 where the base solution x r 1 is also randomly chosen from the population.

The Encoding and Decoding Schemes
The encoding scheme used here is based on the random key representation and the smallest position value SPV rule.Each solution is described by n × m continuous values, and in the decoding process, this set of values will be transformed to a permutation of operations by the SPV rule.
Formally, let x i x i,1 , x i,2 , . . ., x i,n×m denote the ith solution, where x i,d is the position value of the ith solution with respect to the dth dimension d 1, . . ., n × m .To decode a solution, the SPV rule is applied to sort the position values within a solution and then determine the relative positions of the corresponding operations.This process is exemplified in Table 1 for a problem containing 6 operations suppose N {1, . . ., 6} .In this example, the smallest position value −0.99 resides in the second dimension, and thus, the operation "2" comes first in the resulting sequence the third row of the table .After sorting all these dimension values constituting solution x i , the operation sequence π i 2, 5, 4, 1, 6, 3 can be obtained.
Then, the decoded operation sequence π i can be used to build an active schedule for TWT-JSSP.The detailed schedule building algorithm is described as follows.
Input.An operation sequence π.
Step 1.Let Q 1 O {1, . . ., nm} the set of all operations , R 1 F O {f 1 , . . ., f n } the set of first operations of each job .Set t 1.
Step 2. Find the operation i * arg min i∈R t {r i p i }, and let m * be the index of the machine on which this operation should be processed.Use B t to denote all the operations from R t which should be processed on machine m * .
Step 3. Delete from B t the operations that satisfy r i r i * p i * .
Step 4. Find the operation o which belongs to B t and meanwhile ranks first in π.Schedule operation o on machine m * at the earliest possible time.
where JS o is the immediate job successor of operation o.
Step 6.If Q t 1 / ∅, set t ← t 1 and go to Step 2. Otherwise, the decoding procedure is terminated.
In the above description, the release time r i equals the completion time of the immediate job predecessor of operation i.So r i p i is the earliest possible completion time of operation i. Q t represents the set of operations yet to be scheduled at iteration t, while R t represents the set of ready operations whose job predecessors have all been scheduled at iteration t.

The Local Search Module Based on Tree Search
As mentioned in Section 2.2, DE alone does not have satisfactory "exploitation" ability.In order to cope with complex search spaces, it is usually required that a local search module be embedded into the general framework of DE.In this paper, we devise a tree-based local search optimizer by borrowing ideas from the filter-and-fan algorithm in 36 .In each iteration of DE, the local search is carried out for the best e% of solutions in the current population immediately after the selection phase.Thus, e is an important parameter for adjusting the frequency of local search and achieving a balance between exploration and exploitation.
In order to improve a selected solution, the proposed tree search algorithm generates compound neighborhoods for the solution based on the SWAP operator.The algorithm searches in a breadth-first manner, but unlike the beam search heuristic, each tree node represents a complete schedule rather than a partial schedule.The tree is gradually expanded by applying the SWAP operator iteratively.In each trial, the pair of operations to be swapped is randomly chosen from the critical blocks in the current schedule.To avoid repeated search, the reverse of the previous SWAP operator is prohibited.For example, if the current schedule is obtained by swapping α, β , then a swap on β, α if it is in the critical block again should be tabooed in the immediate expansion from the current node.In the search process, we can utilize the relevant neighborhood property Theorem 3.3 to promote the efficiency.In other words, when the theorem predicts that a certain move will not lead to improvement, then the action is canceled and another neighborhood move will be tried.
The proposed tree search algorithm is heuristic in nature, because it is computationally infeasible to enumerate all the possible swaps of the critical operations.Indeed, the algorithm only makes η 2 trails for each solution other than the "root" solution.Meanwhile, we need to apply a pruning strategy in the breadth-first search process in order to control the computational time.In particular, only the η 1 best solutions on each level of the tree will be exploited in the subsequent trials.We implement the tree search algorithm using the queue data structure as follows.
Input.A selected base solution σ.
Step 1.Let l 1. Create an empty queue.
Step 2. Try applying the SWAP operator on η 1 different locations in the critical blocks of σ.Let the produced η 1 solutions enter the queue.Denote the best solution among these by σ * .
Step 3. Perform the following steps for η 1 times.
3.1 Take out the first solution in the queue, denoted by σ f .3.2 Try applying the SWAP operator on η 2 different locations in the critical blocks of σ f .Let the produced η 2 solutions enter the queue.

Mathematical Problems in Engineering
Step 4. Retain the best η 1 solutions in the queue, and delete the rest η 1 η 2 − 1 ones.Denote the currently best solution in the queue as σ * l .If TWT σ * l < TWT σ * , let σ * σ * l .
Step 5. Let l ← l 1.If l < L, go back to Step 3.
According the above description, the tree has η 1 nodes on level 1, each of which will be expanded to η 2 nodes on level 2. Thus, there are totally η 1 × η 2 nodes on level 2. However, only the best η 1 nodes among these have a chance to be exploited and further expanded to level 3, while the rest will be abandoned.In this way, the number of nodes being considered on each level is controlled at η 1 × η 2 and will not increase exponentially.After expanding to L levels, the algorithm is terminated.
For example, if we set η 1 3, η 2 2, and L 4, the entire tree structure may look like Figure 2. The starting solution is denoted by σ 0 .On each level from l 2, three promising solutions are expanded further while the other three are discarded.The best solution found by this local search endeavor may appear on the last level.
The complexity of the tree search algorithm can be briefly analyzed as follows.There are roughly η 1 × L solutions that need to be expanded in the tree search process.For each expansion, the algorithm should first find the critical paths related with the tardy jobs.According to the Bellman's algorithm 37 , this can be done in O nm time.Then, the neighborhood operator has to be applied for approximately η 1 η 2 × L times in the tree search process.Therefore, the computational complexity of the algorithm can be described as Compared with the extensive exploration behavior of DE, the proposed tree search algorithm works in a greedy manner.First, each neighborhood move is performed on the critical blocks of the corresponding schedule, because only such moves are possible to produce improvements.Second, the neighborhood property is utilized to exclude some unpromising moves when there are more than η 2 candidates to choose from.Third, on each level of the tree, only η 1 best solutions from the totally η 1 × η 2 will be further considered.These features make the tree search algorithm extremely concentrated, which provides a complementary mechanism to DE's search process.Thus, using the tree search as an embedded local search module is beneficial for enhancing the exploitation capability and the overall performance of the hybrid DE.

The Test Problems and Parameter Setting
In order to test the performance of the proposed hybrid DE abbreviated as HDE hereinafter , randomly generated TWT-JSSP instances with different sizes are used in the computational experiment.For a specific problem size, the processing route of each job is a random permutation of the m machines, and the required processing time of each operation follows a uniform distribution U 1, 99 and takes only integer values.The due date of each job is set based on the total processing time of the job as d j u × f × i∈O j p i .In this expression, u ∼ U 1, 1.1 × max{1, n/m} is a random number uniformly distributed in the interval 1, 1.1 × max{1, n/m} , and O j denotes the set of operations that constitute job j. f ∈ {1.1, 1.3, 1.5} is a coefficient that reflects the tightness level of the due date setting.The weight of each job integer values follows a uniform distribution, that is, U 1, 10 .In order to reduce random errors in the computation, we randomly generate 5 different instances for each due date tightness and scale.In particular, 10 problem sizes from 100 operations to 500 operations and 3 due date levels f 1.5 for loose due dates, f 1.3, for moderate due dates and f 1.1 for tight due dates are tested, so the total number of instances is 150.The first two columns of Tables 2, 3 and 4 list the scales of the 10 instance sets.We compare the performance of the proposed HDE with the hybrid genetic algorithm GLS for TWT-JSSP 2 .The algorithms have been implemented using Visual C 2010 and tested on a platform of Intel Core i5-750 2.67 GHz, 3 GB RAM, and Windows 7.
In this experiment, the parameters of HDE are set as follows.
iii Termination criterion: the best-so-far solution has not been updated for 50 generations or controlled by the external computational time limit .

The Results and Discussions
The computational results are processed in the following way before listed in Tables 2, 3  Meanwhile, with respect to the first instance of each set under f 1.3 marked with "#-1" where # represents the index of instance sets , we record the average computational time over 10 independent runs of the two algorithms and plot the data as bar graphs in Figure 3.The following comments can be made according to the results displayed in Tables 2-4 and Figure 3.
1 For most instances, the gap between the best and the worst solutions obtained by HDE is smaller than the gap obtained by GLS, which implies that HDE performs more robustly in different executions and for different scheduling instances.
2 When the due dates are tighter f 1.3 or 1.1 , we find that the solution quality of GLS is considerably inferior to that of HDE.This is because under tight due dates, many jobs are prone to be tardy, and the optimization difficulty increases systematically.In this situation, the proposed tree search mechanism exhibits greater advantage.The search is more effective because the neighborhood moves are conducted on the critical blocks and a part of unpromising moves are eliminated with simple calculations.To certain extent, HDE has overcome the blindness of traditional local search, and thus it can access the high-quality solutions with larger probability.
3 By comparing the computational time, we see that, for most instances, the consumed time of HDE is less than that of GLS.More remarkably, the increasing speed of computational time with instance size is slower on the part of HDE.This is because the mutation and crossover based on random key representation in HDE is more efficient than the crossover and mutation based on operation sequence representation in GLS.Meanwhile, the phenomenon also verifies the fact that the tree-based local search is able to accelerate the convergence of DE note that the termination criterion of HDE is decided by the convergence status .

Influence of the Parameter Settings
The following experiments are designed for observing the impact of parameter settings on the final solution quality of HDE.A 10 × 10 instance with f 1.3 is used in these experiments.The termination criterion adopted here is an exogenously given computational time limit.When one parameter is being tested, the other parameters are all fixed at their recommended values given in Section 5.1.
The population size SN is one of the key parameters for the HDE algorithm.If SN is large, many solutions need to be maintained in the population, which increases the computational burden of solution decoding and evaluation.If SN is small, more generations can be implemented within the DE framework, but the limited population diversity will impair the effectiveness of mutation and crossover.Therefore, under a given computational time limit, the selection of SN can influence the overall optimization performance.In this experiment, we fix the available computational time at three different levels and then identify the most suitable value of SN in each scenario.The time limit is set as 20 sec tight , 40 sec moderate , and 60 sec loose , respectively.The computational results are displayed in Figure 4, where the vertical axis gives the average objective value obtained from 20 independent executions of the proposed HDE under each SN.
According to the results, the selection of SN has a noticeable impact on the solution quality.Generally, the solution quality will deteriorate if SN is either too small or too large.But the best SN varies with different time constraints.If the time budget is tight or moderate, the best population size is 40 ∼ 50.If the time resource is abundant, the best population size is 60 ∼ 70.Thus, setting SN 50 is reasonable for ordinary uses.
Another important parameter for HDE is e%, the percentage of solutions that undergo the local search process.A reasonable selection of e will result in an effective balance between exploration and exploitation.The computational results for this experiment are displayed in Figure 5.
According to the results, the setting of e has a considerable impact on the solution quality, especially when the computational time is scarce 20 sec .A small e means that only a few solutions in each generation can be improved by the local search, which has little effect on the entire population.A large e suggests that too much time is consumed on local search, which may reduce the normal function of DE.
The best setting of e under each constraint level is 70 for tight time budget , 60 for moderate time budget and 40 for loose time budget .When the exogenous restriction on computational time is tight, DE has to rely on frequent local search to find good solutions.This is because, in the short term, the tree-based local search is more efficient than DE's mechanism mutation and crossover in improving a solution.However, the price to pay is possibly a premature convergence of the optimization process due to the greedy nature of the tree search.On the other hand, when the computational time is sufficient, DE will prefer a larger number of generations to conduct a systematic exploration of the solution space.In this case, the local search need not be used very frequently, otherwise the steady searching process may be disturbed.Overall, a recommended value for e is 50.
Finally, we focus on the local search parameters η 1 , η 2 , and L. Following the suggestion of 36 , we fix the relationship between η 1 and η 2 as η 1 2η 2 and then test the influence of L and η 1 on the final solution quality.The tested ranges are L ∈ {8, 10, 12, . . ., 24}, η 1 ∈ {10, 12, 14, . . ., 26}, with a step size of 2, which leads to 81 combinations.Again, we choose the first instance of each set #-1 under f 1.3 for this experiment.The computational time limit is set as 0.6 nm sec for an n × m instance.The results are displayed as relative values in Figure 6.
The following comments can be made according to the results.
1 The tree search parameters produce a remarkable effect on the final solutions of HDE, which confirms that such a local search optimizer is effective in improving the performance of DE.When L and η 1 both take small values, the tree search process does not have chance to examine a sufficient number of neighborhood solutions Mathematical Problems in Engineering  before termination, which results in solution quality decline.On the other hand, when L and η 1 are both set large, the overall performance also worsens because the exploitation is consuming too much time and crowds out the exploration process of DE. deteriorate.Based on the experiments, the recommended settings are L 14 and η 1 18 thus η 2 9 .

Conclusion
In this paper, we propose a hybrid differential evolution algorithm for the job shop scheduling problem with the objective of minimizing total weighted tardiness.Based on the mathematical programming models, we show a neighborhood property for the swap of adjacent operations in a critical block, which can be used to exclude some nonimproving neighborhood moves.Then, a tree-based local optimizer is designed and embedded into the DE algorithm in order to promote the exploitation function.The tree search algorithm generates compound neighborhood for the selected solution and therefore it helps DE to exploit the relatively high-quality solutions.Finally, the computational results verify the effectiveness and efficiency of the proposed hybrid approach.
The future research can be carried out from the following aspects.

Appendices
A. Proof of Theorem 3.1 Proof.We prove the theorem by construction.First, it is noticeable that the unit cost of each arc in A ∪ σ is equal to the processing time of the operation that is connected to the tail of the arc, that is, c i 1 ,i 2 p i 1 , which means the largest cost path from node 0 to node u j is exactly the same as the critical path from operation 0 to operation u j , and the total cost of this path equals the length of the critical path.
Then, because constraint f requires that the total input flow should equal the total output flow for every node, a feasible solution F to this problem must be composed of several cycle flows.However, we know that G N, A ∪ σ does not contain any cycles, so each cycle in F must contain at least one arc from U. On the other hand, we can see that the arcs in U all point to the same node 0, and because a cycle cannot include duplicate nodes, it is clear that each cycle in F contains at most one arc from U. Finally, it is concluded that each cycle contains exactly one arc from U.
Based on the previous discussions, we can construct an optimal solution F * by the following steps.
Step 1. Initialize the flow on each arc to be 0, and let j 1.
Step 2. Find the unique critical path from operation 0 to u j , denoted by P * 0, u j the uniqueness is guaranteed by a simple rule detailed in the following .
Step 3. If the total unit cost i.e., length of the path P * 0, u j plus the unit cost of arc u j , 0 is greater than 0, then add a flow amount of w u j to arc u j , 0 and each arc in P * 0, u j notice that the capacity limit for arc u j , 0 is w u j .
Step 4. Set j ← j 1.If j n, then return to Step 2; otherwise, terminate the algorithm.
In such a constructed solution F * , each node except 0 has at most one positive-flow incoming arc.This is because the flows are only distributed on the arcs belonging to critical paths, and meanwhile, only one critical path is considered from node 0 to any other node i.
The last issue is how to maintain the uniqueness of the critical path to a certain node.Here we use a rule called "machine predecessor first."Indeed, when looking for a critical path from 0 to u j , we begin from u j and move backward.In each step, we must select an operation whose completion time equals the starting time of the current operation i from its immediate job predecessor JP i and its immediate machine predecessor MP i .Then, if C JP i s i and C MP i s i both hold, the rule requires that the machine predecessor should be selected as the next current operation.This tie-breaking rule guarantees the uniqueness of the critical path to any node.

B. Proof of Theorem 3.3
Proof.As Figure 1 shows, the flows in the initial network are denoted by x i x i > 0 , so we have the flow equilibrium condition:

B.1 Mathematical Problems in Engineering
Now we are swapping the operations α and β.After the SWAP operation is performed, the flows on the relevant arcs are denoted by y.So the following equilibrium equations must be satisfied:

B.4
Since the flows outside this block are all kept unchanged, the difference in the total cost of the whole network resulted from executing SWAP α, β is the same as ΔC calculated for this subnetwork, that is, TC y − TC x C y − C x .Therefore, if ΔC 0, then TC max σ TC y TC x TC max σ σ denotes the original set of directed disjunctive arcs while σ denotes the new arc set obtained after performing SWAP α, β .The last " " is because we assume the original network is related with the optimal solution to the dual problem under σ.In fact, TC max

Figure 4 :
Figure 4: The influence of parameter SN.

Figure 5 :
Figure 5: The influence of parameter e.
it concludes that, when ΔC 0 ⇔ F α /p α F β /p β , swapping α and β will not improve the current solution TWT min σ TWT min σ .

Table 1 :
Illustration of the decoding process using SPV.

Table 2 :
The computational results under loose due dates f 1.5 .

Table 3 :
The computational results under moderate due dates f 1.3 .

Table 4 :
The computational results under tight due dates f 1.1 .