Scheduling Semiconductor Multihead Testers Using Metaheuristic Techniques Embedded with Lot-Specific and Configuration-Specific Information

In semiconductor back-end manufacturing, the device test central processing unit (CPU) is the most costly resource and is typically the bottleneck machine at a test plant. A multihead tester contains a CPU and several test heads, each of which can be connected to a handler that processes one lot of the same device. The residence time of a lot is closely related to the product mix on the test heads, which increases the complexity of this problem. In test scheduling, it is critical to reduce the CPU's idle time and to increase tester utilization. In this paper, the multihead tester scheduling problem is formulated as an identical parallel machine scheduling problem with the objective of minimizing makespan. A heuristic grouping method is developed to obtain a good initial solution in a short time. Three metaheuristic techniques using lot-specific and configuration-specific information are proposed to obtain near-optimal solutions and are compared to traditional approaches. Computational experiments show that a tabu search with lot-specific information outperforms all other competing approaches.


Introduction
Semiconductor manufacturing consists of front-end and back-end manufacturing. Front-end manufacturing comprises wafer fabrication and wafer probe. In wafer fabrication, a pattern of circuitry is imprinted onto the surface of a wafer, on which hundreds of dice are fabricated; the dice are then individually tested in the wafer probe process. In recent years, extensive scheduling research has been conducted in the field of wafer fabrication [1][2][3][4][5][6][7][8]. In [9,10], the wafer probe process was formulated as a mathematical model to minimize the makespan, and lot and process information were used to develop scheduling algorithms. Back-end manufacturing comprises assembly and final testing. In assembly, dice are sealed into packaged devices (or briefly, devices), which are then tested in the final testing process to ensure the functionality of the products before delivery to customers. In [11], the final test flow is formulated as a job shop model with various resource limitations; an assignment algorithm is developed to obtain the machine configuration of each job and to allot specific resources, and a genetic algorithm called GASFTP is used to obtain a near-optimal schedule in real settings. In [12][13][14][15], back-end operations such as assembly and final test are treated as serial production stages or workstations, and heuristic techniques are introduced, including ant colony optimization [12], a multistep reinforcement learning algorithm [13], a reactive greedy randomized adaptive search procedure (GRASP) [14], and dynamic machine/lot prioritization [15]. In this paper, we focus on scheduling multihead testers in order to relieve the bottleneck of the final testing process.
A device is the smallest unit in back-end manufacturing, and a lot is a group of identical devices transported together. Within a lot, however, each device is processed individually on a handler, through which it is tested by a central processing unit (CPU). Device testing time (or briefly, testing time), which varies with device type, is the time the CPU takes to inspect a device's functionality. Device handling time (or briefly, handling time) is the time a handler takes to unload a finished device from a testing location, to load a waiting device onto the testing location, and, in some cases, to perform a temperature treatment.
Multihead tester operations were first described in detail in [16], which presents formulas for determining the residence times of lots. A multihead tester contains a CPU and several test heads, each of which can be connected to a handler. At any particular time, a handler can process only one lot of a single device type. However, as shown in Figure 1, a tester's CPU is often connected to more than one handler, so a tester can process several lots simultaneously to reduce the CPU's idle time and increase tester utilization. The CPU tests one device of the lot on each test head in turn, skipping any test head that is idle or undergoing handling. The residence time of a lot, that is, how long the lot stays on a test head, includes the testing time, the handling time, and the idle time of all devices in the lot. Because a handler costs far less than a tester's CPU, a tester normally carries several handlers to avoid CPU idle time [16]. It is therefore important to reduce the CPU's idle time and to increase tester efficiency. We formulate this scheduling problem as an identical parallel machine scheduling problem with the objective of minimizing makespan ($C_{\max}$). The main difference between traditional parallel machine scheduling and tester scheduling lies in the nature of residence times: in a tester scheduling problem, the residence time of a lot depends on the product mix on all the heads of the tester, which increases the complexity of the problem.
A configuration $k$ is the status in which a set of different lots, $L_k$, is simultaneously placed on the heads of a tester during a specific period of time. The configuration changes when either of the following occurs: (1) a lot finishes on a test head and a changeover must then be performed for the next lot; or (2) a changeover is completed and the next lot starts on the test head. Both events affect the residence times of the lots placed on the other heads of the tester. Figure 2 shows Gantt charts of three cases of a configuration on a three-head tester, viewed at the level of individual device types. The three test heads are connected to three handlers, each of which processes the identical devices of one lot. Here $t_i$ is the unit testing time of device type $i$ and $h_i$ is the unit handling time of device type $i$. In each device test cycle, the CPU tests one device on each test head in turn, skipping any test head that is idle or undergoing handling.
A device on a test head must wait for the CPU to finish testing on the other test heads if its handling time is less than the sum of the testing times on the other heads. For example, in case 1, test head 1 first tests a device of type 1 with testing time $t_1$ and then prepares the next device with handling time $h_1$. Test head 1 must then wait until the CPU completes its test on test head 3, after which the CPU switches back to test head 1. In this case, the sum of testing times is larger than the maximum of device testing plus handling time among all the heads; that is, $\sum_{i=1}^{3} t_i > \max_{i=1,2,3}(t_i + h_i)$. Conversely, the CPU must wait if it returns to a test head that is still handling the next device; that is, the CPU is forced to be idle if a device's handling time is larger than the sum of the testing times on the other heads. For example, in case 2, after finishing the tests of device types 1, 2, and 3, the CPU must wait for handler 1 to finish preparing the next device.
In such a case, the CPU is forced to be idle, and the sum of testing times is less than the maximum of device testing plus handling time among all the heads; that is, $\sum_{i=1}^{3} t_i < \max_{i=1,2,3}(t_i + h_i)$. Both situations, the idle test head in case 1 and the idle CPU in case 2, prolong the residence times of the lots and lengthen the makespan. As shown in case 3, the ideal situation occurs when neither the CPU nor any test head has to wait; that is, $\sum_{i=1}^{3} t_i = \max_{i=1,2,3}(t_i + h_i)$. In this paper, the testing parameters of any device type are assumed to be the same on every test head; that is, all the test heads are identical. At test plants, a certain device may only be tested on a given test head. Because a CPU is much more expensive than a handler, the limitation of handler availability is not considered. We also assume that a running test on any test head cannot be interrupted; that is, preemption and breakdowns are not allowed. Moreover, consecutive lots require different operating parameters of the test handler, so the test head and handler must be reinstalled between two consecutive lots. This changeover time includes the installation, loading, and unloading times. For simplicity, the changeover time between two consecutive lots is assumed to be a constant, independent of the device types involved.
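The three cases can be checked mechanically. The following minimal Python sketch, offered as an illustration under the per-cycle model described above, classifies a configuration from the per-device testing times $t_i$ and handling times $h_i$ of the lots on the heads and returns the length of one device test cycle, taken as the larger of the CPU workload $\sum_i t_i$ and the slowest head's $\max_i(t_i + h_i)$. The function name and return format are hypothetical.

```python
def classify_configuration(t, h):
    """Classify a configuration as case 1, 2, or 3 of Figure 2.

    t[i], h[i]: per-device testing and handling time of the lot on head i.
    Returns (case, cycle_time); the cycle time is the larger of the CPU
    workload sum(t) and the slowest head's own cycle max(t[i] + h[i]).
    """
    cpu_load = sum(t)                                  # CPU busy time per cycle
    head_bound = max(ti + hi for ti, hi in zip(t, h))  # slowest head per cycle
    if cpu_load > head_bound:
        case = 1      # some test head waits for the CPU
    elif cpu_load < head_bound:
        case = 2      # the CPU waits for a handler (CPU idle)
    else:
        case = 3      # ideal: neither the CPU nor any head waits
    return case, max(cpu_load, head_bound)
```

For instance, three heads with $t = (1, 1, 1)$ and $h = (4, 1, 1)$ fall into case 2 with a cycle of 5 time units, because handler 1 needs 5 units per device while the CPU is busy for only 3.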
This multihead tester scheduling problem is similar to the identical parallel machine scheduling problem, which has been proved to be NP-hard [17]. It is, however, a more complicated variant, because the residence time of a lot on a test head depends on the lots being processed on the other test heads. Owing to the difficulty of such problems, most researchers have used search algorithms together with a good initial solution to solve parallel machine scheduling problems [18][19][20][21][22][23][24][25]. Given the rapid increase in personal computer speed, it is worthwhile to invest in a relatively inexpensive computer to schedule a much more expensive tester and increase its productivity. Simulated annealing was first proposed in [26] to emulate the cooling and recrystallization process, and [27] reviewed this search technique. Tabu search was first proposed in [28], and a review of tabu search was given in [29]. The genetic algorithm was first introduced in [30], and relevant reviews were given in [31,32]. Recent research on parallel machine scheduling problems solved by simulated annealing, tabu search, and genetic algorithms can be found in [33][34][35][36][37][38][39][40]. In this paper, three metaheuristic techniques, namely simulated annealing, tabu search, and a genetic algorithm embedded with lot-specific and configuration-specific information, are proposed and compared to traditional approaches. To tackle this complex problem, two sets of measures, based on lot information and configuration information, are used in heuristic rules for generating new solutions, and these heuristic rules are embedded in the traditional approaches. Furthermore, since a better initial solution normally helps locate a good solution faster, a heuristic rule, called the grouping heuristic method, is proposed to efficiently generate a good initial solution.
The remainder of the paper is organized as follows. In Section 2, a procedure for evaluating a schedule is introduced, and a heuristic grouping method is developed to locate a good initial solution. Three metaheuristic techniques, namely simulated annealing, tabu search, and a genetic algorithm embedded with lot-specific and configuration-specific information, are then presented and compared with traditional approaches, and the methodology is extended to the scheduling problem of multiple multihead testers. Computational results are presented in Section 3. Finally, Section 4 concludes with a brief discussion of the results.

Methodology
In the overall design of the methodology, Section 2.1 presents the procedure for computing the makespan of a given schedule (solution). Our approach consists of two phases. The first phase uses the heuristic grouping method, outlined in Section 2.2, to obtain a good initial solution. In the second phase, three metaheuristic techniques introduced in Sections 2.3 to 2.5, namely simulated annealing, tabu search, and a genetic algorithm, are used to improve the initial solution. A limit on running time serves as the stopping criterion.
The following notation is used for the single multihead tester scheduling problem discussed in Sections 2.1 to 2.5; Section 2.6 discusses the extension to multiple multihead testers.
$k$: configuration $k$, the status in which a set of different lots is simultaneously placed on the heads of a tester during a specific period of time.
Parameters
$t_i$: the testing time of a device in lot $i$.
$h_i$: the handling time of a device in lot $i$, which includes an unloading time, a loading time, and the time for temperature treatment.
$n_i$: the number of devices in lot $i$.

Sets
$K_i$: the order sequence of the configurations in which lot $i$ is processed.
$L_k$: the set of lots processed in configuration $k$.

Variables
$\ell_{j,r}$: the index number of the lot, that is, the $r$th lot processed on test head $j$.
It is worth noting that the CPU idleness in configuration $k$, $I_k = \max_{i \in L_k}(t_i + h_i) - \sum_{i \in L_k} t_i$, is allowed to be negative. In that case, $|I_k|$ is the minimum device waiting time among all the lots in configuration $k$, as shown in case 1 of Figure 2. Hence, the CPU idle time per device test cycle in configuration $k$ equals $\max\{0, I_k\}$, and the device waiting time per device test cycle of lot $i$ in configuration $k$ equals $\max\{0, \sum_{l \in L_k} t_l - (t_i + h_i)\}$. Figure 3 shows an example schedule for a three-head tester, based on a macroview of lot types; it depicts the residence times and processing order of the lots on the three test heads and the changeover times between consecutive lots. Under this schedule, the configuration sequences are $K_1 = \{1, 2, 3, 4\}$, $K_2 = \{1, 2\}$, and $K_4 = \{1\}$, and the information for configurations 1, 2, and 3 is computed in Table 1.
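The per-cycle quantities above can be computed for one configuration as in the following sketch. It takes $I_k = \max_{i \in L_k}(t_i + h_i) - \sum_{i \in L_k} t_i$ as the CPU idleness, which is consistent with the three cases of Figure 2; the function name is hypothetical.

```python
def configuration_measures(t, h):
    """Per-cycle measures of one configuration k.

    t[i], h[i]: per-device testing/handling times of the lots in L_k.
    Returns (I_k, cpu_idle, waits): I_k may be negative,
    cpu_idle = max(0, I_k), and waits[i] = max(0, sum(t) - (t[i] + h[i]))
    is the device waiting time of lot i per device test cycle.
    """
    total_t = sum(t)
    I_k = max(ti + hi for ti, hi in zip(t, h)) - total_t
    cpu_idle = max(0, I_k)
    waits = [max(0, total_t - (ti + hi)) for ti, hi in zip(t, h)]
    return I_k, cpu_idle, waits
```

With $t = (2, 3, 1)$ and $h = (1, 1, 2)$, the sketch gives $I_k = -2$, no CPU idle time, and per-lot waits of 3, 2, and 3, so the smallest wait equals $|I_k|$ as stated above.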

Computing Procedure for Evaluating a Schedule.
As discussed above, the residence time of a lot depends on the device types on the other heads of the same tester, because those device types affect the idle times of the devices in the lot. A procedure is therefore needed to evaluate the makespan of a particular schedule obtained by one of the methods outlined in Sections 2.2 to 2.5. The procedure uses the following variables.

$t_{\mathrm{now}}$: current time.
$t^{\mathrm{next}}_j$: the next event time of test head $j$, at which the configuration will change.
$\mathrm{remain}_i$: the number of devices remaining untested in lot $i$.
$\mathrm{event}_j$: the next event type of test head $j$, which is either the completion of a lot or the completion of a changeover.
$r(j)$: the current index number of the lot on test head $j$.
After a solution (schedule) is obtained, the following procedure, which is similar to a discrete event simulation, is applied to compute its makespan.
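Step 1. Let $t_{\mathrm{now}} = 0$. Schedule the first lot on each test head.
Let $\mathrm{event}_j$ be a lot-completion event for all $j$, and let $r(j) = 1$. Determine the cycle time per device test cycle of the current configuration $k$, $c_k = \max\{\sum_{i \in L_k} t_i, \max_{i \in L_k}(t_i + h_i)\}$, based on the device types on all test heads. Compute $t^{\mathrm{next}}_j = c_k \cdot n_{\ell_{j,r(j)}}$ for all $j$.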
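The following Python sketch illustrates one way to implement such a discrete-event evaluation of a single tester. It assumes that every lot under test advances by one device per cycle, with cycle time $\max\{\sum t_i, \max(t_i + h_i)\}$ over the lots currently under test; that heads in changeover are skipped by the CPU and therefore excluded from the configuration; and that a constant changeover precedes every lot except the first on each head. The names `Lot`, `cycle_time`, and `makespan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lot:
    t: float  # testing time per device
    h: float  # handling time per device
    n: int    # number of devices in the lot


def cycle_time(active):
    """Length of one device test cycle for the lots currently under test."""
    return max(sum(l.t for l in active), max(l.t + l.h for l in active))


def makespan(schedule, changeover, eps=1e-9):
    """Event-driven makespan of one tester.

    schedule[j] is the ordered list of lots on test head j. The configuration
    changes only when a lot finishes or a changeover finishes.
    """
    m = len(schedule)
    pos = [0] * m           # index of the current lot on each head
    left = [0.0] * m        # devices left (testing) or setup time left (changeover)
    state = ["idle"] * m    # "test", "setup", or "idle"
    for j in range(m):
        if schedule[j]:     # start the first lot on every head at time 0
            state[j], left[j] = "test", float(schedule[j][0].n)
    now = 0.0
    while any(s != "idle" for s in state):
        testing = [j for j in range(m) if state[j] == "test"]
        c = cycle_time([schedule[j][pos[j]] for j in testing]) if testing else 0.0
        # Time to the next event on every busy head; jump to the earliest one.
        dt = min(left[j] * c if state[j] == "test" else left[j]
                 for j in range(m) if state[j] != "idle")
        now += dt
        for j in range(m):
            if state[j] == "test":
                left[j] -= dt / c
                if left[j] <= eps:                 # lot-completion event
                    pos[j] += 1
                    if pos[j] < len(schedule[j]):
                        state[j], left[j] = "setup", changeover
                    else:
                        state[j] = "idle"
            elif state[j] == "setup":
                left[j] -= dt
                if left[j] <= eps:                 # changeover-completion event
                    state[j], left[j] = "test", float(schedule[j][pos[j]].n)
    return now
```

As a usage example, a 10-device lot with $t = 2$, $h = 1$ on one head and a 4-device lot with $t = 1$, $h = 5$ on another give a cycle of 6 until the second lot finishes at time 24, after which the first lot runs alone with a cycle of 3 and finishes at time 42.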
Heuristic Grouping Method.
If $\max_{i \in L_k}(t_i + h_i) > \sum_{i \in L_k} t_i$ (i.e., $I_k > 0$), the CPU will be idle for $I_k$ per device test cycle in configuration $k$. A lot with a large handling time $h_i$ is likely to make $I_k$ large and thus cause the CPU to be idle. Therefore, lots with large device testing times should be assigned to the other heads to make $\sum_{i \in L_k} t_i$ larger. If such lots can be assigned to the other heads so that $\sum_{i \in L_k} t_i$ is no smaller than $\max_{i \in L_k}(t_i + h_i)$, CPU idleness can be avoided.
On the other hand, if $\max_{i \in L_k}(t_i + h_i) < \sum_{i \in L_k} t_i$ (i.e., $I_k < 0$), at least one device must wait for the CPU, and the minimum device waiting time per device test cycle among all heads is $|I_k|$. If $\sum_{i \in L_k} t_i$ is too large, the devices in configuration $k$ will wait for a long time, greatly prolonging the residence times of the lots in configuration $k$. Thus, under the condition $\max_{i \in L_k}(t_i + h_i) < \sum_{i \in L_k} t_i$, the proposed heuristic grouping method aims to make $\sum_{i \in L_k} t_i$ as close to $\max_{i \in L_k}(t_i + h_i)$ as possible. The notation and the procedure of the heuristic grouping method are as follows. Let $T$: the set of all lots, sorted in descending order of device testing times; $H$: the set of all lots, sorted in descending order of device handling times; $S[q]$: the $q$th element in a sorted set $S$; $P$: the set of lots that are currently under process and were selected from set $T$; $u$: the lot that is currently under process and was selected from set $H$.
The Procedure Is as Follows.
Step 1. Sort the lots by device testing time and by device handling time to generate set $T$ and set $H$, respectively.
Step 2. Let $u$ be the first element in $H$, and remove $u$ from both $T$ and $H$.
Step 7. Stop the procedure.
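To illustrate the grouping idea for a single configuration, the sketch below starts from the lot with the largest handling time (Step 2) and then, for each remaining head, greedily adds the lot from the testing-time-sorted list that brings $\sum t_i$ closest to $\max(t_i + h_i)$. This greedy filling rule is an assumption made for illustration and is not a reproduction of Steps 3 to 6; the function name and the dictionary input format are hypothetical.

```python
def form_configuration(lots, m):
    """Greedy grouping sketch for one configuration of an m-head tester.

    lots: dict lot_id -> (t, h). Returns (chosen ids, remaining ids), where
    the remaining ids stay in descending order of testing time.
    """
    by_t = sorted(lots, key=lambda i: lots[i][0], reverse=True)  # set T
    by_h = sorted(lots, key=lambda i: lots[i][1], reverse=True)  # set H
    u = by_h[0]                      # lot with the largest handling time
    group = [u]
    by_t.remove(u)
    while len(group) < m and by_t:
        def gap(i):
            g = group + [i]
            total_t = sum(lots[x][0] for x in g)
            head_bound = max(lots[x][0] + lots[x][1] for x in g)
            return abs(total_t - head_bound)   # distance to the ideal case 3
        best = min(by_t, key=gap)    # ties keep the lot with the larger t
        group.append(best)
        by_t.remove(best)
    return group, by_t
```

For four lots with $(t, h)$ values $(1, 5)$, $(3, 1)$, $(2, 1)$, and $(1, 1)$ on a three-head tester, the sketch groups the first three lots, for which $\sum t_i = 6 = \max(t_i + h_i)$, i.e., the ideal case.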
For example, Table 2 lists the information of six lots waiting to be tested on a three-head tester. Table 3 summarizes the steps of the heuristic grouping method, and Figure 4 shows the resulting solution. The information of the lots and configurations is presented in Tables 4 and 5. Several examples of the calculations in Tables 4 and 5 are described as follows.

Simulated Annealing.
Three simulated annealing methods are developed in this study, each using a different rule to generate neighbor solutions: traditional simulated annealing (TSA), simulated annealing using configuration information (HSA-1), and simulated annealing using lot information (HSA-2). HSA-1 and HSA-2 use heuristic rules to guide the stochastic selection of neighbor solutions.

Traditional Simulated Annealing (TSA).
Following [41], an insertion rule and an exchange rule are used to generate neighbor solutions for simulated annealing. In our TSA, one of the two rules is chosen at random, with equal probability, to generate each neighbor solution. If the insertion rule is chosen, a lot is selected at random and then inserted into a randomly selected slot. If the exchange rule is chosen, two lots are selected at random and their positions are swapped.
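The two TSA rules can be sketched as follows, assuming a schedule is stored as one list of lot identifiers per test head. Every lot and every slot (including the end of each head's sequence) is equally likely; the function names are hypothetical, and the exchange rule needs at least two lots.

```python
import random


def insertion_move(schedule, rng):
    """Insertion rule: move one random lot to a random slot on any head."""
    s = [list(head) for head in schedule]        # work on a copy
    lots = [(j, p) for j, head in enumerate(s) for p in range(len(head))]
    j, p = rng.choice(lots)                      # each lot equally likely
    lot = s[j].pop(p)
    slots = [(k, q) for k, head in enumerate(s) for q in range(len(head) + 1)]
    k, q = rng.choice(slots)                     # each slot equally likely
    s[k].insert(q, lot)
    return s


def exchange_move(schedule, rng):
    """Exchange rule: swap the positions of two distinct random lots."""
    s = [list(head) for head in schedule]
    lots = [(j, p) for j, head in enumerate(s) for p in range(len(head))]
    (j1, p1), (j2, p2) = rng.sample(lots, 2)     # needs at least two lots
    s[j1][p1], s[j2][p2] = s[j2][p2], s[j1][p1]
    return s


def tsa_neighbor(schedule, rng):
    """TSA: apply the insertion or the exchange rule with probability 1/2 each."""
    rule = insertion_move if rng.random() < 0.5 else exchange_move
    return rule(schedule, rng)
```

Passing an explicit `random.Random` instance keeps experiments reproducible; the moves return new schedules so the current solution is left untouched if the annealing step rejects the neighbor.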

Simulated Annealing Using Configuration Information (HSA-1).
Let $k_{\max}$: under the current solution, the configuration $k$ with the maximum $I_k$; $k_{\min}$: under the current solution, the configuration $k$ with the minimum $I_k$.
Configuration $k_{\max}$ has the longest CPU idle time because of long handling times, whereas configuration $k_{\min}$ has the largest minimum device waiting time among all heads because of long testing times. Based on the discussion in Section 2.2, a long handling time should be matched with long testing times to avoid waiting and reduce processing time. Thus, the lots in these two configurations are good candidates for exchanging positions. The exchange rule in HSA-1 randomly selects one lot from each of these two configurations and exchanges them, whereas the insertion rule in HSA-1 randomly selects one lot from the set of lots in $k_{\max}$ and inserts it into a random slot.
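The HSA-1 exchange candidates can be sketched as below. The sketch assumes that the configurations of the current solution and their idleness values $I_k$ have already been computed (for example, by a schedule evaluator) and are passed in as (lots, $I_k$) pairs; the function name is hypothetical.

```python
import random


def hsa1_candidates(configs, rng):
    """configs: list of (set_of_lot_ids, I_k) pairs for the current solution.

    Returns (k_max, k_min, lot_a, lot_b): the configurations with the largest
    and smallest CPU idleness and one random lot from each, i.e. the pair
    that the HSA-1 exchange rule would swap.
    """
    k_max = max(range(len(configs)), key=lambda k: configs[k][1])
    k_min = min(range(len(configs)), key=lambda k: configs[k][1])
    lot_a = rng.choice(sorted(configs[k_max][0]))
    lot_b = rng.choice(sorted(configs[k_min][0]))
    return k_max, k_min, lot_a, lot_b
```

Because consecutive configurations share lots, the two draws can return the same lot; an implementation would then redraw or skip the move.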
Simulated Annealing Using Lot Information (HSA-2).
In computing the lot measure $\delta_i$, $R_i/n_i$ is the average processing time per device in lot $i$ under the current solution, where $R_i$ is the residence time of lot $i$, and $(t_i + h_i)$ is the minimum processing time per device. A larger $\delta_i$ indicates that the devices in lot $i$ wait longer for the CPU; lot $i$ is therefore a good candidate for a change. Accordingly, the probability $p_i$ of lot $i$ being selected is calculated using the above formula.
The insertion rule randomly selects a lot according to these probabilities and inserts it into a random slot. The exchange rule uses the same probabilities to randomly select two lots and then swaps their positions.
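A sketch of the lot-based selection probabilities follows. It assumes the lot measure takes the difference form $\delta_i = R_i/n_i - (t_i + h_i)$, with $R_i$ the residence time of lot $i$, and that the probabilities are the normalized measures $p_i = \delta_i / \sum_l \delta_l$; both the form and the uniform fallback are assumptions, and the function name is hypothetical.

```python
def lot_selection_probs(residence, n, t, h):
    """Selection probability of each lot from its per-device excess time.

    Assumed measure: delta_i = residence_i / n_i - (t_i + h_i), the average
    time per device minus the minimum possible time per device.
    """
    delta = [max(0.0, r / ni - (ti + hi))      # clamp tiny negative round-off
             for r, ni, ti, hi in zip(residence, n, t, h)]
    total = sum(delta)
    if total == 0:                             # every lot already at its minimum
        return [1 / len(delta)] * len(delta)
    return [d / total for d in delta]
```

A lot index can then be drawn with `random.choices(range(len(p)), weights=p)[0]`, which is how the insertion rule would pick the lot to move.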

Tabu Search.
Simple tabu search has been used successfully to solve certain classes of NP-hard problems. For some problems, however, the neighborhood is too large or a neighbor solution is too expensive to evaluate. Evaluating the makespan of a given schedule in our problem is not trivial and requires a simulation-like procedure, so a simple tabu search may not be suitable. Modifications such as neighborhood reduction or a candidate list strategy may be necessary to improve the efficiency of neighborhood examination in tabu search [29].
Using a similar idea of neighborhood reduction, we developed three heuristic rules to select a promising subset of all neighbors for evaluation.

Traditional Tabu Search (TTS).
TTS implements a simple traditional tabu search that evaluates every possible insertion of every lot. The number of neighbor solutions in a tabu iteration is therefore approximately the square of the number of lots, so an iteration takes a long time for a large problem.
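The quadratic growth of the TTS neighborhood can be seen by enumerating all insertion moves of a schedule stored as one list per head. With $N$ lots on $m$ heads, removing a lot leaves $N - 1 + m$ slots, one of which recreates the original schedule, so the sketch below yields $N(N + m - 2)$ moves (some of which lead to the same schedule); this is close to $N^2$ when $N$ is much larger than $m$. The generator name is hypothetical.

```python
def insertion_neighbors(schedule):
    """Yield every schedule reachable by moving one lot to another slot."""
    for j, head in enumerate(schedule):
        for p in range(len(head)):
            rest = [list(h) for h in schedule]
            lot = rest[j].pop(p)
            for k in range(len(rest)):
                for q in range(len(rest[k]) + 1):
                    if (k, q) == (j, p):
                        continue            # would recreate the original schedule
                    nb = [list(h) for h in rest]
                    nb[k].insert(q, lot)
                    yield nb
```

For 12 lots on a three-head tester this already gives 156 candidate schedules per iteration, each requiring a full makespan evaluation.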

Tabu Search with Deterministic Selection Using Configuration Information (HTS-1).
According to the previous discussion, lots in $k_{\max}$ and lots in $k_{\min}$ are good candidates for swapping positions; thus, HTS-1 examines the neighbor solution produced by each possible swap between these two configurations. Roughly speaking, if $k_{\max}$ has $a$ lots and $k_{\min}$ has $b$ lots, there are $a \cdot b$ neighbor solutions. This method deterministically selects the two configurations $k_{\max}$ and $k_{\min}$.

Tabu Search with Stochastic Selection Using Configuration Information (HTS-2).
Let $v_k$: the nonnegative CPU idleness measure for configuration $k$ in the current solution; $q_k$: the probability of configuration $k$, which is used for selection; $\tilde{k}_{\max}$: a configuration randomly selected based on large CPU idleness; $\tilde{k}_{\min}$: a configuration randomly selected based on small CPU idleness.
This method randomly selects two configurations and then swaps lots between them. First, compute $v_k$, and then compute the probability $q_k$; in this way, a configuration $k$ with a larger $v_k$ has a higher probability $q_k$. Then, randomly select a configuration $\tilde{k}_{\max}$ based on $q_k$.
Next, compute a new $v_k$ and a new $q_k$ such that a configuration $k$ with a smaller $I_k$ has a higher probability $q_k$. Then, randomly select a configuration $\tilde{k}_{\min}$ based on $q_k$.
In $\tilde{k}_{\max}$, the CPU is likely to wait because of long handling times, whereas in $\tilde{k}_{\min}$, devices are likely to wait because of long testing times. Based on the previous discussion, a long handling time should be matched with long testing times to reduce processing time. Thus, HTS-2 examines each possible lot swap between the two configurations $\tilde{k}_{\max}$ and $\tilde{k}_{\min}$.
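The stochastic selection of HTS-2 can be sketched as follows. The exact form of the nonnegative measure is an assumption here: the sketch uses the shifted values $I_k - \min_{k'} I_{k'}$ when drawing $\tilde{k}_{\max}$ and $\max_{k'} I_{k'} - I_k$ when drawing $\tilde{k}_{\min}$, mirroring the form used for HGA-2 and HGA-4. The function name is hypothetical.

```python
import random


def hts2_select(I, rng):
    """Stochastically pick (k_max, k_min) from CPU idleness values I[k].

    Assumed nonnegative measures: I[k] - min(I) favours large idleness,
    max(I) - I[k] favours small idleness.
    """
    lo, hi = min(I), max(I)
    idx = range(len(I))
    if hi == lo:                               # all equal: pick uniformly
        return rng.choice(idx), rng.choice(idx)
    k_max = rng.choices(idx, weights=[x - lo for x in I])[0]
    k_min = rng.choices(idx, weights=[hi - x for x in I])[0]
    return k_max, k_min
```

Under these weights the least idle configuration is never drawn as $\tilde{k}_{\max}$ and the most idle one is never drawn as $\tilde{k}_{\min}$; a middle configuration can still be drawn twice, in which case the iteration would have to redraw.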

Tabu Search Using Lot Information (HTS-3).
HTS-3 randomly selects $m$ lots ($m$ being the number of test heads) based on the probabilities $p_i$. The higher $p_i$ is, the worse the position of lot $i$, so HTS-3 makes such a lot more likely to be selected for changing positions. HTS-3 then examines the neighbor solution obtained by exchanging each pair of the selected lots; that is, there are $\binom{m}{2} = m!/((m-2)!\,2!)$ neighbor solutions in a tabu iteration.

Genetic Algorithm.
Reference [42] pointed out that the performance of a genetic algorithm is determined by its crossover operator, so an effective crossover is necessary for successful genetic evolution. We use the heuristic measures introduced above and the order crossover operator proposed in [43] to design crossover operators that preserve good genes from parent chromosomes.
We follow the method used in [41] to form the first generation: the initial solution produced by the heuristic grouping method is duplicated once, and the other chromosomes in the first generation are randomly generated.

Genetic Algorithm Using Deterministic Selection to Preserve a Good Configuration (HGA-1).
Neither a large positive nor a small negative $I_k$ is desirable. Let $f_k$: the absolute value of $I_k$ in a particular solution, $f_k = |I_k|$ for all $k$; $m_k$: the number of lots under processing in configuration $k$; $k \in s$: configuration $k$ belongs to chromosome $s$.
Let $s_1$ and $s_2$ be the two parent chromosomes selected for crossover. Since $f_k$ is the distance to the ideal case, we find the configuration $k_1$ such that $m_{k_1} = m$ and $f_{k_1} = \min_{k \in s_1} f_k$. Similarly, we obtain $k_2$ from solution $s_2$. Then, we put all the lots in $k_1$ at the first position of each test head in a child chromosome $c_1$. In this way, we maximize the duration of configuration $k_1$, since $k_1$ has the minimum distance to the ideal case in $s_1$. After that, we delete the lots in $k_1$ from $s_2$ and append the remaining schedule of $s_2$ to $c_1$. We can generate $c_2$ similarly. An example is shown in Figure 5. Assume that the best configuration in solution $s_1$ consists of lots 1, 2, and 4; this configuration is preserved in the first positions of the test heads in $c_1$ to maximize its duration. Then, lots 1, 2, and 4 are deleted from $s_2$, and the lots in the remaining schedule of $s_2$ are appended to the corresponding test heads in $c_1$. Assume that the best configuration in $s_2$ consists of lots 3, 6, and 1. Similarly, they are preserved in the first position of each test head in $c_2$, and the other lots are appended to $c_2$ from the remaining schedule of $s_1$.
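The HGA-1 crossover step itself can be sketched as below, assuming the preserved configuration is given as the lot on each head (head $j$ keeps the lot it had in parent $s_1$) and the other parent is stored as one list per head. Finding $k_1$ requires the configurations of $s_1$ and is not shown; the function name and the example parent are hypothetical.

```python
def preserve_configuration_crossover(cfg, donor):
    """HGA-1 style child: preserved configuration first, then donor's schedule.

    cfg[j]: the lot that the preserved configuration places on head j.
    donor:  the other parent, one ordered list of lots per head.
    The preserved lots are deleted from the donor and each head's remaining
    lots are appended in the donor's order.
    """
    kept = set(cfg)
    return [[cfg[j]] + [lot for lot in head if lot not in kept]
            for j, head in enumerate(donor)]
```

With the configuration of lots 1, 2, and 4 from the example above and a hypothetical donor `[[3, 6, 1], [5, 2], [4, 7]]`, the child becomes `[[1, 3, 6], [2, 5], [4, 7]]`.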

Genetic Algorithm Using Stochastic Selection to Preserve a Good Configuration (HGA-2).
HGA-1 deterministically selects $k_1$ and $k_2$ based on the absolute values of the CPU idleness of configurations, whereas HGA-2 selects them randomly with probabilities calculated from CPU idleness. We compute $g_k = \max_{k'} f_{k'} - f_k$ for all $k$ and then calculate the probability of configuration $k$ being selected as $P_k = g_k / \sum_{k'} g_{k'}$ for a particular chromosome. A configuration with CPU idleness closer to zero has a higher probability of being selected. HGA-2 thus stochastically selects a configuration $k_1$ from $s_1$ and $k_2$ from $s_2$ and then performs the crossover operation outlined in HGA-1.
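The HGA-2 probabilities $P_k = g_k / \sum_{k'} g_{k'}$ with $g_k = \max_{k'} f_{k'} - f_k$ and $f_k = |I_k|$ can be computed as in the sketch below. The uniform fallback for the case where all $f_k$ are equal is an assumption, as are the function names.

```python
import random


def hga2_probs(I):
    """P_k = g_k / sum(g) with g_k = max(f) - f_k and f_k = |I_k|."""
    f = [abs(x) for x in I]
    fmax = max(f)
    g = [fmax - fk for fk in f]
    total = sum(g)
    if total == 0:                  # assumed fallback: all equally far from ideal
        return [1 / len(I)] * len(I)
    return [gk / total for gk in g]


def hga2_select(I, rng):
    """Draw a configuration index; idleness close to zero is favoured."""
    return rng.choices(range(len(I)), weights=hga2_probs(I))[0]
```

For idleness values $3$, $-1$, and $0$, the probabilities are $0$, $0.4$, and $0.6$: the configuration farthest from the ideal case is never preserved.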

Genetic Algorithm Using Deterministic Selection to Preserve a Good Head (HGA-3).
Let $j(i)$: the test head on which lot $i$ is processed in a particular chromosome; $F_j$: the aggregated measure of all the lots on head $j$ in a particular chromosome. To preserve the head with a good schedule, we design the following crossover operation. Let $s_1$ and $s_2$ be the two parents selected for crossover. Find $j_1$ such that $F_{j_1} = \min_j F_j$ in chromosome $s_1$ and, similarly, $j_2$ in $s_2$. Then, we put the sequence of lots on head $j_1$ on the corresponding head of the new chromosome $c_1$. In this way, we keep the sequence of lots on a head with a good schedule (a small $F_j$). After that, we delete the lots on head $j_1$ from $s_2$ and append the remaining schedule of $s_2$ to $c_1$. $c_2$ is generated similarly.
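The head-preserving crossover of HGA-3 can be sketched as follows, assuming the preserved head index $j_1$ has already been chosen and that "appending the remaining schedule" means each head receives the other parent's remaining lots on the same head, in their original order. The function name and the example parents are hypothetical.

```python
def preserve_head_crossover(p1, p2, j1):
    """Keep head j1 of parent p1 and fill the rest from parent p2.

    The child's head j1 starts with p1's sequence on that head; every head
    then receives p2's lots in their original order, minus the kept lots.
    """
    kept = set(p1[j1])
    child = [[lot for lot in head if lot not in kept] for head in p2]
    child[j1] = list(p1[j1]) + child[j1]
    return child
```

For example, keeping head 0 of `[[1, 2], [3, 4], [5, 6]]` against `[[6, 1], [2, 5], [4, 3]]` yields `[[1, 2, 6], [5], [4, 3]]`.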

Genetic Algorithm Using Stochastic Selection to Preserve a Good Head (HGA-4).
Let $G_j$: the nonnegative measure for head $j$ in a particular solution, $G_j = \max_{j'} F_{j'} - F_j$ for all $j$; $P_j$: the probability of head $j$ in a particular chromosome being selected for preservation in a new chromosome, $P_j = G_j / \sum_{j'} G_{j'}$ for all $j$.
A head with a lower value of $F_j$ has a higher probability of being selected. We randomly select $j_1$ from $s_1$ and $j_2$ from $s_2$ using these probabilities and then perform the crossover method outlined in HGA-3 using $j_1$ in $s_1$ and $j_2$ in $s_2$.

Traditional Genetic Algorithm by Randomly Preserving a Configuration (TGA-1).
Instead of using the $f_k$ measures, the configuration to be preserved is selected at random, with each configuration having an equal probability. The crossover operator presented in HGA-1 is then applied to generate new chromosomes.

Traditional Genetic Algorithm by Randomly Preserving a Head (TGA-2).
Instead of using the $F_j$ measures, the head to be preserved is selected at random, with each head having an equal probability. The crossover operator presented in HGA-3 is then applied to generate new chromosomes.

Extension to Multiple Multihead Testers with Identical Testers.
All the search methods outlined above for a single multihead tester with identical test heads can easily be extended to multiple multihead testers with identical test heads and identical testers. In the computing procedure for evaluating a schedule, the cycle time per device test cycle in a configuration depends only on the heads of the same tester; that is, the configuration of a tester is independent of the configurations of the other testers. When an event changes a configuration, we need to update only the event times of the heads on the same tester. However, when advancing the time, we must consider all heads on all testers.
The heuristic grouping method can also be applied by first moving one lot from the set sorted by device handling times to the first tester and filling its remaining heads with lots from the set sorted by device testing times, then doing the same for the second tester, and continuing in this way until all heads are full. After the time is advanced to the first event, the procedure is the same as that outlined in Section 2.2.
All the search methods can also be extended easily. When configuration information is used, every configuration of every tester should be considered; when head information is used, every head on every tester; and when lot information is used, every lot. In this way, all the methods can be applied in the same manner.

Computational Experiments
A series of computational experiments is designed to test the above methods and determine which performs best on this problem. The experiments are implemented in Microsoft Visual C++ and divided into two parts. In the first part, problems with known optimal solutions are designed by ensuring that the CPU idle time of every configuration is zero. The initial solution obtained by the heuristic grouping method is compared with those obtained by the Longest Processing Time (LPT) rule [44] and the Multifit method [45]. In addition, the solutions obtained by the three metaheuristic techniques using lot-specific and configuration-specific information and by the traditional approaches are compared with the optimal solutions. In the second part, test problems are randomly generated from specified problem parameters, and by controlling these parameters, analysis of variance (ANOVA) is used to identify the factors that affect the performance of the three metaheuristic techniques.

Parameter of the Metaheuristic Techniques.
Three settings need to be determined for simulated annealing. The initial temperature and the cooling procedure follow the adaptive simulated annealing scheme suggested in [46]. The epoch length (the number of moves at each temperature) is set to the square of the number of lots in the problem, as suggested in [47].
To prevent the tabu search from returning to recently visited local optima, a tabu list is needed. The size of the tabu list is set to 7, as suggested in [48,49].
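A fixed-size tabu list behaves like a first-in, first-out queue, which the sketch below models with `collections.deque(maxlen=7)`. Which move attributes are stored is an assumption here (the unordered pair of swapped lots), and aspiration criteria are omitted; the class name is hypothetical.

```python
from collections import deque


class TabuList:
    """Fixed-length FIFO of recent move attributes (here: swapped lot pairs)."""

    def __init__(self, size=7):
        self.items = deque(maxlen=size)   # the oldest entry drops out automatically

    def add(self, a, b):
        """Record that lots a and b were just swapped."""
        self.items.append(frozenset((a, b)))

    def is_tabu(self, a, b):
        """A swap of a and b is forbidden while the pair is on the list."""
        return frozenset((a, b)) in self.items
```

After eight swaps, the first recorded pair has left the list and becomes admissible again, while the seven most recent pairs stay forbidden in either order.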
Reference [50] proposed a so-called adaptive genetic algorithm, which dynamically adjusts the crossover and mutation probabilities, and showed that it outperforms a standard genetic algorithm. We use this adaptive approach to set the crossover and mutation probabilities in our experiments. The population size is set to 100 [42].

Experiments Using Problems with Known Optimal Solutions.
To construct a problem with a known optimal solution, the inequality $\max_{i \in L_k}\{t_i + h_i\} < \sum_{i \in L_k} t_i$ must hold for all $k$ in the test problem; that is, no CPU idle time occurs in any configuration. Figure 6 shows the optimal schedule on one tester for one of the test problems, and the device testing time and device handling time of each lot are shown in Table 6. The lots are duplicated twice on the machine to make it a three-head tester problem; that is, 18 lots are generated for this problem. The results for the initial solutions obtained by the different methods in Table 7 show that the proposed heuristic grouping method yields the best initial solution, because the other approaches do not consider the interaction between the test heads. The results of all methods based on the three metaheuristic techniques are shown in Table 8: most methods find the optimal solution within the execution time limit of 3000 CPU seconds, and HTS-3 takes the least CPU time (0.5 seconds) to find it. Generally speaking, tabu search performs better than simulated annealing, which in turn performs better than the genetic algorithm. The metaheuristic techniques embedded with lot-specific or configuration-specific information perform better than their corresponding traditional implementations.

Experiments Using Randomly Generated Problems.
In this part of the experiments, the following factors are considered when generating the test problems:
(1) the number of machines: two factor levels, one machine and three machines, are used to assess performance on both single and multiple multihead tester scheduling problems;
(2) the number of test heads: three factor levels are set to two, three, and four heads per tester, since a machine with more than four heads is unusual in practice;
(3) the ratio of the number of lots to the number of test heads: two factor levels are set to 2 and 4;
(4) the ratio of handling to testing time, which might affect the performance of the algorithms: letting $\bar{t}$ denote the average testing time and $\bar{h}$ the average handling time, the ratio $r$ is defined by Equation (5), $r = \bar{h} / \bar{t}$, and three levels of $r$, 0.8, 1.0, and 1.2, are used in our random problems;
(5) the variation of device testing time and device handling time: both values are randomly generated from uniform distributions whose spread is controlled by a variation parameter, with two factor levels, 0.1 and 0.3.
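The factor levels listed above form a full factorial design that can be enumerated directly. This sketch reads the lot-ratio levels as 2 and 4; the dictionary keys are hypothetical labels borrowed from the factor abbreviations used in the ANOVA.

```python
from itertools import product
from typing import Dict, List

# Factor levels from the text; keys are hypothetical labels.
FACTORS: Dict[str, list] = {
    "MAC": [1, 3],             # number of testers
    "HEAD": [2, 3, 4],         # test heads per tester
    "LOT_R": [2, 4],           # lots per test head (read as 2 and 4)
    "T_H_R": [0.8, 1.0, 1.2],  # handling-to-testing ratio
    "RANGE": [0.1, 0.3],       # variation parameter
}
PROBLEMS_PER_COMBINATION = 10


def factor_combinations() -> List[Dict[str, float]]:
    """One dict of factor levels per cell of the full factorial design."""
    names = list(FACTORS)
    return [dict(zip(names, levels)) for levels in product(*FACTORS.values())]


combos = factor_combinations()
print(len(combos), len(combos) * PROBLEMS_PER_COMBINATION)  # 72 720
```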
In sum, 72 (= 2 × 3 × 2 × 3 × 2) factor combinations are considered, and 10 random problems are generated for each combination, giving a total of 720 test problems, each of which is solved by the above four tabu search methods. After several initial tests, the stopping time of each search is set to 250 CPU seconds. The average testing time ($\bar{t}$) is set to 2 CPU seconds, and the average handling time is then determined by (5). The lot sizes are drawn from uniform(1000, 2000), and the changeover time is set to 1200 CPU seconds.
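A problem instance could be generated along the following lines. This is a sketch under stated assumptions: the average handling time is the handling-to-testing ratio times the average testing time, the number of lots is the lot ratio times the total number of test heads over all testers, and testing and handling times are drawn uniformly within plus or minus the variation parameter around their averages. The exact distributions are not recoverable from this text, so that last form in particular is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List

CHANGEOVER_TIME = 1200.0  # seconds, as stated in the text


@dataclass
class Lot:
    test_time: float      # device testing time
    handling_time: float  # device handling time
    size: int             # number of devices in the lot


def generate_instance(mac: int, heads: int, lot_ratio: int, r: float, delta: float,
                      t_bar: float = 2.0, seed: int = 0) -> List[Lot]:
    rng = random.Random(seed)
    h_bar = r * t_bar                 # average handling time from r = h_bar / t_bar
    n_lots = lot_ratio * mac * heads  # assumption: ratio taken over all heads of all testers
    lots = []
    for _ in range(n_lots):
        # Assumed form: uniform within +/- delta of the average.
        t = rng.uniform((1 - delta) * t_bar, (1 + delta) * t_bar)
        h = rng.uniform((1 - delta) * h_bar, (1 + delta) * h_bar)
        lots.append(Lot(t, h, rng.randint(1000, 2000)))  # lot size ~ uniform(1000, 2000)
    return lots
```

Seeding each instance separately keeps the 10 problems of a factor combination reproducible across search methods.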
Since optimal solutions are not available in this part of the experiments and problem sizes vary, the objective values of the solutions found by the search methods must be normalized. The following notation is used to normalize objective values and CPU times.
$(C_{\max})^{v}_{m,\tau}$: the makespan found by method $m$ at CPU time $\tau$ for problem $v$.
$(C_{\max})^{*}_{v}$: the best solution (minimum makespan) found within 250 CPU seconds by any method for problem $v$.
$f^{v}_{m,\tau} = (C_{\max})^{v}_{m,\tau} / (C_{\max})^{*}_{v}$: the normalized objective value for problem $v$ using method $m$ at CPU time $\tau$.
$\tau^{*}_{v}$: the CPU time (in seconds) at which the first method finds $(C_{\max})^{*}_{v}$ for problem $v$.
$T^{*}_{v} = \log_{10}(100 \times \tau^{*}_{v})$: the base-10 logarithm of 100 times the CPU seconds at which the best solution is found. Multiplying the CPU time in seconds by 100 keeps this logarithm from being negative: it would be negative only if the best solution were found in less than 0.01 second, which never happens in our experiments.
$T^{v}_{\tau} = \log_{10}(100 \times \tau) / T^{*}_{v}$: the normalized CPU time $\tau$ for problem $v$.
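The normalization can be written directly as functions. The names below are hypothetical, and the sketch assumes the best solution is found after more than 0.01 second so that the denominator of the normalized time is positive.

```python
import math


def normalized_objective(makespan: float, best_makespan: float) -> float:
    """f = C_max(m, tau) / C_max*; equals 1.0 when a method matches the best found."""
    return makespan / best_makespan


def log_time(cpu_seconds: float) -> float:
    """log10(100 * tau); non-negative whenever tau >= 0.01 s."""
    return math.log10(100.0 * cpu_seconds)


def normalized_time(cpu_seconds: float, best_cpu_seconds: float) -> float:
    """log10(100 * tau) / log10(100 * tau*); assumes tau* > 0.01 s."""
    return log_time(cpu_seconds) / log_time(best_cpu_seconds)
```

For example, a makespan of 110 against a best of 100 gives a normalized objective of 1.1, and a CPU time of 1 second against a best time of 10 seconds gives $\log_{10}(100)/\log_{10}(1000) = 2/3$.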

Factorial Analysis Results and Algorithm Performance.
We analyze the results with a multifactor general linear model. In addition to the five problem factors, the search method (METHOD) is treated as a factor, so a total of six factors are considered in this experiment. METHOD and the number of test heads (HEAD) are fixed factors, whereas the other factors, namely the number of machines (MAC), the ratio of the number of lots to the number of test heads (LOT_R), the ratio of handling to testing time (T_H_R), and the variation of testing and handling times (RANGE), are random factors. A significance level of α = 0.05 is used throughout the statistical analysis. The analysis of variance (ANOVA) results in Table 9 show that all the factors have significant effects on search performance.
The average normalized objective is plotted against normalized CPU time for the simulated annealing, tabu search, and genetic algorithm methods in Figures 7, 8, and 9, respectively. For simulated annealing (Figure 7), HSA-2 is better than TSA in the early stage; the two methods are close during the middle and later stages, and TSA even overtakes HSA-2 in the final stage. This means that simulated annealing with lot-specific information (HSA-2) finds a good solution faster, but traditional simulated annealing can catch up when the execution time is longer. Figure 8 shows that the tabu search with lot-specific information (HTS-3) is the best of all four tabu search methods throughout the search, which indicates that the lot information used in HTS-3 is very useful for designing the heuristic rule that modifies the traditional tabu search. Figure 9 shows that HGA-2 is the best genetic algorithm method, while the other methods do not differ significantly from one another.
Comparing the best method from each metaheuristic group in Figure 10, HTS-3 is clearly the best during the first three-quarters of the whole time period, but TSA catches up and overtakes HTS-3 in the final quarter. Interestingly, HTS-3 finds a good solution faster, whereas traditional simulated annealing reaches better solutions than HTS-3 when the execution time is long enough. HGA-2 falls behind the other two methods throughout the whole time period.
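Curves such as those in Figures 7 to 10 can be assembled by averaging, over problems, the normalized makespan each method holds at a given normalized CPU time. The sketch assumes each run is logged as a time-ordered list of `(cpu_seconds, makespan)` improvements; that trace format and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Trace = List[Tuple[float, float]]  # hypothetical log: (cpu_seconds, makespan) per improvement


def average_curve(traces: Sequence[Trace], best_makespans: Sequence[float],
                  best_times: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Average normalized objective over problems at each normalized CPU time in grid.

    traces[v] is the time-ordered improvement log of one method on problem v;
    best_makespans[v] and best_times[v] are (C_max)*_v and tau*_v for that problem.
    """
    curve = []
    for x in grid:
        values = []
        for trace, c_star, tau_star in zip(traces, best_makespans, best_times):
            t_star = math.log10(100 * tau_star)
            incumbent = math.inf  # no solution yet at this normalized time
            for cpu_seconds, makespan in trace:
                if math.log10(100 * cpu_seconds) / t_star <= x:
                    incumbent = makespan  # improvements only, so the latest one is the best
            values.append(incumbent / c_star)
        curve.append(sum(values) / len(values))
    return curve
```

Comparing in normalized-time space, rather than converting grid points back to seconds, makes the point $T = 1$ coincide exactly with the time at which the best solution was found.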

Conclusions
In semiconductor back-end test facilities, improving the efficiency of the tester, which is the bottleneck of the plant, is very important. This paper addresses the multihead tester scheduling problem with the objective of minimizing makespan, that is, finishing all currently waiting lots in the minimum time. The special features of this parallel machine scheduling problem are exploited in a heuristic grouping method that generates a good initial solution efficiently. Because both CPU idle time and device waiting time prolong the makespan, two performance measures are developed: one is based on the CPU idleness of configurations, and the other on the device waiting times of lots. Heuristic rules using these two measures are embedded in the three metaheuristic techniques.
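A minimal sketch of the two measures for a single configuration follows. It takes the cycle time from Table 1 and assumes that CPU idleness is the part of a cycle not spent testing, and that a device's waiting time is the part of a cycle it spends neither being tested nor handled. These per-cycle formulas are an interpretation for illustration, not the paper's exact definitions.

```python
from typing import Dict, Sequence, Tuple


def configuration_measures(config: Sequence[str], t: Dict[str, float],
                           h: Dict[str, float]) -> Tuple[float, float, Dict[str, float]]:
    """Return (cycle time, CPU idle time per cycle, device waiting time per cycle by lot)."""
    busy = sum(t[i] for i in config)                       # CPU testing time per cycle
    cycle = max(busy, max(t[i] + h[i] for i in config))    # cycle time, as in Table 1
    cpu_idle = cycle - busy                                # assumed CPU idleness measure
    waiting = {i: cycle - (t[i] + h[i]) for i in config}   # assumed device waiting measure
    return cycle, cpu_idle, waiting
```

For a two-head configuration with testing times 2 and 1 and handling times 4 and 1, the cycle time is 6, the CPU idles 3 time units per cycle, and the device on the second head waits 4.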
The comparative analysis of our experiments shows that stochastic selection is better than deterministic selection: for example, HTS-2 is better than HTS-1, HGA-2 is better than HGA-1, and HGA-4 is better than HGA-3. When the execution time is short, the tabu search with lot-specific information (HTS-3) performs best. When a longer execution time is allowed, traditional simulated annealing (TSA) overtakes HTS-3 and becomes the best of all methods.
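To make the contrast concrete, the two functions below sketch a deterministic rule that always picks the candidate with the highest score and a stochastic roulette-wheel rule that picks candidates in proportion to their scores. What the score represents (for instance, a lot's device waiting time) is an assumption; the selection formulas of HTS-1, HTS-2, and the HGA variants are not given here.

```python
import random
from typing import Sequence


def select_deterministic(scores: Sequence[float]) -> int:
    """Always return the index of the highest score (first one on ties)."""
    return max(range(len(scores)), key=lambda k: scores[k])


def select_stochastic(scores: Sequence[float], rng: random.Random) -> int:
    """Roulette wheel: index k is chosen with probability scores[k] / sum(scores).

    Scores must be non-negative with a positive total.
    """
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]
```

The stochastic rule still favors high-scoring candidates but occasionally explores others, which is one plausible reason such variants escape poor regions more often than a strictly greedy choice.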

Figure 2: Three cases of a configuration in a three-head tester.

Figure 3: An example schedule of lot residence times for a three-head tester.

Figure 4: The solution obtained by the heuristic grouping method.

Figure 5: An example of crossover operator using configuration information.

Figure 6: The optimal solution in a single-tester problem.

Figure 7: The normalized objective versus normalized CPU time for simulated annealing methods.

Figure 8: The normalized objective versus normalized CPU time for tabu search methods.

Figure 9: The normalized objective versus normalized CPU time for genetic algorithm methods.

Figure 10: The normalized objective versus normalized CPU time for the best method from each metaheuristic group.

Table 1: Information about configurations.
$C_k$: the set of lots being tested during configuration $k$, one on each test head $j$.
$n_i$: the number of devices in a lot $i$ scheduled on a test head.
$q_k$: the number of devices tested per head during configuration $k$.
$c_k$: the cycle time per device test cycle in configuration $k$, $c_k = \max\{\sum_{i \in C_k} t_i,\ \max_{i \in C_k}\{t_i + h_i\}\}$.
$p_k$: the period length of configuration $k$, $p_k = c_k \cdot q_k$.
$e_k$: the end time of configuration $k$, $e_k = e_{k-1} + p_k$.
$R_i$: the residence time of lot $i$, $R_i = \sum_{k:\, i \in C_k} c_k \cdot q_k$.
If no lot is remaining on head   , select the waiting lot with the largest number of devices from the other heads and reassign the lot to head   . Schedule the next lot onto head   . Let event   = . Determine   based on a new configuration. Step 5. Determine   based on a new configuration. Let event   = . Compute  next   =  next   + changeover time. Compute  next  =   × remain () , for all , such that event  = . If all lots are finished, the makespan of the schedule,  now, is obtained and the procedure is stopped; otherwise, go to Step 3.
2.2. Heuristic Grouping Method for Initial Solutions. Under a configuration , the CPU idleness is computed by
go to Step 5; if (event   = ), go to Step 4. Step 4. Compute  next  =   × remain () , for all , such that event  = . Go to Step 3.
3. Find the largest , such that ∑ +−2 =    > ℎ  , and put   ,  +1 , . . .,  +−2 into . If no such  can be found, put  1 ,  2 , . . .,  −1 into  and remove the elements in  from both  and . Step 4. Schedule the selected lots (lot  and the lots in ) onto the test heads. If  and  are empty, go to Step 7; otherwise, go to Step 5.
Step 5. If  is the first lot to finish under the current configuration, let  = the first element in , remove  from both  and , and go to Step 4. Step 6. If a lot in  is the first to finish under the current configuration, remove this lot from . Find the largest , such that (   + ∑ ∈   ) > ℎ  . Put   into  and remove   from both  and . If no such  can be found, put  1 into  and remove  1 from  and . Go to Step 4.

Table 3: Steps of the heuristic grouping method.

Table 6: Lot parameters of the problem with a known optimal solution.

Table 7: Initial solutions obtained by different algorithms for the problem with a known optimal solution.

Table 8: Results of the experiments using the problem with a known optimal solution.
*Optimal solution found.

Table 9: The ANOVA table.