Enhancing Scheduling Performance for a Wafer Fabrication Factory: The Biobjective Slack-Diversifying Nonlinear Fluctuation-Smoothing Rule

A biobjective slack-diversifying nonlinear fluctuation-smoothing rule (biSDNFS) is proposed in the present work to improve the scheduling performance of a wafer fabrication factory. This rule was derived from a one-factor biobjective nonlinear fluctuation-smoothing rule (1f-biNFS) by dynamically maximizing the standard deviation of the slack, which several previous studies have shown to benefit scheduling performance. The efficacy of the biSDNFS was validated with a simulated case; evidence was found to support its effectiveness. We also suggest several directions in which it can be exploited in the future.


Introduction
Semiconductor manufacturing is undoubtedly one of the most noticeable high-technology industries because semiconductor products have widespread applications. However, the life cycles of new semiconductor products are getting shorter. Therefore, semiconductor manufacturers are facing pressure to meet the various needs of customers within shorter time spans. Manufacturers consider rapid product development, agile production, shortened response times, and similar strategies to be viable. All of these strategies compress the cycle times of related processes. Of the various types of cycle times, production cycle time is particularly important because it determines the time of delivery to customers. In other words, if the production cycle time is shortened, the delivery to customers will be faster. To this end, shortening the production cycle time through effective job dispatching is an important task [1]. Much research has been done concerning semiconductor shop floor control as a special type of supervisory control [2], particularly in the domains of deterministic scheduling and job dispatching. However, Chen and Lin [3], Chen and Wang [4], and Chen [5] have noted that for semiconductor factories, job dispatching is very difficult. Theoretically, this is an NP-hard problem. In practice, many semiconductor factories suffer from lengthy cycle times and thus are not able to make favorable promises to their customers.
This study discusses how to determine the sequence of jobs to be processed on each machine in a semiconductor factory so as to shorten the cycle times of jobs. To this end, an innovative dispatching rule is proposed, which involves the applications of fuzzy logic, artificial neural networks, and mathematical programming.
In this field, some innovative dispatching rules considering job parameters have been proposed recently. For example, Chen [6] reported a nonlinear fluctuation smoothing rule that uses the divisor operator instead of the subtraction operator, which diversifies the slack and makes the nonlinear fluctuation smoothing rule more responsive to changes in the parameters. Chen and Wang [7] also proved that the effects of parameters are balanced better by a nonlinear fluctuation smoothing rule than by a traditional one if the variation in the parameters is large. In short, magnifying the difference in the slack seems to improve scheduling performance. For these reasons, a biobjective slack-diversifying nonlinear fluctuation smoothing rule is presented in this study to improve 2 Computational Intelligence and Neuroscience the scheduling of job dispatching in a wafer fabrication factory.
In a fluctuation smoothing rule, jobs that are expected to have long remaining cycle times are assigned lower slack values, which gives these jobs higher priorities to be processed and quickens their progress. There are two sorts of jobs with long remaining cycle times. The first sort comprises jobs that are just in their early stages; these jobs still have many stages to undergo. It is not necessary to deal with jobs of this type. The second type comprises jobs that have been delayed for long periods of time; these jobs have undergone few stages and have more unprocessed stages than the other jobs that started at the same time. Such a situation should be tackled somehow. However, even though these jobs have high slack values according to a fluctuation smoothing rule, they might not be assigned appropriately high priorities because sometimes many jobs have high slack values at the same time and we are not able to determine an absolute sorting for these jobs. To tackle this problem, we need a rule that is able to generate slack values that are as diverse as possible. To this end, we propose the biobjective slack-diversifying nonlinear fluctuation smoothing rule. This rule differs from 1f-biNFS because it maximizes the difference in the slack as measured by the standard deviation of the slack. There are many factors which must be optimized to achieve this goal, so a complex optimization problem must be solved to produce the rule. We apply a polynomial fitting technique to convert it into a more tractable form, for which several optimal solutions can be found. After screening some values from the specified range, the remaining values are used to construct an optimized 1f-biNFS rule.
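To make the slack mechanism concrete, the following minimal sketch computes the slack of the two traditional fluctuation smoothing rules and dispatches the job with the lowest slack first. The formulas used are the commonly cited definitions (FSMCT: n/λ − RCTE; FSVCT: R − RCTE); the `Job` fields, the job data, and the release rate are hypothetical values chosen only for illustration. The example shows how two jobs can end up with identical slack values, which is the situation that slack diversification is meant to avoid.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    index: int           # release order n (1 = first job released)
    release_time: float  # R_n, in hours
    rcte: float          # estimated remaining cycle time, in hours


def fsmct_slack(job: Job, release_rate: float) -> float:
    """FSMCT slack: n / lambda - RCTE (lambda = mean release rate)."""
    return job.index / release_rate - job.rcte


def fsvct_slack(job: Job) -> float:
    """FSVCT slack: R - RCTE."""
    return job.release_time - job.rcte


def dispatch_order(jobs, slack):
    """Sort jobs by ascending slack; the lowest slack is processed first."""
    return [job.name for job in sorted(jobs, key=slack)]


# Hypothetical jobs waiting in front of one machine.
jobs = [
    Job("early", index=3, release_time=30.0, rcte=400.0),
    Job("delayed", index=1, release_time=10.0, rcte=380.0),
    Job("normal", index=2, release_time=20.0, rcte=150.0),
]

fsvct_values = [fsvct_slack(job) for job in jobs]
fsvct_order = dispatch_order(jobs, fsvct_slack)
# Hypothetical mean release rate of 0.125 jobs per hour.
fsmct_order = dispatch_order(jobs, lambda job: fsmct_slack(job, 0.125))
```

Here the early-stage job and the delayed job receive the same FSVCT slack, so their relative order is decided only by the tie-breaking behavior of the sort rather than by the rule itself.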
The later sections of this paper are arranged in the following way. Section 2 is dedicated to the literature review. Section 3 provides the details of the proposed methodology. In Section 4, a simulated case is used to validate the effectiveness of the biobjective slack-diversifying nonlinear fluctuation smoothing rule. The performance levels of some existing rules in this field are also examined using the simulated data. Section 5 concludes this paper and points out some interesting topics for future work.

Literature Review
Semiconductor manufacturing can be divided into four stages: wafer fabrication, wafer probing, packaging, and final testing. The most important and most time-consuming stage is wafer fabrication, which starts with approximately 25 wafers grouped as a lot. This lot is passed through hundreds of operations to build up complex layers of patterned metal and wafer materials that produce the required circuitry. In this study, we investigate job dispatching for this stage. Among the various categories of methods in this field (including dispatching rules, heuristics, data mining-based approaches [8,9], agent technologies [8][10][11][12], and simulation), dispatching rules (e.g., first-in first-out (FIFO), earliest due date (EDD), least slack (LS), shortest processing time (SPT), shortest remaining processing time (SRPT), critical ratio (CR), the fluctuation smoothing rule for the mean cycle time (FSMCT), the fluctuation smoothing rule for cycle time variation (FSVCT), least total work (LTWK), modified due date (MDD), operation due date (ODD), cost over time (COVERT), FIFO+, SRPT+, and SRPT++) have received a lot of attention in recent years [8][9][10] and are also the most prevalent method in practical applications. For the details of the traditional dispatching rules, refer to Lu et al. [13]. A recent simulation comparison is presented in Chiang and Fu [14].
Some advances in this field are introduced in the following. Altendorfer et al. [15] proposed the work in parallel queue (WIPQ) rule, which aims to maximize throughput at a low level of work in process (WIP). Zhang et al. [16] proposed the dynamic bottleneck detection (DBD) approach, which classifies workstations into several categories and then applies different dispatching rules to these categories. Three dispatching rules, namely FIFO, the shortest processing time until the next bottleneck (SPNB), and CR, were used. Depending on the current conditions in the wafer fabrication factory, Hsieh et al. [9] chose one approach from FSMCT, FSVCT, largest deviation first (LDF), one step ahead (OSA), and FIFO.
Chen [17] modified FSMCT and proposed the nonlinear FSMCT (NFSMCT) rule, in which he smoothed the fluctuation in the estimated remaining cycle time and balanced it with that of the release time or the mean release rate. To diversify the slack, the division operator was applied instead of the subtraction operator. Subsequently, Chen [18] proposed the one-factor tailored NFSMCT (1f-TNFSMCT) rule and the one-factor tailored nonlinear FSVCT (1f-TNFSVCT) rule. Both rules contain an adjustable parameter in order to customize them for a target wafer fabrication factory. As a multiple-objective study, Chen et al. [19] proposed a biobjective nonlinear fluctuation smoothing rule with an adjustable factor (1f-biNFS) to optimize the average cycle time and cycle time variation at the same time. More degrees of freedom seem to be conducive to the performance of customizable rules. For this reason, Chen et al. [19] extended 1f-biNFS to a biobjective fluctuation smoothing rule with four adjustable factors (4f-biNFS). For a summary of these rules refer to Table 1. One drawback of these rules is that only static factors are used, and these factors need to be determined in advance. To this end, most studies (e.g., [17][18][19]) have performed extensive simulation. Such simulation is not only time consuming but also fails to consider enough possible combinations of these factors. Chen [6] established a mechanism that was able to adjust factor values for 1f-biNFS dynamically (dynamic 1f-biNFS). However, even though satisfactory results were obtained in that experiment, there was no theoretical basis supporting the proposed mechanism. Chen [20] tried to relate the scheduling performance to the factor values with a back propagation network (BPN). Artificial neural networks have been widely applied to various control fields [21][22][23]. When such applications work, one can find the factor values that contribute to optimal scheduling performance. However, the explanatory ability of the BPN was not sufficient.

Methodology
The variables are defined as follows: (2) BQ i : the total queue length before the bottlenecks at  Obviously, Replacing all variables with their estimates gives

Remaining Cycle Time Estimation.
Before applying the biobjective slack-diversifying nonlinear fluctuation smoothing rule, the remaining cycle time required for each job must be estimated in advance. There is not a great deal of research in this field, but the fuzzy c-means (FCM) and fuzzy back propagation network (FBPN) approach of Chen et al. [24] has been shown to be effective [25][26][27] and therefore has been used in this study. In the FCM-FBPN approach, FCM is first used to cluster jobs with similar attributes. FCM performs classification by minimizing the following objective function:
$$\min \sum_{k=1}^{K} \sum_{i=1}^{n} \mu_{i(k)}^{m} e_{i(k)}^{2}$$
where K is the required number of categories; n is the number of jobs; $\mu_{i(k)}$ represents the membership of job i belonging to category k; $e_{i(k)}$ measures the distance from job i to the centroid of category k; m ∈ [1, ∞) is a parameter to increase or decrease the fuzziness. The procedure of applying FCM to classify jobs is as follows.

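(2) Iterations: obtain the centroid of each category as
$$\bar{x}_{(k)} = \frac{\sum_{i=1}^{n} \left(\mu_{i(k)}^{(t)}\right)^{m} x_{i}}{\sum_{i=1}^{n} \left(\mu_{i(k)}^{(t)}\right)^{m}}$$
where $\bar{x}_{(k)}$ is the centroid of category k, $x_{i}$ is the attribute vector of job i, and $\mu_{i(k)}^{(t)}$ is the membership of job i belonging to category k after the tth iteration.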
(3) Remeasure the distance of each job to the centroid of every category, and then recalculate the corresponding membership.
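(4) Stop if the following condition is satisfied. Otherwise, return to step (2):
$$\max_{k} \max_{i} \left| \mu_{i(k)}^{(t)} - \mu_{i(k)}^{(t-1)} \right| < d$$
where d is a real number representing the threshold of membership convergence.
Finally, the separate distance test (S test) proposed by Xie and Beni [28] can be applied to determine the optimal number of categories K:
$$\min S = \frac{J_{m}}{n \times e_{\min}^{2}}$$
subject to
$$J_{m} = \sum_{k=1}^{K} \sum_{i=1}^{n} \mu_{i(k)}^{m} e_{i(k)}^{2}, \quad e_{\min}^{2} = \min_{p \neq q} e_{(p)(q)}^{2}, \quad K \in Z^{+}$$
where $e_{(p)(q)}$ is the distance between the centroids of categories p and q. The K value minimizing S determines the optimal number of categories.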
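The FCM iteration and the S test described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in the study: the random initial memberships, the Euclidean distance, the default values of m and d, and the sample data are all assumptions, and the S value is computed with the standard Xie-Beni definition.

```python
import math
import random


def fcm(points, K, m=2.0, d=1e-4, max_iter=200, seed=0):
    """Fuzzy c-means; requires m > 1 for the membership update."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Assumed initialization: random memberships, each row summing to 1.
    mu = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(K)]
        total = sum(row)
        mu.append([v / total for v in row])
    for _ in range(max_iter):
        # Step (2): centroid of each category from the current memberships.
        centroids = []
        for k in range(K):
            w = [mu[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centroids.append(
                [sum(w[i] * points[i][a] for i in range(n)) / w_sum
                 for a in range(dim)]
            )
        # Step (3): remeasure distances and recalculate memberships.
        dist = [[max(math.dist(points[i], centroids[k]), 1e-12)
                 for k in range(K)] for i in range(n)]
        new_mu = [
            [1.0 / sum((dist[i][k] / dist[i][q]) ** (2.0 / (m - 1.0))
                       for q in range(K))
             for k in range(K)]
            for i in range(n)
        ]
        # Step (4): stop when the largest membership change is below d.
        change = max(abs(new_mu[i][k] - mu[i][k])
                     for i in range(n) for k in range(K))
        mu = new_mu
        if change < d:
            break
    return centroids, mu


def xie_beni(points, centroids, mu, m=2.0):
    """S = J_m / (n * e_min^2); needs at least two categories."""
    n, K = len(points), len(centroids)
    j_m = sum(mu[i][k] ** m * math.dist(points[i], centroids[k]) ** 2
              for i in range(n) for k in range(K))
    e_min_sq = min(math.dist(centroids[p], centroids[q]) ** 2
                   for p in range(K) for q in range(p + 1, K))
    return j_m / (n * e_min_sq)


# Hypothetical two-attribute job data forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
centroids, mu = fcm(points, K=2)
s_value = xie_beni(points, centroids, mu)
labels = [max(range(2), key=lambda k: row[k]) for row in mu]
```

In practice, `fcm` would be run for several candidate values of K and the K with the smallest `xie_beni` value retained.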
The remaining cycle time of a job that is being processed in a wafer fabrication factory is the time still required to complete the job. If the job has just been released into the wafer fabrication factory, then the remaining cycle time of the job is its cycle time. The remaining cycle time is an important performance measure for all work-in-progress (WIP) in a wafer fabrication factory. To predict the remaining cycle time, we usually subtract the step cycle time from the cycle time forecast:
$$RCTE_{nj} = CTE_{n} - SCTE_{nj}$$
For this reason, we need to predict both the cycle time and the step cycle time.
After clustering, a portion of the jobs in each category is fed back into the FBPN as "training examples" in order to determine the parameter values for the category. The configuration of the FBPN is as follows.
(1) Inputs: eight parameters are associated with the nth example/job, including $U_n$, $Q_n$, $BQ_n$, $FQ_n$, and $WIP_n$.
(2) There is a single hidden layer.
(3) The number of neurons in the hidden layer is the same as the number of neurons in the input layer.
(4) Output: the estimated (normalized) cycle time ($CTE_n$) or estimated step cycle time ($SCTE_{nj}$) of the example. In other words, there are two groups of BPNs. The first group estimates the $CTE_n$ values of all the jobs to be scheduled, while the other group estimates their $SCTE_{nj}$ values. The remaining cycle time estimate ($RCTE_{nj}$) can be derived by subtracting $SCTE_{nj}$ from $CTE_n$.
(5) The network learning rule is the Delta rule.
(6) The transformation function is the sigmoid function, $f(x) = 1/(1 + e^{-x})$.
(8) Initial conditions: because FBPNs tend to be very sensitive to initial conditions, a GA is employed in this study to generate the initial values of the connection weights in the FBPN. Each chromosome is a vector of about 132 connection weights (see Figure 1). The connection weights are read off the FBPN and placed in a vector from left to right and from top to bottom. Each gene in the chromosome is a real number instead of a bit. To calculate the fitness of a given chromosome, the connection weights in the chromosome are assigned to the corresponding connections in the FBPN, the FBPN is trained using the training data, and the RMSE is returned. A low RMSE value indicates high fitness. An initial population of 100 vectors is chosen randomly, with each connection weight set to a uniformly distributed random value between −1.0 and +1.0. The mutation operator selects n noninput neurons and, for each incoming connection to those neurons, adds a uniformly distributed random value between −1.0 and +1.0 to the connection weight. The crossover operator takes two parent connection weight vectors; each noninput neuron in the offspring vector selects one of the parents randomly.
In the forward phase, the hidden-layer outputs $h_j$ are transferred to the output layer with the same procedure, and the output o of the FBPN is then generated. To improve the applicability of the FBPN and to facilitate comparisons with conventional techniques, the fuzzy-valued output o is defuzzified. In the backward phase, the deviation between o and a is propagated backward, and the error terms of the neurons in the output and hidden layers are calculated. Based on these error terms, the adjustments to be made to the connection weights and thresholds are obtained with the basic gradient descent algorithm. For details refer to Chen [29] and Pendharkar [30]. To accelerate convergence, a momentum term can be added to the learning expressions.
Theoretically, network learning stops when the RMSE falls below a prespecified level, when the improvement in the RMSE becomes negligible over several epochs, or when a large number of epochs have already been run. Then test examples are fed into the FBPN and the accuracy of the network is measured with the RMSE. However, the accumulation of fuzziness during the training process continuously increases the lower bound, the upper bound, and the spread of the fuzzy-valued output o (and those of many other fuzzy parameters); this might prevent the RMSE (calculated with the defuzzified output o) from converging to its minimal value. Conversely, network learning tends to shrink the centers of some fuzzy parameters. A fuzzy parameter can become invalid if its lower bound is higher than its center. To deal with this problem, the lower and upper bounds of all fuzzy numbers in the FBPN are no longer modified once a dedicated index converges to a minimal value. Finally, the FBPN can be applied to estimate the cycle time or the step cycle time of a new job. When a new job is released into the factory, the eight parameters associated with the new job are recorded. Then the FBPN is applied to estimate the cycle time or step cycle time of the new job.
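The GA operators used to generate the initial connection weights can be sketched as below. This is an illustrative sketch only: the network layout (`fan_ins`), the assumption that an offspring neuron copies all of its incoming weights from the parent it selects, and the `toy_rmse` stand-in for the fitness evaluation are hypothetical. In the approach described above, the fitness evaluation would instead train the FBPN with the chromosome's weights and return the resulting RMSE.

```python
import math
import random


def random_chromosome(fan_ins, rng):
    """One weight list per noninput neuron, each weight uniform in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(f)] for f in fan_ins]


def initial_population(fan_ins, size, rng):
    return [random_chromosome(fan_ins, rng) for _ in range(size)]


def mutate(chrom, n_neurons, rng):
    """Pick n noninput neurons; perturb each of their incoming weights."""
    child = [list(weights) for weights in chrom]
    for idx in rng.sample(range(len(child)), n_neurons):
        child[idx] = [w + rng.uniform(-1.0, 1.0) for w in child[idx]]
    return child


def crossover(parent_a, parent_b, rng):
    """Each noninput neuron selects one parent at random and (assumed here)
    copies that parent's incoming weights."""
    return [list(rng.choice((wa, wb))) for wa, wb in zip(parent_a, parent_b)]


def toy_rmse(chrom):
    """Hypothetical stand-in for 'train the FBPN and return the RMSE'."""
    weights = [w for neuron in chrom for w in neuron]
    return math.sqrt(sum(w * w for w in weights) / len(weights))


rng = random.Random(42)
# Hypothetical layout: 8 hidden neurons with 8 inputs each, 1 output neuron.
fan_ins = [8] * 8 + [8]
population = initial_population(fan_ins, 100, rng)
best = min(population, key=toy_rmse)  # low RMSE means high fitness
child = crossover(population[0], population[1], rng)
mutant = mutate(child, 2, rng)
```

Storing the weights neuron by neuron keeps both operators simple, because mutation and crossover in this scheme act on all incoming connections of a neuron at once.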

The Bicriteria Slack-Diversifying Nonlinear Fluctuation Smoothing Rule.
The bicriteria slack-diversifying nonlinear fluctuation smoothing rule is derived by diversifying the slack in the 1f-biNFS rule. The following two theorems explain the theoretical properties of 1f-biNFS.
Theorem 1. 1f-biNFS is more responsive than the traditional fluctuation smoothing rules to changes in $R_n$ under the condition on $RCTE_{nj}$ and $R_n$ established in [6].
Theorem 2. The effects of parameters are balanced better by 1f-biNFS than by the traditional fluctuation smoothing rules if $RCTE_{nj} - \min(RCTE_{nj}) \geq R_n - \min(R_n)$, that is, if the variation in $RCTE_{nj}$ is greater than that in $R_n$, which is a common phenomenon in a wafer fabrication factory [7].
However, (21) is difficult to deal with. For this reason, a polynomial fitting technique is used to convert it into the more tractable form (23). The mean absolute percentage error (MAPE) of (23) is less than 5% when x ≤ 20. The MAPE will not be a serious problem, since it is the ξ value associated with the maximum $\sigma_{SK_{ij}}$ that is to be found, not the $SK_{ij}$ values. Such a polynomial fitting technique is especially effective when x exceeds 1 (see Figure 2). Applying (23) to (21) yields an expression in terms of $d_{ij} = 0.94 a_i c_{ij} + 0.02 b_i c_{ij}$. To diversify the slack, the standard deviation of the slack is to be maximized, which is equivalent to maximizing the term in (27). Taking the derivative of (27) with respect to ξ and setting it equal to zero gives the candidate solutions from which the optimal solution ξ* is derived. Further, (31) and (32) are complex numbers that will only be considered if their imaginary parts are equal to zero. An example is given in Table 2 to illustrate the procedure mentioned previously. The optimal solution is ξ* = 0.94 with the maximum $\sigma_{SK_{ij}}$ equal to 64115.3. Finally, ξ* can be used to construct an optimized 1f-biNFS.
However, it is possible that a job might have a very high or a very low slack value, which could distort the results. For this reason, we exclude the jobs with the highest or lowest slack value from (32), that is, the jobs whose slack values belong to $Q = \{\max(SK_{lj}), \min(SK_{lj})\}$. In the previous example, after excluding the minimum and maximum slack values, the optimal value of ξ was determined to be 0.11. We compared the results associated with the two settings in Figure 3. The second setting achieved better slack diversification because it excluded the minimum and maximum slack values.
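The idea of choosing ξ to maximize the standard deviation of the slack, with and without the extreme slack values, can be illustrated numerically. The sketch below replaces the closed-form derivation with a simple grid search over candidate ξ values; the `toy_slack` function, the job data, and the use of the population standard deviation are hypothetical choices made only so that the example is self-contained, and `toy_slack` is not the 1f-biNFS slack.

```python
import statistics


def trimmed(values):
    """Drop one occurrence each of the minimum and the maximum value."""
    return sorted(values)[1:-1]


def best_xi(jobs, slack_fn, candidates, trim=False):
    """Return the candidate xi that maximizes the slack standard deviation."""
    def spread(xi):
        slacks = [slack_fn(job, xi) for job in jobs]
        if trim:
            slacks = trimmed(slacks)
        return statistics.pstdev(slacks)
    return max(candidates, key=spread)


def toy_slack(job, xi):
    """Hypothetical slack: a xi-weighted blend of two job attributes."""
    a, b = job
    return xi * a + (1.0 - xi) * b


# Hypothetical jobs; the last one has an extreme first attribute.
jobs = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (100.0, 30.0)]
candidates = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

xi_all = best_xi(jobs, toy_slack, candidates)
xi_trimmed = best_xi(jobs, toy_slack, candidates, trim=True)
```

With these data the untrimmed search is driven by the single extreme job and selects ξ = 1.0, whereas the trimmed search selects ξ = 0.0. This mirrors the observation above that excluding the minimum and maximum slack values can change the optimal ξ substantially.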

Experimental Results and Discussions
The effectiveness of the biobjective slack-diversifying nonlinear fluctuation smoothing rule was assessed with simulated data. To this end, a memory fabrication factory was simulated with a monthly capacity of up to 32,000 wafers. In the wafer fabrication factory, more than 500 workstations were devoted to single-wafer or batch production using 58 nm to 110 nm technologies. The large scale and the reentrant process flows made production control in the wafer fabrication factory a very tough task. The release policy was uniform; that is, jobs were released into the factory at a fixed interval, as is common in memory fabrication factories. FIFO was employed to sequence jobs on most of the workstations. The research sought to replace FIFO with better rules that might shorten the average cycle times and quicken deliveries to customers.
Although there were more than 10 products in the wafer fabrication factory, this research only considered the two major products that occupied most of the factory capacity; these were labeled A and B. The simulated jobs were assigned various priorities. Jobs with higher priorities were to be processed first.
Nine existing approaches, FIFO, earliest due date (EDD), shortest remaining processing time (SRPT), CR, FSVCT, FSMCT, 1f-TNFSVCT, 1f-TNFSMCT, and 1f-biNFS, were evaluated for the simulated data. In EDD and CR, the internal due date of a job was determined by changing the cycle time multiplier [19]. Then, from several possible values, the value that gave the best performance was chosen (see Figures 4 and 5). Candidate values of ξ in 1f-TNFSMCT and 1f-TNFSVCT were taken from a list of possible values (0.1, 0.2, . . ., 1), and the ξ-value that returned the best schedule was taken as the output of the rule. The value of the factor in 1f-biNFS was determined in a similar way. The average cycle time and the cycle time standard deviation for each product and priority were compared for all approaches, as summarized in Tables 3 and 4.
(1) Table 3 compares the performance levels of these methods with respect to the average cycle time. From the tabulated results, it is obvious that the biobjective slack-diversifying nonlinear fluctuation smoothing rule effectively shortened the average cycle times; for product B with normal priority, it was more than 10% better than FIFO. All the compared approaches were inferior to the biobjective slack-diversifying nonlinear fluctuation smoothing rule in this respect.
(2) At the same time, it can be seen from Table 4 that the cycle time standard deviation was also controlled by applying the biobjective slack-diversifying nonlinear fluctuation smoothing rule. For a job of product A with the greatest time requirement and superhigh priority, the deviation of the cycle time from the average value was only 13 hours. This is remarkable for job dispatching in a wafer fabrication factory and contributes to reliable due date promises.
(3) From Figures 4 and 5, it is obvious that the effects of the cycle time multiplier on EDD and CR were quite different, even though they employed the same method to determine the internal due date.
(4) The biobjective slack-diversifying nonlinear fluctuation smoothing rule was better than the 1f-biNFS, with regard to both the average cycle time and the cycle time standard deviation. The advantages were 9% and 25% on average, respectively, which confirmed the usefulness of factor optimization to tailored rules like 1f-biNFS.
To determine whether the differences between the performance of the biobjective slack-diversifying nonlinear fluctuation smoothing rule and those of the nine existing approaches were significant, hypothesis tests were carried out. The advantage of the biobjective slack-diversifying nonlinear fluctuation smoothing rule over six existing dispatching rules in reducing cycle time standard deviation was significant at α = 0.025.

Conclusions and Directions for Future Research
In this paper, we have presented a biobjective slack-diversifying nonlinear fluctuation smoothing rule modified from 1f-biNFS. Our new rule provides superior performance for job dispatching in a wafer fabrication factory. Our new rule maximizes the standard deviation of the slack dynamically; many studies have considered this feature to be conducive to scheduling performance. A simulation experiment was set up to validate the effectiveness of the biobjective slack-diversifying nonlinear fluctuation smoothing rule.
(1) The biobjective slack-diversifying nonlinear fluctuation smoothing rule incorporates the concept of factor optimization, so as to avoid the drawbacks of existing tailored nonlinear fluctuation smoothing rules. Through self-adjustment and continuous response to the changing conditions in the wafer fabrication factory, the biobjective slack-diversifying nonlinear fluctuation smoothing rule proved itself to be an effective dispatching rule in the simulation experiment.
(2) The effectiveness of the biobjective slack-diversifying nonlinear fluctuation smoothing rule was fully revealed by the overall improvement in the scheduling performance, which was also examined and confirmed by statistical analyses.
(3) The biobjective nature of the biobjective slack-diversifying nonlinear fluctuation smoothing rule was best revealed by the simultaneous improvements in the average cycle time and cycle time standard deviation.
Conversely, there are also disadvantages or limitations associated with the proposed methodology.
(1) The way of diversifying the slack in the proposed methodology is subjective. For the same purpose, there are many other possible ways that can be tried to achieve better performance.
(2) Compared with the existing dispatching rules, the proposed method requires more time to estimate the remaining cycle time and to optimize the rule content.
However, the same concept can be applied to optimize other rules to pursue better scheduling performance. This might be examined in future studies. In addition, to further evaluate the advantages and disadvantages of the proposed methodology, it has to be applied to a full-scale actual semiconductor factory.