A Nonlinear Programming and Artificial Neural Network Approach for Optimizing the Performance of a Job Dispatching Rule in a Wafer Fabrication Factory

A nonlinear programming and artificial neural network approach is presented in this study to optimize the performance of a job dispatching rule in a wafer fabrication factory. The proposed methodology fuses two existing rules and constructs a nonlinear programming model to choose the best values of parameters in the two rules by dynamically maximizing the standard deviation of the slack, which has been shown to benefit scheduling performance by several studies. In addition, a more effective approach is also applied to estimate the remaining cycle time of a job, which is empirically shown to be conducive to the scheduling performance. The efficacy of the proposed methodology was validated with a simulated case; evidence was found to support its effectiveness. We also suggested several directions in which it can be exploited in the future.


Introduction
This study attempts to optimize the performance of a job dispatching rule in a wafer fabrication factory. The production equipment required by a wafer fabrication factory is very expensive and must be fully utilized. For this purpose, ensuring that the capacity does not substantially exceed the demand is a prerequisite. Subsequently, planning the use of the existing capacity to shorten the cycle time and maximize the turnover rate is an important goal. In this regard, scheduling is undoubtedly a very useful tool.
However, some studies [1][2][3][4] noted that job dispatching is a very difficult task in a semiconductor manufacturing factory. Theoretically, it is an NP-hard problem. In practice, many semiconductor manufacturing factories suffer from lengthy cycle times and are not able to improve on their delivery promises to their customers.
Semiconductor manufacturing can be divided into four stages: wafer fabrication, wafer probing, packaging, and final testing. The most important stage is wafer fabrication. It is also the most time-consuming one. In this study, we investigated job dispatching for this stage. This field includes many different methods, including dispatching rules, heuristics, data-mining-based approaches [5,6], agent technologies [5, 7-9], and simulation. Among them, dispatching rules (e.g., first-in first-out (FIFO), earliest due date (EDD), least slack (LS), shortest processing time (SPT), shortest remaining processing time (SRPT), critical ratio (CR), the fluctuation smoothing rule for the mean cycle time (FSMCT), the fluctuation smoothing rule for cycle time variation (FSVCT), FIFO+, SRPT+, and SRPT++) have all received a lot of attention over the last few years [5][6][7] and are the most prevalent methods used in practical applications. For details on the traditional dispatching rules, please refer to Lu et al. [10]. Some advances in this field are as follows. Altendorfer et al. [11] proposed the work in parallel queue (WIPQ) rule, which targets maximizing throughput at a low level of work in process (WIP). Zhang et al. [12] proposed the dynamic bottleneck detection (DBD) approach, which classifies workstations into several categories and then applies different dispatching rules to these categories. They used three dispatching rules: FIFO, the shortest processing time until the next bottleneck (SPNB), and CR. Based on the current conditions in the wafer fabrication factory, Hsieh et al. [6] chose one approach from FSMCT, FSVCT, largest deviation first (LDF), one step ahead (OSA), or FIFO. Chen [13] modified FSMCT and proposed the nonlinear FSMCT (NFSMCT) rule, in which he smoothed the fluctuation in the estimated remaining cycle time and balanced it with that of the release time or the mean release rate. To diversify the slack, he applied the "division" operator instead. This was followed by Chen [14], in which he proposed the one-factor-tailored NFSMCT (1f-TNFSMCT) rule and the one-factor-tailored nonlinear FSVCT (1f-TNFSVCT) rule. Both rules contain an adjustable parameter to allow them to be customized for a target wafer fabrication factory. Chen [15] used more parameters and proposed 2f-TNFSMCT and 2f-TNFSVCT.
In a multiple-objective study, Chen and Wang [16] proposed a biobjective nonlinear fluctuation smoothing rule with an adjustable factor (1f-biNFS) to optimize both the average cycle time and the cycle time variation at the same time. More degrees of freedom seem to be helpful in the performance of customizable rules. For this reason, Chen et al. [17] extended 1f-biNFS to a biobjective fluctuation smoothing rule with four adjustable factors (4f-biNFS). For a summary of these rules, please refer to Table 1. One drawback of these rules is that only static factors are used, and they must be determined in advance. To this end, most studies (e.g., [13][14][15][16][17]) performed extensive simulations. This is not only time-consuming, but it also fails to consider enough possible combinations of these factors. Chen [18] established a mechanism that was able to adjust the value of the factor in 1f-biNFS dynamically (dynamic 1f-biNFS). However, even though satisfactory results were obtained in his experiment, there was no theoretical basis supporting the proposed mechanism. Chen [19] attempted to relate the scheduling performance to the factor values using a back propagation network (BPN). Had that worked, the factor values contributing to the optimal scheduling performance could have been found. However, the explanatory ability of the BPN was not good enough.
At the same time, Chen [18] stated that a nonlinear fluctuation smoothing rule uses the division operator instead of the subtraction operator, which diversifies the slack and makes the nonlinear fluctuation smoothing rule more responsive to changes in the parameters. Chen and Wang [16] proved that the effects of the parameters are balanced better in a nonlinear fluctuation smoothing rule than in a traditional one if the variation in the parameters is large. In addition, there will be fewer ties since the slack values are very different. Further, magnifying the difference in the slack seems to improve the scheduling performance, especially with respect to the average cycle time [20]. For these reasons, a slack-diversifying fuzzy-neural rule was used in Chen et al. [20] to further improve the performance of job dispatching in a wafer fabrication factory. The slack-diversifying nonlinear fluctuation smoothing rule is modified from 1f-TNFSVCT by maximizing the difference in the slack, measured with the standard deviation of the slack.
This study adopts several treatments to further improve Wang et al.'s approach.
(1) In nonlinear fluctuation smoothing rules, it is common that some jobs have very large or very small slack values, that is, the extreme cases (see Figure 1), which usually distort the results of calculating the standard deviation of the slack. In this study, the extreme cases are excluded before the calculation.
(2) Two objectives, the average cycle time and the cycle time standard deviation, are considered at the same time by fusing the results from 2f-TNFSMCT and 2f-TNFSVCT.
(3) A nonlinear programming problem is solved to find the optimal values of the parameters in 2f-TNFSMCT and 2f-TNFSVCT.
(4) The remaining cycle time of a job needs to be estimated in 2f-TNFSMCT and 2f-TNFSVCT. For this reason, we also propose a more effective fuzzy-neural approach to estimate the remaining cycle time of a job. The fuzzy-neural approach is a modification of the fuzzy c-means and back propagation network (FCM-BPN) approach [17] that incorporates the concept of principal component analysis (PCA). According to Chen and Wang [3], with more accurate remaining cycle time estimation, the scheduling performance of a fluctuation smoothing rule can be significantly improved. In the original study, Chen and Wang used a gradient search algorithm for training the BPN, which is time-consuming and not very accurate. In this study, we use the Levenberg-Marquardt algorithm for the same purpose, which is more efficient than the algorithm in Chen and Wang's study and can produce more accurate forecasts.
The differences between the proposed methodology and the previous methods are summarized in Table 1.
The remainder of this paper is arranged as follows. Section 2 provides the details of the proposed methodology. In Section 3, a simulated case is used to validate the effectiveness of the nonlinear programming and artificial neural network approach. The performances of some existing approaches in this field are also examined using the simulated data. Finally, we draw our conclusions in Section 4 and provide some worthwhile topics for future work.

Methodology
The variables and parameters that will be used in the proposed methodology are defined in the following.
(1) $R_j$: the release time of job $j$; $j = 1 \sim n$.
(2) $\mathrm{BQ}_j$: the total queue length before bottlenecks at $R_j$.
(3) $\mathrm{CR}_{ju}$: the critical ratio of job $j$ at step $u$.
(4) $\mathrm{CT}_j$: the cycle time of job $j$.
(5) $\mathrm{CTE}_j$: the estimated cycle time of job $j$.
(6) $D_j$: the average delay of the three most recently completed jobs at $R_j$.
(7) $\mathrm{DD}_j$: the due date of job $j$.
(8) $\mathrm{FQ}_j$: the total queue length in the whole factory at $R_j$.
(9) $Q_j$: the queue length on the processing route of job $j$ at $R_j$.
(10) $\mathrm{RCTE}_{ju}$: the estimated remaining cycle time of job $j$ from step $u$.
(11) $\mathrm{RPT}_{ju}$: the remaining processing time of job $j$ from step $u$.
(12) $\mathrm{SCT}_{ju}$: the step cycle time of job $j$ until step $u$.
(13) $\mathrm{SK}_{ju}$: the slack of job $j$ at step $u$.
(14) $U_j$: the average factory utilization before job $j$ is released. If the utilization of the factory is reported on a daily basis, then $U_j$ is the utilization of the day before job $j$ is released.
(15) $\mathrm{WIP}_j$: the factory work in progress (WIP) at $R_j$.
(18) $h_l$: the output from hidden-layer node $l$, $l = 1 \sim L$.
The proposed methodology includes the following seven steps.
Step 2. Use FCM to classify jobs. The required inputs for this step are the new variables determined by PCA. To determine the optimal number of categories, we use the S-test. The output of this step is the category of each job.
Step 3. Use the BPN approach to estimate the cycle time of each job. Jobs of different categories will be sent to different three-layer BPNs. The inputs to the three-layer BPN include the new variables of a job, while the output is the estimated cycle time of the job.
Step 4. Derive the remaining cycle time of each job from the estimated cycle time.
Step 5. Incorporate the estimated remaining cycle time into the new rule, which is composed of two subrules, 2f-TNFSMCT and 2f-TNFSVCT.
Step 6. Find the optimal values of the parameters in the new rule by solving a nonlinear programming problem.
The remaining cycle time of a job being produced in a wafer fabrication factory is the time still needed to complete the job. If the job has just been released into the wafer fabrication factory, then the remaining cycle time of the job is its cycle time. The remaining cycle time is an important input for the scheduling rule. Past studies (e.g., [21][22][23][24]) have shown that the accuracy of remaining cycle time forecasting can be improved by job classification. Soft computing methods (e.g., [3,20,25,26]) have received much attention in this field.

PCA Analysis.
First, PCA is used to replace the inputs to the FCM-BPN. The combination of PCA and FCM has been shown to be a more effective classifier than FCM alone. Although there are more advanced applications of PCA, in this study PCA is used to enhance the efficiency of training the FCM-BPN. PCA consists of the following four steps.
(1) Raw data standardization: to eliminate the difference between the dimensions and the impact of large numerical differences in the original variables $\{U_j, Q_j, \mathrm{BQ}_j, \mathrm{FQ}_j, \mathrm{WIP}_j, D_j\}$, the original variables are standardized:
$$x^*_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j},$$
where $x_{ij}$ is the value of variable $j$ for example $i$, and $\bar{x}_j$ and $\sigma_j$ indicate the mean and standard deviation of variable $j$, respectively.
(2) Establishment of the correlation matrix $R$:
$$R = \frac{1}{n-1} X^{*T} X^*,$$
where $X^*$ is the standardized data matrix. The eigenvalues and eigenvectors of $R$ are calculated and represented as $\lambda_1 \sim \lambda_m$ and $u_1 \sim u_m$, respectively.
(3) Determination of the number of principal components: the variance contribution rate is calculated as
$$\eta_q = \frac{\lambda_q}{\sum_{r=1}^{m} \lambda_r},$$
and the accumulated variance contribution rate is
$$\eta_\Sigma(q) = \sum_{r=1}^{q} \eta_r.$$
Choose the smallest $q$ value such that $\eta_\Sigma(q) \geq 85\% \sim 90\%$.
(4) Formation of the component score matrix from the first $q$ eigenvectors, which gives the $q$ new variables of each example.
After PCA, examples are then classified using FCM.
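The four PCA steps can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function name `pca_new_variables`, the default 85% threshold (the lower end of the range quoted above), and the use of the sample standard deviation are choices made here for the example.

```python
import numpy as np


def pca_new_variables(X, threshold=0.85):
    """Return (Z, q, eta): the q new variables, q itself, and the contribution rates.

    X is an n-by-m array whose columns are the original variables
    (e.g., U_j, Q_j, BQ_j, FQ_j, WIP_j, D_j). Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape

    # (1) Standardize every original variable to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # (2) Correlation matrix of the standardized data and its eigen-decomposition.
    R = X_std.T @ X_std / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]         # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # (3) Variance contribution rates and the smallest q reaching the threshold.
    eta = eigvals / eigvals.sum()
    q = min(m, int(np.searchsorted(np.cumsum(eta), threshold)) + 1)

    # (4) Component scores: project the standardized data on the first q eigenvectors.
    Z = X_std @ eigvecs[:, :q]
    return Z, q, eta
```

Calling `pca_new_variables(X)` on an array with one row per job returns the new variables `Z` that would then be passed to the classifier.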

The FCM Approach.
In the proposed methodology, jobs are classified into K categories using FCM. If a crisp clustering method is applied, then it is possible that some clusters will have very few examples. In contrast, an example belongs to multiple clusters to different degrees in FCM, which provides a solution to this problem.
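FCM classifies jobs by minimizing the following objective function:
$$\min \sum_{k=1}^{K} \sum_{j=1}^{n} \mu_{j(k)}^{m} e_{j(k)}^{2},$$
where $K$ is the required number of categories; $n$ is the number of jobs; $\mu_{j(k)}$ indicates the membership that job $j$ belongs to category $k$; $e_{j(k)}$ measures the distance from job $j$ to the centroid of category $k$; $m \in [1, \infty)$ is a parameter to adjust the fuzziness and is usually set to 2. The procedure of FCM is as follows.
(2) (Iterations) Calculate the centroid of each category as
$$\bar{x}_{(k)} = \frac{\sum_{j=1}^{n} \mu_{j(k)}^{m} x_j}{\sum_{j=1}^{n} \mu_{j(k)}^{m}}, \qquad \mu_{j(k)} = \frac{1}{\sum_{l=1}^{K} \left( e_{j(k)} / e_{j(l)} \right)^{2/(m-1)}},$$
where $\bar{x}_{(k)}$ is the centroid of category $k$, and $\mu^{(t)}_{j(k)}$ is the membership that job $j$ belongs to category $k$ after the $t$th iteration.
(3) Remeasure the distance from each job to the centroid of each category, and then recalculate the corresponding membership.
(4) Stop if the following condition is met. Otherwise, return to step (2):
$$\max_{k} \max_{j} \left| \mu^{(t)}_{j(k)} - \mu^{(t-1)}_{j(k)} \right| < d,$$
where $d$ is a real number representing the threshold for the convergence of membership.
Finally, the separate distance test (S-test) proposed by Xie and Beni [24] can be applied to determine the optimal number of categories $K$:
$$\min S = \frac{J_m}{n \cdot e_{\min}^{2}}$$
subject to
$$J_m = \sum_{k=1}^{K} \sum_{j=1}^{n} \mu_{j(k)}^{m} e_{j(k)}^{2}, \qquad e_{\min}^{2} = \min_{k_1 \neq k_2} \left\lVert \bar{x}_{(k_1)} - \bar{x}_{(k_2)} \right\rVert^{2}.$$
The $K$ value minimizing $S$ determines the optimal number of categories.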
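The FCM iterations and the S-test can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names `fcm` and `s_test`, the random initialization of the memberships, the Euclidean distance, and the default values of `d` and `max_iter` are assumptions made for the example.

```python
import numpy as np


def fcm(X, K, m=2.0, d=1e-6, max_iter=300, seed=0):
    """Fuzzy c-means; returns (memberships n-by-K, centroids K-by-dim)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: random memberships, each row normalized to sum to 1.
    mu = rng.random((X.shape[0], K))
    mu /= mu.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centroid of each category: membership-weighted mean of the jobs.
        w = mu ** m
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
        # Distance from every job to every centroid (clipped to avoid division by zero).
        e = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        e = np.maximum(e, 1e-12)
        # mu_j(k) = 1 / sum_l (e_j(k) / e_j(l)) ** (2 / (m - 1))
        ratio = (e[:, :, None] / e[:, None, :]) ** (2.0 / (m - 1.0))
        new_mu = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(new_mu - mu)) < d
        mu = new_mu
        if converged:
            break
    return mu, centroids


def s_test(X, mu, centroids, m=2.0):
    """Xie-Beni style separation index S; smaller values indicate a better K."""
    X = np.asarray(X, dtype=float)
    e2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    J_m = np.sum((mu ** m) * e2)
    gaps = np.sum((centroids[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    e2_min = gaps[~np.eye(len(centroids), dtype=bool)].min()
    return J_m / (X.shape[0] * e2_min)
```

In use, `fcm` would be run for each candidate value of K and the value giving the smallest `s_test` result would be retained.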

The BPN Approach.
After clustering, a portion of the jobs in each category is input as the "training examples" to the three-layer BPN to determine the parameter values. The configuration of the three-layer BPN is set up as follows. First, the inputs are the parameters associated with the $j$th example/job, namely, the $q$ new variables. These parameters have to be normalized before being fed into the three-layer BPN. Subsequently, there is only a single hidden layer, with twice as many neurons as the input layer. Finally, the output from the three-layer BPN is the (normalized) estimated cycle time ($\mathrm{CTE}_j$) of the example. The activation function used in each layer is the log-sigmoid function:
$$f(x) = \frac{1}{1 + e^{-x}}.$$
The procedure for determining the parameter values is now described. Two phases are involved at the training stage. At first, in the forward phase, inputs are multiplied by weights, summed, and transferred to the hidden layer. Then activated signals are output from the hidden layer as
$$h_l = \frac{1}{1 + e^{-n^h_l}}, \qquad n^h_l = \sum_{i=1}^{q} w^h_{il} x_{ji} - \theta^h_l,$$
where $x_{ji}$ is the $i$th input of job $j$ and $w^h_{il}$ is the connection weight between input node $i$ and hidden-layer node $l$. The $h_l$'s are also transferred to the output layer with the same procedure. Finally, the output of the BPN is generated as
$$o_j = \frac{1}{1 + e^{-n^o}}, \qquad n^o = \sum_{l=1}^{L} w^o_l h_l - \theta^o.$$
Subsequently, in the backward phase, several algorithms are applicable for training a BPN, such as the gradient descent algorithms, the conjugate gradient algorithms, and the Levenberg-Marquardt algorithm. In this study, the Levenberg-Marquardt algorithm is applied. The Levenberg-Marquardt algorithm was designed for training with second-order speed without having to compute the Hessian matrix. It uses an approximation and updates the network parameters in a Newton-like way, as described below.
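The forward phase of the three-layer BPN and one Levenberg-Marquardt update can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the function names, the finite-difference Jacobian, the damping term added to the linear system (the usual damped form of the algorithm), and the simple accept/reject schedule for the damping factor are all choices made here. The parameter vector follows the ordering of β used in this paper, and the all-ones starting point follows the common practice mentioned in the text.

```python
import numpy as np


def log_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def unpack(beta, q, L):
    """Split beta = [w_h (q*L), theta_h (L), w_o (L), theta_o] into its parts."""
    w_h = beta[:q * L].reshape(q, L)
    theta_h = beta[q * L:q * L + L]
    w_o = beta[q * L + L:q * L + 2 * L]
    theta_o = beta[-1]
    return w_h, theta_h, w_o, theta_o


def forward(beta, X, L):
    """Forward phase: network outputs o_j for all examples (rows of X)."""
    w_h, theta_h, w_o, theta_o = unpack(beta, X.shape[1], L)
    h = log_sigmoid(X @ w_h - theta_h)       # hidden-layer outputs h_l
    return log_sigmoid(h @ w_o - theta_o)    # output-node signal


def sse(beta, X, y, L):
    return float(np.sum((y - forward(beta, X, L)) ** 2))


def jacobian(beta, X, L, eps=1e-6):
    """Finite-difference derivative of each output with respect to each parameter."""
    base = forward(beta, X, L)
    J = np.empty((X.shape[0], beta.size))
    for p in range(beta.size):
        shifted = beta.copy()
        shifted[p] += eps
        J[:, p] = (forward(shifted, X, L) - base) / eps
    return J


def lm_step(beta, X, y, L, damping):
    """Solve (J^T J + damping * I) delta = J^T (y - f) and return beta + delta."""
    J = jacobian(beta, X, L)
    residual = y - forward(beta, X, L)
    A = J.T @ J + damping * np.eye(beta.size)
    return beta + np.linalg.solve(A, J.T @ residual)


def train(X, y, L, iterations=50, damping=1e-2):
    beta = np.ones(X.shape[1] * L + 2 * L + 1)   # all-ones initial parameters
    for _ in range(iterations):
        candidate = lm_step(beta, X, y, L, damping)
        if sse(candidate, X, y, L) < sse(beta, X, y, L):
            beta, damping = candidate, damping / 10.0   # accept, trust the model more
        else:
            damping *= 10.0                              # reject, take a smaller step
    return beta
```

One design note: because every hidden node starts with identical parameters under the all-ones initialization, the hidden nodes remain identical during training in this sketch; a practical implementation would likely perturb the starting point.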
The Levenberg-Marquardt algorithm is an iterative procedure. In the beginning, the user should specify the initial values of the network parameters $\beta$. Setting $\beta^T = (1, 1, \ldots, 1)$ is a common practice. In each step, the parameter vector $\beta$ is replaced by a new estimate $\beta + \delta$, where $\delta = [\Delta w^h_{11}, \ldots, \Delta w^h_{qL}, \Delta\theta^h_1, \ldots, \Delta\theta^h_L, \Delta w^o_1, \ldots, \Delta w^o_L, \Delta\theta^o]$. The network output becomes $f(x_j, \beta + \delta)$, which is approximated by its linearization as
$$f(x_j, \beta + \delta) \approx f(x_j, \beta) + J_j \delta, \qquad (17)$$
where
$$J_j = \frac{\partial f(x_j, \beta)}{\partial \beta}$$
is the gradient vector of $f$ with respect to $\beta$. Substituting (17) into (16) gives an approximation of $\mathrm{SSE}(\beta + \delta)$ that is quadratic in $\delta$. When the network reaches the optimal solution, the gradient of SSE with respect to $\delta$ will be zero. Taking the derivative of $\mathrm{SSE}(\beta + \delta)$ with respect to $\delta$ and setting the result to zero gives (20), a set of linear equations in $\delta$ whose coefficients are built from $J$, the Jacobian matrix containing the first derivatives of the network error with respect to the weights and biases. These equations can be solved for $\delta$. Finally, the BPN can be applied to estimate the cycle time of a job, and then the remaining cycle time of the job can be derived from the estimated cycle time.

The New Rule.
In traditional fluctuation smoothing (FS) rules there are two different formulation methods, depending on the scheduling purpose [22]. One method is aimed at minimizing the average cycle time with FSMCT, and the other is aimed at minimizing the variance of cycle time with FSVCT. In both cases, jobs with the smallest slack values ($\mathrm{SKM}_{ju}$ or $\mathrm{SKV}_{ju}$, respectively) will be given higher priority. These two rules and their variants have been proven to be very effective in shortening the cycle time in wafer fabrication factories [10, 14-17].
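As an illustration of how the two FS rules rank jobs, the sketch below uses the forms in which these rules are commonly stated in the literature: the FSMCT slack as the job index divided by the mean release rate minus the estimated remaining cycle time, and the FSVCT slack as the release time minus the estimated remaining cycle time. These expressions, the symbol `release_rate` (the mean release rate, which is not among the variables listed earlier), and the `Job` class are assumptions made for this example and should be checked against Lu et al. [10].

```python
from dataclasses import dataclass


@dataclass
class Job:
    index: int            # j, jobs numbered in order of release
    release_time: float   # R_j, in hours
    rcte: float           # RCTE_ju, estimated remaining cycle time in hours


def fsmct_slack(job, release_rate):
    """Assumed FSMCT form: SKM_ju = j / lambda - RCTE_ju."""
    return job.index / release_rate - job.rcte


def fsvct_slack(job):
    """Assumed FSVCT form: SKV_ju = R_j - RCTE_ju."""
    return job.release_time - job.rcte


def sequence(jobs, slack):
    """Jobs with the smallest slack are given higher priority (processed first)."""
    return sorted(jobs, key=slack)


queue = [Job(1, 0.0, 100.0), Job(2, 40.0, 120.0), Job(3, 45.0, 60.0)]
by_fsmct = sequence(queue, lambda job: fsmct_slack(job, release_rate=0.1))
by_fsvct = sequence(queue, fsvct_slack)
```

With these hypothetical jobs the two rules produce different sequences, which shows that the choice between smoothing the mean release rate and smoothing the release time changes the dispatching decision.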
Chen [15] normalized the parameters, used the division operator instead, and derived the 2f-TNFSVCT rule and the 2f-TNFSMCT rule, whose adjustable parameters are $\xi$ and $\zeta$. There are many possible models to form the combination of $\xi$ and $\zeta$. The new rule is composed of two rules. The first rule is derived by diversifying the slack in the 2f-TNFSVCT rule, aimed at minimizing the variation of cycle time [22]. To diversify the slack, the standard deviation of the slack is to be maximized. However, in nonlinear fluctuation smoothing rules, it is common that some of the jobs will have very large or very small slack values, that is, the extreme cases, which distort the sequencing results. For this reason, such jobs are put in a set EC that will be excluded from calculating the standard deviation. The second rule is derived by diversifying the slack in the 2f-TNFSMCT rule, aimed at minimizing the mean cycle time. To diversify the slack, the standard deviation of the slack is again to be maximized. To generate a biobjective rule, the two rules need to be combined into a single one, for which a nonlinear programming model is to be optimized, which is an NP problem.
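The idea of maximizing the standard deviation of the slack after excluding the extreme cases can be sketched as follows. Several elements here are assumptions rather than the method of this study: `toy_slack` is a hypothetical stand-in and is not the 2f-TNFSVCT or 2f-TNFSMCT formula, the extreme cases are identified with an interquartile-range fence because the criterion for the set EC is not specified here, and a plain grid search replaces the nonlinear programming solver.

```python
import itertools

import numpy as np


def exclude_extreme_cases(slack, k=1.5):
    """Drop slack values outside an assumed interquartile-range fence (the set EC)."""
    slack = np.asarray(slack, dtype=float)
    q1, q3 = np.percentile(slack, [25, 75])
    spread = q3 - q1
    keep = (slack >= q1 - k * spread) & (slack <= q3 + k * spread)
    return slack[keep]


def trimmed_std(slack):
    """Standard deviation of the slack with the extreme cases excluded."""
    return float(np.std(exclude_extreme_cases(slack)))


def toy_slack(release, rcte, xi, zeta):
    """HYPOTHETICAL two-parameter division-type slack, for illustration only."""
    return np.asarray(release) ** xi / np.asarray(rcte) ** zeta


def best_parameters(release, rcte, grid):
    """Grid search (a stand-in for the NLP solver) maximizing the trimmed std."""
    return max(
        itertools.product(grid, grid),
        key=lambda pair: trimmed_std(toy_slack(release, rcte, *pair)),
    )
```

A call such as `best_parameters(release, rcte, grid=[0.0, 0.5, 1.0])` on normalized, strictly positive inputs returns the pair of parameter values that diversifies the slack most on that grid.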

A Simulation Study
To evaluate the effectiveness of the proposed methodology, simulated data were used to avoid disturbing the regular operations of the wafer fabrication factory. Simulation is a widely used technology to assess the effectiveness of a scheduling policy, especially when the proposed policy and the current practice are very different. Such an investigation cannot be carried out in the actual production environment. The real-time scheduling systems input information very rapidly into the production management information systems (PROMIS). To this end, a real wafer fabrication factory located in Taichung Scientific Park of Taiwan with a monthly capacity of about 25,000 wafers was simulated. The simulation program has been validated and verified by comparing the actual cycle times with the simulated values and by analyzing the trace report, respectively. The wafer fabrication factory produces more than 10 types of memory products and has more than 500 workstations for performing single-wafer or batch operations using 58 nm∼110 nm technologies. Jobs released into the fabrication factory are assigned three types of priorities, that is, "normal," "hot," and "super hot." Jobs with the highest priorities will be processed first. Such a large scale, accompanied by reentrant process flows, makes job dispatching in the wafer fabrication factory a very tough task. Currently, the longest average cycle time exceeds three months with a variation of more than 300 hours. The wafer fabrication factory is therefore seeking better dispatching rules to replace first-in first-out (FIFO) and EDD, in order to shorten the average cycle times and ensure on-time delivery to its customers. One hundred replications of the simulation were run successively. The time required for each simulation replication was about 30 minutes on a PC with an Intel Dual CPU E2200 2.2 GHz and 1.99 GB of RAM. A horizon of twenty-four months was simulated.
To assess the effectiveness of the proposed methodology and to compare it with some existing approaches (FIFO, EDD, SRPT, CR, FSVCT, FSMCT, Justice [27], NFS [16], 2f-TNFSMCT, and 2f-TNFSVCT), all of these methods were applied to schedule the simulated wafer fabrication factory to collect the data of 1000 jobs, which is about the amount of work that can be achieved with 100% of the monthly capacity. We then separated the collected data by product type and priority. In some cases, there was too little data, so they were not discussed.
To determine the due date of a job, the PCA-FCM-BPN approach was applied to estimate the cycle time, for which the Levenberg-Marquardt algorithm rather than the gradient descent algorithm was applied to speed up the network convergence.Then, we added a constant allowance of three days to the estimated cycle time, that is, κ = 72, to determine the internal due date.
Jobs with the highest priorities are usually processed first.In FIFO, jobs were sequenced on each machine first by their priorities, then by their arrival times at the machine.In EDD, jobs were sequenced first by their priorities, then by their due dates.In CR, jobs were sequenced first by their priorities, then by their critical ratios.In the proposed methodology, the nonlinear model with k = 2 is used.In Justice, jobs were sequenced on each machine first by their priorities, then according to the job speed matrix (Table 2).Subsequently, the average cycle time and cycle time standard deviation of all cases were calculated to assess the scheduling performance.With respect to the average cycle time, the FIFO policy was used as the basis for comparison, while FSVCT was compared in evaluating cycle time standard deviation.The results are summarized in Tables 3 and 4.
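The two-level sequencing described above (first by priority, then by the rule-specific key) can be sketched as follows. This is a hypothetical illustration: the `QueuedJob` class, the ranking of "super hot" above "hot" above "normal", the critical ratio taken as the time remaining until the due date divided by the remaining processing time, and the internal due date taken as release time plus estimated cycle time plus the 72-hour allowance are assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed ranking: a smaller rank is served first.
PRIORITY_RANK = {"super hot": 0, "hot": 1, "normal": 2}


@dataclass
class QueuedJob:
    name: str
    priority: str
    arrival: float    # arrival time at the machine, in hours
    due_date: float   # internal due date, in hours
    rpt: float        # remaining processing time, in hours


def internal_due_date(release_time, estimated_cycle_time, allowance=72.0):
    """Assumed form: release time + estimated cycle time + constant allowance."""
    return release_time + estimated_cycle_time + allowance


def fifo_order(queue):
    return sorted(queue, key=lambda j: (PRIORITY_RANK[j.priority], j.arrival))


def edd_order(queue):
    return sorted(queue, key=lambda j: (PRIORITY_RANK[j.priority], j.due_date))


def cr_order(queue, now):
    """Assumed critical ratio: (due date - now) / remaining processing time."""
    return sorted(
        queue,
        key=lambda j: (PRIORITY_RANK[j.priority], (j.due_date - now) / j.rpt),
    )
```

The same pattern extends to the other rules in the comparison by replacing the second element of the sort key with the slack computed by that rule.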
According to the experimental results, the following points can be made: (1) For the average cycle time, the proposed methodology outperformed the baseline approach, the FIFO policy.The average advantage was about 16%.
(2) In addition, the proposed methodology surpassed the FSVCT policy in reducing cycle time standard deviation.The most obvious advantage was 59%.
(3) As expected, SRPT performed well in reducing the average cycle times, especially for product types with short cycle times (e.g., product A), but might give an exceedingly bad performance with respect to cycle time standard deviation.If the cycle time is long, the remaining cycle time will be much longer than the remaining processing time, which leads to the ineffectiveness of SRPT.SRPT is similar to FSMCT.Both try to make all jobs equally early or late.
(4) The performance of EDD was also satisfactory for product types with short cycle time.If the cycle time is long, it is more likely to deviate from the prescribed internal due date, which leads to the ineffectiveness of EDD.That becomes more serious if the percentage of the product type is high in the product mix (e.g., product type A).CR has similar problems.
(5) The proposed rule was also compared with the traditional one without slack diversification.Taking product type A with normal priority as an example, the comparison results are shown in Figure 2.
Obviously, the proposed rule dominated most of the traditional rules without slack diversification.According to these results, slack diversification did indeed improve the performances of the fluctuation smoothing policies.

Conclusions and Directions for Future Research
For capital-intensive industries like wafer fabrication, efficient use of expensive equipment is very important. To this end, job dispatching is a challenging but important task. However, for such a complex production system, optimizing the scheduling performance is a tough task. As an innovative attempt, this study presents a nonlinear programming and artificial neural network approach to optimize the performance of a slack-diversifying dispatching rule in a wafer fabrication factory, with respect to both the average cycle time and the cycle time standard deviation. The proposed methodology merges two existing rules, 2f-TNFSMCT and 2f-TNFSVCT, and constructs a nonlinear programming model to choose the best values of the parameters in the two rules. A more effective approach is also applied to estimate the remaining cycle time of a job, which is empirically shown to be conducive to the scheduling performance.
To further enhance the accuracy of the remaining cycle time estimation, other dynamic parameters must be considered.In addition, some advanced methods for the cycle time estimation, such as data mining methods [28], can be applied as well.
After a simulation study, we observed the following phenomena.
(1) Through improving the accuracy of estimating the remaining cycle time, the performance of a scheduling rule can indeed be strengthened.
(2) Optimizing the adjustable factors in the two rules appears to be an appropriate way to enhance the scheduling performance of the rule.
(3) Slack diversification is indeed conducive to the performance of a fluctuation smoothing rule.
However, to further assess the effectiveness and efficiency of the proposed methodology, the only way is to apply it to an actual wafer fabrication factory.In addition, other rules can be optimized in the same way in future studies.

(19) $w^o_l$: the connection weight between hidden-layer node $l$ and the output node.
(21) $\theta^h_l$: the threshold on hidden-layer node $l$.
(22) $\theta^o$: the threshold on the output node.

Table 1: The differences between the proposed methodology and the previous methods.
The objective function of the BPN is to minimize the root mean-squared error (RMSE) or, equivalently, the sum of squared errors (SSE). The network parameters are placed in the vector $\beta = [w^h_{11}, \ldots, w^h_{qL}, \theta^h_1, \ldots, \theta^h_L, w^o_1, \ldots, w^o_L, \theta^o]$. The network output $o_j$ can be represented as $f(x_j, \beta)$.

Table 2: The job speed matrix.

Table 3: The performances of various approaches in the average cycle time.

Table 4: The performances of various approaches in cycle time standard deviation.