Precise and Accurate Job Cycle Time Forecasting in a Wafer Fabrication Factory with a Fuzzy Data Mining Approach

Many data mining methods have been proposed to improve the precision and accuracy of job cycle time forecasts for wafer fabrication factories. This study presents a fuzzy data mining approach based on an innovative fuzzy backpropagation network (FBPN) that determines the lower and upper bounds of the job cycle time. Forecasting accuracy is also significantly improved by a combination of principal component analysis (PCA), fuzzy c-means (FCM), and FBPN. An applied case that uses data collected from a wafer fabrication factory illustrates this fuzzy data mining approach. For this applied case, the proposed methodology performs better than six existing data mining approaches.


Introduction
Forecasting the job cycle time in a wafer fabrication factory is difficult because each such factory is a complicated production system with idiosyncratic features such as changing demand, a variety of product types and priorities, equipment unreliability, unbalanced capacity, job reentry into machines, alternative machines, sequence-dependent setup times, and shifting bottlenecks. The average job cycle time is several months, with a standard deviation of hundreds of hours. Many studies have shown that accurately predicting the cycle/completion times for such large systems is very difficult [2, 3, 4].
There are two approaches to forecasting the cycle time of a job. The first approach, the input-output relationship approach, is to determine certain factors (e.g., average waiting time, queue length, utilization, future release plans, and bottlenecks) that influence the job cycle time and then to apply a modeling technique (e.g., multiple linear regression (MLR) or artificial neural networks (ANN)) to the relationship between the job cycle time and these factors, in order to forecast the cycle time of a new job. The second approach, the time-series approach, is to treat fluctuations in the job cycle time as a time series. Many methods, for example, moving average (MA), weighted moving average (WMA), exponential smoothing (ES), MLR, ANN, and autoregressive integrated moving average (ARIMA), can then be applied to forecast the cycle time of a new job. The cycle time of a job is usually predicted when the job is released into the factory, but several months are required for the job to complete all operations. In such a case, the production controller may need to forecast the cycle time of job number $n + 10000$ according to the historical data of jobs $1 \sim n$. This is a difficult problem for time-series methods; the present study contributes a useful approach to this problem.
Existing approaches to job cycle time estimation for a wafer fabrication factory can be classified into six categories: statistical analysis, production simulation (PS), ANN, case-based reasoning (CBR), fuzzy modeling methods, and hybrid approaches [5]. Many of these approaches are in fact special types of data mining methods. Table 1 summarizes references related to these categories. Data mining methods that have been extensively applied to job cycle time forecasting include clustering, genetic algorithms (GA), ANNs, decision trees, and hybrid approaches. The relevant references are reviewed below.
Backus et al. [1] compared the performances of clustering, K-nearest neighbors (i.e., CBR), classification and regression tree (CART), and ANN; CART gave the best performance in that experiment. Meidan et al. [6] focused on the choice of key factors in cycle time forecasting. They compared multiple linear regression (i.e., statistical analysis), ANN, and a selective naive Bayesian classifier (SNBC) and found that their ANN had the best performance.
Wang and Mendel [7] proposed a fuzzy modeling method called the WM method; the first step of that method was partitioning the range of each input variable into several fuzzy intervals. Chang et al. [8] modified the first step of the WM method with a simple GA and proposed an evolving fuzzy rule (EFR) approach to predict the cycle time of a job in a wafer fabrication factory. Their EFR approach outperformed CBR and ANN in terms of forecasting accuracy.
A backpropagation network (BPN) is an effective tool for modeling complex physical systems described by sets of different equations. BPNs are useful for prediction, control, and design purposes; they can offer both effectiveness (high forecasting accuracy) and efficiency (short execution time). Chang and Hsieh [9], Chang et al. [8], and Sha and Hsu [10] all predicted the cycle times of jobs in wafer fabrication factories with BPNs that each had a single hidden layer. These BPNs delivered better average forecasting accuracy than statistical analysis approaches, as measured by root mean squared error (RMSE). For example, an improvement of about 40% in RMSE was achieved in Chang et al. [8]. Chen [5] constructed a fuzzy backpropagation network (FBPN) that incorporated expert opinions to modify the inputs of the FBPN. Chen's FBPN surpassed a crisp BPN, especially with respect to efficiency. Chen [11] incorporated the job release plan of a wafer fabrication factory into a BPN and constructed a "look-ahead" BPN for the same purpose, which led to an average reduction of 12% in RMSE. Beeg [12] and Chen et al. [13, 14] predicted the cycle time of a job in a ramping-up wafer fabrication factory. Chen et al. used a BPN-based method; Beeg investigated the impact of utilization on the cycle time. Tirkel [15] applied two data mining approaches, decision trees and ANN, and concluded that ANN was more accurate.
Chang and Liao [26] combined a self-organizing map and the Wang and Mendel method (SOM-WM); a job was classified using a SOM before the job cycle time was predicted by the WM method. Chen [11] constructed a look-ahead k-means-fuzzy backpropagation network (kM-FBPN) for the same purpose and discussed the effects of using different look-ahead functions. More recently, Chen et al. [23] proposed a look-ahead SOM-FBPN approach for job cycle time forecasting in a semiconductor factory; a set of fuzzy inference rules was developed to evaluate the achievability of a cycle time forecast. Subsequently, Chen et al. [3] added a selective time allowance to the look-ahead SOM-FBPN approach to determine internal due dates. Further, Chen [24] showed that a combined SOM and FBPN could be improved by feeding back the forecasting error of the FBPN to adjust the classification results of the SOM. Chen et al. [13, 14] proposed a post-classification FBPN-BPN approach in which a job was not preclassified; the job was classified after the cycle time had been predicted. Experimental results showed that the post-classification approach was better than some preclassification approaches in some cases. In order to combine the advantages of preclassifying and post-classifying approaches, Chen [25] proposed a bidirectional classifying approach that combined FCM, FBPN, and a radial basis function network (RBF), in which jobs are not only preclassified but also post-classified. Instead of bidirectional classification, Chien et al. [4] used nonlinear regression equations and then related the forecasting error to some factory conditions and job attributes with a BPN to improve the forecasting accuracy.
Table 2 summarizes the data mining methods in this field. The advantages and disadvantages of these methods are compared in Table 3.
To improve the precision and accuracy of job cycle time forecasting in a wafer fabrication factory, this study presents a fuzzy data mining approach that emphasizes both accuracy and precision.
(1) Accuracy: the forecast value should be as close as possible to the actual value.
(2) Precision: a narrow interval containing the actual value should be established.
Chen [30] defined several methods to measure the precision and accuracy of a fuzzy forecasting method and noted that precision has seldom been considered. However, past studies have confirmed that precision and accuracy are closely related. Song and Chissom [31] proposed a first-order time-variant model to forecast enrollment and considered the effect of the model basis on the forecasting precision. Specifically, if the space of feasible solutions is narrowed, it becomes more likely that the actual value can be found [30, 32]. The fuzzy data mining approach has the following innovative characteristics.
(1) Some factors used to estimate the cycle time are dependent on each other, which may cause problems in classifying jobs and in fitting the relationship between the job cycle time and these factors. To solve this problem, PCA, FCM, and FBPN are combined to predict the job cycle time, which is an innovation in this field.
(2) The job cycle time is predicted by an effective FBPN approach. For the job cycle time, there are upper and lower bounds that constrain the forecast to be within a possible range. The FBPN approach of Chen and Wang [33] has been modified for this purpose. In Chen and Wang's FBPN approach, two nonlinear programming (NP) models are solved to determine the upper and lower bounds of the job cycle time. However, the NP models involve complicated constraints and are therefore difficult to solve. In addition, the NP models become too large if many jobs are considered. To solve these problems, in this study the upper and lower bounds are determined in a simpler way.
Table 1 (partial): ... Meidan et al. [6], and Tirkel [15]. Hybrid approaches: SOM-WM [26], kM-FBPN [11, 29], look-ahead SOM-FBPN [3, 23, 24], post-classifying FBPN-BPN [13, 14], and bi-directional classifying FCM-FBPN-RBF [25].
The differences between the proposed methodology and various previous methods are summarized in Table 4. The remainder of this paper is organized as follows. Section 2 introduces the proposed methodology, which is composed of six steps. Section 3 demonstrates the proposed method with an illustrative example, then presents another case with data collected from a real wafer fabrication factory, and makes comparisons with some existing approaches. Finally, concluding remarks and directions for future research are given in Section 4.

Methodology
The fuzzy data mining procedure consists of the following six steps.
Step 1. Replacing the original variables using PCA.
Step 2. Classifying jobs using FCM.
Step 3. Forecasting the cycle times of jobs in each category using a FBPN.
Step 4. Updating the upper and lower bounds.
Step 5. Evaluating and comparing the forecasting precision and accuracy of the proposed methodology.
Step 6. If the improvement in the forecasting precision or accuracy becomes negligible, stop; otherwise, return to Step 3.
A flow chart of the proposed methodology is shown in Figure 1. These steps can be compared to the basic steps of data mining, as shown in Table 5.

2.1. Step 1: Variable Replacement Using PCA. First, PCA is used to replace the inputs to the FBPN. PCA forms new variables as linear combinations of the original variables so that the new variables are as unrelated to each other as possible. Although there are more advanced applications of PCA, in this study PCA is used to enhance the efficiency of FBPN training. PCA consists of the following four steps.
(1) Raw data standardization: to eliminate dimensional differences and large numerical differences, the original variables are standardized [13, 14]:
$$x^*_{ji} = \frac{x_{ji} - \bar{x}_i}{\sigma_i},$$
where $x_{ji}$ is the $i$th attribute of job $j$, $j = 1 \sim n$; $\bar{x}_i$ and $\sigma_i$ indicate the mean and standard deviation of variable $i$, respectively.
(2) Establishment of the correlation matrix $R$:
$$R = \frac{1}{n-1} X^{*T} X^*,$$
where $X^*$ is the standardized data matrix. The eigenvalues and eigenvectors of $R$ are calculated and represented as $\lambda_1 \sim \lambda_m$ and $u_1 \sim u_m$, respectively, where $m$ is the number of original variables.
(3) Determination of the number of principal components: the variance contribution rate is calculated as
$$\eta_q = \frac{\lambda_q}{\sum_{r=1}^{m} \lambda_r} \cdot 100\%,$$
and the accumulated variance contribution rate is
$$\eta_\Sigma(p) = \sum_{q=1}^{p} \eta_q.$$
Choose the smallest $p$ value such that $\eta_\Sigma(p)$ reaches the range from 85% to 90%. A Pareto analysis chart can be used to show the percentage of variability that can be attributed to each principal component.
(4) Formation of the following matrix:
$$Z_{n \times p} = X^*_{n \times m} U_{m \times p},$$
where $U_{m \times p} = [u_1, \ldots, u_p]$ and $Z_{n \times p} = [z_{jq}]$ ($j = 1 \sim n$; $q = 1 \sim p$) are the component scores. These scores contain the coordinates of the original data in the new coordinate system defined by the principal components. They will be used as the new inputs to the FBPN.
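The four PCA steps can be illustrated with a short NumPy sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this study: the function name `pca_scores`, the parameter `min_cum_rate`, and the randomly generated 40-by-6 data set are hypothetical, and the 85% target is taken from the lower end of the range given in step (3).

```python
import numpy as np

def pca_scores(X, min_cum_rate=0.85):
    """Return (scores, cumulative contribution rates, p) for a jobs-by-variables array X."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    # (1) Standardize every variable (column) to zero mean and unit sample std.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Correlation matrix and its eigen-decomposition (eigh: R is symmetric).
    R = X_std.T @ X_std / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (3) Cumulative variance contribution rates and the smallest p reaching the target.
    cum_rates = np.cumsum(eigvals / eigvals.sum())
    p = min(int(np.searchsorted(cum_rates, min_cum_rate)) + 1, m)
    # (4) Component scores: coordinates of the jobs on the first p components.
    scores = X_std @ eigvecs[:, :p]
    return scores, cum_rates, p

# Hypothetical data: 40 jobs described by 6 partly redundant factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 3))
X = np.hstack([base, base + 0.1 * rng.normal(size=(40, 3))])
scores, cum_rates, p = pca_scores(X)
print(p, np.round(cum_rates, 3))
```

The columns of `scores` are mutually uncorrelated by construction, which is the property that motivates using them instead of the original, interdependent factors.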

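2.2. Step 2: Classifying Jobs Using FCM. FCM classifies jobs by minimizing the following objective function:
$$\min \sum_{k=1}^{K} \sum_{j=1}^{n} \mu_{j(k)}^{\phi} e_{j(k)}^2,$$
where $K$ is the required number of categories; $n$ is the number of jobs; $\mu_{j(k)}$ indicates the degree of membership that job $j$ has in category $k$; $e_{j(k)}$ measures the distance from job $j$ to the centroid of category $k$; $\phi \in [1, \infty)$ is a parameter to adjust the fuzziness and is usually set to 2. The procedure of FCM is as follows.
(3) Iterations: calculate the centroid of each category as
$$\bar{x}_{(k)} = \{\bar{x}_{(k)i}\}, \quad \bar{x}_{(k)i} = \frac{\sum_{j=1}^{n} \left(\mu^{(t)}_{j(k)}\right)^{\phi} x_{ji}}{\sum_{j=1}^{n} \left(\mu^{(t)}_{j(k)}\right)^{\phi}},$$
where $\bar{x}_{(k)}$ is the centroid of category $k$; $x_{ji}$ is the $i$th variable of job $j$; $\mu^{(t)}_{j(k)}$ is the degree of membership that job $j$ has in category $k$ after the $t$th iteration.
(4) Remeasure the distance from each job to the centroid of each category and then recalculate the corresponding membership.
(5) Stop if the following condition is met. Otherwise, return to Step (3):
$$\max_{k} \max_{j} \left| \mu^{(t)}_{j(k)} - \mu^{(t-1)}_{j(k)} \right| < d,$$
where $d$ is a real number representing the membership convergence threshold.
The performance of FCM is affected by the settings of the initial values; initialization can be repeated multiple times in order to find the optimal solution. Finally, the separate distance test ($S$ test) proposed by Xie and Beni [34] can be applied to determine the optimal number of categories $K$:
$$S = \frac{J}{n \cdot e^2_{\min}}, \quad J = \sum_{k=1}^{K} \sum_{j=1}^{n} \mu_{j(k)}^{\phi} e_{j(k)}^2, \quad e^2_{\min} = \min_{k_1 \neq k_2} \left\| \bar{x}_{(k_1)} - \bar{x}_{(k_2)} \right\|^2.$$
The $K$ value that minimizes $S$ determines the optimal number of categories.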
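The FCM iteration and the separate distance test can be sketched as follows. This is a minimal illustration assuming Euclidean distances, a random initial membership matrix, and the standard FCM membership update; the function names `fcm` and `xie_beni`, the tolerance, and the synthetic three-cluster data are hypothetical and are not taken from the case study.

```python
import numpy as np

def fcm(X, K, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means; returns the membership matrix U (jobs x categories) and centroids."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], K))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each job sum to 1
    for _ in range(max_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centroids

def xie_beni(X, U, centroids, m=2.0):
    """Separate distance (S) index: smaller values indicate a better partition."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    J = ((U ** m) * d2).sum()
    c2 = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    e2_min = c2[~np.eye(len(centroids), dtype=bool)].min()
    return J / (X.shape[0] * e2_min)

# Hypothetical data: three groups of 20 jobs in a two-dimensional score space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in ([0, 0], [10, 0], [0, 10])])
for K in (2, 3, 4):
    U, C = fcm(X, K)
    print(K, xie_beni(X, U, C))
```

In practice the loop over `K` would be repeated with several seeds, in line with the remark that FCM results depend on the initial values.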

2.3. Step 3: Forecasting the Cycle Times of Jobs in Each Category Using a FBPN. All categories in the present study are considered by the FBPN. There are different types of FBPNs, for example, those of Nomura et al. [35], Chen [5], Lin [36], and Chen and Wang [33]. Among them, Chen and Wang's FBPN is unique since it is aimed at optimizing precision. For this reason, it is chosen for this study. This FBPN is a proven technology that has incorporated many features of prior artificial neural networks, which have solved a wide variety of problems characterized by sets of different equations. Some special problems can benefit from specialized tools, such as compositional pattern-producing networks, cascading neural networks, and dynamic neural networks, but for the present job cycle time problem, a well-trained FBPN with an optimized structure can still produce very good results. All parameters in the FBPN are expressed in triangular fuzzy numbers (TFNs). There are various types of fuzzy numbers with different shapes; among them, the triangular type is easily implemented and has been widely used in numerous applications (e.g., [37, 38]). Although the proposed methodology uses TFNs, it can be easily modified to use other types of fuzzy numbers, such as trapezoidal fuzzy numbers or bounded LR-type fuzzy numbers.
The FBPN is configured as follows.
(1) Inputs: there are $n$ sets of inputs; each set consists of the new factors determined by PCA from the $j$th job. These factors have to be partially normalized so that their values fall within [0.1, 0.9] [13, 14].
(2) Single hidden layer: generally one hidden layer gives the best convergence results for the FBPN.
(3) For simplicity, the number of neurons in the hidden layer is twice the number in the input layer.An increase in the number of hidden-layer nodes lessens the output errors for the training examples, but increases the errors for novel examples.This phenomenon is often called "overfitting." Some research has considered the connections between the complexity of a FBPN, the performance for the training data, and the number of examples; noteworthy research includes Akaike's information criterion (AIC) [39] and Rissanen's minimum description length (MDL) [40].
(4) Output: the (normalized) cycle time forecast of the example.
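The procedure for determining the parameter values is now described. After preclassification, a portion of the adopted examples in each category is fed as "training examples" into the FBPN to determine the parameter values. Two phases are involved at the training stage. First, in the forward phase, inputs are multiplied by weights, summed, and transferred to the hidden layer. Then, activated signals are output from the hidden layer as
$$\tilde{h}_l = (h_{l1}, h_{l2}, h_{l3}) = \frac{1}{1 + e^{-\tilde{n}^h_l}}, \quad \tilde{n}^h_l = \tilde{I}^h_l \,(-)\, \tilde{\theta}^h_l, \quad \tilde{I}^h_l = \sum_{q=1}^{p} \tilde{w}^h_{ql} \,(\times)\, x_{jq},$$
where $\tilde{h}_l$ is the output from hidden-layer node $l$, $l = 1 \sim L$; $\tilde{\theta}^h_l$ is the threshold for screening out weak signals by hidden-layer node $l$; $\tilde{w}^h_{ql}$ is the weight of the connection between input node $q$ and hidden-layer node $l$, $q = 1 \sim p$; $l = 1 \sim L$; $x_{jq}$ is the $q$th input, $q = 1 \sim p$, where $p$ is the number of inputs. The remaining parameters are transition variables. $(-)$ and $(\times)$ denote fuzzy subtraction and multiplication, respectively.
The $\tilde{h}_l$s are also transferred to the output layer with the same procedure. Finally, the output of the FBPN is generated as follows:
$$\tilde{o}_j = (o_{j1}, o_{j2}, o_{j3}) = \frac{1}{1 + e^{-\tilde{n}^o}}, \tag{12}$$
$$\tilde{n}^o = \tilde{I}^o \,(-)\, \tilde{\theta}^o, \tag{13}$$
$$\tilde{I}^o = \sum_{l=1}^{L} \tilde{w}^o_l \,(\times)\, \tilde{h}_l, \tag{14}$$
where $\tilde{o}_j$ is the network output, which is the normalized value of the cycle time forecast of job $j$; $\tilde{\theta}^o$ is the threshold for screening out weak signals by the output node; $\tilde{w}^o_l$ is the weight of the connection between hidden-layer node $l$ and the output node; $l = 1 \sim L$.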
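The forward phase can be sketched in NumPy for readers who prefer code to notation. The sketch rests on several assumptions that are not stated in the text: a TFN is stored as a (lower, center, upper) triple, fuzzy multiplication is approximated by keeping the result triangular (extreme corner products for the bounds, product of centers for the center), and the sigmoid is applied component by component because it is monotonic. All names (`tfn_sub`, `tfn_mul`, `fbpn_forward`) and the random parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tfn_sub(a, b):
    """Fuzzy subtraction a (-) b for TFNs stored as (lower, center, upper)."""
    return np.stack([a[..., 0] - b[..., 2],
                     a[..., 1] - b[..., 1],
                     a[..., 2] - b[..., 0]], axis=-1)

def tfn_mul(a, b):
    """Approximate fuzzy multiplication a (x) b; the result is kept triangular."""
    corners = np.stack([a[..., 0] * b[..., 0], a[..., 0] * b[..., 2],
                        a[..., 2] * b[..., 0], a[..., 2] * b[..., 2]], axis=-1)
    return np.stack([corners.min(axis=-1),
                     a[..., 1] * b[..., 1],
                     corners.max(axis=-1)], axis=-1)

def fbpn_forward(x, w_h, theta_h, w_o, theta_o):
    """x: (p,) crisp inputs; w_h: (p, L, 3); theta_h: (L, 3); w_o: (L, 3); theta_o: (3,)."""
    x_tfn = np.repeat(x[:, None, None], 3, axis=2)   # crisp input as a degenerate TFN
    I_h = tfn_mul(w_h, x_tfn).sum(axis=0)            # weighted sums into hidden nodes
    h = sigmoid(tfn_sub(I_h, theta_h))               # hidden-layer outputs, (L, 3)
    I_o = tfn_mul(w_o, h).sum(axis=0)                # weighted sum into the output node
    return sigmoid(tfn_sub(I_o, theta_o))            # (o1, o2, o3)

# Hypothetical network: 3 inputs, 6 hidden nodes, random TFN parameters.
rng = np.random.default_rng(0)
w_h = np.sort(rng.normal(size=(3, 6, 3)), axis=-1)
theta_h = np.sort(rng.normal(size=(6, 3)), axis=-1)
w_o = np.sort(rng.normal(size=(6, 3)), axis=-1)
theta_o = np.sort(rng.normal(size=3))
x = np.array([0.2, 0.5, 0.8])
o = fbpn_forward(x, w_h, theta_h, w_o, theta_o)
print(o)
```

The center component of the result equals the output of the corresponding crisp network, which is why the center values can be trained with crisp algorithms.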
Subsequently, in the backward phase, the training of the FBPN is decomposed into three subtasks: determining the center value and upper and lower bounds of the parameters.
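First, to determine the center of each parameter (such as $w^h_{ql2}$, $\theta^h_{l2}$, $w^o_{l2}$, and $\theta^o_2$), the FBPN is treated as a crisp network. Algorithms applicable for this purpose include gradient descent algorithms, conjugate gradient algorithms, the Levenberg-Marquardt algorithm, and others. In this study, the Levenberg-Marquardt algorithm is applied. It was designed to approach second-order training speed without having to compute the Hessian matrix. It uses an approximation and updates the network parameters in a Newton-like way, as described below.
The network parameters are placed in the vector $\beta = [w^h_{11}, \ldots, w^h_{pL}, \theta^h_1, \ldots, \theta^h_L, w^o_1, \ldots, w^o_L, \theta^o]$. The network output $o_j$ can be represented with $f(x_j, \beta)$. The objective function of the FBPN is to minimize RMSE or, equivalently, the sum of squared error (SSE):
$$\mathrm{SSE}(\beta) = \sum_{j=1}^{n} \left( N(\mathrm{CT}_j) - f(x_j, \beta) \right)^2, \tag{15}$$
where $N(\mathrm{CT}_j)$ is the normalized actual cycle time of job $j$. The Levenberg-Marquardt algorithm is an iterative procedure. In the beginning, the user should specify the initial values of the network parameter vector $\beta$. It is a common practice to set $\beta^T = (1, 1, \ldots, 1)$. In each step, $\beta$ is replaced by a new estimate $\beta + \delta$, and the network output is approximated by its linearization:
$$f(x_j, \beta + \delta) \approx f(x_j, \beta) + J_j \delta, \tag{16}$$
where
$$J_j = \frac{\partial f(x_j, \beta)}{\partial \beta} \tag{17}$$
is the gradient vector of $f$ with respect to $\beta$. Substituting (16) into (15) gives
$$\mathrm{SSE}(\beta + \delta) \approx \sum_{j=1}^{n} \left( N(\mathrm{CT}_j) - f(x_j, \beta) - J_j \delta \right)^2. \tag{18}$$
When the network reaches the optimal solution, the gradient of SSE with respect to $\delta$ will be zero. Taking the derivative of $\mathrm{SSE}(\beta + \delta)$ with respect to $\delta$ and setting the result to zero gives
$$(J^T J)\, \delta = J^T \left( N(\mathrm{CT}) - f(\beta) \right), \tag{19}$$
where $J$ is the Jacobian matrix containing the first derivative of network error with respect to the weights and biases, and $N(\mathrm{CT})$ and $f(\beta)$ are the vectors of normalized actual values and network outputs. Equation (19) includes a set of linear equations that can be solved for $\delta$.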
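The Newton-like update can be illustrated on a deliberately small problem. The sketch below is not the training code of this study: it fits a single sigmoid node rather than a full network, estimates the Jacobian by forward differences, and adds the usual Levenberg-Marquardt damping term with a simple accept/reject rule, all of which are assumptions made for illustration. The names `node_output`, `numerical_jacobian`, and `levenberg_marquardt` are hypothetical.

```python
import numpy as np

def node_output(X, beta):
    """Single sigmoid node: weights beta[:-1], threshold beta[-1]."""
    return 1.0 / (1.0 + np.exp(-(X @ beta[:-1] - beta[-1])))

def numerical_jacobian(f, X, beta, eps=1e-6):
    """Forward-difference Jacobian of f with respect to beta (rows: examples)."""
    base = f(X, beta)
    J = np.empty((base.size, beta.size))
    for i in range(beta.size):
        step = np.zeros_like(beta)
        step[i] = eps
        J[:, i] = (f(X, beta + step) - base) / eps
    return J

def levenberg_marquardt(f, X, y, beta0, lam=1e-2, n_iter=50):
    beta = np.asarray(beta0, dtype=float)
    sse = np.sum((y - f(X, beta)) ** 2)
    for _ in range(n_iter):
        r = y - f(X, beta)
        J = numerical_jacobian(f, X, beta)
        # Damped normal equations: (J^T J + lam * I) delta = J^T r.
        delta = np.linalg.solve(J.T @ J + lam * np.eye(beta.size), J.T @ r)
        new_sse = np.sum((y - f(X, beta + delta)) ** 2)
        if new_sse < sse:        # accept the step and trust the linearization more
            beta, sse, lam = beta + delta, new_sse, lam / 10.0
        else:                    # reject the step and increase the damping
            lam *= 10.0
    return beta, sse

# Hypothetical noise-free data generated by a known node.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 0.9, size=(30, 2))
y = node_output(X, np.array([2.0, -1.0, 0.5]))
beta0 = np.ones(3)               # all-ones start, as suggested in the text
sse0 = np.sum((y - node_output(X, beta0)) ** 2)
beta, sse = levenberg_marquardt(node_output, X, y, beta0)
print(sse0, sse, beta)
```

With `lam` set to zero the update reduces to solving the undamped linear system obtained from the derivation; the damping only stabilizes steps taken far from the optimum.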

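2.4. Step 4: Determining the Upper Bounds of Parameters in the FBPN. In Chen and Wang [33] and Chen and Lin [41], an NP model is constructed to adjust the connection weights and thresholds in the FBPN; such an NP model is not easy to solve. The proposed methodology only makes iterative adjustments to the threshold on the output node. This is much simpler and can also achieve good results. Substituting (13) into (12) gives
$$o_{j2} = \frac{1}{1 + e^{\theta^o_2 - I^o_{j2}}}. \tag{20}$$
Therefore,
$$\ln\left(\frac{1}{o_{j2}} - 1\right) = \theta^o_2 - I^o_{j2}. \tag{21}$$
So
$$I^o_{j2} = \theta^o_2 - \ln\left(\frac{1}{o_{j2}} - 1\right). \tag{22}$$
Assume that the adjustment made to the output node threshold is denoted as $\Delta\theta^o = \theta^o_3 - \theta^o_2$. After adjustment, the output from the new FBPN, $o_{j3}$, determines the upper bound of the cycle time:
$$o_{j3} = \frac{1}{1 + e^{-n^o_{j3}}}, \tag{23}$$
where
$$n^o_{j3} = I^o_{j2} - \theta^o_3 = I^o_{j2} - (\theta^o_2 + \Delta\theta^o). \tag{24}$$
Substituting (24) into (23) gives
$$o_{j3} = \frac{1}{1 + e^{\theta^o_2 + \Delta\theta^o - I^o_{j2}}}, \tag{25}$$
and substituting (22) into (25) gives
$$o_{j3} = \frac{1}{1 + e^{\Delta\theta^o} \left( \frac{1}{o_{j2}} - 1 \right)}. \tag{26}$$
Obviously, the maximum of $\Delta\theta^o$ determines the lowest upper bound. Since $o_{j3}$ is the upper bound of the cycle time,
$$o_{j3} \geq N(\mathrm{CT}_j), \tag{27}$$
where $N(\mathrm{CT}_j)$ is the normalized actual cycle time of job $j$. Together with (26), this gives
$$\Delta\theta^o \leq \ln \frac{1/N(\mathrm{CT}_j) - 1}{1/o_{j2} - 1}. \tag{28}$$
Equation (28) holds for all jobs, so
$$\Delta\theta^o \leq \min_j \ln \frac{1/N(\mathrm{CT}_j) - 1}{1/o_{j2} - 1}. \tag{29}$$
According to (26), the optimal value of $\Delta\theta^o$ should be set to the maximum possible value:
$$\Delta\theta^{o*} = \min_j \ln \frac{1/N(\mathrm{CT}_j) - 1}{1/o_{j2} - 1}. \tag{30}$$
The FBPN's optimization results depend on the initial conditions and therefore are different at every iteration. Assume that the optimal value of $o_{j3}$ in the $t$th iteration is denoted by $o_{j3}(t)$; then, after some iterations,
$$o_{j3} = \min_t o_{j3}(t). \tag{31}$$
In this way, the upper bound of the cycle time is decreased gradually (see Figure 2). Another merit of this approach is that it does not rely on the parameters of the FBPN.
Theorem 1. The upper bound can be found by considering only jobs with $N(\mathrm{CT}_j) \geq o_{j2}$.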

2.5. Step 4: Determining the Lower Bounds of Parameters in the FBPN.
In a similar way, the threshold on the output node can be modified to determine the lower bound of the cycle time forecast so that the actual value will be greater than the lower bound of the network output. The optimal value of $\Delta\theta^o$ can be obtained as
$$\Delta\theta^{o*} = \max_j \ln \frac{1/N(\mathrm{CT}_j) - 1}{1/o_{j2} - 1}.$$
Theorem 2. The lower bound can be found by considering only jobs with $N(\mathrm{CT}_j) < o_{j2}$.
The proof of Theorem 2 resembles that of Theorem 1.
Assume that the optimal value of $o_{j1}$ in the $t$th iteration is denoted by $o_{j1}(t)$; then, after some iterations,
$$o_{j1} = \max_t o_{j1}(t).$$
In this way, the lower bound of the cycle time is increased gradually. $\Delta\theta^{o*}$ does not rely on the parameters of the FBPN.
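The threshold adjustments for both bounds can be checked numerically. The sketch below assumes that the center forecasts and the actual cycle times have already been normalized into the open interval (0, 1); the function name `bound_adjustments` and the four sample values are hypothetical. It computes the largest threshold shift that keeps every upper bound at or above the actual value and the smallest shift that keeps every lower bound at or below it.

```python
import numpy as np

def bound_adjustments(o2, y):
    """o2: normalized center forecasts; y: normalized actual values (both in (0, 1))."""
    o2 = np.asarray(o2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Per-job limit on the threshold shift, from requiring the bound to reach y.
    limit = np.log((1.0 / y - 1.0) / (1.0 / o2 - 1.0))
    d_upper = limit.min()    # upper bound: shift must not exceed any job's limit
    d_lower = limit.max()    # lower bound: shift must not fall below any job's limit
    o3 = 1.0 / (1.0 + np.exp(d_upper) * (1.0 / o2 - 1.0))
    o1 = 1.0 / (1.0 + np.exp(d_lower) * (1.0 / o2 - 1.0))
    return d_upper, d_lower, o1, o3

# Hypothetical normalized forecasts and actual values for four jobs.
o2 = np.array([0.30, 0.50, 0.70, 0.60])
y = np.array([0.35, 0.45, 0.72, 0.60])
d_upper, d_lower, o1, o3 = bound_adjustments(o2, y)
print(d_upper, d_lower)
print(np.round(o1, 3), np.round(o3, 3))
```

Only the job that attains the minimum (or maximum) limit touches its bound; every other job lies strictly inside the interval. This is consistent with the observation that the upper bound is driven by underestimated jobs and the lower bound by overestimated jobs.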

Application and Analyses
To demonstrate the application of the proposed methodology, an illustrative example containing the data of 40 jobs (see Table 6) was used.
We standardize the data and obtain the correlation matrix $R$. The eigenvalues and eigenvectors of $R$ are then calculated, and the variance contribution rates $\eta_q$ and their cumulative sums $\eta_\Sigma(p)$ are obtained from the eigenvalues. A Pareto analysis chart is used to compare the percentage of variability explained by each principal component (see Figure 3). There is a clear break in the amount of variance accounted for by each component between the first and second components. However, that component by itself explains less than 50% of the variance, so more components are considered. To meet the requirement $\eta_\Sigma(p) \geq 85\% \sim 90\%$, $p$ is chosen as 3. We can see that the first three principal components explain roughly 80% of the total variability in the standardized data, so we reduce the issue to these three components in order to visualize the data. The component scores are recalculated as the new inputs to the FCM-FBPN (see Table 7). These scores contain the coordinates of the original data in the new coordinate system defined by the principal components.
Subsequently, jobs are classified using FCM based on the new variables. The results of the $S$ test are summarized in Table 8. In this case, the optimal number of job categories is 5. However, there will be some categories with very few jobs. For this reason, the second best solution is used, that is, 4 categories. A common practice is to set a threshold of membership $\mu_L$ to determine whether a job belongs to each category. With the decrease in the threshold, each category contains more jobs. Because the overall forecasting performance is affected by the outliers, $\mu_L$ should not be set too high. In this case, $\mu_L$ is set to 0.3. The classifying results are shown in Table 9.
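The effect of the membership threshold can be made concrete with a few lines of code. The sketch assumes a membership matrix with one row per job and one column per category; the function name `assign_categories`, the fallback rule for jobs that fall below the threshold in every category, and the sample memberships are hypothetical and are not taken from Table 9.

```python
import numpy as np

def assign_categories(U, mu_L=0.3):
    """Boolean matrix: entry (j, k) is True when job j is used in category k."""
    U = np.asarray(U, dtype=float)
    member = U >= mu_L
    # Assumed fallback: a job below the threshold everywhere joins its best category.
    orphan = ~member.any(axis=1)
    member[orphan, U[orphan].argmax(axis=1)] = True
    return member

# Hypothetical memberships of three jobs in four categories.
U = np.array([[0.70, 0.20, 0.10, 0.00],
              [0.40, 0.35, 0.15, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
print(assign_categories(U, mu_L=0.3).astype(int))
print(assign_categories(U, mu_L=0.1).sum(axis=0))   # lower threshold, larger categories
```

A job such as the second one contributes training examples to two categories, which is how the fuzzy classification keeps small categories from running out of examples.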
After job classification, 3/4 of the examples in each category are used as training examples, and the remaining 1/4 are left for testing. A three-layer FBPN is then used to predict the cycle time of jobs in each category according to the new variables with the following settings.

Single hidden layer.
The number of neurons in the hidden layer: 2 * 3 = 6.
Convergence criterion: SSE < $10^{-6}$ or 10000 epochs have been run.
Iterations: 5.
The forecasting results are shown in Figure 4. The performance level of the proposed methodology is compared with those of statistical analysis (i.e., multiple linear regression), CART, PCA-CART, BPN, FCM-BPN, and PCA-BPN in Table 10. The forecasting precision is evaluated with the average range:
$$\text{average range} = \frac{1}{n} \sum_{j=1}^{n} \left( N^{-1}(o_{j3}) - N^{-1}(o_{j1}) \right),$$
where $N^{-1}(\cdot)$ converts a normalized value back to a cycle time. The average range of a nonbiased crisp approach can be measured with $6\sigma$ (or 6 RMSE):
$$\sigma = \sqrt{\frac{\sum_{j=1}^{n} (a_j - f_j)^2}{n - m - 1}},$$
where $m$ is the number of independent variables; $a_j$ and $f_j$ denote the actual value and forecast of job $j$, respectively; $n$ is the total number of data. When $n \to \infty$, $\sigma \to \mathrm{RMSE}$. The CART algorithm is based on that of Breiman et al. [42].
The results of CART are shown in Figure 5. In PCA-CART, the new variables found by PCA are used as input variables to determine the job cycle time. The results of PCA-CART are shown in Figure 6. The adjustment of the lower and upper bounds for the proposed methodology is shown in Figure 7.
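The accuracy and precision measures used in the comparison can be computed as in the following sketch. The function names and the four-job sample are hypothetical; the six-sigma range follows the standard-error form with the number of independent variables passed in as `n_vars`, which is an assumption about how the degrees of freedom are counted.

```python
import numpy as np

def accuracy_metrics(actual, forecast):
    """MAE, MAPE (in percent), and RMSE of crisp forecasts."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err) / np.asarray(actual, dtype=float)) * 100.0),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }

def average_range(lower, upper):
    """Precision of interval forecasts: mean width of the (lower, upper) bounds."""
    return float(np.mean(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)))

def six_sigma_range(actual, forecast, n_vars):
    """Comparable range for a crisp method: six standard errors of the forecast."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    sigma = np.sqrt(np.sum((actual - forecast) ** 2) / (actual.size - n_vars - 1))
    return float(6.0 * sigma)

# Hypothetical cycle times in hours.
actual = np.array([100.0, 200.0, 300.0, 400.0])
forecast = np.array([110.0, 190.0, 310.0, 390.0])
metrics = accuracy_metrics(actual, forecast)
print(metrics)
print(average_range([90.0, 180.0], [120.0, 220.0]))
print(six_sigma_range(actual, forecast, n_vars=1))
```

Comparing `average_range` for the fuzzy forecasts with `six_sigma_range` for a crisp method puts both kinds of forecasts on the same interval-width scale.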
According to the experimental results, we have the following.
(1) The nonlinear nature of this problem is obvious, since the performance of statistical analysis is poor, and statistical analysis is a linear approach.
(2) The accuracy of the job cycle time forecasting by the proposed fuzzy data mining approach is better than those of the compared approaches because the proposed approach has achieved 25%∼85% reductions in MAPE.
(3) Moreover, the precision level of the proposed methodology, measured in terms of the average range of forecasts, is also superior to those of the compared approaches. The proposed methodology can be used to specify a very narrow range of the job cycle time. Since the internal due date is based on the cycle time forecast, a narrow range promotes the reliability of the internal due date.
(4) The proposed method is unusually slow in its modification of the lower bound (see Figure 7); a more effective way to solve this problem is needed.
(5) PCA seems to be useful for methods with job classification like CART.The combination of PCA and CART has significantly better forecasting accuracy than either method in isolation.
(6) The simple combination of PCA and BPN does not have much effect.The main effect of PCA is to improve the correctness of the job classification.
In order to test the performance of the proposed methodology in real-world situations, it has been applied to a scheduling problem with 457 jobs from a wafer fabrication factory located in Taichung City Scientific Park, Taiwan. There are more than ten products in the wafer fabrication factory. The wafer fabrication factory has a monthly capacity of 20,000 wafers. Jobs are regularly released into the wafer fabrication factory. Various types of priorities are assigned to these jobs. Each product has 150∼200 steps and six to nine reentries to the most bottlenecked machine. The first-in first-out (FIFO) rule is used to dispatch the jobs. The production characteristic of "reentry," which is highly relevant to the semiconductor industry, is clearly reflected in this problem. It also shows the difficulties facing production planners and schedulers who attempt to provide an accurate due date for a product with a very complicated routing. For each job, twelve parameters were collected or retrieved from production management information system (PROMIS) databases and reports. After a backward elimination by regression analysis, six parameters (the job size, factory utilization, the queue length on the route, the queue length before the bottleneck, the work-in-process (WIP), and the average waiting time) were chosen as the inputs to the FBPN. Here "the average waiting time" refers to the average of the waiting times of the three jobs that were completed most recently, not to the waiting time of the job under consideration. The proposed methodology and several existing methods have been applied to this case. The forecasting precision (measured in terms of the average range) and accuracy (measured in terms of RMSE) of these approaches are compared in Figures 8 and 9, respectively. The samples in this case exhibited notable variability, which limited the performance of the forecasting methods. Nevertheless, the proposed methodology still outperformed the six existing methods in both regards. It is also noteworthy that the methods based on job classification (such as CART, PCA-CART, FCM-BPN, and the proposed methodology) achieved better forecasting accuracy than methods without job classification. Such an advantage was further strengthened by variable replacement techniques like PCA.

Conclusions and Directions for Future Research
A fuzzy data mining approach for precise and accurate job cycle time forecasting is presented in this study. The forecast of the job cycle time by the proposed methodology is a fuzzy value; to improve precision, an effective procedure tightens the upper and lower bounds of the forecast. The forecasting accuracy is enhanced by a combination of PCA, FCM, and FBPN. This study presents an illustrative example and a real case with the data of 457 jobs from a wafer fabrication factory. According to the experimental results, we have the following.
(1) The forecasting accuracy (measured with MAE, MAPE, and RMSE) of the PCA-FCM-FBPN was significantly better than the accuracy levels of the other approaches.
(2) Methods based on job classification have better forecasting accuracy.The incorporation of PCA makes the relationships between variables clearer and strengthens the advantage of job classification.
(3) It is possible to specify a very narrow range of the job cycle time using the proposed methodology.
The upper bound can be adjusted faster than the lower bound.An effective method is needed to tackle this issue.In addition, the FBPN approach focuses on the modification of the threshold on the output node.Similar concepts can be applied to modify other parameters in the FBPN.

Figure 1: The flowchart of the proposed methodology.

Figure 2: Iterative reduction of the upper bound.

Figure 9: Comparison of the forecasting precision.

Table 2: Data mining methods for job cycle time forecasting.

Table 3: Advantages and disadvantages of various data mining approaches for job cycle time forecasting.

Table 4: The differences between the proposed methodology and the previous methods.

Table 5: The proposed methodology compared to data mining.
* PROMIS: production management information system; MAE: mean absolute error; MAPE: mean absolute percentage error.
Classifying Jobs Using FCM. After PCA, examples are classified by FCM. If a crisp clustering method is applied instead, then it is possible that some clusters will have very few examples. FCM avoids the problems of crisp clustering, because in FCM an example belongs to multiple clusters to different degrees. Similarly, in probability theory the naïve Bayes method provides a probability that each item belongs to a class. However, the classification of jobs in FCM can include subjective judgments. FCM classifies jobs by minimizing an objective function, the membership-weighted sum of squared distances from the jobs to the category centroids.

Table 6: The illustrative example.

Table 7: New inputs to the FCM-FBPN.

Table 8: The results of the S test.

Table 10: Comparison of the forecasting performance levels.