Preventive maintenance activities require a tool to be taken offline for long hours in order to perform the prescribed maintenance tasks. Although preventive maintenance is crucial to ensure the operational reliability and efficiency of the tool, long preventive maintenance activities increase the cycle time of the semiconductor fabrication foundry (Fab). Therefore, this activity is usually performed when the incoming Work-in-Progress (WIP) to the equipment is forecasted to be low. The current statistical forecasting approach has low accuracy because it lacks the ability to capture the time-dependent behavior of the incoming WIP. In this paper, we present a forecasting model that utilizes a machine learning method to forecast the incoming WIP. Specifically, our proposed model uses LSTM to produce a multi-step-ahead forecast of the incoming WIP to an equipment group. The proposed model's prediction results were compared with the results of the Fab's current statistical forecasting method. The experimental results demonstrate that the proposed model outperformed the statistical forecasting method in both hit rate and Pearson's correlation coefficient.
In semiconductor manufacturing, preventive maintenance (PM) is an activity that takes the entire tool offline to carry out prescribed maintenance tasks in order to maintain or increase the operational efficiency and reliability of the tool and to minimize unanticipated failures due to faulty parts [
Reducing cycle time is one of the main goals in ensuring on-time delivery to customers while maintaining good wafer yields. Thus, proper PM planning is necessary to minimize the cycle time impact while keeping the tool operationally reliable. To achieve this goal, PM should be performed when the tool group has low incoming WIP (IWIP). However, the IWIP to a tool group has high variation, as it is influenced by the conditions of the tools supplying WIP to it and by various lot dispatching decisions that change dynamically every day.
In this paper, we present a multi-step univariate IWIP prediction model to forecast the IWIP to a particular tool group in a semiconductor fabrication foundry (Fab) for the next seven days; the seven-day horizon is a requirement from the Fab. The problem domain is based on X-Fab Sarawak Sdn. Bhd., abbreviated hereafter as the Fab. A Long Short-Term Memory (LSTM) recurrent neural network is used in the prediction model to learn the historical IWIP pattern of the tool group and predict its future IWIP pattern. LSTM has been used in various research areas such as traffic flow prediction [
A machine learning-based approach to predict the IWIP for a tool group of interest in the Fab. Specifically, an LSTM recurrent neural network is used as the machine learning algorithm to predict the IWIP.
A simplified prediction model that is capable of modeling the dynamic environment of the Fab and delivers higher prediction accuracy than the Fab’s baseline method.
The remainder of the paper is organized as follows. Section
In the research domain of forecasting in semiconductor manufacturing, the majority of research works focus on forecasting the cycle time of the Fab. For instance, Wang et al. [
In another study, Wang et al. [
Scholl et al. [
Another similar research work was done by Mosinski et al. [
The forecast elements of the authors' research consist of three main stages: data collection, statistics generation, and forecast calculation. The data collection stage collects all lots that are currently Work-In-Progress (WIP). The statistics generation step calculates the cycle time from the start operation step
In the statistics generation step, the weightages used in the authors' model are Fab-specific and hence must be defined upfront if the model is used in a different fab. In addition, various manual data sanitization steps are required to remove outliers in the data. Moreover, regenerating the cycle time statistics is resource intensive, especially when a large number of lots is involved; this limitation makes the model impractical for most Fabs with large lot counts. Proper lot classification is necessary to ensure that the generated cycle time statistics are relevant, and special lot-scheduling software is also required to generate the relevant data.
Due to the lack of similar research in the domain of semiconductor fabrication, a cross-reference to a similar research problem in a different domain is necessary. From the literature review, vehicle traffic arrival forecasting exhibits the closest similarity to forecasting WIP arrival in the fab. Consider the comparison of the following two models presented in Figures
Geometric layout of traffic flow prediction scenario by Larry [
Typical scenario of WIP arrival to an equipment group in a fab.
In Figure
A logical grouping of tools into tool group.
According to Larry [
Prediction models can commonly be divided into two categories: parametric models and nonparametric models. Parametric models refer to models with a fixed structure based on some assumptions, whose parameters can be computed from empirical data [
Nonparametric models refer to models with no fixed structure and parameters [
LSTM, a variant of the deep-learning neural network family, has recently gained popularity in traffic flow prediction. In [
In [
In [
The daily IWIP to a tool group is a form of time series data, because it is a sequence of values observed sequentially in time. IWIP forecasting for a tool group is similar to traffic arrival forecasting, where the objective is to ensure that there is enough capacity for the traffic to flow through with minimum obstruction over a given future timeframe, so as not to create bottlenecks in the traffic flow. The amount of WIP arriving at a tool group is analogous to the number of vehicles arriving at a road junction or a group of interlinked roads of interest. This research also requires a multi-step-ahead forecasting approach, as the problem requires forecasting the IWIP multiple days ahead from the last observation in order to plan PM activities.
The existing solution in the Fab uses a basic statistical forecasting approach to forecast the WIP arrival for all tool groups for the next seven days. The forecast is run once a week, at the beginning of each week. The calculation steps of the statistical forecasting approach are summarized in Table
Existing statistical WIP forecasting steps.
Step 1. Given the process flow name of product
Step 2. For each process step
Step 3. Let
Step 4. Cycle time (CT) for a process step
Step 5. For each lot, sum the next
Step 6. To forecast the destination step of the lot for the next
The existing forecasting method only caters to products whose ordered wafer count dominates the total WIP in the production line, because the calculation requires the number of operation steps and their respective
The long short-term memory (LSTM) network was developed in 1997 by Hochreiter and Schmidhuber [
Let time be denoted as $t$. At each time step, the LSTM cell receives the current input $x_t$, the previous hidden state $h_{t-1}$, and the previous cell state $c_{t-1}$, and computes the following quantities, where the $W$ terms are weight matrices and the $b$ terms are bias vectors.

Input gate: $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$

Forget gate: $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$

Output gate: $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$

Cell input: $\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$

Sigmoid function: $\sigma(z) = 1 / (1 + e^{-z})$

Hyperbolic tangent function: $\tanh(z) = (e^{z} - e^{-z}) / (e^{z} + e^{-z})$

Using the gate equations above, the cell state is updated as $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$, where $\odot$ denotes element-wise multiplication.

Lastly, the hidden layer output is calculated using the following equation: $h_t = o_t \odot \tanh(c_t)$.
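As a concrete illustration of the gate equations, a single LSTM time step can be sketched in NumPy as follows (the weight layout and dictionary keys are illustrative conveniences, not the actual implementation used in this research):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b hold the input-to-hidden weights, hidden-to-hidden weights,
    and biases for the four gates: input "i", forget "f", output "o",
    and cell candidate "g".
    """
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # cell input
    c_t = f * c_prev + i * g           # element-wise cell state update
    h_t = o * np.tanh(c_t)             # hidden layer output
    return h_t, c_t
```

Because the output gate and tanh are both bounded, each component of the hidden output stays within (−1, 1).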
The hidden layers of the LSTM can be stacked such that the neural network architecture consists of more than one LSTM hidden layer. Figure
Nonstacked LSTM neural network.
Stacked LSTM neural network.
With reference to Figure
Figure
Proposed method.
In the data scaling step, the historical IWIP data to be used for supervised learning is scaled according to the following equation:
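A common choice for such scaling, and the one assumed in this sketch (the Fab model's exact scaling equation may differ), is min–max normalization to [0, 1], together with the inverse transform needed to map scaled forecasts back to wafer counts:

```python
import numpy as np

def minmax_scale(series):
    """Scale a 1-D series to [0, 1]; also return the (min, max) pair
    needed to invert the transform after forecasting."""
    lo, hi = float(np.min(series)), float(np.max(series))
    scaled = (series - lo) / (hi - lo)
    return scaled, (lo, hi)

def minmax_invert(scaled, bounds):
    """Map scaled values back to the original IWIP range."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo
```

Keeping the (min, max) bounds from the training data is essential so that the model's scaled forecasts can be converted back into interpretable IWIP values.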
The next step is the data formatting step. In the time series domain, the term "lags" is commonly used to denote the values of time steps observed prior to the prediction. Generally, the time series data are separated into a training set and a testing set, where the training set contains the lags, while the testing set contains the actual values of future time steps. Therefore, in the data formatting step, let
Data formation for training dataset (Sets 1–7).
Data formation for testing dataset (Sets 1–7).
Following this format, column
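The lag/target framing described above can be sketched as a sliding-window transform (the function name and shapes are illustrative): each sample pairs a window of past observations with the multi-step targets that follow it.

```python
import numpy as np

def make_supervised(series, n_lags, n_ahead=7):
    """Frame a univariate series for supervised learning:
    each sample is `n_lags` past values (inputs) followed by
    `n_ahead` future values (multi-step targets)."""
    X, y = [], []
    for i in range(len(series) - n_lags - n_ahead + 1):
        X.append(series[i : i + n_lags])
        y.append(series[i + n_lags : i + n_lags + n_ahead])
    return np.array(X), np.array(y)
```

For instance, a 20-point series with 7 lags and 7-step-ahead targets yields exactly seven supervised samples.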
As a nonparametric model, a neural network does not have a fixed structure. In an RNN with one hidden layer, the ability of the network to discover important relationships in the training data during supervised learning is affected by the batch size used per epoch, the number of epochs, the number of hidden layers, and the number of hidden neurons. The combination of the sizes of these four parameters that results in stable supervised learning and delivers the lowest forecast error is desired.
Each parameter being examined will have a list of predefined sizes to be tested. When one of the parameters is being examined, the remaining three parameters will be fixed at their current sizes in their respective lists. This controls the variation across the examinations. For each combination of the parameters, the model will be tested with that combination to measure its performance in terms of forecasting error and the stability of its supervised learning.
For the LSTM setup of this research, we construct an LSTM model using the LSTM cell. Let
The metric used to measure the forecasting error in supervised learning is the root-mean-squared error (RMSE). Let
RMSE is a frequently used evaluation metric because it measures the difference between the values predicted by a model and the values actually observed.
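For reference, the RMSE between an actual and a predicted vector can be computed as:

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length vectors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

Because the errors are squared before averaging, RMSE penalizes large forecast misses more heavily than small ones.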
For each parameter size combination to be tested, the model will be run multiple times with the same parameter settings. If
After supervised learning is completed, the proposed method proceeds to parameter combination evaluation and selection. This step is necessary because it is tempting to assume that a parameter combination yielding a low RMSE at the end of supervised learning directly translates into a good combination that gives the model sufficient forecasting capability. This assumption is misleading: a model that has over-learned during supervised learning can deliver a very low RMSE at the end of training, yet an over-learned model will perform poorly in the actual forecast. Therefore, it is also necessary to measure the stability of the model's supervised learning for a given combination of the four parameters. During each epoch of supervised learning, the model performs two forecasts: one on a reserved set from the training set and the other on a reserved set from the testing set. The two resulting RMSE values are the training error and the testing error, respectively. To measure the stability of supervised learning, the training and testing errors of each epoch are collected and plotted in a single graph. With
RMSE curves when the model is well learned.
RMSE curves when the model is under-learned.
RMSE curves when the model is overlearned (overfitting).
The combination of the four parameters' sizes that enables the model to show stable performance in supervised learning with the lowest RMSE will be selected to forecast the IWIP.
For each of the selected parameter combinations, the model is required to forecast for three consecutive weeks. The accuracy of the forecast results will be measured according to the selected metrics to evaluate the forecasting capability of the model.
The IWIP for a particular tool group is denoted as IWIP and can be calculated as
The data used for this experiment were acquired from the Fab's internal development database, with an application running hourly to collect the WIP and calculate the number of wafers moved for each tool group in the production line every 24 hours. Due to the Fab's data security and confidentiality policies, we were only allowed to access the company's production data source for a specific duration. Within the allowed duration, we collected three months of data, creating a dataset with 90 days of historical IWIP. With each daily IWIP as a data point, 70 percent of the data points are used for the LSTM training phase and the remaining 30 percent for the testing phase.
For the number of epochs, values of 100 and 200 are selected. For batch size, values of 10 and 20 are selected; for the number of hidden layers, values of 3 and 4 are selected; and for the number of hidden neurons, values of 384 and 512 are selected for the first hidden layer, while values of 8 and 16 are selected for the subsequent layers. It is worthwhile to mention that by using seven
Parameter size selection is done by selecting the lowest RMSE among the three experimental runs and then examining the supervised-learning graphs of the run that produced the lowest RMSE. The desired supervised-learning graph should resemble the pattern illustrated in Figure
To measure the performance of the models, two accuracy measurements are used: hit rate and correlation measurement.
Hit rate, or probability of detection (POD), is the probability that the forecasted event matches the observed event. In the context of this research, the observed events are either low IWIP or high IWIP. Therefore, the hit rate can be used to measure the capability of the proposed method to match the actual IWIP events. Let
Based on the Fab's requirement, the proposed method only needs to forecast any two days with the highest IWIP and any two days with the lowest IWIP. For these four days, the hit rate required by the Fab is 75 percent; in other words, at least three of the four days must be correctly detected.
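One plausible way to operationalize this hit rate (the Fab's exact event-labeling rule may differ) is to compare which days each series places among its two highest and two lowest values:

```python
import numpy as np

def hit_rate(actual, forecast, n_extremes=2):
    """Fraction of event days detected: the n_extremes highest and
    n_extremes lowest actual-IWIP days are the events; a hit occurs when
    the forecast places the same day in the same extreme group."""
    actual = np.asarray(actual)
    forecast = np.asarray(forecast)
    order_a, order_f = np.argsort(actual), np.argsort(forecast)
    low_a, high_a = set(order_a[:n_extremes]), set(order_a[-n_extremes:])
    low_f, high_f = set(order_f[:n_extremes]), set(order_f[-n_extremes:])
    hits = len(low_a & low_f) + len(high_a & high_f)
    return hits / (2 * n_extremes)
```

With `n_extremes=2`, the possible scores over a seven-day week are 0%, 25%, 50%, 75%, and 100%, matching the granularity of the hit rates reported below.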
To measure the correlation between the actual IWIP and the forecasted IWIP, this research uses the Pearson’s correlation coefficient,
The correlation coefficient takes values in the range [−1, 1]. A value of 1 implies that a linear equation describes the relationship between the two vectors perfectly, meaning all data points of the two vectors fit on a straight line. A positive coefficient indicates positive correlation: as the actual IWIP increases, the forecasted IWIP increases as well. A negative coefficient indicates negative correlation: as the actual IWIP increases, the forecasted IWIP decreases. Positive correlation is therefore desirable for the forecast results.
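For reference, Pearson's correlation coefficient between the actual and forecasted IWIP vectors can be computed directly as:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()   # center both vectors
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A perfectly proportional forecast yields r = 1, while a forecast that moves opposite to the actual IWIP yields r = −1.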
Due to the Fab’s privacy protection agreement, only the obtained Pearson’s
We conducted the experiment for three consecutive weeks, which allows us to monitor the consistency of the models' predictions. At the beginning of each week, we predict seven days ahead and measure the performance at the end of the week. The implementation of the proposed method is accomplished using the Python programming language and Keras [
Table
Parameter size selection results.

Combination  Epoch  Batch size  Hidden layers  LSTM neuron sizes  RMSE
1  100  10  3  512, 8, 8  0.0096
2  100  10  3  512, 8, 16  0.0086
3  100  20  3  512, 16, 16  0.0091
Figures
Supervisedlearning result for Combination 1.
Supervisedlearning result for Combination 2.
Supervisedlearning result for Combination 3.
The parameters from each of the three selected combinations were applied to the proposed LSTM model to perform the three consecutive weeks of forecasting. The experiments were run and recorded separately for each combination. Tables
Hit rate for Combination 1 (H = high IWIP day, L = low IWIP day): HR = 75% (w1), 25% (w2), 50% (w3).
Hit rate for Combination 2: HR = 75% (w1), 50% (w2), 25% (w3).
Hit rate for Combination 3: HR = 75% (w1), 75% (w2), 50% (w3).
Pearson’s r for Combination 1: 0.31 (w1), 0.06 (w2), 0.34 (w3).
Pearson’s r for Combination 2: 0.40 (w1), 0.28 (w2), 0.46 (w3).
Pearson’s r for Combination 3: 0.42 (w1), 0.31 (w2), 0.43 (w3).
Summary of hit rate for Combinations 1, 2, and 3.

Combination  w1 (%)  w2 (%)  w3 (%)
1  75  25  50
2  75  50  25
3  75  75  50
Summary of Pearson’s r for Combinations 1, 2, and 3.

Combination  w1  w2  w3
1  0.31  0.06  0.34
2  0.40  0.28  0.46
3  0.42  0.31  0.43
Figures
IWIP forecast using Combination 1.
IWIP forecast using Combination 2.
IWIP forecast using Combination 3.
From the results obtained, the model performed best with Combination 3. In terms of hit rate, Combination 3 scored the highest compared to Combinations 1 and 2 across all three weeks. Combinations 1 and 2 scored 75 percent for week 1, but for the subsequent weeks both combinations scored at most 50 percent. In terms of Pearson’s
We then compared the forecast results using Combination 3 to the statistical forecasting method used in the Fab, hereafter abbreviated as the Fab method. Tables
Hit rate for the Fab method: HR = 50% (w1), 0% (w2), 0% (w3).
Pearson’s r for the Fab method: 0.28 (w1), −0.11 (w2), −0.82 (w3).
Forecast result comparison between the proposed method and the Fab method.

Method  HR w1 (%)  HR w2 (%)  HR w3 (%)  r w1  r w2  r w3
Proposed method  75.0  75.0  50.0  0.42  0.31  0.43
Fab method  50.0  0.0  0.0  0.28  −0.11  −0.82
IWIP forecast using Fab method.
The Fab method serves as the baseline for measuring the performance of the LSTM forecasting model. From the results tabulated in Table
In terms of hit rate, the proposed method scored only 50% for week 3, while for Pearson’s
The next factor that could contribute to the inconsistent results of the model is the limited number of features used to represent the Fab's production environment. Additional features representing the production environment could allow the LSTM to model WIP arrival better. Examples of such additional features are the actual number of tools supplying WIP to the tool group of interest, the amount of time each tool in the tool group spends processing production wafers rather than undergoing maintenance activities, and the number of wafers each tool in the tool group of interest has actually processed.
The last factor that could contribute to the inconsistent results is the need for more hidden layers. Increasing the number of hidden layers creates a deeper neural network that could potentially allow the model to capture even more time-dependent relations in the data. However, to benefit from a deeper neural network, a larger dataset must first be obtained so that the model can be properly trained.
For the experiments conducted, the selection of sizes for the LSTM model's parameters and the number of experimental runs were largely constrained by the allocated hardware resources and the software setup. From the hardware perspective, sufficient CPUs should be allocated to the computing machine; from the software perspective, parallelization should be enabled to fully utilize the available CPUs. With 4 CPUs allocated in a virtual machine environment and parallelization enabled in Keras, it took approximately 8 hours to complete one full experiment, i.e., the complete evaluation of all the predefined sizes. For real production deployment, 8 hours is too long to obtain a usable model. Parallelization with a sufficient number of CPUs is therefore critical in the production environment, as the results should be obtained as fast as possible for management to make the decisions necessary for production line stability. Hence, proper hardware planning is required for production deployment.
PM is an important activity in the Fab, as it maintains or increases the operational efficiency and reliability of the tools. Proper PM planning is necessary because PM takes a significantly long time to complete; it is therefore desirable to perform it when the IWIP to the tool group is expected to be low. With an IWIP prediction model capable of predicting the IWIP with high accuracy, PM activity can be planned and managed better to reduce its negative impact on the Fab's cycle time (CT). Reducing this impact is important because it enables the Fab to meet the on-time delivery (OTD) committed to customers. With consistent OTD, the company's logistics management can also be improved, for example through proper planning of storage for fabricated wafers and scheduling of their transport for shipment. Well-planned PM activities also enable better manpower planning: sufficient tool engineers and tool vendors are required on-site to perform the prescribed maintenance, and well-planned PM allows the required manpower, and hence manpower cost, to be properly prepared. With proper PM planning in place, tools in the Fab can be scheduled to receive appropriate maintenance on time, improving their productivity and extending their lifetime, thereby optimizing the company's capital investment in the tools. Reliable tool performance will also increase customer trust, as the chance of fabricated wafers being scrapped due to an unhealthy tool is minimized.
In this paper, we investigated LSTM to assist PM planning in the Fab by predicting the IWIP to a tool group. The performance of the proposed method was compared with an existing forecasting method from the Fab. The proposed method was trained using the historical IWIP data, a time series, provided by the Fab. Both hit rate and Pearson's correlation coefficient are important criteria for determining forecast capability. The proposed method demonstrated results that outperformed the Fab method, meeting the Fab's requirement for two of the three weeks, while the Fab method failed to meet the requirement for all three weeks. In terms of hit rate, the proposed method shows a higher percentage than the Fab method. Following the Fab's requirement, the results signify that for a seven-day forecast horizon, the proposed method more accurately identifies the two days with the highest IWIP and the two days with the lowest IWIP in a week. In terms of Pearson’s correlation coefficient,
Although the proposed method outperformed the Fab's existing statistical method, there is still room for improvement. The first future work is to increase the size of the historical dataset; with a larger dataset spanning a longer time horizon to train the LSTM model, the model may discover significant WIP arrival patterns that are missed in a smaller dataset. The second future work is to extend the univariate forecasting model in this research to a multivariate forecasting model, allowing the inclusion of more features so the LSTM can better model the actual environment of the Fab. The next future work is to increase the number of hidden layers in the LSTM forecasting model, as an initial step toward experimenting with deeper models for time series forecasting. The last future work is to apply the proposed method to predict the IWIP of other types of tool groups, to test whether it delivers the same prediction performance. The prediction results collected across various types of tool groups will also allow us to generalize the proposed method into a generic IWIP prediction model for the Fab.
The time-series data used to support the findings of this study were supplied by X-Fab Sarawak Sdn. Bhd. under a privacy agreement and therefore cannot be made freely available. The data potentially reveal sensitive information, and their access is therefore restricted.
The authors declare that they have no conflicts of interest.
The funding for this project was made possible through a research grant from the Ministry of Education, Malaysia, under the Research Acculturation Collaborative Effort (Grant No. RACE/b(3)/1247/2015(03)). The authors would like to thank XFAB Sarawak Sdn. Bhd. for supporting this research by providing the environment and resources to extract the relevant data.