Utilizing an Adaptive Grey Model for Short-Term Time Series Forecasting: A Case Study of Wafer-Level Packaging

1Department of Business Administration, Chung Yuan Christian University, No. 200, Chung-Pei Road, Chung-Li City, Taoyuan County 32023, Taiwan
2Department of Industrial and Information Management, National Cheng Kung University, No. 1, University Road, Tainan City 70101, Taiwan
3Department of Information Management, Tainan University of Technology, No. 529, Zhongzheng Road, Tainan City 71002, Taiwan


Introduction
Firms can gain competitive advantages by more effectively controlling their manufacturing systems [1]. In this context, the early stages of a manufacturing system are especially important for managers, because any slight change in the parameters used at this time will significantly influence final product quality and manufacturing performance [2]. Timely and precise information is thus needed for effective operations management [3]. However, few observations are available at the early stages of a manufacturing system, meaning that statistical theory and data mining techniques cannot obtain accurate results [4,5].
The introductory stage of the wafer-level packaging (WLP) process is an example of a small dataset problem. The WLP process is a new, advanced technology for the packaging industry, and thus manufacturers do not have much experience to draw on when seeking to improve it. However, despite the lack of data, engineers still need to understand the features of this production system to maintain a high manufacturing yield [2]. Consequently, an appropriate forecasting tool that can deal with incomplete information based on small datasets is required for effective management of the WLP process.
Grey system theory was proposed by Deng (1982) [6] and is used to examine uncertain systems based on incomplete information [7,8]. The main principle of this approach is to process the data indirectly through data mapping to transform the state space, so as to extract hidden information in the data [9,10]. Because of its ease of use, grey system theory has been successfully applied in various domains [11-17].
Many studies have demonstrated that the grey model is an effective approach for data analyses with small samples. It is applied in this research to the packaging process of semiconductor manufacturers, in order to solve forecasting problems in the pilot run stage. The grey model developed in this work is pre-examined using four different measurements and compared with two other popular forecasting methods to examine its flexibility and value in this context. The results show that the grey approach is an effective tool that can enable manufacturers to make more accurate forecasts based on limited observations collected in the early stages of the packaging process.
The rest of this paper is organized as follows. Section 2 introduces the development of the adaptive grey model used in this work. In Section 3, we briefly describe the problem of interest in the pilot run of the wafer-level packaging process, where the applicability of the grey approach is demonstrated. Finally, the conclusions are presented in Section 4.

Methodology
While the conventional grey forecasting model, GM(1,1), has been widely applied, it is still possible to improve its forecasting performance. Li et al. [18] proposed a revised grey model, called AGM(1,1), which is one of the most accurate of the existing models. It uses the trend and potency tracking method [19] to form a function that determines the background values of the grey model. This model can better reflect the data growth trends at different stages of the process and overcomes some of the weaknesses of the ordinary modeling procedure in GM(1,1), thus producing more accurate forecasts. Therefore, this study uses this method to solve the forecasting problem that arises in the pilot run of the wafer-level packaging process. The AGM(1,1) modeling procedure has two main parts: (1) determining the trend and potency value and (2) building the grey forecasting model. Both are described in the following subsections.

Trend and Potency Tracking Method.
Li and Yeh [19] proposed the trend and potency tracking method (TPTM), an analysis method that uses the characteristics of the data to explore possible changes in data behavior at different stages of a process. This is the key concept used in AGM(1,1) to improve the accuracy of the conventional grey forecasting model. The detailed procedure of the TPTM is as follows.
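Step 1. Given the original data series $X = \{x_1, x_2, \ldots, x_n\}$, let $x_{\min}$ be the minimum value in $X$ and $x_{\max}$ the maximum one.
Step 2. Calculate the variations $\sigma_i = x_i - x_{i-1}$ between the paired data $(x_{i-1}, x_i)$, $i = 2, 3, \ldots, n$. The sign of $\sigma_i$ indicates an increasing or decreasing tendency from $x_{i-1}$ to $x_i$.
Step 3. Assign the weights $w_i = i - 1$, $i = 2, 3, \ldots, n$, to the variations, so that more recent data carry greater importance.
Step 4. Let $A_i = \sigma_i \times w_i$, $i = 2, 3, \ldots, n$, to strengthen the data trend and potency according to the time period. $A_i > 0$ is an increasing potency (IP) and $A_i < 0$ is a decreasing potency (DP).
Step 5. Determine the central location (CL) of the existing data using the equation $\mathrm{CL} = (x_{\min} + x_{\max})/2$. The CL is then used as the main point from which to conduct the asymmetric domain range expansion.
Step 6. Compute the average of the increasing potencies (AIP) and the average of the decreasing potencies (ADP), and then use them to asymmetrically extend the domain range. The upper limit of the extended domain range is $\mathrm{EDR}_{UL} = x_{\max} + \mathrm{AIP}$, and the lower limit is $\mathrm{EDR}_{LL} = x_{\min} + \mathrm{ADP}$. Because ADP is never positive, adding it moves the lower limit downward.
We thus obtain an extended domain range within which to explore extra information about the data trend and potency.
Step 7. Form a triangular TP function using the CL, $\mathrm{EDR}_{UL}$, and $\mathrm{EDR}_{LL}$. Here, we set the TP value of the CL to 1, and then obtain the TP values of the existing data through the ratio rule of a triangle. Figure 1 illustrates an example: the TP value of $x_{\min}$ is $a/(a + b)$, where $a$ is the distance between $\mathrm{EDR}_{LL}$ and $x_{\min}$, and $b$ is the distance between $x_{\min}$ and the CL. The TP value ranges between 0 and 1 and reflects how close the current datum is to the CL.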
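The TPTM steps can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the potency of each datum is the variation from its predecessor multiplied by a weight that grows with the position in the series, that AIP or ADP is taken as 0 when no potency of that sign exists, and that TP is set to 1 when all data are identical. The function name `tptm` is hypothetical. The example uses the four observations quoted in the computation subsection of this paper.

```python
def tptm(series):
    """Return (tp_values, cl, edr_ll, edr_ul) for a list of observations."""
    if len(series) < 2:
        raise ValueError("need at least two observations")

    x_min, x_max = min(series), max(series)

    # Assumption: potency = (x_i - x_{i-1}) * weight, where the weight
    # increases by one per step so that later data count more.
    potencies = [(series[i] - series[i - 1]) * i for i in range(1, len(series))]
    increasing = [p for p in potencies if p > 0]
    decreasing = [p for p in potencies if p < 0]

    # Assumption: an empty group contributes no extension (average taken as 0).
    aip = sum(increasing) / len(increasing) if increasing else 0.0
    adp = sum(decreasing) / len(decreasing) if decreasing else 0.0

    cl = (x_min + x_max) / 2      # central location
    edr_ul = x_max + aip          # upper limit of the extended domain range
    edr_ll = x_min + adp          # lower limit (adp <= 0)

    def tp(x):
        # Triangular membership: 1 at CL, 0 at either extended limit.
        if x <= cl:
            width = cl - edr_ll
            return (x - edr_ll) / width if width > 0 else 1.0
        width = edr_ul - cl
        return (edr_ul - x) / width if width > 0 else 1.0

    return [tp(x) for x in series], cl, edr_ll, edr_ul


heights = [192.16, 192.78, 192.95, 193.11]
tp_values, cl, edr_ll, edr_ul = tptm(heights)
print("CL =", cl, "EDR_LL =", edr_ll, "EDR_UL =", edr_ul)
print("TP values:", [round(v, 4) for v in tp_values])
```

Because this series only increases, the lower limit coincides with the minimum under the assumptions above, so the earliest datum receives a TP value of 0 while the later data, which lie closer to the CL, receive larger values.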

Adaptive Grey Forecasting Model.
The background value of GM(1,1) is the most important factor that affects model construction and the final forecasting results.Li et al. [18] studied the impact of differences in the background value on forecasting performance and proposed an improved grey forecasting model, AGM(1,1), by integrating the concept of TPTM into the formula for calculating the background value.This study uses this approach to deal with forecasting problems in a small dataset.The AGM(1,1) model is described as follows.
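To make the role of the background value concrete, the following Python sketch implements a generic GM(1,1) forecast in which the background value at each step is a weighted combination of two neighbouring accumulated values. It is a sketch under stated assumptions rather than the AGM(1,1) formulation of Li et al. [18]: the coefficients `alphas` are taken as an input (the rule that derives them from TP values is not reproduced here), the convention z(k) = alpha * x1(k) + (1 - alpha) * x1(k-1) is assumed, and setting every coefficient to 0.5 recovers the conventional GM(1,1). The function name `gm11_forecast` is hypothetical.

```python
import math


def gm11_forecast(series, alphas=None, horizon=1):
    """Forecast the next `horizon` values of a positive series with GM(1,1)."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four observations")
    if alphas is None:
        alphas = [0.5] * (n - 1)          # conventional background values
    if len(alphas) != n - 1:
        raise ValueError("one coefficient is needed per background value")

    # Accumulated generating operation (1-AGO).
    x1, running = [], 0.0
    for value in series:
        running += value
        x1.append(running)

    # Background values and the matching original observations.
    z = [alphas[k - 1] * x1[k] + (1 - alphas[k - 1]) * x1[k - 1]
         for k in range(1, n)]
    y = series[1:]

    # Least squares for x0(k) + a * z(k) = b, i.e. y = -a * z + b.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a = -slope
    b = (sy - slope * sz) / m

    def x1_hat(k):
        # Time response of the whitening equation; k = 0 is the first datum.
        if abs(a) < 1e-12:                # degenerate case: no growth
            return series[0] + b * k
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: difference of consecutive accumulated predictions.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + horizon)]


heights = [192.16, 192.78, 192.95, 193.11]
print(gm11_forecast(heights))                          # conventional GM(1,1)
print(gm11_forecast(heights, alphas=[0.4, 0.5, 0.6]))  # illustrative coefficients only
```

Passing the coefficients explicitly keeps the sketch independent of any particular rule for choosing them, so different background-value schemes can be compared on the same data.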

Experimental Studies: Forecasting the Wafer-Level Packaging Process
The applicability of the adaptive grey forecasting model is examined here through an experiment, the details of which are described in the following subsections.

Background of Wafer-Level Packaging.
Semiconductor manufacturing is an important link in the supply chain of consumer electronics and has significant effects on the quality and characteristics of the final products. The industry is increasingly adopting wafer-level packaging (WLP) rather than chip-scale packaging, as it enables smaller package sizes. Moreover, WLP allows manufacturers to integrate the wafer fabrication, packaging, testing, and burn-in processes. In practice, WLP is a technology that packages an integrated circuit at the wafer level, instead of the traditional process of assembling individual units in packages after dicing them from a wafer. Many different materials are combined on a substrate in WLP to form a chip, as shown in Figure 2, so the physical properties of this combination depend on the rigidity of the materials, the thermal expansion coefficient, and the temperature variation among them. There are five quality inspections in the WLP process to ensure the process standards, and these examine the passivation layer defects, diameter size, coplanarity, metal bump defects, and the height of metal bumps. Further, because the metal bump height affects the overall quality in the packaging process, it must be inspected carefully and controlled effectively to maintain a good yield. In addition, solder balls are generally used for surface mounting, but because they have a relatively low melting point, they can suffer from creep and plastic behavior during mounting under common operating conditions. This means that the height of solder balls is unstable in the packaging process, which can adversely affect the quality of the final products. In short, if engineers can better understand the trend of the solder ball height and adjust the process parameters in a timely fashion, overall manufacturing efficiency can be improved.

Experimental Data and Procedure.
This research employs data collected from a leading packaging manufacturer in Taiwan. The case company is a wafer original equipment manufacturer that had relatively little experience of WLP production when the data were collected.
The experimental time series data were collected at the end of 2011, when the case company was implementing a new WLP process and subjecting it to post-installation testing. This pilot run yielded a total of thirteen observations, as shown in Table 1. The unit of measurement for the height in this table is micrometers (μm). The aim of this study is to build an effective forecasting model for the height of solder balls over time. Four data points are used to construct the forecasting model, with the remaining data used one at a time for testing, so that a total of nine sets of data are obtained.
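The way thirteen observations yield nine modeling sets can be sketched as follows. This assumes a sliding window, in which each set consists of four consecutive observations used for model building and the next observation used for testing; the text does not state whether the window slides or stays fixed, so this is an interpretation. The function name `rolling_sets` is hypothetical, and the values below are placeholders rather than the measurements in Table 1.

```python
def rolling_sets(observations, window=4):
    """Pair each run of `window` consecutive observations with the next one."""
    return [
        (observations[i:i + window], observations[i + window])
        for i in range(len(observations) - window)
    ]


# Placeholder values only; the measured heights are reported in Table 1.
placeholder = [float(v) for v in range(13)]
sets = rolling_sets(placeholder)
print(len(sets))  # 9
for training, target in sets[:2]:
    print(training, "->", target)
```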

Computation of the AGM(1,1) Model.
To illustrate the modeling of the AGM(1,1) in more detail, this study uses the first four observations, {192.16, 192.78, 192.95, 193.11}, as an example to build the model.
The computation process is shown below, and the results are listed in Table 2.

Pre-evaluation Indexes for the Performance of AGM(1,1) Model.
To assess the reasonableness and applicability of the adaptive grey forecasting model in the early stage of the WLP process, we first use four different measurements that are often used in pretesting modeling fitness in grey system theory [20] to evaluate the forecasting performance. These are the mean relative error (MRE), absolute degree of grey incidence (ADGI), ratio of standard deviation (RSD), and probability of small error (PSE). We let $n$ be the number of testing samples, while $\hat{x}_i$, $x_i$, and $e_i$ are the predicted output, actual value, and error of the $i$th testing sample, respectively. The averages and standard deviations of $x_i$ and $e_i$, written as $\bar{x}$, $\bar{e}$, $S_x$, and $S_e$, are then computed by the basic statistical formulas. The definitions of MRE, RSD, and PSE are as follows:

$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{e_i}{x_i}\right|, \qquad \mathrm{RSD} = \frac{S_e}{S_x}, \qquad \mathrm{PSE} = P\left\{\left|e_i - \bar{e}\right| < 0.6745\,S_x\right\}.$$

Since ADGI is relatively complex, it is not easy to define briefly here; please refer to an earlier study [21] for further information about it.
The MRE is a measurement that is used to estimate the forecasting accuracy. The ADGI judges the closeness of the relationship between the actual and forecast values, based on the similarity of the geometric patterns of the sequence curves. It is often used to check whether or not a model is able to appropriately reflect the data profile. The RSD is an indicator of the degree of dispersion of the errors, that is, the overall amplitude of the error fluctuation. The PSE shows what proportion of the errors lies in the acceptable range. For these four indexes, smaller values of MRE and RSD represent better simulated results, while larger values are preferred for ADGI and PSE [20,21].
The levels used to examine the accuracy of the model and the critical values of these four indexes are shown in Table 3. These four measurements can help determine whether a model is accurate or not. If they fall within the range of levels 1 or 2, this suggests that the model has high forecasting accuracy.
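Three of the four pre-evaluation indexes are simple enough to sketch in Python. The sketch below assumes the definitions commonly used for grey model accuracy checks: MRE as the mean of |e_i / x_i|, RSD as S_e / S_x, and PSE as the share of errors satisfying |e_i - mean(e)| < 0.6745 * S_x, with population standard deviations. These forms, the function name `pre_evaluation`, and the sample values are assumptions for illustration; ADGI is omitted because its definition is given in [21] rather than in this paper.

```python
from statistics import mean, pstdev


def pre_evaluation(actual, predicted):
    """Return (MRE, RSD, PSE) for paired actual and predicted values."""
    errors = [x - x_hat for x, x_hat in zip(actual, predicted)]

    # Mean relative error: average of |e_i / x_i|.
    mre = mean(abs(e / x) for e, x in zip(errors, actual))

    # Ratio of standard deviations (assumes the actual values are not all equal).
    s_x = pstdev(actual)
    s_e = pstdev(errors)
    rsd = s_e / s_x

    # Probability of small error: share of errors close to the mean error.
    e_bar = mean(errors)
    pse = mean(1.0 if abs(e - e_bar) < 0.6745 * s_x else 0.0 for e in errors)
    return mre, rsd, pse


# Illustrative numbers only, not the measurements from this study.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [10.1, 11.9, 14.1, 15.9]
mre, rsd, pse = pre_evaluation(actual, predicted)
print(round(mre, 4), round(rsd, 4), pse)
```

With nine testing samples, as in this study, PSE can only take values that are multiples of 1/9, which is consistent with a reported value such as 0.8889.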

Comparison with Other Conventional Forecasting Approaches.
To verify the effectiveness of the AGM(1,1) model, we compare its results with those from two popular forecasting techniques, the backpropagation neural network (BPNN) and support vector regression (SVR) approaches.
BPNN is a neural network that has shown good forecasting ability in many contexts [22]. The learning tool used in this work is the Pythia software. We use four training samples to train a BPNN, where each sample includes one input attribute and one output attribute, namely $(1, x_1)$, $(2, x_2)$, $(3, x_3)$, and $(4, x_4)$. The topology of the BPNN is a 1-2-1 structure, and after it is trained it returns the predicted value $\hat{x}_5$. SVR is a nonparametric estimation learning algorithm based on statistical learning theory, which is used to solve training problems with limited samples [23,24]. Due to its good performance, it is one of the primary methods used in machine learning. We use Weka 3.6.9 as the learning tool for the SVR, and the training set is the same as that used with the BPNN.
Since accuracy is a critical index when measuring the ability of a forecasting method [25], we employ three error indexes in this work, namely, the mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), and these are calculated as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{x}_i - x_i\right|, \qquad \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{x}_i - x_i}{x_i}\right| \times 100\%,$$

where $n$ is the number of testing samples and $\hat{x}_i$ and $x_i$ are the predicted and actual values of the $i$th testing sample, respectively.
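The three error indexes can be computed with a few lines of Python. This is a minimal sketch that assumes the standard definitions of MSE, MAE, and MAPE, with MAPE expressed as a percentage; the function name `error_indexes` and the sample values are illustrative only.

```python
def error_indexes(actual, predicted):
    """Return (MSE, MAE, MAPE in percent) for paired actual and predicted values."""
    n = len(actual)
    diffs = [x_hat - x for x, x_hat in zip(actual, predicted)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    # MAPE requires every actual value to be nonzero.
    mape = sum(abs(d / x) for d, x in zip(diffs, actual)) / n * 100
    return mse, mae, mape


# Illustrative numbers only, not the measurements from this study.
mse, mae, mape = error_indexes([100.0, 200.0], [110.0, 190.0])
print(mse, mae, round(mape, 2))
```

MSE penalizes large deviations more heavily than MAE, while MAPE is scale free, which is why the three are reported together when methods are compared.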

Experimental Results.
This research uses the four pretesting measurements in Section 3.4 to assess the forecasting performance from various perspectives. The forecasting results of the proposed approach are shown in Table 4, and the four measurements are as follows: MRE is 0.0016, ADGI is 0.9515, RSD is 0.4496, and PSE is 0.8889. This shows that the AGM(1,1) falls within level 2 and thus has a high forecasting accuracy. Further, the MRE value, which is smaller than 0.01, suggests that the risk when using this grey model is controllable. The ADGI value is about 0.95, indicating a high similarity between the actual and forecast results with regard to the geometric shape. This also indicates that the proposed approach can properly reflect the data trend, with no serious inconsistencies between the forecast results and the real situation. Lastly, the RSD and PSE measurements are both at an acceptable level, so there are neither extreme values nor a high level of error dispersion, which indirectly confirms the stability of the AGM(1,1) model. The results shown in Table 5 indicate that the AGM(1,1) model outperforms the other two methods, with average improvements of over 20% for all three error indexes. In addition, the MAPE of AGM(1,1) is less than 0.2%, which falls into an acceptable range in the WLP industry. These results clearly show that the AGM(1,1) model is a feasible tool for forecasting tasks in the WLP process.

Conclusions and Discussion
In recent years, packaging technology has focused on the WLP process. However, due to the limited data available at the early stages of the introduction of this technology, it is difficult for companies to control it effectively. To improve production yields, engineers thus need an effective forecasting approach in this context, because adverse phenomena that are discovered early can be addressed in a timely manner.
This paper presented a new adaptive model based on grey system theory to solve the small dataset forecasting problem. The experimental results show that the AGM(1,1) model obtains satisfactory outcomes in the four pretesting measurements and outperforms the two popular forecasting techniques, and thus that the grey approach has practical value for use in the packaging industries. Furthermore, one direction for future research would be to examine how to apply this method to other real-world industrial cases, such as the panel and biotechnology industries.

Figure 2: Flowchart of the wafer-level packaging process.

Table 3: Level of accuracy.

Table 5: Comparison of the forecasting methods.