Clustering Ensemble for Identifying Defective Wafer Bin Map in Semiconductor Manufacturing

Wafer bin maps (WBMs) represent specific defect patterns that provide information for diagnosing the root causes of low yield in semiconductor manufacturing. In practice, most semiconductor engineers use subjective and time-consuming eyeball analysis to assess WBM patterns. Given shrinking feature sizes and increasing wafer sizes, various types of WBMs occur; thus, relying on human vision to judge defect patterns is complex, inconsistent, and unreliable. In this study, a clustering ensemble approach is proposed to bridge the gap, facilitating WBM pattern extraction and assisting engineers in recognizing systematic defect patterns efficiently. The clustering ensemble approach not only generates diverse clusters in data space but also integrates them in label space. First, the mountain function is used to transform data by using pattern density. Subsequently, k-means and particle swarm optimization (PSO) clustering algorithms are used to generate diverse partitions and various label results. Finally, the adaptive resonance theory (ART) neural network is used to attain consensus partitions and integration. An experiment was conducted to evaluate the effectiveness of the proposed WBM clustering ensemble approach. Several criteria, namely the sum of squared error, precision, recall, and F-measure, were used for evaluating clustering results. The numerical results showed that the proposed approach outperforms the individual clustering algorithms.


Introduction
To maintain their profitability and growth despite continual technology migration, semiconductor manufacturing companies provide wafer manufacturing services generating value for their customers through yield enhancement, cost reduction, on-time delivery, and cycle time reduction [1,2]. The consumer market requires that semiconductor products exhibiting increasing complexity be rapidly developed and delivered to market. Technology continues to advance and required functionalities are increasing; thus, engineers have a drastically decreased amount of time to ensure yield enhancement and diagnose defects [3].
The lengthy process of semiconductor manufacturing involves hundreds of steps, in which big data including the wafer lot history, recipe, inline metrology measurement, equipment sensor value, defect inspection, and electrical test data are automatically generated and recorded. Semiconductor companies experience challenges integrating big data from various sources into a platform or data warehouse and lack intelligent analytics solutions to extract useful manufacturing intelligence and support decision making regarding production planning, process control, equipment monitoring, and yield enhancement. Few intelligent solutions have been developed based on data mining, soft computing, and evolutionary algorithms to enhance the operational effectiveness of semiconductor manufacturing [4][5][6][7].
Circuit probe (CP) testing is used to evaluate each die on the wafer after the wafer fabrication processes. Wafer bin maps (WBMs) represent the results of a CP test and provide crucial information regarding process abnormalities, facilitating the diagnosis of low-yield problems in semiconductor manufacturing. In WBM failure patterns, the spatial dependences across wafers express systematic and random effects. Various failure patterns must be recognized; these pattern types facilitate rapidly identifying the associated root causes of low yield [8]. Based on the defect size, shape, and location on the wafer, the WBM can be expressed as specific patterns such as rings, circles, edges, and curves. Defective dies caused by random particles are difficult to completely remove and typically exhibit nonspecific patterns. Most WBM patterns consist of a systematic pattern and a random defect [8][9][10].
In practice, thousands of WBMs are generated for inspection and engineers must spend substantial time on pattern judgment rather than determining the assignable causes of low yield. Grouping similar WBMs into the same cluster can enable engineers to effectively diagnose defects. The complicated processes and diverse products fabricated in semiconductor manufacturing can yield various WBM types, making it difficult to detect systematic patterns by using only eyeball analysis.
Clustering analysis is used to partition data into several groups in which the observations are homogeneous within a group and heterogeneous between groups. Clustering analysis has been widely applied in applications such as grouping [11] and pattern extraction [12]. However, the results of most conventional clustering algorithms depend on the data type, algorithm parameter settings, and prior information. For example, the k-means algorithm is commonly used to analyze substantial amounts of data because of its low time complexity [13]. However, the results of the k-means algorithm depend on the initially selected centroids and the predefined number of clusters. To address the disadvantages of the k-means algorithm, evolutionary methods such as the genetic algorithm (GA) and particle swarm optimization (PSO) have been developed to conduct data clustering [14]. PSO is particularly advantageous because it requires less parameter adjustment compared with the GA [15].
Combining the results of distinct algorithms applied to the same data set, or of one algorithm run with various parameter settings, yields high-quality clusters. Based on the criteria of the clustering objectives, no individual clustering algorithm is suitable for every problem and data type. Compared with individual clustering algorithms, clustering ensembles that combine multiple clustering results yield superior clustering effectiveness regarding robustness and stability, incorporating conflicting results across partitions [16]. Instead of searching for an optimal partition, clustering ensembles capture a consensus partition by integrating diverse partitions from various clustering algorithms. Clustering ensembles have been developed to improve the accuracy, robustness, and stability of clustering; such ensembles typically involve two steps. The first step involves generating a basic set of partitions, which can be similar or distinct, by using various parameters and clustering algorithms [17]. The second step involves combining the basic set of partitions by using a consensus function [18]. However, with the shrinking integrated circuit feature size and complicated manufacturing process, the WBM patterns become more complex because of variations in defect density, die size, and wafer rotation. It is difficult to extract defect patterns with a single clustering approach; different clustering aspects must be incorporated to handle the various complicated WBM patterns.
To address this need in real settings, this study proposes a WBM clustering ensemble approach to facilitate WBM defect pattern extraction. First, the target bin value is categorized into a binary value and the wafer maps are transformed from two-dimensional to one-dimensional data. Second, k-means and PSO clustering algorithms are used to generate diverse partitions. Subsequently, the clustering results are regarded as label representations to facilitate aggregating the diverse partitions by using an adaptive resonance theory (ART) neural network. To evaluate the validity of the proposed method, an experimental analysis was conducted using six typical patterns found in the fabrication of semiconductor wafers. Using various parameter settings, the proposed cluster ensembles that combine diverse partitions instead of using the original features outperform individual clustering methods such as k-means and PSO.
The remainder of this study is organized as follows. Section 2 introduces a fundamental WBM. Section 3 presents the proposed approach to the WBM clustering problem. Section 4 provides experimental comparisons, applying the proposed approach to analyze the WBM clustering problem. Section 5 concludes the study and discusses the findings and future research directions.

Related Work
A WBM is a two-dimensional failure pattern. Based on various defect types, random, systematic, and mixed failure patterns are the primary types of WBMs generated during semiconductor fabrication [19,20]. Random failure patterns are typically caused by random particles or noise in the manufacturing environment. In practice, completely eliminating these random defects is difficult. Systematic failure patterns show the spatial correlation across wafers, such as rings, crescent moons, edges, and circles. Figure 1 shows typical WBM patterns, which are transformed into binary values for visualization and analysis. The dies that pass the functional test are denoted as 0 and the defective dies are denoted as 1. Based on the systematic patterns, domain engineers can rapidly determine the assignable causes of defects [8]. Mixed failure patterns comprise both random and systematic defects on a wafer. The mixed pattern can be identified if the degree of the random defect is slight.
Defect diagnosis for facilitating yield enhancement is critical in the rapid development of semiconductor manufacturing technology. An effective method of ensuring that the causes of process variation are assignable is analyzing the spatial defect patterns on wafers. WBMs provide crucial guidance, enabling engineers to rapidly determine the potential root causes of defects by identifying patterns. Most studies have used neural network and model-based approaches to extract common WBM patterns. Hsu and Chien [8] integrated spatial statistical analysis and an ART neural network to conduct WBM clustering and associated the patterns with manufacturing defects to facilitate defect diagnosis. In addition to the ART neural network, Liu and Chien [10] applied moment invariants for shape clustering of WBMs. Model-based clustering algorithms are used to construct a model for each cluster and compare the likelihood values between clusters to identify defect patterns. Wang et al. [21] used model-based clustering, applying a Gaussian expectation maximization algorithm to estimate defect patterns. Hwang and Kuo [22] modeled global defects

Proposed Approach
The terminologies and notations used in this study are as follows: $N_g$: number of gross dies; $N_w$: number of wafers; $N_p$: number of particles; $N_c$: number of clusters; $N_b$: number of bad dies;

Problem Definition of WBM Clustering Ensemble.
Clustering ensembles can be regarded as two-stage partitions, in which various clustering algorithms are used to assess the data space at the first stage and a consensus function is used to assess the label space at the second stage. Figure 2 shows the two-stage clustering perspective. The consensus function is used to develop a clustering combination based on the diversity of the cluster labels derived at the first stage. Let $X = \{x_1, x_2, \ldots, x_{N_w}\}$ denote a set of $N_w$ WBMs and $\Pi = \{\pi_1, \pi_2, \ldots, \pi_q\}$ denote a set of partitions based on $q$ clustering results. The partition value $\pi_j(x_i)$ represents the label assigned to $x_i$ by the $j$th algorithm. Each label vector $\pi_j$ is used to construct a representation $\Pi$, in which the partitions of $X$ comprise a set of labels for each wafer $x_i$, $i = 1, \ldots, N_w$. Therefore, the difficulty of constructing a clustering ensemble is locating a new partition that provides a consensus satisfying the label information derived from each individual clustering result of the original WBMs. For each label vector $\pi_j$, a binary membership indicator matrix $H^{(j)}$ is constructed, containing a column for each cluster. An entry in a row of $H^{(j)}$ is denoted as 1 if the object corresponding to that row belongs to the cluster of that column, and as 0 otherwise. Furthermore, the space of a consensus partition changes from the original data features into binary label features. For example, Table 1 shows eight WBMs grouped using three clustering algorithms ($\pi_1$, $\pi_2$, $\pi_3$); the three clustering results are transformed into clustering labels that are transformed into binary representations (Table 2). Regarding consensus partitions, the binary membership indicator matrix is used to determine a final clustering result, using a consensus model based on the eight features ($V_1, V_2, \ldots, V_8$).
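The transformation from cluster labels to a binary membership indicator matrix can be illustrated with a minimal sketch. The function name `membership_matrix` and the three label vectors below are hypothetical; they are not the values of Table 1 and only mimic its shape (eight WBMs, three clusterings).

```python
import numpy as np

def membership_matrix(partitions):
    """Stack one binary indicator block per partition, one column per cluster."""
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        clusters = np.unique(labels)  # distinct cluster labels in this partition
        # Entry (i, c) is 1 when object i carries cluster label c.
        blocks.append((labels[:, None] == clusters[None, :]).astype(int))
    return np.hstack(blocks)

# Hypothetical label vectors for eight WBMs from three clusterings.
pi_1 = [1, 1, 1, 2, 2, 3, 3, 3]
pi_2 = [2, 2, 2, 3, 3, 1, 1, 1]
pi_3 = [1, 1, 2, 2, 2, 2, 2, 2]
H = membership_matrix([pi_1, pi_2, pi_3])
print(H.shape)  # 8 objects by 3 + 3 + 2 = 8 binary label features
```

Each row of `H` sums to the number of partitions because every object belongs to exactly one cluster per partition. The label numbering of each partition is arbitrary, which is why the consensus step works on the binary columns rather than on the raw labels.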

Data Transformation.
The binary representation of good and bad dies is shown in Figure 3(a).Although this binary representation is useful for visualisation, displaying the spatial relation of each bad die across a wafer is difficult.
To quantify the spatial relations and increase the density of a specific feature, the mountain function is used to transform the binary value into a continuous value. The mountain method is used to determine the approximate cluster center by estimating the probability density function of a feature [24]. Instead of using a grid node, a modified mountain function can employ data points by using a correlation self-comparison [25]. The modified mountain function for a bad die on a wafer is defined in terms of the distances between that die and the other bad dies, a normalization factor for the distance between the bad die and the wafer centroid, and a constant parameter. The parameter values determine the approximate density shape of the wafer. Figure 3(b) shows the resulting continuous-value representation. Each WBM is then transformed from a two-dimensional map into a one-dimensional data vector [8]. Such vectors are used to conduct further clustering analysis.

Table 2: Binary representation of clustering ensembles.
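A minimal sketch of a mountain-style density transform is given below. It assumes the exponential form that is common for modified mountain functions, $M(j) = \sum_k \exp(-m \beta d(j, k))$, with $\beta$ taken as the inverse of the mean distance between the bad dies and the geometric wafer center. The exact form, the definition of the normalization factor, and the value of the constant used in this study may differ; the function name and the default `m=1.0` are hypothetical.

```python
import numpy as np

def mountain_transform(wafer, m=1.0):
    """Replace each bad die (1) by a density value and flatten the map to 1-D."""
    wafer = np.asarray(wafer, dtype=float)
    coords = np.argwhere(wafer == 1)            # (row, col) of every bad die
    out = np.zeros_like(wafer)
    if len(coords) == 0:
        return out.ravel()
    center = (np.array(wafer.shape) - 1) / 2.0  # assumed wafer centroid
    mean_dist = np.linalg.norm(coords - center, axis=1).mean()
    beta = 1.0 / mean_dist if mean_dist > 0 else 1.0   # assumed normalization
    # Pairwise Euclidean distances between bad dies.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    density = np.exp(-m * beta * d).sum(axis=1)  # includes the self term exp(0)
    out[coords[:, 0], coords[:, 1]] = density
    return out.ravel()                           # 2-D map to 1-D vector

# Toy 5x5 map: three adjacent bad dies and one isolated bad die.
toy = np.zeros((5, 5), dtype=int)
toy[0, 0] = toy[0, 1] = toy[1, 0] = 1
toy[4, 4] = 1
vec = mountain_transform(toy)
```

In this toy map the die inside the group of three receives a larger value than the isolated die, which is the intended effect: dense defect regions are emphasized while scattered defects are damped.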

Diverse Partitions Generation by 𝑘-Means and PSO Clustering.
Both k-means and PSO clustering algorithms are used to generate basic partitions. To consider the spatial relations across a wafer, both the binary and continuous values are used to determine distinct clustering results by using k-means and PSO clustering. Subsequently, various numbers of clusters are used for comparison.
k-means is an unsupervised method of clustering analysis [13] used to group data into a predefined number of clusters by employing a similarity measure such as the Euclidean distance. The objective of the k-means algorithm is to minimize the within-cluster difference, that is, the sum of squared error (SSE), which is determined using (3). The k-means algorithm consists of the steps shown in Procedure 1.
Data clustering is regarded as an optimisation problem. PSO is an evolutionary algorithm [14] which is used to search for optimal solutions based on the interactions amongst particles; it requires adjusting fewer parameters compared with other evolutionary algorithms. van der Merwe and Engelbrecht [26] proposed a hybrid algorithm for clustering data, in which the initial swarm is determined using the k-means result and PSO is used to refine the cluster results.
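As an illustration of the k-means step and the SSE criterion, the following is a minimal sketch of the standard iterative (Lloyd-style) procedure in NumPy. It is not a reproduction of Procedure 1; the function name, the random initialization from data points, and the stopping rule are assumptions.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Standard k-means; returns labels, centroids, and the SSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumed initialization: distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # SSE: squared distance of every vector to its assigned centroid.
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

X_toy = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
labels, centroids, sse = kmeans(X_toy, n_clusters=2)
```

Running the function with several values of `n_clusters` and with both binary and continuous input vectors is one way to obtain the diverse base partitions described in the text.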
In the fitness function, $Z_i$ is a matrix representing the assignment of the WBMs to the clusters of the $i$th particle. A quantization error is used to evaluate the level of clustering performance. In addition, the fitness function uses the maximum average Euclidean distance between the data patterns of a particle and their assigned cluster centroids, and the minimum Euclidean distance between any pair of clusters. Procedure 2 shows the steps involved in the PSO clustering algorithm.
(2) For iteration $t = 1$ to $t = \max$ do
      For each particle $i$ do
        For each data pattern $x_j$
          calculate the Euclidean distance to all cluster centroids and
          assign pattern $x_j$ to the cluster which has the minimum distance
        end for
        calculate the fitness function $f(p_i, Z_i)$
      end for
      find the personal best and global best positions of each particle.
      update the cluster centroids by the velocity update equation (i) and the coordinate update equation (ii).
    end for
Step 2 is iterated until there is no data change.
Procedure 2: PSO clustering algorithm.
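The loop of Procedure 2 can be sketched in a few lines of NumPy. This is a simplified illustration, not the configuration used in the study: the fitness is reduced to the quantization error alone (the intercluster separation and maximum intracluster distance terms are omitted), and the inertia weight `w`, the acceleration constants `c1` and `c2`, and the swarm size are assumed values.

```python
import numpy as np

def quantization_error(X, centroids):
    """Average over non-empty clusters of the mean distance to the centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    per_cluster = [d[labels == k, k].mean()
                   for k in range(len(centroids)) if np.any(labels == k)]
    return float(np.mean(per_cluster)), labels

def pso_cluster(X, n_clusters, n_particles=10, n_iter=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Each particle holds n_clusters centroids, initialized from data points.
    pos = X[rng.integers(0, len(X), size=(n_particles, n_clusters))].copy()
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([quantization_error(X, p)[0] for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # velocity update
        pos = pos + vel                                                    # coordinate update
        fit = np.array([quantization_error(X, p)[0] for p in pos])
        improved = fit < pbest_fit            # update personal bests
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()                # update global best
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, quantization_error(X, gbest)[1], float(gbest_fit)

X_toy = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
best_centroids, pso_labels, best_fit = pso_cluster(X_toy, n_clusters=2)
```

Seeding one particle with a k-means result, as in the hybrid scheme of van der Merwe and Engelbrecht [26], would only require overwriting one row of `pos` before the loop.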

Consensus Partition by Adaptive Resonance Theory.
ART has been used in numerous areas such as pattern recognition and spatial analysis [27]. Under the unstable learning conditions caused by new data, ART addresses the balance between stability and plasticity, match and reset, and search and direct access [8]. Because the input labels are binary, the ART1 neural network [27] algorithm is used to attain a consensus partition of WBMs.
The consensus partition approach is as follows.
Step 1. Apply k-means and PSO clustering algorithms and use various parameters (e.g., various numbers of clusters and types of input data) to generate diverse clusters.
Step 2. Transform the original clustering labels into a binary representation matrix $H$ as an input for the ART1 neural network.
Step 3. Apply ART1 neural network to aggregate the diverse partitions.
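Step 3 can be illustrated with a simplified, fast-learning ART1 sketch operating on the rows of a binary label matrix. This is an assumption-laden illustration rather than the network used in the study: the choice function, the choice parameter `beta`, and the epoch limit are common textbook settings, and the function name is hypothetical.

```python
import numpy as np

def art1_cluster(H, vigilance=0.5, beta=1.0, max_epochs=10):
    """Simplified fast-learning ART1 on binary rows; returns labels and prototypes."""
    H = np.asarray(H, dtype=bool)
    prototypes = []
    labels = np.full(len(H), -1)
    for _ in range(max_epochs):
        changed = False
        for i, x in enumerate(H):
            # Choice function: rank existing prototypes for this input.
            scores = [np.logical_and(x, w).sum() / (beta + w.sum())
                      for w in prototypes]
            winner = -1
            for j in np.argsort(scores)[::-1]:
                # Vigilance test: fraction of the input matched by the prototype.
                match = np.logical_and(x, prototypes[j]).sum() / max(x.sum(), 1)
                if match >= vigilance:
                    winner = int(j)
                    break
            if winner == -1:
                prototypes.append(x.copy())   # no resonance: open a new cluster
                winner = len(prototypes) - 1
            else:
                # Fast learning: prototype becomes the intersection with the input.
                prototypes[winner] = np.logical_and(x, prototypes[winner])
            if labels[i] != winner:
                labels[i] = winner
                changed = True
        if not changed:
            break
    return labels, np.array(prototypes)

H_toy = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1]]
consensus, protos = art1_cluster(H_toy, vigilance=0.5)
print(consensus.tolist())  # [0, 0, 1, 1]
```

Raising `vigilance` makes the match test stricter and therefore tends to open more clusters, which is the sensitivity to the vigilance threshold discussed in the experiments.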

Numerical Experiments
This section presents a numerical study to demonstrate the effectiveness of the proposed clustering ensemble approach. Six typical WBM patterns from semiconductor fabrication, such as moon, edge, and sector, were used. In the experiments, the percentage of defective dies in the six patterns was designed based on real cases. Without loss of generality of the WBM patterns, the data were systematically transformed to protect the proprietary information of the case company. A total of 650 chips were exposed on a wafer. Based on various degrees of noise, each pattern type was used to generate 10 WBMs for estimating the validity of the proposed clustering ensemble approach. The noise in a WBM could be caused by random particles across a wafer and by test bias in the CP test, which generate bad dies randomly on a wafer and good dies within a group of bad dies. That is, some bad dies are shown as good dies and the density of bad dies could be sparse. For example, a degree of noise of 0.02 means that a total of 2% of the good and bad dies are inverted.
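The noise mechanism can be sketched as a random inversion of a fixed fraction of dies. This is only one plausible reading of the degree of noise; the function name is hypothetical and the wafer is represented as a flat vector of 650 dies without its circular layout.

```python
import numpy as np

def add_noise(wbm, degree, seed=None):
    """Invert a fraction `degree` of the dies (good <-> bad), chosen at random."""
    rng = np.random.default_rng(seed)
    wbm = np.asarray(wbm)
    flat = wbm.ravel().copy()
    n_flip = int(round(degree * flat.size))          # number of dies to invert
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = 1 - flat[idx]                        # 0 becomes 1, 1 becomes 0
    return flat.reshape(wbm.shape)

clean = np.zeros(650, dtype=int)      # hypothetical defect-free wafer, 650 dies
noisy = add_noise(clean, degree=0.02, seed=1)
print(int(noisy.sum()))               # 13 dies inverted (2% of 650)
```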
The proposed WBM clustering ensemble approach was compared with k-means, the PSO clustering method, and the algorithm proposed by Hsu and Chien [8]. Six clusters were used for the single k-means and single PSO clustering algorithms. Table 3 shows the parameter settings for PSO clustering. The number of clusters extracted by the ART1 neural network is sensitive to the vigilance threshold value. A high vigilance threshold produces more clusters, and the similarity within a cluster is high. In contrast, a low vigilance threshold results in fewer clusters; however, the similarity within a cluster could be low. To compare settings of the ART1 vigilance threshold, various values were used as shown in Figure 4. Each clustering performance was evaluated in terms of the SSE and the number of clusters. The SSE is used to compare the cohesion amongst various clustering results, and a small SSE indicates that the WBMs within a cluster are highly similar. The number of clusters represents the effectiveness of the WBM grouping. The objective of clustering is to group the WBMs into few clusters in which the similarities among the WBMs within a cluster are as high as possible. Therefore, the ART1 vigilance threshold value is set to 0.50 in the numerical experiments.
WBM clustering groups similar types of WBMs into the same cluster. Because only six types of WBMs were used in the experiments, the actual number of clusters should be six. Based on the various degrees of noise in WBM generation as shown in Table 4, several individual clustering methods including ART1 [8], k-means clustering, and PSO clustering were used for evaluating clustering performance. Table 4 shows that the ART1 neural network yielded a lower SSE compared with the other methods. However, the ART1 neural network separates the WBMs into 15 clusters as shown in Figure 5.
In addition to comparing the similarity within the cluster, an index called specificity was used to evaluate the efficiency of the evolved clusters in representing the true clusters [28]. The specificity is defined as follows:

$$\text{specificity} = \frac{t_c}{t_e},$$

where $t_c$ is the number of true WBM patterns covered by the evolved WBM patterns and $t_e$ is the total number of evolved WBM patterns. In the ART1 neural network clustering results, the total number of evolved WBM clusters is 15 and the number of true WBM clusters is 6; the specificity is therefore 6/15 = 0.4. Table 5 shows the results of specificity considering the defect density of the spatial relations between the good and bad dies across a wafer. Based on the F-measure, the clustering ensembles obtained using all generated partitions exhibit larger precision and recall values and superior levels of performance regarding each pattern compared with the other methods. Thus, the partitions generated by using k-means and PSO clustering in various data types must be considered.
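The evaluation indices can be computed with a short sketch. The specificity follows the ratio described in the text and reproduces the 6/15 = 0.4 example. The precision, recall, and F-measure below use the standard set-based definitions for one true pattern and its matching cluster; how clusters are matched to patterns in the study is not stated here, so that pairing and the function names are assumptions.

```python
def specificity(n_true_covered, n_evolved):
    """Ratio of true patterns covered to the number of evolved clusters."""
    return n_true_covered / n_evolved

def precision_recall_f(cluster_members, pattern_members):
    """Standard precision, recall, and F-measure for one cluster vs. one pattern."""
    cluster, pattern = set(cluster_members), set(pattern_members)
    hit = len(cluster & pattern)                 # wafers correctly grouped
    precision = hit / len(cluster) if cluster else 0.0
    recall = hit / len(pattern) if pattern else 0.0
    f = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f

spec = specificity(6, 15)                        # ART1 example from the text
# Hypothetical wafer indices: a cluster of 4 wafers vs. a true pattern of 6.
p, r, f = precision_recall_f({0, 1, 2, 3}, {2, 3, 4, 5, 6, 7})
```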
The practical viability of the proposed approach was examined. The results show that applying the ART1 neural network directly in data space leads to worse clustering performance in terms of precision. However, the true types of WBMs can be identified by transforming the original data space into label space and attaining a consensus partition with the ART1 neural network. The proposed cluster ensemble approach achieves better performance with fewer clusters than other conventional clustering approaches including k-means, PSO clustering, and the ART1 neural network.

Conclusion
WBMs provide important information for engineers to rapidly find the potential root causes by identifying patterns correctly. As semiconductor manufacturing technology advances, assigning a WBM to the correct pattern becomes more difficult because patterns of the same type are influenced by various factors such as die size, pattern density, and noise degree. Relying only on engineers' experience of visual inspection and personal judgment of the map patterns is not only subjective and inconsistent but also very time-consuming and inefficient. Therefore, grouping similar WBMs quickly gives engineers more time to diagnose the root causes of low yield.
Considering the requirements of clustering WBMs in practice, a cluster ensemble approach was proposed to facilitate extracting the common defect patterns of WBMs, enhancing failure diagnosis and yield enhancement. The advantage of the proposed method is that it yields high-quality clusters by applying distinct algorithms to the same data set and by using various parameter settings. The robustness of the clustering ensemble is higher than that of individual clustering methods because the clusterings from various aspects, including algorithms and parameter settings, are integrated into a consensus result.
The proposed clustering ensemble has two stages. At the first stage, diverse partitions are generated using two types of input data, various cluster numbers, and distinct clustering algorithms. At the second stage, a consensus partition is attained using these diverse partitions. The numerical analysis demonstrated that the clustering ensemble is superior to using individual k-means or PSO clustering algorithms. The results demonstrate that the proposed approach can effectively group the WBMs into several clusters based on their similarity in label space. Thus, engineers can have more time to focus on the assignable causes of low yield instead of extracting defect patterns.
Clustering is an exploratory approach. In this study, we assume that the number of clusters is known. Evaluating the clustering ensemble approach therefore requires prior information regarding the number of clusters. Further research can be conducted regarding self-tuning the number of clusters in clustering ensembles.
Figure 2: Two-stage clustering perspective. First stage: data space. Second stage: label space (labels $\pi_1$, $\pi_2$, ..., $\pi_q$).

Figure 3: Representation of wafer bin map by binary value and continuous value.

A single particle $p_i$ represents the $N_c$ cluster centroid vectors, where $N_c$ is the number of clusters: $p_i = [c_1, c_2, \ldots, c_{N_c}]$. A swarm defines a number of candidate clusterings. To consider the maximal homogeneity within a cluster and heterogeneity between clusters, a fitness function is used to maximize the intercluster separation and minimize the intracluster distance and quantisation error.

Figure 5 .
The ART1 neural network yields unnecessary partitions for similar types of WBM patterns. To generate diverse clustering partitions for the clustering ensemble method, four combinations of data scales and clustering algorithms are used: k-means by binary value (KB), k-means by continuous value (KC), PSO by binary value (PB), and PSO by continuous value (PC). With six clusters, using k-means clustering and PSO clustering individually yielded larger SSE values than using ART1 only. Table 4 also shows the clustering ensembles that use various types of input data. For example, the clustering ensemble method KB&PB integrates six results, namely the k-means algorithm with three numbers of clusters (i.e., $k$ = 5, 6, 7) and PSO clustering with three numbers of clusters (i.e., $k$ = 5, 6, 7), to form the WBM clustering via

Table 3: Parameter settings for PSO clustering.

Table 4: Results of clustering methods by SSE.

Table 6: Clustering result on the index of precision.

Table 7: Clustering result on the index of recall.

Table 8: Clustering result on the index of F-measure.