Identification of CTQs for Complex Products Based on Mutual Information and Improved Gravitational Search Algorithm

The identification of critical-to-quality characteristics (CTQs) for complex products is the first step in implementing quality control. To improve the efficiency and accuracy of CTQs identification, we propose a novel hybrid approach based on mutual information and an improved gravitational search algorithm, which combines the advantages of filter and wrapper methods. First, information relevance and redundancy are measured by mutual information. Then, the improved gravitational search algorithm is used to search for the CTQs. Experiments are carried out on 2 UCI data sets, and the classification capability of the CTQs is tested by SVM with tenfold cross validation. The results show that the presented method is effective and practically applicable.


Introduction
Complex products, with their complex structure, high technology content, and highly integrated components, have high-dimensional quality characteristics. It is not economically or logistically feasible to control or monitor all of the quality characteristics of such high-dimensional data. Of all these characteristics, some are critical-to-quality characteristics (CTQs), which determine the quality of the product, while others are redundant or insignificant [1][2][3][4]. Therefore, identifying the CTQs is the key to monitoring, analyzing, and improving the quality of complex products. In this paper, we propose a novel hybrid approach based on mutual information and an improved gravitational search algorithm (MIGSA) for CTQs identification; see Algorithm 1. The rest of this paper is structured as follows. Section 2 reviews the literature on CTQs identification. Some basic concepts of mutual information, the gravitational search algorithm, and rough set theory are described briefly in Section 3. The fundamentals of the proposed hybrid approach to CTQs identification for complex products are described in Section 4. The experimental results and discussion are presented in Section 5. Finally, conclusions are given in Section 6.

Literature Reviews of CTQs
According to the literature, research on CTQs identification can be divided into two categories: CTQs in the design phase and CTQs in the manufacturing process.
For CTQs in design, the design team identifies the voice of the customer (VOC) and determines the CTQs from the engineering characteristics based on the relationship between VOCs and engineering characteristics. The typical method of CTQs identification is quality function deployment (QFD) [5,6]. QFD provides a means of translating customer requirements into the appropriate technical requirements for each stage of product development and production [7]. For example, Zhang et al. established a multiple CTQs optimization model by using QFD technology and successfully extracted various influencing factors for multiple quality characteristics [8]. He et al. put forward an approach to decomposing CTQs from customer requirements into critical technical parameters based on the relational tree [6]. Thornton distinguished the degree of importance of the obtained characteristics by adding process data [4,9]. Rowe presented a methodology compatible with Design for Six Sigma (DFSS) for constructing comprehensive statistical design and process control specifications for CTQs [10].
During the manufacturing process, the CTQs can be identified by using sequential experimental design. Shen and Wan proposed controlled sequential factorial design (CSFD) for discrete-event simulation experiments [11]. Rout and Mittal used a combined array design of experiments approach to screen the factors influencing the performance of a manipulator [12]. Mathieu and Marguet presented an integrated method for CTQs identification based on the assembly directed graph [13]. Whitney proposed the concept of a data flow chain (DFC), which was used to analyze the factors affecting CTQs [14]. Variation mode and effect analysis (VMEA) was used to identify the noise factors that cause CTQs to fluctuate, with a risk coefficient measuring their importance [15]. Wang et al. proposed a CTQs identification method for multistage manufacturing processes by combining the partial least squares regression (PLSR) method with the state space model [16]. Wu presented an approach to optimizing correlated multiple quality characteristics based on the modified double-exponential desirability function [17].
The above methods have been applied successfully in many fields. However, if the number of factors is large (say, more than 30) and the output dimension is relatively high, it is hard to obtain the expected reduction of characteristics by using the above methods.
According to the characteristics of the manufacturing process, Yan et al. constructed the relationship between the quality characteristics and the class of each product sample and then used the information gain (IG) methodology to identify the CTQs of high-dimensional complex products [18]. CTQs identification in complex products can be regarded as a feature selection problem, so data mining approaches can be used to solve it. Yan et al. [1] used the ReliefF algorithm to identify CTQs in complex products. The method was shown to be feasible, but the classification accuracy of the results is low (nearly 70%).
The MIGSA method proposed in this paper combines the efficiency of filter methods with the high accuracy of wrapper methods.
The following sections provide a more detailed description of the approach.

Preliminaries
3.1. Mutual Information. In information theory, the uncertainty of a random variable can be measured by its entropy [19]. Let $X$ be a random variable with discrete values; its entropy $H(X)$ is defined as

$$H(X) = -\sum_{x} p(x) \log_2 p(x),$$

where $p(x) = \Pr\{X = x\}$ is the probability mass function of $X$. Let $X$ and $Y$ be two discrete random variables with joint probability mass function $p(x, y)$; their joint entropy $H(X, Y)$ is

$$H(X, Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x, y).$$

The conditional entropy $H(X \mid Y)$ describes the uncertainty that remains in variable $X$ when variable $Y$ is known. It is defined as

$$H(X \mid Y) = -\sum_{x} \sum_{y} p(x, y) \log_2 p(x \mid y).$$

The mutual information $I(X; Y)$ is defined to measure the common information of the two variables $X$ and $Y$:

$$I(X; Y) = H(X) - H(X \mid Y) = \sum_{x} \sum_{y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}.$$

From the above definition, a high value of $I(X; Y)$ means that the two variables $X$ and $Y$ are closely related, whereas a small value means that they are not; in particular, they are totally unrelated when $I(X; Y) = 0$.
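As an illustration of the definitions above, the following minimal sketch estimates entropy, conditional entropy, and mutual information from paired discrete samples by plugging empirical frequencies into the formulas. The function names and the toy data are illustrative assumptions and are not taken from the original experiments.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Empirical Shannon entropy H(X) in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def joint_entropy(xs, ys):
    """H(X, Y): entropy of the paired observations."""
    return entropy(list(zip(xs, ys)))

def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y), the uncertainty left in X once Y is known."""
    return joint_entropy(xs, ys) - entropy(ys)

def mutual_information(xs, ys):
    """I(X; Y) = H(X) - H(X | Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)

# Toy data (hypothetical): y_same copies x, y_indep is independent of x.
x = [0, 0, 1, 1]
y_same = [0, 0, 1, 1]
y_indep = [0, 1, 0, 1]
print(mutual_information(x, y_same), mutual_information(x, y_indep))
```

With these toy vectors, the identical pair shares all of its one bit of information, while the independent pair shares none.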

Gravitational Search Algorithm.
The gravitational search algorithm (GSA) is a heuristic search algorithm recently proposed by Rashedi et al. [20], inspired by the Newtonian laws of gravity and motion. As a stochastic population-based heuristic optimization tool, GSA provides an iterative method that simulates mass interactions and moves through a multidimensional search space under the influence of gravitation. In GSA, agents are considered as objects, and their performance is measured by their masses. The algorithm is introduced as follows [20,21].
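For a system with $N$ agents, the position of the $i$th agent is defined as

$$X_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^h), \quad i = 1, 2, \ldots, N,$$

where $x_i^d$ represents the position of the $i$th agent in the $d$th dimension and $h$ is the dimension of the search space.

The velocity $v_i^d(t)$ and position $x_i^d(t)$ of the $i$th agent in the $d$th dimension are updated as

$$v_i^d(t+1) = \mathrm{rand}_i \times v_i^d(t) + a_i^d(t),$$

$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1),$$

where $\mathrm{rand}_i$ is a uniform random number in the interval $[0, 1]$, which gives a randomized characteristic to the search. The acceleration $a_i^d(t)$ is calculated as

$$a_i^d(t) = \frac{F_i^d(t)}{M_i(t)},$$

where $F_i^d(t)$ is the total force exerted on agent $i$ in the $d$th dimension and $M_i(t)$ is the mass of agent $i$ at time $t$.

The force acting on agent $i$ from agent $j$ at time $t$ is defined as

$$F_{ij}^d(t) = G(t)\,\frac{M_i(t)\,M_j(t)}{R_{ij}(t) + \varepsilon}\,\bigl(x_j^d(t) - x_i^d(t)\bigr),$$

where $G(t)$ is the gravitational constant at time $t$, $G(t) = G_0(1 - t/T)$ with $T$ the total number of iterations, $\varepsilon$ is a small constant, and $R_{ij}(t)$ is the distance between agents $i$ and $j$. Then, $F_i^d(t)$ and $M_i(t)$ can be calculated:

$$F_i^d(t) = \sum_{j=1,\, j \neq i}^{N} \mathrm{rand}_j\, F_{ij}^d(t),$$

$$m_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\mathrm{best}(t) - \mathrm{worst}(t)}, \qquad M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)},$$

where $\mathrm{rand}_j$ is a random number in the interval $[0, 1]$, $\mathrm{fit}_i(t)$ is the fitness value of agent $i$ at time $t$, and $\mathrm{best}(t)$ and $\mathrm{worst}(t)$ are the best and worst $\mathrm{fit}_i(t)$ at time $t$, respectively.

According to how the position is updated, GSA can be divided into continuous GSA (CGSA) and binary GSA (BGSA). In the binary algorithm, a position update means switching between the values "0" and "1". In the implementation of BGSA, a large velocity must give a high probability of changing the position of the mass with respect to its previous position, and a small velocity must give a small probability of changing the position. So, $v_i^d$ is mapped to a probability function $S(v_i^d)$ as follows:

$$S\bigl(v_i^d(t)\bigr) = \bigl|\tanh\bigl(v_i^d(t)\bigr)\bigr|.$$

Then, the agents move according to the following rule:

$$x_i^d(t+1) = \begin{cases} \overline{x_i^d(t)}, & \text{if } \mathrm{rand} < S\bigl(v_i^d(t+1)\bigr), \\ x_i^d(t), & \text{otherwise,} \end{cases}$$

where $\overline{x_i^d(t)}$ denotes the complement of the bit $x_i^d(t)$.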
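The update rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the problem is treated as maximization, the Hamming distance stands in for the distance $R_{ij}$ between bit strings, and the names `gravitational_constant`, `masses`, and `bgsa_step` are hypothetical. Because $M_i$ appears in both the force and the acceleration, it cancels, and the sketch accumulates the acceleration directly.

```python
import math
import random

def gravitational_constant(G0, t, T):
    """G(t) = G0 * (1 - t / T), decaying linearly over T iterations."""
    return G0 * (1 - t / T)

def masses(fitness):
    """Normalized masses M_i for a maximization problem."""
    best, worst = max(fitness), min(fitness)
    if best == worst:  # all agents equally fit: share the mass evenly
        return [1.0 / len(fitness)] * len(fitness)
    m = [(f - worst) / (best - worst) for f in fitness]
    total = sum(m)
    return [mi / total for mi in m]

def bgsa_step(X, V, fitness, G, eps=1e-9, rng=random):
    """One binary GSA iteration; returns the new positions and velocities."""
    M = masses(fitness)
    new_X, new_V = [], []
    for i, (xi, vi) in enumerate(zip(X, V)):
        accel = [0.0] * len(xi)
        for j, xj in enumerate(X):
            if j == i:
                continue
            # Assumption: Hamming distance stands in for R_ij on bit strings.
            R = sum(a != b for a, b in zip(xi, xj))
            pull = rng.random() * G * M[j] / (R + eps)  # M_i cancels in F / M_i
            for d in range(len(xi)):
                accel[d] += pull * (xj[d] - xi[d])
        next_v = [rng.random() * v + a for v, a in zip(vi, accel)]
        # Flip each bit with probability S(v) = |tanh(v)|.
        next_x = [1 - x if rng.random() < abs(math.tanh(v)) else x
                  for x, v in zip(xi, next_v)]
        new_X.append(next_x)
        new_V.append(next_v)
    return new_X, new_V
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible; when all agents coincide and start at rest, the accelerations vanish and no bit is flipped.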

Characteristics Ranking by Mutual Information.
In information theory, mutual information is used to quantitatively analyze the relationship between two characteristics or between a characteristic and a class variable. The characteristics are divided into two subsets: the already selected subset $S$ and the unselected subset $F$. Among all the characteristics in $F$, the characteristic $f_i$ that carries the most information about the classes $C$ not already provided by the selected characteristics is moved into $S$ [19]. Following [22], the score $MI(f_i)$ of characteristic $f_i$ is estimated from the normalized mutual information $\hat{I}(C; f_i) = I(C; f_i)/\log_2(|\Omega_C|)$ and from the information that $f_i$ shares with each characteristic $f_s$ of the already selected subset.
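The greedy selection described above can be sketched as follows. The scoring rule used here, relevance to the classes minus the mean mutual information with the already selected characteristics, is an assumed stand-in in the spirit of the description and is not claimed to be the exact estimator of [22]; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def rank_characteristics(columns, classes, m):
    """Greedily move m characteristics from the unselected set into S.

    columns: dict mapping a characteristic name to its list of discrete values.
    Score (assumption): relevance minus mean redundancy with S.
    """
    selected = []
    remaining = list(columns)
    relevance = {f: mutual_information(classes, columns[f]) for f in remaining}

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(columns[f], columns[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < m:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): f2 duplicates f1, while f3 adds new information.
columns = {"f1": [0, 0, 1, 1], "f2": [0, 0, 1, 1], "f3": [0, 1, 0, 1]}
classes = [0, 1, 2, 3]
print(rank_characteristics(columns, classes, 2))
```

In this toy case all three characteristics are equally relevant on their own, but once `f1` is selected the duplicate `f2` is fully redundant, so `f3` is chosen next.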

Improved GSA.
We use the GSA method to identify the CTQs for complex products. GSA has been reported to outperform PSO and GA in most cases [20]. However, like other intelligent algorithms, GSA is prone to premature convergence. To overcome this drawback, we use opposition-based learning and an immune strategy to improve GSA (IGSA).
The concept of opposition-based learning was introduced by Tizhoosh [23]. The main idea behind opposition-based learning is the simultaneous consideration of an estimate and its corresponding opposite estimate in order to achieve a better approximation of the current candidate solution [24]. Let $x \in \mathbb{R}$ be a real number defined on a certain interval: $x \in [a, b]$. The opposite number $\breve{x}$ is defined as $\breve{x} = a + b - x$. The opposite number in the multidimensional case is defined analogously, dimension by dimension. The immune algorithm (IA) is an optimization method based on the characteristics of the biological immune system. It was proposed by Farmer, Packard, and Perelson in 1986, when they discussed the links between the immune system and other artificial intelligence methods. It has the capability to control a complex system [25] and has been applied in many fields. In the IA, the antigen represents the problem to be solved. A population of antibodies is generated, in which each member represents a candidate solution. Affinity is the fitness of an antibody to the antigen. The key to the IA is how to generate antibodies [26].
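Returning to the opposition-based learning step, the opposite number and its binary counterpart can be sketched as below. How the opposite candidates are used (here, keeping the fitter of each agent and its opposite) is an assumption for illustration; the function names are hypothetical, and `sum` is only a toy fitness.

```python
def opposite(x, a, b):
    """Opposite number of x on the interval [a, b]: a + b - x."""
    return a + b - x

def opposite_point(xs, lows, highs):
    """Opposite point in the multidimensional case, one interval per dimension."""
    return [a + b - x for x, a, b in zip(xs, lows, highs)]

def opposite_bits(bits):
    """For a binary position every interval is [0, 1], so each bit is flipped."""
    return [1 - bit for bit in bits]

def opposition_select(population, fitness):
    """Keep the fitter of each agent and its opposite (maximization, assumption)."""
    return [max(p, opposite_bits(p), key=fitness) for p in population]

print(opposite(3, 0, 10))
print(opposition_select([[0, 0, 0], [1, 1, 0]], sum))
```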
The immune strategy can effectively maintain population diversity and improve the convergence speed through immune recognition and immune memory. In the immune strategy, affinity is used to describe the information contained in an antibody [25]. The affinity function is computed from the dimension of the search space, the number of antibodies, and the best position; the affinity of every memory antibody with respect to the best position is then calculated.

Implementation of Hybrid Approach
4.3.1. Representation of Position. Assume that there are $n$ characteristics in total; there are then $2^n$ distinct characteristic subsets. If each agent takes one characteristic subset, the agent's position can be represented as a binary bit string of length $n$ in which every bit represents a characteristic: 1 means that the corresponding characteristic is selected, and 0 means that it is not.
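The bit-string encoding can be illustrated with a short sketch that converts between a position and the characteristic subset it stands for. The characteristic names and helper functions are hypothetical.

```python
def decode_position(position, names):
    """Return the characteristics whose bit is 1 in the agent's position."""
    return [name for bit, name in zip(position, names) if bit == 1]

def encode_subset(subset, names):
    """Return the bit string of length len(names) that represents `subset`."""
    chosen = set(subset)
    return [1 if name in chosen else 0 for name in names]

names = ["q1", "q2", "q3", "q4"]      # hypothetical characteristic names
position = [1, 0, 0, 1]
print(decode_position(position, names))
print(2 ** len(names))                # number of distinct subsets
```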

Fitness Function.
The CTQs subset should not only be short but also have a high classification quality [27]. So the fitness function can be defined as follows:

$$\mathrm{Fitness} = \alpha\,\gamma_R(D) + \beta\,\frac{|F| - |R|}{|F|},$$

where $\gamma_R(D)$ is the classification quality of the condition attribute set $R$ relative to the decision $D$ in rough set theory, $|R|$ is the length of the selected characteristic subset, and $|F|$ is the total number of characteristics. $\alpha$ and $\beta$ are two parameters corresponding to the importance of classification quality and subset length, with $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$. A high $\alpha$ ensures that the best position is at least a real rough set reduct. The quality of each position can be calculated by this formula, and the goal is to maximize the fitness value [27].
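A minimal sketch of this fitness function is given below. It assumes the usual rough-set reading of classification quality, namely the share of samples whose values on the selected characteristics determine the decision unambiguously. The value `alpha=0.9`, the function names, and the toy decision table are illustrative assumptions.

```python
from collections import defaultdict

def classification_quality(rows, decisions, subset):
    """gamma_R(D): share of samples in the positive region of `subset`."""
    seen = defaultdict(set)    # values on `subset` -> decisions observed
    counts = defaultdict(int)  # values on `subset` -> number of samples
    for row, d in zip(rows, decisions):
        key = tuple(row[i] for i in subset)
        seen[key].add(d)
        counts[key] += 1
    consistent = sum(counts[k] for k, ds in seen.items() if len(ds) == 1)
    return consistent / len(rows)

def fitness(position, rows, decisions, alpha=0.9):
    """alpha * gamma_R(D) + beta * (|F| - |R|) / |F| with beta = 1 - alpha."""
    subset = [i for i, bit in enumerate(position) if bit]
    total = len(position)
    beta = 1 - alpha
    gamma = classification_quality(rows, decisions, subset)
    return alpha * gamma + beta * (total - len(subset)) / total

# Toy decision table (hypothetical): column 0 alone determines the decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
print(fitness([1, 0, 0], rows, decisions), fitness([1, 1, 1], rows, decisions))
```

On this table the single-characteristic position scores higher than the full set because both classify perfectly, while the shorter subset earns the length bonus.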

Hybrid Approach (MIGSA).
The proposed method has two major parts. The first ranks the characteristics by mutual information, and the second finds the optimal subset of characteristics by the improved GSA based on the result of the first part. Let $m$ be the number of characteristics selected by mutual information, with $m < n$, where $n$ is the total number of characteristics. The dimension of the characteristics is thus reduced through this selection.

Experiments
For our experiments, we implement the hybrid approach for CTQs identification in Matlab 7. After that, the selected characteristics are optimized by the improved GSA. The program terminates when the algorithm reaches the stopping criterion. The GSA parameter configuration is given in Table 2. Figure 1 shows the evolution of the global best on SECOM and Statlog (LS) for IGSA and GSA. From Figure 1, we find that IGSA overcomes the premature convergence of GSA. Then, SVM is used as the training procedure, and the classification accuracy of the CTQs is estimated by tenfold cross validation. The results are given in Table 3. From Table 3, we can see that the number of CTQs is 11 and the classification accuracy is 88.3% for the SECOM dataset, while the number of CTQs is 5 and the classification accuracy is 84.9% for the Statlog (LS) dataset.
To demonstrate the capability of the proposed method, it is compared with four algorithms (MRPSO, BPSO, CBPSO, and ReliefF) from the literature [28] along two dimensions, number of CTQs and accuracy, using Statlog (LS). The results are listed in Table 4. From Table 4, we can see that the accuracy of the CTQs obtained by the 5 methods is similar. However, the numbers of CTQs are significantly different. Of the four existing methods, the best is MRPSO, with 85.24 percent accuracy and 16 CTQs, while the proposed MIGSA reaches 84.9 percent accuracy with only 5 CTQs. Hence, the proposed approach is an efficient method of CTQs identification.

Conclusions
To identify the CTQs of complex products, we propose a hybrid approach based on mutual information and GSA. Because GSA often suffers from premature convergence, we improve the algorithm through opposition-based learning and an immune algorithm. First, we compare the improved method IGSA with the original GSA, and the experimental results show that IGSA has a strong search capability. Then, MIGSA is compared with 4 methods from the literature; the experimental results show that it greatly reduces the number of CTQs at a similar accuracy. The experiments indicate that MIGSA is an effective method for identifying CTQs for complex products.
Algorithm 1: MIGSA.
Input: the data set; $n$, the number of characteristics; $F = \{f_1, f_2, \ldots, f_n\}$, the set of characteristics; $S = \emptyset$; $C$, the set of classes.
Values of parameters in GSA: $N$, the number of agents; $T_{\max}$, the maximum iteration; a constant; $X_0$, the initial positions of agents; $V_0$, the initial velocities of agents; $\mathrm{Fitness}_{\max}$, the best fitness value.
for each $f_i \in F$ do
    Compute $MI(f_i)$
end
Select the next characteristic $f_i$ with the maximum $MI(f_i)$.

The SECOM dataset is used to test the proposed method's capability, and the Statlog (LS) dataset is used for comparison with other methods in the literature. Before the hybrid approach is applied, data preprocessing, which comprises missing-value handling and data balancing, should be done first. Then mutual information is calculated between the characteristics and the classes.

Table 1: Information of 2 datasets.

Table 2: Parameters of algorithm used in the experiment.

Table 4: Number of CTQs and accuracy with different methods for the Landsat Satellite dataset.