A Multi-Index Generative Adversarial Network for Tool Wear Detection with Imbalanced Data

The scarcity of abnormal data leads to imbalanced data in the field of tool wear condition monitoring. In this paper, a novel multi-index generative adversarial network (MI-GAN) is proposed to detect tool wear conditions from imbalanced signal data. First, the generator of the MI-GAN is trained to produce fake normal signals, and the discriminator is trained to score normal signals and generated signals. Next, the generator detects abnormal signals based on how well it can imitate the testing signals, and the discriminator computes the scores of the testing signals and the generated signals. Subsequently, two indexes, i.e., the L2-norm and the temporal correlation coefficient (CORT), are put forward to measure the similarity between the generated signals and the testing signals. Finally, our decision-making function combines the L2-norm and CORT with the two discriminator scores to determine the tool condition. Experimental results show that our method achieves 97% accuracy in tool wear detection on imbalanced data without manual feature extraction and outperforms traditional machine learning methods.


Introduction
The past decades have witnessed the rapid development of equipment-manufacturing technologies. With the widespread application of computerized numerical control (CNC) machines, the productivity and accuracy of manufacturing processes have been significantly enhanced. As an important part of CNC machines, the condition of the cutting tools is critical to product quality. However, since cutting tools usually operate in an extremely severe rubbing environment, tool wear is inevitable in manufacturing processes. Due to frequent friction and extrusion, cutting tools gradually wear out, lose sharpness, and become dull. The wear of cutting tools may increase cutting forces and temperature, degrade the machined surface texture, and, in extreme cases, even lead to fatal failures of the manufacturing system. Therefore, designing effective real-time tool wear monitoring techniques that guarantee high quality in the manufacturing process has received much attention from both academia and industry [1][2][3].
Many attempts have been made to detect tool wear in recent years, among which image-based methods are the most widely used. For example, the authors of [4][5][6] have developed image-based online tool condition detection techniques using optical methods. However, such image-based methods are usually vulnerable to the manufacturing environment, as cutting fluid, chips, and dirt may hinder image acquisition of the workpieces; hence, the tool condition can only be measured at the end of a batch. To handle this issue, some online tool condition detection methods based on signal analysis have been proposed. It is well recognized that the tool wear condition is related to manufacturing signals such as vibration, acoustic, and current signals, which present different features when the cutting tool works in different conditions. By deploying specific sensors, manufacturing signals can be easily collected and then exploited to determine the tool wear condition in real time.
Up to now, various tool wear monitoring techniques based on signal analysis have been proposed. Specifically, the authors of [7] have used feedback current signals to classify the tool condition and compared the performance and accuracy of several pattern recognition techniques, including support vector machine (SVM), linear discriminant analysis (LDA), K-nearest neighbor (KNN), neural network (NN) with one hidden layer, naïve Bayes (NB), and decision trees (DT). They concluded that LDA and SVM are the most effective of these classifiers. Moreover, a comparative study in [8] demonstrates that random forests (RFs) have better predictive capability than artificial neural networks (ANNs) with one hidden layer and support vector regression (SVR). However, all these traditional techniques [7][8][9][10][11][12][13] rely heavily on features manually extracted from raw data. The characteristics of the manufacturing signals must be analyzed, and reliable features must be extracted based on prior knowledge of the specific task, which is cumbersome. Consequently, it is of practical significance to develop an easy-to-use method that requires no feature extraction procedure.
It should be pointed out that deep learning is a promising technique capable of automatically extracting informative features from raw data. In [14][15][16][17], the feasibility of using convolutional neural networks (CNNs) to extract informative features has been shown. For instance, input signals can be converted into images using Gramian Angular Summation Fields, after which a CNN classifies the tool wear condition [18]. In [19], a stacked sparse autoencoder neural network has been presented to diagnose tool wear conditions. By decomposing original 1D signals and reconstructing them as 2D ones, CNNs can accurately predict the tool wear condition [20]. However, CNN-based tool condition detection [14][15][16][17] has some critical shortcomings. For example, a large labeled dataset is required for good CNN performance, and the signal-to-image conversion procedure may cause substantial information loss from the raw signals [18]. In addition, in the majority of industrial processes, the normal and abnormal labels of the data are imbalanced. The low failure rate of manufacturing machines leads to a scarcity of abnormal data, and with limited abnormal data it is almost impossible to characterize all fault types accurately.
Very recently, generative adversarial networks (GANs) have been used to address imbalanced data in unsupervised anomaly detection. Specifically, the authors of [21] have put forward a GAN with an encoder-decoder-encoder three-subnetwork generator to handle imbalanced industrial time series, but an obvious shortcoming is that it still requires manual feature extraction. Other GAN-based methods, such as AnoGAN and GAN-AD, have also been proposed for anomaly detection [22,23]. In these works, an anomaly score for detecting abnormal signals is built by directly combining the residual loss and the discriminator loss. However, the magnitude of the residual loss usually differs from that of the discriminator loss, so a direct combination of the two losses may not lead to satisfactory detection.
Motivated by the above discussion, in this paper, we propose a novel multi-index GAN (MI-GAN) method for tool wear detection with imbalanced data. To deal with the challenges stemming from data imbalance, the generator of the proposed MI-GAN focuses on learning the characteristics of normal signals, and the discriminator outputs scores that represent the fake level of the input signals. This architecture can effectively detect the tool condition by learning how to generate and distinguish normal signals. In particular, a novel multi-index decision-making function is developed to aggregate more representative criteria. It incorporates the L2-norm, the temporal correlation coefficient (CORT) [24], the score of the input signal, and the score of the generated signal. In addition, we use temporal convolutional networks (TCNs) as the backbone of our generator and discriminator networks to encode more temporal and sequential features. This further improves the performance of the model without relying on empirical feature definitions. We sample real current signals from a CNC machine in different manufacturing processes and then conduct tool wear detection experiments. The experimental results demonstrate that our MI-GAN achieves competitive accuracy, precision, recall, and F-measure in detecting abnormal tool conditions. Overall, the main contributions of this paper are as follows: (1) a novel MI-GAN that can generate and distinguish normal signals is established to handle imbalanced data in tool wear condition monitoring; (2) a multi-index decision-making function is developed to aggregate more representative criteria; and (3) the feature extraction ability is enhanced by applying the TCN as the backbone of the MI-GAN generator and discriminator networks without predefined feature selection. The rest of the paper is organized as follows.
Section 2 introduces the methodology of the proposed MI-GAN, including the data acquisition procedure, the architecture of the MI-GAN, the training process, and the multi-index decision-making algorithm. Section 3 presents experimental results that evaluate the performance of our method. Finally, Section 4 draws conclusions.

Structure of Monitoring Systems
In the manufacturing processes of CNC machines, it is practical to monitor the state of the tool by sampling the current signals of the spindle motor and analyzing their real-time characteristics. As shown in Figure 1, the spindle transmission system consists of a motor, spindle, timing belt, drive, etc. The main shaft is connected to the motor through a belt. The machine tool sends a control signal to the drive, which outputs the spindle motor current that makes the motor rotate, and the motor drives the spindle via the timing belt. Since the spindle motor current is positively related to the cutting force, changes in the real-time spindle current reflect changes in the cutting force. As the tool gradually wears during machining, its cutting performance decreases and a greater cutting force is required, which increases the motor current. Therefore, the spindle current signal is used to detect the tool condition throughout the paper.

Methodology
A detailed introduction of the proposed MI-GAN will be presented in this section. Generally speaking, the procedure of tool condition detection based on MI-GAN can be roughly divided into the following two stages.
In stage 1, the normal signals are exploited to train the generative adversarial network. The generator tries to synthesize normal signals by minimizing the generator loss. Generated signals and normal signals are fed into the discriminator in sequence to train the network. When the training losses converge, the generator can stably generate plausible normal signals, and the discriminator can measure the fake-level scores of actual signals and generated signals. Here, actual signals denote the signals in the dataset.
In stage 2, we make full use of the pretrained model to calculate four specific indexes (i.e., the L2-norm, CORT, and two fake-level scores, which will be further clarified in Section 3.4). First, the input signals are inverted into specific noises z, which lets the pretrained generator produce signals similar to the input signals. Next, the L2-norm and CORT are calculated between the generated signals and the input signals, and the pretrained discriminator outputs the fake-level scores of the generated signals and the input signals. Furthermore, the mean and standard deviation of these four indexes over the normal training signals are calculated to determine the detection thresholds. Eventually, the model compares the four indexes of a testing signal with the thresholds to determine whether it is normal or abnormal. Figure 2 gives a brief overview of the method, and the technical details are explained in the following sections.

Data Acquisition.
The dataset consists of spindle current signals collected from the CNC machine tool while drilling 3 holes in a workpiece. The current signals are sampled at 10 Hz, and every 130 consecutive sampling points are recorded as one signal set. In each manufacturing process, the CNC machine tool is monitored in real time, and its tool condition (i.e., normal or abnormal) is recorded as a label attached to each set. The dataset can be denoted as $\{C_1, C_2, \ldots, C_m\}$, where $C_i = (X_{i1}, X_{i2}, \ldots, X_{in})$ is time series data with total length $n$, and $X_{ik}$ is the current value of $C_i$ at sampling instant $k$. We divide the dataset into two subsets, i.e., a training dataset and a testing dataset. The training dataset includes 800 normal signals, and the testing dataset contains 287 normal signals and 11 abnormal signals.
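The following Python sketch roughly illustrates the segmentation step. It cuts a continuous spindle-current recording into 130-point signal sets. It assumes that the windows are consecutive and non-overlapping and that any trailing partial window is discarded. The function names and the synthetic recording are hypothetical and do not come from the original study.

```python
# Sketch: segment a continuous current recording into 130-point signal sets.
# Assumption: windows are consecutive and non-overlapping (not stated explicitly).
SAMPLING_HZ = 10      # sampling frequency reported for the current signals
SIGNAL_LENGTH = 130   # sampling points per signal set C_i


def segment_signals(stream, length=SIGNAL_LENGTH):
    """Split a 1-D sequence into consecutive windows; a trailing partial window is dropped."""
    n_full = len(stream) // length
    return [list(stream[i * length:(i + 1) * length]) for i in range(n_full)]


def window_duration_seconds(length=SIGNAL_LENGTH, fs=SAMPLING_HZ):
    """Time span covered by one signal set, in seconds."""
    return length / fs


if __name__ == "__main__":
    recording = [float(k % 7) for k in range(1000)]  # synthetic placeholder, not real data
    sets = segment_signals(recording)
    print(len(sets), len(sets[0]), window_duration_seconds())  # prints: 7 130 13.0
```

At the reported 10 Hz sampling rate, each 130-point signal set therefore spans 13 s of machining.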

Architecture of the MI-GAN.
The schematic diagram of the MI-GAN architecture is illustrated in Figure 3; it consists of a generator and a discriminator. The generator maps the random noise space to the normal signal space so as to confuse the discriminator. It takes as input a vector of 130 random numbers independently drawn from a uniform distribution and outputs a fake normal signal of length 130. Both generated signals and normal signals are fed into the discriminator to obtain fake-level scores, which indicate how fake the input signals are. A higher fake-level score indicates that the input signal is more likely to be abnormal, and vice versa. Notice that the TCN residual block (depicted in Figure 4) has excellent sequence modeling performance [25]. Consequently, the generator first uses six TCN residual blocks to extract the features of the input noise vector z and then a linear layer to synthesize a plausible signal. Similarly, TCN residual blocks also form part of our discriminator, followed by two linear layers that compute the fake-level scores.
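The core operation inside a TCN residual block [25] is a dilated causal convolution. The pure-Python sketch below shows a single-channel version with the following parts:

- two dilated causal convolutions, each followed by a ReLU;
- an identity skip connection;
- a helper that computes the receptive field of a stack of blocks.

The kernel size of 3 and the dilations 1, 2, 4, 8, 16, 32 for the six blocks are assumptions for illustration, since the text does not give them. The sketch also omits several elements used in practical TCNs: weight normalization, dropout, multiple channels, and learned 1x1 skip convolutions.

```python
import random


def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], treating samples before t=0 as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # causal: only current and past samples contribute
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(values):
    return [max(0.0, v) for v in values]


def tcn_residual_block(x, w1, w2, dilation):
    """Single-channel TCN block: conv -> ReLU -> conv -> ReLU, then ReLU(output + input)."""
    h = relu(causal_dilated_conv(x, w1, dilation))
    h = relu(causal_dilated_conv(h, w2, dilation))
    return relu([a + b for a, b in zip(h, x)])


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of input samples (current one included) that can influence one output sample."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    dilations = [2 ** i for i in range(6)]  # hypothetical schedule for six blocks
    rng = random.Random(0)
    h = [rng.uniform(-1.0, 1.0) for _ in range(130)]  # noise vector z of length 130
    for d in dilations:
        h = tcn_residual_block(h, [0.5, 0.3, 0.2], [0.4, 0.4, 0.2], d)
    print(len(h), receptive_field(3, dilations))  # prints: 130 253
```

With these assumed settings, the receptive field (253 samples) exceeds the 130-point signal length. Each output sample can therefore depend on every earlier sample of the input, which is why dilations are commonly doubled from block to block.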

Training Generative Adversarial Networks.
The classical approach to training generative adversarial networks optimizes the Jensen-Shannon (JS) or Kullback-Leibler divergence between the model distribution and the real-data distribution. However, [26] points out that such methods are unreliable and may lead to training instability and mode collapse. To avoid this problem, we adopt the Wasserstein divergence [27] as our min-max objective, given in (1). In (1), $P_g$ is the distribution of generated signals, $P_r$ is the distribution of real signals, $P_u$ is a Radon probability measure, $z$ represents random noise, $C$ denotes the normal signals, and $\hat{C}$ is a sequence sampled from $P_u$. $G(\cdot)$ and $D(\cdot)$ represent the generator and the discriminator, respectively. Hence, the generative adversarial networks are trained by optimizing the generator loss $L_G$ and the discriminator loss $L_D$ given in (2), where $\hat{C} = (1-\mu)C + \mu G(z)$ and $\mu$ is a nonnegative scalar in $[0, 1]$. Inspired by [21,23], only normal signals are fed into the discriminator as real samples, which encourages the MI-GAN to generate fake normal signals. The generator produces fake normal signals to confuse the discriminator by minimizing $L_G$. The discriminator is trained, by minimizing $L_D$, to compute fake-level scores that distinguish normal signals from fake normal signals. The Adam algorithm [28] is adopted to optimize both losses. In the implementation, we assign equal weights to $C$ and $G(z)$ when computing $\hat{C}$, i.e., $\mu = 0.5$.
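The loss equations are not reproduced in this text. The sketch below therefore illustrates only the structure commonly used for Wasserstein-divergence training [27]. It is written for a discriminator that outputs a fake-level score, where lower means more real, which is consistent with the scores reported in Table 3. It rests on two assumptions:

- The penalty takes the form $k\,\mathbb{E}[\lVert\nabla D(\hat{C})\rVert^p]$ with $k = 2$ and $p = 6$. These are the values recommended in [27]; this paper does not state them.
- A toy discriminator $D(x)=\sum_k w_k\tanh(x_k)$ with an analytic gradient stands in for the TCN discriminator, so no autodiff library is needed.

```python
import math

K_PENALTY = 2.0  # assumed penalty weight k
P_PENALTY = 6.0  # assumed penalty exponent p
MU = 0.5         # interpolation weight mu used in the text


def toy_score(x, w):
    """Toy fake-level score D(x) = sum_k w_k * tanh(x_k) (stand-in for the TCN discriminator)."""
    return sum(wk * math.tanh(xk) for wk, xk in zip(w, x))


def toy_score_grad(x, w):
    """Analytic gradient of toy_score with respect to x: w_k * (1 - tanh(x_k)^2)."""
    return [wk * (1.0 - math.tanh(xk) ** 2) for wk, xk in zip(w, x)]


def mean(values):
    return sum(values) / len(values)


def wgan_div_losses(real_batch, fake_batch, w, mu=MU, k=K_PENALTY, p=P_PENALTY):
    """Return (L_D, L_G) for one batch of normal signals C and generated signals G(z)."""
    d_real = mean([toy_score(c, w) for c in real_batch])
    d_fake = mean([toy_score(g, w) for g in fake_batch])
    # C_hat = (1 - mu) * C + mu * G(z), formed pair by pair
    c_hat = [[(1 - mu) * a + mu * b for a, b in zip(c, g)]
             for c, g in zip(real_batch, fake_batch)]
    grad_norms = [math.sqrt(sum(gk * gk for gk in toy_score_grad(x, w))) for x in c_hat]
    penalty = mean([n ** p for n in grad_norms])
    # Discriminator: push scores of real normal signals down and generated ones up
    loss_d = d_real - d_fake + k * penalty
    # Generator: lower the fake-level score of its own samples
    loss_g = d_fake
    return loss_d, loss_g


if __name__ == "__main__":
    print(wgan_div_losses([[0.0, 0.0]], [[1.0, 1.0]], w=[1.0, 1.0]))
```

In an actual training loop, the two losses would be minimized alternately with Adam [28]. The gradient of $D$ at $\hat{C}$ would then come from automatic differentiation rather than a closed form.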

Threshold Determination.
We use both the generator and the discriminator to monitor tool wear conditions. To make full use of the generator, the input signals are projected onto the latent space through the pretrained generator. Following the inversion technique in [29], we apply Algorithm 1 in Table 1 to search for specific noises $z$ as the input of the generator, such that the corresponding generated signals $\hat{C}$ are similar to the input signals $C$.
This inversion technique uses the RMSprop optimization algorithm [30] to minimize the mean square error between the generated signals and the input signals.
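The inversion in Algorithm 1 can be sketched as follows. To keep the example self-contained, a hypothetical linear generator $G(z) = Az$ replaces the pretrained TCN generator, so the gradient of the mean square error with respect to $z$ has a closed form. The sketch makes these further assumptions:

- the RMSprop hyperparameters (decay 0.9, step size 0.01);
- a uniform initialization of $z$ in $[-1, 1]$;
- a single input signal inverted instead of a batch.

```python
import math
import random


def toy_generator(z, A):
    """Hypothetical linear generator G(z) = A z (stand-in for the pretrained generator)."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mse_grad(z, A, target):
    """Gradient of mean((A z - target)^2) with respect to z."""
    residual = [g - c for g, c in zip(toy_generator(z, A), target)]
    n = len(target)
    return [2.0 / n * sum(A[k][j] * residual[k] for k in range(n)) for j in range(len(z))]


def invert(target, A, latent_dim, steps=2000, lr=0.01, decay=0.9, eps=1e-8, seed=0):
    """Search for noise z such that G(z) approximates the input signal, using RMSprop."""
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim)]  # random starting noise
    v = [0.0] * latent_dim  # running average of squared gradients
    for _ in range(steps):
        g = mse_grad(z, A, target)
        v = [decay * vi + (1.0 - decay) * gi * gi for vi, gi in zip(v, g)]
        z = [zi - lr * gi / (math.sqrt(vi) + eps) for zi, gi, vi in zip(z, g, v)]
    return z


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    target = toy_generator([0.3, -0.4], A)  # an input signal the generator can reproduce
    z = invert(target, A, latent_dim=2)
    print(mse(toy_generator(z, A), target))
```

The generator cannot reproduce an abnormal input, so for such an input the same loop is expected to stall at a noticeably larger error. That residual mismatch is what the L2-norm and CORT indexes quantify.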
Since the generator captures only the normal pattern, it has difficulty producing fake abnormal signals. When the input signals are abnormal, the generated signals will be dissimilar to them. Therefore, the generated signals fall into two categories: similar signals, whose inputs are normal signals, and dissimilar signals, whose inputs are abnormal signals. Denote $C_i = (X_{i1}, X_{i2}, \ldots, X_{in})$, $X_k = (X_{1k}, X_{2k}, \ldots, X_{mk})$, and $\hat{X}_k = (\hat{X}_{1k}, \hat{X}_{2k}, \ldots, \hat{X}_{mk})$. To measure the similarity between the generated signals and the input signals, we introduce the L2-norm, $L(C, \hat{C})$, and the first-order CORT, $\mathrm{CORT}(C, \hat{C})$, defined in (4) and (5), respectively. The L2-norm in (4) measures the distance between the generated signals and the input signals, while the first-order CORT in (5) evaluates the similarity of their growth behavior. A larger L2-norm or a smaller CORT means a larger dissimilarity between the generated signals and the input signals; that is, the input signals are more likely to be abnormal. Furthermore, we use the pretrained discriminator to help determine whether the input signal is normal. The output of our discriminator can be regarded as a fake-level score. Since only normal data are fed to the discriminator in the training stage, a higher fake-level score indicates that the signal is less likely to be normal. Therefore, the discriminator outputs high fake-level scores for abnormal signals. The fake-level scores of normal signals are lower than those of abnormal signals, and the same holds for their corresponding generated signals. We have now put forward four indexes to help identify the tool wear condition: the L2-norm, CORT, the fake-level scores of the input signals, and the fake-level scores of the generated signals.
Figure 2: The overall framework of the MI-GAN. In stage 1, the generative adversarial network is trained. Stage 2 makes full use of the pretrained model to calculate four specific indexes.
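The two similarity indexes can be illustrated with a short sketch. Equations (4) and (5) are not reproduced in this text, so it assumes the standard definitions:

- the Euclidean distance between an input signal and its generated counterpart;
- the first-order temporal correlation coefficient of [24], which correlates the successive increments of the two series.

Both indexes are computed for one pair of signals. Returning 0 for a constant series, where the denominator is zero, is an assumed convention.

```python
import math


def l2_distance(c, c_hat):
    """Euclidean distance between an input signal c and its generated counterpart c_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, c_hat)))


def cort(c, c_hat):
    """First-order temporal correlation coefficient, in [-1, 1].

    1 means both series rise and fall together; -1 means opposite growth behaviour.
    """
    du = [c[t + 1] - c[t] for t in range(len(c) - 1)]
    dv = [c_hat[t + 1] - c_hat[t] for t in range(len(c_hat) - 1)]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den if den > 0 else 0.0  # assumed convention for flat series


if __name__ == "__main__":
    c = [1.0, 3.0, 2.0, 4.0]
    shifted = [v + 10.0 for v in c]  # same shape, constant offset
    print(cort(c, shifted), l2_distance(c, shifted))  # prints: 1.0 20.0
```

The example shows why the two indexes complement each other. A generated signal with the right shape but a constant offset has a CORT of 1 yet a large L2-norm.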
Given these indexes, four thresholds can be established to distinguish normal and abnormal signals. To determine the threshold values, we compute the four indexes for the normal signals in the training dataset, together with the mean and standard deviation of each index; these statistics are the basis for identifying abnormal inputs. CORT decreases as a signal becomes more likely to be abnormal, while the other three indexes increase. Therefore, the thresholds of the L2-norm and the two fake-level scores are defined as their mean plus one standard deviation, whereas the threshold of CORT is defined as its mean minus one standard deviation. Because these thresholds are derived from the statistics of the normal training signals, an input whose indexes fall beyond them deviates from the normal pattern.
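The threshold rule can be written in a few lines. In this sketch the index names are hypothetical dictionary keys. The use of the population standard deviation is an assumption, since the text does not say whether the population or the sample estimate is used.

```python
import statistics


def index_thresholds(train_indexes):
    """Compute one threshold per index from its values on the normal training signals.

    train_indexes: dict mapping an index name to the list of its training values.
    CORT uses mean - std (a low CORT signals abnormality); every other index uses mean + std.
    Returns (thresholds, means, stds), each a dict keyed by index name.
    """
    thresholds, means, stds = {}, {}, {}
    for name, values in train_indexes.items():
        means[name] = statistics.mean(values)
        stds[name] = statistics.pstdev(values)  # assumed: population standard deviation
        sign = -1.0 if name == "cort" else 1.0
        thresholds[name] = means[name] + sign * stds[name]
    return thresholds, means, stds


if __name__ == "__main__":
    # Hypothetical index values for four normal training signals
    train = {
        "l2": [1.0, 2.0, 3.0, 2.0],
        "cort": [0.9, 0.8, 1.0, 0.9],
        "score_input": [-110.0, -90.0, -100.0, -100.0],
        "score_generated": [-55.0, -45.0, -50.0, -50.0],
    }
    thresholds, _, _ = index_thresholds(train)
    print(thresholds)
```

The means and standard deviations are returned alongside the thresholds because the decision-making function reuses the standard deviations to scale each index.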

Multi-Index Decision-Making Function.
We develop a decision-making function to reach the detection target. For each input signal, the function uses the pretrained model and equations (4) and (5) to obtain the four indexes. First, each index is offset by subtracting its corresponding threshold. For the offset L2-norm and the two offset fake-level scores, positive values indicate that the input is more likely to be abnormal, and their magnitude reflects how likely. For the offset CORT, negative values indicate a higher probability that the input signal is abnormal, since the less similar the generated signal is, the smaller the CORT is. Next, each index is divided by its standard deviation so that all indexes lie in a comparable range, and the sign of the CORT term is flipped so that larger values consistently point towards abnormality. Finally, we integrate all the scaled indexes into a comprehensive score, from which the tool wear condition is inferred. The multi-index decision-making algorithm is summarized in Table 2.
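The shift-and-scale procedure, which mirrors the threshold subtraction and scaling steps of Algorithm 2 (Table 2), can be sketched as follows. Summing the four scaled indexes and using a cut-off of 0 are assumptions made for illustration; the text states only that the scaled indexes are integrated into a comprehensive score. The keys and numbers in the usage lines are hypothetical.

```python
def decision_score(indexes, means, stds):
    """Combine the four indexes of one input signal into a single score.

    Each index is offset by its threshold (mean + std, or mean - std for CORT)
    and divided by its standard deviation; the CORT term is negated so that a
    positive contribution always points towards an abnormal tool condition.
    The plain sum used here is an assumed aggregation rule.
    """
    total = 0.0
    for name, value in indexes.items():
        if name == "cort":
            total += -(value - (means[name] - stds[name])) / stds[name]
        else:
            total += (value - (means[name] + stds[name])) / stds[name]
    return total


def is_abnormal(indexes, means, stds, cutoff=0.0):
    """Hypothetical rule: flag the signal as abnormal when the combined score exceeds cutoff."""
    return decision_score(indexes, means, stds) > cutoff


if __name__ == "__main__":
    means = {"l2": 2.0, "cort": 0.9, "score_input": -100.0, "score_generated": -50.0}
    stds = {"l2": 1.0, "cort": 0.1, "score_input": 10.0, "score_generated": 5.0}
    typical = dict(means)  # a signal sitting exactly at the training means
    suspicious = {"l2": 6.0, "cort": 0.2, "score_input": -60.0, "score_generated": -30.0}
    print(is_abnormal(typical, means, stds), is_abnormal(suspicious, means, stds))  # prints: False True
```

Every term is divided by its own standard deviation. Indexes with very different magnitudes, such as fake-level scores around −100 and CORT values below 1, therefore contribute on a comparable scale. This addresses the magnitude mismatch noted for earlier GAN-based detectors.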

Evaluation Metrics.
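As the current signal data are imbalanced, we use accuracy, weighted-average recall (W-recall), weighted-average precision (W-precision), and weighted-average F-measure (W-F) to evaluate the overall performance of our detection method. Denote by TP, FP, TN, and FN the numbers of true positives, false positives, true negatives, and false negatives, respectively. For each class, the recall, precision, and F-measure are calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP+FN},\qquad \mathrm{Precision} = \frac{TP}{TP+FP},\qquad F\text{-measure} = \frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}.$$
Moreover, the W-recall, W-precision, and W-F metrics are defined as follows:
$$\text{W-recall} = \frac{\sum_{i=1}^{s} c_i\,\mathrm{Recall}_i}{\sum_{i=1}^{s} c_i},\qquad \text{W-precision} = \frac{\sum_{i=1}^{s} c_i\,\mathrm{Precision}_i}{\sum_{i=1}^{s} c_i},\qquad \text{W-F} = \frac{\sum_{i=1}^{s} c_i\,F\text{-measure}_i}{\sum_{i=1}^{s} c_i},$$
where $c_i$ is the number of instances in class $i$ and $s$ is the total number of classes. $\mathrm{Recall}_i$, $\mathrm{Precision}_i$, and $F\text{-measure}_i$ are the recall, precision, and F-measure of the $i$th class, respectively.

Table 1: Algorithm 1 (inversion of input signals into noises z).
(2) for k = 1 to t do:
(3) Generate a batch of fake normal signals: $\hat{C} = G(z)$.
(4) Calculate the mean square error between $\hat{C}$ and $C$ as $L \leftarrow \frac{1}{m}\sum_{i=1}^{m}\lVert C_i - \hat{C}_i\rVert_2^2$.
(5) Update $z \leftarrow \mathrm{RMSprop}(L, \alpha)$.
(6) end for
(7) return the specific noises $z$.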
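The per-class and weighted evaluation metrics can be computed as in the following sketch. The labels and predictions in the usage lines are made-up values for illustration, not results from this study.

```python
def per_class_metrics(y_true, y_pred, label):
    """Recall, precision and F-measure of one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f


def weighted_metrics(y_true, y_pred):
    """Accuracy plus W-recall, W-precision and W-F, weighting class i by its size c_i."""
    total = len(y_true)
    w_recall = w_precision = w_f = 0.0
    for label in sorted(set(y_true)):
        c_i = sum(1 for t in y_true if t == label)  # number of instances in class i
        recall, precision, f = per_class_metrics(y_true, y_pred, label)
        w_recall += c_i * recall / total
        w_precision += c_i * precision / total
        w_f += c_i * f / total
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return accuracy, w_recall, w_precision, w_f


if __name__ == "__main__":
    y_true = [0] * 8 + [1] * 2        # 0 = normal, 1 = abnormal (made-up labels)
    y_pred = [0] * 7 + [1] + [1, 0]   # made-up predictions
    print(weighted_metrics(y_true, y_pred))
```

With weights $c_i$ and $\mathrm{Recall}_i = TP_i / c_i$, W-recall reduces to $\sum_i TP_i / \sum_i c_i$, which equals the accuracy for single-label classification.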

Generation Capability of the Generator.
To evaluate the performance of the generator, we compare its generation ability when the inputs are normal signals and when they are abnormal signals. As only normal signals are used to train the MI-GAN, the inversion readily finds noises z that reproduce normal signals. However, the pretrained generator never learns the features of abnormal signals and thus cannot produce fake abnormal signals, even when the inputs are abnormal. In other words, when the input signals are abnormal, the generated signals will be dissimilar to the input signals. Figure 5 presents the generation ability of our pretrained generator: we input two types of signals (i.e., normal and abnormal ones) and use Algorithm 1 to search for specific noises z. Since the reconstruction capability for normal and abnormal signals differs, the generated signals fall into two categories: similar signals, whose inputs are normal, and dissimilar signals, whose inputs are abnormal. Using this characteristic of the generator, one can classify different types of input signals to determine the tool wear condition.

Discrimination Ability of the Discriminator.
In this paper, instead of computing a real-or-fake probability, the discriminator outputs a fake-level score, which indicates how fake a signal is. To demonstrate the discrimination ability of the discriminator, Table 3 lists the mean and standard deviation of the fake-level scores.
According to Table 3, the mean fake-level scores of actual signals (−129.47 and −88.08) are lower than those of generated signals (−78.30 and −49.11), which indicates that actual signals look more real than generated signals. In addition, normal signals have a lower mean fake-level score (−129.47) than abnormal signals (−88.08), since the MI-GAN is trained only on normal signals and therefore captures only their features. As for the generated signals, as expected, those dissimilar to their abnormal input signals have a higher mean fake-level score (−49.11) than those similar to their normal input signals (−78.30). The standard deviation of the abnormal signals is noticeably larger than the others, since abnormal signals have more diverse modes.

Distribution of Four Indexes.
To verify the effectiveness of the four indexes, their respective distributions are plotted in Figure 6. We consider three data types, namely normal signals in the training dataset, normal signals in the testing dataset, and abnormal signals in the testing dataset (abbreviated as TraNS, TeNS, and TeAS), and present the corresponding distribution of each index. As Figure 6 shows, the four indexes of TraNS and TeNS are extremely close, whereas those of TraNS differ from those of TeAS. Therefore, the mean and standard deviation of the indexes of TraNS can be exploited to establish effective thresholds in the decision-making function and thus guarantee satisfactory detection performance.

Table 2: Multi-index decision-making algorithm.
Algorithm 2
Require: the means of the L2-norm, CORT, the fake-level score of normal signals, and the fake-level score of the generated signal, denoted as $L_m$, $\mathrm{CORT}_m$, $F_m(n)$, and $F_m(g)$; the corresponding standard deviations, denoted as $L_{std}$, $\mathrm{CORT}_{std}$, $F_{std}(n)$, and $F_{std}(g)$; input signals $C$, pretrained generator $G$, and pretrained discriminator $D$.
(1) Invert $C$ to the latent space to search for noises $z$ using Algorithm 1.
(2) Generate $\hat{C} = G(z)$.
(3) Calculate the L2-norm and CORT between $C$ and $\hat{C}$ based on (4) and (5).
(4) Compute the fake-level scores of $C$ and $\hat{C}$, i.e., $D(C)$ and $D(\hat{C})$, using the discriminator.
(5) Subtract thresholds:
(6) $\mathrm{Index}_1 = L(C, \hat{C}) - (L_m + L_{std})$
(7) $\mathrm{Index}_2 = \mathrm{CORT}(C, \hat{C}) - (\mathrm{CORT}_m - \mathrm{CORT}_{std})$
(8) $\mathrm{Index}_3 = D(C) - (F_m(n) + F_{std}(n))$
(9) $\mathrm{Index}_4 = D(\hat{C}) - (F_m(g) + F_{std}(g))$
(10) Scale the indexes:
(11) $\mathrm{Index}_1 = \mathrm{Index}_1 / L_{std}$
(12) $\mathrm{Index}_2 = -\mathrm{Index}_2 / \mathrm{CORT}_{std}$
(13) $\mathrm{Index}_3 = \mathrm{Index}_3 / F_{std}(n)$
(14) $\mathrm{Index}_4 = \mathrm{Index}_4 / F_{std}(g)$

Effectiveness of Different Indexes.
In this section, we explore the influence of different indexes on the performance of tool wear detection. For each individual index, we compute its scaled value according to lines 11-14 of Algorithm 2 and use it alone for anomaly detection. The performance of our MI-GAN is also presented for comparison. The results in Table 4 show that our MI-GAN achieves the best performance, because it integrates multiple criteria and thus possesses a more comprehensive detection capability.

Comparison with Other Machine Learning Methods.
To further demonstrate the effectiveness of the proposed MI-GAN, we compare it with other state-of-the-art machine learning methods: KNN [31], SVM [32], RF [33], CNN [14], Inception Net [34], DenseNet [35], ResNet [36], SE-Net [37], and Nonlocal-Net [38]. For a fair comparison, we reimplement these methods and tune their hyperparameters on our training dataset to obtain the best performance. The numerical results of the different methods are listed in Table 5. From these results, we can conclude that the CNN-based methods perform better than the traditional machine learning ones (i.e., KNN, SVM, and RF). Besides, the proposed MI-GAN achieves a significant improvement in accuracy, recall, precision, and F-measure over the CNN method. This can be attributed to two factors: our MI-GAN extracts more critical features, and its generation and discrimination capabilities effectively handle the data imbalance challenge.

Conclusion
In this paper, we have developed a novel MI-GAN to detect tool wear conditions from imbalanced industrial data, in which normal signals far outnumber abnormal ones. TCN residual blocks have been used as the backbone of the generator and discriminator networks to enhance feature extraction. In addition, a multi-index decision-making function has been exploited to aggregate more representative criteria, further improving detection performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.