Intelligent Deep Adversarial Network Fault Diagnosis Method Using Semisupervised Learning

In recent years, deep learning has become a popular topic in the intelligent fault diagnosis of industrial equipment. Under practical working conditions, although the collected vibration data are of large capacity, most of the vibration data are not labeled. Collecting and labeling sufficient fault data for each condition is unrealistic. Therefore, constructing a reliable fault diagnosis model with a small amount of labeled vibration data is a significant problem. In this paper, the vibration time-domain signal of the faulty bearing is transformed into a 2-dimensional image by wavelet transform to obtain the time-frequency domain information of the original data. A deep adversarial convolutional neural network based on semisupervised learning is proposed. A large amount of fake data generated by the generator and unlabeled true vibration data are used in the discriminator to learn the overall distribution of the data by judging the authenticity of the input. Three regularization terms for different loss functions are designed to constrain the parameters of the discriminator and improve the learning ability of the model. The proposed method is validated on two bearing fault diagnosis cases. The experimental results show that the proposed method has higher diagnostic accuracy than traditional deep models on multiple small datasets of different capacities. The proposed method provides a new solution to the fault diagnosis problem with large vibration data but few labels.


Introduction
In complex industrial systems, the study of advanced methods of mechanical fault diagnosis is an important part of ensuring the safety of equipment [1, 2]. Deep learning (DL) theory has become a popular topic in the field of data-driven intelligent fault diagnosis owing to its powerful modeling and characterization capabilities [3, 4]. DL-based fault diagnosis methods have a strong ability to learn fault features from raw data. However, a large amount of labeled data is required for training a DL model [5, 6]. Under practical working conditions, although the collected vibration data are of large capacity, most of the vibration data are not labeled. Collecting and labeling sufficient fault data for each condition is unrealistic [7, 8].
To solve the fault diagnosis problem of insufficient labeled data, several studies on fault diagnosis with limited data have been conducted. Hang et al. applied principal component analysis (PCA) to high-dimensional imbalanced fault diagnosis data [9]. Duan et al. applied a new support vector data description method to mechanical fault diagnosis on unbalanced datasets [10]. Many scholars have introduced the Generative Adversarial Network (GAN) [11] to generate sufficient vibration data for classifiers. Wang et al. proposed a novel fault diagnosis method that combined a GAN and a Stacked Automatic Encoder (SAE) [12]. The samples produced by the generator were used, along with the original samples, to expand the sample size of the input. Xie and Zhang proposed a deep convolutional GAN model that simulated the original distribution of several fault categories and solved the imbalance of training data by generating new data [13]. Li et al. proposed a cross-domain fault diagnosis method based on a deep generative neural network, which provided reliable cross-domain diagnosis results by artificially generating fake samples for domain adaptation [14]. Mao et al. used the FFT to preprocess the original vibration signal, obtained the spectrum data of fault samples as the input of a GAN, and generated fake samples for the minority fault classes [15]. Then, a stacked denoising autoencoder model for fault diagnosis was established and verified in a series of comparison experiments.
From the above, GANs are used to generate the required target data for establishing effective mapping relationships of fault features [16, 17]. However, in practical industrial environments, the fault features in the raw vibration signal are complicated [18, 19]. The data generated by a GAN can only fit the most significant features of the fault signal and can hardly match the original signal fully in actual situations. Moreover, the fault modes of vibration data vary under different working conditions. If the vibration data under one condition are used to establish the mapping relationships of the different fault features, the robustness of the GAN model is unsatisfactory. Therefore, it is difficult to obtain high fault classification accuracy under variable working conditions.
Considering that labeled fault data under variable working conditions are difficult to collect, this paper proposes a semisupervised learning deep adversarial convolutional neural network model (SACNN). We no longer use GANs to generate target data but directly train high-precision diagnostic models with few labeled data and large amounts of unlabeled data. The main insights and contributions of this study are summarized as follows: (1) A semisupervised learning deep convolutional network is proposed for fault diagnosis with insufficient labeled data. The semisupervised GAN is composed of a generator and a variant discriminator, which are constructed by multiple groups of convolutional network modules. By defining true and fake labels for the unlabeled data and the generated data, respectively, to train the discriminator, the model can obtain the fault features of the overall sample distribution.
(2) Three regularization terms, one-sided label smoothing, an L2 weight penalty, and a feature-matching term, are defined in the loss functions of the model to constrain the parameters of the discriminator, which improves the classification accuracy and stability of the model.
(3) A series of experiments is conducted on the CWRU bearing dataset and a lab-built bearing experimental platform. Compared with three other popular methods, the advantages of the model on small labeled datasets are verified. The rest of the paper is organized as follows. Section 2 introduces the preliminaries of SACNN. Section 3 details the proposed SACNN model, including the problem description, data preprocessing, model structure, and optimization objectives. In Section 4, experimental verification and corresponding analysis are conducted. The conclusions are drawn in Section 5.

Generative Adversarial Networks.
The generative adversarial network (GAN) is a generative model that can randomly generate observation data. A GAN consists of two subnetworks: a generator and a discriminator, as shown in Figure 1. The noise z_noise drawn from a prior distribution is the input of the generator, which hopes to generate data G(z_noise) that look true. The discriminator is used to determine whether the input data are generated data or a real sample. If the input is real data, the output of the discriminator is expected to be 1, i.e., D(x) = 1; otherwise, D(x) = 0. Through this training process, the discriminator learns the characteristics of the real data. The loss function of the discriminator is expressed as

L_D = -E_{x~P_data}[log D(x)] - E_{z~P_z}[log(1 - D(G(z)))],   (1)

where the first part is used to identify real data and the second part is used to identify fake data. The loss function of the generator is

L_G = E_{z~P_z}[log(1 - D(G(z)))].   (2)

A GAN is usually expressed in the form of a zero-sum game, using the loss function V(D, G) to represent the loss of the GAN, where G wants to minimize V while D wants to maximize V:

min_G max_D V(D, G) = E_{x~P_data}[log D(x)] + E_{z~P_z}[log(1 - D(G(z)))].   (3)

The parameters of the two networks are updated by alternating gradient descent. The parameters of the generator are fixed first, and the discriminator is trained using the generated and the true data. Then the parameters of the discriminator are fixed and the generator's parameters are updated. The generator can finally generate realistic data through repeated alternating training.
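As an illustrative sketch (not the paper's implementation), the discriminator and generator losses described above can be evaluated in plain numpy for hypothetical discriminator outputs; a discriminator that separates real from fake samples attains a lower loss than one the generator has fooled:

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator loss: -E[log D(x)] - E[log(1 - D(G(z)))]
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def g_loss(d_fake):
    # Generator loss: E[log(1 - D(G(z)))], minimized as D(G(z)) -> 1
    return np.mean(np.log(1.0 - d_fake))

# A discriminator that separates real from fake has a low loss ...
sharp = d_loss(np.array([0.99]), np.array([0.01]))
# ... while one that outputs 0.5 everywhere (fully fooled) has a higher loss.
fooled = d_loss(np.array([0.5]), np.array([0.5]))
print(sharp < fooled)  # True
```

The generator side of the zero-sum game is visible in `g_loss`: it decreases as the discriminator's output on generated data approaches 1, which is exactly what the alternating training drives toward.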

Proposed Method
3.1. Problem Description. A GAN can realize the expansion of small samples, but the generated data can hardly match the actual situation. Semisupervised learning can add unlabeled data to a supervised classification algorithm. Although unlabeled data do not provide labels, they provide information about the overall distribution of the dataset, which yields a better classification surface for the model. Therefore, following the idea of semisupervised learning, the intelligent fault diagnosis problem based on a semisupervised learning deep adversarial network is formally defined as follows: (1) There is a faulty bearing dataset including two parts. One part is a labeled vibration fault dataset X_L = {(x_i^L, y_i^L)}_{i=1}^{n_L} with n_L samples, and the other is an unlabeled vibration fault dataset X_U = {x_i^U}_{i=1}^{n_U} with n_U samples, where n_L ≪ n_U.
(2) X_L is used for supervised training and X_U is used for assisted training to improve the fault classification accuracy of the model. (3) The GAN is used in the semisupervised learning setting; it is still responsible for generating images from the input noise data. So we define the data generated by the generator as X_z = {x_i^z}_{i=1}^{n_z}. All data labels in X_U are defined as true and those in X_z as fake.
(4) The discriminator D is not a simple true-fake (binary) classifier. Assuming that the input data have k classes, D is a (k + 1)-class classifier, and the (k + 1)-th class discriminates whether the input is true or fake. The overview of the problem description is shown in Figure 2.
The output of the discriminator network D is the estimated probability that the input data come from the data-generating distribution. Traditionally, the discriminator is a feedforward network ending with a single sigmoid unit, whose output is [True, Fake]. In this paper, however, we modify the discriminator network to end with a Softmax output layer.
Thus, the discriminator network has k + 1 output units, corresponding to [1-class, 2-class, . . ., k-class, True/Fake]. In this case, the labeled data can be input into the discriminator for training. The discriminator can not only distinguish between true and fake but also output the fault category.

Data Preprocessing.
To obtain the time-frequency domain information of the original vibration signal, we used the wavelet transform (WT) [21] to transform the one-dimensional time-domain vibration signal into a 2-dimensional image, as shown in Figure 3. This transformation enables localized analysis of the time-frequency domain and progressive multiscale refinement of the signal. As a result, the time domain is subdivided at the high frequencies of the signal and the frequency domain is subdivided at the low frequencies.
Thus, the WT can focus on any detail of the signal and automatically adapt to the analysis requirements of time-frequency signals. The continuous WT is given by

WT_f(a, τ) = (1/√a) ∫ f(t) ψ*((t − τ)/a) dt,   (4)

where a is the scaling factor of the wavelet function; τ is the translation factor of the wavelet function; f(t) is the raw vibration signal; and ψ(t) is the wavelet basis function used in this paper, the Morlet wavelet, expressed as

ψ(t) = e^{−t²/2} cos(5t).   (5)

In this paper, we selected 512 time-domain data points of the original signal for each wavelet transformation. The wavelet center frequency was 0.8125, and the transformation results were converted into black-and-white images with a size of 64 × 64, which were input into the model.
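The preprocessing step described above, 512 time-domain points mapped to a 64 × 64 grayscale time-frequency image, can be sketched with a minimal, dependency-free Morlet CWT in numpy. This is an illustrative reimplementation, not the authors' code, and the scale grid is an assumption; in practice a wavelet library would typically be used:

```python
import numpy as np

def morlet(t):
    # Real Morlet mother wavelet (center frequency ~0.8125 at unit scale)
    return np.exp(-t**2 / 2.0) * np.cos(5.0 * t)

def cwt_image(signal, n_scales=64, out_size=64):
    """Continuous WT of a 1-D signal -> (out_size, out_size) image in [0, 1]."""
    n = len(signal)
    t = np.arange(-n // 2, n // 2)
    coeffs = np.empty((n_scales, n))
    for i, a in enumerate(np.linspace(1.0, n / 8.0, n_scales)):  # assumed scale grid
        # Scaled, normalized wavelet: (1/sqrt(a)) * psi(t / a)
        w = morlet(t / a) / np.sqrt(a)
        coeffs[i] = np.convolve(signal, w, mode="same")
    img = np.abs(coeffs)
    # Subsample the time axis to out_size columns and rescale to [0, 1]
    img = img[:, :: n // out_size][:, :out_size]
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

sig = np.sin(2 * np.pi * 0.05 * np.arange(512))  # 512 points, as in the paper
img = cwt_image(sig)
print(img.shape)  # (64, 64)
```

Each row of `img` corresponds to one scale (frequency band) and each column to a time position, so the rows of a fault signal's image encode exactly the localized time-frequency content fed to the network.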

Model Structure of SACNN.
The proposed model consists of two parts: a generator and a discriminator. Generator G receives an n-dimensional noise vector as input and represents the mapping from the noise space to the space of real images; θ_g denotes the parameters of the generator, and Z is the output data. The network structure of the generator starts with a fully connected layer that maps the original input to a 1024-dimensional vector, followed by a Batch Normalization (BN) layer, a ReLU activation layer, and finally deconvolution layers. A deconvolution module is expressed as

h_g = ReLU(BN(W_g ⊗ z + b_g)),   (6)

where ⊗ is the deconvolution operation, W_g and b_g are the deconvolution kernel and the bias, and h_g is the result of the deconvolution module. The generator is composed of four such deconvolution modules. The size of the four deconvolution kernels is 5 × 5, and the numbers of deconvolution kernels are 8, 16, 32, and 64, successively. The BN layers are used to speed up convergence and improve generalization performance. The structure of the discriminator consists of three convolution modules. Each convolution module is a 2D convolution operation, followed by a Max-Pooling layer, a BN layer, and a Leaky ReLU activation layer. The convolution module is expressed as

h_d = LeakyReLU(BN(MaxPool(W_d ∗ x + b_d))),   (7)

where ∗ is the convolution operation, W_d and b_d are the convolution kernel and the bias, and h_d is the result of the convolution module. The sizes of the three convolution kernels are 5 × 5, 3 × 3, and 3 × 3, and the numbers of convolution kernels are 64, 128, and 256, respectively.
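A quick shape walk-through helps check that the stacked modules connect consistently. The paper does not state the strides or the reshape of the 1024-dimensional vector explicitly, so the sketch below assumes stride-2 deconvolutions, 2 × 2 max pooling, and a 4 × 4 × 64 reshape (4 · 4 · 64 = 1024):

```python
def deconv_out(size, stride=2):
    return size * stride      # stride-2 transposed conv doubles H and W

def conv_pool_out(size, pool=2):
    return size // pool       # 'same' conv + 2x2 max pool halves H and W

# Generator: 1024-d dense output reshaped to 4x4, then four deconv modules
g = 4
for n_kernels in (8, 16, 32, 64):   # kernel counts from the paper
    g = deconv_out(g)
print("generator output:", g)       # 64 -> a 64x64 image

# Discriminator: three conv modules (64, 128, 256 kernels) on a 64x64 input
d = 64
for n_kernels in (64, 128, 256):
    d = conv_pool_out(d)
print("discriminator feature map:", d)  # 8 -> an 8x8 map before the softmax head
```

Under these assumptions, the generator output matches the 64 × 64 wavelet images exactly, and the discriminator's final feature map is what the (k + 1)-class Softmax head consumes.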

[Figure: network structures of the Generator and the Discriminator.]

Mathematical Problems in Engineering
For identifying multiple fault classes, we enhanced the discriminator into a standard classifier based on the previous work of Salimans et al. [22]. We use the Softmax function to replace the output of the discriminator. Suppose that the random vector z has a uniform noise distribution P_z(z) and G(z) maps it to the data space of real images. Assuming that there is a distribution P_data(x, y) over the training dataset, the input x of the discriminator is a real or fake image with the label y. The discriminator output is a (k + 1)-dimensional vector of logits l = [l_1, l_2, . . ., l_{k+1}], which is converted into a (k + 1)-dimensional vector of class probabilities P = [p_1, p_2, . . ., p_{k+1}] by the Softmax function. The probability p_j of the j-th class is expressed as

p_j = exp(l_j) / Σ_{i=1}^{k+1} exp(l_i).   (8)

Then, the real data are assigned to one of the first k classes, and the fake data are assigned to the (k + 1)-th class. The flow chart of the proposed model is shown in Figure 3.
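The (k + 1)-way Softmax head can be sketched as follows with hypothetical logits (the values below are made up for illustration); the last output unit plays the role of the "fake" class, and one minus its probability is the probability that the input is real:

```python
import numpy as np

def softmax(l):
    e = np.exp(l - l.max())   # subtract max for numerical stability
    return e / e.sum()

k = 4                                      # e.g., four fault classes
l = np.array([2.0, 0.5, 0.1, -1.0, -3.0])  # hypothetical k+1 discriminator logits
p = softmax(l)                             # p_j = exp(l_j) / sum_i exp(l_i)

p_fake = p[k]                              # probability of the (k+1)-th "fake" class
p_real = 1.0 - p_fake                      # probability that the input is real
fault_class = int(np.argmax(p[:k]))        # predicted fault class among the first k
print(fault_class, round(p_real, 3))
```

This is how one network simultaneously performs true-fake discrimination (via `p_fake`) and fault classification (via the argmax over the first k probabilities).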

Optimization Objective.
In the process of training, we input labeled data and unlabeled data into the model at the same time. We divide the input data into two parts according to semisupervised learning. The first part is the labeled data, and the second part is the data generated by the generator together with the unlabeled data. When the input is labeled data, the training is supervised and the discriminator needs to identify the specific fault category. When the input is true unlabeled data or fake data generated by the generator, the training is unsupervised, and the discriminator only needs to complete true-fake discrimination. The training of the whole model needs to optimize three kinds of loss functions. For the labeled data in the training set, the model needs to calculate the corresponding probability of the fault class; the classification loss L_label is

L_label = −E_{(x,y)~P_data} [log p_model(y | x, y < k + 1)].   (9)

For the unlabeled data in the training set, by calculating the probability of not being estimated as the (k + 1)-th class, the loss L_unlabel discriminating the truth and fakeness of the input data is

L_unlabel = −E_{x~P_data} [log(1 − p_model(y = k + 1 | x))].   (10)

For the fake data generated by the generator, by calculating the probability of being estimated as the (k + 1)-th class, the loss L_fake discriminating the truth and fakeness of the input data is

L_fake = −E_{x~G} [log p_model(y = k + 1 | x)],   (11)

where p_model(y = k + 1 | x) represents the probability that the discriminator output is fake, G represents the probability distribution of the fake data produced by the generator, and P_data represents the probability distribution of the real data.
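The three losses can be sketched directly from the (k + 1)-class probabilities. The following numpy illustration (an assumption-laden sketch, not the paper's TensorFlow code) evaluates them on one hypothetical logit row that confidently says "fault class 0, real":

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

k = 4  # fault classes; index k is the extra "fake" class

def loss_label(logits, y):
    # Supervised: -E[log p_model(y | x)] for the true fault class y < k+1
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def loss_unlabel(logits):
    # Unlabeled real data: -E[log(1 - p_model(y = k+1 | x))]
    p = softmax(logits)
    return -np.mean(np.log(1.0 - p[:, k]))

def loss_fake(logits):
    # Generated data: -E[log p_model(y = k+1 | G(z))]
    p = softmax(logits)
    return -np.mean(np.log(p[:, k]))

logits = np.array([[3.0, 0.0, 0.0, 0.0, -2.0]])  # confident "class 0, real"
print(loss_label(logits, np.array([0])))  # small: correct class
print(loss_unlabel(logits))               # small: input judged real
print(loss_fake(logits))                  # large: this output would be penalized
                                          # if the input were a generator sample
```

Note how the same forward pass feeds all three losses; only the origin of the batch (labeled, unlabeled, or generated) decides which loss applies.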
When calculating all the loss functions, the one-sided label smoothing method is applied to each data label in order to encourage the discriminator to estimate flexible probabilities rather than predicting classes with extreme confidence. If the discriminator relies too heavily on a certain set of features to detect real data, the generator can quickly mimic these features to fool the discriminator. Then the discriminator would be easily exploited by the generator, seriously damaging classification accuracy. To avoid this problem, we smooth the targets of the discriminator whenever its prediction for real data exceeds 0.9 (D(real data) > 0.9): the target label value is set to 0.9 instead of 1.0.
When the model input is real unlabeled data, it is sufficient to maximize the probability that the output is real data without requiring a specific class, that is, to maximize log p_model(y < k + 1 | x). As the training data consist of two parts, generated fake data and real vibration data, the loss function of the discriminator D consists of two parts: the supervised learning loss and the unsupervised learning loss:

L_D = L_label + L_unlabel + L_fake + λ D_regular,   (12)

where D_regular represents the L2 regularization of all weights in the discriminator and λ is its weight. For the generator, we hope the output data can fool the classifier. Thus, the loss function of the generator L_G is

L_G = −E_{z~P_z} [log(1 − p_model(y = k + 1 | G(z)))] + G_feature_matching,   (13)

where G_feature_matching is a regularization term proposed by Salimans et al. [22] to improve the stability of the model during training. The fake data and real data are input to the discriminator to match the fault characteristics of the middle layer, so that the activations of each layer are as similar as possible. Suppose that f(x) represents the activation of a middle layer of the discriminator and G(z) is the output of the generator. G_feature_matching is defined to match the fault features between the training data distribution and the generated fake data:

G_feature_matching = ‖E_{x~P_data} f(x) − E_{z~P_z} f(G(z))‖²₂.   (14)

After the design of the two objective functions, the backpropagation algorithm can be used to train the model parameters. In this paper, the adaptive moment estimation (Adam) algorithm is used to update the parameters. The algorithm provides independent adaptive learning rates for different parameters by calculating the first-order and second-order moment estimates of the gradient. Based on the aforementioned equations, the parameters θ_d and θ_g are updated as

θ_d ← Adam(∇_{θ_d} L_D; α, β_1, β_2),  θ_g ← Adam(∇_{θ_g} L_G; α, β_1, β_2),   (15)

where α, β_1, and β_2 are the Adam optimizer parameters and θ_d and θ_g are the discriminator and generator network parameters. The algorithm pseudocode is shown in Algorithm 1.
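The feature-matching term compares batch means of middle-layer activations. A small numpy sketch (with randomly generated stand-in activations, purely for illustration) shows that the term is small when the generated features share the real features' distribution and large otherwise:

```python
import numpy as np

def feature_matching(f_real, f_fake):
    # || E_x f(x) - E_z f(G(z)) ||_2^2 on middle-layer activations
    return np.sum((f_real.mean(axis=0) - f_fake.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
f_real = rng.normal(0.0, 1.0, size=(128, 256))   # real-data activations (stand-in)
f_close = rng.normal(0.0, 1.0, size=(128, 256))  # fake activations, same distribution
f_far = rng.normal(3.0, 1.0, size=(128, 256))    # fake activations, shifted distribution

print(feature_matching(f_real, f_close) < feature_matching(f_real, f_far))  # True
```

Because the generator is penalized on statistics of the discriminator's internal features rather than its final output alone, this term discourages the mode-chasing behavior that one-sided label smoothing also guards against.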

Case Study
In this section, the performance of our proposed SACNN model is evaluated on the CWRU bearing dataset [20] and the dataset from a lab-built experimental platform. Figure 5 shows the conversion of four different fault vibration signals into 2-dimensional images. It can be seen from Figure 5 that the vibration signals of the different fault types have distinctly different distributions.

Case Study I: Fault Diagnosis of the CWRU Bearing Dataset
Based on the above fault data, we set up training and testing datasets with different capacities to verify the proposed model, as shown in Table 1. Each of the datasets contained four faults with different fault diameters. The training set contained a small amount of labeled data and a large amount of unlabeled data. The labeled data were randomly selected from the vibration data of the different fault types.

Experimental Results and Analysis.
The fault diagnosis results of the proposed method on four different training datasets are shown in Figure 6 and compared with three popular methods, CNN, CNN_SVM, and SAE; the details of the three methods are shown in Table 2. The convolutional neural network (CNN) is widely used in fault diagnosis due to its advantages of local perception and parameter sharing. In this paper, we constructed the CNN model with a three-layer two-dimensional convolutional network, and Softmax is used as the classifier. The support vector machine (SVM) classifier has good performance and generalization with few samples.
Thus, we chose the CNN based on a Support Vector Machine (CNN_SVM) as another contrast method, which uses a three-layer two-dimensional convolutional network as the feature extractor and a support vector machine as the classifier. The Stacked Automatic Encoder (SAE) initializes the network parameters through unsupervised layer-wise pretraining, so as to improve the convergence speed of the deep network and alleviate the influence of gradient vanishing. We constructed the SAE network with two autoencoders and used Softmax as the classifier. The experimental results are the average results of multiple experiments. The DL framework TensorFlow is used for the implementation of each method, and all methods are optimized using the Adam algorithm.

Algorithm 1: SACNN training algorithm.
Require: number of total iterations I; number of discriminator steps t; batch size m; Adam optimizer parameters α, β_1, β_2; number of fault categories k; labeled-data control parameter flag.
For I epochs do
  For t steps do
    If t < flag then
      Draw m labeled real samples (x^(1), y^(1)), . . ., (x^(m), y^(m)) from P_data(X)
    Else
      Draw m real samples x^(1), . . ., x^(m) from P_data(X)
      Draw m noise samples z^(1), . . ., z^(m) from P_g(Z); x_noise ← G(z)
    End if
    Update the discriminator by ascending its stochastic gradient
  End for
  Draw m noise samples z^(1), . . ., z^(m) from P_g(Z); x_noise ← G(z)
  Update the generator by descending its stochastic gradient
End for

Figure 6 shows the classification accuracy of the four methods on datasets nos. 1, 2, 3, and 4, where "50 samples" denotes 50 labeled data in the training dataset, and likewise for 100, 150, and 200 samples. The CNN model has the same 3 convolutional layers as the discriminator of the SACNN model. Because only a limited amount of labeled data is used for supervised training and no unlabeled data are used for auxiliary training, the accuracy of the CNN model is the lowest among the four models. Its diagnosis accuracy on 50 labeled samples is only 63.35%, while its diagnosis accuracy on 200 labeled samples is 88.34%.
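The alternating schedule in Algorithm 1 can be made concrete with a minimal Python skeleton. The gradient computations are deliberately omitted; the sketch only counts updates to show how the flag parameter interleaves supervised and unsupervised discriminator steps with one generator step per epoch:

```python
# Minimal skeleton of Algorithm 1's alternating schedule (bookkeeping only;
# the actual loss evaluations and Adam updates are omitted).
I, t_steps, flag = 3, 2, 1          # epochs, discriminator steps per epoch, flag
d_labeled = d_unlabeled = g_updates = 0

for epoch in range(I):
    for t in range(t_steps):
        if t < flag:
            d_labeled += 1          # supervised discriminator update on (x, y)
        else:
            d_unlabeled += 1        # unsupervised update on real x and fake G(z)
    g_updates += 1                  # one generator update per epoch

print(d_labeled, d_unlabeled, g_updates)  # 3 3 3
```

With flag = 1 and t = 2, each epoch performs one supervised and one unsupervised discriminator update before the generator moves, which matches the alternating gradient scheme described in Section 2.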
Similarly, the CNN_SVM model also contains three convolutional layers. Due to the enhanced constraints on the network parameters, the classification accuracy of CNN_SVM on the four datasets is higher than that of the CNN model. The diagnosis accuracy of CNN_SVM on 50 labeled samples was 87.26%, while the diagnosis accuracy on 200 labeled samples was 95.61%. The SAE model contains three hidden layers, and each hidden layer contains 400 neural network cells. The SAE method only uses labeled data for supervised training. The diagnosis accuracy of SAE on the four datasets is higher than that of CNN and CNN_SVM. Because each layer of the SAE network is trained separately, which is equivalent to initializing the parameters to reasonable values, the network is easier to train and achieves faster convergence and higher accuracy.
Since the proposed SACNN model is trained not only on a few labeled samples but also on a large number of unlabeled samples, and the model parameters are constrained by the three regularization terms in the losses, its diagnostic accuracy on the four datasets is the highest among the four models. Since the SACNN method performs well on all four datasets, we chose one case, dataset no. 3, to discuss the details of the model performance, as shown in Figure 7. Figure 7(a) shows the relationship between test accuracy and the number of iterations during training, and Figures 7(b) and 7(c) show the training losses of the generator and the discriminator during the training process. From the three figures, we can see the fast convergence and high precision of the SACNN model. Furthermore, in order to observe the impact of different numbers of labeled samples on the SACNN model, the diagnosis accuracy on the four datasets with different capacities is shown in Figure 8. It can be concluded that the fault diagnosis results improve as the number of labeled training samples increases. These results are reasonable. Even with only 50 labeled samples in the training set, the diagnosis accuracy of the model can still reach 98.49%, which strongly proves the effectiveness of the model with large vibration data but few labels.

Case Study II: Fault Diagnosis of the Lab-Built Experimental Platform

Data Description
Bearing Dataset of the Lab-Built Experimental Platform. In order to further study the performance of the proposed method under practical working conditions, we built an experimental platform in our lab, as shown in Figure 9. A three-phase motor controls the speed of the bearing through a flexible coupling. A radial load is applied to the bearing block to simulate real working conditions, and vibration signals are collected by an acceleration sensor on the bearing block. The bearings cover four health conditions: Ball fault (BF), Inner ring fault (IF), and Outer ring fault (OF), as well as Normal bearing (N). The sampling frequency is 12.8 kHz and the rotating speed is 1500 r/min. We selected the original vibration signals of the four types of bearings, as shown in Figure 10.
Comparing the time-domain vibration signals in Figures 5 and 10, it can be seen that the data distributions of the two datasets are quite different. Because the speed, load, and other conditions we set on the experimental platform are consistent with actual working conditions, the collected vibration data can be considered close to actual working conditions and can be used to test the performance of the method under complex practical conditions.
We set up training and testing datasets with different capacities to validate the proposed model, as shown in Table 3. The training set contains a small amount of labeled data and a large amount of unlabeled data. The labeled data are randomly selected from the vibration data of the four fault types. Each fault type contains 1400 pieces of unlabeled training data and 600 pieces of testing data. All data have been transformed into 2-dimensional images by the wavelet transform from 512 sample points.

Experimental Results and Analysis.
The diagnosis accuracy on the different datasets is shown in Table 4, compared with the three other models. The results are the average accuracy of ten experiments on each dataset. It can be seen that the diagnostic accuracy of the proposed model on the four datasets is the highest among the four models. For instance, the accuracy of our proposed model reaches 96.91% on 50 labeled samples, while that of the CNN model is 57.36%, that of the CNN_SVM model is 82.33%, and that of the SAE model is 78.39%.
Moreover, we also find that the diagnosis accuracy of the four models on the datasets of the lab-built experimental platform is lower than that on the CWRU bearing dataset. We attribute this to the fact that, compared with the standard CWRU dataset, the operating condition of the lab-built experimental platform is closer to the actual working condition, so the vibration signals collected from the faulty bearings are more complex and the fault characteristics are harder for the model to learn. Thereby, the diagnosis accuracy is lower than on the CWRU dataset.
In order to illustrate the diagnosis results of each fault type in detail, we draw a confusion matrix for the test accuracy on CWRU dataset no. 3. Figure 11 shows the classification accuracy of the model for each fault at different training stages. The model has different recognition accuracy for different fault classes. In Figure 11(a), after training for 20 epochs, some Inner ring faults are misclassified as Outer ring faults, and some Outer ring faults are misclassified as Normal and Inner ring faults. Therefore, the accuracy of the Outer and Inner ring faults using the SACNN model is 62% and 64%, respectively.
Besides, as the number of iterations increases, the classification accuracy for each type of fault improves. It can be seen from Figure 11(c) that the diagnosis accuracy of Outer ring faults is nearly 100% after training for 300 epochs, while it is 88% after 100 epochs and 62% after 20 epochs.
In order to intuitively judge the quality of the features learned by the proposed method, we visualize the fault features trained on dataset no. 3, as shown in Figure 12. For different epochs, we use the t-SNE [23] method to reduce the high-dimensional fault features obtained from the layer before the classifier to 2 dimensions. We describe their characteristics qualitatively: the feature distribution points of different categories should be separated as much as possible, which indicates that the learned fault features are more separable and the classification accuracy is higher. As can be seen from Figure 12(a), the feature points of the original vibration signal are completely interleaved and cannot be directly used for classification. As training proceeds, the overlaps between the feature clusters of different classes decrease, and the distinction between the fault feature points of different classes becomes increasingly apparent. Therefore, the features learned during the training process are highly discriminative, and the model has satisfactory diagnosis performance.
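The visualization idea, project the features from the layer before the classifier to 2 dimensions and check how separated the class clusters are, can be sketched without any plotting library. The paper uses t-SNE; the dependency-free sketch below substitutes a PCA projection (via SVD) on synthetic stand-in features, which suffices to show the separability check:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for the features before the classifier:
# four fault classes, 50 samples each, with class-dependent means.
features = np.vstack([rng.normal(c * 4.0, 1.0, size=(50, 256)) for c in range(4)])
labels = np.repeat(np.arange(4), 50)

# Dependency-free 2-D projection via PCA (the paper uses t-SNE instead).
x = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(x, full_matrices=False)
proj = x @ vt[:2].T                      # (200, 2) points to scatter-plot per class

# Rough separability check: between-class spread vs. within-class spread
centers = np.array([proj[labels == c].mean(axis=0) for c in range(4)])
within = np.mean([proj[labels == c].std() for c in range(4)])
between = centers.std()
print(between > within)                  # well-separated clusters -> True
```

In the paper's Figure 12, the same qualitative criterion applies: as training progresses, the between-class spread of the projected features grows relative to the within-class spread.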

Conclusion
In this study, a fault diagnosis method based on a semisupervised learning deep adversarial network model is proposed to solve the problem of insufficient labeled fault data under practical working conditions. Based on semisupervised learning, a small amount of labeled data is used for supervised training. In unsupervised training combined with the GAN, a large amount of unlabeled data is added to help model training. The network structure and the optimization losses of the model are designed. We demonstrate the performance of the proposed method on two different datasets, compared with three other common models. The evolution of the features learned during the whole training process is further analyzed by visualization to verify the superiority of the method. Despite the promising results, the method in this paper can effectively identify known faults, but the algorithm cannot be generalized to new fault categories to achieve cross-category fault diagnosis. In future work, we will attempt to combine few-shot learning to study cross-category fault diagnosis.
Data Availability

The data used in this paper include two parts. The first part is the public bearing dataset from the Case Western Reserve University Bearing Data Center, and the second part is the bearing data collected on the experimental platform set up by our laboratory.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.