A Generalizable Sample Resolution Augmentation Method for Mechanical Fault Diagnosis Based on ESPCN

Data augmentation has become a hot topic in the field of intelligent mechanical fault diagnosis. It can expand a limited training dataset by generating simulated samples, but there is still no effective method for augmenting the resolution of low resolution samples. In this paper, a simple algorithm, namely, the efficient subpixel convolutional neural network (ESPCN), is proposed to address this deficiency. The ESPCN model learns four-channel feature maps from the raw low resolution data and rearranges them through the subpixel layer, so that the sample resolution is increased to four times that of the raw low resolution sample. The generated high resolution dataset is then employed to train stacked autoencoders (SAE) for fault classification, and the raw high resolution dataset is used for testing. Two fault diagnosis cases with different sample dimensions and rotating speeds are set up to simulate the low resolution situation, and the experimental results verify the feasibility of the proposed algorithm.


Introduction
Mechanical fault diagnosis has entered the age of artificial intelligence as technology rapidly advances [1,2]. Meanwhile, the development of intelligent fault diagnosis cannot be separated from the support of sufficient measured vibration signals. When a part of a rotating machine has a local defect, a pulse of short duration is generated, and the vibration signal shows the fault feature of amplitude modulation [3,4]. The vibration signal measured from the surface of the machinery is composed of contributions from many vibrating parts, such as the rotation of bearings and the meshing of gears [5]. Thus, vibration signal analysis is a useful technique for mechanical fault diagnosis.
However, insufficient samples are common in practical application scenarios. Therefore, many researchers focus on using deep learning algorithms such as generative adversarial networks (GAN) [6] to increase the number of raw samples. Zhou et al. [7] proposed a global optimization scheme to enhance the raw GAN so that it generates more discriminative fault samples. Shao et al. [8] input fault time-domain data with different label types to a GAN to generate 1D simulated signal samples and then fed them, together with the raw signals, to a convolutional neural network (CNN) to realize data augmentation and fault classification. Wang et al. [9] used a conditional GAN (CGAN) to simulate effective fault features automatically from the fault signal for data augmentation and selected stacked autoencoders (SAE) [10] for accurate fault classification.
Furthermore, the quality of the data is also particularly important. The higher the sample resolution is, the easier it is to detect the smallest change in the measured object [11]. High resolution is also a requirement for the establishment of digital twins. More data points collected per unit time facilitate analyzing the internal characteristics of the measured object and realizing accurate diagnosis of the machinery [12-14]. Thus, high resolution samples are generally employed in the study of fault diagnosis [15,16]. However, the aforementioned methods can only increase the volume of the dataset and cannot augment the sample resolution.
For a set of mechanical devices rotating at high speed, it is difficult to collect enough feature information with a signal collector that has a low sampling frequency. The higher the sample resolution is, the more convenient it is to find the fault of the equipment, but high resolution samples often cannot be acquired because of the limitations of the signal acquisition equipment. In addition, research on resolution augmentation in the fault diagnosis field is scarce. By contrast, resolution augmentation is a common tool for recreating high resolution images in the field of image processing [17]. Thus, it is worth considering resolution augmentation technology for the low resolution signals of rotating machinery. Superresolution convolutional neural network (SRCNN) [18], superresolution generative adversarial network (SRGAN) [19], deep reconstruction classification networks (DRCN) [20], and efficient subpixel convolutional neural network (ESPCN) [21] are common resolution augmentation algorithms. For example, in SRCNN, the low resolution image is first magnified to the target size by bicubic interpolation, and a CNN model is then used to realize the nonlinear mapping. Next, the low resolution image can be expanded and reconstructed by upsampling interpolation. In this paper, ESPCN is presented to augment the signal resolution of a single sample. Furthermore, the accuracy of fault classification is tested by the SAE network to evaluate the performance of data augmentation.
The contributions of this study can be summarized as follows:
(1) The proposed ESPCN model can learn features from a low resolution sample and enhance the sample resolution to four times that of the raw signal.
(2) The generated high resolution dataset is employed to train the SAE model for fault classification, and the raw high resolution dataset is used for testing.
(3) Two experimental cases (different sample dimensions and rotating speeds) are set up to simulate the low resolution situation and verify the effectiveness of the proposed method.
The remainder of this paper is structured as follows: Section 2 details the theoretical background of ESPCN and SAE. In Section 3, the fault diagnosis framework based on ESPCN is described. Two diagnosis cases with gear and bearing datasets are set up in Section 4. Section 5 gives the conclusion.

Theoretical Background
Figure 1 shows that a subpixel convolutional layer and several convolutional layers constitute the ESPCN network. The raw image serves as the input to the network, and low resolution feature maps of the same size are output through the convolutional layers. Then, the subpixel convolution layer is adopted to sequentially arrange the low resolution hidden layer features into a group of high resolution images. Following the formulation in [21], with $\phi$ denoting the activation function, the first $L-1$ layers of the convolutional neural network are as follows:
$$f^{1}(I^{LR}; W_1, b_1) = \phi(W_1 * I^{LR} + b_1),$$
$$f^{l}(I^{LR}; W_{1:l}, b_{1:l}) = \phi(W_l * f^{l-1}(I^{LR}) + b_l),$$
where $W_l$ and $b_l$, $l \in (1, L-1)$, are the learnable weights and biases, respectively. $W_l$ is a 2D convolutional tensor with the shape of $n_{l-1} \times n_l \times k_l \times k_l$, where $n_l$ is the number of features at layer $l$ and $k_l$ is the convolutional kernel size at layer $l$, and the bias $b_l$ is a vector of length $n_l$. After the convolutional layers, feature maps with $r^2$ channels are obtained and then sent to the subpixel convolution layer for upsampling.
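The rearrangement that the subpixel convolution layer applies to such $r^2$-channel feature maps can be illustrated with a minimal NumPy sketch. The function name `periodic_shuffle`, the channel-last layout, the channel ordering, and the toy input are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def periodic_shuffle(t, r):
    """Rearrange an (H, W, C*r*r) tensor into an (r*H, r*W, C) tensor."""
    h, w, cr2 = t.shape
    c = cr2 // (r * r)
    if c * r * r != cr2:
        raise ValueError("channel count must be divisible by r*r")
    # Assumed ordering: channel index = C*r*(y mod r) + C*(x mod r) + c
    t = t.reshape(h, w, r, r, c)       # axes: (h, w, y_mod, x_mod, c)
    t = t.transpose(0, 3, 1, 2, 4)     # axes: (h, x_mod, w, y_mod, c)
    return t.reshape(h * r, w * r, c)  # x = h*r + x_mod, y = w*r + y_mod

# Hypothetical input: a 2 x 3 feature map with C = 1 and r = 2 (4 channels)
features = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
upscaled = periodic_shuffle(features, r=2)
print(upscaled.shape)  # (4, 6, 1)
```

No values are created or discarded by this step: each group of $r^2$ channel values at one low resolution position is spread over an $r \times r$ block of high resolution positions.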
Following [21], the output of the subpixel convolution layer is
$$I^{SR} = f^{L}(I^{LR}) = PS(W_L * f^{L-1}(I^{LR}) + b_L),$$
where PS is a periodic shuffling operator that rearranges the elements of a real-valued tensor $T$ with shape $H \times W \times C \cdot r^2$ into shape $rH \times rW \times C$, and $H$ and $W$ are the height and width of the tensor, respectively. $I^{LR}$ and $I^{SR}$ have $C$ color channels. PS sequentially arranges the low resolution features into a group of high resolution images and can be defined as follows:
$$PS(T)_{x,y,c} = T_{\lfloor x/r \rfloor,\ \lfloor y/r \rfloor,\ C \cdot r \cdot \mathrm{mod}(y,r) + C \cdot \mathrm{mod}(x,r) + c},$$
where $x$ and $y$ are the output pixel coordinates in the high resolution space. The pixel-wise mean squared error (MSE) of the reconstruction is used as the cost function to train the network, whose mathematical expression can be written as follows:
$$\ell(W_{1:L}, b_{1:L}) = \frac{1}{r^2 HW} \sum_{x=1}^{rH} \sum_{y=1}^{rW} \left(I^{HR}_{x,y} - f^{L}_{x,y}(I^{LR})\right)^2,$$
where $I^{HR}$ is taken from the high resolution image examples $I_n^{HR}$ ($n = 1 \cdots N$), and $I^{LR}$ is taken from the corresponding low resolution image examples $I_n^{LR}$ ($n = 1 \cdots N$).

Stacked Autoencoders. Autoencoder (AE) [22] is adopted for feature extraction and sample reconstruction. As the building block of the SAE, it has a feedforward neural network structure [23]. The basic architecture of an AE consists of an encoder section and a decoder section. The encoder compresses the input data into latent space features, whereas the decoder reconstructs the input from the latent space representation.
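As a rough illustration of the encoder and decoder roles in an autoencoder, the following NumPy sketch runs one forward pass of a single AE and computes its mean squared reconstruction error. The sigmoid activation, the tied decoder weights, the layer sizes, the random placeholder data, and all names in the code are assumptions chosen for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(x, W, b, d):
    """One autoencoder pass with tied weights (decoder uses W transposed)."""
    h = sigmoid(W @ x + b)        # encoder: input -> latent features
    x_hat = sigmoid(W.T @ h + d)  # decoder: latent features -> reconstruction
    return h, x_hat

def reconstruction_mse(X, W, b, d):
    """Mean squared reconstruction error over the columns (samples) of X."""
    _, X_hat = ae_forward(X, W, b[:, None], d[:, None])
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=0)))

rng = np.random.default_rng(0)
m, k, N = 300, 50, 8                    # hypothetical input size, latent size, sample count
X = rng.random((m, N))                  # placeholder data in [0, 1), not real signals
W = rng.normal(scale=0.1, size=(k, m))  # encoder weight matrix
b = np.zeros(k)                         # encoder bias
d = np.zeros(m)                         # decoder bias
H, X_hat = ae_forward(X, W, b[:, None], d[:, None])
print(H.shape, X_hat.shape)             # (50, 8) (300, 8)
print(reconstruction_mse(X, W, b, d))   # error of the untrained AE
```

Training would adjust `W`, `b`, and `d` to reduce this error; the sketch only evaluates it for random initial parameters.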
Assuming $\{x_n\}_{n=1}^{N}$ is an unlabeled dataset where $x_n \in \mathbb{R}^{m \times 1}$, the process of the encoder can be depicted as follows:
$$h_n = f_{\theta}(x_n) = s_f(W x_n + b),$$
where $h_n$ is the hidden encoder vector mapped from $x_n$ by the encoding function $f_{\theta}$, $s_f$ denotes the activation function, $b$ denotes the bias vector, and $W$ denotes the weight matrix. $g_{\theta'}$ is the decoding function that maps $h_n$ from the low-dimensional feature back into the high-dimensional feature. The process of the decoder can be defined as follows:
$$\hat{x}_n = g_{\theta'}(h_n) = s_g(W^T h_n + d),$$
where $\hat{x}_n$ is the reconstruction of $x_n$, the activation function $s_g$ is the same as $s_f$, and $d$ and $W^T$ are the bias vector and weight matrix, respectively. MSE is adopted to minimize the reconstruction error:
$$\min_{\theta, \theta'} \frac{1}{N} \sum_{n=1}^{N} \lVert x_n - \hat{x}_n \rVert^2,$$
where the parameter sets of the AE are $\theta = \{W, b\}$ and $\theta' = \{W^T, d\}$. Figure 2 depicts how the SAE stacks autoencoders layer by layer to construct a deep neural network (DNN); that is, the hidden layer of the first AE is taken as the input of the second AE. Feedforward layer-wise learning is employed for network training, softmax regression is adopted as the classifier, and the back propagation (BP) algorithm is used for weight updating and parameter fine-tuning.

Proposed Framework
Figure 3 shows the structure of the proposed ESPCN and SAE for signal resolution augmentation and fault classification. In the process of signal resolution augmentation, the hidden layers of ESPCN are expressed by the following:
where the first two layers have 64 and 32 channels, respectively [24]. The scaled exponential linear unit (SELU) [25] is selected as the activation function $\phi(\cdot)$, which automatically normalizes the sample distribution to zero mean and unit variance and thus ensures that the gradient does not explode or vanish during network training. Then, the features of the low resolution sample are learned through the hidden layers, and the subpixel layer, which is composed of a fully connected layer and a periodic shuffling operator, is adopted to realize the resolution augmentation. The fully connected layer outputs $r^2$ ($r = 2$) channel feature maps with the same dimension as the input data, and then the generated data are obtained according to the following formula:
where $X^{SR}$ is a high resolution sample generated by the proposed network, and the PS function is adopted to enhance the resolution by four times, whose form can be defined by the following:
The final loss function is the MSE, which measures the difference between the raw sample and the generated low resolution features as follows:
In the process of fault classification, the generated high resolution dataset is input to the SAE directly for feature extraction and fault classification, so as to achieve model training. Then, the raw high resolution dataset is used for testing.

Experiment Results and Analysis
Figure 4 shows the bearing fault test rig from Shandong University of Science and Technology (SDUST) [3], which is set up to explore the performance of ESPCN in resolution augmentation. The platform contains an electric motor, a load disc, a bearing seat, a gearbox, and a powder brake. The bearing type is the N205EU cylindrical roller bearing. As shown in Figure 5, three fault types, including inner race fault (IF), outer race fault (OF), and roller fault (RF), are introduced to the bearing. A total of 800 (200 × 4) samples from 4 health states are extracted to form the required dataset.
The motor bearing vibration signal is collected with the LMS data acquisition instrument, the sensor is a vibration acceleration sensor as shown in Figure 4, and the sampling frequency is 25.6 kHz. The sensor is installed on the surface of the bearing seat. The motor speed is set to 3000 r/min.
In this section, 600 is set as the dimension of a low resolution sample, and 300 Fourier coefficients are obtained after FFT. Then, 2400 is set as the dimension of a high resolution sample. [26] is adopted at the classifier, and error back propagation [27] is used to fine-tune the model. Batch normalization (BN) [28] is applied before each hidden layer of the SAE. The raw low resolution dataset is input into the ESPCN network, and the generated high resolution dataset is used as the training set of the SAE. Finally, the raw high resolution dataset is used for testing.
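The sample dimensions in this case can be checked with a short NumPy sketch. It assumes that the Fourier coefficients are the single-sided amplitude spectrum (the first half of the FFT), so that a 2400-point sample would yield 1200 coefficients, and that the four channel feature maps are interleaved point by point. Both are assumptions for illustration, as are the function names and the random placeholder signals.

```python
import numpy as np

def half_spectrum(sample):
    """Single-sided amplitude spectrum: the first N/2 FFT magnitudes."""
    n = sample.shape[-1]
    return np.abs(np.fft.fft(sample, axis=-1))[..., : n // 2]

def shuffle_1d(channels):
    """Interleave (r2, L) channel feature maps into one sequence of length r2*L."""
    # Output point r2*i + k is taken from channel k at position i (assumed ordering).
    return channels.T.reshape(-1)

rng = np.random.default_rng(0)
low_res = rng.normal(size=600)     # placeholder for a 600-point low resolution sample
high_res = rng.normal(size=2400)   # placeholder for a 2400-point high resolution sample

low_spec = half_spectrum(low_res)
high_spec = half_spectrum(high_res)
print(low_spec.shape, high_spec.shape)   # (300,) (1200,)

# Four hypothetical channel feature maps with the same dimension as the input
feature_maps = rng.normal(size=(4, low_spec.size))
generated = shuffle_1d(feature_maps)
print(generated.shape)                   # (1200,)
```

Under these assumptions, the generated sample has the same length as the spectrum of a raw high resolution sample, which is what allows the SAE to be trained on generated data and tested on raw data.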

Diagnosis Results. Figure 6 displays the spectra of three data types (low resolution spectra, generated high resolution spectra, and raw high resolution spectra). Distinguishing different fault types directly from the spectra is difficult, so the classification network based on SAE must be used to extract features and distinguish fault types. On the one hand, the higher the sampling frequency is, the better the training effect of the SAE network, since the low resolution sample includes only one revolution of the bearing signal and therefore contains far fewer features. On the other hand, the feature distribution of the generated high
For comparative analysis, the low and high resolution datasets are also employed for fault classification. The diagnosis results are shown in Figure 7. Each experiment is repeated for 15 trials to reduce the effect of randomness. It can be seen that the result from the low resolution dataset is the poorest, with a testing accuracy of 95.76% and a standard deviation of 0.25%. In contrast, the data generated by the ESPCN achieve a higher average accuracy of 98.76 ± 0.46%. Since ESPCN enhances the resolution of the low resolution dataset by 4 times, each sample contains many more effective features, which helps the classification network identify samples with different health conditions. Besides, the raw high resolution data achieve the highest accuracy of 99.96 ± 0.06%, since they retain the raw and abundant information of each fault type.
To map the high-dimensional feature vectors learned by the SAE into 2D feature vectors, t-distributed stochastic neighbor embedding (t-SNE) [29] is applied to show the diagnosis effect of the three kinds of datasets. In Figure 8(a), the clustering result of the low resolution data is the worst: several samples are mixed with one another, and various levels of misclassification are observed between different fault types. In Figure 8(b), the clustering result of the ESPCN is much better than that of the low resolution data, and almost all the samples under different health conditions are well separated. In addition, the clustering result of the high resolution dataset in Figure 8(c) is the best, and samples of the same health condition aggregate much more compactly. Moreover, the feature learning process of the ESPCN network is displayed in Figure 9. Figure 9(a) displays the features of 64 channels learned from the low resolution samples. Different colors in each channel represent the current feature strength. The features of these channels are combined and extracted into one feature value, and then the feature map of Figure 9(b) is obtained. The differentiation among the channels increases as the network deepens, and the learned features become more and more apparent. Finally, four-channel low-resolution simulation samples are obtained as shown in Figure 9(c).

Case 2: Fault Diagnosis of Bearing under Speed Fluctuation Condition. Figure 10 shows the gearbox fault test bench, which contains an electric motor, a coupling, a gearbox, and bearing seats. Figure 11 shows 10 different health conditions: normal condition (NC), sun wheel crack (WC), sun wheel pit (WP), sun wheel worn (WW), pinion crack (PC), pinion pit (PP), pinion worn (PW), and three compound fault types (WWPW, WPPC, and WPPW). The sampling frequency is 12.8 kHz, and the sensor is installed on the surface of the gearbox. The rotating speed is controlled by a frequency converter as shown in Figure 10, and the speed varies randomly between 700 r/min and 1500 r/min. Because the sampling frequency is fixed, the sample resolution becomes lower as the speed increases: fewer sample points are collected per revolution, and less feature information is captured.
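The effect of speed on sample resolution can be made concrete with a small calculation. Under the assumption that resolution is measured as the number of sampled points per shaft revolution (the function name is illustrative), the fixed 12.8 kHz sampling frequency gives:

```python
def points_per_revolution(sampling_hz, speed_rpm):
    """Number of sampled points collected during one shaft revolution."""
    revolutions_per_second = speed_rpm / 60.0
    return sampling_hz / revolutions_per_second

fs = 12_800.0  # sampling frequency in Hz (12.8 kHz)
for rpm in (700, 1500):
    print(rpm, round(points_per_revolution(fs, rpm), 1))
# 700 r/min  -> about 1097.1 points per revolution
# 1500 r/min -> 512.0 points per revolution
```

At the highest speed, a 600-point sample therefore spans only slightly more than one revolution, whereas at the lowest speed it covers roughly half a revolution at a finer angular spacing.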
The model parameters of ESPCN are the same as those in case 1. The low resolution and high resolution datasets have 200 samples, and each sample contains 600 and 2400 sample points, respectively. Three gear samples are randomly displayed in Figures 12(a)-12(c). The collected signals of different fault categories exhibit different speed fluctuations in the time domain. Figure 12(d) shows that the irregular rotation rate fluctuation curves of three fault categories vary from 700 to 1500 r/min. Figure 13 shows the spectra of three different data types. The low resolution spectra in Figure 13(a) again carry the least feature information. Figure 13(b) illustrates the spectra generated by the proposed method based on the low resolution data, and Figure 13(c) displays the raw high resolution spectra. It is easy to see that the generated high resolution spectra display the same feature trend as the raw high resolution spectra, which is helpful for accurate fault diagnosis of the gearbox. The diagnosis results of the above three data types are displayed in Figure 14. The high resolution dataset achieves the highest accuracy of 99.53 ± 0.23%. In comparison, the low resolution dataset obtains the lowest accuracy of 94.56 ± 0.45%. The diagnosis accuracy of the ESPCN lies between the two and is slightly lower than that of the high resolution dataset: the average accuracy is 98.23 ± 0.67%. In addition, t-SNE is also adopted to visualize the result of dimension reduction. Figure 15(a) shows that the dimension-reduced samples of WC and WP are mixed with each other and that samples of the other types are not well clustered either. Figure 15(b) shows that the clustering result of the ESPCN is better than that of the low resolution dataset. In addition, the high resolution dataset again performs the best, and all the samples under different health states are separated from each other.

Conclusions
In this paper, a generalizable deep learning framework based on ESPCN is proposed to improve the resolution of bearing and gearbox signals. First, four-channel features are mapped from the low resolution sample. Then, the high resolution sample is output from the subpixel fully connected layer. As a result, the sample resolution is augmented by four times by using the ESPCN network. In the case studies, the classification results illustrate that the samples generated by the ESPCN model are effective and can obtain a high diagnosis accuracy. Although the proposed method can improve the diagnosis accuracy of the low resolution dataset, the theoretical basis of the high resolution feature learning is not explicit. Moreover, it is of interest to generate high resolution samples directly from the raw time domain signal. The authors will investigate this topic in a future study.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.