Bearing Defect Classification Algorithm Based on Autoencoder Neural Network

The postproduction defect classification and detection of bearings still relies on manual inspection, which is time-consuming and tedious. To address this, we propose a bearing defect classification network based on an autoencoder to enhance the efficiency and accuracy of bearing defect detection. An improved autoencoder performs dimensionality-reducing feature extraction, reducing large-scale images to small-scale ones through encoder dimensionality reduction. Defect classification is completed by feeding the extracted features into a convolutional classification network. Comparative experiments show that the neural network can effectively complete feature selection and substantially improve classification accuracy while avoiding the laborious algorithm design of the conventional method.


Introduction
Bearing quality is related to the overall performance of a machine, affecting its stable operation and indirectly affecting the quality of its output. Bearing surface defects are a key factor in the life cycle of bearings based on the internet of things [1][2][3][4][5]. Bearing manufacturers attach great importance to the quality of their products and generally inspect them before they leave the factory. A bearing's outer ring surface is more prone to defects than other parts in the assembly and production process, and it has a great impact on a machine's performance. Therefore, bearing inspection focuses on the outer ring surface. Bearing factories presently rely on manual sampling inspection (Figure 1). This method is inefficient, and not all outputs can be tested, which may lead to overlooked defects. Also, due to the influence of experience, working state, and fatigue of inspectors, detection standards cannot be unified. Therefore, it is necessary to design automatic detection equipment.
With high speed, nondestructive characteristics, low noise, and automation capabilities, machine vision systems have found a wide range of applications in product defect inspection.
This process is well suited to bearing manufacturing. Inspection requires knowledge of whether defects exist and the types of defects. The probability distribution of types of defects should be determined to address production problems and make improvements.

Related Work
Various defect-classification algorithms have been developed. The features used as raw data for classification have a direct effect on the results, and the performance of these algorithms depends largely on the form of the data and the choice of features.
Belkin [6] proposed a manifold dimensionality reduction method based on the conversion of high-dimensional initial samples to low-dimensional manifold structures to reduce the sample dimensionality and facilitate sample visualization. Sooraksa et al. [7] aimed to detect defects on an air bearing surface. They combined block matrix-based image segmentation technology with a neural network to recognize defects. Çelik and Dülger [8] detected and classified four commonly occurring defects in fabrics (lack of warp yarn, weft yarn defect, yarn hole contamination, and yarn flow) using an algorithm based on a wavelet transform, image morphology analysis, and binary operations. The defect-classification algorithm was based on a grayscale cooccurrence matrix and feedforward neural networks. It achieved defect-detection accuracy as high as 93.4% and defect-classification accuracy of 96.3%.
Original feature data extracted in image processing are often high-dimensional and contain considerable redundant information. Processing the original data directly requires substantial computing resources, so it is often necessary to reduce its dimensionality. Principal component analysis (PCA) is a commonly used unsupervised dimensionality reduction algorithm. Minka [9] proposed Bayesian model selection to estimate the true dimension of data when determining the number of principal components retained by the PCA algorithm and applied the Laplace method after selecting the appropriate parameters to solve the integration problem on the Stiefel manifold. This led to better results and a higher processing speed compared to a cross-validation algorithm. The choice of classification algorithm has a huge impact on classification results. Mien Van and Hee-Jun Kang [10] proposed a wavelet-kernel local linear Fisher discriminant analysis algorithm (WKLFDA) using a wavelet-kernel function for linear Fisher discriminant analysis (LFDA). Particle swarm optimization was used to automatically select the parameters of WKLFDA and to convert the multi-classification problem into binary classification using a one-versus-one strategy. The features of each binary classification process after dimensionality reduction were fed into a single support vector machine (SVM), whose results were combined with a decision fusion mechanism to determine the condition of a bearing. The classification efficiency of this algorithm was better than that of other algorithms, and its classification accuracy could reach 98.80%. Zapata et al. [11] used 12 geometric characteristics to represent the shapes and orientations of weld seam defects and proposed an artificial neural network (ANN) and an adaptive neurofuzzy inference system (ANFIS) to classify defects.
Through experiments, the correlation coefficient and trust matrix of the ANN and ANFIS were determined, and respective classification accuracies of 78.9% and 82.6% were achieved.
There are many classic network models. AlexNet [12] used dropout to mitigate overfitting. GoogLeNet used 1 × 1 convolutions, proposed the Inception structure, and used average pooling instead of full connection, which allowed a much deeper network. It was the champion of ILSVRC detection and classification in 2014. The VGG network [13] used a structure of convolution, pooling, and full connection in which the convolution kernels of all convolution layers were 3 × 3, which greatly improved its expressive ability. ResNet [14] introduced a residual structure, which made it easier to optimize a deep network and allowed it to be deepened further.
The concept of a neural network was proposed in the 1940s, when the McCulloch-Pitts (MP) neuron model and the Hebb learning rule first appeared. Rosenblatt [15] proposed a perceptron model and laid the foundation for neural networks. Rumelhart [16] proposed a backpropagation algorithm (BP) to solve the problem that perceptron models could only have single layers. Based on the BP algorithm, LeCun [17] proposed a convolutional neural network suitable for deep learning, and Hinton [18] proposed a deep belief network that used a hierarchical initialization method to train deep network parameters, solved the bottleneck of the BP algorithm, and made it possible to increase the number of network layers. Thus, the concept of deep learning was born. A neural network can directly take inputs and output a classification result. We apply neural network technology to classify bearing defects.

Equilibration and Enhancement of Data.
Based on defects that actually occur in bearing production, we collected three types of representative defect samples (abrasions, bruises, and overgrinding defects) on model 6204 deep groove ball bearings, as shown in Figure 2. Of these samples, 308 were abrasions, 240 were bruises, and 247 were overgrinding defects. A network trained on a dataset with significantly different numbers of samples in each category would recognize the categories with inconsistent ability. Thus, we balanced the samples to 240 per category using undersampling, for a total of 720 samples. This number is small for training a deep neural network, and more samples usually give better training results. We therefore increased the number of samples sixfold to 4320 by adding to each defect sample its rotations by 90°, 180°, and 270° and its horizontal and vertical mirror images. Of these samples, 1296 were used as the testing set and the remaining 3024 were used as the training set.
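The sixfold augmentation can be illustrated with a minimal sketch in plain Python. It assumes that each sample is kept together with its three rotations and two mirror images, which is the reading consistent with 720 × 6 = 4320; the helper names and the tiny 2 × 2 stand-in image are hypothetical and are not taken from the original implementation.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def augment(img):
    """Return the original plus three rotations and two mirror images."""
    r90 = rotate90(img)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    original = [list(row) for row in img]
    return [original, r90, r180, r270, flip_horizontal(img), flip_vertical(img)]


# Hypothetical 2 x 2 stand-in for a defect image.
sample = [[1, 2], [3, 4]]
variants = augment(sample)

# Dataset sizes reported in the text.
balanced_total = 240 * 3                        # 720 samples after undersampling
augmented_total = balanced_total * len(variants)  # 4320 samples
train_size = augmented_total - 1296             # 3024 training samples
```

The rotation direction is not stated in the text; clockwise is used here only for concreteness, and the set of six variants is the same either way.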

Preprocessing of Samples.
In training a network, the speed of training and robustness of the model are enhanced by using samples of the same size, so it is necessary to normalize the sample size. The size range of the original samples was 50-600, with most samples concentrated between 100 and 300. If the image sizes were decreased too much, some feature information could be lost. If too much of the sample size was retained, then redundant information would increase the computational cost. Considering these two factors, we normalized the sample sizes to 112 × 112 by linear interpolation.
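As an illustration of the size normalization, the following sketch resizes a grayscale image to 112 × 112 by bilinear interpolation in plain Python. The pixel-centre coordinate mapping and the clamping at the image border are assumptions made for this sketch, since the text only states that linear interpolation was used, and the function name is hypothetical.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre back into input coordinates (assumed convention).
        y = (i + 0.5) * in_h / out_h - 0.5
        y = min(max(y, 0.0), in_h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = (j + 0.5) * in_w / out_w - 0.5
            x = min(max(x, 0.0), in_w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 150 x 200 constant image normalized to 112 x 112.
image = [[5.0] * 200 for _ in range(150)]
normalized = resize_bilinear(image, 112, 112)
```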
Three typical filters (Gaussian, median, and mean) were compared experimentally, as shown in Figure 3. Figure 3(a) shows the original image, where brighter areas correspond to defects. There are many irregularly distributed white spots caused by small impurities on the bearing surface or noise generated by electromagnetic interference during signal transmission. The templates of the Gaussian, mean, and median filters were all 7 × 7. The Gaussian and mean filters could smooth out the noise to a certain extent but not completely. The Gaussian filter used different weighting coefficients and achieved better results than the mean filter. Neither of these two filters could preserve the defect textures, and blurred textures can cause an image to lose features that are helpful in classification. The median filter could completely remove noise regions of a certain size and reduce the area of larger noise areas, while retaining the texture characteristics of defects. A comprehensive comparison of the effects of the three filtering algorithms showed that the median filter should be selected for denoising.
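The behaviour of the median filter can be sketched in plain Python with the 7 × 7 template mentioned above. Border handling by edge replication is an assumption of this sketch, and the two test images (an isolated white spot and a sharp vertical edge) are hypothetical examples rather than bearing data.

```python
from statistics import median


def median_filter(img, size=7):
    """Apply a size x size median filter with edge-replicated borders (assumed)."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Collect the neighbourhood, clamping indices at the image border.
            window = [
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
            ]
            out[i][j] = median(window)
    return out


# An isolated white spot on a dark background is removed completely.
speck = [[0] * 9 for _ in range(9)]
speck[4][4] = 255
speck_filtered = median_filter(speck)

# A sharp vertical edge (columns 0-4 dark, 5-9 bright) is left unchanged.
edge = [[0] * 5 + [100] * 5 for _ in range(10)]
edge_filtered = median_filter(edge)
```

A mean filter applied to the same edge image would produce intermediate gray values around the boundary, which is the blurring effect described in the comparison.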

Bearing Defect Classification Network Based on Convolutional Autoencoder
An autoencoder [19] is a three-layered unsupervised neural network consisting of input, hidden, and output layers. The input and hidden layers constitute the encoder, and the hidden and output layers constitute the decoder. The network structure is shown in Figure 4. The functional relationship between the input $x$ and output $y$ of the encoding process can be expressed as $y = S_1(W_1 x + b_1)$, where $S_1$ is the encoder activation function and $W_1$ and $b_1$ are the weight and bias, respectively, of an encoder neuron. The decoding process can be expressed as $\hat{x} = S_2(W_2 y + b_2)$, where $\hat{x}$ is the reconstruction of the input, $S_2$ is the decoder activation function, and $W_2$ and $b_2$ are the weight and bias, respectively, of a decoder neuron. When $y$ has a smaller dimension than $x$, the autoencoder can be used for dimensionality-reduction feature extraction. The training of the autoencoder generally aims to minimize the reconstruction error, and the loss function is generally chosen as the mean squared error between the reconstruction and the input. This loss function effectively quantifies the difference between the output and the input, and its derivative is easy to obtain.
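The encoding, decoding, and mean-squared-error steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the network used in this paper: a sigmoid is assumed for both activation functions, the small weight values are made up, the loss is taken as the mean of the squared differences, and the helper names (affine, encode, decode, mse, mse_grad) are hypothetical.

```python
import math


def sigmoid(v):
    """Element-wise logistic activation (assumed for both S1 and S2)."""
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def affine(W, x, b):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(x, W1, b1):
    """Encoder: y = S1(W1 x + b1)."""
    return sigmoid(affine(W1, x, b1))


def decode(y, W2, b2):
    """Decoder: reconstruction = S2(W2 y + b2)."""
    return sigmoid(affine(W2, y, b2))


def mse(x_hat, x):
    """Mean squared error between reconstruction and input."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def mse_grad(x_hat, x):
    """Derivative of the mean squared error with respect to the reconstruction."""
    return [2.0 * (a - b) / len(x) for a, b in zip(x_hat, x)]


# Made-up 4-dimensional input compressed to a 2-dimensional code.
x = [0.2, 0.9, 0.4, 0.7]
W1 = [[0.1, -0.2, 0.3, 0.05], [-0.3, 0.2, 0.1, 0.4]]
b1 = [0.0, 0.1]
W2 = [[0.2, -0.1], [0.4, 0.3], [-0.2, 0.1], [0.05, -0.3]]
b2 = [0.0, 0.0, 0.1, -0.1]

y = encode(x, W1, b1)        # low-dimensional features
x_hat = decode(y, W2, b2)    # reconstruction of the input
loss = mse(x_hat, x)
grad = mse_grad(x_hat, x)    # starting point for backpropagation
```

The gradient with respect to the reconstruction is the quantity that backpropagation then carries through the decoder and encoder weights.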

Improved Convolutional Autoencoder Bearing Defect Classification Network.
Figure 5 shows the network training process of the model structure designed in this paper. The bearing defect samples were photographs, but conventional fully connected autoencoders are not suitable for image processing. Through the sharing of weights, a convolutional autoencoder can greatly reduce the number of parameters. We adopted a convolutional autoencoder to extract features from the defect samples. The convolution process is shown in Figure 6, and the network layer parameters are listed in Table 1.
Based on the characteristics of the bearing defect samples, we made the following improvements to the autoencoder structure.
(1) Elimination of Pooling. In a conventional convolutional network, convolution is followed by pooling to reduce the dimension of the output and the size of the feature map. Figure 7 shows maximum pooling with a sliding step of 2 × 2. After pooling, the size of the feature map is reduced by half. To improve the stability of the autoencoder and reduce the number of network layers, the convolution stride was set to two to replace the original pooling operation. The network automatically learns an appropriate sampling function during training.
(2) Leaky ReLU Activation Function. Rectified linear unit (ReLU) activation functions are used extensively in neural networks. Their derivative on the positive semiaxis is constant, which facilitates gradient propagation. However, some information may be lost when the input data to ReLU [12][13][20][21][22] is less than zero. To retain such input data, we use the leaky ReLU activation function, which modifies the ReLU function on the portion that is less than zero. Leaky ReLU is expressed as $f(x) = x$ for $x > 0$ and $f(x) = ax$ for $x \le 0$, where $a$ is a small nonzero parameter that can be learned by the backpropagation algorithm. Figure 8 shows the function and its derivative. This function avoids the problem of neurons not being activated during backpropagation.
After dimensionality reduction and feature extraction by the autoencoder, the data are sent to the classification network for classification. The pooling operation is removed, and the network structure parameters are modified to meet the classification requirements of this project. The structure parameters of the classification network are shown in Table 2. The kernel size of the first layer of the network is 1 × 1. The number of network parameters can be greatly reduced by compressing the dimensionality of the encoder output from 28 × 28 × 6 to 28 × 28 × 1.
Overfitting is prevented by applying dropout to the fully connected fifth layer. The last layer contains three neurons. Leaky ReLU is the activation function of both the convolutional and fully connected layers, and a softmax function is used as the activation function of the last layer. The loss function is the cross-entropy.
(3) Addition of Fully Connected Network Layer. In a traditional convolutional autoencoder, the encoder and decoder network layers are convolution operations, and a convolution layer generally requires multiple convolution kernels, which causes the encoder output feature dimension to be too high. The decoder and encoder are connected by deconvolution, which makes it difficult to control the encoder output dimension. We improve the network structure of the autoencoder. After the convolution operation of the second layer of the encoder, we use a fully connected layer to realize symmetry between the decoder and encoder. This solves the problem of controlling the encoder output dimension, and it connects the convolution neurons so that the extracted features are more representative. Figure 9 shows the network structure of the improved convolutional autoencoder. The network layer structure and parameters of the encoder are as follows:
(1) The first layer is the encoder input layer. The input is a single-channel gray image of dimension (112, 112).
(2) The second layer is a convolution layer with two convolution kernels of size (3, 3), a sliding step of (2, 2), and "same" padding. The output is a (112/2) × (112/2) = 56 × 56 feature map, and the activation function is leaky ReLU.
(3) The third layer is a convolution layer with four convolution kernels of size (3, 3), a sliding step of (2, 2), and "same" padding. The output dimension is (56/2) × (56/2) = 28 × 28, and the activation function is leaky ReLU.
(4) The fourth layer is a flatten operation. The output of the previous network layer is flattened into one dimension, and the output dimension is 3136.
(5) The fifth layer is a fully connected layer with 50 neurons. Its output is the encoder output, and the output dimension is 50. No activation function is used.
C denotes the convolutional layer and F denotes the fully-connected layer.
The network layer structure and parameters of the decoder are as follows:
(1) The first layer is the decoder input layer, with an input dimension of 50.
(2) The second layer has 3136 neurons, and its output dimension is 3136.
(3) The third layer takes input data of dimension 3136 and reshapes it to a feature map of dimension (28, 28, 4).
(4) The fourth layer is a deconvolution layer with four convolution kernels of size (3, 3) and a sliding step of (2, 2). The output dimension is (56, 56, 4), and the activation function is leaky ReLU.
(5) The fifth layer is a deconvolution layer with two convolution kernels of size (3, 3) and a sliding step of (2, 2). The output dimension is (112, 112, 2), and the activation function is leaky ReLU.
(6) The sixth layer is a deconvolution layer with one convolution kernel of size (3, 3) and a sliding step of (1, 1). The output dimension is (112, 112, 1), and there is no activation function.
(7) The seventh layer is the activation layer. The sigmoid activation function is used to normalize the output value range of the decoder to (0, 1), and the output dimension is (112, 112, 1).
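The layer dimensions listed above, together with the leaky ReLU activation, can be checked with a short sketch in plain Python. It assumes the TensorFlow/Keras convention in which a strided convolution with "same" padding outputs ceil(n / stride) pixels per side and a transposed convolution with "same" padding outputs n × stride pixels per side. The fixed slope a = 0.01 is a placeholder, since the text describes a as a learned parameter, and all function names are hypothetical.

```python
import math


def leaky_relu(x, a=0.01):
    """Leaky ReLU: identity for positive inputs, slope a otherwise."""
    return x if x > 0 else a * x


def leaky_relu_grad(x, a=0.01):
    """Derivative of leaky ReLU; it never vanishes for negative inputs."""
    return 1.0 if x > 0 else a


def same_conv_out(size, stride):
    """Output side length of a convolution with 'same' padding (assumed convention)."""
    return math.ceil(size / stride)


def trace_encoder(h=112, w=112):
    """Return the output shape of every encoder layer."""
    shapes = [(h, w, 1)]                      # layer 1: gray input image
    channels = 1
    for kernels in (2, 4):                    # layers 2 and 3: stride-2 convolutions
        h, w = same_conv_out(h, 2), same_conv_out(w, 2)
        channels = kernels
        shapes.append((h, w, channels))
    shapes.append((h * w * channels,))        # layer 4: flatten
    shapes.append((50,))                      # layer 5: fully connected, 50 neurons
    return shapes


def trace_decoder():
    """Return the output shape of decoder layers 1 to 6."""
    shapes = [(50,), (3136,), (28, 28, 4)]    # input, fully connected, reshape
    h, w = 28, 28
    for kernels, stride in ((4, 2), (2, 2), (1, 1)):
        h, w = h * stride, w * stride         # transposed convolution, 'same' padding
        shapes.append((h, w, kernels))
    return shapes


encoder_shapes = trace_encoder()
decoder_shapes = trace_decoder()
```

Replacing pooling with a stride of two halves each side of the feature map in the same way a 2 × 2 pooling layer would, which is why the flattened third-layer output has 28 × 28 × 4 = 3136 elements.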

Experiment
The experiments were run on a 64-bit Windows 10 operating system with an AMD 2600X processor, an Nvidia 1060 graphics card with 6 GB of memory, and 16 GB of RAM. The programming language was Python, and the deep learning framework was Keras 2.3.1 with a TensorFlow 1.13.2 backend.

Training of Autoencoder.
First, the autoencoder was trained. The weights were initialized using a truncated normal distribution with a mean of 0 and a standard deviation of 0.1. The batch size was 64, and training ran for 1500 iterations. Figure 10 shows the reconstruction results of the autoencoder. The first row contains the original defect samples, the second row contains the samples reconstructed by the autoencoder, and each column corresponds to a different defect sample. The results show that the autoencoder could successfully restore the overall structure and detailed local texture of the input samples, so the encoder efficiently extracted the features of the samples.
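The weight initialization can be sketched as follows in plain Python. The sketch assumes the TensorFlow convention for a truncated normal distribution, in which draws more than two standard deviations from the mean are discarded and redrawn; the function name, the fixed seed, and the choice of 1000 weights are hypothetical.

```python
import random


def truncated_normal(n, mean=0.0, std=0.1, rng=None):
    """Draw n values from a normal distribution truncated at two standard deviations."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    values = []
    while len(values) < n:
        v = rng.gauss(mean, std)
        # Reject and redraw samples outside mean +/- 2 * std (assumed convention).
        if abs(v - mean) <= 2.0 * std:
            values.append(v)
    return values


# Hypothetical example: 1000 initial weights, all within [-0.2, 0.2].
weights = truncated_normal(1000)
```

Truncation keeps every initial weight small, so no neuron starts far from the near-linear region of its activation function.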

Training of Classification Network.
The classification network was trained and optimized using the Adam algorithm. The weights were initialized using a truncated normal distribution with a mean of 0 and a standard deviation of 0.1, and the biases were initialized to 0. The batch size was 64, and the dropout rate was 0.5. Figure 11 shows the changes in accuracy on the training and test sets as the number of iterations increased. The accuracies of the test and training sets did not differ significantly over the entire iteration process, and they remained within the acceptable range. In the first 1000 iterations, the accuracy increased rapidly, indicating that the optimizer found the correct direction for convergence at this time. The automatic increase of the learning rate by the optimizer facilitated rapid convergence. After 1000 iterations, the loss gradually approached the minimum value, and the optimizer switched to a slower learning rate to prevent the training from skipping over the optimal solution. The final accuracy of the model was 98.74% on the training set and 98.13% on the testing set.
Figure 9: Network structure of the autoencoder.
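How the Adam algorithm adapts its effective step size can be illustrated with a single-parameter sketch in plain Python. The hyperparameters (learning rate, beta1 = 0.9, beta2 = 0.999, and epsilon = 1e-8) are commonly used defaults and are assumptions here, since the text does not report them, and the quadratic objective is a toy stand-in for the cross-entropy loss.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update for a scalar parameter and return the new state."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = theta^2 with gradient 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

Because the update divides the averaged gradient by the square root of the averaged squared gradient, the step is close to the full learning rate while gradients point consistently in one direction and shrinks once they start to change sign near a minimum.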

Comparison of Experimental Results.
To verify the performance of the network designed in this paper, the accuracies of the proposed network, a BP neural network, and classification using PCA-extracted features with the proposed classification network are compared in Table 3. The accuracy using features of the local defect area extracted using threshold segmentation and color conversion methods combined with the BP neural network was better than using PCA-extracted features with the proposed classification network but worse than that of the proposed algorithm alone. This showed that our autoencoder-based neural network classifier could indeed improve the classification accuracy of bearing defects.

Summary
To improve the classification accuracy of bearing surface defects, we designed a neural network classifier based on an autoencoder. Through experimental analysis and comparison with conventional algorithms, the algorithm proposed in this paper was shown not only to improve classification accuracy but also to reduce the algorithm design workload of feature extraction, dimension reduction, and classification.

Advances in Civil Engineering
We found that, compared with the features obtained by the PCA algorithm, the features extracted by the improved autoencoder in this paper were more representative and could achieve higher classification accuracy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.