Bearing Defect Detection with Unsupervised Neural Networks



Introduction
With the continuous development of the manufacturing industry, the demand for bearings, a widely used basic component, keeps increasing. The performance and life of a machine often depend heavily on the quality of its bearings [1], so the quality requirements for bearings in industrial production continue to rise. During manufacturing and assembly, defects on the bearing surface arise for various reasons. Common defects include pull marks, dark spots, pits, scratches, rust, and yellow spots. These surface defects reduce the corrosion resistance, elasticity, wear resistance, and lubricity of the bearing, which greatly shortens the service life of the machine and can even lead to serious safety accidents. Therefore, it is essential to detect bearing defects.
Bearing surface defects can be detected by manual inspection, physical inspection, or machine vision inspection [2]. At this stage, manual inspection remains the most widely used method. However, manual inspection is very subjective: the result depends on the experience of the inspection operators, and the process is time-consuming and labor-intensive. In addition, when working under continuous lighting, inspectors are prone to false or missed detections due to visual fatigue, and the work can seriously harm their health. The common physical testing methods are eddy current testing, ultrasonic testing, magnetic particle testing, and so on. These physics-oriented inspection methods are widely used to detect defects of bearing rollers, but they have their own shortcomings: they still require operators to judge the defects, and the inspection is not accurate, so missed or false detections still occur.
With the continuous progress of modern science and technology, machine vision is increasingly used for defect detection. Ye and Hsu designed a new lighting system to collect images in a darkroom, avoiding the influence of external factors and light sources, and developed a rule-based local mask sensor algorithm to achieve high-precision detection of metal defects [3]. Shen et al. designed a new type of lighting and image acquisition system that takes three photos of the bearing: the left and right photos are used to detect the deformation on the sealing ring, and the centrally illuminated image is used to detect other defects and to correct for the deformation on the sealing ring. The method detects defects with high accuracy and efficiency [4]. Tao proposed a multithreshold image segmentation method based on Otsu thresholding to quickly detect defects on the bearing surface: after the collected images are denoised, Otsu thresholding is applied to obtain two thresholds, and defects are then detected and located [5].
Traditional surface detection algorithms preprocess the captured images and then use statistical machine learning methods to extract image features for defect detection. These algorithms have achieved good results in some specific applications, but they still have many shortcomings. For example, they need many image preprocessing steps that are tailored to a specific task and therefore have poor robustness, and many of them are computationally expensive and cannot accurately determine the size and shape of defects. Deep learning updates its parameters directly by learning from data, avoids the manual design of complex algorithm pipelines, and offers high robustness and accuracy. Zhao et al. [6] proposed a defect detection framework based on positive sample training, which combines a GAN and an autoencoder to reconstruct the defect image and uses LBP for local image contrast to detect defects. Wen et al. [7] proposed a multitask convolutional neural network to detect defects: instead of a large convolution kernel, smaller convolution kernels are used to convolve the input data, and a shared network classifies and locates the defects after extracting the defect features of the sample data. Cha et al. [8] used a sliding-window-based convolutional neural classification network to locate crack surface defects, combining two redundant sliding-window paths to achieve full image coverage. Wang et al. [9] used a deep convolutional neural network to first classify cloth samples and then detect defects after classification. Chen et al. [10] combined DCNNs with SSD, YOLO, and other networks to build a cascaded coarse-to-fine detection network that covers fastener positioning, defect detection, and classification.
DCNNs have good robustness and adaptability, which gives this method good application prospects in the defect detection and classification of fasteners. Mei et al. [11,12] adopt an image pyramid hierarchy and a convolutional denoising autoencoder network to detect defects in cloth texture images. The results show that making full use of unsupervised learning and a multimodal result fusion strategy can improve the robustness and accuracy of defect detection. Bergmann et al. [13] improve unsupervised defect segmentation by applying structural similarity to autoencoders, and the proposed method achieves significant performance gains on a challenging real-world dataset of nanofibrous materials. Yang et al. [14] propose an end-to-end surface quality detection method based on deep convolutional neural networks (CNNs) to improve the accuracy and efficiency of VDR surface quality detection. Essid et al. [15] develop a new machine vision framework for efficient detection and classification of manufacturing defects in metal boxes. The results show that the proposed autoencoder deep neural network (DNN) architecture can not only classify manufacturing defects but also localize them with high accuracy. Wu et al. [16] propose a high-sensitivity magnetic flux leakage method based on a magnetic induction head for the detection of tiny cracks in bearing rings. Xu et al. [17] propose a new multidefect detection method for surface defects of explosive cartridges in the automatic sorting process, where the defects are small in area, irregular in shape, and randomly distributed; the method combines an improved visual attention model with an image partitioning-weighted eigenvalue. Kong et al. [18] propose a unified framework for detecting defects in planar industrial products or planar surfaces of nonplanar products based on a template-matching strategy. Tao et al. [19] propose an algorithm for pixel-level segmentation and classification of defects.
The entire network can be divided into two stages: a defect detection stage and a defect classification stage. Fang et al. [20] propose an SLIC head of object instance segmentation in proposal regions (Mask R-CNN) containing a network block to learn the quality of the predicted masks. Park et al. [21] propose a convolutional neural network (CNN) based method that inspects nonpatterned welding defects (craters, pores, foreign substances, and fissures) on the surface of the engine transmission using a single RGB camera. Ming et al. [22] propose a combined classifier with dynamic weights (CCDW) to classify the LPG samples, considering both feature extraction diversity and base classifier diversity after image segmentation and enhancement. Martínez et al. [23] propose a machine vision system for detecting flaws on textured surfaces, in which multiple images taken under different lighting conditions are processed and merged into one image that is used to extract features for a supervised classifier. Peng et al. [24] propose a precision measurement and inspection method for O-rings with good accuracy and efficiency.
This research uses a deep neural network to detect bearing defects. The work focuses on the following topics: (1) how to increase the number of samples, (2) how to improve the AUC of the model, and (3) how to enhance the feasibility of the method. The paper is organized as follows. Section 2 describes the defect representation and data acquisition system, and Section 3 introduces the methodology. Experiments and results are presented in Section 4, and Section 5 gives some discussion. Finally, Section 6 summarizes this paper.

Data Acquisition System.
The data acquisition system is composed of cameras, lighting systems, and computers, as shown in Figure 1. The image capture device can capture images of the inner end surface, outer diameter, inner diameter, and lower end surface separately. A Basler A1300-60gm industrial camera with a resolution of 1282 × 1026 pixels is selected, and the lens is a PCHI012. Different fields of view can be obtained by adjusting the focal length, so as to match the sizes of the inner diameter, outer diameter, upper end surface, and lower end surface. The exposure time is adjusted to obtain the largest signal-to-noise ratio, and uniform illumination is provided by a ring LED light source (model HZN DRL-70-60-W). The final images obtained are shown in Figure 2.

Defect Representation.
Bearing defects mainly include the following types: outer diameter defects (stretch marks, dark spots, pits, scratches, rust, and yellow spots); lower end surface defects (dents, convex deformation, scratches, and rust); inner diameter defects (dimples, scratches, and rust); and inner end surface defects (dents, convex deformation, rust, and yellow spots). There are many types of defects, and their characteristics are not obvious, as shown in Figure 3.

Methodology
Careful observation of the samples obtained by the abovementioned devices shows that, in addition to useful information, they contain some useless redundant information. To ensure detection accuracy, a series of preprocessing steps is applied to the samples. Although the defects of the inner end surface, inner diameter, outer diameter, and lower end surface differ, their distributions are similar: they are all distributed along the circumference of the bearing, only at different positions. Therefore, this article selects the inner diameter sample, which has a more complicated appearance and more interference factors, for processing; samples from other parts can be processed in the same way.

Normalized Sample Method.
Since the bearing is photographed on the liner, the images also contain parts other than the bearing. To solve this problem, we first find the contours of the outer and inner edges and fit ellipses to them. Then, based on the center of the fitted ellipse, we move the bearing to the center of the image and use a perspective transformation, derived from the ellipse parameters, to transform the ellipse into a circle. Finally, we remove everything outside the outer edge and inside the inner edge after the transformation. The captured bearing image and a schematic of the processing algorithm are shown in Figure 4.
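The geometric core of this normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes the ellipse fit has already produced a center, two semi-axes, and a rotation angle, and it uses an affine map (translate, rotate, rescale the axes) as a simpler stand-in for the perspective transformation described above. The names `ellipse_to_circle` and `inside_ring` and all of their parameters are hypothetical.

```python
import math

def ellipse_to_circle(point, center, semi_axes, angle_deg, radius, out_center):
    """Map an image point so that the fitted ellipse becomes a circle.

    point, center, out_center: (x, y) tuples in pixels.
    semi_axes: (a, b) semi-axis lengths of the fitted ellipse.
    angle_deg: rotation of the ellipse's first axis from the x-axis.
    radius: radius of the target circle.
    All names are illustrative assumptions, not taken from the paper.
    """
    a, b = semi_axes
    theta = math.radians(angle_deg)
    # 1) Move the ellipse center to the origin.
    dx, dy = point[0] - center[0], point[1] - center[1]
    # 2) Rotate by -theta so the ellipse axes align with x and y.
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    # 3) Rescale each axis so both semi-axes equal the target radius.
    u, v = u * radius / a, v * radius / b
    # 4) Place the result at the center of the output image.
    return (u + out_center[0], v + out_center[1])

def inside_ring(point, out_center, r_inner, r_outer):
    """True if the point lies between the inner and outer edges."""
    d = math.hypot(point[0] - out_center[0], point[1] - out_center[1])
    return r_inner <= d <= r_outer
```

In practice the map would be applied to every pixel (or inverted and used for resampling), and `inside_ring` would mask out everything outside the outer edge and inside the inner edge.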

Sample Split Based on Normalized Sample Symmetry.
After the sample is normalized, the inner diameter part of the bearing is converted into a standard ring that is rotationally symmetric about the center of the image. Since a defect is generally very small and occupies only a small part of the ring, this symmetry can be used to split the sample into a number of fan-shaped ring segments, as shown in Figure 5. The 12 samples obtained are labelled, and the classifier is trained on the divided samples.
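One way to picture the split is to assign each ring pixel to one of 12 equal angular sectors around the image center. The sketch below is only an illustration of that idea: the function names, the pixel representation as `(x, y, value)` tuples, and the choice of sector 0 starting at the positive x-axis are all assumptions, not details given in this paper.

```python
import math

def sector_index(x, y, center, n_sectors=12):
    """Return which of n_sectors equal angular sectors pixel (x, y) falls in."""
    angle = math.atan2(y - center[1], x - center[0])  # in [-pi, pi]
    angle %= 2 * math.pi                               # shift to [0, 2*pi)
    # The final modulo guards against rounding at the 2*pi boundary.
    return int(angle // (2 * math.pi / n_sectors)) % n_sectors

def split_ring(pixels, center, r_inner, r_outer, n_sectors=12):
    """Group ring pixels, given as (x, y, value), into n_sectors lists."""
    sectors = [[] for _ in range(n_sectors)]
    for x, y, value in pixels:
        r = math.hypot(x - center[0], y - center[1])
        if r_inner <= r <= r_outer:  # ignore pixels outside the ring
            sectors[sector_index(x, y, center, n_sectors)].append((x, y, value))
    return sectors
```

Each of the resulting sector lists corresponds to one fan-shaped sample, so one normalized image yields `n_sectors` training samples.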

Supervised Neural Networks Using ResNet Neural Networks.
Deep convolutional neural networks have excelled at image classification problems. Recent studies have also shown that the depth of the network plays a crucial role in accuracy. However, as the network deepens, a question arises: as layers continue to be stacked, does the performance of the network always get better? The first obstacle is the problem of vanishing or exploding gradients, which can largely be addressed by normalized initialization and normalization layers. But once deeper networks are able to converge, a "degradation" problem appears: accuracy decreases, and this is not caused by overfitting. So although more layers can be stacked and the network can still be trained to convergence, simply stacking layers does not avoid the degradation problem [25].
He et al. [25,26] build a new network structure (ResNet) to solve the above problem, in which a network with too many layers performs worse than a shallower one, and they give a proper explanation. ResNet adds the input of a block to the output of its last layer to form the output of the block. Assume that x is the input of a block composed of two layers. Then x first passes through a convolutional layer followed by a ReLU activation and then through a second convolutional layer to obtain F(x); F(x) is added to the input x, and the sum is passed through a ReLU activation to give the output of the block. An ordinary convolutional network outputs F(x), whereas ResNet outputs H(x) = F(x) + x, so that the layers learn F(x) = H(x) − x. This changes the learning goal: instead of learning the full mapping H(x) directly, the block learns the residual between the output and the input, and driving this residual to 0 yields the identity mapping. As a result, once the residual is introduced, the mapping becomes more sensitive to changes in the output.
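The relation H(x) = F(x) + x can be illustrated with a toy residual block. In this sketch, `layer1` and `layer2` are hypothetical stand-ins for the two convolutional layers (any function from a list of floats to a list of the same length); the code is meant to show the data flow only and is not a ResNet implementation.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]

def residual_block(x, layer1, layer2):
    """Compute relu(F(x) + x), where F(x) = layer2(relu(layer1(x))).

    layer1 and layer2 are placeholders for the two convolutional layers.
    """
    fx = layer2(relu(layer1(x)))            # residual branch F(x)
    hx = [f + xi for f, xi in zip(fx, x)]   # H(x) = F(x) + x
    return relu(hx)

# If the residual branch outputs zeros, the block reduces to the identity
# mapping for non-negative inputs.
zero_layer = lambda values: [0.0 for _ in values]
identity_out = residual_block([1.0, 2.0, 3.0], zero_layer, zero_layer)
```

Here `identity_out` equals the input, which is the sense in which a residual of 0 corresponds to the identity mapping.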
Based on the samples obtained in Sections 3.1 and 3.2, supervised neural networks can be trained with ResNet following the process shown in Figure 6. More details can be found in our previous work [27].

Autoencoder Neural Networks Implemented with U-Net.
In the field of image generation, there is a very important network structure called the autoencoder [28]. An autoencoder is a feedforward network composed of one or more connected hidden layers. It learns a nonlinear mapping from the original input data to a set of learned features and then to the output. Its first half is the downsampling part, generally implemented with a CNN; its second half is the upsampling part, generally implemented with transposed convolution (deconvolution).
The most remarkable property of the autoencoder is that, even with only the features of the middle layer, the second half can recover a picture that is very close to the original. Therefore, the autoencoder has at least two attractive applications: (1) using the first half for feature extraction and (2) using the second half for image generation.
U-Net was not originally designed as an autoencoder; it first appeared in medical image segmentation [29]. On the one hand, its structure is very similar to the traditional autoencoder structure. On the other hand, its unique feedforward structure allows the network to capture a lot of spatial information. For these reasons, much recent image synthesis and generation work is based on U-Net. In this paper, U-Net is first used to extract a feature map from the original image, and the feature map is then used to generate the gradient image.

The Proposed Unsupervised Neural Network.
Lighting attenuation and batch-to-batch variation affect the classification performance of the supervised network; therefore, an unsupervised neural network is proposed to cope with these disturbing factors, as shown in Figure 7. Based on the samples obtained in Sections 3.1 and 3.2, the proposed unsupervised neural networks can be trained with AE neural networks implemented with U-Net through the following process.
Step 1: raw bearing samples are normalized using Algorithm 1 in Section 3.1.
Step 2: normalized samples are split based on normalized sample symmetry using Algorithm 2 in Section 3.2.
Step 3: the gradient of the samples is extracted as label data, and the Sobel operator is selected to calculate the gradient of the samples.
Step 4: AE neural networks implemented with U-Net are used to predict the gradient of the samples.
Step 5: the loss function is defined with the argmax of the difference between the label data and the predicted data.
Step 6: new data can be used to train and update the model online.
Shock and Vibration
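Steps 3 and 5 can be sketched in plain Python as follows. The Sobel kernels are the standard ones, but everything else is an assumption made for illustration: the image is a small 2D list of gray values, border pixels are simply left at 0, and the "argmax of the difference" is interpreted here as the largest absolute pixel difference between the label gradient and the predicted gradient. The function names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient(image):
    """Gradient magnitude of a 2D list of gray values (border left at 0)."""
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            grad[r][c] = math.hypot(gx, gy)
    return grad

def max_abs_difference(label, prediction):
    """Largest absolute pixel difference between label and prediction.

    This is one possible reading of the loss in Step 5, not a confirmed one.
    """
    return max(abs(a - b)
               for row_a, row_b in zip(label, prediction)
               for a, b in zip(row_a, row_b))
```

Under this reading, `sobel_gradient(sample)` would supply the label for Step 3, the U-Net output would be the prediction, and `max_abs_difference` would compare the two.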

Experiment and Results
The image processing algorithm in this article is trained and tested on a server. The server's processor is an Intel(R) Xeon(R) CPU E5-2678v3@2.5 GHz, the graphics cards are two NVIDIA GeForce GTX 1080 Ti, and the deep learning framework is TensorFlow.

Model Evaluation Method.
Generally, the entries of the classification confusion matrix are used for statistical calculation; Table 1 shows the classification confusion matrix. In this paper, the accuracy ACC, precision P, and recall R of the trained model on the black box set are used to evaluate the model. The accuracy ACC is the proportion of correctly classified samples among all observed samples, that is, the proportion of all predictions that are correct. The precision P is the proportion of samples identified as positive that are truly positive; it evaluates prediction correctness from the perspective of one type of prediction result. The recall R is the proportion of all truly positive samples that are correctly identified, reflecting the sensitivity of the model. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives in the confusion matrix, accuracy, precision, and recall are defined as

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}.$$

Accuracy represents the overall correctness of the model. Precision and recall, each of which evaluates a certain category, are better performance indicators than accuracy. Precision and recall are a pair of contradictory measures: generally speaking, when precision is high, recall is often low, and when recall is high, precision is often low.
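The three indicators can be computed directly from the four confusion matrix counts. The helper below is a small sketch; its name and the example counts are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return acc, precision, recall

# Hypothetical counts, for illustration only.
acc, precision, recall = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that recall reaches 100% whenever there are no false negatives, regardless of how many false positives occur, which is why recall alone does not separate models that differ in precision.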
Algorithm 1: Normalized sample method (Section 3.1).
Input: inner diameter sample with 1280 × 1024 pixels.
Output: normalized samples of inner diameter sample with 760 × 760 pixels.
(1) Morphological denoising: the original image is eroded and dilated with a 5 × 5 rectangular morphological structuring element;
(2) Binarization: take the maximum and minimum gray values of the inner diameter area as thresholds, set pixels greater than the maximum threshold or less than the minimum threshold to 255, and set the inner diameter area to 0;

Another, more comprehensive evaluation index is the receiver operating characteristic (ROC) curve. The ROC curve describes the performance of a two-class classification system as the threshold of the classifier varies; it is a comprehensive index of the continuous change of sensitivity and specificity, and each point on the curve reflects the response to the same signal stimulus. The ROC curve and the AUC evaluate a two-class model as a whole, where the AUC is the area between the ROC curve and the horizontal axis. The ROC curve generally lies above the line y = x, and the larger the AUC value, the better the model. The ROC curve is drawn from two indicators, the true-positive rate (TPR) and the false-positive rate (FPR). The TPR is the proportion of samples whose true label is positive that are also predicted as positive. The FPR is the proportion of samples whose true label is negative that are predicted as positive.
$$TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}.$$

Results.
The three experimental models are trained and tested on the same test set, and their ROC curves and AUC values are shown in Figures 8(a) to 8(c). The model of Figure 8(b) has the best performance, while that of Figure 8(a) has the worst. Statistics of the above indicators are shown in Table 2. The R indicator of all three networks is 100%. In terms of the ACC, P, and AUC indicators, the unsupervised networks perform better than the supervised network. The AUC values of the three models are 0.8567, 0.9721, and 0.9623, respectively.
Though the indicators of the third model are slightly lower than those of the second model, the third model is still good enough for actual use. Moreover, the third model is fully unsupervised, which is very convenient in practice, and it can be updated online.
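For reference, an ROC curve and its AUC can be obtained from classifier scores by sweeping the decision threshold and integrating with the trapezoidal rule. The sketch below uses hypothetical function names and assumes that a higher score means "more likely positive"; it illustrates the metric itself, not the evaluation code used for the reported numbers.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    labels: 1 for positive, 0 for negative (both classes must be present).
    scores: higher means more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A classifier that ranks every positive sample above every negative one gives an AUC of 1, while random scores give an AUC near 0.5, which corresponds to the line y = x.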

Discussion
Experiments with supervised ResNet networks and unsupervised AE networks for bearing defect detection were carried out in Section 4. Based on the results, the following points deserve further discussion:
(1) Why does the unsupervised network perform better than the supervised network? We think the supervised network can perform well if the defect characteristics are obvious. However, bearing defects are very small and inconspicuous, and the unsupervised networks are good at identifying small defects. Thus, the unsupervised network performs better.
(2) Training process: in experiment 2, the unsupervised networks are trained with positive samples only and achieve the best performance; however, the samples have to be selected manually. In experiment 3, the unsupervised networks are trained with both positive and negative samples; that is, no sample selection is needed, which is a great convenience for on-site industrial processing.
… automatically for good performance of the networks. The proposed networks can be updated with the updated samples.

Conclusions
This paper proposes new unsupervised neural networks based on AE networks for bearing defect detection. A sample preprocessing algorithm based on the normalized sample symmetry of the bearing is adopted to greatly increase the number of samples. Gradients of the unlabeled data are used as labels, and AE networks are built with U-Net to predict the output. Three experiments, one with the supervised network and two with the unsupervised network, are conducted. The AUC values of the three models are 0.8567, 0.9721, and 0.9623, respectively. Though the indicators of the third model are slightly lower than those of the second model, the third model is still good enough for actual use. Moreover, the third model is fully unsupervised, which is very convenient in practice, and it can be updated online. The experimental results demonstrate the feasibility and superiority of the proposed unsupervised networks. It can be expected that, with the widespread application of visual inspection systems in automated bearing production lines, the proposed method can greatly improve production efficiency and contribute to the improvement of bearing production quality.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.