Hot-Rolled Steel Strip Surface Inspection Based on Transfer Learning Model

In the production of steel strips, the detection of surface defects is very important. However, traditional defect detection methods suffer from low detection accuracy and dependence on subjective judgment. In this study, surface defects of steel strips are detected with a classic convolutional neural network (CNN) that is improved through transfer learning. The transfer learning model has the advantages of shorter training time, faster convergence, and more accurate weight parameters. In our experiments it achieves better defect detection results than the classic convolutional neural network, with training and test accuracies of about 98%. Finally, a model based on a fully convolutional network (FCN) is proposed for segmenting the defective areas of steel strips.


Introduction
As an important steel product, the hot-rolled strip is widely used in the manufacturing industry [1-3]. During the production of hot-rolled strips, factors such as the declining performance of rolling equipment and fluctuations in the production process may cause different types of defects to appear on the strip surface, which results in financial losses. In practice, strips are usually inspected visually by individuals, a method that is unreliable and time-consuming. With the development of computer vision and pattern recognition, a series of automated methods have been applied to the inspection of strip steel [4,5]. Chengming et al. [6] proposed a method for the surface quality inspection of cold-rolled strips based on a back propagation (BP) neural network. Features were first extracted by a wavelet transform, and the nonlinear pattern-recognition capability of the BP neural network was then used to study images of five kinds of typical cold-rolled strip surface defects. The average recognition rate was 92%. Jiahui et al. [7] proposed a method that applies a Gabor filter to the surface defect detection of strip steel. Using the frequency and direction selectivity of the Gabor filter, an evaluation function was introduced to maximize the difference in energy response between images containing defects and defect-free images, and the images with defects were segmented. Ke et al. [8] applied the hidden Markov tree model to the on-line detection of strip steel surface defects, modeling the defects and realizing multiscale defect segmentation with this model. When different types of defects are represented by the same "defect model," the segmentation accuracy of the model for common defects on the strip surface reaches 94.4%.
The traditional methods of manual detection and classical pattern recognition described above depend heavily on the experience of operators or on hand-tuned algorithm parameters. They are often suitable only for inspecting strip steel under specific conditions. If the external environment changes slightly, for example through a change in illumination intensity, a change in image size, or an increase in the noise carried by the signal source, detection may fail. In recent years, owing to the excellent performance of deep learning in many visual tasks [9,10], methods based on deep learning have been proposed for the automatic detection of defects on steel surfaces, replacing manual inspection and traditional pattern-recognition methods.
Yanxi et al. [11] proposed a detection algorithm for defects on strip steel surfaces based on convolutional neural networks (CNN). By building a CNN model and creating data sets, they realized the automatic extraction and detection of strip steel surface defects, and the effectiveness of the algorithm was verified by experiments.
He et al. [12] proposed a hierarchical learning framework based on convolutional neural networks to classify the defects of hot-rolled steel and introduced a multiscale receptive field (MSRF) to be used together with the pretrained model Inception-v4 to extract multiscale features. In addition, a group of small autoencoders (AE) was trained to reduce the size of the extracted features adaptively, so as to avoid overfitting the training set. Experiments on samples captured from two hot-rolling production lines show that the proposed framework achieves classification rates of 97.2% and 97%, respectively, which is much higher than the traditional methods.
Other deep-learning algorithms applied to the location and classification of strip surface defects include Fast R-CNN [13], YOLOv3 [14], and SSD [15], which can also locate the position of defects.
Because machine vision and deep learning can make the surface defect detection process automated and intelligent, this technology has great practical value. In addition to the surface defect identification of steel mentioned above, it has been applied to wood defect inspection, glass surface defect inspection, shape defect inspection of small electronic components, and surface defect inspection of magnetic sheets. For manufacturers, this technology not only better guarantees product quality but also greatly improves production efficiency and the competitiveness of industrial production.
In this paper, the convolutional neural network model is first introduced, and a transfer learning model with higher accuracy is then proposed. For the detection of surface defects of hot-rolled strips, we compare a convolutional neural network with a transfer learning model. The transfer learning model [16] trains in less time, converges faster, obtains more accurate weight parameters, and detects the surface defects of strip steel more accurately.

The Structure of a Convolutional Neural Network
A convolutional neural network is mainly composed of several layers: an input layer, convolution layers I and III, pooling layers II and IV, a fully connected layer V, and a softmax layer VI, as shown in Figure 1.
As its name implies, the input layer receives the input to the whole convolutional neural network. When the input is an image, it is represented as the pixel matrix of the image. Starting from the input layer, this three-dimensional matrix is transformed into another three-dimensional matrix, layer by layer through different neural network structures, until it finally reaches the fully connected layer.
The convolution layer is the most important part of a convolutional neural network. Each node in the convolution layer receives only a small patch of the previous layer as input. Although only a small piece of information is received, the convolution layer analyzes it more deeply, so as to obtain more abstract features.
The role of the pooling layer is completely different from that of the convolution layer. When the pooling layer receives the node matrix from the previous layer, it reduces the size of the matrix. This in turn reduces the number of nodes in the fully connected layer and therefore the number of parameters in the whole neural network.
The fully connected layer is located in the last two layers of the convolutional neural network. It mainly gives the classification results for the information processed by the preceding convolution and pooling layers. In a convolutional neural network, the convolution and pooling layers together act as a tool that automatically extracts image features and condenses the information content, while the fully connected layer collects the resulting features and performs the final classification.
The softmax layer is mainly used for classification problems; it gives the probability that the current sample belongs to each category.
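The local connectivity described above can be illustrated with a minimal sketch in plain Python. The function name `conv2d_valid`, the toy image, and the kernel values are illustrative assumptions, not the network used in this paper; as in most deep-learning frameworks, the operation is implemented as a cross-correlation with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends only on a kh x kw patch of the input.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 "image" with a vertical edge between columns 1 and 2 (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is nonzero only where the patch covers the edge, which is the sense in which a convolution layer turns small local patches into more abstract features.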
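As an illustration of how pooling shrinks the node matrix, the following sketch assumes non-overlapping 2x2 max pooling; the text does not state which pooling type or window size is used, so both are assumptions, and `max_pool2d` is a hypothetical helper name.

```python
def max_pool2d(matrix, size=2):
    """Non-overlapping max pooling: window and stride are both `size`."""
    pooled = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            window = [matrix[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 node matrix from a previous layer.
node_matrix = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool2d(node_matrix)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4x4 matrix becomes 2x2, so a fully connected layer placed after it needs a quarter as many input weights per output node.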
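A minimal sketch of the softmax computation follows. The six class names and the score values are placeholders chosen for illustration; subtracting the maximum score before exponentiation is a common numerical-stability step and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder names and scores for six classes (hypothetical values).
classes = [f"defect_{k}" for k in range(6)]
scores = [0.5, 2.0, 0.1, 1.0, 0.3, 3.0]

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]
print(predicted)
```

The predicted category is simply the one with the largest probability, here the class with the largest raw score.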

Transfer Learning
Transfer learning is a learning process that exploits the similarity of data, tasks, or models to apply a model learned in a source domain to a new target domain.
Transfer learning has emerged to resolve the current contradiction between large volumes of data on the one hand and scarce annotation and weak computing power on the other. We now possess a great deal of data in the form of images, speech, and text, but most of these data are unprocessed, and only a small fraction have been correctly labeled by hand. Big data also requires more powerful equipment for storage and computation, which most ordinary researchers do not have. The theory and application of transfer learning have emerged to address these contradictions. Through the transfer of data annotations, the transfer of models, and adaptive learning, transfer learning can meet the need for large amounts of annotated data, for fine-tuning after model transfer, and for personalized and flexible adjustment of the model.
The transfer learning model used in this paper is an image classification model trained on the large image database ImageNet. Training such a model often takes weeks or even months on a powerful computer with multiple GPUs. The model can classify over a thousand kinds of images. Once the model has been trained, we only need to understand its input and output layers and how to fine-tune it. Because the model has been trained on a large data set, it solves the classification problem well, and the weight parameters learned during that training can then effectively carry out feature extraction and classification on small data sets, as shown in Figure 2. The upper part of the figure is the model pretrained on the large data set; in the lower part, only the output structure of the network needs to be adjusted according to actual needs, and the model is then applied to the small data set.

FCN.
A fully convolutional network (FCN) is used here for segmentation. Figure 3 is a schematic diagram of an FCN. A CNN ends with fully connected layers, whereas an FCN replaces them with convolutional layers; this is the change on which the FCN is based. A classification network performs inference from a picture to a probability value, that is, the input is a picture and the output is a number. An FCN, however, can accept an input of any size and produce an output of the corresponding size through efficient inference and learning, that is, the input is a picture and the output is also a picture. The ability to handle any input size was our original reason for choosing an FCN model. The FCN has achieved very good results on three relatively comprehensive data sets.
There are five downsampling stages in the FCN. The first three downsampling blocks are called the shallow network, and the last two are called the deep network. In the field of semantic segmentation [18,19], an important problem is the balance between local information and global information. Local information comes from a smaller receptive field, whereas global information comes from a larger receptive field. The size of the receptive field affects the semantic segmentation of images, in the same way that a large window and a small window offer different views. When segmenting small objects, a large receptive field brings in too much information to separate the object accurately, and a small receptive field causes the opposite problem for large objects. Local information therefore helps to segment smaller targets, while global information is generally used to segment larger targets. For this reason, the FCN defines a skip connection, which combines semantic information from the deep layers with appearance information from the shallow layers, fusing the two sources of information to segment more accurately and precisely.
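The difference in receptive field between shallow and deep stages can be estimated with the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the cumulative stride. The sketch below assumes, purely for illustration, that each downsampling block consists of one 3x3 convolution followed by 2x2 pooling with stride 2; the actual layer configuration of the FCN is not specified here, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Return (receptive field, cumulative stride) for a stack of layers.

    Each layer is a (kernel_size, stride) pair; padding does not change the result.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the view by (k - 1) input steps
        jump *= stride              # strides compound from layer to layer
    return rf, jump


# Assumed downsampling block: 3x3 convolution (stride 1), then 2x2 pooling (stride 2).
block = [(3, 1), (2, 2)]

shallow = receptive_field(block * 3)  # after the first three downsampling blocks
deep = receptive_field(block * 5)     # after all five downsampling blocks
print(shallow)  # (22, 8)
print(deep)     # (94, 32)
```

Under these assumptions, a node after the third block sees a 22x22 pixel region, while a node after the fifth block sees a 94x94 region at 1/32 of the input resolution, which is why fusing shallow and deep feature maps helps with both small and large targets.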

Data Set.
In the production of hot-rolled strips, surface defects inevitably arise for a variety of reasons. As well as improving the production conditions and process to reduce the likelihood of surface defects, it is necessary to detect surface defects reliably, so good defect detection methods are particularly important. A surface detection method based on a convolutional neural network requires a complete and effective data set. The data set of surface defects of hot-rolled strips used in this experiment is from Northeastern University (NEU) [21], as shown in Figure 4. It includes six types of common surface defects of hot-rolled strips: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In), and scratches (Sc). Surface defect images of hot-rolled strips are difficult to collect, so there are only 300 images for each type of defect, 1800 defect images in total. The second training set contains a total of 600 defect images, with a test set of 300 defect images. The third training set contains a total of 1200 defect images, with a test set of 600 defect images. In this experiment, we use a Google Colaboratory (Colab) notebook with the TensorFlow platform and mount Google Drive. By modifying the path, we can import the data set directly from Google Drive and run the computation on a cloud GPU.
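A class-balanced split of this kind can be sketched as follows. The file names and class labels are placeholders, the helper `balanced_split` is hypothetical, and the even division of the 600 training and 300 test images among the six classes (100 and 50 per class) is an assumption; the paper does not describe how the images were selected.

```python
import random


def balanced_split(images_by_class, n_train, n_test, seed=0):
    """Draw n_train training and n_test test images from every class, without overlap."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(images_by_class.items()):
        if n_train + n_test > len(paths):
            raise ValueError(f"not enough images for class {label!r}")
        shuffled = paths[:]
        rng.shuffle(shuffled)
        train += [(path, label) for path in shuffled[:n_train]]
        test += [(path, label) for path in shuffled[n_train:n_train + n_test]]
    return train, test


# Placeholder file names: 300 images for each of six classes (1800 in total).
dataset = {
    f"class_{k}": [f"class_{k}_{i:03d}.bmp" for i in range(300)]
    for k in range(6)
}

# Assumed even split for the 600/300 experiment: 100 training and 50 test images per class.
train_set, test_set = balanced_split(dataset, n_train=100, n_test=50)
print(len(train_set), len(test_set))  # 600 300
```

Fixing the random seed keeps the split reproducible, so that different models can be compared on the same images.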

Experiment Based on a Convolutional Neural Network Model.
The data for each experiment are divided into a training set and a test set. The first training set contains 300 defect images (evenly divided among the six categories, 50 images per category), and the corresponding test set contains 150 defect images (25 images per category). With 20 epochs, the training times of Experiment 1, Experiment 2, and Experiment 3 are 78 minutes, 160 minutes, and 300 minutes, respectively. Both training accuracy and test accuracy are used to evaluate the model. Both increase with the number of epochs, but to different degrees. At epoch 20, the training accuracies are 0.8733, 0.9517, and 0.9498, and the test accuracies are 0.4867, 0.7067, and 0.6933, respectively. The training accuracy in each experiment reaches about 90%, but the test accuracy is at most about 70%.
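The gap between training and test accuracy, a simple indicator of overfitting, can be computed directly from the accuracies reported above at epoch 20. The variable names are illustrative.

```python
# Accuracies at epoch 20 for Experiments 1, 2, and 3, as reported in the text.
train_acc = [0.8733, 0.9517, 0.9498]
test_acc = [0.4867, 0.7067, 0.6933]

# A large positive gap means the model fits the training set much better than unseen data.
gaps = [round(tr - te, 4) for tr, te in zip(train_acc, test_acc)]
for experiment, gap in enumerate(gaps, start=1):
    print(f"Experiment {experiment}: train - test = {gap:.4f}")
```

The gap is about 0.39 in Experiment 1 and about 0.25 in Experiments 2 and 3, so enlarging the training set reduces the gap but does not close it.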

Experiment Based on a Transfer Learning Model.
The previous experiments based on a convolutional neural network model achieved good training results, but their test results were poor.
Given that transfer learning has many advantages, including no need to retrain the whole model, faster training, and less time spent, a strip detection experiment based on a transfer learning model was carried out. The transfer learning method replaces the last fully connected layer of the model pretrained on the ImageNet data set and trains a new fully connected layer on the output of the bottleneck layer, so as to deal with the classification problem for hot-strip defects.
The experimental data set contains 1800 sample images covering six types of hot-rolled steel strip defects. It is the same data set used in the convolutional neural network experiments, so the training results can be compared to a certain extent. The parameter settings are as follows: the learning rate is 0.01, the batch size is 100, and the number of training steps is variable. By varying the number of training steps, we can observe how the model accuracy changes as the steps increase.
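The idea of training only a new final layer on frozen bottleneck outputs can be sketched in plain Python. This is a toy illustration under stated assumptions, not the implementation used in the experiment: the pretrained network is not included, the "bottleneck features" are synthetic six-dimensional vectors generated by the hypothetical helper `make_features`, and `train_head` and `predict` are illustrative names. Only the learning rate (0.01) and batch size (100) are taken from the text.

```python
import math
import random


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_head(features, labels, n_classes, lr=0.01, batch=100, steps=200, seed=0):
    """Train a new softmax layer on fixed features with mini-batch gradient descent."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    indices = list(range(len(features)))
    for _ in range(steps):
        chosen = rng.sample(indices, min(batch, len(indices)))
        grad_W = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for i in chosen:
            probs = softmax(class_scores(W, b, features[i]))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to the score of class k.
                err = probs[k] - (1.0 if labels[i] == k else 0.0)
                grad_b[k] += err
                for d in range(dim):
                    grad_W[k][d] += err * features[i][d]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / len(chosen)
            for d in range(dim):
                W[k][d] -= lr * grad_W[k][d] / len(chosen)
    return W, b


def predict(W, b, x):
    scores = class_scores(W, b, x)
    return scores.index(max(scores))


def make_features(n_per_class, n_classes, rng):
    """Synthetic stand-in for bottleneck outputs: one noisy cluster per class."""
    feats, labs = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.5) for _ in range(n_classes)]
            x[k] += 5.0
            feats.append(x)
            labs.append(k)
    return feats, labs


rng = random.Random(42)
train_x, train_y = make_features(100, 6, rng)
test_x, test_y = make_features(50, 6, rng)

W, b = train_head(train_x, train_y, n_classes=6, lr=0.01, batch=100, steps=200)
test_accuracy = sum(predict(W, b, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"test accuracy on synthetic features: {test_accuracy:.3f}")
```

Because the feature extractor stays fixed, only the small weight matrix of the new layer is updated at each step, which is the reason training is much faster than training a full convolutional network from scratch.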
The experimental results are listed in Table 1. To obtain intuitive results, the changes in accuracy during training and testing are visualized in Figures 5 and 6. Figure 5 shows the variation of training accuracy, which increases from 58% at the beginning to 98-100% at the end. The training accuracy has also been stable since 80%, which indicates that training on the data set goes well. Figure 6 shows the variation of test accuracy, which increases from 68.2% at the beginning to 97-99% at the end. The test accuracy has been stable since step 60, remaining above 92%, which indicates that the test accuracy on the data set is also very good. Therefore, under these experimental conditions, the overfitting observed in the previous experiments does not occur. For example, the model is not undertrained because of the small size of the data set, which implies a good classification effect for the six types of defects.
Comparing the convolutional neural network model with the transfer learning model, the training accuracy in both parts of the experiment can reach more than 90%. However, on the same data set, the training time of the convolutional neural network is significantly longer than that of transfer learning, and its test accuracy is far lower than that of the transfer learning method.
As shown in Figure 7, for the scratch category, the four images (d)-(g) are obtained after 20000 iterations. Although the segmented regions are still a little larger than in the original image, the contours have become more accurate.
For scratch image (f), the scratch on the far left is very small and is not segmented accurately by the model, and the boundary on the right is relatively fuzzy. Two points can be made here. The first is that the pixel resolution of the steel plate images is relatively poor and the boundaries are not distinct, which leads to unsatisfactory segmentation. The second is that the size of the input image leads to a relatively large receptive field, so the segmentation of image boundaries may not be ideal. Any model therefore has its shortcomings.
In the experiment on the steel plate data set, the segmentation is still relatively successful, but the steel plate defects are not segmented with clear contours, so the segmentation accuracy for the defective parts of the steel plate is not high. We think that more careful segmentation and labeling, checked by comparison, can be carried out when data sets are produced in the future, so that the model can provide more accurate segmentation.

Conclusion
To address the problems of traditional hot-strip defect detection methods, such as slow detection speed, low detection accuracy, and the dependence of parameter settings on subjective experience, this study applied a convolutional neural network to the detection of surface defects of hot-rolled strips. On this basis, the convolutional neural network model was studied and improved, and a transfer learning model with higher accuracy was proposed. The transfer learning model offers a shorter training time, faster convergence, and more accurate weight parameters. The experimental results show that the transfer learning model is more accurate than the convolutional neural network method in the detection of strip defects. The proposed transfer learning model and FCN defect segmentation method can be applied not only to strip steel inspection but also, with little effort, to other surface defect detection tasks.

Conflicts of Interest
The authors declare that they have no conflicts of interest.