Research Article Automatic Fabric Defect Detection Based on an Improved YOLOv5

Fabric defect detection is particularly important because of the large textile production demand in China. The traditional manual detection method is inefficient, time-consuming, laborious, and costly. A deep learning technique is proposed in this work to perform automatic fabric defect detection by improving the YOLOv5 object detection algorithm. A teacher-student architecture is used to handle the shortage of fabric defect images. Specifically, a deep teacher network could precisely recognize fabric defects. After information distillation, a shallow student network could do the same thing in real time with minimal performance degradation. Moreover, multitask learning is introduced by simultaneously detecting ubiquitous and specific defects. The focal loss function and center loss constraints are introduced to improve the recognition performance. Evaluations are performed on the publicly available Tianchi AI and TILDA databases. Results indicate that the proposed method performs well compared with other methods and has excellent defect detection ability in the collected textile images.


Introduction
The textile industry is a traditionally advantageous industry in China's economic development and an important livelihood industry. The quality of textiles has a great influence on the textile industry. Fabric defects can reduce the price and profit by 45%-65% [1]. Therefore, defect detection plays an important role in the control of textile quality. Traditional textile defect inspection is usually carried out by skilled operators, whose training is costly, and the manual detection efficiency is low (the detection speed is less than 20 m/min). The error and miss rates are high because of personnel fatigue or other subjective factors. Hence, the automatic detection of fabric defects has become an engaging and difficult research topic in the textile industry and machine vision.

The core of machine-vision-based fabric defect detection is extracting the characteristics related to defects from the textile images. A detailed review of the machine-vision-based fabric defect detection methods could be found in References [2,3]. Thomas and Cattoen [4] used the gray-scale means of image rows and columns as defect-related characteristics, which are sensitive to illumination changes. Ye [5] presented fuzzy inference based on image histogram statistical variables, which is robust to the rotation and translation of defects. However, it has difficulty handling complex image textures. For complex texture images, researchers proposed methods based on edges [6], local binary patterns [7,8], contour waves [9], and gray-level co-occurrence matrices [10,11]. These methods perform well in identifying defective images but have difficulty recognizing specific fabric defects. Moreover, several researchers used the characteristics of the high-frequency parts, such as Fourier transform methods [12,13], Gabor filter methods [7,14], and wavelet transform methods [15,16]. Compared with fabric defect detection in the spatial domain, detection in the frequency domain has larger space-time overheads.
Deep learning has been widely used in the field of computer vision [17-19]. Researchers designed deep neural networks to realize fabric defect detection in a data-driven manner. Liu et al. [20] proposed a multistage GAN for the detection of fabric defects through unsupervised data reconstruction. Hence, it could overcome the challenges of diversified fabric defects. Mei et al. [21] introduced a multiscale convolutional denoising autoencoder to learn the reconstruction of textile images. The reconstruction errors are utilized to realize automatic defect detection. Xian et al. [22] studied the problem of metallic surface defect detection, which is similar to fabric defect detection. Convolutional neural network-based segmentation is used to detect and recognize defect regions. Wei et al. [23] used faster-RCNN to detect fabric defects automatically. It achieves satisfactory detection performance, benefiting from the strong feature extraction ability of faster-RCNN. However, faster-RCNN has large space-time complexity because of its two-stage object detection scheme. Jing et al. [24] improved YOLOv3, which is a single-stage object detection method with real-time detection performance, so that it could better detect fabric defects.
In addition, several researchers studied model-driven fabric defect detection methods, such as Markov random fields [25], autoregression [26], and sparse dictionaries [27,28]. After effective training, these methods could identify small-region defects. However, they are vulnerable to external disturbances such as noise and illumination.
In conclusion, many researchers have proposed different methods to detect fabric defects. However, detecting fabric defects is still challenging because there are many kinds of defects with large differences and uneven distributions. These problems make it difficult to design an effective system that detects and localizes fabric defects automatically. Moreover, such a system is required to operate fast and to be deployable on an intelligent edge device platform.
According to the above requirements, a lightweight fabric defect detection method is proposed by improving YOLOv5 [29] based on the special needs of the defect detection system. It could detect and recognize specific fabric defects in real time. The main contributions of this article are as follows.
(1) A teacher-student architecture is introduced to detect fabric defects. The deep teacher network could precisely recognize fabric defects. After information distillation, a shallow student network could do the same thing in real time with minimal performance degradation. The student network could be deployed on edge equipment because of its low space-time overheads.
(2) To solve the problem that the many kinds of fabric defects are difficult to distinguish, a multitask learning strategy is proposed to detect ubiquitous and specific defects simultaneously. Such a strategy could fully utilize the complementarity between ubiquitous and specific defects. Moreover, an attention mechanism is used to enhance the defect-related features.
(3) To handle data imbalance and small-region defects better, the focal loss function [30] is employed to mitigate data imbalance. The center loss is introduced as a constraint to increase the interclass distance while reducing the intraclass distance, hence improving the recognition performance of specific defects.

The proposed method is evaluated on the publicly available Tianchi AI and TILDA databases. The results reveal its ability to detect and recognize specific fabric defects. To verify the generalization capability of the proposed algorithm, it is tested on self-collected fabric defect images and achieves good results.

Related Technologies
2.1. Convolutional Neural Networks. Convolutional neural networks (CNNs) are widely used in computer vision tasks [31]. A CNN is a feed-forward neural network that contains convolutional computation and a deep structure. It has the representation learning ability to learn structured and translation-invariant information from input images. Compared with fully connected operations, a CNN has the advantage of small computational overhead. A common CNN-based computer vision system consists of the following parts:
(1) Input layer: it performs grayscale conversion, normalization, and data augmentation on the input images.
(2) Convolutional layer: it performs convolutional operations in each layer to ensure the forward and backward transmission of the information. The feature map of the lth layer is derived from that of the (l-1)th layer by the convolutional operation, in which $K_i^l$, the weight of the ith convolutional kernel in the lth layer, is applied to $x^{l(r_j)}$, the jth local region being calculated in the lth layer.
(3) Activation layer: it always follows the convolutional layer to introduce nonlinearity. Hence, the network could have better representation learning ability. Commonly used activation functions include Sigmoid, Tanh, ReLU, and their variants. Figure 1 shows the curves of three different activation functions.
(4) Pooling layer: it is used to subsample the feature maps to decrease computational overheads. It could also mitigate the overfitting phenomenon. Commonly used pooling functions include the average and maximum pooling strategies.
(5) Output layer: it presents various structures according to different computer vision applications. For classification tasks, the SoftMax function is often used in the output layer to calculate the probability that the input belongs to each category, thus obtaining the classification results.
The five components above are used in the improved YOLOv5. They are not introduced in detail in the following sections.
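As an illustration of how the convolutional, activation, pooling, and output components listed above fit together, the following minimal sketch chains them on a toy single-channel patch. It is written in plain Python for readability and is not the implementation used in this work; the image values, the Laplacian-like kernel, and the two class scores are hypothetical.

```python
import math

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation form) of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    """Numerically stable SoftMax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 5 x 5 grayscale patch with one bright "defect" pixel in the center.
image = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 9, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
kernel = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]  # hypothetical Laplacian-like kernel

response = conv2d(image, kernel)            # 3 x 3 feature map
features = max_pool(relu(response), size=2)  # 1 x 1 map after pooling
probs = softmax([2.0, 0.5])                  # hypothetical (defect, normal) scores
```

The strongest response appears at the center of the feature map, where the kernel is aligned with the bright pixel, and pooling keeps that response while reducing the spatial size.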

2.2. Object Detection Algorithm.
Object detection is one of the essential issues in the field of computer vision. It enables the computer to discover and locate targets of interest in images automatically, such as flaws in the fabric. Deep learning-based object detection algorithms have achieved great success recently. Commonly used methods include RCNN [32], fast-RCNN [33], faster-RCNN [34], SSD [35], and YOLO [36]. However, the above methods have difficulty meeting the real-time requirements of the fabric defect detection system because they have high computational overheads. To balance precision and speed, a lightweight object detection network, named YOLOv5, is used in this work. The traditional YOLOv5 is improved based on the characteristics of the fabric defects, such that it can be applied to the fabric defect detection system. Figure 2 demonstrates the structure of the traditional YOLOv5, which mainly includes Backbone, PANet, and Output. Backbone is used to extract features from input images. PANet could obtain visual features robust to scale changes because of the pyramid structure used. The Output part outputs the positions and classifies the regions of interest simultaneously. Assuming an input image size of 608 × 608 × 3 (height × width × channels), the Output part outputs features at three different scales with dimensions of 76 × 76 × 255, 38 × 38 × 255, and 19 × 19 × 255. Specific details of the YOLOv5 network could be found in [29].
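The three output dimensions quoted above can be checked with a short calculation. The sketch below assumes the usual YOLOv5 configuration, which is not stated in the text: downsampling strides of 8, 16, and 32, three anchors per scale, and five box terms (four coordinates plus one objectness score) per anchor. Under these assumptions, 255 channels correspond to 80 classes; the helper name `head_shapes` is hypothetical.

```python
def head_shapes(height, width, num_classes, anchors_per_scale=3, strides=(8, 16, 32)):
    """Return (H, W, C) of each detection scale under the assumed configuration."""
    # Each anchor predicts 4 box coordinates, 1 objectness score, and the class scores.
    channels = anchors_per_scale * (num_classes + 5)
    return [(height // s, width // s, channels) for s in strides]

shapes_80 = head_shapes(608, 608, num_classes=80)
print(shapes_80)  # [(76, 76, 255), (38, 38, 255), (19, 19, 255)]

# With a hypothetical 10-category defect label set, the channel count becomes 3 * 15 = 45.
shapes_10 = head_shapes(608, 608, num_classes=10)
```

The same calculation shows how the channel dimension would change if the detection head were retrained for a smaller set of defect categories.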

2.3. Attention Mechanism.
The attention mechanism draws on the selective attention characteristic of humans. Specifically, a human being could quickly scan the global image and concentrate on the regions of interest. Then, detailed information of these regions is obtained, and useless information is suppressed. Depending on the application, attention mechanisms could be divided into temporal attention, spatial attention, and channel attention. Temporal attention [37] could assign different weights to sequence features. Then, the model could automatically focus on important sequence features, thus enhancing the ability to process sequence data without increasing the computational costs. Spatial attention [38] transforms the spatial information in the original image into another space and retains the key information, thereby identifying the important areas and increasing the attention on these areas. Channel attention [39] extracts effective features along the channel dimension and suppresses task-independent features, thus improving network performance.
For fabric defect detection, temporal attention cannot be used because the input is a static image. Considering that the defects may occupy a small proportion of the overall image, spatial attention can be used to pay more attention to small-region defects. Moreover, channel attention is used to refine features and improve the algorithm performance. Figure 3 shows that the input feature F is initially processed by max-pooling (MaxPool) and average-pooling (AvgPool). Then, channel and spatial attention realize feature transformation with the shared three-layer MLP and a convolutional operation, respectively. Finally, the sigmoid activation function is used to calculate the different attention weights.

Figure 4 illustrates the flow chart of the overall algorithm. (1) Training stage: fabric images after data augmentation are sent to the teacher network to detect specific fabric defects. Then, the defect-related knowledge is distilled from the teacher network to the lightweight student network. (2) Testing stage: the student network is used to detect specific fabric defects in real time with minimal performance degradation. The testing stage requires deploying the student network on the NVIDIA Jetson TX2 platform based on TensorRT, which is used to accelerate the student network.
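The channel and spatial attention steps described for Figure 3 can be sketched as follows. This is a minimal plain-Python illustration under several assumptions that are not stated in the text: a ReLU between the two MLP layers, zero padding in the 7 × 7 convolution, and element-wise scaling of the features by the resulting weights. The toy feature map and all weight values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(F, W1, W2):
    """Channel weights for a feature map F given as C x H x W nested lists.

    W1 (C//4 rows, C columns) and W2 (C rows, C//4 columns) are the weights of
    the shared MLP; the ReLU between them is an assumption.
    """
    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

    flat = [[v for row in channel for v in row] for channel in F]
    avg = [sum(c) / len(c) for c in flat]  # AvgPool over each channel
    mx = [max(c) for c in flat]            # MaxPool over each channel
    return [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]

def spatial_attention(F, K):
    """Spatial weights: a k x k convolution (K has shape 2 x k x k) over the
    channel-wise average and maximum maps, zero padded to keep H x W."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(K[0])
    pad = k // 2
    weights = []
    for i in range(H):
        row = []
        for j in range(W):
            s = 0.0
            for plane, kernel in zip((avg, mx), K):
                for a in range(k):
                    for b in range(k):
                        y, x = i + a - pad, j + b - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[a][b] * plane[y][x]
            row.append(sigmoid(s))
        weights.append(row)
    return weights

def enhance(F, W1, W2, K):
    """Apply spatial attention first, then channel attention (scaling is assumed)."""
    A_s = spatial_attention(F, K)
    F_s = [[[v * A_s[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(ch)] for ch in F]
    A_c = channel_attention(F_s, W1, W2)
    return [[[v * A_c[c] for v in row] for row in ch] for c, ch in enumerate(F_s)]

# Toy input: 4 channels of size 3 x 3 and hypothetical constant weights.
F = [[[float(c + i + j) for j in range(3)] for i in range(3)] for c in range(4)]
W1 = [[0.1] * 4]                # 4 -> 1 neurons (m/4 with m = 4)
W2 = [[0.1] for _ in range(4)]  # 1 -> 4 neurons
K = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]
out = enhance(F, W1, W2, K)
```

Because every weight passes through a sigmoid, each attention value lies strictly between 0 and 1, so the module rescales features rather than changing their sign.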

Teacher Network Structure.
The structure of the proposed teacher network is shown in Figure 5. The feature extraction part and the multiscale information extraction part of the teacher network are implemented using the Backbone and PANet of the YOLOv5 network. Their specific structures have been introduced in Section 2.2 and are not repeated here. Two improvements are presented to perform better fabric defect detection.
Figure 2: Pipeline of YOLOv5.

(1) Attention enhancement mechanism: the defect areas may occupy small regions in the overall textile image. Extracting defect-related features from these small regions is still a problem, even if PANet could extract the context information. Hence, the attention enhancement mechanism is introduced to mitigate the problem. First, spatial attention is used to enhance the network's sensitivity to small defect areas. Then, channel attention is used to suppress the nondefective features, thus highlighting the defective features. Assuming that the output of PANet is F, the spatial attention weight $A_s(F)$ and the channel attention weights $A_c(F)$ could be calculated as follows:

$$A_s(F) = \sigma\big(\mathrm{Conv}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big),$$
$$A_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big),$$

where $\sigma$ represents the sigmoid activation function, MLP(·) represents a shared multilayer perceptron (three layers with m, m/4, and m neurons, respectively, where m is the channel dimension of F), and Conv(·) represents a convolution operation with a kernel size of 7 × 7. The attention enhancement mechanism applies these weights to F, with spatial attention first and channel attention second.

(2) Multitask learning strategy: the fabric defect detection task is usually divided into ubiquitous defect detection and specific defect recognition. Complementarity exists between these two tasks. Hence, the multitask learning strategy is introduced to fully utilize this complementarity. Specifically, two detection heads are designed to detect ubiquitous defects and recognize specific defects. A fusion model is then proposed to fuse the outputs of the two detection heads to predict a more accurate defect recognition probability. Details are as follows:
(1) For the detection head that detects ubiquitous defects, the defective probability of the ROI with the largest defective probability is defined as $P_A$. Then, the normal probability of the given fabric image is defined as $P_N = 1 - P_A$.
(2) For the detection head that recognizes specific defects, the defective probability of each ROI is defined as $P_j$ ($j = 1, \ldots, M$), where M indicates the number of ROIs.
(3) $P_N$ and all $P_j$ are concatenated, and the concatenated vector is then sent into the SoftMax activation function for normalization. Then, the probability that the given fabric image belongs to a normal sample or a certain defect could be obtained.

Student Network Structure.
Figure 6 exhibits the structure of the proposed student network. Different from the teacher network, the student network performs the following lightweight processing:
(1) The backbone part is thinned. Specifically, only two sets of BottleNeckCSP modules are preserved in the new backbone part. Details of the BottleNeckCSP module could be found in [29].
(2) PANet is removed to reduce the space-time complexity. The student network relies on the knowledge distilled from the teacher network to extract multiscale features.
The rest of the student network, including the attention enhancement, multitask learning strategy, and information fusion, is the same as in the teacher network.
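The information fusion step shared by the teacher and student networks can be illustrated with a short sketch that follows the three steps of the multitask learning strategy: take the largest defective probability from the ubiquitous defect head, derive the normal probability from it, concatenate it with the probabilities from the specific defect head, and normalize with SoftMax. The function name `fuse` and the example probabilities are hypothetical, and the sketch is not the implementation used in this work.

```python
import math

def fuse(ubiquitous_roi_probs, specific_roi_probs):
    """Fuse the outputs of the two detection heads into one probability vector.

    ubiquitous_roi_probs: defective probability of each ROI from the ubiquitous head.
    specific_roi_probs:   probabilities P_j from the specific defect head.
    Returns [normal, P_1, ..., P_M] after SoftMax normalization.
    """
    p_a = max(ubiquitous_roi_probs)  # ROI with the largest defective probability
    p_n = 1.0 - p_a                  # normal probability of the image
    concatenated = [p_n] + list(specific_roi_probs)
    m = max(concatenated)
    exps = [math.exp(v - m) for v in concatenated]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical head outputs for one fabric image.
fused = fuse([0.1, 0.9], [0.05, 0.8, 0.1])
predicted_index = fused.index(max(fused))  # 0 means "normal", j >= 1 means the jth entry
```

In this example the ubiquitous head is confident that a defect exists, so the normal entry receives a low score and the largest fused probability falls on the second specific entry.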

Loss Functions.
The network is trained in a multitask learning manner, and a weighted combined loss function is presented to optimize the network. The loss function consists of the following terms:
(1) The ubiquitous defect detection is treated as a binary classification problem. A cross-entropy loss function $L_T$ is used and defined as follows:

$$L_T = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right],$$

where $y_i$ represents the sample label, $p_i$ represents the output probability of the ubiquitous defect detection head, and N represents the number of samples.

Mathematical Problems in Engineering
(2) The specific defect detection is treated as a multiclass classification problem. A SoftMax loss function $L_s$ is used and defined as follows:

$$L_s = -\sum_{i=1}^{K} y_i \log s_i,$$

where K represents the number of kinds of specific defects, $y_i$ represents the ith entry of the one-hot encoding of the ground truth label, and $s_i$ indicates the probability that the sample belongs to the ith defect.
(3) Considering the sample imbalance in the ubiquitous defect detection head, the focal loss function $L_F$ is used to mitigate the problem. $L_F$ is defined as follows:

$$L_F = -\alpha_t (1 - p_t)^{\gamma} \log p_t,$$

where $p_t$ is the probability assigned to the ground truth class, $\alpha_t$ equals $\alpha$ for positive samples and $1 - \alpha$ for negative samples, and the hyperparameters $\alpha$ and $\gamma$ are used to alleviate the imbalance between positive and negative samples and the imbalance caused by difficult samples, respectively.
(4) To improve the feature discriminability in the specific defect detection head, the center loss function $L_C$ is employed to increase the interclass distances while reducing the intraclass distances of the learned features. $L_C$ is defined as follows:

$$L_C = \frac{1}{2}\sum_{i=1}^{N}\left\| x_i - c_{y_i} \right\|_2^2,$$

where $x_i$ represents the sample encoding, and $c_{y_i}$ is the center of the category to which $x_i$ belongs.
The final loss function of the proposed method is calculated in a weighted manner as follows:

$$L = w_1 L_T + w_2 L_s + w_3 L_F + w_4 L_C,$$

where the weights $w_1$, $w_2$, $w_3$, and $w_4$ are set to 0.4, 0.4, 0.1, and 0.1, respectively. The weight settings are obtained by cross-validation on the publicly available databases.
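To make the weighted combination concrete, the four loss terms can be sketched in plain Python as follows. This is an illustrative sketch rather than the training code of this work. The focal loss is averaged over the samples, and its default values α = 0.25 and γ = 2 are assumptions taken from common practice because the text does not report them; the assignment of the weights to the terms in the order in which the losses are listed is also an assumption.

```python
import math

EPS = 1e-12  # guards against log(0)

def binary_cross_entropy(labels, probs):
    """Cross-entropy for the ubiquitous defect head, averaged over N samples."""
    return -sum(y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
                for y, p in zip(labels, probs)) / len(labels)

def softmax_loss(one_hot, probs):
    """SoftMax loss for one sample over the K specific defect classes."""
    return -sum(y * math.log(s + EPS) for y, s in zip(one_hot, probs))

def focal_loss(labels, probs, alpha=0.25, gamma=2.0):
    """Focal loss averaged over samples; alpha and gamma defaults are assumed values."""
    total = 0.0
    for y, p in zip(labels, probs):
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha    # class-balancing factor
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t + EPS)
    return total / len(labels)

def center_loss(features, labels, centers):
    """Half the summed squared distance between each feature and its class center."""
    return 0.5 * sum(sum((x - c) ** 2 for x, c in zip(f, centers[y]))
                     for f, y in zip(features, labels))

def total_loss(l_t, l_s, l_f, l_c, weights=(0.4, 0.4, 0.1, 0.1)):
    """Weighted sum of the four terms with the weights reported in the text."""
    w1, w2, w3, w4 = weights
    return w1 * l_t + w2 * l_s + w3 * l_f + w4 * l_c

# Hypothetical mini-batch.
labels, probs = [1, 0, 1], [0.9, 0.2, 0.6]
l_t = binary_cross_entropy(labels, probs)
l_s = softmax_loss([0, 1, 0], [0.2, 0.7, 0.1])
l_f = focal_loss(labels, probs)
l_c = center_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], {0: [1.0, 0.0], 1: [0.0, 0.0]})
loss = total_loss(l_t, l_s, l_f, l_c)
```

With γ = 0 and α = 0.5, the focal term reduces to half of the cross-entropy, which shows that γ is the part that down-weights easy samples.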

Databases.
One public database comes from the Xuelang Tianchi AI Challenge. It contains 3,331 labeled images in which the defects are marked with rectangular boxes. The number of normal pictures is 2,163, and the number of defective pictures is 1,168. It has 22 kinds of defects, including jumps, knots, stains, puncture holes, and lacking warp. The data distribution of the database is unbalanced, with the number of normal pictures much higher than the number of defective pictures. Using the same experimental protocol as [19], the specific defect categories are regrouped into puncture hole, knots, rubbing hole, thin spinning, jumps, hanging warp, lacking warp, brushed hole, stains, and others. In the experiments, 70% of the entire database is taken as the training set, and the remaining 30% is used as the test set. Several training samples and their labels are shown in Figure 7.
The other public database used is TILDA, a well-known fabric texture database containing eight representative fabric categories. Seven error classes and a correct class are defined according to the textile atlas analysis. Similar to [40], 300 fabric images are chosen and divided into six categories, namely, holes, scratch, knots, stain, carrying, and normal. Each class consists of 50 fabric images, and each image is resized to 256 × 256 pixels. In the experiments, 70% of the entire database is taken as the training set, and the remaining 30% is used as the test set. Figure 8 demonstrates several samples and their labels.

Evaluation Metrics.
The defect detection algorithm proposed in this work could distinguish between normal and defective images and identify specific fabric defects. Therefore, the area under the ROC curve (AUC) and the mean average precision (mAP) are used as evaluation metrics. The former reflects the algorithm's ability to distinguish between normal and defective fabric images, whereas the latter reflects the algorithm's ability to recognize specific fabric defects. To calculate AUC and mAP, precision (P) and recall (R) are calculated first, as follows:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},$$

where TP (true positive) represents the number of samples whose labels are positive and whose predictions are positive, FP (false positive) indicates the number of samples whose labels are negative and whose predictions are positive, and FN (false negative) represents the number of samples whose labels are positive and whose predictions are negative. Based on the calculated P and R, the P-R curve could be obtained. The ROC curve is obtained similarly from the true positive and false positive rates, and the area under the ROC curve is the AUC. mAP represents the mean of the APs of the different categories, where AP represents the area under the P-R curve. mAP is calculated as follows:

$$\mathrm{mAP} = \frac{1}{k}\sum_{i=1}^{k} AP_i,$$

where k represents the number of categories, and $AP_i$ is the AP of the ith category.
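The metrics defined above can be computed with a few lines of plain Python. The sketch below is illustrative only: AP is approximated by the non-interpolated average of the precision values at each correctly retrieved positive, and AUC is computed through its rank interpretation (the fraction of positive-negative pairs that are ordered correctly). Both choices are assumptions, since the text does not specify the exact evaluation protocol, and the scores and labels are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from the counts of true/false positives and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / n_pos if n_pos else 0.0

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector scores and ground truth labels (1 = defective).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision(scores, labels)
m_ap = mean_average_precision([ap, 1.0])
area = auc(scores, labels)
```

In this toy example, one negative sample is scored above one positive sample, so three of the four positive-negative pairs are ordered correctly and the AUC is 0.75.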

Qualitative Analysis.
A qualitative analysis of the proposed method is performed from three aspects: (1) the ability of the proposed teacher network to detect specific defects on public databases is evaluated, and OurNet is used for comparison; (2) the accuracy of the proposed teacher network in locating the defect areas is evaluated, and the improved YOLOv3 proposed by Jing et al. [24] is used for comparison; and (3) comparisons between the teacher and student networks are performed on self-collected fabric images to verify the generalization performance of the proposed method. Quantitative comparisons between the teacher and student networks are introduced in the following section.

Figure 9 demonstrates the comparisons between the proposed teacher network and OurNet in detecting specific defects on the Tianchi AI database. The results show that our method successfully recognizes different defect types, benefiting from the multitask learning, the focal loss function, and the center loss constraint. By contrast, OurNet fails to identify the puncture hole defects. It also mistakes the brushed hole and thin spinning defects for others and jumps defects, respectively.

Figure 10 shows the localization results of the proposed teacher network and the improved YOLOv3 proposed by Jing et al. [24] on the Tianchi AI database. Types of specific defects are labeled under each subfigure for a clearer view. In each subfigure, the green box represents the real defect area, the red box is the positioning result of the proposed teacher network, and the yellow box is the positioning result of the improved YOLOv3. Figure 10 shows that the defect regions predicted by the proposed method are more accurate than those predicted by the improved YOLOv3. Such superiority may benefit from the strong YOLOv5 baseline and our improvements. The improved YOLOv3 struggles to localize small defect areas, although it could detect most defects. For example, it fails to detect the hanging warp and jump defects.
Figure 11 compares the teacher and student networks on self-collected fabric images, specifically, their performance in positioning defect areas. In each subfigure, the green box represents the real defect area, the red box is the positioning result of the teacher network, and the yellow box is the positioning result of the student network. The teacher network could identify the defect areas more accurately. The defect detection performance of the student network is slightly weaker than that of the teacher network. However, the student network has lower space-time overheads; thus, it is more suitable for deployment on embedded systems.

Quantitative Analysis Results.
An ablation study is performed on the Tianchi AI database to verify the effects of the different improvements, including multitask learning, focal loss, and center loss constraints. The results are presented in Table 1. The ablation study is reported for the teacher network; the student network shows similar results. Table 1 shows that the teacher network degrades into the traditional YOLOv5 when none of the improvements is used. Compared with the YOLOv5-based detection method, the introduced attention module leads to improved performance with increased AUC and mAP. Then, AUC and mAP are further improved by simultaneously detecting ubiquitous and specific defects with the proposed multitask learning strategy because of the complementarity between different tasks. Based on the multitask learning strategy, the introduction of the focal loss function and the center loss constraint could further improve the defect detection results. Using all improvements simultaneously achieves the best performance on the Tianchi AI database, which verifies the effects of the different improvements.
A quantitative comparison between the teacher and student networks is presented in Table 2. The identification times are tested on an NVIDIA Jetson TX2. The table shows that the student network could still meet the needs of fabric defect detection, despite the performance degradation observed compared with the teacher network. More importantly, the identification time of the student network is approximately half that of the teacher network. Its identification time guarantees real-time performance on embedded devices.
Finally, comparisons with other mainstream methods are performed to verify the effectiveness of the proposed method. The improved YOLOv3 [24] and the pretrained deep CNN [40] are selected as the fabric defect detection algorithms. Faster-RCNN [34] and YOLOv5 [29] are selected as the universal object detection methods. The comparison results are presented in Table 3.
The above table shows that the original OurNet based on AlexNet has poor defect detection performance because it fails to handle small defect areas well. Two variants of OurNet, namely, OurNet-VGG16 and OurNet-ResNet, obtain better performance, benefiting from the better features extracted with deeper structures. Jing et al. [24] achieve better defect detection performance using the improved YOLOv3 network. A pretrained CNN is also beneficial in boosting the defect detection performance, as proposed by Jing et al. [40]. YOLOv5 and faster-RCNN achieve similar defect detection performance, benefiting from their strong power in object detection. Both methods are superior to the student network proposed in this work, but their time overhead is relatively large. The proposed teacher network achieves the best fabric defect detection performance, whereas the student network provides an alternative to detect fabric defects with acceptable accuracy on embedded devices.

Table 4 presents the comparisons between different methods on the TILDA database. OurNet [41] and its variants perform much better than on the Tianchi AI database because the TILDA database contains fewer categories and equal numbers of samples per category. The improved YOLOv3 [24] and the pretrained CNN proposed by Jing et al. [40] achieve similar performance for the reason discussed above. Similar to the comparisons on the Tianchi AI database, two state-of-the-art detectors, YOLOv5 [29] and faster-RCNN [34], obtain higher AUC and mAP than the proposed student network. The proposed teacher network still achieves the best defect detection performance, which verifies the accuracy of the proposed method.

Figure 8 (panel labels): Normal, Knots, Holes, Scratch, Stain, Carrying.

Figure 11: Comparisons between the teacher and student networks on self-collected fabric images.

[24]  0.927  0.372
Jing et al. [40]  0.932  0.382
YOLOv5 [29]  0.957  0.412
Faster-RCNN [34]  0

Discussion and Conclusion
An automatic fabric defect detection method based on YOLOv5 is proposed because of the considerable role of fabric defect detection in the textile industry. A teacher-student architecture is used in consideration of the real-time requirements of fabric defect detection. The deep teacher network could precisely detect specific fabric defects. After knowledge distillation, the shallow student network could perform fabric defect detection in real time with acceptable accuracy. A multitask learning strategy is introduced to detect ubiquitous and specific defects simultaneously and to better utilize the complementarity between different tasks. Focal loss and center loss constraints are introduced for better defect detection performance. Evaluations are performed on the public databases and self-collected fabric images. Comparisons with other mainstream methods indicate that the proposed method is applicable to the automatic detection of textile defects, which can improve the accuracy and efficiency of defect detection and enhance the automation level of the textile industry.

Data Availability
The Xuelang Tianchi AI Challenge dataset is publicly available.