Research on Asphalt Pavement Disease Detection Based on Improved YOLOv5s

Pavement disease detection and classification is one of the key problems in computer vision and intelligent analysis. It is an automated target detection technology with great development potential that can improve the detection efficiency of road management departments. This research uses a convolutional neural network to detect asphalt pavement diseases under low resolution, occlusion interference, and complex environments. Considering the power of convolutional neural networks and their successful application in object detection, we apply them to asphalt pavement disease detection, and the detection results are used for subsequent analysis and decision-making. At present, most research on pavement disease detection focuses on crack detection, while multiclass disease detection has received less attention, and its accuracy and speed do not yet meet the needs of practical engineering applications. Therefore, a rapid asphalt pavement disease detection method based on an improved YOLOv5s is proposed. A complex scene data enhancement technique was developed to enhance and extend the original data and improve the robustness of the model. The improved lightweight attention module SCBAM was integrated into the backbone network to enhance feature extraction and improve the detection performance of the model for small targets. The spatial pyramid pooling was improved into SPPF to fuse the input features, which addresses the multiscale problem of the target and improves the inference efficiency of the model to a certain extent. The experimental results showed that, after the model is improved, the average accuracy of pavement disease detection reaches 94.0%. Compared with YOLOv5s, the precision of the improved YOLOv5-pavement is increased by 3.1%, the recall by 4.4%, the F1 score by 3.7%, and the mAP by 3.8%. For transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement, the detection accuracy of the method based on YOLOv5-pavement is improved by 3.4%, 3.1%, 4.0%, 7.5%, and 4.8%, respectively, compared with that based on YOLOv5s. The proposed method provides support for pavement disease detection work.


Introduction
Asphalt pavement is durable, smooth, and comfortable to drive on, which makes it widely used in road construction. During service, asphalt roads easily develop diseases due to vehicle load and water intrusion, which affect driving comfort and road traffic safety. Therefore, to keep roads in normal service, timely monitoring, evaluation, and repair of road diseases are necessary. At present, the detection methods for pavement diseases mainly include visual inspection, camera survey, and ground penetrating radar [1,2]. Visual inspection relies on observing road diseases by eye or on video, which is inefficient and error-prone. Camera survey requires professional detection vehicles that drive along the lane and take images of the road surface, followed by postprocessing software that identifies the types of pavement diseases. This method has high detection accuracy, but the detection cycle is long and there are many processing steps, resulting in low efficiency. Ground penetrating radar (GPR) can detect most pavement diseases by transmitting an electromagnetic pulse [3], but it is difficult to apply on a large scale because of its high cost. Zhou et al. [4] reviewed intelligent road disease detection and summarized the commonly used detection equipment, including cameras, GPR, LiDAR, and IMU. Their work systematically describes the evolution of road disease detection technology, which is of practical significance for its future development. Therefore, developing an automated pavement disease detection method based on images and deep learning is of great practical significance.
With the continuous development of computer technology, object detection algorithms based on deep learning have developed rapidly and have been widely used in autonomous driving, face recognition, crop disease and pest recognition, defect detection, and other fields [5][6][7]. There are two main types of object detection algorithms based on deep learning. The first is the two-stage target detection algorithm, which divides feature extraction and target localization into two stages, such as R-CNN [8,9] (Region-based Convolutional Neural Network), Fast R-CNN [10], and Faster R-CNN [11][12][13]. The second is the one-stage target detection algorithm, which integrates feature extraction and localization, such as SSD [14] (Single Shot MultiBox Detector) and the YOLO [15,16] (You Only Look Once) series. YOLOv5s is an improved version of the YOLO series [17][18][19], with a simple training process, a small storage footprint, and high detection accuracy, and it can be used for the recognition and detection of asphalt pavement diseases. However, YOLOv5s extracts target features through multilayer convolution and pooling operations, which results in a large loss of feature information and lowers detection accuracy. An enhancement module can be used to improve the feature extraction ability of the model and help with the detection task. In the industrial field, the last part of the deep residual network (ResNet) structure has been changed into a deformable convolution for lightweight improvement [20]. The feature fusion structure has been improved to raise the positioning accuracy and recognition rate [21]. An additional detection scale has been added [22], and NMS has been replaced by Distance Intersection over Union nonmaximum suppression (DIoU-NMS) [23,24], which can suppress false detections and enhance the detection capability for small targets. The performance of the model has also been greatly improved by adding a small target detection layer, introducing Squeeze-and-Excitation Networks (SENet) [25] and the CIoU loss function, and using transfer learning methods [26].
However, existing deep-learning-based road surface disease detection methods mainly focus on crack detection, and there are few studies on multiclass disease detection. For example, Zhu [27] proposed a pavement crack detection algorithm based on defect image segmentation and image edge detection. Wu et al. [28] proposed a novel and efficient image-processing method for extracting pavement cracks from blurred and discontinuous pavement images. Song et al. [29] proposed a crack detection network based on a multiscale dilated convolution module and an upsampling module. Li et al. [30] describe a vision-based pavement crack detection strategy that provides a direct pavement condition index (PCI) for specific pavement locations. This strategy uses a convolutional neural network (CNN) to mine a database containing more than 5000 pavement damage images and classify pavement crack types, and the overall pavement damage rate detection accuracy reaches 90%. Zhang et al. [31] proposed a unified crack and sealed crack detection method, which detects and separates cracks and sealed cracks under the same framework. It trains a deep convolutional neural network to preclassify pavement images into cracks, sealed cracks, and background regions. A block threshold segmentation method is then used to separate crack and sealed crack pixels. Finally, a curve detection method based on tensor voting extracts the cracks and sealed cracks. For the detection of multiclass road diseases, Dong and Liu [32] proposed an automatic road damage detection and location method using the Mask R-CNN algorithm and an active contour model. Tang et al. [33] proposed a new deep-learning framework called IOPLIN. Similarly, Zhao et al. [34] proposed DASNet, a deep convolutional neural network that can automatically identify road diseases. The network uses deformable convolution instead of conventional convolution as the input of the feature pyramid. Before feature fusion, the same supervisory signal is added to the multiscale features to reduce semantic differences. Context information is extracted through residual feature enhancement, and the information loss of the top pyramid feature map is reduced. The loss function is improved to address the imbalance between positive samples and negative background samples. Compared with the Faster R-CNN baseline, the method achieved an improvement of 41.1 mAP and 3.4 AP. Mao et al. [35] built a framework for a UAV-based pavement disease recognition and perception system, used a UAV to carry out a pavement image data acquisition experiment, analyzed pavement disease image preprocessing based on the wavelet threshold transform, studied pavement disease image preprocessing based on DPM, and proposed a pavement disease recognition method based on the VGG-16 neural network model. Because these algorithms are highly complex, have many parameters, and are large in size, many of the models are slow at detection, which makes it difficult to meet practical needs. Therefore, improving the recognition accuracy of multiclass road diseases while keeping the model lightweight is of high research significance.
This research takes four asphalt pavement diseases (transverse cracks, longitudinal cracks, mesh cracks, and potholes) together with repaired pavement as the research objects. A detection vehicle is used to collect pavement image data, the lightweight target detection model YOLOv5s is improved, and a dedicated fast detection model, YOLOv5-pavement, is developed for multiple asphalt pavement diseases. The improved attention module SCBAM is integrated into the backbone network, the spatial pyramid pooling structure is improved into SPPF, and a complex scene data enhancement technique is developed to enhance the training set images. The processing flow is shown in Figure 1. This method provides technical support for the accurate and rapid detection of multiclass disease targets on asphalt pavement.
Journal of Sensors
pixels × 1840 pixels, the pixel size is 5.5 μm × 5.5 μm, the sampling spacing is 5 m, the shooting height is 2 m, and the shooting time is daytime. More than 50,000 images were collected in this study, continuously along the road. After collection, the images containing diseases were screened out of the data set. After inspection, 7641 images in the data set were found to contain diseases. Due to certain limitations, this study covers four diseases (transverse cracks, longitudinal cracks, mesh cracks, and potholes) and repaired pavement, as shown in Figure 2.

Materials and Methods
2.1.2. Image Preprocessing. An object detection model based on deep learning is obtained by training on a large amount of image data, so it is necessary to build a data set with sufficient volume and rich variety. Firstly, 764 images were randomly selected from the 7641 disease images as the test set, and the remaining 6877 images were used as the training set. The distribution of image samples in the data set is shown in Table 1. Secondly, to improve the training efficiency of the road disease detection model, the 6877 original training images were compressed so that their length and width are 1/2 of the original. Then, the image annotation software "LabelImg" was used to draw bounding rectangles around the different road disease targets in the compressed images. Each target was labeled with the smallest rectangle enclosing it, so that the rectangle contains as little background area as possible. Finally, to enrich the background information of the image data, data enhancement was performed on the training set to better extract the features of the different labeled categories and avoid overfitting. Table 1 shows that the numbers of the different disease types differ greatly. The numbers of transverse cracks, longitudinal cracks, and mesh cracks are sufficient, so background information enhancement alone is enough to build a rich data set for them. The numbers of potholes and repaired pavements are seriously insufficient and unbalanced relative to the other classes, and many potholes are small targets, so these classes need to be enhanced and expanded. In the actual detection task, complex environmental factors, such as light, shadow, and water, are key factors affecting the accuracy of the model.
Because asphalt pavement diseases have different causes, transverse cracks and longitudinal cracks are strongly directional, so the traditional data enhancement methods of stretching and rotation cannot be used to enhance pavement disease data sets. To improve the detection performance of the model in complex environments and to address the sample imbalance among diseases, this study considered the complex environments a road surface may encounter and proposed a complex scene data enhancement technique based on functions in the Python data enhancement library "imgaug", which mitigates the unbalanced number of samples across the different diseases. The complex scene data enhancement is divided into two parts. In the first part, the magnitude of Gaussian noise, fog, rain, snow, mud, and other environmental noise is determined by calling the modules of the "imgaug" library, and the noise is then gradually added to the original image to imitate the road environment. In the second part, the motion blur, brightness adjustment, and saturation modules of the "imgaug" library are invoked and applied to the original image to mimic the weather environment. For each image, 2 or 3 enhancement methods are randomly selected from the first part and 1 enhancement method is randomly selected from the second part, and they are applied to the original image in turn to complete the image enhancement process.
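The random selection described above can be sketched as follows. This is only an illustration of the selection logic under stated assumptions: the operation names, the function `plan_complex_scene_augmentation`, and the grouping lists are hypothetical placeholders, and the sketch returns the names of the chosen operations rather than transforming an image. In practice each name would be mapped to the corresponding "imgaug" augmenter, and the library's `SomeOf` and `OneOf` meta-augmenters could express the same choice.

```python
import random

# Hypothetical operation names; each would map to an imgaug augmenter in practice.
ENVIRONMENT_OPS = ["gaussian_noise", "fog", "rain", "snow", "mud"]   # first part
APPEARANCE_OPS = ["motion_blur", "brightness", "saturation"]         # second part


def plan_complex_scene_augmentation(rng):
    """Pick 2 or 3 distinct first-part operations, then 1 second-part operation."""
    count = rng.choice([2, 3])
    first_part = rng.sample(ENVIRONMENT_OPS, count)   # sampled without repetition
    second_part = [rng.choice(APPEARANCE_OPS)]
    return first_part + second_part


# Seeded generator so the plans are reproducible from run to run.
rng = random.Random(0)
plans = [plan_complex_scene_augmentation(rng) for _ in range(5)]
for plan in plans:
    print(plan)
```

Each plan lists the operations in the order they would be applied, so every enhanced image receives three or four operations in total.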
The original image data were enhanced with the complex scene data enhancement technique, and the amount of data in each category was increased to about 2500. The final training set consisted of 12500 images, including 5623 enhanced images and 6877 original images, and was used to train the pavement disease detection model. Examples of training images after complex scene data enhancement are shown in Figure 3.

The Structure of YOLOv5s.
YOLOv5 is the latest in the YOLO series. The network model detection accuracy is high, the inference speed is fast, and the fastest detection speed can reach 140 frames per second (FPS). The size of the YOLOv5 target detection network model is small, only 14.4 M. Therefore, the advantages of the YOLOv5 network are high detection accuracy, lightweight, and fast detection speed.
YOLOv5s consists of four parts, as shown in Figure 4: input, backbone, neck, and prediction. The input part mainly handles data preprocessing. The backbone extracts image features at different levels through multilayer convolution operations. The neck network consists of two parts, a feature pyramid network (FPN) and the path aggregation network (PAN).
2.3. Improvement of YOLOv5s. The pavement disease detection algorithm needs to accurately identify various diseases under the varied conditions of a complex pavement environment. The simple structure of YOLOv5s, its weak feature extraction ability, and the complexity of the pavement environment together lead to low detection accuracy. The attention mechanism has been proven to enhance the feature extraction ability of a model, so adding an attention mechanism to the road surface disease detection model helps to improve the detection accuracy in complex environments and for small targets. Therefore, this study improves the backbone network of YOLOv5s by adding the improved attention module SCBAM, which strengthens the feature extraction ability of the model, improves its detection accuracy in complex scenes, and strengthens its ability to detect small targets. At the same time, to address the multiscale problem of the target to a certain extent and to improve the inference speed of the model, the SPP module was improved to SPPF. In this way, the asphalt pavement disease detection model is optimized to improve detection accuracy while keeping a high inference speed.

CBAM Attention Module and Its Improvement.
The convolutional block attention module (CBAM) is lightweight. CBAM processes the input feature map through its internal channel attention module and spatial attention module to simplify feature extraction and improve the detection speed of the model. The mechanism is weighted fusion (Equations (1) and (2)), where $F$ is the input feature map, $F_1$ is the feature map obtained by channel attention weighting, $F_2$ is the feature map obtained by spatial attention weighting, $M_c(F)$ is the channel attention output weight, and $M_s(F_1)$ is the spatial attention output weight. The CBAM attention module and its improvement are shown in Figure 5.
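Equations (1) and (2) did not survive in the text above. A reconstruction consistent with the symbol definitions and with the standard CBAM formulation is given here as an aid to the reader; the symbol $F_2$ for the spatially weighted output and the element-wise product symbol $\otimes$ are notation assumed for this reconstruction.

```latex
\begin{align}
F_1 &= M_c(F) \otimes F, \tag{1} \\
F_2 &= M_s(F_1) \otimes F_1. \tag{2}
\end{align}
```

The channel weight $M_c(F)$ is broadcast over the spatial dimensions and the spatial weight $M_s(F_1)$ is broadcast over the channels before each multiplication.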
The mechanism of the channel attention module is as follows: the input feature map $F$ is processed by global max pooling and global average pooling to obtain two 1 × 1 × c feature maps. Both are fed to a shared multilayer perceptron (MLP), the two MLP outputs are added element-wise, and the sum is passed through the Sigmoid activation function to generate the channel attention weight $M_c$. Finally, $M_c$ and the input feature map $F$ are multiplied element-wise to obtain $F_1$.
The principle of spatial attention is that the spatial dimension is unchanged while the channel dimension is compressed. This module focuses on the location information of the target. Its operation flow is as follows: $F_1$ is taken as the input feature map of this module. Maximum pooling and average pooling each generate a tensor of size H × W × 1, the two are stacked together by a concat operation, and a convolution then reduces the number of channels to 1, with H and W unchanged. $M_s$ is obtained through the Sigmoid activation function, and finally $M_s$ and $F_1$ are multiplied to obtain the output features.
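The two attention weights described above can be written compactly as follows. This is the standard CBAM formulation, stated here as a sketch: $\sigma$ denotes the Sigmoid function, $f^{k \times k}$ a convolution with a $k \times k$ kernel, and $[\,\cdot\,;\,\cdot\,]$ channel-wise concatenation, all of which are notation assumed for this illustration rather than taken from the text.

```latex
\begin{align*}
M_c(F) &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
M_s(F_1) &= \sigma\big(f^{k \times k}\big([\mathrm{AvgPool}(F_1);\ \mathrm{MaxPool}(F_1)]\big)\big).
\end{align*}
```

In $M_c$ the pooling runs over the spatial dimensions, giving a 1 × 1 × c result, whereas in $M_s$ it runs over the channel dimension, giving an H × W × 1 result.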
The traditional spatial attention module squeezes image spatial information using average pooling and maximum pooling. The error of feature extraction comes from two sources: the increase in the variance of the estimate caused by the limited neighborhood size, and the shift of the estimated mean caused by errors in the convolution layer parameters. The former can be reduced by average pooling, which retains more background information, while the latter can be reduced by maximum pooling, which retains more detailed texture information. However, this squeezing method does not make full use of the spatial information of the image and does not capture spatial information at different scales to enrich the feature space. Moreover, spatial attention only considers the information of local regions and cannot establish long-distance dependence. In this paper, in addition to squeezing image spatial information with average pooling and maximum pooling, stochastic pooling is added to enrich the spatial information and retain detailed texture information. Stochastic pooling assigns a probability to each pixel according to its numerical value and then subsamples according to that probability. The improved spatial attention module applies a stochastic pooling operation to the input feature map, and the spatial attention feature map is then obtained by concatenating the three feature descriptors and applying a 3 × 3 convolution kernel. This scheme helps to obtain more feature information and strengthens the feature extraction ability of the model for small targets.
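A minimal sketch of how the three feature descriptors might be computed is given below. It rests on several assumptions not stated in the text: stochastic pooling is taken across the channel dimension, like the average and maximum pooling in the spatial attention module; activations are non-negative, as the probability assignment requires; and the names `stochastic_pool` and `spatial_descriptors` are hypothetical. The 3 × 3 convolution and Sigmoid that follow are omitted.

```python
import random


def stochastic_pool(values, rng):
    """Sample one activation with probability proportional to its value."""
    if any(v < 0 for v in values):
        raise ValueError("stochastic pooling assumes non-negative activations")
    if sum(values) == 0:
        return 0.0  # every candidate is zero, so the pooled value is zero
    return rng.choices(values, weights=values, k=1)[0]


def spatial_descriptors(feature_map, rng):
    """Pool a C x H x W nested list across channels.

    Returns three H x W maps: average, maximum, and stochastic pooling.
    """
    height, width = len(feature_map[0]), len(feature_map[0][0])
    avg_map, max_map, sto_map = [], [], []
    for i in range(height):
        avg_row, max_row, sto_row = [], [], []
        for j in range(width):
            # All channel activations at spatial position (i, j).
            column = [channel[i][j] for channel in feature_map]
            avg_row.append(sum(column) / len(column))
            max_row.append(max(column))
            sto_row.append(stochastic_pool(column, rng))
        avg_map.append(avg_row)
        max_map.append(max_row)
        sto_map.append(sto_row)
    return avg_map, max_map, sto_map


# Two channels, one row, two columns.
demo = [[[0.0, 2.0]], [[4.0, 2.0]]]
avg_map, max_map, sto_map = spatial_descriptors(demo, random.Random(0))
print(avg_map, max_map, sto_map)
```

Because larger activations are sampled more often but not always, the stochastic descriptor sits between the average and maximum descriptors in how strongly it favors the strongest response.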
In this study, the improved CBAM attention module is named SCBAM and added to the C3 module of the backbone, as shown in Figure 6. The function of the attention module is to give more weight to important areas. When outputting diseases, it focuses on the corresponding areas in the picture to improve the feature extraction ability for road diseases.

Improvement of Spatial Pyramid Pooling.
Spatial pyramid pooling (SPP) was proposed by Microsoft in 2015 [36]. It can transform input features of any scale into the same dimension and output the features after stacking the blocks in parallel, which addresses the underutilization of deep semantic information. Different SPP structures are shown in Figure 7. The feature extraction process is as follows: firstly, the input feature map is partitioned by pooling grids of 1 × 1, 2 × 2, and 4 × 4, giving 21 subblocks. Secondly, the maximum value is selected from each of the 21 subblocks, so that an input feature map of arbitrary size is converted into a fixed-size 21-dimensional feature. Finally, the 21-dimensional features are stacked to complete the SPP process. Compared with the original SPP structure, the new SPP has one more convolution block (CBL: Conv, Batch Normalization, Leaky ReLU) before and after it, and the sizes of the pooling kernels in the middle are 1 × 1, 5 × 5, 9 × 9, and 13 × 13, respectively.
The SPP module fuses local features and global features, enriches the expressive ability of the final feature map, and thus improves the mAP. However, the parallel pooling method is time-consuming. Therefore, this paper adopts the more efficient improved spatial pyramid pooling structure (SPPF), and the expression of deep semantic information is enhanced by stacking the features after multikernel pooling. SPPF changes the pooling of the input feature layers from parallel to serial, and the outputs of the successive serial pooling steps are stacked to increase the receptive field and enrich the feature information. SPPF uses three 5 × 5 serial pooling layers instead of a single 13 × 13 pooling layer to obtain the same result, but the calculation is significantly faster. With the same basic convolution code and with 32 input and 128 output channels, the inference time of SPP is 0.5373 s, while the inference time of SPPF is 0.2078 s, which is 0.3295 s less than SPP.
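The equivalence between serial and parallel pooling can be checked with a small sketch. It assumes stride-1 max pooling with padding that never contributes to the maximum, which is how such pooling layers are usually configured in this kind of block; the function names `max_pool_same`, `sppf_branches`, and `spp_branches` are hypothetical, and the surrounding convolution blocks are left out.

```python
import random


def max_pool_same(grid, k):
    """Stride-1 max pooling that keeps the grid size; windows are clipped at the border."""
    height, width = len(grid), len(grid[0])
    radius = k // 2
    pooled = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [
                grid[y][x]
                for y in range(max(0, i - radius), min(height, i + radius + 1))
                for x in range(max(0, j - radius), min(width, j + radius + 1))
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


def sppf_branches(grid):
    """Serial pooling: each 5 x 5 pooling reuses the previous result."""
    first = max_pool_same(grid, 5)
    second = max_pool_same(first, 5)
    third = max_pool_same(second, 5)
    return first, second, third


def spp_branches(grid):
    """Parallel pooling: 5 x 5, 9 x 9, and 13 x 13 applied to the same input."""
    return tuple(max_pool_same(grid, k) for k in (5, 9, 13))


rng = random.Random(0)
feature = [[rng.random() for _ in range(16)] for _ in range(16)]
print(sppf_branches(feature) == spp_branches(feature))
```

Two serial 5 × 5 poolings cover the same neighborhood as one 9 × 9 pooling and three cover the same neighborhood as one 13 × 13 pooling, so the serial form produces the same branches while repeating only the cheapest operation.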
Based on the improvements of YOLOv5s for the multiclass asphalt pavement disease detection task, the overall framework for asphalt pavement disease detection was proposed and named YOLOv5-pavement. The attention module was added to the backbone network, the SPP module was improved to SPPF, and the data set was enhanced by using data enhancement technology. Its overall framework is shown in Figure 8.
The initial learning rate, weight decay coefficient, training momentum, confidence, IoU threshold, and number of epochs for training the pavement disease detection model based on YOLOv5-pavement were set to 0.001, 0.0005, 0.9, 0.5, and 300, respectively. After training, the weight file of the pavement disease detection model was saved, the model was validated on the test set, and the results were used to evaluate its performance. The final output of the network is the bounding boxes of the five target classes and the probability that each box belongs to a specific class.

Evaluation Indicators of Model.
Precision, recall, F1 score, mean average precision (mAP), space occupied by the model, and inference speed were selected as the evaluation indexes of each model.
Precision and recall are important indexes for evaluating model accuracy, as shown in Equations (3) and (4).
Here, P is the precision, R is the recall, TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. The F1 score is a measure for classification problems. It is the harmonic mean of precision and recall, as shown in Equation (5). AP and mAP are calculated as shown in Equations (6) and (7).
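Equations (3) to (7) did not survive in the text. The following reconstruction uses the standard definitions together with the symbols defined in this section. As assumptions made for this reconstruction, $n$ denotes the number of positive examples of a class, $P_i$ the precision at the $i$-th positive example, and $N$ the number of categories; $N$ is used in place of the text's second use of $n$ to avoid a clash between the two meanings.

```latex
\begin{align}
P &= \frac{TP}{TP + FP}, \tag{3} \\
R &= \frac{TP}{TP + FN}, \tag{4} \\
F1 &= \frac{2 P R}{P + R}, \tag{5} \\
AP &= \frac{1}{n} \sum_{i=1}^{n} P_i, \tag{6} \\
mAP &= \frac{1}{N} \sum_{j=1}^{N} AP_j. \tag{7}
\end{align}
```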
AP at 0.5 is defined as follows: with the IoU threshold set to 0.5, all detection results for a class with n positive examples are sorted in descending order of confidence, and each additional positive example corresponds to a precision value ($P_i$). AP at 0.5 is the average of these precision values, and mAP at 0.5 is the mean of the AP values over all classes. In the mAP calculation, n denotes the number of detected categories.
Figure 8: The structure of YOLOv5-pavement.
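The AP definition above can be illustrated with a short sketch. It assumes that each detection has already been matched against the ground truth at an IoU threshold of 0.5 and sorted by descending confidence, so the input is simply a list of true/false flags; the function names and the example flags are hypothetical and are not results from this study.

```python
def average_precision(ranked_is_positive, num_positives):
    """Average the precision values recorded at each correctly detected positive.

    ranked_is_positive: flags for detections sorted by descending confidence.
    num_positives: number of positive examples of the class (n in the text).
    """
    if num_positives == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_positive in enumerate(ranked_is_positive, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision when this positive is added
    return precision_sum / num_positives


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative flags only: three detections, the second one is a false positive.
ap_example = average_precision([True, False, True], num_positives=2)
map_example = mean_average_precision([0.5, 1.0])
print(round(ap_example, 4), map_example)
```

Dividing by the number of positive examples rather than by the number of detected positives means that a missed target lowers the AP, which matches the role of recall in the metric.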

Results of Training and Validation.
The change in the training loss value of YOLOv5-pavement is shown in Figure 9. The loss value drops sharply during the first 20 iterations. From 20 to 300 iterations, the loss value changes steadily and shows a slow downward trend. After 300 iterations, the loss value stabilizes around 0.02, and the model reaches a relatively stable state, completing the training. The results in the figure show that the model is well trained without overfitting.

Analysis of Pavement Disease Detection.
To verify the performance of the pavement disease detection model based on YOLOv5-pavement proposed in this study, the recognition results of the model on the 764 images in the test set were further analyzed. There are 910 targets to be detected in the 764 test set images, including 307 transverse cracks, 253 longitudinal cracks, 248 mesh cracks, 39 potholes, and 63 repaired pavement targets. The pavement disease detection results based on YOLOv5-pavement are shown in Table 2 and Figure 10, including the precision, recall, and mAP of each type of detection target. The validation results based on YOLOv5s in Table 2 and Figure 10 show that the average precision of transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement is 88.5%, 88.9%, 92.7%, 87.2%, and 87.3%, respectively, and the mAP is 89.7%. In the validation based on YOLOv5-pavement, the average precision of transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement is 91.9%, 92.0%, 96.7%, 94.7%, and 92.1%, respectively, and the mAP is 93.5%. The corresponding F1 scores of YOLOv5-pavement for transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement are 91.7%, 91.1%, 94.1%, 94.9%, and 88.9%, and the F1 score of the model is 92.0%. Figure 10 shows that the average precision of YOLOv5s for transverse cracks, longitudinal cracks, potholes, and repaired pavement is lower than 90%. This is because many of the transverse cracks, longitudinal cracks, and potholes collected in this study are small targets, while the feature extraction ability of YOLOv5s is weak, resulting in poor detection of small targets and low average precision for potholes.
The characteristics of the repaired pavement are often not obvious, and the feature extraction ability of YOLOv5s is weak, resulting in low average precision for the repaired pavement. Compared with the pavement disease detection method based on YOLOv5s, the average precision of YOLOv5-pavement for transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement improved by 3.4%, 3.1%, 4.0%, 7.5%, and 4.8%, respectively. The detection results show that the improved scheme in this paper has a positive effect on the detection of pavement diseases, effectively improves the feature extraction ability of the model and its ability to detect small targets, and significantly improves the average precision for the four types of asphalt pavement diseases and the repaired pavement. Figure 11 shows the detection results of the two models for the five types of targets in a complex environment. The model based on YOLOv5s misses mesh cracks under strong light and misses small potholes under weak light. The model based on YOLOv5-pavement was not disturbed by the environment: the mesh cracks were completely detected, and all the potholes were detected in the low-brightness environment.

Ablation Experiment and Comparison of Different Models.
To verify the contribution of each module of the improved algorithm, ablation experiments were conducted. Starting from YOLOv5s, the CBAM module, the SCBAM module, and SPPF were added in turn to explore their influence on the model. The ablation experiment results are shown in Table 3. Compared with plan A, the plan B model, which adds the SPPF module, improves precision, recall, F1 score, mAP, and inference speed by 0.6%, 0.8%, 0.7%, 0.4%, and 4 FPS, respectively. The results show that SPPF better integrates the feature information of the target in the multiscale detection task, improves the detection accuracy of the algorithm, and accelerates the inference speed of the model to a certain extent. Compared with plan A, plan D, which integrates the improved SCBAM attention module, increases precision by 2.8%, recall by 3.2%, the F1 score by 3.0%, and the mAP by 2.5%. This shows that the feature extraction ability of the model is enhanced after adding SCBAM. Compared with plan C, the plan D model with SCBAM increases precision by 0.6%, recall by 0.4%, the F1 score by 0.5%, and the mAP by 0.4%. This shows that the improved SCBAM retains more target features than CBAM and strengthens the feature extraction ability of the model. The results of the pavement disease detection model based on YOLOv5-pavement in plan E show that precision, recall, F1 score, and mAP are increased by 3.1%, 4.4%, 3.7%, and 3.8%, respectively, compared with plan A. The ablation experiments show that each improved module has a positive effect on the model.
To verify the effectiveness of the proposed method, the proposed model was compared with other object detection models on 200 pavement images selected for testing. Faster R-CNN, SSD, YOLOv3, YOLOv4, and the YOLOv5-pavement proposed in this paper were selected for the comparison experiments. The results are shown in Table 4.
Experimental results show that the mean average precision (mAP) of the pavement disease detection model based on YOLOv5-pavement was slightly lower than that of the two-stage algorithm Faster R-CNN and second only to YOLOv4 among the one-stage algorithms, reaching 93.5%. Compared with SSD and YOLOv3, the mAP of YOLOv5-pavement is higher by 7.4% and 9.6%, respectively. In terms of the space occupied by the model, YOLOv5-pavement is the smallest, at only 14.1 M, which is only 7.5% of the size of SSD and 6.0% of the size of YOLOv3. Moreover, YOLOv5-pavement has an obvious advantage in inference speed, reaching 82 FPS, which is 34 FPS, 70 FPS, 26 FPS, and 18 FPS faster than SSD, Faster R-CNN, YOLOv3, and YOLOv4, respectively. SSD, Faster R-CNN, YOLOv3, and YOLOv4 spent 16.1 s, 29.3 s, 14.2 s, and 14.0 s, respectively, detecting the 200 pavement images, while YOLOv5-pavement took only 12.6 s. The method in this paper is therefore the fastest on the same detection task.
Figure 11: Detection results of two models in a complex environment.

Discussion
The advantages of the pavement disease detection method proposed in this paper are reflected in the following aspects. First, the model can detect and identify multiple classes of disease, a capability that many pavement disease detection models lack. Second, the model is stable in complex environments and can adapt to a variety of detection conditions. Finally, the detection accuracy of the improved YOLOv5-pavement is significantly higher, while its high inference speed and light weight are maintained, which gives it potential for large-scale deployment and application. YOLOv5 includes four architectures (YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x), whose main difference is the number of feature extraction modules and convolution kernels at specific locations in the network. The model size and the number of parameters increase from each architecture to the next. YOLOv5 is therefore highly flexible, and a suitable architecture can be chosen for development and application according to actual requirements. This study weighs both the precision and the inference speed of the pavement disease detection algorithm. Because pavement disease detection relies on images collected by a road detection vehicle travelling along the road, YOLOv5s, the smallest of the four architectures, has the greatest potential for deployment on mobile terminals: it can be deployed on the detection vehicle and detect pavement diseases directly during image acquisition.
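How the four architectures grow can be sketched as follows. The sketch assumes the depth and width multipliers from the public Ultralytics YOLOv5 model configuration files and their usual rounding rules; the base block with 9 repeats and 1024 output channels is a hypothetical example, and none of these values are taken from this paper.

```python
import math

# (depth_multiple, width_multiple) per variant, assumed from the public
# Ultralytics YOLOv5 configuration files.
VARIANTS = {
    "YOLOv5s": (0.33, 0.50),
    "YOLOv5m": (0.67, 0.75),
    "YOLOv5l": (1.00, 1.00),
    "YOLOv5x": (1.33, 1.25),
}


def scale_depth(repeats, depth_multiple):
    """Scale the number of repeated modules, keeping at least one."""
    return max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats


def scale_width(channels, width_multiple, divisor=8):
    """Scale the channel count and round up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# Hypothetical base block: 9 repeated modules with 1024 output channels.
base_repeats, base_channels = 9, 1024

scaled = {
    name: (scale_depth(base_repeats, d), scale_width(base_channels, w))
    for name, (d, w) in VARIANTS.items()
}

for name, (repeats, channels) in scaled.items():
    print(f"{name}: {repeats} repeats, {channels} channels")
```

Under these assumptions YOLOv5s keeps a third of the repeated modules and half of the channels of YOLOv5l, which is consistent with it being the smallest of the four architectures.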
The pavement disease detection method proposed in this study relies on images to detect and classify pavement diseases. It therefore cannot work at night without lighting, and heavy daytime traffic also affects the detection work to some extent. Owing to the limitations of the experimental conditions, this study covered only some of the diseases that occur on asphalt pavement; it did not detect and classify all asphalt pavement disease types, and it did not consider the disease types that may occur on roads built from other materials, which limits its scope of application. In addition, the proposed method can detect only diseases on the road surface, while diseases arising in the internal structure of the road cannot be detected, which is another limitation of this method.

Conclusion
To address the lack of detection methods for asphalt pavement diseases, the low accuracy of automated detection models, their poor small-target detection ability, and their sensitivity to environmental interference, an asphalt pavement disease detection method based on improved YOLOv5s was proposed.
A complex scene data enhancement technique is proposed to balance the different types of pavement disease and to simulate complex road environments, which improves the robustness of the model. The SCBAM attention module is integrated into the backbone network to strengthen its feature extraction ability and to improve the detection accuracy of the model for difficult-to-detect objects. The SPP is improved into SPPF so that the input features are further fused, which addresses the multiscale target problem and improves the inference speed to a certain extent. The results showed that the improved network model can effectively recognize road surface diseases. The precision, recall, F1 score, mAP, and detection speed of the model reached 91.2%, 92.9%, 92.0%, 93.5%, and 82 FPS, respectively. Compared with the detection method based on YOLOv5s, the mAP for transverse cracks, longitudinal cracks, mesh cracks, potholes, and repaired pavement increased by 3.4%, 3.1%, 4.0%, 7.5%, and 4.8%, respectively, and the overall mAP increased by 3.8%. The ablation experiments show that the proposed improvements raise the accuracy of the pavement disease detection method while keeping the model lightweight and its inference speed high. The comparison between YOLOv5-pavement and other models shows that the detection accuracy of YOLOv5-pavement has certain advantages, and that the gap between YOLOv5-pavement and Faster R-CNN, which has the highest accuracy, is very small. Moreover, YOLOv5-pavement has the fastest inference speed, and the model occupies only 14.1 M. The asphalt pavement disease detection model presented in this paper therefore has practical value for the detection of asphalt pavement diseases.
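The reported F1 score can be checked against the reported precision and recall with the standard definition of F1 as their harmonic mean. The sketch below is a minimal consistency check; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of YOLOv5-pavement as reported in the conclusion.
reported_precision = 0.912
reported_recall = 0.929

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1 * 100:.1f}%")
```

The computed value rounds to the 92.0% reported above, so the three figures are mutually consistent.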

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare no conflict of interest.

Authors' Contributions
All the authors contributed substantially to the manuscript. Original draft preparation and formal analysis were performed by Lingxiao WU. Supervision was conducted by Zhugeng Duan. Review and editing were done by Chenghao Liang. All authors have read and agreed to the published version of the manuscript.