Research on Road Adhesion Condition Identification Based on an Improved ALexNet Model

Automotive intelligence has become a revolutionary trend in automotive technology. Complex road driving conditions directly affect driving safety and comfort. Improving the recognition accuracy of the road type or road adhesion coefficient therefore enhances the ability of vehicles to perceive their surroundings and further contributes to vehicle intelligence. In this paper, considering that manually extracting image features is complicated and that the extraction results vary from person to person, a road surface condition identification method based on an improved ALexNet model, namely, the road surface recognition model (RSRM), is proposed. First, the ALexNet network model is pretrained offline on the ImageNet dataset. Second, the trained weights of the shallow network structure, including the convolutional layers, are saved and migrated to the proposed model. In addition, the three fully connected layers attached to the shallow network are replaced by two, which improves the training accuracy and shortens the training time. Finally, traditional machine learning methods and the improved ALexNet model are compared in terms of adaptability, prediction output, and error performance, among others. The results show that the accuracy of the proposed model is approximately 10% higher than that of the traditional machine learning method and 1.91% higher than that of the ALexNet model, and that its training is 0.6 h faster than that of ALexNet. These results verify that RSRM effectively improves the network training speed and the accuracy of road image recognition.


Introduction
As car ownership has continued to rise, traffic jams, delays, and accidents have spiraled upward. According to statistics [1,2], 16.12% of highway traffic accidents are attributed to slippery road conditions and to drivers' responses to terrain changes caused by road damage. To improve vehicle safety, the focus of vehicle safety control research has gradually shifted from passive safety to active safety. As an important part of a vehicle's perception of its surroundings, road surface type recognition plays an important role not only in the power performance, smoothness, and comfort of intelligent vehicles but also in their safety.
In the 1960s, while studying neurons responsible for local sensitivity and orientation selection in the visual cortex of cats, Wiesel and Hubel [3] found that their unique network structure could effectively reduce the complexity of feedback neural networks, a finding that later inspired the convolutional neural network (CNN). LeCun et al. [4] achieved a major breakthrough in optical character recognition by using a CNN, which promoted the development of computer vision. In recent years, CNNs have been widely used in many fields and have shown excellent performance in image target detection [5][6][7] and classification [8,9]. The emergence of CNNs also provides a new solution for road condition recognition.
Several researchers and institutions have studied pavement type identification and adhesion coefficient prediction. Chen [10] extracted feature parameters from the gray-level co-occurrence matrix of pavement images, studied the selection of pavement texture features, and achieved some success. However, this method uses few image features and has low recognition accuracy. Bekhtike and Kobayashi [11] used a camera to collect pavement images, evaluated the texture attributes obtained from the fractal dimension using Gaussian process regression for function approximation, and predicted road types by fusing road texture features with vibration data measured during motion. This method still has limitations: when the background lighting changes significantly, when motion blur occurs, or when the road is covered by rain, snow, or ice, it is difficult to identify the road type accurately.
Ward and Iagnemma [12] successfully classified asphalt, paved, and gravel roads with acceleration sensors. This method has drawbacks when road surfaces have similar roughness, and acceleration data alone are clearly insufficient to distinguish road types. Alonso et al. [13] proposed a real-time acoustic pavement state recognition system based on tire noise, which uses a noise measurement system and a signal processing algorithm to classify the pavement state, and achieved accurate classification of wet and dry pavements.
Neupane and Gharaibeh [14] proposed a method for detecting pavement types based on heuristic lidar and identified the pavement type by the mean and variance of the laser reflection intensity.
This method is mainly applicable to asphalt pavement. Jonsson et al. [15] proposed road classification based on spectral analysis of near-infrared camera images, using k-nearest neighbor (KNN) and support vector machine (SVM) methods to classify dry, wet, icy, and snowy roads with some success. Bystrov et al. [16] analyzed the reflected signals of automotive ultrasonic sensors for road classification, achieving a recognition accuracy of up to 89%.
Meng [17] proposed a machine-learning-based method that classifies pavement types by combining vertical acceleration sensor signals with camera image features. Using acceleration data or image data alone, the road type was identified with an accuracy of only 62% and 88%, respectively; combining the two raised the accuracy to only 90% because of the small sample size. Wang [18] classified road images based on high-dimensional features and radial basis function (RBF) neural networks and performed recognition experiments on eight different road images with an accuracy of approximately 78.4%. Zhao et al. [19] obtained the best SVM classification model through particle swarm optimization (PSO) of its parameters, classified road types, and achieved an accuracy of over 90% for five basic road types.
Casselgren et al. [20] studied the optical behavior of asphalt pavements covered by water, ice, or snow. They examined in detail how light intensity changes with the angle of incidence and across the spectrum and proposed two different wavebands for classifying road conditions. Linton and Fu [21] described a connected-vehicle-based winter road surface condition (RSC) monitoring solution that combines vehicle-based image data with data from road weather information systems. Jokela et al. [22] presented a method, with evaluation results, for monitoring and detecting road conditions (ice, water, snow, and dry asphalt). The device is based on changes in light polarization upon reflection from the road surface. Its recognition capability was improved with texture analysis, which estimates the contrast content of an image, but the results show that the solution does not yet adapt well to all conditions. Yeong [23] and Yu and Salari [24] developed pothole detection systems and methods using 2D lidar. Caltagirone [25] developed a method for road detection in point cloud top-view images using a fully convolutional network. However, according to [26,27], even lidar lasers of the safest class can damage the human eye during prolonged exposure (e.g., cataracts and retinal burns). As intelligent vehicles become more widespread, this may become a problem. Therefore, this study aims to improve road condition recognition through image-based vision.
In summary, most road recognition algorithms are based on traditional machine learning, which uses manually extracted image features as the algorithm input. This extraction process is somewhat arbitrary, and the overall pipeline, including the classification algorithm, is complex. To solve these problems, this paper proposes a road surface condition identification method based on an improved ALexNet model, namely, the road surface recognition model (RSRM). The main contributions of this paper are as follows: (1) The ALexNet [28] network model is pretrained offline on the ImageNet [29] dataset. The trained weights of the shallow network structure, including the convolutional layers, are saved and migrated to the proposed model. In addition, the three fully connected layers attached to the shallow network are replaced by two, which improves the training accuracy and shortens the training time. (2) Traditional machine learning methods and the improved ALexNet model are compared in terms of adaptability, prediction output, and error performance, among others.

Research Method for Identifying Road Surface Conditions Based on Improved ALexNet Model (RSRM)
Traditional road type identification methods have several limitations, such as a complex feature extraction process, weak adaptability, poor robustness to lighting, low recognition accuracy, and difficulty in practical application. Meanwhile, the rapid development of artificial neural networks has driven the progress of deep learning [30] in recent years. Common deep learning networks include autoencoders [31], deep belief networks [32], and CNNs, among which CNNs play a key role in image recognition. Because road condition recognition is an image recognition problem, this study builds the road image recognition model by combining CNNs with deep learning theory. The CNN learns road image features by itself during training and can then identify actual road types.

Journal of Advanced Transportation

Convolutional Neural Networks (CNN)
CNNs [33] achieve high efficiency and accuracy in image recognition because the convolutional kernels in the hidden layers share parameters and the connections between layers are sparse. A CNN model is generally formed by alternately stacking convolutional layers and pooling layers, and the operation that each layer applies to its input is stored in that layer's weights. The loss function evaluates the difference between the output and target values. The optimizer uses this difference as a feedback signal to update the weights through the backpropagation algorithm [34], gradually reducing the loss for the current target and making the network's predictions more accurate. The feature values output by the last pooling layer are converted into a vector by the fully connected layers and fed to SoftMax [35] for classification and recognition. The CNN training process is shown in Figure 1.
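The feedback loop described above (loss, gradient, weight update) can be sketched in a few lines. This is a minimal illustration rather than the authors' training code: it assumes stochastic gradient descent with momentum (a momentum parameter is mentioned in Step 4 of the experimental procedure) and replaces the network with a hypothetical one-parameter quadratic loss whose gradient is known exactly.

```python
def sgdm_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent-with-momentum update: v <- m*v - lr*g, then w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def minimize_quadratic(target=3.0, w0=0.0, steps=300):
    """Minimise the toy loss L(w) = (w - target)^2, whose gradient is 2*(w - target)."""
    w, v = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)   # error signal; a real CNN obtains it by backpropagation
        w, v = sgdm_step(w, grad, v)
    return w


if __name__ == "__main__":
    print(round(minimize_quadratic(), 4))
```

In a CNN the scalar `w` becomes every trainable weight and the gradient comes from backpropagating the loss through the layers, but the update rule has the same shape.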

Convolutional Layers.
The convolutional layers perform convolution operations on the input image or feature map to extract features and output the convolved feature maps. As shown in equation (1), each feature map of a convolutional layer is obtained by combining multiple feature maps output by the previous layer:
$$X_n^l = \sum_{i \in M_n} X_i^{l-1} * K_{in}^l + b_n^l \qquad (1)$$
where $M_n$ is the set of input feature maps selected for the n-th output map, $X_n^l$ is the n-th feature map in the l-th layer, $K_{in}^l$ is the i-th element of the n-th convolution kernel in the l-th layer, $b_n^l$ is the n-th offset of the l-th layer, and "*" denotes convolution.
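Equation (1) can be sketched in plain Python. This is a minimal illustration under stated assumptions: "valid" convolution (stride 1, no padding) and, as in most CNN libraries, cross-correlation without kernel flipping; the function names and toy inputs are hypothetical.

```python
def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation of map x with kernel k (stride 1, no padding)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + a][c + b] * k[a][b] for a in range(kh) for b in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv_layer_output(input_maps, kernels, bias):
    """Equation (1): X_n^l = sum over i in M_n of (X_i^{l-1} * K_in^l) + b_n^l.

    input_maps: the maps X_i^{l-1} for i in M_n
    kernels:    one kernel slice K_in^l per input map
    bias:       the scalar offset b_n^l
    """
    partial = [conv2d_valid(x, k) for x, k in zip(input_maps, kernels)]
    h, w = len(partial[0]), len(partial[0][0])
    return [[sum(p[r][c] for p in partial) + bias for c in range(w)]
            for r in range(h)]


x1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
k1 = [[1, 0], [0, -1]]   # diagonal-difference (edge-like) kernel
k2 = [[1, 1], [1, 1]]    # box-sum kernel
print(conv_layer_output([x1, x2], [k1, k2], bias=0.5))
```

Every output element sums the response of `k1` on `x1` (always -4 here) and of `k2` on `x2` (always 4) and then adds the offset, so the printed map is `[[0.5, 0.5], [0.5, 0.5]]`.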

Pooling Layers.
The pooling layer, also known as the downsampling layer, is mainly used to reduce the computational cost of feature extraction. The pooling layer keeps the number of feature maps unchanged but reduces their size; equation (2) represents the calculation process of the pooling layer:
$$X_n^l = \beta_n^l \, \mathrm{down}\left(X_n^{l-1}\right) + b_n^l \qquad (2)$$
where $\mathrm{down}(\cdot)$ is the downsampling (pooling) function, $\beta_n^l$ is the n-th multiplicative offset of the l-th layer, and $b_n^l$ is the n-th additive offset of the l-th layer. The downsampling function is mainly divided into mean-pooling and max-pooling. Mean-pooling calculates the average of all elements in the pooling area:
$$\mathrm{down}(R_n) = \frac{1}{|R_n|} \sum_{c_i \in R_n} c_i$$
Max-pooling selects the maximum element in the pooling area:
$$\mathrm{down}(R_n) = \max_{c_i \in R_n} c_i$$
where $R_n$ is the n-th pooling area in the feature map, $c_i$ is the i-th pixel value in $R_n$, and $|R_n|$ is the number of pixels in $R_n$.
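The pooling step of equation (2), with both mean- and max-pooling, can be sketched as follows. Window size and stride are parameters (the original AlexNet design uses overlapping 3 × 3 windows with stride 2); the feature map and the β and b values in the example are arbitrary.

```python
def pool2d(x, size, stride, mode="max"):
    """Apply down(.) to every size x size region R_n taken with the given stride."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            region = [x[r * stride + a][c * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(max(region) if mode == "max" else sum(region) / len(region))
        out.append(row)
    return out


def pooling_layer_output(x, beta, bias, size=2, stride=2, mode="max"):
    """Equation (2): X_n^l = beta_n^l * down(X_n^{l-1}) + b_n^l."""
    return [[beta * v + bias for v in row] for row in pool2d(x, size, stride, mode)]


fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(pool2d(fm, 2, 2, "max"))    # [[6, 8], [14, 16]]
print(pool2d(fm, 2, 2, "mean"))   # [[3.5, 5.5], [11.5, 13.5]]
print(pooling_layer_output(fm, beta=2.0, bias=1.0))
```

The 4 × 4 map shrinks to 2 × 2 while the number of maps stays the same, which is exactly the size reduction the text attributes to the pooling layer.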

Fully Connected Layers.
The fully connected layers are generally located at the end of the hidden layers of a CNN. They form a multilayer perceptron similar to a shallow neural network, which nonlinearly combines the feature vectors output by the convolutional and pooling layers to produce the output.

Output Layers.
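The output layer of a CNN usually follows the fully connected layers. For image classification problems, the output layer uses a logistic function or a normalized exponential function (SoftMax function) to output classification labels. In SoftMax regression, the number of classes k satisfies k ≥ 2. The training sample set is composed of m labeled samples:
$$\left\{\left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right), \ldots, \left(x^{(m)}, y^{(m)}\right)\right\}$$
where $y^{(i)} \in \{1, 2, \ldots, k\}$ is the classification label of sample $x^{(i)}$. For each class j, the model estimates the probability that a sample belongs to that class. With $\theta_j$ denoting the parameter vector of class j, the probability that a single sample is classified into class j is
$$p\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}$$
The regression output for each sample is thus transformed into a k-dimensional probability vector:
$$h_\theta\left(x^{(i)}\right) = \begin{bmatrix} p\left(y^{(i)} = 1 \mid x^{(i)}; \theta\right) \\ \vdots \\ p\left(y^{(i)} = k \mid x^{(i)}; \theta\right) \end{bmatrix} = \frac{1}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix}$$
where the factor $1 / \sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}$ normalizes the probabilities so that they sum to 1.
Through training on the sample set, the optimizer adjusts the parameters to minimize the model's loss function, which is defined as
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}}$$
where $1\{\cdot\}$ is the indicator function, equal to 1 when its argument is true and 0 otherwise.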
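A minimal sketch of the SoftMax probabilities and the cross-entropy loss defined above. The class scores stand in for $\theta_j^{T} x^{(i)}$ and are hypothetical; labels are 0-based in code, whereas the equations number classes from 1 to k.

```python
import math


def softmax(scores):
    """Normalised exponentials; subtracting the max avoids overflow without changing the result."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_scores, labels):
    """J = -(1/m) * sum_i log p(y_i | x_i); only the true-class term of the indicator survives."""
    loss = 0.0
    for scores, y in zip(batch_scores, labels):
        loss -= math.log(softmax(scores)[y])
    return loss / len(labels)


# Nine hypothetical class scores (one per pavement type) for a single image.
scores = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3, 1.2, -0.5, 0.7]
probs = softmax(scores)
print(round(sum(probs), 6), probs.index(max(probs)))
print(round(cross_entropy([scores], [0]), 4))
```

The loss is small when the true class already receives the largest probability and grows as that probability shrinks; with all scores equal, each of the nine classes gets probability 1/9 and the loss equals log 9.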
First, as a classic model, ALexNet accelerated the development of deep learning and is a milestone in image recognition. Before this research, we compared ALexNet with VGG, GoogLeNet, and other networks and found that ALexNet reached a higher recognition accuracy in a shorter time.
Second, in theory, the deeper the model, the better its classification performance. However, training deep convolutional networks is extremely difficult: greater depth and a large number of parameters can cause vanishing gradients during backpropagation as well as overfitting, and deeper networks usually consume more computing resources. Because road images have low diversity, ALexNet can achieve high recognition accuracy on them while occupying fewer computing resources.
Third, road images are relatively simple, whereas the latest networks are usually designed to solve more complex image classification problems. ALexNet is already able to solve the road condition image recognition problem well.
This is due to several advantages of the ALexNet network: (1) During training, dropout randomly ignores some neurons to avoid overfitting. (2) Data augmentation [38] is applied to expand the training set when training images are insufficient. (3) Rectified linear units (ReLUs) [39] are used as the activation function of the network, which increases its nonlinearity and solves the problem of gradient dispersion (vanishing gradients). ReLU is defined as follows:
$$f(x) = \max(0, x)$$
Figure 2 compares the ReLU and sigmoid [40] activation function curves. When x is greater than 0, the gradient of ReLU is always the constant 1, whereas the derivative of the sigmoid function is bell-shaped, like a Gaussian curve, and becomes smaller toward both ends of the sigmoid curve. Therefore, a network with ReLU activation converges quickly, which helps accelerate training.
Figure 3 and Table 1 show the structure and parameters of ALexNet. The model is mainly composed of five convolutional layers and three fully connected layers. The numbers of convolution kernels in the five convolutional layers are 96, 256, 384, 384, and 256, respectively. The pooling layers mainly reduce the size of the feature maps after convolution. The three fully connected layers have 4096, 4096, and 1000 nodes, respectively, and the SoftMax layer classifies 1000 categories.
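The gradient argument around Figure 2 can be checked numerically with a small sketch. It compares the two derivatives for scalar inputs only and ignores the weight terms that also enter the chain rule, so it illustrates the trend rather than modelling a real network.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # 1 for x > 0 and 0 otherwise (the value exactly at x = 0 is a convention).
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0 and vanishes at both tails


for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  sigmoid'={sigmoid_grad(x):.5f}")

# Chained through five layers, the best-case sigmoid factor is 0.25**5 (about 0.001),
# whereas active ReLU units pass the gradient through unchanged.
print(0.25 ** 5, relu_grad(1.0) ** 5)
```

Even in the most favourable case the sigmoid shrinks the gradient fourfold per layer, which is the gradient dispersion that ReLU avoids for positive inputs.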

Road Surface Recognition Model Based on RSRM.
The ALexNet network was pretrained offline on the ImageNet database, which contains more than one million images, and the weights and parameters of each layer were obtained after training. The trained network has a strong ability to learn image features, especially curves, edges, and contours. To improve the efficiency of network training and reduce the training time, this study takes the trained ALexNet network as the pretrained model and transfers its parameters to the RSRM using fine-tuning transfer learning [41]. For comparison, ALexNet, SVM, and BP use their classic structures, and the SVM algorithm relies on color and texture features extracted from the road images.
Like ALexNet, RSRM consists of convolutional layers, pooling layers, fully connected layers, and a SoftMax classification layer. Based on an analysis of the characteristics of actual pavement images, nine typical pavement types are selected, as shown in Figure 4; therefore, a 9-label SoftMax replaces the original classifier of the ALexNet network. In addition, as shown in Figure 5, the original three fully connected layers are replaced by two fully connected layers trained on the actual road pavement images, with 4096 and 1000 nodes, respectively. Through these steps, the network is adapted to road surface image classification and recognition.
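To see what the new fully connected layers receive, the feature-map sizes of the transferred layers can be traced with the standard output-size formula. This sketch assumes the common 227 × 227 × 3 ALexNet input geometry and the usual kernel sizes, strides, and padding rather than values read from Table 1; only the kernel counts (96, 256, 384, 384, 256) come from the text, and normalization layers are omitted because they do not change the shape.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling)
# ASSUMED standard ALexNet settings for a 227 x 227 x 3 input.
FEATURE_LAYERS = [
    ("conv1", 11, 4, 0, 96), ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256), ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384), ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256), ("pool5", 3, 2, 0, None),
]


def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after every transferred layer."""
    shapes = []
    for name, kernel, stride, pad, out_channels in FEATURE_LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


for shape in trace_shapes():
    print(shape)
name, h, w, c = trace_shapes()[-1]
print("flattened input to the first new FC layer:", h * w * c)   # 9216
```

Under these assumptions the first of the two new fully connected layers (4096 nodes) sees a 9216-element vector.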

Road Surface Data Acquisition System.
The road data collection vehicle was a sedan with a length of 3564 mm, a width of 1620 mm, a height of 1527 mm, and a wheelbase of 2340 mm. The camera model was LeTMC-520. As shown in Figure 6, the camera was installed at the air intake grille at the front of the vehicle, at an angle of −10° to the horizontal and at a height of 350 mm above the ground. Considering the complex weather conditions encountered in actual driving, road image data were collected under three typical weather conditions: cloudy, sunny, and rainy. Note that all images in the actual road test set were taken from the vehicle while driving.

In addition, the road surface data analysis was performed on a desktop computer running a 64-bit operating system, with 16 GB of memory, an AMD Ryzen 5 3600 6-core processor, and a GeForce GTX 1660 graphics processing unit.

Establishment of Road Surface Image Database.
Images were selected according to nine typical pavement types: asphalt, concrete, grass, mud, rain, rock, soil, wet asphalt, and wet concrete, and only images of clear quality were included in the road surface image database (RSID). Each pavement type has 2000 samples, which were divided into training and test sets in a ratio of 7:3.
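Because each class contributes exactly 600 images to the test set, the 7:3 division appears to be made class by class. A minimal sketch of such a stratified split follows; the file names and the random seed are hypothetical placeholders for the actual RSID images.

```python
import random

CLASSES = ["asphalt", "concrete", "grass", "mud", "rain",
           "rock", "soil", "wet asphalt", "wet concrete"]


def stratified_split(samples_by_class, train_ratio=0.7, seed=0):
    """Shuffle each class independently and cut it at train_ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in samples_by_class.items():
        items = list(items)
        rng.shuffle(items)
        cut = round(len(items) * train_ratio)
        train.extend((item, label) for item in items[:cut])
        test.extend((item, label) for item in items[cut:])
    return train, test


# Hypothetical image identifiers standing in for the 2000 files per class.
rsid = {c: [f"{c}_{k:04d}.jpg" for k in range(2000)] for c in CLASSES}
train, test = stratified_split(rsid)
print(len(train), len(test))   # 12600 5400
```

Splitting per class keeps the test set balanced (600 images of each type), which makes the per-class accuracies reported later directly comparable.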

Experimental Procedure
Step 1: Image preprocessing. All road surface images are scaled and cropped to a uniform size that meets the input requirements of the neural network module in MATLAB.
Step 2: Building the training set and test set. RSID is divided into the training set and test set in a ratio of 7 : 3.
Step 3: Building the RSRM. For the nine typical road surface types, a 9-label SoftMax replaces the original classifier of the ALexNet network. Next, the trained ALexNet network is used as the pretrained model, and its parameters are transferred to the RSRM using fine-tuning transfer learning.
Step 4: Model training. The model parameters are initialized using a stochastic approach; the momentum, learning rate, and training time are set; and the parameters of the five convolutional layers and the pooling layers are frozen. The two fully connected layers and the 9-label SoftMax layer are replaced with newly initialized layers, whose parameters are then trained.
Step 5: Model testing. The remaining 30% of the RSID is used as the test set to verify the accuracy and speed of the RSRM.
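Steps 3 and 4 freeze the transferred convolutional layers and train only the new head. The sketch below counts the head parameters to show why this shortens training. It assumes the 6 × 6 × 256 output of the last pooling layer under the common 227 × 227 input geometry, and it assumes that a 9-unit fully connected layer feeds the 9-label SoftMax after the 4096- and 1000-node layers; the layer names are hypothetical.

```python
FLATTENED = 6 * 6 * 256   # pool5 output size under the assumed 227 x 227 input geometry


def fc_param_count(layer_sizes, n_in=FLATTENED):
    """Number of weights plus biases in a chain of fully connected layers."""
    total = 0
    for n_out in layer_sizes:
        total += n_in * n_out + n_out
        n_in = n_out
    return total


CONV_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5"]
ALEXNET_HEAD = [4096, 4096, 1000]   # original fully connected layers
RSRM_HEAD = [4096, 1000, 9]         # two FC layers + ASSUMED 9-unit layer before SoftMax


def trainable_flags(conv_layers, head_sizes):
    """Transferred convolutional layers are frozen; the new FC layers are trained."""
    flags = {name: False for name in conv_layers}
    flags.update({f"fc{i + 1}": True for i in range(len(head_sizes))})
    return flags


print(fc_param_count(ALEXNET_HEAD), fc_param_count(RSRM_HEAD))
print(trainable_flags(CONV_LAYERS, RSRM_HEAD))
```

Under these assumptions the replacement head has about 16.8 million fewer parameters than the original three fully connected layers (41,858,841 versus 58,631,144), and only these head parameters are updated during fine-tuning.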

Experiment of Road Image Feature Extraction.
The convolutional layers extract features by performing convolution operations on the image or feature map. First, the ALexNet network model (five convolutional layers) was pretrained on the ImageNet dataset. Second, the trained weights of the shallow network structure were saved and transferred to the RSRM. Finally, to observe the feature extraction effect of the RSRM more clearly, the output features of each convolutional layer were visualized using a mud image as an example. Figure 7 shows the mud pavement image after preprocessing.
As shown in Figure 8(a), 96 feature maps are extracted from the preprocessed mud image by Conv1. This layer mainly extracts the edges and details of the image, and after the convolution kernel operations the feature maps retain most of the information of the original image. As shown in Figure 8(b), after the convolved image is processed by the ReLU1 activation function, the edge and detail information of the mud road surface becomes more obvious. Figures 8(c) and 8(d) show the feature maps after Conv3 and ReLU3: the convolution kernels extract more edge information, and the outline of the mud road surface is clearer. Figures 8(e) and 8(f) show the feature maps after Conv5 and ReLU5. They also reveal that, from the first to the fifth convolutional layer, the resolution of the feature maps decreases and the output of the convolution kernels becomes increasingly abstract.
These feature extraction experiments show that the convolutional layers progressively combine shallow, low-level features into more abstract features. This makes the representation of road information more comprehensive and allows the high-level abstract features to be used for pavement classification and recognition.

Experiments of Road Surface Type Recognition Based on RSRM.
RSID contains 18000 images of nine pavement types: asphalt, wet asphalt, rain, concrete, wet concrete, soil, mud, grass, and rock. To verify the validity of the proposed RSRM, 70% of RSID (12600 images) was randomly selected as the training set, and the remaining 30% was used as the test set, which contained 600 images of each pavement type, for a total of 5400 images. Table 2 shows the RSRM training parameter settings. The test tolerance is the number of times the test-set loss may be greater than or equal to the smallest previous loss before network training stops. Setting a test tolerance stops training when the test loss no longer decreases, which helps avoid overfitting, saves computer memory, and improves training speed. Table 3 shows the classification results of the test samples based on RSRM, and Table 4 lists the recognition results of some images.
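The test-tolerance rule can be sketched as a simple early-stopping check. This is an illustration under the assumption that the counter tracks consecutive non-improving evaluations and resets at each new minimum, similar in spirit to the ValidationPatience option of MATLAB's trainingOptions; the loss sequence is hypothetical, and the actual tolerance value is the one given in Table 2.

```python
def early_stop_iteration(test_losses, tolerance):
    """Return the 1-based iteration at which training stops, or None if it never does.

    A 'strike' is counted whenever the test loss is >= the smallest loss seen so far;
    strikes reset when a new minimum is reached, and training stops once
    `tolerance` consecutive strikes have accumulated.
    """
    best = float("inf")
    strikes = 0
    for iteration, loss in enumerate(test_losses, start=1):
        if loss < best:
            best, strikes = loss, 0
        else:
            strikes += 1
            if strikes >= tolerance:
                return iteration
    return None


losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.60]   # hypothetical test losses
print(early_stop_iteration(losses, tolerance=3))   # stops at iteration 6
print(early_stop_iteration(losses, tolerance=4))   # never triggered -> None
```

A small tolerance stops training sooner and saves time, while a larger one gives a temporarily noisy loss more chances to improve.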
As described in Step 2, the test set contains 600 samples per category. As Table 3 shows, one of the 600 asphalt pavement images was misidentified, giving a recognition accuracy of 99.8%.
This is because part of the asphalt pavement is in a partially wet state, which makes its image characteristics extremely similar to those of wet asphalt pavement; it is therefore misidentified as wet asphalt. A total of 598 concrete pavement samples were correctly identified, and the remaining two were identified as soil and mud pavements, giving an accuracy of 99.7%. The reason is that some concrete, soil, and mud pavements have similar colors and textures under dry conditions. For rain pavement, 598 samples were correctly identified, and the remaining two were misidentified as wet asphalt and wet concrete, giving an accuracy of 99.7%. For soil pavement, 599 samples were correctly identified, and the remaining one was classified as mud. Wet soil often turns into mud, and the high probability of both pavement features co-occurring in a single image is the main cause of this misclassification. Of the 600 wet concrete samples, 580 were correctly recognized, 9 were identified as soil, one as rock, and the remaining 10 as mud.
Thus, the recognition accuracy of wet concrete pavement is 96.7%. This is because wet concrete turns brownish gray, which resembles soil and mud. The recognition accuracies of grass, rock, and wet asphalt are higher than those of the other surfaces because their color and texture features differ significantly from those of the other road images.
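The per-class accuracies quoted above follow directly from the correct counts and the 600 test images per class. A small sketch using only the counts given in the text:

```python
TEST_IMAGES_PER_CLASS = 600

# Correct counts reported in the text for the classes discussed above.
correct = {"asphalt": 599, "concrete": 598, "rain": 598, "soil": 599, "wet concrete": 580}

# Row of the confusion matrix for wet concrete, as described in the text.
wet_concrete_row = {"wet concrete": 580, "soil": 9, "rock": 1, "mud": 10}


def per_class_accuracy(correct_counts, n_per_class):
    """Accuracy in percent, rounded to one decimal place as in the text."""
    return {cls: round(100.0 * c / n_per_class, 1) for cls, c in correct_counts.items()}


assert sum(wet_concrete_row.values()) == TEST_IMAGES_PER_CLASS   # row covers all 600 images
print(per_class_accuracy(correct, TEST_IMAGES_PER_CLASS))
```

This prints 99.8% for asphalt and soil, 99.7% for concrete and rain, and 96.7% for wet concrete, matching the values stated in the text.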

Experiments of Classification Method Comparison.
In this study, RSRM is compared with the ALexNet model, support vector machines (SVM), and backpropagation (BP) neural networks; the results are shown in Table 5. According to previous research, color and texture are the main features of road images. The SVM [42] classification model requires road image features to be extracted manually. In the three color spaces HSV, RGB, and YCM, a road image has nine color components, namely H, S, V, R, G, B, Y, C, and M. The gray-level co-occurrence matrix is used to extract four texture features of road surface images: contrast, correlation, energy, and entropy. The BP neural network [43][44][45] has five layers with 100 nodes each and is optimized with stochastic gradient descent.
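A minimal sketch of the four gray-level co-occurrence matrix (GLCM) texture features named above, for a single horizontal pixel offset. The offset, the number of gray levels, the natural logarithm in the entropy, and the definition of energy as the sum of squared GLCM entries are assumptions, since the paper does not specify them.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy (angular second moment), and entropy of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan"),
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),   # natural log
    }


# Tiny image already quantised to 2 gray levels (real road images would use more levels).
img = [[0, 0, 1],
       [0, 1, 1],
       [1, 1, 1]]
print(glcm_features(glcm(img, levels=2)))
```

In the SVM baseline these four texture values would be combined with the nine color components into one hand-crafted feature vector per image, which is the manual step that RSRM avoids.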
In this section, RSRM is compared with the BP neural network, SVM, and ALexNet models, focusing on the analysis of model prediction output, error performance, training time, and detection time.
As shown in Figures 9 and 10, RSRM significantly improves the accuracy of road surface identification compared with ALexNet. Specifically, RSRM converged after 216 iterations with an accuracy of 96.38%, whereas the ALexNet network had not converged after 500 iterations. Therefore, transfer learning and optimization of the fully connected layers can effectively improve the training efficiency and accuracy of the model; ALexNet would require a longer training time and a larger dataset to match the accuracy of RSRM. Figure 11 and Table 5 show the identification accuracy of the different methods. The average recognition accuracies of BP, SVM, ALexNet, and RSRM were 92.84%, 89.59%, 97.57%, and 99.48%, respectively. The accuracies of RSRM and ALexNet both exceed 95%, which shows the superiority of the deep learning methods. Because they rely on manually designed features, traditional machine learning methods such as SVM and BP neural networks cope poorly with variations in illumination intensity. The SVM classifier is better suited to small datasets, which is why it did not achieve good results on the road dataset. Table 6 shows the average time taken by each model to classify a test image.
The test times of all models for a given road image are almost the same. The results show that the accuracy of the deep learning models is higher than that of the traditional machine learning methods. … training weight and overfitting. However, a CNN can effectively reduce the training weights and improve the training speed through convolution operations. In addition, this study uses a test tolerance threshold to stop model training early and reduces the number of fully connected layers; on this basis, a SoftMax classifier for nine labels is designed. The training time was 1.6 h for SVM, 2.2 h for BP, 1 h for ALexNet, and 0.4 h for RSRM (excluding pretraining time), and RSRM took 0.14 s to classify a test image. The training speed of RSRM is thus four times that of SVM and about five times that of BP. Meanwhile, its recognition accuracy was 1.91% higher than that of ALexNet, 6.64% higher than that of BP, and 9.89% higher than that of SVM. RSRM can therefore effectively improve the training efficiency and accuracy of the model.
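The comparison figures quoted above can be re-derived with simple arithmetic. This sketch only recomputes the reported numbers; it reads the 1.6 h training time as belonging to SVM, which is consistent with the statement that RSRM trains four times faster than SVM.

```python
accuracy_pct = {"BP": 92.84, "SVM": 89.59, "ALexNet": 97.57, "RSRM": 99.48}
train_hours = {"SVM": 1.6, "BP": 2.2, "ALexNet": 1.0, "RSRM": 0.4}

# Accuracy gain of RSRM over each baseline, in percentage points.
gain_pct = {m: round(accuracy_pct["RSRM"] - a, 2)
            for m, a in accuracy_pct.items() if m != "RSRM"}

# Ratio of each baseline's training time to RSRM's training time.
speedup = {m: round(h / train_hours["RSRM"], 2)
           for m, h in train_hours.items() if m != "RSRM"}

hours_saved_vs_alexnet = round(train_hours["ALexNet"] - train_hours["RSRM"], 2)

print(gain_pct)
print(speedup)
print(hours_saved_vs_alexnet)
```

The gains come out at 6.64, 9.89, and 1.91 points for BP, SVM, and ALexNet; the training-time ratio is 4.0 for SVM and 5.5 for BP (which the text rounds to five), and the saving relative to ALexNet is 0.6 h.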
In summary, the BP neural network is not suitable for recognizing large, diverse road image databases: it has many neurons, its number of layers cannot be large, and its computing time is long, which easily leads to overfitting and makes high-dimensional data difficult to process. SVM feature extraction is complex, and SVM is suitable only for small datasets. The proposed method not only recognizes road surface types quickly and with high precision after a short training time but also meets the perception requirements of actual road conditions.

Conclusion
This paper presents a pavement identification method based on an improved ALexNet model. First, the ALexNet network model is pretrained offline on the ImageNet dataset. Second, the trained weights of the shallow network structure, including the convolutional layers, are saved and migrated to the proposed model. In addition, the three fully connected layers attached to the shallow network are replaced by two, which improves the training accuracy and shortens the training time, and a 9-label SoftMax replaces the original classifier of the ALexNet network. The proposed method is then compared with the BP neural network, SVM, and ALexNet models in terms of prediction output, error performance, and speed. The results show that the recognition accuracy of RSRM is 99.48%, which is 1.91%, 6.64%, and 9.89% higher than that of ALexNet, BP, and SVM, respectively. Moreover, using a test tolerance threshold to stop model training and reducing the number of fully connected layers saves 0.6 h of training time compared with ALexNet and makes training four times faster than SVM and five times faster than BP. In conclusion, the deep learning model is more accurate than traditional machine learning methods and reaches high recognition accuracy in a shorter time, meeting the perception requirements of actual road conditions. The proposed method is suitable not only for road recognition but also for human-vehicle-road collaborative perception of the vehicle environment.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.