Development of Deep Learning Model for the Recognition of Cracks on Concrete Surfaces

This paper is devoted to the development of a deep learning (DL)-based model to detect cracks on concrete surfaces. The developed image classification model was based on a Convolutional Neural Network (CNN). To train and validate the CNN model, a database containing 40,000 images of concrete surfaces (with and without cracks) was collected from the available literature. Several conditions on the concrete surfaces were taken into account, such as illumination and surface finish (i.e., exposed, plastering, and paint). Various error measurement criteria such as accuracy, precision, recall, specificity, and F1-score were employed for assessing the quality of the developed model. Results showed that for the training dataset (50% of the database), the precision, recall, specificity, F1-score, and accuracy were 99.5%, 99.8%, 99.5%, 99.7%, and 99.7%, respectively. For the validating dataset, the precision, recall, specificity, F1-score, and accuracy were 96.5%, 98.8%, 96.6%, 97.7%, and 97.7%, respectively. Thus, the developed CNN model may be considered valid because it classifies cracks well on the testing data. It is also confirmed that the developed DL-based model was robust and efficient, as it can take into account different conditions on the concrete surfaces. The CNN model developed in this study was compared with other works in the literature, showing that it could improve the accuracy of image classification in comparison with previously published results. Finally, in further work, such a model could be combined with Unmanned Aerial Vehicles (UAVs) to increase the productivity of concrete infrastructure inspection.


Introduction
Concrete is used in a wide range of infrastructure, such as bridges, nuclear reactors, dams, and buildings. However, these facilities are affected by concrete damage after years of service [1]. Notably, one of the factors that most severely affects the durability of plain and reinforced concrete is the presence of cracks [1]. Indeed, cracks expose the reinforcement to problems such as corrosion and chemical attack [2]. Consequently, structural damage identification is essential to reduce the risks [3].
As the identification of cracks is crucial for the assessment of concrete damage, various techniques have been proposed for the maintenance of such infrastructures. Structural health monitoring mainly consists of using sensors to detect changes in the stiffness of infrastructures as well as the initiation of corrosion [4][5][6]. However, such monitoring techniques are commonly integrated into modern construction facilities. For existing infrastructures, especially concrete structures from the 1960s, this technique remains challenging [7]. Besides, the maintenance of substantial concrete infrastructure is currently expensive; for instance, an average budget of five billion euros per year is spent in Europe on maintenance and repair activities, as mentioned by the European HEALCON project [8]. Consequently, the development of more robust and efficient techniques is crucial, aiming at saving time and cost in the maintenance of substantial concrete infrastructures, especially those presumably exceeding their expected service life [2].
One of the traditional methods that has been used to study crack detection and propagation is the Finite Element Method (FEM). Many research works in the literature address this problem. For example, Nahvi and Jabbari [9] combined experimental modal data and FEM to study crack detection within a cantilever beam. In another work, Li et al. [10] studied the crack location inside structures using Wavelet Finite Element Methods (WFEMs). Other works on crack detection of beams and structures using FEM can be found in [11][12][13][14][15][16]. In addition, FEM can also be combined with a machine learning algorithm for crack detection. For example, in the work of He et al. [17], a genetic algorithm-based model optimized by FEM was used for crack detection in a rotor-bearing model. The main difficulty of the crack detection problem using FEM is that the models are usually very complex and costly in terms of computational time. Indeed, using FEM to deal with cracks, even small ones, requires an extremely refined mesh, which leads to problems with a high number of degrees of freedom.
Together with sensor equipment, many computer vision techniques have been proposed for the detection of cracks on the concrete surface [18][19][20]. These vision techniques were mainly developed based on deep learning (DL) algorithms for image processing, for instance, Convolutional Neural Networks (CNNs). Indeed, DL-based algorithms can provide many advantages to overcome the limitations of conventional image processing techniques [19], especially for crack detection [21]. As an example, Oliveira and Correia [22] developed an automatic crack detection method based on the DL technique for assessing the damage in the Portuguese road system. In addition, Chen et al. [23] improved the recognition of cracks in images using a CNN model. Besides, Nhat-Duc and Nguyen Quoc-Lam [24] proposed a classification model using a Support Vector Machine for the detection of cracks on asphalt pavement. The detection of cracks in bridge infrastructures was successfully investigated by Xu et al. [25] using a CNN model. In another study, Nhat-Duc et al. [26] proposed a hybrid CNN model that uses metaheuristic techniques to train the DL algorithm and applied it to crack recognition on pavement surfaces. Overall, DL-based techniques exhibit a significant ability to detect concrete crack damage robustly and reliably [27,28]. Besides, a pretrained image-based recognition DL model could assist in the development of an automatic damage inspector, facilitating the detection of damage. In Gönenç-Sorguç [29], several pretrained CNN models (AlexNet, ResNet, GoogleNet, and VGG) were compared for the detection of cracks in buildings. Indeed, such a pretrained DL model could be used to quickly classify the images collected from vision capturing equipment. In several cases of large infrastructures such as bridge decks or high buildings, Unmanned Aerial Vehicles (UAVs) could be an appropriate choice as vision capturing equipment [30,31].
As UAV technology has developed rapidly in recent years, combining UAVs with pretrained DL models could address the difficulty of maintaining large concrete infrastructures effectively and efficiently, saving time and cost [32,33].
In order to overcome the difficulties of traditional approaches such as the Finite Element Method or other machine learning models that require complex input data and are costly in terms of computational time, the present study focuses on the development of an image-based CNN recognition model for the detection of cracks on concrete surfaces. To this aim, a database containing 40,000 images was used for the training and testing of the developed DL model. Various quality assessment criteria such as accuracy, precision, recall, specificity, and F1-score were employed for checking and validating the developed model. The present paper is organized as follows.
The image database, as well as the research method, is presented in Section 2. The optimization of the image-based CNN model is described in Section 3, followed by the results related to the prediction capability of the proposed model. The final section concludes this study with several discussions. The developed model represents a high-potential technique that can serve as a concrete crack detection tool within an automatic workflow involving efficient equipment such as UAVs.

Database.
In this work, a database of images with cracks was collected from the available literature [29,34]. Derived from the walls and floors of several concrete buildings at the Middle East Technical University, the database contains two categories of concrete surface: no cracks and with cracks. The distance between the concrete surface and the camera was approximately 1 m. The no-crack and crack categories each contain 20,000 images, and each image is 227×227 pixels with RGB channels. Several samples of the database are shown in Figure 1. The images were captured on the same day with similar illumination. However, as various concrete surfaces were investigated (i.e., exposed, plastering, and paint) at different buildings, variation in surface finish and lighting conditions exists in these images. It should be noted that this final database was generated from 458 high-resolution images (i.e., 4032×3024 pixels) as a data augmentation technique [35]. The dataset was randomly split into training and validation datasets at a 50/50 ratio. Summary information of the database is indicated in Table 1.
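The sizes quoted above can be related with a little arithmetic. The sketch below is illustrative only: the text does not say how the 227×227 images were cut from the high-resolution photographs, so non-overlapping tiling is an assumption, and the function names and the random seed are hypothetical.

```python
import random

def tiles_per_image(width, height, patch=227):
    """Number of non-overlapping patch x patch tiles that fit in one image."""
    return (width // patch) * (height // patch)

def split_half(n_items, seed=0):
    """Randomly split item indices into two halves (training, validation)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    half = n_items // 2
    return indices[:half], indices[half:]

per_image = tiles_per_image(4032, 3024)  # 17 columns * 13 rows of tiles
upper_bound = 458 * per_image            # most tiles the 458 photographs could yield
train_idx, val_idx = split_half(40_000)  # 50/50 random split of the database

print(per_image, upper_bound, len(train_idx), len(val_idx))
```

Under this tiling assumption, 458 photographs could yield at most 101,218 tiles, so the 40,000 images in the database would be a subset of the possible tiles rather than all of them.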

Convolutional Neural Network (CNN)
CNN can be classified as a multilayer neural network whose main objective is to process two-dimensional input data, such as texts or images. Like any neural network, a CNN consists of multiple layers; each layer is composed of several neural nodes that have their own function. It is worth noting that the nodes in the same layer of the model are not interconnected. In this work, the CNN algorithm was selected for the development of an image-based DL model, inspired by the many successful applications of CNN to image classification in the literature. A method for image segmentation based on CNN was proposed by Arbelaez et al. [36]. In another study, a road detection system for self-driving cars was successfully developed by Teichmann et al. [37]. Last but not least, Camilo et al. [38] proposed a CNN-based mapping for solar photovoltaic using aerial imagery.

Applied Computational Intelligence and Soft Computing

In terms of structure, the CNN model consists of 5 main layers, as depicted in Figure 2 [39][40][41][42]:
(i) Input layer: this layer contains the image input data.
(ii) Convolutional layer: the nodes in this layer work as filters whose main objective is to detect features in an image input using a convolution operator. Applying such a filter results in a map of activations called a feature map.
(iii) Pooling layer: the main objective of this layer is to downsample the feature maps that are obtained from the convolutional layer. Technically, the results of the convolutional layer can be directly given to the classifier. However, this process can be very costly in terms of computational resources, especially with high-resolution image input data. The pooling layer downsamples the feature maps by summarizing the presence of features in patches. The results of the convolutional layer are transferred to the pooling layer through a nonlinear activation function.
(iv) Fully connected layer: the main objective of this layer is to take the output of the previous layer (i.e., the pooling layer) and then apply weights to predict the correct labels.
(v) Output layer: this layer contains the prediction results of the problem.
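The convolution, activation, and pooling steps described above can be sketched on a toy example. This is a minimal single-channel illustration in plain Python, not the model of this study: the 5×5 patch, the 2×2 filter, and the function names are hypothetical, and the filter is hand-picked rather than learned. As in most DL frameworks, the "convolution" here slides the filter without flipping it.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Nonlinear activation: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling; incomplete border windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 5x5 grayscale patch: a dark vertical line (0) on a bright background (9)
image = [[9, 9, 0, 9, 9] for _ in range(5)]
# Hypothetical filter that responds where brightness drops from left to right
kernel = [[1, -1],
          [1, -1]]

fmap = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(relu(fmap))       # 2x2 summary of where the feature is present

print(fmap[0])   # [0, 18, -18, 0]
print(pooled)    # [[18, 0], [18, 0]]
```

The strong response survives pooling in the left half of the patch, which illustrates how pooling summarizes the presence of a feature in a region while shrinking the map passed to the fully connected layers.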
As revealed in many studies, CNN exhibits several advantages compared to the conventional backpropagation neural network [40,43]. CNN could also reduce the complexity of the model. More precisely, the weight parameters of CNN could be shared between neighborhood regions. Therefore, an acceleration in the training process could be obtained. For image applications, this feature is vital because the neighborhood regions usually carry information relevant to the considered point [44,45]. Besides, CNN offers a higher capability than a conventional neural network in feature extraction, especially for capturing local information (e.g., neighbor pixels in an image). Moreover, CNN might need fewer samples for the learning phase and have a lower chance of overfitting than conventional neural networks.
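The effect of weight sharing on model size can be made concrete with a parameter count. The sketch below compares a fully connected layer with a convolutional layer applied to a 227×227 RGB input; the choice of 16 units, 16 filters, and 3×3 filters is purely hypothetical and is not taken from the architecture of this study.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(filter_size, in_channels, n_filters):
    """Weights plus biases of a convolutional layer; each filter is reused at every position."""
    return (filter_size * filter_size * in_channels + 1) * n_filters

n_inputs = 227 * 227 * 3           # every pixel value of one RGB image
dense = dense_params(n_inputs, 16) # hypothetical layer with 16 units
conv = conv_params(3, 3, 16)       # hypothetical layer with 16 filters of size 3x3

print(n_inputs, dense, conv)       # 154587 2473408 448
```

Because the same small filter is applied across the whole image, the convolutional layer in this example needs a few hundred parameters where the fully connected layer needs about 2.5 million.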

Quality Assessment Criteria.
In this work, the error measurements of the classification task are defined as in Figure 3, where
(i) TP (i.e., true positive) is the number of cracked images that are correctly identified as cracks
(ii) TN (i.e., true negative) is the number of no-crack images that are correctly identified as no cracks
(iii) FP (i.e., false positive) is the number of cracked images that are incorrectly classified as no cracks
(iv) FN (i.e., false negative) is the number of no-crack images that are incorrectly classified as cracks

Based on these definitions, several quality assessment criteria could be computed, such as the following [25]:
(i) Accuracy is defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
(ii) Precision is defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
(iii) Recall is defined as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(iv) Specificity is defined as follows:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
(v) F1-score is defined as follows:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

[47,48]. Parameters during the training progress are indicated in Table 2. The model is evaluated at every iteration using the validation dataset. Table 3 details the architecture of the proposed CNN, which comprises 10 layers: an input layer, a convolutional layer, ReLU layer 1, fully connected layer 1, fully connected layer 2, a batch normalization layer, ReLU layer 2, fully connected layer 3, a softmax layer, and a classification output layer. Sizes of activation, weights, and bias parameters are also indicated in Table 3 for each layer. It should be noted that this architecture was set up using the Deep Network Designer application [48]. Figures 4 and 5 show the training progress in terms of accuracy and loss, respectively. The corresponding values of accuracy and loss using the validation dataset are also highlighted. It is seen in Figures 4 and 5 that the training phase converges after about 1200 iterations. Besides, good accuracy and loss values were also obtained for the validation dataset.
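The quality assessment criteria named in this section can be computed directly from the four confusion-matrix counts. The following is a minimal sketch using the standard confusion-matrix forms of accuracy, precision, recall, specificity, and F1-score. The function name is hypothetical and the counts are made up for illustration; they are not taken from the figures or tables of this study. The counts are supplied by the caller, so whichever convention defines TP, TN, FP, and FN carries through unchanged.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the five criteria computed from confusion-matrix counts.

    No guard against empty classes is included: a zero denominator raises
    ZeroDivisionError.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Made-up counts for illustration only
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all five values together is useful because accuracy alone can hide an imbalance between the two kinds of error, which matters when a missed crack and a false alarm have different costs.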

Model Performance.
In this section, the performance of the trained CNN model is presented. The capability of the model in detecting cracks is shown in Figure 6 for several samples. It is seen that the model can detect the cracks based on the contrast between the background and the cracks. Figures 7(a) and 7(b) show the confusion matrices of the training and testing data, respectively. Other quality assessment criteria are highlighted in Table 4. It is seen that, for the training dataset, the precision, recall, specificity, F1-score, and accuracy are 99.5%, 99.8%, 99.5%, 99.7%, and 99.7%, respectively. On the other hand, for the testing dataset, the precision, recall, specificity, F1-score, and accuracy are 96.5%, 98.8%, 96.6%, 97.7%, and 97.7%, respectively. Therefore, the CNN model may be considered valid because it performs the classification of cracks well using the validating data. It is also confirmed that the developed DL-based model was robust and efficient, as it can take into account different conditions on the concrete surfaces. Images for which the classification failed are shown in Figure 8, grouped into three main categories: images with cracks in the corners, images with low resolution, and images with very small cracks. In these cases, the CNN model could not perform the recognition task well because the contrast between the cracks and the background is poor [23].
In the first configuration, the cracks only occupy a small portion of the image. Consequently, the chance of successful detection is reduced. On the other hand, although all the images were captured on the same day to keep the illumination conditions similar, many buildings were investigated, so variation in the obtained images was inevitable. In addition, the detection of small cracks, especially those at the pixel level, using the image-based DL technique remains challenging [49].
Nonetheless, without solving complex equations, the CNN model was optimized to classify cracked images efficiently, saving time and avoiding high computational costs. The performance of the developed CNN model was quantified based on various quality assessment criteria. A summary of previous studies, including the reference, the training function, the number of data, the size of images, the training/testing ratio, and the values of quality assessment criteria, is given in Table 5. In terms of the values of the quality assessment criteria, the CNN model proposed in this study improves the classification of cracked images relative to previously published results. However, different types of cracks were not yet considered in these works.

Future Work for Practical Application.
As mentioned in the introduction, a pretrained CNN model could assist in the development of an automatic damage inspector for concrete infrastructures [32]. A working process of the envisioned automatic system is shown in Figure 9. First, the images of concrete infrastructures (i.e., bridges, buildings, etc.) are collected as large datasets by drones, as this equipment could increase the productivity of image capturing. Second, all images are sent to the treatment center for processing and classification using the pretrained CNN model. Finally, the AI-assisted damage inspector gives an evaluation and feedback. As the developed CNN model could work with large datasets, it is expected that the algorithm could help experts in damage assessment by increasing productivity and saving time and cost. However, it should be noted that such a system should allow its output to be corrected by experts, because human expertise is always crucial.
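The three-step process described above (collect, classify, review) can be sketched as a simple triage routine. This is an illustration under stated assumptions, not the envisioned system itself: the function names, the image identifiers, the 0.5 threshold, and the stand-in scorer (which merely measures the fraction of dark pixels and takes the place of the pretrained CNN) are all hypothetical.

```python
def triage(images, classify, threshold=0.5):
    """Separate images flagged for expert review from those cleared.

    `images` is a list of (image_id, pixels) pairs and `classify` returns a
    crack score between 0 and 1 for one image.
    """
    flagged, cleared = [], []
    for image_id, pixels in images:
        score = classify(pixels)
        if score >= threshold:
            flagged.append((image_id, score))
        else:
            cleared.append((image_id, score))
    # Highest scores first, so experts see the most suspicious images first
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged, cleared

def dark_fraction(pixels, dark_below=50):
    """Stand-in scorer (NOT the CNN): fraction of pixels darker than `dark_below`."""
    return sum(1 for p in pixels if p < dark_below) / len(pixels)

# Hypothetical batch of tiny grayscale "images" collected by a drone
batch = [
    ("deck_001", [200, 210, 30, 20]),
    ("deck_002", [220, 215, 205, 210]),
    ("pier_001", [10, 20, 30, 240]),
]

flagged, cleared = triage(batch, dark_fraction)
print(flagged)  # [('pier_001', 0.75), ('deck_001', 0.5)]
print(cleared)  # [('deck_002', 0.0)]
```

Keeping the classifier as a plain function argument means the scorer can be replaced by a trained model without changing the review step, and the flagged list is exactly where expert correction would take place.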

Conclusion and Outlook
This work was devoted to the development of a DL model for the classification of cracked and no-cracked images captured on concrete surfaces. A dataset containing 40,000 image samples with crack and noncrack labels was extracted from the available literature to train and validate the proposed model. The CNN model was trained on 227 × 227-pixel images. The model achieved excellent classification performance: for the training dataset, the precision, recall, specificity, F1-score, and accuracy were 99.5%, 99.8%, 99.5%, 99.7%, and 99.7%, respectively, whereas for the testing dataset, they were 96.5%, 98.8%, 96.6%, 97.7%, and 97.7%, respectively. Although various concrete surfaces in different buildings were studied (i.e., exposed, plastering, and paint), the error measurements of the CNN model remained in an acceptable range.
However, in further research, different types of cracks should be classified (i.e., ranked by the thickness or density of cracks). Consequently, more classes will appear in the classification problem. Therefore, efficient training algorithms should be investigated, including metaheuristic techniques. Nonetheless, an efficient tool for the classification of cracks of different sizes may be useful for maintenance and repair procedures. Moreover, the detection of cracks at the pixel level should be considered in further research. Coupling between structural health monitoring and DL-based techniques should be further investigated to combine the features of each method. Finally, other deep learning approaches can be applied to improve the performance on the prediction problem. For example, Ieracitano et al. [50] used a model combining an unsupervised learning autoencoder and a supervised learning multilayer perceptron for defect detection in nanomaterials. The obtained results proved very promising and outperformed other classical machine learning approaches. It would therefore be interesting to apply such a model to the crack detection problem. In another work, Shengqi et al. [51] introduced a deep learning model using feature visualization and quality evaluation for the defect recognition problem of steel surfaces. This model can also be a good candidate for the crack detection problem in our future works.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Figure 9: Envisioned working process of AI-assisted damage inspector for concrete infrastructure.