Convolutional Neural Networks for Structural Damage Identification in Assembled Buildings

This paper investigates a transfer-learning algorithm based on AlexNet for recognising damage in assembled building structures, together with recognition based on an improved algorithm, and analyses the results. The structure of the AlexNet convolutional neural network is introduced and the basic principles of transfer learning are analysed. The optimal model for the ceiling damage recognition task was obtained through parameter tuning, with a test accuracy of 96.6%. The maximum improvement in test accuracy is about 4%, with 82.6% for beam and column damage recognition and 79.7% for infill wall damage recognition.


Introduction
In the aftermath of a disaster, rapid assessment of the extent of damage has become an important basis for allocating emergency search and rescue resources after an earthquake [1]. The traditional method of assessment relies on a group of trained professional engineers and academic researchers visiting the site to survey the extent of the damage. While this approach is accurate, the safety of the assessment personnel cannot be guaranteed given the occurrence of aftershocks and the unsafe elements within the building, and the whole process is slow and inefficient. With the recent development of artificial intelligence, however, computer vision has come into the picture, and intelligent classification and labelling of image data using deep learning methods has become a new trend [2,3]. The rapid assessment system for postearthquake building damage proposed in this paper places no restrictions on the expertise of the photographers and can even use drones to enter the building interior to take pictures, which greatly accelerates postearthquake assessment and provides suggestions for further investigation [4].
So that people in the earthquake zone can work safely and urgently after the earthquake, a rapid assessment of postearthquake building damage is needed, and relatively safe houses can then be identified to prevent people from being left homeless. This will provide a basic foundation for the government to carry out scientific and effective regulation [5]. The assessment is divided into structural and nonstructural elements, with structural elements including beams and columns, and nonstructural elements including infill walls and ceilings [6]. Damage to structural elements is directly related to the safety of the overall building. From the photographic data recorded in the Wenchuan and Lushan earthquakes, the damage to beams and columns in houses that did not collapse was mainly as follows: slight cracks in the beams and columns, crushed cover concrete, and crushed concrete with yielding steel reinforcement; example pictures are shown in Figure 1.
Since the reform and opening up, with the development of the economy, the quality of China's urban and rural buildings has improved significantly, and in moderate earthquakes there is rarely any damage to the load-bearing elements of houses.
Through analysis of a large number of postearthquake disaster pictures, it was found that infill walls, as the first line of seismic protection for reinforced concrete frame structures, were very seriously damaged, as shown in Figure 2.
Convolutional neural networks are trained on large amounts of data. Theoretically, the richer the data, the better the learning ability of the trained network and the better the classification results. CNNs have been known for their excellent performance in image classification since their inception. In recent years, CNNs have developed rapidly and have gradually surpassed human ability to classify images, which lays the foundation for using CNNs in this paper to perform damage recognition of buildings after earthquakes [7]. In deep learning, training a deep neural network usually requires a large amount of labelled data as the training set; however, in many practical situations, obtaining sufficient data samples is very difficult. Training a model on an inadequate training set risks overfitting, resulting in poor generalisation and failure to achieve the expected accuracy on the test set [8]. The idea of transfer learning, which has emerged in recent years, largely solves the problem of insufficiently large datasets in deep learning. In this paper, we collected a large amount of image data from the Wenchuan and Lushan earthquakes from the Institute of Engineering Mechanics of the China Earthquake Administration, which laid a solid foundation for establishing the dataset.
Compared with traditional recognition methods, we do not need to manually extract the damaged areas of the components as feature input, which avoids the disadvantages of manual feature extraction, for example: (a) low-level feature information of an image (e.g. colour, texture) generally cannot express its complex high-level semantics, so the generalisation ability is weak; (b) these methods are generally designed for specific applications in specific domains, and their generalisation and transfer capabilities are mostly weak. The convolutional neural network-based recognition method only requires us to build a trained convolutional neural network and upload an image through the visual interface to obtain the recognition result, ensuring the intelligence, simplicity, and practicality of the system. In the future, there remains a broad prospect for the development of postearthquake building damage recognition algorithms based on convolutional neural networks.

Related Work
Nowadays, more people have become aware of the importance of deep learning within machine learning, which has become increasingly effective. The reason for this is that deep learning has advanced very significantly in several areas, such as sound and text, and most crucially through the use of Internet technology, which has driven an artificial intelligence revolution. Reference [9] proposes a method capable of recovering the properties of cracks. In this method, crack points are first located by means of state-of-the-art crack detection techniques. Then, the skeletal structure of each crack is identified using image-thinning methods. These structures are integrated into the distance field of the crack points by means of a distance transformation. In this way, crack width, length, and direction can be recovered automatically. Reference [10] proposes a new method for detecting spalling regions on the surface of reinforced concrete columns. The properties of the spalling regions are obtained from the image data, according to which the spalling regions are first separated using a thresholding algorithm based on local entropy [11]. On this basis, a new global adaptive thresholding algorithm was combined to measure the exposed longitudinal reinforcement (spalling depth in the column) and the spalling length in the column. A new detection method for postdisaster image classification is proposed in [12], which has four steps: first raw data screening, second scene classification, then target detection, and finally damage assessment. The method was validated on the classification of specific examples. Distinguishing themselves from previous work, the authors of [13] propose a new method for structural damage detection using deep CNNs that automatically obtains information from low-level waveform signals rather than relying on manual labelling.
The method implements structural damage detection from data alone, without relying on human expert knowledge. Numerical simulations are performed to obtain the response data of the structure, followed by data preprocessing and data augmentation; finally, the augmented dataset is used to train a deep CNN, whose classification performance for damage localization is estimated and shown to be superior compared to another damage-extraction method [14].

Data Set Creation.
Most of the dataset in this paper comes from picture data of the Wenchuan and Lushan earthquakes provided by the Institute of Engineering Mechanics of the China Earthquake Administration, and a small part comes from the Internet, as shown in Figure 3.
In this paper, according to the damage characteristics of the building, the frame building is divided into two parts: structural elements (beams and columns) and nonstructural elements (infill walls and ceilings). The specific classification levels and condition descriptions are shown in Table 1.

Data Preprocessing.
(a) Samples from the seismic images were selected to meet the experimental requirements, and the target region of interest in each image was centred by manual processing. The histogram of a grey-scale image with grey levels in the range [0, L-1] is defined as in equation (1):

h(r_k) = n_k, k = 0, 1, ..., L-1, (1)

where r_k is the kth grey level in the image and n_k is the number of pixels in the image with grey level r_k.
Dividing by the total number of pixels gives the normalised histogram of equation (2):

p(r_k) = n_k / (MN), k = 0, 1, ..., L-1, (2)

where M is the number of rows of the grey-scale image, N is the number of columns, and MN is the total number of pixels. After normalisation, p(r_k) can be interpreted as an estimate of the probability that grey level r_k occurs in the image. The histogram is the basis for a variety of other image processing techniques, and the spatial-domain processing of images can be expressed as equation (3):

g(x, y) = T[f(x, y)], (3)

where f(x, y) is the input image, g(x, y) is the transformed image, and T is a transform function defined on a neighbourhood of the point (x, y). The smallest neighbourhood is 1 × 1, in which case g depends only on the value of f at the point (x, y), and T in equation (3) becomes a grey-scale transform function of the form in equation (4):

s = T(r). (4)

The immediate effect of equation (4) is to convert the input grey-scale values to other values for purposes such as contrast stretching and binarisation. Histogram equalisation is a special conversion method of the form of equation (4).
There exists a very important transformation function in the image domain of the following form:

s = T(r) = (L-1) ∫_0^r p_r(w) dw, (5)

where w is an integration dummy variable and p_r(w) is the probability density function of the random variable r. The integral on the right-hand side of the formula is the cumulative distribution function of r. Because the probability density function of r is always non-negative, the cumulative distribution function is increasing with a maximum value of 1. Therefore, the transformation T(r) is an increasing function with range [0, L-1]. When r is a continuous random variable, the variable s obtained by the above transformation has a uniform probability density. The pixel values in an image can be treated as a discrete variable on the interval [0, L-1], and discretising equation (5) gives the histogram equalisation transform:

s_k = T(r_k) = (L-1) Σ_{j=0}^{k} p_r(r_j) = ((L-1)/MN) Σ_{j=0}^{k} n_j, k = 0, 1, ..., L-1, (6)

where r_k, MN, n_k, and p_r(r_k) are as defined previously and L is the number of grey levels in the image. The histogram equalisation transform satisfies two important conditions: T(r) is monotonically increasing on the interval 0 ≤ r ≤ L-1, and 0 ≤ T(r) ≤ L-1 on that interval.
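The discrete mapping of equation (6) can be implemented in a few lines of NumPy; a minimal sketch, applied to a synthetic low-contrast image:

```python
# Minimal histogram-equalisation sketch following equations (1)-(6):
# s_k = (L-1) * sum_{j<=k} p_r(r_j), applied pixel-wise via a lookup table.
import numpy as np

def equalize(img, L=256):
    """img: 2-D array of integer grey levels in [0, L-1]."""
    M, N = img.shape
    n_k = np.bincount(img.ravel(), minlength=L)  # eq. (1): histogram n_k
    p_r = n_k / (M * N)                          # eq. (2): normalised histogram
    T = np.round((L - 1) * np.cumsum(p_r))       # eq. (6): cumulative mapping
    return T.astype(img.dtype)[img]              # apply s_k = T(r_k) pixel-wise

# Synthetic low-contrast image: grey levels confined to [0, 63]
dark = np.tile(np.arange(64), 16).reshape(32, 32)
flat = equalize(dark)                            # levels stretched toward 255
```

Because the grey levels in `dark` occupy only a quarter of the range, the cumulative mapping stretches them across the full interval [0, 255], which is exactly the contrast-enhancement effect the preprocessing step relies on.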

Data Enhancement.
When data samples are insufficient, data augmentation is often used to expand the sample. A common means of data augmentation is geometric transformation, and in this paper the sample is expanded by rotation and mirroring. In the field of image recognition, rotating, translating, or mirroring an image does not change its essential information. Moreover, by rotating, translating, and mirroring images, samples can be obtained from multiple angles and directions and at different sizes [15]. This transformation can improve the recognition accuracy of the convolutional neural network, because it effectively reduces the risk that the network is poorly trained due to differences in shooting equipment or shooting angles, which would otherwise cause recognition errors.
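The rotation-and-mirroring expansion described above can be sketched as follows: for a square image, the four rotations and their mirror images yield up to eight label-preserving views per sample.

```python
# Geometric augmentation sketch: rotations and mirror flips expand each
# square image into up to eight label-preserving variants.
import numpy as np

def augment(img):
    """Return the 8 rotated/mirrored variants of a square image."""
    views = []
    for k in range(4):                  # 0, 90, 180, 270 degree rotations
        r = np.rot90(img, k)
        views.append(r)
        views.append(np.fliplr(r))      # mirrored counterpart
    return views

sample = np.arange(9).reshape(3, 3)     # generic (asymmetric) test image
expanded = augment(sample)              # 8 distinct views of one sample
```

For a generic image with no internal symmetry, all eight views are distinct, so the dataset grows eightfold without collecting new photographs.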

Based on the Improved AlexNet
In this paper, AlexNet has low accuracy for the tasks of identifying the extent of damage to beams and columns and to infill walls, and this cannot be significantly improved simply by adjusting the hyperparameters, so improvements to the original AlexNet are considered. In recent years, many scholars have modified and enhanced AlexNet to improve recognition accuracy. In [16], WN (Weight Normalization) was proposed to replace the LRN (Local Response Normalization) layer, with WN placed after all pooling layers to improve the training accuracy of the AlexNet model. In [17], a fused piecewise activation combining ReLU6 and Swish was proposed to address the problems of weights failing to update and gradient explosion with the ReLU activation function in AlexNet, which improved the convergence speed and accuracy of AlexNet training and also alleviated overfitting [18].
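For reference, the two activations mentioned above are defined as follows; how [17] fuses them is not specified here, so only the standard definitions are shown, in NumPy form.

```python
# Standard definitions of the two activations referenced above.
import numpy as np

def relu6(x):
    # Bounded ReLU: clipping at 6 prevents unbounded activations.
    return np.clip(x, 0.0, 6.0)

def swish(x):
    # Smooth, non-monotone: x * sigmoid(x); small negative inputs keep a
    # non-zero output (and gradient), unlike ReLU.
    return x * (1.0 / (1.0 + np.exp(-x)))

x = np.array([-2.0, 0.0, 3.0, 10.0])
```

ReLU6 bounds large activations while Swish keeps gradient flow for mildly negative inputs, which is the combination of properties the fused activation in [17] exploits.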

An Improved AlexNet Network Structure.
As the number of layers increases, the disadvantages of a small dataset become apparent: when the network is deeper, not only does the test error become higher, but its training error is, surprisingly, higher as well. Deeper networks are associated with vanishing and exploding gradients, which hinder convergence, and this phenomenon of deeper networks performing worse is generally referred to as the degradation problem. To prevent the model from training poorly due to the increase in parameters after adding convolutional layers, we choose to combine the residual structure of ResNet with AlexNet, so that convolutional layers can be added without a surge in parameters, thereby improving the problems of low accuracy and poor convergence in multitask recognition [19,20].
Assume that the input to a particular segment of the neural network is x and the desired output is H(x), i.e., H(x) is the desired underlying mapping; learning this mapping directly can be difficult to train. If a near-saturated accuracy has already been learnt (or when the error in the lower layers is found to grow), the learning goal shifts to learning an identity mapping, i.e., making the output approximate the input x, so that accuracy does not drop in the later layers.
In Figure 5, the residual network structure is realised by means of "shortcut connections", where the input x is passed directly to the output as the initial result, and the output is H(x) = F(x) + x. If F(x) = 0, then H(x) = x, which is the identity mapping mentioned above. Thus, ResNet changes the learning goal from learning a complete output to learning the difference between the target value H(x) and x, known as the residual F(x) = H(x) - x. The training goal is then to drive the residual towards 0, so that accuracy does not decrease as the network deepens. The two structures in Figure 6 are for ResNet34 and for ResNet50/101/152 respectively, and the purpose of the latter is mainly to reduce the number of parameters. The left structure uses two 3 × 3 convolutions on 256 channels, with 2 × (3 × 3 × 256 × 256) = 1,179,648 parameters, while the right structure first reduces the 256 channels to 64 with a 1 × 1 convolution, applies a 3 × 3 convolution at 64 channels, and finally restores 256 channels with another 1 × 1 convolution, for a total of 1 × 1 × 256 × 64 + 3 × 3 × 64 × 64 + 1 × 1 × 64 × 256 = 69,632 parameters. The number of parameters is reduced by a factor of about 16.94 compared to the left structure, so the main purpose of the right structure is to reduce the parameter count and hence the computational effort [21].
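The parameter counts quoted above can be checked directly (convolution weights only; biases and batch-norm parameters are ignored, as in the text):

```python
# Verify the Basic block vs. Bottleneck parameter counts quoted above
# (weights only, ignoring biases and batch norm).
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out

# Basic block: two 3x3 convolutions on 256 channels.
basic = 2 * conv_params(3, 256, 256)

# Bottleneck: 1x1 down to 64, 3x3 at 64, 1x1 back up to 256.
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

ratio = basic / bottleneck  # parameter reduction factor, about 16.94
```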

Mathematical Problems in Engineering
For a regular ResNet (Figure 6(a)), the Basic block can be used for networks with 34 layers or fewer, while the Bottleneck (Figure 6(b)) is used for deeper networks (e.g. 101 layers) with the aim of reducing computation and parameters. This paper therefore considers adding the Basic block module after the convolutional layers of AlexNet and replacing the three fully-connected layers with one. The AlexNet convolutional layers are used to extract feature information from the images in the dataset, and comparison experiments are set up in which one, two, or three Basic blocks are added to the classification layer; the accuracy of each network is then tested after training. The network improvement scheme is shown in Figure 7.

Vibration Recognition.
The time- and frequency-domain response of the sensor was tested at different frequencies and accelerations. The response of the sensor is shown in Figure 8 for external excitation frequencies of 90 Hz and 350 Hz and accelerations of 1.0 g and 3.0 g respectively. As can be seen from the time-domain signal, the output waveform of the FBG vibration sensor is sinusoidal. To verify the amplitude-frequency characteristics of the sensor, the maximum drift of its central wavelength was investigated at constant acceleration for different frequencies. The maximum value of the centre wavelength of the sensor was recorded, and the resulting curve is shown in Figure 9. It can be seen that the natural frequency of the sensor at this acceleration is approximately 543.9 Hz, which is in good agreement with the simulated result of 568.6 Hz. The slight deviation in frequency is due to sensor assembly error and other factors.
The spectrograms of the sensor were measured at different frequency and acceleration excitations. The spectra of the sensor are shown in Figure 10 for external excitation frequencies of 90 Hz and 350 Hz and accelerations of 1.0 g and 3.0 g respectively. It can be seen that the FBG spectrum did not broaden with increasing acceleration, indicating that the chirp phenomenon was effectively avoided. The acceleration response of the sensor was then tested at a fixed frequency of 300 Hz with an external excitation of 0.2 g. The results showed that the relationship between acceleration and wavelength shift was quasi-linear, with a linear fit of 99% or more; the sensitivity of the sensor was 6.7 pm/g, and the repeatability error was about 1.7%, indicating good repeatability.
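The quasi-linear fit described above can be illustrated with a first-order polynomial fit; the data below are synthetic, generated from the 6.7 pm/g sensitivity quoted in the text, not the authors' measured values.

```python
# Illustrative linear fit of wavelength shift vs. acceleration
# (synthetic data assuming the quoted 6.7 pm/g sensitivity).
import numpy as np

acc = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # acceleration, g
shift = 6.7 * acc                                # wavelength shift, pm

slope, intercept = np.polyfit(acc, shift, 1)     # first-order fit

# Coefficient of determination R^2 for the fit
pred = slope * acc + intercept
ss_res = np.sum((shift - pred) ** 2)
ss_tot = np.sum((shift - shift.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

The fitted slope recovers the sensitivity in pm/g, and R^2 near 1 corresponds to the "linear fit of 99% or more" reported for the sensor.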

Comparison of Whether Damage Is Identified or Not.
In Scheme 1 for determining whether damage is present, as shown in Figure 11, the accuracy of both CNN1 and CNN2 is below 70% while the accuracy of LSTM is over 95%, so only LSTM can meet the requirements in this damage-identification step. The accuracy of CNN2 is improved by only 2.23% over CNN1, while LSTM is improved by 35.42% over CNN1, which indicates that in Scheme 1 what matters is not the increase in sample data but the recognition method; therefore, the LSTM method is more appropriate than the CNN methods in Scheme 1.
In Scheme 2, as shown in Figure 11, the accuracy of all three methods is relatively high, so all three methods are feasible. However, the recognition in Scheme 2 is more tedious, as it has to be divided into 10 separate cases, each identified individually.
Combining the accuracies of the three methods under Scheme 1 and Scheme 2, it is more appropriate to adopt Scheme 1 and use the LSTM method when determining whether damage is present.
In the recognition of damage localization, as shown in Figures 12 and 13, the accuracy of damage localization at each location and the overall average accuracy rank LSTM > CNN2 > CNN1, where the accuracy of both CNN1 and CNN2 is about 75%, while the accuracy of LSTM reaches 91.76%; thus only LSTM meets the requirements.

Conclusions
Through the comparative experiments in this paper, it can be seen that the two methods effectively alleviate the AlexNet network's overfitting problem. The improved convolutional neural network algorithm effectively addresses damage recognition for frame buildings, allowing the postearthquake evaluation of an entire frame building to be carried out quickly. Experiments on the number of parameters involved in training show that, with the feature-extractor training approach, the model only needs to retrain the final fully-connected layer.

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.