Machine Learning-Based Fragility Assessment of Reinforced Concrete Buildings

In the past, large earthquakes have caused the collapse of infrastructure and killed thousands of people in Pakistan, a seismically active region. Seismic assessment of infrastructure is therefore urgently needed, and fragility analysis is one way to carry it out. This study focuses on the fragility analysis of school buildings in Muzaffarabad district, seismic zone-4 of Pakistan. Fragility curves were developed using incremental dynamic analysis (IDA); however, this numerical analysis is time-consuming and computationally expensive. Therefore, soft computing techniques, namely Artificial Neural Network (ANN) and Gene Expression Programming (GEP), were employed as alternative methods to establish the fragility curves for the prediction of seismic performance. An optimized feedforward backpropagation ANN model with a [5-25-1] architecture was used. To achieve a reliable model, 70% of the data was used for training, 15% for validation, and 15% for testing. Similarly, the GEP model was also employed to predict the fragility curves. The results of ANN and GEP were compared based on the coefficient of determination, R². The ANN model predicts the global drift values with R² equal to 0.938, compared to 0.87 for the GEP model.


Introduction
Earthquakes are among the major natural disasters and have caused large economic and human losses in many parts of the world [1,2]. According to the World Health Organization (WHO), earthquakes killed about 747,234 people and caused an economic loss of nearly 661 billion dollars worldwide from 1998 to 2017 [3]. Most earthquake fatalities are due to the collapse of buildings and infrastructure in seismic-prone countries [4]. Pakistan is located in a highly active earthquake belt where many devastating earthquakes have jolted the region in the past [5][6][7]. In Pakistan, poor infrastructure design and construction have had deadly consequences, especially the collapse of school buildings in the 2005 earthquake, which caused the death of 19,000 students and severely affected the economy of the country [8]. Nearly two-thirds of the academic buildings collapsed completely [9]. Before the 2005 earthquake, the majority of construction in Pakistan was nonengineered or designed without detailed seismic design provisions. Old practices were followed in the design of buildings and bridges [10][11][12][13][14]. After the 2005 earthquake, the government emphasized the establishment of seismic codes, which led to the Building Code of Pakistan [15,16]. Analytical studies indicate that a large-magnitude earthquake can happen in Pakistan in the future [17]. Therefore, it is important to evaluate the structures against possible earthquakes. Fragility curves, like many other techniques, can be used for the evaluation of the seismic vulnerability of buildings [18]. Fragility is the probability that the structural response (demand) due to earthquake ground motion exceeds the capacity of the structure [19]. These curves can be beneficial for retrofitting structures and planning for possible future earthquakes in highly seismic zones such as Pakistan.
Fragility curves can be established using judgemental, empirical, or analytical procedures [20,21]. However, analytical techniques are the most commonly practiced methods using available past recorded earthquake data and structural details [22]. Researchers have established the fragility curves for 55-story core-wall RC buildings in the Philippines using two damage limit states: damage control and collapse prevention [23].
Extensive research has already been carried out on the development of fragility curves for buildings, bridges, and dams using numerical techniques [24]. However, limited literature is available on developing fragility curves using soft computing techniques. Moreover, developing fragility curves through nonlinear analysis demands substantial analysis time and is itself a resource-intensive job [19]. This article addresses these issues: machine learning techniques, i.e., ANNs and GEP, were deployed to predict the results of conventional approaches [25,26].

ANN and GEP Models
2.1. ANN Model. Artificial Neural Network (ANN) is a machine learning (ML) technique that works on the concept analogous to the human brain system. It was developed as a simplified model of brain function.
There are billions of interconnected neurons (brain cells) in the human neural system, and these cells convey signals from one neuron to the neighbouring ones [27]. ANN is used as a powerful regression tool for solving complex problems in various fields. The important feature of ANN is its ability to learn from experience and examples and to predict meaningful solutions. There are numerous applications of ANNs such as prediction, classification, data association, and filtering of data [28]. In the field of engineering, it plays a significant role in saving time and computational cost [29,30]. Limited research has been done in the field of structural engineering based on Artificial Intelligence (AI) techniques. However, researchers are interested in incorporating AI techniques in structural engineering due to their low computational cost and time saving. Huang and Huang have conducted a study on seismic fragility analysis of RC bridges using ANN [31]. Liu and Zhang have employed an ANN-based methodology for developing the fragility curves of steel frames [29]. Similarly, Neves et al. have assessed steel bridges based on structural health monitoring and optimization of civil engineering structures [32,33]. The ANN architecture consists of three or more layers: an input layer, one or more hidden layers, and an output layer. Input data (theoretical, empirical, experimental, or a combination of the three) are fed into the input layer, from which the network learns, and are then transmitted to the hidden layer. Each neuron has a particular weight, bias value, and activation function. The output layer predicts the target or final results based on various algorithms [34,35].
The neuron in the hidden layer collects the input data from the input layer and transmits it to the output layer as given in the following equation [26]:

$$h_j = f\left(\sum_{i} w_{ij} x_i + b_j\right) \quad (1)$$

Here $w_{ij}$ is the weight coefficient, $x_i$ indicates the $i$th input variable, $h_j$ shows the output of the $j$th neuron in the hidden layer, $b_j$ is the bias of the $j$th neuron in the hidden layer, and $f$ represents an activation function.
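The hidden-layer relation described above, in which each neuron applies an activation function to the weighted sum of its inputs plus a bias, can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the choice of a unit-slope log-sigmoid as the activation, and all numerical values are assumptions made for the example and are not taken from the trained network.

```python
import math


def log_sigmoid(z):
    """Log-sigmoid activation with unit slope (assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(x, weights, biases, activation=log_sigmoid):
    """Return h_j = f(sum_i w_ij * x_i + b_j) for every hidden neuron j.

    weights[i][j] is the weight from input i to hidden neuron j.
    """
    outputs = []
    for j, bias in enumerate(biases):
        z = sum(weights[i][j] * x_i for i, x_i in enumerate(x)) + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical values: 2 inputs feeding 3 hidden neurons.
x = [0.5, -1.0]
w = [[0.2, -0.4, 0.0],
     [0.1, 0.3, 0.0]]
b = [0.0, 0.5, 1.0]
h = hidden_layer(x, w, b)
```

For the 5-25-1 network used in this study, `x` would hold five inputs and `biases` would hold 25 entries; the same loop applies unchanged.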
There are various types of artificial neural network architecture. The most common types of ANN are the Single Layer Perceptron, Multilayer Perceptron (MLP), Radial Basis Network (RBN), Recurrent Neural Network (RNN), Probabilistic Neural Network (PNN), and Cascade Correlation Neural Network [36].
One of the most commonly used algorithms is backpropagation, which reduces the error function. Several error metrics are used for solving engineering problems, and selecting an appropriate error metric is essential [37]. The error function is the difference between the predicted value and the actual output value, and the mean square error (MSE) is employed to measure this difference:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(O_i - \bar{O}_i\right)^2 \quad (2)$$

where $N$ is the number of test data, $O_i$ is the $i$th predicted value, and $\bar{O}_i$ is the $i$th desired target value. Training of the ANN model halts once the MSE is lower than the target value; otherwise, training continues and the weights and biases keep being updated. Another important decision when creating an ANN model is the selection of an appropriate activation function. Activation functions can be either linear or nonlinear; the most widely used are the log-sigmoid and tan-sigmoid functions. A nonlinear function captures the nonlinear behaviour of the available data, so the nonlinear sigmoid function is adopted in this study as shown in the following equation [31]:

$$f(x) = \frac{1}{1 + e^{-\alpha x}} \quad (3)$$

Here $\alpha$ is the sigmoid slope ($\alpha = 1$). The output data can be computed in the output layer as follows:

$$Y = f\left(\sum_{j} X_j h_j + C\right) \quad (4)$$

where $X_j$ is the weight component between the output layer and the $j$th neuron in the hidden layer, while $C$ is the bias coefficient in the output layer. The Levenberg-Marquardt (LM) algorithm, also known as the damped least squares method, is used for solving nonlinear least-squares problems [38]. LM can be represented mathematically by the following equation [39]:

$$y_{k+1} = y_k - \left[\Lambda^{T}(y_k)\,\Lambda(y_k) + \mu_k I\right]^{-1} \Lambda^{T}(y_k)\,V(y_k) \quad (5)$$

where $y_{k+1}$ = learning function, $\Lambda(y_k)$ = Jacobian matrix, $V(y_k)$ = nonlinear function, $\mu_k$ = damping parameter, and $I$ = identity matrix. Bayesian Regularization (BR) is another backpropagation training algorithm for ANNs that updates the weights and biases as per the LM optimization [40]. The one-step secant (OSS) algorithm bridges the quasi-Newton approach and conjugate gradient algorithms.
It does not require computing the inverse matrix while calculating the new search direction. However, more computation is needed using the OSS algorithm as compared to conjugate gradient methods [41].
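The training logic described above (stop once the MSE falls below a goal, otherwise keep updating the parameters with damped least-squares steps) can be illustrated on a toy problem. The sketch below is not the network training used in this study: it fits a hypothetical two-parameter curve, `a * (1 - exp(-b * x))`, so that the Jacobian and the 2x2 linear solve can be written out by hand. The names `lm_step` and `fit`, the damping schedule (divide or multiply `mu` by 10), and the synthetic data are all assumptions of the example.

```python
import math


def mse(predictions, targets):
    """Mean square error between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def model(params, x):
    """Hypothetical two-parameter curve used only for this illustration."""
    a, b = params
    return a * (1.0 - math.exp(-b * x))


def lm_step(params, xs, ys, mu):
    """One damped least-squares update: p - (J^T J + mu I)^-1 J^T r."""
    a, b = params
    residuals = [model(params, x) - y for x, y in zip(xs, ys)]
    # Jacobian rows: derivatives of each residual with respect to a and b.
    jac = [(1.0 - math.exp(-b * x), a * x * math.exp(-b * x)) for x in xs]
    a11 = sum(row[0] * row[0] for row in jac) + mu
    a12 = sum(row[0] * row[1] for row in jac)
    a22 = sum(row[1] * row[1] for row in jac) + mu
    g1 = sum(row[0] * r for row, r in zip(jac, residuals))
    g2 = sum(row[1] * r for row, r in zip(jac, residuals))
    det = a11 * a22 - a12 * a12
    # Solve the 2x2 system with the explicit inverse.
    d1 = (a22 * g1 - a12 * g2) / det
    d2 = (a11 * g2 - a12 * g1) / det
    return (a - d1, b - d2)


def fit(xs, ys, params=(1.0, 1.0), mu=1e-2, goal=1e-12, max_epochs=200):
    """Iterate until the MSE drops below the goal or the epochs run out."""
    error = mse([model(params, x) for x in xs], ys)
    for _ in range(max_epochs):
        if error < goal:
            break
        trial = lm_step(params, xs, ys, mu)
        trial_error = mse([model(trial, x) for x in xs], ys)
        if trial_error < error:      # accept the step and relax the damping
            params, error, mu = trial, trial_error, mu / 10.0
        else:                        # reject the step and damp more strongly
            mu *= 10.0
    return params, error


xs = [0.1 * k for k in range(1, 11)]
ys = [model((2.0, 3.0), x) for x in xs]   # synthetic, noise-free targets
fitted, final_error = fit(xs, ys)
```

A small `mu` makes the step behave like Gauss-Newton, while a large `mu` shortens it toward a gradient-descent step, which is why the damping is raised only after a step fails to reduce the error.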

GEP Model. Genetic Programming (GP) is another machine learning technique that is an extended form of the Genetic Algorithm (GA), introduced by Koza [42]. The GA was developed by Holland, inspired by Darwin's theory of evolution [43]. GP and GA differ in how solutions are represented. The GA uses linear strings or chromosomes of fixed length, while GP uses nonlinear entities of different sizes and shapes (parse trees). In 1999, Candida Ferreira introduced Gene Expression Programming (GEP), a technique that produces computer programs for modelling any phenomenon by mimicking biological evolution [44].
In the GEP technique, the entities are encoded as linear strings of fixed length (genome or chromosome), which are subsequently expressed as nonlinear entities of different shapes and sizes (expression trees). Despite its fixed length, the GEP chromosome can encode expression trees (ETs) of varying sizes and shapes [44]. There are two major components in GEP, namely, the chromosome or genome and the expression tree (ET). The chromosome comprises a linear string of fixed length having one or more genes. Each gene is itself a string of fixed length built from a function set (such as arithmetic operations) and a terminal set of variables and constants. The genes in GEP consist of a head and a tail.
The head includes symbols that represent both functions and terminals, while the tail comprises terminals only [45,46]. The significance of this research is to employ the ML approach for the prediction of drift ratios of school buildings and to compare them with the results obtained through the conventional approach. The optimized ANN architecture and functions were used to develop the fragility curves. Similarly, the GEP model, including the expression trees and equations, was utilized for forecasting the drift values, which were then used in the development of fragility curves. The performance of the ANN and GEP models was evaluated using MSE.
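The head-and-tail gene structure described above can be made concrete with a short sketch that decodes a fixed-length gene into an expression tree and evaluates it. The gene string, the three-operator function set, and the terminal values below are hypothetical and chosen only for illustration; they are not the genes of the model developed in this study. The tail-length rule t = h(n - 1) + 1, with h the head length and n the largest function arity, is the standard GEP relation that ensures every function in the head can receive its arguments.

```python
from collections import deque
import operator

# Function set: symbol -> (callable, arity). Any other symbol is a terminal.
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
}


def tail_length(head_length, max_arity):
    """Tail size that lets every function in the head receive its arguments."""
    return head_length * (max_arity - 1) + 1


def decode(gene):
    """Build the expression tree by filling it level by level from the gene."""
    root = (gene[0], [])
    pending = deque([root])  # nodes whose children are still to be read
    position = 1
    while pending:
        symbol, children = pending.popleft()
        arity = FUNCTIONS[symbol][1] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = (gene[position], [])
            position += 1
            children.append(child)
            pending.append(child)
    return root


def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        function, _ = FUNCTIONS[symbol]
        return function(*(evaluate(child, values) for child in children))
    return values[symbol]


head, tail = "+*-P", "MVPMV"     # hypothetical gene with head length 4
gene = head + tail
tree = decode(gene)              # encodes (P * M) + (V - P)
value = evaluate(tree, {"P": 0.4, "M": 7.0, "V": 2.0})
```

Only the first seven symbols of this nine-symbol gene are expressed; the last two remain as a noncoding region, which is how a fixed-length gene can yield trees of different sizes.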

Methodology
This study focuses on the ANN- and GEP-based fragility assessment of reinforced concrete school buildings located in seismic zone-4 of Pakistan that were designed after the Muzaffarabad earthquake. Field data were collected from the school buildings through field visits and professional interviews with the concerned authorities. Field data containing structural information such as the number of spans, beams, and columns, the thickness of slabs, and the location of stairs were collected by Ferreira from district Muzaffarabad for the existing RC school buildings and were verified against the structural drawings [44]. The database containing building dimensions is tabulated in Table 1 [21]. Nonlinear incremental dynamic analysis was employed to determine the structural behaviour under 20 gradually scaled past earthquake records selected according to the site history. The Peak Ground Acceleration (PGA) was selected as the seismic intensity measure (IM). Three damage limit states were defined: serviceability damage state (DS1), damage control limit state (DS2), and collapse prevention damage state (DS3). The fragility curves were plotted for each damage state using the IDA procedure in FEM software (PERFORM-3D).
Furthermore, the seismic performance was predicted using the ANN model. Global drift (Y) is chosen as the output value in the output layer. Obtaining accurate target values from the trained ANN model requires several important tasks: defining a suitable ANN architecture, selecting the training and testing data sets, and training and testing the network. The same architecture is trained and tested for each damage limit state. The ANN model was trained and tested in MATLAB. One of the important steps in solving the problem is choosing the optimum number of neurons and hidden layers. The percentage of error can be reduced by optimizing the number of hidden layers and the number of neurons. Defining the ANN architecture comprises the selection of the training algorithm, the number of hidden layers, the number of neurons, and the activation functions. This study utilizes a network architecture of 5 neurons in the input layer, 25 neurons in the hidden layer, and one neuron in the output layer. An appropriate training algorithm is required to recognize the relationship between the input layer and the output layer. In this study, three training algorithms (Levenberg-Marquardt, Bayesian Regularization, One-Step Secant) were compared, and the best one was chosen for the establishment of fragility curves. The ANN flow diagram, including the collection of data, selection of architecture, training and testing of the network, and finally obtaining the results, is shown in Figure 1.
The drift ratios were also predicted using the GEP model. The GEP model was created using three groups of fitting parameters: general parameters, numeric constants, and genetic operators. The general parameters consist of the number of chromosomes, the number of genes, head size, and the linking functions. The numeric constants comprise constants per gene, data type, lower bound, and upper bound.
The genetic operators include the rates of mutation, function insertion, gene transposition, and gene recombination. Figure 2 depicts the sequence of the GEP flowchart, i.e., collection of data, distribution of data into the training and testing sets, construction of chromosomes, display and execution of expression trees (ETs), and the measurement of fitness. Lastly, the results of IDA, ANNs, and GEP were compared for the fragility assessment of RC buildings.

Configuration of Building Topology
The buildings considered in this study are one- to three-story buildings with different numbers of bays in both orthogonal directions, as shown in Table 1. The sizes of columns and beams differ among the structures considered in this study. The height and floor area of the selected buildings are also tabulated in Table 1.

Structural Modelling
To perform the nonlinear analysis, a 3D nonlinear static model was created in CSI-Perform 3D that captures various aspects of structural behaviour such as hinge rotation and material strain. The characteristics of locally available materials were considered to achieve the actual structural behaviour as per the research done by Rafi and Nasir [47]. Grade-40 reinforcing steel was used in the analytical model as suggested by professional experts during the interviews. The nonlinear behaviour of the structure can be captured if nonlinear inelastic fibre sections are used for modelling the reinforcing steel and concrete. For this purpose, the Mander model was assigned to the concrete material, and complete confinement effects were accounted for in the analysis [19,48]. For steel bars, the nonbuckling steel model was used. The material and loading values are tabulated in Table 2.

Selection of Ground Motion
An important step in the development of fragility curves is the selection of appropriate past ground motion records. The target spectrum is provided by the Building Code of Pakistan (BCP), but it is not reliable. Therefore, in this study, a total of 20 different earthquake ground motions were adopted based on magnitude (here 6.5-8.0), source-to-site distance, and shear-wave velocity, Vs (here 175-300 m/sec), as shown in Table 3 [29]. These earthquake records were selected to match the site characteristics, such as the fault mechanism of the Kashmir region. The seismic intensity measure (IM) is an important parameter for the establishment of fragility curves; however, there is no clear method for deciding which intensity measure to select. Researchers have used various IMs such as Peak Ground Velocity (PGV), Peak Ground Acceleration (PGA), Peak Ground Displacement (PGD), and spectral response acceleration (Sa) [49]. The intensity measures can be obtained either directly from earthquake records or from the response spectrum of past recorded earthquakes that match the site history.
If actual earthquake data are unavailable for a particular location, then a synthetic ground motion can also be developed [50]. The most frequently used intensity measure for fragility analysis is PGA; however, Sa is considered comparatively better [51]. Similarly, Seo et al. performed the fragility analysis of curved steel bridges using PGA as an intensity measure. Tavares et al. used PGA values for the establishment of fragility curves to evaluate the vulnerability of Canadian highway bridges [52]. The fundamental principle in selecting an accurate IM is how well the hazard level of the ground motion correlates with the damage level of the structures. Therefore, the closer this correlation, the more accurate the fragility curves.

Structural Limit State Definition
A structure cannot fulfil its intended function once the limit state is exceeded. The structural capacity can be determined by establishing suitable limit states. Different limit states have been used by researchers in the past for the evaluation of structures. Avşar et al. employed three damage limit states (serviceability, damage control, and collapse prevention) for the assessment of ordinary highway bridges [49]. Researchers used four damage limit states (slight, moderate, extensive, and collapse) based on interstory drift ratio to assess steel buildings against earthquake loading [53,54]. In this study, global drift values are used to define the damage states, adopted from the research conducted by Zain et al., as shown in Table 4.
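Mapping a computed global drift to one of the three damage states amounts to a threshold lookup, which can be sketched as follows. The drift limits in the code are placeholders invented for the example, since the Table 4 values are not reproduced in the text; they should be replaced with the actual limits before any use.

```python
from bisect import bisect_right

# Hypothetical global drift limits in percent (placeholders, NOT Table 4 values).
DRIFT_LIMITS = [0.5, 1.5, 2.5]
STATES = ["none", "DS1", "DS2", "DS3"]


def damage_state(global_drift):
    """Return the highest damage state whose drift limit is reached."""
    return STATES[bisect_right(DRIFT_LIMITS, global_drift)]


labels = [damage_state(d) for d in (0.4, 0.5, 1.7, 3.0)]
```

Using `bisect_right` means a drift exactly equal to a limit is counted as having reached that state, which is the conservative reading of "exceeds or equals".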

Development of Fragility Curve
Fragility can be defined as the probability of occurrence of damage in a structure or structural member. The probability of failure can be represented by a lognormal distribution, which can be mathematically expressed as shown in the following equation [55]:

$$P_f = P\left[D \geq C \mid IM\right] = \Phi\left(\frac{\ln(IM) - \ln(m)}{\beta}\right) \quad (6)$$

where $P_f$ = probability of damage occurrence, $\Phi$ = standard normal cumulative distribution function, $D$, $C$ = earthquake demand and capacity of the structure, respectively, and $IM$ = earthquake ground motion intensity measure (PGA, as in this study). The terms $m$ and $\beta$ are the median value and dispersion of the lognormal distribution, respectively. The lognormal distribution has been used by different researchers for the prediction of structural damage due to earthquake ground motion [56].
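The lognormal fragility model described above is straightforward to evaluate numerically. The sketch below computes the probability of exceedance at a given PGA from a median and a dispersion, using the error function from the Python standard library for the standard normal distribution. The median of 0.45 g and dispersion of 0.5 are hypothetical values chosen for illustration, not results of this study.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fragility(im, median, beta):
    """Probability of exceeding a limit state at intensity `im`.

    Assumes capacity is lognormally distributed with the given median
    and logarithmic dispersion `beta`.
    """
    if im <= 0.0:
        return 0.0
    return standard_normal_cdf(math.log(im / median) / beta)


# Hypothetical parameters for one damage state.
median_pga, dispersion = 0.45, 0.5
curve = [(pga, fragility(pga, median_pga, dispersion))
         for pga in (0.2, 0.45, 0.8, 1.2)]
```

By construction the curve passes through 50% at the median PGA, and a larger dispersion flattens it around that point.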

Incremental Dynamic Analysis
IDA is one of the most reliable and effective approaches for the assessment of the structural performance of buildings subjected to earthquake ground motion. In this method, the ground motion records are scaled and then applied to the structure. The nonlinear response of the structure is captured in each step. Bayar and Bavaghar [57] adopted the IDA technique for the establishment of fragility curves for highway skewed bridges. The damage assessment of structures can be done by computing the likelihood of damage caused by earthquake ground excitation. For this purpose, the fragility curves can be employed to quantify the probability of such damage.
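The record-scaling loop at the heart of IDA can be outlined as below. This is a skeleton under stated assumptions: each record is scaled so that its peak absolute acceleration equals the target PGA, and the nonlinear time-history analysis is represented by a user-supplied callable named `analyse`, which here is a purely illustrative stand-in and not a structural model. In this study that step was carried out in PERFORM-3D.

```python
def scale_record(accelerations, target_pga):
    """Scale a record so that its peak absolute acceleration equals target_pga."""
    peak = max(abs(a) for a in accelerations)
    factor = target_pga / peak
    return [a * factor for a in accelerations]


def pga_levels(start, stop, step):
    """Evenly spaced intensity levels from start to stop, inclusive."""
    count = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(count + 1)]


def run_ida(records, levels, analyse):
    """Return {record name: [(pga, response), ...]} for every scaled record.

    `analyse` stands in for the nonlinear time-history analysis and must
    return the response quantity of interest (e.g., global drift).
    """
    return {
        name: [(pga, analyse(scale_record(accel, pga))) for pga in levels]
        for name, accel in records.items()
    }


# Hypothetical short record and a toy response function for demonstration only.
records = {"record_01": [0.0, 0.1, -0.2, 0.05]}
levels = pga_levels(0.1, 0.5, 0.1)
curves = run_ida(records, levels, analyse=lambda acc: 2.0 * max(abs(a) for a in acc))
```

Each entry of `curves` is one IDA curve: the response of the structure to a single record at increasing intensity.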

IDA-Based Fragility Curves.
The results of numerical analysis (NA), i.e., incremental dynamic analysis, were utilized to develop the seismic fragility curves for the building topology under consideration. The lognormal distribution function was used for the conditional probabilities, computed with MATLAB code. The fragility curves for the three limit states, serviceability (DS-1), damage control (DS-2), and collapse prevention (DS-3), were established by plotting the IM on the X-axis and the probability of exceedance on the Y-axis, as shown in Figure 3. The probability of exceedance of any limit state can be read from the fragility curves.
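One common way to turn IDA results into lognormal fragility parameters is to take, for each record, the lowest PGA at which the limit state is reached, and then compute the mean and standard deviation of the logarithms of those values. The sketch below illustrates that approach; it is an assumption of this example and not necessarily the fitting procedure used in the MATLAB code of this study. The IDA curves and drift limit in the example are hypothetical.

```python
import math


def first_exceedance_pga(ida_curve, drift_limit):
    """Lowest PGA on an IDA curve at which the drift limit is reached.

    `ida_curve` is a list of (pga, drift) pairs sorted by increasing pga.
    Returns None when the limit is never reached.
    """
    for pga, drift in ida_curve:
        if drift >= drift_limit:
            return pga
    return None


def fit_lognormal(capacities):
    """Median and logarithmic dispersion from sample moments of ln(capacity)."""
    logs = [math.log(c) for c in capacities]
    mean_log = sum(logs) / len(logs)
    variance = sum((v - mean_log) ** 2 for v in logs) / (len(logs) - 1)
    return math.exp(mean_log), math.sqrt(variance)


# Hypothetical IDA curves for three records and a hypothetical drift limit.
ida_curves = [
    [(0.1, 0.2), (0.2, 0.6), (0.4, 1.4), (0.8, 3.0)],
    [(0.1, 0.1), (0.2, 0.3), (0.4, 0.7), (0.8, 1.9)],
    [(0.1, 0.1), (0.2, 0.2), (0.4, 0.4), (0.8, 0.9)],
]
capacities = [first_exceedance_pga(curve, 0.5) for curve in ida_curves]
capacities = [c for c in capacities if c is not None]
median, beta = fit_lognormal(capacities)
```

Records that never reach the limit state are simply dropped here, which biases the estimate when many records are censored; a maximum-likelihood fit on exceedance counts handles that case better.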

ANN-Based Fragility Curves.
The numerical analysis procedure is computationally time-consuming and costly; therefore, machine learning techniques might be helpful. In the current study, the fragility curves were predicted using the ANN approach for all three damage limit states, as shown in Figure 4. All three damage limit states of the ANN model show the same trends as in the IDA-based approach. In order to obtain an accurate model, 70% of the data was selected for training, 15% for validation, and 15% for testing the model. Three backpropagation algorithms, LM, OSS, and BR, are compared based on their performance, i.e., mean square error (MSE), as shown in Table 5. It is evident that Levenberg-Marquardt (TRAINLM) is the most efficient of the algorithms, with the lowest MSE, equal to 0.0257. Therefore, the LM training algorithm has been adopted for the ANN model. Table 6 shows the optimized ANN architecture (5-25-1), which was selected by trial and error. A single hidden layer with 25 neurons is selected based on the lowest number of epochs and MSE value. MSE is chosen as the performance function and LM as the training function.
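The 70/15/15 division of the data into training, validation, and testing subsets can be reproduced with a random shuffle of sample indices. The sketch below is a generic illustration with an assumed sample count and random seed; it is not the MATLAB routine used in this study.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into train, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # remainder, so no sample is lost
    return train, validation, test


# Hypothetical data set of 200 samples.
train, validation, test = split_indices(200)
```

Fixing the seed makes the split repeatable, which matters when different training algorithms are compared on the same data.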
To assess the drift values predicted by the ANN model, the predicted values are plotted against the target values as shown in Figure 5. The performance of the model is indicated by the coefficient of determination, R², which is equal to 0.9384.
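The goodness-of-fit measure used to compare target and predicted drift values can be computed as follows. The sketch uses the definition R² = 1 - SS_res / SS_tot; this is an assumption, since some tools instead report the correlation coefficient R of a regression line through the predicted-versus-target points, and the two agree only in special cases. The numbers are hypothetical.

```python
def r_squared(targets, predictions):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical target and predicted drift values.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.1, 1.9, 3.2, 3.8]
score = r_squared(targets, predictions)
```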

GEP-Based Fragility Curves.
Selecting an appropriate GEP architecture is necessary; hence a suitable architecture with optimized GEP parameters was selected in this study. The parameters, for example, head size, number of chromosomes, number of genes, functions, and data type, are shown in Table 7. The number of genes and number of chromosomes are limited to 3 and 100, respectively. The fragility curves for all three limit states were plotted as shown in Figure 6 and follow almost the same trend as the IDA-based fragility curves; however, compared to the ANN model, the accuracy of the GEP model is lower in terms of the coefficient of determination, R².
To assess the GEP model, its predicted values were compared with the target values as shown in Figure 7. The results obtained from the GEP model are in the acceptable range, with a coefficient of determination, R², equal to 0.87. The error distribution between the target and predicted values is also shown in Figure 8. From the error plot, the maximum error and average error are 1.451 and 0.1953, respectively, for the ANN model. The GEP model gives a maximum error of 1.478 and an average error of 0.133. The maximum error of the GEP model is therefore slightly greater than that of the ANN model, although its average error is lower. The global drift values (Y) can also be predicted using a GEP-based empirical equation. Here equation (7) is used to forecast the global drift values. The variables A, B, and C of equation (7) can be determined using equations (8)-(10), respectively.
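The maximum and average errors quoted above can be obtained from the target and predicted values with a few lines of code. The sketch assumes that errors are taken as absolute differences, which the text does not state explicitly, and uses hypothetical numbers.

```python
def error_summary(targets, predictions):
    """Return (maximum absolute error, mean absolute error)."""
    abs_errors = [abs(t - p) for t, p in zip(targets, predictions)]
    return max(abs_errors), sum(abs_errors) / len(abs_errors)


# Hypothetical target and predicted drift values.
max_error, mean_error = error_summary([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

Reporting both numbers is useful because a model can have a lower average error and still produce the larger worst-case error.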
In equations (7)-(10), Y = global drift values, N = earthquake number, D = duration of earthquake, M = magnitude of earthquake, P = Peak Ground Acceleration (PGA), and V = average seismic shear-wave velocity up to a depth of 30 meters from the surface (Vs30). Figure 9 shows the graphical representation of the GEP model, which can be decoded in the form of equations. The fragility curves established using incremental dynamic analysis are compared with those of the ANN and GEP techniques for all three limit states, as shown in Figure 10. In Figure 10(a), the ANN model reaches a given probability of exceedance at a lower PGA value than the GEP model. At a PGA of 0.4 g, the probability of exceedance is approximately 58%, 40%, and 20% using the ANN model, IDA approach, and GEP model, respectively. However, the trend reverses at a PGA of nearly 0.6 g: the GEP model predicts the highest probability of exceedance, followed by the IDA approach, and the lowest probability of exceedance is predicted by the ANN model. At a PGA of 1.0 g and above, the fragility curves of all three models coincide, showing a 100% probability of exceedance of the serviceability limit state.
For the damage control limit state in Figure 10(b), the ANN model follows almost the same trend as the NA model; however, the GEP model shows a high probability of exceedance at PGA greater than 0.8 g. At a PGA of 0.6 g, the probability of exceedance is nearly 33%, 18%, and 4% using the ANN, IDA, and GEP approaches, respectively. There is an abrupt change in the curvature of the GEP-based fragility curve at the damage control limit state; however, the ANN model predicts very well at a PGA of 1.0 g and above.
For the collapse prevention limit state, both the ANN and GEP approaches forecast almost the same result as the numerical analysis-based approach, as shown in Figure 10(c). The results obtained can be used for the fragility assessment of buildings.

Conclusion
This study presents the application of machine learning approaches, namely ANN and GEP, to predict the global drift values for the fragility analysis of RC school buildings in Pakistan. The results of ANN and GEP were compared with the results of the incremental dynamic analysis (IDA) technique. Using soft computing techniques, the computational cost can be reduced, and the complex modelling process of the numerical approaches can be avoided. Therefore, machine learning techniques are useful in predicting the seismic behaviour of buildings using fragility curves.
(i) Of the two ML approaches, the ANN model gives better results in terms of the coefficient of determination than the GEP model for all three limit states. The ANN model predicts the global drift values with R² equal to 0.938, compared to 0.87 for the GEP model.
(ii) Both ML models produced fragility curves very similar to those obtained using numerical modelling. Unlike conventional numerical approaches such as incremental dynamic analysis, which is time-consuming and requires huge computational cost, the ML techniques, especially the ANN approach, can be utilized to establish the fragility curves for the seismic assessment of buildings.
The performance of the ML models can be further enhanced by improving the quality of the data and increasing the number of datasets.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.