Using Hybrid Artificial Intelligence Approaches to Predict the Fracture Energy of Concrete Beams

Fracture energy is commonly used to represent the fracture performance of concrete structures and beams, which is crucial for the application of concrete. However, due to the nature of concrete as a material and the complexity of the fracture process, it is difficult to determine the fracture energy of concrete accurately and to predict the fracture behavior of different concrete structures. In this study, artificial intelligence approaches were applied to seek a feasible way to solve these prediction problems. First, ridge regression (RR), the classification and regression tree (CART), and the gradient boosting regression tree (GBRT) were selected to construct the predictive models. Then, the hyperparameters were tuned with the particle swarm optimization (PSO) algorithm, and the performances of the three optimum models were compared on the test dataset. The mean squared errors (MSEs) of the optimum RR, CART, and GBRT models were 0.0447, 0.0164, and 0.0111, respectively. Compared with the RR and CART models, the hybrid model constructed with GBRT and PSO proved to be the most accurate and generalizable, both of which are significant for prediction work. The relative importance of the variables that influence the fracture energy of concrete was also obtained, and compressive strength was found to be the most significant.


Introduction
Because of its beneficial properties, i.e., excellent corrosion resistance and good compressive performance, concrete has been used extensively in the load-bearing members of building structures. However, during the casting and curing process, certain amounts of voids and defects are introduced into concrete structures, leading to a heterogeneous microstructure and low bonding strength in the interfacial transition zone (ITZ) [1][2][3][4][5]. As a result, concrete structures tend to fracture when bearing tensile loads. Hence, both the analysis of the concrete fracture phenomenon and the accurate prediction of the fracture performance of concrete are crucial for the application of concrete materials. Various indices have been proposed to represent the fracture performance of concrete, such as fracture energy, fracture toughness, and tensile strength [6][7][8].
Due to the simplicity and accuracy of the testing and calculation procedures, fracture energy is usually selected as the fracture index of concrete. The International Union of Laboratories and Experts in Construction Materials, Systems, and Structures (RILEM) recommended a standard for testing and computing the fracture energy of concrete [9]. Subsequently, many researchers became interested in determining the best way to assess the fracture energy of concrete beams. Based on the RILEM recommendation, Bazant et al. analyzed the fracture energy of concrete specimens of different sizes to determine the relationship between fracture energy and size, and they also found that the fracture energy was associated with the lengths of the notches in the concrete beams [10]. Hu found that fracture energy depends on both the size and geometry of the test specimen and proposed the concept of local fracture energy to describe the fracture along the width of a concrete beam [11]. Karamloo et al. studied the influence of the water-to-cement ratio on the fracture performance of self-compacting lightweight concrete and concluded that a remarkable relationship exists between the water-to-cement ratio and the fracture energy of concrete [12]. Kozul and Darwin found that the fracture energy of high-strength concrete decreases as the size of the aggregate increases, whereas the fracture energy of normal-strength concrete increases as the size of the aggregate increases [13]. Fracture energy was also shown to be related to the amount and coarseness of the aggregate in the concrete. The compressive strength of concrete, which is commonly used to evaluate the strength of concrete, also affects the fracture energy [14][15][16].
It is apparent that the fracture energy of concrete is affected by various factors, which makes it difficult for ordinary methods to predict the fracture energy of concrete accurately. However, artificial intelligence (AI) methods, which mimic human thinking, can be used to analyze such complex regression problems [17,18]. Artificial intelligence approaches have been used extensively in various fields. Kitouni et al. constructed a smart agricultural enterprise system based on the integration of the Internet of Things and agent technology [19]. Srinivasa et al. used data-analytics-assisted Internet of Things to produce intelligent healthcare monitoring systems [20]. Biswas et al. used a hybrid model to treat classification problems in the Internet of Things environment [21]. Artificial intelligence approaches have also been used in other fields, such as the determination of the solubility of gases in different liquids [22][23][24][25], the analysis of seismic fragility [26][27][28][29][30], the prediction of the performance of tunnel boring machines (TBMs) [31][32][33], and the prediction of rock burst in underground spaces [34]. Unfortunately, the fracture performance of concrete, which is crucial for its application, has rarely been studied using artificial intelligence methods.
In this paper, hybrid artificial intelligence approaches were used to predict the fracture energy of concrete. Ridge regression (RR), the classification and regression tree (CART), and the gradient boosting regression tree (GBRT) were used to establish the relationships between fracture energy and the influencing factors, and particle swarm optimization (PSO) was used to tune the hyperparameters of these three models. Subsequently, the performances of the three different prediction models were compared, and the importance of each of the various influencing factors was analyzed with the GBRT ensemble algorithm.
This paper is structured as follows. Section 2 presents the main details of the three machine-learning algorithms that were used in this study and introduces the theory of the PSO algorithm. Section 3 describes the dataset that was used for machine learning and the preprocessing of the data. Section 4 presents the procedure used to tune the hyperparameters. Section 5 presents the results of the tests of the performances of the different predictive models with optimum hyperparameters. The influences of the different variables on the fracture energy of concrete are compared in Section 6, and Section 7 provides a summary of the paper.

Linear Regression (LR) Algorithm.
The linear regression (LR) algorithm is one of the simplest and most extensively used prediction techniques. LR uses a single equation to describe the relationship between the variables:

Y = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n,

where x_1, x_2, …, x_n are the features that are regarded as independent variables, Y is the target variable that depends on the independent variables, and the values of θ_i are the weights assigned to the features based on their importance. The cost function J(θ) is introduced to evaluate the performance of the prediction equation; when J(θ) reaches its minimum value, the best equation is obtained. The cost function is defined as

J(θ) = (1/2m) Σ_{i=1}^{m} (Y_θ(x_i) − y_i)^2,

where m is the size of the data, Y_θ(x_i) is the predicted value, and y_i is the actual value. However, for simple linear regression, overfitting is a problem that cannot be ignored, because the model can fit the training data perfectly yet behave poorly when predicting unknown data. Hence, penalty methods such as the L1 and L2 regularization techniques are used to mitigate this problem. In this paper, only ridge regression (RR) was used, which adds an L2 penalty term to the cost function [35,36]:

J(θ) = (1/2m) Σ_{i=1}^{m} (Y_θ(x_i) − y_i)^2 + λ Σ_{j=1}^{n} θ_j^2,

where λ represents the degree of penalty. The best ridge regression model is likewise obtained by minimizing the cost function. Therefore, for the ridge regression model, the parameter α, which determines λ in the L2 penalty term, must be set before a prediction is made.
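As a minimal sketch of this idea, the penalized cost function above can be minimized in closed form, θ = (XᵀX + λI)⁻¹Xᵀy. The snippet below illustrates this on synthetic data; the data, λ value, and the choice not to model an intercept are illustrative assumptions, not details from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimise J(theta) = (1/2m)[||X @ theta - y||^2 + lam * ||theta||^2]
    via the closed-form normal equations (no intercept term, for brevity)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)  # L2 penalty regularises the system
    return np.linalg.solve(A, X.T @ y)

# Synthetic illustration: y = 2*x1 + 3*x2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, 3.0]) + 0.01 * rng.normal(size=200)

theta = ridge_fit(X, y, lam=0.1)  # weights recovered close to (2, 3)
```

With a small λ the fitted weights stay close to the generating coefficients; increasing λ shrinks them toward zero, which is exactly the overfitting control described above.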

Classification and Regression Tree (CART) Algorithm.
The classification and regression tree (CART) algorithm is a kind of decision tree algorithm that can deal with both classification and regression problems [37]. It uses a tree-like graph to assist in making decisions, and it is considered to be one of the best and most frequently used supervised learning methods [38]. The CART algorithm typically consists of two stages, i.e., the tree generation stage and the pruning stage.
Normally, a CART is generated by splitting a dataset, and the resulting tree consists of the root node, decision nodes, leaf nodes, and branches. For regression problems, the splitting criterion is recursive binary splitting [37]. For a splitting variable x^(j) (the j-th data attribute) and a splitting point s, the data are split into two subsets,

R_1(j, s) = {x | x^(j) ≤ s} and R_2(j, s) = {x | x^(j) > s},

and the optimal pair (j, s) is found by solving

min_{j,s} [ Σ_{x_i ∈ R_1(j,s)} (y_i − c_1)^2 + Σ_{x_i ∈ R_2(j,s)} (y_i − c_2)^2 ],

where y_i is the output variable, c_1 is the average value of the output variables in R_1, and c_2 is the average value of the output variables in R_2. This partitioning ensures that the sums of squared errors (SSE) of the subsets R_1 and R_2 are minimized separately and that the total SSE over R_1 and R_2 is minimized. Figure 1 shows that the splitting process divides the root node, which contains all of the data, into two subsets. The attributes in each subset are as homogeneous as possible while the difference between the two subsets is as large as possible. It should be noted that all data splitting must follow this rule during the growth of the tree. With each partition, the variance within each subset is reduced, but the model becomes more complicated. The partitioning stops either (1) when the data in each leaf node share the same characteristics or (2) when the depth of the tree reaches its maximum value. After the tree is fully grown, the model tends to be overly complex: it may fit the given training data very well but behave poorly in predicting the outcomes for untrained data, a phenomenon known as overfitting.
This is because some branches of the tree are so specific that they contribute little to the tree's ability to generalize. Hence, the pruning process, which consists of prepruning and postpruning, is designed to remove these redundant branches. After the tree is pruned, the simplified CART model predicts the untrained data better. For each decision tree, the following essential parameters should be considered: max_depth, min_samples_split, and min_samples_leaf. The max_depth parameter controls the size of the tree, and min_samples_split and min_samples_leaf constrain the numbers of samples involved in each split and remaining in each leaf.
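The splitting criterion above amounts to an exhaustive search over attributes and candidate thresholds at each node. A minimal sketch of one such split search on toy data (not the paper's dataset) could look like:

```python
import numpy as np

def best_split(X, y):
    """Find the (feature j, threshold s) pair minimising SSE(R1) + SSE(R2),
    i.e. the CART recursive-binary-splitting criterion for one node."""
    best = (None, None, np.inf)  # (j, s, total SSE)
    for j in range(X.shape[1]):
        # Candidate thresholds: every observed value except the maximum,
        # so both subsets are always non-empty.
        for s in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, s, sse)
    return best

# Two clearly separated clusters of responses: the split lands between them.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.2, 4.9])
j, s, sse = best_split(X, y)  # splits at x <= 3.0
```

A full CART applies this search recursively to each resulting subset until a stopping rule (such as max_depth) is reached.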

Gradient Boosting Regression Tree.
During the application of the CART algorithm, the high sensitivity to the data is a big challenge. In some cases, small variations in the data might result in the generation of a completely different tree. Therefore, the boosting algorithm was proposed to solve this problem by combining several base learners [39]. The gradient boosting regression tree (GBRT) is a kind of boosting algorithm whose base learner is the classification and regression tree (CART). By combining several CARTs, the ensembled model achieves better predictive performance. The core of GBRT is to identify an additive model that minimizes the loss function. First, a regression tree is generated to provide the maximum reduction of the loss function. Then, one new tree is added to the existing model at each iteration, and the residual is updated accordingly. It should be noted that the iterative process is stagewise, and the existing trees are not modified when the following trees are added. By adding the new trees, the updated model performs better in the regions in which the previous model did not perform well. The final GBRT model consists of several decision trees that have different structures. Consequently, the predictive model becomes more robust and accurate.
In addition to the parameters of the base learners (CART), GBRT uses three extra parameters: the number of base learners (n_estimators), the impact of each additional base learner that is fitted (the learning rate), and the loss function.
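A toy illustration of this stagewise idea, using one-split trees (stumps) as base learners and squared loss, so each new learner fits the current residuals while earlier learners stay frozen. This is a deliberately simplified sketch, not the authors' implementation; real libraries (e.g., scikit-learn's GradientBoostingRegressor) use full depth-limited CARTs.

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 CART: one split minimising SSE; returns a prediction function."""
    best = None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:
            m = X[:, j] <= s
            sse = ((y[m] - y[m].mean()) ** 2).sum() + ((y[~m] - y[~m].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s, y[m].mean(), y[~m].mean())
    _, j, s, lv, rv = best
    return lambda Z, j=j, s=s, lv=lv, rv=rv: np.where(Z[:, j] <= s, lv, rv)

def gbrt_fit(X, y, n_estimators=50, learning_rate=0.3):
    """Stagewise additive model: each stump fits the residuals (the negative
    gradient of squared loss); previously added stumps are never modified."""
    pred = np.full(len(y), y.mean())
    parts = [lambda Z, c=float(y.mean()): np.full(len(Z), c)]
    for _ in range(n_estimators):
        stump = fit_stump(X, y - pred)            # fit current residuals
        pred = pred + learning_rate * stump(X)    # shrink each stump's impact
        parts.append(lambda Z, f=stump, lr=learning_rate: lr * f(Z))
    return lambda Z: sum(f(Z) for f in parts)

# Smooth nonlinear target: a single tree would need many splits,
# but 50 shrunken stumps approximate it well.
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(4 * X[:, 0])
model = gbrt_fit(X, y)
mse = np.mean((model(X) - y) ** 2)  # small training error
```

The learning_rate here plays the role described above: a smaller value means each additional base learner contributes less, usually requiring more estimators but improving robustness.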

Particle Swarm Optimization.
Particle swarm optimization (PSO) is an evolutionary computation technique that originated from studies of the foraging behavior of bird flocks.
The basic idea of the particle swarm optimization algorithm is to find the optimal solution through collaboration and information sharing between the individuals in a group [40]. The advantages of PSO are its rapid convergence and easy implementation, and it has been proved to be efficient in optimizing various problems, such as the optimization of objective functions, optimization in dynamic environments, the training of neural networks, and others [41,42]. Figure 2 shows the procedure of PSO in a flowchart. The PSO algorithm starts with the random generation of particles. Every particle has only two attributes, position x and velocity v. The position represents the direction of movement, and the velocity represents the speed of movement. Each particle searches for the optimal solution separately in the search space and records it as the fitness value f(x_i^n). Then, by comparing f(x_i^n) with the fitness value at p_best,i^{n−1} (particle i's previous best location), the current best location p_best,i^n is determined, and the global best location g_best^n is obtained accordingly. The velocity v_i^{n+1} and position x_i^{n+1} of particle i can then be updated based on the current best location p_best,i^n and the global best position g_best^n:

Figure 1: The tree structure of a CART, in which a root node splits into decision nodes and leaf nodes.

v_i^{n+1} = w v_i^n + c_1 r_1 (p_best,i^n − x_i^n) + c_2 r_2 (g_best^n − x_i^n),   (6)

x_i^{n+1} = x_i^n + v_i^{n+1},   (7)

where w is the inertia weight parameter, c_1 and c_2 are the acceleration coefficients, and r_1 and r_2 are random values between 0 and 1. As denoted in (6), the velocity of particle i depends on three factors: its velocity at the previous iteration, its own best location, and the global best position. When the termination criterion is met (usually a sufficiently good fitness value or a maximum number of iterations), the iteration stops, and the optimum location is obtained.
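The update rules (6) and (7) can be sketched as a compact PSO minimizer. The fitness function, bounds, swarm size, and coefficient values below are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def pso(f, bounds, n_particles=20, n_iter=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f over a box, following
    v <- w*v + c1*r1*(p_best - x) + c2*r2*(g_best - x);  x <- x + v."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))  # random initial swarm
    v = np.zeros_like(x)
    p_best, p_val = x.copy(), np.array([f(p) for p in x])  # personal bests
    g_best = p_best[p_val.argmin()].copy()                 # global best
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)                         # keep particles in the box
        val = np.array([f(p) for p in x])
        improved = val < p_val
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, float(p_val.min())

# Toy fitness: sphere function with its optimum at (1, -2).
best, best_val = pso(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                     bounds=[(-5, 5), (-5, 5)])
```

The swarm converges to the known optimum within the iteration budget, reflecting the rapid convergence noted above.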

Data Description.
The data of 736 three-point bending (3-p-b) concrete tests were collected from the research published in 14 papers [43][44][45][46][47][48][49][50][51][52][53][54][55]. Table 1 summarizes the items that were recorded during the experiments, along with the range of each item. The essential details of the items are as follows: S is the span between the two supports in the 3-p-b tests, W is the width of the beams, T is the thickness, a_0 is the length of the initial notch, w/c is the water/cement ratio, λ represents the distribution of the aggregate size, d_max is the maximum diameter of the aggregate, f_c is the compressive strength of the concrete, and G_f is the calculated fracture energy of the specimens. It should be noted that all of the fracture energies of the 3-p-b beams were calculated following the recommendation of RILEM TC50-FCM (1985) as

G_f = W_0 / (T (W − a_0)),  with W_0 = ∫ P dδ,

where P is the load and δ is the load-point displacement, both obtained from the load–displacement curve of the 3-p-b tests, and T (W − a_0) is the area of the fracture ligament.
To clearly identify the characteristics of the different variables, their distributions were plotted as histograms and compared with normal distributions. As shown in Figure 3, the distributions of the input variables were unordered and scattered, and large differences were found among the distributions of the different variables. Figure 4 shows the characteristics of the concrete fracture energy; its values appeared to be continuous and regular. It was difficult to determine the connections among the input variables and to establish the relationship between the output variable and the input variables. Hence, AI approaches are needed to solve such complex problems.

Data Processing.
Some preparatory work was required before the data could be used in the predictive models. As shown in Table 1, the different variables have different units, and large differences existed between the values of the various variables. Hence, normalization was used to scale all of the data into values ranging from 0 to 1. Then, the database was split into two sets: a training set and a testing set. The training set was used to train the predictive models to obtain the indispensable parameters, and the testing set was used to evaluate the performance of the predictive models. In this study, the ratio between the training set and the testing set was 0.7 : 0.3; therefore, 515 cases were used to train the models, and 221 cases were used to test their performance. Note that all of the data should be shuffled before being split to ensure the representativeness of the datasets.
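The preprocessing steps described here (min-max normalization into [0, 1], shuffling, and the 0.7 : 0.3 split yielding 515/221 cases) can be sketched as follows. The stand-in arrays merely mimic the shape of the 736 × 8 dataset; they are not the paper's data.

```python
import numpy as np

def minmax_scale(X):
    """Column-wise min-max normalisation into [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def shuffle_split(X, y, train_ratio=0.7, seed=0):
    """Shuffle first (for representativeness), then split 70/30."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_ratio * len(y))  # 736 cases -> 515 train, 221 test
    tr, te = idx[:cut], idx[cut:]
    return X[tr], X[te], y[tr], y[te]

# Placeholder arrays with the same shape as the 8-variable, 736-case dataset.
X = np.arange(736 * 8, dtype=float).reshape(736, 8)
y = np.arange(736, dtype=float)
Xs = minmax_scale(X)
X_tr, X_te, y_tr, y_te = shuffle_split(Xs, y)
```

Scaling each column independently removes the unit differences noted above, so no single variable dominates the model fitting simply because of its magnitude.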

K-Fold Cross-Validation.
In Section 3.2, all the data were split into a training set and a testing set. However, as is often the case with experimental data, the sizes of the split sets were not sufficient for the predictive models.
Thus, K-fold cross-validation was introduced to address this deficiency by repeatedly using the data in the training set. Figure 5 shows that the training set was divided evenly into K parts with no intersection. Then, K−1 parts were chosen as the training subsets to train the predictive model, and the remaining part served as the validation subset, which was used to validate the performance of the current model. This process was repeated K times; consequently, each part served as the validation subset once and as a training subset K−1 times. For regression problems, the mean squared error (MSE) is usually used as the performance indicator of the models. The MSE value (MSE_i) is assessed on the validation subset in each fold, and the average MSE value (MSE_avg) over the K folds represents the behavior of the predictive model. In this study, K was set to five, as recommended by An et al. [56].
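A minimal sketch of this K-fold procedure with MSE_avg as the score; the toy mean-predictor model exists only to make the example self-contained and is not one of the paper's models.

```python
import numpy as np

def kfold_mse(fit, predict, X, y, k=5, seed=0):
    """Average validation MSE over k folds; each fold is held out exactly once."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    mses = []
    for i in range(k):
        val = folds[i]                                           # validation subset
        tr = np.concatenate([folds[j] for j in range(k) if j != i])  # k-1 training parts
        model = fit(X[tr], y[tr])
        mses.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return float(np.mean(mses))  # MSE_avg over the k folds

# Toy "model": predict the training-set mean everywhere.
fit = lambda X, y: y.mean()
predict = lambda m, X: np.full(len(X), m)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)
mse_avg = kfold_mse(fit, predict, X, y)  # close to Var(y) for a mean predictor
```

Because every sample is validated exactly once, MSE_avg gives a less noisy estimate of generalization error than a single train/validation split, which matters for datasets of this modest size.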

Hyperparameter Tuning.
Sections 2.1, 2.2, and 2.3 illustrated the theories of the RR, CART, and GBRT algorithms, respectively. In this section, K-fold cross-validation and PSO are combined to tune the hyperparameters of these three algorithms. Because of their significant influence on the structure of the algorithms and the performance of the models, the parameter α was tuned for the RR model; the max_depth, min_samples_split, and min_samples_leaf parameters were tuned for the CART model; and the two additional parameters, n_estimators and the learning rate, were tuned for the GBRT model. Here, the average MSE value (MSE_avg) over the K folds is regarded as the fitness value of the particles, and the least-MSE_avg criterion is applied to search for the optimum parameters.
First, the RR, CART, and GBRT models were trained and validated with the training data from the 3-p-b tests. Figure 6 shows the evolution of MSE_avg over the iterations. The three algorithms clearly behaved differently, both in convergence rate and in the optimum value of MSE_avg.
The RR converged to a stable state in only one iteration, with an MSE_avg value of 0.0445; the CART algorithm converged within two iterations, and its optimum MSE_avg was 0.0155; it took five iterations for the GBRT to stabilize, with the optimum MSE_avg converging from 0.0195 to 0.0116. It was concluded that PSO was efficient in tuning the hyperparameters of these three models. The convergence of RR was the fastest due to its simple structure and small number of parameters; as the number of parameters increases, the convergence time also increases. Although the GBRT was the slowest to converge, its MSE_avg value was the smallest among the three models, which means the GBRT performed best in the training process.
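The tuning loop described in this section can be sketched by using the K-fold MSE_avg as the PSO fitness value. The example below tunes only the ridge penalty λ on synthetic data; the search range (log10 λ in [−4, 2]), swarm settings, and data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(lam, k=5):
    """Particle fitness: average validation MSE over 5 folds."""
    folds = np.array_split(np.arange(len(y)), k)
    mses = []
    for i in range(k):
        val = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = ridge_fit(X[tr], y[tr], lam)
        mses.append(np.mean((X[val] @ theta - y[val]) ** 2))
    return float(np.mean(mses))

# 1-D PSO over log10(lambda) in [-4, 2].
n, iters, w, c1, c2 = 10, 30, 0.7, 1.5, 1.5
x = rng.uniform(-4, 2, n)
v = np.zeros(n)
p, pval = x.copy(), np.array([cv_mse(10.0 ** xi) for xi in x])
g = p[pval.argmin()]
for _ in range(iters):
    r1, r2 = rng.random(n), rng.random(n)
    v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
    x = np.clip(x + v, -4, 2)
    val = np.array([cv_mse(10.0 ** xi) for xi in x])
    better = val < pval
    p[better], pval[better] = x[better], val[better]
    g = p[pval.argmin()]
best_lambda, best_mse = 10.0 ** g, float(pval.min())
```

Searching in log space is a common design choice for penalty parameters, since their useful values span several orders of magnitude; the same loop extends to the multi-parameter CART and GBRT cases by giving each particle one dimension per hyperparameter.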

Testing of Predictive Models
It is well known that an excellent predictive model must both fit the training data well and accurately predict unknown data; that is, a predictive model must have both low training error and low generalization error. The hyperparameters of the RR, CART, and GBRT models were determined in Section 4, and the predictive models were designed accordingly. Before the predictive models can be used, it is essential to test their performance, especially their ability to generalize from the available information. In this study, 221 test cases were used to verify the predictive capabilities of the three models. The MSE and R^2 values were selected to quantify the behaviors of the three models. The R^2 value can be calculated as follows:

R^2 = 1 − [Σ_{n=1}^{N} (y_n − ŷ_n)^2] / [Σ_{n=1}^{N} (y_n − ȳ)^2],

where R^2 is the coefficient of determination of the predictive model, y_n is the experimental result, ŷ_n is the predicted result, ȳ is the average value of the experimental results, and N is the total number of data points. Figure 7 compares the experimental data and the predicted results, and it also provides the MSE and R^2 values. During the testing process, the RR model obtained an MSE value of 0.0447 and an R^2 value of 0.3120, the CART model achieved an MSE value of 0.0164 and an R^2 value of 0.7468, and the GBRT model achieved an MSE value of 0.0111 and an R^2 value of 0.8167. Based on these results, it was concluded that the GBRT-PSO hybrid model was more successful than the RR and CART models in establishing the relationship between the concrete fracture energy and the factors that influence it. Then, the performances of the predictive models in the testing process were compared with their performances in the training process using the MSE index. Table 2 shows the differences between the MSE values in the training process and the testing process for the three models. These observations highlight the importance of the generalization ability of the predictive models.
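Both metrics used here are straightforward to compute from the definitions in the text; a small self-contained sketch with made-up numbers:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between experimental and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only, not the paper's test results.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
mse_val = mse(y_true, y_pred)       # 0.025
r2_val = r2_score(y_true, y_pred)   # 0.98
```

An R^2 of 1 means perfect prediction, while 0 means the model does no better than predicting the mean; this is why the RR model's 0.3120 signals a poor fit despite its seemingly small MSE.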
The GBRT model produced the smallest MSE value in the testing process, and it also produced a small MSE value during the training process. However, the RR and CART models produced higher MSE values, which indicated that those two models were unable to predict the unknown data accurately because of their weaker generalization. By combining several different CART models as base learners, the GBRT models, with their optimum hyperparameters, achieved improved performance on the test dataset and for future predictions.
As discussed above, the fracture energy of 3-p-b concrete beams is influenced by various factors, and the relationships between these factors are difficult to describe with simple ridge regression methods. By adding different CARTs that focus on different regions, the hybrid GBRT-PSO models provided higher accuracy and better generalization ability.
Compared with the empirical calculation of concrete fracture, the AI approaches have several merits. Assumptions are always needed for the simplification and application of an empirical equation, but this requirement is avoided in AI. The size effect phenomenon of concrete fracture energy, which is always troublesome for empirical calculation, is directly incorporated into the AI predictive models. Hence, the AI approaches appear to be more accurate in determining the fracture energy of concrete.

Relative Importance of Influencing Variables
Since the GBRT-PSO model with optimum hyperparameters had the best performance, it was selected to study the influence of the different input variables on the fracture properties of concrete. For such regression problems, the mean squared error represents the impurity of a model. The importance of a variable was evaluated by its contribution to the reduction of the model's impurity, and the result was represented by a relative importance score; a higher score indicates a feature with a stronger influence. For each feature, the importance score was calculated in every single base learner of the GBRT, and the relative importance score was then obtained by averaging the scores over all the CARTs. Figure 8 illustrates the influence of the input variables on the fracture energy of the concrete beams.
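A toy sketch of this impurity-based scoring, reusing the stump-based boosting idea from Section 2.3: each feature accumulates the SSE reduction of the splits it provides, and the totals are normalized into relative importance scores. The data and model here are illustrative, not the paper's.

```python
import numpy as np

def stump_split(X, r):
    """Best single split on residual r; also returns the impurity reduction."""
    parent_sse = ((r - r.mean()) ** 2).sum()
    best = None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:
            m = X[:, j] <= s
            sse = ((r[m] - r[m].mean()) ** 2).sum() + ((r[~m] - r[~m].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, s, r[m].mean(), r[~m].mean())
    sse, j, s, lv, rv = best
    return j, s, lv, rv, parent_sse - sse

def gbrt_importances(X, y, n_estimators=30, lr=0.3):
    """Accumulate each feature's SSE reduction over all boosting stages,
    then normalise the scores to sum to 1."""
    pred = np.full(len(y), y.mean())
    gain = np.zeros(X.shape[1])
    for _ in range(n_estimators):
        j, s, lv, rv, reduction = stump_split(X, y - pred)
        gain[j] += reduction
        pred = pred + lr * np.where(X[:, j] <= s, lv, rv)
    return gain / gain.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Feature 0 dominates the target, feature 2 is pure noise.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=200)
imp = gbrt_importances(X, y)  # feature 0 receives the highest score
```

Because the scores measure how much each feature reduces the model's impurity, a dominant input (here feature 0, analogous to compressive strength in Figure 8) naturally receives the largest share.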
When the importance scores of the different influencing variables were compared, it was apparent that compressive strength, with a score of 0.425, had the strongest influence on the fracture energy of the concrete beams. This high relevance between fracture properties and compressive strength has been proved repeatedly in the literature, and many empirical equations have been established between them [57,58]. Compressive strength is regarded as the universal index of concrete performance in various countries. Hence, when it is not convenient to test the tensile or fracture properties, they can be estimated from the empirical relationship between the fracture properties and the compressive strength.
The importance scores of the aggregate distribution and the maximum aggregate size were 0.175 and 0.150, respectively. These two influencing variables are characteristics of the aggregate that reflect the microstructure of the concrete. For most laboratory 3-p-b tests of concrete, the specimens are too small to be regarded as homogeneous [59]; therefore, the effect of the aggregate on the fracture energy cannot be ignored. In particular, when laboratory tests are expected to predict the failure of large structures, the aggregate characteristics should be quantified and considered. The span, width, thickness, and length of the initial notch can be unified as the geometric parameters of a specimen. Normally, the length of the ligament (the width minus the initial notch length) is regarded as the size of a concrete specimen, and it has proven to be relevant to the fracture energy of 3-p-b specimens [60,61]. However, the results in Figure 8 appear to differ, which may have been caused by the limited ranges of the width and the length of the initial notch; this should be considered carefully in future work. The water/cement ratio should also be considered carefully during the prediction of concrete fracture energy because it affects the degree of hydration of the concrete, which determines the fracture process of concrete beams.

Conclusions
Research related to concrete fracture properties is very important for addressing durability issues in the application of concrete. In this study, ridge regression, the CART algorithm, and the ensemble GBRT algorithm were adopted to develop predictive models, and a metaheuristic method (PSO) was used to tune their hyperparameters. The fracture energy of concrete was set as the output variable, and eight influencing parameters were set as the input variables. After the optimum predictive model was obtained, the relative importance of the various influencing variables was analyzed.
The main conclusions are summarized as follows: (1) The PSO algorithm proved to be efficient in seeking the optimum hyperparameters of the three machine learning algorithms in this study, and all three predictive models converged to a stable state within a small number of iterations. (2) The relationship between fracture energy and its influencing factors is complex, and it cannot be predicted accurately by the simple ridge regression or single CART prediction models. (3) Using the PSO algorithm and the ensemble method, the hybrid GBRT predictive models gained an improved generalization ability and had the best performance in predicting the fracture energy of concrete. (4) The compressive strength of concrete was found to have a significant influence on the predictive models and should be considered carefully during the prediction of the fracture energy of concrete. Although the fracture properties of concrete were predicted accurately by the artificial intelligence approaches, some limitations remain in comparison with other previous works [62,63]. First, the dataset used in this study is limited, and some cases were even removed; a large, multinational dataset should be used to enhance the accuracy of the predictive models. Second, although the optimum models achieve high accuracy in predicting the concrete fracture energy, obvious prediction errors still exist near the boundaries of the variable ranges, and the effect of the variables' distributions and boundaries on the performance of the predictive models should be considered carefully. Moreover, other optimization algorithms, such as the firefly algorithm, the ant colony optimization algorithm, and the iterated greedy algorithm, should be tried in future work.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding this publication.