On the Training Algorithms for Artificial Neural Network in Predicting the Shear Strength of Deep Beams

This study aims to predict the shear strength of reinforced concrete (RC) deep beams using artificial neural networks (ANN) trained with four algorithms, namely, Levenberg–Marquardt (ANN-LM), the quasi-Newton method (ANN-QN), conjugate gradient (ANN-CG), and gradient descent (ANN-GD). A database containing 106 results of RC deep beam shear strength tests is collected and used to investigate the performance of the four proposed algorithms. The ANN training phase uses 70% of the data, randomly taken from the collected dataset, whereas the remaining 30% are used for the evaluation process. The ANN structure consists of an input layer with 9 neurons corresponding to 9 input parameters, a hidden layer of 10 neurons, and an output layer with 1 neuron representing the shear strength of RC deep beams. The performance of the models is evaluated using statistical criteria, including the correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The results show that the ANN-CG model has the best prediction performance with R = 0.992, RMSE = 14.02, MAE = 14.24, and MAPE = 6.84. These results show that the ANN-CG model can accurately predict the shear strength of RC deep beams, representing a promising and useful alternative design solution for structural engineers.


Introduction
Deep beams are load-bearing structural elements in the form of simple beams, in which a considerable portion of the load is transferred to the supports through compression strut action. Deep beams are characterized by a larger depth compared to conventional beams and are classified by the ratio of shear span to beam depth (a/h) or by the ratio of calculated span length to beam height (l/h). Several design codes specify conditions for defining deep beams. For instance, according to IS Code 456-2000, a deep beam is defined by a ratio of effective span to overall depth (l/h) not exceeding 2.0 for a simple beam and 2.5 for a continuous beam [1]. Besides, ACI 318-14 [2] classifies a beam as a deep beam if it satisfies one of the following: (a) the clear span does not exceed four times the overall member depth, or (b) the shear span does not exceed twice the overall member depth. The shear behavior becomes increasingly significant as the beam's height increases [11]. In deep beams, the shear capacity can be 2 to 3 times greater than that determined by the calculation methods used for conventional beams. Therefore, the shear stress in a deep beam cannot be ignored, unlike in a conventional bending beam. The stress distribution is not linear even in the elastic phase. At the ultimate stress state, the stress field no longer has the parabolic shape of conventional beams, which is also a significant cause of slippage problems in deep beams [5].
In the past several decades, many methods have been proposed to analyze the shear strength of deep beams, including the strut-and-tie method (STM) [12,13] and the upper bound theorem of plasticity theory [14,15]. Based on the STM, theoretical methods to calculate the shear strength have been proposed, such as compression field theory (CFT) and modified compression field theory (MCFT) [16,17], the softened strut-and-tie model (SSTM) considering the compression softening of concrete [18,19], and the strut-and-tie model based on crack band theory [20,21]. Besides, the current design codes, such as ACI 318-14 [2], EN 1992-1-1:2004 [3], and CSA A23.3-04 [22], recommend the STM approach as a deep beam design tool. In addition, some in-depth studies have been carried out to analyze the shear behavior of deep beams and to determine the most critical parameters affecting the shear strength. According to studies [4,23-25], several important parameters have been identified, including the compressive strength of concrete, the yield strength of longitudinal and transverse reinforcement, the ratio of effective depth to breadth, and the main reinforcement ratio. In fact, the relationship between these parameters and the shear capacity of deep beams is nonlinear [9,12,23]. Consequently, building a model that can accurately estimate the shear strength based on mathematical equations is challenging [26]. Meanwhile, the deep beam shear strength obtained by experimental tests or numerical analysis is limited because of the complexity of the material and the beam structure [12,23]. To overcome these difficulties and to improve the ability to estimate the shear strength of deep beams, artificial intelligence (AI) approaches have been used in several investigations [27,28].
Indeed, the construction field has effectively applied AI models to solve many problems, such as geotechnics [29,30], building materials [31,32], and structural analysis and design [33-35]. The application of AI models to problems related to the shear strength of deep beams has been studied by many scientists. Goh's first study in 1995 [36] applied the artificial neural network (ANN) model to predict beam shear resistance with 6 input parameters. Later, Sanad and Saka [37] also checked the effectiveness of the ANN model in predicting the shear strength of deep beams using 10 input parameters related to the geometry and material properties. The results showed that the ANN provides an effective alternative solution in predicting the shear strength of reinforced concrete (RC) deep beams. The ANN is clearly a widely used machine learning (ML) prediction tool, but the selection of an appropriate ANN training algorithm is still an open question. In fact, it is challenging to find the best ANN model that accurately predicts the target while optimizing many factors, such as processing speed, numerical precision, and memory requirements. Such an optimization problem lies in the learning process of a neural network and can be addressed by using an appropriate training algorithm. Four principal training algorithms are commonly used, namely, Levenberg–Marquardt (ANN-LM), the quasi-Newton method (ANN-QN), conjugate gradient (ANN-CG), and gradient descent (ANN-GD). A given training algorithm might be suitable for a given problem but might fail in another case [38]. Gradient descent is the slowest training algorithm but requires less memory than the other three algorithms. The fastest algorithm is Levenberg–Marquardt, which requires the most memory. Therefore, an in-depth investigation is crucial to determine the best training algorithm in general and in predicting the shear strength of deep beams in particular.
Besides, the basis for selecting the best ANN black-box raises a number of fundamental questions, especially the criterion used to define the best one. In the field of ML, the performance of the models is assessed by different metrics [39-41], namely, the correlation coefficient (R) or the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). A concise evaluation and comparison of the different criteria need to be conducted to confirm the ML models' effectiveness. Therefore, in this study, the procedure to determine the best ANN algorithm is conducted through different ANN training algorithms and evaluation metrics, with the aim of accurately and reliably predicting the shear strength of deep beams. To achieve this goal, in the first step, the deep beam database is constructed by gathering different experimental results published in the literature. The general theory of ANN models is then presented, including the four previously mentioned training algorithms. An architecture for the ANN models is proposed, along with an extensive investigation of the ANN epoch numbers. The best ANN algorithm is deduced by comparing different performance metrics and the corresponding probability density functions, taking into account the random sampling effect while constructing the two datasets. Finally, representative results in predicting the shear strength of deep beams are presented and compared with several existing prediction results in the available literature.

Significance of the Research Study
Accurate prediction of the deep beam shear strength is crucial in construction design. Although several machine learning models have been proposed to predict the shear strength of deep beams in the available literature, namely, genetic-simulated annealing [4], backpropagation neural network [42], artificial neural network [43], gene expression programming [43], support vector machine [42], multivariate adaptive regression splines [42], the smart artificial firefly colony algorithm combined with least squares support vector regression [24], and the adaptive neural fuzzy inference system [44], the prediction accuracy and reliability could be further improved. Therefore, the contributions of the present investigation can be summarized as follows: (1) Four representative training algorithms for the ANN model are investigated to predict the shear strength of deep beams, in which the training epoch of each model is fine-tuned. (2) The reliability of the ANN models is carefully evaluated by Monte Carlo simulations with a random sampling strategy to construct the database. (3) The model using the conjugate gradient algorithm (ANN-CG) with 10 neurons in the hidden layer is deduced as the best predictor. (4) The performance of the best ANN-CG architecture is compared with 10 previously published works in the literature and achieves the highest value of the correlation coefficient (R) and the lowest value of mean absolute error (MAE). Thus, the simplicity and effectiveness of the proposed approach using ANN-CG are confirmed.

Database Construction
In this study, the database used to develop the ML models is collected from published research. The dataset includes 106 test results of the shear strength of deep beams. Specifically, 19 test results of high-strength RC deep beams are collected from the study by Tan et al. [45], 52 test results from the work of Smith et al. [46], and 35 test results from the work of Kong et al. [47]. This database includes various parameters affecting the shear strength of RC deep beams (denoted as V), including the ratio of effective span to effective depth (L/d), ratio of effective depth to breadth (d/b_w), ratio of shear span to effective depth (a/d), concrete cylinder strength (f'_c), yield strength of horizontal reinforcement (f_yh), yield strength of vertical web reinforcement (f_yv), ratio of horizontal web reinforcement (ρ_h), ratio of longitudinal reinforcement to concrete area (ρ_s), and ratio of vertical web reinforcement (ρ_v). Representative information on these parameters is detailed in Table 1. Besides, the histograms of each input and output parameter are shown in Figure 1. The beam test diagram and a schematic illustration of RC deep beams are given in Figure 2. Prior to the training process of the ANN models, all input and output values are normalized to the range [0, 1] and later converted back to their initial ranges for the sake of clarity and postprocessing. The database is randomly divided into two parts, representing the generation of the random sampling effect. The first part (containing 70% of the total data, 74 samples) is used to train the ANN network and is called the training part. The second part (the remaining 30% of the data, 32 samples) is used to verify the ANN models and is referred to as the testing part. The random sampling effect generates variability in the input space of the training part, which considerably affects the accuracy of the ML models.
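As an illustration of this preprocessing step, the min-max normalization to [0, 1] and the random 70/30 split can be sketched as follows. This is a minimal sketch with stand-in random data; the function names and the fixed seed are illustrative assumptions, not part of the original study.

```python
import numpy as np

def normalize_minmax(X):
    """Scale each column of X to [0, 1]; return the scaled data and the
    (min, range) pair needed to map predictions back to the initial range."""
    x_min = X.min(axis=0)
    x_rng = X.max(axis=0) - x_min
    return (X - x_min) / x_rng, (x_min, x_rng)

def train_test_split_random(X, y, train_frac=0.7, seed=0):
    """Randomly split rows into a training part and a testing part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]

# Random stand-in for the 106-sample, 9-feature database (values are not real data)
X = np.random.rand(106, 9) * 100.0
y = np.random.rand(106) * 400.0
X_scaled, (x_min, x_rng) = normalize_minmax(X)
X_tr, y_tr, X_te, y_te = train_test_split_random(X_scaled, y)
print(X_tr.shape, X_te.shape)  # (74, 9) (32, 9)
```

Repeating the split with different seeds reproduces the random sampling effect discussed above.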
Besides, the purpose of separating the training and testing parts in machine learning problems is to fully assess the accuracy of the models, as the testing data are entirely unknown to the model during the training phase. In general, the prediction capacity of the model is the most important factor. Therefore, the results in the next sections focus only on the evaluation criteria of the testing part.

General Presentation of ANN Models
An artificial neural network (ANN) is a computational model inspired by the human brain and its many biological neurons. It consists of many artificial neurons interconnected in a network that maps input data to output data. To turn the input data into a complete result, or output, a set of learning rules is used, called backpropagation or backward propagation of error. The structure of a backpropagation network is a combination of different layers, including the input layer, the output layer, and the hidden layers. The input layer is the first layer, the output layer is the last one, and between the two lie one or more hidden layers (Figure 3).
During the training phase, the ANN learns to recognize patterns from the input data. Then, it compares the produced result with the desired output. The difference between the two is adjusted through a backward working process until it falls below a predefined criterion. Therefore, to train a neural network, the selection of an appropriate training algorithm is very important. Training algorithms are the underlying engines for building neural network models, with the goal of learning features or patterns from the input data so that a set of internal model parameters can be found that optimizes the model's accuracy. There are many types of training algorithms, but the frequently used ones are gradient descent, conjugate gradient, the quasi-Newton method, and Levenberg–Marquardt.

Gradient Descent Algorithm (ANN-GD).
Gradient descent is an iterative optimization algorithm used in ML and deep learning problems with the goal of finding a set of internal variables that optimize the model. Here, the "gradient" is the rate of inclination or declination of a slope, and "descent" means descending. Gradient descent is performed in 3 steps, namely, (1) internal variable initialization, (2) evaluating the model based on the internal variables and the loss function, and (3) updating the internal variables in the direction of the optimal points. The gradient descent method performs the iteration step

w(i+1) = w(i) − η∇f(w(i)),

where w(i) is the set of variables to be updated, ∇f(w(i)) is the gradient of the loss function f with respect to w(i), η is the training rate, and i = 0, 1, . . .. η can be a fixed value or determined by one-dimensional optimization along the training direction at each step. The nature of the optimization process is finding suitable points that minimize the loss function; the goal of the gradient descent method is to find such global minimum points. The stopping criterion of the gradient descent method can be (i) the maximum number of epochs is reached, (ii) the value of the loss function is small enough and the accuracy of the model is large enough, or (iii) the value of the loss function remains stable after a finite number of epochs. The gradient descent algorithm is often used with big neural networks. The advantage of this method lies in storing only the gradient vector instead of the Hessian matrix. The diagram for the training process with gradient descent is shown in Figure 4.
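The iteration step above can be sketched as follows, using a simple quadratic loss and a fixed training rate; the function names and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def gradient_descent(grad_f, w0, eta=0.1, max_epochs=1000, tol=1e-8):
    """Minimal gradient descent: w(i+1) = w(i) - eta * grad_f(w(i)).
    Stops when the maximum epoch count is reached or the step becomes tiny."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_epochs):
        step = eta * grad_f(w)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w

# Quadratic loss f(w) = ||w - [1, 2]||^2 has its minimum at [1, 2]
grad = lambda w: 2.0 * (w - np.array([1.0, 2.0]))
w_opt = gradient_descent(grad, np.zeros(2))
print(np.round(w_opt, 4))  # [1. 2.]
```

With eta fixed, the error contracts geometrically here; a line search would instead pick eta by one-dimensional minimization along the step, as noted above.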

Conjugate Gradient Algorithm (ANN-CG).
The conjugate gradient algorithm can be considered as one of the algorithms that improve the convergence rate of the artificial neural network, being intermediate between gradient descent and Newton's method. The advantage of this approach lies in the fact that there is no need to evaluate, store, or invert the Hessian matrix. In this algorithm, the search is performed along conjugate directions, which generally produce faster convergence than gradient descent directions. These training directions are conjugate with respect to the Hessian matrix. The sequence of training directions is built using the formula

y(i+1) = g(i+1) + c(i)·y(i),

with the initial training direction vector

y(0) = −g(0),

where y is the training direction vector, g is the gradient of the loss function, c is the conjugate parameter, and i = 0, 1, . . .. In all cases, the training direction is periodically reset to the negative of the gradient [48]. The parameters' improvement process with the conjugate gradient algorithm is defined by

w(i+1) = w(i) + η(i)·y(i),

where i = 0, 1, . . . and η is the training rate, usually found by line minimization. The diagram for the training process with the conjugate gradient is shown in Figure 4.
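A minimal sketch of this scheme follows, assuming the Fletcher–Reeves form of the conjugate parameter and an exact line minimization that is valid for a quadratic loss; the study does not specify which variant its conjugate gradient training uses.

```python
import numpy as np

def conjugate_gradient(grad_f, w0, max_iters=100, tol=1e-10):
    """Nonlinear conjugate gradient (Fletcher-Reeves form): each new search
    direction mixes the new negative gradient with the previous direction,
    so the Hessian matrix is never evaluated, stored, or inverted."""
    w = np.asarray(w0, dtype=float)
    g = grad_f(w)
    d = -g                                 # initial direction: negative gradient
    for _ in range(max_iters):
        if np.linalg.norm(g) < tol:
            break
        Ad = grad_f(w + d) - g             # Hessian-vector product via gradients (quadratic case)
        eta = (g @ g) / (d @ Ad)           # exact line minimization for a quadratic loss
        w = w + eta * d
        g_new = grad_f(w)
        c = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves conjugate parameter
        d = -g_new + c * d
        g = g_new
    return w

# Quadratic bowl: the gradient of f(w) = 0.5 w^T A w - b^T w is A w - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_opt = conjugate_gradient(lambda w: A @ w - b, np.zeros(2))
print(np.round(w_opt, 4))  # [0.2 0.4], the solution of A w = b
```

On a quadratic with exact line minimization, the method terminates in at most n iterations (here n = 2), which illustrates the faster convergence claimed above.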

Quasi-Newton Algorithm (ANN-QN).
The advantage of the quasi-Newton method is that it is computationally inexpensive: it does not need many operations to evaluate the Hessian matrix, which is composed of the second partial derivatives of the loss function, or to calculate its inverse. Instead, an approximation of the inverse Hessian matrix is built at each iteration, computed using only information on the first derivatives of the loss function. The quasi-Newton update is given by

w(i+1) = w(i) − η(i)·G(i)·g(i),

where G(i) is the inverse Hessian approximation, g(i) is the gradient of the loss function, and η is the training rate. The quasi-Newton method is commonly used because it is faster than gradient descent and conjugate gradient. The diagram of the quasi-Newton method is shown in Figure 4.
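The update above can be illustrated with the BFGS variant, the most common quasi-Newton scheme, in which the inverse-Hessian approximation G is refreshed from gradient differences only. The choice of BFGS and the fixed training rate η = 1 are assumptions for this sketch; the study does not name its specific quasi-Newton variant.

```python
import numpy as np

def quasi_newton_bfgs(grad_f, w0, eta=1.0, max_iters=100, tol=1e-10):
    """Quasi-Newton sketch (BFGS): G approximates the inverse Hessian and is
    updated each iteration from first-derivative information only, so the
    true Hessian is never computed or inverted."""
    w = np.asarray(w0, dtype=float)
    n = len(w)
    G = np.eye(n)                      # initial inverse-Hessian approximation
    g = grad_f(w)
    for _ in range(max_iters):
        if np.linalg.norm(g) < tol:
            break
        s = -eta * G @ g               # quasi-Newton step: w(i+1) = w(i) - eta * G g
        w_new = w + s
        g_new = grad_f(w_new)
        yv = g_new - g                 # gradient difference used to refresh G
        rho = 1.0 / (yv @ s)
        I = np.eye(n)                  # BFGS update of the inverse approximation
        G = (I - rho * np.outer(s, yv)) @ G @ (I - rho * np.outer(yv, s)) + rho * np.outer(s, s)
        w, g = w_new, g_new
    return w

# Same quadratic test problem: gradient A w - b, minimum at A^{-1} b = [0.2, 0.4]
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_opt = quasi_newton_bfgs(lambda w: A @ w - b, np.zeros(2))
print(np.round(w_opt, 4))
```

As the iterations proceed, G approaches the true inverse Hessian and the steps approach Newton steps, which is why the method is typically faster than gradient descent.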

Levenberg-Marquardt Algorithm (ANN-LM).
The Levenberg–Marquardt (LM) algorithm, also called the damped least squares method, is used to solve nonlinear least squares problems. Instead of computing the exact Hessian matrix, this algorithm works with the gradient vector and the Jacobian matrix. The loss function is expressed as a sum of squared errors:

f = Σ(i=1..a) u(i)²,

where a is the number of instances in the dataset and u is the vector of all error terms. The Jacobian matrix of the loss function is defined as

A(i,j) = ∂u(i)/∂w(j),

for i = 1, . . ., a and j = 1, . . ., b, where a is the number of instances in the dataset, b is the number of parameters in the neural network, and A is the Jacobian matrix, of size [a, b]. The gradient vector of the loss function is calculated as

∇f = 2·Aᵀ·u.

The Hessian matrix is approximately computed by

B ≈ 2·Aᵀ·A + β·I,

where B is the Hessian matrix, β is a damping factor that ensures the positive definiteness of the Hessian approximation, and I is the identity matrix. A large value of β is chosen in the first step. Next, if an iteration fails to reduce the error, β is increased by some factor. On the contrary, if the loss decreases, β is decreased so that the Levenberg–Marquardt algorithm approaches the Newton method. Finally, the parameters' improvement process using the Levenberg–Marquardt algorithm is defined as

w(i+1) = w(i) − (Aᵀ·A + β·I)⁻¹·Aᵀ·u,

for i = 0, 1, . . .. The diagram of the ANN-LM training algorithm is shown in Figure 4. The MAPE is defined as the difference between the actual and predicted values divided by the actual value. Specifically, the lower the RMSE, MAE, and MAPE values, the higher the accuracy and the better the performance of the models. On the contrary, higher R values mean higher model performance.
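The Levenberg–Marquardt damping scheme described above can be sketched on a small synthetic curve-fitting problem. The residual model, the factor of 10 used to adjust β, and the starting point are illustrative assumptions for this sketch, not values from the study.

```python
import numpy as np

def levenberg_marquardt(residuals, jacobian, w0, beta=1.0, max_iters=100, tol=1e-18):
    """Levenberg-Marquardt sketch: the Hessian is approximated from the
    Jacobian A of the residual vector u as A^T A + beta * I; beta is raised
    when a step fails and lowered when it succeeds (approaching Newton)."""
    w = np.asarray(w0, dtype=float)
    u = residuals(w)
    loss = u @ u                        # sum of squared errors
    for _ in range(max_iters):
        Aj = jacobian(w)
        # damped normal equations: (A^T A + beta I) dw = -A^T u
        dw = np.linalg.solve(Aj.T @ Aj + beta * np.eye(len(w)), -Aj.T @ u)
        u_new = residuals(w + dw)
        loss_new = u_new @ u_new
        if loss_new < loss:             # accept the step, reduce the damping
            w, u, loss = w + dw, u_new, loss_new
            beta /= 10.0
        else:                           # reject the step, increase the damping
            beta *= 10.0
        if loss < tol:
            break
    return w

# Synthetic curve fit: data from y = 2.0 * exp(0.5 * x); LM should recover (2.0, 0.5)
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.5 * x)
res = lambda w: w[0] * np.exp(w[1] * x) - y
jac = lambda w: np.column_stack([np.exp(w[1] * x), w[0] * x * np.exp(w[1] * x)])
w_fit = levenberg_marquardt(res, jac, np.array([1.0, 0.0]))
print(np.round(w_fit, 3))  # recovers approximately [2.0, 0.5]
```

Note how a large β makes the step a small gradient-descent-like move, while a small β reproduces the Gauss-Newton step, matching the behavior described above.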

Validation of the Models
The R value varies in the range from −1 to 1. R values close to 0 show poor performance of the model, and values close to 1 mean good accuracy. The values of RMSE, MAE, MAPE, and R are defined by the following formulas:

RMSE = sqrt((1/N)·Σ(Q_AV − Q_PV)²),
MAE = (1/N)·Σ|Q_AV − Q_PV|,
MAPE = (100/N)·Σ|(Q_AV − Q_PV)/Q_AV|,
R = Σ(Q_AV − Q̄_AV)(Q_PV − Q̄_PV) / sqrt(Σ(Q_AV − Q̄_AV)²·Σ(Q_PV − Q̄_PV)²),

where N is the number of samples, Q_AV and Q̄_AV are the actual and average actual values, and Q_PV and Q̄_PV are the predicted and average predicted values.
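These four criteria can be computed directly, for example as follows; the shear-strength values below are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def evaluation_metrics(q_actual, q_pred):
    """Compute the four statistical criteria used to compare the models."""
    q_actual, q_pred = np.asarray(q_actual, float), np.asarray(q_pred, float)
    rmse = np.sqrt(np.mean((q_actual - q_pred) ** 2))
    mae = np.mean(np.abs(q_actual - q_pred))
    mape = 100.0 * np.mean(np.abs((q_actual - q_pred) / q_actual))
    # Pearson correlation coefficient between actual and predicted values
    da, dp = q_actual - q_actual.mean(), q_pred - q_pred.mean()
    r = np.sum(da * dp) / np.sqrt(np.sum(da ** 2) * np.sum(dp ** 2))
    return r, rmse, mae, mape

# Hypothetical shear-strength values (kN), for illustration only
actual = np.array([100.0, 200.0, 300.0, 400.0])
pred = np.array([110.0, 190.0, 310.0, 390.0])
r, rmse, mae, mape = evaluation_metrics(actual, pred)
print(round(r, 4), round(rmse, 2), round(mae, 2), round(mape, 2))  # 0.9965 10.0 10.0 5.21
```

Note that MAPE weights each error by the actual value, so the same 10 kN error contributes more at 100 kN than at 400 kN.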

Methodology Flowchart
In this study, the flowchart of the proposed methodology includes the following steps: (a) Data collection: this is the first step, in which the dataset is built by gathering data from the available literature. All data are randomly divided into 2 parts, training data and testing data, with the training part accounting for 70% of the dataset and the testing part for 30%. (b) Building models: in this step, the data of the training part are used to train the models based on the training algorithms, namely, gradient descent, conjugate gradient, quasi-Newton, and Levenberg–Marquardt. (c) Model validation: in this final step, the data of the testing part are used to validate the proposed models. Statistical indicators including RMSE, MAE, MAPE, and R are utilized to evaluate the models.
A schematic diagram of the methodology is illustrated in Figure 5.

Results and Discussion
The definition of the ANN structure is critical in solving problems [49,50]. When the numbers of inputs and outputs are fixed, the performance of the ANN model depends on the number of hidden layers and the number of neurons in each hidden layer. Cybenko [51] and Bound [52] succeeded in using a single-hidden-layer model in classifying the input variables for model processing. Besides, some studies [53-55] have shown that an ANN model with only one hidden layer can be enough to successfully explore a complex nonlinear relationship between inputs and output. Therefore, one hidden layer is proposed for the structure of the ANN model in this investigation. Moreover, semiempirical relationships proposed by Nagendra [56], Tamura [57], and other investigations [58-60] recommend that the number of neurons in the hidden layer be equal to the total number of inputs and outputs. In the current database, the numbers of inputs and outputs representing the deep beams' shear strength are 9 and 1, respectively. Therefore, a hidden layer of 10 neurons is proposed for the ANN. The sigmoid activation function is selected for the hidden layer, while the activation function for the output layer is a linear function. The mean square error is chosen as the cost function. Because of the random sampling effect, 300 simulations are performed to obtain reliable results. The main purpose of this work is to investigate the performance of four ANN models in predicting the shear strength of deep beams, trained by the four algorithms, namely, Levenberg–Marquardt (ANN-LM), the quasi-Newton method (ANN-QN), conjugate gradient (ANN-CG), and gradient descent (ANN-GD). The training process is repeated until the network output error reaches an acceptable value (less than the initially specified error threshold). In this study, the network training is performed with various epoch numbers, ranging from 100 to 1000 with a step of 100.
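The proposed 9-10-1 structure, with a sigmoid hidden layer and a linear output, can be sketched as a forward pass. The random weights below only illustrate the shapes involved; in the study, the trained values would come from one of the four training algorithms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(x, W1, b1, W2, b2):
    """Forward pass of the 9-10-1 architecture: sigmoid activation in the
    hidden layer, linear activation at the output layer."""
    h = sigmoid(W1 @ x + b1)           # hidden layer: 10 neurons
    return W2 @ h + b2                 # output layer: 1 neuron, linear

rng = np.random.default_rng(42)
W1 = rng.standard_normal((10, 9))      # input (9 parameters) -> hidden (10 neurons)
b1 = rng.standard_normal(10)
W2 = rng.standard_normal((1, 10))      # hidden (10 neurons) -> output (1 neuron)
b2 = rng.standard_normal(1)

x = rng.random(9)                      # one normalized input sample in [0, 1]
v_pred = ann_forward(x, W1, b1, W2, b2)
print(v_pred.shape)  # (1,)
```

Training then adjusts W1, b1, W2, and b2 to minimize the mean square error cost over the training part.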
Finally, Table 2 summarizes the characteristics of the ANN models proposed in this study. The ANN-LM model reaches its best accuracy at a small number of epochs with high speed, and the same conclusion could be drawn for the case of the ANN-CG model. However, the opposite is found for the ANN-QN and ANN-GD models, in which the mean and std values of R increase and those of RMSE, MAE, and MAPE decrease with a higher number of epochs. Thus, the accuracy of the ANN-QN and ANN-GD models increases with a higher number of epochs.

Comparison of ANN Models' Prediction
Moreover, Table 3 details the mean and std values of R, RMSE, MAE, and MAPE of the four models with different epoch numbers, varying from 100 to 1000 with a step of 100. It is found that the accuracy of the ANN-LM model is very low, where the maximum value of R is only 0.747 at 100 epochs. Similarly, for the ANN-CG model, the highest R value, 0.971, is reached at 100 epochs. Therefore, the optimal ANN-LM and ANN-CG models are obtained at 100 epochs. In contrast, with the ANN-QN and ANN-GD algorithms, the highest values of R are R = 0.961 and R = 0.969, respectively. Besides, the std values of the three criteria RMSE, MAE, and MAPE of the ANN-LM model are the highest compared to the other models. This shows that the ANN-LM model has the lowest accuracy among the 4 models.
Next, the values of the criteria RMSE, MAE, and MAPE of the three remaining models are compared. For the ANN-CG model, these criteria reach their lowest values at 100 epochs, compared with the lowest values of the ANN-QN model at 900 epochs and of the ANN-GD model at 1000 epochs. Through this evaluation and analysis, it is found that the ANN-CG model achieves the best accuracy with the smallest number of epochs. Considering the case of large numbers of epochs, the ANN-GD model is superior to the ANN-QN model. Therefore, a reliability evaluation of these three models is performed in the following sections. The ANN-LM model, having the lowest accuracy for shear strength prediction, is excluded from the next investigation.

Reliability Evaluation of the Best ANN Training Algorithms
The main purpose of this section is to evaluate the reliability of the three models, including the optimal ANN-CG at 100 epochs, the optimal ANN-QN at 900 epochs, and the optimal ANN-GD at 1000 epochs. Figure 8 shows the distribution of the probability density function (PDF) of the four statistical criteria for the training part, namely, R (Figure 8(a)), RMSE (Figure 8(b)), MAE (Figure 8(c)), and MAPE (Figure 8(d)). The results show that ANN-CG is the most reliable training algorithm for predicting the shear strength of the deep beam. Therefore, the ANN-CG model is chosen to predict the shear strength of the deep beam in the next section, with the corresponding results summarized in Table 5.

Prediction of Beam Shear Strength
In the first case, the maximum value of R is 0.993 for both the training and testing parts. For the second case, the minimum RMSE value is 14.73 for the training part and 14.02 for the testing part. The minimum value of MAE is considered in Case 3, where MAE = 9.88 for the training part and MAE = 10.06 for the testing part. The last case finds minimum MAPE values of 6 and 5.79 for the training and testing parts, respectively.
In analyzing the results presented in Table 5, the prediction performance is evaluated through the 4 criteria for the testing part. The maximal value of R differs only slightly among cases 1, 3, and 4. The difference between the minimum MAPE of Case 4 and the MAPE values of cases 1, 2, and 3 is relatively small, especially when compared with the differences in RMSE and MAE shown in Table 5.
Finally, the results of this investigation are compared with previously published results obtained with other predictive methods, summarized in Table 6:

Model | R | RMSE | MAE | MAPE
Genetic-simulated annealing (GSA) [4] | 0.929 | - | - | -
Backpropagation neural network (BPNN) [42] | 0.916 | 34.032 | - | 11.273
Radial basis function neural network (RBFNN) [42] | 0.9767 | 20.29 | - | 7.63
Artificial neural network (ANN) [43] | 0.9711 | 42.27 | 30.28 | -
Gene expression programming (GEP) [43] | 0.9654 | 51.57 | 40.99 | -
Support vector machine (SVM) [42] | 0.9465 | 30.134 | - | 14.435
Multivariate adaptive regression splines (EMARS) [42] | 0.986 | 13.011 | - | 5.887
Genetic-simulated annealing (GSA) [4] | 0.929 | - | - | 12.3
Smart artificial firefly colony algorithm and least squares support vector regression (SFA LS-SVR) [24] | … | … | … | …

Using the artificial neural network-conjugate gradient (ANN-CG) model of this study, the shear strength prediction performance for the deep beam appears to be the best, with the highest value of R, the lowest value of MAE, and nearly the lowest values of RMSE and MAPE. More importantly, comparing the four algorithms proposed in this study, ANN-CG appears as the best predictor with respect to accuracy in estimating the shear strength of the deep beam while requiring less computation time (i.e., best performance at 100 epochs). Furthermore, the computation memory and cost are lower than those of the other algorithms. This implies that predicting the deep beam shear strength with the ANN-CG algorithm would not require a high-performance computer. Usually, hybrid ML algorithms take a longer computation time than standalone ones; however, given the prediction accuracy achieved in this study, the development of a hybrid approach would not be necessary. Overall, this confirms the effectiveness of the ANN-CG approach proposed in this study, suggesting a promising and useful alternative design solution for structural engineers. For practical applications, the final weight and bias values of the best ANN-CG model are given in Table 7 and could be used to develop a supporting numerical tool for estimating the shear strength of deep beams.

Conclusion
In this study, the artificial neural network (ANN) model is proposed to predict the shear strength of deep beams. For this purpose, a database of 106 results from shear tests of RC deep beams is built from the available literature. The ANN model is built with 9 input parameters divided into two groups, namely, the geometric parameters and the parameters representing the material properties. Four training algorithms of the ANN are explored, namely, Levenberg–Marquardt (ANN-LM), the quasi-Newton method (ANN-QN), conjugate gradient (ANN-CG), and gradient descent (ANN-GD). The prediction performance of the different ANN training algorithms is compared. Four statistical criteria, namely, the correlation coefficient (R), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), are introduced to validate and evaluate the performance of the ANN models. The conjugate gradient (CG) algorithm is chosen as the best ANN training algorithm for predicting the shear strength of deep beams. With the ANN-CG model chosen as the best, four cases corresponding to four different black-boxes are studied.
The results show that the crucial information for choosing an accurate machine learning model might lie in the criterion of obtaining the smallest value of RMSE. Besides, the analysis of the error between predicted and actual shear strength shows that the ANN model can be a promising numerical tool that considerably reduces time-consuming and costly experimental procedures. Despite the extensive investigation of different potential training algorithms and epochs, this study is conducted on only one ANN architecture. Therefore, regardless of the outstanding prediction accuracy achieved, it would be interesting to perform a further investigation of the number of neurons and hidden layers to possibly enhance the performance of the ANN-CG model, or to further decrease the computation time by reducing the number of neurons in the hidden layer.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.