Prediction of Transverse Reinforcement of RC Columns Using Machine Learning Techniques

Transverse reinforcement of reinforced concrete (RC) columns contributes greatly to the ductile deformation capacity of RC structures. The existing models for predicting the amount of transverse reinforcement required are all empirical models with low accuracy and large dispersion, and they do not consider the real ductility demand of individual components. This paper proposes a ductility design method for RC structures based on the component drift ratio demand obtained from nonlinear structural dynamic analysis. To establish the best transverse reinforcement ratio prediction model for RC columns, 12 machine learning (ML) models are trained on an experimental database consisting of 498 columns. To solve the overfitting problem caused by the “few samples and big errors” nature of the experimental database, feature engineering aiming at dimension reduction is systematically carried out through an iterative process. Through comprehensive performance evaluation on the testing set, an XGBoost model is selected. To interpret the “black box” ML model, the SHAP method and partial dependence plots are used to analyse the correlation between the input parameters and the transverse reinforcement ratio. The interpretation results are consistent with mechanical laws and engineering experience, which proves the reliability of the selected ML model. Compared with two existing empirical models, the proposed XGBoost model shows higher accuracy and smaller deviation. After safety probability analysis, the trained XGBoost model is transformed into C code and integrated into seismic design software for production practice. An open-source data-driven model to predict the transverse reinforcement ratio required for RC columns is provided worldwide, with the flexibility to account for additional experimental results.


Introduction
Ductility refers to the ability of a structural member to sustain large deformation in the inelastic stage without an obvious reduction in bearing capacity [1], which contributes greatly to the collapse resistance of reinforced concrete (RC) structures. The ductility of columns, which plays an important role in the ductile deformation capacity of RC structures, is mainly ensured by a sufficient number and arrangement of transverse steel bars in the potential plastic hinge zone. Generally, transverse reinforcement has three main functions [2], namely, (1) preventing the longitudinal bars from buckling, (2) avoiding shear failure, and (3) confining the concrete core to provide sufficient deformability and ductility. This study mainly focuses on confinement requirements. Design codes such as the Chinese code (GB 50010-2010), American code (ACI 318-11), European code (Euro Code 8), New Zealand code (NZS 3101), and Canadian code (CSA 2004) have made detailed provisions on transverse reinforcement, such as the minimum transverse reinforcement ratio (or transverse reinforcement characteristic value), maximum transverse reinforcement spacing, and length of the potential plastic hinge zone. Some codes further provide empirical formulas for calculating the minimum transverse reinforcement ratio, but few codes consider the real ductility demand of individual components. The amount of code-required transverse reinforcement can be reduced in many cases, while it is insufficient in others.
To quantify the amount of transverse reinforcement required for RC columns, many empirical models [2-10] have been proposed based on the basic principles and mechanics of RC components. For example, based on a numerical study using cyclic analyses performed on a large set of columns, Watson et al. [3] derived refined design equations to determine the quantities of transverse reinforcement required for specified ductility levels, which were adopted by the New Zealand Standard. Considering the limitation of ACI 318M-11 that the transverse reinforcement requirements do not account for the axial load level and confinement demand, which have been proven to significantly affect the confinement effectiveness and the column behaviour, Sheikh et al. [4] proposed a design procedure to determine the amount of lateral steel required considering the column ductility performance, the level of axial load, and the steel configuration. Generally speaking, an empirical method starts with an assumed form of equation and then carries out regression analysis, in which the assumed variables are linearly related and the unknown coefficients are determined from experimental data so that the equation fits the data. However, due to the complicated constitutive material relationships and the coupling of external seismic loads, the real relationship between the input variables and the transverse reinforcement required is highly nonlinear. The chosen equation may not be able to adequately represent complex nonlinear relationships. Besides, empirical formulas are always developed based on a narrow data range, and the diversity of sample results is limited. All these factors lead to poor accuracy and large dispersion of empirical models [2,7,8,10]. For example, the coefficient of variation of the ratio of the calculated amount to the experimental amount of transverse reinforcement is as high as 0.616 [10].
Therefore, a new method should be developed that predicts the amount of transverse reinforcement needed with high accuracy and low dispersion over a wide range of parameters, for example, normal-strength and high-strength concrete and reinforcement.
Recently, the merits of alternative approaches, e.g., nonparametric modelling, in engineering research have been widely recognized [11,12]. Artificial intelligence (AI) techniques have attracted great scientific interest in fields with sufficient experimental data and complicated phenomena. Machine learning (ML) has been successfully used to classify the failure mode of RC columns [13], RC beam-column joints [14], RC shear walls [15], and RC frames with infills [16]; to predict the shear strength of RC deep beams [17], squat RC walls [18], RC beam-column joints [14,19], precast concrete joints [20], steel fiber-reinforced concrete beams [21], and slender RC structures with steel fibers [22]; to predict the drift capacity of RC columns [23,24]; to forecast the backbone curve and hysteresis loop of RC columns [25-27]; to predict the compressive strength of concrete [28] and the compressive and flexural strengths of steel fiber-reinforced concrete [29]; to estimate the flexural capacity of ultrahigh-performance concrete beams [30]; to predict the punching shear capacity of fiber-reinforced concrete slabs [31]; and to predict the lateral strain in transverse reinforcement [32]. Although progress has been made in applying ML techniques to interpret experimental data and predict component-level structural properties, a data-driven method for predicting the transverse reinforcement needed for columns has not yet been studied. It is recognized that ML methods can (1) capture the complex nonlinear relationships between the input and output variables, (2) deal with a large number of input variables without neglecting potentially important variables, and (3) gain insights from big data and take into account the diversity of massive specimens. In view of these advantages, ML methods are adopted in this paper to predict the amount of transverse reinforcement required for RC columns.
The general objectives of this research are as follows: (1) to propose a ductility design method for RC structures based on the real drift ratio demand of individual components; (2) to establish ML models to predict the amount of transverse reinforcement required for columns and choose the best one; (3) to interpret the prediction of the ML model and ensure the credibility of the proposed model; (4) to create an open-source data-driven ML model that can be used in seismic ductility design worldwide, with the flexibility to account for additional experimental results. The paper begins by presenting a new ductility design method for RC structures based on the real component drift ratio demand in Section 2. Then, ML model training, performance evaluation, and model interpretation are presented in Section 3. Comparisons of the proposed ML model with empirical models, safety probability analysis, and ML model deployment are presented in Section 4. The conclusions are given in Section 5.

Methods
To predict the transverse reinforcement of RC columns, firstly, nonlinear structural dynamic analysis under specified earthquake ground motion is carried out to obtain the component deformation in the response history. Secondly, the component drift ratio demand is calculated based on the component deformation. There are three reasons for choosing the drift ratio instead of curvature ductility as the input ductility demand: (1) the drift ratio includes bending deformation and shear deformation and is suitable for flexure critical, flexure-shear critical, and shear critical columns; (2) the drift ratio does not depend on the definition of yield displacement or yield curvature; (3) the drift ratio is routinely recorded for all specimens and can be directly related to the drift limits specified in building codes. Finally, the component drift ratio demands along with other component features (for example, geometric dimensions, longitudinal reinforcement arrangements, and material properties, which will be elaborated in Section 3.2) are input into the trained ML model to predict the transverse reinforcement ratios required for individual columns. The workflow is shown in Figure 1.
The steps for establishing the transverse reinforcement ratio prediction model are as follows: (1) we collect sufficient experimental data covering a wide range of parameters; (2) we carry out feature engineering to select the right features for ML models; (3) we split the data set randomly into a training set (80%) and a testing set (20%); (4) we select appropriate ML methods and train the models on the training set; (5) we adopt grid search and the 10-fold cross-validation method to optimize the hyper-parameters; (6) we train ML models using the optimal hyper-parameters on the training set and evaluate the performance on the testing set through four typical quantitative metrics; (7) we interpret the established model results through the SHAP method and partial dependence plot (PDP) to verify the reliability of the trained ML model; (8) we choose the best ML model as the final model to predict the transverse reinforcement ratio of RC columns.
In this study, 12 ML models are used to establish the best prediction algorithm for the transverse reinforcement ratio as follows: (1) Ordinary Least Squares (OLS), (2) Lasso regression, (3) Ridge regression, (4) K-Nearest Neighbors (KNN), (5) Support Vector Regression (SVR), (6) Multilayer Perceptron (MLP), (7) Decision Trees (DT), (8) Random Forests (RF) [33], (9) AdaBoost [34], (10) XGBoost [35], (11) LightGBM [36], and (12) CatBoost [37]. They can be classified into two categories, namely, single models and ensemble models. Models (1)-(7) are single models, and models (8)-(12) are ensemble models. Ensemble techniques can be divided into two categories, namely, parallel ensemble techniques (bagging methods) and sequential ensemble techniques (boosting methods). By combining the predictions of several single models, ensemble models increase the accuracy of the results significantly.
Advances in Civil Engineering
Linear regression models, including Ordinary Least Squares, Lasso regression, and Ridge regression, are the simplest and most commonly applied regression techniques for the prediction of continuous variables and are used as the baseline models for comparison. The ML models are developed using Scikit-learn [38], a machine learning package for the Python programming language.

Experimental Database.
The experimental database consists of 326 rectangular column tests and 172 circular column tests for a total of 498 tests, involving cyclic and monotonic lateral loading, with or without axial load [39-41]. The test configuration and experimental data of RC columns were reduced to the case of an equivalent cantilever to consistently compare the column behaviour for a wide range of testing configurations [39].
There are many variables in the database, which can be classified into five categories, namely, geometric dimensions, reinforcement arrangements, material properties, applied loads, and extracted displacement data. The ultimate drift ratio $\theta_u = \Delta_u / L \times 100$ was extracted from the force-displacement hysteresis curves for each column [39], where $L$ is the distance between the point of maximum moment and the point of zero moment, also called the shear span, and $\Delta_u$ is the lateral displacement during loading at which the column experienced a 20% reduction in the maximum applied lateral load $V_{Max}$. If a 20% drop in shear capacity occurred because of cycling at constant deformation rather than through increasing deformation demands, then the displacement at $0.8 V_{Max}$ was obtained through interpolation by drawing a line connecting the tips of the hysteresis loops before and after the horizontal line at $0.8 V_{Max}$, as can be seen in Figure 2. If there was no drop in lateral load to $0.8 V_{Max}$, then $\Delta_u$ was taken as the maximum lateral displacement the column test achieved. A lower bound estimate of the lateral displacement at shear failure was obtained with this procedure. Based on the background knowledge of civil engineering, all the features related to the ductility and transverse reinforcement of RC columns are extracted. Since the transverse reinforcement ratio $A_{sh}/(s b_c)$ is the target variable to be predicted, features directly related to the transverse reinforcement ratio, such as the bar diameter and spacing of transverse bars, cannot be used as input features.
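The extraction of $\Delta_u$ and $\theta_u$ described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the database: the input format (a list of loop-tip displacement and load pairs) and the function names are assumptions made for the example.

```python
def displacement_at_80_percent(peaks):
    """Return the lateral displacement at which the load drops to 0.8 * V_max.

    peaks: (displacement, load) pairs at the tips of successive hysteresis
    loops, in loading order (hypothetical input format).
    """
    v_max = max(load for _, load in peaks)
    threshold = 0.8 * v_max
    i_max = max(range(len(peaks)), key=lambda i: peaks[i][1])
    post_peak = peaks[i_max:]
    for (d0, v0), (d1, v1) in zip(post_peak, post_peak[1:]):
        if v1 <= threshold < v0:
            # Straight line between the loop tips before and after 0.8 * V_max.
            return d0 + (v0 - threshold) / (v0 - v1) * (d1 - d0)
    # No 20% drop: use the maximum displacement reached in the test.
    return max(d for d, _ in peaks)


def ultimate_drift_ratio(delta_u, shear_span):
    """Ultimate drift ratio in percent: theta_u = delta_u / L * 100."""
    return delta_u / shear_span * 100.0
```

For loop tips at (10, 80), (20, 100), (30, 90), and (40, 70), the 0.8 $V_{Max}$ line is crossed between the last two tips, so the interpolated displacement lies halfway between 30 and 40.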

Feature Engineering.
Different from databases in other industries, civil engineering test databases, especially concrete component test databases, are characterized by a limited number of samples and large errors, because the tests are expensive and the dispersion of material properties is large. This may lead to overfitting of ML models. In view of the difficulty of increasing the number of samples, feature engineering aiming at dimensionality reduction is particularly important.
Feature engineering is an essential phase that improves performance by selecting the right features for the model, ensuring that the maximum relationship with the target variable is captured. It is worth noting that feature engineering is an iterative process, which takes a lot of effort. While there is no formula for effective feature engineering, four steps are used in this study:

Data Transformation.
To handle data with different units and avoid scale effects, data standardization is used to convert the data into a uniform format (zero mean and unit standard deviation), although tree-based models do not need data standardization. For features with right-skewed distributions, logarithmic transformation is also tried, but the performance improvement is not obvious.
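The two transformations mentioned above can be sketched in a few lines. This is an illustrative standard-library version under the assumption that the population standard deviation is used; the function names are hypothetical and this is not the authors' preprocessing code.

```python
import math
import statistics


def standardize(values):
    """Rescale a feature column to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Natural-log transform for strictly positive, right-skewed features."""
    return [math.log(v) for v in values]
```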

Feature Extraction.
To reduce the number of input features, new features are extracted from the existing attributes by combining multiple variables into a single dimensionless feature, such as the section depth-width ratio $h/b$, gross area to core area ratio $A_g/A_c$, shear span to effective depth ratio $L/d$, longitudinal reinforcement ratio $\rho_l$, and axial load ratio $P/(A_g f'_c)$.

Feature Selection.
Features with high correlations can lead to a collinearity problem, which reduces the accuracy of the ML model by preventing it from learning the interactions between independent features. Therefore, feature selection is conducted to achieve dimensionality reduction based on a correlation matrix composed of the Pearson correlation coefficient of each pair of features in the database. Among strongly correlated features, the feature with the lower correlation with the target variable is eliminated. For example, the clear cover thickness and the gross area to core area ratio $A_g/A_c$ are a pair of highly correlated features. Having a lower correlation with the target variable, the transverse reinforcement ratio $A_{sh}/(s b_c)$, the clear cover thickness is removed.
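The selection rule above can be illustrated with a small sketch. The correlation threshold of 0.8, the function names, and the toy numbers are assumptions chosen for the example; the source does not state the threshold actually used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(features, target, threshold=0.8):
    """Drop the weaker feature of every strongly correlated pair.

    features: dict mapping feature name to its column of values.
    threshold: assumed cut-off for "strong" correlation (hypothetical value).
    """
    names = list(features)
    removed = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in removed or b in removed:
                continue
            if abs(pearson(features[a], features[b])) > threshold:
                # Keep the feature that correlates more with the target.
                weaker = min((a, b),
                             key=lambda n: abs(pearson(features[n], target)))
                removed.add(weaker)
    return [n for n in names if n not in removed]
```

With toy columns in which the cover thickness and $A_g/A_c$ are almost collinear and the target follows $A_g/A_c$ more closely, the cover thickness is the feature that is dropped.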
Selected features are the section shape ($S$ for short, 0 for rectangle and 1 for circle), section depth $h$, longitudinal bar diameter $d_l$, yield strength of longitudinal bars $f_{yl}$, yield strength of transverse bars $f_{yt}$, and concrete compressive strength at 28 days $f'_c$. The descriptions and statistical attributes of the input and output variables, such as the mean, standard deviation, minimum, 25%, 50%, 75%, and maximum value, are given in Table 1, and their statistical distributions are displayed in Figure 3. As can be seen, the database covers a wide range of RC column parameters, including normal-strength and high-strength concrete and reinforcement, which will increase the adaptability of the trained ML model.

Feature Iteration.
Feature iteration, also known as the wrapper method of feature selection, is an iterative process involving four steps as follows: (1) we select a subset of features; (2) we train the ML model with the selected features (the training process is introduced later); (3) we measure the model performance; (4) we make a decision to retain or remove the selected features.
Permutation feature importance [33] is used to rank the features and identify the most important ones. The permutation feature importance is defined as the decrease in a model score when the values of a single feature are randomly shuffled, which breaks the relationship between the feature and the target. Thus, the drop in the model score indicates how much the model depends on the feature. Table 2 lists the permutation feature importance of the 12 input features for the 12 ML models, in descending order. Although the numerical algorithms of the ML models differ, the rankings of feature importance are similar, which mutually corroborates the reliability of the 12 ML models. Ranking sixth, the section shape is not very important, which supports the rationality of training rectangular column and circular column samples together. According to the average permutation feature importance of the 12 ML models, the four least important features, namely, the section depth-width ratio $h/b$, section depth $h$, longitudinal bar diameter $d_l$, and yield strength of longitudinal bars $f_{yl}$, are eliminated by trial and error through an iterative process, and the best performance on the testing set is obtained.
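The definition of permutation feature importance given above can be written out directly. The sketch below is a plain-Python illustration with $R^2$ as the model score; the function names, the number of repeats, and the use of $R^2$ are assumptions for the example rather than the settings used in the study.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R2 when one feature column is shuffled.

    predict: callable mapping one row (list of floats) to a prediction.
    X: list of rows (lists); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the feature-target relationship
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, column)]
            score = r2_score(y, [predict(row) for row in shuffled])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature that the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.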

Model Training.
The database is randomly split into a training set (80%, 398 samples) and a testing set (20%, 100 samples). The training set is used to establish the prediction ML model, and the testing set is used to evaluate the accuracy of the trained model.
Here, $b$ is the section width; $b_c$ is the cross-sectional core width measured to the outside edges of the transverse reinforcement; $A_g$ is the gross area of the column; $A_c$ is the section area measured out-to-out of the transverse reinforcement; $d$ is the effective depth in the primary direction (dimension from the compression face to the centroid of the outermost layer of tension steel); $P$ is the axial compressive force on the column; $A_{sh}$ is the total cross-sectional area of transverse reinforcement (including crossties) within spacing $s$; and $s$ is the center-to-center spacing of spirals or circular hoops.
After feature engineering and dimensionality reduction, eight features, namely, concrete compressive strength $f'_c$, drift ratio demand $\theta_u$, axial load ratio $P/(A_g f'_c)$, gross area to core area ratio $A_g/A_c$, yield strength of transverse bars $f_{yt}$, section shape $S$, shear span to effective depth ratio $L/d$, and longitudinal reinforcement ratio $\rho_l$, are selected as input parameters. The output parameter is the transverse reinforcement ratio.
To establish the best transverse reinforcement ratio prediction model, 12 ML models, including Ordinary Least Squares, Lasso regression, Ridge regression, K-Nearest Neighbors, Support Vector Regression, Multilayer Perceptron, Decision Trees, Random Forests, AdaBoost, XGBoost, LightGBM, and CatBoost are trained on the training set.
To obtain the best output of each model, grid search is applied to tune the hyper-parameters. ML models are built and evaluated for every combination of hyper-parameter values, and the model with the highest accuracy is selected. To alleviate the inherent randomness in selecting training and testing samples, a 10-fold cross-validation process is employed. The training set is randomly divided into 10 folds; each fold is used as a testing set in turn, and the remaining 9 folds are used as the training set. Several rounds of 10-fold cross-validation are performed, and the results from all the rounds are averaged to estimate the accuracy of the ML model. The Jupyter Notebook Python code on GitHub gives the grid search hyper-parameter ranges and the final hyper-parameters of each ML model. After effective feature engineering, most ML models can achieve good performance even with default hyper-parameters.
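The combination of grid search and 10-fold cross-validation described above can be sketched without any ML library. The example below tunes a single hyper-parameter (the number of neighbours of a one-dimensional KNN regressor) by mean absolute error; the toy model, the single round of cross-validation, and all names are assumptions for illustration, not the notebook code referred to in the text.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """Average target of the k nearest training points (1-D feature)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def cv_mae(x, y, k, n_folds=10):
    """Cross-validated mean absolute error for one hyper-parameter value."""
    errors = []
    for fold in k_fold_indices(len(x), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        tx, ty = [x[i] for i in train], [y[i] for i in train]
        errors.extend(abs(knn_predict(tx, ty, x[i], k) - y[i]) for i in fold)
    return sum(errors) / len(errors)


def grid_search(x, y, grid):
    """Evaluate every candidate and return the best one with all scores."""
    scores = {k: cv_mae(x, y, k) for k in grid}
    return min(scores, key=scores.get), scores
```

Repeating `cv_mae` with different seeds and averaging the results would mimic the several rounds of cross-validation mentioned in the text.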

Training and Testing Results.
The model performance on the training set and testing set is evaluated through four typical quantitative metrics, namely, the coefficient of determination ($R^2$), root mean square error (RMSE), mean absolute error (MAE), and weighted average percentage error (WAPE). Here, $T_i$ is the actual value of the transverse reinforcement ratio, $P_i$ is the predicted value, $i$ is the sample index, $m$ is the number of samples, and $\bar{T}$ is the mean value of all the samples in the database. The reason for using the weighted average percentage error (WAPE) instead of the mean absolute percentage error (MAPE) is that the MAPE exaggerates the importance of the percentage error at low transverse reinforcement ratios, which is not important in engineering practice. Generally, when the predicted transverse reinforcement ratio is low, the confinement transverse reinforcement does not play a controlling role in the final amount of transverse reinforcement.
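The four metrics can be computed as in the sketch below. The formulas are the standard textbook definitions, which are assumed here to match the ones used in the paper; the function names are hypothetical. MAPE is included only to show why it is sensitive to small actual values.

```python
import math


def regression_metrics(actual, predicted):
    """R2, RMSE, MAE, and WAPE using their standard definitions (assumed)."""
    m = len(actual)
    mean_t = sum(actual) / m
    abs_err = [abs(t - p) for t, p in zip(actual, predicted)]
    ss_res = sum(e ** 2 for e in abs_err)
    ss_tot = sum((t - mean_t) ** 2 for t in actual)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / m),
        "MAE": sum(abs_err) / m,
        # WAPE weights errors by the total actual value, not sample by sample.
        "WAPE": sum(abs_err) / sum(abs(t) for t in actual),
    }


def mape(actual, predicted):
    """Mean absolute percentage error; blows up when an actual value is small."""
    return sum(abs(t - p) / abs(t) for t, p in zip(actual, predicted)) / len(actual)
```

The same absolute error counts far more in MAPE when the actual value is small, whereas WAPE treats it the same as any other error of that size.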
The performances of these 12 models are evaluated on the testing set by comparing the predicted results with the experimental data, as shown in Table 3 and Figure 4. The diagonal line (y = x) represents predictions identical to the experimental data. In general, ensemble models show higher performance than single models. Statistically, a model with a high value of $R^2$ and correspondingly low values of the error measures is considered to have a high performance. Among the 12 ML models, the XGBoost model shows the best performance ($R^2$ = 0.873, RMSE = 0.239, MAE = 0.161, and WAPE = 0.212) on the testing set and is chosen as the final prediction model of the transverse reinforcement ratio.

Model Interpretation.
The established ML model may have good prediction performance; however, it is still a "black box" model which cannot give an explicit explanation of the underlying physical or mechanical mechanism. An ML model whose explanation violates mechanical laws cannot be used in production practice even if it has good performance. To obtain a better understanding of the predictions and verify the reliability of the proposed XGBoost model, the SHAP method [42] and partial dependence plot (PDP) [43] are used to interpret the results.

SHAP Method.
The SHAP (SHapley Additive exPlanations) method [42] originates from game theory and is an additive feature attribution method, that is, the output of the model is expressed as a linear sum of the contributions of the input features. The contribution of each feature is represented by the so-called Shapley value. SHAP not only offers an understanding of which features are important but also of how each feature affects the prediction, whether at the level of the whole database or at the level of single samples.
Figure 5 is a SHAP summary plot of the features, which demonstrates the distribution of the SHAP values for each feature and indicates the corresponding influence trends. The horizontal axis represents the SHAP value and the vertical axis represents the input features, ordered by importance. The dots are the samples in the database. The colour of a dot indicates the value of the specific feature, with the colour ranging from blue to red as the value goes from small to large. The horizontal position of a dot indicates whether the feature value leads to a higher or lower prediction. For example, the upper right dot in red indicates that a high concrete compressive strength $f'_c$ leads to a prediction increase. It is observed that the quantity of transverse reinforcement required increases with increasing concrete compressive strength $f'_c$, increasing drift ratio demand $\theta_u$, increasing axial load ratio $P/(A_g f'_c)$, increasing gross area to core area ratio $A_g/A_c$, decreasing yield strength of transverse bars $f_{yt}$, and decreasing shear span to effective depth ratio $L/d$. In addition, more transverse reinforcement is required for rectangular columns than for circular columns. The influence trend of the longitudinal reinforcement ratio $\rho_l$, however, is not obvious. The law obtained by the SHAP method is consistent with existing mechanical models and experimental results of RC columns, so the proposed XGBoost model is convincing.
In addition to the global interpretations of the entire data set, SHAP also provides individual (local) interpretations of single samples. Figure 6 illustrates the explanation for a typical circular column sample. The base value is the average of the predictions over the whole training set, which is 0.619%. The features determine the deviation of the prediction from the base value. The red bars pointing to the right represent contributions that increase the transverse reinforcement ratio from the base value, while the blue bars pointing to the left represent contributions that decrease it. For this circular sample, the drift ratio demand $\theta_u$ is the most critical feature and has a positive effect on the transverse reinforcement ratio, with a SHAP value of 0.23%.
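The additive structure described above, in which the prediction equals the base value plus the sum of the feature contributions, can be verified on a toy model by computing exact Shapley values through brute-force enumeration of feature subsets. This sketch is only practical for a handful of features and is not how SHAP tooling computes values for tree ensembles; the averaging over background rows and all names are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of one sample by enumerating feature subsets.

    Features outside a subset take their values from the background rows,
    and the predictions are averaged over those rows.
    Returns (base_value, list_of_contributions).
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    contributions = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {j}) - value(s))
        contributions.append(phi)
    return value(set()), contributions
```

Here the base value is the mean prediction over the background rows, which plays the same role as the 0.619% base value in Figure 6.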

Partial Dependence Plot.
To visualize the relationships between the transverse reinforcement ratio and the input parameters and to provide design suggestions for practical engineering, the partial dependence plot (PDP) [43] is adopted in this study. The partial dependence of a feature corresponds to the average response of an estimator for each possible value of the feature. A PDP shows the marginal effect of one or two features on the predicted outcome of an ML model and whether the relationship between the target and a feature is linear, monotonic, or more complex. A flat PDP indicates that the feature is not important, and the more the PDP varies, the more important the feature is.
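The definition of partial dependence given above amounts to fixing one feature at a grid value for every sample and averaging the predictions. A minimal sketch follows; the function name and the row format (lists of floats) are assumptions for the example.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value.

    predict: callable mapping one row (list) to a prediction.
    X: list of rows (lists); grid: values to evaluate for the chosen feature.
    """
    curve = []
    for value in grid:
        predictions = [
            predict(row[:feature] + [value] + row[feature + 1:]) for row in X
        ]
        curve.append(sum(predictions) / len(predictions))
    return curve
```

Keeping the per-sample predictions instead of their average gives the individual curves that are usually drawn as thin lines behind the averaged curve.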
One-way PDPs of the transverse reinforcement ratio $A_{sh}/(s b_c)$ on concrete compressive strength $f'_c$, drift ratio demand $\theta_u$, axial load ratio $P/(A_g f'_c)$, gross area to core area ratio $A_g/A_c$, yield strength of transverse bars $f_{yt}$, section shape $S$, shear span to effective depth ratio $L/d$, and longitudinal reinforcement ratio $\rho_l$, ordered by permutation feature importance, are visualized in Figure 7. The thinner lines represent individual specimens (only 50 randomly selected specimens are shown for a clear and typical display), while the thicker line represents the average value of all 498 samples in the database. Marks on the horizontal axis indicate the data distribution. The largest influence is seen for the concrete compressive strength, with an obvious step around 70 MPa. The higher the concrete compressive strength is, the more transverse reinforcement is needed. The second and third most important features are the drift ratio demand and the axial load ratio. With increasing drift ratio and axial load ratio, the transverse reinforcement ratio required also increases. With increasing gross area to core area ratio $A_g/A_c$, the transverse reinforcement ratio mainly increases, ignoring the decreases caused by some large-sized specimens when $A_g/A_c$ is small. With increasing yield strength of transverse bars $f_{yt}$ and shear span to effective depth ratio $L/d$, the transverse reinforcement ratio decreases. Rectangular columns require more transverse reinforcement than circular columns. The influence of the longitudinal reinforcement ratio $\rho_l$ on the transverse reinforcement ratio is small. The law obtained from the partial dependence plots is consistent with the SHAP method, which further supports the reliability of the proposed XGBoost model. The two-way PDP of the transverse reinforcement ratio $A_{sh}/(s b_c)$ on drift ratio demand $\theta_u$ and axial load ratio $P/(A_g f'_c)$ is visualized in Figure 8(a).
The maximum and minimum ... studied separately. For normal-strength concrete, the two-way PDP of the transverse reinforcement ratio $A_{sh}/(s b_c)$ on drift ratio demand $\theta_u$ and axial load ratio $P/(A_g f'_c)$ is visualized in Figure 8(b). The maximum and minimum average transverse reinforcement ratios are 1.23% and 0.25%, respectively.

Comparisons with Empirical Models.
To validate the superiority of the proposed XGBoost model, two traditional empirical models proposed by Watson et al. [3] and Sheikh et al. [4] are employed to predict the transverse reinforcement ratio needed for rectangular RC columns. The Watson et al. [3] model is given as follows:
The Sheikh et al. [4] model is given as follows:
Here, $P$ is the axial compressive load; $P_0$ is the nominal axial load strength at zero eccentricity ($P_0 = 0.85 f'_c A_g (1 - \rho_l) + A_g \rho_l f_{yl}$); $\mu_\phi$ is the curvature ductility factor; $\phi$ is the strength reduction factor; $m = f_{yl}/(0.85 f'_c)$; $\alpha$ is a parameter that accounts for the confinement efficiency, including the configuration and the lateral restraint provided to the longitudinal bars; other variables are defined previously.
Figure 9 illustrates the comparison between the transverse reinforcement ratios of the rectangular columns obtained from experiments and those from the prediction formulas. Ideally, all points are distributed on the diagonal line. A point below the diagonal line means the formulation under-predicts, whereas a point above the diagonal line means the formulation over-predicts. All the ML models trained in the previous section show higher accuracy than the empirical formulas, especially the XGBoost model. Figure 9(c) illustrates the results for the rectangular columns in the database predicted by the proposed XGBoost model (shown only for visual contrast; the performance of the ML model should be evaluated on the testing set, not the whole database).
The mean absolute errors (MAE) of the Watson et al. [3] and Sheikh et al. [4] empirical formulas are 0.727 and 0.652, respectively, while that of the proposed XGBoost model on the testing set is 0.161. The standard deviations of the error between the experimental and predicted transverse reinforcement ratios of the Watson et al. [3] and Sheikh et al. [4] empirical formulas are 0.990 and 1.291, respectively, while that of the proposed XGBoost model on the testing set is 0.239. Both empirical formulas show lower accuracy and larger uncertainties than the proposed XGBoost model. The reasons can be attributed to the fact that (1) the empirical equations were developed based on a narrow range of data with limited diversity in specimen results; (2) the empirical equations might neglect important parameters that contribute to the transverse reinforcement ratio; (3) the ability of ML methods to capture complex and nonlinear relationships between the output and inputs is stronger than that of the traditional linear regression formula.

Safety Analysis.
When applied in engineering practice, considering safety and structural stability, the transverse reinforcement ratio prediction model should be conservative enough for the safe design of RC structures. For primary members critical to structural stability, it is suggested that the probability that the predicted value is lower than the experimental value should be less than 20%, referring to the confidence of the acceptance criteria for Life Safety (LS) of primary members suggested by Ghannoum et al. [44].
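The 20% criterion can be checked empirically on a set of experimental and predicted values, as in the sketch below. The error is taken as the experimental value minus the predicted value, so a positive error means the model under-predicts. The function names and the toy numbers are assumptions for illustration.

```python
def underprediction_probability(experimental, predicted):
    """Fraction of samples where the prediction is below the experiment."""
    errors = [e - p for e, p in zip(experimental, predicted)]
    return sum(1 for err in errors if err > 0) / len(errors)


def meets_safety_criterion(experimental, predicted, limit=0.20):
    """True if the under-prediction probability is below the given limit."""
    return underprediction_probability(experimental, predicted) < limit
```

With one under-prediction in five samples the probability is exactly 20%, which does not satisfy a strict "less than 20%" requirement.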
The histogram and empirical cumulative distribution of the error between the experimental and predicted transverse reinforcement ratios (the error is defined as the experimental value minus the predicted value) of the XGBoost model on the testing set are plotted in Figure 10. The vertical line at error = 0 indicates that the predicted value is equal to the experimental value. Errors less than 0 indicate safe predictions; errors greater than 0 indicate unsafe predictions. Most of the transverse reinforcement ratios predicted by the XGBoost model are very close to the experimental values, with a safety probability of 55%. To provide a conservative estimate of the transverse reinforcement ratio, it is suggested that an additional value of 0.12% be added to the predicted value, which gives a guarantee rate of 80%.

[...] Figure 11. Through nonlinear structural dynamic analysis of the overall structural system under the specified earthquake ground motion, the maximum component deformation in the response history can be obtained. The concept of obtaining the component drift ratio demand from nonlinear structural dynamic analysis is suitable for all types of structural components. This paper mainly focuses on RC columns. Because the component drift ratio demand is defined according to the experimental configuration of a cantilever, a deformation transformation is required. In the nonlinear structural dynamic analysis, there is usually a zero moment point near the midpoint of the column. The segments below and above the zero moment point can be treated as two equivalent cantilever columns, and their drift ratio demands can be calculated separately, as shown in Figure 12.
If the moments at the top and bottom ends of the column are of different signs, the location of the zero moment point is found first, and the displacement of the zero moment point is then calculated according to the shape function of the finite element model. For each equivalent cantilever, the displacements of the zero moment point and the column end are transformed into the local coordinate system defined by the tangent at the column end. In this way, the rotation angle of the column end, which is regarded as a harmless drift ratio, is automatically deducted, as shown in Figure 12(b). The drift ratios at the top and bottom ends are calculated by Equation (3).
where $\theta_t$ is the drift ratio at the top end; $\theta_b$ is the drift ratio at the bottom end; $d_0$ is the displacement at the zero moment point; $d_t$ is the displacement of the top node of the column; $d_b$ is the displacement of the bottom node of the column; $H_t$ is the length of the upper equivalent cantilever; $H_b$ is the length of the lower equivalent cantilever.
If the moments at the top and bottom ends of the column are of the same sign, the zero moment point is outside the column, and its position does not need to be calculated. The drift ratio of the column can be approximately calculated by Equation (4).
where $H$ is the length of the column. The column drift ratios at all time steps can be calculated according to the abovementioned formulas, and the envelope of the absolute values of the drift ratios over all time steps is taken as the drift ratio demand. Finally, the drift ratio demand $\theta_u$, along with the concrete compressive strength $f'_c$, axial load ratio $P/(A_g f'_c)$, gross area to core area ratio $A_g/A_c$, yield strength of transverse bars $f_{yt}$, section shape $S$, shear span to effective depth ratio $L/d$, and longitudinal reinforcement ratio $\rho_l$, is input into the trained ML model to predict the transverse reinforcement ratios required at the top or bottom ends of the RC columns. Because the sectional dimensions and drift ratio demands in the sectional height and width directions may be quite different, the transverse reinforcement ratios in the height and width directions should be calculated separately.
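The drift ratio demand calculation described above can be illustrated with a minimal Python sketch. Several points are assumptions of this sketch rather than statements from the paper: Equation (3) is taken as $\theta_t = |d_t - d_0|/H_t$ and $\theta_b = |d_0 - d_b|/H_b$, and Equation (4) as $\theta = |d_t - d_b|/H$, as inferred from the symbol definitions; the zero moment point is located by assuming a linear moment diagram between the column ends; and all displacements are assumed to be already expressed in the local coordinate system, with the column-end rotation deducted. The function name and the input values are hypothetical.

```python
def drift_ratio_demand(m_top, m_bot, d_top, d_bot, d_zero, height):
    """Envelope of the drift ratios at the top and bottom column ends.

    Each of the first five arguments is a sequence over time steps.
    Displacements are assumed to be in the local coordinate system.
    """
    theta_top_max = 0.0
    theta_bot_max = 0.0
    for mt, mb, dt, db, d0 in zip(m_top, m_bot, d_top, d_bot, d_zero):
        if mt * mb < 0:
            # End moments of different signs: zero moment point inside
            # the column. Its location assumes a linear moment diagram.
            h_bot = height * abs(mb) / (abs(mb) + abs(mt))
            h_top = height - h_bot
            theta_t = abs(dt - d0) / h_top  # assumed form of Equation (3)
            theta_b = abs(d0 - db) / h_bot
        else:
            # End moments of the same sign: zero moment point outside.
            theta_t = theta_b = abs(dt - db) / height  # assumed Equation (4)
        theta_top_max = max(theta_top_max, theta_t)
        theta_bot_max = max(theta_bot_max, theta_b)
    return theta_top_max, theta_bot_max


# Two hypothetical time steps for a 3.0 m column (moments in kN*m, lengths in m).
top, bot = drift_ratio_demand(
    m_top=[10.0, 10.0], m_bot=[-10.0, 10.0],
    d_top=[0.02, 0.012], d_bot=[0.0, 0.0], d_zero=[0.005, 0.0],
    height=3.0,
)
print(top, bot)
```

The two returned envelopes would then serve as the drift ratio demands for the top and bottom ends, evaluated separately for each sectional direction.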

Conclusion
The existing models to predict the amount of transverse reinforcement required for RC columns are all empirical models with low accuracy and large dispersion, and they do not consider the real ductility demand of individual components. This paper proposes a ductility design method for RC structures based on the real component drift ratio demand and develops a method for determining the component drift ratio demand. To establish a transverse reinforcement ratio prediction model, a database consisting of 326 rectangular columns and 172 circular columns, for a total of 498 tests, is used. The database is randomly split into a training set (80%) and a testing set (20%). Twelve ML models, including Ordinary Least Squares, Lasso regression, Ridge regression, K-Nearest Neighbors, Support Vector Regression, Multilayer Perceptron, Decision Trees, Random Forests, AdaBoost, XGBoost, LightGBM, and CatBoost, are trained. Feature engineering, including data transformation, feature extraction, feature selection, and feature iteration, is systematically carried out through an iterative process. Grid search and 10-fold cross-validation are applied to tune the hyper-parameters. Through a comprehensive performance validation on the testing set, an XGBoost model is selected for its accuracy. The SHAP method and partial dependence plots are used to interpret the "black box" ML model. The following conclusions are drawn:
(1) The amount of transverse reinforcement required for columns increases with increasing concrete compressive strength, increasing drift ratio demand, increasing axial load ratio, increasing gross area to core area ratio, decreasing yield strength of transverse bars, and decreasing shear span to effective depth ratio, ordered by importance. In addition, more transverse reinforcement is required for rectangular columns than for circular columns. The influence of the longitudinal reinforcement ratio shows no obvious trend, and other features are less important.
(2) All 12 trained ML models show higher accuracy and lower dispersion than the two empirical models, especially the XGBoost model. The mean absolute error of the XGBoost model on the testing set is 0.161, while those of the two empirical models are 0.727 and 0.652, respectively. The standard deviation of the error of the XGBoost model on the testing set is 0.239, while those of the two empirical models are 0.990 and 1.291, respectively.
(3) Simple design charts based on partial dependence plots are given for reference by engineers. Partial dependence plots of the transverse reinforcement ratio on the drift ratio demand and axial load ratio show that the average maximum and minimum transverse reinforcement ratios are 1.29% and 0.35% for normal-strength and high-strength concrete together, and 1.23% and 0.25% for normal-strength concrete only (less than 60 MPa).
(4) The safety probability of the proposed XGBoost model on the testing set is 55%. To provide a conservative estimate of the transverse reinforcement ratio, it is suggested that an additional value of 0.12% be added to the predicted value for a guarantee rate of 80%.
The trained XGBoost model is transformed into C code and integrated into seismic design software for productive practice. An open-source data-driven model is created for continuous improvement, with the flexibility to incorporate more experimental data when available.

Data Availability
The experimental database, Jupyter Notebook Python code for the 12 ML models, C code of the trained XGBoost model, and other materials used to support the findings of this study have been deposited in the GitHub repository (https://github.com/qiaobaojuan/ML-model-for-RC-columns.git).

Conflicts of Interest
The authors declare that they have no conflicts of interest.