Estimating the Compressive Strength of Cement-Based Materials with Mining Waste Using Support Vector Machine, Decision Tree, and Random Forest Models

To estimate the compressive strength of cement-based materials with mining waste, a dataset was constructed from a series of experimental studies. Support vector machine (SVM), decision tree (DT), and random forest (RF) models were developed and compared. The beetle antennae search (BAS) algorithm was employed to tune the hyperparameters of the developed machine learning models. The predictive performances of the three models were compared using the correlation coefficient (R) and the root mean square error (RMSE). The results showed that the BAS algorithm can effectively tune these artificial intelligence models. During tuning, the SVM model obtains the minimum RMSE, while the BAS algorithm is less efficient for the DT and RF models. The SVM, DT, and RF models can all be used to predict the compressive strength of cement-based materials using solid mining waste as aggregate effectively and accurately, with high R values and low RMSE values. On the test set, the RF algorithm obtains the highest value of R and the lowest value of RMSE, demonstrating the highest accuracy. The solid mining waste to cement ratio is the most important variable affecting the compressive strength. Curing time is also an important parameter for the compressive strength of cemented materials, followed by the water-solid ratio of mining waste and the fine sand ratio.


Introduction
Mining waste includes the stripped material and waste rock (including coal gangue) produced in the process of mining, as well as the tailings discarded in the process of mineral dressing [1-6]. The large amount of solid mining waste, together with its complicated treatment, has become one of the issues that must be addressed for environmental protection [4, 7-12]. The massive accumulation of mining waste can pollute the land and cause disasters such as landslides and mudslides. The debris and tailings formed by the weathering of waste rock are either washed by water into water bodies, dissolved and carried into the groundwater, or blown into the atmosphere by the wind, polluting the environment with water and gas as the medium [13-16]. These mining waste materials not only take up a lot of land but also directly pollute the environment and threaten the safety of people's lives, since some of these wastes contain highly toxic elements, such as arsenic and cadmium, and radioactive elements, which are harmful to human health [1, 2, 4, 7-9]. Tailings are characterized by fine particles, low weight, and large surface area; they are easily carried away by water and easily dispersed by wind. Therefore, solid mining wastes are a potential hazard to the air, water bodies, farmland, and villages [4, 13, 17-20].
To address the issue of solid mining waste, researchers have explored various approaches to recycling it into construction materials. One of the most representative cases is cement-based materials, which are widely used and can be produced in an environmentally friendly way if the main aggregates are replaced by mining waste [21-31]. In general, cement and water are added to cement-based composites to increase strength and fluidity. After mixing with water, the slurry can harden in water and firmly cement sand, stone, and other materials together [32-40]. Because of their important role as building materials, cement-based materials have received increasing attention in recent years, especially regarding strength and durability [41-49].
Compressive strength is a key index for evaluating the construction process and workability of cement-based composites, so most studies focus on experiments and on revealing mechanisms [50-57]. Typically, the strength of cement-based materials is tested in the laboratory, and the practicing civil engineer determines the optimal proportions of the constituent materials [58-66]. However, experimental research is time-consuming and expensive, which makes large laboratory test programs impractical [41, 42, 67-72]. Some researchers have predicted the strength of cement-based materials with empirical formulas obtained by mathematical regression. Some of these formulas are useful, but most are limited by the input and output datasets from which they were derived. Because of the low accuracy of the empirical formulas, their scope of application remains narrow. Hence, to accurately predict the compressive strength of cement-based materials, simpler and more reliable models are needed.
Using artificial intelligence methods to evaluate and predict the mechanical properties of cement-based materials is an active topic in the research field of cement-based composites [35, 39, 63]. Some researchers have applied machine learning methods to evaluate the performance of cement-based composites. For example, artificial neural networks (ANN) and evolutionary optimization algorithms have been used to predict the compressive strength of cement-based materials. Jamal et al. analyzed the potential of multiple linear regression and adaptive neuro-fuzzy inference systems for estimating the strength of recycled aggregate concrete [73]. Khademi and Jamal used the same method in a follow-up study to predict the compressive strength of recycled aggregate concrete after curing for 28 days [74]. The measured results in these studies were consistent with the predicted ones, indicating that machine learning techniques are feasible for predicting the performance of cement-based materials. However, few studies offer accurate and efficient artificial intelligence techniques that specifically target the compressive strength of cement-based materials using mining waste as the aggregate. Also, although the above machine learning techniques have been successfully applied to concrete materials, these studies are still limited by uncertainty, long computation times, and low efficiency. Therefore, it is necessary to propose simpler and more efficient machine learning techniques to evaluate the compressive strength of cement-based materials that use mining waste as the aggregate. Other intelligent algorithms, such as random forest, support vector machine, and decision tree, have also been applied in this field.
It should be noted that most machine learning models can make accurate predictions, but there is limited research on how the choice of model affects prediction accuracy. The above algorithms have different sensitivities to datasets with different attributes. Therefore, it is necessary to compare different models for predicting compressive strength and to find the optimal model for future applications.
In this study, various experimental tests were conducted to construct a dataset considering four key factors, that is, solid mining waste to cement ratio, fine sand ratio, solid mining waste to water ratio, and curing time. Then three machine learning models, that is, SVM, DT, and RF, were used to predict the strength of cement-based materials using mining waste as the aggregate, and the hyperparameters of these models were tuned by the beetle antennae search (BAS) algorithm. Last, the prediction results were compared and analyzed to identify the best model for predicting the compressive strength of the cement-based materials. The importance of the influencing variables was also evaluated, and the most sensitive factor was identified. The study identifies the best of the three models for strength prediction of cement-based materials, which can serve as a guideline in this research field.

Dataset Collection.
In the present study, the compressive strength dataset used for the prediction of cement-based materials was collected from previous studies by the authors. The tailings were employed as the solid mining waste replacing the aggregates. Four parameters known to influence the compressive strength of the cement-based materials were selected as the inputs: the ratio of the solid mining waste to the cement, the fine sand ratio, the ratio of the solid mining waste to water, and the curing time. Table 1 presents the composition of the dataset. The solid mining waste to cement ratio was set to 4, 6, 8, and 10. The fine sand ratio with respect to the total mixture was set to 0, 0.1, 0.15, and 0.2. The solid mining waste to water ratio was set to 0.68, 0.7, and 0.72. The concrete samples were cured for 7, 28, and 60 days. Table 2 shows the statistical analysis of these variables in the dataset. The Pearson correlation coefficient was employed in this study to evaluate the relationship between the input parameters. The Pearson correlation coefficient between two input variables X and Y is defined as the quotient of their covariance and the product of their standard deviations, as shown in the following equation:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y},$$
where $\operatorname{cov}(X, Y)$ is the covariance of the two variables and $\sigma_X$ and $\sigma_Y$ are their standard deviations. The correlation of the collected dataset was analyzed with SPSS software in the present study. Figure 1 gives the correlation matrix between the input variables (the solid mining waste to cement ratio, fine sand ratio, solid mining waste to water ratio, and curing time).
It can be seen from Figure 1 that the correlation of each variable with itself is 1 on the diagonal from the bottom left to the top right, and that the correlation coefficients above the diagonal are symmetric with those below it. The correlation coefficients between different variables are relatively low (most values are close to 0.1). This indicates that the input variables are nearly independent of each other, so they can be used as input variables for intelligent prediction of the compressive strength of the concrete samples without causing multicollinearity issues.
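The correlation check described above can be sketched in a few lines of standard-library Python. This is only an illustration of the Pearson formula, not the SPSS procedure used in the study; the function names and the toy rows are assumptions and are not the study's data.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of covariance and variance cancel, so raw sums are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy rows (NOT the study's dataset): two inputs on a balanced grid.
toy = {
    "waste_cement_ratio": [4, 6, 8, 10, 4, 6, 8, 10],
    "curing_days": [7, 7, 7, 7, 28, 28, 28, 28],
}
matrix = correlation_matrix(toy)
```

Low off-diagonal values in such a matrix are what the text relies on when it argues that the inputs do not cause multicollinearity.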

Artificial Intelligence Algorithms.
In the present study, three artificial intelligence algorithms, SVM, DT, and RF, were employed to predict the compressive strength of concrete samples with tailings as aggregate. In its original classification form, the support vector method seeks an indicator function that can be used to classify test samples [60]. By extending the problem from estimating an indicator function to estimating a real-valued function, a support vector machine (SVM) for function estimation (regression) is obtained [75-77]. SVM effectively handles problems with a small number of samples, high dimensionality, and nonlinearity [78, 79]. However, as a relatively new machine learning algorithm, it still has areas to be improved, and one of them is the selection of its parameters (the error tolerance ε, the error penalty factor C, and the kernel function parameter γ) [60, 80, 81]. The kernel function parameter γ affects the complexity of the distribution of the sample data in the high-dimensional feature space.
The DT model is a decision analysis method that, given the known probabilities of various situations, builds a decision tree to obtain the probability that the expected net present value is greater than or equal to zero, in order to evaluate project risk and judge feasibility. It is a graphical method that applies probability analysis directly. In machine learning, the DT algorithm can be used as a prediction model that represents a mapping between object attributes and object values.
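To make the roles of the SVM parameters named above more concrete, the following minimal sketch shows the ε-insensitive loss commonly used in support vector regression and a radial basis function (RBF) kernel governed by γ. The source does not state which kernel was used, so the RBF kernel is an assumption, and the function names are illustrative.

```python
import math

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2); assumed here, not confirmed by the source."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * squared_distance)

# A prediction inside the epsilon tube is not penalized.
inside = epsilon_insensitive_loss(2.00, 2.05, epsilon=0.1)
# A prediction outside the tube is penalized by the excess over epsilon.
outside = epsilon_insensitive_loss(2.00, 2.50, epsilon=0.1)

# Larger gamma makes the similarity between two distinct points decay faster.
point_a, point_b = [4.0, 0.1], [6.0, 0.2]
similarity_small_gamma = rbf_kernel(point_a, point_b, gamma=0.1)
similarity_large_gamma = rbf_kernel(point_a, point_b, gamma=1.0)
```

In the usual formulation, the penalty factor C weights the sum of these losses against the smoothness of the fitted function, which is why ε, C, and γ are the natural targets for hyperparameter tuning.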
RF is an algorithm that integrates multiple trees through ensemble learning; its basic unit is the decision tree. Intuitively, each decision tree is a classifier, so for a given input sample, n trees produce n classification results. The RF algorithm collects all the votes and designates the category with the most votes as the final output; for regression tasks such as the one in this study, the outputs of the trees are averaged instead. In the RF algorithm, k feature columns are randomly selected from a dataset with a total of m features (where k is less than or equal to m), and a decision tree is built from these k features. This process is repeated n times, so that n decision trees are built from different random combinations of k features. Finally, the input variables are passed to each decision tree to predict the outcome.
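The procedure described above (random selection of k out of m features, repeated for n trees, then combining the tree outputs) can be sketched as follows. This is a deliberately simplified illustration rather than the model used in the study: each tree is a depth-1 regression tree (a stump), bootstrap resampling of the rows is assumed, and the toy rows and strength values are hypothetical.

```python
import random

def fit_stump(rows, targets, feature_ids):
    """Fit a depth-1 regression tree using only the given feature columns."""
    overall_mean = sum(targets) / len(targets)
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between two distinct observed values
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no feature varies in this sample: predict the mean
        return lambda row: overall_mean
    _, f, thr, lm, rm = best
    return lambda row: lm if row[f] <= thr else rm

def fit_forest(rows, targets, n_trees, k_features, seed=0):
    """Build n_trees stumps, each on a bootstrap sample and k random features."""
    rng = random.Random(seed)
    m = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap (assumed)
        feats = rng.sample(range(m), k_features)        # k of the m features
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Regression output: the average of the individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)

# Hypothetical rows: (waste/cement, fine sand ratio, waste/water, curing days).
rows = [(4, 0.0, 0.68, 7), (4, 0.1, 0.70, 28), (6, 0.15, 0.72, 60),
        (8, 0.2, 0.68, 28), (10, 0.0, 0.70, 7), (10, 0.1, 0.72, 60)]
strengths = [2.1, 3.4, 3.0, 1.6, 0.7, 1.5]  # made-up values for illustration
forest = fit_forest(rows, strengths, n_trees=25, k_features=2)
estimate = predict(forest, (6, 0.1, 0.70, 28))
```

A practical forest would grow deeper trees; the stump keeps the sketch short while preserving the feature subsampling and the averaging step.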
In this study, the input variables of the models are the ratio of solid waste to cement, the fine sand ratio, the ratio of solid waste to water, and the curing time, and the output is the compressive strength of the concrete samples. These machine learning algorithms are widely used to address regression problems in engineering, such as the prediction of concrete performance, rock performance, and rockburst. However, these models have not been compared to find the best one, especially for concrete samples with tailings as aggregate. Therefore, it is necessary to compare them and identify the best model for predicting the compressive strength of concrete specimens with mining waste as aggregate.

Evaluation of the Predictive Performance.
The evaluation method is crucial for assessing machine learning models because it helps researchers find the optimal algorithm for an application. In this study, the dataset is divided into two parts: the training set (about 70%, comprising 100 data points) and the testing set (about 30%, comprising 44 data points).
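The split described above can be reproduced with a shuffled index list, as in this minimal sketch. The source does not say how the samples were assigned to the two sets, so the random shuffle and the fixed seed are assumptions.

```python
import random

def train_test_split(n_samples, n_train, seed=0):
    """Shuffle sample indices and cut them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

# 144 samples in total: 100 for training and 44 for testing, as in the text.
train_idx, test_idx = train_test_split(n_samples=144, n_train=100)
train_share = len(train_idx) / 144  # roughly 0.69, i.e. about 70%
```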
Each dataset was then validated using a 10-fold cross-validation method. According to the literature, the correlation coefficient (R) and root mean square error (RMSE) were used in this study to evaluate the performance of the applied models. The two parameters are defined as follows:
$$R = \frac{\sum_{i=1}^{N} \left(y_i^{*} - \bar{y}^{*}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N} \left(y_i^{*} - \bar{y}^{*}\right)^{2}} \sqrt{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^{2}}},$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i^{*} - y_i\right)^{2}},$$
where $N$ represents the number of data points, which is 144 in the present study; $y_i^{*}$ and $y_i$ represent the predicted and measured compressive strengths of the concrete samples, respectively; and $\bar{y}^{*}$ and $\bar{y}$ are the average values of the predicted and measured compressive strengths of the concrete samples, respectively.
Figure 2 gives the experimental results for the compressive strength of the concrete samples using tailings as the aggregate, together with the four input variables (the solid mining waste to cement ratio, fine sand ratio, solid mining waste to water ratio, and curing time). As can be seen from Figure 2, the influence of the fine sand ratio on the compressive strength of the concrete specimens is generally negative, that is, as the fine sand ratio decreases, the compressive strength increases. However, it should be noted that, in some cases, the effect of the fine sand ratio is not clear. In addition, as the curing time increases, the compressive strength of the concrete specimens increases rapidly. The solid mining waste to water ratio has a similar effect: the compressive strength of the concrete samples increases as the solid mining waste to water ratio increases. In contrast, the strength decreases as the solid mining waste to cement ratio increases. The experimental results are consistent with those in the previous literature, which suggests that the dataset is reliable. Figure 3 gives the relationship between iterations and RMSE values.
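The two evaluation metrics used in this study, R and RMSE, can be computed as in the following minimal sketch. The function names and the toy values are illustrative assumptions and are not taken from the dataset.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def correlation_r(measured, predicted):
    """Correlation coefficient R between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    numerator = sum((p - mean_p) * (m - mean_m) for m, p in zip(measured, predicted))
    denominator = math.sqrt(sum((p - mean_p) ** 2 for p in predicted)) * \
        math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    return numerator / denominator

# Hypothetical strengths in arbitrary units, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
error = rmse(measured, predicted)
r_value = correlation_r(measured, predicted)
```

A model that predicts well has R close to 1 and RMSE close to 0, which is the sense in which the two metrics are read in the model comparison.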

Hyperparameters Tuning.
BAS was used to find the optimized hyperparameters of each artificial intelligence algorithm. As can be seen from Figure 3, the RMSE drops rapidly over the first several iterations and then remains stable, indicating that BAS can effectively tune these artificial intelligence models. The reduction rates of the RMSE values of the three models are similar, indicating little difference in computational efficiency. Figure 3 also shows that, after hyperparameter tuning by the BAS algorithm, the SVM model obtains the minimum RMSE, while the BAS algorithm is less efficient for the DT and RF models. In summary, the BAS algorithm is sufficiently efficient and accurate in the hyperparameter tuning of the three models for the subsequent model evaluation to be conducted. The 10-fold cross-validation (CV) was employed for the hyperparameter tuning of the three machine learning models. Figures 4-6 give the RMSE values of the different folds for the SVM, DT, and RF models, respectively.
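The tuning loop described above can be illustrated with a minimal sketch of a commonly described form of the beetle antennae search. The source does not give its BAS settings, so the antenna length, step size, decay rate, and the toy objective (a stand-in for a cross-validated RMSE as a function of two hyperparameters) are all assumptions.

```python
import math
import random

def beetle_antennae_search(objective, x0, n_iter=50, antenna=1.0, step=1.0,
                           decay=0.95, seed=0):
    """Minimize objective by comparing its value at two 'antennae' each iteration."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), objective(x)
    history = [best_f]  # best objective value found so far, per iteration
    for _ in range(n_iter):
        # Random unit direction in which the beetle points its antennae.
        b = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        right = [xi + antenna * bi for xi, bi in zip(x, b)]
        left = [xi - antenna * bi for xi, bi in zip(x, b)]
        diff = objective(right) - objective(left)
        sign = (diff > 0) - (diff < 0)
        # Move away from the antenna that senses the larger (worse) value.
        x = [xi - step * bi * sign for xi, bi in zip(x, b)]
        f = objective(x)
        if f < best_f:
            best_x, best_f = list(x), f
        history.append(best_f)
        antenna = decay * antenna + 0.01  # shrink the sensing length (assumed schedule)
        step = decay * step               # shrink the step size (assumed schedule)
    return best_x, best_f, history

def toy_objective(params):
    """Hypothetical stand-in for a CV RMSE over (log C, log gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_x, best_f, history = beetle_antennae_search(toy_objective, [0.0, 0.0])
```

Plotting `history` against the iteration number gives the kind of curve the text describes for Figure 3: the best value can only decrease or stay flat as iterations proceed.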
It can be seen from the figures that, during the 10-fold CV process, the SVM model obtains its minimum RMSE of 0.065 at the 4th fold. The DT model also obtains its minimum RMSE at the 4th fold, with a value of 0.095. The RF model obtains its minimum RMSE at the 7th fold, with a value of 0.08.
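The fold bookkeeping behind these per-fold results can be sketched as follows. It is assumed here that the folds are contiguous blocks of sample indices; the per-fold RMSE list is hypothetical except that its minimum is placed at the 4th fold with the value 0.065 reported for the SVM model.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k nearly equal contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def best_fold(fold_rmse):
    """Return the 1-based fold number with the lowest RMSE and that RMSE."""
    i = min(range(len(fold_rmse)), key=fold_rmse.__getitem__)
    return i + 1, fold_rmse[i]

folds = k_fold_indices(n_samples=100, k=10)  # e.g. the 100 training samples

# Hypothetical per-fold RMSE values; only the minimum (fold 4, 0.065) is from the text.
svm_fold_rmse = [0.090, 0.110, 0.080, 0.065, 0.100, 0.120, 0.095, 0.085, 0.105, 0.115]
fold_number, fold_value = best_fold(svm_fold_rmse)
```

In each round, one fold is held out for validation and the model is trained on the remaining nine.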

Model Evaluation.
The predictive results of the three machine learning models employed in this study were systematically compared. Figures 7-9 give the results of the comparison between the actual compressive strength and the predicted compressive strength using the SVM model, DT model, and RF model, respectively.
It can be observed that the RMSE values (test set) of the SVM model, DT model, and RF model were 0.2332, 0.24, and 0.2286, respectively. The R values (test set) of the SVM model, DT model, and RF model were 0.9699, 0.9619, and 0.9731, respectively. This indicates that the RF model is the best-performing machine learning algorithm for predicting the compressive strength of the cement-based materials using tailings as the aggregate. However, it should be noted that the predictive performances of the three models are quite similar (the differences are less than 5%).
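The comparison above can be checked with a few lines of arithmetic on the reported test-set values. The helper name and the choice to express the spread relative to the smallest value are assumptions made for illustration.

```python
# Test-set metrics as reported in the text.
rmse_values = {"SVM": 0.2332, "DT": 0.24, "RF": 0.2286}
r_values = {"SVM": 0.9699, "DT": 0.9619, "RF": 0.9731}

def relative_spread(values):
    """Gap between the largest and smallest value, relative to the smallest."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest

lowest_rmse_model = min(rmse_values, key=rmse_values.get)
highest_r_model = max(r_values, key=r_values.get)
rmse_spread = relative_spread(rmse_values.values())
r_spread = relative_spread(r_values.values())
```

With these numbers, RF has both the lowest RMSE and the highest R, the RMSE values differ by just under 5%, and the R values differ by roughly 1.2%, which agrees with the statement that the three models perform similarly.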

Variable Importance Evaluation.
The model can effectively predict the compressive strength of cemented materials with tailings as aggregate, and the relative importance of each variable can be determined by the machine learning model. Because the RF model achieved the best prediction performance, it was used to calculate the importance of the variables in this study, and the results are shown in Figure 10. It can be seen that the solid mining waste to cement ratio is the most important variable, with an influence score of 4.9159. Curing time (3.1694) is also an important parameter for the compressive strength of the cemented materials, followed by the water-solid ratio of mining waste (0.0593) and the fine sand ratio (0.0269). This indicates that engineers should pay most attention to the ratio of solid mining waste to cement. The above results provide guidance for the mix design and field construction of cemented materials with mining waste as aggregate.
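To put the reported influence scores on a common scale, they can be normalized so that they sum to 1. The source does not state how the scores are scaled, so this normalization is an illustrative assumption rather than part of the reported method; the dictionary keys are hypothetical labels.

```python
# Influence scores as reported for the RF model (Figure 10).
scores = {
    "waste_cement_ratio": 4.9159,
    "curing_time": 3.1694,
    "water_solid_ratio": 0.0593,
    "fine_sand_ratio": 0.0269,
}

total = sum(scores.values())
shares = {name: value / total for name, value in scores.items()}
ranking = sorted(shares, key=shares.get, reverse=True)
top_two_share = shares["waste_cement_ratio"] + shares["curing_time"]
```

Under this normalization, the solid mining waste to cement ratio accounts for about 60% of the total score and curing time for about 39%, so the two together make up roughly 99%, while the water-solid ratio and the fine sand ratio each contribute less than 1%.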

Conclusions
Based on a series of experimental studies on cement-based materials with mining waste as the aggregate, a dataset was constructed and the models were evaluated. Three machine learning models (SVM, DT, and RF) were used to predict the compressive strength of cement-based materials with mining waste as the aggregate, and the prediction results of the different models were compared. The corresponding results are as follows:
(1) The experimental results show that the compressive strength of the cement-based materials increases with the solid mining waste to water ratio and the curing time, while it decreases as the fine sand ratio and the solid mining waste to cement ratio increase.
(2) The BAS algorithm can effectively tune these artificial intelligence models. The reduction rates of the RMSE values of the three models are similar, indicating little difference in computational efficiency. After hyperparameter tuning by the BAS algorithm, the SVM model obtains the minimum RMSE, while the BAS algorithm is less efficient for the DT and RF models.
(3) The SVM, DT, and RF models can predict the compressive strength of cement-based materials using solid mining waste as aggregate effectively and accurately, with high R values (0.9699, 0.9619, and 0.9731 for the SVM, DT, and RF models, respectively) and low RMSE values (0.2332, 0.24, and 0.2286 for the SVM, DT, and RF models, respectively). Among the three developed models, the RF algorithm obtains the highest value of R and the lowest value of RMSE, demonstrating the highest accuracy.
(4) The solid mining waste to cement ratio is the most important variable affecting the compressive strength of the cement-based materials using mining waste as the aggregate. Curing time is also an important parameter, followed by the water-solid ratio of mining waste and the fine sand ratio.
These results provide guidance for the future mix design and field construction of cemented materials with mining waste as aggregate.
It should be pointed out that the accuracy and reliability of the developed machine learning models depend on the dataset, that is, on the number and type of samples. Therefore, in the future, more samples will need to be obtained and more efficient models should be proposed.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
There are no conflicts of interest.