Improving the accuracy of compressive strength prediction for concrete is crucial, and challenging, because it reduces costly and time-consuming experiments. In particular, determining the compressive strength of concrete containing ground granulated blast furnace slag (GGBFS) is more difficult because of the complexity of the mix design. In this paper, an approach based on random forest (RF), a powerful machine learning algorithm, is proposed for predicting the compressive strength of concrete containing GGBFS. The RF model is first evaluated to determine the best architecture, which consists of 500 grown trees and a leaf size of 1. In the next step, the model is evaluated over 500 simulations to account for the effect of random sampling. Finally, the best prediction results are reported in terms of statistical measures such as the correlation coefficient (

Ground granulated blast furnace slag (GGBFS) is now widely used as a supplementary cementitious material in Portland cement concrete. GGBFS is produced from the glassy granular material formed when molten blast furnace slag is quickly cooled by water. GGBFS can replace 35–65% of the Portland cement in concrete. Using GGBFS as a partial replacement of Portland cement enhances concrete strength and durability through the formation of a denser matrix. It could also increase the performance of concrete structures. Moreover, GGBFS as a partial replacement requires only about 25% of the energy needed to produce Portland cement [

Numerous investigations, using both experimental and statistical methods, have addressed the mix design of GGBFS concrete. Some experimental investigations have been carried out to estimate the compressive strength of GGBFS concrete [^{3}, Oner and Akyuz [

In recent years, artificial intelligence (AI) and machine learning (ML) have become increasingly popular and are applied in numerous scientific fields [

Therefore, the primary purpose of this study is to propose an efficient RF model that improves the accuracy of compressive strength prediction for concrete containing GGBFS, drawing on a larger set of data samples collected from the literature. Moreover, an efficient RF architecture is identified by running numerous simulations, which increases the reliability of the RF model. Specifically, the performance of an ML model is strongly affected by the choice of parameters or architecture of the corresponding ML algorithm. Therefore, this study first determines the RF architecture that best predicts the compressive strength of concrete containing GGBFS. To this end, numerous experimental samples are gathered from the literature and randomly split into two parts, namely, the training part (70% of the data) and the testing part (30% of the data). The best RF architecture is then used to predict the compressive strength of concrete containing GGBFS and is evaluated with three statistical measures: the correlation coefficient (

Accurate prediction of the compressive strength of concrete containing supplementary cementitious materials, such as GGBFS, is crucial because of its many benefits for construction design. Although many machine learning models have been proposed to predict the compressive strength of concrete in the available literature (e.g., [

introduce the variability in the sampling process to construct the training and testing datasets

assess the prediction reliability of the RF model using Monte Carlo simulations

finely tune the hyperparameters to obtain the best RF model

compare the performance of the best model with 7 investigations published in the literature, confirming its simplicity and effectiveness

show a reliable variable importance analysis by taking the average results of 500 simulations

The experimental database used in this study is collected from published articles [_{1} to _{8}: cement content, kg/m^{3} (_{1}); water content, kg/m^{3} (_{2}); coarse aggregate, kg/m^{3} (_{3}); fine aggregate or sand, kg/m^{3} (_{4}); GGBFS content, kg/m^{3} (_{5}); hyperplasticizing, kg/m^{3} (_{6}); superplasticizer, % (_{7}); and age of samples, day (_{8}). The output variable of the present study is the compressive strength, MPa (denoted as

Detail of database collection.

No. | Reference | Data number | Shape of sample | Percentage (%) |
---|---|---|---|---|
1 | Oner and Akyuz [ | 168 | Cubic | 37.09 |
2 | Shariq et al. [ | 63 | Cubic | 13.91 |
3 | Chidiac and Panesar [ | 36 | Cylindrical | 7.95 |
4 | Boga et al. [ | 6 | Cubic | 1.32 |
5 | Bilim et al. [ | 180 | Cubic | 39.73 |
Total | | 453 | | 100 |

Histograms of the input variables used in this study: (a) cement content; (b) water content; (c) coarse aggregate content; (d) fine aggregate (or sand) content; (e) ground granulated blast furnace slag content; (f) carboxylic-type hyperplasticizing content; (g) superplasticizer content; and (h) testing age of samples.

The input variables from _{1} to _{5} are distributed over a wide range, while the variables _{6} to _{8} lie in a narrow range. Specifically, the cement content (_{1}) ranges from 70 to 360 kg/m^{3}, but it is mainly in the range of 180 to 270 kg/m^{3}. The highest sample count is about 79, which corresponds to 180 kg/m^{3} of cement content. Similarly, the water content (_{2}) ranges from 70 to 295 kg/m^{3}. As shown in Figure _{3}) is varied from about 400 to 1200 kg/m^{3}, but no sample has a coarse aggregate content in the range of 500 to 700 kg/m^{3}. The fine aggregate or sand content (_{4}) falls mainly in two ranges, from 500 to 950 kg/m^{3} and from 1150 to 1550 kg/m^{3}. The highest sample number (_{4}) corresponds to 680 kg/m^{3} of fine aggregate (or sand content). The GGBFS content (_{5}) varies from 40 to 460 kg/m^{3}, but the values are mostly in the range of 70 to 270 kg/m^{3}. The carboxylic-type hyperplasticizing content (_{6}) ranges from 2 to 14 kg/m^{3}. However, hyperplasticizing is not used in most cases, which account for about 330 of the 453 samples. Besides, almost all samples have zero superplasticizer content (_{7}); the six exceptions represent only about 1% of the database. The age of samples takes ten distinct values; the minimum age is one day, and the maximum age is 365 days.

The correlations between the inputs and compressive strength are plotted in Figure _{4} and _{6} for aggregate content and carboxylic-type hyperplasticizing content, respectively. Overall, the correlation between the inputs and compressive strength is relatively low. Therefore, all variables are included to increase the accuracy of the final model developed.

Multicorrelation graph of input and output variables used in this study.

Random forest (RF) [

In recent years, RF has been widely used because of its advantages over other algorithms: it can handle data with a large number of attributes, it can estimate the importance of those attributes, and it often achieves high accuracy in classification (or regression) with a fast learning process. In RF, each tree selects only a small set of attributes during construction (the second random step); this mechanism lets RF process datasets with a large number of attributes in an acceptable computation time. The user can set the number of attributes used to construct the trees in the forest; normally the optimal default is

_{1}, _{2},…, _{k} to build trees _{1}, _{2},…, _{k}

The architecture of the random forest algorithm.
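The two random steps of RF (resampling the training rows with replacement, then drawing a small subset of attributes) and the final averaging of the tree outputs can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library and is not the implementation used in this study; the function names, the sample sizes, and the choice of 3 attributes per draw are assumptions made for the example.

```python
import random
from statistics import fmean


def bootstrap_indices(n_samples, rng):
    """First random step: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, n_selected, rng):
    """Second random step: pick attribute indices without replacement.

    Standard RF repeats this draw at every split; it is drawn once here for brevity.
    """
    return sorted(rng.sample(range(n_features), n_selected))


def forest_predict(tree_predictions):
    """A regression forest aggregates by averaging the individual tree outputs."""
    return fmean(tree_predictions)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
n_samples, n_features = 317, 8    # assumed sizes: about 70% of 453 samples, 8 inputs
rows = bootstrap_indices(n_samples, rng)
cols = random_feature_subset(n_features, 3, rng)  # 3 attributes is a hypothetical choice
print(len(rows), len(set(rows)), cols)
print(forest_predict([41.2, 39.8, 40.5]))  # hypothetical outputs of three trees, in MPa
```

Because the rows are drawn with replacement, each bootstrap sample contains duplicates and leaves some rows out, which is what makes the individual trees differ from one another.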

Overall, the RF model is selected in this study because of its many advantages, such as prediction accuracy, fast simulation speed, robustness to noise, and resistance to overfitting [

In this study, three statistical criteria are used to evaluate the error between the actual value and the predicted value of the compressive strength of concrete, namely, correlation coefficient (_{0} and _{t} and
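The three criteria can be computed as in the sketch below. It assumes the standard textbook definitions (the Pearson correlation coefficient for the correlation coefficient, the root mean square error, and the mean absolute error); the function names and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def pearson_r(actual, predicted):
    """Pearson correlation coefficient (assumed definition of the correlation coefficient)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical compressive strengths in MPa, for illustration only.
actual = [30.0, 42.0, 55.0, 61.0]
predicted = [32.0, 40.0, 53.0, 64.0]
print(rmse(actual, predicted), mae(actual, predicted), pearson_r(actual, predicted))
```

RMSE and MAE carry the unit of the output (MPa here) and are better when smaller, whereas the correlation coefficient is dimensionless and is better when closer to 1.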

The methodology of constructing the RF model to predict the compressive strength of concrete containing GGBFS is described in Figure

Methodology flow chart.
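The random 70/30 division of the database, repeated once per simulation, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the seed are hypothetical, and the rounding rule used to obtain the training size is a choice made for the example.

```python
import random


def split_70_30(n_samples, rng, train_fraction=0.7):
    """Shuffle the sample indices and cut them into training and testing parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def monte_carlo_splits(n_samples, n_runs, seed=0):
    """Yield one independent random split per simulation."""
    rng = random.Random(seed)
    for _ in range(n_runs):
        yield split_70_30(n_samples, rng)


# Three runs are shown here; the study reports 500 simulations.
for run, (train_idx, test_idx) in enumerate(monte_carlo_splits(453, 3)):
    # A model would be fitted on train_idx and scored on test_idx at this point.
    print(run, len(train_idx), len(test_idx))
```

With 453 samples this rounding gives 317 training and 136 testing samples in every run; only the membership of the two parts changes from one simulation to the next.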

In this section, the RF architecture is determined through the mean square error (MSE), as shown in Figure

Values of MSE as a function of the number of grown trees and the leaf size.
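Selecting the architecture amounts to scanning a grid of tree numbers and leaf sizes and keeping the pair with the smallest MSE. The sketch below shows that selection step only. The grid values are hypothetical, and `toy_mse` is a made-up stand-in for the error surface, not data from this study; in practice `mse_of` would train a forest with the given settings and return its measured MSE.

```python
from itertools import product


def select_architecture(tree_counts, leaf_sizes, mse_of):
    """Return (mse, n_trees, leaf_size) for the grid point with the smallest MSE."""
    return min((mse_of(t, l), t, l) for t, l in product(tree_counts, leaf_sizes))


def toy_mse(n_trees, leaf_size):
    """Hypothetical error surface: error falls with more trees and rises with leaf size."""
    return 50.0 + 200.0 / n_trees + 2.0 * leaf_size


best_mse, best_trees, best_leaf = select_architecture(
    [50, 100, 200, 500],  # hypothetical candidate numbers of trees
    [1, 5, 10, 20],       # hypothetical candidate leaf sizes
    toy_mse,
)
print(best_trees, best_leaf, best_mse)
```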

In this section, the RF model performance is assessed by three criteria such as

Analysis of the results over 500 simulations (presented in average values with standard deviation) using different RF architectures: (a)

Summary of different quality assessment criteria over 500 simulations with the best RF architecture.

Criteria | Correlation coefficient (training set) | Correlation coefficient (testing set) | RMSE (training set) | RMSE (testing set) | MAE (training set) | MAE (testing set) |
---|---|---|---|---|---|---|
Min | 0.9700 | 0.9054 | 4.6951 | 4.9858 | 3.5261 | 3.9423 |
Average | 0.9744 | 0.9461 | 5.2203 | 7.2602 | 3.8367 | 5.3628 |
Max | 0.9805 | 0.9729 | 5.5260 | 9.6108 | 4.1365 | 6.8476 |
Std | 0.0016 | 0.0127 | 0.1251 | 0.7264 | 0.0992 | 0.4900 |
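The four summary rows (minimum, average, maximum, and standard deviation) are obtained by collecting one value of each criterion per simulation and then reducing the list. A minimal sketch follows; the per-run values are hypothetical, and the use of the sample standard deviation (n − 1 divisor) is an assumption.

```python
from statistics import fmean, stdev


def summarize(values):
    """Reduce one criterion collected over many simulations to four summary values."""
    return {
        "min": min(values),
        "average": fmean(values),
        "max": max(values),
        "std": stdev(values),  # sample standard deviation (assumed convention)
    }


testing_rmse = [6.8, 7.4, 7.1, 8.0, 6.9]  # hypothetical per-run values, in MPa
print(summarize(testing_rmse))
```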

Once the best architecture is found, this section is dedicated to the presentation of the best simulation using the RF algorithm. Figures

Regression graphs of the best predictor RF between experimental and predicted compressive strength: (a) training dataset and (b) testing dataset.

The comparison shows that the predicted value is very close to the experimental value. The model error is plotted between the predicted value and the experimental value for the training database (Figure

Error between target and output value plots for the case of the best RF architecture: (a) training dataset and (b) testing dataset.

Table

Summary of different quality assessment criteria for the best RF predictor.

Dataset | RMSE | MAE | Err. mean | Err. std | Correlation coefficient |
---|---|---|---|---|---|
Training set | 5.4480 | 4.1365 | −0.0563 | 5.4563 | 0.9759 |
Testing set | 4.9858 | 3.9423 | 0.6252 | 4.9647 | 0.9729 |
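The error mean, the error standard deviation, and the RMSE are linked: the mean squared error equals the squared error mean plus the population variance of the errors. The sketch below applies that identity as a consistency check. It assumes that Err. std is a sample standard deviation (n − 1 divisor) and that the 70/30 split of 453 samples gives 317 training and 136 testing samples; both are assumptions, and the function name is hypothetical.

```python
import math


def rmse_from_error_stats(err_mean, err_std, n):
    """RMSE implied by the mean and the sample standard deviation of n errors."""
    return math.sqrt(err_mean ** 2 + (n - 1) / n * err_std ** 2)


# Sample counts are assumed from a 70/30 split of 453 samples.
print(round(rmse_from_error_stats(-0.0563, 5.4563, 317), 4))  # training set
print(round(rmse_from_error_stats(0.6252, 4.9647, 136), 4))   # testing set
```

Under these assumptions, the training-set statistics reproduce the reported training RMSE of 5.4480, and the testing-set statistics give about 4.986, which agrees with the minimum testing RMSE obtained over the 500 simulations.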

Table

Comparison of different machine learning models for predicting compressive strength of concrete.

Reference | Machine learning algorithm | Input | Number of data | Performance measure |
---|---|---|---|---|
Saridemir et al. [ | ANN and fuzzy logic models ANFIS | 5 inputs: TA, C, GGBFS, W, and Agg. | 284 | |
Bilim et al. [ | ANN model | 6 inputs: C, GGBFS, W, SP, Agg., and TA | 225 | |
Kandiri et al. [ | Hybridized multiobjective ANN and a multiobjective salp swarm algorithm (MOSSA)/the M5P model tree algorithm | 7 inputs: C, GGBFS, GGBFS grade (SG), W, fine Agg., coarse Agg., and TA | 624 | |
Han et al. [ | ANN model | 7 inputs: curing temperature, W/binder, GGBFS/total binder, W, fine Agg., coarse Agg., SP | 269 | |
Boukhatem et al. [ | ANN model | 5 inputs: C, W/C, GGBFS, temperature, TA | 726 | |
Boğa et al. [ | ANN model and the adaptive neuro-fuzzy inference system (ANFIS) | 4 inputs: cure type, curing period, BFS ratio, CNI ratio | 162 | |
This work | RF model | 8 inputs: C, W, coarse Agg. or gravel, fine Agg. or sand, GGBFS, FA, SP, TA | 453 | |

C: cement; GGBFS: ground granulated blast furnace slag; W: water; SP: superplasticizer; TA: age of samples; FA: fly ash; Agg.: aggregate.

Finally, Figure _{1}), water content (_{2}), coarse aggregate (_{3}), fine aggregate or sand (_{4}), GGBFS content (_{5}), hyperplasticizing (_{6}), superplasticizer (_{7}), and age of samples (_{8}). After 500 simulations, the average value of _{7} is the smallest, whereas the average value of _{8} is the highest. The results show that the superplasticizer content has the smallest effect on the compressive strength of GGBFS concrete, which depends mainly on the testing age of the samples. More importantly, unlike most previously published results, the analysis shown in Figure

Feature importance over 500 simulations.
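Averaging the importance scores over many simulations, rather than reading them from a single run, reduces the influence of any one random split. The sketch below shows the averaging and ranking step only. The feature labels follow the eight inputs of this study, but the scores are hypothetical numbers and the function names are assumptions made for the example.

```python
from statistics import fmean

FEATURES = [
    "cement", "water", "coarse aggregate", "fine aggregate",
    "GGBFS", "hyperplasticizing", "superplasticizer", "age",
]


def average_importance(runs):
    """Average each feature's score across runs (one list of scores per run)."""
    return [fmean(column) for column in zip(*runs)]


def rank_features(names, scores):
    """Sort features from the most to the least important."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical importance scores from three runs, for illustration only.
runs = [
    [0.9, 0.6, 0.3, 0.4, 0.7, 0.2, 0.05, 1.8],
    [1.0, 0.5, 0.4, 0.3, 0.8, 0.3, 0.04, 1.7],
    [0.8, 0.7, 0.3, 0.5, 0.6, 0.2, 0.06, 1.9],
]
for name, score in rank_features(FEATURES, average_importance(runs)):
    print(f"{name}: {score:.3f}")
```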

In this investigation, the RF algorithm is used to predict the compressive strength of concrete containing GGBFS. A total of 453 experimental samples are gathered to develop the RF model. The database is randomly divided into two parts: 70% for training and 30% for testing, the latter serving to validate the constructed RF model. To fully assess the RF model performance, 500 simulations are performed using random sampling. The results show that the RF architecture with 500 grown trees and a leaf size of 1 is an excellent architecture for predicting the compressive strength of concrete containing GGBFS, in which the mean values of

Several short-term research directions of the present work could be mentioned. First, although the effectiveness of the RF model is clearly shown in this study, the model’s applicability can be improved by collecting more data samples with a broader range of input and output variables. This could be conducted based on the investigations of Golafshani and Behnood [

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.