This research carries out a comparative study of a machine learning solution that employs Gaussian Process Regression (GPR) to model the compressive strength of high-performance concrete (HPC). The approach is used to establish the nonlinear functional mapping between the HPC ingredients and the compressive strength. To train and verify the prediction model, a data set of 239 HPC experimental tests, recorded from an overpass construction project in Danang City (Vietnam), has been collected. Based on the experimental outcomes, the prediction results of the GPR model are superior to those of the Least Squares Support Vector Machine and the Artificial Neural Network. Furthermore, the GPR model is strongly recommended for estimating HPC strength because it demonstrates good learning performance and inherently delivers prediction outputs coupled with prediction intervals.
In the construction industry, high-performance concrete (HPC) has been widely used in high-rise building and infrastructure projects for its superior strength, durability, and workability, which exceed those of normal concrete [
The compressive strength is determined through a standard uniaxial compression test. If the test result does not meet the design strength, remediation actions must be undertaken. Corrective actions for underground concrete structures, such as concrete piles or foundations, can be very costly. As a result, accurate estimation of the compressive strength before placement is a practical need of construction engineers.
As the relationships between concrete components and compressive strength are complex and highly nonlinear, mathematical modeling of HPC is very challenging and oftentimes inaccurate [
Accordingly, this research extends the body of knowledge by evaluating the capability of the Gaussian Process Regression (GPR) [
Due to the importance of the research topic, HPC compressive strength modeling has been a very active research area, and various artificial intelligence (AI) techniques have been applied to the problem. Previous studies indicate that AI techniques have proved their superior capability over traditional modeling methods. ANN is the most common modeling method [
Sophisticated AI based systems have also been developed to fit particular HPC data sets. Słoński [
Cheng et al. [
This research employs a data set consisting of 239 test results of HPC specimens. All the experimental tests were performed on 15 cm cylindrical specimens prepared according to the Vietnamese standard (TCVN 3105: 1993), which is broadly similar to the American standard ASTM C39. The amounts of cement (kg/m3), sand (kg/m3), small coarse aggregate (kg/m3), medium coarse aggregate (kg/m3), water (liter/m3), and superplasticizer (liter/m3) are the batch components used to describe a concrete sample. The age of each sample is measured in days.
Statistical descriptions of HPC test are reported in Table
Concrete components and statistical descriptions.
HPC input factor (IF) | Notation | Min | Mean | Std. dev. | Max |
---|---|---|---|---|---|
Cement (kg/m3) | IF1 | 350.0 | 447.4 | 25.0 | 498.0 |
Fine aggregate (kg/m3) | IF2 | 666.0 | 728.9 | 37.8 | 879.0 |
Small coarse aggregate (kg/m3) | IF3 | 0.0 | 347.2 | 55.9 | 424.0 |
Medium coarse aggregate (kg/m3) | IF4 | 626.0 | 721.3 | 61.4 | 1060.0 |
Water (liter/m3) | IF5 | 134.0 | 178.8 | 20.4 | 207.0 |
Superplasticizer (liter/m3) | IF6 | 3.5 | 5.1 | 0.6 | 7.0 |
Concrete age (days) | IF7 | 3.0 | 15.1 | 10.9 | 28.0 |
Compressive strength (MPa) | CS | 23.6 | 42.5 | 13.5 | 85.2 |
GPR is a probabilistic, nonparametric supervised learning approach for generalizing the complex, nonlinear function mappings hidden in data sets. It has recently received considerable attention from researchers in various disciplines [
Given a training set
In GPR methodology, the
Given the training data set, the ultimate goal of the learning process is to predict the output value
Due to the assumption that the data is sampled from a multivariate Gaussian distribution, we have the following expression:
Since
When the hyperparameters of the kernel function are specified, the model parameters, including
In this experiment, the data set of HPC testing samples has been divided into two sets: the training set (90%) used for model construction and the testing set (10%) used for model testing. Prior to the training process, it is necessary to specify the hyperparameters of the GPR model. These hyperparameters include the initial value for the standard deviation of the noise
To select the maximum allowable covariance
The training set is further separated into two subsets: subset 1 (90%) and subset 2 (10%); and a grid search procedure described in Algorithm
Establishing Establishing PM =
Train GPR model with GPR model prediction with PM( // RMSE denotes the Root Mean Squared Error
Finding the best set of
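The hold-out grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a squared exponential kernel with three hyperparameters (maximum allowable covariance `sigma_f`, length parameter `length`, and noise standard deviation `sigma_n`), and the function names, candidate grids, and synthetic data are hypothetical. Each candidate triple is used to fit the model on subset 1 and is scored by the RMSE on subset 2.

```python
import itertools

import numpy as np


def se_kernel(a, b, sigma_f, length):
    """Squared exponential covariance (an assumed kernel choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)


def gpr_mean(x_tr, y_tr, x_te, sigma_f, length, sigma_n):
    """Predictive mean of a zero-mean GP fitted to mean-centered targets."""
    k = se_kernel(x_tr, x_tr, sigma_f, length) + sigma_n**2 * np.eye(len(x_tr))
    alpha = np.linalg.solve(k, y_tr - y_tr.mean())
    return y_tr.mean() + se_kernel(x_te, x_tr, sigma_f, length) @ alpha


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def grid_search(x, y, grid, holdout=0.1, seed=0):
    """Return (best RMSE, best hyperparameters) on a random hold-out subset."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_val = max(1, int(round(holdout * len(x))))
    val, fit = idx[:n_val], idx[n_val:]  # subset 2 and subset 1
    best = None
    for sigma_f, length, sigma_n in itertools.product(*grid):
        pred = gpr_mean(x[fit], y[fit], x[val], sigma_f, length, sigma_n)
        err = rmse(y[val], pred)
        if best is None or err < best[0]:
            best = (err, (sigma_f, length, sigma_n))
    return best


# Synthetic stand-in data; the HPC data set itself is not reproduced here.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(60, 2))
y = np.sin(4.0 * x[:, 0]) + x[:, 1] + 0.05 * rng.normal(size=60)

# Hypothetical candidate values for (sigma_f, length, sigma_n).
grid = ([0.5, 1.0, 2.0], [0.1, 0.3, 1.0], [0.05, 0.1])
best_rmse, best_params = grid_search(x, y, grid)
print(best_rmse, best_params)
```

In practice the inputs would normally be scaled to comparable ranges before the search, since a single length parameter is shared by all input factors in this sketch.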
With the three aforementioned hyperparameters, the training process of GPR can be executed. The constructed model is then used to predict the data instances in the testing set. The prediction outcome of the GPR testing phase is illustrated in Figure
GPR prediction result in the testing phase.
GPR prediction result with prediction interval.
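To illustrate how a GPR model yields a prediction together with a prediction interval, the following minimal sketch computes the Gaussian predictive mean and a 95% interval. It assumes a squared exponential kernel and uses the standard GP predictive equations; the function names, hyperparameter values, and synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np


def se_kernel(a, b, sigma_f, length):
    """Squared exponential covariance (an assumed kernel choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)


def gpr_predict(x_tr, y_tr, x_te, sigma_f=1.0, length=1.0, sigma_n=0.1, z=1.96):
    """Return predictive mean and lower/upper bounds (z=1.96 gives about 95%)."""
    y_mean = y_tr.mean()
    # Covariance of the noisy training targets.
    k = se_kernel(x_tr, x_tr, sigma_f, length) + sigma_n**2 * np.eye(len(x_tr))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_tr - y_mean))
    k_star = se_kernel(x_te, x_tr, sigma_f, length)
    mean = y_mean + k_star @ alpha
    # Predictive variance of a new noisy observation at each test point.
    v = np.linalg.solve(chol, k_star.T)
    var = sigma_f**2 + sigma_n**2 - np.sum(v**2, axis=0)
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, mean - z * std, mean + z * std


# Synthetic one-dimensional data standing in for the HPC samples.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(30, 1))
y_train = np.sin(6.0 * x_train[:, 0]) + 0.1 * rng.normal(size=30)
x_test = np.array([[0.25], [0.5], [10.0]])

mean, lower, upper = gpr_predict(
    x_train, y_train, x_test, sigma_f=1.0, length=0.3, sigma_n=0.1
)
for m, lo, hi in zip(mean, lower, upper):
    print(f"{m:.2f} [{lo:.2f}, {hi:.2f}]")
```

The interval widens for test points far from the training data (the third point here), which is the behavior that makes the interval useful as a reliability indicator.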
In this section of the article, to better evaluate the performance of the GPR model, the ANN [
To establish an ANN model, the number of neurons in the hidden layer must be determined in advance, and this parameter significantly influences the ANN prediction capability. To specify an appropriate model structure for ANN, the hidden layer starts with seven neurons (equal to the number of input factors) and is then gradually increased to a maximum of 30 neurons. The log-sigmoid function is commonly employed as the activation function and the Levenberg-Marquardt algorithm is utilized to train the ANN [
As mentioned earlier, the data set is randomly divided into two sets: a training set (90%) and a testing set (10%), consisting of 215 and 24 cases, respectively. Nevertheless, to reduce the effect of randomness in testing sample selection and to compare the model performances reliably, a 10-fold cross validation process is performed [
Prediction results of the GPR model and the two benchmark models obtained from the ten-fold cross validation process are reported in Table
Prediction result comparison.
Model | Criteria | Training phase (average) | Training phase (std. dev.) | Testing phase (average) | Testing phase (std. dev.) |
---|---|---|---|---|---|
GPR | RMSE | 4.06 | 1.29 | 4.04 | 0.47 |
GPR | MAPE (%) | 5.02 | 1.86 | 5.14 | 0.89 |
GPR | R² | 0.90 | 0.07 | 0.90 | 0.05 |
LSSVM | RMSE | 4.46 | 0.58 | 4.63 | 0.62 |
LSSVM | MAPE (%) | 5.67 | 0.92 | 5.94 | 0.86 |
LSSVM | R² | 0.89 | 0.02 | 0.87 | 0.05 |
ANN | RMSE | 5.07 | 0.87 | 5.21 | 1.85 |
ANN | MAPE (%) | 6.56 | 1.23 | 6.34 | 2.32 |
ANN | R² | 0.85 | 0.14 | 0.81 | 0.15 |
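For reference, the three criteria and the 10-fold partition can be sketched as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the coefficient of determination is assumed to follow the common definition 1 - SS_res / SS_tot, which may differ from the exact formula used in the study.

```python
import math
import random


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# With 239 samples, each fold holds out 23 or 24 cases for testing.
for train, test in k_fold_indices(239, k=10):
    print(len(train), len(test))
```

Averaging each criterion over the ten folds, and taking its standard deviation, gives the kind of summary reported in the table above.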
The outcomes of the GPR, LSSVM, and ANN models attained from the cross validation process are graphically reported in Figures
GPR prediction result obtained from the 10-fold cross validation.
LSSVM prediction result obtained from the 10-fold cross validation.
ANN prediction result obtained from the 10-fold cross validation.
This research has investigated the capability of the GPR model for the task of HPC compressive strength prediction. To construct and verify the machine learning model, a data set of actual HPC compressive tests has been collected for this study. Based on experimental results, the GPR model has achieved the most desirable performance with comparatively low prediction errors (RMSE = 4.04, MAPE = 5.15%) and a high coefficient of determination
One significant advantage of GPR over the benchmark methods is that it delivers the estimated compressive strength coupled with a prediction interval. This property helps construction engineers assess the strength of HPC mixtures reliably. Therefore, the GPR model is recommended as a promising tool to assist construction engineers in concrete mixture design.
Despite the aforementioned advantages of GPR, one limitation of the study is that the employed approach is a black-box prediction model, which may make it difficult for civil engineers to understand the model structure. In addition, the current data set should be expanded by collecting more test results of HPC samples to further improve the generalization of the prediction model.
Future extensions of this research may include applying GPR to other prediction and modeling tasks in civil engineering, investigating the effects of novel covariance functions on GPR performance, and developing new techniques to improve the model's learning capability. In addition, studying other machine learning techniques with transparent model structures, such as instance-based learning or regression trees, to improve model interpretability is a research direction worth investigating.
The authors (Nhat-Duc Hoang, Anh-Duc Pham, Quoc-Lam Nguyen, and Quang-Nhat Pham) declare that there is no conflict of interests regarding the publication of this article.