Machine learning methods have been successfully applied in many engineering disciplines. Predicting the concrete compressive strength (f_{c}) and slump (S) is important for assessing the quality and sustainability of concrete. The goals of this study were (i) to determine the most successful normalization technique for the datasets, (ii) to select the prime regression method to predict the f_{c} and S outputs, (iii) to obtain the best subset with the ReliefF feature selection method, and (iv) to compare the regression results for the original and selected subsets. Experimental results demonstrate that the decimal scaling and min-max normalization techniques are the most successful methods for predicting the compressive strength and slump outputs, respectively. According to the evaluation metrics, namely, the correlation coefficient, root mean squared error, and mean absolute error, the fuzzy logic method makes better predictions than any other regression method. Moreover, when the number of input variables was reduced from seven to four by the ReliefF feature selection method, the prediction accuracy remained within an acceptable error rate.
1. Introduction
Concrete is a complex composite material whose properties are difficult to predict, which makes it challenging to model them as functions of the effect variables. The biggest challenge of experimental design is the large number of effect variables that influence the response variables: each additional effect variable increases the number of trials, and a large number of uncontrollable variables makes it difficult to obtain the true response function.
Generally, the one-factor-at-a-time method is used in experimental designs to determine the concrete properties. The major disadvantage of this approach is that it ignores the interactions between the factors (interaction terms). The higher the number of controlled and uncontrolled effect variables that influence the concrete properties, the lower the prediction accuracy. Nevertheless, a few experimental designs that account for the controllable effect variables and the interaction terms between them have been suggested [1].
Machine learning (ML) is a highly multidisciplinary field and consists of various methods for obtaining new information [2]. ML is most often used for prediction. Predicting the categorical variable values is called classification, whereas predicting the numerical variable values is called regression. Regression is the process of analyzing the relationship between one or more independent variables and a dependent variable [3].
In recent years, ML methods have become popular because they allow researchers to improve the prediction accuracy of concrete properties [4], and they are used in various engineering applications [5, 6]. ML methods have been applied to increase the prediction accuracy of concrete properties [7–15], mostly using data derived from literature sources. In contrast, Chopra et al. [16, 17] applied data generated under controlled laboratory conditions.
Regression models tend to be used for the prediction of the compressive strength of high-strength concrete [18, 19]. These models also demonstrate how the concrete compressive strength depends on the mixing ratios [20]. Topçu and Sarıdemir [21] and Başyiğit et al. [22] developed models using the neural network (NN) and fuzzy logic (FL) methods to improve the prediction accuracy of the compressive strength of the mineral-additive (fly ash) concrete and heavy-weight concrete. Both studies concluded that the compressive strength could be predicted using the models developed with the NN and FL methods without any further experiments. Several studies have reported that NN is more successful than other data mining methods at enhancing the prediction accuracy of the concrete compressive strength [15, 17, 23–26]. Khademi et al. [27] compared the multiple linear regression, neural network, and adaptive neuro-fuzzy inference system (ANFIS) methods for estimating the 28-day concrete compressive strength and reported that the NN and ANFIS models provide reliable results.
Previous studies evaluated the amount of the concrete component materials and compared their results to the published data. In this study, the ML regression methods were compared to predict the compressive strength and slump values of the cube samples. The samples were prepared by accounting for seven simultaneously controllable effect variables in the laboratory. The study aimed to determine the most successful regression method by comparing the decision tree (DT), random forest (RF), support vector machine (SVM), partial least squares (PLS), artificial neural networks (ANN), bootstrap aggregation (bagging), and FL models for the prediction of the concrete compressive strength and slump values. The R, RMSE, and MAE metrics were used to compare the prediction accuracy of the developed models. Finally, feature reduction was accomplished by the feature selection method. Then, the model’s success rates were compared to predict the compressive strength and slump value using fewer variables.
2. Materials and Methods
2.1. Experimental Datasets
Datasets used for this study comprised seven input variables (i.e., W/C, C, f_{cc}, FA, k_{k}, CA, and TA) and two output (response) variables (i.e., f_{c} and S) for two different maximum aggregate sizes, D_{max} = 22.4 mm (D_{224}) and D_{max} = 11.2 mm (D_{112}). The input variables were selected considering the simultaneously controllable effect variables [28–30]. A D-optimal design obtained by the augmentation of the fractional factorial design (2^{7-3}) was used as the experimental design. In the D-optimal design, 58 and 56 test results were employed for D_{112} and D_{224}, respectively. Each experimental result was calculated as the average of three sample results produced under laboratory conditions [28–30]. Properties of the constituents are given in Table 1 [28–30]. Abbreviations of the effect and response variables and basic statistics of the datasets are presented in Table 2.
Properties of the constituents.

| Group | Material | Type | Fineness modulus, k (−) | Particle density, ρ (kg/m^{3}) | Water absorption, μ (kg/kg) | Compressive strength, f_{cc} (MPa) | Blaine specific surface, σ (m^{2}/kg) |
| Aggregate | Basalt | Crushed stone II | 10.456 | 2872 | 0.0100 | — | — |
| | | Crushed stone I | 9.129 | 2878 | 0.0130 | — | — |
| | | Crushed stone sand | 5.198 | 2845 | 0.0220 | — | — |
| | Limestone | Crushed stone II | 10.181 | 2600 | 0.0120 | — | — |
| | | Crushed stone I | 7.107 | 2590 | 0.0170 | — | — |
| | | Crushed stone sand | 4.791 | 2550 | 0.0260 | — | — |
| | | Sand | 3.770 | 2600 | 0.0140 | — | — |
| Binding material | Cement | CEM V/A (S-P) 32.5 N | — | 2990 | 0.0000 | 34.4 | 416.0 |
| | | SDC 32.5 R | — | 3160 | 0.0000 | 44.75 | 339.0 |
| | | CEM I 42.5 R | — | 3140 | 0.0000 | 55.1 | 379.0 |
| Admixture | Super plasticizer | — | — | 1100 | 0.0000 | — | — |
Basic statistics of the datasets used.

| Data | Attribute | Abbreviation | Unit | Min | Max | μ | σ | σ^{2} |
| D_{112} | Water/cement | W/C | % | 54.95 | 59.88 | 57.38 | 2.07 | 4.29 |
| | Cement content | C | kg | 330.00 | 345.00 | 337.72 | 6.31 | 39.76 |
| | Compressive strength of cement | f_{cc} | MPa | 34.40 | 55.10 | 44.75 | 9.09 | 82.69 |
| | Fine aggregate | FA | % | 65.00 | 68.00 | 66.47 | 1.27 | 1.62 |
| | Fineness modulus | k_{k} | — | 5.60 | 5.80 | 5.70 | 0.07 | 0.01 |
| | Chemical admixture | CA | % | 1.20 | 1.40 | 1.30 | 0.08 | 0.01 |
| | Concrete compressive strength | f_{c} | MPa | 19.86 | 44.19 | 33.30 | 6.91 | 47.81 |
| | Slump value | S | cm | 1.20 | 23.20 | 12.35 | 7.06 | 49.85 |
| | Type of aggregate | TA | — | 0: limestone, 1: basalt | | | | |
| D_{224} | Water/cement | W/C | % | 50.00 | 54.95 | 52.60 | 2.11 | 4.46 |
| | Cement content | C | kg | 330.00 | 345.00 | 337.63 | 6.49 | 42.14 |
| | Compressive strength of cement | f_{cc} | MPa | 34.40 | 55.10 | 45.12 | 9.25 | 85.56 |
| | Fine aggregate | FA | % | 48.00 | 54.00 | 51.00 | 2.36 | 5.56 |
| | Fineness modulus | k_{k} | — | 6.60 | 6.80 | 6.70 | 0.09 | 0.01 |
| | Chemical admixture | CA | % | 1.20 | 1.40 | 1.230 | 0.09 | 0.01 |
| | Concrete compressive strength | f_{c} | MPa | 26.59 | 53.87 | 40.38 | 8.12 | 65.92 |
| | Slump value | S | cm | 2.60 | 21.70 | 13.33 | 6.56 | 43.00 |
| | Type of aggregate | TA | — | 0: limestone, 1: basalt | | | | |
μ: mean, σ: standard deviation, and σ^{2}: variance.
3. Methods
In this study, the concrete compressive strength and slump values were predicted using the ML regression models, namely, decision tree (DT), random forest (RF), support vector machine (SVM), partial least squares (PLS), artificial neural network (ANN), bagging, and fuzzy logic (FL). Datasets were randomly split into a training set (70%) and an independent test set (30%). The training data were used to train the ML models, and the independent test data were used to evaluate model performance. A 10-fold cross-validation procedure was used to estimate the skill of the ML models.
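As a rough illustration of this protocol, the 70/30 split and the k-fold partitioning can be sketched in Python (a minimal stdlib-only sketch; the function names and the fixed seed are illustrative, not from the study, which performed its computations in R):

```python
import random

def split_train_test(data, test_frac=0.3, seed=42):
    """Randomly split a dataset into a training and an independent test set."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_test = int(round(len(data) * test_frac))
    test_idx = set(idx[:n_test])
    train = [row for i, row in enumerate(data) if i not in test_idx]
    test = [row for i, row in enumerate(data) if i in test_idx]
    return train, test

def k_folds(data, k=10, seed=42):
    """Yield (train, validation) pairs for k-fold cross-validation:
    every instance appears in exactly one validation fold."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for f in folds:
        held = set(f)
        train = [data[i] for i in idx if i not in held]
        valid = [data[i] for i in f]
        yield train, valid
```

The test set produced by `split_train_test` is held out entirely, while `k_folds` repartitions only the remaining data during model tuning.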
The ML preprocessing steps were applied to the raw datasets before they were used to train the regression methods. The datasets were not normally distributed according to the Shapiro–Wilk normality test [31] results. Many normalization methods have been developed previously [32]. In this study, four different normalization methods (i.e., min-max, decimal, sigmoid, and z-score) were applied to identify the most successful normalization method for the raw data. Then, the K-nearest neighbor (KNN) regression method was applied to the normalized datasets, and the prediction results were compared to determine the most suitable normalization method. Finally, the raw datasets were normalized with the selected technique.
The ML regression models were trained to predict the f_{c} and S values. The correlation coefficient (R), root mean squared error (RMSE), and mean absolute error (MAE) metrics were employed to compare the models’ prediction performance. According to these statistical results, the most successful regression method was determined to predict the f_{c} and S values. Afterward, the feature selection method was used to obtain the subset with fewer features, and the prediction accuracy was examined. All regression methods and computations were performed using the R programming language [33]. The prediction process is illustrated in Figure 1 in the form of a flow diagram.
Flow diagram of the prediction process.
3.1. Normalization Methods
Normalization is a preprocessing step in ML. Normalization methods are used when the variation intervals of the variables in the dataset differ. When the means and variances of the variables differ significantly, variables with a large mean and variance dominate the other variables, so the influence of important variables with small variation intervals may be lost. This can also reduce the success of the ML models [34, 35]. Therefore, the data are normalized by numerical normalization methods to standardize the effect of each variable on the results. In this study, the dataset was normalized by the min-max, decimal, sigmoid, and z-score normalization techniques, and their performances were then compared.
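The four techniques can be sketched as follows (a stdlib-only Python sketch; function names are illustrative, and the decimal-scaling variant assumes nonzero data):

```python
import math

def min_max(x):
    """Min-max normalization: rescale values into [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def decimal_scaling(x):
    """Decimal scaling: divide by 10^j, the smallest power of ten
    that brings every |value| to at most 1 (assumes nonzero data)."""
    j = math.ceil(math.log10(max(abs(v) for v in x)))
    return [v / (10 ** j) for v in x]

def z_score(x):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sd for v in x]

def sigmoid_norm(x):
    """Sigmoid normalization: squash z-scores into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(x)]
```

Each function maps a raw column onto a comparable scale, so no single variable dominates a distance-based learner such as KNN.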
3.2. Machine Learning Methods
An ML regression method estimates the output value from the input samples of the dataset; the samples used for this purpose are termed the training set. The purpose of the regression method is to minimize the error between the predicted and actual outputs [36]. Herein, seven different regression methods (i.e., DT, RF, support vector machine, partial least squares, artificial neural networks, bootstrap aggregation (bagging), and FL) were used to predict the concrete compressive strength and slump values. Additionally, the K-nearest neighbor method was applied to determine the suitable normalization method for the dataset. These methods are briefly described below.
Decision tree (DT) [37] is a supervised ML algorithm that can be used for both regression and classification. The aim of the DT algorithm is to divide the dataset into smaller, meaningful pieces, where each input has its own class label (tag) or value. Different measures, such as the Gini index and information gain, are used for DT splitting. A regression tree is a type of DT and a hierarchical model for supervised learning. Classification and regression trees (CART), ID3, and C4.5 are the most important learning algorithms in the literature. In this study, the CART [38] model was used for the regression.
Random forest (RF) [39] is an ensemble method that combines many DTs and can be used for both regression and classification. Each DT in the forest is created by selecting different samples from the original dataset with the bootstrap technique; these samples are then trained using a set of attributes selected by the bagging mechanism. Subsequently, the decisions of the many individual trees are combined by voting, and the majority class is returned as the ensemble's estimate (for regression, the individual tree predictions are averaged).
Support vector machine (SVM) was developed by Vapnik [40]. It is applied both for regression and classification. The SVM method is based on finding an optimal hyperplane that maximizes the margin between the classes.
Partial least squares (PLS) [41] regression generalizes and combines attributes from principal component analysis and multiple regression. The most important characteristic of the PLS method is its ability to obtain a simple model with a few components, even when the predictor variables are highly correlated (collinear).
Artificial neural networks (ANN) [42] involve a system of many interconnected neurons joined by weighted links. The ANN architecture consists of the input, hidden, and output layers. The multilayer perceptron (MLP) is a fully connected, feedforward type of network and is the most widely used network architecture. The outputs of the input-layer neurons are scaled by the related connection weights and fed forward through the hidden layer(s) to the output layer, where an activation function is applied to the weighted sum of the incoming signals.
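A single forward pass of such an MLP reduces to weighted sums and activations. The sketch below (illustrative Python, assuming one sigmoid hidden layer and a linear output neuron, a common regression setup rather than the exact architecture of this study) shows the idea:

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer perceptron:
    weighted sums in the hidden layer, sigmoid activation, then a
    linear output neuron."""
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out
```

Training would adjust `w_hidden`, `b_hidden`, `w_out`, and `b_out` (e.g., by backpropagation) to minimize the prediction error.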
Bootstrap aggregation (bagging) was introduced by Breiman [43] and can be utilized for both regression and classification. Bagging trains a model on each of several bootstrap samples drawn from the training set and aggregates the resulting prediction rules.
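For regression, the aggregation step is typically an average of the individual predictions. A minimal sketch (illustrative Python; the one-split regression stump used as the base learner here is an assumption for brevity, not the base model of the study):

```python
import random

def fit_stump(points):
    """Fit a one-split regression tree (stump) on (x, y) pairs: choose the
    threshold on x that minimizes the squared error of the two leaf means."""
    pts = sorted(points)
    best = None
    for i in range(1, len(pts)):
        left, right = pts[:i], pts[i:]
        ml = sum(y for _, y in left) / len(left)
        mr = sum(y for _, y in right) / len(right)
        sse = (sum((y - ml) ** 2 for _, y in left)
               + sum((y - mr) ** 2 for _, y in right))
        if best is None or sse < best[0]:
            thr = (pts[i - 1][0] + pts[i][0]) / 2
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def bagging_fit(points, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap resample (drawn with
    replacement) and average the stump predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return lambda x: sum(m(x) for m in models) / len(models)
```

Averaging over bootstrap-trained learners reduces the variance of an unstable base model such as a tree.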
Fuzzy logic (FL) is an ML method introduced by Zadeh [44]. FL is a mathematically based method that analyzes systems in a manner similar to human reasoning; it was developed because many problems cannot be expressed by exact mathematical definitions. In the classical approach, an element either is or is not a member of a set, so the result equals zero or one. In FL, however, membership degrees express the element's involvement in the set: a membership function maps each element into the continuous interval from zero to one, so the membership degree can take any value in that range. A typical fuzzy system consists of a rule base, membership functions, and an inference procedure. In this study, Wang and Mendel's technique (WM) was employed to generate the fuzzy rules.
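The WM procedure can be illustrated on a toy one-input, one-output problem (a simplified Python sketch with triangular membership functions and centroid defuzzification; the region count and function names are illustrative assumptions, not the study's configuration):

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def wang_mendel_fit(points, n_regions=5):
    """Simplified Wang-Mendel rule generation for one input, one output:
    partition both domains into overlapping triangular fuzzy regions, map
    each training pair to its max-membership regions, and keep, per input
    region, the rule with the highest degree (conflict resolution)."""
    def regions(vals):
        lo, hi = min(vals), max(vals)
        step = (hi - lo) / (n_regions - 1)
        return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
                for i in range(n_regions)]
    xr = regions([x for x, _ in points])
    yr = regions([y for _, y in points])
    rules = {}  # input region index -> (rule degree, output region centre)
    for x, y in points:
        mx = [tri(x, *r) for r in xr]
        my = [tri(y, *r) for r in yr]
        i = max(range(n_regions), key=lambda k: mx[k])
        j = max(range(n_regions), key=lambda k: my[k])
        degree = mx[i] * my[j]
        if i not in rules or degree > rules[i][0]:
            rules[i] = (degree, yr[j][1])
    def predict(x):
        # centroid (weighted-average) defuzzification over the fired rules
        num = den = 0.0
        for i, (_, centre) in rules.items():
            w = tri(x, *xr[i])
            num += w * centre
            den += w
        return num / den if den else sum(c for _, c in rules.values()) / len(rules)
    return predict
```

On smooth data the rule base behaves like a piecewise-linear interpolator, which is why a small number of rules can approximate the response surface.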
K-nearest neighbor (KNN) [45] is an instance-based algorithm that can be applied for both regression and classification. The KNN method searches for the k data points closest to the test object and uses the features of these neighbors to predict the new object. For this, a distance is measured between each instance in the training dataset and the test instance. Herein, k = 3, 5, and 7 were chosen, and the Euclidean distance was used as the distance measure. The "knn.reg" function from the "FNN" package [46] was used. Detailed information on the ML regression methods applied in this study is presented in Table 3.
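The regression variant of KNN reduces to averaging the targets of the nearest neighbours (an illustrative Python sketch of the computation, analogous in spirit to what `knn.reg` performs; the data layout of (feature tuple, target) pairs is an assumption):

```python
import math

def knn_predict(train, query, k=3):
    """k-nearest-neighbour regression: average the targets of the k
    training points closest to the query under the Euclidean distance."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return sum(y for _, y in nearest) / k
```

Because the prediction is driven entirely by distances, the normalization step described above directly changes which neighbours are selected.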
Hyperparameters of machine learning regression models.
3.3. Evaluation Metrics
To evaluate the predicted values of the regression methods, the actual and predicted values were compared. In this study, the R, RMSE, and MAE metrics were used to evaluate the prediction accuracy [47]. The model parameters were optimized for the highest R and the lowest RMSE and MAE. The metrics were calculated according to the following equations:

(1) R = \frac{\sum_{i=1}^{N}\left(\mathrm{actual}_i-\overline{\mathrm{actual}}\right)\left(\mathrm{predicted}_i-\overline{\mathrm{predicted}}\right)}{\sqrt{\sum_{i=1}^{N}\left(\mathrm{actual}_i-\overline{\mathrm{actual}}\right)^{2}\sum_{i=1}^{N}\left(\mathrm{predicted}_i-\overline{\mathrm{predicted}}\right)^{2}}}

(2) \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{predicted}_i-\mathrm{actual}_i\right)^{2}}

(3) \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\mathrm{predicted}_i-\mathrm{actual}_i\right|
Here, N is the number of data points.
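These three metrics follow directly from the equations (a stdlib-only Python sketch; the function names are illustrative):

```python
import math

def r_coef(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    num = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    den = math.sqrt(sum((a - ma) ** 2 for a in actual)
                    * sum((p - mp) ** 2 for p in predicted))
    return num / den

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)
```

Note that R measures only linear association, so a model can score a high R while being biased; reporting RMSE and MAE alongside it guards against that.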
3.4. Feature Selection
Feature selection (reduction of irrelevant variables) is a preprocessing step in ML that selects the best subset from the original dataset by evaluating the features according to the chosen algorithm [48]. The Relief algorithm was developed by Kira and Rendell [49]. It weights the features according to their relationship with the effect variables. Although this method was applied successfully to two-class datasets, it did not work for datasets with multiple classes. To solve this problem, in 1994, Kononenko developed the ReliefF algorithm, which works for multiclass datasets [50]. The algorithm determines the weights of continuous and discrete attributes based on the distances between instances.
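The weight-update idea behind the original two-class Relief can be sketched as follows (illustrative Python; this is the basic Relief of Kira and Rendell, not the full multiclass/regression ReliefF variant used in the study):

```python
import random

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief: for each sampled instance, find its nearest
    hit (same class) and nearest miss (other class), and increase the weight
    of features that differ on the miss but agree on the hit."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    # rescale per-feature differences so no unit dominates the distance
    spans = [max(col) - min(col) or 1.0 for col in zip(*X)]
    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]
    def nearest(i, same):
        cands = [j for j in range(len(X)) if j != i and (y[j] == y[i]) == same]
        return min(cands,
                   key=lambda j: sum(diff(f, X[i], X[j]) for f in range(n_feat)))
    w = [0.0] * n_feat
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hit, miss = nearest(i, True), nearest(i, False)
        for f in range(n_feat):
            w[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n_iter
    return w
```

Features with high resulting weights separate the classes locally, which is the criterion used to rank and prune the effect variables.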
4. Results and Discussion
The cross-correlation between the variables of the D_{112} and D_{224} datasets is depicted in Figure 2. The correlation coefficient provides information on the strength and direction of the linear relationship between two variables. The Pearson correlation is used when the data have a normal distribution, whereas the Spearman correlation is applied when the data are not normally distributed.
Correlation matrix of D_{112} (a) and D_{224} (b) datasets.
According to the correlation results for the D_{112} dataset, the response variable f_{c} is highly correlated with the effect variable f_{cc} (0.88), and the highest correlation with the response variable S is observed for the effect variable TA (−0.57 for basalt and 0.57 for limestone). Likewise, for the D_{224} dataset, the response variable f_{c} is highly correlated with the effect variable f_{cc} (0.91), and the highest correlation with the response variable S is obtained for the effect variable TA (−0.55 for basalt and 0.55 for limestone).
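The two coefficients mentioned above differ only in whether they are computed on the raw values or on their ranks (an illustrative Python sketch; the rank helper ignores ties, which in general require average ranks):

```python
import math

def pearson(x, y):
    """Pearson correlation on the raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks
    (no tie handling in this sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))
```

For a monotone but nonlinear relationship, Spearman reaches 1 while Pearson stays below it, which is why Spearman is preferred for nonnormal data.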
Before the analysis begins, the data must be checked for normality. In this study, the normality test was performed using the Shapiro–Wilk normality test with the Gaussian error [51]. In the Shapiro–Wilk test, when the probability is >0.05, the data are normally distributed, whereas when the probability is <0.05, the distribution is nonnormal. A small W value in the Shapiro–Wilk test indicates that the sample is not normally distributed. The Shapiro–Wilk normality test results for the D_{112} and D_{224} datasets are presented in Table 4.
Shapiro–Wilk normality test results for the datasets.

| Variable | D_{112} P value | D_{112} W | D_{224} P value | D_{224} W |
| W/C | 1.21 · 10^{−7} | 0.792 | 9.46 · 10^{−8} | 0.780 |
| C | 1.36 · 10^{−7} | 0.794 | 8.11 · 10^{−8} | 0.777 |
| f_{cc} | 2.75 · 10^{−8} | 0.764 | 2.32 · 10^{−8} | 0.752 |
| FA | 7.11 · 10^{−8} | 0.782 | 3.73 · 10^{−7} | 0.805 |
| k_{k} | 3.40 · 10^{−7} | 0.810 | 6.73 · 10^{−8} | 0.773 |
| CA | 9.28 · 10^{−8} | 0.787 | 4.74 · 10^{−8} | 0.767 |
According to the Shapiro–Wilk normality test results (Table 4), the D_{112} and D_{224} datasets are not normally distributed: the probabilities of all variables are <0.05 and the W values are low for both datasets. Furthermore, the box-plot graphs (Figure 3) confirm that the data are not normally distributed. Box plots allow both the value ranges and the distributions of the numeric variables to be examined.
Box-plot graphs for the raw and decimal normalized dataset.
In this study, four different normalization techniques, namely, min-max, decimal, sigmoid, and z-score, were applied to the four datasets, and the most successful method was determined. After the datasets were normalized by these methods, their success rates were compared using the KNN regression method, which was chosen because it is distance-based and fast to apply. The k-values were set to 3, 5, and 7. The results are provided in Table 5. According to the KNN regression results, the f_{c} (D_{112}, D_{224}) and S (D_{112}, D_{224}) values were best normalized by the decimal scaling and min-max normalization methods, respectively.
The results of the normalization methods.

| Method | Normalization | D_{112}_f_{c} RMSE | D_{112}_f_{c} MAE | D_{112}_S RMSE | D_{112}_S MAE | D_{224}_f_{c} RMSE | D_{224}_f_{c} MAE | D_{224}_S RMSE | D_{224}_S MAE |
| 3NN | Min-max | 5.32 | 3.96 | 19.93 | 18.24 | 5.51 | 3.42 | 27.87 | 26.34 |
| | Decimal | 3.13 | 2.41 | 25.46 | 24.79 | 3.39 | 2.70 | 29.09 | 27.80 |
| | Sigmoid | 5.21 | 3.35 | 26.41 | 25.21 | 5.65 | 3.61 | 28.13 | 26.64 |
| | Z-norm | 5.22 | 3.62 | 25.84 | 24.68 | 5.60 | 3.50 | 28.22 | 26.69 |
| 5NN | Min-max | 6.53 | 5.09 | 19.70 | 17.99 | 5.33 | 4.64 | 30.17 | 28.92 |
| | Decimal | 2.78 | 2.32 | 23.44 | 22.61 | 3.46 | 3.07 | 34.17 | 33.12 |
| | Sigmoid | 6.03 | 5.24 | 22.82 | 21.26 | 5.50 | 4.66 | 30.73 | 29.56 |
| | Z-norm | 6.01 | 5.22 | 22.81 | 21.16 | 5.66 | 4.87 | 30.34 | 29.22 |
| 7NN | Min-max | 5.51 | 4.36 | 20.25 | 19.09 | 5.51 | 4.57 | 27.61 | 26.53 |
| | Decimal | 3.60 | 2.89 | 21.97 | 21.26 | 3.78 | 3.12 | 29.40 | 28.18 |
| | Sigmoid | 5.69 | 4.77 | 21.65 | 20.70 | 5.50 | 4.25 | 26.63 | 25.66 |
| | Z-norm | 5.63 | 4.72 | 21.63 | 20.67 | 5.46 | 4.20 | 26.63 | 25.64 |
The results of the RF, SVM linear (SVMLin), SVM polynomial (SVMPoly), PLS, bagging, DT, ANN (MLP), and FL models for the prediction of the compressive strength and the slump value are presented in Table 6.
Metric results of the different regression methods.

| Dataset | Metric | RF | SVMLin | SVMPoly | PLS | Bagging | DT | ANN | FL |
| D_{112}_f_{c} | R | 0.916 | 0.912 | 0.920 | 0.907 | 0.915 | 0.857 | 0.932 | 0.945 |
| | RMSE | 2.362 | 2.518 | 3.046 | 2.604 | 2.419 | 2.878 | 2.855 | 1.090 |
| | MAE | 1.957 | 1.837 | 2.423 | 2.001 | 2.117 | 2.511 | 2.625 | 0.933 |
| D_{112}_S | R | 0.833 | 0.758 | 0.761 | 0.705 | 0.705 | 0.693 | 0.897 | 0.947 |
| | RMSE | 4.748 | 4.983 | 5.094 | 5.380 | 6.100 | 5.942 | 2.686 | 2.477 |
| | MAE | 4.302 | 3.702 | 3.933 | 4.476 | 5.776 | 5.465 | 3.409 | 1.954 |
| D_{224}_f_{c} | R | 0.853 | 0.816 | 0.816 | 0.779 | 0.736 | 0.408 | 0.899 | 0.928 |
| | RMSE | 2.054 | 3.285 | 2.943 | 3.243 | 2.689 | 3.008 | 2.107 | 1.442 |
| | MAE | 1.641 | 2.678 | 2.316 | 2.724 | 2.192 | 2.364 | 2.926 | 0.995 |
| D_{224}_S | R | 0.772 | 0.654 | 0.765 | 0.645 | 0.730 | 0.518 | 0.896 | 0.977 |
| | RMSE | 3.778 | 5.114 | 4.005 | 5.015 | 4.094 | 5.424 | 2.534 | 1.413 |
| | MAE | 2.428 | 3.994 | 2.708 | 4.050 | 3.254 | 4.442 | 3.842 | 1.152 |
These regression methods were selected because they have been employed successfully for prediction in the published literature. As mentioned earlier, the datasets were randomly divided into training (70%) and independent test (30%) sets; the training and test sets consisted of 40 and 18 instances for the D_{112} dataset and 39 and 17 instances for the D_{224} dataset, respectively. Prediction results for the models were obtained from the 10-fold cross-validation process. The performance of the regression methods was evaluated according to the R, RMSE, and MAE statistical criteria. R was employed to evaluate the goodness of fit between the predicted and actual values, and the combination of the R, RMSE, and MAE results was sufficient to reveal any significant differences between them.
According to the statistical results of the regression methods (Table 6), the FL regression model delivered the highest accuracy for the prediction of the response variables f_{c} and S for both maximum aggregate sizes (D_{112} and D_{224}). The FL model achieved the best results on all performance criteria compared with the seven benchmark models.
The prediction results obtained from the FL regression model and actual results are depicted in Figures 4 and 5. The prediction values for the compressive strength and slump are similar to the actual values.
Comparison between actual and predicted values of f_{c} and S values using the FL model for the D_{112} dataset.
Comparison between actual and predicted values of f_{c} and S values using the FL model for the D_{224} dataset.
To reduce the number of the effect variables, the ReliefF feature selection method was used to determine the high-level effect variables. As a result, the f_{cc}, k_{k}, C, and W/C effect variables were selected as they had a high-level effect on f_{c} and S for the maximum aggregate size. The results of R, RMSE, and MAE obtained after applying the FL model to all the effect variables and reduced effect variables are presented in Table 7. This table also indicates that there is no significant change in the R, RMSE, and MAE results when the number of features was reduced from seven to four. Therefore, the FL model with fewer features can still make successful predictions.
The results of the FL regression for the original dataset and selected subset.

| Dataset | Metric | FL | FL after FS |
| D_{112}_f_{c} | R | 0.945 | 0.962 |
| | RMSE | 1.090 | 1.245 |
| | MAE | 0.933 | 0.961 |
| D_{112}_S | R | 0.947 | 0.946 |
| | RMSE | 2.477 | 2.546 |
| | MAE | 1.954 | 1.514 |
| D_{224}_f_{c} | R | 0.928 | 0.954 |
| | RMSE | 1.442 | 1.461 |
| | MAE | 0.995 | 1.062 |
| D_{224}_S | R | 0.977 | 0.927 |
| | RMSE | 1.413 | 1.743 |
| | MAE | 1.152 | 1.183 |
The effect levels of the simultaneously controllable effect variables on the response variables exhibit some variations [28–30]. Considering the selected variation intervals, the f_{cc}, k_{k}, C, and W/C variables had a significant effect level on the response variables for the maximum aggregate sizes. Cement strength (f_{cc}), cement dosage (C), and water/cement (W/C) ratio tend to have a significant effect on the compressive strength. Furthermore, the fineness modulus (k_{k}), which expresses the fineness and distribution of the mixture aggregate, is one of the essential variables that affects the concrete compactness. Moreover, the concrete compactness directly affects the compressive strength.
The workability of concrete is directly influenced by the cement properties (e.g., cement fineness), aggregate properties (e.g., roughness of the aggregate surface), and amount of mixing water. The chemical admixture would ordinarily be expected to have a significant effect on workability; however, its variation interval in this study is narrow, so its effect on the workability of concrete is negligible, whereas the variation intervals of the other effect variables are considerable. Therefore, the prediction accuracy does not decrease when the FA, CA, and TA variables, which have no significant effect on the response variables within the selected variation intervals, are excluded.
5. Conclusions
The goals of this study were (i) to determine the most successful normalization technique for the datasets, (ii) to obtain the prime regression method to predict the f_{c} and S values, (iii) to choose the best subset using the ReliefF feature selection method, and (iv) to compare the regression results for the original and selected subsets.
To determine the effect levels of the effect variables on the response variables (i.e., f_{c} and S) with precision, the data were first tested for normality; if the data are not normally distributed, the most appropriate normalization method must be determined. In this study, the Shapiro–Wilk normality test results demonstrated that the datasets were not normally distributed. The most successful techniques for the f_{c} (D_{112}, D_{224}) and S (D_{112}, D_{224}) values were the decimal scaling and min-max normalization methods, respectively. Because the variation ranges of the effect variables influencing the concrete properties varied substantially, preprocessing the raw data was necessary for estimating the concrete properties.
Herein, seven different ML methods, namely, DT, RF, SVM, PLS, ANN, bagging, and FL, were applied to predict the f_{c} and S values. According to the R, RMSE, and MAE statistical results, FL is the best regression method for both maximum aggregate sizes. Generally, the similarity between the actual and predicted values is high for the compressive strength (Figures 4 and 5). The larger difference between the actual and predicted slump values indicates that the slump is more sensitive to experimental error, simultaneously uncontrollable effect variables, and the variation intervals of the effect variables. The flexible computational structure of FL approximates results rather than providing exact ones; in particular, the uncertainties in problem-solving and decision-making processes can be handled by FL. Thus, complicated problems can be solved, making FL more functional than the other ML methods considered here.
In experimental designs where the number of simultaneously controllable effect variables is high, it is crucial to reduce the number of experiments to save costs and time. Therefore, predicted values close to the actual values need to be obtained with the minimum number of experiments. In this study, seven simultaneously controllable effect variables were reduced to four (i.e., f_{cc}, k_{k}, C, and W/C) using the ReliefF feature selection method. The metric results obtained by the FL regression were similar for four and seven effect variables (Table 7). Therefore, experimental designs with fewer effect variables are sufficient for estimating the concrete properties.
Data Availability
Previously reported “compressive strength and slump of concrete” data were used to support this study and are available in the author’s PhD thesis, report, and article. These prior studies (and datasets) are cited at relevant places within the text as references [28–30].
Conflicts of Interest
The author declares that there are no conflicts of interest.
References
[1] Montgomery, D. C.
[2] Awad, M., Khanna, R.
[3] Hofmann, M., Klinkenberg, R.
[4] Boukhatem, B., Kenai, S., Tagnit-Hamou, A., Ghrici, M. Application of new information technology on concrete: an overview / Naujų informacinių technologijų naudojimas ruošiant betoną. Apžvalga.
[5] Cihan, P., Gökçe, E., Kalıpsız, O. A review of machine learning applications in veterinary field.
[6] Ozbas, E. E., Aksu, D., Ongen, A., Aydin, M. A., Ozcan, H. K. Hydrogen production via biomass gasification, and modeling by supervised machine learning algorithms.
[7] Ni, H.-G., Wang, J.-Z. Prediction of compressive strength of concrete by neural networks.
[8] Akkurt, S., Tayfur, G., Can, S. Fuzzy logic model for the prediction of cement compressive strength.
[9] Öztaş, A., Pala, M., Özbay, E. A., Kanca, E., Caglar, N., Bhatti, M. A. Predicting the compressive strength and slump of high strength concrete using neural network.
[10] Pala, M., Özbay, E., Öztaş, A., Yuce, M. I. Appraisal of long-term effects of fly ash and silica fume on compressive strength of concrete by neural networks.
[11] Ozturan, M., Kutlu, B., Ozturan, T. Comparison of concrete strength prediction techniques with artificial neural network approach.
[12] Alshihri, M. M., Azmy, A. M., El-Bisy, M. S. Neural networks for predicting compressive strength of structural light weight concrete.
[13] Sarıdemir, M. Prediction of compressive strength of concretes containing metakaolin and silica fume by artificial neural networks.
[14] Diab, A. M., Elyamany, H. E., Abd Elmoaty, A. E. M., Shalan, A. H. Prediction of concrete compressive strength due to long term sulfate attack using neural network.
[15] Chopra, P., Sharma, R. K., Kumar, M. Artificial neural networks for the prediction of compressive strength of concrete.
[16] Chopra, P., Sharma, R. K., Kumar, M. Prediction of compressive strength of concrete using artificial neural network and genetic programming.
[17] Chopra, P., Sharma, R. K., Kumar, M., Chopra, T. Comparison of machine learning techniques for the prediction of compressive strength of concrete.
[18] Wu, S., Li, B., Yang, J., Shukla, S. Predictive modeling of high-performance concrete with regression analysis. Proceedings of the 2010 IEEE International Conference on Industrial Engineering and Engineering Management, December 2010, Macao, China, 1009–1013. doi: 10.1109/IEEM.2010.5674229.
[19] Zain, M. F. M., Abd, S. M. Multiple regression model for compressive strength prediction of high performance concrete.
[20] Namyong, J., Sangchun, Y., Hongbum, C. Prediction of compressive strength of in-situ concrete based on mixture proportions.
[21] Topçu, İ. B., Sarıdemir, M. Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic.
[22] Başyigit, C., Akkurt, I., Kilincarslan, S., Beycioglu, A. Prediction of compressive strength of heavyweight concrete by ANN and FL models.
[23] Wankhade, M., Kambekar, A. Prediction of compressive strength of concrete using artificial neural network.
[24] Chopra, P., Sharma, R. K., Kumar, M. Predicting compressive strength of concrete for varying workability using regression models.
[25] Nikoo, M., Moghadam, F. T., Sadowski, Ł. Prediction of concrete compressive strength by evolutionary artificial neural networks.
[26] Khashman, A., Akpinar, P. Non-destructive prediction of concrete compressive strength using neural networks.
[27] Khademi, F., Akbari, M., Jamal, S. M., Nikoo, M. Multiple linear regression, artificial neural network, and fuzzy logic prediction of 28 days compressive strength of concrete.
[28] Güner, A., Cihan, M. T. Tepki yüzeyi yöntem bilgisinin beton uygulamasında kullanılabilirliğinin geliştirilmesi [Improving the applicability of response surface methodology in concrete applications]. Ph.D. thesis, Fen Bilimleri Enstitüsü, Yıldız Teknik Üniversitesi, Istanbul, Turkey, 2012.
[29] (entry not recoverable from the source text)
[30] Cihan, M. T., Güner, A., Yüzer, N. Response surfaces for compressive strength of concrete.
[31] Royston, P. Approximating the Shapiro-Wilk W-test for non-normality.
[32] Dutka, A. F., Hansen, H. H.
[33] Ihaka, R., Gentleman, R. R: a language for data analysis and graphics.
[34] Shalabi, L. A., Shaaban, Z., Kasasbeh, B. Data mining: a preprocessing engine.
[35] Cihan, P., Kalipsiz, O., Gökçe, E. Hayvan hastalığı teşhisinde normalizasyon tekniklerinin yapay sinir ağı ve özellik seçim performansına etkisi [The effect of normalization techniques on artificial neural network and feature selection performance in animal disease diagnosis].
[36] Alpaydin, E.
[37] Quinlan, J. R. Induction of decision trees.
[38] Breiman, L., Friedman, J., Olshen, R., Stone, C.
[39] Liaw, A., Wiener, M. Classification and regression by randomForest.
[40] Vapnik, V.
[41] Wold, S., Ruhe, A., Wold, H., Dunn, W. J., III. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses.
[42] Zurada, J. M.
[43] Breiman, L. Bagging predictors.
[44] Zadeh, L. A. Fuzzy sets.
[45] Kramer, O.
[46] Beygelzimer, A., Kakadet, S., Langford, J., Arya, S., Mount, D., Li, S.
[47] Chai, T., Draxler, R. R. Root mean square error (RMSE) or mean absolute error (MAE)?—arguments against avoiding RMSE in the literature.
[48] Alpaydin, E.
[49] Kira, K., Rendell, L. A. A practical approach to feature selection.
[50] Kononenko, I. Estimating attributes: analysis and extensions of Relief.
[51] Royston, J. P. An extension of Shapiro and Wilk's W test for normality to large samples.