Artificial neural networks (ANNs) have been the preferred choice for modeling complex and nonlinear material behavior where conventional mathematical approaches do not yield the desired accuracy and predictability. Despite their popularity as universal function approximators and their wide range of applications, no specific rules have been formulated for deciding the architecture of a neural network catering to a specific modeling task. This paper presents a methodology for the automated design of neural network architecture, replacing the conventional trial and error technique of finding the optimal neural network. The stochastic search of genetic algorithms (GA) has been harnessed to evolve the optimum number of hidden layer neurons, transfer function, learning rate, and momentum coefficient for a backpropagation ANN. The methodology has been applied to modeling the slump of ready mix concrete based on its design mix constituents, namely, cement, fly ash, sand, coarse aggregates, admixture, and water-binder ratio. Six different statistical performance measures have been used for evaluating the performance of the trained neural networks. The study showed that, in comparison to the conventional trial and error technique of deciding the neural network architecture and training parameters, the neural network architecture evolved through GA was of reduced complexity and provided better prediction performance.
Cement concrete is one of the most widely used construction materials in the world today. The material modeling of concrete is a difficult task owing to its composite nature. Although various empirical relationships in the form of regression equations have been derived from experimental results and are widely in use, these do not provide accuracy and predictability where the interactions among the variables are unknown, complex, or nonlinear in nature. Artificial neural networks (ANNs), touted as the next generation of computing, have been the preferred choice over the last few decades for modeling unstructured problems pertaining to material behavior. Notable applications of ANN in modeling the properties of concrete include predicting and modeling the compressive strength of high performance concrete [
Known for its design mix precision leading to better quality concrete and its ease of transportation and laying at the construction site, ready mix concrete (RMC) has emerged as a preferred concrete construction product catering to the requirements of the end user. One of the properties of concrete that play an important role in the success of RMC is its workability. Workability of concrete is determined by the effort required to lay and compact the freshly prepared concrete at the construction site with minimum loss of homogeneity. The workability of concrete, quantitatively measured in terms of the slump value, is an important quality parameter in the RMC industry. The slump of concrete depends on the concrete's design mix proportion. As with other properties of concrete, the slump value exhibits a highly nonlinear and complex functional relationship with the concrete's constituents. Recent studies, such as prediction of the slump and strength of ready mix concrete containing retarders and of high strength concrete containing silica fume and plasticizers [
Besides the ANN applications in modeling the behavior of concrete discussed above, there are many multidisciplinary applications of ANN which are beyond the scope of this paper. The rapid growth in the field of neural networks is attributed to their "black box" nature, which allows them to be applied to almost any problem without requiring knowledge of the underlying relationships among the input and output variables. In spite of their popularity as universal function approximators, neural networks are still designed using a trial and error approach. Generally, iterative techniques using different combinations of the number of hidden layers and hidden layer neurons are employed in conjunction with different learning rates, momentum coefficients, and transfer functions to arrive at an optimal neural network design. This technique of designing a neural network is therefore time consuming and relies heavily on the experience of the designer. In order to reduce the effort and time in designing the optimal neural network architecture and its training parameters, various studies on the automatic design of neural networks have been successfully performed in the past by harnessing the stochastic search ability of genetic algorithms [
The study has been organized into sections. Section
The exemplar data for ANN was collected from the same RMC plant to mitigate any chance of change in the slump data due to change in the physical and chemical properties of the concrete design mix constituents. The collected data comprised 560 concrete design mix proportions and their corresponding slump test values. The design mix proportions included weight per m^{3} of cement, pulverized fly ash (PFA), sand (as fine aggregate), coarse aggregate 20 mm, coarse aggregate 10 mm, admixture, and water-binder ratio. The range (maximum and minimum values) of the RMC data used in the study is shown in Table
Range of RMC data used for neural network modeling.
RMC data  Maximum  Minimum 

Cement (kg/m^{3})  425  100 
Fly ash (kg/m^{3})  220  0 
Sand (kg/m^{3})  900  550 
Coarse aggregate 20 mm (kg/m^{3})  788  58 
Coarse aggregate 10 mm (kg/m^{3})  771  343 
Admixture (kg/m^{3})  5.5  1.0 
Water-binder ratio  0.76  0.36 
Concrete slump (mm)  175  110 
For conducting the study, the Neural Network Toolbox and Global Optimization Toolbox included in the commercially available software MATLAB R2011b (version 7.13.0.564) were used to implement the BPNN and GA, respectively.
ANN derives its learning capability through training on input-output data pairs and its subsequent generalization ability when subjected to unseen data. The training and generalization of neural networks are accomplished using a training data set and a validation data set, respectively. The robustness of the trained and validated neural network is tested using a test data set. This is accomplished by dividing the entire data into three disjoint sets, namely, the training, validation, and test data sets. The available data was randomized, 70% of the data was designated as the training data set, and the remaining 30% was divided equally to create the validation and test data sets.
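The randomized 70/15/15 partition described above can be sketched as follows (a minimal illustration; the function and variable names are ours, not the study's):

```python
import random

def split_data(pairs, train_frac=0.70, seed=42):
    """Shuffle the data and split it into training (70%), validation
    (15%), and test (15%) sets, as described in the text."""
    rng = random.Random(seed)
    shuffled = pairs[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    rest = shuffled[n_train:]
    n_val = len(rest) // 2         # remaining 30% divided equally
    return shuffled[:n_train], rest[:n_val], rest[n_val:]

# e.g. 560 design-mix records, as in the study
records = list(range(560))
train, val, test = split_data(records)
```

With 560 records this yields 392 training, 84 validation, and 84 test cases.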
The data used for training, validation, and testing of neural networks comprises input and corresponding output features of different identities, which normally have minimum similarities. Moreover, the range (minimum-maximum values) of the data for each input and output component is also quite different. In order to scale down all the inputs and outputs into a particular bounded range, preferably −1 to +1 or 0 to +1, data normalization is performed. This type of normalization has the advantage of preserving all relationships in the data exactly, and it does not introduce any bias [
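The linear min-max rescaling described above can be sketched per feature column as follows (an illustrative helper, not the study's code; the cement values come from the table above):

```python
def normalize(values, lo=-1.0, hi=1.0):
    """Linearly rescale a feature column into [lo, hi] using its minimum
    and maximum, preserving all relative relationships in the data."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

# e.g. the cement column ranges from 100 to 425 kg/m^3
cement = [100.0, 262.5, 425.0]
scaled = normalize(cement)
```

The minimum maps to −1, the maximum to +1, and the midpoint to 0.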
An artificial neural network is an information processing paradigm which presents a computational analogy inspired by the human brain. An ANN consists of processing elements, called artificial neurons, which are arranged in layers. The computational structure of an artificial neuron comprises several inputs, which are weighted, summed together with a bias, and passed through a transfer function to produce the neuron's output.
Mathematical model of an artificial neuron.
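The neuron model in the figure can be written out directly: a weighted sum of the inputs plus a bias, passed through a transfer function (tangent hyperbolic here, as used for the hidden layer in this study; the numeric values are illustrative):

```python
import math

def neuron(inputs, weights, bias, transfer=math.tanh):
    """Output of a single artificial neuron: weighted sum of inputs
    plus bias, passed through the transfer function."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(activation)

y = neuron([0.5, -0.2], [1.0, 2.0], 0.1)  # tanh(0.5 - 0.4 + 0.1) = tanh(0.2)
```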
The architecture of a neural network consists of three basic layers, denoted the "input layer," the "output layer," and a number of intermediate "hidden layer/s." In the case of multilayer feedforward neural networks (MFNN), the neurons in each layer are connected in the forward direction only, and no intralayer connections between neurons are permitted. The input features form the neurons of the input layer, and the output features are represented by the neurons of the output layer. The number of hidden layers and hidden layer neurons depends on the number of training cases, the amount of noise, and the degree of complexity of the function or classification to be learnt [
MFNNs trained using backpropagation (BP) algorithms are commonly used for tasks associated with function approximation and pattern recognition. Backpropagation algorithm, in essence, is a means of updating neural network synaptic weights by backpropagating a gradient vector in which each element is defined as the derivative of an error measure with respect to a parameter [
A suitable learning rate and momentum coefficient are employed for efficient learning of the network. A higher learning rate leads to faster training, but it produces large oscillations in the weight changes which may force the ANN model to overshoot the optimal weight values. On the other hand, a lower learning rate makes convergence slower and increases the probability of the ANN model getting trapped in a local minimum. The momentum term effectively filters out the high frequency variations of the error surface in the weight space, since it adds the effect of past weight changes to the current direction of movement in the weight space [
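The interplay of the two parameters can be sketched as the standard momentum-augmented weight update (a sketch under the usual gradient descent with momentum formulation; the learning rate 0.45 and momentum 0.85 are the values reported later in this study, the gradient value is illustrative):

```python
def update_weight(w, grad, prev_delta, lr=0.45, momentum=0.85):
    """One backpropagation weight update with momentum: the new step
    combines the negative gradient, scaled by the learning rate, with
    a fraction (the momentum coefficient) of the previous step."""
    delta = -lr * grad + momentum * prev_delta
    return w + delta, delta

w, d = 0.3, 0.0
w, d = update_weight(w, grad=0.1, prev_delta=d)  # first step: pure gradient
w, d = update_weight(w, grad=0.1, prev_delta=d)  # second step adds momentum
```

With a constant gradient the momentum term makes successive steps larger, accelerating movement along a consistent descent direction while damping oscillatory components.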
In the present study, the input layer consists of seven neurons, namely, cement, fly ash, sand, coarse aggregate (20 mm), coarse aggregate (10 mm), admixture content, and water-binder ratio. The output layer comprises a single neuron representing the slump value corresponding to the seven input neurons defined above. In this study eleven single hidden layer neural network architectures of different complexities, with hidden layer neurons varying in the range 5 to 20, have been used for evolving the optimal neural network architecture. The neural network architecture with five hidden layer neurons for the present study is shown in Figure
Single hidden layer neural network with five hidden layer neurons.
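The forward pass of the 7-5-1 network in the figure can be sketched as follows (tanh hidden neurons and a linear output neuron; all weight and input values here are illustrative placeholders, not trained parameters):

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer feedforward network:
    each hidden neuron applies tanh to its weighted input sum plus
    bias; the output neuron is a linear combination of the hidden
    outputs plus its own bias."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# 7 inputs: cement, fly ash, sand, CA 20 mm, CA 10 mm, admixture, w/b ratio
x = [0.1] * 7
w_hidden = [[0.2] * 7 for _ in range(5)]   # 5 hidden neurons
y = forward(x, w_hidden, [0.0] * 5, [0.3] * 5, 0.05)
```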
The neural networks were trained using the training data set. The information was presented to the neural network through the input layer neurons, propagated in the forward direction, and processed by the hidden layer neurons. The network's response at the output layer was evaluated and compared with the actual output. The error between the actual and predicted response of the neural network was computed and propagated in the backward direction to adjust the weights and biases of the neural network. Using the BP algorithm, the weights and biases were adjusted so as to render the error a minimum value. In the present study, the Levenberg-Marquardt backpropagation algorithm, a fast-converging algorithm preferred for supervised learning, has been used as the training algorithm. During the training process, the neural network has a tendency to overfit the training data. This leads to poor generalization when the trained neural network is presented with unseen data. The validation data set is used to test the generalization ability of the trained neural network at each iteration cycle. Early stopping of neural network training is generally undertaken to avoid overfitting or overtraining of the neural network. In this technique the validation error is monitored at each iteration cycle along with the training error, and the training is stopped once the validation error begins to increase. The neural network having the least validation error is selected as the optimal one.
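The early-stopping procedure described above can be sketched as a generic training loop (the `train_step` and `val_error` callbacks and the toy error curve are illustrative stand-ins for one BP epoch and a validation-set evaluation):

```python
def train_with_early_stopping(train_step, val_error, max_epochs=1000, patience=1):
    """Train one epoch at a time, monitor the validation error, and stop
    once it begins to increase, keeping the epoch with the least
    validation error."""
    best_err, best_epoch, worse = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_step()                   # one BP pass over the training set
        err = val_error()
        if err < best_err:
            best_err, best_epoch, worse = err, epoch, 0
        else:
            worse += 1
            if worse >= patience:      # validation error started rising
                break
    return best_epoch, best_err

# toy validation-error curve that falls and then rises
curve = iter([0.9, 0.5, 0.3, 0.4, 0.6])
epoch, err = train_with_early_stopping(lambda: None, lambda: next(curve))
```

Here training stops at the fourth epoch, and the parameters from the third epoch (lowest validation error) would be retained.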
Genetic algorithm (GA), inspired by Darwin's principle of "survival of the fittest," is a global search and optimization algorithm which makes use of genetic and evolutionary operators. GA is a population based search technique that works simultaneously on a number of probable solutions to a problem and uses probabilistic operators to narrow the search down to the region with the maximum possibility of finding an optimal solution. GA presents a blend of exploration and exploitation of the solution search space by harnessing computational models of evolutionary processes such as selection, crossover, and mutation. GAs outperform conventional optimization techniques in searching nonlinear and noncontinuous spaces, which are characterized by abstract or poorly understood expert knowledge [
Automatic design of the neural network architecture and its training parameters is accomplished by amalgamating GA with the ANN during its training process. The methodology uses GA to evolve the neural network's hidden layer neurons and transfer function along with its learning rate and momentum coefficient. The BP algorithm then uses these ANN design variables to compute the training error. The training process is monitored at each iteration by computing the validation error, and training is stopped once the validation error starts to increase. This process is repeated a number of times until the optimum neural network architecture and training parameters are evolved. The steps of this methodology are presented as a flow chart in Figure
Evolving neural network architecture and training parameters using GA.
The size of the population is chosen so as to promote the evolution of an optimal set of solutions to a particular problem. A large initial population of chromosomes tends to increase the computational time, whereas a small population size leads to poor quality solutions. The population size must therefore be chosen to strike a balance between computational effort and solution quality. In the present study an initial population size of 50 chromosomes is used.
The mutation operator modifies the existing building blocks of the chromosomes, maintaining genetic diversity in the population. It therefore prevents the GA from getting trapped at a local minimum. In contrast to crossover, which exploits the current solutions, mutation aids the exploration of the search space. Too high a mutation rate expands the search space to the point that convergence to the global optimum becomes difficult, whereas too low a mutation rate drastically reduces the search space and eventually leads the genetic algorithm to get stuck in a local optimum. The present study uses uniform mutation with a mutation rate of 0.02. The procedure of creating new populations of chromosomes is continued until the maximum generation limit is reached or the fitness function reaches a saturation level. The maximum number of generations used for the present study is 150.
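The GA loop described in this section can be sketched end to end: a population of 50 chromosomes encoding the four ANN design variables, uniform crossover and mutation (rate 0.02), and 150 generations. The fitness function here is a surrogate; in the study each chromosome's fitness is the validation RMSE of a BPNN trained with that design, which is far too expensive to reproduce in a sketch. All names and the surrogate's assumed optimum are ours.

```python
import random

rng = random.Random(1)

# The four ANN design variables evolved by the GA in the study.
TRANSFERS = ["tansig", "logsig"]

def random_chromosome():
    return {"neurons": rng.randint(5, 20),
            "transfer": rng.choice(TRANSFERS),
            "lr": rng.uniform(0.05, 0.9),
            "momentum": rng.uniform(0.1, 0.95)}

def fitness(c):
    # Surrogate for "train a BPNN with this design and return its
    # validation RMSE": rewards designs near an assumed optimum.
    return (abs(c["neurons"] - 9) / 15
            + abs(c["lr"] - 0.45)
            + abs(c["momentum"] - 0.85)
            + (0.5 if c["transfer"] != "tansig" else 0.0))

def crossover(a, b):
    # Uniform crossover: each gene is copied from either parent.
    return {k: rng.choice([a[k], b[k]]) for k in a}

def mutate(c, rate=0.02):
    # Uniform mutation: each gene is re-randomized with probability 0.02.
    m = dict(c)
    for k in m:
        if rng.random() < rate:
            m[k] = random_chromosome()[k]
    return m

def evolve(pop_size=50, generations=150):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                   # lower fitness = better
        parents = pop[:pop_size // 2]           # keep the fitter half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
```

Because the fitter half of each population is carried over unchanged, the best design found so far is never lost; crossover recombines good genes while mutation keeps exploring.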
The study uses six different statistical performance metrics for evaluating the performance of the trained models: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of correlation (R), coefficient of determination (R^{2}), and the RMSE-to-observations standard deviation ratio (RSR).
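The six measures reported in the tables below (MAE, RMSE, MAPE, R, R^{2}, RSR) can be computed as follows (a sketch using the standard definitions of these statistics; the sample values are illustrative, not data from the study):

```python
import math

def performance_metrics(observed, predicted):
    """MAE, RMSE, MAPE, R, R^2, and RSR for a set of predictions."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)   # coefficient of correlation
    std_o = math.sqrt(var_o / n)         # standard deviation of observations
    rsr = rmse / std_o                   # RMSE-to-std-deviation ratio
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape,
            "R": r, "R2": r * r, "RSR": rsr}

m = performance_metrics([150.0, 160.0, 170.0], [152.0, 158.0, 171.0])
```

Lower MAE, RMSE, MAPE, and RSR and higher R and R^{2} indicate better agreement between observed and predicted slump.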
The neural network architecture was first evolved through a trial and error process by analyzing 30 different combinations of hidden layer neurons, transfer function, learning rate, and momentum coefficient. The optimal neural network architecture (BPNN) was evolved as 7-11-1, with eleven hidden layer neurons, a learning rate of 0.45, a momentum coefficient of 0.85, and the tangent hyperbolic hidden layer transfer function. The same exercise was then performed by incorporating GA during the training of the ANN. The GA was able to evolve the optimal neural network architecture and training parameters in 92 generations (Figure
Statistical performance of ANN models for training, validation, and testing data sets.
Model  MAE (mm)  RMSE (mm)  MAPE (%)  R  R^{2}  RSR 
Training  
ANN  1.7378  2.4027  1.1862  0.9804  0.9610  0.1974 
ANN-GA  1.506  2.2357  1.0479  0.9830  0.9663  0.1837 
Validation  
ANN  1.9829  2.7489  1.3474  0.9746  0.9482  0.2276 
ANN-GA  1.6299  2.4687  1.0991  0.9794  0.9582  0.2044 
Testing  
ANN  2.0651  2.9582  1.3916  0.9735  0.9474  0.2294 
ANN-GA  1.7769  2.6295  1.2382  0.9803  0.9584  0.2039 
Fitness function (RMSE) versus generation.
The entire RMC data was also used for evaluating the prediction ability of the trained models, namely, BPNN and ANN-GA. The regression plots showing the predictions of the trained BPNN and ANN-GA models are exhibited in Figures
Statistical performance of the trained ANN models for the entire data set.
Model  MAE (mm)  RMSE (mm)  MAPE (%)  R  R^{2}  RSR 
Overall  
ANN  1.8236  2.5470  1.2412  0.9783  0.9569  0.2075 
ANN-GA  1.5754  2.3345  1.0841  0.9819  0.9638  0.1902 
Regression plot of BPNN and ANN-GA predicted slump versus observed slump.
The results of the study show that amalgamation of GA with ANN during its training phase leads to the evolution of the optimal neural network architecture and training parameters. In comparison to the trial and error BPNN with architecture 7-11-1, the hybrid ANN-GA automatically evolved a less complex 7-9-1 architecture. Moreover, the optimal training parameters evolved using GA enhanced the learning and generalization of the neural network. In comparison to the BPNN model, the ANN-GA model provided lower error statistics, with MAE, RMSE, MAPE, and RSR values of 1.506 mm, 2.2357 mm, 1.0479%, and 0.1837 during training; 1.6299 mm, 2.4687 mm, 1.0991%, and 0.2044 during validation; and 1.7769 mm, 2.6295 mm, 1.2382%, and 0.2039 during testing, respectively. The trained ANN-GA model also gave higher prediction accuracy, with higher values of statistics
The study presented a methodology for designing neural networks using genetic algorithms. The genetic algorithm's population-based stochastic search was harnessed during the training phase of the neural networks to evolve the number of hidden layer neurons, the type of transfer function, and the values of the learning parameters, namely, the learning rate and momentum coefficient, for the backpropagation-based ANN.
The performance metrics show that the ANN-GA model outperformed the prediction accuracy of the BPNN model. Moreover, the GA was able to automatically determine the number of hidden layer neurons, which was found to be fewer than that evolved using the trial and error methodology. The hybrid ANN-GA provides a good alternative to the time consuming conventional trial and error technique for evolving the optimal neural network architecture and its training parameters.
The proposed model, based on past experimental data, can be very handy for quickly predicting the complex material behavior of concrete. It can be used as a decision support tool, aiding technical staff to easily predict the slump value for a particular concrete design mix. This technique will considerably reduce the effort and time needed to design a concrete mix for a customized slump without undertaking multiple trials. Despite its effectiveness and advantages, the methodology is also subject to some limitations. Since the mathematical modeling of concrete slump depends on the physical and chemical properties of the design mix constituents, the same trained model may or may not be applicable for accurate modeling of slump on the basis of design mix data obtained from other RMC plants deriving their raw material from a different source.
The authors declare that there is no conflict of interests regarding the publication of this paper.