Hybrid Model: Teaching Learning-Based Optimization of Artificial Neural Network (TLBO-ANN) for the Prediction of Soil Permeability Coefficient

The permeability coefficient (k-value) of soil is an important parameter in the civil engineering design of roads, tunnels, dams, and other structures. However, determining the k-value experimentally, in the laboratory or in the field, is still costly and time-consuming. Moreover, it requires special equipment and special care in collecting soil samples for laboratory study. Therefore, in this study, we propose a machine learning (ML) hybrid model, teaching learning-based optimization of artificial neural network (TLBO-ANN), to predict the k-value of soil from a limited set of parameters (natural water content, void ratio, specific gravity, liquid limit, plastic limit, and clay content) that can be determined easily in the laboratory. Test results of 84 soil samples obtained from the Da Nang-Quang Ngai expressway project in Vietnam are used in the model development. Statistical indicators such as correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE) are used to validate and evaluate the accuracy of the model. The results show that the TLBO-ANN model is an effective tool for predicting the k-value of soil (R = 0.905) for consideration in the design of structures founded on soil.


Introduction
Permeability of soil is one of the important parameters in the design of most civil engineering structures such as roads, tunnels, and dams constructed on soil [1]. The permeability coefficient of soil (k-value) quantifies the ability of liquids to flow through interconnected voids in soil under a hydraulic gradient, from high to low hydraulic head [2]. The k-value is used in many different theoretical and practical problems, for example, in modeling underground water flow, consolidation settlement rate, and slope stability of ground mass [3]. The required k-value often changes depending on the type of soil and the service life of the structure. For example, a higher k-value is required for filter layers and drainage construction, while a lower k-value is required for roadbeds or dams. The major factors that govern the k-value include density, voids (size and type), particles (size, distribution, and shape), and surface roughness of the soil [1,4]. Accurate determination of the k-value is not an easy task because of field conditions and laboratory test methods. The common feature of these experiments is that they are complex, time-consuming, and costly. Thus, another way to estimate the k-value is to use empirical formulas. The formulas of David [5], Alyamani and Sen [6], and Chapuis [7] use particle size to estimate soil permeability. The formulas of David [5], Cheng and Chen [8], Terzaghi [8], and Milan and Andjelko [9] show that the k-value depends on porosity, particle size, and other factors. Lebron et al. [10] predicted the k-value based on bulk density, particle size, and shape. These formulas provide a relatively fast and simple tool for calculating the k-value. However, the k-values obtained from experiments and from empirical formulas differ significantly in many cases. The formulas should therefore be applied only in preliminary calculations.
Furthermore, empirical formulas are not applicable to all soil types [1]. Therefore, artificial intelligence (AI) or machine learning (ML) methods have been developed in recent decades to predict the k-value of soil accurately from a limited set of geotechnical parameters while reducing cost and time. Such methods include artificial neural network (ANN) [11-14], adaptive neuro-fuzzy inference system (ANFIS) [15,16], the hybrid optimization model of genetic algorithm with adaptive neuro-fuzzy inference system (GA-ANFIS) [15], support vector machine (SVM), random forest (RF) [12], M5P, and Gaussian process (GP) [17].
Soft computing-based models (AI or ML) have been observed to be excellent tools for predicting the k-value [18]. Among them, the ANN model is commonly used because of several advantages: (i) it has a simple architecture, (ii) it is easy to train and generalize, and (iii) it can solve nonlinear problems with high accuracy [19]. However, this method also has some weaknesses, such as slow convergence and a tendency to become trapped in local minima. Optimization algorithms help to overcome these drawbacks and improve prediction performance [20]. The optimization algorithm adjusts properties of the neural network, such as the weights and the learning rate, to reduce the loss [21].
Teaching learning-based optimization (TLBO) has been proposed in recent years [22]. This is a new swarm intelligence optimization algorithm that simulates the teaching-learning phenomenon of a classroom [23]. It has been tested on several constrained and unconstrained nonlinear programming problems, including some combinatorial optimization problems, and has achieved considerable success [24]. According to recent literature reviews, TLBO seems to have the potential to solve combinatorial optimization problems [25]. However, its performance has yet to be tested on shelf-space allocation issues [26]. The continuous development of metaheuristics helps to provide effective solutions to optimization problems [27]. Therefore, in the present study, we use the ML hybrid model teaching learning-based optimization of artificial neural network (TLBO-ANN) to predict the k-value by combining the advantages of both TLBO and ANN. To the best of the authors' knowledge, this is the first time the TLBO-ANN model has been used to determine the k-value in the Vietnamese study area. The main objective of this study is to apply the newly developed hybrid model (TLBO-ANN) to predict the k-value from data collected at the Da Nang-Quang Ngai Expressway project site in Vietnam and to assess its capability for highly accurate prediction, with a view to further use in other areas. Statistical metrics such as correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE) have been used to validate the model performance.
MATLAB software is used to process the data and to simulate this model.

Data Used.
The k-value is affected by many factors such as porosity, particle composition, mineral composition, and physical and mechanical parameters [1,2,4,28-30]. However, to reduce the complexity of the model, this study focuses on the key factors that significantly influence soil permeability. In the present study, data on 84 soil samples were collected from the Da Nang-Quang Ngai expressway project. The experimental program in the laboratory consists of the following two parts: (1) Specific tests determined water content (w), void ratio (e), specific gravity (c), clay content (CC), liquid limit (LL), and plastic limit (PL). The collected results were used as input parameters for the predictive model. (2) The permeability test determined the k-value. The collected results were used as the output parameter for the predictive model.
Statistical analysis of these parameters is provided in Table 1, which gives the maximum, minimum, mean, and standard deviation of the six input variables and the one output variable used in this study.
All data, including input and output parameters, are normalized. Data normalization, or scaling, was performed to minimize information clutter and errors in the model study. As part of the normalization process, the values in the data set were rescaled to a common range of 0-1 without distorting the differences in their value ranges. The data in each column are normalized according to the following equation:
$$x_i^{\mathrm{norm}} = \frac{x_i - \beta}{\alpha - \beta},$$
where $x_i$ is a raw value of parameter i, $x_i^{\mathrm{norm}}$ is its normalized value, and α and β are the maximum and minimum values of parameter i.
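The min-max scaling described above can be illustrated with a short Python sketch. This is only a hedged illustration, not the authors' MATLAB code: the function names `min_max_normalize` and `denormalize` and the sample void-ratio values are hypothetical, and mapping a constant column to 0.0 is an assumed convention.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range 0-1: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 (assumed convention).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def denormalize(scaled, lo, hi):
    """Map values from the 0-1 range back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical void-ratio readings, not taken from the paper's data set.
void_ratio = [0.62, 0.85, 1.10, 0.74]
scaled = min_max_normalize(void_ratio)
restored = denormalize(scaled, min(void_ratio), max(void_ratio))
```

Keeping the minimum and maximum of each column makes it possible to convert a normalized prediction of the k-value back to physical units with `denormalize`.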

Methods Used.
In this study, a hybrid model (TLBO-ANN) has been developed by optimizing the artificial neural network (ANN) technique with the teaching learning-based optimization (TLBO) algorithm. Both methods are briefly described below.

Artificial Neural Network (ANN).
ANN is known as a common and powerful technique that imitates the activity and performance of the human brain and nervous system [31]. This technique has many crucial abilities, such as generalization, learning from data, and handling a large number of variables. It was reported that the major characteristics of ANN comprise continuous nonlinear dynamics, high fault tolerance, collective computation, self-learning, self-organization, and real-time treatment [32]. Thus, this algorithm has been widely employed and applied successfully to solve many problems in geotechnical engineering. For both linear and nonlinear patterns, ANN uses hidden layers between the input and output neurons; as a result, it can learn relationships and patterns in the data by itself. To predict the permeability coefficient of soil, a multilayer perceptron (MLP) was adopted as a regression technique. The sigmoid function is used as the activation function in the neurons, applied to the weighted sum of their inputs.
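As a rough illustration of how an MLP with sigmoid neurons maps the six inputs to one output, the following Python sketch runs a single forward pass. It is a minimal sketch under stated assumptions: the hidden-layer size of four, the random weights, the sigmoid on the output neuron, and the sample values are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> sigmoid hidden layer -> sigmoid output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


rng = random.Random(0)
n_inputs, n_hidden = 6, 4  # six inputs as in the study; hidden size is assumed
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# Hypothetical normalized sample: w, e, specific gravity, CC, LL, PL.
sample = [0.40, 0.55, 0.30, 0.70, 0.65, 0.50]
k_normalized = mlp_forward(sample, w_hidden, b_hidden, w_out, b_out)
```

Because the output neuron is a sigmoid, the prediction lies strictly between 0 and 1, which matches a target that has been normalized to that range.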

Teaching Learning-Based Optimization (TLBO).
The teaching learning-based optimization (TLBO) algorithm is a novel algorithm suggested by Rao et al. [33,34] and inspired by the interaction of students and teachers in a class. It was reported that the TLBO algorithm outperformed other algorithms such as particle swarm optimization, harmony search, and the artificial bee colony algorithm [34]. In addition, other researchers indicated that the TLBO algorithm gave better results than the genetic algorithm and ant colony optimization [35]. The TLBO algorithm mimics the influence of a teacher on the output of the students in a class. Teachers and students are the two main components of the algorithm, and they represent two basic modes of learning: through the teacher (teacher phase) and through interaction among learners (learner phase, or student phase). The output of this algorithm is the grades or results of the learners, which are strongly affected by the teacher's quality. A high-quality teacher can encourage the learners in a class and thus help to enhance the performance of the class. In the class, each learner attempts to follow the teacher and improve the performance of the class. In addition, each learner interacts and exchanges knowledge with other learners in the class to enhance individual performance. The TLBO algorithm is a population-based method in which the population consists of learners. The different variables are defined as different subjects offered to the learners, and the results of the learners correspond to the fitness value of the optimization. The whole process of the TLBO algorithm includes two phases, namely, the teacher phase and the learner phase. The details of the two phases and the procedure of this algorithm can be found in the published literature [24,36].
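The teacher and learner phases described above can be sketched in Python as follows. This is a hedged sketch of TLBO as it is commonly described in the literature, not the authors' MATLAB implementation: the function name `tlbo_minimize`, the population size, the iteration count, and the shifted-sphere toy objective are assumptions chosen for illustration. In a TLBO-ANN hybrid, the decision vector would presumably hold the network's weights and biases and the objective would be the training error, which the source does not spell out.

```python
import random


def tlbo_minimize(objective, dim, bounds, pop_size=20, iterations=100, seed=0):
    """Minimize `objective` with a basic teacher phase and learner phase."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return [min(max(x, lo), hi) for x in v]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    cost = [objective(p) for p in pop]

    for _ in range(iterations):
        # Teacher phase: move each learner toward the best learner (the teacher)
        # relative to the class mean; the teaching factor is 1 or 2.
        teacher = pop[cost.index(min(cost))]
        mean = [sum(p[d] for p in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(dim)])
            c = objective(cand)
            if c < cost[i]:  # greedy acceptance: keep only improvements
                pop[i], cost[i] = cand, c

        # Learner phase: each learner interacts with a random classmate and
        # moves away from the worse of the two, toward the better one.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if cost[i] < cost[j] else -1.0
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(dim)])
            c = objective(cand)
            if c < cost[i]:
                pop[i], cost[i] = cand, c

    best = cost.index(min(cost))
    return pop[best], cost[best]


# Toy objective (hypothetical): squared distance to a known target point.
target = [1.0, -2.0, 0.5]


def shifted_sphere(v):
    return sum((x - t) ** 2 for x, t in zip(v, target))


best, best_cost = tlbo_minimize(shifted_sphere, dim=3, bounds=(-5.0, 5.0), iterations=200)
```

Apart from the population size and the number of iterations, the sketch has no tuning parameters, which reflects the property of TLBO that it needs no algorithm-specific parameters.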

Validation Indicators
(1) Statistical Indicators. The R-value varies between −1 and 1, and the closer the absolute value of R is to 1, the more accurate the model. The formulas for determining R, RMSE, and MAE, in their standard forms, are as follows:
$$R = \frac{\sum_{i=1}^{n}\left(M_i-\bar{M}\right)\left(N_i-\bar{N}\right)}{\sqrt{\sum_{i=1}^{n}\left(M_i-\bar{M}\right)^2\,\sum_{i=1}^{n}\left(N_i-\bar{N}\right)^2}},$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i-N_i\right)^2},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|M_i-N_i\right|,$$
where $M_i$ is the actual experimental value, $N_i$ is the value predicted by the model, $\bar{M}$ and $\bar{N}$ are their respective means, and n is the total number of samples in the data set.
(2) Cost Function. To show the difference between the predicted and the actual value, a cost function or loss function is generally used. The loss function refers to the error for a single training example, while the cost function refers to the average of the loss functions over the entire training data set [37]. The cost function acts as an indicator of the model's performance improvement as the error is adjusted [38]. The main objective of the optimization strategy is to minimize the value of the cost function [39]. Cost functions used in ML models include the regression cost function, the binary classification cost function, and the multiclass classification cost function. Iterative strategies are applied during the training of ML models to reduce the loss. In this study, the regression cost function was used during the training of the models.
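The three validation indicators can be computed directly from paired lists of measured and predicted values. The Python sketch below uses the standard definitions of RMSE, MAE, and the Pearson correlation coefficient; the function names and the sample values are hypothetical and are not the paper's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between measured and predicted values."""
    n = len(actual)
    return sum(abs(m - p) for m, p in zip(actual, predicted)) / n


def pearson_r(actual, predicted):
    """Pearson correlation coefficient R, which lies between -1 and 1."""
    n = len(actual)
    mean_m = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(actual, predicted))
    var_m = sum((m - mean_m) ** 2 for m in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical measured and predicted k-values (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of the error distribution than either alone.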

Methodology.
The proposed methodology of the present study consists of the following three main steps: data preparation, model building, and model validation (Figure 1).

Results
The performance of the TLBO-ANN model has been evaluated based on the results of the cost function. The goal of predictive modeling is to have the model converge as quickly as possible, minimizing the cost function in the fewest iterations. Figure 2 depicts the convergence of RMSE, MAE, and R for the TLBO-ANN model over 500 iterations. The convergence curves were obtained by plotting the value of each of the three indices at every iteration (red lines represent training data and blue lines represent testing data). The three indices showed different convergence behaviors. R tends to increase, whereas MAE and RMSE decrease markedly as the number of iterations grows and the model converges. After some very strong fluctuations in the first iterations, RMSE converged and was almost stable after the 35th iteration for the testing data, and it reached relative stability after the 115th iteration for the training data. MAE converged quickly, at the 42nd iteration (for both training and testing data), although slight variation remained as the number of iterations increased. R converged at the 45th iteration (for both data sets); for the training data, a slight but insignificant upward trend remained afterward. Overall, R, MAE, and RMSE of the TLBO-ANN model converge rapidly in the simulation runs.
Next, typical results after 500 simulations of the TLBO-ANN model are presented. The experimental k-values (black line) and the predicted k-values (red line) from the training and testing processes are compared in Figure 3.
Mathematical Problems in Engineering
For the testing data set, the experimental results are also predicted with small errors (Figure 3(b)). The performance of the model is also evaluated by the error criteria presented above, namely, MAE and RMSE. The values of these criteria for the training and testing data sets are shown in Figure 4. The RMSE values are 3.0541 and 2.9401, while the MAE values are 2.1721 and 2.3075, for the training and testing data sets, respectively. Figure 5 shows the histogram of the frequency and the probability density function of the errors in the predicted k-values. The errors are most heavily concentrated in the range −0.002 to 0.001 for the training data set and −0.009 to 0.012 for the testing data set, indicating a very highly concentrated probability density function in this range. There are also a few cases where the error is high, about −1.5, but these account for a tiny percentage that does not affect the overall result. In addition, with a very small mean error (−0.0052 for training data and 0.0491 for testing data), the TLBO-ANN model shows very high predictive accuracy. The regression plots of the values predicted by the TLBO-ANN model against the actual values for the training and testing data sets are shown in Figure 6, where the horizontal axis represents the experimental results and the vertical axis represents the values predicted by the proposed model. The values obtained from the proposed model for the training data set (Figure 6(a)) and the testing data set (Figure 6(b)) are very close to the experimental results.
These results show that the TLBO-ANN model can generalize the relationship between input and output parameters and gives reasonable prediction results. For the training data set, the correlation between simulation and experimental results reached R = 0.951; for the testing data set, R = 0.905, and the error is mainly concentrated in the first quartile. This indicates that the predictive power of the model is very good. The function "y = 0.88x + 0.0052" represents the correlation between experimental and simulation data for the training data set. Similarly, the function "y = x + 0.0051" is established for the testing data set. The coefficients of these two equations are quite similar. At the same time, the R values show that it is feasible to apply the TLBO-ANN model to predict the k-value.
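A regression line of the form y = ax + b relating predicted to experimental values can be obtained by ordinary least squares. The source does not state how its lines were fitted, so the Python sketch below is only an assumed, minimal illustration; the function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points lying exactly on y = 2x + 1.
experimental = [0.0, 1.0, 2.0, 3.0]
model_output = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(experimental, model_output)
```

A slope close to 1 and an intercept close to 0 indicate that the predictions track the experimental values without systematic scaling or offset.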

Discussion
The performance of the TLBO-ANN model has been validated by statistical metrics. The results of this study are compared with the results of the studies of Pham et al. [12,17]. They used the ML models M5P, GP, ANN, SVM, and RF to predict the k-value with the same six types of input parameters at the Da Nang-Quang Ngai Expressway project (Table 2). The results show that the correlation coefficient of the TLBO-ANN model in this study (R = 0.905) is much higher than that of the M5P and GP models (R < 0.77), as well as the ANN, SVM, and RF models (R < 0.851). The TLBO-ANN model is thus much superior to the single ANN model and to other models such as M5P, GP, SVM, and RF. The results also show that the TLBO algorithm enhances the performance of the ANN model, which is consistent with some published studies. TLBO has been shown to be an effective optimization algorithm with high reliability, accuracy, and fast convergence speed. It does not require any algorithm-specific parameters, so the TLBO algorithm can also be called an algorithm-specific parameter-free algorithm [40]. The TLBO algorithm is based on the influence of the teacher's presence on student outcomes in the classroom, and outcomes are calculated by semester grades. The TLBO algorithm has been observed to perform better than other optimization algorithms (archive-based microgenetic algorithm (AMGA), clustering multiobjective evolutionary algorithm (clustering MOEA), differential evolution with self-adaptation and local search algorithm (DECMOSA-SQP), dynamical multiobjective evolutionary algorithm (DMOEA), generalized differential evolution 3 (GDE3), LiuLi algorithm, multiobjective evolutionary algorithm based on decomposition (MOEAD), and enhancing MOEA/D with guided mutation and priority update (MOEADGM)) for unconstrained multiobjective benchmark problems [41].
In applying optimization algorithms to enhance the learning process of ANN, TLBO has better training accuracy in comparison with the other two algorithms (particle swarm optimization (PSO) and differential evolution (DE)) [19].
However, it should be noted that, because only 84 test results from one project in Da Nang, Vietnam, were used, the k-value predictions of this paper are highly reliable only within the range of the experimental data. Therefore, future research will include more experiments on other physical parameters of the soil, such as particle shape, particle distribution, and effective particle diameter, as well as sampling from different projects, to expand the scope and number of inputs.

Conclusion
In this study, the proposed ML hybrid model TLBO-ANN has been successfully developed and evaluated for the prediction of the k-value using the cost function and statistical measures (R, RMSE, and MAE). The results show that the TLBO-ANN model is a good predictor of the k-value of soil, with R = 0.905. Its performance is also much superior to that of the single ANN model and other models such as M5P, GP, SVM, and RF. Therefore, the TLBO-ANN model can be used for the accurate prediction of the k-value of soil. However, as the sample size of the present study is limited, future studies should include more samples and different combinations of soil input parameters so that the model, given its highly accurate prediction of the k-value at the Da Nang-Quang Ngai Expressway, Vietnam, can be applied more widely in other areas.

Data Availability
The data used to support the findings of this study are available from the corresponding authors upon request.