Developing a Robust Model Based on the Gaussian Process Regression Approach to Predict Biodiesel Properties

Biodiesel is regarded as a renewable and environmentally friendly fuel with the potential to substitute for petroleum diesel. The main purpose of the present study is to design a precise algorithm based on the Gaussian Process Regression (GPR) model with several kernel functions, i


Introduction
In an age of rising greenhouse gas emissions, the sharp decline in oil resources and rising fossil fuel prices have forced authorities to pay far more attention to biomass resources than before [1][2][3][4]. These reasons make biofuels, such as biodiesel and bioethanol, suitable and major alternatives to fossil fuels [2,5,6]. Biodiesel is environmentally compatible and, moreover, renewable [7,8], which makes it a suitable replacement for petroleum diesel [9][10][11]. Biodiesel can be obtained by alcoholysis (transesterification) of animal fats and vegetable oils with light alcohols such as ethanol and methanol in the presence of acidic or alkaline catalysts [12]. These reactions mainly reduce the viscosity of the oil. The general properties of biodiesel depend mainly on the composition of the oil employed [13]. Many determinative properties, such as density, iodine value (IV), kinematic viscosity (KV), flash point, pour point (PP), and cloud point (CP), are used to characterize biodiesel quality [14,15]. Although the experimental evaluation of these properties is straightforward, it requires extensive time and cost. A predictive model should estimate the fuel properties and thereby help improve biofuel quality; it is therefore essential to propose an accurate and reliable model. To approximate the cetane number of biodiesel, Ramadhas et al. proposed a multilayer feed-forward model [16]. To estimate the cetane number from fatty acid methyl esters (FAME), Hansen and Bamgboye suggested a novel correlation [17]. To estimate the iodine and saponification values of different types of biodiesel, Gopinath et al. suggested a multiple linear regression model; they reduced the estimation error to about 3.4% [18]. Phankosol and coworkers developed an empirical model based on the numbers of double bonds and carbon atoms over various temperature ranges to evaluate biodiesel viscosity.
The Average Absolute Deviation (AAD) of this model is about 6.95% [19]. Rocabruno-Valdés and coworkers applied an artificial neural network (ANN) to predict the cetane number, density, and dynamic viscosity of biodiesel; the MSE for the validation set of this model is about 1.842 × 10⁻³ [20]. Talebi et al. developed a new system to analyze and evaluate biodiesel properties according to the FAME profile [21]. Miraboutalebi and coworkers used an ANN model to evaluate cetane numbers, with RMSE and R² of about 2.53 and 0.95, respectively [22]. Hong and coworkers employed the fatty acid methyl ester profile to approximate biodiesel properties; the Average Absolute Error (AAE) in their work ranged between 0.14 and 7.5 percent [23]. Giwa and coworkers employed a multilayer perceptron (MLP) neural network to approximate the cetane number of biodiesel. Their model is based on five fatty acids [24]. Hosseinpour and coworkers predicted the cetane number with a combination of ANN and partial least squares. In this method, the percent error (PE), R², and MSE are about 1.06, 0.99, and 0.72, respectively [25]. Mostafaei proposed an adaptive neuro-fuzzy inference system (ANFIS) to predict the cetane number of biodiesel [26]. It is worth noting that, because reliable experimental data are scarce, developing an accurate model can be helpful to researchers. In recent years, a few predictive models, such as artificial intelligence methods, have been applied to evaluate material properties and processes in different applications [27][28][29][30][31][32][33]. Recently, with extensive developments in technology and science, novel smart methods have been proposed, such as GMDH (Group Method of Data Handling), GPR, ANFIS, ANN, and LSSVM; with these methods, many complex nonlinear problems can be modelled in many different fields [34][35][36][37][38].
According to the literature surveyed, there are almost no studies that model biodiesel properties such as PP, CP, KV, and IV; in other words, no accurate, smart model is available for these four parameters. This work aims to develop a detailed model that estimates these biodiesel properties from the fatty acid methyl ester profile using the GPR algorithm. To achieve this goal, an extensive dataset is utilized, and the accuracy and precision of the model are evaluated with statistical parameters.

A Summary of Gaussian Process Regression
A Gaussian Process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution. GPR models are nonparametric, kernel-based probabilistic models. Consider a training set {(x_i, y_i); i = 1, 2, ..., n}, where x_i ∈ R^d and y_i ∈ R are drawn from an unknown distribution. A trained GPR model predicts the response y_new for a new input vector x_new. Consider the linear regression model y = x^T β + ε, where ε ∼ N(0, σ²). The GPR method explains y by introducing latent variables l(x_i), i = 1, 2, ..., n, drawn from a Gaussian Process, together with explicit basis functions b. The joint distribution of the l(x_i) is Gaussian, and their covariance function captures the smoothness of y. The basis functions project x into a feature space of dimension p. A GP is characterized by its mean and covariance. Let the mean function of l(x) be m(x) = E(l(x)) and its covariance function be k(x, x′) = E[(l(x) − m(x))(l(x′) − m(x′))]. The covariance function is parameterized by a set of hyperparameters θ and can therefore be written as k(x, x′|θ). In general, fitting algorithms estimate θ, σ², and β during training, and take initial values and specifications such as the kernel k and the basis function b as settings. This study investigates four kernel functions: Rational Quadratic, Matern, Exponential, and Squared Exponential. Equations (1)-(4) present these kernels, respectively; in these equations, σ_l is the characteristic length scale, which describes how far apart the inputs x must be for the corresponding responses to become uncorrelated.
In these equations, α is a positive-valued scale-mixture parameter. Both σ_f and σ_l must be greater than zero [39]. This constraint is enforced through the unconstrained parameter vector θ, which comprises the two parameters θ_1 = log σ_l and θ_2 = log σ_f.
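For reference, the four kernels named above are commonly written as follows. This is a sketch of the standard textbook forms and may differ in detail from the article's equations (1)-(4). The distance symbol r is notation assumed here, σ_f is taken to be the signal standard deviation, and the Matern smoothness of 5/2 is an assumption, since the text does not state which Matern variant was used.

```latex
% r is the Euclidean distance between two inputs (notation assumed here)
r = \sqrt{(x - x')^{T}(x - x')}

% Rational Quadratic
k(x, x' \mid \theta) = \sigma_f^{2}\left(1 + \frac{r^{2}}{2\alpha\sigma_l^{2}}\right)^{-\alpha}

% Matern with smoothness 5/2 (the smoothness is an assumption; 3/2 is also common)
k(x, x' \mid \theta) = \sigma_f^{2}\left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^{2}}{3\sigma_l^{2}}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right)

% Exponential
k(x, x' \mid \theta) = \sigma_f^{2}\exp\left(-\frac{r}{\sigma_l}\right)

% Squared Exponential
k(x, x' \mid \theta) = \sigma_f^{2}\exp\left(-\frac{r^{2}}{2\sigma_l^{2}}\right)
```

In each form, k equals σ_f² at r = 0 and decays as r grows relative to σ_l, which is the sense in which σ_l sets the distance over which responses remain correlated.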
In addition to the kernel functions, four basis functions are studied here: constant, empty, linear, and pure quadratic, as given in equations (5)-(8), respectively. To estimate θ, σ², and β, the marginal log-likelihood log P(y|X, β, θ, σ²) is maximized; the function inside the logarithm is known as the likelihood function. The algorithm first calculates β̂(θ, σ²), the value of β that maximizes the log-likelihood for given θ and σ². Substituting this estimate yields the β-profiled likelihood, log P(y|X, β̂(θ, σ²), θ, σ²), which is then maximized with respect to θ and σ².
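The estimation procedure described above can be written out explicitly. The block below is a sketch of the usual forms under stated assumptions, not a reproduction of the article's equations (5)-(8). The basis matrix H, the kernel matrix K, and the all-ones vector are notation introduced here for illustration.

```latex
% Basis matrix H for n inputs stacked as rows of X (n x d); H is notation assumed here
H = \mathbf{1}_{n}                       % constant
H = [\ ]                                 % empty (no basis term)
H = [\mathbf{1}_{n},\ X]                 % linear
H = [\mathbf{1}_{n},\ X,\ X^{\circ 2}]   % pure quadratic; X^{\circ 2} squares each entry of X

% Marginal log-likelihood, with K = K(X, X \mid \theta)
\log P(y \mid X, \beta, \theta, \sigma^{2}) =
  -\tfrac{1}{2}(y - H\beta)^{T}\left[K + \sigma^{2} I_{n}\right]^{-1}(y - H\beta)
  -\tfrac{n}{2}\log 2\pi
  -\tfrac{1}{2}\log\left|K + \sigma^{2} I_{n}\right|

% Maximizer over beta for fixed theta and sigma^2 (generalized least squares)
\hat{\beta}(\theta, \sigma^{2}) =
  \left[H^{T}\left(K + \sigma^{2} I_{n}\right)^{-1} H\right]^{-1}
  H^{T}\left(K + \sigma^{2} I_{n}\right)^{-1} y
```

Because the estimate of β is available in closed form for any θ and σ², the numerical search only has to cover θ and σ², which is why the profiled likelihood is used.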

Data Gathering
Fifty-six laboratory data points for IV were extracted from previously reported sources [18,21]. For PP and CP, 25 and 44 experimental data points were employed, respectively [40,41]. In addition, 59 data points were used for the KV of biodiesel [19,24]. The data are separated into two sets: 75 percent of the total data are randomly selected for training, and the rest form a testing set for validating the model. The input data for the evaluation of KV are the number of double bonds (d_n), carbon number (C_n), and temperature (K). To estimate IV, the input data are the weight percent of polyunsaturated fatty acids (PU), the weight percent of monounsaturated fatty acids (MU), and the number of double bonds. To estimate CP, the input data are the weight percent of saturated fatty acids (C_0), carbon number (C_n), and molecular weight (M_w). For the evaluation of PP, the input data are the number of double bonds, molecular weight, and carbon number. C_n, d_n, and M_w are calculated from the fatty acid methyl ester composition; in their defining formulas, C_3, C_2, and C_1 represent the sums of tri-, di-, and monounsaturated fatty acid methyl esters, respectively, and X_i indicates mass fraction [26].
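The random 75/25 partition described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The function name `split_indices`, the fixed seed, and the rule of rounding the training count to the nearest integer are assumptions, since the text does not say how fractional counts were handled.

```python
import random

# Dataset sizes quoted in the text; the dictionary itself is illustrative.
DATASET_SIZES = {"IV": 56, "PP": 25, "CP": 44, "KV": 59}


def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Randomly partition range(n_samples) into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_samples * train_fraction)  # rounding rule is an assumption
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    for name, n in DATASET_SIZES.items():
        train, test = split_indices(n)
        print(f"{name}: {len(train)} training points, {len(test)} testing points")
```

For the IV dataset, for example, this yields 42 training points and 14 testing points. Fixing the seed keeps the same rows in the test set across kernel functions, so differences in test error reflect the kernel and not the split.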

Results and Discussion
This study used the GPR algorithm to predict biodiesel properties. The principal goal of this section is the graphical and statistical analysis of the GPR algorithm. Figures 1-4 compare the predicted and experimental outputs graphically; these figures are known as data index charts. Usually in experimental papers, there is a slight difference between the actual values and the modelled values [42][43][44]. Figures 1-4 show the data index for the four biodiesel properties IV, KV, PP, and CP, and each figure is divided into four subfigures that compare the performance of the model with different kernel functions. Subfigures a, b, c, and d correspond to the Exponential, Matern, Squared Exponential, and Rational Quadratic kernel functions, respectively. A careful look at these figures shows that, for all kernel functions, the predicted outputs closely cover the experimental ones. The estimated and experimental values overlap well, which shows that the proposed GPR algorithm is highly effective in predicting the different properties of biodiesel. Figures 5-8 show the regression plot for each biodiesel property, and each figure compares the performance of the GPR model among the four kernel functions. As can be seen in the four figures, the majority of the test data lie near the y = x line. The more closely the test data cluster around the y = x line, the higher the accuracy of the GPR model. These charts give only a general graphical view, but they indicate that the GPR model performs acceptably for all biodiesel properties.
To support the regression charts with numerical detail, Tables 1-4 are prepared. According to Table 1, which relates to the KV property, the test R² for the Matern kernel function is 0.992, the best among the kernel functions; the RMSE for this kernel function is 0.15697. In Table 2, for IV, the best test R² is 0.998 with an RMSE of 0.96580, obtained with the Squared Exponential kernel function. As can be seen in Table 3, the best test R² for the CP property belongs to the Exponential kernel function, with a value of about 0.966, while the Squared Exponential kernel function for the PP property in Table 4 ... outcomes of ANFIS and LSSVM models reported in previous literature [45], Tables 1-4 are prepared as numerical. Table 5 summarizes R² and RMSE values among three different models for four biodiesel properties. As it can be understood from Table 5
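The two statistics reported in the tables can be computed as in the sketch below. It assumes that R² denotes the usual coefficient of determination and that RMSE is the root of the mean squared residual, since the text does not give its own definitions. The function names and the sample values are hypothetical and are not taken from the article's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between experimental and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    experimental = [4.1, 4.5, 5.0, 5.6]
    predicted = [4.0, 4.6, 5.1, 5.5]
    print("RMSE:", rmse(experimental, predicted))
    print("R2:", r_squared(experimental, predicted))
```

RMSE carries the units of the property being predicted, so values are comparable across kernel functions for the same property but not across properties. R² is dimensionless and can be compared across all four.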

Conclusion
In this paper, a Gaussian Process Regression model using four different kernel functions (Exponential, Matern, Squared Exponential, and Rational Quadratic) was proposed. This model can estimate the physical and chemical properties of biodiesel, namely KV, PP, CP, and IV. A valuable dataset was collected from different sources for these biodiesel properties. In addition, the results of the proposed GPR model were compared with those of two previous models, ANFIS and LSSVM-PSO. The graphical and statistical approaches indicated that the GPR model is highly effective in estimating and evaluating biodiesel properties. The proposed GPR algorithm is easy to apply, and researchers can rely on it for its simplicity and usefulness. This model can be helpful for those who wish to work with biodiesel fuels.

Data Availability
The data used to support the findings of this study are provided within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.