Development of Artificial Intelligence Methods for Determination of Methane Solubility in Aqueous Systems

Accurate determination of the water (H2O) content of natural gases, especially in the methane (CH4) phase, is highly important for chemical engineers dealing with natural gas processes, and a high-performance model is therefore necessary. Because the solubility of methane in aqueous solutions matters to the natural gas industry, two novel models based on the Decision Tree (DT) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) have been employed. For this purpose, a total of 204 real methane solubility points in aqueous solution containing NaCl under different pressure and temperature conditions have been gathered. The predicted solubility values have been compared with the experimental data points both visually and mathematically. The R values of 1 for the training and testing phases express the great ability of the proposed models in calculating methane solubility in pure water systems.


Introduction
In the natural gas industry, precise estimation of the water content of the methane-rich gas phase is vitally important. An accurate approach is necessary for predicting the vapor-liquid equilibria (VLE) of methane (CH4) and water (H2O) binary systems. Several equations of state (EOS) are available for estimating the VLE of CH4-H2O systems. Luedecke et al. used the Mansoori expression and the Van der Waals equation as the repulsive and attractive parts, respectively. Their estimates for binary systems were approximately acceptable, but the model was not completely successful in predicting the VLE of ternary systems [1]. Bakker used an EOS for the ternary CH4-H2O-NaCl system. However, this approach does not perform well near critical points [2]. Then, Chapoy and coworkers used a static-analytic apparatus to measure the solubility of methane in H2O at 0.1-18 MPa and 275.11-313.11 K. For this system, the binary interaction coefficients were calculated from the measured data to enhance the performance of the Patel-Teja EOS [3]. Mohammadi and coworkers developed a new approach by using mixing rules and the Patel-Teja EOS [4]. After that, Haslam used the Hudson-McCoubrey rule and reported that, in the CH4-H2O system, the square-well potential performed better than the Lennard-Jones potential [5]. Markocic implemented the Redlich-Kwong EOS to evaluate the binary system by using nine sets of data [6]. Li [9]. Yarrison applied the Peng-Robinson EOS and a liquid model to predict the VLE of CH4-H2O systems. The model estimates had an average absolute relative deviation (AARD) of more than 6% [10]. Abudour et al. investigated the coalbed gas and water system by using the Peng-Robinson EOS in the model; however, its predictions were not accurate [11,12]. Li combined the Peng-Robinson and Pitzer models to describe the aqueous phase and the gas phase based on 18 adjustable parameters [13].
Then, another method was developed by Zhao for pure water systems in the range of 0.1-150 MPa and 274-573 K. The AARD values of 7% and 4% were determined for liquid and gas phases, respectively [14].
There are numerous studies on applications of machine learning in different industries [15-18]. Najafzadeh and Azamathulla used the neuro-fuzzy-based group method of data handling (NF-GMDH) to estimate the scour process at pile groups due to waves in terms of wave characteristics upstream of the pile group, arrangement of the pile group, pile spacing, geometric property, and sediment size [19]. Najafzadeh et al. employed NF-GMDH combined with the gravitational search algorithm, genetic algorithm, and particle swarm optimization to determine scour depth [20]. In another study, ANFIS and the Support Vector Machine were used to study the local scour depth in long contractions of a waterway [21]. Saberi-Movahed et al. used the group method of data handling (GMDH) to determine the longitudinal dispersion coefficient (LDC), a critical variable in the investigation of pollution profiles in water pipelines [22]. Nazari et al. used machine learning models to determine energy and energy efficiencies in terms of productivity, wind velocity, ambient temperature, nanofluid temperature, basin temperature, solar radiation, fan power, and nanoparticle volume fraction [23]. Najafzadeh and Oliveto used machine learning models to study the scouring propagation rate around pipelines in terms of the current angle of attack to the pipeline, the Shields parameter, the ratio of embedment depth to pipeline diameter, and the approaching flow Froude number [24]. The wide application of machine learning approaches shows that they can be employed for complex problems. The Adaptive Neuro-Fuzzy Inference System and the Decision Tree are two simple, user-friendly models that can be used by engineers working in different fields [25]. Even modest knowledge of machine learning is enough to develop DT and ANFIS algorithms.
In the present study, the Adaptive Neuro-Fuzzy Inference System and Decision Tree algorithms have been used to predict the solubility of methane in the pure water system. Furthermore, the CPA-vdW [26], CPA-HV [27], and SRK-HV [28] models have been employed for comparison with the results of the proposed models.

Experimental Databank.
To prepare and validate the ANFIS and DT algorithms, a comprehensive databank of 470 actual methane solubility points in the pure water system, covering a wide range of temperature and pressure, has been collected from various papers. The details of this databank are reported in reference [27]. The databank has been divided into 353 data points for training and 117 data points for testing.
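The 353/117 division described above can be sketched as follows. This is a minimal illustration only: the text does not state how the points were assigned to each set, so the random shuffle, the `seed` value, and the `split_databank` helper are assumptions, and the records are placeholders rather than the actual solubility data.

```python
import random


def split_databank(points, n_train, seed=0):
    """Shuffle a copy of the points and split it into training and testing sets.

    Hypothetical helper: a seeded random split is assumed here, since the
    source does not describe the splitting procedure.
    """
    if not 0 < n_train < len(points):
        raise ValueError("n_train must be between 1 and len(points) - 1")
    shuffled = list(points)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)    # reproducible shuffle
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records standing in for the 470 (temperature, pressure, solubility) points.
databank = [(i,) for i in range(470)]
train_set, test_set = split_databank(databank, n_train=353)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, which matters when the training and testing statistics of two models are compared on the same partition.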

Adaptive Neuro-Fuzzy Inference System.
Fuzzy logic (FL) was first introduced by Zadeh [29]. Its major feature is the ability to convert linguistic variables into mathematical forms. Sometimes, this approach fails to achieve appropriate results because of conflicting assessments or insufficient data. To solve this issue, other methods such as the artificial neural network (ANN) can be combined with fuzzy logic for process modeling. Combining the FL and ANN approaches produces a new algorithm, namely, the Adaptive Neuro-Fuzzy Inference System (ANFIS). The combined method operates on the basis of membership functions (MFs) and IF-THEN rules.
There are several popular MFs, including the Gaussian, triangular, generalized bell-shaped, and trapezoidal functions. Two ANFIS structures are described in the literature, the Takagi-Sugeno and Mamdani types [30]. In the present work, the Takagi-Sugeno type has been implemented because of its ability to capture the nonlinear relationship between the output and the inputs. The main steps in designing an ANFIS algorithm are shown in Figure 1. The relations in its different layers are explained as follows [31,32]. In the first layer, linguistic terms are obtained from the raw input data. The inputs are connected to nodes that define the linguistic terms, and this definition is constructed by the MFs [33-35]. The MF used in this study is the Gaussian type, in which O stands for the output of the first layer, and σ and z represent the variance term and the center of the Gaussian MF, respectively. The second layer is the firing strength layer, in which the accuracy and adequacy of the conditions from the previous section are examined. In the formulation of the firing strength, β and ω stand for the MF and the rule's firing strength, respectively.
After that, the firing strength of each rule is normalized in the third layer. The fourth layer characterizes the linguistic terms of the model's output and determines how strongly each rule influences that output; the linear variables in this relation are obtained by optimization of the ANFIS.
Finally, the fifth layer sums the contributions of all rules and converts them to a single quantitative output [36,37].
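The five layers described above can be sketched as a single forward pass. The layer equations did not survive in this text, so the sketch assumes the standard Takagi-Sugeno forms: a Gaussian MF exp(-(x - z)^2 / (2σ^2)), a firing strength ω equal to the product of the membership degrees β, normalization by the sum of all firing strengths, and a linear consequent for each rule. The function names, the rule dictionary layout, and all numerical values are hypothetical and are not the fitted parameters of this study.

```python
import math


def gaussian_mf(x, center, sigma):
    """Layer 1: Gaussian membership degree of x (assumed standard form)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, rules):
    """Forward pass of a first-order Takagi-Sugeno ANFIS.

    Each rule is a dict with 'centers' and 'sigmas' (one per input) for the
    Gaussian MFs, and 'coeffs' (one per input) plus 'bias' for the linear
    consequent. This layout is an illustrative assumption.
    """
    # Layers 1 and 2: membership degrees and their product (firing strength).
    strengths = []
    for rule in rules:
        w = 1.0
        for x, z, s in zip(inputs, rule["centers"], rule["sigmas"]):
            w *= gaussian_mf(x, z, s)
        strengths.append(w)

    # Layer 3: normalize each firing strength by the total.
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("all firing strengths underflowed to zero")
    normalized = [w / total for w in strengths]

    # Layer 4: weight each rule's linear consequent by its normalized strength.
    weighted = [
        wn * (sum(c * x for c, x in zip(rule["coeffs"], inputs)) + rule["bias"])
        for wn, rule in zip(normalized, rules)
    ]

    # Layer 5: sum the rule contributions into one crisp output.
    return sum(weighted)


# Hypothetical two-rule system with scaled temperature and pressure as inputs.
example_rules = [
    {"centers": [0.2, 0.3], "sigmas": [0.5, 0.5], "coeffs": [0.1, 0.4], "bias": 0.0},
    {"centers": [0.8, 0.7], "sigmas": [0.5, 0.5], "coeffs": [0.3, 0.2], "bias": 0.1},
]
print(anfis_forward([0.5, 0.5], example_rules))
```

In training, an optimizer would adjust the centers, the variance terms, and the linear coefficients to minimize the chosen cost function; here they are simply fixed by hand.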

Decision Tree.
In recent years, the decision tree classifier has become one of the most widely applied machine learning tools [38]. This method builds a tree-like hierarchy to create a classification tree with a simple scheme, in which the terminal nodes stand for decision outputs and the nonterminal nodes represent the attributes [39]. The major advantage of this method is that the classification has an easy visual representation. However, it also has disadvantages: it cannot produce multiple outcomes, and it is somewhat susceptible to noise in the data [40]. Recently, many decision trees based on C4.5 [41], ID3 [42], the chi-square automatic interaction detector [43], and the classification and regression tree [44] have been suggested. The J48 decision tree, or C4.5 algorithm, has been applied as the fundamental classifier in ensemble frameworks. Although the C4.5 decision tree is an interesting approach for classification, its predictive ability can be enhanced by ensemble approaches [45]. In this study, the ensemble approach known as bagging has been used. It is one of the recent ensemble approaches and uses the bootstrap sampling strategy. This strategy samples the training data randomly with replacement to produce multiple training subsets. A decision tree is generated from each subset, and the trees are finally aggregated into the final model. This strategy improves the classification performance by reducing the variance in the errors [46]. The scheme of the bagging process is shown in Figure 2.
International Journal of Chemical Engineering
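The bootstrap-and-aggregate procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the base learner is a depth-1 regression tree (a stump) on a single feature, used as a stand-in for the C4.5/J48 trees named in the text, and the aggregation is a plain average of the tree predictions. The function names, the number of trees, and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split with the lowest squared error."""
    best = None
    # Skipping the smallest value guarantees that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in this sample is identical: fall back to the mean.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def bagging_fit(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap indices
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def bagging_predict(trees, x):
    """Aggregate the ensemble by averaging the individual tree predictions."""
    return mean(tree(x) for tree in trees)


# Toy step-shaped data, purely illustrative.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
trees = bagging_fit(xs, ys)
print(bagging_predict(trees, 2), bagging_predict(trees, 7))
```

Because each tree sees a different resample, their individual errors partly cancel in the average, which is the variance reduction the text attributes to bagging.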

Results and Discussion
In this part, the results of the proposed ANFIS and DT models are evaluated in the training and testing stages. Particle swarm optimization (PSO) has been used to train the ANFIS algorithm. The selected cost function is the mean squared error, whose variation with iterations is shown in Figure 3. After optimization of the ANFIS algorithm, a comprehensive statistical comparison has been carried out. The number of clusters, population size, and number of iterations used in training the ANFIS model are 6, 65, and 1500, respectively. For DT, the learning rate and number of additive terms are 0.1 and 300, respectively. For the statistical comparison, various indexes expressing the quality of the match between predicted and actual methane solubility values are determined and reported in Table 1. These indexes confirm the performance of the developed ANFIS model in predicting other unseen conditions. The visual comparison of predicted and experimental methane solubility values is a necessary part of model evaluation. To that end, the model outputs and actual methane solubility points are shown together in Figure 4. In addition, the excellent agreement between forecasted and actual methane solubility values is shown by the cross plot in Figure 5. As can be seen, the methane solubility points are located on the bisector line for both phases. In addition, the relative deviation between forecasted and experimental methane solubility points is determined and shown in Figure 6. It can be seen that the relative deviation points are very close to zero. In the present work, three other models are borrowed from the literature for comparison with the ANFIS and DT algorithms in predicting the solubility of methane in pure water systems. As shown in Figure 7, the CPA-HV, CPA-vdW, SRK-HV, DT, and ANFIS models have been employed to predict methane solubility at different temperatures and pressures.
This figure shows that the ANFIS algorithm is the most accurate of the aforementioned models. Moreover, the accuracy of the four other models can be affected by pressure and temperature, whereas the accuracy of the developed ANFIS algorithm remains good over the full range of investigated conditions.
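The statistical indexes used in this section can be computed as in the sketch below. The exact definitions adopted in the study are not given in this text, so the formulas are assumptions: MSE and RMSE in their usual form, R2 as one minus the ratio of residual to total sum of squares, MRE as the mean absolute relative error in percent, and STD as the sample standard deviation of the errors. The `regression_metrics` helper and the numbers in the example are illustrative and are not taken from the databank.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, RMSE, MSE, MRE (%), and STD for paired values.

    The definitions are assumed, common forms; the study may use variants.
    Requires at least two points and non-zero actual values.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / ss_total

    # Mean absolute relative error, expressed in percent.
    mre = 100.0 / n * sum(abs(e) / abs(a) for e, a in zip(errors, actual))

    # Sample standard deviation of the errors.
    mean_error = sum(errors) / n
    std = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))

    return {"R2": r2, "RMSE": rmse, "MSE": mse, "MRE": mre, "STD": std}


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
for name, value in regression_metrics(actual, predicted).items():
    print(name, value)
```

Reporting these indexes separately for the training and testing sets makes it possible to check that a model has not simply memorized the training points.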

Conclusions
The main aim of the present work is the development of innovative and accurate methods for estimating the solubility of methane in pure water systems over extensive ranges of pressure and temperature. These methods have been constructed on the basis of the ANFIS and DT algorithms by using 470 methane solubility points. This databank has been used to determine the optimum parameters of the ANFIS and DT algorithms in the training step and to evaluate the performance of the suggested algorithms on unseen methane solubility points. For the ANFIS model, the most accurate method, the determined R2, RMSE, MSE, MRE, and STD are 1, 0.001, 4.33045E-08, 4.571, and 0.0002, respectively. In addition, the results of three other models from the literature have been compared with those of the ANFIS algorithm. This comparison shows that the ANFIS model is the best of the tested tools for estimating methane solubility in aqueous systems. Given these results, the present algorithms are useful tools for chemical engineers working in the natural gas industry.

Data Availability
Data are included within the manuscript.

Conflicts of Interest
The authors declare that they have no conflicts of interest.