Analysis of Environmental Controls on the Quasi-Ocean and Ocean CO2 Concentration by Two Intelligent Algorithms

School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
School of Electrical and Electronics Engineering, Shanghai Institute of Technology, Shanghai 201418, China
State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China
University of Chinese Academy of Sciences, Beijing 100049, China
Sino-Belgian Joint Laboratory of Geo-Information, Urumqi 830011, China
CAS Research Centre for Ecology and Environment of Central Asia, Urumqi 830011, China


Introduction
During the past decades, emissions of carbon dioxide (CO2) from fossil fuel combustion and deforestation have rapidly increased the atmospheric concentration of CO2 [1][2][3][4][5]. This has potentially contributed to global warming and reduced the pH of the oceans [6,7]. The increase in atmospheric CO2 concentration will also result in a decrease in marine biodiversity and changes in the structure of phytoplankton communities [8]. Scientists around the world are making great efforts to better control the atmospheric CO2 concentration, and the technology of CO2 sequestration has kept developing in recent years [9,10].
Benefiting from the rapid development of communications, sensing, and automation technology, a series of electrochemical and semiconductor sensors have been manufactured and applied for real-time measurement of atmospheric CO2 concentration [11].
A monitoring system for above-ground CO2 concentration was also developed based on mid-infrared absorption spectroscopy and has been widely applied [12]. However, observations of below-ground CO2 concentration remain scarce [13].
Some previous studies have demonstrated CO2 absorption by soils in arid regions, where the absorbed CO2 is conjectured to have been largely sequestered in the "subterranean ocean", that is, groundwater [12][13][14][15]. Hence, interpreting the environmental controls of both the above-ground and the below-ground CO2 concentration is essential for understanding the evolution of atmospheric CO2 concentration [16]. Considering the physical and chemical properties of soils in arid regions, many researchers have tried to analyze the important environmental factors that affect the above-ground CO2 concentration, but the environmental controls of below-ground CO2 concentration are still poorly understood [11][12][13][14][15][16].
Taking subterranean runoff into account, there is a strong relationship between the ocean and groundwater. Furthermore, CO2 uptake in the ocean and in groundwater is similar. Consequently, we are motivated to analyze and compare the environmental controls of ocean CO2 concentration (surface ocean pCO2) and quasi-ocean CO2 concentration (deep-soil pCO2, i.e., the underground high-humidity pCO2).
The objectives of this study are as follows: (1) to present a better understanding of environmental controls on the below-ground CO2 concentration and (2) to find latent factors that have not yet been taken into consideration. For the convenience of the problem formulation and theoretical analyses, we utilized two algorithms: partial least squares regression (PLSR, representing a linear approach) and the artificial neural network (ANN, representing a nonlinear approach).

PLSR and ANN.
To investigate the relationships, both linear and nonlinear, between the two dependent variables (ocean CO2 concentration and quasi-ocean CO2 concentration) and the 10 independent variables, we use two different algorithms: partial least squares regression (PLSR) and the artificial neural network (ANN). As a regression modeling method that deals with two groups of multiple correlated variables, PLSR combines the strengths of multivariate linear regression, principal component analysis, and canonical correlation analysis. Besides discovering the linear relationships between two groups of multiple correlated variables, PLSR can also be used for feature selection for ANN.
It is well known that ANN is a "black-box model" [17][18][19][20][21]. Even when its predictions are better, they are hard to explain. PLSR can compensate for this drawback of ANN because of its strong explanatory ability. The combination of PLSR and ANN can therefore bring about a more comprehensive exploration of the problem under consideration.
The implementation steps of PLSR are shown in Figure 1. ANN is a system composed of a large number of interconnected processing units, which processes information in a nonlinear and adaptive way. It is commonly described by four characteristics: it is nonlinear, nonlimiting, nonqualitative, and nonconvex. In the present study, ANN is used to explore nonlinear relationships because ANN performs better than PLSR on nonlinear data. The implementation steps of the artificial neural network are shown in Figure 2.
We employed Python 3.7 to implement PLSR and ANN, where the corresponding functions are PLSRegression() and MLPRegressor(), respectively. Some important tuned parameters of these two functions are listed in Table 1.
Two groups of experiments were conducted (in the first, quasi-ocean CO2 concentration is the dependent variable; in the second, ocean CO2 concentration is the dependent variable), where the important parameter "n_components" is tuned from 0 to 10 in each group. In each experiment, we split the dataset into a training set (1/3 of the dataset) and a testing set (2/3 of the dataset).
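The data split described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the helper name split_indices, the random shuffling, and the seed are hypothetical. Because scikit-learn's PLSRegression expects n_components of at least 1, the candidate list in the sketch starts at 1 rather than 0.

```python
import random


def split_indices(n_samples, train_fraction=1 / 3, seed=0):
    """Shuffle sample indices and split them into training and testing parts.

    train_fraction=1/3 mirrors the split stated in the text; the shuffling
    and the seed are assumptions made for this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# Candidate values for n_components (assumption: start at 1, not 0).
candidate_components = list(range(1, 11))

train_idx, test_idx = split_indices(30)
print(len(train_idx), "training samples,", len(test_idx), "testing samples")
```

Each candidate value of n_components would then be fitted on the training indices and scored on the testing indices.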
The performance of PLSR and ANN is evaluated by RMSE, RPD, and R². Let C* and C′ be the measured and predicted values, respectively, and let n be the number of observations. RPD is defined as the standard deviation of prediction (SDP) over RMSE, as shown in Table 2.
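Since Table 2 is not reproduced here, the three metrics can be sketched as follows under the usual textbook definitions. Two points are assumptions of this sketch: SDP is taken as the sample standard deviation (n - 1 denominator) of the measured values, and R² is computed as one minus the ratio of the residual to the total sum of squares.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_measured = statistics.fmean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Standard deviation of the measured values (assumed SDP) over RMSE."""
    return statistics.stdev(measured) / rmse(measured, predicted)


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(rmse(measured, predicted))       # 0.5
print(r_squared(measured, predicted))  # 0.8
print(rpd(measured, predicted))
```

With this definition, R² becomes negative whenever a model predicts worse than the mean of the measured values, which is possible on a testing set.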
Finally, for each PLSR and ANN model in the present study, if the RPD is greater than 2, we conclude that the model has a good ability for prediction; if the RPD is less than 1.4, we conclude that the model is unable to make good estimates. This rule has been proposed and widely recognized in previous studies [22,23].
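The decision rule above can be written as a small helper. The function name classify_rpd and the label "intermediate" for values between 1.4 and 2 are assumptions of this sketch; the text only specifies the two outer cases.

```python
def classify_rpd(rpd_value):
    """Map an RPD value to the qualitative rating used in the text."""
    if rpd_value > 2.0:
        return "good"          # good ability for prediction
    if rpd_value < 1.4:
        return "poor"          # unable to make good estimates
    return "intermediate"      # not rated in the text (assumed label)


for value in (1.0, 1.7, 2.5):
    print(value, classify_rpd(value))
```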

Performance of PLSR and ANN with regard to Quasi-Ocean CO2 Concentration.
As can be seen from Figure 3, the RPD values are all less than 1.4 whichever number of factors we choose. In other words, we cannot get effective information about the crucial factors controlling the quasi-ocean CO2 concentration from the PLSR models. Furthermore, with an R² of only 12.5% for predicting the quasi-ocean CO2 concentration, we concluded that the PLSR model is not a good predictor.
Considering that the PLSR algorithm can only verify the linear correlation between independent and dependent variables, we train ANN models to explore the possible nonlinear correlation between the environmental factors and CO2 concentration.

Mathematical Problems in Engineering
For ANN models of quasi-ocean CO2 concentration (Figure 4), we added one independent variable at a time to the model. R² presents an overall upward trend as more factors are taken into consideration. However, the maximum R² value is about 40%, which is relatively low for precise prediction. This means that the nonlinear relationship between quasi-ocean CO2 concentration and the 10 environmental variables is also weak, and the changes in quasi-ocean CO2 concentration cannot be explained by these environmental factors.
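The procedure of adding one independent variable at a time and tracking R² can be sketched as below. To keep the example deterministic, an ordinary least-squares fit is used as a stand-in for the ANN; this is a substitution made for illustration only, and any callable with the same signature (for example one wrapping an ANN) could be passed instead. The function names and the synthetic data are assumptions.

```python
import numpy as np


def r2(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model (not an ANN): linear least squares with an intercept."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def incremental_r2(X_train, y_train, X_test, y_test, fit_predict):
    """Refit with the first k variables for k = 1..p and record the test R²."""
    scores = []
    for k in range(1, X_train.shape[1] + 1):
        predicted = fit_predict(X_train[:, :k], y_train, X_test[:, :k])
        scores.append(float(r2(y_test, predicted)))
    return scores


# Synthetic data (assumption): the last variable carries most of the signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 0.5 * X[:, 0] + 3.0 * X[:, 3] + 0.1 * rng.normal(size=120)

# First third for training, remaining two thirds for testing.
scores = incremental_r2(X[:40], y[:40], X[40:], y[40:], least_squares_fit_predict)
print(scores)
```

A jump in the recorded R² when a particular variable enters the model is the kind of signal used to single out influential factors.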

Performance of PLSR and ANN with regard to Ocean CO2 Concentration.
In contrast to the performance of PLSR on the quasi-ocean CO2 data, Figure 5 shows that all the RPD values are over 1.4. The positive indicator RPD increases sharply while the negative indicator RMSE decreases sharply, implying that the PLSR model has a good predictive capability. Another positive indicator, R², reaches almost 100% (99.7%), showing that the linear relationship between ocean CO2 concentration and the considered environmental variables is strong.
Figure 1: The description of the PLSR algorithm procedure.
Next, we train ANN models to explore the possible nonlinear correlation between the environmental factors and ocean CO2 concentration. As in training the ANN models for quasi-ocean CO2 concentration, we added one independent variable at a time to the model. As shown in Figure 6, although R² of the first five ANN models we trained is less than zero, once variables that are highly linearly correlated with the ocean CO2 concentration are included in the model, R² remains at a high level, even reaching up to 100%. These results show that the variables considered, especially those that have a strong linear relationship with the ocean CO2 concentration, also have a strong nonlinear relationship with the mechanism of ocean CO2 concentration changes.

Discussion
Both the PLSR and ANN algorithms have been developed through their interdisciplinary applications. Bong [27]. Beyond its wide range of applications, the ANN algorithm can also be implemented in multiple forms. The ANN model can also be implemented using MATLAB and FPGA [28]. Since ANN is a "black-box model," PLSR models were also studied for their strong explanatory ability. It is true that ANN models are much more complicated than PLSR models (more parameters need to be determined in ANN models than in PLSR models), but it is also true that ANN models perform better than PLSR models. Farifteh et al. concluded that ANN is superior to PLSR in predicting salt concentration [29]. The same conclusion, that ANN is a better predictor than PLSR, was drawn by Xu et al. [30]. The dynamics of ocean CO2 concentration have attracted considerable attention in previous studies, and the models and methods employed there are also relevant to the present study. To date, various machine learning methods, including MLR, MNR, PCR, decision tree, SVMs, MPNN, and RFRE, have been used to estimate surface ocean pCO2 with a total R² of about 0.95 [31], whereas PLSR and ANN performed better in our study, with R² reaching 0.997 and 0.982, respectively. Among the machine learning methods mentioned above, RFRE proved to be the best approach [32].
In the process of training an ANN model, we found that when the factor "groundwater level" was added to the model, R² increased more markedly (by about 10%) than when other factors were added. From this phenomenon, we conjecture that groundwater level, along with fCO2, HCO3, and CO3 in the groundwater, may be significant environmental controls of quasi-ocean CO2 concentration. If this is true, groundwater discharge/recharge is a significant modulator of soil CO2 absorption in arid regions.
To further improve the robustness of the PLSR and ANN models, we should, on the one hand, collect data on additional environmental variables and take into account the evolution of groundwater environments [33]. On the other hand, we can conduct a reliability analysis of the employed models and methods using Monte Carlo simulation [34]. Last but not least, many improvements can be made to both PLSR and ANN. In addition to the application of ANN as a single model, ANN can also work well with other algorithms. Hadi et al. combined ANN with MLP [20,23]. ANN was also applied together with Molecular Dynamic (MD) [21,35]. Hervice et al. found that the proposed optimal ANN model usually had higher prediction accuracy [36]. Dynamic changes were also described by the dynamic model-based ANN algorithm [28]. In some previous studies, the SVM algorithm has already been combined with PLSR and ANN, respectively [37][38][39][40][41][42][43][44][45][46][47][48]. Until the mechanism of change in underground CO2 concentration is fully understood, all the above regression methods are reasonable based on current knowledge. In this sense, further improvement of the models and methods also requires a better understanding of the underlying mechanisms of CO2 absorption by saline-alkaline soils.

Conclusion
Taking an overview of the performance of all the above models, we can conclude that the environmental controls of quasi-ocean CO2 concentration are still poorly understood. However, the good performance of PLSR and ANN in predicting ocean CO2 concentration reveals much useful information. The ten environmental variables we took into consideration could not explain the changes in quasi-ocean CO2 concentration well. The next research priority is to investigate the influences of the groundwater level and groundwater chemical properties on the dynamics of the quasi-ocean CO2 concentration.

Data Availability
The data used to support the theory and models of the present study are available from the corresponding authors upon request. The data set related to ocean CO2 is from the MORB database PetDB (http://www.erathchem.org/petdb). The other data were collected in the authors' project.