Research on Fusing Multisatellite Soil Moisture Data Based on Bayesian Model Averaging

Soil moisture (SM) is an important physical quantity that reflects the land surface condition. There are many ways to measure SM; satellite microwave remote sensing is now considered the primary method because it can provide real-time high-resolution data. However, SM data obtained by satellite remote sensing exhibit certain deviations compared with reference data obtained from ground stations. To improve the accuracy of SM forecasts, this study proposed the use of a Bayesian model averaging (BMA) method to integrate multisatellite SM data. First, China was divided into eight regions. Then, SM data observed by satellites (FY3B, SMOS, and WINDSAT) were fused using the BMA method and a traditional averaging method. Finally, SM data were predicted using data from ground observation stations as a reference standard. Following the fusion process, three parameters (standard deviation, correlation coefficient, and root mean square deviation) were used to evaluate the fusion results, which revealed the superiority of the BMA method over the traditional averaging method.


Introduction
Soil moisture (SM) is an important parameter in surface climate research, numerical weather forecasting, hydrological forecasting, agricultural drought monitoring, and the predictions of land surface models. Therefore, obtaining high-quality SM data is important for these activities [1-3]. There are many ways to obtain SM data, such as ground-based observation and satellite microwave remote sensing [4]. Currently, because the distribution of ground observation stations is sparse, satellite microwave remote sensing has become the primary method to obtain real-time high-resolution SM data. Microwave sensors mounted on satellites can obtain large quantities of information on shallow SM. This information has been used in research on validating satellite-derived SM retrieval products, as well as in applications of land surface models and numerical forecasting models [5,6].
Since 1978, several satellite-borne active and passive microwave sensors have been launched successfully by China and other countries. For example, microwave sensors for detecting SM have been deployed on the Aqua/AMSR-E, Coriolis/WINDSAT, MetOp-A/ASCAT, SMOS/MIRAS, and FY3B/MWRI satellites. However, because of the influence of land surface roughness, vegetation, and the inversion algorithm adopted, SM data derived from satellite remote sensing often show certain deviations [7]. Therefore, how to integrate multisource remote sensing satellite data to achieve better results is a topic worth investigating. The traditional method of fusing SM data takes the average of the observational data of each satellite as the fusion result. In other words, each satellite model is assigned the same weight. However, in practice, each satellite model behaves differently in different regions [8]. Taking China as an example, SM derived by FY3B shows two extreme states: wet conditions in the humid areas of northern and southern China and dry conditions in the arid northwestern areas. SM derived by SMOS is dry in most parts of the country, and differences in its spatial distribution are not obvious. SM derived by WINDSAT is relatively dry over the whole country, but it is relatively moist in the middle and lower reaches of the Yangtze River and in the Northeast [9]. In addition, the traditional fusion method does not incorporate observational data from ground-based stations. The objective of this study was to improve the accuracy of SM forecasts using a Bayesian model averaging (BMA) method to obtain time-varying weights.
The a posteriori probability weights assigned to each sensor model in a specific period reflect the inherent uncertainty of each sensor model. Several applications appeared in the early stages of the development of the BMA method [10,11]. The first application of the BMA method was to calibrate forecast ensembles, where the sea level pressure and other weather variables obeyed Gaussian distributions [12]. In 2007, the BMA method was applied to precipitation, a variable that does not obey a Gaussian distribution [13]. In 2013, the BMA method was applied to the prediction of daily mean temperature in the Huaihe River Basin, China. A BMA probabilistic forecasting model for each site in the basin was established dynamically using the regional forecasting techniques of the TIGGE multimodel super ensemble forecasting system [14].
The average daily soil volumetric water content recorded in 2012 at 376 automatic soil moisture observation (ASM) stations was used in this study as a reference standard.
This paper mainly studies surface soil moisture data (0-10 cm). Three passive microwave remote sensing satellites (FY3B, SMOS, and WINDSAT) provide the satellite data [15,16]. SM data observed by the satellites were fused by the BMA method. The remainder of this paper is organized as follows. Section 2 introduces both the theory of the BMA method and the algorithm adopted. In Section 3, the method for evaluating the BMA model is described. Section 4 presents the original data and the SM forecast values, adopting each region as the research object. Finally, our conclusions are stated in Section 5.

Principles of the BMA Method
2.1. Bayesian Model Averaging Method. Bayesian theory provides a set of ideas based on probabilistic statistical methods applicable to the fusion of information from different sources. For two events $A$ and $B$ with $P(B) > 0$, the Bayesian formula is defined as

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

The BMA method is a statistical postprocessing method based on Bayesian theory that uses a combination of multiple statistical models to produce a prediction. If $y$ is the forecast variable (the SM data after fusion), $f = \{f_1, \ldots, f_K\}$ is the set of predictions of the $K$ possible models (the predictions of the $K$ satellites), and $y^T$ is the training data (the prior satellite remote sensing SM observational data), the BMA prediction model can be written as follows:

$$p(y \mid f_1, \ldots, f_K, y^T) = \sum_{k=1}^{K} p(f_k \mid y^T)\, p_k(y \mid f_k, y^T),$$

where $p(f_k \mid y^T)$ is the BMA weight of the $k$th satellite model ($k = 1, \ldots, K$), that is, the a posteriori probability that the prediction of the $k$th satellite model is the optimal prediction result.
Greater weights mean higher accuracy of the prediction result, and the sum of all model weights is 1. The term $p_k(y \mid f_k, y^T)$ represents the probability density function (PDF) of the predicted variable $y$ for a given sample and model condition. For each possible model, it is necessary only to consider the proportion that it occupies throughout the prediction process.
The synthetic prediction of variable $y$ by the BMA method uses the probabilities $p(f_k \mid y^T)$ as weights: the PDFs $p_k(y \mid f_k, y^T)$ of all the models are weighted to produce the probabilistic prediction of the variable. In the study of satellite remote sensing of SM data, $p_k(y \mid f_k, y^T)$ can be regarded as a normal distribution whose expectation is a simple linear function $u_k + v_k f_k$ of the single prediction result and whose variance is $\sigma^2$:

$$y \mid f_k, y^T \sim N\left(u_k + v_k f_k,\ \sigma^2\right),$$

where $u_k$ and $v_k$ can be calculated by linear regression. Therefore, writing $w_k = p(f_k \mid y^T)$, we can obtain the expectation of the BMA forecast as

$$E\left[y \mid f_1, \ldots, f_K, y^T\right] = \sum_{k=1}^{K} w_k \left(u_k + v_k f_k\right).$$

At this point, the predicted value is a definite value, which can be compared with the predicted value of a single model. We denote space and time by subscripts $s$ and $t$, such that $f_{kst}$ denotes the $k$th forecast in the ensemble for location $s$ and time $t$ [12,17]. Solving for the parameters $w_k$ and $\sigma^2$ of the above model is the key step. The parameters are estimated by maximum likelihood, and the expectation-maximization (EM) algorithm is used to find the maximum (see Section 2.2 for details of the algorithm).
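The regression and weighting steps described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' implementation: the function names `fit_bias_correction` and `bma_expectation` and all numeric values are hypothetical, and ordinary least squares is assumed for the linear regression.

```python
def fit_bias_correction(forecast, reference):
    """Ordinary least squares fit of reference ~ u + v * forecast; returns (u, v)."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_r = sum(reference) / n
    sxx = sum((f - mean_f) ** 2 for f in forecast)
    if sxx == 0:
        raise ValueError("forecast values are constant; slope is undefined")
    v = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecast, reference)) / sxx
    u = mean_r - v * mean_f
    return u, v


def bma_expectation(forecasts, weights, coeffs):
    """Deterministic BMA forecast: sum over models of w_k * (u_k + v_k * f_k)."""
    if not (len(forecasts) == len(weights) == len(coeffs)):
        raise ValueError("one forecast, weight, and (u, v) pair is needed per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("BMA weights must sum to 1")
    return sum(w * (u + v * f) for w, (u, v), f in zip(weights, coeffs, forecasts))


# Hypothetical training series for one satellite against ground observations
u1, v1 = fit_bias_correction([0.10, 0.20, 0.30], [0.15, 0.25, 0.35])

# Hypothetical forecasts, weights, and regression coefficients for three satellites
fused = bma_expectation(
    forecasts=[0.30, 0.10, 0.20],
    weights=[0.5, 0.2, 0.3],
    coeffs=[(0.0, 1.0), (0.05, 1.0), (0.0, 1.0)],
)
```

In this sketch the fused value is a single number per site and time, which is what allows a direct comparison with each single-satellite estimate.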

2.2. Parameter Estimation. The EM algorithm is a method for obtaining maximum likelihood estimates of parameters [18,19]. Raftery et al. [12] proposed using the EM algorithm to estimate the weights and the variance for the case where the forecast variable obeys a normal distribution. The EM algorithm is iterative, and it alternates between two steps: the E (or expectation) step and the M (or maximization) step [20]. Before the two steps are applied, the weights and the variance are initialized.

Advances in Meteorology
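The log-likelihood function of the BMA model is

$$\ell\left(w_1, \ldots, w_K, \sigma^2\right) = \sum_{s,t} \log\left(\sum_{k=1}^{K} w_k\, g\left(y_{st} \mid \tilde{f}_{kst}, \sigma^2\right)\right),$$

where $\tilde{f}_{kst} = u_k + v_k f_{kst}$ is the bias-corrected forecast of model $k$ at location $s$ and time $t$, and $g(\cdot \mid \tilde{f}_{kst}, \sigma^2)$ is the normal PDF with mean $\tilde{f}_{kst}$ and variance $\sigma^2$.

E step: at iteration $j$, using the parameter values from iteration $j - 1$, calculate

$$\hat{z}_{kst}^{(j)} = \frac{w_k^{(j-1)}\, g\left(y_{st} \mid \tilde{f}_{kst}, \sigma^{2(j-1)}\right)}{\sum_{i=1}^{K} w_i^{(j-1)}\, g\left(y_{st} \mid \tilde{f}_{ist}, \sigma^{2(j-1)}\right)}.$$

M step: the weight value is

$$w_k^{(j)} = \frac{1}{n} \sum_{s,t} \hat{z}_{kst}^{(j)},$$

where $n$ is the number of training observations. We can then obtain the variance

$$\sigma^{2(j)} = \frac{1}{n} \sum_{s,t} \sum_{k=1}^{K} \hat{z}_{kst}^{(j)} \left(y_{st} - \tilde{f}_{kst}\right)^2.$$

The above steps are repeated, with $j$ replaced by $j + 1$ each time, so that the parameter values are progressively optimized. The iteration continues until convergence is achieved.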
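The E and M steps described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the equal starting weights, the initial variance (mean squared error over all models), the convergence tolerance, and the synthetic data are all assumptions, and the forecasts are taken to be already bias-corrected.

```python
import math
import random


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_em(obs, forecasts, max_iter=500, tol=1e-8):
    """Estimate BMA weights and a common variance with the EM algorithm.

    obs: list of n reference values; forecasts: K lists of n bias-corrected forecasts.
    Returns (weights, sigma2).
    """
    K, n = len(forecasts), len(obs)
    weights = [1.0 / K] * K  # assumption: equal initial weights
    # assumption: initial variance is the mean squared error over all models
    sigma2 = sum((obs[i] - f[i]) ** 2 for f in forecasts for i in range(n)) / (K * n)
    sigma2 = max(sigma2, 1e-12)
    prev_ll = -math.inf
    for _ in range(max_iter):
        sigma = math.sqrt(sigma2)
        # E step: responsibility of each model for each observation
        z = [[0.0] * n for _ in range(K)]
        ll = 0.0
        for i in range(n):
            dens = [weights[k] * normal_pdf(obs[i], forecasts[k][i], sigma) for k in range(K)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            ll += math.log(total)
            for k in range(K):
                z[k][i] = dens[k] / total
        # M step: update the weights, then the variance
        weights = [sum(z[k]) / n for k in range(K)]
        sigma2 = sum(
            z[k][i] * (obs[i] - forecasts[k][i]) ** 2 for k in range(K) for i in range(n)
        ) / n
        sigma2 = max(sigma2, 1e-12)
        if abs(ll - prev_ll) < tol:  # stop when the log-likelihood stabilizes
            break
        prev_ll = ll
    return weights, sigma2


# Synthetic example: one accurate "satellite" and two noisier ones
rng = random.Random(0)
truth = [0.15 + 0.2 * rng.random() for _ in range(200)]
good = [y + rng.gauss(0.0, 0.01) for y in truth]
noisy_a = [y + rng.gauss(0.0, 0.08) for y in truth]
noisy_b = [y + rng.gauss(0.0, 0.08) for y in truth]
weights, sigma2 = bma_em(truth, [good, noisy_a, noisy_b])
```

In this synthetic setting the model with the smallest errors is expected to receive the largest weight, which mirrors the interpretation of the weights given in Section 2.1.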

Evaluation Criteria
The SM data obtained from the automatic ground observation stations were used as the reference standard. The multisatellite SM data were recorded as $x_i$, the mean value of which was $\bar{x}$, and the calibrated SM data were recorded as $y_i$, the mean value of which was $\bar{y}$ ($i = 1, \ldots, N$). To evaluate the SM data, we introduce three parameters to measure the calibration results: the standard deviation (SD), the correlation coefficient (R), and the root mean square deviation (RMSD), using the SM data from the ground observation stations as the reference standard. For ground-observed SM data and satellite-derived SM data, the SD has the following formula:

$$\mathrm{SD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}.$$

The value of R between the corrected SM data and the reference SM data can be calculated as follows:

$$R = \frac{\sum_{i=1}^{N} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}\ \sqrt{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}},$$

and the RMSD can be calculated as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[\left(x_i - \bar{x}\right) - \left(y_i - \bar{y}\right)\right]^2}.$$

A smaller SD or RMSD, or an R closer to 1, indicates a better result. A Taylor diagram can characterize these three parameter values and illustrate the results more clearly (Section 4).
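The three evaluation statistics described above can be computed with a few lines of Python. The sketch below is illustrative only: the function name `taylor_stats` is hypothetical, and it assumes a 1/N normalization and the centered form of the RMSD that is used on Taylor diagrams.

```python
import math


def taylor_stats(x, y):
    """Return (SD of x, SD of y, correlation R, centered RMSD) for two equal-length series."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]  # anomalies of x about its mean
    dy = [b - mean_y for b in y]  # anomalies of y about its mean
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    r = sum(a * b for a, b in zip(dx, dy)) / (n * sd_x * sd_y)
    rmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dx, dy)) / n)
    return sd_x, sd_y, r, rmsd


# Toy series chosen only to make the arithmetic easy to follow
sd_x, sd_y, r, rmsd = taylor_stats([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

For this toy input the two series are perfectly correlated, so R equals 1 and the centered RMSD reduces to the difference between the two standard deviations.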

Regional SM Data Fusion Results
Some studies have taken the SM data of each province as the research object. However, it is more appropriate to divide the overall area into several regions based on the characteristics of drought and flooding. Zhu [21] used the rotated empirical orthogonal function to divide the eastern and western regions of China into seven drought and flood areas. In this study, China was divided into eight regions, each of which was covered equally by the three satellites. In the case of Northeast China, for example, we randomly selected 9/10 of all the data for training purposes. According to the Bayesian model, the EM algorithm was used to derive the weights of each satellite; the remaining 1/10 of the data were then fused using the BMA method and the traditional averaging method, and Taylor diagrams were drawn for comparison. Figures 1 and 2 depict the experimental results of the training for the Northeast region.
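The random 9/10 training and 1/10 evaluation split described above can be sketched as follows. The function name, the fixed seed, and the sample count are assumptions made for illustration; the sample count of 376 is borrowed from the nationwide number of ASM sites only to have a concrete number, not because it is the size of the Northeast data set.

```python
import random


def split_train_test(n_samples, train_fraction=0.9, seed=0):
    """Randomly split sample indices into training and evaluation subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return sorted(indices[:cut]), sorted(indices[cut:])


# Illustrative sample count only
train_idx, test_idx = split_train_test(376)
```

The weights would then be estimated on `train_idx` only, and the fused values compared with the ground observations on `test_idx`.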
Figure 1 describes the distribution of raw data in Northeast China, including ground observation data and the three satellite observations. The horizontal axis represents the distribution of the sites, and the vertical axis indicates the value of the soil moisture data. The trend of each curve shows that the soil moisture has strong spatial variability. It can be seen from the figure that the trend of the WINDSAT satellite is the closest to the ground observation data (ASM), whereas the FY3B and SMOS curves are far from the ASM values. Figure 2 illustrates the PDF curves of the satellite observation data and of the product obtained after integration by the BMA method, where the abscissa represents the soil moisture and the ordinate represents the soil volumetric water content of the data. From the PDF curves, it is evident that WINDSAT is the closest of the three satellites to the fusion results and that WINDSAT has the highest weight. The weights obtained for the three satellites in the Northeast region were 0.268 (FY3B), 0.211 (SMOS), and 0.521 (WINDSAT).
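To make the difference between the two fusion rules concrete, the following Python sketch applies the Northeast weights reported above and the equal weights of the traditional averaging method to one set of satellite values. The sample SM values and the function name `fuse` are hypothetical, and the bias-correction step is omitted here for simplicity.

```python
# Weights reported in the text for the Northeast region
BMA_WEIGHTS = {"FY3B": 0.268, "SMOS": 0.211, "WINDSAT": 0.521}


def fuse(values, weights=None):
    """Weighted mean of per-satellite SM values; equal weights when none are given."""
    if weights is None:
        weights = {name: 1.0 / len(values) for name in values}
    if set(weights) != set(values):
        raise ValueError("weights and values must cover the same satellites")
    total = sum(weights.values())
    return sum(weights[name] * values[name] for name in values) / total


# Hypothetical volumetric SM values at one site on one day
sample = {"FY3B": 0.30, "SMOS": 0.12, "WINDSAT": 0.22}
bma_value = fuse(sample, BMA_WEIGHTS)  # weighted toward WINDSAT
aver_value = fuse(sample)  # traditional equal-weight average
```

Because WINDSAT carries more than half of the total weight, the BMA value is pulled toward the WINDSAT observation, whereas the average treats the three satellites identically.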
To evaluate the quality of the results obtained by the fusion method, we used the parameters SD, R, and RMSD to measure the system deviation and plotted them on a Taylor diagram for an intuitive representation. In the Taylor diagrams, the distance from the origin (the radius) represents the SD of the data. The angular position along the arc represents the value of R between the point and the reference data. For arcs drawn with the reference point as their center, the radius of the arc on which a point lies represents the RMSD of that point. In the process of drawing the Taylor diagrams, some difficulties were encountered; however, the results were finally optimized using quality control.
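The geometry described above rests on a law-of-cosines relation among the three statistics, stated here as a reminder under the assumption that the RMSD is the centered form (means removed). The symbols $\mathrm{SD}_f$ (evaluated data set) and $\mathrm{SD}_r$ (ASM reference) are notation introduced for this note:

$$\mathrm{RMSD}^2 = \mathrm{SD}_f^2 + \mathrm{SD}_r^2 - 2\,\mathrm{SD}_f\,\mathrm{SD}_r\,R.$$

If a data set is plotted at radius $\mathrm{SD}_f$ and polar angle $\arccos R$, and the reference is plotted on the horizontal axis at radius $\mathrm{SD}_r$, the distance between the two points therefore equals the RMSD, which is why a single point can display all three quantities at once.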
Figures 3-5 present Taylor diagrams for Northeast China, northern North China, and the region of the middle and lower reaches of the Yangtze River, respectively. In these figures, the six points plotted in the Taylor diagrams are, respectively, the automatic ground-based observational data (ASM), the observational data from the three satellites (FY3B, SMOS, and WINDSAT), the fusion results obtained by means of the traditional averaging method (AVER), and the BMA results. As can be seen from Figure 3, the SD of the WINDSAT satellite in Northeast China is about 0.07, R is more than 0.6, and the RMSD is about 0.06. These three evaluation indexes are all superior to those of the SMOS and FY3B satellites, which is consistent with the previous results. When the multisatellite data are fused, the averaging method gives an SD of about 0.075, an R of 0.6, and an RMSD of about 0.06, whereas the BMA method gives an SD of 0.06, an R of 0.7, and an RMSD of 0.05. All three parameters are thus improved by the BMA method. The experimental results show that the BMA method improves the accuracy of the prediction compared with the averaging method and has a better fusion effect.
In Figure 4, the three parameters (SD, R, and RMSD) of the WINDSAT satellite in northern North China are superior to those of the SMOS and FY3B satellites. When the multisatellite data are fused, the SD of the fusion results of the averaging method is larger than that of the BMA method, R is approximately equal to that of the BMA method, and the RMSD is greater than that of the BMA method. The BMA method therefore still gives the better fusion result. From Figure 5, we can see that the three evaluation indexes of the satellite data are similar to those of the above two regions and that the results of the traditional averaging method and the BMA method are similar, although the BMA method is still slightly better. However, environmental and other factors led to insufficient data for the Northwest and Southwest regions, which produced unsatisfactory results. These areas have high altitude and poor weather, and meteorological stations there are scarce. This will be investigated in future research.
To compare the overall performance of the BMA method and the traditional averaging method in different regions, we present the results of multiple regions in a single Taylor diagram. Figure 6 shows the Taylor diagram for five regions. Regions 1 to 5 are, respectively, Northeast China, northern North China, southern North China, the middle and lower reaches of the Yangtze River, and eastern Northwest China. In the figure, 1FY represents the result of FY3B in region 1; correspondingly, 1SM stands for the result of SMOS in region 1, and 1WI is the result of WINDSAT in region 1. From the figure, we can see that the points labeled BMA are located below the AVER points. This means that the SD and RMSD of the fusion results of the averaging method are greater than those of the BMA method. Furthermore, the R of the fusion results of the BMA method is closer to 1. The BMA method is therefore better than the traditional averaging method in the five regions.
Table 1 shows a comparison of the fusion characteristics based on the regional division. The first three columns of the table give the previous regional division, the regional division used in this paper, and the corresponding latitude and longitude of each region. The next two columns give the number of fused data points and the number of sites at which the BMA fusion performs better than the averaging method. The last column gives the proportion of sites at which the BMA method gives the better result. This proportion is 78.96%, 57.33%, and 92.46% in the Northeast, eastern Northwest, and southern Southwest regions, respectively, whereas it is <50% in all other regions. An inspection of the satellite weights in the regions with values <50% showed that each BMA weight is about 1/3, similar to the weight used by the averaging method. The small amounts of data obtained for these regions because of environmental and other factors account for the poor results. In spite of this, the overall result of the BMA fusion is clearly superior to that of the traditional averaging method.

Conclusions
This research used the BMA method for the fusion of multisatellite microwave remote sensing SM data in order to improve the prediction of SM data at unmeasured points. The Taylor diagrams show that the fusion results obtained using the BMA method in different regions are better than those of the traditional averaging method. Across regions, the proportion of sites at which the BMA fusion performs better is 78.96% in the Northeast region, 57.33% in the eastern Northwest region, and 92.46% in the southern Southwest region. We therefore draw the following conclusion: in the prediction of SM data, the use of the BMA method for the fusion of SM data not only accounts for the uncertainty of the model but also improves the accuracy of the predicted value.
Although the evaluation parameters were improved, it has not been clarified how the forecast could be made more accurate. In future studies, we will use additional ground-based SM observation stations and more sophisticated equipment to obtain further SM data to augment the original database.

Figure 2: Northeast region satellite data and product of the BMA method after integration of the PDF curve.

Figure 3: Taylor diagram for Northeast China.

Table 1: Comparison of regional divisions and fusion characteristics.