Spatial Downscaling of GPM Annual and Monthly Precipitation Using Regression-Based Algorithms in a Mountainous Area

As a fundamental component in material and energy circulation, precipitation with high resolution and accuracy is of great significance for hydrological, meteorological, and ecological studies. Since satellite measured precipitation is often too coarse for practical applications, it is essential to develop spatial downscaling algorithms. In this study, we investigated two downscaling algorithms based on the Multiple Linear Regression (MLR) and the Geographically Weighted Regression (GWR), respectively. They were employed to downscale annual and monthly precipitation obtained from the Global Precipitation Measurement (GPM) Mission in HengduanMountains, Southwestern China, from 10 km × 10 km to 1 km × 1 km. Ground observations were then used to validate the accuracy of downscaled precipitation. The results showed that (1) GWR performed much better than MLR to regress precipitation on Normalized Difference Vegetation Index (NDVI) and Digital Elevation Model (DEM); (2) coefficients of GWR models showed strong spatial nonstationarity, but the spatial mean standardized coefficients were very similar to standardized coefficients of MLR in terms of intra-annual patterns: generally NDVI was positively related to precipitation when monthly precipitation was under 166mm; DEM was negatively related to precipitation, especially in wet months like July and August; contribution of DEM to precipitation was greater than that of NDVI; (3) residuals’ correction was indispensable for the MLRbased algorithm but should be removed from the GWR-based algorithm; (4) the GWR-based algorithm rather than theMLR-based algorithm produced more accurate precipitation than original GPM precipitation.These results indicated that GWR is a promising method in satellite precipitation downscaling researches and needed to be further studied.


Introduction
As a fundamental component in material and energy circulation, precipitation is of great significance for hydrological, meteorological, and ecological studies (e.g., [1][2][3]).Traditionally, spatial distribution of precipitation is obtained through rain gauge data interpolation [4].However, interpolation methods are greatly limited over mountainous regions due to the sparse rain gauge network [5,6].Satellite precipitation datasets based on remote sensing technology have developed rapidly since the 1980s, such as the Global Precipitation Climatology Project (GPCP, [7]), the Tropical rainfall Measuring Mission (TRMM, [8,9]), the Global Satellite Mapping of Precipitation (GSMaP, [10]), and the Global Precipitation Measurement (GPM) mission [11][12][13].Gridded satellite precipitation datasets provide reliable estimations of precipitation reflecting more spatial distribution than rain gauge data.However, for regional scale applications, they are often too coarse to be used in hydrological, meteorological, or ecological studies [14,15].Taking TRMM as an example, its spatial resolution is only 0.25 ∘ × 0.25 ∘ .Therefore, it is essential to develop downscaling algorithms for satellite datasets to improve their resolutions as well as accuracy.
Relationships between precipitation and other environmental factors such as vegetation and topography have been widely investigated in the literature [16][17][18][19][20][21].Meanwhile, available resolutions of gridded data of these factors are often higher than those of satellite precipitation products.

Advances in Meteorology
For instance, the SPOT Normalized Difference Vegetation Index (NDVI) is available at 1 km × 1 km resolution.Therefore, it is a feasible approach to downscale precipitation through establishing statistical models of precipitation and these factors, and this is termed as statistical downscaling algorithms.Their fundamental statistical theories could be classified into regression analysis [14,15,[22][23][24] and machine learning methods such as Artificial Neural Network [25,26] and random forests [27][28][29].
In this article, regression-based downscaling algorithms were mainly concerned and investigated.Immerzeel et al. [23] established exponential regression (ER) models between annual TRMM precipitation and NDVI at different spatial resolution from 0.25 ∘ to 1.50 ∘ , among which who with the optimal coefficient of determination ( 2 ) was eventually selected to downscale the TRMM precipitation from 0.25 ∘ to 1 km.Jia et al. [14] introduced both NDVI and Digital Elevation Model (DEM) as the explanatory variables in a Multiple Linear Regression (MLR) model.Independent station validation in the two studies and later studies [15,22,24,30] showed that the downscaled precipitation based on ER or MLR after residual correction was comparable with the original TRMM product but at much improved resolution.
Nevertheless, there are several limitations of the above downscaling algorithms.Firstly, downscaled precipitation would inevitably inherent errors in the original satellite precipitation [14,23].This indicates that it is not likely to obtain accurate downscaled precipitation from inaccurate satellite datasets.Therefore, the next generation GPM product with higher quality and resolution (0.1 ∘ × 0.1 ∘ ) was used as the original satellite precipitation to partly overcome this problem.As far as our information goes, most previous studies were based on TRMM, and downscaling algorithms based on GPM product were short of research.Secondly, performance of downscaling these algorithms relies on goodness-of-fit of regression models.The above two algorithms are both globally regression models which assume that the relationships among precipitation, NDVI, and DEM are spatially stationary.Against their assumption, however, precipitation-NDVI and precipitation-DEM relationships were both reported to be spatially nonstationary [17,18,21].Brunsdont et al. [31] put forward a local regression model called Geographically Weighted Regression (GWR) to handle the problem of spatial nonstationary.GWR-based satellite precipitation downscaling algorithm was found to perform better than the two global-regression-based algorithms over various regions on TRMM product, at both annual scale and monthly scale [15,30,32].
This study aims to investigate the effectiveness of the MLR-based and GWR-based downscaling algorithms on GPM annual and monthly precipitation.To this end, NDVI and DEM were employed as the explanatory variables in the MLR-based and GWR-based downscaling algorithms to downscale GPM monthly and annual precipitation from 10 km × 10 km to 1 km × 1 km over Hengduan Mountains, Southwestern China.Their performances were validated with observed precipitation from 71 rain gauge stations.

Study Area and Datasets
2.1.Hengduan Mountains.Hengduan Mountains (24 ∘ 40  -34 ∘ 00  N; 96 ∘ 20  -104 ∘ 30  E) locates in the southwestern of China and the southeastern of Tibetan Plateau, with an area of ∼500,000 km 2 .The study region possesses a complex topography with plenty of high mountains and deep-cutting gorges which run north to south.The climate of Hengduan Mountains is mainly controlled by southwest Asian monsoon, the southeast Asian monsoon, and the winter monsoon as well as local circulation of Tibetan Plateau.Meanwhile, due to its wide range of altitude from 300 m to over 7000 m, the climate also varies vertically.Mean annual precipitation of Hengduan Mountains is about 800 mm and more than 75% comes from the summer monsoon (May to October).To weaken the influence of edge effect of the downscaling algorithms, a 100 kilometers wide buffer area was added to the study area during calculation and validation.

Precipitation Datasets.
The Global Precipitation Measurement (GPM) Mission, as a successor of the Tropical rainfall Measuring Mission (TRMM), aims to provide a new generation of global rainfall and snowfall observations.The Core Observatory satellite was launched in February 2014 by the National Aeronautics and Space Administration (NASA) and the Japan Aerospace and Exploration Agency (JAXA) carrying the GPM Microwave Imager (GMI) and the Dualfrequency Precipitation Radar (DPR).This study selected the GPM-3IMERGM product (GPM for short, downloaded from https://pmm.nasa.gov/data-access/downloads/gpm)which estimated global monthly precipitation at resolution of 0.1 ∘ × 0.1 ∘ by combining GPM with several other satellite precipitation datasets.Since GPM dataset starts from March 2014, the first year with complete months (i.e., 2015) was therefore selected to investigate GPM annual precipitation downscaling algorithms.Annual precipitation (Figure 2(a)) was obtained by summing up monthly GPM precipitation in the year.
The ground observed precipitation of 71 rain gauge stations (Figure 1) located in Hengduan Mountains and the buffer area was provided by the National Meteorological Information Center of China Meteorological Administration.Since the function of ground data was to validate the results of downscaling algorithms independently, it was not used to correct GPM precipitation.

Method
3.1.Multiple Linear Regression.Jia et al. [14] proposed a downscaling algorithm based on Multiple Linear Regression (MLR), which is expressed as where  LR , NDVI LR , and DEM LR are the GPM annual precipitation, NDVI, and elevation at low resolution (LR). NDVI and  DEM are slope of precipitation to NDVI and precipitation to DEM, respectively. 0 is the intercept, and  LR is residual of the regression model.The three coefficients are calibrated with the Ordinary Least Square (OLS) method.Standardized coefficients could be calibrated if the variables are standardized with (2) in advance.

Geographically Weighted Regression. Geographically
Weighted Regression (GWR) proposed by Brunsdont et al. [31] is a local regression method that can be used to investigate spatially nonstationary correlational relationships.Contrast to the MLR model, the GWR model assumes the coefficients to vary with geographical locations, and it is expressed as where  is the geographical location.The coefficients are as same as those in the MLR method, except that they are local rather than global coefficients and can be estimated by where  LR is matrix of the explanatory variable at low resolution composed of NDVI LR , DEM LR , and vector of constant one and   LR is transposition of  LR .() is the weight matrix that put more weights on observations closer to the center point and can be calculated with kernel functions like Gaussian kernel and bisquare kernel.Gaussian kernel (Figure 3) was adopted in this study, and () is the diagonal matrix of   () where   () is the Euclidean distance between the th observation and the center point and  is the kernel bandwidth that can be estimated with the cross validation (CV) technique.

Procedure of Downscaling Algorithms
Step 1 (data preparation).Resample annual or monthly NDVI and DEM fields to the low (10 km × 10 km) and high (1 km × 1 km) resolution.And resample the GPM annual precipitation field to 10 km × 10 km resolution.It should be noted that all the fields must be projected to the same projection coordinate system (China Albers Equal Area Conic), and fields at the same resolution should share the same raster.
Step 2 (model establishment).Using sample data at low resolution to calibrate the coefficients of regression functions in the downscaling algorithms.For GWR-based algorithm, the bandwidth  should be estimated with cross validation (CV) technique before calibration of the coefficients.
Step 3. Prediction of high resolution precipitation ( PHR ) before residuals correction: put high resolution NDVI (NDVI HR ) and DEM (DEM HR ) into the MLR models and GWR models, respectively, to predict  HR .The downscaled results before residuals correction were termed as MLRBRC and GWRBRC, respectively, in this study.
Step 4 (residuals correction).Interpolate low resolution residuals ( LR ) of the regression methods to high resolution residuals ( HR ) with simple spline tension interpolator, respectively.The final downscaled precipitation ( HR ) is calculated as And the downscaled results after residuals corrections were termed as MLRARC and GWRARC, respectively.

Validation Criteria.
Observed precipitation from rain gauges was used to validate the precipitation downscaled by the two different algorithms.In the validation, we employed three widely used criteria, namely, the Mean absolute error (MAE), the Root Mean Square Error (RMSE), and the coefficient of determination ( 2 ), which are calculated as where  is precipitation predicted by regression models or downscaling algorithms and  is observed precipitation from rain gauge stations.1.First of all,  values of all the MLR models were smaller than 0.001, which meant all the MLR models were extremely significant and therefore to be valid.However,  values of February to June were relatively larger, and the corresponding  2 was smaller than 0.06, indicating that no more than 6% of variances of GPM precipitation could be interpreted by MLR models from February to June.Among the other months, July had the highest  2 of 0.39, and annual  2 was 0.35.For annual GPM precipitation, slope of NDVI was 390.93 mm/1, which meant as NDVI increased per unit, annual precipitation would increase 390.93 mm averagely.Similarly, slope of DEM was −124.06 mm/km, which meant as DEM increased per kilometer, annual precipitation would decrease 124.06 mm averagely.For monthly precipitation, slopes of NDVI ranged from −64.37 to 75.79 mm/1 and slopes of DEM ranged from −49.16 to 0.01 mm/km.

Results and Discussion
We further calculated standardized regression coefficients of the MLR models to investigate the intra-annual variation of standardized coefficients and compare the relative contributions of NDVI and DEM.As shown in Figure 4, for monthly precipitation, standardized slopes of NDVI changed periodically for three times in the year.The first cycle was from January to June, the second was from June to October, and the third was from October to January.The peak values were positive and the valley values were slightly negative, except that the valley value reached −0.12 in August, when monthly precipitation reached its peak of 195.22 mm in the year.The reasons for such the complicated periodicity might be the combined effect of the following facts: (1) Hengduan Mountains is controlled by southwest Asian monsoon, the southeast Asian monsoon, the winter monsoon, and local circulation of Tibetan Plateau in turn and/or simultaneously; (2) there are time lags between vegetation growth withering and precipitation increase/decrease; (3) vegetation is restricted by both water and temperature during dry season, but much less restricted by water than temperature during flood season.Standardized slopes of DEM were negative and smaller than −0.2 all the year except for February, May, and June.
The relative contributions of NDVI and DEM could be estimated through comparing absolute values of their standardized coefficients.For annual precipitation, standardized slopes of NDVI were 0.20, which meant as NDVI increased per standardized unit, annual precipitation would increase 0.20 standardized units averagely.Meanwhile, standardized slopes of DEM were −0.44, which meant as DEM increased per standardized unit, annual precipitation would decrease 0.44 standardized units averagely.The relative contribution to annual precipitation of DEM was about twice of that of NDVI.For monthly precipitation, contribution of DEM was also larger than that of NDVI all the year except for January, February, and June.
Finally, we investigated relationships between standardized coefficients of NDVI and DEM in MLR models against regionally averaged GPM monthly precipitation (Figure 5).Both insignificant (p ∼= 0.20) decreasing trends were found for NDVI and DEM.Standardized coefficients of NDVI trended from positive to zero, when monthly precipitation increased below 166 mm, and trended from zero to negative when monthly precipitation increased over 166 mm.To a certain degree, this feature precipitation of 166 mm could be regarded as the water saturation point of Hengduan Mountains, if neglecting time lags between NDVI and precipitation.Standardized coefficients of DEM trended more negative as monthly precipitation increased, indicating that the stronger southwest and southeast monsoons [33] were, the stronger relationships between precipitation and DEM were.

Results of GWR Models. Geographically Weighted
Regression (MLR) models were established to explore the local relationships of annual and monthly GPM precipitation against NDVI and DEM in Hengduan Mountains at lower resolution of 10 km.As Figure 6 showed,  2 of GWR models ranged from 0.82 to 0.98, except that  2 of November was 0.65, which were much greater than those of MLR models.Obviously, GWR models were much more effective on explaining the original GPM annual and monthly precipitation than MLR models.Meantime, the intra-annual variation pattern of  2 of GWR was very similar to that of MLR, indicating that the local relationships of precipitation   against NDVI and DEM were positively related to the global ones within a year.
Unlike the MLR model, calibrated coefficients in GWR models were not spatially stationary.As an example, Figure 7 showed the spatial distribution of coefficients of the GWR model for annual GPM precipitation in Hengduan Mountains.The intercepts varied from −1666 to 4106 mm, and the spatial pattern was similar to that of annual GPM precipitation with a  2 of 0.56.The coefficients of NDVI varied from −1674 to 1581 mm/1, with no obvious spatial pattern.The coefficients of DEM varied from −657 to 435 mm/km and were slightly positively related to elevation ( 2 = 0.097).
We also calculated standardized coefficients of NDVI and DEM when establishing the GWR model of annual GPM precipitation.Like coefficients of determination, mean standardized coefficients of NDVI and DEM of the GWR models (Figure 8) showed very similar intra-annual pattern with those of the MLR models.It revealed that although coefficients in the GWR models were estimated locally, but their spatial distribution could reflect the global relationships of precipitation against NDVI and DEM.However, it should be noted that the standard deviations of standardized coefficients were quite large, emphasizing the great difference between local coefficients in the GWR model and global coefficients in the MLR model.

Downscaled Precipitation before/after Residuals Correction.
Monthly and annual precipitation at high resolution before residual correction (MLRBRC and GWRBRC) were predicted by putting the calibrated coefficients, high resolution NDVI, and DEM into the MLR and GWR models, respectively.After adding the high resolution residuals interpolated with spline tension method to MLRBRC and GWRBRC, MLRARC and GWRARC were eventually obtained.As an example, Figure 9 showed the downscaled annual precipitation before residuals correction, interpolated residuals, and the downscaled annual precipitation after residuals correction, from which a qualitative comparison of downscaled annual precipitation of the MLR-based and GWR-based algorithms was made as below.
Basically, MLRBRC annual precipitation was only able to capture the global trend that "south wetter than north" rather than the spatial distribution of original GPM, and its absolute residuals were very large, especially in extremely wet regions like the Gaoligong Mountains and middle and lower reaches of the Nu River.While GWRBRC annual precipitation distributed very similar with original GPM spatially, the former was a little smoother.Meanwhile, the range of GWRBRC (269-1703 mm) was a little wider than that of original GPM (394-1625 mm), which was reasonable for the averaging effecting of low resolution precipitation.Absolute residuals of GWRBRC were also much smaller than those of MLRBRC in most regions.As for MLRARC and GWRARC annual precipitation, they were also very similar with the original GPM like GWRBRC, so the differences among them were analyzed based on ground validation in the next two sections.To investigate whether residuals correction was helpful to MLR-based and GWR-based downscaling algorithms, we compared the accuracy of monthly (Figure 10) and annual (Table 2) precipitation downscaled by MLR-based and GWR-based algorithms between before residuals correction (termed as MLRBRC or GWRBRC) and after residuals correction (termed as MLRARC or GWRARC).Average accuracy of monthly downscaled precipitation was also summarized in Table 2.

Is Residuals Correction
Compared to MLRBRC, MLRARC had smaller RMSE and MAE as well as higher  2 in each month of the year.For monthly precipitation, averagely speaking, ΔRMSE =   Obviously, the step of residuals correction improved the accuracy of precipitation predicted by MLR significantly and hence was indispensable for MLR-based algorithm.

Advances in Meteorology
However, GWRARC had slightly greater RMSE and MAE as well as lower  2 compared to GWRBRC in each month except March of the year.For monthly precipitation, averagely speaking, ΔRMSE = 1.66 mm, ΔMAE = 1.06 mm, and Δ 2 = −0.05.And for annual precipitation in 2015, the changes were, ΔRMSE = 7.00 mm, ΔMAE = 7.12 mm, and Δ 2 = −0.03.This indicated that the step of residuals correction degraded the accuracy of precipitation predicted by the GWR models.Therefore, the step of residuals correction was recommended to be removed from the procedure of the GWR-based downscaling algorithm and as a result, GWRBRC rather than GWRARC was adopted as the final downscaled precipitation.and GWRBRC should be adopted as the final results of MLR-based and GWR-based algorithms.In this section we compared the accuracy of MLRARC, GWRBRC, and the original GPM monthly and annual precipitation.The accuracy of monthly precipitation was presented in Figure 11 and those on annual precipitation and summary of average accuracy of monthly precipitation were shown in Table 2.

Accuracy of the MLR-Based and GWR-Based
MLRARC performed better than original GPM monthly precipitation in 7 out of 12 months with RMSE as evaluation criterion and 6 out of 12 months with MAE or  2 as evaluation criteria.Averagely speaking, MLRARC had almost the same accuracy as GPM monthly precipitation, with ΔRMSE = −0.67 mm, ΔMAE = 0.43 mm, and Δ 2 = 0.00, indicating that it was no better than the Nearest Neighborhood

2. 3 .
NDVI and DEM Datasets.The MOD13A3 monthly NDVI product with a spatial resolution of 1 km × 1 km derived from atmospherically corrected reflectance in the red and near-infrared wavebands of the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Terra satellite was used in this study.Annual NDVI (Figure2(b)) was obtained by averaging monthly NDVI in the year.The Shuttle Radar Topography Mission (STRM) DEM product was provided by the National Geospatial-Intelligence Agency (NGA) and the National Aeronautics and Space Administration (NASA).The original DEM data was at resolution of 90 m × 90 m and resampled to resolution of 1 km × 1 km (Figure1) and 10 km × 10 km.

Figure 1 :Figure 2 :Figure 3 :
Figure 1: The Digital Elevation Model of Hengduan Mountains and locations of rain gauge stations.

Figure 4 :
Figure 4: Standardized coefficients of MLR models of annual and monthly GPM precipitation against NDVI and DEM in Hengduan Mountains.

4. 1 .
Results of Regression Models 4.1.1.Results of MLR Models.Multiple linear regression (MLR) models were established to explore the global relationships of annual and monthly GPM precipitation against NDVI and DEM in Hengduan Mountains at lower resolution of 10 km.The calibrated coefficients including the constant coefficients, slopes of NDVI and DEM, and the corresponding coefficients of determination and  values were shown in Table

Figure 5 :Figure 6 :
Figure 5: Relationships between standardized coefficients of (a) NDVI and (b) DEM in the MLR models against regionally averaged GPM monthly precipitation.

Figure 7 :
Figure 7: Spatial distribution of coefficients of the GWR model for annual GPM precipitation in Hengduan Mountains in 2015: (a) for the intercepts, (b) for the coefficients of NDVI, and (c) for the coefficients of DEM.

Figure 8 :
Figure 8: Regional means and standard deviations of standardized coefficients of GWR models of annual and monthly GPM precipitation against NDVI and DEM in Hengduan Mountains.
buffer area) were used to validate the accuracy of the MLRbased and GWR-based downscaling algorithms using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of determination ( 2 ).

Figure 10 :
Figure 10: Comparison of the accuracy of monthly precipitation downscaled by MLR-based and GWR-based algorithms between before residuals correction (termed as MLRBRC or GWRBRC) and after residuals correction (termed as MLRARC or GWRARC) using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination ( 2 ).

Figure 11 :
Figure 11: Comparison of the accuracy of monthly precipitation downscaled by MLR-based algorithm after residuals correction (termed as MLRARC) and GWR-based algorithm between before residuals correction (termed as GWRBRC) using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and coefficient of determination ( 2 ).

Table 1 :
Coefficients, coefficients of determination, and  values of MLR models of annual and monthly GPM precipitation against NDVI and DEM in Hengduan Mountains, at resolution

Table 2 :
Averaged accuracy for monthly precipitation and accuracy for annual precipitation of the MLR-based and GWR-based downscaling algorithms with Root Mean Square Error, Mean Absolute Error, and coefficients of determination as criteria.BRC represents before residuals correction and ARC represents after residual correction.