Validation of the Accuracy of Different Precipitation Datasets over Tianshan Mountainous Area

Precipitation is an important water supply in the arid and semiarid regions of northwestern China and plays a vital role in maintaining the fragile ecosystem. In remote mountainous areas, it is difficult to obtain an accurate and reliable spatial distribution of precipitation at the regional scale because of inaccessibility, the sparsity of observation stations, and the complexity of the relationship between precipitation and topography. Furthermore, accurate precipitation is important driving data for the hydrological models that hydrologists use to assess water balance and water resources. Satellite remote sensing has therefore become an important means of estimating precipitation over mountainous areas. Precipitation datasets based on station data or purely on satellite data have become increasingly available, in spite of several weaknesses. This paper evaluates the usefulness of three precipitation datasets, TRMM 3B43 V6, TRMM 3B43 V7, and Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE), against rain gauge data over the Tianshan mountainous area, where precipitation data are scarce. The results suggest that precipitation measurements provide accurate information only at small scales, while satellite remote sensing of precipitation has obvious advantages at the basin scale or larger, especially over remote mountainous areas.


Introduction
Mountainous areas play a critical role in maintaining water resource supply in arid and semiarid regions [1,2]. However, because of inaccessibility, the sparsity of observation stations, and the complexity of the relationship between precipitation and topography, little data has generally been collected in mountainous areas, and it is difficult to obtain an accurate and reliable spatial distribution of precipitation at the regional scale [3][4][5]. Many researchers have indicated that in arid and semiarid regions, including the northwest of China, mountainous areas contribute 40-85% of total runoff, while in subhumid areas the contribution only varies from 20% to 50% [6][7][8][9][10]. Because precipitation is an important input parameter for hydrological and ecological models, obtaining high-resolution precipitation data in remote mountainous areas is a true challenge for researchers [11,12].
In general, precipitation data is mainly obtained by rain gauge measurement, estimation, and modeling. Because of complex terrain, the density of rain gauge networks is seriously limited in mountainous areas; the gauges are poorly distributed, and installing them is practically impossible in high mountainous areas. Hence, an accurate understanding of the distribution and characteristics of mountainous precipitation is particularly difficult. In recent decades, with the development of remote sensing techniques, precipitation data with relatively uniform and consistent temporal and spatial coverage has become available in some remote regions lacking data. As a result, the use of satellite-based precipitation datasets in hydrology and meteorology has increased significantly in the past few decades [13][14][15]. However, there are always different types of error, such as inherent measurement and retrieval errors and sampling uncertainty. In order to obtain accurate data, satellite-based precipitation datasets need to be calibrated or verified by comparison with ground-based rain gauge data [16][17][18].
There are a number of precipitation datasets with distinct characteristics at different temporal and spatial scales, and they are helpful for studying global warming, the hydrological cycle, and economic activities at large scales. These datasets include the Climate Research Unit (CRU) precipitation datasets [19], the Global Precipitation Climatology Project (GPCP) One-Degree Daily dataset [20], and the CPC Merged Analysis of Precipitation (CMAP) [21]. They can be divided into two categories: one provides high spatial resolution but lacks long time series, and the other shows great stability over the long term but sacrifices temporal and spatial resolution [22,23]. The CRU precipitation datasets are derived from station measurements using thin-plate spline interpolation with up to 14500 stations and provide precipitation data for all continents except Antarctica at 0.5° × 0.5° resolution for the period 1901 to 2001. The GPCP One-Degree Daily precipitation datasets adjust the satellite estimates for bias against the gauges and then combine the multisatellite estimates with rain gauge data through inverse error-variance weighting, providing global precipitation data at 1° × 1° resolution from October 1996 onwards. The CMAP datasets merge information sources with different characteristics, including gauge observations, estimates from a variety of satellite observations, and the NCEP-NCAR reanalysis, and provide monthly global precipitation at 2.5° × 2.5° resolution from 1979 onwards. In addition, the CMAP datasets reduce random errors by linearly combining the satellite estimates using the maximum likelihood method, in which the linear combination coefficients are inversely proportional to the square of the random error of the individual sources [23].
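The inverse error-variance weighting mentioned above can be illustrated with a minimal sketch. It assumes independent, unbiased errors for each source; the function name `inverse_variance_merge` and the numbers are hypothetical and are not taken from the GPCP or CMAP processing chains.

```python
def inverse_variance_merge(estimates, error_variances):
    """Merge estimates with weights proportional to 1 / error variance.

    Assumes the errors of the individual sources are independent and
    unbiased; under that assumption the merged variance is 1 / sum(weights).
    """
    if len(estimates) != len(error_variances):
        raise ValueError("estimates and error_variances must have equal length")
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(v <= 0 for v in error_variances):
        raise ValueError("error variances must be positive")

    weights = [1.0 / v for v in error_variances]
    total_weight = sum(weights)
    merged = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    merged_variance = 1.0 / total_weight
    return merged, merged_variance


# Hypothetical monthly totals (mm) from two sources and their error variances.
merged_mm, merged_var = inverse_variance_merge([40.0, 60.0], [4.0, 16.0])
```

The source with the smaller error variance dominates the merged value, which is the intended effect of this kind of weighting.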
The Tianshan mountainous area is located in the northwest of China, stretches from west to east, and divides Xinjiang into two parts, Southern Xinjiang and Northern Xinjiang [24][25][26]. This area is situated in the hinterland of the Eurasian continent, far away from the ocean, and precipitation is the main source of surface water for river runoff, farm irrigation, and urban water consumption. The westerly airflow associated with the prevailing westerlies is the major vapor source over the Tianshan mountainous area, and the dry, cold current from the Arctic Ocean is the second source, with a vapor content equal to 25-33% of that of the westerly airflow [26]. As a result, the Tianshan mountainous area receives more precipitation in the north than in the south, more in the west than in the east, more in the mountainous area than in the plain area, and more on the windward slope than on the leeward slope [27]. The vertical zonation of mountainous precipitation is very pronounced [25].
This paper examines the similarities and discrepancies among precipitation datasets over the Tianshan mountainous area from 1998 to 2007 by comparing the monthly precipitation datasets of TRMM 3B43 V6, the new version 3B43 V7, and APHRODITE with rain gauge data. Based on the results, the applicability of these datasets for hydrological and ecological models in mountainous areas is evaluated to provide some suggestions for decision makers, including agriculturalists, emergency managers, and industrialists.

Data and Methodology
2.1. Data Collection. The Tropical Rainfall Measuring Mission (TRMM) is a joint US-Japan program to monitor tropical and subtropical rainfall; its hourly, daily, and monthly high-resolution precipitation datasets are provided by the NASA Goddard Space Flight Centre. The datasets have shown good performance in different regions around the world, including the Asian region [28][29][30]. TRMM 3B43 V6 is a monthly precipitation dataset with a spatial resolution of 0.25° × 0.25° covering 50°N-50°S from 1998 onwards [31]. This dataset combines the estimates generated by TRMM 3B42 with other satellite datasets and is calibrated against globally gridded monthly rain gauge data to scale the estimates. On May 22, 2012, the new version of TRMM, 3B43 V7, was implemented, and the functions of TRMM were further enhanced. Notably, although the TRMM algorithm ensures that every grid box has the best possible estimate, it can also result in statistically heterogeneous datasets [32].
APHRODITE precipitation datasets are generated by the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) of Water Resources project, which uses a modified version of the distance-weighting interpolation method to interpolate rain gauge observations from meteorological stations [33]. The method interpolates the ratio of daily precipitation to the daily precipitation climatology at a resolution of 0.05 degrees and then multiplies each gridded ratio by the corresponding gridded climatology value day by day. Since 2008, APHRODITE has released several daily precipitation datasets over Monsoon Asia (MA), the Middle East (ME), and Russia (RU) at resolutions of 0.25° × 0.25° and 0.5° × 0.5°, which have been considered "ground-truth" precipitation datasets throughout the region [34]. Among them, APHRODITE V1003R1 is a daily dataset released in 2010, and the monthly data used in this research are accumulated from the daily data of this dataset.
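The accumulation of daily values into monthly totals described above can be sketched as follows for a single grid cell. This is a minimal illustration under stated assumptions: the input layout (a list of date and millimetre pairs), the function name `monthly_totals`, and the sample values are hypothetical, and missing days are simply assumed to be absent from the input.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_records):
    """Sum (date, precipitation_mm) pairs into {(year, month): total_mm}."""
    totals = defaultdict(float)
    for day, precipitation_mm in daily_records:
        totals[(day.year, day.month)] += precipitation_mm
    return dict(totals)


# Made-up daily values (mm) for one grid cell; not data from this study.
daily = [
    (date(1998, 1, 1), 0.0),
    (date(1998, 1, 2), 2.5),
    (date(1998, 1, 31), 1.5),
    (date(1998, 2, 1), 4.0),
]
monthly = monthly_totals(daily)
```

Applying the same accumulation to every grid cell yields monthly fields that can be compared directly with a monthly product such as TRMM 3B43.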
Monthly precipitation from 32 stations in the Tianshan mountainous area was used (Figure 1). The quality of the station data was strictly controlled, and homogeneity tests were performed before its release [35]. Station elevations range from 35 to 3539 m, with most stations at low elevations, as presented in Table 1, and no stations are located in high mountains above 4000 m. Because the datasets cover different time periods, comparisons were made over the 10-year period from 1998 to 2007.

Methodology.
TRMM 3B43 V6, 3B43 V7, and APHRODITE V1003R1 were set to the same resolution, and the precipitation datasets were quantitatively compared on a cell-by-cell basis to avoid the loss of fine detail that occurs when datasets are aggregated before assessment [36]. Because station data are point values whereas TRMM 3B43 V6, 3B43 V7, and APHRODITE V1003R1 are gridded, the nearest grid cell was selected for comparison with each station. The quantitative accuracy of the precipitation datasets was assessed by the root-mean-square error (RMSE), the root-mean-square factor (RMSF), and the correlation coefficient.
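Matching a station to its nearest grid cell can be sketched as below. The sketch assumes a regular latitude-longitude grid whose cells are aligned to a lower-left corner at 50°S, 180°W with 0.25° spacing; these defaults, the function name `containing_cell`, and the station coordinates are illustrative assumptions and should be checked against the actual product files. On such a grid, the cell that contains the station is also the cell whose center is nearest in latitude and longitude.

```python
import math


def containing_cell(lat, lon, lat_min=-50.0, lon_min=-180.0, res=0.25):
    """Return (row, col, center_lat, center_lon) of the cell containing a point.

    Rows count northward from lat_min and columns eastward from lon_min.
    The default grid origin and resolution are assumptions for illustration.
    """
    row = math.floor((lat - lat_min) / res)
    col = math.floor((lon - lon_min) / res)
    center_lat = lat_min + (row + 0.5) * res
    center_lon = lon_min + (col + 0.5) * res
    return row, col, center_lat, center_lon


# Hypothetical station coordinates, not taken from Table 1.
row, col, center_lat, center_lon = containing_cell(43.78, 87.62)
```

The returned row and column can then be used to index the gridded monthly fields so that each station series is paired with one grid cell series.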
The RMSE is a common accuracy measure that is usually used to quantify the magnitude of errors in time series [37,38]. It can be calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2},$$

where $n$ is the length of the datasets in time series and $x_i$ and $y_i$ stand for the two compared datasets at time $i$ for each grid cell, respectively. However, the RMSE mainly emphasizes the differences resulting from erroneous data and provides only limited information for further research. Because the RMSF has been found to provide more information than the RMSE, the RMSF comparing the two datasets is also calculated as follows:

$$\mathrm{RMSF} = \exp\left\{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\ln\left(\frac{x_i}{y_i}\right)\right]^2}\right\}.$$

The closer the RMSF value is to 1, the better the agreement between the two compared datasets.
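The three accuracy measures can be sketched in a few lines of Python. The RMSE and the Pearson correlation coefficient follow their standard definitions. For the RMSF, the sketch assumes the commonly used definition, the exponential of the root-mean-square of the logarithm of the ratio of the two series, which is consistent with a perfect score of 1 but may differ in detail from the authors' implementation. The function names and sample values are hypothetical.

```python
import math


def rmse(x, y):
    """Root-mean-square error between two equally long series."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)


def rmsf(x, y):
    """Root-mean-square factor (assumed definition); needs positive values."""
    n = len(x)
    mean_sq_log = sum(math.log(a / b) ** 2 for a, b in zip(x, y)) / n
    return math.exp(math.sqrt(mean_sq_log))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up monthly totals (mm) for one grid cell; not data from this study.
gauge = [10.0, 20.0, 30.0]
gridded = [12.0, 18.0, 33.0]
scores = (rmse(gridded, gauge), rmsf(gridded, gauge), pearson_r(gridded, gauge))
```

Because the RMSF is built on ratios, months with zero precipitation in either series have to be excluded or otherwise handled before it is computed; how that was done in the study is not stated here.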

Results
Before the different precipitation databases are compared, a general description of the mean annual precipitation fields of 3B43 V6, 3B43 V7, and APHRODITE V1003R1 is given. The mean annual precipitation over the Tianshan mountainous area during the period 1998 to 2007 displays the well-known pattern of more precipitation in the north than in the south and more in the west than in the east. Figure 2 illustrates the mean annual precipitation distribution of the three databases. The mean annual precipitation of APHRODITE V1003R1 varies from 54 to 684 mm, that of 3B43 V6 from 77 to 590 mm, and that of 3B43 V7 from 68 to 786 mm. The spatial distribution of these datasets is basically similar over the whole study area, although 3B43 V6 shows lower values in the northern Ili Valley. In terms of amount, the patterns displayed by APHRODITE V1003R1 and 3B43 V7 are better than that of 3B43 V6.

As Figure 3 shows, the interpolated data exhibit both similarities to and discrepancies from the satellite-based data, not only in temporal and spatial distribution but also in the total amount of precipitation. The spatial distribution of the correlation coefficient provides a more complete picture: the value is higher in the south than in the north and higher in the east than in the west (Figure 3(a)). In the middle and west parts of the Tianshan mountainous area, the correlation coefficient is higher than 0.85, which indicates good agreement between the two databases, while in the north and west parts of the region the value is lower than 0.2, which indicates that the two databases agree poorly where the precipitation amount is high, especially in the Ili Valley. Over the whole study area, the spatial distribution of RMSE is the reverse of that of the correlation coefficient (Figure 3(b)); namely, RMSE is higher over the high-precipitation area, where the correlation coefficient is lower. In contrast, the spatial distribution of RMSF is consistent with that of the correlation coefficient (Figure 3(c)), which means that the two databases agree well in the low mountainous area but fail where orographic precipitation has a greater influence on the amount of precipitation.

TRMM 3B43 V7 and APHRODITE V1003R1.
Figure 4 shows the spatial distribution of the correlation coefficient, RMSE, and RMSF between APHRODITE V1003R1 and TRMM 3B43 V7. The correlation coefficient (Figure 4(a)) is greatly improved over the whole study area compared with Figure 3(a), which means that the two databases are consistent in most grid cells. The improvement is most obvious in the middle of the Tianshan mountainous area, especially the Ili Valley, relative to the east and west. The RMSE and RMSF are also significantly improved (Figures 4(b) and 4(c)). The spatial distribution in Figures 4(b) and 4(c) is identical to that in Figures 3(b) and 3(c); the relationship is reversed between RMSE and the correlation coefficient but consistent between RMSF and the correlation coefficient. In general, the higher the precipitation, the lower the correlation coefficient and RMSF and the higher the RMSE.

Validation of the Rain Gauge and Precipitation Datasets.
Considering the sparse rain gauge distribution in the Tianshan mountainous area, the accuracy evaluation of the satellite-based databases and the calculation of the correlation coefficient, RMSE, and RMSF are carried out only in the 32 grid cells containing stations, using annual precipitation from TRMM 3B43 V6, 3B43 V7, APHRODITE V1003R1, and rain gauge data.

TRMM 3B43 V6 and Rain Gauge Data.
The comparison of TRMM 3B43 V6 with rain gauge data shows that the correlation coefficients are statistically significant across much of the Tianshan mountainous area. The correlation coefficient is greater in the south than in the north and greater in the west than in the east. The value is less than 0.1 at WQ, WS, ZS, and SSJF, while it reaches 0.9 at many stations on the south slope. The RMSE follows the reverse spatial pattern of the correlation coefficient and reaches its maximum of above 250 mm in the Ili Valley. The RMSF follows a spatial pattern similar to that of the RMSE: it is lower in the south than in the north and lower in the west than in the east, with its maximum at TRP. The results indicate that the two databases are better matched in the west and the middle than in the east of the Tianshan mountainous area.
The spatial distribution of the correlation coefficient between TRMM 3B43 V6 and rain gauge data shows obvious seasonal characteristics. In winter, the correlation coefficient is above 0.5 except at individual stations, and it declines with rising elevation for stations above 2000 m. In spring, the relationship between elevation and the correlation coefficient is relatively complex: the correlation coefficient first increases with station elevation up to about 1100 m and then decreases as station elevation rises further. This pattern is also seen in fall. In summer, the correlation coefficient increases with station elevation. Therefore, the relationship between precipitation and terrain can be portrayed by the TRMM 3B43 V6 datasets.

TRMM 3B43 V7 and Rain Gauge Data.
The correlation coefficient between TRMM 3B43 V7 and rain gauge data shows a significant improvement at almost all stations over the whole study area, while its spatial distribution follows a pattern similar to that of TRMM 3B43 V6 and rain gauge data, with greater values in the south than in the north. The minimum correlation values, at SSJF and HM in the eastern part of the Tianshan mountainous area, are larger than 0.34, and larger errors, with RMSE exceeding 100 mm, are observed in the east center and the Ili Valley. The spatial distribution of RMSF is the reverse of that of RMSE, with lower values except in the east center and the Ili Valley. The results indicate that the TRMM 3B43 V7 database matches rain gauge data over the Tianshan mountainous area better than the TRMM 3B43 V6 database does.

APHRODITE V1003R1 and Rain Gauge Data.
As the APHRODITE precipitation datasets are produced by interpolating rain gauge observations, they match rain gauge data in most grid cells better than the two versions of TRMM do. Figure 5 depicts the monthly variations of precipitation at the selected stations and their grid boxes for rain gauge and APHRODITE data during the period 1998-2007, with correlation coefficients of 0.59 (YW), 0.8 (TRP), and 0.95 (BLT), respectively. In general, the rain gauge data are greater than the APHRODITE data (see Figures 5(a) and 5(c)), where the correlation coefficients are 0.59 and 0.95, respectively. However, at TRP, a Global Telecommunication System station used in the interpolation, the APHRODITE data, which may be overestimated, are higher than the rain gauge data, with a correlation coefficient of 0.8, which is lower than that at BLT. During the dry winter season, the rain gauge data show good consistency with the APHRODITE data. In contrast, the greater the monthly precipitation, the more significant the difference between rain gauge and APHRODITE data, especially in summer. Thus, validation of the APHRODITE datasets is necessary, especially in mountainous areas.

Discussion and Conclusions
4.1. Discussion. The variability and distribution of precipitation are affected by elevation, mountain orientation, vapor source, and so forth. Elevation has the most significant impact among them, especially in mountainous areas. Thus, the quality of precipitation data can be reflected by the relationship between the precipitation data and elevation. For consistency, each grid cell containing at least one station was selected, and the elevation of each grid cell was extracted from the digital elevation model. The correlation coefficients between elevation and the annual mean precipitation of APHRODITE, TRMM 3B43 V6, TRMM 3B43 V7, and rain gauge data were calculated, respectively (Figure 6). The correlation coefficient between rain gauge data and elevation is the highest among them, while there is no significant difference among the relationships of APHRODITE, TRMM 3B43 V6, and TRMM 3B43 V7 with elevation. Of the stations or grid cells below 1500 m, some are located on the windward north slope of the Tianshan mountainous area, especially in the Ili River Valley, and others are located in the piedmont Gobi to the south; these stations or grid cells affect the relationship between precipitation and elevation. Above 1500 m, however, the relationship between precipitation and elevation is more significant.
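The contrast between stations below and above 1500 m can be explored with a short sketch that computes the precipitation-elevation correlation separately for each elevation band. The function names, the handling of the threshold, and the sample values are hypothetical and only illustrate the kind of calculation discussed; they do not reproduce Figure 6.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_band(elevation_m, precip_mm, threshold_m=1500.0):
    """Correlate precipitation with elevation below and at/above a threshold."""
    pairs = list(zip(elevation_m, precip_mm))
    bands = {
        "below": [(e, p) for e, p in pairs if e < threshold_m],
        "above": [(e, p) for e, p in pairs if e >= threshold_m],
    }
    return {
        name: pearson_r([e for e, _ in band], [p for _, p in band])
        for name, band in bands.items()
    }


# Made-up station elevations (m) and annual precipitation (mm).
elevation_m = [400.0, 800.0, 1200.0, 1600.0, 2000.0, 2400.0]
precip_mm = [300.0, 100.0, 200.0, 200.0, 300.0, 400.0]
bands = correlation_by_band(elevation_m, precip_mm)
```

Each band needs at least two stations with differing values for the coefficient to be defined, and a significance test would be required before drawing conclusions from so few points.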
The correlation coefficients between elevation and the average monthly precipitation of APHRODITE, TRMM 3B43 V6, TRMM 3B43 V7, and rain gauge data were also analyzed. The results show that the relationship between rain gauge data and elevation is still the best, whereas that between TRMM 3B43 V6 and elevation is poor. Overall, precipitation and elevation are negatively correlated in January and February, but the correlation coefficient is low and does not pass the significance test. This relationship reverses in March: the correlation is weakly positive in March and April and significantly positive from May to September, passing the significance test. The correlation coefficient increases as precipitation increases. In October, however, the relationship reverses again, and the correlation is not significant. The main reason is that the rainy season over the Tianshan mountainous area lasts from April to September, when precipitation accounts for 80% of the annual total and most of it is liquid and easily observed. In January, February, November, and December, precipitation is solid and greatly influenced by wind, so its accuracy is not high, especially for the remote sensing datasets. It is obvious that the relationship between precipitation and elevation is influenced by this factor.

Conclusions.
The comparison between several kinds of precipitation datasets shows generally good agreement over the Tianshan mountainous area but rather poor agreement in some regions. Similar precipitation patterns are identified in the results of all datasets, which perform well over the whole study area except the east center and the Ili Valley. Compared with APHRODITE, it is obvious that TRMM 3B43 V7 performs better than TRMM 3B43 V6.

Figure 1: Location of the study area and distribution of meteorological stations.

Table 1: Geographical coordinates of the meteorological stations of Tianshan mountainous area.