On the Use of Threshold for the Ground Validation of Satellite Rain Rate

Ground-truthing is a major problem in the satellite estimation of rain rate. The problem is that the measurement taken by the satellite sensor is fundamentally different from the one it is compared with on the ground. Additionally, since the satellite has only a limited capability to measure light rain rates accurately, the comparison should also consider a threshold value of the satellite rain rate. This paper proposes a ground-truth design with threshold for the satellite rain rate. This design generalizes the conventional ground-truth design, which considered only the (zero, nonzero) and (nonzero, nonzero) measurement pairs. The mean-square error is used as an index of accuracy in estimating the ground measurement by the satellite measurement. An application to an artificial random field shows that the proposed ground-truth design with threshold is valid, as the design bias is zero. The same result is also obtained in the application to the COMS (Communication, Ocean, and Meteorological Satellite) rain rate data in Korea.


Introduction
It is very difficult to observe the rain rate field accurately in space. The rain gauge is the most traditional tool for measuring rainfall and is still assumed to be the most reliable one. However, a rain gauge produces point measurements, which may not be enough to capture the spatial characteristics of the rain rate field. Radar has an advantage in that sense, as it has a large spatial coverage. However, the spatial coverage of radar is still too small to cover a continent or the entire earth. This is the reason why the satellite is expected to play an important role in global hydrology [1].
Satellites provide various kinds of information, such as land use, vegetation, and soil moisture. Specifically, the information about precipitation is used for the analysis of the global hydrological cycle and is ultimately to be used as an input for flood forecasting [2][3][4][5]. Many satellite programs have been underway to measure the rain rate, especially in remote areas such as the Tropical Pacific [6][7][8][9][10][11][12]. However, the satellite also uses a sensor to measure the rain rate indirectly, so its measurement can differ from the true value. This is the reason why a validation process is required for the rain rate data through comparison with the so-called ground-truth [13,14].
The ground-truth problem is complex because the two sensors measure different quantities: (1) the rain gauge measures rain rate at a point nearly continuously in time, while (2) the satellite measures an area average of rain rate over its field of view (FOV) discretely in time. While these two estimates should agree in the long run, there can be a large random difference between the two because of the different space-time sampling configurations. It should be possible, however, by taking enough simultaneous pairs of measurements, to compare them and check for bias in the satellite rain rate. Thiele [15] presents an extensive discussion of the issues relating to the ground-truth and indicates several areas of active research. North et al. [16] and Ha and North [17] developed a strategy that can be used to check the calibration of satellite data using ground measurements.
The ground-truth problem is complicated by the fact that the probability distribution of real rain rate data has a significant contribution at zero rain rate (usually greater than 90%). Hence, many of the data pairs can be (no-rain, no-rain) measurements or perhaps (no-rain, rain) measurements, where the second entry is the satellite data. For this reason, many more data pairs should be collected to show that there is no bias involved due to the ground-truth design. Ha and North [18] and Ha et al. [13] also proposed a ground-truth design which uses the data pairs only when the satellite measurement is positive.
Additionally, it is known that the satellite measurement has a lot of uncertainty, especially when the rain rate is very low [19][20][21]. This problem is related to the spectrum of the electric signal reflected from the object. When the strength of the signal is small, the SNR (signal-to-noise ratio) of the electric signal can decrease, which leads to larger uncertainty [22,23]. That is, the uncertainty in the rain rate measurement can be very high when the rain rate is small [22,23]. A simple solution to this problem is to discard the data that are lower than a certain threshold value of rain rate.
We present here a technique for validating satellite rain rate based upon the comparison with measurements on the ground. Since the satellite rain rate is not reliable when the rain rate is very low, the ground-truth design proposed in this study uses the data pairs only when the satellite measurement exceeds a certain threshold level. The point rain gauge is used as the ground-truth measurement to validate the satellite rain rate. The advantage of using rain gauges is that they do not introduce any controversial algorithms associated with estimating the rain rate from a "ground-truth" measurement such as that derived from radar.
As an application example, this study analyzed the COMS (Communication, Ocean, and Meteorological Satellite) rain rate data in Korea. As ground-truth data, the rain rate from a total of 468 AWS (automatic weather stations) was used. The rainfall event analyzed includes the rain rate at 0800 LST on July 27, 2011, when the maximum hourly rain rate recorded in Seoul, the capital city of Korea, was more than 80 mm/hr. This paper is composed of a total of four sections, including the Introduction and Conclusions. Section 2 summarizes theoretically the problem of comparing satellite and rain gauge rain rate with threshold. An application example with the Bernoulli random field is also provided in that section. In Section 3, a real application example with the COMS rain rate data is given.

Theoretical Backgrounds
Ha et al. [24] presented a theoretical background on the comparison of radar and rain gauge rain rate with given thresholds. They also showed that no systematic bias is introduced when a threshold is applied to the radar rain rate. The theoretical background on the comparison of radar and rain gauge rain rate is summarized as follows.
Consider a random rain rate field $\psi(\mathbf{r}, t)$, defined in the $\mathbf{r} = (x, y)$ plane and along the time axis $t$. As a typical experiment, we locate a point rain gauge at some fixed location $\mathbf{r}_g$ on the $\mathbf{r} = (x, y)$ plane. The rain gauge is located inside a satellite bin with area $A$. The rain gauge rain rate is then defined as

$$\Psi_g = \psi(\mathbf{r}_g, t), \tag{1}$$

and the satellite rain rate, based on $A$, is calculated as

$$\Psi_s = \frac{1}{A} \int_A \psi(\mathbf{r}, t)\, d^2\mathbf{r}. \tag{2}$$

Obviously, the satellite rain rate of (2) corresponds to the rain gauge rain rate of (1). Thus, we have two measurements with respect to the $i$th rain gauge, $\Psi_s^i$ and $\Psi_g^i$, where the subscripts $s$ and $g$ denote the satellite and rain gauge, respectively. We form the difference between the satellite and rain gauge rain rate and call it the error $\varepsilon_i = \Psi_s^i - \Psi_g^i$ for the $i$th data pair. Here, we assume that the rain gauge rain rate is the "truth," and only the satellite rain rate contains some error. The mean-square error, which is usually used as an index of accuracy when comparing the satellite rain rate to the rain gauge one, is defined by

$$\langle \varepsilon^2 \rangle = \langle (\Psi_s - \Psi_g)^2 \rangle.$$

The error in the satellite rain rate for a specific rain gauge is likely to contain a large component of random error.
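The definitions in (1) and (2) and the mean-square error can be illustrated with a minimal Python sketch. It assumes that the rain rate field inside one satellite bin is available on a regular grid, so that the area integral reduces to an arithmetic mean; the function names, the 8 × 8 grid, and the exponential synthetic field are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def gauge_and_satellite(field, gauge_rc):
    """Return (gauge, satellite) rain rates for one satellite bin.

    field    : 2-D array of rain rates covering the bin (hypothetical grid)
    gauge_rc : (row, col) index of the grid point that holds the gauge
    """
    gauge = float(field[gauge_rc])      # point value, as in (1)
    satellite = float(field.mean())     # area average over the bin, as in (2)
    return gauge, satellite

def mean_square_error(sat, gauge):
    """Mean-square error of the satellite values against the gauge values."""
    err = np.asarray(sat, dtype=float) - np.asarray(gauge, dtype=float)
    return float(np.mean(err ** 2))

# Synthetic demonstration: 1000 independent bins, one gauge per bin.
rng = np.random.default_rng(0)
pairs = []
for _ in range(1000):
    field = rng.exponential(scale=2.0, size=(8, 8))   # synthetic, not real rain
    rc = (int(rng.integers(8)), int(rng.integers(8)))  # gauge placed at random
    pairs.append(gauge_and_satellite(field, rc))
gauge_vals, sat_vals = map(np.array, zip(*pairs))
mse = mean_square_error(sat_vals, gauge_vals)
print(f"mean error = {np.mean(sat_vals - gauge_vals):+.3f}, MSE = {mse:.3f}")
```

In this sketch the gauge is treated as the "truth", so the error is always the satellite value minus the gauge value, matching the sign convention of the text.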
If the members of the measurement pairs are statistically independent, we can sharpen the histogram of the difference between the satellite rain rate and the rain gauge rain rate by adding independent data pairs. As outlined in the Introduction, the comparison with threshold uses the data pairs only when the satellite rain rate is greater than the threshold; that is, it uses data pairs $(\Psi_s^i, \Psi_g^i)$ when $\Psi_s^i > \tau$, where $\tau$ is the threshold. We can write the mean error for the comparison that uses the threshold $\tau$ as follows:

$$\langle \varepsilon \rangle_\tau = \langle \Psi_s - \Psi_g \mid \Psi_s > \tau \rangle,$$

where $\langle X \mid Y > \tau \rangle$ denotes the conditional mean of $X$, given that $Y > \tau$. We can also express the mean-square error as

$$\langle \varepsilon^2 \rangle_\tau = \langle (\Psi_s - \Psi_g)^2 \mid \Psi_s > \tau \rangle.$$

In this study, we partition the $L$ km × $L$ km satellite bin into $N = n^2$ ($n \times n$) tiles (or cells), to treat the random field effectively as a multivariate vector. The satellite rain rate can then be written as

$$\Psi_s = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{n} \psi(i, j),$$

where $n$ ($= L/l$) is the number of subdivisions that have been chosen for the partitioning of the satellite bin and $\psi(i, j)$ represents the area-average rain rate in an $l$ km × $l$ km grid square, which we call a tile (or a cell). Also, the rain gauge rain rate is assumed to be

$$\Psi_g = \psi(i_g, j_g),$$

where $(i_g, j_g)$ is the rain gauge location. For convenience of notation, we will use $\psi(\mathbf{r}_k)$ ($k = 1, 2, \ldots, N$) and $\psi(\mathbf{r}_g)$ instead of $\psi(i, j)$ ($i, j = 1, 2, \ldots, n$) and $\psi(i_g, j_g)$, respectively. The satellite and rain gauge rain rates are then

$$\Psi_s = \frac{1}{N} \sum_{k=1}^{N} \psi(\mathbf{r}_k), \qquad \Psi_g = \psi(\mathbf{r}_g).$$

When we compute the mean and the mean-square error, we have to consider the location of the rain gauge within the satellite bin. Since the rain gauge is located randomly within a satellite bin, we can assume $\mathbf{r}_g$ to be a random variable following the uniform distribution.

Ha et al. [24] also presented an example in which the probability of rain in an individual tile (or cell) is $p$ and each tile is independent of the others. Since the random field here was assumed to be a white-noise Bernoulli random field, the distributions of the gauge and satellite measurements are

$$P(\Psi_g = 1) = p, \quad P(\Psi_g = 0) = 1 - p, \qquad P\left(\Psi_s = \frac{k}{N}\right) = \binom{N}{k} p^k (1 - p)^{N - k}, \quad k = 0, 1, \ldots, N.$$

The ground-truth design with threshold uses the data pairs only when the satellite measurements are greater than the threshold. Thus, we can derive the distribution of ground and satellite measurements with threshold using the distribution of $\Psi_g$ and $\Psi_s$ conditional on $\Psi_s > \tau$. Since we assumed that the random field is white noise, the probability that the satellite measurement is greater than the threshold $\tau$ is

$$P_\tau = P(\Psi_s > \tau) = \sum_{k = [\tau^{*}] + 1}^{N} \binom{N}{k} p^k (1 - p)^{N - k},$$

where $\tau^{*}$ is the threshold $\tau$ expressed as a number of rainy tiles and $[\tau^{*}]$ is the largest integer less than or equal to $\tau^{*}$. Note that when the random field is not white noise the probability $P_\tau$ cannot be written in this form. Finally, they derived the conditional error distribution and, using it, showed that

$$\langle \varepsilon \rangle_\tau = \langle \Psi_s - \Psi_g \mid \Psi_s > \tau \rangle = 0.$$

This says that the bias of the error for the ground-truth design with threshold is zero, and thus the ground-truth design can be used to validate the satellite measurements.

The spatial resolution of the COMS rain rate data is 4 km × 4 km, and its temporal resolution is 15 minutes. As ground-truth data for the evaluation of the COMS rain rate data, the rain gauge data from a total of 468 AWS (automatic weather stations) were used. Figure 1 shows example images of rain rate over the northern hemisphere and over the Korean Peninsula at 0800 LST on July 27, 2011. The storm event occurring at this time was a very severe one, and the maximum hourly rain rate recorded in Seoul, the capital city of Korea, was more than 80 mm/hr.
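As a numerical check of the zero-bias result stated in Section 2 for the white-noise Bernoulli field, the following Python sketch simulates many independent satellite bins, places the gauge in a uniformly chosen tile, and keeps only the pairs whose satellite value exceeds the threshold. It is a sketch under stated assumptions rather than the procedure of [24]: the tile values are taken to be 0 or 1, the threshold is expressed in the same units (so that the count threshold is assumed to be $N\tau$), and the function names and parameter values are illustrative.

```python
import math
import numpy as np

def exceedance_probability(p, n_tiles, tau):
    """P(Psi_s > tau) for a white-noise Bernoulli field (binomial tail)."""
    k_min = math.floor(n_tiles * tau) + 1      # assumes tau* = N * tau
    return sum(math.comb(n_tiles, k) * p**k * (1 - p)**(n_tiles - k)
               for k in range(k_min, n_tiles + 1))

def simulate_threshold_design(p, n_tiles, tau, n_bins=200_000, seed=0):
    """Monte Carlo estimate of (mean error, fraction of pairs kept)."""
    rng = np.random.default_rng(seed)
    tiles = rng.random((n_bins, n_tiles)) < p          # True = rain in the tile
    sat = tiles.mean(axis=1)                           # bin average
    gauge_tile = rng.integers(n_tiles, size=n_bins)    # uniform gauge location
    gauge = tiles[np.arange(n_bins), gauge_tile].astype(float)
    keep = sat > tau                                   # threshold on satellite only
    return float((sat[keep] - gauge[keep]).mean()), float(keep.mean())

mean_err, kept = simulate_threshold_design(p=0.1, n_tiles=16, tau=0.1)
p_tau = exceedance_probability(p=0.1, n_tiles=16, tau=0.1)
print(f"mean error = {mean_err:+.4f}, kept = {kept:.3f}, P_tau = {p_tau:.3f}")
```

The simulated mean error should scatter around zero, and the fraction of retained pairs should be close to the binomial tail probability. The reason is that, for a uniformly placed gauge, the expected gauge value given the tile values equals the bin average, so conditioning on the satellite value alone does not shift the mean error.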

Application Results
In this study, the error was defined as the difference between the satellite rain rate and the rain gauge rain rate. However, as the objective of this study was to find whether the threshold applied to the satellite rain rate causes a systematic bias or not, we prepared two different types of rain gauge data. The first type was taken from the satellite data with 4 km × 4 km resolution at the locations of the rain gauges. The second type is the real rain gauge data collected by the AWS on the ground. Thus, in this study, we could derive two different sets of comparison results. By comparing these two results, we could distinguish the systematic bias of the design from the bias of the satellite rain rate.
Figure 2 summarizes, by histogram, the characteristics of the satellite rain rate data, the satellite rain rate data collected at the locations of the rain gauges, and the rain gauge rain rate data for the thresholds applied: 0.0, 0.5, 1.0, and 1.5 mm/hr. As the size of the bin of the satellite data, 8 km × 8 km (i.e., the average of four (2 × 2) COMS data cells) was considered. As can be seen in Figure 2, the histogram of the satellite rain rate looks close to the normal distribution but becomes a truncated normal as the threshold value increases. The histograms of the satellite data at the locations of the rain gauges are more or less the same, but the peak at the origin, which counts the no-rain portion, is rather short. On the other hand, the histogram of the rain gauge rain rate shows an exponential distribution, also with a strong peak at the origin. The range of the data is also wider than that of the satellite data. As can be expected, the portion of no rain in the rain gauge rain rate decreases as the threshold value increases.

Figure 3 shows the comparison results between the satellite rain rate and the rain gauge rain rate (in fact, the latter is the satellite rain rate at the location of the rain gauge). The sizes of the bin of the satellite data considered are 2 × 2, 4 × 4, 8 × 8, and 16 × 16, and the threshold rain rates considered are 0.0, 0.5, 1.0, and 1.5 mm/hr. The rain gauge rain rate was assumed to be the COMS rain rate at the location of the AWS, that is, for the size of 1 × 1. As can be seen in Figure 3, when the size of the satellite bin is fixed, such as 2 × 2 (i.e., the number of COMS cells is four), the error becomes more concentrated around zero as the threshold value increases. This result indicates that the satellite rain rate becomes more similar to the rain gauge rain rate when a higher threshold value is applied. Satellite data with a relatively large difference between the satellite and the rain gauge rain rate are removed by increasing the threshold rain rate. On the other hand, when the threshold value is fixed, such as 0.5 mm/hr, the distribution of the error becomes wider and less concentrated around zero as the size of the satellite bin increases. This result indicates that a wider variety of pairs of satellite rain rate and rain gauge rain rate is formed as the size of the bin increases.
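The aggregation and thresholding described for Figure 3 can be sketched in Python as follows. The sketch assumes that the COMS cells are available as a regular 2-D array and that each gauge is identified by the index of the cell containing it; the function names, the gamma-distributed synthetic field, and the number of gauges are illustrative assumptions, not the data used in the study.

```python
import numpy as np

def block_mean(grid, m):
    """Average an (H, W) grid over non-overlapping m x m blocks."""
    h, w = grid.shape
    h, w = h - h % m, w - w % m                  # drop ragged edges, if any
    trimmed = grid[:h, :w]
    return trimmed.reshape(h // m, m, w // m, m).mean(axis=(1, 3))

def thresholded_errors(grid, gauge_rc, m, tau):
    """Errors (bin average minus cell value at the gauge) for bins above tau."""
    bins = block_mean(grid, m)
    errors = []
    for r, c in gauge_rc:
        br, bc = r // m, c // m                  # bin that contains the gauge
        if br >= bins.shape[0] or bc >= bins.shape[1]:
            continue                             # gauge lies in a trimmed edge
        sat = bins[br, bc]
        if sat > tau:                            # keep the pair only above tau
            errors.append(sat - grid[r, c])
    return np.array(errors)

# Synthetic demonstration with 64 x 64 cells and 50 gauges (not real data).
rng = np.random.default_rng(1)
cells = rng.gamma(shape=0.5, scale=4.0, size=(64, 64))
gauges = [(int(r), int(c)) for r, c in rng.integers(0, 64, size=(50, 2))]
for m in (2, 4, 8, 16):
    err = thresholded_errors(cells, gauges, m, tau=0.5)
    print(m, err.size, float(err.mean()) if err.size else float("nan"))
```

Here the "rain gauge" value is the cell value at the gauge location, which mirrors the artificial gauge data of Figure 3; replacing `grid[r, c]` with observed gauge values would give the comparison with real rain gauge data instead.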
Table 1 summarizes the basic statistics of the satellite rain rate, the rain gauge rain rate, and the error shown in Figure 3. The means of the satellite rain rate and the rain gauge rain rate show an increasing trend as the threshold value increases. On the other hand, the mean of the error decreases as the threshold value increases. The variance behaves somewhat differently. The variances of the satellite rain rate and the rain gauge rain rate all decrease as the threshold value increases. On the other hand, the variance of the error does not show any obvious trend; it seems to be unchanged regardless of the increase of the threshold. These results explain how the threshold works on the structure of the satellite data. Basically, the satellite data (composed of many COMS cells) with high spatial variability have more chance of being removed as the threshold value increases. On the other hand, as the size of the satellite bin increases, a wider variety of pairs of satellite rain rate and rain gauge rain rate is formed. Finally, the bias was estimated to be around 1-2% of the mean of the satellite rain rate. This result indicates that applying the threshold value to the satellite rain rate does not result in any systematic bias when comparing the satellite and rain gauge rain rate.

Figure 4 shows the same results as in Figure 3 but with the real rain gauge data from a total of 468 AWS. As can be seen in this figure, the distribution of the error is negatively skewed, with high peaks in the range of 0-5 mm/hr. Overall, the mean of the error is negative. This result indicates that the satellite rain rate is higher than the rain gauge rain rate when the rain gauge rain rate is small and that the satellite rain rate is much smaller than the rain gauge rain rate when the rain gauge rain rate is high. As the satellite rain rate is an areal average, this result may be accepted as normal. Thus, as the size of the bin of the satellite data (i.e., the number of COMS cells used for making the satellite data) increases, the satellite rain rate decreases, so that the peak of the histogram approaches the origin. As an effect of the high threshold value, the relative portion of the large negative errors also decreases.
Similar to Table 1, Table 2 summarizes the basic statistics of the satellite rain rate, the rain gauge rain rate, and the error shown in Figure 4. As can be seen in this table, the mean of the error is negative, which is largely due to the negative skewness of the error distribution. The coefficient of skewness was estimated to be about 50-60% of the mean rain rate. The size of the error also increases as the threshold value increases. However, both the mean and the variance of the error decrease as the size of the bin of the satellite data increases. Overall, the size of the error was estimated to be much higher than the mean of the satellite rain rate. This result is totally different from that in Figure 3 and Table 1. The result in Table 2 indicates that there should be another kind of bias involved in the mean of the error.

To investigate this possibility, the ratios between the satellite rain rate and the rain gauge rain rate (i.e., the artificial one) in the case of Figure 3 and Table 1 were derived and compared with those in the case of Figure 4 and Table 2, which use the real rain gauge rain rate from the AWS. The results are summarized in Table 3. Quite interestingly, Table 3 shows that there is a very consistent relation between the satellite rain rate and the rain gauge rain rate. In the artificial case of Figure 3 (and Table 1), the ratios were all estimated to be around one. On the other hand, with the real rain gauge data of Figure 4 (and Table 2), the ratios were higher than two in all cases considered. It is also interesting to note that, for a given size of the bin of satellite data, the ratio remains the same regardless of the threshold value. That is, the threshold applied to the satellite data does not introduce any systematic bias when comparing the satellite and rain gauge rain rate. However, the ratio becomes a bit smaller with a larger size of the bin of satellite data.
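The ratio comparison summarized in Table 3 can be sketched in Python as below. The sketch assumes that the ratio is formed as the mean gauge value divided by the mean satellite value over the pairs retained by the threshold, which is the ordering suggested by the column headings of Table 3; the function name and the toy numbers are illustrative and are not results from the study.

```python
import numpy as np

def mean_ratio(sat, gauge, tau):
    """Ratio <gauge>/<sat> over the pairs whose satellite value exceeds tau."""
    sat = np.asarray(sat, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    keep = sat > tau                      # same threshold rule as in the design
    if not keep.any():
        return float("nan")               # no pair survives the threshold
    return float(gauge[keep].mean() / sat[keep].mean())

# Toy pairs in which the gauge value is exactly twice the satellite value.
sat_toy = [1.0, 2.0, 4.0]
gauge_toy = [2.0, 4.0, 8.0]
for tau in (0.0, 0.5, 1.0, 1.5):
    print(tau, mean_ratio(sat_toy, gauge_toy, tau))
```

In this toy case the ratio stays at two for every threshold because the multiplicative bias is the same in all pairs. A ratio that does not change with the threshold is the behaviour the text interprets as the absence of a threshold-induced systematic bias.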

As an application example, this study analyzed the COMS (Communication, Ocean, and Meteorological Satellite) rain rate data. The COMS is a geostationary satellite launched on June 27, 2010. The COMS is operated by the National Meteorological Satellite Center, which produces the satellite rain rate data by analyzing the satellite images with the Calibration Matrices (CMs) made from the radar information. The CM is dependent on the solar zenith angle. If the solar zenith angle is higher than the threshold angle (85°), the CM based on the images of infrared (IR-1) and water vapor (WV) is used. On the other hand, if the solar zenith angle is lower than the threshold angle, another CM based on the images of IR-1, WV, and visible (VIS) is used. Additionally, a moisture correction factor, a cloud growth rate correction factor, a cloud-top temperature gradient correction factor, a parallax correction factor, and an orographic correction factor are applied to correct the rain rate estimated with the CMs.

Figure 1: A sample image of COMS rain rate over the northern hemisphere and over the Korean Peninsula (empty circles in the map of the Korean Peninsula indicate the locations of the AWS rain gauges).

Figure 2: Histograms of satellite rain rate over the satellite bin, satellite rain rate at the location of rain gauges, and the rain gauge rain rate (the size of the satellite bin is 2 × 2).

Figure 4: The same as Figure 3, but for the comparison with the rain gauge rain rate.

Table 1: Mean of satellite rain rate over the satellite bin $\Psi_s$, mean of satellite rain rate at the location of rain gauges, and the mean error $\varepsilon$.

Table 2: Mean of satellite rain rate over the satellite bin $\Psi_s$, mean of rain gauge rain rate $\Psi_g$, and the mean error $\varepsilon$.

Table 3: The ratio between the mean satellite rain rate over the satellite bin and the mean of satellite rain rate at the location of rain gauges and the ratio between the mean satellite rain rate over the satellite bin and the mean of rain gauge rain rate. Columns: threshold; mean satellite rain rate at the location of rain gauges divided by $\langle\Psi_s\rangle$; $\langle\Psi_g\rangle/\langle\Psi_s\rangle$.

Summary and Conclusions
Ground-truthing is a major problem in the satellite estimation of rain rate. The main problem is that the measurement taken by the satellite sensor is fundamentally different from the one it is compared with on the ground. In this study, we presented a technique of validating satellite rain rate based