Sources of Forecast Errors for Rainstorms in the South China Monsoon Region

Key Laboratory of Meteorological Disaster, Ministry of Education/Joint International Research Laboratory of Climate and Environment Change/Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology, Nanjing 210044, China; Laboratory of Cloud–Precipitation Physics and Severe Storms (LACS), Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China; University of Chinese Academy of Sciences, Beijing 100049, China


Introduction
South China is affected by both the tropical and subtropical monsoon, in which the rainstorm process is closely related to the circulation of the summer monsoon [1,2]. Rainstorms are one of the most serious types of natural disasters in South China [3][4][5]. However, currently, most global numerical models still carry considerable uncertainty in their forecasting of the location and precipitation amount of rainstorms [6,7]. In particular, the forecasting of rainstorms in the South China monsoon region has always been a difficult and challenging issue in China's operational forecasting sector [8][9][10].
In recent years, most researchers have tended to focus on the underlying physical mechanisms of rainstorms in the South China monsoon region. Some of these studies have found that, during the active period of the summer monsoon, the South China Sea summer monsoon can extend to the rainstorm area in the form of low-frequency oscillations, and the rainstorm process has a good correlation with the pulsation or strengthening period of the monsoon [2]. After the onset of the summer monsoon, the Bay of Bengal and the Indian Ocean contribute significantly to the quantity of water vapor in South China [11]. Moreover, both the average precipitation and the convection intensity in South China have generally increased, and the convection intensity is basically consistent with the subseasonal changes in atmospheric thermodynamic conditions, which probably leads to the occurrence of regional extreme precipitation [12,13]. For example, two strong rainstorms in Dongguan in July 2011 were caused by the onset of the southwest monsoon, the northward uplift of the ITCZ, and the pressure of the high-altitude East Asian trough [14].
In addition to the direct influence of summer monsoon circulation on precipitation, the rainstorm process in the South China monsoon region is also closely related to the prevailing mesoscale system under the influence of the summer monsoon [15]. For instance, the rainstorms on 11 May 2014, 19 May 2015, and 27 August 2018 were all caused by one or more mesoscale convective systems [16][17][18][19]. The evolution of the monsoon low-pressure intensity is synchronized with the daily distribution of the rainstorm area's size but not completely synchronized with the day-to-day evolution of the maximum daily precipitation [17].
To study the predictability of rainstorms in the South China monsoon region, it is not enough just to understand the underlying physical mechanism of the rainstorm process; studying the forecast error of the numerical model is also necessary [20][21][22][23]. With continuous improvement in model accuracy, increasing the horizontal resolution of the model may improve the effects of the precipitation forecast [24][25][26]. Nevertheless, high-resolution models are still unable to reasonably reproduce the characteristics of the precipitation's distribution owing to the inherent unpredictability of the atmosphere, the imperfection of the numerical method in the dynamic framework of the physical processes, and so on, meaning current numerical precipitation forecasts usually carry large uncertainty [27].
In 2013, the China Meteorological Administration initiated the South China Monsoon Precipitation Experiment, which included studies on improving models and their initial fields [28][29][30]. Several researchers have found that improving the assimilation of wind profile data into the initial fields allows the development of the convective system to be described better. Additionally, improving the low-level water vapor and wind fields in the data assimilation system of the Weather Research and Forecasting (WRF) model can reduce the forecast error of heavy rain, thereby improving the prediction skill [31]. As precipitation occurs and develops, the errors evolve from local growth to global propagation, and the initial error in the precipitation area makes an important contribution to the precipitation forecast error [32]. A number of researchers have also studied the sensitivity of physical parameters and the related conditional nonlinear optimal perturbations in the Global/Regional Assimilation and Prediction System for the forecasting of heavy rain in South China [33]. For instance, Lu et al. [34] studied a case of heavy rain in the South China monsoon region in 2015 and found that the initial error was an important source of forecast error and that the sensitive area could be found more accurately using the moist energy of perturbations.
Despite the clear need to explore the sources of error in rainfall forecasts in the South China monsoon region, there have been relatively few studies in this regard, and so further exploration is still required. The present research sought to address this knowledge gap by developing a guide flow method to identify the sensitive area for rainstorm forecasts in this region. The overarching aim was to strengthen our understanding of the source areas of forecast error and improve the forecasting ability in the South China monsoon region.
According to Huang and Luo's [13] analysis of the five-day precipitation forecasts in three seasons (2013, 2014, and 2015) in South China, the European Centre for Medium-Range Weather Forecasts (ECMWF) model generally has high forecasting skill. Therefore, it can be assumed that both the initial field and the model of ECMWF have high accuracy. The ECMWF output (obtained from the TIGGE (THORPEX Interactive Grand Global Ensemble) dataset of the ECMWF center, hereinafter referred to as TIGGE_EC) was used as a control group with a spatial resolution of 0.5° latitude by 0.5° longitude. The initial conditions of TIGGE_EC were used to generate accurate initial conditions for the WRF forecasts.

Materials and Methods
2.2. WRF. Version 3.6.1 of WRF was employed in the present study. Both the initial fields and boundary fields were generated using NCEP FNL analysis data with a time interval of six hours. The horizontal resolution of the forecast was 3 km, with 1100 × 700 grid points, covering the entire South China region. There were 60 layers in the vertical direction, and the integration time step was 15 s, without nesting.
The physical parameterizations were as follows: the Thompson scheme for microphysics; the Goddard scheme for shortwave radiation; the Rapid Radiative Transfer Model scheme for longwave radiation; the Eta Mellor-Yamada-Janjic TKE (turbulent kinetic energy) scheme for the boundary layer; and the Noah land surface model for land surface processes. No cumulus parameterization was used [16,34]. The forecasts using the WRF model initialized by NCEP_FNL data are denoted as WRF_FNL, while TIGGE_EC is used as a control group to be compared with WRF_FNL and OBS. In addition, we used accurate initial conditions (namely, the initial conditions of the TIGGE_EC forecasts) as the input field to generate WRF forecasts (hereinafter referred to as WRF_EC) under the same physical parameterization schemes as WRF_FNL.
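As a quick arithmetic check on the configuration described above, the following sketch computes the physical extent of the domain from the stated grid spacing and grid dimensions, and compares the 15 s time step with the commonly quoted WRF rule of thumb of about 6 s per kilometer of grid spacing. The rule of thumb, the assumption that 1100 is the west-east dimension, and the function names are not taken from the paper; they are used here only for illustration.

```python
def domain_extent_km(dx_km, nx, ny):
    """Return the (west-east, south-north) extent in km of a regular grid.

    Assumes nx and ny count grid points, so there are nx - 1 and ny - 1
    intervals of width dx_km between the first and last points.
    """
    return (nx - 1) * dx_km, (ny - 1) * dx_km


def rule_of_thumb_max_dt_s(dx_km, factor=6.0):
    """Upper bound on the time step (s) from the '6 * dx in km' rule of thumb.

    The factor of 6 is an assumption taken from common WRF practice,
    not a value stated in the paper.
    """
    return factor * dx_km


if __name__ == "__main__":
    width, height = domain_extent_km(3.0, 1100, 700)
    print(f"Domain extent: {width:.0f} km x {height:.0f} km")
    print(f"Rule-of-thumb maximum time step: {rule_of_thumb_max_dt_s(3.0):.0f} s")
```

Under these assumptions, the stated time step of 15 s lies below the rule-of-thumb bound for a 3 km grid.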

Methods.
In order to comprehensively investigate forecast accuracy, namely, the similarity in location and intensity between the forecast and observed cumulative precipitation in South China, the correlation coefficient was calculated following [35], with F denoting the forecast cumulative precipitation (TIG…). In addition, the grid-to-grid threat score ($T_S$) was calculated to investigate the forecasting of rainstorm events for different levels of precipitation, namely, light rain and heavy rain. $T_S$ was calculated as follows [36]:
$$T_S = \frac{N_A}{N_A + N_B + N_C}$$
where $N_A$ represents the number of grid points where both the forecast and observed precipitation are at the level of light rain (or heavy rain) and above, $N_B$ is the number of grid points where the forecast precipitation level is light rain (or heavy rain) and above but the observed precipitation is not at this level, and $N_C$ is the number of grid points where precipitation at the level of light rain (or heavy rain) and above is observed but has not been forecast.
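The two verification measures described above can be sketched as follows. This is a minimal illustration, assuming that the correlation coefficient of [35] is the standard Pearson correlation computed over all grid points and that the forecast and observed fields share the same grid. The function names and the `threshold` argument are hypothetical; the precipitation thresholds that define "light rain" and "heavy rain" are not given in the text and must be supplied by the user.

```python
import numpy as np


def correlation_coefficient(forecast, observed):
    """Pearson correlation between two precipitation fields on the same grid."""
    f = np.asarray(forecast, dtype=float).ravel()
    o = np.asarray(observed, dtype=float).ravel()
    valid = np.isfinite(f) & np.isfinite(o)  # skip missing grid points
    f_anom = f[valid] - f[valid].mean()
    o_anom = o[valid] - o[valid].mean()
    denom = np.sqrt((f_anom ** 2).sum() * (o_anom ** 2).sum())
    if denom == 0.0:
        return float("nan")  # undefined for a constant field
    return float((f_anom * o_anom).sum() / denom)


def threat_score(forecast, observed, threshold):
    """Grid-to-grid threat score T_S = N_A / (N_A + N_B + N_C).

    threshold is the precipitation amount (same units as the fields) that
    defines the level of interest, e.g. 'heavy rain and above'.
    """
    f_yes = np.asarray(forecast, dtype=float) >= threshold
    o_yes = np.asarray(observed, dtype=float) >= threshold
    n_a = int(np.sum(f_yes & o_yes))    # forecast and observed at this level
    n_b = int(np.sum(f_yes & ~o_yes))   # forecast but not observed
    n_c = int(np.sum(~f_yes & o_yes))   # observed but not forecast
    total = n_a + n_b + n_c
    return n_a / total if total > 0 else float("nan")
```

For example, `threat_score(wrf_ec, obs, threshold)` and `correlation_coefficient(wrf_ec, obs)` could be evaluated for each case, where `wrf_ec` and `obs` are hypothetical arrays of cumulative precipitation interpolated to a common grid.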
Since we found that improving the accuracy of the initial conditions could significantly improve the WRF model's forecasts, it is worth further exploring the sources of the initial errors. To this end, we compared the two kinds of initial conditions (FNL and EC) and calculated the vertical integral of the moist energy of their differences. The idea is that an area of large moist energy indicates where large differences exist, and it is therefore taken as the source area of the initial error [34,37,38]. The moist energy is computed following [34], in which σ is the vertical coordinate; $C_p$ is the specific heat at constant pressure (= 1005.7 J kg$^{-1}$ K$^{-1}$); $R_a$ is the gas constant of dry air (= 287.04 J kg$^{-1}$ K$^{-1}$); $p_r$ = 1000 hPa; $T_r$ = 270 K; and $u'_0$, $v'_0$, $T'_0$, $p'_{s0}$, and $q'_0$ are the differences in zonal wind, meridional wind, temperature, surface pressure, and water vapor mixing ratio between the EC and FNL initial fields, respectively. Compared with a single variable, moist energy indicates the comprehensive influence of all variables. Lu et al. [34] found that moist energy could help screen out the source area of the initial error, and that improving the initial conditions in that source area achieved greater benefits than improving them in other areas. Thus, hereafter, we refer to this method as the "moist energy method," and the source area as the "sensitive area."
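The vertically integrated moist energy of the difference fields can be sketched as below. The exact formula of [34] is not reproduced in this text, so the sketch assumes the commonly used moist total energy norm, E = ½∫[u′² + v′² + (C_p/T_r)T′² + (L²/(C_p T_r))q′²]dσ + ½R_a T_r (p′_s/p_r)². The latent heat constant `L_V`, the array layout, and the function name are assumptions introduced for illustration only.

```python
import numpy as np

C_P = 1005.7     # J kg^-1 K^-1, specific heat at constant pressure (from the text)
R_A = 287.04     # J kg^-1 K^-1, gas constant of dry air (from the text)
P_R = 100000.0   # Pa, reference pressure of 1000 hPa (from the text)
T_R = 270.0      # K, reference temperature (from the text)
L_V = 2.5104e6   # J kg^-1, latent heat of condensation (assumed, not in the text)


def column_moist_energy(du, dv, dt, dq, dps, sigma):
    """Vertically integrated moist energy (J kg^-1) of EC-minus-FNL differences.

    du, dv, dt, dq : arrays of shape (nlev, ny, nx) holding differences in
        zonal wind, meridional wind, temperature, and mixing ratio (kg/kg).
    dps : array of shape (ny, nx), surface pressure difference in Pa.
    sigma : array of shape (nlev,), vertical coordinate of each level.
    """
    du, dv, dt, dq = (np.asarray(a, dtype=float) for a in (du, dv, dt, dq))
    dps = np.asarray(dps, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Energy density on each level.
    density = 0.5 * (du ** 2 + dv ** 2
                     + (C_P / T_R) * dt ** 2
                     + (L_V ** 2 / (C_P * T_R)) * dq ** 2)
    # Trapezoidal integration over sigma, independent of the level ordering.
    layer_thickness = np.abs(np.diff(sigma))
    layer_mean = 0.5 * (density[:-1] + density[1:])
    integral = np.tensordot(layer_thickness, layer_mean, axes=(0, 0))
    surface_term = 0.5 * R_A * T_R * (dps / P_R) ** 2
    return integral + surface_term
```

The grid point with the largest value, for instance from `np.unravel_index(np.argmax(energy), energy.shape)`, would then mark the center of the large-value area used by the moist energy method.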

Results
In 73.7% (14 cases) of the 19 cases, the correlation coefficient between TIGGE_EC and OBS is higher than that between WRF_FNL and OBS, and the average correlation between TIGGE_EC and OBS of the 19 cases is 0.428, which is significantly higher than that between WRF_FNL and OBS (0.328). Besides, the shaded area of the correlation coefficient in Figure 1 shows the extent to which TIGGE_EC is better than WRF_FNL.
Comparison of the cumulative precipitation distribution of WRF_FNL, TIGGE_EC, and OBS in cases 4 and 9 shows that both WRF_FNL and TIGGE_EC differ from OBS in precipitation intensity and location (Figure 2). However, both the intensity and location of the precipitation forecast by TIGGE_EC are more similar to OBS than those of WRF_FNL. TIGGE_EC forecasts the precipitation center in southwest Guangdong and southern Guangxi better in both case 4 and case 9. Combined with Figure 1, it can be seen that TIGGE_EC is closer to OBS than WRF_FNL and gives a better forecast, which is consistent with the findings of Huang and Luo [13]. Therefore, in these 14 cases, we assume that both the initial field and the model (i.e., the ECMWF model) have high accuracy, and we only examine these 14 cases in the following parts.
Next, we used accurate initial conditions (namely, those of TIGGE_EC forecasts) to generate WRF forecasts (hereafter referred to as WRF_EC) for the 14 cases in which the forecast of TIGGE_EC was better than WRF_FNL and quantitatively compared them with WRF_FNL. First, the TIGGE_EC forecasts were taken as the true values to examine the improvements in forecasts when using accurate initial conditions. It was found that the average correlation between the cumulative precipitation in South China forecast by WRF_EC and TIGGE_EC is 0.430, which is higher than that between the forecasts of WRF_FNL and TIGGE_EC (0.390).
Meanwhile, 71.4% of the 14 cases have better forecast skill with WRF_EC than with WRF_FNL. Taking case 9 as an example, it can be seen that both the location and the intensity of the precipitation forecast by WRF_EC are closer to those of TIGGE_EC than those of WRF_FNL (Figure 2). However, although the TIGGE_EC forecasts appear to be more accurate than the WRF_FNL forecasts, they still contain errors. Thus, we further used OBS as the true values to evaluate the improvements of WRF_EC compared to WRF_FNL, and the results turned out to be similar. That is, the cumulative precipitation in South China of OBS generally has higher correlation coefficients with WRF_EC than with WRF_FNL (Table 2). In other words, WRF_EC generates precipitation patterns that are more similar to OBS than those of WRF_FNL.
We also checked the $T_S$ scores of WRF_EC and WRF_FNL, with OBS used as the true values. From Table 3, we can see that, of the 14 cases, WRF_EC has a better $T_S$ than WRF_FNL in 8 cases for "light rain and above" and in 10 cases for "heavy rain and above." This indicates that WRF_EC generates precipitation amounts that are more similar to OBS than those of WRF_FNL.
From the above results, it is seen that forecasts of WRF_EC are better than those of WRF_FNL as regards both the precipitation patterns and precipitation amounts. In short, the WRF_EC forecasts are better than the WRF_FNL forecasts.
By quantitatively investigating the improvement in the forecast, we can examine the importance of the accuracy of the initial conditions and thus judge the source of the forecast errors for summer rainstorms in South China. Therefore, we define two parameters, $IM_1$ and $IM_2$, that measure the degree of improvement for each case, based on the following correlations. $O_{FNL}$ is the correlation between the cumulative precipitation in South China of WRF_FNL and OBS, while $O_{EC}$ is the same but between WRF_EC and OBS. Similarly, $T_{FNL}$ represents the correlation between WRF_FNL and TIGGE_EC, while $T_{EC}$ represents the correlation between WRF_EC and TIGGE_EC. If $IM_1$ ($IM_2$) ≥ 50%, improving the initial value can significantly improve the forecast, and therefore the initial error is the main source of forecast errors. We define such a degree of improvement as "significantly improved." In this category, there are six cases, accounting for 31.6% of all 19 cases. Likewise, if 0 < $IM_1$ ($IM_2$) < 50%, then improving the initial value can slightly improve the forecast (thus defined as "slightly improved"), for which there are four cases, accounting for 21.1% of all 19 cases. Lastly, if $IM_1$ ($IM_2$) < 0, improving the initial value can barely improve the forecast results, and so the degree of improvement is defined as "not improved," for which there are also four cases. The $IM_1$ and $IM_2$ values for the above 14 cases are shown in Table 4. According to Table 4, the 14 cases can be divided into three types: "significantly improved," "slightly improved," and "not improved." Among the 14 cases, the forecasts of 6 cases are significantly improved after improving the initial field. For the slightly improved cases, the initial errors also play some part in the forecast errors, so it is also meaningful to explore the source of the initial errors for those cases.
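The classification described above can be sketched as follows. The defining equations for the improvement parameters are not reproduced in this text, so the sketch assumes, purely for illustration, that each parameter is the relative change in correlation when the EC initial conditions replace the FNL ones. The function names are hypothetical, and how the two parameters are combined when they disagree is not specified here, so the classifier takes a single value. Treating a value of exactly zero as "not improved" is also an assumption.

```python
def relative_improvement(r_fnl, r_ec):
    """Relative change in correlation from WRF_FNL to WRF_EC (assumed form).

    r_fnl : correlation of WRF_FNL with the reference (OBS or TIGGE_EC).
    r_ec  : correlation of WRF_EC with the same reference.
    """
    if r_fnl == 0.0:
        raise ValueError("relative improvement is undefined for r_fnl == 0")
    return (r_ec - r_fnl) / abs(r_fnl)


def classify_improvement(im):
    """Map an improvement value (as a fraction, 0.5 = 50%) to the three groups."""
    if im >= 0.5:
        return "significantly improved"
    if im > 0.0:
        return "slightly improved"
    return "not improved"


if __name__ == "__main__":
    # Hypothetical correlations, not values from Table 4.
    im = relative_improvement(r_fnl=0.25, r_ec=0.5)
    print(im, classify_improvement(im))
```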
Thus, as the next step, we explore the source of the initial errors for the "significantly improved" and "slightly improved" groups (a total of 10 cases). The differences between the two kinds of initial conditions are examined, and the area of largest differences is taken as the source area (hereafter also called the sensitive area) of the initial errors. By analyzing the basic flows at 700 hPa, we found that the sensitive areas are located in the basic flows that are directed towards the precipitation area. This means that the initial errors mainly come from upstream of the precipitation area. To quantify the relationship between the precipitation area and the sensitive area, we set the size of both to 4° × 4° (Figure 3) and found that the distance between them is about 5° of longitude. Based on the above results, we developed a method to identify the sensitive area by combining the precipitation area and the basic-state wind (hereafter referred to as the "guide flow method"). The precipitation area is centered on the maximum precipitation point. First, we define the direction of the guide flow at 700 hPa as follows:
$$\tan\theta = \frac{v}{u} = \frac{\mathrm{lat}}{\mathrm{lon}}$$
where u and v are the zonal and meridional wind components at 700 hPa, respectively; θ is the angle between the wind direction at 700 hPa and the horizontal direction; lat is the latitudinal difference between the maximum precipitation point and the selected maximum point of the sensitive area; and lon = 5° is the longitudinal difference between the maximum precipitation point and the selected maximum point of the sensitive area. From the description above, u, v, and lon are all known, and lat can then be derived. Thus, the sensitive area can be determined according to the precipitation area, tan(θ), and the lat and lon values. If the guide flow in the precipitation area is relatively straight, then u and v are the zonal and meridional wind components at the maximum precipitation point.
With lon known and tan(θ) obtained by linear backward deduction of the wind (u and v) in the precipitation area, lat can be derived, so the sensitive area can be determined (Figure 4(a)). If there are obvious troughs or cyclonic circulations in the precipitation area, u and v are the average zonal and meridional wind components of the precipitation area. Similarly, the location of the sensitive area can be deduced according to the average guide flow direction in the precipitation area (Figure 4(b)). Whether the guide flow is relatively straight or there are obvious troughs or cyclonic circulations in the precipitation area, the sensitive area selected by the guide flow method is relatively consistent with the location of the large-value area of moist energy (Figure 4). Table 5 shows the locations of the sensitive areas determined by the guide flow method and the moist energy method. For 80% (8 cases) of the 10 cases, the positions of the sensitive areas selected by the two methods are relatively consistent, with the distances between them being less than 2°. Thus, the sensitive area selected by the guide flow method is similar to that identified by the moist energy method. To confirm this point, we also used both the guide flow method and the moist energy method to identify the sensitive areas of the cases that belong to the "not improved" group (cases 2, 8, 12, and 18) in Table 4 and the cases where TIGGE_EC has lower skill than WRF_FNL (cases 3, 6, 7, 16, and 19) in Figure 1. The results showed that, for 8 out of 9 cases, the positions of the sensitive areas selected by the two methods are similar (Table 6). This confirms that the guide flow method can be used to identify sensitive areas associated with rainstorm forecasts. Since the guide flow method is easy to use, it may be helpful for quickly selecting perturbation areas for ensemble forecasts or for planning supplemental observations in adaptive observation.
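The guide flow method described above can be sketched as follows. The sketch assumes that the sensitive area lies 5° of longitude upstream of the maximum precipitation point along a straight line with slope v/u, and that "upstream" means west of the precipitation area for a westerly component (u > 0) and east of it otherwise. The function names, the sign convention, and the Euclidean distance in degrees used to compare two centers are assumptions made for illustration.

```python
import math

import numpy as np


def guide_flow_sensitive_center(lat_p, lon_p, u, v, dlon=5.0):
    """Estimate the center (lat, lon) of the sensitive area.

    lat_p, lon_p : latitude and longitude of the maximum precipitation point.
    u, v : 700 hPa zonal and meridional wind. Pass scalars taken at the
        maximum precipitation point for a straight guide flow, or arrays over
        the precipitation area (their mean is used) when troughs or cyclonic
        circulations are present.
    dlon : longitudinal offset in degrees (5 degrees in the text).
    """
    u_mean = float(np.mean(u))
    v_mean = float(np.mean(v))
    if u_mean == 0.0:
        raise ValueError("guide flow has no zonal component; tan(theta) is undefined")
    tan_theta = v_mean / u_mean
    dlat = dlon * tan_theta               # lat = lon * tan(theta)
    step = 1.0 if u_mean > 0.0 else -1.0  # westerly flow: upstream lies to the west
    return lat_p - step * dlat, lon_p - step * dlon


def centers_agree(center_a, center_b, max_distance_deg=2.0):
    """True if two (lat, lon) centers are less than max_distance_deg apart."""
    return math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1]) < max_distance_deg
```

For a southwesterly guide flow with hypothetical components u = 10 m/s and v = 5 m/s at a maximum precipitation point of 23°N, 113°E, the sketch places the sensitive center 5° to the west and 2.5° to the south.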
To verify the accuracy of the sensitive areas, namely, to demonstrate that they are the source areas of the initial errors that lead to large forecast errors, we chose three other areas for comparison and carried out sensitivity experiments. These three areas had the same size as the sensitive areas and were located to the west, south, and north of the precipitation areas at distances of 10° longitude, 5° latitude, and 5° latitude, respectively. Then, the initial conditions of WRF_FNL were replaced with those of WRF_EC in the sensitive area to form a new set of initial conditions. Similarly, three other sets of new initial conditions were generated by replacing those of WRF_FNL with the initial conditions of WRF_EC in those three areas.
Keeping the model configuration unchanged, four new forecasts were produced with the above four sets of new initial conditions, and these are referred to here as F-sens, F-north, F-west, and F-south, respectively.
The cumulative precipitation of these forecasts was compared with the observed precipitation in South China, and their correlation coefficients were calculated. It can be seen from the results (Figure 5) that, for the 10 improved cases in Table 4, the correlation between F-sens and OBS is higher than the other correlations (F-north and OBS, F-west and OBS, and F-south and OBS). This means that improving the initial conditions in the sensitive areas can improve the forecast to a greater extent than improving them in other areas, which again verifies that the sensitive areas are the source areas of the initial errors as well as of the forecast errors.
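The construction of the modified initial conditions can be sketched as follows. This is a simplified illustration, assuming that the FNL and EC initial fields have already been interpolated to a common regular latitude-longitude grid and that values inside the box are replaced directly, with no blending at the box edges (the text does not say whether any smoothing was applied). The function and argument names are hypothetical.

```python
import numpy as np


def replace_in_box(field_fnl, field_ec, lats, lons, center_lat, center_lon, size_deg=4.0):
    """Return a copy of field_fnl with EC values inside a square box.

    field_fnl, field_ec : arrays of shape (..., ny, nx) on the same grid.
    lats : 1-D array of shape (ny,); lons : 1-D array of shape (nx,).
    center_lat, center_lon : center of the box (e.g. the sensitive area).
    size_deg : side length of the box in degrees (4 degrees in the text).
    """
    field_fnl = np.asarray(field_fnl, dtype=float)
    field_ec = np.asarray(field_ec, dtype=float)
    half = size_deg / 2.0
    lat_in = np.abs(np.asarray(lats, dtype=float) - center_lat) <= half
    lon_in = np.abs(np.asarray(lons, dtype=float) - center_lon) <= half
    box = lat_in[:, None] & lon_in[None, :]  # (ny, nx) mask of the box
    # The mask broadcasts over any leading dimensions, such as vertical levels.
    return np.where(box, field_ec, field_fnl), box
```

Calling this once with the sensitive-area center and once with each of the three control-area centers would give the four sets of initial conditions used for F-sens, F-north, F-west, and F-south.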

Conclusion
Based on the WRF model, this paper has investigated the possible sources of forecast errors with respect to rainstorms in the South China monsoon region. First, it is demonstrated that the initial error is an important source of the forecast errors, and then the source area (sensitive area) of the initial errors is explored. Next, the relationship between the sensitive area and the precipitation area is analyzed. Based on these results, a new method (which we call the "guide flow method") is developed to identify the sensitive area. Finally, the sensitive areas are examined through sensitivity experiments, which confirm the accuracy of the sources of the initial errors as well as of the forecast errors. By investigating the improvement rate of the forecast when using one set of data rather than the other, the degree of importance of the initial conditions to the forecasts has been obtained for each case. The results showed that the forecasts of 6 cases were significantly improved (the improvement rate is at least 0.5), which means that, for these cases, the initial conditions play a main role in the forecasts. The forecasts of 4 cases were slightly improved, which means that the initial conditions also have some effect on the forecasts. For these 10 cases, the initial errors are important sources of forecast errors, so we further explored the source of the initial errors by comparing the two sets of initial conditions. The results showed that the initial errors mainly came from an area located upstream of the rainfall area (about 5° of longitude away from the maximum precipitation area), which we called the sensitive area. By studying the relationship between the sensitive area and the precipitation area, we put forward the "guide flow" method to identify the sensitive areas. The sensitive areas identified by the guide flow method were found to be generally consistent with those identified by the moist energy method.
Since the guide flow method is easy to use, it may be helpful for quickly selecting perturbation areas for ensemble forecasts or for planning supplemental observations in adaptive observation. Finally, sensitivity experiments demonstrated that improving the initial conditions in the sensitive areas leads to more benefits than improving them in other same-sized areas. This verifies the accuracy of the sensitive areas and confirms the source of the forecast errors.
Statistically, improving the initial conditions may improve the forecasting of rainstorms in the South China monsoon region. However, there are some cases (such as the 12 April 2019 case) in which improving the initial conditions brings no benefit to the forecasts. Thus, for these cases, the model error has an important impact on the rainstorm forecast. In addition, there are some cases where we failed to find a better initial condition (cases 3, 6, 7, 16, and 19); for these cases, it is necessary to find other ways to evaluate the importance of the initial conditions. In short, further in-depth research is still needed to improve the prediction skill for rainstorms in the South China monsoon region.
Data Availability
The data used in this paper can be obtained from Lin Lin (20211101017@nuist.edu.cn) upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Figure 5: Correlation coefficients for the improved cases in Table 4, in which the red, green, purple, and blue lines plot the correlation coefficients between OBS and the cumulative precipitation results in South China after replacing the initial conditions in the sensitive area, northern control area, southern control area, and western control area, respectively.
Advances in Meteorology