Prediction of convective storms at . . .



Abstract
For the first time ever, convection-resolving forecasts at 1 km horizontal resolution were produced in realtime in spring 2009 by the Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma as part of the NOAA Hazardous Weather Testbed Spring Experiment. The forecasts assimilated both radial velocity and reflectivity data from all operational WSR-88D radars within a domain covering most of the continental United States. In preparation for the realtime forecasts, 1 km forecast tests were carried out using a case from spring 2008, and the forecasts with and without radar data assimilation are compared with corresponding 4 km forecasts produced in realtime. For the test case, a significant positive impact of radar data assimilation is found to last at least 24 hours. The 1 km grid produced a more accurate forecast of organized convection, especially in the details of structure and intensity. It successfully predicted an isolated severe-weather-producing storm nearly 24 hours into the forecast, a storm that all ten 4 km realtime forecasts of a forecast ensemble failed to predict. This case, together with all available forecasts from the 2009 CAPS realtime spring forecast experiment, provides evidence of the value of both the convection-resolving 1 km grid and radar data assimilation for severe weather prediction for up to 24 hours.

Introduction
Accurate prediction of convective-scale hazardous weather continues to be a major challenge. Efforts to explicitly predict convective storms using numerical models date back to Lilly [1], and a sustained effort began with the establishment in 1989 of an NSF Science and Technology Center, the Center for Analysis and Prediction of Storms (CAPS). Over the past two decades, steady progress has been made, aided by continuing increases in available computing power. Still, the resolutions of current-generation operational numerical weather prediction (NWP) models remain too low to explicitly resolve convection, which limits the accuracy of quantitative precipitation forecasts.
For over a decade, the research community has been producing experimental real time forecasts at 3-4 km convection-allowing resolutions [e.g., 2, 3, 4]. Roberts and Lean [5] documented that convection forecasts of up to 6 hours are more skillful when run on a 1 km grid than on a 12 km grid, and also more skillful than when run on a 4 km grid. On the other hand, Kain et al. [3] found no appreciable improvement of 2 km forecasts over 4 km forecasts beyond 12 hours of forecast time.
In the spring seasons of 2007 and 2008, CAPS conducted more systematic realtime experiments. Daily forecasts of 30 h or more were produced with 10-member 4 km ensembles and 2 km deterministic configurations [6, 7, X07 and X08 hereafter]. In 2008, radial velocity (Vr) and reflectivity (Z) data from all operational radars in a domain covering most of the CONUS (continental US) were assimilated [7] using a combined 3DVAR-cloud analysis method [8,9]. Standard precipitation verification scores show that the significant positive impact of radar data lasts up to 9 hours, but that the difference in scores between the 4 and 2 km forecasts is relatively small [7,10].
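For readers less familiar with 3DVAR, the variational analysis referred to above can be summarized by the generic cost function below. This is the textbook form, not the exact ARPS formulation: x is the model state, x_b the background (a NAM analysis in this study), y_o the observations (e.g., Vr), H the observation operator, and B and R the background and observation error covariances. As a hedged note, the ARPS 3DVAR [17] can include additional penalty terms, such as a weak mass-continuity constraint, which are lumped here into J_c. Reflectivity does not enter through J; it is handled by the separate cloud analysis.

```latex
J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\left[H(\mathbf{x})-\mathbf{y}_o\right]^{\mathrm{T}}\mathbf{R}^{-1}\left[H(\mathbf{x})-\mathbf{y}_o\right]
  + J_c(\mathbf{x})

% Simplified radial-velocity operator (beam bending and hydrometeor fall speed
% neglected), with azimuth \phi measured clockwise from north and elevation \theta:
V_r = u\,\sin\phi\cos\theta + v\,\cos\phi\cos\theta + w\,\sin\theta
```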
Recognizing that producing better convective forecasts requires accurately resolving the internal structures of convective storms, the CAPS team carried out realtime 1 km resolution forecasts assimilating radar data from mid-April through early June 2009 [11]. The daily 30-hour forecasts used 9600 processor cores of the new Cray XT5 supercomputer at Oak Ridge National Laboratory (ORNL), and each forecast took about 5.5 hours to complete. In preparation for these forecasts, tests were made using cases from the spring of 2008; they represented the first time ever that forecasts at 1 km resolution were produced for a large domain covering the entire CONUS while assimilating all available data from the operational weather radars in the domain (see Fig. 1). In this paper, we document the results of one of the 1 km tests, as produced in early 2009 in preparation for the 2009 CAPS spring forecast experiment, and compare them with forecasts produced at 4 km grid spacing with and without radar data assimilation. We also briefly present the mean precipitation skill scores from the spring 2009 forecasts, produced at 1 km and 4 km grid spacing with radar data assimilation and at 4 km without radar data, together with their comparison against the reference NAM forecasts.
The rest of this paper is organized as follows. Section 2 describes the forecast model configurations, and sections 3 and 4 present and discuss the results. A summary is given in section 5.

Forecast Configurations
The case chosen is that of 26 May 2008, a more weakly forced case highlighted in X08. The 4 km forecasts were produced in real time, corresponding to the control members of the 4 km storm-scale ensemble forecasts [SSEF, X08, 12] with and without radar data (named CN4 and C04, respectively). In 2008, the CAPS forecasts used version 2.2 of the Advanced Research Weather Research and Forecasting (WRF-ARW) [13] model, while in 2009 version 3.0 of WRF-ARW was used. For this reason, the 4 km and 1 km forecasts presented in this paper used versions 2.2 and 3.0 of WRF, respectively, but with the same set of physics parameterization options, corresponding to the control member of the CAPS SSEFs of the two years [12,14]¹. Specifically, the RRTM longwave and NASA GSFC shortwave radiation, the NOAH land surface model, the Thompson microphysics and the Mellor-Yamada-Janjić (MYJ) PBL schemes were used (see X08 for references), together with monotonic advection [15]. No cumulus parameterization scheme was used, since 4 km and 1 km grid spacings are generally considered convection-permitting and convection-resolving, respectively, while cumulus parameterization schemes are usually designed for grid spacings larger than 10 km [16].
¹ The physics options used by the control forecasts of the two years were the same. Furthermore, version 3.0 differs from 2.2 mainly in the addition of new physics parameterization schemes, while the dynamic core remains the same. Later tests showed that, for the configurations used, version 3.0 produced essentially the same forecast results as version 2.2 for the 4 km forecasts.

All forecasts were initialized at 0000 UTC 26 May 2008. Forecasts C04 and C01 are, respectively, 4 and 1 km forecasts without radar data assimilation, and were initialized by interpolation from the 0000 UTC analysis of the operational National Centers for Environmental Prediction (NCEP) North American Mesoscale (NAM) model on a 12 km grid. The 4 and 1 km forecasts with radar data assimilation, i.e., CN4 and CN1, started from analyses produced on the native model grid by the Advanced Regional Prediction System (ARPS) [8] three-dimensional variational (3DVAR) [17] analysis system and its complex cloud analysis package [9,18], using the same NAM analysis as the background. Full-volume level-2 Vr data from 57 WSR-88D radars running in precipitation mode (63 additional radars ran in clear-air mode) were analyzed by the 3DVAR. The Z data entered the system through the ARPS complex cloud analysis package, which analyzes cloud and hydrometeor fields and then adjusts in-cloud temperature and moisture based on a 1-D parcel model with entrainment in areas of diagnosed cloud and rising motion [18]. The radar data were first quality controlled, including velocity dealiasing, and then 'remapped' to the model grid through a least-squares fitting procedure [19] before being analyzed.
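The remapping step can be illustrated with a deliberately simplified, hypothetical sketch. It is not the procedure of [19], which is not reproduced here; it only shows the underlying idea of a local least-squares fit: the radar gates falling within a search volume around a grid point are fitted with a linear function of their offsets from that point, and the fitted value at the grid point is taken as the gridded observation. The function name, the linear form, and the fallback to a simple mean are all assumptions made for illustration.

```python
import numpy as np


def remap_to_gridpoint(gate_xyz, gate_values, grid_xyz):
    """Hypothetical local least-squares remapping of radar gates to one grid point.

    Fits v ~ a + b*dx + c*dy + d*dz to the gates, with offsets measured from the
    grid point, so the fitted value at the grid point itself is the intercept a.
    """
    offsets = np.asarray(gate_xyz, dtype=float) - np.asarray(grid_xyz, dtype=float)
    values = np.asarray(gate_values, dtype=float)
    if len(values) < 4:
        # Too few gates to constrain a 3-D linear fit: fall back to a plain average
        return float(values.mean())
    # Design matrix: a column of ones (intercept) plus the x, y, z offsets
    design = np.column_stack([np.ones(len(values)), offsets])
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return float(coeffs[0])


# Gates sampling a field that varies linearly in space: v = 10 + 2x - y + 0.5z
gates = np.array([[0, 1, 1], [2, 1, 1], [1, 0, 1], [1, 2, 1], [1, 1, 0], [1, 1, 2]], dtype=float)
vr = 10 + 2 * gates[:, 0] - gates[:, 1] + 0.5 * gates[:, 2]
print(remap_to_gridpoint(gates, vr, (1.0, 1.0, 1.0)))  # approximately 11.5
```

Because the fit is local and uses many gates per grid point, noise in individual gates is averaged out, which is the sense in which such remapped data act as super-observations.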
The radar data were thus essentially super-obbed to the model grid before analysis. Additionally, wind profiler and standard surface observations, including the Oklahoma (OK) Mesonet data, were analyzed. The lateral boundary conditions came from the NAM forecasts. Both grids had 50 vertical layers with a near-surface vertical resolution of 20 m.

. . . border. Over the next three hours, these lines evolved into a long connected line that was further linked with the convection in the Great Lakes (GL) region (Fig. 1a). This squall line propagated eastward and maintained its identity until 0000 UTC 27 May (not shown), when it was found over eastern Mississippi (MS), northern Alabama (AL) and eastern Tennessee (TN). During the entire period, the cold front was nearly stationary; the squall line was therefore mostly self-propagating, driven by the progression of its own cold pool. The initial convection-initiating forcing along the front and dryline was lost during this stage. The line quickly dissipated after 0000 UTC 27 May.

Forecast Results and Subjective Evaluation
During this 24 hour period, other regions of convection interacted with each other. As documented by X08, the evolution of convection during this period was rather complex, and the morphology of many of the convective storms was modulated by their own cold pools and gust fronts and by interactions with those of other storms. Such a situation is more difficult to predict than cases where strong propagating synoptic-scale features, such as a strong cold front, play a more controlling role. We demonstrate here that, in the absence of strong large-scale control, the impact of radar data can be long lasting.

Prediction results
At the initial time (not shown), the composite (vertical column maximum) Z fields in CN4 and CN1 look very similar to the observed ones, owing to the direct assimilation of Z data. C04 and C01, however, had no reflectivity in the initial condition. Being properly initialized in CN1 and CN4, these groups of convection were accurately predicted over the first three hours (Fig. 1b,c). The characteristics and pattern of convection predicted by CN1 (Fig. 1b) in the TX panhandle, northwest OK, and KS regions at 0300 UTC compare very well with the observations (Fig. 1a). The 4 km forecast without assimilating radar or additional surface mesonet data (C04) is clearly inferior at 3 hours (Fig. 1d): essentially all of the line segments in TX, OK and KS are missing. Instead, the model was trying to initiate new convection along the dryline at the TX-NM and KS-Colorado (CO) borders and along the cold front, now located at the KS-Nebraska (NE) border and intersecting the dryline at the northwest corner of KS. In C04, the bow in the MO-AR region is mostly missing, and the convection in the GL region is too weak. In this case, the convection that developed near the cold front and dryline in the first few hours of the forecast was in the wrong locations; as we will see later, this has long-term consequences.
At 9 hours, when the direct impact of radar data, as measured by season-averaged standard skill scores, starts to diminish (X08), the positive impact of radar data is still very clear in this case in both CN1 and CN4 (Fig. 2). Fig. 2b shows that CN1 predicted the strong, narrow squall line extending from central OK through east-central MO very well, including the structure of the embedded intense convection. Its southern end advanced too fast, though, placing it about 150 km ahead of the observed line in southeast OK. One possible reason for the too-fast advance of the line is that the cold pool may be too strong. Cold pool intensity has been found to be rather sensitive to the microphysics, especially the drop/particle size distributions of rain and graupel, which affect cold pool intensity through evaporation and melting [21,22]. . . . 23] but are notoriously difficult to predict in numerical models, and lack of model resolution and deficiencies in the microphysics have been suspected to be the cause [24][25][26].
The fact that the 1 km forecast shows a somewhat better ability to produce the trailing stratiform precipitation is encouraging. The evolution of convection in other parts of the domain (not shown), including southwest TX, the northern US Rockies, and near the GL, also generally agrees with observations.
The general pattern of predicted convection in CN4 (Fig. 2c) is similar to that in CN1 (Fig. 2b), although significant differences exist in detail. CN4 also captured the general Γ-shaped echo, but the embedded cells are clearly weaker. The southern portion of the main line also propagated too fast. In general, the 1 km forecast is noticeably superior to the 4 km forecast; it provides a much clearer indication of the intensity of the strongest embedded convective cells.
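The suggestion that an overly strong cold pool makes a line advance too fast can be quantified with the standard density-current estimate of cold-pool propagation speed, c² = 2∫₀ᴴ(−B) dz, where B is the buoyancy of the cold air and H is the cold-pool depth. The sketch below evaluates this estimate for a prescribed potential temperature deficit; it is a back-of-the-envelope illustration, not a diagnostic computed from the CN1 or CN4 runs, and it assumes B ≈ gθ′/θ₀ with a hypothetical reference θ₀ = 300 K, ignoring moisture and hydrometeor loading.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s^-2)


def cold_pool_speed(theta_pert, heights, theta_ref=300.0):
    """Density-current estimate c = sqrt(2 * integral_0^H (-B) dz).

    theta_pert: potential temperature perturbation (K) at each height, negative in the cold pool
    heights:    heights (m) from the surface to the cold-pool top H
    theta_ref:  reference potential temperature (K); the default is an assumption
    Buoyancy is approximated as B = g * theta_pert / theta_ref.
    """
    theta_pert = np.asarray(theta_pert, dtype=float)
    z = np.asarray(heights, dtype=float)
    neg_b = -G * theta_pert / theta_ref
    # Trapezoidal integration of -B over the cold-pool depth
    integral = np.sum(0.5 * (neg_b[1:] + neg_b[:-1]) * np.diff(z))
    return float(np.sqrt(2.0 * max(integral, 0.0)))


z = np.linspace(0.0, 2000.0, 21)  # a 2 km deep cold pool
weaker = cold_pool_speed(np.full(z.size, -3.0), z)
stronger = cold_pool_speed(np.full(z.size, -6.0), z)
print(f"{weaker:.1f} m/s vs {stronger:.1f} m/s")  # prints 19.8 m/s vs 28.0 m/s
```

Doubling the temperature deficit raises the estimated speed by a factor of √2, so a modest overestimate of evaporative or melting cooling can plausibly accumulate into a sizable position error over several hours of forecast.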
The forecast of C04 at this time is much poorer (Fig. 2d). This forecast never managed to 'spin up' the prefrontal and pre-dryline convection. It simply evolved the convection that was incorrectly initiated along the front and dryline during the first few hours of the forecast, missing the most significant areas of convection. As discussed in X08, this failure continued to affect the subsequent evolution of a complex sequence of convective activity for the remainder of the forecast.
By noon of 26 May (1800 UTC), all of the convective systems from the previous evening and night had moved out of the central Plains. The quasi-stationary front still ran across central KS, intersecting the dryline that extended north from the TX panhandle near the CO border (not shown). In the afternoon, convection was initiated along the dryline and, to a lesser extent, along the front. These processes were captured well in both CN1 and CN4 (Fig. 3).
In the late afternoon hours, many hail events associated with these convective storms were reported. Two brief tornadoes were reported near Dodge City, KS, in the vicinity of the dryline-cold front triple point. At 2300 UTC, the observed composite reflectivity map of the OK-KS region shows three groups of convective cells (labeled A, B and C in Fig. 3a): one near the western OK border (A), one in southwestern KS near Dodge City (B), and one in the form of more isolated cells at the central OK-KS border (C). Groups A and B were initiated along the dryline, B near the front-dryline triple point (the east-west frontal location can be inferred from the surface wind field in, e.g., Fig. 3b, while the north-south dryline is located near the east edge of the plotting domain). They were captured in both CN4 and CN1 (Fig. 3b,c) but not in C04 (Fig. 3d). In C04, the convection that was incorrectly initiated along the front over 20 hours earlier had organized into an east-west oriented line and moved to northern OK by this time (Fig. 3d); it dissipated over the next couple of hours. This line evidently interfered with the conditions that produced the actual dryline convective initiation in the afternoon of the second day. In fact, in C04 no initiation occurred along the dryline at all, except for an isolated cell near the triple point (Fig. 3d).
Group C, consisting of more isolated cells, formed in the warm sector south of the front and east of the dryline near the KS-OK border (Fig. 3a). Interestingly, the main cell in this group is successfully predicted in CN1 (Fig. 3b), but not in CN4, C04, or any other member of the 4 km ensemble produced in real time (X08). The observed cell became fully developed at 1900 UTC, while in CN1 it reached maturity at 2100 UTC.
The observed storm propagated slowly south-southeastward, and maintained its identity until 0300 UTC 27 May. It generated many hail reports and a high-wind report of over 40 m s⁻¹ at 2340 UTC.
The corresponding storm in CN1 maintained its full intensity until after 0100 UTC. It had gained some supercell characteristics in the shape of its reflectivity pattern by 2300 UTC (Fig. 3b), consistent with the severe weather reports. Despite some differences in exact timing and longevity between the observed and predicted storms, the ability of a 1 km model to predict, about 20 hours into the forecast, an isolated severe storm that developed in the absence of obvious mesoscale forcing is remarkable.
None of the ten 4 km ensemble forecasts, which included initial and boundary condition perturbations as well as variations in physics schemes, captured this storm. In fact, the 4 km member without radar data assimilation completely missed the initiation along the dryline on the second day. Finally, the 1 km forecast without radar data assimilation, C01, is as poor as C04, as can be seen from the precipitation forecast scores presented in the next section.

Precipitation verifications
To complement the subjective evaluation above of the forecasts for the 26 May 2008 test case, we calculate Equitable Threat Scores (ETSs) verified against hourly radar-estimated precipitation produced on a 1 km grid in real time by the National Severe Storms Laboratory [27]. These data were first interpolated to the forecast model grid before the ETSs were calculated. Fig. 4 shows the ETSs of hourly accumulated precipitation at the 0.1 and 0.5 inch per hour thresholds for the entire model domain.
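The ETS is the standard contingency-table score: with hits a, false alarms b, misses c and N grid points, the hits expected by chance are a_r = (a + b)(a + c)/N and ETS = (a − a_r)/(a + b + c − a_r). A minimal sketch follows, assuming the forecast and observed fields are already on the same grid and that an event means precipitation at or above the threshold (the exact inequality used in the study is an assumption). A value of 1 is a perfect forecast, and 0 means no skill beyond random chance with the same event frequencies.

```python
import numpy as np


def equitable_threat_score(forecast, observed, threshold):
    """ETS for one threshold, comparing two precipitation fields on the same grid."""
    f = np.asarray(forecast) >= threshold  # forecast "yes" events
    o = np.asarray(observed) >= threshold  # observed "yes" events
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    total = f.size
    # Hits expected by chance for a random forecast with the same event coverage
    hits_random = (hits + false_alarms) * (hits + misses) / total
    denom = hits + false_alarms + misses - hits_random
    if denom == 0:
        return float("nan")  # undefined, e.g. when neither field contains an event
    return float((hits - hits_random) / denom)


obs = np.zeros((4, 4)); obs[1:3, 1:3] = 0.3     # observed 2x2 rain area (inches)
fcst = np.zeros((4, 4)); fcst[1:3, 2:4] = 0.3   # same area shifted one column east
print(equitable_threat_score(obs, obs, 0.1))    # 1.0 for a perfect forecast
print(equitable_threat_score(fcst, obs, 0.1))   # 0.2
```

The shifted example already hints at why grid-point scores are harsh on high-resolution forecasts: a one-grid-point displacement of a correctly sized and shaped rain area cuts the score from 1.0 to 0.2.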
Clearly, the radar-assimilating CN1 and CN4 start with much higher ETSs, while the scores of C01 and C04 are around zero before 12 hours. For the 0.1 inch per hour threshold (Fig. 4a), the ETS for the first hour is about 0.45 for CN1 and 0.3 for CN4, indicating a large difference between the short-range precipitation forecasts of the 1 and 4 km grids. For the higher 0.5 inch per hour threshold (Fig. 4b), the first-hour scores are 0.29 versus 0.14, respectively. In general, the ETSs decrease quickly during the first 5 hours, and the decrease is fastest during the first two hours, especially for the higher threshold. Such behavior is expected and is consistent with the shorter range of predictability of more intense, smaller-scale convection, since errors associated with smaller-scale, unstable motions grow the fastest [e.g., 28]. As errors at the very short spatial scales present in the radar-assimilated initial condition grow, the predictability associated with those scales is quickly lost, causing the initially rapid decrease of the precipitation forecast skill scores. Another possible cause of the initially rapid decrease is insufficient dynamic and thermodynamic consistency among the model state variables within clouds when initialized by the single-time 3DVAR/cloud analysis. More advanced, four-dimensional data assimilation methods that are closely coupled with the prediction model are expected to slow down the initial error growth to some degree. Forecast model error is another source of error, although it tends to have a larger impact on longer forecasts.
The scores of C04 and C01 remain very low throughout the 30 hour forecasts and never exceed 0.03 (0.02 for the higher threshold). Between 2 and 19 hours, the scores of CN1 are up to 0.05 higher than those of CN4 for the lower threshold (Fig. 4a); after 19 hours, the scores are comparable. For the higher threshold (Fig. 4b), the differences between CN1 and CN4 become small after three hours. For grid point-based skill scores such as the ETS, position errors in small-scale features can significantly degrade the scores. In general, beyond the life cycle of the convective storms present in the initial condition, it is difficult for an NWP model to predict the timing and location of new storm cells accurately, especially when they are not forced by fixed features such as local terrain. Skill scores that allow for a certain degree of position error are therefore often more useful [e.g., 5].
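One score of this kind is the fractions skill score (FSS) proposed by Roberts and Lean [5], which compares the fractions of grid points exceeding a threshold within an n × n neighborhood rather than matching events point by point. The sketch below is a minimal NumPy implementation under stated assumptions (odd n, points outside the domain counted as non-events); it was not used to produce any result in this paper. The synthetic example displaces a rain area by three grid points: at the grid scale (n = 1) the forecast scores zero, but it gains skill once the neighborhood spans the displacement.

```python
import numpy as np


def neighborhood_fraction(binary, n):
    """Fraction of 'yes' points in the n x n window centred on each grid point.

    n must be odd; points outside the domain are treated as non-events.
    """
    if n % 2 == 0:
        raise ValueError("neighborhood size n must be odd")
    r = n // 2
    padded = np.pad(binary.astype(float), r, mode="constant")
    # Summed-area table with a leading row and column of zeros
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    window_sum = s[n:n + ny, n:n + nx] - s[:ny, n:n + nx] - s[n:n + ny, :nx] + s[:ny, :nx]
    return window_sum / (n * n)


def fractions_skill_score(forecast, observed, threshold, n):
    """FSS = 1 - MSE(Pf, Po) / (mean(Pf^2) + mean(Po^2)) for neighborhood fractions."""
    pf = neighborhood_fraction(np.asarray(forecast) >= threshold, n)
    po = neighborhood_fraction(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return float("nan") if mse_ref == 0 else float(1.0 - mse / mse_ref)


obs = np.zeros((20, 20)); obs[8:11, 8:11] = 1.0     # observed 3x3 rain area
fcst = np.zeros((20, 20)); fcst[8:11, 11:14] = 1.0  # same area, displaced 3 points east
for n in (1, 3, 7):
    print(n, round(fractions_skill_score(fcst, obs, 0.5, n), 3))
```

Computing the score for increasing n shows the spatial scale at which a forecast becomes useful, which is more informative for convection-resolving grids than a single grid-point score.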
To examine the precipitation forecast skill of the 4 and 1 km grids, and the impact of radar data on the 4 km grid, beyond the single test case presented above, we briefly discuss here the ETSs of forecasts from the 23 days of the 2009 CAPS spring forecast experiment on which all three forecasts are available; they are presented in Fig. 5 for three-hour accumulated precipitation at the 0.1 and 0.5 inch thresholds. For the ETS calculations, the 1 km precipitation fields were averaged to the 4 km grid. Fig. 5a shows that for the lower threshold, the mean ETSs of CN1 are slightly higher than those of CN4 before 21 hours, except at hour 12, when the score of CN1 dips slightly below that of CN4. For later hours, the scores are similar. The same comparison holds for the higher threshold (Fig. 5b), although the relative difference is larger. This suggests that more intense convection, typically associated with smaller, more localized storms, benefits more on average from the increased spatial resolution. For the 26 May 2008 test case, the difference between CN1 and CN4 is larger for the lower threshold, but note that the threshold in Fig. 4a is 0.1 inch per hour rather than 0.1 inch per three hours, so it actually corresponds to a higher precipitation intensity. In general, the ETSs for all forecasts of spring 2009 are consistent with those of the 26 May 2008 test case.
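Averaging the 1 km precipitation to the 4 km grid before computing the ETS can be sketched as below, under the simplifying and hypothetical assumption that the two grids are aligned so that each 4 km cell contains exactly 4 × 4 cells of the 1 km grid; with non-aligned grids, an area-weighted remapping would be needed instead. The synthetic field is for illustration only.

```python
import numpy as np


def block_average(field_fine, factor=4):
    """Average non-overlapping factor x factor blocks of a 2-D fine-grid field.

    Assumes the coarse grid is exactly aligned with the fine grid, so each coarse
    cell covers factor x factor fine cells.
    """
    field_fine = np.asarray(field_fine, dtype=float)
    ny, nx = field_fine.shape
    if ny % factor or nx % factor:
        raise ValueError("fine-grid dimensions must be multiples of the averaging factor")
    # Axes 1 and 3 index the fine cells inside each coarse cell
    blocks = field_fine.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


precip_1km = np.random.default_rng(0).gamma(0.5, 0.2, size=(400, 400))  # synthetic field
precip_4km = block_average(precip_1km)
print(precip_4km.shape, np.isclose(precip_1km.mean(), precip_4km.mean()))  # (100, 100) True
```

Block averaging preserves the domain-mean precipitation while smoothing the small-scale maxima that the coarser grid cannot represent, so the verification compares both forecasts at a common scale.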
The ETSs of the operational 12 km NAM forecasts are consistently lower than those of all the high-resolution forecasts for the lower threshold (Fig. 5a), except during the first three hours when compared with the no-radar 4 km run (C04). In the initial hours, the NAM might have benefited from the consistency of its own analysis with its prediction model. Still, with the assimilation of radar data on either the 4 or the 1 km grid, the precipitation scores are much higher even during the initial hours (Fig. 5a).

Summary
In this paper, we report on the results of the first-ever test forecasts at 1 km grid spacing in a domain covering almost the entire continental U.S., performed for a case from May 2008, and compare them with similarly configured forecasts produced at 4 km grid spacing in real time. The forecasts were 30 hours long; one pair of 1 and 4 km forecasts assimilated both radial velocity and reflectivity data from all operational U.S. WSR-88D radars within the model domain, while another pair did not assimilate radar data.
Based on subjective evaluations, a significant positive impact of radar data assimilation is found to last at least 24 hours for the test case. The 1 km forecast with radar data assimilation reproduced the observed convection more accurately than the corresponding 4 km forecast, especially in structure and intensity. It successfully predicted an isolated severe storm nearly 24 hours into the forecast, while the corresponding 4 km forecast, as well as all other 4 km members of the CAPS realtime storm-scale ensemble forecasts, failed to do so. In terms of precipitation skill scores, the positive impact of radar data assimilation, on both the 4 and 1 km grids, is even larger than that of the increased resolution. Similar conclusions hold for precipitation forecasts based on mean equitable threat scores for 23 forecast days from spring 2009. This study provides evidence of the value of both convection-resolving resolution and radar data assimilation for severe weather prediction for up to 24 hours. We do want to point out that the equitable threat score examined in this paper has many limitations when applied to high-resolution precipitation forecasts, owing to the large penalty associated with position errors. Object-based verification methods [e.g., 29] and methods that account for position errors [e.g., 5] will be explored in the future. In fact, an initial effort has been made to compare the number and size characteristics of storm cells predicted on the 4 and 1 km grids during the CAPS realtime forecasts [30].