Statistical Downscaling for Rainfall Forecasts Using Modified Constructed Analog Method in Thailand

Rainfall simulations were created from historical data in this study by using statistical downscaling. Statistical downscaling techniques are based on a relationship between the variables resolved by General Circulation Models (GCMs) and the observations. The Modified Constructed Analog Method (MCAM) is a downscaling technique suited to rainfall simulation that is based on analog weather forecasting. In this research, the MCAM used the Euclidean distance to select the analog days. A linear combination of 30 analog days was then formed from the rainfall data, with the analog days drawn from the corresponding 5 days around the forecast day and combined using adjusted weights. The method was used to forecast daily rainfall, and the observed rainfall data were obtained from the Thai Meteorological Department (TMD) for the period 1979 to 2010 at thirty stations. The experiment combined rainfall forecast data with historical data for the rainy season of 2010. The results showed that the MCAM gave a correlation of 0.8 and a reduced percentage error of 13.66%. The MCAM gave a value of 1094.10 mm, which was the closest to the observed precipitation of 1119.53 mm.


Introduction
It is difficult to predict the exact amount of precipitation in future events and to prevent natural disasters. Hence, research and development in weather forecasting deserve attention, because rainfall is a crucial factor in sustaining life and the environment. Rainfall forecasts play an important role in managing water resources, the environment, and agriculture. Rainfall forecasting is still at a developing stage. Forecasts can be classified into 3 main methods [1][2][3][4][5]. The first method is statistical forecasting, based on finding the relationship between past climatological data and future forecasts. This method is relatively simple, but the relationship may change suddenly, which makes the forecasts less accurate. The second method is dynamical forecasting based on a climate model. This method requires a high-performance computer to run sophisticated models and may also require large amounts of input data. The third method is hybrid forecasting, which applies statistical forecasting and dynamical forecasting together [1]. In general, this method provides a forecast with higher accuracy than the statistical method [2]. However, the resolution of the forecast is still too low for area-specific applications, so a downscaling method is required. Downscaling is the process of relating large-scale atmospheric variables provided by GCMs to information at finer spatial and temporal scales. In recent articles, downscaling is widely applied in climatology for purposes such as the construction, simulation, and prediction of the mean, minimum, and maximum air temperature and rainfall for the past 30 years [3]. Approaches for downscaling GCM simulations can be broadly classified as "dynamical" or "statistical" downscaling [4].
Dynamical downscaling is a technique that takes output data from GCMs and uses them to drive a suitable regional numerical model with a higher spatial resolution. This can simulate local climate conditions in greater detail. Techniques that employ regional climate models with fine grid spacing are quite efficient for forecasting [5][6][7][8]. Statistical downscaling techniques are based on a relationship between the larger scale climate predictors and observed precipitation. Predictors such as mean sea level pressure, geopotential height, relative humidity, and temperature are popular choices for downscaling precipitation forecasts to the desired region [2][3][4][5][6][7][8][9][10][11][12][13]. There are a variety of methods for statistical downscaling [5], for example, the Delta Method (DM), Bias-Correction Method (BCM), Constructed Analogs Method (CAM), Localized Constructed Analogs Method (LOCA), Artificial Neural Networks (ANNs), Least Squares Support Vector Machines (LS-SVM), nonparametric kernel regression (NKR) [8][9][10], and so forth. The DM compares an arrangement of historical data and present day data with the actual records of measured data (monthly or daily observations) [14]. The BCM uses differences in climatological mean values between the GCM and observations from historical reference periods to "correct" future GCM simulations [1]. In Bias-Correction and Spatial Disaggregation (BCSD), the GCM-simulated values are "mapped" by quantile onto historical observed data. The Analog Method (AM) takes the present day weather forecast and searches the record for a day in the past when the weather scenario appears most similar (the analog day); that is, it finds an "appropriate match" analog for a future forecast [15].
The CAM uses a combination of analog days. Shukla et al. [16] used it to improve the North American Multimodel Ensemble (NMME) March-April-May (MAM) precipitation forecasts over equatorial East Africa (EA). The area of study is between 2°S to 8°N and 36°E to 46°E. The results showed that precipitation and sea surface temperature (SST) forecasted over a large part of the Indo-Pacific Ocean (specifically between latitude 30°S to 30°N and longitude 30°E to 27°E, i.e., the analog domain) demonstrated high levels of absolute correlation with observed MAM precipitation over the EA (focus) region during the post-1999 period. Moreover, using the NMME precipitation forecasts over the analog domain as a predictor for EA MAM precipitation generally provided higher levels of performance than using SST forecasts as predictors. Pierce et al. introduced a new technique (LOCA) for statistical downscaling simulations of daily temperature and precipitation [17], using observations over the period from 1940 to 1969 as training data. Observations from 1970 to 2005 are used as testing data. They use anomalies when downscaling temperature and absolute values when downscaling precipitation. Results from downscaling the daily maximum temperature and precipitation illustrate that LOCA reproduces the extremes in summer maximum daily temperatures and winter daily precipitation quite well. Many researchers have constructed predictions with statistical downscaling in a variety of experimental ways. Statistical downscaling applications are therefore preferred in present day studies and are considered one of the most cost-effective methods for local-impact estimates of climate scenarios and rainfall forecasts [18].
This method is of interest to developing countries and provides economic resources to streamline the recruitment system but requires high-performance computers. Climate change has significant impacts on human activity and natural disasters [3]. Thailand frequently receives large quantities of rain, which cause flooding, damage agriculture, and affect industry and the population. These issues are the primary motivation for this research. The main objective of this study is to develop a rainfall forecast for Thailand using the MCAM and to compare it with the precipitation observed at the TMD stations under investigation. The MCAM is designed to determine the appropriate similarity measure and the coefficients of the linear combination. It is a downscaling technique that provides estimates suitable for rainfall simulation using analog weather forecasting. Analog weather forecasting finds the best matching historical occurrence of a target pattern; the MCAM uses it to determine the analog days for the rainfall forecast at each station. Four predictors are taken from each of two datasets, the National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) and the NCEP Climate Forecast System Version 2 (CFSv2), for the area covering Thailand. These predictors are the mean sea level pressure, temperature, moisture, and geopotential height at 850 hPa. Using the analysis field of CFSR (for the years 1979 to 2009) and the forecast field of CFSv2 (for the year 2010), the best matching historical pattern, that is, the analog day, is found with the Euclidean distance formula. An analog day in the historical record (past data) has the same characteristics as a predictor at a given target time. However, choosing a suitable predictor for the MCAM is important because it decreases the error in the precipitation forecast through the change in the coefficients.
If the coefficient obtained from the CFSR and CFSv2 is too high, the error of the forecasted precipitation will also be high. The most suitable predictor is the one with the lowest Euclidean distance among the similarity measurements. Once the analog day has been determined, the information can be used to forecast the precipitation of the current day (MCAM calibration). To test the performance of the method, it is compared with the AM and the CAM, the original methods from which the MCAM is derived. This paper is organized as follows. Section 2 presents an overview of the study area and the data used in this research, followed by the methodology in Section 3. In Section 4, the results of the analyses are presented, and the conclusions of the study are given in Section 5.

Data and Domain
In this research, the experimental cases select the predictors only from the CFSR and CFSv2-Interim forecast dataset. These are the initial conditions for comparison between CFSR in the years 1979 to 2009 (analysis data) and CFSv2 in the year 2010 (forecast data). For the standard analysis of daily rainfall in Thailand, the past rainfall data for the years 1979 to 2010 were recorded with measurement tools such as the rain gauge. The rain gauge measures the depth of precipitation that falls onto a set area, in millimeters. There are two types of rain gauges: the nonrecording rain gauge and the recording rain gauge. A total of 80 meteorological stations in Thailand record this data daily from 7.00 a.m. to 7.00 a.m. of the next day. However, some of these 80 stations have missing data. Therefore, to keep the analysis accurate, only the 30 stations without missing data were selected. Figure 1 shows the mean rainfall (mm/day) during 15 May to 15 October 2010 at the 30 stations.

Methodology
A similarity measure is a function that computes the degree of similarity between a pair of objects. Similarity can be measured in a variety of ways, such as the Euclidean distance and the absolute error [24]. In this study, only the Euclidean distance is used as the calculation method. The Euclidean distance is the length of the straight line segment between two points [14]. The Euclidean distance between $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is defined by

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

Distance measurements are therefore applied to search the historical record for the day most similar to the forecast day. The Euclidean distance can be applied in several methods, such as the AM and the CAM. The MCAM can estimate the rainfall forecast (mm/h) at the station. The AM, the CAM, and the MCAM are presented here.
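As a small illustration of the distance used throughout this section, the following Python sketch computes the Euclidean distance between two predictor fields stored as flat lists of grid-point values. The function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical predictor values at three grid points (illustrative only).
forecast_field = [1.0, 2.0, 3.0]
analysis_field = [1.0, 4.0, 3.0]
print(euclidean_distance(forecast_field, analysis_field))
```

A smaller distance means the two fields are more alike, so the day with the smallest distance is the best analog candidate.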

Analog Method (AM).
The analog method is a simple statistical downscaling method based on the selection of similar atmospheric states. The performance of the AM depends on the degree of similarity. Wetterhall et al. [25] described the basic idea of the AM as finding a predictor in the historical record that has the same characteristics as a predictor at a given target time.
Let $PF_i$ be the predictors from the GCM and let $AF_i$ be the predictors from observation (analysis data), where $i = 1, \dots, N$ indexes the grid points. The Euclidean distance used by the analog method to find the analog day is then defined as

$$ED_{\mathrm{AM}} = \sqrt{\sum_{i=1}^{N} \left(PF_i - AF_i\right)^2}.$$

For the 15 May forecast, this gives 31 candidate analog days, one for each year, and the day with the minimum Euclidean distance is chosen. This yields one analog day for each daily forecast (rainfall forecast/time).
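The selection step of the analog method can be sketched as follows. This is a minimal illustration, assuming that each year contributes one analysis field for the forecast date and that fields are flat lists of grid-point values; the function names and the three-year sample data are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def find_analog_day(forecast_field, analysis_by_year):
    """Return (year, distance) of the analysis field closest to the forecast."""
    best_year, best_dist = None, math.inf
    for year, field in analysis_by_year.items():
        dist = euclidean_distance(forecast_field, field)
        if dist < best_dist:
            best_year, best_dist = year, dist
    return best_year, best_dist


# Hypothetical fields on a two-point grid for three historical years.
analysis_by_year = {
    1979: [2.0, 2.0],
    1980: [1.0, 1.5],
    1981: [5.0, 0.0],
}
year, dist = find_analog_day([1.0, 1.0], analysis_by_year)
print(year, dist)
```

With 31 years of analysis data the dictionary would hold 31 entries, and the rainfall observed on the selected day would serve as the forecast.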

Constructed Analog Method (CAM).
The constructed analog is a statistical downscaling technique inspired by analog weather forecasting [16]. The difference between the constructed analog and the analog method is that the constructed analog is built from a linear combination of 30 analog days. The similarity of two anomaly "maps" observed at two different times is measured as follows.

Root Mean Squared Difference (RMSD). This is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left[PF(i, t) - AF(i, t')\right]^2},$$

where $PF(i, t)$ is the forecast predictor on 15 May 2010 (forecast data), $AF(i, t')$ is a predictor from observation on 15 May in the years 1979 to 2009 (analysis data), and $i = 1, \dots, N$ (number of grid points) [26]. The CAM is applied to determine the analog days for each forecast day, and each analog day is determined from the corresponding 1 day of analysis data. Determining the coefficients of the linear combination is the main concept of the CAM.
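The RMSD differs from the Euclidean distance only by the factor $1/\sqrt{N}$, so on a fixed grid both measures rank candidate days in the same order. The short Python sketch below checks this on hypothetical values; the function names are illustrative and not from the paper.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rmsd(forecast_field, analysis_field):
    """Root mean squared difference over the grid points of two fields."""
    n = len(forecast_field)
    if n == 0 or n != len(analysis_field):
        raise ValueError("fields must be non-empty and of equal length")
    total = sum((f - a) ** 2 for f, a in zip(forecast_field, analysis_field))
    return math.sqrt(total / n)


# Hypothetical values at four grid points.
pf = [1.0, 2.0, 3.0, 4.0]
af = [1.0, 4.0, 3.0, 6.0]
print(rmsd(pf, af), euclidean_distance(pf, af) / math.sqrt(len(pf)))
```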

Linear Combination for the CAM.
Given an initial condition $IC(s, j_0, m)$, for example, the most recent state (monthly mean map), where $j_0$ is outside the range $j = 1, \dots, M$, a suitable monthly climatology is removed from the data; henceforth, $IC$ shall be the anomaly [27]. A constructed analog is defined as

$$CA(s, j_0, m) = \sum_{j=1}^{M} \alpha_j \, Z(s, j, m),$$

where $s$ is the spatial coordinate, $m$ is the month, $j_0$ is outside the range $j = 1, \dots, M$, $j$ is the year ($j = 1, \dots, M$), $Z(s, j, m)$ is the anomaly map of year $j$, and $\alpha_j$ are coefficients to be determined to minimize the difference between $CA(s, j_0, m)$ and $IC(s, j_0, m)$. The technical solution to this problem is discussed below in (5) and involves manipulating the alternative covariance matrix. An approximated solution to this problem is given by Van Den Dool [26]. In this study, the CAM coefficient used for the rainfall forecast is 0.1 [28].
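One common way to obtain coefficients for a constructed analog is ordinary least squares, optionally with a small ridge term on the diagonal. The sketch below is a minimal pure-Python illustration under that assumption; it is not the paper's implementation (which fixes the CAM coefficient at 0.1), and the function names, the `ridge` parameter, and the sample maps are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def constructed_analog_coefficients(ic, past_maps, ridge=0.0):
    """Least-squares alpha_j such that sum_j alpha_j * past_maps[j] ~ ic."""
    k = len(past_maps)
    # q[i][j] is the inner product over space of past maps i and j.
    q = [[sum(a * b for a, b in zip(past_maps[i], past_maps[j]))
          for j in range(k)] for i in range(k)]
    for i in range(k):
        q[i][i] += ridge  # assumption: optional ridge term for stability
    rhs = [sum(a * b for a, b in zip(past_maps[i], ic)) for i in range(k)]
    return solve_linear_system(q, rhs)


# Hypothetical anomaly maps on a three-point grid.
past_maps = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ic = [2.0, 3.0, 5.0]  # equals 2 * past_maps[0] + 3 * past_maps[1]
print(constructed_analog_coefficients(ic, past_maps))
```

When the number of past maps approaches the number of grid points, the inner-product matrix becomes ill conditioned, which is one reason a ridge term is often added in practice.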

Modified Constructed Analog Method (MCAM).
The Modified Constructed Analog Method is developed from the CAM [27] and the AM [25] as a statistical downscaling technique inspired by analog weather forecasting. The MCAM is developed in two steps: determining the appropriate measure in (7) and determining the appropriate method for finding the coefficients of the linear combination in (11).

Appropriate Measure of Modified Constructed Analog Method.
The Euclidean distance for the Modified Constructed Analog Method is defined as follows:

$$ED_{\mathrm{MCAM}} = \sqrt{\sum_{i=1}^{N} \left(PF_i - AF_i\right)^2},$$

where $PF$ is the forecast predictor during 15 May to 15 October 2010 (forecast data) and $AF$ is a predictor from observation during 15 May to 15 October in the years 1979 to 2009 (analysis data), determined from the corresponding 5 days. MW is the weight vector of nonnegative real numbers; its entries are the coefficients to be determined so as to minimize the difference between $PF$ and $AF$ over the grid points $i = 1, \dots, N$ (number of grid points). Taking the corresponding 5 days in each of the 31 years from 1979 to 2009 gives 155 candidate analog days for each forecast. The 30 analog days with the minimum Euclidean distance are then selected from 1979-2009 as the days most similar to the forecast day in 2010. The rainfall forecast at the monitoring stations can then be determined according to the principle of downscaling techniques. Determining the coefficients of the linear combination is the main concept of the MCAM. To summarize, the concept of the method is to find the best matching historical occurrences of a target pattern, on the assumption that the weather will evolve the same way it did before. The MCAM constructs a suitable analog day from a linear combination of the best 30 analog days. The next section presents the method for finding the coefficients of the linear combination so as to reduce the errors in the rainfall forecast.
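The candidate pool and the selection of the 30 closest days can be sketched in Python as follows. The sketch assumes that the "corresponding 5 days" form a window of day offsets -2 to +2 around the forecast date, which the text does not specify; the function names and the synthetic one-point fields are hypothetical.

```python
import heapq
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_candidates(analysis, years, offsets=(-2, -1, 0, 1, 2)):
    """One candidate per (year, day offset); the window shape is an assumption."""
    return [((year, off), analysis[(year, off)])
            for year in years for off in offsets]


def select_analog_days(forecast_field, candidates, k=30):
    """Return the k candidates with the smallest distance as (distance, label)."""
    scored = [(euclidean_distance(forecast_field, field), label)
              for label, field in candidates]
    return heapq.nsmallest(k, scored)


# Synthetic analysis data: 31 years x 5 days, one grid point per field.
years = range(1979, 2010)
analysis = {(y, o): [float((y - 1979) * 5 + (o + 2))]
            for y in years for o in (-2, -1, 0, 1, 2)}
candidates = build_candidates(analysis, years)
selected = select_analog_days([0.0], candidates, k=30)
print(len(candidates), len(selected))
```

The count of candidates follows directly from the design: 31 years times 5 days gives 155, of which 30 are kept.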

Finding the Weight of the Modified Constructed Analog Method.
The weight of the Modified Constructed Analog Method is based on the weighted sum method: the solution to the problem presented in (7) is MW, provided that every weight is positive. The updated value of the weight at each iteration is computed from MIMEU MCAM, the smallest Euclidean distances, which select 30 analog days from the total of 155 candidate analog days with predictor data; $k$ is the index of the analog day ($k = 1, \dots, 30$). The weighted sum is then taken over the weights $MW_k$ ($k = 1, \dots, 30$). Each weight is a nonnegative real number obtained from the actual data in each forecast calculation.
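The exact weight formula is not recoverable from the text here, so the following Python sketch uses a plainly different, commonly used stand-in: inverse-distance weights normalized to sum to one. It only illustrates how nonnegative weights can be derived from the 30 smallest Euclidean distances; the function name and the `eps` parameter are hypothetical, and this should not be read as the MCAM weight formula.

```python
def inverse_distance_weights(distances, eps=1e-9):
    """Nonnegative weights that sum to 1; closer analog days weigh more.

    Assumption: inverse-distance weighting is used here for illustration
    only and is not the weight update described in the paper.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    if any(d < 0 for d in distances):
        raise ValueError("distances must be nonnegative")
    inverse = [1.0 / (d + eps) for d in distances]  # eps avoids division by zero
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical distances for three analog days.
weights = inverse_distance_weights([1.0, 2.0, 4.0])
print(weights)
```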

Linear Combination Method for MCAM.
The linear combination method is defined as follows:

$$SRF_p = \sum_{k=1}^{30} MW_k \, RF_k,$$

where $SRF_p$ is the daily rainfall forecast value for predictor $p$ (G850, MSLP, Q850, and T850) (mm/h) (calibration), $RF_k$ is the observed rainfall at the stations on analog day $k$, and $MW_k$ is the weight ($k = 1, \dots, 30$).
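The linear combination itself is a weighted sum of the rainfall observed on the analog days. A minimal Python sketch, with hypothetical names and only three analog days instead of 30 for brevity:

```python
def combine_analog_rainfall(weights, analog_rainfall):
    """Weighted sum of rainfall observed on the analog days (mm/h)."""
    if len(weights) != len(analog_rainfall):
        raise ValueError("one weight is required per analog day")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    return sum(w * r for w, r in zip(weights, analog_rainfall))


# Hypothetical weights and observed rainfall on three analog days.
weights = [0.5, 0.3, 0.2]
analog_rainfall = [10.0, 0.0, 5.0]
print(combine_analog_rainfall(weights, analog_rainfall))
```

Because the weights are nonnegative, the forecast can never be negative, which is a useful property for a rainfall amount.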

The Value Rainfall Forecast in Each Predictor for MCAM.
By the linear combination for the Modified Constructed Analog Method, the updated predictor data used for the daily rainfall forecast are defined separately for each of the four predictors, where $MW$ is the weight, $SRF_{i,\mathrm{g850}}$ is the daily forecasted precipitation value for G850 (mm/h), $SRF_{i,\mathrm{mslp}}$ is the daily forecasted precipitation value for MSLP (mm/h), $SRF_{i,\mathrm{q850}}$ is the daily forecasted precipitation value for Q850 (mm/h), and $SRF_{i,\mathrm{t850}}$ is the daily forecasted precipitation value for T850 (mm/h) at the stations ($i = 1, \dots, 30$).

The Average of Rainfall Forecast for Four Predictors in the AM, the CAM, and the MCAM.
This is a simple and precise method for calculating and forecasting regional rainfall volume [5]. The updated value for the average of the AM, the CAM, and the MCAM at 0000, 0600, 1200, and 1800 UTC can be written as

$$\overline{SRF} = \frac{1}{4}\sum_{p=1}^{4} SRF_p,$$

where $SRF_p$ is the rainfall forecast (mm/h) for predictor $p$ (see (13)), taken over all four predictors (four times). A consistent spatial pattern of rainfall at the stations is required.
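Averaging the forecasts of the four predictors can be sketched as below. The dictionary keys follow the predictor names used in the text; the function name and the rainfall values are hypothetical.

```python
PREDICTORS = ("G850", "MSLP", "Q850", "T850")


def average_over_predictors(forecast_by_predictor):
    """Mean rainfall forecast (mm/h) over the four predictors."""
    missing = [p for p in PREDICTORS if p not in forecast_by_predictor]
    if missing:
        raise ValueError(f"missing predictors: {missing}")
    return sum(forecast_by_predictor[p] for p in PREDICTORS) / len(PREDICTORS)


# Hypothetical forecasts for one station and one forecast time.
example = {"G850": 4.0, "MSLP": 6.0, "Q850": 5.0, "T850": 5.0}
print(average_over_predictors(example))
```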

Performance Criteria for Rainfall Forecast.
To evaluate the performance of each of the three methods, three indexes of prediction error are calculated: the correlation coefficient ($R$, reported together with $R^2$), the root mean square error (RMSE), and the mean absolute percentage error (MAPE) [26].

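Correlation Coefficient.
The correlation coefficient is defined as

$$R = \frac{\sum_{i=1}^{n} \left(OBS_i - \overline{OBS}\right)\left(RF_i - \overline{RF}\right)}{\sqrt{\sum_{i=1}^{n} \left(OBS_i - \overline{OBS}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(RF_i - \overline{RF}\right)^2}},$$

where $OBS_i$ is the value observed at the station in Thailand (actual value), $RF_i$ is the rainfall forecast of the predictors (forecast value), $\overline{OBS}$ is the mean value of OBS (observed rainfall), and $\overline{RF}$ is the mean value of RF.
The correlation coefficient $R$ lies in the range between −1 and +1. The sign shows the direction of the correlation. When $R$ is close to −1 or +1, this indicates a high level of correlation. When $R$ is 0 or close to 0, this indicates little or no correlation. The levels of correlation are shown in Table 3.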
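A correlation coefficient with the sign and range described above can be computed with the standard Pearson formula. The following Python sketch assumes that form; the function name and sample series are hypothetical.

```python
import math


def pearson_r(obs, fcst):
    """Pearson correlation between observed and forecast series."""
    n = len(obs)
    if n != len(fcst) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_obs = sum(obs) / n
    mean_fcst = sum(fcst) / n
    cov = sum((o - mean_obs) * (f - mean_fcst) for o, f in zip(obs, fcst))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_fcst = sum((f - mean_fcst) ** 2 for f in fcst)
    if ss_obs == 0 or ss_fcst == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_obs * ss_fcst)


# Hypothetical observed and forecast rainfall at four stations.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))
```

The first pair rises together and the second moves in opposite directions, which illustrates how the sign carries the direction of the relationship.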

Root Mean Square Error (RMSE).
The RMSE is frequently used to indicate the sample standard deviation of the differences between the forecast and the observation, defined as follows [1]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(RF_i - OBS_i\right)^2}.$$

[Figure: flowchart of the forecasting procedure. Step 2: select domain coverage 4°N to 22°N and 95°E to 110°E. Step 3: downscale the grid size of CFSv2 from 1° long. × 1° lat. to 0.5° long. × 0.5° lat. Step 4: determine initial forecast and date and time of the forecast. Step 5: compute the measures AM, CAM, and MCAM. Step 6: find analog day.]

The Mean Absolute Percentage Error (MAPE).
The mean absolute percentage error is a measure used in statistics, specifically in trend estimation, of the accuracy of a forecasting method; here it is applied to the rainfall forecast of the predictors at the stations. It usually expresses accuracy as a percentage and is defined by the following equation:

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{OBS_i - RF_i}{OBS_i}\right|,$$

where $OBS_i$ is the value of observed rainfall at the stations in Thailand (actual value) and $RF_i$ is the rainfall forecast of the predictors (forecast value). The closer the correlation coefficient is to 1, the more accurate the forecast. Simulations are considered satisfactory when the MAPE is below 10% and excellent when the MAPE is less than 5% [8]. A percentage error of 0 indicates that the forecasted rainfall and the actual observed rainfall are identical. The process for the AM, the CAM, and the MCAM in this research is described in Figure 3.
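The two error measures can be sketched in Python as follows, using their standard definitions. The function names and sample values are hypothetical; the MAPE function raises an error when an observed value is zero, which is one possible way (an assumption, not taken from the paper) to handle dry observations.

```python
import math


def rmse(obs, fcst):
    """Root mean square error between observed and forecast values."""
    if len(obs) != len(fcst) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fcst)) / len(obs))


def mape(obs, fcst):
    """Mean absolute percentage error in percent."""
    if len(obs) != len(fcst) or not obs:
        raise ValueError("series must be non-empty and of equal length")
    if any(o == 0 for o in obs):
        # Assumption: zero observations are rejected rather than skipped.
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((o - f) / o) for o, f in zip(obs, fcst)) / len(obs)


# Hypothetical observed and forecast rainfall totals (mm).
obs = [10.0, 20.0, 40.0]
fcst = [12.0, 18.0, 40.0]
print(rmse(obs, fcst), mape(obs, fcst))
```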
The steps for the simulation of AM, CAM, and MCAM are shown in Figure 4.
[Figure: station codes on the axis: TAK3, TAK2, TAK1, LBR, SAK, PHK2, PHK1, SRT, SNK, SPB, SUR, KHK, KCN, PBR, NSW, UBT, NAN, RET, TRG, LPG, PCB, NST, CHM, NKH, CHR, LEI, UTD, PHR, NPN]
Step 2. Select the domain coverage from 4°N to 22°N and 95°E to 110°E in Thailand. Examples of this data are shown in Figure 6.
Step 4. Determine the initial forecast, date, and time in Table 4.
Step 6. Find the analog day.
Step 7. Forecast daily rainfall value based on the analog day.

Figure 15: (a) shows the bar graphs for rainfall in millimeters as follows: the average forecasted rainfall by using the AM (blue), the average forecasted rainfall by using the CAM (green), the average forecasted rainfall by using the MCAM (orange), and the actual observed rainfall (red). (b) shows the bar graphs of percentage errors (%) for each of the methods as follows: AM in green, CAM in yellow, and MCAM in red.

Results and Discussion
In this research, the results of forecasting rainfall from 15 May to 15 October 2010 are presented (Figure 7) [29,30]. The data demonstrated that the amount of precipitation increased because of a low pressure trough and the southwestern monsoon that covers Thailand during the rainy season. The low pressure trough passing across the country causes precipitation from the beginning of the rainy season, in May through July. In July, the low pressure trough shifts south again and causes continuous heavy rain until the northeastern monsoon arrives to cover Thailand. When the northeastern monsoon replaces the southwestern monsoon around mid-October, northern Thailand starts to have cold weather and decreased rainfall. However, the south still continues to experience heavy rain. This information is in accordance with the rainfall observed at the meteorological stations.
The following intervals are used to classify the amount of daily precipitation: very dry at <0.1 mm, normal at 10.1-35 mm, and very wet at >90.1 mm [27]. The performance of the forecast predictor during 15 May to 15 October 2010 (forecast data) in Thailand is shown in Figure 8. To determine the performance of each predictor, the RMSE and MAPE can be checked [26]. The correlations between the observed and forecasted rainfall are displayed in Tables 5-7. From Table 5, by using the two methods, the four predictors G850, MSLP, Q850, and T850 had a positive correlation overall. The values vary according to the amount of rainfall: when the amount of actual observed rainfall increases, the amount of predicted rainfall also increases accordingly. However, all three methods gave correlations that are statistically acceptable (Table 6 and Figure 9). Tables 5 and 6 can be summarized by the average observed and simulated rainfall for all four predictors. The average observed rainfall at the 30 stations is 1119.53 mm. The AM gave an average rainfall similar to the observed rainfall with Q850, at 1091.04 mm, with a percentage error of 2.54%. The CAM gave an average rainfall differing from the observed one, with a high percentage error. The MCAM gave an average rainfall similar to the observed one with T850, at 1133.12 mm, with a percentage error of 1.2%.
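The percentage errors quoted above can be checked directly from the stated averages. The short Python sketch below uses only the numbers given in the text; the function name is hypothetical.

```python
def percentage_error(forecast, observed):
    """Absolute percentage error of a forecast relative to the observation."""
    return abs(forecast - observed) / observed * 100.0


observed_mean = 1119.53  # average observed rainfall at the 30 stations (mm)
am_q850 = 1091.04        # AM average rainfall with Q850 (mm)
mcam_t850 = 1133.12      # MCAM average rainfall with T850 (mm)

print(round(percentage_error(am_q850, observed_mean), 2))
print(round(percentage_error(mcam_t850, observed_mean), 2))
```

Both results agree with the reported 2.54% and 1.2% after rounding.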
The results pointed out that MCAM gave a result most similar to the optimized forecast with the least amount of percentage error out of the three methods. This research displays data for the observed rainfall and simulated rainfall using the four predictors which are divided into five regions in Thailand. The data is identified in histogram graphs (Figures 10-12).
These figures show that the forecast percentage errors of the three methods differ, but the MCAM gave the rainfall forecast most similar to the observed rainfall at NKH, SUR, SAK, PCB, SPB, SRT, NST, and PHK1, which is satisfactory. The performances of the rainfall forecast between observed and simulated rainfall for all predictors are shown and summarized in Table 7 (Figures 13-15).
Another point of interest is the correlation between the observed rainfall and the average rainfall from all predictors in Table 7 (Figures 10-12): the highest correlation is 0.82 ($R^2$ = 0.67) using the AM, and the lowest is 0.79 ($R^2$ = 0.61) using the CAM. The AM gave a higher correlation than the CAM and the MCAM, but the MCAM gave the minimum percentage error (13.66%). The experimental results are summarized in Table 7 and are compared in Figure 13. This shows another way in which statistical downscaling can be applied to rainfall forecasting in Thailand, by using the MCAM.

Conclusion
This paper introduces another method for the development of rainfall forecasting in Thailand. The MCAM is used for statistical downscaling with four predictors (T850, G850, Q850, and MSLP), and the resulting precipitation is compared with observations at the stations. Hence, the present downscaling approach is suitable for the simulation of rainfall under changed climate from GCMs [10]. The MCAM is applied to rainfall forecasting in five regions at 30 stations in Thailand and is compared with the AM and the CAM. It can reduce forecast errors, the need for intensive computational resources, and the management of large data while simplifying the output data. The MCAM linearly combines past anomaly patterns such that the combination is as close to the desired initial state as possible. The correlation and percentage error were determined from the rainfall forecasts of the three methods. The rainfall forecast during 15 May to 15 October 2010 in five regions by using the MCAM gave results similar to the observations at stations NKH, SUR, SAK, PCB, SPB, SRT, NST, and PHK1, which is satisfactory. The AM gave a higher correlation than the CAM and the MCAM. However, the MCAM gave the minimum percentage error (13.66%), which shows that its rainfall forecast is closest to the actual observed value. Therefore, the MCAM is an alternative approach to forecasting daily precipitation.