Modelling Agro-Met Station Observations Using Genetic Algorithm

The present work discusses the development of a nonlinear data-fitting technique based on a genetic algorithm (GA) for the prediction of routine weather parameters using observations from Agro-Met Stations (AMS). The algorithm produces the equations that best describe the temporal evolution of daily minimum and maximum near-surface (at 2.5-meter height) air temperature and relative humidity and of daily averaged wind speed (at 10-meter height) at selected AMS locations. These equations enable forecasts of these weather parameters, which could be used in crop forecast models. The forecast equations developed in the present study use only the past observations of the above-mentioned parameters. This approach, unlike other prediction methods, provides an explicit analytical forecast equation for each parameter. The predictions up to 3 days in advance have been validated using independent datasets, unknown to the training algorithm, with good agreement between forecasts and observations. The power of the algorithm has also been demonstrated by its superiority over the persistence forecast used as a benchmark.


Introduction
In the recent past, the nonlinear data-adaptive approach of genetic algorithm [1,2] for the forecast of a particular meteorological parameter has gained considerable ground. This is particularly advantageous when a location-specific forecast of a specific parameter is required and when a long time series of observations (either in situ or remotely sensed) of the relevant parameter exists. The algorithm is computationally extremely cheap and produces good quality forecasts. An additional advantage of genetic algorithm (GA) is that an explicit analytical forecast equation is obtained as an outcome of the algorithm [3,4]. Also, the algorithm can work with only a small number of data points, typically of the order of a few hundred. The predictive skill of GA has been amply demonstrated in the cases of sea surface temperature (SST) in the Alboran Sea [5] and Arabian Sea [6], summer monsoon rainfall over India [7], SST and sea level anomaly in the Ligurian Sea [8], wave heights in the north Indian Ocean [3], and, more recently, basin-scale predictions of ocean surface wind in the north Indian Ocean [9] and wave heights in the Bay of Bengal [10]. Even tidal currents from a few tidal levels have been predicted using GA [11]. At the outset it should be clarified that we are not attempting to provide an alternative method of weather prediction, which can be done only by a numerical weather prediction (NWP) model, since only such a model can provide multilevel forecasts of a large number of meteorological parameters over a wide area (encompassing sometimes even the entire globe) at regular time intervals. Rather, GA should be viewed as a useful method of forecast for those who need the forecast of a few meteorological parameters at just a few locations.
The focus of our study is the prediction of daily maximum and minimum near-surface air temperature and daily maximum and minimum near-surface humidity (at 2.5 m height) and daily averaged wind speed at 10 m height at isolated locations which are representative of very small areas (of the order of 500 m × 500 m). Such forecasts are integral components of several crop forecast models [12]. Forecasts of these parameters from NWP models represent averages over a large area (typically of the order of 100 km × 100 km), which does not serve the purpose of crop forecast models. GA is, of course, not the only available technique for predicting a time series. There are other prediction techniques which, like GA, operate directly on a data time series, for example, techniques of polynomial fitting, nearest neighbors, and artificial neural networks [13]. Unlike GA, however, these techniques use an a priori fixed architecture. The first two methods are suitable primarily for systems obeying relatively simple nonchaotic dynamics [13]. Artificial neural networks (ANNs) learn the forecast rules directly from the data using input, hidden, and output layers. Although ANNs are suitable for chaotic dynamics, their disadvantage is that the network architecture (the number of hidden layers, the number of neurons in these layers, etc.) is fixed a priori, often quite heuristically [13]. Thus the degree of freedom in this technique for searching the forecast rule is limited. GA, on the other hand, finds the architecture itself from the data using Darwinian evolution (to be explained later). The most important feature of GA is that it allows an effective search in a high-dimensional phase space. This flexibility is the reason for its success for systems with complex chaotic dynamics like natural geophysical phenomena. Another noteworthy advantage, as already mentioned, is that, unlike ANNs, it provides explicit analytical forecast equations that are easy to use in practice.
As mentioned earlier, the focus is on the prediction of four maximum/minimum quantities and one average quantity on a daily basis. For this purpose, half-hourly observations collected by five micrometeorological towers (also known as Agro-Met Stations, or AMS in brief) have been used. From these half-hourly observations, the quantities to be predicted have been computed, and the time series of these quantities have subsequently been forecast by GA, the details of which are described later in this paper.
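The reduction from half-hourly observations to daily quantities can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name `daily_summary` is hypothetical, the input is assumed to be a gap-free list with 48 values per day, and the example series is synthetic rather than AMS data.

```python
from statistics import mean

def daily_summary(values, per_day=48):
    """Split a half-hourly series into whole days and return three lists:
    daily minima, daily maxima, and daily means.
    Assumes no missing values; a trailing partial day is dropped."""
    n_days = len(values) // per_day
    mins, maxs, means = [], [], []
    for d in range(n_days):
        day = values[d * per_day:(d + 1) * per_day]
        mins.append(min(day))
        maxs.append(max(day))
        means.append(mean(day))
    return mins, maxs, means

# Synthetic example: two days of half-hourly values (not AMS data).
series = list(range(48)) + list(range(100, 148))
mins, maxs, means = daily_summary(series)
```

Real station records usually contain gaps, so a practical version would need a rule for days with missing observations; that rule is not specified in the text.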

Agro-Met Station (AMS) Data
The AMS are programmed to record air temperature, relative humidity, and winds at half-hourly intervals at three vertical levels over different short vegetation cover types [14], apart from radiative and soil heat fluxes across different agroclimatic zones in India. The AMS data are available from the website (http://www.mosdac.gov.in/) maintained by Space Applications Centre, ISRO, India. These AMS were deployed during the year 2009. Good quality data from the next two years have been used in this study. Data from five AMS were used in the study (Table 1). For each AMS, the last three months of data were used for validating the algorithm, while the remaining data were used for training the algorithm. Daily maximum and minimum values of air temperature and relative humidity were selected from half-hourly observations of these quantities, while daily average wind speed is simply the mean of all the 48 observations of wind speed at 10-meter height in a day.

Genetic Algorithm (GA)
The algorithm has been discussed in detail by several authors [1,2,9,15]. However, for the benefit of the present readers, we describe below the essential details of the algorithm, borrowing heavily from the reference [15]. As mentioned above, the problem is to find a suitable prediction function $P$ such that the parameter concerned (say, $x$) at a particular time $t$ can be predicted from the past observations of the same parameter. In other words,
$$x(t) = P\bigl(x(t-\tau),\, x(t-2\tau),\, \ldots,\, x(t-m\tau)\bigr), \tag{1}$$
where $m$ denotes the embedding dimension and $\tau$ denotes the time interval between successive observations. That such a function exists for a deterministic, albeit chaotic, time series is guaranteed by Takens' theorem [16]. Unfortunately, the theorem is silent about the method to find the function. GA is one such method, which takes its cue from evolutionary biology. The starting point of the algorithm is an initial population of "equation strings", which are simply short equations connecting the past values of the parameter concerned. The connections are quite random, meaning two values of the parameter at arbitrary past times are connected randomly by an arithmetic operator (addition, subtraction, multiplication, or division). The coefficients of these connections are real numbers, taken randomly from an interval, generally [−10, +10]. Every equation string is an approximation of the prediction function. The equation strings are ranked according to their strength, a measure of which is a well-defined mathematical expression. Before coming to this quantity, we note that (1) is valid only for one step ahead prediction. However, longer term predictability is also addressable by GA. This is simply the following obvious generalization:
$$x(t) = P\bigl(x(t-s\tau),\, x(t-(s+1)\tau),\, \ldots,\, x(t-(s+m-1)\tau)\bigr),$$
valid for $s$ steps ahead prediction, $s = 1$ being the special case of one step ahead prediction defined by (1). Henceforth, we divide the time series into two parts, namely, the training set and the validation set. As the names signify, the algorithm is trained using data in the training set, while the data in the validation set are withheld from the algorithm. Once the training is complete by virtue of satisfying a specific training criterion, an analytical forecast equation is obtained (which is simply the strongest equation string, or the fittest equation, following Darwinian terminology). This forecast equation is applied to the validation set, unknown to the algorithm, to see how the forecast performs.
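The construction of lagged inputs for a forecast several steps ahead, together with the split into training and validation sets, can be sketched as below. This is an illustrative sketch under stated assumptions: the names `make_samples` and `split_series` are hypothetical, the series is a plain Python list sampled at a fixed interval (so lags are counted in steps), and the numbers are synthetic.

```python
def make_samples(x, m, s=1):
    """Build (inputs, targets) for an s-steps-ahead forecast with m lags.
    For each usable index i: inputs = [x[i-s], x[i-s-1], ..., x[i-s-m+1]]
    and target = x[i]."""
    first = s + m - 1  # first 0-based index for which all m lags exist
    inputs, targets = [], []
    for i in range(first, len(x)):
        inputs.append([x[i - s - j] for j in range(m)])
        targets.append(x[i])
    return inputs, targets

def split_series(x, n_validation):
    """Hold back the last n_validation points (must be > 0) for validation."""
    return x[:-n_validation], x[-n_validation:]

# Synthetic series 0, 1, ..., 9 (not AMS data).
x = [float(v) for v in range(10)]
train, valid = split_series(x, 3)
inputs, targets = make_samples(train, m=3, s=2)
```

With `m=3` and `s=2`, the first target is `train[4]`, predicted from `train[2]`, `train[1]`, and `train[0]`; the validation points never enter the training samples.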
Validation on data withheld from training is a stringent test of the algorithm. We now come to the actual training part. The $j$th equation string with mapping function $P_j$ is used to compute estimates of all $x(t)$ in the training set, as a function of the previous values of the time series. The fitness for the equation string can then be computed as
$$\Delta_j^2 = \sum_{i=i_0}^{N} \Bigl[x_i - P_j\bigl(x_{i-s},\, x_{i-(s+1)},\, \ldots,\, x_{i-(s+m-1)}\bigr)\Bigr]^2,$$
where $x_i = x(i\tau)$, $i_0 = s + m$ is the first index for which all the required past values are available for an $s$-steps-ahead forecast with embedding dimension $m$, and $N$ is the total length of the training set. One can further define a strength index for each individual in the following fashion:
$$R_j^2 = 1 - \frac{\Delta_j^2}{\sum_{i=i_0}^{N} \bigl(x_i - \langle x \rangle\bigr)^2},$$
where $\langle x \rangle$ represents the mean value of the training data. $R_j^2$ can be interpreted as the fraction of the training set's total variance explained by the $j$th equation string. It is quite clear that the closer the value of $R_j^2$ is to unity, the stronger is the $j$th individual equation string. The major steps involved in the algorithm are (1) initialization, (2) computing the fitness, (3) ranking the agents, (4) choice of mates, (5) reproduction and crossover, and (6) mutation. These steps have been elaborately described in [15]. We choose not to dwell any further on them, since the manner in which the final prediction equation is arrived at is not vitally important for interpretation of the results of this particular study. Interested readers are requested to consult the mentioned reference for more details. The mentioned steps are run and rerun for a certain number of generations or until some stopping criterion is satisfied, for example, when the strength index no longer increases. Finally, the top-ranked equation is selected and is broken down into a concise formula. The only thing which has escaped our attention so far is the embedding dimension $m$. This owes its origin to the theory of deterministic chaos. When the system under investigation obeys deterministic chaotic dynamics, there is a strange attractor (a geometric object which attracts the state-space trajectories after the transients die out).
In principle, the fractional dimension of the attractor and consequently the embedding dimension (which is necessarily an integer, being the smallest dimension of the space in which the attractor can be faithfully embedded) can be accurately estimated. However, a very large amount of data, of the order of a few tens of thousands, is needed for this estimation. Genetic algorithm, on the contrary, works with a small amount of data, of the order of a few hundred. Fortunately, it has been established in past studies (and also during the course of this work) that the embedding dimension can be found simply by trial and error, starting from a very small dimension like 3. One has to gradually increase this dimension and calculate the strength of the final prediction equation. At a particular value of $m$, this strength is the maximum, and we opt for that value as the final choice for $m$.
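The strength index and the trial-and-error search for the embedding dimension can be sketched as follows. This is a hedged illustration only: all function names are hypothetical, the mean is taken over the targets actually used, and `train_oldest_lag` is a deliberately trivial stand-in for GA training (it ignores the data and always predicts the oldest lag), chosen so the example stays short and self-contained.

```python
def lagged(x, m):
    """One-step-ahead samples: ([x[i-1], ..., x[i-m]], x[i])."""
    return [([x[i - j] for j in range(1, m + 1)], x[i])
            for i in range(m, len(x))]

def fitness(predict, samples):
    """Sum of squared differences between observed and estimated values."""
    return sum((t - predict(u)) ** 2 for u, t in samples)

def strength(predict, samples):
    """1 - fitness / total squared deviation of the targets from their mean."""
    targets = [t for _, t in samples]
    mean_t = sum(targets) / len(targets)
    total = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - fitness(predict, samples) / total

def choose_dimension(x, candidates, train_model):
    """Train once per candidate m; keep the m with the largest strength."""
    scores = {}
    for m in candidates:
        samples = lagged(x, m)
        scores[m] = strength(train_model(samples), samples)
    best = max(scores, key=scores.get)
    return best, scores

def train_oldest_lag(samples):
    """Stand-in for GA training (an assumption, not the paper's method)."""
    return lambda u: u[-1]

# Synthetic series with period 4 (not AMS data).
x = [0.0, 1.0, 0.0, -1.0] * 10
best_m, scores = choose_dimension(x, range(2, 7), train_oldest_lag)
```

For this periodic series the search settles on `m = 4`, where the stand-in predictor reproduces the series exactly. With a real GA, `train_model` would be replaced by a full training run for each candidate dimension.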
Training of GA.
As explained in the previous section, GA works with equation strings, and the algorithm has to be told beforehand how many such strings are to be taken. Similarly, the total number of arguments and the arithmetic operators connecting these arguments have to be defined beforehand. These are called the parameters of the algorithm. Another important parameter, namely, the embedding dimension, was defined earlier. This basically tells the algorithm how many past values are to be used while training the algorithm. The last important parameter is the mutation rate [15]. In our case, the following values were assigned to these parameters. The number of equation strings was 60. The total number of arguments and operators used was 20 in each case. The embedding dimension varied from 2 to as high as 25. The mutation rate was chosen to be 0.01. The number of iterations required to achieve maximum strength index also varied from case to case and was of the order of a few thousand. The analytical forecast equations (a representative few are provided in the appendix) were later applied to the validation dataset (separately for each parameter concerned) and the forecasts were evaluated statistically.
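To make the role of these parameters concrete, a heavily simplified GA is sketched below. It is not the implementation used in this study or in [15]. As assumptions for illustration, each equation string has only two arguments and one operator (instead of up to 20), mates are taken from the fitter half of the population, crossover is one-point, and mutation replaces a gene with a freshly drawn random one. Only the population size (60), the coefficient interval [−10, +10], and the mutation rate (0.01) follow the text; all names are hypothetical.

```python
import math
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 0.0,  # guarded division
}

def random_string(m, rng):
    """One equation string: [c1, lag1, op, c2, lag2]."""
    return [rng.uniform(-10, 10), rng.randrange(m), rng.choice(list(OPS)),
            rng.uniform(-10, 10), rng.randrange(m)]

def predict(eq, u):
    """Evaluate (c1 * u[lag1]) op (c2 * u[lag2]) on the lag vector u."""
    c1, l1, op, c2, l2 = eq
    return OPS[op](c1 * u[l1], c2 * u[l2])

def sq_error(eq, samples):
    """Sum of squared errors of one equation string (lower is fitter)."""
    return sum((t - predict(eq, u)) ** 2 for u, t in samples)

def evolve(samples, m, n_strings=60, generations=100,
           mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    pop = [random_string(m, rng) for _ in range(n_strings)]  # initialization
    for _ in range(generations):
        pop.sort(key=lambda eq: sq_error(eq, samples))  # fitness and ranking
        parents = pop[: n_strings // 2]                  # choice of mates
        children = []
        while len(parents) + len(children) < n_strings:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 5)                    # one-point crossover
            child = a[:cut] + b[cut:]
            fresh = random_string(m, rng)
            child = [f if rng.random() < mutation_rate else g  # mutation
                     for g, f in zip(child, fresh)]
            children.append(child)
        pop = parents + children  # parents survive, so the best never worsens
    return min(pop, key=lambda eq: sq_error(eq, samples))

# Synthetic training data (not AMS data): one-step-ahead samples of a sine.
x = [math.sin(0.3 * i) for i in range(120)]
m = 3
samples = [([x[i - j] for j in range(1, m + 1)], x[i])
           for i in range(m, len(x))]
best = evolve(samples, m)
```

Because the fitter half is carried over unchanged, the best error cannot increase from one generation to the next; a stopping rule based on a stagnating strength index could replace the fixed generation count.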

Statistical Evaluation of the Accuracy of Prediction.
We considered mean differences (BIAS) and root mean square differences (RMSD) as the standard statistical yardsticks to evaluate the quality of our forecast. The GA predicted daily minimum and maximum values of near-surface (at 2.5 m height) temperature and relative humidity and the daily averaged surface (10 m) wind speed were compared with the observed AMS parameters for computing the error statistics. Comparisons were done for all the locations for the independent validation dataset (Table 1). The forecast of any geophysical parameter must be compared with the persistence forecast defined below, which is a very simple and elementary method of forecast, operating on the principle that geophysical phenomena are generally persistent in nature and their change is generally not a rapid one. Any forecast is simply meaningless unless it is able to improve on the persistence forecast, defined by the equation
$$x_{\mathrm{PER}}(t + s\tau) = x(t),$$
where $s$ denotes the lead time of the forecast in steps of the sampling interval $\tau$. In other words, persistence assumes that the present conditions are not going to change at all for $s$ forecast steps. The degree of improvement is normally indicated by an improvement parameter computed from "PER", the persistence forecast; "GA", the forecast produced using genetic algorithm; "AMS", the AMS-observed value; and the total number of forecasts. A positive (negative) value of this parameter signifies an improvement (degradation) of the GA forecast relative to persistence.
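The error statistics and the persistence benchmark can be sketched as follows. BIAS is taken here as forecast minus observation, which is an assumed sign convention. The exact form of the improvement parameter is not recoverable from the text, so `improvement_percent` uses one plausible form, the percentage reduction of RMSD relative to persistence; this is an assumption and not necessarily the expression used in the study. All names and numbers are illustrative.

```python
import math

def bias(forecast, observed):
    """Mean difference, forecast minus observation (assumed sign convention)."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def rmsd(forecast, observed):
    """Root mean square difference between forecast and observation."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)

def persistence(observed, lead):
    """Persistence forecast: the value 'lead' steps earlier is carried forward.
    Returns (forecasts, matching observations)."""
    return observed[:-lead], observed[lead:]

def improvement_percent(ga, per, observed):
    """ASSUMED form: percentage reduction of RMSD relative to persistence.
    Positive values mean the GA forecast beats persistence."""
    r_per = rmsd(per, observed)
    return 100.0 * (r_per - rmsd(ga, observed)) / r_per

# Made-up daily values, for illustration only (not AMS data).
obs = [20.0, 22.0, 21.0, 23.0, 24.0]
per, target = persistence(obs, lead=1)
ga = [21.0, 21.0, 22.0, 24.0]  # hypothetical GA forecasts for 'target'
ga_bias = bias(ga, target)
gain = improvement_percent(ga, per, target)
```

In this toy case the hypothetical GA forecast has a negative BIAS and a smaller RMSD than persistence, so the assumed improvement measure is positive.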

Daily Maximum ($T_{\max}$) and Minimum ($T_{\min}$) Air Temperatures.
As regards the minimum temperature ($T_{\min}$), the forecasts for all the three lead times agree closely with the observations, the correlation coefficient between predicted and observed $T_{\min}$ being always more than 0.93. Compared to the observations, GA shows negative BIAS for all the forecast lead times. Further, it is observed that the absolute value of BIAS (Figure 1) increases as the forecast lead time increases. The RMSD displays an increasing trend, while the correlation coefficient displays a decreasing trend, in conformity with our expectations. Because of the chaotic behaviour of weather parameters, this degradation is generic in nature and has also been observed in previous studies of similar nature. Similar to $T_{\min}$, the maximum temperature ($T_{\max}$) forecast is also degraded as the forecast horizon is increased from 24 hr to 72 hr, as seen from Figure 1. Correlation coefficients greater than 0.94 are observed for all the cases of forecast, which demonstrates the power of the GA forecast. Slightly less BIAS and RMSD are observed for the $T_{\max}$ forecast as compared to the $T_{\min}$ forecast. The same degradation in forecast quality with increasing length of the forecast horizon is noticed here also. The positive value of the improvement parameter for both maximum and minimum temperatures (Table 2) demonstrates the superiority of GA over persistence. Although in Table 2 we show the improvement parameter by clubbing the results of all the stations, it has been checked that the parameter is positive for each station. It can also be noted that the improvements for $T_{\min}$ are better than those for $T_{\max}$.

Daily Maximum ($\mathrm{RH}_{\max}$) and Minimum ($\mathrm{RH}_{\min}$) Relative Humidity.
The comparison of 24 hr, 48 hr, and 72 hr forecasts of daily minimum ($\mathrm{RH}_{\min}$) and maximum ($\mathrm{RH}_{\max}$) relative humidity with the corresponding AMS-observed humidities in the validation set is shown in Figure 2. Similar to the temperature forecast, a systematic degradation in the quality of forecast is observed as the length of the forecast horizon increases. RMSD values of 8.6, 10.2, and 10.5% are noted in the 24, 48, and 72 hr $\mathrm{RH}_{\min}$ forecasts, respectively. The BIAS of $\mathrm{RH}_{\min}$ increases and the correlation coefficient decreases as the forecast length is increased from 24 hr to 72 hr. For the case of maximum humidity, however, the correlation and the RMSD of the 72 hr forecast are marginally better than the corresponding quantities for the 48 hr forecast, but the BIAS is much higher. Similar to the temperature forecast, more percentage improvement is observed in the $\mathrm{RH}_{\min}$ forecast compared to the $\mathrm{RH}_{\max}$ forecast (Table 2). The wind speed forecast using GA shows the maximum percentage improvement (more than 12%) among all the five predicted parameters (Table 2).

Conclusion
A nonlinear data-fitting technique based on genetic algorithm has been applied for predicting daily maximum and minimum near-surface air temperatures, daily maximum and minimum near-surface relative humidities, and daily average wind speed at 10 m height at selected Agro-Met Stations. The forecast lead time varies from one to three days.
The unique beauty of the method is that it is not computer-intensive, as huge data inputs are not required for carrying out the prediction. Nor does it require any auxiliary dataset (apart from the basic data consisting of the measurements of the parameter concerned at regular time intervals), and it still manages to produce an explicit analytical forecast equation.
Thus it is ideal for those scientists, for example, scientists involved in crop forecast modeling, who only require forecasts of certain Agro-Met parameters at isolated locations, representative of a small area. The shortcoming is of course that it is not easy to generalize this algorithm for predicting weather parameters over the entire globe, or over large three-dimensional regions, although there are promising two-dimensional generalizations [5,6,8-10,15]. The method is highly suitable for location-specific forecasts as demonstrated in the present study. The AMS forecast locations have been chosen carefully so as to reflect the diverse climates of the Indian landmass. Validation of the forecast was carried out with independent validation datasets, unknown to the training algorithm. It was found that the GA forecasts are of reasonably good quality. They are also improvements over the corresponding persistence forecasts in each of the considered cases, as seen from the positive values of the improvement parameter. The improvement grows as the length of the forecast horizon increases (barring isolated cases), as persistence begins to lose its predictive skill. We can conclude that the method advocated in the present paper is a powerful technique for carrying out location-specific forecasts of Agro-Met parameters. It should be noted that the study period consists of only two years. If we divide the dataset into seasons, there will be an insufficient number of data points for a reliable GA forecast. More data points are required for studying the seasonal variation of forecasts using the GA method. The GA forecast also depends on the geographical location of the AMS. Again, more data points are required to reach any conclusion.
Presently, India has 1100 INSAT-linked Automatic Weather Stations (AWS) maintained by ISRO (Indian Space Research Organization) as well as 1000 AWS maintained by India Meteorological Department (IMD). Apart from that, many private agencies install AWS for settling crop insurance claims by farmers based on weather derivatives. Therefore, the benefits of application of this approach can be harnessed in economic terms if one applies this algorithm for the forecast of the AWS measured weather parameters.