Crop Yield Forecasting Using Artificial Neural Networks: A Comparison between Spatial and Temporal Models

Our recent study using historic data of wheat yield and associated plantation area, rainfall, and temperature has shown that incorporating statistics and artificial neural networks can produce highly satisfactory forecasting of wheat yield. However, no comparison has been made between the outcomes from the spatial neural network model and commonly used temporal neural network models in crop forecasting. This paper presents the latest research outcomes from using both the spatial and temporal neural network models in crop forecasting. Our simulation shows that the spatial NN model is able to predict the wheat yield with respect to a given plantation area with a high accuracy compared with the temporal NARNN and NARXNN models. However, the high accuracy of the spatial NN model in crop yield forecasting is limited to the forecasting of crop yield only within normal ranges. Users must be cautious when using either NARNN or NARXNN for crop yield forecasting due to their inconsistency between the results of training and forecasting.


Introduction
Crop yield forecasting plays an important role in farm planning and management, domestic food supply, international food trade, ecosystem sustainability, and so on [1][2][3]. For instance, China has the largest population in the world but limited agricultural land, so accurate crop forecasting helps the government provide a sufficient food supply to the people. Australia has a small population and vast agricultural land, so its concern in crop production is how to optimize revenue from international crop exports to countries like China.
There are many factors that have an influence on crop yield, such as plantation area, efficiency of irrigation systems, variations in rainfall and temperature, quality of crop seeds, topographic attributes, soil quality and fertilisation, and disease occurrences [4][5][6][7][8]. Crop growing follows seasonal cycles but many of the factors above are largely irrelevant to the temporal factor. For example, plantation area, rainfall, fertilising, and disease occurrence vary yearly; efficiency of irrigation systems, quality of crop seeds, and soil quality may be improved or degraded from year to year; and topographic attributes may largely remain the same for a long period of time.
Efforts have been made to use either statistics to identify relationships or neural networks to establish mappings between crop yield and some of these factors [4][5][6][7][8][9][10]. Our recent study using historic data of wheat yield and associated plantation area, rainfall, and temperature in Queensland, Australia, has shown that incorporating statistics and artificial neural networks can produce highly satisfactory forecasting of wheat yield [10]. The neural network employed in that study was a spatial model that treats the wheat plantation areas and yields as mutual mappings, rather than yearly time series. Doubts have been raised about the lack of comparison between the outcomes from this spatial neural network model and commonly used temporal neural network models in crop forecasting. To address this issue, using the wheat yield in Queensland as a reference, this paper presents our research outcomes from using both the spatial and temporal neural network models in crop forecasting. The models are compared and discussed in terms of their usefulness in crop yield forecasting. This is a comparative study between two types of NN-based forecasting models, so our strategy is to focus on examining the performance of existing models that have been applied to forecasting, rather than introducing new forecasting models. Readers who are interested in the fundamentals of NN forecasting can refer to [9][10][11][12][13][14] for details.

Nonlinear Autoregressive Neural Network (NARNN) Model.
Nonlinear autoregressive (NAR) is a widely used statistical forecasting model for time series [15,16]. The forecasting model takes the form

$$y(t) = h\big(y(t-1), y(t-2), \ldots, y(t-p)\big), \qquad (1)$$

where $y(t)$ is the forecasted output and $h$ is an unknown function of the $p$ previous known outputs. Traditionally, the function $h$ is determined by statistical optimization processes, such as the minimum mean square method. The feedforward neural network has been used to establish NAR models, in which the traditional function $h$ is replaced by a number of neurons that work together to implicitly approximate the same functionality [11,17,18] as

$$y(t) \approx \hat{h}\big(y(t-1), y(t-2), \ldots, y(t-p)\big) = \sum_{j} w_{j}\, f\Big(\sum_{i=1}^{p} w_{ji}\, y(t-i)\Big), \qquad (2)$$

where $f$ is the transfer function; $w_{ji}$ denotes the input-to-hidden layer weights at the hidden neuron $j$; and $w_{j}$ is the hidden-to-output layer weight. This is a time-delay and recurrent neural network model. The known time series is fed to the hidden layer as input according to the number of time delays. This model is visually illustrated in Figure 1.
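The time-delay input structure described above can be sketched in Python with NumPy. This is a minimal illustration rather than the implementation used in the study: the helper names `make_lagged` and `narnn_forward`, the tanh transfer function, the omission of bias terms, the toy sine series, and the random (untrained) weights are all assumptions made for the example.

```python
import numpy as np

def make_lagged(series, p):
    """Build NAR inputs: each row holds y(t-1), ..., y(t-p); the target is y(t)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    if not 0 < p < n:
        raise ValueError("p must satisfy 0 < p < len(series)")
    X = np.column_stack([series[p - i : n - i] for i in range(1, p + 1)])
    return X, series[p:]

def narnn_forward(X, W_in, w_out, f=np.tanh):
    """Hidden layer f(X W_in^T) followed by a single linear output neuron."""
    return f(X @ W_in.T) @ w_out

# Hypothetical toy series and untrained random weights, for shape checking only.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 6.0, 30))
X, y = make_lagged(series, p=5)      # 5 delays
W_in = rng.normal(size=(10, 5))      # 10 hidden neurons
w_out = rng.normal(size=10)
y_hat = narnn_forward(X, W_in, w_out)
```

Each row of `X` is one delayed window of the series, so training amounts to fitting `W_in` and `w_out` so that `y_hat` tracks `y`.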

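Nonlinear Autoregressive with External Input Neural Network (NARXNN) Model.
The nonlinear autoregressive with external input (NARX) model takes the form

$$y(t) = h\big(y(t-1), \ldots, y(t-p),\; x(t-1), \ldots, x(t-p)\big), \qquad (3)$$

where $x(t)$ is the external input to the forecasting model with the same number of time delays as $y(t)$. Similarly, the feedforward neural network is able to establish NARX models, which can be expressed as

$$y(t) \approx \sum_{j} w_{j}\, f\Big(\sum_{i=1}^{p} w_{ji}\, y(t-i) + \sum_{i=1}^{p} v_{ji}\, x(t-i)\Big), \qquad (4)$$

where $f$ is the transfer function; $w_{ji}$ and $v_{ji}$ denote the input-to-hidden layer weights at the hidden neuron $j$; and $w_{j}$ is the hidden-to-output layer weight.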
This time-delay recurrent neural network model uses two known time series as independent inputs to the hidden layer according to the same number of time delays. This model is visually illustrated in Figure 2.
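As a rough sketch of how the two series enter the network with the same number of delays, the input matrix for a NARX-type model can be assembled as below. The function name `make_narx_inputs` and the toy values are hypothetical and chosen only for illustration; this is not taken from the study's code.

```python
import numpy as np

def make_narx_inputs(y, x, p):
    """Stack p delays of the internal series y and the external series x.

    Each row is [y(t-1), ..., y(t-p), x(t-1), ..., x(t-p)]; the target is y(t).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    n = len(y)
    if not 0 < p < n:
        raise ValueError("p must satisfy 0 < p < len(y)")
    y_lags = [y[p - i : n - i] for i in range(1, p + 1)]
    x_lags = [x[p - i : n - i] for i in range(1, p + 1)]
    return np.column_stack(y_lags + x_lags), y[p:]

# Hypothetical toy values standing in for normalised yield and plantation area.
inputs, targets = make_narx_inputs([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], p=2)
```

The resulting matrix has twice as many columns as the NAR case, which is one way to see why anomalies in either series can affect the forecast.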

Spatial Feedforward Neural Network Forecasting Model.
The multilayer perceptron (MLP) model belongs to the family of feedforward neural networks. In terms of functionality, an MLP is no different from the neural networks used in both NARNN and NARXNN models if the input is a time series. Additionally, MLPs have been proven to be able to approximate any continuous function by adjusting the number of nodes in the hidden layer [12], with numerous cases of successful applications [13,14,22,23,24]. Figure 3 illustrates the general structure of a three-layer MLP with one hidden layer of $q$ nodes, an $n$-dimensional input vector X, and an $m$-dimensional output vector Y. The relationship between the input and output components for this MLP can be generally expressed as

$$y_{k} = g\Big(\sum_{j=1}^{q} w_{kj}\, f\Big(\sum_{i=1}^{n} w_{ji}\, x_{i}\Big)\Big), \qquad (5)$$

where $f$ and $g$ are the transfer functions; $w_{ji}$ denotes the input-to-hidden layer weights at the hidden neuron $j$; and $w_{kj}$ denotes the hidden-to-output layer weights at the output unit $k$.
There are at least two relevant time series used in the NARXNN model, the internal series $y(t)$ and the external series $x(t)$. Time series analysis emphasises the occurrence of consecutive events. However, in crop yield forecasting, for example, the current plantation area should have a much higher impact on the forthcoming crop yield than the historic yields of any past years.
Treating crop yield and plantation area as a correlated pair, MLPs have been used to approximate the nonlinear relation that may exist between the two sequences in a correlated "spatial" manner [9,10], rather than in a correlated temporal mode. This has resulted in some encouraging outcomes.
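To make the spatial mapping concrete, the forward pass of a three-layer MLP that maps an input vector (here, a normalised plantation area) to an output vector (a normalised yield) might be sketched as follows. The name `mlp_forward`, the tanh hidden transfer function, the identity output function, the absence of bias terms, and the random weights are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np

def mlp_forward(X, W_hidden, W_out, f=np.tanh, g=lambda z: z):
    """Three-layer MLP: output = g(W_out . f(W_hidden . x)) for each row x of X.

    X has shape (N, n), W_hidden has shape (q, n), W_out has shape (m, q).
    """
    hidden = f(X @ W_hidden.T)   # (N, q) hidden activations
    return g(hidden @ W_out.T)   # (N, m) outputs

# Hypothetical example: one input (area), 100 hidden nodes, one output (yield).
rng = np.random.default_rng(0)
areas = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
W_hidden = rng.normal(size=(100, 1))
W_out = rng.normal(size=(1, 100))
yields = mlp_forward(areas, W_hidden, W_out)
```

Unlike the time-delay models, the rows of `areas` carry no ordering in time, so they can be shuffled or augmented freely before training.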


Crop Data for Neural Network Simulation
Historic Crop Data.
The Queensland historic wheat plantation area in hectares and wheat yield in tons from 1861 to 2007 are extracted from the report of the Australian Bureau of Statistics [25], which gives a total of 135 entries over the past 147 years. Both plantation area and wheat yield are listed as approximate absolute values for each year in the original data. We normalise both factors with their ceilings in the order of millions. The ceiling for plantation area is 13 million hectares and that for wheat yield is 20 million tons. A plot of these two normalised factors is shown in Figure 4(a). After two rounds of outlier detection and exclusion, a third-order polynomial correlation has been defined as

$$y = a_{3}x^{3} + a_{2}x^{2} + a_{1}x + a_{0}, \qquad (6)$$

where $y$ represents the normalised annual wheat yield, $x$ is the normalised plantation area, and $a_{0}, \ldots, a_{3}$ are the fitted coefficients. This correlation fits the filtered data well (Figure 4(b)) with a coefficient of 0.9904. This nonlinear correlation indicates that, through proper training, a neural network system can be used to approximate such a nonlinear relation between crop production and plantation area.
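The normalisation and polynomial fitting steps could be sketched roughly as below. The source does not state how outliers were detected, so the residual-based rule (dropping points more than two standard deviations from the current fit) is an assumption, as are the function name `fit_cubic_with_outlier_rounds` and the synthetic data, which do not reproduce the Queensland records. Only the two ceilings come from the text.

```python
import numpy as np

AREA_CEILING_HA = 13e6    # plantation-area ceiling stated in the text
YIELD_CEILING_T = 20e6    # wheat-yield ceiling stated in the text

def fit_cubic_with_outlier_rounds(x, y, rounds=2, n_sigma=2.0):
    """Fit a third-order polynomial, excluding large-residual points each round.

    The n_sigma residual rule is an assumed stand-in for the paper's
    unspecified outlier detection.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    for _ in range(rounds):
        coeffs = np.polyfit(x[keep], y[keep], 3)
        resid = y - np.polyval(coeffs, x)
        keep &= np.abs(resid) <= n_sigma * resid[keep].std()
    return np.polyfit(x[keep], y[keep], 3), keep

# Synthetic (made-up) data with one injected outlier, for illustration only.
rng = np.random.default_rng(1)
area_ha = rng.uniform(0.5e6, 12e6, size=60)
x = area_ha / AREA_CEILING_HA
y = 0.5 * x**3 + 0.2 * x + rng.normal(0.0, 0.01, size=60)
y[5] += 0.5
coeffs, keep = fit_cubic_with_outlier_rounds(x, y)
r = np.corrcoef(np.polyval(coeffs, x[keep]), y[keep])[0, 1]
```

Here `coeffs` holds the four polynomial coefficients in descending order of power, and `r` is the correlation between fitted and retained values.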

Data for Training and Testing Neural Network Models.
Neural network training requires a sufficient amount of data to achieve high reliability. For MLPs, since the temporal factor does not play any role in correlation (6), this correlation can be used to generate more data without changing the general trend. Keeping the same pattern, a moving window operator with different sizes is repeatedly applied to the cleaned data so as to generate more entries to fill the gaps where the original entries are scarce. The final data used to train and test the selected neural networks are compiled by mixing the cleaned and regenerated entries together.
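One possible reading of the moving window operator is a moving average over the area-sorted pairs, applied with several window sizes and pooled with the cleaned entries. The sketch below follows that reading; the averaging operator, the window sizes, and the name `expand_with_moving_windows` are assumptions, since the text does not specify them.

```python
import numpy as np

def expand_with_moving_windows(x, y, window_sizes=(2, 3, 5)):
    """Pool the original (x, y) pairs with moving averages of the sorted pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    xs_all, ys_all = [xs], [ys]
    for w in window_sizes:
        if w > len(xs):
            continue  # window longer than the data: nothing to generate
        kernel = np.ones(w) / w
        xs_all.append(np.convolve(xs, kernel, mode="valid"))
        ys_all.append(np.convolve(ys, kernel, mode="valid"))
    return np.concatenate(xs_all), np.concatenate(ys_all)

# Toy pairs on a straight line: the generated entries stay on the same trend.
x_new, y_new = expand_with_moving_windows([0, 1, 2, 3], [0, 2, 4, 6], window_sizes=(2,))
```

Because each generated entry is a local average of neighbouring pairs, it falls between existing points, which is why this kind of expansion is only meaningful when the ordering in time does not matter.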
For both NARNN and NARXNN models, such data expansion cannot be applied because the training data must be a sequence ordered by time. Among the 135 entries, the first three are spaced at intervals of 5 years and are thus excluded from both NARNN and NARXNN training and testing. The time series for the NARNN model is the annual wheat yield normalised by the corresponding plantation area because NARNN

Results of Neural Network Simulation
Two MLP models are used for training and simulation. By running the process ten times with both the 100-node hidden-layer MLP and the 200-node hidden-layer MLP, the simulations produce a highly satisfactory average outcome for both training and testing (Table 1). The difference is that the latter achieves a slightly lower MSE and a higher correlation than the former, but both show a high consistency between the results of training and simulation or testing. Two NARNN models are also used in training and simulation. Both models have ten hidden neurons but with 5 delays and 10 delays, respectively. Running ten repetitions for each model has resulted in a fairly satisfactory outcome on average, as shown in Table 1. Since the model changes the data partition for training and testing dynamically between separate runs, a highly satisfactory outcome from training does not always produce a highly satisfactory outcome from testing. In general, the result of simulation is always inferior to that of training, with the 10-delay model performing slightly better than the 5-delay model. Similar trends are found in the results of training and testing the two NARXNN models, whose performance is even worse than that of the NARNNs (Table 1).
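The two statistics referred to here, the mean squared error and the correlation between targets and network outputs, together with a simple train-versus-test comparison over repeated runs, can be computed as in the following sketch. The function names and the example scores are hypothetical and do not reproduce any value from Table 1.

```python
import numpy as np

def mse(target, output):
    """Mean squared error between targets and network outputs."""
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    return float(np.mean((target - output) ** 2))

def pearson_r(target, output):
    """Pearson correlation between targets and network outputs."""
    return float(np.corrcoef(target, output)[0, 1])

def consistency_gap(train_scores, test_scores):
    """Mean test score minus mean training score over repeated runs."""
    return float(np.mean(test_scores) - np.mean(train_scores))

# Hypothetical correlation scores from repeated runs (not values from Table 1).
gap = consistency_gap([0.9, 0.9], [0.8, 0.6])
```

A gap close to zero would correspond to the consistency reported for the MLPs, while a clearly negative gap in correlation would correspond to testing that is inferior to training.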

Discussion
In terms of consistency between the performances of training and testing using the same model, MLPs are able to achieve the highest consistency and hence produce the best simulation results among the three forecasting models. This is mainly because the data used to train the MLPs have been subject to outlier cleaning, which means that abnormal wheat yields outside the statistical trend have no impact on training and testing. In addition, without the temporal constraint, the expanded dataset ensures that the MLPs are adequately trained and tested with multiple cross-validations. Since the original data have been cleaned, in theory the MLPs should only be effective for crop forecasting of any "normal" year.
NARNNs exhibit a highly satisfactory performance in training but the simulation is highly dependent on the selection of the testing dataset; hence, the range of forecasting error is large. This indicates that a well trained NARNN model is not able to produce consistently accurate forecasting. This inconsistency between training and testing is clearly illustrated in Figure 5. Our experiments also show that changes in the number of hidden neurons and the length of delay (>3) for NARNN do not significantly improve the forecasting performance. Although the NARNNs are not consistent in forecasting, they use the whole data without excluding "abnormal" datasets in both training and testing. This complements the MLPs to some extent.
NARXNNs exhibit similar inconsistent patterns between training and testing, but these are even worse than those of the NARNNs (Figure 6). This may be caused by the double impact on forecasting exerted by the "anomalies" in both the wheat yield series and the plantation area series, which were not excluded through data cleaning.
Data cleaning for both NARNN and NARXNN is very challenging since both models use time series as the input. Excluding some temporal events will leave irregular gaps in the time series, which in turn influences the training and testing. Another possible reason for the inconsistency between the training and testing of both NARNN and NARXNN may be the inadequacy of the historic data in this case. Since we cannot artificially create extra yearly crop yields in the way that interpolation is used to generate extra spatial datasets [9,26], using time series based NN models to forecast crop yield may be immature at this stage.
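The irregular gaps mentioned above are easy to locate in a yearly index. The short sketch below lists every pair of consecutive entries that are not exactly one year apart; the helper name `find_gaps` and the example years are hypothetical and are not the actual record years of the dataset.

```python
import numpy as np

def find_gaps(years):
    """Return (previous_year, next_year) pairs whose spacing is not one year."""
    years = np.asarray(years, dtype=int)
    steps = np.diff(years)
    return [(int(a), int(b)) for a, b, s in zip(years[:-1], years[1:], steps) if s != 1]

# Hypothetical index with 5-year spacing at the start and one removed year later on.
gaps = find_gaps([1861, 1866, 1871, 1876, 1877, 1878, 1880])
```

Any entry removed as an outlier from a yearly series would show up in this list, which illustrates why outlier exclusion is awkward for time-delay models.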

Conclusion
The spatial NN model is able to predict the wheat yield with respect to a given plantation area with a high accuracy, compared with the temporal NN models such as NARNN and NARXNN. However, the high accuracy of the spatial NN model in crop yield forecasting is only applicable to the forecasting of crop yield within normal ranges because the model is trained using the cleaned and expanded data following a third-order polynomial trend between the crop yield and plantation area. NARNNs may be used as a complementary means to the MLPs because they use the whole data. Users must be cautious when using either NARNN or NARXNN for crop yield forecasting due to the inconsistency between training and forecasting.
In the future, other factors, such as efficiency of irrigation systems, variations in rainfall and temperature, quality of crop seeds, topographic attributes, soil quality and fertilisation, and disease occurrences, should be incorporated in forecasting model building and simulation.

Figure 4: Correlations between normalised wheat yield and plantation area.

Figure 5: Linear regression for two separate training and testing runs of NARNN.

Table 1: Statistical results of neural network training and testing.

time series as the input. To some extent, this normalised series actually absorbs the effect of plantation area on crop yield into the forecasting. For NARXNNs, the annual wheat yield normalised by the wheat ceiling is the internal input $y(t)$ and the plantation area normalised by the plantation ceiling is the external input $x(t)$. If the wheat yield normalised by plantation area is used as the internal input $y(t)$, the effect of plantation area on wheat yield will be doubly accounted for in the forecasting.