Our recent study using historic data of wheat yield and associated plantation area, rainfall, and temperature has shown that incorporating statistics and artificial neural networks can produce highly satisfactory forecasting of wheat yield. However, no comparison has been made between the outcomes from the spatial neural network model and commonly used temporal neural network models in crop forecasting. This paper presents the latest research outcomes from using both the spatial and temporal neural network models in crop forecasting. Our simulation shows that the spatial NN model is able to predict the wheat yield with respect to a given plantation area with a high accuracy compared with the temporal NARNN and NARXNN models. However, the high accuracy of the spatial NN model in crop yield forecasting is limited to the forecasting of crop yield only within normal ranges. Users must be cautious when using either NARNN or NARXNN for crop yield forecasting due to their inconsistency between the results of training and forecasting.
1. Introduction
Crop yield forecasting plays an important role in farming planning and management, domestic food supply, international food trade, ecosystem sustainability, and so on [1–3]. For instance, China has the largest population in the world but with limited agricultural land so accurate crop forecasting helps the government provide sufficient food supply to the people. Australia has a small population with vast agricultural land so its concern on crop production is how to optimize revenue from international crop export to countries like China.
There are many factors that have an influence on crop yield, such as plantation area, efficiency of irrigation systems, variations in rainfall and temperature, quality of crop seeds, topographic attributes, soil quality and fertilisation, and disease occurrences [4–8]. Crop growing follows seasonal cycles, but many of the factors above are largely independent of time. For example, plantation area, rainfall, fertilising, and disease occurrence vary yearly; efficiency of irrigation systems, quality of crop seeds, and soil quality may be improved or degraded from year to year; and topographic attributes may largely remain the same for a long period of time.
Efforts have been made to use either statistics to identify relationships or neural networks to establish mappings between crop yield and some of these factors [4–10]. Our recent study using historic data of wheat yield and associated plantation area, rainfall, and temperature in Queensland, Australia, has shown that incorporating statistics and artificial neural networks can produce highly satisfactory forecasts of wheat yield [10]. The neural network employed in that study was a spatial model that treats the wheat plantation areas and yields as mutual mappings, rather than as yearly time series. Doubts have been raised about the lack of comparison between the outcomes from this spatial neural network model and the temporal neural network models commonly used in crop forecasting. To address this issue, using the wheat yield in Queensland as a reference, this paper presents our research outcomes from using both the spatial and temporal neural network models in crop forecasting. The models are compared and discussed in terms of their usefulness in crop yield forecasting.
This is a comparative study between two types of NN-based forecasting models so our strategy is to focus on examining the performances of existing models that have been applied to forecasting, rather than introducing new forecasting models. Readers who are interested in fundamentals of NN forecasting can refer to [9–14] for details.
Nonlinear autoregressive (NAR) is a widely used statistical forecasting model for time series [15, 16]. The forecasting model takes the form as follows:
$$y(t) \approx f\big(y(t-1), y(t-2), \ldots, y(t-d)\big), \tag{1}$$
where y(t) is the forecasted output and f is an unknown function of d previous known outputs. Traditionally, function f is determined by statistical optimization processes, such as the minimum mean square method.
The feedforward neural network has been used to establish NAR models, in which the traditional function f is replaced by a number of neurons that work together to implicitly approximate the same functionality [11, 17, 18] as
$$y(t) \approx f\big(y(t-1), y(t-2), \ldots, y(t-d)\big) = \sum_{i} b_i\, \psi\Big(\sum_{j=1}^{d} a_{ji}\, y(t-j)\Big), \tag{2}$$
where ψ is the transfer function; $a_{ji}$ denotes the input-to-hidden layer weight that connects the delayed input y(t-j) to hidden neuron i; and $b_i$ is the hidden-to-output layer weight of hidden neuron i.
This is a time-delay recurrent neural network model. The known time series is fed to the hidden layer as input, with one input per time delay. This model is illustrated in Figure 1.
Figure 1: Structure of NARNN with 10 hidden neurons and 5 delays.
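To make equation (2) concrete, the following is a minimal sketch of a one-step NARNN forecast in NumPy. The function name `narnn_forecast`, the choice of `tanh` as the transfer function, the random weights, and the sample series are all illustrative assumptions and are not taken from the study; bias terms are omitted because equation (2) does not include them.

```python
import numpy as np

def narnn_forecast(history, a, b, transfer=np.tanh):
    """One-step NAR forecast following equation (2).

    history : 1-D sequence of past outputs, most recent last
    a       : (d, n_hidden) input-to-hidden weights; a[j-1, i] multiplies y(t-j)
    b       : (n_hidden,) hidden-to-output weights
    """
    d = a.shape[0]
    if len(history) < d:
        raise ValueError("need at least d past values")
    # lagged[j-1] = y(t-j): take the last d observations, newest first
    lagged = np.asarray(history, dtype=float)[-d:][::-1]
    hidden = transfer(lagged @ a)      # psi(sum_j a_ji * y(t-j)) for each neuron i
    return float(hidden @ b)           # sum_i b_i * hidden_i

rng = np.random.default_rng(0)
d, n_hidden = 5, 10                    # sizes shown in Figure 1
a = rng.normal(scale=0.1, size=(d, n_hidden))   # untrained, illustrative weights
b = rng.normal(scale=0.1, size=n_hidden)
series = [0.20, 0.25, 0.22, 0.30, 0.28, 0.33]   # made-up normalised values
y_next = narnn_forecast(series, a, b)
```

With trained weights in place of the random ones, repeated calls that append each forecast to `history` would give a multi-step forecast.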
2.2. Nonlinear Autoregressive with External Input Neural Network (NARXNN) Model
Nonlinear autoregressive with external input (NARX) is a modified NAR model by including another relevant time series as extra input to the forecasting model [19–21], which can be expressed as
$$y(t) \approx f\big(x(t-1), x(t-2), \ldots, x(t-d),\; y(t-1), y(t-2), \ldots, y(t-d)\big), \tag{3}$$
where x(t) is the external input to the forecasting model, with the same number of time delays as y(t).
Similarly the feedforward neural network is able to establish NARX models, which can be expressed as
$$y(t) \approx \sum_{i} c_i\, \psi\Big(\sum_{j=1}^{d} \big(a_{ji}\, x(t-j) + b_{ji}\, y(t-j)\big)\Big), \tag{4}$$
where ψ is the transfer function; $a_{ji}$ and $b_{ji}$ denote the input-to-hidden layer weights that connect the delayed inputs x(t-j) and y(t-j) to hidden neuron i; and $c_i$ is the hidden-to-output layer weight of hidden neuron i.
This time-delay recurrent neural network model uses two known time series as independent inputs to the hidden layer, each with the same number of time delays. This model is illustrated in Figure 2.
Figure 2: Structure of NARXNN with 10 hidden neurons and 5 delays.
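The input arrangement implied by equation (3) can be sketched as follows: each training sample pairs the d most recent values of both series with the next output. The helper `make_narx_samples` and the sample values are hypothetical and serve only to show how the two series are aligned; they are not the data or code used in the study.

```python
import numpy as np

def make_narx_samples(x, y, d):
    """Build (inputs, targets) for equation (3) from two aligned series.

    Each input row is [x(t-1), ..., x(t-d), y(t-1), ..., y(t-d)] and the
    matching target is y(t), for t = d, ..., len(y) - 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D series of equal length")
    if len(y) <= d:
        raise ValueError("series must be longer than the number of delays")
    rows = []
    for t in range(d, len(y)):
        x_lags = x[t - d:t][::-1]      # x(t-1), ..., x(t-d)
        y_lags = y[t - d:t][::-1]      # y(t-1), ..., y(t-d)
        rows.append(np.concatenate([x_lags, y_lags]))
    return np.array(rows), y[d:]

# Made-up values standing in for normalised plantation area and wheat yield
area = [0.10, 0.12, 0.15, 0.14, 0.18, 0.20, 0.22]
yield_ = [0.08, 0.09, 0.13, 0.10, 0.15, 0.17, 0.19]
inputs, targets = make_narx_samples(area, yield_, d=5)
```

A series of length n yields only n - d samples, which is one reason short historic records limit how well such models can be trained.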
2.3. Spatial Feedforward Neural Network Forecasting Model
The multilayer perceptron (MLP) model is a feedforward neural network. In terms of functionality, an MLP does not differ from the neural networks used in the NARNN and NARXNN models if the input is a time series. Additionally, MLPs have been proven able to approximate any continuous function by adjusting the number of nodes in the hidden layer [12], and have been applied successfully in numerous cases [13, 14, 22–24]. Figure 3 illustrates the general structure of a three-layer MLP with one hidden layer of L nodes, a p-dimensional input vector X, and a q-dimensional output vector Y. The relationship between the input and output components for this MLP can be generally expressed as
$$y_k = \varphi\Big(\sum_{j=1}^{L} b_{kj}\, \psi\Big(\sum_{i=1}^{p} a_{ji}\, x_i\Big)\Big), \tag{5}$$
where φ and ψ are the transfer functions; $a_{ji}$ denotes the input-to-hidden layer weight from input i to hidden neuron j; and $b_{kj}$ denotes the hidden-to-output layer weight from hidden neuron j to output unit k.
Figure 3: Three-layer multilayer perceptron (MLP) neural network.
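Equation (5) can be written compactly with matrix products. The sketch below assumes `tanh` for ψ and an identity function for φ, and uses random, untrained weights; these choices, the function name `mlp_forward`, and the one-input, one-output sizes (plantation area in, yield out) are assumptions made for illustration only.

```python
import numpy as np

def mlp_forward(x, a, b, hidden_transfer=np.tanh, output_transfer=lambda z: z):
    """Evaluate equation (5) for one input vector.

    x : (p,) input vector
    a : (L, p) input-to-hidden weights; a[j, i] links input i to hidden neuron j
    b : (q, L) hidden-to-output weights; b[k, j] links hidden neuron j to output k
    """
    x = np.asarray(x, dtype=float)
    hidden = hidden_transfer(a @ x)        # psi(sum_i a_ji * x_i), shape (L,)
    return output_transfer(b @ hidden)     # phi(sum_j b_kj * hidden_j), shape (q,)

rng = np.random.default_rng(1)
p, L, q = 1, 100, 1                        # one input, 100 hidden nodes, one output
a = rng.normal(scale=0.1, size=(L, p))     # untrained, illustrative weights
b = rng.normal(scale=0.1, size=(q, L))
y = mlp_forward([0.4], a, b)
```

Because no lagged values enter the input, the same function applies to any (area, yield) pair regardless of the year it comes from.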
There are at least two relevant time series used in the NARXNN model, the internal series y(t) and the external series x(t). Time series analysis emphasises the order in which consecutive events occur. In crop yield forecasting, however, the current plantation area should have a much higher impact on the forthcoming crop yield than the historic yields of any past years.
Treating crop yield and plantation area as a correlated pair, MLPs have been used to approach the nonlinear relation that may exist between the two sequences in a correlated “spatial” manner [9, 10], rather than a correlated temporal mode. This has resulted in some encouraging outcomes.
3. Crop Data for Neural Network Simulation
3.1. Historic Crop Data
The Queensland historic wheat plantation area in hectares and wheat yield in tons from 1861 to 2007 are extracted from the report of the Australian Bureau of Statistics [25], which gives a total of 135 entries over 147 years. Both quantities are listed as approximate absolute values for each year in the original data. We normalise both factors by their ceilings, which are in the order of millions: 13 million hectares for plantation area and 20 million tons for wheat yield. A plot of the two normalised factors is shown in Figure 4(a). After two rounds of outlier detection and exclusion, a third-order polynomial correlation has been defined as
$$w = 0.8197a^{3} - 0.5102a^{2} + 0.8511a - 0.0073, \tag{6}$$
where w represents the normalised annual wheat yield and a is the normalised plantation area. This correlation fits the filtered data well (Figure 4(b)) with a coefficient of 0.9904. This nonlinear correlation indicates that, through proper training, a neural network system can be used to approximate such a nonlinear relation between crop production and plantation area.
Figure 4: Correlations between normalised wheat yield and plantation area. (a) Original data. (b) Cleaned data.
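As a worked illustration of the normalisation and of correlation (6), the sketch below converts a plantation area in hectares to its normalised value, evaluates the polynomial, and converts the result back to tons. The coefficients and ceilings are those stated in the text; the function names and the example area of 1.3 million hectares are arbitrary choices made only to show the arithmetic.

```python
AREA_CEILING_HA = 13e6      # normalisation ceiling for plantation area (hectares)
YIELD_CEILING_T = 20e6      # normalisation ceiling for wheat yield (tons)

def normalised_yield(a):
    """Equation (6): normalised wheat yield for a normalised plantation area a."""
    return 0.8197 * a**3 - 0.5102 * a**2 + 0.8511 * a - 0.0073

def trend_yield_tons(area_ha):
    """Yield in tons implied by the fitted trend for an area in hectares."""
    a = area_ha / AREA_CEILING_HA
    return normalised_yield(a) * YIELD_CEILING_T

w_example = normalised_yield(0.1)            # normalised area a = 0.1
tons_example = trend_yield_tons(1.3e6)       # the same point in physical units
```

For a = 0.1 the polynomial gives w = 0.0008197 - 0.005102 + 0.08511 - 0.0073 = 0.0735277, which corresponds to about 1.47 million tons.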
3.2. Data for Training and Testing Neural Network Models
Neural network training requires a sufficient amount of data to achieve high reliability. For MLPs, since time plays no role in correlation (6), this correlation can be used to generate more data without changing the general trend. Keeping the same pattern, a moving window operator with different sizes is repeatedly applied to the cleaned data to generate more entries that fill the gaps where the original entries are scarce. The final data used to train and test the selected neural networks are compiled by mixing the cleaned and regenerated entries.
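The text does not specify the exact moving window operator, so the following is only one plausible reading, offered as a sketch: the cleaned pairs are sorted by area, and for each window size the mean area and mean yield of every run of consecutive pairs are added as new entries. The function name, the window sizes, and the sample values are assumptions.

```python
import numpy as np

def expand_with_moving_windows(area, yield_, window_sizes=(2, 3)):
    """Augment (area, yield) pairs with moving-window means.

    Pairs are sorted by area; each window size then contributes the mean
    area and mean yield of every run of consecutive pairs of that length.
    """
    area = np.asarray(area, dtype=float)
    yield_ = np.asarray(yield_, dtype=float)
    order = np.argsort(area)
    area, yield_ = area[order], yield_[order]
    areas, yields = [area], [yield_]
    for size in window_sizes:
        if size > len(area):
            continue                          # window longer than the data
        kernel = np.ones(size) / size
        areas.append(np.convolve(area, kernel, mode="valid"))
        yields.append(np.convolve(yield_, kernel, mode="valid"))
    return np.concatenate(areas), np.concatenate(yields)

cleaned_area = [0.10, 0.40, 0.20, 0.30]       # made-up cleaned entries
cleaned_yield = [0.07, 0.30, 0.15, 0.22]
big_area, big_yield = expand_with_moving_windows(cleaned_area, cleaned_yield)
```

Every generated point lies between existing neighbours, so the expanded set stays within the range of the cleaned data and follows its trend. This is possible only because the order of the years is ignored.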
For both NARNN and NARXNN models, such data expansion cannot be applied because the training data must be a sequence ordered by time. Among the 135 entries, the first three are separated by intervals of 5 years and are thus excluded from both NARNN and NARXNN training and testing. The time series for the NARNN model is the annual wheat yield normalised by the corresponding plantation area, because NARNN takes only one time series as the input. To some extent, this normalised series absorbs the effect of plantation area on crop yield into the forecasting. For NARXNNs, the annual wheat yield normalised by the wheat ceiling is the internal input y(t) and the plantation area normalised by the plantation ceiling is the external input x(t). If the wheat yield normalised by plantation area were used as the internal input y(t), the effect of plantation area on wheat yield would be counted twice in the forecasting.
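The two ways of preparing the time series can be sketched as follows. The code assumes that "normalised by the corresponding plantation area" means the yield divided by the area of the same year, and it uses the ceilings given in Section 3.1; the function names and the three annual records are made up for illustration.

```python
import numpy as np

AREA_CEILING_HA = 13e6     # ceiling for plantation area (hectares)
YIELD_CEILING_T = 20e6     # ceiling for wheat yield (tons)

def narnn_series(yield_t, area_ha):
    """Single NARNN input series: annual yield divided by plantation area."""
    return np.asarray(yield_t, dtype=float) / np.asarray(area_ha, dtype=float)

def narxnn_series(yield_t, area_ha):
    """NARXNN inputs: external x(t) and internal y(t), each scaled by its ceiling."""
    x = np.asarray(area_ha, dtype=float) / AREA_CEILING_HA
    y = np.asarray(yield_t, dtype=float) / YIELD_CEILING_T
    return x, y

# Made-up annual records, ordered by year
area_ha = [650_000, 780_000, 910_000]
yield_t = [800_000, 1_000_000, 1_300_000]
per_area = narnn_series(yield_t, area_ha)
x, y = narxnn_series(yield_t, area_ha)
```

Keeping y(t) scaled by the yield ceiling, rather than by area, means that plantation area enters the NARXNN model only through x(t).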
4. Results of Neural Network Simulation
Two MLP models are used for training and simulation. By running the process ten times with both the 100-node hidden-layer MLP and 200-node hidden-layer MLP, the simulations produce a highly satisfactory average outcome for both training and testing (Table 1). The difference is that the latter achieves a slightly lower MSE and a higher correlation than the former but both show a high consistency between the results of training and simulation or testing.
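Table 1: Statistical results of neural network training and testing.

| Model | Training N | Training MSE | Training R | Testing N | Testing MSE | Testing R |
| --- | --- | --- | --- | --- | --- | --- |
| MLP (100) | 329 | 0.0001 | 0.9996 | 40 | 0.0001 | 0.9943 |
| MLP (200) | 329 | 0.0001 | 0.9998 | 40 | 0.0001 | 0.9981 |
| NARNN (5 d) | 105 | 0.1035 | 0.9682 | 20 | 0.0231 | 0.8800 |
| NARNN (10 d) | 105 | 0.0069 | 0.9753 | 20 | 0.0188 | 0.9298 |
| NARXNN (5 d) | 105 | 0.0174 | 0.908 | 20 | 0.0528 | 0.6945 |
| NARXNN (10 d) | 105 | 0.0271 | 0.8574 | 20 | 0.0628 | 0.6801 |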
Two NARNN models are also used in training and simulation. Both models have ten hidden neurons but with 5 delays and 10 delays, respectively. Running ten repetitions for each model has resulted in a fairly satisfactory outcome on average shown in Table 1. Since the model changes data partition for training and testing dynamically between separate runs, a highly satisfactory outcome from training does not always produce a highly satisfactory outcome from testing. In general, the result of simulation is always inferior to that of training, with the 10-delay model performing slightly better than the 5-delay model. Similar trends are found from the results of training and testing the two NARXNN models, whose performance is even worse than that of NARNNs (Table 1).
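The MSE and R statistics reported in Table 1 are averages over repeated runs. A minimal sketch of how such averages can be computed is given below, assuming that R is the Pearson correlation coefficient between target and predicted values; the function names and the two sample runs are hypothetical and do not reproduce the study's results.

```python
import numpy as np

def mse(target, predicted):
    """Mean squared error between target and predicted values."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((target - predicted) ** 2))

def correlation_r(target, predicted):
    """Pearson correlation coefficient between target and predicted values."""
    return float(np.corrcoef(target, predicted)[0, 1])

def summarise_runs(runs):
    """Average MSE and R over runs given as (target, predicted) pairs."""
    mses = [mse(t, p) for t, p in runs]
    rs = [correlation_r(t, p) for t, p in runs]
    return float(np.mean(mses)), float(np.mean(rs))

# Two made-up test runs with different data partitions
runs = [
    ([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.41]),
    ([0.15, 0.25, 0.35, 0.45], [0.10, 0.30, 0.30, 0.50]),
]
mean_mse, mean_r = summarise_runs(runs)
```

Reporting the spread across runs alongside the mean would make the dependence on the data partition visible, which an average alone hides.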
5. Discussion
In terms of consistency between training and testing performance using the same model, MLPs achieve the highest consistency and hence produce the best simulation results among the three forecasting models. This is mainly because the data used to train the MLPs have been subject to outlier cleaning, which means that abnormal wheat yields outside the statistical trend have no impact on the training and testing. In addition, without the temporal constraint, the expanded dataset ensures that the MLPs are adequately trained and tested with multiple cross-validations. Since the original data have been cleaned, in theory the MLPs should be effective only for crop forecasting in a “normal” year.
NARNNs exhibit a highly satisfactory performance in training, but the simulation is highly dependent on the selection of the testing dataset; hence, the range of forecasting error is large. This indicates that a well-trained NARNN model is not able to produce consistently accurate forecasts. This inconsistency between training and testing is clearly illustrated in Figure 5. Our experiments also show that changing the number of hidden neurons or the number of delays (>3) for NARNN does not significantly improve forecasting performance. Although the NARNNs are not consistent in forecasting, they use the whole dataset without excluding “abnormal” entries in both training and testing. To some extent, this complements the MLPs.
Figure 5: Linear regression for two separate training and testing runs of NARNN.
NARXNNs exhibit a similar inconsistency between training and testing, and to a greater degree than NARNNs (Figure 6). This may be caused by the double impact on forecasting exerted by the “anomalies” in both the wheat yield series and the plantation area series, which were not excluded through data cleaning.
Figure 6: Linear regression for two separate training and testing runs of NARXNN.
Data cleaning for both NARNN and NARXNN is very challenging since both models use time series as the input. Excluding some temporal events will leave irregular gaps in the time series, which in turn influences the training and testing. The other possible reason that contributed to the inconsistency between the training and testing of both NARNN and NARXNN may be the inadequacy of the historic data in this case. Since we cannot artificially create extra yearly crop yields, like using interpolation to generate extra spatial datasets [9, 26], using time series based NN models to forecast crop yield may be immature at this stage.
6. Conclusion
The spatial NN model is able to predict the wheat yield with respect to a given plantation area with high accuracy, compared with temporal NN models such as NARNN and NARXNN. However, the high accuracy of the spatial NN model applies only to the forecasting of crop yield within normal ranges, because the model is trained using the cleaned and expanded data following a third-order polynomial trend between crop yield and plantation area. NARNNs may be used as a complementary means to the MLPs because they use the whole dataset. Users must be cautious when using either NARNN or NARXNN for crop yield forecasting due to the inconsistency between training and forecasting.
In the future, other factors, such as efficiency of irrigation systems, variations in rainfall and temperature, quality of crop seeds, topographic attributes, soil quality and fertilisation, and disease occurrences, should be incorporated in forecasting model building and simulation.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] Y. W. Jame and H. W. Cutforth, "Crop growth models for decision support systems," vol. 76, no. 1, pp. 9–19, 1996.
[2] J. I. Yun, "Predicting regional rice production in South Korea using spatial data and crop-growth modeling," vol. 77, no. 1, pp. 23–38, 2003. doi:10.1016/S0308-521X(02)00084-7
[3] F. Tao, M. Yokozawa, Z. Zhang, Y. Xu, and Y. Hayashi, "Remote sensing of crop production in China by production efficiency models: models comparisons, estimates and uncertainties," vol. 183, no. 4, pp. 385–396, 2005. doi:10.1016/j.ecolmodel.2004.08.023
[4] G. Hoogenboom, "Contribution of agrometeorology to the simulation of crop production and its applications," vol. 103, no. 1-2, pp. 137–157, 2000. doi:10.1016/S0168-1923(00)00108-8
[5] Y. He, Y. Zhang, S. Zhang, and H. Fang, "Application of artificial neural network on relationship analysis between wheat yield and soil nutrients," in Proceedings of the IEEE 27th Annual Conference on Engineering in Medicine and Biology, pp. 4530–4533, Shanghai, China, September 2005. doi:10.1109/IEMBS.2005.1615476
[6] T. R. Green, J. D. Salas, A. Martinez, and R. H. Erskine, "Relating crop yield to topographic attributes using spatial analysis neural networks and regression," vol. 139, no. 1-2, pp. 23–37, 2007. doi:10.1016/j.geoderma.2006.12.004
[7] E. de Jong and D. A. Rennie, "Effect of soil profile type and fertilizer on moisture use by wheat grown on fallow or stubble land," vol. 49, no. 2, pp. 189–197, 1969. doi:10.4141/cjss69-026
[8] C. A. Campbell, R. P. Zentner, and P. J. Johnson, "Effect of crop rotation and fertilization on the quantitative relationship between spring wheat yield and moisture use in southwestern Saskatchewan," vol. 68, no. 1, pp. 1–16, 1988.
[9] W. W. Guo, L. D. Li, and G. Whymark, "Simulating wheat yield in New South Wales of Australia using interpolation and neural networks," vol. 6444 of Lecture Notes in Computer Science, pp. 708–715, Springer, Berlin, Germany, 2010. doi:10.1007/978-3-642-17534-3_87
[10] W. W. Guo and H. Xue, "An incorporative statistic and neural approach for crop yield modelling and forecasting," vol. 21, no. 1, pp. 109–117, 2012. doi:10.1007/s00521-011-0636-0
[11] F. Liang, "Bayesian neural networks for nonlinear time series forecasting," vol. 15, no. 1, pp. 13–29, 2005. doi:10.1007/s11222-005-4786-8
[12] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," vol. 2, no. 5, pp. 359–366, 1989.
[13] W. W. Guo, M. M. Li, G. Whymark, and Z.-X. Li, "Mutual complement between statistical and neural network approaches for rock magnetism data analysis," vol. 36, no. 6, pp. 9678–9682, 2009. doi:10.1016/j.eswa.2008.11.045
[14] M. M. Li, W. Guo, B. Verma, K. Tickle, and J. O'Connor, "Intelligent methods for solving inverse problems of backscattering spectra with noise: a comparison between neural networks and simulated annealing," vol. 18, no. 5, pp. 423–430, 2009. doi:10.1007/s00521-008-0219-x
[15] S. Y. Hwang and I. V. Basawa, "Large sample inference based on multiple observations from nonlinear autoregressive processes," vol. 49, no. 1, pp. 127–140, 1994. doi:10.1016/0304-4149(93)00068-Q
[16] G. Kapetanios, "Nonlinear autoregressive models and long memory," vol. 91, no. 3, pp. 360–368, 2006. doi:10.1016/j.econlet.2005.12.006
[17] T. Taskaya-Temizel and M. Casey, "A comparative study of autoregressive neural network hybrids," vol. 18, no. 5-6, pp. 781–789, 2005. doi:10.1016/j.neunet.2005.06.003
[18] W. Pawlus, H. R. Karimi, and K. G. Robbersmyr, "Data-based modeling of vehicle collisions by nonlinear autoregressive model and feedforward neural network," vol. 35, pp. 65–79, 2012. doi:10.1016/j.ins.2012.03.013
[19] S. Chatterjee, S. Nigam, J. B. Singh, and L. N. Upadhyaya, "Software fault prediction using Nonlinear Autoregressive with eXogenous Inputs (NARX) network," vol. 37, no. 1, pp. 121–129, 2012. doi:10.1007/s10489-011-0316-x
[20] S. H. Arbain and A. Wibowo, "Neural networks based nonlinear time series regression for water level forecasting of Dungun River," vol. 8, no. 9, pp. 1506–1513, 2012. doi:10.3844/jcssp.2012.1506.1513
[21] F. Huo and A.-N. Poo, "Nonlinear autoregressive network with exogenous inputs based contour error reduction in CNC machines," vol. 67, pp. 45–52, 2013.
[22] W. W. Guo, "Incorporating statistical and neural network approaches for student course satisfaction analysis and prediction," vol. 37, no. 4, pp. 3358–3365, 2010. doi:10.1016/j.eswa.2009.10.014
[23] E. Bojórquez, J. Bojórquez, S. E. Ruiz, and A. Reyes-Salazar, "Prediction of inelastic response spectra using artificial neural networks," vol. 2012, Article ID 937480, 5 pages, 2012. doi:10.1155/2012/937480
[24] Y. E. Shao, "Prediction of currency volume issued in Taiwan using a hybrid artificial neural network and multiple regression approach," vol. 2013, Article ID 676742, 9 pages, 2013. doi:10.1155/2013/676742
[25] Australian Bureau of Statistics, Commonwealth of Australia, Canberra, Australia, 2008.
[26] W. W. Guo, "A novel application of neural networks for instant iron-ore grade estimation," vol. 37, no. 12, pp. 8729–8735, 2010. doi:10.1016/j.eswa.2010.06.043