Integrated Feature Selection of ARIMA with Computational Intelligence Approaches for Food Crop Price Prediction

Because of global climate change, lack of arable land, and rapid population growth, the supplies of three major food crops (i.e., rice, wheat, and corn) have been gradually decreasing worldwide. The rapid increase in demand for food has contributed to a continuous rise in food prices, which directly threatens the lives of over 800 million people around the world who are reported to be chronically undernourished. Consequently, food crop price prediction has attracted considerable attention in recent years. Recent integrated forecasting models have developed various feature selection methods (FSMs) to capture fewer, but more important, explanatory variables. However, one major problem is that the future values of these important explanatory variables are not available. Thus, predictions based on these variables are not actually possible. Because an autoregressive integrated moving average (ARIMA) can extract important self-predictor variables with future values that can be calculated, this study incorporates an ARIMA as the FSM for computational intelligence (CI) models to predict three major food crop (i.e., rice, wheat, and corn) prices. Other than the ARIMA, the components of the proposed integrated forecasting models include artificial neural networks (ANNs), support vector regression (SVR), and multivariate adaptive regression splines (MARS). The predictive accuracies of ARIMA, ANN, SVR, MARS, and the proposed integrated model are compared and discussed. Experimental results reveal that the proposed integrated model achieves superior forecasting performance for predicting food crop prices.


Introduction
1.1.Background.Everyone needs food.However, not everyone has enough food to survive.Food crops are a primary source of human food, with rice, wheat, and corn being the most widely consumed sources of grains around the world.In this study, food crops refer to plants, which provide food for human consumption, cultivated by agriculturists.The demand and consumption for food crops will rapidly increase in the future, propelled by a 2.3 billion person increase in global population and greater per capita incomes anticipated through the midcentury [1,2].According to the Food and Agriculture Organization of the United Nations (FAO) [3], the demand for food is expected to grow substantially by 2050.A major factor for this increase is world population growth.Today, world population has reached 7.6 billion, and we may reach 9.7 billion by 2050.This growth, along with rising incomes in developing countries, is driving up global food demand.Consequently, humanity may directly face one difficulty, which is "will enough food crops be produced at affordable prices or will rising prices drive more of the humanity into hunger?" The three most important food crops, rice, wheat, and corn, directly provide more than 50% of all calories consumed by the world population.While the area harvested for wheat each year is 214 million ha, the areas harvested for rice and corn are 154 million ha and 140 million ha, respectively [4].Additionally, the rapid rise in food prices has been a burden on the poor in developing countries, who spend roughly half their household income on food.The issue of being able to affordably purchase the aforementioned food crops plays a very important role in human life.
Therefore, from the human point of view, the prediction of prices for rice, wheat, and corn has become a significant research topic.
1.2.Related Work.Owing to various agricultural and environmental factors, meteorological factors, and biophysical factors, it was indicated in [5] that the exports of rice were very difficult to predict.They employed the autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) methodologies to predict Thailand's rice exports.The wheat price in the Chinese market was predicted by using the ARIMA, ANN, and linear combination models [6].The overall results showed that the ANN technique was the best prediction model.In Africa, around 40% of the rice consumed is imported [7].This high dependence on rice imports indicates that Africa is highly exposed to international rice market shocks with sometimes grave consequences for its food security and political stability.In addition, models were constructed to predict the quarterly prices of two types of food crops, barley and wheat [8].In [9], it was reported that the world food prices would rise around 32% by 2022.
They showed the dynamic relationship between acreage response, food crop yield, and price volatility by developing an optimization model [10].In [11], it was stated that most countries would like to predict their annual food requirements in order to provide food security for the people.They proposed an artificial intelligent support vector regression (SVR) model to predict the output energy in rice production.
Because food crop prices are seasonally affected [5,6], this study uses the ARIMA to predict the prices of rice, wheat, and corn.However, the assumptions inherent to the linear form of ARIMA may encounter problems in adopting nonlinear relationships for practical data [12].Computational intelligence (CI) techniques have been widely used in many forecasting applications because of data-driven features and fewer a priori assumptions.Accordingly, in addition to ARIMA modeling, this study uses CI schemes, including ANN, SVR, and multivariate adaptive regression splines (MARS), for predicting the prices of the three food crops because they allow nonlinearity modeling and provide good forecasting characteristics.
However, CI modeling may face difficulties in its training process for designing an optimal topology owing to the use of a high number of input variables.Therefore, feature selection methods (FSMs) have been incorporated in order to reduce the number of explanatory or predictor variables [13,14].In this study, feature selection refers to the process of identifying a subset of relevant explanatory or predictor variables for use during forecasting model construction.This subset of variables contains fewer but more important input variables that aid in predicting the outcome.
Feature selection techniques have become a research hot spot in many forecasting applications.For example, in order to accurately predict wind speed, an SVR forecasting model with a feature selection procedure called phase space reconstruction has been proposed [15].Additionally, a set of general ARIMA models was used for performance comparison.A hybrid forecasting model with a feature selection technique based on mutual information, extreme learning machines, and bootstrap techniques was proposed to predict dayahead electricity prices [16].The authors of [17] proposed a hybrid filter-wrapper approach to predict electrical load and the price of electricity.The feature selection method used in their approach identifies a minimum subset of the most informative features by considering the relevancy, redundancy, and interactions of candidate inputs.In [18], the authors proposed a two-step approach that selects a set of candidate features based on data characteristics.A hybrid ANN-based model was then used to predict the day-ahead price of electricity.A hybrid forecasting methodology was designed to predict electrical load in [19].The study used a feature selection method that was developed based on entropy and several CI techniques.The prediction performance of the proposed model was compared to various ARIMA models.A regression model was proposed to predict online sales in [20].Additionally, a feature selection methodology based on a multiobjective evolutionary algorithm was combined with the forecasting model.A combined ANN and Kalman filter approach was proposed to predict wind speed in [21].An ARIMA feature selection method was used to determine the input structure for their hybrid model.
When reviewing related research, we noticed that, although many studies have focused on using FSMs to obtain accurate forecasts, little attention has been devoted to the use of FSMs for food crop price prediction.Additionally, we encounter practical problems when using FSMs to predict food crop prices.As mentioned earlier, one of the major purposes for using an FSM is to select a subset of explanatory variables for further use in forecasting.However, the future values of the explanatory variables that are selected by the FSM are unknown.Relatively little research has attempted to address this problem.As a consequence, predictions cannot be computed even though the most important explanatory variables have been identified by effective FSMs.We propose an integrated forecasting model to remedy the problems caused by unknown explanatory variables.The proposed integrated model employs an ARIMA as an FSM to capture important self-predictor variables, which then serve as input variables for the ANN, SVR, and MARS models.Because the future values of a self-predictor variable can be computed based on its previous and current values, food crop price predictions can be obtained.
In this study, we propose four single-stage models and three integrated models for predicting food crop prices.A real monthly dataset was obtained, which contains the prices of rice, wheat, and corn from January 1990 to September 2015.This real dataset makes it possible to compare predictions of food crop prices using single-stage models and integrated models.The remainder of the study is organized as follows: Section 2 presents the modeling methodologies of different forecasting schemes.In Section 3, we describe the design structure for our single-and two-stage integrated models.Real data on the prices of three food crops are collected to verify the single-stage models and integrated models.Section 4 contains forecasting comparisons for four single-stage models and three integrated models.Some practical implications and other concerns are addressed in Section 5.The final section summarizes our research findings and contains our conclusions.

Forecasting Methodologies
2.1.Autoregressive Integrated Moving Average.Because seasonal effects are possibly involved in food crop price forecasting, seasonal ARIMA modeling should be developed [22].A seasonal ARIMA model can be described as follows: δ is an unknown constant; a t is the white noise at time t, which is independent and identical (iid) with normal distribution; p is the order of nonseasonal autoregressive (AR) models; P is the order of seasonal AR models; q is the order of nonseasonal moving average (MA) models; Q is the order of seasonal MA models; ϕ p B is a polynomial function for a nonseasonal AR model, defined as Φ P B L is a polynomial function for a seasonal AR model, defined as The original nonstationary time series should be transformed into a stationary working series through differencing.Typically, the appropriate transformations can be performed by four combinations of d and D; that is, d, D = 0, 0 , d, D = 1, 0 , d, D = 0, 1 , and d, D = 1, 1 , respectively.Once the stationary working series has been attained, we can observe the behavior of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) to determine the values of p, P, q, and Q for the seasonal ARIMA models.
They prefer using the maximum likelihood (ML) technique to obtain estimates for model parameters [22].For the ML technique, the likelihood function is maximized via nonlinear least squares using Marquardt's method.Because the ML approach is more computationally expensive than conditional least squares (CLS) estimates, most computer packages employ CLS as a default approach.LS refers to the parameter estimate associated with the smallest sum of squared errors (SSE).For the CLS approach, we assume that the number of past unobserved errors is zero.The data series y t can be represented as follows: The π weights are calculated as follows: The CLS approach should produce parameter estimates that can minimize the following: where the unobserved past values of y t are set to 0 and π i is computed from the estimates of the ARIMA model parameters ϕ and θ at each iteration.
After estimating the model parameters, the diagnostic checks for testing the lack of fit of the seasonal ARIMA model should be conducted.A logical way to check the adequacy of a seasonal ARIMA model is to analyze the residuals obtained from the underlying model.The Ljung-Box test was developed to examine whether the first K sample autocorrelations of residuals indicate adequacy of the model [23].The null hypothesis for this test is that the first K autocorrelations are jointly zero; that is, The Ljung-Box statistics are described as where Q is the Ljung-Box statistics, and the asymptotic distribution of Q follows a chi-square distribution with k − n p degrees of freedom; n is the number of observations; d is the degree of nonseasonal differencing; n p is the number of parameters other than δ that must be estimated in the ARIMA model under consideration; r 2 l â is the square of the sample autocorrelation of the residuals at lag l.
The Ljung-Box test rejects the null hypothesis, which indicates that the underlying model has significant lack of fit if where χ 2 α k − n p is the chi-square distribution table value with k − n p degrees of freedom and significance level α.

Artificial Neural Networks.
ANN is composed of a large number of highly interconnected processing elements, which are referred to as nodes or neurons, working in parallel to 3 Complexity solve a particular problem.While the leftmost layer of the network is referred to as the input layer, the rightmost layer is called the output layer.The middle layer is called the hidden layer.In the ANN structure, each layer consists of a number of nodes connected by links.The ANN contains nodes that are connected to themselves, enabling a node to influence other nodes and itself.ANN includes input variables and output variables.In addition, a certain number of nodes are contained in the hidden layer, and the hidden nodes are nonlinear functions of the input variables.The output variable is a function of the nodes in the hidden layer.
ANN modeling is briefly described as follows.For an ANN model, the relationship between inputs I 1 , I 2 , … , I a and output (O) can be represented as where α j j = 0, 1, 2, … , b and δ ij i = 0, 1, 2, … , a ; j = 0, 1, 2, … , b are the model connection weights; a is the number of input nodes; b is the number of hidden nodes; and ε is the error term.In the hidden layer, the transfer function is often used with a logistic function.
Consequently, the ANN transports nonlinear functional mapping from the inputs I 1 , I 2 , … , I a to the output O; namely, where w is a vector of model parameters and f is a function determined by the ANN structure and connection weights.
For the ANN structure, the nodes in the input layers receive input signals from an external source and the nodes in the output layers generate the target output signals.The output of each neuron in the input layer is the same as the input to that neuron.The hidden layers adjust the weights of those inputs until the neural network's error is minimized.For ANN processing, backpropagation is one method for computing the error contribution of each neuron after a batch of data is processed.This method can be used to adjust the weight of each neuron, thereby completing the learning process for that case.For each neuron j in the hidden layer and each neuron k in the output layer, the net inputs are given by 11 where i j is a neuron in the previous layer, o i o j is the output of node i j , and w ji w kj is the connection weight from neuron i j to neuron j k .The neuron outputs are given by where net j net k is the input signal from the external source to node j k in the input layer and θ j θ k is a bias.The generalized delta rule is the conventional technique used to derive the connection weights in the network.First, a set of random numbers is assigned to the connection weights.Then for a presentation of a pattern p with a target output vector t p = t p1 , t p2 , … , t pM T , the sum of squared errors to be minimized is given by where M is the number of output nodes.By minimizing the error E p using the gradient descent technique, the connection weights can be updated by applying the following equations: where for output nodes, and for hidden nodes, Note that the learning rate affects the network's generalization ability and learning speed to a great extent.

Support Vector Regression.
SVR is an adaptation of the support vector machine (SVM), one of the most powerful techniques in CI [24,25].The basic concept of SVR is to find a model function f x to represent the relationship between the features and target.The modeling of SVR can be described as follows.Suppose where x and y represent the model input and output vectors, w is the weight vector, b is a constant, Φ x denotes a mapping function in the feature space, and w ⋅ Φ x describes the dot production in the feature space F. Typical regression modeling estimates the coefficients by minimizing the square error, which can be considered as an empirical risk based on loss function.The ε-insensitivity loss function was introduced [25] and can be described as follows: where y is the model output and ε is the region of ε-insensitivity.When the predicted value falls into the band area, the loss is zero.However, when the predicted value falls 4 Complexity outside the band area, the loss is defined as the difference between the predicted value and margin.When both empirical risk and structure risk are considered, the SVR can be designed to minimize the following quadratic programming problem [25]: i is the empirical risk; 1/2 w 2 is the structure risk preventing overlearning and lack of applied universality; and C is a modifying coefficient representing the trade-off between empirical and structure risks.With an appropriate modifying coefficient C, band area width ε, and kernel function, the optimum value of each parameter can be solved by the Lagrange procedure.In addition, the general form of the SVR-based regression function can be expressed as follows [25]: where r i and r * i are the Lagrangian multipliers and satisfy the equality r i r * i = 0.In addition, K x, x i is the kernel function.The values of the kernel are equal to the inner product of two vectors, x i and x j , in the feature space Φ x i and Φ x j ; that is, K x i , x j = Φ x i Φ x j .Because the radial basis function (RBF) is the most commonly used kernel function [26], we employ it for the experimental study.The RBF is written as follows: where σ is the width of the RBF.

Multivariate Adaptive Regression
Splines.MARS was developed for solving regression-type problems [27].It is a nonparametric regression procedure that makes no assumption about the functional relationship between the response and explanatory variables.MARS modeling is based on a divide-and-conquer strategy where training datasets are partitioned into separate regions, each of which is assigned to its own regression equation.Consequently, MARS is appropriate and effective for problems with more than two input variables.Particularly, MARS can select important explanatory variables and relationships for complex data structures that often hide in higher dimensional data series.
A MARS model can be described as follows [27]: where β 0 and β m are the parameters, M is the number of basis functions (BFs), J m is the number of knots, S jm takes on values of either 1 or −1 and indicates the right or left sense of the associated step function, ν j, m is the label of the independent variable, and l jm is the knot location.The optimal MARS model is determined in a two-step procedure.In the first step, a model is grown by adding BFs until an overly large model is obtained.The BFs are then deleted in the order of least contribution to most contribution by using the generalized cross-validation (GCV) criterion in the second step.The measure of variable importance is provided by observing the decrease in the computed GCV values when a variable is removed from the model.The GCV is described as follows: where N is the number of observations and C M is the cost penalty measures of a model containing M BF.

Food Crop Price Forecasting
3.1.Datasets and Forecasting Criteria.In this study, the price data of the three most important food crops were collected for the period of January 1990 to September 2015 from the web sites of IndexMundi [28][29][30][31].The datasets consist of 309 records for each food crop price.Among them, the first 297 data records were used to develop the different forecasting models, while the remaining 12 data records were used to perform model validation.This study has presented an experiment based on a larger population of data and has foreseen obtaining accurate forecasts of food crop price, with the forecast horizon for the out-of-sample forecasting experiment of 12. Good prediction of prices may assist in planning agricultural supply, but the investigated data does not contain any information on population.The forecasting capability of the models was compared using three forecasting accuracy criteria, including mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).These forecasting measurements are expressed as follows: 24 where e t is the value of the residual at time t and Y t is the observation at time t.

Complexity
Obviously, it can be noted that the lower the MAPE, M AE, and RMSE values, the closer the forecasted values to the actual values.

Forecast Modeling of Rice Prices.
Figure 1 shows the original time plot for the rice price data series.This study uses an SAS package to run the ARIMA modeling.Table 1 shows the parameter estimates, and all the estimated parameters are significantly different from zero.In Table 1, the notations "AR1,1," "AR1,2," "AR1,3," and "AR1,4" correspond to the parameters ϕ 1 , ϕ 2 , ϕ 7 , and ϕ 13 of the AR model.The ARIMA model presented in Table 1 was a subset model.Since the parameters, ϕ 1 , ϕ 2 , ϕ 7 , and ϕ 13 , are significantly different from zero, this study refers to the model in Table 1 as an AR (1, 2, 7, and 13) model.
In addition, Table 2 shows the Ljung-Box test results.As mentioned earlier, the Ljung-Box statistic can be used to check the fit of an ARIMA model.A simple method to derive the hypothesis testing result is described as follows.We should reject the model under consideration by setting the type I error equal to α if and only if the p value is less than α.Generally, α is set equal to 0.05.The SAS package computes the Ljung-Box statistics Q and their associated p values for values of K equal to 6, 12, 18, 24, 30, and 36.In Table 2, the first column contains the values of K, the second column contains the values of Q, and the fourth column contains the associated p values.By studying Table 2, we observe that all the associated p values are greater than α.Accordingly, a conclusion can be made concerning the fit of the underlying ARIMA model.Thus, the model expressed in (25) is suitable for modeling rice prices. where For ANN designs, there is no fixed mode to decide the number of hidden nodes.Too few hidden nodes confine the network generalization capability, whereas too many hidden nodes may lead to overtraining difficulties.Therefore, in this study, we consider the hidden nodes to be set from (2n − 2) to (2n + 2) if n ≦ 8, where n represents the number of input variables.When n > 8, we consider the hidden nodes to be set from (n − 2) to (n + 2).In this study, we denote the term {n i -n h -n o } for ANN parameter settings, where n i is the number of neurons in the input layer, n h stands for the number of neurons in the hidden layer, and n o represents the number of neurons in the output layer, respectively.Additionally, this study employs the learning rate for all ANN models at the default value (i.e., 0.01) to ensure consistency [32].The network topology with the minimum MAPE is considered as the optimal network topology, due to the fact that the MAPE is one of the most important performance measurements of forecasting capability.
In this study, because there are no explanatory variables available, the inputs of ANN modeling for food crop price forecasting becomes unfeasible.Consequently, this study uses self-predictor variables to serve as the inputs for ANN.In here, the self-predictor variable is defined as the lags or past values of a variable.This study proposes two input designs for the modeling of CI techniques.Because seasonal effects may influence food crop price forecasts, the first design rationally selects the preceding 12 observations to forecast food crop prices at time t.Accordingly, the first design used 12 self-predictor variables (i.e., y t−1 , y t−2 , y t−3 , … , and y t−12 ) to serve as input variables and employed a single variable (i.e., y t ) to serve as the output for ANN modeling.That is, the first design considered 12 input nodes and one output node for the ANN structures.This design is denoted as ANN 1 .
The second input design for CI technique modeling is performed using FSM.For this proposed modeling, this study incorporated FSMs with ANN, SVR, and MARS to develop forecasting models.The proposed input design used ARIMA as an FSM to extract important self-predictor  To forecast the rice prices, the two ANN models contain twelve and four input nodes for the first and second designs, respectively.The hidden nodes were selected as 10, 11, 12, 13, and 14 for the first design and 6, 7, 8, 9, and 10 for the second design.After performing the ANN modeling, the ANN 1 design showed that the {12-14-1} structure provided the best results and minimum testing MAPE for rice prices.For the ARIMA-ANN design, the best structure was {4-7-1}.Three forecasting measurements for different settings of the ANN topologies for the two designs are shown in Table 3.
For modeling SVR on rice prices, this study adopted the input structure as ANN modeling.SVR modeling employed two designs for the input variables.The first design used 12 variables, y t−1 , y t−2 , y t−3 , … , and y t−12 as the input variables and y t as the output variable.The second design used four self-predictor variables, Z t−1 , Z t−2 , Z t−7 , and Z t−13 to serve as the input variables and Z t as the output variable.The first and second SVR designs are denoted as SVR 1 and ARIMA-SVR, respectively.Because the parameter settings of C and ε often affect the performance of SVR modeling, an analytic parameter selection approach and grid search are used in this study [26].Accordingly, we denote the term {ε, C} SVR as the best SVR parameter settings.After performing the SVR modeling, the SVR 1 design reported that the parameter settings of {2 −4 , 2 11 } SVR provided MAPE = 6.912% for rice price forecasting.For the second design, ARIMA-SVR, the parameter settings of {2 −5 , 2 9 } SVR provided MAPE = 2.057%.
For MARS modeling, this study also adopted the same input structure of ANN or SVR modeling.The first and second MARS designs are denoted as MARS 1 and ARIMA-MARS, respectively.Table 4 lists the variable selection results and BFs after modeling rice price data using MARS 1 .In Table 4, the first column contains the variables that should be included in the model, the second column contains the relative importance of the variables that are listed in the first column, the third column contains the various BFs, and the final column contains the estimated coefficient values for the BFs that are listed in the third column.Based on the results in Table 4, it can be seen that three input variables (i.e., y t−12 , y t−11 , and y t−10 ) played important roles in building the MARS forecasting models.Their relative importance values (%) are 100, 13.1, and 12.1.The construction of the BFs can be expressed as follows.Consider the variable y t−12 as an example.We have two BFs (i.e., BF 1 and BF 2 ) to be considered.The observed values of BF 1 are determined by the values larger than 0 or y t−12 − 367.The corresponding coefficient is estimated to be 0.985.The observed values of BF 2 are determined by the values larger than 0 or 552 09 − y t−12 .The corresponding coefficient is estimated to be −1.258.Accordingly, the MARS forecasting model for rice prices can be expressed as follows:  For ARIMA-MARS modeling, the variable selection results and BFs are summarized in Table 5.In addition, it can be seen that three input variables (i.e., Z t−13 , Z t−7 , and Z t−1 ) played important roles in building the ARIMA-MARS  2 displays the original time plot for wheat prices.After modeling the wheat price data series with the ARIMA procedure, the parameter estimates are shown in Table 6.In Table 6, the notations "MA1,1," "MA1,2," "MA1,3," and "MA1,4" correspond to the parameters θ 1 , θ 7 , θ 12 , and θ 13 of the MA model.where In addition, a t−j stands for the white noise at time t − j.
For ANN modeling on wheat prices, this study also used two ANN designs, same as the structures for modeling rice prices.The first design used 12 self-predictor variables (i.e., y t−1 , y t−2 , y t−3 , … , and y t−12 ) to serve as input variables and a single variable (i.e., y t ) to serve as the output variable.The second design considers the ARIMA as an FSM to extract important self-predictor variables, which serve as the inputs for the ANN models.The ARIMA model for wheat price forecasting is obtained in (30) and indicates that a t−1 , a t−7 , a t−12 , and a t−13 will influence Z t . Because , where Z ∧ t−1 is the forecast at time t − 1, we observe that Z t−1 will influence Z t .Therefore, Z t−1 should be selected as an input variable as long as the component a t−1 is in the MA model.Following the same logic, because the model (e.i., (30)) contains a t−1 , a t−7 , a t−12 , and a t−13 , we conclude that Z t−1 , Z t−7 , Z t−12 , and Z t−13 will influence Z t .Accordingly, we have selected Z t−1 , Z t−7 , Z t−12 , and Z t−13 to serve as input variables for the second proposed ANN modeling design for the wheat price data series.
After performing ANN modeling on wheat price data, Table 8 shows the corresponding forecast validity measures for different settings of ANN topologies for the two designs.As shown in Table 8, we observe that the ANN 1 design with the {12-11-1} structure has the smallest MAPE.For the second, or ARIMA-ANN design, the {4-8-1} structure was associated with the smallest MAPE.
For modeling SVR on wheat prices, this study adopted the input structure as ANN modeling.The first design employed 12 variables, y t−1 , y t−2 , y t−3 , … , and y t−12 as the input variables and y t as the output variable.The second design employed four self-predictor variables, Z t−1 , Z t−7 , Z t−12 , and Z t−13 to serve as the input variables and Z t as the output variable.After performing SVR modeling on the wheat price data series, we obtained the parameter settings of {2 −8 , 2 10 } SVR and associated MAPE = 4.661% for the first design.In addition, we had parameter settings of {2 −7 , 2 −1 } SVR and associated MAPE = 4.347% for the second design.
For MARS modeling with the first design, Table 9 presents the variable selection results and BFs.As shown in Table 9, we notice that three input variables (i.e., y t−12 , y t−6 , and y t−11 ) played important roles in building the MARS forecasting models.In addition, the MARS forecasting model for wheat prices can be expressed as follows: where For ANN modeling on corn prices, while the first design used 12 input variables, the second design used six input variables extracted by using the FSM of ARIMA modeling.After performing ANN modeling, Table 13 presents the corresponding forecast validity measures for different ANN topology settings for the two designs.From Table 13, we observe that the ANN 1 design with the {12-14-1} structure has the smallest MAPE.For the second design, the {6-13-1} structure has the smallest MAPE.
In the modeling of SVR for the corn price data series, this study adopted the same input structure as ANN modeling.The first design employed 12 variables (i.e., y t−1 , y t−2 , y t−3 , … , and y t−12 ) as the input variables and y t as the output variable.The second design employed six self-predictor variables (i.e., Z t−1 , Z t−3 , Z t−6 , Z t−12 , Z t−14 , and Z t−24 ), which were selected by using the FSM of ARIMA modeling to serve as the input variables and Z t as the output variable.After performing SVR modeling, we obtained the parameter settings of {2 −9 , 2 7 } SVR and associated MAPE = 4.060% for the first design.In addition, we obtained parameter settings of {2 −11 , 2 11 } SVR and associated MAPE = 4.215% for the second design.
For MARS modeling on corn price data with the first design, Table 14 illustrates the variable selection results and BFs.As shown in Table 14, we notice that two variables (i.e., y t−12 and y t−11 ) play important roles in building the MARS forecasting models.Consequently, the MARS forecasting model for corn prices can be described as follows: For MARS modeling with the second design,

Forecasting Comparison
Several forecasting models were proposed to forecast the three major food crop prices in this study.These models include four single models (i.e., ARIMA, ANN In comparison to the forecasting performance of the first design, or single models, in Tables 16-18, we observe that the three CI models demonstrated better performance than the ARIMA models.The possible reason may be that ARIMA modeling is difficult for capturing nonlinear features in the food crop price data series.By reviewing these tables, we found that there is no best model for food crop price forecasting.For example, while the ANN 1 model seems to possess better forecasting accuracy for rice price forecasting, the SVR 1 model seems to have better forecasting accuracy for wheat and corn price forecasting.
It can be clearly seen in  that the proposed integrated models possess better forecasting accuracy than the single models for most cases.For example, by reviewing Table 16, the proposed integrated ARIMA-ANN model is associated with MAE, RMSE, and MAPE values of 10.365%, 12.037%, and 2.650%, respectively, for rice price forecasting.These three performance values are smaller than the corresponding performance values for any one of the four single models.Take the proposed integrated ARIMA-SVR model as another example.By reviewing Table 17, the ARIMA-SVR model is associated with MAE, RMSE, and M APE values of 9.779%, 11.371%, and 4.347%, respectively, for wheat price forecasting.These three performance values are also smaller than the corresponding performance values for any one of the four single models in Table 17.Thus, our proposed models, which were integrated with FSM, provide more accurate forecasting results than the single models.
In addition, Table 19 presents a comparison with respect to the overall percentage improvements (PIs) of forecasting accuracy for the proposed integrated models over the single models.The PIs of the MAE, RMSE, and MAPE are defined as follows: From Table 19, it is obvious that positive PIs can be achieved by using the proposed integrated models.For example, in the rice price forecasting, the MAPE PI of the proposed ARIMA-ANN model over the two single models, ARIMA and ANN S , were 376.038% and 34.226%, respectively.The corresponding MAE PI and RMSE PI were 369.822% and 34.250% and 340.945% and 27.273%, respectively.Apart from the rice price forecasting, most of the MAE PI , RMSE PI , and MAPE PI are positive large numbers for wheat and corn price forecasting.Accordingly, considerable forecasting accuracy improvements are achieved through the use of the proposed integrated models.Furthermore, we also found five negative PIs in Table 19.
For wheat price forecasting, the RMSE PI of the ARIMA-MARS over MARS 1 was −1.029%.For corn price forecasting, the RMSE PI of the ARIMA-ANN over ANN 1 was −10.973%.The three remaining negative values occurred in corn price forecasting when using the ARIMA-SVR model, and the M AE PI , RMSE PI , and MAPE PI were −2.601%, −5.525%, and −3.677%, respectively.A negative PI value implies that the forecasting performance declined.However, the magnitudes of those five negative values are small, and they have minor effects on forecasting performance.Additionally, the associated MAPE PI for those five negative PIs were only one negative.That is, if a forecasting model has the minimum MAPE, this does not mean it has the minimum MAE or RMSE.

Discussion
While the aforementioned description focuses on accuracy comparisons for each individual food crop price forecast, the following discussion provides the comparison regarding overall performance for food crop price forecasting as a whole.
Table 20 lists the average PIs of the proposed integrated models over their common element, the ARIMA models.Figure 4 presents the average PIs of MAE, RMSE, and MAP E by employing the proposed integrated models over single ARIMA models.As shown in Figure 4, we notice that considerable accuracy improvements can be reached by using our proposed approaches.Additionally, Table 21 reports the average PIs of MAE, RMSE, and MAPE by using the proposed integrated models over another element, other than ARIMA element.Figure 5 displays the satisfied average PIs by using the proposed integrated models.
In addition, Figure 6 displays the forecasts of rice prices, obtained by using ARIMA, ANN 1 , and the proposed ARIMA-ANN models, for the twelve testing records.One can see that the forecasts of the ARIMA-ANN model are closest to the actual observations.The forecasts of the basic ARIMA model are relatively far away from the actual observations.Additionally, the prediction accuracy of ANN 1 is better than that of the ARIMA model.Regarding the ANN 1 design, we observe that the input vectors I 1 , I 2 , … , I 12 are equivalent to y t−1 , y t−2 , y t−3 , … , y t−12 and that the output O  11 Complexity can be obtained by performing a nonlinear functional mapping, as shown in (10).Regarding the ARIMA-ANN mechanism, we observe that the input vectors I 1 , I 2 , I 3 , and I 4 are characterized by Z t−1 , Z t−2 , Z t−7 , and Z t−13 and that the output O can be obtained by performing a nonlinear functional mapping, as shown in (10).
Figure 7 presents the forecasts of rice prices, obtained by using ARIMA, SVR1, and the proposed ARIMA-SVR models, for the twelve testing records.Regarding the SVR 1 design, we observe that the input vectors I 1 , I 2 , … , I 12 are characterized by y t−1 , y t−2 , y t−3 , … , y t−12 and that the output O can be obtained by performing a nonlinear functional mapping, as shown in (17).Regarding the ARIMA-SVR mechanism, we observe that the input vectors I 1 , I 2 , I 3 , and I 4 are characterized by Z t−1 , Z t−2 , Z t−7 , and Z t−13 and that the output O can be obtained by performing a nonlinear functional mapping, as shown in (17).
Figure 8 shows the forecasts of rice prices, obtained by using ARIMA, MARS 1 , and the proposed ARIMA-MARS models, for the twelve testing records.Regarding MARS 1 modeling, we observe that the input vectors I 1 , I 2 , … , I 12 are characterized by y t−1 , y t−2 , y t−3 , … , y t−12 and that the output O can be obtained by performing a nonlinear functional mapping, as shown in (22).Regarding the ARIMA-MARS design, we observe that the input vectors I 1 , I 2 , I 3 , and I 4 are characterized by Z t−1 , Z t−2 , Z t−7 , and Z t−13 and that the output O can be obtained by performing a nonlinear functional mapping, as shown in (22).By observing Figures 7 and 8, we can see that the forecasts of the ARIMA-SVR and ARIMA-MARS models are closest to the actual observations.Both figures illustrate that the forecasts of the ARIMA models are relatively far away from the actual observations.Similar implications were obtained for the cases of wheat and corn price predictions.That is, the forecasts of the proposed integrated models are closer to the actual observations.The forecasts of the ARIMA model are far away from the actual observations.These findings can be observed in Figures 9-14.
In this study, we collected 309 records to build various models for the prediction of the prices of rice, wheat, and corn.One assumption for building the ARIMA models is that their structures will remain unchanged over time.Therefore, even if more sample data become available, there is no need to rebuild the ARIMA model for our proposed integrated

Conclusion
Humans survive based on food crops.We eat significant quantities of rice, wheat, corn, and other simple crops to maintain energy and good health.Accordingly, food crop price forecasting is very important and has drawn considerable attention in recent decades.14 Complexity obtain the future values of these variables.Therefore, we proposed integrated ARIMA-ANN, ARIMA-SVR, and ARIMA-MARS models in order to perform forecasting for three important food crop prices.The role of the ARIMA element is as an FSM that can capture important self-predictor variables.Rather than using unavailable explanatory variables, the self-predictor variables serve as the inputs for ANN, SVR, and MARS modeling.The experimental results reveal that the proposed integrated models are desirable alternatives for food crop price forecasting, because they all have excellent forecasting performance.Most importantly, the main contribution of the proposed models is their ability to provide predictions of food crop prices without requiring extensive effort to obtain the future values of explanatory variables.One limitation of our models is that the techniques for identifying the correct ARIMA model from the variety of possible models may be unintuitive and computationally expensive.However, the modeling in this study can be used as a guideline for developing forecasting models for other       15 Complexity food crop price data series.Additionally, because MARS is effective for selecting important variables for predicting response variables, attempting to extend the FSM of MARS modeling may be a valuable future research direction.Finally, the integrated models may also be combined with other CI techniques, such as extreme learning machines, time delay neural networks, or artificial immune systems, which may be worthy of investigation in the future.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.Complexity

Figure 1 :
Figure 1: Time plot for rice price (unit: US dollars per metric ton).

Figure 2 :
Figure 2: Time plot for wheat price (unit: US dollars per metric ton).

Figure 3 :
Figure 3: Time plot for corn price (unit: US dollars per metric ton).

Figure 5 :
Figure 5: Average PIs by using the proposed integrated models over single ANN, SVR, and MARS models.

ForecastsFigure 6 :
Figure 6: Forecasts of rice prices versus actual observations for the 12 testing records (with the use of ARIMA, ANN 1 , and the proposed ARIMA-ANN models).

Figure 7 :
Figure 7: Forecasts of rice prices versus actual observations for the 12 testing records (with the use of ARIMA, SVR 1 , and the proposed ARIMA-SVR models).

Figure 8 :
Figure 8: Forecasts of rice prices versus actual observations for the 12 testing records (with the use of ARIMA, MARS 1 , and the proposed ARIMA-MARS models).

ForecastsFigure 9 :
Figure 9: Forecasts of wheat prices versus actual observations for the 12 testing records (with the use of ARIMA, ANN 1 , and the proposed ARIMA-ANN models).

Figure 10 :
Figure 10: Forecasts of wheat prices versus actual observations for the 12 testing records (with the use of ARIMA, SVR 1 , and the proposed ARIMA-SVR models).

Figure 11 :
Figure 11: Forecasts of wheat prices versus actual observations for the 12 testing records (with the use of ARIMA, MARS 1 , and the proposed ARIMA-MARS models).

Figure 12 :
Figure 12: Forecasts of corn prices versus actual observations for the 12 testing records (with the use of ARIMA, ANN 1 , and the proposed ARIMA-ANN models).

Figure 13 :
Figure 13: Forecasts of corn prices versus actual observations for the 12 testing records (with the use of ARIMA, SVR 1 , and the proposed ARIMA-SVR models).

Figure 14 :
Figure 14: Forecasts of corn prices versus actual observations for the 12 testing records (with the use of ARIMA, MARS 1 , and the proposed ARIMA-MARS models).

1
where B is the backward shift operator, defined as B j y t = y t−j ; L is the number of seasons in a year, and L = 12 for monthly data;d is the values of nonseasonal difference transformations; D is the values of seasonal difference transformations; Z t is the working series, which are stationary after fitting a suitable difference transformation from original time series Y t ;

Table 1 :
Parameter estimates for rice prices.
(25)ables, which serve as the inputs for the ANN, SVR, and MARS models.Because the ARIMA model for rice price forecasting is obtained in(25), it indicates that Z t−1 , Z t−2 , Z t−7 , and Z t−13 will influence Z t .Consequently, this study selected Z t−1 , Z t−2 , Z t−7 , and Z t−13 to serve as self-predictor, or input, variables for the proposed integrated modeling on the rice price data series.In this study, this second input design is denoted as ARIMA-ANN.

Table 2 :
Ljung-Box test results for ARIMA modeling of rice prices.

Table 3 :
Forecasting measurements for two ANN designs (rice price forecasting).

Table 4 :
The results of MARS 1 for rice price forecasting.

Table 5 :
The results of ARIMA-MARS for rice price forecasting., ϕ 3 , ϕ 6 , ϕ 12 , ϕ 14 , and ϕ 24 of the AR model.In Table11, we still contain the parameters of "AR1,2" and "AR1,4" in the model, although the absolute t values are less than or equal to 2 (i.e., the typical type I error is chosen as 0.05).The main reason is that those two parameters are important, and the Ljung-Box statistics indicate that the model is not appropriate if those two parameters are not involved in the underlying ARIMA model.Additionally, Table 12 demonstrates the Ljung-Box test results and a conclusion is made on the appropriateness of the ARIMA model.Thus, (34) is a suitable ARIMA model for modeling the corn price data series.
Table 15 lists the variable selection results and BFs.The MARS forecasting model for the corn prices is expressed as follows:

Table 6 :
Parameter estimates for wheat prices.

Table 7 :
Ljung-Box test results for ARIMA modeling of wheat prices.

Table 8 :
Forecast validity measures for two ANN designs (rice price forecasting).

Table 9 :
The results of MARS 1 for wheat price forecasting.

Table 10 :
The results of ARIMA-MARS for wheat price forecasting.

Table 11 :
Parameter estimates for corn prices.

Table 12 :
Ljung-Box test results for ARIMA modeling of corn prices.

Table 13 :
Forecast validity measures for two ANN designs (corn price forecasting).

Table 14 :
The results of MARS 1 for corn price forecasting.

Table 15 :
The results of ARIMA-MARS for corn price forecasting.
sample data becomes available, we can still use the same input variables for our proposed ARIMA-ANN, ARIMA-SVR, and ARIMA-MARS models.Some feature selection modeling processes may need to be performed repeatedly when more sample data become available, which may be a time-consuming task.It is true that, after performing feature selection with the ARIMA, we can increase the

Table 16 :
Performance comparison of various models for rice price forecasting.

Table 17 :
Performance comparison of various models for wheat price forecasting.

Table 18 :
Performance comparison of various models for corn price forecasting.

Table 19 :
Improvement of the proposed integrated models over single models.

Table 20 :
PIs of the whole food crop prices by using the proposed integrated models over the single ARIMA models.

Table 21 :
PIs of the whole food crop prices by using the proposed integrated models over another element, other than ARIMA element.