Forecasting Rice Productivity and Production of Odisha, India, Using Autoregressive Integrated Moving Average Models

Rice area, production, and productivity of Odisha were forecast from the historical data of 1950-51 to 2008-09 using univariate autoregressive integrated moving average (ARIMA) models and compared with forecasts for the all-India data. The autoregressive (p) and moving average (q) parameters were identified from the significant spikes in the plots of the partial autocorrelation function (PACF) and autocorrelation function (ACF) of the different time series. The ARIMA (2, 1, 0) model was found suitable for all-India rice productivity and production, whereas ARIMA (1, 1, 1) gave the best fit for forecasting rice productivity and production in Odisha. Predictions were made for the immediate next three years, that is, 2007-08, 2008-09, and 2009-10, using the best-fit ARIMA models chosen by the minimum value of the selection criteria, that is, the Akaike information criterion (AIC) and the Schwarz-Bayesian information criterion (SBC). The performance of the models was validated by comparing the percentage deviation from the actual values and the mean absolute percent error (MAPE), which was found to be 0.61 and 2.99% for the area under rice in Odisha and India, respectively. Similarly, for the prediction of rice production and productivity in Odisha and India, the MAPE was found to be less than 6%.


Introduction
Rice is one of the most important cereal crops of India, occupying an area of 41.92 million hectares with an annual production of 89.09 million tonnes and an average productivity of 2.13 t ha⁻¹ (2009-10) (http://www.agricoop.nic.in/). It plays a vital role in national food security and will continue to do so because of its adaptability to diverse ecosystems. Rice contributes 40.8% of total food grain and remains the principal source of livelihood for more than 58% of the population. With the area under rice stabilizing at around 42 million hectares, productivity plateauing or declining, especially in the Northern and Southern zones, and natural resource bases shrinking, the only opportunities for sustaining the current level of sufficiency are seen in the vast underexploited potential of rainfed Eastern India [1].
A proper trend analysis and forecast of the production of such an important crop in the potential Eastern Region is significant on many counts. Critical analysis of production and productivity is a prerequisite for a sound knowledge base on the ecology and for research and development efforts aimed at harvesting the maximum possible potential. Trend analysis has been attempted for crops like papaya and garlic by several authors [2][3][4]. An unexpected decrease in production reduces the marketable surplus and the income of farmers and leads to a price rise. Similarly, an increase in production can lead to a sharp decrease in prices and has an adverse effect on farmers' incomes. The price of an essential commodity plays a significant role in determining the inflation rate, wages, salaries, and various policies in an economy. A proper forecast would pave the way for appropriate surplus and deficit management to stabilize the price and ensure profits for the farmers.
Several techniques like simulation modelling and remote sensing are widely used for forecasting crop yield and acreage. But sometimes forecasts are needed well before the crop harvest or even before planting. This can be achieved only by modelling the past data and deriving predictions from the model. Autoregressive integrated moving average (ARIMA) models are built from past data and used to make such predictions. ARIMA models have been developed to forecast the cultivable area, production, and productivity of various crops of Tamil Nadu [5,6] and wheat production in Pakistan [7] and Canada [8]. Univariate forecasting of state-level agricultural production has also been carried out by various authors using ARIMA models [9][10][11][12]. Keeping the above requirement in view, the present study was carried out to (i) analyze the trends of production, productivity, and area under rice in Odisha, an Eastern Indian state, and compare them with the all-India scenario and (ii) forecast and validate the rice area, production, and productivity using ARIMA models.

Trend Analysis.
The time series data on rice area, productivity, and production in Odisha as well as India were analyzed using the Mann-Kendall trend test to assess the trend present in the data. The test was first used by Mann [13], and Kendall [14] subsequently derived the distribution of the test statistic [15,16]. This hypothesis test is a nonparametric, rank-based method for evaluating the presence of trends in time series data. The data are ranked according to time, and each data point is then successively treated as a reference point and compared with all data points that follow it in time. Compared with parametric statistical tests, nonparametric tests are thought to be more suitable for nonnormally distributed data [17]. Since the time series data used in the study are mostly nonnormally distributed, as evident from the skewness and kurtosis values given in Table 1, nonparametric tests were used in the study.
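The Mann-Kendall test statistic is given by

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i), \tag{1}$$

where $x_i$ and $x_j$ are the sequential data values, $n$ is the data set record length, and

$$\operatorname{sgn}(\theta) = \begin{cases} +1 & \text{if } \theta > 0, \\ 0 & \text{if } \theta = 0, \\ -1 & \text{if } \theta < 0. \end{cases} \tag{2}$$

The Mann-Kendall test has two parameters that are of importance to trend detection. These are the significance level, which indicates the trend's strength, and the slope magnitude estimate, which indicates the direction as well as the magnitude of the trend.

For independent, identically distributed random variables with no tied data values, we have $E(S) = 0$ and

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5)}{18}. \tag{3}$$

When some data values are tied, the correction to $\operatorname{Var}(S)$ is

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{i=1}^{n} t_i \, i(i-1)(2i+5)}{18}, \tag{4}$$

where $t_i$ denotes the number of ties of extent $i$. For $n$ larger than 10, the test statistic

$$Z = \begin{cases} \dfrac{S - 1}{[\operatorname{Var}(S)]^{0.5}} & \text{for } S > 0, \\ 0 & \text{for } S = 0, \\ \dfrac{S + 1}{[\operatorname{Var}(S)]^{0.5}} & \text{for } S < 0 \end{cases} \tag{5}$$

follows the standard normal distribution [14]. The magnitude of trend slopes can also be calculated (Sen, 1968). Sen's estimate for slope is associated with the Mann-Kendall test as follows:

$$T_i = \frac{x_j - x_k}{j - k}, \tag{6}$$

where $x_j$ and $x_k$ are the data values at times $j$ and $k$ ($j > k$), respectively. The median of these $N$ values of $T_i$ is Sen's estimator of slope, which is given as

$$Q = \begin{cases} T_{(N+1)/2} & \text{if } N \text{ is odd}, \\ \dfrac{1}{2}\left(T_{N/2} + T_{(N+2)/2}\right) & \text{if } N \text{ is even}. \end{cases} \tag{7}$$

A positive value of $Q$ indicates an upward trend, whereas a negative value represents a downward trend.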
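The Mann-Kendall statistic and Sen's slope described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the procedure the authors ran; the function names and the sample yields are hypothetical. The tie correction subtracts m(m-1)(2m+5) for every group of m equal values, which is equivalent to the sum over tie extents in the variance formula.

```python
import math
from collections import Counter
from statistics import median


def sign(value):
    """Return +1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def mann_kendall(series):
    """Return (S, Var(S), Z) for observations ordered in time."""
    n = len(series)
    # S: sum of the signs of all later-minus-earlier differences.
    s = sum(sign(series[j] - series[i])
            for i in range(n - 1) for j in range(i + 1, n))
    # Tie correction: each group of m equal values subtracts m(m-1)(2m+5).
    tie_term = sum(m * (m - 1) * (2 * m + 5)
                   for m in Counter(series).values() if m > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


def sen_slope(series):
    """Median of all pairwise slopes (x_j - x_k) / (j - k) with j > k."""
    n = len(series)
    slopes = [(series[j] - series[k]) / (j - k)
              for k in range(n - 1) for j in range(k + 1, n)]
    return median(slopes)


# Hypothetical yearly yields (kg per hectare), for illustration only.
yields = [520, 540, 535, 560, 590, 585, 610, 640, 655, 650, 690, 720]
s, var_s, z = mann_kendall(yields)
print(f"S = {s}, Var(S) = {var_s:.2f}, Z = {z:.2f}")
print(f"Sen's slope = {sen_slope(yields):.2f} per year")
```

A value of |Z| above 1.96 would indicate a trend significant at the 5% level under the normal approximation, which the text applies for record lengths above 10.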

ARIMA Model.
The ARIMA model analyzes and forecasts equally spaced univariate time series data. An ARIMA model predicts a value in a response time series as a linear combination of its own past values and past errors. The ARIMA approach was first popularized by Box and Jenkins [18], and ARIMA models are often referred to as Box-Jenkins models. In this study, the ARIMA analysis was divided into three stages [19]: identification, estimation and diagnostic checking, and forecasting.

Notation for Pure ARIMA Models.
Consider

$$W_t = \mu + \frac{\theta(B)}{\Phi(B)} a_t,$$

where $t$ indexes time, $W_t$ is the response series $Y_t$ or a difference of the response series, $\mu$ is the mean term, $B$ is the backshift operator, that is, $B X_t = X_{t-1}$, and $\Phi(B)$ is the autoregressive operator, represented as a polynomial in the backshift operator: $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$. Here $\theta(B)$ denotes the moving average operator and $a_t$ the random error.

Identification Stage. Candidate ARIMA models were identified by finding initial values for the orders of the nonseasonal parameters "p" and "q." These were obtained by looking for significant spikes in the autocorrelation and partial autocorrelation functions. At the identification stage, one or more models that seemed to provide statistically adequate representations of the available data were tentatively chosen. Precise estimates of the model parameters were then obtained by least squares.
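The identification step can be illustrated with a short sketch that differences a series and looks for spikes in its sample ACF and PACF. It is only an assumption-laden illustration: the study used SAS, the series below is hypothetical, the PACF is computed with the Durbin-Levinson recursion, and a spike is called significant when it exceeds the approximate 95% band of 1.96 divided by the square root of the series length.

```python
import math


def difference(series, d=1):
    """Apply simple differencing (1 - B)^d to a sequence."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_new
    return result


def significant_lags(values, n):
    """Lags (>= 1) whose coefficient lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k >= 1 and abs(v) > bound]


# Hypothetical annual series, for illustration only.
series = [4.0, 4.1, 4.3, 4.2, 4.5, 4.4, 4.7, 4.9, 4.8, 5.1,
          5.0, 5.3, 5.6, 5.5, 5.8, 6.0, 5.9, 6.2, 6.5, 6.4]
w = difference(series, d=1)
print("ACF spikes at lags:", significant_lags(acf(w, 5), len(w)))
print("PACF spikes at lags:", significant_lags(pacf(w, 5), len(w)))
```

The same `acf` function can be applied to the residuals of a fitted model; if no lag falls outside the band, the residuals are consistent with white noise.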
Estimation Stage. ARIMA models were fitted and the accuracy of each model was tested on the basis of diagnostic statistics.
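To make the estimation step concrete, the sketch below fits a first-order autoregressive model by conditional least squares and computes AIC and SBC from the residuals. It rests on several assumptions: the series is hypothetical, the model is restricted to AR(1) with a constant so that the least squares solution has a closed form, and the criteria use the common definitions AIC = -2 ln L + 2k and SBC = -2 ln L + k ln n with a Gaussian likelihood. The values SAS reports may be computed differently.

```python
import math


def fit_ar1_least_squares(w):
    """Conditional least squares fit of w_t = c + phi * w_{t-1} + a_t."""
    x = w[:-1]  # lagged values
    y = w[1:]   # current values
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    residuals = [yi - c - phi * xi for xi, yi in zip(x, y)]
    return c, phi, residuals


def information_criteria(residuals, k):
    """AIC and SBC from a Gaussian likelihood at the residual variance.

    k is the number of estimated parameters.
    """
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    aic = -2 * log_lik + 2 * k
    sbc = -2 * log_lik + k * math.log(n)
    return aic, sbc


# Hypothetical differenced series W_t, for illustration only.
w = [0.1, 0.2, -0.1, 0.3, -0.1, 0.3, 0.2, -0.1, 0.3, -0.1, 0.3, 0.3]
c, phi, residuals = fit_ar1_least_squares(w)
aic, sbc = information_criteria(residuals, k=2)
print(f"c = {c:.3f}, phi = {phi:.3f}, AIC = {aic:.2f}, SBC = {sbc:.2f}")
```

When several candidate orders are fitted to the same observations, the one with the smallest AIC and SBC would be preferred, which is the selection rule the study applies.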
Diagnostic Checking. The best model was selected based on the following diagnostics.
In the MAPE formula (Section 2.5), $F_t$ is the forecasted value, $A_t$ is the actual value, and $n$ is the number of values compared. SAS 9.2 software (SAS Institute, Inc., Cary, NC) was used for time series analysis, for developing the ARIMA models, and for forecasting.

Results and Discussion
3.1. Trend Analysis. Descriptive statistics for the time series data of rice area, production, and productivity for both Odisha and India are given in Table 1. The time series data are plotted in Figure 1. The time series data for rice area, production, and productivity are nonnormal, as can be seen from their probability density plots and the values of skewness and kurtosis. Hence the nonparametric Mann-Kendall test was performed to test the significance of the trend. As evident from the values of the Mann-Kendall Z statistic and Sen's slope estimate (Q), the time series data for all the parameters selected for analysis showed a significant positive trend. The Mann-Kendall Z values as well as the magnitudes of the slopes indicated that the rate of increase in area, production, and productivity was lower in Odisha than in the all-India scenario.
The trend in the long-term time series data (1950-51 to 2006-07) for the area under rice was found to be positive, with Sen's slope (Q) values of 0.01 and 0.26 for Odisha and India, respectively. The low Q values can be explained by the fact that the area under rice has remained more or less constant for the last 10 years owing to competition from urbanization and industrialization. The area under rice in India was 43.45 and 43.81 million hectares in 1997-98 and 2006-07, respectively, while over the same period the area under rice in Odisha fell from 4.50 to 4.45 million hectares. The area under rice has evidently plateaued in the last decade, and the only option available for increasing rice production is vertical expansion, that is, higher productivity.
Trend analysis also showed a considerable increase in the all-India average productivity of rice, from 668 kg ha⁻¹ in 1950-51 to 2131 kg ha⁻¹ in 2006-07; during the same period, rice productivity in Odisha increased from 520 kg ha⁻¹ to 1557 kg ha⁻¹. The rate of increase of productivity in Odisha is lower than the all-India average, as evident from Sen's slope estimates of 25.32 kg ha⁻¹ year⁻¹ and 16.39 kg ha⁻¹ year⁻¹ for India and Odisha, respectively, indicating an untapped growth potential for rice in Odisha. To tap this potential, the Government of India launched the programme "Bringing Green Revolution in Eastern India" in 2010-11.
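As a rough cross-check on these slopes, the crude end-point rate of change over the 56 yearly intervals between 1950-51 and 2006-07 can be computed from the productivity figures quoted above. This is not Sen's estimator, which is the median of all pairwise slopes, so the numbers are expected to differ somewhat:

```latex
\begin{align*}
\text{India:}\quad & \frac{2131 - 668}{56} \approx 26.1\ \text{kg ha}^{-1}\,\text{year}^{-1}
  && (\text{Sen's slope: } 25.32) \\
\text{Odisha:}\quad & \frac{1557 - 520}{56} \approx 18.5\ \text{kg ha}^{-1}\,\text{year}^{-1}
  && (\text{Sen's slope: } 16.39)
\end{align*}
```

Both measures give the same ordering, with productivity in Odisha rising at roughly two-thirds of the all-India rate.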

Building ARIMA Models.
The autoregressive (p) and moving average (q) parameters were identified from the significant spikes in the PACF and ACF plots of the different time series. In identifying the best-fit ARIMA models, the values of p, d, and q were chosen to give the minimum value of the selection criteria, that is, AIC and SBC. The best-fit models for rice area, production, and productivity of Odisha and India, along with their AIC and SBC values, are given in Table 2. The estimates of the autoregressive and moving average parameters, along with the constant term, are presented in Table 3. All the parameter estimates were significant, which is an essential criterion for ARIMA models. It is evident from Figure 2(a) that the ACF of area under rice for India has a significant spike at lag 1, while the PACF declines gradually (Figure 3(a)), which indicates a moving average model of first order. Similarly, significant spikes at lag 2 in the PACF of rice productivity and production of India indicate a second-order autoregressive model, and ARIMA (2, 1, 0) was found to be the best fit. A significant spike at lag 2 of the PACF (Figure 3(d)) and a gradually declining ACF (Figure 2(d)) for area under rice in Odisha indicated a pure autoregressive model of order 2, and ARIMA (2, 0, 0) was found to be the best fit. Significant spikes at lag 1 in both the ACF (Figures 2(e) and 2(f)) and the PACF (Figures 3(e) and 3(f)) indicated a first-order autoregressive as well as moving average model for both productivity and production of Odisha. The ACF and PACF of the residuals of the fitted models lay within the limits, which showed that the ARIMA models fitted well.

[...] and 2190.89 kg ha⁻¹ with 3.67, 0.20, and −3.10% deviations in prediction, respectively (Table 4). The total forecasted production of rice in Odisha was 6.77, 6.95, and 6.99 million tonnes for the years 2007-08, 2008-09, and 2009-10, with prediction deviations of 11.62, −2.06, and −1.01%, respectively. The large deviation for 2007-08 was due to the high average productivity in that year. Similarly, the deviations in the production forecasts for India were 4.62, 3.95, and −7.67%, respectively. The % error in prediction for area under rice varied from 0.07 to −5.80 and −0.22 to −1.60 for India and Odisha, respectively. The % deviation in prediction for rice productivity was 0.20 to 3.67 and −1.02 to 12.56% for India and Odisha, respectively. The % deviation in prediction for production of rice varied from 3.95 to −7.67 and −1.01 to 11.62% for India and Odisha, respectively. The MAPE was within 6% for all the forecasted parameters for Odisha as well as for India.
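As a rough consistency check on the MAPE statement, and assuming each percentage deviation is measured relative to the actual value, the MAPE over the three forecast years is simply the mean of the absolute deviations. Using the production deviations reported above (the values in Table 4 may differ slightly because of rounding):

```latex
\begin{align*}
\mathrm{MAPE}_{\text{Odisha, production}} &\approx \frac{11.62 + 2.06 + 1.01}{3} \approx 4.90\% \\
\mathrm{MAPE}_{\text{India, production}} &\approx \frac{4.62 + 3.95 + 7.67}{3} \approx 5.41\%
\end{align*}
```

Both values fall below the 6% bound stated in the text.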

Conclusions
The trend analysis of the rice data showed an increasing trend in productivity and production for both Odisha and India, although the rate of increase was lower in Odisha than the all-India average. This may be attributed to underexploitation of the potential of the state owing to low input in agricultural operations and other biotic and abiotic factors. To bridge the gap between existing and potential productivity, rice varieties suited to different ecologies can be introduced in farmers' fields along with appropriate nutrient and agronomic management practices. Based on the forecasting and validation results, it may be concluded that ARIMA models can be used successfully for forecasting rice area, production, and productivity of Odisha as well as India for the immediate subsequent years.

Highlights
(i) Trend analysis of rice area, production, and productivity of Odisha vis-à-vis India was carried out from the historical data of 1950-51 to 2008-09.
(ii) Forecasting of rice area, production, and productivity of Odisha vis-à-vis India was made from the historical data using ARIMA models.

(ii) Insignificance of Autocorrelations for Residuals. If a model is an adequate representation of a time series, it should capture all the correlation in the series, and the white noise residuals should be independent of each other.
(iii) Significance of the Parameters. Significance tests for parameter estimates indicate whether some terms in the model might be unnecessary.

Forecasting Stage. Future values of the time series are forecasted.

2.5. Model Evaluation. The mean absolute percent error (MAPE) as defined below was used as a measure of accuracy of the models:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{\left| A_t - F_t \right|}{A_t}$$
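The MAPE measure can be written as a small function. This is a minimal sketch; the function name and the sample values are hypothetical and are not taken from Table 4.

```python
def mape(actual, forecast):
    """Mean absolute percent error between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical actual and forecast production (million tonnes).
actual = [7.5, 6.8, 6.9]
forecast = [6.9, 7.0, 7.0]
print(f"MAPE = {mape(actual, forecast):.2f}%")
```

Because each error is divided by the actual value, the measure is scale-free, which is what allows area, productivity, and production forecasts to be compared on the same percentage footing.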

Figure 1: Trend of area, productivity, and production of rice in Odisha and India. (a) Area, million hectares; (b) productivity, kg ha⁻¹; and (c) production, million tonnes.

3.3. Forecast Using ARIMA Models. The observed and predicted values for rice area, production, and productivity along with the percentage of deviation are presented in Table 4.
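How an ARIMA (1, 1, 1) model turns estimated parameters into multi-year forecasts can be sketched as follows. The recursion assumes the differenced series follows W_t = mu + phi (W_{t-1} - mu) + a_t - theta a_{t-1}, recovers past disturbances conditional on a zero pre-sample shock, and then accumulates the forecast differences back onto the last observed level. The parameter values and the series are hypothetical, not the estimates in Table 3.

```python
def forecast_arima_111(y, mu, phi, theta, steps):
    """Forecast `steps` values of y under an ARIMA(1, 1, 1) model."""
    # First differences W_t = Y_t - Y_{t-1}.
    w = [y[t] - y[t - 1] for t in range(1, len(y))]

    # Recover the disturbances a_t recursively (pre-sample shock set to 0).
    a_prev = 0.0
    for t in range(1, len(w)):
        predicted = mu + phi * (w[t - 1] - mu) - theta * a_prev
        a_prev = w[t] - predicted

    forecasts = []
    w_last, level = w[-1], y[-1]
    for h in range(1, steps + 1):
        # The moving average term only affects the one-step forecast,
        # because future shocks have zero expectation.
        shock_term = theta * a_prev if h == 1 else 0.0
        w_hat = mu + phi * (w_last - mu) - shock_term
        level += w_hat  # undo the differencing
        forecasts.append(level)
        w_last = w_hat
    return forecasts


# Hypothetical production series (million tonnes) and parameters.
production = [5.2, 5.9, 5.4, 6.3, 6.0, 6.8, 6.5, 6.9]
print(forecast_arima_111(production, mu=0.2, phi=-0.4, theta=0.6, steps=3))
```

Beyond the first step the forecast differences decay geometrically towards mu, so the forecast level settles onto a straight line with slope mu. This is one reason such models are best suited to the immediate subsequent years.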

Table 1: Descriptive statistics and Mann-Kendall trend analysis test for the time series data of rice cultivation in Odisha and India from 1950-51 to 2006-07.
$\theta(B)$ is the moving average operator, represented as a polynomial in the backshift operator: $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and $a_t$ is the independent disturbance, also called the random error. For simple differencing, $W_t = (1 - B)^d Y_t$, where $d$ is the order of differencing.

Table 2: Autoregressive integrated moving average (ARIMA) models fitted for time series data on rice area, productivity, and corresponding selection criterion, that is, Akaike information criteria (AIC) and Schwarz-Bayesian information criteria (SBC).

Table 3: Final estimates of parameters of autoregressive integrated moving average (ARIMA) models fitted for time series data on rice area, productivity, and production for Odisha and India.

Table 4: Performance of autoregressive integrated moving average (ARIMA) models for rice area, productivity, and production for Odisha and India. MAPE: mean absolute percent error.