Bayesian Methods for Predicting the Shape of Chinese Yam in Terms of Key Diameters

1Research Center for Global Agromedicine, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro 080-8555, Japan 2Department of Human Sciences, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro 080-8555, Japan 3Institute of Agricultural Machinery, BRAIN, National Agricultural and Food Research Organization, 1-40-2, Nisshin-cho, Kita, Saitama-shi 331-8537, Japan


Introduction
Chinese yam (Dioscorea opposita) is one of the most exported crops from Japan. The value of yam exports reached 1.89 billion JPY in 2013 [1]. About 90% of the total yield of yam in Japan was produced in two prefectures, Hokkaido (45.8%) and Aomori (44.0%), in 2012 [2]. In both prefectures, mechanical cultivation is used for rapid expansion of production. However, seed yams (seed tubers of yams), which are yams cut into uniform pieces (Figure 1), are produced manually and require 300 person·h/ha of labor. In order to reduce the cost of production and improve the yield of yams, mechanization of seed yam production is required.
The problem in the mechanization of seed yam production is how to determine the cutoff positions for each yam. A yam is expected to be cut uniformly into pieces of a desired weight without much loss. Therefore, under the assumption of equal density among yams, the shape of the yam must be measured, since the weight of each seed yam can be calculated from the shape and the cutoff positions.
A straightforward way to measure the shape of a yam is to scan it using sensors. However, this approach has three problems: (1) the cost of the sensor, (2) the speed of the process, and (3) the accuracy of the scanning (e.g., trichomes of a yam can reduce the accuracy of the scanning). Another way is to use images of yams for shape determination. Such an approach has been widely used in fruit/crop grading, classification, and removal before shipment [3][4][5][6], and computational and statistical methodologies have been provided [7][8][9][10][11][12][13][14][15][16]. In the case of producing seed yams, the problem is much simpler than the general problem mentioned above for fruits and crops: we can assume a regular pattern of yams (see Figure 1) and do not have to strictly check yam damage, because the purpose here is to know the shape of yams quickly without the use of many devices (i.e., in a low-cost way).
In this paper, we propose a Bayesian framework to address issues (1) and (2), that is, to provide a low-cost and high-speed way to predict the shape of a yam. Our hypothesis is that the shape of a yam can be predicted from a few key diameters at fixed positions, under the assumption that the shape can be represented by a set of diameters. In order to examine this hypothesis, we need to construct a model that relates the diameters to be predicted to the key diameters, which can be measured. A difficulty in the model construction is that the measurements of diameters for each sample are insufficient and unsteady. Thus, we introduce a Bayesian framework to relieve this difficulty.
The Bayesian method is a technique for statistical inference that updates a prior probability distribution for the random parameters of a model on the basis of observations. With Bayesian inference, we can set up a prior distribution for the parameters from information that is available in advance and thereby obtain robust parameter estimates even when observations are lacking, so the Bayesian method is especially useful when observational data are insufficient for estimation. For this reason, methods of Bayesian data analysis are widely applied (e.g., [17]). Bayesian inference is particularly important in time series analysis. For example, [18] proposed an approach of Bayesian smoothness priors for analyzing time-varying structure in a dynamic system; it is useful when some data in the time series are missing. In this paper, we apply the technique of smoothness priors to the problem of shape prediction of Chinese yam.
The proposed method estimates the whole shape of a yam based on a few measurements of the key diameters of the yam. The two issues regarding the measurement of the shape of yams are overcome by the proposed method, since the diameters of a yam are easily and accurately measured without any sensors. We estimated the optimal positions of the diameters to be measured by minimizing the error of the shape prediction. We also illustrated the high performance of the proposed method in estimating the shape of yams using a sample data set, which contains the length, weight, and diameters at intervals of 10 to 50 mm (Figure 2, see also Section 2.2) of 111 yams from Hokkaido, Japan. After the construction of the proposed method using the sample data set, the method predicts the whole shape of a yam from a few key diameters without any scanners or images of the yam.
The rest of this paper is organized as follows. Section 2 describes the procedures for implementing the proposed methods, Section 3 presents the results obtained from the sample data set, and Section 4 discusses the results and performance of the proposed methods. Finally, Section 5 concludes the paper.

Basic Consideration.
In this section, we introduce our sample data set and proposed methods. After the construction of the proposed methods using the sample data set, the methods predict the whole shape of a yam, which can be expressed by all the diameters along the length of a yam tuber shaft, based on a few key diameters that can be measured in advance.
We developed Bayesian methods to predict the shape of a yam in three steps (Steps 0 to 2).
First of all, as Step 0 of our Bayesian methods, all yams are arranged into the [0, 1] interval (Figure 3). Because the number of diameters to be estimated exceeded that of the observations, we applied a Bayesian estimation model to estimate the missing diameters (Step 1; Figure 4). In Step 2, we constructed a predictive model based on the observed diameters and the diameters estimated in Step 1.
The details of the sample data set and proposed methods are explained in the following subsections.

Sample Data Set.
In this study, we used data from 111 (= $M$) yams in Hokkaido, Japan, to construct Bayesian models. Each yam had measurements of length (mm), weight (g), and diameters (mm) at suitable positions (Figure 2 and description below). All yams were automatically cut off at the position with a diameter of 25 mm (Figure 2). The mean length, weight, and diameter were 451.86 (±64.31) mm, 783.24 (±205.67) g, and 44.30 (±14.43) mm, respectively. The diameters were measured at intervals of 25 mm for 87 yams and 50 mm for 24 yams. Out of the 87 yams, 60 had detailed measurements of the diameter at intervals of 10 mm at the front edge of the yam. A scatterplot of the length and weight of the 111 yams in this study is shown in Appendix A. Length and weight were highly correlated with each other (Pearson correlation coefficient $r$ = 0.739, $p$ < 0.001), implying high quality of the data for model construction.

Step 1: Bayesian Estimation Model for Estimating Missing Diameters.
For a sample yam $i$, we consider the model for the observation of the diameter at the $n$-th point as follows:
$$y_{in} = d_{in} + \varepsilon_{in}, \quad i = 1, 2, \ldots, M, \; n = 1, 2, \ldots, N, \tag{1}$$
where $y_{in}$, $d_{in}$, and $\varepsilon_{in}$ are the observed diameter, true diameter, and measurement error, respectively, $M$ is the number of yams in the sample, and $N$ is the number of equally spaced points at which the true diameter is to be estimated. Note that when there is an observation near the $n$-th point, we regard it as the measurement $y_{in}$; otherwise we consider $y_{in}$ to be missing.
A difficulty in estimating the unknown quantities $d_{in}$ for $i = 1, 2, \ldots, M$ and $n = 1, 2, \ldots, N$ is that the number of unknown quantities to be estimated is larger than the number of observations; that is, we have too many missing values for the diameters. In order to alleviate this difficulty, we used a Bayesian model. Here, from the viewpoint of a Bayesian approach, $d_{in}$ is treated as a random variable. It is assumed that the distribution of this variable can be described with stochastic difference equations that are called smoothness priors [18]. For a given sample $i$, we express the smoothness priors for $d_{in}$ by a second-order stochastic difference equation as
$$d_{in} - 2 d_{i,n-1} + d_{i,n-2} = v_{in}. \tag{2}$$
In (1) and (2), $\varepsilon_{in} \sim N(0, \sigma^2)$ and $v_{in} \sim N(0, \tau_i^2)$ are white noise sequences on $n$, and they are independent of each other, where $\sigma^2$ and $\tau_i^2$ are unknown parameters. By introducing the smoothness priors described in (2) into the model in (1), we can construct a set of flexible Bayesian linear models for $d_{in}$. Now, we put
$$z_n = \begin{pmatrix} d_{in} \\ d_{i,n-1} \end{pmatrix}, \quad F = \begin{pmatrix} 2 & -1 \\ 1 & 0 \end{pmatrix}, \quad G = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad H = \begin{pmatrix} 1 & 0 \end{pmatrix}. \tag{3}$$
Then, the model in (1) and (2) can be expressed by the following state space model:
$$z_n = F z_{n-1} + G v_{in}, \qquad y_{in} = H z_n + \varepsilon_{in}. \tag{4}$$
In the state space model (4), the parameter $d_{in}$ is included in the state vector $z_n$, so its estimate can be obtained from the estimate of $z_n$. Moreover, the variances $\sigma^2$ and $\tau_i^2$ can be estimated by the maximum likelihood method. The above Bayesian model for estimating the diameters of yams was first introduced in [19] for another application.
When the parameters $\sigma^2$ and $\tau_i^2$ are given, we can obtain the estimate of $z_n$ using the Kalman filter followed by fixed-interval smoothing. The estimates of $\sigma^2$ and $\tau_i^2$ are obtained by maximizing a likelihood function that is defined based on the Kalman filter. See Appendix B for the Kalman filter algorithm and Appendix C for details of the estimation of $\sigma^2$ and $\tau_i^2$. See also [18,20].
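As an illustration of Step 1, the following is a minimal sketch of a Kalman filter with fixed-interval smoothing for the second-order smoothness prior, assuming the standard state space matrices F = [[2, -1], [1, 0]], G = (1, 0)^T, and H = (1, 0). The function name `smooth_diameters`, the diffuse initial covariance `c0`, and the choice of the first observed diameter as the initial level are assumptions made for this example, not details taken from the paper. Missing diameters are encoded as NaN, and the filter step is skipped for them.

```python
import numpy as np

# State space form of the 2nd-order smoothness prior (assumed standard form):
#   z_n = F z_{n-1} + G v_n,   y_n = H z_n + eps_n
F = np.array([[2.0, -1.0], [1.0, 0.0]])
G = np.array([[1.0], [0.0]])
H = np.array([[1.0, 0.0]])


def smooth_diameters(y, sigma2, tau2, c0=1e4):
    """Return smoothed means and variances of the diameters d_n.

    y      : 1-D array of diameters on the equally spaced grid (NaN = missing)
    sigma2 : variance of the measurement error
    tau2   : variance of the system noise in the smoothness prior
    c0     : scale of the diffuse initial covariance (an assumption)
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    first = y[~np.isnan(y)][0]             # crude initial level (an assumption)
    z_f = np.array([first, first])
    C_f = c0 * np.eye(2)
    zp, Cp = np.zeros((N, 2)), np.zeros((N, 2, 2))   # one-step-ahead predictions
    zf, Cf = np.zeros((N, 2)), np.zeros((N, 2, 2))   # filtered estimates
    for n in range(N):
        # One-step-ahead prediction
        z_p = F @ z_f
        C_p = F @ C_f @ F.T + tau2 * (G @ G.T)
        if np.isnan(y[n]):
            # Missing observation: skip the filter step
            z_f, C_f = z_p, C_p
        else:
            # Filter step
            w = (H @ C_p @ H.T).item() + sigma2       # prediction error variance
            K = (C_p @ H.T) / w                       # Kalman gain, shape (2, 1)
            z_f = z_p + K[:, 0] * (y[n] - (H @ z_p).item())
            C_f = (np.eye(2) - K @ H) @ C_p
        zp[n], Cp[n], zf[n], Cf[n] = z_p, C_p, z_f, C_f
    # Fixed-interval smoothing (backward pass)
    zs, Cs = zf.copy(), Cf.copy()
    for n in range(N - 2, -1, -1):
        A = Cf[n] @ F.T @ np.linalg.inv(Cp[n + 1])
        zs[n] = zf[n] + A @ (zs[n + 1] - zp[n + 1])
        Cs[n] = Cf[n] + A @ (Cs[n + 1] - Cp[n + 1]) @ A.T
    # First state element is d_n; (1, 1) covariance element is its variance
    return zs[:, 0], Cs[:, 0, 0]
```

Calling `smooth_diameters(y, sigma2, tau2)` on one yam returns the estimated diameters at every grid point together with their posterior variances, which are the quantities used by the predictive models in Step 2.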

Step 2: Bayesian Predictive Model for Shape Prediction Using Key Diameter(s).
In this section, we propose three models for predicting the shape of a yam based on the results estimated from a set of samples. Let $y_i(x)$ be a key diameter at position $x$ (mm) from the tip of the $i$-th yam (cf. Figure 5). Also, let $y_i(x_1)$ and $y_i(x_2)$ be the key diameters at positions $x_1$ (mm) and $x_2$ (mm) from the tip of the $i$-th yam.

Weighted Averaging (WA).
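We aim to predict the diameters at all points $n = 1, 2, \ldots, N$ of a yam from the key diameters $y_i(x)$.
Defining $\tilde{d}_{in}(x) = \hat{d}_{in} / y_i(x)$ and $\tilde{V}_{in}(x) = V_{in} / y_i^2(x)$, the posterior distribution of the normalized diameter $d_{in} / y_i(x)$ is given by $N(\tilde{d}_{in}(x), \tilde{V}_{in}(x))$, where $\hat{d}_{in}$ is given by the first element of $z_{n|N}$, and $V_{in}$ is given by the (1, 1) element of $C_{n|N}$, which were obtained from the fixed-interval smoothing mentioned above. The weighted average of the diameters is then calculated by
$$\bar{d}_n(x) = \frac{\sum_{i=1}^{M} \tilde{V}_{in}^{-1}(x) \, \tilde{d}_{in}(x)}{\sum_{i=1}^{M} \tilde{V}_{in}^{-1}(x)}, \tag{5}$$
which can be regarded as the standard shape of the average yam.
Then, for a yam with the value $y^{*}(x)$ for the key diameter $y(x)$, its predicted diameter value at point $n$ is given by
$$y_n^{*}(x) = \bar{d}_n(x) \, y^{*}(x). \tag{6}$$

Single Regression Model (S-RM).
Based on the estimated value $\hat{d}_{in}$ of the diameter $d_{in}$ and the value of $y_i(x)$, a single regression model is built as
$$\hat{d}_{in} = a_n + b_n y_i(x) + e_{in}. \tag{7}$$
Then, we can obtain the estimates $\hat{a}_n$ and $\hat{b}_n$ of the regression coefficients $a_n$ and $b_n$ at point $n$ using a least squares method. For a given yam with a key diameter $y^{*}(x)$, the predictive value of the diameter at point $n$ is obtained by $y_n^{*}(x) = \hat{a}_n + \hat{b}_n y^{*}(x)$.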
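The weighted averaging step can be sketched as follows. This is a minimal illustration that assumes inverse-variance weights (each normalized diameter is weighted by the reciprocal of its normalized posterior variance); the function names `standard_shape` and `predict_wa` and the array layout (one row per yam, one column per grid point) are hypothetical choices for the example.

```python
import numpy as np


def standard_shape(d_hat, V, y_key):
    """Weighted average of normalized diameters across yams.

    d_hat : (M, N) smoothed diameters from Step 1
    V     : (M, N) posterior variances of those diameters
    y_key : (M,)   key diameter of each yam at the chosen position x
    Inverse-variance weighting is an assumption of this sketch.
    """
    d_hat = np.asarray(d_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    y_key = np.asarray(y_key, dtype=float)
    d_tilde = d_hat / y_key[:, None]          # normalized diameters
    V_tilde = V / y_key[:, None] ** 2         # normalized variances
    w = 1.0 / V_tilde                         # inverse-variance weights
    return (w * d_tilde).sum(axis=0) / w.sum(axis=0)


def predict_wa(shape, y_star):
    """Scale the standard shape by the key diameter of a new yam."""
    return np.asarray(shape) * y_star
```

For a new yam, `predict_wa(standard_shape(d_hat, V, y_key), y_star)` gives the predicted diameters at all grid points from the single measured key diameter `y_star`.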

Multiple Regression Model (M-RM).
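Based on the estimated value $\hat{d}_{in}$ of the diameter $d_{in}$ and the values of $y_i(x_1)$ and $y_i(x_2)$, a multiple regression model is built as
$$\hat{d}_{in} = a_n + b_n y_i(x_1) + c_n y_i(x_2) + e_{in}. \tag{8}$$
Then, the predictive value of the diameter at point $n$ is obtained using the relation $y_n^{*} = \hat{a}_n + \hat{b}_n y^{*}(x_1) + \hat{c}_n y^{*}(x_2)$, with $\hat{a}_n$, $\hat{b}_n$, and $\hat{c}_n$ being the estimates of the regression coefficients $a_n$, $b_n$, and $c_n$, respectively.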
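The pointwise regressions can be sketched with ordinary least squares, fitting one regression per grid point in a single call. The sketch below uses `numpy.linalg.lstsq`; the function names `fit_pointwise_regression` and `predict_shape` are hypothetical, and the same code covers one key diameter (S-RM) or two key diameters (M-RM).

```python
import numpy as np


def fit_pointwise_regression(d_hat, keys):
    """Least squares fit of the diameters at every point on the key diameter(s).

    d_hat : (M, N) estimated diameters from Step 1
    keys  : (M,) or (M, p) key diameters (p = 1 for S-RM, p = 2 for M-RM)
    Returns a (p + 1, N) array; row 0 holds the intercepts, the remaining
    rows hold the slopes for each key diameter.
    """
    d_hat = np.asarray(d_hat, dtype=float)
    keys = np.asarray(keys, dtype=float).reshape(len(d_hat), -1)
    X = np.column_stack([np.ones(len(keys)), keys])   # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, d_hat, rcond=None)  # all N regressions at once
    return coef


def predict_shape(coef, key_star):
    """Predicted diameters at all points for a new yam with key diameter(s) key_star."""
    x = np.concatenate([[1.0], np.atleast_1d(key_star)])
    return x @ coef
```

Because the design matrix is the same at every point, solving all columns together is equivalent to running a separate least squares fit for each point.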

Evaluating the Performance of the Bayesian Methods.
As mentioned above, three kinds of predictive models were constructed. There were two issues related to these predictive models. One was how to determine the location parameters, that is, $x$ in the WA and S-RM models or $x_1$ and $x_2$ in the M-RM model. The other was how to evaluate these different models. A useful way to address these issues is the use of the mean squared error (MSE) as a criterion for evaluating the predictive models (see, e.g., [21]).
Specifically, for the WA and S-RM models, the MSE is defined by
$$\mathrm{MSE}(x) = \frac{1}{K} \sum_{i=1}^{M} \sum_{n \in J_i} \left( y_{in}^{*}(x) - y_{in} \right)^2, \tag{9}$$
where $y_{in}^{*}(x)$ is the predictive value of the diameter at the $n$-th point on the $i$-th yam with the location parameter $x$, $J_i$ is the index set $\{1, 2, \ldots, N\} \setminus I_i$ with the index set $I_i$ for missing values (so $y_{in}$ ($n \in J_i$) indicate the actual observations for the $i$-th yam), and $K = \sum_{i=1}^{M} |J_i| \le MN$ is the total number of indices with measurements. Thus, MSE($x$) is the mean squared difference between the predictive values and the observations of the diameters. Therefore, we can determine the location parameter $x$ by minimizing the value of MSE($x$) and then evaluate the predictive models based on the minimum values of MSE($x$).
Similarly, for the M-RM model, the MSE is defined by
$$\mathrm{MSE}(x_1, x_2) = \frac{1}{K} \sum_{i=1}^{M} \sum_{n \in J_i} \left( y_{in}^{*}(x_1, x_2) - y_{in} \right)^2, \tag{10}$$
where $y_{in}^{*}(x_1, x_2)$ is the predictive value of the diameter at the $n$-th point on the $i$-th yam with the location parameters $x_1$ and $x_2$.
The predictive model that attains the smallest of the minimum values of MSE($x$) and MSE($x_1$, $x_2$) is considered to be the best model.
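The evaluation criterion and the search for the location parameter can be sketched as below. Missing observations are encoded as NaN and excluded from the average, so the divisor is the number of actual measurements. The names `mse` and `best_location`, and the callable `predict_for` that returns the predictions for a candidate position, are hypothetical helpers introduced for this example.

```python
import numpy as np


def mse(pred, y_obs):
    """Mean squared error over the observed entries only (NaN = missing).

    pred, y_obs : (M, N) arrays of predicted and observed diameters
    """
    pred = np.asarray(pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    mask = ~np.isnan(y_obs)                 # indices with measurements
    return float(np.sum((pred[mask] - y_obs[mask]) ** 2) / mask.sum())


def best_location(candidates, predict_for, y_obs):
    """Grid search: return the candidate position with the smallest MSE.

    candidates  : sequence of candidate location parameters
    predict_for : callable mapping a candidate to an (M, N) prediction array
    """
    scores = [mse(predict_for(x), y_obs) for x in candidates]
    k = int(np.argmin(scores))
    return candidates[k], scores[k]
```

For M-RM, the same search can be run over pairs by passing tuples of two positions as the candidates.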

Results
First of all, as Step 0 of the proposed approach, the measurements of diameter were arranged at equal intervals with $N$ = 100. For example, for the $i$-th yam with a length of 500 mm and a measuring interval of 50 mm, we obtain the measurements of diameter as $\{y_{i,10}, y_{i,20}, \ldots, y_{i,100}\}$, and $\{y_{in}\}$ ($n \ne 10, 20, \ldots, 100$) are missing. We then applied the Bayesian estimation model to estimate the diameters at every $n = 1, 2, \ldots, 100$ as Step 1 of the proposed approach. In Step 2, predictive models were constructed using the estimated values of the parameters. Specifically, the three approaches to predicting yam shape, that is, WA, S-RM, and M-RM, were applied to obtain predictions of the diameters. We set the position $x$ (mm) of the key diameter $y_i(x)$ to 142.5, 145.0, ..., 270.0, and the MSE value was calculated for each value of $x$. In the case of M-RM, the two positions $x_1$ and $x_2$ defining the key diameters $y_i(x_1)$ and $y_i(x_2)$ were taken from {85.0, 87.5, ..., 142.0} and {142.5, 145.0, ..., 270.0}, respectively. The minimum MSE values of WA, S-RM, and M-RM were 18.62 (at $x$ = 257.5 mm), 15.71 (at $x$ = 235.0 mm), and 11.48 (at $x_1$ = 105.0 mm and $x_2$ = 255.0 mm), respectively. Thus the minimum MSE value was attained by M-RM at $x_1$ = 105.0 mm and $x_2$ = 255.0 mm. Figure 6 shows the change in the MSE value using the three methods. M-RM thus requires two key diameters, $y^{*}(105.0)$ and $y^{*}(255.0)$, of a new yam for whole shape prediction. Figures 8 and 9 show the observations together with the predictions of the diameters at each point using M-RM with two key diameters at $x$ = 105.0 and 255.0 mm for the shape of the 111 samples in this study.
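Step 0 can be sketched as a mapping from measurement positions to a grid of 100 equally spaced points. The rule used here, rounding the relative position to the nearest grid index, is an assumption of this example, as are the function name `to_unit_grid` and the NaN encoding of missing values.

```python
import numpy as np


def to_unit_grid(positions_mm, diameters_mm, length_mm, N=100):
    """Place the measured diameters of one yam on N equally spaced points.

    positions_mm : measurement positions along the yam (mm)
    diameters_mm : diameters measured at those positions (mm)
    length_mm    : length of the yam (mm), mapped to the [0, 1] interval
    Returns an array of length N; entry n - 1 holds the diameter at the
    n-th point and NaN marks a missing value.
    """
    y = np.full(N, np.nan)
    rel = np.asarray(positions_mm, dtype=float) / length_mm   # positions in [0, 1]
    idx = np.rint(rel * N).astype(int)                        # nearest grid index (assumed rule)
    for n, d in zip(idx, diameters_mm):
        if 1 <= n <= N:
            y[n - 1] = d          # a later measurement overwrites an earlier one
    return y
```

For a yam 500 mm long measured every 50 mm, the observed entries fall on the points n = 10, 20, ..., 100 and the other 90 points are missing, which matches the example in the text.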

Discussion
First, the three predictive models for yam shape prediction, WA, S-RM, and M-RM, which are constructed based on the result of the Bayesian estimation model, are compared in terms of MSE. Although WA is a simple approach compared with the other methods, it resulted in a small MSE value of 18.62 at $x$ = 257.5 mm. The regression methods performed better than WA: the minimum MSE values were 15.71 for S-RM (at $x$ = 235.0 mm) and 11.48 for M-RM (at $x_1$ = 105.0 mm and $x_2$ = 255.0 mm).
The quality of the sample data set is critical for the performance of the shape prediction. In our data set, yam length and weight were correlated with each other ($r$ = 0.739, $p$ < 0.001, Appendix A). This means that the yams had a uniform shape and there were no outliers with an irregular shape; if there were thick (short and heavy) and thin (long and light) yams, they would be plotted on the upper left and lower right of the scatterplot, respectively, and the correlation would be lower. The quality of the sample data set, which was used for the construction of M-RM, therefore seemed to be high enough for model construction.
The M-RM method performed well according to the MSE value (Figure 6) and visual inspection of the actual shape prediction (Figures 8 and 9). In order to evaluate the weight of the yams based on the predicted shape, we assumed that (a) each yam was circular in cross-section and (b) the shape changed linearly between each pair of positions. The weight was then estimated under assumptions (a) and (b) (Figure 10). M-RM successfully predicted the weight of the yams. Relatively high accuracy can be obtained by adequately treating the outliers (e.g., removing heavy yams with weight > 1200 g ≈ mean + 2SD). We believe that the Bayesian approaches in this paper are applicable not only to shape prediction of yams but also to other shape prediction problems in agriculture.

Figure 10: Measured and predicted weight using the proposed Bayesian method with the M-RM prediction model. The proposed Bayesian method successfully predicted not only the whole shape of yams (Figures 8 and 9) but also the weight of the yams.
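The weight estimate under assumptions (a) and (b) amounts to summing the volumes of truncated cones between neighboring positions and multiplying by a density. The sketch below illustrates this; the function name `weight_from_diameters` is hypothetical, and the default density of 1.0e-3 g/mm^3 (that of water) is a placeholder assumption, not a value reported in the paper.

```python
import math


def weight_from_diameters(positions_mm, diameters_mm, density_g_per_mm3=1.0e-3):
    """Estimate the weight (g) of a yam from diameters along its length.

    Assumes (a) a circular cross-section and (b) a diameter that changes
    linearly between neighboring positions, so each segment is a truncated
    cone. The default density is a placeholder, not a measured value.
    """
    volume = 0.0
    segments = zip(zip(positions_mm, diameters_mm),
                   zip(positions_mm[1:], diameters_mm[1:]))
    for (x0, d0), (x1, d1) in segments:
        h = x1 - x0
        # Volume of a truncated cone expressed with diameters d0 and d1
        volume += math.pi * h / 12.0 * (d0 * d0 + d0 * d1 + d1 * d1)
    return density_g_per_mm3 * volume
```

The same function applied to the segment between two cutoff positions gives the weight of a single seed yam, which is the quantity needed for choosing cutoff positions.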

Conclusion
This paper proposed Bayesian methods, which combine a Bayesian estimation model and a predictive model, for shape prediction of yams. The three predictive models we applied were weighted averaging (WA) and single and multiple regression models (S-RM and M-RM, resp.). The Bayesian method with the M-RM prediction model, using two diameters at the fixed positions $x$ = 105.0 and 255.0 mm, attained the highest performance in terms of the MSE value. After the construction of M-RM using the sample data set in this study, M-RM predicts the whole shape of a yam based on two key diameters. Measuring two diameters at those positions of a yam is fairly easy, and this approach does not need any sensors for the shape estimation. Development of such shape prediction approaches, including our Bayesian method, will be required to reduce cost and time in food processing.

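For a given sample $i$, write $y_n = y_{in}$, and let $z_{n|m}$ and $C_{n|m}$ denote the mean vector and covariance matrix of the state $z_n$ given the observations up to the $m$-th point. With $F$, $G$, and $H$ being the matrices of the state space model (4) in the main text, the algorithm is as follows.
Kalman Filter (Step 1): One-Step Ahead Prediction
$$z_{n|n-1} = F z_{n-1|n-1}, \qquad C_{n|n-1} = F C_{n-1|n-1} F^{T} + \tau_i^2 G G^{T}.$$
Kalman Filter (Step 2): Filter
$$K_n = C_{n|n-1} H^{T} \left( H C_{n|n-1} H^{T} + \sigma^2 \right)^{-1}, \qquad z_{n|n} = z_{n|n-1} + K_n \left( y_n - H z_{n|n-1} \right), \qquad C_{n|n} = \left( I - K_n H \right) C_{n|n-1}.$$
Fixed-Interval Smoothing
$$A_n = C_{n|n} F^{T} C_{n+1|n}^{-1}, \qquad z_{n|N} = z_{n|n} + A_n \left( z_{n+1|N} - z_{n+1|n} \right), \qquad C_{n|N} = C_{n|n} + A_n \left( C_{n+1|N} - C_{n+1|n} \right) A_n^{T}.$$
Here, $I$ denotes an identity matrix. Note that the calculation in the filter step is skipped when $y_n$ is a missing value, that is, $z_{n|n} = z_{n|n-1}$ and $C_{n|n} = C_{n|n-1}$. Then, the posterior distribution of $z_n$ is given by $z_{n|N}$ and $C_{n|N}$, and subsequently the estimate of the parameter $d_{in}$ can be obtained because the state space model described by (4) in the main text incorporates $d_{in}$ in the state vector $z_n$. Hereafter, the estimates of $d_{in}$ are denoted by $\hat{d}_{in}$.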

C. Algorithm for Estimating the Variances
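When the observation data $y_i = \{y_{i1}, y_{i2}, \ldots, y_{iN}\}$ for the $i$-th sample are given, a likelihood function for the variances $\sigma^2$ and $\tau_i^2$ is defined from the one-step-ahead predictions of the Kalman filter. Its logarithm is
$$\ell(\sigma^2, \tau_i^2) = -\frac{1}{2} \sum_{n:\, y_{in}\ \text{observed}} \left\{ \log \left( 2 \pi w_{n|n-1} \right) + \frac{\left( y_{in} - H z_{n|n-1} \right)^2}{w_{n|n-1}} \right\}, \qquad w_{n|n-1} = H C_{n|n-1} H^{T} + \sigma^2,$$
and the estimates $\hat{\sigma}^2$ and $\hat{\tau}_i^2$ are obtained by maximizing it. By applying $\hat{\sigma}^2$ and $\hat{\tau}_i^2$ to the above algorithms of the Kalman filter and fixed-interval smoothing, we can obtain the final estimates of $d_{in}$ and the corresponding variances from $z_{n|N}$ and $C_{n|N}$.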
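Maximum likelihood estimation of the two variances can be sketched by evaluating the Kalman filter log-likelihood on a grid of candidate values. This is a simplified illustration: the grid search stands in for whatever numerical optimizer is actually used, and the names `log_likelihood` and `estimate_variances`, the candidate grid, and the diffuse initial covariance `c0` are assumptions of this example.

```python
import numpy as np

# State space matrices of the 2nd-order smoothness prior (assumed standard form)
F = np.array([[2.0, -1.0], [1.0, 0.0]])
G = np.array([[1.0], [0.0]])
H = np.array([[1.0, 0.0]])


def log_likelihood(y, sigma2, tau2, c0=1e4):
    """Log-likelihood of (sigma2, tau2) from the Kalman filter prediction errors.

    y : 1-D array of diameters on the grid (NaN = missing); missing points
        contribute no term and skip the filter step.
    """
    y = np.asarray(y, dtype=float)
    first = y[~np.isnan(y)][0]              # crude initial level (an assumption)
    z, C = np.array([first, first]), c0 * np.eye(2)
    ll = 0.0
    for yn in y:
        # One-step-ahead prediction
        z = F @ z
        C = F @ C @ F.T + tau2 * (G @ G.T)
        if np.isnan(yn):
            continue
        w = (H @ C @ H.T).item() + sigma2   # prediction error variance
        e = yn - (H @ z).item()             # prediction error
        ll += -0.5 * (np.log(2.0 * np.pi * w) + e * e / w)
        # Filter step
        K = (C @ H.T) / w
        z = z + K[:, 0] * e
        C = (np.eye(2) - K @ H) @ C
    return float(ll)


def estimate_variances(y, grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Return the (sigma2, tau2) pair on the grid with the largest log-likelihood."""
    best = max((log_likelihood(y, s, t), s, t) for s in grid for t in grid)
    return best[1], best[2]
```

In practice a finer grid or a numerical optimizer over the logarithms of the variances would be a natural refinement; the coarse grid here only illustrates the principle.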

Figure 1: An example of a yam (a) and seed yams (seed tubers of yam) (b).

Figure 2: An example of the measurements of a yam: length and diameters. The weight was also observed. All yams were automatically cut off at a diameter of 25 mm (the cutoff point).

Figure 5: Step 2. The shape of a yam, that is, all diameters $\{d_{in}\}$ ($n = 1, 2, \ldots, N$) (both observed and estimated in Step 1), is predicted from a few key diameters $y_i(x)$ at $x$ (mm). The mean squared error between the observed diameters and the predicted diameters is calculated in order to evaluate the prediction accuracy.

Figure 6: The MSE value for the three predictive methods: WA (dotted line), S-RM (broken line), and M-RM (solid line). The horizontal axis indicates the distance $x$ (mm) from the cutoff point. For M-RM, the MSE value indicates the value of MSE(105.0, $x$) with $x_1$ = 105.0 and $x_2$ = $x$ in (10).

Figure 8: Observations (solid) and predictions (broken) for the shape of the samples (numbers 1-60, length of 274 to 459 mm), using proposed Bayesian method with M-RM prediction model.The samples are ordered by the length.The horizontal and vertical axes indicate the distance (mm) from the cutoff point and the radius (mm), respectively.

Figure 11: The scatter plot of the length and weight of the 111 yams in this study.