Analysis of the Oil Content of Rapeseed Using Artificial Neural Networks Based on Near Infrared Spectral Data

The oil content of rapeseed is a crucial property in practical applications. In this paper, instead of traditional analytical approaches, an artificial neural network (ANN) method was used to analyze the oil content of 29 rapeseed samples based on near infrared spectral data at different wavelengths. Results show that a multilayer feed-forward neural network with 8 nodes (MLFN-8) is the most suitable model, with an RMS error of 0.59. This study indicates that a nonlinear method is a quick and easy way to analyze the oil content of rapeseed based on near infrared spectral data.


Introduction
Infrared absorption spectroscopy is a common approach for analyzing food composition [1-3]. For a certain characteristic absorption frequency, Lambert's law provides the following equation [4,5]:

$$I = I_0 e^{-\alpha \ell c} \tag{1}$$

where $I_0$ represents incident light intensity, $I$ represents transmission light intensity, $\alpha$ represents the attenuation coefficient, $\ell$ represents the distance the light travels through the material, and $c$ represents concentration. Equation (1) is widely used for determining food composition. However, because the wavelengths in the infrared absorption spectrum are diverse and the penetrating power of infrared light is weak, infrared absorption spectroscopy can only be used for analyzing transparent liquids. It is therefore very difficult to analyze the oil content of rapeseed using infrared absorption spectroscopy directly. To solve this problem, this study instead uses a nonlinear approach to analyze near infrared spectral data to determine the oil content of rapeseed.
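As a small numerical illustration of Lambert's law, the sketch below assumes the exponential attenuation form, in which transmitted intensity equals incident intensity times exp(-attenuation coefficient × path length × concentration). The function names and all sample values are hypothetical and are not taken from the paper.

```python
import math

def transmitted_intensity(i0, alpha, path_length, concentration):
    """Transmitted intensity under an assumed exponential attenuation law."""
    return i0 * math.exp(-alpha * path_length * concentration)

def concentration_from_intensities(i0, i, alpha, path_length):
    """Invert the same law to recover concentration from measured intensities."""
    return math.log(i0 / i) / (alpha * path_length)

# Hypothetical values chosen only for illustration (arbitrary units).
i0, alpha, path_length, c_true = 100.0, 0.8, 1.5, 0.25
i = transmitted_intensity(i0, alpha, path_length, c_true)
c_est = concentration_from_intensities(i0, i, alpha, path_length)
print(f"transmitted intensity: {i:.3f}, recovered concentration: {c_est:.3f}")
```

The inversion only works when the sample transmits a measurable amount of light, which is why this direct approach is limited to transparent liquids.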

Artificial Neural Networks
2.1. Fundamentals of ANN Models. An artificial neural network (ANN) model is composed of an interconnected group of artificial neurons. In most circumstances, an artificial neural network is an adaptive system that continuously adapts to new data and learns from accumulated experience, even when the data are noisy [6,7]. In addition, the structure of the system can change based on external or internal information that flows through the network during the learning phase. An ANN can therefore extract essential information from data and model complex relationships between inputs and outputs [8-10].
As can be seen from Figure 1, the main structure of an artificial neural network (ANN) consists of an input layer, a hidden layer, and an output layer. The input layer introduces the input variables to the network [11]. The output layer provides the predicted response variables, which are the outputs of the nodes in that layer. The optimal number of neurons in the hidden layers is usually determined iteratively, depending on the type and complexity of the process or experiment [12].
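The layered structure described above can be sketched as a single forward pass through a network with one hidden layer. This is a minimal illustration, not the authors' implementation: the tanh activation, the random weight initialization, the function names, and the input values are all assumptions. The sizes (five inputs, eight hidden nodes, one output) are chosen to resemble the MLFN-8 setting discussed in this paper.

```python
import math
import random

def mlfn_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tanh hidden layer -> single linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights; a real model would learn these during training."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0

# Hypothetical absorbance readings for five inputs (not measured data).
x = [0.42, 0.47, 0.61, 0.55, 0.58]
params = init_network(n_inputs=5, n_hidden=8)
y = mlfn_forward(x, *params)
print(f"untrained network output: {y:.4f}")
```

Because the weights are untrained, the printed output has no physical meaning; the point is only to show how the input, hidden, and output layers connect.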

Model Development.
Gu and Wang [12] carried out a series of measurements with a precision instrument, from which we obtained near infrared spectroscopy data of rapeseed in the form of absorbance at different wavelengths. The oil content is expressed as the percentage composition (%) of oil in rapeseed. Data for the 29 rapeseed samples are shown in Table 1.
To identify the most suitable and robust ANN model for analyzing the oil content of rapeseed, 21 models were established: a linear prediction model, a general regression neural network (GRNN) [14], and multilayer feed-forward neural networks (MLFN) [15,16]. The number of nodes in the MLFN models was varied from 2 to 20 so that the most robust MLFN model could be found. The independent variables are the absorbances at wavelengths of 1.68 μm (reference wavelength), 1.73 μm (characteristic absorption wavelength of fat), 1.94 μm (characteristic absorption wavelength of water), 2.10 μm (characteristic absorption wavelength of starch), and 2.18 μm (characteristic absorption wavelength of protein), respectively, while the dependent variable is the percentage composition of oil in rapeseed. The training set consists of 24 samples while the rest

Training Results of MLFN-8. Training and testing results of the MLFN-8 model were extracted from the experiments. To present them more intuitively, six figures are used to portray the training and testing results, which are shown in Figures 2 to 7.
For the training process, the comparison between predicted values and actual values is depicted in Figure 2. The near-linear relationship between predicted values and actual values implies that the training process is precise. Different from Figure 3, Figure 4 depicts the relationship between residual values and predicted values during the training process. The residual values show the same pattern as in Figure 3, which again indicates that the training process is precise.
In general, Figures 2, 3, and 4 depict the results of the training process, showing that the values are concentrated and consistent with normal training of the MLFN-8 model. It is worth mentioning that the residual values are generally small and close to zero, which implies that the training process is correct and precise.
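The residual check described above can be sketched as follows. The sign convention (actual minus predicted), the function name, and the sample values are assumptions made for illustration; the numbers are placeholders and not values from Table 1.

```python
def residuals(actual, predicted):
    """Residual for each sample, taken here as actual minus predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]

# Placeholder oil contents in percent; these are not values from Table 1.
actual = [40.0, 42.5, 38.0, 45.0]
predicted = [40.5, 42.0, 38.25, 44.75]

res = residuals(actual, predicted)
mean_res = sum(res) / len(res)           # near zero if there is no systematic bias
max_abs_res = max(abs(r) for r in res)   # size of the worst single error
print(res, mean_res, max_abs_res)
```

Plotting `res` against `actual` and against `predicted` gives the kind of residual plots that the figures describe; residuals scattered tightly around zero with no visible trend suggest a well-fitted model.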

Testing Results of MLFN-8.
To analyze the testing process, three figures are used to present the average values of the testing results, which are shown in Figures 5 to 7.
In the testing process, as shown in Figure 5, the relationship between predicted values and actual values is also close to linear, which means that the MLFN-8 model is precise when predicting. To check the robustness of the model, we plotted the residual values against the actual values and against the predicted values, as shown in Figures 6 and 7.
Figures 5, 6, and 7 depict the average testing process of the MLFN-8 model. All the values shown in the three figures are average values, from which we conclude that the model is accurate and robust.
According to the results presented above, the MLFN-8 model proves to be a suitable and rational model for determining the oil content of rapeseed.

Discussion.
Several previous studies are related to the field we studied [12, 17-20]. Gu and Wang [12] analyzed the oil content of rapeseed by multiple linear regression based on near infrared spectral data, which is the chief inspiration for our work. In contrast, our work achieves higher robustness and precision because it focuses on a well-fitted nonlinear function. Madsen [17] established a quick approach for determining the oil content of rapeseed with a commercial nuclear magnetic resonance spectrometer. Tkachuk [18] used a near infrared reflectance technique to determine oil, protein, chlorophyll, and glucosinolate content in whole rapeseed kernels. In addition, Velasco and coworkers [19] used near-infrared reflectance spectroscopy to estimate the seed weight, oil content, and fatty acid composition of intact single seeds of rapeseed. Shafii and coworkers [20] analyzed interaction effects on winter rapeseed yield and oil content. These studies analyze the oil content and other properties of rapeseed effectively and serve as valuable references. However, these analytical approaches still need complex manual operation, and the process is intricate to some extent. Our study shows that the oil content of rapeseed can be analyzed by artificial neural networks, a quick and easy method that can be computed automatically.
In food science and analytical chemistry, the oil content of rapeseed determines the yield of the related products in practical applications. For example, in one production step, the oil content of the rapeseed samples should be estimated and evaluated before mass production. Artificial neural networks can accomplish this step efficiently.

Conclusion
The oil content of rapeseed is a crucial aspect of practical applications in food science and chemistry. In this paper, instead of using traditional analytical methods, we used an artificial neural network (ANN) method to analyze the oil content of 29 rapeseed samples based on near infrared spectral data at different wavelengths. Results show that the multilayer feed-forward neural network with 8 nodes (MLFN-8) was the most suitable model in our experiments. In future research, we will look for explicit nonlinear functions of near infrared spectral data for the analysis of the oil content of rapeseed.

Figure 3 depicts the relationship between residual values and actual values during the training process, showing that the residual values are relatively concentrated.

Figure 2: Comparison between predicted values and actual values during training process.

Figure 4: Comparison between residual values and predicted values during training process.

Figure 6: Comparison between residual values and actual values during testing process.
Figure 1: A schematic view of artificial neural network structure.

Table 2: Results of different models in analyzing oil content of rapeseed.

The results in Table 2 imply that the lowest testing RMS error, 0.59, is obtained by the MLFN model with 8 nodes (MLFN-8); this is lower than the errors of the linear prediction model and the GRNN model. In addition, the accuracy rate in testing is 100% within the permitted error. Therefore, the MLFN-8 model proves to be accurate and robust.
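The two metrics used in this comparison, RMS error and accuracy within a permitted error, can be sketched as below, together with selection of the model with the lowest testing RMS error. The function names, the tolerance of 1.0, and the sample predictions are hypothetical. In the score dictionary, only the MLFN-8 value of 0.59 comes from the text; the other two numbers are placeholders, not results from Table 2.

```python
import math

def rms_error(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def accuracy_within(actual, predicted, tolerance):
    """Fraction of predictions whose absolute error is within the tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

def best_model(test_rms_by_model):
    """Name of the model with the lowest testing RMS error."""
    return min(test_rms_by_model, key=test_rms_by_model.get)

# Placeholder test values in percent; not the paper's data.
actual = [40.0, 42.0, 38.0, 45.0, 41.0]
predicted = [40.5, 41.5, 38.5, 44.5, 41.5]
print(rms_error(actual, predicted))             # every error is 0.5, so RMS is 0.5
print(accuracy_within(actual, predicted, 1.0))  # all errors within the tolerance

# Only the MLFN-8 figure (0.59) comes from the text; the others are made up.
scores = {"Linear": 1.20, "GRNN": 0.95, "MLFN-8": 0.59}
print(best_model(scores))
```

An accuracy of 100% under this definition means only that every test prediction falls inside the chosen tolerance, so the result depends directly on how large the permitted error is.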