Integration of Artificial Neural Network Modeling and Hyperspectral Data Preprocessing for Discrimination of Colla Corii Asini Adulteration



Introduction
Colla Corii Asini (CCA, E'jiao), a gelatin-like preparation obtained by stewing and concentrating donkey hide, is a health-care food and one of the best-known traditional Chinese medicines [1]. Several main components have been isolated from CCA, including amino acids, proteins/gelatins, polysaccharides, volatile substances, and organic substances [2]. Owing to its stimulating effect on hematopoiesis, CCA has been widely used in dietetic life-nourishing and clinical antianemic therapy for over two thousand years [3].
CCA is made from the dry skin of donkeys, as specified in the Chinese Pharmacopoeia and in the non-JP crude drug standards [4]. Because of the complex preparation and the shortage of raw material, pigskin gelatin has been illegally added to CCA. As a traditional food, the authenticity of CCA is usually judged by its external properties; however, this requires rich personal experience and professional knowledge. Nowadays, physical and chemical inspection methods are used to detect adulteration in foods such as meat [5,6], oil [7,8], honey [9,10], and milk [11,12]. For CCA, a polymerase chain reaction method has been reported [4]. Although such methods are feasible for the authentication of CCA, they are complicated and time-consuming. Therefore, a fast and objective identification method for CCA is required, followed by quantitative prediction of the amount of added gelatin, if necessary.
Recently, researchers have paid much attention to spectroscopy for the authentication of food [13,14], but these techniques cannot provide visual and spectral information on whole samples simultaneously. The hyperspectral technique combines two kinds of information (spectroscopy and images) and can therefore provide a detailed analysis, since it contains spectra together with spatial data. By obtaining the spectral data at every pixel in the image, a hypercube consisting of a three-dimensional dataset (two-dimensional image and one-dimensional spectral data) is created, which can be used to estimate changes in the spectrum and to reflect the physical and chemical characteristics of the samples [15]. Many researchers have investigated the potential of the hyperspectral technique for quality detection in meat [16,17], vegetables [18], fruit [19], and seafood [20]. For food authenticity, it has been successfully applied to milk [21], muscle food [22], and oil [23].
In this study, with pig skin gelatin (PSG) adulteration as a specific example, a fast and noninvasive method based on the hyperspectral technique was developed. The study focused on (1) using multiplicative scatter correction (MSC), Savitzky-Golay (SG) smoothing, and the combination of MSC and SG (MSC-SG) to preprocess the spectral data of the region of interest, (2) using the successive projections algorithm (SPA) to reduce the data dimension and acquire characteristic wavelengths (CWs) under the different preprocessing methods, (3) extracting the image features at the CWs to represent the physical and chemical characteristics of CCA, and (4) building prediction models of CCA adulteration and comparing the prediction ability of the models built on the different spectral preprocessing methods.

Sample Preparation.
The CCA was purchased from Tong Ren Tang Technologies Co. Ltd.; it is produced according to the company standard (Q/TX TRT0001). The PSG was obtained from Boyang Gelatin Co. Ltd. (Henan, China). PSG was added to the CCA samples in the range of 5-95% (w/w) at 5% increments. To obtain homogeneously mixed samples, weighed amounts of CCA and PSG were dissolved in deionized water and heated at 90 °C in a water bath for 30 minutes. The samples were then frozen in a refrigerator (−80 °C) for 2 hours and transferred to a vacuum freeze-dryer (FD-1-50, Boyikang, Beijing, China) for 5 days. Subsequently, the samples were milled with a food pulverizer (MDJ-A01Y1, Bear, Foshan, China) and passed through a 40-mesh sieve before hyperspectral data acquisition. The key steps of the measurement procedure are illustrated in Figure 1.

Hyperspectral Data Acquisition.
Hyperspectral data of the samples were acquired using a line-scanning hyperspectral imaging system in reflectance mode. The system (GaiaField-V10E, Sichuan Dualix Spectral Image Technology Co. Ltd.) consists of a spectrograph with a spectral resolution of 2.8 nm, a cooled charge-coupled device (CCD) camera, four 200 W bromine-tungsten lamps, an electrical mobile platform driven by a stepper motor, and a computer installed with the system control software. The system collects information over the spectral range of 388-1045 nm, which covers the visible (VIS) band and part of the near-infrared (NIR) band. Each acquired dataset was stored as a three-dimensional data cube (x, y, λ); the image dimensions (x, y) comprised 1392 × 3023 pixels, and the spectral dimension (λ) comprised 1040 bands. Through experimental optimization, the exposure time was set at 17 ms, the object distance at 185 mm, the spectral sampling interval at 0.65 nm, and the movement speed of the object stage at 1.5 mm/s.
Because of differences between samples, the dark current in the CCD camera, and the uneven intensity of the illumination in different bands, the hyperspectral dataset acquired from the system was first calibrated using a white and a dark reference image. The white reference image with nearly 100% reflectance was acquired using a uniform Teflon board (Sichuan Dualix Spectral Image Technology Co., Ltd.) under the same conditions as the raw hyperspectral data. The dark reference image with nearly 0% reflectance was acquired by turning off all the lamps and covering the lens of the camera with a black cap. The calibration was conducted using the following equation [24]:

$$R(\lambda) = \frac{I_{\mathrm{sample}}(\lambda) - I_{\mathrm{dark}}(\lambda)}{I_{\mathrm{white}}(\lambda) - I_{\mathrm{dark}}(\lambda)}$$

where $R(\lambda)$, $I_{\mathrm{sample}}(\lambda)$, $I_{\mathrm{white}}(\lambda)$, and $I_{\mathrm{dark}}(\lambda)$ are the calibrated image, the acquired original image, the white reference image, and the dark reference image, respectively. The calibrated images $R(\lambda)$ were employed for further image processing and analysis. The calibration was performed using the system control software on the computer. The region of interest (ROI) was obtained from the calibrated images using ENVI software (ENVI 5.0, Research System Inc., USA). A square ROI of approximately 400 × 400 pixels was manually selected in the center of each sample, and the mean spectra (388-1045 nm) of the ROI for 200 samples were calculated by ENVI.
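The calibration and ROI averaging described above can be sketched in a few lines. This is only a minimal illustration in plain Python, not the system control software or ENVI used in the study; the function names `calibrate` and `mean_spectrum` and the toy intensity values are hypothetical.

```python
def calibrate(sample, white, dark):
    """Per-band reflectance: (sample - dark) / (white - dark).

    Each argument is a list of raw intensities, one value per band,
    for a single pixel (or for an already averaged region).
    """
    calibrated = []
    for s, w, d in zip(sample, white, dark):
        span = w - d
        # Guard against a dead band where white and dark coincide.
        calibrated.append((s - d) / span if span != 0 else 0.0)
    return calibrated


def mean_spectrum(roi_pixels):
    """Average the spectra of all ROI pixels band by band."""
    n_pixels = len(roi_pixels)
    return [sum(band) / n_pixels for band in zip(*roi_pixels)]


# Toy two-band example with made-up intensities.
pixel = calibrate([60.0, 110.0], [110.0, 210.0], [10.0, 10.0])
roi_mean = mean_spectrum([[0.2, 0.4], [0.4, 0.8]])
```

In practice the same arithmetic would be applied to every pixel of the data cube before the ROI mean is taken.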

Spectral Preprocessing.
To reduce or remove undesired physical effects such as light scattering and random noise caused by instruments or variable physical sample properties, the average reflectance spectrum (ARS) should be preprocessed. In this study, MSC, SG smoothing, and MSC-SG were each applied. MSC is effective in spectroscopic applications where light scattering variation and multiplicative noise are present [25,26]. SG smoothing is widely used to remove baseline noise and smooth spectra [27,28]. For SG smoothing, a window width of 7 points was applied. The spectra from each preprocessing method were then used separately for CW selection. The performances of MSC, SG, and MSC-SG were compared through the modeling results. All of these pretreatments were carried out using Matlab 2009a (The MathWorks Inc., USA).
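As a rough illustration of the three pretreatments, the sketch below implements MSC against the mean spectrum and a 7-point SG smoother in plain Python. The text specifies only the window width of 7; the quadratic/cubic polynomial (weights −2, 3, 6, 7, 6, 3, −2 over 21), the untouched edge points, and the function names are assumptions made here, and the original Matlab code is not reproduced.

```python
# 7-point Savitzky-Golay smoothing weights for a quadratic/cubic fit.
# The polynomial order is an assumption; the text gives only the window.
SG7_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)
SG7_NORM = 21


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the reference m (x = a + b*m)
    and corrected to (x - a) / b. Assumes no spectrum is flat (b != 0).
    """
    n_bands = len(spectra[0])
    ref = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def sg_smooth7(y):
    """Smooth one spectrum; the first and last 3 points are left as-is."""
    out = list(y)
    for k in range(3, len(y) - 3):
        window = y[k - 3:k + 4]
        out[k] = sum(w * v for w, v in zip(SG7_WEIGHTS, window)) / SG7_NORM
    return out


def msc_sg(spectra):
    """MSC followed by SG smoothing, applied in turn."""
    return [sg_smooth7(s) for s in msc(spectra)]
```

Because each spectrum is mapped onto the reference by its own slope and intercept, spectra that differ only by an additive offset and a multiplicative factor collapse onto the same curve after MSC.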

Extraction of Image Features.
The colour and texture features were extracted from the hyperspectral images at the CWs. In this study, the colour feature was determined as the average gray level (AGL) of the ROI, which was calculated using the following formula:

$$\mathrm{AGL} = \frac{1}{\mathrm{area}} \sum_{i} \sum_{j} \mathrm{gray}(i, j)$$

where $i$ is the column number of the image; $j$ is the row number of the image; $\mathrm{gray}(i, j)$ denotes the gray level of the pixel of the ROI; and area denotes the total number of pixels in the ROI. The gray level cooccurrence matrix (GLCM) is a commonly used method for texture analysis, in which the texture features are extracted through statistical calculation. In this study, texture extraction based on the GLCM was carried out according to methods described in hyperspectral image studies, with slight modification [29,30]. The GLCM was constructed at four different angles (0°, 45°, 90°, and 135°), with the distance between pixel pairs set to 1 pixel. Four texture features were calculated: contrast (CON), energy (ENE), correlation (COR), and inverse difference moment (IDM). CON mainly describes strongly different spectral responses; ENE mainly describes textural uniformity, that is, repetitions of pixel pairs; COR mainly describes the linear dependency of gray levels; and IDM mainly describes image homogeneity, with larger values implying smaller gray-level differences between paired elements [31]. The texture features were calculated according to the following formulas [31,32]:

$$\mathrm{CON} = \sum_{i} \sum_{j} (i - j)^{2}\, p(i, j)$$

$$\mathrm{ENE} = \sum_{i} \sum_{j} p(i, j)^{2}$$

$$\mathrm{COR} = \frac{\sum_{i} \sum_{j} (i - \mu_{i})(j - \mu_{j})\, p(i, j)}{\sigma_{i} \sigma_{j}}$$

$$\mathrm{IDM} = \sum_{i} \sum_{j} \frac{p(i, j)}{1 + (i - j)^{2}}$$

where $p(i, j)$ is the $(i, j)$th entry in the GLCM, $i$ is the column number of the GLCM, $j$ is the row number of the GLCM, and $\mu_{i}$, $\mu_{j}$, $\sigma_{i}$, and $\sigma_{j}$ are the means and standard deviations of the GLCM along each index. All image feature extraction was carried out using Matlab 2009a.
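The feature extraction can be illustrated with a small plain-Python sketch that computes the AGL, builds a normalized GLCM at a distance of 1 pixel, and derives CON, ENE, COR, and IDM in their standard Haralick forms. Several details here are assumptions rather than the study's Matlab procedure: the image is taken to be already quantized to integer gray levels `0..levels-1`, the GLCM is made symmetric, and COR is set to 1.0 when a marginal has zero variance.

```python
import math

# Pixel offsets (d_row, d_col) for a distance of 1 pixel at each angle.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def average_gray_level(roi):
    """AGL: sum of gray levels divided by the number of ROI pixels."""
    area = sum(len(row) for row in roi)
    return sum(sum(row) for row in roi) / area


def glcm(img, levels, angle):
    """Symmetric, normalized cooccurrence matrix for one angle."""
    d_row, d_col = OFFSETS[angle]
    rows, cols = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = img[r][c], img[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both orders
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """CON, ENE, COR, and IDM from a normalized GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    con = sum((i - j) ** 2 * v for i, j, v in cells)
    ene = sum(v * v for _, _, v in cells)
    idm = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Convention chosen here: a constant texture gets COR = 1.0.
    cor = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0
    return {"CON": con, "ENE": ene, "COR": cor, "IDM": idm}


def mean_texture(img, levels):
    """Average each texture feature over the four directions."""
    per_angle = [texture_features(glcm(img, levels, a)) for a in OFFSETS]
    return {k: sum(f[k] for f in per_angle) / len(per_angle)
            for k in per_angle[0]}
```

For a 2 × 2 checkerboard such as `[[0, 1], [1, 0]]`, every horizontal pair differs by one gray level, so the 0° GLCM puts all its mass off the diagonal and contrast is at its maximum for two levels.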

Model Development.
The probabilistic neural network (PNN) and the generalized regression neural network (GRNN), both proposed by Specht, are two modified forms of the radial basis function (RBF) neural network [33,34]. They offer advantages over the RBF network in some respects and are both widely used in hyperspectral data modeling for food [32,35]. In this study, PNN and GRNN were employed to build prediction models for detecting CCA adulteration; each was composed of an input layer, a hidden layer, and an output layer. The number of input-layer nodes was determined by the colour and textural features derived from the hyperspectral images at the selected CWs. The hidden layer was determined by the self-adaptive process of the neural network. The output layer was the adulteration ratio of PSG. A prediction was considered accurate in this study when the error of the predicted adulteration rate (AR) was less than 0.5%.
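To make the two network types concrete, the sketch below shows the core computation of each in plain Python: GRNN as a Gaussian-kernel weighted average of the training targets, and PNN as a choice of the class with the largest average kernel response. It is a minimal illustration and not the Matlab models used in the study; the spread `sigma`, the treatment of each AR level as a PNN class label, and all function names are assumptions.

```python
import math


def _kernel(x, xi, sigma):
    """Gaussian radial basis response between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-d2 / (2 * sigma ** 2))


def grnn_predict(train_x, train_y, x, sigma=0.5):
    """GRNN output: kernel-weighted mean of the training targets.

    Assumes x is close enough to the training data that the
    weights do not all underflow to zero.
    """
    weights = [_kernel(x, xi, sigma) for xi in train_x]
    weighted = sum(w * y for w, y in zip(weights, train_y))
    return weighted / sum(weights)


def pnn_predict(train_x, train_labels, x, sigma=0.5):
    """PNN output: label whose training patterns respond most on average."""
    score, count = {}, {}
    for xi, label in zip(train_x, train_labels):
        score[label] = score.get(label, 0.0) + _kernel(x, xi, sigma)
        count[label] = count.get(label, 0) + 1
    return max(score, key=lambda label: score[label] / count[label])


# Toy one-feature example: two training samples at AR = 5% and 95%.
midpoint = grnn_predict([[0.0], [1.0]], [5.0, 95.0], [0.5])
```

Both models store every training pattern as a hidden node, which is why the hidden layer size follows from the data rather than being set by hand; only the spread needs tuning.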
The deviation between the predicted and calibration values at different AR for the different models was quantified by the RMSE. To eliminate differences in order of magnitude and units, the input layer parameters were standardized using the z-scores transformation method [36]:

$$x'_{ij} = \frac{x_{ij} - \bar{x}_{j}}{s_{j}}$$

where $n$ denotes the sample size; $i$ denotes the sample number, $i = 1, 2, \ldots, n$; $j$ denotes the number of the input layer parameter; $x'_{ij}$ denotes the standardized input layer parameter; $x_{ij}$ denotes the original value of the input layer parameter; $\bar{x}_{j}$ denotes the average of the $j$th input layer parameter; and $s_{j}$ denotes the standard deviation of the $j$th input layer parameter.
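The standardization and the two evaluation quantities mentioned in this section can be sketched as follows. This is a plain-Python illustration under stated assumptions: the sample standard deviation (n − 1 denominator) is used because the text does not say which form was applied, and the helper names, including `fraction_within` for the 0.5% accuracy criterion, are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each input parameter (column) to zero mean, unit SD.

    Uses the sample standard deviation; this choice is an assumption.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)]
            for row in rows]


def rmse(predicted, actual):
    """Root mean square error between predicted and reference AR."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(predicted, actual, tol=0.5):
    """Share of predictions whose absolute AR error is below tol."""
    hits = sum(abs(p - a) < tol for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Toy example: two features on very different scales.
z = zscore_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After standardization both toy columns become −1, 0, 1, so a feature measured in large units no longer dominates the distance computed inside the networks.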

ARS Analysis and Preprocessing.
Electromagnetic radiation interacts with the different proportions of physicochemical materials present in adulterated CCA, resulting in characteristic absorption at specific wavelengths [25]. Each pixel of the ROI acquired from the hyperspectral image cube carries plentiful spectral information [37]. The original and pretreated ARS for PSG adulteration from 5% to 95% are presented in Figure 2. As shown in Figure 2(a), the ARS at different levels of PSG adulteration presented similar trends, peaks, and troughs, which illustrates their similarity in physical and chemical properties [15]. Although the spectral curves presented similar patterns, they displayed different absorbance intensities in the range of 388-1045 nm. This implies that PSG adulteration led to significant alterations in the physicochemical properties of CCA, which can be measured through spectral information. Specifically, significant peaks and troughs were found at wavelengths of 600-700, 700-780, and 800-970 nm. Apparent absorption peaks appeared on the ARS curves at approximately 760 nm. The 700-780 nm region in particular presented one obvious absorption peak, which could be attributed to the O-H third overtones and C-H fourth overtones [38,39]. Because too much noise existed in the regions of 388-400 nm and 1000-1045 nm in the original ARS and corrected images, the remaining region from 400 to 1000 nm was used for further analysis.
To correct the spectral data, improve the signal-to-noise ratio, and enhance the spectral resolution, the original ARS were preprocessed with MSC, SG, and MSC-SG before CW selection (Figures 2(b), 2(c), and 2(d)). It is apparent that the preprocessed ARS showed a more distinctive spectral shape than the spectra in Figure 2(a). In particular, the ARS with MSC (Figure 2(b)) gathered in a narrower band than the original spectra (Figure 2(a)). However, there is still much noise in the regions of 400-490 nm and 790-1000 nm after MSC. The spectra after SG smoothing show a better denoising effect than the original spectra and those after MSC; however, there was no apparent change in waveform. Based on these results, the combined method applying MSC and SG in turn was used. The ARS with MSC-SG (Figure 2(d)) illustrate that the combined method can both decrease the noise and correct scatter effects. The impact of preprocessing (MSC, SG, and MSC-SG) was compared through the subsequent analysis of modeling results.

CWs Selection by SPA.
The SPA method was employed to select the CWs from the full spectra, with the number of CWs determined from the transition value of the RMSE [40]. The trend of the RMSE with increasing CW number under SPA is illustrated in Figure 3 by the solid black curves, which correspond to the different preprocessing methods. According to the RMSE plot, the number of CWs was taken as the point at which the RMSE tends to become stable. The ordinate of the red box in Figure 3 represents the CW number, which is 1, 4, 5, and 5, respectively, for the four preprocessing methods (no preprocessing, SG, MSC, and MSC-SG). As shown in Figure 4, the CWs were calculated by SPA as the most informative wavelengths, replacing the full spectrum for the purpose of adulteration prediction. One wavelength of 695 nm was selected for the original spectra without preprocessing. Four wavelengths of 694, 743, 774, and 958 nm were selected as CWs for spectra with SG smoothing. Five wavelengths of 934, 500, 784, 741, and 721 nm and 934, 501, 720, 787, and 741 nm were determined as CWs for spectra with MSC and MSC-SG, respectively. The CWs of each preprocessing method are listed in rank order, with the optimum one first. The CWs were mainly concentrated in the region from 700 to 800 nm. This is consistent with the absorption peak appearing in Figure 2, which was attributed to the third and fourth overtones of the O-H and C-H functional groups, respectively [39]. The hyperspectral images at the CWs were used to extract image features, which were further used in the prediction models.
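The projection step at the heart of SPA can be sketched as below in plain Python. This is a simplified illustration only: it shows how each new wavelength is chosen as the one least collinear with those already selected, but it omits the column centering often applied beforehand and the RMSE-based choice of the starting wavelength and of the number of CWs described above. The function names and the toy matrix are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, start, n_select):
    """Pick n_select column indices of X by successive projections.

    X is a list of samples, each a list of values per wavelength.
    At each step every column is projected onto the subspace
    orthogonal to the last selected column, and the unselected
    column with the largest remaining norm is chosen next.
    Assumes the selected columns are not degenerate (nonzero norm).
    """
    n_bands = len(X[0])
    cols = [[row[k] for row in X] for k in range(n_bands)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = _dot(ref, ref)
        cols = [[a - _dot(c, ref) / ref_norm2 * b for a, b in zip(c, ref)]
                for c in cols]
        candidates = [k for k in range(n_bands) if k not in selected]
        selected.append(max(candidates, key=lambda k: _dot(cols[k], cols[k])))
    return selected


# Toy matrix: band 1 is collinear with band 0, so it is never preferred.
X = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 0.0, 1.0]]
chosen = spa_select(X, start=0, n_select=3)
```

In the toy matrix, band 1 carries no information beyond band 0, so its projected norm drops to zero and the algorithm moves on to bands 2 and 3.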

Image Feature Selection.
Hyperspectral imaging is one of the most commonly used applications of hyperspectral techniques [13]. The hyperspectral images at the first CWs (695, 694, 934, and 934 nm) under the different preprocessing methods (no preprocessing, SG, MSC, and MSC-SG) are shown in Figure 5. The AGL at the first CWs is shown in Figure 6. Because the same first CW of 934 nm was obtained after MSC and MSC-SG, these two methods share the same hyperspectral image and AGL, as shown in Figures 5(c), 5(d), 6(c), and 6(d).
In Figure 5, there is an obvious trend of colour change as the PSG ratio increases. The texture features, specifically the grooving depth and degree of homogeneity, also change regularly with the amount of adulteration. As shown in Figure 6, the AGL of the images at the first CWs after the different pretreatments demonstrated the same trend; that is, the AGL values increased with increasing PSG content. The trend of colour change in Figure 5 is consistent with the trend of AGL illustrated in Figure 6. Accordingly, the colour (AGL) and texture (CON, ENE, COR, and IDM) features were used to build the prediction models. These five features were standardized by z-scores processing before being used as the input layer for modeling.
Taking the texture features of the first CW (934 nm) under MSC-SG preprocessing as an example, four texture statistical measurements in four directions (0°, 45°, 90°, and 135°) are illustrated in Figure 7. Larger variation was observed for CON than for ENE, COR, and IDM along three directions (45°, 90°, and 135°), which could be explained by the intrinsic heterogeneity of CCA adulterated with PSG [30]. The values of CON were higher than those of ENE, COR, and IDM at 45°, 90°, and 135°, which indicates that CON captures high local variations in those directions [30]. The figures for the remaining CWs under the different preprocessing methods have characteristics similar to Figure 7 and are not shown. The means of the texture features averaged over the four directions at the CWs were used for further analysis [30], which means that only 4, 20, 16, and 20 texture features were obtained as inputs to develop the prediction models, respectively, for the different preprocessing methods.
[...] [35]. When the data without preprocessing were used to establish the prediction model, PNN and GRNN presented the lowest and similar CCR of 82.5%. All preprocessed spectra used in model building gave a higher CCR than those without preprocessing, which means that the preprocessing methods were helpful and improved the prediction accuracy to some extent compared with the original data. After the application of MSC-SG, the CCR of both PNN and GRNN improved. GRNN demonstrated better results than PNN. MSC-SG coupled with GRNN displayed the best prediction performance, with the highest CCR of 92.5%. According to Table 2, the normal samples, samples with pretreatment, and samples with higher adulteration rate (AR) have relatively lower RMSE. Samples with relatively lower AR had higher RMSE; specifically, for samples at an AR of 5%, the RMSE reached 0.70 and 0.71 for PNN and GRNN, respectively. The RMSE decreased with increasing adulteration, which means that the deviation between the predicted and calibration values decreased with increasing AR.

Conclusions
Hyperspectral imaging in the spectral range of 388-1045 nm, in tandem with spectral preprocessing and ANN (PNN and GRNN) techniques, was successfully developed for predicting CCA adulteration. SPA was employed to select the CWs under the three preprocessing methods (MSC, SG, and MSC-SG) and to reduce the high dimension of the hyperspectral data. As a result, one wavelength of 695 nm was selected as the CW for spectra without preprocessing. After MSC, five wavelengths of 934, 500, 784, 741, and 721 nm were selected by SPA. Four wavelengths of 694, 743, 774, and 958 nm were selected by SPA under SG smoothing. For the data with MSC-SG, five wavelengths of 934, 501, 720, 787, and 741 nm were selected. In addition, the colour (AGL) and texture (CON, ENE, COR, and IDM) features were extracted from the hyperspectral images at the CWs and used to build the prediction models. Eight ANN models (PNN, PNN + SG, PNN + MSC, PNN + MSC-SG, GRNN, GRNN + SG, GRNN + MSC, and GRNN + MSC-SG) were established, of which GRNN in tandem with MSC-SG (GRNN + MSC-SG) presented satisfactory performance with a CCR of 92.5%. The overall results indicate that integrating ANN modeling with spectral preprocessing based on hyperspectral imaging is suitable and promising for online detection of CCA adulteration.

Figure 1: The key steps of measurement procedure.

Figure 5: Hyperspectral images of samples adulterated with PSG in the range of 0-95% corresponding to the first CW of 695 nm (a), 694 nm (b), 934 nm (c), and 934 nm (d) for different preprocessing (none, SG, MSC, and MSC-SG).

Figure 7: Four texture characteristics from original spectrum at the first CW (695 nm) along different directions for 50% PSG adulteration.

Table 1: Performance of PNN and GRNN with different preprocessing methods.

Table 2: RMSE of prediction set at different AR.