Honey Discrimination Using Visible and Near-Infrared Spectroscopy

This study investigates the potential of visible and near-infrared (vis-NIR) spectroscopy with wavelength reduction for honey discrimination. A total of 80 samples from four brands of honey products were measured with a mobile fiber-type USB4000 spectrophotometer, and spectra in the wavelength range of 380.17∼939.98 nm were used for model calibration. First, principal components analysis (PCA) was used to extract principal components (PCs). Next, the first seven PCs, which accounted for 97% of the variance of the spectra, were combined separately with support vector machine (SVM) and linear discriminant analysis (LDA) to develop PC-SVM and PC-LDA models, both of which achieved 100% discrimination accuracy. In addition, the spectra were subjected to successive wavelength reduction rates (WRRs) of 2^x, x = 1–9. The PC-LDA and PC-SVM models developed for these reduced wavelengths performed almost as well as those developed for the original full wavelengths. This experiment suggests that vis-NIR spectral wavelengths can be reduced at large spacing intervals, which eases data analysis and may allow a simpler and cheaper sensor to be developed for honey discrimination in practice.


Introduction
Honey is considered a healthy and wholesome food with curative properties, as it contains plenty of nutrients and has effective antimicrobial activity against many bacteria [1]. Honey products are quite popular in China. Generally, consumers classify honeys in terms of color, smell, and taste. Although sensory evaluation is easy to use, subjective experience is often biased and inaccurate. Chemical test methods in the laboratory may be accurate but are quite time- and cost-consuming, which leaves ordinary consumers without a reliable guarantee and hinders the development of apiculture. Thus, it is necessary to develop a convenient and accurate way for honey discrimination.
Recently, visible and near-infrared (vis-NIR) spectroscopy has received wide attention as it is suitable for nondestructive analysis of biological and biomedical materials. For example, vis-NIR spectroscopy can be used for discriminating tea beverages [2], coffee [3], milk powder [4], and other materials [5][6][7]. Using this technique, Gallardo-Velázquez et al. [8] and Zhu et al. [9] qualified adulterants in honeys of some local origins. Although these studies reported good accuracy for honey discrimination, the calibration models were developed using the full range of wavelengths, which results in high computational complexity and causes difficulty in practical application.
This study investigated the possibility of reducing spectral wavelengths when using vis-NIR spectroscopy to discriminate Chinese honeys, with the aim of fabricating a simpler and cheaper sensor for practical use.

Honey Origins and Spectrum Acquisition.
Four brands of honey (linden tree: 1, locust: 2, wild chrysanthemum: 3, milk vetch: 4) were purchased from a local supermarket in December 2011 and kept sealed at room temperature (20∼25 °C). Before spectral measurement, the samples were placed in a water container at 50 °C until the soluble substances fully dissolved. Eighty samples were obtained, with twenty for each brand. Spectral scanning was conducted with a mobile fiber-type USB4000 spectrophotometer (Ocean Optics, USA) in transmission mode with a recorded wavelength range of 346.01∼1038.08 nm at a sampling interval of 0.3 nm. Before sample measurement, twenty reference scans were taken on a ceramic standard supplied with the spectrophotometer. Ten photometric scans were conducted and averaged for each sample.

Spectrum Transformation.
Due to signal noise at both ends of the original spectra, only the wavelengths from 380 to 940 nm were retained for further investigation. First, the sample spectral reflectance at a wavelength of λ was calculated by
$$S_{\mathrm{ref}}(\lambda) = \frac{S_\lambda - D_\lambda}{R_\lambda - D_\lambda}$$
where $S_{\mathrm{ref}}(\lambda)$ is the sample reflectance, and $S_\lambda$, $D_\lambda$, and $R_\lambda$ are the digital counts of the sample, dark, and reference light intensities, respectively [10]. Then, the reflectance spectra were transformed into absorbance spectra using log(1/R), as absorbance is directly proportional to the concentration of an absorber according to the Beer-Lambert law.
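The transformation from raw counts to absorbance can be illustrated with a minimal sketch. It assumes the usual dark-corrected ratio for reflectance (expressed as a fraction rather than a percentage) and a base-10 logarithm for log(1/R); the function names and the digital counts below are hypothetical and are not taken from the study.

```python
import math

def reflectance(sample, dark, reference):
    """Dark-corrected relative reflectance per wavelength (as a fraction)."""
    return [(s - d) / (r - d) for s, d, r in zip(sample, dark, reference)]

def absorbance(refl):
    """Absorbance per wavelength, computed as log10(1/R) (base 10 is assumed)."""
    return [math.log10(1.0 / r) for r in refl]

# Hypothetical digital counts at three wavelengths (illustrative values only).
sample_counts = [1100.0, 5100.0, 10100.0]
dark_counts = [100.0, 100.0, 100.0]
reference_counts = [10100.0, 10100.0, 10100.0]

R = reflectance(sample_counts, dark_counts, reference_counts)
A = absorbance(R)
```

A reflectance of 1 (sample counts equal to the reference counts) maps to an absorbance of 0, and lower reflectance maps to higher absorbance.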

Principal Components Analysis (PCA).
PCA is a well-known method for revealing the hidden structure within a large data set [11]. Using an orthogonal transformation defined by the eigenvectors of the covariance matrix of the original inputs, PCA converts a set of observations of possibly correlated variables (wavelengths) into a set of values of uncorrelated variables called principal components (PCs). With PCA, data can be projected from a high-dimensional space to a lower-dimensional space. In this study, the first few PCs were used as inputs to the following two calibration methods, SVM and LDA, for developing discrimination models [3].
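As a rough illustration of this projection, the sketch below computes PC scores through a singular value decomposition of the mean-centered data, which is equivalent to an eigen-decomposition of the covariance matrix. The use of NumPy, the function name `pca_scores`, and the random stand-in data are assumptions made for illustration; the software used in the study is not stated.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PC scores and explained-variance ratios for X (samples x wavelengths)."""
    Xc = X - X.mean(axis=0)                        # mean-center each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)          # fraction of variance per PC
    return scores, explained[:n_components]

# Synthetic stand-in for 80 spectra with 200 wavelengths (not real honey data).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))
scores, ratio = pca_scores(X, 7)
```

The columns of `scores` are mutually uncorrelated, and `ratio` gives the share of spectral variance carried by each retained PC, which is the quantity reported when choosing how many PCs to keep.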

Support Vector Machine (SVM).
SVM is a relatively new computational learning method based on statistical learning theory. SVM maps input vectors into a higher-dimensional space, in which a maximal-margin separating hyperplane is constructed; this can produce good performance in ill-posed situations with a limited amount of training data. With an appropriate choice of kernel function, SVM is capable of modeling highly nonlinear classification boundaries [12]. However, SVM may encounter problems when dealing with a large number of input variables [13]. In this study, we used PCA to transform the hundreds of variables into several principal components, which were then used for developing SVM models.

Linear Discriminant Analysis (LDA).
LDA has proved to be a promising approach to feature extraction. The basic idea of LDA is to find a linear transform such that the ratio of the between-class scatter to the within-class scatter is maximized. That is, samples are projected into a new space with the smallest within-class distance and the largest between-class distance [14,15]. Although LDA usually gives good discrimination performance, it suffers from some deficiencies if variables are highly correlated or class boundaries are complex or nonlinear. In the former case, LDA overfits the data, and in the latter case, LDA underfits the data [16]. To avoid such deficiencies, variables are often transformed by correlation-reducing methods such as PCA.
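The scatter-ratio criterion described above is commonly written in the following standard textbook form. The notation is an assumption introduced here rather than taken from the study: $x_i$ is a sample vector, $\mu_c$ and $n_c$ are the mean and size of class $c$, $\mu$ is the overall mean, and $W$ is the projection matrix.

```latex
S_B = \sum_{c} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\mathsf{T}}, \qquad
S_W = \sum_{c} \sum_{i \in c} (x_i - \mu_c)(x_i - \mu_c)^{\mathsf{T}}, \qquad
W^{*} = \arg\max_{W} \frac{\left| W^{\mathsf{T}} S_B W \right|}{\left| W^{\mathsf{T}} S_W W \right|}
```

When the variables are highly correlated, $S_W$ becomes nearly singular, which is one way to see why decorrelating the inputs with PCA beforehand stabilizes the solution.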

Results and Discussion
Absorbance Spectra.
Figure 1 shows the absorbance spectra of the four brands of honey with wavelengths ranging from 380 to 940 nm. The spectra present peaks in the band of 400∼450 nm and low absorbance in the NIR range above 760 nm. All the spectra are similar in spectral shape and absorbance, so the brands cannot be separated by visual inspection alone. Thus, it is necessary to apply appropriate multivariate analysis methods to build calibration models for honey discrimination.

PC-SVM and PC-LDA Models for Full Wavelengths.
PCA was used to reduce the spectral dimensionality to a small number of principal components, which enabled the samples to cluster in terms of floral origin. Figure 2 shows the scores scatter plot of the 1st and 2nd PCs. The samples can be clearly distinguished in terms of their origins. The first two PCs explain as much as 96% of the variance of the spectra (71% for PC-1 and 25% for PC-2). To balance model complexity and discrimination accuracy, we chose seven PCs, which accounted for 97% of the spectral variance, to develop PC-SVM and PC-LDA models. Table 1 shows their excellent performance, with 100% accuracy for honey discrimination.
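A PC-LDA workflow of this kind can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, synthetic spectra in place of the honey data, a hypothetical even/odd calibration and test split (the validation scheme of the study is not described), and an LDA classifier with pooled covariance and equal priors. All function names are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the column means and the first k PCA loading vectors of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T                      # loadings: wavelengths x k

def lda_fit(Z, y):
    """Fit LDA with a pooled within-class covariance and equal priors."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = Z - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(y) - len(classes))
    return classes, means, np.linalg.inv(cov)

def lda_predict(Z, classes, means, cov_inv):
    """Assign each sample to the class with the smallest Mahalanobis distance."""
    diff = Z[:, None, :] - means[None, :, :]   # samples x classes x k
    d2 = np.einsum("nck,kl,ncl->nc", diff, cov_inv, diff)
    return classes[np.argmin(d2, axis=1)]

# Synthetic stand-in for 80 absorbance spectra (4 brands x 20 samples).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(1, 5), 20)
brand_spectra = rng.normal(size=(4, 100))
X = brand_spectra[y - 1] + 0.05 * rng.normal(size=(80, 100))

# Hypothetical split: even-indexed samples calibrate, odd-indexed samples test.
X_cal, y_cal, X_test, y_test = X[::2], y[::2], X[1::2], y[1::2]

# Project both sets onto the first seven PCs of the calibration set.
mean, loadings = pca_fit(X_cal, 7)
Z_cal = (X_cal - mean) @ loadings
Z_test = (X_test - mean) @ loadings

model = lda_fit(Z_cal, y_cal)
accuracy = np.mean(lda_predict(Z_test, *model) == y_test)
```

Fitting the PCA on the calibration samples only, and then applying the same mean and loadings to the test samples, keeps the test set out of the model-building step.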

PC-SVM and PC-LDA Models for Reduced Wavelengths.
We reduced the wavelengths by increasing the spectral spacing interval. Under successive wavelength reduction rates (WRRs) of 2^x, x = 1–9, the number of wavelengths was reduced by factors of 2, 4, 8, 16, 32, 64, 128, 256, and 512, respectively. Based on these reduced wavelengths, a series of PC-SVM and PC-LDA models were established. The results of model performance are also listed in Table 1. Surprisingly, these new models were insensitive to the WRR. For example, the PC-SVM and PC-LDA models developed for 6 wavelengths (WRR = 512) achieved discrimination accuracies of 97.5% and 98.7%, respectively, which were almost as high as those for the full 2847 wavelengths. This indicates that six wavelengths at an interval of 153.6 nm (512 points at 0.3 nm spacing) are enough to develop accurate models (PC-SVM or PC-LDA) for honey discrimination. Previous reports also revealed the feasibility of reducing wavelengths by WRRs. Yang et al. [7] developed prediction models for soil nitrogen and carbon contents using vis-NIR wavelengths subjected to WRRs of 2, 5, 10, 20, 50, 100, 200, and 500. They found that the models calibrated for 21 wavelengths at an interval of 100 nm performed almost as well as those for the full 2100 wavelengths.
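The effect of each WRR on the number of retained wavelengths can be checked with a short sketch. It assumes that reduction simply keeps every 2^x-th point starting from the first wavelength; the exact subsampling rule is not stated in the text, and the function name is hypothetical.

```python
def reduce_wavelengths(spectrum, rate):
    """Keep every `rate`-th point of a spectrum, starting from the first point."""
    return spectrum[::rate]

n_full = 2847                      # number of full wavelengths reported in the text
full = list(range(n_full))         # placeholder indices standing in for one spectrum

# Number of wavelengths left after each WRR = 2**x, x = 1..9.
counts = {2 ** x: len(reduce_wavelengths(full, 2 ** x)) for x in range(1, 10)}
```

Under this assumption, WRR = 512 leaves six wavelengths, in agreement with the figure quoted above.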

Conclusions
In this study, vis-NIR spectroscopy was used for honey discrimination. Two classification models (PC-SVM and PC-LDA) were developed for various numbers of wavelengths within the range of 380 to 940 nm. The results indicated that the vis-NIR spectra can be reduced without substantially compromising model performance for honey discrimination. PC-SVM and PC-LDA achieved 100% discrimination accuracy for the full wavelengths and remained almost as accurate for wavelengths reduced at large spectral intervals, which may allow a simpler and cheaper sensor to be developed for honey discrimination in practice.

Table 1: Discrimination accuracy of PC-SVM and PC-LDA models under different wavelength reduction rates (WRRs).