Nondestructive Discrimination of Lead (Pb) in Preserved Eggs (Pidan) by Near-Infrared Spectroscopy and Chemometrics

A major safety concern with pidan (preserved eggs) is the use of lead(II) oxide (PbO) during processing. This paper develops a rapid and nondestructive method, based on near-infrared (NIR) spectroscopy and chemometrics, for discriminating lead (Pb) in preserved eggs produced by different processing methods. A total of 331 unleaded eggs from ten batches and 147 eggs processed with PbO from six batches were collected and analyzed by NIR spectroscopy. Inductively coupled plasma mass spectrometry (ICP-MS) was used as the reference method for Pb identification. The Pb contents of the leaded eggs ranged from 1.2 to 12.8 ppm. Linear partial least squares discriminant analysis (PLSDA) and nonlinear least squares support vector machines (LS-SVM) were used to classify the samples based on their NIR spectra, and different preprocessing methods were compared to improve classification performance. With second-order derivative spectra, both PLSDA and LS-SVM achieved accurate and reliable classification of leaded and unleaded preserved eggs; the sensitivity and specificity of PLSDA were 0.975 and 1.000, respectively. Given that the strictest safety standard for Pb content in traditional pidan is 2 ppm, these results demonstrate the feasibility of the proposed method for rapid and nondestructive discrimination of Pb in Chinese preserved eggs.


Introduction
Preserved egg, or pidan, is one of the most popular traditional alkali-treated egg products in East and Southeast Asian countries, including China, South Korea, Thailand, and Japan. Today, pidan is consumed in more than 30 countries worldwide for its distinctive taste, flavor, and texture [1]. In China, pidan is regarded as a healthy and functional food and is also used to treat eye problems, toothache, high blood pressure, tinnitus, vertigo, and other ailments [2].
Recent studies have shown that alkali treatment improves the extractability, solubilization, gelation, and dispersibility of proteins for preparing texturized products [3]. Moreover, the process is also effective in destroying toxins such as aflatoxin and protein inhibitors, which is advantageous for food processing [3,4]. The main active alkaline reagent is sodium hydroxide (NaOH). Produced in the pickle or coating mud by the reaction of sodium carbonate (Na2CO3), water (H2O), and calcium oxide (CaO), NaOH penetrates the eggshell and membranes into the egg, causing physical and chemical changes, color changes, and protein gelation [1,5–8].
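For reference, the standard reactions by which quicklime and soda ash generate NaOH in the pickle are written out below. They are general inorganic chemistry rather than equations stated in the cited works:

$$
\begin{aligned}
\mathrm{CaO} + \mathrm{H_2O} &\rightarrow \mathrm{Ca(OH)_2} \\
\mathrm{Ca(OH)_2} + \mathrm{Na_2CO_3} &\rightarrow \mathrm{CaCO_3}\downarrow + 2\,\mathrm{NaOH}
\end{aligned}
$$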
Pidan is made by preserving duck, chicken, or quail eggs in a mixture of clay, ash, salt, and quicklime for several weeks to several months. The main difference between traditional and newer processing methods is whether litharge (PbO), a light-yellow powder, is added to the pickle. In the traditional process, PbO plays multiple roles at different stages of pidan formation [1]: (1) when NaOH penetrates the eggshell, the protein becomes negatively charged by hydroxyl ions; by binding to the carboxyl groups of the charged protein, Pb2+ helps to disrupt the protein structure and enhance protein gelation, and in this way the diffusion of NaOH is also controlled; (2) Pb2+ combines with S2− (produced by protein hydrolysis) to form PbS, which is deposited inside and outside the eggshell and reduces the size of the eggshell pores. This reaction helps to control the rate and concentration of NaOH penetration into the egg in the late fermentation stage, which is very important for making good-quality pidan; (3) pidan produced with PbO generally has a consistent ripening time, which makes large-scale production easy to control. Although PbO improves the quality of pidan, pidan produced by this process contains high levels of lead residue. Pb accumulation in the human body is well known to damage the nervous system, blood-forming organs, kidneys, and immune system. Because of these safety concerns, the current Chinese food safety standard limits the Pb content of traditionally processed pidan to less than 2 ppm [9], which implies that excessive use of PbO in traditional pidan processing should be strictly controlled. As a result, much attention has been paid to developing pidan processing techniques that do not use PbO [10–12]. However, because traditional methods using PbO have the advantages described above, they are still widely used, and Pb contamination of pidan products has aroused wide concern among consumers. Therefore, rapid and reliable methods for identifying Pb are in high demand for routine analysis of pidan products on the market.
Conventional methods for Pb detection are based on wet chemical analysis or instrumental analysis [13–15]. For pidan samples, both approaches require a cumbersome sample pretreatment procedure, which is often too time-consuming and labor-intensive for routine analysis. Because the cations used in the pickle are an important factor in the gelation of egg protein, they can greatly influence the chemical composition, physical properties, and microstructure of pidan [7,8]. Therefore, it should be possible to discriminate leaded and "lead-free" pidan by characterizing them with instrumental signals and examining the measured features. Near-infrared (NIR) spectroscopy combined with chemometrics has been widely used in food analysis [16–18] and can provide a useful tool for this purpose. Although NIR quantitative analysis of trace components [19] is rarely reported, NIR spectra can capture the changes in physical and chemical properties caused by PbO, which allows trace Pb to be discriminated indirectly. Compared with traditional analysis methods, NIR analysis requires little or no sample preparation, reduces analysis time, and permits nondestructive measurement, which makes it well suited to discriminating lead-contaminated and "lead-free" pidan products.
The objective of this paper was to develop a rapid and nondestructive method for routine discrimination of Pb in pidan by combining NIR spectroscopy and chemometrics. The investigation focused on (1) the influence of different data preprocessing methods on classification performance; (2) a comparison of the classification performance of linear partial least squares discriminant analysis (PLSDA) [20,21] and nonlinear least squares support vector machines (LS-SVMs) [22]; and (3) the potential of the proposed method for Pb discrimination in pidan, with inductively coupled plasma mass spectrometry (ICP-MS) analysis as the reference.

Notes to Table 1: (a) Three samples from each batch were analyzed by ICP-MS. (b) U: "unleaded"; L: "leaded." (c) The limit of detection (LOD) was 0.033 ppm.

Pidan Samples.
All preserved eggs were made from fresh duck eggs. In total, 331 preserved eggs from ten batches produced by lead-free processing and 147 preserved eggs from six batches produced by traditional processing with PbO were purchased from domestic markets. The difference between the two processing methods was whether the pickle used for egg fermentation contained PbO. Detailed information on each batch is listed in Table 1. All preserved eggs completed ripening between 5 June and 15 June, and ripeness was further confirmed by examining the interior of egg samples from each batch. To prevent aging, all pidan samples were kept coated with their original pickle materials (mud or powder) and stored in a cool, dark, and dry place. The laboratory temperature was maintained at about 25 °C (±0.5 °C), and the humidity was kept stable. All eggs were examined manually, and cracked eggs were excluded. Each egg was washed vigorously with deionized water and left to dry completely before spectroscopic and chemical analysis.

Measurement of NIR Spectra.
NIR spectra were collected in diffuse reflectance mode using a Bruker TENSOR 37 FTIR spectrometer (Bruker Optics, Ettlingen, Germany). All spectra were acquired with a PbS detector, using an internal gold background as the reference. A fiber bundle was used to illuminate the sample and collect the scattered light [23]. The fiber probe was placed in direct contact with the equatorial region of the egg, because NIR spectra are more easily measured there than at the two ends (air chamber). To account for differences in internal composition, the diffuse reflectance spectrum of each egg was obtained by averaging three spectra measured around its equatorial region. Each spectrum was the average of 64 scans; more scans did not significantly improve signal quality. The raw spectra covered 12,000 to 4000 cm−1, with a data interval of 3.857 cm−1. The laboratory temperature was kept at around 25 °C, and the humidity was kept steady.
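The spectral bookkeeping described above can be sketched in Python/NumPy as follows. This is a minimal illustration, assuming an evenly spaced wavenumber axis rebuilt from the stated range and 3.857 cm−1 interval (an instrument's exported axis may differ slightly); the function names `wavenumber_axis` and `egg_spectrum` are hypothetical and not part of any instrument software.

```python
import numpy as np

def wavenumber_axis(start=4000.0, stop=12000.0, step=3.857):
    """Evenly spaced axis in cm^-1 rebuilt from the stated range and interval.

    Assumption: the real instrument axis is read from the exported file."""
    n_points = int(round((stop - start) / step)) + 1
    return start + step * np.arange(n_points)

def egg_spectrum(position_spectra):
    """Average the spectra measured at several equatorial positions of one egg.

    position_spectra has shape (n_positions, n_points); here n_positions = 3.
    Each input row is assumed to be the instrument's 64-scan average already."""
    return np.asarray(position_spectra, dtype=float).mean(axis=0)

wn = wavenumber_axis()
```

With the default arguments the reconstructed axis has 2075 points.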

ICP-MS Analysis of Lead.
The actual Pb contents of pidan were determined by ICP-MS at a third-party laboratory (Kingmed Diagnostics, Guangzhou, China). The eggshell was cracked and removed carefully. Because lead is not uniformly distributed in preserved eggs and its content decreases gradually from the outside to the inside [15], a FastPrep-24 homogenizer (MP Biomedicals, Santa Ana, USA) was used to crush and blend the different parts of each egg. A microwave-assisted digestion procedure [15] was performed with a CEM Mars 5 Microwave Accelerated Reaction System (CEM Corp., Matthews, USA). About 0.5 g of homogenized sample was digested in Teflon vessels with 4 mL of nitric acid (HNO3, 50% w/w) and 2 mL of hydrogen peroxide (H2O2, 30% w/w). Lead contents were measured with an Agilent 7500 CE inductively coupled plasma mass spectrometer (Agilent Technologies Inc., Santa Clara, USA).
The accuracy of the ICP-MS analysis was validated with Pb reference solutions (Agilent Technologies Inc., Santa Clara, USA), and precision was verified by duplicate analysis of samples. The correlation coefficient of the calibration curve was 0.9999. The coefficient of variation was 6.55%, and the recovery under the experimental conditions was 92.6–105.1%. The limit of detection (LOD), defined as the concentration corresponding to 3σ (three times the standard deviation) of 11 blank measurements, was 0.66 μg/L (accounting for sample dilution). The practical quantification limit (PQL) was estimated to be 80 μg/L, considering the standard solutions and sample dilution.
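The validation figures above follow standard analytical definitions. The sketch below shows how a 3σ LOD, a coefficient of variation, and a spike recovery are computed. It assumes blank readings in instrument signal units converted to concentration through a calibration slope, and uses the sample standard deviation (ddof = 1); the helper names are hypothetical, and the paper does not state these computational details.

```python
import numpy as np

def lod_3sigma(blank_signals, slope):
    """LOD = 3 * SD(blank signals) / calibration slope, in concentration units.

    If the blanks are already expressed as concentrations, use slope = 1."""
    blanks = np.asarray(blank_signals, dtype=float)
    return 3.0 * blanks.std(ddof=1) / slope

def coefficient_of_variation(replicates):
    """Relative standard deviation of replicate results, in percent."""
    r = np.asarray(replicates, dtype=float)
    return 100.0 * r.std(ddof=1) / r.mean()

def recovery_percent(found_spiked, found_unspiked, amount_added):
    """Spike recovery in percent: (spiked - unspiked) / amount added."""
    return 100.0 * (found_spiked - found_unspiked) / amount_added
```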

Chemometrics Analysis.
Multivariate data analysis was performed in MATLAB 7.0.1 (MathWorks, Sherborn, MA). The raw spectral dataset was analyzed by robust principal component analysis (ROBPCA) [24] to detect and remove outlying samples arising from the quality of the raw eggs, processing, and measurement. An advantage of ROBPCA is that it can overcome the masking effects caused by the presence of multiple outliers. The number of principal components (PCs) was determined by cross validation. Based on the robust PCs, the orthogonal distance (OD) and score distance (SD) of each object can be computed. OD measures the distance from an object to the space spanned by the significant PCs and is related to the PCA residuals; SD measures the distance from the projection of an object to the class center within that space and is related to the leverage of the object. Each object falls into one of four groups: regular objects (small SD and small OD), good PCA-leverage objects (large SD and small OD), orthogonal outliers (small SD and large OD), and bad PCA-leverage outliers (large SD and large OD).
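The SD/OD diagnostic can be illustrated with the sketch below. It deliberately uses classical (non-robust) PCA, so it is not ROBPCA: a real ROBPCA would replace the mean and loadings with robust estimates. The OD cutoff uses the Wilson-Hilferty transform with median/MAD standing in for the univariate MCD estimates of the original method, and the SD cutoff is the square root of the 97.5% chi-square quantile. All function names are hypothetical.

```python
import numpy as np

def pca_distances(X, n_pc):
    """Return score distances (SD) and orthogonal distances (OD) for each row of X.

    Classical PCA is used for illustration only (not robust)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                                 # loadings, shape (p, n_pc)
    T = Xc @ P                                      # scores, shape (n, n_pc)
    eigvals = s[:n_pc] ** 2 / (X.shape[0] - 1)      # variance along each PC
    sd = np.sqrt(np.sum(T ** 2 / eigvals, axis=1))  # Mahalanobis distance in PC space
    od = np.linalg.norm(Xc - T @ P.T, axis=1)       # norm of the PCA residual
    return sd, od

def od_cutoff(od, z=1.959964):
    """OD cutoff via OD**(2/3); median/MAD replace the univariate MCD of ROBPCA."""
    r = np.asarray(od, dtype=float) ** (2.0 / 3.0)
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))
    return (med + z * mad) ** 1.5

def classify_objects(sd, od, sd_cut, od_cut):
    """Assign each object to one of the four diagnostic groups."""
    groups = []
    for s, o in zip(sd, od):
        if o <= od_cut:
            groups.append("regular" if s <= sd_cut else "good leverage")
        else:
            groups.append("orthogonal outlier" if s <= sd_cut else "bad leverage")
    return groups

# SD cutoff: sqrt of the 97.5% chi-square quantile with n_pc degrees of freedom,
# e.g. sqrt(11.143), about 3.34, for the 4 PCs retained in this work.
SD_CUT_4PC = np.sqrt(11.143)
```

Following the paper's rule, objects labelled "orthogonal outlier" or "bad leverage" would be removed, and "good leverage" objects kept.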
To compare the performance of different preprocessing methods and classification models, data splitting was based on the raw spectra, and all models were developed on the same training set and validated on the same test set. After outlier removal, DUPLEX [25] was performed separately on the raw spectra of unleaded and leaded preserved eggs. DUPLEX alternately assigns the most distant remaining objects to the training and test sets, so that both sets cover the whole experimental spectral domain. The unleaded and leaded training (test) samples selected by DUPLEX were then combined to form the final training (test) set.
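A compact NumPy sketch of the DUPLEX split, assuming Euclidean distances on the raw spectra: the two most distant objects go to the training set, the next most distant pair to the test set, and thereafter each set in turn receives the remaining object farthest from its nearest current member. The training-set size `n_train` is a hypothetical parameter, since the paper does not report the split ratio; in this work the function would be called once per class and the results merged.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix without building an (n, n, p) intermediate."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def duplex(X, n_train):
    """DUPLEX split of the rows of X; returns (train_idx, test_idx) lists."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n_train < 2 or n - n_train < 2:
        raise ValueError("each set needs at least two objects")
    D = pairwise_distances(X)
    remaining = list(range(n))
    train, test = [], []

    def move_farthest_pair(target):
        # the two remaining objects that are farthest apart
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [remaining[i], remaining[j]]
        for k in pair:
            target.append(k)
            remaining.remove(k)

    def move_farthest_from(target):
        # the remaining object whose nearest member of `target` is farthest away
        dmin = D[np.ix_(remaining, target)].min(axis=1)
        k = remaining[int(np.argmax(dmin))]
        target.append(k)
        remaining.remove(k)

    move_farthest_pair(train)
    move_farthest_pair(test)
    while remaining:
        if len(train) < n_train:
            move_farthest_from(train)
        if remaining and len(test) < n - n_train:
            move_farthest_from(test)
    return train, test
```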
Preprocessing methods, including smoothing, second-order differentiation, and standard normal variate (SNV) transformation [26], were applied to improve classification performance. The Savitzky-Golay (S-G) method [27], based on local polynomial fitting, was used for smoothing. S-G smoothing tends to preserve features such as relative maxima, minima, and peak width, which are usually distorted by techniques such as the moving average. The S-G method was also used to compute second-order derivative spectra, because it limits the degradation of the signal-to-noise ratio (SNR) caused by direct differencing. Because the surface curvature of the eggshell differs between locations and between eggs, SNV was used to remove both additive and multiplicative baseline variations caused by variations in optical path length.
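The two preprocessing steps can be sketched with NumPy alone. The S-G weights come from a least-squares polynomial fit over an odd window (for example the 21-point, fourth-order second derivative and 15-point, second-order smoothing reported in the Results); edge points are simply dropped here, which is a simplification of this sketch. SciPy's `scipy.signal.savgol_filter(x, window_length, polyorder, deriv=..., delta=...)` provides an equivalent filter with edge handling. Function names are illustrative, and the use of the sample standard deviation in SNV is an assumption.

```python
import math
import numpy as np

def savgol_coefficients(window, polyorder, deriv=0, delta=1.0):
    """Savitzky-Golay weights: deriv-th derivative of the local least-squares
    polynomial evaluated at the window centre; delta is the sampling step."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    A = np.vander(x, polyorder + 1, increasing=True)   # columns x**0 ... x**polyorder
    # Row `deriv` of pinv(A) maps window values to polynomial coefficient c_deriv.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv] / delta ** deriv

def savgol(spectrum, window, polyorder, deriv=0, delta=1.0):
    """Apply the S-G filter; half a window is lost at each edge."""
    h = savgol_coefficients(window, polyorder, deriv, delta)
    # Reversing h turns convolution into the sliding dot product of the fit.
    return np.convolve(np.asarray(spectrum, dtype=float), h[::-1], mode="valid")

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
```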
Compared with nonlinear models, linear classification models tend to have lower complexity and better generalization performance but less flexibility; therefore, both linear and nonlinear models were investigated. PLSDA, which is based on partial least squares regression (PLSR), a key method in chemometrics, is a very popular and effective pattern recognition technique. For two-class problems, PLSDA can be trained as a PLSR between the predictors and a categorical response variable, which is assigned +1 for class A and −1 for class B. In prediction, an object with a predicted response above 0 is classified into class A and otherwise into class B. For PLSDA, only one parameter, the model complexity (i.e., the number of latent variables), needs to be optimized.
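The two-class PLSDA scheme described above (a PLS regression on a ±1 response, thresholded at 0) can be sketched as follows. PLS1 is fitted here with the NIPALS algorithm; the paper does not say which PLS algorithm its MATLAB code used, so NIPALS and the function names are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS; returns (x_mean, y_mean, regression vector B)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # deflated X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = (f @ t) / tt                     # y loading
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)      # B = W (P^T W)^-1 q
    return x_mean, y_mean, B

def plsda_predict(model, X):
    """Predicted response > 0 -> class A (+1), otherwise class B (-1)."""
    x_mean, y_mean, B = model
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    return np.where(y_hat > 0, 1, -1)
```

In this work, class A (+1) would correspond to unleaded eggs, and `n_components` is the single parameter tuned by cross validation.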
Least squares support vector machine (LS-SVM) is a simplified version of SVMs [28]. Unlike ordinary SVMs, which require solving a quadratic programming problem, LS-SVM uses equality constraints, so its solution is obtained from a set of linear equations and is much faster to compute. In this paper, the widely used Gaussian radial basis function (RBF) kernel was adopted as the nonlinear transformation. Therefore, two parameters, γ and σ², need to be optimized when developing an LS-SVM model. The kernel width parameter, σ², influences the nonlinear nature of the RBF: a narrower kernel forces the model toward a more complex nonlinear solution. The regularization parameter, γ, controls the tradeoff between minimizing the structural risk and minimizing the training error: too small a γ cannot fit the data sufficiently, whereas an unnecessarily large γ increases the risk of overfitting. Therefore, γ and σ² should be optimized simultaneously.
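A minimal NumPy sketch of an RBF LS-SVM classifier, showing that training reduces to one linear system rather than a quadratic program. It assumes the kernel form exp(−‖x − z‖²/σ²), as in the LS-SVMlab convention (other packages use 2σ² in the denominator), and labels coded +1/−1; the function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    """Gaussian RBF kernel exp(-||a - b||^2 / sig2) between rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sig2)

def lssvm_fit(X, y, gam, sig2):
    """Train an LS-SVM classifier by solving

        [ 0   y^T             ] [b    ]   [0]
        [ y   Omega + I / gam ] [alpha] = [1]

    with Omega_ij = y_i * y_j * K(x_i, x_j)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sig2) + np.eye(n) / gam
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return {"X": X, "y": y, "b": sol[0], "alpha": sol[1:], "sig2": sig2}

def lssvm_predict(model, X_new):
    """Class = sign(sum_k alpha_k * y_k * K(x, x_k) + b)."""
    K = rbf_kernel(np.asarray(X_new, dtype=float), model["X"], model["sig2"])
    f = K @ (model["alpha"] * model["y"]) + model["b"]
    return np.where(f > 0, 1, -1)
```

A smaller `sig2` narrows the kernel (a more nonlinear boundary), while `gam` sets how strongly training errors are penalized, matching the roles described above.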
For both PLSDA and LS-SVM, Monte Carlo cross validation (MCCV) [29] was used to optimize the model parameters. By repeatedly splitting the training set at random and leaving a larger proportion of samples out for prediction, MCCV can effectively reduce the risk of overfitting. In this paper, MCCV was performed with a left-out rate of 30% and 100 random samplings. The parameters of PLSDA and LS-SVM were selected to obtain the lowest MCCV misclassification rate (MRMCCV):

$$\mathrm{MR}_{\mathrm{MCCV}} = \frac{N_m}{N_p}$$

where N_p and N_m are the numbers of prediction objects and misclassified objects, respectively. Sensitivity (Sens.) and specificity (Spec.) were used to evaluate the classification performance:

$$\mathrm{Sens.} = \frac{TP}{TP + FN}, \qquad \mathrm{Spec.} = \frac{TN}{TN + FP}$$

where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively. In this work, unleaded and leaded preserved eggs were denoted as "positives" and "negatives," respectively. The overall accuracy (Accu.) of the classification models was also used:

$$\mathrm{Accu.} = \frac{TP + TN}{TP + TN + FP + FN}$$
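The MCCV criterion and the evaluation metrics can be sketched as below, assuming that N_m and N_p are accumulated over all random splits. The callables `fit` and `predict` are hypothetical placeholders for any classifier at one candidate parameter setting (for example a PLSDA with a given number of latent variables); the parameter setting with the lowest returned rate would be selected.

```python
import numpy as np

def mccv_misclassification(X, y, fit, predict, left_out=0.3, n_iter=100, seed=0):
    """MR_MCCV = misclassified objects / predicted objects over all random splits.

    fit(X, y) -> model and predict(model, X) -> labels are supplied by the caller."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = int(round(left_out * n))
    n_mis = n_pred = 0
    for _ in range(n_iter):
        perm = rng.permutation(n)
        test, train = perm[:n_out], perm[n_out:]
        model = fit(X[train], y[train])
        n_mis += int(np.sum(predict(model, X[test]) != y[test]))
        n_pred += n_out
    return n_mis / n_pred

def sens_spec_acc(y_true, y_pred, positive=1):
    """Sensitivity, specificity and overall accuracy; positive = unleaded here."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_pos, pred_pos = y_true == positive, y_pred == positive
    tp = np.sum(is_pos & pred_pos)
    fn = np.sum(is_pos & ~pred_pos)
    tn = np.sum(~is_pos & ~pred_pos)
    fp = np.sum(~is_pos & pred_pos)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + tn + fp + fn)
```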

Results and Discussion
The detected Pb contents of the pidan objects ranged from 1.2 ppm to 12.8 ppm. Given the LOD (0.66 μg/L) and PQL (80 μg/L) of the ICP-MS analysis, ICP-MS is sufficiently sensitive to serve as the reference method, because the cutoff value of Pb content for pidan is 2 ppm. Some of the raw NIR spectra of leaded and unleaded pidan objects are plotted in Figure 1. As Figure 1 shows, the spectra of leaded and unleaded pidan samples have very similar absorbance bands in the range 4000–12,000 cm−1.
The bands are assigned as follows: 8500 cm−1 (second overtones of C-H stretching in various groups), 6000–7000 cm−1 (overlapping first overtones of O-H and N-H stretching), 5700 cm−1 (first overtones of C-H stretching in various groups), 5160 cm−1 (combination of the fundamental O-H stretching and the first overtone of C-O deformation), 4870 cm−1 (combination of N-H stretching and deformation of peptide groups), 4600 cm−1 (combination of C=O stretching and deformation of peptide groups), and 4270 cm−1 (combination of C-H stretching and C-H deformation). Some bands (8500, 5700, and 4870 cm−1) are very weak and poorly resolved. Moreover, the 8000–12,000 cm−1 range consists mainly of baseline and background and carries little useful chemical information, so this interval was excluded from further data analysis.
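Restricting the data to 4000–8000 cm−1 amounts to masking variables by wavenumber. The short sketch below (hypothetical function name) works whether the axis is stored ascending or descending, because it selects by value rather than by index.

```python
import numpy as np

def select_region(wavenumbers, spectra, low=4000.0, high=8000.0):
    """Keep only the variables whose wavenumber lies in [low, high] cm^-1.

    spectra may be one spectrum or an (n_samples, n_points) matrix."""
    wn = np.asarray(wavenumbers, dtype=float)
    mask = (wn >= low) & (wn <= high)
    return wn[mask], np.asarray(spectra)[..., mask]
```

On an axis rebuilt from 4000 cm−1 in steps of 3.857 cm−1, this keeps 1038 variables.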
Figure 2 shows the spectra preprocessed by second-order differentiation (21 points, fourth-order polynomial) and SNV transformation. Spectral smoothing with a 15-point window and a second-order polynomial was also performed. Compared with the raw spectra, the second-order derivative spectra show better resolution of some bands and reveal finer spectral details. The actual effect of preprocessing, however, must be judged by classification performance. For both raw and preprocessed spectra, it is difficult to attribute bands unambiguously to specific chemical components because of band overlap and significant background; therefore, chemometric methods are required to extract the information relevant to classifying leaded and unleaded pidan objects.
Figure 2: Some of the second-order derivative and SNV spectra of leaded and unleaded pidan objects. A shift was added to the spectra of unleaded objects.

Before data splitting, ROBPCA was performed separately on the raw spectra (4000–8000 cm−1) of the 147 leaded and 331 unleaded preserved eggs to detect outliers. Figure 3 shows the ROBPCA results for the leaded and unleaded data sets. The number of significant PCs was estimated by examining the robust pooled predicted residual sum of squares (PRESS) from cross validation with different numbers of PCs. For the 147 leaded pidan objects, 4 PCs were selected to compute OD and SD, because including more PCs did not reduce the PRESS significantly; these 4 PCs explain about 93.6% of the total data variance. Similarly, for the 331 unleaded pidan objects, 4 PCs were selected, accounting for about 91.9% of the total data variance. Because OD measures the distance from a sample to the PC space and SD describes the dispersion of a sample within the class in the PC space, both orthogonal outliers (small SD and large OD) and bad PCA-leverage points (large SD and large OD) should be removed. To retain the regular variation within a class, good PCA-leverage points (large SD and small OD) should be kept. As seen from Figure 3, …

PLSDA and LS-SVM models were developed with the raw and preprocessed spectra in the range 4000–8000 cm−1. MRMCCV was computed for different numbers of latent variables (PLSDA) and different combinations of γ and σ² (LS-SVM), and the parameters giving the lowest MRMCCV were selected. The prediction results and selected parameters for each preprocessing method are listed in Table 2. Preprocessing generally improved classification performance in terms of sensitivity, specificity, and total accuracy. Second-order differentiation and SNV markedly improved the classification models by reducing baseline and background effects, and taking the second-order derivative of the raw spectra also reduced the model complexity of PLSDA. The best models were obtained with second-order derivative spectra: the sensitivity and specificity were 0.975 and 1.000 for PLSDA and 0.975 and 0.969 for LS-SVM, respectively. The best prediction results are also shown in Figure 4.
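The explained-variance figures quoted above (93.6% and 91.9% for 4 PCs) are cumulative variance fractions. The sketch below computes them from classical PCA via the singular values; because the paper's values come from robust PCA, the numbers obtained this way would differ somewhat. The function name is hypothetical.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative fraction of total variance captured by the first 1, 2, ... PCs."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                              # proportional to PC variances
    return np.cumsum(var) / var.sum()
```

For example, `cumulative_explained_variance(X)[3]` gives the fraction explained by the first four PCs.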

Conclusions
The results obtained in this paper demonstrate that leaded (Pb > 2 ppm) and unleaded (Pb < 2 ppm) preserved eggs can be reliably discriminated using NIR spectroscopy and chemometrics. Since the most stringent safety standard currently implemented for the Pb content of traditional pidan is 2 ppm, this paper demonstrates the feasibility of NIR spectroscopy as a rapid and nondestructive method for discriminating Pb in pidan. The comparison of data preprocessing methods indicates that spectral variations caused by scattering effects and baseline shifts played a more important role than the SNR. Given their comparable classification performance, PLSDA with second-order derivative spectra is recommended over LS-SVM because it is linear and simpler and is therefore expected to generalize more reliably. Future work will focus on the influence of PbO on the NIR spectra of pidan.

Figure 1: Some of the raw spectra of leaded and unleaded pidan objects.

Table 1: Detailed information on the unleaded and leaded pidan samples.

Table 2: Discrimination of unleaded (positive) and leaded (negative) pidan objects by PLSDA and LS-SVM.
a True positives/total positives. b True negatives/total negatives. c Number of correctly classified test objects/total number of test objects. d Number of PLS components. e σ² and γ for LS-SVM.