Discrimination of Free-Range and Caged Eggs by Chemometrics Analysis of the Elemental Profiles of Eggshell

As one of the most commonly eaten foods worldwide, eggs have attracted increasing attention for their quality and price. A method based on elemental profiles and chemometrics was established to discriminate between free-range and caged eggs. Free-range (n1 = 127) and caged (n2 = 122) eggs were collected from different producing areas in China. The contents of 16 elements (Zn, Pb, Cd, Co, Ni, Fe, Mn, Cr, Mg, Cu, Se, Ca, Al, Sr, Na, and K) in the eggshell were determined using an inductively coupled plasma atomic emission spectrometer (ICP-AES). Outlier diagnosis was performed by robust Stahel–Donoho estimation (SDE), and the Kennard and Stone (K-S) algorithm was used for training and test set partitioning. Partial least squares discriminant analysis (PLS-DA) and least squares support vector machine (LS-SVM) were used to classify the two types of eggs. Cd, Mn, Mg, Se, and K made important contributions to the classification of free-range and caged eggs. By combining column-wise and row-wise rescaling of the elemental data, the accuracy, sensitivity, and specificity were 91.9%, 91.1%, and 92.7% for PLS-DA and 95.3%, 95.6%, and 95.1% for LS-SVM, respectively. The results indicate that chemometrics analysis of the elemental profiles of eggshells could provide a useful and effective method to discriminate between free-range and caged eggs.


Introduction
Chicken eggs are one of the main foodstuffs consumed worldwide and consist mainly of eggshell, shell membrane, egg white, and yolk [1,2]. They contain a wide range of nutrients and health-promoting components, such as lecithin, calcium ions, iron ions, and vitamin A [3][4][5]. Eggs have a high digestibility and absorption rate and are an inexpensive and abundant source of high-quality animal protein (about 13 grams of protein per 100 grams of egg) [6][7][8].
At present, various types of eggs exist in the Chinese market, such as organic and ordinary eggs, which differ in nutritional composition and commercial value [9][10][11]. It has been demonstrated that the nutritional composition of eggs varies significantly with the rearing method (free-range or caged) and the feed (plant or animal sources) of the hens [12,13]. Free-range eggs contain about one-third less cholesterol and one-quarter less saturated fat than regular caged eggs, and there are significant differences in vitamin A, vitamin E, and beta-carotene content between free-range and caged eggs [14][15][16]. In addition, free-range eggs are considered to have higher nutritional value, flavor, and safety than caged eggs, so people are willing to pay a higher price for them. In recent years, egg adulteration and fraud have become more frequent, with traders selling caged eggs as free-range eggs for profit, which seriously harms the health and legal rights of consumers. Therefore, it is important to develop a quick and reliable method to identify free-range or caged eggs on the market.
Presently, the methods applied to egg identification include high performance liquid chromatography [17,18], gas chromatography-mass spectrometry [19,20], hyperspectral imaging [21], elemental analysis [22], and near infrared spectroscopy [23,24]. For example, Mi et al. [22] investigated the differentiation of Deqingyuan, Taihe, and crossbred eggs based on multielemental and lipidomic data combined with chemometric analysis and obtained a panel of 22 potential lipid markers for differentiating Deqingyuan, Taihe, and crossbred egg yolks. Rogers et al. [25] successfully used stable isotopes to analyse and discriminate between eggs produced under cage, barn, free-range, and organic farming systems in the Netherlands and New Zealand. However, few studies have addressed the discrimination between caged and free-range eggs in China, so a method for this purpose is needed.
In China, a significant difference between caged and free-range eggs is the use of different feeds. Free-range eggs are produced on a small scale by individual farmers using grains as the main feed, while caged eggs are produced on a large scale using commercial feeds [26,27]. The differences in feeding styles can cause differences in elemental contents, which can be used to identify different types of eggs [28,29]. Therefore, the aim of this work is to develop an egg classification method to discriminate between free-range and caged eggs using elemental analysis combined with chemometrics. In this work, eggshells were used as the analytical object, and an inductively coupled plasma-atomic emission spectrometer (ICP-AES) was used to analyse the content of 16 mineral elements in them. Classification models based on PLS-DA and LS-SVM were established to discriminate between free-range and caged eggs, and the performance of the different methods was compared to obtain the best classification model.

Experimental

Materials and Reagents.
Representative caged and free-range egg samples were collected from different producing areas in China. A total of 127 free-range and 122 caged egg samples were analysed. All egg samples were purchased directly from the producers in 2019 after the type had been confirmed. Five eggs were taken from each batch of samples for parallel analysis, and the remaining two eggs were kept as spares. Detailed information on the samples is shown in Table 1.
Standard stock solutions (1000 μg·mL⁻¹) of Zn, Cd, Co, Cr, Cu, Ca, Mg, Mn, Mo, Ni, Pb, Sr, Fe, Na, and K were obtained from the National Standard Material Center of China. HNO3 and H2O2 were purchased from Sinopharm Chemical Reagent Co., Ltd.

Digestion of Eggshells.
The eggshell was rinsed with tap water after removing the internal membrane. Then, the eggshell was washed with deionized water and dried at 120°C using an electric heating sleeve. About 1 gram of dried eggshell was weighed accurately on the electro-optic balance, crushed into small pieces, and put into a 50 mL conical flask. For digestion, 8 mL HNO3 (65%, w/w) and 2 mL H2O2 (30%, w/w) were added. The conical flask was heated and kept at 60°C until a colourless solution was obtained. The solution was cooled naturally and transferred to a 50 mL volumetric flask, where it was diluted to volume with deionized water. The blank was prepared using 4 mL HNO3 (65%, w/w) and 1 mL H2O2 (30%, w/w).

Elemental Analysis by ICP-AES.
The concentration of the 16 mineral elements in the eggshells was determined using a Shimadzu ICPS-7510 sequential plasma emission spectrometer (Shimadzu, Kyoto, Japan). The spectrometer parameters were as follows: power: 1300 W; plasma flow rate: 15 L min⁻¹; carrier gas flow rate: 0.8 L min⁻¹; auxiliary flow rate: 0.2 L min⁻¹; atomization flow rate: 0.8 L min⁻¹; pump flow rate: 1.5 mL min⁻¹; axial observation distance: 15 mm; and instrument stabilization time: 30 s. Analytical lines (Table 2) were selected by considering the overlap and intensity of signals. A standard curve was developed for each element. For each batch, elemental contents were reported as the average of eggshell samples analysed in triplicate.

Data Preprocessing, Outlier Diagnosis, and Data Splitting.
All data preprocessing and further analysis were performed using Matlab 7.0.1 (Mathworks, Sherborn, MA). When the measured data are influenced by significant bias and other undesirable factors, the performance and reliability of classification modeling are degraded; therefore, potential outliers should be detected and removed. To overcome the masking effect of multiple outliers, the Stahel-Donoho estimate (SDE) of outlyingness, a robust statistical method based on dimension reduction techniques [30], was used for outlier diagnosis of the elemental data. The SDE projects the objects onto a large number of randomly selected directions and, using robust location and scatter estimators of each projection, obtains the SDE outlyingness of each sample. In this work, the SDE was used for outlier diagnosis in free-range and caged eggs separately. Subsequently, the measured data were divided into a training set and a prediction set by the Kennard and Stone (K-S) algorithm [31]. The K-S algorithm selects a representative training set whose objects are scattered as widely as possible across the data space. Because the distributions of the two classes of eggs were not the same, the K-S method was performed separately for the free-range and caged eggs.
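The selection rule of the K-S algorithm can be illustrated with a minimal sketch. It assumes Euclidean distance and the common max-min formulation: start from the two most distant objects, then repeatedly add the object whose nearest already-selected neighbour is farthest away. The function name `kennard_stone` and the toy values are hypothetical and are not taken from the study, whose analysis was carried out in Matlab.

```python
import math

def kennard_stone(samples, n_train):
    """Return indices of n_train samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy one-dimensional "profiles" (made-up numbers, not data from the study).
toy = [[0.0], [1.0], [2.0], [3.0], [10.0]]
train_idx = kennard_stone(toy, 3)
test_idx = [k for k in range(len(toy)) if k not in train_idx]
print(train_idx, test_idx)
```

Applying such a routine to each class separately, as described above, keeps the training set representative of both the free-range and the caged distributions.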

Multivariate Discriminant Analysis.
For pattern recognition, linear partial least squares discriminant analysis (PLS-DA) [32] and nonlinear least squares support vector machine (LS-SVM) [33] were used to distinguish free-range and caged eggs. Monte Carlo cross validation (MCCV) [34] was used to estimate the number of PLS-DA latent variables, and the parameters of LS-SVM were optimized to obtain the lowest MCCV error rate (MCCVER) and reduce the risk of model overfitting.
Principal component analysis (PCA) is an unsupervised dimensionality reduction method that converts a set of potentially correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation; the converted variables are called principal components. In recent years, PCA has been widely used for classification and identification of varieties, origins, and adulteration of food and agricultural products [35]. Partial least squares discriminant analysis (PLS-DA) is a supervised statistical method that is often used for classification and discrimination problems. It can handle classification problems in which the differences between groups are small and the sample sizes of the groups vary widely [36]. LS-SVM (least squares support vector machine) is mainly used to solve pattern classification and function estimation problems. Its use requires the optimization of model parameters such as the kernel function parameter (σ) and the regularization parameter (c). The kernel parameter has a direct impact on the complexity of the distribution of the low-dimensional sample data in the mapping space, while the regularization parameter is related to the fit of the model to the training samples and the generalization ability of the model [37].
Sensitivity and specificity were used to estimate and compare the performance of classification models. Free-range eggs are denoted as "positives," and caged eggs are denoted as "negatives." Sensitivity (Sens), specificity (Spec), and overall accuracy (Accu) can be computed as follows:
$$\mathrm{Sens} = \frac{TP}{TP + FN}, \qquad \mathrm{Spec} = \frac{TN}{TN + FP}, \qquad \mathrm{Accu} = \frac{TP + TN}{TP + FN + TN + FP},$$
where TP represents true positive, FN represents false negative, TN represents true negative, and FP represents false positive.
Table 3 shows the ICP-AES analysis results of 16 elements in free-range and caged eggs. The contents of Ca, Mg, Na, and K were the highest in both free-range and caged eggs. Free-range eggs have a higher content of Ca, Mg, and Se than caged eggs, while caged eggs have a higher content of Na, K, Al, Sr, Fe, and Mn, which is consistent with previous studies [38]. It is noteworthy that caged eggs have a higher content of heavy metals such as Pb, Cd, Cr, and Cu, and that Cd was not detected in the free-range eggs. Ca, Mg, Na, and K are involved in various metabolic processes and are essential elements for the human body, and Se is an important nutrient for the prevention of tumors and liver diseases as well as the improvement of immunity.

Elemental Data of Eggshells.
To illustrate the data distribution, principal component analysis (PCA) was applied to the column-wise and row-wise rescaled data without outlier diagnosis (Figure 1). The first two principal components (PC1 and PC2) explained 90.06% of the total data variation. The score plot obtained by projecting the data onto PC1 and PC2 showed that free-range and caged eggs clustered into two largely separate groups, although some samples overlapped because of small differences in their trace element contents (Figure 1(a)). The loading plot of PC1 (Figure 1(b)) shows that the contents of Cd, Mn, Mg, Se, and K contribute significantly to the separation between groups achieved by PC1, while the elements Zn, Co, Ni, Cr, Cu, and Al have negative effects on the classification. Combined with the content analysis, this shows that Cd, Mn, Mg, Se, and K made important contributions to the classification of free-range and caged eggs and could be used as effective elements to distinguish them. Although PCA distinguished most free-range eggs from caged eggs, the separation was not complete, so supervised chemometric models are still needed to achieve accurate classification of the two classes.

Development of Classification Models.
Considering the relative contents of different elements and the difference in each sample weight, rescaling of the data was necessary to analyse the elemental data. In this work, the data for each object were divided by its sample weight, followed by a column-wise transformation to unit variance for each element. The SDE outlyingness analysis was performed separately on each of the two classes using the rescaled data. Outlyingness values were estimated from 1,000 random projections. Figure 2 shows the SDE outlier diagnostic curves for the 127 free-range and 122 caged eggshells. Following the 3-σ rule, a critical value of 3 was adopted, and an object with an outlyingness value above 3 was considered an outlier. Two free-range objects and one caged object were detected as outliers (Figure 2). Further tracing of the samples indicated that the labels of these eggs were suspicious. Therefore, these objects were excluded from discriminant analysis.
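The projection-based outlyingness described above can be sketched as follows. This is an illustrative approximation, not the authors' Matlab implementation: it assumes Gaussian random directions, the median as the robust location estimator, and the median absolute deviation (scaled by 1.4826) as the robust scatter estimator. The function name `sde_outlyingness` and the toy points are hypothetical.

```python
import random
import statistics

def sde_outlyingness(samples, n_directions=1000, seed=0):
    """Approximate Stahel-Donoho outlyingness with random projections."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    outlyingness = [0.0] * len(samples)
    for _ in range(n_directions):
        # Random direction; its length is irrelevant because the ratio below is scale-invariant.
        direction = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        proj = [sum(d * x for d, x in zip(direction, row)) for row in samples]
        center = statistics.median(proj)
        # MAD scaled by 1.4826 (assumption) to be comparable to a standard deviation.
        mad = 1.4826 * statistics.median(abs(p - center) for p in proj)
        if mad == 0.0:
            continue  # degenerate direction, skip it
        for i, p in enumerate(proj):
            # Keep the largest standardized deviation seen over all directions.
            outlyingness[i] = max(outlyingness[i], abs(p - center) / mad)
    return outlyingness

# Toy two-dimensional data (made-up): a compact cluster plus one distant point.
toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
       [0.2, 0.8], [0.8, 0.3], [0.4, 0.1], [20.0, 20.0]]
scores = sde_outlyingness(toy)
flagged = [i for i, s in enumerate(scores) if s > 3.0]  # critical value of 3
print(flagged)
```

Because the maximum is taken over many directions, a single extreme object cannot hide behind others as easily as with a classical mean-and-covariance distance, which is the motivation for using the SDE against masking.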
After eliminating outliers, the remaining 125 free-range eggs and 121 caged eggs were used to develop and test classification models. The K-S algorithm was performed separately for the two groups, dividing the free-range eggs into 80 training objects and 45 test objects and the caged eggs into 80 training objects and 41 test objects. Therefore, a training set of 160 (80 + 80) objects and a test set of 86 (45 + 41) objects were obtained to develop and evaluate the classification model.
The PLS-DA model and the LS-SVM model based on the eggshell element data were established. Two parameters, c and σ, were optimized in the LS-SVM model. The kernel width parameter σ is related to the data confidence and the nonlinear nature of the model; a smaller σ means a narrower kernel, which may force the model toward more complex nonlinear solutions. The other parameter, c, is a regularization parameter, which controls the trade-off between learning accuracy and structural risk. To optimize (σ, c) simultaneously, a grid search was performed using MCCV. MCCV was also used to estimate the number of meaningful PLS-DA latent variables (LV). All parameters of PLS-DA and LS-SVM were optimized by minimizing the MCCV error rate. For MCCV, 70% of the samples were used for the training set and 30% for the test set. The number of random data splits in MCCV was 100, and the optimization of model parameters is shown in Figure 3.
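The MCCV error rate for a single candidate parameter setting can be sketched as below, using the 70%/30% split and 100 random splits stated above. The classifier passed in here is a simple nearest-centroid rule chosen only to keep the example self-contained; it is a stand-in, not PLS-DA or LS-SVM. The names `mccv_error_rate` and `nearest_centroid` and the toy data are hypothetical.

```python
import random

def mccv_error_rate(X, y, fit_predict, n_splits=100, train_fraction=0.7, seed=0):
    """Average misclassification rate over random train/validation splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_train = round(train_fraction * len(X))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # new random split on every repetition
        train, valid = indices[:n_train], indices[n_train:]
        predicted = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in valid])
        wrong = sum(p != y[i] for p, i in zip(predicted, valid))
        errors.append(wrong / len(valid))
    return sum(errors) / len(errors)

def nearest_centroid(X_train, y_train, X_valid):
    """Stand-in classifier (NOT PLS-DA or LS-SVM): assign to the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def closest(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return [closest(x) for x in X_valid]

# Toy data (made-up): two well-separated classes of ten objects each.
X = [[0.1 * i] for i in range(10)] + [[5.0 + 0.1 * i] for i in range(10)]
y = [1] * 10 + [-1] * 10
rate = mccv_error_rate(X, y, nearest_centroid)
print(rate)
```

In a parameter search, this estimate would be computed once per candidate (each number of latent variables, or each (σ, c) grid point), and the candidate with the lowest error rate would be retained.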
The optimized parameters and classification results of the PLS-DA and LS-SVM models are shown in Table 4. For PLS-DA, the model has the lowest MCCVER (8.36%) when LV = 4 (Figure 3(a)), which indicates that good classification of free-range and caged eggs can be achieved with low model complexity. For LS-SVM, the lowest MCCVER (2.47%) was obtained when the values of σ and c were 700 and 5, respectively, so these values were chosen for classification. Figure 4 shows the score plot of the prediction set of the PLS-DA model (Figure 4(a)), in which four free-range eggs were misclassified as caged eggs and three caged eggs were misclassified as free-range eggs; the model's accuracy, sensitivity, and specificity were 91.9%, 91.1%, and 92.7%, respectively. In the LS-SVM model, two free-range eggs were misclassified as caged eggs and two caged eggs were misclassified as free-range eggs, giving an accuracy, sensitivity, and specificity of 95.3%, 95.6%, and 95.1%, respectively. The LS-SVM model has higher classification accuracy than PLS-DA, indicating that LS-SVM is more suitable for the classification of free-range and caged eggs.
According to previous studies, the discrimination of free-range, caged, organic, and ordinary eggs is mainly based on the analysis of chemical components such as carotenoids [39], lipid extracts [40], and proteins and moisture in eggs [41]. These methods enable accurate identification of different varieties of eggs, but their pretreatment is more complicated. In addition, mineral element-based methods combined with chemometrics have been successfully applied to identify free-range and caged eggs. In Dao's study, significantly higher levels of the mineral elements P, Mg, and Na and lower levels of the trace elements Cu, Fe, K, S, and Mn were found in Australian free-range eggs, and a good classification of free-range and caged eggs from Australia and Syria was achieved [38]. These studies, together with the present results, indicate that mineral element-based methods combined with chemometrics can accurately identify free-range and caged eggs in China.
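The reported test-set figures can be checked from the misclassification counts given above, using the standard definitions of sensitivity, specificity, and accuracy with free-range eggs as positives. The short sketch below uses a hypothetical helper, `classification_metrics`; the counts follow from the 45 free-range and 41 caged test objects.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    accu = (tp + tn) / (tp + fn + tn + fp)
    return sens, spec, accu

# Test set: 45 free-range (positive) and 41 caged (negative) eggs.
# PLS-DA: 4 free-range and 3 caged eggs misclassified.
plsda = classification_metrics(tp=41, fn=4, tn=38, fp=3)
# LS-SVM: 2 free-range and 2 caged eggs misclassified.
lssvm = classification_metrics(tp=43, fn=2, tn=39, fp=2)

print([f"{100 * v:.1f}%" for v in plsda])  # sensitivity, specificity, accuracy
print([f"{100 * v:.1f}%" for v in lssvm])
```

Both sets of values agree with the percentages reported for the two models.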

Conclusions
As a result, 16

Data Availability
The data supporting the findings of the current study are available from the corresponding author upon request.

Disclosure
Shunping Xie and Chengying Hai are the co-first authors.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Authors' Contributions
Shunping Xie was involved in methodology, writing, and editing. Chengying Hai investigated the study and wrote the original draft. Song He and Huanhuan Lu performed formal analysis. Lu Xu conceptualized and supervised the study and was involved in funding acquisition. Haiyan Fu conceptualized the study and was involved in funding acquisition.