Robust and Automated Internal Quality Grading of a Chinese Green Tea (Longjing) by Near-Infrared Spectroscopy and Chemometrics

Near-infrared (NIR) spectroscopy and chemometric methods were applied to internal quality control of a Chinese green tea, Longjing, with Protected Geographical Indication (PGI). A total of 2745 authentic Longjing tea samples of three different grades were analyzed by NIR spectroscopy. To remove the influence of abnormal samples, the Stahel-Donoho estimate (SDE) of outlyingness was used for outlier analysis. Partial least squares discriminant analysis (PLSDA) was then used to classify the grades of tea based on NIR spectra. Different data preprocessing methods, including smoothing, taking second-order derivative (D2) spectra, and standard normal variate (SNV) transformation, were applied to reduce unwanted spectral variations in samples of the same grade before classification models were developed. The results demonstrate that smoothing, taking D2 spectra, and SNV can improve the performance of PLSDA models. With SNV spectra, the model sensitivity was 1.000, 0.955, and 0.924, and the model specificity was 0.979, 0.952, and 0.996 for samples of the three grades, respectively. FT-NIR spectrometry and chemometrics can provide a robust and effective tool for rapid internal quality control of Longjing green tea.


Introduction
Tea is one of the most popular beverages around the world and is favored for its various health benefits [1,2]. According to the degree of fermentation, teas can generally be classified into three types: unfermented, partially fermented, and fully fermented [3]. In China, although all three types of tea are produced and consumed, green tea is the most favored for its special flavor and taste.
Longjing tea, a green tea produced in Hangzhou and its neighboring areas, has traditionally been recognized as a top-grade green tea for its quality as well as its cultural background [4,5]. Longjing tea leaves are roasted soon after picking to stop the natural oxidation process. When steeped, the flat and straight leaves produce a yellow-green color. Its flavor and taste are very gentle and sweet, although it has one of the highest concentrations of catechins among teas [4,5], which is an important indicator of high-quality green tea.
Because Longjing tea has a very high commercial value, quality control is urgently needed to guard against various counterfeit Longjing teas. Internal grading among authentic Longjing teas is the foundation of this quality control. As Longjing is a green tea with Protected Geographical Indication (PGI), its three producing areas are explicitly defined as West Lake and its neighboring areas (I), Qiantang (II), and Yuezhou (III). It has long been recognized that the quality of Longjing tea can be ranked according to its producing area, namely, I, II, and III. Therefore, it is necessary to develop a rapid and effective method to distinguish different grades of Longjing tea.
Recently, near-infrared (NIR) spectroscopy has been extensively used in food quality control [6-8]. Compared with traditional analytical methods, NIR has several advantages, including (1) reduced sample preparation, labor, and cost of analysis; (2) the potential for nondestructive and online analysis; and (3) comprehensive characterization of multiple components. However, because NIR spectra are often characterized by low spectral resolution and serious peak overlapping, chemometric methods are required to extract useful information concerning food quality from the measured signal. Among various pattern recognition techniques, classification methods are the most frequently used. Commonly used classification or discriminant analysis (DA) methods include support vector machines (SVMs) [9], k-nearest neighbors (KNN) [10], linear discriminant analysis (LDA) [11], and partial least squares discriminant analysis (PLSDA) [12].
This paper aims to develop a rapid analysis method for grading Longjing tea by NIR spectroscopy coupled with PLSDA. Different data preprocessing methods, including smoothing [13], taking second-order derivative (D2) spectra [14], and standard normal variate (SNV) transformation [15], were applied to reduce unwanted spectral variations in samples of the same grade before chemometric models were developed.

Tea Samples and NIR Analysis. A total of 2745 authentic Longjing tea samples of three types were collected from the local tea plants. The detailed information concerning the samples is summarized in Table 1. All of the samples were stored in a cool, dark, and dry place with intact packaging before NIR spectroscopy analysis.
Nondestructive NIR analysis of tea was performed using a TENSOR37 Fourier transform NIR spectrometer (Bruker, Ettlingen, Germany) in the wavenumber range of 4000-12000 cm⁻¹. Each sample was measured in a quartz cup without any pretreatment. For each sample, 32 scans were carried out with a resolution of 8 cm⁻¹ at 25 °C using OPUS software. Increasing the number of scans did not significantly improve the signal. The average of the 32 scans was saved as a raw spectrum for chemometric analysis.

Outlier Diagnosis, Data Splitting, and PLSDA. Outliers are abnormal samples that deviate from the bulk of the samples. For classification, outliers not only bias a model but can also lead to a misleading estimate of model performance. Considering the multivariate nature of NIR spectra, and to avoid the masking effects of multiple outliers, robust diagnosis with dimension reduction techniques is suitable for detecting NIR outliers. The Stahel-Donoho estimate (SDE) of outlyingness [14] was used for outlier diagnosis for each grade of Longjing. SDE projects each high-dimensional sample onto randomly selected directions many times. The SDE outlyingness of each object can then be computed using a robust location estimator (median) and a robust scatter estimator (median absolute deviation, MAD). Objects with especially large SDE outlyingness values were detected as outliers and removed. The number of projections was 500 in this paper.
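The projection-based outlyingness described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sde_outlyingness`, the Gaussian sampling of directions, the unscaled MAD, and the synthetic demonstration data are all assumptions. How the paper scales SDE values before applying its cutoff is not stated, so no threshold is applied here.

```python
import numpy as np

def sde_outlyingness(X, n_directions=500, seed=0):
    """Approximate Stahel-Donoho outlyingness with random projections.

    Hypothetical helper: X has shape (n_samples, n_variables).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Draw random directions and normalize each one to unit length.
    D = rng.standard_normal((X.shape[1], n_directions))
    D /= np.linalg.norm(D, axis=0)
    P = X @ D                                   # projections, shape (n_samples, n_directions)
    med = np.median(P, axis=0)                  # robust location per direction
    mad = np.median(np.abs(P - med), axis=0)    # robust scatter per direction
    mad = np.where(mad > 0, mad, np.finfo(float).eps)  # guard against zero spread
    # Outlyingness of a sample: its largest robust z-score over all directions.
    return np.max(np.abs(P - med) / mad, axis=1)

# Synthetic demonstration data (assumed): 50 objects, 5 variables, one gross outlier.
rng = np.random.default_rng(42)
X_demo = rng.standard_normal((50, 5))
X_demo[7] = 50.0
scores = sde_outlyingness(X_demo)
```

Because both the location and the scatter estimates are medians, a small number of extreme objects has little influence on the reference values, which is what limits masking among multiple outliers.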
With outliers removed, the Kennard and Stone (K-S) algorithm [16] was performed to divide the measured data into a training set and a prediction set. The K-S algorithm can select training objects that cover the measured data space evenly. Suppose that one has an $n \times p$ matrix X of the spectra at $p$ wavelengths for $n$ training objects; for multiclass classification, $n$ is the total number of samples collected from all the $c$ (in this paper, $c = 3$) different classes. A response matrix Y ($n \times c$) is designed corresponding to the category of each object in X. All the elements in Y are initially set to −1, and if an object $i$ ($i = 1, \dots, n$) is from class $j$ ($j = 1, \dots, c$), then the element in the $i$th row and $j$th column of Y is assigned a value of 1. Then, PLS models can be developed to predict each column of Y using X. For prediction, a new object is classified into class $j$ ($j = 1, \dots, c$) when the $j$th element of its predicted response vector is above 0.
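The response coding and the decision rule described above can be illustrated with a short sketch. The function names `encode_classes` and `assign_classes` are hypothetical, class labels are assumed to be 0-based integers, and the PLS regression step itself is omitted.

```python
import numpy as np

def encode_classes(labels, n_classes):
    """Build the response matrix Y: +1 in the column of the object's class, -1 elsewhere."""
    labels = np.asarray(labels)
    Y = -np.ones((labels.size, n_classes))
    Y[np.arange(labels.size), labels] = 1.0
    return Y

def assign_classes(Y_pred):
    """Object i is assigned to class j when the predicted response Y_pred[i, j] is above 0."""
    return np.asarray(Y_pred) > 0.0

# Four hypothetical training objects from classes 0, 2, 1, and 0.
labels = np.array([0, 2, 1, 0])
Y = encode_classes(labels, n_classes=3)

# A hypothetical predicted response vector for one new object.
membership = assign_classes([[0.3, -0.8, -0.1]])
```

Note that a zero threshold applied column by column allows an object to be assigned to no class or to more than one class; how such cases are resolved is not specified in the text.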

Model Validation and Evaluation. For PLSDA, an important parameter is the number of latent variables (LVs), that is, the model complexity. Too many LVs lead to overfitting and poor generalization, while too few LVs underfit the model. In this paper, Monte Carlo cross validation (MCCV) [17] was used to select the number of LVs in the PLSDA model. The number of PLSDA components was chosen to minimize the mean percentage error of MCCV (MPEMCCV):

$$\mathrm{MPEMCCV} = \frac{\sum_{t=1}^{T} M_t}{T \, N_p} \times 100\%,$$

where $T$ is the number of MCCV data splittings, $N_p$ is the number of prediction samples, and $M_t$ is the number of misclassified objects for the $t$th splitting during MCCV.
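The MCCV error described above could be computed along the following lines. This is a sketch under stated assumptions: `mccv_error` and `nearest_centroid` are hypothetical names, the nearest-centroid classifier is only a stand-in for PLSDA, and the demonstration data are synthetic. In practice the function would be evaluated once per candidate number of LVs, and the number giving the lowest error would be kept.

```python
import numpy as np

def mccv_error(X, y, fit_predict, n_splits=20, test_fraction=0.5, seed=0):
    """Mean percentage of misclassified prediction objects over random splits."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    n_test = int(round(test_fraction * n))
    misclassified = 0
    for _ in range(n_splits):
        perm = rng.permutation(n)                 # new random split each time
        test, train = perm[:n_test], perm[n_test:]
        y_hat = fit_predict(X[train], y[train], X[test])
        misclassified += int(np.sum(y_hat != y[test]))
    return 100.0 * misclassified / (n_splits * n_test)

def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier (not PLSDA): assign each object to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Synthetic, well-separated two-class data (assumed for illustration only).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(10.0, 0.1, (20, 3))])
y_demo = np.repeat([0, 1], 20)
error = mccv_error(X_demo, y_demo, nearest_centroid)
```

The defaults of 20 splits and a 50% prediction fraction mirror the settings reported in the Results and Discussion.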
To compare the performance of classification models, the sensitivity and specificity on the test set were computed for each grade as

$$\mathrm{Sens} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{Spec} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}},$$

where TP, FN, TN, and FP denote the numbers of true positives, false negatives, true negatives, and false positives, respectively. In this paper, objects in each grade were in turn denoted as positives, and objects in the other two grades were denoted as negatives.
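As an illustration of the one-grade-versus-the-rest convention above, the two measures can be computed from true and predicted grade labels as sketched here. The function name and the small label vectors are hypothetical.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity and specificity when one grade is treated as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos_true = y_true == positive
    pos_pred = y_pred == positive
    tp = np.sum(pos_true & pos_pred)     # positives predicted as positive
    fn = np.sum(pos_true & ~pos_pred)    # positives predicted as another grade
    tn = np.sum(~pos_true & ~pos_pred)   # negatives predicted as negative
    fp = np.sum(~pos_true & pos_pred)    # negatives predicted as positive
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels for eight test objects of grades 1, 2, and 3.
y_true = np.array([1, 1, 1, 2, 2, 3, 3, 3])
y_pred = np.array([1, 1, 2, 2, 2, 3, 3, 1])
sens, spec = sensitivity_specificity(y_true, y_pred, positive=1)
```

In this toy case, grade 1 has TP = 2, FN = 1, TN = 4, and FP = 1, so the sensitivity is 2/3 and the specificity is 4/5.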

Results and Discussion
Some of the raw NIR spectra of Longjing tea are shown in Figure 1. As seen in Figure 1, the raw spectra of the three grades of Longjing have very similar absorbance patterns, and the signals are characterized by low absorbance and baseline effects. Within each grade, the spectra vary considerably and may overlap with those of the other grades. Therefore, data preprocessing was required to reduce the unwanted variations in each grade. Figure 2 shows the spectra preprocessed by smoothing, taking second-order derivatives (D2), and SNV. Spectral smoothing appears to improve the signal-to-noise ratio (SNR) but cannot remove the baselines in the data. Second-derivative spectra enhance local peak differences, for example, around 7200 cm⁻¹. SNV appears to remove most of the within-grade variations.
The SDE outlyingness diagnosis plots of the three grades of Longjing are shown in Figure 3. According to the 3σ rule, an object with an SDE value above 3 was recognized as an outlier. In total, 4, 9, and 20 objects were removed from grades I, II, and III, respectively. Therefore, 461, 891, and 1360 objects were left for grades I, II, and III, respectively. To investigate the effects of data preprocessing on classification performance, all the PLSDA models were trained and tested with the same data sets. The K-S algorithm was performed on the raw data of each grade to obtain training and test objects. Finally, 1800 objects (grade I, 300; grade II, 600; grade III, 900) were used for training and 912 objects (grade I, 161; grade II, 291; grade III, 460) for prediction.
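A minimal sketch of the Kennard-Stone selection applied to each grade is given below. It assumes Euclidean distances and the common variant that starts from the two most distant objects; the function name `kennard_stone` and the toy data are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select objects chosen by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via the Gram matrix (avoids a 3-D temporary array).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.clip(d2, 0.0, None))
    # Start with the two objects that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining object to its nearest selected object.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the object whose nearest selected neighbor is farthest away.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy one-dimensional data (assumed): objects at 0, 1, 2, and 10.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
train_idx = kennard_stone(X_toy, n_select=3)
```

Run separately on each grade, for example with `n_select` set to 300, 600, and 900, such a selection would give the training objects, and the objects left over would form the prediction set.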
With different preprocessing methods, PLSDA models were developed, and MCCV was performed to estimate the number of latent variables. For MCCV, the original training set was randomly divided 20 times into training (50%) and prediction (50%) objects. The classification results and model parameters of PLSDA with different preprocessing methods are summarized in Table 2. As seen in Table 2, D2 and SNV spectra gave significantly improved prediction accuracy compared with raw and smoothed spectra. The best classification models were obtained by SNV-PLSDA, with sensitivity/specificity of 1.000/0.979, 0.955/0.952, and 0.924/0.996 for Longjing of grades I, II, and III, respectively. Figure 4 presents the misclassification results for the different preprocessing methods. For most of the models, the classification sensitivity and specificity for each grade of tea were above 0.9, indicating the effectiveness of NIR for characterization and classification of Longjing. Moreover, D2 and SNV can reduce unwanted variations by removing part of the baseline and scattering effects; therefore, D2 and SNV should be preferred for spectral preprocessing.

Conclusions
Rapid and reliable internal quality control of Longjing green tea was performed using NIR analysis and chemometrics. Comparison of different preprocessing methods demonstrates that SNV and D2 transformations can effectively reduce unwanted spectral variations in each grade of tea. NIR analysis and pattern recognition methods show potential for nondestructive and rapid discrimination of internal quality grades of Longjing. A practical problem is the seasonal and year-to-year variation in the chemical composition of green tea. Therefore, our future work will focus on developing quality control models for Longjing tea from different harvest seasons and years.

Figure 1: Representative raw NIR spectra of Longjing tea of grade I (a), grade II (b), and grade III (c).

Figure 2: NIR spectra of Longjing tea preprocessed by (a) smoothing, (b) second-order derivatives, and (c) SNV. An artificial shift was added to distinguish different grades (I, II, and III) of Longjing.

Figure 3: SDE outlyingness diagnosis plots of the three grades of Longjing.

Figure 4: PLSDA classification of Longjing grades for the test objects (objects 1-161, grade I; objects 162-452, grade II; objects 453-912, grade III) with raw (a), smoothed (b), second-order derivative (c), and SNV (d) spectra. The correctly classified objects are not displayed. The location of a bar indicates a misclassified object, and the height indicates to which grade it was wrongly assigned.

Table 1: Longjing tea of different grades.

Table 2: Classification results of three grades of Longjing with different preprocessing methods.
a The number of PLSDA components used to distinguish one grade from the other two grades.
b TP/(TP + FN).
c TN/(TN + FP).