Stable Isotope Ratio and Elemental Profile Combined with Support Vector Machine for Provenance Discrimination of Oolong Tea (Wuyi-Rock Tea)

This paper focused on an effective method to discriminate the geographical origin of Wuyi-Rock tea by the stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea (n = 99) collected from nine producing areas and non-Wuyi-Rock tea (n = 33) from eleven nonproducing areas were analysed for SIR and MEP by established methods. The SVM model based on coupled data produced the best prediction accuracy (0.9773). This prediction shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ2H, δ18O, Cs, Cu, Ca, and Rb contents are significant indications for provenance discrimination and not all of the metallic elements improve the prediction accuracy of the SVM model.


Introduction
Oolong tea is a traditional beverage favoured by consumers all over the world for its pleasurable aroma and taste. In addition, Oolong tea is a rich source of antioxidants, such as tea polyphenol and tea polysaccharide, so it is also reported as a functional drink that combats obesity, hypoglycaemia, and oral bacterial infection [1][2][3][4].
For tea products, aroma and savor are influenced by many factors, such as geographical origin [5,6], tea species, cultivation, and processing method [7,8]. Among these factors, the geographical and natural conditions in which the tea trees grow are widely perceived to be key. Therefore, in China, the majority of famous teas are named for their provenance, such as the Anxi-Tieguanyin tea, the West Lake-Longjing tea, and the Anji-White tea.
Wuyi-Rock tea was originally cultivated on Wuyi Mountain in the north of Fujian Province. Owing to the unique climate and edatope of Wuyi Mountain, Wuyi-Rock tea (WRT) is recognized as one of the most prestigious Oolong teas for its special savor and long-lasting fragrance. Therefore, Wuyi-Rock tea has been awarded a protected geographical indication (PGI) and is exported to more than 30 countries. However, the actual yield of WRT is limited, and it cannot satisfy consumer demand. In various markets, many teas labeled as Wuyi-Rock tea were actually cultivated outside the protected production area; some of them were not even cultivated in Fujian Province. Although the taste and aroma of non-Wuyi-Rock teas (NWRT) are inferior to those of authentic WRT, teas planted in different geographical origins still have a similar appearance and can hardly be distinguished by the naked eye. In traditional sensory analysis, WRT is tasted by professional tea tasters, and counterfeits are identified based on a series of sensory scores. However, the result of sensory analysis depends a great deal on the subjective decisions of tea tasters, and well-trained tea tasters are hard to find. Therefore, there is an urgent demand for a more effective and stable technique to discriminate the provenances of tea products.
In recent years, analytical methods based on instrumental technology have been widely applied in food quality control [9,10]. Isotope ratio mass spectrometry (IRMS) is a technique for measuring the isotope content, which is a highly indicative parameter in provenance discrimination. For example, the stable isotope ratio of hydrogen (δ2H) in plants is influenced by the latitude and altitude of the production site. The concentration of deuterium in the water decreases when clouds form above the ocean. Then, as rainwater falls and the clouds move inland and gain altitude, the content of 2H in rainwater decreases gradually [11]. As a result, an isotopic gradient exists in groundwater from coast to inland. Moreover, the variation of oxygen-18 (18O) follows the same pattern as hydrogen in the hydrosphere [12]. The isotope ratio of carbon (δ13C) is strongly environmentally dependent; plants cultivated in humid environments have a lower δ13C than plants in arid environments. The isotope ratio of nitrogen (δ15N) is influenced by agricultural practices; plants treated with organic fertilizer develop a higher 15N content than plants treated with chemical fertilizer. For provenance analysis, IRMS has been successfully used for the analysis of orange juice, fruits, cow milk, and wine [13][14][15][16]. In addition to isotopes, trace element profiling is also available. There are various metallic elements in agricultural food; some of them are easily affected by edaphic and environmental factors, such as fertilization, soil type, climate, and temperature. Both inductively coupled plasma mass spectrometry (ICP-MS) and atomic spectroscopy are tools for quantitative determination of trace elements. In provenance discrimination, element profiling has been used for honey, onion, black tea, and wine [17][18][19][20][21].
This paper aims to develop an automatic analytical method for discriminating geographical origin of WRT by stable isotope and trace element contents. To model the complicated relationship between the measured data and production site, nonlinear multivariate classification models are usually used [22][23][24][25]. Compared with other nonlinear models such as kernel partial least squares (PLS) and artificial neural networks (ANNs), support vector machine (SVM) analysis is more suitable for the target data in this experiment because of the small sample size [26][27][28]. Consequently, the classification model was built based on SVM. Then, each variable of isotope and element data was ranked by its contribution to the model.

Tea Samples.
In total, 99 authentic WRT samples were collected from 11 main producing areas in Wuyishan, and 33 NWRT samples were collected from 11 different production sites. All of the samples were made from spring tea picked in 2015. Before analysis, the samples were kept in cold, dry storage in lightproof packaging. Detailed information about these samples is given in Table 1 and Figure 1.
The origin of Wuyi-Rock tea is the administrative area of Wuyishan city, which has 11 subdistricts. Sample collection covered the whole Wuyishan city area, so the number of samples is adequate. All the tea samples from Wuyishan city were obtained directly from local tea processing facilities with the help of the official department. All of these samples belonged to "Wuyi-Rock tea," whereas the non-Wuyi-Rock tea samples were purchased outside the protected production area, in places such as Jianyang, Jianou, and Ganzhou.

Isotopic Ratio Determinations.
δ13C, δ15N, δ18O, and δ2H were measured using a MAT-253 isotope ratio mass spectrometer (Thermo Fisher, USA) connected to a Flash-2000 organic elemental analyser (Thermo Fisher, USA). During carbon and nitrogen isotope analysis, the quartz tube of the reactor was packed with chromium trioxide, high purity copper, and silver cobalt oxide to completely oxidize the organic matter. The carbon and nitrogen, carried by helium gas, entered the IRMS in the form of CO2 and N2, respectively. Standard CO2 and N2 gases were used as reference gases before and after the isotope test of organic matter, and the instrument state during sample analysis was checked with standards such as labeled urea, IAEA-600, IAEA-CH-3, and VPDB (Vienna Pee Dee Belemnite). Similarly, during hydrogen and oxygen isotope analysis, the instrument state was checked with benzoic acid, IAEA-601, IAEA-602, IAEA-CH-7, and VSMOW (Vienna Standard Mean Ocean Water). Each sample was measured three times.
The measured values (13C/12C, 15N/14N, 18O/16O, and 2H/1H) are usually presented as isotopic deviations, δ, defined as follows:

$$\delta\ (\text{‰}) = \left(\frac{R}{R_{\mathrm{std}}} - 1\right) \times 1000,$$

where $R$ is the measured value of the sample and $R_{\mathrm{std}}$ is the measured value of an international standard. $R$ and $R_{\mathrm{std}}$ are the ratios of the heavier isotope to the lighter isotope of an element.
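The isotopic deviation δ described above can be illustrated with a short calculation. The following is a minimal sketch, assuming the conventional per mil form of δ; the function name `delta_permil` and both ratio values are hypothetical and chosen only to show the arithmetic, not taken from the measurements in this paper.

```python
def delta_permil(r_sample: float, r_std: float) -> float:
    """Isotopic deviation in per mil: (R / R_std - 1) * 1000."""
    if r_std <= 0:
        raise ValueError("the standard ratio must be positive")
    return (r_sample / r_std - 1.0) * 1000.0


# Hypothetical heavy/light isotope ratios (illustration only, not real standards).
R_STD_EXAMPLE = 0.011200
R_SAMPLE_EXAMPLE = 0.010920

delta_example = delta_permil(R_SAMPLE_EXAMPLE, R_STD_EXAMPLE)
print(f"delta = {delta_example:.1f} per mil")  # prints: delta = -25.0 per mil
```

A negative δ means the sample is depleted in the heavier isotope relative to the standard; a positive δ means it is enriched.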
In isotope profiling, calibration was conducted according to calibrated-urea and calibrated-benzoic acid standards as well as the IAEA-600, IAEA-601, IAEA-602, IAEA-CH-3, and IAEA-CH-7 standards of the International Atomic Energy Agency (IAEA, Vienna).

Metal Determinations.
Before metallic element detection, all of the tea samples were pretreated by microwave-assisted digestion. The samples were dried before the digestion process (placed in an oven for 4 hours at 80 °C). All of the tea samples were manufactured in May 2015 and analysed simultaneously. The water content of the tea samples was less than 6% before drying. A 0.3 g portion of each dried tea sample was placed into a digestion vessel. Then, 1 mL of ultrapure water and 5 mL of nitric acid were added to the vessel. Finally, the digestion vessel was heated in the microwave cavity (2450 MHz). The gas pressure program of the digestion procedure was as follows: (1) ramping from normal pressure to 0.5 MPa and holding at 0.5 MPa for 70 s, (2) ramping to 1.0 MPa for 50 s, (3) ramping to 1.5 MPa for 50 s, and (4) ramping to 2.0 MPa for 300 s. After the digestion, the excess nitric acid was evaporated at 130 °C. When the digestion vessel had cooled to room temperature, the digested liquid was transferred to a volumetric flask and diluted to 50 mL with ultrapure water. Subsequently, the concentrations of 14 metallic elements were measured. The concentrations of Ti, Cr, Co, Ni, Cu, Zn, Rb, Cd, Cs, Ba, and Sr were determined using an X Series II ICP-MS (Thermo Fisher, USA). The working conditions and parameters for ICP-MS are shown in Table 2. The concentrations of Ca, Mg, and Mn were analysed by HITACHI 180-50 flame atomic absorption spectroscopy (FAAS, HITACHI, Japan). The main parameters for FAAS are presented in Table 3. The results of ICP-MS and FAAS were calibrated with a mixed standard solution (GSB04-1767-2004) and a biological component analysis standard substance, tea (GBW10052), respectively.
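Because 0.3 g of dried tea is digested and diluted to 50 mL, a concentration read from the diluted digest has to be scaled back to the solid sample. The sketch below shows that conversion under stated assumptions: the instrument reading is taken to be in µg/L, no blank correction or further dilution is applied, and the function name and the example reading are hypothetical.

```python
def content_mg_per_kg(solution_ug_per_l: float,
                      final_volume_ml: float = 50.0,
                      sample_mass_g: float = 0.3) -> float:
    """Convert a digest concentration (ug/L) into dry-tea content (mg/kg)."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    # Mass of analyte in the volumetric flask, in micrograms.
    micrograms_in_flask = solution_ug_per_l * (final_volume_ml / 1000.0)
    # ug per g of dry tea is numerically equal to mg per kg.
    return micrograms_in_flask / sample_mass_g


# Hypothetical reading of 120 ug/L in the 50 mL digest of a 0.3 g sample.
example_content = content_mg_per_kg(120.0)
print(f"{example_content:.1f} mg/kg")  # prints: 20.0 mg/kg
```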

Data Splicing.
All the data analysis was performed using MATLAB 7.14.0.739 (Mathworks, Sherborn, MA). For data splicing, the IRMS data can be described as an $n \times p$ matrix $A$ with $n$ rows and $p$ columns, where $n$ is the number of samples and $p$ is the number of features ($n = 132$, $p = 4$). In the same way, an $n \times q$ matrix $B$ ($q = 14$) was obtained from metallic element detection. Then, the columns of matrix $B$ were appended after the last column of matrix $A$. As a result, a union matrix $C$ (with $n$ rows and $p + q$ columns) was formed that contains both the isotope and metallic element information. Before data analysis, each variable in matrix $C$ was normalized as follows:

$$x'_{ij} = \frac{x_{ij} - x_{\min,j}}{x_{\max,j} - x_{\min,j}},$$

where $x_{ij}$ is the value in the $i$th ($i = 1:132$) row and $j$th ($j = 1:18$) column, $x_{\max,j}$ is the maximum value in the $j$th column, and $x_{\min,j}$ is the minimum value in the $j$th column.
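The splicing and column-wise normalization steps can be sketched in a few lines. This is an illustrative sketch in Python with NumPy rather than the MATLAB code used for the paper; the random placeholder data, the variable names, and the assumption that the normalization is min-max scaling to the range [0, 1] are all introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 132, 4, 14                 # samples, isotope features, element features

# Placeholder data standing in for the measured matrices (not real values).
A = rng.normal(size=(n, p))          # isotope block
B = rng.lognormal(size=(n, q))       # metallic element block

# Data splicing: append the element columns after the isotope columns.
C = np.hstack([A, B])                # shape (132, 18)

# Column-wise min-max normalization.
col_min = C.min(axis=0)
col_max = C.max(axis=0)
span = col_max - col_min
span[span == 0] = 1.0                # guard against constant columns
C_norm = (C - col_min) / span

print(C_norm.shape)                  # prints: (132, 18)
```

Scaling each column separately keeps elements present at high concentrations, such as Ca, from dominating the distance computations inside the classifier.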

SVM Analysis.
The support vector machine algorithm is a supervised machine learning model for classification and regression. The kernel function is the main factor in the SVM algorithm. Kernels have the advantage of operating in the input space, where the solution of the classification problem is a weighted sum of kernel functions evaluated at the support vectors. SVM is designed to find an optimal hyperplane that divides all the sample units into two classes in a multidimensional space. The optimal hyperplane lies midway between the nearest points of the two classes and makes the distance to them as large as possible. For $N$ variables, the optimal hyperplane has $N - 1$ dimensions [29]. After the data splicing was performed, three SVM classification models were established, based on the isotope data (matrix A), the metallic element data (matrix B), and the union data (matrix C). Of the 132 samples (99 WRT samples and 33 NWRT samples), 88 (including both WRT and NWRT samples) were selected at random as the training set, and the other 44 were placed in the prediction set.
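The random 88/44 split and the model fitting described above can be sketched as follows. This sketch uses scikit-learn instead of the MATLAB environment used in the paper, and the RBF kernel, the hyperparameters, the random seed, and the synthetic data are assumptions made for illustration; the source does not state the kernel or the parameter settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized union matrix: 99 WRT (1) and 33 NWRT (0).
y = np.array([1] * 99 + [0] * 33)
X = rng.normal(size=(132, 18)) + 1.5 * y[:, None]

# Random split into 88 training samples and 44 prediction samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=44, random_state=0
)

# Kernel and hyperparameters are assumed, not taken from the paper.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"training: {len(y_train)}, prediction: {len(y_test)}, accuracy: {accuracy:.4f}")
```

The same code can be run on the isotope columns alone, the element columns alone, or all 18 columns to mirror the three models compared in the paper.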
To estimate the performance of the SVM model, the sensitivity and specificity were calculated as follows:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP},$$

where TP and FN represent the number of true positives and false negatives, respectively, and TN and FP denote the number of true negatives and false positives, respectively.
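Sensitivity and specificity follow directly from the four confusion counts. The sketch below computes them from lists of true and predicted labels; treating WRT as the positive class (label 1) is an assumption, since the text does not say which class is positive, and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a two-class problem."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of positive samples that were predicted positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of negative samples that were predicted negative."""
    return tn / (tn + fp)


# Hypothetical labels: 1 = WRT (assumed positive class), 0 = NWRT.
y_true_example = [1, 1, 1, 1, 0, 0]
y_pred_example = [1, 1, 1, 0, 0, 1]

tp, fn, tn, fp = confusion_counts(y_true_example, y_pred_example)
print(sensitivity(tp, fn), specificity(tn, fp))  # prints: 0.75 0.5
```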

Variable Ranking.
In SVM models, the variables do not contribute equally to prediction accuracy. In addition, uninformative variables may even have a negative influence on prediction, so the significance of each measured isotope and element was investigated in this paper. For each feature, the corresponding column was removed from the data matrix and a new SVM model was built on the incomplete data. In this way, each feature of the isotope and element data was removed separately, and 18 models were built. The models were then compared: if a model showed a lower accuracy, the missing feature was considered important in provenance discrimination. Using this method, every variable was ranked by its contribution.
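The remove-one-feature procedure can be sketched independently of the classifier. In the sketch below, the scoring function is pluggable; the nearest-centroid scorer is a simple stand-in used so the example depends only on NumPy, and it is not the SVM used in the paper. The function names, feature names, and synthetic data are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in scorer (not an SVM): assign each sample to the closer class centroid."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float(np.mean(predicted == y_test))


def rank_by_removal(X_train, y_train, X_test, y_test, names, scorer):
    """Remove one column at a time; a lower accuracy marks a more important feature."""
    results = []
    for j, name in enumerate(names):
        keep = [k for k in range(X_train.shape[1]) if k != j]
        acc = scorer(X_train[:, keep], y_train, X_test[:, keep], y_test)
        results.append((name, acc))
    # Most important feature first (largest accuracy loss when removed).
    return sorted(results, key=lambda item: item[1])


# Synthetic data: only the first feature separates the two classes.
rng = np.random.default_rng(0)
y = np.array([1] * 60 + [0] * 60)
X = rng.normal(size=(120, 3))
X[:, 0] = 3.0 * y + 0.5 * X[:, 0]

order = rng.permutation(120)
train, test = order[:80], order[80:]
ranking = rank_by_removal(X[train], y[train], X[test], y[test],
                          ["f0", "f1", "f2"], nearest_centroid_accuracy)
for name, acc in ranking:
    print(f"without {name}: accuracy {acc:.3f}")
```

To reproduce the setting of the paper, `scorer` would be replaced by a function that fits an SVM on the training block and returns its accuracy on the prediction block.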

Results and Discussion
δ13C in plant samples is mainly affected by the metabolic pathway of plant photosynthesis, so δ13C differs significantly between different plants, while δ15N is mainly influenced by regional agricultural activities such as fertilization [12]. δ2H and δ18O are affected by the atmospheric water cycle. They show latitude and continental (land) effects through the meteorological cycle of evaporation, condensation, and precipitation. Decreasing temperatures cause a progressive heavy-isotope depletion of the precipitation as water vapour from oceans in equatorial regions moves to higher latitudes and altitudes [11]. δ2H and δ18O in plants are affected by δ2H and δ18O in the surrounding environment, so they are widely used to characterize the origin of agricultural products [12]. The characteristics of metallic elements in plants are not only related to the composition of mineral elements in the soil but are also affected by variety, climate, and agricultural activities [30,31]. Alkali metals, especially Cs and Rb, are easily mobilised in the soil and transported into plants, and are therefore good indicators of geographical identity [12]. Cu, Zn, and Cd in the soil are affected by agricultural activities (organic fertilizer), so these elements in plants are also affected [32][33][34]. It is therefore necessary to evaluate the influence of these trace elements in the identification of tea samples. The results of isotope and metallic element profiling are presented in Tables 4 and 5. The tables show considerable differences in isotope and metallic content, but distinguishing the provenance of a sample from these values alone proved difficult. Chemometric models are powerful tools in such situations.
Three SVM models based on isotope data, element content data, and coupled data were established, and their prediction results are shown in Table 6. The accuracy of the isotope-SVM model reached 0.9318, with only 3 samples mispredicted. Although the performance of the element-SVM model (0.7727) was not very satisfactory, coupling the element data with the isotope data greatly improved the predictions, and the accuracy of the coupled model reached 0.9773.
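The reported accuracies can be related to counts of mispredicted samples in the 44-sample prediction set. In the short check below, the 3 errors of the isotope model are stated in the text, whereas the 10 and 1 errors for the other two models are inferred here from the reported accuracies and are not stated in the source.

```python
N_PREDICTION = 44  # size of the prediction set given in the text


def accuracy(n_wrong: int, n_total: int = N_PREDICTION) -> float:
    """Fraction of correctly predicted samples."""
    return (n_total - n_wrong) / n_total


# 3 errors is stated in the text; 10 and 1 are inferred from the accuracies.
for label, n_wrong in [("isotope", 3), ("element", 10), ("coupled", 1)]:
    print(f"{label}: {accuracy(n_wrong):.4f}")
# prints:
# isotope: 0.9318
# element: 0.7727
# coupled: 0.9773
```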
The rank of each feature is reported in Table 7. In the table, δ2H, δ18O, Cs, Cu, Ca, and Rb contents are ranked highest, so they are the most significant indicators in provenance analysis of WRT. Moreover, in further analysis, the features were added cumulatively in rank order, and the variation of accuracy as each new variable was added is plotted in Figure 2. As shown in Figure 2, the prediction accuracy was reduced when the δ18O feature was added. This result may be caused by overlapping chemical information between δ2H and δ18O, because they vary analogously in the hydrosphere. Therefore, the relationship between δ2H and δ18O was examined, and the correlation coefficient between them reached 0.8634, which strongly supports this assumption. Afterwards, the model achieved better performance as the variables δ15N and δ13C were added. In element profiling, the SVM model achieved the best prediction result when only Cs and Cu contents were used. The prediction accuracy decreased as more element features were used, so some metallic elements were not significant in identifying the geographical origin of Wuyi-Rock tea.
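A correlation check of this kind can be sketched with a plain Pearson coefficient. The source does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the δ2H and δ18O values below are hypothetical numbers for illustration, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical delta values in per mil (illustration only).
d2h_example = [-60.1, -55.3, -48.7, -52.0, -45.2, -58.4]
d18o_example = [24.0, 24.9, 26.3, 25.1, 26.8, 24.6]

r_example = pearson_r(d2h_example, d18o_example)
print(f"r = {r_example:.4f}")
```

A coefficient close to 1 indicates that the two features carry largely the same information, which is consistent with the explanation given for the drop in accuracy when δ18O is added after δ2H.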

Conclusion
The place of origin of Wuyi-Rock tea is a typical Danxia landform comprising purple soil, red soil, and moist sandy soil, and its microenvironment is unique. According to the results of tea identification, the importance of δ2H and δ18O reflects the particular climatic environment in Wuyishan. δ13C mainly reflects the difference between different plants, and δ15N is easily influenced by agricultural activities [12]; therefore, δ13C and δ15N are not significant in the identification. Cs, Rb, and Sr contribute more to Wuyi-Rock tea discrimination than the great majority of elements, which illustrates that the special geology of the Wuyishan area gives Wuyi-Rock tea unique trace element features. The contents of Cu, Ca, and Zn in Wuyi-Rock tea are affected by many factors, such as soil type, fertilization, and tea variety, so the relationship between identification and these factors in Wuyishan tea fields needs further investigation. In this paper, isotope and metallic element analyses demonstrate potential for geographical origin discrimination of Wuyi-Rock tea. SVM, a nonlinear model, was used for classification, and the chemical information of isotopes and metallic elements proved complementary in provenance discrimination. In addition, the isotope and element features were ranked by their contribution to the model. The results show that δ2H, δ18O, Cs, Cu, Ca, and Rb contents are significant in provenance analysis, that δ2H and δ18O are interrelated, and that not every element is helpful in geographical origin discrimination.