Geographical Origin Traceability of Red Wines Based on Chemometric Classification via Organic Acid Profiles

A preliminary study on the chemometric classification of red wines produced from different grape varieties and geographical origins was performed based on their chromatographic profiles of organic acids. Tartaric, malic, citric, lactic, acetic, and succinic acids in wines were determined via high performance liquid chromatography (HPLC). Employing multivariate statistical methods including principal component analysis (PCA) and linear discriminant analysis (LDA), pattern recognition models were built for the classification of the investigated wines by grape variety and geographical origin. The PCA clearly grouped the wines according to variety, and the LDA further offered 100% classification ability toward geographical identification of the wines. The leave-one-out cross-validated assignments were 100%, 86.7%, and 100% correct for Cabernet Sauvignon, Merlot, and Pinot Noir wines, respectively. The results reveal the potential of using chromatographic profiles of organic acids as the characteristic indices for chemometric classification of red wines.


Introduction
Wine is one of the most popular and most widely consumed alcoholic beverages in the world. Moderate intake of wine has been reported to be beneficial for the human immune system and associated with better cognitive performance [1]. The classification of wines according to grape variety, year of vintage, and geographical origin has attracted considerable attention in the last decade [2–17]. Grape variety and cultivar identification is one of the most important aspects of wine classification and consumer protection, as in some cases the mixing of wines made from different grape varieties is not declared.
Organic acids are another group of possible fingerprint compounds in wines [14,17], although these compounds are also related to yeast and bacterial metabolism. The most important organic acids commonly found in wines are tartaric acid (TA), malic acid (MA), citric acid (CA), lactic acid (LA), acetic acid (AA), and succinic acid (SA). TA, MA, and CA fundamentally originate from the grape, while the biologically relevant AA, LA, and SA mainly stem from the alcoholic and malolactic fermentation. Choosing organic acids as the indices, Regmi et al. [14] reported the chemometric discrimination of wines, vinegars, and spirits by the direct determination of organic acids via Fourier transform infrared spectroscopy. Gallagher et al. [17] reported the pattern-based discrimination of six wine varietals by a 3 × 3 indicator displacement array composed of three boronic acid receptors and three colorimetric indicators. Although researchers have recognized the feasibility of using organic acids as indices for wine classification, to the best of our knowledge, there is no report of the differentiation of wines directly based on the chromatographic profiles of organic acids in wines. Herein, we report a preliminary study on the chemometric classification of red wines according to variety and geographical origin based on the chromatographic profiles of six organic acids. The aim of this work is to verify whether wines of different varieties and origins can be discriminated using six organic acids alone as the fingerprint. We analyzed red wines produced from different grape varieties, in recent years of vintage, and from geographical origins around the world, to study their differences and establish an objective method for differentiating wine varieties and origins.

Sample Collection and Preparation.
A total of 20 commercial red wines were analyzed. Wines were selected to cover the common grape varieties produced in different countries (Table 1). Prior to HPLC analysis, wines were opened and filtered through a 0.45 μm polytetrafluoroethylene (PTFE) membrane. Subsequently, 10.0 mL of the filtrate was transferred into a 100 mL volumetric flask and diluted to the mark with ultrapure water.
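Because the filtrate is diluted from 10.0 mL to 100 mL before injection, a concentration read from the calibration curve has to be multiplied by the dilution factor to give the content in the original wine. A minimal sketch of that back-calculation follows; the function name and the example reading are hypothetical and are not taken from the study.

```python
def wine_concentration(measured_mg_per_l, aliquot_ml=10.0, final_volume_ml=100.0):
    """Back-calculate the content in undiluted wine from the diluted injection.

    The default volumes follow the sample preparation described in the text
    (10.0 mL of filtrate diluted to 100 mL).
    """
    dilution_factor = final_volume_ml / aliquot_ml
    return measured_mg_per_l * dilution_factor


# Hypothetical reading of 25.0 mg/L in the diluted sample.
content_in_wine = wine_concentration(25.0)
```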

Standard Preparation.
Organic acid stock standard solution (1 g/L) was prepared by dissolving the required amount of organic acid in water. The stock solution was stored in the dark at 4 °C. Working solutions of lower concentrations were freshly prepared from the above stock solution prior to use.

HPLC Analysis.
High performance liquid chromatography (HPLC) analysis was performed with an Agilent 1200 system (Agilent, USA) consisting of an isocratic pump (model G1310A), a UV detector (model G1314B), and an injection valve (model G1328B) with a 20 μL sample loop. An Alltima C18 column (250 mm × 4.6 mm ID, 5 μm, Grace, USA) was used with a Yamatake HT-230A column oven.
To prepare the mobile phase, 1.0 mL of 0.1 M pH 2.7 phosphate stock solution was diluted with 50 mL water; subsequently 5 mL MeOH was added to the diluted phosphate aqueous solution, and the mixture was then diluted to 100 mL with water and mixed thoroughly. The obtained 10 mM phosphate solution containing 5% (v/v) MeOH was filtered through a 0.45 μm membrane filter and ultrasonicated for 20 min to yield the final eluent.
For the HPLC settings, the flow rate was set at 1.0 mL/min, the column temperature was fixed at 25 °C, the injection volume of 20 μL was determined by the sample loop, and the variable wavelength detector was operated at 215 nm. The diluted wine filtrate was injected into the HPLC system to determine the contents of the six organic acids.
Linearity test solutions were prepared from the working solutions of the six organic acids at seven concentration levels. Calibration curves were obtained by plotting the peak areas versus concentrations. The limit of detection (LOD) and limit of quantification (LOQ) of the method were estimated by injecting a series of diluted solutions of known concentration and were taken as the sample concentrations that produce a peak with a height three times and ten times the level of the baseline noise, respectively. The noise level was 0.1 mAU.
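The calibration and signal-to-noise criteria above can be illustrated with a short sketch. It is not the procedure used in the study, where LOD and LOQ were found empirically by injecting diluted solutions. The seven calibration points and the detector sensitivity (peak height per unit concentration) are hypothetical values chosen for illustration, and peak height is assumed to be proportional to concentration near the detection limit. Only the 0.1 mAU noise level and the 3× and 10× criteria come from the text.

```python
import numpy as np


def fit_calibration(conc, area):
    """Least-squares line area = slope * conc + intercept, with its R^2."""
    conc = np.asarray(conc, dtype=float)
    area = np.asarray(area, dtype=float)
    slope, intercept = np.polyfit(conc, area, 1)
    predicted = slope * conc + intercept
    ss_res = np.sum((area - predicted) ** 2)
    ss_tot = np.sum((area - area.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def conc_at_signal_to_noise(ratio, noise_mau, height_per_conc):
    """Concentration whose peak height equals ratio * noise.

    Assumes peak height is proportional to concentration (an assumption of
    this sketch, not a statement from the study).
    """
    return ratio * noise_mau / height_per_conc


# Hypothetical seven-level calibration (mg/L versus peak area).
conc = [1, 2, 5, 10, 20, 50, 100]
area = [2.1, 4.0, 10.2, 19.8, 40.5, 99.0, 201.0]
slope, intercept, r2 = fit_calibration(conc, area)

noise = 0.1        # mAU, baseline noise reported in the text
sensitivity = 0.5  # mAU per mg/L, hypothetical
lod = conc_at_signal_to_noise(3, noise, sensitivity)
loq = conc_at_signal_to_noise(10, noise, sensitivity)
```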
The run-to-run repeatability and day-to-day reproducibility of both the retention time and peak area were measured. To evaluate the intraday repeatability, the precisions (RSDs) of the retention times and peak areas were calculated from five replicate determinations of a mixture of 2.5 mg/L SA, 30 mg/L each of TA, MA, and CA, 50 mg/L LA, and 100 mg/L AA. The interday reproducibility was estimated by calculating the RSDs of the retention times and peak areas from five repetitive injections of the same standard mixture performed on three consecutive working days.
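The precision figures are relative standard deviations. A minimal sketch of the calculation is given below; it uses the sample standard deviation (n − 1 in the denominator), which is an assumption since the text does not state the convention, and the replicate values are hypothetical.

```python
import numpy as np


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD, ddof=1, assumed)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


# Hypothetical retention times (min) from five replicate injections.
retention_min = [5.02, 5.03, 5.01, 5.02, 5.04]
intraday_rsd = rsd_percent(retention_min)
```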

Statistical Analysis.
A data matrix was constructed from the organic acid profiles, with rows representing wine samples and columns corresponding to organic acid contents. Autoscaling was then applied to give variables with zero mean and unit standard deviation; that is, the column mean was subtracted from each entry, and the result was divided by the column standard deviation. PCA was then employed to discriminate the wine varieties, and, subsequently, LDA was used to derive a classification rule whereby the wine samples were classified according to both variety and origin. All statistical analyses were performed using IBM SPSS, version 19.0 (IBM, USA).
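The autoscaling step can be written in a few lines. The sketch below uses NumPy rather than SPSS, and it assumes the sample standard deviation (ddof=1); the small matrix is hypothetical.

```python
import numpy as np


def autoscale(X):
    """Column-wise autoscaling: subtract the mean, divide by the standard deviation.

    ddof=1 (sample standard deviation) is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical 3 samples x 2 variables matrix.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])
Z = autoscale(X)
```

After scaling, every column has zero mean and unit standard deviation, so acids present at high levels do not dominate the analysis merely because of their magnitude.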

Analysis of Organic Acids in Wines.
Simultaneous determination of various organic acids in wines by HPLC has been well documented [18,19]. Based on the literature methods, phosphate solution containing 5% (v/v) MeOH and a conventional C18 column were used as the mobile and stationary phase, respectively. The instrumental quality parameters are listed in Table 2. The method presented good linearity (R² > 0.994) for all six organic acids over wide concentration ranges. The LOQ was low enough for the analysis of organic acids in wines. The precisions (RSDs) of the retention times and peak areas of five replicate determinations within one day and five repetitive injections on three consecutive working days were below 2% and 10%, respectively. The above analytical results demonstrate that the employed HPLC method is of satisfactory sensitivity, precision, and stability.
Determination of the six organic acids in the selected wines was then performed by HPLC. A typical chromatogram of a Merlot wine is depicted in Figure 1. As shown in Table 3, all six organic acids were found in virtually all the wines; the exception was MA, which was not detected (below LOQ) in four wines. TA, AA, and LA were the three major organic acids found in the wines, while MA, CA, and SA were present at relatively low contents. In terms of distribution, the MA content presented the largest variability among the wines (RSD = 123%), and the RSDs of the other acids ranged from 38% to 66%. The organic acid profiles displayed a different pattern for each sample, on which basis the chemometric classification of these wines was expected to be feasible.

Principal Component Analysis.
The obtained 20 samples × 6 variables × 3 replicates data matrix was processed by PCA. Two principal components (PCs) explaining 58.9% of the total variance in the data were obtained. As presented in Figure 2(a), the wines of different varieties were clearly separated from each other. The first PC, which explained 36.6% of the total variance, was mainly contributed by CA and MA (loadings > 0.8). The second PC (22.1% of the total variance) correlated positively with SA (loading > 0.75) and negatively with LA (loading < −0.67). As CA and MA both essentially originate from the grape, while SA and LA are mainly produced during fermentation, it can be assumed that CA and MA originally present in grapes are the most significant variables for PCA classification of variety, while SA and LA produced during fermentation play a secondary role. TA, also one of the native organic acids in grapes, did not appear as a major characteristic index. This is understandable considering that some of the detected TA could be exogenous and therefore unrelated to variety, as addition of TA to wines is legal in some countries but not in others.
The loading plot (Figure 2(b)) expresses the extent to which the new PCs correlate with the original variables. By comparing the PCA score and loading plots, the dominant variables that make one object differ from others can be identified. For instance, Cabernet Sauvignon wines were found in the lower left quadrant of the score plot; they differed from other wines in their higher AA content, which was found in the lower left quadrant of the loading plot, and in their lower SA and MA contents, which were located on the opposite side in the upper right quadrant. Although the direct comparison of the loadings as absolute values is not appropriate, it is still possible that the conclusions obtained from this particular plot are valid.
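The relation between scores, loadings, and explained variance described above can be sketched as a PCA on the correlation matrix of the autoscaled data. This is a NumPy illustration, not the SPSS procedure used in the study. Loadings are scaled as eigenvector × √eigenvalue so that each loading equals the correlation between a PC and an original variable, and the random input matrix is purely hypothetical.

```python
import numpy as np


def pca_correlation(X, n_components=2):
    """PCA of autoscaled data via eigendecomposition of the correlation matrix.

    Returns scores, loadings (correlations between PCs and variables), and
    the fraction of total variance explained by each retained PC.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    loadings = eigvecs * np.sqrt(eigvals)
    explained = eigvals / corr.shape[0]                # trace of corr = number of variables
    return scores, loadings, explained


# Hypothetical 60 rows (e.g. 20 samples x 3 replicates) by 6 acids.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 6))
scores, loadings, explained = pca_correlation(X_demo)
```

Plotting the two columns of `scores` gives a score plot and plotting the rows of `loadings` gives a loading plot; a variable and a group of samples lying in the same quadrant indicates that the group is relatively rich in that variable.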
A sample (R20) made from a combination of Shiraz (20%) and Grenache (80%) was selected as a pseudomixed example to assess the ability of the model to reveal possible undeclared mixing of wines made from different grape varieties. The PCA result showed that this mixed sample was located far from the group of pure Shiraz. Instead, it appeared adjacent to Merlot. This result may be explained by assuming that the organic acid profile of the combination of Shiraz (20%) and Grenache (80%) is similar to that of Merlot. However, as wine made from pure Grenache was not available in this study, this assumption could not be verified. Moreover, whether the mixing ratio of different grape varieties can be determined by HPLC analysis of organic acids, and down to what ratios a mixture of different varieties can be identified, both need further investigation.
As can be seen from Figure 2(b), the main contributing variables to the first two PCs were CA, MA, SA, and LA. The feasibility of using only these four acids as variables to discriminate the wines was therefore tested. Unfortunately, the classification of the wine varieties was not achieved by using only four acids as indices. The PCA result showed that the scores of Merlot, Cabernet Sauvignon, and Pinot Noir wines overlapped with each other and could not be grouped separately. This is not surprising considering that the remaining two acids, AA and TA, which were not included as variables in the reduced model, both correlated with the original first two PCs to a degree that could not be ignored (loadings > 0.35 or < −0.25).

Linear Discriminant Analysis.
Although the PCA allowed the classification of the red wines according to variety, it could not provide geographical identification. As an unsupervised method, PCA does not consider the relation of an object to the specified groups and just selects a direction retaining maximal structure in a lower dimension. After PCA failed to differentiate the geographical origin of the wines, we resorted to LDA. Unlike PCA, LDA is a supervised method of pattern recognition; that is, the grouping of the data has to be known prior to LDA. Utilizing the class information that is given during training, LDA selects a direction that achieves minimum within-class distance and maximum separation among the classes [20]. As shown in Figure 3, the LDA successfully discriminated the wines by variety with 100% correct classification. The leave-one-out cross-validation was also 100% correct. Compared with the result obtained by the PCA, the LDA gave tighter groups within each variety.
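To make the supervised classification and its leave-one-out validation concrete, the following is a minimal NumPy sketch of a linear discriminant classifier with a pooled within-class covariance matrix. It is an illustration under stated assumptions, not the SPSS routine used in the study: priors are taken as the class proportions, the pseudo-inverse is used for numerical safety, and the three-class data set with labels "A", "B", and "C" is synthetic.

```python
import numpy as np


def lda_fit(X, y):
    """Estimate class means, pooled within-class covariance inverse, and priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for c, m in zip(classes, means):
        centered = X[y == c] - m
        pooled += centered.T @ centered
    pooled /= len(X) - len(classes)
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.pinv(pooled), priors


def lda_predict(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    classes, means, inv_cov, priors = model
    linear = means @ inv_cov @ x
    offset = 0.5 * np.einsum("ij,jk,ik->i", means, inv_cov, means)
    return classes[np.argmax(linear - offset + np.log(priors))]


def leave_one_out_accuracy(X, y):
    """Refit the model n times, each time predicting the single held-out sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = lda_fit(X[keep], y[keep])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / len(X)


# Synthetic, well-separated data: 3 hypothetical classes x 5 samples x 6 variables.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 6, [5.0] * 6, [10.0] * 6])
X_demo = np.vstack([c + rng.normal(scale=0.3, size=(5, 6)) for c in centers])
y_demo = np.repeat(["A", "B", "C"], 5)
loo_accuracy = leave_one_out_accuracy(X_demo, y_demo)
```

The distinction between the two figures reported in the text is visible here: classification ability is computed on samples that were used to fit the model, whereas the leave-one-out figure is computed on samples the model has not seen.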
Stepwise LDA was also performed to find the most significant variables by using the Wilks' lambda criterion. CA and SA were revealed to be the most dominant variables for the discrimination by variety. This finding was partially in agreement with the PCA result, in which CA and SA were highly related to the first and second PC, respectively.
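Wilks' lambda is the ratio of the within-group to the total sum-of-squares determinant, Λ = det(W)/det(T); values near 0 indicate strong group separation and values near 1 indicate none. The sketch below computes it for one or several variables. It illustrates only the criterion, not the full stepwise entry and removal procedure of SPSS, and the two example variables are hypothetical.

```python
import numpy as np


def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the variables in X and groups y."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y)
    centered_total = X - X.mean(axis=0)
    T = centered_total.T @ centered_total          # total sums of squares and cross-products
    W = np.zeros_like(T)                           # within-group counterpart
    for c in np.unique(y):
        group = X[y == c]
        centered = group - group.mean(axis=0)
        W += centered.T @ centered
    return np.linalg.det(W) / np.linalg.det(T)


# Hypothetical two-group example.
groups = np.array([0, 0, 0, 1, 1, 1])
separating = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]      # differs strongly between groups
uninformative = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]   # identical distribution in both groups
lam_good = wilks_lambda(separating, groups)
lam_poor = wilks_lambda(uninformative, groups)
```

In a stepwise scheme, the variable giving the smallest lambda would typically be entered first, and further variables are added while they reduce lambda significantly.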
Subsequently, the feasibility of using LDA to discriminate the geographical origin was tested. The LDA plots of Cabernet Sauvignon, Merlot, and Pinot Noir wines produced in different countries are shown in Figure 4. The model achieved 100% classification ability toward geographical origin for all three varieties. After the leave-one-out cross-validation, 100% of Cabernet Sauvignon and Pinot Noir wines were classified correctly, while the figure for Merlot wines was 86.7%. The different organic acid profiles of wines from different geographical origins were the foundation for the LDA classification. For example, the variable contributing most to the classification of Merlot wines by geographical origin was MA, as the selected Merlot wines from different countries had distinct MA contents. Among the Cabernet Sauvignon wines, those from China possessed higher MA, CA, and SA contents, whereas the one from Australia showed the lowest contents for nearly all six acids.
In the work of Kiss and Sass-Kiss [10] on classifying botrytized wines based on biogenic amine profiles, it was reported that the organic acids alone were not adequate for the categorization of the origin and authenticity of botrytized wines. Herein we demonstrate that six organic acids are sufficient for the preliminary discrimination of wines according to both grape variety and geographical origin. However, as the sample set was rather small, with limited grape varieties and geographical origins, whether this strategy can be extended to larger sets of samples needs further investigation.

Conclusion
In conclusion, we have achieved the preliminary discrimination of wine variety and geographical origin for a small sample set based on the chromatographic profiles of organic acids. Using solely six organic acids as the indices, the PCA allowed the categorization of the red wines according to variety, and the LDA provided further geographical discrimination with 100% correct classification and at least 86.7% correct leave-one-out cross-validation. This tentative study reveals that quantitative HPLC analysis of organic acid profiles, coupled with chemometric methods such as PCA and LDA, presents a possible alternative for classifying wines produced from different varieties and geographical origins. However, as malolactic fermentation and other winemaking processes may influence the profile of organic acids, these preliminary findings should be confirmed with a larger set of samples composed of additional wine varieties and geographical origins.

12JCZDJC34100, 13JCYBJC18700, and 16JCYBJC43300) and Tianjin Funding Project for Excellent Young College Teachers (Grant no. 507-125RCPY0317).

Figure 2: PCA score (a) and loading (b) plots for the discrimination of wine variety based on the organic acid profiles (20 samples × 6 variables × 3 replicates).

Figure 3: LDA plot for the discrimination of the wine variety based on the organic acid profiles (20 samples × 6 variables × 3 replicates).

Figure 4: LDA plots for the geographical identification based on the organic acid profiles. Grape variety: Cabernet Sauvignon (a), Merlot (b), and Shiraz (c).

Table 1: Detailed information of the investigated red wines.

Table 2: Instrumental quality parameters for HPLC analysis of six organic acids.

Table 3: Organic acid profiles (mean ± s; n = 3) of the wines. See the detailed information of the wines in Table 1.