Age Discrimination of Chinese Baijiu Based on Midinfrared Spectroscopy and Chemometrics

College of Food Science and Technology, Nanjing Agricultural University, Nanjing 210095, China
Key Laboratory of Meat Processing and Quality Control, MOE, Key Laboratory of Meat Processing, MOA, Jiangsu Synergetic Innovation Center of Meat Processing and Quality Control, Nanjing 210095, China
College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Key Laboratory of Navigation, Control and Health-Management Technologies of Advanced Aerocraft, Ministry of Industry and Information Technology, Nanjing 211106, China


Introduction
Chinese Baijiu is one of the six major distilled spirits in the world, and it is the most traditional and popular alcoholic drink in China, with a history of more than 5000 years [1][2][3]. Although annual Baijiu sales have declined over the past three years because of the impact of COVID-19, the consumer market remains huge. In 2020, the annual production of Baijiu reached 7.407 million hectoliters [4]. Therefore, research on Chinese Baijiu has attracted growing interest in recent decades. However, Chinese Baijiu is a transparent and extremely complex mixture. Its main constituents are water and alcohol [5], and it contains more than 300 organic compounds, such as ethyl acetate, acetic acid, ethyl butyrate, and ethyl hexanoate [6,7], which together make up less than 3% of its volume. It is widely known that these organic compounds determine the quality and flavor of Baijiu.
Flavor is the most important grading standard for Chinese Baijiu. In the modern Baijiu industry, the aging process is usually employed to improve the flavor and quality of Chinese Baijiu. In other words, the age of Baijiu is the most important factor affecting flavor [8,9], where the age is the number of years the Baijiu has been stored in specific containers. That is because a series of slow physical and chemical reactions occurs as the storage time is extended. Some low-boiling impurities, such as sulfides and irritating aldehydes, volatilize naturally, which reduces the unpleasant bitter taste and astringency. Meanwhile, due to the reinforced association between alcohol and water molecules and the volatilization of ethanol, the harshness of the alcohol is weaker than in the high-proof raw Baijiu. In this case, the more than 300 organic compounds can reach equilibrium, which produces a more harmonious and coordinated taste and tends toward optimal quality and an increasingly prominent fragrance [10][11][12]. Consequently, the liquor age is often used to evaluate the quality of Chinese Baijiu [13]. However, since greater economic benefits can be obtained from longer storage times, many unacceptable practices exist in the market, such as cutting corners in production and treatment and falsely reporting the age of Chinese Baijiu. These practices disrupt the Baijiu market and seriously affect the reputation of Chinese Baijiu. Therefore, a method that quickly and accurately detects the age of Chinese Baijiu is urgently needed to avoid the aforementioned problems [14,15].
In recent decades, several technologies have been proposed for age detection and quality identification. These technologies mainly rely on chromatography and spectroscopy, such as gas chromatography [16], gas chromatography-mass spectrometry (GC-MS) [17,18], high-performance liquid chromatography [19], near-infrared spectroscopy [20], atomic absorption spectroscopy [21], visible-ultraviolet spectroscopy [22], and fluorescence spectroscopy [23]. In this paper, we adopt GC-MS and midinfrared (MIR) spectroscopy to classify the age of Chinese Baijiu. GC-MS is extensively applied to the detection of spirit ingredients and to accurate qualitative and quantitative analysis [17]. In [18], GC-MS, combined with an electronic nose system, was utilized to characterize the volatile aroma compounds in Chinese Baijiu and to distinguish between different liquor ages.
MIR spectroscopy records the absorption spectrum of a material in the wavelength range of 2.5–25 μm. The MIR spectrum covers the fundamental absorption region of hydrogen-containing groups such as -CH, -NH, and -OH [24]. In the production of Chinese Baijiu, MIR spectroscopy has developed into an effective approach for quantitative and qualitative analysis. In [25,26], aroma components were detected, and quantitative models for routine parameters of the spirit were established. In [27,28], Baijiu samples from different geographical origins were classified accurately in order to optimize the brewing process. In [29], MIR spectroscopy was utilized to identify the authenticity of Chinese Baijiu to protect the interests of consumers. The work in [30,31] demonstrates the application of MIR spectroscopy to the classification of mellow wine. Nevertheless, none of these studies addresses the aging of Chinese Baijiu.
In addition, many intelligent models have been widely used for the rapid detection of Chinese Baijiu because of their advantages in multivariate nonlinear modeling. They are represented by principal component analysis (PCA) [32], artificial neural networks [33][34][35], and the support vector machine (SVM) [36,37]. In particular, SVM [38] is a learning method first proposed by Cortes and Vapnik in 1995. It is based on statistical learning theory and the structural risk minimization criterion. It performs well on nonlinear and high-dimensional pattern recognition problems and on other machine learning problems such as function fitting [39]. Therefore, the SVM model has already been widely employed in food classification problems [40,41]. In this paper, SVM is adopted to construct the classification model for the age discrimination of Chinese Baijiu.
In summary, it is important to investigate the age discrimination of Chinese Baijiu based on midinfrared spectroscopy and chemometrics. In this paper, the aging mechanism of Baijiu is studied, and a qualitative model is established to distinguish samples by aging time (raw spirit and 1, 3, and 5 years old) according to their infrared spectral characteristics. Meanwhile, the effects of different spectral preprocessing methods and of the modeling methods, namely principal component analysis (PCA), discriminant analysis (DA), and SVM, on the results are evaluated. The major contributions of this paper are summarized as follows:
(i) Based on midinfrared spectroscopy, a qualitative analysis method is developed to evaluate the age of Chinese Baijiu quickly and nondestructively.
(ii) For the spectral data of Chinese Baijiu, PCA is used to extract the main information and exclude outliers, providing optimal variables for subsequent analysis. PCA and DA are then employed together to establish the analysis model.
(iii) Considering the limited number of Baijiu samples, grid search and cross validation are used to tune the parameters of the SVM during training, which improves the accuracy of the SVM classification model.
This paper is organized as follows: materials and methods are described in Section 2. The statistical analysis, including GC-MS results, infrared spectrum data analysis, and DA model classification results, is presented in Section 3. The construction and results of the SVM classification model are presented in Section 4. Section 5 gives the conclusions and future work.

Experimental Material.
Eighty Baijiu samples are provided by the Yanfeng Winery in Hunan, and the samples are selected from different workshops, vessels, and production dates. Luzhou-flavor Baijiu, whose alcohol content is 60% (V/V), is a typical fragrance type of Chinese Baijiu and is therefore selected in this paper. All samples are separated into four groups on the basis of storage time: 0, 1, 3, and 5 years, with 20 samples in each group. Three-fourths of the samples are selected randomly to train the SVM model (the training set), and the remaining samples are used to test its classification performance (the test set). The training set thus consists of 60 samples and the test set of 20 samples. The distribution of Baijiu samples is listed in detail in Table 1.

Determination of Volatile Aroma Components.
Chromatographic calculation of the concentration of volatile aroma components: n-amyl acetate is selected as the internal standard for quantitative analysis, and the internal standard solution is prepared according to GB/T 10345-2007. Because the concentration and peak area of the internal standard are known, each target substance is quantified by comparing its peak area with that of the internal standard. The concentration is expressed as follows:

$$c = \frac{A_1}{A_2} \cdot \frac{m}{V}, \tag{1}$$

where $c$ is the concentration of the aroma substance in mg/μL, $A_1$ and $A_2$ are the peak areas of the aroma substance to be quantified and of the internal standard, respectively, $m$ is the mass of the internal standard in mg, and $V$ is the volume of the Baijiu sample in μL.
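As a minimal sketch of the internal-standard calculation described above, the snippet below assumes that the concentration is the peak-area ratio scaled by m/V and that the relative response factor is 1 (the text does not state one). The function name and the numbers are hypothetical and serve only as illustration.

```python
def aroma_concentration(area_target: float, area_internal: float,
                        mass_internal_mg: float, sample_volume_ul: float) -> float:
    """Internal-standard estimate c = (A1 / A2) * (m / V), in mg/uL.

    Assumption: relative response factor of 1 (not stated in the text).
    """
    if area_internal <= 0 or sample_volume_ul <= 0:
        raise ValueError("internal-standard peak area and sample volume must be positive")
    return (area_target / area_internal) * (mass_internal_mg / sample_volume_ul)


# Hypothetical values for illustration only (not measured data).
c = aroma_concentration(area_target=3.0e5, area_internal=1.5e5,
                        mass_internal_mg=0.02, sample_volume_ul=1000.0)
print(f"{c:.2e} mg/uL")
```

A target peak twice as large as the internal-standard peak therefore corresponds to twice the internal-standard concentration m/V.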

Infrared Spectrometric Measurement.
Before spectral acquisition, all samples are stored in the laboratory at 4°C. Samples are scanned using the Nicolet-6700 FT-NIR spectrometer (Thermo Fisher Scientific, USA) with a single-point attenuated total reflectance (ATR) accessory at a room temperature of 25 ± 0.5°C, and deionized water is used as the reference. The sample cuvette is rinsed more than three times with the test sample and dried before every measurement to avoid contamination. The instrument parameters are as follows: the spectral resolution is 4 cm⁻¹, the measuring range is 4000–400 cm⁻¹, and the number of successive scans is 32. The spectrum of each sample is collected in triplicate, and the average is taken as the final spectral data.
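The measuring range of 4000–400 cm⁻¹ is the same window as the 2.5–25 μm wavelength range quoted for midinfrared spectroscopy in the Introduction. The short sketch below shows the unit conversion; the function name is hypothetical.

```python
def wavelength_um_to_wavenumber(wavelength_um: float) -> float:
    """Convert a wavelength in micrometres to a wavenumber in cm^-1."""
    if wavelength_um <= 0:
        raise ValueError("wavelength must be positive")
    # 1 cm = 10^4 um, so wavenumber (cm^-1) = 10^4 / wavelength (um).
    return 1.0e4 / wavelength_um


for wl in (2.5, 25.0):
    print(wl, "um ->", wavelength_um_to_wavenumber(wl), "cm^-1")
```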

Spectral Data Pretreatment.
In this paper, several spectral data pretreatment methods are employed: spectral smoothing, multivariate baseline correction, and first and second derivatives. Spectral smoothing reduces interference from high-frequency noise and improves the appearance of the spectrum. Since the baseline of a spectrum may be tilted, drifted, or curved, baseline correction helps to locate the desired peaks, which benefits spectral comparison and quantitative analysis. Multivariate baseline correction is a polynomial interpolation through specified baseline points and is suitable for severely curved baselines. Furthermore, because different chemical groups in the Baijiu samples are coupled, their infrared absorption lines overlap. Derivative processing is a common remedy for overlapping spectral lines, and the first and second derivatives are the most widely used. They enhance subtle spectral features. The first derivative is the rate of change of the spectrum, and the second derivative is the rate of change of the first derivative.
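The smoothing and derivative steps can be sketched as follows. This is an illustrative stand-in, not the instrument software's algorithm: it assumes a centered moving average for smoothing and finite differences for the derivatives, and it does not cover multivariate baseline correction. All function names are hypothetical.

```python
from typing import List


def moving_average(y: List[float], window: int = 5) -> List[float]:
    """Centered moving-average smoothing; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def first_derivative(x: List[float], y: List[float]) -> List[float]:
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(y)
    if n < 2 or len(x) != n:
        raise ValueError("x and y must have the same length (at least 2)")
    d = [0.0] * n
    d[0] = (y[1] - y[0]) / (x[1] - x[0])
    d[-1] = (y[-1] - y[-2]) / (x[-1] - x[-2])
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1])
    return d


def second_derivative(x: List[float], y: List[float]) -> List[float]:
    """Rate of change of the first derivative."""
    return first_derivative(x, first_derivative(x, y))


# Synthetic check on y = x^2 (not spectral data): dy/dx = 2x, d2y/dx2 = 2.
x = [float(i) for i in range(7)]
y = [xi ** 2 for xi in x]
smoothed = moving_average(y, window=5)
dy = first_derivative(x, y)
d2y = second_derivative(x, y)
print(dy[1:6], d2y[3], smoothed[3])
```

Derivatives amplify high-frequency noise along with subtle features, which is one reason smoothing is often applied before differentiation.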

Principal Component Analysis.
PCA is a multivariate statistical analysis method. Its main principle is that high-dimensional feature data are mapped to a low-dimensional space through an orthogonal transformation. The linearly independent variables in the low-dimensional space retain the features of the original data and are defined as the principal components. In general, the larger the variance of the signal data, the greater the amount of information the signal contains. Because the information content is carried mainly by the data variance, the cumulative variance contribution rate is employed to measure it. The detailed steps are as follows:

Step 1: arrangement of the raw data: if there are m features and N samples in the original data, they can be expressed by a matrix of dimensions N × m, that is,

$$X = (x_{ij})_{N \times m} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nm} \end{bmatrix}. \tag{2}$$

Step 2: the original data are standardized to generate the standard matrix $X^{*}$, that is,

$$x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \tag{3}$$

where $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, m$, and $\bar{x}_j$ and $s_j$ are the mean value and standard deviation of variable $x_j$, respectively.

Table 1: Distribution of Baijiu samples.
Sample group    Total    Training set    Test set
Raw spirit      20       15              5
1-year aged     20       15              5
3-year aged     20       15              5
5-year aged     20       15              5
Total           80       60              20
Step 3: the correlation matrix $R$ of the standard matrix $X^{*}$ from step 2 can be calculated by

$$R = \frac{1}{N-1} X^{*\mathsf{T}} X^{*}. \tag{4}$$

Meanwhile, the eigenvalues of $R$ are calculated and sorted from large to small as $\lambda_1 > \lambda_2 > \cdots > \lambda_m$, and the corresponding eigenvectors $u_1, u_2, \ldots, u_m$ are also obtained.

Step 4: determining the number of principal components: first, the variance contribution rate of the $i$-th component is calculated according to formula (5); then, the cumulative variance contribution rate of the first $p$ components is obtained by equation (6):

$$\eta_i = \frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k}, \tag{5}$$

$$\eta_{\Sigma}(p) = \frac{\sum_{i=1}^{p} \lambda_i}{\sum_{k=1}^{m} \lambda_k}. \tag{6}$$

The number of principal components is determined from the cumulative variance contribution rate. In general, the cumulative variance contribution rate of the selected principal components should be between 80% and 97%, so that they contain most of the information in the original data.
Step 5: according to the principal components selected in step 4, the corresponding eigenvector matrix is $U_{m \times p} = [u_1, u_2, \ldots, u_p]$. Finally, the m features of the N samples are compressed to p principal components, and the dimensionality of the data is reduced. The matrix after dimension reduction is

$$Y_{N \times p} = X^{*}_{N \times m} U_{m \times p}. \tag{7}$$

Discriminant Analysis.
Discriminant analysis (DA) is a multivariate statistical analysis method for classification [39][40][41][42]. Its basic principle is that Baijiu samples are classified using a distance function, most commonly the Mahalanobis distance, which is calculated as

$$d = \sqrt{(x - \mu)^{\mathsf{T}} S^{-1} (x - \mu)}, \tag{8}$$

where $d$ is the Mahalanobis distance, $x$ is the score vector of the sample, $\mu$ is the mean score vector of the sample set, $S$ is the score covariance matrix, and the superscript $\mathsf{T}$ denotes the transpose. Discriminant analysis is applied with the TQ Analyst software to calculate the Mahalanobis distance between the spectrum of an unknown sample and a set of standard spectra. Each unknown sample is then assigned to a class, and its Mahalanobis distance to each class is displayed. The closer the value is to 0, the better the match.
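The PCA steps and the distance-based discrimination described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the TQ Analyst implementation: features are z-scored with the sample standard deviation, each class uses its own covariance matrix, and the data are synthetic random numbers rather than spectra. The function names are hypothetical.

```python
import numpy as np


def pca_scores(X, min_cum_var=0.80):
    """Standardize X (N samples x m features) and project it onto the
    leading principal components of its correlation matrix."""
    n_samples = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each feature
    R = (Xs.T @ Xs) / (n_samples - 1)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # reorder: largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()           # cumulative contribution rate
    p = int(np.searchsorted(cum, min_cum_var)) + 1     # fewest PCs reaching the threshold
    return Xs @ eigvecs[:, :p], cum, p


def mahalanobis(x, mean, cov):
    """d = sqrt((x - mean)^T cov^-1 (x - mean))."""
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def classify(x, class_scores):
    """Assign x to the class with the smallest Mahalanobis distance."""
    dists = {}
    for label, S in class_scores.items():
        cov = np.atleast_2d(np.cov(S, rowvar=False))   # per-class covariance (assumption)
        dists[label] = mahalanobis(x, S.mean(axis=0), cov)
    return min(dists, key=dists.get), dists


# Synthetic two-group data for illustration only (not Baijiu spectra).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 6)),
               rng.normal(3.0, 1.0, size=(20, 6))])
scores, cum, p = pca_scores(X, min_cum_var=0.80)
groups = {"raw": scores[:20], "aged": scores[20:]}
label, dists = classify(groups["aged"].mean(axis=0), groups)
print(p, label, dists)
```

A point located at a class mean has distance 0 to that class, which matches the statement that values closer to 0 indicate a better match.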

Support Vector Machine.
For the training samples $(x_i, y_i)$, $i = 1, 2, \ldots, N$, $x_i \in R^{n}$ is regarded as the input of the SVM model, $y_i$ is the corresponding output, and $N$ is the number of training samples. Through the nonlinear mapping $\phi(\cdot)$, the input data $x_i$ can be mapped to a high-dimensional feature space. With this mapping, a linearly nonseparable problem can be transformed into a linearly separable problem in the high-dimensional space, as shown in Figure 1. Hence, in this feature space, the regression model is mathematically expressed as

$$y(x) = \omega^{\mathsf{T}} \phi(x) + b, \tag{9}$$

where $\omega$ is a weight vector and $b$ is the bias.
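According to the principle of structural risk minimization, equation (9) can be rewritten as an optimization problem with equality constraints:

$$\min_{\omega, b, \zeta} J(\omega, \zeta) = \frac{1}{2} \omega^{\mathsf{T}} \omega + \frac{c}{2} \sum_{i=1}^{N} \zeta_i^{2}, \quad \text{s.t. } y_i = \omega^{\mathsf{T}} \phi(x_i) + b + \zeta_i, \; i = 1, 2, \ldots, N, \tag{10}$$

where $c$ is the regularization parameter and $\zeta_i$ is the relaxation variable. Problem (10) is a typical convex quadratic programming problem, which can be solved by introducing the Lagrange function

$$L(\omega, b, \zeta, \alpha) = J(\omega, \zeta) - \sum_{i=1}^{N} \alpha_i \left[ \omega^{\mathsf{T}} \phi(x_i) + b + \zeta_i - y_i \right], \tag{11}$$

where $\alpha_i$ $(i = 1, 2, \ldots, N)$ are the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) optimality conditions, it can be concluded that

$$\begin{bmatrix} 0 & \mathbf{1}^{\mathsf{T}} \\ \mathbf{1} & K + c^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \tag{12}$$

where $K_{ij} = K(x_i, x_j) = \phi(x_i)^{\mathsf{T}} \phi(x_j)$ is a kernel function satisfying the Mercer condition. In this paper, we adopt the radial basis function as the kernel function of the SVM:

$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2 g^{2}} \right), \tag{13}$$

where $g^{2}$ represents the kernel width. The SVM classification model is obtained by solving the linear system (12) and is given by

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b. \tag{14}$$

From formula (14), we can conclude that the structure of the SVM is similar to that of a neural network. The output is a linear combination of intermediate nodes.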
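The passage describes an SVM with equality constraints whose solution follows from a linear system, which matches the least-squares form of the SVM. The sketch below assumes that form and an RBF kernel scaled as exp(-||x_i - x_j||^2 / (2 g^2)); the exact scaling used by the authors is not recoverable from the text, and this is not the libsvm solver. Function names and the four-point dataset are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, g2):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 * g2)); g2 is the kernel width."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * g2))


def lssvm_fit(X, y, c, g2):
    """Solve [[0, 1^T], [1, K + I/c]] [b; alpha] = [0; y] for b and alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, g2) + np.eye(n) / c
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]   # bias b, multipliers alpha


def lssvm_predict(X_train, alpha, b, X_new, g2):
    """y(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, g2) @ alpha + b


# XOR-like toy data: not linearly separable in the input space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
b, alpha = lssvm_fit(X, y, c=10.0, g2=0.5)
pred = lssvm_predict(X, alpha, b, X, g2=0.5)
print(np.sign(pred))
```

The first row of the linear system enforces that the multipliers sum to zero, and the prediction is a weighted sum of kernel evaluations against all training points.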

Changes of Volatile Flavor Compounds during Spirit Storage.
Acetic acid is one of the chief acids in Chinese Baijiu, and the esters exist mostly as ethyl esters. The contents of ethyl caproate and ethyl lactate, which are closely related to quality, are high. They are the main aroma components of Luzhou-flavor Baijiu, which is consistent with the references. The regular changes of the organic compounds help to explore the aging mechanism of Chinese Baijiu. It can be observed from Figure 3 that the contents of the major components increase with storage time, sharply in the early stage and mildly in the later stage. Accordingly, we can infer that the rates of the physical and chemical reactions in Baijiu decrease and tend to stabilize. Not only the contents but also the types of substances change. Some new substances appear, such as propionic acid, valeric acid, hexyl hexanoate, and ethyl decanoate. They are formed through the oxidation of alcohols, the esterification of acids with the corresponding alcohols, and the hydrolysis of esters, which bring the various trace components into a dynamic equilibrium. The formation of new substances enriches the body of the Baijiu, which is indispensable for stabilizing and improving quality. To summarize, compared with the base Baijiu samples, the aged Chinese Baijiu is richer in both the content and the variety of its ingredients. The changes in the ratios of the internal components and the new substances make the body of the Baijiu more harmonious, giving it a mellow taste and a strong fragrance.

Original Spectral Analysis and Spectral Pretreatment.
From Figure 4, it can be observed that the spectra of the four groups of samples overlap heavily regardless of aging duration and cannot be distinguished by the naked eye. Although there are hundreds of substances in Chinese Baijiu, the MIR band consists of the fundamental absorption region of hydrogen-containing groups, so there is no significant overall difference except in the range of 2300–2400 cm⁻¹. The 2300–2400 cm⁻¹ band is therefore magnified and displayed in the inset at the top right of Figure 4. The difference is visible after magnification, but the samples cannot be completely distinguished through original spectral analysis alone. Spectral data pretreatment, consisting of spectral smoothing, multivariate baseline correction, and first and second derivative processing, is subsequently carried out to evaluate the classification of samples.
Compared with the original spectra, subtle differences are significantly enhanced and amplified through derivative processing. Figures 5 and 6 show the results of first-order and second-order derivative processing, respectively. Unlike spectral smoothing and multivariate baseline correction, derivative processing makes the differences more pronounced. The spectral characteristics of the original spectrum in the two bands of 2300–2400 cm⁻¹ and 1400–1600 cm⁻¹ are enhanced, and the absorption band at 1740 cm⁻¹ is potentially related to esters. In addition, the absorption band at 1580 cm⁻¹ might be related to lactate [43][44][45]. However, it is difficult to distinguish the samples merely from the intensity, position, and shape of the peaks. Besides, the spectra of the Chinese Baijiu samples overlap and interlace, which makes the work more challenging.
Figure 1: Architecture of SVM.
Figure 2: Support vector and interval.

PCA Analysis.
The spectra of the age identification samples are collected over the whole band. The results of PCA are shown in Figures 7 and 8. From Figure 7, it can be seen that the later the component, the smaller its variance contribution rate. The cumulative contribution rate of the first two principal components is as high as 99.8%, which is very close to 100%. PC1 and PC2 can therefore represent most of the information in the infrared spectrum. In other words, using PCA for dimension reduction is effective and feasible. Figure 8 shows the two-dimensional score plot of PC1 and PC2 derived from the original spectrum. It … samples. The 5-year-old samples, marked in black, cluster clearly, yet some overlap with the 1-year-old samples. In the meantime, the 1-year-old and 3-year-old samples are scattered and hard to distinguish because they are similar in nature. On the whole, it is easy to distinguish whether the Baijiu is raw or mellow because of the great difference in their properties. The aged Baijiu samples overlap to some extent, especially the 1-year-old and 3-year-old samples. These two groups are close in storage time, trace components, and spectral characteristics, which makes them the most likely to be confused.
Journal of Food Quality

DA Classification Results.
PCA can only distinguish between raw and aged spirit samples. It cannot completely classify the four kinds of samples. Therefore, discriminant analysis with different spectral bands and spectral pretreatment methods is employed to establish the identification model, so as to avoid possible misjudgment caused by overlapping. It is essential to select an appropriate wavenumber range to mitigate disturbance, improve prediction accuracy, and simplify the model. According to the position of several main absorption peaks, the full spectrum … is in the test set. The accuracy of discrimination is 93.33% in the training set and 95.00% in the test set. The waveband of 594–3930 cm⁻¹ achieves the optimum results, which indicates that the spectrum in this range contains the key classification and identification information of the Baijiu samples. According to the abovementioned results, the full band range is finally chosen for modeling, and different spectral pretreatment methods are then screened. Table 3 shows the model results for the different spectral pretreatment methods: 5-point smoothing, 15-point smoothing, multivariate baseline correction, and first and second derivative processing. It can be observed that first-order and second-order derivative processing give poor results, with 15 and 11 misjudged samples, respectively. The prediction accuracy of the first-order derivative in the test set is merely 75.00%, with great uncertainty. By contrast, the results of smoothing and multivariate baseline correction are much better. Five samples are misjudged in total, which is consistent with the results of the original spectral analysis. In other words, smoothing and multivariate baseline correction do not essentially change the results.
Compared with first-derivative processing, modeling with the original spectra reduces the number of misclassified samples from 15 to 5. The accuracy of discrimination increases from 83.38% to 93.33% in the training set and from 75.00% to 95.00% in the test set. Both the decrease in misclassification and the improvement in accuracy are in the desired direction. The qualitative identification model is therefore finally established on the whole band with the original spectrum. Figures 9 and 10 are two-dimensional and three-dimensional Mahalanobis distance graphs of the samples of different liquor ages based on the DA method. It can be seen that the raw Baijiu samples can be clearly distinguished from the mellow Baijiu samples. This is more obvious in the three-dimensional Mahalanobis distance graph. From … are 93.33% and 95.00%, respectively, by the DA method for the age classification of Chinese Baijiu.
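As a quick arithmetic check on the reported accuracies, the sketch below converts between accuracy and the number of misjudged samples for the 60-sample training set and the 20-sample test set. The function names are hypothetical, and the per-set counts are derived from the percentages rather than quoted from the tables.

```python
def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Classification accuracy in percent."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


def misjudged_from_accuracy(acc_percent: float, n_total: int) -> int:
    """Number of misjudged samples implied by a rounded accuracy."""
    return n_total - round(acc_percent / 100.0 * n_total)


train_total, test_total = 60, 20
train_wrong = misjudged_from_accuracy(93.33, train_total)  # original-spectrum DA model
test_wrong = misjudged_from_accuracy(95.00, test_total)
print(train_wrong, test_wrong, train_wrong + test_wrong)
print(round(accuracy_percent(50, train_total), 2))
```

The derived counts (4 in the training set and 1 in the test set) add up to the five misjudged samples reported for the original spectra. For the first derivative, 15 misjudged samples with 5 of them in the test set would leave 10 in the training set, that is, 50 of 60 correct or about 83.33%, slightly different from the 83.38% quoted above.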

Parameter Optimization Based on Grid Search and Cross Validation.
According to the principle of SVM, the regularization parameter c and the kernel width parameter g² play an important role in the model. Consequently, both parameters must be determined before SVM is used to construct the Chinese Baijiu classification model. In this paper, grid search (GS) and cross validation (CV) are employed to optimize the two parameters of the SVM.
Grid search: the grid search method is an exhaustive method. It divides each dimension of the parameter space into several intervals and traverses all grid intersections to find the optimal solution. Its advantage is that the solution found is guaranteed to be the global optimum within the delimited grid. Significant errors can also be avoided.
The details of the method are as follows. First, the ranges of c and g² are both set to [−10, 10] to form a large two-dimensional plane. Then, on this plane, the ranges of c and g² are divided at equal intervals into M points and N points, respectively, to form an M × N grid. Each intersection of the grid is a candidate combination of parameters. Finally, the estimation error is calculated for each parameter combination, and the combination with the minimum error is taken as the optimal parameters.

Cross validation: in this paper, the number of samples is relatively limited. In order to make full use of the whole sample dataset for training and testing, the cross-validation method is employed, minimizing the mean square error (MSE), which is expressed as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}, \tag{15}$$

where $y_i$ and $\hat{y}_i$ are the actual value and the estimated value, respectively, and $n$ is the number of samples evaluated. It is worth noting that the classification performance of the SVM for a given parameter combination depends on the training data. For the same pair (c, g²), the SVM performance changes when the training data change. In particular, with small training sets, the parameter optimization is strongly affected by the randomness of the samples, which harms the generalization of the model. For this reason, the k-fold cross-validation method is adopted to evaluate the performance of each pair (c, g²) comprehensively.
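The grid search with k-fold cross validation described above can be sketched in plain Python as follows. Several choices here are assumptions rather than the authors' settings: the range [−10, 10] is read as exponents of 2, the folds are interleaved deterministically, and the model is a hypothetical kernel smoother with parameters c and g standing in for the SVM so that the example stays self-contained. The data are synthetic.

```python
import math
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k interleaved folds (deterministic)."""
    return [list(range(i, n, k)) for i in range(k)]


def cv_mse(fit_predict: Callable[..., float], xs: Sequence[float],
           ys: Sequence[float], k: int, c: float, g: float) -> float:
    """Mean squared error over all held-out samples of a k-fold split."""
    sq_err = 0.0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in fold:
            sq_err += (ys[i] - fit_predict(tr_x, tr_y, xs[i], c, g)) ** 2
    return sq_err / len(xs)


def grid_search(fit_predict, xs, ys, c_grid, g_grid, k=5) -> Tuple[float, float, float]:
    """Evaluate every (c, g) grid point and keep the one with the lowest CV MSE."""
    best = (math.inf, c_grid[0], g_grid[0])
    for c in c_grid:
        for g in g_grid:
            mse = cv_mse(fit_predict, xs, ys, k, c, g)
            if mse < best[0]:
                best = (mse, c, g)
    return best


def kernel_smoother(tr_x, tr_y, x, c, g):
    """Hypothetical stand-in model: RBF-weighted average of the training
    labels, shrunk toward their mean with strength 1/c."""
    mean_y = sum(tr_y) / len(tr_y)
    w = [math.exp(-((x - xi) ** 2) / (2.0 * g * g)) for xi in tr_x]
    return (sum(wi * yi for wi, yi in zip(w, tr_y)) + mean_y / c) / (sum(w) + 1.0 / c)


# Synthetic one-dimensional data with two numeric class labels.
xs = [i / 10 for i in range(40)]
ys = [1.0 if x < 2.0 else 2.0 for x in xs]
c_grid = [2.0 ** e for e in range(-10, 11, 4)]
g_grid = [2.0 ** e for e in range(-10, 11, 4)]
best_mse, best_c, best_g = grid_search(kernel_smoother, xs, ys, c_grid, g_grid, k=5)
print(best_mse, best_c, best_g)
```

Because every sample is held out exactly once, the averaged error depends less on any single random split, which is the motivation given above for k-fold cross validation with small sample sets.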

PCA-GS-CV-SVM Classification Model.
The qualitative identification model for the Chinese Baijiu samples is established with the SVM algorithm in the libsvm toolbox for MATLAB. The specific steps are as follows:
Step 1: PCA of the infrared spectra of all samples is carried out over the full spectral range.
Step 2: the data after PCA are divided into the training dataset and the test dataset. At the same time, the sample categories are mapped to labels as follows: raw spirit (1); 1 year old (2); 3 years old (3); and 5 years old (4).
Step 3: the input and output data from the training set, together with the input data from the test set, are normalized. The normalization formula is as follows:

$$y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \tag{16}$$

where $y_i$ is the normalized data, $x_i$ is the original data, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the original data, respectively.
Step 4: establishing and training the qualitative SVM model: the radial basis function is used in this paper to obtain better qualitative accuracy, and the cross-validation method is used to find the optimal SVM model parameters, including the penalty factor c and the variance g in the radial basis function.
Step 5: the input data from the test set are fed to the trained SVM qualitative model to evaluate the performance of the established model.
To explain the principle and scheme of the PCA-GS-CV-SVM classification model, the entire framework is given in Figure 11.
From Figure 7, the accumulated contribution rate of the first three principal components is 99.8%, close to 100%. The contribution rates of the later components are small. Most of the spectral information is represented by the first three principal components. Therefore, PCA can be considered reliable for reducing the dimensionality of the Chinese Baijiu identification samples. With the penalty factor c set to 0.5000 and the variance g in the radial basis function set to 0.2176, the qualitative SVM model combined with PCA is established. The identification results are shown in Figures 12 and 13. In addition, the classification results of the Baijiu samples by the different models are given in Table 4. It can be seen that a classification accuracy of 100% is obtained in both the training set and the test set. The classification given by the established model is completely consistent with the actual classes, which shows that the SVM model distinguishes the different age groups excellently.
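The min-max normalization of Step 3 can be sketched as follows. The sketch assumes that the minimum and maximum are taken from the training set and then reused for the test set, which is common practice but is not stated in the text; the function names and numbers are hypothetical.

```python
from typing import List, Tuple


def fit_min_max(column: List[float]) -> Tuple[float, float]:
    """Return (x_min, x_max) of a training column."""
    return min(column), max(column)


def apply_min_max(column: List[float], x_min: float, x_max: float) -> List[float]:
    """Scale values with y = (x - x_min) / (x_max - x_min)."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("a constant column cannot be min-max scaled")
    return [(x - x_min) / span for x in column]


# Hypothetical principal-component scores for illustration only.
train = [2.0, 4.0, 6.0, 10.0]
test = [3.0, 12.0]
x_min, x_max = fit_min_max(train)       # statistics from the training set only
train_n = apply_min_max(train, x_min, x_max)
test_n = apply_min_max(test, x_min, x_max)
print(train_n, test_n)
```

With this choice, training values fall in [0, 1], while test values outside the training range map outside that interval, as the second test value does here.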

Conclusions
In this paper, we propose a liquor age discrimination method for Chinese Baijiu based on midinfrared spectroscopy and chemometrics. The identification results are compared across different modeling methods, spectral preprocessing methods, and band selections. Five-point and 15-point spectral smoothing and multivariate baseline correction have little effect on the analysis results, whereas derivative processing gives the worst results. As far as the modeling method is concerned, PCA can only distinguish between raw and aged Chinese Baijiu samples. It cannot completely classify the four kinds of samples. The DA method misjudges some 1-year-old and 5-year-old samples as 3-year-old, with 93.33% classification accuracy in the training set and 95% in the test set. A classification accuracy of 100% is obtained in both the training set and the test set with the PCA-GS-CV-SVM algorithm. This method obtains ideal experimental results and can be applied to the rapid and nondestructive detection of Chinese Baijiu. However, the current work focuses on the age classification of Luzhou-flavor Baijiu, one of the classic flavors of Chinese Baijiu, and the number of samples is limited. In further research, additional samples should be collected from different flavors, regions, and grades to establish a more complete calibration model.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare no conflicts of interest.