Multivariate Methods Based Soft Measurement for Wine Quality Evaluation



Introduction
The evaluation of wine quality is highly significant because of its effects on wine classification and target marketing. The quality of wine influences several major factors, including quality ratings, the reputation of the winery, and profitability [1]. Evaluating wine quality is therefore imperative for both the food industry and the wine science community [2,3]. The traditional methods are manual inspection and analysis of the chemical compounds, both of which are costly in money and time. Studies of wine quality evaluation are abundant internationally: the support vector machine (SVM) has been used to build wine quality classification models; the probabilistic neural network has been used to evaluate wine quality from the mineral element content of red wine; and a visualization method has been proposed that evaluates wine quality from physicochemical properties. In this paper, by contrast, multivariate methods based on soft measurement are adopted as the appropriate tools, working from the chemical and physical measurements that make up the wine physicochemical indexes.
Soft measurement technology is widely used [4][5][6]. It combines knowledge of the production process with automatic control theory. In industrial processes, it is common that some key control variables cannot be measured for technical or economic reasons [7]. With soft measurement, a mathematical model is built from the relations among the detectable process variables, and in this way the undetectable process variables can be estimated [8]. With both the detectable and the estimated variables available, the production process becomes controllable.
When online analysis instruments are used, investment and instrument maintenance costs continue to increase. Manual inspection is time-consuming and complex. With the development of computer technology, soft measurement has become the superior option. The primary approaches to building a soft measurement model are mechanism based modelling, knowledge based modelling, and data based modelling, distinguished by the type of prediction model. Mechanism based modelling requires known physical and chemical laws and sometimes a deep understanding of the mechanism of the production process; its application is limited by the difficulty and the duration of the modelling effort. Knowledge based modelling is simple, easy to understand, and convenient, but it does not deliver high precision and is limited by the knowledge rules that can be extracted. Process data based modelling builds a statistical regression model from a large amount of production data, using multivariate statistical analysis theory. This approach is superior in ease of modelling, maintenance, and precision. In this paper, data based modelling is used, and the model is built by multivariate methods based on soft measurement.
In this paper, multivariate methods based on soft measurement are used to evaluate red wine quality. These algorithms construct a fitted wine quality model and predict wine quality. We compare the models established by OLSR, PCR, PLSR, and MPLSR and select the best one in order to improve the model accuracy [9]. The methods are superior to manual measurement for wine quality evaluation. Because OLSR suffers from a serious stability problem, only the PCR, PLSR, and MPLSR models are built. These models can also help improve the production process. Furthermore, they can be useful in target marketing, can help identify the most relevant factors, and can help classify wines, for example into premium brands (useful for setting prices) [2]. The physicochemical indexes that affect wine quality are also identified in this paper. In total, 87 physicochemical indexes are collected; a set of 20 samples is used as the calibration set and a set of 7 samples as the verification set. Wine quality is then predicted from these physicochemical indexes alone. This paper is organized as follows. Section 2 reviews the multivariate statistical methods, including OLSR, PCR, PLSR, and MPLSR. In Section 3, the data sets are introduced and analyzed, the wine prediction models are established with the multivariate statistical methods, and a comparison among these models is provided. Finally, the conclusions are presented in the last section.

Modeling Algorithms
In this section, multivariate statistical analysis is applied to the wine quality problem, and OLSR, PCR, PLSR, and MPLSR are introduced briefly. OLSR loses its effectiveness when the variables are multicollinear or when there are fewer samples than variables. PCR is used to solve the collinearity problem; the number of principal components it retains is chosen from the cumulative percent variance (CPV) of the data set. Unlike PCR, PLSR focuses especially on highly linearly dependent variables. MPLSR combines the advantages of both PCR and PLSR and performs better than either.

Ordinary Least Squares Regression (OLSR).
The least squares method is a mathematical optimization technique. It finds the best-matching function for the data by minimizing the sum of squared errors, and unknown data can then be obtained simply from the fitted function. Because the sum of squared differences between the actual data and the calculated data is made as small as possible, the least squares method can also be used for curve fitting. In general, the OLSR algorithm is applied to problems with a one-dimensional response variable; however, it can also handle response variables of more than one dimension. The algorithm is briefly introduced as follows.
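Step 1. Collect $n$ samples of the measurable variables and response variables and then normalize them to zero mean and unit variance, denoted as
$$U = [u_1 \cdots u_n]^T \in \mathbb{R}^{n \times m}, \qquad Y = [y_1 \cdots y_n]^T \in \mathbb{R}^{n \times p},$$
where $m$ is the number of measurable variables and $p$ is the number of response variables. The following steps will be executed only when $U^T U$ is a full rank matrix.
Step 2. Minimize the sum of errors for each response, $j = 1, \ldots, p$, with $Y_j$ denoting the $j$th column of $Y$:
$$\min_{\theta_j} \| e_j \|^2 = \| Y_j - U\theta_j \|^2 .$$
Because of the invertibility of $U^T U$, the $\theta_j$ can also be calculated by
$$\theta_j = (U^T U)^{-1} U^T Y_j .$$
Step 3. Use the $\theta_j$ and $e_j$ to form $\Theta = [\theta_1 \cdots \theta_p] \in \mathbb{R}^{m \times p}$ and $E = [e_1 \cdots e_p] \in \mathbb{R}^{n \times p}$, respectively, and the final model can be indicated as
$$Y = U\Theta + E .$$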
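The three OLSR steps can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the authors' implementation; the names `normalize`, `olsr_fit`, `U`, `Y`, `Theta`, and `E` are assumptions introduced here.

```python
import numpy as np


def normalize(A):
    """Scale every column of A to zero mean and unit variance."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


def olsr_fit(U, Y):
    """Least squares coefficients Theta for the model Y = U @ Theta + E.

    Refuses to run when U^T U is singular, mirroring the full-rank
    condition attached to Step 1.
    """
    gram = U.T @ U
    if np.linalg.matrix_rank(gram) < gram.shape[0]:
        raise ValueError("U^T U is singular; OLSR is not applicable")
    # Solve (U^T U) Theta = U^T Y instead of forming the inverse explicitly.
    return np.linalg.solve(gram, U.T @ Y)


# Synthetic data (assumption): 50 samples, 3 variables, 1 response.
rng = np.random.default_rng(0)
U_raw = rng.normal(size=(50, 3))
Y_raw = U_raw @ np.array([[1.0], [-2.0], [0.5]]) + 0.01 * rng.normal(size=(50, 1))
U, Y = normalize(U_raw), normalize(Y_raw)

Theta = olsr_fit(U, Y)
E = Y - U @ Theta  # residual of the final model
```

With fewer samples than variables, `olsr_fit` raises instead of returning unstable coefficients, which is the situation the later sections describe for the wine data.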

Principal Component Regression (PCR).
Principal component analysis (PCA), invented by Pearson in 1901 [10], can analyze data and establish a mathematical model. The method obtains the principal components of the data (i.e., the eigenvectors) and their weights (i.e., the eigenvalues [11]) by decomposing the covariance matrix [12]. Since the 1980s, PCA has been successfully applied in numerous areas including data compression, image processing, feature extraction, pattern recognition, and process monitoring [13,14]. Because PCR is strong at dimensionality reduction, it can resolve variable multicollinearity in practice and with it the stability problem of OLSR [15]. In PCR, mutually uncorrelated aggregate indicators, formed from the original indicators, take the place of the original ones and are used to represent them. PCR has been widely used in many fields [16][17][18].
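Step 1. Gather the measurable variables and the response variables and normalize them to zero mean and unit variance, denoted as $U \in \mathbb{R}^{n \times m}$ and $Y \in \mathbb{R}^{n \times p}$ for $n$ samples, $m$ measurable variables, and $p$ response variables.
Step 2. Perform singular value decomposition (SVD) on the covariance matrix of the independent variable set:
$$\frac{1}{n-1} U^T U = P \Lambda P^T,$$
in which $P$ is the so-called loading matrix of the covariance matrix.
Step 3. Determine the number of principal components with an appropriate criterion [19] and calculate the score matrix $T$:
$$T = U P_{pc} \in \mathbb{R}^{n \times l},$$
where $l$ is the number of principal components and $P_{pc}$ consists of the first $l$ columns of $P$.
Step 4. Perform OLS regression between the score matrix $T$ and the dependent matrix $Y$ and obtain the final model:
$$Y = T\Psi + E, \qquad \Psi = (T^T T)^{-1} T^T Y .$$
Namely, $Y = U\Theta + E$, $\Theta = P_{pc}\Psi$.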
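A minimal NumPy sketch of the four PCR steps follows, on synthetic data with more variables than samples. The function name `pcr_fit`, the variable names, and the `1/(n-1)` scaling of the covariance matrix are assumptions made for this illustration.

```python
import numpy as np


def normalize(A):
    """Scale every column of A to zero mean and unit variance."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


def pcr_fit(U, Y, n_pc):
    """Principal component regression on normalized U (n x m) and Y (n x p)."""
    n = U.shape[0]
    cov = U.T @ U / (n - 1)
    # SVD of the symmetric covariance matrix: the columns of P are the
    # loadings, ordered by decreasing singular value.
    P, _, _ = np.linalg.svd(cov)
    P_pc = P[:, :n_pc]
    T = U @ P_pc                              # score matrix, n x n_pc
    Psi = np.linalg.solve(T.T @ T, T.T @ Y)   # OLS of Y on the scores
    Theta = P_pc @ Psi                        # coefficients for the original variables
    return Theta, T


# Synthetic data (assumption): 20 samples, 30 variables, 1 response.
rng = np.random.default_rng(0)
U = normalize(rng.normal(size=(20, 30)))
Y = normalize(rng.normal(size=(20, 1)))

Theta, T = pcr_fit(U, Y, n_pc=5)
Y_fit = U @ Theta
```

Because the score columns are mutually orthogonal, the small system solved for `Psi` stays well conditioned even though `U.T @ U` itself is singular here.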

Partial Least Squares Regression (PLSR).
The partial least squares method can be applied when the number of explanatory variables is very high. PLS generalizes and combines features from principal component analysis and multiple regression. Partial least squares regression and principal component regression extract different factor scores. PCR extracts the relevant information contained in the matrix $U$ and uses the resulting components as independent variables to predict $Y$ and improve the quality of the prediction model. When some useful variables are only weakly correlated, however, the reliability of the final PCR prediction model goes down; PCR has certain defects in this respect and handles the situation poorly. PLS, in contrast, decomposes $U$ and $Y$ and extracts components from both at the same time. The latent variables (LVs) of $U$ and $Y$ are strongly related, which ensures that a model based on the PLSR algorithm can make predictions from the measurable variables [20].
Only a few factors are selected to participate in the model. PLSR is widely used in many areas, such as fault detection, wine analysis, and chemistry [9,21,22]. We consider the standard PLSR method as follows.
Step 1. Collect the measurable variables and the response variables and normalize the data sets of $U$ and $Y$, expressed as $U = [u_1 \cdots u_n]^T \in \mathbb{R}^{n \times m}$ and $Y = [y_1 \cdots y_n]^T \in \mathbb{R}^{n \times p}$, in which $U$ and $Y$ are normalized to zero mean and unit variance.
Step 2. Repeat the PLS extraction step $\gamma$ times iteratively. In the $i$th iteration, $p_i$, $q_i$ and $t_i$, $r_i$ are the loading vectors and score vectors of $U$ and $Y$, respectively, and $\gamma$ is the number of latent variables (LVs), which is usually determined by the cross validation criterion [22].
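The iteration in Step 2 can be illustrated with the textbook NIPALS form of PLS. The sketch below is an assumption about which PLS variant is meant, not a transcription of the paper's equations; `plsr_fit` and all variable names are introduced here for illustration, and the data are synthetic.

```python
import numpy as np


def plsr_fit(U, Y, n_lv, tol=1e-12, max_iter=500):
    """NIPALS-style PLS regression; returns Theta such that Y ~ U @ Theta."""
    X, R = U.astype(float).copy(), Y.astype(float).copy()
    W, P, Q = [], [], []
    for _lv in range(n_lv):
        r = R[:, [0]]                      # start from the first response column
        for _it in range(max_iter):
            w = X.T @ r
            w /= np.linalg.norm(w)         # weight vector of the U block
            t = X @ w                      # score vector of the U block
            q = R.T @ t / (t.T @ t)        # loading vector of the Y block
            r_new = R @ q / (q.T @ q)      # score vector of the Y block
            converged = np.linalg.norm(r_new - r) < tol
            r = r_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)            # loading vector of the U block
        X = X - t @ p.T                    # deflate both blocks
        R = R - t @ q.T
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    return W @ np.linalg.solve(P.T @ W, Q.T)


# Synthetic data (assumption): 30 samples, 4 variables, 1 response.
rng = np.random.default_rng(0)
U = rng.normal(size=(30, 4))
U = (U - U.mean(axis=0)) / U.std(axis=0, ddof=1)
Y = U @ np.array([[1.0], [0.5], [-1.0], [2.0]]) + 0.1 * rng.normal(size=(30, 1))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

Theta_pls = plsr_fit(U, Y, n_lv=4)
Theta_ols = np.linalg.lstsq(U, Y, rcond=None)[0]
```

When as many latent variables are kept as `U` has independent columns, the PLS coefficients coincide with the ordinary least squares ones; fewer latent variables give the regularized model that cross validation would select.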

Modified Partial Least Squares Regression (MPLSR).
In practice, multicollinearity and having fewer samples than variables are two common phenomena. When OLSR is applied, multicollinearity leads to a serious stability problem. PCR and PLSR can both handle highly collinear data. The PCR algorithm solves the collinearity problem efficiently by introducing principal components (PCs). PLSR considers both the outer relations (the $U$ and $Y$ blocks individually) and the inner relation (linking both blocks), whereas PCR only considers the outer relation of the $U$ block. Because they reduce dimension, PCR and PLSR both run the risk of losing useful information that is not captured by the selected PCs or LVs. To solve these problems, Yin et al. proposed MPLSR [23]. MPLSR has been validated for fault detection on the Tennessee Eastman process industrial benchmark, and a good result has been obtained.
Step 1. Gather all the samples of measurable variables and response variables and stack them into $U$ and $Y$, respectively. Normalize them to zero mean and unit variance, denoted as $U = [u_1 \cdots u_n]^T \in \mathbb{R}^{n \times m}$ and $Y = [y_1 \cdots y_n]^T \in \mathbb{R}^{n \times p}$.
Step 2. Calculate the regression coefficient matrix $\Theta$:
$$\Theta = (U^T U)^{\dagger} U^T Y,$$
in which $(U^T U)^{\dagger}$ is the pseudoinverse of $U^T U$.
Step 3. Obtain the final model of MPLSR:
$$Y = U\Theta + E,$$
where $E$ is the residue part of $Y$ which is uncorrelated with $U$.
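The pseudoinverse step can be sketched as follows, assuming the coefficient matrix is the pseudoinverse of the Gram matrix of the measurable variables applied to their cross product with the responses. The synthetic data copy only the shape of the calibration set (20 samples, 87 variables). The name `mplsr_fit` and the explicit cutoff `rcond=1e-10` are assumptions made here; the cutoff is chosen to keep numerically zero singular values from being inverted.

```python
import numpy as np


def mplsr_fit(U, Y, rcond=1e-10):
    """Pseudoinverse-based regression: Theta = pinv(U^T U) @ U^T @ Y."""
    gram = U.T @ U
    Theta = np.linalg.pinv(gram, rcond=rcond) @ U.T @ Y
    E = Y - U @ Theta   # residue part of Y
    return Theta, E


# Synthetic data (assumption) shaped like the calibration set: 20 x 87.
rng = np.random.default_rng(0)
U = rng.normal(size=(20, 87))
U = (U - U.mean(axis=0)) / U.std(axis=0, ddof=1)
Y = rng.normal(size=(20, 1))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

Theta, E = mplsr_fit(U, Y)
cross = U.T @ E   # close to zero: the residue is uncorrelated with U
```

No principal components or latent variables have to be selected. With fewer samples than variables, the calibration residue of this sketch is numerically zero, so a low calibration error alone says little about prediction ability.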

Results and Discussion
To classify and identify red wine, a technology based on spectra and pattern recognition was proposed in [24]. This paper, in contrast, is devoted to comparing the methods introduced above and predicting wine quality for the purposes of wine classification and target marketing.
In this section, we use three multivariate statistical methods based on soft measurement, namely principal component regression (PCR), partial least squares regression (PLSR), and modified partial least squares regression (MPLSR), to find which best captures the relationship between the red wine physicochemical indexes and the wine quality. Ordinary least squares regression (OLSR) is left out because it is not applicable in this situation. The fitting efficiency and prediction ability of the methods are compared through the relevant figures and indexes, and the best model is identified.

Data Preprocessing.
The data set (mathematical modeling official site: http://www.mcm.edu.cn) used in this paper contains the red wine quality scores, the physicochemical indexes, and the aroma substances; the last two are collectively referred to as physicochemical indexes in the following.
The scores are given by two groups of professional tasters and differ noticeably between the groups, so reconciling the scores is a primary task. We analyze the two sets of scores from four aspects: presentation (20 points), fragrance (30 points), mouth-feel (40 points), and overall feeling (10 points), for a total of 100 points. For example, the important information for one sample can be found in Table 1.
In the data set, the same 27 wine samples are scored by the first group and by the second group. Each group has 10 tasters of different levels. The missing data in the first group are replaced by the average value (AVG). The standard deviation (SD) of the first group is 7.3426 and that of the second group is 3.978.
The average score of the samples in both groups and the standard deviations of the two groups are given in Table 2.
The standard deviation of the second group is clearly smaller than that of the first group, so the second group provides the more reliable result. We therefore take the scores of the second group as the actual wine quality against which the fitted wine quality is compared. To construct the prediction models, a set of 20 randomly selected samples is used as the calibration set. The remaining samples are used as the verification set to verify the prediction ability of the models.
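The preprocessing described above (filling missing scores with an average, then randomly splitting 27 samples into 20 calibration and 7 verification samples) can be sketched in plain Python. The source does not say which average replaces a missing score; the sketch assumes the mean of the other tasters' scores for the same sample. The function names, the seed, and the toy scores are all hypothetical.

```python
import random
from statistics import mean


def fill_missing_with_mean(scores):
    """Replace None entries by the mean of the available scores of that sample.

    scores[s] is the list of taster scores for sample s.
    """
    filled = []
    for sample_scores in scores:
        known = [v for v in sample_scores if v is not None]
        avg = mean(known)
        filled.append([avg if v is None else v for v in sample_scores])
    return filled


def split_samples(n_samples, n_calibration, seed=0):
    """Randomly pick calibration indices; the rest form the verification set."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return sorted(order[:n_calibration]), sorted(order[n_calibration:])


# Toy scores (assumption): two samples, three tasters, one missing value.
filled = fill_missing_with_mean([[80, 75, None], [70, 77, 90]])
cal_idx, ver_idx = split_samples(27, 20)
```

Fixing the seed makes the random split reproducible, which matters when several models are compared on the same calibration and verification sets.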

Modeling and Comparison.
To isolate the influence of the physicochemical indexes on the quality of wine, the following assumptions are made.
(1) In this paper, the vinification processes are identical, and the environment of vinification is the same.
(2) The scores given by the expert tasters approach the real quality scores.
(3) The grapes used to vinify a kind of wine are of one species.
The purpose of the assumptions above is to ensure that only the wine physicochemical indexes affect the quality of wine. We concentrate on the relations between the physicochemical indexes and the quality of red wine and ignore other factors.
OLSR is not applicable when the calibration set exhibits multicollinearity, nor when the number of samples is smaller than the number of variables. Both conditions hold here (20 calibration samples against 87 indexes), so OLSR is not employed to establish a model in this paper.
The other multivariate statistical methods based on soft measurement, namely PCR, PLSR, and MPLSR, are used to establish models. We use the root mean squared error of calibration (RMSEC) and the root mean squared error of prediction (RMSEP): RMSEC evaluates the fitting efficiency and RMSEP demonstrates the prediction ability of the different methods [9]. RMSEC and RMSEP can be described as follows:
$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{n_c} \left(y_i - \hat{y}_{\mathrm{cal},i}\right)^2}{n_c}}, \qquad \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n_p} \left(y_i - \hat{y}_{\mathrm{pred},i}\right)^2}{n_p}},$$
where $y_i$ is the measured value for the $i$th sample; $\hat{y}_{\mathrm{cal},i}$ and $\hat{y}_{\mathrm{pred},i}$ are the method fitted and model predicted response values for the $i$th sample, respectively; and $n_c$ and $n_p$ are the number of calibration samples and verification samples, respectively.
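Both error measures are the same root mean squared error, evaluated once on the calibration samples and once on the verification samples. A short sketch, assuming the plain mean over the number of samples in each set; the function name `rmse` and the numbers are made up for illustration.

```python
import math


def rmse(measured, estimated):
    """Root mean squared error between measured and estimated values."""
    if len(measured) != len(estimated) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, estimated))
    return math.sqrt(squared / len(measured))


# Made-up scores (assumption): three calibration and two verification samples.
rmsec = rmse([70.0, 75.0, 80.0], [71.0, 74.0, 80.0])   # fitted values
rmsep = rmse([68.0, 77.0], [65.0, 81.0])               # predicted values
```

A model can reach a very small RMSEC by fitting the calibration samples closely, so RMSEP on held-out samples is the figure that reflects prediction ability.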
The CPV (cumulative percent variance) criterion [19] is first employed to select the number of PCs: with a CPV threshold of 98%, 17 PCs are obtained for PCR. With the help of cross validation [22,25], 4 LVs are chosen for PLSR. MPLSR does not require selecting PCs or LVs, a selection that is the main difficulty for PCR and PLSR in preserving the integrity of the information in the original data set.
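The CPV rule keeps the smallest number of components whose eigenvalues account for at least the chosen share of the total variance. A minimal sketch, assuming the eigenvalues of the covariance matrix are already available; the function name and the example eigenvalues are hypothetical and are not the wine data.

```python
from itertools import accumulate


def n_components_by_cpv(eigenvalues, threshold=0.98):
    """Smallest number of components whose cumulative variance share
    reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, partial in enumerate(accumulate(ordered), start=1):
        if partial / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues with cumulative shares 0.50, 0.80, 0.95, 0.99, 1.00.
n_pc = n_components_by_cpv([5.0, 3.0, 1.5, 0.4, 0.1])
```

For these eigenvalues a 98% threshold keeps four components, because three components reach only 95% of the variance.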
According to Figures 1, 2, and 3 and Table 3, the choice of multivariate statistical method significantly affects the performance of the prediction model. Because of multicollinearity and because the calibration set has fewer samples than variables, OLSR is entirely unsuitable for establishing the prediction model. MPLSR reaches the minimal RMSEC and thus gives a satisfying fitting efficiency. However, PCR, PLSR, and MPLSR have similar prediction ability in this paper.

Contribution Ratio of Wine Physicochemical Indexes.
In this paper, 87 wine physicochemical indexes are collected in the calibration set, and these indexes provide adequate information. Measuring such a large number of physicochemical indexes, however, is time-consuming and complex, and some of the indexes contribute little to the wine quality. To avoid lengthy and complex calculations, it is necessary to analyze the contribution of every wine physicochemical index and select the main ones.
To analyze the contribution of every wine physicochemical index to the wine quality, a contribution ratio (CR) is introduced. For the regression equation $y = \theta_0 + \sum_{i=1}^{m} \theta_i u_i$, it can be formulated as follows:
$$\mathrm{CR}_i = \frac{|\theta_i|}{\sum_{j=1}^{m} |\theta_j|},$$
where $\theta$ is the coefficient matrix of the regression equation and $\theta_0$ is a constant. The benefit of this definition for CR is that the sum of all $\mathrm{CR}_i$ ($i = 1 \cdots m$) equals 1.
The contribution of the wine physicochemical indexes is shown in Figure 4. As can be seen from the figure, the indexes with great influence on the wine quality include cis-resveratrol; color; acetic acid, 2-methylpropyl ester; butanedioic acid, diethyl ester; and so forth. By contrast, some physicochemical indexes can be ignored, such as trans-resveratrol; TR; limonene; 1-butanol, 3-methyl; styrene; and so forth. Since some variables can be controlled in the production process, this information can be used to improve the wine quality. The detailed information is shown in [9].
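One definition that satisfies the stated sum-to-one property is the magnitude of each regression coefficient divided by the sum of all coefficient magnitudes, with the constant term left out. The sketch below assumes that definition; the function name and the coefficients are hypothetical.

```python
def contribution_ratios(coefficients):
    """Share of each coefficient magnitude in the total (constant term excluded)."""
    magnitudes = [abs(c) for c in coefficients]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all coefficients are zero; ratios are undefined")
    return [m / total for m in magnitudes]


# Hypothetical coefficients of three normalized indexes.
cr = contribution_ratios([2.0, -1.0, 1.0])
```

Because the indexes are normalized to unit variance before regression, coefficient magnitudes are comparable across indexes, which is what makes such a ratio meaningful.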

Conclusion
In this paper, multivariate methods based on soft measurement, namely ordinary least squares regression (OLSR), principal component regression (PCR), partial least squares regression (PLSR), and modified partial least squares regression (MPLSR), are reviewed. With these methods and real data obtained in practice, wine quality prediction models have been constructed. One purpose of this work is to choose the best method among OLSR, PCR, PLSR, and MPLSR. The model built by MPLSR performs the best among the four, with superior fitting efficiency as measured by RMSEC. The models built by PCR, PLSR, and MPLSR have similar prediction ability as measured by RMSEP. Several wine physicochemical indexes contribute significantly to the final wine quality, while others are insignificant. The efficiency of the MPLSR model is validated on a larger data set, and the method is based on objective tests, which can aid the speed and quality of the oenologist's work. The outcome of the work is useful for the wine industry: with this method, producers can save manpower and financial resources.
As science and technology are transformed into productivity, advanced technology is widely used in industry, and soft measurement is developing along with it. Soft measurement has achieved comprehensive results in process control research, and it breaks the traditional single input single output (SISO) pattern. Computer technology supports soft measurement by improving the availability of soft-sensing techniques and reducing the difficulty of applying them. We believe that soft measurement has wide application prospects as computer technology and other advanced technologies are integrated.

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.