Incremental Support Vector Machine Combined with Ultraviolet-Visible Spectroscopy for Rapid Discriminant Analysis of Red Wine

The aim of this work is to develop a new method to overcome the increased training time when a recognition model is updated with new features extracted from new samples. As a common complex system, red wine has a rich chemical composition and is used as the object of this research. The novel method based on incremental learning support vector machine (I-SVM) combined with ultraviolet-visible (UV-Vis) spectroscopy was applied to discriminant analysis of the brands of red wine for the first time. In this method, new features included in the new training samples were introduced into the recognition model through iterative learning in each iteration, and the recognition model was rapidly updated without significantly increasing the training time. Experimental results show that the recognition model established by this method obtains a good balance between training efficiency and recognition accuracy.


Introduction
Ultraviolet-visible (UV-Vis) absorption spectra of red wine samples can be obtained using a UV spectrophotometer. Differences in the peak shape, height, and area of the UV-Vis spectra characterize differences in the composition and in the degree of unsaturation of the components in red wine samples, and thus reflect the overall characteristics of the wine. UV-Vis spectroscopy offers high sensitivity, good reproducibility, high efficiency, and low cost.
Different pattern recognition algorithms combined with UV-Vis spectroscopy have been used to detect red wines [1][2][3][4][5][6][7][8][9]. However, traditional pattern recognition is an off-line training method, in which a classifier is trained on a labeled sample data set and then used for recognition. The quality of red wine is mainly determined by the grape raw material and the brewing process, and the quality of the raw material is strongly influenced by the climate of the place of production. As a result, different batches of red wine of the same brand differ subtly in taste, and the corresponding spectral data change as well. The classification accuracy of a classifier trained on off-line data will therefore be significantly reduced. The usual solution is to retrain the classifier, but retraining requires a large number of training samples and a long training time, and this cost is prohibitive. Therefore, how to adapt a classifier trained on old labeled data to the identification of new samples is a difficult problem in on-line identification [10][11][12][13].
Support vector machine (SVM) has been successfully used in data mining, pattern recognition, and artificial intelligence. With labeled data, an SVM learns a boundary (i.e., a hyperplane) that separates data of different classes with maximum margin. The classification process usually faces new, evolving data, and the initial training samples cannot reflect all of the sample information. When new training samples have accumulated to a certain scale, it is desirable to integrate them and train a new classification model in order to capture the new sample information. However, training an SVM model has a time complexity of O(n^3) (n is the number of training samples), so retraining from scratch does not suit large-scale online applications [14][15][16][17].
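For reference, the maximum-margin boundary mentioned above is usually written as the standard soft-margin problem below. The notation is an assumption introduced here for illustration and does not come from the original text: $x_k$ and $y_k \in \{-1, +1\}$ are the $n$ training samples and their labels, $w$ and $b$ define the hyperplane, $\xi_k$ are slack variables, $C$ is the penalty parameter, and $\phi$ is the feature map induced by the kernel.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{k=1}^{n}\xi_k
\qquad\text{subject to}\qquad
y_k\bigl(w^{\top}\phi(x_k)+b\bigr)\ \ge\ 1-\xi_k,\quad \xi_k\ \ge\ 0,\quad k=1,\dots,n .
```

Only the samples whose constraints are active at the optimum (the support vectors) determine $w$ and $b$, which is why an incremental scheme can discard the remaining samples.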
It is noteworthy that the performance of a classification method for red wine is evaluated not only on accuracy but also on speed, both of which are of great significance in practical applications. Much work has been done to address this problem. One way is to reduce the number of training samples with a sample selection strategy, since the quality of the training data set is vital to the performance of the classifier being constructed. Bennett et al. worked out an incremental algorithm based on SVM, which retains only the support vector set as the historical training sample [13][14][15][16][17][18].
The main contribution of this paper is a novel hybrid classification method based on principal component analysis (PCA) and incremental support vector machine (I-SVM) [17,18] combined with UV-Vis spectra. PCA-I-SVM was evaluated as a classifier in terms of classification rate and running time. Compared with normal SVM, PCA-I-SVM runs much faster with a similar accuracy rate. The experimental results showed that PCA-I-SVM combined with UV-Vis spectra can be a rapid and accurate method for the classification of red wine.

Experiments and Materials
2.1. Sample Collection and Preparation. Nine brands of red wine, a total of 54 wine samples, were purchased from a well-known e-commerce website in China. In addition, in order to verify the time efficiency of the incremental learning algorithm, a total of 5400 samples in 100 batches were simulated by the Monte Carlo method.
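The text does not describe how the Monte Carlo simulation was carried out. The sketch below shows one plausible way to generate 100 batches of 54 spectra from the measured ones, under clearly hypothetical assumptions: a random multiplicative drift per spectrum, independent additive noise per wavelength, six samples per brand, a 1 nm wavelength grid, and random placeholder data in place of the real spectra. The function name `simulate_batches` and all parameter values are illustrative only.

```python
import numpy as np

def simulate_batches(measured, labels, n_batches=100,
                     drift_sd=0.02, noise_sd=0.005, seed=0):
    """Generate synthetic spectra by perturbing each measured spectrum.

    measured: (n_samples, n_points) absorbance matrix
    labels:   (n_samples,) brand index of each measured spectrum
    drift_sd, noise_sd: hypothetical perturbation sizes, not from the paper
    """
    rng = np.random.default_rng(seed)
    n_samples, n_points = measured.shape
    spectra, brands = [], []
    for _ in range(n_batches):
        # One multiplicative factor per spectrum (assumed batch-to-batch drift).
        scale = 1.0 + rng.normal(0.0, drift_sd, size=(n_samples, 1))
        # Independent additive noise at each wavelength (assumed instrument noise).
        noise = rng.normal(0.0, noise_sd, size=(n_samples, n_points))
        spectra.append(measured * scale + noise)
        brands.append(labels)
    return np.vstack(spectra), np.concatenate(brands)

# Placeholder data standing in for the 54 measured spectra.
rng = np.random.default_rng(1)
wavelengths = np.arange(240, 551)        # 240~550 nm, assumed 1 nm step
measured = rng.uniform(0.1, 1.0, size=(54, wavelengths.size))
labels = np.repeat(np.arange(9), 6)      # assumed 6 samples per brand

X_sim, y_sim = simulate_batches(measured, labels)
print(X_sim.shape, y_sim.shape)
```

Each simulated batch here has the same size and class balance as the measured set, so 100 batches give the 5400 samples mentioned above.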

UV-Vis Spectrum Acquisition.
The UV-Vis spectra were obtained with a UV-Vis spectrometer (T6 New Century, Purkinje General Instrument Co., Ltd., Beijing, China). Water was used as the zero point. Each test sample was prepared by mixing 100 μL of wine with 3 mL of water. The scanning range of the UV-Vis spectrum of each sample was 240~550 nm.

Data Preprocessing.
In this study, we used PCA to remove redundant features, and the first several principal components were extracted as the input of the classifier for red wine. PCA is a method for re-expressing multivariate data. It allows the researcher to reorient the data so that the first few dimensions account for as much of the available information as possible. The principal component solution has the property that each component is uncorrelated with all others, which has the advantage of eliminating multicollinearity.
The number of features in the raw spectra was still quite large for the classifier, so PCA was used to perform feature reduction before pattern recognition, and then I-SVM was used for the classification of red wine.
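The PCA step described above can be sketched with a singular value decomposition of the mean-centered spectra. This is a minimal illustration, not the authors' implementation: the helper names `pca_fit` and `pca_transform` are hypothetical, and random placeholder data stand in for the measured spectra.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column mean, the leading principal axes, and their variance ratios."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each wavelength
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)                     # fraction of variance per component
    return mean, Vt[:n_components], ratio[:n_components]

def pca_transform(X, mean, components):
    """Project spectra onto the retained principal axes."""
    return (X - mean) @ components.T

# Placeholder data: 54 spectra with 311 wavelength points (assumed grid).
rng = np.random.default_rng(0)
X = rng.normal(size=(54, 311))

mean, components, ratio = pca_fit(X, n_components=8)
scores = pca_transform(X, mean, components)

# The score columns are mutually uncorrelated, as stated in the text.
cov = np.cov(scores, rowvar=False)
off_diagonal = cov - np.diag(np.diag(cov))
print(scores.shape, np.abs(off_diagonal).max())
```

The scores matrix, with one row per sample and one column per retained component, is what would be passed to the classifier.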

Incremental SVM Learning Based on Support Vector.
To make the SVM learning algorithm incremental, reducing the size of the training data set is an effective way to apply SVM classification to large data sets. Because the support vectors are a sufficient description of the decision boundary between the examples, at each incremental step the old sample data are represented by the set of support vectors. These support vectors are combined with the new incoming batch of data to provide the training data for the next step. Since the number of support vectors is small compared with the total number of training examples, this method can effectively reduce the number of samples.
Now, suppose that the labeled data $X = \{x_{ij} \mid i = 1, 2, \dots, c;\ j = 1, 2, \dots, N_i\}$ are the UV-Vis spectra data, where $c$ is the number of red wine classes, $x_{ij}$ represents the $j$th sample in class $i$, and $N_i$ is the number of samples in class $i$. The overall red wine sample size is $N$, which is expressed by $N = \sum_{i=1}^{c} N_i$.
The incremental SVM algorithm framework is shown in Figure 1:
(1) Use the labeled UV-Vis spectra data $X_{\mathrm{ini}}$ to initialize the SVM classifier.
(2) For each incremental learning step, create subsets: for each class data set $X_i$, compute the support vector set $X_i^{SV}$, then concatenate
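As a rough illustration of the support-vector retention idea in this section, the sketch below keeps only the support vectors of the current model and retrains on them together with each new batch. It is a simplified sketch under stated assumptions, not the authors' algorithm: it uses scikit-learn's multiclass `SVC` rather than a per-class procedure, the function name `incremental_svm`, the values of `C` and `gamma`, and the synthetic Gaussian batches are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def incremental_svm(batches, C=1.0, gamma="scale"):
    """Train on a sequence of (X, y) batches, carrying over only support vectors.

    Returns the final model and the size of the last training set.
    """
    X_train, y_train = batches[0]
    model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    for X_new, y_new in batches[1:]:
        sv = model.support_                     # indices of support vectors in X_train
        # Old data are represented by their support vectors only.
        X_train = np.vstack([X_train[sv], X_new])
        y_train = np.concatenate([y_train[sv], y_new])
        model = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    return model, len(y_train)

def make_batch(rng, centers, n_per_class):
    """Synthetic placeholder batch: unit-variance Gaussian clusters around centers."""
    X = np.vstack([c + rng.normal(size=(n_per_class, centers.shape[1]))
                   for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

rng = np.random.default_rng(0)
centers = 6.0 * np.eye(3, 8)                    # 3 well-separated classes, 8 features
batches = [make_batch(rng, centers, 20) for _ in range(10)]

model, n_last = incremental_svm(batches)
X_test, y_test = make_batch(rng, centers, 50)
accuracy = model.score(X_test, y_test)
print(n_last, accuracy)
```

In this sketch every batch contains all classes. A batch that lacks a class is only safe if the retained support vectors still cover that class.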

Data Analysis
UV-Vis spectra quickly provide information about the functional groups of the aromatic compounds in a sample, and they offer significant advantages, including simple sample preparation, rapid analysis, high sensitivity, robustness, a green process, and low cost. Since the UV-Vis spectra mainly reflect the main compounds of red wine, the spectra of different red wines at 240~550 nm are extremely similar and difficult to distinguish manually, as shown in Figure 2. The UV-Vis absorption around 260~275 nm can be attributed to the π-π* electron transition from the bonding π orbital to the antibonding π* orbital, for example, in unsaturated hydrocarbons and aromatic hydrocarbons. Traditional spectral analysis commonly relies on the Beer-Lambert law, that is, on the relation between the maximum absorption peak and specific substances, so the utilization of spectral information is relatively low. In fact, the UV-Vis spectra of red wine are rich in information: the peak shape, area, width, and so on are closely related to the quality of the sample. However, observation of the spectra alone is not an appropriate basis for a visual judgment of the quality of red wine, because most of the time different molecules contribute to similar peaks. To overcome this limitation of visual analysis, statistical methods are mostly used for further analysis of the UV-Vis spectra. With a statistical approach, one can extract useful information from the data set by highlighting similarities and differences. In this study, we used I-SVM for the classification of red wine in order to handle large amounts of sample data efficiently.
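The Beer-Lambert law mentioned above can be written as follows. The symbols are introduced here for illustration and are not taken from the original text: $A(\lambda)$ is the absorbance at wavelength $\lambda$, $\varepsilon_m(\lambda)$ is the molar absorptivity of component $m$, $c_m$ is its concentration, and $\ell$ is the optical path length.

```latex
A(\lambda) = \varepsilon(\lambda)\,\ell\,c
\quad\text{(single absorbing species)},
\qquad
A(\lambda) = \ell \sum_{m} \varepsilon_m(\lambda)\,c_m
\quad\text{(mixture of absorbing species)} .
```

The mixture form shows why a single peak is hard to interpret by eye: several components with overlapping $\varepsilon_m(\lambda)$ add up to one observed absorbance, which is the overlap problem described in the paragraph above.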

Results and Discussion
4.1. Raw Data of Characteristic Information. The chemical components of different red wines and their relative contents are different, and these differences produce different associations. As a result, the spectral curves of different red wines differ somewhat and have their own characteristics and fingerprints. The difference between the spectra lies in the variation of the relative intensities of the absorption peaks in the fingerprint region and in the minute differences among the small peaks in that region. A pattern recognition algorithm can maximize the information extracted from the data and can classify the sample set.

Red Wine Classification
4.2.1. Data Preprocessing Results with PCA. The dimension of the feature space generated by PCA is not determined by PCA itself; it depends on the final classification rate and efficiency. According to Figure 3, we used 8 principal components as feature vectors, taking into account the balance between efficiency and classification accuracy.
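A curve of accuracy against the number of principal components, of the kind Figure 3 reports, could be produced along the following lines. This is a sketch under assumptions: the helper `accuracy_by_components` is hypothetical, a plain RBF `SVC` with default parameters and 3-fold cross-validation stand in for the classifier and validation scheme, and the data are synthetic placeholders with an assumed six samples per brand.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def accuracy_by_components(X, y, max_components=12, folds=3):
    """Cross-validated accuracy for each candidate number of principal components."""
    scores = {}
    for k in range(1, max_components + 1):
        # PCA is refit inside each fold, so test spectra never influence the axes.
        pipe = make_pipeline(PCA(n_components=k), SVC(kernel="rbf"))
        scores[k] = cross_val_score(pipe, X, y, cv=folds).mean()
    return scores

# Placeholder data: 9 brands x 6 spectra, each a brand pattern plus noise.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 6)
patterns = rng.normal(size=(9, 311))
X = patterns[y] + 0.5 * rng.normal(size=(54, 311))

scores = accuracy_by_components(X, y)
best_k = max(scores, key=scores.get)
for k, acc in scores.items():
    print(k, round(acc, 3))
```

In practice one would pick the smallest number of components beyond which accuracy stops improving, which trades accuracy against training cost in the way the text describes.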

Classification Result with I-SVM.
The I-SVM algorithm was used to classify the nine brands of red wine samples. We selected the RBF function as the kernel function, and the kernel parameter was optimized by grid search with cross-validation. We used leave-one-out cross-validation and 10-fold cross-validation to assess the performance of these classifiers.

4.2.3. Comparison of Data Processing Efficiency and Accuracy. For comparison, three different algorithms were simulated. Algorithm 1 is the normal SVM algorithm, which uses all the samples to solve for the support vectors at each incremental learning step. Algorithm 2 is the I-SVM algorithm, which uses the support vector set for incremental learning. Algorithm 3 is the multilayer perceptron neural network (MLPNN) algorithm. The initial sample set consists of 300 samples randomly selected from all samples, and 510 samples are added at each incremental learning step. The results are shown in Figure 4. At each incremental learning step, the I-SVM algorithm uses the support vector set together with the new sample set as the training data set. This greatly reduces the computation time and speeds up the simulation while giving essentially the same classification accuracy. Meanwhile, as incremental learning continues, some earlier support vectors naturally cease to be support vectors, which achieves selective forgetting of the historical training data. Therefore, when dealing with a large amount of new training data, the speed advantage of the I-SVM algorithm is even more remarkable.
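The grid search with cross-validation described under "Classification Result with I-SVM" could look roughly like the following with scikit-learn. The parameter grid, the inclusion of the penalty `C` in the search, and the synthetic data are assumptions for illustration, since the text does not give the search ranges. The placeholder set uses 20 samples per class so that stratified 10-fold splitting is valid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Placeholder features: 9 classes x 20 samples, 8 PCA scores each (synthetic).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 20)
centers = rng.normal(scale=3.0, size=(9, 8))
X = centers[y] + rng.normal(size=(180, 8))

# Hypothetical search ranges for the RBF kernel width and the penalty.
param_grid = {"C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]}

# 10-fold cross-validated grid search over the RBF-kernel SVM.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)

# Leave-one-out estimate for the selected parameters.
best = SVC(kernel="rbf", **search.best_params_)
loo_accuracy = cross_val_score(best, X, y, cv=LeaveOneOut()).mean()

print(search.best_params_, round(search.best_score_, 3), round(loo_accuracy, 3))
```

The leave-one-out figure is computed on the same data used to select the parameters, so it is somewhat optimistic. A nested scheme would avoid that at a higher computational cost.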

Conclusion
In this work, UV-Vis spectroscopy was combined with I-SVM for the online identification of red wine. After pretreatment of the red wine samples, a UV spectral fingerprint library of nine kinds of red wine was established with a UV spectrophotometer. After dimensionality reduction by PCA, an incremental SVM model was established to identify the red wine. The recognition rate of red wine reached 94.9%. In addition, in order to verify the time efficiency of the algorithm, a total of 5400 samples in 100 batches were simulated by the Monte Carlo method, and on these data the recognition rate reached 96.78%. The average training time of the I-SVM models for each batch was 0.47 seconds, with a standard deviation of 0.03, which is one-thirteenth of the average time of normal SVM. The method provides a reliable, stable, rapid, and new approach to the online identification of red wine and a methodological basis for the quality evaluation and quality control of red wine.

Figure 2: Typical UV-Vis spectra of nine kinds of red wine.

Figure 3: Classification accuracy rate as a function of the number of principal components.

Figure 4: Comparison of the classification accuracy and training time of the SVM, I-SVM, and MLPNN algorithms.