Rapid Discrimination of Cheese Products Based on Probabilistic Neural Network and Raman Spectroscopy

This work addresses the practical problem that few fast, intelligent, and objective methods exist for distinguishing dairy products, and it aims to improve their quality control. To this end, a cheese product brand discrimination method based on Raman spectroscopy and a probabilistic neural network algorithm was developed. The experimental results show that the spectrum contains abundant molecular vibration information on carbohydrates, fats, proteins, and other components, and that the Raman spectral data collection time for a single sample is only 100 s. Because of the high spectral similarity between samples, they cannot be identified with the naked eye. Characteristic peak intensity combined with the statistical process control method was employed to study the fluctuation characteristics of the samples. The results show that the characteristic peaks of the experimental samples fluctuate within certain control limits. However, because the Raman spectra of different brands are highly similar, the brands cannot be effectively identified in this way either. This paper therefore established an analytical approach based on Raman spectroscopy that includes wavelet denoising, normalization, principal component analysis, and probabilistic neural network discrimination. With db1 wavelet processing, [−1, 1] normalization, and 74 principal components (cumulative contribution rate of 100%), different brands of cheese products can be effectively discriminated within 1 s, with an average recognition accuracy of 96%. The discriminant method established in this work has the advantages of simple operation, rapid analysis, and accurate results. It provides a technical reference for the fight against counterfeit products and has broad application prospects.


Introduction
In recent years, the quality and safety assurance of dairy products has become an issue of great concern to consumers, enterprises, and government regulators. The risks come mainly from illegal additives, toxic and harmful substances, and shoddy products [1,2]. The existing detection method that has been most widely studied and applied is component analysis, mainly represented by chromatography and chromatography-mass spectrometry [3]. For example, accurate qualitative and quantitative information on melamine, dicyandiamide, aflatoxin, and other components can be obtained through component analysis [4,5]. However, this method faces two challenges. First, it is a routine separation and analysis test, which generally requires sample preprocessing and is time-consuming and laborious. Second, it has recently been reported that some "counterfeit" products are in fact qualified products, which do not harm the human body but are illegally passed off as high-quality products in order to obtain illegal profits [6]. The traditional component analysis method may not effectively distinguish genuine samples from such counterfeit samples.
In view of the above problems, rapid detection methods have been widely used in the field of analytical chemistry, such as the colorimetric method, the strip method, and the computer-aided discrimination method [7][8][9]. Colorimetry mainly uses the specific reaction between the marker and the reactant: the marker changes the dispersion and aggregation state of nanomaterials (e.g., colloidal gold nanoparticles), which produces a color change by which the target molecules are identified [10,11]. The strip method mainly uses immune or competitive reactions to initiate or prevent the aggregation of colloidal gold nanoparticles on the detection line of the paper strip. The detection line appears red or colorless to indicate the presence or absence of the target molecules [12,13]. Computer-aided discriminant analysis is mainly based on spectroscopic, chromatographic, mass spectrometric, and similar data from the samples, combined with chemometric algorithms [14][15][16][17]. Because spectral data can be obtained quickly and contain abundant molecular information about the sample, spectroscopy has become a key area of research and development in rapid detection technology.
Spectroscopy mainly includes infrared spectroscopy, ultraviolet spectroscopy, fluorescence spectroscopy, and Raman spectroscopy. Infrared spectroscopy is easily interfered with by water molecules. Ultraviolet spectroscopy requires that the tested object contain unsaturated compounds and that the sample be mainly liquid. Fluorescence spectroscopy requires that the tested object contain a luminescent structure. Compared with these methods, Raman spectroscopy has several advantages: there is no obvious interference from water molecules, the object can be tested directly, rich vibrational information about the sample can be characterized, and the equipment is portable [18,19]. It has become a research focus in the field of rapid detection. For example, Mendes et al. established a quantitative analysis of milk fat based on vibrational spectroscopy [20]. Teixeira et al. evaluated the detection of β-lactam antibiotics in milk at the experimental and theoretical levels [21]. Nieuwoudt et al. reported a method for rapid quantitative determination of melamine, urea, ammonium sulfate, dicyandiamide, and sucrose in milk using partial least squares combined with Raman spectroscopy [22]. However, there are still relatively few studies on Raman spectroscopy combined with chemometrics in the field of dairy products, and the current reports have mainly focused on component prediction [23].
Herein, an approach to cheese product classification based on Raman spectroscopy and a probabilistic neural network was established. Cheese products are rich in nutrients, are available in relatively small quantities, and are expensive, so a rapid and intelligent discrimination method is urgently needed [24]. The newly established method has the following main advantages. First, the traditional component analysis method has difficulty identifying different brands of cheese products with high similarity, whereas the molecular information of the sample can be characterized by Raman spectroscopy and the sample can be effectively identified by combining it with the probabilistic neural network classification algorithm. Second, Raman spectral signal acquisition for the experimental samples is simple and fast and does not need sample pretreatment; the water molecules in the cheese samples do not interfere with the test, the data acquisition time for each sample is only 100 s, and the probabilistic neural network algorithm runs in less than 1 s. Finally, the Raman spectrometer is portable, which is conducive to on-site detection. Combined with the multimolecular characteristics of Raman spectroscopy, the proposed method can effectively fingerprint the experimental samples and provide an intelligent, objective evaluation system.

Samples and Instruments.
Samples of three brands of cheese products were purchased from Suguo Supermarket (Nanjing, China) and marked as brand XX, brand YY, and brand ZZ, respectively. For each brand, 25 samples were randomly collected. A 96-well plate was filled with an appropriate amount of cheese product.
The Raman spectra were recorded using a portable laser Raman spectrometer (ProttezRaman-d3; Enwave Optronics Inc., USA). The excitation wavelength of the laser was 785 nm, the laser power was about 450 mW, and the integration time was 100 s. The spectrometer operated over a spectral range from 250 to 2000 cm⁻¹ with a resolution of 1 cm⁻¹.

Data Analysis.
The baseline calibration of the collected Raman spectra was carried out with the software SLSR Reader V8.3.9 (Enwave Optronics Inc., USA). Wavelet denoising, normalization, principal component analysis, and the probabilistic neural network were implemented in MATLAB (MathWorks, Natick, MA, USA). The "wden" function was applied to implement wavelet denoising. The "mapminmax" function was applied to implement normalization. The "princomp" function was employed to implement principal component analysis. The "newpnn" function was used to construct the probabilistic neural network. A statistical control chart was obtained using Minitab software (Minitab Inc., USA).

Raman Spectroscopic Characterization Analysis of Cheese Products.
Figure 1 shows the Raman spectra of cheese products. With reference to the existing literature [25][26][27], the main Raman peaks of cheese products can be assigned as follows (Table 1). The peak at 1760 cm⁻¹ was mainly attributed to the C=O stretching of the ester groups of fatty acid molecules. The Raman peak at 1670 cm⁻¹ was characteristic of the C=O stretching vibration of amide I of proteins and the C=C stretching mode of unsaturated fatty acids. The weak Raman band at 1620 cm⁻¹ could be attributed to the ring vibration of the amino acid phenylalanine. The most prominent feature of the spectra was the CH₂ deformation vibration of fats and carbohydrate molecules at 1458 cm⁻¹. The Raman peak at 1313 cm⁻¹ was the CH₂ twisting vibration related to lipids. The region between 800 and 1200 cm⁻¹ was highly characteristic of carbohydrates; the main peaks could be attributed to C-O stretching, C-C stretching, and C-O-H deformation vibrations (1143, 1095, and 1080 cm⁻¹), C-O-C and C-O-H deformation vibrations and C-O stretching vibration (938 cm⁻¹), and C-C-H and C-O-C deformation vibrations (851 cm⁻¹). The exception was the peak at 1019 cm⁻¹, which was the ring-breathing mode associated with phenylalanine. Vibrations in the 250-800 cm⁻¹ region mainly included the C-C-O deformation vibration (636 cm⁻¹), glucose (510 cm⁻¹), and lactose (384 cm⁻¹). Figure 1 thus shows rich information on the material components and molecular vibrations of cheese products; at the same time, the Raman spectra of different brands are highly similar and cannot be distinguished with the naked eye.
Similarly, all the cheese products appear as a yellowish viscous solid, so it is also difficult to identify the brand from appearance. Figures S1-S3 show ten randomly selected Raman spectra of brand XX, YY, and ZZ cheese products, respectively. The figures show that the Raman intensities of cheese products of the same brand fluctuate somewhat, but the overall spectra remain highly consistent. There are also high similarities among the samples of different brands, which suggests that statistical learning methods are needed for their discriminant analysis.

Statistical Analysis of Raman Spectral Peak Intensity of Cheese Products.
In actual production management, the statistical process control method is often used for the statistical analysis of sample quality fluctuation [28]. The above analysis shows that the Raman spectrum is closely related to the molecular composition of the corresponding cheese products. Therefore, the statistical control chart method can be employed to analyze the fats (1760 cm⁻¹), carbohydrates (1458 cm⁻¹), and proteins (1019 cm⁻¹), respectively. The statistical control chart can be constructed using the standard individual and moving range chart formulae [29,30]. For the individual (a) control chart, the formulae are as follows:

$$\mathrm{UCL} = \bar{a} + 2.66\,\overline{\mathrm{MR}}, \qquad \mathrm{CL} = \bar{a}, \qquad \mathrm{LCL} = \bar{a} - 2.66\,\overline{\mathrm{MR}}. \tag{1}$$

For the moving range (MR) control chart, the formulae are as follows:

$$\mathrm{UCL} = 3.267\,\overline{\mathrm{MR}}, \qquad \mathrm{CL} = \overline{\mathrm{MR}}, \qquad \mathrm{LCL} = 0. \tag{2}$$

In the formulae, $a$ and $\bar{a}$ represent the Raman intensity and its average value over the samples, respectively; MR represents the moving range, $\mathrm{MR}_i = |a_{i+1} - a_i|$, where $a_i$ is the Raman intensity of sample $i$ and $i$ runs from 1 to 24 in steps of 1 in this work. UCL is the upper control limit, CL is the center line, LCL is the lower control limit, and $\overline{\mathrm{MR}}$ is the average value of the moving range. The constants 2.66 and 3.267 are the standard factors for a moving range of two consecutive observations.
As shown in Figure 2, the control lines were calculated from the Raman spectral intensities of brand XX at 1760 cm⁻¹. The Raman peak intensity corresponding to the fat content of the brand XX samples fluctuates in the range of 55.7-163.1, and the moving range lies within 0-65.97. The experimental samples show good quality stability, and no sample falls outside the control limits.
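The individual and moving range limits described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Minitab procedure used in this work: it assumes the standard I-MR constants for a moving range of span two (2.66 and 3.267), and the function names `imr_limits` and `out_of_control` as well as the demonstration intensities are hypothetical.

```python
from statistics import mean

# Standard I-MR chart constants for a moving range of span 2 (assumed here).
E2 = 2.66    # individual chart factor, 3 / d2 with d2 = 1.128
D4 = 3.267   # upper-limit factor for the moving range chart


def imr_limits(intensities):
    """Return control limits for the individual (I) and moving range (MR) charts."""
    if len(intensities) < 2:
        raise ValueError("need at least two observations")
    # Moving ranges between consecutive samples: MR_i = |a_{i+1} - a_i|.
    moving_ranges = [abs(b - a) for a, b in zip(intensities, intensities[1:])]
    a_bar = mean(intensities)
    mr_bar = mean(moving_ranges)
    return {
        "I": {"LCL": a_bar - E2 * mr_bar, "CL": a_bar, "UCL": a_bar + E2 * mr_bar},
        "MR": {"LCL": 0.0, "CL": mr_bar, "UCL": D4 * mr_bar},
    }


def out_of_control(intensities):
    """Indices of samples whose intensity lies outside the I-chart limits."""
    lim = imr_limits(intensities)["I"]
    return [i for i, a in enumerate(intensities) if not lim["LCL"] <= a <= lim["UCL"]]


# Hypothetical peak intensities (illustrative only, not the measured data).
demo = [100.0, 110.0, 95.0, 105.0, 120.0, 98.0]
limits = imr_limits(demo)
flagged = out_of_control(demo)  # empty list: every demo point is within limits
```

As a rough plausibility check, if the reported range 55.7-163.1 is read as the lower and upper limits of the individual chart, the implied average moving range is (163.1 − 55.7)/(2 × 2.66) ≈ 20.2, which gives a moving range upper limit of about 3.267 × 20.2 ≈ 66.0, close to the reported 65.97.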

Discriminant Analysis of Cheese Products Based on Probabilistic Neural Network.
For this kind of product, which shows both quality fluctuation and high similarity, using a machine learning algorithm to establish an effective discriminant analysis process has become a research hotspot [31]. As a pattern classification algorithm, the probabilistic neural network (PNN) has the advantages of easy training and fast convergence, and it was employed to construct the discriminant analysis method in this paper [32,33]. This algorithm consists of an input layer, a model layer, a sum layer, and an output layer. The main function of the input layer is to receive the Raman spectral data of the training samples and transmit the data to the network. Its number of neurons is equal to the attribute dimension of the samples. The model layer mainly describes the pairing relationship between the feature vector transferred from the previous layer and each pattern in the training samples. When a vector $x$ is received, the input-output relationship of the $j$-th neuron of the class $i$ sample in this layer takes the standard Gaussian kernel form:

$$\varphi_{ij}(x) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \exp\left(-\frac{\lVert x - x_{ij} \rVert^{2}}{2\sigma^{2}}\right),$$

where $i = 1, 2, \ldots, b$, $b$ is the total number of classes in the training samples, $d$ is the sample space dimension, $x_{ij}$ is the $j$-th center of the class $i$ sample, $\sigma$ is the smoothing factor, and $\varphi_{ij}$ is the output of the $j$-th neuron of the class $i$ sample in the model layer.
In the sum layer, the outputs of the model layer neurons belonging to the same class are averaged using the formula

$$f_i = \frac{1}{L} \sum_{j=1}^{L} \varphi_{ij},$$

where $f_i$ is the output for class $i$ and $L$ is the number of neurons of class $i$. The output layer is composed of competing neurons, whose function is to receive the output from the sum layer and to find the neuron with the largest posterior probability density among all the output layer neurons. Its output is the predicted category, and the output of the other neurons is 0. The formula is as follows:

$$y = \arg\max_{i} f_i,$$

where $y$ is the result of the output layer. The "newpnn" function of MATLAB can be employed to build the probabilistic neural network. First, 80% of the experimental samples were selected to build the training set, and the remaining 20% were used to build the test set. With the spectra used directly as input, the recognition accuracy is only 33.33%, which is what random assignment among three brands would give. The reason may be that there is redundant information in the Raman spectral data of cheese samples, which affects the effective calculation and discrimination of the model.
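The four layers described above can be illustrated with a short Python sketch. It is not the MATLAB "newpnn" implementation used in this work: it assumes a Gaussian kernel with normalization constant 1/((2π)^(d/2) σ^d) in the model layer, a plain average in the sum layer, and an arg max in the output layer. The names `pattern_output` and `pnn_predict`, the two-dimensional training points, and the value of `sigma` are all hypothetical.

```python
import math


def pattern_output(x, center, sigma):
    """Model layer: Gaussian kernel between input x and one training center."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi) ** (d / 2.0) * sigma ** d
    return math.exp(-sq_dist / (2.0 * sigma ** 2)) / norm


def pnn_predict(x, train_by_class, sigma):
    """Sum layer (class-wise average) followed by the output layer (arg max).

    train_by_class maps each class label to its list of training vectors.
    Returns the predicted label and the per-class scores f_i.
    """
    scores = {
        label: sum(pattern_output(x, c, sigma) for c in centers) / len(centers)
        for label, centers in train_by_class.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical 2-D feature vectors for three brands (illustrative only).
demo_train = {
    "XX": [[0.0, 0.0], [0.1, 0.0]],
    "YY": [[1.0, 1.0], [0.9, 1.0]],
    "ZZ": [[-1.0, 1.0], [-1.0, 0.9]],
}
label, scores = pnn_predict([0.05, 0.1], demo_train, sigma=0.3)
```

Because the network stores the training vectors themselves as kernel centers, "training" amounts to a single pass over the data, which is why the algorithm is fast. For high-dimensional inputs and small `sigma` the exponentials can underflow, so a practical implementation would likely work with log-densities instead.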
Based on the characteristics of the Raman spectral data of cheese products, wavelet denoising ("wden" function, db1 wavelet base, decomposition level = 3) was employed to eliminate the spectral line noise, and the normalization function ("mapminmax") was employed to scale the Raman spectral intensity into the range [−1, 1], so as to reduce the influence of differences in scale. For wavelet denoising, the Raman spectral data $f(t)$ are expressed as a linear combination of wavelet functions:

$$f(t) = \sum_{m,n} wf(m,n)\,\psi_{m,n}(t),$$

where $wf(m,n)$ is the component represented by the wavelet function $\psi_{m,n}(t)$ in the original Raman spectral data $f(t)$. In the calculation, the original Raman spectral data are transformed into wavelet coefficients, the smaller coefficients are then shrunk by soft thresholding, and finally the denoised spectral data are reconstructed.
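The transform, shrink, and reconstruct sequence can be sketched in plain Python for the db1 (Haar) wavelet with three decomposition levels. This is only an approximation of what "wden" does: the threshold rule is not stated in the text, so the sketch assumes a universal threshold with the noise level estimated from the finest detail coefficients, and it pads the signal by repeating the last point so that its length is a multiple of 2³. All function names are hypothetical.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Haar (db1) analysis step on an even-length signal: (approx, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(value, thr):
    """Shrink a coefficient toward zero by thr; small coefficients become zero."""
    return math.copysign(max(abs(value) - thr, 0.0), value)


def haar_denoise(signal, levels=3):
    """Denoise a spectrum by soft-thresholding its Haar detail coefficients."""
    n = len(signal)
    if n == 0:
        raise ValueError("empty signal")
    # Pad with the last value so the length is a multiple of 2**levels (assumption).
    padded = list(signal) + [signal[-1]] * (-n % 2 ** levels)
    approx, details = padded, []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # Assumed rule: universal threshold, noise estimated from finest details.
    noise = median(abs(c) for c in details[0]) / 0.6745
    thr = noise * math.sqrt(2.0 * math.log(n))
    for detail in reversed(details):
        approx = haar_inverse_step(approx, [soft_threshold(c, thr) for c in detail])
    return approx[:n]


# Hypothetical flat baseline with a small alternating ripple (illustrative only).
ripple = [0.01 if i % 2 else -0.01 for i in range(64)]
smoothed = haar_denoise(ripple)
```

In this toy case the ripple lives entirely in the finest detail coefficients, so thresholding removes it and the reconstructed signal is flat. Real spectra also carry peak information in the detail coefficients, which is why only coefficients below the threshold are suppressed.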
The normalization formula is as follows:

$$W = \frac{(W_{\max} - W_{\min})\,(X - X_{\min})}{X_{\max} - X_{\min}} + W_{\min}.$$

The denoised Raman spectral data $X$ are mapped to $W$ in the range [−1, 1], with $W_{\max} = 1$ and $W_{\min} = -1$.
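The mapping above is a one-line linear rescaling, shown here as a small Python sketch. It is applied to a single list of intensities, which is an assumption about how the spectra are arranged; the function name `map_min_max` is hypothetical, and rejecting a constant input is a choice made for this sketch, not a statement about how "mapminmax" behaves.

```python
def map_min_max(values, w_min=-1.0, w_max=1.0):
    """Linearly rescale values so the minimum maps to w_min and the maximum to w_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A flat input has no range to rescale; this sketch simply rejects it.
        raise ValueError("cannot normalize a constant sequence")
    scale = (w_max - w_min) / (x_max - x_min)
    return [scale * (x - x_min) + w_min for x in values]


# Hypothetical intensities (illustrative only).
normalized = map_min_max([0.0, 5.0, 10.0])
```

After this step every spectrum occupies the same numeric range, so differences in absolute intensity between acquisitions no longer dominate the later distance calculations.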
As shown in Figure S12, after this processing the Raman spectral lines of cheese products become smoother and the relative differences between peaks increase. Principal component analysis was then applied to extract features and reduce the data dimension [34]. The eigenvectors corresponding to the first $p$ eigenvalues, $W_{PCA} = [\vec{T}_1, \vec{T}_2, \ldots, \vec{T}_p]$, are selected, where the contribution rate of the eigenvalue $\lambda_i$ is defined as

$$\frac{\lambda_i}{\sum_{k=1}^{m} \lambda_k} \times 100\%$$

and the cumulative contribution rate of the first $p$ eigenvalues is defined as

$$\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{k=1}^{m} \lambda_k} \times 100\%,$$

with $m$ the total number of eigenvalues. The data are then projected into the space formed by these feature vectors, $Q = W \cdot W_{PCA}$, and $Q$ is the new extracted feature vector. The results show that the first principal component explains 30.20% of the information in the original 1751-dimensional data, and the second principal component explains 11.51% (Figure S13). Only 74 principal components are needed to retain 100% of the original information, which is consistent with the data set containing 75 spectra, since mean-centered data from 75 samples span at most 74 dimensions. Figure 3 shows the three-dimensional scatter diagram of cheese products obtained by principal component analysis. Samples of the same brand tend to cluster, while samples of different brands are partly separated and partly overlapping.
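The contribution rate and cumulative contribution rate defined above can be computed directly from a list of eigenvalues, as in the following Python sketch. It does not perform the eigendecomposition itself; the function names and the example eigenvalues are hypothetical.

```python
from itertools import accumulate


def contribution_rates(eigenvalues):
    """Contribution rate of each eigenvalue, in percent of the total."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]


def components_needed(eigenvalues, target_percent):
    """Smallest p whose cumulative contribution rate reaches target_percent."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for p, running in enumerate(accumulate(ordered), start=1):
        if 100.0 * running / total >= target_percent:
            return p
    # Rounding can leave the final cumulative rate a hair under the target.
    return len(ordered)


# Hypothetical eigenvalues (illustrative only, not the values from this study).
demo_eigenvalues = [6.0, 3.0, 1.0]
rates = contribution_rates(demo_eigenvalues)
p_for_90 = components_needed(demo_eigenvalues, 90.0)
```

Choosing a target below 100% would keep fewer components at the cost of discarding some variance; this work retains all 74 components that carry nonzero variance.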
The probabilistic neural network model was then rebuilt: 80% of the samples were randomly selected as the training set, the remaining 20% were used as the test set, and the 74 principal components obtained after wavelet denoising, normalization, and principal component analysis were used as the input data. The experimental results are shown in Figure 4: the recognition accuracy reaches up to 100%, and the average recognition accuracy is 96%. The discriminant analysis takes less than 1 s. These results show that the Raman spectral collection and processing procedure established in this work can effectively and rapidly discriminate cheese products with high similarity and can provide technical support for their quality control.

Conclusions
In this paper, a novel method based on Raman spectroscopy and a probabilistic neural network has been developed, which helps to address the lack of fast and intelligent discrimination technology for dairy product quality control. This method has several advantages: Raman spectral signals of cheese products can be collected directly, the operation is convenient and fast, the Raman spectra contain abundant information on sample components and molecular vibrations, and the method achieves the brand discrimination that traditional component analysis and statistical control charts cannot. The total time for Raman spectral signal acquisition, discrimination algorithm analysis, and result output is only a few minutes, and the average recognition accuracy is 96%. This method can serve as a reference for, and has potential application in, the discrimination analysis of food systems with high similarity between samples.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares no conflicts of interest with respect to research, authorship, and/or publication of this article.