Evaluation of Quality Parameters of Apple Juices Using Near-Infrared Spectroscopy and Chemometrics

Near-infrared (NIR) spectra were recorded for commercial apple juices. Analysis of these spectra using partial least squares (PLS) regression revealed quantitative relations between the spectra and quality- and taste-related properties of juices: soluble solids content (SSC), titratable acidity (TA), and the ratio of soluble solids content to titratable acidity (SSC/TA). Various spectral preprocessing methods were used for model optimization. The optimal spectral variables were chosen using the jack-knife-based method and different variants of the interval PLS (iPLS) method. The models were cross-validated and evaluated based on the determination coefficients (R²), root-mean-square error of cross-validation (RMSECV), and relative error (RE). The best model for the prediction of SSC (R² = 0.881, RMSECV = 0.277 °Brix, and RE = 2.37%) was obtained for the first-derivative preprocessed spectra and jack-knife variable selection. The optimal model for TA (R² = 0.761, RMSECV = 0.239 g/L, and RE = 4.55%) was obtained for smoothed spectra in the range of 6224–5350 cm⁻¹. The best model for the SSC/TA (R² = 0.843, RMSECV = 0.113, and RE = 5.04%) was obtained for the spectra without preprocessing in the range of 6224–5350 cm⁻¹. The present results show the potential of NIR spectroscopy for screening the important quality parameters of apple juices.

NIR spectroscopy is based on the absorption of electromagnetic radiation in the range of 12,500-4000 cm⁻¹ [2,7]. The NIR spectra consist of broad overlapping bands arising from overtones and combination tones of the fundamental vibrations involving C-H, O-H, and N-H chemical bonds. These bonds are the primary structural components of organic molecules; thus, NIR is very useful for measurements of biological and organic systems, including foods. Due to the wealth of chemical information provided by the NIR spectra, they allow simultaneous determination of several constituents and/or of diverse sample properties [4,7].
One of the main advantages of the NIR technique is its nondestructive character and simple and rapid measurements. Different measurement modes enable direct analysis of both liquid and solid samples without any preparation. Due to its advantages, the NIR technique coupled with chemometrics provides a rapid, effective, and cost-saving alternative to the conventional methods in routine, high-throughput analysis of foods. NIR has been used to assess both the properties and concentrations of the food components, being also a well-established tool for process monitoring.
Using NIR for quality control requires chemometric methods to extract useful information from the complex spectra of the products studied [8]. Practical applications usually require the development of multivariate calibrations, which define the relationships between the measured spectra and the content of the compound or property of interest, obtained by the respective reference methods. Multivariate regression methods are used for developing quantitative models, with partial least squares (PLS) regression being the most widely used. Many factors affect the performance of the calibration models, an important one being the appropriate choice and application of chemometric methods.
The collected spectra are usually preprocessed mathematically to reduce noise and enhance the analytical information. This improves the results of the subsequent data analysis and leads to better calibration models [9]. The regression analysis may be performed using the entire NIR spectra. However, many studies showed improvements when calibrations were developed in a selected spectral region as compared to the full-spectrum model [10]. Several methods have been developed to objectively identify the important variables (spectral regions); these are more efficient than the traditional approach based on the knowledge of the spectroscopic properties of the sample and/or analysis of the regression results performed on the entire spectra [10,11].
An important area of NIR application is the analysis of fruit and vegetables and products of their processing [5,12-14]. Considerable attention has been devoted to studies of the apple properties using NIR [12,15]. Apples are very popular due to their pleasant flavour and beneficial health effects, being a relevant dietary source of phytochemicals, including phenolics [16].
NIR spectroscopy has been successfully used to evaluate a range of intact apple quality attributes such as the soluble solids content, titratable acidity, sugar content, vitamin C, total polyphenols, starch index, chlorophyll content, firmness, and mealiness [17-19]. The feasibility of using variable selection methods for determination of the apple quality parameters such as soluble solids content was also demonstrated [17,20,21].
Despite the amount of research carried out to date on using NIR to evaluate properties of the intact apples, the number of published papers that study the apple juice is rather limited. Spectroscopy in the NIR range was used to predict sugar content in the apple juice [22], detect adulteration [23], and differentiate between the apple juices on the basis of apple variety [24].
The combination of NIR spectroscopy and fluorescence enabled detection of quality deterioration of the apple juice during storage and heating [25]. Application of this method for determination of the quality parameters of apple wine was also reported recently [26].
The important characteristics of apple juices related directly to their quality are soluble solids content (SSC) and titratable acidity (TA). The limits for these parameters in marketed apple juices are defined by the Code of Practice developed by the European Fruit Juice Association, which provides reference for the control of juice quality on the EU market. SSC is one of the major characteristics used to indicate sweetness of fresh and processed fruit products [13]. Titratable acidity is related to the organic acid contents; these compounds contribute to the sour taste and also stabilize colour and extend the shelf life of fresh fruit and their processed products. The overall taste of fruit is more closely related to the ratio of SSC and TA than to the individual parameters; therefore, this ratio is used as an index of sensory acceptability of the fruit taste [27].
The aim of the present study was to test the feasibility of NIR spectroscopy in developing the calibration models for predicting the main quality parameters of the apple juices: SSC, TA, and SSC/TA. We also explored the possibilities to optimize the models using jack-knife variable selection, different variants of the interval PLS variable selection, and preprocessing methods.

Materials and Methods
2.1. Apple Juices. Apple juices that are available on the market were evaluated in this study. The samples included clear and cloudy juices reconstituted from the concentrate, direct juices that were pasteurized, and freshly squeezed juices. A total of thirty juices from 15 different producers were studied; all of these samples were studied in duplicate, using two different production batches.

NIR Measurements.
The spectra were collected using an FT-NIR spectrophotometer (MPA; Bruker Optics, Ettlingen, Germany). The instrument performance was validated before measurements by running automatic tests according to the manufacturer's procedure. Spectral acquisition and instrument control were performed using OPUS software (v.5; Bruker Optics, Ettlingen, Germany). The spectra were acquired in the range of 12,500-4000 cm⁻¹ with the resolution of 8 cm⁻¹ and with 64 scans coadded to obtain the averaged spectrum. The measurements were performed using the transmittance technique in cuvettes with the optical pathlength of 2 mm. The cuvettes were placed into a temperature-controlled cell holder, and measurements were conducted at a constant temperature of 35 °C, controlled by the OPUS software.
The spectra were recorded after centrifugation (15,000 rpm for 5 min), with six replicated spectra collected for each of the juices.

Determination of the Chemical Parameters.
The soluble solids contents (SSC) of the juices were determined using an Abbe refractometer (model DR-A1; Conbest) at 20 °C, calibrated with distilled water. The SSC was expressed as Brix degrees (°Brix), with all of the measurements carried out in triplicate.
Titratable acidity (TA) was measured using a pH meter (S220 SevenCompact™; Mettler Toledo), by titrating 25 ml of the juice sample with 0.1 M NaOH to the pH endpoint of 8.1. The results were expressed as grams of malic acid per litre of the juice (g/L). These measurements were performed in triplicate.
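The conversion from titrant volume to TA is not spelled out in the text. A minimal sketch of the usual calculation is given below, assuming that malic acid is treated as a diprotic acid with a molar mass of 134.09 g/mol; the function name and the titrant volume of 19.6 ml are hypothetical and serve only as an illustration.

```python
# Assumption (not stated in the source): malic acid is diprotic with a molar
# mass of 134.09 g/mol, so one mole of NaOH neutralises 67.045 g of acid.
MALIC_ACID_MOLAR_MASS = 134.09  # g/mol
ACIDIC_PROTONS = 2


def titratable_acidity(naoh_volume_ml, naoh_molarity=0.1, sample_volume_ml=25.0):
    """Return TA in grams of malic acid per litre of juice."""
    equivalent_mass = MALIC_ACID_MOLAR_MASS / ACIDIC_PROTONS  # g per mol NaOH
    moles_naoh = naoh_volume_ml / 1000.0 * naoh_molarity
    grams_acid = moles_naoh * equivalent_mass
    return grams_acid / (sample_volume_ml / 1000.0)


# Hypothetical titrant volume, chosen to land near the mean TA of the juices.
ta = titratable_acidity(19.6)
print(f"TA = {ta:.2f} g/L")
```

The default arguments mirror the 25 ml sample volume and the 0.1 M NaOH titrant reported above.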

Data Analysis
2.4.1. Regression Methods. Partial least squares (PLS) regression was used to establish the calibration models between the NIR spectra (the X matrix) and the quality parameters of the apple juices (the Y matrix).
Journal of Spectroscopy
The PLS method models both the X- and Y-matrices simultaneously, finding the latent variables in X that best predict the latent variables in Y [28]. We used all thirty juice samples for developing and optimizing the calibration models. The average spectra were used in the analysis.
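To make the latent-variable idea concrete, the following is a minimal sketch of PLS regression for a single response (PLS1, NIPALS-style deflation) written with NumPy. It is not the implementation used in this study, which relied on commercial software; the function name `pls1_fit` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (regression coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # mean-centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                         # weight: direction of max covariance
        w = w / np.linalg.norm(w)
        t = E @ w                           # scores (latent variable)
        tt = t @ t
        p = E.T @ t / tt                    # X loadings
        c = f @ t / tt                      # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)     # coefficients in the original X space
    return b, y_mean - x_mean @ b


# Synthetic stand-in for spectra: 30 samples, 3 variables (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
true_b = np.array([1.0, -2.0, 0.5])
y = X @ true_b + 3.0 + 0.01 * rng.normal(size=30)

b, intercept = pls1_fit(X, y, n_components=3)
print(b, intercept)
```

With as many components as there are independent variables, the PLS solution coincides with ordinary least squares; in practice fewer components are retained, which is what makes the method usable on spectra with far more variables than samples.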

Validation of the Regression Models.
Full leave-one-out (LOO) cross-validation was applied to all of the regression models. The regression models were evaluated using the determination coefficient (R²), the root-mean-square error of cross-validation (RMSECV), and the relative error (RE), calculated as the percentage ratio of RMSECV to the average value of the studied parameter in the calibration set. The optimal number of components was chosen as the minimum on the plot of the RMSECV as a function of the number of components.
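The validation figures described above can be sketched as follows. The code assumes that R² is computed as one minus the ratio of the residual to the total sum of squares, which the text does not state explicitly, and it uses ordinary least squares on synthetic data as a stand-in for the PLS model; all function names are hypothetical.

```python
import numpy as np


def loo_cv_predictions(X, y, fit_predict):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_cv[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return y_cv


def cv_metrics(y, y_cv):
    """Return (R2, RMSECV, RE in percent) from cross-validated predictions."""
    residuals = y - y_cv
    rmsecv = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)  # assumed definition
    re = 100.0 * rmsecv / y.mean()
    return r2, rmsecv, re


def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in regression model (ordinary least squares, not PLS)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


# Hypothetical data: 30 samples with a reference value around 12 units.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 12.0 + X @ np.array([0.5, -0.3]) + 0.05 * rng.normal(size=30)

r2, rmsecv, re = cv_metrics(y, loo_cv_predictions(X, y, ols_fit_predict))
print(f"R2 = {r2:.3f}, RMSECV = {rmsecv:.3f}, RE = {re:.2f}%")
```

As a check of the RE definition against the reported numbers, the TA model has RMSECV = 0.239 g/L and the mean TA is 5.25 g/L, so RE = 100 × 0.239 / 5.25 ≈ 4.55%.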

Spectral Preprocessing.
We used different preprocessing methods in order to remove noise, baseline, and scattering effects from the spectra. Savitzky-Golay smoothing with the filter width of 15 data points was used to remove spectral noise, while the baseline was corrected using the baseline offset and the first and second derivatives. The baseline offset involved linear offset subtraction, which shifted the spectra in order to set the minimum value to zero.
The first-order derivative is normally used to eliminate constant baseline shifts, and the second-order derivative also eliminates the baseline slope [9]. The derivatives were calculated using the Savitzky-Golay algorithm, with the filter width of 15 data points. Multiplicative scatter correction (MSC) and standard normal variate (SNV) were applied for the correction of the light-scattering effects [9]. The MSC estimates the correction coefficients for additive and multiplicative scattering effects by regressing the spectrum to be corrected on a reference spectrum [9]. The average spectrum of the calibration set was used as a reference.
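The effect of the derivatives on the baseline can be illustrated with a small NumPy sketch of Savitzky-Golay filtering. The 15-point window follows the text; the polynomial order of 2 is an assumption, since the order is not reported, and the helper names and the synthetic band are hypothetical.

```python
import numpy as np
from math import factorial


def savgol_kernel(window=15, polyorder=2, deriv=0):
    """Savitzky-Golay weights for the centre point of an odd-length window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # columns x^0, x^1, ...
    # Row `deriv` of the pseudo-inverse gives the fitted polynomial coefficient;
    # multiplying by deriv! turns it into the derivative at the window centre.
    return factorial(deriv) * np.linalg.pinv(A)[deriv]


def savgol_apply(spectrum, window=15, polyorder=2, deriv=0):
    """Filter one spectrum; edge points without a full window are dropped."""
    kernel = savgol_kernel(window, polyorder, deriv)
    return np.convolve(spectrum, kernel[::-1], mode="valid")


# Hypothetical absorption band with an added constant offset or sloping baseline.
x = np.arange(100.0)
band = np.exp(-((x - 50.0) / 10.0) ** 2)
with_offset = band + 0.3
with_slope = band + 0.3 + 0.01 * x

d1_band = savgol_apply(band, deriv=1)
d1_offset = savgol_apply(with_offset, deriv=1)
d2_band = savgol_apply(band, deriv=2)
d2_slope = savgol_apply(with_slope, deriv=2)
print(np.allclose(d1_band, d1_offset), np.allclose(d2_band, d2_slope))
```

The first derivative of the offset spectrum matches that of the clean band, and the second derivative is additionally insensitive to the sloping baseline. The derivatives here are expressed per data point rather than per cm⁻¹.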
The SNV corrects each spectrum by first calculating the mean of its absorbance values and subtracting this mean from the spectrum to be corrected. Then, the result is divided by the standard deviation of the spectrum [9]. The spectra were preprocessed using each of the single methods and/or their following combinations: smoothing and baseline, smoothing and SNV, smoothing and MSC, MSC and the first-order derivative, MSC and the second-order derivative, SNV and the first-order derivative, and SNV and the second-order derivative.
The order of application of the different preprocessing methods was as indicated in the preceding description.
The preprocessing was performed on the average NIR spectra. Prior to PLS analysis, all of the spectra were mean-centred.
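The baseline offset and the two scatter corrections described above can be sketched in a few lines of NumPy. Spectra are assumed to be stored row-wise (one spectrum per row); the function names and the synthetic spectra are hypothetical, and the use of the sample standard deviation (ddof = 1) in SNV is an assumption.

```python
import numpy as np


def baseline_offset(spectra):
    """Shift each spectrum so that its minimum value is zero."""
    return spectra - spectra.min(axis=1, keepdims=True)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    if reference is None:
        reference = spectra.mean(axis=0)   # average spectrum of the set
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(reference, s, 1)  # s ~ intercept + slope * ref
        corrected[i] = (s - intercept) / slope
    return corrected


# Hypothetical spectra: one band shape distorted by additive and
# multiplicative effects that differ from sample to sample.
x = np.linspace(0.0, 1.0, 200)
shape = np.exp(-((x - 0.5) / 0.1) ** 2)
spectra = np.array([0.1 + 1.0 * shape, 0.3 + 1.5 * shape, 0.2 + 0.8 * shape])

snv_spectra = snv(spectra)
msc_spectra = msc(spectra)
print(np.allclose(snv_spectra[0], snv_spectra[1]))
```

In this idealised case both corrections collapse the three distorted spectra onto a single curve; with real spectra the chemical differences between samples remain after correction.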

Variable Selection.
The variable selection methods applied in this work include the jack-knife method and different variants of the interval PLS (iPLS) [29].
The jack-knife is a method used for calculating the standard errors of the regression coefficients estimated in the PLS regression model [30]. The regression coefficients are then divided by their estimated standard errors, giving the t-test values to be used for testing the significance of the variables used in the model [11]. These calculations were carried out using The Unscrambler v. 9.8 software (CAMO, Norway).
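A minimal sketch of the jack-knife calculation is shown below. It assumes that the standard errors are derived from the leave-one-out submodels as the summed squared deviations from the full-model coefficients, scaled by (n − 1)/n; this scaling and the use of ordinary least squares in place of PLS are assumptions, and the code does not reproduce The Unscrambler's implementation.

```python
import numpy as np


def jackknife_t_values(X, y, fit):
    """Return (t-values, standard errors) of the regression coefficients."""
    n = len(y)
    b_full = fit(X, y)
    # Coefficients of the n submodels, each fitted with one sample left out.
    b_sub = np.array([
        fit(np.delete(X, i, axis=0), np.delete(y, i)) for i in range(n)
    ])
    variance = ((b_sub - b_full) ** 2).sum(axis=0) * (n - 1) / n  # assumed scaling
    se = np.sqrt(variance)
    return b_full / se, se


def ols_coefficients(X, y):
    """Stand-in regression (mean-centred ordinary least squares, not PLS)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return coef


# Hypothetical data: only the first of three variables carries information.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=30)

t_values, std_errors = jackknife_t_values(X, y, ols_coefficients)
print(np.round(t_values, 1))
```

Variables whose |t| exceeds the critical value of the t-distribution at the chosen significance level would be retained; here the informative first variable stands out clearly from the two noise variables.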
The iPLS method subdivides the data into nonoverlapping sections, obtaining a local PLS model in each section, in order to determine the most useful variable range. The comparison between all of the local models is usually based on the RMSECV values, obtained from the validation [11]. An optimal data range may be found by reducing or increasing the existing trial ranges, or by removing or adding new variables [20]. In the present work, we used different variants of the iPLS method as implemented in the OPUS software for selection of the optimal variable ranges [31].
The iPLS (NIR) variant used an NIR spectrum (with the 12,500-11,263 cm⁻¹ and 5349-4779 cm⁻¹ ranges excluded) that was divided into five frequency ranges, each corresponding to specific absorption bands. The local PLS models were tested in each of the selected ranges on their own and in all of their possible combinations. This procedure coincides with the synergy interval PLS (SiPLS) [10].
The iPLS (A) variant started the calculation with all of the 10 subranges and then successively excluded one of the subranges. This procedure continued until the RMSECV value did not improve any further. This procedure coincides with the backward iPLS (BiPLS) [10].
The iPLS (B) variant started the search for the optimum spectral range with one of the subranges. After finding the best subrange, a second subrange was added. After the best combination of the two subranges was found, a third subrange was added, and so on. The best combination of the subranges was thus searched by adding and leaving out further subranges.
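The three search strategies described above can be sketched independently of the regression model. In the sketch below, `score` stands for any function that returns the RMSECV of a PLS model built on a given list of intervals; the toy score and all function names are hypothetical, and the forward search is simplified to adding intervals only, whereas the text indicates that the OPUS variant may also leave intervals out again.

```python
from itertools import combinations


def all_combinations(intervals):
    """Every non-empty combination of intervals (exhaustive, SiPLS-like search)."""
    items = list(intervals)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def backward_selection(intervals, score):
    """Start from all intervals and drop one at a time while the score improves."""
    selected = list(intervals)
    best = score(selected)
    while len(selected) > 1:
        candidates = [[i for i in selected if i != drop] for drop in selected]
        trial = [score(c) for c in candidates]
        k = min(range(len(candidates)), key=trial.__getitem__)
        if trial[k] >= best:
            break                      # no removal improves the score
        selected, best = candidates[k], trial[k]
    return selected, best


def forward_selection(intervals, score):
    """Start from the best single interval and add one at a time."""
    selected, best = [], float("inf")
    remaining = list(intervals)
    while remaining:
        trial = [score(selected + [i]) for i in remaining]
        k = min(range(len(remaining)), key=trial.__getitem__)
        if trial[k] >= best:
            break                      # no addition improves the score
        best = trial[k]
        selected.append(remaining.pop(k))
    return selected, best


# Toy stand-in for RMSECV: intervals 3 and 6 are informative, the rest add noise.
INFORMATIVE = {3, 6}


def toy_score(selected):
    chosen = set(selected)
    return 100 - 30 * len(chosen & INFORMATIVE) + 5 * len(chosen - INFORMATIVE)


print(backward_selection(range(10), toy_score))
print(forward_selection(range(10), toy_score))
print(len(all_combinations(range(5))))
```

For five intervals the exhaustive search evaluates 2⁵ − 1 = 31 combinations, which is why it is practical only for a small number of intervals; the backward and forward searches trade completeness for far fewer model fits.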
The selection of variables was performed on differently preprocessed spectra. The algorithm implemented in the OPUS software enables automatic searching for the optimal combinations of the preprocessing method with the spectral range based on the minimum value of the RMSECV criterion.
The 5349-4779 cm⁻¹ spectral range was excluded from the calculations due to the high absorbance values, clearly exceeding the useful range of the instrument.
Finally, all of the PLS models with different combinations of the preprocessing methods and the variable ranges were calculated using The Unscrambler v. 9.8 (CAMO, Norway).

NIR Spectra of Apple Juices.
The thirty apple juice samples studied included different juice categories available on the market. They included juices reconstituted from the concentrate, both clear and with added fruit pulp, and direct juices, pasteurized and freshly squeezed.
Figure 1 shows the NIR absorbance spectra collected for the apple juices studied.
Very similar characteristic spectral patterns were observed in all of the measured spectra, which were visually indistinguishable. Generally, the positions of the main absorption bands coincided with those obtained for intact apples [32] and other fruit juices [33].
The absorbance spectra are dominated by the absorption of water, which is the main component of the apple juices. The absorption bands for water were reported at 10,309 cm⁻¹ (the second overtone of the O-H stretching band), 8403 cm⁻¹ (the combination of the first overtone of the O-H stretching and the O-H bending bands), 6896 cm⁻¹ (the first overtone of the O-H stretching band and a combination band), 5154 cm⁻¹ (combination of the O-H stretching band and the O-H bending band), and 4444 cm⁻¹ [12,13,34].
Sugars and organic acids are the main constituents of apple juices, besides water. The most dominant sugar in the apple fruit is fructose, followed by glucose and sucrose. Malic acid is the principal organic acid found in apples. Other components of the apple juice include polyphenolic compounds, vitamins, and some amino acids [35]. All of these components should contribute to the spectra in different NIR ranges; however, their bands are largely suppressed by the dominant water absorption bands. The absorption bands in fruit juices at 6896, 5587, and 4413 cm⁻¹ were attributed to sucrose, fructose, and glucose [2]. In fact, the absorption spectra of glucose, fructose, and sucrose are very similar to each other in aqueous solutions, with characteristic bands at 6301-6317, 4716-4710, and 4403-4397 cm⁻¹ [36].
The first, second, and third overtones of the C-H stretching vibrations (CH group) are observed, respectively, in the ranges of 5550-6250 cm⁻¹, 8100-9100 cm⁻¹, and 11,000 cm⁻¹. The bands arising from the overtones of OH, CH, and CH₂ deformation vibrations are observed below 5400 cm⁻¹ [13]. The combination band of the C-H bond in sugars and organic acids was reported at 4323 cm⁻¹ [32].

Chemical Characteristics of the Calibration Set.
All of the thirty apple juice samples were used as the calibration set for developing and optimizing the calibration models. Chemical characteristics of the calibration set including the mean values, ranges, and standard deviations of the soluble solids contents (SSC), titratable acidity (TA), and the SSC/TA ratio are presented in Table 1.
The solids content in the studied apple juices was in the range of 11.0 to 13.6 °Brix. The titratable acidity was 5.25 g/L on average and ranged from 4.51 g/L to 6.09 g/L. These values are within the limits established for apple juices by the Code of Practice [37].
The ratio of SSC to TA, the key parameter determining the taste of fruit products, fell in a narrow range of 1.84-2.71 in all of the juices studied.

Development and Optimization of the Calibration Models.
Multivariate PLS regression was used to model the relations between the NIR spectra and the properties of the juices (SSC, TA, and SSC/TA). Different methods of preprocessing and variable selection were tested.
The preprocessing methods included smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), and baseline correction techniques; the latter included baseline offset and calculation of the first and second spectral derivatives. Both single methods and some of their combinations were tested.
The optimal variable ranges for the raw and differently preprocessed spectra were determined using the jack-knife method and three variants of the iPLS method. The jack-knife method was applied to the regression coefficients of the PLS regression models obtained for the analysis of the entire NIR spectra.
The iPLS models were developed on ten spectral subranges of equal width, or on five subranges, selected to include specific absorption bands. These five spectral intervals were 11,262-9407 cm⁻¹, 9406-7498 cm⁻¹, 7497-6225 cm⁻¹, 6224-5350 cm⁻¹, and 4778-4000 cm⁻¹. The idea of variable selection is to identify a subset of the data that produces the lowest prediction error for the parameter of interest. Different combinations of the preprocessing and variable selection methods were evaluated in order to find the optimal procedure. We compared the prediction performance of these local models with that of the global full-spectrum model. We evaluated the models on the basis of the cross-validated R², RMSECV, and RE values [12].
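Selecting spectral variables by interval amounts to masking columns of the spectral matrix. The sketch below encodes the five intervals listed above and shows an equal-width split into ten subranges; the 1 cm⁻¹ wavenumber grid is hypothetical (the actual data spacing of the instrument is not given), as are the dictionary keys and function names, and the exact limits of the ten subranges used by the software are not reproduced.

```python
import numpy as np

# The five intervals reported in the text, as (upper, lower) limits in cm^-1.
NIR_INTERVALS = {
    "I": (11262, 9407),
    "II": (9406, 7498),
    "III": (7497, 6225),
    "IV": (6224, 5350),
    "V": (4778, 4000),
}


def interval_mask(wavenumbers, ranges):
    """Boolean mask that is True for variables inside any of the given ranges."""
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for high, low in ranges:
        mask |= (wavenumbers <= high) & (wavenumbers >= low)
    return mask


def equal_width_subranges(wavenumbers, n_subranges=10):
    """Index arrays that split the axis into (nearly) equal-sized subranges."""
    return np.array_split(np.arange(wavenumbers.size), n_subranges)


# Hypothetical wavenumber axis: 12,500 down to 4000 cm^-1 in steps of 1 cm^-1.
wavenumbers = np.arange(12500, 3999, -1)

mask = interval_mask(wavenumbers, [NIR_INTERVALS["II"], NIR_INTERVALS["IV"]])
subranges = equal_width_subranges(wavenumbers)
print(mask.sum(), len(subranges))
```

A spectral matrix `X` with one spectrum per row would then be reduced with `X[:, mask]` before fitting the local PLS model.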
Table 2 presents the optimal calibration models obtained using each of the tested variable selection methods for each of the parameters studied. Finally, for each of the parameters studied, we identified a combination of the preprocessing method and the spectral range, which provided the model with the best prediction performance. The predicted versus measured plots and the regression coefficient plots for these models with the best performance for each of the parameters studied are shown in Figure 2.
(1) SSC Calibration Models. The parameters listed in Table 2 demonstrate good capacity of the NIR spectroscopy to predict the SSC of the apple juices. Indeed, a relatively good model for SSC was obtained for the analysis of the entire NIR spectra without any preprocessing. Preprocessing and variable selection improved the model parameters. Thus, the best model for SSC prediction was obtained for the first spectral derivative and variables selected by the jack-knife method; these variables are shown in Figure 2(a). The respective model was characterized by the R² of 0.881 and the RE value of 2.37%.
The variable selection by iPLS also led to model improvement as compared to the full-spectra models. The optimal intervals selected using iPLS (NIR) were 9406-7498 cm⁻¹ and 6224-5350 cm⁻¹, which combined with the second-derivative preprocessing gave the model with a slightly higher value of RE of 2.65%. The optimal intervals selected using both iPLS (A) and iPLS (B) were the same (10,109-8516 cm⁻¹ and 6137-5334 cm⁻¹), which combined with the SNV and the first-derivative spectral preprocessing gave the models with slightly higher errors than the other variable selection methods tested (RE equal to 2.77%).
The results obtained for SSC modeling are comparable with the literature data; typical values of RMSEP for intact apples were around 0.5 °Brix or even higher (1-1.5 °Brix), when the external validation was performed using fruit test sets collected in different seasons and orchards [12].
(2) TA Calibration Models. The calibration model developed using raw data and the full-spectral range for TA showed poor performance with a low R² value and high RMSECV (Table 2). Spectral preprocessing combined with variable selection markedly increased the prediction ability of these models. However, it should be noted that even the optimized models were characterized by rather low R² values in the range between 0.713 and 0.761. The best model for the TA prediction was obtained for the spectral range selected by the iPLS (NIR) method (6224-5350 cm⁻¹) for smoothed spectra (Figure 2(b)). This model was characterized by the R² of 0.761 and RE value of 4.55%. The application of iPLS (A) and iPLS (B) methods led to the selection of a wider spectral range as compared to the iPLS (NIR). In addition to the range of 6137-5334 cm⁻¹ selected by both iPLS (A) and iPLS (B) methods, the 10,904-10,106 cm⁻¹ and 9314-7722 cm⁻¹ regions were selected by iPLS (A) and the 10,109-8516 cm⁻¹ region was selected by iPLS (B). Models with a similar predictive ability resulted from the combination of iPLS (A) with smoothing and SNV, and of iPLS (B) with smoothing and MSC. These models had a lower prediction ability as compared to the iPLS (NIR) models. A model with an intermediate predictive ability (RE of 4.80%) was obtained for the analysis of the smoothed and SNV-corrected spectra with the variables selected by jack-knife.
The lower predictive ability obtained for the TA models as compared to the SSC models is in accordance with the literature data. This result may be explained by the lower concentration of acids compared to that of sugars [12], and/or lower NIR spectral sensitivity to acids, due to the lower number of functional groups per molecule.
(3) SSC/TA Calibration Models. The regression analysis for SSC/TA performed on raw spectra in the full-spectral range gave a model with the R² equal to 0.707 and RE equal to 6.88% (Table 2). Also, in this case, PLS models were significantly improved by applying an appropriate combination of spectral preprocessing and variable selection methods. The performances of optimized models for the SSC/TA prediction were intermediate as compared to those of the SSC and TA models. The best model was obtained for the analysis performed on spectra without preprocessing, using the variables selected by the iPLS (NIR) method, in the range of 6224-5350 cm⁻¹ (Figure 2(c)). This model was characterized by the RE of 5.04%. A slightly inferior performance was produced by the models that used spectra without any preprocessing and variables selected using the iPLS (A) or iPLS (B) method in the range of 6137-5334 cm⁻¹. The combination of smoothing and variable selection using the jack-knife method provided a model with intermediate performance (R² of 0.835 and RE of 5.18%).
Summing up, preprocessing and variable selection had a marked effect on the model performance. The two variants of the iPLS method, versions (A) and (B), each based on the same ten intervals, selected similar spectral ranges and provided PLS models with a similar performance. In contrast, for the parameters studied, the iPLS (NIR) variant, which uses intervals based on chemical knowledge of the NIR spectrum, produced better-performing models than iPLS (A) or iPLS (B). Application of the jack-knife method enabled selection of variables that gave models with a similar or better performance as compared to the iPLS method.
The iPLS-based models with the best performance for each of the chemical parameters studied used the 6224-5350 cm⁻¹ range (or a similar 6137-5334 cm⁻¹ range), indicating that spectral bands containing chemically significant information on the parameters studied are present in this spectral region.
The models for TA and SSC/TA gave good calibration results using this range alone, while the calibration model for SSC required additional spectral ranges.

Conclusions
In the present study, we developed and optimized the calibration models for the prediction of characteristic parameters in apple juices. We demonstrated that NIR coupled with multivariate calibration is a suitable method for determination of the parameters that are crucial for quality assessment (SSC and TA) and additionally for sweet-sour taste (SSC/TA) evaluation of apple juices. An optimal combination of the mathematical preprocessing of the spectra and selection of the variable range had to be found individually for each of the parameters studied, leading to a significant improvement of the model performance. The use of an objective variable selection method may speed up the process of model optimization, identifying the spectral ranges with significant chemical information. The identification of the important spectral variables may contribute to the development of NIR screening sensors for the quality and sensory-related properties of apple juices. Such applications require further studies on extended sample sets.

Figure 1: The NIR spectra of the apple juices under study.

Figure 2: The results of PLS regression analysis for (a) SSC, (b) TA, and (c) SSC/TA. Left panel: predicted versus measured plots for the cross-validation. Right panel: regression coefficients.

Table 1: The soluble solids content (SSC), the titratable acidity (TA), and the ratio of the soluble solids to the titratable acidity (SSC/TA) of the apple juices in the calibration set (n = 30 samples).

Table 2: Characteristics of the optimal regression models for the prediction of the soluble solids content (SSC), titratable acidity (TA), and ratio of the soluble solids content to the titratable acidity (SSC/TA) of the apple juices under study.