Near-infrared (NIR) spectra were recorded for commercial apple juices. Analysis of these spectra using partial least squares (PLS) regression revealed quantitative relations between the spectra and quality- and taste-related properties of juices: soluble solids content (SSC), titratable acidity (TA), and the ratio of soluble solids content to titratable acidity (SSC/TA). Various spectral preprocessing methods were used for model optimization. The optimal spectral variables were chosen using the jack-knife-based method and different variants of the interval PLS (iPLS) method. The models were cross-validated and evaluated based on the determination coefficients (
Over the past years, the application of the near-infrared (NIR) spectroscopy coupled with chemometrics has gained wide acceptance in different fields, including food and agricultural products [
NIR spectroscopy is based on the absorption of electromagnetic radiation in the range of 12,500–4000 cm−1 [
The main advantages of the NIR technique are its nondestructive character and its simple, rapid measurements. Different measurement modes enable direct analysis of both liquid and solid samples without any preparation. Owing to these advantages, the NIR technique coupled with chemometrics provides a rapid, effective, and cost-saving alternative to the conventional methods in routine, high-throughput analysis of foods. NIR has been used to assess both the properties and the concentrations of food components, and it is also a well-established tool for process monitoring.
Using NIR for quality control requires chemometric methods to extract useful information out of complex spectra of the products studied [
An important area of NIR application is the analysis of fruit and vegetables and products of their processing [
NIR spectroscopy has been successfully used to evaluate a range of intact apple quality attributes such as the soluble solids content, titratable acidity, sugar content, vitamin C, total polyphenols, starch index, chlorophyll content, firmness, and mealiness [
Despite the amount of research carried out to date on using NIR to evaluate properties of the intact apples, the number of published papers that study the apple juice is rather limited. Spectroscopy in the NIR range was used to predict sugar content in the apple juice [
The important characteristics of apple juices related directly to their quality are soluble solids content (SSC) and titratable acidity (TA). The limits for these parameters in marketed apple juices are defined by the Code of Practice developed by the European Fruit Juice Association, which provides reference for the control of juice quality on the EU market. SSC is one of the major characteristics used to indicate sweetness of fresh and processed fruit products [
The aim of the present study was to test the feasibility of NIR spectroscopy for developing calibration models that predict the main quality parameters of apple juices: SSC, TA, and SSC/TA. We also explored how far the models could be optimized using jack-knife variable selection, different variants of the interval PLS variable selection, and preprocessing methods.
Apple juices that are available on the market were evaluated in this study. The samples included clear and cloudy juices reconstituted from the concentrate, direct juices that were pasteurized, and freshly squeezed juices. A total of thirty juices from 15 different producers were studied; all of these samples were studied in duplicate, using two different production batches.
The spectra were collected using an FT-NIR spectrophotometer (MPA; Bruker Optics, Ettlingen, Germany). The instrument performance was validated before measurements by running automatic tests according to the manufacturer’s procedure. Spectral acquisition and instrument control were performed using OPUS software (v. 5; Bruker Optics, Ettlingen, Germany). The spectra were acquired in the range of 12,500–4000 cm−1 with the resolution of 8 cm−1 and with 64 scans coadded to obtain the averaged spectrum. The measurements were performed using transmittance techniques in cuvettes with the optical pathlength of 2 mm. The cuvettes were placed into a temperature-controlled cell holder, and measurements were conducted at a constant temperature of 35°C, controlled by the OPUS software. The spectra were recorded after centrifugation (15,000 rpm for 5 min), with six replicated spectra collected for each of the juices.
The soluble solids contents (SSC) of the juices were determined using an Abbe refractometer (model DR-A1’s Conbest) at 20°C, calibrated with distilled water. The SSC was expressed as Brix degrees (°Brix), with all of the measurements carried out in triplicate.
Titratable acidity (TA) was measured using a pH meter (S220 SevenCompact™; Mettler Toledo), by titrating 25 ml of the juice sample with 0.1 M NaOH to the pH endpoint of 8.1. The results were expressed as grams of malic acid per litre of the juice (g/L). These measurements were performed in triplicate.
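The conversion from titrant volume to the reported unit can be illustrated with a short sketch. It assumes that malic acid is titrated as a diprotic acid (molar mass 134.09 g/mol, so one mole of NaOH corresponds to 67.045 g of malic acid); the function name and the example titrant volume of 19.6 ml are hypothetical and are not taken from the study.

```python
MALIC_ACID_MOLAR_MASS = 134.09  # g/mol
ACIDIC_PROTONS = 2              # assumption: both carboxyl groups are neutralized at pH 8.1


def titratable_acidity_g_per_l(naoh_volume_ml, naoh_molarity=0.1, sample_volume_ml=25.0):
    """Return TA as grams of malic acid per litre of juice."""
    moles_naoh = naoh_molarity * naoh_volume_ml / 1000.0
    grams_malic = moles_naoh * MALIC_ACID_MOLAR_MASS / ACIDIC_PROTONS
    return grams_malic / (sample_volume_ml / 1000.0)


# Hypothetical titrant volume, chosen only to land near the reported TA range.
ta_example = titratable_acidity_g_per_l(19.6)
print(round(ta_example, 2))
```

With the 25 ml sample and 0.1 M NaOH stated above, each millilitre of titrant corresponds to roughly 0.27 g/L of malic acid under these assumptions.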
Partial least squares (PLS) regression was used to establish the calibration models between the NIR spectra (the
Full leave-one-out (LOO) cross-validation was applied to all of the regression models. The regression models were evaluated using the determination coefficient (
We used different preprocessing methods in order to remove noise, baseline, and scattering effects from the spectra. Savitzky–Golay smoothing with the filter width of 15 data points was used to remove spectral noise, while the baseline was corrected using the baseline offset and the first and second derivatives. The baseline offset involved linear offset subtraction, which shifted the spectra in order to set the minimum value to zero. The first-order derivative is normally used to eliminate constant baseline shifts, and the second-order derivative also eliminates the baseline slope [
The preprocessing was performed on the average NIR spectra. Prior to PLS analysis, all of the spectra were mean-centred.
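The preprocessing steps named in this work can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used by OPUS or The Unscrambler: the polynomial order of the Savitzky–Golay filter (2) is an assumption, since only the 15-point width is stated, the derivatives are expressed per data point rather than per cm−1, and all function names are hypothetical.

```python
from math import factorial

import numpy as np


def savgol(X, window=15, polyorder=2, deriv=0):
    """Savitzky-Golay smoothing or derivative of each row (edge points are unreliable)."""
    half = window // 2
    k = np.arange(-half, half + 1, dtype=float)
    A = np.vander(k, polyorder + 1, increasing=True)   # columns: 1, k, k^2, ...
    h = np.linalg.pinv(A)[deriv] * factorial(deriv)    # least-squares filter weights
    return np.array([np.correlate(row, h, mode="same") for row in X])


def baseline_offset(X):
    """Shift each spectrum so that its minimum value is zero."""
    return X - X.min(axis=1, keepdims=True)


def snv(X):
    """Standard normal variate: centre and scale each spectrum individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)     # row ~ intercept + slope * ref
        out[i] = (row - intercept) / slope
    return out


def mean_centre(X):
    """Subtract the mean spectrum (column means) before PLS."""
    return X - X.mean(axis=0)


# Synthetic demo: one Gaussian band with different offsets and scalings.
axis = np.linspace(0.0, 1.0, 200)
band = np.exp(-((axis - 0.5) / 0.05) ** 2)
X_demo = np.array([a + b * band for a, b in [(0.1, 1.0), (0.3, 1.2), (0.2, 0.8)]])
X_msc = msc(X_demo)
X_snv = snv(X_demo)
X_d1 = savgol(X_demo, deriv=1)
```

In the synthetic demo the three spectra differ only by an additive offset and a multiplicative factor, so MSC maps all of them onto the mean spectrum; real juice spectra would retain the chemical differences after correction.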
The variable selection methods applied in this work include the jack-knife method and different variants of the interval PLS (iPLS) [
The jack-knife is a method used for calculating the standard errors of the regression coefficient estimated in the PLS regression model [
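A minimal sketch of jack-knife-based variable selection is given here for illustration. It uses the classical delete-one jack-knife standard error of the PLS regression coefficients and keeps variables whose coefficient is large relative to that error. The exact scaling used by the commercial software may differ, and the threshold of 2, the number of latent variables, the synthetic data, and all function names are assumptions.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """NIPALS PLS1 on mean-centred data; returns the regression vector."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)          # weight vector
        t = Xc @ w                         # scores
        p = Xc.T @ t / (t @ t)             # X loadings
        qa = (yc @ t) / (t @ t)            # y loading
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - qa * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


def jackknife_t_values(X, y, n_components):
    """Return (t-like ratios, standard errors) from delete-one submodels."""
    n = X.shape[0]
    b_full = pls1_coefficients(X, y, n_components)
    b_loo = np.array([
        pls1_coefficients(np.delete(X, i, axis=0), np.delete(y, i), n_components)
        for i in range(n)
    ])
    # Classical delete-one jack-knife variance estimate.
    se = np.sqrt((n - 1) / n * ((b_loo - b_loo.mean(axis=0)) ** 2).sum(axis=0))
    return b_full / se, se


# Synthetic demo: only variable 0 carries information about y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 20))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=30)
t_values, std_errors = jackknife_t_values(X_demo, y_demo, n_components=2)
selected = np.flatnonzero(np.abs(t_values) > 2.0)   # hypothetical threshold
```

Variables whose coefficients change sign or magnitude strongly between submodels receive a large standard error and are dropped; the model is then recalibrated on the retained variables only.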
The iPLS method subdivides the data into nonoverlapping sections, obtaining a local PLS model in each section, in order to determine the most useful variable range. The comparison between all of the local models is usually based on the RMSECV values, obtained from the validation [
The iPLS (NIR) variant used an NIR spectrum (with the 12,500–11,263 cm−1 and 5349–4779 cm−1 ranges excluded) that was divided into five frequency ranges, each corresponding to specific absorption bands. The local PLS models were tested in each of the selected ranges on their own and in all of their possible combinations. This procedure coincides with the synergy interval PLS (SiPLS) [
The iPLS (A) and iPLS (B) variants used the entire NIR spectrum (in the 12,500–4000 cm−1 range, with the 5349–4779 cm−1 range excluded) divided into ten subranges. The iPLS (A) variant started the calculation with all ten subranges and then excluded them successively, one at a time, until the RMSECV value did not improve any further. This procedure coincides with the backward iPLS (BiPLS) [
The iPLS (B) variant started the search for the optimum spectral range with a single subrange. After the best subrange was found, a second subrange was added; after the best combination of two subranges was found, a third was added, and so on. The best combination of the subranges was thus searched by adding and leaving out further subranges. This procedure coincides with the forward iPLS (FiPLS) [
The selection of variables was performed on differently preprocessed spectra. The algorithm implemented in the OPUS software enables automatic searching for the optimal combinations of the preprocessing method with the spectral range based on the minimum value of the RMSECV criterion. The 5349–4779 cm−1 spectral range was excluded from the calculations due to the high absorbance values, clearly exceeding the useful range of the instrument.
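The three interval search strategies can be sketched as follows, using the leave-one-out RMSECV as the selection criterion. This is an illustration under simplifying assumptions and not the OPUS algorithm: the number of latent variables is fixed rather than optimized, the forward search only adds intervals, the PLS1 routine is a plain NIPALS implementation, and the data and all names are synthetic or hypothetical.

```python
from itertools import combinations

import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS PLS1; returns column means, y mean, and the regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w = w / np.linalg.norm(w)
        t = Xc @ w
        p = Xc.T @ t / (t @ t)
        qa = (yc @ t) / (t @ t)
        Xc = Xc - np.outer(t, p)           # deflate X
        yc = yc - qa * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)


def rmsecv_loo(X, y, n_components):
    """Root mean square error of full leave-one-out cross-validation."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, b = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - ((X[i] - x_mean) @ b + y_mean)) ** 2
    return np.sqrt(press / n)


def score(X, y, intervals, chosen, n_components):
    cols = np.concatenate([intervals[j] for j in sorted(chosen)])
    return rmsecv_loo(X[:, cols], y, n_components)


def forward_ipls(X, y, intervals, n_components):
    """Add the interval that lowers RMSECV most; stop when nothing improves."""
    chosen, best = [], np.inf
    while len(chosen) < len(intervals):
        trials = {j: score(X, y, intervals, chosen + [j], n_components)
                  for j in range(len(intervals)) if j not in chosen}
        j_best = min(trials, key=trials.get)
        if trials[j_best] >= best:
            break
        chosen.append(j_best)
        best = trials[j_best]
    return sorted(chosen), best


def backward_ipls(X, y, intervals, n_components):
    """Start from all intervals; drop one at a time while RMSECV improves."""
    chosen = list(range(len(intervals)))
    best = score(X, y, intervals, chosen, n_components)
    while len(chosen) > 1:
        trials = {j: score(X, y, intervals, [k for k in chosen if k != j], n_components)
                  for j in chosen}
        j_drop = min(trials, key=trials.get)
        if trials[j_drop] >= best:
            break
        chosen.remove(j_drop)
        best = trials[j_drop]
    return chosen, best


def exhaustive_ipls(X, y, intervals, n_components):
    """Test every non-empty combination of intervals (feasible for few intervals)."""
    results = {combo: score(X, y, intervals, combo, n_components)
               for r in range(1, len(intervals) + 1)
               for combo in combinations(range(len(intervals)), r)}
    combo_best = min(results, key=results.get)
    return list(combo_best), results[combo_best]


# Synthetic demo: 5 intervals of 6 variables; only interval 2 is informative.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 30))
intervals_demo = np.array_split(np.arange(30), 5)
y_demo = X_demo[:, intervals_demo[2]].sum(axis=1) + 0.05 * rng.normal(size=40)
fwd_sel, fwd_rmse = forward_ipls(X_demo, y_demo, intervals_demo, 3)
bwd_sel, bwd_rmse = backward_ipls(X_demo, y_demo, intervals_demo, 3)
exh_sel, exh_rmse = exhaustive_ipls(X_demo, y_demo, intervals_demo, 3)
```

With five intervals the exhaustive search evaluates 31 combinations, which is why it is practical for the iPLS (NIR) variant, whereas ten intervals would require 1023 combinations and make the greedy forward and backward searches the more economical choice.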
Finally, all of the PLS models with different combinations of the preprocessing methods and the variable ranges were calculated using The Unscrambler v. 9.8 (CAMO, Norway).
The thirty apple juice samples studied included different juice categories available on the market. They included juices reconstituted from the concentrate, both clear and with added fruit pulp, and direct juices, pasteurized and freshly squeezed.
Figure
The NIR spectra of the apple juices under study.
Very similar characteristic spectral patterns were observed in all of the measured spectra, which were visually indistinguishable. Generally, the positions of the main absorption bands coincided with those obtained for intact apples [
The absorbance spectra are dominated by the absorption of water, which is the main component of the apple juices. The absorption bands for water were reported at 10,309 cm−1 (the second overtone of the O-H stretching band), 8403 cm−1 (the combination of the first overtone of the O-H stretching and the O-H bending bands), 6896 cm−1 (the first overtone of the O-H stretching band and a combination band), 5154 cm−1 (combination of the O-H stretching band and the O-H bending band), and 4444 cm−1 [
Sugars and organic acids are the main constituents of apple juices, besides water. The most dominant sugar in the apple fruit is fructose, followed by glucose and sucrose. Malic acid is the principal organic acid found in apples. Other components of the apple juice include polyphenolic compounds, vitamins, and some amino acids [
The first, second, and third overtones of the C-H stretching vibrations (CH group) are observed, respectively, in the ranges of 5550–6250 cm−1 and 8100–9100 cm−1 and near 11,000 cm−1. The bands arising from the overtones of OH, CH, and CH2 deformation vibrations are observed below 5400 cm−1 [
The absorption bands characteristic for the carboxylic acids appear at 6222 cm−1 (C-O from COOH), 8873 cm−1 (O-H from carboxylic acids), and 6959 cm−1 (C=O from saturated and unsaturated carboxylic acids) [
All of the thirty apple juice samples were used as the calibration set for developing and optimizing the calibration models. Chemical characteristics of the calibration set including the mean values, ranges, and standard deviations of the soluble solids contents (SSC), titratable acidity (TA), and the SSC/TA ratio are presented in Table
The soluble solids content (SSC), the titratable acidity (TA), and the ratio of the soluble solids to the titratable acidity (SSC/TA) of the apple juices in the calibration set (
Parameter | Range | Mean | SD |
---|---|---|---|
SSC (°Brix) | 11.0–13.6 | 11.68 | 0.79 |
TA (g/L) | 4.51–6.09 | 5.25 | 0.48 |
SSC/TA | 1.84–2.71 | 2.24 | 0.28 |
The soluble solids content of the studied apple juices ranged from 11.0 to 13.6 °Brix. The titratable acidity was 5.25 g/L on average and ranged from 4.51 g/L to 6.09 g/L. These values are within the limits established for apple juices by the Code of Practice [
The SSC/TA ratio, a key parameter determining the taste of fruit products, fell in a narrow range of 1.84–2.71 across all of the juices studied.
Multivariate PLS regression was used to model the relations between the NIR spectra and the properties of the juices (SSC, TA, and SSC/TA). Different methods of preprocessing and variable selection were tested. The preprocessing methods included smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), and baseline correction techniques. The latter comprised the baseline offset and the first and second spectral derivatives. Both single methods and some of their combinations were tested.
The optimal variable ranges for the raw and differently preprocessed spectra were determined using the jack-knife method and three variants of the iPLS method. The jack-knife method was applied to the regression coefficient of the PLS regression models obtained for the analysis of the entire NIR spectra. The iPLS models were developed on ten spectral subranges of equal width, or on five subranges, selected to include specific absorption bands. These five spectral intervals were 11,262–9407 cm−1, 9406–7498 cm−1, 7497–6225 cm−1, 6224–5350 cm−1, and 4778–4000 cm−1. The idea of variable selection is to identify a subset of the data that produces the lowest prediction error for the parameter of interest. Different combinations of the preprocessing and variable selection methods were evaluated in order to find the optimal procedure. We compared the prediction performance of these local models with that of the global full-spectrum model. We evaluated the models on the basis of cross-validation,
Table
Characteristics of the optimal regression models for the prediction of the soluble solids content (SSC), titratable acidity (TA), and ratio of the soluble solids content to the titratable acidity (SSC/TA) of the apple juices under study.
Parameter | Variable selection | Spectral range (cm−1) | Preprocessing | LV | R² (calibration) | RMSEC | RE, calibration (%) | R² (cross-validation) | RMSECV | RE, cross-validation (%) |
---|---|---|---|---|---|---|---|---|---|---|
SSC | None | Full range | None | 6 | 0.891 | 0.257 | 2.20 | 0.800 | 0.360 | 3.08 |
SSC | | | | | | | | | | |
SSC | NIR | 9406–7498; 6224–5350 | 2nd derivative | 10 | 0.987 | 0.088 | 0.75 | 0.853 | 0.309 | 2.65 |
SSC | A, B | 10,109–8516; 6137–5334 | SNV + 1st derivative | 4 | 0.904 | 0.242 | 2.07 | 0.838 | 0.324 | 2.77 |
TA | None | Full range | None | 7 | 0.804 | 0.209 | 3.98 | 0.512 | 0.341 | 6.50 |
TA | Jack-knife | | Smooth + SNV | 7 | 0.862 | 0.175 | 3.33 | 0.733 | 0.252 | 4.80 |
TA | *NIR* | *6224–5350* | *Smooth* | | | | | | | |
TA | A | 10,904–10,106; 9314–7722; 6137–5334 | Smooth + SNV | 6 | 0.829 | 0.195 | 3.71 | 0.717 | 0.260 | 4.95 |
TA | B | 10,109–8516; 6137–5334 | Smooth + MSC | 10 | 0.893 | 0.155 | 2.95 | 0.713 | 0.262 | 4.99 |
SSC/TA | None | Full range | None | 7 | 0.902 | 0.086 | 3.84 | 0.707 | 0.154 | 6.88 |
SSC/TA | Jack-knife | | Smooth | 8 | 0.931 | 0.073 | 3.26 | 0.835 | 0.116 | 5.18 |
SSC/TA | *NIR* | *6224–5350* | *None* | | *0.940* | | | | *0.113* | |
SSC/TA | A, B | 6137–5334 | None | 10 | 0.941 | 0.067 | 2.99 | 0.828 | 0.118 | 5.27 |
The best model for each of the parameters is italicized; none: spectra without preprocessing; 1st derivative: first-order derivative; 2nd derivative: second-order derivative; smooth: smoothing; SNV: standard normal variate; MSC: multiplicative scatter correction; NIR, A, and B are different versions of the iPLS method.
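The figures of merit reported in the table can be computed as in the following sketch. The definition of the relative error as the RMSE divided by the mean reference value is inferred from the tabulated numbers (for example, 0.257/11.68 gives 2.20% for the full-range SSC calibration) and is not quoted from the text; the determination coefficient is taken here as one minus the ratio of the residual to the total sum of squares, which is likewise an assumption. The function names are hypothetical.

```python
import numpy as np


def relative_error(rmse, mean_reference):
    """RE (%) assumed to be the RMSE expressed as a percentage of the mean reference value."""
    return 100.0 * rmse / mean_reference


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RE %) for reference values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, rmse, relative_error(rmse, y_true.mean())


# Check against tabulated values for the full-range SSC model (mean SSC = 11.68 °Brix).
re_cal = relative_error(0.257, 11.68)
re_cv = relative_error(0.360, 11.68)
print(round(re_cal, 2), round(re_cv, 2))
```

Passing the cross-validated predictions instead of the fitted values to `regression_metrics` gives the cross-validation counterparts of the same three quantities.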
Finally, for each of the parameters studied, we identified a combination of the preprocessing method and the spectral range, which provided the model with the best prediction performance. The predicted versus measured plots and the regression coefficient plots for these models with the best performance for each of the parameters studied are shown in Figure
The results of PLS regression analysis for (a) SSC, (b) TA, and (c) SSC/TA. Left panel: predicted versus measured plots for the cross-validation. Right panel: regression coefficients.
The variable selection by iPLS also led to model improvement as compared to the full-spectra models. The optimal intervals selected using iPLS (NIR) were 9406–7498 cm−1 and 6224–5350 cm−1, which combined with the second-derivative preprocessing gave the model with a slightly higher value of RE of 2.65%. The optimal intervals selected using both iPLS (A) and iPLS (B) were the same (10,109–8516 cm−1 and 6137–5334 cm−1), which combined with the SNV and the first-derivative spectral preprocessing gave the models with slightly higher errors than the other variable selection methods tested (RE equal to 2.77%).
The results obtained for SSC modeling are comparable with the literature data; typical values of RMSEP for intact apples were around 0.5 °Brix or even higher (1–1.5 °Brix), when the external validation was performed using fruit test sets collected in different seasons and orchards [
The best model for the TA prediction was obtained for the spectral range selected by the iPLS (NIR) method (6224–5350 cm−1) for smoothed spectra (Figure
The lower predictive ability obtained for the TA models as compared to the SSC models is in accordance with the literature data. This result may be explained by the lower concentration of acids compared to that of sugars [
The best model for the SSC/TA prediction was obtained for the analysis performed on spectra without preprocessing, using the variables selected by the iPLS (NIR) method, in the range of 6224–5350 cm−1 (Figure
In summary, preprocessing and variable selection had a marked effect on the model performance. The two variants of the iPLS method, versions (A) and (B), each based on the same ten intervals, selected similar spectral ranges and provided PLS models with a similar performance. In contrast, for the parameters studied, the iPLS (NIR) variant, whose intervals were based on chemical knowledge of the NIR spectrum, produced better-performing models than iPLS (A) or iPLS (B). Application of the jack-knife method enabled selection of variables that gave models with a similar or better performance as compared to the iPLS method.
The iPLS-based models with the best performance for each of the chemical parameters studied used the 6224–5350 cm−1 range (or a similar 6137–5334 cm−1 range), indicating that spectral bands containing chemically significant information on the parameters studied are present in this spectral region. The models for TA and SSC/TA that used this range alone gave good calibration results, while the calibration model for SSC required additional spectral ranges.
In the present study, we developed and optimized calibration models for the prediction of characteristic parameters of apple juices. We demonstrated that NIR coupled with multivariate calibration is a suitable method for determining the parameters that are crucial for quality assessment (SSC and TA) and for evaluation of the sweet-sour taste (SSC/TA) of apple juices. An optimal combination of the mathematical preprocessing of the spectra and the selection of the variable range had to be found individually for each of the parameters studied, leading to a significant improvement of the model performance. The use of an objective variable selection method may speed up the process of model optimization by identifying the spectral ranges with significant chemical information. The identification of the important spectral variables may contribute to the development of NIR screening sensors for the quality and sensory-related properties of apple juices. Such applications require further studies on extended sample sets.
The data are available upon request from
The authors declare that they have no conflicts of interest.
Grant 2016/23/B/NZ9/03591 from the National Science Centre, Poland, is gratefully acknowledged.