Study on the Effect of Apple Size Difference on Soluble Solids Content Model Based on Near-Infrared (NIR) Spectroscopy

Soluble solids content (SSC) is a vital index of the internal quality of apples, and near-infrared (NIR) spectroscopy is the preferred technique for predicting it. Differences in fruit size reduce the robustness and accuracy of SSC prediction models, so eliminating the effect of fruit size is important for improving fruit sorting models. NIR spectra of apples of different sizes were collected with an online NIR detection device. After various spectral preprocessing methods were applied, partial least squares (PLS) models of apple SSC were established for each size group, and the modeling set of the 75 mm–85 mm group was then used to predict the prediction sets of the 65 mm–75 mm and 85 mm–95 mm groups. To better address the effect of size differences, data fusion was used to perform an intermediate-level fusion of fruit diameter and spectra. First, competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA) were each used to select spectral variables and build SSC prediction models. The model built with the 61 spectral variables selected by CARS performed better: it greatly reduced the amount of data involved in modeling, simplified the model, and improved its stability. The apple size variable was then added to the wavelength variables selected by CARS, the data were normalized, and a PLS model of apple SSC was established from the normalized spectral and fruit diameter data. This size compensation model based on intermediate fusion had the best prediction performance: the prediction set Rp was 0.886 and the RMSEP 0.536% for fruit diameters of 65 mm–75 mm, and the Rp was 0.913 and the RMSEP 0.497% for fruit diameters of 85 mm–95 mm. Therefore, adding the fruit diameter variable to establish a size-compensated model of apple SSC can improve the prediction performance of the model.


Introduction
Apples are rich in vitamins and acids, and eating more apples can relieve fatigue and improve brain vitality. With rising living standards, the demand for high-quality apples is also increasing. As a fast, nondestructive, and green inspection technology, online NIR spectroscopy has been applied to detect the internal quality of fruits such as apples, strawberries, citrus, pears, and watermelons [1–4]. The difference in fruit diameter of apples affects the performance of the established SSC model. Therefore, developing a size-compensated model of apple SSC is necessary.
Scholars at home and abroad have done extensive research on the internal quality of fruits using NIR spectroscopy. Guo et al. [5] built an online detection system for apple heart rot using NIR transmission, and the correlation coefficient of their prediction model was 0.92. Liu et al. [6] created an online NIR diffuse reflection detection model for the SSC of navel oranges, with a prediction correlation coefficient of 0.90. Li et al. [7] built online nondestructive testing equipment for apples using NIR spectroscopy and established a prediction model for apple SSC, whose correlation coefficient reached 0.949 with a prediction-set root mean square error of 0.449. Han et al. [8,9] used NIR transmission spectroscopy combined with a band screening method to discriminate two diseases of apple, and the accuracy of their discriminant model reached 95.7%. Xu et al. [10] compared the effect of single- and double-point detection on the accuracy of online detection of apple SSC. The double-branched fiber system showed excellent robustness, while the single-branched fiber system achieved a prediction-set coefficient of 0.63. These studies did not consider the effect of fruit diameter on the model, and the performance of the established models was limited [7,8,10]. Liu et al. [11] established NIR detection models for navel oranges of different sizes and found that multiplicative scatter correction (MSC) and standard normal variate (SNV) pretreatment can reduce the influence of fruit size differences and improve the accuracy of the prediction model. Tian et al. [12] established a discriminant model for moldy core in apples of different sizes from NIR spectra. They found that the NIR spectral intensity and the optical path length were exponentially related, and they corrected the NIR spectra and the model accordingly.
The prediction-set discrimination accuracy reached 90.2%, and the method can correct the effect of fruit size on the transmission spectrum to improve the identification of diseased apples. McGlone et al. [13] developed two prototype online NIR systems, one based on a time-delay integration spectrometer and the other on a large-aperture spectrometer. The latter system had high accuracy, with a root mean square error of prediction of 4.1% after PLS calibration. However, only apples with a mean equatorial diameter of 76 mm (SD = 2.8 mm) were selected for the experiment, and the effect of fruit size on the detection of browning tissue was not investigated. Qin and Lu [14,15] quantified light transmission in apples using Monte Carlo simulations. They corrected the diffuse reflectance spectrum according to fruit size to eliminate the light intensity distortion caused by the curved fruit surface.
In this paper, we collected NIR spectra of apples of different sizes and established three kinds of model: models with various preprocessing methods, mixed-size models, and data fusion-based size compensation models. Their prediction performance was compared to find the best way to handle the effect of apple size differences on model performance.

Experimental Materials.
A total of 480 apple samples were ordered from an orchard in Yantai, Shandong Province: 160 with fruit diameters of 65 mm–75 mm, 160 of 75 mm–85 mm, and 160 of 85 mm–95 mm. Upon arrival, dust was wiped from the surface of each apple with a wet paper towel to prevent it from scattering the transmitted light, and the apples were left in a room at an ambient temperature of 25°C for 24 hours before the spectra were collected.

Experimental Device and Spectrum Acquisition.
The NIR spectrum acquisition device used in this paper is a dynamic online diffuse transmission detection device developed by our group [16], as shown in Figure 1. The light source consists of two rows of five halogen lamps (10 in total), each rated at 12 V and 100 W, which illuminate the samples for spectrum acquisition in diffuse transmission mode. The apples are placed on fruit cups and carried into the dark box by a chain. The halogen lamps illuminate the passing apples, and the light transmitted through the interior of each apple is received by an optical fiber and passed through the spectrometer to the computer. The spectrometer has a wavelength range of 350–1150 nm, and the exposure time is adjusted in the supporting spectrum acquisition software. The device was preheated for 30 min before spectrum acquisition, the detection speed was set to 0.5 m/s, and the exposure time was 100 ms. Each sample was measured four times at the equatorial region, and the average spectrum was taken as the experimental spectrum of that sample.

Parameter Measurement.
The SSC of the apple samples was measured using a digital refractometer (PR-101a, Japan). A fruit knife was used to cut a piece of flesh from each of the four spectrum collection sites; the juice was squeezed from the flesh onto the measurement position of the refractometer, and the reading for that side of the apple was recorded. The average of the four sides was taken as the SSC value of the apple sample.
The fruit diameter at the equatorial position of the apple was measured using a digital vernier caliper (Mitutoyo-500, Japan). Each apple was measured four times at the equatorial position, and the average was taken as the fruit diameter of that fruit.

Data Processing.
The Kennard-Stone (K-S) algorithm was first applied to divide the collected apple spectra into a modeling set and a prediction set. The spectral data were then imported into the Unscrambler software to establish the SSC models of apples. Model performance was judged by the prediction-set correlation coefficient (Rp) and the prediction-set root mean square error (RMSEP), given in equations (1) and (2), respectively:

$$R_p = \sqrt{1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1)$$

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \qquad (2)$$

where $n$ is the number of samples in the prediction set, $\hat{y}_i$ is the predicted value of the $i$-th sample in the prediction set, $y_i$ is the true value of the $i$-th sample in the prediction set, and $\bar{y}$ is the average of the true values of all samples in the prediction set.
PLS is the most commonly used multivariate linear calibration technique and is widely used in NIR spectroscopy to predict the internal quality of fruits quantitatively. The principle of PLS prediction is

$$Y = \sum_{i=1}^{n} \beta_i \lambda_i + B$$

where $Y$ is the model prediction, $i$ denotes the $i$-th wavelength point, $\beta_i$ is the regression coefficient corresponding to the $i$-th wavelength point, $\lambda_i$ is the spectral energy value at the $i$-th wavelength point, $n$ is the number of wavelength points, and $B$ is the intercept.
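As a minimal sketch of how the two evaluation indices and the linear prediction step could be computed, the Python below assumes the common form Rp = sqrt(1 - SSE/SST), which uses only the quantities defined above. The function names are illustrative and are not taken from the Unscrambler software.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error over the prediction set."""
    n = len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / n)


def r_p(y_true, y_pred):
    """Prediction-set correlation coefficient, assumed here to be sqrt(1 - SSE/SST)."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    # Clamp at zero so a model worse than the mean does not raise a math domain error.
    return math.sqrt(max(0.0, 1.0 - sse / sst))


def predict_ssc(spectrum, beta, intercept):
    """Linear prediction: sum of beta_i * lambda_i over wavelength points, plus intercept."""
    return sum(b * x for b, x in zip(beta, spectrum)) + intercept
```

In practice the coefficients and intercept would come from a fitted PLS model; here they are simply passed in as arguments.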

Analysis of Apple SSC Measurement Results.
The 480 experimental apples were divided into modeling and prediction sets using the K-S algorithm. Each fruit diameter group contained 160 samples, of which 120 were assigned to the modeling set and 40 to the prediction set. The SSC measurements of the apples are shown in Table 1. The SSC range of the modeling set (9.05–16.4 °Brix) was wider than that of the prediction set (9.65–14.85 °Brix) and covered it, which favors better prediction by the apple SSC model.
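The K-S partition can be sketched as follows. This is an illustrative pure-Python version that assumes Euclidean distance on whatever feature vectors are supplied (raw spectra or scores); the function name `kennard_stone` is hypothetical, and the software actually used for the split is not stated here.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For one fruit diameter group of 160 spectra, `kennard_stone(spectra, 120)` would give the modeling-set indices, and the 40 indices not returned would form the prediction set.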

Analysis of NIR Spectrum Characteristics of Apples with Different Fruit Diameters.
The average spectra of the apples in the three fruit diameter groups were calculated and compared, as shown in Figure 2 [17,18]. The peaks at 805 nm are mainly associated with the second overtone absorption of C-H and N-H bonds [19,20]. The spectral intensity of apples with a fruit diameter of 65 mm–75 mm is higher than that of apples with fruit diameters of 75 mm–85 mm and 85 mm–95 mm.
This is because the energy carried by NIR light inside the apple decays as the optical path length increases. At a given wavelength, the attenuation of light entering the apple interior is approximated by an exponential decay function [21], which can be fitted as

$$I = I_0 e^{-\mu_e d} \qquad (3)$$

where $I_0$ is the light intensity entering the interior of the apple, $I$ is the light intensity received by the fiber optic probe below the apple, $\mu_e$ is the extinction coefficient, and $d$ is the distance from the point where the light enters the apple to the point where it exits.
In the online NIR detection device shown in Figure 1, $d$ in equation (3) is the fruit diameter of the apple. As the fruit diameter increases, more of the light energy is absorbed inside the apple, lowering the intensity of the collected NIR spectrum. It is therefore expected that different fruit diameters cause differences in the NIR spectrum, which will affect the performance of the apple SSC prediction model built from it.
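One way to check the exponential relationship between received intensity and fruit diameter is a log-linear least-squares fit, since taking logarithms gives ln(I) = ln(I0) - u_e * d. The sketch below is illustrative only: the function name `fit_extinction` and the numbers in the usage lines are made up, not measured values from this study.

```python
import math


def fit_extinction(diameters, intensities):
    """Least-squares fit of ln(I) = ln(I0) - mu_e * d; returns (I0, mu_e)."""
    logs = [math.log(i) for i in intensities]
    n = len(diameters)
    mean_d = sum(diameters) / n
    mean_log = sum(logs) / n
    sxy = sum((d - mean_d) * (l - mean_log) for d, l in zip(diameters, logs))
    sxx = sum((d - mean_d) ** 2 for d in diameters)
    slope = sxy / sxx                      # equals -mu_e
    intercept = mean_log - slope * mean_d  # equals ln(I0)
    return math.exp(intercept), -slope


# Hypothetical mean intensities at one wavelength for three diameter groups (mm).
i0, mu_e = fit_extinction([70.0, 80.0, 90.0], [5200.0, 3900.0, 2950.0])
```

A positive fitted `mu_e` corresponds to intensity falling as diameter grows, which is the trend described above.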

Apple Soluble Solids Content Prediction Model for Each of the Three Fruit Diameters.
PLS was used to build the apple SSC prediction models, and the number of latent variables (LVs) was chosen from the range 1–20 to prevent overfitting or underfitting. The spectra were pretreated with MSC, SNV, and Savitzky-Golay (S-G) smoothing in turn, and the results of the PLS models of SSC built for the three fruit diameter groups are shown in Table 2.
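For reference, the two scatter-correction pretreatments can be sketched in a few lines. This is a minimal pure-Python illustration of the standard definitions of SNV and MSC, not the implementation in the software used for this study; the function names are illustrative, and MSC is assumed to use the mean spectrum as reference unless another is given.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    if reference is None:
        reference = [statistics.fmean(col) for col in zip(*spectra)]
    ref_mean = statistics.fmean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Regress each spectrum on the reference: s = offset + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected
```

Both functions remove an additive offset and a multiplicative scale from each spectrum, which is why they can partly absorb the intensity differences caused by fruit size.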
The results showed that, for apples with fruit diameters of 65 mm–75 mm, the model built on SNV-treated spectra predicted best, with an Rp of 0.885 and an RMSEP of 0.771%; the scatter diagram is shown in Figure 3(a). For fruit diameters of 75 mm–85 mm, the SNV-treated model was also the best, with an Rp of 0.959 (RMSEP as listed in Table 2); the scatter diagram is shown in Figure 3(b). For fruit diameters of 85 mm–95 mm, SNV again gave the best predictions, with an Rp of 0.937 and an RMSEP of 0.421%; the scatter diagram is shown in Figure 3(c). Comparing the models built on the original spectra with those built after SNV pretreatment for the three fruit sizes shows that SNV can, to some extent, mitigate the poor performance of the SSC prediction model caused by differences in apple size. SNV is a pretreatment method that can reduce the effects of sample particle size, surface scattering, and optical path length variation [22], so it can reduce the spectral scattering effects caused by uneven sample size.
As shown in Table 2, the model for the 75 mm–85 mm group performed better than those for the other fruit diameter groups, so its modeling set was selected as the modeling set of the hybrid prediction model, in order to investigate the performance of the apple SSC prediction model when the fruit diameters in the modeling set differ from those in the prediction set. The modeling set of the 75 mm–85 mm group was used to predict the prediction sets of the 65 mm–75 mm and 85 mm–95 mm groups. The prediction results are shown in Table 3, and the scatter diagrams are shown in Figures 4(a) and 4(b).
The spectra at 750 nm, shown in Figure 5, indicate that apple size affects the NIR spectrum and thus the performance of the apple SSC prediction model built from it. The apple SSC model therefore needs size compensation to improve the prediction performance of the sorting model.

Mixed Apple Size Soluble Solids Content Prediction Model.
Table 3 and Figure 5 show that apple size differences affect the NIR spectrum, so an apple size compensation model is needed to address the influence of size differences on the model. Using the K-S algorithm, 120 representative apple samples from each fruit size group were selected as the modeling set of the mixed-size SSC prediction model, and 40 apple samples were selected as the prediction set. PLS prediction models for the SSC of mixed apple sizes were constructed; the model results are shown in Table 4, and the scatter diagrams are shown in Figure 6. Compared with the models in Table 3, where the fruit diameters of the modeling set and prediction set differ, the mixed-size models fit better: the correlation coefficient Rp is significantly higher, the RMSEP of the prediction set is significantly lower, and model stability is significantly improved, so the influence of apple size on the SSC model is reduced. Still, the RMSEP of the mixed-size PLS model in Table 4 is as high as 0.911%. On an actual fruit sorting line, such a high error would lead to inaccurate apple quality sorting. The size variable was therefore added and the SSC prediction model rebuilt; the prediction results are shown in Table 5.
As shown in Table 5, the prediction performance of the mixed-size SSC model improved after the size variable was added: the RMSEP of the prediction set decreased from 0.911% to 0.822%. The improvement was small, however. Because the modeling set of the mixed-size model contains every size group, the effect of the size variable is diluted, so its impact on the model is insignificant.

Development of a Size-Compensated Soluble Solids Content Prediction Model for Apples.
In this study, the effect of apple fruit size on the NIR-based SSC model was investigated using data fusion. The main objective of this technique is to optimize the amount of information obtained on the tested sample through the synergy between individual assays of the same sample [23]. It comprises three levels: primary, intermediate, and advanced fusion. Primary fusion combines and models the raw data from multiple assays; intermediate fusion first screens the effective variables of each assay and then fuses and models them; advanced fusion models each assay independently and then makes a decision after considering the results of each model. In recent years, data fusion techniques have been applied in several fields, such as metabolomics [24,25], artwork identification [26,27], dye classification [28], and food testing [29,30]. In this study, a preliminary analysis of the effect of apple fruit diameter on the visible/NIR nondestructive fruit inspection model was conducted using intermediate-level data fusion; the technical flow chart is shown in Figure 7.
CARS and SPA were used to select spectral variables from the apple modeling set of the 75 mm–85 mm group, in order to eliminate uninformative variables, further optimize the prediction model's performance, and improve its detection speed. PLS modeling was performed in two cases: (1) PLS models of apple SSC were established with the wavelength variables selected by CARS or SPA; (2) the wavelength variables selected by CARS or SPA were fused with the corresponding apple size data and normalized to establish the PLS model of apple SSC. The wavelength points selected by CARS and SPA are shown in Figures 8(a) and 8(b), and the results of the established PLS models of apple SSC are shown in Table 6. Most of the wavelength points selected by CARS and SPA were located at 650 nm–850 nm, mostly at peaks and valleys, indicating that this spectral region carries a large amount of information on apple SSC. The SSC model built after SPA wavelength screening performed poorly because SSC is expressed at multiple places in the spectrum, and the wavelength selection removed much useful information. For the SPA model, the prediction set Rp for fruit diameters of 65 mm–75 mm was 0.744 and the RMSEP was 1.340%; for fruit diameters of 85 mm–95 mm, the Rp was 0.665 and the RMSEP was 1.942%. For the PLS model built with the wavelength variables selected by CARS, the prediction set Rp for fruit diameters of 65 mm–75 mm was 0.854 and the RMSEP was 0.611%; for fruit diameters of 85 mm–95 mm, the Rp was 0.863 and the RMSEP was 0.586%. The number of wavelength variables used in the model decreased from 1044 to 61, effectively simplifying the model and improving its stability. The apple size variable was added to the wavelength variables selected by CARS, and the data were normalized because the apple size data and the spectral data have different units [31].
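The fusion step in case (2) can be sketched as follows. The normalization method is not specified here, so min-max scaling is used purely as an assumption, and the function names and selected indices are hypothetical. The scaling ranges are fitted on the modeling set only and then reused for the prediction set.

```python
def fuse_mid_level(spectra, diameters, selected_idx):
    """Keep the selected wavelength columns and append fruit diameter as one more column."""
    return [[row[i] for i in selected_idx] + [d] for row, d in zip(spectra, diameters)]


def fit_minmax(matrix):
    """Per-column (min, max) ranges, computed on the modeling set only."""
    return [(min(col), max(col)) for col in zip(*matrix)]


def apply_minmax(matrix, ranges):
    """Scale each column to [0, 1] using previously fitted ranges."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, ranges)]
        for row in matrix
    ]


# Toy data: three spectra with four wavelength points each, plus diameters in mm.
spectra = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.6, 0.8], [0.3, 0.6, 0.9, 1.2]]
diameters = [70.0, 80.0, 90.0]
fused = fuse_mid_level(spectra, diameters, selected_idx=[1, 3])
ranges = fit_minmax(fused)
scaled = apply_minmax(fused, ranges)
```

The scaled matrix, with one extra column for diameter, is what would be passed to the PLS regression in place of the spectra alone.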
The purpose of data normalization is to eliminate the effect of differing units and scales and make the variables comparable, which is essential for model building [32]. A PLS model of apple SSC was established from the normalized intermediate-level fusion of spectral and fruit diameter data; the model performance is shown in Table 7, and the scatter diagrams are shown in Figures 9(a) and 9(b). As shown in Table 7, compared with the PLS model of apple SSC built with the CARS-selected wavelength variables alone, the size-compensated intermediate fusion model improved the prediction set Rp from 0.854 to 0.886 and reduced the RMSEP from 0.611% to 0.536% for fruit diameters of 65 mm–75 mm, and it improved the Rp from 0.863 to 0.913 and reduced the RMSEP from 0.586% to 0.497% for fruit diameters of 85 mm–95 mm. Compared with the mixed fruit size model, the prediction set Rp improved significantly and the RMSEP decreased considerably. The results indicate that apple size influences the

Conclusion
This paper investigated the effect of apple fruit diameter differences on the SSC prediction model. The results showed that apple size differences affect the spectrum: the spectral light intensity decays approximately exponentially with apple size, which in turn affects the prediction performance of the PLS model of apple SSC built from the spectrum. Three approaches to the size difference were therefore studied: spectral preprocessing, mixed-size modeling, and size compensation with the fruit diameter variable. SNV, a pretreatment method that can reduce the effects of sample particle size, surface scattering, and optical path length variation, can to a certain extent mitigate the poor performance of the SSC prediction model caused by differences in apple size. With mixed-size modeling, the correlation coefficient Rp improved significantly, the RMSEP of the prediction set decreased considerably, and model stability was greatly enhanced, which reduces the influence of apple size on the apple SSC model. The size compensation model based on intermediate fusion gave the best performance, with a prediction set Rp of 0.886 and an RMSEP of 0.536% for fruit diameters of 65 mm–75 mm, and an Rp of 0.913 and an RMSEP of 0.497% for fruit diameters of 85 mm–95 mm.
Therefore, adding the fruit diameter variable to establish the size compensation model of apple SSC can improve the model's prediction performance and meet the requirements of online detection of the SSC of apples with different fruit diameters.
Data Availability
The raw data in our research cannot be shared at this time as the data also form part of an ongoing study.

Conflicts of Interest
The authors report no conflicts of interest.