Application of Near-Infrared Spectroscopy to Rapidly Classify the Chinese Quince Fruits from Different Habitats


Introduction
China is the natural habitat and cultivation center of Chinese quince (Chaenomeles speciosa Nakai), which has vast genetic resources and is mostly planted in the East, Central, and Southwest regions of China [1]. China is also the second-largest producer of quince worldwide, following Turkey. This fruit is a rich source of nutritious components, and it also possesses antioxidant and immune regulatory qualities. Sugar, amino acids, flavonoids, saponins, organic acids, and other useful components can be found in the fruit, which is also credited with the ability to relax channels, activate collaterals, moisten the stomach, and perform a variety of other functions [2,3]. It has been used for thousands of years as one of the most essential substances in traditional Chinese medicine, typically to treat several diseases, including arthralgia, leg edema, and sunstroke [4]. To the present, Chinese quinces have continued to receive growing attention for their potential to improve overall health. However, the quality of Chinese quince fruit may vary with the habitat where it is grown because of differing climatic conditions (such as the moisture and humidity levels of the soil and the temperature). Therefore, there is a growing demand for research to determine the quality of Chinese quince fruits grown under a variety of field conditions.
Several published reports discuss the methods, traditional and emerging, that have been used to determine the quality of fruits produced under a range of different field conditions. Traditional methods, such as DNA analysis [5], amino acid composition [6], and gas chromatography (GC) analysis [7], are both time-consuming and expensive. Near-infrared spectroscopy (NIRs) is an emerging method that is both rapid and nondestructive [8]. It is used for qualitative and quantitative analysis of the chemical composition of fruits such as apples [9], bananas [10], peaches [11], kiwifruits [12], and pears [13]. This method is becoming increasingly popular as a solution to the limitations and challenges of traditional methods. NIRs has been used to verify the authenticity of freeze-dried açai pulp [14], trace apple habitat [15], determine soluble solid content in multihabitat apples [16], differentiate apple varieties, and investigate organic status [17]. Nevertheless, despite the number of studies on fruit quality and habitat discussed above, there is very little or no known research on the use of NIRs to determine Chinese quince habitat. In our earlier research [2], we analyzed and compared three distinct methods of discriminant analysis to determine the Chinese quince habitat.
Partial least squares discriminant analysis (PLS-DA) is one of the most widely used methods for classification in chemometrics [18,19]. This method has also received widespread application in the "omics" domains, such as metabolomics, proteomics, and genomics, in addition to an array of other fields that generate huge amounts of data, such as spectroscopy [20–24]. The rising interest in PLS-DA, particularly in the field of metabolomics, may largely be attributed to the fact that it is included in the vast majority of widely used statistical software programs [22,25–30]. These software packages include R, S-Plus, SAS, SPSS, and MATLAB. Moreover, PLS-DA has recently been described by researchers as a powerful and reliable classification approach when paired with spectroscopy, where it is used for discriminating between different qualities of fruit [31–33]. However, the PLS algorithm has a flaw in that it might provide inaccurate predictions due to the large number of irrelevant variables that it considers [34]. Variable selection methods can choose a limited number of variables that are highly significant and associated with the class characteristic (for example, habitat) [35]. Variable selection may also increase classification performance by accurately selecting a subset of key predictors [36]. This can be done by using the results of the classification to guide the selection.
NIRs has recently been used to efficiently categorize Chinese quince fruits originating from distinct habitats [2]. The method provides a noninvasive and highly effective approach for analyzing the chemical composition of fruit samples [2,8,9,14,15,31]. In that investigation, near-infrared reflectance spectroscopy was combined with multivariate analysis to categorize Chinese quince fruits according to their geographical origins [2]. The investigation centered on Chinese provinces renowned for their diverse climate conditions and soil characteristics. Its objective was to construct a model capable of effectively discriminating quince fruits originating from these geographical areas [2]. NIRs spectra were gathered from a substantial number of quince fruit samples, and multivariate analysis techniques, including principal component analysis (PCA) and linear discriminant analysis (LDA), were used to categorize the samples. PCA was used to reduce the dimensionality of the spectral data, and LDA was then used to construct a classification model from the reduced dataset. The findings demonstrated that NIRs, in conjunction with multivariate analysis, was highly effective in accurately categorizing Chinese quince fruits originating from diverse habitats. The high classification accuracy suggests that NIRs has significant potential as an instrument for swiftly and noninvasively categorizing fruit samples according to their geographical origin or natural habitat [2]. The use of NIRs to categorize Chinese quince fruits from diverse habitats shows the promise of this method for fruit quality control, traceability, and authentication within the agricultural sector.
Therefore, this study aimed to develop PLS-DA models based on the NIRs spectra of Chinese quince fruits to predict their habitats rapidly and accurately, and to demonstrate how different variable selection methods influence the classification results of PLS-DA models.

Materials.
During the 2020 harvest season, samples of Chinese quince fruit were collected from six different habitats (Figure 1), which together represent the majority of the Chinese quince fruit-producing regions. When the fruit's color changed to a yellowish green, which is also the customary time for harvesting quinces for medicinal purposes, three fresh, intact quinces were picked at random from each tree in each habitat. All of the samples were then placed in labeled plastic bags and kept in a cooler box to maintain their freshness. The test samples consisted of a total of 663 fruits, collected from the six main producing regions at a rate of three fruits per plant from 221 distinct plants (Table 1 and Figure 2).

Spectra Acquisition.
In this study, the near-infrared reflectance spectra of individual fruits were collected at room temperature (25 °C) using a hand-held near-infrared spectrometer (LF-2500, Spectral Evolution, USA) at an interval of 6 nm from 1000 nm to 2500 nm. Each spectrum was the average of 32 scans. The DARWin SP (version 1.2) software supplied by the instrument manufacturer was used to analyze the collected data. Three spectra were recorded for each fruit sample. For the first measurement, the contact probe, which had a diameter of 20 mm, was positioned on the ventral surface of the fruit, with the stem-calyx axis horizontal, at a location chosen at random. The second measurement was taken at a location rotated roughly 120° from the starting point, and the third at roughly 240° from the starting point. For each sample, the average of the three spectra was calculated.

Data Processing.
The R software (version 3.1.2) was used for data processing [37]. The NIRs spectra of all fruits from the same tree were averaged, so that 221 spectral samples were used in the end. Multivariate analysis was performed after converting the reflectance spectra into absorbance spectra. Both the standard normal variate (SNV) and the first derivative were tested as spectral preprocessing methods. These two preprocessing techniques can effectively remove the additive effect and noise present in the spectrum [2,14].
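The processing chain described above can be sketched in a few lines. The following is a minimal illustration in Python with NumPy rather than the R code used in the study; the function names are hypothetical, the absorbance conversion assumes the common apparent-absorbance definition A = log10(1/R), and the derivative is a plain finite difference (the text does not state which derivative filter was applied).

```python
import numpy as np


def average_by_tree(spectra, tree_ids):
    """Average fruit-level spectra (rows) into one mean spectrum per tree."""
    spectra = np.asarray(spectra, dtype=float)
    tree_ids = np.asarray(tree_ids)
    trees = np.unique(tree_ids)  # sorted unique tree labels
    means = np.vstack([spectra[tree_ids == t].mean(axis=0) for t in trees])
    return trees, means


def reflectance_to_absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for reflectance R in (0, 1]."""
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra, step_nm=6.0):
    """Finite-difference first derivative along the wavelength axis.

    step_nm is the sampling interval; 6 nm is the interval quoted in the text.
    A constant (additive) offset in a spectrum vanishes in the derivative.
    """
    return np.gradient(np.asarray(spectra, dtype=float), step_nm, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 6 fruits from 2 trees, 256 wavelength variables.
    reflectance = rng.uniform(0.2, 0.9, size=(6, 256))
    trees, mean_reflectance = average_by_tree(reflectance, [1, 1, 1, 2, 2, 2])
    absorbance = reflectance_to_absorbance(mean_reflectance)
    print(trees, snv(absorbance).shape, first_derivative(absorbance).shape)
```

In practice a smoothing derivative such as a Savitzky-Golay filter is often preferred for noisy spectra; the finite difference is used here only to keep the sketch dependency-free.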
The dataset was subsequently partitioned into two subsets: a calibration set and a validation set [14]. The samples in both subsets were chosen on the basis of their Euclidean distances, aiming for the widest possible data coverage. Ultimately, 181 samples were used for the calibration set, and the remaining 40 samples were allocated to the validation set [34]. PLS-DA classification models were used to differentiate between the various origins of Chinese quince fruits [17]. The PLS-DA method is a variant of the PLS regression (PLS-R) methodology. PLS-R is usually used for regression problems and is most appropriate when the predictor matrix contains more variables than observations. PLS-DA is well suited to classification because it reduces the dimension of the predictor variables and extracts the components that are significantly linked with the class factor [14,16]. For this reason, PLS-DA was used to classify the data.
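A distance-based split of this kind can be illustrated with the Kennard-Stone algorithm, which repeatedly adds the sample farthest (in Euclidean distance) from those already chosen. The text does not name the selection algorithm that was used, so Kennard-Stone is an assumption here, and the function name is hypothetical; the sketch only shows how a 181/40 partition with wide data coverage could be produced.

```python
import numpy as np


def kennard_stone_split(X, n_calibration):
    """Return (calibration_indices, validation_indices) by Kennard-Stone.

    Assumes n_calibration >= 2. Starts from the two most distant samples and
    then adds, one at a time, the sample whose nearest already-selected
    neighbour is farthest away.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via the squared-norm expansion.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.clip(d2, 0.0, None))

    first, second = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(first), int(second)]
    remaining = [k for k in range(X.shape[0]) if k not in selected]

    while len(selected) < n_calibration:
        # Distance from each remaining sample to its closest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.normal(size=(221, 256))  # hypothetical preprocessed spectra
    calibration, validation = kennard_stone_split(spectra, 181)
    print(len(calibration), len(validation))
```

Because the most extreme samples end up in the calibration set, the validation samples lie inside the calibrated range, which tends to give optimistic validation figures compared with a random split.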
In the PLS-DA model, the spectra of the fruits from the six different habitats were used as the X matrix, and six dummy values were used in the Y matrix to represent the habitats. Shandong, Anhui, Zhejiang, Hubei, Chongqing, and Yunnan were each assigned a dummy value between 0 and 5, and that value was given to the corresponding spectra. Root mean square error (RMSE) ranges of ±0.5 were set around the dummy value of each habitat. A sample whose predicted value fell within the range of a habitat was considered to be classified into that habitat. The leave-one-out cross-validation method was used in the development of the PLS-DA calibration models [35].
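The assignment rule can be written out explicitly. The sketch below assumes that the dummy values 0 to 5 follow the order in which the habitats are listed above (the text does not state the exact mapping) and that a value lying exactly on a boundary goes to the higher class; both choices, and the function names, are illustrative only.

```python
import math

# Assumed coding: the order in which the habitats are listed in the text.
HABITATS = ["Shandong", "Anhui", "Zhejiang", "Hubei", "Chongqing", "Yunnan"]


def assign_habitat(predicted_value):
    """Map a continuous PLS prediction to a habitat, or None if out of range.

    A prediction within +/-0.5 of a dummy value is assigned to that habitat.
    Predictions below -0.5 or at/above 5.5 match no habitat.
    """
    code = math.floor(predicted_value + 0.5)  # nearest dummy value, ties go up
    if 0 <= code < len(HABITATS):
        return HABITATS[code]
    return None


def classify(predictions):
    """Apply the rule to a sequence of predicted values."""
    return [assign_habitat(value) for value in predictions]


if __name__ == "__main__":
    print(classify([0.2, 2.49, 4.6, 5.7]))
```

Coding six unordered classes on a single 0 to 5 scale imposes an artificial ordering on the habitats; one-hot coding with one response column per class is a common alternative.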

Variable Selection.
Five different variable selection methods were tested to determine which produced the most accurate predictions. These methods were subwindow permutation analysis (SwPA), iterative predictor weighting (IPW), backward variable elimination (BVE), genetic algorithm (GA), and uninformative variable elimination (UVE).
SwPA. SwPA, when paired with the PLS-DA model, has the potential to make the model more effective and faster for analyzing large datasets. This is because SwPA evaluates the influence of each variable individually, without taking into account the influence of the other variables. Additional information can be found in the reports published by Mehmood and coworkers [36] and by Li and coworkers [38].
IPW. The IPW variable selection method was introduced by Forina and coworkers [39]. The method is based on the effect of each predictor on the response in the PLS model, and it iteratively reweights the original X variables to eliminate the least important ones. This method has been used successfully in spectrometry [40].
BVE. Backward variable elimination was first described by Frank for the elimination of noninformative variables [41]. Later, in an upgraded version, it was used for wavelength selection [42]. The method works by first sorting the variables using a filter measure and then using a threshold to eliminate a subset of the least informative variables. This process is continued until no further elimination is needed.
GA. The GA, which is derived from the concepts of genetics and natural selection, has developed into an optimization tool that conducts a random, global search in a high-dimensional space. By sampling a broad parameter space at each stage of the optimization, GA may escape local optima and find global optima in a relatively short time. It has been extensively used for variable selection in multivariate spectroscopic calibration [43]. The steps of the genetic algorithm are explained in the study published by Mehmood and colleagues [36].
UVE. The UVE procedure, developed by Centner and coworkers for PLS models, adds artificial noise variables to the predictor set before the PLS model is fitted [44]. It then eliminates the original variables that are less informative than the artificial noise variables. This process is repeated until a satisfactory model is obtained.
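The UVE idea can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the implementation used in the study: it uses a compact single-response PLS (NIPALS) written for this example, appends uniform random noise scaled by a small constant (the constant `noise_scale` is an assumed value), computes a stability score for each variable as the mean of its leave-one-out regression coefficients divided by their standard deviation, and keeps the variables whose absolute stability exceeds the largest value reached by any noise variable. All function and parameter names are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression vector of a single-response PLS model on centered data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))  # weight vectors
    P = np.zeros((n_vars, n_components))  # X loadings
    q = np.zeros(n_components)            # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                        # scores
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = (yk @ t) / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])    # deflate X
        yk = yk - q[a] * t                # deflate y
    return W @ np.linalg.solve(P.T @ W, q)


def uve_select(X, y, n_components=2, noise_scale=1e-10, seed=0):
    """Return (keep_mask, stability, cutoff) for the original variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_vars = X.shape
    rng = np.random.default_rng(seed)
    # Artificial noise variables, scaled so they barely affect the model.
    noise = rng.uniform(size=(n_samples, n_vars)) * noise_scale
    Z = np.hstack([X, noise])
    # Leave-one-out: one coefficient vector per omitted sample.
    coefs = np.array([
        pls1_coefficients(np.delete(Z, i, axis=0), np.delete(y, i), n_components)
        for i in range(n_samples)
    ])
    stability = coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)
    cutoff = np.abs(stability[n_vars:]).max()  # best score among noise variables
    keep = np.abs(stability[:n_vars]) > cutoff
    return keep, stability[:n_vars], cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 12))                   # hypothetical spectra
    y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)
    keep, stability, cutoff = uve_select(X, y, n_components=3)
    print(int(keep.sum()), "variables kept of", X.shape[1])
```

Using the maximum noise stability as the cutoff is one common choice; a high quantile of the noise scores is sometimes used instead to make the threshold less sensitive to a single extreme noise variable.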

Results and Discussion
3.1. NIRs Spectra. Figure 3 depicts the average NIRs absorbance spectra of raw Chinese quince fruits grown in six different habitats. The raw fruit spectra all have a relatively similar shape, with only a small amount of variation between the spectra of the habitats. However, after first derivative preprocessing, the spectra showed some major disparities among the different habitat groups. There were two strong water absorbance bands at 1450 and 1950 nm, which are associated with overtones of the -OH bands. The C-H groups, such as methyl, methylene, and ethylene, were responsible for the peaks that appear at around 1250 nm, 1700 nm, 2000 nm, and 2150 nm [14,45,46]. In Figure 3(a), the observed spectra consisted of two distinct peaks and one broad peak, for a total of three peaks. In contrast, Figure 3(b) exhibits a total of five peaks. Specifically, the absorption peak observed at a wavelength of 2,270 nm was attributed to the combination of CH-stretch and CH-deformation vibrational modes originating from the -CH3 moiety of ethanol [47,48]. Likewise, the absorption peak observed at approximately 2,300 nm is plausibly linked to the -CH2 functional group present in ethanol [47–51].
The NIRs region ranging from 1,650 to 1,750 nm is associated with the first overtones of the CH-stretch in both -CH3 and -CH2 functional groups [47–51]. Additional research has demonstrated that methanol-based solutions containing phenolic compounds and tannins exhibit comparable absorption patterns within these specified regions, despite variations in concentration [52]. This is particularly relevant to the spectral regions centered at 1,650 and 1,850 nm, as well as the range between 2,100 and 2,300 nm. Within this range, a prominent absorption characteristic associated with tannins has been identified at approximately 2,140 nm [52]. Therefore, it is possible that the observed alterations in this region reflect differences in concentrations of sugar, ethanol, phenolics, and tannins.
The PLS-DA models classified the six different habitats best when built on the first derivative spectra, for both the calibration and validation sets. The correct classification specificity was 91% for the calibration set and 95% for the validation set. For this reason, wavelength selection was carried out on spectra preprocessed with the first derivative.

Variable Selection. PLS-DA was used in conjunction with the various variable selection methods to develop the final model. Table 2 shows the specificity of the PLS-DA models for both the calibration and validation sets for each variable selection method. The UVE variable selection approach achieved the highest specificity for the calibration and validation sets, with scores of 0.93 and 0.98, respectively. Compared with PLS-DA with no variable selection, which used 256 variables and 8 factors, UVE reduced the number of variables from 256 to 70 and the number of PLS factors from 8 to 7. BVE gave the lowest classification specificity, at 0.89 for the calibration set and 0.93 for the validation set. Except for the GA method, which eliminated only 14 variables from the original spectrum, the other variable selection methods did not increase the specificity of the classification model, even though they substantially reduced the number of variables.
One notable advantage of the UVE-PLS method over alternative variable selection methods is its user independence, which eliminates potential configuration issues [44]. Koshoubu et al. [53] presented a modified version of UVE-PLS that incorporates the prediction error sum of squares. This modification was used to exclude uninformative samples, considering both wavelength variables and concentration variables [54]. The UVE-PLS method identifies the wavelength variables that contain relevant information from the regression coefficients obtained by PLS modeling. The PLS regression coefficients are obtained using the leave-one-out technique on the calibration samples. Nevertheless, the leave-one-out method raises a notable issue. As highlighted by Martens and Dardenne [55], the leave-one-out technique used in multivariate data analysis tends, on average, to overfit, resulting in an underestimation of the actual prediction error. Hence, using the leave-one-out method in the UVE-PLS algorithm carries over this drawback and may lead to overfitting of the prediction model.
Table 3 provides an overview of the correct classification percentages for the calibration set and the validation set, both before and after the application of UVE. Overall, PLS-DA-UVE produced the best results for the classification of the different habitats of quince fruits. Compared with PLS-DA with no variable selection, PLS-DA-UVE improved the classification specificity for Anhui, Shandong, and Yunnan in the calibration set. The specificity for the Chongqing and Zhejiang habitats remained the same, whereas it decreased for the Hubei habitat. In the validation set, using UVE in conjunction with PLS-DA gave a classification specificity of 100% for quinces from Anhui, Chongqing, Hubei, Shandong, and Zhejiang. The specificity for quince fruit harvested in Yunnan improved only marginally, from 86% to 88%. PLS-DA-UVE achieved the best overall performance, indicating its superiority over the other methods in classifying the habitat of Chinese quince fruits from NIRs spectral data. It has been reported that using UVE in conjunction with PLS-DA may produce a more reliable and specific result [56]. A similar result was observed when UVE was combined with PLS-DA to determine the linoleic acid concentration in eight different types of edible vegetable oils [57]. Similarly, FT-IR transmission spectroscopy combined with the UVE method has been reported as promising for the quick detection of glycerol monolaurate [58].
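The per-habitat percentages of correct classification summarized in Table 3 can be computed from paired lists of true and predicted habitats. The sketch below is illustrative only: the function names are hypothetical and the labels in the demonstration are made up, not taken from the study.

```python
from collections import Counter


def percent_correct_by_habitat(true_labels, predicted_labels):
    """Percentage of samples of each habitat that were assigned to it."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {habitat: 100.0 * hits[habitat] / totals[habitat] for habitat in totals}


def overall_percent_correct(true_labels, predicted_labels):
    """Percentage of all samples that were assigned to their true habitat."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    # Made-up labels, used only to show the calculation.
    true = ["Anhui", "Anhui", "Yunnan", "Yunnan", "Yunnan", "Hubei"]
    pred = ["Anhui", "Hubei", "Yunnan", "Yunnan", "Anhui", "Hubei"]
    print(percent_correct_by_habitat(true, pred))
    print(overall_percent_correct(true, pred))
```

Reporting the figure per habitat as well as overall makes it visible when a high overall rate hides one poorly classified habitat.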
Figure 4 presents the PLS-DA and PLS-DA-UVE score plots for factors 1 and 2. The PLS-DA score plot (Figure 4(a)) shows that the fruits from each of the six habitats can be distinguished from one another. This may be because Chinese quinces grow in a wide variety of habitats, each of which is unique in terms of soil, climate, and growing conditions, even though there is some commonality. It is evident from observing Figure 4(b) that the six

Conclusion
In this study, the NIRs technique successfully classified samples of Chinese quince fruit from six different habitats, with significant disparities observed among the habitat groups. The best PLS-DA models were obtained from fruit spectra in the range of 1000 to 2500 nm combined with the first derivative preprocessing method. This approach has the potential to be employed as a fast and nondestructive method for differentiating the habitat of Chinese quinces. Following an examination of several variable selection methods, the study found that the UVE variable selection method, when used in conjunction with PLS-DA, produces more accurate classifications for the six different habitats. In addition, the findings suggest that the discrimination among habitats may be due to differences in the chemical composition of the fruits, which result from the different climatic and geographical conditions to which Chinese quinces had to adapt in each habitat. The findings also suggest that PLS-DA can be used as an alternative method for classifying the habitats of Chinese quince fruits. This will help in identifying the primary factors that cause significant variation in the habitats, composition, and quality of Chinese quince fruits when combined with other methods like polynomial
This classification process has the potential to contribute to the enhancement of quality control, grading, and sorting procedures for these fruits. Because NIRs analyzes samples nondestructively, it is a valuable instrument for evaluating the quality of fruits while preserving their usability and market value. Its rapid analysis also provides a distinct advantage for time-sensitive applications such as quality control in fruit processing. The current study nevertheless has certain limitations. The sample size may be limited, which could affect the extent to which the findings can be extrapolated. Another limitation is the absence of external validation with independent datasets or samples from other geographical locations, which could enhance the credibility of the classification models.
The present study has implications for future research and clinical practice. Future research should focus on validating the findings using larger and more diverse samples, which will help to improve the reliability and generalizability of the classification models. Forthcoming investigations should also prioritize the identification of the distinct spectral attributes linked to the categorization of Chinese quince fruits, which can improve understanding of the chemical composition and qualitative characteristics of these fruits. Potential applications in practice include the use of NIRs for rapid categorization of fruits. The methodology can be extended to other domains, including determining the quality and genuineness of medicinal plants and evaluating the nutritional composition of food products. It may also have implications in clinical practice, such as the rapid identification of diseases or conditions through the analysis of spectral signatures in biological samples. Combining NIRs with other analytical techniques can yield a more exhaustive and precise dataset. Future investigations may explore the combination of NIRs with complementary techniques, such as chromatography or mass spectrometry, to extend the analytical capabilities for Chinese quince fruits or similar specimens. In summary, this study demonstrates the capacity of NIRs for categorizing and evaluating the quality of Chinese quince fruits. Subsequent investigations can build on these results to explore wider applications and improve the technique's efficiency.

Figure 3: NIRs spectra of the Chinese quince fruits from six different habitats: (a) raw fruit and (b) first derivative preprocessing.

Table 1: Mapping presentation of sample collection sites of Chinese quince fruits from six different habitats. Different locations for the sample collection of Chinese quince fruit.

Table 2: Results of the classification of Chinese quince fruits from six different habitats using the PLS-DA full-wavelength and variable selection methods, respectively.

Table 3: Percentage of correct classification of Chinese quince fruits from six different habitats using the PLS-DA full-wavelength and uninformative variable elimination (UVE) methods, respectively.