The Combined Optimization of Savitzky-Golay Smoothing and Multiplicative Scatter Correction for FT-NIR PLS Models

The combined optimization of Savitzky-Golay (SG) smoothing and multiplicative scatter correction (MSC) was discussed based on partial least squares (PLS) models in Fourier transform near-infrared (FT-NIR) spectroscopy analysis. A total of 5 cases of using SG smoothing and MSC separately or in combination were designed and compared. For every case, the SG smoothing parameters were optimized jointly with the number of PLS latent variables ($F$), with an expanded number of smoothing points. Taking the FT-NIR analysis of soil organic matter (SOM) as an example, the joint optimization of SG smoothing and MSC was achieved based on PLS modeling. The results showed that the optimal pretreatment was SG smoothing followed by MSC, in which the SG smoothing parameters were a 4th-degree polynomial, 2nd-order derivative, and 67 smoothing points; the corresponding best $F$, RMSEP, and $R_P$ were 7, 0.3982 (%), and 0.8862, respectively. This result was far better than that obtained without any pretreatment. The combined optimization of SG smoothing and MSC could clearly improve the modeling result for NIR analysis of SOM. In addition, a new method for the classification of samples into calibration and prediction sets was proposed based on the normalization principle. The optimizations were done on the basis of this classification.


Introduction
With the development of modern science and technology, near-infrared (NIR) spectroscopy analysis is widely applied in many fields, such as agriculture, food, environment, and biomedicine, because it is quick, easy, reagent-free, pollution-free, and capable of multicomponent simultaneous determination [1,2]. Fourier transform near-infrared (FT-NIR) spectroscopy analysis is powerful in signal processing and spectral analysis; it forms a good approximation of the original spectrum by curve fitting with a fewer-term Fourier series [3][4][5][6]. FT-NIR spectroscopy analysis is a technology for extracting component information from experimental data. The large quantity of high-dimensional data requires chemometric methods for quantitative analysis.
Partial least squares (PLS) is an effective dimension reduction method in near-infrared spectroscopy analysis. It is a widely used spectral modeling method that integrates principal component analysis and multiple linear regression. This method not only extracts the information related to the dependent variable but also reduces the dimension of the spectral matrix [7][8][9][10][11][12][13]. The latent variables carry the spectral information of the sample components, and the number of latent variables ($F$, a positive integer) is a main parameter of PLS modeling. A reasonable choice of latent variables is very important for eliminating noise and making full use of the spectral information. Frequently, the choice of latent variables requires joint optimization with the spectral pretreatment methods.
In FT-NIR spectroscopy analysis, the sample volume, the sample preparation, the measuring method, and the measuring parameters, such as the number of scans and the scanning resolution, inevitably bring some noise into the spectral data [3]. In order to make full use of the informative data and to eliminate noise, data pretreatment of the spectra is usually necessary before establishing the calibration model. Savitzky-Golay (SG) smoothing is a widely used pretreatment method that can effectively eliminate noise such as baseline drift, tilt, reverse, and so forth [14][15][16][17][18][19]. It contains many different smoothing modes. The smoothing parameters include the polynomial degree (PD), the derivative order of the polynomial (DOP), and the number of smoothing points (NSP). Here the NSP is particularly important. An NSP that is too small is prone to calculation error, which decreases the model precision, while an NSP that is too large oversmooths the spectral data, which decreases the accuracy. A reasonable choice of NSP is therefore very important for SG smoothing. The NSP can be selected according to the PLS model prediction result, in combination with the choice of PLS latent variables.
In addition, because of the nonuniform particle size of solids, the NIR diffuse reflectance spectrum of solid samples is often accompanied by scattering noise. If the analyte content in the sample is very low, the spectral scattering effects may cover the spectral information. In order to overcome the interference of scattering, the multiplicative scatter correction method can be used in the spectral data pretreatment process. Multiplicative scatter correction (MSC) is a pretreatment method that can separate the informative absorbance of the analyte from the scattering signal in the spectral data [20][21][22][23]. It can eliminate the spectral differences within the same batch of samples that are caused by the nonuniform particle size.
As introduced above, SG smoothing and MSC are both spectral pretreatment methods with much potential. The model effect differs considerably depending on whether SG smoothing and MSC are used separately or in combination. Moreover, the proper smoothing mode should be selected for the pretreatment optimization. This requires a large number of computer experiments, establishing different NIR spectroscopy analysis models corresponding to different pretreatment parameters, so that a reasonable model can be determined by comparing the prediction effects. It is an important way to improve the predictive ability of NIR spectroscopy analysis, especially for samples of complex systems.
Soil is an important part of agriculture and the ecological environment, and soil organic matter (SOM) content is an important indicator of soil fertility [24]. The routine biochemical measurement of SOM is usually performed in the laboratory, with complicated operation and chemical reactions that may cause pollution. It is of great significance in modern agriculture to establish a direct, rapid, reagent-free method for measuring SOM. There has been much research on NIR spectroscopy analysis of soil in recent years [24][25][26][27]. Soil is a complex system with multiple components, and its spectrum contains a lot of noise and interference. Therefore, selecting an appropriate spectral pretreatment method and an effective chemometric method, in order to reduce noise and improve the accuracy of NIR spectroscopy analysis of soil, is an important issue that needs further study.
Taking the FT-NIR spectroscopy analysis of SOM as an example, we discuss the model prediction results obtained by using the two pretreatment methods of SG smoothing and MSC separately or in combination. We discuss the following 5 cases of pretreatment by comparing the PLS model prediction effects: (1) without any pretreatment; (2) MSC pretreatment alone; (3) SG smoothing pretreatment alone; (4) MSC followed by SG smoothing; (5) SG smoothing followed by MSC. Taking into account that some actual systems may require a larger number of smoothing points, the NSP was expanded in the process of SG smoothing. A computing platform was built for SG smoothing to calculate the corresponding smoothing coefficients, expanding the number of smoothing modes from the original 117 to 394 and widening the scope of use of SG smoothing. Based on PLS modeling, the SG smoothing parameters were optimally selected in combination with the number of PLS latent variables, according to the model prediction results. This combined optimization could widen the applicable range of spectral pretreatment methods and improve the predictive ability of NIR analysis, especially for complex systems such as soil.
Besides, NIR spectroscopy analysis requires a classification of all samples: some samples are classified into the calibration set and the others into the prediction set. The analyte's chemistry values (as the reference) and the spectral absorbance of the calibration samples are used to establish a calibration model, and then the spectral absorbance of the prediction samples is fed into the model to calculate the corresponding NIR-predicted chemistry values. According to the proposed model evaluation indicators, the model prediction result can be evaluated by comparing the predicted values with the chemistry values of the analyte in the prediction samples, and the application effectiveness of NIR spectroscopy can then be determined. The classification of calibration set and prediction set directly influences the model optimization results of NIR spectroscopy analysis. According to the Lambert-Beer law, the NIR analysis model expresses the relationship between the chemistry value of the analyte and the spectral absorbance of the samples. To reduce the influence of noise on the spectral data, and to make the model representative, the chemistry values and the spectral data of the samples were each pretreated by data normalization. On this basis, a new method for the classification of calibration set and prediction set was proposed in this paper, in order to ensure correlation similarity for the models, with high correlation coefficients in both the calibration and the prediction processes.

Materials, Experiment, and Methods
2.1. Materials, Instrument, and Measurement. One hundred thirty-five soil samples were collected in Guangxi, China (numbered from 1 to 135). After drying, crushing, and sieving to granular solids with a diameter of about 2 mm, they were measured in biochemical and NIR spectroscopy experiments. In the biochemical experiment, the content of SOM was measured by potassium dichromate oxidation, and the measured data were called chemistry values, which were taken as reference values for NIR analysis. The chemistry values of all samples ranged from 1.100 to 6.418 (%, here the unit is the mass percentage); the mean value and the standard deviation were 2.686 and 1.056 (%), respectively.
In the NIR spectroscopy experiment, the instrument was a Spectrum One NTS FT-NIR spectrometer (produced by PerkinElmer Inc., USA) with a diffuse reflectance accessory. The scanning spectral region was set as 10000-4000 cm$^{-1}$, the resolution as 8 cm$^{-1}$, and the number of scans as 64. The experiment temperature was 25 ± 1 °C and the relative humidity was 47 ± 1%.
2.2. The New Method for Classification. The classification of calibration set and prediction set is an important part of NIR spectroscopy analysis, and it ultimately influences the model optimization results of NIR analysis. In order to gain a classification whose calibration set has a correlation similarity to the prediction set, a new method for the classification was proposed in this paper to establish chemometric models with a certain representativeness.
According to the Lambert-Beer law, we worked on the chemistry values and the spectral data. First, by calculating the correlation coefficients (denoted by $R$) between the chemistry values and the spectral absorbance of the samples, the wavenumber with the highest correlation coefficient was located in the scanning spectral range; this wavenumber was denoted by $k_{\text{high}}$ and the highest correlation coefficient by $R_{\text{high}}$. The chemistry values and the spectral data were each normalized by the normalization principle [28][29][30]. Then, based on the normalized chemistry values, the two samples with the maximum and minimum values were chosen for calibration, while the two samples with the 2nd-maximum and 2nd-minimum values were chosen for prediction; based on the normalized spectral data, the corresponding four samples were classified into the calibration set and the prediction set in the same way.
Next, after setting the number of samples in the calibration set ($n_C$) and the number in the prediction set ($n_P$), the remaining samples were randomly assigned one by one to the calibration set or to the prediction set. The correlation coefficients between chemistry values and spectral absorbance were calculated separately in the calibration set and in the prediction set, based on the spectral data at $k_{\text{high}}$, and denoted by $R_{\text{Cset}}$ and $R_{\text{Pset}}$, respectively. This random assignment was repeated enough times until there was one classification whose $R_{\text{Cset}}$ and $R_{\text{Pset}}$ were sufficiently close to each other. This classification could then be considered as having a certain correlation similarity, and it would be suitable for NIR analysis modeling.
The specific calculating process was divided into the following two steps.
Step 1. e normalization and the samples chosen: (a) the normalization for chemistry values: (b) the normalization for spectral data: where is the number of all samples, is the number of wavenumbers in the scanning spectral region, is the chemistry value of sample , is the averaged chemistry value of all samples, ′ is the normalized chemistry value of sample , is the spectral absorbance of sample at the th wavenumber, , is the averaged spectral absorbance at the th wavenumber, ′ is the normalized spectral absorbance of sample at the th wavenumber, and | ′ | is the norm of the spectral absorbance vector of sample .
According to the normalization of the chemistry values and of the spectral data described above, one $C_i'$ and one $|A_i'|$ are obtained for each sample $i$ ($i = 1, 2, \ldots, n$). Among all samples, the two with the maximum and minimum $C_i'$ and the two with the maximum and minimum $|A_i'|$ were classified into the calibration set, while the two with the 2nd-maximum and 2nd-minimum $C_i'$ and the two with the 2nd-maximum and 2nd-minimum $|A_i'|$ were classified into the prediction set.
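The selection of the eight boundary samples in Step 1 can be sketched as follows. This is only an illustration under a stated assumption: the exact normalization formulas are not reproduced here, so autoscaling (mean-centering followed by division by the standard deviation) is assumed for both the chemistry values and each wavenumber column. The function name `select_extreme_samples` and its arguments are hypothetical.

```python
import numpy as np

def select_extreme_samples(C, A):
    """Pick the 8 boundary samples of Step 1 (0-based sample indices).

    C : (n,) chemistry values; A : (n, K) absorbance matrix.
    ASSUMPTION: normalization = autoscaling (mean-centering, then division
    by the sample standard deviation); the paper's own formulas may differ.
    """
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    C_norm = (C - C.mean()) / C.std(ddof=1)
    A_norm = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
    A_len = np.linalg.norm(A_norm, axis=1)   # one vector norm per sample

    calibration, prediction = [], []
    for scores in (C_norm, A_len):
        order = np.argsort(scores)           # ascending order
        calibration += [int(order[-1]), int(order[0])]   # maximum, minimum
        prediction += [int(order[-2]), int(order[1])]    # 2nd maximum, 2nd minimum
    return calibration, prediction
```

Note that the same sample could in principle be extreme in both criteria; the sketch does not resolve such overlaps, since the text does not say how they would be handled.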
Step 2. e classi�cation of the remaining samples: using the measured chemistry values and the spectral data at the wavenumber , the correlation coefficient ) of chemistry values and spectral absorbance at the wavenumber was calculated as follows: then the maximum ) was found out, and denoted by high , here high = max{ ), = 1, 2, , , and the corresponding wavenumber was high .
According to the allocated numbers $n_C$ and $n_P$, the remaining samples were randomly put into the calibration set or into the prediction set a sufficient number of times, producing many different classifications. For each classification, we focused on the spectral data at the wavenumber $k_{\text{high}}$; combining them with the chemistry values, the correlation coefficients in the calibration set and in the prediction set ($R_{\text{Cset}}$ and $R_{\text{Pset}}$) were calculated separately, with calculation formulae similar to Formula (3).

ISRN Spectroscopy
From $R_{\text{Cset}}$ and $R_{\text{Pset}}$, a new variable SUBR is calculated:

$$\mathrm{SUBR} = \left| R_{\text{Cset}} - R_{\text{Pset}} \right|$$

where SUBR is a variable describing the similarity of the calibration set and the prediction set. We would choose a classification with a sufficiently small SUBR to establish NIR analysis models. How small is sufficient depends on the actual situation. In this work, we put 90 samples out of 135 into the calibration set ($n_C = 90$) and the remaining 45 samples into the prediction set ($n_P = 45$), and we set SUBR < $10^{-5}$ as the goal of similarity.
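The random classification of Step 2 can be sketched as a simple search loop. This is a minimal illustration, not the authors' platform: it assumes SUBR is the absolute difference between the two set-wise correlation coefficients, and the names `pearson_r`, `split_by_subr`, `fixed_cal`, `fixed_pred`, `max_tries`, and `seed` are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient of two 1-D arrays (same form as Formula (3))."""
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy)))

def split_by_subr(C, A, fixed_cal, fixed_pred, n_cal, tol=1e-5,
                  max_tries=100_000, seed=0):
    """Randomly split samples until SUBR < tol; return the best split found."""
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    n, K = A.shape
    # Wavenumber index whose absorbance correlates best with the chemistry values.
    R = np.array([pearson_r(A[:, k], C) for k in range(K)])
    k_high = int(np.argmax(R))
    a_high = A[:, k_high]

    fixed = set(fixed_cal) | set(fixed_pred)          # samples fixed in Step 1
    rest = np.array([i for i in range(n) if i not in fixed], dtype=int)
    n_extra = n_cal - len(fixed_cal)                  # free calibration slots
    rng = np.random.default_rng(seed)

    best = None
    for _ in range(max_tries):
        perm = rng.permutation(rest)
        cal = np.concatenate([np.asarray(fixed_cal, dtype=int), perm[:n_extra]])
        pred = np.concatenate([np.asarray(fixed_pred, dtype=int), perm[n_extra:]])
        # ASSUMPTION: SUBR = |R_Cset - R_Pset|
        subr = abs(pearson_r(a_high[cal], C[cal]) - pearson_r(a_high[pred], C[pred]))
        if best is None or subr < best[2]:
            best = (cal, pred, subr)
        if subr < tol:
            break
    return best[0], best[1], best[2], k_high
```

If the tolerance is not reached within `max_tries`, the sketch returns the closest split seen so far, so the caller should check the returned SUBR against the intended threshold.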

2.3. Multiplicative Scatter Correction Method. Soil samples were prepared as solid powder for the experiments, and the NIR spectra were collected in diffuse reflectance mode. Although the powder had been sieved, the particles were still not uniform, and the analyte (i.e., SOM) content in the samples is very low, so the spectral scattering effect may override the spectral information of SOM [20]. In order to overcome the interference of scattering, the multiplicative scatter correction (MSC) method was used for the spectral pretreatment in this paper. The specific computing process is as follows.
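Step 1. Calculate the average spectrum of the measured spectra:

$$A_{\text{Ave}} = \frac{1}{n} \sum_{i=1}^{n} A_i$$

Step 2. Regress each measured spectrum on the average spectrum, estimating $m_i$ and $b_i$:

$$A_i = m_i A_{\text{Ave}} + b_i$$

Step 3. Calculate the MSC-corrected spectrum by using $m_i$ and $b_i$:

$$A_{i,\text{MSC}} = \frac{A_i - b_i}{m_i}$$

where $A_i$ ($i = 1, 2, \ldots, n$) is the measured spectrum of sample $i$, $A_{\text{Ave}}$ is the average spectrum of all measured spectra, $m_i$ and $b_i$ are the regression coefficients for sample $i$, and $A_{i,\text{MSC}}$ is the MSC-corrected spectrum of sample $i$.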
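The three MSC steps translate directly into a few lines of NumPy. The sketch below is illustrative; the function name `msc` and the optional `reference` argument (for reusing an average spectrum computed elsewhere) are assumptions, not part of the original description.

```python
import numpy as np

def msc(A, reference=None):
    """Multiplicative scatter correction of a spectra matrix A (n samples x K points)."""
    A = np.asarray(A, dtype=float)
    # Step 1: average spectrum (or a user-supplied reference spectrum).
    ref = A.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(A)
    for i, spectrum in enumerate(A):
        # Step 2: least-squares fit  spectrum ~ m_i * ref + b_i
        m_i, b_i = np.polyfit(ref, spectrum, deg=1)
        # Step 3: remove the additive and multiplicative scatter terms.
        corrected[i] = (spectrum - b_i) / m_i
    return corrected
```

When every spectrum is an exact scaled and shifted copy of one underlying curve, all corrected spectra collapse onto the average spectrum, which is a convenient sanity check.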

2.4. Savitzky-Golay Smoothing Method. Savitzky-Golay (SG) smoothing includes three parameters: the polynomial degree (PD), the derivative order of the polynomial (DOP), and the number of smoothing points (NSP). For convenience, PD and DOP are jointly denoted by the SG smoothing polynomial pattern (SPP), and NSP is an odd number, expressed as $2w + 1$ ($w = 1, 2, \ldots$). If DOP equals 0, SG smoothing is performed without derivatives. SG smoothing works on a sub-waveband of $2w + 1$ neighboring wavenumbers, constructing a polynomial with the serial numbers of the wavenumbers as the independent variable and fitting the polynomial coefficients by least squares regression. In the polynomial fitting process, the spectral data at the $2w + 1$ neighboring wavenumbers are embedded into the coefficients. The coefficient of each polynomial term is a linear combination of the spectral data in the sub-waveband; the $d$th-order term gives the smoothed spectrum value of $d$th-order derivative smoothing at the centre point (serial number $j = 0$); the coefficients of the linear combination are called SG smoothing coefficients. For a fixed NSP (i.e., a fixed $w$), a sub-waveband of fixed size moves through the whole scanning spectral region, so the SG smoothing values of the spectral data at the centre wavenumbers of all sub-wavebands can be calculated, and the SG smoothed spectra can be obtained. By changing the size of the sub-waveband, the SG smoothed spectra corresponding to different NSP can be obtained.
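The least-squares construction described above can be written compactly. The following is a sketch of the standard Savitzky-Golay derivation rather than a transcription of the authors' formulas; the symbols $w$, $j$, $y_j$, $a_r$, $J$, and $h^{(d)}_j$ are notation assumed here for illustration, and unit spacing between neighboring points is assumed.

```latex
% Window of NSP = 2w + 1 points with local serial numbers j = -w, ..., w,
% spectral values y_j, polynomial degree PD, derivative order d = DOP.
\begin{align}
p(j) &= \sum_{r=0}^{\mathrm{PD}} a_r\, j^{r},
  \qquad J_{jr} = j^{r}, \\
\mathbf{a} &= \arg\min_{\mathbf{a}} \lVert J\mathbf{a} - \mathbf{y} \rVert^{2}
  = (J^{\mathsf T} J)^{-1} J^{\mathsf T} \mathbf{y}, \\
p^{(d)}(0) &= d!\, a_d = \sum_{j=-w}^{w} h^{(d)}_{j}\, y_j,
  \qquad h^{(d)}_{j} = d!\,\bigl[(J^{\mathsf T} J)^{-1} J^{\mathsf T}\bigr]_{d j}.
\end{align}
```

The weights $h^{(d)}_j$ depend only on PD, DOP, and NSP, not on the spectrum, so they can be computed once per smoothing mode and reused. The inverse exists only when NSP exceeds PD, and for a point spacing $\Delta$ other than 1 the derivative value is divided by $\Delta^{d}$.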
According to the method described above, any derivative smoothed value at the centre point of a sub-waveband can be expressed as a linear combination of the measured data at all wavenumbers in the sub-waveband. The coefficients of the linear combination (i.e., the smoothing coefficients) are uniquely determined by the three smoothing parameters PD, DOP, and NSP. Every combination of these three parameters corresponds to one group of smoothing coefficients (i.e., one smoothing mode). In Savitzky and Golay's paper [14], PD was set as 2, 3, 4, and 5; DOP as 0, 1, 2, 3, 4, and 5; NSP as 5, 7, ..., 25 (odd numbers). There are a total of 117 groups of smoothing coefficients (i.e., 117 smoothing modes). If the spectral resolution is set small and the NSP used is also small, the corresponding smoothed sub-waveband is too narrow and lacks information. In this situation, a good smoothing effect is difficult to reach. Therefore, it is necessary to expand the NSP. In this paper, the NSP was expanded to 5, 7, ..., 91 (odd numbers), and the smoothing coefficients of the additional smoothing modes were computed. In total we obtained 394 groups of smoothing coefficients, including the original 117 groups [14]. This work widened the range of application of the SG smoothing pretreatment method, providing more choices of smoothing modes for different analytes.
Now the SG smoothing mode with PD = 4, DOP = 2, and NSP = 67 is taken as an example to show how to calculate the SG smoothing coefficients. Here we need to use a 4th-degree polynomial and the spectral data of 67 neighboring points to compute the smoothed spectra of the 2nd-order derivative. The 67 calculated smoothing coefficients were −5.841, −3. The smoothing coefficients corresponding to every other SG smoothing mode can be calculated in a similar process to this example. A total of 394 SG smoothing modes were designed in this paper.
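One way to compute a group of smoothing coefficients for an arbitrary mode is a direct least-squares fit, sketched below. The function names `sg_coefficients` and `sg_filter` are hypothetical, unit spacing between points is assumed, and the returned values are unscaled, so they are not expected to match the scaling of the coefficients listed in the example above.

```python
import numpy as np
from math import factorial

def sg_coefficients(nsp, pd, dop):
    """SG coefficients for NSP points, polynomial degree PD, derivative order DOP."""
    if nsp % 2 == 0 or nsp <= pd or dop > pd:
        raise ValueError("need odd nsp > pd and dop <= pd")
    w = nsp // 2
    j = np.arange(-w, w + 1, dtype=float)         # serial numbers in the window
    J = np.vander(j, pd + 1, increasing=True)     # columns j**0, j**1, ..., j**pd
    # Row `dop` of the pseudo-inverse maps the window data to the polynomial
    # coefficient a_dop; the derivative at the centre is dop! * a_dop.
    return factorial(dop) * np.linalg.pinv(J)[dop]

def sg_filter(y, nsp, pd, dop):
    """Apply the SG mode to a 1-D spectrum; window edges are dropped ('valid')."""
    h = sg_coefficients(nsp, pd, dop)
    # Reversing h turns np.convolve into a sliding dot product with h.
    return np.convolve(np.asarray(y, dtype=float), h[::-1], mode="valid")

# Example: the mode discussed in the text (PD = 4, DOP = 2, NSP = 67).
coeffs_42_67 = sg_coefficients(67, 4, 2)
```

A quick check is that `sg_coefficients(5, 2, 0)` reproduces the classic 5-point quadratic smoothing weights (−3, 12, 17, 12, −3)/35. The output of `sg_filter` is shorter than the input by NSP − 1 points because no edge padding is applied in this sketch.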

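2.5. Model Evaluation Indicator. The model evaluation indicators mainly include the root mean square error of prediction (RMSEP) and the correlation coefficient of prediction ($R_P$). They are calculated as

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n_P} (\hat{C}_i - C_i)^2}{n_P}}$$

$$R_P = \frac{\sum_{i=1}^{n_P} (\hat{C}_i - \hat{C}_{\text{Ave}})(C_i - C_{\text{Ave}})}{\sqrt{\sum_{i=1}^{n_P} (\hat{C}_i - \hat{C}_{\text{Ave}})^2 \, \sum_{i=1}^{n_P} (C_i - C_{\text{Ave}})^2}}$$

where $\hat{C}_i$ and $C_i$ are the NIR predicted value and the chemistry value of sample $i$ in the prediction set, $\hat{C}_{\text{Ave}}$ and $C_{\text{Ave}}$ are, respectively, the mean predicted value and the mean chemistry value of all samples in the prediction set, and $n_P$ is the total number of samples in the prediction set.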
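Both indicators are straightforward to compute. The sketch below assumes the squared errors are averaged over all prediction samples (some conventions divide by one less than the sample count instead); the function names `rmsep` and `r_p` are hypothetical.

```python
import numpy as np

def rmsep(predicted, reference):
    """Root mean square error of prediction (ASSUMPTION: divide by n_P)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

def r_p(predicted, reference):
    """Correlation coefficient between predicted and reference values."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dp = predicted - predicted.mean()
    dr = reference - reference.mean()
    return float((dp @ dr) / np.sqrt((dp @ dp) * (dr @ dr)))
```

A low RMSEP together with an $R_P$ close to 1 indicates that the predicted values track the chemistry values both in level and in trend.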

Results and Discussions
The NIR diffuse reflectance spectra of the 135 soil samples were collected by using the Spectrum One NTS FT-NIR spectrometer, as shown in Figure 1. The scanning spectral region was 10000-4000 cm$^{-1}$, with a resolution of 8 cm$^{-1}$, and it included 1512 spectral data points in total. Establishing the calibration models on the whole scanning spectral region by using the PLS regression method, we mainly discussed the pretreatment effects of using the two pretreatment methods of SG smoothing and MSC separately or in combination. During the discussion, we simultaneously selected the optimal SG smoothing mode by investigating the SG smoothing parameters.
To get a good classification of calibration set and prediction set, the spectral data of all the 135 soil samples were combined with the chemistry values to calculate the correlation coefficient ($R$) at each wavenumber. The $R$ corresponding to each data point is shown in Figure 2.
The chemistry values and the spectral absorbance data of all samples were pretreated by normalization, and the corresponding $C_i'$ and $|A_i'|$ of each sample were calculated.
Eight samples were identified according to $C_i'$ or $|A_i'|$. The two samples with the maximum and minimum $C_i'$ were no. 13 and no. 55, and the two samples with the maximum and minimum $|A_i'|$ were no. 84 and no. 59. These four samples were classified into the calibration set. Meanwhile, the samples with the 2nd-maximum and 2nd-minimum $C_i'$ were no. 7 and no. 60, and the two samples with the 2nd-maximum and 2nd-minimum $|A_i'|$ were no. 78 and no. 49. These four samples were classified into the prediction set.
Using the chemistry values and the spectral data at the wavenumber with $R_{\text{high}}$, the remaining samples were randomly classified a sufficient number of times. Based on the criterion SUBR < $10^{-5}$, a reasonable classification was determined, with 90 samples in the calibration set and 45 in the prediction set. The basic statistics of the chemistry values of the samples are shown in Table 1.
Using the chemistry values of SOM and the spectral data, calibration models were established for the FT-NIR analysis of SOM by the PLS regression method. An in-depth discussion was carried out on how the model prediction result is influenced by using the two pretreatment methods of SG smoothing and MSC separately or in combination. Moreover, the SG smoothing parameters were optimized in this discussion. We discussed the following 5 cases of pretreatment by comparing the PLS model prediction effects: (1) without any pretreatment (none); (2) MSC pretreatment alone (MSC); (3) SG smoothing pretreatment alone (SG smoothing); (4) MSC followed by SG smoothing (MSC + SG smoothing); (5) SG smoothing followed by MSC (SG smoothing + MSC).
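The five cases differ only in which pretreatments are applied and in what order, which can be expressed as function composition. In this sketch, `sg` and `msc` stand for any callables that map a spectra matrix to a pretreated one; the helper name `build_cases` is hypothetical.

```python
def build_cases(sg, msc):
    """Return the five pretreatment pipelines as callables on a spectra matrix."""
    return {
        "none": lambda X: X,
        "MSC": msc,
        "SG smoothing": sg,
        "MSC + SG smoothing": lambda X: sg(msc(X)),   # MSC first, then SG
        "SG smoothing + MSC": lambda X: msc(sg(X)),   # SG first, then MSC
    }

# Toy stand-ins show that the order of the two operations matters.
toy = build_cases(sg=lambda x: x + 1, msc=lambda x: x * 2)
toy_results = {name: f(3) for name, f in toy.items()}
```

Because the two operations generally do not commute, cases (4) and (5) produce different pretreated spectra and therefore have to be evaluated separately.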
In the process of SG smoothing, taking into account that much higher order derivatives would seriously distort the spectral data, which may result in the loss of information, we kept PD as the original 2, 3, 4, and 5 and employed DOP as 0, 1, 2, and 3, but focused on the expansion of NSP, which ranged from 5 to 91 (odd numbers). A total of 394 SG smoothing modes were thus designed. Each smoothing mode corresponds to one group of smoothing coefficients; the specific calculating process is not the same for each mode, and the formulae cannot be expressed uniformly. The overall amount of computation is very large: all the smoothing coefficients corresponding to the different smoothing modes must be computed, PLS models must be established on the smoothed spectral data from each smoothing mode, and the models must be optimized by tuning the $F$ of PLS. To solve this problem, we built a computing platform that includes the calculation process of each group of smoothing coefficients and the chemometric algorithm for the combined optimization of the SG smoothing parameters and the $F$ of PLS. In this way, a database for pretreatment optimization was constructed simultaneously. Based on the computing platform, the smoothing coefficients of each SG smoothing mode can be calculated online for any expanded NSP, which is more convenient for the optimization of PLS modeling.
In the latter three cases (3), (4), and (5), we optimally selected the SG smoothing polynomial pattern (SPP) and calculated the groups of smoothing coefficients corresponding to all 394 smoothing modes. Employing the PLS regression method, all 394 SG smoothing modes were combined with $F$ (varied from 1 to 40), forming a total of 394 × 40 = 15760 different SG-PLS models. By the model prediction results (mainly RMSEP and $R_P$), the optimal combination of SG smoothing mode and $F$ of PLS can be selected. The optimal PLS model prediction results and the corresponding model parameters of the 5 cases are listed in Table 2. It can be seen that the model prediction result was better after MSC pretreatment than before, and that the result was also improved by SG smoothing. Moreover, SG smoothing alone worked better than MSC alone. Combined use of SG smoothing and MSC pretreatment may provide a better result. The best pretreatment method was SG smoothing followed by MSC (i.e., SG smoothing + MSC).
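The joint search over smoothing modes and $F$ amounts to a nested loop that keeps the combination with the lowest RMSEP. The sketch below is an illustration under assumptions, not the authors' computing platform: it uses a plain NIPALS implementation of single-response PLS, and the names `pls1_predict`, `joint_optimize`, `modes`, and `pretreat` (a caller-supplied function that applies one pretreatment mode to a spectra matrix) are hypothetical.

```python
import numpy as np

def pls1_predict(X_cal, y_cal, X_new, F):
    """Fit PLS1 (NIPALS) with F latent variables on the calibration data,
    then predict the response for X_new."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    E, f = X_cal - x_mean, y_cal - y_mean
    W, P, q = [], [], []
    for _ in range(F):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # scores
        tt = t @ t
        p = E.T @ t / tt                 # spectral loadings
        c = (f @ t) / tt                 # response loading
        E = E - np.outer(t, p)           # deflate spectra
        f = f - c * t                    # deflate response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression vector in spectral space
    return (X_new - x_mean) @ B + y_mean

def joint_optimize(X_cal, y_cal, X_pred, y_pred, modes, pretreat, F_max=40):
    """Return (best RMSEP, best mode, best F) over all mode/F combinations."""
    best = None
    for mode in modes:
        Z_cal, Z_pred = pretreat(X_cal, mode), pretreat(X_pred, mode)
        # F cannot exceed the number of variables or samples available.
        F_top = min(F_max, Z_cal.shape[1], Z_cal.shape[0] - 1)
        for F in range(1, F_top + 1):
            y_hat = pls1_predict(Z_cal, y_cal, Z_pred, F)
            err = float(np.sqrt(np.mean((y_hat - y_pred) ** 2)))
            if best is None or err < best[0]:
                best = (err, mode, F)
    return best
```

Refitting the model for every $F$ keeps the sketch short; a practical implementation would extract the latent variables once per mode and reuse them for all smaller values of $F$.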
Next, we discuss the different model prediction results obtained from different SG smoothing polynomial patterns. For the latter three cases (3), (4), and (5), the RMSEP values of the optimal PLS models corresponding to the 9 different SPPs are listed in Table 3, where SPP 20 means a quadratic polynomial with 0th-order derivative, SPP 31 means a cubic polynomial with 1st-order derivative, and the rest may be deduced by analogy. By comparing the model prediction results shown in Table 3, the best pretreatment method was selected as SG smoothing followed by MSC (i.e., SG smoothing + MSC), for which the best SPP was 42 (i.e., PD = 4, DOP = 2).
Then, for the optimally selected spectral pretreatment method (SG smoothing + MSC), we discussed in depth how the changing NSP influenced the model prediction effects. Fixing the SPP as 42, the spectral data were pretreated by SG smoothing with each NSP in turn (odd numbers from 5 to 91), and PLS models were established on the smoothed spectral data. The RMSEP of the optimal PLS model corresponding to each NSP was then obtained, as shown in Figure 3. The best NSP was 67, giving the optimal RMSEP of 0.3982 (%). In addition, if the NSP was limited to at most 25, the corresponding optimal RMSEP would be 0.4317 (%), which is clearly worse than the result for NSP = 67. This indicates that the expansion of NSP is necessary in SG smoothing. The smoothing coefficients corresponding to NSP = 67 were calculated and listed in the example that was used to illustrate the calculation process of the SG smoothing coefficients. Figure 4 shows the spectra pretreated by SG smoothing followed by MSC, with SPP and NSP of 42 and 67, respectively. The optimal model was selected based on these pretreated spectral data. After the best pretreatment, the spectral data were used to establish PLS models with $F$ varied from 1 to 40, giving the model prediction results shown in Figure 5; the best $F$ was 7, with the corresponding RMSEP of 0.3982 (%). In summary, for the PLS model for FT-NIR analysis of soil organic matter, the best pretreatment method was SG smoothing (SPP 42 and NSP = 67) followed by MSC, and the corresponding optimal $F$ of PLS was 7. The selected optimal model with the best pretreatment method provided the NIR predicted values of SOM of the 135 samples. Comparing the NIR predicted values with the measured chemistry values (see Figure 6), the correlation coefficient of prediction was 0.8862, and the RMSEP was 0.3982 (%).
The model prediction result was good, and the precision was acceptable. This indicates that the optimal selection of pretreatment for NIR analysis can effectively reduce the noise, accordingly enhancing the prediction accuracy of the PLS model, and that by pretreatment optimization, NIR spectroscopy analysis can be effectively applied to the detection of soil organic matter content.

Conclusions
In this paper, taking the FT-NIR spectroscopy analysis of soil organic matter as an example, we discussed the influence that separate or combined use of the SG smoothing and MSC pretreatment methods has on the FT-NIR modeling effects by establishing PLS models for quantitative analysis. In SG smoothing, we emphasized expanding the NSP and calculating the smoothing coefficients corresponding to each NSP. Furthermore, the NSP selection and the $F$ of PLS were jointly optimized, with the goal of improving the model prediction accuracy. The results showed that whether or not SG smoothing and MSC are used does lead to different results in NIR spectroscopy analysis. Moreover, when SG smoothing and MSC are both employed, the order in which they are applied still influences the model prediction effects. The optimal model was the PLS regression with SG smoothing followed by MSC pretreatment (SG smoothing + MSC), in which the SG smoothing parameters were a 4th-degree polynomial, 2nd-order derivative, and 67 smoothing points; the corresponding best number of PLS latent variables was 7. The RMSEP and $R_P$ of this optimal model were 0.3982 (%) and 0.8862, respectively. The result was far better than that of models without any pretreatment. This suggested that with the optimal selection of pretreatment methods, the FT-NIR analysis of soil organic matter could provide good predicted values with high prediction correlation and low prediction error relative to the chemistry values measured by potassium dichromate oxidation. The optimal selection of pretreatment for NIR analysis can effectively reduce the noise, accordingly enhancing the prediction accuracy of the PLS model.
Table 3: RMSEP of the optimal PLS model corresponding to the 9 different SPPs in the latter 3 cases of SG smoothing, MSC + SG, and SG + MSC.
The combined optimization of the SG smoothing and MSC pretreatment methods could clearly improve the model prediction result for NIR spectroscopy analysis of soil organic matter. The computing platform for the combined optimization of SG smoothing and MSC can also be tried in NIR analysis of other analytes.