Quantitative Estimation of Organic Matter Content in Arid Soil Using Vis-NIR Spectroscopy Preprocessed by Fractional Derivative

Soil organic matter (SOM) content is an important index of soil function and soil quality. However, conventional studies on the estimation of SOM content have relied on the classic integer-order derivatives of spectral data, while fractional derivative information has been ignored. In this research, a total of 103 soil samples were collected in the Ebinur Lake basin, Xinjiang Uighur Autonomous Region, China. After the Vis-NIR (visible and near-infrared) spectra and SOM content were measured in the laboratory, the raw reflectance (R) and absorbance were processed by fractional derivatives from order 0 to order 2 (order interval 0.2). Partial least squares regression (PLSR) was applied for model calibration, and five commonly used precision indices were used to assess the performance of the resulting 22 models. The results showed that, as the order rose, these indices increased or decreased with fluctuations rather than monotonically and reached their optimal values at a fractional order. The most robust model was calibrated on the 1.8-order derivative of R, with the lowest RMSEC (3.35 g kg⁻¹) and RMSEP (2.70 g kg⁻¹) and the highest R²c (0.92), R²p (0.91), and RPD (3.42 > 3.0). This model had excellent predictive performance for estimating SOM content in the study area.


Introduction
Soil organic matter (SOM) content is an important index of soil function and soil quality, and measuring it is an important way to understand local soil fertility [1,2]. SOM is significantly correlated with soil organic carbon (SOC). Soil is one of the largest carbon sinks on earth and plays a major role in the global carbon cycle [3,4]. More seriously, with global warming, intensified human activities, and other factors, the loss of SOC has become more severe, especially in arid and semiarid regions [5,6]. SOC storage capacity and carbon sequestration through SOM management have attracted considerable attention in recent years.
Traditionally, SOM is determined in the laboratory by volumetric and combustion methods. These methods are generally laborious, and they carry a risk of air contamination during the procedure. Because of its high efficiency, low cost, large-scale coverage, nondestructive nature, and rapid data acquisition, remote sensing has proved to be a promising tool to strengthen or complement traditional methods [7,8], and it provides a fresh approach for the quantitative study of SOM content. Owing to its high spectral resolution, convenience, and controllability, laboratory Vis-NIR spectroscopy of soil is particularly popular. Precisely because of the significant quantitative relationship between SOM and SOC, estimating SOM content by remote sensing has proved to be a feasible way to assess local SOC storage. Many studies have shown that SOM has a unique spectral response in the Vis-NIR (visible and near-infrared) bands [9][10][11][12]. To some extent, soil spectral reflectance reflects the SOM content. Rapid detection of SOM content can therefore be carried out using raw spectral reflectance data (R) and its mathematical transformations.
Pretreatment of Vis-NIR spectra is necessary and effective for improving the accuracy of spectral estimation models. To remove the effect of soil moisture from the spectra, Minasny et al. preprocessed the measured raw spectral data using the EPO (external parameter orthogonalisation) algorithm, which improved the stability and accuracy of an SOC prediction model in southern New South Wales, Australia [13]. Using spectral reflectance pretreated by Savitzky-Golay (SG) smoothing, first derivative with SG smoothing (FD), and other mathematical methods, the predictive performance of the support vector machine regression (SVMR) model was improved [14]. These spectral inversion models of SOM are mainly based on R and some common pretreatment methods, that is, the reciprocal (1/R), logarithm (lgR), reciprocal of the logarithm (1/lgR), and square root (√R), together with their first- or second-order derivatives. As a high-dimensional data source carrying massive information, raw spectra are generally pretreated by conventional integer-order derivatives, but this may hamper the detection of effective information and cause some loss of spectral information, which in turn constrains modeling accuracy. The fractional-order derivative broadens the concept of the classic integer-order derivative [15,16]. Owing to its better accuracy and higher efficiency, it has been widely used in system control and diagnosis, digital filtering, signal and image processing, and other related fields [17][18][19][20]. In spectral analysis, this algorithm has been introduced for the pretreatment of spectral reflectance of saline soil in the Ebinur Lake basin [21]. That research demonstrated that the fractional derivative algorithm is well suited to extracting potential spectral information from soil Vis-NIR spectra in arid desert regions.
Desert soil is a typical soil in the Ebinur Lake basin of Northwest China. The Ebinur Lake basin is a typical lake wetland in arid areas. Few studies have applied fractional derivatives to the Vis-NIR spectroscopy of desert soil. In this regard, and motivated by the previous research, the objective of this study was to use laboratory Vis-NIR spectra processed by the fractional derivative algorithm, combined with SOM content data, to establish a prediction model with better accuracy and stability than existing models.

2.1. Study Area.
The study area is located in the Ebinur Lake basin (82°36′~83°10′E, 44°30′~45°09′N) in the southwest of the Junggar Basin, Xinjiang Uighur Autonomous Region, China. The basin is surrounded by mountains on three sides: north, west, and south [22,23]. This region is a major dust-prevention zone in the ecological protection system of the northern slope of the Tianshan Mountains. Owing to the arid desert climate, the annual precipitation in the Ebinur Lake basin is approximately 102 mm, whereas the potential evaporation can reach 1447 mm. The annual average temperature ranges from 6.6 to 7.8°C. Strong winds are also typical in this region [24]. The main geomorphic types are stone desert, gravel desert, salt desert, swamp, and so on. The soil types are mainly piedmont psephitic and gypsum desert soil [25,26]. The Ebinur Lake basin is a typical closed oasis system in the inland arid areas and an integrated region composed of wetland, hydrology, and human activities.
2.2. Soil Sample Collection and Chemical Analysis. Considering the typical landscape features of the study area, such as oasis and desert, together with the soil condition and site accessibility, we set up 103 sites (30 × 30 m square area, 5 samples per site). In every measuring unit, the coordinates of each sample point were recorded by GPS (Figure 1). Each soil sample (about 0.5 kg) was put into a water-tight bag, sealed, numbered, and then brought back to the laboratory. A total of 103 topsoil samples (depth 0~20 cm) were obtained from the Ebinur Lake basin of Xinjiang Uighur Autonomous Region, China, from 18 to 29 May 2015. To reduce the effect of water content, all samples were thoroughly air dried and then crushed and passed through a 2 mm sieve to remove stones, plant residue, and other impurities. Every sample was divided into two equal parts, one for soil chemical analysis and one for spectral reflectance measurement in the laboratory. The potassium dichromate method was used to determine SOM.
2.3. Laboratory Reflectance Measurement. All air-dried soil samples were individually placed in wide round containers with a diameter of 12 cm and a depth of 1.8 cm (a thickness of 1.5 cm is considered optically infinite for soil). To avoid contamination during the measurement, the containers had been painted black beforehand [27]. The surfaces were scraped level with a plastic ruler to ensure a uniformly flat measuring surface, because pressing can affect the porosity of the soil and result in false measurements [28,29]. To control the light conditions, the reflectance spectra of all soil samples were measured in a dark laboratory with an ASD FieldSpec®3 portable spectrometer (Analytical Spectral Device, Boulder, CO, USA). The sampling intervals of this spectrometer are 1.4 nm (350~1000 nm) and 2 nm (1000~2500 nm), while the resampling interval is 1 nm. A 50 W halogen lamp served as the light source, shining 8° from vertical and placed 50 cm above each soil sample surface. The optical sensor was installed 15 cm from the surface of each soil sample at a 30° zenith angle. Each reflectance measurement was calibrated against a standardized plate with 100% reflectance to ensure accuracy [30]. For each soil sample, twenty spectral curves were collected, and their mean was taken as the final reflectance.

2.4. Spectral Processing and Data Analysis. The real sample information was inevitably contaminated by instrument noise [31,32]. To reduce the noise, ViewSpecPro software version 6.0 was used to correct the breakpoints and remove the marginal wavebands with large noise (350~400 nm and 2401~2500 nm). SG smoothing (polynomial order of 2 and frame size of 5) was applied to the 103 spectral curves with OriginPro version 9.0.0. The processed spectra constituted the final data for further analysis and are shown in Figure 2.
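The SG smoothing step can be illustrated with a minimal sketch. It assumes the classic 5-point, second-order convolution weights (−3, 12, 17, 12, −3)/35; the function name, the choice to leave the two points at each end unchanged, and the sample values are assumptions for illustration and do not describe how OriginPro handles edges.

```python
def savgol_smooth_5pt(values):
    """Smooth a 1-D sequence with a 5-point, 2nd-order Savitzky-Golay filter.

    Interior points use the convolution weights (-3, 12, 17, 12, -3) / 35.
    The two points at each end are left unchanged (an assumption made here
    for simplicity; other software may treat the edges differently).
    """
    weights = (-3.0, 12.0, 17.0, 12.0, -3.0)
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * x for w, x in zip(weights, window)) / 35.0
    return smoothed


# Hypothetical reflectance values at consecutive 1 nm bands (not from the paper).
spectrum = [0.20, 0.22, 0.21, 0.25, 0.24, 0.27, 0.26]
print(savgol_smooth_5pt(spectrum))
```

A second-order filter reproduces any quadratic trend exactly in the interior, so it suppresses band-to-band noise while preserving smooth curvature in the spectrum.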
Fractional calculus is a branch of mathematics that generalizes the classic integer-order derivative to arbitrary (noninteger) order [31,32]. The fractional derivative has several definitions, namely Grünwald-Letnikov (G-L), Riemann-Liouville (R-L), and Caputo. Because of its lower computational cost, the G-L definition was applied in this research. In this definition, v denotes the derivative order; at order zero the data are left unprocessed by the algorithm.
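For reference, the G-L definition with a unit step (here the 1 nm resampling interval) is commonly written as the weighted backward sum below. This is the standard form, offered on the assumption that it matches the definition used in this study; the symbols f(λ) for the spectrum at wavelength λ, n for the number of preceding bands included, and w_k for the weights are notation introduced for illustration.

```latex
\frac{d^{v} f(\lambda)}{d\lambda^{v}} \approx \sum_{k=0}^{n} w_k^{(v)}\, f(\lambda - k),
\qquad
w_0^{(v)} = 1,
\qquad
w_k^{(v)} = (-1)^{k} \binom{v}{k} = w_{k-1}^{(v)}\, \frac{k - 1 - v}{k}
```

Written out, the leading terms are f(λ) + (−v) f(λ−1) + [(−v)(−v+1)/2] f(λ−2) + ..., which reduces to the ordinary first and second backward differences at v = 1 and v = 2.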
Commonly, lg(1/R) spectra are used in spectral analysis because they represent absorbance. To obtain more modeling results and to mitigate nonlinear relations, the smoothed and denoised reflectance data were transformed to absorbance (lg(1/R)). According to (1), the fractional derivatives of R and of its absorbance from order 0 to order 2 (order interval 0.2) were computed on the Eclipse platform.
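The computation described above can be sketched as follows, assuming the G-L backward-difference form with a unit (1 nm) step and weights (−1)^k C(v, k). The function names and the sample reflectance values are hypothetical, and the sketch is not the Eclipse implementation used in the study.

```python
import math


def gl_weights(order, n_terms):
    """Return the first n_terms Grunwald-Letnikov weights (-1)^k * C(order, k)."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - order) / k)
    return weights


def gl_derivative(spectrum, order):
    """Fractional derivative of a spectrum sampled at unit (1 nm) spacing.

    Each output value is a weighted sum of the current band and all
    preceding bands, so the first few bands use a truncated history.
    """
    weights = gl_weights(order, len(spectrum))
    return [
        sum(weights[k] * spectrum[i - k] for k in range(i + 1))
        for i in range(len(spectrum))
    ]


def absorbance(reflectance):
    """Convert reflectance R to absorbance lg(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


# Orders 0, 0.2, ..., 2.0: 11 orders, applied to R and to lg(1/R), give 22 inputs.
orders = [round(0.2 * i, 1) for i in range(11)]

# Hypothetical reflectance values (not from the paper).
reflectance = [0.20, 0.22, 0.25, 0.29, 0.34]
for v in (0.0, 1.0, 1.8):
    print(v, gl_derivative(reflectance, v))
```

At order 0 the weights collapse to (1, 0, 0, ...), so the spectrum is returned unchanged, which matches the statement that zeroth-order data are not processed.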

2.5. Data Modeling and Validating. Owing to its advantages in dimension reduction, synthesis, and handling collinearity among independent variables, partial least squares regression (PLSR) has proved to be a robust and reliable approach in quantitative spectral research [7][33][34][35]. For modeling, the benefit of PLSR is that it uses the wavelengths in the selected range to arrive at a prediction equation that emphasizes wavelengths highly correlated with the analyte and gives little weight to nonpredictive wavelengths. To take full advantage of the spectral reflectance, all wavelengths from 401 to 2400 nm were used in model calibration by PLSR. The samples were ranked from highest to lowest, and the calibration set (n = 69) and the validation set (n = 34) were then selected from this ranking at equal intervals for calibration and precision testing.
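The split and the regression can be sketched as below. Several points are assumptions rather than details given in the text: that samples are ranked by SOM content, that every third ranked sample goes to validation (which yields 69 and 34 from 103), and that a single-response NIPALS variant of PLSR is used. The data are synthetic stand-ins, and the function names are hypothetical.

```python
import numpy as np


def equal_interval_split(y, step=3):
    """Rank samples from highest to lowest y; every step-th one goes to validation."""
    ranked = np.argsort(y)[::-1]
    is_val = (np.arange(len(y)) % step) == step - 1
    return ranked[~is_val], ranked[is_val]


def pls1_fit(X, y, n_components):
    """Fit single-response PLS by NIPALS; return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance with the response
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


# Synthetic stand-ins: 103 "samples", 4 "bands", noise-free linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(103, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 4.0

cal, val = equal_interval_split(y)
coef, intercept = pls1_fit(X[cal], y[cal], n_components=4)
pred = X[val] @ coef + intercept
print(len(cal), len(val))
```

With real spectra the number of latent variables would be far smaller than the roughly 2000 bands and is usually chosen by cross-validation; the text does not state how it was chosen here.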
The performance of the estimation models was assessed by five indices: the coefficient of determination and root mean square error of calibration (R²c, RMSEC), their counterparts in prediction (R²p, RMSEP), and the ratio of performance to deviation (RPD). Optimal models have high values of R²c, R²p, and RPD but low RMSEC and RMSEP. Generally, 1.5 < RPD ≤ 2.0 indicates that the model can only make a rough estimate of high and low levels of SOM; 2.0 < RPD ≤ 2.5 indicates a better predictive ability; 2.5 < RPD ≤ 3.0 indicates a very good predictive ability; and RPD > 3.0 indicates excellent predictive performance [12,36]. All of the above indices were calculated with MATLAB software version R2012a (MathWorks, Natick, MA, USA). The final results were used to assess the performance of the models.
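A minimal sketch of these indices follows. It assumes that R² is computed as 1 − SSres/SStot and that RPD is the sample standard deviation of the measured values divided by the RMSE of the same set; definitions vary between studies, and the text does not spell them out. Function names and example values are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Sample standard deviation of measured values divided by the RMSE."""
    mean_m = sum(measured) / len(measured)
    sd = math.sqrt(sum((m - mean_m) ** 2 for m in measured) / (len(measured) - 1))
    return sd / rmse(measured, predicted)


def rpd_class(value):
    """Map an RPD value to the qualitative classes quoted in the text."""
    if value > 3.0:
        return "excellent"
    if value > 2.5:
        return "very good"
    if value > 2.0:
        return "better"
    if value > 1.5:
        return "rough estimate of high and low levels only"
    return "below the quoted ranges"


# Hypothetical measured and predicted SOM values in g/kg (not from the paper).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(rmse(measured, predicted), r_squared(measured, predicted), rpd(measured, predicted))
```

Applied to a calibration set these functions give RMSEC and R²c; applied to a validation set they give RMSEP, R²p, and RPD.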

3.1. Statistical Analysis of SOM Content.
The descriptive statistics of the SOM content of the whole dataset, the calibration set, and the validation set are presented in Table 1. Compared with the range of SOM content (0.68-78.39 g kg⁻¹) shared by the whole dataset and the calibration set, the validation set had a narrower range of 4.79-39.16 g kg⁻¹ because it contained fewer soil samples. The average SOM content and the coefficient of variation of the whole set were 21.43 g kg⁻¹ and 50.46%, respectively, both lying between the corresponding values of the calibration and validation sets, and the descriptive statistics of the calibration and validation sets were similar to the six parameters of the whole set. Thus, the SOM content of the calibration and validation sets could represent that of the whole dataset sufficiently.
3.2. Reflectance of Different Soil Organic Matter Content. In the visible region, absorption bands related to soil color arise from electron excitations, which assist the measurement of SOM; SOM content and spectral reflectance are correlated [12,37]. To examine the relationship between SOM content and the spectral reflectance of the corresponding soil sample, five representative soil samples with different SOM contents were selected for plotting. The plot showed that SOM contents of 0.68 g kg⁻¹ and 78.39 g kg⁻¹ corresponded to the highest and lowest reflectance, respectively. The spectral curves of soil samples with different organic matter contents had similar shapes and slopes, with three obvious absorption features located near 1400, 1900, and 2200 nm (Figure 3). The absorption peak at 1400 nm is a typical water absorption band associated with the bending and stretching of the O-H bonds of free water. The regions near 1900 and 2200 nm […] and stretching vibration of Al-OH and Mg-OH, respectively [38][39][40]. From 401 to 760 nm, reflectance increased sharply with increasing wavelength. Reflectance gradually decreased and tended to flatten between 760 and 1900 nm. The second spectral absorption peak was measured around 1900 nm. Reflectance changed sharply as the wavelength increased from 1900 to 2400 nm. There was a negative correlation between soil spectral reflectance and SOM content in the range of 401-2400 nm; that is, soil reflectance increased as SOM content decreased and vice versa. Soils with different SOM contents could be distinguished by their reflectance across the spectrum range: although the spectral curves of the five soil samples overlapped in some sections, they could be discriminated approximately from 400 to 600 nm and from 1900 to 2000 nm. These results are consistent with previous research [30,41,42].

3.3. Model Calibration and Validation.
Model calibration with all wavelengths takes advantage of the whole spectral information of the reflectance. Derivative pretreatment can effectively eliminate the effect of background noise on the target spectrum and highlight the spectral characteristics of the analyte. In this research, the raw spectral reflectance and the corresponding absorbance data, both pretreated by the fractional derivative algorithm, were used for model calibration. With the order interval set to 0.2, a total of 22 inversion models were built by PLSR. The five performance indices of calibration and validation are summarized in Tables 2 and 3. For R, from order 0 to order 1, the performance of the models did not increase significantly; the highest R²c, R²p, and RPD were only 0.41, 0.28, and 1.18, respectively, and the indices did not reach their maxima at the same order. The five performance indices improved slightly as the order increased from 1 to 1.6. The RMSEC and RMSEP of the model based on the 1.6-order derivative were 8.84 and 6.47 g kg⁻¹, respectively. When the order reached 1.8, the performance of the model improved markedly, with the lowest RMSEC (3.35 g kg⁻¹) and RMSEP (2.70 g kg⁻¹) and the highest R²c (0.92), R²p (0.91), and RPD (3.42 > 3.0). As the order increased to 2, the capability of the model decreased slightly.
The variation trend of the absorbance models built by PLSR was similar to that of the R models from order 0 to order 1. The RMSEC and RMSEP of these 6 models remained high, that is, the error was significant. When the order was greater than 1, R²c and R²p increased sharply and reached their highest values at order 1.8. The stability and accuracy of this model were the best, with the lowest RMSEC (3.06 g kg⁻¹) and RMSEP (3.06 g kg⁻¹). The sensitivity of the spectrum to SOM and the stability and accuracy of the models were enhanced. For both R and absorbance, the model based on the 1.8-order derivative had the best prediction accuracy.
After repeated screening for excellent predictive performance, 2 models gave acceptable results with RPD > 3: the R model and the absorbance model based on the 1.8-order derivative. Among the 22 models, the single best model was the one built on the 1.8-order derivative of R, which showed comparatively high values of R²c, R²p, and RPD but low RMSEC and RMSEP. The coefficients of all bands and the constant term are shown in Figure 4. The scatter plot of measured and predicted SOM content for the optimal model is shown in Figure 5. R² between measured and predicted values reached 0.91 in both the calibration and validation sets, and the performance indices as a whole indicated that the model based on Vis-NIR spectra treated by fractional derivative could be used to predict SOM content in the Ebinur Lake basin.

Discussion
Owing to the massive information, continuous bands, and high resolution of spectral reflectance, measured spectra are easily affected by individual differences (the particle size of samples, the angle of the light source, the condition of the analyte, etc.) and by substantial noise [43,44]. Therefore, appropriate pretreatment should be applied to minimize the irrelevant and useless information in the spectra and to increase the correlation between the spectra and the measured values. The usual pretreatment methods for soil spectra mainly include smoothing, denoising, normalization, derivative processing, and multiplicative scatter correction [45]. For derivative processing, first and second derivatives are popular [46]. The first derivative can reduce the effect of linear or near-linear background noise. The second derivative can weaken baseline drift in the spectra.
The 1st and 2nd derivatives represent the slope and curvature of spectral curves, respectively. Although the explicit spectral meaning of the fractional derivative has not yet been clarified, its nonlocal and hereditary (memory) characteristics are widely recognized. It has been suggested that fractional derivatives between order 0 and order 2 can be interpreted in terms of sensitivity to the slope and curvature of spectral curves: the derivative value becomes more sensitive to the slope and less sensitive to the reflectance as the order increases from 0 to 1, and more sensitive to the curvature and less sensitive to the slope as the order increases from 1 to 2 [15]. For the R models in this research, RPD and the other indices of the regression models did not increase or decrease monotonically with increasing order; the change fluctuated. They achieved their optimal values at a fractional order (1.8). These indices did not continue to improve at order 2 as might be expected; instead, the capability of the model decreased slightly. The sensitivity of the spectrum to SOM was enhanced by the pretreatment, as revealed by RPD, R²c, R²p, RMSEC, and RMSEP. In conventional research based on integer-order derivatives, these intermediate details are ignored, which may conceal better models.
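A short worked example may make this interpretation more concrete. It assumes the G-L backward-difference form with a unit (1 nm) step, in which the derivative at wavelength λ is a weighted sum of the current and preceding bands with weights w_0 = 1 and w_k = w_{k−1}(k − 1 − v)/k; the notation f(λ) for the spectrum is introduced here for illustration. The leading terms for orders 1, 1.8, and 2 are then:

```latex
\begin{aligned}
v = 1:   &\quad f(\lambda) - f(\lambda-1) \\
v = 1.8: &\quad f(\lambda) - 1.8\, f(\lambda-1) + 0.72\, f(\lambda-2) + 0.048\, f(\lambda-3) + 0.0144\, f(\lambda-4) + \cdots \\
v = 2:   &\quad f(\lambda) - 2\, f(\lambda-1) + f(\lambda-2)
\end{aligned}
```

At order 1.8 the first three weights are close to those of the second difference (1, −2, 1), so the result is dominated by curvature, while the slowly decaying tail keeps a small contribution from every earlier band. That tail is one way to read the nonlocal character of the fractional derivative, and it vanishes only at integer orders.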
Soil spectral reflectance differs with parent material and soil type [47]. Shi et al. compared the correlations between reflectance and SOM in different soil types, such as limestone soils and red soils; the results showed that the reflectance at wavelengths from 580 to 820 nm could be used to predict SOM content [12]. Liu et al. confirmed that the reflectance in the range of 620-810 nm was related to SOM, with the maximum correlation coefficient found at 710 nm [48]. An SOM content of 2% acts as a boundary: when SOM content exceeds 2%, SOM plays a principal role in masking out the spectral features, whereas when SOM content is less than 2%, it becomes less effective [30,37,49]. It is therefore hard to estimate the SOM content of desert soil precisely when it is less than 2%. In this research, the spectral reflectance displayed a higher correlation with SOM content in the range of 600-900 nm (Figure 4). Our results are consistent with the findings of the above studies. For higher SOM contents, prediction based on Vis-NIR spectroscopy has been widely researched in the black-soil region [7,30,50]. However, such applications are relatively few in arid and semiarid desert soils. Yang et al. reported an optimal model for estimating SOM content in the brown calcic soil region of Xinjiang, with R² = 0.89 and RMSE = 0.32 [50]. Nawar et al. used multivariate adaptive regression splines with first-derivative reflectance data to predict SOM in El-Tina Plain, Egypt; the values of R²p and RPD reached 0.76 and 1.98, respectively [51]. Compared with previous research, this study went beyond single-band reflectance and extracted more potential spectral information by using the fractional derivative. This reduced the loss of information and detailed the variation trends of the 5 accuracy indices for the R and absorbance models across 11 derivative orders.
Models based on feature bands utilize only part of the abundant spectral information across all wavelengths, and the number of bands is very limited. Current selection methods choose feature bands by a significance test at the 0.01 level. This may miss some suboptimal bands and lead to the loss of important spectral information. A PLSR model with all wavelengths considers the spectral parameters of every wavelength in the whole spectral region. Owing to the advantages of PLSR, problems such as few samples, many independent variables, and multiple correlations between variables can be handled effectively. In addition, with the introduction of the fractional-order algorithm, fractional-order spectral information related to SOM, which was previously ignored, is released. Thus, the performance of the estimation model is improved to some extent. The Ebinur Lake basin is a typical arid and semiarid region. Our research could enrich Vis-NIR spectroscopy studies of SOM and provide a new perspective for estimating SOM content in such special areas, where the organic matter content of desert soil is less than 2% by mass.
The characteristics of soil reflectance spectra are not only directly related to SOM and water content but also follow obvious rules of regional differentiation. The Ebinur Lake basin is also a representative area of severe salinization. For the SOM prediction model, salt content and texture may affect the accuracy to some degree. For better precision, the next step is to distinguish the features of salt and SOM in spectral reflectance curves.

Conclusion
Pretreatment of Vis-NIR spectra is necessary and effective for improving the accuracy of spectral estimation models. In this research, the fractional derivative algorithm was employed for pretreatment to determine the most accurate model for SOM content in the Ebinur Lake basin. We found that all five performance indices, that is, R²c, R²p, RMSEC, RMSEP, and RPD, did not increase or decrease monotonically with increasing order. As the order rose, these indices increased or decreased with fluctuations and reached their optimal values at a fractional order. Among the 22 models compared, the most robust model was calibrated on the 1.8-order derivative of R, with the lowest RMSEC (3.35 g kg⁻¹) and RMSEP (2.70 g kg⁻¹) and the highest R²c (0.92), R²p (0.91), and RPD (3.42 > 3.0). This model had excellent predictive performance for estimating SOM content in the study area.

Figure 1: Distribution of all sampling sites and the location of the study area.

Figure 3: Spectral reflectance of soils with different organic matter content in the Ebinur Lake basin (g kg⁻¹).

Figure 4: The coefficients of all bands and the constant term.

Figure 5: Comparison of measured SOM content and estimated values of modeling sample (a) and testing sample (b) through R 1.8-order derivative.

Table 1: The statistical characteristics of organic matter content of soil samples (g kg⁻¹).

Table 2: Performance statistics of R model for calibration set and validation set based on PLSR.

Table 3: Performance statistics of absorbance model for calibration set and validation set based on PLSR.