Quantitative Estimating Salt Content of Saline Soil Using Laboratory Hyperspectral Data Treated by Fractional Derivative

Most current research on estimating soil salinity from hyperspectral data has focused on spectral reflectance or its integer derivatives and has ignored the fractional derivative information in the data. To address this gap, the Ebinur Lake basin, a severely salinized area on the northwest border of the Xinjiang Uygur Autonomous Region, China, was selected as the study area. Field work was conducted from 15 to 25 October 2014, and a total of 180 soil samples were collected from 45 sampling sites. After the soil salt content and spectral reflectance were measured in the laboratory, the derivative order range from 0 to 2 was divided into 11 orders (interval 0.2), and the hyperspectral data were treated by 4 kinds of mathematical transformations and 11 orders of fractional derivatives. Combined with the soil salt content, partial least squares regression was applied for model calibration and prediction, and several indexes were used to evaluate model performance. The results showed that the retrieval model built from 250 bands based on the 1.2-order derivative of $1/\lg R$ had an excellent capacity for estimating soil salt content in the study area ($\mathrm{RMSE}_C$ = 14.685 g/kg, $\mathrm{RMSE}_P$ = 14.713 g/kg, $R_C^2$ = 0.782, $R_P^2$ = 0.768, and RPD = 2.080). This study provides an application reference for quantitative estimation of other land surface parameters and for other applications of hyperspectral technology.


Introduction
Soil salinization is one of the most common and serious environmental problems worldwide and is considered one of the main paths to land desertification [1]. Because of high evaporation and a high groundwater table with relatively high soluble salt content [2], it often occurs in fragile arid and semiarid regions and causes productivity loss in irrigated farmlands [3]. On the global scale, approximately 20% of irrigated lands face a severe threat of salinization, and this figure will increase under great population pressure [4].
Faced with such large amounts of salt-affected land, timely detection and assessment of soil salinization are therefore necessary and urgent for sustainable development [5]. However, conventional methods often require intensive field investigations that are restricted by limited funds and labor; thus, they cannot meet the need for salinization monitoring over large areas [6]. Because of its low cost, rapid data acquisition, and large area coverage [7], remote sensing (especially hyperspectral remote sensing) is a promising tool to substitute for or complement traditional methods and provides an overview of salinization on different spatial scales. Hyperspectral techniques have been successfully used for quantitative analysis of several indexes of soil salinization [8][9][10][11].
Among spectral analytical methods, derivative spectroscopy is a powerful mathematical tool that provides more useful information than untreated spectral data [12]. It is well known that the first derivative is the slope of the spectral curve and the second derivative is the change in slope, that is, the curvature of the spectral curve [13,14]. Derivative analysis of hyperspectral data has been widely used to eliminate background noise, reduce baseline effects, resolve overlapping features, sharpen spectral features, capture subtle details of spectral curves, and increase the estimation accuracy of land surface parameters [15][16][17][18][19][20]. However, derivative spectroscopy has some disadvantages, such as spectral information loss and amplification of high-frequency noise [16].
As a more general case, the fractional derivative is similar to the integer derivative but extends the order of the derivative to arbitrary values [21]. There have been some successful applications to hyperspectral datasets treated by fractional derivative. Schmitt [22] introduced this method and applied it to laboratory-measured near-infrared spectra of a liquid mixture of hemoglobin and milk. He found that the fractional derivative affords more flexibility than the integer derivative because the order can be adjusted to reduce baseline offsets and minimize high-frequency noise. Tong et al. [23] used the fractional order Savitzky-Golay derivative (FOSGD) and stability competitive adaptive reweighted sampling (SCARS) on simulated, diesel, and Honghe tobacco spectral datasets to improve the performance of the multivariate calibration model. They found that FOSGD balances the trade-off between resolution and signal strength better than the integer derivative.
However, there are few studies on fractional derivatives applied to hyperspectral data of saline soil. Motivated by the previous works, this research uses laboratory hyperspectral data treated by fractional derivative, combined with soil salt content data, to build a regression model with better estimation accuracy. Specifically, the study aims to (1) analyze the relationship between soil salt content and hyperspectral data treated by fractional derivative, (2) develop a quantitative model for salt content estimation based on different fractional derivatives, and (3) examine how several parameters of the retrieval models vary as the order increases.

Fractional Derivative Spectrometry Method
Fractional calculus is a branch of mathematics with a history dating back to the end of the 17th century; it generalizes the classic integer derivative to arbitrary (noninteger) order [21,[24][25][26]. The fractional derivative has been widely used in some engineering fields because models described by fractional derivatives have better accuracy and higher efficiency than those built on integer derivatives [27].
Although the fractional derivative has a long history and many successful applications, its mathematical definition has not yet been unified [28]. The most popular and frequently used definitions are Grünwald-Letnikov (G-L), Riemann-Liouville (R-L), and Caputo [26]. Because it is less complex than the others [21], the G-L definition was employed in this study.
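The $v$-order G-L fractional derivative of a function $f(x)$ on the interval $[a, t]$ is defined as

$$d^{v} f(x) = \lim_{h \to 0} \frac{1}{h^{v}} \sum_{m=0}^{[(t-a)/h]} (-1)^{m} \frac{\Gamma(v+1)}{m!\,\Gamma(v-m+1)} f(x - mh), \tag{1}$$

where $h$ is the step length, $[(t-a)/h]$ is the integer part of $(t-a)/h$, and the Gamma function is defined as follows [29]:

$$\Gamma(z) = \int_{0}^{\infty} e^{-u} u^{z-1}\, du. \tag{2}$$

Because the instrument for hyperspectral measurement used in this study has a resampling spectral resolution of 1 nm, we set $h = 1$, and (1) becomes

$$\frac{d^{v} f(x)}{dx^{v}} \approx f(x) + (-v) f(x-1) + \frac{(-v)(-v+1)}{2} f(x-2) + \cdots + (-1)^{n} \frac{\Gamma(v+1)}{n!\,\Gamma(v-n+1)} f(x-n). \tag{3}$$

Therefore, (3) can be regarded as the numerical algorithm for calculating the fractional derivative of hyperspectral data [30,31], and the zeroth order means that the hyperspectral data are not treated by the derivative algorithm.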
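As an illustration of how the unit-step G-L sum can be evaluated on a sampled spectrum, the following is a minimal Python sketch. It is not the authors' implementation; the function names `gl_coefficients` and `gl_fractional_derivative` are hypothetical, and the sketch assumes the spectrum is a plain list of reflectance values at 1 nm spacing. The weights $(-1)^m \binom{v}{m}$ are generated by a recurrence so that the Gamma function never has to be evaluated at its poles.

```python
def gl_coefficients(order, n_terms):
    """Return the first n_terms Grunwald-Letnikov weights (-1)^m * C(order, m)."""
    weights = [1.0]
    for m in range(1, n_terms):
        # Recurrence: w_m = w_{m-1} * (m - 1 - order) / m
        weights.append(weights[-1] * (m - 1 - order) / m)
    return weights


def gl_fractional_derivative(spectrum, order):
    """Apply the unit-step (h = 1) G-L sum to a list of band values."""
    weights = gl_coefficients(order, len(spectrum))
    result = []
    for i in range(len(spectrum)):
        # Band i uses itself and every band at a shorter wavelength,
        # so the first few bands are computed from a truncated history.
        total = 0.0
        for m in range(i + 1):
            total += weights[m] * spectrum[i - m]
        result.append(total)
    return result


if __name__ == "__main__":
    toy_spectrum = [1.0, 3.0, 6.0, 10.0]  # made-up values for illustration
    for v in (0.0, 0.6, 1.0, 2.0):
        print(v, gl_fractional_derivative(toy_spectrum, v))
```

With this sketch, order 0 returns the input unchanged, and orders 1 and 2 reduce to the backward first and second differences, which matches the interpretation of the zeroth order given above.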

Field Campaign and Laboratory Experiment.
The field work of this study was conducted from 15 to 25 October 2014, and a total of 180 soil samples (depth 0 to 20 cm) were collected from 45 sites (30 × 30 m square area, 4 samples at each site) (Figure 1). Before sampling, a handheld global positioning system (GPS) receiver was used to record the coordinates of the sampling site, and landscape photographs of the site were taken. Each soil sample (about 1 kg) was put into a plastic bag, sealed, labeled, and brought back to the laboratory. After being thoroughly air dried, all soil samples were crushed and passed through a 2 mm sieve to remove stones, weed roots, and other impurities. Each sample was divided into two equal parts for further analysis (soil property analysis and laboratory spectroscopy measurement, respectively). The soil salt content and pH value of the 180 soil samples were determined with a WTW inoLab Multi 3420 Set B multiparameter measuring instrument (Wissenschaftlich-Technische Werkstätten GmbH, Germany) in 1:5 soil to distilled water extracts (20 g of soil sample added to 100 mL of distilled water) at 25 °C.

Laboratory Reflectance Measurement.
To control the light conditions [37,38], the reflectance measurement was conducted in a dark room with a widely used ASD FieldSpec 3 portable spectroradiometer (Analytical Spectral Device Inc., USA). The instrument covers the range from visible and near-infrared to shortwave infrared (350 to 2500 nm) with 2151 bands resampled to 1 nm [2,39]. Petri dishes with a diameter of 15 cm and a depth of 2 cm were used to hold the sieved soil samples. After the dishes were completely filled with the samples, the surfaces were scraped with a plastic ruler to ensure a uniformly flat measurement surface [40]. The samples were illuminated with two 90 W tungsten halogen light sources placed on either side of the sample; the light beams were set at 30° from the vertical direction, and the distance between each lamp and the sample was set at 50 cm. The probe, with an 8° viewing angle, was fixed at a height of 10 cm perpendicular to the surface of the soil sample. The spectrometer was calibrated approximately every 10 minutes by measurements of dark current and a standard white Spectralon reflectance panel (Labsphere Inc., USA) [39,41]. The spectral reflectance of each sample was collected 20 times.

Spectral Data Processing.
Before further analysis, spectral data pretreatment is a necessary and vital step to reduce calculation errors. To minimize instrument noise [2], the 20 spectral curves of each sample were averaged after splice correction in ViewSpecPro software (version 6.0.11).
Because of their low signal-to-noise ratios [42], the marginal ranges from 350 to 399 nm and from 2351 to 2500 nm were removed and not used in this study. Then, the 180 sample spectral curves were smoothed with a Savitzky-Golay filter (polynomial order of 2 and frame size of 5, the default settings in OriginPro 9.0.0) [43]. The smoothed spectral curves of the soil samples are shown in Figure 2.
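The trimming and smoothing steps can be sketched in Python as follows. This is only an illustration, not the OriginPro procedure used in the study: the function names are hypothetical, the retained range of 400 to 2350 nm is taken from the spectral range reported with the spectral features, and leaving the two bands at each end unsmoothed is an assumption, since the edge handling of the original software is not described. The weights (-3, 12, 17, 12, -3)/35 are the standard Savitzky-Golay smoothing weights for a quadratic fit over a 5-point window.

```python
# Savitzky-Golay smoothing weights for a quadratic fit over a 5-point window.
SG_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def trim_bands(wavelengths, reflectance, low=400, high=2350):
    """Keep only the bands whose wavelength lies in [low, high] nm."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance) if low <= w <= high]
    return [w for w, _ in kept], [r for _, r in kept]


def savgol_smooth_5_2(values):
    """Smooth with a 5-point, order-2 Savitzky-Golay filter.

    Assumption: the first and last two points are left unchanged.
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_WEIGHTS, window)) / SG_NORM
    return smoothed


if __name__ == "__main__":
    wavelengths = list(range(350, 2501))      # 2151 bands at 1 nm spacing
    reflectance = [0.3] * len(wavelengths)    # placeholder spectrum
    kept_wl, kept_refl = trim_bands(wavelengths, reflectance)
    print(len(kept_wl), len(savgol_smooth_5_2(kept_refl)))
```

Because the filter fits a local quadratic, any exactly quadratic sequence passes through it unchanged, which is a convenient sanity check for the weights.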
After preprocessing, in order to convert nonlinear relations to linear ones and obtain more modelling results, the hyperspectral reflectance data ($R$) of the 180 samples were transformed by several commonly used nonlinear functions: square root ($\sqrt{R}$), inversion ($1/R$), logarithm ($\lg R$), and logarithm-inversion ($1/\lg R$). In particular, $\lg(1/R)$ spectra are commonly used because absorbing components and their contributions often have near-linear relations with the $\lg(1/R)$ value [44]; because $\lg(1/R) = -\lg R$, $\lg R$ was applied in further modelling.
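The four transformations, together with the untransformed reflectance, can be expressed compactly. The following Python sketch uses a hypothetical helper name and assumes every reflectance value lies strictly between 0 and 1, so that the square root, the logarithm, and both reciprocals are defined.

```python
import math


def transform_reflectance(reflectance):
    """Return the five forms of a reflectance spectrum used for modelling.

    Assumption: 0 < R < 1 for every band (R = 1 would make 1/lg R undefined).
    """
    return {
        "R": list(reflectance),
        "sqrt(R)": [math.sqrt(r) for r in reflectance],
        "1/R": [1.0 / r for r in reflectance],
        "lg(R)": [math.log10(r) for r in reflectance],
        "1/lg(R)": [1.0 / math.log10(r) for r in reflectance],
    }


if __name__ == "__main__":
    forms = transform_reflectance([0.25, 0.1])  # made-up reflectance values
    for name, values in forms.items():
        print(name, values)
```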
According to (3), the fractional derivatives of $R$, $\sqrt{R}$, $1/R$, $\lg R$, and $1/\lg R$ were calculated for orders from 0 to 2 (interval 0.2), and the correlation coefficients between the soil salt content and each derivative-treated dataset were computed in Java using the Eclipse integrated development environment.
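The band-by-band correlation step can be sketched as follows. The study performed this computation in Java; the Python below is an illustrative equivalent with hypothetical function names, assuming the spectra are stored as one list of band values per sample and that no band is constant across samples (a constant band would make the Pearson coefficient undefined).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, salt_content):
    """Correlate each band (column of spectra) with the salt content."""
    n_bands = len(spectra[0])
    return [
        pearson_r([sample[j] for sample in spectra], salt_content)
        for j in range(n_bands)
    ]


if __name__ == "__main__":
    # Three made-up samples with two bands each, for illustration only.
    spectra = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0]]
    salt = [10.0, 20.0, 30.0]
    print(band_correlations(spectra, salt))
```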

Estimation Model and Prediction Accuracy.
In quantitative research on hyperspectral data, partial least squares regression (PLSR) has proved to be a robust and reliable mathematical tool because of its ability to handle collinearity problems [45][46][47][48][49][50]. Thus, in this research, PLSR was applied for model calibration and prediction of soil salt content based on the hyperspectral data treated by G-L fractional derivative.
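To make the regression step concrete, the following is a minimal single-response PLSR (PLS1) sketch using the NIPALS algorithm in plain Python. It is an illustration under stated assumptions rather than the MATLAB implementation used in the study: the function names `pls1_fit` and `pls1_predict` are hypothetical, the predictors are mean-centered but not scaled, and the number of components is chosen by the caller.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; X is a list of rows, y a list of values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residual
    f = [value - y_mean for value in y]                        # y residual
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both residuals before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict the response for one sample by repeating the deflation steps."""
    e = [value - mean for value, mean in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        score = sum(a * b for a, b in zip(e, w))
        y_hat += q * score
        e = [a - score * b for a, b in zip(e, load)]
    return y_hat


if __name__ == "__main__":
    # Made-up, noiseless data following y = 2*x1 - x2 + 3.
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
    y = [3.0, 6.0, 4.0, 8.0, 6.0]
    model = pls1_fit(X, y, n_components=2)
    print(pls1_predict(model, [6.0, 2.0]))
```

When the number of components equals the rank of the centered predictor matrix, PLS1 coincides with ordinary least squares; with fewer components it trades some fit for stability against collinear bands, which is the property the text refers to.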
To evaluate the performance and accuracy of the estimation models built by PLSR, five indexes were employed: the coefficients of determination of calibration ($R_C^2$) and prediction ($R_P^2$), the root mean square errors of calibration ($\mathrm{RMSE}_C$) and prediction ($\mathrm{RMSE}_P$), and the ratio of performance to deviation (RPD). Usually, a good and stable model should have high $R_C^2$, $R_P^2$, and RPD and low $\mathrm{RMSE}_C$ and $\mathrm{RMSE}_P$ [51][52][53]. In this step, all calculations were carried out in MATLAB R2014b (MathWorks Inc., USA).
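The evaluation indexes can be sketched in Python as below. The exact formulas used in the study are not given in the text, so the definitions here are assumptions based on common usage: the coefficient of determination is computed as one minus the ratio of residual to total sum of squares, and RPD is the sample standard deviation of the measured values divided by the RMSE. The function names are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Ratio of performance to deviation, assumed here as SD / RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up values
    predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
    print(rmse(measured, predicted), r_squared(measured, predicted),
          rpd(measured, predicted))
```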

Results, Analyses, and Discussion
Salinity Parameters.
The descriptive statistics for the soil salt content and pH values of the 180 samples collected in the study area are presented in Table 1. The soil salt content exhibited a wide range from 0.0 to 196.0 g/kg with a mean value of 14.739 g/kg, a standard deviation (SD) of 15.610 g/kg, and a fairly high coefficient of variation (CV) of 105.909% (>100%). According to the soil salinity classification [54], the numbers of nonsaline, slightly, moderately, and heavily saline soil samples were 85, 33, 20, and 42, respectively. The pH value varied from 7.9 to 9.718 with a very low CV of 4.851% (<10%) [55]. Among the 180 samples, there were 100 alkaline samples (pH between 7.5 and 8.5) and 80 strongly alkaline samples (pH > 8.5) [56].
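As a quick check of the reported coefficient of variation for salt content, assuming the usual definition of CV as the standard deviation divided by the mean:

$$\mathrm{CV} = \frac{\mathrm{SD}}{\bar{x}} \times 100\% = \frac{15.610}{14.739} \times 100\% \approx 105.91\%,$$

which agrees with the reported 105.909% up to rounding of the inputs.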

Spectral Features.
On the basis of the degrees of soil salinity mentioned above, the 180 soil samples were classified into 4 categories, and the spectral curves of each category were averaged to give a representative spectral curve for that degree (Figure 3). The four spectral curves followed similar basic shapes, with three obvious absorption features located near 1400, 1900, and 2200 nm [38,57]. Among the 4 categories, nonsaline soil showed the lowest reflectance and slightly saline soil the highest. The slightly, heavily, and nonsaline soils were easy to distinguish across the entire spectral range (400 to 2350 nm); the spectral curves of heavily and moderately saline soils overlapped in some sections but could be discriminated approximately from 400 to 900 nm and from 1900 to 2050 nm.

Correlations between Salt Content and Spectra.
Band selection is an important process in constructing the regression model [58], and correlation coefficients between salt content and spectral reflectance are usually used to identify bands sensitive to soil salinity [10]. All the correlation coefficients between soil salt content and the fractional derivative values of the raw reflectance data and its mathematical transformations were tested at the significance level of 0.01 ($|r| \geq 0.192$). The curves of the correlation coefficients for the raw reflectance data are plotted in Figure 4. For the raw reflectance data without derivative treatment, no band passed the significance test at the 0.01 level, but as the derivative order increased, the correlation coefficients rose beyond the 0.01 threshold in some wavelength ranges. In addition, when the order increased from 0 to 0.6, the gradual variation among the correlation coefficient curves of different orders was visible in detail in the ranges from 600 to 1100 nm and from 2000 to 2200 nm and in some other ranges. When the order was greater than 0.6, the curves fluctuated greatly and lacked regularity; thus, more details could not be discerned in Figure 4.
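The threshold of 0.192 can be reproduced from the sample size, assuming a two-tailed Student's $t$ test on the Pearson coefficient (the type of test is not stated in the text). For $n = 180$ samples there are $n - 2 = 178$ degrees of freedom, and the critical value is approximately $t_{0.005,\,178} \approx 2.60$, so that

$$r_{\mathrm{crit}} = \frac{t}{\sqrt{t^{2} + n - 2}} \approx \frac{2.60}{\sqrt{2.60^{2} + 178}} \approx 0.19,$$

consistent with the value quoted above.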
Figure 4 does not show clearly how many bands passed the significance test at the 0.01 level; thus, the numbers of such bands for the raw reflectance and the 4 transformations were counted, and their trend lines are shown in Figure 5. For these 5 mathematical forms of reflectance ($R$, $\sqrt{R}$, $1/R$, $\lg R$, and $1/\lg R$), no band passed the significance test at zeroth order, but as the derivative order increased, the numbers first increased and then decreased, and all reached their maximum at a fractional order ($R$, $\sqrt{R}$, and $\lg R$ at order 0.6, $1/\lg R$ at 0.8, and $1/R$ at 0.4).

Model Calibration and Validation.
The 180 samples were randomly divided into two parts: 144 (80%) for model calibration and 36 (20%) for model validation. To make full use of the hyperspectral data and take advantage of PLSR, all the bands whose correlation coefficients passed the significance test at the 0.01 level were used as features in the modelling process. The calibration and validation results of the 55 models based on spectral data treated by mathematical transformations and different orders of fractional derivative are summarized in Tables 2-6.
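The split and the band selection described above can be sketched in Python as follows. The function names and the random seed are hypothetical (the text does not state how the random division was generated), and the threshold of 0.192 is the significance cutoff quoted for the correlation coefficients.

```python
import random


def split_samples(n_samples, calibration_fraction=0.8, seed=0):
    """Randomly split sample indices into calibration and validation sets.

    Assumption: the seed is arbitrary and only makes the sketch repeatable.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_calibration = round(n_samples * calibration_fraction)
    return indices[:n_calibration], indices[n_calibration:]


def select_significant_bands(correlations, threshold=0.192):
    """Return the indices of bands whose |r| reaches the threshold."""
    return [j for j, r in enumerate(correlations) if abs(r) >= threshold]


if __name__ == "__main__":
    calibration, validation = split_samples(180)
    print(len(calibration), len(validation))
    print(select_significant_bands([0.10, -0.25, 0.192, 0.05]))
```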
Among the integer orders, the models established on the first derivative were much better than those based on the second derivative or on data untreated by derivative, because no band of the spectral data without derivative treatment passed the significance test and the first derivative raised the correlation coefficients more markedly than the second derivative. For fractional orders, however, the picture changed: the models based on data treated by fractional-order derivatives computed with (3) had better results than the integer-order models (lower $\mathrm{RMSE}_C$ and $\mathrm{RMSE}_P$ and higher $R_C^2$, $R_P^2$, and RPD). RPD is an important parameter for evaluating the performance of regression models, and the ranges of <1.4, 1.4 to 2.0, and >2.0 mean that the model has a poor, acceptable, and excellent capacity for predicting soil salinity, respectively [57,59]. There were 30 models with acceptable results (RPD > 1.4), and among these 30 models the single best model was built from 250 bands based on the 1.2-order derivative of $1/\lg R$ with 4 principal components; it had RPD = 2.080 (>2.0), the lowest $\mathrm{RMSE}_C$ (14.685 g/kg) and $\mathrm{RMSE}_P$ (14.713 g/kg), and the highest $R_C^2$ (0.782) and $R_P^2$ (0.768). The scatter plot of measured and predicted soil salt content for the best model is shown in Figure 6. The $R^2$ values between measured and predicted values reached 0.782 and 0.768 in the calibration and validation sets, respectively, and all these figures indicate that the calibrated model based on hyperspectral data treated by fractional derivative could be used to estimate soil salinity in the study area.

Discussion
According to (3), when the order $v$ = 1 or 2, the equation becomes the same as the first- and second-derivative equations with a derivative window equal to 1 [42,[60][61][62]. It can be seen from (3) that the integer derivative value of a band is related only to the bands in the derivative window, while the fractional derivative value of a band is connected with all the bands whose wavelength is less than that of this band. This is a major difference between the fractional and integer derivatives and the main cause of the results of this study: the integer derivative is unique and local, while the fractional derivative is usually nonlocal and has memory [27,63]. Traditionally, there are large differences among the shapes of the zeroth, first, and second derivatives, but the fractional derivative can provide more useful information from hyperspectral data because the order is extended to noninteger values and fills in detailed curves between the integer derivatives of the spectral curves. As a result, this effect is directly manifested among the correlation coefficient curves of different orders of fractional derivative (Figure 4) [16,63].
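The local and nonlocal behaviour can be made explicit by writing out the G-L weights $w_m = (-1)^m \Gamma(v+1) / (m!\,\Gamma(v-m+1))$, which satisfy $w_0 = 1$ and $w_m = w_{m-1}(m - 1 - v)/m$. As a worked illustration (the order 0.5 is chosen only as an example):

$$v = 1:\quad w = (1, -1, 0, 0, \ldots) \;\Rightarrow\; f(x) - f(x-1),$$

$$v = 2:\quad w = (1, -2, 1, 0, \ldots) \;\Rightarrow\; f(x) - 2f(x-1) + f(x-2),$$

$$v = 0.5:\quad w = (1, -0.5, -0.125, -0.0625, \ldots).$$

For integer orders the weights vanish after $v + 1$ terms, so only the bands inside the window contribute; for a noninteger order no weight is exactly zero, so every band at a shorter wavelength contributes with a slowly decaying weight.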
As shown in the results, the integer derivative indeed raised the correlation coefficients between reflectance data and soil salt content and also improved the performance of the models built by PLSR to some extent; compared with the fractional derivative, however, it lost information in some bands and reduced the accuracy and performance of the estimation models. The fractional derivative can compensate for this disadvantage because of its practical flexibility in choosing a suitable derivative order [64].
The first and second derivatives correspond to the slope and curvature of spectral curves, respectively; the physical meaning of the fractional derivative in spectroscopy, however, has not yet been clarified. It has been suggested that a fractional order between 0 and 2 can be described in terms of sensitivity to the slope and curvature of spectral curves: when the order increases from 0 to 1, the derivative value becomes more sensitive to the slope and less sensitive to reflectance, and when the order increases from 1 to 2, the derivative value becomes more sensitive to the curvature and less sensitive to the slope [22]. According to this interpretation, in this study the differences among correlation coefficient curves, the numbers of bands that passed the significance test, and the accuracy indexes of the regression models ($\mathrm{RMSE}_C$, $\mathrm{RMSE}_P$, $R_C^2$, $R_P^2$, and RPD) were all manifestations of this sensitivity. Their tendencies did not increase or decrease monotonically but fluctuated to a certain degree, and some of them achieved optimal values at fractional orders (Figures 4 and 5 and Tables 2-6). The modelling process in this study was a trial procedure to find, by PLSR, a balance among a suitable order, the lowest $\mathrm{RMSE}_C$ and $\mathrm{RMSE}_P$, and the highest $R_C^2$, $R_P^2$, and RPD; the best model was identified according to these performance evaluation indexes.
Admittedly, some other studies have reported better performance than ours. Mashimbye et al. [65] used bagging PLSR with first-derivative reflectance data to estimate soil electrical conductivity in South Africa, and the validation $R^2$ reached 0.85. Peng et al. [66] combined visible and near-infrared with mid-infrared hyperspectral data to predict total dissolved salts in Xinjiang by PLSR and found $\mathrm{RMSE}_C$ = 0.20 g/kg, $\mathrm{RMSE}_P$ = 0.43 g/kg, $R_C^2$ = 0.96, $R_P^2$ = 0.70, and RPD = 2.14. According to the results of our study, these studies might capture more detail if the fractional derivative were applied.

Conclusions
In this paper, the Ebinur Lake basin on the northwest border of Xinjiang, China, was chosen as the research area. Combining the laboratory-measured soil salt content and hyperspectral data of 180 samples, PLSR was employed to build quantitative estimation models of soil salinity based on the hyperspectral data treated by mathematical transformations and fractional derivatives. The conclusions are as follows:
(1) The best retrieval model was built from 250 bands based on the 1.2-order derivative of $1/\lg R$.
(2) During data processing and model calibration and validation, the differences among correlation coefficient curves, the numbers of bands that passed the significance test, and the values of $\mathrm{RMSE}_C$, $\mathrm{RMSE}_P$, $R_C^2$, $R_P^2$, and RPD showed variation tendencies that were manifestations of the sensitivity to the slope and curvature of spectral curves.
(3) In the modelling process, the integer derivative lost information and reduced the accuracy of quantitative estimation; to some extent, the fractional derivative could compensate for this disadvantage because of the flexibility in choosing a suitable derivative order.
As an extension of the integer derivative, and because of the flexibility of order selection, the fractional derivative can enrich the methods of data preprocessing and recover information lost by the integer derivative in the spectral dimension, making fuller use of hyperspectral data. Although this study is only one application of the fractional derivative, it provides a reference for estimating other parameters with hyperspectral technology. Further research should focus on the physical meaning of the fractional derivative in spectroscopy and on extending the approach to space-borne hyperspectral technology for precise monitoring of land surface parameters on large spatial scales.

Figure 1: Study area and distribution of sampling sites.

Figure 3: Spectral curves of soils with different degrees of salinization.

Figure 4: Correlation coefficients between salt content and raw reflectance data treated by fractional derivatives.

Figure 5: The numbers of bands that passed the significance test and their trend lines.

Figure 6: The relationship between measured and predicted soil salt content in the calibration and validation sets.

Table 1: Statistical results of salt content (g/kg) and pH value.
SD: standard deviation; CV: coefficient of variation.

Table 2: The results of the models based on raw reflectance.

Table 3: The results of the PLSR models based on $\sqrt{R}$.

Table 4: The results of the PLSR models based on $1/R$.

Table 5: The results of the PLSR models based on $\lg R$.

Table 6: The results of the PLSR models based on $1/\lg R$.