Quick Determination of Soil Quality Using Portable Spectroscopy and Efficient Multivariate Techniques

Rapid, onsite determination of soil status and quality parameters has great potential to improve food security and to minimize the waste caused by excessive application of soil amendments, thereby reducing environmental pollution. In this study, a pocket-sized shortwave NIR spectrometer (740–1070 nm) and multivariate statistics were used to classify soils from different land-use types and simultaneously predict nitrogen (N), phosphorus (P), potassium (K), calcium (Ca²⁺), magnesium (Mg²⁺)


Introduction
Soil quality and soil fertility management play a significant role in agricultural productivity and environmental pollution control; rapid knowledge of the quality status of a soil is therefore a vital step. Soil status is normally measured by laboratory analysis using the traditional technique known as wet chemistry. The wet chemistry approach has numerous drawbacks: it is expensive and time consuming, involves the use of chemicals, and is often restricted to few samples, or samples are bulked from an area to provide representative composites, which prompts the use of pedotransfer functions as a substitute [1,2]. It also generates unwanted waste and destroys the original soil samples [3]. Above all, it is confined to the laboratory and cannot be used in the field, where rapid and accurate results are needed most to support precision agriculture. An alternative technique that enables in situ determination is therefore required, and the development of alternative measurement methods that are accurate, rapid, and inexpensive is of great value [4].
NIR spectroscopy is an advanced analytical technique that has gained ground in various fields, including agriculture. It offers several advantages over traditional analytical methods: it is a physical, nondestructive method that gives rapid results without the use of chemicals, making it environmentally friendly and inexpensive [2]. Previous research has revealed the potential of NIR spectroscopy for soil analysis, notably for measuring heavy metals in soil [4], soil physical, chemical, and biochemical properties [5], and soil carbon and nitrogen [6,7], as well as for discriminating three major soil types [8] and organic matter in soils under grass and forest [9]. These studies show that NIR spectroscopy could provide the needed alternative for soil analysis. However, all of them used large benchtop NIR instruments, which defeats the purpose of onsite use, and there has been little or no attempt to use a small NIR spectrometer for the simultaneous determination of soil health properties. Advances in computing and electronics have since led to the development of portable, small NIR spectrometers coupled with chemometrics, which could provide an added advantage over laboratory-based NIR spectroscopy. Up until now, however, few or no studies have been done in Ghana on the use of pocket-sized, user-friendly NIR spectroscopy for soil analysis, either to classify different land-use types or to predict soil health quality parameters.
This research, therefore, investigates the feasibility of applying pocket-sized NIR spectroscopy coupled with multivariate statistics, using a variable-wise selection protocol, for the simultaneous classification of soils and detection of soil health properties to inform the stepwise precision application of soil amendments. The specific objectives are to identify soils under different land-use types and to determine N, P, K, pH, Ca²⁺, and Mg²⁺ simultaneously by employing optimal synergy interval variable selection.

Sample Collection.
A total of 110 soil samples were collected at different depths (0–15, 15–30, and 30–45 cm) from different land-use types (arable, native, pasture, and plantation), as described by others [10]. Stones and plant debris were removed by hand before the soil samples were air-dried. Each sample was then ground uniformly, passed through a 2 mm sieve, and packed in a well-labelled polythene bag before analysis.

Sample Spectral Acquisition.
The spectrum of each sample was obtained in reflectance mode using a pocket-sized spectrometer (SCIO™) over a spectral range of 740–1070 nm at 1 nm resolution. To scan the samples, 60 g of soil was poured into a glass container, as seen in Figure 1, and scanned four times, rotating the container by 45° between scans. The whole process was carried out at 28–31 °C and 65% relative humidity. The raw dataset of 110 soil samples, stored in the cloud, was downloaded using a SCIO Lab research license and imported into MATLAB version 9.5.0 (MathWorks Inc., USA). The raw dataset was divided into two subsets: a calibration set (77 samples) for developing the models and a prediction set (33 samples) for evaluating the predictive ability of the developed models. To avoid bias in the selection of members of each subset, the Kennard–Stone algorithm was used to partition the dataset.
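The Kennard–Stone partition can be sketched in a few lines of NumPy. This is the textbook version, not the authors' MATLAB code: it assumes Euclidean distances computed on the raw spectra (the text does not state the distance or the space used), starts from the two most distant spectra, and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away, so the calibration set spans the spectral space evenly.

```python
import numpy as np

def kennard_stone(X, n_cal):
    """Return (calibration_indices, prediction_indices) chosen by Kennard-Stone.

    X     : (n_samples, n_wavelengths) spectra matrix
    n_cal : number of samples to place in the calibration set (>= 2)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all spectra
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two most distant spectra
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_cal:
        # Distance from each remaining sample to its nearest selected sample
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Move the remaining sample farthest from the selected set into calibration
        nxt = remaining[int(np.argmax(min_d))]
        selected.append(nxt)
        remaining.remove(nxt)
    return np.array(selected), np.array(remaining)

# Hypothetical example: 110 random "spectra" over 740-1070 nm at 1 nm (331 points)
rng = np.random.default_rng(0)
X = rng.normal(size=(110, 331))
cal_idx, pred_idx = kennard_stone(X, n_cal=77)
print(len(cal_idx), len(pred_idx))  # 77 33
```

The remaining 33 samples form the prediction set, matching the 77/33 split reported above.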

Reference Methods.
The pH of the soils was measured in a 1:2.5 (w/v) soil:water suspension with a pH meter [11]. Total nitrogen (N) was determined using the micro-Kjeldahl digestion method [12]. Available phosphorus was determined following the Bray-1 acid method [13]. Ca, Mg, and K were determined after extraction with ammonium acetate at pH 7 [14]. All analyses were done in triplicate, and the measured soil chemical properties were summarized in terms of range (minimum to maximum), mean, and standard deviation (SD), as seen in Table 1.

Mathematical Signal Treatments.
In this study, five mathematical spectral pretreatments (MC, mean centring; MSC, multiplicative scatter correction; SNV, standard normal variate; FD, first derivative; and SD, second derivative) were compared to obtain the best model. In NIR modelling it is necessary to pretreat the raw dataset with a suitable technique; the challenge is that many such techniques exist, and choosing among them is a substantial task that cannot be skipped. Spectral pretreatment is known to be an effective way to reduce or eliminate optical scattering from particles of different sizes and to reduce noise, thereby improving the prediction accuracy and robustness of the developed model [15]. It also removes unwanted signals such as interferences caused by light scattering, baseline shifts, and slope variations due to particle size [16]. MC computes the average spectrum of the dataset and subtracts it from each spectrum [17]. SNV is normally used to remove light-scatter variation in the spectral data by eliminating multiplicative interferences and scatter effects [16,18]. MSC is a preprocessing technique normally used to correct for scattered light and to remove differing slopes of the spectral peaks; for more information, refer to [19]. Finally, FD and SD pretreatments are used to separate overlapping peaks and eliminate baseline shift, and they are improved by using the Savitzky–Golay algorithm.
Journal of Spectroscopy
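The pretreatments described above can be sketched in NumPy and SciPy as follows. This is a minimal illustration, not the authors' MATLAB implementation; the Savitzky–Golay window (11 points) and polynomial order (2) are assumptions because the text does not state them, and MSC uses the mean spectrum as its reference, which is the usual default.

```python
import numpy as np
from scipy.signal import savgol_filter

def mean_center(X):
    # MC: subtract the average spectrum of the data set from every spectrum
    return X - X.mean(axis=0)

def snv(X):
    # SNV: centre and scale each spectrum (row) by its own mean and standard deviation
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def msc(X, reference=None):
    # MSC: fit each spectrum as x = a + b * ref, then remove the additive (a)
    # and multiplicative (b) scatter effects
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for k, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)   # slope, intercept
        out[k] = (x - a) / b
    return out

def sg_derivative(X, order, window=11, poly=2):
    # FD (order=1) or SD (order=2): Savitzky-Golay smoothed derivative along wavelength
    return savgol_filter(X, window_length=window, polyorder=poly, deriv=order, axis=1)

# Hypothetical spectra with additive and multiplicative scatter
rng = np.random.default_rng(1)
base = np.sin(np.linspace(0.0, 3.0, 331))      # stand-in "true" spectrum
offset = rng.uniform(0.0, 0.5, size=(5, 1))    # additive scatter
gain = rng.uniform(0.8, 1.2, size=(5, 1))      # multiplicative scatter
X = offset + gain * base

X_msc = msc(X)                # all rows collapse onto the mean spectrum
X_fd = sg_derivative(X, 1)    # first derivative
X_sd = sg_derivative(X, 2)    # second derivative
```

Because the synthetic scatter here is purely additive and multiplicative, MSC maps every spectrum onto the reference; real soil spectra would retain their chemical differences.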
Quantification Models.
The partial least squares (PLS) algorithm is a well-known linear multivariate method proposed by Herman Wold for modelling complicated datasets [20]. It has found use in analysing spectral data with strongly collinear, noisy, and redundant variables. However, the original PLS works on the full spectrum and involves a large data matrix that often contains both useful and unwanted information. To overcome this bottleneck, other researchers have resorted to manually selecting spectral regions to estimate particular chemical components [21,22]. This approach, however, is slow and cumbersome and requires prior experience in selecting informative spectral regions. To address these challenges, interval partial least squares (IPLS) and synergy interval partial least squares (Si-PLS) were proposed. IPLS splits the spectrum into smaller equidistant intervals and develops a PLS model for each subinterval, whereas Si-PLS also splits the dataset into a number of intervals and then calculates PLS models for all possible combinations of two, three, or four intervals. The best single interval (IPLS) and the best combination of intervals (Si-PLS) are selected as those giving the lowest root mean square error of calibration (RMSEC). Model performance is evaluated using three main parameters, namely the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP), and the coefficient of determination (R²) [23,24]. These parameters are calculated using the following equations, with RMSECV computed from the cross-validated calibration set and RMSEP from the prediction set:

$$\mathrm{RMSECV}\ \text{or}\ \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where n is the number of samples, y_i is the reference measurement for sample i, ŷ_i is the value estimated by the model for sample i, and ȳ is the mean of the reference measurements for all samples in the dataset.

Results and Discussion
Spectral Data Presentation.
The spectral profiles obtained contain useful information for modelling. Figure 2(a) presents the raw spectra of soil samples from different land-use types, which reveal several absorption bands. However, the spectra look similar, with no distinct differences visible to the naked eye, and carry no directly usable information; this called for multivariate algorithms to build qualitative and quantitative models for predicting the parameters of interest. The wavelength range used (740–1070 nm) covers absorptions of functional groups such as C-H stretch, C-H deformation, S-H, N-H, CH2, and CH3 that could correspond to soil parameters such as N, P, K, and pH and to other distinct attributes (Table 1) that could be useful for differentiating the soil types, as seen in Figure 2(b). The wet chemistry results showed a wide range of chemical properties (Table 1), which could be attributed to the wide array of land-use types from which the samples were collected; these results agree with those of other authors [10]. Furthermore, the relationship between spectral absorption wavelengths and soil chemical composition (absorption of C-H, O-H, and N-H bonds) made it possible to quantify specific soil health parameters by appropriate selection of the wavelength region [25], and this relationship could explain the clear separation observed in Figure 2(b). Also, the organic matter in the samples has distinct spectral fingerprints in the NIR region because of the relatively strong overtone and combination-mode absorptions of functional groups (CH: aliphatic, CO: carboxyl, NH: amine and amide) usually present in organic compounds [26].

Principal Component Analysis (PCA).
Principal component analysis offers an unsupervised pattern recognition tool for observing possible cluster trends in a reduced dimensional space. It works by reducing the dimension of the data matrix and translating the useful information into interpretable variables known as principal components (PCs). Figure 3(a) shows the outcome of PCA, which revealed four distinct soil groups. All samples clustered well on the PC1-PC2 plane, where PC1 and PC2 explained 92.68% and 6.68% of the variance, respectively, for a cumulative 99.37% of the variance for the 110 samples used in this study. This means the first two principal components captured most of the information and carried the chemical compositional information in the NIR region for modelling. The soil samples differed considerably in their chemical properties according to their land-use type. Since PCA is not a classification tool, the LDA and SVM multivariate classification techniques were used to build classification models.
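As a sketch of how the explained-variance figures are obtained, PCA can be computed from the singular value decomposition of the mean-centred spectra: the fraction of variance carried by each PC is its squared singular value divided by the sum of all squared singular values. The four-group data below are synthetic and hypothetical, not the study's spectra.

```python
import numpy as np

def pca_scores(X, n_components=2):
    Xc = X - X.mean(axis=0)                           # mean-centre each wavelength
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)              # variance fraction per PC
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    loadings = Vt[:n_components]                      # wavelength weights of each PC
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(2)
labels = np.repeat(np.arange(4), [28, 28, 27, 27])    # 110 samples, 4 hypothetical groups
centres = rng.normal(size=(4, 331))
X = centres[labels] + 0.3 * rng.normal(size=(110, 331))
scores, loadings, explained = pca_scores(X, n_components=2)
print("PC1, PC2 (% variance):", np.round(100 * explained, 2))
print("cumulative:", round(100 * float(explained.sum()), 2))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by group, gives a PC1-PC2 score plot of the kind shown in Figure 3(a).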

Classification Model.
There are several classification algorithms, and selecting which ones to use is often a challenge. In this experiment, linear discriminant analysis (LDA) and the support vector machine (SVM) were compared, because every multivariate classification model has its own strengths and weaknesses.
From Table 2, the LDA model reached its optimum classification rate of 98.65% and 97.22% in the calibration and prediction sets, respectively, after FD preprocessing was applied to the raw data. This finding supports the point made above that preprocessing improves modelling results by eliminating unwanted information and reducing noise, thereby improving the accuracy and robustness of the developed classification model [15]. The SVM, on the other hand, gave the best results, with classification rates of 99.32% and 98.61% in the calibration and prediction sets, respectively (Table 2). Among the preprocessing techniques used, MSC and FD improved the raw spectral dataset and hence enhanced the final classification rate. This could be explained by MSC correcting for scattered light and removing differing slopes of the spectral peaks, while FD enhances the separation of spectral features. In this research, the MSC- … Figure 4 shows the cross-validation performed using randomly selected spectra to test the model. Among the samples shown in Figure 4, only one sample, from the pasture land-use type, was misclassified. The SVM's performance could be explained by its construction of a hyperplane that separates the classes in a higher-dimensional feature space, since the SVM maps data from a low-dimensional input space to a high-dimensional feature space [17].
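The LDA versus SVM comparison can be sketched with scikit-learn. The model settings here are assumptions for illustration, since the text does not give them: LDA is preceded by PCA (10 components) so that it works on a few scores rather than 331 collinear wavelengths, and the SVM uses standardization with an RBF kernel (C=10). The four-class data and the stratified 77/33 random split are synthetic and hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
labels = np.repeat(np.arange(4), [28, 28, 27, 27])   # 0..3 = hypothetical land-use classes
centres = rng.normal(size=(4, 331))
X = centres[labels] + 0.5 * rng.normal(size=(110, 331))

X_cal, X_pred, y_cal, y_pred = train_test_split(
    X, labels, test_size=33, stratify=labels, random_state=0)

models = {
    # PCA scores as LDA input avoids a singular within-class covariance
    "LDA": make_pipeline(PCA(n_components=10), LinearDiscriminantAnalysis()),
    # RBF kernel: implicit mapping to a high-dimensional feature space
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale")),
}
acc = {}
for name, model in models.items():
    model.fit(X_cal, y_cal)
    acc[name] = model.score(X_pred, y_pred)   # fraction correctly classified
    print(name, f"{100 * acc[name]:.2f}% correct on the prediction set")
```

With the study's data, the same loop would be run once per pretreatment to fill a table like Table 2.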
Explaining the basis of the accurate classification is vital. Figure 5 reveals the total contribution of the wavelengths that drove the clean separation and classification of the land-use types. For the first component, the major peak was found around 900 nm, which corresponds to the third overtone of CH3 and CH2 [26] associated with organic materials, while for the second and third components the major peaks were found around 800–830, 850–875, 925–950, and 1000–1050 nm. These wavelengths correspond to RNH2, ArCH, CH3, CH2, and RONH2 [26], which are associated with chemical properties such as nitrogen, pH, and organic carbon in the soils used in this research.
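Assuming the contribution plot in Figure 5 is a loading-type vector over wavelength (the text does not say exactly how it was computed), the dominant bands can be located by ranking the local maxima of its absolute value with SciPy's `find_peaks`. The loading vector below is hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

wavelengths = np.arange(740, 1071)   # 740-1070 nm at 1 nm, 331 points

def top_loading_peaks(loading, wavelengths, n_peaks=3):
    # Large |loading| marks wavelengths that drive the separation on that component
    mag = np.abs(loading)
    peaks, _ = find_peaks(mag)                        # indices of local maxima
    ranked = peaks[np.argsort(mag[peaks])[::-1]]      # strongest first
    return wavelengths[ranked[:n_peaks]]

# Hypothetical loading vector with a positive band near 900 nm and a negative one near 1020 nm
loading = (np.exp(-((wavelengths - 900) ** 2) / 50.0)
           - 0.5 * np.exp(-((wavelengths - 1020) ** 2) / 50.0))
top = top_loading_peaks(loading, wavelengths, n_peaks=2)
print(top.tolist())  # [900, 1020]
```

Taking the absolute value treats positive and negative loadings alike, since both indicate wavelengths that contribute to the component.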

Quantitative Models.
The spectral predictions of nitrogen, phosphorus, potassium, pH, calcium, and magnesium were modelled using full-spectrum PLS and the wavelength selection techniques IPLS and Si-PLS. With the full PLS algorithm, first derivative preprocessing performed better than the other pretreatments for all the soil quality parameters (Table 3). This performance could be due to the ability of first derivative pretreatment to clearly define the presence and location of hidden absorption bands [27]. With IPLS (Table 4), the parameters measured did not show any well-defined pattern in preprocessing performance: mean centring (MC) was superior for nitrogen and calcium, while first derivative and SNV outperformed the others for phosphorus, pH, and potassium. Overall, Si-PLS gave optimal performance for all the parameters studied (Table 5). Specifically, FD preprocessing enhanced the results of most quality parameters (N, P, and K), MC enhanced the calcium results, and no preprocessing was needed for pH and magnesium.
Comparatively, as seen in Table 6, IPLS performed worst, followed by full PLS, while Si-PLS performed best for all the parameters studied (N, P, K, Mg²⁺, Ca²⁺, and pH). This can be explained by the distinct properties of each PLS type. Full PLS uses the whole spectral region of the soil samples, which contains some irrelevant spectral information that inevitably reduces the model's performance. IPLS overcomes this limitation by selecting a single optimal region of interest to calibrate the PLS model; however, restricting the model to a single interval neglects other useful spectral information, which explains why IPLS performance declined drastically. Si-PLS, on the other hand, combines more than one informative interval to model the parameter of interest, as in this study. Si-PLS therefore showed its superiority over PLS and IPLS because it overcame the drawbacks of both techniques.

More specifically, for nitrogen prediction Si-PLS performed best (Table 6). The optimal spectral intervals selected were 770–784, 945–958, and 973–986 nm at 4 PLS components (Figure 6(a)). These ranges correspond to absorption bands of RNH2 associated with nitrogen in soil [26], as well as to C-H and N-H third overtones. For phosphorus, the optimal intervals were 768–781, 894–907, 973–986, and 1058–1070 nm at 3 PLS components (Figure 6(b)), which lie in the third overtone region and correspond to ArOH, CH3, and ArCH. Phosphorus plays a vital role in capturing, storing, and converting the sun's energy into biomolecules such as adenosine triphosphate (ATP), which drives biochemical reactions, including photosynthesis. For potassium, the optimal spectral ranges were 846–860, 876–890, 921–935, and 996–1010 nm with 7 PLS components in the second overtone region, representing ArCH and CH absorptions (Figure 6(c)). Potassium supports the formation and transport of sugars and starch through the plant and is also vital for water regulation in plants.

Soil pH is very important because it influences several soil factors affecting plant growth, such as soil structure, soil bacteria, and nutrient availability, and it has been described as the master soil variable [28]. In this study, the four optimal intervals selected for pH were 810–823, 824–837, 922–935, and 1019–1031 nm at 7 PLS components (Figure 6(d)). These wavelengths represent CH3, CH2, C-H, and O-H absorptions corresponding to acidity [26]. Rapid onsite determination of soil pH is particularly important because it readily indicates the soil condition and the expected direction of many soil processes, and it can also guide nutrient cycling for plant nutrition and soil remediation [28]. For the optimal modelling of calcium and magnesium (Figures 6(e) and 6(f)), Si-PLS selected 756–770, 801–815, 936–950, and 981–995 nm at 8 PLS components and 768–781, 824–836, 967–979, and 1019–1030 nm at 12 PLS components, respectively. Ca²⁺ and Mg²⁺ are secondary macronutrients, required by plants for growth in smaller quantities than N, P, and K. More specifically, Ca²⁺ is a component of plant cells that maintains cell wall strength and improves fruit set and quality. It also has a positive effect on soil properties by improving soil structure and by enabling nitrogen-fixing bacteria on the roots of leguminous plants to capture atmospheric nitrogen. Mg²⁺, on the other hand, is an essential component of the chlorophyll molecule and is therefore essential for photosynthesis. Notably, Ca²⁺ and Mg²⁺ levels and their balance are two important factors affecting plant growth [29]. Furthermore, although some constituents, such as heavy metals, do not absorb NIR radiation, they can be predicted owing to their correlation with other spectrally active parameters [30,31]. The findings of this study were similar to those of other researchers [31,32]. The results in Table 6 mean that the models could be used acceptably for screening and other "approximate" calibrations, and models in the 0.83–0.90 range could be used with caution for most applications, including research [33].

Conclusion
This work has revealed, for the first time, that pocket-sized NIR spectroscopy in the range of 740–1070 nm could be used onsite to differentiate soils of different land-use types and to determine N, P, K, Mg²⁺, Ca²⁺, and pH simultaneously. The systematic comparison of different PLS calibration models for predicting soil health parameters revealed the superiority of efficient spectral interval selection for measuring N, P, K, Mg²⁺, Ca²⁺, and pH in soils, with correlation coefficients ranging from 0.699 to 0.898 and RMSEP between 0.033 and 3.02 in the prediction set. This means that, of the models developed, the nitrogen model could be acceptable for very rough to rough screening, while the others could be acceptable for screening and other "approximate" calibrations and usable with caution for most applications, including research [33]. These findings suggest that portable NIR spectroscopy could be used, with caution, for the rapid and simultaneous prediction of soil status and quality parameters. However, more studies are needed to prove the robustness of the findings, as the technique could considerably reduce reliance on the time-consuming wet chemistry technique. It could also help make precision fertilizer application a reality in resource-poor communities, especially in developing countries. Finally, this study only demonstrates the feasibility of portable NIRS; further studies are required at different geographical locations and across a wider range of land-use types.

Figure 2: Spectral profile (a) raw and (b) mean of soil samples from different land-use types.

Table 1: Wet chemistry measurement of soil health properties.

Table 2: Identification rate of different land-use classes by LDA and SVM. Bold values represent the best results.

Table 3: Optimal selection of the preprocessing technique using the PLS model.

Table 4: Optimal selection of the preprocessing technique using the IPLS model.

Table 5: Optimal selection of the preprocessing technique using the Si-PLS model.

Table 6: Modelling results from different variable selection models for soil quality parameters.