Visible and Near-Infrared Reflectance Spectroscopy for Investigating Soil Mineralogy: A Review

Clay minerals are the most reactive and important inorganic components in soils, yet soil mineralogy is still treated as a minor topic in soil science, and a renewed look at it is increasingly needed. Clay minerals in soils are more complex and less well crystallized than those in sedimentary rocks, and thus, they display more complicated X-ray diffraction (XRD) patterns. Traditional characterization methods such as XRD are usually expensive and time-consuming, and they are therefore inappropriate for large datasets, whereas visible and near-infrared reflectance spectroscopy (VNIR) is a quick, cost-efficient, and nondestructive technique for analyzing the soil mineralogic properties of large datasets. The main objectives of this review are to bring readers up to date with information and understanding of VNIR as it relates to soil mineralogy and to attract more attention from a wide variety of readers to revisit soil mineralogy. We begin our review with a description of the fundamentals of VNIR. We then review common methods to process soil VNIR spectra and summarize the spectral features of soil minerals, with particular attention to the <2 μm fraction. We further critically review applications of chemometric methods and related model building in spectroscopic soil mineral studies. We then compare spectral measurement with multivariate calibration methods, and we suggest that both can produce excellent results, depending on the situation. Finally, we suggest a few avenues of future research, including the development of theoretical calibrations of VNIR more suitable for various soil samples worldwide, better elucidation of clay mineral-soil organic carbon (SOC) interactions, and building the concept of integrated soil mapping through combined information (e.g., mineral composition, soil organic matter (SOM), SOC, pH, and moisture).


Introduction
Soils are open, complex, and dynamic systems as well as fundamental natural environments for animals, plants, microorganisms, and human interaction [1]. Mineral composition is the most fundamental property of a soil, and soil minerals generally account for half the soil volume [2]. According to Churchman [3], clay minerals in the soil context are "secondary inorganic compounds of <2 μm size" including Fe, Al, and Mn oxides (hydroxides and oxyhydroxides), as well as noncrystalline phases. Importantly, they are the most reactive and important inorganic components in soils, and they commonly occur in close association with the most reactive organic matter [4,5]. Clays influence soil function through both their bulk properties and the properties associated with their huge outer/inner surfaces (e.g., cation exchange capacity [6]). A comprehensive understanding of the nature of soil minerals is of particular importance because it may help us explain and predict how different soil types function [7].
However, soil mineralogy (mainly clay mineralogy) is still a minor topic in soil sciences. This may be due partly to the unjustified assumption that a given soil mineral will have the same properties as those of its better-crystallized counterpart that formed in a more "geologic" context (e.g., that pedogenic kaolinite will have the same properties as sedimentary kaolinite) [4]. Revisiting soil mineralogy has become increasingly important, for instance, in terms of the manner by which soil minerals are defined and investigated [8].
The most commonly used method to characterize soil minerals is XRD, which is fundamentally qualitative. Since soil clay minerals are generally more complex and less well crystallized than those of geological environments [9][10][11], they display more complicated XRD patterns [12,13]. Despite quantitative improvements of XRD [14], mineral characterization is usually expensive and time-consuming [2]. Some chemical extraction procedures can be useful in the analysis of Fe oxides. However, such extraction is expensive and time-consuming, and it can complicate the scientific interpretation of the soil by changing the chemical equilibrium between the soil solution and the solid phases in soil specimens [15,16]. Thus, these conventional analyses are not appropriate for larger scale soil studies, and an alternative method is needed to target and characterize soil minerals.
Visible and near-infrared reflectance spectroscopy (VNIR, 350-2500 nm), that is, the study of visible and near-infrared light reflected from material surfaces, is a quick, cost-efficient, and nondestructive technique in soil sciences [17,18]. This technique has been greatly developed in soil sciences in the past several decades and has seen apparent exponential growth over the past 20 years [19]. VNIR has been of increasing interest for the analysis of soil parameters including soil organic carbon, pH, bulk texture, elemental concentration, and cation exchange capacity [20,21]. In soil mineralogy, VNIR can be used to characterize various soil mineralogic properties such as clay mineral composition, clay content, and degree of mineral weathering/alteration, although quartz and feldspar have weak or nonexistent absorption in the VNIR range [22][23][24]. In this paper, we aim to bring readers up to date with VNIR as it relates to soil mineralogy, and we seek to attract more attention from readers to revisit soil mineralogy.

Fundamentals of VNIR
The VNIR part of the electromagnetic spectrum includes both the visible (350-780 nm) and near-infrared (780-2500 nm) ranges, which overlap with the optical radiation range (100-1000 nm; Figure 1). In the remote sensing literature, the 350-1000 nm range is sometimes referred to as VNIR (visible-near-infrared) and the 1000-2500 nm range as SWIR (short-wave infrared) [25]. The human eye and brain can process spectral information from the visible region and see color, while modern spectroscopy can observe precise details over a much broader wavelength range.
2.1. Absorption, Scattering, and Emission. When photons enter a solid, liquid, or gaseous material, they will either be absorbed, reflected from its surface, or pass through it [26]. The reflective process is defined as scattering, and the scattered photons can be detected and measured. Photons can also be detected when they are emitted from a surface with a temperature above absolute zero [25]. Three general physical processes (i.e., electronic transitions, vibrational transitions, and rotational transitions) result in the absorption bands in the spectra of materials. The absorption bands in the VNIR range are derived from both electronic and vibrational transitions [27,28].

Electronic Transitions.
Discrete ions and atoms have independent energy states. A photon is emitted from an atom when one of its electrons moves to a lower energy state. When an atom absorbs a photon of a given wavelength, one of its electrons moves from a lower energy state to a higher one [25]. These electronic processes occur because of the high energy and mobility of the electrons. The electronic processes are mainly caused by the following: (1) Crystal-field effects: since iron is a very common transition element in minerals, a common electronic process revealed in the visible region is due to the unfilled d-orbitals of Fe in Fe-oxide minerals [24,29]. Electron energy levels are influenced by many factors, including the valence state of the atom (e.g., Fe²⁺ and Fe³⁺), the type of ligands, the asymmetry of the site it occupies, the distance between the metal ion and the ligand, and the degree of deformation of the site [28]. (2) Charge transfer: it is governed by mineralogy, and it is a hundred times stronger than the crystal-field effects. It is the main reason for the red color of Fe oxides and hydroxides. Moreover, conduction bands and color centers can also cause electronic transitions in some minerals [25].

Vibrational Transitions.
The bonds in a crystal lattice or molecule vibrate like springs. The strength of each bond and the masses of the bonded atoms determine the vibration frequency [25]. The absorption bands in the VNIR range are observed as a consequence of these molecular vibrations [30]. Soil minerals (e.g., phyllosilicate and carbonate minerals), in particular, have unique absorption features in the VNIR region due to overtones and combinations of vibrations related to the stretching and bending of molecular bonds such as O-H, C-H, C-C, and N-H [31].

Spectral Preprocessing.
The raw spectra are usually preprocessed through various approaches to accentuate features and remove signal noise [32]. The processed soil spectra facilitate mineral identification, and the accuracy of soil mineral prediction is greatly improved through the use of various preprocessing methods [33]. The following preprocessing methods for spectra have been used in previous soil mineralogic studies.

Continuum Removal Approaches.
The continuum removal approach aims to remove background noise and isolate particular absorption features for identification and analysis [34]. The continuum is usually determined using local maxima to generate a hull of boundary points (Figure 2(a)) [22]. The boundary points are connected by straight-line segments, and the continuum-removed spectrum is then calculated by removing the original reflectance intensities from the corresponding intensities of the continuum (Figure 2(b)) [23]. Continuum removal analysis is a particularly robust tool for detecting and predicting iron oxides and phyllosilicate minerals. Thus, it is a feasible substitute for statistical methods in soil mineralogy studies [10,20,22,24].
Absorption bands in the VNIR region can be described by geometrical parameters derived from the continuum removal curve (Figure 2(b)). Four parameters are displayed directly in Figure 2(b): position (P), width (W), depth (D), and full width at half maximum (FWHM, abbreviated to "F"). A fifth parameter, asymmetry (AS), is calculated from F_left, the left width at half maximum, and F_right, the right width at half maximum [20].
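The hull construction and the band parameters described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the continuum is taken as the upper convex hull of the spectrum, the continuum-removed value is taken as the ratio of reflectance to continuum (a common convention that the text does not spell out), and the function names, the synthetic Gaussian band at 2200 nm, and the sloping baseline are all hypothetical.

```python
import math

def upper_hull(wl, refl):
    """Indices of the upper convex hull of the (wavelength, reflectance) points."""
    hull = []
    for i in range(len(wl)):
        # Drop the last hull point while it lies on or below the chord
        # joining its predecessor to the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((wl[b] - wl[a]) * (refl[i] - refl[a])
                     - (refl[b] - refl[a]) * (wl[i] - wl[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def continuum_removed(wl, refl):
    """Reflectance divided by the straight-line continuum (assumed convention)."""
    hull = upper_hull(wl, refl)
    out, seg = [], 0
    for i in range(len(wl)):
        while seg < len(hull) - 2 and wl[i] > wl[hull[seg + 1]]:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        frac = (wl[i] - wl[a]) / (wl[b] - wl[a])
        continuum = refl[a] + frac * (refl[b] - refl[a])
        out.append(refl[i] / continuum)
    return out

def band_parameters(wl, cr):
    """Position P, depth D, and half-widths of the deepest band (assumes D > 0)."""
    k = min(range(len(cr)), key=cr.__getitem__)
    depth = 1.0 - cr[k]
    half = 1.0 - depth / 2.0

    def crossing(step):
        # Walk away from the band centre until the curve rises above half depth,
        # then interpolate linearly between the two bracketing samples.
        i = k
        while 0 <= i + step < len(cr) and cr[i + step] < half:
            i += step
        j = i + step
        if not 0 <= j < len(cr):
            return wl[i]
        frac = (half - cr[i]) / (cr[j] - cr[i])
        return wl[i] + frac * (wl[j] - wl[i])

    left, right = crossing(-1), crossing(+1)
    return {"P": wl[k], "D": depth, "F": right - left,
            "F_left": wl[k] - left, "F_right": right - wl[k]}

# Hypothetical test spectrum: sloping baseline with a Gaussian band at 2200 nm.
wavelengths = list(range(2000, 2401, 10))
spectrum = [(0.5 + 0.0001 * (w - 2000))
            * (1 - 0.2 * math.exp(-((w - 2200) ** 2) / (2 * 30 ** 2)))
            for w in wavelengths]
params = band_parameters(wavelengths, continuum_removed(wavelengths, spectrum))
```

The two half-widths are returned separately so that an asymmetry parameter can be derived from them with whichever definition a given study adopts.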

Smoothing Techniques.
Smoothing techniques are used to extract the maximum amount of information from each spectrum by minimizing the influence of background noise [32]. Commonly used smoothing techniques include the Savitzky-Golay transform (SG [35]), the Norris smoothing filter (NG [36]), and averaging of spectra [37]. SG smoothing eliminates the influences of ground interference noise and baseline float, thus enhancing the signal-to-noise ratio. NG smoothing removes the effects of particle-size variation when the soil samples vary in texture, moisture, and grain size [32].
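As a minimal illustration of SG smoothing, the sketch below applies the classical 5-point quadratic Savitzky-Golay kernel to a spectrum sampled at equal wavelength steps. The window length, the polynomial order, and the choice to leave the two samples at each edge untouched are assumptions made for brevity; they are not settings taken from the studies cited above.

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5_SMOOTH = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol5(values):
    """Smooth an evenly sampled spectrum; the two edge samples on each side are kept."""
    out = list(values)
    for i in range(2, len(values) - 2):
        out[i] = sum(c * values[i + k - 2] for k, c in enumerate(SG5_SMOOTH))
    return out

# A quadratic trend passes through unchanged, while sample-to-sample noise shrinks.
trend = [0.2 + 0.001 * i * i for i in range(10)]
smoothed_trend = savgol5(trend)
noise = [(-1) ** i for i in range(10)]
smoothed_noise = savgol5(noise)
```

Because the kernel fits a local quadratic, broad absorption features are preserved while high-frequency noise is attenuated, which is the behavior the text attributes to SG smoothing.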

Derivative Algorithms.
Derivative algorithms can rapidly identify the characteristic positions of spectral minima, maxima, and inflection points [32]. Additionally, the effect of variation in optical setup and sample grinding is eliminated after derivative transformation [38]. Because spectral noise tends to be amplified by the derivative transform, a smoothing technique is often applied before the derivative algorithm [37]. The first-derivative curve, for example, is better at discriminating goethite and hematite and estimating their abundance, with two peaks at 435 and 535 nm for goethite and a single absorption at ∼570 nm for hematite (Figure 3) [39].
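A first derivative with built-in smoothing can be sketched with the 5-point quadratic Savitzky-Golay derivative kernel, as below. The kernel choice, the evenly spaced wavelength grid, and the synthetic reflectance edge centered at 535 nm are assumptions for illustration only; the edge is a stand-in shape, not a measured goethite spectrum.

```python
import math

# 5-point quadratic Savitzky-Golay first-derivative coefficients (per sample step).
SG5_DERIV = (-2, -1, 0, 1, 2)

def first_derivative(wl, values):
    """Return (wavelengths, dR/dwavelength) for the interior samples of an even grid."""
    step = wl[1] - wl[0]
    centers, deriv = [], []
    for i in range(2, len(values) - 2):
        slope = sum(c * values[i + k - 2] for k, c in enumerate(SG5_DERIV)) / (10 * step)
        centers.append(wl[i])
        deriv.append(slope)
    return centers, deriv

# Hypothetical reflectance edge whose steepest rise sits at 535 nm.
wavelengths = list(range(400, 701, 5))
edge = [1 / (1 + math.exp(-(w - 535) / 10)) for w in wavelengths]
wl_d, d1 = first_derivative(wavelengths, edge)
peak_nm = wl_d[d1.index(max(d1))]
```

The position of the derivative maximum (`peak_nm`) marks the inflection point of the edge, which is how derivative peaks are read in the first-derivative curves discussed above.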

Fe-Oxide Minerals.
Fe-oxide minerals are known to be pedogenic indicators for investigating soil temperature and moisture regimes, which are directly related to pedogenic climate evolution [24,40]. Fe-oxide minerals are the main active components in the VNIR region (350-1000 nm) since most electron transitions are caused by various kinds of iron oxides [41,42]. The most common Fe-oxide minerals in soils are goethite (α-FeOOH) and hematite (α-Fe₂O₃), which can track climate change [43,44]. Goethite and hematite exhibit diagnostic spectral features in the VNIR region, and the absorption bands are generally broad and smooth (Figure 3). A strong absorption band near 920 nm indicates the presence of goethite (Figure 3(a)), and four absorption bands at 420, 480, 600, and 1700 nm can be used to map its distribution [39]. Hematite is dominated by three absorption bands at 520, 650, and 880 nm [45]. Both goethite and hematite have an absorption band at around 500 nm (480 nm for goethite and 520 nm for hematite); the band for goethite is narrow with intense reflectance, while the band for hematite is wide with low reflectance (Figure 3(a)). The absorptions in the VNIR region cause the vivid colors of Fe oxides, for example, yellow goethite and red hematite [37]. For a spectral curve representing a soil mixture, the width of the absorption band at ∼870 nm (W870) is greater when the soil sample contains more Fe oxides [46]. The concave shape of the 800-1000 nm range indicates the crystallinity of the Fe-oxide minerals. When a soil sample is composed of well-crystallized minerals, the corresponding spectrum reveals a symmetric and deeper feature in this range [47].

Clay Minerals.
Clay minerals are frequently used as climatic indicators since their nature is directly influenced by the temperature and amount of precipitation at the site during pedogenesis [9,48]. As climate conditions shift from cool/dry to warm/moist, the dominant clay minerals go from chlorite/illite → vermiculite → montmorillonite → kaolinite [24,49]. The dominant clays in soils show diagnostic absorptions in the SWIR domain [39]. These absorption bands are caused by vibrational transitions and commonly display sharp and narrow features (Figure 4). The diagnostic bands are mainly located at ∼1400 nm (overtones caused by OH), ∼1900 nm (overtones caused by molecular water), and ∼2200 nm (combination tones caused by Al-OH [50,51]).
Additionally, some weak absorption bands in the 2300-2500 nm region are related to the presence of Fe-OH and/or Mg-OH in the clay minerals [24]. The spectral characteristics of some clay minerals are shown in Figure 4 and Table 1. Chlorites are a group of clay minerals containing specific octahedral cations such as Fe, Mg, and Al [52]. Their reflectance spectra exhibit a weak absorption band at approximately 1400 nm and triple absorption features near 2300 nm. The bands at 2250 and 2350 nm are related to Fe-OH and Mg-OH, respectively [53]. Illite is characterized by three prominent absorptions at ∼1400, ∼1900, and ∼2200 nm. Two secondary diagnostic Al-OH absorption peaks close to 2344 and 2445 nm are modified by Fe and Mg Tschermak cation exchange [24,31]. Vermiculite has two broad absorptions at 1400 and 1900 nm and two weak absorptions near 2200 and 2300 nm [39]. Montmorillonite has three strong and sharp absorption bands at ∼1400, ∼1900, and ∼2200 nm, which are similar to but generally stronger than those of illite.
Additionally, the combination bands produced by the vibrations of absorbed water cause weak shoulders near 1468 nm and 1970 nm in montmorillonite spectra [37]. Kaolinite is characterized by two spectral doublets: one near 1400 nm (1390 and 1410 nm) and the other near 2200 nm (2160 and 2210 nm).
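The band positions listed above lend themselves to a simple lookup. The sketch below stores them in a dictionary and ranks candidate clay minerals by the fraction of their listed bands that are matched by a set of observed absorption positions. The matching tolerance of 15 nm, the scoring rule, and the function name are hypothetical choices for illustration; real identification also relies on band shape and relative depth, which this sketch ignores.

```python
# Diagnostic band positions (nm) as listed in the text above.
DIAGNOSTIC_BANDS_NM = {
    "chlorite": [1400, 2250, 2350],
    "illite": [1400, 1900, 2200, 2344, 2445],
    "vermiculite": [1400, 1900, 2200, 2300],
    "montmorillonite": [1400, 1900, 2200],
    "kaolinite": [1390, 1410, 2160, 2210],
}

def rank_candidates(observed_nm, tolerance_nm=15):
    """Rank minerals by the fraction of their listed bands found in the observations."""
    scores = {}
    for mineral, bands in DIAGNOSTIC_BANDS_NM.items():
        hits = sum(
            any(abs(obs - band) <= tolerance_nm for obs in observed_nm)
            for band in bands
        )
        scores[mineral] = hits / len(bands)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical observed band centres showing the two kaolinite doublets.
ranked = rank_candidates([1391, 1412, 2162, 2208])
```

With these hypothetical observations, kaolinite is the only mineral whose listed bands are all matched, because the other clays lack a match near 1900 nm or beyond 2250 nm.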

Carbonates.
In soils, carbonates are leached from the surface with time and accumulate in the subsoil at a certain depth [54]. The presence of carbonate is widely used as a basic soil characteristic to describe soil types and quantify soil erosion [22]. Carbonates are characterized by several absorptions in the VNIR domain, caused by overtones and combinations of fundamental vibrations of the CO₃²⁻ ion (Figure 5) [31,37]. A strong absorption band at ∼2350 nm and three weak absorption bands at ∼1900, ∼2000, and ∼2160 nm were reported by Hunt and Salisbury [55] for carbonates in the NIR region, with the ∼2350 nm absorption showing an obvious double-band structure (Figure 5).

Prediction from the Continuum Removal Spectra.
As discussed in Section 3.1.1, several geometrical features of the absorption bands can be extracted through the continuum removal method. Those parameters (e.g., P, D, and AS) from the continuum removal spectra are key to characterizing and predicting mineral compositions in soils. Viscarra Rossel et al. [23] quantitatively estimated mineral composition by using the continuum removal method. Soil minerals such as kaolinite, illite, Al-smectite, goethite, and hematite were considered in that study, and the parameter D was selected for prediction. The spectroscopic predictions were generally consistent with those interpreted from XRD analysis. According to Dufrechou et al. [20], the parameter D at ∼1400, ∼1900, and ∼2200 nm was strongly affected by the amounts of kaolinite, illite, and montmorillonite in soil mixtures. Additionally, the estimation of montmorillonite abundance proved reliable when compared with XRD results. Five parameters (P, D, W, F, and AS) were used in the work by Zhao et al. [24] to assess the utility of the continuum removal method. That study compared these parameters with the results from both XRD and DRS analyses and found that some of the parameters are good predictors of mineral content. Furthermore, some parameters (e.g., AS at ∼2200 nm) were confirmed as reliable proxies for soil weathering and paleoclimate reconstruction.

Chemometric Methods
VNIR spectra of soil mixtures are commonly weak and nonspecific due to (1) low concentrations of particular soil minerals, (2) scatter effects caused by soil structure, (3) overlapping absorptions of soil attributes, and (4) influences of specific constituents such as quartz [37]. All of these factors pose a challenge for VNIR analyses. Therefore, useful information needs to be mathematically extracted from the spectra and correlated with soil attributes [45]. The development of VNIR in soil studies would have been impossible without the parallel application of chemometric methods [56].
Building a predictive soil mineral abundance model (i.e., multivariate calibration) is an important first step in chemometric analysis. Overall, we should understand the data and the objective of the modeling prior to building a model. Then, the spectral dataset is preprocessed and subdivided. Finally, we can proceed to build, evaluate, and select models [57].

Prior to Model Building.
The first step in any model building process for the study of spectral pedology is to understand the characteristics of the dataset. Three main aspects need to be considered [57]: (1) understanding the distribution of the responses (i.e., outcomes): the responses are either numerical or categorical. In the model building process for soil mineral analysis, the outcomes (e.g., contents of clay/Fe-oxide minerals) are described numerically. Understanding the characteristics of the responses provides better ways of partitioning the data into calibration and validation sets; (2) understanding the nature of the predictors: the predictors in the spectral dataset are numerical, since they are usually the spectral signals between 350 and 2500 nm. In fact, these predictors are highly correlated, leading to numerically redundant information. Different predictors are suitable for different kinds of models. For example, partial least squares can be used for correlated predictors, while recursive partitioning can manage missing predictor information [58]; (3) the relationship between the number of predictors (P) and the number of samples (N): when building a model for a soil mineral study, the dataset commonly has far fewer samples (N < 200) than predictors (P > 2000). Therefore, a model that can handle datasets where N < P is preferred.
After understanding the dataset, a preprocessing procedure is often used to improve the performance of the model [47,59]. For a model used in a soil mineral study, the data transformations for multiple predictors include the following methods: (1) data reduction: principal component analysis (PCA [60]) is a commonly used data reduction technique. In this technique, the dimensionality of the dataset is greatly reduced by seeking principal components (PCs), that is, linear combinations of the predictors that capture the greatest possible variance; (2) removing predictors: in some cases, removing predictors prior to modeling has potential advantages. For example, Adeline et al. [59] showed that the performance of the predictive models was globally stable and accurate when the spectral resolution decreased from 3 nm to 60 nm. Additionally, for a model based on a spectral signal dataset, the spectra are transformed to apparent absorbance, A = log(1/R), prior to developing a regression model, and the spectral preprocessing methods discussed in Section 3 also have the potential to improve model performance [18,23].
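Two of the transformations mentioned above are simple enough to sketch directly: the conversion of reflectance R to apparent absorbance A = log(1/R), and a reduction of spectral resolution by averaging adjacent bands. The base-10 logarithm, the block-averaging scheme, and the function names are assumptions for illustration; the cited studies may have resampled their spectra differently.

```python
import math

def to_absorbance(reflectance):
    """Apparent absorbance A = log10(1/R); assumes reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]

def coarsen(values, factor):
    """Average non-overlapping blocks of `factor` bands; a trailing partial block is dropped."""
    usable = len(values) - len(values) % factor
    return [sum(values[i:i + factor]) / factor for i in range(0, usable, factor)]

# Hypothetical reflectance values and band indices.
absorbance = to_absorbance([1.0, 0.1, 0.5])
coarse = coarsen([1, 2, 3, 4, 5, 6, 7], 3)
```

Averaging blocks of 20 bands, for instance, would take a 3 nm grid to a 60 nm grid, which shrinks the predictor set by the same factor.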

Candidate Models.
Once we fully understand the dataset, the next step is to set up several candidate models. The most commonly used type of model in soil mineral analysis is a regression model, which is defined as a model that predicts numerical outcomes [57]. Establishing a regression model related to the soil VNIR spectral data is the basic role of chemometric analysis [61]. Regression models are subdivided into linear and nonlinear regressions. Linear regressions are the dominant calibration methods for spectral pedology and include partial least squares regression (PLSR [62]) and principal component regression (PCR [63]). Nonlinear data are handled by data mining techniques, namely, multivariate adaptive regression splines (MARS [38]), neural networks (NN [64]), and regression tree analysis (RTA [65]).

Linear Regression Models.
Both PLSR and PCR can deal with predictors that are highly collinear and are effective in situations where the number of predictors is far beyond the number of available samples [37]. Furthermore, PLSR and PCR are closely related and share similar prediction errors in most situations [61]. Regardless, the PLSR algorithm is usually preferred in spectral pedology analysis because (1) it maximizes covariance between response variables and predictors so that the model is more interpretable, and (2) it is a faster algorithm [45].
PLSR has been widely and successfully used in predicting the mineralogic compositions and weathering levels of soils. Viscarra Rossel et al. [2] accurately predicted the concentrations of kaolinite, illite, and smectite (R² = 0.94, 0.96, and 0.92, respectively) in mineral mixtures, although the prediction for Fe oxides was biased relative to the measurements. Summers et al. [66] and Ostovari et al. [67] showed that the PLSR method is good at predicting CaCO₃ content, with R² values of 0.69 and 0.71 for soil samples from Australia and Iran, respectively. The total clay content and free iron in soils have also been shown to be predictable attributes with the PLSR model [68,69].
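To make the PLSR idea concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm in plain Python. It is a teaching sketch, not the implementation used in the cited studies: the function names and the tiny synthetic dataset are hypothetical, and a real spectral calibration would use thousands of predictors and a number of components chosen by validation.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centred copies that are deflated after each extracted component.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(load)
        coefs.append(q)
    return x_mean, y_mean, weights, loadings, coefs

def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on its centred spectrum."""
    x_mean, y_mean, weights, loadings, coefs = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in zip(weights, loadings, coefs):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat

# Hypothetical data: three "bands" and a response that is exactly linear in them.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 1], [5, 8, 4], [6, 2, 3]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
model = pls1_fit(X, y, n_components=3)
prediction = pls1_predict(model, [2, 2, 2])
```

Each component is the direction in predictor space with maximum covariance with the response, which is the property that lets PLSR cope with collinear bands. With as many components as independent predictors, the fit coincides with ordinary least squares, so the noise-free example is reproduced exactly.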

Nonlinear Regression Models.
The use of models that are inherently nonlinear in nature (i.e., data mining techniques) has gained increasing attention in recent years [37,61]. A more detailed description of the nonlinear models is available in Kuhn and Johnson [57]. Previous studies have suggested that nonlinear regression models or the combination of nonlinear and linear models may provide better predictions for soil properties. Mouazen et al. [70] showed that a combined PLSR-NN model was better at predicting soil properties than a PLSR model. Viscarra Rossel and Behrens [45] proposed that the combined FS_VIP-ANN and FS_MARS-ANN models were the best models for predicting clay content, pH, and soil organic carbon (SOC) when both the parsimony and accuracy of the model were taken into consideration. Mulder et al. [71] determined the mineral composition of a soil by coupling an RTA model with exponential Gaussian optimization results. The abundances of kaolinite and calcite were predicted with acceptable RMSE values (<0.1) in both laboratory and field samples.

Model Evaluation.
Two techniques are commonly used to test the prediction performance of the model [33,56]. In the first, the soil spectral database is randomly divided into a calibration dataset and a validation dataset [72]. The calibration dataset (usually ∼2/3 of the complete database) is used to derive the model, while the validation dataset (commonly the remaining 1/3) is set aside exclusively to validate the derived model. This process is used to obtain realistic estimates of prediction accuracy. The second is a procedure called cross validation. It uses the "leave-group-out" method (namely, the repeated random subsampling validation method [73]) to verify the predictive capability for the calibration dataset. A calibration dataset containing X samples is drawn from the total database of N samples (N ≥ X + 1), and the soil property values of the other N−X samples are predicted for validation. The prediction of relative soil mineral abundance is obtained by repeating the cross-validation process [74].
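Both partitioning schemes can be sketched with the standard library. The function names, the fixed random seeds, and the example sizes (30 samples, 5 repeats) are hypothetical; the ∼2/3 calibration fraction follows the text.

```python
import random

def split_calibration_validation(n_samples, cal_fraction=2 / 3, seed=0):
    """Randomly assign sample indices to a calibration set and a validation set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_cal = round(n_samples * cal_fraction)
    return sorted(idx[:n_cal]), sorted(idx[n_cal:])

def leave_group_out(n_samples, n_cal, repeats, seed=0):
    """Repeated random subsampling: yield (calibration, validation) index lists."""
    rng = random.Random(seed)
    for _ in range(repeats):
        cal = set(rng.sample(range(n_samples), n_cal))
        val = [i for i in range(n_samples) if i not in cal]
        yield sorted(cal), val

cal_idx, val_idx = split_calibration_validation(30)
folds = list(leave_group_out(30, n_cal=20, repeats=5))
```

In the leave-group-out scheme, a model would be refitted on each calibration subset and its errors on the held-out samples averaged over the repeats.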
ParLeS version 3.0 is commonly used for multivariate calibration [75]. The bias and accuracy of the prediction models are assessed by the adjusted coefficient of determination (R²) between observed and predicted values, the mean error (ME), and the root mean-square error (RMSE):

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i' - Y_i\right), \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i' - Y_i\right)^2},$$

where N is the number of samples in the dataset, Y_i is the observed value, and Y_i′ represents the predicted value [18,23]. A compromise between model parsimony and model accuracy is needed to find the most satisfactory model [76]. The Akaike information criterion (AIC) is suggested for selecting the best-performing algorithm [45]:

$$\mathrm{AIC} = n \ln(\mathrm{RMSE}) + 2p,$$

where p is the number of factors, and n is the number of samples used in the prediction. The best model will have the minimum AIC value.
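The assessment statistics can be sketched as follows. Several conventions here are assumptions rather than statements from the source: ME is taken as predicted minus observed, R² is computed as one minus the ratio of residual to total sum of squares (without the adjustment for model size), and AIC is taken as n ln(RMSE) + 2p. The observed and predicted values are invented for illustration.

```python
import math

def mean_error(observed, predicted):
    """Mean error (bias), here taken as predicted minus observed."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

def r_squared(observed, predicted):
    """Unadjusted coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def aic(observed, predicted, n_factors):
    """Assumed form AIC = n * ln(RMSE) + 2 * p; lower values are preferred."""
    return len(observed) * math.log(rmse(observed, predicted)) + 2 * n_factors

# Hypothetical observed and predicted mineral contents.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
summary = {
    "ME": mean_error(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r_squared(observed, predicted),
    "AIC": aic(observed, predicted, n_factors=3),
}
```

Because the 2p term penalizes each additional factor, two models with similar RMSE are separated in favor of the more parsimonious one.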

Feature Selection.
Feature selection is mainly applied to remove redundant and/or noninformative predictors from the model [57] and may improve model accuracy. Some models such as PLSR, MARS, and RTA provide a feature selection procedure by default. The variable importance in the projection (VIP) and b-coefficient scores obtained from the PLSR model help us measure the statistical significance of predictors and select the most important ones [77]. The VIP score of the kth predictor is calculated as follows:

$$\mathrm{VIP}_k = \sqrt{K\,\frac{\sum_{a=1}^{a_{\mathrm{opt}}} R_a^2\, w_{ak}^2}{\sum_{a=1}^{a_{\mathrm{opt}}} R_a^2}},$$

where K represents the total number of predictor variables, a_opt is the optimal number of latent variables selected by the PLSR model, w_ak is the loading weight of the kth predictor on the ath latent variable, and R²_a represents the adjusted coefficient of determination of the ath latent variable in the PLSR model [78]. A predictor (such as a wavelength) is selected and considered to be very important if (1) its VIP exceeds the threshold value of one (Chong and Jun [77]) and (2) its b-coefficient is higher than the b-coefficient based on all spectral bands [23].
According to Gomez et al. [22], the important spectral bands selected by the PLSR model are related to the presence of clay minerals such as kaolinite and illite. Additionally, the surrogate spectral features selected by the VIP and b-coefficient approach contain enough information to satisfactorily estimate the studied soil attributes. According to Viscarra Rossel and Behrens [45], a combined NN and feature selection model (FS_VIP-ANN) is the best method to predict clay content and pH and produces smaller RMSE and AIC values.
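A VIP calculation can be sketched from the quantities named above. The formula used here is one common formulation and should be read as an assumption: each latent variable's squared, length-normalized loading weights are weighted by the share of response variance that variable explains, scaled by the number of predictors K, and square-rooted. The function name and the toy weights are hypothetical.

```python
import math

def vip_scores(weights, r2_by_component):
    """VIP per predictor from PLS loading weights (one list per latent variable)
    and the response variance explained by each latent variable."""
    n_predictors = len(weights[0])
    total = sum(r2_by_component)
    scores = []
    for k in range(n_predictors):
        acc = 0.0
        for w, r2 in zip(weights, r2_by_component):
            norm_sq = sum(v * v for v in w)  # normalise each weight vector
            acc += r2 * (w[k] * w[k]) / norm_sq
        scores.append(math.sqrt(n_predictors * acc / total))
    return scores

# Toy case: two predictors, two latent variables explaining 75% and 25% of y.
vip = vip_scores([[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
selected = [k for k, score in enumerate(vip) if score > 1.0]
```

With this formulation the squared VIP scores average to one, which is why a threshold of one separates predictors of above-average importance from the rest.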

Comparison between Spectral Measurement and Multivariate Calibration
The relative abundance of minerals in a soil sample can be predicted either by spectral analyses (e.g., continuum removal) or chemometric methods (e.g., PLSR and NN) [20,24,45,47]. Although both types of methods correlate the spectral signal with information about the soil minerals, they differ in many ways, including their focused spectral bands, complexity, and how they are applied.

Focused Spectral Bands.
Spectral analyses focus on specific absorption bands representative of the corresponding soil minerals, while the multivariable regression algorithms commonly use the signals from the whole 350-2500 nm region. In some cases, the 350-400 nm and 2450-2500 nm ranges, which have low instrumental signal-to-noise ratios, are removed [59,79]. Therefore, a multivariable regression model deals with over 1000 spectral bands, many more than the number of focused bands in a continuum removal study. Moreover, several geometric parameters can be extracted from each band in a spectral measurement, including P, W, D, F, and AS, whereas only the depth information for each band can be gleaned from a chemometric study. Note that some algorithms intrinsically provide a feature selection method (e.g., stepwise multiple linear regression (SMLR) and PLSR), and it has been shown that the most important features selected by a regression model are the ones that deserve the most attention in a spectral measurement study [22].

Complexity.
Theoretically, multivariate calibration is very complicated because it involves a large number of algorithms and because different algorithms have the potential to be combined into better predictive models, depending on the situation [45,70,79]. However, in practice, multivariate modeling and prediction are not that complicated. Thanks to the development of executable and fast-running software such as ParLeS and Unscrambler [75,80], the difficult calculation process can be done much more easily. On the other hand, spectral measurement studies take more time because we must (1) identify a soil mineral based on its spectral features, (2) extract parameters from the bands, and (3) relate those parameters to the information about the soil mineral.

Application Preference.
The geometric features of the spectra are more suitable for monitoring the molecular structural changes of soil minerals, since the variations of the absorption bands are caused by electron transitions (e.g., Fe²⁺ to Fe³⁺) and molecular vibrations (e.g., Al-OH versus H₂O). Thus, spectral measurement is widely and successfully applied to (1) measure mineral physicochemistry that is sensitive to changes in metamorphic grade [53,81], (2) map and monitor the erosion, deposition, and weathering of minerals [24,51], and (3) explore water and potential life on extraterrestrial objects [10,21]. Chemometric methods are more often used in monitoring overall soil properties, since almost all of the signals in the VNIR domain are involved in the modeling process. Several soil attributes have been successfully determined by an appropriate multivariate calibration technique, including soil clay [23,69], organic matter [32,67], and nitrogen content [82,83]. Table 2 reviews some soil mineralogic attributes predicted by VNIR spectroscopy using either chemometric analysis or spectral-based measurement. In this summary, most of the studies used soil samples for analysis, and many of them cover diverse soil types (Table 2). The predictions of the soil properties are still good when there is a great range of soil types (e.g., [22,45,84,85]). A single mineral (e.g., kaolinite and goethite) is more precisely predicted when mineral mixtures are used in the measurements [2,86]. The studies in Table 2 include both data collected in the laboratory and data based on field soil sensing. In the lab, the sample pretreatment and illumination conditions can be controlled to eliminate the influences of the moisture and the grain size of the soil sample [18].
In the field, by contrast, VNIR spectroscopy may be affected by many potential problems such as variable distances between the sensor and the soil, the smearing of soil surfaces, the size of the soil aggregates, and the amount of moisture [87]. These potential problems may reduce the prediction accuracy of field-based analysis [22,85]. However, field-based VNIR spectroscopy is more attractive because it (1) allows in situ analysis of soil properties, with promising results reported in previous studies [87], and (2) reduces the cost of the measurement by simplifying the sample preparation. Based on the results of the studies, PLSR has proved to be the most robust soil mineralogic analysis method amongst all of the multivariate calibrations (Table 2; 0.43 < R² < 0.96). The CR-based model is good at predicting clay mineral concentration (e.g., [20,47,88]; R² > 0.79). In some cases, the nonlinear models (NN and MARS) give better estimates of soil mineralogy than the PLSR model (e.g., [45,89]). In general, when a soil mineral is investigated by spectroscopy, the PLSR and the CR-based models are the most promising methods to provide estimates of mineral abundance.

Conclusions and Future Research Directions
Clay minerals in soils are more complex and less well crystallized than those in sedimentary rocks. Traditional characterization methods such as XRD are usually expensive and time-consuming, whereas VNIR is a quick, cost-efficient, and nondestructive technique for analyzing the soil mineralogic properties of large datasets. The major strength of VNIR in soil mineralogy studies is that there is a direct relationship between soil minerals and their spectra, since the diagnostic absorption bands of soil minerals lie within the VNIR region. Therefore, the nature of soil mineralogy can be approached through both spectral measurement and multivariate calibration. Spectral measurement focuses on geometric information extracted from several bands (e.g., 350-400, ∼1900, ∼2200, and 2450-2500 nm) that relate to soil minerals. The parameters derived from the continuum removal method are mainly used for mineral identification and prediction. In a multivariate calibration analysis, the dataset contains the entire VNIR domain. The most robust model for soil mineral estimation is selected after understanding the data, data preprocessing, candidate model building, and performance assessment.
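The spectral-measurement route summarized above can be illustrated with a minimal sketch of continuum removal followed by a band-depth calculation. It assumes that the continuum is the upper convex hull of the reflectance spectrum, which is one common definition; the text does not specify how the continuum is fitted. The function names are hypothetical, and the spectrum is synthetic: a linear baseline with a single Gaussian absorption placed near 2200 nm.

```python
import math


def upper_hull(points):
    """Upper convex hull of (x, y) points sorted by increasing x."""
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the hull (continuum) value beneath it."""
    hull = upper_hull(list(zip(wavelengths, reflectance)))
    removed, seg = [], 0
    for x, r in zip(wavelengths, reflectance):
        # Advance to the hull segment that spans the current wavelength.
        while seg < len(hull) - 2 and x > hull[seg + 1][0]:
            seg += 1
        (x1, y1), (x2, y2) = hull[seg], hull[seg + 1]
        continuum = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        removed.append(r / continuum)
    return removed


def band_depth(wavelengths, removed, lo, hi):
    """Return (depth, centre) of the deepest point between lo and hi (nm)."""
    in_band = [(c, w) for w, c in zip(wavelengths, removed) if lo <= w <= hi]
    c_min, w_min = min(in_band)
    return 1.0 - c_min, w_min


# Synthetic spectrum: linear baseline with a 20% deep absorption at 2200 nm.
wavelengths = list(range(2000, 2401, 10))
reflectance = [
    (0.5 + 0.0001 * (w - 2000)) * (1 - 0.2 * math.exp(-(((w - 2200) / 20) ** 2)))
    for w in wavelengths
]
removed = continuum_removed(wavelengths, reflectance)
depth, centre = band_depth(wavelengths, removed, 2100, 2300)
print(round(depth, 3), centre)
```

In a CR-based model, parameters such as this band depth, the band centre, or the band area would then serve as predictors of mineral abundance.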
Firstly, VNIR has developed greatly in soil sciences over the past several decades. However, no definitive results on theoretical calibrations have yet been obtained, because most soil studies are conducted on a regional scale and their results are therefore only regionally representative. Thus, it is essential to further develop theoretical calibrations of VNIR that are more suitable for soil samples worldwide, despite the difficulties caused by high soil variability across the globe.
Secondly, more field analyses are required to realize the full potential of VNIR. In situ data collection in the field is one of its advantages over conventional techniques. The heterogeneity of technical and environmental factors (e.g., soil moisture, soil surface condition, and biological residue) directly influences the characteristics of the absorption bands and increases the uncertainty of the spectral measurements. Nevertheless, multivariate calibration models for field data show good, or even better, mineral prediction than those for laboratory data. Systematic studies on the effects of field sampling and of variations in mineralogy, moisture, organic matter, and their interactions are still lacking. Therefore, future work should focus on these types of studies rather than on laboratory spectra.
Thirdly, VNIR may have the potential to help us investigate interactions between soil clay minerals and SOC. Mechanisms of SOC stabilization have attracted increasing interest due to their potential to influence the global carbon cycle. It is widely suggested that soil clay minerals play a central role in capturing and permanently sequestering atmospheric CO₂. Both clay content and clay mineral type exert important influences on carbon sequestration. Because VNIR is capable of characterizing most carbon- and hydroxyl-related properties, it should allow us to study clay-SOC interactions when combined with other common or state-of-the-art techniques.
Finally, integrated soil mapping is needed in future large-scale soil analysis. The VNIR spectrum contains integrative information on the soil attributes (e.g., mineral composition, SOM, SOC, pH, and moisture) that reflect the nature of a soil system. Thus, we could use VNIR to map soils. More collaborative and strategic spectral studies are needed to better understand the complete nature of soil [101,102]. Some global or national spectral libraries [103,104] have been established to build collaborative networks for soil spectroscopy, but more spectral libraries will facilitate the wider use of VNIR and make global-scale soil monitoring possible.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Notes to Table 2:
a Soil attributes: Clay T, total clay content; Clay 2:1, 2:1 clay mineral content; Clay K, kaolinite content; Clay I, illite content; Clay S, smectite content; Fe T, total iron content; Fe H, hematite content; Fe G, goethite content; C, carbonate content.
b Soil type: soil type is not given in the study (not mentioned); the soil samples in the study include more than three soil types (diverse).
c Sample treatment: oven-dried, sieved (<2 mm), and measured under laboratory conditions (lab/sieved (2)-oven-dried); oven-dried, sieved (<0.2 mm), and measured under laboratory conditions (lab/sieved (0.2)-oven-dried); air-dried, sieved (<2 mm), and measured under laboratory conditions (lab/sieved (2)-air-dried); fresh (wet and unprocessed) and measured under field conditions (field/fresh).
d n_cal/n_val shows the number of samples used in the spectral calibration and validation, respectively. CV indicates that the validation was conducted independently using a statistical cross-validation technique.
e Partial least squares regression (PLSR); multivariate adaptive regression splines (MARS); support vector regression (SVR); the discrete wavelet transform-artificial neural networks (DWT-ANN); continuum removal (CR); feature-based multiple linear regression (CR-MLR).
f Coefficient of determination.
g Root-mean-square error (g·kg⁻¹). -, RMSE is not given in the study.
h mg·kg⁻¹.
10 Journal of Spectroscopy