Visible and Near-Infrared Spectroscopy Analysis of a Polycyclic Aromatic Hydrocarbon in Soils

Visible and near-infrared (VisNIR) spectroscopy is becoming recognised by soil scientists as a rapid and cost-effective measurement method for hydrocarbons in petroleum-contaminated soils. This study investigated the potential application of VisNIR spectroscopy (350–2500 nm) for the prediction of phenanthrene, a polycyclic aromatic hydrocarbon (PAH), in soils. A total of 150 diesel-contaminated soil samples were used in the investigation. Partial least-squares (PLS) regression analysis with full cross-validation was used to develop models to predict the PAH compound. Results showed that the PAH compound was predicted well, with a residual prediction deviation of 2.0–2.32, a root-mean-square error of prediction of 0.21–0.25 mg kg⁻¹, and a coefficient of determination (r²) of 0.75–0.83. The mechanism of prediction was attributed to covariation of the PAH with clay and soil organic carbon. Overall, the results demonstrated that the methodology may be used for predicting phenanthrene in soils utilizing the interrelationship between clay and soil organic carbon.


Introduction
Polycyclic aromatic hydrocarbons (PAHs) are the class of hydrocarbons containing two or more fused aromatic rings. PAHs are found in large quantities in mineral oils such as diesel fuel #2 (Chemical Abstracts Service [CAS] no. 68476-34-6), simply referred to as diesel. PAHs are harmful to the environment because some are potentially carcinogenic or mutagenic [1]. The United States Environmental Protection Agency (USEPA) has classified sixteen PAHs (including phenanthrene) as priority pollutants for investigating PAH pollution in the environment (Figure 1).
These priority PAHs have molecular masses ranging from 128 g mol⁻¹ for naphthalene to 278 g mol⁻¹ for dibenzo[a,h]anthracene. The solubility, sorption, and vapour pressure characteristics of PAHs are important factors that control their distribution in the environment [1,2]. PAH compounds are known to exhibit very low water solubility; only a few are slightly soluble. Since PAHs exhibit strong hydrophobicity, they primarily sorb to the organic matter in the soil [2].
The concentration and compositional distribution of PAHs in an environmental sample such as soil are widely used to identify their origin or source by PAH diagnostic ratio analysis [3–5]. Knowledge of the nature and concentration of PAHs is vital in tackling their proliferation in the environment because it informs risk assessment and remediation. Conventional methods of quantifying PAHs in contaminated soils, such as gas chromatography-mass spectrometry (GC-MS), are highly sensitive and specific [6]. However, they are relatively expensive, involve time-consuming sample preparation protocols, and rely on noxious extraction solvents that tend to pose a health risk to operators [7]. This has prompted increasing demand for alternative methods that can overcome most of those challenges, without necessarily trading off instrument performance, to complement the conventional methods.
Figure 1: Chemical structures of the 16 United States Environmental Protection Agency (USEPA) priority PAH compounds. * Nonthreshold indicator compounds, also known to possess some genotoxic carcinogenic potential [8].
One of these emerging alternatives is VisNIR spectroscopy. VisNIR prediction is based on overtones and combinations of fundamental vibrations occurring in the mid-infrared region. The difficulty in interpreting VisNIR spectra because of broad and overlapping bands [9] has, to a large extent, been overcome by the use of advanced chemometrics and data-processing techniques [10]. When analysing spectroscopic data, multivariate calibration generally solves the problem of interference from compounds closely related to the target compound, thereby eliminating the need for selectivity [11]. The origin of VisNIR spectra of hydrocarbon derivatives is attributed to the combinations or overtones of C-H stretching modes of saturated CH2 and terminal CH3, or aromatic C-H (ArCH) functional groups [12]. The intensity and wavelength position of an absorption typify the origin of the spectral feature and can be used to identify the properties of the substance by multivariate analytical techniques [10]. The total petroleum hydrocarbon (TPH) content in hydrocarbon-contaminated soils has been predicted by VisNIR spectroscopy using various multivariate techniques [13–19]. A recent review shows that the quality of these TPH models has improved, suggesting a greater likelihood of using VisNIR spectroscopy as a screening tool for hydrocarbon-contaminated soils [7]. It must be pointed out that, because of the high variability in a sample or site matrix, more research is still necessary to achieve the level of accuracy prescribed by the industry and/or regulatory agencies for an online manufacturing system.
This is because the models reported so far appear to be local models developed for specific sampling sites or individual soil types. Nevertheless, these models may provide vital information for the development of a regional or global model for general application. Moreover, only benzo[a]pyrene out of the sixteen priority PAHs has been predicted by VisNIR spectroscopy, with good accuracy (average prediction accuracy = 78.9%) but moderate to high false-positive rates [20]. The high false-positive rates reported for benzo[a]pyrene by Bray et al. [20] and the absence of literature for the remaining PAHs accentuate the need for further research on the application of the approach for the determination of other PAHs in soils. As stated earlier, proper knowledge of the nature and concentration of individual PAHs in soils is crucial to risk-based assessment and remediation of petroleum release sites.
The objective of the current study was to evaluate the prediction accuracy of phenanthrene in diesel-contaminated soils by VisNIR spectroscopy and PLS regression analysis.
The Scientific World Journal
The first batch, consisting of five groups of samples, was given various treatments including drying, sieving, and wetting before contamination with diesel (graded set). The second batch, which is the sixth group, was contaminated with diesel without drying and sieving. Each group consisted of 25 samples, all taken from one field, except for the sixth group, which was made up of 5 samples from each field. This sixth group was used as field-moist intact samples, representing a mixed moisture content set. In all, a total of 150 soil samples were prepared. Drying was done by the oven-drying method at 105 ± 5 °C for 24 h, while sieving was done with a 2 mm mesh. Wetting was done by adjusting the moisture content of the dried samples to 0 (dry), 5, 10, 15, and 20% (w/w) with an adjustable-volume pipette (Eppendorf UK Ltd., Stevenage, UK). In wetting the samples, 5 samples in each group were adjusted to the same moisture content. Samples were contaminated with 852, 1136, 1420, 1705, and 1989 mm³ of diesel with an adjustable-volume pipette. The expected concentrations of the added diesel were 30,000, 60,000, 90,000, 120,000, and 150,000 mg kg⁻¹ of soil, respectively. Similarly, 5 samples in each group were contaminated with the same oil concentration. Water (graded set only) and diesel were then added to the samples. The weight of each sample was approximately 25 g.

Reference Laboratory Analysis of Soil Physicochemical Properties.
Soil particle size distribution was determined by laser diffraction with a Mastersizer 2000 (Malvern Instruments, Worcestershire, UK) coupled to a HydroMu dispersing unit (Malvern Instruments, Worcestershire, UK). We used the United States Department of Agriculture (USDA) soil textural classification scheme to determine the soil textural classes on the basis of percent clay, silt, and sand. Soil moisture content (w/w), on a dry basis, was determined by the oven-drying method at 105 ± 5 °C for 24 h. Soil organic carbon was determined by the standard operating procedures of Cranfield University based on British Standard 7755 Section 3.8 : 1995 [22] with a Vario EL III Analyzer (Elementar Analysensysteme, Hanau, Germany) (see Table 1). We extracted the PAH compound from the diesel-spiked soil samples by the sequential ultrasonic solvent extraction method [23] with a mixture of dichloromethane (DCM) and hexane (1 : 1). Cleanup of the PAH extract was carried out with a microscale column of ∼0.6 g Florisil (Fisher Scientific Ltd., Loughborough, UK) on glass wool, prewashed with DCM. PAH analysis was carried out with a 6890N Network Gas Chromatographic System (Agilent Technologies Inc., USA) coupled to a 5973 Network Mass Selective Detector (MSD) (Agilent Technologies Inc., USA) operated at 70 eV in positive ion mode. The PAH compound was quantified by the internal standard method. The instrument was calibrated beforehand with a 5-level calibration solution mix. The calibration solution mix was made up with an EPA 525 PAH Mix-A standard solution (Sigma-Aldrich Co. Ltd., Dorset, UK) and deuterated PAH internal standard solutions: naphthalene-d8, anthracene-d10, chrysene-d12, and perylene-d12 (Sigma-Aldrich Co. Ltd., Dorset, UK). Quantification of PAH was performed by integrating the peak at a specific mass-to-charge ratio (m/z) by means of MSD ChemStation.
Major hydrocarbon compounds were identified on the basis of their retention time and by comparing them to those of analytical standards. Matrix spikes, duplicates, solvent, and method blanks were also analysed as quality control samples.

Optical Scanning of Soil Samples.
Diffuse reflectance spectra were taken from the soil samples with a mobile fiber-optic LabSpec2500 VisNIR spectrophotometer (350-2500 nm) (Analytical Spectral Devices Inc., USA) coupled to a high-intensity probe (Analytical Spectral Devices Inc., USA). The spectrophotometer has one Si array (350-1000 nm) and two Peltier-cooled InGaAs detectors (1000-1800 nm and 1800-2500 nm). The spectral sampling interval of the instrument was 1 nm across the entire spectral range. However, the spectral resolution was 3 nm at 700 nm and 10 nm at 1400 and 2100 nm. The high-intensity probe has a built-in light source made of a quartz-halogen bulb of 2727 °C.

The light source and detection fibres are assembled in the high-intensity probe, enclosing a 35-degree angle. Before the soil samples were scanned, and at intervals of 30 min, white referencing with a Spectralon disc of almost 100% reflectance was carried out to optimize the instrument. Scans were taken from the soil sample, tightly packed and leveled in a cuvette, at three equidistant positions 120° apart. Each sample was scanned nine times, three times per spot, and the scans were averaged for spectral preprocessing and multivariate analysis.

Spectral Preprocessing.
Spectral preprocessing aims to reduce spurious peaks that contain no physical or chemical information and to correct physical scatter effects [11]. Noise at the extremes of the spectrum (i.e., at 350-449 nm and 2451-2500 nm) was removed because of low instrument sensitivity at these wavelengths. Spectral truncation was followed by smoothing, in which adjacent 5 nm wavelengths were averaged to reduce the impact of noise. Thus, the VisNIR wavelength range for modeling was 452-2450 nm, consisting of 401 wavelengths. To remove the additive baseline shift, the spectra were transformed by the Savitzky-Golay first derivative with a polynomial order of two and two smoothing points. This was implemented for all sample subsets using the Unscrambler 9.8 (CAMO Software AS, Oslo, Norway).
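The preprocessing chain described above (truncation, 5 nm averaging, Savitzky-Golay first derivative) can be illustrated with a minimal NumPy sketch. This is not the Unscrambler implementation: the function names are hypothetical, the trailing incomplete block is assumed to be averaged on its own (which happens to reproduce the 401 wavelengths from 452 to 2450 nm), and "two smoothing points" is assumed to mean two points on each side of the centre, i.e. a five-point window.

```python
import numpy as np

def truncate(wl, spectra, lo=450.0, hi=2450.0):
    """Drop the noisy extremes (350-449 nm and 2451-2500 nm)."""
    keep = (wl >= lo) & (wl <= hi)
    return wl[keep], spectra[:, keep]

def block_average(wl, spectra, width=5):
    """Average adjacent `width` wavelengths; a short last block is averaged as is (assumption)."""
    starts = range(0, wl.size, width)
    wl_avg = np.array([wl[i:i + width].mean() for i in starts])
    sp_avg = np.column_stack([spectra[:, i:i + width].mean(axis=1) for i in starts])
    return wl_avg, sp_avg

def savgol_first_derivative(spectra, half_width=2, polyorder=2):
    """Savitzky-Golay first derivative per sampling step (window = 2 * half_width + 1)."""
    z = np.arange(-half_width, half_width + 1)
    design = np.vander(z, polyorder + 1, increasing=True)
    # Row 1 of the pseudo-inverse gives the weights for the linear term of the
    # local polynomial fit, which equals the derivative at the window centre.
    weights = np.linalg.pinv(design)[1]
    # Reversing the weights turns np.convolve into a sliding dot product.
    return np.apply_along_axis(
        lambda s: np.convolve(s, weights[::-1], mode="valid"), 1, spectra
    )
```

The derivative is returned only where the full window fits, so each spectrum loses `half_width` points at either end; how the original software treated the edges is not stated in the text.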

Partial Least-Squares (PLS) Regression Analysis.
Before calibration, spectral reflectance (R) was transformed to absorption, that is, the logarithm of the reciprocal of reflectance, log(1/R) [11]. PLS regression analysis is a bilinear modeling method in which information in the original X-data is projected onto a small number of underlying ("latent") variables called PLS components. The Y-data are actively used in estimating the "latent" variables to ensure that the first components are those that are most relevant for predicting the Y-variables. Interpretation of the relationship between X- and Y-data is then simplified, as this relationship is concentrated on the smallest possible number of components (latent variables, LV). More detailed information about PLS can be found in Martens and Naes [24]. We used PLS regression analysis with full cross-validation to relate the variation in a single-component variable (e.g., PAH) to the variation in a multicomponent variable (e.g., wavelength) with Unscrambler 9.8 (CAMO Software AS, Oslo, Norway). In this study, two categories of PLS models were developed. In the first category, models were developed for each oil treatment level as well as for the field-moist samples. To do this, the VisNIR spectra of the graded samples were separated into subgroups of 25 spectra according to oil treatment level, regardless of the moisture content. Then, the 25 spectra and chemical variables were used together to develop a PLS regression model for each treatment level. This was to determine and compare the ability of the technique to predict PAH across the range of moisture contents and diesel concentrations used and under field-moist conditions. Field-moist PLS models were developed with the 25 field-moist spectra and chemical variables together. Up to twelve LVs were considered, and the optimal number of LVs for future predictions was determined on the basis of the number of factors at the first local minimum [11].
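As an illustration of the calibration step, the following is a minimal single-response PLS (PLS1, NIPALS) sketch with cross-validation in plain NumPy. It is not the Unscrambler code used in the study. Several points are assumptions made for the sketch: full cross-validation is taken to mean leave-one-out, the logarithm is taken as base 10, the first local minimum is evaluated on the root-mean-square error of cross-validation, and all function names are hypothetical.

```python
import numpy as np

def to_absorbance(reflectance):
    """log(1/R) transform; base 10 is an assumption."""
    return np.log10(1.0 / reflectance)

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; returns (regression vector, X mean, y mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm == 0.0:          # nothing left to explain
            break
        w = w / norm             # weight vector
        t = Xr @ w               # scores
        tt = t @ t
        p = Xr.T @ t / tt        # X loadings
        c = (yr @ t) / tt        # y loading
        Xr = Xr - np.outer(t, p) # deflate X
        yr = yr - c * t          # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, x_mean, y_mean

def pls1_predict(model, X):
    b, x_mean, y_mean = model
    return (X - x_mean) @ b + y_mean

def rmsecv_by_lv(X, y, max_lv=12):
    """Leave-one-out RMSE of cross-validation for 1..max_lv latent variables."""
    n = len(y)
    out = []
    for k in range(1, max_lv + 1):
        resid = []
        for i in range(n):
            train = np.arange(n) != i
            model = pls1_fit(X[train], y[train], k)
            resid.append(y[i] - pls1_predict(model, X[i:i + 1])[0])
        out.append(np.sqrt(np.mean(np.square(resid))))
    return np.array(out)

def first_local_minimum(rmse):
    """Number of LVs at the first point where the error stops decreasing."""
    for k in range(len(rmse) - 1):
        if rmse[k] <= rmse[k + 1]:
            return k + 1
    return len(rmse)
```

With 25 spectra per treatment level, `rmsecv_by_lv(X, y, max_lv=12)` followed by `first_local_minimum` would give one candidate LV count per subgroup under these assumptions.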
In the second category, a general model was developed to predict the PAH compound with the entire 150 samples. In this category, the entire dataset was randomly separated into calibration (76%) and prediction (24%) sets. The ratio of calibration/prediction samples was chosen to ensure that each sample subset (group) was equally represented in the prediction set by randomly choosing 6 samples from each sample subset. PLS regression analysis with full cross-validation was carried out with the calibration set. The prediction set was then used to test the prediction accuracy of the calibration model. During model calibration in the second model category, potential outliers were identified on the basis of their influence on the X-Y relationship. Spectra that differed from the reference by three times the standard deviation of the predicted residuals were removed from the calibration dataset [25]. Model quality was statistically analyzed by the root-mean-square error (RMSE) of cross-validation and prediction, the residual prediction deviation (RPD) (i.e., the ratio of the standard deviation of the laboratory-measured sample concentration to the RMSE), and the corresponding coefficient of determination (r²) [11]. RPD was originally defined by Williams and Sobering [26]. Model prediction ability was categorised on the basis of the following criteria: excellent (symbol A) if RPD > 2.0, almost good (symbol B) if 1.4 ≤ RPD < 2.0, and unreliable (symbol C) if RPD < 1.4 [27]. Category of prediction is the ability of PLS full cross-validation analysis for parameter validation and prediction [27].

Calibration Models of Phenanthrene.
Table 2 summarizes the statistics of the PLS models for phenanthrene in the calibration dataset. As can be seen in Table 2, 90% of the PLS models (excluding the field-moist models) were classified as having almost good to excellent prediction (cross-validation) ability for model parameters, with 70% of the models in the excellent category, signifying the possibility for quantitative applications [28]. The model developed for the 90,000 mg kg⁻¹ oil treatment level using first derivative transformed spectra was the best (RPD = 3.88) as compared to the others (Table 2). Only 10% of the models were unreliable. There is a considerable difference in the value of r² between the 120,000 mg kg⁻¹ treatment level and the others. This poor correlation arises because sample concentrations are not uniformly distributed over the working concentration range of 0.02 to 2.80 mg kg⁻¹ (table is not shown). For the field-moist intact soil samples (representing mixed moisture content samples), model prediction ability was also classified as excellent. This suggests that the VisNIR spectroscopic method may be used for predicting PAH in soils without lengthy sample preparation. These results largely demonstrate that the VisNIR spectroscopic method may be a useful tool for evaluating PAH concentrations in soils. Nevertheless, the success of the methodology in broader environmental applications will depend on a number of factors including (among others) the accuracy (i.e., extraction efficiency) of the reference analytical method and the weathering process of petroleum products in soils over time. These two factors explain why, for instance, doubling the amount of added diesel oil does not double the PAH concentrations as one would expect (Table 2). In connection with providing information for risk assessment and remediation of contaminated soils, further research is required to ascertain the prediction ability of the approach for other PAHs. It should be pointed out that determining the optimal conditions for the VisNIR spectroscopic method and the effects of the treatment levels on the prediction ability of the approach for phenanthrene are outside the scope of the present paper.
Figure 2 shows the full wavelength (350 to 2500 nm) mean VisNIR spectral reflectance curves of the diesel-contaminated soils. As can be seen from the figure, soil spectral reflectance decreased with increasing diesel concentration, particularly in the NIR region (700-2500 nm). Spectral absorption minima of hydrocarbon-based oil are apparent around 1647, 1712, and 1759 nm in the first overtone region of the NIR band (Figure 2). The absorption around 1647 nm is attributed to C-H stretching modes of ArCH [12,29,30], likely linked to PAH. Absorptions around 1712 and 1759 nm are attributed to C-H stretching modes of terminal CH3 and saturated CH2 groups linked to TPH, both present in the contaminating diesel fuel [12,15,29,30].
Table 3 summarizes the statistical results of PLS models in cross-validation and prediction sets for the prediction of phenanthrene. In this model category, the entire 150 VisNIR spectra were combined, including both the treatments and levels, in the overall model to predict phenanthrene. Only three outlying samples were removed from the calibration dataset (114 samples).
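The construction of the general-model datasets (six prediction samples drawn at random from each of the six groups of 25, leaving 114 calibration samples) and the residual-based outlier screen can be sketched as follows. The function names and the random seed are hypothetical, and the outlier rule shown is only one plausible reading of the criterion in the text: a sample is flagged when its absolute residual exceeds three standard deviations of the residuals.

```python
import numpy as np

def split_by_group(groups, n_pred_per_group=6, seed=0):
    """Randomly pick the same number of prediction samples from every group."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    pred_idx = []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        pred_idx.extend(rng.choice(members, size=n_pred_per_group, replace=False))
    pred_idx = np.sort(np.array(pred_idx))
    cal_idx = np.setdiff1d(np.arange(groups.size), pred_idx)
    return cal_idx, pred_idx

def flag_outliers(measured, predicted, n_sd=3.0):
    """Flag samples whose residual exceeds n_sd standard deviations (assumed rule)."""
    resid = np.asarray(measured) - np.asarray(predicted)
    return np.abs(resid) > n_sd * resid.std(ddof=1)
```

For six groups of 25 samples, `split_by_group(np.repeat(np.arange(6), 25))` yields 114 calibration and 36 prediction indices, matching the 76%/24% split.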
From Table 3, the PLS regression models use six LVs for the first derivative and ten LVs for the reflectance models. These are comparable to the range of 6-8 LVs reported for the prediction of saturates, aromatics, resins, and asphaltenes (SARA) fractions in crude oil [12]. The histogram plot of the error distribution between measured and predicted datasets in the cross-validation and validation sample sets obtained after PLS regression analysis is shown in Figure 3. The histogram shows that about 75% of the error in the cross-validation set is less than 0.34 mg kg⁻¹ in absolute value. In the validation set, 72% of the error is less than 0.25 mg kg⁻¹ in absolute value. This indicates that the number of samples with low error is larger than the number with high error (Figure 3). The histogram also shows that the error is normally distributed for the validation dataset but slightly skewed for the cross-validation dataset. The skewness in the positive range is largely attributed to underestimation of the PAH in some samples by the VisNIR method (Figure 3). Nonetheless, PLS model prediction ability was classified as excellent (Table 3), demonstrating the possibility of adopting the methodology for quantitative determinations.
Table 3: Sample statistics and results of partial least-squares (PLS) models for the prediction of phenanthrene in cross-validation and prediction datasets for diesel-contaminated soil samples by visible and near-infrared (VisNIR) spectroscopy.
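The model-quality statistics reported in Tables 2 and 3 (RMSE, RPD, and r²) and the A/B/C classification follow directly from their definitions in the methods, as the short sketch below shows. The function names are hypothetical. Two choices are assumptions: the sample standard deviation (n − 1 denominator) is used for RPD, and r² is computed as the squared Pearson correlation between measured and predicted values.

```python
import numpy as np

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted concentrations."""
    diff = np.asarray(measured) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

def rpd(measured, predicted):
    """Residual prediction deviation: SD of the measured values divided by RMSE."""
    return float(np.std(measured, ddof=1) / rmse(measured, predicted))

def r_squared(measured, predicted):
    """Squared Pearson correlation (one common convention for r²)."""
    return float(np.corrcoef(measured, predicted)[0, 1] ** 2)

def rpd_category(value):
    """Classification quoted in the text: A if RPD > 2.0, B if 1.4 <= RPD < 2.0, C otherwise."""
    if value > 2.0:
        return "A"   # excellent
    if value >= 1.4:
        return "B"   # almost good
    return "C"       # unreliable
```

As written, a value of exactly 2.0 is not covered by the quoted criteria; the sketch assigns it to category B, which is a choice made here rather than something stated in the source.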

Regression Coefficients.
In PLS regression analysis, regression coefficients are approximations of model parameters resulting from the linear combination of the predictors. The regression coefficients plot is used to identify important wavelengths for the prediction of relevant soil properties. Figure 4 shows the bar plot of regression coefficients versus wavelength derived after PLS regression analysis with full cross-validation for 10 LVs using raw reflectance spectra of 114 calibration samples. In the bar plot, the absolute value of a regression coefficient indicates the relative importance of the wavelength on the basis of explained Y-variance in the model. Variables with large coefficients play an important role in the model; a positive coefficient shows a positive link to the response, and a negative coefficient shows a negative link [31]. Importance is therefore not restricted to positive coefficients. The plot over the modeling wavelength range of 452-2450 nm shows that the regression coefficients vary considerably in magnitude (Figure 4). In the bar plot (Figure 4), negative coefficients around 1712 and 1759 nm (albeit low) show a link to absorptions due to vibrational C-H stretching modes of terminal CH3 and saturated CH2 functional groups linked to TPH [12,15]. Positive coefficients around 1647 nm, in contrast, are consistent with absorption bands due to vibrational C-H stretching modes of ArCH functional groups [29,30], suggesting a link to PAH. These agree with the fact that the spectra of hydrocarbon derivatives originate mainly from combinations or overtones of ArCH functional groups or C-H stretching modes of saturated CH2 and terminal CH3 groups [12,15]. In addition to the hydrocarbon absorption bands, large regression coefficients were observed at some other wavelengths in the VisNIR range. This indicates the covariation of PAH with other soil properties having direct spectral responses in the VisNIR range.
Soil properties that have direct spectral responses in the NIR range are moisture, clay minerals, and organic carbon, as well as color influence in the visible (Vis) range [32,33].
Negative coefficients around 497 nm (Figure 4) are linked to the blue color absorption band in the Vis range of the spectrum. Reports have shown that changes in soil color are also linked to changes in the amounts of water and/or diesel in the soil [34,35]. Soil becomes darker with increasing water content and diesel concentration, resulting in an overall increase in absorption or decrease in reflection [34,35]. Coefficients observed around 950 nm (positive), 1450 nm (positive), and 1950 nm (negative) in the NIR spectrum are absorption bands of water (Figure 4). In the NIR range, O-H stretching modes of water are responsible for the absorptions around 950, 1450, and 1950 nm in the O-H second overtone, first overtone, and combinations bands, respectively [36]. Coefficients seen around 2200 and 2300 nm (Figure 4) signify absorption features linked to metal-OH bend plus O-H stretch combinations that are characteristic of clay minerals [2,37]. Negative coefficients can also be seen around 2150 nm (Figure 4), which are absorption features attributed to long-chain C-H + C-H and C-H + C-C stretch combinations unique to soil organic carbon [15]. These absorption features are also conspicuous in the mean VisNIR spectral reflectance curves shown in Figure 2 earlier.
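The band-by-band reading of the regression coefficients can be organised as a simple lookup, sketched below. The band centres and assignments are those named in the text; the function names, the nearest-wavelength matching, and the example data are hypothetical and only illustrate how a coefficient vector might be inspected.

```python
import numpy as np

# Band centres (nm) and assignments as discussed in the text.
BAND_ASSIGNMENTS = {
    497: "blue color absorption (Vis)",
    950: "water, O-H second overtone",
    1450: "water, O-H first overtone",
    1647: "aromatic C-H (ArCH) stretch, linked to PAH",
    1712: "terminal CH3 / saturated CH2 C-H stretch, linked to TPH",
    1759: "terminal CH3 / saturated CH2 C-H stretch, linked to TPH",
    1950: "water, O-H combinations",
    2150: "soil organic carbon, C-H + C-H and C-H + C-C combinations",
    2200: "clay minerals, metal-OH bend + O-H stretch",
    2300: "clay minerals, metal-OH bend + O-H stretch",
}

def summarize_bands(wavelengths, coefficients, bands=BAND_ASSIGNMENTS):
    """Return (band centre, nearest wavelength, coefficient, assignment) per band."""
    rows = []
    for centre, label in bands.items():
        i = int(np.argmin(np.abs(wavelengths - centre)))
        rows.append((centre, float(wavelengths[i]), float(coefficients[i]), label))
    return rows

def top_wavelengths(wavelengths, coefficients, n=5):
    """Wavelengths with the largest absolute coefficients, regardless of sign."""
    order = np.argsort(np.abs(coefficients))[::-1][:n]
    return [(float(wavelengths[i]), float(coefficients[i])) for i in order]
```

Ranking by absolute value reflects the point made above that a large negative coefficient is as informative as a large positive one.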
The covariation of clay and organic carbon with PAH is particularly important in the absorption and reflection of PAH (including phenanthrene) in contaminated soils. As stated, PAHs exhibit strong hydrophobicity and largely sorb to the organic carbon in the soil [2]. Dexter et al. [21] suggested that it is not the total amount of organic carbon that controls soil physical behaviour but the amount of complexed and noncomplexed organic carbon (COC and NCOC, resp.). The NCOC is present in soil only if the Dexter index (the ratio of clay to organic carbon) is less than 10 [21]. It is also reported that the NCOC has a higher sorption affinity for PAH than the COC [38]. The Dexter index deduced for each soil texture before contamination is shown in Table 1. For all soil textures used in this study, Table 1 shows that only COC was present (as the index was greater than 10). As a result, a significant proportion of PAH is not sorbed to soil because of the absence of NCOC and the low PAH sorption affinity of COC. This may explain the amount of NIR spectral signal of sorbed phenanthrene (a PAH) detected. On the other hand, it could be suggested that the presence of NCOC in soil, if the index were less than 10, would immobilize PAH within the soil matrix, which might cause increased soil absorption and reduced soil reflection. The swelling characteristics of clay are due to clay minerals, such that the higher the clay content, the larger the water retention capacity [9,32,34] and the lower the soil diffuse reflectance [39]. Thus, the mineralogy of clay may also play a role in controlling the mobility of PAH. The interrelationship between clay and organic carbon may be useful for evaluating the presence of PAH in soils. Presently, the tendency for clay-organic carbon interactions to partly dictate the behaviour of PAH sorption to soil is not well known, even by conventional laboratory methods, and requires further investigation [38].
Since studies on the application of VisNIR spectroscopy in the determination of PAH in soils are few in the literature [23,35], there is a dearth of information on the effect of organic matter naturally occurring in the soil on the measurement/prediction of the presence of PAH contamination in soils with VisNIR.
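The Dexter index criterion used in the discussion above reduces to a single ratio and threshold, sketched here. The function names are hypothetical, and clay and organic carbon are assumed to be expressed in the same units (for example, both in percent by mass).

```python
def dexter_index(clay, organic_carbon):
    """Ratio of clay content to organic carbon content (same units for both)."""
    return clay / organic_carbon

def has_ncoc(clay, organic_carbon, threshold=10.0):
    """NCOC is taken to be present only if the index is below the threshold of 10."""
    return dexter_index(clay, organic_carbon) < threshold
```

For a hypothetical soil with 30% clay and 1.5% organic carbon, the index is 20, so only complexed organic carbon would be expected under this criterion, as was the case for all soils in Table 1.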

Conclusions
In the current study, the results support the following conclusions: (1) phenanthrene could be predicted with reasonable accuracy (RPD = 2.0-2.32, RMSE = 0.21-0.25 mg kg⁻¹, and r² = 0.75-0.83) by VisNIR spectroscopy and (2) the interrelationship between clay and organic carbon may explain the intensity of the NIR spectral signal of sorbed PAH in soil. Therefore, the clay-organic carbon interactions may be useful for evaluating the presence of PAH in soils by VisNIR spectroscopy. However, the unique spectral signal of sorbed phenanthrene observed in the current study requires thorough investigation in relation to providing evidence for risk assessment and remediation of contaminated sites.