Combining Near-Infrared Spectroscopy and Chemometrics for Rapid Recognition of an Hg-Contaminated Plant

1 Research Institute of Applied Chemistry, College of Material and Chemical Engineering, Tongren University, Tongren, Guizhou 554300, China
2 The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, College of Pharmacy, South-Central University for Nationalities, Wuhan 430074, China
3 College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, China


Introduction
The growing development of agricultural, industrial, and urban activities has greatly increased the release of toxic substances, such as heavy metals and organic compounds, into the environment [1][2][3]. In particular, toxic heavy metals in air, water, soil, and plants have caused serious public concern because of their adverse effects on human health [4][5][6]. Soil is well known to be a major sink for heavy metal pollutants, which can accumulate in it and be transferred to water, air, plants, and animals. It was estimated that 20% of the total farmland in China had been contaminated, which directly threatens the safety of food production [7].
Numerous research efforts have been devoted to evaluating the level of soil contamination with heavy metals caused by human activities, including electroplating, mining, smelting, coal-fired power generation, iron and steel manufacturing, waste incineration, the leather industry, and cement production [8][9][10][11][12]. Most of these studies focused on direct determination of heavy metal levels in soil. Various analytical methods have been developed and used to quantify heavy metals in soil, plants, and animals [13,14]:
- inductively coupled plasma atomic emission spectroscopy (ICP-AES)
- inductively coupled plasma mass spectrometry (ICP-MS)
- atomic fluorescence spectrometry (AFS)
- X-ray fluorescence spectrometry (XRF)
- neutron activation analysis (NAA)
- direct current argon plasma multielement atomic emission spectrometry (DCP-MAES)
- atomic absorption spectrometry (AAS)
- scanning electron microscopy with energy-dispersive X-ray spectroscopy (SEM-EDX)

Although these techniques can evaluate heavy metals accurately, most of them require laborious sample pretreatment and preconcentration of analytes. This makes the analysis time-consuming.
It is well known that excessive uptake and accumulation of certain pollutants can influence the growth and metabolism of native plants [15,16]. A traditional way to recognize and evaluate soil pollution is to examine the morphological changes of plant indicators, which are sensitive to the presence of certain pollutants. Plant indicators allow only a qualitative evaluation of soil pollution. Even so, they are more convenient and economical than direct analysis of pollutants in soil. However, the use of plant indicators for soil pollution is limited for several reasons.

First, the plant species in an area are influenced by many factors, such as geographical and climatic conditions. As a result, a well-studied and suitable plant indicator is often unavailable for a given area.

Second, in seriously polluted areas, plants sensitive to soil pollution perish and are gradually replaced by dominant species. These species have adapted to the pollution, and their morphological changes are not pronounced enough to be recognized reliably by the naked eye.

Therefore, rather than examining plants by eye, it is more reasonable and reliable to use instrumental techniques to characterize the changes in the chemical composition of polluted plants.
Near-infrared spectroscopy (NIRS) has been widely applied to the analysis of various food and agricultural products [17][18][19][20][21][22]. The feasibility of using NIRS for quantitative analysis of heavy metals in environmental samples has been extensively evaluated [23][24][25][26]. NIRS shows potential for quantitative analysis of heavy metals in some cases. In many cases, however, its sensitivity is lower than that of other methods, and time-consuming sample preparation is required to obtain reliable results. Some studies also indicate that NIRS is useful for qualitative analysis of heavy metals [27]. NIRS can simultaneously characterize multiple components in a complex system. It can also be combined with pattern recognition methods [28,29] to rapidly classify different types of samples.
Therefore, the objective of this paper was to investigate the feasibility of using NIRS and chemometrics to rapidly distinguish the Hg-contaminated native plant Miscanthus floridulus (Labill.) Warb. (MFLW) from normal MFLW. Special attention was paid to the experimental design and data collection to avoid artifacts caused by factors other than Hg contamination.
Hg-contaminated MFLW samples (n = 125) were collected around the mercury mining areas in Huashi, Tongren, China, within a range of 3 kilometers. Normal samples (n = 116) were collected from an area about 10 kilometers away (Chuandong, Tongren, China). All MFLW samples were washed with water and then dried in a cool, dry, ventilated place away from direct sunlight. Each sample (leaves and stalks) was crushed with a disintegrator, and the powder was passed through an 80-mesh sieve. The dried, crushed, and sieved samples were stored in sealed packaging. An ultraviolet lamp was used to dry each sample for 10 minutes before NIR analysis and Hg reference analysis. The flowchart of sample preparation is shown in Figure 1.

NIRS Measurements.
Compacted MFLW powders were analyzed in a quartz sample cup on an Antaris II Fourier transform NIR spectrometer (Thermo Electron Co., Waltham, Massachusetts, USA), controlled by RESULT 3.0 software, in reflectance mode. The spectra were measured using a PbS detector, with an internal gold background as the reference. The working range of the spectrometer was 4000–10000 cm⁻¹. Each sample was measured in triplicate. The powder was stirred and compacted before each measurement, and the three spectra were averaged. Each measurement comprised 32 scans. The instrumental resolution was 8 cm⁻¹ with a data interval of 3.857 cm⁻¹, so each raw spectrum contained 1557 data points. During analysis, the temperature was kept at about 25 °C and the humidity at a stable level. To avoid artificial spectral variations between the two types of samples, all samples were analyzed in random order.
Hg contents were determined using an Agilent 725 ICP-AES system (Agilent, Victoria, Australia). The precision of the ICP-AES analysis was verified by analyzing each sample in triplicate. The Pearson correlation coefficient of the standard curve exceeded 0.9999. The average relative standard deviation (RSD) was less than 5.0%, and recoveries ranged from 96.1% to 104.5%. The limit of detection (LOD) was calculated to be 0.0025 mg/kg according to the IUPAC method. In this method, three times the standard deviation (3σ) of the signals of 11 blank solutions was converted to concentration using the standard curve.
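The quality-control figures above can be computed as in the following minimal NumPy sketch. These are the IUPAC-style LOD (3σ of the blank signals divided by the slope of the standard curve), the RSD of replicates, and the spike recovery. The function names, the blank readings, and the calibration slope are hypothetical, not data or code from this study. Converting a solution LOD into mg/kg would also require the digest volume and sample mass, which are not given here.

```python
import numpy as np


def limit_of_detection(blank_signals, slope):
    """LOD = 3 * s_blank / slope (IUPAC 3-sigma criterion).

    blank_signals: readings of replicate blank solutions (11 in the text).
    slope: sensitivity of the linear standard curve (signal per concentration unit).
    Returns the LOD in the concentration unit of the standard curve.
    """
    blanks = np.asarray(blank_signals, dtype=float)
    s_blank = blanks.std(ddof=1)  # sample standard deviation of the blank readings
    return 3.0 * s_blank / slope


def relative_std_percent(replicates):
    """RSD (%) of replicate determinations, e.g. a triplicate analysis."""
    reps = np.asarray(replicates, dtype=float)
    return 100.0 * reps.std(ddof=1) / reps.mean()


def recovery_percent(spiked_result, unspiked_result, amount_added):
    """Spike recovery (%) = (found in spiked - found in unspiked) / added * 100."""
    return 100.0 * (spiked_result - unspiked_result) / amount_added


# Hypothetical example values (not data from this study).
blank_readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1, 9.9]
calibration_slope = 200.0  # hypothetical signal units per mg/L

lod = limit_of_detection(blank_readings, calibration_slope)
rsd = relative_std_percent([20.1, 19.6, 20.4])
rec = recovery_percent(spiked_result=1.50, unspiked_result=0.50, amount_added=1.00)
print(f"LOD = {lod:.5f} mg/L, RSD = {rsd:.2f} %, recovery = {rec:.1f} %")
```

Using the sample standard deviation (ddof=1) reflects the common practice of estimating σ from a finite set of blank replicates.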

Chemometrics Analysis.
The data analysis was performed in MATLAB 7.0.1 (MathWorks, Sherborn, MA). To remove unwanted variation in the NIRS data, three preprocessing options were applied to the raw spectra: smoothing [30], second-order derivative (D2) [30], and standard normal variate (SNV) [31]. The DUPLEX algorithm [32] was used to divide the measured samples into representative training and test sets. Partial least squares discriminant analysis (PLSDA) [33] was used to develop classification models that distinguish Hg-contaminated from regular samples. For PLSDA, a dummy response vector was constructed, with +1 and −1 representing the regular and Hg-contaminated samples, respectively. The number of PLSDA components was estimated by Monte Carlo cross validation (MCCV) [34]. It was chosen to minimize the MCCV error rate (ERMCCV):
$$\mathrm{ER}_{\mathrm{MCCV}} = \frac{\sum_{i=1}^{N} m_i}{\sum_{i=1}^{N} l_i}$$
where N is the number of MCCV data splits, and m_i and l_i are the numbers of misclassified and left-out samples in the i-th split, respectively. In prediction, a cutoff value of zero was used to assign a new sample to one of the two classes. The overall accuracy (ACCU) was computed to evaluate the performance of the classification models:
$$\mathrm{ACCU} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FN, and FP are the numbers of true positives, true negatives, false negatives, and false positives, respectively. In this work, regular and Hg-contaminated MFLW samples were treated as "positives" and "negatives," respectively. Two other commonly used indices, sensitivity (SENS) and specificity (SPEC), were also used to evaluate classification performance:
$$\mathrm{SENS} = \frac{TP}{TP + FN}, \qquad \mathrm{SPEC} = \frac{TN}{TN + FP}$$
SENS and SPEC describe the ability of the model to correctly accept the "positives" and to correctly reject the "negatives," respectively.
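The PLSDA modeling and MCCV component selection described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' MATLAB code:

- PLS is implemented with the NIPALS PLS1 algorithm, because the text does not say which PLS routine was used.
- The function names are hypothetical.
- The demo data are synthetic.

The sketch does follow the text in three respects: ±1 dummy coding, a zero cutoff, and ERMCCV computed as the total number of misclassified samples divided by the total number of left-out samples.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    Returns (x_mean, y_mean, b) so that y_hat = (X - x_mean) @ b + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W = np.zeros((X.shape[1], n_components))  # weight vectors
    P = np.zeros((X.shape[1], n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings
    for a in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                     # score vector of component a
        tt = t @ t
        P[:, a] = Xc.T @ t / tt
        q[a] = yc @ t / tt
        W[:, a] = w
        Xc = Xc - np.outer(t, P[:, a])  # deflate X
        yc = yc - q[a] * t              # deflate y
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in the original X space
    return x_mean, y_mean, b


def plsda_predict(model, X):
    """Continuous PLSDA response; its sign (cutoff 0) gives the class (+1/-1)."""
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


def mccv_error_rates(X, y, max_components, n_splits=100, leave_out=0.3, seed=0):
    """ER_MCCV for 1..max_components: total misclassified / total left out."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_out = int(round(leave_out * n))
    misclassified = np.zeros(max_components)
    left_out_total = 0
    for _ in range(n_splits):
        perm = rng.permutation(n)
        out_idx, in_idx = perm[:n_out], perm[n_out:]
        left_out_total += n_out
        for a in range(1, max_components + 1):
            model = pls1_fit(X[in_idx], y[in_idx], a)
            y_class = np.where(plsda_predict(model, X[out_idx]) >= 0, 1, -1)
            misclassified[a - 1] += np.sum(y_class != y[out_idx])
    return misclassified / left_out_total


# Hypothetical demo data: +1 = regular, -1 = Hg-contaminated (dummy coding as in the text).
rng = np.random.default_rng(1)
y_train = np.repeat([1.0, -1.0], 30)
X_train = rng.normal(size=(60, 50))
X_train[y_train == 1, :5] += 3.0  # synthetic class difference in 5 variables

er = mccv_error_rates(X_train, y_train, max_components=5, n_splits=20)
n_lv = int(np.argmin(er)) + 1  # argmin returns the first minimum, i.e. the fewest components
print("ER_MCCV:", np.round(er, 3), "-> components:", n_lv)
```

The defaults `n_splits=100` and `leave_out=0.3` mirror the settings reported in the Results. The demo uses fewer splits only to keep it fast.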

Results and Discussion
According to the ICP-AES results, the Hg contents of the regular and contaminated MFLW samples ranged from 0.0 to 0.1 mg/kg and from 16.2 to 30.5 mg/kg, respectively. This indicates obvious Hg contamination of the soil surrounding the mercury mining areas. The NIR spectra of regular and Hg-contaminated MFLW samples are shown in Figure 2.
As seen in Figure 2, the raw spectra of regular and Hg-contaminated MFLW samples have very similar absorbance peaks in the range of 4000–10000 cm⁻¹. The peaks can be mainly assigned as follows [35]:
- 8377 cm⁻¹: second overtone of C-H stretching
- 6823 cm⁻¹: overlapping first overtones of O-H stretching and N-H stretching
- 5662 cm⁻¹: first overtone of C-H stretching
- 5184 cm⁻¹: combination of the O-H stretching fundamental and the first overtone of C-O deformation
- 4748 cm⁻¹: combination of N-H stretching and deformation of peptide groups

Some bands (8377, 5662, and 4748 cm⁻¹) are very weak, and the peak resolution is very low. Figure 2 also shows the spectra after smoothing, D2, and SNV preprocessing. Even after preprocessing, the spectral differences between regular and Hg-contaminated MFLW samples remain very subtle and are difficult to discern by eye. Therefore, chemometric models are needed to extract the information relevant to classifying regular and Hg-contaminated MFLW samples.
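The three preprocessing options compared in Figure 2 can be sketched as follows. The text cites [30] for smoothing and D2 but gives no settings, so this sketch makes several assumptions:

- Smoothing and D2 use Savitzky-Golay filtering (least-squares polynomial fitting in a moving window).
- The window length and polynomial order are hypothetical.
- The spectrum edges are handled by repeating the end values.

SNV follows its usual definition: each spectrum is centered on its own mean and scaled by its own standard deviation. The function names and demo spectra are hypothetical.

```python
import math
import numpy as np


def savgol_coefficients(window, polyorder, deriv=0):
    """Savitzky-Golay weights estimating the deriv-th derivative at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, polyorder + 1, increasing=True)  # A[i, k] = offsets[i] ** k
    # Row `deriv` of pinv(A) maps the window values to the fitted coefficient c_deriv;
    # the deriv-th derivative of the local polynomial at offset 0 is deriv! * c_deriv.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]


def savgol_rows(X, window=11, polyorder=2, deriv=0, delta=1.0):
    """Filter each spectrum (row) of X; deriv=0 smooths, deriv=2 gives D2."""
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("need an odd window > polyorder and deriv <= polyorder")
    coeffs = savgol_coefficients(window, polyorder, deriv) / delta**deriv
    half = window // 2
    padded = np.pad(np.asarray(X, dtype=float), ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, coeffs, mode="valid") for row in padded])


def snv(X):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


# Hypothetical spectra on the 4000-10000 cm^-1 grid described in the text (1557 points).
wavenumbers = np.linspace(4000.0, 10000.0, 1557)
rng = np.random.default_rng(0)
raw = np.exp(-((wavenumbers - 6823.0) / 300.0) ** 2) + 0.01 * rng.normal(size=(3, 1557))

smoothed = savgol_rows(raw, window=11, polyorder=2)
d2 = savgol_rows(raw, window=11, polyorder=2, deriv=2, delta=wavenumbers[1] - wavenumbers[0])
snv_spectra = snv(raw)
```

Passing the grid spacing as `delta` expresses D2 per (cm⁻¹)². For classification, this only rescales the spectra uniformly.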
To obtain representative data sets for developing and validating the classification models, the DUPLEX algorithm was used to divide the collected samples into training and test sets. PLSDA models were developed with the raw and preprocessed spectra. For each number of PLSDA components, ERMCCV was computed, and the model complexity was chosen to minimize ERMCCV. The number of MCCV data splits was set to 100. Because the training set is of moderate size, 30% of the training samples were randomly left out for prediction in each split, and the remaining 70% were used for model development. The model parameters and prediction performance obtained with the different preprocessing options are shown in Table 2.

With every preprocessing option, and even without preprocessing, PLSDA achieved perfect classification of regular and Hg-contaminated samples. Accuracy, sensitivity, and specificity were all equal to 1, indicating that preprocessing was not necessary to obtain an accurate model. Moreover, all the PLSDA models used only 2 latent variables, and this low model complexity suggests good generalization performance. The prediction results of PLSDA with the different preprocessing methods are shown in Figure 3. They show a distinct separation of regular and Hg-contaminated MFLW samples regardless of the preprocessing.

The predicted responses of PLSDA with the raw and smoothed spectra were very similar to each other. Both were clearly different from those obtained with the D2 and SNV spectra. In addition, the prediction errors (relative to the dummy responses of +1 and −1) of PLSDA with D2 and SNV spectra were much lower than those with the raw and smoothed spectra. Therefore, although all four PLSDA models achieved a classification accuracy of 1, D2 and SNV were still necessary. They remove some unwanted spectral variations and so help ensure the generalization performance of PLSDA when predicting new samples.
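The figures of merit in Table 2 follow directly from the confusion counts. The following minimal NumPy sketch treats regular samples (+1) as positives, as in the text; the function names and the test-set outputs are hypothetical. The text does not say how the prediction errors relative to the ±1 dummy responses were quantified, so the root-mean-square deviation used here is an assumption.

```python
import numpy as np


def classification_indices(y_true, y_pred, positive=1):
    """Return (ACCU, SENS, SPEC); `positive` marks the regular class (+1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_pos, pred_pos = y_true == positive, y_pred == positive
    tp = np.sum(is_pos & pred_pos)
    tn = np.sum(~is_pos & ~pred_pos)
    fp = np.sum(~is_pos & pred_pos)
    fn = np.sum(is_pos & ~pred_pos)
    accu = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)  # fraction of positives correctly accepted
    spec = tn / (tn + fp)  # fraction of negatives correctly rejected
    return float(accu), float(sens), float(spec)


def dummy_rmse(y_true, y_continuous):
    """Assumed error measure: RMS deviation of PLSDA outputs from the +1/-1 dummy codes."""
    diff = np.asarray(y_continuous, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical test-set outputs: 36 regular (+1) and 40 Hg-contaminated (-1) samples.
y_test = np.array([1] * 36 + [-1] * 40)
outputs = np.where(y_test == 1, 0.8, -0.9)  # made-up continuous PLSDA responses
y_class = np.where(outputs >= 0, 1, -1)      # cutoff of zero
print(classification_indices(y_test, y_class))  # -> (1.0, 1.0, 1.0)
print(round(dummy_rmse(y_test, outputs), 3))
```

Two models can both reach ACCU = 1 while differing in how far their continuous outputs lie from ±1. An error measure of this kind captures the difference between preprocessing options that the accuracy alone cannot.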

Conclusions
The feasibility of using NIRS for rapid classification of regular and Hg-contaminated MFLW samples was investigated. Classification accuracies of 1 were obtained with low model complexity regardless of the preprocessing option. D2 and SNV were shown to reduce the PLSDA prediction errors by removing unwanted spectral variations. Rapid recognition of Hg-contaminated MFLW, a native plant, could provide a useful alternative indicator of Hg-contaminated soil for rapid and economical screening of Hg contamination. Future research will investigate other plants as soil indicators and establish the relationship between heavy metal levels in plants and in soil.
Because the regular and Hg-contaminated MFLW samples have different distributions, the DUPLEX algorithm was applied separately to the two classes. The 116 regular samples were split into 80 training and 36 test samples, and the 125 Hg-contaminated samples were split into 85 training and 40 test samples. The training and test samples of the two classes were then combined into the final training set of 165 (80 + 85) samples and the final test set of 76 (36 + 40) samples.
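The class-wise DUPLEX partitioning described above can be sketched as follows, based on the usual description of DUPLEX. The two most distant samples seed the training set, and the next two most distant samples seed the test set. The two sets then take turns adding the remaining sample that lies farthest from their current members. Assumptions:

- Distances are plain Euclidean distances on the spectra.
- A set stops growing once it reaches its target size.
- The function name and synthetic data are hypothetical.

```python
import numpy as np


def duplex_split(X, n_test):
    """Split the rows of X into (train_idx, test_idx) with the DUPLEX algorithm."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n_test < 2 or n - n_test < 2:
        raise ValueError("each subset needs at least two samples")
    # Pairwise Euclidean distance matrix between samples.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    remaining = list(range(n))

    def take_farthest_pair():
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [remaining[i], remaining[j]]
        for k in pair:
            remaining.remove(k)
        return pair

    train = take_farthest_pair()  # two most distant samples seed the training set
    test = take_farthest_pair()   # next most distant pair seeds the test set
    while remaining:
        for subset, target_size in ((train, n - n_test), (test, n_test)):
            if not remaining or len(subset) >= target_size:
                continue
            # Pick the sample whose nearest neighbour in `subset` is farthest away.
            nearest = D[np.ix_(remaining, subset)].min(axis=1)
            k = remaining[int(np.argmax(nearest))]
            subset.append(k)
            remaining.remove(k)
    return np.array(train), np.array(test)


# Hypothetical per-class data with the class sizes reported in the text.
rng = np.random.default_rng(0)
X_regular = rng.normal(size=(116, 20))
X_contaminated = rng.normal(loc=0.5, size=(125, 20))
tr_r, te_r = duplex_split(X_regular, n_test=36)
tr_c, te_c = duplex_split(X_contaminated, n_test=40)
print(len(tr_r), len(te_r), len(tr_c), len(te_c))  # -> 80 36 85 40
```

Running the split separately per class, then concatenating the class-wise training and test indices, reproduces the 165/76 partition described in the text.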

Figure 2: Some of the raw, smoothed, D2, and SNV spectra of regular and Hg-contaminated MFLW samples. An artificial offset was added to separate the spectra of regular and Hg-contaminated samples.

Table 1: The programmed microwave-assisted digestion conditions for ICP-AES analysis of MFLW samples.
°C for 5 h. The programmed digestion conditions are summarized in Table 1.

Table 2: Discrimination of regular and Hg-contaminated MFLW samples by PLSDA of NIRS data.