Rapid Determination of Leaf Water Content Using VIS / NIR Spectroscopy Analysis with Wavelength Selection

Abstract. Water content is one of the most common biochemical parameters limiting photosynthetic efficiency and crop productivity, so predicting it rapidly and nondestructively is of great practical value. The objective of this study was to investigate the feasibility of determining leaf water content from diffuse reflectance spectra restricted to the VIS/NIR region (400–1100 nm), a region that can also be used to determine other biochemical parameters such as chlorophyll and nitrogen content. An experiment was conducted with leaves under different levels of water stress. Statistical tests indicated that leaf water content could be determined successfully by VIS/NIR spectroscopy combined with chemometric methods. The performance of different pretreatment methods was compared, and the best model was obtained from the first-derivative spectra. To make the calibration model more parsimonious and stable, a hybrid wavelength selection method was proposed to extract the effective feature wavelengths. Under the optimal conditions, an RMSEP of 0.73% with 25 variables was obtained for water content prediction using external validation. These results could support the development of a portable instrument for the rapid, nondestructive, and simultaneous detection of water content and other biochemical parameters.


Introduction
Water is one of the most important substances in higher plants. Water stress restricts transpiration through stomatal closure and reduced evaporation from the leaf surface. It further reduces the efficiency of photosynthesis and limits crop productivity [1][2][3]. Detecting plant water content therefore has important implications for agricultural management. Accurate water content estimation is needed to make irrigation decisions and to predict crop yields [4]. The water status of a plant can be indicated by the features of tissues such as root, stem, and leaf, or of the whole canopy. Compared with other tissues, the leaf is the most important for evaluating the nutrient and water status of a plant, which guides fertilization and irrigation [5,6], because the leaf is metabolically very active and is the site of photosynthesis, the most important biological reaction.
At present, studies of plant water content determination appear mainly in the fields of remote sensing and multispectral imagery. Water content at scales from canopy to leaf is estimated with reasonable accuracy by inverting physical models or by using spectral indices or ratios between reflectance values at specific wavelengths [7][8][9][10][11][12]. However, this methodology is suited to coarse estimation over large areas and is expensive for daily use. Moreover, the spectral indices of leaf water content derived from air-drying leaves respond only to large changes in water content and cannot detect water stress [13,14]. It is thus difficult to obtain information on the physiological water status of fresh leaves from the spectra. In many applications, such as greenhouses in precision agriculture or plant growth chambers in space, the biochemical parameters need to be determined accurately and in real time to allow feedback control of the environmental parameters. In such cases, a portable instrument and an accurate model are needed to estimate leaf biochemical parameters in real time.
Several studies have aimed to develop new methods for precision irrigation in primary production. Kriston-Vizi et al. used visible multispectral imagery to assess the water status of mandarin and peach canopies through the leaf water potential [15]. Leaf reflectance was obtained in three channels: green (490-580 nm), red (580-760 nm), and near-infrared (760-900 nm). A moderately good correlation was found between red reflectance and leaf water potential, as well as between green reflectance and leaf water potential. Mizulami measured the water content of tea leaves using electrical spectroscopy [16]. Samples of different maturity were separated, and a regression equation using impedance and capacitance simultaneously was constructed for each group. High correlation coefficients and satisfactory standard errors were obtained at each stage of maturity. Gillon et al. explored the relationship between the water content of leaves and their spectral properties over a very wide spectral region (400 to 2500 nm) [17]. To establish a general relationship and its prediction precision, a large number of samples from eight common Mediterranean tree and shrub species, collected at two sites during the summers of 2001 and 2002, were analyzed. PLS regression was used to build the calibration model. Satisfactory results were obtained, with R² = 0.93-0.99 and SECV = 2-7% for single species from the same year, and R² = 0.92-0.95 and SECV = 7% for multiple species. Li investigated the performance of a genetic algorithm coupled with partial least squares (GA-PLS) modeling of spectral reflectance in retrieving equivalent water thickness (EWT) at leaf and canopy level [18]. Although the aim of that paper was to estimate water content from remote sensing data, it suggested applying chemometric methods to determine the water content of leaves.
Spectroscopic analysis, an efficient technique for nondestructive, rapid, and accurate measurement, is widely applied in agriculture, for example in plant category discrimination, inspection of diseases and nutritional status, and assessment of fruit quality parameters [19][20][21][22]. Physically based studies have shown that the spectral features in the reflectance spectra of green vegetation in the 900-2500 nm region are dominated by liquid water absorption and only weakly affected by absorption from other biochemical components. Water absorption features resulting from absorption by O-H bonds can be found at approximately 760 nm, 970 nm, 1200 nm, 1450 nm, and 1950 nm [23].
The absorption features at 1450 nm and 1950 nm are the most pronounced. However, information on other biochemical parameters, such as chlorophyll and nitrogen content, would be masked by the strong water absorption in this region. From the perspective of precision agriculture or parameter control in the growing environment, the simultaneous determination of several parameters is important and necessary. The absorption features of these other biochemical parameters are reflected in VIS/NIR spectra. In addition, small and portable instruments can be developed for this region. It is therefore worthwhile to investigate the determination of water content using the VIS/NIR region of the spectrum (400-1100 nm).
In this study, the feasibility of nondestructive and rapid measurement of leaf water content using VIS/NIR spectroscopy was examined. To improve the precision of the model, different preprocessing methods were combined with PLS and compared to identify the optimal calibration model. To improve the stability and interpretability of the model, effective feature wavelengths were selected by a hybrid wavelength selection method (i.e., Backward Interval PLS in combination with the Successive Projection Algorithm, Bipls-SPA). The work is expected to provide a basis for the development of a portable, noninvasive instrument for water inspection.

Sample Preparation
The experiment was conducted in a greenhouse at Beihang University, China (116°46 E, 39°92 N). Twenty-five potted plants of Epipremnum aureum were grown under different water stress levels to obtain leaves with heterogeneous water status. Five levels of water supply were applied (irrigation every 3, 6, 9, 12, and 15 days, respectively). Three fully expanded leaves per plant, from the bottom, middle, and top, were collected. All of them were healthy and homogeneous in color, without anthocyanin pigmentation or visible symptoms of damage. A total of 75 leaf samples with different water content were obtained. To minimize water loss during transfer to the laboratory, the leaves were enclosed in a black plastic bag immediately after being picked.

Water Content Analysis
The leaf relative water content (RWC), which is used as the reference value of water content, was determined by oven drying as

$$\mathrm{RWC} = \frac{\mathrm{FW} - \mathrm{DW}}{\mathrm{FW}} \times 100\%$$

where FW is the fresh weight and DW is the dry weight. Each leaf was first weighed quickly on an analytical balance (Mettler Toledo AL104, Switzerland) after its spectrum had been scanned, and its FW was recorded. The leaves were then dried at 120°C in a circulation oven for 20 minutes, after which the temperature was lowered to 80°C and drying continued until constant weight (dry weight, DW) was reached.
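The reference-value calculation can be sketched as follows. This is a minimal illustration that assumes water content is expressed on a fresh-weight basis, (FW - DW) / FW × 100, because FW and DW are the only quantities defined in the text; the function name and the example weights are hypothetical.

```python
def water_content_percent(fresh_weight_g, dry_weight_g):
    """Water content as a percentage of fresh weight (assumed definition)."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight_g <= fresh_weight_g:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g * 100.0


# Hypothetical weights in grams, for illustration only (not measured data).
example = water_content_percent(2.0, 0.5)
```

A leaf that weighs 2.0 g fresh and 0.5 g dry would have lost 1.5 g of water, that is, three quarters of its fresh weight.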

Spectra Acquisition
An Evolution 300 UV-VIS spectrometer (Thermo, USA) was used to acquire the leaf diffuse reflectance spectra. The range of the spectrometer was 200-1100 nm, with an adjustable resolution of 1, 2, or 4 nm. To capture the fine spectral features of the leaf, the resolution was set to 1 nm in this study. The whole experiment was carried out at approximately 25°C. The diffuse reflectance spectra were measured with an integrating sphere reflectance accessory. Before each sample spectrum was measured, a baseline correction was run with a standard white panel (Spectralon, Labsphere) of known reflectance on the reflectance sample port. The adaxial epidermis of the leaf was fixed closely against the sample port, and the reflectance spectrum of each leaf was obtained. For each leaf, three diffuse reflectance spectra were measured at randomly chosen locations, and the average of the three spectra was stored as the reflectance (R) and used for analysis. The spectral information in the region of 200-400 nm was removed because of its low SNR. Hence, only spectral data in the range of 400-1100 nm were taken into account for analysis.
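The averaging and cropping steps described above can be sketched as follows. The function name and data layout (one list of reflectance values per scan, sharing one wavelength axis) are assumptions made for illustration, not the instrument software's format.

```python
def average_and_crop(wavelengths_nm, scans, low_nm=400, high_nm=1100):
    """Average repeated scans of one leaf and keep only low_nm..high_nm."""
    n_points = len(wavelengths_nm)
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("every scan must share the wavelength axis")
    # Point-by-point mean over the repeated scans.
    mean_spectrum = [sum(values) / len(scans) for values in zip(*scans)]
    # Drop the low-SNR region outside the analysis window.
    kept = [(w, r) for w, r in zip(wavelengths_nm, mean_spectrum)
            if low_nm <= w <= high_nm]
    return [w for w, _ in kept], [r for _, r in kept]


# A 200-1100 nm axis at 1 nm steps with three flat placeholder scans.
axis = list(range(200, 1101))
wl, refl = average_and_crop(axis, [[0.5] * len(axis)] * 3)
```

With a 1 nm step, the 400-1100 nm window keeps 701 of the 901 recorded points.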

Calibration Method
Partial least squares (PLS) analysis [24], which is widely used for calibration in chemometrics, was used to establish the water content prediction model. Compared with other linear regression methods such as PCA-based regression and MLR, PLS regression expresses collinear variables as latent variables, which are linear combinations of the observed variables and which at the same time have maximum correlation with the target concentration. The performance of the calibration model was evaluated in terms of the root mean square error of cross-validation (RMSECV). The prediction ability of the established model for independent test samples was evaluated by the root mean square error of prediction (RMSEP), the residual predictive deviation (RPD), and the determination coefficient (R²):

$$\mathrm{RMSECV} = \sqrt{\frac{1}{n_c}\sum_{i=1}^{n_c}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{RMSEP} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}\left(\hat{y}_i - y_i\right)^2},$$

$$\mathrm{RPD} = \frac{\mathrm{std}(Y)}{\mathrm{RMSEP}}, \qquad R^2 = 1 - \frac{\sum_i \left(\hat{y}_i - y_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},$$

where $y_i$ and $\hat{y}_i$ denote the reference value and the predicted value of the ith sample in the prediction set or the leave-one-out cross-validation set, respectively, $\bar{y}$ denotes the mean of the reference values in the data set, $n_c$ and $n_p$ are the numbers of calibration and prediction samples, and std(Y) is the standard deviation of the reference data. R² is the percentage of variation explained by the regression model. Generally, a good model should have high R² and RPD together with low RMSECV and RMSEP values.
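The evaluation indexes named above can be sketched in a few lines. The definitions used here are the standard ones (root mean square error, RPD as the standard deviation of the reference values divided by the RMSE, and R² as one minus the ratio of residual to total sum of squares). The use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not specify it, and the example values are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error; RMSECV or RMSEP depending on the sample set."""
    n = len(reference)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Determination coefficient: 1 - residual SS / total SS."""
    mean_y = sum(reference) / len(reference)
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot


def rpd(reference, predicted):
    """Residual predictive deviation: std(reference) / RMSE."""
    n = len(reference)
    mean_y = sum(reference) / n
    # Assumption: sample standard deviation with n - 1 in the denominator.
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in reference) / (n - 1))
    return std_y / rmse(reference, predicted)


# Hypothetical reference and predicted values, for illustration only.
y_ref = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(y_ref, y_hat), r_squared(y_ref, y_hat), rpd(y_ref, y_hat))
```

Passing the prediction-set values gives RMSEP, while passing the leave-one-out cross-validation predictions gives RMSECV.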

Preprocessing Method
The useful information in a spectrum that relates to the content of the target substance is usually obscured by irrelevant information. This may be noise and baseline drift from the detector and background, or other unwanted variation caused by differences in physical structure between samples. Preprocessing is therefore very important for building a good calibration model in spectroscopic analysis. It is required to remove irrelevant information, including noise, uncertainties, variability, interactions, and unknown features. The common steps of preprocessing are (1) removing noise and baseline, and (2) correcting for differences between samples. Many preprocessing methods exist for each step. In this study, the influence of the following preprocessing methods on PLS was studied: moving average (MA) and Savitzky-Golay (SG) smoothing for denoising, and multiplicative scatter correction (MSC) and differentiation (first and second derivatives, D1 and D2) for correcting sample differences.
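The preprocessing methods listed above can be sketched in plain Python. This is an illustrative sketch, not the routines used in the study: the edge handling of the moving average, the interior-only output of the derivative, and the use of the mean spectrum as the MSC reference are all assumptions. The derivative uses the closed-form Savitzky-Golay weights for a first derivative with a first- or second-order polynomial, k / Σk², over a symmetric window.

```python
def moving_average(spectrum, half_width):
    """Centred moving average; the window is truncated at the edges."""
    n = len(spectrum)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def sg_first_derivative(spectrum, half_width, step=1.0):
    """Savitzky-Golay first derivative (second-order polynomial).

    Returns interior points only, so the result is 2 * half_width shorter.
    """
    window = range(-half_width, half_width + 1)
    norm = sum(k * k for k in window)
    out = []
    for i in range(half_width, len(spectrum) - half_width):
        slope = sum(k * spectrum[i + k] for k in window) / norm
        out.append(slope / step)
    return out


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = [sum(col) / len(spectra) for col in zip(*spectra)]
    ref_mean = sum(ref) / len(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / len(s)
        # Least-squares fit s = intercept + slope * ref, then invert it.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected
```

A window of 25 points corresponds to `half_width=12`. Because the derivative weights sum to zero, any constant baseline offset is removed by `sg_first_derivative`.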

Wavelength Selection
The aim of wavelength selection is to extract the most informative combination of wavelengths for constructing a stable and parsimonious model for water content detection. For this purpose, a hybrid wavelength selection strategy, Bipls-SPA, was proposed to select the feature wavelengths.
The Successive Projection Algorithm (SPA) is a successful wavelength extraction method based on projection [25]. It reduces redundant spectral information and collinearity between variables as far as possible, making the model more stable and precise. However, in leaf spectra in the VIS/NIR region, pigment absorption and scattering form the main background, which conceals the water absorption information. In addition, water absorption in the range of 400-1100 nm is very weak. It is therefore difficult for SPA to extract the variables related to water content. Moreover, running the algorithm on the full spectrum takes a long time. For these reasons, the effective wavelength intervals were first obtained by the Backward Interval PLS (Bipls) algorithm. SPA was then applied to the wavelength intervals obtained by Bipls to extract more effective feature wavelengths.
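The projection step at the heart of SPA can be sketched as follows. This is a minimal illustration of the generic algorithm, in which each new variable is the one with the largest norm after projection onto the subspace orthogonal to the variables already chosen. The starting variable and the number of variables are plain parameters here (in practice they are usually chosen by validation), and the function name and toy data are hypothetical.

```python
def spa_select(columns, start, n_select):
    """Pick n_select column indices with minimal collinearity.

    columns: list of variables, each a list of values over the samples.
    n_select should not exceed the rank of the data, otherwise a
    zero-length reference vector can occur.
    """
    cols = [list(c) for c in columns]  # working copy, projected in place
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best_index, best_norm2 = None, -1.0
        for j, col in enumerate(cols):
            if j in selected:
                continue
            # Remove from column j its component along the last selected one.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_norm2
            cols[j] = [a - coef * b for a, b in zip(col, ref)]
            norm2 = sum(v * v for v in cols[j])
            if norm2 > best_norm2:
                best_index, best_norm2 = j, norm2
        selected.append(best_index)
    return selected


# Toy data: column 1 is almost collinear with column 0, so it is skipped.
toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
chosen = spa_select(toy, start=0, n_select=3)
```

In the hybrid strategy, `columns` would hold only the variables inside the intervals retained by Bipls, which both shortens the search and keeps SPA away from regions dominated by pigment absorption.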
Bipls was developed by Nørgaard et al. [26]. Its principle is to split the whole spectrum into a number of smaller equidistant regions and to construct a PLS regression model on each region. The best combination of intervals is then searched for by backward selection. The process is as follows: the dataset is split into a given number of intervals, and PLS models are calculated with each interval left out in turn. The first interval to be removed is the one whose omission gives the model with the lowest RMSECV on the remaining intervals, that is, the least informative interval. This procedure is continued until the RMSECV begins to increase whichever interval is left out. When Bipls is used, the number of intervals is the key parameter. If too few intervals are chosen, the useless information that masks the absorption information cannot be removed, which makes the information difficult to extract; if too many intervals are chosen, wavelengths containing the same or similar information are cut into separate parts, which does not help to reduce the model complexity. In this work, the spectrum was divided by trial and error into 30 equidistant subintervals, since using more than this number did not improve the results.
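The backward elimination loop can be sketched as follows. To keep the sketch short, the PLS fit and cross-validation are abstracted as a caller-supplied function `rmsecv_of(kept_intervals)`; that callable, the function names, and the toy error function are all hypothetical and stand in for a real PLS routine.

```python
def split_into_intervals(n_variables, n_intervals):
    """Return (start, stop) index pairs of near-equal width."""
    base, extra = divmod(n_variables, n_intervals)
    bounds, start = [], 0
    for i in range(n_intervals):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def backward_interval_selection(n_intervals, rmsecv_of):
    """Drop one interval per round until every removal raises RMSECV."""
    kept = list(range(n_intervals))
    best = rmsecv_of(kept)
    while len(kept) > 1:
        # Try leaving each remaining interval out in turn.
        trials = [(rmsecv_of([k for k in kept if k != j]), j) for j in kept]
        trial_error, drop = min(trials)
        if trial_error > best:
            break  # leaving any interval out now makes the model worse
        kept.remove(drop)
        best = trial_error
    return kept, best


# Toy error: intervals 1 and 3 are informative, the others only add noise.
def toy_rmsecv(kept):
    informative = {1, 3}
    noise = sum(1 for k in kept if k not in informative)
    missing = sum(1 for k in informative if k not in kept)
    return 1.0 + 0.1 * noise + 5.0 * missing


kept_intervals, final_error = backward_interval_selection(5, toy_rmsecv)
bounds = split_into_intervals(701, 30)  # 400-1100 nm at 1 nm, 30 intervals
```

With 701 variables and 30 intervals, the split yields intervals of 23 or 24 points each. In the toy run, the three uninformative intervals are dropped and the search stops as soon as removing either informative interval would raise the error.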

VIS/NIR Spectra of Samples
In this work, the detection of leaf water content under different levels of water stress was investigated. Figure 1 shows representative original spectra of leaves under high water stress, moderate water stress, and no water stress, respectively. The spectral features in this region are dominated by pigment absorption, weak water absorption, and the influence of leaf structure.
Compared with the sample of low water content, the sample with high water content has relatively low reflectance in the region of 700-1100 nm and high reflectance in the region of 400-700 nm. There are several troughs around 585 nm, 670 nm, and 986 nm in the reflectance spectrum of the leaf. The distinct reflectance trough around 670 nm is caused by chlorophyll absorption. The small trough around 970 nm reflects the weak water absorption [23]. The sharp increase in reflectance from 680 to 780 nm completely masks the water absorption information around 760 nm. The trough around 580 nm is presumably caused by the leaf structure.

Relationships between Water Content and Original Spectral Data
Figure 2 shows the correlation coefficient (r) curve between water content and the leaf reflectance spectra. The value of r does not exceed 0.6 anywhere in the range. The maximum absolute values of r occur around 550 nm and 720 nm (Figure 2), but the locations of these peaks do not coincide with the expected water absorption bands. The absolute values of r at 980 nm and 760 nm are only about 0.2. It is therefore difficult to extract the weak water absorption information from the spectra in the region of 400-1100 nm, although the water content of fresh leaves is large (20%-90%). Simple methods such as stepwise multivariate linear regression or wavelength selection based on the correlation coefficient are not feasible. The use of multivariate calibration is therefore justified, as the information on water content needs to be extracted from the whole spectrum.
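A correlation curve of this kind can be computed by taking the Pearson correlation coefficient between water content and reflectance at each wavelength. The sketch below assumes the spectra are stored as one list of reflectance values per sample; the function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_spectrum(spectra, water_content):
    """r between water content and reflectance, one value per wavelength."""
    return [pearson_r(list(column), water_content) for column in zip(*spectra)]


# Toy data: three samples, two wavelengths (illustration only).
toy_spectra = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
toy_water = [10.0, 20.0, 30.0]
r_curve = correlation_spectrum(toy_spectra, toy_water)
```

In the toy data the first wavelength rises with water content and the second falls, so the two coefficients have opposite signs.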

PLS Models with Preprocessing
Before the calibration models were constructed, PCA was used to detect outliers that would affect model performance. Seven samples were left out because of their potentially adverse influence on the models. SPXY [27] was used to divide the available samples into calibration and prediction sets to avoid bias in subset selection. As shown in Table 1, the range of water content in the calibration set covered the range in the prediction set.
After the calibration and prediction sets were established, PLS models were constructed on the whole spectra after applying different preprocessing strategies (no preprocessing, MA, MSC, D1 and D2 calculated by the SG routine using a second-order polynomial filter, and combinations of these methods). The evaluation indexes of the calibration model mentioned above were calculated to verify whether the models improved. A summary of the results is shown in Table 2, in which the number given in brackets is the half width of the smoothing window.
The model with no preprocessing gave a predictive performance of R² = 0.871 with RMSEP = 1.2%, which is acceptable. To verify that a linear relationship exists between water content and the reflectance spectra in the range of 400-1100 nm, an F test and a t test were performed to test the linearity of the model and the significance of the regression parameters. The regression model passed the F test at the significance level of 0.005, and the regression parameters passed the t test at the significance level of 0.1. The RPD of the prediction set, which was used to analyze the predictive ability of the PLS model in more depth, was 2.86. Earlier studies showed that RPD ≥ 3 indicates good predictive ability, so that the calibration model can be used for practical testing, whereas 2.5 < RPD < 3 indicates that the model can be used for quantitative analysis [28,29]. By this rule, leaf water content can be determined quantitatively using VIS/NIR spectroscopy, but the model without preprocessing cannot yet be used for testing in practice.
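The RPD rule quoted from [28,29] can be written as a small helper. Only the two ranges stated in the text are encoded; values of 2.5 or below are reported as outside the quoted rule rather than interpreted. The function name and return strings are hypothetical.

```python
def interpret_rpd(rpd):
    """Map an RPD value onto the two categories quoted in the text."""
    if rpd >= 3.0:
        return "suitable for practical testing"
    if rpd > 2.5:
        return "usable for quantitative analysis"
    return "outside the range covered by the quoted rule"


# RPD of the model without preprocessing, as reported in the text.
raw_model_verdict = interpret_rpd(2.86)
```

For the model without preprocessing (RPD = 2.86), the helper returns the quantitative-analysis category, which matches the conclusion drawn in the text.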
The following can be concluded from Table 2: (1) the denoising methods improved the calibration precision or the explanatory ability (R²); (2) both scatter correction and differentiation reduced the model complexity (the number of PLS factors). In particular, SG smoothing with 25 points in combination with the first derivative was the best preprocessing method. R² increased from 0.871 to 0.920, while the RMSEP decreased from 1.2% to 1.0% with LVs = 5. This result can be explained by the character of the data. Because all samples come from the same species, the scattering differences are not pronounced. However, the water content information is concealed by the flat reflectance background in the region of 740-1100 nm, which the first derivative corrects well (Figure 3). The correlation coefficient (r) curve between water content and the preprocessed spectra is shown in Figure 4. The correlation improved markedly. The regions with r above 0.6 were 506-524 nm, 566-577 nm, 765-784 nm, and 950-957 nm. The latter two regions correspond to the water absorption bands around 760 and 970 nm, although with some deviation from the theoretical positions. Presumably, this is because the water feature wavelengths are shifted by the changes in leaf structure induced by water stress. The former two regions carry information related to chlorophyll content. From this point of view, water content and chlorophyll content are correlated to some extent, which is consistent with earlier research [30].
After suitable preprocessing, the optimal RPD of the prediction set was 3.66, obtained by combining SG denoising with D1. This result satisfies the requirement for practical testing.

Wavelength Selection
Wavelength selection not only enhances the stability of the model but also makes it more parsimonious. SPA is a successful variable selection method that has been applied to many kinds of spectral measurements. It was compared with the hybrid strategy Bipls-SPA proposed here by applying both to water content detection; the results of the PLS models after wavelength selection are shown in Table 3. Compared with the full-spectrum model, a better result was obtained when PLS was applied to the 30 variables selected by SPA. The RMSEP was 0.81%, a reduction of 16.5%. At the same time, the model complexity was reduced considerably. Figure 5 shows that the variables selected by SPA are distributed over the entire spectral region. In fact, the main absorption information of water lies in the short-wave NIR (760-1100 nm), so some unrelated variables were selected. In the proposed hybrid strategy, Bipls was used before SPA to select the effective wavelength intervals and remove the useless information. A total of 260 variables from 8 effective wavelength intervals (553-556 nm, 689-720 nm, 755-842 nm, 950-970 nm, 1013-1034 nm, and 1055-1075 nm) were selected. The precision of the calibration model based on the selected intervals was improved (RMSECV = 1.2%) compared with the full-spectrum model. The aim of Bipls is to select as few variables as possible while maintaining the prediction precision. Then, 25 variables were selected by applying SPA to the effective wavelength intervals (Figure 5), which made the model more parsimonious. The RMSEP decreased to 0.73%, a reduction of 9.9% compared with direct SPA.
The predicted versus reference values of leaf water content in the calibration and prediction sets based on the Bipls-SPA model are shown in Figure 6. The samples are distributed close to the regression line, which indicates excellent analytical performance. From the above analysis, it can be concluded that Bipls-SPA is an effective variable selection method. The calibration model built on these wavelengths was more stable and parsimonious and had higher prediction capability. It provides a theoretical basis for the development of a portable instrument to detect leaf water content rapidly and nondestructively.
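As a check on the percentages quoted above, the relative reductions follow directly from the reported RMSEP values. The back-calculated full-spectrum value is an inference from the stated 16.5% reduction, not a separately reported number.

$$\frac{0.81 - 0.73}{0.81} \approx 0.099 \quad (9.9\%\ \text{reduction of Bipls-SPA relative to direct SPA}),$$

$$\frac{0.81}{1 - 0.165} \approx 0.97 \quad (\text{implied full-spectrum RMSEP in \%, consistent with the rounded value of } 1.0\%).$$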

Conclusion
In this work, the results showed that it is feasible to use VIS/NIR (400-1100 nm) spectroscopy combined with chemometric methods to determine the water content of leaves, overcoming the difficulty of extracting the weak water absorption information from the short-wave near-infrared region (760-1100 nm). It was also found that PLS in combination with preprocessing methods could produce an accurate model with satisfactory prediction precision. The first derivative, which removes the baseline and makes the water absorption information prominent, was the optimal preprocessing method. The best model, with RPD = 4.86, is suitable for testing water content in practice.
In addition, the wavelength selection method Bipls-SPA proposed in this paper could select effective wavelengths. It reduced the model complexity, extracted the most useful information on water content, improved the model precision, and made the model more stable. The technique presented in this paper provides a detailed analytical view of water content in plants and could be used to develop a portable instrument for the nondestructive, rapid, and simultaneous estimation of water content and other biochemical parameters, leading to more efficient irrigation planning in precision agriculture or closed ecosystems.

Figure 1: Representative original spectra of leaves with different water content.

Figure 2: Correlation coefficient of water content in leaf with wavelength-dependent reflectance.

Figure 3: Spectra corrected by the optimal preprocessing method.

Figure 4: Correlation coefficient of water content in leaf with wavelength-dependent reflectance after preprocessing.

Figure 5: Efficient wavelength selected by different methods.

Figure 6: Calibration model and prediction ability of water content in leaves based on the wavelength selected by Bipls-SPA.

Table 1: Statistics of water content in the calibration and prediction sets (samples from the same species).

Table 2: Results for the calibration and prediction sets of PLS models with different preprocessing methods.

Table 3: Results for the calibration and prediction sets of PLS models with variables selected by different methods.