Quantitative Analysis of Dihydroxybenzenes in Complex Water Samples Using Excitation-Emission Matrix Fluorescence Spectroscopy and Second-Order Calibration

Dihydroxybenzenes are organic intermediates used in many fields and are widely recognized as hazardous environmental pollutants. Simultaneous determination of these compounds is therefore particularly important, but conventional methods are time-consuming and laborious. The combination of two-dimensional excitation-emission matrix (EEM) fluorescence and second-order calibration by parallel factor analysis (PARAFAC) was investigated for simultaneously determining catechol, hydroquinone, and tryptophan. A total of 25 samples were designed and divided into a calibration set and a test set. An unexpected constituent was used as unknown interference. The EEM data were successfully decomposed into a four-factor PARAFAC model. The excitation and emission profiles resolved by the PARAFAC algorithm were compared with the corresponding pure spectra to confirm the compounds in the samples. Based on the decomposition, the final calibration models provided satisfactory concentration estimates. The mean recovery percentages were 98.3%, 101.7%, and 97.9% for catechol, hydroquinone, and tryptophan, respectively. The results suggest that the developed method is a potential tool for simultaneous determination of phenolic components in water samples or other complex samples.


Introduction
Phenols are defined as hydroxy derivatives of benzene. Phenols and substituted phenols are important organic intermediates and are widely used in both industry and agriculture for various purposes [1,2]. For example, catechol is widely used as a solvent for producing food additives, hair dyes, and antioxidants, while hydroquinone is used in cosmetics, polymers, tanning, pesticides, medicines, and photographic chemicals [3,4]. In addition, they are generally present in the wastewater from the oil, paint, polymer, and pharmaceutical industries [5-7]. The three isomers of dihydroxybenzene (catechol, hydroquinone, and resorcinol) have high toxicity and low degradability in the ecological system. They show toxic effects in animals and plants because they easily penetrate skin and cellular membranes [8,9]. The absorption of catechol or hydroquinone from the gastrointestinal tract can induce diseases such as renal tubule degeneration and liver function disease. They are now recognized as important environmental pollutants by both the US Environmental Protection Agency (EPA) and the European Union (EU).
Because different phenolic compounds behave differently and have different ecological effects and toxicity, the determination of individual phenolic compounds is particularly important. Owing to their structural similarity, good analytical methods are needed for the simultaneous determination of these substances, and this has become a topical problem in environmental analysis [10-14]. Up to now, several analytical methods have been developed for the determination of the dihydroxybenzenes, such as high-performance liquid chromatography (HPLC) [15], gas chromatography-mass spectrometry (GC-MS) [16], electrophoresis [17], and spectrophotometry [18]. These conventional methods usually involve time-consuming and laborious analytical procedures, for example, a preliminary physical or chemical separation or preconcentration step. They are also sometimes complicated and difficult to reproduce, and they can be hazardous and costly because they involve toxic materials and reagents. Therefore, the development of new methods that allow the simultaneous determination of these derivatives without any prior separation is a valuable research subject.
The fluorescent properties of many substances have been widely exploited in analysis because of the good detection capability of fluorescence techniques. Fluorescence may be valuable for the simultaneous determination of multiple components in complex samples without any pretreatment [19]. However, the selectivity of fluorescence-based techniques is often reduced by broad spectral overlap or by matrix interferences. When fluorescence is combined with chemometrics, this situation can be greatly improved [20]. Phenol and its derivatives can be determined by molecular fluorescence, but spectral overlap remains a serious drawback, so separation techniques must be used before applying univariate spectrofluorimetric techniques. In recent years, so-called two-dimensional total fluorescence spectroscopy has become popular and can significantly increase selectivity and sensitivity. In this technique, a total excitation-emission matrix (EEM) spectrum is obtained by systematically varying the excitation and emission wavelengths. The main advantage of EEM spectroscopy is the ability to use second-order calibration to increase the information content extracted from a dataset. Second-order calibration can not only determine the concentration of the components of interest but also extract the spectral profiles of all components in the presence of unknown interference. This property, called the "second-order advantage," is especially convenient for handling complex matrices. Several second-order calibration methods exist, including multivariate curve resolution-alternating least squares (MCR-ALS) [21], the generalized rank annihilation method (GRAM) [22], parallel factor analysis (PARAFAC) [23,24], and alternating penalty trilinear decomposition (APTLD) [25]. PARAFAC coupled with EEM fluorescence has proved useful in many fields.
In the present work, the combination of two-dimensional EEM fluorescence and second-order calibration by parallel factor analysis (PARAFAC) was investigated for simultaneously determining catechol, hydroquinone, and tryptophan. A total of 25 samples were designed and divided into a calibration set and a test set. An unexpected constituent was used as unknown interference. The EEM data were successfully decomposed into a four-factor PARAFAC model. The excitation and emission profiles resolved by the PARAFAC algorithm were compared with the corresponding pure spectra to confirm the compounds in the samples. Based on the decomposition, the final calibration models provided satisfactory concentration estimates. The mean recovery percentages were 98.3%, 101.7%, and 97.9% for catechol, hydroquinone, and tryptophan, respectively. The results suggest that the developed method is a potential tool for simultaneous determination of phenolic components in water samples or other complex samples.

Samples and Spectroscopy.
A total of 25 samples consisting of catechol, hydroquinone, and tryptophan were used for the experiment. The sample concentrations ranged from 0 to $9 \times 10^{-5}$ mol/L. All chemicals and reagents used were of analytical reagent grade, and double distilled water was used for preparation. The samples were obtained by several dilution steps with distilled water. First, a small amount of catechol, hydroquinone, or tryptophan was weighed and transferred to a container. Each was then diluted to a standard strength before the solutions were mixed and diluted to the desired concentrations. The EEM spectra of the prepared samples were collected on a Perkin-Elmer fluorescence spectrophotometer. The emission wavelength ranged from 230 to 500 nm with an interval of 2 nm, while the excitation wavelength ranged from 230 to 320 nm with an interval of 5 nm, which gives the 136 × 19 matrix obtained for each sample. The samples were divided into a calibration set with 13 samples and a test set with 12 samples.

PARAFAC.
PARAFAC (parallel factor analysis) is a chemometric method for the decomposition of multiway data and has been described in detail elsewhere [23]; only a brief description is provided here. PARAFAC was first proposed by Harshman in the 1970s. It is a generalization of bilinear principal component analysis (PCA) to multiway arrays. As a second-order calibration method, PARAFAC is capable of extracting the concentration and spectral profiles of the components of interest in the presence of any number of unknown constituents. This is the so-called "second-order advantage," and it is especially convenient for analyzing complex matrices. Although PARAFAC has been successfully applied to many problems, the result may be sensitive and unstable unless the chosen number of components equals the actual one. Therefore, selecting the correct number of components is of great importance. PARAFAC can also be seen as a deconvolution algorithm that solves for a certain number of factors contributing to the experimental data, in this case, the EEM of each sample. Owing to the multiway nature of the data and the constraints of the PARAFAC model, the solution is unique. Thus, ideally, the loading of each factor in each mode corresponds to a pure component contribution to the total fluorescence signal of the mixture. Nevertheless, it should be pointed out that the recovered fluorescent components may actually represent discrete species, covarying species, interacting pairs or sets of species, or instrumental artifacts.
The fluorescence in an EEM is collected as a function of both excitation and emission wavelengths. An EEM is considered two-way data since there are two independent variables, excitation wavelength and emission wavelength, and one dependent variable, intensity. A group of EEMs, each corresponding to a different sample, can be stacked into a so-called three-way array $\underline{\mathbf{X}}$ of dimension $I \times J \times K$, where $I$ is the number of samples, $J$ is the number of emission wavelengths, and $K$ is the number of excitation wavelengths.
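The stacking step can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the wavelength step sizes are chosen so that each EEM has the 136 × 19 size reported in the Results, random numbers stand in for measured intensities, and the variable names are hypothetical.

```python
import numpy as np

# Wavelength axes in nm. The step sizes are an assumption chosen to match
# the 136 x 19 (emission x excitation) matrix size reported for each EEM.
emission_nm = np.arange(230, 501, 2)    # 230, 232, ..., 500
excitation_nm = np.arange(230, 321, 5)  # 230, 235, ..., 320

n_samples = 25
rng = np.random.default_rng(0)

# Placeholder EEMs: random numbers stand in for measured intensities.
eems = [rng.random((emission_nm.size, excitation_nm.size))
        for _ in range(n_samples)]

# Stack the two-way EEMs along a new first (sample) axis, giving an array
# ordered as samples x emission wavelengths x excitation wavelengths.
X = np.stack(eems, axis=0)
print(X.shape)
```

With this ordering, `X[i]` is the EEM of sample `i`, `X[:, j, :]` collects one emission wavelength across all samples, and `X[:, :, k]` collects one excitation wavelength.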
Figure 1 gives the graphical representation of a PARAFAC model. PARAFAC decomposes the three-way array $\underline{\mathbf{X}}$ into three matrices and a residual array by using an alternating least squares algorithm to minimize the sum of squared residuals:
$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}, \quad i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, K. \qquad (1)$$
These matrices are labeled $\mathbf{A}$ (the score matrix) and $\mathbf{B}$ and $\mathbf{C}$ (the loading matrices), with elements $a_{if}$, $b_{jf}$, and $c_{kf}$. In (1), $x_{ijk}$ is the fluorescence intensity of sample $i$ at the $j$th emission wavelength and the $k$th excitation wavelength for a PARAFAC model with $F$ factors (components). The terms $a$, $b$, and $c$ are related to the concentration, emission spectra, and excitation spectra, respectively, of the different components. The score $a_{if}$ is directly proportional to the concentration of the $f$th fluorophore in the $i$th sample, while $b_{jf}$ and $c_{kf}$ are scaled estimates of the emission and excitation spectra of the $f$th fluorophore at wavelengths $j$ and $k$, respectively. The $e_{ijk}$ term contains any unexplained signal, including noise and unmodeled variability. Unlike PCA, PARAFAC imposes no additional orthogonality constraint. This means that if the structure of the underlying three-way data agrees with the PARAFAC model, then the PARAFAC parameters can reflect the true underlying parameters. Hence, each fluorophore gives rise to one PARAFAC component, and the corresponding scores can be used to indicate the relative concentration. The correct number of components is of great importance. To find it, the core consistency diagnostic (CORCONDIA) is usually used; details are available in the literature.
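To make the alternating least squares idea concrete, the following is a minimal NumPy sketch that fits the trilinear model $x_{ijk} \approx \sum_f a_{if} b_{jf} c_{kf}$ to noise-free synthetic data. It is not the implementation used in this work: the function names (`khatri_rao`, `parafac_als`), the random initialization, the stopping rule, and the Gaussian peak positions are all assumptions made for illustration, and no constraints such as nonnegativity are applied.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: row (u, v) of the result is U[u] * V[v]."""
    return np.einsum("uf,vf->uvf", U, V).reshape(-1, U.shape[1])


def parafac_als(X, n_factors, n_iter=1000, tol=1e-12, seed=0):
    """Fit x_ijk ~ sum_f a_if * b_jf * c_kf by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.random((J, n_factors))
    C = rng.random((K, n_factors))
    # Unfold the array once per mode; column order matches khatri_rao().
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    total = float(np.sum(X ** 2))
    prev_sse = np.inf
    for _ in range(n_iter):
        # Update each mode in turn while the other two are held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        sse = float(np.sum((X - np.einsum("if,jf,kf->ijk", A, B, C)) ** 2))
        if abs(prev_sse - sse) < tol * total:
            break
        prev_sse = sse
    return A, B, C, sse


def gaussian(axis, center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)


# Synthetic three-component data; peak positions are hypothetical.
em = np.arange(230, 501, 2.0)
ex = np.arange(230, 321, 5.0)
rng = np.random.default_rng(42)
A_true = rng.uniform(0.1, 1.0, size=(13, 3))
B_true = np.column_stack([gaussian(em, c, 20.0) for c in (300, 330, 370)])
C_true = np.column_stack([gaussian(ex, c, 10.0) for c in (255, 275, 295)])
X_demo = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)

A_hat, B_hat, C_hat, sse = parafac_als(X_demo, n_factors=3)
rel_error = sse / float(np.sum(X_demo ** 2))
print(f"relative residual sum of squares: {rel_error:.2e}")
```

The recovered columns are determined only up to permutation and scaling, so in practice each factor is matched to an analyte by comparing its loadings with pure spectra before the scores are interpreted.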

The Procedure of PARAFAC-Based Calibration.
Based on the PARAFAC decomposition, a calibration model for three-way EEM data can be constructed by the following steps.
(1) For each sample, record the EEM of dimension $J \times K$, where $J$ and $K$ are the number of emission wavelengths and the number of excitation wavelengths, respectively. If the data structure is assumed to be trilinear, a three-way array $\underline{\mathbf{X}}$ consisting of the EEMs of all calibration samples can be built.
(2) Perform the PARAFAC decomposition, resulting in the spectral profiles and the sample profiles of the analytes.
(3) Find the correct number of factors, $F$, which corresponds to the analytes of interest.
(4) Regress the loadings on the sample profile of the analytes of interest from the previous step on the known concentrations to obtain the calibration curve. Performance measures can then be calculated from the difference between the concentrations obtained with the PARAFAC decomposition and the true concentrations.
(5) Determine the concentration of the analytes of interest in a test sample. To do so, record the excitation-emission matrix of the test sample and add it to the array $\underline{\mathbf{X}}$ to calculate the joint PARAFAC decomposition, followed by the model prediction.
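Steps (4) and (5) can be sketched as a simple linear calibration of PARAFAC scores against known concentrations. The snippet below is only an illustration: the scores are simulated as noise-free and proportional to concentration, the sensitivity value and the function names are hypothetical, and a straight-line fit with an intercept is assumed since the exact regression form is not specified in the text.

```python
import numpy as np


def fit_calibration(scores_cal, conc_cal):
    """Least-squares line: score = slope * concentration + intercept."""
    slope, intercept = np.polyfit(conc_cal, scores_cal, deg=1)
    return slope, intercept


def predict_concentration(scores, slope, intercept):
    """Invert the calibration line to estimate concentrations from scores."""
    return (np.asarray(scores, dtype=float) - intercept) / slope


rng = np.random.default_rng(1)
conc_all = rng.uniform(0.0, 9e-5, size=25)   # mol/L, simulated values
sensitivity = 2.0e6                          # hypothetical score per mol/L
scores_all = sensitivity * conc_all          # stand-in for one score column

# 13 calibration samples and 12 test samples, as in the experimental design.
cal, test = slice(0, 13), slice(13, 25)
slope, intercept = fit_calibration(scores_all[cal], conc_all[cal])
conc_pred = predict_concentration(scores_all[test], slope, intercept)

# Root mean square error of prediction on the test set.
rmsep = float(np.sqrt(np.mean((conc_pred - conc_all[test]) ** 2)))
print(f"RMSEP = {rmsep:.3e} mol/L")
```

In a real analysis `scores_all` would be the column of the score matrix matched to one analyte after the joint decomposition, and one calibration line would be fitted per analyte.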

Results and Discussion
When the dihydroxy derivatives of phenol are at low concentrations, the fluorescence intensity is directly proportional to the concentration of these fluorophores and can therefore be used to quantify them. As mentioned above, once fluorescence is coupled with a second-order calibration technique, several components can conveniently be determined simultaneously without a preseparation step. As an example, Figure 2 shows a surface plot of the excitation-emission matrix (EEM) spectrum of a mixture recorded from 230 to 500 nm in the emission domain and from 230 to 320 nm in the excitation domain. In this range the Rayleigh and Raman scatterings can be avoided. Each sample yields an EEM array of size 136 × 19 (emission wavelengths × excitation wavelengths). The recorded data of all the samples, including the calibration set and the test set, were combined to constitute a 25 × 136 × 19 data array. Visual inspection shows that the EEM spectra of all samples overlap considerably. For better observation, Figure 3 gives the contour plot corresponding to the EEM spectrum of Figure 2.
Similarly, Figure 4 shows the contour plot of the EEM spectrum of a pure catechol solution. Comparing Figure 3 with Figure 4 shows that the fluorescence signal of the mixture differs from that of any of its pure components. The contour representation is advantageous here, because such differences are not easy to perceive in the original three-dimensional plot. To obtain a second-order calibration model, the calibration set was first analyzed using PARAFAC, that is, by building a three-way array with data corresponding to the calibration samples only. To develop the PARAFAC model, it is also necessary to determine the correct number of factors (components). Several approaches are available for this purpose, including cross-validation, scree plots, residual investigation, split-half analysis, and the so-called core consistency diagnostic. In this study, the core consistency diagnostic was used. This parameter is defined on the basis of the Tucker3 model, which is another decomposition method for multiway data. The number of factors is the same for each mode in a PARAFAC model, while each mode in a Tucker3 model may have a different number of factors. Thus, the representation of a Tucker3 model differs from PARAFAC and includes an important core array. If a PARAFAC model is well fitted, the Tucker core array has superdiagonal elements close to one and nonsuperdiagonal elements close to zero [26]. Therefore, when the model is valid, the core consistency is close to 100%; if the model is not valid, the core consistency is close to zero. The number of factors is chosen as the highest number that still gives a valid core consistency (typically 80-100%). The core consistency diagnostic indicated that the correct number of factors is four when using PARAFAC. Chemically, using four factors is a reasonable choice: the four factors represent catechol, hydroquinone, tryptophan, and the background, respectively.
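The core consistency idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the procedure used in this work: the function name is hypothetical, the least-squares Tucker3 core is computed with pseudoinverses of the given loading matrices, and the value is taken as 100 × (1 − Σ(g − t)² / F), where t is the superdiagonal array of ones. For simplicity the check below uses known synthetic factors instead of a fitted model.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (%) of loadings A, B, C for the three-way array X."""
    n_factors = A.shape[1]
    # Least-squares Tucker3 core for fixed loadings:
    # g_pqr = sum_ijk pinv(A)[p, i] * pinv(B)[q, j] * pinv(C)[r, k] * x_ijk
    G = np.einsum("pi,qj,rk,ijk->pqr",
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C),
                  X, optimize=True)
    # Ideal PARAFAC core: ones on the superdiagonal, zeros elsewhere.
    T = np.zeros((n_factors, n_factors, n_factors))
    idx = np.arange(n_factors)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - float(np.sum((G - T) ** 2)) / n_factors)


# Exactly trilinear synthetic data built from known three-factor loadings.
rng = np.random.default_rng(0)
A_true = rng.random((13, 3))
B_true = rng.random((136, 3))
C_true = rng.random((19, 3))
X_demo = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)

cc_exact = core_consistency(X_demo, A_true, B_true, C_true)
print(f"core consistency: {cc_exact:.4f}%")
```

In practice the function would be called with the loadings fitted for each candidate number of factors, and the largest number that keeps the value high would be retained.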
Figure 5 shows the profiles of the three analyte factors from the four-factor PARAFAC decomposition of the data array, together with the corresponding pure spectra. Figure 5(a) shows the loadings related to the resolved excitation spectral profiles and Figure 5(b) shows the loadings related to the resolved emission spectral profiles. Figures 5(c) and 5(d) are the corresponding pure spectra. Inspection of these profiles confirms the compounds in the samples, since the excitation and emission profiles resolved by the PARAFAC algorithm are essentially the same as the pure spectra.
In addition, the plausibility of the result was examined further. In general, if the loadings of the PARAFAC model have a direct chemical interpretation, it should be assessed whether they are physically reasonable with respect to the chemical phenomenon being studied. In the case of noninteracting organic fluorophores, the emission spectrum should exhibit a pronounced shift to longer wavelengths relative to its excitation spectrum, known as the "Stokes shift." It reflects the fact that the energy with which a molecule fluoresces is lower than the energy at which the molecule was excited, owing to energy losses in its excited state. The Stokes shift depends on the fluorophore type and its position within a macromolecule as well as its electronic environment. Different fluorophores have different shift values.
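A quick plausibility check of this kind can be sketched as follows. The snippet estimates the Stokes shift of one factor as the difference between the peak emission and peak excitation wavelengths; the Gaussian loadings and their peak positions (275 nm and 310 nm) are hypothetical stand-ins for resolved PARAFAC loadings, not values from this study, and the function names are illustrative.

```python
import numpy as np


def peak_wavelength(axis_nm, loading):
    """Wavelength at which the loading has its largest absolute value."""
    return float(axis_nm[np.argmax(np.abs(loading))])


def stokes_shift_nm(em_nm, em_loading, ex_nm, ex_loading):
    """Peak emission wavelength minus peak excitation wavelength, in nm."""
    return peak_wavelength(em_nm, em_loading) - peak_wavelength(ex_nm, ex_loading)


em_nm = np.arange(230, 501, 2.0)
ex_nm = np.arange(230, 321, 5.0)

# Hypothetical loadings for one factor (Gaussian shapes, assumed peaks).
em_loading = np.exp(-0.5 * ((em_nm - 310.0) / 15.0) ** 2)
ex_loading = np.exp(-0.5 * ((ex_nm - 275.0) / 10.0) ** 2)

shift = stokes_shift_nm(em_nm, em_loading, ex_nm, ex_loading)
is_plausible = shift > 0  # emission should peak at a longer wavelength
print(shift, is_plausible)
```

A factor whose emission loading peaks at or below its excitation peak would deserve a closer look, since it may correspond to scatter or another artifact instead of a fluorophore.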
Finally, the calibration models were constructed from the PARAFAC decomposition. As an example, Figure 6 gives the scatter plot of predicted versus actual concentration for the catechol component. In such plots, predicted values are plotted against actual ones, and the points fall on the diagonal only if the model predicts the concentration perfectly. For both the calibration and test sets, the points lie close to the diagonal, indicating that the predictive ability of the calibration model is satisfactory. There is also no obvious difference between the predictions for the calibration set and the test set. Furthermore, the recovery percentages of all test samples were calculated. The mean recovery percentages are 98.3%, 101.7%, and 97.9% for catechol, hydroquinone, and tryptophan, respectively. The final calibration models can therefore provide satisfactory concentration estimates.
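For reference, a mean recovery percentage can be computed as in the sketch below. It assumes the usual definition, recovery = 100 × predicted / actual, averaged over the test samples, and it skips samples with zero actual concentration, for which the ratio is undefined. The concentrations are made-up illustrative numbers, not data from this study.

```python
import numpy as np


def mean_recovery(predicted, actual):
    """Mean of 100 * predicted / actual over samples with actual > 0."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mask = actual > 0  # recovery is undefined for blank samples
    return float(np.mean(100.0 * predicted[mask] / actual[mask]))


# Illustrative values in mol/L (hypothetical, not measured data).
actual = [2.0e-5, 4.0e-5, 0.0, 8.0e-5]
predicted = [1.9e-5, 4.2e-5, 1.0e-7, 8.0e-5]

recovery = mean_recovery(predicted, actual)
print(f"mean recovery: {recovery:.1f}%")
```

Because over- and underestimates can cancel in the mean, it is useful to report the spread of the individual recoveries alongside it.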

Conclusions
In this work, a technique that combines fluorescence excitation-emission matrix (EEM) spectroscopy with parallel factor analysis (PARAFAC) was investigated for the simultaneous determination of three substances with overlapping fluorescence. The EEM data were successfully decomposed into a four-factor PARAFAC model, on the basis of which the final calibration models provided satisfactory concentration estimates. The results show that PARAFAC is a powerful tool that can resolve heavily overlapped peaks into their pure spectral profiles and concentration profiles even in the presence of unknown interferences. They also indicate that EEM coupled with PARAFAC could be applied as a valuable tool for the simultaneous analysis of the dihydroxy derivatives of phenol.

Figure 1: Graphical representation of the parallel factor analysis (PARAFAC) model. Each factor $f$ has loading vectors $\mathbf{a}_f$, $\mathbf{b}_f$, and $\mathbf{c}_f$, one in each dimension of the three-way array $\underline{\mathbf{X}}$.

Figure 5: The profiles of the three factors from the decomposition of the data array by the PARAFAC with four factors and the corresponding pure spectra: (a) excitation spectra from PARAFAC, (b) emission spectra from PARAFAC, (c) pure excitation spectra, and (d) pure emission spectra.

Figure 6: Scatter plot of predicted versus actual concentration for the catechol component.