A Comparison of Second-Order Calibration Methods Applied to Excitation-Emission Matrix Fluorescence Data

Owing to the variety of second-order data generated by modern instruments and the range of mathematical algorithms available for their analysis, second-order calibration is gaining widespread acceptance in the analytical community. It has the so-called second-order advantage; that is, it enables the concentration and spectral profiles of sample components to be extracted even in the presence of unexpected interferences. This paper presents a comprehensive performance comparison of alternating trilinear decomposition (ATLD) and its two variants, that is, alternating penalty trilinear decomposition (APTLD) and self-weighted alternating trilinear decomposition (SWATLD). The experiment was based on the simultaneous determination of three dihydroxybenzenes, that is, catechol, hydroquinone, and resorcinol, by excitation-emission matrix fluorescence (EEMF) spectroscopy. Two measures, that is, the consistency (COS) between the resolved and actual profiles and the mean recovery, were used for evaluation. The best result was obtained by the APTLD model with five components. No perceptible difference in the speed of convergence was found. The results indicate that EEMF coupled with the APTLD algorithm can serve as a potential tool for quantifying dihydroxybenzenes simultaneously in environmental samples.


Introduction
Determining analytes of interest in a complex matrix is a very challenging task in many fields. The traditional practice is to resort to time-consuming, laborious, and costly physical or chemical separation. Moreover, the separation may disturb the equilibria existing in the mixture and can therefore mislead the subsequent quantification. With the development of modern second-order instruments capable of generating matrix signals, a number of new analytical methods have become available [1][2][3][4][5]. In recent years, owing to advances in fluorescence detection, the fluorescence properties of many substances have been widely exploited for analytical purposes. However, selectivity may be a problem because of heavy spectral overlap and interferences [6]. Fluorescence detection combined with chemometrics has greatly changed this situation. One possible strategy consists of collecting excitation-emission matrix fluorescence data and extracting useful information with second-order calibration algorithms. More specifically, second-order calibration includes two main steps: decomposing a three-way array into three matrices and building a regression equation between the resolved relative concentrations of the analytes of interest and the corresponding actual concentrations. Such a procedure has a property named the second-order advantage [7], which enables the concentrations and spectral profiles of sample components to be extracted even in the presence of any number of unknown constituents. It also makes it possible to quantify several components simultaneously. Such strategies have been successfully applied in many analytical fields, for example, to food, pharmaceutical, environmental, and biomedical matrices.
For second-order calibration-based analytical applications, the choice of algorithm is decisive. Available algorithms can be classified into three types [8]. The first type is based on generalized eigenanalysis, such as the generalized rank annihilation method (GRAM) [9] and direct trilinear decomposition (DTLD) [10]. However, GRAM is restricted to one calibration sample and one unknown sample at a time. DTLD allows the direct decomposition of multiple samples, but it requires the construction of two pseudosamples. Both algorithms perform well only when the signal-to-noise ratio is high; otherwise, imaginary (complex-valued) solutions can be observed. The second type is iterative, such as classic parallel factor analysis (PARAFAC) proposed by Harshman [11] and Bro [12] and alternating trilinear decomposition (ATLD) proposed by Wu et al. [13]. These algorithms use different loss functions to achieve the second-order advantage. Although PARAFAC has been successfully applied to many chemical problems, it may lead to chemically meaningless solutions in certain cases, and its solutions sometimes become unstable, especially when the factor number is inappropriate. Besides, PARAFAC is prone to slow convergence. By using the Moore-Penrose generalized inverse computed from a truncated singular value decomposition and extracting diagonal elements, ATLD makes the results insensitive to the component number. Also, because the calculation is implemented on slice matrices, convergence is relatively fast. In ATLD, however, all factors related to the diagonal elements influence the resolved results. Subsequently, its modified versions, that is, alternating penalty trilinear decomposition (APTLD) [14] and self-weighted alternating trilinear decomposition (SWATLD) [15], were developed. Both algorithms have the advantages of being insensitive to the component/factor number and converging fast. The well-known multivariate curve resolution-alternating least squares (MCR-ALS) algorithm [16] also belongs to the iterative type. In essence, MCR-ALS is a bilinear method and can be applied to a three-way array only when the matrix signal of each sample obeys a bilinear model; analogously, the other algorithms require the data to obey the trilinearity condition. The last type rearranges the higher-order array into vectors and applies a first-order algorithm, for example, unfolded principal component regression (U-PCR) and unfolded partial least squares (U-PLS) [17]. These algorithms were used to handle second-order data before true second-order algorithms were developed. The popular multiway partial least squares (N-PLS) [18] is a genuine multiway algorithm, but, like U-PLS, it does not exploit the second-order advantage. Although many algorithms are available, it is very important to select a second-order calibration method appropriate for the task at hand.
In the present work, a comprehensive performance comparison of three second-order calibration algorithms, that is, ATLD and its variants APTLD and SWATLD, is presented. The experiment was based on the simultaneous determination of three dihydroxybenzenes (catechol, hydroquinone, and resorcinol) in water samples by excitation-emission matrix fluorescence (EEMF) spectroscopy. Two measures, that is, the consistency (COS) between the resolved and actual profiles and the mean recovery, were used for evaluation. The best result was obtained by the APTLD model with five components, and no perceptible difference in the speed of convergence was found. The results indicate that EEMF coupled with the APTLD algorithm can serve as a potential tool for quantifying dihydroxybenzenes simultaneously in environmental samples.

Theory and Methods
In second-order calibration, the trilinear model can be written as follows [2,8]:
$$x_{ijk}=\sum_{n=1}^{N} a_{in}\,b_{jn}\,c_{kn}+e_{ijk},\qquad i=1,\dots,I;\ j=1,\dots,J;\ k=1,\dots,K \tag{1}$$
Here, $N$ denotes the number of factors/components, which is related to the number of detectable species, including the components of interest, background, and interferences. $I$, $J$, and $K$ denote the numbers of excitation wavelengths, emission wavelengths, and samples, respectively. $x_{ijk}$ is an element of the three-way array $\underline{\mathbf{X}}$ with dimensions $I \times J \times K$, and $e_{ijk}$ is an element of the corresponding three-way residual array $\underline{\mathbf{E}}$. $a_{in}$, $b_{jn}$, and $c_{kn}$ are elements of the matrices $\mathbf{A}$ ($I \times N$), $\mathbf{B}$ ($J \times N$), and $\mathbf{C}$ ($K \times N$), corresponding to the excitation profiles, emission profiles, and relative concentration profiles, respectively. Second-order calibration consists of two main steps: (1) decomposing the three-way data array into three matrices and (2) regressing the relative concentrations of the components of interest against the reference concentrations. Different strategies for decomposing the array lead to different second-order calibration algorithms.
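The trilinear model (1) and the two calibration steps can be illustrated with a short NumPy sketch. It builds a noise-free array from given profiles and then performs step (2) for one analyte by fitting a straight line of resolved relative concentration against reference concentration on the calibration samples (the intercept term is an assumption of this sketch) and inverting that line for the test samples. The function names, sizes, and data are hypothetical; the decomposition step itself would be done by an algorithm such as ATLD.

```python
import numpy as np

def trilinear_array(A, B, C):
    # x_ijk = sum_n a_in * b_jn * c_kn  (model (1) with the residual e_ijk set to zero)
    return np.einsum("in,jn,kn->ijk", A, B, C)

def calibrate_and_predict(rel_cal, ref_cal, rel_test):
    # Step (2): fit rel = slope * ref + intercept on the calibration samples...
    slope, intercept = np.polyfit(ref_cal, rel_cal, 1)
    # ...then invert the line to predict the concentrations of the test samples.
    return (rel_test - intercept) / slope

rng = np.random.default_rng(0)
I, J, K, N = 19, 136, 30, 2                     # illustrative sizes, not a fitted model
A = rng.random((I, N))
B = rng.random((J, N))
c_ref = rng.uniform(0.0, 9e-5, K)               # hypothetical reference concentrations (mol/L)
# Column 0 plays the analyte; its relative concentration carries an arbitrary scale factor.
C = np.column_stack([3.0 * c_ref, rng.random(K)])
X = trilinear_array(A, B, C)
pred = calibrate_and_predict(C[:15, 0], c_ref[:15], C[15:, 0])
print(X.shape)                                  # (19, 136, 30)
print(np.allclose(pred, c_ref[15:]))            # True
```

The regression step is what removes the scale indeterminacy of the resolved concentration profile: only the ratio between samples matters, so the arbitrary factor 3.0 disappears in the prediction.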

ATLD Algorithm.
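By alternately converting the three-way array into matrix form, Wu et al. [13] developed the alternating trilinear decomposition (ATLD) algorithm, which employs the following loss function to compute A, B, and C in a true trilinear sense:
$$\sigma(\mathbf{A},\mathbf{B},\mathbf{C})=\sum_{i=1}^{I}\lVert\mathbf{E}_{i\cdot\cdot}\rVert_F^2=\sum_{i=1}^{I}\lVert\mathbf{X}_{i\cdot\cdot}-\mathbf{B}\,\mathrm{diag}(\mathbf{a}_{(i)})\,\mathbf{C}^{T}\rVert_F^2=\sum_{j=1}^{J}\lVert\mathbf{X}_{\cdot j\cdot}-\mathbf{C}\,\mathrm{diag}(\mathbf{b}_{(j)})\,\mathbf{A}^{T}\rVert_F^2=\sum_{k=1}^{K}\lVert\mathbf{X}_{\cdot\cdot k}-\mathbf{A}\,\mathrm{diag}(\mathbf{c}_{(k)})\,\mathbf{B}^{T}\rVert_F^2 \tag{2}$$
where $\mathbf{X}_{i\cdot\cdot}$ ($J \times K$), $\mathbf{X}_{\cdot j\cdot}$ ($K \times I$), and $\mathbf{X}_{\cdot\cdot k}$ ($I \times J$) denote the $i$th horizontal, $j$th lateral, and $k$th frontal slices of $\underline{\mathbf{X}}$, respectively; $\mathbf{E}_{i\cdot\cdot}$, $\mathbf{E}_{\cdot j\cdot}$, and $\mathbf{E}_{\cdot\cdot k}$ are defined similarly for the residual array; $\mathbf{a}_{(i)}$, $\mathbf{b}_{(j)}$, and $\mathbf{c}_{(k)}$ are the $i$th, $j$th, and $k$th rows of A, B, and C; and diag(·) builds a diagonal matrix from the given elements. Based on (2), A, B, and C are updated as
$$\mathbf{a}_{(i)}^{T}=\mathrm{diagm}\big(\mathbf{B}^{+}\mathbf{X}_{i\cdot\cdot}(\mathbf{C}^{T})^{+}\big),\quad \mathbf{b}_{(j)}^{T}=\mathrm{diagm}\big(\mathbf{C}^{+}\mathbf{X}_{\cdot j\cdot}(\mathbf{A}^{T})^{+}\big),\quad \mathbf{c}_{(k)}^{T}=\mathrm{diagm}\big(\mathbf{A}^{+}\mathbf{X}_{\cdot\cdot k}(\mathbf{B}^{T})^{+}\big) \tag{3}$$
for $i=1,\dots,I$, $j=1,\dots,J$, and $k=1,\dots,K$, where the superscript + denotes the Moore-Penrose generalized inverse and diagm(·) extracts the diagonal elements of a matrix into a column vector. Strictly, (3) is not the least-squares solution of (2) but a strategy for calculating A, B, and C. The ATLD algorithm focuses on extracting the trilinear part of the three-way data and makes the iterative procedure more efficient. The combination of a generalized inverse based on truncated singular value decomposition and the diagm operation makes ATLD insensitive to the component number. Also, computing on slice matrices requires less memory and considerably reduces the computational load. In ATLD, however, any factor that influences the diagonal elements has a corresponding influence on the results.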
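The ATLD updates (3) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not Wu et al.'s reference implementation: the starting values are random, the columns of A and B are scaled to unit length after each update (an assumed convention that fixes the scale indeterminacy, leaving the scale in C), `numpy.linalg.pinv` serves as the SVD-based generalized inverse with its default cutoff for small singular values, and iteration stops when the relative change in the residual sum of squares becomes negligible. The function `atld` and its parameters are hypothetical names.

```python
import numpy as np

def atld(X, N, max_iter=500, tol=1e-10, seed=0):
    """Minimal ATLD sketch; X is I x J x K (excitation x emission x sample)."""
    I, J, _ = X.shape
    pinv = np.linalg.pinv                   # SVD-based Moore-Penrose inverse (small singular values cut off)
    rng = np.random.default_rng(seed)
    A, B = rng.random((I, N)), rng.random((J, N))   # assumed random initialization
    C = np.einsum("ni,ijk,jn->kn", pinv(A), X, pinv(B.T))
    ssr_old = np.inf
    for _ in range(max_iter):
        # a_(i)^T = diagm(B^+ X_i.. (C^T)^+), with X_i.. = X[i, :, :] (J x K)
        A = np.einsum("nj,ijk,kn->in", pinv(B), X, pinv(C.T))
        A /= np.linalg.norm(A, axis=0)      # assumed unit-length column scaling
        # b_(j)^T = diagm(C^+ X_.j. (A^T)^+), with X_.j. = X[:, j, :].T (K x I)
        B = np.einsum("nk,ijk,in->jn", pinv(C), X, pinv(A.T))
        B /= np.linalg.norm(B, axis=0)
        # c_(k)^T = diagm(A^+ X_..k (B^T)^+), with X_..k = X[:, :, k] (I x J)
        C = np.einsum("ni,ijk,jn->kn", pinv(A), X, pinv(B.T))
        ssr = np.sum((X - np.einsum("in,jn,kn->ijk", A, B, C)) ** 2)
        if np.isfinite(ssr_old) and abs(ssr_old - ssr) <= tol * ssr_old:
            break                           # relative change in the residual is negligible
        ssr_old = ssr
    return A, B, C

# Usage on synthetic, noise-free data with Gaussian-shaped profiles (illustrative only)
ex, em = np.arange(19)[:, None], np.arange(136)[:, None]
A_true = np.exp(-0.5 * ((ex - np.array([4.0, 9.0, 14.0])) / 3.0) ** 2)
B_true = np.exp(-0.5 * ((em - np.array([40.0, 70.0, 100.0])) / 15.0) ** 2)
C_true = np.random.default_rng(1).random((30, 3))
X = np.einsum("in,jn,kn->ijk", A_true, B_true, C_true)
A, B, C = atld(X, N=3)
rel_residual = np.linalg.norm(X - np.einsum("in,jn,kn->ijk", A, B, C)) / np.linalg.norm(X)
print(A.shape, B.shape, C.shape, rel_residual)
```

Each einsum computes all diagonal elements of (3) at once instead of forming the full $N \times N$ products slice by slice; the result is the same, only the loop over $i$, $j$, or $k$ is vectorized.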

APTLD Algorithm.
Based on the alternating least-squares principle and alternating penalty constraints, Xia et al. [14] developed the alternating penalty trilinear decomposition (APTLD) algorithm. In APTLD, the authors introduce penalty terms into three new objective functions. Taking the calculation of A as an example, the loss function combines a trilinear fitting term with a penalty term; in it, sqrt is the square root operator and 1 is a vector of length $N$ with all elements equal to one. The corresponding loss functions for B and C are built analogously. By minimizing these alternating penalty errors, the intrinsic profiles are obtained. Compared with the traditional parallel factor analysis (PARAFAC) algorithm, APTLD can avoid the two-factor degeneracy problem and speed up convergence. APTLD has also been found to be insensitive to the estimated component number, which avoids the difficulty of finding the correct component number for a model. In general, APTLD performs well as long as the component number is not less than the actual number of components.

SWATLD Algorithm.
Self-weighted alternating trilinear decomposition (SWATLD) is a second-order algorithm proposed by Chen et al. [15]. SWATLD is derived from ATLD and is based on the same idea. When updating the $i$th row of A, it uses a self-weighted update equation, which can lead to a least-squares solution; a proof is available in [8]. SWATLD is not only insensitive to the component number but also converges very fast. In addition, its built-in way of updating $\mathbf{a}_{(i)}$ makes the final solution more stable than that of ATLD.

Figures of Merit.
To compare the results of the different second-order calibration algorithms, two figures of merit are used: the consistency (COS) between the resolved and actual profiles, and the recovery. Second-order calibration algorithms generally also provide qualitative information, such as the excitation and emission spectra, which is very important because a compound of interest can be identified through its spectrum. The consistency between the resolved and reference profiles is computed from $\mathbf{a}$ and $\mathbf{b}$, the reference profiles of a component, and $\hat{\mathbf{a}}$ and $\hat{\mathbf{b}}$, the corresponding estimated profiles. The higher the COS value, the closer the resolved profiles are to the real ones.
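The two figures of merit can be sketched as follows. The exact COS formula is not reproduced here, so this sketch *assumes* COS is the mean congruence (absolute cosine of the angle) between the reference and resolved excitation and emission profiles; the paper's definition may differ. Recovery uses the usual definition, 100 × predicted / actual, averaged over samples; skipping samples whose actual concentration is zero (the concentration range starts at 0) is also an assumption. All function names are hypothetical.

```python
import numpy as np

def congruence(u, v):
    """Cosine of the angle between a reference and a resolved profile (sign-insensitive)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def cos_measure(a_ref, b_ref, a_hat, b_hat):
    """ASSUMED form of COS: mean congruence of the excitation and emission profiles."""
    return 0.5 * (congruence(a_ref, a_hat) + congruence(b_ref, b_hat))

def mean_recovery(predicted, actual):
    """Mean of 100 * predicted / actual over samples with a nonzero actual
    concentration (skipping blanks is an assumption of this sketch)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mask = actual != 0
    return 100.0 * np.mean(predicted[mask] / actual[mask])

a = np.array([0.1, 0.8, 0.3])
b = np.array([0.5, 0.2, 0.9, 0.1])
print(cos_measure(a, b, 2.0 * a, 0.5 * b))              # close to 1.0: rescaled profiles agree fully
print(mean_recovery([1.0, 2.2, 0.1], [1.0, 2.0, 0.0]))  # about 105.0 (mean of 100% and 110%)
```

A congruence-type measure is insensitive to the arbitrary scale of resolved profiles, which is why it suits second-order decompositions whose loadings are only determined up to scaling.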

Experimental and Data
A total of thirty samples containing different quantities of three dihydroxybenzenes, catechol, hydroquinone, and resorcinol, were analyzed by excitation-emission fluorescence spectroscopy. Indole was added as an interferent. All reagents and chemicals were of analytical reagent grade. The concentration range of each analyte of interest was 0 to 9 × 10⁻⁵ mol/L, and the concentration of indole was varied randomly in the range 0 to 2 × 10⁻⁵ mol/L. Half of the samples were used as calibration samples and the other half as prediction samples. All response matrices were recorded on a Perkin-Elmer fluorescence spectrophotometer with excitation wavelengths from 230 to 320 nm at 5 nm intervals and emission wavelengths from 230 to 500 nm at 2 nm intervals. A 1 cm quartz cell was used for all measurements. The effect of Rayleigh scattering on the response matrices was roughly reduced by subtracting the response matrix of an average blank solution from all sample response matrices. The excitation and emission monochromator slit widths were both 5 nm. For each sample, a 136 × 19 (emission × excitation) data matrix was recorded. All programs were implemented in MATLAB on a personal computer running Windows XP.
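The wavelength grids above and the resulting matrix size can be checked, and the data assembled into a three-way array, with the following NumPy sketch. It assumes that the "average blank" is the mean of several recorded blank matrices, that each 136 × 19 matrix is transposed to excitation × emission to match the $I \times J \times K$ convention of the trilinear model, and that the first fifteen samples form the calibration set. The function `build_three_way`, the number of blanks, and the random placeholder data are hypothetical.

```python
import numpy as np

ex_nm = np.arange(230, 321, 5)   # 230-320 nm in 5 nm steps -> 19 excitation wavelengths
em_nm = np.arange(230, 501, 2)   # 230-500 nm in 2 nm steps -> 136 emission wavelengths

def build_three_way(sample_eems, blank_eems):
    """Stack blank-corrected EEMs into an I x J x K array (excitation x emission x sample).

    sample_eems: list of 136 x 19 matrices (emission x excitation), as recorded.
    blank_eems:  list of 136 x 19 blank matrices, averaged before subtraction (assumption).
    """
    blank = np.mean(blank_eems, axis=0)
    corrected = [m - blank for m in sample_eems]        # rough reduction of Rayleigh scatter
    return np.stack([m.T for m in corrected], axis=2)   # transpose to excitation x emission

rng = np.random.default_rng(1)
samples = [rng.random((em_nm.size, ex_nm.size)) for _ in range(30)]        # placeholder data
blanks = [0.01 * rng.random((em_nm.size, ex_nm.size)) for _ in range(3)]   # hypothetical blanks
X = build_three_way(samples, blanks)
X_cal, X_test = X[:, :, :15], X[:, :, 15:]   # hypothetical split into calibration/prediction halves
print(ex_nm.size, em_nm.size, X.shape)        # 19 136 (19, 136, 30)
```

The counts follow from (320 − 230)/5 + 1 = 19 and (500 − 230)/2 + 1 = 136, matching the 136 × 19 matrix recorded for each sample.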

Results and Discussion
For a second-order instrument, the response signal of a single chemical sample is a matrix, which can be visualized as a two-dimensional surface or landscape.
When a group of samples is considered, all second-order data can be stacked into a single three-dimensional array; therefore, a set of second-order data is also called a three-way array.
As an example, Figure 1 shows the excitation-emission matrix fluorescence spectrum of a mixture. The excitation and emission wavelengths range from 230 to 320 nm and from 230 to 500 nm, respectively. These ranges were selected to cover the regions of maximum signal for the components of interest while avoiding useless background signals such as Rayleigh and Raman scattering. Figure 2 gives the contours of the excitation-emission matrix fluorescence spectra of the pure components of interest and a representative mixture. The spectra of the pure components overlap heavily. Such heavy overlap hinders direct fluorescence quantification and restricts the use of univariate calibration. A modern strategy for overcoming this problem is second-order calibration, which replaces physical or chemical separation with mathematical separation by extracting the signals of the components of interest from those of the background and interferences. Figure 3 gives the excitation and emission profiles of samples of the three pure components of interest, where the spectral overlap is again evident. Three second-order calibration algorithms, that is, ATLD, SWATLD, and APTLD, were used to resolve the spectral and concentration profiles. Building these models requires determining the number of components/factors, and several methods are available for this purpose. In this study, the core consistency diagnostic was applied [19]. When a sequence of models was constructed with an increasing number of components, the core consistency tended to start high and then drop abruptly at the point where too many components were used. The optimal number of components was set to the number in the largest model with a high core consistency. Based on these results, the optimal number was five for each of these algorithms.
Figure 4 displays the resolved and actual excitation and emission profiles obtained by the three models with the same number of components. As Figure 4 shows, both the SWATLD and APTLD algorithms work well, with no significant difference between them, whereas ATLD fails to give satisfactory results.
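The core consistency diagnostic [19] can be sketched as follows, assuming its usual definition: given the loadings A, B, and C of a fitted $N$-component model (from any of the three algorithms), the least-squares Tucker3 core $\mathbf{G} = \underline{\mathbf{X}} \times_1 \mathbf{A}^{+} \times_2 \mathbf{B}^{+} \times_3 \mathbf{C}^{+}$ is compared with the ideal superdiagonal core $\mathbf{T}$ of ones, and CC = 100 (1 − Σ(g − t)² / N). The function name is hypothetical; in practice it would be evaluated for models with an increasing number of components, as described above.

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Core consistency (%) of an N-component trilinear model of X (I x J x K)."""
    N = A.shape[1]
    # Least-squares core for fixed loadings: G = X x1 A^+ x2 B^+ x3 C^+
    G = np.einsum("ijk,ai,bj,ck->abc", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    T = np.zeros((N, N, N))
    T[np.arange(N), np.arange(N), np.arange(N)] = 1.0   # ideal superdiagonal core
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / N)

rng = np.random.default_rng(0)
A, B, C = rng.random((19, 3)), rng.random((136, 3)), rng.random((30, 3))
X = np.einsum("in,jn,kn->ijk", A, B, C)   # exactly trilinear, noise-free
print(core_consistency(X, A, B, C))        # 100 up to rounding for a perfect trilinear model
```

Because the pseudoinverse of a Kronecker product is the Kronecker product of the pseudoinverses, the least-squares core can be computed mode by mode as above, and rescaling a loading column (with the inverse scale moved to another mode) leaves the diagnostic unchanged.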
To further analyze the sensitivity of the models to the selected number of components, models with 4 to 8 components were constructed, and the mean COS and recovery values on the test set are summarized in Table 1; these results are consistent with the trend in Figure 4. Figure 5 displays the mean COS values for the three algorithms and a varying number of components. When the number of components is 4, 5, or 6, the COS value of each algorithm remains almost unchanged. This is consistent with previous reports that, compared with PARAFAC, these second-order calibration algorithms are insensitive to the chosen component number and often work well provided that the number of components is larger than the actual one. However, Figure 5 also shows that they still lead to a dangerous overfitted solution when too many components are allowed. Therefore, the insensitivity to the component number is conditional and limited, since using too many factors eventually introduces spurious solutions.
Figure 6 compares the mean recovery for quantifying the dihydroxybenzenes obtained with the three algorithms and a varying number of components. On average, ATLD models with 4 to 7 factors give about the same recovery, whereas both APTLD and SWATLD give their highest recovery with five factors.
In addition, a high COS value does not necessarily imply a high recovery; the trends of the two measures are not entirely consistent. This indicates that more than one measure is needed when comparing different algorithms or models.

Conclusions
Based on the second-order advantage, the combination of excitation-emission matrix fluorescence spectroscopy and second-order calibration was investigated for the simultaneous determination of three dihydroxybenzenes, and a comprehensive comparison was made. In terms of the two measures, that is, the consistency (COS) between the resolved and actual profiles and the mean recovery on the test set, the APTLD algorithm outperformed the others. The convergence rates of the different algorithms were similar. These results indicate that such an approach is a promising alternative for practical applications in environmental quality control.
Province of China (2013JY0101), the Yibin Municipal Innovation Foundation (2013GY018), the Innovative Research and Teaching Team Program of Yibin University (Cx201104), and the Scientific Research Foundation of Sichuan Provincial Education Department of China (12ZA201 and 13ZB0300).

Figure 1: The excitation-emission matrix fluorescence spectrum of a mixture, with excitation and emission wavelengths varying from 230 to 320 nm and from 230 to 500 nm, respectively.

Figure 2: The contours of the excitation-emission matrix fluorescence spectra of the pure components of interest and a mixture.

Figure 3: The excitation and emission profiles of samples of three pure components of interest.

Figure 6: Comparison of the mean recovery of quantification for three different algorithms and a varying number of components.

Table 1: Summary of the mean COS and the mean recovery for different numbers of factors and algorithms.