Quantitative Determination of the Multicomponents With Overlapping Ultraviolet Spectra Using Wavelet-Packed Transform and Partial Least Squares

This paper presented a novel method named wavelet packet transform-based partial least squares method (WPTPLS) for simultaneous spectrophotometric determination of α-naphthylamine, p-nitroaniline, and benzidine. Wavelet packet representations of signals provided a local time-frequency description and separation ability between information and noise. The quality of the noise removal can be improved by using best-basis algorithm and thresholding operation. Partial least squares (PLS) method uses both the response and concentration information to enhance its ability of prediction. In this case, by optimization, wavelet function and decomposition level for WPTPLS method were selected as Db16 and 3, respectively. The relative standard errors of prediction (RSEP) for all components with WPTPLS and PLS were 2.23% and 2.71%, respectively. Experimental results showed WPTPLS method to be successful and better than PLS.


INTRODUCTION
Many efforts have been made to resolve overlapping signals in spectrophotometry. As a consequence of peak overlap, the quality of the analytical information is lower than that derived from isolated peaks; the extent of the loss depends on the extent of the overlap. In complex samples, however, spectral overlap often occurs. Strongly overlapped signals do not permit direct determination by traditional methods without prior separation. To overcome this difficulty, multivariate analysis methods [1][2][3] such as partial least squares (PLS) and principal components regression (PCR) have proved useful. When PLS and PCR are applied to analyze samples, the first problem encountered is to determine the number of components that the samples contain. Unfortunately, data obtained from instrumental measurements can be contaminated by noise, and the presence of noise often causes the true number of chemical components to be overestimated. In order to eliminate noise, a wavelet packet (WP) denoising method was used as a preprocessing step. Wavelet packet transform (WPT) is an important extension of wavelet transform (WT). WT is a powerful tool with a very rich mathematical content and great potential for application [4,5]. WPT inherits the properties of a sparse representation of the original signal and time-frequency localization, and offers more flexibility than wavelet analysis [6,7]. The novel approach tried here is to combine WPT with PLS to eliminate noise and improve the quality of regression.
Aniline-type compounds are widely applied in industries such as chemistry, printing, and pharmacy, and are among the most important raw materials for synthetic medicines, dyes, insecticides, polymers, and explosives. Aniline-type compounds are highly poisonous and can also cause cancer. Therefore, it is very important to test for and analyze aniline-type compounds in environmental samples.
Simultaneous determination of aniline-type compounds is very difficult due to their overlapping spectra. In this paper, WPTPLS method was developed and used to perform simultaneous determination of α-naphthylamine, p-nitroaniline, and benzidine. Experimental results showed the proposed method to be successful and better than PLS.

WPT denoising
A wavelet packet $W_{jnk}$ is generated from the base function
$$W_{jnk}(x) = 2^{-j/2}\, W_n\!\left(2^{-j}x - k\right), \tag{1}$$
where the indices $j$, $n$, $k$ are the scale, the oscillation, and the localization parameter, respectively; $j, k \in Z$, where $Z$ is the set of integers, and $n = 0, 1, 2, \ldots, 2^j - 1$. The discrete wavelet transform (DWT) can be implemented by means of Mallat's pyramid algorithm [8]. DWT can be characterized as a recursive application of the high-pass and low-pass filters that form a quadrature mirror filter (QMF) pair. The theoretical background of DWT has been described in detail [9]. The difference between WT and WPT is the decomposition path. In WPT, both the approximations and the details are analyzed. The recursion is simply to filter and downsample all outputs of the previous level. A fast wavelet packet transform (FWPT) is expressed as
$$W_{j+1,2n} = H\,W_{j,n}, \qquad W_{j+1,2n+1} = G\,W_{j,n}, \tag{2}$$
where $W_{0,0}$ indicates the measured signal $f$, and $H = \{h_l\}_{l \in Z}$ and $G = \{g_l\}_{l \in Z}$ are the low-pass and high-pass filter matrices. The first and second indices of $W$ indicate the level of decomposition and the position at that level. The reconstruction can be implemented by
$$W_{j,n} = H^{*}\,W_{j+1,2n} + G^{*}\,W_{j+1,2n+1}, \tag{3}$$
where $H^{*}$ and $G^{*}$ represent the conjugate matrices of $H$ and $G$.
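The recursion described above (filter and downsample every output of the previous level, then invert with the conjugate filters) can be illustrated with a minimal sketch. It assumes the two-tap Haar filter pair instead of the Db16 filters selected in this work, a signal length divisible by 2^levels, and hypothetical helper names (`split`, `merge`, `fwpt`, `ifwpt`); it is not the authors' program.

```python
import math

R = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumption: Haar, not Db16)

def split(w):
    """One FWPT step: low-pass (H) and high-pass (G) filtering plus downsampling by 2."""
    low = [(w[i] + w[i + 1]) * R for i in range(0, len(w), 2)]
    high = [(w[i] - w[i + 1]) * R for i in range(0, len(w), 2)]
    return low, high

def merge(low, high):
    """Inverse step: rebuild the parent block from its low- and high-pass children."""
    out = []
    for a, d in zip(low, high):
        out += [(a + d) * R, (a - d) * R]
    return out

def fwpt(f, levels):
    """Full wavelet packet tree; tree[j][n] holds the coefficient block W_{j,n}."""
    tree = [[list(f)]]
    for _ in range(levels):
        nxt = []
        for node in tree[-1]:          # both approximations and details are split
            low, high = split(node)
            nxt += [low, high]         # children land at positions 2n and 2n+1
        tree.append(nxt)
    return tree

def ifwpt(nodes):
    """Reconstruct the signal from all blocks of one complete tree level."""
    while len(nodes) > 1:
        nodes = [merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

f = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # made-up signal for illustration
tree = fwpt(f, 3)
rec = ifwpt(tree[3])
```

With an orthonormal filter pair such as Haar, the transform preserves the signal energy and the reconstruction returns the input up to rounding error.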
The wavelet packet denoising procedures include four steps: (1) WPT, (2) estimation of the best basis, (3) thresholding of wavelet packet coefficients, and (4) reconstruction. The best basis is selected according to entropy-based criterion proposed by Coifman and Wickerhauser [6]. Shannon entropy was applied in this case. The thresholding operation is implemented by the SURE method proposed by Donoho [10] based on Stein's unbiased risk estimation.
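Steps (2) and (3) can be sketched as follows. This is only an illustration under stated assumptions: the entropy is the non-normalized Shannon cost commonly used with the Coifman-Wickerhauser best-basis search, the filters are Haar rather than Db16, and the threshold is passed in by hand instead of being estimated by SURE. The function names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumption: Haar, not Db16)

def shannon_entropy(coeffs):
    """Non-normalized Shannon cost: -sum(c^2 * log(c^2)), with 0*log(0) taken as 0."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def soft_threshold(coeffs, thr):
    """Shrink every coefficient toward zero by thr; small coefficients become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def haar_split(w):
    """Low-pass and high-pass filtering followed by downsampling by 2."""
    low = [(w[i] + w[i + 1]) * R for i in range(0, len(w), 2)]
    high = [(w[i] - w[i + 1]) * R for i in range(0, len(w), 2)]
    return low, high

def best_basis(node, levels):
    """Return (cost, blocks) of the lowest-entropy basis at or below this node."""
    cost = shannon_entropy(node)
    if levels == 0 or len(node) < 2:
        return cost, [node]
    low, high = haar_split(node)
    c_low, b_low = best_basis(low, levels - 1)
    c_high, b_high = best_basis(high, levels - 1)
    if c_low + c_high < cost:          # children are cheaper: keep the split
        return c_low + c_high, b_low + b_high
    return cost, [node]                # parent is cheaper: prune the subtree

# A constant signal is represented most sparsely deep in the tree,
# whereas a single spike is best left undecomposed.
flat_cost, flat_blocks = best_basis([0.5, 0.5, 0.5, 0.5], 2)
spike_cost, spike_blocks = best_basis([1.0, 0.0, 0.0, 0.0], 2)
shrunk = soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0)
```

In a full denoising pass, the thresholding would be applied to the blocks returned by the best-basis search before the inverse transform is carried out.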

The wavelet packet transform partial least squares method
In this method, WPT is used as a tool for removing noise from the original data. The denoising is applied in the wavelet packet domain as described above, prior to back-transforming to the original domain. The reconstructed matrices of the standard and unknown mixtures were then used for the subsequent PLS operation. The PLS algorithm is built on the properties of the nonlinear iterative partial least squares (NIPALS) algorithm, which calculates one latent vector at a time. The NIPALS-PLS algorithm and calculation details have been described previously [11]. Based on this algorithm, a program called PWPT-PLS was designed to perform data compression and denoising as well as simultaneous determination.
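The idea of calculating one latent vector at a time can be sketched with a small NIPALS routine. The sketch assumes the single-analyte variant (PLS1), mean centering only, and made-up spectra and concentrations; the names `pls1_fit` and `pls1_predict` are hypothetical and the code is not the PWPT-PLS program.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_fit(X, y, n_factors):
    """NIPALS PLS1: X is a list of spectra (rows), y the concentrations of one analyte."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    E = [[row[k] - x_mean[k] for k in range(p)] for row in X]   # centred spectra
    f = [v - y_mean for v in y]                                 # centred concentrations
    factors = []
    for _ in range(n_factors):
        # weight vector: spectral direction with maximal covariance with f
        w = [dot([E[i][k] for i in range(n)], f) for k in range(p)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # scores
        tt = dot(t, t)
        load = [dot([E[i][k] for i in range(n)], t) / tt for k in range(p)]
        q = dot(f, t) / tt                                      # concentration loading
        # deflation: remove the part explained by this latent vector
        E = [[E[i][k] - t[i] * load[k] for k in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors

def pls1_predict(model, x):
    """Predict the concentration for one spectrum, one latent vector at a time."""
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat

# Made-up two-component example obeying Beer's law (noise-free).
s1 = [1.0, 0.8, 0.3, 0.1]
s2 = [0.2, 0.5, 0.9, 0.6]
conc = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (3.0, 2.0), (2.0, 4.0)]
X = [[c1 * a + c2 * b for a, b in zip(s1, s2)] for c1, c2 in conc]
y = [c1 for c1, _ in conc]
model = pls1_fit(X, y, 2)
x_new = [2.5 * a + 1.5 * b for a, b in zip(s1, s2)]
c1_found = pls1_predict(model, x_new)
```

For noise-free data of this kind, two latent vectors are enough to recover the concentration of the first component in the new mixture; with real spectra the number of factors has to be chosen as discussed later in the paper.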

Apparatus and reagents
A Shimadzu UV-240 spectrophotometer equipped with an OPI-2 function was used for all experiments; a Legend Pentium IV microcomputer was used for all the calculations; pH measurements were made with a pH-3B digital pH-meter with a glass-saturated calomel dual electrode. All reagents were of analytical reagent grade. The water used was doubly distilled and deionized. Stock standard solutions of 2.000 mg ml⁻¹ α-naphthylamine, p-nitroaniline, and benzidine were prepared from the corresponding reagents with water as the solvent. Standard solutions were then prepared from the stock standard solutions by serial dilution as required. An acetic acid (HAc)-sodium acetate (NaAc) buffer solution (pH 6.30) was used.

Procedures
A series of mixed standard solutions containing various ratios of the three organic compounds was prepared in 25 ml standard flasks; 10.00 ml of HAc-NaAc buffer solution (pH 6.30) was added, and the solutions were diluted to the mark with distilled water. A blank solution was prepared similarly. Spectra were measured in 1 cm cuvettes between 250 nm and 460 nm at 2 nm intervals with respect to a reagent blank. An absorption matrix D was built up. All the measured values were means of three replicates.

Evaluation of the performance of the test methods
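Absolute and relative standard errors of prediction (SEP and RSEP) were used as the criteria for comparing the performances of the test methods. The SEP for a single component is given by (4); that for all components by (5). The RSEP is given by (6) [12]:
$$\mathrm{SEP}_i = \sqrt{\frac{\sum_{j=1}^{m} \left(C_{ij} - \hat{C}_{ij}\right)^2}{m}}, \tag{4}$$
$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{m} \left(C_{ij} - \hat{C}_{ij}\right)^2}{nm}}, \tag{5}$$
$$\mathrm{RSEP} = 100 \times \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{m} \left(C_{ij} - \hat{C}_{ij}\right)^2}{\sum_{i=1}^{n}\sum_{j=1}^{m} C_{ij}^2}}, \tag{6}$$
where $C_{ij}$ and $\hat{C}_{ij}$ are the actual and estimated concentrations, respectively, for the ith component in the jth mixture, m is the number of mixtures, and n is the number of components.
... 302 nm, and 278 nm, respectively. It can be seen from Figure 1 that the absorption spectra of the three components overlap seriously in their absorbing regions, so that only one peak can be recognized for the mixed solution.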
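The SEP and RSEP criteria defined in this section can be computed with a few lines of code. The sketch below assumes that concentrations are stored as nested lists indexed as [component][mixture]; the function names and the numbers are made up for illustration and are not taken from the paper's tables.

```python
import math

def sep_single(actual, found):
    """SEP for one component over m mixtures."""
    m = len(actual)
    return math.sqrt(sum((c - ch) ** 2 for c, ch in zip(actual, found)) / m)

def sep_total(actual, found):
    """SEP for all n components over m mixtures; inputs are indexed [component][mixture]."""
    n, m = len(actual), len(actual[0])
    sq = sum((c - ch) ** 2
             for row_a, row_f in zip(actual, found)
             for c, ch in zip(row_a, row_f))
    return math.sqrt(sq / (n * m))

def rsep(actual, found):
    """Relative standard error of prediction in percent."""
    sq_err = sum((c - ch) ** 2
                 for row_a, row_f in zip(actual, found)
                 for c, ch in zip(row_a, row_f))
    sq_act = sum(c ** 2 for row_a in actual for c in row_a)
    return 100.0 * math.sqrt(sq_err / sq_act)

# Illustrative values only: two components, two mixtures.
actual = [[10.0, 20.0], [30.0, 40.0]]
found = [[11.0, 19.0], [30.0, 42.0]]
sep_first = sep_single(actual[0], found[0])
sep_all = sep_total(actual, found)
rsep_all = rsep(actual, found)
```

Because the RSEP is normalized by the actual concentrations, it allows methods to be compared across concentration ranges, which is why it is reported alongside the SEP.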

Wavelet packet transform and wavelet packet denoising
Here, the mean spectrum of the D matrix was selected as the original signal f. The WPT of the signal f was carried out using the FWPT algorithm. Part of the WP coefficients obtained by FWPT is shown in Figure 2. Each block of coefficients is identified by the index pair (j, n), where j is the level of decomposition and n is the position at that level. From Figure 2, it is obvious that w(j, 0) contains only a positive part and is similar to the original signal. The other blocks are composed of both positive and negative parts. Each block of coefficients describes the components of the signal f related to a certain frequency band. This flexible time-frequency resolution enables the WP to characterize locally the most relevant parts of a signal and hence to represent a signal adequately with a relatively small number of coefficients. In spectrophotometric measurements, the analytical signal is usually concentrated in the low-frequency part, whereas the noise lies in the high-frequency part. The aim of WP denoising is to extract the desired signal from a complex instrument output in which the signal is present along with noise. Random Gaussian noise was added to the mean spectrum to assess the WP denoising method. The original and reconstructed spectra, as well as their differences at different levels of added noise, are displayed in Figure 3. It was found that WPT provides an appropriate approach for denoising even when 2% noise is added. Thus the method can safely be used for preprocessing the two-dimensional raw data matrix in the following WPTPLS operation.

The wavelet packet transform partial least squares method
Each wavelet function has different characteristics. A wavelet function that is optimal for a given signal is not necessarily the best for another type of signal. Thus, the choice of the wavelet function is very important for this technique. In this work, the wavelet functions tested were Coiflet 1, 2, ..., 5; Daubechies 4, 6, ..., 20; and Symmlet 4, 5, ..., 8. The predictive parameters SEP and RSEP can be used to find the optimum choice of function. In a similar way, decomposition levels L from one to six were tested. The influence of the wavelet functions and decomposition levels is listed in Tables 1 and 2. According to these experimental results, the wavelet function and decomposition level for the WPTPLS method were selected as Daubechies (Db) 16 and 3, respectively.
A training set of 16 samples formed by mixtures of the three organic compounds was designed according to a four-level orthogonal array design with the $L_{16}(4^5)$ matrix. Table 3 summarizes the composition of the training set. The experimental data obtained from the training set were arranged in matrix D, where each column corresponds to the absorbances of the different mixtures at a given wavelength and each row represents the spectrum of a given mixture. With FWPT, the spectrum of each mixture can be treated individually. Therefore, each row vector of the matrices D and $D_u$ was decomposed in the same way, denoised by best-basis selection and thresholding, and then reconstructed by applying the inverse FWPT.
Determining the number of factors is one of the most important steps in the PLS method. The essence of this step is the pseudorank determination of the raw experimental data. Three principal factors were selected for this case based on previously reported methods [13]. Before starting the WPTPLS calculation, mean centering and data standardization were performed as preprocessing. After this transform, a matrix in which each column has zero mean and unit variance was obtained.
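The mean centering and standardization step can be sketched as column-wise autoscaling. The sketch assumes the sample (n - 1) definition of variance, which the text does not specify, and uses made-up absorbance values; `autoscale` and `apply_autoscale` are hypothetical names.

```python
import math

def autoscale(D):
    """Centre each column (wavelength) to zero mean and scale it to unit variance."""
    n, p = len(D), len(D[0])
    means = [sum(row[k] for row in D) / n for k in range(p)]
    # assumption: sample standard deviation (n - 1); a constant column would divide by zero
    stds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in D) / (n - 1))
            for k in range(p)]
    Z = [[(row[k] - means[k]) / stds[k] for k in range(p)] for row in D]
    return Z, means, stds

def apply_autoscale(row, means, stds):
    """Scale a new spectrum with the means and deviations of the training set."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

# Made-up absorbances: four mixtures (rows) at two wavelengths (columns).
D = [[0.10, 0.50],
     [0.20, 0.70],
     [0.30, 0.60],
     [0.40, 0.80]]
Z, means, stds = autoscale(D)
z_new = apply_autoscale([0.25, 0.65], means, stds)
```

Spectra of unknown mixtures are scaled with the training-set statistics rather than their own, so that the calibration model and the predictions refer to the same coordinate system.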
Using the program PWPTPLS, the concentrations of the three organic compounds in a test set were calculated. The actual concentrations, found concentrations, and recoveries are listed in Table 4. The experimental results showed that the SEP and RSEP for all components were 0.192 μg ml⁻¹ and 2.23%, respectively.

A comparison of WPTPLS and PLS
In order to evaluate WPTPLS method, two methods were tested in the study with a set of synthetic unknown samples. The RSEP for the two methods are given in Table 5.
The RSEP for all components calculated by WPTPLS and PLS methods were 2.23% and 2.71%, respectively. The results demonstrated that the WPTPLS method had better performance than PLS method.

CONCLUSIONS
A method named WPTPLS was developed for multicomponent spectrophotometric determination. The method combines WPT denoising with PLS regression to enhance the noise removal ability and the quality of the regression. In WP denoising, time-frequency localization, the best-basis algorithm, and the thresholding operation were used to improve the quality of denoising. In the PLS operation, errors in both the concentrations and the spectra were taken into account to improve the predictive properties. Experimental results showed that WPTPLS performed better than the PLS method (RSEP of 2.23% versus 2.71% for all components).