Simultaneous spectrophotometric determination of manganese, zinc and cobalt by kernel partial least-squares method

Simultaneous spectrophotometric determination of Mn, Zn and Co was studied by two methods, classical partial least-squares (PLS) and kernel partial least-squares (KPLS), with 2-(5-bromo-2-pyridylazo)-5-diethylaminophenol (5-Br-PADAP) and cetylpyridinium bromide (CPB). Two programs, SPGRPLS and SPGRKPLS, were written to perform the calculations. Eight error functions were calculated to determine the number of factors. Data reduction was performed using principal component analysis. The KPLS method was applied for rapid determination from a data matrix with many wavelengths and few samples. The relative standard errors of prediction (RSEP) for all components were the same for the KPLS and PLS methods (0.0247). Experimental results showed both methods to be successful even where the spectra overlapped severely.


Introduction
The partial least-squares (PLS) method is a general method for building a predictive model between two blocks of variables: the D-block of predictor variables (absorbances) and the C-block of response variables (concentrations). PLS is a factor-based method that can use full spectra, so as much spectral detail as possible is included in the analysis [1]. The advantage of PLS is that it transforms the numerous original variables into a small number of latent vectors, which are linear combinations of the original variables [2,3]. New analytical instruments produce huge quantities of data that must be reduced in some way before they can be analysed in practice [4]. The analysis of large data arrays is emerging as a problem in analytical chemistry, and the best approach to this problem is data compression.
Of course, the compression must be such that the loss of significant information is minimized. The increasing complexity of chemical data has recently stimulated the development of two kinds of data compression methods.
The first approach uses B-splines [5] or any other suitable compression basis to produce a compressed matrix G, which has far fewer elements than D [6]. The second approach is based on small kernel matrices, which require much less storage space than the original data [7][8][9]. In this work, large data sets with many wavelengths and few samples were easily recorded using the 'data print out at wavelength intervals' mode of a Shimadzu UV-240 spectrophotometer. With classical PLS, the calculations for simultaneous multicomponent determination were slow or were interrupted by out-of-memory errors on our microcomputer.
The kernel method of ref. [10] provides a simple way of speeding up the calculations. Simultaneous determination of manganese, zinc and cobalt with 5-Br-PADAP and CPB by traditional spectrophotometry is difficult because their absorption spectra overlap. The determination of trace amounts of Mn, Zn and Co has recently received considerable attention owing to concern about environmental pollution [11]. This paper describes how multicomponent determination can be improved by using full-spectrum information with two PLS methods.

Theory
The following notation is used in this paper. Lower-case letters denote column vectors and capital letters denote matrices; the transpose of a vector or matrix is denoted by the superscript T, e.g. $D^T$ and $C^T$. The scalar product of two column vectors is thus written $a^T b$. The Euclidean norm of a column vector $a$ is written $\|a\| = (a^T a)^{1/2}$. The symbol $I$ denotes the identity matrix.

PLS method
The PLS algorithm is built on the properties of the nonlinear iterative partial least-squares (NIPALS) algorithm and calculates one latent vector at a time. The absorbance and concentration matrices, D and C, are assumed to be mean-centred and normalized. PLS calibration decomposes both the absorbance and the concentration matrix into latent variables, $D = TP^T + E$ and $C = UQ^T + F$. The inner relation linking the two equations is $U = TB$, where $B$ is a diagonal regression matrix. The score matrix $T$ is computed both to model $D$ and to correlate with $C$. This is accomplished by introducing a weight matrix $W$ and a latent concentration matrix $U$, with the corresponding loading matrix $P$. Prediction is done by decomposing the D block and building up the C block; for this purpose, $p$, $q$, $w$ and $b$ from the calibration step are saved for every PLS factor. The NIPALS-PLS algorithm is shown in table 1. As the algorithm shows, the PLS latent vectors are determined by estimating the weight vectors $w$ that form linear combinations of the columns of $D$. Once a weight vector has been determined, all other quantities can be calculated from it. The first weight vector, $w_1$, is chosen to maximize the sum of squared covariances of $Dw_1$ with the original $C$ matrix, $\max(w_1^T D^T C C^T D w_1)$ subject to $w_1^T w_1 = 1$. The solution is the eigenvector of $D^T C C^T D$ with the largest eigenvalue.

Table 1. NIPALS-PLS algorithm
1. Set the new vector, $t_{new}$, to the column of $C$ with the maximum standard deviation.
2. Set the old vector, $t_{old}$, to 10 000 000, so that the first convergence test fails.
3. If $\|t_{new} - t_{old}\| / \|t_{new}\|$ exceeds the convergence criterion, go to step 4; otherwise go to step 7.
4. $t_{old} = t_{new}$
5. Set $u$ to the first column of $C$.
6. $w^T = u^T D / u^T u$; $w = w/\|w\|$; $t = Dw / w^T w$; $q^T = t^T C / t^T t$; $q = q/\|q\|$; $u = Cq / q^T q$.
7. Calculate the D loading and rescale the scores and weights accordingly: $p^T = t^T D / t^T t$; $p_{new} = p_{old}/\|p_{old}\|$; $t_{new} = t_{old}\|p_{old}\|$; $w_{new} = w_{old}\|p_{old}\|$.
10. $p^T$, $q^T$ and $w^T$ are saved for prediction.
11. Regression for the inner relation: $b = u^T t / t^T t$.
14. If enough components have been calculated, stop; otherwise return to step 1 with the updated matrices.

KPLS method
The KPLS method [10] is intended for a D matrix with many wavelengths and few samples. A kernel algorithm is based on the eigenvectors of the kernel matrix $DD^TCC^T$, whose size depends only on the number of samples. Because the kernel matrix is much smaller than the D matrix, the calculation is much faster than with classical PLS. In PLS, all the vectors $w$, $q$, $t$ and $u$ can be calculated using the following eigenvalue-eigenvector equations.
$$\lambda_1 w = (D^T C C^T D)\,w$$
$$\lambda_2 q = (C^T D D^T C)\,q$$
$$\lambda_3 t = (D D^T C C^T)\,t$$
$$\lambda_4 u = (C C^T D D^T)\,u$$
Here, $\lambda_1, \ldots, \lambda_4$ are the eigenvalues of these kernel matrices [12], and the vectors $w$, $q$ and $u$ all have unit norm. Using the kernel matrix $DD^TCC^T$ and the association matrices $DD^T$ and $CC^T$, it is possible to calculate all score and loading vectors, and hence to carry out a complete PLS regression. The PLS regression solution can be written as $C = DB_{PLS} + F$, with regression coefficients $B_{PLS} = W(P^TW)^{-1}Q^T$. The steps of the algorithm are as follows:
(1) Calculate the covariance matrices $DD^T$ and $CC^T$.
The kernel matrix is then formed as $(DD^TCC^T)_a = (DD^T)_a(CC^T)_a$, where the subscript $a$ is the rank (factor) index, $a = 1, 2, \ldots, n$.
(2) The PLS score vector $t_a$ is estimated as the eigenvector of the kernel matrix: $\lambda_a t_a = (DD^TCC^T)_a t_a$.
(3) The PLS score vector $u_a$ is calculated using the $CC^T$ covariance matrix: $u_a = (CC^T)_a t_a$.
(5) Steps (2)
Cuvettes with a path length of cm were used, and the blank absorbance due to 5-Br-PADAP was subtracted. Spectra were measured between 490 and 620 nm at 2 nm intervals, giving values at 66 wavelengths for each standard solution. These spectra were assembled into the absorbance matrix D.
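As a quick check of the arithmetic, (620 − 490)/2 + 1 = 66 wavelengths per spectrum. The 490 to 600 nm window used later for the KPLS training set keeps (600 − 490)/2 + 1 = 56 of them. The short Python sketch below reproduces both counts. It assumes that the columns of D are ordered by increasing wavelength, which the text does not state explicitly. The helper names are hypothetical.

```python
def wavelength_grid(start_nm, stop_nm, step_nm):
    """Wavelengths from start_nm to stop_nm inclusive, every step_nm nanometres."""
    return list(range(start_nm, stop_nm + 1, step_nm))

def select_columns(D, grid, lo_nm, hi_nm):
    """Keep the columns of D (rows = solutions, columns = wavelengths in grid order)
    whose wavelength lies in [lo_nm, hi_nm]. Column order is an assumption."""
    keep = [j for j, wl in enumerate(grid) if lo_nm <= wl <= hi_nm]
    return [[row[j] for j in keep] for row in D]

grid = wavelength_grid(490, 620, 2)
print(len(grid))                              # 66 wavelengths per spectrum
print(sum(1 for wl in grid if wl <= 600))     # 56 wavelengths in the 490-600 nm window
```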
Results and discussion
When three components were considered, the real error (residual standard deviation) function, RE, had a value of 0.0008; the imbedded error function, IE, had a value of 0.0004; and the extracted error function, XE, had a value of 0.0007. From these criteria, it was concluded that three absorbing species were present. IE represents the error that remains imbedded in the data reproduced by abstract factor analysis. Since RE > IE, abstract factor analysis can improve the data.
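RE, IE and XE are error functions from Malinowski's abstract factor analysis. The sketch below assumes their usual definitions for an $r \times c$ data matrix ($r \ge c$) whose covariance matrix has eigenvalues $\lambda_1 \ge \ldots \ge \lambda_c$, with $n$ factors retained:

- $RE = [\sum_{j=n+1}^{c} \lambda_j / (r(c - n))]^{1/2}$
- $IE = RE\,(n/c)^{1/2}$
- $XE = RE\,((c - n)/c)^{1/2}$

It follows that $RE^2 = IE^2 + XE^2$. The values reported above agree with this identity within rounding, since $(0.0004^2 + 0.0007^2)^{1/2} \approx 0.0008$. The sketch also includes the indicator function $IND = RE/(c - n)^2$, whose minimum is often used to choose $n$. This is an assumption, because the text does not list its eight error functions. Function names are hypothetical.

```python
import math

def malinowski_errors(eigenvalues, r, n):
    """RE, IE, XE and IND (assumed Malinowski definitions) for n retained factors.

    eigenvalues: all c eigenvalues of the covariance matrix, in decreasing order.
    r: number of rows of the data matrix (assumed r >= c)."""
    c = len(eigenvalues)
    if not 0 < n < c:
        raise ValueError("need 0 < n < c")
    secondary = sum(eigenvalues[n:])            # eigenvalues attributed to error
    re = math.sqrt(secondary / (r * (c - n)))   # real error (residual standard deviation)
    ie = re * math.sqrt(n / c)                  # imbedded error
    xe = re * math.sqrt((c - n) / c)            # extracted error
    ind = re / (c - n) ** 2                     # indicator function (assumed extra criterion)
    return re, ie, xe, ind

def factors_by_ind(eigenvalues, r):
    """Number of factors n (1 <= n < c) that minimises IND."""
    c = len(eigenvalues)
    return min(range(1, c), key=lambda n: malinowski_errors(eigenvalues, r, n)[3])
```

For example, with the hypothetical eigenvalues `[10.0, 5.0, 2.0, 0.01, 0.01, 0.01]` and `r = 10`, IND is smallest at `n = 3`.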

KPLS method
The concentrations of the three metal ions in the 15 standard solutions are shown in table 3. Spectra between 490 and 600 nm at 2 nm intervals were extracted from the original D matrix as a training set, so the D matrix has many wavelengths and relatively few solutions. The added concentrations of a set of eight synthetic 'unknown' samples are shown in table 4. The spectra of the 'unknown' samples were measured in the same way as those of the training set. The concentrations of Mn(II), Zn(II) and Co(II) found with the SPGRKPLS program are given in table 5. Average recoveries of Mn(II), Zn(II) and Co(II) and their relative deviations are listed in table 6. All measured values are means of three replicates.
The $B_{PLS}$, $W$ and $P$ matrices are large because the D matrix is large. In the present work, the weights and loadings are excluded from the iterative part of the program; they are calculated only once, to obtain the regression coefficients. $B_{PLS}$ can be calculated using the kernel matrix $DD^TCC^T$ and the association matrices $DD^T$ and $CC^T$. Both the kernel matrix and the association matrices are of size $N \times N$, where $N$ is the number of solutions, so as long as $N$ is small, no large matrices or vectors are used in the calculations. After the first dimension has been determined, both $DD^T$ and $CC^T$ are updated by subtracting a rank-one matrix. Both can be updated by left and right multiplication with the same matrix $G_a = I - t_a t_a^T$, which avoids having to return to the original large matrices D and C.
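The kernel procedure can be sketched in pure Python as shown below. This is a minimal illustration under stated assumptions, not the SPGRKPLS program. The assumptions are:

- D and C are mean-centred.
- Each $t_a$ is normalized to unit length, so that $G_a = I - t_a t_a^T$ is a projector.
- The dominant eigenvector of $(DD^TCC^T)_a$ is found by power iteration. This eigensolver is our choice.
- Instead of forming $W$ and $P$, the regression coefficients come from the kernel-form expression $B_{PLS} = D^T U (T^T DD^T U)^{-1} T^T C$, assumed here to be equivalent to $W(P^TW)^{-1}Q^T$.

All function names are hypothetical. Inside the factor loop only $N \times N$ matrices appear. D itself is used only to form $DD^T$ and, once, in the final product.

```python
import math

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def unit(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

def dominant_eigvec(K, iters=1000, tol=1e-14):
    """Power iteration; K = (DD^T)_a (CC^T)_a has real, non-negative eigenvalues."""
    t = unit(max(transpose(K), key=lambda col: sum(v * v for v in col)))  # start: largest column
    for _ in range(iters):
        t_new = unit(matvec(K, t))
        if max(abs(a - b) for a, b in zip(t_new, t)) < tol:
            return t_new
        t = t_new
    return t

def solve(A, B):
    """Solve A X = B for a small square A by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def kernel_pls(D, C, n_factors):
    """Return B_PLS (K x M) for centred D (N x K) and C (N x M)."""
    N = len(D)
    DDt = matmul(D, transpose(D))          # association matrix D D^T (N x N)
    CCt = matmul(C, transpose(C))          # association matrix C C^T (N x N)
    A, Y = DDt, CCt                        # (DD^T)_a and (CC^T)_a, deflated in the loop
    T, U = [], []                          # score vectors t_a and u_a
    for _ in range(n_factors):
        t = dominant_eigvec(matmul(A, Y))  # t_a: eigenvector of (DD^T CC^T)_a, unit length
        u = matvec(Y, t)                   # u_a = (CC^T)_a t_a
        T.append(t)
        U.append(u)
        G = [[(1.0 if i == j else 0.0) - t[i] * t[j] for j in range(N)] for i in range(N)]
        A = matmul(matmul(G, A), G)        # (DD^T)_{a+1} = G_a (DD^T)_a G_a
        Y = matmul(matmul(G, Y), G)        # (CC^T)_{a+1} = G_a (CC^T)_a G_a
    Um = transpose(U)                      # N x n_factors; T is kept as rows, i.e. T^T
    TtKU = matmul(matmul(T, DDt), Um)      # T^T (DD^T) U, only n_factors x n_factors
    inner = solve(TtKU, matmul(T, C))      # (T^T DD^T U)^{-1} T^T C
    return matmul(matmul(transpose(D), Um), inner)  # B_PLS = D^T U (T^T DD^T U)^{-1} T^T C
```

Consider noise-free, strictly additive spectra with as many factors as independent absorbing species. In that case $DB_{PLS}$ reproduces the centred concentrations exactly. For a new spectrum $d$, centred with the training-set means, the predicted concentrations are $d^T B_{PLS}$ plus the training-set mean concentrations.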
The RSEP is computed from $C_{ij}$ and $\hat{C}_{ij}$, the actual and estimated concentrations of the ith component in the jth mixture, where m is the number of mixtures and n is the number of components. The SEP and RSEP values of the two methods for the three-component system are given in table 8. There was no significant difference in the precision of the predictions made with PLS and KPLS. Both methods predicted cobalt more precisely than the other two metals. No significant difference was observed in the precision of prediction between the PLS and KPLS routines in any of

Simultaneous determination of Mn(II), Zn(II) and Co(II) with 5-Br-PADAP and CPB by two full-spectrum methods, PLS and KPLS, has been shown to be successful. Both methods overcame the difficulty imposed by the overlap of the absorption spectra. The KPLS method is not restricted by the number of wavelengths, and when the number of wavelengths is large it is faster than the PLS method. Properly designed computer programs based on chemometric algorithms can provide effective tools for simultaneous determination.
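The RSEP equation in the results section did not survive in the text. The sketch below assumes a definition common in multicomponent spectrophotometric work, not one taken from this paper: $RSEP = [\sum_j \sum_i (\hat{C}_{ij} - C_{ij})^2 / \sum_j \sum_i C_{ij}^2]^{1/2}$, with the sums running over the m mixtures and n components. Restricting the sums to one component gives a per-component value. Some authors multiply the result by 100 to express it in percent. The function name is hypothetical.

```python
import math

def rsep(actual, predicted, component=None):
    """Relative standard error of prediction (assumed definition).

    actual, predicted: m mixtures x n components (same shape).
    component: column index for a single component, or None for all components."""
    cols = range(len(actual[0])) if component is None else [component]
    sq_err = sum((p[i] - a[i]) ** 2 for a, p in zip(actual, predicted) for i in cols)
    sq_act = sum(a[i] ** 2 for a in actual for i in cols)
    return math.sqrt(sq_err / sq_act)
```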