The Real Time Analyzer (RTA), utilizing DC and AC voltammetric techniques, is an in situ, online monitoring system that provides a complete chemical analysis of different electrochemical deposition solutions. The RTA employs multivariate calibration when predicting concentration parameters from a multivariate data set. Although the hierarchical and multiblock Principal Component Regression (PCR) and Partial Least Squares (PLS) based methods can handle data sets even when the number of variables significantly exceeds the number of samples, it can be advantageous to reduce the number of variables to improve the model predictions and their interpretation. This presentation focuses on the introduction of a multistep, rigorous method of data selection based on Least Squares Regression, Simple Modeling of Class Analogy modeling power, and, as a novel application in electroanalysis, Uninformative Variable Elimination by PLS and by PCR, Variable Importance in the Projection coupled with PLS, Interval PLS, Interval PCR, and Moving Window PLS. Selection criteria of the optimum decomposition technique for the specific data are also demonstrated. The chief goal of this paper is to introduce to the community of electroanalytical chemists numerous variable selection methods which are well established in spectroscopy and can be successfully applied to voltammetric data analysis.
Electrochemical deposition of copper from an acidic bath is a commonly used method of on-chip production of interconnects for microelectronics [
Accurate and prompt concentration monitoring and control of multicomponent electroplating baths are indispensable in satisfying process specifications for the manufacturing of electronic components while minimizing production costs. The Real Time Analyzer, utilizing numerous voltammetric techniques, is an in situ, online monitoring system that provides a complete chemical analysis of electrometallization solutions. The fully computerized instrumentation requires no specially trained chemical operators and practically eliminates the need for a chemical analytical laboratory.
Typically used organic additives include (i) suppressors (like polyethers) that inhibit the rate of copper deposition at the tops of the trenches and vias by increasing the copper ion reduction overpotential by surface absorption interaction with chloride ion [
The organic additives undergo significant changes of concentration during bath usage [
Ni and Kokot [
Wikiel et al. [
Specific waveforms are developed to produce voltammograms having regions which show linear dependence of the current on the concentration of the analyte of interest, while being practically immune to varying concentrations of all other bath constituents. Sometimes, the voltammograms recorded for a single waveform contain several portions (ranges of points of the voltammogram) meeting this objective. For such a waveform, the data can be divided into meaningful blocks in order to improve interpretability. The ability to build a multivariate analytical model utilizing information contained in each of the ranges (blocks) of the voltammogram leads to increased diversity within the data as compared to single-range-based data sets. This diversity results in greater robustness of the calibration model calculated from the data.
Hierarchical Principal Component Regression (HPCR) [
The chief goal of this presentation is to show that the variable selection methods whose applications are well established in spectroscopy can also be transferred to electroanalytical data. Specifically, this objective is achieved by introducing a rigorous, multistep procedure for selecting the blocks of the voltammogram to be subsequently used for analytical model development, and by choosing the proper data decomposition technique to compress the multivariate voltammetric data and reasonably extract information.
Wikiel et al. [
Modern chemometrics is a mature scientific discipline presenting the researcher with a vast number of powerful data decomposition techniques. Although some techniques appear more advanced than the others because of their mathematical complexity, the most suitable methods should be properly chosen depending on the kind of data to be analyzed in order to develop a sound and robust analytical model.
Voltammetric experiments were performed utilizing the Real Time Analyzer (Technic, Inc., Cranston, USA), a fully computer-controlled electroanalytical system. Measurements were conducted inside a compact flow-through electrochemical cell (electrode compartment of the MultiTask Electrochemical Probe (MTEP)) submersed in the temperature-controlled (25 ± 0.2°C) plating solution. The volume of the inner cell compartment was about 20 mL. The solution to be analyzed was circulated inside the probe using a software-controlled diaphragm pump (KNF Neuberger, Balterswil, Switzerland).
The electrochemical cell was a classical three-electrode system with a working electrode made of platinum (Johnson Matthey) wire (1 mm diameter; 10 mm length), an auxiliary platinum (Johnson Matthey) foil electrode forming a cylinder around the working electrode, and an in situ generated reference electrode made of copper metal deposited onto a platinum (Johnson Matthey) wire immediately before the measurement.
All inorganic chemicals were of Analytical Grade (J.T. Baker, Phillipsburg, NJ). A proprietary organic additive system (Enthone, West Haven, CT) was used in this study.
All calculations were performed in the MATLAB R2012b (The MathWorks, Inc., Natick, MA) environment. The procedure for PLS and scaling routines were taken from the PLS Toolbox 6.5.4. (Eigenvector Research, Inc., Manson, WA,
The subject of this investigation was the ViaForm® copper plating bath, which consisted of six components: copper (II) ion (from copper sulfate), sulfuric acid, chloride ion, suppressor, accelerator, and leveler, present at the target concentrations of 0.785 M, 0.820 M, 1.50 mM, 7.00 mL L^{−1}, 7.00 mL L^{−1}, and 0.76 mL L^{−1}, respectively.
The specific objective was to create a calibration model for suppressor in the presence of varying concentrations of accelerator and leveler. The concentrations of suppressor, accelerator, and leveler were varied linearly on five levels within the ranges of 5.00 to 9.00 mL L^{−1}, 5.00 to 9.00 mL L^{−1}, and 0.38 to 1.13 mL L^{−1}, respectively. The training set was composed as a 5-level, 3-component linear orthogonal array exploring 25 uniformly distributed combinations of suppressor, accelerator, and leveler concentrations. Additionally, the training set was augmented by the voltammetric data recorded for three standard solutions containing suppressor, accelerator, and leveler concentrations at the low, target, and high limits. The inorganic bath components, copper (II) ion, sulfuric acid, and chloride ion, were held constant at their target levels. The voltammetric data were recorded for each of the 28 solutions in triplicate, resulting in 84 samples.
Composition of the training set solutions (rows in calibration solution order; concentrations in mL L^{−1}).
Suppressor  Accelerator  Leveler
5.00  5.00  0.38
5.00  6.00  0.57
5.00  7.00  0.76
5.00  8.00  0.94
5.00  9.00  1.13
6.00  5.00  0.57
6.00  6.00  0.76
6.00  7.00  0.94
6.00  8.00  1.13
6.00  9.00  0.38
7.00  5.00  0.76
7.00  6.00  0.94
7.00  7.00  1.13
7.00  8.00  0.38
7.00  9.00  0.57
8.00  5.00  0.94
8.00  6.00  1.13
8.00  7.00  0.38
8.00  8.00  0.57
8.00  9.00  0.76
9.00  5.00  1.13
9.00  6.00  0.38
9.00  7.00  0.57
9.00  8.00  0.76
9.00  9.00  0.94
5.00  5.00  0.38
7.00  7.00  0.76
9.00  9.00  1.13
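The layout of the 25 orthogonal-array rows of the training set can be reproduced with a short sketch. It assumes that the leveler level index equals (suppressor index + accelerator index) mod 5; this rule is inferred from the pattern of the tabulated values and is not stated explicitly in the text. The function name `latin_square_design` is hypothetical.

```python
from itertools import product

def latin_square_design(levels_a, levels_b, levels_c):
    """Build an n-level, 3-factor array in which the level index of the
    third factor is (i + j) mod n (an assumed construction rule)."""
    n = len(levels_a)
    if not (len(levels_b) == len(levels_c) == n):
        raise ValueError("all factors need the same number of levels")
    return [
        (levels_a[i], levels_b[j], levels_c[(i + j) % n])
        for i, j in product(range(n), repeat=2)
    ]

# Concentration levels in mL/L, taken from the training set table
suppressor = [5.00, 6.00, 7.00, 8.00, 9.00]
accelerator = [5.00, 6.00, 7.00, 8.00, 9.00]
leveler = [0.38, 0.57, 0.76, 0.94, 1.13]

training = latin_square_design(suppressor, accelerator, leveler)
# Three additional standards at the low, target, and high limits
training += [(5.00, 5.00, 0.38), (7.00, 7.00, 0.76), (9.00, 9.00, 1.13)]

for row in training:
    print("%.2f  %.2f  %.2f" % row)
```

In such an array every pair of factors covers all 25 level combinations exactly once, which is what keeps the three concentrations mutually uncorrelated across the design.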
In order to assess its predictive abilities, the calibration model was externally validated on the validation set. The external validation set consisted of 27 data sets obtained for 9 solutions containing suppressor, accelerator, and leveler varied linearly on three levels within the ranges of 5.50 to 8.50 mL L^{−1}, 5.50 to 8.50 mL L^{−1}, and 0.47 to 1.04 mL L^{−1}, respectively. The validation set was a 3-level, 3-component linear orthogonal array exploring 9 uniformly distributed combinations of suppressor, accelerator, and leveler concentrations. The composition of the validation set is presented in Table
Composition of the validation set solutions (rows in validation solution order; concentrations in mL L^{−1}).
Suppressor  Accelerator  Leveler
5.50  5.50  0.47
5.50  7.00  0.75
5.50  8.50  1.04
7.00  5.50  0.75
7.00  7.00  1.04
7.00  8.50  0.47
8.50  5.50  1.04
8.50  7.00  0.47
8.50  8.50  0.75
Waveform design is a preliminary step of the plating bath analysis utilizing voltammetry [
In some cases, it is possible to obtain a waveform producing voltammograms whose several portions can be utilized for calibration calculation. Figure
Multicyclic staircase voltammograms (
The voltammetric data throughout this paper are predominantly considered numerical input for chemometric treatment aiming to select specific sections of the voltammograms which are most informative for subsequent calibration calculation. Therefore, the abscissa of the voltammograms in this paper is the index point of the voltammogram rather than the applied potential as is usually used. The values of the index points of the voltammogram are transformed values of the applied potential. The index points of voltammograms are a linear function of the voltammetric current sampling time. For the voltammogram of Figure
The recorded current corresponds to the effect of suppressor on copper ion reduction (index points 3340–3400) and on copper metal oxidation (index points 3420–3460). Examination of the voltammograms shown in Figure
Wikiel et al. [
The initial stage of determining the most promising ranges of the voltammogram to be taken for the calibration calculation utilizes two independent procedures applied to each index point of the voltammograms within the training set data:
correlation calculation based on the univariate LSR,
SIMCA based on calculation of modeling power [
The LSR-based method provides information about which range(s) of the voltammogram show the greatest correlation with the concentration of the component to be calibrated. It also determines the range where the current responses depend only on changes in concentration of the component of interest. In this method, the regression equation for each index point of the training set voltammograms of Figure
The SIMCA-based method gives information about the noise-to-signal ratio for each point within the chosen range. This method [
The modeling power, as implemented in the initial stage of the method for selection of optimum ranges, provides information about the noise-to-signal ratio for each point of the voltammogram. As
Figure
Preliminary estimation of the ranges of points of staircase voltammogram, variables
Signal  Range of voltammogram, variables  Width  …  …  Average …  Average …
Reduction  1401–1430  30  −268  −338  0.9414  0.9804
Oxidation  1476–1496  21  122  322  0.9204  0.9309
Reduction  1785–1826  42  −197  −287  0.9592  0.9663
Oxidation  1865–1888  24  102  333  0.9078  0.9085
Reduction  2173–2219  47  −168  −268  0.9379  0.9542
Oxidation  2256–2279  24  102  333  0.8871  0.9005
Reduction  2562–2612  51  −208  −248  0.9110  0.9455
Oxidation  2645–2670  26  82  333  0.8599  0.8895
Reduction  2952–3005  54  −137  −228  0.8867  0.9375
Oxidation  3038–3062  25  102  343  0.8518  0.8858
Reduction  3344–3399  56  −148  −198  0.8703  0.9351
Oxidation  3428–3454  27  92  353  0.8447  0.8755
Reduction  3734–3793  60  −134  −168  0.8564  0.9280
Oxidation  3819–3846  28  92  363  0.8436  0.8712
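The two screening quantities used in this initial stage, the squared correlation of each index point with concentration and the modeling power of each index point, can be illustrated with a minimal sketch. It assumes a commonly used form of the SIMCA modeling power, one minus the ratio of the residual standard deviation of a variable (after PCA with A factors) to its original standard deviation, with n − A − 1 and n − 1 degrees of freedom; the exact expression used in this study may differ. The synthetic data and the function names are hypothetical.

```python
import numpy as np

def r2_per_variable(X, y):
    """Squared correlation coefficient between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = (Xc.T @ yc) ** 2
    den = (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    return num / den

def modeling_power(X, n_factors=1):
    """Assumed form: 1 - s_residual / s_original for each column,
    with residuals taken after a PCA model with n_factors factors."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_hat = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    E = Xc - X_hat
    s_res = np.sqrt((E ** 2).sum(axis=0) / (n - n_factors - 1))
    s_orig = np.sqrt((Xc ** 2).sum(axis=0) / (n - 1))
    return 1.0 - s_res / s_orig

# Synthetic stand-in for voltammetric data: columns 0-9 respond linearly
# to concentration, columns 10-19 contain only noise.
rng = np.random.default_rng(0)
y = np.tile(np.linspace(5.0, 9.0, 5), 6)           # 30 samples
X = rng.normal(0.0, 1.0, size=(30, 20))
X[:, :10] = np.outer(y, np.linspace(1.0, 2.0, 10)) \
    + rng.normal(0.0, 0.05, size=(30, 10))

r2 = r2_per_variable(X, y)
mp = modeling_power(X, n_factors=1)
print(np.round(r2, 3))
print(np.round(mp, 3))
```

Points with both a high squared correlation and a high modeling power are the candidates for the preliminary ranges; averaging the two quantities over a candidate range gives summary values of the kind tabulated above.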
Squared coefficient of correlation for suppressor concentration (
The subsequent step (second step of the SIMCA modeling power) is to determine the optimum ranges of voltammograms based solely on the modeling power technique. The voltammograms of the training set for each of the
SIMCA modeling power
One can notice in Table
Individual (range by range) selection of the ranges of points of staircase voltammogram, variables
Signal  Range of voltammogram, variables  Width  …  …
Reduction  1401–1430  30  −268  −338
Oxidation  1478–1496  19  142  322
Reduction  1785–1826  42  −197  −287
Oxidation  1871–1888  18  162  333
Reduction  2173–2219  47  −168  −268
Oxidation  2263–2279  17  172  262
Reduction  2563–2612  50  −158  −248
Oxidation  2655–2670  16  182  333
Reduction  2955–3002  48  −168  −228
Oxidation  3048–3062  15  202  343
Reduction  3345–3395  51  −158  −238
Oxidation  3441–3454  14  222  353
Reduction  3735–3786  52  −148  −238
Oxidation  3834–3846  13  242  363
As the values of modeling power, calculated via PCA decomposition with a single factor, presented in Figure
The uninformative variables (index points of voltammogram) increase the bias and imprecision of the latent variables [
Because the latent variables are linear combinations of the original ones, the PLS and PCR models can be expressed as
The UVEPLS and UVEPCR (number of latent variables for decomposition,
Regression reliability,
For each variable
Table
Extreme values of parameters calculated for individually selected ranges of points of staircase voltammogram, variables
Range of voltammogram  UVEPLS  UVEPCR Min Abs(…)  Min PLSVIP  IPLS  IPCR  MWPLS
1401–1430  615  606  1.47  0.350  0.350  0.313
1478–1496  331  334  1.47  0.356  0.357  0.334
1785–1826  424  423  1.47  0.245  0.245  0.268
1871–1888  281  281  1.47  0.274  0.275  0.301
2173–2219  335  334  1.47  0.224  0.225  0.237
2263–2279  250  251  1.47  0.276  0.278  0.276
2563–2612  285  285  1.48  0.286  0.286  0.213
2655–2670  231  232  1.48  0.327  0.328  0.252
2955–3002  237  241  1.48  0.131  0.131  0.174
3048–3062  225  227  1.47  0.221  0.223  0.232
3345–3395  230  230  1.49  0.162  0.162  0.162
3441–3454  224  226  1.47  0.227  0.229  0.214
3735–3786  222  222  1.49  0.237  0.237  0.142
3834–3846  225  226  1.47  0.317  0.317  0.198
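The reliability criterion behind the UVE columns of this table can be illustrated with a minimal sketch. It assumes the commonly described UVE scheme: artificial noise variables of negligible amplitude are appended to the data, regression coefficients are collected from leave-one-out models, and the reliability of each variable is the mean of its coefficients divided by their standard deviation; experimental variables whose absolute reliability does not exceed the largest value found among the noise variables are discarded. The use of PCR with a single factor, the noise scale of 1e-10, the synthetic data, and all function names are assumptions made for illustration, not the settings of this study.

```python
import numpy as np

def pcr_coefficients(X, y, n_factors):
    """Regression vector of a PCR model built on mean-centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = n_factors
    return Vt[:A].T @ ((U[:, :A].T @ yc) / s[:A])

def uve_pcr(X, y, n_factors=1, seed=0):
    """Return reliabilities of the real variables, the noise-based
    cutoff, and the indices of the retained variables."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    noise = 1e-10 * rng.standard_normal((n, p))   # artificial variables
    XN = np.hstack([X, noise])
    # One coefficient vector per leave-one-out model
    B = np.array([
        pcr_coefficients(np.delete(XN, i, axis=0), np.delete(y, i), n_factors)
        for i in range(n)
    ])
    c = B.mean(axis=0) / B.std(axis=0, ddof=1)
    cutoff = np.abs(c[p:]).max()
    keep = np.flatnonzero(np.abs(c[:p]) > cutoff)
    return c[:p], cutoff, keep

# Synthetic data: columns 0-9 informative, columns 10-19 pure noise
rng = np.random.default_rng(1)
y = np.tile(np.linspace(5.0, 9.0, 5), 6)
X = rng.normal(0.0, 1.0, size=(30, 20))
X[:, :10] = np.outer(y, np.linspace(1.0, 2.0, 10)) \
    + rng.normal(0.0, 0.05, size=(30, 10))

c, cutoff, keep = uve_pcr(X, y, n_factors=1)
print(np.round(np.abs(c), 1), round(float(cutoff), 1), keep)
```

Reporting the minimum absolute reliability within a selected range, as in the table, shows at a glance whether every point of the range clears the cutoff.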
PLSVIP is a combined measure of how much a variable contributes to describe the two sets of data: the dependent
The weights in a PLS model reflect the covariance between the independent and dependent variables and the inclusion of the weights is what allows VIP to reflect not only how well the dependent variable is described but also how important that information is for the model of independent variables. Since the average of squared VIP scores equals unity, values smaller than one indicate nonimportant variables.
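The property that the squared VIP scores average to unity can be checked numerically. The sketch below assumes the widely used VIP expression for a single dependent variable, in which the squared normalized weights of each latent variable are weighted by the variance of the dependent variable explained by that latent variable. The NIPALS PLS1 routine, the choice of two latent variables, the synthetic data, and the function names are illustrative assumptions rather than the PLS Toolbox implementation used in this work.

```python
import numpy as np

def pls1(X, y, n_factors):
    """NIPALS PLS1; returns unit-norm weights W, scores T, and y-loadings Q."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    W, T, Q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q * t               # deflate y
        W.append(w)
        T.append(t)
        Q.append(q)
    return np.column_stack(W), np.column_stack(T), np.array(Q)

def vip_scores(W, T, Q):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))."""
    p = W.shape[0]
    ss = Q ** 2 * (T ** 2).sum(axis=0)   # y-variance explained per factor
    return np.sqrt(p * ((W ** 2) @ ss) / ss.sum())

# Synthetic data: columns 0-9 informative, columns 10-19 pure noise
rng = np.random.default_rng(2)
y = np.tile(np.linspace(5.0, 9.0, 5), 6)
X = rng.normal(0.0, 1.0, size=(30, 20))
X[:, :10] = np.outer(y, np.linspace(1.0, 2.0, 10)) \
    + rng.normal(0.0, 0.05, size=(30, 10))

W, T, Q = pls1(X, y, n_factors=2)
vip = vip_scores(W, T, Q)
print(np.round(vip, 2), float(np.mean(vip ** 2)))
```

Because the weight vectors have unit norm, the mean of the squared scores equals one by construction, which is why unity serves as the natural threshold between important and unimportant variables.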
The PLSVIP (number of latent variables for decomposition,
Variable Importance in the Projection by PLS (PLSVIP) (see (
In IPLS [
RMSECV calculated with IPLS for 125 non-overlapping intervals of width of 31 variables and 1 interval of 36 variables of mean-centered voltammograms for
RMSECV calculated with IPCR for 125 non-overlapping intervals of width of 31 variables and 1 interval of 36 variables of mean-centered voltammograms for
It can be seen in Table
In order to address the issues outlined in the paragraph above, MWPLS was implemented. This technique allows a sufficient representation of “proper” variables neighboring, on both sides, the variable for which the RMSECV is calculated. The results obtained with MWPLS are presented in Figure
RMSECV calculated with MWPLS for a moving one-step-at-a-time interval of 31 points of mean-centered voltammograms for
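A moving-window calculation of this kind can be sketched as follows. The example slides a 31-point window one index point at a time and, for each position, computes a leave-one-out RMSECV from a PLS1 model built on mean-centered data. The single latent variable, the leave-one-out scheme, the synthetic data, and the function names are assumptions chosen to keep the sketch short; they are not the settings of this study.

```python
import numpy as np

def rmsecv_pls1_single_lv(X, y):
    """Leave-one-out RMSECV of a one-latent-variable PLS1 model."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xt, yt = X[mask], y[mask]
        x_mean, y_mean = Xt.mean(axis=0), yt.mean()
        Xc, yc = Xt - x_mean, yt - y_mean      # mean centering
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        q = (yc @ t) / (t @ t)
        y_hat = y_mean + q * ((X[i] - x_mean) @ w)
        errors[i] = y[i] - y_hat
    return np.sqrt(np.mean(errors ** 2))

def moving_window_rmsecv(X, y, width=31):
    """RMSECV assigned to the central point of each window position;
    points too close to the edges are left as NaN."""
    half = width // 2
    p = X.shape[1]
    out = np.full(p, np.nan)
    for center in range(half, p - half):
        window = X[:, center - half:center + half + 1]
        out[center] = rmsecv_pls1_single_lv(window, y)
    return out

# Synthetic data: only columns 40-79 respond to concentration
rng = np.random.default_rng(3)
y = np.tile(np.linspace(5.0, 9.0, 5), 6)
X = rng.normal(0.0, 1.0, size=(30, 120))
X[:, 40:80] = 1.5 * y[:, None] + rng.normal(0.0, 0.05, size=(30, 40))

rm = moving_window_rmsecv(X, y, width=31)
best = int(np.nanargmin(rm))
print(best, float(rm[best]))
```

Unlike fixed, non-overlapping intervals, the window is centered on every index point in turn, so an informative region is not penalized for straddling an interval boundary.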
There must be an agreement about the outcome of the selection of variables for all methods presented (LSR, twostep SIMCA modeling power, UVEPLS (UVEPCR), PLSVIP, IPLS, IPCR, and MWPLS) in order to accept these variables for further model development. As Andersen and Bro [
Multivariate data decomposition techniques are employed in chemometrics in order to compress vast amounts of data while extracting the significant information. The techniques commonly used can be divided into three major groups: (i) two-way techniques: PCA and PLS; (ii) hierarchical and multiblock techniques: HPCA [
The structure of the voltammetric data of the training set to be decomposed is described in Table
Qualitatively, the data of Table
One may consider application of MPCA for decomposition of the multiblock training data of Table
The hierarchical and multiblock data decomposition techniques [
Westerhuis et al. [
Calibration calculations were conducted using comparatively CPCR, HPCR, HPLS, and MBPLS methods for the training set data (
Decomposition and regression technique  # of factors  …  PRESS  MNSRTEP  MNSREP
CPCR  1  0.9948  0.5475  1.6381  1.7168
CPCR  …  …  …  …  …
CPCR  3  0.9971  0.1859  1.0152  1.0019
HPCR  1  0.9948  0.5470  1.6375  1.7161
HPCR  …  …  …  …  …
HPCR  3  0.9972  0.1892  1.0393  1.0302
HPLS  1  0.9954  0.5056  1.6003  1.6746
HPLS  …  …  …  …  …
HPLS  3  0.9967  0.3397  1.4312  1.5598
MBPLS  1  0.9949  0.5423  1.6323  1.7103
MBPLS  …  …  …  …  …
MBPLS  3  0.9890  0.6846  1.8698  2.0860
Additionally, the predictive performance of the CPCR, HPCR, HPLS, and MBPLS analytical models was comparatively assessed by determining the values of the following two derived parameters: mean no-sign relative-to-target error of prediction and mean no-sign relative error of prediction (Table
The mean no-sign relative error of prediction (MNSREP) is described by the following equation:
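As a numerical illustration of these error measures, the sketch below computes PRESS together with the two mean no-sign errors for a small made-up set of predictions. It assumes that "no-sign" denotes the absolute value, that MNSRTEP divides each absolute error by the target concentration (7.00 mL L^{−1} for suppressor), that MNSREP divides it by the actual concentration of the sample, and that both are expressed in percent; these conventions are inferred from the names of the parameters and should be treated as assumptions. The function names and example values are hypothetical.

```python
def press(y_true, y_pred):
    """Predictive residual sum of squares."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred))

def mnsrtep(y_true, y_pred, target):
    """Mean absolute error relative to the target concentration, in % (assumed form)."""
    n = len(y_true)
    return 100.0 * sum(abs(p - t) for t, p in zip(y_true, y_pred)) / (n * target)

def mnsrep(y_true, y_pred):
    """Mean absolute error relative to each actual concentration, in % (assumed form)."""
    n = len(y_true)
    return 100.0 * sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / n

# Made-up example values in mL/L, not data from this study
actual = [5.50, 7.00, 8.50]
predicted = [5.60, 6.90, 8.50]

print(press(actual, predicted))
print(mnsrtep(actual, predicted, target=7.00))
print(mnsrep(actual, predicted))
```

Under these assumptions the two relative errors differ only in the denominator, so they coincide when every sample is at the target concentration and diverge as the samples move away from it.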
While examining the results in Table
The performance of all four techniques is equivalent, with the exception of the HPLS and MBPLS results for
Decomposition and regression technique  # of factors  …  PRESS
CPCR  3  0.9929  1.2435
HPCR  3  0.9929  1.2375
HPLS  3  0.9938  1.0798
MBPLS  3  0.9964  0.6294
Figure
Actual (black) and predicted with CPCR (red), MBPLS (yellow), HPCR (blue), and HPLS (green) concentrations of suppressor for external validation set for
This paper has introduced to the field of electroanalysis several variable selection techniques (UVEPLS, UVEPCR, PLSVIP, IPLS, IPCR, and MWPLS) which are well established for the analysis of spectroscopic data. Specifically, a rigorous, multistep procedure was introduced for selecting the blocks of the voltammogram to be used subsequently for analytical model development, based on LSR, two-step SIMCA modeling power, UVEPLS, UVEPCR, PLSVIP, MWPLS, IPLS, and IPCR applied to individual blocks. To ensure feasibility of the model, several variable selection methods were utilized comparatively to verify that their results do not contradict each other. Detailed criteria were presented for choosing the decomposition technique appropriate for the data in order to compress the multivariate data and reasonably extract information. The optimization of the number of factors based on external validation and cross-validation was also presented. As a general recommendation, a few variable selection methods should be applied concurrently to the investigated multivariate data set, as their mutually consistent performance provides evidence for the absence of artifacts in the data. The entire data selection methodology can be automated and can ultimately lead to a fully automatic data selection process that would significantly reduce the time and effort of method development.
Electroanalytical chemists would benefit substantially from using variable selection methods, which allow them to focus only on the relevant portions of voltammetric responses while eliminating the uninformative ones. Otherwise, the incorporation of irrelevant variables (both random and systematic) into a multivariate model leads to lower precision (higher variance due to embedded error). By analogy to other disciplines of analytical chemistry, variable selection utilizing existing chemometric techniques should become a routine step of multivariate data pretreatment in electroanalytical chemistry. This paper demonstrates that existing variable selection methods can be transferred to electroanalytical data.
ASV: Anodic Stripping Voltammetry
CPCR: Consensus Principal Component Regression
CV: Cyclic Voltammetry
GRAM: Generalized Rank Annihilation Method
HPCR: Hierarchical Principal Component Regression
HPLS: Hierarchical Partial Least Squares
IPCR: Interval Principal Component Regression
IPLS: Interval Partial Least Squares
LSR: Least Squares Regression
MBPLS: Multiblock Partial Least Squares
MNSREP: Mean No-Sign Relative Error of Prediction
MNSRTEP: Mean No-Sign Relative-to-Target Error of Prediction
MPCA: Multiway Principal Component Analysis
MTEP: MultiTask Electrochemical Probe
MWPLS: Moving Window Partial Least Squares
NPLS: Multilinear Partial Least Squares
PARAFAC: Parallel Factor Analysis
PCA: Principal Component Analysis
PCR: Principal Component Regression
PLS: Partial Least Squares
PLSVIP: Variable Importance in the Projection by Partial Least Squares
PRESS: Predictive Residual Sum of Squares
RMSECV: Root Mean Square Error of Cross-Validation
SIMCA: Simple Modeling of Class Analogy
UVEPCR: Uninformative Variable Elimination by Principal Component Regression
UVEPLS: Uninformative Variable Elimination by Partial Least Squares.
Dr. Aleksander Jaworski, Dr. Hanna Wikiel, and Dr. Kazimierz Wikiel declare that there are no conflicts of interest regarding the publication of the paper.
The authors would like to thank Dr. Allan Reed for his comments on the manuscript.