Effects of experimental design on calibration curve precision in routine analysis

A computational program that compares the efficiencies of different experimental designs with those of maximum-precision (D-optimized) designs is described. The program produces confidence interval plots for a calibration curve and provides information about the number of standard solutions, the concentration levels and the suitable concentration ranges needed to achieve an optimum calibration. Some examples of the application of this novel computational program are given, using both simulated and real data.


Introduction
Calibration is a very important step in any analytical procedure. The choice and arrangement of standard solutions, i.e. experimental design, may affect the precision with which a calibration curve can be estimated.
Since a calibration curve is the basis for predicting concentrations of unknown samples, the purpose of good experimental design is to obtain the best possible predictive power [1].
Although the theoretical importance of experimental design is widely recognized [2], its applicability to routine analytical laboratory procedures has attracted only limited attention [3,4]. For the construction of a calibration curve, the usual practice is still to divide the experimental region uniformly. This procedure, however, is indicated only when the linear concentration range still needs to be established. For routine analyses, where this range is known in advance, it is not recommended. This paper describes a computational program that allows selection of the best experimental design for a given situation, based on efficiency values and plots of confidence intervals. Some examples of its application in typical routine analysis are also given, using simulated and real data.

Theory
The underlying theoretical principles used in this paper follow; a more detailed discussion can be found elsewhere [5][6][7]. Assuming the linear model y = b0 + b1x (where y is the dependent variable, x the independent variable, and b0 and b1 the estimates of the model parameters), the precision of the calibration curve can be evaluated from the width of the confidence interval for the analytical response y0 corresponding to a given concentration value x0, which is given by:

(y0)e = ŷ0 ± t s [1 + 1/n + (x0 − xm)² / Σ(xi − xm)²]^(1/2)   (1)

where (y0)e is the true value of the analytical response at concentration x0; ŷ0 is the response predicted by the model for the concentration x0; t is a point in Student's t-distribution with n − 2 degrees of freedom; s is the estimate of the standard deviation of y; n is the number of measured responses; and xm is the average of the x values.
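Equation (1) can be checked numerically. The sketch below is illustrative only (it is not part of the DESIGN program); the factor t·s is set to 1 so that only the design-dependent part of the interval width is examined:

```python
import math

def halfwidth(design, x0, t=1.0, s=1.0):
    """Half-width of the confidence interval in eq. (1):
    t * s * sqrt(1 + 1/n + (x0 - xm)^2 / sum((xi - xm)^2))."""
    n = len(design)
    xm = sum(design) / n
    sxx = sum((xi - xm) ** 2 for xi in design)
    return t * s * math.sqrt(1.0 + 1.0 / n + (x0 - xm) ** 2 / sxx)

# The interval is narrowest at the design mean and widens towards the extremes:
uniform = [1, 2, 3, 4, 5, 6]
print(halfwidth(uniform, 3.5))  # at the mean of the x values
print(halfwidth(uniform, 6.0))  # at the upper limit of the range
```

The (x0 − xm)² term vanishes at the design mean, which is why predictions are most precise there.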
In matrix notation, this equation can be written as:

(y0)e = ŷ0 ± t s [1 + x0'(X'X)^(-1) x0]^(1/2)   (2)

where X is the design matrix and x0 = [1 x0]'. The precision of the calibration curve therefore depends on the experimental design through the (X'X)^(-1) matrix. Of the several optimization criteria that have been proposed, the most popular consists in choosing the design that minimizes the determinant of (X'X)^(-1). Such a design is said to be D-optimized [3,8].
For D-optimized designs, the number of concentration levels must always be equal to the number of parameters in the chosen model. For a linear model, for example, which is defined by two parameters, the minimum value of det(X'X) -1 is attained when the concentration levels coincide with the two extreme points of the calibration range. In practice, D-optimized designs are not recommended for all situations, as sometimes it is not previously known whether the functional relationship between x and y is really linear within the range of interest. In such cases, the experimental design must include at least three concentration levels, to allow for a lack-of-fit test of the proposed model [6].
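This behaviour can be illustrated numerically. For the straight-line model, X has a column of ones and a column of x values, so det(X'X) = nΣxi² − (Σxi)²; placing the points at the extremes of the range maximizes this determinant and hence minimizes det(X'X)^(-1). A small sketch with illustrative values (not taken from the paper):

```python
def det_xtx(design):
    """det(X'X) for the straight-line model y = b0 + b1*x:
    n * sum(x^2) - (sum(x))^2."""
    n = len(design)
    sx = sum(design)
    sx2 = sum(x * x for x in design)
    return n * sx2 - sx * sx

uniform = [1, 2, 3, 4, 5, 6]   # equally spaced levels (common routine practice)
d_opt   = [1, 1, 1, 6, 6, 6]   # all measurements at the range extremes

# The extreme-point design gives the larger det(X'X), i.e. the smaller
# det(X'X)^(-1), and therefore the more precise calibration curve:
print(det_xtx(uniform), det_xtx(d_opt))
```

The two-level design, however, leaves no degrees of freedom for a lack-of-fit test, which is why at least three levels are advised when linearity is not guaranteed.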
For a given calibration range and number of measurements, several different designs are possible. Figure 1 shows four designs, all with the same number of measurements and the same working range (1-6 arbitrary concentration units). Design a is often employed in routine laboratories. Design d is the D-optimized design for these conditions and results in the most precise calibration curve. A way of comparing a given design with the corresponding D-optimized design is to calculate its efficiency, defined by equation (4):

Eff = det(X'X) / det(X'X)_D-optim   (4)

where det(X'X) is the determinant of (X'X) for the chosen design and det(X'X)_D-optim is the determinant of (X'X) for the D-optimized design with the same number of measurements.

Description of the DESIGN computational program

DESIGN was written in the MATLAB high-level language and was designed to be user-friendly; a flow chart of the program is shown in figure 2. Three options allow evaluation of different experimental designs in fitting linear models:
(1) Selection of the number of standard solutions, taking into account the width of the confidence interval for the D-optimized design.
(2) Selection of the concentration levels (number and location).
(3) Comparison between designs, with no constraints on the designs being compared.
In option 1, the user evaluates the effect of the number of standard solutions used for calibration, from plots of confidence intervals for D-optimized designs. The program asks for the lower and higher concentrations of the working range, and the number of standard solutions of the design.
Once the number of standard solutions is chosen, option 2 permits the effect of the number and distribution of concentration levels on the calibration curve to be estimated. Efficiencies and plots of confidence intervals for selected designs are given. Since this option is intended to assess only the effect of how many and which levels should be selected, all designs must have the same lower and higher concentrations, in addition to the same number of standard solutions.
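Taking equation (4) as the plain determinant ratio, the efficiency calculation behind this option can be sketched as follows (illustrative Python, not the MATLAB listing from the Appendix; the D-optimized reference design is taken as half the points at each extreme of the same range):

```python
def det_xtx(design):
    """det(X'X) for the straight-line model: n*sum(x^2) - (sum(x))^2."""
    n = len(design)
    return n * sum(x * x for x in design) - sum(design) ** 2

def efficiency(design):
    """Eq. (4): det(X'X) of the chosen design divided by det(X'X) of the
    D-optimized design with the same number of measurements and the same
    range, i.e. the points split between the two range extremes."""
    lo, hi, n = min(design), max(design), len(design)
    d_opt = [lo] * (n // 2) + [hi] * (n - n // 2)
    return det_xtx(design) / det_xtx(d_opt)

# The equally spaced design commonly used in routine work is well below 100%:
print(efficiency([1, 2, 3, 4, 5, 6]))
```

An efficiency of 1 is reached only by the D-optimized design itself; any design with interior points trades some precision for the ability to test linearity.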
In option 3, there is no constraint on the designs under comparison. Designs with different numbers of standard solutions, different numbers of concentration levels or different working ranges are evaluated through confidence interval plots. The concentration range shown on the plot is chosen by the user.
The subprogram used to calculate confidence intervals and efficiencies is described in the listing given in the Appendix to this paper.

Simulations
Two practical examples were used to demonstrate the use of the options in the DESIGN program: the determination of potassium and iron in drinking water by flame emission spectroscopy (FES) and inductively-coupled plasma atomic emission spectrometry (ICP-AES), respectively. For both cases a calibration curve needs to be constructed relating the intensity of the measured signal (dependent variable y) to the concentration of the analyte in the sample (independent variable x).
Selection of the number of standard solutions (option 1)

It is important to bear in mind that replicate levels require full authentic replicate determinations, and not just replicated measurements of the same solution. Figure 3 shows that the larger the number of standard solutions, the smaller the confidence interval and, therefore, the more precise the calibration curve. This is explained by the decrease in t and in 1/n in equation (1).
However, it is also evident that this trend is progressively damped, so that beyond a certain point (e.g. N > 6) the improvement in precision does not seem to be enough to compensate for the additional effort of preparing more standard solutions. It is then suggested that designs with six different standard solutions are adequate for the construction of calibration curves for routine analysis.
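This diminishing-returns behaviour can be reproduced with the half-width term of equation (1) (the additional shrinkage of t with increasing degrees of freedom is ignored here, and the design values are illustrative, not the data of figure 3):

```python
import math

def halfwidth(design, x0):
    """Design-dependent part of the interval half-width in eq. (1)."""
    n = len(design)
    xm = sum(design) / n
    sxx = sum((xi - xm) ** 2 for xi in design)
    return math.sqrt(1.0 + 1.0 / n + (x0 - xm) ** 2 / sxx)

# D-optimized designs on the range 1-6 with N = 4, 6 and 8 standard solutions,
# evaluated at the centre of the range:
for n in (4, 6, 8):
    d = [1] * (n // 2) + [6] * (n // 2)
    print(n, halfwidth(d, 3.5))
# The interval shrinks with N, but each extra pair of solutions helps less.
```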
Selection of the number of concentration levels (option 2)

For a linear working range, the ideal experimental design is the D-optimized one [3,8]. In practice, however, range linearity cannot always be taken for granted and it is advisable to employ at least three concentration levels to allow for a lack-of-fit test of the assumed linear model [5][6][7]. The selection of how many and which concentration levels should be used to construct the calibration curve can be carried out by comparing the efficiencies of the candidate designs with that of the D-optimized design. Decreasing the number of levels increases the design efficiency and consequently improves the precision of the calibration curve (see figure 4). Nevertheless, this tendency cannot be generalized, because design efficiency depends not only on the number of levels but also on where in the working range they are located. To illustrate this, consider the following designs, whose efficiencies are shown in figure 5:
a: [1 5 5 10 10], three levels
b: [1 3 7 10 10], four levels
c: [1 2 9 10 10], four levels
As shown in figure 5, although design a has only three levels, it results in a lower efficiency than the four-level designs. Thus, the selection of concentration levels can be very specific and highly dependent on the kind of analysis being performed. In general, it is recommended that:
(1) Four concentration levels be used, to obtain not only high efficiency but also two degrees of freedom for testing lack of fit of the linear model [4,5].
(2) Measurements be taken as close as possible to the calibration range limits, which is the choice associated with D-optimized designs.
From these considerations, an experimental design like c in figure 5 should be employed in the example of the FES potassium assay in water. This design involves six standard solutions and allows for lack-of-fit testing with two degrees of freedom, yet still has high efficiency.

Comparison between designs (option 3)
Consider, for example, the determination of iron in drinking water by ICP-AES. In most samples the iron concentration lies between 0.01 and 0.5 mg/l, and this working range is normally used, although the linear range is much larger. As a consequence, it is not necessary to test for linearity. Moreover, it is known from experience that the iron concentration in uncontaminated drinking waters is close to 0.01 mg/l, rising to 0.5 mg/l in iron-contaminated samples. Which design should be adopted for analysing these samples? Recalling that the precision of a calibration curve increases towards the design mean [the (x0 − xm)² term in equation (1)], designs whose standard solutions are concentrated near the expected sample concentrations can be proposed. Confidence interval plots for these designs are given in figure 6. The best design for a given case is the one whose mean standard-solution concentration is closest to the expected concentrations of the samples to be analysed. As a result, design a in figure 6 should be used for samples without iron contamination, while design b is suitable for contaminated samples. Design c is indicated for batches containing both kinds of samples. Another relevant aspect of experimental design is the selection of the working range. The designs just compared have the same concentration range, but the program permits comparison of designs with different numbers of levels and different ranges.
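The centering effect can be verified with the half-width term of equation (1). The two designs below are hypothetical six-point designs on the 0.01-0.5 mg/l range (they are not the actual designs of figure 6), one weighted towards the uncontaminated level and one towards the contaminated level:

```python
import math

def halfwidth(design, x0):
    """Design-dependent part of the interval half-width in eq. (1)."""
    n = len(design)
    xm = sum(design) / n
    sxx = sum((xi - xm) ** 2 for xi in design)
    return math.sqrt(1.0 + 1.0 / n + (x0 - xm) ** 2 / sxx)

low_heavy  = [0.01, 0.01, 0.01, 0.01, 0.5, 0.5]  # mean pulled towards 0.01 mg/l
high_heavy = [0.01, 0.01, 0.5, 0.5, 0.5, 0.5]    # mean pulled towards 0.5 mg/l

print(halfwidth(low_heavy, 0.01), halfwidth(high_heavy, 0.01))
print(halfwidth(low_heavy, 0.5), halfwidth(high_heavy, 0.5))
# Each design is more precise near its own mean concentration.
```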
In order to demonstrate the application of DESIGN, we return to the iron assay in drinking waters by ICP-AES. As already stated, in this analytical procedure the linear ranges are very wide [6] and the working range can be increased without risking deviation from linearity.
The following designs can be used to illustrate the use of option 3. Results are shown in figure 7: design b provides a smaller confidence interval when the whole range is considered (design a is better only in a small region, close to the concentration mean). This is due to the Σ(xi − xm)² sum in equation (1).
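The role of the Σ(xi − xm)² sum can be illustrated by comparing two hypothetical D-optimized designs, one on the usual 0.01-0.5 mg/l range and one on a widened 0.01-1.0 mg/l range (illustrative concentrations, not the actual designs of figure 7):

```python
import math

def halfwidth(design, x0):
    """Design-dependent part of the interval half-width in eq. (1)."""
    n = len(design)
    xm = sum(design) / n
    sxx = sum((xi - xm) ** 2 for xi in design)
    return math.sqrt(1.0 + 1.0 / n + (x0 - xm) ** 2 / sxx)

narrow = [0.01] * 3 + [0.5] * 3   # usual working range
wide   = [0.01] * 3 + [1.0] * 3   # widened range: larger sum((xi - xm)^2)

# Near the top of the usual range the wide design is more precise,
# while the narrow design wins only close to its own concentration mean:
print(halfwidth(narrow, 0.5), halfwidth(wide, 0.5))
print(halfwidth(narrow, 0.255), halfwidth(wide, 0.255))
```

Widening the range is only advisable when, as in the ICP-AES iron assay, the linear range is known to extend well beyond the working range.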
Applications

A real example of iron determination in natural waters by atomic absorption spectrophotometry (AAS) follows.
This illustrates the effect of some experimental designs discussed in this paper on the precision of estimated sample concentration [equation (3)].
Measurements were made using a Perkin-Elmer Model 503 instrument, following the procedure described in the manufacturer's manual. Standard solutions were prepared by dilution of Titrisol (Merck) standards. It is important to note that the estimates of b1 and s in equation (3) are approximately 0.0359 and 0.0008, respectively, for curves A-G. Hence, the confidence intervals depend only on the terms related to the experimental design [t, n and X in equation (3)].
(1) When the number of standard solutions increases from four (design A) to six (design B), the width of the confidence interval decreases by 39% (from 0.117 to 0.071 mg/l). When the number of standard solutions goes from six to eight, the same interval decreases by only 14% (from 0.071 to 0.061 mg/l). This small decrease needs to be weighed against the task of preparing two more standard solutions, to decide whether the increase in precision is worth the trouble.
(2) Designs B, F and G illustrate the effect of the number of concentration levels. Among these, the D-optimized design B presents the narrowest confidence interval, but design G, with four levels close to the limits of the working range, presents a practically equal interval. Experimental design, of course, varies from case to case, and only the user is able to decide which design is best for any particular situation.

Conclusion
The DESIGN computational program presented in this work is conceived as a tool for the experimental design needed in building calibration curves. It is very simple to use and allows the experimenter to choose among several options. It is hoped that this program will stimulate analytical chemists to adopt the practice of planning their experiments as a matter of routine.