Method comparisons, influence of the number, distribution and range of samples on performance claims

Because of the difficulties associated with obtaining patient samples and the labile nature of some analytes, manufacturers will always require the assistance of clinical chemistry laboratories in the establishment of performance claims, but our experience suggests that this work should not be undertaken lightly by laboratories and that manufacturers would be advised to assess the resources of any chosen site carefully before proceeding.


Introduction
The previous paper [2] described two method comparison studies which followed the guidelines of the National Committee for Clinical Laboratory Standards (NCCLS) protocol PSEP-4, comparison of methods experiment. The Kodak Ektachem analytical system for urea and glucose was compared with Technicon AutoAnalyzer methodologies. Two hundred patient samples distributed according to PSEP-4 guidelines were analysed in duplicate by the test and comparative methods. Twice the minimum recommended number of patient samples was used in order to study the effect of sample sizes above as well as below the recommended minimum. The data for glucose are presented here, and the data base has been modified to produce changes in sample number, distribution and range.
The estimates of slope, intercept and standard error of the estimate of y (Syx) from linear regression analysis are used in the calculation of the tolerance limits and in estimates of total error at medical decision levels, which provide a basis for manufacturers' performance claims. This paper illustrates the way in which sample number, distribution and range could alter the manufacturers' performance claims and gives an indication of the magnitude of these effects. The methods adopted for detection of outliers in the data can also have a marked effect on the claims made.

Materials and methods
Experimental methods and materials for glucose have been described previously [2]. The distribution of patient samples recommended for glucose analysis was Group A (<2.8 mmol/l) 10%; B (2.9-6.1 mmol/l) 40%; C (6.2-8.3 mmol/l) 30%; D (8.4-13.8 mmol/l) 10%; and Group E (>13.8 mmol/l) 10%. The draft version of the PSEP-4 protocol contained a misprint, and the groups for glucose were given as A (10%), B (40%), C (20%), D (10%) and E (10%). In our experiment 20% of samples were collected in Group E. However, the recommended distribution and our distribution have been compared with other possible distributions for one hundred samples by the data modification described below.
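The allocation of results to groups A to E can be checked with a short script. The following is a minimal sketch only: the function names are hypothetical, the sample values are invented for illustration, and the treatment of values falling between the printed group limits (for example 2.8 or 2.85 mmol/l, which are assigned to the lower group) is an assumption.

```python
from collections import Counter

# Recommended percentage of samples per group, as listed in the text.
RECOMMENDED_PERCENT = {"A": 10, "B": 40, "C": 30, "D": 10, "E": 10}


def glucose_group(value):
    """Assign a glucose result (mmol/l) to group A-E.

    Lower limits of groups B-D are taken from the text; values that fall
    between two printed ranges go to the lower group (an assumption).
    """
    if value > 13.8:
        return "E"
    if value >= 8.4:
        return "D"
    if value >= 6.2:
        return "C"
    if value >= 2.9:
        return "B"
    return "A"


def distribution_percent(values):
    """Percentage of samples falling in each group."""
    counts = Counter(glucose_group(v) for v in values)
    n = len(values)
    return {group: 100.0 * counts.get(group, 0) / n for group in "ABCDE"}


# Invented comparative-method results, for illustration only.
example = [2.0, 3.0, 5.0, 5.5, 6.0, 7.0, 7.5, 8.0, 10.0, 15.0]
observed = distribution_percent(example)
for group in "ABCDE":
    print(group, observed[group], "recommended:", RECOMMENDED_PERCENT[group])
```

Comparing the observed and recommended percentages before analysis would show at once a departure such as the 20% of samples collected in Group E.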
The equations for linear regression analysis were those given in Davies et al. [3]. Modification of the original data base of two hundred samples analysed in duplicate by test and comparative method is described below. C3 is the distribution recommended by the protocol and C4 the distribution which forms the whole data base in this study.

Table 2. Linear regression data on different data sets.
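The three regression estimates used throughout this paper (slope, intercept and Syx) can be sketched as follows. This assumes ordinary least-squares regression of the test method (y) on the comparative method (x), with Syx calculated on N - 2 degrees of freedom; the function name is hypothetical and the four data points are invented.

```python
import math


def linear_regression(x, y):
    """Least-squares slope, intercept and standard error of estimate (Syx)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares about the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    syx = math.sqrt(rss / (n - 2))
    return slope, intercept, syx


# Invented data, for illustration only.
slope, intercept, syx = linear_regression([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, Syx = {syx:.3f}")
```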
Single analyses of samples
Each sample was analysed in duplicate by the test method (Y1, Y2) and the comparative method (X1, X2), and therefore four possible combinations of single rather than duplicate analyses could be prepared: Y1 with X1, Y1 with X2, Y2 with X1 and Y2 with X2.
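The preparation of the single-analysis data sets, together with the data set of duplicate means, may be sketched as below. The function names and the example values are hypothetical; no assumption is made about which combination corresponds to which data set label in the tables.

```python
from itertools import product


def single_analysis_sets(y1, y2, x1, x2):
    """Return the four single-analysis data sets as lists of (x, y) pairs."""
    tests = {"Y1": y1, "Y2": y2}
    comparatives = {"X1": x1, "X2": x2}
    return {
        (test_label, comp_label): list(zip(comparatives[comp_label], tests[test_label]))
        for test_label, comp_label in product(tests, comparatives)
    }


def duplicate_means(y1, y2, x1, x2):
    """Return (mean x, mean y) pairs from the duplicate determinations."""
    return [((a + b) / 2, (c + d) / 2) for a, b, c, d in zip(x1, x2, y1, y2)]


# Invented duplicate results for two samples, for illustration only.
y1, y2 = [5.0, 9.0], [5.5, 9.5]
x1, x2 = [4.0, 9.0], [6.0, 10.0]
for labels, pairs in single_analysis_sets(y1, y2, x1, x2).items():
    print(labels, pairs)
print("means", duplicate_means(y1, y2, x1, x2))
```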

Detection of outliers
Tests for the detection of outliers, and the exclusion of results outside the range of each method, were used as recommended in PSEP-4 and are discussed more fully in a previous publication.

Results
The whole data base was illustrated in a previous paper [2] as an X/Y plot with the comparative method as the independent variable X. Visual inspection reveals no obvious non-linearity in the data. The figure shows the same data with the comparative method as the independent variable, but with the vertical axis showing the bias of each individual test value from the comparative method (Y-X). Each test and comparative method value is the mean of duplicate determinations, and the range of groups A to E is indicated.

Discussion
The NCCLS protocol, PSEP-4, states that "Inaccuracy is quantitated by the estimates of bias at various medical decision concentrations, Xc, and by estimation of total error at the medical decision concentration closest to the mean of the comparison of methods data". The bias of a test method at concentration Xc is calculated as bias = Yc - Xc, where Yc is the predicted value at Xc and is given by Yc = a + bXc (the estimate of intercept is given by 'a' and of slope by 'b').
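The dependence of bias on the decision concentration can be illustrated with a few lines of code. The intercept, slope and decision concentrations used here are hypothetical values chosen for illustration and are not results from this study.

```python
def predicted_value(intercept, slope, xc):
    """Predicted test-method value Yc = a + b * Xc."""
    return intercept + slope * xc


def bias(intercept, slope, xc):
    """Bias of the test method at decision concentration Xc: Yc - Xc."""
    return predicted_value(intercept, slope, xc) - xc


# Hypothetical regression estimates and decision concentrations (mmol/l).
a, b = 0.20, 0.98
for xc in (2.8, 6.7, 11.1):
    print(f"Xc = {xc:5.1f}  Yc = {predicted_value(a, b, xc):6.3f}  bias = {bias(a, b, xc):+.3f}")
```

With a slope below 1.0 and a positive intercept, as in this invented example, the bias changes sign within the range, which is why the estimates of both parameters matter at each decision concentration.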
Clearly any factors which influence the estimates of slope and intercept are important in this context, and the magnitude of the standard deviations of these estimates will determine the confidence which can be attached to them. The tolerance limits are calculated and used to estimate the expected total error. The tolerance limits for a desired population proportion (p) and specified confidence (γ) may be calculated at Xc from the equation given in PSEP-4:

$$Y_c \pm K S_{yx} \sqrt{1 + \frac{1}{N} + \frac{(X_c - \bar{X})^2}{\sum (X_i - \bar{X})^2}}$$

where K is the appropriate tolerance factor for a normal distribution (K values for γ = 0.99 and p = 0.95 are used in this study), Syx is the standard error of the estimate of y, N is the number of samples and X̄ is the mean of the comparative method values Xi. Tolerance limits are calculated only for the medical decision concentration closest to the mean of the comparison of methods data. Total error is calculated by taking the differences between the tolerance limits and Xc; the absolute value of the largest difference is taken as the estimate of total error. It can be seen that estimates of slope and intercept will influence the calculation of the predicted value Yc and that the magnitude of Syx will affect the tolerance limits and total error. The value of K is influenced by the number of samples used.

Previous authors [3,4,5] have drawn attention to the effects of range and numbers of samples on various linear regression parameters. Slope is used in the calculation of Yc, and different estimates of the slope are obtained with changes in the range (the a data sets) and distribution (the c data sets, up to c6) of the data.
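The calculation of tolerance limits and total error may be sketched as follows, assuming the limits have the form Yc ± K·Syx·sqrt(1 + 1/N + (Xc − X̄)²/Σ(Xi − X̄)²). The function names are hypothetical, the tolerance factor K must be taken from tables for the chosen p, γ and N, and the value of K and the data used here are invented for illustration.

```python
import math


def tolerance_limits(x, slope, intercept, syx, xc, k):
    """Lower and upper tolerance limits at decision concentration xc.

    x is the list of comparative-method values used in the regression;
    k is the tolerance factor, supplied by the user from tables.
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    yc = intercept + slope * xc
    half_width = k * syx * math.sqrt(1 + 1 / n + (xc - mean_x) ** 2 / sxx)
    return yc - half_width, yc + half_width


def total_error(lower, upper, xc):
    """Largest absolute difference between either tolerance limit and xc."""
    return max(abs(lower - xc), abs(upper - xc))


# Invented values, for illustration only (k = 2.0 is not a tabulated factor).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
low, high = tolerance_limits(x_values, slope=1.0, intercept=0.0, syx=0.1, xc=3.0, k=2.0)
print(f"limits: {low:.3f} to {high:.3f}, total error: {total_error(low, high, 3.0):.3f}")
```

Because the term in (Xc − X̄)² vanishes when Xc equals the mean of the data, the limits are narrowest for a decision concentration at the mean of the comparative method values.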
The confidence attached to the estimates of slope (which decides whether the slope is significantly different from 1.0) is affected randomly in this comparison of methods by range (the a data sets) and increased by numbers of samples (b1 to b4). The use of single (d1 to d4) instead of duplicate (d5) analyses has a negligible effect in this set of data, since only four out of 200 duplicate estimations differed by more than the limit of 3.27 times the average absolute difference recommended in PSEP-4. An additional important advantage of duplicates is their value in the study of precision profiles (Table 4).
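The check on duplicates can be sketched as below, assuming that a pair is flagged when its absolute difference exceeds 3.27 times the average absolute difference of all pairs for that method. The function name and the data are hypothetical.

```python
def flag_discordant_duplicates(first, second, factor=3.27):
    """Return indices of duplicate pairs whose absolute difference exceeds
    factor times the average absolute difference, together with that limit."""
    differences = [abs(a - b) for a, b in zip(first, second)]
    limit = factor * sum(differences) / len(differences)
    flagged = [i for i, d in enumerate(differences) if d > limit]
    return flagged, limit


# Invented duplicate results (mmol/l); the last pair is deliberately discordant.
first = [5.0] * 10
second = [5.5] * 9 + [9.0]
flagged, limit = flag_discordant_duplicates(first, second)
print("limit:", round(limit, 4), "flagged pairs:", flagged)
```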
The sign and magnitude of the intercept can also be shown to be influenced by range and distribution of data.
No definite trend is apparent when the range is extended (the a data sets), but when the distribution is altered (the c data sets, up to c6) there is an increasingly negative intercept related to the changing slope. The difference between the intercept obtained for duplicate observations (d5) and those obtained for the various combinations of single observations (d1 to d4) has little effect.
Range has no effect on Syx if the error in the data is constant throughout the range chosen for method comparison. Many clinical chemistry assays, however, exhibit an increase in standard deviation with increasing analyte concentration. Precision profiles for glucose on the AutoAnalyzer and Kodak Ektachem show this increasing imprecision (Table 4). Syx increases as more high concentration samples are included in the distribution (the c data sets, up to c6) and is also a function of range (the a data sets) (Table 2), with consequent effects on estimates of tolerance limits and total error. Syx, the error about the regression line, is independent of sample size [3], and this is illustrated in Table 2, data sets b2 to b4.

Sample size has very little effect on linear regression parameters, but range and distribution can have effects on slope, intercept and Syx. This is illustrated by the values observed for the calculation of total error in Table 3, which combines slope, intercept and Syx. For example, a change in sample size (b2, b3 and b4) has less effect on total error than a change in the distribution of samples (the c data sets, up to c6) or in the range of samples (a4, a8, a10 and a11).

The establishment of performance claims by manufacturers, as described by the NCCLS, includes claims concerning bias and total error, which are derived from linear regression parameters. In our studies we found that range and distribution had the greatest influence on slope, intercept and Syx, whereas the sample numbers studied had little effect on these parameters. It would therefore seem appropriate to define a minimum range of values and suggested distributions for individual analytes, and to provide this information in association with performance claims. Careful inspection of graphical presentation of data is of primary importance.
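The dependence of Syx on range, and its relative insensitivity to sample number, can be illustrated by a small simulation. This is a sketch on invented data and not a re-analysis of the study: it assumes a test method with a constant coefficient of variation of 3% (so that standard deviation rises with concentration), an error-free comparative method, and samples spread uniformly over each range. All names and settings are hypothetical.

```python
import math
import random


def syx(x, y):
    """Standard error of estimate from least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (n - 2))


def simulate(n, low, high, cv, rng):
    """n samples uniform on [low, high]; test result has SD = cv * concentration."""
    x = [rng.uniform(low, high) for _ in range(n)]
    y = [xi + rng.gauss(0.0, cv * xi) for xi in x]
    return x, y


rng = random.Random(1)  # fixed seed so that the run is repeatable
narrow = syx(*simulate(200, 3.0, 8.0, 0.03, rng))   # narrow range, 200 samples
wide = syx(*simulate(200, 3.0, 25.0, 0.03, rng))    # wide range, 200 samples
small = syx(*simulate(50, 3.0, 25.0, 0.03, rng))    # wide range, 50 samples
print(f"Syx narrow range: {narrow:.3f}")
print(f"Syx wide range:   {wide:.3f}")
print(f"Syx wide, n = 50: {small:.3f}")
```

Under these assumptions Syx is expected to be markedly larger for the wide range than for the narrow range, whereas reducing the number of samples from 200 to 50 over the same range changes it comparatively little.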
The conventional X/Y plot of the data provides the best approach to the detection of non-linearity, whereas the presentation given in the figure, where the bias of each individual test result from the comparative method is plotted against the value for the comparative method, provides a valuable opportunity to evaluate bias between methods at different analyte concentrations, particularly as the scale of the Y axis can be expanded as required. It would also seem appropriate to define the medical decision concentration for calculation of tolerance limits and total error, and to choose the concentration range and distribution to give a mean value approximating to this concentration.