An evaluation of the Kodak glucose/BUN analyser including experience with proposed testing protocols


Introduction
The Eastman Kodak Company has recently described a new concept in clinical chemistry, namely the use of dry chemistry films, initially for colorimetric analysis [2,3]. The glucose and urea methods were evaluated in this study following the guidelines of the National Committee for Clinical Laboratory Standards (NCCLS) [1], which are recommendations for goals to be sought by manufacturers.
This evaluation protocol is designed to provide a multipurpose evaluation framework for a wide range of methods and instruments and is in three sections. PSEP-2 (proposed standard for establishing performance claims) describes the four-week baseline period which is used to establish confidence limits for the controls used throughout the study. PSEP-3 details the precision study and PSEP-4 describes the comparison of methods experiment.
The Kodak Ektachem GLU/BUN analyser is a microprocessor-controlled discrete analyser which operates in single or dual test mode. The instrument used in the present study was an engineering model used in the USA and Europe and was designed to evaluate the concept of quantitative chemical analysis using multilayered reagents. It was used according to the operator's manual. The instrument was modified to give urea values, as opposed to blood urea nitrogen values, and was calibrated in mmol/L.

Methods and materials
NCCLS protocol for establishing performance claims for clinical chemistry methods
The protocol is in three sections [1]. The sections and the materials used are described below.
Performance check experiment (PSEP-2)
The protocol describes control criteria which were established for the test method. These criteria were used to control the Kodak test methods during subsequent sections of the protocol. The control sera used were lyophilised human material provided by Kodak for the LOW and HIGH levels and Wellcomtrol II (Wellcome Reagents Limited, Kent, UK) for the MID level.
Replication experiment (PSEP-3)
The replication (imprecision) protocol specifies a period of twenty days and a total of forty analytical runs. The 'midi' version was chosen as it is designed for medium rate automated methods. The midi experiment involves the analysis of half the number of samples compared with the maxi version and, unlike the mini experiment, still enables the effects of carryover to be investigated. For each concentration level studied, two different estimates of within run and total imprecision are required for presentation of performance claims. The first, designated 'point estimate', is the actual standard deviation observed in the experiment performed and the second, designated 'tolerance limit', represents the upper limit that with 95% confidence will contain the estimate of standard deviation from 99% of all similar experiments.
... for glucose were collected and stored at -20°C. Each day twenty-five samples were thawed and analysed as random duplicates by the test and comparative methods. Analyses were performed within two hours by both techniques, and each set contained samples in all category levels. Regression analysis on the data for each analyte used the mean of paired duplicates. The comparative method was the independent variable (X) and the Kodak Ektachem system the dependent variable (Y).
In order to illustrate the effect of the advice given in the NCCLS protocol on preparation of manufacturers' claims, three sets of data have been used for regression analysis. Set e1 included results from all samples. Set e2 excluded samples if the mean of any duplicate by the Kodak Ektachem system was outside the dynamic range specified in the operator's manual [4], that is for glucose 1.11-33.3 mmol/L and for urea 0.71-42.8 mmol/L, or if the mean of any duplicate by the comparative method was outside the manufacturer's quoted dynamic range.
Set e3 was prepared by using the standard error about the regression line (Syx) calculated from data set e2 to apply the test for outliers, whereby up to three pairs showing a difference of greater than 3.5 times Syx can be excluded before the final regression analysis. The number of pairs with a difference in excess of 3.5 times Syx is given with all regression statistics. The regression statistics obtained on data set e3 were used to calculate average bias (Yc-Xc, where Yc is the test method value at medical decision concentration Xc) at different medical decision concentrations. Tolerance limits were calculated for Yc so that there is a 99% probability that 95% of the sample results are included within the upper and lower limits, and total error was established as the absolute value of the largest difference between the tolerance limits and Xc. It is recommended that the tolerance limits and total error be calculated only for the medical decision concentration closest to the mean of the comparative method results.
Volume 3 No. 4 October 1981
Evident non-linearity of the data must be assessed visually in a scatter plot. It has been recommended [5] that the linear-regression procedures used in this paper should be restricted to those cases where the correlation coefficient (r) exceeds 0.99. Similarly it has been suggested [6] that the range of values used is adequate when the standard deviation of the comparative method values (SDx) is greater than seven times Syx. The correlation coefficient and the ratio SDx/Syx are related [7]. With a slope of 1.00, and when the number of pairs is 200 and SDx/Syx = 7.0, then r = 0.9900. Values for both r and SDx/Syx are given with regression statistics. The relationship between the two values is such that changes in SDx/Syx are easier to observe.
Wellcome Group quality control programme
Samples previously sent out by the scheme from 16th October 1978 to 26th March 1979 were analysed by the comparative and Kodak methods.
Two duplicate sets of twelve lyophilised bovine sera were provided and a sample for analysis taken from each of the twenty-four bottles after reconstitution, giving twelve duplicate analyses for each analyte.
Analysis of the twelve samples by participating laboratories is normally spread over a six-month period and the analysis of results returned includes the overall mean for each analyte, that is the mean of all results returned, with results greater than three standard deviations from the mean excluded, and method means which represent the mean of all results from laboratories with a particular method classification. Appropriate standard deviations are also provided. The overall mean values in the samples used ranged from 3.40 to 13.77 mmol/L for glucose and from 5.21 to 23.61 mmol/L for urea.
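The overall-mean calculation described for the Wellcome scheme can be illustrated with a minimal Python sketch. The function name `overall_mean` is hypothetical, and two details are assumptions rather than statements about the scheme's actual procedure: the sample standard deviation is used, and the exclusion is applied in a single pass.

```python
import statistics

def overall_mean(results, k=3.0):
    """Mean of returned results after excluding values more than k SD from the mean.

    Assumptions: sample standard deviation, one exclusion pass only.
    Returns (mean of retained results, number of results excluded).
    """
    mean = statistics.mean(results)
    sd = statistics.stdev(results)
    kept = [r for r in results if abs(r - mean) <= k * sd]
    return statistics.mean(kept), len(results) - len(kept)
```

A method mean would be obtained in the same way, restricted to the results from laboratories sharing one method classification.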
Comparative analytical methods
Standard AutoAnalyzer (Technicon) methodologies were used as the comparative methods. The diacetyl monoxime reaction for urea employed aqueous urea standards, the sample rate was 60/hour and samples with values above 20 mmol/L were diluted one in five in deionised water. For glucose the method was a glucose oxidase/peroxidase reaction with phenol and aminoantipyrine. Standards were prepared in saturated aqueous benzoic acid solution, the sample rate was 60/hour and sera with values above 20 mmol/L were diluted one in five in deionised water.
Kodak methods
Details of the glucose and urea slide chemistry are described by Curme et al [2] and Spayd et al [3]. Cartridges containing fifty slides were stored at 4°C and allowed to warm to room temperature for half an hour before the foil pack was opened. All experimental work reported here employed one coating batch, with daily calibration using three serum calibrators. Two hundred microliter samples were used for all studies.
Samples with urea values above 40 mmol/L were diluted one in two with water.
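The preparation of the second and third regression data sets and the average bias calculation described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the protocol's own procedure: all function names are hypothetical, the "difference" tested against 3.5 times Syx is taken to be |Y - X| for each pair (a residual from the fitted line is another possible reading), and when more than three pairs are flagged the three largest differences are dropped.

```python
import math

def linear_fit(x, y):
    """Least-squares regression of y on x; returns (slope, intercept, Syx)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, math.sqrt(ss_res / (n - 2))

def within_ranges(pairs, x_range, y_range):
    """Second data set: keep pairs whose means lie inside both dynamic ranges."""
    return [(x, y) for x, y in pairs
            if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]]

def exclude_outliers(pairs, factor=3.5, max_excluded=3):
    """Third data set: drop at most max_excluded pairs with |y - x| > factor * Syx.

    Returns (retained pairs, number of pairs exceeding the limit).
    """
    xs, ys = zip(*pairs)
    _, _, syx = linear_fit(xs, ys)
    flagged = [i for i, (x, y) in enumerate(pairs) if abs(y - x) > factor * syx]
    # Assumption: if more than max_excluded pairs are flagged, drop the largest.
    flagged.sort(key=lambda i: abs(pairs[i][1] - pairs[i][0]), reverse=True)
    dropped = set(flagged[:max_excluded])
    kept = [p for i, p in enumerate(pairs) if i not in dropped]
    return kept, len(flagged)

def average_bias(xc, slope, intercept):
    """Yc - Xc, where Yc is the fitted test-method value at concentration Xc."""
    return intercept + slope * xc - xc
```

Each pair is (comparative mean, Kodak mean) for one sample. The count returned by `exclude_outliers` corresponds to the number of pairs exceeding 3.5 times Syx that is reported alongside the regression statistics.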

Results
Performance check experiment (PSEP-2)
Baseline performance data for forty sets of triplicate determinations over a twenty-day period appear in Table 2, which provided performance check parameters for the replication and method comparison studies. Forty consecutive sets of readings were found using the criteria in the protocol.
Performance during the rest of the study was assessed by the mid control charts as described in the NCCLS protocol.
Charts were constructed for the high, mid and low controls. The high and low control charts were only used as corroborative evidence if an outlier occurred in the mid level charts. Throughout the study there was only one mid control outlier. One reading of a triplicate set produced a mean and range error. However, the high and low control mean and range charts were well within limits and so this run was not rejected.
Replication experiment (PSEP-3)
Table 3 gives the analysis of variance (ANOVA) results for the replication experiment. It should be noted that the carryover effect on WELL-I for glucose and urea is based on testing significant differences in variance with and without carryover. Table 4 gives the claims for imprecision with and without carryover following the NCCLS format.

Comparison of methods experiment (PSEP-4)
Figures 1 and 2 give the regression statistics together with correlation coefficient and the ratio SDx/Syx for patient samples used in the comparison of methods experiment. The preparation of data sets e1, e2 and e3 is described in the Methods and materials section. Regression statistics from data set e3 were used to calculate the accuracy performance claims given in Table 5.
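The link between the correlation coefficient and the ratio SDx/Syx reported with these statistics can be checked numerically. The sketch below uses the standard least-squares identities r = b·SDx/SDy and SDy² = b²·SDx² + Syx²·(n-2)/(n-1), where b is the slope and n the number of pairs; the function name `r_from_ratio` is hypothetical and the derivation is offered as an illustration, not as the formulation used in reference [7].

```python
import math

def r_from_ratio(ratio, n, slope=1.0):
    """Correlation coefficient implied by ratio = SDx/Syx for n pairs.

    Uses r = slope * SDx / SDy with
    SDy^2 = slope^2 * SDx^2 + Syx^2 * (n - 2) / (n - 1),
    both sides divided through by Syx.
    """
    explained = (slope * ratio) ** 2
    residual = (n - 2) / (n - 1)
    return slope * ratio / math.sqrt(explained + residual)

# Slope 1.00, 200 pairs, SDx/Syx = 7.0, as quoted in the text.
r_at_7 = r_from_ratio(7.0, 200)
```

Under these identities a ratio of 7.0 with 200 pairs and unit slope gives r of 0.9900 to four decimal places, while raising the ratio to 10 moves r only to about 0.995, which is consistent with the remark that changes in SDx/Syx are easier to observe.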
Wellcome Group quality control programme
There were two main objectives in using the multi-level lyophilised material available from the programme:
1. To provide additional information to that obtained in the comparison of methods experiment and compare the estimates from regression analysis with those calculated from patient samples. No exclusion criteria were applied to these results as the number of samples was very small. The ratio SDx/Syx in every case was well in excess of 7.0 and no pairs showed a difference in excess of 3.5 times Syx (Table 6 lines (a) and (e)).
2. In studies involving comparison of methods, conclusions concerning the performance of the test method are very dependent on the performance of the comparative or reference methods. Each comparative method can be classified in the Wellcome Scheme and the results obtained in each laboratory evaluated against the method mean (Table 6 lines (c) and (g)). Additionally they can be evaluated against the overall mean (Table 6 lines (d) and (h)).
The Kodak results are evaluated against the overall mean as it is difficult to classify this methodology in the Wellcome Scheme method classification (Table 6 lines (b) and (f)).

Discussion
The establishment of performance and claims for clinical chemical methods has now become a major problem for manufacturers of clinical chemistry systems and is in danger of consuming a major part of a limited resource, namely that of skilled laboratory workers.
In these activities, however, there are complex problems for manufacturers and clinical chemistry laboratories alike, and the publication of proposed standards PSEP-2, 3 and 4 by the NCCLS represents an important contribution to progress in this field. This paper and the subsequent one [8] report some of the authors' experience with these standards and the data derived from their work. The PSEP-2 and 3 standards, although time consuming, present few difficulties in execution, based as they are on freeze dried material. However, there are difficulties in carrying out the proposed standard for the Comparison of Methods Experiment (PSEP-4). The difficulties relate on the one hand to the selection of patient samples and their analysis according to the protocol and on the other hand to the performance of the comparative methods during the period of study.
The overview of the Comparison of Methods Experiment suggests that "at least 100 fresh patients' samples should be analysed in duplicate by both the test method and the comparative analytical method. The experiment must cover a period of at least four days which permits a maximum of twenty-five samples to be analysed in one day, or it can extend over a longer period of time if that is convenient for the evaluation study". Recommendations for the selection of patient samples are given and one suggested distribution is shown in Table 1.
The authors were only able to comply with the suggested distribution by preselecting samples and freezing them prior to subsequent duplicate analysis. The amount of sample required to perform duplicate analysis by the test and comparative method represents a major problem if the test and/or comparative methods require substantial amounts of serum or plasma. The use of the NCCLS protocol for evaluation of multichannel systems may present special difficulties, although one such evaluation has recently been published [9].
A penalty of not running duplicates is the failure to produce within run estimates of imprecision for human sera for the test and comparative methods. The information is important for evaluation of the comparative method and for its comparison with the test method. Additionally these estimates of within run imprecision can be usefully compared with those obtained in the replication experiment using the lyophilised material.
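The within run estimate that duplicate analysis makes possible can be sketched in Python. The formula used here, the square root of the sum of squared duplicate differences divided by twice the number of pairs, is the standard estimate of within run standard deviation from duplicates; the source does not state which formula was applied, and the function name is hypothetical.

```python
import math

def within_run_sd(duplicates):
    """Within run SD from duplicate pairs: sqrt(sum(d^2) / (2 * n)).

    `duplicates` is a sequence of (first result, second result) tuples,
    one tuple per patient sample analysed in duplicate in the same run.
    """
    n = len(duplicates)
    if n == 0:
        raise ValueError("at least one duplicate pair is required")
    return math.sqrt(sum((a - b) ** 2 for a, b in duplicates) / (2 * n))
```

Applied separately to the test and comparative method duplicates, this gives two estimates on human sera that can be set beside the replication experiment values obtained on lyophilised material.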
It was found that there was a tendency when selecting samples for analysis to encounter difficulties at the ends of the range. This could lead to the multiple selection of samples from one patient so that although the number of samples required is fulfilled the variability represented by those samples is reduced. If this were to become a major feature of selection then it might result in falsely low estimates of Syx which would markedly improve the accuracy performance claims. It is interesting in this connection to compare the regression statistics for comparison of methods for the bovine material from the Wellcome Scheme (Table 6 lines (a) and (e)) with those obtained on patient samples (Figures 1 and 2).
Bearing in mind the recognised problems associated with commutability of samples and the small number of Wellcome samples used, the estimates of slope and intercept were in agreement with those obtained with patient samples. However, the value for Syx for urea and glucose using Wellcome material is markedly lower than the value for patient samples. This reflects the fact that the Wellcome material is taken from only four homogeneous pools and covers a smaller range of analyte concentrations. If in excess of fifty quality control samples were used in a comparison of methods experiment and commutability were satisfactory, the standard deviation of the estimates of slope and intercept would be markedly improved, but analysis of lyophilised material from different sources can never replace patient samples in estimation of the standard error of the regression line.
Care must be taken to ensure that plasma samples are obtained from blood samples which had the recommended amounts of anticoagulant added. High concentrations of anticoagulant resulting from inadequate filling of a specimen container could adversely affect measurement by a test or comparative method [3].
The choice and control of the comparative method represents the second major problem in the comparison of methods experiment, and whereas the protocol discusses briefly the factors affecting the choice of a comparative or reference method it does not provide guidance as to the control of that method during the period of study. The data presented in Table 6 represent an attempt to provide some information about the bias of the comparative methods with reference to their overall and method means. It can be seen that the comparative method for urea shows a significant proportional error of the order of +5% when compared against overall mean values, which may account in part for the significant proportional error of approximately -5% for the Kodak results against the comparative results using patient samples.
In the preparation of accuracy performance claims it is clear that calculation of bias is dependent on reliable estimates of slope and intercept and that the tolerance limits are additionally dependent on the standard error about the regression line (Syx).
Preparation of data in the manner recommended will sometimes lead to a reduction in the range of samples analysed and additionally the removal of outliers will reduce the value of Syx. For performance claims to be comparable these factors must be taken into account. These effects are discussed in detail elsewhere [8]. The protocol suggested that tolerance limits and total error be calculated only for medical decision concentrations closest to the mean of the comparative method data (x). Table 5 shows that for glucose this requirement is reasonably well fulfilled. For the medical decision concentration of 6.6 mmol/L the value of the mean of x was 7.1. However, for urea the situation was less than satisfactory, with a mean value of x of 7.8 and a medical decision concentration of 9.60 mmol/L. This problem has however already appeared in the literature [9], with a medical decision concentration of 1100 mg/L (6.2 mmol/L) for glucose having tolerance limits and total error quoted when the means of comparative or reference methods were 1670 mg/L (9.3 mmol/L) and 1600 mg/L (9.0 mmol/L) respectively, and for a medical decision concentration of 250 mg/L (8.9 mmol/L) for urea nitrogen with a comparative method mean at 512 mg/L (18.3 mmol/L). It will be necessary to indicate how close is close if performance claims are to be of value and be comparable. The mean of the comparative method (x) should be given in an accuracy performance claim in order to avoid misunderstanding.
The proposed standard, PSEP-4, would be improved by inclusion of some basic criteria for evaluation of the comparative method against other laboratories in the form of method means. It is in this area that manufacturers are most vulnerable to claims made for or against their products by laboratories using inadequately controlled comparative or reference techniques.
Because of the difficulties associated with obtaining patient samples and the labile nature of some analytes, manufacturers will always require the assistance of clinical chemistry laboratories in the establishment of performance claims, but our experience suggests that this work should not be undertaken lightly by laboratories and that manufacturers would be advised to assess the resources of any chosen site carefully before proceeding.

Introduction
The previous paper [2] described two method comparison studies which followed the guidelines of the National Committee for Clinical Laboratory Standards protocol PSEP-4, comparison of methods experiment [1]. The Kodak Ektachem analytical system for urea and glucose was compared with Technicon AutoAnalyzer methodologies. Two hundred patient samples distributed according to PSEP-4 guidelines were analysed in duplicate by the test and comparative methods. Twice the minimum recommended number of patient samples were used in order to study the effect of sample size above as well as below the recommended minimum number. The data for glucose is presented and the data modified to produce changes in the sample number, distribution and range. The estimates of slope, intercept and standard error of the estimate of y (Syx) from linear regression analysis are used in the calculation of the tolerance limits and in estimates of total error at medical decision levels, which provide a basis for manufacturers' performance claims. This paper illustrates the way in which sample number, distribution and range could alter the manufacturers' performance claims and gives an indication of the magnitude of these effects. The methods adopted for detection of outliers in the data can also have a marked effect on the claims made.

Materials and methods
Experimental methods and materials for glucose have been described previously [2]. The distribution of patient samples recommended for glucose analysis was Group A (<2.8 mmol/L) 10%; B (2.9-6.1 mmol/L) 40%; C (6.2-8.3 mmol/L) 30%; D (8.4-13.8 mmol/L) 10%; and Group E (>13.8 mmol/L) 10%. The information in the draft version of the PSEP-4 protocol contained a misprint and groups for glucose were given as A (10%), B (40%), C (20%), D (10%) and E (10%). In our experiment 20% of samples were collected in Group E. However, the recommended distribution and our distribution have been compared with other possible distributions for one hundred samples by data modification described below.
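Checking a set of glucose results against the recommended group distribution can be sketched in Python as below. The names are hypothetical, and one assumption is made about boundaries: because the published limits leave small gaps (for example between 2.8 and 2.9 mmol/L), each upper limit is treated as inclusive, so a value of exactly 2.8 mmol/L is placed in Group A.

```python
from bisect import bisect_left

# Upper limits (mmol/L) of Groups A to D; values above the last fall in Group E.
UPPER_LIMITS = [2.8, 6.1, 8.3, 13.8]
GROUPS = "ABCDE"
RECOMMENDED_PERCENT = {"A": 10, "B": 40, "C": 30, "D": 10, "E": 10}

def group_of(glucose):
    """Group letter for one glucose result, upper limits treated as inclusive."""
    return GROUPS[bisect_left(UPPER_LIMITS, glucose)]

def distribution(values):
    """Percentage of results falling in each group."""
    counts = {g: 0 for g in GROUPS}
    for v in values:
        counts[group_of(v)] += 1
    return {g: 100.0 * c / len(values) for g, c in counts.items()}
```

Comparing the output of `distribution` with `RECOMMENDED_PERCENT` shows at a glance which groups are over- or under-represented before any regression is run.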
The equations for linear regression analysis were those given in Davies et al [3]. Modification of the original data base of two hundred samples analysed in duplicate by test and comparative method is described below.