Critical discussion on a method for derivation of reference limits in clinical chemistry from a patient population

Introduction
'Health' is a concept that is difficult to define. A well-known definition is given by the World Health Organization (WHO): health is a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity. Some authors, for example Gräsbeck [1], have criticized this definition as unrealistic and have proposed others that stress the absence of undesirable affections, the so-called privative definitions. A more complete discussion of the notion 'health' is beyond the scope of this paper.
All writers agree that health is a very individual state, so it seems inappropriate to define it by 'normal values' in clinical chemistry. However, because it is not generally possible to specify normal values for individuals, some form of limit for detecting gross deviations from 'health' would be useful. Of course no limit should be taken as absolute and it should not be called a 'normal' value, but, rather, a 'reference' limit. When determining reference limits it is important to ensure that the reference population matches the patient population as closely as possible in all respects other than disease. Special attention must be paid to similar age and sex distributions, social backgrounds, diets etc. Of course the same sampling and analysis methods must be used for both populations. Since blood donors or laboratory staff usually differ from hospitalized patients at least in age distribution, large discrepancies may be found if the former groups are used as reference samples. The International Federation of Clinical Chemistry (IFCC) has given a series of recommendations on the theory of reference values [2]. Solberg has designed a computer program package (REFVAL) to assist in the statistical treatment of reference values.
It would be very helpful if hospital patients could serve as their own references, or rather as references for one another. This might well be possible if we assume that most patients have only a few disturbed chemical parameters and are 'healthy' in terms of the others. Then, in the bulk of analyses, most results will be 'normal' and the distribution will be polluted by pathological results only to a small extent. This paper describes an attempt to filter out these pathological values.
It must be noted that the benefit of being able to establish normal ranges that are more or less universal is reduced by this approach, because most patient populations will not sufficiently match the one the reference limits are determined for. Furthermore, there will be a slight variation in reference intervals determined at different times in the same hospital. The authors consider the gain in usefulness of reference intervals to be greater than these disadvantages.

Apparatus
Analyses were performed on two Technicon SMA-C continuous-flow analysers that determine sodium, potassium, chloride, urea, creatinine, uric acid, alkaline phosphatase (AP), lactate dehydrogenase (LDH), aspartate aminotransferase (ASAT), alanine aminotransferase (ALAT), total bilirubin, direct bilirubin, calcium, inorganic phosphate, total protein, albumin, cholesterol, triglycerides, iron and gamma glutamyl transferase simultaneously in serum. Because the SMA-C is a multichannel analyser, all components are measured for every sample. In this study this was essential because the majority of the measurements on each component came from patients in whom most components were not pathological.
One SMA-C was for hospital patients and the other for general practitioners' patients. For data acquisition a Hewlett-Packard HP 1000 computer system was used.
The program used for the calculations was written by Hemel in Pascal and was run on the University's Control Data Cyber 170/760 computer. The plots were made on the University's Versatec V80 electrostatic plotter.

Techniques
Several methods have been devised in the past using the idea of deriving reference intervals from patient populations. Naus [3] has reviewed the following methods:
Hoffmann [4 and 5]: a graphical method using probability paper.
Neumann [6]: an iterative modification of the Hoffmann method.
Pryce [7]: mean and standard deviation based on the mode and percentiles of the non-distorted side of the one-sidedly polluted distribution.
Becktel [8]: a modification of the Pryce method.
Bhattacharya [9]: mean and standard deviation are derived from the intercept and slope of a plot of ln(f(x + h)/f(x)) against x, with f(x): frequency in class with midpoint x, and h: class width.
Naus did not cover a method devised by Martin [10 and 11], who proposed population dissection by fitting two (or three) Gram-Charlier functions, representing a reference distribution and one (or two) polluting pathological distribution(s), to the observed distribution.
The method of choice should meet the following requirements:
(1) As the exact distribution of a variable is usually unknown, the method should preferably be nonparametric. Most methods do not meet this requirement: they suppose a Gaussian distribution, if necessary after transformation. Martin's method does not make any assumption about the shapes of the populations. However, it supposes that the polluting pathological cases belong to the same one (or two) population(s).
(2) All pathological results should be filtered out: all methods try to do so. Martin's method places them in one (or two) separate distribution(s).
After an extensive study of the methods he reviewed, Naus advised the use of an automated and newly objective Bhattacharya method for known Gaussian distributions, and developed a method postulating a Pearson gamma distribution for distributions that are known to be non-Gaussian. Although Martin's method has the advantage of not making any assumptions about the shape of the distributions, reasonably good estimates for eight (or even 12) parameters are required to reach convergence of the iterative procedure. We expected this to be difficult to achieve. A further disadvantage is the fact that the pathological cases are thought to form a limited number of populations (one or two). As the choice between a gamma distribution and a Gaussian one was supposed to be usually simple, Naus's techniques were tested with data from the Central Clinical Chemical Laboratory.
Variables that are approximately normally distributed
The technique used was Naus's modification of the Bhattacharya method [3 and 9]. A plot of ln(f(x + h)/f(x)) against x (with f(x): frequency in class with midpoint x, and h: class width) gives a virtually straight line for a normal distribution presented as a histogram (see Appendix A). A sample from a normal distribution that is partially (only in the tails) distorted by pathological cases will show a straight part in the plot for the 'clean' fraction of the sample (figure 1). It is possible to estimate the population mean and variance from the x-intercept and the slope, respectively. The problem of isolating the straight part from the zig-zagging ones remains; this is generally done by intuition. Naus proposed weighting factors to reduce the influence of the distorted parts on the regression line. Initially, unwanted influences of two kinds are accounted for in the weighting:
(1) Random fluctuations: these affect the low-frequency bars of the histogram more than the high-frequency bars.
(2) Deviations from the wanted regression line because of pathological scores.
Deviations of both kinds should have as little influence as possible on the regression line. Naus derived a weighting factor formula with two parameters, α and β. Empirically optimal results are found for α = 2 and β = 0, so that the weightings w(x) depend only on the class frequencies f(x) and f(x + h).
The statistical interpretation of these weightings is that the variance in the class frequencies is not equal for all classes and is balanced by the weightings.
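The weighted regression on the Bhattacharya plot can be sketched as follows. This is a minimal illustration, not Naus's program: the function name `bhattacharya_fit` is hypothetical, and the weight used here (the inverse of the approximate variance of ln(f(x + h)/f(x)), namely 1/(1/f(x) + 1/f(x + h))) is an assumption standing in for Naus's weighting formula. The conversion from the regression line to the parameters treats each class frequency as proportional to the Gaussian density at the class midpoint, for which ln(f(x + h)/f(x)) = -(h/σ²)(x + h/2 - μ).

```python
import math

def bhattacharya_fit(midpoints, freqs, h):
    """Weighted straight-line fit to the Bhattacharya plot of a histogram.

    Returns (mean, sd) of the Gaussian component, assuming the class
    frequencies are proportional to the density at the class midpoints.
    """
    xs, ys, ws = [], [], []
    for x, f0, f1 in zip(midpoints, freqs, freqs[1:]):
        if f0 > 0 and f1 > 0:
            xs.append(x)
            ys.append(math.log(f1 / f0))
            # Assumed weight (not necessarily Naus's formula): inverse of
            # the approximate variance of ln(f1/f0) for count data.
            ws.append(1.0 / (1.0 / f0 + 1.0 / f1))
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    if slope >= 0:
        raise ValueError("no descending straight part found in the plot")
    x_intercept = -intercept / slope
    mean = x_intercept + h / 2.0      # the line crosses zero at mu - h/2
    sd = math.sqrt(-h / slope)        # slope equals -h / sigma^2
    return mean, sd

# Illustrative check on an unpolluted, exactly Gaussian histogram.
mu, sigma, h = 140.0, 3.0, 1.0
mids = [130.0 + i for i in range(21)]
freqs = [10000 * h * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
         / (sigma * math.sqrt(2 * math.pi)) for x in mids]
print(bhattacharya_fit(mids, freqs, h))
```

With real, polluted data the weights only damp the influence of low-frequency classes; they do not by themselves exclude a disturbed region with high frequencies.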
With these weightings, all short histogram bars (low frequencies) have minimal influence on the regression line. This method was tested on several artificially polluted Gaussian curves by trying to retrieve the parameters of the original curves. This gave excellent results. Subsequently, the author visually checked the results by inspecting the Bhattacharya plot and regression line of the smoothed histogram; the histogram is smoothed before making a Bhattacharya plot for visual inspection to get rid of the random fluctuations which mask the straight part in the plot. Smoothing can be done following the procedure described by Savitzky and Golay [13], and corrected by Steinier et al. [14].
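As an illustration of the smoothing step, the sketch below applies a five-point quadratic Savitzky-Golay filter, whose convolution coefficients are (-3, 12, 17, 12, -3)/35. The window width and polynomial order are assumptions made for the example; the text does not state which ones were used. The function name `smooth5` is hypothetical, and the two classes at each end of the histogram are simply left unsmoothed.

```python
def smooth5(freqs):
    """Five-point quadratic Savitzky-Golay smoothing of class frequencies.

    The first two and last two values are returned unchanged.
    """
    coeffs = (-3, 12, 17, 12, -3)
    out = [float(f) for f in freqs]
    for i in range(2, len(freqs) - 2):
        window = freqs[i - 2:i + 3]
        out[i] = sum(c * f for c, f in zip(coeffs, window)) / 35.0
    return out

# A quadratic trend passes through unchanged; an isolated spike is damped.
print(smooth5([0, 1, 4, 9, 16, 25, 36]))
print(smooth5([0, 0, 35, 0, 0]))
```

Smoothed frequencies are meant for the visual inspection only; the regression itself uses the unsmoothed histogram.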
However, there are some drawbacks in this approach:
(a) The top part of the histogram should be 'clean' (free from pathological results) because the weightings only include the frequencies. The situation shown in figure 2 is quite believable: ill patients form a distribution that is almost uniform. The resulting superposed (and observed) Bhattacharya plot will not show a straight part, although its distortion might be so inconspicuous that it could be overlooked (figures 3[a] and [b]). In practice, more than one illness will disturb a 'healthy population'. It is difficult to predict how the straight line will be distorted. Even an exactly Gaussian distribution will not, in a practical sample, show a perfectly straight line because of random errors. So a seemingly straight line does not guarantee that there really is a Gaussian distribution, let alone that the observed slope and intercept give the right parameters. For these reasons the method is not adequate to test the hypothesis of dealing with a Gaussian distribution, for which it is often used.
Statistically testing this hypothesis requires a homogeneous distribution, so it is difficult to do. The only way left is to base the usage of the method on prior knowledge about the shape of the distribution, for instance on calculations of coefficients of skewness and kurtosis. Of course, these calculations cannot be made for a mixed population such as the patient population used here.
(b) Paradoxically, disturbed parts of the plot have more influence on the regression line the more disturbed they are, because by pollution with pathological results their frequencies rise and consequently their weightings also increase.
(c) In principle, the range of the histogram influences the results because the tails 'pull with their weights' at the regression line. However, thanks to the small weightings on the tailing parts, this effect is small. Therefore the part with higher frequencies is selected rather than the undisturbed part.
Some improvement could result from two modifications of the technique:
(i) A threshold of a fraction of the mode, for example one-tenth, to diminish the range effects.
(ii) An additional plot of residuals of unsmoothed results after linear regression analysis, in order to facilitate the detection of systematic deviations from the Gaussian model. An example is seen in figure 1. The Bhattacharya plot (figure 3[b]) appears to have a straight part, but the plot of residuals shows clearly that they are not randomly distributed about zero, so calculated reference intervals will be wrong (figure 3[c]).
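The two proposed modifications can be sketched in a few lines. The function names are hypothetical. The count of sign runs is an addition of this sketch, not part of the method described: it is only a crude numerical companion to the visual inspection of the residual plot, on the assumption that residuals scattering randomly about zero change sign often, whereas a systematic deviation gives a few long runs.

```python
def classes_above_threshold(freqs, fraction=0.1):
    """Indices of classes whose frequency is at least `fraction` of the mode."""
    limit = fraction * max(freqs)
    return [i for i, f in enumerate(freqs) if f >= limit]

def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values of the (unsmoothed) Bhattacharya plot."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sign_runs(res):
    """Number of runs of equal sign among the non-zero residuals."""
    signs = [r > 0 for r in res if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

print(classes_above_threshold([1, 5, 50, 100, 40, 9]))
print(sign_runs([0.2, 0.1, -0.3, -0.1, 0.4]))
```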
Variables that are definitely not normally distributed
These variables are supposed to follow a Pearson's gamma distribution. This is a two-parameter distribution: R for the number of degrees of freedom, and λ, which is an extension factor. A chi-square distribution with ν degrees of freedom is a gamma distribution with R = 0.5ν and λ = 0.5 (see Appendix C). Typical shapes of gamma distributions are illustrated in figure 4.
These curves are skewed but tend to become symmetrical with many degrees of freedom.
The criteria for choosing the gamma curve as a model for a variable's distribution are:
(1) The distribution is known to be positively skewed (long tail to the right).
R and λ are numerically estimated using a weighted linear regression technique with the weightings as described above (the procedure of the numerical estimation of R and λ is described in Appendix B). To establish upper (and lower) boundaries the desired fractiles are calculated from a chi-square distribution and transformed into a gamma distribution interval (see Appendix C).
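The transformation from chi-square fractiles to gamma limits can be sketched as below. If x follows a gamma distribution with parameters R and λ, then 2λx follows a chi-square distribution with 2R degrees of freedom, so a gamma fractile is the chi-square fractile divided by 2λ. The Wilson-Hilferty approximation used here for the chi-square fractile is an assumption of this sketch; the text refers only to 'a usual approximation' without naming it. Function names are hypothetical.

```python
from statistics import NormalDist

def chi2_fractile(p, df):
    """Wilson-Hilferty approximation to the p-fractile of chi-square(df)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3

def gamma_fractile(p, R, lam):
    """p-fractile of a gamma distribution with parameters R and lambda,
    using the relation 2 * lam * x ~ chi-square with 2R degrees of freedom."""
    return chi2_fractile(p, 2.0 * R) / (2.0 * lam)

# Upper limit at the 97.5% fractile for illustrative parameter values.
print(gamma_fractile(0.975, R=5.0, lam=0.5))
```

The approximation is least accurate for small R and for fractiles in the lower tail, so an exact chi-square routine is preferable where one is available.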

Comments
In Naus's method the weightings for the linear regression technique are derived from experiments with a Gaussian distribution that is one-sidedly distorted by another Gaussian distribution. One might question whether this model is optimal for weighting gamma distributed variables.
In the method described, the 97.5% fractile is chosen as the upper boundary of a one-sided reference interval. The 95% fractile might be more appropriate. Again, the range of the histogram used to estimate the variable's distribution is of some importance for the results, so a threshold frequency of, say, one-tenth of the modal frequency could bring some improvement.
Testing the quality of the derived reference intervals is, of course, tedious. A way of checking is to compare them with literature values, preferably from the same laboratory; of course this is not a thorough test because these values are generally not based on a patient population. A better quality control would be some form of in-process checking: testing not the results, but, rather, the method. The model used can be judged visually by drawing a plot that should be a straight line in the case of a perfect gamma curve. The residuals can be plotted to determine whether there are trends visible in the deviation from the model. Details about these plots are given in Appendix B. In figures 5, 6 and 7 the function for deriving a straight line in the case of a gamma distribution is denoted by F(λ, R, x).
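A simplified version of this straight-line check can be sketched as follows. For a gamma density proportional to x^(R-1) e^(-λx), the identity ln(f(x + h)/f(x)) = (R - 1) ln(1 + h/x) - λh holds exactly, so plotting the left-hand side against ln(1 + h/x) gives a line with slope R - 1 and intercept -λh. The sketch makes two simplifying assumptions that differ from the method described: it treats class frequencies as proportional to the density at the class midpoint (no class-width correction), and it uses unweighted least squares. The function name is hypothetical.

```python
import math

def gamma_plot_fit(midpoints, freqs, h):
    """Unweighted line fit of ln(f(x+h)/f(x)) against ln(1 + h/x).

    Returns (R, lam, residuals) under the simplifying assumptions above.
    """
    us, ys = [], []
    for x, f0, f1 in zip(midpoints, freqs, freqs[1:]):
        if x > 0 and f0 > 0 and f1 > 0:
            us.append(math.log(1.0 + h / x))
            ys.append(math.log(f1 / f0))
    n = len(us)
    mu = sum(us) / n
    my = sum(ys) / n
    suu = sum((u - mu) ** 2 for u in us)
    suy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
    slope = suy / suu
    intercept = my - slope * mu
    res = [y - (slope * u + intercept) for u, y in zip(us, ys)]
    return slope + 1.0, -intercept / h, res

# Illustrative check on an exact, unpolluted gamma-shaped histogram.
R_true, lam_true, h = 4.0, 0.2, 1.0
mids = [float(i) for i in range(1, 41)]
freqs = [x ** (R_true - 1.0) * math.exp(-lam_true * x) for x in mids]
R_est, lam_est, res = gamma_plot_fit(mids, freqs, h)
print(R_est, lam_est)
```

Trends in the returned residuals play the same role as in the Gaussian case: they point to systematic departures from the assumed model.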

Discussion and results
Of course, the distributions of clinical chemical variables are not exactly Gaussian or gamma. However, one may hope that these distributions can be used as reasonably good approximations. In many cases, intuition and prior knowledge support this idea. A major problem is finding a good criterion to choose between the models and to test the validity of the chosen model. Both methods should perhaps be used and the best chosen on the basis of interpretation of the plots of residuals. A major difficulty is the necessary weighted interpretation of the plots. Another criterion might be the estimated R parameter (number of degrees of freedom) of the gamma approximation. If R is greater than a certain number, for example 50, the distribution will be almost symmetrical and so a Gaussian model might be appropriate. Although this criterion may perform well in practice, it must be stressed that, theoretically, it is not a guarantee for the right choice. Fortunately, sometimes it is apparent that a distribution is symmetrical, so that the Gaussian model would be chosen, or positively skewed, that is, with a long tail to the right, in which case a gamma model would be more appropriate.
Minor problems are the influence of the range of the histograms and the coarseness in the estimation of the  7).
The similarity between the reference intervals of (nearly) Gaussian distributed variables if they are based on a normal model and those based on a gamma model is remarkable. One might suggest that a gamma model provides for all cases, but in gamma distributions the variance (= R/λ²) is dependent on the mean (= R/λ); this is unlikely to always be the case with clinical chemical variables, so the observed fact must be a coincidence.
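The dependence of the variance on the mean can be made explicit with the standard moments of the gamma distribution, written here with the parameters R and λ used in this paper. Once R is fixed, the coefficient of variation is fixed as well, which is the restriction referred to above.

```latex
\mu = \frac{R}{\lambda}, \qquad
\sigma^{2} = \frac{R}{\lambda^{2}} = \frac{\mu^{2}}{R}, \qquad
\frac{\sigma}{\mu} = \frac{1}{\sqrt{R}}
```

For example, R = 50 corresponds to a coefficient of variation of about 14%, whatever the value of λ.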
That a reference interval applies only to the patient population it is derived from is demonstrated in table 3: differences occur between intervals derived from hospital patients and those derived from general practitioners' patients. This is partly because of incomplete filtering of the pathological cases and also because of differences in the composition of the two populations. The differences between the serum albumin and protein intervals are the most marked.

Conclusions
It is possible to derive reasonable reference intervals for a clinical chemical variable from a patient reference population provided that: (1) The variable has an approximately Gaussian or gamma distribution.
(2) A choice can be made between these two models.

Appendix A
Although for a histogram the theory of the Bhattacharya plot is complicated (see Bhattacharya [9]) [...]

Appendix B
[...] x (1)
where h: class width.
This formula applies only if h is infinitely small. If not, the following approximation could be used:
log[f(x + h)/f(x)] = (R - 1) log(1 + h/x) - λh + (h³/(12x²)) [λ(R - 1) - (R - 1)(R - 2)/x] (2)
This formula is derived as the first three terms of a Taylor series using some approximations. It is useless to calculate more terms from the series because of the other approximations that are necessary to get a formula that is relatively easy to work with.
(1) and (2) [...] So a plot of F(λ, R, x) (with λ and R the last estimates for the parameters, based on (4)) against log(1 + h/x), where F(λ, R, x) is log[f(x + h)/f(x)] less the third term of (2), will give a straight line for a gamma distribution, with R and λ as estimates. Using the plot, R can be calculated from the slope and λ from the intercept of the regression line. A plot of residuals can be drawn to detect systematic deviations from the gamma model. Since the regression uses weightings, the interpretation of the results must also be weighted.

Appendix C
[...] And this is exactly the area under the curve of a chi-square distribution with 2R degrees of freedom, from 0 to 2λx. [...] which formula is based on a usual approximation of chi-square fractiles [12].