Measurement of Quality of Life VII. Statistical Covariation and Global Quality of Life Data: The Method of Weight-Modified Linear Regression

Existing standard statistical procedures do not seem to meet the needs of researchers in global quality-of-life (QOL) research, where the most interesting question is the exact size of statistical covariations. A method for estimating this size is necessary if we are to isolate the most important factors connected to quality of life among the thousands of possible factors in life. We have developed a new procedure that we call "weight-modified linear regression". Unfortunately, as demonstrated in the discussion, the procedure is not without problems and weaknesses. In spite of the critique, we believe the procedure to be valid for the purpose of estimating the size of the covariation in population studies that include psychometric measures of global quality of life. As we need to be certain that the procedure is valid, we hereby invite the scientific community to give us further critique of the method and suggestions for its improvement.


INTRODUCTION
Over the years, countless statistical procedures have been developed, most of them extremely successful, but when it comes to psychometrics there are still problems. Just as we still face severe problems with the scales when trying to collect our psychometric data [1,2], it is still not that simple to determine the absolute extent of the statistical covariation from such data. We have not found a traditional method that elegantly solves the problem of how large the statistical covariation is between two variables in absolute terms, when one or both of these variables is a global quality-of-life (QOL) measure.
The difficulty arises when a plot including a psychometric measure of global quality of life gives a swarm of dots (x,y) with a very irregular distribution. This is often the case, as the answers "extremely bad" and "bad" from the bottom of a 5-point quality-of-life Likert scale may be rare. Often an extremely low quality-of-life rating will be related to a short-lived condition characterized by huge efforts in improving quality of life. There is often an excessive mortality rate for the group of people feeling "extremely bad". An experienced condition deemed worse than death is not stable, and failure to solve problems within a short time-span will often lead to suicide. This dynamic makes the group with the lowest quality of life very small, sometimes only 1% of the population.
Low global quality of life is statistically connected to poor health [3]. In accordance with this, the life mission theory predicts a very high prevalence of sickness for the group with a very low global quality of life [4]. To understand the nature of life crises, we have to understand the structure of the ego. In order to radically improve global quality of life, it seems necessary to have a fundamental transformation of the psyche. Such a shift in personality has been labeled an "ego death" in Buddhism [5] or a psychic death by Jung [6,7], because it implies a shift back to the existential position of the natural self, i.e., living the true purpose of life [4]. The problem of healing and improving the global quality of life seems strongly connected to the unpleasantness of the ego-death experience, and often a person who lacks the understanding necessary for personal development chooses death instead of personal transformation. The quality-of-life peak experience, which is typically described as moments of total being, great clarity, intense happiness, and deep understanding of life, occurs spontaneously for only a few percent of the population [8].
It is therefore not difficult to understand why population surveys that include psychometric variables of global quality of life have an extremely low representation of individuals with very low quality of life. As the commonly used statistical methods do not put a special focus on persons in the rare groups, these groups often drown in the very large and statistically dominating center groups, sometimes making the connections seem smaller than they are.
As an example, we can use the statistical covariation between global quality of life on one side and the positive/negative way of viewing life on the other side. In the Copenhagen Perinatal Birth Cohort 1959-61, 4,589 persons answered both the questions on quality of life and the questions on the view of life (see Table 1 at end of paper) [9].
Only 10 of the respondents actually had an extremely negative view of life. A linear regression would be almost completely blind to this group, which comprises only 0.2% of the respondents. The variation over the interval of measurement (the difference between the best and the worst group) was 58.8%, with the eighth and most negative group responsible for almost one-sixth of this difference. In the dimension of immediate subjective well-being, this eighth extreme group was responsible for one-quarter of the total difference in the interval. To establish the measurement uncertainty and the real size of the interval, we needed a suitable statistical tool to help us determine the extent of this statistical covariation, where we have global quality of life on one axis and a similar psychometric variable, view of life, on the other. To resolve this problem we have developed a method we call "weight-modified linear regression".
Using the formula for weight-modified linear regression, which follows below after a short discussion of the strengths and weaknesses of the most commonly used statistical procedures, the extent of covariation was determined to be 60.0 ± 4.4% (significance level: p = 0.05). The extent of the covariation between the view of life, measured by 21 questions in the SEQOL, and the global (total) quality of life is thus at least 60.0 - 4.4 = 55.6%.

THE TRADITIONAL METHODS
Pearson's correlation coefficient, normally labeled r, is one of the most widely used statistical tools for measuring association. It presumes linearity, and the method also gives the statistical significance (p) associated with rejecting the null hypothesis that the population correlation is zero. It does not, however, give an expression for the extent of the covariation in absolute terms (percentages), and therefore we are not able to judge the clinical relevance in patient studies or the significance of a factor in a population survey.
Linear regression describes the most plausible straight line through the swarm of dots (x,y), and correspondingly indicates the significance. From this line we can calculate the variation over the measured interval. However, in a very distorted distribution as described above, the small percentage of dots representing people who have given answers in the extremely low categories may be far from the regression line dominated by the majority, and valuable information may be lost.
We discovered the problem when these standard methods did not reproduce the results we found by simply measuring a drawn curve. Most often the calculated covariation was too small because of a nonlinear tendency in the extreme end of the interval.
We then searched for an explanation of the problem and found that it was necessary to take the structure of the answering scale into consideration. To find a better expression of the covariation (x vs. y), we first calculated the average of the answers y for each possible category of the Likert scale (that is, for each of the Likert scale point groups; please see the formalism below), and then ran the regression using these new points, giving them equal weight (hence the name of the procedure). A problem arises if the number of answers in a group comes close to zero. Another problem is that the extreme answers sometimes come from people making fun of the survey or from mentally ill persons who do not report their quality-of-life state fairly. In practical use of the formula, we have set the minimum number of answers in an answer group to four.
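The contrast between an ordinary regression and a regression through the category means can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic data, and the choice to raise an error for a group with fewer than four answers are assumptions made only for this illustration.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope: every respondent counts once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def group_mean_slope(xs, ys, min_group_size=4):
    """Slope through the per-category means: every category counts once."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    for x, values in groups.items():
        if len(values) < min_group_size:
            # Assumption: too-small groups are rejected here rather than merged.
            raise ValueError(f"group x={x} has fewer than {min_group_size} answers")
    group_x = sorted(groups)
    group_y = [sum(groups[x]) / len(groups[x]) for x in group_x]
    # The same least-squares formula, applied to one point per category.
    return ols_slope(group_x, group_y)


# Synthetic, hypothetical data: a tiny bottom group with a much lower mean.
counts = {0.1: 4, 0.3: 20, 0.5: 200, 0.7: 500, 0.9: 276}
means = {0.1: 0.20, 0.3: 0.60, 0.5: 0.65, 0.7: 0.70, 0.9: 0.75}
xs = [x for x, n in counts.items() for _ in range(n)]
ys = [means[x] for x in xs]

print("ordinary slope:  ", round(ols_slope(xs, ys), 3))
print("group-mean slope:", round(group_mean_slope(xs, ys), 3))
```

In this constructed example the ordinary slope is dominated by the three large groups, whereas the group-mean slope gives the four respondents in the lowest category the same influence as any other category.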
Below follows the mathematical formalism of this procedure, which we have named "weight-modified linear regression". The idea behind the solution was originally formulated by Hilden (the mathematical formalism) and Ventegodt [10].

WEIGHT-MODIFIED LINEAR REGRESSION (TREND-ANALYSIS, MODIFIED REGRESSION)
The purpose of the special weighted regression used in the Copenhagen Quality of Life study was to measure the linear component of a possibly curvilinear relationship between a factor x and a quality-of-life measure y, and to do it in such a way that the special relationships in the interesting minority groups, that is, those with a particularly high or low x value, are revealed at the cost of possible flat (horizontal) or opposing trends in the central part of the x distribution. This assumes that the minority groups are not too small; a group that is too small is amalgamated with the neighboring group.
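The amalgamation of too-small groups can be sketched as follows. The text does not specify how the merged group is placed on the x-axis or which neighbor an interior group should join, so this sketch assumes a count-weighted mean of the two x values and, for interior groups, the smaller of the two neighbors. The function name and the default minimum of four answers are likewise assumptions for illustration.

```python
def amalgamate_small_groups(groups, min_size=4):
    """Merge every group with fewer than min_size answers into a neighbor.

    groups: list of (x_value, list_of_y_values), sorted by x_value.
    Returns a new list; the input is left unchanged.
    """
    merged = [(x, list(ys)) for x, ys in groups]
    while len(merged) > 1:
        small = next(
            (i for i, (_, ys) in enumerate(merged) if len(ys) < min_size), None
        )
        if small is None:
            break  # every remaining group is large enough
        if small == 0:
            neighbor = 1
        elif small == len(merged) - 1:
            neighbor = small - 1
        else:
            # Assumption: an interior group joins its smaller neighbor.
            left, right = len(merged[small - 1][1]), len(merged[small + 1][1])
            neighbor = small - 1 if left <= right else small + 1
        lo, hi = min(small, neighbor), max(small, neighbor)
        (x_lo, y_lo), (x_hi, y_hi) = merged[lo], merged[hi]
        total = len(y_lo) + len(y_hi)
        # Assumption: the merged group sits at the count-weighted mean x.
        new_x = (x_lo * len(y_lo) + x_hi * len(y_hi)) / total if total else x_hi
        merged[lo:hi + 1] = [(new_x, y_lo + y_hi)]
    return merged


# Hypothetical example: the lowest group has only two answers.
example = [
    (0.1, [0.2, 0.3]),
    (0.3, [0.5] * 10),
    (0.5, [0.6] * 50),
    (0.7, [0.7] * 60),
    (0.9, [0.8] * 30),
]
for x, ys in amalgamate_small_groups(example):
    print(round(x, 3), len(ys))
```

Other placement rules, such as keeping the neighbor's original x value, are equally defensible; the choice affects the slope and should be reported.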
This approach is only useful when x can take on only a few values, $x_1, x_2, \ldots, x_k$ (typically, as in QOL1 [11], $k = 5$ and $x_1 = 0.1$, $x_2 = 0.3$, $x_3 = 0.5$, $x_4 = 0.7$, and $x_5 = 0.9$ for our use with the five-point Likert scale; if the result is calculated from a mean of, say, five Likert scales, as in QOL5 [11], the number k will be 25). Let $n_1, n_2, \ldots, n_k$ be the number of observations in each of the k x-groups (for example, $n_2 = 84$ signifies that 84 people answered $x_2 = 0.3$ = "poor"). The slope is defined as

$$h = \frac{\sum_{j=1}^{k} (x_j - \bar{x})\,\mu_j}{S}, \qquad S = \sum_{j=1}^{k} (x_j - \bar{x})^2, \qquad \bar{x} = \frac{1}{k}\sum_{j=1}^{k} x_j \qquad \text{(Eq. 1)}$$

where $\mu_j$ (for consistency with the notation for x one could prefer $y_{mj}$, or alternatively the Greek letter nu, $\nu_j$) is the mean of y in the j'th x-group, and $\Sigma$ denotes summation over the k groups, each group entering with the same weight regardless of $n_j$.

In order to understand the formula for h, it might be useful to note that if $\mu_j = a$ (a constant), then $h = 0$; and if $\mu_j = b x_j$ (that is, it is proportional to $x_j$), then $h = b$ (the proportionality constant). If $\mu_j$ varies linearly with $x_j$, that is, $\mu_j = a + b x_j$, then h is again equal to b, which expresses the average change in the quality-of-life measure y when x increases by one unit. If $\mu_j$ varies nonlinearly as a function of $x_j$, h can also be regarded as the slope of a linearized variation of y with x.
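The special cases mentioned above can be checked in a few lines. The derivation assumes that the slope has the equal-weight form $h = \sum_j (x_j - \bar{x})\mu_j / S$, with $\bar{x} = \frac{1}{k}\sum_j x_j$ and $S = \sum_j (x_j - \bar{x})^2$; this form is an assumption consistent with the properties stated in the text, not a quotation of the original formula.

```latex
\begin{align*}
\sum_{j=1}^{k} (x_j - \bar{x}) &= 0, \\
\sum_{j=1}^{k} (x_j - \bar{x})(a + b x_j)
  &= a \sum_{j=1}^{k} (x_j - \bar{x}) + b \sum_{j=1}^{k} (x_j - \bar{x})\, x_j \\
  &= b \sum_{j=1}^{k} (x_j - \bar{x})(x_j - \bar{x}) = b S, \\
h &= \frac{bS}{S} = b .
\end{align*}
```

The third line uses $\sum_j (x_j - \bar{x})\,\bar{x} = 0$. Setting $b = 0$ gives the constant case $h = 0$, and setting $a = 0$ gives the proportional case $h = b$.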
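If $\bar{y}_j$ denotes the estimate of $\mu_j$, that is, the average of the $n_j$ y-values of the people who answered $x_j$, then h is estimated as

$$\hat{h} = \frac{\sum_{j=1}^{k} (x_j - \bar{x})\,\bar{y}_j}{S}$$

where $S = \sum_{j=1}^{k} (x_j - \bar{x})^2$, with $\bar{x}$ the unweighted mean of the k values $x_j$, is the same denominator as in (Eq. 1). On multiplying by $(x_k - x_1)$, we obtain the estimated increase in y from the lowest to the highest (k'th) x-group (using the linearized form if y is a nonlinear function of x). This is denoted as ∆2:

$$\Delta_2 = (x_k - x_1)\,\hat{h}$$

The difference between the observed means $\bar{y}_j$ of the highest and lowest groups is denoted by

$$\Delta_1 = \bar{y}_k - \bar{y}_1$$

The quantity ∆1 will tally with ∆2 if the $\bar{y}_j$ depend more or less linearly on $x_j$; otherwise the discrepancy between ∆1 and ∆2 will reveal a departure from linearity. The uncertainty in ∆1 is large and difficult to estimate. On the other hand, the uncertainty in ∆2 is easy to calculate (the following section assumes familiarity with the basic calculation of the standard error). The standard error (SE) associated with ∆2 is, regardless of the degree of linearity and normality, equal to the square root of

$$\mathrm{SE}^2 = \frac{(x_k - x_1)^2}{S^2} \sum_{j=1}^{k} \frac{(x_j - \bar{x})^2\,\mathrm{SD}_j^2}{n_j}$$

Here $\mathrm{SD}_j$ is the standard deviation of the y-distribution in the group $x_j$. For our purposes, we can as a rule assume that the variability of y is the same in all the k x-groups, that is, $\mathrm{SD}_1 = \mathrm{SD}_2 = \ldots = \mathrm{SD}_k$. The common SD is then estimated naturally as the square root of

$$\mathrm{SD}^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2}{N - k}, \qquad N = \sum_{j=1}^{k} n_j$$

where $y_{ij}$ is the y-value of the i'th person in the j'th x-group. Finally, $\varepsilon = 1.96 \cdot \mathrm{SE}$. The quantity ε is interpreted as follows: there is a 95% certainty that ∆2 is estimated with an error of less than plus or minus ε.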
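The whole calculation can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the slope is taken as the equal-weight regression through the group means, the common SD as the pooled within-group standard deviation, and ε as 1.96 times the standard error. The function name, the returned keys, and the synthetic data are hypothetical.

```python
from math import sqrt


def weight_modified_regression(groups, z=1.96):
    """Estimate the slope, the differences Delta1 and Delta2, and the margin eps.

    groups: list of (x_j, list_of_y_values), sorted by x_j.
    z: normal quantile for the interval (assumed 1.96 for 95%).
    """
    k = len(groups)
    xs = [x for x, _ in groups]
    ns = [len(ys) for _, ys in groups]
    n_total = sum(ns)
    if k < 2 or min(ns) < 1 or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")

    y_means = [sum(ys) / len(ys) for _, ys in groups]
    x_bar = sum(xs) / k                       # unweighted mean of the x values
    s = sum((x - x_bar) ** 2 for x in xs)     # denominator S
    slope = sum((x - x_bar) * m for x, m in zip(xs, y_means)) / s

    span = xs[-1] - xs[0]
    delta2 = span * slope                     # linearized lowest-to-highest change
    delta1 = y_means[-1] - y_means[0]         # observed lowest-to-highest change

    # Pooled within-group variance (assumes equal variability in all groups).
    ss_within = sum(
        (y - m) ** 2 for (_, ys), m in zip(groups, y_means) for y in ys
    )
    pooled_var = ss_within / (n_total - k)
    se = (span / s) * sqrt(
        pooled_var * sum((x - x_bar) ** 2 / n for x, n in zip(xs, ns))
    )
    return {"slope": slope, "delta1": delta1, "delta2": delta2,
            "se": se, "eps": z * se}


# Synthetic, hypothetical data: each group scatters 0.1 around its mean.
spec = [(0.1, 0.20, 4), (0.3, 0.60, 20), (0.5, 0.65, 200),
        (0.7, 0.70, 500), (0.9, 0.75, 276)]
data = [(x, [m - 0.1, m + 0.1] * (n // 2)) for x, m, n in spec]
result = weight_modified_regression(data)
for key, value in result.items():
    print(key, round(value, 4))
```

Because each group enters the slope with equal weight, the term for the smallest group dominates the standard error; this is the price paid for giving the rare groups full influence on the estimate.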

DISCUSSION
Jørgen Hilden has recently given important comments on this method [12]. They can be divided into three points:
• "Even though the procedure seemed reasonable for the purpose that this project have made of it, we need a much more profound argumentation in order to convince the readers, or in other words that it extracts from the data exactly what is interesting, with respect to the quality of life. Put in another way it ignores the little movements in the huge center-groups and accentuates the tendency in the small outer-groups to exactly the extent that we are interested in."
An appropriate answer to this just critique seems to be that, seen philosophically, all the groups corresponding to the Likert scale points are equally interesting. Therefore it seems quite reasonable to give them the same importance in the interpretation, irrespective of the absolute number of participants in each group, which is exactly what the proposed procedure does.
• "A related question is the following: It is obvious that you cannot use the procedure to anything intelligent if the outer-groups are very small (this caution is also discussed in the text above). But the question is: how much is "very" small? N = 2,5,10,25 or how much? And does the answer rely on the total N?
An operational rule must be added to the procedure, and this has to be based on objective criteria. The problem here is also, that the procedure is not derived from a conceptual criteria. Proper statistical procedures are derived from a desiderata of the form: "we wish to quantify in a certain degree..." (i.e., "we wish to quantify the next patient's increase in chance of being cured by shifting from placebo to medicament XYZ"), and there one must derive through mathematical analysis the effect-estimate that, with as little bias as possible and with as little standard error as possible, will increase the amount desired. It is not enough with a vague desire like "we wish to measure the rise in the quality of life in a way that stresses the outer-groups". As it is, the procedures remain ad hoc and theoretically uninteresting."
• "There is also a problem with the terminology, since statistical rules ignore any of the Likert scale point groups as they all carry the same meaning and importance."
Hilden [12] gives an important argument against the procedure. But we believe that we actually are using a conceptual criterion: when we philosophically state that all the Likert scale point groups are of equal interest, this might be wiser than it seems. To go to one extreme: what if there is only one point in a group? It seems intuitively meaningless to ascribe to one point in the group "very low quality of life" the same statistical meaning as to 1,000 points or answers in a center group. But if we truly want to understand the profound concept of human quality of life, the single person from group five really is as interesting as the 1,000 persons from group two. He or she simply is our only source of knowledge of that state of being.
Compare the well-known example of one cancer patient in a thousand having a complete, spontaneous remission. This patient might hold the key to healing, so this patient can be considered of similar interest to the other 999 who do not recover. The conclusion is that, because of special conditions such as a possible nonlinearity at the extreme ends of the quality-of-life scales, it seems philosophically much more meaningful to extract all the possible knowledge from just one or a few answers in an extreme quality-of-life group than to try to extrapolate to these extreme groups from the variation among the central groups. Only a human being who has lived the extreme can report on extreme states of being.
From time to time, respondents will appear who are not very serious or who consciously choose to obstruct the survey by giving impossible or untrue answers. Most such invalid questionnaires are easily spotted by their odd characteristics (e.g., the answer "5" to 100 questions in a row) and can be removed in the routine qualitative inspection of the incoming questionnaires in a survey. The remaining problem seems to be largely eliminated if we require at least four respondents in each answer group.
Another question, which we have raised ourselves, is whether we have to extend this linear method into the nonlinear area, as the quality-of-life curves often lose their linearity at the extreme ends of the global quality-of-life scales. Interestingly, this problem also seems to be minimized with weight-modified linear regression, as the "jumps off the line" of the low-quality-of-life Likert scale point groups seem to be well represented in the results. Further research is needed to estimate the size of this problem.
Questions might also be raised if grouped values are used for the analysis. Malcolm Campbell [13] suggests that "in this situation, one may have to treat the two (grouped) variables as ordinal and estimate association using Kendall's tau correlation or Spearman's rho correlation. Another possibility is that the chi-square test for trend is used: this is applied to tables where either both row and column variables are ordinal or one is ordinal and the other dichotomous (having two categories), and it is closely related numerically to Pearson's correlation. Bland [14] discusses the test and gives a formula for the test statistic without explicitly stating that his test statistic is the number of cases multiplied by the squared Pearson correlation between the two variables. The formula used by SPSS is (N-1) * r²; they call their test the Mantel-Haenszel test for linear association (i.e., trend); see http://www.spss.com/tech/stat/Algorithms/11.5/crosstabs.pdf. Bland [14] points out that the values on the row and columns are usually taken to be 1, 2, 3..., but that other values could be used to reflect the relative importance of the different categories. Kendall's and Spearman's correlations might also "ignore" the interesting outlying cases, but choosing appropriate values for the rows and columns may give them more weight. Choices for the values may have to be ad hoc, and it would be necessary to experiment with different sets of values to observe the effect, but even then, the approach only comes up with a test and not an effect size. Correlation coefficients are effect sizes, but as mentioned above, the nonparametric correlations may well "miss" the cases with small scores. Perhaps that is also true of reporting means and medians: the emphasis is on measuring and reporting the average at the expense of the very small or very large." We are grateful for these remarks from Campbell.
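The trend test that Campbell describes can be sketched in Python as the linear-by-linear association statistic, (N - 1) multiplied by the squared Pearson correlation between the row and column scores, referred to a chi-square distribution with one degree of freedom. The function name, the default scores 1, 2, 3, ..., and the example table are assumptions for illustration; the sketch does not reproduce any particular package's output.

```python
from math import erfc, sqrt


def linear_trend_test(table, row_scores=None, col_scores=None):
    """Linear-by-linear association test for an ordered contingency table.

    table: list of rows of counts.
    row_scores, col_scores: numeric scores for the categories
    (default 1, 2, 3, ...). Both variables must show some variation.
    Returns (r, statistic, p_value).
    """
    n_rows, n_cols = len(table), len(table[0])
    u = row_scores or list(range(1, n_rows + 1))
    v = col_scores or list(range(1, n_cols + 1))
    cells = [(u[i], v[j], table[i][j])
             for i in range(n_rows) for j in range(n_cols)]

    n = sum(c for _, _, c in cells)
    mean_u = sum(a * c for a, _, c in cells) / n
    mean_v = sum(b * c for _, b, c in cells) / n
    s_uv = sum((a - mean_u) * (b - mean_v) * c for a, b, c in cells)
    s_uu = sum((a - mean_u) ** 2 * c for a, _, c in cells)
    s_vv = sum((b - mean_v) ** 2 * c for _, b, c in cells)

    r = s_uv / sqrt(s_uu * s_vv)       # Pearson correlation of the scores
    statistic = (n - 1) * r * r        # (N - 1) * r squared
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return r, statistic, p_value


# Hypothetical 3 x 3 table with a small extreme group in the first row.
counts = [[6, 3, 1],
          [40, 120, 90],
          [30, 150, 260]]
print(linear_trend_test(counts))
# Unequal row scores give the extreme category more leverage.
print(linear_trend_test(counts, row_scores=[1, 4, 5]))
```

As the quoted remarks point out, changing the scores changes how much the extreme categories count, but the result is still a test statistic and not an estimate of the size of the covariation.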

CONCLUSIONS
The existing standard statistical procedures do not seem to meet the needs of research in global quality of life, where the most interesting question is the exact size of statistical covariations. Estimating this size is necessary if we are to isolate the most important factors connected to quality of life among the thousands of possible factors in life. We have developed a new procedure that we call "weight-modified linear regression". Unfortunately, as demonstrated in the discussion, the procedure is not without problems and weaknesses.
When weight-modified linear regression is used in practice, the researcher should always also perform and report the usual Pearson correlation (or linear regression) analyses for comparison, so that the difference between the two is visible. He or she should in any case report the findings in the interesting (small-score) subgroups.
In spite of the critique, we believe the procedure to be valid for the purpose of estimating the size of the covariation in population studies including psychometric measures of global quality of life. As we need to be certain that the procedure is valid, we hereby invite the scientific community to give us further critique of the method and suggestions for its improvement.

Test: If H0: m1 = m2 = ... = mn is rejected, the groups are tested individually: H0: mi = mnon-i. ∆1 is the measured max-min difference. ∆2 is the variation calculated by weight-modified linear regression, ±ε being the measurement error at α = 0.05.