Pairwise Multiple Comparison Adjustment Procedure for Survival Functions with Right-Censored Data

The aim of this study is to propose a new pairwise multiple comparison adjustment procedure based on Genz's numerical computation of probabilities from a multivariate normal distribution. This method is applied to the results of two-sample log-rank and weighted log-rank statistics where the survival data contained right-censored observations. We conducted Monte Carlo simulation studies not only to evaluate the familywise error rate and power of the proposed procedure but also to compare the procedure with conventional methods. The proposed method is also applied to the data set consisting of 815 patients on a liver transplant waiting list from 1990 to 1999. It was found that the proposed method can control the type I error rate, and it yielded similar power as Tukey's and high power with respect to the other adjustment procedures. In addition to having a straightforward formula, it is easy to implement.


Introduction
Survival analysis is based on making inferences from the time-to-event data. It provides many statistical procedures for studying the data, including the time from a correctly identified origin until the occurrence of a certain event [1]. One of the main interests in survival analysis is evaluating the equality of survival functions for different groups. Many tests such as log-rank and weighed log-rank have been proposed [2][3][4][5][6][7][8]. Although these tests made important contributions to survival analysis, they can only provide overall or two-sample comparison results. Researchers will fail if they use these tests to compare one with another in a multigroup study design because the probability of making at least one type I error will be increased above the critical level. To prevent this mistake, pairwise multiple comparison procedures are needed. In case of the inequality of more than two groups, it is necessary to correctly decide which groups are different from the others. The appropriate way to control the type I error is to consider the familywise error (FWE) rate, which is the probability of making at least one type I error when making all pairwise comparisons [9].
Adjustment methods such as Bonferroni, Holm, and Sidak methods are commonly used in the literature. However, in survival analysis this topic has only recently been studied. Adjustment methods are applied to the results of two-sample log-rank and weighted log-rank tests. Bonferroni is the most preferred method among the others. In a two-sided test, Bonferroni assumes the significance level as ( /2) × , where is the number of pairwise comparisons in the study, but it fails when controlling the familywise error rate. In spite of its simplicity, it has been determined to be a conservative method in survival analysis [9,10]. Logan et al. proposed two different adjustment methods that consider the correlation among the pairwise tests [9]. One of the methods was derived from multivariate normal distribution, while the other was obtained from a simulated martingales approach. Koziol and Reid used the Sidak adjustment method to calculate the pairwise comparisons results of weighted log-rank tests. Although it generates more consistent results than Bonferroni's, it was also found to be conservative [11]. Not only were pairwise multiple comparisons proposed, but comparisons against a single control group were also proposed for survival functions with right-censored data in the statistical literature. Chakraborti and Desu developed linear rank tests, and Chen proposed a generalized Steel's test and an alternative method to the generalization of Steel's test [12][13][14][15]. The aim of this study is to propose a new pairwise multiple comparison adjustment procedure based on Genz's numerical computation of probabilities from a multivariate normal distribution [16,17]. This method is applied to the results of two-sample log-rank and weighted log-rank statistics where the survival data contained right-censored observations. In Section 2, some notations are given, and the construction of the simulation study is detailed. In the simulation studies SAS PROC LIFETEST and R package with mvtnorm library were used. Moreover, all adjustment methods are applied to a real life-time data set and they are compared with each other. The results and discussion about other studies are evaluated in Section 3. Finally, conclusions are mentioned in Section 4.

Notation and Background.
Let ( ) be the survival function of the th group for = 1, . . . , , where is the number of groups. The null and alternative hypotheses for the survival functions are where is the largest observed time.
Let ( , , , ), for = 1, . . . , , indicates that an independent sample for right-censored survival data where is right-censored time, is the indicator variable for censoring ( = 0 if is censored; = 1 if is an event time), is the group indicator of 1, . . . , , and is a weight function. Let 1 < 2 < ⋅ ⋅ ⋅ < = 1, . . . , be distinct event times in the sample. At time , for the th group, let = ∑ : ≥ ( = ) and = ∑ : = ( = ) denote the number of individuals at risk and the number of events, respectively. Let = ∑ =1 and = ∑ =1 denote the number individuals at risk and the number of events, respectively. The weighted number of individuals at risk in the th group is Variance of and covariance for and ℎ are as follows, respectively: ) . (4) Because the sum of is equal to 0, they are linearly dependent. Accordingly, the general test statistic is constructed by selecting any − 1 of 's. The test statistic, Let be the number of all pairwise comparisons where = ( − 1)/2. The two-sided test statistic, ℎ , compares the groups and ℎ and follows a standard normal distribution.
The unadjusted value is ℎ = ( 2 1 > 2 ℎ ). The multiple comparison procedures that are used to adjust the values in this study are shown below: . Tukey: where and Φ are standard normal and cumulative standard normal functions, respectively.

Proposed Adjustment
, ] has a multivariate normal distribution with a mean of zero and a variance-covariance matrix Σ. Under the null hypothesis, the elements of Σ follow a rule which is Cov( ℎ , ℎ ) = 0.5, Cov( ℎ , ℎ ) = 0, and 14,15]. Medicine   3 The function of (a, b, Σ) is

Computational and Mathematical Methods in
For the integration shown above, we used "mvtnorm" library, released February 2, 2016, for numerical computation in R program. There are three algorithms available for evaluating normal probabilities: The default is the randomized Quasi-Monte-Carlo procedure by Genz (1992Genz ( , 1993. We used this approach because it is easy to use and calculate with R program. The proposed multiple adjustment procedures for the pairwise comparison of the th and ℎth groups are obtained using Φ and shown below: where Additionally, the critical value for the pairwise comparison can be evaluated with

Simulation Study.
We performed Monte Carlo simulation studies to examine the proposed and conventional adjustment procedures. The FWE rate and power of the adjustment procedures were obtained through the simulation results. In this study, the number of groups was determined as = 4; = 1, . . . , 4. The sample sizes were considered equal for each group as = 50, 150, and 250 to estimate the FWE rate, while it was just 250 in the power study. The right-censored survival times were derived from the exponential ∼ exp( ) and lognormal distribution ∼ lognormal( , 2 ). The censoring rate was considered to be 30%. Therefore, the censoring variable was generated from a Bernoulli distribution ∼ Bernoulli( = 0.70). Note that the censoring rate was fixed for each group in the FWE rate and power study. To obtain the adjusted values, Bonferroni, Scheffé, Sidak, SMM, Tukey, and the proposed adjustment procedure were applied to the pairwise comparison results of log-rank and weighted log-rank tests. For each scenario 1000 data sets were simulated independently.
To compare the FWE rates of the adjustment procedures, the survival times for each group were generated from the standard exponential distribution with = 1 and the lognormal distribution with a mean of = 0 and scale parameter = 0.5. The estimated FWE rates of the adjustment procedures were evaluated with respect to the critical value = 0.05. In the power study, we used exponential distributions with various parameters and lognormal distributions with = 0.5 but different values of . For power calculation, we calculated the probability of making a correct decision only for unequal pairs. Note that the exponential distribution provides a proportional hazards model while the lognormal distribution corresponds to location shifts in log survival times. The lognormal distributions with various means were used because they have different hazards at early times [15].

Application Data.
The data set was obtained from the free data sets used in the R package, "survival" [18,19]. It consisted of 815 patients on a liver transplant waiting list from 1990 to 1999 with six variables: age at the addition to the waiting list, sex, blood type, year in which a patient entered the waiting list, and time from the entry to end point. The final disposition of the patients was categorized as received a transplant, died while waiting, withdrew from the list, or censored. Blood type is a crucial factor which affects the waiting time for transplantation. Although the liver donation from subjects with blood type O can be used by patients with all blood types, a patient with blood type O can only receive donation from the subjects with blood type O. Thus, patients with O blood type on the waiting list have a disadvantage. These data is of historical interest and provides a useful example of competing risks, but it has little relevance to current practice. We used these data as an example to demonstrate the comparison of the proposed and conventional adjustment techniques on a real data set. We considered that the event is receiving a transplant, while the other categories of final disposition are censored. Table 1 shows the simulation results for the estimated FWE rates of the proposed and conventional adjustment procedures for exponential survival distribution with different sample sizes. Under the null hypothesis, FWE rates are expected to be 0.05. As the sample size increases, estimates get closer to the targeted value in all adjustment procedures. It is obvious that the Scheffé method is the most inefficient among the others. The proposed adjustment procedure and Tukey's present similar results. It can be seen that both adjustment procedures can control the type I error even for small samples. Their performance is followed by Sidak, SMM, and Bonferroni procedures. In Table 2, the estimates of the FWE rates for the survival times from the lognormal distribution with the parameters = 0 and = 0.5 are given. Unlike the previous simulation results, not all procedures give estimates that are close to the targeted value. The proposed adjustment procedure and Tukey's provide the most efficient results. The decrease in the performance of the adjustment procedures could depend on the type of distributions. Because an exponential distribution provides a more appropriate proportional hazard model than a lognormal distribution, this affects the performance of the log-rank and the weighted log-rank tests. Therefore, the adjustment procedures tend to cause errors.

Results and Discussion
Next, the simulation results are calculated for the power of the proposed and conventional adjustment procedures for the exponential survival distribution. Under a variety of hypothesis configurations denoted by , the estimated power results are given in Table 3. As the values of become 4 Computational and Mathematical Methods in Medicine    Descriptive statistics of the application data set are given in Table 5 and the survival functions of the groups are shown in Figure 1. The overall comparison of blood type groups is conducted with log-rank test. The result is found to be highly significant ( 2 = 45.5, df = 3, and < 0.001). Thus, pairwise comparisons followed by multiple adjustment procedures were conducted, and the results are given in Table 6. All of the adjustment procedures had the same conclusions and present results that are similar to those that we observed in the simulation studies. The comparison results show that, with the exception of the pair of B and AB, all of the blood types are highly different from each other. The values obtained for each comparative test for the application data showed significant differences ( < 0.001) between the survival times of the blood groups except for the comparison of AB and B groups ( > 0.05). The results can be seen in Kaplan-Meier curves represented in Figure 1. The survival curves show a proportional structure until the middle of the  A statistician can use this method in usual data analysis procedure as follows. For example, to calculate the adjusted value for the comparison of the groups and ℎ, (1) calculate ℎ and Σ defined in Section 2.1, (2) use pvnorm command in mvtnorm library in R as follows: = rep(−Inf, ).
where is the number of all comparisons, and sigma is Σ.

Conclusions
In this study, we proposed a multiple adjustment procedure for the pairwise comparisons of survival functions with right-censored data. We conducted Monte Carlo simulation studies not only to evaluate the FWE rate and power of the proposed procedure but also to compare the procedure Computational and Mathematical Methods in Medicine 7 with conventional methods. It was found that the proposed method can control the type I error rate, and it yielded similar power as Tukey's and high power with respect to the other adjustment procedures. In addition to having a straightforward formula, it is easy to implement. This study has some limitations. The main issue was that the simulations were performed by using proposed and conventional methods. However, comparisons can be extended including the methods such as that of Logan et al. (2005) in the comparison. Logan et al. proposed two different adjustment methods that consider the correlation among the pairwise tests. One of the methods was derived from multivariate normal distribution, while the other was obtained from a simulated martingales approach. These models may work well for the data with proportional hazard structure. Future researches should take into account the models for comparisons.