The Weibull distribution is widely used in the parametric analysis of lifetime data. It is often more convenient to work instead with the equivalent extreme value distribution, which is the distribution of the logarithm of a Weibull random variable. The main advantage of the extreme value distribution is that, unlike the Weibull distribution, it has location and scale parameters. This paper discusses statistical inference for the extreme value distribution with censored data. Numerical simulations are performed to examine the finite-sample behavior of the parameter estimators. The procedures are then applied to real-world data.
In medical research, data documenting the time until the occurrence of a particular event, such as the death of a patient, are frequently encountered. Such data are called time-to-event data, also referred to as lifetime, survival time, or failure time data, and in general have a right-skewed distribution. For this reason, the Weibull distribution is widely used. In place of the Weibull distribution, it is often more convenient to work with the equivalent extreme value distribution, in which the data are the logarithms of observations from the Weibull distribution (Lawless [
A common feature of lifetime data is that some observations may be censored. For example, the event of interest may not have happened to all patients: a patient undergoing cancer therapy might die in a road accident. In this case, the observation period is cut off before the event of interest occurs. The observation is then said to be censored, and it would be incorrect to treat the time to death as the lifetime. When data are censored, conventional statistical methods cannot be applied directly. Instead, special statistical methods are necessary to handle such data. Censored data have been studied by many authors. Kaplan and Meier [
This paper is organized as follows: Section
The probability density function for the extreme value distribution considered here is
The above probabilities can be combined into the single expression
This yields the sampling distribution of
Knowing that
It can be easily shown that for the extreme value distribution, the survival function is
Hence, the above likelihood function can be written as
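As a minimal sketch of the quantities involved, assume the standard parameterization of the (smallest) extreme value distribution with location `u` and scale `b`, so that the density is `exp(z - exp(z)) / b` and the survival function is `exp(-exp(z))` with `z = (y - u) / b`. The symbol and function names below are illustrative assumptions, not the paper's notation. An uncensored observation (indicator 1) contributes the log density to the log-likelihood, and a censored one (indicator 0) contributes the log survival function.

```python
import math

def ev_pdf(y, u, b):
    """Extreme value density with location u and scale b (assumed names)."""
    z = (y - u) / b
    return math.exp(z - math.exp(z)) / b

def ev_survival(y, u, b):
    """Survival function S(y) = P(Y > y) = exp(-exp((y - u) / b))."""
    z = (y - u) / b
    return math.exp(-math.exp(z))

def censored_loglik(u, b, y, delta):
    """Log-likelihood: sum of delta*log f(y) + (1 - delta)*log S(y)."""
    r = sum(delta)                      # number of uncensored observations
    total = -r * math.log(b)
    for yi, di in zip(y, delta):
        z = (yi - u) / b
        total += di * z - math.exp(z)   # log f = -log b + z - e^z, log S = -e^z
    return total

# If T is Weibull with scale alpha and shape beta, then Y = log T is
# extreme value with u = log(alpha) and b = 1 / beta.
alpha, beta, t = 2.0, 1.5, 1.7
s_weibull = math.exp(-((t / alpha) ** beta))
s_ev = ev_survival(math.log(t), math.log(alpha), 1.0 / beta)

# Toy data (hypothetical): log-lifetimes and event indicators.
y = [0.2, -0.5, 1.1, 0.7]
delta = [1, 0, 1, 1]
ll = censored_loglik(0.3, 0.8, y, delta)
```

The two survival values `s_weibull` and `s_ev` agree up to rounding, which illustrates the equivalence between the Weibull model for lifetimes and the extreme value model for log-lifetimes.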
From (
which is equivalent to
The above equations can be solved numerically, for example by Newton-Raphson iteration or random search, to locate the estimates,
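The following is one possible sketch of the Newton-Raphson approach, not necessarily the authors' implementation. It assumes the standard likelihood equations for location `u` and scale `b` (assumed symbol names): the first equation gives `u` in closed form for a given `b`, and substituting it into the second leaves a single equation `g(b) = 0`, which is solved by Newton-Raphson. The simulation helper and its censoring mechanism are hypothetical and serve only to produce test data.

```python
import math
import random

def fit_extreme_value(y, delta, tol=1e-10, max_iter=100):
    """MLE of (u, b) from log-lifetimes y and event indicators delta (1 = observed)."""
    r = sum(delta)
    if r == 0:
        raise ValueError("need at least one uncensored observation")
    y_max = max(y)
    mean_obs = sum(yi for yi, di in zip(y, delta) if di) / r
    # Crude starting value: moment-type estimate of the scale.
    m = sum(y) / len(y)
    sd = math.sqrt(sum((yi - m) ** 2 for yi in y) / len(y))
    b = max(sd * math.sqrt(6.0) / math.pi, 1e-3)
    for _ in range(max_iter):
        w = [math.exp((yi - y_max) / b) for yi in y]  # rescaled to avoid overflow
        s0 = sum(w)
        a = sum(yi * wi for yi, wi in zip(y, w)) / s0
        var = sum(yi * yi * wi for yi, wi in zip(y, w)) / s0 - a * a
        g = a - b - mean_obs              # profile score equation g(b) = 0
        dg = -var / (b * b) - 1.0         # derivative of g, always negative
        b_new = b - g / dg
        if b_new <= 0.0:                  # safeguard: keep the scale positive
            b_new = b / 2.0
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    s0 = sum(math.exp((yi - y_max) / b) for yi in y)
    u = y_max + b * math.log(s0 / r)      # closed-form location given b
    return u, b

def simulate(n, u, b, c_shift, rng):
    """Hypothetical design: extreme value lifetimes with extreme value censoring times."""
    y, delta = [], []
    for _ in range(n):
        life = u + b * math.log(rng.expovariate(1.0))
        cens = u + c_shift + b * math.log(rng.expovariate(1.0))
        y.append(min(life, cens))
        delta.append(1 if life <= cens else 0)
    return y, delta

rng = random.Random(1)
# c_shift = b * log(7/3) gives about 30% censoring under this design.
y_sim, d_sim = simulate(500, 0.0, 1.0, math.log(7.0 / 3.0), rng)
u_hat, b_hat = fit_extreme_value(y_sim, d_sim)
```

Reducing the problem to one dimension keeps the iteration simple; a full two-parameter Newton-Raphson step using the matrix of second derivatives would be an equally valid choice.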
From (
which are equivalent to
To make inferences about
where
It is often difficult to evaluate the expectations in
where
From the usual large-sample theory, we have
Thus,
where the matrix
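To make the matrix computation concrete, here is a hedged sketch of the observed information matrix (the negative matrix of second derivatives of the log-likelihood) and its inverse, again assuming location `u` and scale `b` as symbol names. The formulas are obtained by differentiating the censored log-likelihood twice and hold at any parameter value; in practice they would be evaluated at the maximum likelihood estimates. The data and the evaluation point below are purely illustrative.

```python
import math

def observed_information(u, b, y, delta):
    """Negative Hessian of the censored extreme value log-likelihood at (u, b)."""
    r = sum(delta)
    s_e = s_ze = s_zze = s_dz = 0.0
    for yi, di in zip(y, delta):
        z = (yi - u) / b
        ez = math.exp(z)
        s_e += ez
        s_ze += z * ez
        s_zze += z * z * ez
        s_dz += di * z
    i_uu = s_e / b**2
    i_ub = (s_e + s_ze - r) / b**2
    i_bb = (2.0 * s_ze + s_zze - 2.0 * s_dz - r) / b**2
    return [[i_uu, i_ub], [i_ub, i_bb]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix; estimates the covariance matrix when m is the information."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical data and an arbitrary evaluation point (not an MLE).
y = [0.2, -0.5, 1.1, 0.7, -1.3]
delta = [1, 0, 1, 1, 1]
info = observed_information(0.3, 0.8, y, delta)
cov = invert_2x2(info)
```

When `info` is evaluated at the estimates, the square roots of the diagonal entries of `cov` serve as approximate standard errors for the location and scale estimates.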
From the asymptotic normality of
respectively, where
which is equivalent to
with
Therefore, we have
where
Hence, since
Note that the interval always lies in the positive half of the axis.
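The two interval constructions can be compared with a short sketch. It assumes that the log-transformed interval treats the logarithm of the scale estimate as approximately normal with standard error `se / est` (delta method), and that the normal quantile 1.96 was used; both are assumptions, although they appear consistent with the intervals reported in the data analysis later in the paper. The estimate and standard error below are backed out from the reported "Normal" interval [0.1081, 0.2217] for the scale parameter.

```python
import math

Z = 1.96  # approximate 0.975 standard normal quantile (assumed)

def normal_interval(est, se, z=Z):
    """Interval based on the normal approximation for the estimate itself."""
    return est - z * se, est + z * se

def log_interval(est, se, z=Z):
    """Interval based on the normal approximation for log(est); always positive."""
    factor = math.exp(z * se / est)
    return est / factor, est * factor

# Recover the point estimate and standard error from the reported Normal interval.
lower, upper = 0.1081, 0.2217
b_hat = 0.5 * (lower + upper)
se_b = 0.5 * (upper - lower) / Z

log_lo, log_hi = log_interval(b_hat, se_b)
norm_lo, norm_hi = normal_interval(b_hat, se_b)
```

Under these assumptions `log_lo` and `log_hi` come out close to the reported "Log" interval [0.1168, 0.2327]. Because the endpoints are obtained by multiplying and dividing a positive estimate by a positive factor, the interval cannot extend below zero.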
The procedures based on the normal approximation are appropriate for quite large sample sizes. An appealing alternative is to use likelihood ratio procedures. Chi-squared (
Consider the test problem
where
at which
Note that
Similarly, a
where
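A likelihood ratio interval can be computed by inverting the test numerically. The sketch below does this for the scale parameter only, under the assumed parameterization with location `u` and scale `b`: for fixed `b` the maximizing `u` is available in closed form, so the profile log-likelihood is cheap to evaluate, and the interval endpoints are the values of `b` at which twice the drop in profile log-likelihood equals the 0.95 chi-squared quantile with one degree of freedom. Bisection is used here for simplicity; it is a choice made for this illustration, not a method stated in the paper. An interval for the location parameter is analogous but needs an inner numerical maximization over `b`.

```python
import math

CHI2_1_95 = 3.841459  # 0.95 quantile of chi-squared with 1 degree of freedom

def profile_loglik_scale(b, y, delta):
    """Log-likelihood maximized over u for a fixed scale b."""
    r = sum(delta)
    y_max = max(y)
    s0 = sum(math.exp((yi - y_max) / b) for yi in y)
    u = y_max + b * math.log(s0 / r)           # closed-form maximizer over u
    ll = -r * math.log(b)
    for yi, di in zip(y, delta):
        z = (yi - u) / b
        ll += di * z - math.exp(z)
    return ll

def profile_score(b, y, delta):
    """Profile score equation for b; decreasing in b, zero at the MLE."""
    r = sum(delta)
    y_max = max(y)
    w = [math.exp((yi - y_max) / b) for yi in y]
    mean_obs = sum(yi for yi, di in zip(y, delta) if di) / r
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w) - b - mean_obs

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def lr_interval_scale(y, delta, cut=CHI2_1_95):
    """Return (b_hat, lower, upper) for the likelihood ratio interval of b."""
    span = max(y) - min(y)
    b_hat = bisect(lambda b: profile_score(b, y, delta), 1e-6 * span, span + 1.0)
    ll_max = profile_loglik_scale(b_hat, y, delta)

    def excess(b):
        return 2.0 * (ll_max - profile_loglik_scale(b, y, delta)) - cut

    lo = b_hat / 2.0
    while excess(lo) < 0.0:                    # expand until the cut is exceeded
        lo /= 2.0
    hi = b_hat * 2.0
    while excess(hi) < 0.0:
        hi *= 2.0
    return b_hat, bisect(excess, lo, b_hat), bisect(excess, b_hat, hi)

# Hypothetical data: log-lifetimes and event indicators.
y = [0.1, -0.8, 0.5, 1.2, -1.5, 0.3, 0.9, -0.2, 0.7, -2.1, 0.4, 1.0]
delta = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
b_hat, b_lower, b_upper = lr_interval_scale(y, delta)
```

Unlike the normal-approximation interval, the resulting interval is generally not symmetric about the estimate, because it follows the shape of the likelihood.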
Several experimental simulations were carried out to assess the performance of the confidence intervals discussed in Section
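A simulation of this kind can be sketched as follows. The design is an assumption made for illustration: lifetimes and censoring times are both drawn from extreme value distributions with a shift chosen to hit a target censoring fraction, which need not match the censoring mechanism used in the paper. The code estimates the empirical coverage probability of nominal 95% intervals based on the normal approximation (location and scale) and on the log transformation (scale), using location `u` and scale `b` as assumed symbol names.

```python
import math
import random

Z95 = 1.959964  # 0.975 quantile of the standard normal

def fit(y, delta, tol=1e-10, max_iter=100):
    """Newton-Raphson on the profile score for b, then closed-form u."""
    r = sum(delta)
    y_max = max(y)
    mean_obs = sum(yi for yi, di in zip(y, delta) if di) / r
    m = sum(y) / len(y)
    b = max(math.sqrt(6.0 * sum((yi - m) ** 2 for yi in y) / len(y)) / math.pi, 1e-3)
    for _ in range(max_iter):
        w = [math.exp((yi - y_max) / b) for yi in y]
        s0 = sum(w)
        a = sum(yi * wi for yi, wi in zip(y, w)) / s0
        var = sum(yi * yi * wi for yi, wi in zip(y, w)) / s0 - a * a
        step = (a - b - mean_obs) / (-var / (b * b) - 1.0)
        b_new = b - step if b - step > 0.0 else b / 2.0
        done = abs(b_new - b) < tol
        b = b_new
        if done:
            break
    s0 = sum(math.exp((yi - y_max) / b) for yi in y)
    return y_max + b * math.log(s0 / r), b

def std_errors(u, b, y, delta):
    """Standard errors from the inverse observed information at the MLE."""
    z = [(yi - u) / b for yi in y]
    i_uu = sum(math.exp(zi) for zi in z) / b**2
    i_ub = sum(zi * math.exp(zi) for zi in z) / b**2
    i_bb = (sum(delta) + sum(zi * zi * math.exp(zi) for zi in z)) / b**2
    det = i_uu * i_bb - i_ub**2
    return math.sqrt(i_bb / det), math.sqrt(i_uu / det)

def coverage(n, u, b, cens_frac, reps, seed=0):
    """Empirical coverage of nominal 95% intervals under the assumed design."""
    rng = random.Random(seed)
    shift = b * math.log((1.0 - cens_frac) / cens_frac)  # targets cens_frac censoring
    hits = {"u_normal": 0, "b_normal": 0, "b_log": 0}
    for _ in range(reps):
        y, delta = [], []
        for _ in range(n):
            life = u + b * math.log(rng.expovariate(1.0))
            cens = u + shift + b * math.log(rng.expovariate(1.0))
            y.append(min(life, cens))
            delta.append(1 if life <= cens else 0)
        u_hat, b_hat = fit(y, delta)
        se_u, se_b = std_errors(u_hat, b_hat, y, delta)
        hits["u_normal"] += abs(u_hat - u) <= Z95 * se_u
        hits["b_normal"] += abs(b_hat - b) <= Z95 * se_b
        hits["b_log"] += abs(math.log(b_hat / b)) <= Z95 * se_b / b_hat
    return {k: v / reps for k, v in hits.items()}

ecp = coverage(n=50, u=0.0, b=1.0, cens_frac=0.3, reps=300)
```

With only 300 replications the Monte Carlo error of each coverage estimate is a little over one percentage point, so small differences between methods should not be over-interpreted. Increasing `reps` reduces this error at the cost of run time.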
Simulation results, empirical coverage probability (ECP) and empirical mean length (EML) of 95% confidence intervals of
Censoring | n | Method | ECP (%), location | ECP (%), scale | EML, location | EML, scale |
---|---|---|---|---|---|---|
20% | 20 | 1 (2) | 94.0 | 87.8 (91.2) | 0.9508 | 0.7566 (0.7770) |
20% | 20 | LR | 94.4 | 93.4 | 1.0217 | 0.8167 |
20% | 50 | 1 (2) | 94.0 | 93.8 (94.2) | 0.6233 | 0.4948 (0.5000) |
20% | 50 | LR | 93.4 | 94.4 | 0.6335 | 0.5100 |
20% | 100 | 1 (2) | 94.6 | 92.6 (93.0) | 0.4379 | 0.3458 (0.3476) |
20% | 100 | LR | 93.8 | 92.6 | 0.4348 | 0.3510 |
30% | 20 | 1 (2) | 95.6 | 89.4 (92.6) | 1.0412 | 0.8303 (0.8564) |
30% | 20 | LR | 95.2 | 94.2 | 1.1397 | 0.9081 |
30% | 50 | 1 (2) | 95.4 | 93.4 (94.4) | 0.6604 | 0.5245 (0.5308) |
30% | 50 | LR | 95.2 | 94.6 | 0.6764 | 0.5431 |
30% | 100 | 1 (2) | 94.8 | 92.8 (94.4) | 0.4657 | 0.3696 (0.3718) |
30% | 100 | LR | 93.4 | 94.2 | 0.4647 | 0.3760 |
40% | 20 | 1 (2) | 93.4 | 88.4 (91.0) | 1.1755 | 0.9163 (0.9520) |
40% | 20 | LR | 93.6 | 93.0 | 1.2944 | 1.0241 |
40% | 50 | 1 (2) | 93.4 | 93.2 (95.0) | 0.7226 | 0.5659 (0.5738) |
40% | 50 | LR | 93.0 | 95.0 | 0.7469 | 0.5898 |
40% | 100 | 1 (2) | 94.6 | 92.6 (94.0) | 0.5101 | 0.3999 (0.4026) |
40% | 100 | LR | 93.8 | 94.2 | 0.5118 | 0.4081 |
50% | 20 | 1 (2) | 91.6 | 88.4 (92.8) | 1.3248 | 0.9886 (1.0345) |
50% | 20 | LR | 94.2 | 93.4 | 1.4620 | 1.1317 |
50% | 50 | 1 (2) | 94.2 | 90.0 (92.0) | 0.8143 | 0.6132 (0.6236) |
50% | 50 | LR | 94.8 | 93.8 | 0.8524 | 0.6451 |
50% | 100 | 1 (2) | 94.8 | 95.0 (95.4) | 0.5909 | 0.4471 (0.4508) |
50% | 100 | LR | 94.0 | 95.2 | 0.5983 | 0.4584 |
60% | 20 | 1 (2) | 92.8 | 87.8 (93.0) | 1.6431 | 1.1305 (1.2012) |
60% | 20 | LR | 94.6 | 94.0 | 1.7439 | 1.3587 |
60% | 50 | 1 (2) | 94.8 | 93.8 (94.4) | 1.0082 | 0.7065 (0.7219) |
60% | 50 | LR | 94.8 | 95.0 | 1.0733 | 0.7500 |
60% | 100 | 1 (2) | 95.0 | 94.0 (94.8) | 0.6803 | 0.4822 (0.4871) |
60% | 100 | LR | 93.4 | 95.4 | 0.6954 | 0.4975 |
70% | 20 | 1 (2) | 89.8 | 87.2 (91.4) | 2.2459 | 1.3388 (1.4785) |
70% | 20 | LR | 94.0 | 93.6 | 2.0291 | 1.8666 |
70% | 50 | 1 (2) | 94.4 | 92.0 (92.6) | 1.2435 | 0.7913 (0.8140) |
70% | 50 | LR | 93.6 | 93.2 | 1.3256 | 0.8641 |
70% | 100 | 1 (2) | 92.8 | 91.0 (93.2) | 0.8572 | 0.5503 (0.5576) |
70% | 100 | LR | 93.6 | 94.2 | 0.8894 | 0.5739 |
It should be noted that although the normal approximation procedures are adequate for large samples, the approximations on which they are based are rather poor for small samples (Lawless [
We now look at the results for the censored data case presented in Table
In the case of the location parameter (
We also discuss a graphical method for checking the adequacy of the distribution. The extreme value survival function satisfies
where
Plots of
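The plotting coordinates for such a check can be computed with a short sketch. It assumes the linearizing relation implied by the extreme value survival function, namely that `log(-log S(y))` equals `(y - u) / b` and is therefore linear in `y` (with `u` and `b` as assumed names for location and scale), and it uses the Kaplan-Meier estimate in place of the unknown survival function. The function names and the toy data are hypothetical.

```python
import math

def kaplan_meier(y, delta):
    """Kaplan-Meier estimate: list of (event time, estimated survival just after it)."""
    data = sorted(zip(y, delta))
    n = len(data)
    surv = 1.0
    out = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i                 # subjects with time >= t
        events = 0
        j = i
        while j < n and data[j][0] == t:
            events += data[j][1]        # count uncensored observations tied at t
            j += 1
        if events > 0:
            surv *= 1.0 - events / at_risk
            out.append((t, surv))
        i = j
    return out

def ev_plot_points(y, delta):
    """Points (y, log(-log S_hat(y))); roughly linear if the model is adequate."""
    return [(t, math.log(-math.log(s)))
            for t, s in kaplan_meier(y, delta)
            if 0.0 < s < 1.0]           # the transform is undefined at 0 and 1

# Toy data (hypothetical): log-lifetimes and event indicators.
y = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
km = kaplan_meier(y, delta)
points = ev_plot_points(y, delta)
```

If the points fall close to a straight line, the slope and intercept of that line give rough graphical estimates of `1 / b` and `-u / b` respectively. Marked curvature suggests that the extreme value model may not fit.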
The procedures are applied to a real data set. Pike [
Confidence interval for
Method | C.I. of location parameter | Length | C.I. of scale parameter | Length |
---|---|---|---|---|
Normal | [5.3756, 5.5374] | 0.1618 | [0.1081, 0.2217] | 0.1136 |
Log | [5.3756, 5.5374] | 0.1618 | [0.1168, 0.2327] | 0.1159 |
LR | [5.3800, 5.5400] | 0.1600 | [0.1204, 0.2419] | 0.1215 |
Plot of
In this paper, we have investigated the inference procedures for the extreme value distribution with censored observations. The extreme value distribution is a useful model in the parametric analysis of lifetime data. Through numerical studies, the inference procedures, based on the maximum likelihood estimates, were examined. The usual normal approximation procedures were enhanced by means of the log transformation and the likelihood ratio method. By analysis of the empirical coverage probabilities and the empirical mean lengths of the confidence intervals, we have found that the likelihood ratio method is very effective for small sample sizes when data are heavily censored. A graphical method for checking the adequacy of the distribution was also discussed. The procedures were then applied to a real-world data set.