This paper presents a mathematical model for estimating the length of a cancer's silent growth period. The method uses information from observed cancer incidence to reconstruct what is cautiously believed to be the time from malignant cancer initiation to diagnosis. Analyses show a decreasing hazard for cancer, indicating that the longer a patient survives, the more likely they are to reach the upper limit of their natural lifespan. Previous research has used the Weibull distribution to describe the mechanisms of cancer development. In contrast to the memoryless exponential distribution, which assumes a constant failure rate, the Weibull distribution has a hazard that changes with the time already survived and so preserves a memory of prior survival. This provides a simple but powerful way to relate the unobserved course of cancer to the observed one and thereby estimate the time between onset and diagnosis. The results indicate a window of opportunity for early intervention when cancer is most treatable. The method helps identify cancers with high mortality and prolonged periods of undetected growth, distinguishing the types of greatest public health concern.

Survival statistics in cancer research are often reported in terms of individual survival from the time of diagnosis. In cancer registry data, the true time at which malignant cells developed in the body is unknown because there is often no early indication of disease. The telltale signs and symptoms characteristic of cancer could be months, if not years, away. Causal factors may act in sequence to initiate or promote carcinogenesis, and ten or more years often pass between exposure to external factors and detectable cancer [

The two-parameter Weibull distribution is a popular lifetime model frequently used in biomedical sciences survival analysis to describe age-specific mortality and failure rates [

In this paper, we describe a methodology that utilizes the popular two-parameter Weibull model as its framework and develop a conditional Weibull survival model that accounts for the assumption that the individual survived up to the time of diagnosis. Using simple linear regression methods, we utilize information obtained from observed incidence data to estimate the length of the cancer latency period. When the hazard rate changes over time, the probability of failure is dependent on time, and the Weibull distribution allows for a “memory” of previous survival time for an observation [
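The "memory" property can be illustrated with a minimal sketch. It assumes the common parameterization S(t) = exp(-(t/scale)^shape); the function names and numeric values are hypothetical and are not taken from the paper. With shape = 1 (the exponential case) the conditional survival does not depend on the time already survived, whereas with shape < 1 it increases with prior survival.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape); shape = 1 gives the exponential."""
    return math.exp(-((t / scale) ** shape))

def conditional_survival(t, s, shape, scale):
    """P(T > s + t | T > s) = S(s + t) / S(s)."""
    return weibull_survival(s + t, shape, scale) / weibull_survival(s, shape, scale)

# Hypothetical values chosen only for illustration (time units arbitrary).
scale = 10.0
t = 5.0
for shape in (1.0, 0.5):
    unconditional = weibull_survival(t, shape, scale)
    after_5 = conditional_survival(t, 5.0, shape, scale)
    after_20 = conditional_survival(t, 20.0, shape, scale)
    print(shape, round(unconditional, 4), round(after_5, 4), round(after_20, 4))
```

For shape = 1 the three printed probabilities coincide; for shape = 0.5 they increase from left to right, which is the dependence on prior survival that the conditional model exploits.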

The statistical analysis of lifetime data is an important topic in many areas, including the biomedical, engineering, and social sciences [

The concepts of survival and hazard are essential to understanding survival analysis. The survival curve expresses the cumulative effect of the risks faced by an individual, and the hazard function characterizes the rate of change of the survival function over time. This indicates that where survival is quickly decreasing, hazard is high; if the survival curve is constant, the hazard is zero [
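The relationship between the two curves can be checked numerically. The sketch below assumes the standard identity h(t) = -d/dt ln S(t) and uses a made-up survival curve (not data from this study) that declines at a constant rate and then stays flat; the helper name `hazard_from_survival` is hypothetical.

```python
import numpy as np

def hazard_from_survival(times, survival):
    """Approximate h(t) = -d/dt ln S(t) by finite differences on a grid."""
    return -np.gradient(np.log(survival), times)

times = np.linspace(0.0, 10.0, 101)
# Hypothetical survival curve: constant-rate decline until t = 5, flat afterwards.
rate = 0.3
survival = np.where(times <= 5.0, np.exp(-rate * times), np.exp(-rate * 5.0))
hazard = hazard_from_survival(times, survival)

# Hazard on the declining segment (t = 1) and on the flat segment (t = 9).
print(hazard[10], hazard[90])
```

On the declining segment the estimated hazard equals the decline rate, and on the flat segment it is zero, matching the verbal description above.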

Let us suppose that

The hazard function, denoted by

To estimate the survival function, the Kaplan-Meier product-limit estimator method was used. This method is a nonparametric maximum likelihood estimator of the survival function used to estimate survival probabilities as a function of time. This method is favorable since it makes no assumption about the underlying distribution of the survival times and has become the most commonly used approach to survival analysis in medicine [
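As an illustration of the product-limit idea, the following is a minimal sketch of the estimator in NumPy. The function name and the toy follow-up data are hypothetical; this is not the software used for the analysis, and a production analysis would normally rely on an established survival package.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Return distinct event times and the product-limit estimate S(t) at each.

    durations: follow-up time for each subject
    events:    1 if the event (death) was observed, 0 if censored
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                    # still under observation at t
        deaths = np.sum((durations == t) & (events == 1))   # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical toy data (months of follow-up), not SEER records.
times, surv = kaplan_meier([2, 3, 3, 5, 8, 8, 12], [1, 1, 0, 1, 1, 0, 0])
print(times, surv)
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why no distributional assumption is needed.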

We assume a sample of

The Weibull model is widely applied in survival analysis and has been shown to fit data involving the time to appearance of tumors or death in animals subject to carcinogenic insults over time [
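For reference, one common parameterization of the two-parameter Weibull model is sketched below. The symbols $k$ (shape) and $\lambda$ (scale) are an assumption made here for illustration and may differ from the notation used elsewhere in this paper.

```latex
\begin{align}
S(t) &= \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right], \qquad t \ge 0,\\
h(t) &= \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}.
\end{align}
```

Under this form, $k = 1$ gives a constant hazard $1/\lambda$ (the exponential case), $k < 1$ gives a hazard that decreases over time, and $k > 1$ gives a hazard that increases over time.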

As previously stated, we assume that observations are available on the independent failure times of

When

Using conditional probability theory with the popular two-parameter Weibull model as our framework, we developed a mathematical model that accounts for the assumption that the individual survived up to the time of diagnosis. By introducing this additional parameter and using the memory property of the Weibull distribution, we reconstruct what we cautiously believe to be the time between cancer initiation and diagnosis. Using information from observed incidence data available from cancer registries, our analysis showed that the Weibull shape parameter,

To illustrate the timeline of events, Figure

Timeline of events showing the unobserved and observed periods of cancer, beginning at the time of disease initiation.

Using the Weibull survival function in (

The Kaplan-Meier method was used to estimate the survival probabilities which were used as the outcome variable in our model. For this analysis, we use linear regression methods to estimate the Weibull model parameters because of their computational simplicity and ease of graphical interpretation [
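The regression idea can be sketched for the ordinary two-parameter Weibull model, assuming the parameterization S(t) = exp(-(t/scale)^shape), for which ln(-ln S(t)) is linear in ln(t). This is the standard linearization only, not the conditional extension developed in this paper; the function name and the synthetic values are hypothetical.

```python
import numpy as np

def fit_weibull_by_regression(times, survival):
    """Estimate Weibull shape and scale from survival probabilities.

    Uses ln(-ln S(t)) = shape * ln(t) - shape * ln(scale), which is a
    straight line in ln(t). Requires t > 0 and 0 < S(t) < 1.
    """
    x = np.log(np.asarray(times, dtype=float))
    y = np.log(-np.log(np.asarray(survival, dtype=float)))
    slope, intercept = np.polyfit(x, y, 1)   # ordinary least-squares line
    shape = slope
    scale = np.exp(-intercept / slope)
    return shape, scale

# Synthetic check: survival values generated from a known Weibull curve
# (shape 0.7 and scale 12 are hypothetical, not estimates from the paper).
t = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
s = np.exp(-((t / 12.0) ** 0.7))
shape_hat, scale_hat = fit_weibull_by_regression(t, s)
print(shape_hat, scale_hat)
```

In practice the survival values would be Kaplan-Meier estimates rather than exact curve values, so the points scatter around the line and the slope gives a quick graphical read of whether the hazard is decreasing (slope below 1) or increasing (slope above 1).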

Utilizing the methods outlined by Nadler and Zurbenko [

Monthly observations of newly diagnosed adult cancer cases in the United States were obtained for the period of 1973–2008 available through the Surveillance, Epidemiology and End Results (SEER) Program. SEER is a national registry for cancers that is commissioned by the National Cancer Institute which began maintaining records of patients with cancer in 1973 [

The types of cancer chosen for this analysis were restricted to those with high mortality rates and limited availability of effective treatment options, so that the disease follows its natural course and potential biases are minimized. High mortality rates also maximize the amount of information available to the researcher, allowing more precise estimates. Overall, six in situ and invasive primary cancers were selected and analyzed, with a total sample size of 556,696. These are acute myeloid leukemia and cancers of the brain, liver, lung and bronchus, pancreas, and stomach. Deaths were counted as events only when the cause of death was cancer-specific.

The conditional survival plot in Figure

Observed conditional survival plots with Weibull shape parameter for melanoma, breast, lung, and pancreatic cancers.

Early diagnosis of cancers can result from increased screening and can alter the natural course of disease. The collection of SEER data began in 1973, and the availability of screening and effective treatments for breast cancer and melanoma has increased dramatically in the last 20 years. In some cases, routine screening can identify lesions in patients who otherwise might never have been diagnosed in their lifetime. These biases, known as lead-time bias and overdiagnosis bias, can interfere with our ability to generalize results from a sample to the population. To avoid these potential biases, cancers with low death rates and known treatment courses (i.e., breast cancer and melanoma) were excluded from this analysis.

In Figure

Estimating the approximate time of lung cancer initiation using the Weibull model extension.

Applying the Weibull model extension to a subset of cancers in the SEER data, we determined the length of the latency periods and presented these estimates in Figure

Estimated interval between first cancer-related mutation and diagnosis obtained using the Weibull model extension.

A biological study published in

As mentioned earlier, Manton et al. [

Estimated lag determined by Manton et al. [

Cancer type | Lag estimate |
---|---|
Liver 817 | 21.2 ± 2.3 |
Lung and bronchus 804 | 19.8 ± 5.8 |
Lung and bronchus 807 | 15.3 ± 6.2 |
Lung and bronchus 814 | 19.3 ± 4.0 |
Pancreas 814 | 14.8 ± 9.4 |

Overall, our results are consistent with those obtained by Manton et al. [

For lung and bronchus cancers, our results fall within the estimates provided in Table

In this paper, a new algorithm is presented that uses survival information obtained strictly after disease diagnosis to estimate what we cautiously believe to be the time between cancer onset and diagnosis. The ability to “retrace” the progression of prior survival histories depends on whether the hazard increases or decreases over time. Overall, our quantitative analysis indicates that there is a large window of opportunity for diagnosis while the disease is still in the curative stage. Although the Weibull model extension is an approximate solution and may not provide exact estimates, it allows the medical community to rank cancer types by increasing risk and to distinguish the “silent killers” with long undetected periods of growth and a high risk of death. Making this information available opens opportunities for new research on early detection and preventative screening that could improve the prognosis of the disease. The main advantages of the conditional Weibull model are its simplicity, since it uses only simple linear regression methods, and its ability to support further research on medical issues through mathematical modeling.