To incorporate biologically observed epidemics into multistage models of carcinogenesis, in this paper we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insight into human cancers but would also provide useful guidance for their prevention and control and for prediction of future cancer cases.
1. Introduction
It is universally recognized that each cancer tumor develops through stochastic proliferation and differentiation from a single stem cell which has sustained a series of irreversible genetic and/or epigenetic changes (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]; Zheng [7]). That is, carcinogenesis is a stochastic multistage model with intermediate cells subject to stochastic proliferation and differentiation. Furthermore, the number of stages and the number of pathways of the carcinogenesis process are significantly influenced by environmental factors underlying the individuals (Tan et al. [4, 5]; Weinberg [6]).
Another important observation in human carcinogenesis is that most human cancers cluster within families. Further, many cancer incidence data (such as the SEER data of NCI/NIH, USA) have documented that some cancers develop during pregnancy, so that some newborn babies have cancer at birth. These have been referred to as pediatric cancers. Well-known examples of pediatric cancers include retinoblastoma (a pediatric eye cancer), hepatoblastoma (a pediatric liver cancer), Wilms' tumor (a pediatric kidney cancer), and medulloblastoma (a pediatric brain tumor). Epidemiological and clinical studies in oncology have also revealed that inherited cancers are very common in many adult human cancers including lung cancer, colon cancer [8], uveal melanomas (adult eye cancer, [9]), and adult liver cancer (HCC, [10]).
Given the above results from cancer biology and human cancer epidemiology, the objective of this paper is to illustrate how to develop stochastic models of carcinogenesis incorporating these biological and epidemiological observations. Based on these models and cancer incidence data, we will then proceed to develop efficient statistical procedures to estimate unknown parameters in the model, to validate the model, and to predict cancer incidence.
In Section 2, we illustrate how to incorporate segregation of cancer genes in multistage stochastic models of carcinogenesis to account for inherited cancer cases. In Section 3, we will develop stochastic equations for the state variables of the model described in Section 2. By using these stochastic equations we will derive probability distributions of the state variables (i.e., the number of intermediate cancer cells) and the probability distribution of time to detectable cancer tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we proceed to develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining models in Sections 2–4, we proceed to develop a generalized Bayesian inference and Gibbs sampling procedures to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we proceed to develop a multistage model of human eye cancer with inherited cancer cases as described in Figure 2. We will illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally in Section 7, we will discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.
2. The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases
The k-stage multistage model of carcinogenesis views carcinogenesis as the end point of k (k≥2) discrete, heritable, and irreversible events (mutations, genetic changes or epigenetic changes) with intermediate cells subjected to stochastic proliferation and differentiation (Little [1]; Tan [2, 3]; Tan et al. [4, 5]; Weinberg [6]). Let N=J0 denote normal stem cells, T the cancer tumors, and Ji the ith stage initiated cells arising from the (i-1)th stage initiated cells (i=1,…,k) by some genetic and/or epigenetic changes. Then the model assumes N→J1→J2→⋯→Jk→Tumor with the Ji cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary Jk cells by clonal expansion (stochastic birth and death), where primary Jk cells are Jk cells which arise directly from Jk-1 cells; see Yang and Chen [11].
For example, Figure 1 is a multistage pathway for the squamous NSCLC (Non-Small Cell Lung Cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-β-Catenin-Tcf pathway for human colon cancer (Tan et al. [8], Tan and Yan [16]).
Figure 1: Histopathology lesions and genetic pathway of squamous cell carcinoma of Non-Small Cell Lung Cancer (NSCLC).
Figure 2: A multistage model of uveal melanoma (adult human eye cancer).
Figure 3: The APC-β-catenin-Tcf-myc pathway for human colon cancer.
Remark 1.
To develop stochastic multistage models of carcinogenesis, in the literature (Little [1], Tan [2], Zheng [7]) it is conveniently assumed that the Jk cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of Jk cells and one may identify Jk cells as tumors. It follows that the number of tumors is a Markov process and that the Jk cells are transient cells. In these cases, one needs only to deal with T(t) and Jj cells with j=1,…,k-1. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of Jk cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that T(t) are in general not Markov.
To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2, 20]. Further, without exception, every human being develops from the embryo in his/her mother’s womb (embryo stage, denote time by 0), where stem cells of different organs divide and differentiate to develop the different organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is a J2-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is a J1-stage person. Similarly, the individual is a normal person (N=J0 person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as a Ji(i=0,1,2) person if he/she is a Ji-stage person at the embryo stage. Then with respect to the cancer development in question, people in the population can be classified into 3 types: normal people (N=J0 people), J1 people, and J2 people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a k-stage multievent model given by J0→J1→⋯→Jk→Tumor; for J1 people the stochastic model of carcinogenesis is a (k-1)-stage multievent model given by J1→J2→⋯→Jk→Tumor; and for J2 people the stochastic model of carcinogenesis is a (k-2)-stage multievent model given by J2→⋯→Jk→Tumor.
To account for inherited cancer cases, let p1 be the proportion of J1 people in the population and p2 the proportion of J2 people in the population. In large human populations under steady-state conditions, one may practically assume that each pi is a constant independent of time (Crow and Kimura [21]). Then p0=1-p1-p2 (0<p1+p2<1) is the proportion of normal people (i.e., N=J0 people) in the population. Let n be the population size and ni(i=0,1,2) the number of Ji people in the population so that ∑_{u=0}^{2} n_u = n. Assume that n is very large and that marriage between people in the population is random with respect to cancer genes; then as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of (n1,n2) given n is 2-dimensional multinomial with parameters (n,p1,p2). That is,
$$
(n_1, n_2) \mid n \sim \text{Multinomial}(n;\, p_1, p_2). \tag{1}
$$
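As a small numerical illustration of (1), the following sketch evaluates the multinomial probability of (n1,n2) given n and checks two basic properties: the probabilities sum to one, and the mean of n1 is n·p1. The function name `genotype_count_pmf` and the values of n, p1, and p2 are illustrative assumptions, not quantities estimated in this paper.

```python
from math import comb


def genotype_count_pmf(n1, n2, n, p1, p2):
    """P{(n1, n2) | n} under Multinomial(n; p1, p2), with p0 = 1 - p1 - p2."""
    n0 = n - n1 - n2
    if min(n0, n1, n2) < 0:
        return 0.0
    p0 = 1.0 - p1 - p2
    # Multinomial coefficient n! / (n0! n1! n2!) written as two binomials.
    coef = comb(n, n1) * comb(n - n1, n2)
    return coef * (p0 ** n0) * (p1 ** n1) * (p2 ** n2)


# Hypothetical values chosen only for illustration.
n, p1, p2 = 10, 0.05, 0.01
pairs = [(a, b) for a in range(n + 1) for b in range(n + 1 - a)]
total = sum(genotype_count_pmf(a, b, n, p1, p2) for a, b in pairs)
mean_n1 = sum(a * genotype_count_pmf(a, b, n, p1, p2) for a, b in pairs)
print(total, mean_n1)  # total is 1 and mean_n1 is n * p1, up to rounding
```

The direct enumeration is only practical for small n; it is meant to make the roles of p0, p1, and p2 concrete.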
To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability J2 people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become J3-stage people at birth. Similarly, J1 people may acquire genetic and/or epigenetic changes during pregnancy to become J2 people at birth; although the probability is very small, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become J1 people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that a Ji(i=0,1,2) person at the embryo stage would only give rise to Ji stem cells and possibly Ji+1 stem cells at birth. This is equivalent to assuming that Ji people at the embryo stage would not generate Ji+j(j>1) stem cells at or before birth. This model is represented schematically in Figure 4. Notice that if k=2, one may practically assume that with probability one a J2 person at the embryo stage would develop cancer at or before birth (t0). If k=3, then with probability α(α>0), a J2 person at the embryo stage would develop cancer at or before birth.
Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth.
3. The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis
Because tumors are developed from primary Jk cells, for the above stochastic model, the identifiable response variables are T(t) and {Ju(t;i),i=0,1,2,u=i,i+1,…,k-1}, where T(t) is the number of cancer tumors at time t and Ju(t;i) is the number of Ju(u=i,i+1,…,k-1) cells at time t in people who are Ji people at the embryo stage (see [3, 5, 8, 23], Remarks 1 and 2). For people who have genotype Ji(i=0,1,2) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process {X~i(t),T(t),t>0}, where X~i(t)={Ju(t;i),u=i,i+1,…,k-1}′. For these processes, in the next subsections, we will derive stochastic equations for the state variables (Ju(t;i)i=0,1,2,u=i,…,k-1); we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4, 5, 8, 23], Tan and Zhou [9], Tan and Yan [16], and Tan and Chen [24, 25] and Remark 3.
Remark 2.
At any time (say t) the total number of Jk cells is equal to the total number of Jk cells generated from Jk-1 cells at time t plus the total number of Jk cells generated by cell division from other Jk cells at time t; the former Jk cells are referred to as primary Jk cells while the latter are not primary Jk cells. Since each tumor develops from a single primary Jk cell through a stochastic birth and death process, each primary Jk cell will generate at most one tumor. It follows that at any time the total number of Jk cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors the only identifiable state variables are the numbers of Jj cells (j=0,1,…,k-1) and the number of detectable cancer tumors.
Remark 3.
To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the Jk cells in the model N→J1⋯→Jk→Tumor) grow instantaneously into a cancer tumor as soon as they are generated and then apply the standard Markov theory to T(t) and to the state variables X~(t)={Ji(t),i=0,1,…,k-1}. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth into cancer tumors of Jk cells may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases, T(t) is not Markov so that the Markov theory method is not applicable to T(t). To develop analytical results and to resolve many difficult issues, Tan and his associates [4, 5, 24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] to assume that cancer tumors develop by clonal expansion from primary last stage cells. Through probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to T(t), then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through stochastic equation method we have shown in the Appendix that the classical approach provides a close approximation to discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation why the traditional approach (see [2, 22]) can still work well even though the Markov assumption for T(t) may not hold. In this paper we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.
3.1. Stochastic Equations for the State Variables
Assume now that an individual is a Ji(i=0,1,2) person at the embryo stage. Then in this individual, cancer is developed by a (k-i)-stage multievent model given by Ji→⋯→Jk→Tumor and the identifiable response variables are given by {X~i(t)′,T(t)}. To derive stochastic equations for the staging variables in X~i(t) in this individual, observe that for each i=0,1,2, X~i(t)′ is in general a Markov process although T(t) may not be Markov; see Remark 1, Tan [3] and Tan et al. [4, 5], Tan and Zhou [9], and Tan and Yan [16]. It follows that X~i(t+Δt)′ derives from X~i(t)′ through stochastic birth-death processes of Ju(u=i,i+1,…,k-1) cells and through stochastic transitions {Ju→Ju+1,u=i,i+1,…,k-1} during (t,t+Δt]. Let {Bu(t;i),Du(t;i),Mu(t;i)} be the number of births of Ju cells, the number of deaths of Ju cells, and the number of transitions Ju→Ju+1 during (t,t+Δt], respectively, in people who are Ji people at the embryo stage. Let M0(t) denote the number of transitions from N→J1 during (t,t+Δt]. Because the transition Ju→Ju+1 would not affect the number of Ju cells but only increase the number of Ju+1 cells (see Remark 4), by the conservation law we have the following stochastic equations for {Ju(t;i),u=i,…,k-1,i=0,1,2} (see Tan [3], Tan et al. [4, 5, 8], Tan and Zhou [9], and Tan and Yan [16]):
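$$
\begin{aligned}
J_i(t+\Delta t;i) &= J_i(t;i) + B_i(t;i) - D_i(t;i), \quad i = 0,1,2,\\
J_u(t+\Delta t;i) &= J_u(t;i) + B_u(t;i) - D_u(t;i) + M_{u-1}(t;i), \quad i < u \le k-1.
\end{aligned}
\tag{2}
$$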
Because {Bv(t;i),Dv(t;i),Mv(t;i),i=0,1,2,v=i,…,k-1} are random variables, the above equations are basically stochastic equations. To derive probability distributions of these variables, let bu(t) and du(t) denote the birth rate and the death rate at time t of the Ju(u=0,1,…,k-1) cells, respectively. Let βu(t)(u=0,1,…,k-1) be the transition rate at time t from Ju→Ju+1. Then, as shown in Tan [3], for (i=0,1,2,u=i,…,k-1) we have, to the order of o(Δt),
$$
\{B_u(t;i), D_u(t;i), M_u(t;i)\} \mid J_u(t;i) \sim \text{Multinomial}\{J_u(t;i);\, b_u(t)\Delta t,\, d_u(t)\Delta t,\, \beta_u(t)\Delta t\}. \tag{3}
$$
It follows that to the order of o(Δt),
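$$
\begin{aligned}
\{B_u(t;i), D_u(t;i)\} \mid J_u(t;i) &\sim \text{Multinomial}\{J_u(t;i);\, b_u(t)\Delta t,\, d_u(t)\Delta t\},\\
M_u(t;i) \mid J_u(t;i) &\sim \text{Binomial}\{J_u(t;i);\, \beta_u(t)\Delta t\} \sim \text{Poisson}\{J_u(t;i)\beta_u(t)\Delta t\} + o(\beta_u(t)\Delta t),
\end{aligned}
\tag{4}
$$
independently of {Bu(t;i),Du(t;i)}, u=0,1,…,k-1.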
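The one-step update implied by the conservation law (2) and the multinomial scheme (3) can be sketched as a small simulation. This is only a minimal illustration for small cell counts: the function name `simulate_step`, the list-based indexing of stages, and all numerical rates are assumptions introduced here, and each cell is drawn individually rather than with an efficient multinomial sampler.

```python
import random


def simulate_step(J, b, d, beta, dt, rng):
    """Advance the cell counts J[0..m-1] by one interval of length dt.

    Each cell independently divides (prob. b*dt), dies (prob. d*dt),
    produces one next-stage mutant (prob. beta*dt), or does nothing.
    """
    m = len(J)
    B, D, M = [0] * m, [0] * m, [0] * m
    for u in range(m):
        for _ in range(J[u]):
            r = rng.random()
            if r < b[u] * dt:
                B[u] += 1
            elif r < (b[u] + d[u]) * dt:
                D[u] += 1
            elif r < (b[u] + d[u] + beta[u]) * dt:
                M[u] += 1
    # Conservation law: a transition adds a cell to the next stage
    # but does not remove the parent cell.
    new_J = [J[u] + B[u] - D[u] + (M[u - 1] if u > 0 else 0) for u in range(m)]
    # M[m - 1] counts transitions out of the last tracked stage.
    return new_J, M[m - 1]


# Hypothetical rates and initial counts, for illustration only.
rng = random.Random(0)
J = [1000, 0, 0]
new_last_stage = 0
for _ in range(100):
    J, m_out = simulate_step(J, [0.0, 0.10, 0.10], [0.0, 0.09, 0.09],
                             [0.01, 0.01, 0.01], 0.1, rng)
    new_last_stage += m_out
print(J, new_last_stage)
```

In the notation of the text, list position 0 plays the role of Ji and the last position plays the role of Jk-1, so the second return value counts newly generated primary Jk cells in the interval.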
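From these distribution results, by subtracting from the random transition variables their conditional means, we obtain the following stochastic equations for the state variables X~i(t)(i=0,1,…,Min(2,k-1)):
$$
\begin{aligned}
dJ_i(t;i) &= J_i(t+\Delta t;i) - J_i(t;i) = B_i(t;i) - D_i(t;i)\\
&= \{J_i(t;i)\gamma_i(t)\}\Delta t + e_i(t;i)\Delta t, \quad i = 0,1,2,\\
dJ_u(t;i) &= J_u(t+\Delta t;i) - J_u(t;i) = M_{u-1}(t;i) + B_u(t;i) - D_u(t;i)\\
&= \{J_{u-1}(t;i)\beta_{u-1}(t) + J_u(t;i)\gamma_u(t)\}\Delta t + e_u(t;i)\Delta t,\\
&\qquad i = 0,1,\ldots,\mathrm{Min}(2,k-1);\ i < u \le k-1,
\end{aligned}
\tag{5}
$$
where γu(t)=bu(t)-du(t) for u=0,1,…,k-1 and where
$$
\begin{aligned}
e_i(t;i)\Delta t &= [B_i(t;i) - J_i(t;i)b_i(t)\Delta t] - [D_i(t;i) - J_i(t;i)d_i(t)\Delta t], \quad i = 0,1,2,\\
e_u(t;i)\Delta t &= [B_u(t;i) - J_u(t;i)b_u(t)\Delta t] - [D_u(t;i) - J_u(t;i)d_u(t)\Delta t] + [M_{u-1}(t;i) - J_{u-1}(t;i)\beta_{u-1}(t)\Delta t],\\
&\qquad i = 0,1,\ldots,\mathrm{Min}(2,k-1);\ i < u \le k-1.
\end{aligned}
$$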
From the above equations, by dividing both sides by Δt and letting Δt→0 we obtain
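$$
\begin{aligned}
\frac{d}{dt}J_i(t;i) &= J_i(t;i)\gamma_i(t) + e_i(t;i), \quad i = 0,1,\ldots,\mathrm{Min}(2,k-1);\\
\frac{d}{dt}J_u(t;i) &= J_u(t;i)\gamma_u(t) + J_{u-1}(t;i)\beta_{u-1}(t) + e_u(t;i), \quad i = 0,1,\ldots,\mathrm{Min}(2,k-1);\ i < u \le k-1.
\end{aligned}
\tag{6}
$$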
In the above equations, using the distribution results in (4) it can easily be shown that the random noises {eu(t;i),u=i,…,k-1,i=0,1,2} have expected value zero and are uncorrelated with the staging variables and T(t). The initial conditions at birth (t0) for the above stochastic differential equations are {Ju(t0;i)>0,u=i,i+1;Ju(t0;i)=0,u>i+1}.
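Because the random noises have expected value zero, taking expectations in the stochastic differential equations gives a linear system for the mean cell counts, dE[Ju]/dt = γu E[Ju] + βu-1 E[Ju-1]. The sketch below integrates this system for a time homogeneous model with a classical fourth-order Runge-Kutta scheme and compares a two-stage case with its elementary closed form. The function name `mean_cell_counts`, the step count, and the numerical rates are illustrative assumptions.

```python
from math import exp


def mean_cell_counts(J0, gamma, beta, t, steps=2000):
    """Integrate dE[J_u]/dt = gamma[u]*E[J_u] + beta[u-1]*E[J_{u-1}] by RK4.

    J0[u] is the mean count of stage u at time 0 and beta[u] is the
    (constant) transition rate from stage u to stage u + 1.
    """
    def rhs(x):
        return [gamma[u] * x[u] + (beta[u - 1] * x[u - 1] if u > 0 else 0.0)
                for u in range(len(x))]

    x = [float(v) for v in J0]
    h = t / steps
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = rhs([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = rhs([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (a + 2 * b + 2 * c + d) / 6.0
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


# Hypothetical two-stage example: constant normal stem cell pool (gamma_0 = 0).
x = mean_cell_counts([1e6, 0.0], [0.0, 0.05], [1e-6], 40.0)
# Closed form for this special case: beta_0 * J_0 * (exp(gamma_1 * t) - 1) / gamma_1.
closed = 1e-6 * 1e6 * (exp(0.05 * 40.0) - 1.0) / 0.05
print(x[1], closed)
```

A numerical integrator is used here rather than an explicit formula so that the sketch also applies when some of the γu coincide.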
Given the initial conditions (Ju(t0;i)>0,u=i,i+1) and (Ju(t0;i)=0,u>i+1) at birth (t0), the solution of the equations in (6) is given, respectively, by
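$$
\begin{aligned}
J_i(t;i) &= J_i(t_0;i)\,e^{\int_{t_0}^{t}\gamma_i(x)\,dx} + \eta_i(t;i), \quad i = 0,1,\ldots,\mathrm{Min}(2,k-1),\\
J_u(t;i) &= J_u(t_0;i)\,e^{\int_{t_0}^{t}\gamma_u(x)\,dx} + \int_{t_0}^{t} J_{u-1}(x;i)\beta_{u-1}(x)\,e^{\int_{x}^{t}\gamma_u(y)\,dy}\,dx + \eta_u(t;i)\\
&= \cdots = J_u(t_0;i)\,e^{\int_{t_0}^{t}\gamma_u(x)\,dx} + \sum_{v=1}^{u-i} J_{u-v}(t_0;i)\,\phi_u^{(v)}(t;i) + \sum_{v=1}^{u+1-i}\eta_u^{(v)}(t;i), \quad u = i+1,\ldots,k-1,
\end{aligned}
\tag{7}
$$
where i=0 if k=2, i=0,1 if k=3, and i=0,1,2 if k>3, and where
$$
\begin{aligned}
\phi_u^{(1)}(t;i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy + \int_{t_0}^{x}\gamma_{u-1}(y)\,dy}\,\beta_{u-1}(x)\,dx, \quad u = i,\ldots,k-1,\\
\phi_u^{(v)}(t;i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\phi_{u-1}^{(v-1)}(x;i)\,dx, \quad v = 2,\ldots,u-i,\\
\eta_u(t;i) &= \eta_u^{(1)}(t;i) = \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\,e_u(x;i)\,dx, \quad u = i,\ldots,k-1,\\
\eta_u^{(v)}(t;i) &= \int_{t_0}^{t} e^{\int_{x}^{t}\gamma_u(y)\,dy}\,\beta_{u-1}(x)\,\eta_{u-1}^{(v-1)}(x;i)\,dx, \quad v = 2,\ldots,u-i.
\end{aligned}
\tag{8}
$$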
If the model is time homogeneous so that {βu(t)=βu,bu(t)=bu,du(t)=du,γu(t)=bu-du=γu, u=0,1,…,k-1} and if γi≠γu if i≠u, the above solutions under the initial conditions (Ji+u(t0;i)>0,u=0,1;Ji+u(t0;i)=0,u>1) then reduce, respectively, to
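$$
\begin{aligned}
J_i(t;i) &= J_i(t_0;i)\,e^{\gamma_i(t-t_0)} + \eta_i^{(1)}(t;i), \quad i = 0,1,\ldots,\mathrm{Min}(2,k-1),\\
J_u(t;i) &= J_u(t_0;i)\,e^{\gamma_u(t-t_0)} + \beta_{u-1}\int_{t_0}^{t} J_{u-1}(x;i)\,e^{\gamma_u(t-x)}\,dx + \eta_u^{(1)}(t;i)\\
&= \cdots = J_u(t_0;i)\,e^{\gamma_u(t-t_0)} + \sum_{r=i}^{u-1} J_r(t_0;i)\Big(\prod_{v=r}^{u-1}\beta_v\Big)\sum_{l=r}^{u} A_{ru}(l)\,e^{\gamma_l(t-t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t;i)\\
&= \sum_{r=i}^{i+1} J_r(t_0;i)\Big(\prod_{v=r}^{u-1}\beta_v\Big)\sum_{l=r}^{u} A_{ru}(l)\,e^{\gamma_l(t-t_0)} + \sum_{r=i}^{u}\eta_u^{(u+1-r)}(t;i), \quad i < u \le k-1,
\end{aligned}
\tag{9}
$$
where for i≤v≤u,
$$
A_{iu}(v) =
\begin{cases}
1, & \text{if } i = u,\\
\prod_{l=i,\, l \ne v}^{u} (\gamma_l - \gamma_v)^{-1}, & \text{if } i < u.
\end{cases}
\tag{10}
$$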
Obviously, E[ηu(r)(t;i)]=0 for all (i=0,1,2,u=i,…,k-1,r=1,…,u-i+1). It follows that for (i=0,1,…,Min(k-1,2)), the expected values E[Jk-1(t;i)] of Jk-1(t;i) for homogeneous models with γi≠γu if i≠u are given by
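$$
\begin{aligned}
E[J_{k-1}(t;i)] &= E[J_{i+1}(t_0;i)]\Big(\prod_{u=i+1}^{k-2}\beta_u\Big)\sum_{v=i+1}^{k-1} A_{(i+1)(k-1)}(v)\,e^{\gamma_v(t-t_0)}\\
&\quad + E[J_i(t_0;i)]\Big(\prod_{u=i}^{k-2}\beta_u\Big)\sum_{v=i}^{k-1} A_{i(k-1)}(v)\,e^{\gamma_v(t-t_0)}, \quad i = 0,1,\ldots,\mathrm{Min}(k-1,2),\ k \ge 2,
\end{aligned}
\tag{11}
$$
where, as a convention, $\sum_{j=i+1}^{i} c_j = 0$ and $\prod_{j=i+1}^{i} d_j = 1$ for all $(c_j, d_j)$.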
Remark 4.
Because genetic changes and epigenetic changes occur during cell division, to the order of o(Δt), the probability is βu(t)Δt that one Ju cell at time t would give rise to 1 Ju cell and 1 Ju+1 cell at time t+Δt by genetic changes or epigenetic changes. It follows that the transition of Ju→Ju+1 would not affect the population size of Ju cells but only increase the size of the Ju+1 population.
3.2. Transition Probabilities and Probability Distributions of Staging Variables
Let f(x;n,p) denote the probability density function of a binomial random variable X~Binomial(n;p), h(x;λ) the probability density function of a Poisson random variable X~Poisson(λ), and g(x,y;n,p1,p2) the probability density function of a bivariate multinomial random vector (X,Y)~Multinomial(n;p1,p2). Using the stochastic equations of the staging variables given by (2) and using the probability distributions of the transition variables {Br(t;i),Dr(t;i),Mr(t;i)} in (4), as in Tan et al. [4, 5], we obtain the following transition probabilities of {Jr(t+Δt;i),r=i,…,k-1} given {Jr(t;i),r=i,…,k-1} for (i=0,1,…,Min(k-1,2)):
$$
\begin{aligned}
&P\{J_r(t+\Delta t;i) = v_r,\ r = i,i+1,\ldots,k-1 \mid J_r(t;i) = u_r,\ r = i,i+1,\ldots,k-1\}\\
&\quad = P\{J_i(t+\Delta t;i) = v_i \mid J_i(t;i) = u_i\} \times \prod_{j=i+1}^{k-1} P\{J_j(t+\Delta t;i) = v_j \mid J_j(t;i) = u_j,\ J_{j-1}(t;i) = u_{j-1}\},
\end{aligned}
\tag{12}
$$
where
$$
\begin{aligned}
P\{J_i(t+\Delta t;i) = v_i \mid J_i(t;i) = u_i\} &= \sum_{r=0}^{u_i} f(r;\, u_i,\, b_i(t)\Delta t) \times f\Big(u_i - v_i + r;\ u_i - r,\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Big),\\
&\qquad i = 0,1,\ldots,\mathrm{Min}(k-1,2),\\
P\{J_j(t+\Delta t;i) = v_j \mid J_j(t;i) = u_j,\ J_{j-1}(t;i) = u_{j-1}\} &= \sum_{r_1=0}^{u_j}\sum_{r_2=0}^{u_j - r_1} g(r_1, r_2;\, u_j,\, b_j(t)\Delta t,\, d_j(t)\Delta t) \times h(v_j - u_j - r_1 + r_2;\ u_{j-1}\beta_{j-1}\Delta t), \quad j > i.
\end{aligned}
\tag{13}
$$
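The first transition probability above can be checked numerically. In the sketch below, r births are drawn from the ui cells and the deaths are then drawn from the remaining ui-r cells with the conditional death probability; summing over r gives the one-step distribution of the first-stage count. The function names and the numerical values of the per-step probabilities are assumptions made for illustration.

```python
from math import comb


def binom_pmf(x, n, p):
    """f(x; n, p), taken to be zero outside 0 <= x <= n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def first_stage_transition(v, u, b_dt, d_dt):
    """P{J_i(t + dt) = v | J_i(t) = u}, summing over the number r of births."""
    # Death probability among the u - r cells that did not divide.
    q = d_dt / (1.0 - b_dt)
    return sum(binom_pmf(r, u, b_dt) * binom_pmf(u - v + r, u - r, q)
               for r in range(u + 1))


# Hypothetical values: u = 6 cells, b*dt = 0.02, d*dt = 0.015.
u, b_dt, d_dt = 6, 0.02, 0.015
probs = [first_stage_transition(v, u, b_dt, d_dt) for v in range(2 * u + 1)]
total = sum(probs)
mean = sum(v * p for v, p in enumerate(probs))
print(total, mean)  # total is 1 and mean is u * (1 + b_dt - d_dt), up to rounding
```

The mean check reflects the drift term of the stochastic equations: the expected one-step change is u(b-d)Δt.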
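Define the unobservable transition variables U~i(t)={Bi(t;i),(Bj(t;i),Dj(t;i)),j=i+1,…,k-1}′ (i=0,1,…,Min(k-1,2)). Then the joint probability density function of {X~i(t+Δt),U~i(t)} given X~i(t) is
$$
P\{\underset{\sim}{X}_i(t+\Delta t),\ \underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = P\{\underset{\sim}{X}_i(t+\Delta t) \mid \underset{\sim}{U}_i(t),\ \underset{\sim}{X}_i(t)\} \times P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\}, \tag{14}
$$
where
$$
\begin{aligned}
P\{\underset{\sim}{X}_i(t+\Delta t) \mid \underset{\sim}{U}_i(t),\ \underset{\sim}{X}_i(t)\} &= f\Big(J_i(t;i) - J_i(t+\Delta t;i) + B_i(t;i);\ J_i(t;i) - B_i(t;i),\ \frac{d_i(t)\Delta t}{1 - b_i(t)\Delta t}\Big)\\
&\quad \times \prod_{j=i+1}^{k-1} h\{J_j(t+\Delta t;i) - J_j(t;i) - B_j(t;i) + D_j(t;i);\ J_{j-1}(t;i)\beta_{j-1}(t)\Delta t\},
\end{aligned}
\tag{15}
$$
$$
P\{\underset{\sim}{U}_i(t) \mid \underset{\sim}{X}_i(t)\} = f(B_i(t;i);\ J_i(t;i),\ b_i(t)\Delta t) \times \prod_{j=i+1}^{k-1} g\{B_j(t;i),\ D_j(t;i);\ J_j(t;i),\ b_j(t)\Delta t,\ d_j(t)\Delta t\}. \tag{16}
$$
Let e~i(j) be a (k-i)×1 column vector with 1 in the jth position (1≤j≤k-i) and with 0 in other positions. Let u~=(ui,…,uk-1)′ and v~=(vi,…,vk-1)′ be (k-i)×1 column vectors of nonnegative integers (i.e., uj and vj are nonnegative integers). Then, by using the probability distribution results in (14)–(16) it can readily be shown that
$$
P\{\underset{\sim}{X}_i(t+\Delta t) = \underset{\sim}{v} \mid \underset{\sim}{X}_i(t) = \underset{\sim}{u}\} =
\begin{cases}
[u_j b_j(t) + (1 - \delta_{ji})u_{j-1}\beta_{j-1}(t)]\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} + \underset{\sim}{e}_i(j),\ j = i,\ldots,k-1,\\
u_j d_j(t)\Delta t + o(\Delta t), & \text{if } \underset{\sim}{v} = \underset{\sim}{u} - \underset{\sim}{e}_i(j),\ j = i,\ldots,k-1,\\
o(\Delta t), & \text{if } |\underset{\sim}{1}_{k-i}'(\underset{\sim}{u} - \underset{\sim}{v})| \ge 2.
\end{cases}
\tag{17}
$$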
The above results imply that X~i(t) is a (k-i)-dimensional birth-death process with birth rates {ibu(t),u=i,…,k-1}, death rates {idu(t),u=i,…,k-1}, and cross-transition rates {αu,u+1(j,t)=jβu(t),u=i,…,k-1,βu,v(j,t)=0ifv≠u+1}. (See Definition 4.1 in Tan ([22], Chapter 4)). Using these results, it can be shown that the Kolmogorov forward equation for the probabilities P{Jj(t;i)=uj,j=i,…,k-1∣Ji(0)=mi,Jj(0)=0,j>i}=P(uj,j=i,…,k-1;t)(i=0,1,Min(k-1,2)) in the above model is given by
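$$
\begin{aligned}
\frac{d}{dt}P(u_j,\ j = i,\ldots,k-1;\ t) &= P(u_i - 1,\ u_j,\ j = i+1,\ldots,k-1;\ t)\,(u_i - 1)\,b_i(t)\\
&\quad + \sum_{j=i+1}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1},\ u_j - 1,\ u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j - 1)\,b_j(t)\\
&\quad + \sum_{j=i}^{k-2} P(u_i, u_{i+1}, \ldots, u_j,\ u_{j+1} - 1,\ u_{j+2}, \ldots, u_{k-1};\ t)\,u_j\,\beta_j(t)\\
&\quad + \sum_{j=i}^{k-1} P(u_i, u_{i+1}, \ldots, u_{j-1},\ u_j + 1,\ u_{j+1}, \ldots, u_{k-1};\ t)\,(u_j + 1)\,d_j(t)\\
&\quad - P(u_i, u_{i+1}, \ldots, u_{k-1};\ t) \times \Big\{\sum_{j=i}^{k-1} u_j[b_j(t) + d_j(t)] + \sum_{j=i}^{k-2} u_j\beta_j(t)\Big\},
\end{aligned}
\tag{18}
$$
for uj=0,1,…,∞, j=i,…,k-1.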
By using the above set of differential equations, one can readily compute the probabilities P{Jj(t;i)=uj,j=i,…,k-1∣Ji(0)=mi}=P(uj,j=i,…,k-1;t) numerically.
3.3. The Probability Distributions of the Number of Detectable Tumors and Times to Tumors
As shown by Yang and Chen [11], malignant cancer tumors arise from primary Jk cells by clonal expansion, where primary Jk cells are Jk cells generated directly by Jk-1 cells. (Jk cells derived by stochastic birth of other Jk cells are not primary Jk cells). That is, cancer tumors develop from primary Jk cells through stochastic birth-death processes.
To derive the probability distribution for T(t) in Ji people in the population, let PT(s,t) denote the probability that a primary cancer cell at time s develops into a detectable cancer tumor by time t. (An explicit formula for PT(s,t) has been given in Tan [22], Chapter 8 and in Tan and Chen [24].) Then, as shown in Tan ([3, 22], Chapter 8), the conditional probability distribution of T(t) given {Jk-1(s;i),s≤t} in Ji people is Poisson with mean ω(t;i), where ω(t;i)=∫_{t0}^{t} Jk-1(s;i)βk-1(s)PT(s,t)ds. That is,
$$
T(t) \mid \{J_{k-1}(s;i),\ s \le t\} \sim \text{Poisson}(\omega(t;i)). \tag{19}
$$
Let Qi(j) be the probability that cancer tumors develop during (tj-1,tj] in Ji people in the population. For time homogeneous models with small βk-1, Qi(j) is then given by
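$$
Q_i(j) = E\{e^{-\omega(t_{j-1};i)} - e^{-\omega(t_j;i)}\} = e^{-\beta_{k-1}H_i(t_{j-1})} - e^{-\beta_{k-1}H_i(t_j)} + o(\beta_{k-1}), \tag{20}
$$
where $H_i(t) = \int_{t_0}^{t} E[J_{k-1}(x;i)]\,P_T(x,t)\,dx$.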
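To make the approximation for Qi(j) concrete, the sketch below evaluates it on a grid of age boundaries, under the simplifying assumption PT(x,t)=1 so that Hi(t) is just the integral of the mean number of Jk-1 cells. The callable `mean_J`, the function name `interval_probabilities`, the trapezoidal rule, and all numbers are illustrative assumptions; the o(βk-1) remainder is ignored.

```python
from math import exp


def interval_probabilities(mean_J, beta_last, times, grid=200):
    """Return Q(j) = exp(-beta*H(t_{j-1})) - exp(-beta*H(t_j)), j = 1..N.

    times = [t_0, t_1, ..., t_N]; mean_J(x) is an assumed callable giving
    E[J_{k-1}(x; i)]; P_T is taken to be 1 throughout.
    """
    t0 = times[0]

    def H(t):
        # Trapezoidal rule for the integral of mean_J over [t0, t].
        if t <= t0:
            return 0.0
        h = (t - t0) / grid
        inner = sum(mean_J(t0 + m * h) for m in range(1, grid))
        return h * (0.5 * mean_J(t0) + inner + 0.5 * mean_J(t))

    surv = [exp(-beta_last * H(t)) for t in times]
    return [surv[j - 1] - surv[j] for j in range(1, len(times))]


# Hypothetical example: a constant mean of 100 last-stage cells.
Q = interval_probabilities(lambda x: 100.0, 1e-3, [0.0, 1.0, 2.0, 3.0])
print(Q, sum(Q))
```

Because the terms telescope, the Q(j) sum to one minus the probability of remaining tumor free up to the last boundary, which gives a simple sanity check on any implementation.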
To derive Qi(j), denote by
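$$
\begin{aligned}
\theta_i^{(k-1)} &= E[J_i(t_0;i-1)]\,\beta_i \prod_{u=i+1}^{k-1}\Big(\frac{\beta_u}{\gamma_u}\Big), \quad i = 1,\ldots,\mathrm{Min}(3,k-1),\\
\lambda_u^{(k-1)} &= E[J_u(t_0;u)]\,\beta_u \prod_{v=u+1}^{k-1}\Big(\frac{\beta_v}{\gamma_v}\Big), \quad u = 0,1,\ldots,\mathrm{Min}(2,k-1),
\end{aligned}
\tag{21}
$$
and define the functions
$$
\psi_i^{(k-1)}(t) = \Big\{\prod_{u=i+1}^{k-1}\gamma_u\Big\}\sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x,t)\,dx, \quad i = 0,1,\ldots,\mathrm{Min}(2,k-1). \tag{22}
$$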
Applying results of E[Jk-1(t;i)] given in (11), for time homogeneous models with γi≠γj if i≠j we obtain Qi(j)’s as follows.
If k=2, then k-1=1. Hence, Q2(0)=1, Qi(0)=Q2(j)=0 for (i=0,1,j>0) and for j>0,
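$$
Q_0(j) = \Big\{e^{-\theta_1^{(1)}\psi_1^{(1)}(t_{j-1}) - \lambda_0^{(1)}\psi_0^{(1)}(t_{j-1})} - e^{-\theta_1^{(1)}\psi_1^{(1)}(t_j) - \lambda_0^{(1)}\psi_0^{(1)}(t_j)}\Big\} + o(\beta_1), \tag{23}
$$
$$
Q_1(j) = (1 - \alpha_1)\Big\{e^{-\lambda_1^{(1)}\psi_1^{(1)}(t_{j-1})} - e^{-\lambda_1^{(1)}\psi_1^{(1)}(t_j)}\Big\} + o(\beta_1). \tag{24}
$$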
If k≥3, then we have Qi(0)=0 for i=0,1 and Q2(0)=δ2kα and for j>0,
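$$
Q_0(j) = \Big\{e^{-\theta_1^{(k-1)}\psi_1^{(k-1)}(t_{j-1}) - \lambda_0^{(k-1)}\psi_0^{(k-1)}(t_{j-1})} - e^{-\theta_1^{(k-1)}\psi_1^{(k-1)}(t_j) - \lambda_0^{(k-1)}\psi_0^{(k-1)}(t_j)}\Big\} + o(\beta_{k-1}), \tag{25}
$$
$$
Q_1(j) = \Big\{e^{-\theta_2^{(k-1)}\psi_2^{(k-1)}(t_{j-1}) - \lambda_1^{(k-1)}\psi_1^{(k-1)}(t_{j-1})} - e^{-\theta_2^{(k-1)}\psi_2^{(k-1)}(t_j) - \lambda_1^{(k-1)}\psi_1^{(k-1)}(t_j)}\Big\} + o(\beta_{k-1}), \tag{26}
$$
$$
\begin{aligned}
Q_2(j) &= \delta_{2k}(1 - \alpha)\Big\{e^{-\lambda_2^{(2)}\psi_2^{(2)}(t_{j-1})} - e^{-\lambda_2^{(2)}\psi_2^{(2)}(t_j)}\Big\}\\
&\quad + (1 - \delta_{2k})\Big\{e^{-\theta_3^{(k-1)}\psi_3^{(k-1)}(t_{j-1}) - \lambda_2^{(k-1)}\psi_2^{(k-1)}(t_{j-1})} - e^{-\theta_3^{(k-1)}\psi_3^{(k-1)}(t_j) - \lambda_2^{(k-1)}\psi_2^{(k-1)}(t_j)}\Big\} + o(\beta_{k-1}),
\end{aligned}
\tag{27}
$$
where δ2k=1 if k=2 and δ2k=0 if k≠2.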
Notice that if γ0=0, then ψ0(k-1)(t) reduces to
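$$
\begin{aligned}
\psi_0^{(k-1)}(t) &= \Big(\prod_{u=1}^{k-1}\gamma_u\Big)\sum_{v=0}^{k-1} A_{0(k-1)}(v) \times \int_{t_0}^{t} e^{\gamma_v(x-t_0)}\,P_T(x,t)\,dx\\
&= \Big(\prod_{u=1}^{k-1}\gamma_u\Big)\sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \int_{t_0}^{t} \big[e^{\gamma_v(x-t_0)} - 1\big]\,P_T(x,t)\,dx.
\end{aligned}
\tag{28}
$$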
Notice also that if PT(s,t)≈1 for t>s and if γ0=0, then the above ψij(t)’s reduce, respectively, to
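$$
\begin{aligned}
\psi_i^{(i)}(t) &= \frac{1}{\gamma_i}\big\{e^{\gamma_i(t-t_0)} - 1\big\}, \quad i = 1,\ldots,k-1,\\
\psi_0^{(k-1)}(t) &= \Big(\prod_{u=1}^{k-1}\gamma_u\Big)\sum_{v=1}^{k-1} A_{1(k-1)}(v) \times \frac{1}{\gamma_v^{2}}\big\{e^{\gamma_v(t-t_0)} - 1 - (t-t_0)\gamma_v\big\},\\
\psi_i^{(k-1)}(t) &= \Big(\prod_{u=i+1}^{k-1}\gamma_u\Big)\sum_{v=i}^{k-1} A_{i(k-1)}(v) \times \frac{1}{\gamma_v}\big\{e^{\gamma_v(t-t_0)} - 1\big\}, \quad i = 1,\ldots,k-1.
\end{aligned}
\tag{29}
$$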
4. Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases
To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by {(y0,n0),(yj,nj),j=1,…,tN}, where y0 is the number of cancer cases at birth and n0 the total number of births, and where for j≥1, yj is the number of cancer cases developed during the jth age group of a one-year period (or a 5-year period) and nj is the number of noncancer people who are at risk for cancer and from whom yj have developed cancer during the jth age group. Table 1 gives the SEER data of uveal melanoma (adult eye cancer) during the period 1973–2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let nij be the number of individuals who have genotype Ji(i=0,1,2) at the embryo stage among the nj people at risk for the cancer in question. Then, as shown above, (n1j,n2j)∣nj~Multinomial{nj;p1,p2}. It follows that nij∣nj~Binomial{nj,pi},i=0,1,2. In what follows, we let Yj denote the random variable for yj unless otherwise stated.
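Table 1: The SEER incidence data (1973–2007) of uveal melanoma from NCI/NIH (over all races and genders).

| Age group | Number of people at risk | Observed incidence | Model-F predicted | Model-1 predicted | Two-stage predicted |
|---|---|---|---|---|---|
| 0 | 12,495,777 | 34 | 36 | 36 | 38 |
| 1 | 12,221,582 | 20 | 11 | 11 | 16 |
| 2 | 12,120,990 | 14 | 7 | 8 | 17 |
| 3 | 12,112,995 | 9 | 5 | 6 | 18 |
| 4 | 12,146,174 | 4 | 4 | 5 | 20 |
| 5 | 12,161,336 | 3 | 3 | 4 | 22 |
| 6 | 12,111,854 | 2 | 2 | 4 | 25 |
| 7 | 12,160,452 | 2 | 2 | 4 | 28 |
| 8 | 11,942,586 | 2 | 2 | 4 | 30 |
| 9 | 12,381,299 | 1 | 2 | 5 | 34 |
| 10 | 12,512,703 | 2 | 3 | 6 | 38 |
| 11 | 12,410,338 | 5 | 3 | 6 | 41 |
| 12 | 12,449,244 | 2 | 3 | 6 | 44 |
| 13 | 12,527,781 | 5 | 4 | 7 | 48 |
| 14 | 12,602,883 | 3 | 5 | 7 | 51 |
| 15 | 12,719,598 | 5 | 5 | 8 | 55 |
| 16 | 12,766,107 | 7 | 6 | 9 | 59 |
| 17 | 12,831,400 | 9 | 7 | 9 | 62 |
| 18 | 12,382,047 | 8 | 7 | 9 | 63 |
| 19 | 12,581,638 | 10 | 8 | 10 | 68 |
| 20 | 12,636,509 | 7 | 9 | 11 | 71 |
| 21 | 12,682,601 | 6 | 10 | 10 | 75 |
| 22 | 12,840,510 | 12 | 11 | 11 | 79 |
| 23 | 13,075,528 | 17 | 13 | 13 | 84 |
| 24 | 13,358,635 | 16 | 15 | 14 | 89 |
| 25 | 13,473,849 | 12 | 16 | 16 | 94 |
| 26 | 13,426,340 | 17 | 18 | 18 | 97 |
| 27 | 13,525,264 | 28 | 20 | 20 | 101 |
| 28 | 13,149,674 | 17 | 22 | 21 | 102 |
| 29 | 13,812,811 | 23 | 25 | 25 | 110 |
| 30 | 13,886,874 | 24 | 28 | 27 | 114 |
| 31 | 13,488,332 | 37 | 30 | 29 | 115 |
| 32 | 13,460,286 | 32 | 32 | 32 | 118 |
| 33 | 13,256,067 | 38 | 35 | 35 | 119 |
| 34 | 13,428,827 | 37 | 39 | 38 | 124 |
| 35 | 13,220,037 | 40 | 41 | 41 | 126 |
| 36 | 12,870,265 | 30 | 44 | 44 | 126 |
| 37 | 12,689,592 | 43 | 47 | 47 | 127 |
| 38 | 12,157,014 | 42 | 49 | 49 | 125 |
| 39 | 12,494,081 | 46 | 55 | 54 | 131 |
| 40 | 12,272,125 | 49 | 58 | 58 | 132 |
| 41 | 11,826,573 | 56 | 61 | 60 | 130 |
| 42 | 11,663,153 | 54 | 65 | 64 | 131 |
| 43 | 11,407,082 | 53 | 68 | 68 | 131 |
| 44 | 11,296,848 | 70 | 73 | 72 | 133 |
| 45 | 11,016,369 | 57 | 76 | 76 | 132 |
| 46 | 10,651,593 | 71 | 79 | 79 | 130 |
| 47 | 10,475,708 | 87 | 84 | 83 | 131 |
| 48 | 9,994,684 | 82 | 86 | 85 | 127 |
| 49 | 10,138,908 | 78 | 93 | 93 | 131 |
| 50 | 9,836,359 | 87 | 97 | 96 | 130 |
| 51 | 9,475,641 | 95 | 100 | 99 | 127 |
| 52 | 9,250,985 | 113 | 104 | 104 | 126 |
| 53 | 9,027,382 | 106 | 108 | 108 | 125 |
| 54 | 8,883,737 | 117 | 113 | 113 | 125 |
| 55 | 8,547,883 | 129 | 116 | 116 | 123 |
| 56 | 8,279,648 | 107 | 119 | 119 | 121 |
| 57 | 8,062,368 | 119 | 123 | 123 | 119 |
| 58 | 7,654,610 | 132 | 124 | 124 | 115 |
| 59 | 7,563,706 | 118 | 130 | 129 | 115 |
| 60 | 7,232,719 | 131 | 131 | 130 | 112 |
| 61 | 6,927,332 | 116 | 132 | 132 | 109 |
| 62 | 6,708,273 | 133 | 134 | 134 | 107 |
| 63 | 6,543,931 | 143 | 138 | 137 | 106 |
| 64 | 6,404,652 | 130 | 141 | 141 | 105 |
| 65 | 6,168,486 | 145 | 142 | 142 | 102 |
| 66 | 5,913,479 | 138 | 142 | 142 | 99 |
| 67 | 5,746,766 | 151 | 144 | 144 | 98 |
| 68 | 5,480,517 | 147 | 142 | 142 | 94 |
| 69 | 5,363,912 | 149 | 144 | 144 | 94 |
| 70 | 5,110,728 | 136 | 142 | 142 | 90 |
| 71 | 4,925,076 | 144 | 141 | 141 | 88 |
| 72 | 4,696,825 | 140 | 138 | 138 | 85 |
| 73 | 4,512,136 | 146 | 135 | 135 | 82 |
| 74 | 4,345,300 | 126 | 133 | 133 | 80 |
| 75 | 4,148,801 | 122 | 128 | 129 | 78 |
| 76 | 3,900,900 | 124 | 122 | 122 | 74 |
| 77 | 3,681,587 | 114 | 116 | 116 | 70 |
| 78 | 3,481,918 | 93 | 110 | 110 | 67 |
| 79 | 3,243,631 | 102 | 102 | 102 | 63 |
| 80 | 2,961,234 | 79 | 92 | 93 | 58 |
| 81 | 2,724,984 | 93 | 83 | 84 | 54 |
| 82 | 2,495,219 | 82 | 75 | 75 | 50 |
| 83 | 2,271,595 | 92 | 66 | 67 | 46 |
| 84 | 2,041,351 | 64 | 57 | 58 | 42 |
| 85 | 10,466,605 | 304 | 282 | 285 | 216 |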
Note 1: for age 0 and 1–10 years old, the cancer incidence is derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per 10^{6} individuals are derived from the SEER eye cancer incidence by subtracting retinoblastoma incidence as given in Tan and Zhou [9].
4.1. The Probability Distribution of Y0
As shown in Figure 4, Ji(i=0,1,2) people would only generate Ji stage cells and Ji+1 stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if y0>0, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since n20∣n0~Binomial(n0;p2) and n10∣(n0,n20)~Binomial(n0;p1)+o(p2), the probability distribution of Y0 is therefore
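$$
Y_0 \sim \text{Poisson}(\chi_k), \quad k = 2, 3, \tag{30}
$$
where
$$
\chi_k =
\begin{cases}
n_0(p_2 + p_1\alpha), & \text{if } k = 2,\\
n_0 p_2 \alpha, & \text{if } k = 3.
\end{cases}
\tag{31}
$$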
The expected number of Y0 given n0 is E(Y0∣n0)=n0(p2+p1α)=χ2 if k=2 and E(Y0∣n0)=n0p2α=χ3 if k=3. Hence, for the 2-stage model (i.e., k=2) or the 3-stage model (i.e., k=3), the maximum likelihood estimate of χk is χ^k=y0 and the deviance D0(k) from the conditional probability distribution of Y0 given n0 is
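$$
D_0(k) = -2\big\{\log\{h(y_0;\chi_k)\} - \log\{h(y_0;\hat{\chi}_k)\}\big\} = 2\Big[\{\chi_k - y_0\} - y_0\log\Big\{\frac{\chi_k}{y_0}\Big\}\Big], \quad k = 2, 3. \tag{32}
$$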
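The deviance for the count at birth is easy to evaluate directly. The sketch below computes it as minus twice the difference between the Poisson log-likelihood at χk and at the saturated estimate y0, and cross-checks the simplified expression against the log-density difference. The function names are illustrative assumptions; the example values y0=34 and χ=36 are the observed and Model-F predicted counts for age group 0 in Table 1.

```python
from math import lgamma, log


def poisson_logpmf(y, mu):
    """log h(y; mu) for a Poisson density with mean mu > 0."""
    return y * log(mu) - mu - lgamma(y + 1)


def birth_deviance(y0, chi):
    """Deviance of Y0 ~ Poisson(chi) against the saturated fit chi_hat = y0."""
    if y0 == 0:
        # The y0 * log(chi / y0) term vanishes in the limit y0 -> 0.
        return 2.0 * chi
    return 2.0 * ((chi - y0) - y0 * log(chi / y0))


# Age group 0 of Table 1: observed 34 cases, Model-F predicted 36.
dev = birth_deviance(34, 36.0)
check = -2.0 * (poisson_logpmf(34, 36.0) - poisson_logpmf(34, 34.0))
print(dev, check)  # the two values agree up to rounding
```

The deviance is zero when the fitted mean equals the observed count and grows as the two move apart, which is what makes it usable as a goodness-of-fit contribution.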
4.2. The Probability Distribution of Yj (j≥1)
To derive the probability distribution of $Y_j$ ($j\ge 1$) in the $j$th age group, let $Y_{ij}$ ($i=0,1,2$) be the number of cancer cases generated by people who have genotype $J_i$ at the embryo stage among these $Y_j$ cancer cases. Then $Y_j=\sum_{i=0}^{2}Y_{ij}$, and $Y_{0j}$ is the number of cancer cases generated by the $n_{0j}=n_j-n_{1j}-n_{2j}$ normal people in the population. The conditional probability distribution of $Y_{ij}$ given $n_{ij}$ is
$$Y_{ij}\mid n_{ij}\sim\text{Poisson}\{n_{ij}Q_i(j)\},\quad i=0,1,2. \tag{33}$$
Notice that if $k=2$ (a 2-stage model), then all $J_2$ individuals would develop tumors at or before birth. Thus, if $k=2$, then $Y_{2j}=0$ for all $j>0$, so that for $j>0$ cancer cases develop only from normal people ($N=J_0$ people) and $J_1$ people. On the other hand, if $k>2$, then with positive probability $Y_{ij}>0$ for all ($i=0,1,2$; $j=0,1,\ldots,n_T$), where $n_T$ is the last time point in the data. Let $\delta_{2k}=1$ if $k=2$ and $\delta_{2k}=0$ if $k\ne 2$. Then $Y_j\mid(n_{ij},i=0,1,2)\sim\text{Poisson}\{Q_T(j)\}$, where $Q_T(j)=\sum_{i=0}^{1}n_{ij}Q_i(j)+(1-\delta_{2k})n_{2j}Q_2(j)$. Since $(n_{0j},n_{1j})\mid n_j\sim\text{Multinomial}(n_j;p_0,p_1)$, the conditional probability density function $P(y_j\mid n_j)$ of $Y_j$ given $n_j$ is
$$P(y_j\mid n_j)=\sum_{n_{0j}=0}^{n_j}\ \sum_{n_{1j}=0}^{n_j-n_{0j}}g(n_{0j},n_{1j};n_j,p_0,p_1)\,h\{y_j;Q_T(j)\}, \tag{34}$$
where $g(n_{0j},n_{1j};n_j,p_0,p_1)$ is the probability density function of $(n_{0j},n_{1j})\mid n_j\sim\text{Multinomial}(n_j;p_0,p_1)$ and $h\{y_j;Q_T(j)\}$ is the probability density function of $Y_j\mid(n_{ij},i=0,1,2)\sim\text{Poisson}\{Q_T(j)\}$.
The probability density function $P(y_j\mid n_j)$ given by (34) is a mixture of Poisson probability density functions, with the mixing probability density function given by the multinomial probability distribution of $\{n_{0j},n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
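To make the mixture in (34) concrete, the double sum can be evaluated by brute force when $n_j$ is small. The sketch below is illustrative only: the function names are hypothetical, the values of $Q_i(j)$ are passed in as plain numbers, and the enumeration over all genotype counts would be far too slow for population sizes such as those in the SEER data.

```python
import math

def poisson_pmf(y, mu):
    """Poisson density h(y; mu); a zero mean puts all mass on y = 0."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def genotype_pmf(n0, n1, n, p0, p1):
    """Multinomial density g(n0, n1; n, p0, p1), with n2 = n - n0 - n1."""
    n2 = n - n0 - n1
    p2 = 1.0 - p0 - p1
    coef = math.comb(n, n0) * math.comb(n - n0, n1)
    return coef * p0**n0 * p1**n1 * p2**n2

def mixture_pmf(y, n, p0, p1, q, k):
    """P(y_j | n_j) of (34); q = (Q0, Q1, Q2) for the age group, k = number of stages."""
    delta_2k = 1 if k == 2 else 0
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            n2 = n - n0 - n1
            q_total = n0 * q[0] + n1 * q[1] + (1 - delta_2k) * n2 * q[2]
            total += genotype_pmf(n0, n1, n, p0, p1) * poisson_pmf(y, q_total)
    return total
```

A simple sanity check on this sketch: if all three genotypes share the same $Q_i(j)=q$ and $k>2$, the mixture collapses to a single Poisson density with mean $n_jq$.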
Let $\Theta$ be the set of all unknown parameters (i.e., the parameters $(p_1,p_2,\alpha)$ and the birth rates, the death rates, and the mutation rates of $J_j$ cells). Based on data $(y_j,j=0,1,\ldots,n_T)$, the likelihood function of $\Theta$ is
$$L\{\Theta\mid y_j,j=0,1,\ldots,n_T\}=h(y_0;\chi(k))\prod_{j=1}^{t_N}P(y_j\mid n_j). \tag{35}$$
Notice that because the mutation rates are very small, one may practically assume $\beta_i(t)=\beta_i$ for $i=0,1,\ldots,k-1$. Also, because the stage-limiting genes are basically tumor suppressor genes which act recessively (see Tan [3], Weinberg [6], and Tan et al. [5, 8, 23]), one may practically assume $\{b_i(t)=b_i,\ d_i(t)=d_i,\ \gamma_i(t)=b_i-d_i=\gamma_i,\ i=0,1,\ldots,k-1\}$ (see Tan et al. [4, 5, 8]).
4.3. The Joint Probability Distribution of Augmented Variables and Cancer Incidence
To apply the mixture distribution of $Y_j$ in (34) to make inference about the unknown parameters, one needs to expand the model to include the unobservable augmented variables $(n_{0j},n_{1j},y_{0j},y_{1j},j=1,\ldots,t_N)$ and derive the joint probability distribution of these variables. For these purposes, observe that for $j=1,\ldots,t_N$ and for $k>2$, the conditional probability distribution $P\{y_{ij},i=0,1\mid y_j,n_{ij},i=0,1,n_j\}$ of $(Y_{ij},i=0,1)$ given $(Y_j=y_j,n_{ij},i=0,1,n_j)$ is
$$(y_{ij},i=0,1)\mid(y_j,n_{ij},i=0,1,n_j)\sim\text{Multinomial}\left(y_j;\frac{n_{ij}Q_i(j)}{Q_T(j)},i=0,1\right). \tag{36}$$
Since the conditional probability distribution of $Y_j$ given $(n_{ij},i=0,1,2)$ for $k>2$ is Poisson with mean $Q_T(j)$, the joint conditional probability density function $P\{y_{ij},i=0,1,y_j\mid n_{ij},i=0,1,n_j\}$ of $(Y_{ij},i=0,1,Y_j)$ given $(n_{ij},i=0,1,n_j)$ is
$$\begin{aligned}P\{y_{ij},i=0,1,y_j\mid n_{ij},i=0,1,n_j\}&=P\{y_{ij},i=0,1\mid y_j,n_{ij},i=0,1,2\}\times P\{y_j\mid n_{ij},i=0,1,2\}\\&=\frac{1}{y_j!}e^{-Q_T(j)}\{Q_T(j)\}^{y_j}\times g\left\{y_{0j},y_{1j};y_j,\frac{n_{ij}Q_i(j)}{Q_T(j)},i=0,1\right\}\\&=\prod_{i=0}^{2}h\{y_{ij};n_{ij}Q_i(j)\},\quad(k>2),\end{aligned} \tag{37}$$
where $y_{2j}=y_j-\sum_{u=0}^{1}y_{uj}$ and $n_{2j}=n_j-\sum_{u=0}^{1}n_{uj}$.
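If $k=2$, then $Y_{2j}=0$ so that $Y_j=\sum_{i=0}^{1}Y_{ij}$. Thus, for $k=2$,
$$Y_j\mid(n_{ij},i=0,1,n_j)\sim\text{Poisson}\left\{\sum_{i=0}^{1}n_{ij}Q_i(j)\right\}, \tag{38}$$
$$Y_{ij}\mid(Y_j=y_j,n_{ij},i=0,1,n_j)\sim\text{Binomial}\left(y_j;\frac{n_{ij}Q_i(j)}{\sum_{u=0}^{1}n_{uj}Q_u(j)}\right),\quad i=0,1. \tag{39}$$
It follows that if $k=2$, then $\sum_{i=0}^{1}Y_{ij}=Y_j$ and the joint probability density function of $(Y_{0j},Y_j)$ given $(n_{uj},u=0,1,2)$ is
$$\begin{aligned}P\{y_{0j},y_j\mid n_{uj},u=0,1,n_j\}&=P\{y_j\mid n_{ij},i=0,1,n_j\}\times P\{y_{0j}\mid y_j,n_{uj},u=0,1,n_j\}\\&=\prod_{i=0}^{1}h\{y_{ij};n_{ij}Q_i(j)\},\quad\text{if }k=2,\end{aligned} \tag{40}$$
where $y_j=\sum_{i=0}^{1}y_{ij}$ and $n_j=\sum_{i=0}^{2}n_{ij}$.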
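The factorizations in (37) and (40) rest on a standard property of the Poisson distribution: a Poisson total split multinomially in proportion to the component means has the same joint density as independent Poisson components. The sketch below checks this identity numerically; the function names are illustrative and the means are arbitrary positive numbers standing in for $n_{ij}Q_i(j)$.

```python
import math

def poisson_pmf(y, mu):
    """Poisson density h(y; mu) for mu > 0."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def multinomial_pmf(counts, probs):
    """Multinomial density of counts with cell probabilities probs (all positive)."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)

def joint_via_total(counts, means):
    """Poisson density of the total times the multinomial density of the split."""
    total_mean = sum(means)
    probs = [m / total_mean for m in means]
    return poisson_pmf(sum(counts), total_mean) * multinomial_pmf(counts, probs)

def joint_via_product(counts, means):
    """Product of independent Poisson densities, one per genotype."""
    result = 1.0
    for c, m in zip(counts, means):
        result *= poisson_pmf(c, m)
    return result
```

Both functions return the same value for any counts and positive means, which is the content of the last equality in (37).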
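Put $\{\mathbf{Y}=(y_{ij},i=0,1,j=1,\ldots,t_N),\ \mathbf{N}=(n_{ij},i=0,1,j=1,\ldots,t_N),\ \underset{\sim}{n}=(n_j,j=0,1,\ldots,t_N),\ \underset{\sim}{y}=(y_j,j=0,1,\ldots,t_N)\}$. From (37) and (40), the conditional joint probability density function of $(\mathbf{Y},\underset{\sim}{y})$ given $(\mathbf{N},\underset{\sim}{n})$ is
$$P\{\mathbf{Y},\underset{\sim}{y}\mid\mathbf{N},\underset{\sim}{n},\Theta\}=h(y_0;\chi(k))\prod_{j=1}^{t_N}\{h[y_{2j};n_{2j}Q_2(j)]\}^{1-\delta_{2k}}\times\prod_{i=0}^{1}h\{y_{ij};n_{ij}Q_i(j)\}. \tag{41}$$
It follows that the joint conditional probability density function of $\{\mathbf{N},\mathbf{Y},\underset{\sim}{y}\}$ given $(\underset{\sim}{n},\Theta)$ is
$$P\{\mathbf{N},\mathbf{Y},\underset{\sim}{y}\mid\underset{\sim}{n},\Theta\}=h(y_0;\chi(k))\prod_{j=1}^{t_N}g(n_{0j},n_{1j};n_j,p_0,p_1)\times\{h[y_{2j};n_{2j}Q_2(j)]\}^{1-\delta_{2k}}\times\prod_{i=0}^{1}h\{y_{ij};n_{ij}Q_i(j)\}. \tag{42}$$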
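Notice that the above probability density function is a product of multinomial probability density functions and Poisson probability density functions. For this joint probability density function, the deviance $\text{Dev}=-2\{\log P[\mathbf{Y},\underset{\sim}{y},\mathbf{N}\mid\underset{\sim}{n},\Theta]-\log P[\mathbf{Y},\underset{\sim}{y},\mathbf{N}\mid\underset{\sim}{n},\hat{\Theta}]\}$ is
$$\text{Dev}=D_0(k)+\text{Dev}(p_1,p_2)+\sum_{j=1}^{t_N}D_j, \tag{43}$$
where
$$D_0(k)=2\left\{\chi(k)-y_0-y_0\log\frac{\chi(k)}{y_0}\right\}, \tag{44}$$
$$\text{Dev}(p_1,p_2)=2\sum_{j=1}^{t_N}\left\{n_{0j}\log\frac{\hat{p}_0}{1-p_1-p_2}+\sum_{i=1}^{2}n_{ij}\log\frac{\hat{p}_i}{p_i}\right\}, \tag{45}$$
$$D_j=2\sum_{i=0}^{1}\left\{n_{ij}Q_i(j)-y_{ij}-y_{ij}\log\frac{n_{ij}Q_i(j)}{y_{ij}}\right\}+2(1-\delta_{2k})\left\{n_{2j}Q_2(j)-y_{2j}-y_{2j}\log\frac{n_{2j}Q_2(j)}{y_{2j}}\right\}, \tag{46}$$
where $\hat{p}_i=\left(\sum_{j=0}^{t_N}n_{ij}\right)/\left(\sum_{j=0}^{t_N}n_j\right)$ ($i=0,1,2$).
The joint probability density function $P\{\mathbf{Y},\underset{\sim}{y},\mathbf{N}\mid\underset{\sim}{n},\Theta\}$ of $(\mathbf{Y},\underset{\sim}{y},\mathbf{N})$ given by (42) will be used as the kernel for the Bayesian method to estimate the unknown parameters and to predict the state variables.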
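The multinomial part of the deviance, $\text{Dev}(p_1,p_2)$ in (45), can be computed from the genotype counts alone. The following sketch uses a hypothetical function name and, as a simplifying assumption, computes $\hat{p}_i$ from the same age groups over which the deviance is summed.

```python
import math

def genotype_deviance(n_counts, p1, p2):
    """Dev(p1, p2): n_counts is a list of (n0j, n1j, n2j), one tuple per age group."""
    totals = [sum(row[i] for row in n_counts) for i in range(3)]
    grand_total = sum(totals)
    p_hat = [t / grand_total for t in totals]   # observed genotype proportions
    p = [1.0 - p1 - p2, p1, p2]                 # model genotype probabilities
    dev = 0.0
    for i in range(3):
        if totals[i] > 0:                       # an empty cell contributes nothing
            dev += 2.0 * totals[i] * math.log(p_hat[i] / p[i])
    return dev
```

The value is zero when $(p_1,p_2)$ equals the observed proportions and grows as the model probabilities move away from them.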
4.4. Fitting of the Model to Cancer Incidence Data
To fit the model to real data, as in Tan [3–5], we let $\Delta t\sim 1$ correspond to a fixed time interval such as 6 months in human cancer studies. (Tan et al. [4] assumed 3 months as one time unit, while Luebeck and Moolgavkar [26] assumed one year as one time unit.) Then, because the proliferation rate of the last-stage cells is quite large, one may practically assume $P_T(s,t)=1$ for $t-s\ge 1$. Hence, noting that $\beta_{k-1}(t)=\beta_{k-1}$ is usually very small (see [3–5]), $Q_i(j)$ is approximated by
$$Q_i(j)\approx E\left\{e^{-\beta_{k-1}G_i(t_{j-1})}-e^{-\beta_{k-1}G_i(t_j)}\right\}=e^{-\beta_{k-1}E[G_i(t_{j-1})]}-e^{-\beta_{k-1}E[G_i(t_j)]}+o(\beta_{k-1}), \tag{47}$$
where $G_i(t)=\sum_{s=t_0}^{t-1}J_{k-1}(s;i)$.
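The approximation in (47) writes each $Q_i(j)$ as the difference of two survival-type terms evaluated at consecutive time points. A minimal sketch, assuming the values $\beta_{k-1}E[G_i(t_j)]$ have already been computed and are supplied as a list (the function and argument names are illustrative):

```python
import math

def incidence_probabilities(cumulative):
    """Return Q_i(1), ..., Q_i(T) from cumulative[j] = beta_{k-1} * E[G_i(t_j)], j = 0..T."""
    return [
        math.exp(-cumulative[j - 1]) - math.exp(-cumulative[j])
        for j in range(1, len(cumulative))
    ]
```

Because the terms telescope, the returned values sum to $e^{-\beta_{k-1}E[G_i(t_0)]}-e^{-\beta_{k-1}E[G_i(t_T)]}$, and each value is nonnegative as long as the cumulative sequence is nondecreasing.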
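Under the discrete time approximation, the $E[J_{k-1}(t;i)]$'s have been derived in the appendix. Using these results for the expected numbers and using the result $\sum_{i=0}^{t-1}a^i=(a^t-1)/(a-1)$ for $a\ne 1$, we obtain
$$\begin{aligned}\beta_{k-1}E[G_i(t)]&=\sum_{r=i}^{i+1}E[J_r(t_0;i)]\left(\prod_{u=r}^{k-1}\beta_u\right)\times\sum_{v=r}^{k-1}A_{r(k-1)}(v)\left\{\sum_{s=t_0}^{t-1}(1+\gamma_v)^{s-t_0}\right\}\\&=\theta_{(i+1)(k-1)}\phi_{(i+1)(k-1)}(t)+\lambda_{i(k-1)}\phi_{i(k-1)}(t),\end{aligned} \tag{48}$$
where $(\theta_{i(k-1)},i=1,2,3)$ and $(\lambda_{i(k-1)},i=0,1,2)$ are defined in Section 3.3 and where the $\phi_{i(k-1)}(t)$ ($i=0,1,2,3$) are given by
$$\phi_{i(k-1)}(t)=\left(\prod_{u=i+1}^{k-1}\gamma_u\right)\sum_{v=i}^{k-1}A_{i(k-1)}(v)\times\frac{1}{\gamma_v}\left\{(1+\gamma_v)^{t-t_0}-1\right\},\quad i=0,1,2,3. \tag{49}$$
Notice that if $\gamma_0=0$, then $\phi_{0(k-1)}(t)$ reduces to
$$\phi_{0(k-1)}(t)=\left(\prod_{u=1}^{k-1}\gamma_u\right)\sum_{v=1}^{k-1}A_{1(k-1)}(v)\times\frac{1}{\gamma_v^2}\left\{(1+\gamma_v)^{t-t_0}-1-(t-t_0)\gamma_v\right\}. \tag{50}$$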
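The time-dependent brackets in (49) and (50) are closed forms of geometric sums. The sketch below evaluates both and handles the limit $\gamma\to 0$ explicitly; the function names are illustrative, and the coefficients $A_{i(k-1)}(v)$ from Section 3.3 are deliberately left out. The reading of the bracket in (50) as a sum of the brackets in (49) is an observation about the algebra, not a statement taken from the source.

```python
import math

def geometric_sum(gamma, n):
    """sum_{s=0}^{n-1} (1+gamma)^s = ((1+gamma)^n - 1)/gamma, equal to n when gamma = 0."""
    if gamma == 0.0:
        return float(n)
    # expm1/log1p keep precision when gamma is very small
    return math.expm1(n * math.log1p(gamma)) / gamma

def second_order_sum(gamma, n):
    """((1+gamma)^n - 1 - n*gamma)/gamma^2, equal to n(n-1)/2 when gamma = 0."""
    if gamma == 0.0:
        return n * (n - 1) / 2.0
    return (math.expm1(n * math.log1p(gamma)) - n * gamma) / gamma**2
```

Here `n` plays the role of $t-t_0$, and `second_order_sum(gamma, n)` equals the sum of `geometric_sum(gamma, s)` over `s = 0, ..., n-1`.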
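Applying these results, for time homogeneous models with $\gamma_i\ne\gamma_j$ if $i\ne j$, the $Q_i(j)$'s under the discrete approximation are given as follows.
If $k=2$, then $k-1=1$. Hence, $Q_2(0)=1$, $Q_i(0)=Q_2(j)=0$ for ($i=0,1$, $j>0$), and for $j>0$,
$$\begin{aligned}Q_0(j)&\approx\left\{e^{-\theta_{11}\phi_{11}(t_{j-1})-\lambda_{01}\phi_{01}(t_{j-1})}-e^{-\theta_{11}\phi_{11}(t_j)-\lambda_{01}\phi_{01}(t_j)}\right\}+o(\beta_1),\\Q_1(j)&\approx(1-\alpha_1)\left\{e^{-\lambda_{11}\phi_{11}(t_{j-1})}-e^{-\lambda_{11}\phi_{11}(t_j)}\right\}+o(\beta_1).\end{aligned} \tag{51}$$
If $k\ge 3$, then we have $\{Q_2(0)=\delta_{2(k-1)}\alpha,\ Q_i(0)=0,\ i=0,1\}$ and for $j>0$,
$$\begin{aligned}Q_0(j)&\approx\left\{e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_{j-1})-\lambda_{0(k-1)}\phi_{0(k-1)}(t_{j-1})}-e^{-\theta_{1(k-1)}\phi_{1(k-1)}(t_j)-\lambda_{0(k-1)}\phi_{0(k-1)}(t_j)}\right\}+o(\beta_{k-1}),\\Q_1(j)&\approx\left\{e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_{j-1})-\lambda_{1(k-1)}\phi_{1(k-1)}(t_{j-1})}-e^{-\theta_{2(k-1)}\phi_{2(k-1)}(t_j)-\lambda_{1(k-1)}\phi_{1(k-1)}(t_j)}\right\}+o(\beta_{k-1}),\\Q_2(j)&\approx\delta_{2(k-1)}(1-\alpha)\left\{e^{-\lambda_{22}\phi_{22}(t_{j-1})}-e^{-\lambda_{22}\phi_{22}(t_j)}\right\}\\&\quad+(1-\delta_{2(k-1)})\left\{e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_{j-1})-\lambda_{2(k-1)}\phi_{2(k-1)}(t_{j-1})}-e^{-\theta_{3(k-1)}\phi_{3(k-1)}(t_j)-\lambda_{2(k-1)}\phi_{2(k-1)}(t_j)}\right\}+o(\beta_{k-1}).\end{aligned} \tag{52}$$
Notice that if one replaces $(1+\gamma_v)^{t-t_0}$ by $(1+\gamma_v)^{t-t_0}=e^{(t-t_0)\log(1+\gamma_v)}\approx e^{(t-t_0)\gamma_v}$, the above $Q_i(j)$'s from the discrete time model are exactly the same as those from the continuous model given in equations (23)–(27) under the assumption that $P_T(s,t)=1$ for $t-s>0$. Notice that the assumption $P_T(s,t)=1$ for $t-s>0$ is equivalent to assuming that the last-stage cancer cells grow instantaneously into cancer tumors as soon as they are generated; see Remark 1.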
5. The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure
Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3, 22] and Tan et al. [4, 5].
The generalized Bayesian inference is based on the posterior distribution P{Θ∣N,Y,y~,n~} of Θ given {N,Y,y~,n~}. This posterior distribution is derived by combining the prior distribution P{Θ} of Θ with the joint probability distribution P{N,Y,y~∣n~,Θ} given {n~,Θ} given by (42). It follows that this inference procedure would combine information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution P{Θ} of the parameters, (2) biological information of inherited cancer cases via genetic segregation of cancer genes in the population (P{N∣n~,pi,i=1,2}; see Section 2), and (3) information from the expanded data (Y) and the observed data (y~) via the statistical model from the system (P{Y,y~∣N,Θ}) given by (37) and (40). Because of additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient procedure to extract information of effects of genotypes of individuals at the embryo stage.
5.1. The Prior Distribution of the Parameters
For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume
$$P(\Theta)\propto c\quad(c>0), \tag{53}$$
where $c$ is a positive constant if these parameters satisfy some biologically specified constraints, and $P(\Theta)$ is equal to zero otherwise. These biological constraints are as follows.
$0<p_1<10^{-2}$, $0<p_2<10^{-6}$, and $-0.01<\gamma_i<1$ ($i=1,2$).
For $\beta_j$ ($j=0,1,\ldots,k-1$), we let $1<\omega_0=N(t_0)\beta_0<1000$ ($N\to I_1$) and $10^{-8}<\beta_i<10^{-3}$, $i=1,\ldots,k-1$.
For the $\lambda_{j(k-1)}$ ($j=0,1,2$), we let $0<\lambda_i<10$, $i=0,1$, and $0<\lambda_2<10^3$.
For the $\theta_i$ ($i=1,2,3$), we let $0<\theta_1<10^{-2}$ and $0<\theta_2<10^{-1}$.
We will refer to the above prior as a partially informative prior which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
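A prior of this form is constant inside a box of biologically plausible values and zero outside it, so its logarithm is either a constant or $-\infty$. The sketch below encodes a subset of the bounds listed above; the dictionary keys and the function name are hypothetical, and the constant is set to zero on the log scale since it does not affect the posterior mode.

```python
import math

# Open intervals (lower, upper) for a subset of the parameters in Section 5.1.
PRIOR_BOUNDS = {
    "p1": (0.0, 1e-2),
    "p2": (0.0, 1e-6),
    "gamma1": (-0.01, 1.0),
    "gamma2": (-0.01, 1.0),
    "lambda0": (0.0, 10.0),
    "lambda1": (0.0, 10.0),
    "lambda2": (0.0, 1e3),
}

def log_prior(params, bounds=PRIOR_BOUNDS):
    """Return 0.0 if every bounded parameter lies strictly inside its interval, else -inf."""
    for name, (lower, upper) in bounds.items():
        if not lower < params[name] < upper:
            return -math.inf
    return 0.0
```

Adding `log_prior` to a log-likelihood restricts any maximization to the constrained region without changing the location of the maximum inside it.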
5.2. The Posterior Distribution of the Parameters Given $\{\mathbf{Y},\mathbf{N},\underset{\sim}{y},\underset{\sim}{n}\}$
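Denote by $\Theta_1=\Theta-\{p_1,p_2,\alpha\}$. From the posterior distribution $P\{\Theta\mid\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}$, we obtain
$$\begin{aligned}P\{p_i,i=1,2,\alpha\mid\Theta_1,\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}&\propto(\chi(k))^{y_0}e^{-\chi(k)}\,p_1^{\sum_{j=1}^{t_N}n_{1j}}\,p_2^{\sum_{j=1}^{t_N}n_{2j}}\times(1-p_1-p_2)^{\sum_{j=1}^{t_N}n_{0j}},\quad 0<p_1,p_2,\alpha<1,\\P\{\Theta_1\mid(p_1,p_2,\alpha),\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}&\propto\prod_{j=1}^{t_N}\prod_{i=0}^{2}e^{-Q_i(j)}\{Q_i(j)\}^{y_{ij}},\quad\Theta_1\in\Omega,\end{aligned} \tag{54}$$
where $\Omega$ is the parameter space of $\Theta_1$ provided by the biological constraints in Section 5.1.
For computational convenience, we notice that the log of $P\{p_1,p_2,\alpha\mid\Theta_1,\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}$ is proportional to the negative of $D_0(k)+\text{Dev}(p_1,p_2)$ given by (44)-(45); similarly, the log of $P\{\Theta_1\mid(p_1,p_2,\alpha),\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}$ is proportional to the negative of $\sum_{j=1}^{t_N}D_j$ given by (46).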
5.3. The Multilevel Gibbs Sampling Procedure For Estimating Unknown Parameters
Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.
Step 1 (Generating $\mathbf{N}$ Given $\{\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n},\Theta\}$).
Given $\Theta$ and given $\underset{\sim}{n}$, use the multinomial distribution of $\{n_{1j},n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $\mathbf{N}$. Then combine this large sample with $P\{\mathbf{Y},\underset{\sim}{y}\mid\mathbf{N},\underset{\sim}{n},\Theta\}$ in (37) and (40) to select $\mathbf{N}$ through the weighted bootstrap method due to Smith and Gelfand [30]. The selected $\mathbf{N}$ is then a sample from $P\{\mathbf{N}\mid\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n},\Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample $\hat{\mathbf{N}}$.
Step 2 (Generating $\mathbf{Y}$ Given $\{\mathbf{N},\underset{\sim}{y},\underset{\sim}{n},\Theta\}$).
Given $\{\underset{\sim}{y},\underset{\sim}{n},\Theta\}$ and given $\mathbf{N}=\hat{\mathbf{N}}$ generated from Step 1, generate $\mathbf{Y}$ from the probability distribution $P\{\mathbf{Y}\mid\hat{\mathbf{N}},\underset{\sim}{y},\underset{\sim}{n},\Theta\}$ given by (36) and (38). Call the generated sample $\hat{\mathbf{Y}}$.
Step 3 (Estimation of $\{p_i,i=1,2,\alpha\}$ Given $\{\Theta_1,\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n}\}$).
Given $\{\underset{\sim}{y},\underset{\sim}{n},\Theta_1\}$ and given $(\mathbf{N},\mathbf{Y})=(\hat{\mathbf{N}},\hat{\mathbf{Y}})$ from Steps 1 and 2, derive the posterior mode of $\{p_i,i=1,2,\alpha\}$ by maximizing the conditional posterior distribution $P\{p_i,i=1,2,\alpha\mid\Theta_1,\hat{\mathbf{N}},\hat{\mathbf{Y}},\underset{\sim}{y},\underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $D_0(k)+\text{Dev}(p_1,p_2)$ given by (44)-(45) in Section 4.3 under the constraints given in Section 5.1. Denote this generated mode by $\{\hat{p}_i,i=1,2,\hat{\alpha}\}$.
Step 4 (Estimation of $\Theta_1$ Given $(p_i,i=1,2,\alpha,\mathbf{N},\mathbf{Y},\underset{\sim}{y},\underset{\sim}{n})$).
Given $(\underset{\sim}{y},\underset{\sim}{n})$ and given $(\mathbf{N},\mathbf{Y},p_i,i=1,2,\alpha)=(\hat{\mathbf{N}},\hat{\mathbf{Y}},\hat{p}_i,i=1,2,\hat{\alpha})$ from Steps 1–3, derive the posterior mode of $\Theta_1$ by maximizing the conditional posterior distribution $P\{\Theta_1\mid\hat{p}_i,i=1,2,\hat{\alpha},\hat{\mathbf{N}},\hat{\mathbf{Y}},\underset{\sim}{y},\underset{\sim}{n}\}$. Under the partially informative prior, this is equivalent to maximizing the negative of the deviance $\sum_{j=1}^{t_N}D_j$ in (46) under the constraints. Denote the generated mode as $\hat{\Theta}_1$.
Step 5 (Recycling Step).
With {(N,Y,pi,i=1,2,α,Θ1)=(N^,Y^,p^i,i=1,2,α^,Θ^1)} given above, go back to Step 1 and continue until convergence.
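The two sampling steps can be sketched for a single age group under the $k>2$ case. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the values $Q_i(j)$ are supplied as plain numbers, and the candidate weights are the products of Poisson densities from (37) evaluated at the current split of cases.

```python
import math
import random

def poisson_pmf(y, mu):
    """Poisson density h(y; mu); a zero mean puts all mass on y = 0."""
    if mu == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def draw_genotype_counts(n, p1, p2, rng):
    """One multinomial draw of (n0, n1, n2) by classifying n individuals."""
    counts = [0, 0, 0]
    for _ in range(n):
        u = rng.random()
        if u < p2:
            counts[2] += 1
        elif u < p1 + p2:
            counts[1] += 1
        else:
            counts[0] += 1
    return tuple(counts)

def weighted_bootstrap_counts(y_split, n, p1, p2, q, rng, n_candidates=500):
    """Step 1 sketch: propose genotype counts, weight by the Poisson densities, resample one."""
    candidates = [draw_genotype_counts(n, p1, p2, rng) for _ in range(n_candidates)]
    weights = []
    for cand in candidates:
        w = 1.0
        for y_i, n_i, q_i in zip(y_split, cand, q):
            w *= poisson_pmf(y_i, n_i * q_i)
        weights.append(w)
    # random.choices raises ValueError if every weight is zero
    return rng.choices(candidates, weights=weights, k=1)[0]

def split_cases(y, n_counts, q, rng):
    """Step 2 sketch: split y cases among genotypes in proportion to n_i * Q_i."""
    means = [n_i * q_i for n_i, q_i in zip(n_counts, q)]
    picks = rng.choices(range(3), weights=means, k=y) if y > 0 else []
    return tuple(picks.count(i) for i in range(3))
```

In a full implementation the weights would be products over all age groups, and Steps 3 and 4 would follow with constrained maximizations of the deviances.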
The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\hat{\Theta}=\{\hat{p}_i,i=1,2,\hat{\alpha},\hat{\Theta}_1\}$ are the generated values from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y},\underset{\sim}{n}\}$ independently of $(\mathbf{N},\mathbf{Y})$ (for proof, see Tan [22], Chapter 3). Repeating the above procedure generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\underset{\sim}{y},\underset{\sim}{n}\}$; one then uses the sample means as the estimates of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
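The final summary step, turning a set of generated parameter vectors into point estimates and their variances and covariances, can be sketched as follows. The function name is illustrative, and the draws are assumed to be stored as equal-length lists of numbers.

```python
def posterior_summary(draws):
    """Return (sample means, sample covariance matrix) for a list of parameter vectors."""
    m = len(draws)
    d = len(draws[0])
    means = [sum(x[i] for x in draws) / m for i in range(d)]
    cov = [
        [
            sum((x[i] - means[i]) * (x[j] - means[j]) for x in draws) / (m - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return means, cov
```

The diagonal of the returned matrix holds the sample variances; their square roots correspond to standard deviations of the kind reported for the fitted models.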
6. A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma)—An Example
Human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is retinoblastoma, which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9] we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. The same methods can be applied to model other human cancers as well, but this is left for future research.
Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups (k=85) with each group spanning over a 1-year period except the last age group (≥85 years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer-retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting incidence of retinoblastoma from SEER data (see Tan and Zhou [9]).
To fit the data, we let one time unit be 6 months after birth and let $t_0=1$. To compare different models and to assess different assumptions, we consider the following 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; and (2) the Type-1 3-stage mixture model (Model-1), in which we assume that $\gamma_1=0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We apply the methods in Section 5 to fit these models to the SEER data given in Table 1.
Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion) and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1, we also provide numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.
As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better according to the values of AIC and BIC. The chi-square test statistics $\chi^2=\sum_{i=0}^{85}(y_i-\hat{y}_i)^2/\hat{y}_i$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a P-value of 0.12 (df = 86 − 12 = 74) for Model-F and a P-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the chi-square statistic for the 2-stage model is 2747.69, giving a P-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume $\gamma_1=0$ and that normal people and $J_1$ people at the embryo stage will remain normal people and $J_1$ people, respectively, at birth.
From Table 3, it is observed that the estimate of $\gamma_1$ is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $J_1$ is almost identical to that of $N=J_0$. This further confirms that the staging-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of $\gamma_2$ is of order $10^{-2}$, which is about $10^3$ times greater than that of cells with genotype $J_1$.
From Table 3, the estimates $\hat{\lambda}_j$ ($j=0,1,2$) of $\lambda_j$ are of order $\{10^{-1},10^{-1},10^{2}\sim 10^{3}\}$, respectively. Because $\{\lambda_i=E[J_i(t_0;i)]\beta_i\prod_{u=i+1}^{2}(\beta_u/\gamma_u),\ i=0,1,2\}$, assuming some values of $\{E[J_i(t_0;i)],i=0,1,2\}$ from biological observations, one can obtain a rough idea of the magnitude of $\beta_j$ ($j=0,1,2$). For example, if we follow Potten et al. [32] and assume $E[N(t_0)]=E[J_i(t_0;i)]\sim 10^8$ ($i=0,1,2$), then $\beta_j\approx 10^{-6}\sim 10^{-5}$.
From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4}\sim 10^{-3}$ and $10^{-7}\sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the staging-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ is 0.8411, indicating that most individuals with genotype $J_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7}\sim 10^{-6}$.
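The goodness-of-fit statistic and the information criteria quoted above follow standard definitions, sketched below with illustrative function names. The parameter count used in the usage note is an assumption read off the degrees of freedom quoted in the text (86 − 12), not a value stated separately by the source.

```python
import math

def chi_square(observed, predicted):
    """Pearson statistic: sum of (y_i - yhat_i)^2 / yhat_i over all age groups."""
    return sum((y - y_hat) ** 2 / y_hat for y, y_hat in zip(observed, predicted))

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + (number of parameters) * log(n_obs)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

For example, `aic(-1312.02, 12)` gives approximately 2648.04, which agrees with the AIC reported for Model-F.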
Table 2: The log-likelihood, AIC, and BIC of the fitted models.

| Models | Log-likelihood | AIC | BIC |
|---|---|---|---|
| Three-stage, Model-F | −1312.02 | 2648.04 | 2677.35 |
| Three-stage, Model-1 | −1295.76 | 2609.53 | 2631.51 |
| Two-stage | −4392.421 | 8796.843 | 8811.569 |
Table 3: Estimates of parameters for the 3-stage stochastic models.

| Model | Statistic | $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | $\gamma_1$ | $\gamma_2$ | $p_1$ | $p_2$ | $\alpha$ | $\theta_1$ | $\theta_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model-F | Estimates | 2.88E-01 | 4.81E-01 | 6.55E+02 | 2.71E-05 | 3.37E-02 | 9.95E-04 | 9.68E-07 | 8.41E-01 | 3.29E-04 | 3.41E-01 |
| Model-F | St.D | 1.21E-01 | 4.65E-02 | 1.12E+02 | 8.86E-06 | 1.92E-04 | 1.75E-06 | 6.48E-09 | 4.19E-02 | 3.54E-05 | 1.07E-02 |
| Model-F | 95% CL-Lower | 5.03E-02 | 3.89E-01 | 4.34E+02 | 5.34E-06 | 3.33E-02 | 9.91E-04 | 9.55E-07 | 7.59E-01 | 2.60E-04 | 3.20E-01 |
| Model-F | 95% CL-Upper | 5.25E-01 | 5.72E-01 | 8.75E+02 | 4.01E-05 | 3.41E-02 | 9.98E-04 | 9.81E-07 | 9.23E-01 | 3.98E-04 | 3.62E-01 |
| Model-1 | Estimates | 6.55E-02 | 3.72E-01 | 1.98E+02 | NA | 3.49E-02 | 9.98E-04 | 1.00E-06 | 8.07E-01 | NA | NA |
| Model-1 | St.D | 2.54E-02 | 4.63E-02 | 3.38E+01 | NA | 3.58E-03 | 6.27E-05 | 4.53E-08 | 7.06E-02 | NA | NA |
| Model-1 | 95% CL-Lower | 1.57E-02 | 2.82E-01 | 1.32E+02 | NA | 2.79E-02 | 8.75E-04 | 9.11E-07 | 6.68E-01 | NA | NA |
| Model-1 | 95% CL-Upper | 1.15E-01 | 4.63E-01 | 2.65E+02 | NA | 4.19E-02 | 1.12E-03 | 1.09E-06 | 9.45E-01 | NA | NA |

Note: NA: assumed nonexistence.
Figure 5: Curve fitting of SEER data by Model-F, Model-1, and the two-stage model.
7. Discussion and Conclusions
To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the staging-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the staging-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.
For using the proposed models to fit the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., maximum likelihood method) because the procedure combines information from three sources: (1) previous information and experiences about the parameters in terms of the prior distribution P{Θ} of the parameters, (2) biological information of inherited cancer cases via the genetic segregation of staging-limiting tumor suppressor genes in the population, and (3) information from the expanded data (Y) and the observed data (y~) via the statistical model from the system (P{Y,y~∣N,Θ}) given by (37) and (40).
To illustrate the usefulness and applications of our models and methods, we have applied them to the eye cancer SEER data of NCI/NIH. Our analysis showed that the proposed 3-stage model with inherited cancer cases fitted the data well (see Table 2 and Figure 5), whereas the classical 2-stage model could not fit the data (see Table 2 and Figure 5). These results support the findings from molecular biology that human eye cancer is described by a 3-stage model with an inherited cancer component. Notice, however, that our 3-stage multistage model is more general than the classical 3-stage model as described in Little [1], Tan [2], and Zheng [7] in that we postulate that cancer tumors develop from primary $I_3$ cells by clonal expansion (see Yang and Chen [11]). (Note that the stochastic multistage models in the literature assume that cancer tumors develop from last-stage cells as soon as they are generated, ignoring cancer progression completely; see Remark 1.) As a matter of fact, we had assumed $\Delta t=1$ for a period of three months and found that the 3-stage models then did not fit the SEER data, indicating that $P_T(s,t)<1$ over a period of three months.
Applying our models and methods to the SEER data of human eye cancer, we have derived some useful pieces of information for the first time. Specifically: (1) We have estimated, for the first time, the frequency of the staging-limiting tumor suppressor gene in the US population ($\hat{p}_1\sim 0.9948\times 10^{-3}$). (2) With the estimate of $\alpha$ as $\hat{\alpha}=0.8411$, the predicted number of uveal melanoma cases at birth is $\hat{y}_0=n_0\hat{\alpha}_1\hat{p}_2=36$ by the 3-stage models with inherited cancer component (Model-F and Model-1). (The observed number of eye cancer cases at birth is 34.) (3) The estimate of the proliferation rate ($\gamma_1$) of $J_1$ cells using Model-F is $\hat{\gamma}_1=2.271\times 10^{-5}\sim 0$. (The estimate is $8.603\times 10^{-5}$ using Model-1.) This confirms that the stage-1 limiting gene is a tumor suppressor gene and that, unlike the p53 gene in chromosome 17p (see [33]), there is little or no haploinsufficiency for this gene in cells with genotype $J_1$.
Using the models and methods of this paper, one can predict future cancer cases for human eye cancer. Thus, by comparing results from different populations, our models and methods can be used to assess cancer prevention and control procedures. These will be topics of our future research; we will not go any further here.
Appendix: The Expected Numbers of State Variables under Discrete Time Approximation
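Under discrete time, the stochastic differential equations of state variables reduce to the following stochastic difference equations of state variables, respectively:
$$\begin{aligned}J_i(t+1;i)&=J_i(t;i)[1+\gamma_i(t)]+e_i(t+1;i),\quad i=0,1,\ldots,\min(k-1,2),\\J_j(t+1;u)&=J_j(t;u)[1+\gamma_j(t)]+J_{j-1}(t;u)\beta_{j-1}(t)+e_j(t+1;u),\quad 0\le u<j\le k-1,\end{aligned} \tag{A.1}$$
where $e_i(t+1;i)=[B_i(t;i)-J_i(t;i)b_i(t)]-[D_i(t;i)-J_i(t;i)d_i(t)]$ and $e_j(t+1;i)=[B_j(t;i)-J_j(t;i)b_j(t)]-[D_j(t;i)-J_j(t;i)d_j(t)]+[M_{j-1}(t;i)-J_{j-1}(t;i)\beta_{j-1}(t)]$ for $j>i$.
The initial conditions at birth ($t_0$) for the above stochastic difference equations are $J_j(t_0;i)>0$ if $j=i,i+1$ and $J_j(t_0;i)=0$ if $j>i+1$. The solution of the above difference equations under these initial conditions is given, respectively, by
$$\begin{aligned}J_i(t;i)&=J_i(t_0;i)\prod_{s=t_0}^{t-1}(1+\gamma_i(s))+\sum_{s=t_0+1}^{t}e_i(s;i)\prod_{u=s}^{t-1}(1+\gamma_i(u)),\quad i=0,1,\ldots,\min(k-1,2),\\J_j(t;i)&=J_j(t_0;i)\prod_{s=t_0}^{t-1}(1+\gamma_j(s))+\sum_{s=t_0}^{t-1}J_{j-1}(s;i)\beta_{j-1}(s)\prod_{u=s+1}^{t-1}(1+\gamma_j(u))\\&\quad+\sum_{s=t_0+1}^{t}e_j(s;i)\prod_{u=s}^{t-1}(1+\gamma_j(u)),\quad 0\le i<j\le k-1.\end{aligned} \tag{A.2}$$
If the model is time homogeneous so that $\{\beta_j(t)=\beta_j,\ \gamma_j(t)=\gamma_j\ (j=i,\ldots,k-1)\}$ and if $\gamma_i\ne\gamma_j$ for $i\ne j$, then the above solutions under the initial conditions ($J_j(t_0;i)=0$, $j>i+1$) reduce to
$$\begin{aligned}J_i(t;i)&=J_i(t_0;i)(1+\gamma_i)^{t-t_0}+\eta_i^{(0)}(t;i),\quad i=0,1,\ldots,\min(k-1,2),\\J_j(t;i)&=J_j(t_0;i)(1+\gamma_j)^{t-t_0}+\sum_{u=i}^{j-1}J_u(t_0;i)\left\{\prod_{v=u}^{j-1}\beta_v\right\}\sum_{r=u}^{j}A_{uj}(r)(1+\gamma_r)^{t-t_0}+\eta_j^{(0)}(t;i)+\sum_{u=1}^{j-i}\left\{\prod_{r=j-u}^{j-1}\beta_r\right\}\eta_j^{(u)}(t;i)\\&=\sum_{u=i}^{i+1}J_u(t_0;i)\left\{\prod_{v=u}^{j-1}\beta_v\right\}\sum_{r=u}^{j}A_{uj}(r)(1+\gamma_r)^{t-t_0}+\eta_j^{(0)}(t;i)+\sum_{u=1}^{j-i}\left\{\prod_{r=j-u}^{j-1}\beta_r\right\}\eta_j^{(u)}(t;i),\quad 0\le i<j\le k-1,\end{aligned} \tag{A.3}$$
where $\eta_j^{(0)}(t;i)=\sum_{s=t_0+1}^{t}e_j(s;i)(1+\gamma_j)^{t-s}$ and $\eta_j^{(u)}(t;i)=\sum_{s=t_0+1}^{t}e_{j-u}(s;i)\left\{\sum_{v=u}^{j}A_{uj}(v)(1+\gamma_v)^{t-s}\right\}$ ($u=1,\ldots,j-i$).
Thus, if the model is time homogeneous and if $\gamma_i\ne\gamma_j$ for $i\ne j$, the $E[J_{k-1}(t;i)]$'s in discrete time models under the initial conditions ($J_j(t_0;i)=0$, $j>i+1$) are given, respectively, by
$$E[J_{k-1}(t;i)]=\sum_{u=i}^{i+1}E[J_u(t_0;i)]\left(\prod_{v=u}^{k-2}\beta_v\right)\times\sum_{r=u}^{k-1}A_{u(k-1)}(r)(1+\gamma_r)^{t-t_0},\quad i=0,1,\ldots,\min(k-1,2). \tag{A.4}$$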
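The expectation of the difference equations (A.1) can be iterated numerically and compared with a closed form for two adjacent stages. The sketch below assumes a time homogeneous model and assumes that the residual terms $e_j$ have zero mean, as their definition as deviations from conditional means suggests. The closed form used for stage $i+1$ is derived here from the geometric-sum identity and is not quoted from the source; the function and argument names are illustrative.

```python
def iterate_expectations(j_i0, j_next0, gamma_i, gamma_next, beta_i, n_steps):
    """Iterate E[J_i] and E[J_{i+1}] forward n_steps time units using (A.1)."""
    e_i, e_next = float(j_i0), float(j_next0)
    for _ in range(n_steps):
        # The right-hand side is evaluated first, so both updates use values at time t.
        e_i, e_next = e_i * (1.0 + gamma_i), e_next * (1.0 + gamma_next) + e_i * beta_i
    return e_i, e_next

def closed_form_expectations(j_i0, j_next0, gamma_i, gamma_next, beta_i, n_steps):
    """Closed form for the same quantities, valid when gamma_i != gamma_next."""
    a = (1.0 + gamma_i) ** n_steps
    b = (1.0 + gamma_next) ** n_steps
    e_i = j_i0 * a
    e_next = j_next0 * b + j_i0 * beta_i * (b - a) / (gamma_next - gamma_i)
    return e_i, e_next
```

The two functions agree up to floating-point error, which illustrates how sums of powers of $(1+\gamma_r)$, as in (A.4), arise from the recursion.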
References
[1] M. P. Little, "Cancer models, ionization and genomic instability: a review," in W. Y. Tan and L. Hanin (Eds.).
[2] W. Y. Tan.
[3] W. Y. Tan, "Stochastic multi-stage models of carcinogenesis as hidden Markov models: a new approach."
[4] W. Y. Tan, L. J. Zhang, and C. W. Chen, "Stochastic modeling of carcinogenesis: state space models and estimation of parameters."
[5] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer biology, cancer models and stochastic mathematical analysis of carcinogenesis," in W. Y. Tan and L. Hanin (Eds.).
[6] R. A. Weinberg.
[7] Q. Zheng, "Stochastic multistage cancer models: a fresh look at an old approach," in W. Y. Tan and L. Hanin (Eds.).
[8] W. Y. Tan, L. J. Zhang, W. Chen, and J. M. Zhu, "A stochastic model of human colon cancer involving multiple pathways," in W. Y. Tan and L. Hanin (Eds.).
[9] W. Y. Tan and H. Zhou, "A new stochastic model of retinoblastoma involving both hereditary and non-hereditary cancer cases."
[10] X. W. Yan.
[11] G. L. Yang and C. W. Chen, "A stochastic two-stage carcinogenesis model: a new approach to computing the probability of observing tumor in animal bioassays."
[12] H. Osada and T. Takahashi, "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer."
[13] I. I. Wistuba, L. Mao, and A. F. Gazdar, "Smoking molecular damage in bronchial epithelium."
[14] S. Landreville, O. A. Agapova, and J. W. Harbour, "Emerging insights into the molecular pathogenesis of uveal melanoma."
[15] H. W. Mensink, D. Paridaens, and A. De Klein, "Genetics of uveal melanoma."
[16] W. Y. Tan and X. W. Yan, "A new stochastic and state space model of human colon cancer incorporating multiple pathways."
[17] L. B. Klebanov, S. T. Rachev, and A. Y. Yakovlev, "A stochastic model of radiation carcinogenesis: latent time distributions and their properties."
[18] A. Y. Yakovlev and A. D. Tsodikov.
[19] H. Fakir, W. Y. Tan, L. Hlatky, P. Hahnfeldt, and R. K. Sachs, "Stochastic population dynamic effects for lung cancer progression."
[20] A. G. Knudson, "Mutation and cancer: statistical study of retinoblastoma."
[21] J. F. Crow and M. Kimura.
[22] W. Y. Tan.
[23] W. Y. Tan, C. W. Chen, and L. J. Zhang, "Cancer risk assessment by state space models," in W. Y. Tan and L. Hanin (Eds.).
[24] W. Y. Tan and C. W. Chen, "Stochastic modeling of carcinogenesis: some new insights."
[25] W. Y. Tan and C. W. Chen, "Cancer stochastic models."
[26] E. G. Luebeck and S. H. Moolgavkar, "Multistage carcinogenesis and the incidence of colorectal cancer."
[27] R. Durrett, J. Mayberry, S. Moseley, and D. Schmidt, "Probability models for cancer development and progression," Google Search, Google 2012.
[28] G. E. P. Box and G. C. Tiao.
[29] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm (with discussion)."
[30] A. F. M. Smith and A. E. Gelfand, "Bayesian statistics without tears: a sampling-resampling perspective."
[31] A. E. Loercher and J. W. Harbour, "Molecular genetics of uveal melanoma."
[32] C. S. Potten, C. Booth, and D. Hargreaves, "The small intestine as a model for evaluating adult tissue stem cell drug targets."
[33] C. J. Lynch and J. Milner, "Loss of one p53 allele results in four-fold reduction of p53 mRNA and protein: a basis for p53 haplo-insufficiency."