New Cancer Stochastic Models Involving Both Hereditary and Nonhereditary Cancer Cases: A New Approach

To incorporate biologically observed epidemics into multistage models of carcinogenesis, in this paper we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models, we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insight into human cancers but also provide useful guidance for their prevention and control and for the prediction of future cancer cases.


Introduction
It is universally recognized that each cancer tumor develops through stochastic proliferation and differentiation from a single stem cell which has sustained a series of irreversible genetic and/or epigenetic changes (Little [1]; Tan [2,3]; Tan et al. [4,5]; Weinberg [6]; Zheng [7]). That is, carcinogenesis is a stochastic multistage process with intermediate cells subject to stochastic proliferation and differentiation. Furthermore, the number of stages and the number of pathways of the carcinogenesis process are significantly influenced by environmental factors underlying the individuals (Tan et al. [4,5]; Weinberg [6]).
Another important observation in human carcinogenesis is that most human cancers cluster around family members. Further, many cancer incidence data (such as the SEER data of NCI/NIH, USA) have documented that some cancers develop during pregnancy, so that some newborn babies have cancer at birth. These have been referred to as pediatric cancers. Well-known examples of pediatric cancers include retinoblastoma (a pediatric eye cancer), hepatoblastoma (a pediatric liver cancer), Wilms' tumor (a pediatric kidney cancer), and medulloblastoma (a pediatric brain tumor). Epidemiological and clinical studies in oncology have also revealed that inherited cancers are very common in many adult human cancers including lung cancer, colon cancer [8], uveal melanomas (adult eye cancer, [9]), and adult liver cancer (HCC, [10]).
Given the above results from cancer biology and human cancer epidemiology, the objective of this paper is to illustrate how to develop stochastic models of carcinogenesis incorporating these biological and epidemiological observations. Based on these models and cancer incidence data, we will then proceed to develop efficient statistical procedures to estimate unknown parameters in the model, to validate the model, and to predict cancer incidence.
In Section 2, we illustrate how to incorporate segregation of cancer genes in multistage stochastic models of carcinogenesis to account for inherited cancer cases. In Section 3, we develop stochastic equations for the state variables of the model described in Section 2. By using these stochastic equations we derive the probability distributions of the state variables (i.e., the number of intermediate cancer cells) and the probability distribution of time to detectable cancer tumors. In Section 4, assuming that we have some cancer incidence data such as the SEER data from NCI/NIH, we develop statistical models for these data from these multistage models of carcinogenesis. In Section 5, by combining the models in Sections 2-4, we develop a generalized Bayesian inference and Gibbs sampling procedures to estimate the unknown parameters, to validate the model, and to predict cancer incidence. As an example of application, in Section 6 we develop a multistage model of human eye cancer with inherited cancer cases as described in Figure 2. We illustrate the model and methods by analyzing the SEER data of human eye cancer from NCI/NIH. Finally, in Section 7, we discuss the usefulness of the model and the methods developed in this paper and point out some future research directions.

The Stochastic Multistage Model of Carcinogenesis with Inherited Cancer Cases
The $k$-stage multistage model of carcinogenesis views carcinogenesis as the end point of $k$ ($k \ge 2$) discrete, heritable, and irreversible events (mutations, genetic changes, or epigenetic changes) with intermediate cells subject to stochastic proliferation and differentiation (Little [1]; Tan [2,3]; Tan et al. [4,5]; Weinberg [6]). Let $N = I_0$ denote normal stem cells, $T$ the cancer tumors, and $I_j$ the $j$th stage initiated cells arising from the $(j-1)$th stage initiated cells ($j = 1, \ldots, k$) by some genetic and/or epigenetic changes. Then the model assumes $N \to I_1 \to I_2 \to \cdots \to I_k \to T$ with the $I_j$ cells subject to stochastic proliferation (birth) and differentiation (death). Further, it assumes that each stem cell proceeds independently of other cells and that cancer tumors develop from primary $I_k$ cells by clonal expansion (stochastic birth and death), where primary $I_k$ cells are $I_k$ cells which arise directly from $I_{k-1}$ cells; see Yang and Chen [11]. For example, Figure 1 is a multistage pathway for squamous NSCLC (non-small cell lung cancer) as proposed by Osada and Takahashi [12] and Wistuba et al. [13]. Similarly, Figure 2 is the multistage model for uveal melanoma proposed by Landreville et al. [14] and Mensink et al. [15], while Figure 3 is the APC-β-catenin-Tcf pathway for human colon cancer (Tan et al. [8], Tan and Yan [16]).
Remark 1. To develop stochastic multistage models of carcinogenesis, in the literature (Little [1], Tan [2], Zheng [7]) it is conveniently assumed that the $I_k$ cells grow instantaneously into cancer tumors as soon as they are generated. In this case, the number of tumors is equal to the number of $I_k$ cells and one may identify $I_k$ cells as tumors. It follows that the number of tumors is a Markov process and that the $I_k$ cells are transient cells. In these cases, one needs only to deal with $T(t)$ and $I_j$ cells with $j = 1, \ldots, k-1$. However, as shown by Yang and Chen [11], the number of tumors is much smaller than the total number of $I_k$ cells. Also, in many animal models and in cancer risk assessment of radiation, Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19] have shown that $T(t)$ is in general not Markov.
To extend the above model to include hereditary cancers, observe that mutants of cancer genes exist in the population and that both germline cells (egg and sperm) and somatic cells may carry mutant alleles of cancer genes [2,20]. Further, without exception, every human being develops from the embryo in his/her mother's womb (embryo stage, denote time by 0), where stem cells of different organs divide and differentiate to develop the respective organs (see Weinberg [6], Chapter 10). If both the egg and the sperm generating the embryo carry mutant alleles of relevant cancer genes, then the individual is an $I_2$-stage person at the embryo stage; if only one of the germline cells (egg or sperm) generating the embryo carries mutant alleles of cancer genes, then at the embryo stage the individual is an $I_1$-stage person. Similarly, the individual is a normal person ($N = I_0$ person) at the embryo stage if neither the egg nor the sperm generating the embryo carries mutant alleles of cancer genes. Refer to a person in the population as an $I_i$ ($i = 0, 1, 2$) person if he/she is an $I_i$-stage person at the embryo stage. Then, with respect to the cancer development in question, people in the population can be classified into 3 types: normal people ($N = I_0$ people), $I_1$ people, and $I_2$ people. Based on this classification, for normal people in the population the stochastic model of carcinogenesis is a $k$-stage multievent model given by $I_0 \to I_1 \to \cdots \to I_k \to T$; for $I_1$ people in the population the stochastic model of carcinogenesis is a $(k-1)$-stage multievent model given by $I_1 \to I_2 \to \cdots \to I_k \to T$; and for $I_2$ people in the population, the stochastic model of carcinogenesis is a $(k-2)$-stage multievent model given by $I_2 \to \cdots \to I_k \to T$.
To account for inherited cancer cases, let $p_1$ be the proportion of $I_1$ people in the population and $p_2$ the proportion of $I_2$ people in the population. In large human populations under steady-state conditions, one may practically assume that $p_i$ is a constant independent of time (Crow and Kimura [21]). Then $p_0 = 1 - p_1 - p_2$ is the proportion of normal people (i.e., $N = I_0$ people) in the population. Let $n$ be the population size and $n_i$ ($i = 0, 1, 2$) the number of $I_i$ people in the population so that $\sum_{i=0}^{2} n_i = n$. Assume that $n$ is very large and that marriage between people in the population is random with respect to cancer genes; then, as shown in Crow and Kimura [21] (see also Tan [22], Chapter 2), the conditional probability distribution of $(n_1, n_2)$ given $n$ is 2-dimensional multinomial with parameters $(n, p_1, p_2)$. That is,
$$P\{n_1, n_2 \mid n\} = \frac{n!}{n_0!\, n_1!\, n_2!}\; p_0^{n_0}\, p_1^{n_1}\, p_2^{n_2}, \qquad n_0 = n - n_1 - n_2.$$
To derive the probability distribution of time to cancer under the above model, observe that during pregnancy the proliferation rates of all stem cells are quite high. Thus, with positive probability $I_2$ people in the population may acquire additional genetic and/or epigenetic changes during pregnancy to become $I_3$-stage people at birth. Similarly, $I_1$ people may acquire genetic and/or epigenetic changes during pregnancy to become $I_2$ people at birth; although the probability is very small, normal people at the embryo stage may acquire some genetic and/or epigenetic changes during pregnancy to become $I_1$ people at birth. Because the probability of genetic and epigenetic changes is small, one may practically assume that an $I_i$ ($i = 0, 1, 2$) person at the embryo stage would only give rise to $I_i$ stem cells and possibly $I_{i+1}$ stem cells at birth. This is equivalent to assuming that $I_i$ people at the embryo stage would not generate $I_{i+j}$ ($j > 1$) stem cells at or before birth. This model is represented schematically in Figure 4.
Notice that if $k = 2$, one may practically assume that with probability one an $I_2$ person at the embryo stage would develop cancer at or before birth ($t_0$). If $k = 3$, then with probability $\alpha$ ($\alpha > 0$), an $I_2$ person at the embryo stage would develop cancer at or before birth.
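The genetic segregation model of this section can be illustrated with a short numerical sketch. It evaluates the multinomial probability of the genotype counts for a small population and, as an additional assumption that the text does not state explicitly, derives the genotype proportions from a single mutant-allele frequency `q` under Hardy-Weinberg proportions (random mating). The function names and the value of `q` are hypothetical and chosen only for illustration.

```python
import math


def hardy_weinberg(q):
    """Genotype proportions (p0, p1, p2) for mutant-allele frequency q.

    Assumption (not from the paper): Hardy-Weinberg proportions, so that
    p2 = q^2 (both gametes mutant) and p1 = 2q(1 - q) (one gamete mutant).
    """
    return (1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2


def multinomial_pmf(n0, n1, n2, p1, p2):
    """P(n1, n2 | n) for n = n0 + n1 + n2 and p0 = 1 - p1 - p2."""
    p0 = 1.0 - p1 - p2
    n = n0 + n1 + n2
    coef = math.factorial(n) // (
        math.factorial(n0) * math.factorial(n1) * math.factorial(n2)
    )
    return coef * p0 ** n0 * p1 ** n1 * p2 ** n2


# Illustrative allele frequency only; not an estimate from the SEER data.
p0, p1, p2 = hardy_weinberg(1e-3)

# Sanity check: the probabilities over all genotype counts sum to one.
n = 20
total = sum(
    multinomial_pmf(n - a - b, a, b, p1, p2)
    for a in range(n + 1)
    for b in range(n - a + 1)
)
```

The sketch uses exact integer arithmetic for the multinomial coefficient, which is adequate for small `n`; for population sizes of the order seen in incidence data, a log-gamma formulation would be needed.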

The Stochastic Process of Carcinogenesis with Hereditary Cancer Cases and Mathematical Analysis
Because tumors develop from primary $I_k$ cells, for the above stochastic model the identifiable response variables are $T(t)$ and $\{I_j(t; i),\ i = 0, 1, 2,\ j = i, i+1, \ldots, k-1\}$, where $T(t)$ is the number of cancer tumors at time $t$ and $I_j(t; i)$ is the number of $I_j$ ($j = i, i+1, \ldots, k-1$) cells at time $t$ in people who are $I_i$ people at the embryo stage (see [3,5,8,23], Remarks 1 and 2). For people who have genotype $I_i$ ($i = 0, 1, 2$) at the embryo stage, the stochastic model of carcinogenesis is then given by the stochastic process $\{X_i(t), T(t), t > 0\}$, where $X_i(t) = \{I_j(t; i),\ j = i, i+1, \ldots, k-1\}'$. For these processes, in the next subsections, we will derive stochastic equations for the state variables ($I_j(t; i)$, $i = 0, 1, 2$, $j = i, \ldots, k-1$); we will also derive the probability distributions of these state variables and the probabilities of developing cancer tumors. These are the basic approaches for modeling carcinogenesis used by the first author and his associates; see Tan [3], Tan et al. [4,5,8,23], Tan and Zhou [9], Tan and Yan [16], and Tan and Chen [24,25] and Remark 3.
Remark 2. At any time (say $t$) the total number of $I_k$ cells is equal to the total number of $I_k$ cells generated from $I_{k-1}$ cells at time $t$ plus the total number of $I_k$ cells generated by cell division from other $I_k$ cells at time $t$; the former $I_k$ cells are referred to as primary $I_k$ cells while the latter are not primary $I_k$ cells. Since each tumor develops from a single primary $I_k$ cell through a stochastic birth and death process, each primary $I_k$ cell will generate at most one tumor. It follows that at any time the total number of $I_k$ cells is considerably greater than the number of cancer tumors (see also Yang and Chen [11]). Thus, for generating cancer tumors the only identifiable state variables are the number of $I_j$ cells ($j = 0, 1, \ldots, k-1$) and the number of detectable cancer tumors.
Remark 3. To model stochastic multistage models of carcinogenesis, the standard traditional approach is to assume that the last stage cells (i.e., the $I_k$ cells in the model $N \to I_1 \to \cdots \to I_k \to T$) grow instantaneously into a cancer tumor as soon as they are generated and then apply the standard Markov theory to $T(t)$ and to the state variables $X(t) = \{I_j(t),\ j = 0, 1, \ldots, k-1\}$. This approach has been described in detail in Tan [2], Little [1], and Zheng [7]; see also Luebeck and Moolgavkar [26] and Durrett et al. [27]. However, in some cases the assumption of instantaneous growth of $I_k$ cells into cancer tumors may not be realistic (Klebanov et al. [17], Yakovlev and Tsodikov [18], and Fakir et al. [19]); in these cases, $T(t)$ is not Markov so that the Markov theory method is not applicable to $T(t)$. To develop analytical results and to resolve many difficult issues, Tan and his associates [4,5,24] have proposed an alternative approach through stochastic equations and have followed Yang and Chen [11] in assuming that cancer tumors develop by clonal expansion from primary last stage cells. Through the probability generating function method, Tan and Chen [24] have shown that if the Markov theory is applicable to $T(t)$, then the stochastic equation method is equivalent to the classical Markov theory method but is more powerful. Also, through the stochastic equation method we have shown in the Appendix that the classical approach provides a close approximation to the discrete time model under the assumption that the primary last stage cells develop into a detectable tumor in one time unit. This provides a reasonable explanation of why the traditional approach (see [2,22]) can still work well even though the Markov assumption for $T(t)$ may not hold. In this paper we will thus basically use the stochastic equation method and assume that cancer tumors develop from primary last stage cells through clonal expansion.
Remark 4. Because genetic changes and epigenetic changes occur during cell division, to the order of $o(\Delta t)$, the probability that one $I_j$ cell at time $t$ gives rise to one $I_j$ cell and one $I_{j+1}$ cell at time $t + \Delta t$ by genetic or epigenetic changes is the $I_j \to I_{j+1}$ transition rate at time $t$ multiplied by $\Delta t$. It follows that the transition $I_j \to I_{j+1}$ does not affect the population size of $I_j$ cells but only increases the size of the $I_{j+1}$ population.
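The bookkeeping described in Remark 4 can be made concrete with a small discrete-time simulation. This is only a sketch under simplifying assumptions that are not taken from the paper: time is discretized into unit steps, each $I_j$ cell divides, dies, or stays with fixed per-step probabilities, and a transition event produces one new $I_{j+1}$ cell while leaving the parent $I_j$ cell in place. The function name `step` and all rate values are hypothetical, and the proliferation of the $I_{j+1}$ cells themselves is deliberately left out.

```python
import random


def step(i_j, i_next, birth, death, mutate, rng):
    """Advance the I_j population by one time unit.

    Each I_j cell independently divides (prob. birth), dies (prob. death),
    or stays unchanged. Independently of that, with prob. mutate it yields
    one I_{j+1} cell; the transition itself does not remove the I_j cell.
    Returns the new (I_j count, I_{j+1} count).
    """
    new_j = 0
    new_next = i_next
    for _ in range(i_j):
        u = rng.random()
        if u < birth:
            new_j += 2          # division: two I_j daughters
        elif u < birth + death:
            pass                # death or differentiation: cell is lost
        else:
            new_j += 1          # cell persists unchanged
        if rng.random() < mutate:
            new_next += 1       # one new I_{j+1} cell, parent unaffected
    return new_j, new_next


rng = random.Random(1)
# With no birth or death, transitions leave the I_j count untouched.
unchanged_j, gained_next = step(100, 0, 0.0, 0.0, 1.0, rng)
```

With `birth = death = 0` and `mutate = 1`, the call returns 100 cells of each type, which is exactly the statement of Remark 4 that the transition only enlarges the next-stage population.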

The Probability Distributions of the Number of Detectable Tumors and Times to Tumors.
As shown by Yang and Chen [11], malignant cancer tumors arise from primary $I_k$ cells by clonal expansion, where primary $I_k$ cells are $I_k$ cells generated directly by $I_{k-1}$ cells. ($I_k$ cells derived by stochastic birth of other $I_k$ cells are not primary $I_k$ cells.) That is, cancer tumors develop from primary $I_k$ cells through stochastic birth-death processes.
To derive the probability distribution for $T(t)$ in $I_i$ people in the population, let $P_T(s, t)$ denote the probability that a primary cancer cell at time $s$ develops into a detectable cancer tumor by time $t$. (An explicit formula for $P_T(s, t)$ has been given in Tan [22], Chapter 8, and in Tan and Chen [24].) Then, as shown in Tan ( [3,22] ( Let $Q_i(j)$ be the probability that cancer tumors develop during $(t_{j-1}, t_j]$ in $I_i$ people in the population. For time homogeneous models with small  −1 , $Q_i(j)$ is then given by where To derive $Q_i(j)$, denote by and define the functions Applying results of [ −1 (; )] given in (11), for time homogeneous models with   ̸ =   if  ̸ =  we obtain the $Q_i(j)$'s as follows.
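To give a feel for how a single primary cell can either die out or expand into a clone, the following sketch evaluates the classical survival probability of a time-homogeneous linear birth-death process started from one cell. This is not the paper's formula for the probability of a detectable tumor (that formula is given in Tan [22], Chapter 8); it is a simplified stand-in in which a clone counts as a potential tumor as long as it has not gone extinct. The rates `b` and `d` are hypothetical per-unit-time birth and death rates.

```python
import math


def extinction_prob(b, d, t):
    """P(clone started from one cell at time 0 is extinct by time t).

    Standard result for a linear birth-death process with constant birth
    rate b and death rate d.
    """
    if b == d:
        return b * t / (1.0 + b * t)
    g = math.exp((b - d) * t)
    return d * (g - 1.0) / (b * g - d)


def survival_prob(b, d, t):
    """P(clone is still alive at time t)."""
    return 1.0 - extinction_prob(b, d, t)


# For b > d the long-run survival probability tends to 1 - d/b.
long_run = survival_prob(0.10, 0.05, 1000.0)
```

The limiting value `1 - d/b` shows why only a fraction of primary last-stage cells ever give rise to a tumor, in line with the observation that the number of tumors is much smaller than the number of last-stage cells.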

Probability Distribution of Observed Cancer Incidence Incorporating Hereditary Cancer Cases
To estimate unknown parameters and to validate the model, one needs real data generated from the model. For studies of carcinogenesis such data are usually given by cancer incidence. For example, in the SEER data of NCI/NIH of USA, the data are given by $\{(y_0, n_0), (y_j, n_j),\ j \ge 1\}$, where $y_0$ is the number of cancer cases at birth and $n_0$ the total number of births, and where for  ≥ where The expected number of $y_0$ given $n_0$ is ( Hence, for the 2-stage model (i.e., $k = 2$) or the 3-stage model (i.e., $k = 3$), the maximum likelihood estimate of   is χ =  0 and the deviance  0 () from the conditional probability distribution of $y_0$ given $n_0$ is
Note 1: for age 0 and 1-10 years old, the cancer incidences are derived by subtracting the incidence of retinoblastoma from the original SEER data (see Tan and Zhou [9]).
Note 2: the observed uveal melanoma incidence rates per $10^6$ individuals are derived from the SEER eye cancer incidence by subtracting the retinoblastoma incidence as given in Tan and Zhou [9].
The probability density function $P(y_j \mid n_j)$ given by (34) is a mixture of Poisson probability density functions with the mixing probability density function given by the multinomial probability distribution of $\{n_{0j}, n_{1j}\}$ given $n_j$. This mixing probability density function represents individuals with different genotypes at the embryo stage in the population.
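The structure of such a Poisson mixture with multinomial mixing weights can be sketched numerically as follows. This is a toy version, not the paper's equation (34): it assumes that, given the genotype counts, the number of cases is Poisson with mean equal to the sum of the genotype counts times a per-person risk. The names `probs` (genotype proportions) and `risks` (per-person risk for each genotype) and their values are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of y events with mean lam (lam > 0)."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def multinomial_pmf(counts, probs):
    """Multinomial probability of counts with cell probabilities probs (> 0)."""
    log_p = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)


def mixture_pmf(y, n, probs, risks):
    """P(y | n): Poisson case counts mixed over genotype counts (n0, n1, n2)."""
    total = 0.0
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            counts = (n - n1 - n2, n1, n2)
            lam = sum(c * q for c, q in zip(counts, risks))
            total += multinomial_pmf(counts, probs) * poisson_pmf(y, lam)
    return total


# Illustrative values only (not estimates from the SEER data).
probs = (0.90, 0.08, 0.02)   # proportions of I_0, I_1, I_2 people
risks = (0.01, 0.20, 0.90)   # assumed per-person risk by genotype
prob_no_cases = mixture_pmf(0, 30, probs, risks)
```

The double loop enumerates every genotype configuration, so it is only practical for small `n`; it is meant to show the mixture structure rather than to be used on incidence data with large numbers of people at risk.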

The Fitting of the Model to Cancer Incidence and the Generalized Bayesian Inference Procedure
Given the model in Sections 2 and 3 and cancer incidence, one may use results in Section 4 to fit the model. By using this model and the distribution results in Section 4, one can readily estimate the unknown genetic parameters, predict cancer incidence, and check the validity of the model by using the generalized Bayesian inference and Gibbs sampling procedures; for more detail, see Tan [3,22] and Tan et al. [4,5].
The generalized Bayesian inference is based on the posterior distribution $P\{\Theta \mid N, Y, \tilde{y}, \tilde{n}\}$ of $\Theta$ given $\{N, Y, \tilde{y}, \tilde{n}\}$. This posterior distribution is derived by combining the prior distribution $P\{\Theta\}$ of $\Theta$ with the joint probability distribution $P\{N, Y, \tilde{y} \mid \tilde{n}, \Theta\}$ given by (42). It follows that this inference procedure combines information from three sources: (1) previous information and experience about the parameters in terms of the prior distribution $P\{\Theta\}$ of the parameters, (2) biological information on inherited cancer cases via genetic segregation of cancer genes in the population ($P\{N \mid \tilde{n}, p_i, i = 1, 2\}$; see Section 2), and (3) information from the expanded data ($Y$) and the observed data ($\tilde{y}$) via the statistical model from the system ($P\{Y, \tilde{y} \mid N, \Theta\}$) given by (37) and (40). Because of the additional information from the genetic segregation of the cancer genes, this inference procedure provides an efficient way to extract information on the effects of the genotypes of individuals at the embryo stage.

The Prior Distribution of the Parameters.
For the prior distribution of $\Theta$, because biological information has suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume that the prior density of $\Theta$ is proportional to a positive constant if these parameters satisfy some biologically specified constraints and is equal to zero otherwise. These biological constraints are as follows.
We will refer to the above prior as a partially informative prior which may be considered as an extension of the traditional noninformative prior given in Box and Tiao [28].
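A prior of this kind, flat inside a biologically plausible region and zero outside it, can be sketched as a log-density that returns a constant inside the bounds and negative infinity elsewhere. The parameter names and the numerical bounds below are hypothetical placeholders; the paper's actual constraints are not reproduced here.

```python
import math

# Hypothetical bounds, for illustration only.
BOUNDS = {
    "mutation_rate": (1e-8, 1e-2),
    "net_proliferation_rate": (0.0, 1.0),
}


def log_prior(theta, bounds=BOUNDS):
    """Log of a flat prior restricted to a box.

    Returns 0.0 (log of a positive constant, up to normalization) when every
    parameter lies within its bounds, and -inf otherwise.
    """
    for name, (lower, upper) in bounds.items():
        if not (lower <= theta[name] <= upper):
            return -math.inf
    return 0.0


inside = log_prior({"mutation_rate": 1e-5, "net_proliferation_rate": 0.02})
outside = log_prior({"mutation_rate": 0.5, "net_proliferation_rate": 0.02})
```

In a sampling scheme, a proposal with `log_prior` equal to negative infinity is simply rejected, which is how the bounds enter the posterior.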

The Multilevel Gibbs Sampling Procedure for Estimating Unknown Parameters.
Given the posterior probability distributions, we will use the following multilevel Gibbs sampling procedure to derive estimates of the parameters. We notice that, numerically, the Gibbs sampling procedure given below is equivalent to the EM-algorithm from the sampling theory viewpoint, with Steps 1 and 2 as the E-Step and with Steps 3 and 4 as the M-Step, respectively [29]. These multilevel Gibbs sampling procedures are given by the following.
Step 1 (Generating $N$ Given $(Y, \tilde{y}, \tilde{n}, \Theta)$ (The Data-Augmentation Step 1)). Given $\Theta$ and given $\tilde{n}$, use the multinomial distribution of $\{n_{1j}, n_{2j}\}$ given $n_j$ in Section 3 to generate a large sample of $N$. Then combine this large sample with $P\{Y, \tilde{y} \mid N, \tilde{n}, \Theta\}$ in (37) and (40) to select $N$ through the weighted bootstrap method due to Smith and Gelfand [30]. This selected $N$ is then a sample from $P\{N \mid Y, \tilde{y}, \tilde{n}, \Theta\}$ even though the latter is unknown. (For proof, see Tan [22], Chapter 3.) Call the generated sample N.
The proof of convergence of the above steps can be derived by using the procedure given in Tan ([22], Chapter 3). At convergence, the $\Theta = \{p_i, i = 1, 2, \alpha, \Theta_1\}$ are generated values from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$ independently of $(N, Y)$ (for proof, see Tan [22], Chapter 3). Repeating the above procedures, one then generates a random sample of $\Theta$ from the posterior distribution of $\Theta$ given $\{\tilde{y}, \tilde{n}\}$; one then uses the sample means as the estimates of $\Theta$ and the sample variances and covariances as estimates of the variances and covariances of these estimates.
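The weighted bootstrap idea used in Step 1 can be sketched in a few lines: draw candidates from a distribution that is easy to sample, weight each candidate by the likelihood of the observed data, and resample in proportion to those weights. The toy setting below is an assumption made for illustration and is much simpler than the paper's model: a single age group, carriers drawn from a binomial distribution, and a Poisson likelihood whose mean depends on the number of carriers. All names and numerical values are hypothetical.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log Poisson probability of y events with mean lam (lam > 0)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def weighted_bootstrap(candidates, log_weights, size, rng):
    """Resample candidates with probability proportional to exp(log_weights)."""
    m = max(log_weights)                      # stabilize the exponentials
    weights = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(candidates, weights=weights, k=size)


rng = random.Random(0)
n, p1 = 200, 0.05          # people at risk, assumed carrier proportion
q0, q1 = 0.01, 0.5         # assumed per-person risk: non-carriers, carriers
y_obs = 20                 # assumed observed number of cases

# 1. Draw candidate carrier counts from the binomial segregation model.
candidates = [sum(rng.random() < p1 for _ in range(n)) for _ in range(2000)]
# 2. Weight each candidate by the likelihood of the observed count.
log_w = [poisson_logpmf(y_obs, q1 * c + q0 * (n - c)) for c in candidates]
# 3. Resample; the result approximates draws from the conditional distribution.
posterior = weighted_bootstrap(candidates, log_w, 2000, rng)

prior_mean = sum(candidates) / len(candidates)
posterior_mean = sum(posterior) / len(posterior)
```

Because the assumed observed count is larger than what the prior carrier proportion alone would predict, the resampled carrier counts shift upward relative to the candidates, which is the qualitative behavior one expects from conditioning on the data.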

A New Multistage Stochastic Model for Adult Eye Cancer (Uveal Melanoma): An Example
The human eye cancers consist of pediatric eye cancers and adult eye cancers. The most common pediatric eye cancer is the retinoblastoma which develops from the retinal pigment epithelium cells underlying the retina that do not form melanoma. The most common adult eye cancers are the uveal melanomas involving the iris, the ciliary body, and the choroid (collectively referred to as the uvea). These cancers develop from melanocytes (pigment cells) which reside within the uvea, giving color to the eye. In Tan and Zhou [9] we have developed a modified two-stage model for retinoblastoma. Based on results from molecular biology (see Landreville et al. [14], Mensink et al. [15], and Loercher and Harbour [31]), Landreville et al. [14] have proposed a three-stage model for uveal melanoma as given in Figure 2. As an example of applications of this paper, in this section we will apply this model of uveal melanoma to the NCI/NIH eye cancer data from the SEER project. We notice that the same methods can be applied to model other human cancers as well, but this will be our future research. Given in Table 1 are the numbers of people at risk and the eye cancer cases in the age groups together with the predicted cases from the models. These data give cancer incidence at birth and incidence for 85 age groups, with each group spanning a 1-year period except the last age group (≥85 years old). For human eye cancer, because the incidence at birth and for age groups from 1 to 10 years old is basically generated by the pediatric eye cancer retinoblastoma (see [9]), to account for inherited cancer cases of uveal melanomas, the incidence for age 0 (birth) and for age periods from 1 to 10 years old in Table 1 for uveal melanoma is derived by subtracting the incidence of retinoblastoma from the SEER data (see Tan and Zhou [9]).
To fit the data, we let one time unit be 6 months after birth and let $t_0 = 1$. To compare different models and to assess different assumptions, we will consider the following two 3-stage mixture models: (1) the complete 3-stage mixture model (Model-F), in which no assumptions are made on the parameters; (2) the Type-1 3-stage mixture model (Model-1), in which we assume that  1 = 0 and that normal people and $I_1$ people at the embryo stage will remain normal people and $I_1$ people, respectively, at birth. For comparison purposes, we also fit a 2-stage model as defined in Tan and Zhou [9]. We will apply the methods in Section 5 to fit these models to the SEER data given in Table 1. Given in Table 2 are the natural logs of the likelihood functions, the AIC (Akaike Information Criterion), and the BIC (Bayesian Information Criterion) for these models. Given in Table 3 are the estimates of parameters in the 3-stage models. Given in Figure 5 are the plots of predicted cancer cases from the 3-stage mixture models (Model-F and Model-1) and the 2-stage model. For comparison purposes, in Table 1 we also provide the numbers of predicted cancer cases from the 3-stage mixture models and the 2-stage model together with the observed cancer cases over time from SEER. From these results, we have made the following observations.
(a) As shown by the results in Table 1 and Figure 5, both Model-F and Model-1 fitted the SEER data well, although Model-1 fitted the data slightly better judging by the values of AIC and BIC. The Chi-square test statistics $\chi^2 = \sum_{j=0}^{85} (y_j - \hat{y}_j)^2 / \hat{y}_j$ for Model-F and Model-1 are 88.43 and 94.48, respectively, giving a $p$-value of 0.12 (df = 86 − 12 = 74) for Model-F and a $p$-value of 0.11 (df = 86 − 7 = 79) for Model-1. On the other hand, the 2-stage model fitted the data very poorly; the Chi-square statistic for the 2-stage model is 2747.69, giving a $p$-value less than $10^{-3}$. The AIC and BIC values of Model-1 (AIC = 2609.53, BIC = 2631.51) are slightly smaller than those of Model-F; however, the AIC and BIC values (8796.84, 8811.57) of the two-stage model are considerably greater than those of the 3-stage models. These results suggest that uveal melanoma may best be described by a 3-stage model with an inherited component and that one may practically assume  1 = 0 and that normal people and $I_1$ people at the embryo stage will remain normal people and $I_1$ people, respectively, at birth.
(b) From Table 3, it is observed that the estimate of  1 is close to zero (the estimate is of order $10^{-5}$), indicating that the phenotype of $I_1$ is almost identical to that of $N = I_0$ and further confirming that the stage-limiting genes are basically tumor suppressor genes and that there is no haploinsufficiency for these tumor suppressor genes. On the other hand, the estimate of  2 is of order $10^{-2}$, which is about $10^{3}$ times greater than those of cells with genotype $I_1$.
(d) From Table 3, the estimates of $p_1$ and $p_2$ from the SEER data are of orders $10^{-4} \sim 10^{-3}$ and $10^{-7} \sim 10^{-6}$, respectively. This indicates that in the US population, the frequency of the stage-limiting cancer gene for uveal melanoma is approximately $10^{-3}$. Table 3 also shows that the estimate of $\alpha$ is 0.8411, indicating that most individuals with genotype $I_2$ would develop cancer at birth. This may help to explain why there are observed cancer incidences at birth for uveal melanoma in the SEER data even though the estimate of the frequency $p_2$ is of order $10^{-7} \sim 10^{-6}$.
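The model-comparison quantities used in the observations above (the Pearson Chi-square statistic, AIC, and BIC) follow standard definitions and can be computed as in the sketch below. The function names are illustrative, and the small input vectors are made-up numbers used only to exercise the functions; they are not the SEER counts or the fitted values of Table 1.

```python
import math


def pearson_chi_square(observed, predicted):
    """Sum over groups of (observed - predicted)^2 / predicted."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, predicted))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + (parameters) * ln(n_obs)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Made-up illustration: two age groups, one predicted slightly off.
chi2 = pearson_chi_square([10, 20], [10.0, 16.0])
# Made-up log-likelihood with 3 parameters and 86 data points.
example_aic = aic(-100.0, 3)
example_bic = bic(-100.0, 3, 86)
```

Smaller AIC and BIC values indicate a better trade-off between fit and the number of parameters; BIC penalizes each additional parameter by ln(n_obs) rather than 2, so it favors smaller models more strongly once there are more than about 7 data points.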

Discussion and Conclusions
To account for inherited cancer cases, in this paper we have developed some general multistage models involving hereditary cancer cases. For human cancer incidence, these models are basically generalized mixture models. In these mixture models, the mixing probability is a multinomial distribution to account for genetic segregation of the stage-limiting tumor suppressor genes. This mixture model allows us to estimate for the first time the frequency of the stage-limiting tumor suppressor gene in human populations. As an example of applications, in this paper we have developed a general 3-stage stochastic multistage model of carcinogenesis for adult human eye cancer. To account for inherited cancer cases in the stochastic model of human eye cancer, we have also developed a generalized mixture model for uveal melanoma in human beings.
To fit the proposed models to the cancer incidence data, in this paper we have developed a generalized Bayesian inference procedure to estimate the unknown parameters and to predict cancer cases. This inference procedure is advantageous over the classical sampling theory inference (i.e., the maximum likelihood method) because the procedure combines

Figure 2: A Multistage Model of uveal melanoma (adult human eye cancer).

Figure 4: Embryo genotypes and their frequencies at embryo stage and at birth.

Figure 5: Curve fitting of SEER data by the Model-F, Model-1, and the two-stage model.

Table 3: Estimates of parameters for the 3-stage stochastic models. : assumed nonexistence.
Figure 1: Histopathology lesions and genetic pathway of squamous cell carcinoma of non-small cell lung cancer (NSCLC).
1, $y_j$ is the number of cancer cases developed during the $j$th age group of a one-year period (or 5-year periods) and $n_j$ is the number of noncancer people who are at risk for cancer and from whom $y_j$ have developed cancer during the $j$th age group. Given in Table 1 are the SEER data of uveal melanoma (adult eye cancer) during the period 1973-2007. In Table 1, notice that there are some cancer cases at birth, implying some inherited cancer cases. In this section, we will develop a statistical model for these types of data sets from the stochastic multistage model with hereditary cancers as given in Section 2. As in previous sections, let $n_{ij}$ be the number of individuals who have genotype $I_i$ ($i = 0, 1, 2$) at the embryo stage among the $n_j$ people at risk for the cancer in question. Then, as shown above, $(n_{1j}, n_{2j}) \mid n_j \sim \text{Multinomial}\{n_j; p_1, p_2\}$. It follows that $n_{ij} \mid n_j \sim \text{Binomial}\{n_j, p_i\}$, $i = 0, 1, 2$. In what follows, we let $Y_j$ denote the random variable for $y_j$ unless otherwise stated.
4.1. The Probability Distribution of $Y_0$.
As shown in Figure 4, $I_i$ ($i = 0, 1, 2$) people would only generate $I_i$ stage cells and $I_{i+1}$ stage cells at birth. Thus, for cancers to develop at or before birth, the number of stages for the stochastic model of carcinogenesis must be 3 or less. It follows that if $y_0 > 0$, the appropriate model of carcinogenesis must be either a 2-stage model or a 3-stage model. Since $n_{20} \mid n_0 \sim \text{Binomial}(n_0; p_2)$ and $n_{10} \mid (n_0, n_{20}) \sim \text{Binomial}(n_0 - n_{20};\ p_1/(1 - p_2))$, the probability distribution of $Y_0$ is therefore

Table 1: The SEER incidence data of uveal melanoma from NCI/NIH (over all races and genders).

Table 2: The log-likelihood, AIC, and BIC of the fitted models.