Computing Individual Risks Based on Family History in Genetic Disease in the Presence of Competing Risks

When considering a genetic disease with variable age at onset (e.g., familial amyloid neuropathy, cancers), computing the individual risk of the disease based on family history (FH) is of critical interest for both clinicians and patients. Such a risk is very challenging to compute because (1) the genotype X of the individual of interest is in general unknown, (2) the posterior distribution ℙ(X∣FH, T > t) changes with t (T is the age at disease onset for the targeted individual), and (3) the competing risk of death is not negligible. In this work, we present modeling of this problem using a Bayesian network mixed with (right-censored) survival outcomes where hazard rates only depend on the genotype of each individual. We explain how belief propagation can be used to obtain posterior distribution of genotypes given the FH and how to obtain a time-dependent posterior hazard rate for any individual in the pedigree. Finally, we use this posterior hazard rate to compute individual risk, with or without the competing risk of death. Our method is illustrated using the Claus-Easton model for breast cancer. The competing risk of death is derived from the national French registry.


Introduction
Complex diseases with variable age at onset typically involve many interacting factors such as age, lifestyle, environmental exposures, treatments, and inherited genetic components. The genetic component generally consists of one or several genes, including major genes, for which a deleterious mutation significantly raises the risk of the disease, and/or minor genes, whose individual contribution to the disease is moderate.
The mode of inheritance is monogenic if a mutation in a single gene is transmitted, or polygenic if mutations in several genes are transmitted. As an example of a major gene in a complex disease, the BRCA1 gene has been known since the 1990s to be strongly associated with ovarian and breast cancer [1,2]. Carriers of a deleterious mutation in BRCA1 have a much higher risk of being affected, with relative risks ranging from 20 to 80, but deleterious mutations in BRCA1 only explain 5 to 10% of the disease [3], as many other implicated genes (known or unknown) exist along with sporadic cases (cases with no inherited component).
In other rare genetic diseases such as Transthyretin-related Hereditary Amyloidosis (THA), no sporadic cases are found; the incidence is therefore zero among noncarriers, and all affected individuals are necessarily carriers of a deleterious mutation [4,5].
The family history (FH) of such diseases is often the first tool for clinicians to detect a family of carriers of a deleterious mutation as any unusual accumulation of cases in relatives leads to suspect a deleterious allele in the family. With the appropriate model and computation, the FH can be used to better target the most appropriate individuals for a genetic testing and/or to identify high-risk individuals who require special attention (monitoring and/or treatments).
The first challenge in computing such a model comes from the fact that genotypes are mostly (if not totally) unobserved and that posterior carrier probability computations must sum over a large number of configurations of the familial founders' genotypes. Once such computations are carried out, deriving posterior individual disease risk is also a challenging task since the posterior carrier distribution changes over time and must be accounted for. Finally, for diseases with possibly late age at onset (e.g., cancer), the competing risk of death is not negligible and must be accounted for.
Computational and Mathematical Methods in Medicine
A competing risk situation occurs when an event (called a competing event) precludes the occurrence of the event of interest. This is typically the case for late-onset diseases as the risk of death is not negligible for advanced age. Ignoring the risk of death would amount to assuming that death cannot happen and would therefore lead to overestimating the cumulative incidence (the probability of having the disease before any time point). Famous examples of such situations include dementia where the patients are of a particularly advanced age and have a high risk of dying as in Jacqmin-Gadda et al. [6] or Wanneveich et al. [7], or studies on geriatric patients (see, e.g., [8]).
Classical familial risk models such as Claus-Easton [9,10], BOADICEA [11], or the BayesMendel models (BRCAPRO, MMRpro, PancPRO, and MelaPRO, see [12]) do not take into account the competing event of death. As a result, it is likely that individual predictions will tend to be overestimated from these models [13]. The main result of the present work is that we show how to derive individual risk predictions from the family history while taking into account the competing risk of death, which is a new contribution to the best of our knowledge.
Another interesting point is that, unlike most similar publications, we provide all the necessary details to integrate the likelihood over the unobserved genotypes and to compute posterior genotype distributions using Bayesian networks and sum-product algorithms. One should note that these models and algorithms are often used in the context of genetics (see [14][15][16][17][18], for a few examples) but are rarely fully detailed (see, e.g., [12]).
It should also be noted that the genetics community usually prefers to rely on simple peeling algorithms rather than Bayesian network for pedigree computations but the two concepts are in fact totally equivalent, and the sum-product algorithm presented in this paper can indeed be seen as a simple Bayesian network based reformulation of the most general peeling-based algorithm developed so far [19].
The paper is organized as follows: firstly, in Section 2.1, we introduce a formal generic Bayesian network model adaptable to any genetic disease with variable age at onset. Secondly, in Section 2.2, we provide in this context all the necessary details to carry out belief propagation on this model and express the marginal posterior carrier distribution using the Bayesian network's potentials. Thirdly, in Section 2.3, we give closed-form formulas for the posterior individual disease risk and introduce a simple numerical algorithm that takes into account the competing risk of death. Finally, in Section 3, all the methods are illustrated with the Claus-Easton model for breast cancer using the disease model and the parameters of Claus et al. [9] and Easton et al. [10]. In particular, individual predictions obtained with and without the competing risk of death are compared, which emphasizes the importance of properly taking into account the competing risk of death in such models.

Materials and Methods
In this section, we first introduce our model (Section 2.1) as a Bayesian network. We next explain how to perform belief propagation in order to obtain posterior carrier distributions (Section 2.2). Finally, we provide all the details needed to derive disease risk predictions from these posterior distributions, including taking into account the competing risk of death (Section 2.3).

The Bayesian Network.
We consider a total of $n$ (related) individuals. With $I = \{1, \dots, n\}$, we denote by $F \subset I$ the subset of the founders (i.e., individuals without ancestors in the pedigree) and by $I \setminus F$ the set of nonfounders (i.e., with ancestors in the pedigree). Let $X = (X_1, \dots, X_n) \in \{00, 01, 10, 11\}^n$ be the genotypes of the whole family (for the sake of simplicity, we consider here a simple biallelic gene, but multiallelic genes can easily be considered), where $X_i$ denotes the genotype of Individual $i$. Let $T = (T_1, \dots, T_n) \in \mathbb{R}^n$ be the time vector representing the age at diagnosis of all individuals. The joint distribution of $(X, T)$ is given by

$$P(X, T) = \underbrace{\prod_{i \in F} P(X_i) \prod_{i \in I \setminus F} P(X_i \mid X_{\mathrm{pa}_i})}_{\text{genetic part}} \times \underbrace{\prod_{i \in I} P(T_i \mid X_i)}_{\text{survival part}}, \tag{1}$$

where $\mathrm{pa}_i \subset I$ denotes the parental set of Individual $i$. This corresponds to the definition of a Bayesian network (BN). See Koller and Friedman [20] for more details. The genetic part of (1) only relies on the "classical" Mendelian assumption that the distribution of a nonfounder genotype only depends on the parental genotypes. The survival part makes the strong assumption that all $T_i$ are conditionally independent given $X$. This assumption is clearly not true when considering any other familial effect on the disease (e.g., polygenic effect and environmental exposure), which is often taken into account using a familial random effect (often called frailty in the survival context). Such a familial random effect is, for example, assumed to account for a polygenic effect in the BOADICEA model [11,21]. Note that, for the sake of simplicity, the symbol "P" corresponds throughout the whole paper either to a probability measure or to a density.
The extension of the present model to frailty models such as BOADICEA is clearly possible and, in many ways, quite straightforward. However, for the sake of simplicity, we focus here on a simpler model and will briefly discuss the extension in the conclusion section. Note that even with the strong assumption that $T_i$ only depends on $X_i$, since (the basically unobserved) $X$ has a strong correlation structure within the pedigree, so does $T$.
We can see in Figure 1 an example of a moderate size (hypothetical) family with a severe history of breast and ovarian cancer. This family has a total of $n = 12$ individuals. There is no inbreeding (mating between individuals with a common ancestor) in this family but there is a mating loop (two families joined more than once by mating) due to the two brothers of the first nuclear family having children with two sisters of the second nuclear family. Such a looped pedigree can be tricky to represent, which explains why Individual 7 appears twice (with an identity link) in Figure 1.
One should note that loops in pedigree are not the same as cycles in the Bayesian networks framework in the sense that the underlying conditional dependence structure of the model remains a proper directed acyclic graph even in the presence of pedigree with loops.
Genetic Part. For the genetic part, we assume that founders' genotypes are distributed according to the Hardy-Weinberg distribution with disease allele frequency $q$. It means that for any founder $i \in F$ we have $P(X_i = 00) = (1 - q)^2$, $P(X_i = 01) = P(X_i = 10) = q(1 - q)$, and $P(X_i = 11) = q^2$. This assumption is extremely frequent in family genetics and usually reasonable since it corresponds to the stationary distribution we observe in a population under mild assumptions. However, one should note that other distributions can easily be considered if necessary (e.g., genotype 11 forbidden because it is lethal). For nonfounders we simply assume a Mendelian transmission of the alleles, but unbalanced transmission patterns can also be considered.
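As an illustration of the genetic part, the following minimal sketch encodes the Hardy-Weinberg founder distribution and balanced Mendelian transmission. It assumes the genotype coding suggested later in the text (first digit for the paternal allele, second for the maternal allele); the function names are hypothetical, and the allele frequency 0.0033 is the value quoted for the Claus-Easton model.

```python
# Genotype coding assumed here: first digit = paternal allele, second = maternal allele.
GENOTYPES = ("00", "01", "10", "11")


def founder_prior(q):
    """Hardy-Weinberg genotype distribution for a disease allele of frequency q."""
    return {"00": (1 - q) ** 2, "01": q * (1 - q), "10": q * (1 - q), "11": q ** 2}


def transmit(parent, allele):
    """Probability that `parent` passes `allele` ('0' or '1') under balanced transmission."""
    return parent.count(allele) / 2.0


def child_given_parents(child, father, mother):
    """P(X_child | X_father, X_mother): paternal allele from father, maternal from mother."""
    return transmit(father, child[0]) * transmit(mother, child[1])


prior = founder_prior(0.0033)
print(prior)
# A heterozygous father and a noncarrier mother: chance of a paternal-carrier child.
print(child_given_parents("10", "01", "00"))
```

Unbalanced transmission or a lethal genotype would only change `transmit` or `founder_prior`; the rest of the model is unaffected.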
The genetic part of the model can also be easily extended to account for various constraints. For example, the presence of monozygotic twins, say Individuals $i$ and $j$, only requires one to add an identity variable between the two genotypes, that is, a binary variable in $\{0, 1\}$ that equals 1 with conditional probability $\mathbf{1}\{X_i = X_j\}$ given $(X_i, X_j)$. Genetic tests (with or without error) can also be incorporated as additional variables whose conditional distribution given $X_i$ encodes the test specificity and sensitivity. Finally, assuming lethal genotypes (e.g., genotype 11) is done straightforwardly by setting to 0 the probability of carrying such a genotype. This is equivalent to working conditionally on $\{X_i \neq 11 \text{ for all } i\}$, which obviously alters all genotype distributions, including Hardy-Weinberg for founders.
Survival Part. We assume an autosomal dominant model where noncarriers have a disease incidence $\lambda_0(t)$ and carriers have a disease incidence $\lambda_1(t)$. This simple assumption results in the following expression of the survival part of the model:

$$P(T_i > t \mid X_i) = \begin{cases} S_0(t) & \text{if } X_i = 00, \\ S_1(t) & \text{otherwise}, \end{cases} \qquad P(T_i = t \mid X_i) = \begin{cases} \lambda_0(t) S_0(t) & \text{if } X_i = 00, \\ \lambda_1(t) S_1(t) & \text{otherwise}, \end{cases} \tag{2}$$

where $S_k(t) = \exp\left(-\int_0^t \lambda_k(u)\,du\right)$ for $k \in \{0, 1\}$. As explained above, the symbol "P" corresponds to a (conditional) probability measure for the event $\{T_i > t\}$ and to a density for the point event $\{T_i = t\}$.
For example, in the context of the THA, noncarriers cannot be affected ($\lambda_0(t) \equiv 0$) and only carriers have an age-dependent incidence. In the context of breast cancer, $\lambda_0(t)$ might be the incidence for non-BRCA carriers and $\lambda_1(t)$ the incidence for BRCA carriers (BRCA1 or BRCA2).
Of course, the simple model suggested in (2) can easily be extended to account for other genetic models (e.g., recessive, additive, gonosomal (i.e., nonautosomal), and with parent-of-origin effect) as well as for any known covariates (e.g., BMI, smoking, and other diseases) using a classical proportional hazard model.
Hazard rates $\lambda_0(t)$ and $\lambda_1(t)$ are typically described in the literature as piecewise constant hazards (PCHs), but our model allows for any parametric or nonparametric shape as long as hazard rates are provided (e.g., hazard rates of Weibull distributions and Gaussian survival).
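The survival part can be sketched for piecewise constant hazards as follows. This is only an illustration under stated assumptions: the cut points and rates are hypothetical (they are not the Claus-Easton values), each rate is taken to apply on an interval open on the left and closed on the right, and the helper names are invented for the example. The last function returns the contribution of one individual to the likelihood, that is, a density for an affected individual and a survival probability for a right-censored one.

```python
import math
from bisect import bisect_left


def hazard(t, cuts, rates):
    """Piecewise constant hazard: rates[j] applies on ]cuts[j], cuts[j+1]] (last one open-ended)."""
    j = bisect_left(cuts, t) - 1
    return rates[min(max(j, 0), len(rates) - 1)]


def cumulative_hazard(t, cuts, rates):
    """Integral of the hazard over [0, t]."""
    total = 0.0
    for j, rate in enumerate(rates):
        lo = cuts[j]
        hi = cuts[j + 1] if j + 1 < len(cuts) else math.inf
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total


def survival(t, cuts, rates):
    """S(t) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t, cuts, rates))


def personal_history_likelihood(age, affected, cuts, rates):
    """Density lambda(age) * S(age) if affected at `age`, S(age) if unaffected at `age`."""
    s = survival(age, cuts, rates)
    return hazard(age, cuts, rates) * s if affected else s


CUTS = [0.0, 30.0, 50.0]              # hypothetical cut points (years)
NONCARRIER = [0.0001, 0.001, 0.003]   # hypothetical noncarrier hazard
CARRIER = [0.001, 0.02, 0.03]         # hypothetical carrier hazard

print(personal_history_likelihood(40.0, True, CUTS, CARRIER))
print(personal_history_likelihood(40.0, False, CUTS, NONCARRIER))
```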

Carrier Risk.
For every Individual $i$, let us denote by $\mathrm{PH}_i$ their personal history of the disease. If Individual $i$ was diagnosed with the disease at age $t_i$, we have $\mathrm{PH}_i = \{T_i = t_i\}$. If Individual $i$ was unaffected at age $t_i$ (age at the last follow-up), the variable $T_i$ is right-censored and we have $\mathrm{PH}_i = \{T_i > t_i\}$. From now on, we denote by FH the family history of the disease. This includes the personal history of all individuals and all possible additional constraints or information (e.g., monozygotic twins, genetic tests, and lethal alleles). Formally, we can define $\mathrm{FH} = \bigcup_{i \in I} (\mathrm{PH}_i \cup \{X_i \in \mathcal{X}_i\})$, where $\mathcal{X}_i \subset \{00, 01, 10, 11\}$ is the subset of allowed values for $X_i$ (e.g., $\mathcal{X}_i = \{00, 01, 10\}$ if we know that genotype 11 is lethal and $\mathcal{X}_i = \{00\}$ if we know that a particular individual is a noncarrier). Even with genetic testing, it is essential to understand that $X$ is, at best, partially observed. Indeed, even with a (hypothetical and unrealistic) 100% specificity/sensitivity test, a positive heterozygous carrier status cannot distinguish between genotypes 01 and 10. Moreover, genetic tests are in general only available for a few individuals in the whole pedigree. Accounting for the unobserved genotypes is therefore of utmost importance.
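Following the classical BN notations, we write the so-called evidence $P(\mathrm{FH})$ as the following sum-product of potentials:

$$P(\mathrm{FH}) = \sum_{X} \prod_{i \in I} K_i(X_i \mid X_{\mathrm{pa}_i}), \tag{3}$$

where the potentials are defined by

$$K_i(X_i \mid X_{\mathrm{pa}_i}) = \mathbf{1}\{X_i \in \mathcal{X}_i\}\, P(X_i \mid X_{\mathrm{pa}_i})\, P(\mathrm{PH}_i \mid X_i),$$

where $P(\mathrm{PH}_i \mid X_i)$ is either $P(T_i = t_i \mid X_i)$ or $P(T_i > t_i \mid X_i)$ and can be obtained through (2). Note that $\mathrm{pa}_i \subset I$ denotes the parental set of Individual $i$ (empty for founders) and that $X_J = (X_j)_{j \in J}$ for any $J \subset I$. As explained above, any additional information or constraint can and should be added directly into the potentials.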
Since $X$ has $4^n$ possible configurations in the worst case, it is clearly impossible to simply enumerate these configurations even for moderate size pedigrees (e.g., for $n = 10$ or $n = 20$). We therefore need a more efficient algorithm to compute (3). An efficient solution is provided by the Elston-Stewart algorithm [22] in the particular (and frequent) case where the pedigree has no loop. The basic idea is to eliminate variables from the sum-product (peeling in the Elston-Stewart literature) from the last generations up to the oldest common ancestor. The resulting complexity $O(n \times 4^3)$ clearly allows one to deal with arbitrary pedigree size as long as there is no loop.
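To make the idea of peeling concrete, the sketch below compares a brute-force enumeration of the evidence with an elimination of the children first in a single nuclear family (two founders and two children). It is a toy example under stated assumptions: the per-genotype likelihood values are hypothetical, the helper names are invented, and the genotype coding (paternal allele first) is assumed. Both computations should agree up to rounding, while the second never enumerates the joint configurations of the two children.

```python
from itertools import product

GENOTYPES = ("00", "01", "10", "11")


def founder_prior(q):
    """Hardy-Weinberg distribution for disease allele frequency q."""
    return {"00": (1 - q) ** 2, "01": q * (1 - q), "10": q * (1 - q), "11": q ** 2}


def child_given_parents(child, father, mother):
    """Balanced Mendelian transmission (paternal allele first)."""
    return (father.count(child[0]) / 2.0) * (mother.count(child[1]) / 2.0)


def carrier_lik(noncarrier_value, carrier_value):
    """Per-genotype likelihood of a personal history under a dominant model."""
    return {g: (noncarrier_value if g == "00" else carrier_value) for g in GENOTYPES}


def evidence_brute_force(prior, lik):
    """Sum over all 4^4 configurations of (father, mother, child 1, child 2)."""
    total = 0.0
    for f, m, c1, c2 in product(GENOTYPES, repeat=4):
        total += (prior[f] * lik[0][f] * prior[m] * lik[1][m]
                  * child_given_parents(c1, f, m) * lik[2][c1]
                  * child_given_parents(c2, f, m) * lik[3][c2])
    return total


def evidence_peeling(prior, lik):
    """Sum each child out separately for every parental pair, then sum over the parents."""
    total = 0.0
    for f, m in product(GENOTYPES, repeat=2):
        term = prior[f] * lik[0][f] * prior[m] * lik[1][m]
        for child_lik in lik[2:]:
            term *= sum(child_given_parents(c, f, m) * child_lik[c] for c in GENOTYPES)
        total += term
    return total


prior = founder_prior(0.0033)
# Hypothetical likelihood values for father, mother, and two children.
lik = [carrier_lik(0.95, 0.6), carrier_lik(0.002, 0.02),
       carrier_lik(0.001, 0.03), carrier_lik(0.98, 0.8)]
print(evidence_brute_force(prior, lik), evidence_peeling(prior, lik))
```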
Unfortunately, loops (inbreeding or mating) are not uncommon in pedigrees and therefore have to be accounted for. A simple extension of the Elston-Stewart algorithm consists in using loop breakers: working conditionally on a small number $\ell$ of key genotypes, which can then be considered as duplicated individuals with known genotypes in a pedigree with no loop. In the context of Bayesian networks, computing $P(\mathrm{FH})$ (and, in fact, the whole $P(X, \mathrm{FH})$ distribution) is typically done through belief propagation (BP) (also called the sum-product algorithm) with an $O(n \times 4^k)$ complexity, where $k$ is the tree-width of the graphical model (see [20] for more details). For a pedigree with no loop, $k = 3$ and the BP complexity is strictly the same as Elston-Stewart, but for more complex pedigrees, $k$ usually increases much more slowly than $\ell + 3$ and, as a result, BP is often dramatically faster than Elston-Stewart with loop breakers.
In order to achieve this, BP basically eliminates variables from the sum-product of (3) in a suitable order. In that sense, it is very similar to the notion of cutset long used to compute likelihoods in complex pedigrees (see [23], for a recent reference on the MENDEL package). But BP has the notable advantage of yielding the full posterior distribution $P(X \mid \mathrm{FH})$ for the same algorithmic complexity, while likelihood-based approaches need to repeat many cutset eliminations to achieve the same results. As a consequence, it should not be surprising to see that, in parallel with the classical genetic literature [22][23][24], many authors have been using BP and BN to deal with genetic models [14][15][16][17][18].
Let us finally point out that the genetics community has put considerable effort into developing Elston-Stewart-type algorithms as counterparts to Bayesian network methods, claiming that peeling-based algorithms are more natural for geneticists than junction-tree based ones. Note however that the most general version of these peeling algorithms [19] is in fact exactly equivalent to the classical junction-tree based forward/backward algorithm presented below.
For completeness, we will now briefly recall all the minimal necessary results to implement BP in the context of our model. We nevertheless encourage the interested reader to refer to more classical references like Lauritzen and Sheehan [17] or Koller and Friedman [20] for more details.
Variable Elimination and Junction Tree. As an example, we consider the pedigree of Figure 1 and want to compute $P(\mathrm{FH})$ by successive variable elimination. We use the following elimination order: $X_9$, $X_{10}$, $X_{11}$, $X_{12}$, $X_{1,2}$, $X_5$, $X_6$, and $X_{3,4,7,8}$. Each step sums the product of the potentials involving the eliminated variables over their values and leaves a new factor on the remaining neighboring variables. We can therefore obtain $P(\mathrm{FH})$ by considering only $6 \times 4^3 + 2 \times 4^4 = 896$ configurations out of the $4^{12} \simeq 16.8 \times 10^6$ total number of $X$ configurations. Note that a memory bounded version of the variable elimination exists; see Darwiche [25] for more details. Figure 2 is a graphical representation of this particular sequence of elimination and is also a junction tree, defined as a set of cliques $C_1, \dots, C_m$ with $C_j \subset \{X_1, \dots, X_n\}$ with the following properties: (i) Tree: each clique $C_j$ is connected to a subsequent clique $C_{\mathrm{to}_j}$ with $\mathrm{to}_j \in \{j + 1, \dots, m\}$ ($\mathrm{to}_m = \text{root}$ by convention). We also define $\mathrm{from}_j = \{i, \mathrm{to}_i = j\}$ ($\mathrm{from}_1 = \emptyset$) and $S_j = C_j \cap C_{\mathrm{to}_j}$ (with the convention that $S_m = \emptyset$). In graph theory, junction trees are used as an auxiliary structure for many applications (e.g., graph coloring). The proof that any elimination sequence gives a junction tree can be found in Koller and Friedman [20]. The tree-width of an elimination sequence/junction tree is defined as the size of its largest clique. Finding the elimination sequence with the smallest tree-width is NP-hard in general, but many heuristics are available [20]. The elimination order of Figure 2 has been obtained using the well-known minimum fill-in heuristic.
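The configuration count quoted above can be checked with a few lines. The sketch assumes only what the text states, namely six elimination steps involving three genotypes and two steps involving four genotypes; which step has which size is not specified here, and the function name is hypothetical.

```python
def elimination_cost(clique_sizes, n_states=4):
    """Number of configurations visited when each step enumerates n_states ** size terms."""
    return sum(n_states ** size for size in clique_sizes)


clique_sizes = [3] * 6 + [4] * 2   # six steps over 3 genotypes, two steps over 4 genotypes
cost = elimination_cost(clique_sizes)
total = 4 ** 12                    # full enumeration for 12 individuals
print(cost, total, total / cost)
```

The ratio between the two counts gives an idea of the saving brought by a good elimination order on this 12-individual pedigree.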
Belief Propagation. We assume that a suitable elimination order/junction tree has been obtained. For all $j \in \{1, \dots, m\}$ we hence define the potential of clique $C_j$ as $\Phi_j(X_{C_j}) = \prod_{i \in C_j^*} K_i(X_i \mid X_{\mathrm{pa}_i})$, where $C_j^*$ denotes the set of individuals whose potential is assigned to clique $C_j$, and we have the following result.
Using Theorem 1, it is therefore possible to obtain $P(\mathrm{FH})$ and all $P(X_i \mid \mathrm{FH})$ by recursively computing all forward and backward quantities once.
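For reference, the block below recalls the textbook (Shafer-Shenoy) form of the forward and backward recursions on a junction tree. It is written with assumed notation: cliques $C_1, \dots, C_m$ ordered so that each $C_j$ points to a later clique $C_{\mathrm{to}_j}$, $\mathrm{from}_j = \{i : \mathrm{to}_i = j\}$, separators $S_j = C_j \cap C_{\mathrm{to}_j}$ (empty at the root $C_m$), and clique potentials $\Phi_j$, each individual potential being assigned to exactly one clique. The names $F_j$ and $B_j$ are illustrative; this is a sketch of the standard result and may differ in form from the statement of Theorem 1.

```latex
% Forward pass (leaves to root), for j = 1, ..., m:
\[
F_j(X_{S_j}) = \sum_{X_{C_j \setminus S_j}} \Phi_j(X_{C_j})
               \prod_{i \in \mathrm{from}_j} F_i(X_{S_i}),
\qquad P(\mathrm{FH}) = F_m .
\]
% Backward pass (root to leaves), with B_m = 1, for j = m-1, ..., 1:
\[
B_j(X_{S_j}) = \sum_{X_{C_{\mathrm{to}_j} \setminus S_j}}
               \Phi_{\mathrm{to}_j}(X_{C_{\mathrm{to}_j}})\,
               B_{\mathrm{to}_j}(X_{S_{\mathrm{to}_j}})
               \prod_{i \in \mathrm{from}_{\mathrm{to}_j},\, i \neq j} F_i(X_{S_i}) .
\]
% Clique-level posterior; P(X_i | FH) follows by summing out the other variables:
\[
P(X_{C_j} \mid \mathrm{FH}) =
  \frac{\Phi_j(X_{C_j}) \prod_{i \in \mathrm{from}_j} F_i(X_{S_i})\, B_j(X_{S_j})}
       {P(\mathrm{FH})} .
\]
```

Because $\mathrm{to}_i > i$, every forward quantity needed for $F_j$ is available when the cliques are processed in increasing order, and symmetrically for the backward pass in decreasing order.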

Disease Risk.
While the previous section covered the computation of the posterior probability $P(X_i \mid \mathrm{FH})$ for all individuals in the pedigree, we now focus on computing individual posterior disease risks, with or without the competing risk of death.
Risk without Competing Events. We consider an Individual $i$ with a posterior carrier probability $\pi$ at age $\tau$; that is, $\pi = P(X_i \neq 00 \mid \mathrm{FH}, T_i > \tau)$. Conditionally on the family history, we denote the survival and hazard functions by $S$ and $\lambda$, respectively, such that, for $t \geq \tau$, $S(t) = P(T_i > t \mid \mathrm{FH}, T_i > \tau)$ and $S(t) = \exp\left(-\int_\tau^t \lambda(u)\,du\right)$. We have the following result.

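Theorem 2.
For any $t \geq \tau$, we have

$$S(t) = \pi \frac{S_1(t)}{S_1(\tau)} + (1 - \pi) \frac{S_0(t)}{S_0(\tau)}, \qquad \lambda(t) = \frac{\pi\, \lambda_1(t)\, S_1(t)/S_1(\tau) + (1 - \pi)\, \lambda_0(t)\, S_0(t)/S_0(\tau)}{S(t)},$$

where $S_k(t) = \exp\left(-\int_0^t \lambda_k(u)\,du\right)$ denotes the survival function associated with the hazard $\lambda_k$, $k \in \{0, 1\}$.

Risk with Death as a Competing Event. As explained in the introduction, death precludes the occurrence of the disease. This needs to be taken into account by defining the hazard rate of the disease conditionally on the fact that neither disease nor death has occurred yet. From a statistical point of view, such a situation can be seen as a competing risk situation or as an illness-death model; see Andersen et al. [26] or Andersen and Keiding [27] for a presentation of such models. We define $T^*$ as the minimum between age at disease onset and age at death and we keep the notation $T$ to denote the age at disease onset. Given an individual with a family history FH, its hazard rate for the disease is defined as

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T^* \geq t, \mathrm{FH})}{\Delta t}.$$

We denote by $\lambda^*$ and $S^*$ the hazard and survival functions of $T^*$ (conditionally on the family history) and we assume that $\lambda$ and $\lambda^*$ are piecewise constant with common cuts $\tau = c_0 < c_1 < \dots < c_N$ (i.e., $\lambda(t) = \lambda_j$ and $\lambda^*(t) = \lambda^*_j$ for $t \in\, ]c_{j-1}, c_j]$).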
Proof. See Appendix B.
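The risk without competing events can be sketched numerically as follows. Under the model assumptions (the age at onset of an individual depends on the rest of the family only through that individual's genotype), the posterior survival is a two-component mixture of the carrier and noncarrier survival functions, and the posterior carrier probability is updated by Bayes' rule as the individual stays unaffected. The constant hazards used here are hypothetical and the function names are invented; the starting point (24% at age 40) echoes Remark 4, but the numbers produced will not match the text because the hazards differ.

```python
import math


def posterior_survival(t, tau, pi, S0, S1):
    """P(T > t | FH, T > tau) as a mixture over carrier status, for t >= tau."""
    return pi * S1(t) / S1(tau) + (1 - pi) * S0(t) / S0(tau)


def posterior_carrier_probability(t, tau, pi, S0, S1):
    """P(X != 00 | FH, T > t): Bayes update of pi after staying unaffected up to t."""
    return pi * S1(t) / S1(tau) / posterior_survival(t, tau, pi, S0, S1)


def posterior_hazard(t, tau, pi, S0, S1, lam0, lam1):
    """Time-dependent posterior hazard: carrier and noncarrier hazards weighted at time t."""
    p_t = posterior_carrier_probability(t, tau, pi, S0, S1)
    return p_t * lam1(t) + (1 - p_t) * lam0(t)


# Hypothetical constant hazards (per year), for illustration only.
LAM0, LAM1 = 0.002, 0.03
S0 = lambda t: math.exp(-LAM0 * t)
S1 = lambda t: math.exp(-LAM1 * t)
lam0 = lambda t: LAM0
lam1 = lambda t: LAM1

tau, pi = 40.0, 0.24
for age in (40.0, 60.0, 80.0):
    print(age,
          posterior_carrier_probability(age, tau, pi, S0, S1),
          1.0 - posterior_survival(age, tau, pi, S0, S1))
```

The second printed column is the updated carrier probability and the third is the disease risk between age `tau` and `age`, ignoring death.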
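Practical Computations. We assume that one individual has a carrier probability $\pi$ at age $\tau$ (the age up to which he/she is unaffected in the FH). We denote by $\lambda^{\text{death}}$ his/her hazard of death. Then the posterior disease risk with the competing risk of death can be computed through the following steps: (1) Choose a fine enough discretization $\tau = c_0 < c_1 < \dots < c_N = t_{\max}$ (e.g., all $c_j - c_{j-1} = 0.1$ year).
(4) Then the marginal posterior probability of being diagnosed with the disease before age $c_j$, in the presence of death as a competing risk, is given for $j = 1, \dots, N$ by

$$P(T \leq c_j, T = T^* \mid \mathrm{FH}, T^* > \tau) = \sum_{l=1}^{j} \frac{\lambda_l}{\lambda^*_l} \left( S^*(c_{l-1}) - S^*(c_l) \right), \qquad S^*(c_l) = \exp\left( -\sum_{r=1}^{l} \lambda^*_r\, (c_r - c_{r-1}) \right),$$

where $\lambda_l$ and $\lambda^*_l$ denote the (piecewise constant) hazards of the disease and of $T^* = \min(\text{disease onset}, \text{death})$ on $]c_{l-1}, c_l]$, and $S^*$ is the survival function of $T^*$.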
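The computation with death as a competing risk can be sketched as below for piecewise constant hazards on a regular grid. The sketch relies on the standard competing-risks identity that the hazard of the first event is the sum of the disease and death hazards, and it assumes that the death hazard is the same for carriers and noncarriers, so that the posterior disease hazard can be supplied interval by interval. The function name and the constant hazards are hypothetical.

```python
import math


def cumulative_incidence(disease_hazards, death_hazards, step):
    """Probability of being diagnosed before each cut point, death acting as a competing risk.

    disease_hazards[j] and death_hazards[j] are constant on the j-th interval of length `step`.
    """
    surv, cum, out = 1.0, 0.0, []
    for lam, mu in zip(disease_hazards, death_hazards):
        total = lam + mu                          # hazard of the first event (disease or death)
        next_surv = surv * math.exp(-total * step)
        if total > 0.0:
            cum += lam / total * (surv - next_surv)
        surv = next_surv
        out.append(cum)
    return out


n = 100  # 100 intervals of 0.1 year, i.e., a 10-year horizon
with_death = cumulative_incidence([0.01] * n, [0.01] * n, 0.1)
without_death = cumulative_incidence([0.01] * n, [0.0] * n, 0.1)
print(with_death[-1], without_death[-1])
```

Setting the death hazard to zero recovers the risk without competing events, which is always at least as large as the risk computed with death.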

The Claus-Easton Model.
In order to illustrate our method, we will use the model of illness and the parameters of the Claus-Easton model developed from the Cancer and Steroid Hormone Study in the 1990s [9,10]. The Claus-Easton model is a classical genetic model composed of a genotypic part and a phenotypic part with only the family history (FH) as covariate. It assumes an autosomal dominant mode of inheritance and a piecewise constant hazard rate by steps of 10 years. The penetrance ($F(t) = 1 - S(t)$) and the density ($f(t) = \lambda(t) S(t)$) are given in Table 2 from Easton et al. [10] for both carriers and noncarriers at ages 25, 35, . . . , 85. The hazard rates can therefore be derived from these data using the formula $\lambda(t) = f(t)/(1 - F(t))$.
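The conversion from penetrance and density to hazard rate is a one-line formula; a minimal sketch follows. The table values below are hypothetical placeholders and are not the published Claus-Easton figures; the function name is invented for the example.

```python
def hazard_from_penetrance(density, penetrance):
    """Hazard rate f(t) / (1 - F(t)) from the density f(t) and the penetrance F(t)."""
    if not 0.0 <= penetrance < 1.0:
        raise ValueError("penetrance must lie in [0, 1)")
    return density / (1.0 - penetrance)


# Hypothetical (age, penetrance, density) triples, for illustration only.
table = [(35, 0.05, 0.004), (45, 0.15, 0.012), (55, 0.30, 0.014)]
hazards = {age: hazard_from_penetrance(f, F) for age, F, f in table}
print(hazards)
```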
The results of these computations are given in Table 1. The frequency of the mutated allele has been estimated at $q = 0.0033$ [9]. The death incidences needed in the competing risk section are given in Table 2. Figure 3 presents the incidence and survival for BC (carriers and noncarriers) as well as death. We can notice that the breast cancer incidences in carriers are always much higher than in noncarriers at any age and that the relative risk between carriers and noncarriers is especially large (RR > 50) before age 40 (see Table 1) but then decreases with aging. We notice that the death incidence stays above the BC incidence for noncarriers at all ages and even exceeds the BC incidence for carriers from age 80. This shows the importance of taking it into consideration, especially over a certain age.

Carrier Risk.
In this section we use belief propagation in Bayesian networks to obtain the posterior distribution of individual genotypes given the FH. We get the posterior probabilities of each genotype (noncarrier, heterozygous carrier with a paternal mutated allele, heterozygous carrier with a maternal mutated allele, and homozygous carrier). Figure 4 represents the marginal posterior probability $P(X_i = x \mid \mathrm{FH})$ for all Individuals $i$ and for $x = 10$ (paternal carrier) and $x = 01$ (maternal carrier). Note that the posterior probability of the homozygous carrier genotype ($X_i = 11$), being almost zero for each individual, is not shown here. The posterior probability of the noncarrier genotype can be easily deduced.
We can notice that the probabilities of being a noncarrier for Individuals 1, 3, 4, 7, 8, and 9 are all by far the highest despite the severe phenotype of relatives (granddaughter, niece, or daughter). This result is consistent with the personal history of Individual 2 (ovarian cancer at age 51), which points her out as the most likely origin of the mutation in the family. Let us note that since we have no additional information on the ancestors of Individual 2, it is impossible to determine whether her mutation was transmitted by her father or her mother. As a consequence, the posterior carrier probability is equally shared between the paternal and maternal carrier genotypes.
Figure 4: Marginal posterior carrier probabilities for the individuals of the hypothetical family (Figure 1). The posterior probabilities of being a paternal carrier $P(X_i = 10 \mid \mathrm{FH})$ and a maternal carrier $P(X_i = 01 \mid \mathrm{FH})$ are colored in black and in grey, respectively. The deleterious allele being very rare in the general population ($q = 0.33\%$), the probability of the homozygous carrier genotype is almost zero for each individual and is therefore not represented here.
Considering the severe personal history of cancer of Individuals 10 and 11, the most likely situation is that they both received the mutation of their grandmother through their respective fathers (Individuals 6 and 5, resp.). The posterior probabilities are clearly consistent with this scenario: Individuals 5 and 6 have a probability of ≃90% of being maternal carriers, and Individuals 10 and 11 have similar probabilities of being paternal carriers. Note that Individual 12, being unaffected at age 37 (which is not very informative), basically has a 50% chance of having received the mutation from her father.
Note that Individual 2 has a severe personal history of cancer (ovarian cancer at age 51) in all other examples. As a consequence, Individual 1, as a male with no personal history of cancer, is almost totally uninformative and is therefore not included in the forthcoming analyses.
Individual 4, having no children, is independent from the rest of the family conditionally on her phenotype and her parents' genotypes. With no information about her phenotype in any FH, her probability of being a carrier is therefore almost half her mother's in each FH (because her father is almost uninformative). If we compare the posterior distributions of the genotype of Individual 3 in FH2, FH3, and FH4, we can notice that the ovarian cancer of her mother, which increases her mother's probability of being a carrier, raises her own probability of being a carrier (FH2). A piece of protective information about her phenotype, such as no cancer until age 61, lowers it (FH3). On the contrary, the cancer at a young age of her daughter, which increases her daughter's probability of being a carrier, raises her own probability of being a carrier (FH4-6).
Table 3: Product of the posterior marginal probabilities $P(X_2 \mid \mathrm{FH}) P(X_6 \mid \mathrm{FH})$ and joint posterior probability $P(X_2, X_6 \mid \mathrm{FH})$ in the context of known and unknown $X_3$. NC: noncarrier; C: carrier.
We also notice the causal relationships in a whole branch of the family with the transmission between Individuals 2, 3, and 6 of the deleterious allele being highly probable which raises the probability of being a carrier for Individual 3 even in the presence of a protective phenotype (unaffected at age 61) in FH4.
We finally observe the influence of the spouse's genotype when having children (FH5). The higher risk of being a carrier for Individual 5 (because of his cancer at age 72) strongly decreases the carrier probability of his spouse (in comparison with FH4) since the paternal origin of the disease mutation naturally becomes the most likely event. On the other side, the increase of risk for Individual 3 when suppressing her protective phenotype (FH6) also has a consequence on the marginal posterior distribution of her spouse in lowering his probability of being a carrier as his participation in the risk for their daughter is lowered.
To summarize, one's probability of being a carrier mainly depends on (1) one's probability of having at least one carrier parent, which is correlated to the history of cancer of one's ancestors, and (2) one's probability of having transmitted the mutation to one's offspring, which is correlated to the history of cancer of one's descendants and to one's spouse's probability of being a carrier.

Remark 4.
As introduced in Section 2.3, we know that posterior carrier probabilities should decrease with time for unaffected individuals. For example, if we assume that Individual 4 is unaffected at age 40 in FH6, her probability of being a carrier is 24%. If she stays unaffected up to age 60 and age 80, her probability of being a carrier decreases to 15% and 8.5%, respectively.
Table 3 gives a practical illustration of the dependence and conditional independence in a trio (grandparent-parent-child). We compare the posterior joint distribution and the product of the posterior marginal distributions of the genotypes of Individuals 2 and 6 when the genotype of Individual 3 is known or unknown.
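The dependence structure in a grandparent-parent-child trio can be illustrated by brute-force enumeration. The sketch below is a toy setting under stated assumptions: no phenotype information is used, the two spouses are unobserved founders, Individual 3 is taken to be the mother of Individual 6, and the helper names are hypothetical. It measures how far the joint distribution of the carrier statuses of Individuals 2 and 6 is from the product of its marginals, first with the genotype of Individual 3 unknown and then with it fixed.

```python
from itertools import product

GENOTYPES = ("00", "01", "10", "11")


def founder_prior(q):
    """Hardy-Weinberg distribution for disease allele frequency q."""
    return {"00": (1 - q) ** 2, "01": q * (1 - q), "10": q * (1 - q), "11": q ** 2}


def child_given_parents(child, father, mother):
    """Balanced Mendelian transmission (paternal allele first)."""
    return (father.count(child[0]) / 2.0) * (mother.count(child[1]) / 2.0)


def joint_carrier_table(q, x3_known=None):
    """Joint distribution of the carrier statuses of Individuals 2 and 6.

    Individuals 1 and 2 are the parents of 3; Individual 3 and a founder spouse s
    are the parents of 6. If x3_known is given, the table is conditional on X_3.
    """
    prior = founder_prior(q)
    table = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for x1, x2, x3, xs, x6 in product(GENOTYPES, repeat=5):
        if x3_known is not None and x3 != x3_known:
            continue
        p = (prior[x1] * prior[x2] * child_given_parents(x3, x1, x2)
             * prior[xs] * child_given_parents(x6, xs, x3))
        table[(int(x2 != "00"), int(x6 != "00"))] += p
    norm = sum(table.values())
    return {key: value / norm for key, value in table.items()}


def max_gap_to_product(table):
    """Largest |joint - product of marginals| over the four cells."""
    m2 = {a: table[(a, 0)] + table[(a, 1)] for a in (0, 1)}
    m6 = {b: table[(0, b)] + table[(1, b)] for b in (0, 1)}
    return max(abs(table[(a, b)] - m2[a] * m6[b]) for a in (0, 1) for b in (0, 1))


gap_unknown = max_gap_to_product(joint_carrier_table(0.0033))
gap_known = max_gap_to_product(joint_carrier_table(0.0033, x3_known="00"))
print(gap_unknown, gap_known)
```

The gap vanishes (up to rounding) once the intermediate genotype is fixed, which is the conditional independence that the Bayesian network encodes.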

Cancer Risk.
As in Section 2.3, we now consider a female individual who is unaffected at age $\tau$ (i.e., $\{T > \tau\} \subset \mathrm{FH}$) and denote by $\pi = P(X \neq 00 \mid \mathrm{FH})$ her posterior carrier probability. The purpose of this section is to compute the posterior risk of cancer for this individual (with or without the competing risk of death). As previously explained, these risks only depend on $\pi$ and $\tau$. Figure 6 represents the individual risk of breast cancer up to age 100 (note that we obtain qualitatively similar results with a lower age limit (e.g., age 80), but quantitative results are more illustrative with age 100) without the competing risk of death and for varying $\pi$ and $\tau$. We can see that the individual risk of BC rises as $\pi$ increases and $\tau$ decreases. This result is quite intuitive: the younger a patient is, the longer she will be at risk until age 100, and the greater her probability of carrying a deleterious allele, the greater her risk of developing a cancer.
As introduced in the previous section, the probability of being a carrier for an unaffected individual decreases with time if she stays unaffected. Assuming Individual 4 was 52 in FH4, Figure 7 shows the evolution of the probability of being a carrier for Individual 3 and Individual 4 in FH4. As they stay unaffected, we can clearly see the decrease of this probability, which has to be taken into account in the computation of the individual risk of breast cancer over time (see Section 2.3).
As explained in Section 2.3, computing risk with the competing risk of death requires a numerical discretization of age by a fixed step $\Delta$. In order to calibrate $\Delta$ we used $\Delta = 0.01$ as a reference and observed that $\Delta = 0.1$ is a reasonable balance between accuracy and computational efficiency (data not shown). Figure 8 represents the individual risk of breast cancer for Individual 7 ($\pi = 0.553\%$ and $\tau = 62$ years) and Individual 12 ($\pi = 44.6\%$ and $\tau = 37$ years) in our hypothetical family from $\tau$ to 100 years with and without taking into account the competing risk of death. We can see that the difference between the two curves for each individual increases with age. The age from which the difference becomes significant varies with the pair $(\pi, \tau)$. We also observe that the individual risk of breast cancer eventually reaches a plateau, which corresponds to the point where the incidence of breast cancer becomes negligible compared to the incidence of death in the elderly.
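The calibration of the discretization step can be reproduced in spirit with a small experiment. The sketch below is not the authors' procedure: it uses hypothetical smooth hazards (a linear disease hazard and a Gompertz-like death hazard), freezes both at the midpoint of each interval (an assumption of this example), and compares a step of 0.1 year against a reference step of 0.01 year between ages 40 and 100.

```python
import math


def risk_with_death(tau, t_max, step, disease_hazard, death_hazard):
    """Cumulative incidence of disease on ]tau, t_max], hazards frozen at interval midpoints."""
    n = round((t_max - tau) / step)
    surv, cum = 1.0, 0.0
    for j in range(n):
        mid = tau + (j + 0.5) * step
        lam, mu = disease_hazard(mid), death_hazard(mid)
        next_surv = surv * math.exp(-(lam + mu) * step)
        if lam + mu > 0.0:
            cum += lam / (lam + mu) * (surv - next_surv)
        surv = next_surv
    return cum


# Hypothetical smooth hazards (per year); neither comes from the tables of the paper.
disease = lambda t: 0.0001 * t
death = lambda t: 0.00005 * math.exp(0.085 * t)

reference = risk_with_death(40.0, 100.0, 0.01, disease, death)
coarse = risk_with_death(40.0, 100.0, 0.1, disease, death)
print(reference, coarse, abs(coarse - reference))
```

With hazards that vary slowly over a tenth of a year, the two step sizes should give nearly identical risks while the coarser grid needs ten times fewer intervals.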
Quantitatively, the importance of taking into account the competing risk of death is pointed out in Figure 9, which represents the difference between the individual risks of breast cancer up to the age of 100 years for various pairs $(\pi, \tau)$. For example, for Individual 3 in FH4 ($\pi = 76.59\%$ and $\tau = 61$, see Figure 5), the error made when calculating her individual risk of breast cancer up to the age of 100 years reaches almost 14%. Although the competing risk of death can have a limited effect on the global risk of cancer for certain pairs $(\pi, \tau)$, its effect is never totally negligible, and since we provide a rigorous way to take it into account, we strongly advocate its use in all circumstances.

Conclusions
We presented here a general model for genetic diseases with variable age at onset. This model, a Bayesian network, combines classical genetic modeling with survival analysis. To deal with the (mostly) unobserved genotypes, we first explained in detail how belief propagation can be used to perform likelihood and posterior probability computations. Secondly, we focused on the challenging problem of computing posterior individual disease risks, with or without taking into account the competing risk of death. Finally, we illustrated these results with the Claus-Easton model for breast cancer. The R source code is available upon request for interested readers.
For the sake of simplicity, we only considered a biallelic locus with standard distribution (autosomal, Hardy-Weinberg, and Mendelian allele transmission), but extensions (e.g., multiple loci, unbalanced allele transmission, and lethal genotypes) are straightforward. For the survival model, we presented a simple dominant effect without covariates, but again extensions to any proportional hazard model (e.g., recessive, additive, and with covariates) are easy to implement. Incorporating random effects (at the individual and/or familial level) in the model (as in the BOADICEA model, see [11,21]) is clearly also possible but slightly more challenging.
With random effects, the computation of posterior carrier distributions remains almost unchanged, except that the support of the random effect must be discretized (five values are claimed to be sufficient in the BOADICEA literature) and that belief propagation must be performed once for each possible value of the random effect. For posterior risks, calculations become slightly more complex since the posterior individual hazard must now be integrated over the time-varying posterior joint distribution of the individual genotype and of the random effect. Overall, all computations are slightly more intensive with random effects, but most results of Section 2.3 remain very similar.
One of the important limitations of the present work is that we assume that all model parameters are known. However, it should be noted that the likelihood and the conditional likelihood can be computed through belief propagation, which means that we basically provide all the necessary means to estimate the model parameters from actual data. In that context, it is nevertheless critical to deal efficiently with ascertainment issues, that is, with the fact that the families ending up in the database are usually precisely those with the most severe family history of the disease. Standard methods like the PEL [5], which are basically conditional likelihood computations, are known to deal relatively well with this problem.
In order to take into account the competing risk of death, we used death from all causes, which was obtained from registry data [28]. However, only death without cancer precludes the onset of cancer, so death from all causes is not exactly the quantity of interest. Since registry data usually do not report the causes of death, estimating the risk of death without cancer is a difficult task. This has been studied, for instance, in Wanneveich et al. [7] through an illness-death model, using registry data and differential equations to model the specific causes of death. Nevertheless, it is very likely that the gain in terms of predictions would be minor, as mortality from all causes is likely to be close to mortality without cancer.
Further work includes all the extensions described above (e.g., more complex genetic model, genetic tests, and familial random effects) as well as the development of a clinical web application for the Claus-Easton model in close collaboration with the cancer genetics department of the Institut Curie. From the methodological point of view, we plan to focus on the computation of more complex posterior distributions, such as the number of carriers in any subgroup of individuals and/or the familial posterior risk (time before any family member at risk is diagnosed).
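As a rough illustration of the integration described above, the following Python sketch computes the posterior hazard of a single disease-free individual when a discretized random effect is added. Everything specific here is an assumption made for the example: the random effect is taken to act multiplicatively on the hazard through exp(z), its support is five equally spaced points weighted by a normal density (not necessarily the scheme used in the BOADICEA literature), the genotype is reduced to carrier versus non-carrier with constant hazards, and `pi` denotes the carrier probability given the FH before conditioning on survival. The function names are hypothetical.

```python
import math

def discretized_normal(sigma, n_points=5, span=2.0):
    """Hypothetical discretization of a centred normal random effect.

    Returns support values in [-span*sigma, span*sigma] and weights
    proportional to the normal density, renormalised to sum to one.
    """
    if sigma == 0.0 or n_points == 1:
        return [0.0], [1.0]
    zs = [sigma * span * (2.0 * k / (n_points - 1) - 1.0) for k in range(n_points)]
    dens = [math.exp(-0.5 * (z / sigma) ** 2) for z in zs]
    total = sum(dens)
    return zs, [d / total for d in dens]

def posterior_hazard(t, pi, lam_carrier, lam_noncarrier, zs, ws):
    """Hazard at age t averaged over the posterior of (genotype, random effect)
    given that the individual is still disease-free at age t."""
    num = 0.0
    den = 0.0
    for lam, p_g in ((lam_carrier, pi), (lam_noncarrier, 1.0 - pi)):
        for z, w in zip(zs, ws):
            h = lam * math.exp(z)                   # hazard for this (genotype, z)
            joint = p_g * w * math.exp(-h * t)      # P(genotype, z, T > t)
            num += joint * h
            den += joint
    return num / den

zs, ws = discretized_normal(1.0)
hazards = [posterior_hazard(t, 0.3, 0.02, 0.002, zs, ws) for t in (0.0, 20.0, 40.0, 60.0)]
```

With constant hazards, the posterior hazard decreases with age in this sketch, because survival progressively shifts the posterior weight toward low-risk combinations of genotype and random effect. In a full pedigree, the same averaging would be combined with one belief propagation pass per support value, as stated above.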
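The likelihood computation mentioned above can be sketched on the smallest possible pedigree, a father-mother-child trio, where summing out the child genotype first and then the parental genotypes is a simple instance of the sum-product idea behind belief propagation. This is a toy example under stated assumptions, not the R implementation referred to in this paper: a biallelic autosomal locus with Hardy-Weinberg founders and Mendelian transmission, a dominant effect with constant hazards, and hypothetical function names and parameter values.

```python
import math
from itertools import product

def founder_prior(q):
    """Hardy-Weinberg probabilities for 0, 1, or 2 copies of the mutated allele."""
    return [(1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2]

def child_given_parents(gc, gf, gm):
    """Mendelian probability of child genotype gc given parental genotypes."""
    pf, pm = gf / 2.0, gm / 2.0  # probability that each parent transmits the mutation
    return [(1 - pf) * (1 - pm), pf * (1 - pm) + (1 - pf) * pm, pf * pm][gc]

def survival_evidence(age, affected, lam0, lam1):
    """P(phenotype | genotype): hazard^status * survival, dominant effect."""
    out = []
    for g in range(3):
        lam = lam1 if g >= 1 else lam0
        out.append((lam if affected else 1.0) * math.exp(-lam * age))
    return out

def trio_likelihood(q, ev_f, ev_m, ev_c):
    """Likelihood of the trio, with all genotypes summed out."""
    prior = founder_prior(q)
    total = 0.0
    for gf, gm in product(range(3), repeat=2):
        # Message from the child: its genotype is summed out first.
        msg = sum(child_given_parents(gc, gf, gm) * ev_c[gc] for gc in range(3))
        total += prior[gf] * ev_f[gf] * prior[gm] * ev_m[gm] * msg
    return total

def child_posterior(q, ev_f, ev_m, ev_c):
    """Posterior genotype distribution of the child given the whole trio."""
    prior = founder_prior(q)
    joint = [0.0, 0.0, 0.0]
    for gc in range(3):
        # Message from the parents: both parental genotypes are summed out.
        msg = sum(
            prior[gf] * ev_f[gf] * prior[gm] * ev_m[gm] * child_given_parents(gc, gf, gm)
            for gf, gm in product(range(3), repeat=2)
        )
        joint[gc] = msg * ev_c[gc]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical example: mother affected at 45, father unaffected at 70,
# daughter unaffected at 30; q, lam0, lam1 are illustrative values only.
q, lam0, lam1 = 0.003, 0.002, 0.02
ev_f = survival_evidence(70.0, False, lam0, lam1)
ev_m = survival_evidence(45.0, True, lam0, lam1)
ev_c = survival_evidence(30.0, False, lam0, lam1)
lik = trio_likelihood(q, ev_f, ev_m, ev_c)
post = child_posterior(q, ev_f, ev_m, ev_c)
```

Since `trio_likelihood` is an explicit function of `q`, `lam0`, and `lam1`, it could in principle be passed to a numerical optimizer to estimate parameters, with a conditional version (dividing by the likelihood of the ascertainment event) to handle the ascertainment issue discussed above.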

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.