The Prognostic Quality of Risk Prediction Models to Assess the Individual Breast Cancer Risk in Women: An Overview of Reviews

Purpose. Breast cancer is the most common cancer among women globally, with an incidence of approximately two million cases in 2018. Organised age-based breast cancer screening programs were established worldwide to detect breast cancer earlier and to reduce mortality. Currently, there is substantial anticipation regarding risk-adjusted screening programs, which consider various risk factors in addition to age. The present study investigated the discriminatory accuracy of breast cancer risk prediction models and whether they are suitable for risk-based screening programs. Methods. Following the PICO scheme, we conducted an overview of reviews and systematically searched four databases. All methodological steps, including the literature selection, data extraction and synthesis, and the quality appraisal, were conducted following the four-eyes principle. For the quality assessment, the AMSTAR 2 tool was used. Results. We included eight systematic reviews out of 833 hits based on the prespecified inclusion criteria. The eight systematic reviews comprised ninety-nine primary studies that were also considered for the data analysis. Three systematic reviews were assessed as having a high risk of bias, while the others were rated with a moderate or low risk of bias. Most identified breast cancer risk prediction models showed a low prognostic quality. Adding breast density and genetic information as risk factors only moderately improved the models' discriminatory accuracy. Conclusion. All breast cancer risk prediction models published to date show a limited ability to predict the individual breast cancer risk in women. Hence, it is too early to implement them in national breast cancer screening programs. Relevant randomised controlled trials about the benefit-harm ratio of risk-adjusted breast cancer screening programs compared to conventional age-based programs need to be awaited.


Introduction
Breast cancer is the most common cancer among women globally, with an incidence of approximately two million cases worldwide in 2018 [1]. In high-income countries, about 75% of breast cancer cases are diagnosed in postmenopausal women and only 5-7% affect women younger than 40 [2,3]. The illness is heterogeneous, encompassing various histological and molecular subtypes that stem from diverse aetiologies and differ in treatment response and prognosis [4,5]. Factors such as increasing age, high breast density, history of neoplastic breast disease, family history of breast cancer, genetic predispositions such as single nucleotide polymorphisms (SNPs; inherited and heritable variations of a single base pair in a complementary DNA double strand), as well as hormonal, lifestyle, or radiation exposure factors, can increase the risk of developing breast cancer [6][7][8][9][10][11]. Table 1 presents the criteria usually considered to identify women with an increased risk of developing breast cancer.
To date, great hope is placed in a risk-based screening approach. Since the early 1970s, organised breast cancer mammography screening programs have been established worldwide to reduce mortality by earlier cancer diagnosis [12][13][14][15]. The only risk factor considered so far in these programs is age. In risk-based screening, risk prediction models estimate the likelihood of women developing breast cancer in the future, considering other risk factors in addition to age [16][17][18]. By considering multiple risk factors, women could be stratified into different risk groups, which enables risk-adjusted screening strategies. For example, less frequent mammograms could be recommended for women with a low risk of breast cancer. Hence, risk-adjusted breast cancer screening might reduce the disadvantages of conventional age-based screening programs, e.g., overdiagnosis and overtreatment, or enable breast cancer diagnosis at an earlier stage [19][20][21][22][23].

Table 1: Criteria for a high risk of developing breast cancer.
Criterion | Risk of developing breast cancer
First-degree relatives (e.g., parents and siblings) with a breast cancer diagnosis before the age of 50 | Twofold risk
An increased breast density on mammography (D3, heterogeneous density, or D4, extreme density) | Women with extremely dense breast tissue have a twofold increased risk compared to women with average breast density
Therapeutic thoracic radiation between the ages of 10 and 30 | 40% lifetime risk
History of atypical hyperplasia (AH) | Absolute cumulative risk of 30% at a 25-year follow-up
History of lobular carcinoma in situ (LCIS) |
There are empirical, genetic, and other original risk prediction models. Empirical models, e.g., the Gail model (the Breast Cancer Risk Assessment Tool (BRCAT)), the Breast Cancer Surveillance Consortium (BCSC) model, and the Rosner-Colditz model, include risk factors previously identified by logistic regression and Cox proportional hazard regression in cohort and case-control studies [24]. Using a statistical algorithm, these models generate the probability that an individual will develop breast cancer in a given time [24]. Genetic models, e.g., the International Breast Cancer Intervention Study (IBIS)/Tyrer-Cuzick model and the BOADICEA and BRCAPRO™ models, are based on the evaluation of family studies and segregation analyses. In addition, pedigree information is used to calculate age-dependent mutation and disease risks for all family members [25]. Tables S1a and S1b of the supplement provide an overview of the characteristics of the most common empirical and genetic breast cancer prediction models, including a list of risk factors considered in each model. Besides, some further original models combine various risk factors in different populations with different algorithms, e.g., the Barlow model [26] for pre- and postmenopausal women.
Our study aimed to investigate the prognostic quality of the identified breast cancer risk prediction models and whether they are suitable for assessing individual breast cancer risk in a screening program.

Materials and Methods
We conducted an overview of reviews, considering most of the Preferred Reporting Items for Overviews of Reviews (PRIOR) statement [27]. An overview of reviews was the appropriate methodological approach because a preliminary search yielded several published systematic reviews (SRs) regarding the prognostic quality of individual breast cancer risk prediction models. In this way, the extensive knowledge from the SRs could be summarised as concisely as possible.

Literature Searches.
In March 2022, we conducted a comprehensive systematic literature search in four databases, namely, Ovid MEDLINE, EMBASE, the Cochrane Library, and CRD. The systematic literature search was performed considering the predefined inclusion criteria according to the PICO scheme (Table 2). The detailed search strategy is presented in the supplement (see Tables S2-S6).
In addition, we conducted further manual searches to identify the full texts of the primary studies of the selected SRs for more detailed information if relevant.

Literature Selection Process.
The systematic literature search yielded references that were initially assessed at the title level. Subsequently, references deemed pertinent underwent screening at the abstract level. Finally, full texts of relevant abstracts were scrutinised against predefined inclusion criteria for inclusion in or exclusion from the overview of reviews. Two reviewers (IF and SW) conducted all procedures independently, with discrepancies resolved through discussion involving a third author (IZK).

Assessed Primary Outcome.
The primary effectiveness outcome of this overview of reviews was the discriminatory accuracy of the identified breast cancer risk prediction models, that is, the probability that a model assigns a higher risk to a randomly chosen woman with the disease than to a randomly chosen woman without the disease. To provide the most accurate individual risk assessment, the models need to balance diagnostic sensitivity and specificity, a trade-off represented by the receiver operating characteristic (ROC) curve. The area under this curve (AUC) quantifies the discriminatory accuracy of a prediction model. An AUC value of 0.5 indicates that the discriminatory accuracy of a model is no better than a coin toss. In contrast, an AUC value of 1.0 denotes perfect discriminatory accuracy. In practice, models with an AUC value greater than 0.7 are deemed to predict the individual risk for breast cancer with acceptable accuracy.
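The pairwise-ranking reading of the AUC given above can be illustrated with a minimal Python sketch. The function name and the risk values are hypothetical and serve only as an illustration; they are not taken from any of the reviewed studies. Counting tied pairs as half a correct ranking is one common convention and is an assumption here.

```python
from itertools import product


def auc_by_pairwise_ranking(risk_cases, risk_controls):
    """Share of (case, control) pairs in which the case received the higher
    predicted risk. Tied pairs count as half (an assumed convention)."""
    if not risk_cases or not risk_controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for case, control in product(risk_cases, risk_controls):
        if case > control:
            correct += 1.0
        elif case == control:
            correct += 0.5
    return correct / (len(risk_cases) * len(risk_controls))


# Hypothetical predicted risks, for illustration only.
cases = [0.030, 0.021, 0.015, 0.040]            # women who developed breast cancer
controls = [0.012, 0.021, 0.018, 0.010, 0.035]  # women who did not

auc = auc_by_pairwise_ranking(cases, controls)
print(f"AUC = {auc:.3f}")
```

A model that ranks every case above every control yields 1.0 under this definition, and a model that assigns the same risk to everyone yields 0.5, which corresponds to the coin-toss benchmark described above.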

Data Extraction and Quality Appraisal.
One author (IF) extracted the characteristics of the included SRs and the data of the SRs on primary study level. IF extracted further data directly from the primary studies if necessary information was missing. A second author (SW) checked the data extraction. Both authors (IF and SW) assessed the quality of the selected SRs independently according to the AMSTAR 2 tool. The checklist encompasses questions about the methodological procedures employed in a review, the thoroughness of the results and conclusions, the sources of funding, and the presence of potential conflicts of interest [28]. The overall risk of bias of the systematic reviews included in this overview was evaluated independently by two authors (IF and SW) through a comparative analysis of the checklist findings derived from the included reviews. Differences were discussed and resolved by consensus of all three authors (IF, SW, and IZK).

Analysis and Synthesis.
Finally, we narratively summarised the evidence on the prognostic quality of the identified prediction models, including two tables that present the key results. The detailed extraction tables showing the data on the primary study level are presented online.

The Breast Journal
The SRs investigated 30 risk prediction model versions (between one and 17 per SR) with different research focuses. One SR [50] examined the performance of various Gail/BRCAT model versions. Two other SRs [51,52] investigated the improvement in the discrimination accuracy of the models by adding essential risk factors, such as genetic information or breast density. The remaining five SRs [16,29,30,48,49] compared the model performance with each other or examined the use of multivariable prediction models in risk-based cancer screening programs. One of the five SRs [29] evaluated breast, cervical, and colorectal cancer risk prediction models. However, for this overview of reviews, only the results concerning the breast cancer risk prediction models were considered.
The primary outcome parameters in all eight SRs were the discriminatory accuracy and the calibration accuracy of the breast cancer risk prediction models. This overview of reviews focused solely on the discriminatory accuracy of the models.
Table S7 of the supplement presents the characteristics of the included SRs in more detail.

Quality Assessment.
Two of the included SRs were rated with a low risk of bias [16,30] and three with a moderate risk of bias [50][51][52]. The remaining three systematic reviews were rated with a high risk of bias [29,48,49]. The major flaws were due to significant methodological limitations, including unclear literature selection and data collection processes. Moreover, no quality assessment of the primary studies was performed in three SRs [29,48,52], while the remaining five SRs assessed the quality of the studies using different methods [16,30,49,50,51]. Table S8 of the supplement presents the quality assessment in detail.

Discrimination Accuracy of the Identified Breast Cancer Risk Prediction Models.
Since its development [53], the original Gail model has been validated in various populations (e.g., Caucasian/White/European, American, African-American, Asian, or Hispanic) and has been modified many times by adding risk factors, such as breast density or hormone replacement therapy. Regarding the prognostic quality of the Gail model 1, AUC values ranging from 0.54 [54] to 0.69 [55] were reported. Adding or removing risk factors, such as breast density, hormone replacement therapy, alcohol consumption, physical activity, diet, or ethnicity, to or from the Gail model did not improve the models' discrimination accuracy (e.g., AUC values of 0.56 [56] and 0.68 [57]). Only a body mass index-adjusted Gail model showed an AUC value of 0.85 [52]. In addition, there were two outliers in Asian populations: one validation study showed an AUC value of 0.41 [48], and another presented a value of 0.93 [50] for the Gail model.
(2) The Breast Cancer Surveillance Consortium Model (Empirical). Six validation studies, included in three SRs [16,30,51], assessed the prognostic quality of the BCSC model, which originates from the USA. All six validation studies included mixed ethnicities.
The original BCSC model includes the following eight risk factors: age, body mass index, age of menopause, hormone replacement therapy, breast density, prior breast biopsies, and family history of breast cancer. Concerning the prognostic quality of the original BCSC model, the validation studies showed AUC values ranging from 0.58 to 0.67 [58]. Three validation studies added genetic information as a polygenic risk score to the model. They achieved AUC values of 0.69 [32], 0.65 [59], and 0.72 [58], whereby the latter applied to the prediction of oestrogen receptor-positive breast cancer.
(3) The Rosner and Colditz Model (Empirical). In five of the eight included SRs [16,29,30,48,49], nine validation studies investigated the prognostic quality of the Rosner and Colditz model. Eight of the nine studies were from the USA, and one was from France. The nine studies considered solely Caucasian/White populations.
The original Rosner and Colditz model includes the following five risk factors: age, body mass index, hormone replacement therapy, benign breast disease, and family history of breast cancer. The original model has an AUC value of 0.57 [31] and was often modified. For example, adding serum estradiol to the model improved its discriminatory accuracy (AUC value of 0.635) [33]. Similarly, adding risk factors, such as breast density, multiple hormone level determinations, and/or a polygenic risk score, to the original Rosner and Colditz model resulted in an improved AUC value of 0.68 [60].
(4) The International Breast Cancer Intervention Study/Tyrer-Cuzick Model (Genetic). Four SRs [16,30,51,52] included eight validation studies on the IBIS model. Two of the eight studies came from the USA, five from the United Kingdom (UK), and one from Australia. The studies included different populations, namely, Caucasian/European, North American, African-American, Hispanic, and mixed ethnicities. One study did not report the assessed population.
The original IBIS/Tyrer-Cuzick model considers the following 14 risk factors: age, body mass index, age at menarche, age of first live birth, age of menopause, parity, hormone replacement therapy, breast density, atypical ductal hyperplasia, lobular carcinoma in situ, prior breast biopsies, family history of breast cancer (including age at diagnosis and bilateral breast cancer), family history of ovarian cancer, and genetic testing (BRCA1/2 and SNPs). The SRs and validation studies did not report an AUC value for the original IBIS/Tyrer-Cuzick model. The AUC values of different model versions ranged from 0.51 to 0.76, with the latter value reported from a study in a high-risk European population [34][35][36]. IBIS/Tyrer-Cuzick model versions that included a polygenic risk score reached an AUC value of 0.67, and versions that considered breast density as a risk factor had an AUC value of 0.64 [37].
(5) BOADICEA and BRCAPRO™ Models (Genetic). One SR [51] included two validation studies that assessed the prognostic quality of two further genetic breast cancer risk prediction models. Both studies were from Australia, whereby one assessed the discriminatory accuracy of the BOADICEA model and the other that of the BRCAPRO™ model.

Both studies included Caucasian populations.
The original BOADICEA model includes the following six risk factors: age, family history of breast cancer with age at diagnosis, family history of male breast cancer, family history of ovarian cancer, and genetic testing (BRCA1/2 and SNPs). The BRCAPRO™ model considers two further risk factors, i.e., family history of bilateral breast cancer and ethnicity of the family. The discriminatory accuracy of the BOADICEA and the BRCAPRO™ models is moderate, with AUC values of 0.66 and 0.65, respectively. Adding a polygenic risk score with 77 risk-associated SNPs to both models improved their discriminatory accuracy significantly, with AUC values of 0.70 and 0.69, respectively [61].
Table 3 presents an overview of the discriminatory accuracy of the identified empirical and genetic breast cancer risk prediction models and shows that almost all identified model versions had a limited discriminatory accuracy with AUC values <0.70.
One study did not report on the population. The discriminatory accuracy of the further original models ranged from AUC values of 0.53 [38] to 0.785 [45], whereby the latter applied to the prediction of ER-positive, HER2-negative, invasive, and noninvasive carcinoma in a Japanese population considering a polygenic risk score. A Swedish model [46] including age, body mass index, hormone replacement therapy, family history of breast cancer, age at menopause, breast density, microcalcifications, and space-occupying lesions as risk factors showed an AUC value above 0.71 for a Caucasian population. The discriminatory accuracy of the models considering breast density as a risk factor ranged from AUC values of 0.63 [40] to 0.72 [41], depending on whether the absolute area, per cent of the area, or fibroglandular volume of breast density measurement was used. The models that included a polygenic risk score as a risk factor (except the Japanese model) had AUC values between 0.60 [38] and 0.693 [44]. The Barlow model had a moderate discriminatory accuracy with AUC values of 0.631 for premenopausal and 0.624 for postmenopausal women [26].
Table 4 summarises the discriminatory accuracy of further original breast cancer risk prediction models, depending on the considered risk factors and breast cancer types. Overall, most of the identified models have a limited discriminatory accuracy with AUC values <0.70, except a Swedish model with a two-year time horizon [46] and a Japanese model that considered SNPs [45].
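The comparison of reported AUC values against the 0.70 benchmark can be sketched in a few lines of Python. The AUC values below are those quoted in the text of this section for the empirical and genetic models; the dictionary layout, the names, and the choice to treat a value of exactly 0.70 as reaching the benchmark are assumptions made for illustration and are not part of the original analysis.

```python
# Benchmark used in this overview; treating 0.70 itself as "reached" is an assumption.
ACCEPTABLE_AUC = 0.70

# (lowest, highest) AUC values as quoted in the text; single values are repeated.
reported_auc = {
    "Gail model 1": (0.54, 0.69),
    "BCSC, original": (0.58, 0.67),
    "Rosner and Colditz, original": (0.57, 0.57),
    "IBIS/Tyrer-Cuzick, model versions": (0.51, 0.76),
    "BOADICEA": (0.66, 0.66),
    "BOADICEA + polygenic risk score": (0.70, 0.70),
    "BRCAPRO": (0.65, 0.65),
    "BRCAPRO + polygenic risk score": (0.69, 0.69),
}


def reaches_benchmark(auc_range, threshold=ACCEPTABLE_AUC):
    """True if the highest reported AUC in the range meets the threshold."""
    _, highest = auc_range
    return highest >= threshold


flagged = [name for name, rng in reported_auc.items() if reaches_benchmark(rng)]

for name, (low, high) in reported_auc.items():
    label = "reaches 0.70" if name in flagged else "below 0.70"
    print(f"{name}: {low:.2f}-{high:.2f} ({label})")
```

Because only the upper end of each range is checked, a flagged entry means that at least one reported version or population reached the benchmark, not that the model does so in general.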
In the supplement (Tables S9a-S9f), the detailed extraction tables present the data per prediction model on the primary study level.

Discussion
Most identified breast cancer risk prediction models have a low prognostic quality and do not accurately predict the individual breast cancer risk. Adding breast density and/or genetic information as crucial risk factors moderately improved the discriminatory accuracy of the prediction models, but the AUC values remained below the minimum of 0.70. Exceptions include a modified Gail model assessed in an Asian population, a modified BCSC model that was applied to the prediction of oestrogen receptor-positive breast cancer, an IBIS/Tyrer-Cuzick model version that was applied in a high-risk European population, the BOADICEA model that considered SNPs, and two further original models, one from Japan and one from Sweden. The AUC value above 0.70 in the Japanese study [45] may be due to the risk prediction of solely ER-positive, HER2-negative breast cancer. The AUC value above 0.70 in the Swedish model [46] could be explained by the short time horizon of two years, as risk prediction becomes more imprecise over a longer time horizon. Overall, the differences in the AUC values can be mainly explained by differences in study populations, comprising various geographical regions, cancer risk groups, and cancer types.
Besides the discriminatory accuracy of the risk prediction models, further aspects need to be considered if these models are to be used more widely.
The identified breast cancer risk prediction models were developed and validated for use in a clinical (genetic) setting and/or to identify specific patient groups eligible for preventive intervention but not for population-based screening [47]. For example, the Gail/BRCAT model is considered suitable for identifying women who would benefit from chemoprevention [39,42]. Therefore, the appropriate setting needs to be assessed before applying a risk prediction model.
Critical risk factors, such as breast density, come with assessment requirements. Density-based risk calculations are often based on visual density estimates using BI-RADS categories. However, objective criteria for a standardised density measurement according to BI-RADS categories are lacking in practice [43]. Volumetric density measurements are fully automated and have excellent agreement with 3D magnetic resonance images but are less informative than the BI-RADS categories [62][63][64]. Hence, considering breast density as a risk factor for predicting individual breast cancer risk requires a standardised density measurement. Similarly, assessing genetic information as an additional risk factor requires organised cooperation between qualified centres for medical genetics.

Table 4 (partial):
[…] | […] | 0.58 [39]-0.71 [40]
Models assessing the prognostic quality in invasive and/or in situ breast cancer without and with breast density | 5 | Without breast density: 0.5 [41]-0.65 [42,43]; with breast density: 0.63 [41]-0.72 [44]
Models assessing the prognostic quality in invasive and/or in situ breast cancer without and with SNPs | 10 | Without SNPs: 0.53 [45]-0.79 [46]; SNPs enhanced: 0.60 [45]-0.69 [47]
Model assessing the prognostic quality in ER-positive, HER2-negative, invasive, and noninvasive cancers without and with SNPs | […] | […]

Moreover, risk-based breast cancer screening requires valid risk prediction instruments with good prognostic quality and risk-adjusted screening strategies. Conducting risk assessments alone is not enough. Instead, low, medium, and high breast cancer risk groups need to be defined to provide women with risk-adjusted strategies where the screening intensity matches the individual risk. However, this matching is only as good as the discriminatory accuracy of the applied risk assessment model [65,66]. Currently, there are no internationally uniform cut-off values for the assignment to a risk group [25].
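The assignment of women to low, medium, and high risk groups described above can be sketched as a simple threshold lookup. As the text notes, no internationally uniform cut-off values exist, so the two cut-offs below are purely hypothetical placeholders, as are the function and variable names; the sketch shows the mechanism only and makes no recommendation.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a predicted absolute risk (as a probability).
# These numbers are illustrative assumptions, not endorsed thresholds.
CUTOFFS = [0.01, 0.03]
GROUPS = ["low", "medium", "high"]


def risk_group(predicted_risk):
    """Map a predicted risk in [0, 1] to a risk group.

    A risk equal to a cut-off is assigned to the higher group.
    """
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability between 0 and 1")
    return GROUPS[bisect_right(CUTOFFS, predicted_risk)]


# Hypothetical model outputs for three women.
for risk in (0.005, 0.02, 0.05):
    print(f"predicted risk {risk:.3f} -> {risk_group(risk)} risk group")
```

The lookup is only as informative as the model that produces the predicted risk: with an AUC close to 0.5, many women who later develop breast cancer would be placed in the low group, whatever cut-offs are chosen.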
Besides, training in risk communication is necessary for healthcare professionals when risk-adjusted screening is planned to be implemented because risk-based screening is more complex for healthcare professionals and participants than standardised age-based screening. Risk-based screening includes performing risk assessments, appropriately communicating risk results, and counselling on subsequent preventive interventions. The latter, in turn, alter the risk of developing breast cancer.
From a scientific point of view, evidence is lacking on the overall benefit-harm ratio of risk-based breast cancer screening compared to conventional age-based screening programs. Therefore, the results of two large ongoing randomised controlled trials (RCTs) on the efficacy of risk-based breast cancer screening need to be awaited, with results expected in a few years [67,68].
To our knowledge, this is the first overview of reviews assessing the prognostic quality of breast cancer risk prediction models and whether they are applicable to population-based screening. However, the results of this overview should be viewed in the context of its limitations.
While adhering to most methodological steps outlined by the PRIOR checklist for systematic review overviews, we did not perform sensitivity analysis to assess the robustness of the review findings. In addition, although we provided results at the primary study level, we evaluated the risk of bias solely for the systematic reviews rather than for all 99 primary studies. Finally, we did not examine reporting bias in the primary studies or the systematic reviews.
Despite the inclusion of systematic reviews exhibiting varying degrees of methodological rigour, our analysis indicates that reviews with low or moderate risk of bias arrive at similar conclusions to those with a high risk of bias.
Furthermore, the selected SRs included validation studies published until 2019. Hence, the studies refer to earlier screening data, capabilities, and programmes that may no longer be current. We did not conduct a further systematic search for studies published after 2019 or systematic reviews published after March 2022. A systematic review published in July 2022 [69] also emphasised that there are currently no endorsed risk prediction models for breast cancer tailored to diverse ethnic populations. Furthermore, we did not assess a machine learning-based software tool, the Mammo-Risk™ model (Predilife, Villejuif, France) [70], as it was published in 2022. The model was developed in the BCSC cohort [71,72] to estimate the risk of developing breast cancer within the next five years based on the following four risk factors: age, family history of breast cancer, history of breast biopsies, and breast density with or without a polygenic risk score. Based on the results of the first validation studies, the model has an AUC value of 0.659 and thus does not predict the individual risk of breast cancer with sufficient accuracy.

Conclusion
All breast cancer risk prediction models published to date show a limited ability to predict the individual breast cancer risk in women. Adding crucial risk factors, such as genetic information and breast density, only slightly improved the discrimination accuracy of the models. Hence, more reliable models with better predictive power are needed before using them in national screening programs. Besides, results of ongoing RCTs need to be awaited to shed more light on the benefit-harm ratio of risk-adjusted breast cancer screening compared to conventional age-based screening.

Table 2: Inclusion and exclusion criteria following the PICO scheme.
3.2. Characteristics of the Systematic Reviews. The eight included SRs were written in English and published between 2012 […]

Figure 1: Representation of the literature selection process (PRISMA flow diagram).

(1) The Gail/Breast Cancer Risk Assessment Model (Empirical). In the eight included SRs [16,29,30,48,49,50,51,52], 58 validation studies analysed how accurately the Gail model can predict individual breast cancer risk. 33 of the 58 validation studies were from the United States of America (USA), 12 from Asia, 10 from Europe, and 3 from Australia. Most validation studies included Caucasian/White/European populations. Besides, the studies also considered North American, Asian, Hispanic, African-American, and Australian populations. Two publications did not report on the population. The Gail model is the most investigated and modified breast cancer risk prediction model. The original Gail model, developed in 1989, includes the following five risk factors: age, family history of breast cancer, age at first birth, age at menarche, and previous biopsies [53].

Exceptions included a modified Gail model applied in an Asian population, a modified BCSC model that was applied to the prediction of oestrogen receptor-positive breast cancer, an IBIS/Tyrer-Cuzick model version applied in a high-risk European population, and the BOADICEA model expanded with SNPs.

Six of the eight included SRs [16,30,48,49,51,52] investigated 24 further original models. Four validation studies were from Europe, nine from the USA, nine from Asia, one from Canada, and one from India. Most validation studies included Asian populations. Besides, the studies also included Caucasian/White/European, North American, and mixed ethnicities.

Table 3: Overview of the prognostic quality of the empirical and genetic models.
AUC = area under the curve, BCSC = Breast Cancer Surveillance Consortium, BRCAT = Breast Cancer Risk Assessment Tool, CI = confidence interval, IBIS = International Breast Cancer Intervention Study, NR = not reported, SNPs = single-nucleotide polymorphisms. 1 Range involves AUC values for varying risk factors. The bold values present the AUC value ranges.

Table 4: Overview of the predictive quality of the original models.