Accuracy of Self-Reported Breast Cancer Information among Women from the Ontario Site of the Breast Cancer Family Registry

Obtaining complete medical record information can be challenging and expensive in breast cancer studies. The current literature is limited with respect to the accuracy of self-report and the factors that may influence it. We assessed the agreement between self-reported and medical record breast cancer information among women from the Ontario site of the Breast Cancer Family Registry. Women aged 20–69 years diagnosed with incident breast cancer in 1996–1998 were identified from the Ontario Cancer Registry, sampled on age and family history. We calculated kappa statistics, proportion correct, sensitivity, specificity, and positive and negative predictive values and conducted unconditional logistic regression to examine whether characteristics of the women influenced agreement. The proportions of women who correctly reported having received a broad category of therapy (hormone therapy, chemotherapy, radiation, or surgery) as well as sensitivity and specificity were above 90%, and the kappa statistics were above 0.80. The specific type of hormonal therapy or chemotherapy was reported with low-to-moderate agreement. Aside from recurrence, no factors were consistently associated with agreement. Thus, most women were able to accurately report broad categories of treatment but not necessarily specific treatment types. The findings of this study can aid researchers in the use and design of self-administered treatment questionnaires.


Introduction
Medical records are considered the gold standard for obtaining accurate information pertaining to breast cancer diagnosis, treatment, and prognosis. However, this method of data collection can be very time consuming and labor intensive. Obtaining complete longitudinal treatment information for individuals over long periods of time can be challenging and expensive, particularly when conducting large population-based studies of breast cancer patients. With an increasing focus on epidemiological studies of prognosis and survivorship in breast cancer, clinical data collection is a methodological issue of rising importance.
Self-reported treatment questionnaires may be a more feasible option in some cases; however, few studies have examined the accuracy of self-reported breast cancer treatment and fewer have considered factors potentially influencing accuracy. To our knowledge, only four studies have specifically examined the accuracy of self-reported breast cancer treatment to date [1][2][3][4]. All four studies demonstrated that breast cancer survivors can accurately recall some important information pertaining to their treatment. However, two of the studies had small sample sizes [2,3], while one study was restricted to low-income women in California [1] and only two of the studies assessed factors potentially associated with reporting accuracy. To add to the limited literature on this topic, we assessed the agreement between self-reported breast cancer treatment and recurrence and medical record information among women in the Ontario Familial Breast Cancer Registry (OFBCR), the Ontario site of the Breast Cancer Family Registry (BCFR). We also examined characteristics of the women that could potentially influence agreement, including breast cancer family history (the study population was enriched for women with a family history of breast cancer) and menopausal status at the time of diagnosis. We also assessed the agreement between menopausal status in medical records and self-report.
Journal of Cancer Epidemiology

Study Sample.
Breast cancer patients included in this analysis were women recruited into the OFBCR. The OFBCR is one of six registries in the BCFR, an international collaboration of academic and research institutions located in the United States, Canada, and Australia designed to examine the genetic, environmental, and lifestyle factors associated with breast cancer [5].
In Ontario, women aged 20 to 69 diagnosed with breast cancer between the calendar years of 1996 and 1998 were identified from the population-based Ontario Cancer Registry and recruited between 1996 and 2000 using a sampling strategy incorporating age, ethnicity, and family history of cancer to enrich the registry with genetically predisposed individuals [5]. All women diagnosed under the age of 36 years were eligible for the OFBCR. For women between the ages of 36 and 54 years, all those meeting the high-risk criteria (defined below) or those of Ashkenazi Jewish heritage were eligible. Furthermore, a sample of 25% of the women in this age group who did not meet the high-risk criteria was also eligible. For women between the ages of 55 and 69 years, a sample of 35% of those meeting the high-risk criteria or those of Ashkenazi heritage and a sample of 8.75% (35% × 25%) of those not meeting the high-risk criteria were eligible. High risk was defined as a previous diagnosis of breast or ovarian cancer; at least one first- or two second-degree relative(s) with breast or ovarian cancer; at least one second- or third-degree relative with breast cancer diagnosed before the age of 36 years, ovarian cancer diagnosed before the age of 61 years, multiple breast cancer primaries, both breast and ovarian cancer, or male breast cancer; or at least three first-degree relatives with any combination of breast, ovarian, colon, prostate, or pancreatic cancer or sarcoma, with at least one diagnosis before the age of 51 years [5].
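The stratified sampling fractions described above can be summarized in a short Python sketch. The function name and its boolean arguments are illustrative assumptions, not part of the registry protocol; the sketch only restates the fractions given in the text and does not attempt to encode the high-risk criteria themselves.

```python
def sampling_fraction(age: int, high_risk: bool, ashkenazi: bool) -> float:
    """Fraction of women in a stratum who were eligible for the OFBCR.

    Hypothetical helper: `high_risk` and `ashkenazi` are assumed to have
    been determined beforehand from the Family History Questionnaire.
    """
    if not 20 <= age <= 69:
        return 0.0  # outside the 20-69 age range identified from the registry
    if age < 36:
        return 1.0  # all women diagnosed under 36 were eligible
    enriched = high_risk or ashkenazi
    if age <= 54:
        # ages 36-54: all enriched women, 25% of the remainder
        return 1.0 if enriched else 0.25
    # ages 55-69: 35% of enriched women, 35% x 25% = 8.75% of the remainder
    return 0.35 if enriched else 0.35 * 0.25


if __name__ == "__main__":
    for age, hr, aj in [(30, False, False), (45, False, False), (60, True, False), (60, False, False)]:
        print(age, hr, aj, sampling_fraction(age, hr, aj))
```

Fractions of this kind would also be the natural starting point for sampling weights if population-level estimates were required, although no such weighting is described in the text.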
Using the sampling strategy described above, physicians gave permission to contact 7,662 of the 8,448 women who were identified with incident breast cancer diagnosed between 1996 and 1998. Of these women, 4,962 completed the Family History Questionnaire (described below), and 2,399 were eligible and willing to proceed. After excluding women for whom blood samples and permission to access medical records were not available (in some cases because the women had died rather than refused) and those with a prior cancer diagnosis or distant metastases at diagnosis, 1,082 women were eligible for the current study, of whom 939 had a first diagnosis of unilateral breast cancer with medical record and treatment questionnaire data available for analysis. The selection of the study sample has been previously described in detail [6], with the exception of the final number of women with medical records available, as some records were subsequently obtained. In that previous publication, whose study population overlapped almost completely with the current one, we showed that the included women did not differ significantly from women without a blood sample and/or permission to retrieve medical records or from women whose records were unavailable, with the exception that the latter group was slightly less likely to be node positive [6].

Data Collection
The self-administered Treatment Questionnaire gathered information regarding aspects of treatment for breast or ovarian cancer. It consists of questions that address stage (whether the cancer was only in the breast or breast and lymph glands or had spread to other sites), the type of initial breast cancer treatments received, and whether recurrence has occurred. Each item related to treatment first asked whether the broad category of treatment was received (e.g., "did you have hormonal therapy for this breast cancer"), and then for those who responded "yes," more specific questions regarding the details of the treatment were asked (e.g., "please list medicines if known"). The question on recurrence asked "has the cancer come back (recurred) after the treatments listed above?".

Additional Questionnaires.
The mailed Family History Questionnaire gathered information regarding previous cancer diagnoses in the women, their parents, siblings, children, and any other relatives, and family history was reviewed over the telephone with a genetic counselor. All cancers, except nonmelanoma skin cancers and cervical carcinoma in situ, were recorded, along with dates of all cancer diagnoses and deaths [5]. The mailed Personal History Questionnaire gathered information on the period prior to diagnosis pertaining to the demographic characteristics of the women, personal history of cancers, prior breast and ovarian surgeries, prior radiation exposure, smoking and alcohol behavior, reproductive history, breast-feeding, hormones used, height, weight, and physical activities [5].

Medical Record Abstraction.
Clinical and treatment data were abstracted from the patient's medical records at each clinic using standard data collection forms. Data collected included the method and date of diagnosis, clinical stage, type of surgical treatment, whether or not radiation treatment was given, dates and types of hormonal therapies, and dates and types of chemotherapy administration. All data were reviewed, verified, and coded centrally by the research nurse coordinator and an oncologist [6]. Follow-up data from medical records were collected annually for breast cancer recurrence, new primary cancers, and death. Vital status was also ascertained from linkage to the Ontario Cancer Registry, which routinely obtains death certificate information [6].

Statistical Methods.
Agreement analysis was restricted to the most common types of treatment reported in both the Treatment Questionnaire and the medical records. Women who had start dates for hormonal therapy, chemotherapy, and surgery were coded as having received the therapy in question, while women without respective start dates were coded as not having received the therapy (based on the assumption that a start date indicated that a prescribed treatment was actually administered). Any treatment that occurred after the completion date of a woman's respective treatment questionnaire was coded as not having occurred (since it would have occurred after self-report). For radiation therapy, no treatment dates were available to us in the study database; accordingly, we could not determine whether any treatments occurred after the treatment questionnaire completion date. Thus, analysis regarding radiation therapy was conducted excluding women with breast cancer recurrence at the time of questionnaire completion (based on medical record abstraction), since it is likely that this was the only situation where radiation might have occurred following the questionnaire, which was completed at least a year after diagnosis. With regard to whether women could accurately report the specific type of hormonal therapy or chemotherapy medication they received (alone and/or in combination with other medications), agreement analysis was restricted to only those women who were coded as having received the respective broader category of treatment (i.e., hormonal therapy or chemotherapy).
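The date-based coding rule for hormonal therapy, chemotherapy, and surgery can be illustrated with a minimal Python sketch. The function and argument names are hypothetical, and treating a start date that falls exactly on the questionnaire completion date as "received" is our assumption; the text specifies only that treatments occurring after completion were coded as not having occurred.

```python
from datetime import date
from typing import Optional


def coded_as_received(start_date: Optional[date], questionnaire_completed: date) -> bool:
    """Code a medical-record treatment as received (True) or not (False).

    A missing start date is coded as not received. A start date after the
    questionnaire completion date is also coded as not received, because the
    woman could not have reported it. Same-day starts count as received
    (an assumption of this sketch).
    """
    if start_date is None:
        return False
    return start_date <= questionnaire_completed


if __name__ == "__main__":
    completed = date(1999, 6, 1)  # hypothetical completion date
    print(coded_as_received(date(1997, 3, 15), completed))  # started before the questionnaire
    print(coded_as_received(date(2000, 1, 10), completed))  # started after the questionnaire
    print(coded_as_received(None, completed))               # no start date in the record
```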
Agreement between self-reported treatment and breast cancer recurrence and medical records was first assessed by calculating the kappa statistic, which takes chance agreement into account. Next, using medical records as the gold standard, agreement was also analyzed by calculating the proportion correct (the percentage of women who correctly identified whether they received a given treatment or not), sensitivity (among women who received a given treatment according to medical records, the proportion who reported receiving it), specificity (among women who did not receive a given treatment according to medical records, the proportion who reported not receiving it), and positive predictive values and negative predictive values where appropriate. With respect to menopausal status, the Personal History Questionnaire asked women to identify the age at which, and the reasons why, their periods stopped. Based on their responses, women were categorized as being premenopausal, postmenopausal, or of unknown status at the time of diagnosis, where unknown status applied to women who had undergone a hysterectomy without a bilateral oophorectomy or were on hormonal replacement therapy at the time of their diagnosis. As there is no gold standard for menopausal status, only the proportion in agreement and the kappa statistic were calculated. For treatment, two models of agreement analysis were conducted. The first model excluded unknown and missing responses, while the second model was based on the classification system suggested by Landis and Koch (1977) [7], where all self-reported "unknown" and missing responses were recoded as "no." The first model provided a more conservative estimate of agreement in terms of the proportion correct and accordingly was the model used to report results of the agreement analyses and for all subsequent regression analyses. Results were similar as there were few missing responses.
For menopausal status, where there is a larger proportion of unknown and missing information, agreement analysis was conducted treating this variable as both a 2-level variable (excluding unknown/missing information) and a 3-level variable (including unknown/missing information).
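As a minimal sketch of the agreement measures named above, the following Python code computes them from a 2×2 table with medical records as the reference. The class and function names are ours, and the counts in the usage example are hypothetical, chosen only to make the arithmetic easy to follow; they are not study data.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # self-report yes, medical record yes
    fp: int  # self-report yes, medical record no
    fn: int  # self-report no, medical record yes
    tn: int  # self-report no, medical record no


def agreement_metrics(t: TwoByTwo) -> dict:
    """Proportion correct, sensitivity, specificity, PPV, NPV, and Cohen's kappa.

    Assumes every row and column total is non-zero; no guard against
    division by zero is included in this sketch.
    """
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the marginal totals
    expected = ((t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)) / n**2
    return {
        "proportion_correct": observed,
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


if __name__ == "__main__":
    # Hypothetical counts: 200 women, 100 treated according to records
    for name, value in agreement_metrics(TwoByTwo(tp=90, fp=5, fn=10, tn=95)).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, observed agreement is 0.925 and chance agreement is 0.50, giving a kappa of about 0.85.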
Lastly, unconditional logistic regression analysis was conducted to examine the influence of the following on agreement: age at breast cancer diagnosis (continuous), recall period (continuous), education level, marital status, alcohol consumption (continuous), smoking behavior, first-degree family history of breast or ovarian cancer, menopausal status at diagnosis, breast cancer recurrence, and place of birth. With regard to place of birth, women were categorized as having been born in an English-speaking or a non-English-speaking country, which was meant to serve as a proxy for proficiency in English. Logistic regression analysis was conducted to examine the factors associated with agreement for the broad categories of hormonal therapy, chemotherapy, and radiation, and, as suggested by Lipsitz et al. (2003) [8], an offset controlling for agreement due to chance was included in all logistic regression models. Because very few women did not have surgery for their breast cancer, regression analysis was not conducted to examine the factors associated with whether women could accurately report whether they received surgery or not. All statistical analyses were conducted using SAS Enterprise Guide, version 4.2 OnDemand.

Results
Descriptive statistics are displayed in Table 1. As displayed in Table 2, according to the medical records, approximately half of the women received hormonal therapy (54.6%, n = 513) and approximately half received chemotherapy (53.2%, n = 500), over three quarters (76.2%, n = 712) received radiation therapy, all but three (n = 936) had surgery for their breast cancer, and 5.3% (n = 50) experienced breast cancer recurrence prior to the completion of the treatment questionnaire. The overall proportions based on self-reported information were similar (Table 2).

Results of Agreement Analysis by Broad Category.
Agreement between medical records and self-reported treatment is summarized in Table 3.
Overall, the proportion of women who correctly reported having received a broad category of therapy (hormone therapy, chemotherapy, radiation, or surgery) was above 90% for each therapy. Sensitivity and specificity along with positive and negative predictive values were also above 90%, and the kappa statistic was above 0.80 for each category, with the exception of surgery, which is likely attributable to the low number of women who did not have surgery for their breast cancer. Since the kappa statistic takes chance agreement into account, limited variability increases the likelihood that agreement is due to chance alone. Furthermore, there was moderate agreement between medical records and self-reported information regarding whether women experienced breast cancer recurrence or not. Although 94.8% of women correctly reported whether they experienced breast cancer recurrence, sensitivity was only 75.5%, indicating that about a quarter of the women who experienced breast cancer recurrence according to their medical records did not report it.
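The effect of limited variability on kappa can be illustrated with a small Python sketch. Only the totals (939 women, 936 of whom had surgery according to medical records) come from the text; the split of those totals into individual cells below is hypothetical and is not the study's Table 3.

```python
def kappa_2x2(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (observed agreement, chance agreement, Cohen's kappa)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return observed, expected, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical cells consistent with 936 of 939 women having surgery:
    # 934 agree on "yes", 1 agrees on "no", and 4 responses are discordant.
    observed, expected, kappa = kappa_2x2(tp=934, fp=2, fn=2, tn=1)
    print(f"observed agreement: {observed:.4f}")
    print(f"chance agreement:   {expected:.4f}")
    print(f"kappa:              {kappa:.4f}")
```

In this hypothetical table, observed agreement exceeds 99%, but because almost everyone falls in the "yes" category, agreement expected by chance is nearly as high, so kappa is only about 0.33.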
In regards to menopausal status, using the more conservative results (including unknown responses and treating menopausal status as a 3-level variable), there was 83.0% agreement between medical records and self-report and

Results of Agreement Analysis by Type of Treatment Received within the Broad Categories.
Women were able to report with low-to-moderate agreement the specific type of hormonal therapy or chemotherapy they received. The proportion of women who correctly reported whether or not they received the specific type of hormonal therapy was above 89% for each medication, while specificity was above 97% for each medication. Sensitivity was high for tamoxifen; however, approximately 69%, 82%, and 46% of the women who, according to their medical records, received megestrol, received anastrozole, or took part in the MA12 trial, respectively, did not self-report these treatments. The kappa statistics also demonstrated low-to-moderate agreement for all medications.
With regard to chemotherapy, women could report with low-to-moderate agreement the specific type of medication they received. Specificity was high for all medications with the exception of cyclophosphamide (40.5%), while sensitivity was approximately 50% for each medication with the exception of docetaxel (6.7%), indicating that approximately half of the women who received a given medication according to their medical records did not self-report receiving the medication in question. The kappa statistic also demonstrated low-to-moderate agreement for each medication. In general, agreement appeared to be lower for the medications commonly used in combination therapies (i.e., cyclophosphamide, methotrexate, and fluorouracil), suggesting that women may have greater difficulty identifying a medication they received in combination with other drugs than one they received alone. Lastly, women were able to accurately report whether they received a mastectomy or a lumpectomy, with excellent agreement on all statistics calculated.

Results of Regression Analysis.
As demonstrated by Tables 4 and 5, with the exception of breast cancer recurrence, no factors were consistently associated with agreement. With regard to the broad categories of treatment, women who experienced breast cancer recurrence during the study period (up to questionnaire completion) were approximately 80% less likely to correctly report whether or not they received hormonal therapy and chemotherapy (OR = 0.19, 95% CI: 0.09, 0.34, and OR = 0.20, 95% CI: 0.09, 0.42 for hormonal therapy and chemotherapy, respectively).
With regard to specific types of medications, women who experienced breast cancer recurrence were also approximately 72%-90% less likely to correctly report the specific types of hormonal therapy medications received and were 58% and 92% less likely to correctly report having received the chemotherapy medications doxorubicin and docetaxel, respectively (OR = 0.42, 95% CI: 0.20, 0.92, and OR = 0.08, 95% CI: 0.03, 0.17 for doxorubicin and docetaxel, respectively). Having a family history of breast or ovarian cancer was not associated with accuracy of reporting.
Furthermore, although no other factors were consistently associated with agreement pertaining to accurately reporting broad categories of treatment or specific types of hormonal therapy medications received, older age at diagnosis, longer periods of recall, being born in a non-English-speaking country (in comparison to an English-speaking country), increased alcohol consumption, having a high-school education (in comparison to a bachelor degree), and being a current smoker (in comparison to a nonsmoker) were associated with worse agreement for some specific types of chemotherapy medications. With regard to agreement between menopausal status from a subsequent questionnaire and medical records, agreement was lower among older women, married women, and current smokers (OR = 0.92, 95% CI: 0.91, 0.94, OR = 0.69, 95% CI: 0.40, 0.87, and OR = 0.63, 95% CI: 0.42, 0.96 for age at diagnosis, being married, and current smoking, respectively).
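The "percent less likely" figures quoted alongside the odds ratios in this subsection follow from simple arithmetic, sketched below in Python. The helper name is ours. Strictly, the percentage describes a reduction in the odds of correct reporting, which is not the same as a reduction in its probability.

```python
def percent_lower_odds(odds_ratio: float) -> float:
    """Percent reduction in odds implied by an odds ratio below 1."""
    return (1.0 - odds_ratio) * 100.0


if __name__ == "__main__":
    # Odds ratios for women with recurrence, as reported in the text
    reported = {
        "hormonal therapy (broad category)": 0.19,
        "chemotherapy (broad category)": 0.20,
        "doxorubicin": 0.42,
        "docetaxel": 0.08,
    }
    for label, odds_ratio in reported.items():
        print(f"{label}: OR = {odds_ratio:.2f}, about {percent_lower_odds(odds_ratio):.0f}% lower odds")
```

For example, OR = 0.19 and OR = 0.20 correspond to roughly 80% lower odds, while OR = 0.42 and OR = 0.08 correspond to 58% and 92% lower odds.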

Discussion
This study demonstrates that the accuracy of self-reported treatment compared to medical records was high for most broad categories of treatment received (i.e., hormonal therapy, chemotherapy, and radiation). The low specificity and kappa statistic for surgery reporting are likely due to the fact that very few women did not receive surgery for their breast cancer. With regard to menopausal status at time of diagnosis, agreement between medical records and self-reported information was also relatively high. Women were only able to report whether they experienced breast cancer recurrence or not with moderate accuracy. We also observed that agreement was moderate for some, but not all, types of treatment received within the broad categories of treatment (i.e., the type of hormonal therapy and chemotherapy received). The results of this study are generally consistent with previous research, which also identified that women are able to accurately recall some important details pertaining to their treatment [1][2][3][4].
We also found that women had the greatest difficulty reporting the type of chemotherapy received, specifically for medications commonly used in combination therapies (i.e., cyclophosphamide, methotrexate, and fluorouracil). Furthermore, agreement for the specific type of therapy given within the broad categories of treatment appeared to be higher for questions including prompts. For example, in regards to surgery, women were asked to specify whether they received a lumpectomy or a mastectomy, whereas in questions pertaining to the details of hormonal therapy and chemotherapy received, the open-ended instructions of "please list medicines if known" were given. Accordingly, future questionnaires designed to assess self-reported treatment history may want to consider adding prompts to aid recall, specifically when women are being asked to report on treatments that may include combination therapies. Although there are disadvantages associated with adding prompts (i.e., they may deter respondents from considering responses that are not already presented), at the very least further research is warranted to examine which approach encourages more accurate reporting of specific therapies received.
Table 4: Odds ratios (ORs) and 95% confidence intervals (CIs) for factors potentially associated with agreement between questionnaire and medical record information for broad categories of treatment, menopausal status and specific types of hormonal therapy in 939 women with breast cancer.
1 Variables were treated as continuous variables. 2 Odds ratio (OR) adjusted for all other variables included in the table. 3 Agreement analysis for radiation therapy was conducted excluding women with breast cancer recurrence (according to medical records).
With regard to the factors associated with agreement, women with breast cancer recurrence were less likely to correctly report whether they received hormonal therapy, chemotherapy, all types of specific hormonal therapy medications, and the chemotherapy medications doxorubicin and docetaxel. This finding is consistent with a previous study conducted by Phillips et al. (2005), who found that women with breast cancer recurrence at the time of questionnaire administration were less likely to report correctly on most questions (aside from chemotherapy treatment) [4]. Furthermore, a study conducted by Liu et al. (2010) also found that breast cancer recurrence was significantly associated with lower agreement when reporting having received chemotherapy [1]. This finding is likely attributable to women reporting treatment for the breast cancer recurrence instead of the original diagnosis.
No other factors were consistently associated with better agreement; however, older age at diagnosis, longer recall period, being born in a non-English-speaking country (in comparison to an English-speaking country), increased alcohol consumption, having a high-school education (in comparison to a bachelor degree), and being a current smoker (in comparison to a nonsmoker) were all associated with worse accuracy when reporting some specific types of chemotherapy medication. In regards to education level, alcohol consumption, and smoking status, it is possible these factors may indicate lower socioeconomic status.
Strengths of this study include a large sample size of women recruited from a sample of women initially identified from a population-based cancer registry. Furthermore, although four studies have previously examined the accuracy of self-reported breast cancer treatment, only three of these studies used medical records as the gold standard when assessing agreement, and only two of these studies examined potential factors associated with agreement. To our knowledge, this study is also the first study to examine agreement between menopausal status recorded in medical records at the time of diagnosis and retrospective assessment of menopausal status from a structured questionnaire.
There are several limitations to this study. Medical records may not always be an appropriate gold standard. For example, a prescription may have been given but was not retrieved in the current collection of medical records or the medication may not have actually been taken, although this is less likely with treatments given in a clinic setting such as radiation or most chemotherapy. However, agreement is still high despite this possibility. Furthermore, a Swedish prospective cohort study also confirmed the validity of self-reporting current hormonal therapy using a 7-day personal diary as the gold standard [9]. Accordingly, taking into consideration that self-reported treatment demonstrates high consistency with both medical records and a personal diary and the limitations of medical record abstraction described above, this study suggests that self-reported treatment is a reasonable option for gathering treatment data in broad categories. Additionally, the women in the study are largely Caucasian (96.5%), which limits the generalizability of the results beyond this group. Also, the study sample primarily included women with early-stage breast cancer; women with late-stage breast cancer, who are more likely to be severely ill, may have a more difficult time recalling the details of their treatment. Furthermore, women recruited were oversampled for family history, and it is possible that having a family history of breast cancer may make women more knowledgeable regarding treatment options, which in turn may influence agreement. However, we did not find that family history was associated with measures of agreement. More research is warranted to examine agreement between self-reported treatment and medical records among average-risk women, and particularly among a more ethnically diverse population.
Given that we found that most characteristics, including age, education, and family history, were not associated with reporting accuracy, it is likely that the levels of agreement we observed would be similar in other nonimmigrant breast cancer patient populations. The self-reported treatment questionnaire also did not collect information on the dates treatments were received or on adherence; accordingly, the accuracy of self-reported information on these aspects could not be examined, which may be an important consideration for some studies (e.g., those examining the association between treatment delay and survival). Finally, since we conducted many statistical tests, individual significant results could have occurred by chance. Consequently, the focus was placed on overall consistent patterns rather than individual results.
In conclusion, the results of this study demonstrate that women can accurately report important details regarding their breast cancer treatment, even, in some cases, for more specific questions pertaining to the details of treatment within the broader categories of hormonal therapy, chemotherapy, and surgery. Accordingly, self-report appears to be an inexpensive method of data collection pertaining to broad categories of cancer treatment, if not always specific drugs, and a feasible alternative to medical record abstraction in large epidemiological studies of breast cancer patients where intensive medical record abstraction may not be possible.