Validity of Administrative Databases in Comparison to Medical Charts for Breast Cancer Treatment Data

Objective. Medical chart abstraction is the gold standard for collecting breast cancer treatment data for monitoring and research. A less costly alternative is the use of administrative databases. This study evaluated administrative data in comparison to medical charts for breast cancer treatment information. Study Design and Setting. A retrospective cohort design identified 2,401 women in the Ontario Breast Screening Program diagnosed with invasive breast cancer from 2006 to 2009. Treatment data were obtained from the Activity Level Reporting and Canadian Institute for Health Information databases. Medical charts were abstracted at cancer centres. Sensitivity, specificity, positive and negative predictive value, and kappa were calculated for receipt and type of treatment, and agreement was assessed for dates. Logistic regression evaluated factors influencing agreement. Results. Sensitivity and specificity for receipt of radiotherapy (92.0%, 99.3%), chemotherapy (77.7%, 99.2%), and surgery (95.8%, 100%) were high but decreased slightly for specific radiotherapy anatomic locations, chemotherapy protocols, and surgeries. Agreement increased by radiotherapy year (trend test, p < 0.0001). Stage II/III compared to stage I cancer decreased odds of agreement for chemotherapy (OR = 0.66, 95% CI: 0.48–0.91) and increased agreement for partial mastectomy (OR = 3.36, 95% CI: 2.27–4.99). Exact agreement in treatment dates varied from 83.0% to 96.5%. Conclusion. Administrative data can be accurately utilized for future breast cancer treatment studies.


Introduction
Breast cancer is the most frequently diagnosed cancer among women in Ontario and is the second leading cause of cancer death [1]. Abstraction of medical charts is considered the gold standard for collecting breast cancer treatment data for monitoring and research purposes. However, this process can be laborious and costly, especially when conducting large population-based epidemiology research [2]. An alternative method is the use of administrative databases.
Previous studies using US data have revealed that the overall agreement between medical charts and administrative data for breast cancer treatment type is high, but agreement for specific treatment types and dates has not been evaluated concurrently [3-10]. Only one study has examined receipt of breast cancer radiotherapy in administrative databases in comparison to medical charts [9], but no studies have examined the validity of the anatomic location that received radiotherapy or the validity of radiotherapy start and end dates. Three US studies have examined the validity of breast cancer chemotherapy protocols [6,7,10] and found that they were of high accuracy. One study has validated chemotherapy dates and found moderate agreement with medical charts [8]. Validity of breast cancer surgery data in administrative databases has been examined in Ontario [11], demonstrating 86.2% agreement with medical charts; however, that study included a small cohort from the early 1990s and validated only very broad categories of surgery. A more recent study evaluated agreement for breast cancer surgery between hospital records, medical claims, and the cancer registry in Manitoba and found substantial or almost perfect agreement between data sources [12]. Surgery dates, however, were not validated in either study. Several studies have examined factors associated with agreement between administrative data and medical charts and found that agreement decreased with age, stage, and diagnosis year and varied by treatment site [3-5, 7, 8].
The Activity Level Reporting (ALR) database is housed at Cancer Care Ontario and collects selected systemic therapy and all radiation treatment from regional cancer centres and their associated hospitals [13]. Admission and discharge information for surgeries in Ontario is collected by the hospital abstracting databases of the Canadian Institute for Health Information (CIHI): the Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS). The purpose of this study was to evaluate the validity of the ALR and CIHI's DAD and NACRS databases in comparison to medical charts for breast cancer treatment data and to examine factors that may influence agreement. Specifically, the validity of the ALR was evaluated for the receipt of radiotherapy and chemotherapy and for specific radiotherapy anatomic locations and chemotherapy protocols and their corresponding treatment dates. CIHI's DAD and NACRS databases were evaluated for the receipt of surgery and for specific surgery types and their corresponding dates.

Selection of Breast Cancer Cases.
The Ontario Breast Screening Program (OBSP) is a province-wide, organized screening program that provides high-quality breast cancer screening services for women aged 50 to 74 [14]. Women are not eligible if they have acute breast symptoms, a history of breast cancer, or current breast implants [14]. This study identified women aged 50-69 screened through the OBSP between January 1, 2006, and December 31, 2009, with an abnormal mammogram and a diagnosis of screen-detected invasive breast cancer. Exclusions included prevalent cancers detected on initial screens, premenopausal women, bilateral or nonprimary breast cancer, non-Ontario residents, diagnoses more than 1 year following abnormal screening, stage IV breast cancer, women who were missing information required to identify treatment centre location, and women who received treatment at a hospital with fewer than 15 eligible women.

Medical Chart Abstraction.
Trained chart abstractors visited regional cancer centres across Ontario and reviewed medical charts to abstract relevant treatment data for all eligible women. Regional cancer centres are specialized centres in Ontario that deliver all cancer radiotherapy, and patients may also be referred to a regional cancer centre for diagnostic work-up, systemic therapy, and/or treatment planning. These centres and their associated hospitals maintain detailed patient charts about breast cancer treatment information and they have been shown to have complete, high-quality information [15,16]. To facilitate chart abstraction, a local collaborator was identified for each regional cancer centre. A chart abstraction form was developed to collect demographic, prognostic, and treatment data. Abstraction occurred between 2014 and 2016.
Age at diagnosis and year of diagnosis were recorded. Women's postal code of residence at screening was linked to the 2006 Canadian Census to determine income quintiles (Q1 (low)-Q5 (high)) and community status [17]. Community status included urban (population 10,000+), rural (<10,000 and a strong Metropolitan Influenced Zone (MIZ)), rural remote (<10,000 and a moderate MIZ), and rural very remote (<10,000 and a weak/no MIZ) [17]. Women were categorized as having no comorbidities if they had no preexisting illnesses other than arthritis or high blood pressure at the time of diagnosis and comorbid if they had any other preexisting illness at diagnosis outlined by the Charlson Index [18]. Breast cancer classification (invasive without associated ductal carcinoma in situ or invasive with associated ductal carcinoma in situ) was also recorded. Stage (I, II, and III) was based on the TNM classification scheme (6th edition) [19] and tumour grade was categorized as 1, 2, or 3. Women with negative results for estrogen and progesterone receptors (i.e., immunohistochemical assays showing <1% of tumour cells positive for antibody nuclear staining) [20] and negative for HER2/neu protein overexpression (score 0, 1+) [21] were categorized as triple negative. Treatment centre region (South Central, South Eastern, South Western, and Northern [22]) was classified according to the regional cancer centre a woman first attended. Radiotherapy data included the anatomic location of radiotherapy given, the start and end dates of treatment, and whether treatment was completed. Chemotherapy data included the chemotherapy protocol given, the start and end dates of treatment, and whether treatment was completed. Surgery data included the type of surgery performed and the date of the surgery.

Administrative Data.
Administrative data on radiotherapy and chemotherapy were obtained from the ALR. The ALR includes data submitted to Cancer Care Ontario by regional cancer centres and their associated hospitals. Radiotherapy data included the visit date of the activity, disease site, and the anatomic location of the body that received treatment. Chemotherapy data included the visit date of the activity, disease site, and medications given during chemotherapy.
Administrative data on surgery were obtained from CIHI's DAD and NACRS databases. DAD is a health services database that receives inpatient hospital discharge data directly from Ontario hospitals. NACRS is a health services database that receives ambulatory hospital and clinic discharge data from hospitals in Ontario. Surgery data included the date of the procedure, the type of procedure, and the main reason for the procedure (e.g., breast cancer).

Data Analysis.
Medical chart data were linked with the ALR and CIHI databases using the Ontario Cancer Registry group number. For ALR radiotherapy data, a record was excluded if it indicated a non-breast disease site, if the treatment end date was more than 3 months after the follow-up chart abstraction date or more than 18 months after the diagnosis date (as these records likely indicate treatment for a second primary tumour or metastasis), if it was within 18 months but related to a recurrence, or if it indicated an incomplete treatment. For ALR chemotherapy data, a record was excluded if it indicated a non-breast disease site, if the treatment end date was more than 3 months after the follow-up chart abstraction date or more than 24 months after the diagnosis date, if it indicated receipt of hormone therapy only, or if it indicated that the chemotherapy protocol was unknown. For CIHI surgery data, a record was excluded if it was unrelated to invasive breast cancer, if it was missing surgery date and type, if the surgery was not treatment related, or if the surgery occurred before diagnosis or more than 12 months after diagnosis.
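The radiotherapy exclusion rules above can be sketched as a simple record filter. This is a minimal illustration only: the analyses were performed in SAS, and the field names, the `add_months` helper, and the use of whole calendar months for the 3- and 18-month windows are assumptions made here, not details of the ALR or of the study's code. The check for an end date before diagnosis is taken from the exclusion counts reported in the Results.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to month end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


@dataclass
class RadiotherapyRecord:
    # Hypothetical field names; not the actual ALR schema.
    disease_site: str
    end_date: date
    is_recurrence: bool
    is_complete: bool


def keep_radiotherapy_record(rec: RadiotherapyRecord,
                             diagnosis_date: date,
                             abstraction_date: date) -> bool:
    """Return True if the record survives every exclusion rule."""
    if rec.disease_site != "breast":
        return False  # non-breast disease site
    if rec.end_date > add_months(abstraction_date, 3):
        return False  # ends more than 3 months after chart abstraction
    if rec.end_date < diagnosis_date:
        return False  # ends before diagnosis
    if rec.end_date > add_months(diagnosis_date, 18):
        return False  # ends more than 18 months after diagnosis
    if rec.is_recurrence:
        return False  # within 18 months but related to a recurrence
    if not rec.is_complete:
        return False  # incomplete treatment course
    return True


# Made-up example: diagnosis in March 2007, chart abstracted in June 2015.
diagnosis = date(2007, 3, 15)
abstraction = date(2015, 6, 1)
on_time = RadiotherapyRecord("breast", date(2007, 9, 1), False, True)
too_late = RadiotherapyRecord("breast", date(2008, 10, 1), False, True)
print(keep_radiotherapy_record(on_time, diagnosis, abstraction))
print(keep_radiotherapy_record(too_late, diagnosis, abstraction))
```

The chemotherapy and surgery rules follow the same pattern with their own windows (24 and 12 months after diagnosis, respectively) and their own record-level flags.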
Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and kappa statistics were calculated for the receipt of radiotherapy, chemotherapy, and surgery, as well as for specific radiotherapy anatomic locations, chemotherapy protocols, and surgery types. The kappa statistic accounts for chance agreement and was classified into five levels: slight agreement (<0.20), fair agreement (0.21-0.40), moderate agreement (0.41-0.60), substantial agreement (0.61-0.80), and almost perfect agreement (0.81-1.00) [23,24]. To compare radiotherapy anatomic locations, analyses were restricted to women that received radiotherapy according to medical charts. Bilateral internal mammary chain was grouped with the supraclavicular/axilla region and radiation to the breast and chest wall were grouped. To compare chemotherapy protocols, analyses were restricted to women that received chemotherapy according to medical charts and had only one chemotherapy protocol. Agreement analyses for chemotherapy protocols were further restricted to exclude women that received a clinical trial according to the ALR, as it was not noted which specific drug(s) they received. To compare surgery types, analyses were restricted to women that received surgery according to medical charts. Percent agreement for radiotherapy and chemotherapy start and end dates and surgery dates in medical charts and administrative databases was also calculated for an exact match, ±1 day and ±7 days. Dates were validated by type for the most frequent treatment type and restricted to those women with treatment types that matched between the medical charts and the administrative data.
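For readers who wish to reproduce these measures, the following sketch computes them from a 2×2 table in which the medical chart is the reference standard. It is not the study's SAS code; the function names and the example counts are hypothetical. Because the published kappa bands leave small gaps (for example, between 0.80 and 0.81), the sketch rounds kappa to two decimals before assigning a level, which is an assumption made here.

```python
def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Validity measures with the medical chart as the reference standard.

    tp: treatment in both chart and administrative data
    fp: treatment in administrative data only
    fn: treatment in chart only
    tn: treatment in neither source
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of each source.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }


def kappa_level(kappa: float) -> str:
    """Map kappa to the five agreement levels, after rounding to 2 decimals."""
    k = round(kappa, 2)
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up counts for illustration; these are not study data.
metrics = agreement_metrics(tp=90, fp=5, fn=10, tn=95)
print({name: round(value, 3) for name, value in metrics.items()})
print(kappa_level(metrics["kappa"]))
```

The measures are undefined when a marginal total is zero or when chance agreement equals 1, which is the situation described later for breast/chest wall radiation, where almost all women received the treatment.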
Multivariable logistic regression estimated odds ratios (OR) and 95% confidence intervals (CI) for factors influencing agreement between administrative databases and medical charts. Age at diagnosis (50-59, 60-69), year of diagnosis (2006, 2007, 2008, and 2009), TNM stage (I, II/III), and treatment centre region (South Central, South Eastern, South Western, and Northern) were adjusted for in all models. An offset controlling for agreement due to chance was included in all models [25]. Regression analyses were conducted for the receipt of each therapy and for the most common treatment type for each therapy. All analyses were performed using SAS version 9.4 [26]. The study was approved by the University of Toronto Research Ethics Board and informed consent was not required.

Results
Overall, 2,518 eligible women diagnosed with stages I-III, unilateral, screen-detected invasive breast cancer were identified (Figure 1). Women were excluded if their medical charts were not available (n = 40), not eligible after review (n = 66), or incomplete (n = 11). The final sample consisted of 2,401 women of whom 2,375 had complete radiotherapy data, 2,292 had complete chemotherapy data, and 2,400 had complete surgery data.
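The cohort counts in this paragraph can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Counts reported in the text; names are illustrative only.
eligible = 2518
excluded = {
    "chart not available": 40,
    "not eligible after review": 66,
    "chart incomplete": 11,
}

final_sample = eligible - sum(excluded.values())
retained_pct = round(100 * final_sample / eligible, 1)

print(final_sample)   # 2401, matching the reported final sample
print(retained_pct)   # 95.4 percent of eligible women retained
```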
There were a total of 51,020 records in the ALR for the 2,375 women in the radiotherapy cohort (Figure 2(a)). Records were excluded if they indicated a non-breast disease site (n = 766), if the treatment end date was more than 3 months after follow-up chart abstraction date (n = 629), if the treatment end date was before diagnosis (n = 25) or more than 18 months after the diagnosis (n = 2,750), if treatment was within 18 months of diagnosis but was related to a recurrence (n = 73), or if the radiation course was incomplete (n = 3,430). There were a total of 34,539 records in the ALR for the 2,292 women in the chemotherapy cohort (Figure 2(b)). Records were excluded if they indicated a non-breast disease site (n = 1,330), if the treatment end date was more than 3 months after follow-up chart abstraction date (n = 1,989) or more than 24 months after diagnosis (n = 5,205), if the records were for hormone therapy only (n = 101), or if the chemotherapy protocol was unknown (n = 53). There were a total of 120,328 records in the CIHI databases for the 2,400 women in the surgery cohort (Figure 2(c)). Records were .
Women with complete treatment information for radiotherapy, chemotherapy, and surgery represent overlapping cohorts and thus have very similar characteristics (Table 1). Approximately two-thirds of women were aged 60 to 69 and one-third were diagnosed in 2009. The majority were from an urban community setting and approximately one-quarter were from the highest income quintile. More than half of the women did not report any comorbidities, and most were classified as having invasive breast cancer with associated ductal carcinoma in situ. Approximately two-thirds of women had stage I breast cancer, with half having an intermediate grade tumour. Most patients did not have triple negative hormone status. More than half of the women received treatment in the South Central region of Ontario, which represents the Greater Toronto Area.
For receipt of radiotherapy, sensitivity was 92.0%, specificity was 99.3%, PPV was 99.8%, and NPV was 71.9% (Table 2). The kappa statistic showed substantial agreement between medical charts and the ALR (0.793). Sensitivity was high for breast/chest wall radiation (88.2%) and supraclavicular/axilla radiation (86.6%) but lower for breast boost (43.3%), while specificity was greater than 98% for both breast boost and supraclavicular/axilla radiation. PPV was more than 90% for all radiotherapy anatomic locations and NPV was more than 83% for all radiotherapy anatomic locations. The kappa statistic showed moderate agreement for breast boost (kappa = 0.520) and almost perfect agreement for supraclavicular/axilla radiation (kappa = 0.868).
For receipt of chemotherapy, sensitivity was 77.7%, specificity was 99.2%, PPV was 98.2%, and NPV was 88.5% (Table 2). The kappa statistic showed substantial agreement between medical charts and the ALR (0.804). Specificity was greater than 94% for all chemotherapy protocols. PPV was more than 90% for all chemotherapy protocols, except for TC, which was 72.3%. The NPV for all chemotherapy protocols was above 80%. The kappa statistic showed substantial agreement for all chemotherapy protocols (kappa = 0.682 for FEC-D, 0.714 for AC, 0.733 for ACP/ACT, and 0.734 for FEC), except for TC, which showed moderate agreement (kappa = 0.559).
For receipt of radiotherapy, odds of chance-corrected agreement were 1.41 times higher for women aged 60-69 compared to women aged 50-59 (OR = 1.41, 95% CI: 1.00-1.99;  medical charts and administrative data (

Discussion
This study found that sensitivity, specificity, PPV, NPV, and kappa were high for receipt of radiotherapy, chemotherapy, and surgery. Agreement decreased slightly when considering specific radiotherapy anatomic locations, chemotherapy protocols, and surgery types. Odds of chance-corrected agreement tended to increase with more recent diagnosis year and were impacted by stage of treatment. Approximately 95% of start and end dates for radiotherapy and chemotherapy and surgery dates in administrative databases were within a week of the dates recorded in the medical charts. Agreement for receipt of radiotherapy overall was substantial. Sensitivity and PPV were high for breast/chest wall radiation; however specificity and NPV could not be reliably calculated due to minimal variability, as almost all women received this type of radiation. Agreement was also very high for supraclavicular/axilla radiation, but there was lower sensitivity and only moderate agreement for breast boost. Moderate agreement for breast boost is likely a result of select misclassification in the ALR (i.e., if the original treated site is recorded instead of coding the treatment as a breast boost). Overall, results of this study are consistent with previous research with other administrative databases in the US, which indicated the agreement for the receipt of radiotherapy as substantial (kappa = 0.70 to 0.79) [3,6]. To our knowledge, this was the first paper to validate the anatomic location that received radiotherapy.
Results for chemotherapy agreement are also consistent with previous work from the US, which indicated that the agreement for the receipt of chemotherapy was substantial (kappa = 0.62 to 0.79) [3-5, 8] or almost perfect (kappa = 0.82 to 0.89) [6,10]. Agreement was slightly lower for specific chemotherapy protocols, consistent with other research [6,10]. Disagreement in chemotherapy protocols in our study was often a result of the ALR missing one drug from a multidrug protocol or listing no treatment information for a woman who had corresponding records in the medical charts. The latter may be explained by the limitation that some smaller hospitals in Ontario did not report to the ALR during the time period of our study [13].
Sensitivity, specificity, and PPV were high for receipt of treatment surgery, which aligns with previous research on 1991 data from the same CIHI database [11]. Agreement was lowest for axillary node dissection and sentinel lymph node biopsy, which was expected because reporting of these procedures in CIHI databases was only optional until 2015 if it occurred during the same episode as the primary lumpectomy or mastectomy. Previous research in Manitoba comparing hospital records with medical claims and the Manitoba Cancer Registry also found lower agreement for axillary node dissection when compared to other surgery types [12].
Results from this study indicated that agreement for receipt of chemotherapy is not impacted by age. This result is consistent with some literature from the US [7], but not other literature which showed a decrease in agreement with increases in age [4]. Our study also showed that chemotherapy and radiotherapy agreement increased with more recent diagnosis year. Although this finding was inconsistent with other literature [4], it was expected in our study based on active efforts to increase the accuracy of the ALR after its establishment in the late 1990s. Results from this study do align with previous research for stage, with more advanced stage of breast cancer resulting in higher odds of disagreement for receipt of chemotherapy [4]. This may suggest poorer agreement for palliative versus curative treatments. Conversely, we found that agreement increased with advanced stage of breast cancer for partial mastectomy, possibly indicating more substantial agreement for more complex cases requiring longer hospital stays.
Exact agreement in radiotherapy start dates was extremely high at 96.5%. There was slightly less agreement in radiotherapy end dates; however, the discrepancies were minor as agreement increased from 83% to 95.4% when considering end dates within 1 week of the medical chart end dates. Agreement in chemotherapy start and end dates was also high, with disagreements mostly due to missing ALR records at the beginning of treatment protocols. Agreement in dates for surgery was extremely high. Mismatched dates mostly occurred in women whose first treatment surgery was also diagnostic, as these procedures were not coded in CIHI as breast cancer-related and were therefore excluded during data cleaning. To our knowledge, only one previous study has validated dates for breast cancer treatment, finding only 86% agreement ±30 days [8]. Overall, treatment dates in the ALR and CIHI databases were highly accurate, which means that this data is being used reliably for monitoring and evaluation of Ontario treatment wait times.
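The date comparisons discussed here (exact match, ±1 day, and ±7 days) amount to counting date pairs whose difference falls within a tolerance. A minimal sketch follows, assuming each woman contributes one chart date and one administrative date for a matched treatment type; the function name and the example dates are hypothetical and are not study data.

```python
from datetime import date


def percent_date_agreement(pairs, tolerance_days=0):
    """Percent of (chart_date, admin_date) pairs within the given tolerance."""
    if not pairs:
        raise ValueError("at least one date pair is required")
    matches = sum(
        1 for chart_date, admin_date in pairs
        if abs((chart_date - admin_date).days) <= tolerance_days
    )
    return 100.0 * matches / len(pairs)


# Made-up pairs with differences of 0, 1, 5, and 10 days.
example_pairs = [
    (date(2008, 5, 1), date(2008, 5, 1)),
    (date(2008, 5, 1), date(2008, 5, 2)),
    (date(2008, 5, 1), date(2008, 5, 6)),
    (date(2008, 5, 1), date(2008, 5, 11)),
]

for tolerance in (0, 1, 7):
    print(tolerance, percent_date_agreement(example_pairs, tolerance))
```

Widening the tolerance can only hold or raise the percentage, which is why agreement for radiotherapy end dates rises when dates within a week are counted as matches.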
Strengths of this study include a large sample size and use of data from a population-based cohort of screened women, with access to the medical charts of more than 95% of the eligible women. Also, this was the first study to our knowledge to validate breast cancer treatment types and dates in the ALR and CIHI databases. However, there are several limitations. First, the definitions used for the type of therapy may have differed between medical charts and administrative data. In addition, some types of treatments could not be compared because they were not present in both data sources. The results may not generalize to women diagnosed outside of the OBSP, those with stage IV breast cancers, in situ breast cancers, or other cancers which might have different treatments, such as oral rather than systemic chemotherapy. Finally, while medical charts were used as the gold standard, previous research has suggested that a true gold standard may not exist [27].

Conclusions
Agreement between medical charts and administrative databases for breast cancer treatment data varied from moderate to almost perfect, depending on treatment type. In future Ontario studies, chart review may not be required for collection of breast cancer treatment data. Future research could validate more specific treatment details, such as radiotherapy dose levels, to determine if administrative databases suffice for more detailed epidemiological research.

Disclosure
The Canadian Institutes of Health Research had no involvement in the study design, data collection, analysis, interpretation, manuscript preparation, or the decision to submit the manuscript for publication.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

1 The corresponding administrative database is the Activity Level Reporting database (ALR) for radiotherapy and chemotherapy treatment data, and Canadian Institute for Health Information (CIHI) databases for surgical data.
2 Including brachytherapy/internal radiation, clinical trials, and unknown and other types of radiotherapy. N = 2,375 for receipt of radiotherapy (women with complete radiotherapy data in medical charts), and N = 1,970 for radiotherapy types (women who received radiotherapy according to medical charts).
3 Including clinical trials, unknown, and all chemotherapy protocols. N = 2,292 for receipt of chemotherapy (women with complete chemotherapy data in medical charts), and N = 652 for chemotherapy protocols (women who received one chemotherapy treatment according to medical charts).

Table 3: Odds ratios (ORs) and 95% confidence intervals (CIs) comparing agreement versus disagreement between medical records and administrative database information in women with invasive breast cancer for each treatment and its most common type.