The validity of diagnosing chronic obstructive pulmonary disease from a large administrative database

¹Centre de recherche, Centre de pneumologie, Hôpital Laval, Institut universitaire de cardiologie et de pneumologie de l’Université Laval, Québec, Québec; ²Department of Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota, USA; ³Réseau québécois de l’asthme et de la MPOC, Québec

Correspondence: Dr Yves Lacasse, Centre de Pneumologie, Hôpital Laval, 2725 Chemin Ste-Foy, Ste-Foy, Québec G1V 4G5. Telephone 418-656-4747, fax 418-656-4762, e-mail Yves.Lacasse@med.ulaval.ca

Y Lacasse, VM Montori, C Lanthier, F Maltais. The validity of diagnosing chronic obstructive pulmonary disease from a large administrative database. Can Respir J 2005;12(5):251-256.

Administrative databases can be useful to clinical researchers. Health authorities (often the payers of health care) create and maintain administrative databases by compiling claims data sets. Claims data include the patient diagnosis that motivated the provision of services and the charges for the services provided. Typically, the database includes patient demographics and patient-level data about the utilization of health care resources. Administrators and health care researchers can harness the information in these databases to ascertain resource utilization, even when care involves several providers and health care centres. However, the information is limited to billable services and to claims submitted to the same payer. When a single payer reimburses all health care services, these databases afford the opportunity to conduct large population-based observational studies that can answer health services questions (eg, questions about utilization or quality of care) and clinical questions with minimal bias due to referral, nonresponse and dropouts (1).
Despite the potential advantages of administrative databases, the validity of the diagnoses they contain must be considered. Incorrect clinical diagnoses (a problem that affects not only administrative databases but also clinical databases, including medical records), incorrect billing after a correct diagnosis and clerical errors can all limit the validity of diagnoses in administrative databases.
Chronic obstructive pulmonary disease (COPD) has several characteristics that make it suitable for evaluating the validity of an administrative database: COPD is very prevalent; it is a chronic condition with exacerbations requiring medical attention; family physicians, internists and specialists participate in the care of patients with this condition; and it mimics other chronic lung diseases such as asthma. The present study aimed to determine the validity of the diagnosis of COPD in the administrative database of the Quebec universal medical insurance register (Régie de l'assurance-maladie du Québec; RAMQ).

©2005 Pulsus Group Inc. All rights reserved.

METHODS

Data source
RAMQ is the government body responsible for universal medical insurance registration for the 7.4 million people who live in the province of Quebec. The RAMQ database includes claims filed for physician services (with a few exceptions, such as general practitioners paid on wage contracts) and prescriptions in Quebec (2). The physician services data include the patient's age and sex, medical procedures, the specialty of the physician who filed the claim with RAMQ and the diagnoses. RAMQ does not verify these diagnoses, which physicians usually record at the time of the encounter; furthermore, a diagnosis is not required to obtain reimbursement. The RAMQ database uses the Ninth International Classification of Diseases (ICD-9) (3) to code diagnoses (Table 1).
The RAMQ database includes prescription data, including the drug name and dispensing date, for all prescriptions filled by registered patients aged 65 years or older and by patients on social security. The database does not, however, include information on medications dispensed during hospitalization or a nursing home stay, or on home oxygen use. In addition, RAMQ collects information on diagnostic and therapeutic procedures performed in hospitals and ambulatory facilities; spirometry is not included because it is not reimbursed.
Permission to access the RAMQ database was obtained from the Commission de l'accès à l'information du Québec. When the database is used for research, each patient is identified by a unique number, which is encrypted to ensure that patients cannot be identified by the researchers.

Internal validation procedure
All entries matching the diagnosis of COPD (ICD-9 codes 490 [see Sensitivity analyses below], 491, 492 and 496) (4) and those matching the diagnosis of asthma (ICD-9 code 493) were obtained from the RAMQ for the period from April 1, 1994 to March 31, 1999. Acute bronchitis (ICD-9 code 466) was not considered in the analysis. Outpatient as well as hospital claims were included. Unless otherwise specified, all entries were handled on a per-patient basis (rather than on a per-occurrence or per-visit basis). For instance, a patient registered four times as having COPD during the study period, following as many visits, would contribute only once to the analysis.
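The per-patient rule can be illustrated with a short Python sketch. It is not the authors' code: the claim layout `(patient_id, icd9_code)`, the helper names and the toy records are hypothetical, and the sketch assumes that ICD-9 codes may carry a decimal subcode (eg, 491.2) that is truncated to the three-digit category before matching.

```python
from collections import defaultdict

# Three-digit ICD-9 categories for the primary analysis (490 excluded, 466 ignored).
COPD_CODES = {"491", "492", "496"}
ASTHMA_CODES = {"493"}


def codes_per_patient(claims):
    """Collapse claim-level entries into one set of ICD-9 categories per patient.

    `claims` is an iterable of (patient_id, icd9_code) pairs; this layout is a
    hypothetical stand-in for the real RAMQ extract.
    """
    per_patient = defaultdict(set)
    for patient_id, icd9 in claims:
        # Keep only the three-digit category, eg "491.2" -> "491".
        per_patient[patient_id].add(icd9.split(".")[0])
    return dict(per_patient)


def count_patients_with(claims, codes):
    """Count distinct patients (not visits) with at least one matching code."""
    return sum(1 for patient_codes in codes_per_patient(claims).values()
               if patient_codes & codes)


# Toy records: p1 has four COPD visits but is counted only once.
claims = [
    ("p1", "491"), ("p1", "491.2"), ("p1", "496"), ("p1", "491"),
    ("p2", "493"),
    ("p3", "466"),  # acute bronchitis: not part of either definition
]
print(count_patients_with(claims, COPD_CODES))    # 1
print(count_patients_with(claims, ASTHMA_CODES))  # 1
```

Collapsing to a set per patient before counting is what makes the analysis per-patient rather than per-visit.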

Validity criteria for COPD in the RAMQ database
Before the data were obtained, a set of validity criteria for the RAMQ-diagnosis of COPD was identified, taking into account the epidemiology of COPD in the province and its standard treatment.
It was proposed that COPD is rare in patients younger than 40 years of age and becomes increasingly common with age (4). The risk for COPD increases directly with the intensity of cigarette smoking (5), the most important risk factor for COPD; many patients with a diagnosis of COPD have a smoking history of more than 20 pack-years (6). Thus, COPD mostly affects the elderly and is rare in patients younger than 40 years of age.
To evaluate these criteria, the prevalence of COPD (according to ICD-9 codes) was estimated by age group (45 to 54 years, 55 to 64 years, 65 to 74 years, and 75 years or older) using the 1996 age-adjusted Quebec population as the denominator. These data were then compared with prevalence data from the National Population Health Survey conducted by Statistics Canada in 1994/1995 (4,7). Although this survey recorded patient-reported diagnoses of chronic bronchitis or emphysema made by a health care professional, it nevertheless provided estimates of the prevalence of COPD that were similar to spirometry-based estimates reported in other studies (8). However, the survey provided national (rather than provincial) estimates, which may underestimate the true prevalence of COPD in Quebec because of a higher smoking rate among French Canadians (mostly living in the province of Quebec) than among English Canadians (9).
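The age-group prevalence calculation can be sketched in Python as a crude ratio of diagnosed patients to population in each group. This is an illustration under stated assumptions rather than the authors' procedure: the per-patient ages, the `population` dictionary and the function names are hypothetical, and the age adjustment of the 1996 denominator is not reproduced.

```python
# ASSUMPTION (hypothetical inputs): one age per diagnosed patient, and
# `population` holds population counts for the same age groups.
AGE_GROUPS = [(45, 54), (55, 64), (65, 74), (75, None)]  # None = open-ended upper bound


def age_group(age):
    """Return the age group containing `age`, or None if younger than 45."""
    for low, high in AGE_GROUPS:
        if age >= low and (high is None or age <= high):
            return (low, high)
    return None


def prevalence_by_age_group(patient_ages, population):
    """Prevalence (%) per age group: diagnosed patients / population * 100."""
    cases = {group: 0 for group in AGE_GROUPS}
    for age in patient_ages:
        group = age_group(age)
        if group is not None:
            cases[group] += 1
    return {group: 100.0 * cases[group] / population[group] for group in AGE_GROUPS}


# Toy numbers only; these are not RAMQ or survey figures.
ages = [50, 52, 60, 70, 71, 80, 30]
population = {(45, 54): 100, (55, 64): 100, (65, 74): 50, (75, None): 10}
print(prevalence_by_age_group(ages, population))
# {(45, 54): 2.0, (55, 64): 1.0, (65, 74): 4.0, (75, None): 10.0}
```

The patient aged 30 falls outside the reported groups and is ignored, mirroring the age bands used for the comparison with the survey.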
To be credible, the RAMQ-diagnosis of COPD should occur repeatedly in the entries for a given patient and should occur rarely in patients with asthma. Although COPD and asthma may coexist and are often treated in the same way, prescriptions of theophyllines and anticholinergics should be more frequent in COPD cases than in asthma cases, and leukotriene receptor antagonists should not be prescribed in COPD cases.
To assess these criteria, the numbers of patients registered as having COPD, asthma, and both COPD and asthma were counted. For patients with a RAMQ-diagnosis of COPD (and not of asthma), the number of times they received this diagnosis during the five-year study period (up to 10 occurrences) was counted. Among patients aged 65 years or older, the proportion who filled at least one prescription over the five-year study period was computed for leukotriene receptor antagonists (an asthma-specific medication) and for ipratropium bromide and theophylline (medications for COPD that clinicians rarely prescribe for patients with asthma [10]). For comparison, the proportion of patients who filled prescriptions for inhaled beta-2 (β2)-agonists (medications appropriate for patients with either COPD or asthma) was also computed.
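The prescription criterion reduces to counting distinct patients with at least one fill per drug class. A minimal Python sketch follows; the `(patient_id, drug_class)` dispensing layout, the drug-class labels and the toy data are hypothetical, and the patient group is assumed to be already restricted to those aged 65 years or older, for whom the database holds prescription data.

```python
# Drug classes compared in the analysis; the string labels are hypothetical.
DRUG_CLASSES = ["ipratropium", "theophylline", "leukotriene_antagonist", "beta2_agonist"]


def proportion_filling(patient_ids, dispensings):
    """Percentage of a patient group who filled >= 1 prescription of each class.

    `patient_ids`: patients in one diagnostic group (assumed already restricted
    to those aged 65 years or older).
    `dispensings`: iterable of (patient_id, drug_class) pairs over the study period.
    """
    group = set(patient_ids)
    filled = {drug: set() for drug in DRUG_CLASSES}
    for patient_id, drug in dispensings:
        if patient_id in group and drug in filled:
            filled[drug].add(patient_id)  # a set counts each patient once
    return {drug: 100.0 * len(filled[drug]) / len(group) for drug in DRUG_CLASSES}


copd_group = ["a", "b", "c", "d"]
dispensings = [
    ("a", "ipratropium"), ("a", "ipratropium"),  # repeat fills count once
    ("b", "theophylline"),
    ("a", "beta2_agonist"), ("c", "beta2_agonist"),
    ("z", "ipratropium"),  # patient outside the group: ignored
]
print(proportion_filling(copd_group, dispensings))
# {'ipratropium': 25.0, 'theophylline': 25.0, 'leukotriene_antagonist': 0.0, 'beta2_agonist': 50.0}
```

Running the same function on the asthma-only and COPD-plus-asthma groups gives the between-group comparison described above.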

Operational definitions of COPD
Finally, two operational definitions of COPD were proposed. These definitions are arbitrary and do not rely on smoking history or spirometric data because such information was not available in the database. Patients with 'possible COPD' were defined as those aged 65 years or older who were always registered as having COPD (never as having asthma), who appeared at least three times in the database and who filled prescriptions for ipratropium bromide or β2-agonists. Patients with 'probable COPD' satisfied the criteria for 'possible COPD' and were also given the diagnosis of COPD by a specialist (either a pulmonologist or an internist) at least once. This latter criterion reflects the belief that a visit to a specialist for COPD increases the likelihood that the diagnosis of COPD was based on spirometry.
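The two operational definitions can be written as nested predicates, so that every patient meeting the 'probable COPD' criteria also meets the 'possible COPD' criteria. The Python sketch below is illustrative only: the `PatientSummary` fields are hypothetical, "appeared at least three times in the database" is interpreted here as at least three COPD entries, and age is assumed to be measured at a single reference date.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary assembled from claims and dispensing data."""
    age: int
    copd_entries: int          # COPD entries (ICD-9 491, 492 or 496) in the study period
    asthma_entries: int        # asthma entries (ICD-9 493) in the study period
    filled_ipratropium_or_beta2: bool
    copd_by_specialist: bool   # >= 1 COPD entry from a pulmonologist or internist


def is_possible_copd(p: PatientSummary) -> bool:
    """'Possible COPD': aged >= 65, never asthma, >= 3 COPD entries, relevant drug filled."""
    return (p.age >= 65
            and p.asthma_entries == 0
            and p.copd_entries >= 3
            and p.filled_ipratropium_or_beta2)


def is_probable_copd(p: PatientSummary) -> bool:
    """'Probable COPD': 'possible COPD' plus a COPD diagnosis from a specialist."""
    return is_possible_copd(p) and p.copd_by_specialist


patients = [
    PatientSummary(72, 4, 0, True, True),   # possible and probable
    PatientSummary(68, 3, 0, True, False),  # possible only
    PatientSummary(60, 5, 0, True, True),   # too young for either
    PatientSummary(75, 6, 1, True, True),   # has an asthma entry: excluded
]
print(sum(map(is_possible_copd, patients)), sum(map(is_probable_copd, patients)))  # 2 1
```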

Sensitivity analyses
Of the four selected ICD-9 codes defining COPD (490, 491, 492 and 496), code 490 (bronchitis, not specified as acute or chronic) is ill-defined and may not specifically represent COPD. It was hypothesized that including patients with this code in the analysis would decrease the specificity of the RAMQ-diagnosis of COPD. Therefore, the primary analysis includes only codes 491, 492 and 496. However, because researchers have traditionally included code 490 in the definition of COPD (11-13), the analyses described above were repeated with ICD-9 code 490 entries included.
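The sensitivity analysis amounts to re-running the same per-patient classification with a wider code set. The Python sketch below, using hypothetical toy patients and helper names, shows how adding code 490 can move patients between the COPD-only, asthma-only and overlap groups, and how it changes the share of asthma patients who also carry a COPD code.

```python
PRIMARY_COPD = {"491", "492", "496"}
SENSITIVITY_COPD = PRIMARY_COPD | {"490"}  # adds "bronchitis, not specified as acute or chronic"
ASTHMA = {"493"}


def venn_counts(patient_codes, copd_codes):
    """Split patients into COPD only, asthma only and both.

    `patient_codes` (hypothetical layout) maps each patient to the set of
    three-digit ICD-9 categories recorded over the study period.
    """
    counts = {"copd_only": 0, "asthma_only": 0, "both": 0}
    for codes in patient_codes.values():
        has_copd = bool(codes & copd_codes)
        has_asthma = bool(codes & ASTHMA)
        if has_copd and has_asthma:
            counts["both"] += 1
        elif has_copd:
            counts["copd_only"] += 1
        elif has_asthma:
            counts["asthma_only"] += 1
    return counts


def asthma_overlap_pct(counts):
    """Percentage of asthma patients who also carry a COPD diagnosis."""
    asthma_total = counts["both"] + counts["asthma_only"]
    return 100.0 * counts["both"] / asthma_total


# Toy data only.
patients = {
    "p1": {"491"},
    "p2": {"493"},
    "p3": {"490", "493"},
    "p4": {"490"},
    "p5": {"492", "493"},
}
for label, code_set in [("primary", PRIMARY_COPD), ("with 490", SENSITIVITY_COPD)]:
    counts = venn_counts(patients, code_set)
    print(label, counts, round(asthma_overlap_pct(counts), 1))
```

In this toy example, patient p3 moves from "asthma only" to "both" once code 490 counts as COPD, which raises the overlap share in the same direction as the change between Figure 1A and Figure 1B.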

RESULTS

Study cohort
From April 1, 1994 to March 31, 1999, the RAMQ database included the following: 209,308 individuals (3% of the Quebec population) of all ages who had, at least once in this five-year period, a diagnosis of COPD; 648,055 individuals who had a diagnosis of asthma; and 93,879 individuals who had on separate occasions the diagnoses of COPD and asthma.

Internal validity of the RAMQ-diagnosis of COPD
Validity criteria for COPD in the RAMQ database: The prevalence of COPD increased with age (Table 2), and COPD was infrequent in patients younger than 40 years of age (Table 3). Approximately 32% of those aged 40 years or older with a RAMQ-diagnosis of COPD also had a RAMQ-diagnosis of asthma; 33% of those with a RAMQ-diagnosis of asthma also had a RAMQ-diagnosis of COPD (Figure 1A). Figure 2 describes the number of RAMQ-diagnoses of COPD per patient during the five-year study period. Forty-two per cent of patients with a RAMQ-diagnosis of COPD (who never had a RAMQ-diagnosis of asthma) appeared only once with that diagnosis in the database (median number of occurrences = 1, interquartile range 1 to 5). Table 4 describes the extent to which patients filled prescriptions. Ipratropium bromide was prescribed to 11% of patients with asthma compared with 33% of patients with COPD. Similarly, theophylline was prescribed to 14% of patients with asthma compared with 23% of patients with COPD. Leukotriene receptor antagonists were rarely used in either group. Almost one-half (47%) of patients aged 65 years or older with a RAMQ-diagnosis of COPD did not fill prescriptions for inhaled β2-agonists (the first-line pharmacological therapy for COPD during the study period). A relatively low prescription rate for β2-agonists was also found in patients with asthma. The proportion of patients with both the COPD and asthma diagnoses who filled prescriptions was higher than in either of the other groups. This may be because patients must have at least two entries in the database to fall into this category, which increases the likelihood of true airway disease.
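The per-patient distribution summarized in Figure 2 (counts pooled at 10 occurrences, with cumulative percentages) and its median and interquartile range can be computed as in the Python sketch below. The counts are toy values; pooling everything above 10 into the last category is our assumption about how the 'up to 10 occurrences' rule was applied, and `statistics.quantiles` uses its default 'exclusive' method, which may differ from the quartile convention the authors used.

```python
import statistics


def occurrence_distribution(counts, cap=10):
    """Patients by number of COPD entries; counts above `cap` are pooled at `cap` (assumed rule)."""
    dist = {k: 0 for k in range(1, cap + 1)}
    for c in counts:
        dist[min(c, cap)] += 1
    return dist


def cumulative_percent(dist):
    """Cumulative percentage of patients up to each number of entries."""
    total = sum(dist.values())
    running, out = 0, {}
    for k in sorted(dist):
        running += dist[k]
        out[k] = 100.0 * running / total
    return out


def median_and_iqr(counts):
    """Median and interquartile range of the uncapped per-patient counts."""
    q1, _, q3 = statistics.quantiles(counts, n=4)  # default 'exclusive' method
    return statistics.median(counts), (q1, q3)


counts = [1, 1, 1, 2, 3, 5, 8, 12]  # toy per-patient entry counts, each >= 1
print(cumulative_percent(occurrence_distribution(counts))[1])  # 37.5
print(median_and_iqr(counts))  # (2.5, (1.0, 7.25))
```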
Operational definitions of COPD: A total of 42,652 patients fulfilled the criteria for 'possible COPD' and 26,311 fulfilled the criteria for 'probable COPD' (Table 5). They represent 37% and 23%, respectively, of all patients aged 65 years or older who were registered at least once as having COPD (and never asthma) in the RAMQ database.
Sensitivity analyses (including ICD-9 code 490 in the RAMQ-diagnosis of COPD): As shown in Table 2, 297,809 patients aged 45 years or older received the diagnosis of bronchitis (ICD-9 code 490) in the RAMQ database during the study period. When classified as COPD, these patients represent approximately two-thirds of all patients with a RAMQ-diagnosis of COPD. Their inclusion greatly inflated the prevalence of COPD across all age groups (Table 2), increased the overlap between the RAMQ-diagnoses of COPD and asthma (57% of patients with a RAMQ-diagnosis of asthma also had a RAMQ-diagnosis of COPD; Figure 1B), and increased the proportion of patients with a RAMQ-diagnosis of COPD who did not fill prescriptions for inhaled β2-agonists (63%; Table 4).

DISCUSSION
Overall, we were able to verify that the RAMQ-diagnosis of COPD is consistent with the epidemiology of this condition, including an age-dependent increase in prevalence and few cases in patients younger than 40 years of age. However, for patients aged 65 years or older, the prevalence of COPD estimated from the database was more than twice the prevalence derived from the 1994/1995 National Population Health Survey (4,7). A substantial number of children and young adults were identified in the database as having 'emphysema' or 'chronic airway obstruction, not elsewhere classified'. In addition, we found extensive co-diagnosis of COPD and asthma, which could not be disentangled by considering the prescription of COPD- or asthma-specific medications. Perhaps most notable, given the five-year study period, was the rarity of multiple visits and of appropriate therapy. Including ICD-9 code 490 entries in the analyses inflated the prevalence and decreased the specificity of the RAMQ-diagnoses of COPD and asthma.

Limitations of databases in the diagnosis of obstructive lung diseases
A diagnosis of COPD in clinical practice requires a history of chronic progressive symptoms (cough and/or wheeze and/or breathlessness) supported by objective evidence of airway obstruction, ideally using spirometric testing (6). Although spirometry does not fully capture the impact of COPD on a patient's health, it remains the gold standard for diagnosing the disease and monitoring its progression (5). Unfortunately, spirometry is not reimbursed in Quebec and, therefore, spirometric data are absent from the RAMQ database. Other limitations of the RAMQ database particular to the diagnosis of COPD include lack of information about nicotine-dependence interventions, home oxygen use and pulmonary rehabilitation.
Our work cannot pinpoint whether the 'noise' in these data comes from incorrect clinical diagnoses, incorrect billing diagnostic codes despite correct clinical diagnoses, or database-level clerical errors. We did not review medical records from patients with spirometry-documented COPD to assess the diagnostic validity of the RAMQ database. The ideal validation study would identify a random sample of patients with a RAMQ-diagnosis of COPD and would review their medical records or conduct clinical assessments to verify the diagnosis. To avoid selection bias, investigators would have to sample patients of all ages receiving care in all clinical settings (private practice, walk-in clinics, primary, secondary and tertiary hospitals) in all regions of the province. For example, investigators used this approach to evaluate the validity of an algorithm to identify patients with COPD in the United Kingdom General Practice Research Database (14). The chance-adjusted agreement between the database diagnosis and the general practitioners' diagnosis in a small representative sample of patients in the database (225 with COPD and 75 with asthma) was moderate (κ=0.52). Of note, pulmonary function tests supported only 38% of the practitioners' diagnoses of COPD. Furthermore, we could not determine whether the small proportion of patients with specialist-endorsed diagnoses of COPD reflects poor specificity of the RAMQ-diagnosis of COPD or limited patient access to specialists. Similarly, we could not determine whether the lack of adequate pharmacological treatment in patients with a RAMQ-diagnosis of COPD reflects poor specificity of the database diagnosis or inappropriate prescribing. Finally, although more credibility may be given to the diagnosis of patients classified as having 'possible COPD' or 'probable COPD' under the operational definitions, these diagnoses cannot be verified.
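The chance-adjusted agreement (κ) cited above is Cohen's kappa: observed agreement minus chance-expected agreement, divided by one minus chance-expected agreement. A short Python sketch follows; the function name is ours and the 2×2 counts are hypothetical, not the data of the United Kingdom study (14).

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of patients placed in category i by the database and
    in category j by the practitioner (eg, 0 = COPD, 1 = asthma).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of matching marginal proportions, summed over categories.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: observed agreement 0.70, chance agreement 0.50.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```

Because κ discounts the agreement expected from the marginal totals alone, a database and a practitioner can agree on most patients and still yield only moderate κ, as in the study cited above.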

Validity of diagnoses versus validity of recording
Readers should distinguish our approach from a research design that verifies the fidelity with which a database records clinical diagnoses. Such a design does not test the validity of diagnoses, but rather the validity of recording. Using the Saskatchewan health care datafiles, Rawson and Malcolm (15) selected patients discharged with a primary diagnosis of 'chronic airway obstruction' (ICD-9 code 496). They found excellent agreement between the physician service claims files in the database and the medical records. This research and its findings support the use of the database to study, for instance, the health care utilization of a cohort of patients characterized clinically (ie, using medical records) as having COPD. This finding cannot be generalized to all databases, however, because others have found only moderate agreement between responses to survey questions about asthma symptoms in the past 12 months and physician claims in the Manitoba Population Health Repository (16).
In contrast, the present study examined the validity of the RAMQ-diagnosis of COPD, and our results are consistent with those of another study that used a similar approach (17). Using the administrative database of a managed care organization, those investigators classified patients into five mutually exclusive categories. These ranged from current medical/pharmaceutical asthmatics (ie, children with asthma claims and at least one filled prescription for asthma medication during a one-year period) to current pharmaceutical asthmatics (ie, children who filled a prescription for asthma medication but never had asthma claims). Overall, the investigators found concordance between the claims-based categories and the asthma diagnosis according to the medical record. However, 30% of the children in the 'asthmatics' claims-based category did not have this condition according to chart review. Both that study and the present study are examples of investigations challenging the validity of diagnoses in a large administrative database.

CONCLUSIONS
Clinical and health services researchers should attempt to validate the information in large administrative databases before using them to generate hypotheses about health care utilization, quality of care or clinical outcomes. Readers should draw only weak inferences from studies that rely on administrative databases of unknown or poor validity.
COPD in an administrative database Can Respir J Vol 12 No 5 July/August 2005 253

Figure 1) Nonproportional Venn diagram describing the distribution of patients aged 40 years or older with a Régie de l'assurance-maladie du Québec-diagnosis of chronic obstructive pulmonary disease (COPD) and/or asthma. (A) Primary analysis of the Ninth International Classification of Diseases (ICD-9) codes 491 (chronic bronchitis), 492 (emphysema) and 496 (chronic airway obstruction, not elsewhere classified); (B) same entries plus ICD-9 code 490 (bronchitis, not specified as acute or chronic)

Figure 2) Proportion of patients aged 40 years or older with a Régie de l'assurance-maladie du Québec-diagnosis of chronic obstructive pulmonary disease (but not asthma) according to the number of occurrences in the database. Black bars and diamonds represent the number of patients and cumulative percentage for the primary analysis; grey bars and circles represent the same after including the Ninth International Classification of Diseases code 490

TABLE 3 Ninth International Classification of Diseases (ICD-9) diagnoses according to age