Administrative health data are increasingly used to study a variety of medical and surgical outcomes. In the study of gastroesophageal reflux disease (GERD), population-based data have been used to examine the effect of surgery on erosive esophagitis [5], the risk of esophageal adenocarcinoma after fundoplication [11], and trends in resource utilization by therapeutic method [1, 2, 6–8].

Administrative data lend feasibility to the study of rare outcomes and enable population-based longitudinal follow-up evaluation of large numbers of subjects without great cost or the logistical impediments associated with primary data collection. The use of these data to study large segments of the population increases the external validity (generalizability) of observational research. Administrative health data are ideal for studying the therapeutic effectiveness of health interventions under conditions of “typical” practice. However, administrative databases are designed for administrative and physician claims purposes, not research.

To a large extent, the scientific validity and credibility of health services research using administrative data depend on the accuracy of diagnosis and procedure codes. In Ontario’s administrative health databases, demographic information is complete and reliable. There are high levels of agreement on specific surgical procedure codes, but diagnosis codes (both primary and secondary) vary in completeness and accuracy [12]. The purpose of this study was to determine the accuracy of GERD-related diagnosis coding in administrative data. This study was conducted as part of a larger study on the use and outcomes of surgery for GERD in Ontario.

Materials and methods

Study design

We conducted a cross-sectional study of individuals undergoing upper gastrointestinal endoscopy in the city of Toronto, Ontario, Canada.

Data sources and study populations

Data on patient diagnoses were obtained from two sources: a clinical cohort of patients undergoing upper gastrointestinal endoscopy between January 1, 2000 and June 30, 2001 and the Canadian Institute for Health Information (CIHI) and Same Day Surgery (SDS) discharge abstract databases. These databases contain information on all the residents of Ontario, Canada who have undergone an inpatient or ambulatory procedure. We were interested in patients who had undergone an upper gastrointestinal endoscopy (esophagogastroduodenoscopy [EGD]) because the combination of patient symptoms and endoscopic abnormalities is 97% specific for the diagnosis of GERD [10].

Study subjects were identified from the endoscopy records of four gastroenterologists in the greater Toronto area. All the endoscopies had been performed in one of three teaching hospitals. The date of birth, gender, date of endoscopy, and clinical diagnosis were abstracted from medical records. All data collection was performed by the first author (SRL) and a trained assistant using a standardized form. Data were cleaned by ensuring sensible data ranges in addition to performing random data checks.

Data available in the CIHI pertaining to the hospitalization included patient gender, date of birth, date of procedure, and patient diagnoses. Before the fiscal year ending in 2001, diagnoses were coded according to the International Classification of Diseases, Ninth Revision (ICD-9). In this database, there is a field for a primary diagnosis code and fields for up to 15 secondary diagnoses for coexisting conditions. The GERD-related ICD-9 diagnosis codes we studied were heartburn (787.1), esophagitis (530.1), ulcer of the esophagus (530.2), and esophageal stricture (530.3). We also sought to assess the validity of codes for acid-related stomach conditions such as gastritis (535.0–535.5), gastric ulcers (531.0–531.9), and duodenal ulcers (532.0–532.9).

Data linkage

Individual-level data were linked to the CIHI database through a multistep sequential process. Attempts were made to link each patient’s Ontario health card number to his or her unique encrypted administrative database identifier, after which all other personal identifying information was deleted. Clinical records then were linked to the Ontario Health Insurance Plan (OHIP) physician claims database by date of procedure and unique identifier. Relevant upper gastrointestinal endoscopy fee codes in the OHIP include esophagoscopy (Z515), elective esophagoscopy-gastroscopy (Z399), esophagoscopy-gastroscopy for active bleeding (Z400), gastroscopy (Z527), gastroscopy with removal of a foreign body (Z547), and repeat gastroscopy within 3 months after previous gastroscopy (Z528).

Records differing by more than 7 days between the OHIP service date and the abstracted procedure date were excluded. When a match was not possible, the dates were inspected to ensure that the day and month were not reversed. Patients who had a valid OHIP record were subsequently linked to the CIHI and SDS databases using the encrypted unique identifier. Other exclusion criteria were repeat procedures within the study period, age younger than 18 years, residence outside Ontario, and an invalid health card number.
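The date-matching rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually used for the study: the function names and the returned labels are hypothetical, and the sketch assumes that a difference of exactly 7 days is retained, because the text excludes only records differing by more than 7 days.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=7)  # tolerance stated in the text


def swap_day_month(d):
    """Return d with day and month exchanged, or None if that is not a valid date."""
    try:
        return date(d.year, d.day, d.month)
    except ValueError:
        return None


def classify_match(chart_date, claim_date):
    """Classify one chart/claim pair (labels are hypothetical, for illustration)."""
    if chart_date == claim_date:
        return "exact"
    if abs(chart_date - claim_date) <= WINDOW:
        return "within_window"
    swapped = swap_day_month(chart_date)
    if swapped is not None and abs(swapped - claim_date) <= WINDOW:
        # Candidate day/month reversal: flag for manual inspection.
        return "possible_swap"
    return "no_match"
```

A pair labeled "possible_swap" would still need manual review, in keeping with the inspection step described in the text.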

Statistical analysis

The primary analysis aimed to determine the sensitivity, specificity, and positive predictive value (PPV) of the “most-responsible diagnosis” of esophagitis in the CIHI for the prediction of esophagitis in the clinical record. The main objective of this study was to test the ability of administrative databases to identify accurately a disease state when present. Hence, we were not interested in measures such as the negative predictive value (i.e., not having a disease in the absence of its diagnosis code).

Secondary analyses determined the performance characteristics of diagnosis codes for heartburn, ulcer of the esophagus, stricture of the esophagus, gastritis, gastric ulcer, and duodenal ulcer. We further tested the properties of a GERD diagnosis code by defining it as any one of its associated terms: esophagitis, heartburn, ulcer, or stricture. Kappa (κ) statistics were computed to determine chance-corrected agreement between the chart and the CIHI discharge abstract for each of the respective diagnoses. Analyses then were repeated for each of the respective diagnostic codes to determine their diagnostic properties when listed in the CIHI either as the “most responsible” diagnosis or as a secondary diagnosis. For each of the diagnostic measures, 95% confidence intervals are presented using the normal approximation to the binomial distribution. Diagnoses abstracted from the medical chart were considered to be the gold standard.
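For readers who wish to reproduce measures of this kind, the following Python sketch computes sensitivity, specificity, PPV, and Cohen's kappa from a 2 × 2 table, with 95% confidence intervals from the normal approximation to the binomial distribution. It is a sketch under stated assumptions and not the SAS code used in the study: the function names are hypothetical, the chart diagnosis is taken as the reference standard, and the clipping of interval limits to the range 0 to 1 is an added convenience.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Proportion with a 95% CI from the normal approximation to the binomial."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    # Clipping to [0, 1] is an assumption of this sketch, not stated in the text.
    return p, max(0.0, p - half), min(1.0, p + half)


def diagnostic_summary(tp, fp, fn, tn):
    """Summarize a 2 x 2 table of administrative code vs. chart diagnosis.

    tp: code present, disease present    fp: code present, disease absent
    fn: code absent, disease present     tn: code absent, disease absent
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "ppv": proportion_ci(tp, tp + fp),
        "kappa": kappa,
    }


# Hypothetical counts for illustration only; these are not study data.
summary = diagnostic_summary(tp=40, fp=10, fn=20, tn=130)
```

Each of the first three entries is a tuple of the point estimate, the lower limit, and the upper limit.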

Barrett’s esophagus represents a severe premalignant form of long-standing GERD. In addition to esophagitis, Barrett’s esophagus often is associated with ulceration and stricturing of the lower esophagus. Because there is no specific ICD-9 diagnosis code for Barrett’s esophagus, endoscopic Barrett’s esophagus was categorized as esophagitis for purposes of analysis. We performed a sensitivity analysis by repeating the analyses with these patients excluded.

All statistical analyses were performed using SAS version 9.2 (SAS Institute, Cary, NC, USA) and Microsoft Excel 2003 (Microsoft Corporation, Redmond, WA, USA). The study protocol was approved by the Research Ethics Boards of Sunnybrook and Women’s College Hospital and the Faculty of Medicine, University of Toronto.

Results

We abstracted the charts for 591 patients who had upper gastrointestinal endoscopy between January 1, 2000 and June 30, 2001. Of these, 480 (81.2%) matched exactly the physician claim dates in the OHIP, and 500 (84.6%) matched within the 7-day window. A total of 91 records were excluded for the following reasons: invalid personal identifier (n = 20), missing procedure date (n = 1), no corresponding OHIP record (n = 16), and OHIP service date differing from the abstracted procedure date by more than 7 days (n = 54). No systematic differences were identified between linked and excluded patients (Table 1), with the exception of a higher rate of esophagitis in the linked cohort (p = 0.006).
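The linkage counts reported above can be checked with simple arithmetic. The Python sketch below is illustrative and not part of the study's analysis; it uses only the figures given in this paragraph, and the variable names are hypothetical.

```python
abstracted = 591  # charts abstracted
exact = 480       # exact date matches to OHIP physician claims

# Exclusion reasons and counts as reported in the text.
exclusions = {
    "invalid personal identifier": 20,
    "missing procedure date": 1,
    "no corresponding OHIP record": 16,
    "service date more than 7 days from procedure date": 54,
}

excluded = sum(exclusions.values())
linked = abstracted - excluded

print(excluded, linked)  # 91 500
print(round(100 * exact / abstracted, 1))   # 81.2
print(round(100 * linked / abstracted, 1))  # 84.6
```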

Table 1. Comparison between patients who linked to the Canadian Institute for Health Information (CIHI) database and those who did not

Of the 500 patients matched to electronic administrative records and included in the final cohort, only one patient had a CIHI most responsible code of 787.1 (heartburn). We therefore assumed that patients with symptoms of GERD who had no macroscopic changes of the esophagus were coded as having “esophagitis.” This is consistent with the observation during abstraction that the spectrum of esophagitis reported by gastroenterologists ranged from “microscopic” to “erosive.” The diagnostic properties (sensitivity, specificity, and PPV) of the remaining CIHI most responsible diagnoses studied are summarized in Table 2. The sensitivity and specificity of a most responsible ICD-9 code 530.1 (esophagitis) for prediction of an abstracted clinical diagnosis were 46.8% and 98.8%, respectively. The PPV of a most responsible code of esophagitis was 94.8%, indicating strong correlation between a positive code and clinical disease. The kappa statistic was 0.53, a marker of moderate overall agreement between the two data sources [9], likely related to a high rate of false negatives (16.6%).

Table 2. Performance characteristics of the Canadian Institute for Health Information (CIHI) most responsible diagnoses

The specificities of the diagnostic codes for esophageal ulcers and esophageal strictures were 98% and 99.8%, respectively. However, the sensitivity and PPV for a CIHI diagnosis of “esophageal ulcer” were only 14.3% and 9.1%, respectively. Only seven patients (1.4%) in the cohort had documented esophageal ulceration, all of which were secondary to GERD. A total of 14 patients (2.8%) had strictures at endoscopy. The sensitivity of a CIHI most responsible diagnosis of “esophageal stricture” was 50%, and the positive predictive value was 87.5%.

The diagnostic properties of any CIHI diagnosis code (either “most responsible” or “secondary”) for the prediction of the various upper gastrointestinal diseases are reported in Table 3. For each of the respective disease states, sensitivity markedly improved with the broadened criteria at minimal expense to the specificity and PPV. For a CIHI diagnosis of esophagitis (ICD-9 530.1), the sensitivity was 70.5%, the specificity was 97.7%, and the PPV was 93.2%. The κ statistic was 0.73, indicating good agreement between the clinical and administrative datasets [9].

Table 3. Performance characteristics of all Canadian Institute for Health Information (CIHI) diagnoses

There is no ICD-9 code for GERD. Instead, we defined GERD in the CIHI by the presence of at least one code for heartburn, esophagitis, esophageal ulcer, or esophageal stricture. When limited to only the most responsible diagnosis in the CIHI, the sensitivity for GERD was 56.1%, the specificity was 98.5%, and the PPV was 94.8%. When all diagnoses were considered, including secondary diagnoses, the sensitivity increased to 78.7%, whereas the specificity decreased slightly to 96.7% and the PPV to 92.1%.
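The composite definition of GERD described above can be expressed as a simple rule. The Python sketch below is a hypothetical illustration, not the code used in the study: the function name and arguments are invented for the example, and it assumes that diagnosis codes are stored as strings in exactly the form listed in the Methods (the actual storage format in the CIHI database is not described here).

```python
# GERD-related ICD-9 codes named in the Methods: heartburn, esophagitis,
# ulcer of the esophagus, and esophageal stricture.
GERD_CODES = {"787.1", "530.1", "530.2", "530.3"}


def has_gerd_code(most_responsible, secondary=(), include_secondary=True):
    """Return True if a discharge abstract carries at least one GERD-related code.

    With include_secondary=False, only the most responsible diagnosis is
    considered, which corresponds to the narrower definition in the text.
    """
    codes = [most_responsible]
    if include_secondary:
        codes.extend(secondary)
    # Skip empty fields (None or ""), then test membership in the code set.
    return any(code in GERD_CODES for code in codes if code)
```

For example, an abstract with a most responsible diagnosis of gastritis (535.0) and a secondary diagnosis of esophageal stricture (530.3) would be counted as GERD only under the broader definition.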

In our cohort, 27 patients (5.4%) were identified on clinical charts as having Barrett’s esophagus. Excluding subjects with Barrett’s esophagus did not affect our principal results.

Discussion

We compared diagnosis codes in administrative health databases in Ontario, Canada with primary data collected from the charts of 591 patients who had undergone an upper gastrointestinal endoscopy. Our study is the first attempt to measure the validity of administrative diagnosis codes for the diagnosis of upper gastrointestinal diseases, specifically those for GERD-related disorders. Overall, we found that the ICD-9 diagnosis codes were highly specific and associated with strong PPVs, but had poor sensitivity. For each of the diagnosis codes, sensitivity improved substantially, without significant impairment to the specificity or PPV, when a secondary diagnosis code was accepted as an indicator of the condition in addition to a “most responsible” diagnosis code.

Patients with uncomplicated GERD were almost uniformly coded using ICD-9 530.1 (esophagitis) rather than 787.1 (heartburn). This may reflect the perception among physicians that GERD is a “disease” rather than a symptomatic report of heartburn, at least among patients with symptoms sufficiently severe to prompt an upper endoscopy. It also is possible that this is the standard way that medical record technicians code a diagnosis when there is a mention of GERD, as opposed to the distinction made by physicians between esophagitis and heartburn. We found that a primary diagnosis of esophagitis, representing approximately 30% of our endoscopic cohort, is coded accurately with a sensitivity of 46.8%, a specificity of 98.8%, and a PPV of 94.8%. With the addition of secondary diagnosis codes, 530.1 predicted esophagitis with a sensitivity of 70.5%, a specificity of 97.7%, and a PPV of 93.2%.

Our results are consistent with the findings of studies investigating the accuracy of diagnosis codes for other gastrointestinal diseases. In hospital discharge abstract data from 100 patients in the U.S. Department of Veterans Affairs, an ICD-9 diagnosis of hemorrhoids was associated with a PPV of 97% [4]. In a review of seven V.A. records from patients with an ICD-9 diagnosis of celiac sprue, all were confirmed to have the disease in their respective clinical records [3].

Coding for complicated disease was more variable. Whereas the performance of esophageal stricture codes was good, the coding of esophageal ulcers was not. Given the relatively few cases of these conditions, the precision of our estimates is low. It appears that the diagnosis of esophageal ulcers is overestimated in the administrative data. Of the 22 discharge abstracts given the ICD-9 code 530.2 (ulcer of the esophagus), 19 (86.4%) were false-positive diagnoses. Perhaps this is not entirely surprising because the distinction between “erosive” and “ulcerative” esophagitis may be too subtle for coders. In view of these findings, “outcomes” studies that try to “control” for GERD severity may need to define complicated disease by the presence of an esophageal stricture rather than by an administrative diagnosis of an esophageal ulcer.

The use of administrative data in health research is increasing. There is an urgent need to conduct population-based studies of gastrointestinal diseases to evaluate the effectiveness of therapies. For example, the increased use of laparoscopic surgery for GERD, as demonstrated by Finlayson et al. [6], and the emergence of endoscopic antireflux therapies provide an opportunity for large-scale effectiveness studies to evaluate their effects in actual clinical practice. We appraised a method for identifying patients with GERD through hospital discharge abstracts by combining diagnosis codes for heartburn, esophagitis, esophageal ulcer, and esophageal stricture. For patients who had undergone an upper gastrointestinal endoscopy, identifiable in physician billing claims, GERD was identified accurately with a sensitivity of 78.7%, a specificity of 96.7%, and a PPV of 92.1%.

Our study has several limitations. First, our validation of ICD codes for benign upper gastrointestinal disorders applies only to the ICD-9, and may not apply to the more recent ICD-10. We chose to evaluate diagnostic codes from the ICD-9 and not the more recent ICD-10 because we wanted to study long-term outcomes for patients with disorders diagnosed in the 1990s and were interested in the validity of diagnosis codes used for these patients. Because the ICD-10 came into effect only recently (fiscal year 2002), the ICD-9 codes still will be needed in studies with long follow-up periods. Coding of these conditions in the ICD-10 era probably will be more accurate because the description and range of diagnosis codes for GERD-related disorders are more specific and comprehensive (Table 4).

Table 4. Comparison of International Classification of Diseases Version 9 (ICD-9) and ICD-10 diagnosis codes for gastroesophageal reflux disease (GERD) and related conditions

Second, this is a retrospective study of patients from teaching hospitals. The study’s findings therefore may not be representative of all hospitals in Ontario. Similarly, patients seen in the chosen gastroenterologists’ practices may not be representative of the general population. However, this is not a major limitation for our study because our interest was specifically in the validation of diagnosis codes rather than the estimation of disease prevalence or treatment outcomes. Future studies, potentially those validating ICD-10 diagnostic codes, may benefit from a larger, more representative sample of hospitals and physician practices.

Third, our study involved linking personal health card numbers to hospital discharge abstract data, with the intermediary step of identifying the procedure within physician billing claims. Of the 591 charts reviewed, 70 records were excluded on the basis of an inability to validate the occurrence of the procedure in physician billing records (84.6% overall linkage rate). However, this intermediary step increased the internal validity of the sample studied, and also validated a process whereby potential patients were identified on the basis of endoscopy.

Finally, the accuracy of GERD diagnosis codes described in this study may apply only to individuals who have undergone upper gastrointestinal endoscopy. Patients diagnosed and treated by primary care physicians, who represent the majority of patients with GERD, may be different from those referred to specialist care. The advantage of using the subpopulation studied is that these patients are more likely to have a definitive diagnosis. Also, these patients are expected to have a higher degree of disease severity, making the study of adverse outcomes more feasible and clinically relevant.

Conclusions

In summary, we found that benign upper gastrointestinal disorders are coded in Ontario administrative health databases with reasonable accuracy. The ICD-9 diagnosis codes for GERD and its complications are highly specific and associated with strong PPVs, but have limited sensitivity. Patients identified in administrative data as having a GERD-related diagnosis are identified accurately. However, caution should be taken when these patients are followed longitudinally for the progression or resolution of their disease.