Abstract
The comparability of asthma and chronic obstructive pulmonary disease (COPD) epidemiology in different English routine data sources was examined to explore their use and validity in investigating environmental influences on respiratory health.
National data were obtained for mortality, emergency hospital admissions, general practitioner contacts and symptoms in the early 1990s. Age/sex patterns, seasonal variations and regional and urban/rural age/sex standardised event ratios were examined. Spearman rank correlations were used to describe consistency of regional rankings across data sets.
Asthma showed inconsistent disease patterns in different data sources and weak correlations for regional rankings but COPD was notably consistent. Unmeasured confounders may partly explain the findings, but individual level adjustment for social class and smoking (possible for symptoms) only partially attenuated the higher COPD rates in northern and urban areas and did not affect findings for asthma.
When epidemiological patterns are consistent across data sources as with chronic obstructive pulmonary disease in England, healthcare use is likely to reflect the underlying prevalence and severity of disease and can be used to study environmental influences. When patterns vary, as with asthma, the validity of the data in relation to its intended use must be carefully considered.
This study was funded by the Department of Health.
Routine health data for respiratory disease, such as hospital admissions and mortality, are widely used to investigate environmental influences on health 1, 2, epidemiological trends 3 and health services research 4. It is widely acknowledged in general terms that care is needed in the use and interpretation of such data 5, but little has been published addressing what this might mean in practical terms or examining how interpretation might differ for different diseases. The epidemiological patterns seen for asthma and chronic obstructive pulmonary disease (COPD) in England in 1991–1995 across four routine data sources were compared as part of a project to investigate the validity of using such data to examine environmental influences on respiratory health 6.
Methods
Data sources
Data sets relating to mortality, emergency hospital admissions, general practitioner (GP) consultations and symptoms were obtained. Linkage to identify individual patients across the data sets was not possible.
Mortality data for England in 1991–1995 were obtained from the Office for National Statistics (ONS). International Classification of Disease coding version 9 (ICD-9) was used for mortality coding throughout this time period. For this study, asthma was defined as ICD-9 code 493 and COPD as ICD-9 codes 490–492 and 494–496.
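The case definitions above can be expressed as a small lookup. The following is a minimal sketch only: the function name `classify_icd9` and the assumption that codes arrive as strings whose first three characters are the ICD-9 category (e.g. "493.9" or "4939") are illustrative and are not taken from the study.

```python
from typing import Optional


def classify_icd9(code: str) -> Optional[str]:
    """Return 'asthma', 'COPD' or None for an ICD-9 diagnosis code.

    Hypothetical helper: asthma is ICD-9 493 and COPD is ICD-9 490-492
    and 494-496, as defined in the text.
    """
    category = code.strip()[:3]  # three-digit ICD-9 category
    if len(category) != 3 or not category.isdigit():
        return None  # E codes, V codes or malformed input
    number = int(category)
    if number == 493:
        return "asthma"
    if 490 <= number <= 492 or 494 <= number <= 496:
        return "COPD"
    return None
```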
Hospital Episode Statistics (HES) data were obtained from the Department of Health for the financial years 1990/1991 to 1993/1994 and from Data Sciences UK for the year 1994/1995. Emergency hospital admissions (defined as finished consultant episodes with episode number equal to one) with a primary diagnosis of asthma or COPD, using the same ICD-9 codes as mortality, were identified for the calendar years 1991–1994. Admissions in 1995 were excluded because the coding of hospital admissions changed to ICD-10, and ICD code changes can lead to artefactual changes in rates of diseases 7.
Patients who consulted in primary care and had an inhaler prescription during the current year plus a current or prior diagnosis of asthma or COPD were identified from the General Practice Research Database (GPRD) for each year from 1991–1995 inclusive 8. The GPRD is the largest computerised source of routine information on general practice morbidity and prescription in England 9. Inhaler plus diagnosis was used as this was found to be a better indication of the burden of disease in primary care than diagnosis alone 8.
The Health Survey for England 1995 (HSE95), an authoritative and widely used source of information on health determinants and health status in England, was obtained from the Data Archive held at Essex University (Essex, UK) 10. This contained a representative sample of ∼20,000 individuals in England aged ≥2 yrs, with an overall response rate of 78% 11. Household response rates varied between 73% in Greater London and 82% in northern areas of England. Where the head of the household was in a nonmanual occupation, response rates were a few percentage points higher among adults but no different for children. In the present study, people with asthma symptoms were defined as those reporting wheezing or whistling in the chest in the previous 12 months and people with COPD symptoms were those reporting cough or phlegm for ≥3 months in the winter; the authors acknowledge that there is often overlap between these two groups of symptoms. Symptoms were used rather than lung function (also available in the HSE95) as these applied to both asthma and COPD and the wording of the questions used in the HSE95 was very similar to that used in International Study of Asthma and Allergies in Childhood (ISAAC) 12 and Medical Research Council (MRC) 13 epidemiological survey questions for these diseases.
Statistical analysis
Each data source was analysed by age and sex, week of year (not possible for survey data), and district health authority of residence, classified by regional and urban/rural categories using 1995 regional health authority boundaries.
Rates for the GPRD analyses were calculated by dividing the observed number of patients by the person years at risk (pyar) in the time period concerned. Incidence rates for hospital admissions and deaths were calculated using events as the numerator and mid-year population estimates from the ONS as the denominator. Asthma rates were based on ∼8,000 deaths, 330,000 hospital admissions and 380,000 GP consultations, and COPD rates were based on 124,000 deaths, 245,000 hospital admissions and 85,000 GP consultations (table 1).
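Both kinds of rate reduce to events divided by a denominator (pyar for the GPRD, mid-year population for admissions and deaths). The sketch below illustrates the arithmetic only; the function name `rate_per`, the scaling factors and all figures are hypothetical and are not study data.

```python
def rate_per(events, denominator, per=10_000):
    """Events per `per` units of the denominator (pyar or mid-year population)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return per * events / denominator


# Hypothetical GPRD-style rate: 50 patients observed over 2,000 pyar.
gp_rate = rate_per(50, 2_000)  # 250.0 per 10,000 pyar

# Hypothetical mortality rate: 12 deaths in a mid-year population of 480,000.
death_rate = rate_per(12, 480_000, per=100_000)  # 2.5 per 100,000 population
```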
Geographical data on the regional and district health authority (DHA) of residence were obtained. DHA of residence was used to assign an urban/rural code with four ordered categories (conurbation, urban, mixed and rural). Geographical variations by region and degree of urbanisation were compared using age- and sex-standardised event ratios (SER): the numbers of observed events were divided by expected events and multiplied by 100, where the expected number of events was derived from the age- and sex-specific rates in all regions and the particular age and sex constitution of the region concerned, using 5-yr age bands from 0–84 yrs.
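The SER described above is a form of indirect standardisation: expected events are obtained by applying the all-region age- and sex-specific rates to the region's own population structure. The following is a minimal sketch of that calculation. The function names and every number are hypothetical, and only two strata are shown where the study used 5-yr age bands from 0–84 yrs for each sex.

```python
def reference_rates(events, population):
    """Age- and sex-specific rates for all regions combined."""
    return {stratum: events[stratum] / population[stratum] for stratum in population}


def standardised_event_ratio(observed, region_population, rates):
    """SER = 100 * observed events / expected events for one region."""
    expected = sum(region_population[s] * rates[s] for s in region_population)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return 100 * observed / expected


# Hypothetical figures (not study data), keyed by (sex, age band).
national_events = {("M", "65-69"): 200, ("F", "65-69"): 100}
national_population = {("M", "65-69"): 100_000, ("F", "65-69"): 100_000}
region_population = {("M", "65-69"): 10_000, ("F", "65-69"): 20_000}

rates = reference_rates(national_events, national_population)
# 60 observed events against about 40 expected gives an SER of about 150.
ser = standardised_event_ratio(60, region_population, rates)
```

An SER above 100 indicates more events than expected from the region's age and sex structure, and an SER below 100 indicates fewer.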
Spearman rank correlations of regional SERs were calculated for 1994 (for all 14 regions and for all four data sources) and for 1991. The 1991 correlations used SERs for each of 50 region and urban/rural combinations for mortality and hospital data and 33 combinations for the GPRD (spatial GPRD data are only released if units contain more than two GP practices). Correlations were used as a descriptive measure of association across data sources, rather than the more familiar use in investigative analyses where they are used to test the null hypothesis of no association between exposure and outcome.
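As an illustration of the descriptive use of rank correlation, the sketch below computes Spearman's rs as the Pearson correlation of average ranks, which handles tied values. It uses only the Python standard library; the function names and the SER values are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical SERs for four areas from two data sources.
admission_sers = [90, 100, 110, 120]
mortality_sers = [95, 105, 100, 130]
rs = spearman_rs(admission_sers, mortality_sers)  # close to 0.8
```

Because only the ordering of areas enters the calculation, rs describes whether two data sources rank areas similarly, not whether their SERs agree in magnitude.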
Results
Age/sex patterns
There was little consistency in patterns for asthma from the different data sources (fig. 1a). The highest prevalences of wheezing (>20%) were seen in infancy (ages 2–4 yrs), early adulthood (ages 15–24 yrs) and older adults (ages 55+). The highest rates of patient consultations for asthma in general practice were seen in children aged 5–14 yrs. Emergency hospital admission rates were highest in 1-yr-old children, while deaths were most common in the elderly. Higher rates of wheezing were seen in male children and in older males (aged 60+), but rates were similar in both sexes in midlife. Both GP consultations and hospital admissions for asthma were higher in male children than female children until the midteenage years. Female rates then became higher than male rates until extreme old age (≥85 yrs) when rates became similar. Asthma mortality rates were similar in both sexes until the age of 60 yrs when they became higher in females.
Age/sex patterns for COPD were broadly similar in all data sources, with rates increasing with age (fig. 1b) and generally higher in males than in females. COPD inhaler prescriptions and hospital admissions peaked around age 80–85 yrs, while deaths continued to rise with increasing age.
Seasonal patterns
Seasonal patterns for asthma were inconsistent (fig. 2). While deaths peaked in December and January, hospital admissions showed a sharp peak in the autumn in week 38, mainly due to admissions in children (not shown). In general practice, nonrepeat patient inhaler prescriptions for asthma were preferred to all prescriptions as they showed more seasonal variation. Patient prescription rates peaked just before Christmas (fig. 2), while lowest levels of prescriptions were consistently seen in all ages in weeks 32–36 (when hospital admissions started to rise). There was also a marked drop during the Christmas and New Year period, almost certainly related to the opening times of GP surgeries.
Mortality and hospital admissions for COPD showed a similar seasonal pattern, with highest levels in the autumn and winter months and peaks at the end of the year (not shown). Prescriptions of inhalers for COPD showed very little variation, remaining around 1,000 per 10,000 pyar, except in the last week of the year when rates halved followed by a small increase in the first week in January.
Regional patterns
No consistent geographical patterns were seen for asthma. Correlations between age and sex SER rankings in areas from different data sources were generally weak, with Spearman rank correlation (rs) values ≤|0.5| (table 2). There was a weak correlation between hospital admissions and mortality for all ages, while correlations between the GPRD inhaler prescriptions for asthma and mortality in 1991 and 1994 were inconsistent. This may reflect the relatively small number of deaths on which some event ratios are based. Correlations were based on >1,200 deaths (table 1), but 14 region urban/rural subdivisions had fewer than 20 deaths. Further analyses of SER rankings were conducted for children aged <15 yrs and adults aged 15–84 yrs (table 2), with the exception of mortality as numbers were too small. Correlations by age group were also weak, with all rs values ≤|0.35|, except for the correlation between emergency hospital admissions and wheezing symptoms. For all ages, this showed a correlation of 0.27, but by age group, a strong negative association (rs=−0.86) was seen in children (i.e. areas with higher levels of symptoms had lower levels of hospital admissions), while a moderately high positive correlation (rs=0.67) was seen in adults.
The scatterplot for hospital admissions and GP data (fig. 3) demonstrates that, while the overall correlation was poor (rs=−0.12), systematic differences could be seen. In urban areas, standardised hospital admission rates were higher than corresponding primary care contacts, while the reverse was true in more rural areas.
In contrast to asthma, a clear urban/rural gradient was seen in COPD mortality, emergency hospital admissions and chronic bronchitis symptoms from the HSE95, with lower event ratios in rural areas. In GPRD data, conurbations had significantly higher standardised event ratios than the national average, but no gradient was seen. There were consistently good correlations across different data sources for symptoms, primary care consultations and prescription of drugs for COPD, hospital admissions and mortality (table 3).
The scatterplot for hospital admissions and mortality in 1991 (fig. 4) illustrates the correlations for COPD. It shows a four-fold variation in emergency hospital admissions and a three-fold variation in mortality between different areas. Lower hospital admissions and mortality were seen in rural and mixed areas, while higher event ratios were seen in conurbations. In conurbations, hospital admission ratios were higher than the standardised mortality ratios, while the converse was true in rural areas.
Discussion
To the best of the authors' knowledge, this is the first comparison of epidemiological patterns for asthma and COPD using four different sources of routine data. Generally, it might be expected that patterns across data sources for a specific disease would be consistent, with prevalence and primary care use showing some relationship with hospital admissions and deaths. (The term consistency has been used throughout to refer to similarity of patterns of disease across different data sources; the term coherence is sometimes used in this context.) However, the present study demonstrated marked differences between these major chronic respiratory illnesses across data sources in England. COPD rates were, in general, consistently higher in older age groups, in the winter, in northern areas and in urban areas. In contrast, asthma rates were strikingly inconsistent across data sources, whether viewed by age, by week of the year or by geographical area, and the relationship between asthma hospital admission rates and GP consultation rates differed markedly by degree of urbanisation. One implication for asthma is that if rates in a particular area are high in one data source (e.g. hospital admissions) when compared to the rest of the country, it cannot be inferred that rates will also be high in another data source (e.g. GP consultations).
Data quality, representativeness and disease misclassification
It seems unlikely that data quality varied geographically and by diagnosis in a systematic way that could account for the marked differences observed between asthma and COPD. Large routine sources of data were used, which are generally held to be of good quality, at least in terms of coding and coverage 14. Since quality problems in HES data in the time period analysed have been documented 4, data quality tables from the Department of Health were consulted. While coverage of emergency admissions was almost 100%, missing diagnostic codes ranged nationally between 13% in 1990/1991 and 4.3% in 1994/1995, with greater variations both between and within regions. However, asthma has been found to be consistently well coded 15 and it was not possible to determine whether the overall percentage of missing codes was directly relevant to this analysis.
National mortality and hospital admission data are likely to provide almost 100% of events, while the HSE95 covers a representative sample of the population of England, so the issue of representativeness chiefly concerns the GP data. The GPRD contains data from volunteer practices that may not be fully representative of all GP practices or practice populations. In a previous analysis 8, the present authors found that the GPRD gave similar patient consultation rates to the widely used 4th Morbidity Survey in General Practice (MSGP), although this too consists of volunteer practices.
It is likely that some misclassification between asthma and COPD occurred, partly related to differential labelling of wheezing illness by age and sex 16, 17. It is difficult to determine how much this affected the results. Asthma as a cause of death has been found to be both over- and underattributed on death certificates 18, 19, but crossover between asthma and COPD on the death certificate appeared to be similar in both directions in a small English study 19. The definition of COPD used in the present study included “bronchitis not specified as acute or chronic” (ICD-9 490), which may account for the relatively high rates of COPD deaths and hospital admissions in those aged 0–4 yrs (fig. 1b). The use of symptoms in the past year from the HSE95 was likely to be a sensitive but nonspecific measure. Further analyses not presented here using alternative measures for asthma (asthma self-reported as a long-standing illness or current use of an inhaler) in the HSE95 also demonstrated relatively weak geographical correlations.
Confounding factors
Health service and survey data can be used to identify areas with higher or lower rates, and confounders need to be considered when looking for the reasons underlying these geographical variations. In this study, rank correlations across data sources were examined to assess geographical comparability. The present authors consider it unlikely that confounders would substantially alter the rankings for an area in different data sources and thus explain the observed poor geographical correlations for asthma. For this to occur, it would have to be hypothesised that there are large geographical variations within England in either the interaction of confounders with healthcare use and symptom reporting or in the labelling of wheezing illness in different settings (e.g. primary care versus secondary care).
Smoking 20 and social class 21, 22 might be expected to be major confounders in geographical comparisons of COPD and asthma using ecological data. However, individual or small area level data on these were not available except in the HSE95, where a geographical analysis adjusting at individual level for smoking and social class as well as age and sex was performed. This found that the north/south and urban/rural differences were attenuated but not fully removed for COPD symptoms, while no differences were seen in adjusted prevalence rates for asthma symptoms. In the circumstances it was not considered possible to adequately adjust for smoking and social class in the other data sets. Some individual level data on smoking are available in the GPRD, but after exploratory descriptive analysis, these were not considered complete enough to use in the present study's analyses. National surveys of smoking in England are only reported at regional level. The present authors chose not to adjust for socioeconomical class as a confounder because the potential for nondifferential misclassification at the level of aggregation used (district health authority, ∼500,000 population) is large. While nondifferential misclassification bias of an explanatory variable tends to dilute the resultant association, in a confounding variable it may increase as well as decrease the apparent association depending on the (unknown) direction of the confounder 23. Adjustment for social class is further complicated as it is likely that the confounding effects interact with geographical area and with age to different extents for different modalities of healthcare use (hospitalisation, primary care consultations) or mortality and these interactions may differ between COPD and asthma.
Age is another important potential confounder. In the geographical analysis, age differences were taken into account by standardising by 5-yr age band. Alternative analyses looking at children and adults separately did not improve the poor geographical correlations for asthma. It cannot be ruled out that using finer age categories in stratification would have improved correlations. Given the natural history of asthma, it was not surprising that area rankings of rates of GP consultation did not correlate with those for mortality, but it was less obvious that those for age-specific patient consultations would not correlate with hospital admissions nor with symptom reporting.
Further confounders include ethnicity 24 and access to healthcare. Behavioural factors may also affect healthcare use. For example, asthmatic children of smokers have fewer GP visits compared with asthmatic children of nonsmokers 25, but exposure to parental smoking is generally associated with more severe asthma 26.
Interpretation of poor correlations between data sources
Having discussed data quality and the influence of confounders, it has to be considered whether the observed patterns are consistent with the natural history of the disease or whether they might reflect the role of some environmental or other influence.
Despite the poor overall geographical correlations for asthma, some patterns were apparent. Inspection of regional analyses and scatterplots for asthma suggested that SERs were higher for hospital admissions than primary care contacts in urban areas, and the reverse in more rural areas (fig. 3). An environmental explanation for this would be that the severity of acute asthma is greater in urban environments. A study of hospital admissions in 1987/1988 in two hospitals in an urban and in a rural setting 27 supports this, but it must be recognised that the observed differences may have been related to different admission thresholds. Other explanations include the following. 1) Ease of access to hospitals in urban areas and comparatively easier access to the GP in rural areas. 2) Differences in primary care. Conurbations such as London have higher numbers of single-handed GP practices and small size of practice partnership is associated with higher hospital admission rates, independent of socioeconomical characteristics 28. 3) Higher deprivation in urban areas associated with higher admission rates 21. This explanation was not supported by the present authors' individual level analysis adjusting for social class in the HSE95 (see above).
The inconsistencies between age and sex and seasonal patterns for asthma will not be surprising to most respiratory clinicians. However, it is worth noting that the elderly had comparatively low rates of hospital admissions and high rates of deaths for asthma, while the reverse was true in children. If these relate to the same sets of patients, it may not only suggest that the natural history of asthma in the elderly is very different from that in children, but also that management varies markedly by age. The latter has ethical implications with relation to treatment of the elderly.
Further factors that may be particularly relevant to the findings for asthma are as follows. 1) Homogeneity of disease characteristics. The asthma label is applied to a spectrum of different conditions with different aetiologies, particularly in childhood 29. In contrast, most cases of COPD are related to smoking. 2) Severity levels within the disease definition. It has been suggested that recurrent minor respiratory illness in children is now being labelled as asthma in general practice 30. This could potentially produce differences in the age patterns for asthma for hospital admissions and GP consultations. 3) Influence of specialist practice. If diagnosis is primarily made by a respiratory physician in secondary care (e.g. with easier access to spirometry), there is likely to be some consistency between GP records and recorded causes of death. This may be applicable to COPD in some areas.
Implications of this study for the use of routine data
The current analyses suggest that an assessment of the consistency of health service or registry data for a specific disease can help to determine their appropriate use, taking into account the context of the healthcare system in place. The English National Health Service has several distinct features that may not apply to other countries, particularly in relation to primary care and to outpatient access to specialists (which the present study did not examine), but emergency hospital admissions and mortality data might be expected to be more comparable with those in other European countries. Where diseases show consistent patterns across data sources, such as COPD in England, it is reasonable to assume differences in provision of medical care are not responsible for the observed regional and urban/rural differences, and that readily available routine data, such as mortality or hospital admissions, can be used as surrogates for prevalence or severity in investigations into high local rates or environmental influences on the disease.
If the routine data for a particular disease are not consistent across data sources, data quality needs to be carefully examined and confounders adjusted for or excluded. If inconsistency remains, the appropriateness of using these data sources needs to be carefully established with the purpose of the study in mind. For example, prescriptions made by general practitioners for asthma were used in an investigation of factory emissions 31 as a proxy for the underlying prevalence but this might be better assessed by survey data. Similarly, hospital admissions for asthma in England are of limited value as an indicator of community prevalence 32, 33 or indeed as a performance indicator for the quality of primary care as has been proposed 22.
Acknowledgments
The authors would like to thank Statistics Division 2, Department of Health for information on data quality. The Health Survey for England 1995 data is Crown copyright material reproduced by permission of the controller of Her Majesty's Stationery Office. The Health Survey for England 1995 copyright holder, the original data producer, the relevant funding agencies and the Data Archive bear no responsibility for the analysis and interpretation of the data presented. R. Anderson was a co-applicant on the original funding proposal for this project.
- Received January 25, 2002.
- Accepted September 3, 2002.
- © ERS Journals Ltd