SERIES “THE GLOBAL BURDEN OF CHRONIC OBSTRUCTIVE PULMONARY DISEASE”
Edited by K.F. Rabe and J.B. Soriano
Number 2 in this Series
ESTIMATING THE BURDEN OF COPD: METHODS AND RESULTS FROM THE GLOBAL BURDEN OF DISEASE STUDY
Information about the comparative magnitude of the burden from various diseases and injuries is a critical input into building the evidence base for health policies and programmes. Such information should be based on a critical evaluation of all available epidemiological data using standard and comparable procedures across diseases and injuries, including information on the age at death and the incidence, duration and severity of cases who do not die prematurely from the disease. A summary measure, disability-adjusted life yrs (DALYs), has been developed to simultaneously measure the amount of disease burden due to premature mortality and the amount due to the nonfatal consequences of disease.
Approximately 2.7 million deaths from chronic obstructive pulmonary disease (COPD) occurred in 2000, half of them in the Western Pacific Region, with the majority of these occurring in China. About 400,000 deaths occur each year from COPD in industrialised countries. The increase in global COPD deaths between 1990 and 2000 (0.5 million) is likely to be partly real, and partly due to better methods and more extensive data availability in 2000. The regional (adult) prevalence in 2000 varied from 0.5% in parts of Africa to 3–4% in North America.
Health systems must increasingly address a broad spectrum of health issues, ranging from epidemic outbreaks to advanced therapeutic care. They must, or should, also support disease prevention and health-promotion activities. Recognising that resources for health were unlikely to grow as quickly as demand, in 1993, the World Bank proposed a series of intervention packages for countries at different stages of development which, if implemented, would probably lead to the greatest gains in population health at affordable cost. The evidence for these recommendations was based on a study of the global burden of disease from various conditions, and an assessment of the cost-effectiveness of known interventions against them.
The Global Burden of Disease (GBD) Study, commissioned for the Bank Report, was the first ever comprehensive attempt to simultaneously assess the burden of premature mortality and nonfatal illness due to >100 diseases and injuries worldwide 1. To assess burden, a summary measure of population health, DALYs, was used, with the stream of yrs of life lost (YLL) or yrs lived with disability (YLD) assessed separately. Thus, for any disease or injury (i) (e.g. lung cancer, road traffic accidents, measles), premature mortality was assessed as the following:
$$\mathrm{YLL}_i = \sum_x d_{i,x}\, e_x \qquad (1)$$
where $d_{i,x}$ are deaths from cause i at age x and $e_x$ is the standard expectation of life lost due to death at age x 1. To assess YLDs, the principal disabling sequelae (j) from cause i were first identified and, for each sequela, age- and sex-specific incidence ($I_j$) was estimated, as was average age at onset, duration (D) and severity (S), on a scale of 0–1 (0 = perfect health, 1 = death). The methods and estimation procedures are described in detail elsewhere 1. Thus, the following equation holds:
$$\mathrm{YLD}_i = \sum_j I_j \times D_j \times S_j \qquad (2)$$
Overall disease burden, measured by DALYs, was then estimated as the sum of the two components:
$$\mathrm{DALY}_i = \mathrm{YLL}_i + \mathrm{YLD}_i \qquad (3)$$
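The YLL, YLD and DALY definitions above can be illustrated with a minimal Python sketch. It assumes the basic forms only (deaths multiplied by standard life expectancy, and incidence multiplied by duration and severity) and ignores any age weighting or discounting; the function names and all input numbers are hypothetical and are not GBD data.

```python
def years_of_life_lost(deaths_by_age, expectation_by_age):
    """YLL: deaths at each age times the standard expectation of life at that age."""
    return sum(deaths * expectation_by_age[age]
               for age, deaths in deaths_by_age.items())


def years_lived_with_disability(sequelae):
    """YLD: incidence x duration x severity, summed over disabling sequelae."""
    return sum(s["incidence"] * s["duration"] * s["severity"] for s in sequelae)


def dalys(deaths_by_age, expectation_by_age, sequelae):
    """DALYs: the sum of the fatal (YLL) and nonfatal (YLD) components."""
    return (years_of_life_lost(deaths_by_age, expectation_by_age)
            + years_lived_with_disability(sequelae))


# Hypothetical inputs for one cause in one population (illustration only).
deaths_by_age = {60: 100, 70: 200}          # deaths at age x
expectation_by_age = {60: 20.0, 70: 12.0}   # standard expectation of life at age x (yrs)
sequelae = [{"incidence": 1000, "duration": 10.0, "severity": 0.17}]

yll = years_of_life_lost(deaths_by_age, expectation_by_age)
yld = years_lived_with_disability(sequelae)
total = dalys(deaths_by_age, expectation_by_age, sequelae)
print(f"YLL={yll:.0f}  YLD={yld:.0f}  DALY={total:.0f}")
```

Keeping the two components as separate functions mirrors the way the fatal and nonfatal streams are assessed separately before being summed.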
Internal epidemiological consistency for each condition was forced through a disease model, which simultaneously estimated age-specific incidence, case fatality, prevalence, duration and general background mortality 2.
The base year for the original GBD Study was 1990. The World Health Organization (WHO) subsequently agreed to prepare a revised assessment for the year 2000, the GBD 2000 Study. With the substantial interest in the GBD 1990 Study, there have been several methodological improvements to the way that epidemiological estimates are prepared, to the architecture of DALYs, as well as a very substantial increase in the amount of epidemiological data available 3. The current authors describe the basis for deriving the epidemiological estimates required to compute DALYs from COPD. The key challenge is to estimate age- and sex-specific death rates from COPD, by region (there were eight World Bank regions in the 1990 Study and 14 WHO epidemiological regions in the 2000 Study (fig. 1)), as well as incidence, prevalence and other epidemiological parameters by age, sex and region. In this paper, the current authors' aim is to describe the methods and summarise the key findings from the GBD 2000 Study, and to evaluate the impact of the major methodological advances over the methods and approaches used in 1990 on the comparability of findings across the two periods.
Estimating COPD mortality
Each year, ∼100 countries worldwide report data from their vital registration systems on causes of death in their populations 4. The quality and coverage of these statistics vary enormously, yet they are of major relevance for public health. Vital registration systems that capture all deaths in a population, and include a medical certificate completed by a qualified practitioner as to the medical conditions preceding death, are the “gold standard” for assessing causes of death. Yet, in many countries, these systems either fail to capture all deaths, fail to provide a specific clinical diagnosis as to the underlying cause of death, or provide an incorrect cause. This is true even for developed countries. Thus, many deaths coded to heart failure or ventricular dysrhythmias, for example, in countries such as Japan, Spain or France would, in the USA, UK or Australia, be more likely to be recognised as due to ischaemic heart disease 1. The implications of such miscoding can be substantial. For example, when correction algorithms for vascular disease miscoding were applied in the 1990 GBD Study, ischaemic heart disease mortality rates in Japan, France, Brazil and several other countries were estimated to be 50–200% higher than reported 1. It is unclear whether there are similar systematic certification and coding biases across countries for COPD; however, studies of multiple-cause coding suggest that there might be 5.
In addition to vital registration, other sources of cause-specific mortality data for populations were identified and evaluated, including large-scale epidemiological surveillance systems on a sample basis in China and Tanzania, and, where available, community-based epidemiological research studies and disease registers (e.g. cancer). Since claims about causes of death are often made by disease-specific groups working in isolation from others (e.g. HIV, malaria), they are often exaggerated and, when summed, greatly exceed independent demographic estimates of total mortality in any given age or sex group. This is especially true for children and young adults, in whom the impact of most major communicable diseases and conditions of poverty is primarily concentrated. As a first step, the “envelope” of mortality by age, sex and region was estimated from demographic databases, and the estimates of cause-specific mortality from the >100 causes examined in the GBD Study were constrained to sum to this number of deaths.
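The envelope constraint can be sketched as follows. The text states only that cause-specific estimates were constrained to sum to the demographic envelope; the proportional rescaling used here is an assumption made for illustration, and the cause names and numbers are hypothetical.

```python
def constrain_to_envelope(cause_estimates, envelope):
    """Rescale cause-specific death estimates so that they sum to the envelope.

    Proportional scaling is an illustrative assumption, not necessarily
    the adjustment used in the GBD Study.
    """
    total = sum(cause_estimates.values())
    if total <= 0:
        raise ValueError("cause-specific estimates must sum to a positive number")
    scale = envelope / total
    return {cause: deaths * scale for cause, deaths in cause_estimates.items()}


# Hypothetical estimates for one age-sex-region cell, produced independently
# by disease-specific groups; together they exceed the demographic envelope.
independent_estimates = {"cause_a": 600.0, "cause_b": 500.0, "cause_c": 400.0}
envelope_deaths = 1200.0

constrained = constrain_to_envelope(independent_estimates, envelope_deaths)
for cause, deaths in constrained.items():
    print(f"{cause}: {deaths:.0f}")
```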
For countries with vital registration data, the following procedure was adopted to estimate cause-specific death rates, including COPD. 1) Where vital registration was incomplete, data were first corrected for undercount using standard demographic methods. 2) Ill-defined causes were redistributed pro rata across defined causes (i.e. deaths with a specific International Classification of Diseases (ICD) underlying cause), and deaths of those aged >5 yrs were distributed across chronic diseases only. 3) For countries/regions where cause of death registration was considered reliable, COPD mortality was estimated directly from the data. For each remaining region, available epidemiological information and disease models were used to first estimate broad cause of death groups, by age and sex (group I: communicable, maternal, perinatal, nutritional; group II: noncommunicable; group III: injuries) 6. For each region, a regional “template” of cause-specific mortality was first constructed using local epidemiological evidence, including sample registration and disease registries. This template was then applied to the broad group II estimates of mortality by age and sex to estimate specific COPD mortality.
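Two of the steps above, the pro rata redistribution of ill-defined deaths and the application of a regional template to group II mortality, reduce to simple proportional arithmetic. The sketch below is a minimal illustration under that reading; the function names, cause labels and all numbers are hypothetical.

```python
def redistribute_ill_defined(defined_deaths, ill_defined_deaths):
    """Spread ill-defined deaths across defined causes in proportion to their size."""
    total_defined = sum(defined_deaths.values())
    return {cause: deaths + ill_defined_deaths * deaths / total_defined
            for cause, deaths in defined_deaths.items()}


def apply_template(group_ii_deaths, copd_share_of_group_ii):
    """Estimate COPD deaths as a template share of group II (noncommunicable) deaths."""
    return group_ii_deaths * copd_share_of_group_ii


# Hypothetical registration data for one age-sex group.
defined = {"COPD": 30.0, "other_chronic": 70.0}
corrected = redistribute_ill_defined(defined, ill_defined_deaths=10.0)
print(corrected)

# Hypothetical region without reliable registration: the template share comes
# from local epidemiological evidence, the group II total from a disease model.
copd_deaths = apply_template(group_ii_deaths=1000.0, copd_share_of_group_ii=0.05)
print(copd_deaths)
```

Because the template is a set of proportions, any miscoding of COPD in the evidence behind it carries through directly to the regional estimate.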
Clearly, the estimation procedure in the absence of good vital registration is highly uncertain and yields, at best, plausible estimates of mortality rather than observed numbers of deaths from COPD. Any systematic biases or miscoding of COPD inherent in the local epidemiological evidence used to construct the regional template of proportional mortality would be reflected in the regional mortality estimates. Using these methods, COPD was estimated to have been the cause of ∼2.75 million deaths in 2000, half of them in the Western Pacific region, and most of these in China. Another 650,000 COPD deaths were estimated to have occurred in the South-East Asia region, largely in India (table 1). In the industrialised countries as a whole (the very low child and adult mortality (A) regions), COPD caused ∼300,000 deaths, or 10% of the global total. Worldwide, 1.9% of DALYs were attributable to COPD in 2000; these estimates show some regional variability according to WHO subregions defined on the basis of epidemiological and demographic transition (table 1).
The same broad approach to estimating COPD mortality was adopted in 1990, but with less rigorous methods for estimating group I/II/III and all-cause mortality, and with less vital registration data. In that year, 2.2 million deaths were estimated from COPD, although it is difficult to know whether the increase over the decade (0.55 million deaths) is real or largely due to these methodological and data advances. Certainly, a key driver of global mortality from COPD is China. Until a competent evaluation is carried out of the Chinese Disease Surveillance Points System, which is the primary source of epidemiological data for the country, the regional and, hence, global estimates of COPD mortality will remain uncertain 8.
Despite the fact that COPD is now prevalent in both developed and developing countries, largely as a result of the tobacco epidemic, reliable estimates of its prevalence are surprisingly scant in most parts of the world 9–11. There is now a consensus that COPD is characterised by airways obstruction with lung function levels of forced expiratory volume in one second (FEV1)/forced vital capacity <70% and presence of a post-bronchodilator FEV1 <80% of the predicted value that is not fully reversible 12, 13. However, population-based estimates of COPD prevalence by region are problematic, since the disease is progressive, measurement tools and definitions still vary among studies, and implementation of spirometry is often not feasible in developing regions 14.
In such circumstances, observed incidence and prevalence become highly dependent on factors other than the true occurrence of disease. For example, prevalence based on self-reported symptoms (chronic cough, sputum, etc.) probably overestimates true COPD prevalence due to misclassification of other possible respiratory diseases. Typically, only about half of the patients with symptoms of chronic bronchitis actually have COPD as assessed by spirometry (table 2). Conversely, physician diagnosis usually underestimates true COPD prevalence. In the USA, for instance, ∼60–70% of those with lower FEV1 have never had a diagnosis of COPD 15. Furthermore, there is a considerable variation among studies in terms of case definition, study design, sample size and data analysis, which makes comparisons among studies difficult.
Even in the industrialised (A) regions, there is little consensus across definitions as to the true prevalence of COPD. Moreover, since prevalence figures are not available in many developing regions, an alternative approach is needed. The current authors have instead inferred disease occurrence from mortality figures, using the mathematical constraints imposed by forcing consistent epidemiological relationships among prevalence/incidence, remission and mortality rates 16. The disease model, DISMOD, is used to back-calculate consistent estimates of COPD incidence and prevalence 3. The main advantage of this approach is that incidence, prevalence and mortality estimates are iteratively linked through the causal chain of a disease process, and this chain limits the possible combinations of incidence, prevalence and mortality rates 2. Limits can be imposed, since any prevalent case must have become incident at a younger age, and any person who died with a disease must have become incident previously, and have been prevalent.
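The logic of these consistency constraints can be illustrated with a toy cohort calculation. This is not DISMOD; it is a minimal sketch that follows one cohort through yearly steps, assuming no remission and treating the rates as annual probabilities. The function name and all rates are hypothetical.

```python
def cohort_prevalence(incidence, case_fatality, background_mortality,
                      start_age=30, end_age=90):
    """Follow a disease-free cohort forward and return prevalence by age.

    Each argument is a function of age returning an annual probability.
    Cases can only arise from the susceptible pool, and cases face both
    background mortality and disease-specific case fatality.
    """
    susceptible, cases = 1.0, 0.0
    prevalence = {}
    for age in range(start_age, end_age):
        prevalence[age] = cases / (susceptible + cases)
        new_cases = susceptible * incidence(age)
        deaths_susceptible = susceptible * background_mortality(age)
        deaths_cases = cases * (background_mortality(age) + case_fatality(age))
        susceptible -= new_cases + deaths_susceptible
        cases += new_cases - deaths_cases
    return prevalence


# Hypothetical constant rates; only the case fatality differs between runs.
low_cf = cohort_prevalence(lambda a: 0.005, lambda a: 0.03, lambda a: 0.02)
high_cf = cohort_prevalence(lambda a: 0.005, lambda a: 0.06, lambda a: 0.02)
print(f"Prevalence at age 60: {low_cf[60]:.3f} (low case fatality) "
      f"vs {high_cf[60]:.3f} (high case fatality)")
```

With incidence held fixed, a higher case fatality yields a lower prevalence, which is the kind of linkage that restricts the admissible combinations of incidence, prevalence and mortality.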
In order to estimate prevalence of COPD in a disease model linking incidence, prevalence, duration and case fatality, the relative risk of dying from COPD (or the case-fatality rate) is required 16–19. One approach to approximating the relative risk (RR) of COPD mortality is to use information on risk factors associated with COPD (i.e. RR = DR1/DR0 where DR1 and DR0 are death rates in exposed (1) and unexposed (0) groups). COPD mortality was modelled as a function of risk factors and other possible determinants, along with regional fixed effects of the following form:
$$\ln(r_{ij}) = \ln\left(\frac{M_{ij}}{PY_{ij}}\right) = \alpha + \beta X_i + \gamma\,\mathrm{REGION}_j + \varepsilon_{ij} \qquad (4)$$
where $r_{ij}$ is age- and sex-specific COPD mortality rate, $M_{ij}$ is number of deaths in each age group, $PY_{ij}$ is person-yrs at risk approximated by mid-yr population by age and sex, $X_i$ is risk factor variables, $\mathrm{REGION}_j$ is regional fixed effect, α, β and γ are regression coefficients, and $\varepsilon_{ij}$ is the error term.
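Goodness of fit and deviation of errors were assessed to choose the best fit regression model. By modelling in this way, the RR of death from COPD for each region can be approximated by the relative impact of COPD risk factors. In other words:
$$RR \approx \frac{\hat{r}_{ij}(X_i)}{\hat{r}_{ij}(X_0)} \qquad (5)$$
where $\hat{r}_{ij}$ is the mortality rate predicted by the regression model at the observed ($X_i$) and unexposed ($X_0$) levels of the risk factors.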
Three major risk factors have been identified for COPD, namely: 1) cigarette smoking; 2) heavy exposure to occupational and indoor air pollution; and 3) α1-antitrypsin deficiency 9, 12, 13, 20. While cigarette smoking accounts for 80–90% of COPD risks in developed countries, smoking behaviour alone is not sufficient to explain the geographical difference in prevalence rate of symptoms 9, 21. Studies have suggested that the prevalence rate for symptoms increases with increasing levels of air pollution, independent of cigarette consumption, indicating that outdoor and indoor air pollution may account for the geographical differences. Several studies in developing countries have shown a significant relationship between female COPD prevalence and the use of biofuels 22, 23.
Although excess mortality has been noted during periods of excessive outdoor air pollution in both developed and developing countries, to date, the role of outdoor air pollution in the aetiology of COPD remains unclear 9, 24. WHO's Comparative Risk Assessment database provides information on exposure to both smoking and indoor air pollution by region 25. Although data on occupational dust exposure is not yet widely available, its role is thought to be much smaller compared with that of cigarette smoking and indoor air pollution 26. Finally, severe α1-antitrypsin deficiency is a recessive trait common in individuals from Northern Europe and virtually absent from other populations. Prevalence is estimated to be much less than 1% in the general population and has thus been excluded from the model.
Other potential factors related to COPD prevalence include age, sex, race/ethnicity and socio-economic status 12. Increasing prevalence with advancing age may reflect either cumulative exposure to smoking and other risk factors, loss of elasticity of lung tissue or both. To take this factor into account, the current model incorporates age as an independent variable.
In general, COPD is more prevalent in males, reflecting differential exposure to risk factors. However, a higher COPD prevalence has been observed among females in the South-East Asia region than males due to indoor air pollution 22, 27. Sex has been included as a variable in the model to adjust for the sex differential in risk exposure and possibly in susceptibility, since the data on biofuel use is only available as a proportion of the population exposed. It has also been suggested that higher COPD prevalence is observed among certain races and ethnic groups, but this is most likely due to confounding by differences in exposure to risk factors, access to healthcare, etc.
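The final regression model used in the GBD 2000 Study was as follows:
$$\ln(r_{ij}) = \alpha + \beta_1\,\mathrm{SIR} + \beta_2\,\mathrm{AIR} + \beta_3\,\mathrm{SEX} + \beta_4\,\mathrm{AGE} + \varepsilon_{ij} \qquad (6)$$
where SIR is the smoking impact ratio that approximates cumulative past exposure to smoking 21, AIR is a proportion of households using biofuel, and SEX and AGE are dummy variables for sex and age, respectively.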
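From this regression model, the relative risk of COPD-related mortality can be approximated by:
$$RR \approx \frac{\hat{r}(\mathrm{SIR}, \mathrm{AIR}, \mathrm{SEX}, \mathrm{AGE})}{\hat{r}(\mathrm{SIR}_0, \mathrm{AIR}_0, \mathrm{SEX}, \mathrm{AGE})} \qquad (7)$$
where $\hat{r}$ is the mortality rate predicted by the model, and SIR0 and AIR0 are set to zero in the unexposed groups.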
A further advantage of this approach is that it yields an independent estimate of age- and sex-specific regional mortality from COPD, derived using a quite different (regression-based) approach to that described earlier. The results of the two approaches are compared in the following section.
Some estimate of the severity of COPD in patients is required in order to determine disability weights and, hence, YLDs from the disease. The traditional approach to caring for COPD patients has been to rely on pulmonary function testing to quantify severity, and to assess response to therapy. However, patients with COPD seek medical care because of symptoms, in particular dyspnoea and inability to function, which clearly have an impact on an individual's health-related quality of life (HRQoL). Accordingly, instruments have been developed to provide a standardised method to measure health status and levels of disability 28.
It has been suggested that when the FEV1 falls to ∼50% of that predicted for a healthy population, the individual typically first experiences some activity limitation because of dyspnoea. When FEV1 reaches a level of 30–40% of that predicted, there would be significant exercise limitations, which can be severely disabling 29, 30.
For the GBD 2000 Study, disability weights rather than disease-specific HRQoL scores have been employed. The original disability weights for untreated and treated COPD in the GBD 1990 Study were 0.428 and 0.388, respectively 1. Recent national burden of disease studies in Australia and the USA used an aggregated disability weight based on the Dutch disability weight exercise in which mild/moderate and severe COPD were assigned the weights of 0.17 and 0.53, respectively.
For the GBD 2000 Study, it was decided that disability weights from these national burden of disease studies should be employed. For the purpose of comparison, YLDs based on current incidence estimates and the GBD 1990 Study disability weights were also calculated.
It has been suggested that treatment can improve the quality of life among COPD patients, but only smoking cessation can alter prognosis and progression 10, 31. Hence, the majority of current treatments for COPD are conservative, but some improvement in the HRQoL score has been observed (for example, for a bronchodilator 32, for lung rehabilitation 33, 34, for treatment guidelines 35, for lung volume-reduction surgery 36). Changes in a generic HRQoL instrument (i.e. the Short Form-36) score ranged from 14% (rehabilitation) to 22% (volume-reduction surgery). Hence, for the assessment of the treatment effect, a conservative 14% reduction in the disability weight as a treatment effect was employed, which is slightly larger than that used in the GBD 1990 Study.
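The arithmetic behind the treatment adjustment can be checked with a short sketch. It assumes that the 14% reduction is applied multiplicatively to an untreated disability weight, which the text does not spell out; the function names are hypothetical, while the weights are those quoted above.

```python
def treated_weight(untreated_weight, reduction=0.14):
    """Disability weight after a proportional treatment effect (assumed multiplicative)."""
    return untreated_weight * (1.0 - reduction)


def implied_reduction(untreated_weight, treated_weight_value):
    """Proportional reduction implied by a pair of untreated/treated weights."""
    return 1.0 - treated_weight_value / untreated_weight


# Dutch-based weights quoted in the text for mild/moderate and severe COPD.
mild_treated = treated_weight(0.17)
severe_treated = treated_weight(0.53)

# GBD 1990 weights quoted in the text for untreated and treated COPD.
gbd1990_reduction = implied_reduction(0.428, 0.388)

print(f"treated weights: mild/moderate {mild_treated:.3f}, severe {severe_treated:.3f}")
print(f"reduction implied by GBD 1990 weights: {gbd1990_reduction:.1%}")
```

The GBD 1990 weights imply a reduction of roughly 9%, so a 14% treatment effect is indeed somewhat larger than that used previously.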
The final regression model (equation 6) achieved a high goodness of fit, with no systematic deviation among residuals. The coefficients for smoking (i.e. SIR) and biofuel use were both positive (1.111 and 2.108, respectively) and highly significant. As expected, the coefficients describing the effect of age on COPD mortality rose monotonically with age (1.664, 3.694, 5.662, 6.922, 8.068 for the age groups 30–44, 45–59, 60–69, 70–79 and ≥80 yrs, respectively). Overall, the model accounted for much of the variation in estimated COPD mortality among regions, with R² = 0.942.
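To show how such coefficients translate into an RR, the sketch below assumes that the model is linear in the logarithm of the mortality rate, so that the RR relative to an unexposed group (SIR and AIR set to zero) is the exponential of the coefficient-weighted exposure difference. The log-linear form is an assumption, the exposure values are hypothetical, and only the two coefficients are taken from the text.

```python
import math

BETA_SIR = 1.111  # coefficient for the smoking impact ratio, as reported
BETA_AIR = 2.108  # coefficient for the proportion of households using biofuel, as reported


def relative_risk(sir, air, sir0=0.0, air0=0.0):
    """RR of COPD mortality versus the unexposed group, assuming a log-linear model."""
    return math.exp(BETA_SIR * (sir - sir0) + BETA_AIR * (air - air0))


# Hypothetical regional exposure levels (illustration only).
rr_smoking_only = relative_risk(sir=0.5, air=0.0)
rr_biofuel_only = relative_risk(sir=0.0, air=0.5)
rr_joint = relative_risk(sir=0.5, air=0.5)
print(f"smoking only: {rr_smoking_only:.2f}, biofuel only: {rr_biofuel_only:.2f}, "
      f"joint: {rr_joint:.2f}")
```

Under this assumed form, sex and age terms cancel in the ratio, and the joint effect of the two exposures is multiplicative.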
Predicted mortality was consistent with the GBD 2000 mortality estimates, except for males in South-East Asia (SEAR) with high child and adult mortality (D) where the GBD 2000 Study estimates were lower than predicted. This may well be due to problems with the regional cause of death template, and more confidence should be placed in the model-based predictions.
This model was used to estimate age- and sex-specific RRs of COPD-related mortality by region. For the purpose of comparison, the RRs used in the GBD 1990 Study, current model estimates, and the recent USA national burden of disease study (unpublished data) are shown in figure 2. The patterns of age- and sex-specific RRs are consistent among studies except for the GBD 1990 estimates, a discrepancy that highlights the limitations of the constant survival values approach used in that study.
Figure 3 shows estimated age-specific incidence and prevalence per 100,000, by broad regions in 2000, using these methods. Incidence rates rise with age, as expected, and are higher for males than females in all regions. By far the highest incidence is estimated for the Western Pacific (WPR) low child and adult mortality (B) region (primarily China) with rates at age ≥60 yrs of 2–3 times those of other regions. Age-specific prevalence of the disease is also highest in WPR B in most cases, but, interestingly, is also high in Latin America (AMR) B and D, and A regions. Prevalence rates in these regions are 3–4 times those for other parts of the world.
Table 3 shows the summary estimates of adult (aged ≥30 yrs) prevalence of COPD, by region, for 2000. Table 3 also provides a comparison of prevalence estimates from the 1990 GBD Study, with the range of published (as of 2002) estimates in the literature. In general, the current (i.e. 2000) model estimates were more consistent with the results of the published literature than the 1990 estimates. In particular, the previous estimates substantially underestimated COPD prevalence in AMR A, Europe (EUR) A and WPR A (i.e. the industrialised countries), where the prevalence of smoking was higher than in other regions. Conversely, the current estimates yielded lower prevalence rates in Africa (AFR).
Based on the estimated age-specific mortality, and the incidence/duration and severity weights from these methods, disease burden from COPD was calculated by region for 2000. These estimates are shown in table 4.
Overall, COPD was estimated to have caused >26 million DALYs in 2000, or just under 2% of the global total. Of this, WPR B (including China) and SEAR D (including India) accounted for 36% and 25%, respectively. Despite a lower prevalence in these regions than in other regions where risk factors are more prevalent, the large population in both regions contributed to the larger total morbidity from COPD. They were followed by smoking-prevalent regions, such as EUR A (6.7%) and AMR A (6.0%). The male/female ratios of total YLDs and YLLs were 1.4 and 1.1, respectively. The ratios tended to be higher in AFR, EMR and WPR B regions where smoking prevalence among females is still low. Overall, YLDs accounted for 38% of the total COPD burden, but YLDs exceeded YLLs in some mortality subregions (A and B).
Figure 4 shows a comparison of YLDs per 100,000 population by region in GBD 1990 and GBD 2000. Total burden of COPD was larger in the GBD 1990 estimate than the present estimate partly because GBD 1990 tended to overestimate COPD incidence in regions such as AFR and WPR B, and partly because GBD 1990 employed higher severity weights across the regions. It should be noted, however, that actual epidemiological parameters such as mortality and prevalence rates increased considerably in 2000.
Discussion and conclusions
In the current paper, an alternative approach to estimating COPD incidence and prevalence has been described, which was used for estimating YLDs due to COPD in 2000. Compared with the previous estimates in GBD 1990, which employed uniform RRs across the regions, the revised method has the advantage of including regional variations in RRs of COPD-related mortality 1. However, several limitations should be noted.
First, estimated RRs are not the true RRs of COPD-related mortality; rather they are approximated by the two major risk factors for COPD, i.e. RRs of joint effects of smoking and air pollution. Nevertheless, since the goodness of fit of the regression model was high (R² = 0.942), COPD-related mortality would be well represented by the risk factor analysis. Estimated prevalence rates were comparable to those published in the literature, supporting the validity of the present approach. However, it is likely that the current GBD Study underestimates the true prevalence of COPD. At the time of analysis (2002), <30 studies of COPD prevalence were available to prepare regional estimates. More recent reviews 37, 38 have identified newer published studies from which updated prevalence estimates can be made. These data will be taken into account in any subsequent revision of the GBD Study.
Secondly, COPD mortality estimated for GBD 2000 may still be an underestimate. A recent study in Canada by Lacasse et al. 33 has suggested that COPD mortality was much lower than that estimated by Mannino et al. 5 from the USA multiple causes of death data. In fact, the study by Mannino et al. 5 suggested that mortality from COPD was highly underestimated when using vital statistics rather than multiple causes of death data. However, if asthma was excluded as a cause of COPD death and cause of death was restricted to the primary cause, actual mortality figures in Canada and the USA were comparable, since COPD often coexists with lung cancer and COPD may often be the secondary cause of death rather than the primary cause 13. Therefore, if the data on cause of death are restricted to the primary cause classified by ICD 9th or 10th revision as in the GBD exercise, COPD mortality is at least comparable across the regions. Miscoding is still likely to be less common for COPD than for other causes of death. Nevertheless, the current estimate of COPD burden should be considered as a lower bound of true COPD burden.
Thirdly, a further consequence of underestimated mortality would be overestimation of prevalence, particularly in SEAR D and EUR B. Since patients with a disease are selectively removed from the population by death, its prevalence is lower than it would have been if the disease carried no excess mortality risk 16; if that excess mortality is underestimated, the modelled prevalence will be too high.
Finally, the main results from this study are estimates of COPD prevalence rates, which are consistent with a corresponding set of prevalence and mortality rates and, more importantly, appear to be more comparable with those in national burden of disease studies and the published literature than the previous (1990) estimates. Despite the uncertainty in estimates due to data limitations, the GBD Study suggests that COPD is a major cause of death and disability in all regions. More than 2.5 million people die of the disease each year, or about the same number as HIV/AIDS, and most of these deaths are in poor countries. COPD is currently the 10th leading cause of disease burden (DALYs) in the world, causing ∼2% of the entire global burden of disease, and this can be expected to rise unless urgent action is taken to control leading risk factors, particularly tobacco.
PROJECTING THE FUTURE BURDEN OF COPD
Methods to project future burden of disease include risk factor models and extrapolation of past trends. Risk factor models are intuitively appealing, but complex to construct and require detailed information on risk factors. Age–period–cohort extrapolation methods are appropriate for COPD as demographic changes, lifetime smoking habits (cohort effects) and other factors may strongly influence future COPD risk. Data (especially mortality) are often readily available and extrapolation methods are simple to implement, but can be difficult to interpret and are probably best for short-term projections.
To illustrate some of the issues with projecting burden of disease, a Bayesian age–period–cohort method was used to project COPD mortality for 2000–2009 in England and Wales (UK), using routinely available mortality and population data for 1945–1999.
Testing the model by making predictions for the last 10 yrs of existing data gave median totals for COPD deaths within 9% for males and 5% for females. Projections for 2000–2009 suggested a median fall in death rates for males of 24% (90% credible intervals −52–14%) by 2009 on a 1999 baseline and corresponding 2% (90% credible intervals −40–65%) rise for females. The wide credible intervals reflect marked year-to-year variations in numbers of deaths, probably related to infectious disease activity.
Credible or confidence intervals are not routinely presented in current published COPD projections using risk factor models, but may give useful information. As no method is perfect, a more complete assessment would compare projections obtained using risk factor and extrapolation methods.
In words attributed to Niels Bohr (1885–1962), “prediction is very difficult, especially about the future”. However, some knowledge about expected future trends of a disease can be useful for many reasons, including planning of public-health initiatives and healthcare services. Two general approaches are in use to project future incidence or mortality trends: extrapolation based on previous trends and forecasts based on multivariate risk factor models 1. Both methods assume some constancy of whatever they model on, either that the relationship of risk factors with disease stays the same over time or, in extrapolation, that past trends continue. This section considers the use of these methods to project the future burden of COPD, and subsequently illustrates some of the issues that may arise in making projections, using a Bayesian extrapolation method for mortality in England and Wales. While this method is readily transferable to other countries, results may differ as smoking prevalence has been declining for several decades in the UK.
Risk factor methods
Arguably, the most well-known COPD projections come from the authoritative and widely quoted GBD Study, which projected that COPD would rise to the third leading cause of death worldwide in 2020 1. These projections arose from a risk factor model designed to guide international health policy. The model was deliberately simple to facilitate its use across a range of diseases and countries. Risk factors included in the model were projected income, education and smoking levels, and these were used to make projections for disease groupings for 2020 by world region, for baseline, optimistic and pessimistic scenarios 1. The model provided a relatively poor fit (r² coefficients typically ∼30% for most age and sex groupings), i.e. only 30% of the variation in the modelled outcome could be explained by the risk factors included. Levels of specific respiratory diseases such as COPD at the country level were then inferred from that country's current distribution of specific conditions within respiratory disease. While the GBD model has been greatly influential in directing attention at respiratory disease internationally, other approaches are arguably better at producing projections at a country level where good risk factor and healthcare data exist.
One of the few risk factor models specific for COPD was published by Feenstra et al. 39 in 2001. This used a system-dynamic multistate lifetable model to make 20-yr predictions about the COPD burden in the Netherlands. Main risk factors for COPD were age and detailed smoking data (survey data with age- and sex-specific starting and stopping smoking rates). RRs for developing COPD related to these risk factors were derived from cohort studies (chiefly from the USA). These were then applied to COPD incidence and prevalence data from a combination of general practice databases, national mortality and population data, and to information on healthcare usage and costs. This complex model provided detailed information on projected prevalence, DALYs, mortality and costs. As well as providing detailed information, informative sensitivity analyses can be conducted using risk factor models. For example, the basic model suggested an increase in COPD prevalence of 43% in males and 142% in females between 1995 and 2015 39. However, if all smokers quit in 1995 and nobody else started, COPD prevalence in 2015 was still projected to increase by around 40% in males and 129% in females, due to future ageing of the population and the legacy of previous smoking. There are some disadvantages of such models: 1) they are complex and can, therefore, be difficult to implement; 2) the complexity magnifies the possibility of combining errors and biases in the data; and 3) such models are not readily transferable to countries lacking comprehensive healthcare information.
The alternative approach to risk factor models is to extrapolate from past trends. Extrapolation models can be readily used at the country level and are often simpler to implement than risk factor models, although their results may be complex to interpret. They are probably best suited to short-term projections because past trends are likely to become less relevant further into the future.
Extrapolation methods are mainly used with mortality data. This is because mortality data are readily available in many countries over a sufficient time period (25–30 yrs are advisable), with reasonably stable definitions of disease and coding over time and of acceptable quality. Simple extrapolations, such as those from a logistic regression model of mortality trends 40, are likely to give a misleading impression of future COPD trends as they do not take into account factors such as population age changes, year-to-year (period) influences such as influenza epidemics, and generational (cohort) influences such as lifetime smoking patterns that may all strongly influence future COPD mortality 41, 42. Age–period–cohort models are more sophisticated extrapolation methods that take into account all of these factors and have been most widely used to make projections of cancer mortality 41–48.
Assessment of uncertainty in disease projections
Leaving aside the inherent problems of attempting to assess the future at all, variations in projected figures relate to the choice of model assumptions and also to statistical uncertainty in the model. While most published projections examine the impact of model assumptions through sensitivity analyses, very few published analyses report uncertainty intervals (confidence or credible intervals), although some authors refer to ways in which these may be calculated 1, 41. The omission of uncertainty intervals may affect interpretation of the projections. The rest of this section illustrates this point in a demonstration of a recently developed Bayesian age–period–cohort model to make short-term mortality projections for COPD for England and Wales 49.
COPD mortality and population data for England and Wales 1945–1999 by 5-yr age band were obtained from the Office for National Statistics (ONS) 50–54. Codes used for COPD included those for asthma, chronic bronchitis and emphysema due to concerns about distinguishing between these conditions using death certificate data 55; asthma constituted ∼7% of deaths over this time period. Population projections for 2000–2009 were obtained from the Government Actuarial Dept (London, UK). As changes in ICD coding over time can lead to artefactual changes in rates, age-specific conversion factors for COPD mortality covering change years were obtained from the ONS 56–60, except for the change from ICD8 to ICD9, for which conversion factors were not available.
Projections of numbers of deaths were conducted using a Bayesian age–period–cohort model implemented in a freeware programme called Bayesian Age-period-cohort Modelling and Prediction (BAMP) 49, 61. Here, period effects represent factors affecting people of all ages at a particular point in time (e.g. treatment advances or influenza epidemics), while cohort effects represent factors more common in people born at a particular point in time (e.g. smoking habits).
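The link between age, period and cohort can be made concrete with a small sketch. The function name and the assumption that age is recorded in completed years are illustrative only and are not taken from the cited analysis.

```python
def birth_year_range(death_year, age_low, age_high):
    """Return (earliest, latest) possible birth years for a death occurring in
    `death_year` at an age between `age_low` and `age_high` completed years."""
    if age_low > age_high:
        raise ValueError("age_low must not exceed age_high")
    # Someone aged `a` completed years during year `y` was born in y-a-1 or y-a.
    return death_year - age_high - 1, death_year - age_low


# Deaths at age 45-49 in 1999 occur in people born between 1949 and 1954.
cohort_1999 = birth_year_range(1999, 45, 49)
# Deaths at age 50-54 in 2004 occur in people born in the same span of years.
cohort_2004 = birth_year_range(2004, 50, 54)
```

Following the same birth years through successive age bands and calendar years in this way is what allows a cohort effect to be separated from age and period effects.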
The statistical method is described in more detail elsewhere 49. In brief, the underlying assumption on which the statistical model is based is that observed mortality rates result from a constant rate modified by age, period and cohort effects, plus unobserved covariates. The model is implemented in a Bayesian framework and is a development of work by Clayton and Schifflers 62 and Berzuini et al. 63. Differing assumptions (prior beliefs or priors) about the smoothness of the age, period and cohort parameters could be incorporated into the model. The smoothing prior defined as random walk (RW)1 favoured solution parameters with constancy of first-order differences of age, period or cohort parameters, thereby assuming a smoothness of age, period and cohort trends, whereas the RW2 prior penalised deviations from a linear trend of the second-order differences of age, period or cohort parameters, assuming a smoothness of the rate of change of parameters 49.
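As a sketch of the structure described above, the model and the two smoothing priors might be written as follows. The notation (symbols, indices, the precision parameter κ) and the choice of a log scale are assumptions made here for illustration; the cited software may use a different link function, such as the logit, and a specific indexing of cohorts that is not reproduced here.

```latex
% Rate in age group i and period j, with cohort index k determined by i and j;
% z_{ij} is an unstructured term standing in for unobserved covariates.
\log \lambda_{ij} = \mu + \theta_i + \varphi_j + \psi_k + z_{ij}

% RW1 prior (shown for the age effects): first-order differences shrink towards zero.
\theta_i - \theta_{i-1} \sim \mathcal{N}\left(0, \kappa^{-1}\right), \qquad i = 2, \dots, I

% RW2 prior: second-order differences shrink towards zero, favouring a locally linear trend.
\theta_i - 2\theta_{i-1} + \theta_{i-2} \sim \mathcal{N}\left(0, \kappa^{-1}\right), \qquad i = 3, \dots, I
```

The same form of prior would apply to the period effects φ and the cohort effects ψ, each with its own precision parameter.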
To select the best model with which to make projections, predictions of numbers of deaths were made for the last 10 yrs of existing data (1990–1999), where the actual numbers of deaths were already known. Predictions were based on analyses for 1950–1989 using RW1 or RW2 constraints and data sets adjusted or unadjusted for ICD coding changes. The combinations of data set and model that most closely predicted the numbers of deaths for the last 10 yrs of existing data, assessed by comparing the square root of the sum of median predictive deviances, were then used to project mortality from 2000 to 2009.
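The hold-out comparison described above could be sketched as follows. The source states only that models were compared using the square root of the sum of median predictive deviances; the use of a Poisson deviance, the function names and the toy numbers below are assumptions for illustration, not the published method.

```python
import math
from statistics import median


def poisson_deviance(observed, predicted):
    """Poisson deviance contribution for one age-by-year cell (assumed form)."""
    if predicted <= 0:
        raise ValueError("predicted count must be positive")
    log_term = observed * math.log(observed / predicted) if observed > 0 else 0.0
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (log_term - (observed - predicted)))


def holdout_score(observed_by_cell, samples_by_cell):
    """Square root of the sum, over cells, of the median predictive deviance.

    `samples_by_cell[c]` holds posterior predictive draws of the number of
    deaths in cell c; `observed_by_cell[c]` is the known number of deaths.
    """
    total = 0.0
    for observed, samples in zip(observed_by_cell, samples_by_cell):
        total += median(poisson_deviance(observed, s) for s in samples)
    return math.sqrt(total)


# Toy hold-out data: two cells with known deaths and draws from two candidate models.
observed = [100, 200]
draws_model_a = [[100, 100, 100], [200, 200, 200]]  # predicts the data exactly
draws_model_b = [[110, 120, 130], [220, 230, 240]]  # overestimates deaths
score_a = holdout_score(observed, draws_model_a)
score_b = holdout_score(observed, draws_model_b)
```

Under this scoring, the combination of data set and prior with the lowest score would be carried forward to the projection step.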
Within the Bayesian framework, information from both prior beliefs (priors) and the data are combined to give a posterior distribution, from which actual values for parameters can be derived through repeated iterative sampling. The first 2,000 samples representing “burn in” for the model were discarded, and quoted results are based on a further 100,000 samples. Median values and centiles specified at 5% and 95% and 25% and 75% were obtained.
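The summary step can be illustrated with a minimal sketch, assuming the sampler's output for one quantity is available as a simple list of draws. The function name and the toy chain are hypothetical; only the burn-in length and the choice of centiles follow the text.

```python
from statistics import quantiles


def summarise_chain(samples, burn_in=2000):
    """Discard the burn-in draws, then return the median and the 5th, 25th,
    75th and 95th centiles of the remaining draws."""
    kept = samples[burn_in:]
    if len(kept) < 2:
        raise ValueError("need at least two draws after burn-in")
    # With n=100, cuts[p - 1] is the p-th centile of the retained draws.
    cuts = quantiles(kept, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (5, 25, 50, 75, 95)}


# Toy chain: 2,000 burn-in draws followed by the values 0, 1, ..., 100.
toy_chain = [0.0] * 2000 + [float(v) for v in range(101)]
summary = summarise_chain(toy_chain)
```

In the analysis described here, the retained draws would number 100,000 rather than the 101 used in this toy chain.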
Deaths from COPD in males and younger females have declined in England and Wales in recent years, with the decline starting at different periods in different age groups, but have been increasing in older females (fig. 5). The impact of coding changes was minimal, as artefactual changes in rates across change years were not visible. However, marked year-to-year variations were also visible in these graphs, and these were located as period effects in the age–period–cohort analyses.
Making predictions for existing data using age–period–cohort analyses
Posterior median predictions from all models overestimated the total number of deaths occurring in 1990–1999. In the best models, overestimation was ∼9% for males and 5% for females (table 5). However, the actual numbers of deaths for each year and age group generally lay within the 5% and 95% centiles of the predictions. Using data adjusted for ICD changes gave slightly worse predictions in females but slightly better predictions in males (table 5).
Projections for 2000–2009 using age–period–cohort analyses
Median projections are presented together with 90% credible intervals, which were obtained from the 5% and 95% centile estimates. The 90% credible intervals can be interpreted as intervals where there is 90% confidence that the “true” value lies within these bounds. Projections using the best models suggested that COPD death rates in males aged ≥45 yrs would continue the decline of recent decades, falling by 24% (90% credible interval −52 to 14%) by 2009 from a 1999 baseline (table 6). In females, projections suggested fluctuations in COPD death rates and wide credible intervals, with 2% higher rates in 2009 (90% credible interval −40 to 65%) than in 1999 (table 6). A fall in rates was strongly suggested for males aged in their 60s and 70s and females in their 60s (the 95th centiles for projected rates in 2009 were lower than in 1999), but, for other age groups, 90% credible intervals encompassed both a fall and rise in rates (fig. 6).
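The way a percentage change and its 90% credible interval can be read from posterior draws is sketched below. Whether the published figures were computed draw by draw in this way is not stated in the text, so this is an assumption; the function names and toy draws are hypothetical.

```python
from statistics import quantiles


def percent_change_summary(baseline_rate, projected_samples):
    """Median and 5th/95th centiles of the percentage change from baseline,
    computed separately for each posterior draw of the projected rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    changes = [100.0 * (s - baseline_rate) / baseline_rate
               for s in projected_samples]
    cuts = quantiles(changes, n=100, method="inclusive")
    return {"lower": cuts[4], "median": cuts[49], "upper": cuts[94]}


def fall_strongly_suggested(summary):
    """True when even the 95th centile lies below the baseline rate."""
    return summary["upper"] < 0.0


# Toy draws of a projected rate, spread evenly around a baseline of 100.
toy_samples = [50.0 + i for i in range(101)]
toy = percent_change_summary(100.0, toy_samples)
```

Here the toy interval runs from −45 to 45%, so it spans both a fall and a rise, which corresponds to the situation described for most age groups; `fall_strongly_suggested` returns true only when the whole interval lies below zero.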
The corresponding posterior median projections of numbers of deaths (data not shown) showed an ∼10% fall in male COPD deaths and a rise (with minor year-to-year fluctuations) in female COPD deaths of ∼10% from a 1999 baseline, to ∼14,000 each per yr in 2009. The median number of deaths for females, at 14,410 (90% credible interval 8,184–21,886), was projected to become higher than that for males, at 14,259 (90% credible interval 8,824–21,145), for the first time by 2008, but credible intervals showed wide overlap. Despite higher numbers of deaths, the posterior median rates in females were projected to remain lower than those in males throughout the period (table 6) due to demographic changes in the population.
To the current authors' knowledge, this is the first use of a Bayesian age–period–cohort method to make mortality projections for COPD. Median projections suggested that male COPD death rates in the UK would continue the decline seen in recent decades and fall by a quarter over the decade, whereas those in females were projected to remain fairly static. Perhaps the most striking feature of the projections was the wide credible intervals. It should also be noted that intervals quoted will underestimate the statistical uncertainty in the model because uncertainties about the population projections for 2000–2009 have not been incorporated. As previously noted, credible or confidence intervals are rarely presented in published projections.
The major factor responsible for the wide credible intervals for COPD mortality was the marked year-to-year variations in mortality rates (fig. 5) located as period effects. These period effects probably relate to variations in activity of respiratory infections, including respiratory syncytial virus and influenza. Similar methods used to project lung cancer mortality, which had much less year-to-year variation than COPD, resulted in much narrower credible intervals 64. Wide credible intervals do not affect the median projections and are of interest in their own right. For example, although COPD mortality rates in males were likely to decrease overall, the results suggest that, given the experience in previous years, there could be marked year-to-year fluctuations in numbers of deaths. This information might be of particular use when planning services. Furthermore, the upper bounds of the credible intervals could be interpreted as suggesting that there is a 95% certainty that COPD mortality rates in males in 2009 would be no more than 14% higher than in 1999, even if respiratory infection activity were high, whereas, in females, projections suggested with 95% certainty that rates would be no more than two-thirds higher than in 1999. This type of information would not have been available from the risk factor models employed by Feenstra et al. 39 or the GBD Study 1, as infectious disease activity was not included in those models and confidence intervals were not presented.
Another issue relates to the degree of certainty required for the projections. Many Bayesian studies use 70% or 80% credible intervals (which are lower than the 95% confidence interval levels used in frequentist analyses in recognition that certain prior knowledge about the data, as well as statistical uncertainty, are reflected in the results) and the 90% bounds presented here may be more stringent than required for planning purposes 65.
Accounting for changes in smoking
A refinement of age–period–cohort analyses is to substitute one or more of the parameters by changes in a known risk factor with an appropriate lag, e.g. the impact of changes in average tar content in cigarettes on lung cancer mortality, which might be expected to exert a period effect (change in rates affecting all ages) 45. This is less easy for COPD as, unlike cancers, the components of cigarette smoke leading to disease development have not been definitively characterised 66. Repeating the analyses presented here using annual tobacco consumption by sex, lagged 20 or 40 yrs, as the period effect did not improve predictions and resulted in an unrealistic widening of credible intervals, particularly in females 67. This seemingly counterintuitive result can be explained using the knowledge that, in the UK, there are known strong year of birth (cohort) effects in smoking behaviour, with males first taking up smoking in large numbers around the time of the First World War and females doing so in the 1940s. The current authors' interpretation is that cumulative smoking exposure is the most important predictor of COPD mortality trends, and this is already captured by the cohort effect. Therefore, any impact of year-to-year changes in smoking habits, which would be reflected in the period effect, adds little explanatory information to the model 42, 56. Another possible explanation is that the measure used was too crude an indicator of tobacco use. Substituting the cohort parameter with a suitable smoking covariate is unlikely to improve predictions over a short time period because the data will hold very good information about cohort trends for the oldest cohorts with the largest numbers of deaths. Cohort smoking information may be able to improve predictions in younger cohorts in the short term, but these constitute the smallest number of deaths 41. Such information could potentially improve predictions over a longer time period, assuming smoking remains a major influence on COPD mortality.
Mortality data in the UK are of high quality, represent almost 100% of deaths and are usually certified by a medical practitioner or coroner 68. Changes in diagnostic patterns are inevitable over a 50-yr period, but the present authors consider that data were compatible over this time period. Both COPD and lung cancer have distinctive clinical and radiographic features, and the ICD codes used to define COPD were deliberately broad to allow for changes in diagnosis over time. The analysis was restricted to those aged ≥45 yrs as most of the deaths from obstructive lung disease occur in older individuals. Including younger age groups would have been possible, but this would have reduced precision and widened credible intervals 69. There was no advantage to using data adjusted with factors derived from bridge-coding exercises in this analysis because changes to numbers of deaths resulting from coding revisions were small.
A separate issue concerns the usefulness of mortality as an estimate of the public health burden of obstructive lung disease. Some advantages are that mortality and population data are readily available in many countries, information on the quality of the data is available, and this type of analysis is relatively cheap and straightforward. However, it is likely to underestimate the burden of disease. This analysis used COPD statistics relating to underlying cause of death on the death certificate, but, in England and Wales, this only accounts for ∼60% of individuals with the disease mentioned anywhere on the death certificate 70. The Tucson cohort study suggested that only 33% of patients with COPD in life had COPD listed anywhere on the death certificate, although this percentage increased to 77% in subjects with moderate-to-severe disease 71.
In conclusion, predicting future trends in a disease is difficult. Both risk factor and age–period–cohort approaches are based on a number of assumptions; for example, that the dose–response coefficient of the risk factor will remain constant into the future or that current age, period and cohort trends will continue. The extent to which projection methods can predict existing data should be readily available for published projections. As all methods have flaws, a more complete assessment can be made by comparing projections obtained using risk factor and extrapolation methods.
The authors would like to thank all attendees for their active participation in the workshop: R. Beasley, A.S. Buist, K.R. Chapman, Y. Fukuchi, D. Gorecka, A. Gulsvik, A. Hansell, S. Hurd, C. Lai, T. Lee, A. Lopez, D. Mannino, D. Mapel, A. Menezes, M. Miravitlles, D. Sin, S. Sullivan, M. Thun, P. Vermeire, J. Vestbo, G. Viegi, W. Vollmer, G. Watt, J. Hogg, W.C. Tan, S. Ferris-O'Donnell, R. Jagt, K. Knobil, T. Leonard, H. Muellerova, G. Nadeau, M. Sayers, J. Soriano, M. Spencer and R. Stanford.
They would also like to thank K. Poinsett-Holmes for editorial assistance and G. Morley for logistics support. Finally, contributions by the following are acknowledged: G. Marks, N. Pride, P. Aylin and N. Best for their collaboration with A. Hansell in the “Projecting the future burden of COPD” section of the manuscript.
This is the second of four manuscripts presenting the proceedings of a scientific workshop entitled The Global Burden of COPD, held in Vancouver, Canada, October 21–22, 2004, which will appear in consecutive issues of the European Respiratory Journal. A question and answer document file following each of the manuscripts presented during the workshop is available at www.ersnet.org/elearning
Previous articles in this series: No. 1: Chapman KR, Mannino DM, Soriano JB, et al. Epidemiology and costs of chronic obstructive pulmonary disease. Eur Respir J 2006; 27: 188–207.
This article has supplementary material accessible from www.erj.ersjournals.com.
- Received March 4, 2005.
- Accepted August 15, 2005.
- © ERS Journals Ltd