Abstract
Chronic obstructive pulmonary disease (COPD) is greatly underdiagnosed worldwide and more efficient methods of case-finding are required. We developed and externally validated a risk score to identify undiagnosed COPD using primary care records.
We conducted a retrospective cohort analysis of a pragmatic cluster randomised controlled case-finding trial in the West Midlands, UK. Participants aged 40–79 years with no prior diagnosis of COPD received a postal or opportunistic screening questionnaire. Those reporting chronic respiratory symptoms were assessed with spirometry. COPD was defined as presence of relevant symptoms with a post-bronchodilator forced expiratory volume in 1 s/forced vital capacity ratio below the lower limit of normal. A risk score was developed using logistic regression with variables available from electronic health records for 2398 participants who returned a postal questionnaire. This was externally validated among 1097 participants who returned an opportunistic questionnaire to derive the c-statistic, and the sensitivity and specificity of cut-points.
A risk score containing age, smoking status, dyspnoea, prescriptions of salbutamol and prescriptions of antibiotics discriminated between patients with and without undiagnosed COPD (c-statistic 0.74, 95% CI 0.68–0.80). A cut-point of ≥7.5% predicted risk had a sensitivity of 68.8% (95% CI 57.3–78.9%) and a specificity of 68.8% (95% CI 65.8–71.6%).
A novel risk score using routine data from primary care electronic health records can identify patients at high risk for undiagnosed symptomatic COPD. This score could be integrated with clinical information systems to help primary care clinicians target patients for case-finding.
Patients at high risk of undiagnosed symptomatic COPD can be identified using electronic primary care health records http://ow.ly/pAc4309MBl5
Introduction
Chronic obstructive pulmonary disease (COPD) is the third leading cause of mortality worldwide [1], but 50–90% of the disease burden remains undiagnosed. Patients with undiagnosed COPD experience significant morbidity, and place a burden on health services through exacerbations, for many years before their diagnosis; this has contributed to a worldwide drive to improve early diagnosis [2, 3]. While mass screening with spirometry among asymptomatic individuals is not recommended [4], earlier identification of patients with clinically significant but unreported symptoms (case-finding) could improve access to care and prevent disease progression [5].
A systematic approach to case-finding using an initial screening questionnaire mailed to ever-smokers (current and ex-smokers), followed by invitation to spirometry among those reporting relevant symptoms, was recently evaluated in primary care [6, 7]. This approach was twice as effective as, and more cost-effective than, opportunistic case-finding, and identified a substantial proportion of patients with potential to benefit from effective interventions. However, this method targeted a broad population (all ever-smokers aged 40–79 years) and was also reliant on patient response [8]. A more efficient approach is therefore needed.
A number of risk scores have been proposed, including one developed by our team, to help identify patients at high risk of undiagnosed COPD using routine clinical records [9–11]. However, their case definitions included patients with a new record of COPD diagnosed through usual care. Estimates in England suggest approximately two-thirds of COPD cases are undiagnosed [12–14]. Given this extent, the characteristics of patients diagnosed through routine clinical care may differ from those detected through active case-finding. Risk scores should therefore ideally be derived using case-found populations.
We report the development and validation of a new clinical score for identifying patients at high risk of undiagnosed COPD in primary care using data from TargetCOPD, a large cluster randomised controlled case-finding trial [6, 7].
Methods
This report has been written in accordance with the TRIPOD statement [15].
Study design
This is a retrospective cohort analysis of the intervention (case-finding) arm of the TargetCOPD cluster randomised controlled trial (RCT) [6], to develop and validate a risk score for identifying undiagnosed COPD. General practices in the TargetCOPD trial were randomised to either targeted case-finding or routine care. Eligible participants were recruited from August 2012 to June 2014. Those in practices that were allocated to the case-finding arm were individually randomised to either receive a screening questionnaire only when attending routine clinical appointments or to additionally receive a screening questionnaire by post. Participants reporting relevant respiratory symptoms (chronic cough or phlegm for ≥3 months of the year for ≥2 years, wheeze in the previous 12 months or dyspnoea of Medical Research Council grade ≥2) were offered a diagnostic assessment with post-bronchodilator spirometry. We used data from their primary care electronic health records (EHRs) and spirometry assessment to develop and validate a risk score for undiagnosed COPD.
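The symptom criteria for offering spirometry can be written as a simple rule. The following is a minimal sketch of that rule only; the class and field names are hypothetical and do not reflect the actual TargetCOPD questionnaire items or data format.

```python
from dataclasses import dataclass


@dataclass
class QuestionnaireResponse:
    """Hypothetical container for screening answers (field names are illustrative)."""
    cough_months_per_year: int = 0
    cough_years: int = 0
    phlegm_months_per_year: int = 0
    phlegm_years: int = 0
    wheeze_last_12_months: bool = False
    mrc_dyspnoea_grade: int = 1  # Medical Research Council scale, 1 (least) to 5


def is_chronic(months_per_year: int, years: int) -> bool:
    """Chronic symptom: present for >=3 months of the year for >=2 years."""
    return months_per_year >= 3 and years >= 2


def eligible_for_spirometry(r: QuestionnaireResponse) -> bool:
    """Any single qualifying symptom is enough to be offered spirometry."""
    return (
        is_chronic(r.cough_months_per_year, r.cough_years)
        or is_chronic(r.phlegm_months_per_year, r.phlegm_years)
        or r.wheeze_last_12_months
        or r.mrc_dyspnoea_grade >= 2
    )
```

Under this sketch, a respondent reporting only wheeze in the previous 12 months qualifies, whereas one reporting cough for 3 months in a single year does not.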
Population
Participants were aged 40–79 years with no prior diagnosis of COPD (supplementary table S1 provides clinical codes used for exclusion). Subjects were further excluded at the discretion of their general practitioner (e.g. terminal illness, recent bereavement, learning difficulties or pregnancy). This analysis was restricted to a subset of participants from 13 of the participating 27 practices allocated to the case-finding arm for whom data from their EHRs were available.
Setting
The TargetCOPD trial was based in primary care practices in the West Midlands, UK [6]. Participating practices broadly reflected the diversity of the population in terms of age, ethnicity, socioeconomic status (SES) and practice characteristics.
Outcome
COPD was defined as the presence of at least one chronic respiratory symptom (as described in the study design) together with airflow limitation measured by post-bronchodilator spirometry. Spirometry was performed to American Thoracic Society/European Respiratory Society standards [16] by trained research assistants using EasyOne spirometers (ndd Medical Technologies, Zurich, Switzerland) 20 min after the inhalation of 400 μg of salbutamol delivered through a metered dose inhaler and Volumatic spacer. Spirometers were calibrated on a daily basis and all research assistants underwent supervised training over a period of 3–6 months. All spirometry traces were reviewed by a lung function specialist. For this analysis airflow limitation was defined as a forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio less than the lower limit of normal (less than the fifth percentile) adjusted for age, sex, height and ethnic group using the Global Lung Initiative 2012 equations, which provide the most recent and most representative global estimates [17]. This conservative definition of airflow limitation is less likely to overdiagnose COPD in older patients compared with using a fixed-ratio definition [18].
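To illustrate how a lower-limit-of-normal definition can differ from a fixed ratio, the sketch below assumes the LMS form in which the Global Lung Initiative reference equations are usually expressed, where the fifth percentile is M×(1+L×S×z)^(1/L) with z≈−1.645. The L, M and S values used here are hypothetical placeholders, not GLI 2012 coefficients, which depend on age, sex, height and ethnic group.

```python
import math
from statistics import NormalDist

# z-score of the 5th percentile of a standard normal distribution (about -1.645)
Z_5TH_PERCENTILE = NormalDist().inv_cdf(0.05)


def lower_limit_of_normal(l: float, m: float, s: float) -> float:
    """Fifth percentile from LMS parameters (skewness L, median M, variation S)."""
    if l == 0:
        return m * math.exp(s * Z_5TH_PERCENTILE)
    return m * (1.0 + l * s * Z_5TH_PERCENTILE) ** (1.0 / l)


def airflow_limitation_lln(ratio: float, l: float, m: float, s: float) -> bool:
    """Airflow limitation defined as FEV1/FVC below the lower limit of normal."""
    return ratio < lower_limit_of_normal(l, m, s)


def airflow_limitation_fixed(ratio: float, threshold: float = 0.7) -> bool:
    """Airflow limitation defined by the fixed ratio FEV1/FVC < 0.7."""
    return ratio < threshold


# Hypothetical LMS values for an older adult (NOT real GLI 2012 coefficients).
example_lln = lower_limit_of_normal(l=1.0, m=0.78, s=0.08)
```

With these placeholder values the lower limit of normal falls below 0.7, so a measured ratio of 0.69 would count as airflow limitation under the fixed-ratio definition but not under the lower-limit-of-normal definition.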
Data extraction
Data (clinical codes; see supplementary table S2) were extracted from EHRs based on predictors identified as potentially important in our previous analysis [10], including demographic characteristics, smoking status, respiratory symptoms, comorbidities, lower respiratory tract infections (LRTIs), respiratory medication prescriptions and selected antibiotic use indicated for the treatment of LRTIs. Participants' residential postcodes were used to estimate SES using the Index of Multiple Deprivation, in which higher scores indicate higher levels of socioeconomic deprivation [19]. All data were stored on an encrypted database.
Sample size
Subjects with missing outcome (COPD) status (predominantly those invited but who did not attend a spirometry assessment) were excluded from the analysis (n=755). Data from 2398 subjects who returned a postal questionnaire were used for model development (development sample) and from 1097 subjects from the same set of practices who returned an opportunistic questionnaire for external validation (external validation sample) (figure 1). This nonrandom splitting of the data ensured the developed risk score could be validated in new data from a different part of the intended population [20]. Overall, 7.9% of subjects were newly diagnosed with COPD through the trial (198 in the development sample and 77 in the external validation sample). At least 10 outcome events are recommended per candidate predictor considered for inclusion in a logistic regression model [21]. With 198 events in the development sample, there was therefore sufficient power to consider up to 19 candidate predictors in the developed model.
Model development
The model was developed using multivariable logistic regression considering the following candidate predictors for inclusion: age, sex, most recent smoking status, history of asthma and LRTIs, complaints of cough, dyspnoea, wheeze and sputum, and prescriptions of salbutamol, prednisolone and antibiotics, within the previous 3 years. Since very little data (<1%) were missing for these candidate predictors, a complete-case analysis was performed [22]. We tested for interactions with a particular focus on age, sex and smoking status. The best-fitting terms for continuous variables were determined using fractional polynomial regression [23]. Predictors not statistically significant at the p<0.05 level were removed from the model (although age and smoking status were forced in because of their known clinical importance). The fit of the reduced model was then compared with the full model using a likelihood ratio test.
To improve the calibration of the model predictions and adjust for overfitting, the model's calibration slope coefficient was estimated in 1000 bootstrap samples to determine the shrinkage factor (the average calibration slope). This was multiplied against predictor coefficients in the developed model to produce the final model equation [24].
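The final shrinkage step can be sketched as follows, assuming the calibration slopes have already been estimated in each bootstrap sample (the model fitting itself is not shown). The function names and the numbers in the example are hypothetical. The intercept is deliberately left out: it is commonly re-estimated after shrinkage, but the text does not state how it was handled here.

```python
from statistics import mean


def shrinkage_factor(bootstrap_slopes):
    """Uniform shrinkage factor: the average calibration slope across bootstrap samples."""
    if not bootstrap_slopes:
        raise ValueError("at least one bootstrap calibration slope is required")
    return mean(bootstrap_slopes)


def shrink_coefficients(coefficients, factor):
    """Multiply each predictor coefficient (intercept excluded) by the shrinkage factor."""
    return {name: beta * factor for name, beta in coefficients.items()}


# Hypothetical values for illustration only; not estimates from the study.
slopes = [0.90, 0.95, 1.00]
unshrunk = {"current_smoker": 2.0, "dyspnoea": 1.0}
shrunk = shrink_coefficients(unshrunk, shrinkage_factor(slopes))
```

A shrinkage factor below 1 pulls every coefficient towards zero, which makes predictions less extreme and typically improves calibration in new data.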
Internal validation performance
The sensitivity and specificity of the predicted probabilities from the final risk score were plotted on a receiver operator characteristic (ROC) curve to examine the discrimination performance. The risk score was internally validated using bootstrap resampling (with 1000 replications) to estimate the c-statistic (area under the ROC curve) corrected for overfitting [25]. Calibration was assessed by grouping subjects into deciles of predicted risk and comparing the observed with the expected number diagnosed with COPD.
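The two performance measures described above can be sketched in plain Python. This is an illustrative implementation, not the Stata routines used in the analysis: the c-statistic is computed as the proportion of case/non-case pairs in which the case has the higher predicted risk (ties counting one half), and calibration is summarised as expected versus observed numbers of cases within groups of ascending predicted risk.

```python
def c_statistic(risks, outcomes):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for risk_case in cases:
        for risk_non_case in non_cases:
            if risk_case > risk_non_case:
                concordant += 1.0
            elif risk_case == risk_non_case:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(non_cases))


def calibration_by_group(risks, outcomes, n_groups=10):
    """Return (expected, observed) case counts per group of ascending predicted risk."""
    ranked = sorted(zip(risks, outcomes))
    n = len(ranked)
    table = []
    for g in range(n_groups):
        chunk = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            expected = sum(r for r, _ in chunk)  # sum of predicted probabilities
            observed = sum(y for _, y in chunk)  # number actually diagnosed
            table.append((expected, observed))
    return table
```

The pairwise loop is quadratic in sample size, which is acceptable for a few thousand subjects; a rank-based formula would be preferable for much larger datasets.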
External validation performance
The c-statistic and calibration of the final risk score were then assessed in the external validation sample. As a comparator, we also assessed the discrimination performance of our previously developed clinical score [10] in the external validation sample. This model included smoking status, history of asthma, LRTIs and prescriptions of salbutamol as predictors of undiagnosed COPD.
Sensitivity analysis
The final risk score was additionally validated in the external validation sample using a case definition that also included the presence of at least one chronic respiratory symptom, but required an alternative definition of airflow obstruction commonly used in clinical practice (FEV1/FVC <0.7) [26].
Preparing the risk score for clinical practice
To prepare the risk score for use as a screening tool, we evaluated cut-points for dichotomising the predicted probabilities into low and high risk. The sensitivity and specificity were calculated in the external validation sample across a range of cut-points, alongside the positive and negative predictive values, likelihood ratios, and number of diagnostic assessments needed to identify one individual with undiagnosed COPD. All analyses were performed using Stata version 13.1 (StataCorp, College Station, TX, USA).
Ethical approval
Ethical approval for the TargetCOPD trial was received from the Solihull Ethics Committee (Integrated Research Application System reference 11/WM/0403).
Results
Practice characteristics
Practice size varied, with the majority having a list size below 10 000 (supplementary table S3). Most practices served populations in socioeconomically deprived areas with a diverse range of ethnicities. The mean (range) prevalence of diagnosed COPD prior to the trial was 1.3% (0.8–2.9%).
Development sample: population characteristics
The development sample included 2398 individuals, of whom 198 (8.3%) were diagnosed with COPD during the study (figure 1). The mean age was 59.6 years, 51.6% were male and the majority (85.0%) were of white ethnicity. The majority (77.7%) of newly diagnosed COPD was mild (FEV1 % pred ≥80%), with 21.1% moderate (FEV1 % pred 50–79%), 1.0% severe (FEV1 % pred 30–49%) and 0.2% very severe (FEV1 % pred <30%).
Based on data extracted from EHRs (table 1), current smoking was significantly more common among participants with COPD than those without (32.8% versus 14.1%). There was also a higher prevalence of asthma, and a slightly higher prevalence of anxiety and depression, among those with COPD. However, the prevalence of other chronic conditions was similar in both groups. Documented cough, dyspnoea, sputum production, LRTIs and respiratory prescriptions were all also more common among individuals with COPD.
Individuals with unknown COPD status (predominantly those who did not attend an assessment) differed from those in the development sample across a number of demographic characteristics (supplementary table S4): they were generally younger (mean age 55.8 versus 59.6 years), and higher proportions were female (52.5% versus 48.4%) and current smokers (33.5% versus 15.8%).
Model results
Complete data for candidate predictors were available for 2380 patients (99.2%) in the development sample (table 2). The final model of EHR-recorded factors included smoking status, age, dyspnoea, prescriptions of salbutamol and prescriptions of antibiotics (table 3). Age was included as two fractional polynomial terms since it was not linear in the logit scale. The final model fitted as well as the full model (likelihood ratio test p=0.185) and no significant interactions were found. The final model equation was:

$$\text{predicted probability of undiagnosed COPD} = \frac{e^{x}}{1+e^{x}}$$

where

$$x = (1.43\times10^{-4}\times\text{age}^{3}) - (3.18\times10^{-5}\times\text{age}^{3}\times\ln[\text{age}]) + (0.51\times\text{ex-smoker [Y/N]}) + (1.60\times\text{current smoker [Y/N]}) + (0.72\times\text{dyspnoea [Y/N]}) + (0.045\times\text{number of salbutamol prescriptions}) + (0.99\times\text{salbutamol prescriptions [Y/N]}) + (0.47\times\text{antibiotic prescriptions [Y/N]}) - 6.16$$

with Y=yes (value=1) or N=no (value=0).
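The model equation above can be sketched in code as follows. The function and parameter names are hypothetical, and the coefficients are the rounded values printed in the text, so results may differ slightly from the original implementation. It is assumed here that the salbutamol Y/N term is 1 whenever the prescription count is above zero. This is an illustration, not a validated clinical tool.

```python
import math


def target_copd_risk(
    age: float,
    ex_smoker: bool,
    current_smoker: bool,
    dyspnoea: bool,
    salbutamol_prescriptions: int,
    antibiotic_prescription: bool,
) -> float:
    """Predicted probability of undiagnosed COPD from the published equation.

    The score was developed in people aged 40-79 years; values outside that
    range are extrapolations.
    """
    age_cubed = age ** 3
    x = (
        1.43e-4 * age_cubed
        - 3.18e-5 * age_cubed * math.log(age)
        + 0.51 * ex_smoker
        + 1.60 * current_smoker
        + 0.72 * dyspnoea
        + 0.045 * salbutamol_prescriptions
        + 0.99 * (salbutamol_prescriptions > 0)  # assumed: Y/N derived from the count
        + 0.47 * antibiotic_prescription
        - 6.16
    )
    return 1.0 / (1.0 + math.exp(-x))  # same as e^x / (1 + e^x)


def is_high_risk(probability: float, cut_point: float = 0.075) -> bool:
    """Classify as high risk at the >=7.5% cut-point discussed in the paper."""
    return probability >= cut_point
```

For example, `target_copd_risk(60, False, False, False, 0, False)` describes a 60-year-old never-smoker with no recorded dyspnoea or relevant prescriptions, whose predicted risk falls below the 7.5% cut-point, whereas the same person recorded as a current smoker with dyspnoea falls well above it.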
Internal validation
When applied to the development sample the apparent c-statistic was 0.76 (95% CI 0.73–0.80) and after correcting for overfitting using bootstrapping was 0.76 (95% CI 0.72–0.79). Although smoking status and age were the most important predictors in the risk score, restricting it to just these variables reduced the c-statistic to 0.65 (95% CI 0.60–0.69).
External validation sample: population characteristics
Among 1097 subjects in the external validation population, 77 (7.0%) were newly diagnosed with COPD (supplementary table S5). The mean age was 60.1 years and 51.6% were male, similar to the development sample. Again, a significantly greater proportion of subjects with COPD were current smokers (31.2% versus 17.1%). However, participants in the external validation sample had a slightly higher SES. 1083 subjects (98.7%) had complete data on all candidate predictors and were included in the external validation.
External validation: risk score performance
The developed risk score demonstrated similar discrimination characteristics when applied to the external validation sample (c-statistic 0.74, 95% CI 0.68–0.80) and performed better than our previously developed clinical score (c-statistic 0.70, 95% CI 0.64–0.76) in the external validation sample (figure 2) [10]. The final risk score showed excellent calibration of observed to predicted COPD risk up to 10%, but slightly overestimated the predicted risk from 10% to 30%, beyond which comparisons were unreliable due to small sample sizes (table 4). When using the fixed-ratio definition of airflow limitation (FEV1/FVC <0.7), the c-statistic for the final risk score remained at 0.74 (95% CI 0.70–0.78).
Implementation in clinical practice
Increasing the cut-point to define high risk reduces the number of assessments needed for each new diagnosis of COPD, although accompanied by a reduction in sensitivity (table 5). The optimum cut-point should balance both sensitivity and specificity, taking into consideration costs and resource availability. At a cut-point of 7.5% (i.e. classing subjects with a predicted risk ≥7.5% as high risk), which would represent 33.9% of the target population, the risk score is estimated in the external validation sample to have a sensitivity of 68.8% (95% CI 57.3–78.9%) and a specificity of 68.8% (95% CI 65.8–71.6%), and would require 7 (95% CI 6–10) patients to undergo a diagnostic assessment to identify one with COPD.
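The relationship between these quantities can be checked with a small worked example. The 2×2 counts below are not reported in the text; they are back-calculated from the reported percentages (77 cases, 1083 subjects with complete data, 33.9% classed as high risk) and should be treated as an illustrative assumption.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Screening performance measures from a 2x2 table at a single cut-point."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    flagged = tp + fp  # subjects classed as high risk and sent for assessment
    total = tp + fn + fp + tn
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "high_risk_fraction": flagged / total,
        "ppv": tp / flagged,
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # assessments needed to identify one case is the reciprocal of the PPV
        "assessments_per_case": flagged / tp,
    }


# Illustrative counts back-calculated from the reported percentages (assumption).
ILLUSTRATIVE_COUNTS = dict(tp=53, fn=24, fp=314, tn=692)
metrics = diagnostic_metrics(**ILLUSTRATIVE_COUNTS)
```

Because the number of assessments per case is the reciprocal of the positive predictive value, raising the cut-point lowers this number only as long as the positive predictive value rises, and it does so at the cost of missing more cases.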
Discussion
Principal findings
We have developed and externally validated the TargetCOPD score from a large case-finding trial in primary care [6, 7] to predict the risk of undiagnosed COPD using routine data from EHRs. The risk score incorporates five factors commonly recorded in health records: age, smoking status, presence of dyspnoea, prescriptions of salbutamol and prescriptions of antibiotics commonly prescribed for LRTIs. When externally validated, the risk score discriminated between patients with and without COPD, and performed better than our previously developed score [10], which relied on incident COPD from routine records rather than actively case-found patients. The risk score also performed similarly when using the fixed-ratio definition of airflow limitation. In our newly developed risk score, a cut-point of ≥7.5% would be expected to identify approximately 69% of patients with undiagnosed COPD (sensitivity 68.8% in the external validation sample), requiring seven diagnostic assessments for each new diagnosis. Use of higher cut-points could reduce this number at the expense of reducing sensitivity.
Comparison with existing literature
Several other risk scores have previously been developed for undiagnosed COPD, although the TargetCOPD score is the only one to use case-found COPD patients (table 6). As with other scores, our own previous risk score used newly diagnosed COPD patients, identified through routine care [10]. Its final predictors differed from the TargetCOPD score, including LRTIs and history of asthma but not history of dyspnoea or prescriptions of antibiotics as predictors. Furthermore, the TargetCOPD score overcomes an important limitation of our previous risk score, where we could not include the effect of age as it was a matching factor (although it is well established that risk of COPD increases with age) [27]. A history of asthma and LRTIs did not remain statistically significant in the full multivariable model in the current analysis. However, prescriptions of salbutamol and antibiotics are closely associated with asthma and LRTIs, respectively, and are possibly better documented in EHRs; therefore, they may be reflecting similar clinical features.
Kotz et al. [9] also recently developed and internally validated a prediction model for COPD using routine longitudinal data from general practices in Scotland. Their model included age, smoking status, history of asthma and also socioeconomic deprivation, but only considered a limited range of risk factors and was not externally validated. Their model, like our previous clinical score, was developed on incident cases of COPD diagnosed through routine care, the disease status of which may have been misclassified because of underdiagnosis [2] and misdiagnosis [28]. Other risk scores have also been developed for COPD using routine primary and secondary healthcare data [29–31], but are unlikely to be applicable in primary care due to the predictors included, many of which are not routinely recorded solely in primary care records (table 6).
A number of other case-finding tools have also been developed and evaluated including screening questionnaires and handheld flowmeters [32–34]. However, these require additional resources and patient interactions, and are likely to be less efficient than the use of automated risk prediction scores.
Strengths
We investigated a range of risk factors and developed and validated our risk score on a population with no prior diagnosis of COPD that were actively case-found in a wide range of general practices. We employed a robust case definition which is likely to be representative of clinically significant, undiagnosed COPD and confirmed with quality-assured spirometry. The developed risk score was externally validated, increasing the likelihood of its validity in other primary care populations, although further external validation is needed on populations from a different location. The final risk score incorporates a small number of commonly recorded factors from EHRs that should ensure its applicability in routine primary care in the UK and similar health systems. However, it would be more challenging to implement in health systems that use paper-based health records or where electronic records are less detailed.
Limitations
We used a smaller sample size than several other studies reporting the development of COPD risk scores from routine healthcare data [9, 10, 29]. Although the study was adequately powered for the number of risk factors considered for the model selection, a larger sample size would have enabled estimation of the parameters with greater precision. Ideally a larger sample size would have been used for external validation (simulation-based estimates suggest at least 100 outcome events are required [35]) and would have improved our ability to evaluate the score calibration.
The case definition of COPD used in this study was the presence of relevant self-reported symptoms in addition to airflow limitation, and patients who did not report symptoms were not assessed with spirometry. However, patients may underreport symptoms and compensate for them by limiting their activities. This could have introduced misclassification bias. Furthermore, only 25.7% of all eligible patients responded to the screening questionnaire, which could have introduced response bias and may limit the generalisability of the score. However, this response rate is similar to the average response rate to questionnaires seen in other case-finding studies [33] and, because of the pragmatic nature of the trial, is likely to represent patients who might respond to screening invitations in real clinical practice.
Finally, the validity of our risk score among all potential subjects could not be determined because we were not able to include those with unknown COPD status and their characteristics differed from those included in our analysis across a number of demographic characteristics. However, our risk score is applicable to populations of individuals that are likely to respond to questionnaire surveys and are willing to attend subsequent clinical assessment.
Implications for clinicians, policy makers and research
The TargetCOPD score has been developed to help primary care services stratify patients according to their risk of undiagnosed COPD for targeted systematic case-finding (supplementary figure S1). The US Preventive Services Task Force recently recommended against screening for asymptomatic COPD on the basis that there was no evidence that it improves health-related quality of life, morbidity or mortality [4]. By contrast, the TargetCOPD score has been developed from patients with symptomatic and spirometry-verified disease who are more likely to benefit from treatment.
The score's estimate of the probability of undiagnosed COPD could be used to risk-stratify patients and to help prioritise referral for diagnostic assessment, including spirometry, or for further screening (e.g. using handheld flowmeters). General practitioners could decide on a cut-point which reflects the resources available to them for conducting high-quality spirometry, balancing sensitivity and specificity. Since it relies entirely on routinely recorded data from EHRs, it could be integrated with clinical information systems by programming the model into these digital platforms. This would be applicable in countries with primary care clinical information systems similar to the UK, such as in a number of Western European countries, Israel, the USA, New Zealand, Australia and Canada [36, 37].
Finally, the TargetCOPD score should be externally validated in other primary care populations to better assess its generalisability, and its effectiveness in practice evaluated in RCTs, where the impact of using the risk score on patient outcomes can be evaluated as well as the associated costs [38]. This could include a cluster RCT comparing clinical outcomes (e.g. quality of life, hospitalisation and mortality) in practices that use the risk score to actively case find patients with undiagnosed COPD against practices that continue with alternative approaches to case-finding and usual care.
Conclusions
We have developed and externally validated the TargetCOPD score for assessing the risk of undiagnosed COPD among patients in primary care using routine data from EHRs. This is the first risk score for COPD that has been derived from patients identified through systematic case-finding and uses routine healthcare data readily available in many primary care settings. It could be used to help identify patients at high risk of COPD to provide appropriate clinical care, including earlier testing and treatment. The risk score should be externally validated in further populations, and its impact on clinical care and outcomes evaluated in RCTs.
Acknowledgements
We would like to thank Nicola Adderley (University of Birmingham, Birmingham, UK) for greatly assisting with data collection, Alexandra Enocson (University of Birmingham, Birmingham, UK), Andy Dickens (University of Birmingham, Birmingham, UK), Joanne O'Beirne-Elliman (University of Warwick, Coventry, UK) and the Birmingham Lung Improvement Studies (BLISS) Team (University of Birmingham, Birmingham, UK) for the successful execution of the TargetCOPD trial, Martin Miller (University of Birmingham, Birmingham, UK) for reviewing all spirometry readings and providing expert guidance on lung function assessment, and Kym Snell (Keele University, Keele, UK) for advice on bootstrap validation. Most of all we would like to thank all the patients, including those in the BLISS patient advisory group, who participated in the TargetCOPD trial.
Author contributions: S. Haroon wrote the final protocol, collated the data, conducted the analysis and wrote the report. R.E. Jordan, P. Adab and R.D. Riley conceived the study, wrote the initial protocol, and advised on the conduct of the study, analysis and drafting of the manuscript. R.D. Riley provided specialist statistical support. R.E. Jordan, P. Adab and D. Fitzmaurice advised on the drafting of the manuscript. R.E. Jordan, P. Adab and D. Fitzmaurice are principal investigators on the TargetCOPD trial. All authors commented on and approved the final manuscript. P. Adab is the guarantor. P. Adab affirms that the manuscript is an honest, accurate and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Footnotes
This article has supplementary material available from erj.ersjournals.com
Support statement: This paper presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Dept of Health. S. Haroon was funded by a NIHR doctoral fellowship (DRF-2011-04-064). R.E. Jordan was funded by an NIHR post-doctoral fellowship (pdf/01/2008/023). P. Adab and R.E. Jordan are both principal investigators on an NIHR programme grant for investigating COPD in primary care (RP-PG-0109-10061). Funding information for this article has been deposited with the Crossref Funder Registry.
Conflict of interest: Disclosures can be found alongside this article at erj.ersjournals.com
- Received November 7, 2016.
- Accepted March 2, 2017.
- Copyright ©ERS 2017