Abstract
The COPD assessment test (CAT) is a self-administered questionnaire that measures health-related quality of life. We aimed to systematically evaluate the literature for reliability, validity, responsiveness and minimum clinically important difference (MCID) of the CAT.
Multiple databases were searched for studies analysing the psychometric properties of the CAT in adults with chronic obstructive pulmonary disease. Two reviewers independently screened, selected and extracted data, and assessed methodological quality of relevant studies using the COSMIN checklist.
From 792 records identified, 36 studies were included. The number of participants ranged from 45 to 6469, mean age from 56 to 73 years, and mean forced expiratory volume in 1 s from 39% to 98% predicted. Internal consistency (Cronbach’s α) was 0.85–0.98, and test–retest reliability (intraclass correlation coefficient) was 0.80–0.96. Pearson’s correlation coefficients for convergent and longitudinal validity, respectively, were 0.69–0.82 and 0.63 with the SGRQ-C, 0.68–0.78 and 0.60 with the CCQ, and 0.29–0.61 and 0.20 with the mMRC. Scores differed across GOLD stages, exacerbation status and mMRC grades. Mean scores decreased with pulmonary rehabilitation (by 2.2–3 units) and increased at exacerbation onset (by 4.7 units). Only one study with adequate methodology reported an MCID: 2 units using an anchor-based approach and 3.3–3.8 units using a distribution-based approach. Most studies were of fair methodological quality.
We conclude that the studies support the reliability and validity of the CAT and indicate that the tool is responsive to interventions, although the MCID remains debatable.
Studies support the reliability, validity and responsiveness of the CAT as a HRQoL tool but its MCID remains unclear http://ow.ly/xkVNA
Introduction
Establishing a diagnosis of chronic obstructive pulmonary disease (COPD) requires spirometry; however, recent guidelines suggest that classifying COPD solely by forced expiratory volume in 1 s (FEV1) % predicted is inadequate for describing disease severity [1]. Assessing a patient’s health-related quality of life (HRQoL) allows clinicians to make individualised management decisions; thus, the Global Initiative for Chronic Obstructive Lung Disease (GOLD) strategy document advocates that COPD management no longer be stratified solely by spirometric classification, but through a multidimensional assessment of specific patient attributes [2, 3].
COPD-specific questionnaires assessing HRQoL do exist (e.g. the St George’s Respiratory Questionnaire (SGRQ) or the Chronic Respiratory Questionnaire (CRQ)), although some are impractical for clinical use because they are time-consuming [2]. GOLD consequently proposes using either the modified British Medical Research Council (mMRC) dyspnoea scale or the COPD assessment test (CAT); however, the CAT is preferred because it provides thorough coverage of the impact of COPD on wellbeing [2].
The CAT was created using COPD patients’ input, then developed using modern questionnaire methodology: psychometric analysis and item response theory using Rasch analysis identified items with the best fit to form a unidimensional instrument [4, 5]. The self-administered questionnaire consists of eight items assessing various manifestations of COPD aiming to provide a simple quantified measure of HRQoL [5]. A preliminary evaluation of the CAT’s psychometric properties has been promising [5]. Summarising the current knowledge on the performance of this tool as a HRQoL measurement instrument is valuable, as the test could have important roles in COPD clinical practice and research. To the best of our knowledge, a comprehensive review of the psychometric properties of the CAT questionnaire has not been conducted.
Our objectives for this review were to systematically search the literature to evaluate and summarise the psychometric properties of the CAT (reliability, validity, responsiveness and minimum clinically important difference (MCID)) as a HRQoL instrument used in patients with COPD.
Methods
Detailed descriptions of the psychometric properties assessed in the review and the completed Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist can be found in the online supplementary material.
Eligibility criteria for study selection
Randomised controlled trials and observational studies (e.g. cross-sectional and cohort studies) with >10 participants were included. Participants had to be aged ≥40 years and diagnosed with COPD according to the GOLD criteria [2]. Studies could evaluate any intervention, placebo, usual care or the passage of time. The outcomes evaluated were the reliability, validity, responsiveness and MCID of the CAT; studies reporting at least one of these psychometric properties were included. Detailed inclusion and exclusion criteria can be found in the online supplementary material.
Information sources and search
A search was conducted on March 10, 2014 in the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects and PubMed Clinical Queries to identify previous systematic reviews on the subject. The only review retrieved addressed several HRQoL instruments, had a broad review question and used a limited search strategy [6].
A structured search was performed on March 10, 2014 in five electronic general databases: Cochrane Central, PubMed Medline, OvidSP Medline, OvidSP Embase and Thomson Reuters ISI Web of Knowledge Web of Science. The database searches were done from the year 2009 onwards as this was the year that the CAT was developed. We used a variety of search terms, including text words and database-specific subject headings, for articles in English, French or Spanish. The detailed, database-specific search strategies can be found in the online supplementary material. Top-ranked respiratory journals and the ProQuest Dissertations & Theses database were manually searched. Reference lists from existing narrative reviews of the CAT were searched for potential studies, as were bibliographies of all included studies.
Study selection, data collection process and data items
Two reviewers (Nisha Gupta and Lancelot M. Pinto) independently screened the title and abstract of each study identified by the search. All potentially relevant articles were then retrieved in full text, and the same two reviewers performed a secondary screen based on the full text. Disagreements on the inclusion or exclusion of a specific study were resolved by consensus or, when necessary, by a third reviewer (Jean Bourbeau). A list of excluded studies and the reasons for their exclusion was maintained.
Data were electronically extracted from each eligible study using a piloted data extraction form. The form was revised and improved after pilot data extractions were performed to assess concordance between the reviewers.
Two reviewers (Nisha Gupta and Andreea Morogan) independently extracted data from each included study, including study characteristics, population characteristics, interventions and/or events and outcomes studied, along with the corresponding measures of test performance. The data extraction form can be found in the online supplementary material.
Quality assessment of included studies
Two reviewers (Nisha Gupta and Andreea Morogan) independently performed quality assessment for each study using the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) checklist, a validated tool for evaluating the methodological quality of studies assessing the psychometric properties of an instrument; it is the only checklist specifically designed for the methodological evaluation of psychometric properties of patient-reported outcomes [7, 8]. The methodological quality of each psychometric property was evaluated through a number of items, each scored on a four-point scale: “excellent”, “good”, “fair” and “poor”. An overall methodological quality score was given to each study for each psychometric property by taking the lowest rating of any item (“worst score counts” method) [7]. The list of items, the scoring rules and the psychometric properties to which the checklist applies can be found in the online supplementary material.
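The “worst score counts” rule is simple enough to state as code. The minimal Python sketch below assumes that each checklist item for a given psychometric property has already been rated on the four-point COSMIN scale; the function `worst_score_counts` and the example ratings are hypothetical illustrations, not part of the COSMIN materials or this review’s data.

```python
# COSMIN four-point rating scale, ordered from worst to best.
RATINGS = ("poor", "fair", "good", "excellent")


def worst_score_counts(item_ratings):
    """Overall rating for one psychometric property:
    the lowest rating given to any of its checklist items."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    for rating in item_ratings:
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    # Position in RATINGS defines the order, so min() picks the worst rating.
    return min(item_ratings, key=RATINGS.index)


# Hypothetical item ratings for one study, grouped by psychometric property.
study = {
    "internal consistency": ["excellent", "good", "good"],
    "test-retest": ["good", "fair", "excellent"],
}
overall = {prop: worst_score_counts(r) for prop, r in study.items()}
print(overall)  # {'internal consistency': 'good', 'test-retest': 'fair'}
```

A single “fair” item is enough to pull the whole property down to “fair”, which is why this rule tends to produce conservative overall ratings.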
Synthesis of results
A narrative synthesis was used to summarise current knowledge on the CAT’s reliability, validity, responsiveness and MCID. Data were tabulated to compare the studies with respect to study characteristics, population characteristics and CAT psychometric properties. Results were synthesised across studies reporting comparable outcomes, noting the specific statistical tests used. The strength and adequacy of each psychometric property were assessed by computing the range (minimum to maximum) of study results. The impact of methodological quality on study results was discussed to provide some assessment of quality and heterogeneity between the studies. All analyses were performed using STATA 11 (Stata Corporation, College Station, TX, USA).
Risk of bias across studies
Language bias was assessed by running the search strategy both with and without language filters (English, French and Spanish). For each of the five general databases, the number of filtered citations was expressed as a percentage of all citations retrieved, and this percentage was reported as an average across the five databases.
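The language-bias calculation can be sketched as follows. This is a minimal Python illustration under stated assumptions: the percentage is computed per database and then averaged with equal weight across databases (the text does not say whether the average was weighted), and the function name `mean_language_share` and the citation counts are hypothetical.

```python
def mean_language_share(counts):
    """counts maps a database name to (filtered, unfiltered) citation counts.

    Returns the unweighted mean, across databases, of filtered citations as a
    percentage of all citations retrieved from that database.
    """
    if not counts:
        raise ValueError("no databases supplied")
    shares = []
    for database, (filtered, unfiltered) in counts.items():
        if unfiltered <= 0 or not 0 <= filtered <= unfiltered:
            raise ValueError(f"invalid counts for {database}")
        shares.append(100.0 * filtered / unfiltered)
    return sum(shares) / len(shares)


# Hypothetical counts for two databases (not the review's data).
example = {"database A": (95, 100), "database B": (180, 200)}
print(mean_language_share(example))  # 92.5
```

An unweighted mean gives each database equal influence; pooling the raw counts instead would let the largest database dominate the figure.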
Results
Study selection
Figure 1 shows the study selection procedure and the numbers of studies screened, assessed for eligibility and included in the review. A total of 36 articles were included in the qualitative synthesis.
Figure 1. Summary of literature search and study selection.
Study characteristics
Table 1 summarises the study and population characteristics of the included studies according to the number of outcomes assessed: nine (25%) studies assessed reliability (internal consistency and test–retest), 32 (89%) studies assessed validity (concurrent, convergent, longitudinal and known groups validity), ten (28%) studies assessed responsiveness and four (11%) studies assessed MCID [5, 9–43].
The CAT was administered in 32 countries spanning Europe, North America, South America, Asia and Africa, with 17 studies published in 2012 and 11 studies published in 2013. Of the 36 studies, 16 were prospective cohorts and the remaining 20 were cross-sectional. Types of interventions or events evaluated in the prospective cohorts included pulmonary rehabilitation, onset of an acute exacerbation, recovery from an acute exacerbation and usual care; and the duration of follow-up ranged from 2 to 24 weeks. The number of participants ranged between 45 and 6469 with the percentage of female subjects between 0% and 64.4%. The range for mean age was between 55.9 and 73.0 years and mean FEV1 between 38.7% and 98.0% predicted. 23 studies specified the number of individuals in GOLD grades.
Nonresponse rate and floor and ceiling effects
Only six studies reported the proportions of the population with complete CAT scores (no missing items) and those with missing items [9, 15, 17, 21, 26, 41] (table 2). The percentage of patients with the minimum (floor effect) and maximum (ceiling effect) possible total score was measured in two populations [9, 36] (table 2).
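The completeness and floor/ceiling summaries can be illustrated with a short Python sketch. It relies on the standard CAT scoring (eight items, each scored 0–5, giving a total of 0–40) and assumes, for illustration only, that a questionnaire with any missing item has no total score and that floor and ceiling percentages are computed among complete questionnaires; the function names and responses are hypothetical.

```python
CAT_ITEMS = 8
ITEM_MIN, ITEM_MAX = 0, 5  # each item is scored from 0 to 5
TOTAL_MIN, TOTAL_MAX = CAT_ITEMS * ITEM_MIN, CAT_ITEMS * ITEM_MAX  # 0 and 40


def cat_total(items):
    """Sum of the eight item scores, or None if any item is missing (None).

    Hypothetical rule: missing items are not imputed.
    """
    if len(items) != CAT_ITEMS:
        raise ValueError("expected 8 item responses")
    if any(x is None for x in items):
        return None
    if any(not ITEM_MIN <= x <= ITEM_MAX for x in items):
        raise ValueError("item score outside 0-5")
    return sum(items)


def percent(count, total):
    return 100.0 * count / total


def summarise(questionnaires):
    """% of questionnaires complete, and % of complete totals at floor/ceiling."""
    totals = [cat_total(q) for q in questionnaires]
    complete = [t for t in totals if t is not None]
    if not complete:
        raise ValueError("no complete questionnaires")
    return {
        "complete": percent(len(complete), len(totals)),
        "floor": percent(sum(t == TOTAL_MIN for t in complete), len(complete)),
        "ceiling": percent(sum(t == TOTAL_MAX for t in complete), len(complete)),
    }


# Hypothetical responses: one at the floor, one at the ceiling,
# one intermediate and one with a missing item.
responses = [
    [0] * 8,
    [5] * 8,
    [2, 3, 1, 4, 2, 3, 1, 2],
    [1, None, 2, 3, 1, 2, 0, 1],
]
print(summarise(responses))
```

Here three of four questionnaires are complete (75%), and one third of the complete totals sit at each extreme.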
Reliability
The internal consistency of the CAT was reported in eight studies, with Cronbach’s α ranging from 0.85 to 0.98, indicating high correlation between items [5, 9, 10, 14, 15, 18, 19, 21] (table 2). Test–retest reliability was evaluated in five studies to measure reproducibility [5, 9, 10, 14, 30] (table 2): the CAT was administered on two occasions (at baseline and then either 1 or 2 weeks later) in three studies [5, 10, 30] and on three occasions (at baseline, 2 weeks and then 6 weeks later) in one study [14]. The intraclass correlation coefficient (ICC) ranged from 0.80 to 0.96, demonstrating that the CAT produces consistent scores when administered repeatedly in patients with stable disease.
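For readers less familiar with these coefficients, a minimal Python sketch of both follows. Cronbach’s α for k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The text does not state which ICC model the included studies used, so the sketch assumes one common form, ICC(2,1) (two-way random effects, absolute agreement, single measure); other ICC forms give different values. The function names and example data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha. responses[i][j] is patient i's score on item j."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need >= 2 items and the same number for every patient")
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is patient i's total CAT score on occasion j.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need >= 2 patients and >= 2 occasions")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]        # per-patient means
    col_means = [mean(col) for col in zip(*scores)]  # per-occasion means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For example, `cronbach_alpha([[1, 1], [2, 3], [3, 2]])` evaluates to 2/3 and `icc_2_1([[1, 1], [2, 3], [3, 2]])` to 0.6, up to floating-point rounding. Real CAT data would have eight item columns for α and one column per administration for the ICC.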
Validity
Three studies measured concurrent validity by comparing the total CAT score with healthcare utilisation [15, 34, 39]. The number of physician consultations was associated with total CAT scores in one study (p<0.001) [34], but not in the other two [15, 39]. The number of hospitalisations was positively associated with the CAT score, whether it was analysed as a continuous total score (p<0.001) [15] or arbitrarily dichotomised (e.g. <10 and ≥10) (p<0.001 [34] and p<0.032 [39]). Likewise, the number of emergency room visits was associated with total CAT scores in two studies (p<0.001 [15] and p<0.001 [34]), but not in the third [39].
Convergent validity was assessed in 21 studies in which the CAT was compared with other questionnaires [5, 9–15, 17–20, 22, 23, 25, 27–29, 31, 41, 42] (table 3). Patients were in a stable state when this property was measured, apart from studies that did not report disease state. Longitudinal validity of the CAT was reported in six studies, in which the interventions or events were pulmonary rehabilitation or recovery from an acute exacerbation [9, 11–13, 16, 24] (table 3).
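Convergent and longitudinal validity were summarised with Pearson’s correlation coefficient (see the abstract). The Python sketch below computes r directly; in Python 3.10 and later, `statistics.correlation` gives the same value. It assumes, as is common but not stated in the text, that longitudinal validity correlates change scores (follow-up minus baseline) on the CAT and the comparator questionnaire. Function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def change_scores(before, after):
    """Follow-up minus baseline, patient by patient."""
    if len(before) != len(after):
        raise ValueError("baseline and follow-up lists differ in length")
    return [post - pre for pre, post in zip(before, after)]


# Hypothetical paired scores for five patients.
cat_pre, cat_post = [18, 22, 25, 14, 30], [15, 21, 20, 13, 26]
sgrq_pre, sgrq_post = [45, 52, 60, 38, 70], [40, 50, 51, 37, 64]

convergent = pearson_r(cat_pre, sgrq_pre)  # cross-sectional, at baseline
longitudinal = pearson_r(change_scores(cat_pre, cat_post),
                         change_scores(sgrq_pre, sgrq_post))
print(round(convergent, 2), round(longitudinal, 2))
```

Convergent validity uses scores measured at one time point, whereas longitudinal validity asks whether changes on the two instruments move together.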
19 studies reported known groups validity, and the groups compared varied between studies [5, 9, 11, 14, 17, 20–23, 25–29, 33, 36, 37, 40, 43] (table 4). The CAT score differed significantly (p<0.05) between the following groups: COPD GOLD grades [9, 14, 17, 20, 22, 25, 27, 37]; primary care physician-rated COPD GOLD grades [20, 22]; healthy individuals versus individuals diagnosed with COPD [21, 28, 33, 36, 40]; infrequent versus frequent exacerbators (defined as no acute exacerbation in the last 6 months versus acute exacerbation in the last 6 months; 0–1, 2–4 or >4 exacerbations per year; and <2 or ≥2 exacerbations per year) [9, 23, 26, 33, 43]; exacerbation versus stable state [5, 9, 20]; body mass index (BMI) categories (BMI <18.5 kg·m−2, BMI ≥18.5 and <23 kg·m−2, or BMI ≥23 kg·m−2) [25]; and mMRC grades [9, 27, 37]. The CAT score did not differ significantly (p≥0.05) by: sex [11, 20, 21, 29]; age (≤65 years versus >65 years) [20, 29]; smoking status (current smokers versus nonsmokers) [23]; or number of comorbidities (0, 1–2 or ≥3) [20, 25].
Responsiveness
Ten studies examined the responsiveness of the CAT (table 5) [9, 11–14, 16, 24, 26, 32, 38]. The CAT was responsive to pulmonary rehabilitation in four studies, with mean changes in CAT score ranging from −3.0 to −2.2 units at the end of the intervention [11, 12, 16, 24]. Most patients improved with pulmonary rehabilitation, whether the programme lasted 8 weeks [11, 16, 24] or 6 weeks [12]. One study reassessed responsiveness from the end of an 8-week programme to 6 months later and found that the total CAT score had deteriorated slightly since the end of rehabilitation [16].
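The responsiveness results above are reported as mean change in CAT score. A minimal Python sketch of that calculation follows; it also computes the standardised response mean (SRM, the mean change divided by the standard deviation of the changes), a common responsiveness index that is added here for illustration and is not reported in the text. Because lower CAT scores indicate better health, a negative change counts as improvement. The function name and data are hypothetical.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Mean change, standardised response mean (SRM) and % of patients improved.

    Lower CAT scores indicate better health, so a negative change is improvement.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired baseline/follow-up scores for >= 2 patients")
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    mean_change = mean(changes)
    sd_change = stdev(changes)
    return {
        "mean_change": mean_change,
        # SRM is undefined when every patient changed by the same amount.
        "srm": mean_change / sd_change if sd_change else float("nan"),
        "pct_improved": 100.0 * sum(c < 0 for c in changes) / len(changes),
    }


# Hypothetical scores before and after a rehabilitation programme.
print(responsiveness([20, 18, 25], [17, 16, 21]))
```

With these hypothetical scores the changes are −3, −2 and −4, giving a mean change of −3 units, an SRM of −3.0 and 100% of patients improved.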
Most patients’ total CAT scores deteriorated at the onset of an exacerbation: the mean CAT score increased by 4.7 units (p<0.001) [26]. With recovery from the exacerbation on treatment, however, CAT scores improved over 2, 4, 6 or 12 weeks [9, 12, 13, 32, 38] (table 5). Patients with and without depressive symptoms both improved their CAT scores during exacerbation recovery, but those without depressive symptoms improved more over 6 weeks [38].
Minimum clinically important difference
Four studies attempted to determine the MCID of the CAT [11, 12, 14, 35]. Three used an anchor-based approach. Of these, two found that the categories of their external anchor (e.g. much better, a little better, no different or a little worse; responders or nonresponders) were not used by equal proportions of patients, so an MCID could not be determined [11, 12]. The third identified a decrease of 2 units as an MCID estimate [35]. Two studies used a distribution-based approach, estimating the MCID at 3.76 units [14] and at a decrease of 3.3–3.8 units [35].
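The two MCID approaches can be contrasted in a short Python sketch. The text does not specify which distribution-based formulas the studies used; the sketch assumes two common ones, half the baseline standard deviation and the standard error of measurement (SEM = SD × √(1 − r), with r a reliability coefficient such as the test–retest ICC). The anchor-based estimate is taken, for illustration, as the mean CAT change among patients answering “a little better” on an external anchor question; the category label, function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mcid_half_sd(baseline_scores):
    """Distribution-based estimate: half the baseline standard deviation."""
    return 0.5 * stdev(baseline_scores)


def mcid_sem(baseline_scores, reliability):
    """Distribution-based estimate: standard error of measurement,
    SD * sqrt(1 - r), where r is a reliability coefficient (e.g. an ICC)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return stdev(baseline_scores) * sqrt(1 - reliability)


def mcid_anchor(changes, anchor, category="a little better"):
    """Anchor-based estimate: mean CAT change among patients whose answer on
    an external anchor question falls in the chosen category."""
    if len(changes) != len(anchor):
        raise ValueError("changes and anchor answers differ in length")
    selected = [c for c, a in zip(changes, anchor) if a == category]
    if not selected:
        raise ValueError(f"no patients in anchor category {category!r}")
    return mean(selected)


# Hypothetical data for four patients.
changes = [-3, -1, 0, -2]
anchor = ["a little better", "no different", "no different", "a little better"]
print(mcid_anchor(changes, anchor))  # -2.5
```

For baseline scores [14, 16, 18] (SD 2), both `mcid_half_sd` and `mcid_sem(..., 0.75)` return 1.0. The anchor-based estimate depends on enough patients choosing the relevant category, which is the difficulty reported in [11, 12]; the sketch raises an error when none do.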
Risk of bias within and across studies
According to the COSMIN checklist, the methodological quality of most studies was rated fair (30 (83%) studies); one was rated poor, four good and one excellent. Internal consistency, test–retest reliability and convergent and known groups validity were evaluated in studies of fair and good methodological quality; longitudinal validity and responsiveness in studies of fair methodological quality; MCID in studies of fair and excellent methodological quality; and concurrent validity in studies of poor and fair methodological quality. Language bias was minimal across the five general databases: on average, citations in English, French or Spanish accounted for 95.4% of all citations retrieved.
Discussion
The goal in designing a HRQoL tool is for it to measure HRQoL accurately and reliably, and this review assesses how well the CAT meets that goal. Although several articles have been published on the CAT, this is, to the best of our knowledge, the first study to systematically review the available literature evaluating the CAT’s psychometric properties in a defined population of patients with COPD.
The psychometric properties of the CAT are acceptable and favourable. The CAT is reliable: the interrelatedness of its eight items indicates high internal consistency, while the stability of total CAT scores across repeated administrations confirms its reproducibility over time. Furthermore, the CAT demonstrates good construct validity through convergent, longitudinal and known groups validity. The CAT is also responsive and able to detect changes in score over time: scores improved with pulmonary rehabilitation and with exacerbation recovery on treatment, and deteriorated with the onset of an exacerbation. Only one study of adequate methodological quality identified an MCID estimate, a decrease of 2 units, using an anchor-based approach.
Most studies (83%) did not report missing data on the CAT, and some analysed only subjects with complete data, excluding a substantial proportion of patients. Future research should examine the floor and ceiling effects of the CAT, as these have been addressed in only two populations. Overall, the methodological quality of the studies was rated fair. Every psychometric property except concurrent validity was assessed only in studies of fair, good or excellent methodological quality.
Although the diversity of the studies retrieved meant that psychometric properties were examined in many COPD populations, allowing the results to be generalised, validity and responsiveness need further assessment in specific patient populations (e.g. females, younger age groups, mild disease) to establish whether the CAT can discriminate between these groups. It would be of great utility to evaluate the predictive validity of the CAT, to determine whether it can predict future clinical outcomes (e.g. mortality, hospital admission, disease progression or exacerbation). Similarly, although there is no single correct method for determining the MCID, more studies should provide estimates so that multiple results can be combined into a more robust value (see online supplementary material for methods of calculating MCID). Moreover, the development of the questionnaire was accompanied by a grading system based on the CAT score, for which the development group proposed potential management considerations for each grade. The impact of the CAT on the quality of primary care consultations in patients with COPD has been investigated in a randomised controlled trial, but further research is needed in this area because of that study’s methodological limitations [44].
Strengths of this review include exhaustive search strategies across multiple databases, and independent study retrieval, screening, data extraction and quality assessment. A meta-analysis was not performed: the variety of outcomes studied, methodological heterogeneity and diverse study populations precluded a common summary effect for any specific psychometric property, so a meta-analysis was deemed inappropriate.
There were, however, limitations. Heterogeneity could not be investigated formally, although heterogeneity between studies is still to be expected. Likewise, publication bias could not be assessed formally with a funnel plot. Although no language filters were applied in the search strategy, the assessment of language bias indicated that a minor language bias could be of concern; however, this appears acceptable given that the CAT’s development was studied in certain languages (e.g. English) but not in others. Limitations of the data from the included studies must also be considered (e.g. most studies did not present standard deviations for known groups validity), although these reflect the dearth of literature available on the CAT.
Conclusion
This review employed rigorous methodology to provide a comprehensive overview of the CAT’s psychometric properties in patients with COPD. The studies support the reliability and validity of the CAT and indicate that the tool is responsive to interventions, although the MCID remains debatable. Since the CAT performs well and is a simple, quick tool for assessing HRQoL in patients with COPD, there is growing interest in its use in clinical practice. Studies are needed to evaluate the use of this questionnaire for the symptomatic assessment of patients with COPD in the new GOLD classification. It cannot be assumed that the CAT behaves similarly across patient populations with different characteristics; studies must therefore also attempt to determine the validity of the CAT in females, patients with mild disease or individuals at risk, and younger and older patients.
Footnotes
For editorial comments see page 833.
This article has supplementary material available from erj.ersjournals.com
Support statement: This research was supported by a studentship award from the Research Institute of the McGill University Health Centre (Graduate Student Scholarship award to Nisha Gupta), the Collaborative Innovative Research Fund (CIRF) GSK Canada, the Canadian Cohort Obstructive Lung Disease (CanCOLD) Canadian Institutes of Health Research (CIHR)/Rx&D Collaborative Research Program (IRO-93326), and the Respiratory Health Network of the Fonds de recherche du Québec – Santé (FRQS).
Conflict of interest: Disclosures can be found alongside the online version of this article at erj.ersjournals.com
- Received February 5, 2014.
- Accepted May 22, 2014.
- ©ERS 2014