Abstract
Measuring chronic dyspnoea in clinical studies: numerical rating scale better than the modified Borg scale http://ow.ly/XZc1G
To the Editor:
The subjective nature of the experience of chronic breathlessness (dyspnoea) creates challenges for patients who need to communicate its intensity, and for clinicians and researchers who need to measure the symptom in order to plan management and assess the effect of interventions.
The numerical rating scale (NRS) [1] and modified Borg scale (mBorg) [2] are recommended measures for breathlessness [3]. However, their use has extended beyond their initial validation. The NRS has been validated for different time frames (“now” and “average”) [1, 4, 5], but the mBorg has not. Further, participants might have a preference for mBorg scores with associated verbal descriptors.
Our objective was to investigate whether: 1) there is a response bias against using mBorg numerical ratings that lack categorical labels; 2) the time frame (average per 24 h, “worst”, “now” or “at rest”) of the mBorg or NRS affects participants’ assessment; and 3) mBorg and NRS scores are correlated.
This was a secondary analysis of pooled data from 1048 participants with breathlessness due to a variety of causes (510 men, 396 women and 142 with gender data unavailable; diagnoses: cancer 223 (21.3%), heart failure 200 (19%) and non-malignant lung disease 617 (59%)), drawn from 10 studies in which at least the mBorg was measured. Where both mBorg and NRS were measured, these were concurrent. All studies used the same version of the Borg scale: a variant of the Borg Category-Ratio scale with a maximum value of 10, and with verbal descriptors missing for values six and eight.
Most contributing studies are described more fully elsewhere [6–13] but are summarised here as follows. 1) Quantifiable data from a primarily qualitative study (study 1: n=47; mean age 69 years (range 46–92 years)) that measured mBorg (average 24 h, worst, rest, nonspecific now and exertion) with NRS for seven participants [7]. 2) Two phase III trials: the first (study 2: n=35; mean (range) age 70 years (41–89 years)) measured mBorg and NRS (average 24 h, worst, rest, nonspecific now and exertion) [9]; the second (study 3: n=154; mean (range) age 71 years (28–91 years)) measured mBorg and NRS (rest and exertion) [11]. 3) Two feasibility studies: one (study 4: n=46; mean (range) age 69.5 years (62–73 years)) measured mBorg and NRS (rest) [6]; the other (study 5: n=13; mean (range) age 67 years (53–80 years)) measured mBorg only (rest and exertion) [12]. 4) Five observational studies: study 6 (n=50; mean (range) age 69 years (42–83 years)) measured mBorg only (pre- and post-exertion) [8]; study 7 (n=109; mean (range) age 65 years (38–52 years)) measured mBorg only (now) [10]; study 8 (n=99) measured mBorg only (average over previous 24 h) (Farida Malik, St Wilfrids Hospice, Eastbourne and East Sussex Healthcare NHS, UK; personal communication); study 9 (n=353; mean (range) age 65 years (24–90 years)) measured mBorg only (average and worst over previous 24 h) [13]; and study 10 (n=142; mean (range) age 69 years (34–91 years)) measured mBorg and NRS (average, worst over past 24 h and now) (Patrick White, King's College London, London, UK; personal communication). Proxy scores were excluded.
The individual distributions of mBorg and NRS scores (average, worst, now, rest and exertion) were visualised with predicted values using truncated Poisson distribution with their corresponding mean plotted as a reference. Descriptive statistics including mean, standard deviation and frequency were examined. The strength of association between mBorg and the corresponding NRS was examined using two-way mixed intraclass correlation (consistency).
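As an illustration of how such predicted values can be obtained, the following minimal Python sketch computes expected score frequencies under a Poisson distribution truncated to the 0–10 range of the scales. It rests on assumptions not stated in this letter: truncation is taken to be at the scale maximum, the rate parameter is simply set to the sample mean, the 0.5 mBorg category is ignored, and the function names and example scores are hypothetical.

```python
import math


def truncated_poisson_pmf(lam, upper=10):
    """Probabilities for scores 0..upper under a Poisson(lam)
    distribution renormalised so that no mass lies above `upper`."""
    weights = [math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(upper + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_counts(scores, upper=10):
    """Expected number of responses at each integer score.

    Assumption: the rate parameter is approximated by the sample mean
    (a simplification; the mean of a truncated Poisson is slightly
    lower than its rate parameter).
    """
    lam = sum(scores) / len(scores)
    return [len(scores) * p for p in truncated_poisson_pmf(lam, upper)]


# Hypothetical integer scores, invented for illustration only.
example_scores = [2, 3, 3, 4, 4, 4, 5, 5, 7, 7, 9]
observed = [example_scores.count(k) for k in range(11)]
expected = expected_counts(example_scores)

# Compare observed and expected frequencies score by score,
# e.g. at the unlabelled values six and eight.
for score in (6, 8):
    print(score, observed[score], round(expected[score], 2))
```

Plotting the observed counts against the expected counts in this way makes any shortfall at particular scores visible.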
The frequency of mBorg scores for numbers six and eight (no verbal descriptor) was less than expected. There were also fewer than expected measures for 0.5 (verbal descriptor of “Very very weak (just noticeable)”). In general, “average” scores over 24 h were normally distributed for mBorg (other than the pattern noted above) and NRS. However, no NRS “average” scores exceeded eight. Although an NRS score of seven is considered “severe”, equivalent to an mBorg of five, mBorg “average” scores included a maximum of 10 (figure 1).
The pattern of scores for “worst” NRS and mBorg per 24 h was similar although, as expected given the equivalent severity scores, there were more high NRS scores.
For point in time measures, the patterns for “exertion” mBorg and NRS were similar, with few mild scores. Conversely, mBorg “at rest” and “now” scores and NRS “at rest” scores shared a similar pattern, but with very few severe scores. However, the NRS “now” scores had measures across the response spectrum, including very severe scores.
The strongest association between NRS (n=21; mean±sd 7.23±1.80) and mBorg (n=261; mean±sd 5.55±2.18) was for “on exertion” (intraclass correlation coefficient (ICC)=0.66, 95% CI 0.33–0.85), and the weakest was for “now” (NRS n=106, mean±sd 4.51±2.72; mBorg n=368, mean±sd 2.36±1.79; ICC=0.14, 95% CI −0.05–0.33). ICC (95% CI) for the other associations were: “average” 0.51 (0.15–0.75); “worst” 0.55 (0.34–0.71) and “rest” 0.33 (−0.09–0.66).
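For readers wishing to reproduce this type of analysis, a single-measure, two-way mixed, consistency ICC can be computed from the two-way ANOVA mean squares as (MSR − MSE)/(MSR + (k − 1)MSE), where MSR is the between-participant mean square, MSE the residual mean square and k the number of scales. The Python sketch below is an illustration only: whether single or average measures were used is not stated in this letter, so the single-measure form is an assumption, as are the function name and the invented paired scores.

```python
def icc_consistency(ratings):
    """Single-measure, two-way mixed, consistency ICC.

    ratings: list of rows, one per participant, each row holding one
    score per scale (e.g. [NRS, mBorg]); no missing values allowed.
    """
    n = len(ratings)       # participants
    k = len(ratings[0])    # scales
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Hypothetical paired [NRS, mBorg] scores, invented for illustration.
paired = [[4, 2], [7, 5], [3, 3], [8, 4], [6, 3]]
print(round(icc_consistency(paired), 2))
```

Because the consistency form removes systematic differences between the scales, a constant offset between NRS and mBorg scores does not by itself lower the ICC; only participants with both scores can contribute to the calculation.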
Our data indicate preferential reporting of mBorg scores with descriptors. This may be due to the mBorg's stem question: “Choose a number whose words best describe…”. The lower than predicted use of 0.5, despite a descriptor, suggests that “very very weak” is either indistinguishable in the context of chronic breathlessness or “0.5” is not understood; the visual analogue scale may be more sensitive in reporting breathlessness due to light-intensity work [14].
Aside from the observed reduction in non-descriptor mBorg scores, the pattern of mBorg and NRS scores in relation to the previous 24 h appears to be as expected, apart from a possible ceiling with the NRS. The observed pattern of responses for NRS “now” probably reflects the contemporaneous context. For example, a patient who has been waiting in the clinic room for some time will respond differently from one who has hurried into the clinic. Thus, unless the measure is taken with close definition of the circumstances of “now”, responses will be difficult to interpret.
Despite the numerical discrepancy between the two scales, the intraclass correlations were moderate for “on exertion” and “average”, albeit with wide confidence intervals, suggesting that the mBorg might be used to assess intensity of breathlessness on average or over the past 24 h. The mBorg and NRS “now” were poorly correlated, presumably for the reasons outlined above. It should be noted that for some ICC calculations there is a large discrepancy between the numbers of NRS and mBorg scores available (e.g. n=21 versus n=261 for “on exertion”). Therefore, the ICCs should be interpreted with caution because the missing data cannot be assumed to be missing at random.
These data suggest that there is a participant response bias against using numerical ratings that lack categorical labels, in which case, the scale would “lose” the ratio properties that Borg wanted to preserve. Therefore, we recommend that, given the non-controlled conditions in chronic breathlessness clinical studies, the NRS is used. Reported mBorg values may differ if the stem is simplified to “choose a number to describe…”.
The NRS “at rest” and “on exertion” appear useful as “point in time” measures. However, the circumstances of “now” should be stipulated. Given the possible ceiling for “average” NRS scores, the mBorg (average over 24 h) may be preferable in populations with severe daily breathlessness.
Analysis of these pooled data from people with chronic breathlessness suggests that there is a response bias in favour of mBorg responses with a verbal descriptor. The theoretical advantages of the mBorg scale under known and scalable stimulus conditions (e.g. in pulmonary rehabilitation programmes or cardiopulmonary exercise testing) are therefore not necessarily maintained in less controlled clinical studies. A change in the mBorg stem question should be considered and tested. The NRS should be used in preference, except for people with very severe breathlessness. The context of “point in time” measures should be clearly stated on completion of measurements.
Acknowledgements
With thanks to members of the Breathlessness Research Interest Group who provided data: Claudia Bausewein (University of Munich, Munich, Germany); Saskie Dorman (Poole Hospital NHS Foundation Trust, Poole, UK); Morag Farquhar and Sara Booth (University of Cambridge, Cambridge, UK); Alex Molassiotis (The Hong Kong Polytechnic University, Hong Kong); Stephen Oxberry (Calderdale and Huddersfield NHS Foundation Trust, Huddersfield, UK); Farida Malik (Eastbourne and East Sussex Healthcare NHS Trust, Eastbourne, UK); Steffen Simon (University of Cologne, Cologne, Germany); Kyle Pattinson (University of Oxford, Oxford, UK); Janelle Yorke (University of Manchester, Manchester, UK); Patrick White (King's College London, London, UK).
Footnotes
Conflict of interest: None declared.
- Received December 10, 2015.
- Accepted January 24, 2016.
- Copyright ©ERS 2016