European Respiratory Society


The purpose of this study was to examine the accuracy of self-reports of symptom intensity in patients with chronic obstructive pulmonary disease (COPD) and factors that might influence recall of that intensity.

Thirty COPD (forced expiratory volume in one second (FEV1) 36±17% predicted) subjects recorded their dyspnoea and fatigue intensity scores on a 0–10 scale for 14 consecutive days. On the fourteenth day, subjects recalled their average, greatest and least symptom intensity for the previous 14 days. General cognitive function, spirometry, and oxygenation were also measured.

No significant differences were found between actual and recalled scores for dyspnoea or fatigue. General cognitive function, measured by the Mini Mental State Exam, correlated with the greatest and least dyspnoea and average fatigue difference scores (recall-actual) and contributed to the variance in the average and least dyspnoea recalled scores. The greatest contributor to the variance in the recall scores of both symptoms was the symptom intensity level on the day of recall.

These results highlight the importance of current levels of symptom intensity and cognitive function when appraising symptoms in chronic obstructive pulmonary disease patients.

Presented in part at the International Conference of the American Thoracic Society, San Francisco, CA, USA, May 18, 1997.

The patient's ability to accurately recall symptoms is central to determining appropriate therapy. However, questions have arisen regarding the ability of patients to accurately recall past events, since factors such as memory impairment, medications, past experiences, and current life situations may influence recall. In patients with chronic obstructive pulmonary disease (COPD), the ability to accurately remember the intensity of symptoms is particularly important since clinical decisions are frequently based upon self-reports.

Cognitive changes have been measured in patients with COPD 1, 2, raising the question of the ability of patients with COPD to accurately report symptoms. When patients with severe COPD requiring continuous oxygen were compared to healthy age-matched subjects and healthy older subjects, lower scores on verbal memory tasks were found 2. Furthermore, rapid cognitive decline has been associated with the presence of severe bronchial obstruction 3. Taken together, these findings suggest the potential for important memory changes in the COPD population, which could influence the accuracy of symptom reporting.

In related work, patients with chronic headaches were asked to document the intensity of pain over a 2-week period 4. Actual pain scores (over the 2 weeks) were compared with recalled and current intensity levels of pain. Patients with high levels of pain at the end of the study recalled higher levels of pain than those recorded in their daily diary, while patients with low levels of pain at the end of the study reported lower levels of pain than were actually recorded. These findings are consistent with the implicit theory of evaluating current changes from past states, whereby individuals take stock of their current state to evaluate past states 5. From both a theoretical and an empirical basis, there is support for the premise that current symptom intensity impacts recall of past intensity levels.

To date, only recall of dyspnoea after exercise sessions has been examined in the COPD population 6. Neither the impact of memory changes nor symptom intensity on symptom reporting has been examined in this population. In most reports, dyspnoea and fatigue are the two most commonly reported symptoms in this population 7, 8. Dyspnoea and fatigue, like pain, are subjective patient reports. Discrepancies in reports may occur due to: 1) the ability of COPD patients to accurately recall symptoms; and 2) current levels of symptom intensity.

The overall aim of this investigation was to examine the accuracy of recalled symptom reporting in a stable COPD population. To achieve the overall aim, three research questions were investigated. 1) How accurate is a patient's recall of the symptom intensity (dyspnoea and fatigue) over a 2-week period? 2) Is the difference between actual and recalled symptom intensity related to factors influencing symptom intensity or variability? 3) Is the recall of past symptom intensity levels influenced by intensity levels at the time of recall, disease severity (spirometry and blood gases), general cognitive function, variations in verbal memory or self-perceived poor memory?

Material and methods


Subjects with COPD were recruited from a pulmonary outpatient clinic with predominantly male patients. The authors excluded patients who had: known cognitive dysfunction (e.g. Alzheimer's disease, memory changes); psychological disorders (e.g. psychosis); diagnosed sleep apnoea; hospitalization within the past 4 weeks (or during the study period) for exacerbation of their respiratory condition; or increased use of steroids or antibiotics. Subjects who qualified were approached about the purpose of the study (i.e. to examine the role of memory in daily reports of symptoms) and informed of the time commitment involved. Three subjects declined participation due to inability to meet the demands of daily symptom recording required by the study. The first 30 subjects who consented to participate in the study were enrolled. Subjects were not told as part of the consent process that they would be asked to recall their symptom intensity after the 14 days of monitoring.

Stability of the symptom experience and general disease condition during the study was assessed by self-reports in patient diaries, in which subjects recorded whether symptoms were worse than usual or required treatment beyond typical measures (e.g. extra puffs from inhalers or a day of rest). General cognition, verbal memory and overall symptom experience were evaluated at the onset of participation. The study was approved by the Human Studies Subcommittee of a Veterans Administration Medical Center in southern California, USA.

Study design

The study utilized a longitudinal descriptive design that required participants to record their dyspnoea and fatigue intensity levels daily over 2 weeks.


The subjects completed a battery of tests on the initial visit (baseline) and the same tests on the fourteenth day, with the addition of questions asking them to recall their symptom pattern over the 14 days. Only the testing completed on the fourteenth day was used in this analysis as it reflected the current patient status at time of recall. The specifics of the testing are described below.

Lung function testing

Spirometry testing (Spirometry; Keystone S300, S and M Instrument Company, Doylestown, PA, USA) was performed on all patients. Postbronchodilator results were used to establish the severity of lung impairment. Postbronchodilator measurements were obtained following inhalation of 5 mg (0.5 mL of 1%) nebulized isoetharine. Testing followed the American Thoracic Society guidelines for spirometry, using standard reference values 9.

Cognitive evaluation

To eliminate inter-rater variance the Mini Mental State Exam (MMSE) 10 and Babcock Story Recall Alternate Form (modified) 11 were administered by the same individual. The MMSE, a rater-administered screening tool, evaluates the general cognitive status of patients. The 20 items produce scores ranging 0–30 (scores >24 are considered normal). Published norms are 27.6±1.7 for normals 10 and 27.0±1.8 for patients with COPD, with few scores <24 12. The MMSE was used to provide an overall measure of cognitive function as it is closely linked to more extensive memory testing and cognitive decline in the COPD population 2, 3.

The Babcock Story Recall Test was used to examine verbal recall, which has been shown to be reduced in COPD patients 2, 3. The subjects were asked to immediately recall a story just read to them, then after the story was read to them again, recall it 10 min later. Scoring was based on 21 memory units, with allowances for immediate recall and penalties for missing information 11. Scores can range 0–21. Published norms for those >60 yrs of age are 9.2 for immediate scores and 7.2 for delayed recall, with low scores of 7.7 and 4.7, respectively. A ratio of delayed (Bd) to immediate (Bi) recall was used to provide a score representative of an individual's overall ability to retain verbal information (Bd/Bi).
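As a minimal sketch of the retention ratio described above, the following Python function divides the delayed recall score (Bd) by the immediate recall score (Bi). The function name `retention_ratio` and the example scores are illustrative assumptions, not values from the study. How an immediate score of zero was handled is not stated in the text, so the sketch simply raises an error in that case.

```python
def retention_ratio(delayed: float, immediate: float) -> float:
    """Return the delayed-to-immediate recall ratio (Bd/Bi)."""
    # Babcock scores are described in the text as ranging 0-21.
    if not (0 <= delayed <= 21 and 0 <= immediate <= 21):
        raise ValueError("Babcock scores must lie within 0-21")
    # Assumption: the ratio is treated as undefined when nothing was
    # recalled immediately; the source does not cover this case.
    if immediate == 0:
        raise ValueError("ratio is undefined when immediate recall is 0")
    return delayed / immediate


# Hypothetical subject: 12 units recalled immediately, 9 after the delay.
example_ratio = retention_ratio(delayed=9, immediate=12)
```

A ratio above 1 is possible under this definition, since a subject may recall more units after the second reading than after the first.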

Symptom evaluation

The following tools were used to evaluate symptoms: the Pulmonary Functional Status and Dyspnoea Questionnaire-modified version (PFSDQ-M) 13, Bronchitis Emphysema Symptom Checklist (BESC) 14, Daily Symptom Diary (DSD), and Final Symptom Assessment (FSA). The PFSDQ-M is a 40-item, self-completion questionnaire evaluating activity levels and symptoms of dyspnoea and fatigue. Symptoms are rated by the subject both in general (i.e. intensity of symptom today, etc.) and in relation to 10 activities. The PFSDQ-M subscales have good reliability as measured by Cronbach's alpha (α >0.93) 13. The symptom intensity today and the total scores were used to establish symptom intensity levels at the time of recall.

The BESC is an 89-item, self-administered checklist, with 11 subscales. The reliability testing for internal consistency of the subscales ranged α=0.81–0.94 14. The BESC is scored 1 “never” to 5 “always”. Poor memory (BESC-M), hopelessness/helplessness (BESC-H), dyspnoea (BESC-D), and fatigue (BESC-F) were selected to establish symptom intensity at the time of recall and to evaluate the degree of self-perceived memory changes.

Daily intensity measures of dyspnoea and fatigue were obtained through the DSD, which consisted of two questions adapted from the PFSDQ-M and completed by the subject daily for 2 weeks. The DSD (fig. 1) assesses the intensity (0–10 scale) of the subject's daily experience of dyspnoea and fatigue, with 0 reflecting “no” symptom intensity and 10 “very, very severe” intensity. Subjects were given dated copies of this form with instructions to complete the DSD at the same time daily, and to insert and seal each day's questionnaire in the dated envelope provided. They were asked not to refer to their previous day's rating. All participants reported that they followed the instructions and returned the daily scores sealed in the original envelopes provided.

Fig. 1.—

Example of questions in a) daily symptom assessment diary and b) final assessment questionnaire.

On the final visit, subjects returned with their completed DSDs in sealed envelopes and completed the PFSDQ-M, BESC and FSA. In the FSA, subjects were asked to recall their “average”, “greatest” and “least” level of symptom intensity over the past 2 weeks for both dyspnoea and fatigue (fig. 1). Scaling for the FSA matched the scaling of the DSD. The FSA was the last form completed in the study and was performed in the presence of an investigator. The subjects were not offered and did not request to see their previous responses. The FSA scores for each subject were compared to the actual dyspnoea and fatigue intensity scores that they recorded on the DSD.

Data analysis

Actual average (mean), greatest and least symptom intensity scores for dyspnoea and fatigue were obtained from the DSD. The variance over the 2-week period for both dyspnoea and fatigue scores was calculated. Difference scores between the recalled scores on the FSA and actual scores from the DSD were computed. Data are reported as mean±sd or difference scores. A p-value <0.05 was considered statistically significant.
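The summary and difference scores described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' procedure: the function names and the diary values are hypothetical, and the use of the sample variance (n − 1 denominator) is an assumption, since the text does not say which variance formula was used.

```python
from statistics import mean, variance


def summarize_diary(daily_scores):
    """Return the actual average, greatest, least and variance of a diary."""
    return {
        "average": mean(daily_scores),
        "greatest": max(daily_scores),
        "least": min(daily_scores),
        # Assumption: sample variance with an n - 1 denominator.
        "variance": variance(daily_scores),
    }


def difference_scores(recalled, actual):
    """Return recalled minus actual for each recalled summary score."""
    return {key: recalled[key] - actual[key] for key in recalled}


# Hypothetical 14-day dyspnoea diary (0-10 scale) and end-of-study recall.
diary = [4, 5, 3, 6, 5, 4, 7, 5, 4, 3, 5, 6, 4, 5]
actual = summarize_diary(diary)
recalled = {"average": 5, "greatest": 8, "least": 3}
diffs = difference_scores(recalled, actual)
```

With this sign convention (recall minus actual), a positive difference score means the subject recalled a higher intensity than was recorded in the diary.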

The questions put forward in the introduction to this paper were addressed as follows.

To answer the first question, paired t-tests were used to compare the actual scores of symptom intensity obtained from the DSD to the FSA recalled scores. A conservative estimate of a greater than one point difference on a 10-point scale was classified as an important clinical difference. The selection of the one point difference criterion was based on previous findings, in which a 0.5 difference reflected a clinically important change on a 7-point scale 15, and preliminary work that indicated a 1.9 difference on a 10-point scale was important 16. By averaging these two scores, the result was approximately a one point difference (1.2) and appeared to be a reasonable assumption. For a 1.1 difference, power calculations for paired t-tests revealed sufficient power (0.80) with a sample size of 30, assuming an alpha of 0.05 17.
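The arithmetic behind the one point criterion and the form of the paired comparison can be illustrated as follows. This is a minimal sketch using only the Python standard library: the function name `paired_t` and the five example subjects are hypothetical, and the statistical software actually used is not named in the text.

```python
from math import sqrt
from statistics import mean, stdev

# Averaging the two published estimates quoted in the text (0.5 and 1.9).
threshold = (0.5 + 1.9) / 2  # approximately 1.2


def paired_t(recalled, actual):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [r - a for r, a in zip(recalled, actual)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical recalled and actual "average" scores for five subjects.
recalled_scores = [5, 6, 4, 7, 3]
actual_scores = [4.5, 6.5, 4.0, 6.0, 3.5]
t_stat, df = paired_t(recalled_scores, actual_scores)

# With 30 subjects (df = 29), |t| would need to exceed about 2.045
# (standard two-tailed table value, alpha 0.05) to be significant.
```

The t statistic is then compared against the critical value for the relevant degrees of freedom. A dedicated statistics package would normally report the p-value directly.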

To answer the second research question, correlations were calculated between the difference scores and general demographic measures, variance across the 2-week dyspnoea and fatigue scores, other symptom intensity measures, and the cognitive and verbal memory measures. Given the sample size of 30, the relationship (effect size) needed was moderate-to-large (0.36) to reach significance at an alpha of 0.05, assuming a two-tailed relationship 17.
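One way to arrive at a value close to the 0.36 quoted above is to convert the critical t value into a critical correlation coefficient, using r = t / sqrt(t² + df) with df = n − 2. Whether the authors derived the figure this way is not stated, so the sketch below is an assumption. The critical t of 2.048 (two-tailed, alpha 0.05, df = 28) is taken from standard t tables, and the function name is hypothetical.

```python
from math import sqrt


def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches significance for a sample of size n."""
    df = n - 2  # degrees of freedom for a Pearson correlation
    return t_crit / sqrt(t_crit ** 2 + df)


# Two-tailed critical t for alpha 0.05 and df = 28 (standard table value).
T_CRIT_DF28 = 2.048
r_needed = critical_r(T_CRIT_DF28, n=30)
```

Rounded to two decimals, `r_needed` agrees with the 0.36 given in the text for a sample of 30.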

To answer question three, stepwise regression analysis was used. Correlational analysis was used to screen the variables prior to their use in the regression analysis. Variables with r values >0.20 were used in the analysis 17. The variables that were screened, but not included based on the results (r≤0.20), were age, forced expiratory volume in one second per cent of predicted (FEV1 % pred), BESC-M, BESC-H, and Bd/Bi. The recalled symptom intensity score was the dependent variable; the entry order of the independent variables was as follows: 1) current symptom intensity (PFSDQ-M today); 2) disease severity reflected by forced vital capacity per cent of predicted (FVC % pred); 3) arterial oxygen level on room air (Pa,O2rm); and 4) general cognitive function (MMSE). Power analysis determined that multiple regression with a sample size of 30, alpha of 0.05, and six independent variables had sufficient power (≥0.80) to detect a cumulative R² of ≥0.35 17.
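The screening step that precedes the regression can be sketched as below. This covers only the correlational screen, not the stepwise regression itself. The variable names and data are hypothetical. Comparing the absolute value of r with the 0.20 cut-off is an assumption made here because negative relationships (e.g. with the MMSE) were retained in the analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen(candidates, outcome, cutoff=0.20):
    """Keep candidate variables whose |r| with the outcome exceeds cutoff."""
    kept = {}
    for name, values in candidates.items():
        r = pearson_r(values, outcome)
        # Assumption: screening uses the magnitude of r, ignoring its sign.
        if abs(r) > cutoff:
            kept[name] = r
    return kept


# Hypothetical data for six subjects (not study values).
recalled_average = [3, 5, 4, 6, 7, 5]
candidates = {
    "symptom_today": [2, 5, 4, 6, 8, 5],
    "mmse": [30, 28, 29, 27, 26, 29],
    "age": [70, 70, 65, 65, 70, 65],
}
kept = screen(candidates, recalled_average)
```

In this toy data set, `age` is dropped because it is uncorrelated with the outcome, while `mmse` is kept with a negative coefficient.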


Thirty male subjects with severe COPD (table 1) completed the study. Continuous oxygen was required by five subjects, and an additional seven subjects required oxygen only with exercise. Tests of cognitive function averaged 28.9±1.1 (range 25–30), well above 24, the MMSE cut-off score for normals. The mean scores on the Babcock Test were 15.5±3.4 for immediate recall and 13.2±3.8 on delayed recall, which were well above published average scores of 9.2 and 7.2, respectively. The ratio of delayed recall to initial recall averaged 88±26% with a range of 20–142%. However, closer examination of this score revealed that important problems with delayed recall (<50%) existed in only two subjects, while excellent (≥100%) delayed recall was present in more than a third of the subjects (n=13).

Table 1—

Subject characteristics

Reports of dyspnoea and fatigue showed considerable daily variability, as seen in the variance calculated over the 2 weeks (table 1). Dyspnoea scores were typically reported as being greater in intensity than fatigue intensity scores. This pattern is illustrated by one patient's diary entries, as depicted in figure 2. Subject estimates at the end of 2 weeks of their “average”, “greatest” and “least” experience with dyspnoea were similar to their daily reports. The recalled 2-week “average” dyspnoea did not significantly differ from the actual calculated average over that period. No significant differences were found between the actual and recalled values for any of the intensity scores (fig. 3). In fact, the mean difference between actual and recall scores was ≤0.50 for all scores examined (table 2). Closer examination of the “average” dyspnoea and fatigue scores revealed that 70% of the sample had differences of one point or less, with slightly more individuals overestimating, rather than underestimating their actual average scores. A smaller percentage of subjects recalled the “greatest” dyspnoea and fatigue scores exactly, while with the “least” scores, 80% of the subjects recalled their actual score within one point.

Fig. 2.—

Example of entries in daily symptom assessment diary over the 2-week period. ——: dyspnoea; – – – – : fatigue. Horizontal lines indicate recalled average for dyspnoea and fatigue.

Fig. 3.—

Differences between actual (□) daily reports of a) dyspnoea and b) fatigue and recalled values. All t-test values were nonsignificant.

Table 2—

Differences between actual and recalled symptom intensity scores

It was not possible to answer question two because no variable was found to have a significant relationship with the difference score for average dyspnoea (table 3). The BESC dyspnoea subscale score had a significant positive relationship (r=0.38, p<0.05) with the dyspnoea “greatest” difference score. The MMSE score was found to have a significant negative relationship with the difference scores for “greatest” and “least” dyspnoea and “average” fatigue scores (r≥−0.38, p<0.05). Neither the current symptom level nor the variance in the scores over the 2 weeks was significantly related to the difference scores. However, the correlation coefficients with the difference scores for both “greatest” dyspnoea (r=−0.32) and fatigue (r=−0.35) did approach significance.

Table 3—

Correlation coefficients between symptom difference scores and other symptom variables.

Finally, to answer question three, regression analysis was used, selecting variables that demonstrated at least a small relationship (r>0.20) with the recalled scores. For all recalled scores, the primary contributor to the explained variance was the symptom intensity score obtained on the final day from the PFSDQ-M general symptom scores (table 4). For these positive relationships (β), the range of explained variance was from 25% for the dyspnoea today with the “least” dyspnoea recalled, to 70% for fatigue today with the “average” fatigue recalled. Only with the recalled dyspnoea scores did any of the other variables enter the equation. With “average” recalled dyspnoea, Pa,O2rm contributed 13% and scores on the MMSE contributed 10% to the total explained variance (68%). In both cases, the relationship was negative, indicating that the greater the Pa,O2rm or MMSE score, the lower the “average” dyspnoea recall score. For “greatest” recalled dyspnoea, only the FVC % pred added any additional variance and brought the total explained variance to 41%. The direction of the relationship was positive, indicating that the greater the FVC % pred, the greater the “greatest” dyspnoea recalled score. For the “least” dyspnoea score recalled, MMSE also entered the equation and contributed 23% to the total 48% of the explained variance. Again, as with the “average” recalled score, the relationship was negative.

Table 4—

Regression coefficients, explained variance, and significance of the independent variables that entered the regression analysis


The dyspnoea and fatigue symptom pattern seen over 2 weeks in this study can be characterized as one of substantial variability, where dyspnoea was consistently rated as more intense than fatigue. It is important to note that the wide range of daily intensity scores is not consistent with falsified data, as these types of reports have, in general, less variance and are more homogeneous than what was seen in this sample. Approximately one-third (30–37%) of the sample had above a one point difference between recalled and actual symptom intensity scores (table 2). Examination of the data also revealed that no subject was able to exactly recall all of the actual values. Exact recall did not occur with the “average” scores and rarely with the extreme values, such as “greatest” and “least” dyspnoea or fatigue scores. The pattern of variance and recall seen is consistent with studies of memory that have examined an individual's ability to recall information whether in the short- or long-term. Recall based on short-term memory would have to occur a very short time (15–30 min) after the symptom score is recorded. Furthermore, recall based on short-term memory without prior knowledge of what was to be recalled has a very poor (<70%) success rate 18, 19. Conversely, long-term (explicit) memory of sensory information is stored and retrieved relative to each individual's unique record of experience based on personal significance 20. Individuals with moderate-to-severe COPD clearly identify changes in their breathing as having personal significance, and thus, would have greater recall of their greatest and least days of symptom intensity over their average levels. The findings of this study are consistent with recall of specific notable days of symptom intensity and not fabrication of the data.

In the present study, stable male patients with moderate-to-severe airway obstruction and no major cognitive deficits were able to accurately recall the intensity, over 2 weeks, for both dyspnoea and fatigue. No single, universal contributor that had at least a moderate-to-large relationship with the differences seen in the actual and recalled symptom score could be identified (table 3). It is important to note that the sample size was not sufficient to detect a significant small relationship (effect size), if one had been present 17. The MMSE score, however, demonstrated significant negative relationships with the difference scores for “greatest” and “least” dyspnoea and the “average” fatigue score. This finding suggests that the lower the cognitive function, the greater the difference between the recalled symptom intensity and the actual value reported in the diaries. Cognitive function, as measured by the MMSE, also explained 10–23% of the variance in the recalled score for “average” and “least” dyspnoea score. Again, the relationship was negative, meaning the better the general cognitive function, the lower (i.e. better) the recall score.

The results of this investigation provide some support for the premise that small changes in cognitive function may influence symptom reporting, both the recalled symptom intensity scores and the difference between actual and recalled scores. This was true given the fact that the majority of the subjects' MMSE scores would be considered within normal limits. Some individuals had deficits in verbal memory, as measured by the Babcock Story Recall Test, but the deficit did not play a role in the results. Although the MMSE has limitations, some reports have suggested that it is a reasonable screening instrument and is able to detect cognitive decline in this population 3.

While the intensity of the symptom on the day of recall did not correlate with the difference score, the intensity score was clearly the dominant influence in the recalled intensity scores. The amount of explained variance associated with the symptom intensity on the day of recall ranged from a minimum of 25% to a maximum of 70% for both dyspnoea and fatigue recalled scores (table 4). The results of the regression analysis support the assertion that the greatest contributor to recalled levels of symptom intensity is the current intensity level. These findings are consistent with the implicit theory of evaluating changes by taking stock of the current state to evaluate past states as seen in a similar investigation of pain 4. In this case, the ability to recall the average score would have required the greatest mental calculation and would potentially have fit the implicit theory best. The fact that both Pa,O2 and MMSE were included in the “average” dyspnoea regression may reflect the importance of general cognitive changes required for recall, described in the literature 2, 3. This explanation does not seem to apply to the regression findings for the average fatigue recall score, since only the current fatigue level entered the equation, explaining 70% of the variance. Potentially, the recall scores of fatigue are related to activity values, such as walk tests or exercise levels, not measured in this study. Inclusion of the FVC % pred as part of the explained variance in the “greatest” dyspnoea recall score is interesting given the positive relationship, although it is not clear what this reflects other than possible air trapping.

The results of this study are heavily based on the presumption that the subjects did not fabricate their daily symptom intensity scores and followed the instructions given. The value of patient diaries has been demonstrated in patients with cystic fibrosis 21, asthma 22–24, pain 25, sleep 26, 27 and other conditions 28. Many randomized clinical trials have used patient diary reports, from which conclusions about symptom intensity are made. While there can never be 100% certainty about the precision of patient reports and adherence to the instructions given, it is felt that: 1) the process followed for consent, which did not reveal the need to recall intensity levels; 2) the battery of testing that preceded the request to recall their symptom levels; and 3) the pattern of variance and responses observed in the data, support the claim that it is unlikely that the daily scores were fabricated or that the instructions were not followed.

While replication of these findings with a larger sample including female participants is needed, the results do point to important issues that should be considered when discussing past symptom intensity with patients. Current symptom intensity must be assessed when evaluating past estimates. General cognitive function may be important to evaluate, especially when there are obvious signs of decline. Given both of these considerations, in stable chronic obstructive pulmonary disease patients there appears to be initial evidence that their recall of recent symptom experiences has acceptable dependability.

  • Received October 15, 2000.
  • Accepted May 10, 2001.

