Abstract
Studies systematically comparing the performance of health-related quality-of-life (HRQoL) instruments in pulmonary arterial hypertension (PAH) are lacking. We sought to address this by comparing cardiac and respiratory-specific measures of HRQoL in PAH.
We prospectively assessed HRQoL in 128 patients with catheterisation-confirmed PAH at baseline and at 6, 12 and post-24 month follow-up visits. Cardiac-specific HRQoL was assessed using the Minnesota Living with Heart Failure Questionnaire (LHFQ); respiratory-specific HRQoL was assessed using the Airways Questionnaire 20 (AQ20); and general health status was assessed using the 36-item Short Form physical component summary (SF-36 PCS).
The LHFQ and AQ20 were highly intercorrelated. Both demonstrated strong internal consistency and converged with the SF-36 PCS. Both discriminated patients based on World Health Organization (WHO) functional class, 6-min walking distance (6MWD) and Borg dyspnoea index (BDI), with the exception of a potential floor effect associated with low 6MWD. The LHFQ was more responsive than the AQ20 to changes over time in WHO functional class, 6MWD and BDI. In multivariate analyses, the LHFQ and AQ20 were each longitudinal predictors of general health status, independent of functional class, 6MWD and BDI.
In conclusion, both cardiac-specific and respiratory-specific measures appropriately assess HRQoL in most patients with PAH. Overall, the LHFQ demonstrates stronger performance characteristics than the AQ20.
Over the past two decades, clinical research in cardiopulmonary disease has broadened from a purely physiological focus to a more comprehensive approach to health-related end-points. In particular, there has been rapid growth of interest in the development and application of patient-centred outcome measures [1]. The majority of such research has focused on common conditions, such as chronic obstructive pulmonary disease (COPD) or heart failure. The extent to which results from these studies can be extrapolated to more rare disease states is uncertain.
Pulmonary arterial hypertension (PAH) is a relatively uncommon disorder that manifests as both a pulmonary and a cardiac condition. Patients typically present with dyspnoea on exertion that is indistinguishable from other more common pulmonary conditions. In fact, many patients are initially misdiagnosed with obstructive airway disease prior to establishing a diagnosis of PAH. Moreover, such patients also exhibit features of heart failure due to the decompensation of the right ventricle in the face of elevated pulmonary vascular resistance. Although traditionally considered to be a rapidly progressive and fatal disease, patients with PAH are now living longer due to the availability of effective medical therapies [2–5]. Consequently, the goals of PAH therapy have expanded from increasing survival to improving health-related quality of life (HRQoL).
Despite the wealth of data on HRQoL in COPD and heart failure, relatively few studies have addressed HRQoL in PAH. The studies that have been performed have utilised various types of instruments, both generic and disease-specific [6]. Generic HRQoL instruments can be advantageous in that they provide results that are comparable across a heterogeneous mix of conditions; however, such approaches generally provide measures that are less sensitive to the nuances of the specific disease in question. In contrast, disease-specific instruments emphasise aspects of health that are impacted by a particular type of disorder; for example, breathing-related symptoms and their potentially negative impact on a patient's sense of well-being.
As cardiac- and pulmonary-specific measures focus heavily on dyspnoea-related impairment, both have been used to assess HRQoL in PAH [7, 8]. Nonetheless, studies systematically comparing the performance of different disease-specific instruments in PAH are lacking. Such information, juxtaposed against concurrent measures of generic HRQoL, is of critical importance for the interpretation of results of studies that utilise one or another approach to HRQoL in PAH. To compare the performance characteristics of cardiac-specific, pulmonary-specific and generic HRQoL measures in PAH, we co-administered all three types of instrument longitudinally within the context of a prospective, observational cohort study of individuals with well-defined disease.
METHODS
Study subjects and design
Over a 5-yr period from July 2003 to July 2008, we consecutively screened and enrolled patients with an established diagnosis of PAH from the Pulmonary Hypertension Clinic of the University of California San Francisco (UCSF; San Francisco, CA, USA). Recruitment was performed in parallel with an existing prospective, observational cohort study of second-hand smoke exposure in patients with PAH [9]. Inclusion criteria were: ≥18 yrs of age; PAH confirmed by right heart catheterisation; ≤3 months of PAH therapy prior to enrolment; and fluent English. Patients who were unable to complete a 6-min walk test, had been diagnosed with an unstable psychiatric disorder, or were actively smoking (a criterion of the parallel second-hand smoke study) were excluded from enrolment. Twenty-two subjects who were initially enrolled but had missing HRQoL data were excluded from analysis. Human subject study approval was obtained from the Institutional Review Board of the UCSF in 2003 and has been renewed on an annual basis.
Subject characteristics, including demographic information, classification of PAH, and haemodynamic measurements from diagnostic right heart catheterisation, were taken from subjects' medical records using a standardised data collection form. World Health Organization (WHO) functional class, 6-min walking distance (6MWD), Borg dyspnoea index (BDI; post-walk), and HRQoL (see instruments below) were prospectively assessed at baseline and at 6, 12, and post-24 month follow-up visits. The mean time elapsed from baseline to follow-up time-points was: 6.4, 12.9 and 47.7 months, respectively. The median time interval between adjacent visits was 6.2 months (interquartile range (IQR) 5.8–7.8 months). Of the 128 subjects with HRQoL data who were enrolled at baseline, 107 (84%), 65 (51%), and 56 (44%) completed 6, 12 and post-24 month visits, respectively. The overall mean observation time for the entire cohort was 27.6 months.
Study measures
Three HRQoL questionnaires were administered to each subject in random order at each visit: one generic measure and two different disease-specific measures (one cardiac-specific and one pulmonary-specific).
General HRQoL was assessed using the Medical Outcome Study 36-item Short Form Health Survey (SF-36; version 2). The SF-36 is a generic HRQoL instrument comprised of eight individual domains yielding two summary scores, a physical component summary (PCS) and a mental component summary (MCS) [10]. Summary component scores can also be calculated based on only 12 of the 36 items, yielding the SF-12 PCS and MCS. The possible range for each summary score and domain is 0–100, with a norm-based population mean±sd of 50±10. Higher scores indicate better HRQoL.
Cardiac-specific HRQoL was assessed using an adaptation of the Minnesota Living with Heart Failure Questionnaire (LHFQ). The LHFQ was originally developed and validated for use in patients with congestive heart failure [11]. It has been adapted for use in pulmonary hypertension by substituting the term “heart failure” with “pulmonary hypertension” [7]. The LHFQ is comprised of 21 items utilising a six-point Likert response format. Individual item scores range 0–5, with the overall score ranging 0–105. Higher scores indicate worse HRQoL. Physical (eight items) and emotional (five items) domains have been previously identified [12].
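The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring code used in the study: the function name `lhfq_total` is hypothetical, and rejecting incomplete questionnaires is an assumption made for simplicity. Domain scores are omitted because the item-to-domain mapping is not given here.

```python
from typing import Sequence

LHFQ_ITEMS = 21     # number of items stated in the text
LHFQ_MAX_ITEM = 5   # six-point Likert format, scored 0-5


def lhfq_total(responses: Sequence[int]) -> int:
    """Sum 21 item scores (each 0-5) into a total of 0-105; higher is worse."""
    if len(responses) != LHFQ_ITEMS:
        # Assumption: incomplete questionnaires are rejected, not imputed.
        raise ValueError(f"expected {LHFQ_ITEMS} responses, got {len(responses)}")
    for score in responses:
        if not 0 <= score <= LHFQ_MAX_ITEM:
            raise ValueError(f"item score outside 0-{LHFQ_MAX_ITEM}: {score}")
    return sum(responses)
```

A subject answering 2 on every item would score 42 out of a possible 105.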
Pulmonary-specific HRQoL was assessed using the 20-item and 30-item Airways Questionnaire (AQ20, AQ30). The items of the AQ20 form a subset of the AQ30 items. The AQ20/30 was originally developed by the St. George's Respiratory Questionnaire (SGRQ) investigators with the aim that it would perform in a similar manner but would be less burdensome to administer [13, 14]. Its performance characteristics have been further adapted for use across a variety of respiratory conditions [15]. The original scale utilises a “Yes”, “No” or “Not applicable” response format. We employed the revised version, which substitutes the term “chest trouble” for “breathing problem”, and adds an additional response option, “Unable”, to items that relate to a specific activity [15]. Only positive responses (either “Yes” or “Unable”) are scored and summed to provide a possible range of 0–20 for the AQ20 and 0–30 for the AQ30. As with the LHFQ, higher scores indicate worse HRQoL. The AQ20 has been shown to yield similar results to the AQ30 [16], and its validity has been demonstrated in relation to well-established pulmonary-specific measures, such as the SGRQ and the Chronic Respiratory Disease Questionnaire [14, 17].
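As a hedged illustration of the AQ20/AQ30 scoring rule described above, the following Python sketch counts positive responses. The function name `aq_score` and the response strings are assumptions chosen for the example. The sketch does not check that “Unable” is used only on activity-related items, since the item list is not reproduced here.

```python
from typing import Sequence

# Only these responses count towards the score, as described in the text.
POSITIVE = {"yes", "unable"}
VALID = POSITIVE | {"no", "not applicable"}


def aq_score(responses: Sequence[str], n_items: int = 20) -> int:
    """Count positive responses; range 0-20 (AQ20) or 0-30 (AQ30), higher is worse."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    normalised = [r.strip().lower() for r in responses]
    for r in normalised:
        if r not in VALID:
            raise ValueError(f"unrecognised response: {r!r}")
    return sum(r in POSITIVE for r in normalised)
```

Passing `n_items=30` applies the same rule to the AQ30.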
In addition to the three measures above, a PAH-specific quality of life (QoL) measure, the Cambridge Pulmonary Hypertension Outcome Review (CAMPHOR), was added to the study in 2007 as a protocol modification. The CAMPHOR was originally developed in the UK [18] and its validity was recently re-evaluated in PAH patients in the USA [19]. It is comprised of three scales intended to assess symptoms (25 items), activity (15 items) and QoL (25 items). Scores are calculated based on the sum of each individual scale. Higher scores indicate worse HRQoL (symptoms and activity scales) and QoL.
Statistical analysis
Mean±sd and median scores at baseline were calculated for each HRQoL instrument. To assess for normality and possible floor or ceiling effects, we examined the frequency distribution of each instrument individually. We also calculated the number and percentage of subjects scoring either the minimum or the maximum possible score. Internal consistency was evaluated for both disease-specific measures using Cronbach's α.
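For readers unfamiliar with these statistics, the sketch below shows one standard way to compute Cronbach's α and the percentage of subjects at the extremes of a scale. It uses only the Python standard library, and the function names are hypothetical. The study analyses were run in SAS and STATA, so this illustrates the textbook formula rather than the authors' code.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a subjects-by-items table (one row per subject)."""
    if len(rows) < 2:
        raise ValueError("need at least two subjects")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item, and of the per-subject total score.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pct_at_extremes(totals: Sequence[float], lo: float, hi: float) -> Tuple[float, float]:
    """Percentage of subjects scoring the minimum and the maximum possible score."""
    n = len(totals)
    at_lo = sum(t == lo for t in totals)
    at_hi = sum(t == hi for t in totals)
    return 100 * at_lo / n, 100 * at_hi / n
```

Because higher AQ20 and LHFQ scores indicate worse HRQoL, the percentage at the minimum score corresponds to the best possible score for those instruments.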
Correlations among the different HRQoL measures were examined using data from all subject visits. To account for repeated measures data, correlations were based on standardised β coefficients obtained using generalised estimating equations (GEE) and robust variance estimates. The strength of associations between measures was reported using thresholds commonly used in behavioural sciences: trivial (≤0.1), weak (>0.1–0.3), moderate (>0.3–0.5), strong (>0.5–0.7), very strong (>0.7–0.9) and near perfect (>0.9–1.0) [20–22]. The SF-12 demonstrated nearly identical results to the SF-36 for the PCS and MCS, and was not studied further. Near-perfect correlations were also observed between the AQ20 and the AQ30 (r=0.98, p<0.0001; n=110); thus, the AQ30 was not further analysed either.
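The strength thresholds listed above translate directly into a small lookup. The sketch below is illustrative; the function name `correlation_strength` is hypothetical, and applying the thresholds to the absolute value of the coefficient is an assumption, made so that negative correlations are classified by magnitude.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the thresholds given in the text."""
    magnitude = abs(r)  # assumption: sign is ignored when grading strength
    if magnitude > 1:
        raise ValueError(f"coefficient outside [-1, 1]: {r}")
    if magnitude <= 0.1:
        return "trivial"
    if magnitude <= 0.3:
        return "weak"
    if magnitude <= 0.5:
        return "moderate"
    if magnitude <= 0.7:
        return "strong"
    if magnitude <= 0.9:
        return "very strong"
    return "near perfect"
```

For example, the reported AQ20–AQ30 correlation of 0.98 falls in the near-perfect band.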
Relationships among the HRQoL measures and other clinical variables (WHO functional class, 6MWD and BDI) were assessed using all available visit data. GEE was used to account for repeated measures. To avoid strong assumptions regarding the structure of the longitudinal data, we specified an independence working correlation structure with robust variance estimates. Differences in HRQoL among levels of each clinical variable were first examined using an overall F-test, followed by a test for trend (i.e. monotonic relationship) across levels using linear contrasts. For 6MWD and the BDI, categories were defined to approximate quartiles.
Relationships between change in HRQoL measures and change in other clinical variables (WHO functional class, 6MWD and BDI) were assessed in a similar fashion using paired visit data. Change in each clinical variable was categorised as better, same or worse. Change was defined as a difference of: ≥1 for WHO functional class; >40 m for 6MWD (equivalent to a standardised effect size of ∼0.5); and ≥2 for BDI. Testing for trend across categories of change was performed using F-tests with linear contrasts.
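The categorisation rule above can be made concrete with a short sketch. All function names are hypothetical. The direction of each scale (a lower WHO functional class and a lower BDI are better, a longer 6MWD is better) follows standard clinical usage and is stated here as an assumption, as it is not spelled out in the text.

```python
def categorise_change(delta: float, threshold: float,
                      higher_is_better: bool, inclusive: bool) -> str:
    """Return 'better', 'same' or 'worse' for a change between paired visits."""
    magnitude = abs(delta)
    changed = magnitude >= threshold if inclusive else magnitude > threshold
    if not changed:
        return "same"
    improved = (delta > 0) == higher_is_better
    return "better" if improved else "worse"


def who_class_change(before: int, after: int) -> str:
    # Change of >=1 class; a lower class is better.
    return categorise_change(after - before, 1, higher_is_better=False, inclusive=True)


def walk_distance_change(before_m: float, after_m: float) -> str:
    # Change of >40 m (strict inequality); a longer distance is better.
    return categorise_change(after_m - before_m, 40, higher_is_better=True, inclusive=False)


def borg_change(before: float, after: float) -> str:
    # Change of >=2 points; a lower dyspnoea score is better.
    return categorise_change(after - before, 2, higher_is_better=False, inclusive=True)
```

Note that a gain of exactly 40 m is classed as "same" under the strict inequality given in the text.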
To assess predictors of general physical health status, we used standard linear regression to model the relationship between measures obtained at baseline and the SF-36 PCS (the dependent variable) at the last study visit for which such data were available (median follow-up time: 12.3 months). Univariate models were constructed for each predictor variable. Separate multivariate models were constructed for the AQ20 and LHFQ as independent predictors of SF-36 PCS, adjusting for age, sex, ethnicity, aetiology of PAH, and all other clinical variables. Among those subjects with CAMPHOR data at follow-up, parallel models were constructed using the CAMPHOR as the dependent variable. All analyses were conducted in SAS 9.2 (SAS, Cary, NC, USA) or STATA/IC 11.0 (Stata Corp. LP, College Station, TX, USA).
RESULTS
Demographic and clinical characteristics for the 128 subjects included in the analysis are shown in table 1. Subjects were ∼50 yrs of age, predominantly female and predominantly white, non-Hispanic. English-speaking Hispanic and Asian subjects represented substantial minorities. Idiopathic PAH and PAH associated with connective tissue disease accounted for the majority of subjects. The overall distributions of PAH aetiology, haemodynamic parameters, WHO functional class, and 6-min walk test results were not substantially different when compared with those reported by other recent pulmonary hypertension registries [23, 24].
Summary statistics for the SF-36, AQ20 and LHFQ at baseline are shown in table 2. There was substantial impairment in the SF-36 PCS, but only minimal impairment in the MCS. Scores for both subscales approximated a normal distribution without evidence of any ceiling or floor effect (fig. 1a and b). The domains most severely affected were: physical functioning, general health and physical role (fig. 2).
Figure 1. Frequency distribution of health-related quality of life scores by instrument: a) 36-item Short Form Health Survey (SF-36) physical component summary; b) SF-36 mental component summary; c) Airways Questionnaire 20 (AQ20); d) Living with Heart Failure Questionnaire (LHFQ). ---: normal distribution for comparison.
Figure 2. Health-related quality of life impairment across eight domains of the 36-item Short Form Health Survey (SF-36). Higher scores indicate better quality of life. A score of 50 represents a norm-based population mean.
Pulmonary- (AQ20) and cardiac-specific (LHFQ) measures also demonstrated evidence of impairment, with mean scores falling in the middle of the possible range for each scale (table 2). Both disease-specific measures demonstrated deviations from a normal distribution, particularly at their tails (fig. 1c and d). Minor ceiling effects (i.e. subjects with the best possible (lowest) score) were observed for both measures, particularly the AQ20, for which 6% of subjects had the best possible score at baseline. Internal consistency, as measured by Cronbach's α (table 2), was very high for both pulmonary- and cardiac-specific measures, indicating a strong unidimensional construct in each case.
PAH-specific QoL was assessed in 67 subjects who completed the CAMPHOR at baseline (n=17) or at a subsequent visit (n=50). Mean±sd scores were 9.1±6.4 for symptoms (possible range 0–25), 8.3±5.1 for activity (possible range 0–30) and 7.7±6.3 for QoL (possible range 0–25). Observed variance for the CAMPHOR was similar to that which has been described in other PAH cohorts [18, 19]. Differences in data collection for the CAMPHOR and the proprietary nature of the instrument precluded further psychometric evaluation of its performance relative to other measures.
Correlations amongst the SF-36, AQ20 and LHFQ are shown in table 3. All correlations were statistically significant (p<0.01) in the anticipated direction. Convergent validity of both disease-specific measures was supported by strong correlations with the SF-36. The physical and emotional domains of the LHFQ correlated most strongly with the SF-36 PCS and MCS subscales (r=-0.73 and -0.69, respectively), providing further evidence of convergent validity. Pulmonary-specific HRQoL, as measured by the AQ20, correlated very strongly with cardiac-specific HRQoL, as measured by the total LHFQ (r=0.75) as well as each of its domains.
Relationships between the three main HRQoL measures and functional status, exercise capacity and dyspnoea are shown in table 4. All three HRQoL measures studied showed reasonable discrimination of subjects in terms of categories of severity (i.e. known groups validity) based on WHO functional class, 6MWD and BDI. Although a statistically significant monotonic trend was detected in each case, both disease-specific measures showed evidence of poor discrimination between the two lowest quartiles of 6MWD. No significant associations were observed between any of the haemodynamic data obtained at the time of initial right heart catheterisation and any of the three HRQoL measures assessed at baseline (data not shown).
All three HRQoL measures studied demonstrated responsiveness to change in functional class, exercise capacity and dyspnoea (standardised effect sizes shown in table 5). All measures were less responsive to worsening than improvement, particularly AQ20, suggesting possible floor effects. This phenomenon was least evident for the LHFQ. Of the three instruments, the LHFQ was the most responsive overall, although differences in performance were not substantial. Averaged anchor-based estimates of change derived using WHO functional class and 6MWD corresponded to a point change of: +3.1 (improved) to -1.9 (worsened) for the SF-36 PCS; -1.7 (improved) to +0.8 (worsened) for the AQ20; and -11.5 (improved) to +8.0 (worsened) for the LHFQ. Distribution-based estimates of change derived using the 1 standard error of measurement criterion were: ±3.1 for the SF-36 PCS, ±1.8 for the AQ20 and ±6.1 for the LHFQ.
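The 1 standard error of measurement (SEM) criterion mentioned above is conventionally computed as the sd of the scale multiplied by the square root of one minus its reliability. The sketch below shows that textbook formula; the function name is hypothetical, and the use of Cronbach's α as the reliability coefficient is an assumption, as the text does not state which estimate was used.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = sd * sqrt(1 - reliability), the conventional definition."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    # Assumption: reliability is an internal-consistency estimate such as alpha.
    return sd * math.sqrt(1.0 - reliability)
```

With purely illustrative inputs, an sd of 10 and a reliability of 0.91 give an SEM of about 3 points; these are not values taken from the study tables.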
Linear regression models, intended to assess the extent to which pulmonary- (AQ20) and cardiac-specific (LHFQ) measures predict general physical health status (SF-36 PCS) at follow-up, are shown in table 6. In our univariate predictor models, WHO functional class, 6MWD, BDI, the AQ20 and the LHFQ were all predictive of general physical health status. The type of PAH therapy, however, was not predictive; this held for intravenous or subcutaneous prostacyclin (p=0.52), inhaled prostacyclin (p=0.42), endothelin receptor antagonists (p=0.99) and phosphodiesterase-5 inhibitors (p=0.42) (data not shown). In our multivariate analyses, both the AQ20 and the LHFQ were independently predictive of general physical health status, even after taking into account functional class, 6MWD, BDI, type of PAH therapy and subject characteristics. WHO functional class, 6MWD and BDI were not independently predictive of general physical health status in multivariate analyses, but a non-significant trend was observed for functional class. In a model including both the AQ20 and LHFQ together (data not shown), the LHFQ remained a significant predictor of general physical health status (p=0.006), whereas the AQ20 did not.
A similar pattern of findings was observed using multiple linear regression to predict PAH-specific QoL at follow-up, as assessed by the CAMPHOR QoL scale among a subgroup of 52 subjects (table S1 of the online supplementary material). Both the AQ20 and the LHFQ were independently predictive of PAH-specific QoL, even after taking into account functional class, 6MWD, BDI and subject characteristics. In a model including both the AQ20 and LHFQ, the LHFQ remained a significant predictor of PAH-specific QoL (p<0.02), whereas the AQ20 did not.
DISCUSSION
In the present study, we compared the performance characteristics of cardiac- (LHFQ) and pulmonary-specific (AQ20) HRQoL measures in patients with PAH with respect to traditional end-points. Our results largely support the validity of both disease-specific instruments, but also indicate the potential limitations of each. Both the LHFQ and AQ20 demonstrated difficulty in discriminating between patients in the two lowest quartiles of 6MWD. Although both instruments were sensitive to improvement in functional class, 6MWD and BDI over time, they appeared less sensitive to worsening in the same parameters. Nonetheless, the LHFQ and AQ20 were highly intercorrelated and were each strong predictors of general physical health and PAH-specific QoL measured longitudinally, even after taking functional class, exercise capacity and perceived dyspnoea into account. Overall, the LHFQ performed slightly better than the AQ20.
In general, both cardiac- and respiratory-specific instruments performed in the expected manner. The mild ceiling effect observed for the AQ20 could be related to its limited number of activity-based items along with its “yes/no” response format. A possible floor effect for both the AQ20 and LHFQ was suggested by the lack of discrimination among patients with shorter walk distances and decreased sensitivity to worsening over time, but alternative explanations exist. The lack of a difference in disease-specific HRQoL scores between the two lower quartiles of 6MWD could be due to non-dyspnoea-related impairment (e.g. pain, fatigue). This explanation is supported by the observation that the SF-36 PCS was able to discriminate between patients with shorter walk distances, and that similar floor effects were not observed when measures based predominantly on dyspnoea (functional class and BDI score) were used as the reference. Decreased sensitivity of the HRQoL measures to worsening over time could reflect a true floor effect but may be confounded by a healthy-survivor effect, whereby those with the worst HRQoL have died, thus attenuating the observed decrease in HRQoL among those who survived.
An alternative interpretation of our results could be that the SF-36 PCS outperformed both disease-specific instruments insofar as it did not demonstrate any ceiling/floor effects and was responsive to underlying changes in health status. It is important to keep in mind, however, that the SF-36 is a general measure of physical health status and not of dyspnoea-related impairment. It is likely that its strong performance in this cohort is due to the fact that patients had few comorbidities and that the major driver of their physical health status was therefore cardiopulmonary limitation. Care must be taken in extrapolating our findings to other PAH populations, such as connective tissue disease-related PAH, where general measures of physical health status may be confounded by other sources of impairment.
Studies of PAH-related HRQoL published to date have included either a cardiac- [7, 25] or a respiratory-specific instrument [8], but not both. Existing studies are also limited in size and follow-up. In the only analysis to undertake a direct comparison of HRQoL measures, Chua et al. [26] studied 83 patients using pooled data from three different clinical trials. Their study examined the LHFQ, SF-36 and Australian Quality of Life questionnaire (a preference-based utility measure). No respiratory-specific measure was included. Their analysis was primarily restricted to the detection of simple correlations, did not allow for the modelling of asymmetric bidirectional change, and did not standardise parameter estimates, thereby precluding any comparison of effect size.
The particular strengths of our study were not only the inclusion of different disease-specific measures but also the prospective evaluation of a sizeable longitudinal cohort of PAH patients. Moreover, we systematically and thoroughly examined the psychometric performance of both disease-specific instruments for normality/distribution effects, internal consistency (reliability), convergence and discrimination (construct validity), sensitivity to change over time (responsiveness), and ability to predict future health status (predictive validity). To increase interpretability, we provided distributional and anchor-based estimates of meaningful change for each measure. Our estimates, which approximated those obtained in other disease states [27–30], should serve to inform future investigators when developing trial-specific responder definitions for PAH.
One weakness of our study was the lack of direct psychometric comparison between the CAMPHOR and other disease-specific measures. The CAMPHOR was released for use in the USA in 2008 [19], at which point >80% of our study cohort had already completed a baseline visit. In response, we modified our study protocol to include its administration for all newly enrolled subjects and at follow-up for the remaining subjects. Our preliminary data support the predictive validity of existing measures in relation to the CAMPHOR, but further study is required. Another limitation was that, despite the comparatively large size of our cohort for a PAH study, it lacked condition-specific subsets that might have allowed for stratified sub-analyses. The lack of culturally adapted instruments in multiple languages and of utility indices also precluded cross-cultural comparisons and preference-based evaluation of health states.
Bearing these limitations in mind, our results do show that existing cardiac- and respiratory-specific HRQoL measures perform reasonably well in most situations. Use of less strictly defined dyspnoea-based measures, such as the LHFQ or AQ20, could have potential advantages over an instrument exclusively focused on PAH. For example, the LHFQ has been used extensively in left-sided heart failure [11, 12] and therefore might be a valuable tool for assessing dyspnoea-based impairment in patients with pulmonary hypertension due to left ventricular dysfunction (WHO group II). Likewise, the AQ20 has been used extensively in patients with mixed obstructive disease [14, 15] and therefore might be better suited to those patients with pulmonary hypertension in the setting of obstruction (WHO group III). In contrast, instruments exclusively focused on PAH cannot be applied in other disease states and thus have limited value when comparing dyspnoea-related impairment across conditions or when evaluating populations in which diagnostic heterogeneity may exist.
In conclusion, both cardiac- and respiratory-specific measures can be used to assess HRQoL in PAH, but may be limited by lack of responsiveness to deterioration in health status over time. Overall, the cardiac-specific LHFQ demonstrates stronger performance characteristics than the respiratory-specific AQ20. Future comparison of these instruments with newer PAH-specific measures is imperative if we are to understand how best to utilise these important investigative tools.
Acknowledgments
We wish to thank our study coordinator, C.M. Teehankee (University of California San Francisco, San Francisco, USA), for her dedication to enrolling subjects, conducting study visits and managing the study database.
Footnotes
For editorial comments, see page 512.
This article has supplementary material accessible at www.erj.ersjournals.com
Support Statement
This study was supported by grants to H. Chen from the National Heart, Lung and Blood Institute (NIH K23 HL086585) and to T. De Marco (Flight Attendants Medical Research Institute).
Statement of Interest
Statements of interest for H. Chen and T. De Marco can be found at www.erj.ersjournals.com/site/misc/statements.xhtml
- Received October 13, 2010.
- Accepted January 7, 2011.
- ©ERS 2011