Abstract
Airways disease is currently classified using diagnostic labels such as asthma, chronic bronchitis and emphysema. The current definitions of these classifications may not reflect the phenotypes of airways disease in the community, which may have differing disease processes, clinical features or responses to treatment. The aim of the present study was to use cluster analysis to explore clinical phenotypes in a community population with airways disease.
A random population sample of 25–75-yr-old adults underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, nitric oxide measurements, blood tests and chest computed tomography. Cluster analysis was performed on the subgroup with current respiratory symptoms or obstructive spirometric results.
Subjects with a complete dataset (n = 175) were included in the cluster analysis. Five clusters were identified with the following characteristics: cluster 1: severe and markedly variable airflow obstruction with features of atopic asthma, chronic bronchitis and emphysema; cluster 2: features of emphysema alone; cluster 3: atopic asthma with eosinophilic airways inflammation; cluster 4: mild airflow obstruction without other dominant phenotypic features; and cluster 5: chronic bronchitis in nonsmokers.
Five distinct clinical phenotypes of airflow obstruction were identified. If confirmed in other populations, these findings may form the basis of a modified taxonomy for the disorders of airways obstruction.
Airways disease is currently classified using potentially overlapping diagnostic labels, such as asthma, chronic bronchitis and emphysema. These disorders are characterised by defined symptom complexes, exposure to certain risk factors, variable patterns of airflow obstruction and airway hyperresponsiveness, and different types of airways inflammation with structural changes 1–5. The definitions of these disorders have evolved over time with greater knowledge of their pathogenesis and clinical characteristics; however, this has led to greater awareness of their imprecision, with the overlap of phenotypes resulting in difficulties in differentiating the disorders from each other. For example, reversible airflow obstruction, the characteristic physiological and clinical feature of asthma, may be present in a substantial proportion of subjects with chronic bronchitis and/or emphysema 6, 7. In addition, patients with long-standing asthma may show an accelerated loss of lung function, resulting in irreversible airflow obstruction 8–10. It is also recognised that different phenotypes of asthma can differ markedly in their response to treatment 11, 12. Chronic bronchitis is defined by the presence of chronic sputum production, a symptom complex that is commonly present in asthma 13. Although the use of gas transfer measurements and computed tomography (CT) has enhanced the ability to diagnose emphysema, the definition remains dependent upon pathological appearances 14. Furthermore, the current definition of chronic obstructive pulmonary disease (COPD) is based on the presence of incompletely reversible airflow obstruction (post-bronchodilator forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) of <0.7), independent of the presence of chronic cough and sputum or the existence of emphysematous structural changes in the lungs 14.
These limitations have led to recommendations that a new taxonomy is required in order to better define the disorders of airways obstruction 3, 15–18. Conceptually, the phenotypic overlap of the disorders of airways obstruction has been represented in Venn diagram format 1, 2, 19. However, this has resulted in ≥15 phenotypes that are difficult to differentiate in terms of their pathogenesis, response to treatment and prognosis. Similar limitations exist with the recent proposal to classify asthma into multiple overlapping phenotypes 18.
Another approach is to use techniques such as cluster analysis to describe the patterns of disease based on the clinical, pathogenetic and physiological features of airways disease. This approach has previously been employed in a patient population with well-characterised asthma and COPD, in which at least four phenotypic subgroups of airways disease were identified 3. It has also been employed in three separate patient populations with asthma, in which several clinically relevant phenotypic subgroups of asthma were identified 12.
The burden of airways disease is not, however, confined to diagnosed patient populations. Many individuals with airways disease in the community do not meet standard criteria for a diagnosis of asthma 20 or COPD 21, or meet the criteria but are undiagnosed 19. In the present study, a survey of a random community population was undertaken, utilising detailed clinical, immunological, physiological and radiological data. Those with airways disease were identified by the presence of current respiratory symptoms or obstructive spirometric results; a prior diagnosis of asthma, chronic bronchitis or COPD was not required. Cluster analysis was then used to explore clinical phenotypes of airways disease in this community population.
METHODS
Subjects
Participants in the Wellington Respiratory Survey (n = 3,500) were randomly selected from the electoral register and were equally distributed by sex across five 10-yr age groups spanning 25–75 yrs 19, 22, 23. Subjects were sent a simple postal questionnaire seeking demographic, respiratory and smoking history data. The 2,319 subjects who returned completed questionnaires were invited to undertake an interviewer-administered questionnaire, followed by pulmonary function tests, exhaled nitric oxide fraction (FeNO) measurements, blood tests, CT scanning and a peak flow diary. The survey was approved by the Central Regional Ethics Committee (Ministry of Health, Wellington, New Zealand), and written informed consent was obtained from each subject.
Detailed questionnaire
All participants completed a detailed written questionnaire, compiled from a series of validated questionnaires and administered by a trained interviewer in a standardised manner, as previously described 22 (see online supplementary material).
Pulmonary function testing
Pulmonary function tests were carried out using two whole-body constant-volume plethysmographs with heated pneumotachographs and gas analysers (Masterlab 4.5 and 4.6; Erich-Jaeger, Würzburg, Germany), following which peak flow monitoring was undertaken, as described in detail elsewhere 23 (see online supplementary material).
Blood tests
Total serum immunoglobulin (Ig) E levels were measured using the Roche Modular analyser (Roche, Basle, Switzerland). The blood eosinophil level was measured by the Sysmex XE-2100 automated complete blood cell count analyser (Sysmex, Mundelein, IL, USA).
FeNO measurements
Measurements of FeNO were undertaken using an online NO monitor (NIOX®; Aerocrine, Solna, Sweden), as previously described 24 (see online supplementary material).
CT scanning
CT scans of the chest were performed using a single machine (GE Prospeed; GE Yokogawa Medical Systems, Tokyo, Japan), as described previously 25 (see online supplementary material).
Statistical methods
Cluster analysis computes the distance between every pair of subjects from the combined values (the multidimensional vector) of their measured characteristics. Using the resulting matrix of distances, it finds groups of subjects that are more similar to each other than to subjects in other groups. It can therefore describe the phenotypes of disease without historical or arbitrary a priori assumptions about classification. Subjects were selected for inclusion in the cluster analysis if they had a pre-bronchodilator FEV1/FVC ratio of <70% and/or had reported wheeze within the last 12 months.
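As a minimal sketch, the inclusion rule can be written as a predicate on each subject's pre-bronchodilator spirometry and wheeze report. The `Subject` record, its field names and the `inclusion_group` helper are hypothetical illustrations, not the survey's data dictionary; the three groups simply mirror the breakdown reported in the Results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    """Hypothetical record; field names are illustrative only."""
    subject_id: str
    pre_bd_fev1_l: float          # pre-bronchodilator FEV1, litres
    pre_bd_fvc_l: float           # pre-bronchodilator FVC, litres
    wheeze_last_12_months: bool   # self-reported wheeze in the last 12 months


def fev1_fvc_percent(s: Subject) -> float:
    """Pre-bronchodilator FEV1/FVC ratio expressed as a percentage."""
    return 100.0 * s.pre_bd_fev1_l / s.pre_bd_fvc_l


def inclusion_group(s: Subject) -> Optional[str]:
    """Return which criterion the subject meets, or None if neither."""
    obstructed = fev1_fvc_percent(s) < 70.0
    wheeze = s.wheeze_last_12_months
    if obstructed and wheeze:
        return "both"
    if obstructed:
        return "ratio_only"
    if wheeze:
        return "wheeze_only"
    return None  # not eligible for the cluster analysis


def eligible(s: Subject) -> bool:
    """FEV1/FVC < 70% and/or wheeze within the last 12 months."""
    return inclusion_group(s) is not None


# Toy subjects (invented values, for illustration only).
subjects = [
    Subject("A", 2.0, 3.2, False),  # ratio 62.5%, no wheeze
    Subject("B", 3.0, 3.8, True),   # ratio above 70%, wheeze
    Subject("C", 2.1, 3.2, True),   # ratio below 70%, wheeze
    Subject("D", 3.5, 4.2, False),  # ratio above 70%, no wheeze
]
for s in subjects:
    print(s.subject_id, round(fev1_fvc_percent(s), 1), inclusion_group(s))
```

The "and/or" in the text is an inclusive OR, so a subject meeting both criteria is counted once, in its own group.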
Cluster analysis of the subjects selected in this way used the following nine variables: 1) pre-bronchodilator FEV1/FVC ratio, expressed as a percentage; 2) pre-bronchodilator FEV1, expressed as a percentage of the predicted value; 3) post-bronchodilator change in FEV1, expressed as a percentage change from baseline; 4) functional residual capacity, expressed as a percentage of the predicted value; 5) diffusing capacity of the lung for carbon monoxide divided by alveolar volume, adjusted for haemoglobin level and expressed as a percentage of the predicted value; 6) natural logarithm of the serum IgE concentration; 7) mean FeNO; 8) sputum production, defined as a positive response to the question “Do you usually bring up sputum from your chest or have sputum in your chest that is difficult to bring up when you don’t have a cold?”; and 9) cumulative tobacco cigarette consumption (in pack-years).
The variables were chosen to provide measures of airflow obstruction, bronchodilator reversibility, hyperinflation, gas transfer, atopy, eosinophilic airway inflammation, sputum production and lifetime smoking. The variables were equally weighted, and the natural logarithm of IgE was used so that extreme values would not form clusters of their own.
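The following is a minimal sketch of an equally weighted Gower dissimilarity for mixed data, assuming the usual definition: each continuous variable contributes |x_i − x_j| divided by its sample range, each binary variable contributes 0 if the two subjects agree and 1 otherwise, and the result is the mean over all variables. Treating sputum production as a symmetric binary variable is an assumption, and the function names and toy data are hypothetical. For the nine model variables, the eight continuous measures would go in `continuous` and sputum production in `binary`.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, float]


def with_log_ige(row: Row) -> Row:
    """Copy of `row` with raw serum IgE ("ige") replaced by its natural log ("ln_ige")."""
    out = dict(row)
    out["ln_ige"] = math.log(out.pop("ige"))
    return out


def gower_matrix(rows: List[Row], continuous: Sequence[str],
                 binary: Sequence[str]) -> List[List[float]]:
    """Equally weighted Gower dissimilarity for continuous and binary variables.

    Continuous: |x_i - x_j| / (sample range of the variable).
    Binary (assumed symmetric): 0 if equal, 1 otherwise.
    The distance is the mean contribution over all variables.
    """
    ranges = {}
    for v in continuous:
        vals = [r[v] for r in rows]
        ranges[v] = max(vals) - min(vals)
    p = len(continuous) + len(binary)
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for v in continuous:
                if ranges[v] > 0:  # a constant variable adds nothing
                    total += abs(rows[i][v] - rows[j][v]) / ranges[v]
            for v in binary:
                total += 0.0 if rows[i][v] == rows[j][v] else 1.0
            d[i][j] = d[j][i] = total / p  # symmetric, zero diagonal
    return d


# Toy data: two continuous variables and one binary (sputum) variable.
toy = [{"a": 0.0, "b": 10.0, "sputum": 1},
       {"a": 5.0, "b": 10.0, "sputum": 0},
       {"a": 10.0, "b": 20.0, "sputum": 1}]
D = gower_matrix(toy, continuous=["a", "b"], binary=["sputum"])
print([[round(x, 3) for x in row] for row in D])
```

Because each continuous variable is divided by its sample range, a single very high raw IgE value would compress every other subject's IgE differences towards zero; working on the log scale limits that effect. R's `cluster::daisy(..., metric = "gower")` computes this kind of dissimilarity, but the text does not state which function or binary-variable convention the authors used.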
For the purposes of the cluster analysis, the Gower distance measure, which is meaningful in the presence of a mixture of continuous and categorical variables, was used 26. Two different clustering methods were used. The primary method was the agglomerative algorithm implemented in the agnes function, and the secondary method was the divisive algorithm implemented in the diana function, both from the cluster package of the R statistical software 26. Agglomerative methods start by treating each individual contributing to the analysis as a separate cluster and then successively join the individuals, and later the clusters, that are closest together on the chosen distance metric. Divisive methods start with all individuals combined into one cluster and then successively split off the individuals, and later the groups, that are furthest apart on the distance metric. Different algorithms use slightly different techniques to work efficiently through the very large number of possible combinations of individuals and clusters at each joining or splitting step. The clusters generated by these techniques were then described using their summary statistics. Tree diagrams (dendrograms) were used to show the progressive joining or division of clusters. Cut points in the tree diagrams were chosen to avoid clusters of fewer than five subjects and to aim for ≥10 subjects per cluster. In addition to the nine primary variables, other phenotypic characteristics of the clusters were also summarised.
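The agglomerative step can be sketched in a few lines. This is a naive illustration, not the agnes implementation: it assumes average linkage (UPGMA), which is the default `method` of R's `agnes`, although the text does not state which linkage was used, and it is deliberately simple rather than efficient. The hypothetical `cut` helper replays the first n − k merges to obtain k clusters; because average-linkage merge heights never decrease, this is equivalent to cutting the tree at a single height.

```python
from typing import FrozenSet, List, Tuple

Merge = Tuple[FrozenSet[int], FrozenSet[int], float]


def average_linkage(d: List[List[float]]) -> List[Merge]:
    """Naive agglomerative clustering with average linkage (UPGMA).

    Starts with every subject as its own cluster and repeatedly joins the
    pair of clusters with the smallest mean pairwise distance. Returns the
    merges in order as (cluster_a, cluster_b, joining height).
    """
    clusters = [frozenset([i]) for i in range(len(d))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                h = sum(d[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def cut(n: int, merges: List[Merge], k: int) -> List[List[int]]:
    """Replay the first n - k merges to obtain exactly k clusters."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    clusters = [frozenset([i]) for i in range(n)]
    for ca, cb, _ in merges[: n - k]:
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca | cb]
    return sorted(sorted(c) for c in clusters)


# Toy example: five subjects at positions 0, 1, 2, 10 and 11 on one axis.
pos = [0, 1, 2, 10, 11]
d = [[abs(x - y) for y in pos] for x in pos]
merges = average_linkage(d)
print([h for _, _, h in merges])  # [1.0, 1.0, 1.5, 9.5]
print(cut(len(pos), merges, 2))   # [[0, 1, 2], [3, 4]]
```

In practice the distance matrix would be the Gower dissimilarity of the nine variables, and a divisive method such as diana would instead start from the single all-inclusive cluster and split it.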
RESULTS
The 749 subjects who completed the investigative modules showed similar characteristics to the 2,319 who completed the screening questionnaire, as previously reported 22. Of the 749 subjects, 250 met the criteria for inclusion in the cluster analysis: 103 (41.2%) had a pre-bronchodilator FEV1/FVC ratio of <0.7 alone, 86 (34.4%) reported wheeze within the last 12 months alone and 61 (24.4%) met both criteria. Some subjects had missing data, particularly for FeNO, owing to technical problems with the equipment on the day of testing. After exclusion of the subjects with missing data, the cluster analysis was carried out on the 175 subjects with a complete data set. The characteristics of the subjects are shown in table 1. The subjects with data for all variables were similar to those with missing data.
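The reported subgroup counts can be checked directly; the short script below uses only figures stated in the preceding paragraph. The two "any" totals (subjects meeting each criterion regardless of the other) are not stated in the text but follow arithmetically from the reported counts.

```python
# Counts reported in the text for the 250 subjects meeting inclusion criteria.
groups = {"FEV1/FVC < 0.7 only": 103, "wheeze only": 86, "both": 61}
total = sum(groups.values())

for name, count in groups.items():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")

# Derived totals: each criterion counted regardless of the other.
ratio_any = groups["FEV1/FVC < 0.7 only"] + groups["both"]
wheeze_any = groups["wheeze only"] + groups["both"]

# Complete-case analysis set and the number excluded for missing data.
complete_cases = 175
excluded = total - complete_cases
print(total, ratio_any, wheeze_any, excluded, f"{100 * complete_cases / total:.0f}%")
```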
Table 1. Characteristics of subjects
The cluster analysis using the agnes methodology identified five distinct clusters based on the nine variables included in the model (tables 2–4). Cluster 1 grouped subjects with features suggesting an overlap syndrome with atopic asthma, chronic bronchitis and emphysema. This cluster exhibited severe airflow obstruction and the greatest degree of bronchodilator reversibility and peak flow variability. Additional characteristics included concomitant eczema and rhinitis, heavy cigarette smoking and markedly reduced quality of life.
Table 2. Characteristics of clusters of subjects with a full data set defined by the agnes method
Table 3. Additional characteristics of subjects included in the five clusters defined by the agnes method
Table 4. Patterns of variables of subjects with a full data set defined by the agnes method
Cluster 2 grouped subjects with features of emphysema but without features of concomitant asthma or chronic bronchitis. Similarly to cluster 1, these subjects showed severe airflow obstruction and were heavy cigarette smokers, but, in contrast, exhibited minimal bronchodilator reversibility and were nonatopic.
Cluster 3 grouped highly atopic subjects with asthma, concomitant eczema and rhinitis, and markedly raised FeNO, indicative of eosinophilic airways inflammation. This cluster did not have the features of chronic bronchitis or emphysema.
Cluster 4 grouped subjects with mild airflow obstruction but without features suggesting emphysema or chronic bronchitis, and without a dominant atopic predisposition or smoking history.
Cluster 5 grouped subjects with airflow obstruction and chronic sputum production, but without concomitant emphysema or a major smoking history. Other features of this group included concomitant eczema and rhinitis and a modest increase in FeNO and blood eosinophil level.
The cluster analysis using the diana methodology identified four disease clusters based on the nine variables included in the model (tables 5–7). Clusters 1, 2 and 4 were very similar to clusters 1, 2 and 5 using the agnes methodology. Cluster 3 represented a combination of agnes clusters 3 and 4 without differentiating on the basis of FeNO or IgE levels.
Table 5. Characteristics of clusters of subjects with a full data set defined by the diana method
Table 6. Additional characteristics of subjects included in the four clusters defined by the diana method
Table 7. Patterns of variables of subjects with a full data set defined by the diana method
The dendrograms for the cluster analyses using the agnes and diana methods are shown in figures 1 and 2, respectively.
Figure 1. Tree plot for the agnes method. Gower’s distance refers to the distance at which clusters are joined. The distance separating the individual branches is unitless.
Figure 2. Tree plot for the diana method. Gower’s distance refers to the distance at which clusters are separated. The distance separating the individual branches is unitless.
DISCUSSION
The present cluster analysis suggests that the syndrome of airways obstruction can be classified according to five distinct phenotypes. This provides a naturalistic classification that, if confirmed in other studies, could provide a modified taxonomy for the disorders of airways obstruction.
There are a number of methodological issues relevant to the interpretation of the present findings. The first was the selection of subjects from a random population sample, which makes the results broadly generalisable. Subjects who had lung function test results or current respiratory symptoms indicating airflow obstruction were then included in the cluster analysis, so that all forms of airflow obstruction were represented. This approach differed from that used previously, in which only subjects with a diagnosis of asthma and/or COPD were included and those who did not meet these strict diagnostic criteria were excluded 3. It has previously been shown that including only subjects who meet strict diagnostic criteria for asthma and/or COPD excludes the majority of patients with asthma or COPD in the community from such studies 20, 21. The cluster analysis was not undertaken in the whole population sample, as any signal related to airflow obstruction would have been overwhelmed by the noise of the far larger nonobstructed group.
Another issue was the selection of the variables included in the cluster analysis. The analysis was limited to nine variables, chosen to reflect putative mechanisms and clinical features of the different phenotypes and to give the best chance of discriminating between underlying clusters, while avoiding variables that largely measured the same characteristic. Some, but not all, of the chosen variables have previously been shown to contribute to separate dimensions in the description of asthma and COPD 3, 27, 28. Variables such as sex and age, which represent characteristics of the participants rather than expressions of disease phenotype, were not used. Smoking history was an exception because of the common clinical practice of allocating smokers with wheeze or airflow limitation to a diagnosis of COPD.
There was no formal means of assessing the optimal number of clusters with the algorithms used in the present analysis. The approach used was to move the cut points in the tree diagrams to give ≥10 subjects per cluster. Examination of the tree plots suggested that this was a reasonable approach.
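Because no formal criterion was used, the cut-point rule can only be sketched as a hypothetical reading: scan candidate cuts of the same tree and keep the largest number of clusters whose smallest cluster reaches the target of 10 subjects, falling back to the floor of five. The `choose_k` helper and the toy partitions are illustrative assumptions; the authors describe inspecting the tree plots rather than applying an automated rule.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cluster_sizes(labels: List[int]) -> List[int]:
    """Sorted cluster sizes for one label vector."""
    return sorted(Counter(labels).values())


def choose_k(partitions: Dict[int, List[int]], floor: int = 5,
             target: int = 10) -> Optional[Tuple[int, str]]:
    """Largest k whose smallest cluster meets `target`; otherwise the largest
    k whose smallest cluster meets `floor`; otherwise None."""
    for threshold, label in ((target, "target"), (floor, "floor")):
        ok = [k for k, labs in partitions.items()
              if min(cluster_sizes(labs)) >= threshold]
        if ok:
            return max(ok), label
    return None


def labels_from_sizes(sizes: List[int]) -> List[int]:
    """Build a label vector with the given cluster sizes (toy data)."""
    return [c for c, size in enumerate(sizes, start=1) for _ in range(size)]


# Hypothetical nested cuts of one 30-subject tree at k = 2, 3 and 4.
partitions = {2: labels_from_sizes([20, 10]),
              3: labels_from_sizes([10, 10, 10]),
              4: labels_from_sizes([10, 10, 6, 4])}
print(choose_k(partitions))  # (3, 'target')
```

In a hierarchical tree, increasing k only splits existing clusters, so the smallest cluster size can never grow as k rises; scanning for the largest qualifying k is therefore well defined.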
The cluster analysis using the divisive diana algorithm identified four distinct groups. These were very similar to the five distinct groups identified with the agglomerative agnes algorithm, except that one cluster combined agnes clusters 3 and 4 without differentiating subjects on the basis of FeNO or IgE levels. Because FeNO is an important phenotypic feature that reflects the degree of eosinophilic airways inflammation and predicts the response to inhaled corticosteroid (ICS) therapy 29, 30, we consider the cluster analysis using the agnes algorithm to be of greater clinical relevance.
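The correspondence between the two solutions can be made explicit with a cross-tabulation of cluster labels. The sketch below is illustrative only: the label vectors are hypothetical toy data that mimic the pattern described, in which one diana cluster absorbs agnes clusters 3 and 4. With real data, the off-diagonal counts should be inspected rather than relying on the modal mapping alone.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def crosstab(a: List[int], b: List[int]) -> Dict[Tuple[int, int], int]:
    """Count subjects for each (agnes cluster, diana cluster) pair."""
    return dict(Counter(zip(a, b)))


def modal_mapping(a: List[int], b: List[int]) -> Dict[int, int]:
    """Map each agnes cluster to the diana cluster holding most of its members."""
    by_a: Dict[int, Counter] = defaultdict(Counter)
    for x, y in zip(a, b):
        by_a[x][y] += 1
    return {x: counts.most_common(1)[0][0] for x, counts in sorted(by_a.items())}


def merged_clusters(mapping: Dict[int, int]) -> Dict[int, List[int]]:
    """diana clusters that absorb more than one agnes cluster."""
    inverse: Dict[int, List[int]] = defaultdict(list)
    for x, y in mapping.items():
        inverse[y].append(x)
    return {y: xs for y, xs in inverse.items() if len(xs) > 1}


# Hypothetical labels for ten subjects under each method.
agnes = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
diana = [1, 1, 2, 2, 3, 3, 3, 3, 4, 4]
m = modal_mapping(agnes, diana)
print(m)                   # {1: 1, 2: 2, 3: 3, 4: 3, 5: 4}
print(merged_clusters(m))  # {3: [3, 4]}
```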
An important consideration, which is inherent to any study of this kind, is that disease-modifying treatment has the potential to modify clinical variables. This relates primarily to ICS therapy, which was prescribed in 21–42% of subjects in each of the groups. It is not possible to overcome this confounding through study design or analysis, since to include only subjects not prescribed ICS therapy would result in the exclusion of those with the most significant clinical disease, and thereby not be generalisable to a random population with airflow obstruction. However, the observation that the use of ICSs was broadly similar between the groups suggests that it may not have markedly influenced the clusters observed.
Amongst the five distinct phenotypes identified with the agglomerative agnes algorithm, there were two groups with features characteristic of emphysema secondary to tobacco smoking. These two groups exhibited the most severe degree of airflow obstruction and hyperinflation, a marked reduction in gas transfer, a higher prevalence of macroscopic emphysema on CT scanning and similarly heavy exposure to tobacco smoke. However, these two groups were markedly different in terms of other characteristics. One group (cluster 2) had the classical characteristics of emphysema, with no chronic sputum production, minimal bronchodilator reversibility and low levels of serum IgE and FeNO. In contrast, the other group (cluster 1) showed the greatest degree of bronchodilator reversibility and peak flow variability of all five clusters, was strongly atopic in terms of both raised IgE levels and the presence of eczema and rhinitis, and exhibited chronic sputum production. As a result, this group could be considered to represent an overlap disorder with features characteristic of emphysema, atopic asthma and chronic bronchitis.
The other group with chronic sputum production (cluster 5) more closely represented simple chronic bronchitis, without features of emphysema and not associated with heavy tobacco smoking. These subjects commonly exhibited rhinitis and eczema, suggesting a possible role of atopy in the disease pathogenesis.
The other two groups (clusters 3 and 4) could primarily be differentiated on the basis of their FeNO. One group (cluster 3) could be considered to show allergic asthma with eosinophilic airways inflammation, with a markedly raised FeNO (mean 69.2 ppb), high IgE levels and significant bronchodilator reversibility. The other group (cluster 4) showed mild airflow obstruction with minimal bronchodilator reversibility or peak flow variability, no features to suggest a chronic bronchitis or emphysema phenotype, and a low FeNO. It is possible that these latter subjects represent a group with mild disease and intermittent airflow obstruction.
These findings complement those from the previous cluster analyses of patients with diagnosed asthma or COPD 3. In particular, they confirm the presence of an overlap atopic asthma/emphysema/chronic bronchitis group with severe airflow obstruction and marked bronchodilator reversibility. This cluster exhibited the most severe disease based on the degree of airflow obstruction, requirement for hospital admission, prescribed treatment and quality of life, and, as a result, represents a particular priority for research into its pathogenesis and treatment. These findings also support previous findings that there may be more than one asthma group, without emphysema or chronic bronchitis, defined by the presence or absence of atopy and eosinophilic airways inflammation 3, 12. This finding is of clinical relevance in view of the preferentially greater response to ICS therapy in patients with marked eosinophilic airways inflammation 11, 12, 29, 30.
The present findings are also relevant to the generalisability of the results from randomised controlled trials of asthma and COPD 20, 21. Subjects in the overlap atopic asthma/emphysema/chronic bronchitis group with severe airflow obstruction would have been excluded from the major randomised controlled trials of asthma due to their smoking history, and from trials of COPD due to their marked bronchodilator reversibility. As a result, there is an inadequate evidence base for the treatment of this group of subjects with the most severe form of airways obstruction.
Application of the current definitions of asthma, chronic bronchitis, emphysema and COPD results in substantial overlap, with ≥15 potential phenotypes. This potentially represents a major source of confusion due to the difficulties in distinguishing between these phenotypes in terms of their aetiology, clinical features and responses to treatment. We have shown that there is evidence for considering the syndrome of airflow obstruction in terms of the five distinct phenotypes defined by the major variables included in the present cluster analysis. If these findings are confirmed in other populations, these five phenotypes may form the basis of a modified taxonomy for the disorders of airways obstruction. The priority would then be to determine whether the phenotypes vary in their response to different pharmacological treatments. This knowledge could lead to treatment specifically targeted at defined phenotypic groups, rather than at asthma or COPD in general, which represents the current management approach.
Statement of interest
Statements of interest for R. Beasley and the study itself can be found at www.erj.ersjournals.com/misc/statements.dtl
Footnotes
- This article has supplementary material accessible from www.erj.ersjournals.com
- Received November 17, 2008.
- Accepted March 19, 2009.
- © ERS Journals Ltd