There is a need to improve asthma characterisation by integrating multiple aspects of the disease. The aim of the present study was to identify distinct asthma phenotypes by applying latent class analysis (LCA), a model-based clustering method, to two large epidemiological studies.

Adults with asthma who participated in the follow-up of the Epidemiological Study on the Genetics and Environment of Asthma (EGEA2) (n = 641) and the European Community Respiratory Health Survey (ECRHSII) (n = 1,895) were included. 19 variables covering personal characteristics, asthma symptoms, exacerbations and treatment, age of asthma onset, allergic characteristics, lung function and airway hyperresponsiveness were considered in the LCA.

Four asthma phenotypes were distinguished by the LCA in each sample. Two phenotypes were similar in EGEA2 and ECRHSII: active treated allergic childhood-onset asthma and active treated adult-onset asthma. The other two phenotypes were composed of subjects with inactive or mild untreated asthma, who differed by atopy status and age of asthma onset (childhood or adulthood). The phenotypes clearly discriminated populations in terms of quality of life, and blood eosinophil and neutrophil counts.

The LCAs revealed four distinct asthma phenotypes in each sample. Considering these more homogeneous phenotypes in future studies may lead to a better identification of risk factors for asthma.

Asthma is a complex disorder that includes distinct phenotypes, potentially with different aetiologies, natural histories and responses to treatment [1]. Distinct adult asthma phenotypes have been identified for some time but have been based on a limited number of characteristics. Allergic and nonallergic asthma are probably the most commonly discussed phenotypes. Other phenotypes defined by clinical or physiological categories (i.e. severity, age at onset and chronic airflow obstruction), asthma triggers (i.e. exercise, allergens, occupational allergens or irritants) or their pathobiology (i.e. eosinophilic or neutrophilic asthma) have also been proposed [1]. It is expected that a comprehensive examination protocol of asthma patients incorporating several domains of the disease would make it possible to identify more distinct asthma phenotypes. Such widening of the asthma characterisation may allow a better understanding of the aetiology of asthma, by increasing the power to detect environmental and genetic risk factors [2].

For such a purpose, multivariate statistical methods centred on the subjects (and not on the variables, as in regression analysis), such as clustering methods, have already been applied in respiratory epidemiology [3–7] and have recently been described as “steps in the right direction” [8]. This approach, applied to populations of adult asthma patients, identified asthma phenotypes that differed in clinical response to treatment [4], and in clinical, physiological and inflammatory parameters [5]. Latent class analysis (LCA), a model-based clustering method, has been applied in two populations of children from the general population and identified several wheezing phenotypes [3, 6]. These approaches have never been applied in adults with asthma from population-based studies, which, compared with clinical populations, are expected to cover a wider range of asthma phenotypes by including patients with current and remittent asthma.

The aim of the present study was to identify distinct asthma phenotypes for use in aetiological studies by applying LCA in two large epidemiological studies conducted in adults: the European Community Respiratory Health Survey (ECRHS), a European population-based study, and the Epidemiological Study on the Genetics and Environment of Asthma (EGEA), a French case–control and family-based study.


Details regarding the methods are provided in the online supplementary data.


The ECRHS is a European population-based study of adults with an 8-yr follow-up (ECRHSI: 1991–1993, n = 18,356; ECRHSII: 1999–2002, n = 10,933) [9, 10]. The EGEA is a French case–control and family-based study with protocols and questionnaires similar to those of the ECRHS (EGEA1: 1991–1995, n = 2,047; EGEA2: 2003–2007, n = 1,601) [11–13] (supplementary methods, and figs E1 and E2 in the online supplementary data).

The present cross-sectional analysis was conducted on 1,895 subjects who had ever had asthma at ECRHSII (answered “yes” to the question “Have you ever had asthma?”) and on 641 adults who had ever had asthma at EGEA2 (answered “yes” to the question “Have you ever had attacks of breathlessness at rest with wheezing?” or “Have you ever had asthma attacks?”, or being recruited as an asthma case in chest clinics).

Analysis strategy

LCA, a latent variable model that clusters subjects into classes, was used to identify distinct asthma phenotypes [14]. This approach identifies a set of latent classes of individuals who are similar to each other with respect to the variables used in the analysis (see Methods in the online supplementary data). As our objective was to identify homogeneous asthma phenotypes to better assess risk factors for asthma, we focused on personal characteristics (age and sex), phenotypic characteristics (asthma symptoms over the previous 12 months, age of asthma onset, asthma exacerbation, allergic characteristics, lung function and airway hyperresponsiveness (AHR)) and asthma treatment. Asthma treatment was included because it has a direct impact on the clinical features of the disease, may partly reflect disease activity and has already been used in a previous study with a similar purpose [5]. To comply with the conditional independence assumption of LCA (i.e. the assumption that, within each latent class, all input variables are statistically independent of each other), the original list of 18 variables (table E1 in the online supplementary data) was reduced using exploratory factor analysis, a multivariate approach that identifies variables representing similar dimensions (supplementary methods, and tables E2 and E3 in the online supplementary data). The 18 variables were thus reduced to 14 independent variables: age, sex, age of asthma onset, woken up by an attack of coughing, asthma symptom score, chronic cough or phlegm, asthma attacks and asthma exacerbation in the previous 12 months, type of asthma treatment, eczema, rhinitis, atopy (skin prick tests or specific immunoglobulin (Ig)E), total IgE, and forced expiratory volume in 1 s (FEV1).
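For readers less familiar with LCA, the model can be written in a standard textbook form (our notation, not taken from the paper or its supplement). With C latent classes, class prevalences π_c and J categorical indicators, the probability of subject i's response pattern y_i is a finite mixture in which, under conditional independence, the item probabilities simply multiply within each class. The second line gives the posterior membership probabilities on which the modal class assignment is based. When an indicator is missing for a subject, the product is usually taken over that subject's observed indicators only, under a missing-at-random assumption.

```latex
\begin{aligned}
P(\mathbf{y}_i) &= \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} P(y_{ij} \mid c), \qquad \sum_{c=1}^{C} \pi_c = 1,\\
P(c \mid \mathbf{y}_i) &= \frac{\pi_c \prod_{j=1}^{J} P(y_{ij} \mid c)}{\sum_{k=1}^{C} \pi_k \prod_{j=1}^{J} P(y_{ij} \mid k)}.
\end{aligned}
```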
AHR (provocative dose causing a 20% fall in FEV1 ≤1 mg methacholine) was not included in the factor analysis because it was missing, by design, for all subjects with low baseline lung function: an FEV1 <70% predicted in ECRHSII and <80% pred in EGEA2 precluded individuals from undergoing bronchial challenge.
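The AHR definition and its missing-by-design rule can be made concrete with a small Python sketch. The paper does not describe how the provocative dose was derived; here we assume the conventional log-linear interpolation between the last dose with a fall below 20% and the first dose with a fall of at least 20%. The functions `pd20` and `ahr_status`, the dose schedule and the FEV1 values are hypothetical illustrations; only the 1 mg threshold and the 70%/80% pred cut-offs come from the text.

```python
import math
from typing import Optional, Sequence


def pd20(doses_mg: Sequence[float], fev1_l: Sequence[float],
         baseline_fev1_l: float) -> Optional[float]:
    """Cumulative dose (mg) giving a 20% fall in FEV1, by log-linear interpolation.

    doses_mg: increasing cumulative methacholine doses (hypothetical schedule);
    fev1_l: FEV1 measured after each dose. Returns None for a negative challenge.
    """
    prev_dose, prev_fall = None, 0.0
    for dose, fev1 in zip(doses_mg, fev1_l):
        fall = 100.0 * (baseline_fev1_l - fev1) / baseline_fev1_l  # % fall from baseline
        if fall >= 20.0:
            if prev_dose is None:
                return dose  # 20% fall at the first dose: report that dose (simplification)
            frac = (20.0 - prev_fall) / (fall - prev_fall)
            log_pd = math.log10(prev_dose) + frac * (math.log10(dose) - math.log10(prev_dose))
            return 10.0 ** log_pd
        prev_dose, prev_fall = dose, fall
    return None  # no 20% fall reached


def ahr_status(fev1_pct_pred: float, cutoff_pct_pred: float, doses_mg: Sequence[float],
               fev1_l: Sequence[float], baseline_fev1_l: float) -> Optional[bool]:
    """True/False for AHR (PD20 <= 1 mg methacholine); None if no challenge was done."""
    if fev1_pct_pred < cutoff_pct_pred:
        return None  # low baseline lung function: challenge skipped, AHR missing by design
    value = pd20(doses_mg, fev1_l, baseline_fev1_l)
    return value is not None and value <= 1.0


doses = [0.0625, 0.125, 0.25, 0.5, 1.0]   # hypothetical cumulative doses (mg)
fev1 = [2.95, 2.90, 2.70, 2.30, 2.10]     # hypothetical post-dose FEV1 (L)
print(round(pd20(doses, fev1, 3.0), 3))               # 0.42
print(ahr_status(95.0, 70.0, doses, fev1, 3.0))       # True  (ECRHSII-style cut-off)
print(ahr_status(65.0, 70.0, doses, fev1, 3.0))       # None  (challenge not performed)
```

Returning `None` rather than `False` for untested subjects keeps the missingness explicit, which is what allows a method that handles missing data, such as LCA, to use these subjects' other variables.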

In order to determine the number of latent classes, models with different numbers of latent classes were compared using the Bayesian information criterion (BIC) and the model with the lowest BIC was selected. Each subject was assigned to the latent class for which they had the highest membership probability [6].
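The model-selection and assignment steps can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the paper does not name its software, and the sketch assumes binary indicators only (the actual analysis also used multi-category variables such as treatment type), coded 0/1 with `np.nan` for missing values. It fits the latent class model by expectation–maximisation (EM) from several random starts, computes BIC = −2 ln L + p ln n with p = (C − 1) + C·J free parameters, keeps the number of classes with the lowest BIC, assigns each subject to the class with the highest posterior probability, and reports the mean highest posterior probability used in the Results to summarise classification quality. The names `fit_lca` and `select_model` are hypothetical.

```python
import numpy as np


def _e_step(Y0, obs, pi, theta):
    """Posterior class-membership probabilities and total log-likelihood.
    Missing items (obs == 0) drop out of the product over indicators."""
    log_lik = Y0 @ np.log(theta).T + ((1.0 - Y0) * obs) @ np.log(1.0 - theta).T
    log_joint = log_lik + np.log(pi)                   # n x C: log P(y_i, c)
    log_norm = np.logaddexp.reduce(log_joint, axis=1)  # n: log P(y_i)
    return np.exp(log_joint - log_norm[:, None]), log_norm.sum()


def fit_lca(Y, n_classes, n_iter=500, tol=1e-8, seed=0):
    """EM fit of a latent class model with binary indicators (Y: n x J, 0/1 or np.nan)."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    obs = (~np.isnan(Y)).astype(float)
    Y0 = np.nan_to_num(Y, nan=0.0)
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, J))  # P(y_j = 1 | class c)
    prev_ll = -np.inf
    for _ in range(n_iter):
        post, ll = _e_step(Y0, obs, pi, theta)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: class sizes, and item probabilities from observed responses only
        pi = np.clip(post.mean(axis=0), 1e-12, None)
        theta = (post.T @ Y0) / np.maximum(post.T @ obs, 1e-12)
        theta = np.clip(theta, 1e-6, 1 - 1e-6)
    post, ll = _e_step(Y0, obs, pi, theta)
    n_params = (n_classes - 1) + n_classes * J
    return {"pi": pi, "theta": theta, "posterior": post,
            "loglik": ll, "bic": -2.0 * ll + n_params * np.log(n)}


def select_model(Y, max_classes=6, n_starts=5):
    """Fit 1..max_classes classes, keep the lowest-BIC model, assign subjects modally."""
    fits = {}
    for k in range(1, max_classes + 1):
        # several random starts reduce the risk of stopping at a local maximum
        fits[k] = max((fit_lca(Y, k, seed=s) for s in range(n_starts)),
                      key=lambda f: f["loglik"])
    best_k = min(fits, key=lambda k: fits[k]["bic"])
    post = fits[best_k]["posterior"]
    labels = post.argmax(axis=1)                     # class with highest membership probability
    mean_max_post = float(post.max(axis=1).mean())   # mean highest posterior probability
    return best_k, labels, mean_max_post, fits
```

On simulated data with two well-separated classes, `select_model` should pick two classes with a mean highest posterior probability close to 1. With real data, more random starts and a check that the best log-likelihood is reached from several starts are advisable.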

To validate the identified phenotypes, we assessed their discriminative properties with respect to health-related quality of life (HRQoL), measured using the total Asthma Quality of Life Questionnaire (AQLQ) score [15]. We hypothesised that HRQoL differences between the phenotypes identified by LCA would be larger than HRQoL differences between groups defined by a single variable included in the classification (atopy, age of asthma onset and asthma treatment) or by a composite score, such as asthma control assessed according to the Global Initiative for Asthma (GINA) guidelines [13, 16, 17]. To allow comparison of the HRQoL differences observed across these classifications, effect sizes were computed as the mean difference between two groups divided by the pooled standard deviation, as proposed by Cohen [18].
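The effect-size computation can be illustrated with a short Python sketch using only the standard library. We assume the conventional pooled standard deviation, which weights each group's sample variance by its degrees of freedom; the paper cites Cohen [18] but does not spell out the pooling formula. The helper `most_contrasted` (a hypothetical name) picks the pair of groups with the largest absolute effect size, mirroring the comparison of the most contrasted pair of phenotypes in the Results. The AQLQ-like scores are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1, group2):
    """Standardised mean difference: (mean1 - mean2) / pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


def most_contrasted(groups):
    """Return (label_a, label_b, |d|) for the pair with the largest absolute effect size."""
    return max(
        ((a, b, abs(cohens_d(groups[a], groups[b]))) for a, b in combinations(groups, 2)),
        key=lambda t: t[2],
    )


# Hypothetical total AQLQ scores (1-7 scale, higher = better HRQoL) for three groups
scores = {
    "A": [5.0, 6.0, 7.0],
    "B": [3.0, 4.0, 5.0],
    "C": [4.0, 5.0, 6.0],
}
print(most_contrasted(scores))  # ('A', 'B', 2.0)
```

Because d is expressed in pooled-SD units, an effect size of 1.4 means the two group means differ by 1.4 pooled standard deviations, which is what makes differences from classifications with different group compositions comparable.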

In the EGEA2 study, a further dimension of validity was studied by comparing two inflammatory markers, blood eosinophil and neutrophil counts, between the phenotypes identified by the LCA.


Description of populations

The populations studied are described in table 1. Individuals in ECRHSII were older and more often female than individuals in EGEA2. After adjustment for age and sex, the prevalence of asthma symptoms over the previous 12 months was comparable between the two studies, except for shortness of breath following activity (less often reported in ECRHSII than in EGEA2) and nocturnal shortness of breath (more often reported in ECRHSII than in EGEA2) (table 1). Reflecting the different study designs, individuals in ECRHSII less often had early-onset asthma, allergic characteristics and severe exacerbations.

Table 1– Description of the individuals with asthma in the European Community Respiratory Health Survey (ECRHS)II and Epidemiological Study on the Genetics and Environment of Asthma (EGEA)2 studies

Latent class analysis

Based on the BIC, a model with four latent classes was selected as the best model for the ECRHSII data. The mean highest posterior probability was high (83%), indicating that participants were assigned to classes with fairly high probability. Phenotype A (36.1%), “active treated allergic childhood-onset asthma”, was characterised by individuals with atopic asthma and active disease (asthma symptoms and asthma treatment) at the time of examination (table 2). Compared with the other three groups, individuals in this group more often had AHR. Phenotype B (19.2%), “active treated adult-onset asthma”, was characterised by older subjects with adult-onset asthma (compared with the three other groups); they were mostly female with active disease at the time of examination, and many had an asthma symptom score of three or more and reported an asthma attack in the previous 12 months. Compared with the other three groups, the probability of chronic cough or phlegm was highest in this group. Phenotypes C (28.9%) and D (15.8%) were both characterised by individuals with no or few asthma symptoms and no asthma treatment at the time of examination; these two groups differed in atopy status, so phenotype C was labelled “inactive/mild untreated allergic asthma” and phenotype D “inactive/mild untreated nonallergic asthma”. Compared with the other three groups, allergy-related variables (rhinitis, atopy and IgE <100 IU·mL−1) and AHR were lowest in phenotype D.

Table 2– Characteristics of the European Community Respiratory Health Survey (ECRHS)II population and probability of individuals presenting the characteristics, by inclusion in each of the four phenotypes identified by the latent class analysis

Similarly, in the EGEA2 population, the best-fitting model had four latent classes (phenotypes E–H). The mean highest posterior probability (88%) was similar to that observed in ECRHSII (83%). Phenotype E (34.6%), “active treated allergic childhood-onset asthma”, was composed of young individuals with childhood-onset asthma, atopy and active disease at the time of examination (table 3). Phenotype F (15.0%), “active treated adult-onset asthma”, was characterised by older subjects with adult-onset asthma and active disease at the time of examination (92% had more than one symptom and 68% used daily asthma treatment). Compared with the other three classes, individuals in this group more often reported asthma exacerbation and chronic cough or phlegm, and more often had an FEV1 <80% pred. Phenotypes G and H were both composed of subjects with no or few asthma symptoms and little or no asthma treatment, but differed mainly in age, age of asthma onset and allergic characteristics. Phenotype G (24.8%) was labelled “inactive/mild untreated allergic childhood-onset asthma” and phenotype H (25.6%) “inactive/mild untreated adult-onset asthma”. Compared with the three other phenotypes, allergy-related variables (atopy and IgE <100 IU·mL−1) and AHR were lowest in phenotype H.

Table 3– Characteristics of the Epidemiological Study on the Genetics and Environment of Asthma (EGEA)2 population and probability of individuals presenting the characteristics, by inclusion in each of the four phenotypes identified by the latent class analysis

Discriminative properties of the identified subgroups with regard to quality of life, and blood eosinophil and neutrophil counts

In both studies, strong associations were found between the four phenotypes and the total AQLQ score (fig. 1). In ECRHSII, the difference in HRQoL score between the two most contrasted asthma phenotypes identified by the LCA (B and C) corresponded to an effect size of 1.4, which was larger than any of the differences in HRQoL score observed for the other asthma classifications (effect sizes <1.3) (fig. 1a). Similarly, in EGEA2, the largest difference in total AQLQ score across all asthma classifications was observed between two asthma phenotypes identified by the LCA (F and G; effect size 2.0); by comparison, the effect size for controlled versus uncontrolled asthma was 1.7 (fig. 1b). In both samples, the “active treated adult-onset asthma” phenotype (B and F) was associated with the poorest HRQoL.

Figure 1– Differences in the total Asthma Quality of Life Questionnaire (AQLQ) score between the asthma phenotypes identified by latent class analysis and other a priori asthma subgroups (atopy, age of asthma onset, asthma treatment, asthma control defined following the Global Initiative for Asthma 2006 guidelines [17]). Results observed in a) the European Community Respiratory Health Survey II and b) the Epidemiological Study on the Genetics and Environment of Asthma 2 are presented. Bars represent means and whiskers represent standard deviations. #: p<0.0001; : p = 0.0002; +: p = 0.13.

In the EGEA2 study, blood eosinophil and neutrophil counts were strongly associated with the phenotypes (p<0.0001) (table 4). Eosinophil count was highest in phenotype E (active treated allergic childhood-onset asthma) and lowest in phenotype H (inactive/mild untreated adult-onset asthma). Neutrophil count was highest in phenotype F (active treated adult-onset asthma) and lowest in phenotype G (inactive/mild untreated allergic childhood-onset asthma).

Table 4– Blood eosinophil and neutrophil counts in the Epidemiological Study on Genetics and Environment of Asthma (EGEA)2 population according to the phenotypes identified by the latent class analysis


Our LCA of two large epidemiological studies revealed four distinct asthma phenotypes in each study. Two of these phenotypes were similar in both populations (“active treated allergic childhood-onset asthma” and “active treated adult-onset asthma”) and corresponded to phenotypes encountered in clinical practice. The other two phenotypes were composed of subjects with inactive or mild untreated asthma, who differed from each other in atopy and age of asthma onset. Interestingly, the asthma phenotypes identified by the LCA discriminated HRQoL levels more effectively than simple clinical asthma classifications. Blood eosinophil and neutrophil counts were significantly associated with these phenotypes.

One strength of the present study lies in the use of two large, well-characterised populations of individuals with asthma. To our knowledge, this is the first time that a clustering approach aimed at identifying asthma phenotypes has been applied in adults who had ever had asthma, recruited in a population-based study, allowing us to cover a large range of asthma phenotypes. The purpose of singling out homogeneous asthma phenotypes in epidemiological settings is to increase the power to identify risk factors associated with asthma and, therefore, to better understand disease mechanisms. In this context, including subjects who had ever had asthma, and not only those with current asthma as in clinical settings, may provide complementary insights into persistent versus remittent asthma. Conducting the analysis independently in the two asthma populations allowed us to assess the degree to which the phenotypes obtained differed between two epidemiological studies that rely on standardised protocols but have different designs (a European community-based study, and a French case–control and family-based study). It is remarkable that similar results were observed in both populations. However, the lack of biomarkers in ECRHSII prevented us from including markers of inflammation in the cluster analysis, although inflammation markers have previously been identified as major phenotyping criteria [4, 19].

LCA, a model-based clustering approach, was chosen because it is well suited to the categorical variables included in the analysis and handles missing data, thereby allowing the whole sample to be considered. Applied to children with asthma, this method led to the identification of different wheezing phenotypes [3, 6]. Although very interesting, the findings of such exploratory analyses need to be interpreted in light of future work addressing whether the identified phenotypes are relevant from clinical and aetiological perspectives.

Replication of the results in other datasets is important when using these exploratory approaches; however, such replication is difficult, as the phenotypes identified depend on the population under study (clinical or population based) and on the set of selected variables. Nevertheless, it is noteworthy that the phenotypes identified in the present study overlap with clusters described by Haldar et al. [4] and Moore et al. [5], which relied on different study designs and different a priori lists of selected variables. All three studies identified a phenotype composed of subjects with early-onset atopic asthma. As previously identified in a primary care dataset [4], we also identified groups of benign (mild) asthma; mild asthma was split into two groups according to atopy in ECRHSII and age at onset in EGEA2. Interestingly, phenotype B in ECRHSII (mainly females with late-onset disease and no atopy) and phenotype F in EGEA2 (mainly subjects with late-onset disease, no atopy and airflow limitation) differed from the other three phenotypes in each study in ways similar to phenotype 5 in the study by Moore et al. [5]. Moreover, neutrophils were highest in phenotype 5 in the study by Moore et al. [5] and in phenotype F in EGEA2 in the present study.

Our findings suggest that treatment is an important feature to consider when identifying subgroups of subjects in asthma populations in developed countries. This observation did not seem to depend on geographical differences in clinical practice, given the international and multicentre nature of ECRHS. Although factors related to healthcare utilisation and social criteria are associated with the use of asthma treatment, treatment use is strongly associated with the activity and severity of the disease. Combining clinical features with the level of asthma treatment to distinguish subgroups of subjects with different severity, as proposed in the GINA 2002 guidelines, is supported by epidemiological results in populations [20]. Furthermore, a genome-wide linkage analysis of an asthma quantitative score conducted in the EGEA study showed that scoring asthma severity on both clinical items and asthma treatment increased the power to detect linkage, compared with clinical items alone [21].

One of the earliest approaches to identifying asthma phenotypes was the differentiation between allergic and nonallergic asthma [22–24]. Allergy-related variables played a critical role in the classification in both studies, but to a greater extent in ECRHSII, where two allergic phenotypes (A and C) and two nonallergic phenotypes (B and D) were clearly identified. The less critical role of allergy in the classification in EGEA2 may be explained by the higher prevalence of atopy in EGEA2 than in ECRHSII, probably resulting from the different study designs: EGEA included cases recruited from chest clinics as well as children, populations more prone to allergic asthma. Early-onset and adult-onset asthma are well-established asthma phenotypes [25]. A recent study conducted in ECRHS also provided epidemiological evidence for distinguishing adult-onset from early-onset asthma [26]. Moore et al. [5] also showed, in their population of adults with more severe asthma, the importance of age of onset in phenotyping asthma. Asthma phenotypes show age-related variations [27] and, accordingly, age at examination was identified in the EGEA2 study as a major phenotyping criterion. Moore et al. [5] and Haldar et al. [4] also identified groups composed of older subjects.

The distinct asthma phenotypes defined by the LCA exhibited strong differences in HRQoL, larger than those obtained with other existing asthma classifications. It is also reassuring that our results, obtained with an exploratory method, are consistent with observations from clinical practice: phenotypes B and F (both “active treated adult-onset asthma”) were associated with the poorest HRQoL and the lowest FEV1. These results show the strong discriminative properties of our classification with regard to HRQoL, the patient's own perception of their health status. It is also noteworthy that, in EGEA2, the phenotypes showed significant differences in eosinophil and neutrophil counts, two objective measurements of the inflammatory component of the disease [19].

Despite all the research efforts in asthma genetics over the last decade, the genetic basis of asthma remains largely unknown [28]. Recent genome-wide association studies have confirmed the genetic heterogeneity of asthma according to age at onset [29]. To make better use of existing genomic data, phenotypic heterogeneity needs to be reduced by improving phenotype definitions [2]. Genetic studies relying on more homogeneous phenotypes, such as those defined by a multivariate approach like ours, appear to be a promising way to address this problem.

In summary, the current analyses provide further evidence of asthma heterogeneity in adults in the general population and support the use of multivariate statistical techniques that allow a more integrated classification of asthma. Considering these more homogeneous phenotypes in future studies could lead to the identification of novel risk factors, genetic as well as environmental, and to an improved understanding of the disease.


We thank the Epidemiological Study on Genetics and Environment of Asthma (EGEA) cooperative group members as follows. Coordination: F. Kauffmann (Inserm CESP/U 1018, Villejuif, France), F. Demenais (genetics; Inserm U946, Paris, France) and I. Pin (clinical aspects; CHU Grenoble, Grenoble, France). Respiratory epidemiology: M. Korobaeff (EGEA1) and F. Neukirch (EGEA1) (both Inserm, U700, Paris), I. Annesi-Maesano (Inserm U707, Paris), F. Kauffmann, N. Le Moual, R. Nadif and M.P. Oryszczyn (Inserm CESP/U1018, Villejuif, France), and V. Siroux (Inserm U823, Grenoble, France). Genetics: J. Feingold (Inserm U393, Paris), E. Bouzigon, F. Demenais and M.H. Dizier (all Inserm U946, Paris), and I. Gut and M. Lathrop (both Centre National de Génotypage, Evry, France). Clinical centres: I. Pin and C. Pison (both CHU Grenoble, Grenoble, France), D. Ecochard (EGEA1), F. Gormand and Y. Pacheco (all CHU Lyon, Lyon, France), D. Charpin (EGEA1) and D. Vervloet (both CHU Marseille, Marseille, France), J. Bousquet (CHU Montpellier, Montpellier, France), A. Lockhart (EGEA1), R. Matran (both Hopital Cochin, Paris), E. Paty and P. Scheinmann (both Hopital Necker, Paris), and A. Grimfeld and J. Just (both Hopital Trousseau, Paris). Data and quality management: J. Hochez (EGEA1; Inserm ex-U155, Paris), N. Le Moual (Inserm CESP/U1018, Villejuif, France), C. Ravault (Inserm ex-U780, Villejuif, France), N. Chateigner (Inserm ex-U794, Paris), and J. Ferran (CHU Grenoble, Grenoble, France).

The authors thank all those who participated in the setting of the EGEA study and the various aspects of the examinations involved: interviewers; technicians for lung function testing and skin prick tests, blood sampling and IgE determinations; coders; those involved in quality control, data and sample management; and all those who supervised the study in all centres. The authors are grateful to the three CIC-Inserm centres in Necker, Grenoble and Marseille who supported the study and in which subjects were examined.

We thank the Principal Investigators and Senior Scientific Team in ECRHS: P. Vermiere (deceased) and J. Weyler (both University of Antwerp, Antwerp, Belgium); R. Jogi (University Hospital Tartu, Tartu, Estonia); F. Neukirch and B. Leynaert (both Inserm U700, Paris, France); I. Pin (CHU Grenoble, Grenoble, France); C. Raherisson (CHU Bordeaux, Bordeaux, France); J. Bousquet (CHU Montpellier, Montpellier, France); J. Heinrich and M. Wjst (both Helmholtz Zentrum Munchen, Munich, Germany); K. Richter (University of Hamburg, Hamburg, Germany); T. Gislasson (University of Iceland, Reykjavik, Iceland); R. de Marco (University of Verona, Verona, Italy); M. Bugiani (Azienda Sanitaria Locale 4 – Regione Piemonte, Turin, Italy); I. Cerveri and A. Marinoni (both University of Pavia, Pavia, Italy); M. Kerkhof and J. Schouten (both University of Groningen, Groningen, the Netherlands); E. Omenaas and C. Svanes (both University of Bergen, Bergen, Norway); J.M. Antó, J. Sunyer, M. Kogevinas, J.P. Zock, X. Basagaña, A. Jaen and F. Burgos (all CREAL, Barcelona, Spain); J. Maldonado (Hospital Juan Ramón Jiménez, Huelva, Spain); J. Moratalla (Complejo Hospitalario Universitario de Albacete, Albacete, Spain); F. Payo (Hospital Universitario Central de Asturias, Oviedo, Spain); I. Urrutia (Hospital Galdakao-Usansolo-Osakidetza, Galdakao, Spain); C. Janson (University of Uppsala, Uppsala, Sweden); K. Franklin and B. Forsberg (both University of Umea, Umea, Sweden); K. Toren (University of Gothenburg, Gothenburg, Sweden); N. Kunzli and N. Probst-Hensch (both Swiss Tropical Institute, Basle, Switzerland); D. Jarvis (Imperial College London, London, UK); M. Burr and J. Watkins (both University of Cardiff, Cardiff, UK); and S. Buist and M. Osborne (both Oregon Health and Sciences University, Portland, OR, USA).

The authors are indebted to all the individuals who participated in the ECRHS and EGEA studies, without whom these studies would not have been possible.


  • This article has supplementary material available from www.erj.ersjournals.com

  • Support Statement

    The Epidemiological Study on Genetics and Environment of Asthma (EGEA) was supported, in part, by grants from: Merck Sharp and Dohme; the Hospital Program of Clinical Research (Paris, France); National Research Agency (ANR) Health Environment, Health-Work Program; ANR Biological Collections for Health Program; French Agency of Health Safety, Environment and Work; National Scientific Committee of the Medico-Technology Support at Home; and Isere Committee Against Respiratory Diseases. J. Garcia-Aymerich has a researcher contract from the Instituto de Salud Carlos III (CP05/00118), Ministry of Health (Madrid, Spain).

  • Statement of Interest

    A statement of interest for the study itself can be found at www.erj.ersjournals.com/site/misc/statements.xhtml

  • Received July 28, 2010.
  • Accepted November 21, 2010.

