## Abstract

There is abundant literature on how to select and statistically deal with predictors in prediction models. Less attention has been paid to the choice of the outcome. We assessed the impact of different asthma definitions on prevalence estimates and on the performance of prediction models.

We searched PubMed and extracted data of definitions used to diagnose childhood asthma (between 6 and 18 yrs) in cohort studies. Next, using data from an ongoing cohort study (n = 186), we constructed and compared four prediction models which all predict asthma at age 6 yrs, using a fixed set of predictors and four different definitions in turn. We defined an area of clinical indecision (posterior probability between 25% and 60%) and calculated the number of children who remained inside this area.

122 papers yielded 60 different definitions. Prevalence estimates varied between 15.1% and 51.1% depending on the asthma definition used. The percentage of children whose posterior asthma probability was in the area of clinical indecision varied from 14.9% to 65.3%.

Variation in definitions and its effect on the performance of prediction models may be another source of otherwise inexplicable variation in daily clinical decision making. More uniformity of operational asthma definitions seems needed.

Asthma is the leading chronic disease among children. Many definitions of asthma have been proposed in guidelines and used in follow-up studies and clinical trials 1–3. As with many other diseases, asthma prevalence estimates vary widely across time and regions. Even in one region at a single point in time, asthma prevalence estimates may differ by the use of different populations, study designs and illness definitions. Usually, these sources are difficult to disentangle. The same applies to asthma prevalence estimates that are conditional on (one or more) risk predictors such as an atopic constitution or exposure to tobacco smoke. Such conditional prevalence estimates are usually obtained through (multivariable) prediction models 4–7. Prevalence estimates can be useful for healthcare resource planning purposes 8, 9, while prediction models are developed mostly for clinical applications 4–7.

The definition of asthma in young children is complex and varies across authoritative sources 1–3. Even if the conceptual definition of asthma is unequivocal, the operational definitions used in empirical studies may well differ. It is unclear to which extent asthma prevalence estimates are determined by the particular operational illness definition. Therefore, we set out to provide an overview of recently used definitions to diagnose asthma in children aged between 6 and 18 yrs in published literature of research in which asthma was an end-point. We then assessed the impact of four exemplary asthma definitions on prevalence estimates at age 6 yrs. Finally, we determined the impact of definition choice on the performance of prediction models, which all predict asthma at age 6 yrs, by constructing four logistic regression models with a fixed set of three known predictors of asthma using four different, but commonly used, definitions in turn.

## METHODS

### Definitions and operationalisations of asthma

In a MEDLINE search, using PubMed, we searched for studies published between 1998 and 2008 using the MeSH terms “asthma”, “children” and “cohort studies”. Studies that fulfilled the following criteria were included: 1) cohort design; 2) asthma as primary or secondary outcome; 3) participants aged between 0 and 18 yrs; 4) asthma diagnosed between 6 and 18 yrs; 5) ≥100 children included; and 6) English as language of publication.

Papers were selected by one author (K.E. van Wonderen) based on titles and abstracts. If title and abstract were unclear, the full text papers were screened using the same criteria. A second author (L.B. van der Mark) checked a randomly selected 10% of the papers that K.E. van Wonderen had excluded for inadvertent exclusions.

From all included articles one author (K.E. van Wonderen) extracted the following information: 1) definition of asthma; and 2) operationalisation of the definition, that is, the source of the information used to diagnose asthma (*e.g.* parents or medical records) and the instrument used to diagnose asthma (*e.g.* questionnaire or list of diagnostic codes). A second author (L.B. van der Mark) checked the extracted information of a randomly selected 10% of all included papers.

### Prevalence estimates and prediction models' performances using different definitions

To assess the variation in prevalence and prediction model performance we used data from the ARCADE (Airway Complaints and Asthma Development) study, an ongoing prospective cohort study 10. One of the aims of ARCADE is to construct a primary care-based asthma prediction model for pre-school children at risk of developing asthma. Briefly, between 2004 and 2006, 1- to 5-yr-old children at risk for developing asthma were selected from general practices in the Netherlands. “At risk” in this study was defined as “visited the general practitioner with recurrent coughing (two or more visits), wheezing (once or more) or shortness of breath (once or more) in the 12 months previous to enrolment.” All children are being followed up to the age of 6 yrs. At age 6 yrs, a definitive diagnosis of asthma is made according to the operational definition used in the ARCADE study (see below).

For this contribution, we used data from 186 children, aged between 2 and 4 yrs at enrolment, whose follow-up was completed and diagnosis of asthma was made at age 6 yrs. Children aged 5 yrs at enrolment were excluded because the asthma definitions cover a period of ≥12 months before the assessment at age 6 yrs. For these children, the outcome period would overlap with the time of enrolment, which precludes prediction in a strict sense.

ARCADE was approved by the Central Committee on Research Involving Human Subjects (CCMO/P04.0098C; Den Haag, the Netherlands). Written informed consent was obtained from the parents prior to all measurements.

### Development of different prediction models

We constructed four logistic regression models using a fixed set of three binary known predictors of asthma 11–14. Predictors were: 1) wheezing (during the previous year, but apart from colds); 2) eczema (during the previous year); and 3) specific immunoglobulin (Ig)E directed against house dust mite, cat or dog dander 12, 15. Information on the predictors was collected at time of enrolment (in the ARCADE study) 10.

The first three prediction models were constructed using three operational definitions taken from the literature search (table 1; see Results section for details). The definitions were selected based on the following criteria: 1) the definition could be constructed using the ARCADE data; and 2) definitions differed by at least one key clinical component, to avoid comparing definitions that are almost identical (table 2). A fourth prediction model was added using the operational definition used in the ARCADE study; that is, a combination of current symptoms (complaints of wheezing and/or shortness of breath and/or recurrent coughing) and/or use of β₂-agonists and/or inhaled corticosteroids, both for any length of time during the previous 12 months, in combination with airway hyperresponsiveness to methacholine. Hyperresponsiveness is defined as a provocation concentration of methacholine inducing a 20% fall in forced expiratory volume in 1 s of ≤8.0 mg·mL⁻¹ 12, 16, 17.

Thus, four asthma prediction models were constructed with a fixed set of three known predictors determined at enrolment (children aged 2–4 yrs) using four different definitions, which all predict asthma at the age of 6 yrs.
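As a sketch of the model form only, each model can be written as a logistic regression on the three binary predictors. The symbols are ours, the fitted coefficients are not reported in this section, and a main-effects model without interaction terms is an assumption:

```latex
\[
P(Y_d = 1 \mid x_1, x_2, x_3)
  = \frac{1}{1 + \exp\!\bigl[-(\beta_{0,d} + \beta_{1,d} x_1 + \beta_{2,d} x_2 + \beta_{3,d} x_3)\bigr]},
\qquad x_1, x_2, x_3 \in \{0, 1\}
\]
```

Here $Y_d$ is asthma at age 6 yrs according to definition $d$, and $x_1$, $x_2$ and $x_3$ indicate wheezing, eczema and specific IgE at enrolment. Under this form, three binary predictors allow at most $2^3 = 8$ distinct posterior probabilities per model.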

### Statistical methods

By multiple imputation, 44 missing IgE values were estimated using several baseline variables collected in ARCADE, such as breastfeeding, history of asthma of the parents and whether the child awoke as a consequence of shortness of breath 18, 19. Five imputed datasets were created and 5×4 regression analyses were run, one for each dataset–definition combination. All further analyses used the mean of the five datasets per definition. Conservatively, per definition, 95% confidence intervals were determined taking the lowest lower bound and highest upper bound of all imputed datasets.
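The conservative pooling rule described above can be sketched as follows. This is a minimal illustration rather than the Stata code used in the study; the function name `pool_conservatively` and the example numbers are hypothetical.

```python
from statistics import mean


def pool_conservatively(per_dataset):
    """Combine results from several imputed datasets for one definition.

    per_dataset: list of (estimate, ci_lower, ci_upper) tuples,
    one tuple per imputed dataset.
    Returns the mean estimate, the lowest lower bound and the
    highest upper bound.
    """
    if not per_dataset:
        raise ValueError("need at least one imputed dataset")
    estimates, lowers, uppers = zip(*per_dataset)
    return mean(estimates), min(lowers), max(uppers)


# Hypothetical results for five imputed datasets (not study data).
example = [
    (0.70, 0.61, 0.79),
    (0.71, 0.62, 0.80),
    (0.69, 0.60, 0.78),
    (0.72, 0.63, 0.81),
    (0.70, 0.61, 0.80),
]
pooled_estimate, pooled_lower, pooled_upper = pool_conservatively(example)
```

The resulting interval is at least as wide as each single-dataset interval, which is why the rule is described as conservative. It is not the same as pooling with Rubin's rules, which combine within- and between-imputation variance.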

First, the prevalences for the four definitions were compared. Next, the posterior probabilities for the four prediction models were summarised using the 10th, 50th, and 90th centiles of their distributions. To illustrate the potential clinical consequences of these differences between the posterior probability distributions, two decision thresholds were selected. The first threshold was set at 25%, assuming that below that threshold a clinician may well choose a “wait and see” policy as the chance that the child has asthma at age 6 yrs is relatively small. The second threshold was set at 60%, assuming that above that threshold a clinician may pursue a more active management strategy, perhaps including a prescription of anti-inflammatory drugs. Thus, an area of clinical indecision was defined. To be able to focus on a single outcome, the performances of the prediction models were compared using the proportion of patients who remained in the area of clinical indecision, that is, whose posterior asthma probabilities were between 25% and 60%.
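The proportion of children in the area of clinical indecision can be computed as in the sketch below. The function name is hypothetical, and treating both thresholds as inclusive is an assumption, since the text only says "between 25% and 60%".

```python
def proportion_in_indecision(probabilities, lower=0.25, upper=0.60):
    """Share of posterior probabilities inside the area of clinical indecision.

    probabilities: posterior asthma probabilities on the 0-1 scale.
    lower, upper: decision thresholds (both treated as inclusive here,
    which is an assumption).
    """
    if not probabilities:
        raise ValueError("need at least one posterior probability")
    inside = sum(1 for p in probabilities if lower <= p <= upper)
    return inside / len(probabilities)


# Hypothetical posterior probabilities for four children (not study data).
share = proportion_in_indecision([0.10, 0.30, 0.50, 0.70])  # 2 of 4 inside
```

A lower proportion means that the model moves more children to either side of the two thresholds, where a management decision is more clear-cut.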

Finally, the areas under the receiver operating characteristic curves (AUCs), a commonly used measure of overall predictive performance, were compared between the models. All differences and their 95% confidence intervals were calculated using bootstrapping procedures (1,000 replications). All calculations were performed using Stata version 10 (Stata Corp., College Station, TX, USA).
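A bootstrap confidence interval for a difference between two definitions applied to the same children could be obtained along the following lines. This is a sketch under assumptions: the text does not state which bootstrap interval was used, so a percentile interval is shown, and the function name and toy labels are hypothetical.

```python
import random


def bootstrap_prevalence_difference(labels_a, labels_b,
                                    n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for prevalence(a) minus prevalence(b).

    labels_a, labels_b: 0/1 asthma labels for the same children,
    in the same order, so children are resampled as pairs.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample children with replacement, keeping both labels together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(labels_a[i] - labels_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = (sum(labels_a) - sum(labels_b)) / n
    return point, lower, upper


# Toy labels for ten children (not study data).
toy_a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
toy_b = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
point, lower, upper = bootstrap_prevalence_difference(toy_a, toy_b)
```

Resampling children rather than labels keeps the pairing intact, which matters because all definitions are evaluated in the same cohort.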

## RESULTS

### Literature search for definitions and operationalisations

The overall search yielded 1,238 papers, of which 122 were included. There were no discordances between the two authors with respect to inclusion or extracting information on definitions and operationalisations.

In total, the 122 included papers yielded 60 different definitions (table 2). The most common definitions were: 1) a doctor's diagnosis of asthma ever (10%); 2) a doctor's diagnosis of asthma (time unspecified) (8%); 3) asthma ever (6%); 4) a doctor's diagnosis of asthma ever in combination with asthma symptoms in the previous 12 months (5%); 5) a doctor's diagnosis of asthma ever in combination with symptoms of asthma in the previous 12 months or the use of asthma medication (5%). In total, 34% of the papers used one of these definitions.

The 60 definitions may be categorised in various groups. 62 papers (51%) used a definition which was based on a doctor's diagnosis of asthma with or without other symptoms, medication use or any time constraint. Bronchial hyperresponsiveness or spirometry was a component of the definition in 13 (11%) of the papers. Definitions based on symptoms alone were seen in 10 (8%) of the papers. 35 papers (28%) used a definition which was a combination of symptoms, (doctor's) diagnosis of asthma and asthma medication use. Two papers (2%) did not mention any definition.

The three most prevalent operationalisations were: a questionnaire filled in by the parents and/or child (58%), an interview with the parents and/or child (20%) and a clinical examination by a health professional (7%). In 2% of the definitions it was unclear which operationalisation was used.

### Prevalence estimates and prediction models' performances using different definitions

Table 1 shows the four operational definitions which were used to estimate prevalences and predictive performances of prediction models (see table 3).

For the definition “doctor's diagnosis of asthma ever” (definition 1, “Dr-ever”) it did not seem logical to construct a prediction model since this definition covers the whole period back to birth, which defeats the purpose of prediction. Therefore, we did not determine the prediction model performance for the definition Dr-ever.

#### Prevalence estimates

Table 3 shows that prevalence estimates using different definitions ranged from 15.1% (definition 2, “Dr-ever&whe”) to 51.1% (definition 4, “BHR&sym/med”). The prevalence estimate for definition 2 (Dr-ever&whe) is smaller than that according to definition 1 (Dr-ever) since the former requires wheezing and is, therefore, more stringent.

Although a methacholine challenge test was a component of two definitions (definition 3, “BHR&whe” and definition 4, “BHR&sym/med”), the prevalence estimates differed greatly between them, with a difference of −25.3% (95% CI −31.7 to −19.4).

Figure 1 shows that 15% (28 out of 186) and 46% (86 out of 186) of the children were defined as having and not having asthma by all definitions, respectively (overall agreement 61%). This figure also shows that almost all children (95 out of 100) who were defined as having asthma by definition 1 (Dr-ever), definition 2 (Dr-ever&whe), or definition 3 (BHR&whe) had asthma according to definition 4 (BHR&sym/med).

Table 3 shows that the prevalence estimates for definition 1 (Dr-ever) and definition 3 (BHR&whe) were similar, 47 out of 186 (25.3%) and 48 out of 186 (25.8%) respectively. However, figure 1 also shows that definitions 1 and 3 nevertheless disagree in 39 out of 186 (21%) of children.

#### Posterior probability distribution

Table 3 also shows the posterior probability distributions of the three prediction models (as mentioned before, definition 1 (Dr-ever) was omitted from this analysis).

Definitions 2 (Dr-ever&whe) and 3 (BHR&whe) showed a similar posterior probability distribution. The posterior probabilities for definition 4 (BHR&sym/med) differed greatly from those for the other two definitions. In particular, the 90th centile of definition 2 (Dr-ever&whe; 37.4%) is similar to the 50th centile of definition 4 (BHR&sym/med; 40.7%).

#### Predictive performances of prediction models (thresholds)

Table 3 (4th column) shows the predictive performance of the models using the proportion of children who remained in the area of clinical indecision. The percentage of children in this area varied from 14.9% (definition 2 “Dr-ever&whe”) to 65.3% (definition 4 “BHR&sym/med”).

#### Areas under the receiver operating characteristic curves

The AUC may be interpreted as the probability that from two randomly drawn children, one with asthma and one without, the one with asthma is assigned a higher probability 20. The AUCs varied from 0.67 for definition 4 (BHR&sym/med) to 0.76 for definition 2 (Dr-ever&whe) with their differences varying from 4% to 9%.
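The pairwise interpretation of the AUC given above translates directly into code. The sketch below counts, over all case–control pairs, how often the child with asthma has the higher posterior probability. Counting ties as one half is the usual convention and is our assumption here; the function name and example values are hypothetical.

```python
def auc_concordance(probabilities, outcomes):
    """AUC as the share of (asthma, no asthma) pairs ranked correctly.

    probabilities: posterior probabilities from a prediction model.
    outcomes: matching 0/1 labels (1 = asthma under the chosen definition).
    Tied probabilities count as half a concordant pair.
    """
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    controls = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one child with and one without asthma")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical values: 3 of the 4 case-control pairs are ranked correctly.
auc = auc_concordance([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])  # 0.75
```

With only three binary predictors a model can produce few distinct probabilities, so tied pairs are common and the handling of ties affects the result.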

## DISCUSSION

### Main findings

In 122 papers, we found 60 different operational definitions. Applied in a single cohort, we found that prevalence estimates and posterior probabilities varied substantially with the operational definition used. Similarly, the proportion of children that remained in an area of clinical indecision varied greatly with the definition chosen. Although the AUCs between the models were fairly similar, the predictive performances of the models clearly were not.

### Strength and limitations

A strength of this study is that the comprehensive search of the literature for all published cohort studies on asthma in 6- to 18-yr-old children in the previous 10 yrs is unlikely to have missed many operational definitions. In addition, the use of a single cohort, thus fixing time, region, study population and study design, allowed us to isolate the effect of definitions. We see the following limitations. First, 10% of the papers excluded by the first author were checked by a second author. Although there were no discordances between first and second author, we cannot exclude that on a total of 1,126 papers, up to 30 (*i.e.* the upper 95% confidence limit of an exact confidence interval around the proportion of zero found in a sample size of 112) discordances between the authors might have occurred. However, it is unlikely that these papers contained different asthma definitions and would affect our findings and message to any important degree. Secondly, different prediction models were compared using a fixed set of three plausible predictors of asthma. Prediction models using other predictors might show different results. Thirdly, we were unable to construct prediction models based on clinical examination or medical records, since the ARCADE data do not contain such data. Fourthly, 44 observations for specific IgE were missing. In general, test results that cannot be obtained reliably in clinical practice should not be used in prediction models. However, we believe that the missing data were due to the research situation. In a number of children, IgE measurements were not obtained because parents did not make an appointment at their general practitioner (GP)'s surgery. Furthermore, specific IgE outcomes were missing due to GP assistants taking insufficient blood for analysis. Although GP assistants had received instructions on how to perform the measurements, occasionally they failed to collect enough blood. 
We believe that these problems will seldom occur in practice where, most likely, parents will follow their doctor's advice and visit a dedicated laboratory. Finally, to illustrate potential clinical consequences, we introduced two thresholds. These thresholds were pre-selected by us and not determined by formal cost-effectiveness analysis or cost–utility analysis. In reality, physicians may use different thresholds in their decision making, although we think that they will lie within the proximity of those we selected.
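The figure of up to 30 possibly discordant papers, given under the first limitation above, can be approximately reproduced with an exact binomial upper limit for zero observed events. The sketch assumes a one-sided 95% limit, which is our reading and is not stated in the text; the function name is hypothetical.

```python
def upper_limit_zero_events(n, confidence=0.95):
    """One-sided exact (Clopper-Pearson) upper limit for a proportion
    when 0 events are observed in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


checked_papers = 112   # excluded papers re-checked by the second author
total_papers = 1126    # total stated in the text
upper = upper_limit_zero_events(checked_papers)  # roughly 0.026
max_discordant = upper * total_papers            # roughly 30 papers
```

The result is close to the "rule of three" approximation, 3/112, which is a common shortcut for the same quantity.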

### Relationship to other studies

#### Literature search

62 (51%) of the papers were (partly) based on the definition “a doctor's diagnosis of asthma ever”. This definition is based on the International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire's core questions “Has your child ever had asthma?” combined with “Has your child's asthma been confirmed by a doctor?” Standardised questionnaires are easy to use and allow prevalence comparisons and trends worldwide, but they can also be subjective and highly dependent on the interpretation and judgement of the person responding to the questionnaire 21.

Only 7% of the definitions used objective criteria for “signs and symptoms” as documented by a physician. Since prevalence estimates based on symptoms of wheezing determined by questionnaire differed from definitions based on clinical examination, this point deserves our attention 21, 22. 11% of definitions were also based on more objective criteria, such as spirometry or severity of bronchial hyperresponsiveness. Measuring bronchial hyperresponsiveness is less influenced by variation in symptom perception. Our literature search made clear that various symptoms are being used in combination with bronchial hyperresponsiveness. Wheezing appeared to be the symptom most often used.

#### Prevalence estimates

Our finding that prevalence estimates are affected by the choice of definition is confirmed by other studies, although these focused on crude prevalence estimates, not on prediction models. Overall, with the exception of the study by Greenlee *et al*. 23, lower prevalence estimates were found for definitions based on medical records *versus* parental reports based on the ISAAC questionnaire 23–25. Lower prevalence estimates from medical records than from parental report may be due to incompleteness of records, physicians mentioning the term asthma to parents without believing strongly enough in the diagnosis to document it, errors in parental memory, or combinations of these factors. Even parental forgetfulness in combination with more severe incompleteness of records may be compatible with these figures.

The prevalence estimates we report may strike the reader as high. Prevalences varied from 15.1% (definition 2, Dr-ever&whe) to 51.1% (definition 4, BHR&sym/med). The dataset of the ARCADE study, however, purposefully consists of data of children in whom a general practitioner is likely to consider a diagnosis of asthma. Furthermore, the majority (158 out of 186) of the children in the analysis were enrolled at age 3 or 4 yrs, that is, mostly past the stage of transient wheezing, and therefore a higher prevalence could be expected 26, 27. This also makes comparison with studies such as the one by Wördemann *et al.* 22 difficult.

The definition used in the ARCADE study (definition 4, BHR&sym/med) yielded a much higher prevalence than definition 3 (BHR&whe) (51.1% *versus* 25.8%). Although both definitions were based on bronchial hyperresponsiveness, 25.3% of the bronchially hyperresponsive children had no symptoms of wheezing during the previous 12 months but experienced symptoms of coughing and/or shortness of breath and/or had recently used asthma medication.

Although the prevalence estimates for definition 1 (Dr-ever; 25.3%) and definition 3 (BHR&whe; 25.8%) were similar, the two definitions did not label the same children as having asthma. Similar prevalence estimates therefore do not imply that the same children are labelled as having asthma.

#### Prediction model's performance

As far as we know, this is the first time that the variation in the performance of a prediction model associated with the use of different definitions was studied. Although Miller 25 selected predictors of asthma by logistic regression analysis using different definitions, she did not compare different prediction models using different definitions.

We chose, as the outcome of main interest, the proportion of patients remaining in the area of clinical indecision. This method is related to reclassification methods that are currently gaining ground 28 and are, in our opinion, more informative than the area under the ROC curve (AUC) method to express how well a prediction model tells the ill from the healthy. In addition, this reclassification method more directly relates to clinical decision making.

Illness definition issues are not restricted to asthma, nor to prediction models 29, and we believe that many areas of medicine may benefit from scrutinising illness definitions and the variation of operationalisations in research and practice 30.

### Conclusion

Although much has been written on how we should select and statistically deal with predictors used in prediction models, the role of the dependent variables in such models seems to have received less emphasis. We have shown that measurement choices underlying the construction of the outcome or dependent variable may have a large impact on estimates of prevalence as well as on predictive probabilities as provided by a prediction model. This variation in posterior probabilities is likely to have an impact on clinical management, with both over- and undertreatment as a consequence. Nevertheless, achieving agreement on operational illness definitions will remain a challenge.

## Acknowledgments

We would like to thank the general practitioners from HAG-net-AMC, Zorggroep Almere and Prinsenhof, Leemhuis and Buitenhuis, the Netherlands, for their time and effort in this study. We would further like to thank all children with their parents for participating in the study. We also would like to thank P. van Steenwijk (Zorggroep Almere, Almere, the Netherlands) for her invaluable help and M. IJff, A. Karsten, and A. Buijs (Dept of General Practice, Academic Medical Center, Amsterdam, the Netherlands) for their assistance.

## Footnotes

This article has supplementary material available from www.erj.ersjournals.com

**Support Statement** This study was financially supported by the Netherlands Asthma Foundation, Leusden, the Netherlands (3.4.02.20 and 3.4.06.078) and by the Stichting Astma Bestrijding, Amsterdam, the Netherlands.

**Statement of Interest** None declared.

- Received September 30, 2009.
- Accepted December 10, 2009.

- ©ERS 2010