Abstract
Phase One of the International Study of Asthma and Allergies in Childhood (ISAAC) has reported the prevalence of asthma, rhinitis and eczema symptoms in children. In 99 centres from 40 countries, a total of just under 317,000 13–14-yr-old children also completed a video questionnaire, showing the symptoms and signs of asthma. This first video sequence has been compared to the ISAAC written question asking about current wheezing to explore variations in agreement and the contribution of each questionnaire to wheezing prevalence between centres, by region and language groups.
In general, responses to the video questionnaire gave a lower prevalence than the written questionnaire, and responses to the two were closely correlated. The overall proportion of agreement was high (mean 0.89) but unbalanced, with good negative agreement but poor positive agreement. Chance-corrected agreement, measured using Cohen's kappa coefficient, was generally low, with only 20 centres having kappa >0.4. The contribution of each questionnaire to wheezing prevalence also varied between centres, which suggests that written questions about wheezing are variably understood and interpreted by 13–14 yr olds.
International comparisons of responses to a written question about wheezing and to its audiovisual presentation suggest that adolescents interpret the two differently and that this interpretation varies between centres. This relationship, and the interpretation of both written and audiovisual presentations of symptoms, require further study in order to better predict asthma.
- asthma
- International Study of Asthma and Allergies in Childhood
- prevalence
- video questionnaire
- wheezing
- written questionnaire
Epidemiological surveys of asthma have relied largely on questionnaire responses to questions about asthma symptoms, a diagnosis of asthma or drug treatment used for asthma management. Each of these has attendant problems. Symptoms may not be specific to asthma and may be difficult for the patient to interpret 1, while a diagnosis of asthma or drug therapy may be biased by the availability of healthcare services 2. These issues become particularly important when measuring asthma prevalence over time. Magnus and Jaakkola 3 have reviewed repeat asthma prevalence studies and concluded that diagnostic fashion and criteria may have changed with time and that the everyday perceptions of terms such as wheezing may well have changed, leading to an increased awareness, and hence to increased reporting. This review underlines the importance of understanding responses to questionnaires over time, but it is equally important to understand how their interpretation differs between countries, languages and cultures.
More recently, simple field methods for measuring bronchial hyperresponsiveness (BHR) have been developed and refined 4, 5, and have been used, often in conjunction with skin-prick tests, in an attempt to differentiate a group with atopy who have greater clinical severity and physiological abnormality 6. While BHR is useful as a marker of an airway abnormality strongly associated with asthma, it is not specific for asthma, may not accurately reflect asthma severity 7 and is too time-consuming and expensive for large-scale surveys. These difficulties and the problems associated with questionnaire responses across languages and cultures 8 led the authors of this paper to develop and validate a video questionnaire for measuring asthma symptoms in children (a shortened version of the video questionnaire can be viewed at www.wnmeds.ac.nz/academic/med/warg/ADV.html) 9, 10. Following various refinements, this questionnaire has been used as an optional part of the International Study of Asthma and Allergies in Childhood (ISAAC) for self-completion by 13–14 yr olds. ISAAC provides a unique opportunity to compare the audiovisual presentation of asthma symptoms and signs with written equivalents, and to examine the interpretation of asthma symptoms when presented in a variety of languages and cultures.
Materials and methods
Phase One of ISAAC developed simple methods for measuring the prevalence of childhood asthma, rhinitis and eczema symptoms for international comparisons, suitable for different geographical locations and languages.
The aims, methods and structure of the ISAAC project have been described previously 11. Briefly, ISAAC is an international collaboration with three aims: 1) to compare the international variation in the prevalence and severity of asthma, rhinitis and eczema in children using simple self-completed questionnaires; 2) to provide a baseline for assessment of trends in the prevalence and severity of these diseases; and 3) to provide a framework for further aetiological research into their causes.
The sampling frame was schools containing children aged 6–7 and 13–14 yrs, within a specified geographical area (ISAAC centre). Children aged 13–14 yrs were selected because they were able to self-complete the written and video questionnaires. This paper concerns only 13–14-yr-old children and only those centres that completed both the written and video questionnaires. The recommended sample size was 3,000, used to ensure good prevalence estimates for severe asthma. The ISAAC core questionnaires have been published previously 11, and both the written and video questionnaires were piloted prior to the main study 12. Most of the questions used for asthma symptoms are based on questions used in previous respiratory epidemiological studies and include both sensitive and specific indicators of asthma 1.
Translation
In 71% of centres the written questionnaire was translated from English into one (86%) or more local languages. The method of translation was standardised, according to guidelines developed in Germany 13 and adapted for ISAAC (ISAAC document 41). These guidelines included the use of translators familiar with asthma terminology (98% of centres that used translations), consultation with the local community (67%) and back translation to English by an independent translator (95%).
Video questionnaire
Two versions of the video questionnaire, the “European” version (AVQ 2.0) and the “International” version (AVQ 3.0), were used in the ISAAC Phase One study. Each contains five short sequences of asthma symptoms and signs. Each sequence is followed by three questions asking respondents whether their breathing has ever been like that of the person in the video and, if “yes”, whether this was “in the last 12 months?” and, if “yes” again, “in the last month?”. The first sequence shows a young person seated with clearly audible wheezing, but without breathlessness and with no evidence of airway obstruction. Four further sequences are shown in the video: 1) exercise-induced wheezing, 2) waking at night with wheezing, 3) nocturnal coughing and 4) a final sequence showing a severe attack of asthma.
The International version was developed to make the exercise sequence more universally applicable, by showing running rather than indoor aerobic exercise, and also to provide ethnic diversity amongst the subjects in the video. In the European version all subjects are Caucasian, whereas in the International version the subjects include two Caucasians, two Maori males, one Chinese male and one Indian female. In the European version the text of the questions appears on the screen translated into the local language, while in the International version no written text appears on screen and a field researcher reads the same instructions to the subjects. The first video sequence, showing wheezing at rest, and the fourth sequence, showing coughing at night, are identical in both versions. Only the first video sequence has been compared with its written counterpart in this paper.
The video questionnaire was an encouraged but not mandatory part of ISAAC Phase One. All children were shown the video after they had completed the written questionnaire. Both versions of the video questionnaire have been validated against nonspecific BHR among children in New Zealand, Australia and Hong Kong: the European version in 12–14 yr olds in New Zealand 9, 10, and the International version amongst 12–18-yr-old Chinese children in Hong Kong 14 and among 13–14-yr-old students from mixed ethnic backgrounds in Australia 15. This latter validation was against a challenge with hypertonic saline. Each of these studies showed that the video questions overall had similar sensitivity and specificity for BHR to the written questions.
Data management and analysis
Each collaborating centre or a national centre entered information on the questionnaires exactly as recorded by the child (except in five centres, where data were re-coded to eliminate inconsistent responses). Thus apparent inconsistencies between responses to the stem and branch questions were generally accepted and not re-coded. Children in some centres answered the written questionnaire but did not see the video; they have been excluded from this analysis.
Regional classification
Centres have been classified by World Health Organisation (WHO) regions. For the purposes of this comparison, regional inclusions of five centres have been changed from the previous WHO classification outlined in the published methods 11. Karachi has been included in Southeast Asia instead of the Eastern Mediterranean and the four Finnish centres have been included in Western Europe rather than North Eastern Europe. These changes were made a priori before any analysis had been undertaken. Pakistan is geographically part of Southeast Asia and shares closer cultural and linguistic ties with India than the Eastern Mediterranean. The Finnish centres appeared to be more closely aligned with Western Europe economically and culturally than with the recently independent Baltic States, Russia and Poland.
Measures of agreement
A variety of measures of agreement have been used in studies comparing categorical and numerical measurements. Cohen's kappa has often been used for measuring reliability between observers, as it compares the observed agreement with perfect agreement while correcting for chance. Landis and Koch 16 have suggested kappa values ≤0.4 indicate poor agreement, values of 0.4–0.6 moderate agreement and ≥0.8 excellent agreement. However, as an omnibus index, kappa suffers from a number of problems that need to be taken into account when trying to understand the relationships between observations. The behaviour of this coefficient with varying prevalence was originally explored by Kraemer 17 and subsequently the paradox of high proportions of agreement (Po) but low kappa values was reviewed by Feinstein and Cicchetti 18, 19. Their observations on kappa are relevant to the comparisons of the ISAAC questionnaires. They have argued that in examining agreement, four measures should be presented: kappa, Po, and the proportions of positive and negative agreement (Ppos, Pneg). This is because kappa will tend to diverge from the proportion of agreement in relation to prevalence, and to an even greater extent when there is a large disparity in the observers' agreement for positive or negative ratings. Proportions of positive and negative agreement illustrate this disparity.
These four measures have been calculated. The proportions of agreement were calculated, as suggested by Feinstein and Cicchetti 18, 19.
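As an illustration of the four measures, the following minimal sketch computes kappa, Po, Ppos and Pneg from the counts of a 2×2 table, using the standard definitions of Cohen's kappa and of the proportions of specific agreement. The function name, the cell labelling (a: positive on both; b: written only; c: video only; d: negative on both) and the counts are assumptions made for this example and are not ISAAC data.

```python
def agreement_measures(a, b, c, d):
    """Return kappa, Po, Ppos and Pneg for a 2x2 table of counts.

    Assumed cell layout (illustrative):
      a = positive on both questionnaires
      b = positive on the written question only
      c = positive on the video question only
      d = negative on both questionnaires
    Degenerate tables (e.g. all counts in one cell) are not handled.
    """
    n = a + b + c + d
    po = (a + d) / n  # overall proportion of agreement
    # Agreement expected by chance, from the marginal totals
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (po - pe) / (1 - pe)
    ppos = 2 * a / (2 * a + b + c)  # proportion of positive agreement
    pneg = 2 * d / (2 * d + b + c)  # proportion of negative agreement
    return {"kappa": kappa, "po": po, "ppos": ppos, "pneg": pneg}


# Hypothetical counts for one centre (not taken from the study)
measures = agreement_measures(a=20, b=80, c=20, d=880)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the overall agreement is high (Po 0.90) and negative agreement is high, while positive agreement and kappa are both low, which is the pattern Feinstein and Cicchetti describe.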
Results
In the 13–14 yr age group 155 centres participated, involving 463,801 children. Both the video and written questionnaires were completed by 316,992 children in 99 collaborating centres in 38 countries. The European version was used in 32 centres and the International version in 67. Participation rates of schools in the older age group averaged 93% and within the schools an average of 91% (range 67–100%) took part. The age range of children varied because of variations in methods of selection within schools; 88% of centres had >70% of children aged 13–14 yrs.
Comparison of written and video questionnaire responses
In all but nine centres the frequency of positive responses to the written question on wheezing in the last 12 months was greater than for the video question. Figure 1 shows the correlation between positive responses to the first video sequence and to the written question on wheezing in the last 12 months. The other four video sequences were also highly correlated with their written counterparts, with R² values of 0.52 for exercise-induced wheezing, 0.74 for waking with wheezing, 0.57 for waking with coughing and 0.71 for the severe wheezing sequence (not shown).
Figure 1. Per cent positive responses to video question and written question for wheezing in the last 12 months. R²=0.747.
The four measures of agreement between the two questionnaires for wheezing in the last 12 months are shown in figure 2. Kappa varied from 0.05 in Tirane, Albania, to 0.66 in Chandigarh, India. In 79 of the 99 centres kappa was <0.4, indicating poor agreement, and in 20 centres it was 0.4–0.66, showing moderate agreement. Figure 2 shows that the overall proportion of agreement was high, ranging from 0.77 to 0.98. The proportion of negative agreement was also high, ranging from 0.85 to 0.99. The proportion of positive agreement was much lower, ranging from 0.06 to 0.67.
Figure 2. Kappa, proportion of agreement (Po), proportion of positive agreement (Ppos) and proportion of negative agreement (Pneg) between the video and written question on wheezing in the last 12 months.
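The text does not state how the R² values were obtained. For a simple linear relationship between two variables, R² equals the squared Pearson correlation, so a centre-level calculation could be sketched as follows. The function name and the percentages are hypothetical and serve only to show the arithmetic.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-centre percentages of positive responses (not study data)
written_pct = [5.0, 10.0, 15.0, 20.0, 30.0]
video_pct = [2.0, 6.0, 7.0, 12.0, 16.0]
print(f"R^2 = {r_squared(written_pct, video_pct):.3f}")
```

A high R² at centre level says only that centres rank similarly on both questionnaires; it does not show that the same children answer positively to both, which is why the individual-level measures of agreement are reported separately.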
Regional variation in written and video questionnaire responses
Figure 3 shows mean kappa and 95% confidence intervals (CI) by region. Wide variation within regions is seen for agreement between the written and video wheezing question. Centres in South East Asia showed the most variation, from 0.11 in Pune to 0.66 in Chandigarh (both centres in India). Chandigarh showed the highest level of agreement of any centre. The prevalence of wheezing according to the written questionnaire was low in both of these centres, 1.8% in Pune and 4.2% in Chandigarh. Agreement tended to be lowest in North Eastern Europe, ranging from 0.05 in Tirane to 0.32 in Moscow. Kappa values were most similar among the ten centres in Oceania, ranging from 0.41 in the Bay of Plenty in New Zealand to 0.49 in Sydney in Australia. The three centres in North America ranged from 0.36 in Hamilton to 0.39 in Seattle. Centres in Western Europe ranged from 0.25 in Empoli in Italy to 0.44 in Bilbao in Spain.
Figure 3. Individual centre kappa and mean kappa and 95% confidence interval (CI) by region. 1: South East Asia; 2: Asia Pacific; 3: East Mediterranean; 4: Latin America; 5: North America; 6: North East Europe; 7: Oceania; 8: West Europe; 9: Africa. Mean and 95% CI are offset for clarity.
Language variation in written and video questionnaire responses
Figure 4 shows variation in agreement by language and mean values of kappa with 95% CI for major language groups that took part. English (group 1) includes centres where English was used exclusively for questionnaire completion. Agreement tended to be higher for English-speaking centres, with less variation in agreement than for other languages. Mean kappa values were similar in Spain and Spanish-speaking Latin America, although one centre in Latin America showed higher agreement than the others. Agreement was similar in Italian-speaking centres but showed greater variation amongst Chinese speakers. Agreement was lower in Russian-speaking centres and kappa varied from 0.08 in Samarkand to 0.32 in Moscow.
Figure 4. Individual centre kappa values and mean kappa and 95% confidence interval (CI) by language groups. 1: English; 2: Spanish (Spain); 3: Spanish (Latin America); 4: Italian; 5: Chinese; 6: Russian. Mean and 95% CI are offset for clarity.
Proportion of wheezing by questionnaire
By asking both a written question on wheezing and showing an audiovisual presentation of wheezing, it is possible to explore combined and individual questionnaire responses as illustrated by the 2×2 table (table 1).
The 2×2 table
All respondents are included in one of the four cells a, b, c or d. Previous presentation of the ISAAC data has included the prevalence of the written response, (a+b)/(a+b+c+d), or the video response, (a+c)/(a+b+c+d). The value of kappa is largely driven by the relationship between the level of agreement (a) and the level of disagreement (b or c). In figure 5 the current authors have examined each cell, a, b or c, as a proportion of all wheezing (a+b+c), for the major language groups.
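The quantities defined from the 2×2 table can be sketched in a few lines. The cell layout below is the one implied by the prevalence formulas (a: positive on both; b: written only; c: video only; d: negative on both). The function name and the counts are hypothetical and are not ISAAC data.

```python
def wheeze_breakdown(a, b, c, d):
    """Prevalences and cell shares from 2x2 counts.

    Assumed layout, inferred from the prevalence formulas in the text:
      a = positive on both, b = written only, c = video only, d = neither.
    Requires at least one positive response (a + b + c > 0).
    """
    n = a + b + c + d
    any_wheeze = a + b + c  # positive on at least one questionnaire
    return {
        "written_prevalence": (a + b) / n,
        "video_prevalence": (a + c) / n,
        # Each cell as a proportion of all reported wheezing
        "both_share": a / any_wheeze,
        "written_only_share": b / any_wheeze,
        "video_only_share": c / any_wheeze,
    }


# Hypothetical counts (not taken from the study)
result = wheeze_breakdown(a=30, b=50, c=20, d=900)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The three shares always sum to 1, which is why the groups can be displayed as stacked columns for each language group.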
Figure 5. Positive responses (shown also within stacked columns) to each questionnaire (□: written; ┘: video) and both questionnaires combined (▪) as a percentage of all positive wheezing responses (two Russian-speaking centres also had some respondents in Latvian).
Figure 5 shows an increasing proportion of positive responses to both video and written questionnaires combined, amongst Russian, Chinese, Italian, Spanish and English speakers, respectively. The proportion of positive responses to the written question alone falls, while the proportion from the video question alone tends to increase, with the exception of English speakers who show a lower proportion of positive responses to the video question. Thus 84% of Russian-speaking wheezing children and 48% of English-speaking wheezing children did not agree when they saw the video. Six per cent of Russian wheezing children and 28% of Spanish wheezing children only reported wheezing when they saw it in the video question.
Discussion
The ISAAC study has already demonstrated large variations in the prevalence of self-reported asthma symptoms throughout the world from the written and video questionnaires 20. This current analysis examined the level of agreement between a single question from each questionnaire and the contribution of each question to all reported wheezing. In general the video questionnaire responses gave a lower prevalence than the written, although in nine centres the video prevalence was higher than the written. The true prevalence of reported wheezing in the last year will lie between a positive response to either questionnaire and a positive response to both questionnaires. The questionnaire responses are highly correlated and show good overall agreement (Po) but poor chance corrected agreement as indicated by the kappa coefficient. This paradox between good overall agreement and low kappa scores results from the discrepancy between the questionnaires for positive and negative agreement. When a respondent answered in the negative to a written question on wheezing they tended to answer negatively when they saw and listened to a subject wheezing on the videotaped recording, i.e. good negative agreement. In contrast, when subjects responded in the affirmative to the written question, a variable proportion responded affirmatively to the video questionnaire, resulting in poor positive agreement and hence low kappa scores. These patterns of agreement have been noted previously from some of the individual ISAAC centres. Thus Pekkanen et al. 21 showed good negative agreement but poor positive agreement between both questionnaires for wheezing questions in Finland. Pizzichini et al. 22 found good concordance but only moderate chance corrected agreement between the wheezing questions in Canada and by calculation from regional ISAAC studies. 
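The bounds on the true prevalence of reported wheezing mentioned above follow directly from the 2×2 counts: the lower bound counts only children positive on both questionnaires, and the upper bound counts children positive on either. A minimal sketch, with a hypothetical function name, cell labels (a: both; b: written only; c: video only; d: neither) and counts:

```python
def prevalence_bounds(a, b, c, d):
    """Lower and upper bounds on the prevalence of reported wheezing.

    Assumed layout: a = positive on both questionnaires, b = written only,
    c = video only, d = negative on both.
    """
    n = a + b + c + d
    lower = a / n            # positive on both questionnaires
    upper = (a + b + c) / n  # positive on either questionnaire
    return lower, upper


# Hypothetical counts (not taken from the study)
low, high = prevalence_bounds(a=30, b=50, c=20, d=900)
print(f"between {low:.1%} and {high:.1%}")
```

The wider the gap between the two bounds, the larger the share of children whose responses to the two questionnaires disagree.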
A further contribution to the low kappa scores is the variable proportion of children who had responded negatively to the written question but then responded positively to the video question. These children would not have been identified with a written questionnaire alone.
Across the 99 centres, the level of agreement was generally low and also varied considerably between centres and, to a lesser extent, by region and language. Why is chance-corrected agreement low, why does it vary between centres, and what does it tell us about reported wheezing in 13–14 yr olds? There are a number of reasons for low levels of agreement, any or all of which are likely to operate both within and between centres and between language groups.
First, kappa suffers from inherent problems as an omnibus measure of agreement, as do all such measures. Thus whenever the prevalence varies between the two questionnaires, kappa can never attain a value of 1; perfect agreement is impossible. This becomes more marked at the extremes of prevalence and with very high degrees of discordance 23. Various strategies have been employed to try to correct these difficulties using different forms of kappa, but none has proved particularly useful 18. Kappa is most often used to compare rater responses for a particular outcome where the prevalence is unknown and where there is no gold standard. In this study, the use of kappa is somewhat unusual. The current study compares subject responses to two different forms of a question designed to capture similar information. The study is less interested in the level of agreement per se than in what agreement or lack of agreement tells us about the interpretation of wheezing and asthma in different populations.
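The ceiling on kappa when the two prevalences differ can be made concrete. Given the two marginal prevalences, the highest possible observed agreement occurs when every child positive on the lower-prevalence questionnaire is also positive on the other. The sketch below computes the resulting maximum kappa; the function name and the prevalences are hypothetical.

```python
def max_kappa(p_written, p_video):
    """Largest kappa attainable given two marginal prevalences.

    Both arguments are proportions strictly between 0 and 1.
    """
    # Best case: the smaller positive group lies entirely within the larger
    po_max = min(p_written, p_video) + min(1 - p_written, 1 - p_video)
    # Agreement expected by chance from the marginals
    pe = p_written * p_video + (1 - p_written) * (1 - p_video)
    return (po_max - pe) / (1 - pe)


# Hypothetical prevalences (not taken from the study)
print(f"{max_kappa(0.10, 0.10):.3f}")  # equal prevalences
print(f"{max_kappa(0.10, 0.04):.3f}")  # unequal prevalences
```

With equal prevalences the ceiling is 1, but with the unequal hypothetical values it falls well below 1 even if every video-positive child were also written-positive.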
Secondly, the comparison was undertaken by 13–14 yr olds in very different environments and their understanding of the term wheezing is likely to differ from the present authors' understanding, and may vary by region and centre. English-speaking children have difficulty in understanding and explaining the term wheezing when asked by a physician 24. Responses of 13–14 yr olds to the written ISAAC questionnaire give symptom prevalence rates as much as two-fold higher than their parents 25. This may be because parents do not appreciate the frequency of symptoms that their children actually experience or that the 13–14 yr olds misinterpret symptom questions.
Evidence of differences in interpretation is apparent from the ISAAC study. For example, the written question about wheezing in the last 12 months was designed to collect information on any episode of wheezing during that period. Amongst 13–14 yr olds, in every centre except two, children reported higher levels of wheezing after exercise in the previous 12 months than of any wheezing in the past 12 months. In contrast, amongst 6–7 yr olds, where the identical questions were completed by the children's parents, exercise wheezing in the last 12 months was reported at a lower frequency than any wheezing in every centre 26. Thus the written question to 13–14 yr olds has not captured all wheezing in the previous 12 months and was interpreted by children in almost all centres to exclude exercise wheezing. The context of the written question on wheezing in the past 12 months has been interpreted by 13–14 yr olds differently from what was expected.
Thirdly, the video question may have been interpreted as illustrating more severe wheezing than that experienced by individuals. This is supported by the higher prevalence of wheezing from the written question compared to the video in 90 of the 99 centres. The video questionnaire was designed to illustrate wheezing. The subject in the video sequence had no objective evidence of airflow obstruction at the time of the filming. Validation exercises in Australia and Hong Kong suggested similar sensitivity and specificity for BHR to the written question, suggesting that the two were measuring wheezing of similar severity, at least as judged by BHR 14, 27. However, some children have interpreted the video sequence as being more severe than their experience of wheezing, as shown by a positive response to the written question and a negative response to the video.
Lastly, there were respondents in every centre who responded negatively to the written questionnaire but positively to the video. This is illustrated by language group in figure 5. Presumably these children were uncertain about the meaning of the written term wheezing but recognised the symptom when it was shown to them audiovisually.
Agreement by region
Mean values for kappa are similar between most regions, with the exception of North East Europe, but with considerable variation between centres within each region. Centres in North America, Western Europe and Oceania tended to show less variation than other regions. For Oceania, which comprises Australia and New Zealand, this likely reflects their common language, the common historical cultural heritage of the majority of the immigrant population and a longstanding high level of awareness of asthma as a major health problem in the community.
Agreement by language
Comparison of agreement by language groups shows more variation. Again, English and Russian showed the greatest difference. Mean scores for agreement in Spanish-speaking centres in Spain and Latin America were similar. However, kappa varied from 0.27 to 0.44 amongst Spanish centres in Spain and to a greater extent in Spanish-speaking centres in Latin America, which included different countries and where kappa varied from 0.19 to 0.58. Chinese speakers showed lower levels of agreement than English or Spanish speakers, with kappa varying from 0.13 to 0.33. Russian speakers showed the lowest level of agreement, ranging from 0.08 to 0.32.
By employing two questionnaires asking similar questions in different ways it is possible to examine their combined and separate contributions to wheezing prevalence in three distinct groups: a positive response to the written question only, a positive response to both, or a positive response to the video only.
Figure 5 examines the contribution of positive responses to each questionnaire separately and a combined response as a proportion of all wheezing. The written questionnaire was administered before the video in every centre. Thus for English and Spanish speakers almost half of the children (48% and 45%, respectively) who reported any wheezing on either questionnaire did not agree when they saw and heard wheezing in the video question. For Russian speakers this was 84% and Chinese 59%. These respondents would have included some children who did not understand what was meant by the term wheezing or considered that their own experience of wheezing was less severe than the video depiction. Conversely, only 10% of all Russian wheezing children and 38% of English-speaking wheezing children agreed when they saw the video. These children presumably interpreted the video as representing their understanding and experience of wheezing. These children may be a more severe wheezing group and the proportion of children in this group would then be a marker of asthma severity. A variable proportion of children, 6% of Russian-speaking wheezing children and 28% of Spanish, having responded negatively to the written question then responded positively to the video. These children presumably did not understand the written question but recognised the symptom when they saw and heard it. These children would not be included in a survey of wheezing prevalence based on written questions alone.
A large proportion of 13–14-yr-old children who responded positively to a written question about wheezing either did not understand the meaning of the term and its intended context or have wheezing that is not distinctly audible and is considered by them to be less severe than a video sequence depicting audible wheeze without breathlessness. Further study of children in this group is clearly warranted. Children who respond only to the video, after responding negatively to the written question, may represent a group with previously unrecognised symptoms who also did not understand the term wheezing but recognised it when they saw and heard it. The proportions of children in these groups vary considerably between centres and by language groups. Study of children separated by their responses to written and/or video questionnaires would help clarify what exactly is being measured and how it relates to episodic airflow obstruction. Similar studies amongst adults would also be of interest.
The use of a video questionnaire, in addition to the traditional written format, raises questions about questionnaire interpretation, wheezing severity and unrecognised symptoms, which warrant clarification in order to better define asthma in population surveys. This present analysis of the two questions does not significantly affect the broad conclusions about international variations in reported wheezing prevalence between centres and regions in the International Study of Asthma and Allergies in Childhood, since the two questionnaires are highly correlated. It does, however, raise questions about exactly what is being measured by either questionnaire in different centres and, in turn, how these responses might relate to episodic airflow obstruction.
Acknowledgments
The authors are indebted to the collaborators in the participating centres and all parents, children, teachers and other school staff who participated in the surveys. There are many field workers and funding agencies who supported data collection, and national, regional and international meetings, including the meetings of the International Study of Asthma and Allergies in Childhood (ISAAC) Steering Committee. The authors particularly thank the funders who supported the ISAAC International Data Centre including the Health Research Council of New Zealand, the Asthma and Respiratory Foundation of New Zealand, the National Child Health Research Foundation, the Hawkes Bay Medical Research Foundation, the Waikato Medical Research Foundation, Glaxo Smith Kline New Zealand and Astra Zeneca New Zealand. The authors also wish to thank Glaxo Smith Kline International Medical Affairs for funding the regional coordinating centres.
ISAAC Steering Committee: N. Aït-Khaled, G. Anabwani, H.R. Anderson, M.I. Asher, R. Beasley, B. Björkstén, M. Burr, J. Crane, P.E. Ellwood, U. Keil, C.K.W. Lai, J. Mallol, F.D. Martinez, E.A. Mitchell, S. Montefort, N. Pearce, C.F. Robertson, J.R. Shah, D. Strachan, A.W. Stewart, E. von Mutius, S.K. Weiland, H.C. Williams.
ISAAC International Data Centre: M.I. Asher (Director), T.O. Clayton, P.E. Ellwood, E.A. Mitchell, A.W. Stewart. Principal Investigators are listed in 26.
- Received May 18, 2002.
- Accepted November 14, 2002.
- © ERS Journals Ltd