Population prevalence of obstructive sleep apnoea in a community of German third graders

We aimed to estimate the population prevalence of obstructive sleep apnoea (OSA) in an urban community of German third graders (age range 7.3–12.4 yrs) and the diagnostic test accuracy of two OSA screening methods. Using a cross-sectional study design with a multi-stage sampling strategy, 27 out of 59 primary schools within the city limits of Hanover, Germany, were selected. 1,144 third graders were screened for symptoms and signs of OSA using questionnaires and nocturnal home pulse oximetry. 183 children underwent abbreviated nocturnal home polysomnography (OSA definition: apnoea/hypopnoea index ≥1) and 22 were diagnosed to suffer from OSA. In general, sensitivity for both screening methods was low (<0.6), while specificity was moderately high (mostly >0.7). Independent predictors for OSA were body mass index, history of allergy, a composite questionnaire score, and two oximetry-based criteria. Based on these variables and logistic regression, a prediction model (accuracy; 95% confidence interval: 0.86; 0.71–0.94) was constructed and applied to children who had not successfully undergone polysomnography. This resulted in nine additional OSA cases and an overall design-adjusted population prevalence (95% confidence interval) of 2.8% (1.5–4.1%). Clinical and oximetry findings may be helpful for screening and predicting OSA in primary school children.

Childhood obstructive sleep apnoea (OSA) is one expression of sleep-disordered breathing (SDB) and is characterised by sleep-related episodes of partial and/or complete upper airway obstruction with or without hypoxaemia, hypercapnia and respiratory-related arousal. The episodes may be accompanied by snoring, laboured breathing, chest retraction, cyanosis and disturbed sleep [1]. OSA occurs in children of all ages. It is most common in the pre-school age group, due to adenotonsillar hyperplasia. Full sleep laboratory-based polysomnography is the gold standard for diagnosing OSA in children [2]. Most children with OSA will have both symptomatic and polysomnographic resolution following adenotonsillectomy [3].
Regarding European countries, prevalence studies on paediatric OSA have been performed in the UK [4], Iceland [5], Sweden [7], Italy [9,12], Spain [10], Greece [15] and Turkey [18], but not yet in Germany. In 2000, the authors initiated a comprehensive community-based cross-sectional study on SDB in children (i.e. the German Study on SDB in Primary School Children) [19]. Among others, the aims of this study were to obtain unbiased estimates for the population prevalence of OSA in an urban community of German third graders (age range 7.3-12.4 yrs) and to determine the diagnostic test accuracy of OSA screening methods.

METHODS
Study design, subjects and screening procedure
Details on sample size calculation, sampling strategy, comparisons for representativeness, screening methods and study procedures have been published elsewhere [19]. In short, 27 of the 59 public regular primary schools located within the city limits of Hanover, Germany, were selected using a multi-stage, stratified (by socioeconomic status), probability clustered design (fig. 1). Following approval by the institutional review board and the regional directorate of education, 1,760 children attending third-grade classes were approached between February and December 2001 and 1,144 (65.0%) were enrolled. Children were included if parents gave written informed consent. Comparisons with the target population (n=4,109) revealed good to excellent representativeness of the study sample concerning sex distribution, socioeconomic status, academic performance and doctor-diagnosed asthma [19]. Children were screened twice using a widely used and partially validated parental SDB-questionnaire (SDB-Q) [20][21][22][23][24] and nocturnal home pulse oximetry (HPO) [25][26][27].

Questionnaire
The SDB-Q by GOZAL [22] was adjusted to enable calculation of the OSA score according to BROUILLETTE et al. [20] and extended with questions concerning parental education, child's demographic and anthropometric characteristics [19], daytime behaviour [28], frequent sleep problems [29] and current health status (see Appendix [28]). The body mass index (BMI) was calculated using a standard formula (BMI = weight (kg)/height (m)^2) and transformed into age- and sex-specific centiles using German reference values [30]. Snoring was assessed with the question "Does your child snore?" and rated on a 4-point scale. Children were classified as habitual snorers if the answers were "frequently" or "always". The OSA score according to BROUILLETTE et al. [20], the SDB score according to GOZAL [22], and an adapted SDB score according to PADITZ et al. [28] were calculated. For the calculation of these scores, arbitrary numerical scores were assigned to each of the answers ranging from 0 (never), 1 (rarely) and 2 (occasionally) to 3 (frequently) and 4 (almost always). To enable calculation of scores for each single child and to achieve high sensitivity, missing answers were scored as 0 (never). This imputation method was used for the screening process and the construction of the prediction model. For estimating diagnostic test accuracy, multiple missing data imputation methods were used (see Statistical analysis). Based on questionnaires obtained between February and July 2001 (n=671), the 95th centile for the adapted SDB score was calculated and found to be 24. Children were screened positive if they: 1) were reported to snore habitually (SDB-Q criterion 1); 2) had an OSA score ≥0 (SDB-Q criterion 2 [20]); or 3) had an adapted SDB score ≥24 (SDB-Q criterion 3).
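The questionnaire screening rule can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual scoring code: the function names are hypothetical, the adapted SDB score is assumed here to be a plain sum of item scores (the text does not give its formula), the snoring item is assumed to be coded on the same 0-4 scale as the other items, and the OSA score is taken as a precomputed input.

```python
from typing import Optional, Sequence


def summed_score(item_answers: Sequence[Optional[int]]) -> int:
    """Sum item scores (0-4); a missing answer (None) counts as 0 ('never').

    Assumption: the score is a simple sum of item scores.
    """
    return sum(0 if answer is None else answer for answer in item_answers)


def sdbq_screen_positive(
    snoring: Optional[int],
    osa_score: float,
    adapted_sdb_score: int,
    adapted_cutoff: int = 24,  # 95th centile reported in the text
) -> bool:
    """Return True if any of the three SDB-Q screening criteria is met."""
    # Criterion 1: habitual snoring ('frequently' = 3 or 'always' = 4, assumed coding).
    habitual_snorer = snoring is not None and snoring >= 3
    # Criterion 2: OSA score >= 0; criterion 3: adapted SDB score >= cut-off.
    return habitual_snorer or osa_score >= 0 or adapted_sdb_score >= adapted_cutoff
```

Scoring missing answers as 0 can only lower a child's score, so it never creates a screen-positive by itself; the alternative imputation rules were reserved for the accuracy analysis.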

Home pulse oximetry
Recordings of HPO-derived arterial haemoglobin oxygen saturation (Sp,O2) were performed overnight in the child's home using an instrument with a new generation oximeter module that was capable of storing continuous trend and episodic event data [25,26]. Data analysis software was used to determine artefact-free recording time and to calculate the mean, standard deviation, median, and 5th and 10th centiles of Sp,O2, as well as the number of desaturation events of ≥4% determined using information on signal quality, low perfusion and pulse waveform. Desaturation event clusters were defined as ≥5 desaturation events of ≥4% Sp,O2 occurring within a 30-min period [27]. In addition, the average distance from the optimum of 100% Sp,O2 and a cumulative hypoxaemia score were calculated for each recording [26]. Desaturation indices, defined as events per hour of artefact-free recording, were calculated for desaturation events of ≥4% Sp,O2 (DI4), desaturation events to ≤92% (DI92) and to ≤90% Sp,O2 (DI90) as well as desaturation event clusters (DIC). Based on 100 recordings obtained between February and July 2001, the 95th centile for DI4 and DIC was calculated and found to be 3.9 and 0.4, respectively. Children were screened positive if they: 1) had ≥3 desaturation events to ≤90% Sp,O2 and ≥3 desaturation event clusters (HPO criterion 1 [27]); 2) had a DI90 >0.6 (HPO criterion 2 [31]); or 3) had a DI4 >3.9 and a DIC >0.4 (HPO criterion 3 [25]). To assess clinical factors that possibly influence oximetry results or result in sleep-related hypoxia, a customised questionnaire (i.e. HPO-Q) was developed and distributed together with the oximetry device [25,32]. The questionnaire included items on the presence of heart disease, chronic lung disease, physician-diagnosed allergy/chronic rhinitis, current upper respiratory tract infection, anaemia, preferred sleeping position, bed/wake time and sensor placement. Parents were asked to fill in this questionnaire on the evening of the oximetry recording.
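As an illustration of the oximetry-based definitions, the following Python sketch computes a desaturation index, counts desaturation event clusters and applies the three HPO criteria. It is a simplified reading of the text, not the analysis software used in the study: the function names are hypothetical, and the greedy, non-overlapping way of counting clusters is an assumption, since the text does not say how overlapping 30-min windows were handled.

```python
from typing import Sequence


def desaturation_index(n_events: int, artefact_free_hours: float) -> float:
    """Events per hour of artefact-free recording (e.g. DI4, DI90, DIC)."""
    if artefact_free_hours <= 0:
        raise ValueError("artefact-free recording time must be positive")
    return n_events / artefact_free_hours


def count_clusters(
    event_minutes: Sequence[float], min_events: int = 5, window_min: float = 30.0
) -> int:
    """Count clusters of >= min_events desaturations within window_min minutes.

    Assumption: clusters are counted greedily and do not overlap.
    """
    times = sorted(event_minutes)
    clusters = 0
    i = 0
    while i < len(times):
        j = i
        # Collect all events within the window that starts at event i.
        while j < len(times) and times[j] - times[i] <= window_min:
            j += 1
        if j - i >= min_events:
            clusters += 1
            i = j  # continue after the events used by this cluster
        else:
            i += 1
    return clusters


def hpo_screen_positive(
    n_desat_le90: int, n_clusters: int, di90: float, di4: float, dic: float
) -> bool:
    """Return True if any of the three HPO screening criteria is met."""
    criterion_1 = n_desat_le90 >= 3 and n_clusters >= 3
    criterion_2 = di90 > 0.6
    criterion_3 = di4 > 3.9 and dic > 0.4
    return criterion_1 or criterion_2 or criterion_3
```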

Home polysomnography
Home polysomnography (HPSG) was performed in all screen-positives and in a subgroup of screen-negatives (i.e. control group). To form the control group, all screen-negatives were listed by date of enrolment and every 20th child on that list was contacted. For participation in this control group, a ticket for the Hanover Zoo was offered as an incentive. For the HPSG, an ambulatory polygraphic device recorded chest and abdominal wall movements, nasal pressure and linearised nasal airflow estimation, oral airflow, snoring, Sp,O2, pulse rate, pulse waveform, actigraphy, body position, and user events over one single night [33]. Recordings were then manually analysed for the corrected estimated sleep time, and mixed and obstructive apnoeas, as well as hypopnoeas based on standard guidelines or published criteria [34]. An apnoea was scored if: 1) the amplitude of the nasal airflow fell to ≤20% of the average amplitude of the two preceding breaths; 2) no airflow was detected at the mouth; and 3) the event comprised at least two breath cycles (i.e. ~6 s for the age group under study). Obstructive apnoeas were scored if criteria for apnoea were fulfilled and out-of-phase movements of the chest and abdomen were present. Mixed apnoeas were defined as apnoeas with central and obstructive components, each of them lasting at least two (not necessarily consecutive) breath cycles. Hypopnoeas were scored if: 1) the amplitude of the nasal airflow fell to ≤50% of the average amplitude of the two preceding breaths; 2) a fall in Sp,O2 by ≥4% occurred within 30 s of the onset of the event; and 3) the event comprised at least two breath cycles. Recordings with a corrected estimated sleep time <4 h were excluded. An apnoea/hypopnoea index (AHI) was calculated, defined as the sum of all mixed and obstructive apnoeas and obstructive hypopnoeas per hour of corrected estimated sleep time. OSA was defined as AHI ≥1 to comply with international guidelines [35].
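The AHI definition and the OSA threshold translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and returning `None` for recordings shorter than 4 h is simply one way of representing the exclusion rule stated above.

```python
from typing import Optional


def apnoea_hypopnoea_index(
    mixed_apnoeas: int,
    obstructive_apnoeas: int,
    obstructive_hypopnoeas: int,
    sleep_hours: float,
) -> Optional[float]:
    """Events per hour of corrected estimated sleep time.

    Returns None for recordings with < 4 h of sleep time, which were excluded.
    """
    if sleep_hours < 4.0:
        return None
    events = mixed_apnoeas + obstructive_apnoeas + obstructive_hypopnoeas
    return events / sleep_hours


def has_osa(ahi: float) -> bool:
    """OSA is defined as AHI >= 1."""
    return ahi >= 1.0
```

For example, eight respiratory events over 8 h of corrected estimated sleep time give an AHI of exactly 1, which meets the OSA definition.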

Statistical analysis
Diagnostic test accuracy
The following parameters were evaluated for their accuracy in predicting OSA on HPSG following re-evaluation of screening results: snore score [23], OSA score [20], SDB score [22], adapted SDB score [28], nadir Sp,O2, DI4, DI90, DI92 and DIC. Accuracy was investigated using nonparametric receiver-operating characteristic (ROC) analysis with area under the ROC curve (AUC) and its 95% confidence interval (95% CI), as well as classical measures of accuracy such as sensitivity, specificity, and positive and negative likelihood ratio. To enable comparability, SDB-Q scores and HPO parameters were dichotomised into "test positive" and "test negative" based on the ROC curve. Cut-off values for dichotomisation were set to achieve 0.8 specificity. For the questionnaire scores, missing answers were handled in four different ways: 1) missing answers were scored as 0 (never; this was the primary analysis and in accordance with the screening procedure); 2) missing answers were scored as the item-specific sample mean; 3) missing answers were scored as the maximal item-specific response category (mostly 4 for almost always); and 4) missing answers led to exclusion of individuals. Measures of accuracy were then calculated for all four data sets.
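The classical accuracy measures and the choice of a cut-off for a target specificity can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a result is taken as "test positive" when the score is at or above the cut-off, and the cut-off is chosen as the smallest observed value among children without OSA that reaches the target specificity.

```python
from typing import Dict, Optional, Sequence


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


def cutoff_for_specificity(
    scores_without_osa: Sequence[float], target: float = 0.8
) -> Optional[float]:
    """Smallest observed cut-off c with P(score < c | no OSA) >= target.

    Returns None if no observed value reaches the target specificity.
    """
    n = len(scores_without_osa)
    for candidate in sorted(set(scores_without_osa)):
        specificity = sum(s < candidate for s in scores_without_osa) / n
        if specificity >= target:
            return candidate
    return None
```

Fixing specificity at a common value before comparing tests means that any remaining difference between two tests shows up in sensitivity alone.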

OSA prediction model
Using the subset of children who had undergone HPSG, a prediction model for OSA was developed using an explorative data analysis and subsequently applied to those children who were not evaluated with HPSG. For this purpose, children with OSA were compared with children without OSA using Pearson's Chi-squared test for categorical variables and the Mann-Whitney U-test for continuous variables. In total, 34 factors from the SDB-Q (including age, sex and SDB-Q scores), four factors from the HPO questionnaire, and 25 factors from the HPO were evaluated. Differences in distributions/ranks with a p-value <0.1 were identified. With the exception of SDB-Q scores, identified SDB-Q factors were then dichotomised into several binary dummy variables using different cut-offs. For example, the variable of a questionnaire item with three response categories (e.g. never, occasionally, frequently) was dichotomised into the dummy variables "never versus occasionally/frequently" and "never/occasionally versus frequently". Identified HPO factors were dichotomised into dummy variables using published cut-off or reference values [20,25,30]. Replacing categorical variables by binary dummies aimed to reduce the number of parameters in the regression model, which in turn enhanced statistical power. Finally, multiple Pearson's Chi-squared tests were performed on each factor to identify the dummy variable with the lowest p-value. Multiple binary logistic regression analysis was used to construct the prediction model [36]. All SDB-Q scores and binary dummy variables selected from the explorative data analysis were potentially eligible for inclusion. To enable a complete data set, missing values within each dummy variable were replaced by the same value to form a distinct "missing" category. Variables were added to the model using the conditional stepwise forward selection method. A p-value of 0.2 was the criterion for including or excluding a variable.

OSA population prevalence
After establishing the prediction model, probability values for OSA (range: 0-1) were calculated for all children using the logistic function [36]. Probability values were compared between children with and without OSA using ROC curves, and AUC and its 95% CI. Using the ROC curve, a cut-off for the probability values was sought that allowed prediction of OSA on HPSG with at least 0.95 specificity. Based on the probability values and the above-mentioned cut-off value, OSA was predicted in children who had not undergone HPSG. "Predicted" OSA cases were added to the HPSG-defined OSA cases and the population prevalence of OSA was estimated. To account for the complex sampling strategy and varying response proportion, stratum- and cluster-specific sampling weights were used to adjust the point estimate and the 95% CI for the population prevalence [37].
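The step from regression coefficients to a predicted OSA case can be sketched as below. The coefficients and predictor values in the usage example are purely hypothetical, as are the function names; the cut-off of 0.291 is the value reported in the results, and treating a probability equal to the cut-off as positive is an assumption.

```python
import math
from typing import Sequence


def osa_probability(
    intercept: float, coefficients: Sequence[float], predictors: Sequence[float]
) -> float:
    """Logistic function: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear))


def predicted_osa(probability: float, cutoff: float = 0.291) -> bool:
    """Classify as a 'predicted' OSA case at or above the cut-off (assumed >=)."""
    return probability >= cutoff


# Hypothetical coefficients for two binary predictors, for illustration only.
p_high = osa_probability(-3.0, [1.5, 1.0], [1, 1])  # linear term -0.5
p_low = osa_probability(-3.0, [1.5, 1.0], [0, 1])   # linear term -2.0
```

With these made-up values, `p_high` is about 0.38 and is classified as a predicted case, whereas `p_low` is about 0.12 and is not.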

Analysis software and algorithms
Recoding and creation of variables, descriptive statistics, groupwise comparisons, logistic regression analyses, and creation of ROC curves were performed using SPSS 15.0 (SPSS, Inc., Chicago, IL, USA). Nonparametric ROC analysis (i.e. AUC, its standard error and 95% CI) was performed using Stata 9.2 (Stata Corp., College Station, TX, USA). AUC was computed using the trapezoidal rule; the standard error for AUC was computed using the algorithm described by DELONG et al. [38]; the 95% CI for AUC was determined using the bootstrap t approach with 1,000 replications [39]. The design-adjusted point estimate for the population prevalence of OSA and its 95% CI were calculated using the complex survey module of Stata 9.2. No adjustment for multiple testing was performed.
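For readers unfamiliar with the trapezoidal rule for the AUC, the following Python sketch shows the computation on an empirical ROC curve. It is not the Stata routine used in the study; the function names are hypothetical, and the standard error and bootstrap confidence interval are deliberately left out.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Empirical ROC points (1 - specificity, sensitivity); label 1 = OSA."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; 'positive' means score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

The trapezoidal AUC equals the proportion of (OSA, non-OSA) pairs in which the child with OSA has the higher score, with ties counted as one half.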

RESULTS
Screening results
Basic characteristics of the study sample and study subgroups are presented in table 1; screening results are given in figure 2. The SDB-Q was successfully obtained in all children. The amount of missing SDB-Q data ranged from 1.0 to 27.1%. A detailed description of missing SDB-Q data is given in the Appendix. In total, 114 children snored habitually, 37 had an OSA score >0, and 45 children had an adapted SDB score ≥24. Thus, 125 children were selected for HPSG based on SDB-Q results.
Acceptable HPO recordings were obtained in 995 children. Based on the pre-defined screening criteria, 24, 10 and 35 recordings fulfilled HPO criterion 1, 2 and 3, respectively. In addition, six children had typical recurrent desaturation clusters in their oximetry recording, but did not meet our pre-defined screening criteria. As these recordings were clinically suggestive of OSA, we also included these children in the HPSG follow-up. Thus, 51 children were selected for HPSG based on HPO results. Finally, 169 children (14.4% of the total study sample) met at least one out of six screening criteria or were suspected to have OSA based on their HPO recording.

Polysomnographic results
Of 169 screen-positives, 13 families could not be contacted by either phone or mail and eight families declined participation in a sleep study. Hence, 148 sleep studies were performed. Of these, 132 recordings comprised at least 4 h of corrected estimated sleep time. Children who successfully underwent HPSG were not systematically different from those eligible concerning demographic variables such as age, sex and maternal education (data not shown). The mean (minimum-maximum) time gap between screening with the SDB-Q and performing the HPSG was 32 weeks (4-77). Of 132 children successfully evaluated by HPSG, 20 had an AHI ≥1 and were diagnosed with OSA.
Of 975 screen-negatives, 65 children were approached and 11 children or their parents declined participation. Demographic variables (age, sex, maternal education) did not differ between participants and nonparticipants (data not shown). Of 54 recordings performed, 48 comprised at least 4 h of corrected estimated sleep time and were, thus, considered acceptable for analysis. Two of the remaining recordings could be successfully repeated (one had to be repeated twice), while four children declined further participation. In one child, who had originally screened positive and underwent HPSG, re-evaluation of screening results revealed that the screening had in fact been negative. This child was assigned post hoc to the control group, thereby leading to a final sample of 51 children. The mean (minimum-maximum) time gap between screening with the SDB-Q and the performance of HPSG was 39 weeks (10-87). Of 51 children successfully evaluated by HPSG, two had an AHI ≥1 and were diagnosed with OSA.

Follow-up
Parents of the 22 children with OSA were informed about the HPSG result and encouraged to visit their otorhinolaryngologist for further evaluation. Six parents refused any treatment and further evaluation, four children were lost to follow-up, five children had their AHI <1 at follow-up (weight loss was [...]

OSA population prevalence
Applying 0.291 as cut-off to the probability values of all non-HPSG-validated children, nine additional OSA cases were predicted (four in screening-negatives, five in screening-positives; fig. 2). Adding these predicted cases to the 22 HPSG-validated cases resulted in a total number of 31 children suspected to suffer from OSA. The stratum-specific point estimates (95% CI) for the prevalence of OSA were 1.8% (0.6-3.1) for SES stratum 1, 2.8% (0.6-4.9) for SES stratum 2, and 3.9% (0.2-7.6) for SES stratum 3 (fig. 1). This yielded a design-adjusted point estimate (95% CI) for the population prevalence of OSA of 2.8% (1.5-4.1). Although not statistically significant, the risk of having OSA was higher in SES stratum 2 (odds ratio (95% CI): 1.5 (0.6-4.0)) and SES stratum 3 (2.2 (0.7-6.5)) compared with SES stratum 1, suggesting a dose-effect gradient.
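The idea behind a design-adjusted point estimate can be illustrated with a weighted proportion, in which each child counts in proportion to a sampling weight. The sketch below is a simplification with a hypothetical function name and made-up toy data; it does not reproduce the study's weights or the survey-based confidence interval, which also has to account for clustering within schools.

```python
from typing import Sequence


def weighted_prevalence(is_case: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted proportion of cases: sum(w_i * y_i) / sum(w_i)."""
    if len(is_case) != len(weights):
        raise ValueError("is_case and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * y for w, y in zip(weights, is_case)) / total_weight


# Toy data (hypothetical): four children, the first is a case.
unweighted = weighted_prevalence([1, 0, 0, 0], [1.0, 1.0, 1.0, 1.0])
# If the case comes from an under-sampled stratum, its weight is larger.
adjusted = weighted_prevalence([1, 0, 0, 0], [2.0, 1.0, 1.0, 1.0])
```

In this toy example the unweighted proportion is 0.25 and the weighted one is 0.4, showing how under-represented strata can shift the estimate.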

DISCUSSION
We found a relatively high population prevalence of OSA in our urban community of primary school children. If this is true for the total population of primary school children in Germany, OSA is one of the most frequent chronic respiratory diseases in childhood. Asthma, another chronic respiratory disease, was found to have a 12-month prevalence of 3% in the German Health Interview and Examination Survey for Children and Adolescents [40]. In our school enrolment cohort from 1998, which was the sampling frame for the present study in 2001, 3.9% of children were reported to suffer from doctor-diagnosed asthma [19]. These data suggest that, at least in school children, OSA is as prevalent as asthma. Given its potentially life-long consequences [41], OSA may require more attention from paediatric public health services, clinicians and researchers than currently provided.
Several methodological features probably enabled us to obtain a highly accurate estimate for the population prevalence of OSA, as follows: 1) compared with other studies, we achieved a high response proportion (65%) [19]; 2) our study sample was representative of the target population [19]; 3) we used a two-stage clinical screening procedure including an objective test for OSA; 4) a prediction model was used to detect individuals who had not been validated by HPSG but probably suffered from OSA; and 5) the estimate for the population prevalence of OSA was adjusted for design aspects such as sampling strategy, response proportion, and clustering of individuals within schools.
In contrast to our study, four studies applied HPSG to the total sample and would have been able to yield accurate prevalence estimates [6,10,13,14,16]. These studies, however, suffered from low response, lack of representativeness, and/or the use of adult criteria for diagnosing OSA. The study by REDLINE et al. [6] resulted in a prevalence estimate (10.3%) that was much higher than the current one. Surprisingly, this was achieved despite using an AHI ≥5 for defining HPSG-based OSA, a relatively high cut-off that is predominantly used in adults. A more appropriate cut-off value would have increased their point estimate even further. Compared with our sample, their children were more obese (mean BMI, 18.5 versus 17.5 kg·m^-2), were more likely to be of African-American ethnicity ( [...] study used pulse oximetry for screening purposes [9]. However, none of these studies included screen-negatives for gold standard evaluation. Consequently, estimation of the accuracy of screening tests used in these studies was not possible. As we used six different screening criteria and included screen-negatives for HPSG evaluation, we were able to estimate the accuracy of our screening criteria. In general, sensitivity was low (<0.6) and specificity high (>0.7) for both SDB-Q and HPO criteria. However, after a detailed investigation of screening methods and analysis of continuous test results, it turned out that AUC was generally higher for HPO parameters (mostly >0.7) compared with SDB-Q scores (mostly <0.6). This has several implications: 1) OSA prevalence studies using only questionnaires are likely to underestimate the true population prevalence; 2) in contrast to previous studies on the diagnostic test accuracy of HPO [27], sensitivity may be enhanced by using other than the published criteria [27]; and 3) HPO may be used as a screening test for OSA.
Regarding the SDB-Q, we faced several problems. This questionnaire was mainly based on a questionnaire from another epidemiological study in primary school children [22]; however, accuracy in a community-based study was unclear. In fact, the questionnaire was not used in its original form, and was modified as follows: 1) three items were adapted to enable calculation of Brouillette's OSA score [20]; 2) six items were taken from a German questionnaire on OSA in toddlers and young children [28]; and 3) five items on sleep problems were newly developed [42]. Of the three SDB-Q-based screening criteria (i.e. habitual snoring, OSA score ≥0, adapted SDB score ≥24), only the OSA score had been validated [20]. Initially, we were concerned about the low specificity (and, thus, many false positives) of the OSA score. To cope with this problem, we increased the cut-off value for a positive test result from -1 to 0. Conversely, we were also concerned about a low sensitivity when using the OSA score as the only screening criterion. We therefore decided to establish a second SDB-Q score (i.e. the adapted SDB score) and to evaluate all habitually snoring children with HPSG.
Sensitivity was also a matter of concern with HPO. Using the criteria suggested by BROUILLETTE et al. [27], HPO had a sensitivity of only 0.43 in one study. The accuracy of pulse oximetry in a community setting was, in analogy to the SDB-Q, unknown. To enhance its sensitivity, we added two more screening criteria: 1) DI90 >0.6 (criterion 2 [31]) and 2) DI4 >3.9 and DIC >0.4 (criterion 3 [25]). The latter criterion, however, was introduced during the study, as reference values from a healthy subgroup finally became available [25]. There were two reasons why we used HPO as a screening method. First, an objective screening test was needed, because accuracy of subjective parental observations (and reporting via the SDB-Q) of a child's breathing during sleep may depend on demographic (e.g. single-parent family), socioeconomic (e.g. number of rooms in the household) and ethnic factors (e.g. perception of sleep-related symptoms may differ between ethnic groups). Relying on only parental perception therefore most probably decreased the sensitivity of our screening procedure. Hence, an objective, easily applicable, and low-cost screening test was considered mandatory. Secondly, we were also interested in intermittent hypoxia as an intervening factor in the relationship between SDB and several outcomes, such as impaired behaviour [43] and academic achievement [24]. Intermittent hypoxia is thought to cause prefrontal cortical dysfunction leading to impaired cognitive execution [44]. To clarify the role of intermittent hypoxia, we decided to include a screening method that also allows the assessment of night-time intermittent hypoxia.
In our study, a prediction model was used to estimate the population prevalence of OSA. Variables for the model were selected and weighted using effect estimates from logistic regression. Using the model, probability values for OSA were calculated in children who were not investigated by HPSG, and children above the cut-off were assigned as "predicted" OSA cases. Predicted and validated cases were added to obtain the best estimate for the population prevalence of OSA. To our knowledge, this is the first prediction model for paediatric OSA that is based on two different screening tests and is constructed from data of a community-based sample. Prediction models are often used in adults to: 1) exclude a diagnosis of OSA when the probability is low so that no further testing is required; 2) establish an a priori probability before considering the use of a diagnostic method other than polysomnography; and 3) prioritise patients needing polysomnography according to the probability that they will have a positive result [45]. Four prediction models for paediatric SDB have been published [46][47][48][49]. They combined several data sources (i.e. clinical history, anthropometry and radiography) and types of modelling. SILVESTRI et al. [46] published a prediction model showing 0.81 accuracy. However, they did not present other measures of accuracy and did not publish raw data to allow calculation of these measures. The discriminant analysis classification system by SHOULDICE et al. [47] showed 0.86 sensitivity and 0.82 specificity. However, the test set was small and results were not prospectively confirmed in a larger group of children. XU et al. [48] demonstrated that radiological features of upper airway narrowing due to adenotonsillar hyperplasia were predictors for clinically relevant OSA. A combination of six predictors had a sensitivity and specificity of 0.94 and 0.42, respectively. Finally, BITAR et al. [49] presented a clinical score for obstructing adenoids. Polysomnography, however, was not performed and diagnostic accuracy for OSA was not determined.
In contrast, the current prediction model has several advantages. First, the model is based on parameters that can be easily obtained by filling in a questionnaire, measuring height and weight, and performing an overnight oximetry recording. In our study, the required data were successfully obtained in schools. Hence, it seems possible to use this model for large-scale screening programmes as well as for primary care settings. Secondly, sensitivity and specificity can be "adjusted". As the model delivers probability values (i.e. a continuous test result), cut-off values for a positive screening result may be adjusted according to the type of application. If necessary, sensitivity (or specificity) may be enhanced. This, however, would be at the expense of the specificity (or sensitivity). For our estimation of the population prevalence of OSA, we adjusted the cut-off to gain a specificity of >0.95 in order to decrease the false positive fraction. In other settings (e.g. screening studies with a second test or a gold standard evaluation), it could be more advisable to increase sensitivity and lower the false negative fraction. Thirdly, compared with each single SDB-Q score and HPO parameter (AUC ≤0.75), accuracy of the model was superior (AUC=0.86). It is inherent that a combination of diagnostic criteria shows higher accuracy than each single criterion. However, further studies are needed before prediction models similar to the current one may be used in clinical settings.

Limitations
Limitations of the current study have been discussed elsewhere [19,24]. Briefly, there might be a selection bias if subjects with symptoms were more likely to agree to participate. This would cause an overestimation of prevalence. The sample was drawn from an urban community of third graders. As the geographical variation in the prevalence of OSA is unclear, results can be extrapolated to suburban or rural communities only cautiously. Selected individuals for this study were third graders with an age range of 7-12 yrs. This is not the age span where the prevalence of OSA is thought to have its maximum. As OSA is mostly caused by adenotonsillar hyperplasia in children, and the quotient between pharyngeal diameter and adenotonsillar tissue size has its minimum in the first years of life, the age span of 3-5 yrs is suggested to have the highest prevalence. We were, however, interested in the relationship between SDB and academic achievement, which is not assessed until the third grade.
Adenotonsillectomy is the accepted first line treatment for OSA in children. Hence, the frequency of this procedure performed in a population may affect the population prevalence of OSA. In our sample, the frequency of adenotonsillectomy was 3.9%. In other populations with higher or lower rates of this procedure, the prevalence may differ substantially. However, adenoidectomy was a risk (and not a preventive) factor for habitual snoring in one study [50] and adenotonsillectomy did not decrease the risk for habitual snoring in another study [51]. Moreover, adenotonsillectomy was found to be ineffective in 50% of cases on 1-yr follow-up [52], and, in the present study, neither adenoidectomy nor tonsillectomy was a preventive factor for OSA. Hence, it remains speculative if high rates of adenoidectomy and/or tonsillectomy would substantially reduce the prevalence of OSA in a population.
Some screening criteria were introduced during the study to enhance sensitivity. Consequently, some children were screened positive in 2001 and not evaluated with HPSG until the end of 2002. OSA is thought to be a rather stable disease, but the precise variation in its expression and severity is unknown. It is possible that some children who were screened positive and had OSA in 2001 were not suffering from OSA anymore when the diagnostic procedure was done in 2002. This would have led to disease misclassification and may have biased both the prevalence estimate and the estimate of accuracy.
For a final diagnosis of OSA, we used abbreviated HPSG that did not include electroencephalography, electrooculography and electromyography. In 2000, no device for full HPSG was commercially available and we defined OSA on the basis of the AHI without need for arousal determination or sleep staging. One concern with abbreviated HPSG is the possible loss of diagnostic accuracy because sleep cannot be distinguished from wakefulness. This is based on the assumption that if detection of rapid-eye-movement sleep (when OSA is usually present or most severe) is not possible, OSA cannot be reliably ruled out. Three validation studies on abbreviated HPSG showed conflicting results [53][54][55]. However, as previously discussed by MORIELLI et al. [56] and JACOB et al. [54], there is invariably rapid-eye-movement sleep present in an all-night recording, even though it may not be possible to determine which specific epochs are included. Meanwhile, abbreviated HPSG has been used by a series of other community-based studies [6,10,12,14], possibly because full HPSG suffers from significant artefacts in the electroencephalographic and myographic channels [57]. The convincing advantages of abbreviated HPSG are convenience for both parents and children, and cost-effectiveness [54,58]. Moreover, the omission of sensors and leads attached to the child's face and head should help to improve sleep quality and establish a regular sleep profile in the night of recording. In summary, there is no evidence that abbreviated HPSG is not a valid and reliable diagnostic test procedure for OSA in children.
HPSG was performed in only 16% of all participants. For a prevalence study, it is desirable that the diagnostic test is applied to the total population or to all individuals of a representative sample drawn from that population. Performing HPSG in hundreds of children, however, is very cost-intensive, and this may be the reason that there are only four prevalence studies in which HPSG was performed in the entire sample, with sample sizes ranging from 101 to 850 individuals [6,10,13,14]. Most researchers used some kind of screening procedure to identify at-risk individuals for further diagnostic evaluation [4,5,7-9,11,12,15-18]. If the screening procedure is sufficiently sensitive, this approach is more cost-effective and reduces the burden of diagnostic procedures for low-risk individuals without introducing bias and underestimating the true sample and population prevalence. In the present study, we used six different screening criteria and a prediction model to reach a high level of sensitivity. We therefore believe that performing HPSG in the total sample would have been unlikely to yield a significantly higher prevalence estimate than reported here.
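The dependence of a screen-then-diagnose prevalence estimate on screening sensitivity can be illustrated with a minimal sketch. It assumes, for illustration only, that the diagnostic test is perfect and that all screen-negatives are counted as disease-free; the function name and all numbers are hypothetical and are not taken from the present study.

```python
def detected_prevalence(true_prevalence: float, screen_sensitivity: float) -> float:
    """Prevalence found when only screen-positives receive the diagnostic test.

    Assumes a perfect diagnostic test and that every screen-negative is
    counted as disease-free, so cases missed by screening are never found.
    """
    return true_prevalence * screen_sensitivity


# Hypothetical true prevalence of 3% under different screening sensitivities.
for sens in (0.60, 0.80, 0.95):
    est = detected_prevalence(0.03, sens)
    print(f"screening sensitivity {sens:.2f}: detected prevalence {est:.2%}")
```

Under these assumptions, the shortfall relative to the true prevalence equals the fraction of cases missed by screening, which is why a highly sensitive screening step keeps the underestimation small.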
Measures of accuracy were prone to verification bias, which occurs if not all screen-positives and only a small fraction of screen-negatives undergo gold standard evaluation [59]. When screen-positives are more likely to be verified for disease than screen-negatives, the bias in naïve estimates of accuracy is always to increase sensitivity and to decrease specificity from their true values. In our study, not all screen-positives and roughly 5% of screen-negatives underwent HPSG. Hence, the estimates of accuracy should be interpreted cautiously.
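The direction of verification bias can be illustrated with a small numerical sketch. All counts and verification fractions below are hypothetical and are not data from the present study. The inverse-probability weighting shown at the end is a standard textbook-style correction, not a method applied in this study, and it assumes that verification depends only on the screening result.

```python
def accuracy(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort (NOT study data): screening result vs. true status.
TP, FN, FP, TN = 30, 20, 150, 800

# Assumed verification fractions: 80% of screen-positives and
# 5% of screen-negatives undergo the gold standard test.
P_POS, P_NEG = 0.80, 0.05

# Expected counts among verified children only.
v_tp, v_fp = TP * P_POS, FP * P_POS
v_fn, v_tn = FN * P_NEG, TN * P_NEG

true_sens, true_spec = accuracy(TP, FN, FP, TN)
naive_sens, naive_spec = accuracy(v_tp, v_fn, v_fp, v_tn)

# Inverse-probability weighting: each verified child is weighted by
# 1 / (probability of being verified given the screening result).
corr_sens, corr_spec = accuracy(
    v_tp / P_POS, v_fn / P_NEG, v_fp / P_POS, v_tn / P_NEG
)

print(f"true:      sens={true_sens:.2f}, spec={true_spec:.2f}")
print(f"naive:     sens={naive_sens:.2f}, spec={naive_spec:.2f}")
print(f"corrected: sens={corr_sens:.2f}, spec={corr_spec:.2f}")
```

With these hypothetical numbers, the naive sensitivity among verified children is 0.96 against a true value of 0.60, and the naive specificity is 0.25 against a true value of about 0.84. The weighted estimates recover the true values only because the verification fractions are known exactly in this sketch.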
The prediction model is based on data from 7- to 12-yr-old children and should hence be applied only to this age group. In children outside this age range, factors other than those identified in this study may be more predictive for OSA, or the same factors may need to be weighted or combined in a different way. This is particularly true for infants and toddlers, where the BMI may not be predictive of SDB [60]. We thus warn against the use of our prediction model outside the age range of primary school children. Moreover, the number of validated subjects (n=183) could be insufficient for a precise estimation of the diagnostic test accuracy of the prediction model. No sample size calculation had been performed and the confidence interval for the AUC was rather wide, ranging from 0.71 to 0.94. We therefore explicitly do not recommend its clinical use until more validation data are available. However, we believe that its use as an additional "diagnostic" procedure to detect possible OSA cases was reasonable for the current study.

Conclusions
The population prevalence of OSA in German primary school children is likely to be 2-3%. Hence, OSA may be one of the most frequent chronic respiratory diseases in this age group. There are clinical symptoms and oximetry findings that may be helpful to detect OSA in this age group. These symptoms and signs, or a combination of both in a prediction model, may be used for screening purposes. Such a model may also be used in future studies on the population prevalence of OSA in other settings.

APPENDIX
The sleep-disordered breathing questionnaire is shown in table 6.