Abstract
Obstructive sleep apnoea (OSA) is often treated with autotitrating continuous positive airway pressure (autoCPAP) devices. Clinical and bench tests of these devices have suggested performance limitations. However, these studies do not indicate whether such limitations reflect a failure to detect airway obstruction or a failure to respond to it.
In this randomised, crossover trial, 34 patients with moderate-to-severe OSA underwent polysomnography on two laboratory visits. The autoCPAP device was randomly set to a fixed subtherapeutic pressure (detection assessment) or autotitrating mode (response assessment). Airflow was measured both from the autoCPAP (autoCPAP flow) and directly from the nasal mask, and recorded on polysomnography. Apnoea/hypopnoea indices (AHIs) measured at the two sites and from the autoCPAP download report were compared.
Regarding detection, the AHI measured from the nasal mask showed good agreement with the autoCPAP flow AHI, but agreement was lower with the autoCPAP report AHI. In autotitrating mode, there was significant misclassification of those with and without OSA (AHI ≥10 events·h−1) on the autoCPAP report. Regarding response, residual OSA (AHI ≥10 events·h−1) was still evident in 24% of patients during autotitration.
In some patients, autoCPAP fails to detect and/or respond to sleep apnoea. Clinicians should consider limitations of each device and use caution when using autoCPAP report statistics to verify effective treatment.
- Autotitrating continuous positive airway pressure
- continuous positive airway pressure
- obstructive sleep apnoea
- residual sleep apnoea
The commonest treatment for moderate-to-severe obstructive sleep apnoea (OSA) is nasal continuous positive airway pressure (CPAP). When worn correctly and optimally titrated, fixed-pressure CPAP almost completely controls OSA, consolidating sleep and eliminating the hypoxic insults that are thought to cause the long-term sequelae associated with the disease [1].
In the past two decades, CPAP manufacturers have developed and marketed automatic CPAP (autoCPAP) devices [2]. These devices employ different breath-by-breath algorithms that raise or lower pressure to maintain a patent airway. AutoCPAP devices have been marketed as having OSA-controlling efficacy identical to that of fixed-pressure CPAP while delivering a lower mean pressure, on the assumption that lower pressure would be more comfortable and would improve adherence and, thus, long-term patient outcomes [3, 4]. However, a recent meta-analysis of 30 trials of various autoCPAP devices found only ∼0.2 h per night of additional use attributable to these machines, which is of questionable clinical relevance [5]. In addition, a range of clinical studies has reported that autoCPAP devices, whether used for home treatment or during titration to estimate a subsequent fixed CPAP pressure, have failed either to work reliably or to control OSA adequately in some patients [6–9].
AutoCPAP-derived residual apnoea/hypopnoea index (AHI) and other parameters are often used to determine an effective fixed pressure and to assess the efficacy of autoCPAP therapy. However, little is known about the accuracy and reliability of these parameters as reported on the data download of autoCPAP devices [10]. Treatment algorithms are poorly described because manufacturers avoid public disclosure in order to protect their intellectual property value, which raises clinical concerns [11]. Bench testing of these devices has also indicated response limitations under simulated respiratory challenges [12, 13]. No previous experiment has measured the detection performance of autoCPAP with the pressure deliberately set to a subtherapeutic level. It is therefore unclear whether the clinical limitations of autoCPAP therapy are caused by a failure to detect breathing events or by a failure to respond to them properly.
We aimed to investigate the mechanisms underlying the reported deficiencies in autoCPAP performance. We hypothesised that the device does not adequately detect OSA, does not adequately respond to it, or both. An autoCPAP device was assessed under two conditions, in random order, in the same patients. We tested the detection capabilities of the autoCPAP under the subtherapeutic condition, in which the device was set to a fixed, partially therapeutic pressure, by comparing the AHI on the autoCPAP data download report with a manually scored AHI derived from airflow measured at the nasal mask. We tested the response capabilities of the autoCPAP under the autotitrating condition, in which the device applied pressure according to its internal algorithm, by measuring the residual AHI derived from polysomnography and from the device's own internally generated report.
MATERIAL AND METHODS
Study subjects
Patients ≥18 yrs of age, of either sex, with previously diagnosed moderate-to-severe OSA (AHI >15 events·h−1) and self-reported positive airway pressure (PAP) therapy use >20 h·week−1 from Royal Prince Alfred Hospital sleep clinic (Sydney, Australia) and a volunteers’ database were invited to participate in the trial. Patients with major coexisting sleep, neurological, psychiatric or serious medical disorders were excluded. The Sydney South West Area Health Service Human Research Ethics Committee approved the protocol and all participants provided written informed consent.
Study design
This single-blind, randomised, crossover trial, designed to assess the detection and response capability of the autoCPAP device, was registered with the Australian and New Zealand Clinical Trials Registry (identifier number ACTRN12606000486527). Patients underwent full overnight polysomnography on two separate visits to the sleep laboratory within a 3-week period. In random order, the device (AutoSet-T; ResMed, Poway, CA, USA) was set either to autotitrating mode or to a fixed subtherapeutic pressure: two-thirds of the prescribed pressure for patients using fixed-pressure CPAP at home, or two-thirds of the 90th percentile of home autoCPAP pressure for those using autoCPAP at home. The device was configured with the manufacturer's default settings, except that patients could use the ramp feature as normal. Humidifiers were not used, and patients wore their own nasal masks when possible. Leaks were monitored by the overnight technician and kept below 0.4 L·s−1.
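The subtherapeutic pressure rule above can be expressed as a short calculation. The sketch below is illustrative only: the function name, the form of the home autoCPAP pressure log and the use of Python's inclusive percentile method are assumptions, because the text does not state how the 90th percentile was obtained (it may simply have been read from the home device report).

```python
from statistics import quantiles


def subtherapeutic_pressure(home_mode, prescribed_cmh2o=None, home_pressures_cmh2o=None):
    """Return two-thirds of the reference therapeutic pressure, in cmH2O.

    Hypothetical helper. home_mode is "fixed" for home fixed-pressure CPAP users
    and "auto" for home autoCPAP users. The percentile method is an assumption.
    """
    if home_mode == "fixed":
        reference = prescribed_cmh2o
    elif home_mode == "auto":
        # quantiles(..., n=10) returns the nine deciles; the last is the 90th percentile.
        reference = quantiles(home_pressures_cmh2o, n=10, method="inclusive")[-1]
    else:
        raise ValueError("home_mode must be 'fixed' or 'auto'")
    return round(reference * 2 / 3, 1)


print(subtherapeutic_pressure("fixed", prescribed_cmh2o=12))  # 8.0
print(subtherapeutic_pressure("auto", home_pressures_cmh2o=list(range(10, 21))))  # 12.7
```

Rounding to 0.1 cmH2O is also an assumption; a real device accepts only the pressure steps it supports.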
Methods
During polysomnography, airflow was measured simultaneously at two sites commonly used during in-laboratory PAP titration studies: at the nasal PAP mask, via the mask port; and from the autoCPAP-derived signal, an electronic output derived from the autoCPAP's in-built pressure transducer and recorded on the polysomnograph via a direct current input.
We also compared sleep apnoea indices calculated by polysomnography to those calculated by the autoCPAP device and displayed on its data download report (autoCPAP report) (fig. S1).
Standardised sleep scoring was undertaken by an experienced sleep technologist without reference to any respiratory channels [14]. Two de-identified copies of each sleep study, one displaying the autoCPAP-derived airflow signal and the other the mask-measured airflow signal, were then scored for respiratory events in random order by the same technologist, who was unaware of the airflow signal source. The technologist scored the studies in batches of 20 records (five patients' data in each batch). To assess intra- and inter-scorer variability, nine studies (i.e. 18 records) from the subtherapeutic night were scored twice by the same technologist and once by a second technologist.
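The article does not describe how the random scoring order was generated. As a hypothetical sketch, assuming each batch holds all four records (two nights × two airflow sources) of five patients, blinded batches could be produced as follows; the function name, the record-code format and the seeded generator are illustrative choices, not the study's procedure.

```python
import random

NIGHTS = ("subtherapeutic", "autotitrating")
SOURCES = ("mask", "autoCPAP-derived")


def blinded_batches(patient_ids, seed=0, patients_per_batch=5):
    """Group records into randomised, blinded scoring batches (hypothetical sketch).

    Each patient contributes four records, so five patients give a batch of 20.
    Returns (batches, key): each batch lists blinded record codes in scoring
    order; key maps each code back to (patient, night, airflow source).
    """
    rng = random.Random(seed)  # seeded so the allocation can be reproduced and audited
    patients = list(patient_ids)
    rng.shuffle(patients)
    batches, key, counter = [], {}, 0
    for start in range(0, len(patients), patients_per_batch):
        records = [
            (patient, night, source)
            for patient in patients[start:start + patients_per_batch]
            for night in NIGHTS
            for source in SOURCES
        ]
        rng.shuffle(records)  # random scoring order within the batch
        batch = []
        for record in records:
            # Codes are assigned after shuffling, so they reveal nothing about
            # the airflow source or the night.
            counter += 1
            code = f"R{counter:03d}"
            key[code] = record
            batch.append(code)
        batches.append(batch)
    return batches, key


batches, key = blinded_batches(range(1, 35))
print(len(batches), len(batches[-1]))  # 7 16
```

With 34 patients the final batch holds only four patients (16 records); how the study handled the remainder is not stated.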
Apnoeas were defined as a complete cessation of airflow for ≥10 s associated with an electroencephalographic (EEG) arousal and/or a ≥3% oxygen desaturation. Hypopnoeas were defined as a reduction in airflow of >50% from baseline lasting ≥10 s and associated with an EEG arousal and/or a ≥3% oxygen desaturation [14]. Baseline breathing was defined as the mean amplitude of stable breathing in the 2 min preceding event onset or, in those without stable breathing, the mean amplitude of the three largest breaths in that period. Flow limitation events were defined as two or more consecutive breaths (event duration generally ≥10 s) with a flattened/nonsinusoidal appearance that did not meet the hypopnoea criteria [15]. According to information provided by the manufacturer, the AutoSet-T scores an apnoea when nasal ventilation is reduced by >75% for ≥10 s, and a hypopnoea when the 8-s moving average ventilation drops below 50%, but not below 25%, of the recent average for 15 s [16]. We chose a threshold of AHI ≥10 events·h−1 to classify patients as having residual OSA.
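To make the difference between the manual and manufacturer-described rules concrete, the sketch below applies both to a single candidate event summarised by its duration and fractional airflow reduction. It is a simplification under stated assumptions: the device's "recent average" is treated as equal to the manual baseline, the same reduction value stands in for the 8-s moving-average ventilation, the manufacturer's hypopnoea rule is read as a fall to between 25% and 50% of the recent average, and complete cessation is modelled as a 100% reduction. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CandidateEvent:
    duration_s: float            # duration of the airflow reduction (s)
    reduction: float             # fractional reduction from baseline: 0 = none, 1 = complete cessation
    arousal: bool                # EEG arousal associated with the event
    desaturation_pct: float      # associated fall in oxygen saturation (percentage points)
    flattened_breaths: int = 0   # consecutive breaths with a flattened/nonsinusoidal shape


def baseline_amplitude(breath_amplitudes, stable):
    """Baseline from breath amplitudes in the 2 min before event onset.

    Assumes that, when breathing is stable, all supplied breaths are stable breaths.
    """
    if stable:
        return mean(breath_amplitudes)
    return mean(sorted(breath_amplitudes, reverse=True)[:3])  # three largest breaths


def manual_score(ev):
    """Manual criteria from the Methods (airflow measured at the mask)."""
    associated = ev.arousal or ev.desaturation_pct >= 3
    if ev.duration_s >= 10 and associated:
        if ev.reduction >= 1.0:  # complete cessation of airflow
            return "apnoea"
        if ev.reduction > 0.5:
            return "hypopnoea"
    if ev.flattened_breaths >= 2:  # flattened breaths not meeting hypopnoea criteria
        return "flow limitation"
    return None


def device_score(ev):
    """Simplified reading of the manufacturer-described AutoSet-T rules (assumption)."""
    if ev.reduction > 0.75 and ev.duration_s >= 10:
        return "apnoea"
    if 0.5 < ev.reduction <= 0.75 and ev.duration_s >= 15:
        return "hypopnoea"
    return None


short_hypopnoea = CandidateEvent(duration_s=12, reduction=0.6, arousal=True, desaturation_pct=2.0)
unassociated_drop = CandidateEvent(duration_s=20, reduction=0.8, arousal=False, desaturation_pct=1.0)
print(manual_score(short_hypopnoea), device_score(short_hypopnoea))      # hypopnoea None
print(manual_score(unassociated_drop), device_score(unassociated_drop))  # None apnoea
```

Under these assumptions, the two rule sets can disagree in both directions: the device needs 15 s for a hypopnoea but no arousal or desaturation, whereas the manual criteria need only 10 s but require an associated arousal or desaturation.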
Analysis
AHI (apnoeas and hypopnoeas per hour of sleep) and respiratory disturbance index (RDI) (apnoeas, hypopnoeas and flow limitation events per hour of sleep) measured at the two sites during polysomnography, as well as the AHI from the autoCPAP report, were compared using Bland–Altman plots. The flow signal from the mask was considered the reference standard for identifying and quantifying residual OSA. Logistic regression was used to identify predictors of residual OSA. Agreement between sites in classifying residual OSA was assessed by cross-tabulation and the κ-statistic. Wilcoxon signed-rank tests and paired-samples t-tests were used, as appropriate, to compare polysomnographic parameters. Continuous variables are reported as mean±sd unless otherwise stated. SPSS 17.0 for Windows (IBM SPSS, Somers, NY, USA) was used for statistical analysis.
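The Bland–Altman comparison amounts to computing the mean paired difference (bias) and the 95% limits of agreement, bias ± 1.96 sd of the differences. The sketch below is a minimal, hypothetical implementation (the study used SPSS). Differences are taken as reference minus comparison (e.g. mask AHI minus report AHI), so a comparison that reads higher than the mask gives a negative difference, consistent with the sign of the −1.7 events·h−1 autotitrating-night difference reported in this article. The helper counting differences of ≥5 events·h−1 mirrors the count used in the Results; the example data are invented.

```python
from statistics import mean, stdev


def bland_altman(reference, comparison, z=1.96):
    """Bias and 95% limits of agreement for paired measurements.

    Differences are reference minus comparison; a negative bias means the
    comparison method reads higher than the reference on average.
    """
    if len(reference) != len(comparison) or len(reference) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [r - c for r, c in zip(reference, comparison)]
    averages = [(r + c) / 2 for r, c in zip(reference, comparison)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower": bias - z * sd,
        "upper": bias + z * sd,
        "averages": averages,
        "diffs": diffs,
    }


def n_large_differences(diffs, threshold=5.0):
    """Count pairs differing by at least `threshold` events/h in either direction."""
    return sum(abs(d) >= threshold for d in diffs)


# Invented AHI values (events/h) for four patients, for illustration only.
mask_ahi = [5.0, 12.0, 20.0, 31.0]
report_ahi = [7.0, 11.0, 14.0, 25.0]
result = bland_altman(mask_ahi, report_ahi)
print(result["bias"], n_large_differences(result["diffs"]))  # 2.75 2
```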
In the absence of preliminary data with which to power the study, we estimated that 34 patients completing the protocol would be sufficient to test our hypothesis and to detect moderate-sized effects. With this sample size, the 95% confidence interval of each of the upper and lower Bland–Altman limits of agreement is approximately ±0.6 sd (sd of the between-method differences).
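The ±0.6 sd figure is consistent with the standard Bland–Altman approximation, in which the standard error of each limit of agreement is about √(3s²/n) for n pairs with standard deviation s of the differences; the 95% confidence interval is then ±1.96√(3/n) s. A quick check, with a hypothetical function name:

```python
import math


def loa_ci_half_width(n, z=1.96):
    """Approximate 95% CI half-width of a limit of agreement, in SDs of the differences.

    Uses the Bland-Altman approximation SE(limit) ~ sqrt(3 * s**2 / n).
    """
    return z * math.sqrt(3 / n)


print(round(loa_ci_half_width(34), 2))  # 0.58
```

For n=34 this gives about 0.58 sd, which rounds to the ±0.6 sd quoted above; using a t quantile instead of 1.96 would widen the interval slightly.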
RESULTS
Patient demographics
We screened 104 patients by telephone to identify 35 who gave informed consent to enter the study (fig. 1). One patient with an AHI of 12 events·h−1 measured 2 yrs previously and a subsequent 9-kg weight gain was deemed highly likely to have moderate-to-severe OSA and was enrolled into the study. 34 patients (30 male) completed the protocol; one patient dropped out because of work commitments. Patient demographic data are shown in table 1.
All patients were self-reported compliant PAP users (>4 h per night for at least five nights per week), with an average duration of PAP therapy of 44 months and an average pressure of 12 cmH2O. Patients used either fixed-pressure CPAP (76%) or autoCPAP (24%) at home. Information on sleep architecture is included in the online supplementary material (table S1).
Inter- and intra-scorer reliability
The intra-scorer agreement as rated by the κ-statistic was 1.00 for identifying patients with an AHI ≥10 events·h−1 in the nine studies evaluated (p≤0.05). Additionally, a good inter-scorer agreement of κ=0.73 was observed (p≤0.05).
AutoCPAP detection of sleep-disordered breathing
Table 2 details the sleep apnoea severity measures obtained from the two flow measurement sites and from the autoCPAP download report. There was no significant difference between the mask AHI and either the autoCPAP report AHI or the autoCPAP-derived signal AHI during the subtherapeutic night. During the autotitrating night, the autoCPAP report AHI was greater than the mask AHI by an average of 1.7 events·h−1 (8.1±4.4 and 6.5±5.4 events·h−1, respectively; p≤0.05). The RDI identified from the autoCPAP-derived signal was higher than the mask RDI by an average of 4.2 events·h−1 (29.7±16.6 and 25.5±16.0 events·h−1, respectively; p≤0.001) and 4.9 events·h−1 (25.7±14.4 and 20.8±13.7 events·h−1, respectively; p≤0.001) during the subtherapeutic and autotitrating nights, respectively. RDI data were not available from the autoCPAP report and therefore could not be included in this comparison.
Figure 2 presents Bland–Altman plots of the agreement between the AHI measured at the mask and the AHI from the autoCPAP report (fig. 2a and b) and from the autoCPAP-derived signal (fig. 2c and d) on the autotitrating and subtherapeutic nights. During the subtherapeutic night, the mean difference between the mask and autoCPAP report AHI was 0.07 events·h−1 (limits of agreement −11.44 to 11.57 events·h−1), and 11 out of 34 patients had a difference of ≥5 events·h−1 in either direction between the two AHI measures. The plots also show a trend for the autoCPAP report to underestimate the AHI at higher values and overestimate it at lower values. This pattern was observed both when the device was set to a fixed pressure (subtherapeutic night) and when it was in autotitrating mode (autotitrating night). The mask and autoCPAP-derived signal AHI showed good agreement during both the subtherapeutic (mean difference 0.51 events·h−1, limits of agreement −4.31 to 5.33 events·h−1) and autotitrating conditions (mean difference −0.1 events·h−1, limits of agreement −6.11 to 5.91 events·h−1).
AutoCPAP response to sleep-disordered breathing
24% of patients had an AHI ≥10 events·h−1 identified at the mask site and 76% had an RDI ≥10 events·h−1 during autoCPAP therapy (table 2). Body mass index, AHI at diagnosis, mask type, length of time on PAP therapy, current pressure, humidifier use or type of PAP device used at home (CPAP versus autoCPAP) were not associated with residual OSA.
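Because the RDI adds flow limitation events to the apnoeas and hypopnoeas counted in the AHI, a patient can fall below the AHI threshold yet above the RDI threshold, which is the pattern that separates the 24% and 76% figures above. The sketch below illustrates this with an invented patient; the function names are hypothetical and the threshold of 10 events·h−1 is the one used in this study.

```python
def event_index(n_events, total_sleep_time_h):
    """Events per hour of sleep."""
    if total_sleep_time_h <= 0:
        raise ValueError("total sleep time must be positive")
    return n_events / total_sleep_time_h


def residual_osa(apnoeas, hypopnoeas, flow_limitations, total_sleep_time_h, threshold=10.0):
    """AHI, RDI and residual-OSA classification by each index (>= threshold events/h)."""
    ahi = event_index(apnoeas + hypopnoeas, total_sleep_time_h)
    rdi = event_index(apnoeas + hypopnoeas + flow_limitations, total_sleep_time_h)
    return {
        "AHI": ahi,
        "RDI": rdi,
        "residual_by_AHI": ahi >= threshold,
        "residual_by_RDI": rdi >= threshold,
    }


# Invented example: 6 h of sleep with 4 apnoeas, 20 hypopnoeas and 60 flow limitation events.
print(residual_osa(apnoeas=4, hypopnoeas=20, flow_limitations=60, total_sleep_time_h=6.0))
# {'AHI': 4.0, 'RDI': 14.0, 'residual_by_AHI': False, 'residual_by_RDI': True}
```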
Agreement about patients classified with or without residual OSA on the autotitrating night according to the mask-measured indices, compared with the autoCPAP-derived signal and the autoCPAP report, is reported in table 3. The autoCPAP report incorrectly classified 62.5% of patients with residual OSA as being adequately controlled on autoCPAP therapy (i.e. five out of eight patients). Conversely, the report misclassified 19% of patients with adequately controlled OSA as having residual OSA (five out of 26 patients).
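The misclassification rates above follow from a 2×2 cross-tabulation against the mask-measured reference. The counts implied by the text are 8 mask-positive patients (3 identified by the report, 5 missed) and 26 mask-negative patients (5 flagged by the report, 21 not). The sketch below, with hypothetical function names, reproduces the two rates from these counts and also implements Cohen's κ for the same table layout, the agreement statistic named in the Methods.

```python
def misclassification(tp, fn, fp, tn):
    """Error rates relative to the mask-measured (reference) classification."""
    return {
        "missed_residual_osa": fn / (tp + fn),  # residual OSA reported as controlled
        "false_residual_osa": fp / (fp + tn),   # controlled OSA reported as residual
    }


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 table of reference (mask) versus test (report) classes."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    ref_pos, ref_neg = tp + fn, fp + tn
    test_pos, test_neg = tp + fp, fn + tn
    expected = (ref_pos * test_pos + ref_neg * test_neg) / n ** 2  # chance agreement
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


rates = misclassification(tp=3, fn=5, fp=5, tn=21)
print(f"{rates['missed_residual_osa']:.1%} {rates['false_residual_osa']:.1%}")  # 62.5% 19.2%
```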
DISCUSSION
This study examined two main questions related to the utility of autoCPAP in the clinical setting: first, does the device adequately detect OSA and, second, once OSA is detected, is there an adequate therapeutic response? We investigated disease detection by deliberately setting a subtherapeutic fixed pressure. There was good agreement between sleep apnoea events measured from the autoCPAP-derived flow signal and the reference standard measured directly at the mask. However, the events detected by the device's automated algorithm and displayed on the autoCPAP report agreed less well with the reference standard. In approximately one-third of patients, the autoCPAP report over- or underestimated the AHI by ≥5 events·h−1 compared with the mask AHI. In addition, when the device was used in autotitrating mode, there was significant misclassification of those with and without residual sleep-disordered breathing (AHI ≥10 events·h−1), and nearly one-quarter of patients were inadequately treated. Although the present study did not compare a night of autotitrating therapy with a night of optimally titrated fixed-pressure CPAP, the results indicate that clinicians should interpret autoCPAP download report results with caution.
We identified significant limitations in the ability of the autoCPAP algorithm to detect and classify respiratory events. Under the subtherapeutic condition, the mask-measured AHI and the algorithm-scored AHI displayed in the autoCPAP report disagreed about the severity of OSA (mean difference 0.07 events·h−1, limits of agreement −11.44 to 11.57 events·h−1). Although the mean AHI did not differ statistically between the two measures, 11 out of 34 patients had a difference of ≥5 events·h−1 in either direction, and two out of 34 patients had a difference of ≥10 events·h−1. The Bland–Altman plots also showed inconsistent differences between the two measures, with a trend for the autoCPAP report to underestimate the AHI at higher values and overestimate it at lower values (fig. 2).
Because residual OSA was common during the autotitrating night, we also compared the two measures with the device in autotitrating mode. We found similar misclassification issues, and the autoCPAP report AHI was significantly higher than the mask-measured AHI (mean difference −1.7 events·h−1, limits of agreement −10.36 to 6.96 events·h−1). The observed disagreements would not affect clinical decision-making for most patients but could affect those who lie near the thresholds for disease classification and/or efficacious therapy. These data suggest that the in-built detection algorithm may result in the under- or over-treatment of some OSA patients.
A potential explanation for the observed disagreement is that the device is too far from the source of the airflow to allow accurate detection. However, during the fixed-pressure condition, the autoCPAP in-built pressure/flow transducer produced a high-quality electronic signal that yielded an AHI in good agreement with the mask AHI (mean difference 0.5 events·h−1, limits of agreement −4.3 to 5.3 events·h−1; fig. 2). Furthermore, continual adjustment of the applied pressure during the autotitrating condition only marginally reduced the precision of agreement while slightly improving the mean accuracy (mean difference −0.1 events·h−1, limits of agreement −6.1 to 5.9 events·h−1). These findings suggest that the disagreement lies not with the in-built flow sensor but with the detection algorithm.
Although our experiment was not designed to investigate under-treatment during autotitration, the unexpected residual OSA observed in 24% of patients allowed us to examine the accuracy of the autoCPAP report to classify patients with and without residual OSA (cut-off of AHI ≥10 events·h−1). We found the report failed to identify 62.5% of patients with residual disease (five out of eight false negatives) and over-scored events in 19% of patients with well-controlled OSA (five out of 26 false positives). This suggests limitations in the detection algorithm and questions the reliability of the parameters presented on the autoCPAP report. Misclassification of adequately treated patients on autoCPAP may lead to unnecessary and expensive reviews/re-titrations, and unrecognised residual OSA on therapy may prolong the disease burden on the patient.
Under the autotitrating condition, the autoCPAP did not respond appropriately to its breath-by-breath flow analysis and did not deliver sufficient pressure to control OSA (AHI <10 events·h−1) in 24% of patients (eight out of 34). We found no patient-specific predictors of residual OSA, although the study may have been underpowered to detect such associations. When the sleep apnoea severity measure included flow limitation events (RDI), a greater proportion of patients were under-treated on autoCPAP (26 (76%) out of 34 patients). These findings call into question the manufacturer's claims that the device consistently detects flow limitation events and increases the pressure before an obstruction occurs [16]. Furthermore, although the algorithm measures flow limitation and alters pressure according to the degree of flattening, the autoCPAP report lacked these data, so we were unable to assess how accurately the algorithm detects these events. Our data suggest that limitations of the in-built detection and response algorithm contribute to the under-treatment of some OSA patients by autoCPAP.
Other research groups have reported concordant findings. Bench studies have indicated that autoCPAP devices variably respond to certain types of simulated hypopnoeas and demonstrate delays when responding to apnoeas [12, 13, 17–19]. Numerous clinical studies have reported high proportions (17–29%) of under-treated patients on autoCPAP and fixed-pressure CPAP therapy [8, 9, 20, 21]. It is possible that this occult residual OSA may explain some of the variable clinical outcomes associated with autoCPAP [6, 22, 23].
Earlier studies reported strong correlations between device- and polysomnography-measured respiratory indices, a tendency of autoCPAP to over-score events and an underestimation of AHI at higher values [24–26]. These early studies were limited in that the devices were evaluated in diagnostic mode, whereas autoCPAP devices are neither intended nor recommended for diagnostic purposes [10].
More recent studies that assessed the accuracy of autoCPAP in estimating residual OSA found good agreement between the algorithm-derived AHI on the report and the manually scored AHI [21, 27–29]. However, in these studies the assessment was made either during manual titration [27] or using data from non-experimental case series [21, 28]. One previous experimental study assessed the accuracy of the autoCPAP-scored AHI as a secondary outcome; however, it was performed in a small group of morbidly obese patients with severe OSA, so its results may not be generalisable [29]. To the best of our knowledge, this is the first randomised crossover study to evaluate both the detection and the response of an autoCPAP device in an attended setting with OSA measured independently (at the mask).
There are several limitations to our study. Polysomnography was necessary to evaluate the detection and response capability of the device independently. It could be argued that the skilled technicians had an advantage over the device's algorithm in detecting OSA, as they had access to other physiological signals in addition to flow. Our definitions of apnoeas and hypopnoeas took a conservative approach to accepted scoring criteria: events were scored only if they met the amplitude-reduction criterion and (rather than or) were associated with a ≥3% oxygen desaturation or an arousal. We chose the mask site as the reference-standard flow measure instead of a pneumotachograph because it is a simple and practical measurement site commonly used during PAP titrations. We assessed an early autoCPAP model that employed the AutoSet algorithm to treat OSA because it had been previously validated, and data from these early validation studies are currently used to support and promote newer models [30]. However, the degree to which these initial studies apply to subsequent technical iterations of the device is unclear, as these data are not available from the manufacturer. Moreover, in our clinical experience, the model we tested is still used by many patients. Comparing device behaviour is often difficult because of the proprietary nature of the technology [31]; however, it is clear that technology differs between autoCPAP devices, and our findings cannot be generalised to other devices, whose detection and response capabilities may be better or worse.
Our findings demonstrate that limitations in both the detection and the response of the device's in-built algorithm result in the under-treatment of some OSA patients by autoCPAP. When evaluating patients, clinicians should be aware of the limitations of each device and should not rely solely on autoCPAP report statistics. Further research is required to assess the treatment effectiveness of other autoCPAP devices, to validate the accuracy of the parameters captured on the download report and to determine the long-term health effects of residual OSA.
Acknowledgments
The authors acknowledge the support of the Woolcock Institute of Medical Research (Sydney, Australia), the staff at the Royal Prince Alfred Hospital Sleep Laboratory (Sydney, Australia), and all the participants in the study.
Footnotes
This article has supplementary material available from www.erj.ersjournals.com
Clinical Trial
This study is registered in the Australian and New Zealand Clinical Trials Registry with identifier number ACTRN12606000486527.
Support Statement
This study was supported by an AusIndustry Biotechnology Innovation Fund Grant.
Statement of Interest
Statements of interest for G.C. Dungan II, N.S. Marshall and R.R. Grunstein can be found at www.erj.ersjournals.com/site/misc/statements.xhtml
- Received June 2, 2011.
- Accepted September 26, 2011.
- ©ERS 2012