Abstract
This report describes the quality control programme used within the Bronchitis Randomized on N-acetylcysteine (NAC) Cost-Utility Study, a trial designed to assess the decline in lung function, exacerbation rate, health status, and cost-effectiveness with NAC or a placebo in 523 patients with chronic obstructive pulmonary disease over a 3-yr period.
Spirometry was scored from 0 (worst quality) to 6 (best quality). The mean score of 314 spirometries from 243 patients evaluated during the trial was 5.63±0.83. Linear regression analysis of the scores of 47 participating centres plotted against the time at which spirometries were performed yielded an intercept of 5.7±0.5 and a slope of -0.0001±0.001, which suggests that the initial high quality was maintained over time.
Retrospective examination of a further 345 postbronchodilator spirometries from 208 patients, in which the forced expiratory volume in one second exceeded the mean individual value recorded over the study by more than 20%, revealed a slightly lower quality of the start-of-test manoeuvre compared with the 314 spirometries.
In conclusion, these findings suggest that the quality control programme is likely to have helped achieve and maintain long-term spirometry performance in the Bronchitis Randomized on N-acetylcysteine (NAC) Cost-Utility Study trial. Special attention should be paid to spirometries whose forced expiratory volume in one second markedly exceeds the individual mean value.
In multicentre studies the outcomes depend upon the quality of the collected parameters. This is especially true when the latter are derived from spirometry, because factors such as patient cooperation, technician skill, instrument quality, centre organisation and motivation over time may eventually affect the final output 1–7. According to the literature, performance and quality of lung function parameters can be maintained at high levels during trials of long duration only if the variability of the instruments, software, and organisation is minimised, and regular and strict spirometry quality-control programmes are applied 1–5. However, the implicit limitation of such measures is the increased cost of the trials.
The Bronchitis Randomized on N-acetylcysteine (NAC) Cost-Utility Study (BRONCUS) is a recently designed trial for assessing the decline in lung function, exacerbation rate, health status and its decline, and the cost-effectiveness of NAC or a placebo in chronic obstructive pulmonary disease (COPD) over a 3-yr period 8. The study was assisted by a Quality Control Committee (QCC) charged with evaluating the quality of the respiratory function measurements to help the centres achieve and maintain best practice standards. This purpose was accomplished by evaluating and scoring 314 postbronchodilator spirometries from 243 patients, taken in random order from 47 centres. The spirometry scores of each centre, linearly regressed against time, gave an index of the initial quality (intercept) and maintenance over time (slope). In addition, after the trial was terminated, a further 345 postbronchodilator spirometries, which had a forced expiratory volume in one second (FEV1) exceeding the mean individual value recorded over the study by more than 20%, were evaluated to see whether FEV1 assessments higher than expected were associated with poor quality of the forced expiratory manoeuvre.
MATERIALS AND METHODS
General design
The study was designed as a phase III, two-arm double-blind, randomised placebo-controlled parallel group trial, examining the effects of NAC 600 mg·day−1 versus a placebo on annual decline in lung function, exacerbation rate, quality of life and cost-effectiveness. A total of 523 patients were tracked for 3 yrs.
Centre selection
University or general hospital centres were selected on the recommendation of the country coordinators, based on the institutions' previous experience in national or international clinical trials, the presence of established pulmonary function test laboratories following international spirometry guidelines, and willingness to perform the trial.
Patient selection
The patients came from 50 centres in 10 European countries. Inclusion criteria were smoking-related COPD, age between 40 and 70 yrs, a postbronchodilator FEV1 between 40 and 70% of predicted, FEV1 and/or forced vital capacity (FVC) reversibility of <12% pred and <200 mL 15 min after inhalation of 400 μg of salbutamol through a metered dose inhaler and a spacer, FEV1/vital capacity (VC) ratio of <88% of pred in males and <89% of pred in females, and a history of at least two exacerbations per year over the 2 yrs prior to enrolment.
Spirometry evaluations
All spirometries were done with the use of a pneumotachograph, except in one centre which used a dry spirometer. The same equipment was maintained for the entire duration of the study.
The centres were asked to follow the spirometry recommendations of the European Respiratory Society (ERS) 9. Prior to the trial, a 3-h teaching session on the technical aspects of spirometry and calibration was attended by the investigators and the personnel who conducted the trial. In brief, recommendations were given for acceptability and repeatability criteria of the maximum forced expiratory manoeuvres as listed in table 1. Because only a few centres could provide the complete series of the classical quality control parameters 1, 2, 6, records of at least the three best acceptable forced expiratory manoeuvres with relevant flow-volume loops (FVL) and corresponding values of FEV1, FVC, and peak expiratory flow (PEF) were requested for future evaluation by the QCC. The centres were asked to calibrate the equipment before each study session with a 3-L syringe according to the ERS guidelines 9. No checks were performed by the QCC on calibration.
Spirometries were arbitrarily scored by one of the authors of this paper on the basis of both visual inspection of the FVL and evaluation of the FEV1 and FVC repeatability on the delivered spirometries (table 2). With visual analysis of the FVL, a score of 2 was assigned to a spirometry fulfilling all the following start-of-manoeuvre parameters: sharp PEF, steep upslope of the FVL, and absent back-extrapolation volume (BEV) (<100 mL or <5% FVC, if available in the printout). In contrast, a score of 0 was assigned to a spirometry with either a visually flat PEF, or a flat upslope of the FVL, or an evident BEV (>100 mL or >5% FVC, if available in the printout). For the assessment of the FEV1 repeatability criteria, the scoring was as follows: 2 if the largest FEV1 of three reported acceptable manoeuvres did not exceed the next highest value by >100 mL; 1 if the difference was between 100 and 200 mL; and 0 if it was >200 mL. The same held for the repeatability of the FVC. Thus, for each spirometry, the highest score was 6. Expiratory time was not considered in the present scoring system because it was not consistently shown in the final report or measurable on the volume-time plots. The terminal portion of the FVL was carefully inspected to see whether flow tended to gradually decrease to zero rather than being suddenly interrupted, thus helping exclude obvious cases of premature termination of the forced manoeuvre before 6 s.
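The scoring rules described above can be illustrated with a minimal Python sketch. It is not the QCC's own tool: the function names are hypothetical, the three visual judgements are supplied by the reader as booleans, and only the start-of-test grades of 0 and 2 described in the text are modelled (any intermediate grade that table 2 may define is not represented).

```python
def start_of_test_score(sharp_pef, steep_upslope, bev_absent):
    """Score the start of the manoeuvre from three visual judgements.

    Returns 2 only when all three criteria are met, otherwise 0
    (assumption: no intermediate grade is modelled here).
    """
    return 2 if (sharp_pef and steep_upslope and bev_absent) else 0


def repeatability_score(values_ml):
    """Score repeatability from the acceptable manoeuvres (values in mL).

    Compares the largest value with the next highest one:
    difference <=100 mL scores 2, <=200 mL scores 1, >200 mL scores 0.
    """
    if len(values_ml) < 2:
        raise ValueError("at least two manoeuvres are needed")
    largest, next_highest = sorted(values_ml, reverse=True)[:2]
    difference = largest - next_highest
    if difference <= 100:
        return 2
    if difference <= 200:
        return 1
    return 0


def total_score(sharp_pef, steep_upslope, bev_absent, fev1_ml, fvc_ml):
    """Sum the three components; the maximum is 6."""
    return (start_of_test_score(sharp_pef, steep_upslope, bev_absent)
            + repeatability_score(fev1_ml)
            + repeatability_score(fvc_ml))


# Hypothetical spirometry: good start of test, FEV1 within 50 mL,
# FVC differing by 150 mL between the two best manoeuvres.
example = total_score(True, True, True,
                      fev1_ml=[2100, 2050, 1980],
                      fvc_ml=[3500, 3350, 3300])
```

In this hypothetical case the start-of-test and FEV1 components each score 2 and the FVC component scores 1, giving a total of 5.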
Throughout the trial, the QCC reviewed and scored 314 postbronchodilator spirometries randomly selected by the Statistics Department of Zambon group from 243 patients at 50 centres. Selection of the tests to be evaluated was based on the criterion that 100% of the centres, and a minimum of 40% of the patients and 5% of the spirometries had to be included in the sample. In the present paper, only the results of 47 centres are reported, because for three centres the spirometries were too few to assess the quality standards over time. The QCC regularly reported on the quality of the spirometry at the biannual Steering Committee meetings, and the results were subsequently sent to the participating centres together with recommendations on how to maintain or, if necessary, improve the standards of spirometric tests. When there was evidence that the quality of spirometry tended to decrease in a particular centre, the principal investigators were contacted in order to discuss the specific problems and potential actions to be taken to improve the quality.
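As an illustration of the sampling criterion, the following short Python sketch checks whether a sample covers all centres, at least 40% of the patients and at least 5% of the spirometries. The function name and argument names are hypothetical, and the total number of spirometries used in the example (5,000) is an invented figure for illustration only, since it is not stated in this section.

```python
def sample_meets_criteria(centres_sampled, centres_total,
                          patients_sampled, patients_total,
                          spirometries_sampled, spirometries_total):
    """Return True if the sample covers 100% of centres, >=40% of
    patients and >=5% of spirometries (integer arithmetic throughout)."""
    all_centres = centres_sampled == centres_total
    enough_patients = patients_sampled * 100 >= 40 * patients_total
    enough_tests = spirometries_sampled * 100 >= 5 * spirometries_total
    return all_centres and enough_patients and enough_tests


# 50 centres, 243 of 523 patients and 314 spirometries come from the text;
# the total of 5000 spirometries is a hypothetical placeholder.
ok = sample_meets_criteria(50, 50, 243, 523, 314, 5000)
```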
The quality standards of the centres were estimated by linearly regressing the scores of all spirometries of each centre evaluated by the QCC versus the time at which the tests were performed. A representative example of the quality of a centre is shown in figure 1. The resulting intercept was taken as an index of the initial quality level, and the slope as an index of performance over time.
An additional 345 spirometries of 208 patients with a postbronchodilator FEV1 exceeding the mean individual value recorded over the study by more than 20% were evaluated after the study was completed.
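The per-centre regression can be sketched with an ordinary least-squares fit of score against time. The Python example below is only an illustration under stated assumptions: the function name is hypothetical, time is expressed in days since the centre's first evaluated test, and the data are invented rather than taken from any BRONCUS centre.

```python
def fit_centre_quality(days, scores):
    """Ordinary least-squares fit of spirometry score against time.

    Returns (intercept, slope): the intercept indexes the initial
    quality level, the slope the change in score per day.
    """
    n = len(days)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_x = sum(days) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all tests were performed at the same time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical centre with seven scored spirometries over about 900 days.
days = [0, 150, 300, 450, 600, 750, 900]
scores = [6, 6, 5, 6, 6, 5, 6]
intercept, slope = fit_centre_quality(days, scores)
```

A slope close to zero with a high intercept corresponds to the pattern interpreted here as initially high quality maintained over time.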
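The selection rule for these additional spirometries can be sketched as follows. This is a minimal Python illustration, not the procedure actually used in the trial: the function name is hypothetical, the FEV1 values are invented, and the individual mean is assumed to include the flagged measurement itself, a detail the text does not specify.

```python
def flag_high_fev1(fev1_values, excess=0.20):
    """Return the indices of visits whose FEV1 exceeds the patient's
    own mean FEV1 over the study by more than `excess` (default 20%).

    Assumption: the mean is taken over all of the patient's values,
    including any value that ends up being flagged.
    """
    if not fev1_values:
        return []
    mean_fev1 = sum(fev1_values) / len(fev1_values)
    limit = mean_fev1 * (1 + excess)
    return [i for i, value in enumerate(fev1_values) if value > limit]


# Hypothetical patient: FEV1 (L) at five visits; the mean is about 1.6 L,
# so only values above about 1.92 L are flagged.
flagged = flag_high_fev1([1.50, 1.45, 1.55, 2.10, 1.40])
```

Here only the fourth visit (index 3) is flagged for quality review.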
STATISTICS
The Mann-Whitney U-test, Wilcoxon signed-rank test, and Chi-squared test were used as appropriate to test the differences between variables. Correlations were assessed by Pearson's test. Values are reported as mean±sd. A p-value <0.05 was considered statistically significant.
RESULTS
The main anthropometric and functional data of the patients studied during the trial and of those with an FEV1 exceeding the individual mean by >20% were similar to those of the entire group that participated in the BRONCUS trial (table 3).
Of the 314 spirometries randomly evaluated throughout the trial, 42 (13%) had BEV and 123 (39%) had expiratory time reported in the printout. Visual analysis of these FVLs identified the absence of excessive BEV in 40 of the 42 spirometries; the BEVs of the remaining two cases were 0.13 and 0.11 L. Expiratory time was >6 s in 117 of the 123 cases (10.9±4.6 s). The average total score of all 314 spirometries was 5.63±0.83 (table 4). The scores for both the start-of-test manoeuvre and FVC repeatability were significantly lower than that for FEV1 repeatability (p<0.001 for both). Start-of-test manoeuvre, FEV1 repeatability, and FVC repeatability fulfilled the ERS requirements in 93, 96, and 81% of the patients, respectively. Both start-of-test and FEV1 criteria were satisfied in 90% of the patients. The linear regression analysis of the spirometry scores plotted against time was constructed with an average of 7±3 observations per centre over a period of 899±344 days (mean r2 = 0.51±0.44). The intercept was 5.7±0.5, suggesting high initial performance. The slope was not significantly different from zero (−0.0001±0.001), suggesting that the overall quality of spirometry remained stable throughout the 3-yr trial. Intercepts and slopes of the 47 centres are shown in figure 2. The average total score was not correlated with age, body mass index (BMI), education, FEV1 % pred, FVC % pred, or FEV1/FVC. Males and females performed similarly, whereas former smokers performed slightly, but significantly, better than current smokers (5.5±1.0 and 5.2±1.2, respectively; p<0.05).
Of the 345 spirometries with a postbronchodilator FEV1 exceeding the mean individual value recorded over the study by >20%, 35 had BEV and 123 had expiratory time reported in the printout. Visual analysis of these FVLs identified the absence of excessive BEV in 32 cases. The expiratory time of the forced manoeuvre was >6 s on 117 occasions (11.3±3.8 s). The average total score of these spirometries (5.36±1.13) was slightly, but significantly, lower than that of the 314 spirometries evaluated throughout the trial (p<0.002), suggesting a slightly worse performance. This was due to a slightly, but significantly, worse start-of-test quality (1.69±0.72) (p<0.001). Both start-of-test and FVC repeatability scores were significantly worse than FEV1 repeatability (p<0.001 for both). The start-of-test, FEV1 repeatability, and FVC repeatability criteria fulfilled the ERS requirements in 85, 94, and 73% of the patients, respectively. Both start-of-test and FEV1 criteria were satisfied in 83% of the patients. Average data are presented in table 4. Average total scores were correlated with BMI (r2 = 0.14, p<0.01), but not with age, education, FEV1 % pred, FVC % pred, or FEV1/FVC.
DISCUSSION
The main findings of this study suggest that a regular quality control programme based on random supervision was apparently capable of achieving and maintaining an adequate level of performance over the 3-yr period in the BRONCUS trial. However, the quality of the spirometries with an FEV1 exceeding the mean individual value recorded over the study by more than 20% was slightly, but significantly, worse than that of the spirometries below that threshold.
The rate of spirometries failing to meet the criteria of acceptability and repeatability for the classical functional parameters reported in the literature is highly variable. For example, rates from as low as 2% to as high as 80% of spirometries have been reported as failing to meet the American Thoracic Society (ATS) standard for the FEV1 1–5, 10. This rate would be slightly higher if the stricter ERS criteria were applied. Worse results have been reported for FVC 2, 4, 7. Even if assessed with different methods, the rate of performance in the present trial was apparently well within the average range of the studies assisted by very strict QCCs 1–7. Many potential factors could have contributed to the success of the trial. Firstly, the trial was conducted in well-recognised European centres of excellence with long-standing experience in basic and clinical research. Secondly, rather than standardising the devices in all centres, the investigators were allowed to use their own equipment in the hope that working with their familiar spirometers could largely overcome the disadvantages of using new homogeneous equipment. Whether these two factors contributed to maintaining the standard of the current results cannot be inferred from these data, even though this is a reasonable presupposition.
QCCs have been used repeatedly in recent multicentre trials without any clear indication regarding the extent of effort and resources needed to be deployed in order to guarantee the desired results. Some studies have utilised top technical and human resources including effective training of the investigators, the supply of the same spirometers to the investigating centres together with software capable of collecting and transmitting all data to a coordinating centre along with personnel dedicated to visiting, to data processing, and statistical analysis 1–5. Notwithstanding all this, the rate of spirometries not fulfilling the international guidelines never fell to zero 1, 5. It is difficult to say whether our QCC was as effective in maximising the quality of the spirometric tests as those of previous trials, mainly due to the differences in the quality assessment of the spirometries and the effort spent encouraging the investigators. From a technical point of view, the current authors acknowledge the numerous methodological differences with the previous trials 1–7. Yet, for the reasons listed below, it is believed that the current approach was at least acceptable for the purpose of the trial. First, visual analysis of the start-of-test manoeuvre has already been used in research 11. Secondly, visual assessment of the initial part of the FVL in the current subsets of 42 and 35 cases yielded similar results to the numerical analysis. Thirdly, a comprehensive gold standard for proper evaluation of the quality of spirometry is still lacking, for important parameters such as time and volume history effects of the inspiratory manoeuvre preceding the forced expiration have never been considered 12. The current authors accept that simple analysis of FVC repeatability, without measuring the expiratory time, could have missed some unacceptable tests. 
However, the current findings in the subsets of spirometries with expiratory time reported in the printout suggest that the duration of the best manoeuvre was, in most cases, above 6 s. Thus, the low repeatability of the FVC was probably due to differences in expiratory time among the three acceptable manoeuvres.
The present results show that the start of the manoeuvre failed to meet standard criteria significantly more often than the repeatability of the FEV1. It is common experience that many patients consistently hesitate to initiate the forced expiratory manoeuvre or lose some air before blowing out. Others, in contrast, are able to perform submaximal efforts with impressive repeatability of FEV1 and FVC. Because the beginning of forced expiration may seriously affect the FEV1 and its interpretation 13, especially after medical interventions or over time, these findings suggest the importance of visual inspection of the FVL together with the back-extrapolation volume and time-to-PEF, whenever available, as part of the quality control parameters of the instruments. The higher rate of failure to meet standard criteria for FVC than for FEV1 is also in line with the data from the literature 2, 4, 7, and suggests that when this parameter is not the main outcome of the trial, it is too often neglected.
That the current QCC was effective in maintaining the standards of spirometry throughout the study is suggested by the slope of the linear regression of the overall spirometry score over time not being significantly different from zero. Data from the literature and common experience show that performance in pulmonary function testing tends to worsen with time due to lack of interest, motivation, and control 1, 5. To explain the success of the BRONCUS trial in this regard, the authors believe that uninterrupted random monitoring of lung function, combined with sharing the QCC results with all investigators at least twice a year, helped to avoid, or at least minimise, the commonest pitfalls and in turn maintained the desired quality standards.
Interestingly, spirometries with an FEV1 exceeding the mean individual value recorded over the study by more than 20% showed significantly poorer quality than those with an FEV1 below that threshold. Most of the problems were observed in the start-of-test phase as a result of submaximal efforts and/or hesitation. Because poorer effort is associated with a greater FEV1 as a result of less thoracic gas compression in ∼25% of the cases 13, this most likely contributed to the observed overestimation of the FEV1. The current authors doubt that such a small, though statistically significant, difference affected the results of the BRONCUS trial on the decay of lung function over time, either because the current study's scoring system was not intended to address this point or because the number of spirometries with an FEV1 exceeding the individual mean by >20% was small compared with the total number used to construct the curves of lung function decay (6.8%). Yet, these findings suggest once again the importance of strict quality programmes when clinical questions have to be addressed with lung function tests in multicentre trials.
In summary, random evaluation of the spirometries of the Bronchitis Randomized on N-acetylcysteine (NAC) Cost-Utility Study trial was apparently helpful in achieving and maintaining the desired levels of performance. Factors behind this success were likely to include awareness of the need for adequate support of the investigators in order to minimise or avoid the commonest pitfalls in lung function testing, and encouragement of the investigators in ways that best maintained the desired standards. Special attention should be paid to the start-of-test phase because overestimation of the forced expiratory volume in one second was most often the result of submaximal patient effort or hesitating expiratory manoeuvres.
Acknowledgments
The Steering Committee of the BRONCUS trial was represented by M. Decramer (Leuven, Belgium; Chairman), A. Ardia (Zambon, Italy), W. De Backer (Antwerp, Belgium), D. Olivieri (Parma, Italy), M. Rutten-van Mölken (Rotterdam, The Netherlands), T. Troosters (Leuven, Belgium), C. van Herwaarden (Nijmegen, The Netherlands), and C.P. Onno van Schayck (Maastricht, The Netherlands).
The following investigators participated in the study: M. Decramer (Leuven, Belgium), W. De Backer (Antwerp, Belgium), P.M. Mengeot (Montignies sur Sambre, Belgium), J. Verhaert (Lanaken, Belgium), B. De Wispelaere (Turnhout, Belgium), L. Delaunois (Mont-Godinne, Belgium), R.J. Broux (Liège, Belgium), J. Bruart (La Louvière, Belgium), A. Nusch (Essen, Germany), H. Reichert (Essen, Germany), H. Steffen (Landsberg am Lech, Germany), B. Kromer (Augsburg, Germany), D. Rost (Augsburg, Germany), V. Sobradillo (Bilbao, Spain), R. Rodriguez-Roisin (Barcelona, Spain), H. Vera Hernando (La Coruna, Spain), J.L. Viejo Banuelos (Burgos, Spain), J. Martinez Gonzalez del Rio (Oviedo, Spain), N. Gonzalez-Mangado (Madrid, Spain), J.M. Rodriguez Gonzalez-Moro (Madrid, Spain), J. Castillo Gomez (Sevilla, Spain), T. Veidebaum (Tallinn, Estonia), E. Leesik (Tartu, Estonia), J.F. Muir (Rouen, France), B. Blaive (Nice, France), P. Guerin (Lyon, France), P. Camus (Dijon, France), E. Weitzemblum (Strasbourg, France), J.C. Pujet (Paris, France), M. Del Donno (Parma, Italy), M. Luisetti (Pavia, Italy), L. Pesce (Padova, Italy), C. Sanguinetti (Osimo-An, Italy), T. Todisco (Perugia, Italy), A.J. Neve (Sittard, The Netherlands), D.S. Postma (Groningen, The Netherlands), A.M.P. Greefhorst (Hengelo, The Netherlands), H.M.M. Pouwels (Venlo, The Netherlands), H.E.J. Sinninghe Damste (Amsterdam, The Netherlands), A.J.M. Schreurs (Amsterdam, The Netherlands), P.N.R. Dekhuijzen (Nijmegen, The Netherlands), J.A. van Noord (Heerlen, The Netherlands), F.J.J. van den Elshout (Arnhem, The Netherlands), C.A.R. Groot (Oss, The Netherlands), V.D. Graaf (Utrecht, The Netherlands), P. Bresser (Amsterdam, The Netherlands), H. Gooszen and J. Creemers (Eindhoven, The Netherlands), R. Avila (Lisboa, Portugal), R. de Almeida (Vila Nova de Gaia, Portugal), L. Oliveira (Coimbra, Portugal), I. Gomes (Porto, Portugal), D. Nowak (Lodz, Poland), J. Zielinski (Warsaw, Poland), G. Riise (Gothenburg, Sweden).
- Received March 7, 2005.
- Accepted August 1, 2005.
- © ERS Journals Ltd