Contradictory results from randomised controlled trials of acupuncture in asthma suggest both a beneficial and detrimental effect. The authors conducted a formal systematic review and meta-analysis of all randomised clinical trials in the published literature that have compared acupuncture at real and placebo points in asthma patients.
The authors searched for trials published in the period 1970–2000. Trials had to measure at least one of the following objective outcomes: peak expiratory flow rate, forced expiratory volume in one second (FEV1) and forced vital capacity. Estimates of the standardised mean difference between acupuncture and placebo were computed for each trial and combined to estimate the overall effect. Heterogeneity was investigated in terms of the characteristics of the individual studies.
Twelve trials met the inclusion criteria but data from one could not be obtained. Individual patient data were available in only three. Standardised differences between means ranging from 0.071 to 0.133, in favour of acupuncture, were obtained. The overall effect was not conventionally significant and it corresponds to an approximate difference in FEV1 means of 1.7. After exploring heterogeneity, it was found that studies where bronchoconstriction was induced during the experiment showed a conventionally significant effect.
This meta-analysis did not find evidence of an effect of acupuncture in reducing asthma. However, the meta-analysis was limited by shortcomings of the individual trials, in terms of sample size, missing information, adjustment of baseline characteristics and a possible bias against acupuncture introduced by the use of placebo points that may not be completely inactive. There was a suggestion of preferential publication of trials in favour of acupuncture. There is an obvious need to conduct a full-scale randomised clinical trial addressing these limitations and the prognostic value of the aetiology of the disease.
Several randomised clinical trials have reported a benefit from acupuncture in the treatment of asthma 1, 2, but generally results appear contradictory, suggesting both beneficial and detrimental effects 3, 4, 5, 6. The efficacy of acupuncture in asthma has not been proven beyond reasonable doubt 7. This may be due to differences in trial design and mode of treatment or to the small size of the trials. In terms of design, the insertion of a needle prevents blinding as a means of removing the placebo effect and therefore needles are sometimes inserted in “placebo points” 8. The wide range of outcomes measured, from objective tests (peak flow rates) to perceived breathlessness or anxiety, introduces another source of variation. Differences in the mode of treatment include a diversity of acupuncture points, periods of stimulation and methods of needle insertion 9. The size of all the individual studies was only a fraction of the sample size given by a conventional power requirement: 550 patients would be required to detect a standardised difference between means of 0.25, with a power of 80%, at the 5% significance level. To circumvent the problem of small sample size, the current authors aimed to systematically review and combine the results from all relevant randomised clinical trials that have compared acupuncture at real and placebo points in asthma patients 10. This approach allows detection of moderate treatment effects, which are unlikely to be reliably detected in small studies 11, 12, and also a more objective assessment of the sources of the conflicting results achieved in different trials.
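The order of magnitude of the power requirement quoted above can be checked with the usual normal-approximation formula for two equal parallel groups. This is only a sketch: the text does not state the exact assumptions behind the figure of 550, and the function name and the two-sided test are assumptions made here. Under them the formula gives roughly 500 patients in total, which is of the same order as the figure quoted.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(d, power=0.80, alpha=0.05):
    """Total patients (two equal parallel groups) needed to detect a
    standardised difference between means d with a two-sided test.

    Normal approximation: n per group = 2 * (z_alpha + z_beta)^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    n_per_group = ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
    return 2 * n_per_group


print(total_sample_size(0.25))  # total across both arms
```

Because the required size scales with 1/d², halving the detectable difference roughly quadruples the number of patients, which is why individual trials of a few dozen patients cannot reliably detect effects of this magnitude.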
A previous systematic review of seven trials involving 174 patients presented in the Cochrane Database of Systematic Reviews 13 integrated quantitative summaries of only three of the trials, using the difference of means. Using the standardised mean difference, the overview presented here allows a quantitative meta-analysis of nine of the eleven clinical trials included. In addition, this study attempts to ascertain and quantify the different sources of bias of the meta-analysis. Finally, although the study eligibility criteria of the Cochrane's Database overview are similar to the ones used in this study, the studies included are not the same. The authors give a full account of the methodology used and compare the use of unstandardised and standardised difference of means for the overall effect.
Eligibility of trials
The authors formulated two eligibility criteria. First, the study had to be a randomised clinical trial comparing real and placebo acupuncture in subjects with asthma. Second, the study had to measure at least one of the objective end points: peak expiratory flow rate (PEFR), forced expiratory volume in one second (FEV1) and forced vital capacity (FVC).
Retrieving the literature
Initially, references from published papers 1, 6 and narrative reviews about the topic 3, 8 were searched. A computer-assisted search examining the following literature databases was also performed: Medline, Biological Abstracts and Dissertation Abstracts. The keywords used were: “acupuncture”, “asthma”, “pulmonary disease”, “clinical trial”, “alternative medicine”, “randomised controlled trial” and “complementary medicine”. Successive searches were performed during the period of study, modifying the initial keywords in order to incorporate any new information. The search spanned the period 1970–2000. Finally, the authors of all eligible reports were contacted and asked if they were aware of any further published or unpublished work. Retrieval spanned the period December 1994–December 2000.
Characterisation of the studies
From each eligible report the following were noted: year and source of publication, number of patients, type of randomisation, blindness, number of acupuncture points, number of excluded patients, medication, and outcomes and statistical summaries of sex, age and duration of disease and treatment. A questionnaire was designed to assess other subjective features related to the quality of the reports 14. The questionnaire had three sections to assess, separately, study design (10 items), statistical analysis (three items) and presentation of results (two items). A score was assigned to each item to give a maximum achievable score of 39 points (27 for study design, 10 for statistical analysis section and 2 for presentation of results). Four experienced biostatisticians independently assessed the papers. Author and published source were unknown to the different assessors. The reliability of this assessment was evaluated by the effective reliability 15.
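As an illustration of how agreement among several assessors can be summarised, the sketch below assumes that "effective reliability" takes the Spearman-Brown form, in which the mean correlation between pairs of assessors is stepped up by the number of assessors. The text does not give the formula, so both this form and the example value of the mean correlation are assumptions.

```python
def effective_reliability(mean_r, n_raters):
    """Spearman-Brown reliability of the pooled judgement of n_raters,
    given the mean pairwise correlation mean_r between raters."""
    return n_raters * mean_r / (1 + (n_raters - 1) * mean_r)


# Hypothetical example: four assessors with a mean pairwise correlation of 0.5
print(effective_reliability(0.5, 4))
```

With a single assessor the function returns the mean correlation itself, and the value rises towards 1 as assessors are added.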
Every trial reported on at least one of the two outcome measures PEFR (six trials) and FEV1 (seven trials), and no single outcome was reported for all the trials. Consequently, two meta-analyses were performed; one based on PEFR (using FEV1 when PEFR was not available) and the other one based on FEV1 (using PEFR when FEV1 was not available). Since different outcomes had to be combined, the standardised differences between means were used instead of the difference between means, despite the fact that this summary limits the comparability and interpretation of the analysis 16. In crossover studies the correlation coefficient was used to compute the variance of the mean difference. A weighted-combination of the correlation coefficients was used in those studies that did not provide enough information for the correlation coefficient to be recovered 17.
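The two quantities described above can be sketched as follows. The text does not print its formulae, so the code uses the standard textbook forms as an assumption: the small-sample corrected (unbiased) standardised difference between means for a parallel-group trial, and the variance of the mean within-patient difference for a crossover trial with correlation r between the two periods. All function and argument names are hypothetical.

```python
from math import sqrt


def standardised_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Bias-corrected standardised difference between means and its
    approximate variance for two independent groups."""
    df = n_t + n_c - 2
    s_pooled = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    correction = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    d = correction * (mean_t - mean_c) / s_pooled
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d


def crossover_mean_diff_variance(sd_a, sd_b, r, n):
    """Variance of the mean within-patient difference when each of n
    patients receives both treatments and the outcomes correlate by r."""
    return (sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b) / n


# Hypothetical numbers: a positive correlation shrinks the variance
print(crossover_mean_diff_variance(1.0, 1.0, 0.0, 10))
print(crossover_mean_diff_variance(1.0, 1.0, 0.5, 10))
```

The second function shows why the correlation coefficient matters in crossover trials: ignoring a positive correlation overstates the variance and makes the trial look less precise than it is.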
If available, the authors planned to use the individual patient data. Otherwise, only summaries were used. A scanner and a technical drawing program were used to estimate means and standard errors when results were displayed graphically. For those trials with a series of repeated measurements, the maximum mean change was chosen as a criterion to obtain a single effect size for each trial.
To combine results from trials and estimate an overall effect, the authors used the fixed-effects model 18, weighting each trial with the inverse of the variance of the effect size. To assess whether there was any evidence of statistical disparity in results across trials a test of heterogeneity was performed 19. Given the low power of this test 20, possible sources of heterogeneity were also investigated. To explore the contribution of each study, the authors partitioned the sum of squares (of deviations between individual effects and overall) from the test of heterogeneity (QH) into two parts, one related to the between-subgroup (QB) differences and the other related to the within-subgroup differences (QW). When any heterogeneity was not explained by any identifiable cause, the robustness of the overall treatment effect was assessed using the random-effects model 21.
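The pooling steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the effect sizes, variances and subgroup labels are invented, and the random-effects step assumes the DerSimonian-Laird moment estimator, which the text does not name explicitly.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return pooled, (1 / sum(weights)) ** 0.5


def q_statistic(effects, variances):
    """Heterogeneity statistic QH: weighted squared deviations of the
    individual effects from the fixed-effects estimate."""
    pooled, _ = fixed_effect(effects, variances)
    return sum((e - pooled) ** 2 / v for e, v in zip(effects, variances))


def q_partition(effects, variances, groups):
    """Split QH into between-subgroup (QB) and within-subgroup (QW) parts."""
    q_within = 0.0
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        q_within += q_statistic([effects[i] for i in idx],
                                [variances[i] for i in idx])
    return q_statistic(effects, variances) - q_within, q_within


def random_effects(effects, variances):
    """Random-effects estimate (DerSimonian-Laird form, an assumption here)."""
    w = [1 / v for v in variances]
    q = q_statistic(effects, variances)
    c = sum(w) - sum(x * x for x in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-trial variance
    pooled, se = fixed_effect(effects, [v + tau2 for v in variances])
    return pooled, se, tau2


# Hypothetical data: four trials in two subgroups
effects = [0.4, 0.2, -0.3, 0.1]
variances = [0.05, 0.08, 0.04, 0.10]
groups = ["induced", "induced", "not induced", "not induced"]

print(fixed_effect(effects, variances))
print(q_statistic(effects, variances))
print(q_partition(effects, variances, groups))
print(random_effects(effects, variances))
```

QH is referred to a chi-squared distribution with (number of trials − 1) degrees of freedom. Because the random-effects weights add the same between-trial variance to every trial, the random-effects confidence interval is never narrower than the fixed-effects one.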
Description of the clinical trials
Over 200 possible trials were identified but only 12 satisfied the inclusion criteria 1, 2, 6, 22–30. One trial published in a Chinese journal 22 could not be recovered as the authors were only able to retrieve a poor translation of the abstract summarising the results, which did not provide the information needed to extract any useful summary data. The main author was contacted by mail but all attempts were unsuccessful. Consequently, only 11 studies were included for further analysis. The descriptive information for each of the 11 trials is shown in table 1⇓. Some trials had missing information. No trial stated the form of randomisation used and how it was performed.
In table 2⇓, the trials are classified by type of design, either crossover or parallel groups, and by three subcategories, according to whether the analysis had been: unadjusted by baseline measures, adjusted by baseline measures or adjusted by baseline measures and reported as percentage change from the baseline value. Only two of the crossover trials considered the possibility of a period effect 23, 24 and none took account of the possibility of a carry-over effect. Consequently, in the present analysis the authors did not assume that either period or carry-over effects were important in any of the trials. Another feature which varied across trials was whether or not asthma had been induced; asthma was induced by means of exercise or some sort of bronchospasm in five out of eleven trials. Reports were poorly presented. In many cases p-values were not precisely stated and, instead, their relationship to a conventional significance level was given. In addition, means were often stated without an indication of variability (such as a standard error or a plot of the means).
Estimation of the treatment effect
Figure 1⇓ and table 3⇓ show the standardised difference between means from nine studies, choosing a single outcome from each report. The overall treatment effect, estimated by the unbiased standardised difference between means (using the pooled correlation coefficient to estimate its variance), was d=0.12 and its 95% confidence interval (CI) was (−0.07–0.31). This corresponds to an approximate difference in FEV1 means of 1.7 (95% CI −1.3–4.7). Figure 1⇓ shows these results with the corresponding plot for individual studies and the overall combined result.
The test of heterogeneity was not statistically significant at conventional levels (QH=12.54 with eight degrees of freedom (df); p=0.13). However, when the contributions were examined, the study by Dias et al. 6 presented the greatest contribution to the heterogeneity statistic. After removing the study from the analysis, the test of heterogeneity showed a considerably lower value (QH=5.41 with seven df; p=0.61). The overall effect-size estimator without this trial was 0.167 (95% CI −0.02–0.359). Under the random-effects model this estimator was 0.12 (95% CI −0.14–0.38). This result was similar to the one obtained in the fixed-effects model, although the CI was slightly wider.
Studies with induced (provoked) and noninduced bronchoconstriction were analysed separately. The subset of studies in which bronchoconstriction was provoked gave an estimated effect of 0.3 (95% CI 0.04–0.56). In addition, there was very little evidence of heterogeneity of results across these trials. In contrast, the estimated effect for the studies where bronchoconstriction was not provoked was −0.08 (95% CI −0.28–0.20). The test of heterogeneity for these trials approached conventional significance (QH=7.49 with four df, p=0.11). This heterogeneity was mainly due to the study by Dias et al. 6. For this subset the effect-size estimator under the random-effects model was −0.08 (95% CI −0.45–0.29).
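The p-values quoted for the heterogeneity statistic can be reproduced by referring QH to a chi-squared distribution with the stated degrees of freedom. The sketch below uses the closed-form series for integer degrees of freedom so that it needs only the standard library; the function name is hypothetical, and a statistics package would normally be used instead.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x)
    chi = sqrt(x)
    normal_tail = erfc(chi / sqrt(2))        # 2 * (1 - Phi(chi))
    normal_pdf = exp(-x / 2) / sqrt(2 * pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                        # chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return normal_tail + 2 * normal_pdf * total


print(round(chi2_upper_tail(12.54, 8), 2))  # all nine trials
print(round(chi2_upper_tail(5.41, 7), 2))   # after removing one trial
```

Both values agree with the reported p=0.13 and p=0.61 to two decimal places.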
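Whether an estimate is "conventionally significant" can be read off its confidence interval. As a rough check, the sketch below recovers the standard error from a reported 95% CI, assuming the interval is symmetric and based on a normal approximation, and then computes the corresponding two-sided p-value. The function name is hypothetical.

```python
from statistics import NormalDist


def z_test_from_ci(estimate, lower, upper, level=0.95):
    """Recover the standard error, z statistic and two-sided p-value
    from a symmetric normal-theory confidence interval."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - normal.cdf(abs(z)))
    return se, z, p


# Provoked-bronchoconstriction subgroup: 0.3 (95% CI 0.04 to 0.56)
print(z_test_from_ci(0.3, 0.04, 0.56))
# Overall estimate reported earlier: 0.12 (95% CI -0.07 to 0.31)
print(z_test_from_ci(0.12, -0.07, 0.31))
```

The subgroup interval excludes zero and gives p below 0.05, whereas the overall interval includes zero. Because the subgroup comparison was exploratory, the result is better read as a hypothesis for future trials than as a confirmed effect.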
Testing the reliability of the estimation
To assess the robustness of the standardised mean difference results, the difference between means was separately calculated for each outcome measure. There were several limitations since the different studies reported different outcome measures. In some cases, this prevented the authors from combining all studies which presented the same outcome measure and, therefore, the number of trials was reduced. For example, the outcome measure FVC was reported in three trials, with only two of them showing similar experimental conditions. Therefore, this outcome measure was not examined. For the same reason one of the six studies that assessed FEV1 was removed from the analysis, so that five studies were integrated. Figure 2⇓ and table 4⇓ display the combination of the mean differences and standardised mean differences for this outcome measure. The overall mean difference in FEV1 was 3.53 (95% CI 0.44–6.62). The overall standardised mean difference in FEV1 was 0.17 (95% CI −0.05–0.39). The test of heterogeneity for the overall mean difference was QH=4.05 (four df; p=0.40). The standardised mean differences showed a slight increase in the heterogeneity statistic (QH=4.93, four df; p=0.29).
Evidence for publication bias
The small number of randomised trials and their relatively small size meant that there was little power to assess whether trials in favour of acupuncture had been preferentially reported. The funnel plot shown in figure 3⇓ is difficult to assess. The authors observed that all the effects corresponding to sample sizes between 11 and 19 were positive and the two largest trials had negative effects. One of the latter, the study by Dias et al. 6, is seen as a clear outlier. This could suggest a small publication bias towards positive results but both negative and positive results were published, reflecting the controversy of the subject (the Chinese trial for which data were unavailable 22 reported a positive effect of acupuncture on asthma with 184 patients).
Complementary therapies are of growing interest in healthcare. Acupuncture is one of the most popular of the alternative therapies. Some of the attractions stem from its long standing use in Chinese medicine and the avoidance of the side-effects of more conventional treatments for asthma such as corticosteroids and β2-agonist sympathomimetics 8. Overall effect sizes (standardised differences between means) of magnitudes between 0.07 and 0.13 (95% CI −0.07–0.31) were obtained. This corresponds to a largest plausible increase of FEV1 of ∼1.7, which may suggest that acupuncture for asthma has little effect on the objective outcomes considered. However, interestingly, a small effect may have been observed for experimentally-induced bronchoconstriction. The different aetiology may have resulted in this effect. Whether subjective outcomes (quality of life in general or perceived breathlessness or anxiety) were affected cannot reliably be assessed and may need testing in the future.
It appears that there is no clear agreement on the best method of conducting controlled trials of acupuncture in asthma in relation to the type of design, selected end-points and data analysis. In exploring heterogeneity of effects across studies, the differences between studies with varying experimental designs (crossover, unpaired comparison, paired comparison) and with varying quality in the presentation of the reports were assessed. Nevertheless, these sources of heterogeneity did not seem to affect the final conclusion. First, the test of heterogeneity generally gave low values although the results might indicate a modest effect of acupuncture in cases where the asthma was provoked and it would be wise to be aware of this hypothesis in future investigations. Secondly, both the fixed- and random-effects models have delivered similar results. The authors consider that the fixed-effects model is appropriate in this meta-analysis since the degree of heterogeneity is not too large and, as seen previously, it may be explained. Moreover, the comparative study using both the standardised and nonstandardised mean difference shows no contradictory conclusions, except for the end-point FEV1. For the five studies that measured FEV1, the standardised mean difference was 0.17 (95% CI −0.05–0.39) while the mean FEV1 difference was 3.5 (95% CI 0.3–9.5). The discrepancy for this end-point may have resulted from the inclusion of the study by Tashkin et al. 23: it contributes negatively with a large weight in the estimation of the standardised differences between means, whereas its contribution to the pooled estimator of the differences between means is small. The reason for this is that the study by Tashkin et al. 23 shows the largest sample size with the largest variance. This is obviously an extraordinary situation and suggests that there may be a mistake in this paper with regard to the calculation of the estimate.
Given the difficulty of assessing the impact of most biases on the overall result, the limitations that may have affected the reliability of this meta-analysis should be considered. There were factors that may have introduced a bias against acupuncture. First, there was no evidence to suggest that any of the trials estimated the sample size a priori and all of them were too small to detect a modest effect of acupuncture. An aim of meta-analysis is to increase the number of patients in order to detect such moderate effects with clinical significance 11 but the integrated number of patients in the present meta-analysis was still below the size given by a conventional power requirement. Secondly, placebo points used in asthma trials seem to be active in pulmonary disease 8. Thirdly, missing information was a considerable limitation. None of the papers presented sufficient information to estimate the effect size, and indirect methods had to be used that may have made the results more conservative. The use of the approach by Follman et al. 17 yields conservative results. Furthermore, some basic statistics were obtained by scanner from graphs and this would have added a measurement error. However, the authors believe that this was unlikely to be systematic in one direction. The effects of other factors, such as missing information and poor report writing, are likely to be important but their impact is difficult to estimate. In relation to presentation of reports, Rosenberger 31 presents a list of recommendations which will be essential information if an updated and more powerful meta-analysis of acupuncture is to be performed in the future.
The assessment of the quality of the studies by the independent assessors in this study is in general agreement with other studies 3. It shows that there are several shortcomings in the studies of acupuncture in asthma, above all in terms of sample size, effects of prognostic variables, missing information and the bias against acupuncture introduced by the use of placebo points that may not be completely inactive. It may still be possible to obtain the patient data to avoid the problem of missing information and to have the option of using more complex analyses 32. It is important to locate the Chinese report 22 by contacting different libraries and through the internet. The current meta-analysis did not find evidence of the efficacy of acupuncture in the treatment of patients with asthma, in agreement with the result presented in the Cochrane Database of Systematic Reviews 13. However, it is important to mention that the integrated sample size in both studies was still below the sample size given by a conventional power requirement. Hence, there is an obvious need to design a large randomised clinical trial in which the above limitations are addressed.
- Received September 8, 2000.
- Accepted May 28, 2002.
- © ERS Journals Ltd