## Abstract

Clinical trials do not report sputum eosinophil data in a consistent manner, which makes it difficult to compare across studies and to evaluate the sample sizes estimated in them. The objectives of this paper are: 1) to systematically review the reporting of effect sizes and sample size calculations in randomised controlled trials using sputum eosinophil count as a primary outcome; and 2) to illustrate sample size estimation under different methods of data representation using data from an effective anti-eosinophil treatment strategy (mepolizumab).

Randomised controlled trials in adults (excluding allergen provocation models) of treatment of asthma and chronic obstructive pulmonary disease published in the past 10 years were searched for in Ovid MEDLINE, and 20 studies were identified that met all the inclusion criteria. Only nine studies discussed sample size calculation.

Change from baseline was used as an outcome in 11 studies and was expressed as change in absolute percentage count, percentage change from baseline or as fold changes.

Assuming a minimal clinically important reduction of 15% in absolute terms, 18 subjects in each arm will be required to achieve 80% power using an ANCOVA analysis, which we recommend, to detect significance with an alpha error of 0.05.

**Systematic review and illustration of sample size calculations in RCTs using sputum eosinophil count as a primary outcome** http://ow.ly/mK9g3

## Introduction

Eosinophil counts in sputum are a valid and reliable outcome measure in clinical trials of patients with eosinophilic lung diseases like asthma [1, 2]. Clinical trials of corticosteroids and specific anti-eosinophil agents, like mepolizumab, have consistently shown a reduction in sputum eosinophil counts [3–6]. The recent guidelines for clinical end-points in asthma trials set out by the American Thoracic Society (ATS) and the European Respiratory Society (ERS) have also incorporated the use of induced sputum eosinophil counts as an outcome measure [7]. However, this document does not provide guidelines for expressing results or calculating sample sizes for clinical trials which use sputum eosinophils as an outcome.

Clinical trials with two groups and with pre- and post-interventional data in each group usually compare either the post-interventional scores or change from baseline scores, using either a t-test or an analysis of covariance (ANCOVA) adjusting for baseline differences. The latter method of analysis has been held by statisticians to be superior to the other methods, as it has the highest statistical power [8]. For sputum eosinophil counts, which are conventionally expressed as a percentage of the total cell count, change from baseline is preferred over post-interventional scores as it is clinically more meaningful and gives an estimate of the amount of improvement or deterioration caused by an intervention. Clinical trials reporting change from baseline for sputum eosinophil counts have expressed the results as fold changes (ratio of pre- and post-interventional scores), percentage change from baseline or change in absolute percentage counts. Although none of the three is superior to the others from a biological point of view, change from baseline in terms of absolute percentage count is preferable because of its better clinical interpretability and because it needs no statistical data manipulation. Expressing results as fold changes or percentage change from baseline may be challenging, especially when there are zero counts, as often happens in clinical practice. These methods also convert the values into ratios, which have a non-normal or skewed distribution, necessitating data transformation (usually log) prior to carrying out statistical tests that assume a normal distribution. Zero values are usually either replaced by a small positive number, such as 0.1 or 0.001 [9], or have a constant, *e.g.* 0.2 [7], added prior to log transformation.

As a consequence of the differences in expressing the results, sample size estimation methods vary significantly in these clinical trials, as sample sizes depend on the selected effect sizes and the standard deviation of the primary outcome variable [10]. Often sample size reporting is also incomplete. We are not aware of any study, so far, that has examined the completeness or the quality of reporting of sample size calculations, effect sizes and analysis methods, or demonstrated the method of doing so for clinical trials using change in sputum eosinophil count from baseline as the primary outcome.

We therefore decided to 1) systematically review current practice of reporting effect sizes and sample size calculations in randomised controlled trials using sputum eosinophil count as a primary outcome; 2) describe and illustrate how to apply different strategies for defining outcomes using sputum eosinophil count in interventional studies that have both baseline and post-intervention data; and 3) demonstrate sample size estimation under the different methods of data representation illustrated in 2) and data analysis with a dataset from a prior clinical trial [3].

## Methods

### Search strategy

The OVID search engine was used to search for all randomised controlled trials, in the English language, in the past 10 years, with “asthma” AND “sputum eosinophils” OR “airway inflammation” as key search words using the Ovid MEDLINE(R), EMBASE and Ovid HealthSTAR databases. The inclusion criteria were: 1) studies on asthma and chronic obstructive pulmonary disease (COPD); 2) randomised controlled trials involving drug trials on human subjects; and 3) studies using sputum eosinophil measures as a primary outcome. Both parallel group and crossover studies were included. Studies were included if the primary aim was to look at airway inflammation, even if sputum eosinophil count was not the only primary outcome. Studies involving children, allergen or antigen provocation models, environmental exposure studies and those using nonparametric tests as the primary methods of analysis were excluded. Data were extracted independently by two reviewers (A. Dasgupta and S.Z.) from the methods section of each study and tabulated using a data extraction form. Any disagreements between the two reviewers were discussed and resolved by consensus. Figure 1 shows the study selection process.

### Different strategies for defining outcomes

Sputum eosinophil counts are customarily expressed as percentages of total cell count and not as absolute counts [11]. Change from baseline for sputum eosinophils for asthma and COPD trials can be represented in three different ways as follows: 1) change from baseline in absolute percentage counts, *i.e.* pre-intervention % − post-intervention % (or post-intervention % − pre-intervention %); 2) fold changes, *i.e.* pre-intervention % / post-intervention % (or post-intervention % / pre-intervention %); 3) percentage change from baseline, *i.e.* (pre-intervention % − post-intervention %) × 100 / pre-intervention %.

Data from the mepolizumab study [3] were used to demonstrate the above strategies. Zero values were replaced by 0.1 prior to all calculations.
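The three representations can be sketched in a few lines of Python. This is only an illustration: the function name `change_from_baseline` and the pre/post values in the loop are hypothetical and are not taken from the mepolizumab dataset; the replacement of zero counts by 0.1 follows the text.

```python
import math

ZERO_REPLACEMENT = 0.1  # value substituted for zero counts, as stated in the text


def change_from_baseline(pre_pct, post_pct, zero_replacement=ZERO_REPLACEMENT):
    """Return the three representations of change for one subject.

    pre_pct and post_pct are sputum eosinophil counts expressed as a
    percentage of the total cell count.
    """
    pre = pre_pct if pre_pct > 0 else zero_replacement
    post = post_pct if post_pct > 0 else zero_replacement
    fold = pre / post
    return {
        "absolute": pre - post,                 # change in absolute percentage count
        "fold": fold,                           # fold change (pre / post)
        "log_fold": math.log(fold),             # natural log of the fold change
        "percent": (pre - post) * 100.0 / pre,  # percentage change from baseline
    }


# Hypothetical subjects: a fall, a fall to a zero count, and a rise.
for pre, post in [(40.0, 10.0), (8.0, 0.0), (2.0, 6.0)]:
    print(pre, post, change_from_baseline(pre, post))
```

The third subject shows why percentage change from baseline can become negative and numerically large when counts rise from a low baseline.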

### Demonstration of sample size calculation

Sample size calculations were demonstrated using the above dataset for independent parallel group superiority studies analysed with t-tests, to achieve 80% power with a Type I error of 0.05, assuming equal sample sizes in the intervention and placebo groups. The standard deviations and the effect sizes for change in absolute percentage counts were calculated. We also used the data to calculate the sample size with fold changes as the outcome. Fold changes were log transformed (natural logs), with zero counts replaced by 0.1 prior to transformation, so that for this calculation both the standard deviation and the effect estimate were on the log scale.

The sample size formula [10] used was:

$$n = \frac{2\,\left(Z_{\alpha/2} + Z_{1-\beta}\right)^{2} s^{2}}{\delta_{0}^{2}} \qquad (1)$$

where n is the size per group; Z_{α/2} is the standard normal z-score corresponding to the two-sided α level of significance; Z_{1-β} is the standard normal z-score corresponding to the probability β of Type II error; δ_{0} is the minimum clinically important difference; and s is the prior estimate of the pooled standard deviation of the estimate of the difference between groups.

Sample size calculation using ANCOVA was done for a hypothetical superiority trial in which the minimal clinically important difference was 15% in absolute percentage change or a twofold change. The sample size for ANCOVA was calculated from that of the t-test by multiplying it by a factor of (1-ρ^{2}), where ρ is the correlation between the baseline and the final outcome measure [12].
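The calculation can be sketched as follows. The sketch assumes the standard normal-approximation formula for two equal groups, n = 2(Z_{α/2} + Z_{1-β})^{2}s^{2}/δ_{0}^{2}, which is consistent with the symbol definitions given here; the function names are hypothetical. The inputs in the usage lines (pooled standard deviation 23%, correlation -0.73 and a difference of 15% in absolute percentage counts) are the values reported later in this paper.

```python
import math
from statistics import NormalDist


def n_per_group_ttest(sd, delta, alpha=0.05, power=0.80):
    """Unrounded per-group sample size for a two-sided, two-sample comparison.

    Assumes the normal-approximation formula
    n = 2 * (z_{alpha/2} + z_{1-beta})^2 * sd^2 / delta^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


def n_per_group_ancova(n_ttest, rho):
    """Scale a t-test sample size by (1 - rho^2) for an ANCOVA analysis."""
    return n_ttest * (1 - rho ** 2)


# Values reported in this paper: sd = 23%, rho = -0.73, MCID = 15% (absolute).
n_t = math.ceil(n_per_group_ttest(sd=23.0, delta=15.0))
n_a = math.ceil(n_per_group_ancova(n_t, rho=-0.73))
print(n_t, n_a)  # prints: 37 18
```

Rounding up to the next whole participant at each step is a design choice of this sketch; it reproduces the 37 and 18 subjects per arm quoted in the Discussion.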

## Results

### Reporting of sample size calculation

Of the 361 search results, 20 studies fulfilled the inclusion criteria [9, 13–32]. Eleven of the 20 (55%) did not report how the sample size was calculated. Of the nine articles that did report a sample size calculation, only two detailed all the required parameters related to the calculation and only one contained all eight items within the checklist (table 1). The articles expressed outcomes variably as change in absolute percentage counts, percentage change from baseline or fold changes. One article reported the outcome as both change in absolute percentage count and percentage change from baseline. Nine articles reported post-intervention score comparisons, while the rest used change from baseline data for analysis (table 2). Six articles used ANCOVA for statistical analysis but none of the studies adjusted sample sizes for this.

### Demonstration of strategies for defining outcomes

The mepolizumab trial was a parallel group randomised controlled trial, with 10 patients in the placebo group and eight patients in the intervention (mepolizumab) group, each with pre- and post-intervention sputum eosinophil counts. The raw data for each of the groups and the different ways of expressing change from baseline are shown in table 3. Changes in absolute percentage counts, being the difference between pre- and post-intervention values, may be assumed to be normally distributed even though the pre- and post-interventional values themselves are not. Fold changes, on the other hand, are ratios and have a skewed statistical distribution. Hence, fold changes were log transformed prior to calculating the standard deviation and effect estimate. Results for percentage change from baseline were expressed as median (range).

In the placebo group sputum eosinophils increased by mean±sd 17.9±27.8%, while in the intervention group they decreased by 21.05±14.6%. The effect size was therefore 38.9% (absolute percentage counts) or 54-fold (or 3.8 in natural log-scale). The pooled standard deviation was 23% and 1.47 in absolute percentage counts and log (fold changes), respectively. The median change in terms of percentage change from baseline was -251.2% (-1669.4–60.6%) for the placebo group and 93.8% (71.9–99.6%) for the intervention group.
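As a check on the absolute percentage count figures, the reported values can be reproduced from the group summaries, assuming the usual pooled standard deviation formula for two independent groups (the text does not state which formula was used):

$$\text{effect size} = 17.9 - (-21.05) = 38.95$$

$$s = \sqrt{\frac{(10-1)\,27.8^{2} + (8-1)\,14.6^{2}}{10 + 8 - 2}} = \sqrt{\frac{6955.56 + 1492.12}{16}} \approx 22.98$$

Both agree, to rounding, with the effect size of 38.9% and the pooled standard deviation of 23% quoted above.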

### Demonstration of sample size calculation

The values of standard deviation and effect size, expressed as change in absolute percentage counts and as natural log-transformed fold changes, were entered into equation 1 to calculate sample sizes (table 4). In the mepolizumab clinical trial data, the correlation between the pre-intervention value and the change from baseline was -0.73 for absolute percentage counts and 0.72 on the log scale. Therefore, with ANCOVA the number of participants required is multiplied by a factor (1-ρ^{2}) of approximately 0.47. Thus, three (6×0.47) patients in each arm or one (2×0.47) patient in each arm will be sufficient when outcomes are expressed as change in absolute percentage counts and fold changes, respectively, provided other parameters remain constant.
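The adjustment factor follows directly from the two reported correlations:

$$1 - \rho^{2} = 1 - (-0.73)^{2} = 1 - 0.5329 = 0.4671 \quad \text{(absolute percentage counts)}$$

$$1 - \rho^{2} = 1 - 0.72^{2} = 1 - 0.5184 = 0.4816 \quad \text{(log scale)}$$

Both values are close to the single factor of 0.47 applied in the text; the sign of ρ does not matter because it is squared.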

## Discussion

### Reporting sample sizes and effect sizes

Sample size calculations are under-reported in clinical drug trials using sputum eosinophil counts as a primary outcome, as the results of our systematic review show. This may reduce the accuracy of trials and lead to inappropriate conclusions. There are also substantial variations in the way change from baseline sputum eosinophil count is expressed, which makes comparison between clinical trials and drug effects difficult. There is, thus, an unmet need to formulate guidelines for reporting drug effects when using sputum eosinophil counts in clinical trials. We did not examine the differences in sample size estimations and reporting between asthma and COPD trials, as the variance of change of sputum eosinophil counts and the minimal clinically important difference are conventionally taken to be the same for both diseases. Therefore, sample size estimation is unlikely to be affected by the disease definition. We also did not include studies in subjects with mild intermittent asthma where effects of various interventions in attenuating allergen-induced eosinophilic inflammation were examined. This was for two reasons. First, we wanted to limit our estimation of sample sizes to clinical trials that evaluated therapies that are relevant to patients with mild persistent or moderate-to-severe asthma in phase IIb or III studies, rather than experimental therapies in subjects with mild intermittent asthma. Secondly, and more importantly, mechanisms of reversal of established eosinophilia may be very different to those that prevent recruitment of eosinophils (such as following an allergen inhalation) and, therefore, effect sizes for these two separate interventions may be different. This needs to be investigated in a separate study.

### Expressing change from baseline

Of the three methods of expressing change from baseline, percentage change from baseline has been considered by statisticians to be an inefficient method, as it does not correct for any imbalance between groups at baseline and has a non-normal distribution [8]. Further, when used in sputum eosinophil trials, as demonstrated with this dataset, percentage change from baseline values may be both negative and positive and also numerically very large. This makes data transformation difficult, and percentage change from baseline is, therefore, not a very useful method of data representation for sputum eosinophil studies.

By contrast, change in absolute percentage count has greater clinical interpretability than fold changes, as exemplified below. A change from 25% to 5% is a 20% change in absolute percentage count, which is equivalent to a fivefold change or an 80% change from baseline. A reduction from 5% to 1% is only a 4% change in absolute percentage count, but has the same fold and percentage changes as the former case. In the latter case the sputum eosinophils have been normalised and a patient with such a report will not require additional treatment, whereas in the former case the patient needs further treatment to reduce the counts into the normal range. As another example, a change from 25% to 0.1% is a 250-fold change, which is the same fold change as a change from 5% to 0.02%, yet the former change has more clinical significance than the latter. Therefore, it may be reasonable to suggest that effect sizes or outcomes in clinical trials be expressed in terms of absolute percentage counts for better interpretation of results in the clinical context. The use of fold changes may be justified only in those situations where data transformation is needed for statistical purposes. In clinical practice and in dose-response studies [32], it is generally observed that sputum eosinophil percentages change proportionally to the baseline values, although this has not been evaluated systematically.
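The first comparison can be written out explicitly, using the three definitions of change from baseline given in the Methods:

$$
\begin{aligned}
25\% \to 5\%:&\quad 25 - 5 = 20\ \text{(absolute)},\qquad 25/5 = 5\ \text{(fold)},\qquad \frac{(25-5)\times 100}{25} = 80\% \\
5\% \to 1\%:&\quad 5 - 1 = 4\ \text{(absolute)},\qquad 5/1 = 5\ \text{(fold)},\qquad \frac{(5-1)\times 100}{5} = 80\%
\end{aligned}
$$

Only the absolute change distinguishes the two cases; the fold change and the percentage change from baseline are identical.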

### Sample size calculation

Sample size calculations using the mepolizumab data show that only six subjects or fewer in each arm in a parallel group randomised study on severe asthma subjects may have sufficient power to detect significant changes when using a t-test, assuming that all participants have the same baseline risk. The reason for such small sample size estimation is the large effect size of this trial. Such large effect sizes are, however, not unusual in severe asthma trials using anti-eosinophil specific therapies [4]. Corticosteroids have shown a reduction in the range of two- to seven-fold [6, 33] or 15% to 20% in absolute sputum eosinophil percentages in clinical trials of severe eosinophilic asthma [9, 13]. Smaller effect sizes are, however, seen with milder asthma trials [14].

The standard minimal clinically important difference (MCID) for sputum eosinophil studies has been traditionally accepted to be a 2-fold or 50% change [7]. From figure 2, which shows the sample size estimates for various effect sizes, it may be appreciated that clinical trials such as the mepolizumab study will have 80% power for detecting such a change with 72 participants in each arm. A consensus agreement on the MCID in terms of absolute percentage counts has not yet been reached. However, as mentioned earlier corticosteroids have shown a reduction in eosinophil counts of between 15% and 20% in various severe asthma studies. Therefore, it may be reasonable to assume an MCID of 15% in terms of absolute percentage counts when performing severe asthma trials with baseline eosinophil counts of more than 15%. Thus, 37 and 18 subjects in each arm will be required to achieve an 80% power using a t-test and ANCOVA, respectively (fig. 3).

It may also be seen that the estimate using log transformation (in fold change calculations) is much smaller than that using absolute percentage counts. This is because the variance is stabilised by the transformation, thereby reducing the sample size. Thus, sample size estimates are also somewhat affected by the method of data representation. From our calculations, adjusting for the use of ANCOVA reduced the sample size estimates by a factor of about two when the other sample size parameters were kept the same. Although ANCOVA is somewhat more complex to apply than a t-test, this gain in statistical power supports its use.

### Conclusion

Clinical trials using sputum eosinophil counts as a primary outcome do not often report how sample size calculations are made or the parameters assumed for them. Sample sizes in these trials are affected not only by the method of analysis, study design, MCID and variance, but also by the strategies adopted for data representation for defining outcomes. The best method to adopt for such trials is to express results as the change in absolute percentage counts, as this is clinically most relevant and does not involve the statistical intricacies of data transformation. Adjusting calculated sample sizes for the use of ANCOVA may reduce the sample size estimates by half. A sample size of 18 patients in each arm in an ANCOVA superiority trial may be sufficient to achieve 80% power, with an alpha error of 0.05, to detect a 15% difference in absolute percentage counts in severe asthma studies with specific anti-eosinophil agents, such as mepolizumab.

## Footnotes

For editorial comments see page 891.

Support statement: P. Nair is supported by a Canada Research Chair in Airway Inflammometry.

Conflict of interest: Disclosures can be found alongside the online version of this article at www.erj.ersjournals.com

- Received May 11, 2012.
- Accepted December 1, 2012.

- ©ERS 2013