Substantial effective sample sizes were required for external validation studies of predictive logistic regression models

Yvonne Vergouwe; Ewout W Steyerberg; Marinus J C Eijkemans; J Dik F Habbema

doi:10.1016/j.jclinepi.2004.06.017

J Clin Epidemiol. 2005 May;58(5):475-83. doi: 10.1016/j.jclinepi.2004.06.017.

Authors

Yvonne Vergouwe¹, Ewout W Steyerberg, Marinus J C Eijkemans, J Dik F Habbema

Affiliation

¹ Department of Public Health, Erasmus MC, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands. Y.Vergouwe@UMCUtrecht.nl

PMID: 15845334

Abstract

Background and objectives: The performance of a prediction model is usually worse in external validation data than in the development data. We aimed to determine the effective sample sizes (i.e., numbers of events) at which relevant differences in model performance can be detected with adequate power.

Methods: We used a logistic regression model to predict the probability that residual masses of patients treated for metastatic testicular cancer contained only benign tissue. We performed standard power calculations and Monte Carlo simulations to estimate the numbers of events that are required to detect several types of model invalidity with 80% power at the 5% significance level.

Results: A validation sample with 111 events was required to detect that a model predicted probabilities that were too high when predictions were on average 1.5 times too high on the odds scale. A decrease in the discriminative ability of the model, indicated by a decrease in the c-statistic from 0.83 to 0.73, required 81 to 106 events, depending on the specific scenario.

Conclusion: We suggest a minimum of 100 events and 100 nonevents for external validation samples. Specific hypotheses may, however, require substantially higher effective sample sizes to obtain adequate power.
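
The calibration scenario described in the Results (predictions on average 1.5 times too high on the odds scale) can be illustrated with a small Monte Carlo simulation. The following is a minimal sketch, not the authors' procedure: the normal distribution of the linear predictor (`lp_mean`, `lp_sd`), the likelihood ratio test of a recalibration intercept, and all function names are assumptions chosen for illustration, so the resulting numbers will not reproduce the 111 events reported above.

```python
import numpy as np

# 0.95 quantile of the chi-square distribution with 1 degree of freedom
CHI2_1DF_95 = 3.841458820694124


def log_lik(y, eta):
    """Bernoulli log-likelihood for linear predictor eta (logit scale)."""
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def fit_intercept(y, lp, n_iter=50, tol=1e-10):
    """Newton-Raphson estimate of a in logit(P(y=1)) = a + lp (lp as offset)."""
    a = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a + lp)))
        step = np.sum(y - p) / max(np.sum(p * (1.0 - p)), 1e-12)
        a += step
        if abs(step) < tol:
            break
    return a


def power_calibration_in_the_large(n, odds_ratio, n_sim=500,
                                   lp_mean=0.0, lp_sd=1.5, seed=0):
    """Estimate power to detect predictions that are `odds_ratio` times
    too high on the odds scale, at the 5% significance level.

    lp_mean and lp_sd describe a hypothetical distribution of the model's
    linear predictor in the validation sample (an assumption of this sketch).
    Returns (estimated power, mean number of events per simulated sample).
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    events = 0
    for _ in range(n_sim):
        lp = rng.normal(lp_mean, lp_sd, size=n)      # predicted log odds
        true_lp = lp - np.log(odds_ratio)            # true log odds are lower
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))
        a_hat = fit_intercept(y, lp)
        # Likelihood ratio test of H0: intercept = 0 (model is calibrated)
        lr = 2.0 * (log_lik(y, a_hat + lp) - log_lik(y, lp))
        rejections += lr > CHI2_1DF_95
        events += y.sum()
    return rejections / n_sim, events / n_sim


if __name__ == "__main__":
    for n in (100, 200, 400):
        power, mean_events = power_calibration_in_the_large(n, 1.5)
        print(f"n={n}: power={power:.2f}, mean events={mean_events:.0f}")
```

Setting `odds_ratio=1.0` simulates a well-calibrated model, in which case the rejection rate should stay close to the nominal 5% level. Reporting the mean number of events alongside the power reflects the abstract's point that the effective sample size is the number of events rather than the total number of patients.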

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Logistic Models*
Male
Monte Carlo Method
Neoplasm Metastasis
Neoplasms, Germ Cell and Embryonal / pathology*
Predictive Value of Tests*
Reproducibility of Results
Sample Size*
Testicular Neoplasms / pathology*