COVID-19 prediction models should adhere to methodological and reporting standards

Gary S. Collins, Maarten van Smeden, Richard D. Riley
European Respiratory Journal 2020 56: 2002643; DOI: 10.1183/13993003.02643-2020
Gary S. Collins
1Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Maarten van Smeden
2Julius Center for Health Science and Primary Care, University Medical Center Utrecht, University of Utrecht, Utrecht, The Netherlands
Richard D. Riley
3Centre for Prognosis Research, School of Primary, Community and Social Care, Keele University, Keele, UK

Abstract

COVID-19 prediction models should adhere to methodological and reporting standards https://bit.ly/3ebnook

To the Editor:

The coronavirus disease 2019 (COVID-19) pandemic has led to a proliferation of clinical prediction models to aid diagnosis, disease severity assessment and prognosis. A systematic review identified 66 COVID-19 prediction models and concluded that all, without exception, are at high risk of bias owing to concerns surrounding data quality, statistical analysis and reporting, and that none can be recommended for use [1]. We therefore read with interest the recent paper by Wu et al. [2] describing the development of a model to identify COVID-19 patients with severe disease on admission, to facilitate triage. However, our enthusiasm was dampened by a number of concerns surrounding the design, analysis and reporting of the study, which deserve highlighting to readers.

Our first point relates to design. The authors randomly split their dataset into a training set and a test set. This has long been shown to be an inefficient use of the data [3]: it reduces the size of the training set (increasing the risk of model overfitting) and creates a test set too small for model evaluation. Stronger alternatives use the entire dataset to both develop and internally validate the model, based on cross-validation or bootstrapping [3]. This naturally leads us to elaborate on the sample size. The sample size needed for a prediction model study is largely driven by the number of individuals experiencing the event to be predicted (in the study by Wu et al. [2], those with severe disease). Applying published sample size formulae for developing prediction models [4, 5] to the information reported by Wu et al. [2] (75 candidate predictors, outcome prevalence of 0.237), the minimum sample size, even in the most optimistic scenario (i.e. the model achieving the highest anticipated R-squared), would be 1285 individuals (306 events). Precisely estimating the model intercept alone requires 279 individuals (66 events). After splitting their data, the authors developed their model on 239 individuals (57 events): clearly insufficient to estimate even the model intercept, let alone develop a prediction model.
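To illustrate how such figures are obtained, the sketch below (our illustration, not code from either paper) implements the three minimum sample size criteria of Riley et al. [4, 5] in Python, using the predictor count and outcome prevalence reported above. The anticipated Cox–Snell R-squared is an assumption (set here to its maximum possible value, the most optimistic case), so the resulting numbers approximate, rather than exactly reproduce, the figures quoted in the text.

```python
import math

p = 75            # candidate predictor parameters (as reported by Wu et al.)
phi = 0.237       # outcome prevalence (proportion with severe disease)
shrinkage = 0.90  # target expected shrinkage factor for criterion 1

# Maximum possible Cox-Snell R-squared for this outcome prevalence
lnL_null = phi * math.log(phi) + (1 - phi) * math.log(1 - phi)
max_r2_cs = 1 - math.exp(2 * lnL_null)

# Optimistic assumption: the model achieves the maximum possible R-squared
r2_cs = max_r2_cs

# Criterion 1: expected shrinkage of predictor effects of at least 0.9
n1 = p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

# Criterion 2: small optimism (<= 0.05) in the apparent R-squared
s2 = r2_cs / (r2_cs + 0.05 * max_r2_cs)
n2 = p / ((s2 - 1) * math.log(1 - r2_cs / s2))

# Criterion 3: estimate the overall outcome risk (intercept) to within +/- 0.05
n3 = (1.96 / 0.05) ** 2 * phi * (1 - phi)

n_min = max(n1, n2, n3)
print(f"n1={math.ceil(n1)}  n2={math.ceil(n2)}  n3={math.ceil(n3)}  "
      f"minimum n={math.ceil(n_min)} ({math.ceil(n_min * phi)} events)")
```

Larger anticipated R-squared values reduce the sample size needed for criteria 1 and 2, so any less optimistic assumption only increases the minimum requirement.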

The test set, comprising 60 individuals of whom approximately 14 experienced the event, was then used to evaluate the performance of the model. To put this in perspective, current sample size recommendations for evaluating model performance suggest a minimum of 100 events [6]. The performance of the model was also evaluated separately in each of five external validation datasets, in which the number of events ranged from 7 to 98; none of these meets this minimum requirement.
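As a rough worked example, assuming the reported outcome prevalence of 0.237, a validation dataset would need approximately 100/0.237 ≈ 422 individuals before 100 events could be expected, several times larger than the test set used here.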

Other concerns include the handling of missing data: it is hard to believe that all patients had complete information on all 75 predictors, and indeed the flow chart reveals that 38 individuals with missing data were simply excluded, which can lead to bias [7]. Continuous predictors were assumed to be linearly associated with the outcome, which can reduce predictive accuracy. Model overfitting (a clear concern given the small sample size) was not addressed, either by adjusting the performance measures for optimism or by shrinking the regression coefficients, which are likely to be overestimated (e.g. using penalisation techniques [8]). “Synthetic sampling” was used to address imbalanced data, but this is inappropriate: artificially balancing the data produces an incorrect estimate of the model intercept (unless it is re-adjusted post-estimation), leading to incorrect model predictions (miscalibration). Model performance was poorly and inappropriately assessed, including presenting a confusion matrix (inappropriate for evaluating prediction models [8]), reporting sensitivity/specificity (where net benefit would be more informative [9]), and assessing model calibration using weak and discredited approaches (the Hosmer–Lemeshow test rather than calibration plots with loess smoothers [6]). We also question the arbitrary choice of risk groupings, and why individuals with a predicted risk of 0.21 are considered the same (“middle risk”) as those with a predicted risk of 0.80.
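On the point about artificially balanced data, the sketch below (our illustration, not the authors' method) shows the standard prior-correction of the intercept: a logistic model fitted on data rebalanced to an artificial prevalence can have its intercept shifted back to the true outcome prevalence so that predicted risks are calibrated-in-the-large. The numbers used are hypothetical, and the correction is only approximate when synthetic observations are generated (e.g. SMOTE-type methods) rather than classes simply being re-sampled.

```python
import math

def correct_intercept(intercept_balanced, prev_balanced, prev_true):
    """Shift an intercept estimated at the artificial (balanced) prevalence back to
    the true outcome prevalence; the predictor coefficients are left unchanged."""
    offset = math.log((prev_balanced / (1 - prev_balanced)) /
                      (prev_true / (1 - prev_true)))
    return intercept_balanced - offset

# Hypothetical example: model fitted on 50:50 balanced data,
# true prevalence of severe disease 0.237 (as reported by Wu et al.).
b0 = correct_intercept(intercept_balanced=-0.30, prev_balanced=0.50, prev_true=0.237)
print(round(b0, 2))  # corrected intercept, here about -1.47
```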

Arguably the most important aspect of a prediction model article is the presentation of the model itself, so that others can use or evaluate it in their own setting. The authors have presented a nomogram and (prematurely) linked to a web calculator. Whilst both formats can be used to apply the model to individual patients (though, given our concerns, we urge against this), independent validation requires the prediction model to be reported in full, namely all the regression coefficients and the intercept [10]; these are noticeably absent.
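To make the point concrete, the sketch below shows why the full equation matters: with every regression coefficient and the intercept reported, anyone can recompute a predicted risk and independently validate the model. The coefficient names and values are hypothetical placeholders, not the model of Wu et al.

```python
import math

# Hypothetical coefficients for illustration only; not the published model.
intercept = -2.1
coefficients = {"age": 0.03, "ldh": 0.004, "lymphocyte_count": -0.80}

def predicted_risk(patient):
    """Linear predictor plus inverse-logit transform of a logistic regression model."""
    lp = intercept + sum(coefficients[name] * patient[name] for name in coefficients)
    return 1 / (1 + math.exp(-lp))

print(round(predicted_risk({"age": 62, "ldh": 300, "lymphocyte_count": 0.9}), 2))
```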

Finally, the authors followed the STARD checklist when reporting their study, but this is not the appropriate guideline: STARD is for diagnostic test accuracy studies, not multivariable clinical prediction models. We urge the authors, and other investigators developing (COVID-19) prediction models, to consult the TRIPOD statement (www.tripod-statement.org) for the key information to report when describing a prediction model study, so that readers have the minimum information needed to judge its quality [10]. The accompanying TRIPOD explanation and elaboration paper describes the rationale for, and importance of, transparent reporting, and also discusses various methodological considerations [6].

Shareable PDF

Supplementary Material

This one-page PDF can be shared freely online.

ERJ-02643-2020.Shareable

Footnotes

  • Conflict of interest: G.S. Collins has nothing to disclose.

  • Conflict of interest: M. van Smeden has nothing to disclose.

  • Conflict of interest: R.D. Riley has nothing to disclose.

  • Support statement: This work was supported by Cancer Research UK (grant C49297/A27294). G.S. Collins was supported by the NIHR Biomedical Research Centre, Oxford. Funding information for this article has been deposited with the Crossref Funder Registry.

  • Received July 5, 2020.
  • Accepted July 6, 2020.
  • Copyright ©ERS 2020. This version is distributed under the terms of the Creative Commons Attribution Non-Commercial Licence 4.0 (http://creativecommons.org/licenses/by-nc/4.0/).

References

  1. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020; 369: m1328. doi:10.1136/bmj.m1328
  2. Wu G, Yang P, Xie Y, et al. Development of a clinical decision support system for severity risk prediction and triage of COVID-19 patients at hospital admission: an international multicentre study. Eur Respir J 2020; 56: 2001104. doi:10.1183/13993003.01104-2020
  3. Steyerberg EW, Harrell FE Jr, Borsboom GJ, et al. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001; 54: 774–781. doi:10.1016/S0895-4356(01)00341-9
  4. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020; 368: m441. doi:10.1136/bmj.m441
  5. Riley RD, Snell KI, Ensor J, et al. Minimum sample size for developing a multivariable prediction model: PART II – binary and time-to-event outcomes. Stat Med 2019; 38: 1276–1296. doi:10.1002/sim.7992
  6. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162: W1–W73. doi:10.7326/M14-0698
  7. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009; 338: b2393. doi:10.1136/bmj.b2393
  8. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd Edn. New York, Springer, 2019.
  9. Vickers AJ, Van Calster B, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016; 352: i6. doi:10.1136/bmj.i6
  10. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015; 162: 55–63. doi:10.7326/M14-0697