Systematic evaluation and external validation of 22 prognostic models among hospitalised adults with COVID-19: An observational cohort study

Rishi K. Gupta, Michael Marks, Thomas H. A. Samuels, Akish Luintel, Tommy Rampling, Humayra Chowdhury, Matteo Quartagno, Arjun Nair, Marc Lipman, Ibrahim Abubakar, Maarten van Smeden, Wai Keong Wong, Bryan Williams, Mahdad Noursadeghi on behalf of The UCLH COVID-19 Reporting Group
European Respiratory Journal 2020; DOI: 10.1183/13993003.03498-2020
Author affiliations: Rishi K. Gupta 1,2; Michael Marks 2,3; Thomas H. A. Samuels 2; Akish Luintel 2; Tommy Rampling 2; Humayra Chowdhury 2; Matteo Quartagno 4; Arjun Nair 2; Marc Lipman 5; Ibrahim Abubakar 1; Maarten van Smeden 6; Wai Keong Wong 2; Bryan Williams 7,8; Mahdad Noursadeghi 2,9.

1 Institute for Global Health, University College London, London, UK
2 University College London Hospitals NHS Trust, London, UK
3 Clinical Research Department, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
4 MRC Clinical Trials Unit, Institute of Clinical Trials and Methodology, University College London, London, UK
5 UCL Respiratory, Division of Medicine, University College London, London, UK
6 Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
7 NIHR University College London Hospitals Biomedical Research Centre, London, UK
8 University College London, London, UK
9 Division of Infection & Immunity, University College London, London, UK

Abstract

Background The number of proposed prognostic models for COVID-19 is growing rapidly, but it is unknown whether any are suitable for widespread clinical implementation.

Methods We independently externally validated the performance of candidate prognostic models, identified through a living systematic review, among consecutive adults admitted to hospital with a final diagnosis of COVID-19. We reconstructed candidate models as per original descriptions and evaluated performance for their original intended outcomes using predictors measured at admission. We assessed discrimination, calibration and net benefit, compared to the default strategies of treating all and no patients, and against the most discriminating predictor in univariable analyses.

Results We tested 22 candidate prognostic models among 411 participants with COVID-19, of whom 180 (43.8%) and 115 (28.0%) met the endpoints of clinical deterioration and mortality, respectively. Highest areas under receiver operating characteristic (AUROC) curves were achieved by the NEWS2 score for prediction of deterioration over 24 h (0.78; 95% CI 0.73–0.83), and a novel model for prediction of deterioration <14 days from admission (0.78; 0.74–0.82). The most discriminating univariable predictors were admission oxygen saturation on room air for in-hospital deterioration (AUROC 0.76; 0.71–0.81), and age for in-hospital mortality (AUROC 0.76; 0.71–0.81). No prognostic model demonstrated consistently higher net benefit than these univariable predictors, across a range of threshold probabilities.

Conclusions Admission oxygen saturation on room air and patient age are strong predictors of deterioration and mortality among hospitalised adults with COVID-19, respectively. None of the prognostic models evaluated here offered incremental value for patient stratification to these univariable predictors.

Tweetable abstract

Oxygen saturation on room air and patient age are strong predictors of deterioration and mortality among hospitalised adults with COVID-19, respectively. None of the 22 prognostic models evaluated in this study add incremental value to these univariable predictors.

Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), causes a spectrum of disease ranging from asymptomatic infection to critical illness. Among people admitted to hospital, COVID-19 has a reported mortality of 21–33%, with 14–17% requiring admission to high dependency or intensive care units (ICU) [1–4]. Exponential surges in transmission of SARS-CoV-2, coupled with the severity of disease among a subset of those affected, pose major challenges to health services by threatening to overwhelm resource capacity [5]. Rapid and effective triage at the point of presentation to hospital is therefore required to facilitate adequate allocation of resources and to ensure that patients at higher risk of deterioration are managed and monitored appropriately. Importantly, prognostic models may have additional value in patient stratification for emerging drug therapies [6, 7].

As a result, there has been global interest in development of prediction models for COVID-19 [8]. These include models aiming to predict a diagnosis of COVID-19, and prognostic models, aiming to predict disease outcomes. At the time of writing, a living systematic review has already catalogued 145 diagnostic or prognostic models for COVID-19 [8]. Critical appraisal of these models using quality assessment tools developed specifically for prediction modelling studies suggests that the candidate models are poorly reported, at high risk of bias and over-estimation of their reported performance [8, 9]. However, independent evaluation of candidate prognostic models in unselected datasets has been lacking. It therefore remains unclear how well these proposed models perform in practice, or whether any are suitable for widespread clinical implementation. We aimed to address this knowledge gap by systematically evaluating the performance of proposed prognostic models, among consecutive patients hospitalised with a final diagnosis of COVID-19 at a single centre, when using predictors measured at the point of hospital admission.

Methods

Identification of candidate prognostic models

We used a published living systematic review to identify all candidate prognostic models for COVID-19 indexed in PubMed, Embase, arXiv, medRxiv, or bioRxiv until 5th May 2020, regardless of underlying study quality [8]. We included models that aim to predict clinical deterioration or mortality among patients with COVID-19. We also included prognostic scores commonly used in clinical practice [10–12], but not specifically developed for COVID-19 patients, since these models may also be considered for use by clinicians to aid risk-stratification for patients with COVID-19. For each candidate model identified, we extracted predictor variables, outcome definitions (including time horizons), modelling approaches, and final model parameters from original publications, and contacted authors for additional information where required. We excluded scores where the underlying model parameters were not publicly available, since we were unable to reconstruct them, along with models for which included predictors were not available in our dataset. The latter included models that require computed tomography imaging or arterial blood gas sampling, since these investigations were not routinely performed among unselected patients with COVID-19 at our centre.

Study population

Our study is reported in accordance with transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidance for external validation studies [13]. We included consecutive adults admitted to University College Hospital London with a final diagnosis of PCR-confirmed (including all sample types) or clinically diagnosed COVID-19, between 1st February and 30th April 2020. Since we sought to use data from the point of hospital admission to predict outcomes, we excluded patients transferred in from other hospitals, and those with hospital-acquired COVID-19 (defined as 1st PCR swab sent >5 days from date of hospital admission, as a proxy for the onset of clinical suspicion of SARS-CoV-2 infection). Clinical COVID-19 diagnoses were made on the basis of manual record review by an infectious disease specialist, using clinical features, laboratory results and radiological appearances, in the absence of an alternative diagnosis. During the study period, PCR testing was performed on the basis of clinical suspicion, and no SARS-CoV-2 serology investigations were routinely performed.

Data sources and variables of interest

Data were collected by direct extraction from electronic health records, complemented by manual curation. Variables of interest in the dataset included: demographics (age, gender, ethnicity), comorbidities (identified through manual record review), clinical observations, laboratory measurements, radiology reports, and clinical outcomes. Each chest radiograph was reported by a single radiologist, who was provided with a short summary of the indication for the investigation at the time of request, reflecting routine clinical conditions. Chest radiographs were classified using British Society of Thoracic Imaging criteria, and using a modified version of the Radiographic Assessment of Lung Edema (RALE) score [14, 15]. For each predictor, measurements were recorded as part of routine clinical care. Where serial measurements were available, we included the measurement taken closest to the time of presentation to hospital, with a maximum interval between presentation and measurement of 24 h.

Outcomes

For models that used ICU admission or death, or progression to “severe” COVID-19 or death, as composite endpoints, we used a composite “clinical deterioration” endpoint as the primary outcome. We defined clinical deterioration as initiation of ventilatory support (continuous positive airway pressure, non-invasive ventilation, high flow nasal cannula oxygen, invasive mechanical ventilation or extra-corporeal membrane oxygenation) or death, equivalent to World Health Organisation Clinical Progression Scale ≥6 [16]. This definition does not include standard oxygen therapy. We did not apply any temporal limits on (a) the minimum duration of respiratory support; or (b) the interval between presentation to hospital and the outcome. The rationale for this composite outcome is to make the endpoint more generalisable between centres, since hospital respiratory management algorithms may vary substantially. Defining the outcome based on level of support, as opposed to ward setting, also ensures that it is appropriate in the context of a pandemic, when treatments that would usually only be considered in an ICU setting may be administered in other environments due to resource constraints. Where models specified their intended time horizon in their original description, we used this timepoint in the primary analysis, in order to ensure unbiased assessment of model calibration. Where the intended time horizon was not specified, we assessed the model's ability to predict in-hospital deterioration or mortality, as appropriate. All deterioration and mortality events were included, regardless of their clinical aetiology.
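The composite endpoint above reduces to a simple rule over the recorded support levels. A minimal sketch, with hypothetical support-level codes (not the study's actual data dictionary):

```python
# Hypothetical support-level codes; the study's actual data dictionary may differ.
VENTILATORY_SUPPORT = {"CPAP", "NIV", "HFNO", "IMV", "ECMO"}

def clinical_deterioration(support_levels, died):
    """Composite endpoint (WHO Clinical Progression Scale >= 6): initiation
    of any ventilatory support, or death. Standard oxygen therapy ("O2")
    does not meet the endpoint."""
    return died or any(s in VENTILATORY_SUPPORT for s in support_levels)
```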

Participants were followed up clinically to the point of discharge from hospital. We extended follow-up beyond discharge by cross-checking NHS Spine records to identify reported deaths post-discharge, thus ensuring >30 days’ follow-up for all participants.

Statistical analyses

For each prognostic model included in the analyses, we reconstructed the model according to the authors’ original descriptions, and sought to evaluate the model's discrimination and calibration performance against our approximation of their original intended endpoint. For models that provide online risk calculator tools, we validated our reconstructed models against the original authors’ models, by cross-checking our predictions against those generated by the web-based tools for a random subset of participants.

For all models, we assessed discrimination by quantifying the area under the receiver operating characteristic curve (AUROC) [17]. For models that provided outcome probability scores, we assessed calibration by visualising calibration of predicted versus observed risk using loess-smoothed plots, and by quantifying calibration slopes and calibration-in-the-large (CITL). A perfect calibration slope should be 1; slopes <1 indicate that risk estimates are too extreme, while slopes >1 reflect risk estimates not being extreme enough. Ideal CITL is 0; CITL>0 indicates that predictions are systematically too low, while CITL<0 indicates that predictions are too high. For models with points-based scores, we assessed calibration visually by plotting model scores versus actual outcome proportions. For models that provide probability estimates, but where the model intercept was not available, we calibrated the model to our dataset by calculating the intercept when using the model linear predictor as an offset term, leading to perfect CITL. This approach, by definition, overestimated calibration with respect to CITL, but allowed us to examine the calibration slope in our dataset.
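The slope and CITL definitions above can be written down directly: refit a logistic model of the observed outcome on the logit of the predicted risk (for the slope), or with that logit as a fixed offset (for CITL). The sketch below is an illustrative Python translation; the study's analyses were performed in R, and the function names here are ours, not the authors'.

```python
import numpy as np

def _logit(p):
    return np.log(p / (1 - p))

def _fit_logistic(X, y, offset=None, n_iter=25):
    """Newton-Raphson fit of a logistic model y ~ X (+ offset)."""
    if offset is None:
        offset = np.zeros(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        p = 1 / (1 + np.exp(-eta))
        W = p * (1 - p)
        grad = X.T @ (y - p)
        hess = (X * W[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def calibration_metrics(p_hat, y):
    """Calibration slope and calibration-in-the-large (CITL).

    Slope: coefficient of logit(p_hat) in a logistic recalibration model
    (1 is ideal; <1 means predictions are too extreme).
    CITL: intercept when logit(p_hat) enters as a fixed offset
    (0 is ideal; >0 means predictions are systematically too low)."""
    lp = _logit(np.clip(p_hat, 1e-6, 1 - 1e-6))
    X_slope = np.column_stack([np.ones_like(lp), lp])
    slope = _fit_logistic(X_slope, y)[1]
    citl = _fit_logistic(np.ones((len(y), 1)), y, offset=lp)[0]
    return slope, citl
```

On perfectly calibrated predictions this recovers a slope near 1 and CITL near 0; artificially over-extreme predictions pull the slope below 1, matching the interpretation in the text.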

We also assessed the discrimination of each candidate model for standardised outcomes of: (a) our composite endpoint of clinical deterioration; and (b) mortality, across a range of pre-specified time horizons from admission (7 days, 14 days, 30 days and any time during hospital admission), by calculating time-dependent AUROCs (with cumulative sensitivity and dynamic specificity) [18]. The rationale for this analysis was to harmonise endpoints, in order to facilitate more direct comparisons of discrimination between the candidate models.

In order to further benchmark the performance of candidate prognostic models, we then computed AUROCs for a limited number of univariable predictors considered to be of highest importance a priori, based on clinical knowledge and existing data, for prediction of our composite endpoints of clinical deterioration and mortality (7 days, 14 days, 30 days and any time during hospital admission). The a priori predictors of interest examined in this analysis were age, clinical frailty scale, oxygen saturation at presentation on room air, C-reactive protein and absolute lymphocyte count [8, 19].
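For a single continuous predictor, the AUROC is simply the probability that a randomly chosen case outranks a randomly chosen non-case (the Mann-Whitney statistic). A minimal Python sketch of this computation (illustrative; the analyses themselves were run in R):

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    probability that a randomly chosen case (label 1) has a higher score
    than a randomly chosen non-case (label 0), counting ties as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For predictors where lower values indicate higher risk, such as oxygen saturation, the score is negated before computing the statistic.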

Decision curve analysis allows assessment of the clinical utility of candidate models, and is dependent on both model discrimination and calibration [20]. We performed decision curve analyses to quantify the net benefit achieved by each model for predicting the intended endpoint, in order to inform clinical decision making across a range of risk:benefit ratios for an intervention or “treatment” [20]. In this approach, the risk:benefit ratio is analogous to the cut point for a statistical model above which the intervention would be considered beneficial (deemed the “threshold probability”). Net benefit was calculated as sensitivity×prevalence – (1–specificity)×(1–prevalence)×w, where w is the odds at the threshold probability and the prevalence is the proportion of patients who experienced the outcome [20]. We calculated net benefit across a range of clinically relevant threshold probabilities, ranging from 0 to 0.5, since the risk:benefit ratio may vary for any given intervention (or “treatment”). We compared the utility of each candidate model against strategies of treating all and no patients, and against the best performing univariable predictor for in-hospital clinical deterioration, or mortality, as appropriate. To ensure that fair, head-to-head net benefit comparisons were made between multivariable probability-based models, points score models and univariable predictors, we calibrated each of these to the validation dataset for the purpose of decision curve analysis. Probability-based models were recalibrated to the validation data by refitting logistic regression models with the candidate model linear predictor as the sole predictor. We calculated “delta” net benefit as net benefit when using the index model minus net benefit when: (a) treating all patients; and (b) using the most discriminating univariable predictor. Decision curve analyses were done using the rmda package in R [21].
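The net benefit formula above reduces, after multiplying through, to TP/n − (FP/n)×w. A minimal Python sketch (illustrative only; the study used the rmda package in R):

```python
def net_benefit(p_hat, y, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold.

    Equivalent to the formula in the text,
    sensitivity*prevalence - (1 - specificity)*(1 - prevalence)*w,
    since sensitivity*prevalence = TP/n and
    (1 - specificity)*(1 - prevalence) = FP/n; w is the odds at the threshold."""
    n = len(y)
    treated = [p >= threshold for p in p_hat]
    tp = sum(1 for t, outcome in zip(treated, y) if t and outcome == 1)
    fp = sum(1 for t, outcome in zip(treated, y) if t and outcome == 0)
    w = threshold / (1 - threshold)
    return tp / n - (fp / n) * w
```

Treating all patients corresponds to predicted risks of 1 for everyone, giving prevalence − (1 − prevalence)×w; "delta" net benefit is then the difference between two such curves at each threshold.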

We handled missing data using multiple imputation by chained equations [22], using the mice package in R [23]. All variables and outcomes in the final prognostic models were included in the imputation model to ensure compatibility [22]. A total of 10 imputed datasets were generated; discrimination, calibration and net benefit metrics were pooled using Rubin's rules [24].
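Rubin's rules for a scalar parameter are short enough to state in code: the pooled estimate is the mean across imputations, and the total variance adds the between-imputation spread to the average within-imputation variance. An illustrative Python sketch (the analysis itself used the mice package in R):

```python
from statistics import mean, variance

def pool_rubin(estimates, within_variances):
    """Pool a scalar parameter across m imputed datasets with Rubin's rules.

    Pooled estimate: mean of the per-imputation estimates.
    Total variance: mean within-imputation variance plus
    (1 + 1/m) times the between-imputation (sample) variance."""
    m = len(estimates)
    qbar = mean(estimates)
    wbar = mean(within_variances)
    between = variance(estimates)  # sample variance across imputations
    total = wbar + (1 + 1 / m) * between
    return qbar, total
```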

All analyses were conducted in R (version 3.5.1).

Sensitivity analyses

We recalculated discrimination and calibration parameters for each candidate model using (a) a complete case analysis (in view of the large amount of missingness for some models); (b) excluding patients without PCR-confirmed SARS-CoV-2 infection; and (c) excluding patients who met the clinical deterioration outcome within 4 h of arrival to hospital. We also examined for non-linearity in the a priori univariable predictors using restricted cubic splines, with 3 knots. Finally, we estimated optimism for discrimination and calibration parameters for the a priori univariable predictors using bootstrapping (1000 iterations), using the rms package in R [25].
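Bootstrap optimism is estimated by refitting in each bootstrap sample and comparing apparent performance (evaluated on the bootstrap sample itself) with test performance (evaluated on the original data). A generic Python sketch of this Harrell-style procedure, with caller-supplied `fit` and `metric` placeholders (illustrative; the study used the rms package in R):

```python
import random

def bootstrap_optimism(x, y, metric, fit, n_boot=200, seed=1):
    """Harrell-style optimism: average (apparent - test) performance over
    bootstrap refits. `fit(x, y)` returns a scoring function and
    `metric(scores, y)` returns a performance measure (e.g. AUROC).
    Illustrative only; not the rms implementation."""
    rng = random.Random(seed)
    n = len(y)
    total = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        model = fit(xb, yb)                            # refit on bootstrap sample
        apparent = metric([model(v) for v in xb], yb)  # on the same sample
        test = metric([model(v) for v in x], y)        # on the original data
        total += apparent - test
    return total / n_boot
```

A model with no fitting step (e.g. a raw univariable predictor) shows essentially zero optimism, as reported for the univariable models here, whereas a model that overfits its training sample shows optimism well above zero.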

Ethical approval

The pre-specified study protocol was approved by East Midlands - Nottingham 2 Research Ethics Committee (REF: 20/EM/0114; IRAS: 282900).

Results

Summary of candidate prognostic models

We identified a total of 37 studies describing prognostic models, of which 19 studies (including 22 unique models) were eligible for inclusion (Supplementary Figure 1 and Table 1). Of these, 5 models were not specific to COVID-19, but were developed as prognostic scores for emergency department attendees [26], hospitalised patients [12, 27], people with suspected infection [10] or community-acquired pneumonia [11], respectively. Of the 17 models developed specifically for COVID-19, most (10/17) were developed using datasets originating in China. Overall, discovery populations included hospitalised patients and were similar to the current validation population, with the exception of one study that discovered a model using community data [28], and another that used simulated data [29]. A total of 13/22 models use points-based scoring systems to derive final model scores, with the remainder using logistic regression modelling approaches to derive probability estimates. A total of 12/22 prognostic models primarily aimed to predict clinical deterioration, while the remaining 10 sought to predict mortality alone. When specified, time horizons for prognosis ranged from 1 to 30 days. Candidate prognostic models not included in the current validation study are summarised in Supplementary Table 1.

FIGURE 1

Calibration plots for prognostic models estimating outcome probabilities. For each plot, the blue line represents a Loess-smoothed calibration curve from the stacked multiply imputed datasets and rug plots indicate the distribution of data points. No model intercept was available for the Caramelo or Colombi “clinical” models; the intercepts for these models were calibrated to the validation dataset, by using the model linear predictors as offset terms. The primary outcome of interest for each model is shown in the plot sub-heading.

TABLE 1

Characteristics of studies describing prognostic models included in systematic evaluation

Overview of study cohort

During the study period, 521 adults were admitted with a final diagnosis of COVID-19, of whom 411 met the eligibility criteria for inclusion (flowchart shown in Supplementary Figure 2). Median age of the cohort was 66 years (interquartile range (IQR) 53–79), and the majority were male (252/411; 61.3%). Table 2 shows the baseline demographics, comorbidities, laboratory results and clinical measurements of the study cohort, of whom most (370/411; 90.0%) had PCR-confirmed SARS-CoV-2 infection (315/370 (85.1%) were positive on their first PCR test). A total of 180 (43.8%) and 115 (28.0%) of participants met the endpoints of clinical deterioration and mortality, respectively, above the minimum requirement of 100 events recommended for external validation studies [30]. The risks of clinical deterioration and death declined with time since admission (median days to deterioration 1.4 (IQR 0.3–4.2); median days to death 6.6 (IQR 3.6–13.1); Supplementary Figure 3). Most variables required for calculation of the 22 prognostic model scores were available among the vast majority of participants. However, admission lactate dehydrogenase was only available for 183/411 (44.5%) and D-dimer measured for 153/411 (37.2%), resulting in significant missingness for models requiring these variables (Supplementary Figure 4).

FIGURE 2

Decision curve analysis showing delta net benefit of each candidate model, compared to treating all patients and best univariable predictors. For each analysis, the endpoint is the original intended outcome and time horizon for the index model. Each candidate model and univariable predictor was calibrated to the validation data during analysis to enable fair, head-to-head comparisons. Delta net benefit is calculated as net benefit when using the index model minus net benefit when: (1) treating all patients; and (2) using the most discriminating univariable predictor. The most discriminating univariable predictor is admission oxygen saturation (SpO2) on room air for deterioration models and patient age for mortality models. Delta net benefit is shown with Loess-smoothing. Black dashed line indicates threshold above which index model has greater net benefit than the comparator. Individual decision curves for each candidate model are shown in Supplementary Figure 8.

TABLE 2

Baseline characteristics of hospitalised adults with COVID-19 included in systematic evaluation cohort

Evaluation of prognostic models for original primary outcomes

Table 3 shows discrimination and calibration metrics, where appropriate, for the 22 evaluated prognostic models in the primary multiple imputation analysis. The highest AUROCs were achieved by the NEWS2 score for prediction of deterioration over 24 h (0.78; 95% CI 0.73–0.83), and the Carr “final” model for prediction of deterioration over 14 days (0.78; 95% CI 0.74–0.82). Of the other prognostic scores currently used in routine clinical practice, CURB65 had an AUROC of 0.75 for 30-day mortality (95% CI 0.70–0.80), while qSOFA discriminated in-hospital mortality with an AUROC of 0.60 (95% CI 0.55–0.65).

TABLE 3

Validation metrics of prognostic scores for COVID-19, using primary multiple imputation analysis (n=411)

For all models that provide probability scores for either deterioration or mortality, calibration appeared visually poor with evidence of overfitting and either systematic overestimation or underestimation of risk (fig. 1). Supplementary Figure 5 shows associations between prognostic models with points-based scores and actual risk. In addition to demonstrating reasonable discrimination, the NEWS2 and CURB65 models demonstrated approximately linear associations between scores and actual probability of deterioration at 24 h and mortality at 30 days, respectively.

Time-dependent discrimination of candidate models and a priori univariable predictors for standardised outcomes

Next, we sought to compare the discrimination of these models for both clinical deterioration and mortality across the range of time horizons, benchmarked against preselected univariable predictors associated with adverse outcomes in COVID-19 [8, 19]. We recalculated time-dependent AUROCs for each of these outcomes, stratified by time horizon to the outcome (Supplementary Figures 6 and 7). These analyses showed that AUROCs generally declined with increasing time horizons. Admission oxygen saturation on room air was the strongest predictor of in-hospital deterioration (AUROC 0.76; 95% CI 0.71–0.81), while age was the strongest predictor of in-hospital mortality (AUROC 0.76; 95% CI 0.71–0.81).

Decision curve analyses to assess clinical utility

We compared net benefit for each prognostic model (for its original intended endpoint) to the strategies of treating all patients, treating no patients, and using the most discriminating univariable predictor for either deterioration (i.e. oxygen saturation on air) or mortality (i.e. patient age) to stratify treatment (Supplementary Figure 8). Although all prognostic models showed greater net benefit than treating all patients at the higher range of threshold probabilities, none of these models demonstrated consistently greater net benefit than the most discriminating univariable predictor, across the range of threshold probabilities (fig. 2).

Sensitivity analyses

Recalculation of model discrimination and calibration metrics for prediction of the original intended endpoint using (a) a complete case analysis; (b) excluding patients without PCR-confirmed SARS-CoV-2 infection; and (c) excluding patients who met the clinical deterioration outcome within 4 h of arrival to hospital revealed similar results to the primary multiple imputation approach, though discrimination was noted to be lower overall when excluding early events (Supplementary Tables 2a-c). Visual examination of associations between the most discriminating univariable predictors and log odds of deterioration or death using restricted cubic splines showed no evidence of non-linear associations (Supplementary Figure 9). Finally, internal validation using bootstrapping showed near zero optimism for discrimination and calibration parameters for the univariable models (Supplementary Table).

Discussion

In this observational cohort study of consecutive adults hospitalised with COVID-19, we systematically evaluated the performance of 22 prognostic models for COVID-19. These included models developed specifically for COVID-19, along with existing scores in routine clinical use prior to the pandemic. For prediction of clinical deterioration and mortality, AUROCs ranged from 0.56 to 0.78. NEWS2 performed reasonably well for prediction of deterioration over a 24-h interval, achieving an AUROC of 0.78, while the Carr “final” model [31] also had an AUROC of 0.78, but tended to systematically underestimate risk. All COVID-19-specific models that derived an outcome probability of either deterioration or mortality showed poor calibration. We found that oxygen saturation (AUROC 0.76) and patient age (AUROC 0.76) were the most discriminating single variables for prediction of in-hospital deterioration and mortality, respectively. These predictors have the added advantage that they are immediately available at the point of presentation to hospital. In decision curve analysis, which is dependent upon both model discrimination and calibration, no prognostic model demonstrated clinical utility consistently greater than using these univariable predictors to inform decision-making.

While previous studies have largely focused on novel model discovery, or evaluation of a limited number of existing models, this is the first study to our knowledge to evaluate systematically-identified candidate prognostic models for COVID-19. We used a comprehensive living systematic review [8] to identify eligible models and sought to reconstruct each model as per the original authors’ description. We then evaluated performance against its intended outcome and time horizon, wherever possible, using recommended methods of external validation incorporating assessments of discrimination, calibration and net benefit [17]. Moreover, we used a robust approach of electronic health record data capture, supported by manual curation, in order to ensure a high-quality dataset, and inclusion of unselected and consecutive COVID-19 cases that met our eligibility criteria. In addition, we used robust outcome measures of mortality and clinical deterioration, aligning with the WHO Clinical Progression Scale [16].

A weakness of the current study is that it is based on retrospective data from a single centre, and therefore cannot assess between-setting heterogeneity in model performance. Second, due to the limitations of routinely collected data, predictor variables were available for varying numbers of participants for each model, with a large proportion of missingness for models requiring lactate dehydrogenase and D-dimer measurements. We therefore performed multiple imputation, in keeping with recommendations for development and validation of multivariable prediction models, in our primary analyses [32]. Findings were similar in the complete case sensitivity analysis, thus supporting the robustness of our results. Future studies would benefit from standardising data capture and laboratory measurements prospectively to minimise predictor missingness. Third, a number of models could not be reconstructed in our data. For some models, this was due to the absence of predictors in our dataset, such as those requiring computed tomography imaging, since this is not currently routinely recommended for patients with suspected or confirmed COVID-19 [15]. We were also not able to include models for which the parameters were not publicly available. This underscores the need for strict adherence to reporting standards in multivariable prediction models [13]. Finally, we used admission data only as predictors in this study, since most prognostic scores are intended to predict outcomes at the point of hospital admission. We note, however, that some scores are designed for dynamic in-patient monitoring, with NEWS2 showing reasonable discrimination for deterioration over a 24-h interval, as originally intended [27]. Future studies may integrate serial data to examine model performance when using such dynamic measurements.

Despite the vast global interest in the pursuit of prognostic models for COVID-19, our findings show that none of the COVID-19-specific models evaluated in this study can currently be recommended for routine clinical use. In addition, while some of the evaluated models that are not specific to COVID-19 are routinely used and may be of value among in-patients [12, 27], people with suspected infection [10] or community-acquired pneumonia [11], none showed greater clinical utility than the strongest univariable predictors among patients with COVID-19. Our data show that admission oxygen saturation on air is a strong predictor of clinical deterioration and may be evaluated in future studies to stratify in-patient management and for remote community monitoring. We note that all novel prognostic models for COVID-19 assessed in the current study were derived from single-centre data. Future studies may seek to pool data from multiple centres in order to robustly evaluate the performance of existing and newly emerging models across heterogeneous populations, and develop and validate novel prognostic models, through individual participant data meta-analysis [33]. Such an approach would allow assessments of between-study heterogeneity and the likely generalisability of candidate models. It is also imperative that discovery populations are representative of target populations for model implementation, with inclusion of unselected cohorts. Moreover, we strongly advocate for transparent reporting in keeping with TRIPOD standards (including modelling approaches, all coefficients and standard errors) along with standardisation of outcomes and time horizons, in order to facilitate ongoing systematic evaluations of model performance and clinical utility [13].
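Clinical utility was compared using net benefit, which weighs true positives against false positives at a chosen risk threshold p_t: NB = TP/N − FP/N × p_t/(1 − p_t). A minimal illustrative sketch of this formula (hypothetical data; the study's decision-curve analyses used the rmda package in R):

```python
def net_benefit(y_true, y_score, threshold):
    """Net benefit of treating all patients whose predicted risk meets the
    threshold: NB = TP/N - FP/N * (pt / (1 - pt)), as in decision curve
    analysis. Higher is better; 'treat none' has NB = 0."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * (threshold / (1 - threshold))

# Hypothetical cohort of 4 patients at a 50% risk threshold
nb = net_benefit([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7], 0.5)  # 0.25
```

Because the threshold encodes the harm-to-benefit trade-off a clinician accepts, a model is only clinically useful if its net benefit exceeds both "treat all" and "treat none" strategies across plausible thresholds.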

We conclude that baseline oxygen saturation on room air and patient age are strong predictors of deterioration and mortality, respectively. When using admission data, none of the prognostic models evaluated in this study offers incremental value for patient stratification over these univariable predictors. Therefore, none of the evaluated prognostic models for COVID-19 can be recommended for routine clinical implementation. Future studies seeking to develop prognostic models for COVID-19 should consider integrating multi-centre data to increase the generalisability of findings, and should benchmark against existing models and simpler univariable predictors.

Acknowledgements

The UCLH COVID-19 Reporting Group was comprised of the following individuals, who were involved in data curation as non-author contributors: Asia Ahmed, Ronan Astin, Malcolm Avari, Elkie Benhur, Anisha Bhagwanani, Timothy Bonnici, Sean Carlson, Jessica Carter, Sonya Crowe, Mark Duncan, Ferran Espuny-Pujol, James Fullerton, Marc George, Georgina Harridge, Ali Hosin, Rachel Hubbard, Adnan Hubraq, Prem Jareonsettasin, Zella King, Avi Korman, Sophie Kristina, Lawrence Langley, Jacques-Henri Meurgey, Henrietta Mills, Alfio Missaglia, Ankita Mondal, Samuel Moulding, Christina Pagel, Liyang Pan, Shivani Patel, Valeria Pintar, Jordan Poulos, Ruth Prendecki, Alexander Procter, Magali Taylor, David Thompson, Lucy Tiffen, Hannah Wright, Luke Wynne, Jason Yeung, Claudia Zeicu, Leilei Zhu

Footnotes

  • This article has supplementary material available from erj.ersjournals.com

  • Data sharing statement: The conditions of regulatory approvals for the present study preclude open access data sharing to minimise risk of patient identification through granular individual health record data. The authors will consider specific requests for data sharing as part of academic collaborations subject to ethical approval and data transfer agreements in accordance with GDPR regulations.

  • Support statement: The study was funded by National Institute for Health Research (DRF-2018-11-ST2-004 to RKG; NF-SI-0616-10037 to IA), the Wellcome Trust (207511/Z/17/Z to MN) and has been supported by the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre, in particular by the NIHR UCLH/UCL BRC Clinical and Research Informatics Unit. This paper presents independent research supported by the NIHR. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. The funder had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication. Wellcome Trust; DOI: http://dx.doi.org/10.13039/100004440; Grant: 207511/Z/17/Z; Research Trainees Coordinating Centre; DOI: http://dx.doi.org/10.13039/501100000659; Grant: DRF-2018-11-ST2-004, NF-SI-0616-10037.

  • Author contributions: RKG and MN conceived the study. RKG conducted the analysis and wrote the first draft of the manuscript. All other authors contributed towards data collection, study design and/or interpretation. All authors have critically appraised and approved the final manuscript prior to submission. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. Members of The UCLH COVID-19 Reporting Group contributed towards data curation and are non-author contributors/collaborators for this study.

  • Conflict of interest: Dr. Gupta has nothing to disclose.

  • Conflict of interest: Dr. Marks has nothing to disclose.

  • Conflict of interest: Dr. Samuels has nothing to disclose.

  • Conflict of interest: Dr. Luintel has nothing to disclose.

  • Conflict of interest: Dr. Rampling has nothing to disclose.

  • Conflict of interest: Dr. Chowdhury has nothing to disclose.

  • Conflict of interest: Dr. Quartagno has nothing to disclose.

  • Conflict of interest: Dr. Nair reports non-financial support from AIDENCE BV and grants from the NIHR UCL Biomedical Research Centre, outside the submitted work.

  • Conflict of interest: Dr. Lipman has nothing to disclose.

  • Conflict of interest: Dr. Abubakar has nothing to disclose.

  • Conflict of interest: Dr. van Smeden has nothing to disclose.

  • Conflict of interest: Dr. Wong has nothing to disclose.

  • Conflict of interest: Dr. Williams has nothing to disclose.

  • Conflict of interest: Dr. Noursadeghi reports grants from the Wellcome Trust and from the National Institute for Health Research Biomedical Research Centre at University College London Hospitals NHS Foundation Trust, during the conduct of the study.

  • Received September 14, 2020.
  • Accepted September 17, 2020.
  • Copyright ©ERS 2020
This version is distributed under the terms of the Creative Commons Attribution Licence 4.0 (http://creativecommons.org/licenses/by/4.0/).

References

  1. Richardson S, Hirsch JS, Narasimhan M, et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area. JAMA 2020; 323: 2052–2059. doi:10.1001/jama.2020.6775
  2. Docherty AB, Harrison EM, Green CA, et al. Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ 2020; 369: m1985. doi:10.1136/bmj.m1985
  3. Grasselli G, Pesenti A, Cecconi M. Critical care utilization for the COVID-19 outbreak in Lombardy, Italy. JAMA 2020; 323: 1545. doi:10.1001/jama.2020.4031
  4. Imperial College COVID-19 response team. Report 17 – Clinical characteristics and predictors of outcomes of hospitalised patients with COVID-19 in a London NHS Trust: a retrospective cohort study. 2020 [cited 2020 May 14]. www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-17-clinical/
  5. Li R, Rivers C, Tan Q, et al. The demand for inpatient and ICU beds for COVID-19 in the US: lessons from Chinese cities. medRxiv 2020. 2020.03.09.20033241
  6. Beigel JH, Tomashek KM, Dodd LE, et al. Remdesivir for the treatment of Covid-19 – preliminary report. N Engl J Med 2020. NEJMoa2007764
  7. Horby P, Lim WS, Emberson J, et al. Effect of dexamethasone in hospitalized patients with COVID-19: preliminary report. medRxiv 2020. 2020.06.22.20137273
  8. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020; 369: m1328. doi:10.1136/bmj.m1328
  9. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170: 51–58. doi:10.7326/M18-1376
  10. Seymour CW, Liu VX, Iwashyna TJ, et al. Assessment of clinical criteria for sepsis. JAMA 2016; 315: 762. doi:10.1001/jama.2016.0288
  11. Lim WS. Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study. Thorax 2003; 58: 377–382. doi:10.1136/thorax.58.5.377
  12. Royal College of Physicians. National Early Warning Score (NEWS) 2. [cited 2020 Jul 1]. www.rcplondon.ac.uk/projects/outputs/national-early-warning-score-news-2
  13. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015; 350: g7594. doi:10.1136/bmj.g7594
  14. Wong HYF, Lam HYS, Fong AH-T, et al. Frequency and distribution of chest radiographic findings in COVID-19 positive patients. Radiology 2020; 296: E72–E78. doi:10.1148/radiol.2020201160
  15. The British Society of Thoracic Imaging. COVID-19 Resources. [cited 2020 Jul 1]. www.bsti.org.uk/covid-19-resources/
  16. WHO Working Group on the Clinical Characterisation and Management of COVID-19 infection, Marshall JC, Murthy S, et al. A minimal common outcome measure set for COVID-19 clinical research. Lancet Infect Dis 2020; 20: e192–e197. doi:10.1016/S1473-3099(20)30483-7
  17. Riley RD, van der Windt D, Croft P, et al. Prognosis Research in Healthcare: Concepts, Methods, and Impact. Oxford, Oxford University Press, 2019.
  18. Kamarudin AN, Cox T, Kolamunnage-Dona R. Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Med Res Methodol 2017; 17: 53. doi:10.1186/s12874-017-0332-6
  19. Hewitt J, Carter B, Vilches-Moraga A, et al. The effect of frailty on survival in patients with COVID-19 (COPE): a multicentre, European, observational cohort study. Lancet Public Health 2020; 5: e444–e451.
  20. Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res 2019; 3: 18. doi:10.1186/s41512-019-0064-7
  21. Brown M. rmda: Risk Model Decision Analysis. 2018.
  22. White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med 2011; 30: 377–399. doi:10.1002/sim.4067
  23. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw 2011; 45: 1–67.
  24. Rubin DB. Multiple Imputation for Nonresponse in Surveys. Wiley-Interscience, 2004.
  25. Harrell FE Jr. rms: Regression Modeling Strategies. 2019.
  26. Olsson T, Terent A, Lind L. Rapid Emergency Medicine Score: a new prognostic tool for in-hospital mortality in nonsurgical emergency department patients. J Intern Med 2004; 255: 579–587. doi:10.1111/j.1365-2796.2004.01321.x
  27. Smith GB, Prytherch DR, Meredith P, et al. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation 2013; 84: 465–470. doi:10.1016/j.resuscitation.2012.12.016
  28. Bello-Chavolla OY, Bahena-López JP, Antonio-Villa NE, et al. Predicting mortality due to SARS-CoV-2: a mechanistic score relating obesity and diabetes to COVID-19 outcomes in Mexico. J Clin Endocrinol Metab 2020; 105: 2752–2761. doi:10.1210/clinem/dgaa346
  29. Caramelo F, Ferreira N, Oliveiros B. Estimation of risk factors for COVID-19 mortality – preliminary results. medRxiv 2020. 2020.02.24.20027268
  30. Collins GS, Ogundimu EO, Altman DG. Sample size considerations for the external validation of a multivariable prognostic model: a resampling study. Stat Med 2016; 35: 214–226. doi:10.1002/sim.6787
  31. Carr E, Bendayan R, Bean D, et al. Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study. medRxiv 2020.
  32. Moons KGM, Altman DG, Reitsma JB, et al. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162: W1–W73. doi:10.7326/M14-0698
  33. Debray TPA, Riley RD, Rovers MM, et al. Individual participant data (IPD) meta-analyses of diagnostic and prognostic modeling studies: guidance on their use. PLoS Med 2015; 12: e1001886. doi:10.1371/journal.pmed.1001886
  34. Subbe CP, Kruger M, Rutherford P, et al. Validation of a modified Early Warning Score in medical admissions. QJM 2001; 94: 521–526. doi:10.1093/qjmed/94.10.521
  35. Colombi D, Bodini FC, Petrini M, et al. Well-aerated lung on admitting chest CT to predict adverse outcome in COVID-19 pneumonia. Radiology 2020; 296: E86–E96.
  36. Galloway JB, Norton S, Barker RD, et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: an observational cohort study. J Infect 2020; 81: 282–288. doi:10.1016/j.jinf.2020.05.064
  37. Guo Y, Liu Y, Lu J, et al. Development and validation of an early warning score (EWAS) for predicting clinical deterioration in patients with coronavirus disease 2019. medRxiv 2020. 2020.04.17.20064691
  38. Cambridge Clinical Trials Unit. TACTIC trial. [cited 2020 Jul 1]. www.cctu.org.uk/portfolio/COVID-19/TACTIC
  39. Chen X, Liu Z. Early prediction of mortality risk among severe COVID-19 patients using machine learning. medRxiv 2020. 2020.04.13.20064329
  40. Huang H, Cai S, Li Y, et al. Prognostic factors for COVID-19 pneumonia progression to severe symptom based on the earlier clinical features: a retrospective analysis. medRxiv 2020. 2020.03.28.20045989
  41. Ji D, Zhang D, Xu J, et al. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL score. Clin Infect Dis 2020; 71: 1393–1399. doi:10.1093/cid/ciaa414
  42. Lu J, Hu S, Fan R, et al. ACP risk grade: a simple mortality index for patients with confirmed or suspected severe acute respiratory syndrome coronavirus 2 disease (COVID-19) during the early stage of outbreak in Wuhan, China. medRxiv 2020. 2020.02.20.20025510
  43. Shi Y, Yu X, Zhao H, et al. Host susceptibility to severe COVID-19 and establishment of a host risk score: findings of 487 cases outside Wuhan. Crit Care 2020; 24: 108. doi:10.1186/s13054-020-2833-7
  44. Xie J, Hungerford D, Chen H, et al. Development and external validation of a prognostic multivariable model on admission for hospitalized patients with COVID-19. medRxiv 2020. 2020.03.28.20045997
  45. Yan L, Zhang H-T, Goncalves J, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell 2020; 2: 283–288. doi:10.1038/s42256-020-0180-7
  46. Zhang H, Shi T, Wu X, et al. Risk prediction for poor outcome and death in hospital in-patients with COVID-19: derivation in Wuhan, China and external validation in London, UK. medRxiv 2020. 2020.04.28.20082222
  47. Hu H, Yao N, Qiu Y. Comparing rapid scoring systems in mortality prediction of critically ill patients with novel coronavirus disease. Acad Emerg Med 2020; 27: 461–468. doi:10.1111/acem.13992