Risk stratification for COVID-19 hospitalization: a multivariable model based on gradient-boosting decision trees

Background — The COVID-19 pandemic has increased demand for healthcare resources and, in some cases, caused shortages of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict the risk of hospitalization for patients infected with SARS-CoV-2.

Methods — We used routinely collected health records to develop and validate our prediction model. The cohort included adult patients (age ≥ 18 yr) in Ontario, Canada, who tested positive for SARS-CoV-2 ribonucleic acid by polymerase chain reaction between Feb. 2 and Oct. 5, 2020, and who were followed up through Nov. 5, 2020. Patients living in long-term care facilities were excluded because they were all assumed to be at high risk of hospitalization for COVID-19. We estimated the risk of hospitalization within 30 days of diagnosis of SARS-CoV-2 infection with gradient-boosting decision trees, built using the Extreme Gradient Boosting (XGBoost) algorithm, and examined variable importance with Shapley values. We compared the model's performance against 4 empirical rules commonly used for risk stratification based on age and number of comorbidities.
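Shapley values attribute a single prediction to the input features by averaging each feature's marginal contribution over every possible subset of the other features. Tree ensembles such as XGBoost are commonly explained with the faster TreeSHAP algorithm; the sketch below instead enumerates every subset exactly, which is feasible only for a few features but makes the definition explicit. It assumes that an "absent" feature is set to a single baseline value. The functions `shapley_values` and `toy_log_odds`, the three features, their coefficients and the baseline profile are all hypothetical; this is not the study's model or its Shapley analysis.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; 'absent' features take their baseline value.

    Enumerates every coalition of the other features, so the cost grows as 2**n
    and this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Features in the coalition keep their observed value; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, never containing feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Weighted marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_log_odds(z: Sequence[float]) -> float:
    """Hypothetical log-odds of hospitalization; coefficients are illustrative only."""
    age, n_comorbidities, male = z
    return (-5.0 + 0.04 * age + 0.3 * n_comorbidities + 0.2 * male
            + 0.01 * n_comorbidities * (age > 60))  # hypothetical age-comorbidity interaction


patient = [70, 4, 1]    # age, comorbidity count, male (hypothetical patient)
reference = [43, 1, 0]  # hypothetical baseline profile
phi = shapley_values(toy_log_odds, patient, reference)
print([round(v, 3) for v in phi])  # [1.105, 0.915, 0.2]
```

The three attributions sum to the difference between the patient's and the baseline's log-odds (2.22). Because the interaction term involves both age and comorbidity count, its contribution is split between those two features, so their attributions are not simply coefficient times difference, whereas sex, which enters additively, receives exactly its own term (0.2).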

Results — The cohort included 36 323 patients, of whom 2583 (7.1%) were hospitalized. Compared with nonhospitalized patients, hospitalized patients were older (median age 64 yr v. 43 yr), were more likely to be male (56.3% v. 47.3%) and had more comorbidities (median 3, interquartile range [IQR] 2–6 v. 1, IQR 0–3). Patients were split into development (n = 29 058, 80.0%) and held-out validation (n = 7265, 20.0%) cohorts. The gradient-boosting model achieved high discrimination (area under the receiver operating characteristic curve: 0.852 across the 5 cross-validation folds in the development cohort and 0.8475 in the validation cohort) and strong calibration (slope = 1.01, intercept = −0.01). Patients in the top 10% of predicted risk accounted for 47.4% of hospitalizations, and those in the top 30% accounted for 80.6%.
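The results rest on three kinds of metric: discrimination (area under the ROC curve), calibration (slope and intercept) and the share of hospitalizations captured by the highest-risk fraction of patients. The pure-Python sketch below shows one way to compute each from predicted risks and observed outcomes. It assumes that the calibration slope and intercept come from a logistic recalibration, that is, regressing the observed outcome on the logit of the predicted risk; this is a common definition, but the abstract does not state which one the authors used (some report the intercept with the slope fixed at 1 instead). A perfectly calibrated model gives a slope of 1 and an intercept of 0. The function names and toy data are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def auc(scores: Sequence[float], y: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def top_fraction_capture(scores: Sequence[float], y: Sequence[int], frac: float) -> float:
    """Share of all positive outcomes found among the top `frac` of patients by score."""
    n_top = max(1, round(frac * len(scores)))  # round() absorbs float error such as 0.3 * 10
    ranked = sorted(zip(scores, y), key=lambda pair: pair[0], reverse=True)
    return sum(t for _, t in ranked[:n_top]) / sum(y)


def calibration_slope_intercept(p: Sequence[float], y: Sequence[int],
                                iters: int = 100) -> tuple[float, float]:
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; return (slope b, intercept a)."""
    xs = [math.log(pi / (1.0 - pi)) for pi in p]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, y):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g0 += t - mu        # score (gradient) for the intercept
            g1 += (t - mu) * x  # score (gradient) for the slope
            h00 += w            # entries of the 2x2 Fisher information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return b, a


# Hypothetical toy data for illustration only (not study data).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(auc(scores, outcomes))                        # 17 of 21 positive-negative pairs ranked correctly
print(top_fraction_capture(scores, outcomes, 0.1))  # top 1 patient holds 1 of 3 events
print(top_fraction_capture(scores, outcomes, 0.3))  # top 3 patients hold 2 of 3 events

# Predictions that match the observed event rate within each group, then an
# overconfident version of the same predictions (log-odds doubled).
p_cal = [0.2] * 5 + [0.5] * 2 + [0.75] * 4
y_cal = [1, 0, 0, 0, 0] + [1, 0] + [1, 1, 1, 0]
p_over = [1 / (1 + math.exp(-2 * math.log(q / (1 - q)))) for q in p_cal]
print(calibration_slope_intercept(p_cal, y_cal))   # slope ~1, intercept ~0
print(calibration_slope_intercept(p_over, y_cal))  # slope ~0.5, intercept ~0
```

The overconfident predictions yield a slope below 1, the usual sign that predicted risks are too extreme. For a cohort the size of this one, the pairwise AUC loop would be slow; a rank-based formula or a library routine is the practical choice there.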

Interpretation — We developed and validated an accurate risk stratification model using routinely collected health administrative data. We envision that risk stratification models built on routinely collected health data could support population-level management of COVID-19.

Citation

Gutierrez JM, Volkovs M, Poutanen T, Watson T, Rosella LC. CMAJ Open. 2021; 9(4):E1223-31. Epub 2021 Dec 21.

Discover More

Journal Article

25/04/2024

Multifetal pregnancy after implementation of a publicly funded fertility program

Velez MP, Soule A, Gaudet L, Pudwell J, Nguyen P, Ray JG. JAMA Netw Open. 2024; 7(4):e248496. Epub 2024 Apr 25.

Journal Article

25/04/2024

Proportion of life spent in Canada and the incidence of multiple sclerosis in permanent immigrants

Vyas MV, Kapral MK, Rea A, Fang J, Rotstein DL. Neurology. 2024; 102(10):e209350. Epub 2024 Apr 24.

Journal Article

16/04/2024

Association of blood mitochondrial DNA copy number with risk of acute kidney injury after cardiac surgery

Jotwani V, Thiessen-Philbrook H, rking DE, Yang SY, McArthur E, Garg AX, Katz R, Tranah GJ, Ix JH, Cummings S, Waikar SS, Sarnak MJ, Shlipak MG, Parikh SM, Parikh CR. Am J Kidney Dis. 2024; Apr 16 [Epub ahead of print].
