Comparison of machine learning and the regression-based EHMRG model for predicting early mortality in acute heart failure


Background — Although risk stratification of patients with acute decompensated heart failure (HF) is important, it is unknown whether machine learning (ML) or conventional statistical models are optimal. We developed ML algorithms to predict 7-day and 30-day mortality in patients with acute HF and compared these with an existing logistic regression model at the same timepoints.

Methods — Patients presenting to one of 86 hospitals, who were either admitted to hospital or discharged home directly from the emergency department, were selected using stratified random sampling. ML approaches, including neural networks, random forest, XGBoost, and lasso regression, were compared with a validated logistic regression model for discrimination and calibration.
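As an illustration of the sampling step, the following is a minimal sketch of proportional stratified random sampling. The abstract does not state which strata were used, so the `hospital` field, the sampling fraction, and the helper name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum.

    records    : list of patient records (any objects)
    stratum_of : function mapping a record to its stratum key (hypothetical choice)
    fraction   : share of each stratum to sample, between 0 and 1
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for key in sorted(strata):  # sorted for a deterministic order
        members = strata[key]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))       # sampling without replacement
    return sample

# Toy data: 60 patients at hospital "A" and 40 at hospital "B" (made-up values).
patients = [{"id": i, "hospital": "A" if i < 60 else "B"} for i in range(100)]
chosen = stratified_sample(patients, lambda r: r["hospital"], fraction=0.1)
```

Sampling within each stratum keeps the strata represented in the sample in the same proportions as in the source population, which simple random sampling only achieves on average.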

Results — Among 12,608 patients in our analysis, lasso regression (c-statistic 0.774; 95% CI, 0.743, 0.806) performed better than other ML models for 7-day mortality but did not outperform the baseline logistic regression model (c-statistic 0.794; 95% CI, 0.789, 0.800). For 30-day mortality, XGBoost performed better than other ML models (c-statistic 0.759; 95% CI, 0.740, 0.779), but was not significantly better than logistic regression (c-statistic 0.755; 95% CI, 0.750, 0.762). Logistic regression demonstrated better calibration at 7 days (calibration-in-the-large 0.017; 95% CI, −0.657, 0.692, and calibration slope 0.954; 95% CI, 0.769, 1.139) and at 30 days (calibration-in-the-large −0.026; 95% CI, −0.374, 0.322, and calibration slope 0.964; 95% CI, 0.831, 1.098), and had the best Brier scores, compared with the ML approaches.
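The metrics reported above can be illustrated with a short sketch. It uses the common definitions: the c-statistic as the share of event/non-event pairs ranked correctly, the Brier score as the mean squared error of the predicted risks, the calibration slope as the coefficient from a logistic regression of the outcome on the logit of the predicted risk, and calibration-in-the-large as the intercept when that slope is fixed at 1. These definitions, the function names, and the toy data are assumptions for illustration; the abstract does not describe the authors' exact estimation procedure.

```python
import math

def c_statistic(y, p):
    """Share of (event, non-event) pairs where the event has the higher risk; ties count 0.5."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))

def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def _logit(p):
    return math.log(p / (1 - p))

def _sigmoid(z):
    return 1 / (1 + math.exp(-z))

def calibration_slope(y, p, iters=25):
    """Fit y ~ a + b * logit(p) by Newton-Raphson and return (a, b); b = 1 is ideal."""
    x = [_logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + b * xi) for xi in x]
        w = [m * (1 - m) for m in mu]
        g0 = sum(yi - m for yi, m in zip(y, mu))               # gradient w.r.t. a
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))  # gradient w.r.t. b
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det                        # Newton step
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def calibration_in_the_large(y, p, iters=25):
    """Intercept of y ~ a + offset(logit(p)), slope fixed at 1; a = 0 is ideal."""
    x = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + xi) for xi in x]
        a += sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
    return a

# Toy example (made-up data): predicted risks of 0.1, 0.5 and 0.9,
# with observed event rates of 0.2, 0.5 and 0.8 in groups of 10.
p_toy = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
y_toy = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
intercept, slope = calibration_slope(y_toy, p_toy)
```

In the toy example the predictions are more extreme than the observed rates, so the fitted slope falls below 1 (about 0.63) even though the predictions still rank patients reasonably well. This is the pattern that discrimination metrics alone cannot reveal.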

Conclusions — Logistic regression was comparable to ML in discrimination, but was superior to ML algorithms in calibration overall. ML algorithms for prognosis should routinely report calibration metrics in addition to discrimination.



Austin DE, Lee DS, Wang CK, Ma S, Wang X, Porter J, Wang B. Int J Cardiol. 2022; Jul 19 [Epub ahead of print].

