Comparison of methods for tuning machine learning model hyper-parameters: with application to predicting high-need high-cost health care users

Background — Supervised machine learning is increasingly being used to estimate clinical predictive models. Many supervised machine learning models have hyper-parameters whose values must be judiciously specified to ensure adequate predictive performance.

Objective — To compare nine hyper-parameter optimization (HPO) methods for tuning the hyper-parameters of an extreme gradient boosting model, with application to predicting high-need high-cost health care users.

Methods — Extreme gradient boosting models were estimated using a randomly sampled training dataset. Models were separately trained using nine different HPO methods: 1) random sampling, 2) simulated annealing, 3) quasi-Monte Carlo sampling, 4-5) two variations of Bayesian hyper-parameter optimization via tree-Parzen estimation, 6-7) two implementations of Bayesian hyper-parameter optimization via Gaussian processes, 8) Bayesian hyper-parameter optimization via random forests, and 9) the covariance matrix adaptation evolution strategy (CMA-ES). For each HPO method, we estimated 100 extreme gradient boosting models at different hyper-parameter configurations and evaluated model performance using the area under the ROC curve (AUC) on a randomly sampled validation dataset. Using the best model identified by each HPO method, we evaluated generalization performance in terms of discrimination and calibration metrics on a randomly sampled held-out test dataset (internal validation) and a temporally independent dataset (external validation).
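The per-method tuning loop described above can be sketched in a few lines. The example below is a minimal illustration only, assuming Optuna as the HPO toolkit and XGBoost's Python API; the synthetic data, hyper-parameter names, and search ranges are illustrative assumptions, not the authors' exact setup.

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the study data; the class imbalance loosely
# mimics the rarity of high-need high-cost users (purely illustrative).
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.95], random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid)

def objective(trial):
    # Illustrative search space; the paper's exact ranges are not stated here.
    params = {
        "objective": "binary:logistic",
        "eta": trial.suggest_float("eta", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    booster = xgb.train(params, dtrain, num_boost_round=200)
    # Score each configuration by AUC on the validation split.
    return roc_auc_score(y_valid, booster.predict(dvalid))

# One study per HPO method; TPE is shown here. 100 trials = 100 configurations.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=100)
print(study.best_value, study.best_params)
```

Repeating the study with a different sampler (e.g., Optuna's RandomSampler, QMCSampler, or CmaEsSampler) would cover several of the nine methods compared, with the best configuration from each study carried forward to the held-out test data.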

Results — The extreme gradient boosting model estimated using default hyper-parameter settings had reasonable discrimination (AUC=0.82) but was not well calibrated. Hyper-parameter tuning with any of the HPO algorithms/samplers improved model discrimination (AUC=0.84), yielded models with near-perfect calibration, and consistently identified features predictive of high-need high-cost health care users.
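The internal-validation step, checking both discrimination and calibration on held-out test data, can be sketched as follows. This is an illustrative continuation of the tuning sketch above (it reuses `study`, `dtrain`, `X_test`, and `y_test` from that sketch); the paper's exact calibration metrics are not specified in this abstract.

```python
import xgboost as xgb
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss, roc_auc_score

# Refit the best configuration found above, then score the held-out test set.
best_params = dict(study.best_params, objective="binary:logistic")
booster = xgb.train(best_params, dtrain, num_boost_round=200)
p_test = booster.predict(xgb.DMatrix(X_test))  # predicted probabilities

print("AUC (discrimination):", roc_auc_score(y_test, p_test))
print("Brier score (calibration):", brier_score_loss(y_test, p_test))

# Reliability curve: a well-calibrated model has observed event rates
# close to mean predicted probabilities in every bin.
obs_rate, mean_pred = calibration_curve(y_test, p_test, n_bins=10)
for pred, obs in zip(mean_pred, obs_rate):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```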

Conclusions — In our study, all HPO algorithms resulted in similar gains in model performance relative to baseline models. This finding likely relates to our study dataset having a large sample size, a relatively small number of features, and a strong signal-to-noise ratio, and would likely apply to other datasets with similar characteristics.

Citation

Meaney C, Guan J, Wang X, Stukel T. Comparison of methods for tuning machine learning model hyper-parameters: with application to predicting high-need high-cost health care users. BMC Med Res Methodol. 2025;25(1):134. Epub 2025 May 15.
