
Development and validation of an ensemble machine learning framework for detection of all-cause advanced hepatic fibrosis: a retrospective cohort study

Sarvestany SS, Kwong JC, Azhie A, Dong V, Cerocchi O, Ali AF, Karnam RS, Kuriry H, Shengir M, Candido E, Duchen R, Sebastiani G, Patel K, Goldenberg A, Bhat M. Lancet Digit Health. 2022; 4:e188–99. Epub 2022 Mar 1. DOI:

Background — Cirrhosis is the result of advanced scarring (or fibrosis) of the liver, and is often diagnosed only after decompensation with associated complications has occurred. Current non-invasive tests to detect advanced liver fibrosis have limited performance, with many indeterminate classifications. We aimed to identify patients with advanced liver fibrosis of all causes using machine learning algorithms (MLAs).

Methods — In this retrospective study of routinely collected laboratory, clinical, and demographic data, we trained six MLAs (support vector machine, random forest classifier, gradient boosting classifier, logistic regression, artificial neural network, and an ensemble of all these algorithms) to detect advanced fibrosis using 1703 liver biopsies from patients seen at the Toronto Liver Clinic (TLC) between Jan 1, 2000, and Dec 20, 2014. Performance was validated using five datasets derived from patient data provided by the TLC (n=104 patients with a biopsy sample taken between March 24, 2014, and Dec 31, 2017) and McGill University Health Centre (MUHC; n=404). Patients with decompensated cirrhosis were excluded. Performance was benchmarked against aspartate aminotransferase-to-platelet ratio index (APRI), fibrosis-4 index (FIB-4), non-alcoholic fatty liver disease fibrosis score (NFS), transient elastography, and an independent panel of five hepatology experts (MB, GS, HK, KP, and RSK). MLA performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and the percentage of determinate classifications.
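The two simplest benchmarks named above, APRI and FIB-4, can be sketched with their standard published formulas, together with the "percentage of determinate classifications" metric. This is a minimal illustration rather than the study's code: the cut-offs below are commonly cited values chosen here as assumptions (the abstract does not state which thresholds were used), and the function names and patient values are hypothetical.

```python
import math

# Assumed dual cut-offs (lower rules out, upper rules in). These are commonly
# cited values, NOT thresholds confirmed by the abstract.
APRI_CUTS = (0.5, 1.5)
FIB4_CUTS = (1.45, 3.25)


def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) / platelets (10^9/L) x 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100.0


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets (10^9/L) x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def classify(score, cuts):
    """Scores between the two cut-offs (inclusive) are indeterminate."""
    low, high = cuts
    if score < low:
        return "rule_out"
    if score > high:
        return "rule_in"
    return "indeterminate"


def percent_determinate(labels):
    """Share of patients who received a rule-in or rule-out classification."""
    determinate = sum(1 for label in labels if label != "indeterminate")
    return 100.0 * determinate / len(labels)


# Hypothetical patients: (age, AST U/L, ALT U/L, platelets 10^9/L), AST ULN = 40 U/L.
patients = [(50, 40, 40, 250), (60, 80, 64, 100), (55, 60, 49, 150), (45, 30, 36, 200)]
fib4_labels = [classify(fib4(age, ast, alt, plt), FIB4_CUTS) for age, ast, alt, plt in patients]
apri_labels = [classify(apri(ast, 40, plt), APRI_CUTS) for _, ast, _, plt in patients]
print(percent_determinate(fib4_labels), percent_determinate(apri_labels))
```

A score with a dual cut-off leaves every patient in the middle band unclassified, which is why the percentage of determinate classifications is reported alongside AUROC.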

Findings — The best MLA was an ensemble of the support vector machine, random forest classifier, gradient boosting classifier, logistic regression, and neural network algorithms. It achieved 100% determinate classifications (95% CI 100·0–100·0), an AUROC of 0·870 (95% CI 0·797–0·931) on the TLC validation set (fibrosis stages F0 and F1 vs F4), and an AUROC of 0·716 (95% CI 0·664–0·766) on the MUHC validation set (fibrosis stages F0, F1, and F2 vs F3 and F4). The ensemble MLA outperformed all routinely used biomarkers and achieved performance comparable to hepatologists, as measured by AUROC and percentage of indeterminate classifications, in both the TLC validation dataset (APRI AUROC 0·719 [95% CI 0·611–0·820], 83·7% determinate [95% CI 76·0–90·4]; FIB-4 AUROC 0·825 [95% CI 0·730–0·912], 72·1% determinate [95% CI 63·5–80·8]) and the MUHC validation dataset (APRI AUROC 0·618 [95% CI 0·548–0·691], 75·5% determinate [95% CI 71·5–79·2]; FIB-4 AUROC 0·717 [95% CI 0·652–0·776], 75·5% determinate [95% CI 71·3–79·7]). It achieved only slightly lower AUROC than transient elastography (0·773 [95% CI 0·699–0·834] vs 0·826 [95% CI 0·758–0·889]).
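For readers unfamiliar with how an AUROC and its 95% CI are obtained, the sketch below computes AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and attaches a percentile bootstrap interval. The abstract does not say how the reported intervals were derived, so the bootstrap here is an assumption, and the function names and toy data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (assumed method, not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples containing a single class, where AUROC is undefined.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = advanced fibrosis on biopsy, 0 = no advanced fibrosis.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
point = auroc(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score)
print(point, ci_low, ci_high)
```

The pairwise loop is quadratic in sample size, which is acceptable for validation sets of a few hundred patients such as those described here.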

Interpretation — We have shown that an ensemble MLA outperforms non-imaging-based methods in detecting advanced fibrosis across different causes of liver disease. Our MLA was superior to APRI, FIB-4, and NFS and produced no indeterminate classifications, while achieving performance comparable to an independent panel of experts. MLAs using routinely collected data could identify patients at high risk of advanced hepatic fibrosis and cirrhosis among patients with chronic liver disease, allowing intervention before the onset of decompensation.
