Purpose — Routinely collected administrative data is widely used for population-based research. However, although clinically very different, atrial septal defects (ASD) and patent foramen ovale (PFO) share a single diagnostic code (ICD-9: 745.5, ICD-10: Q21.1). Using machine-learning based approaches, we developed and validated an algorithm to differentiate between PFO and ASD patient populations within healthcare administrative data.
Methods — Using data housed at ICES, we identified patients who underwent transcatheter closure in Ontario between October 2002 and December 2017 using a Canadian Classification of Interventions code (1HN80GPFL, N = 4680). A novel random forest model was developed using demographic and clinical information to differentiate those who underwent transcatheter closure for PFO or ASD. Those patients who had undergone transcatheter closure and had records in the CorHealth Ontario cardiac procedure registry (N = 1482) were used as the reference standard. Several algorithms were tested and evaluated for accuracy, sensitivity, and specificity. Variable importance was examined via mean decrease in Gini index.
Results — We tested 7 models in total. The final model included 24 variables, including demographic, comorbidity, and procedural information. After hyperparameter tuning, the final model achieved 0.76 accuracy, 0.76 sensitivity, and 0.75 specificity. Patient age group had the greatest influence on node impurity, and thus ranked highest in variable importance.
Conclusions — Our random forest classification method achieved reasonable accuracy in identifying PFO and ASD closure in administrative data. The algorithm can now be applied to evaluate long term PFO and ASD closure outcomes in Ontario, pending future external validation studies to further test the algorithm.
View full text