
Using the electronic medical record to identify patients at high risk for frequent ED visits and high system costs


Background — A small proportion of patients accounts for a very high proportion of healthcare utilization. Accurate pre-emptive identification may facilitate tailored intervention. We sought to determine whether machine learning techniques applied to text from a family practice Electronic Medical Record (EMR) can predict future high Emergency Department (ED) use and high total costs among patients who are not yet high ED users or high-cost users of the healthcare system.

Methods — Text from fields of the Cumulative Patient Profile within an EMR (PS Suite) of 43,111 patients was indexed. Separate training and validation cohorts were created. After processing, 11,905 words were used to fit a logistic regression model. The primary outcomes of interest in the 12 months following prediction were 1) 3 or more ED visits and 2) being in the top 5% of healthcare expenditures. Outcomes were assessed through linkage to administrative databases housed at the Institute for Clinical Evaluative Sciences (ICES).
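
As a rough illustration of the approach described in Methods, the sketch below indexes free text into binary word-presence features and fits a logistic regression by plain gradient descent. The sample notes, labels, tokenizer, learning rate, and all function names are hypothetical; this is not the study's code, and the actual preprocessing and fitting procedure may differ.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_vocabulary(documents):
    """Map each distinct word across all documents to a column index."""
    words = sorted(set().union(*(tokenize(d) for d in documents)))
    return {word: i for i, word in enumerate(words)}


def vectorize(text, vocab):
    """Binary bag-of-words vector: 1.0 where a vocabulary word occurs."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:  # words unseen during indexing are ignored
            vec[vocab[word]] = 1.0
    return vec


def predict_proba(weights, bias, x):
    """Logistic model: probability of the outcome given feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy profile text; label 1 stands for a later high-use outcome.
notes = [
    "copd chf diabetes frequent falls",
    "annual physical no concerns",
    "chf renal failure diabetes",
    "seasonal allergies",
]
labels = [1, 0, 1, 0]

vocab = build_vocabulary(notes)
X = [vectorize(note, vocab) for note in notes]
weights, bias = fit_logistic(X, labels)
```

A new profile would then be scored with `predict_proba(weights, bias, vectorize(text, vocab))`. In practice the model would be fitted on the training cohort and scored on the separate validation cohort, as described above.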

Results — In the model to predict frequent ED visits, after excluding patients who were high ED users in the previous year, the area under the receiver operating characteristic curve (AUROC) was 0.71. Using the same methodology, the model to predict being in the top 5% of total system costs had an AUROC of 0.76.
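
For readers less familiar with the metric, the AUROC can be read as the probability that a randomly chosen patient who had the outcome received a higher predicted score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the example scores are hypothetical and are not the study's data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes for six patients.
example_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
example_labels = [1, 0, 1, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in cohort size, so a rank-based formulation would be preferable for cohorts of tens of thousands of patients.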

Conclusion — Machine learning techniques can be applied to analyze free text contained in EMRs. This dataset is more predictive of patients who will generate high future costs than of patients who will make frequent future ED visits. It remains to be seen whether these predictions can be used to reduce costs through early intervention in this cohort of patients.



Frost DW, Vembu S, Wang J, Tu K, Morris Q, Abrams HB. Am J Med. 2017; 130(5):e17-22. Epub 2017 Jan 5.
