Background — Health care data allow for the study and surveillance of chronic diseases such as diabetes. The objective of this study was to identify and validate optimal algorithms for identifying diabetes cases within health care administrative databases for different research purposes, populations, and data sources.
Methods — We linked health care administrative databases from Ontario, Canada to a reference standard of primary care electronic medical records (EMRs). We then identified and calculated the performance characteristics of multiple adult diabetes case definitions, using combinations of data sources and time windows.
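The performance characteristics named above follow the standard definitions for comparing a case definition against a reference standard. As a minimal sketch, assuming each patient has been cross-classified by algorithm result and EMR status, they can be computed as follows. The `ConfusionCounts` and `performance` names and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-classification of algorithm result against the reference standard."""
    true_pos: int   # algorithm positive, reference positive
    false_pos: int  # algorithm positive, reference negative
    false_neg: int  # algorithm negative, reference positive
    true_neg: int   # algorithm negative, reference negative


def performance(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, and PPV as proportions in [0, 1]."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
    }


# Hypothetical counts for illustration only; these are not data from the study.
example = ConfusionCounts(true_pos=80, false_pos=10, false_neg=20, true_neg=890)
metrics = performance(example)
```

Unlike sensitivity and specificity, PPV depends on how common diabetes is in the linked population, so the same algorithm can yield a different PPV in a population with a different prevalence.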
Results — The best algorithm to identify diabetes cases was the presence at any time of one hospitalization or physician claim for diabetes AND either one prescription for an anti-diabetic medication or one physician claim with a diabetes-specific fee code [sensitivity 84.2%, specificity 99.2%, positive predictive value (PPV) 92.5%]. Use of physician claims alone performed almost as well: three physician claims for diabetes within one year was highly specific (sensitivity 79.9%, specificity 99.1%, PPV 91.4%) and one physician claim at any time was highly sensitive (sensitivity 93.6%, specificity 91.9%, PPV 58.5%).
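The three case definitions reported above can be expressed as simple predicates over a patient's record. The sketch below is illustrative only: the `PatientRecord` layout and function names are hypothetical, "within one year" is assumed to mean a span of at most 365 days, and diabetes-specific fee code claims are assumed to be stored separately from diagnosis claims. The study's actual data structures and coding rules may differ.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PatientRecord:
    """Hypothetical per-patient event dates drawn from administrative data."""
    hospitalization_dates: list[date] = field(default_factory=list)  # diabetes hospitalizations
    claim_dates: list[date] = field(default_factory=list)            # physician claims for diabetes
    medication_dates: list[date] = field(default_factory=list)       # anti-diabetic prescriptions
    fee_code_claim_dates: list[date] = field(default_factory=list)   # diabetes-specific fee codes


def combined_definition(p: PatientRecord) -> bool:
    """One hospitalization or physician claim, AND one prescription or fee code claim, at any time."""
    has_diagnosis = bool(p.hospitalization_dates or p.claim_dates)
    has_confirmation = bool(p.medication_dates or p.fee_code_claim_dates)
    return has_diagnosis and has_confirmation


def claims_within_window(p: PatientRecord, n: int = 3, window_days: int = 365) -> bool:
    """At least n physician claims whose dates span no more than window_days (assumed 365)."""
    dates = sorted(p.claim_dates)
    # Slide over every run of n consecutive claims and check the span of each run.
    for i in range(len(dates) - n + 1):
        if (dates[i + n - 1] - dates[i]).days <= window_days:
            return True
    return False


def any_claim(p: PatientRecord) -> bool:
    """At least one physician claim for diabetes at any time (the most sensitive definition)."""
    return len(p.claim_dates) >= 1


# Hypothetical patient: three claims inside one year, no prescription or fee code.
patient = PatientRecord(claim_dates=[date(2010, 1, 1), date(2010, 6, 1), date(2010, 12, 15)])
```

For this hypothetical patient, the two claims-only definitions are met while the combined definition is not, because no prescription or fee code claim is recorded.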
Conclusions — This study identifies validated algorithms to capture diabetes cases within health care administrative databases across a range of research purposes, populations, and levels of data availability. These findings can be used to study trends and outcomes of diabetes using routinely collected health care data.