Go to content

Validation of an algorithm to identify children with biopsy-proven celiac disease from within health administrative data: an assessment of health services utilization patterns in Ontario, Canada


Importance — Celiac disease (CD) is a common pediatric illness, and awareness of gluten-related disorders including CD is growing. Health administrative data represents a unique opportunity to conduct population-based surveillance of this chronic condition and assess the impact of caring for children with CD on the health system.

Objective — The objective of the study was to validate an algorithm based on health administrative data diagnostic codes to accurately identify children with biopsy-proven CD. We also evaluated trends over time in the use of health services related to CD by children in Ontario, Canada.

Study Design and Setting — We conducted a retrospective cohort study and validation study of population-based health administrative data in Ontario, Canada. All cases of biopsy-proven CD diagnosed 2005–2011 in Ottawa were identified through chart review from a large pediatric healthcare center, and linked to the Ontario health administrative data to serve as positive reference standard. All other children living within Ottawa served as the negative reference standard. Case-identifying algorithms based on outpatient physician visits with associated ICD-9 code for CD plus endoscopy billing code were constructed and tested. Sensitivity, specificity, PPV and NPV were tested for each algorithm (with 95% CI). Poisson regression, adjusting for sex and age at diagnosis, was used to explore the trend in outpatient visits associated with a CD diagnostic code from 1995–2011.

Results — The best algorithm to identify CD consisted of an endoscopy billing claim follow by 1 or more adult or pediatric gastroenterologist encounters after the endoscopic procedure. The sensitivity, specificity, PPV, and NPV for the algorithm were: 70.4% (95% CI 61.1–78.4%), >99.9% (95% CI >99.9->99.9%), 53.3% (95% CI 45.1–61.4%) and >99.9% (95% CI >99.9->99.9%) respectively. It identified 1289 suspected CD cases from Ontario-wide administrative data. There was a 9% annual increase in the use of this combination of CD-associated diagnostic codes in physician billing data (RR 1.09, 95% CI 1.07–1.10, P<0.001).

Conclusions — With its current structure and variables Ontario health administrative data is not suitable in identifying incident pediatric CD cases. The tested algorithms suffer from poor sensitivity and/or poor PPV, which increase the risk of case misclassification that could lead to biased estimation of CD incidence rate. This study reinforced the importance of validating the codes used to identify cohorts or outcomes when conducting research using health administrative data.



Chan J, Mack DR, Manuel DG, Mojaverian N, de Nanassy J, Benchimol EI. PLoS One. 2017; 12(6):e0180338.

View Source

Contributing ICES Scientists

Research Programs

Associated Sites