Migraineurs were reliably identified using administrative data

Background — Migraine is a common and important source of pain and disability in society. Accurately identifying people with migraine using routinely collected health data would be beneficial for health services research.

Objective — To externally validate a previously published method to identify migraineurs using health administrative data and to determine whether a better model can be derived using data-mining techniques.

Methods — Migraine status was determined for Ontarians participating in a population-based, cross-sectional survey. Consenting participants were linked to population-based health administrative data to identify age, sex, and coded diagnoses. Discrimination and calibration measures were used to appraise the models. A de novo technique we term “double threshold analysis” was used to determine optimal lower and upper expected probabilities to identify migraine status in the newly derived model.
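
As an illustration of the double threshold idea described above, the following minimal Python sketch assigns a migraine status only when a predicted probability falls below a lower cutoff or above an upper cutoff, and leaves everyone in between unclassified. The function names, the cutoffs of 0.10 and 0.90, and the toy data are assumptions made for this example and are not taken from the study's analysis code; the summary measures (share of the cohort captured, positive and negative predictive values among those classified) are one plausible way to appraise a pair of cutoffs, not necessarily the definitions the authors used.

```python
from typing import Dict, List, Optional


def double_threshold(probs: List[float],
                     lower: float = 0.10,
                     upper: float = 0.90) -> List[Optional[int]]:
    """Label each person 0 (no migraine), 1 (migraine), or None (unclassified).

    Hypothetical helper: cutoffs are illustrative defaults.
    """
    labels: List[Optional[int]] = []
    for p in probs:
        if p < lower:
            labels.append(0)      # confidently without migraine
        elif p > upper:
            labels.append(1)      # confidently with migraine
        else:
            labels.append(None)   # probability too uncertain to classify
    return labels


def summarize(labels: List[Optional[int]],
              truth: List[int]) -> Dict[str, Optional[float]]:
    """Summarize how much of the cohort is classified and how accurately."""
    classified = [(l, t) for l, t in zip(labels, truth) if l is not None]
    called_pos = [t for l, t in classified if l == 1]
    called_neg = [t for l, t in classified if l == 0]
    return {
        # share of the whole cohort that received a label
        "captured": len(classified) / len(labels),
        # among those labelled as migraineurs, share who truly are
        "ppv": sum(called_pos) / len(called_pos) if called_pos else None,
        # among those labelled as non-migraineurs, share who truly are not
        "npv": called_neg.count(0) / len(called_neg) if called_neg else None,
    }


# Toy data (made up for illustration only).
probs = [0.02, 0.05, 0.50, 0.95, 0.97, 0.08]
truth = [0, 0, 1, 1, 0, 1]

labels = double_threshold(probs)
stats = summarize(labels, truth)
print(labels)
print(stats)
```

In practice, a search over candidate pairs of lower and upper cutoffs could call `summarize` for each pair and pick the pair that balances the share of the cohort captured against predictive value.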

Results — A total of 101,114 people (mean age 46 years, 46% male) were included in the study, of whom 11,314 (11.2%) had migraines. Using data-driven parameter estimates, the previous model to identify migraineurs had adequate discrimination (c-statistic 0.707 [95% CI 0.701–0.712]) and calibration (Hosmer–Lemeshow [H–L] statistic 20.8). A new model that included diagnostic code scores for physician visits, emergency visits, and hospitalizations, with nonlinear terms for age and interactions, significantly improved performance (c-statistic 0.724 [95% CI 0.716–0.733], H–L statistic 16.4). Classifying people with a predicted migraine probability below 10% as not having migraine and those with a probability above 90% as having migraine resulted in a sensitivity of 3.1%, a specificity of 99.96%, and a positive predictive value of 81.0% while capturing 57.0% of the cohort and 29.3% of migraineurs.

Conclusion — A previously derived model to identify migraineurs was improved using data-mining techniques, permitting accurate cohort identification using routinely collected health administrative data.