
Is cancer stage data missing completely at random? A report from a large population-based cohort of non-small cell lung cancer


Introduction: Population-based datasets are often used to estimate changes in the utilization or outcomes of novel therapies. Whether unstaged patients are included or excluded may affect how these studies are interpreted.

Methods: A large population-based dataset of non-small cell lung cancer (NSCLC) patients in Ontario, Canada, was examined to compare the characteristics and outcomes of unstaged patients with those of staged patients. Multivariable Poisson regression was used to evaluate differences in patient-level characteristics between the two groups. Survival was estimated with the Kaplan-Meier method and compared using log-rank statistics.
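Kaplan-Meier estimation and the log-rank test are standard methods, so a minimal pure-Python sketch may help show what is being computed. This is not the authors' code: the function names `kaplan_meier` and `log_rank` and the toy follow-up data are hypothetical, ties follow the common convention that patients censored at time t are still at risk at t, and a real analysis of this size would use an established survival-analysis package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (illustrative sketch).

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)          # deaths plus censorings at each time
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        # Everyone leaving at t (dead or censored) drops out of the risk set
        n_at_risk -= exits[t]
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0   # observed minus expected deaths in group A
    var = 0.0         # hypergeometric variance, summed over death times
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var


if __name__ == "__main__":
    # Hypothetical toy data (months of follow-up), not study data
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
    print(log_rank([1, 2], [1, 1], [3, 4], [1, 1]))
```

The statistic is compared against a chi-square distribution with one degree of freedom. The quadratic scan in `log_rank` keeps the sketch short; it is not meant for cohorts of tens of thousands of patients.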

Results: In our Ontario cohort of 51,152 patients with NSCLC, 11.2% (n=5,707) were unstaged, and there was evidence that stage data were not missing completely at random. Compared with staged patients, those without an assigned stage were more likely to be older (RR [95% CI]: 70-79 vs. 20-59, 1.51 [1.38-1.66]; 80+ vs. 20-59, 2.87 [2.62-3.15]), to have a higher comorbidity index (score 1-2 vs. 0, 1.19 [1.12-1.27]; score 3 vs. 0, 1.49 [1.38-1.60]), and to have a lower socioeconomic class (4 vs. 1 [lowest], 0.91 [0.84-0.98]; 5 vs. 1 [lowest], 0.89 [0.83-0.97]). The overall survival of unstaged patients suggested a mixture of early- and advanced-stage disease, with a large proportion likely to be stage IV patients who died more rapidly than those with reported stage IV disease.
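The relative risks above come from multivariable Poisson regression. Assuming the usual log-link model with a binary outcome Y_i = 1 if patient i is unstaged (the abstract does not state the link or the variance estimator, so this is an assumption), each reported RR and its 95% CI relate to a fitted coefficient and its standard error as follows. Poisson regression of a binary outcome, often with a robust variance, is a common way to estimate adjusted relative risks rather than odds ratios.

```latex
\log \mathbb{E}[Y_i \mid x_i] = \beta_0 + \sum_j \beta_j x_{ij}, \qquad
\widehat{\mathrm{RR}}_j = e^{\hat\beta_j}, \qquad
95\%\ \mathrm{CI} = \left[\, e^{\hat\beta_j - 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)},\;
                          e^{\hat\beta_j + 1.96\,\widehat{\mathrm{SE}}(\hat\beta_j)} \,\right]
```

For example, the RR of 2.87 for patients aged 80+ versus 20-59 corresponds to a coefficient of about ln 2.87 ≈ 1.05, and the interval is symmetric around that value on the log scale rather than on the RR scale.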

Conclusion: In this case study, population-level evaluations of stage-specific healthcare utilization and outcomes that consider only staged patients with stage IV disease may be biased, because a distinct subset of stage IV patients with rapid death is likely to be among those without a documented stage in administrative data.



Robinson AG, Nguyen P, Goldie CL, Jalink M, Hanna TP. Front Oncol. 2023; 13:1146053. Epub 2023 Apr 4.
