Population-based clustering of co-occurring social determinants: an application of unsupervised machine learning

Purpose — This study aimed to develop a cluster-based measure of multiple co-occurring social determinants of health by applying unsupervised machine learning to a population-based cohort, offering a data-driven approach to organize complex social exposures.

Methods — Unsupervised clustering was applied to a population-based cohort of Ontario respondents to six cycles of the Canadian Community Health Survey (2001-2012), linked to Canadian census and vital statistics data. To determine the optimal number of clusters, candidate solutions were evaluated using internal metrics, visualization techniques, descriptive analysis and theoretical considerations. Sensitivity analyses were integrated throughout the iterative clustering process. Premature mortality rates were generated to assess validity.
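The abstract does not name the clustering algorithm or the internal metrics used, so the following is only an illustrative sketch of the general idea of comparing candidate numbers of clusters with an internal metric. It assumes k-means with a deterministic farthest-point initialisation and the mean silhouette width as stand-ins, and it runs on synthetic two-dimensional points rather than survey data. All function names and the data are hypothetical. Real survey variables are often categorical and would likely call for a different algorithm or distance measure.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means (an assumed stand-in); returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; higher values mean tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singletons contribute a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def make_blobs(seed=1):
    """Synthetic data: three well separated groups of 30 points each (not survey data)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in centres for _ in range(30)]


points = make_blobs()
# Score each candidate number of clusters and keep the best-scoring one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On these well separated synthetic groups the silhouette should peak at k = 3. In practice an internal metric is one input among several: as the Methods describe, visualization, descriptive analysis and theoretical considerations also inform the choice.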

Results — The optimal clustering solutions were a 4-cluster and a 6-cluster solution. Both revealed distinct social typologies. The 6-cluster solution offered greater granularity and theoretical interpretability, whereas the 4-cluster solution showed greater heterogeneity within certain marginalized groups. Premature mortality rates differed meaningfully across clusters, supporting the validity of the clustering approach for capturing risk associated with social exposures.

Conclusions — Unsupervised machine learning methods identified meaningful population subgroups reflecting complex patterns of social exposures. This approach offers a flexible, data-driven method for characterizing social exposures that can be considered alongside theoretical frameworks and used for equity monitoring, intervention planning and policy development.

Information

Citation

Giesinger I, Buajitti E, Siddiqi A, Smith PM, Krishnan RG, Postill G, Rosella LC. Ann Epidemiol. 2026; Mar 3 [Epub ahead of print].
