Bayesian modeling of missing data in clinical research


Researchers who use data derived from patient medical records frequently confront missing data. We used Monte Carlo simulations to examine the performance of three Bayesian methods that imputed missing data by placing a simple prior distribution on the variable that was subject to missingness. These methods were compared with two methods that used a multivariate logistic regression model to impute missing data. As further comparators, we also examined the performance of the conventional complete case analysis and of a method that equated missing data on a risk factor with evidence of absence of the risk factor. The bias and mean squared error of each method depended on the prevalence of the risk factor under consideration and on the missing data mechanism. No method performed well in all situations. However, assuming that the risk factor had a Bernoulli distribution and placing a uniform prior distribution on the parameter of this distribution resulted in lower relative bias than the competing methods in the majority of settings. The performance of each method was then examined on a dataset of 5131 patients admitted to hospital with a heart attack.
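
To make the Bernoulli model with a uniform prior concrete, the following is a minimal sketch of one imputation step, not the authors' implementation. It assumes the missingness is ignorable, so the observed values alone inform the prevalence parameter, and it uses the standard result that a uniform prior is a Beta(1, 1) distribution, which gives a Beta(1 + successes, 1 + failures) posterior. The function name `impute_bernoulli_uniform_prior`, the use of `None` to mark missing values, and the toy data are all hypothetical.

```python
import random


def impute_bernoulli_uniform_prior(values, rng):
    """Fill missing binary values (None) with one posterior predictive draw.

    Assumption: missingness is ignorable, so only observed values
    inform the prevalence parameter p.
    """
    observed = [v for v in values if v is not None]
    successes = sum(observed)
    failures = len(observed) - successes

    # Uniform prior on p is Beta(1, 1); the posterior after observing
    # the data is Beta(1 + successes, 1 + failures).
    p = rng.betavariate(1 + successes, 1 + failures)

    # Draw each missing value from Bernoulli(p); keep observed values.
    completed = [
        v if v is not None else int(rng.random() < p)
        for v in values
    ]
    return completed, p


# Hypothetical toy data: 1 = risk factor present, 0 = absent, None = missing.
rng = random.Random(0)
risk_factor = [1, 0, 0, None, 1, None, 0, 0]
completed, p_draw = impute_bernoulli_uniform_prior(risk_factor, rng)
```

Repeating the draw many times and refitting the outcome model on each completed dataset would propagate the uncertainty about the missing values; in a fully Bayesian analysis this step would normally sit inside the sampler together with the outcome model rather than run on its own.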



Austin PC, Escobar MD. Comput Stat Data Anal. 2005; 49(3):821-36.
